commonpower.data_forecasting.data_sources.CSVDataSource
- class CSVDataSource(file_path: str, datetime_format: str = '%d.%m.%Y %H:%M', rename_dict: dict = {}, auto_drop: bool = False, resample: timedelta = datetime.timedelta(seconds=3600), aggregation_method: str = 'mean', aggregation_alignment: str = 'future', interpolation_method: str = 'time', **csv_read_kwargs)
Bases: PandasDataSource
DataSource based on .csv data. It imports data from a .csv file, applies some preprocessing (datetime parsing, optional column renaming and dropping, and resampling), and stores the result in an internal pandas DataFrame.
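The import step can be pictured with plain pandas. The following is a minimal sketch under assumptions, not the commonpower implementation: the helper load_csv_sketch is hypothetical, and it assumes that the required column “t” becomes the DatetimeIndex and that csv_read_kwargs are forwarded to pandas.read_csv (as the name suggests). Resampling is left out here.

```python
import io

import pandas as pd


def load_csv_sketch(source, datetime_format="%d.%m.%Y %H:%M",
                    rename_dict=None, auto_drop=False, **csv_read_kwargs):
    """Hypothetical illustration of the import step (not the commonpower code)."""
    rename_dict = rename_dict or {}
    # Assumption: extra keyword arguments go straight to pandas.read_csv.
    df = pd.read_csv(source, **csv_read_kwargs)
    # The required column "t" is parsed with the given datetime format
    # and (assumption) used as the index.
    df["t"] = pd.to_datetime(df["t"], format=datetime_format)
    df = df.set_index("t")
    if auto_drop:
        # Keep only the columns mentioned in rename_dict.
        df = df[[c for c in df.columns if c in rename_dict]]
    return df.rename(columns=rename_dict)


csv_text = (
    "t;load;irrelevant\n"
    "01.01.2020 00:00;1.0;x\n"
    "01.01.2020 01:00;3.0;y\n"
)
df = load_csv_sketch(io.StringIO(csv_text), rename_dict={"load": "p"},
                     auto_drop=True, sep=";")
print(df)
```

With auto_drop=True the column “irrelevant” is discarded because it does not appear in rename_dict, and “load” is renamed to “p”.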
- Parameters:
file_path (str) – Path to the source .csv file.
datetime_format (str, optional) – Datetime format of the source .csv file. Specifically, this refers to the (required) column “t”. Defaults to “%d.%m.%Y %H:%M”.
rename_dict (dict, optional) – Dict to specify column renaming. Format: {“original name”: “new name”, …}. Defaults to {}.
auto_drop (bool, optional) – If True, all columns of the source data except those mentioned in rename_dict are dropped. Defaults to False.
resample (timedelta, optional) – Target sampling interval. If the source data is sampled more coarsely than this value, the missing samples are interpolated (see interpolation_method; the default ‘time’ interpolates linearly with respect to the timestamps). If it is sampled more finely, several source datapoints are aggregated into one sample (see aggregation_method). Defaults to timedelta(hours=1).
aggregation_method (str, optional) – Method used to aggregate multiple source datapoints into one sample. Passed to pandas Resampler.aggregate(), e.g. ‘mean’, ‘last’, ‘first’, ‘max’. Defaults to ‘mean’.
aggregation_alignment (str, optional) – How to align the resampled windows. Passed to pandas resample() as both label= and closed=. Accepts ‘left’ or ‘right’; for easier use, the aliases ‘future’ (= ‘left’) and ‘past’ (= ‘right’) are provided. Defaults to ‘future’.
interpolation_method (str, optional) – Method used to interpolate when there are fewer source datapoints than samples. Passed to pandas interpolate(). Defaults to ‘time’.
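The resampling parameters map onto pandas resample(), Resampler.aggregate() and interpolate() as described above. The sketch below is hypothetical (resample_sketch is not part of commonpower) and assumes that aggregation and interpolation are applied one after the other: windows containing several source points are aggregated, and windows left empty because the source is coarser than resample are then filled by interpolation. It also shows how ‘future’ and ‘past’ change both the window boundaries and the labels.

```python
from datetime import timedelta

import pandas as pd

# Aliases as documented above: 'future' -> 'left', 'past' -> 'right'.
ALIGNMENT_ALIASES = {"future": "left", "past": "right"}


def resample_sketch(series, resample=timedelta(hours=1),
                    aggregation_method="mean",
                    aggregation_alignment="future",
                    interpolation_method="time"):
    """Hypothetical illustration of the resampling parameters (not the commonpower code)."""
    side = ALIGNMENT_ALIASES.get(aggregation_alignment, aggregation_alignment)
    # label= and closed= both receive the same side; several source points
    # in one window are combined with aggregation_method.
    resampled = series.resample(resample, label=side, closed=side).aggregate(aggregation_method)
    # Windows without any source point come out as NaN and are interpolated.
    return resampled.interpolate(method=interpolation_method)


# Finer source data (15 min) is aggregated to 1 h.
fine = pd.Series(range(8), index=pd.date_range("2020-01-01", periods=8, freq="15min"), dtype=float)
future = resample_sketch(fine, aggregation_alignment="future")
past = resample_sketch(fine, aggregation_alignment="past")

# Coarser source data (2 h) is interpolated to 1 h.
coarse = pd.Series([0.0, 4.0], index=pd.to_datetime(["2020-01-01 00:00", "2020-01-01 02:00"]))
upsampled = resample_sketch(coarse)
```

With 15-minute input, ‘future’ labels each hour by the start of its window (00:00 holds the mean of 00:00 to 00:45), while ‘past’ labels it by the end (01:00 holds the mean of 00:15 to 01:00, and 00:00 holds only the 00:00 value). For the 2-hour input, the empty 01:00 window is filled with 2.0 by time-weighted interpolation.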
Methods
apply_to_column – Allows applying a transformation to a column of the data (using pandas df.apply()).
create_time_features – Creates time features from the datetime index.
get_date_range – Returns the date range data is available for.
get_limits – Returns the limits for each variable in the data source.
get_variables – Returns the list of element names that data is available for.
shift_time_series – Shifts time series by a given timedelta.
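For orientation, rough plain-pandas counterparts of some of these methods are sketched below on a small hand-made DataFrame. They are not the commonpower implementations, and the return shapes are assumptions: for instance, that get_limits yields a (min, max) pair per variable and that shift_time_series moves the timestamps rather than the values. create_time_features is not sketched because the features it creates are not documented here.

```python
from datetime import timedelta

import pandas as pd

df = pd.DataFrame(
    {"p": [1.0, 3.0, 2.0]},
    index=pd.date_range("2020-01-01", periods=3, freq="60min"),
)

# ~get_date_range (assumption): first and last timestamp of the index.
date_range = (df.index.min(), df.index.max())

# ~get_limits (assumption): (min, max) per column.
limits = {col: (df[col].min(), df[col].max()) for col in df.columns}

# ~shift_time_series (assumption): move all timestamps by a timedelta,
# leaving the values themselves unchanged.
shifted = df.copy()
shifted.index = shifted.index + timedelta(days=1)

# ~apply_to_column: element-wise transformation of one column via apply().
df["p"] = df["p"].apply(lambda x: x * 1000)
```

Note that shifted was copied before apply_to_column-style scaling, so it still holds the original values.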