commonpower.data_forecasting.data_sources.CSVDataSource

class CSVDataSource(file_path: str, datetime_format: str = '%d.%m.%Y %H:%M', rename_dict: dict = {}, auto_drop: bool = False, resample: timedelta = datetime.timedelta(seconds=3600), aggregation_method: str = 'mean', aggregation_alignment: str = 'future', interpolation_method: str = 'time', **csv_read_kwargs)

Bases: PandasDataSource

Data source backed by a .csv file. It reads the file, preprocesses the data (column renaming, resampling, aggregation, interpolation), and stores the result in an internal pandas DataFrame.

Parameters:
  • file_path (str) – Path to the source .csv file.

  • datetime_format (str, optional) – Datetime format of the source .csv file. Specifically, this refers to the (required) column “t”. Defaults to “%d.%m.%Y %H:%M”.

  • rename_dict (dict, optional) – Dict to specify column renaming. Format: {“original name”: “new name”, …}. Defaults to {}.

  • auto_drop (bool, optional) – If set to true, all columns of the source data except those mentioned in rename_dict will be dropped. Defaults to False.

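  • resample (timedelta, optional) – Target sampling interval to which the source data is resampled. If the time interval of the source data is larger than this value, the data is interpolated (linearly in time by default, see interpolation_method). If it is smaller, multiple source datapoints are aggregated (see aggregation_method). Defaults to timedelta(hours=1).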

  • aggregation_method (str, optional) – Method by which to aggregate multiple source datapoints into one sample. Passed to pandas Resampler.aggregate, e.g. ‘mean’, ‘last’, ‘first’, ‘max’. Defaults to ‘mean’.

  • aggregation_alignment (str, optional) – How to align the resampled windows. Passed to pandas resample(label=, closed=) as ‘left’ or ‘right’. For easier use, the aliases ‘future’ (for ‘left’) and ‘past’ (for ‘right’) are provided. Defaults to ‘future’.

  • interpolation_method (str, optional) – Method by which to interpolate when there are fewer source datapoints than samples. Passed to pandas interpolate(). Defaults to ‘time’.
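
The interplay of these parameters can be illustrated with plain pandas. The following is a minimal sketch, not the library's implementation: the helper preprocess, the CSV content, and the column names P_kW and unused are hypothetical, and the order of the steps is an assumption based on the parameter descriptions above.

```python
import io
from datetime import timedelta

import pandas as pd

# Hypothetical CSV content with the required "t" column in the default format.
csv_text = """t,P_kW,unused
01.01.2023 00:00,1.0,9
01.01.2023 00:30,3.0,9
01.01.2023 01:00,5.0,9
01.01.2023 01:30,7.0,9
"""


def preprocess(csv_file, datetime_format="%d.%m.%Y %H:%M", rename_dict=None,
               auto_drop=False, resample=timedelta(hours=1),
               aggregation_method="mean", aggregation_alignment="future",
               interpolation_method="time"):
    """Hypothetical stand-in for the preprocessing described above."""
    rename_dict = rename_dict or {}
    df = pd.read_csv(csv_file)
    # Parse the required "t" column and use it as the datetime index.
    df["t"] = pd.to_datetime(df["t"], format=datetime_format)
    df = df.set_index("t").sort_index()
    if auto_drop:
        # Keep only the columns mentioned in rename_dict.
        df = df[list(rename_dict)]
    df = df.rename(columns=rename_dict)
    # Map the aliases 'future'/'past' to the pandas values 'left'/'right'.
    side = {"future": "left", "past": "right"}.get(
        aggregation_alignment, aggregation_alignment)
    # Aggregate where several source points fall into one window ...
    df = df.resample(resample, label=side, closed=side).aggregate(aggregation_method)
    # ... and interpolate where a window received no source point.
    return df.interpolate(method=interpolation_method)


# Downsampling: 30-minute data aggregated to hourly means.
hourly = preprocess(io.StringIO(csv_text), rename_dict={"P_kW": "p"}, auto_drop=True)

# Upsampling: 30-minute data interpolated to 15-minute samples.
quarter = preprocess(io.StringIO(csv_text), resample=timedelta(minutes=15))
```

In this sketch, hourly contains only the renamed column p with the window means 2.0 and 6.0, while quarter fills the empty 15-minute windows by time-based interpolation between the neighbouring source values.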

Methods

apply_to_column

Allows applying a transformation to a column of the data (using pandas df.apply()).

create_time_features

Creates time features from the datetime index.

get_date_range

Returns the date range data is available for.

get_limits

Returns the limits for each variable in the data source.

get_variables

Returns the list of element names that data is available for.

shift_time_series

Shifts time series by a given timedelta.
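
Since the data is held in a pandas data frame with a datetime index, the operations listed above roughly correspond to the following pandas calls. This is a sketch under assumptions, not the methods' actual code: the method signatures are not documented here, the column name p is hypothetical, the chosen time features (hour, weekday) are illustrative, and "limits" is assumed to mean the minimum and maximum per column.

```python
from datetime import timedelta

import pandas as pd

# Hypothetical hourly data standing in for the internal data frame.
index = pd.date_range("2023-01-01 00:00", periods=4, freq=timedelta(hours=1))
df = pd.DataFrame({"p": [1.0, 4.0, 2.0, 3.0]}, index=index)

# apply_to_column: transform one column element-wise (here kW to W).
df["p"] = df["p"].apply(lambda kw: kw * 1000.0)

# create_time_features: derive features from the datetime index
# (which features the library creates is an assumption).
features = pd.DataFrame(
    {"hour": df.index.hour, "weekday": df.index.dayofweek}, index=df.index)

# get_date_range: first and last timestamp with data.
date_range = (df.index[0], df.index[-1])

# get_limits: assumed to be (min, max) per variable.
limits = {col: (df[col].min(), df[col].max()) for col in df.columns}

# get_variables: names of the available columns.
variables = list(df.columns)

# shift_time_series: move all timestamps by a timedelta, values unchanged.
shifted = df.shift(freq=timedelta(hours=2))
```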