commonpower.data_forecasting.data_sources.PandasDataSource

class PandasDataSource(data: DataFrame, frequency: timedelta = datetime.timedelta(seconds=3600))

Bases: DataSource

Data source based on a pandas dataframe.

Parameters:
  • data (pd.DataFrame) – Dataframe containing the data. The index needs to be a datetime index.

  • frequency (timedelta, optional) – Frequency of the data. Defaults to timedelta(hours=1).
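As a minimal sketch of input that fits the parameters above, the following builds a dataframe with an hourly datetime index. The column names and values are made up for illustration, and the spacing check is only a suggested sanity check, not something the source states the class performs.

```python
from datetime import timedelta

import pandas as pd

# Hypothetical example data: 48 hourly samples of two made-up variables.
frequency = timedelta(hours=1)
index = pd.date_range(start="2023-01-01 00:00", periods=48, freq=frequency)
data = pd.DataFrame(
    {
        "load": [float(i) for i in range(48)],
        "temperature": [10.0 + 0.1 * i for i in range(48)],
    },
    index=index,
)

# The index must be a datetime index. Checking that its spacing matches
# `frequency` is an extra sanity check added for this sketch.
assert isinstance(data.index, pd.DatetimeIndex)
assert (data.index[1:] - data.index[:-1] == frequency).all()

# With commonpower installed, the data source would then be created
# using the documented signature:
# source = PandasDataSource(data, frequency=frequency)
```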

Methods

apply_to_column

Allows applying a transformation to a column of the data (using pandas df.apply()).

create_time_features

Creates time features from the datetime index.

get_date_range

Returns the date range data is available for.

get_limits

Returns the limits for each variable in the data source.

get_variables

Returns the list of element names that data is available for.

shift_time_series

Shifts time series by a given timedelta.

__call__(from_time: datetime, to_time: datetime) -> ndarray

Returns the data in the given date range.

Parameters:
  • from_time (datetime) – Start time of observation.

  • to_time (datetime) – End time of observation.

Returns:

np.ndarray – Data of shape (n_horizon, n_vars).
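The following sketch shows how a date-range lookup of this kind can be done in plain pandas. The helper `select_range` is hypothetical and is not the library's implementation. In particular, pandas label slicing includes both end points, whereas the source does not say whether PandasDataSource includes `to_time`.

```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd

# Hypothetical hourly data with two variables.
index = pd.date_range(start="2023-01-01", periods=24, freq=timedelta(hours=1))
data = pd.DataFrame(
    {"load": np.arange(24.0), "price": np.arange(24.0) * 2}, index=index
)


def select_range(df, from_time, to_time):
    """Return the rows between from_time and to_time as a 2-D array."""
    # Label-based slicing on a DatetimeIndex includes both end points.
    return df.loc[from_time:to_time].to_numpy()


window = select_range(data, datetime(2023, 1, 1, 6), datetime(2023, 1, 1, 9))
print(window.shape)  # (4, 2): four time steps, two variables
```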

apply_to_column(column: str, fcn: callable) -> PandasDataSource

Allows applying a transformation to a column of the data (using pandas df.apply()).

Parameters:
  • column (str) – Column to manipulate.

  • fcn (callable) – Transformation to apply. The function must take a single argument, the value of one cell: fcn(x).

Returns:

PandasDataSource – self
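To illustrate the kind of cell-wise function `fcn` is expected to be, here is a sketch in plain pandas. The column name `power` and the helper `watts_to_kilowatts` are hypothetical. The assignment at the end only mirrors the documented use of apply() and is not taken from the library's code.

```python
from datetime import timedelta

import pandas as pd

# Hypothetical data: power readings in watts.
index = pd.date_range(start="2023-01-01", periods=4, freq=timedelta(hours=1))
data = pd.DataFrame({"power": [1000.0, 2500.0, 0.0, 750.0]}, index=index)


def watts_to_kilowatts(x):
    """Takes the value of a single cell and returns the transformed value."""
    return x / 1000.0


# Cell-wise transformation of one column, leaving the index untouched.
data["power"] = data["power"].apply(watts_to_kilowatts)
print(data["power"].tolist())  # [1.0, 2.5, 0.0, 0.75]
```

With the class itself, the equivalent call would presumably be `source.apply_to_column("power", watts_to_kilowatts)`. Because the method returns self, such calls can be chained.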

create_time_features(month: bool = True, day: bool = True, hour: bool = True) -> PandasDataSource

Creates time features from the datetime index. Each feature is encoded cyclically via sine and cosine transformations. Depending on which options are enabled, the created features are month_sin, month_cos, day_sin, day_cos, hour_sin, and hour_cos, where the day features encode the weekday.

Parameters:
  • month (bool, optional) – If True, the month is added as a feature. Defaults to True.

  • day (bool, optional) – If True, the weekday is added as a feature. Defaults to True.

  • hour (bool, optional) – If True, the hour is added as a feature. Defaults to True.

Returns:

PandasDataSource – self
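A cyclic sine and cosine encoding can be sketched as follows. The feature names come from the description above, but the helper `cyclic`, the periods (24 hours, 7 weekdays, 12 months), and the zero-based month are assumptions of this sketch. The library's exact scaling may differ.

```python
from datetime import timedelta

import numpy as np
import pandas as pd

# Hypothetical hourly data covering two days.
index = pd.date_range(start="2023-01-01", periods=48, freq=timedelta(hours=1))
data = pd.DataFrame({"load": np.zeros(48)}, index=index)


def cyclic(values, period):
    """Map values in [0, period) onto the unit circle."""
    angle = 2.0 * np.pi * np.asarray(values, dtype=float) / period
    return np.sin(angle), np.cos(angle)


# Assumed periods: 24 hours, 7 weekdays (Monday=0), 12 months (shifted to 0..11).
data["hour_sin"], data["hour_cos"] = cyclic(data.index.hour, 24)
data["day_sin"], data["day_cos"] = cyclic(data.index.dayofweek, 7)
data["month_sin"], data["month_cos"] = cyclic(data.index.month - 1, 12)
```

The point of the encoding is that values at the wrap-around, such as hour 23 and hour 0, end up close together in feature space, which a plain integer feature would not achieve.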

get_date_range() -> List[datetime]

Returns the date range data is available for.

Returns:

List[datetime] – [start_date, end_date]

get_limits() -> dict[str, tuple[float, float]]

Returns the limits for each variable in the data source.

Returns:

dict[str, tuple[float, float]]

{"element1": (lower_bound, upper_bound), "element2": (lower_bound, upper_bound)}
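A dictionary of this shape could be derived from a dataframe as sketched below. Taking the column-wise minimum and maximum as the bounds is an assumption of this sketch. The source does not state how the limits are determined.

```python
from datetime import timedelta

import pandas as pd

# Hypothetical data with two variables.
index = pd.date_range(start="2023-01-01", periods=3, freq=timedelta(hours=1))
data = pd.DataFrame(
    {"load": [3.0, 7.5, 1.2], "price": [0.2, 0.35, 0.1]}, index=index
)

# Assumption: lower and upper bound are the observed minimum and maximum.
limits = {
    column: (float(data[column].min()), float(data[column].max()))
    for column in data.columns
}
print(limits)  # {'load': (1.2, 7.5), 'price': (0.1, 0.35)}
```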

get_variables() -> List[str]

Returns the list of element names that data is available for.

Returns:

List[str] – List of available elements.

shift_time_series(shift_by: timedelta) -> DataSource

Shifts time series by a given timedelta. The shift is done in a rolling fashion, so the start and end timestamps do not change. Can be used to simulate more diverse data.

Parameters:
  • shift_by (timedelta) – Time delta to shift by. Positive values shift into the "future", negative values into the "past".

Returns:

DataSource – self
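The rolling shift described above can be sketched with `numpy.roll`: values that fall off one end re-enter at the other, while the index stays fixed. The helper `roll_time_series` is hypothetical and assumes that `shift_by` is a whole multiple of the data frequency. It is not the library's implementation.

```python
from datetime import timedelta

import numpy as np
import pandas as pd

frequency = timedelta(hours=1)
index = pd.date_range(start="2023-01-01", periods=6, freq=frequency)
data = pd.DataFrame({"load": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]}, index=index)


def roll_time_series(df, shift_by, frequency):
    """Roll the values by shift_by while keeping the timestamps unchanged."""
    # Assumption: shift_by is a whole multiple of the data frequency.
    steps = int(shift_by / frequency)
    rolled = np.roll(df.to_numpy(), steps, axis=0)
    return pd.DataFrame(rolled, index=df.index, columns=df.columns)


# Positive shift: each value moves two hours into the "future".
shifted = roll_time_series(data, timedelta(hours=2), frequency)
print(shifted["load"].tolist())  # [4.0, 5.0, 0.0, 1.0, 2.0, 3.0]
```

The first and last timestamps are identical before and after the shift. Only the assignment of values to timestamps changes.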