commonpower.control.runners.BaseTrainer

class BaseTrainer(sys: commonpower.core.System, global_controller: commonpower.control.controllers.OptimalController = <commonpower.control.controllers.OptimalController object>, wrapper: gymnasium.core.Wrapper | None = None, horizon: datetime.timedelta = datetime.timedelta(days=1), episode_length: int = 24, dt: datetime.timedelta = datetime.timedelta(seconds=3600), continuous_control: bool = False, history: commonpower.modeling.history.ModelHistory | None = None, solver: pyomo.opt.base.solvers.OptSolver = <pyomo.solvers.plugins.solvers.gurobi_direct.GurobiDirect object>, save_path: str = './saved_models/test_model', seed: int | None = None, normalize_actions: bool = True, limited_date_range: typing.List[datetime.datetime] | None = None)

Bases: BaseRunner

Base class for any runner used for training one or multiple reinforcement learning (RL) agents.

Parameters:

sys (System) – power system to be controlled
global_controller (OptimalController) – instance of controller taking over control of all nodes that have not yet been assigned a controller. Mostly used to balance the system using a market node or a generator. Defaults to OptimalController("global").
wrapper (gym.Wrapper) – wrapper for the environment that handles the RL agents during training (used for example for single-agent RL control).
horizon (timedelta) – amount of time that the controller looks into the future
episode_length (int) – number of time steps to simulate before the system is reset during RL training if continuous_control=False
dt (timedelta) – control time interval
continuous_control (bool) – whether to use an infinite control horizon
history (ModelHistory) – logger
solver (OptSolver) – solver for optimization problem
save_path (str) – local path to folder in which the trained policy will be stored (as .zip file) after the training is finished
seed (int) – seed for the global random number generator of numpy (we use np.random.seed(seed) instead of instantiating our own generator)
normalize_actions (bool) – whether or not to normalize the action space
limited_date_range (list) – limits the system’s date range such that we only train over a specific interval
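
The default values of horizon, dt and episode_length in the signature above line up with each other. The following is a minimal sketch of that arithmetic using only the standard library. The variable names steps_in_horizon and episode_duration are illustrative, and whether BaseTrainer derives these quantities in the same way internally is an assumption.

```python
from datetime import timedelta

# Default values copied from the BaseTrainer signature.
horizon = timedelta(days=1)
dt = timedelta(seconds=3600)
episode_length = 24

# Number of control steps that fit into the look-ahead horizon.
# Floor division of two timedeltas yields an int.
steps_in_horizon = horizon // dt

# Simulated time covered by one training episode
# (only relevant when continuous_control=False).
episode_duration = episode_length * dt

print(steps_in_horizon)
print(episode_duration)
```

With these defaults, the controller looks 24 steps ahead and one episode spans one simulated day. If dt is changed, horizon and episode_length most likely need to be adjusted along with it.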

Returns:

BaseTrainer

Methods

`finish_run`	Terminates run.
`prepare_run`	In addition to the preparation in BaseRunner, we also instantiate an environment function as an API for the RL training.
`run`	Simulates the scenario for a given number of time steps.
`set_start_time`	Sets the start time from an external source.
`system_feasible`	Check whether the current system set-up is feasible.

prepare_run()

In addition to the preparation in BaseRunner, we also instantiate an environment function as an API for the RL training.

Returns: None
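
An "environment function" in RL tooling is commonly a zero-argument callable that returns a fresh environment each time it is called. The sketch below illustrates that general pattern only. StubEnv and make_env_fn are hypothetical names introduced for this example; they are not part of commonpower, and the actual implementation of prepare_run may differ.

```python
from typing import Callable, Optional


class StubEnv:
    """Hypothetical stand-in for an RL environment (not a commonpower class)."""

    def __init__(self, episode_length: int) -> None:
        self.episode_length = episode_length
        self.t = 0

    def reset(self) -> int:
        """Reset the step counter, as happens at the start of an episode."""
        self.t = 0
        return self.t

    def step(self) -> bool:
        """Advance one time step; return True once the episode is over."""
        self.t += 1
        return self.t >= self.episode_length


def make_env_fn(
    episode_length: int,
    wrapper: Optional[Callable[[StubEnv], StubEnv]] = None,
) -> Callable[[], StubEnv]:
    """Return a zero-argument factory that builds a new environment per call."""

    def env_fn() -> StubEnv:
        env = StubEnv(episode_length)
        # Apply the optional wrapper, mirroring the role of the wrapper parameter.
        return wrapper(env) if wrapper is not None else env

    return env_fn


env_fn = make_env_fn(episode_length=24)
env = env_fn()
env.reset()

steps = 0
done = False
while not done:
    done = env.step()
    steps += 1
```

Passing a factory rather than a ready-made environment lets the training code create as many independent environment instances as it needs.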