commonpower.control.runners.SingleAgentTrainer

class SingleAgentTrainer(sys: commonpower.core.System, alg_config: commonpower.control.configs.algorithms.SB3MetaConfig, global_controller: commonpower.control.controllers.OptimalController = <commonpower.control.controllers.OptimalController object>, policy: stable_baselines3.common.policies.BasePolicy | None = None, wrapper: gymnasium.core.Wrapper | None = None, logger: commonpower.control.logging_utils.loggers.BaseLogger | None = None, horizon: datetime.timedelta = datetime.timedelta(days=1), episode_length: int = 24, dt: datetime.timedelta = datetime.timedelta(seconds=3600), continuous_control: bool = False, history: commonpower.modeling.history.ModelHistory | None = None, solver: pyomo.opt.base.solvers.OptSolver = <pyomo.solvers.plugins.solvers.gurobi_direct.GurobiDirect object>, save_path: str = './saved_models/test_model', seed: int | None = None, normalize_actions: bool = True, limited_date_range: typing.List[datetime.datetime] | None = None)

Bases: BaseTrainer

Runner for training a single RL agent (with algorithms from the StableBaselines 3 repository).

Parameters:
  • sys (System) – power system to be controlled

  • global_controller (OptimalController) – instance of the controller that takes over control of all nodes that have not yet been assigned a controller. Mostly used to balance the system using a market node or a generator. Defaults to OptimalController("global").

  • alg_config (SB3MetaConfig) – configuration for the RL algorithm and policy to be trained

  • policy (BasePolicy) – policy instance (can be handed over to be retrained)

  • wrapper (gym.Wrapper) – wrapper for the environment that handles the RL agents during training (used for example for single-agent RL control).

  • logger (BaseLogger) – object for handling training logs

  • horizon (timedelta) – amount of time that the controller looks into the future

  • episode_length (int) – number of time steps to simulate before the system is reset during RL training if continuous_control=False

  • dt (timedelta) – control time interval

  • continuous_control (bool) – whether to use an infinite control horizon

  • history (ModelHistory) – logger for the model history

  • solver (OptSolver) – solver for optimization problem

  • save_path (str) – local path to folder in which the trained policy will be stored (as .zip file) after the training is finished

  • seed (int) – seed for the global random number generator of numpy (we use np.random.seed(seed) instead of instantiating our own generator)

  • normalize_actions (bool) – whether or not to normalize the action space

  • limited_date_range (list) – limits the system’s date range such that we only train over a specific interval
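The default values of horizon, episode_length and dt are related by simple arithmetic, which the sketch below works through using only the standard library. It assumes that the horizon is discretized in steps of dt; that reading is an assumption made for illustration and is not stated in the parameter descriptions above.

```python
from datetime import timedelta

# Default values taken from the constructor signature.
horizon = timedelta(days=1)
episode_length = 24
dt = timedelta(seconds=3600)

# Wall-clock time covered by one training episode (when continuous_control=False).
episode_duration = episode_length * dt

# Number of dt-sized steps in the look-ahead horizon
# (assumption: the horizon is split into steps of length dt).
horizon_steps = horizon // dt

print(episode_duration)  # 1 day, 0:00:00
print(horizon_steps)     # 24
```

With the defaults, one episode therefore spans exactly one day of simulated time, the same length as the look-ahead horizon.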

Returns:

SingleAgentTrainer
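The seed parameter is described as seeding numpy's global random number generator rather than a dedicated generator instance. The following sketch contrasts the two approaches in plain numpy. It does not show the trainer's internals, and the seed value is an arbitrary example.

```python
import numpy as np

seed = 42  # arbitrary example value

# Global seeding: resets the state shared by all legacy np.random.* calls.
np.random.seed(seed)
first = np.random.rand(3)
np.random.seed(seed)
second = np.random.rand(3)
same_after_reseed = bool(np.array_equal(first, second))

# A dedicated generator keeps its own state and leaves the global one untouched.
np.random.seed(seed)
local_draw = np.random.default_rng(seed).random(3)
third = np.random.rand(3)
global_unaffected = bool(np.array_equal(first, third))

print(same_after_reseed, global_unaffected)  # True True
```

Because the seed is global, any other code that draws from np.random in the same process shares the seeded state.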

Methods

finish_run

Terminates run.

prepare_run

Prepare the training by initializing the system and its controllers.

run

Simulates the scenario for a given number of time steps.

set_start_time

Set the start time externally.

system_feasible

Check whether the current system set-up is feasible.

_run(n_steps: int = 24)

Runs the single-agent RL training algorithm for a given number of time steps and saves the trained policy.

Returns:

None

finish_run()

Terminates run.

Returns:

None

prepare_run()

Prepare the training by initializing the system and its controllers. Assigns a global controller that takes over control of all entities which require inputs and have not been assigned a controller by the system’s set-up. Sets an initial policy if no pre-trained policy was handed over at instantiation.

Returns:

None
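The two fallbacks that prepare_run describes (a global controller for entities without one, and an initial policy when none was handed over) follow a common default-assignment pattern. The sketch below illustrates that pattern in plain Python. The function names, the dictionary layout and the string stand-ins for controllers and policies are all hypothetical and are not part of the commonpower API.

```python
from typing import Callable, Dict, Optional


def assign_global_controller(
    assigned: Dict[str, Optional[str]], global_controller: str
) -> Dict[str, str]:
    """Give every entity without a controller the global controller (hypothetical helper)."""
    return {
        entity: ctrl if ctrl is not None else global_controller
        for entity, ctrl in assigned.items()
    }


def resolve_policy(policy: Optional[object], make_initial: Callable[[], object]) -> object:
    """Keep a pre-trained policy if given, otherwise build an initial one (hypothetical helper)."""
    return policy if policy is not None else make_initial()


# Hypothetical system: one entity already has an RL agent, two are unassigned.
controllers = assign_global_controller(
    {"battery": "rl_agent", "market": None, "generator": None}, "global"
)
policy = resolve_policy(None, lambda: "initial_policy")

print(controllers)
print(policy)
```

Entities that already have a controller keep it, so a pre-assigned RL agent is never overwritten by the global controller in this sketch.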