commonpower.control.runners.SingleAgentTrainer

class SingleAgentTrainer(sys: commonpower.core.System, alg_config: commonpower.control.configs.algorithms.SB3MetaConfig, global_controller: commonpower.control.controllers.OptimalController = <commonpower.control.controllers.OptimalController object>, policy: stable_baselines3.common.policies.BasePolicy | None = None, wrapper: gymnasium.core.Wrapper | None = None, logger: commonpower.control.logging_utils.loggers.BaseLogger | None = None, horizon: datetime.timedelta = datetime.timedelta(days=1), episode_length: int = 24, dt: datetime.timedelta = datetime.timedelta(seconds=3600), continuous_control: bool = False, history: commonpower.modeling.history.ModelHistory | None = None, solver: pyomo.opt.base.solvers.OptSolver = <pyomo.solvers.plugins.solvers.gurobi_direct.GurobiDirect object>, save_path: str = './saved_models/test_model', seed: int | None = None, normalize_actions: bool = True, limited_date_range: typing.List[datetime.datetime] | None = None)

Bases: BaseTrainer

Runner for training a single RL agent using algorithms from the Stable-Baselines3 library.

Parameters:

sys (System) – power system to be controlled
global_controller (OptimalController) – controller that takes over control of all nodes that have not been assigned a controller. Mostly used to balance the system via a market node or a generator. Defaults to OptimalController(“global”).
alg_config (SB3MetaConfig) – configuration for the RL algorithm and policy to be trained
policy (BasePolicy) – policy instance; an existing policy can be handed over to be retrained
wrapper (gym.Wrapper) – wrapper for the environment that handles the RL agents during training (used for example for single-agent RL control).
logger (BaseLogger) – object for handling training logs
horizon (timedelta) – amount of time that the controller looks into the future
episode_length (int) – number of time steps to simulate before the system is reset during RL training if continuous_control=False
dt (timedelta) – control time interval
continuous_control (bool) – whether to use an infinite control horizon
history (ModelHistory) – logger that records the history of the system model during the run
solver (OptSolver) – solver for optimization problem
save_path (str) – local path to the folder in which the trained policy is stored (as a .zip file) once training is finished
seed (int) – seed for NumPy's global random number generator (we use np.random.seed(seed) instead of instantiating our own generator)
normalize_actions (bool) – whether or not to normalize the action space
limited_date_range (list) – limits the system’s date range such that we only train over a specific interval

Returns:

SingleAgentTrainer
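The `horizon`, `dt`, and `episode_length` parameters are related: with the defaults from the signature (`horizon=timedelta(days=1)`, `dt=timedelta(seconds=3600)`, `episode_length=24`), the controller looks 24 control intervals ahead and one training episode covers one day of simulated time. The snippet below is only a sketch of that arithmetic using the standard library; `lookahead_steps` and `episode_duration` are hypothetical helpers, not part of commonpower, and the check that `horizon` is an integer multiple of `dt` is an assumption of the sketch.

```python
from datetime import timedelta


def lookahead_steps(horizon: timedelta, dt: timedelta) -> int:
    """Number of control intervals of length dt inside the horizon (hypothetical helper)."""
    # Assumption for this sketch: horizon is an integer multiple of dt.
    if horizon % dt != timedelta(0):
        raise ValueError("horizon is assumed to be an integer multiple of dt")
    return horizon // dt


def episode_duration(episode_length: int, dt: timedelta) -> timedelta:
    """Simulated time covered by one training episode (hypothetical helper)."""
    return episode_length * dt


# Default values taken from the SingleAgentTrainer signature
horizon = timedelta(days=1)
dt = timedelta(seconds=3600)
episode_length = 24

print(lookahead_steps(horizon, dt))          # 24
print(episode_duration(episode_length, dt))  # 1 day, 0:00:00
```

Changing `dt` to 15 minutes while keeping a one-day horizon would therefore quadruple the number of lookahead steps (96), and an `episode_length` of 24 would then cover only six hours.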

Methods

`finish_run`	Terminates the run.
`prepare_run`	Prepare the training by initializing the system and its controllers.
`run`	Simulates the scenario for a given number of time steps.
`set_start_time`	Set start time from external.
`system_feasible`	Check whether the current system set-up is feasible.

_run(n_steps: int = 24)

Runs the single-agent RL training algorithm for a given number of time steps and saves the trained policy.

Returns:

None
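When `continuous_control=False`, the system is reset after every `episode_length` steps, so an `n_steps` training run is split into consecutive episodes; with `continuous_control=True` (infinite control horizon) there is no reset. The sketch below only illustrates that bookkeeping, assuming resets happen exactly at multiples of `episode_length`; `reset_steps` is a hypothetical helper, not commonpower code.

```python
from typing import List


def reset_steps(n_steps: int, episode_length: int, continuous_control: bool) -> List[int]:
    """Step counts after which the system would be reset (hypothetical sketch)."""
    if continuous_control:
        # Infinite control horizon: the system is never reset during training.
        return []
    if episode_length <= 0:
        raise ValueError("episode_length must be positive")
    # Episodic training: reset after every completed episode of episode_length steps.
    return list(range(episode_length, n_steps + 1, episode_length))


print(reset_steps(n_steps=72, episode_length=24, continuous_control=False))  # [24, 48, 72]
print(reset_steps(n_steps=72, episode_length=24, continuous_control=True))   # []
```

With the defaults (`n_steps=24`, `episode_length=24`), a call to `_run` would correspond to a single episode under this assumption.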

finish_run()

Terminates the run.

Returns:

None

prepare_run()

Prepare the training by initializing the system and its controllers. Assigns a global controller that takes over control of all entities which require inputs and have not been assigned a controller by the system’s set-up. Sets an initial policy if no pre-trained policy was handed over at instantiation.

Returns:

None
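The controller-assignment step of `prepare_run` can be pictured as a single pass over the system's entities: every entity that requires inputs but has no controller yet is handed to the global controller. This is a minimal pure-Python sketch; the `Entity` dataclass, its `needs_inputs`/`controller` fields, and `assign_global_controller` are hypothetical stand-ins, not commonpower's actual data model or API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entity:
    """Hypothetical stand-in for a node or component of the power system."""
    name: str
    needs_inputs: bool
    controller: Optional[str] = None


def assign_global_controller(entities: List[Entity], global_name: str = "global") -> List[str]:
    """Hand every uncontrolled entity that requires inputs to the global controller.

    Returns the names of the entities that were newly assigned (hypothetical sketch).
    """
    assigned = []
    for entity in entities:
        if entity.needs_inputs and entity.controller is None:
            entity.controller = global_name
            assigned.append(entity.name)
    return assigned


entities = [
    Entity("battery", needs_inputs=True, controller="rl_agent"),  # already controlled by the RL agent
    Entity("market", needs_inputs=True),                          # used for balancing, no controller yet
    Entity("load", needs_inputs=False),                           # requires no inputs, left untouched
]
print(assign_global_controller(entities))  # ['market']
```

Because already-assigned entities are skipped, the RL agent keeps control of its own nodes and the global controller only fills the gaps.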