commonpower.control.controllers.RLControllerSB3

class RLControllerSB3(name: str, obs_handler: ObservationHandler | None = None, train: bool = True, device: str = 'cpu', safety_layer=None, cost_callback: Callable = single_step_cost_callback, pretrained_policy_path: str | None = None)

Bases: RLBaseController

Controller class for RL agents trained with algorithms from the Stable-Baselines3 library (https://stable-baselines3.readthedocs.io/). Only single-agent RL algorithms are supported.

Base class for reinforcement learning (RL) controllers. Requires a safety layer to ensure constraint satisfaction. An RL controller operates in one of two modes: training or deployment. In training mode, the action is obtained through a callback from the Gym environment. In deployment mode, the action is computed by propagating the observation through the trained neural network policy. Saving and loading this policy, and computing the action in deployment mode, depend on the RL algorithm and therefore have to be implemented in the respective subclasses.
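The split between the two modes can be illustrated with a small toy class. This is only a sketch: `TwoModeController` and `action_callback` are hypothetical names that are not part of commonpower, and the real controller additionally passes the action through the safety layer.

```python
from typing import Callable, List, Optional, Sequence


class TwoModeController:
    """Hypothetical stand-in that mimics the training/deployment split."""

    def __init__(self, policy: Callable[[Sequence[float]], List[float]], train: bool = True):
        self.policy = policy  # stands in for the trained policy network
        self.train = train
        # In training mode the RL algorithm supplies the action via a callback.
        self.action_callback: Optional[Callable[[], List[float]]] = None

    def set_mode(self, train: bool) -> None:
        self.train = train

    def compute_control_input(self, obs: Sequence[float]) -> List[float]:
        if self.train:
            if self.action_callback is None:
                raise RuntimeError("training mode needs an action callback")
            return self.action_callback()
        # Deployment: propagate the observation through the policy.
        return self.policy(obs)


controller = TwoModeController(policy=lambda obs: [0.5 * x for x in obs], train=True)
controller.action_callback = lambda: [0.1, -0.2]
training_action = controller.compute_control_input([1.0, 2.0])

controller.set_mode(False)
deployment_action = controller.compute_control_input([1.0, 2.0])
```

In training mode the observation is ignored by the controller itself, because the learning algorithm has already chosen the action; in deployment mode the action depends only on the observation and the stored policy.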

Parameters:

name (str) – name of the controller
obs_handler (ObservationHandler) – entity that takes care of processing observations for RL controllers.
train (bool) – whether the controller is in training mode
device (str) – device to run the policy on, either ‘cpu’ or ‘cuda’ (GPU)
safety_layer (BaseSafetyLayer) – safety layer instance
cost_callback (Callable) – function used within the cost function of the controller to compute additional cost terms
pretrained_policy_path (str) – directory with stored policy parameters of an existing policy

Returns:

RLBaseController

Methods

`act_array_to_dict`	Converts numpy array of actions to dictionary.
`add_entity`	Add a controllable entity to the controller.
`add_system`	When adding a system to a controller, the system tree is searched recursively and all controllable entities that do not yet have a controller are added to 'nodes'.
`clip_to_bounds`	Clips the control inputs to their bounds to avoid numerical errors.
`compute_control_input`	In training mode, the control input is computed within the training algorithm and passed to this controller through a callback.
`detach`	Remove controller from all controlled entities
`filter_history_for_time_period`	Filters all element histories for a given time period
`flatten_obs`	Converts observation dictionary to a numpy array.
`get_cost`	Compute control cost for one time step
`get_id`	Get ID of controller.
`get_input_space`	Derives action space of the controller from the list of its controlled entities.
`get_nodes`	Get controlled nodes.
`get_top_level_nodes`	Retrieve the controlled entities at the highest level in the tree.
`initialize`	Initial set-up of controller and safety layer
`load`	Load a pre-trained policy from a directory.
`predict_action`	Compute the control action based on a given observation by propagating this observation through the policy network.
`reset_history`	Delete history
`save`	Save neural network policy parameters and structure.
`set_mode`	Set mode to training (True) or deployment (False).
`set_normalize_inputs`
`update_history`	Insert new data into training history.
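The conversion helpers listed above (`flatten_obs`, `act_array_to_dict`, `clip_to_bounds`) can be pictured with the following sketch. It is not the library's implementation: the `_sketch` functions, the element names such as `battery.soc`, and the `layout` argument are assumptions made for illustration only.

```python
import numpy as np


def flatten_obs_sketch(obs: dict) -> np.ndarray:
    """Concatenate observation entries into one vector, in sorted key order."""
    parts = [np.atleast_1d(np.asarray(obs[key], dtype=np.float64)) for key in sorted(obs)]
    return np.concatenate(parts)


def clip_to_bounds_sketch(action: np.ndarray, lower: np.ndarray, upper: np.ndarray) -> np.ndarray:
    """Clip each action component to its [lower, upper] interval."""
    return np.clip(action, lower, upper)


def act_array_to_dict_sketch(action: np.ndarray, layout: dict) -> dict:
    """Split a flat action vector into named slices; layout maps name -> size."""
    result, start = {}, 0
    for name, size in layout.items():
        result[name] = action[start:start + size]
        start += size
    return result


# Hypothetical observation of two elements.
obs = {"load.p": [1.0, 2.0], "battery.soc": 0.4}
flat_obs = flatten_obs_sketch(obs)

# Hypothetical raw policy output that slightly violates the bounds.
raw_action = np.array([1.5, -0.3, 0.2])
clipped = clip_to_bounds_sketch(raw_action, lower=-np.ones(3), upper=np.ones(3))
action_dict = act_array_to_dict_sketch(clipped, {"battery.p": 1, "heat_pump.p": 2})
```

A fixed key order matters here: the policy network sees only positions in the vector, so the same ordering has to be used during training and deployment.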

Attributes

obs_mask

load(env, config: dict, policy_kwargs: dict | None = None)

Load a pre-trained policy from a directory.

Parameters:

env (ControlEnv) – The gym environment constructed from the power system the RL algorithm interacts with. Required to construct the neural network policy because it determines the number of inputs (observations) and outputs (actions) of the network.
config (dict) – Configuration for the Stable-Baselines3 policy class. Because constructing the policy also constructs training buffers etc., this dictionary also contains algorithm parameters.
policy_kwargs (dict) – Configuration of the actual neural networks of the policy (e.g., number of neurons in the hidden layers of the actor and critic networks of an ActorCriticPolicy). Depends on the policy type. Consult the Stable-Baselines3 documentation (https://stable-baselines3.readthedocs.io/en/master/) for more information.

Returns:

None
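As a rough illustration of the two dictionaries, the sketch below separates algorithm-level settings from network-level settings. The keys in `config` are assumptions (this page does not list which keys `load` expects); `net_arch` with separate `pi` and `vf` entries is the Stable-Baselines3 convention for actor-critic policies.

```python
# Assumed shape of the algorithm/policy-class configuration. The key names are
# illustrative only and should be checked against the library's tutorials.
config = {
    "policy": "MlpPolicy",    # Stable-Baselines3 policy class name
    "device": "cpu",
    "learning_rate": 3e-4,    # algorithm parameter
    "n_steps": 24,            # rollout length per update (PPO-style algorithms)
    "batch_size": 12,
}

# Network configuration: two hidden layers of 64 neurons each for the
# actor ("pi") and the critic ("vf").
policy_kwargs = {
    "net_arch": {"pi": [64, 64], "vf": [64, 64]},
}

# With an existing controller and ControlEnv instance, the call would then be:
# controller.load(env=env, config=config, policy_kwargs=policy_kwargs)
```

The network sizes given in `policy_kwargs` presumably have to match those of the stored policy, since the saved parameters are loaded into the newly constructed networks.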

predict_action(obs: ndarray, deterministic: bool = True) → ndarray

Compute the control action based on a given observation by propagating this observation through the policy network.

Parameters:

obs (np.ndarray) – observation at the current time step (must be a NumPy array, not a dictionary, because the neural network cannot process a dictionary)
deterministic (bool) – whether to select the action deterministically instead of sampling it from the policy's action distribution

Returns:

np.ndarray – control action
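The effect of the `deterministic` flag can be sketched with a toy Gaussian policy. This is an assumption-laden illustration, not the Stable-Baselines3 implementation: `predict_action_sketch`, the linear `weights`, and `log_std` are hypothetical, and the actual action distribution depends on the algorithm.

```python
import numpy as np


def predict_action_sketch(obs, weights, log_std, deterministic=True, rng=None):
    """Toy linear-Gaussian policy: return the mean, or a sample around it."""
    mean = weights @ obs
    if deterministic:
        return mean
    if rng is None:
        rng = np.random.default_rng()
    # Stochastic action: mean plus Gaussian noise with per-dimension std.
    return mean + np.exp(log_std) * rng.standard_normal(mean.shape)


obs = np.array([1.0, 2.0])
weights = np.array([[0.5, 0.0], [0.0, -1.0]])
log_std = np.array([-1.0, -1.0])

action_det = predict_action_sketch(obs, weights, log_std, deterministic=True)
action_sto = predict_action_sketch(
    obs, weights, log_std, deterministic=False, rng=np.random.default_rng(0)
)
```

Deterministic selection returns the same action for the same observation, which is usually what is wanted in deployment; sampling is mainly useful for exploration or for evaluating the spread of the policy.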

save(policy: BasePolicy, save_path: str = './saved_models/test_model')

Save neural network policy parameters and structure.

Parameters:

policy (BasePolicy) – policy trained with an algorithm from Stable-Baselines3
save_path (str) – where to save the policy parameters

Returns:

None