commonpower.control.controllers.RLBaseController

class RLBaseController(name: str, obs_handler: ObservationHandler | None = None, train: bool = True, device: str = 'cpu', safety_layer=None, cost_callback: Callable = single_step_cost_callback, pretrained_policy_path: str | None = None)

Bases: BaseController

Base class for reinforcement learning (RL) controllers. Requires a safety layer to ensure constraint satisfaction. For the RL controller, there are two different modes: training and deployment. In training mode, the action is obtained through a callback from the Gym environment. During deployment, the action is computed by propagating the observation through the trained neural network policy. Saving and loading this policy and computing the action in deployment mode depend on the RL algorithm and therefore have to be implemented in the respective subclasses.

Parameters:
  • name (str) – name of the controller

  • obs_handler (ObservationHandler) – entity that takes care of processing observations for RL controllers.

  • train (bool) – whether the controller is in training mode

  • device (str) – whether to use ‘cpu’ or ‘cuda’ (GPU)

  • safety_layer (BaseSafetyLayer) – safety layer instance

  • cost_callback (Callable) – function used within the cost function of the controller to compute additional cost terms

  • pretrained_policy_path (str) – directory with stored policy parameters of an existing policy

Returns:

RLBaseController

Methods

act_array_to_dict

Converts numpy array of actions to dictionary.

add_entity

Add a controllable entity to the controller.

add_system

When adding a system to a controller, the system tree is searched recursively and all controllable entities that do not yet have a controller are added to 'nodes'.

clip_to_bounds

Clips the control inputs to their bounds to avoid numerical errors.

compute_control_input

In training mode, the control input is computed within the training algorithm and passed to this controller through a callback.

detach

Remove controller from all controlled entities

filter_history_for_time_period

Filters all element histories for a given time period

flatten_obs

Converts observation dictionary to a numpy array.

get_cost

Compute control cost for one time step

get_id

Get ID of controller.

get_input_space

Derives action space of the controller from the list of its controlled entities.

get_nodes

Get controlled nodes.

get_top_level_nodes

Retrieve the controlled entities at the highest level in the tree.

initialize

Initial set-up of controller and safety layer

load

predict_action

Actual forward pass of the current policy.

reset_history

Delete history

save

set_mode

Set mode to training (‘train’) or deployment (‘test’).

set_normalize_inputs

update_history

Insert new data into training history.

Attributes

obs_mask

compute_control_input(obs: OrderedDict | None = None, input_callback: Callable | None = None) → Tuple[OrderedDict, float]

In training mode, the control input is computed within the training algorithm and passed to this controller through a callback. It is then verified by the safety layer and adjusted (minimally) in case it violates any constraints. In deployment mode, the control input is computed from the stored neural network policy and verified by the safety layer.

Parameters:
  • obs (OrderedDict) – observation at current time point

  • input_callback (Callable) – only needed in training mode; retrieves the action selected within the training algorithm

Returns:

Tuple

tuple containing
  • action (OrderedDict)

  • penalty for action adjustment performed by safety layer (float).
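
The two code paths described above can be sketched in a few lines. This is a minimal, self-contained illustration and not CommonPower's implementation: `ToySafetyLayer`, `ToyRLController`, the clipping-based correction, the squared-distance penalty, and the action keys `u0`, `u1`, ... are all hypothetical stand-ins chosen for this example.

```python
from collections import OrderedDict
from typing import Callable, List, Optional, Tuple


class ToySafetyLayer:
    """Hypothetical stand-in: corrects an action by clipping it to box bounds."""

    def __init__(self, low: float, high: float):
        self.low, self.high = low, high

    def verify(self, action: List[float]) -> Tuple[List[float], float]:
        safe = [min(max(a, self.low), self.high) for a in action]
        # Assumed penalty: squared distance between proposed and corrected action.
        penalty = sum((s - a) ** 2 for s, a in zip(safe, action))
        return safe, penalty


class ToyRLController:
    """Hypothetical controller mirroring the training/deployment split."""

    def __init__(self, safety_layer: ToySafetyLayer, train: bool = True):
        self.safety_layer = safety_layer
        self.train = train

    def predict_action(self, obs: List[float]) -> List[float]:
        # Placeholder policy (assumption): a real subclass would run a network here.
        return [2.0 * o for o in obs]

    def compute_control_input(
        self,
        obs: Optional[OrderedDict] = None,
        input_callback: Optional[Callable[[], List[float]]] = None,
    ) -> Tuple[OrderedDict, float]:
        if self.train:
            # Training: the action comes from the training algorithm via the callback.
            raw = [float(a) for a in input_callback()]
        else:
            # Deployment: flatten the observation and query the stored policy.
            raw = self.predict_action([float(v) for v in obs.values()])
        # In both modes the safety layer checks and, if needed, adjusts the action.
        safe, penalty = self.safety_layer.verify(raw)
        action = OrderedDict((f"u{i}", a) for i, a in enumerate(safe))
        return action, penalty
```

In this sketch, calling `compute_control_input(input_callback=lambda: [0.5, 1.5])` in training mode with bounds of -1 and 1 leaves the first component untouched and clips the second, so the returned penalty is nonzero only when the safety layer had to intervene.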

initialize()

Initial set-up of controller and safety layer

predict_action(obs: ndarray, deterministic: bool = True) → ndarray

Actual forward pass of the current policy. Needs to be implemented by subclasses.

Parameters:
  • obs (np.ndarray) – observation at the current time step. Must be a numpy array rather than a dictionary, since a dictionary cannot be processed by the neural network.

  • deterministic (bool) – whether to select the action deterministically

Returns:

np.ndarray – control action
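
As a rough illustration of what a subclass might provide, the sketch below flattens an observation dictionary into an array and runs a forward pass with a `deterministic` switch. Everything in it is an assumption made for the example: `ToyLinearPolicy`, `flatten_obs_sketch`, the linear mean, and the Gaussian exploration noise are hypothetical and do not reflect how any particular RL algorithm in CommonPower computes its actions.

```python
from collections import OrderedDict

import numpy as np


def flatten_obs_sketch(obs: OrderedDict) -> np.ndarray:
    """Hypothetical helper: concatenate all dictionary values into one 1-D array."""
    return np.concatenate(
        [np.atleast_1d(np.asarray(v, dtype=float)) for v in obs.values()]
    )


class ToyLinearPolicy:
    """Hypothetical Gaussian policy whose mean is a linear map of the observation."""

    def __init__(self, weights, log_std: float = -1.0, seed: int = 0):
        self.weights = np.asarray(weights, dtype=float)
        self.log_std = log_std
        self.rng = np.random.default_rng(seed)

    def predict_action(self, obs: np.ndarray, deterministic: bool = True) -> np.ndarray:
        mean = self.weights @ obs
        if deterministic:
            # Deterministic selection: return the mean action without sampling.
            return mean
        # Stochastic selection: add Gaussian noise around the mean.
        noise = self.rng.normal(0.0, np.exp(self.log_std), size=mean.shape)
        return mean + noise


obs = OrderedDict(soc=0.5, price=[1.0, 2.0])
flat = flatten_obs_sketch(obs)  # array with three entries
policy = ToyLinearPolicy([[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]])
action = policy.predict_action(flat, deterministic=True)
```

With `deterministic=True` the same observation always yields the same action, which is typically what is wanted in deployment; the stochastic branch is included only to show what the flag could toggle.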

reset_history()

Delete history

Returns:

None

set_mode(mode: str)

Set the controller mode to training (‘train’) or deployment (‘test’).

Parameters:

mode (str) – ‘train’ for training mode, ‘test’ for deployment mode

Returns:

None

set_normalize_inputs(normalize_inputs: bool)

Parameters:

normalize_inputs (bool) – Whether actions sampled from RL policy are normalized. Needed during deployment.

Returns:

update_history(info_dict: dict)

Insert new data into training history.

Parameters:

info_dict (dict) – dictionary of information that should be written into history

Returns:

None