commonpower.control.controllers.RLBaseController
- class RLBaseController(name: str, obs_handler: ObservationHandler | None = None, train: bool = True, device: str = 'cpu', safety_layer=None, cost_callback: Callable = single_step_cost_callback, pretrained_policy_path: str | None = None)
Bases: BaseController
Base class for reinforcement learning (RL) controllers. Requires a safety layer to ensure constraint satisfaction. For the RL controller, there are two different modes: training and deployment. In training mode, the action is obtained through a callback from the Gym environment. During deployment, the action is computed by propagating the observation through the trained neural network policy. Saving and loading this policy and computing the action in deployment mode depend on the RL algorithm and therefore have to be implemented in the respective subclasses.
- Parameters:
name (str) – name of the controller
obs_handler (ObservationHandler) – entity that takes care of processing observations for RL controllers.
train (bool) – whether the controller is in training mode
device (str) – whether to use ‘cpu’ or ‘cuda’ (GPU)
safety_layer (BaseSafetyLayer) – safety layer instance
cost_callback (Callable) – function used within the cost function of the controller to compute additional cost terms
pretrained_policy_path (str) – directory with stored policy parameters of an existing policy
- Returns:
RLBaseController
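The subclass contract described above (a forward pass plus saving and loading the policy) can be sketched as follows. This is a self-contained illustration, not the library's code: `SketchRLController`, `LinearPolicyController`, the JSON file format, and the `save(path)`/`load(path)` signatures are all assumptions made for the example, and a plain weight matrix stands in for the neural network policy.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod
from typing import List


class SketchRLController(ABC):
    """Hypothetical stand-in for RLBaseController; NOT the library class."""

    def __init__(self, name: str, train: bool = True):
        self.name = name
        self.train = train

    @abstractmethod
    def predict_action(self, obs: List[float], deterministic: bool = True) -> List[float]:
        """Forward pass of the policy (algorithm-specific)."""

    @abstractmethod
    def save(self, path: str) -> None:
        """Store the policy parameters (signature assumed for this sketch)."""

    @abstractmethod
    def load(self, path: str) -> None:
        """Restore the policy parameters (signature assumed for this sketch)."""


class LinearPolicyController(SketchRLController):
    """Toy subclass: one weight row per action dimension replaces the network."""

    def __init__(self, name: str, weights: List[List[float]], train: bool = True):
        super().__init__(name, train)
        self.weights = weights

    def predict_action(self, obs, deterministic=True):
        # One dot product per action dimension.
        return [sum(w * o for w, o in zip(row, obs)) for row in self.weights]

    def save(self, path):
        with open(path, "w") as f:
            json.dump({"weights": self.weights}, f)

    def load(self, path):
        with open(path) as f:
            self.weights = json.load(f)["weights"]


def roundtrip_demo() -> List[float]:
    """Save a policy, load it into a fresh deployment controller, query it."""
    trained = LinearPolicyController("agent", weights=[[0.5, -1.0], [2.0, 0.0]])
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "policy.json")
        trained.save(path)
        deployed = LinearPolicyController(
            "agent", weights=[[0.0, 0.0], [0.0, 0.0]], train=False
        )
        deployed.load(path)
    return deployed.predict_action([2.0, 1.0])
```

Keeping the three algorithm-specific methods abstract means everything else in the controller can stay independent of the RL algorithm, which appears to be the intent of the base class.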
Methods
act_array_to_dict – Converts numpy array of actions to dictionary.
add_entity – Add a controllable entity to the controller.
add_system – When adding a system to a controller, the system tree is searched recursively and all controllable entities that do not yet have a controller are added to 'nodes'.
clip_to_bounds – Clips the control inputs to their bounds to avoid numerical errors.
compute_control_input – In training mode, the control input is computed within the training algorithm and passed to this controller through a callback.
detach – Remove controller from all controlled entities
filter_history_for_time_period – Filters all element histories for a given time period
flatten_obs – Converts observation dictionary to a numpy array.
get_cost – Compute control cost for one time step
get_id – Get ID of controller.
get_input_space – Derives action space of the controller from the list of its controlled entities.
get_nodes – Get controlled nodes.
get_top_level_nodes – Retrieve the controlled entities at the highest level in the tree.
Initial set-up of controller and safety layer
load
predict_action – Actual forward pass of the current policy.
Delete history
save
set_mode – Set mode to training (True) or deployment.
Insert new data into training history.
Attributes
obs_mask
- compute_control_input(obs: OrderedDict | None = None, input_callback: Callable | None = None) → Tuple[OrderedDict, float]
In training mode, the control input is computed within the training algorithm and passed to this controller through a callback. It is then verified by the safety layer and adjusted (minimally) in case it violates any constraints. In deployment mode, the control input is computed from the stored neural network policy and verified by the safety layer.
- Parameters:
obs (OrderedDict) – observation at current time point
input_callback (Callable) – only needed in training mode - retrieves action selected within training algorithm
- Returns:
Tuple – tuple containing the action (OrderedDict) and the penalty for the action adjustment performed by the safety layer (float).
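The control flow described for compute_control_input can be sketched as below, under stated assumptions. The names `clip_safety_layer` and `compute_control_input_sketch` are hypothetical, and the safety layer here is a simple box-clipping projection with a squared-distance penalty. The library's safety layer may compute its adjustment and penalty differently; the sketch only illustrates the training/deployment dispatch and the (action, penalty) return shape.

```python
from collections import OrderedDict
from typing import Callable, Dict, Optional, Tuple

Bounds = Dict[str, Tuple[float, float]]


def clip_safety_layer(action: OrderedDict, bounds: Bounds) -> Tuple[OrderedDict, float]:
    """Hypothetical safety layer: clip to box bounds, penalise the adjustment."""
    safe = OrderedDict()
    penalty = 0.0
    for key, value in action.items():
        low, high = bounds[key]
        safe[key] = min(max(value, low), high)
        # Squared distance between proposed and verified action (assumed metric).
        penalty += (safe[key] - value) ** 2
    return safe, penalty


def compute_control_input_sketch(
    obs: Optional[OrderedDict],
    train: bool,
    bounds: Bounds,
    policy: Optional[Callable[[OrderedDict], OrderedDict]] = None,
    input_callback: Optional[Callable[[], OrderedDict]] = None,
) -> Tuple[OrderedDict, float]:
    if train:
        # Training: the RL algorithm already chose the action; fetch it.
        if input_callback is None:
            raise ValueError("input_callback is required in training mode")
        proposed = input_callback()
    else:
        # Deployment: query the stored policy with the current observation.
        if policy is None:
            raise ValueError("a policy is required in deployment mode")
        proposed = policy(obs)
    # In both modes the proposed action is verified before it is applied.
    return clip_safety_layer(proposed, bounds)
```

In this sketch an action that already satisfies the bounds passes through unchanged with a penalty of zero, which mirrors the "adjusted (minimally)" behaviour described above.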
- predict_action(obs: ndarray, deterministic: bool = True) → ndarray
Actual forward pass of the current policy. Needs to be implemented by subclasses.
- Parameters:
obs (np.ndarray) – observation at current time step (has to be a numpy array, not a dictionary, since the neural network cannot process a dictionary)
deterministic (bool) – Whether to use a deterministic action selection algorithm
- Returns:
np.ndarray – control action
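What the deterministic flag typically controls can be illustrated with a toy Gaussian policy. This is only a sketch: `ToyGaussianPolicy` is a hypothetical name, plain Python lists replace numpy arrays to keep the example dependency-free, and whether a given subclass actually uses a Gaussian action distribution depends on its RL algorithm.

```python
import random
from typing import List


class ToyGaussianPolicy:
    """Hypothetical policy: linear mean plus optional Gaussian exploration noise."""

    def __init__(self, weights: List[List[float]], std: float = 0.1, seed: int = 0):
        self.weights = weights
        self.std = std
        self.rng = random.Random(seed)

    def mean(self, obs: List[float]) -> List[float]:
        # One dot product per action dimension stands in for the network.
        return [sum(w * o for w, o in zip(row, obs)) for row in self.weights]

    def predict_action(self, obs: List[float], deterministic: bool = True) -> List[float]:
        mu = self.mean(obs)
        if deterministic:
            # Deterministic selection: return the mode of the distribution.
            return mu
        # Stochastic selection: sample around the mean.
        return [self.rng.gauss(m, self.std) for m in mu]
```

Deterministic selection is usually what is wanted in deployment, since repeated calls with the same observation then return the same action.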
- set_mode(mode: str)
Set the controller to training or deployment mode.
- Parameters:
mode (str) – either ‘train’ or ‘test’
- Returns:
None