commonpower.control.environments.ControlEnv

class ControlEnv(system: ControllableModelEntity, continuous_control: bool = False, episode_length: int = 24, fixed_start: datetime | None = None, normalize_action_space: bool = True, history: ModelHistory | None = None)[source]

Bases: Env

Class that provides the interface between our power system and any reinforcement learning algorithm. Based on the OpenAI Gym API (which is now maintained as ‘gymnasium’, see https://gymnasium.farama.org/). Manages all RL controllers within the power system.

Parameters:

system (ControllableModelEntity) – power system including Pyomo model with all constraints
continuous_control (bool) – if true, the environment is never reset
episode_length (int) – how many environment interaction steps to complete before resetting the environment
fixed_start (datetime) – if None, we will train from multiple random start times. Otherwise, we will always train from the same start time.
normalize_action_space (bool) – whether to normalize the action space to [-1,1]
history (ModelHistory) – logger

Returns:

ControlEnv
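The effect of fixed_start can be illustrated with a minimal sketch. It is not the library's implementation: the helper `choose_episode_start`, the arguments `data_start` and `data_end`, and the hourly step size are all assumptions made for illustration.

```python
import random
from datetime import datetime, timedelta


def choose_episode_start(fixed_start, data_start, data_end, episode_length, rng):
    """Pick the start time of the next episode (hypothetical helper, hourly steps assumed)."""
    if fixed_start is not None:
        # Always train from the same start time.
        return fixed_start
    # Latest start that still leaves room for a full episode.
    latest_start = data_end - timedelta(hours=episode_length)
    n_offsets = int((latest_start - data_start) / timedelta(hours=1))
    # Draw a random whole-hour offset into the available data range.
    return data_start + timedelta(hours=rng.randint(0, n_offsets))


rng = random.Random(0)
data_start = datetime(2016, 1, 1)
data_end = datetime(2016, 1, 31)
random_start = choose_episode_start(None, data_start, data_end, 24, rng)
pinned_start = choose_episode_start(datetime(2016, 1, 15), data_start, data_end, 24, rng)
```

With fixed_start=None every episode may begin at a different time, which exposes the agents to more varied data; a fixed start makes runs easier to compare.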

Methods

`close`	After the user has finished using the environment, close contains the code necessary to "clean up" the environment.
`get_wrapper_attr`	Gets the attribute name from the environment.
`render`	Compute the render frames as specified by `render_mode` during the initialization of the environment.
`reset`	Reset the power system to the beginning of an episode (which spans 24 hours).
`rl_action_callback`	Passes current action selected by training algorithm to the compute_control_input() function of the BaseController class.
`set_mode`
`step`	Advance the environment (in our case, the power system) by one step in time by applying control actions to discrete-time dynamics and updating data sources.

Attributes

`metadata`
`np_random`	Returns the environment's internal `_np_random` that if not set will initialise with a random seed.
`render_mode`
`reward_range`
`spec`
`unwrapped`	Returns the base non-wrapped environment.
`action_space`
`observation_space`

_denormalize_action(action: OrderedDict) → OrderedDict[source]

Denormalize action to original input space.

Parameters: action (OrderedDict) – normalized action

Returns: OrderedDict – action in the original input space
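As a rough illustration of what denormalization involves, the sketch below maps values from [-1,1] back to their original ranges with an affine transformation. The function name, the `bounds` mapping, and the use of scalar actions are assumptions; the actual method reads its bounds from the environment's action space and may differ in detail.

```python
from collections import OrderedDict


def denormalize_action(action, bounds):
    """Map each normalized value in [-1, 1] back to its [low, high] range.

    `bounds` is a hypothetical mapping of agent ID -> (low, high); it is not
    part of the documented signature.
    """
    denormalized = OrderedDict()
    for agent_id, value in action.items():
        low, high = bounds[agent_id]
        # Affine map: -1 -> low, 0 -> midpoint, +1 -> high.
        denormalized[agent_id] = low + (value + 1.0) * 0.5 * (high - low)
    return denormalized


normalized = OrderedDict(agent_0=-1.0, agent_1=0.0, agent_2=1.0)
bounds = {"agent_0": (0.0, 10.0), "agent_1": (-5.0, 5.0), "agent_2": (2.0, 4.0)}
original = denormalize_action(normalized, bounds)
```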

_get_action_space() → Dict[source]

Retrieve action space from RL controllers.

Returns: gym.spaces.Dict – dictionary of agent IDs and their action spaces

_get_normalized_action_space() → Dict[source]

Normalize all actions to [-1,1].

Returns: gym.spaces.Dict – dictionary of agent IDs and their action spaces

_get_observation_space() → Dict[source]

Retrieve observation space from list of RL controllers and their observation masks.

Returns: gym.spaces.Dict – dictionary of agent IDs and their observation spaces

_is_done() → Tuple[bool, bool][source]

Determines whether the environment has to be reset. “Done” normally means that a goal has been reached, which is never the case in power systems control. It can also mean that a safety violation occurred (which should not happen in our case either, but could be implemented if we want to let a system fail). “Truncated” means that we have reached the end of a pre-defined time limit and therefore want to reset. We currently assume that all agents terminate an episode at the same time, as we have centralized time management.

Returns: tuple(bool, bool) – done, truncated
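The logic described above can be sketched as follows. This is an assumed reading of the description, not the library's code: `is_done` and `step_count` are illustrative names, and the rule that continuous control suppresses truncation is inferred from the continuous_control parameter.

```python
def is_done(step_count, episode_length, continuous_control=False):
    """Return (terminated, truncated), shared by all agents."""
    # No goal state and no modeled system failure, so never "done".
    terminated = False
    # Truncate once the time limit is reached, unless the environment
    # runs continuously and is never reset.
    truncated = (not continuous_control) and step_count >= episode_length
    return terminated, truncated
```

For example, `is_done(24, 24)` yields a truncation, while the same call with `continuous_control=True` does not.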

reset(*, seed: int | None = None, options: dict | None = None) → Tuple[dict, dict][source]

Reset the power system to the beginning of an episode (which spans 24 hours).

Parameters:

seed – The seed that is used to initialize the environment’s PRNG (np_random). If the environment does not already have a PRNG and seed=None (the default option) is passed, a seed will be chosen from some source of entropy (e.g. timestamp or /dev/urandom). However, if the environment already has a PRNG and seed=None is passed, the PRNG will not be reset. If you pass an integer, the PRNG will be reset even if it already exists. Usually, you want to pass an integer right after the environment has been initialized and then never again. This should be taken care of by calling super().reset(seed=seed) in the first line of this function. (https://gymnasium.farama.org/api/env/#gymnasium.Env.reset)
options – not needed

Returns:

Tuple –

tuple containing

observations of all RL agents after reset (dict)
additional information for observations (dict)

rl_action_callback(ctrl_id: str)[source]

Passes current action selected by training algorithm to the compute_control_input() function of the BaseController class.

Parameters: ctrl_id (str) – ID of the controller for which to retrieve the action
Returns: dict – actions for all controlled entities assigned to this controller

step(action: OrderedDict | None) → Tuple[dict, dict, bool, bool, dict][source]

Advance the environment (in our case, the power system) by one step in time by applying control actions to discrete-time dynamics and updating data sources. Handled within the System class. The actions of the RL agent are selected within the RL training algorithm and are passed on to the power system using a callback. After the system update, a reward is computed which indicates how good the action selected by the algorithm was in the current state. This reward is passed to the training algorithm to gradually improve the policies of the RL agents.

Parameters:

action (OrderedDict) – actions of RL agents (here as a dictionary of agent IDs and their respective actions)

Returns:

Tuple –

tuple containing:

observations of all RL agents (dict), here as a dictionary of agent IDs and their respective observations
rewards of all RL agents (dict)
whether the episode has terminated (bool). We assume that all agents terminate an episode at the same time, as we have centralized time management. Always false for continuous control.
whether the episode was truncated (bool), handled in the same way as the flag above. The gymnasium API distinguishes between terminated and truncated, which can be useful for other environments but is not needed in our case.
additional information (dict)
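To show how the five return values are typically consumed, here is a small interaction loop. `StubEnv` is a hypothetical stand-in that only mimics the documented reset() and step() signatures so the example runs without a power system; its observations and rewards are placeholders, and a real setup would construct a ControlEnv instead.

```python
class StubEnv:
    """Hypothetical stand-in that mimics the documented reset()/step() shapes."""

    def __init__(self, agent_ids, episode_length=24):
        self.agent_ids = list(agent_ids)
        self.episode_length = episode_length
        self.t = 0

    def reset(self, *, seed=None, options=None):
        self.t = 0
        obs = {agent_id: [0.0] for agent_id in self.agent_ids}
        return obs, {}

    def step(self, action):
        self.t += 1
        obs = {agent_id: [float(self.t)] for agent_id in self.agent_ids}
        # Placeholder reward: penalize large actions.
        rewards = {agent_id: -abs(action[agent_id]) for agent_id in self.agent_ids}
        terminated = False  # never reached in this setting
        truncated = self.t >= self.episode_length
        return obs, rewards, terminated, truncated, {}


env = StubEnv(["agent_0", "agent_1"], episode_length=24)
obs, info = env.reset(seed=0)
steps = 0
total_reward = {agent_id: 0.0 for agent_id in obs}
terminated = truncated = False
while not (terminated or truncated):
    # A trained policy would choose the actions here; 0.5 is a dummy value.
    action = {agent_id: 0.5 for agent_id in obs}
    obs, rewards, terminated, truncated, info = env.step(action)
    for agent_id, reward in rewards.items():
        total_reward[agent_id] += reward
    steps += 1
```

Observations and rewards are keyed by agent ID, while the terminated and truncated flags are shared across agents, so a single check ends the episode for everyone.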