commonpower.control.environments.ControlEnv

class ControlEnv(system: ControllableModelEntity, continuous_control: bool = False, episode_length: int = 24, fixed_start: datetime | None = None, normalize_action_space: bool = True, history: ModelHistory | None = None)

Bases: Env

Interface between the power system and any reinforcement learning algorithm. Based on the OpenAI Gym API (now maintained as ‘gymnasium’, see https://gymnasium.farama.org/). Manages all RL controllers within the power system.

Parameters:
  • system (ControllableModelEntity) – power system including Pyomo model with all constraints

  • continuous_control (bool) – if true, the environment is never reset

  • episode_length (int) – how many environment interaction steps to complete before resetting the environment

  • fixed_start (datetime | None) – if None, training starts from multiple random start times. Otherwise, training always starts from the same start time.

  • normalize_action_space (bool) – whether to normalize the action space to [-1,1]

  • history (ModelHistory) – logger

Returns:

ControlEnv

Methods

close

After the user has finished using the environment, close contains the code necessary to "clean up" the environment.

get_wrapper_attr

Gets the attribute name from the environment.

render

Compute the render frames as specified by render_mode during the initialization of the environment.

reset

Reset the power system to the beginning of an episode (which spans 24 hours).

rl_action_callback

Passes current action selected by training algorithm to the compute_control_input() function of the BaseController class.

set_mode

step

Advance the environment (in our case, the power system) by one step in time by applying control actions to discrete-time dynamics and updating data sources.

Attributes

metadata

np_random

Returns the environment's internal _np_random that if not set will initialise with a random seed.

render_mode

reward_range

spec

unwrapped

Returns the base non-wrapped environment.

action_space

observation_space

_denormalize_action(action: OrderedDict) → OrderedDict

Denormalize action to original input space.

Parameters:

action (OrderedDict) – normalized action

Returns:

OrderedDict – action in the original input space

_get_action_space() → Dict

Retrieve action space from RL controllers

Returns:

gym.spaces.Dict – dictionary of agent IDs and their action spaces

_get_normalized_action_space() → Dict

Normalize all actions to [-1,1]

Returns:

gym.spaces.Dict – dictionary of agent IDs and their action spaces
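The normalization to [-1, 1] and the inverse mapping performed by _denormalize_action can be illustrated with a plain affine (min-max) scaling. This is a minimal sketch under that assumption, not the library's implementation; the helper functions, the element names, and the bounds are hypothetical.

```python
from collections import OrderedDict


def normalize(value: float, low: float, high: float) -> float:
    """Map value from [low, high] to [-1, 1] (assumed affine scaling)."""
    return 2.0 * (value - low) / (high - low) - 1.0


def denormalize(value: float, low: float, high: float) -> float:
    """Map value from [-1, 1] back to [low, high]."""
    return low + (value + 1.0) * (high - low) / 2.0


# Hypothetical bounds of two action elements (not taken from the library).
bounds = OrderedDict(battery_power=(-5.0, 5.0), heat_pump_power=(0.0, 3.0))

# Action as a training algorithm would emit it in the normalized space.
normalized_action = OrderedDict(battery_power=0.5, heat_pump_power=-1.0)

# Map every element back to its original input range.
original_action = OrderedDict(
    (name, denormalize(value, *bounds[name]))
    for name, value in normalized_action.items()
)
```

Working in [-1, 1] keeps all action dimensions on the same scale, which many RL algorithms expect, while the power system still receives values in physical units.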

_get_observation_space() → Dict

Retrieve observation space from list of RL controllers and their observation masks.

Returns:

gym.spaces.Dict – dictionary of agent IDs and their observation spaces

_is_done() → Tuple[bool, bool]

Determines whether the environment has to be reset. “Done” normally means that a goal has been reached, which is never the case in power systems control. It can also mean that a safety violation occurred (which should not happen in our case either, but could be implemented if we want to let a system fail). “Truncated” means that we have reached the end of a pre-defined time limit and therefore want to reset. We currently assume that all agents terminate an episode at the same time, as time management is centralized.

Returns:

tuple(bool, bool) – Done, truncated
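The logic described above could look roughly like the following. This is a sketch under assumptions: the function name, the step counter, and the comparison against episode_length are hypothetical and only mirror the description, not the actual source.

```python
from typing import Tuple


def is_done(step_count: int, episode_length: int, continuous_control: bool) -> Tuple[bool, bool]:
    """Return (done, truncated), shared by all agents."""
    # No goal state and no failure state is modelled, so "done" stays False.
    done = False
    # Truncate once the time limit is reached, unless the environment
    # runs in continuous control mode and is never reset.
    truncated = (not continuous_control) and step_count >= episode_length
    return done, truncated
```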

reset(*, seed: int | None = None, options: dict | None = None) → Tuple[dict, dict]

Reset the power system to the beginning of an episode (which spans 24 hours).

Parameters:
  • seed – The seed that is used to initialize the environment’s PRNG (np_random). If the environment does not already have a PRNG and seed=None (the default option) is passed, a seed will be chosen from some source of entropy (e.g. timestamp or /dev/urandom). However, if the environment already has a PRNG and seed=None is passed, the PRNG will not be reset. If you pass an integer, the PRNG will be reset even if it already exists. Usually, you want to pass an integer right after the environment has been initialized and then never again. This should be taken care of by calling super().reset(seed=seed) in the first line of this function. (https://gymnasium.farama.org/api/env/#gymnasium.Env.reset)

  • options – not needed

Returns:

Tuple

tuple containing
  • observations of all RL agents after reset (dict)

  • additional information for observations (dict)
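The seeding rules described for the seed parameter can be mimicked with a small stand-in. This sketch uses Python's random.Random purely for illustration (gymnasium itself uses a NumPy generator), and the class SeedableSketch is hypothetical.

```python
import random
from typing import Optional


class SeedableSketch:
    """Hypothetical stand-in that reproduces only the seeding rules."""

    def __init__(self) -> None:
        self._rng: Optional[random.Random] = None

    def reset(self, seed: Optional[int] = None) -> None:
        if seed is not None:
            # An integer seed always (re)creates the PRNG.
            self._rng = random.Random(seed)
        elif self._rng is None:
            # No PRNG yet and no seed given: seed from a source of entropy.
            self._rng = random.Random()
        # Otherwise the existing PRNG is kept and simply continues.

    def sample(self) -> float:
        return self._rng.random()


env = SeedableSketch()
env.reset(seed=42)   # seed once, right after construction
first = env.sample()
env.reset()          # seed=None: the PRNG is not reset
second = env.sample()
env.reset(seed=42)   # passing the integer again restarts the sequence
repeated = env.sample()
```

Here repeated equals first, while second is the next number in the same sequence, which is why the seed is usually passed only on the first reset.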

rl_action_callback(ctrl_id: str)

Passes current action selected by training algorithm to the compute_control_input() function of the BaseController class.

Parameters:

ctrl_id (str) – ID of the controller for which to retrieve the action

Returns:

dict – actions for all controlled entities assigned to this controller
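The callback pattern described here can be sketched as follows: the environment stores the latest joint action, and each controller pulls its own part via the callback when it computes its control input. ControllerSketch, EnvSketch, current_action, and the action keys are hypothetical names for illustration; only rl_action_callback and compute_control_input appear in the documentation above, and their real signatures may differ.

```python
from typing import Callable, Dict


class ControllerSketch:
    """Hypothetical controller that asks a callback for its action."""

    def __init__(self, ctrl_id: str, action_callback: Callable[[str], dict]) -> None:
        self.ctrl_id = ctrl_id
        self.action_callback = action_callback

    def compute_control_input(self) -> dict:
        # Pull the action the training algorithm chose for this controller.
        return self.action_callback(self.ctrl_id)


class EnvSketch:
    """Hypothetical environment that stores the latest joint action."""

    def __init__(self) -> None:
        self.current_action: Dict[str, dict] = {}

    def rl_action_callback(self, ctrl_id: str) -> dict:
        # Return the actions for all entities assigned to this controller.
        return self.current_action[ctrl_id]


env = EnvSketch()
controller = ControllerSketch("agent0", env.rl_action_callback)

# The training algorithm selects an action; the environment stores it.
env.current_action = {"agent0": {"battery_power": 0.3}}

# During the system update the controller retrieves it through the callback.
control_input = controller.compute_control_input()
```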

step(action: OrderedDict | None) → Tuple[dict, dict, bool, bool, dict]

Advance the environment (in our case, the power system) by one step in time by applying control actions to discrete-time dynamics and updating data sources. The update itself is handled within the System class. The actions of the RL agents are selected within the RL training algorithm and are passed on to the power system using a callback. After the system update, a reward is computed which indicates how good the action selected by the algorithm was in the current state. This reward is passed to the training algorithm to gradually improve the policies of the RL agents.

Parameters:

action (OrderedDict) – actions of RL agents (here as a dictionary of agent IDs and their respective actions)

Returns:

Tuple

tuple containing:
  • observations of all RL agents (dict), here as a dictionary of agent IDs and their respective observations

  • rewards of all RL agents (dict)

  • whether the episode has terminated (bool). We assume that all agents terminate an episode at the same time, as time management is centralized. Always false for continuous control

  • whether the episode has been truncated (bool). The gymnasium API distinguishes between terminated and truncated, which can be useful for other environments but is not needed in our case

  • additional information (dict)
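A typical interaction loop consuming this five-element return value might look like the sketch below. ToyEnv is a hypothetical stand-in that only reproduces the return shapes described above (dictionaries keyed by agent ID); it does not simulate a power system, and the agent ID, observation values, and reward are made up.

```python
from collections import OrderedDict
from typing import Optional, Tuple


class ToyEnv:
    """Hypothetical stand-in with the same return shapes as ControlEnv."""

    def __init__(self, episode_length: int = 24) -> None:
        self.episode_length = episode_length
        self.t = 0

    def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None) -> Tuple[dict, dict]:
        self.t = 0
        return {"agent0": [0.0]}, {}

    def step(self, action: Optional[OrderedDict]) -> Tuple[dict, dict, bool, bool, dict]:
        self.t += 1
        obs = {"agent0": [float(self.t)]}
        reward = {"agent0": -abs(action["agent0"])}  # toy reward: penalise large actions
        terminated = False  # no goal or failure state in this toy example
        truncated = self.t >= self.episode_length
        return obs, reward, terminated, truncated, {}


env = ToyEnv(episode_length=24)
obs, info = env.reset(seed=0)
total_reward = 0.0
steps = 0
done = False
while not done:
    action = OrderedDict(agent0=0.5)  # a real agent would compute this from obs
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward["agent0"]
    steps += 1
    done = terminated or truncated
```

Checking both flags keeps the loop compatible with the gymnasium convention even though only one of them is expected to become true here.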