commonpower.control.environments.ControlEnv
- class ControlEnv(system: ControllableModelEntity, continuous_control: bool = False, episode_length: int = 24, fixed_start: datetime | None = None, normalize_action_space: bool = True, history: ModelHistory | None = None)
Bases: Env
Class that provides the interface between our power system and any reinforcement learning algorithm. It is based on the OpenAI Gym API (now maintained as ‘gymnasium’, see https://gymnasium.farama.org/) and manages all RL controllers within the power system.
- Parameters:
system (ControllableModelEntity) – power system including Pyomo model with all constraints
continuous_control (bool) – if true, the environment is never reset
episode_length (int) – how many environment interaction steps to complete before resetting the environment
fixed_start (datetime) – if None, we will train from multiple random start times. Otherwise, we will always train from the same start time.
normalize_action_space (bool) – whether to normalize the action space to [-1,1]
history (ModelHistory) – logger
- Returns:
ControlEnv
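The fixed_start parameter decides between always starting from one point in time and drawing a new start time per episode. The sketch below illustrates that choice. The helper pick_start_time, the hourly time step, and the rule that a random start must leave room for a full episode are assumptions made for illustration, not ControlEnv's actual implementation.

```python
import random
from datetime import datetime, timedelta


def pick_start_time(fixed_start, first_time, last_time, episode_length, rng,
                    step=timedelta(hours=1)):
    """Hypothetical helper: choose the start time of the next episode.

    Assumes data is available from first_time to last_time at a fixed
    resolution `step` (hourly by default, which is an assumption).
    """
    if fixed_start is not None:
        # Always train from the same start time.
        return fixed_start
    # Latest start that still leaves room for a full episode.
    latest = last_time - episode_length * step
    if latest < first_time:
        raise ValueError("data range is shorter than one episode")
    n_candidates = int((latest - first_time) / step) + 1
    # Draw one of the admissible start times uniformly at random.
    return first_time + rng.randrange(n_candidates) * step
```

Returning fixed_start unchanged keeps every episode identical, which is useful for debugging, while the random branch exposes the agent to many different days of data.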
Methods
close – After the user has finished using the environment, close contains the code necessary to "clean up" the environment.
get_wrapper_attr – Gets the attribute name from the environment.
render – Compute the render frames as specified by render_mode during the initialization of the environment.
reset – Reset the power system to the beginning of an episode (which spans 24 hours).
rl_action_callback – Passes current action selected by training algorithm to the compute_control_input() function of the BaseController class.
set_mode
step – Advance the environment (in our case, the power system) by one step in time by applying control actions to discrete-time dynamics and updating data sources.
Attributes
metadata
np_random – Returns the environment's internal _np_random that if not set will initialise with a random seed.
render_mode
reward_range
spec
unwrapped – Returns the base non-wrapped environment.
action_space
observation_space
- _denormalize_action(action: OrderedDict) → OrderedDict
Denormalize action to original input space.
- Parameters:
action (OrderedDict) – normalized action
- Returns:
OrderedDict – denormalized action in the original input space
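With normalize_action_space=True the agent acts in [-1, 1], so each value has to be mapped back to its physical bounds before it reaches the power system. A minimal sketch of such a mapping, assuming a plain affine rescaling and a simplified layout (agent ID → list of values, bounds given as (lows, highs) lists). The function name and data layout are hypothetical; the actual _denormalize_action may structure its OrderedDict differently.

```python
from collections import OrderedDict


def denormalize_action(action, bounds):
    """Hypothetical: map each normalized value in [-1, 1] back to [low, high].

    action: OrderedDict of agent ID -> list of normalized values
    bounds: dict of agent ID -> (lows, highs) in the original units
    """
    out = OrderedDict()
    for agent_id, values in action.items():
        lows, highs = bounds[agent_id]
        # -1 maps to low, +1 maps to high, 0 maps to the midpoint.
        out[agent_id] = [
            low + (v + 1.0) * (high - low) / 2.0
            for v, low, high in zip(values, lows, highs)
        ]
    return out
```

For a battery bounded by [-5, 5] kW, a normalized action of 0.0 becomes 0.0 kW and 1.0 becomes 5.0 kW.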
- _get_action_space() → Dict
Retrieve action space from RL controllers
- Returns:
gym.spaces.Dict – dictionary of agent IDs and their action spaces
- _get_normalized_action_space() → Dict
Normalize all actions to [-1,1]
- Returns:
gym.spaces.Dict – dictionary of agent IDs and their action spaces
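_get_action_space() collects the per-agent spaces from the RL controllers, and _get_normalized_action_space() replaces each one with the interval [-1, 1]. The sketch below mimics both steps with plain (lows, highs) tuples instead of gym.spaces.Box objects. The data layout and function names are hypothetical, and rejecting zero-width intervals is an added assumption (such an interval cannot be rescaled).

```python
def normalize_action_space(raw_space):
    """Hypothetical: map every agent's box [low, high] to [-1, 1].

    raw_space: dict of agent ID -> (lows, highs), as gathered from the controllers.
    Returns the normalized space and a copy of the original bounds, which is
    needed later to denormalize actions.
    """
    normalized, original = {}, {}
    for agent_id, (lows, highs) in raw_space.items():
        for low, high in zip(lows, highs):
            if not high > low:
                # Assumption: a zero-width interval is rejected.
                raise ValueError(f"degenerate bounds for {agent_id}: [{low}, {high}]")
        normalized[agent_id] = ([-1.0] * len(lows), [1.0] * len(highs))
        original[agent_id] = (list(lows), list(highs))
    return normalized, original


def normalize_value(x, low, high):
    """Affine map from [low, high] to [-1, 1]."""
    return 2.0 * (x - low) / (high - low) - 1.0
```

Keeping the original bounds next to the normalized space is what makes the inverse mapping possible inside step().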
- _get_observation_space() → Dict
Retrieve observation space from list of RL controllers and their observation masks.
- Returns:
gym.spaces.Dict – dictionary of agent IDs and their observation spaces
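Each RL controller only observes the quantities its observation mask lets through. As a hypothetical illustration, the sketch below represents every controller's candidate observations as a dict of name → (low, high) and its mask as a dict of name → bool. This layout, and the choice to hide features that are missing from the mask, are assumptions and not the commonpower data model.

```python
def build_observation_space(controllers):
    """Hypothetical: keep only the observations each controller's mask allows.

    controllers: dict of controller ID -> {"features": {name: (low, high)},
                                           "mask": {name: bool}}
    Features absent from the mask are treated as hidden (assumption).
    """
    space = {}
    for ctrl_id, ctrl in controllers.items():
        mask = ctrl["mask"]
        space[ctrl_id] = {
            name: bounds
            for name, bounds in ctrl["features"].items()
            if mask.get(name, False)
        }
    return space
```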
- _is_done() → Tuple[bool, bool]
Determines whether the environment has to be reset. “Done” normally means that a goal has been reached, which is never the case in power systems control. It can also mean that a safety violation occurred (which also should not happen in our case, but could be implemented if we want to let a system fail). “Truncated” means that we have reached the end of a pre-defined time limit and therefore want to reset. We currently assume that all agents terminate an episode at the same time, as we have centralized time management.
- Returns:
tuple(bool, bool) – Done, truncated
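The termination rule described above fits in a few lines. In this sketch the step counter is passed in explicitly, and the treatment of continuous_control (never truncating, since the environment is never reset) is an assumption about how ControlEnv tracks time rather than its actual code.

```python
from typing import Tuple


def is_done(step_count: int, episode_length: int, continuous_control: bool) -> Tuple[bool, bool]:
    """Hypothetical re-statement of the (done, truncated) rule."""
    # No goal state exists and system failure is not modelled, so never "done".
    done = False
    if continuous_control:
        # Assumption: a never-reset environment is never truncated either.
        return done, False
    # Truncate once the pre-defined time limit (episode length) is reached.
    truncated = step_count >= episode_length
    return done, truncated
```

Because all agents share one clock, a single pair of flags is returned for the whole environment rather than one pair per agent.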
- reset(*, seed: int | None = None, options: dict | None = None) → Tuple[dict, dict]
Reset the power system to the beginning of an episode (which spans 24 hours).
- Parameters:
seed – The seed that is used to initialize the environment’s PRNG (np_random). If the environment does not already have a PRNG and seed=None (the default option) is passed, a seed will be chosen from some source of entropy (e.g. timestamp or /dev/urandom). However, if the environment already has a PRNG and seed=None is passed, the PRNG will not be reset. If you pass an integer, the PRNG will be reset even if it already exists. Usually, you want to pass an integer right after the environment has been initialized and then never again. This should be taken care of by calling super().reset(seed=seed) in the first line of this function. (https://gymnasium.farama.org/api/env/#gymnasium.Env.reset)
options – not used by this environment
- Returns:
Tuple –
- tuple containing
observations of all RL agents after reset (dict)
additional information for observations (dict)
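The seeding rules quoted above come from gymnasium, whose Env.reset applies them when called as super().reset(seed=seed). To make the three cases concrete without depending on gymnasium, the sketch below re-implements them with Python's random.Random. The class name and the dummy observation are hypothetical.

```python
import random


class SeededEnvSketch:
    """Hypothetical stand-in that applies the reset() seeding rules."""

    def __init__(self):
        self._rng = None  # no PRNG until the first reset

    def reset(self, *, seed=None, options=None):
        if seed is not None:
            # Integer seed: (re)create the PRNG even if one already exists.
            self._rng = random.Random(seed)
        elif self._rng is None:
            # No PRNG yet and no seed: seed from an entropy source.
            self._rng = random.Random()
        # seed=None with an existing PRNG: keep using it unchanged.
        observations = {"agent_1": self._rng.random()}  # dummy observation
        info = {"agent_1": {}}
        return observations, info
```

Passing the same integer twice reproduces the same first observation, while a later reset() without a seed continues the existing random stream.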
- rl_action_callback(ctrl_id: str)
Passes the current action selected by the training algorithm to the compute_control_input() function of the BaseController class.
- Parameters:
ctrl_id (str) – ID of the controller for which to retrieve the action
- Returns:
dict – actions for all controlled entities assigned to this controller
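The callback decouples the training loop from the controllers: step() receives the actions of all agents, and each controller later pulls its own share by ID. The sketch below shows this pull pattern with two hypothetical classes. The dict layout and the argument-free compute_control_input() are simplifications and do not reproduce commonpower's BaseController.

```python
class ActionStore:
    """Hypothetical: holds the latest action handed to step()."""

    def __init__(self):
        self._current_action = {}

    def set_action(self, action):
        # Called at the start of step() with the actions of all agents.
        self._current_action = dict(action)

    def rl_action_callback(self, ctrl_id):
        # Return only the actions of the entities assigned to this controller.
        return self._current_action[ctrl_id]


class CallbackController:
    """Hypothetical controller that pulls its action through a callback."""

    def __init__(self, ctrl_id, action_callback):
        self.ctrl_id = ctrl_id
        self.action_callback = action_callback

    def compute_control_input(self):
        return self.action_callback(self.ctrl_id)
```

Registering the bound method store.rl_action_callback with each controller means the controllers never need a reference to the training algorithm itself.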
- step(action: OrderedDict | None) → Tuple[dict, dict, bool, bool, dict]
Advance the environment (in our case, the power system) by one step in time by applying control actions to discrete-time dynamics and updating data sources. This is handled within the System class. The actions of the RL agents are selected within the RL training algorithm and are passed on to the power system through the rl_action_callback() callback. After the system update, a reward is computed which indicates how good the action selected by the algorithm was in the current state. This reward is passed to the training algorithm to gradually improve the policies of the RL agents.
- Parameters:
action (OrderedDict) – actions of RL agents (here as a dictionary of agent IDs and their respective actions)
- Returns:
Tuple –
- tuple containing:
observations of all RL agents (dict), here as a dictionary of agent IDs and their respective observations
rewards of all RL agents (dict)
whether the episode has terminated (bool). We assume that all agents terminate an episode at the same time, as we have centralized time management. Always false for continuous control.
whether the episode has been truncated (bool), handled the same way as above; the gymnasium API distinguishes between terminated and truncated, which can be useful for other environments but is not needed in our case
additional information (dict)
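From the training algorithm's side, step() follows gymnasium's five-tuple convention. The toy environment below is hypothetical (one battery, clipped toy dynamics and a made-up reward) and receives actions directly instead of through rl_action_callback(). It only illustrates the shapes of the return values and how a rollout loop uses the terminated and truncated flags.

```python
import random


class ToyControlEnv:
    """Hypothetical single-battery stand-in with the same reset/step tuple shapes."""

    def __init__(self, episode_length=24):
        self.episode_length = episode_length
        self.t = 0
        self.soc = 0.5  # state of charge in [0, 1]

    def reset(self, *, seed=None, options=None):
        self.t = 0
        self.soc = 0.5
        return {"agent_1": [self.soc]}, {}

    def step(self, action):
        # Clip to the normalized action range [-1, 1].
        power = max(-1.0, min(1.0, action["agent_1"][0]))
        # Toy discrete-time dynamics: charge or discharge by at most 10 % per step.
        self.soc = max(0.0, min(1.0, self.soc + 0.1 * power))
        self.t += 1
        observations = {"agent_1": [self.soc]}
        rewards = {"agent_1": -abs(self.soc - 0.8)}  # toy reward: stay near 80 % charge
        terminated = False  # no goal state
        truncated = self.t >= self.episode_length
        return observations, rewards, terminated, truncated, {}


env = ToyControlEnv(episode_length=24)
rng = random.Random(0)
obs, info = env.reset(seed=0)
steps = 0
while True:
    action = {"agent_1": [rng.uniform(-1.0, 1.0)]}  # random policy as a placeholder
    obs, rewards, terminated, truncated, info = env.step(action)
    steps += 1
    if terminated or truncated:
        break
```

A training library would replace the random policy with its own, but the loop structure (reset once, then step until either flag is set) stays the same.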