commonpower.control.wrappers.MultiAgentWrapper

class MultiAgentWrapper(env)

Bases: Wrapper

Wrapper that adapts ControlEnv to the API expected by the MAPPO/IPPO implementation in the on-policy repository (https://github.com/marlbenchmark/on-policy/tree/main/onpolicy). NOTE: We use our own fork of this repository; see the README file.

Parameters:

env (ControlEnv) – power system environment with multi-agent API

Returns:

MultiAgentWrapper

Methods

act_space_dict_to_list

Transforms an action space in the form of a nested dictionary into a list of Box spaces for each agent.

class_name

Returns the class name of the wrapper.

close

Closes the wrapper and env.

get_wrapper_attr

Gets an attribute from the wrapper and lower environments if name doesn't exist in this object.

obs_space_dict_to_list

Transforms the observation space in the form of a nested dictionary into a list of Box spaces for each agent.

render

Uses the render() of the env that can be overwritten to change the returned data.

reset

Resets the environment.

step

Advance the environment (in our case, the power system) by one step in time by applying control actions to discrete-time dynamics and updating data sources.

wrapper_spec

Generates a WrapperSpec for the wrappers.

Attributes

action_space

Returns the Env action_space unless it is overwritten, in which case the wrapper action_space is used.

metadata

Returns the Env metadata.

np_random

Returns the Env np_random attribute.

observation_space

Returns the Env observation_space unless it is overwritten, in which case the wrapper observation_space is used.

render_mode

Returns the Env render_mode.

reward_range

Returns the Env reward_range unless it is overwritten, in which case the wrapper reward_range is used.

spec

Returns the Env spec attribute with the WrapperSpec if the wrapper inherits from EzPickle.

unwrapped

Returns the base environment of the wrapper.

_unpack_obs(obs: dict) → ndarray

Convert a dictionary of {agent_id: observation_dict} to a dictionary of {agent_id: flattened observation arrays}.

Parameters:

obs (dict) – observation dictionary {agent_id: observation_dict}

Returns:

np.ndarray – flat array of observations
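The flattening described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the wrapper's actual implementation: the helpers `flatten_agent_obs` and `unpack_obs` and all IDs in the example are hypothetical, each agent's observation is assumed to be nested as {node_id: {element_id: values}}, and plain Python lists stand in for NumPy arrays to keep the sketch dependency-free.

```python
from typing import Dict, List


def flatten_agent_obs(agent_obs: dict) -> List[float]:
    """Depth-first flatten of one agent's nested observation dict (hypothetical helper)."""
    flat: List[float] = []
    for value in agent_obs.values():
        if isinstance(value, dict):
            # descend into node / element levels
            flat.extend(flatten_agent_obs(value))
        elif isinstance(value, (list, tuple)):
            flat.extend(float(v) for v in value)
        else:
            flat.append(float(value))
    return flat


def unpack_obs(obs: Dict[str, dict]) -> Dict[str, List[float]]:
    """Map {agent_id: observation_dict} to {agent_id: flat list of values}."""
    return {agent_id: flatten_agent_obs(agent_obs) for agent_id, agent_obs in obs.items()}


# Example input with made-up agent, node, and element IDs.
obs = {
    "agent0": {"node1": {"soc": [0.5], "p": [1.0, 2.0]}},
    "agent1": {"node2": {"p": [3.0]}},
}
flat_obs = unpack_obs(obs)
```

The order of the flattened values follows the insertion order of the dictionaries, so the same nesting always yields the same layout.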

act_space_dict_to_list(action_space: dict) → Tuple[List[Box], dict]

Transforms an action space in the form of a nested dictionary into a list of Box spaces for each agent. Also returns the original keys to allow re-transformation.

Parameters:

action_space (dict) – nested dictionary of {agent_id: {node_id: {element_id: el_action_space}}}

Returns:

Tuple

tuple containing:
  • list of flattened agent action spaces (List[gym.spaces.Box])

  • dictionary with the original action keys from the input action space (dict)
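To illustrate why the original keys are returned, here is a hedged sketch of the flattening and the reverse mapping. It is not the library's code: the `Bounds` dataclass is a hypothetical stand-in for gym.spaces.Box, the helper names are invented, and the layout of the returned key dictionary (node ID, element ID, slice length) is an assumption made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Bounds:
    """Hypothetical stand-in for gym.spaces.Box: per-dimension lower/upper limits."""
    low: List[float]
    high: List[float]


def act_space_to_list(action_space: dict) -> Tuple[List[Bounds], dict]:
    """Flatten {agent: {node: {element: Bounds}}} into one Bounds per agent."""
    spaces: List[Bounds] = []
    keys: Dict[str, List[Tuple[str, str, int]]] = {}
    for agent_id, nodes in action_space.items():
        low: List[float] = []
        high: List[float] = []
        keys[agent_id] = []
        for node_id, elements in nodes.items():
            for element_id, space in elements.items():
                low.extend(space.low)
                high.extend(space.high)
                # remember where each slice came from and how long it is
                keys[agent_id].append((node_id, element_id, len(space.low)))
        spaces.append(Bounds(low, high))
    return spaces, keys


def flat_action_to_dict(flat: List[float], agent_keys: List[Tuple[str, str, int]]) -> dict:
    """Rebuild {node: {element: values}} from one agent's flat action."""
    nested: dict = {}
    start = 0
    for node_id, element_id, size in agent_keys:
        nested.setdefault(node_id, {})[element_id] = flat[start:start + size]
        start += size
    return nested


# Example with made-up IDs and limits.
action_space = {
    "agent0": {"node1": {"battery": Bounds([-1.0], [1.0]),
                         "heat_pump": Bounds([0.0, 0.0], [2.0, 3.0])}},
    "agent1": {"node2": {"battery": Bounds([-0.5], [0.5])}},
}
spaces, keys = act_space_to_list(action_space)
nested_action = flat_action_to_dict([0.2, 1.0, 1.5], keys["agent0"])
```

Storing the slice lengths alongside the keys lets a flat per-agent action vector be split back into the nested form without consulting the original spaces again.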

obs_space_dict_to_list(observation_space: dict) → List[Box]

Transforms the observation space in the form of a nested dictionary into a list of Box spaces for each agent.

Parameters:

observation_space (dict) – nested dictionary of {agent_id: {node_id: {element_id: el_obs_space}}}

Returns:

List[gym.spaces.Box] – list of flattened agent observation spaces

reset(*, seed=None, options=None)

Resets the environment.

Parameters:
  • seed – seed for the random number generator

  • options – not needed here

Returns:

None

step(action: List[ndarray]) → Tuple[List[ndarray], List[float], bool, bool, dict]

Advance the environment (in our case, the power system) by one step in time by applying control actions to discrete-time dynamics and updating data sources. This is handled within the System class. The RL training algorithm selects the actions of the RL agents and passes them on to the power system using a callback. After the system update, a reward is computed that indicates how good the selected action was in the current state. This reward is passed to the training algorithm to gradually improve the policies of the RL agents.

Parameters:

action (List[np.ndarray]) – actions of RL agents (here as a list of numpy arrays)

Returns:

Tuple

tuple containing:
  • observations of all RL agents, here as a list of observations of each agent as numpy arrays (list).

  • rewards of all RL agents (list).

  • whether the episode has terminated (bool). We assume that all agents terminate an episode at the same time, because time is managed centrally. Always False for continuous control.

  • whether the episode was truncated (bool). The Gymnasium API distinguishes between terminated and truncated, which can be useful for other environments but is not needed in our case.

  • additional information (dict)
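A rollout loop consuming this five-element return value might look like the sketch below. `ToyMultiAgentEnv` is a hypothetical stand-in that only mimics the documented return shape; it is not ControlEnv or MultiAgentWrapper. The reward function and the fixed number of steps are assumptions for illustration, and plain lists replace NumPy arrays to keep the sketch dependency-free.

```python
import random
from typing import List, Tuple


class ToyMultiAgentEnv:
    """Hypothetical stand-in that mimics the documented step() return shape."""

    def __init__(self, n_agents: int) -> None:
        self.n_agents = n_agents
        self.t = 0

    def step(self, action: List[List[float]]) -> Tuple[List[List[float]], List[float], bool, bool, dict]:
        self.t += 1
        # one flat observation per agent (here just the current time step)
        obs = [[float(self.t)] for _ in range(self.n_agents)]
        # toy reward (assumption): penalize large actions
        rewards = [-sum(a * a for a in agent_action) for agent_action in action]
        terminated = False  # always False, as for continuous control
        truncated = False   # not used in this setting
        return obs, rewards, terminated, truncated, {"t": self.t}


rng = random.Random(0)
env = ToyMultiAgentEnv(n_agents=2)
returns = [0.0, 0.0]

for _ in range(5):  # the caller decides how many steps to run
    # one action vector per agent, in the same order as the observation list
    actions = [[rng.uniform(-1.0, 1.0)] for _ in range(env.n_agents)]
    obs, rewards, terminated, truncated, info = env.step(actions)
    returns = [g + r for g, r in zip(returns, rewards)]
    if terminated or truncated:
        break
```

Because both flags stay False in this setting, the length of a rollout is determined by the caller rather than by the environment.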