commonpower.control.wrappers.MultiAgentWrapper
- class MultiAgentWrapper(env)[source]
Bases: Wrapper
Wrapper to standardize ControlEnv to the API for the MAPPO/IPPO implementation of the on-policy repository (https://github.com/marlbenchmark/on-policy/tree/main/onpolicy). NOTE: We use our own fork of this repository; see the Readme file.
- Parameters:
env (ControlEnv) – power system environment with multi-agent API
- Returns:
MultiAgentWrapper
Methods
act_space_dict_to_list – Transforms an action space in the form of a nested dictionary into a list of Box spaces for each agent.
class_name – Returns the class name of the wrapper.
close – Closes the wrapper and env.
get_wrapper_attr – Gets an attribute from the wrapper and lower environments if name doesn't exist in this object.
obs_space_dict_to_list – Transforms the observation space in the form of a nested dictionary into a list of Box spaces for each agent.
render – Uses the render() of the env that can be overwritten to change the returned data.
reset – Reset the environment.
step – Advance the environment (in our case, the power system) by one step in time by applying control actions to discrete-time dynamics and updating data sources.
wrapper_spec – Generates a WrapperSpec for the wrappers.
Attributes
action_space – Return the Env action_space unless overwritten, then the wrapper action_space is used.
metadata – Returns the Env metadata.
np_random – Returns the Env np_random attribute.
observation_space – Return the Env observation_space unless overwritten, then the wrapper observation_space is used.
render_mode – Returns the Env render_mode.
reward_range – Return the Env reward_range unless overwritten, then the wrapper reward_range is used.
spec – Returns the Env spec attribute with the WrapperSpec if the wrapper inherits from EzPickle.
unwrapped – Returns the base environment of the wrapper.
- _unpack_obs(obs: dict) ndarray[source]
Convert a dictionary of {agent_id: observation_dict} to a dictionary of {agent_id: flattened observation arrays}.
- Parameters:
obs (dict) – observation dictionary {agent_id: observation_dict}
- Returns:
np.ndarray – flat array of observations
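The conversion described above can be pictured with a minimal sketch. It is not the wrapper's actual implementation: `flatten_agent_obs` is a hypothetical helper, the node and element ids are made up, the nesting below the agent level is assumed from the space layout documented on this page, and plain Python lists stand in for numpy arrays to keep the example dependency-free.

```python
def flatten_agent_obs(obs):
    """Flatten {agent_id: {node_id: {element_id: values}}} per agent.

    Hypothetical helper for illustration only; it does not reproduce
    the code of MultiAgentWrapper._unpack_obs.
    """
    flat = {}
    for agent_id, node_dict in obs.items():
        values = []
        # dicts preserve insertion order, so the flat layout is deterministic
        for node_id in node_dict:
            for element_id in node_dict[node_id]:
                values.extend(node_dict[node_id][element_id])
        flat[agent_id] = values
    return flat


# Made-up observation with two agents (all ids are assumptions)
example_obs = {
    "agent_0": {"node_a": {"soc": [0.5], "p": [1.0, 2.0]}},
    "agent_1": {"node_b": {"p": [3.0]}},
}
flat_obs = flatten_agent_obs(example_obs)
```

Each agent ends up with one flat vector whose element order follows the order of the nested keys.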
- act_space_dict_to_list(action_space: dict) Tuple[List[Box], dict][source]
Transforms an action space in the form of a nested dictionary into a list of Box spaces for each agent. Also returns the original keys to allow re-transformation.
- Parameters:
action_space (dict) – nested dictionary of {agent_id: {node_id: {element_id: el_action_space}}}
- Returns:
Tuple –
- tuple containing:
list of flattened agent action spaces (List[gym.spaces.Box])
dictionary with the original action keys from the action space received as an input (dict)
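To illustrate why the original keys are returned alongside the flattened spaces, here is a rough sketch under simplifying assumptions. Each element space is modeled as a `(low, high)` pair of lists instead of a `gym.spaces.Box`, and `flatten_action_space` and `unflatten_actions` are hypothetical names, not functions of commonpower.

```python
def flatten_action_space(action_space):
    """Return ([(low, high) per agent], keys) from a nested bounds dict."""
    flat_spaces = []
    keys = {}
    for agent_id, node_dict in action_space.items():
        low, high, agent_keys = [], [], []
        for node_id, element_dict in node_dict.items():
            for element_id, (el_low, el_high) in element_dict.items():
                low.extend(el_low)
                high.extend(el_high)
                # remember which slice of the flat vector belongs to which element
                agent_keys.append((node_id, element_id, len(el_low)))
        flat_spaces.append((low, high))
        keys[agent_id] = agent_keys
    return flat_spaces, keys


def unflatten_actions(flat_actions, keys):
    """Rebuild the nested action dict from per-agent flat action lists."""
    nested = {}
    for (agent_id, agent_keys), flat in zip(keys.items(), flat_actions):
        nested[agent_id] = {}
        start = 0
        for node_id, element_id, size in agent_keys:
            nested[agent_id].setdefault(node_id, {})[element_id] = flat[start:start + size]
            start += size
    return nested


# Made-up action space with two agents (all ids and bounds are assumptions)
space = {
    "agent_0": {"n1": {"p": ([-1.0], [1.0]), "q": ([0.0, 0.0], [2.0, 2.0])}},
    "agent_1": {"n2": {"p": ([-3.0], [3.0])}},
}
flat_spaces, keys = flatten_action_space(space)
nested = unflatten_actions([[0.5, 1.0, 1.5], [-2.0]], keys)
```

The stored keys are what make the round trip possible: a training algorithm can work with flat per-agent vectors, while the environment still receives actions in the nested form.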
- obs_space_dict_to_list(observation_space: dict) List[Box][source]
Transforms the observation space in the form of a nested dictionary into a list of Box spaces for each agent.
- Parameters:
observation_space (dict) – nested dictionary of {agent_id: {node_id: {element_id: el_obs_space}}}
- Returns:
List[gym.spaces.Box] – list of flattened agent observation spaces
- reset(*, seed=None, options=None)[source]
Reset the environment
- Parameters:
seed – seed for the random number generator
options – not needed here
- Returns:
None
- step(action: List[ndarray]) Tuple[List[ndarray], List[float], bool, bool, dict][source]
Advance the environment (in our case, the power system) by one step in time by applying control actions to discrete-time dynamics and updating data sources. This update is handled within the System class. The actions of the RL agents are selected within the RL training algorithm and are passed on to the power system using a callback. After the system update, a reward is computed which indicates how good the action selected by the algorithm was in the current state. This reward is passed to the training algorithm to gradually improve the policies of the RL agents.
- Parameters:
action (List[np.ndarray]) – actions of RL agents (here as a list of numpy arrays)
- Returns:
Tuple –
- tuple containing:
observations of all RL agents, here as a list of observations of each agent as numpy arrays (list).
rewards of all RL agents (list).
whether the episode has terminated (bool). We assume that all agents terminate an episode at the same time, as we have a centralized time management. Always false for continuous control.
whether the episode was truncated (bool). The gymnasium API distinguishes between terminated and truncated, which can be useful for other environments but is not needed in our case.
additional information (dict)
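The shape of the returned tuple can be seen in a short rollout loop. The sketch below uses `StubMultiAgentEnv`, a hypothetical stand-in that only mimics the documented `step()` signature; its observations and rewards are made up, and plain lists are used where the real wrapper works with numpy arrays.

```python
class StubMultiAgentEnv:
    """Hypothetical stand-in mimicking the documented step() return tuple."""

    def __init__(self, n_agents):
        self.n_agents = n_agents
        self.t = 0

    def step(self, action):
        self.t += 1
        obs = [[float(self.t)] for _ in range(self.n_agents)]  # one observation per agent
        rewards = [-abs(a[0]) for a in action]                 # one (made-up) reward per agent
        terminated = False                                     # always False for continuous control
        truncated = False                                      # not used in this setting
        return obs, rewards, terminated, truncated, {}


env = StubMultiAgentEnv(n_agents=2)
actions = [[0.5], [-1.0]]  # list with one action vector per agent
returns = [0.0, 0.0]
for _ in range(3):
    obs, rewards, terminated, truncated, info = env.step(actions)
    # accumulate the per-agent rewards into per-agent returns
    returns = [total + r for total, r in zip(returns, rewards)]
    if terminated or truncated:
        break
```

Observations and rewards are both lists with one entry per agent, whereas the two flags are single booleans shared by all agents.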