commonpower.control.wrappers.SingleAgentWrapper

class SingleAgentWrapper(env)[source]

Bases: Wrapper

Wrapper that adapts a ControlEnv to the single-agent RL training API, so that it can be used with any RL algorithm from the Stable-Baselines3 library.

Parameters:

env (ControlEnv) – power system environment with multi-agent API

Returns:

SingleAgentWrapper

Methods

class_name

Returns the class name of the wrapper.

close

Closes the wrapper and env.

get_wrapper_attr

Gets an attribute from the wrapper and lower environments if name doesn't exist in this object.

render

Uses the render() of the env that can be overwritten to change the returned data.

reset

Resets the environment.

step

Step function with the single-agent API (takes a NumPy array action and returns a NumPy array observation).

wrapper_spec

Generates a WrapperSpec for the wrappers.

Attributes

action_space

Returns the Env action_space unless it is overwritten, in which case the wrapper action_space is used.

metadata

Returns the Env metadata.

np_random

Returns the Env np_random attribute.

observation_space

Returns the Env observation_space unless it is overwritten, in which case the wrapper observation_space is used.

render_mode

Returns the Env render_mode.

reward_range

Returns the Env reward_range unless it is overwritten, in which case the wrapper reward_range is used.

spec

Returns the Env spec attribute with the WrapperSpec if the wrapper inherits from EzPickle.

unwrapped

Returns the base environment of the wrapper.

_unpack_obs(obs: dict) → ndarray[source]

Converts a dictionary of {agent_id: observation_dict} to a flattened observation array.

Parameters:

obs (dict) – observation dictionary {agent_id: observation_dict}

Returns:

np.ndarray – flat array of observations
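The flattening described above can be illustrated with a minimal sketch. This is not the CommonPower implementation: the function name `unpack_obs_sketch`, the agent and observation keys, and the choice to concatenate in dictionary insertion order are all assumptions made for illustration.

```python
import numpy as np


def unpack_obs_sketch(obs: dict) -> np.ndarray:
    """Flatten {agent_id: {name: values}} into one 1-D float array.

    Hypothetical helper: entries are concatenated in dict insertion order,
    which is an assumption, not documented behaviour of _unpack_obs.
    """
    parts = []
    for agent_id in obs:
        for name in obs[agent_id]:
            # ravel() turns scalars and nested arrays into 1-D pieces
            parts.append(np.asarray(obs[agent_id][name], dtype=np.float64).ravel())
    if not parts:
        return np.zeros(0, dtype=np.float64)
    return np.concatenate(parts)


# Made-up observation dictionary with two agents
example = {
    "agent_a": {"soc": np.array([0.5]), "price": np.array([0.1, 0.2])},
    "agent_b": {"load": np.array([1.0, 2.0])},
}
flat = unpack_obs_sketch(example)
print(flat.shape)
```

A fixed concatenation order matters because an RL policy expects each entry of the observation vector to keep the same meaning from one step to the next.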

reset(*, seed=None, options=None)[source]

Resets the environment.

Parameters:
  • seed – seed for the random number generator

  • options – not used by this wrapper

Returns:

None

step(action: ndarray) → Tuple[ndarray, float, bool, bool, dict][source]

Step function with the single-agent API (takes a NumPy array action and returns a NumPy array observation).

Parameters:

action (np.ndarray) – action selected by the RL policy

Returns:

Tuple

tuple containing:
  • single-agent observation (np.ndarray)

  • single-agent reward (float)

  • whether the environment is terminated (bool)

  • whether the environment is truncated; in this wrapper, the same as terminated (bool)

  • additional information (dict)
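As a rough illustration of how the five-element return value is typically consumed, the sketch below runs one episode against a stand-in environment. `ToyEnv`, `run_episode`, the horizon, and the reward formula are hypothetical and exist only to make the example runnable; only the shape of the `step` return value follows the description above.

```python
from typing import Tuple

import numpy as np


class ToyEnv:
    """Hypothetical stand-in with the same reset/step call shape."""

    def __init__(self, horizon: int = 3) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self, *, seed=None, options=None) -> None:
        self.t = 0

    def step(self, action: np.ndarray) -> Tuple[np.ndarray, float, bool, bool, dict]:
        self.t += 1
        obs = np.array([float(self.t), float(action.sum())])
        reward = -float(np.abs(action).sum())  # made-up cost on action magnitude
        terminated = self.t >= self.horizon
        truncated = terminated  # mirrors "the same as terminated"
        return obs, reward, terminated, truncated, {}


def run_episode(env, action: np.ndarray) -> Tuple[int, float]:
    """Apply a constant action until the episode ends; return (steps, total reward)."""
    env.reset(seed=0)
    steps, total = 0, 0.0
    done = False
    while not done:
        obs, reward, terminated, truncated, info = env.step(action)
        steps += 1
        total += reward
        done = terminated or truncated
    return steps, total


steps, total = run_episode(ToyEnv(horizon=3), np.array([0.5, -0.5]))
print(steps, total)
```

Checking `terminated or truncated` keeps the loop correct even for environments where the two flags differ.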