
Our current public API. Advanced users can check the PufferLib source for additional utilities, but note that we tend to move these around more often. Contributions welcome!
Emulation#
Wrap your environments for broad compatibility. Supports passing creator functions, classes, or env objects. The API of the returned PufferEnv is the same as Gym/PettingZoo. A usage sketch follows the class listings below.
- class pufferlib.emulation.GymnasiumPufferEnv(env=None, env_creator=None, env_args=[], env_kwargs={})
- property render_mode
- seed(seed)
- reset(seed=None)
- step(action)
Execute an action and return (observation, reward, done, info)
- render()
- close()
- class pufferlib.emulation.PettingZooPufferEnv(env=None, env_creator=None, env_args=[], env_kwargs={}, to_puffer=False)
- property render_mode
- property agents
- property possible_agents
- property done
- observation_space(agent)
Returns the observation space for a single agent
- action_space(agent)
Returns the action space for a single agent
- reset(seed=None)
- step(actions)
Step the environment and return (observations, rewards, dones, infos)
- render()
- close()
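A minimal usage sketch for GymnasiumPufferEnv. The gymnasium import and the CartPole-v1 id are illustrative assumptions (any Gymnasium-compatible environment works the same way), and the step() unpacking follows the docstring above:

```python
import gymnasium  # assumed source of an example environment
import pufferlib.emulation

# Wrap a creator function plus kwargs; you can also pass an env object via env=
env = pufferlib.emulation.GymnasiumPufferEnv(
    env_creator=gymnasium.make,
    env_kwargs={'id': 'CartPole-v1'},
)

obs, info = env.reset(seed=0)  # Gymnasium-style (obs, info) return assumed
obs, reward, done, info = env.step(env.action_space.sample())  # per the step() docstring above
env.close()
```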
Environments#
All included environments expose make_env and env_creator functions. make_env is the one you want most of the time; env_creator exposes e.g. class interfaces for environments that support them, so you can pass around static references.
Additionally, all environments expose a Policy class with a baseline model. Note that not all environments have custom policies; the default simply flattens observations before applying a linear layer. Atari, Procgen, Neural MMO, NetHack/MiniHack, and Pokemon Red currently have reasonable policies.
The PufferLib Squared environment is used as an example below. Everything is exposed through __init__, so you can call these methods through e.g. pufferlib.environments.ocean.make_env. A usage sketch follows the listing below.
- pufferlib.environments.ocean.torch.Policy
alias of Default
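For instance, a hedged sketch of creating the Squared environment. The 'squared' name argument is an assumption based on the environment referenced above; check the module for exact names and options:

```python
import pufferlib.environments.ocean as ocean

env = ocean.make_env('squared')         # PufferLib-wrapped env; name argument assumed
creator = ocean.env_creator('squared')  # static creator you can pass around; name argument assumed
env2 = creator()
```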
Models#
PufferLib model default policies. They are vanilla PyTorch policies with no custom PufferLib API. Optionally, you can split the forward pass into encode and decode functions. This allows you to use our convenience wrapper for LSTM support.
- class pufferlib.models.Default(*args: Any, **kwargs: Any)
Default PyTorch policy. Flattens obs and applies a linear layer.
PufferLib is not a framework. It does not enforce a base class. You can use any PyTorch policy that returns actions and values. We structure our forward methods as encode_observations and decode_actions to make it easier to wrap policies with LSTMs. You can do that and use our LSTM wrapper or implement your own. To port an existing policy for use with our LSTM wrapper, simply put everything from forward() before the recurrent cell into encode_observations and put everything after into decode_actions. A sketch of this split follows the method listing below.
- forward(observations)
- encode_observations(observations)
Encodes a batch of observations into hidden states. Assumes no time dimension (handled by LSTM wrappers).
- decode_actions(hidden, lookup, concat=True)
Decodes a batch of hidden states into (multi)discrete actions. Assumes no time dimension (handled by LSTM wrappers).
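A sketch of a policy written against that split. The layer sizes, the single discrete action head, and the single_observation_space/single_action_space attributes are illustrative assumptions:

```python
import numpy as np
import torch
import torch.nn as nn

class MyPolicy(nn.Module):
    def __init__(self, env, hidden_size=128):
        super().__init__()
        obs_size = int(np.prod(env.single_observation_space.shape))        # attribute name assumed
        self.encoder = nn.Linear(obs_size, hidden_size)
        self.actor = nn.Linear(hidden_size, env.single_action_space.n)     # single discrete head assumed
        self.value_head = nn.Linear(hidden_size, 1)

    def forward(self, observations):
        hidden, lookup = self.encode_observations(observations)
        return self.decode_actions(hidden, lookup)

    def encode_observations(self, observations):
        # Everything that would run before the recurrent cell
        batch = observations.shape[0]
        hidden = torch.relu(self.encoder(observations.reshape(batch, -1).float()))
        return hidden, None

    def decode_actions(self, hidden, lookup):
        # Everything that would run after the recurrent cell
        return self.actor(hidden), self.value_head(hidden)
```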
- class pufferlib.models.LSTMWrapper(*args: Any, **kwargs: Any)
Wraps your policy with an LSTM without letting you shoot yourself in the foot with bad transpose and shape operations. This saves a lot of pain. Requires that your policy define encode_observations and decode_actions. See the Default policy for an example and the usage sketch below.
- forward(x, state)
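Building on the MyPolicy sketch above, wrapping it might look like this; the constructor arguments are assumptions, so check the class signature:

```python
import pufferlib.models

# Any policy that defines encode_observations/decode_actions can be wrapped
policy = pufferlib.models.LSTMWrapper(env, MyPolicy(env), input_size=128, hidden_size=128)
```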
- class pufferlib.models.Convolutional(*args: Any, **kwargs: Any)
The CleanRL default NatureCNN policy used for Atari. It's just a stack of three convolutions followed by a linear layer.
Takes framestack as a mandatory keyword argument. The suggested default is 1 frame with an LSTM or 4 frames without; see the sketch after this listing.
- forward(observations)
- encode_observations(observations)
- decode_actions(flat_hidden, lookup, concat=None)
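A hedged instantiation sketch; additional constructor keywords beyond framestack may be required depending on your observation shape:

```python
import pufferlib.models

# Without an LSTM, 4 stacked frames are suggested
policy = pufferlib.models.Convolutional(env, framestack=4)

# With the LSTM wrapper, 1 frame is suggested
# policy = pufferlib.models.LSTMWrapper(env, pufferlib.models.Convolutional(env, framestack=1))
```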
- class pufferlib.models.ProcgenResnet(*args: Any, **kwargs: Any)
Procgen baseline from the AICrowd NeurIPS 2020 competition. Based on the ResNet architecture used in the IMPALA paper.
- forward(observations)
- encode_observations(x)
- decode_actions(hidden, lookup)
Linear decoder function
- class pufferlib.models.ResidualBlock(*args: Any, **kwargs: Any)
- forward(x)
- class pufferlib.models.ConvSequence(*args: Any, **kwargs: Any)
- forward(x)
- get_output_shape()
Vectorization#
Distributed backends for PufferLib-wrapped environments
- pufferlib.vector.make(env_creator_or_creators, env_args=None, env_kwargs=None, backend=<class 'pufferlib.vector.Serial'>, num_envs=1, **kwargs)
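A minimal synchronous sketch. The ocean.env_creator call and the 'squared' name are assumptions carried over from the Environments section, and the (observations, infos) return from reset is also an assumption:

```python
import pufferlib.vector
from pufferlib.environments import ocean  # assumed creator source

vecenv = pufferlib.vector.make(
    ocean.env_creator('squared'),   # name argument assumed
    backend=pufferlib.vector.Serial,
    num_envs=4,
)
obs, infos = vecenv.reset(seed=42)  # (observations, infos) return assumed
vecenv.close()
```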
- class pufferlib.vector.Serial(env_creators, env_args, env_kwargs, num_envs, **kwargs)
- reset(seed=42)
- step(actions)
- property num_envs
- async_reset(seed=42)
- send(actions)
- recv()
- close()
- class pufferlib.vector.Multiprocessing(env_creators, env_args, env_kwargs, num_envs, num_workers=None, batch_size=None, zero_copy=True, **kwargs)
Runs environments in parallel using multiprocessing
Use this vectorization module for most applications; an async send/recv sketch follows the listing below.
- reset(seed=42)
- step(actions)
- property num_envs
- recv()
- send(actions)
- async_reset(seed=42)
- close()
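An asynchronous send/recv sketch with the Multiprocessing backend. The seven-element recv() return, the single_action_space attribute, and the creator arguments are assumptions; treat this as illustrative rather than exact:

```python
import numpy as np
import pufferlib.vector
from pufferlib.environments import ocean  # assumed creator source

vecenv = pufferlib.vector.make(
    ocean.env_creator('squared'),   # name argument assumed
    backend=pufferlib.vector.Multiprocessing,
    num_envs=8,
    num_workers=4,
)

vecenv.async_reset(seed=42)
for _ in range(100):
    # Return layout assumed: obs, rewards, terminals, truncations, infos, env ids, mask
    obs, rewards, terminals, truncations, infos, env_ids, mask = vecenv.recv()
    # Random actions for illustration; single_action_space attribute assumed
    actions = np.stack([vecenv.single_action_space.sample() for _ in range(len(obs))])
    vecenv.send(actions)
vecenv.close()
```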
- class pufferlib.vector.Ray(env_creators, env_args, env_kwargs, num_envs, num_workers=None, batch_size=None, **kwargs)
Runs environments in parallel on multiple processes using Ray
Use this module for distributed simulation on a cluster.
- reset(seed=42)
- step(actions)
- recv()
- send(actions)
- async_reset(seed=42)
- close()
CleanRL Integration#
Wrap your PyTorch policies for use with CleanRL. A usage sketch follows the listing below.
- class pufferlib.frameworks.cleanrl.Policy(policy)
Wrap a non-recurrent PyTorch model for use with CleanRL
- get_value(x, state=None)
- get_action_and_value(x, action=None)
- forward(x, action=None)
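A sketch of plugging a plain policy into a CleanRL-style loop. MyPolicy and env are carried over from the earlier sketches, and the (action, logprob, entropy, value) return layout follows CleanRL's convention and is an assumption here:

```python
import torch
import pufferlib.frameworks.cleanrl

policy = pufferlib.frameworks.cleanrl.Policy(MyPolicy(env))
obs = torch.as_tensor(env.reset(seed=0)[0]).unsqueeze(0)           # add a batch dimension; reset return assumed
action, logprob, entropy, value = policy.get_action_and_value(obs)  # return layout assumed
```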
Wrap your PyTorch policies for use with CleanRL but add an LSTM. This requires you to use our policy API. It's pretty simple; see the default policies for examples and the sketch after this listing.
- class pufferlib.frameworks.cleanrl.RecurrentPolicy(policy)
Wrap a recurrent PyTorch model for use with CleanRL
- property lstm
- get_value(x, state=None)
- get_action_and_value(x, state=None, action=None)
- forward(x, state=None, action=None)
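A recurrent sketch; it assumes the LSTMWrapper constructor arguments used earlier and that get_action_and_value returns the LSTM state as its last element:

```python
import torch
import pufferlib.models
import pufferlib.frameworks.cleanrl

lstm_policy = pufferlib.models.LSTMWrapper(env, MyPolicy(env), input_size=128, hidden_size=128)
policy = pufferlib.frameworks.cleanrl.RecurrentPolicy(lstm_policy)

state = None  # initial LSTM state; internal zero-initialization assumed
obs = torch.as_tensor(env.reset(seed=0)[0]).unsqueeze(0)
action, logprob, entropy, value, state = policy.get_action_and_value(obs, state)  # return layout assumed
```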
SB3 Binding#
Minimal CNN + LSTM example included in demo.py
RLlib Binding#
Wrap your policies for use with RLlib (Shelved until RLlib is more stable)