
Our current public API. Advanced users can check the PufferLib source for additional utilities, but note that we move those around more frequently than the API documented here. Contributions welcome!

Emulation#

Wrap your environments for broad compatibility. You can pass a creator function, a class, or an env object. The API of the returned PufferEnv is the same as Gym/PettingZoo; a usage sketch follows each class reference below.

class pufferlib.emulation.GymnasiumPufferEnv(env=None, env_creator=None, env_args=[], env_kwargs={})
property render_mode
seed(seed)
reset(seed=None)
step(action)

Execute an action and return (observation, reward, done, info)

render()
close()
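
A minimal usage sketch, assuming gymnasium is installed (CartPole-v1 is only an illustrative choice; the reset/step tuple layouts follow the signatures documented above):

import gymnasium
import pufferlib.emulation

env = pufferlib.emulation.GymnasiumPufferEnv(
    env_creator=gymnasium.make,
    env_args=['CartPole-v1'],
)

ob = env.reset(seed=0)  # depending on version, this may also return an info dict
action = env.action_space.sample()
ob, reward, done, info = env.step(action)  # tuple layout per step() above
env.close()
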
class pufferlib.emulation.PettingZooPufferEnv(env=None, env_creator=None, env_args=[], env_kwargs={}, to_puffer=False)
property render_mode
property agents
property possible_agents
property done
observation_space(agent)

Returns the observation space for a single agent

action_space(agent)

Returns the action space for a single agent

reset(seed=None)
step(actions)

Step the environment and return (observations, rewards, dones, infos)

render()
close()
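
A similar sketch for a PettingZoo parallel environment (assuming pettingzoo[mpe] is installed; the specific environment is only illustrative):

from pettingzoo.mpe import simple_spread_v3
import pufferlib.emulation

env = pufferlib.emulation.PettingZooPufferEnv(
    env_creator=simple_spread_v3.parallel_env,
)

observations = env.reset(seed=0)  # dict keyed by agent; may also return infos
actions = {agent: env.action_space(agent).sample() for agent in env.agents}
observations, rewards, dones, infos = env.step(actions)
env.close()
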

Environments#

All included environments expose make_env and env_creator functions. make_env is the one you want most of the time. env_creator exposes, e.g., class interfaces for environments that support them, so that you can pass around static references.

Additionally, all environments expose a Policy class with a baseline model. Note that not all environments have custom policies, and the default simply flattens observations before applying a linear layer. Atari, Procgen, Neural MMO, Nethack/Minihack, and Pokemon Red currently have reasonable policies.

The PufferLib Squared environment is used as an example below. Everything is exposed through __init__, so you can call these methods through, e.g., pufferlib.environments.ocean.make_env, as in the usage sketch below.

pufferlib.environments.ocean.torch.Policy

alias of Default
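
A sketch of the Ocean example, assuming make_env and env_creator can be called with their defaults (some environments may require additional arguments, such as a name) and that the baseline Policy takes the environment as its only required argument:

import pufferlib.environments.ocean

# make_env returns a ready-to-use PufferLib-wrapped environment
env = pufferlib.environments.ocean.make_env()

# env_creator returns a static reference you can store or pass to vectorization code
creator = pufferlib.environments.ocean.env_creator()

# Baseline policy for this environment (alias of pufferlib.models.Default)
policy = pufferlib.environments.ocean.torch.Policy(env)
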

Models#

PufferLib model default policies. They are vanilla PyTorch policies with no custom PufferLib API. Optionally, you can split the forward pass into encode and decode functions. This allows you to use our convenience wrapper for LSTM support.

class pufferlib.models.Default(*args: Any, **kwargs: Any)

Default PyTorch policy. Flattens obs and applies a linear layer.

PufferLib is not a framework and does not enforce a base class: you can use any PyTorch policy that returns actions and values. We structure our forward methods as encode_observations and decode_actions to make it easier to wrap policies with LSTMs; you can follow that structure and use our LSTM wrapper, or implement your own. To port an existing policy for use with our LSTM wrapper, put everything from forward() before the recurrent cell into encode_observations and everything after into decode_actions, as in the sketch below.

forward(observations)
encode_observations(observations)

Encodes a batch of observations into hidden states. Assumes no time dimension (handled by LSTM wrappers).

decode_actions(hidden, lookup, concat=True)

Decodes a batch of hidden states into (multi)discrete actions. Assumes no time dimension (handled by LSTM wrappers).
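
A minimal sketch of a custom policy using this split, assuming a flat Box observation space and a Discrete action space. The assumed return contracts (encode_observations returns (hidden, lookup), decode_actions returns (action logits, value)) mirror the Default policy; check the source if they differ in your version.

import numpy as np
import torch
import torch.nn as nn

class MyPolicy(nn.Module):
    def __init__(self, env, hidden_size=128):  # constructor convention is an assumption
        super().__init__()
        obs_size = int(np.prod(env.observation_space.shape))
        self.encoder = nn.Linear(obs_size, hidden_size)
        self.action_head = nn.Linear(hidden_size, env.action_space.n)
        self.value_head = nn.Linear(hidden_size, 1)

    def encode_observations(self, observations):
        # Everything that would run before the recurrent cell in forward()
        return torch.relu(self.encoder(observations.float())), None

    def decode_actions(self, hidden, lookup, concat=True):
        # Everything that would run after the recurrent cell in forward()
        return self.action_head(hidden), self.value_head(hidden)

    def forward(self, observations):
        hidden, lookup = self.encode_observations(observations)
        return self.decode_actions(hidden, lookup)
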

class pufferlib.models.LSTMWrapper(*args: Any, **kwargs: Any)

Wraps your policy with an LSTM without letting you shoot yourself in the foot with bad transpose and shape operations. This saves much pain. Requires that your policy define encode_observations and decode_actions. See the Default policy for an example and the usage sketch below.

forward(x, state)
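
A usage sketch, assuming LSTMWrapper takes the environment, the policy, and the input/hidden sizes (check the constructor in your version):

import pufferlib.models

policy = MyPolicy(env, hidden_size=128)
policy = pufferlib.models.LSTMWrapper(env, policy, input_size=128, hidden_size=128)  # argument names are assumptions
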
class pufferlib.models.Convolutional(*args: Any, **kwargs: Any)

The CleanRL default NatureCNN policy used for Atari: a stack of three convolutions followed by a linear layer.

Takes framestack as a mandatory keyword argument. The suggested default is 1 frame with an LSTM or 4 frames without, as in the sketch below.

forward(observations)
encode_observations(observations)
decode_actions(flat_hidden, lookup, concat=None)
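
A brief sketch, hedged because Convolutional may require additional constructor arguments (e.g., the flattened conv output size) depending on your version and observation shape:

import pufferlib.models

# 4 stacked frames when not using an LSTM
policy = pufferlib.models.Convolutional(env, framestack=4)

# 1 frame when wrapping with an LSTM
# policy = pufferlib.models.LSTMWrapper(env, pufferlib.models.Convolutional(env, framestack=1))
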
class pufferlib.models.ProcgenResnet(*args: Any, **kwargs: Any)

Procgen baseline from the AICrowd NeurIPS 2020 competition, based on the ResNet architecture used in the IMPALA paper.

forward(observations)
encode_observations(x)
decode_actions(hidden, lookup)

Linear decoder function.

class pufferlib.models.ResidualBlock(*args: Any, **kwargs: Any)
forward(x)
class pufferlib.models.ConvSequence(*args: Any, **kwargs: Any)
forward(x)
get_output_shape()

Vectorization#

Distributed backends for PufferLib-wrapped environments

pufferlib.vector.make(env_creator_or_creators, env_args=None, env_kwargs=None, backend=<class 'pufferlib.vector.Serial'>, num_envs=1, **kwargs)
class pufferlib.vector.Serial(env_creators, env_args, env_kwargs, num_envs, **kwargs)
reset(seed=42)
step(actions)
property num_envs
async_reset(seed=42)
send(actions)
recv()
close()
class pufferlib.vector.Multiprocessing(env_creators, env_args, env_kwargs, num_envs, num_workers=None, batch_size=None, zero_copy=True, **kwargs)

Runs environments in parallel using multiprocessing.

Use this vectorization module for most applications. A usage sketch follows the backend references below.

reset(seed=42)
step(actions)
property num_envs
recv()
send(actions)
async_reset(seed=42)
close()
class pufferlib.vector.Ray(env_creators, env_args, env_kwargs, num_envs, num_workers=None, batch_size=None, **kwargs)

Runs environments in parallel on multiple processes using Ray

Use this module for distributed simulation on a cluster.

reset(seed=42)
step(actions)
recv()
send(actions)
async_reset(seed=42)
close()
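
A usage sketch with the Multiprocessing backend, using the Ocean creator from above. Worker and batch settings are illustrative, and the exact step/recv return layouts are version-dependent:

import pufferlib.vector
import pufferlib.environments.ocean

vecenv = pufferlib.vector.make(
    pufferlib.environments.ocean.env_creator(),
    backend=pufferlib.vector.Multiprocessing,  # Serial and Ray share the same interface
    num_envs=8,
    num_workers=4,
)

# Synchronous API
obs = vecenv.reset(seed=42)  # may also return infos, depending on version
# obs, rewards, dones, infos = vecenv.step(actions)

# Asynchronous API, used to overlap simulation with inference
# vecenv.async_reset(seed=42)
# data = vecenv.recv()
# vecenv.send(actions)

vecenv.close()
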

CleanRL Integration#

Wrap your PyTorch policies for use with CleanRL

class pufferlib.frameworks.cleanrl.Policy(policy)

Wrap a non-recurrent PyTorch model for use with CleanRL

get_value(x, state=None)
get_action_and_value(x, action=None)
forward(x, action=None)
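
A sketch of the non-recurrent wrapper in a CleanRL-style loop. MyPolicy is the custom policy sketched in the Models section, obs is a torch tensor batch of observations, and the four-value return of get_action_and_value follows CleanRL's convention (verify against your version):

import pufferlib.frameworks.cleanrl

agent = pufferlib.frameworks.cleanrl.Policy(MyPolicy(env))
value = agent.get_value(obs)
action, logprob, entropy, value = agent.get_action_and_value(obs)  # return layout assumed from CleanRL
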

Wrap your PyTorch policies for use with CleanRL but add an LSTM. This requires you to use our policy API. It’s pretty simple – see the default policies for examples.

class pufferlib.frameworks.cleanrl.RecurrentPolicy(policy)

Wrap a recurrent PyTorch model for use with CleanRL

property lstm
get_value(x, state=None)
get_action_and_value(x, state=None, action=None)
forward(x, state=None, action=None)
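
A sketch of the recurrent wrapper, assuming the policy was built with LSTMWrapper and that get_action_and_value also returns the updated LSTM state (as in CleanRL's LSTM examples):

import pufferlib.frameworks.cleanrl
import pufferlib.models

lstm_policy = pufferlib.models.LSTMWrapper(env, MyPolicy(env))  # constructor args as assumed in the Models section
agent = pufferlib.frameworks.cleanrl.RecurrentPolicy(lstm_policy)

state = None  # (h, c) LSTM state; assumed to be initialized when None
action, logprob, entropy, value, state = agent.get_action_and_value(obs, state)  # return layout assumed
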

SB3 Binding#

Minimal CNN + LSTM example included in demo.py

RLlib Binding#

Wrap your policies for use with RLlib (Shelved until RLlib is more stable)