Web-based Multi-Agent Reinforcement Learning: The AI Tag Game Journey

1/12/2026
Reinforcement Learning · Ray RLlib · PettingZoo · Python · ONNX

1. The Beginning: Inspiration from Arc Raiders

Recently, I read some development articles from Embark Studios about the game 'Arc Raiders'. I was deeply impressed by their approach: rather than scripting specific behavior patterns, they drop the AI into harsh physics-based terrain with a survival condition ("reach the destination or die") and let it learn and adapt on its own.

I wanted to implement this idea in an accessible, lightweight web environment. Rather than getting bogged down in planning, I jumped straight into coding with the goal of a progressively evolving three-stage AI model:

  • Stage 1: A basic mode where the AI performs specific objectives.
  • Stage 2: Player vs. AI competitive mode.
  • Stage 3: A cooperative mode where the player and an AI ally team up against a formidable enemy AI.

2. Setting Up the RL Environment: Tech Stack

Approaching reinforcement learning for the first time, my mindset was focused on "utilizing tools effectively" rather than dwelling on grandiose theory.

  • Python 3.14.0 & PyTorch: For building the baseline models.
  • Gymnasium: The standard API for single-agent environments.
  • PettingZoo: For creating 'multi-agent' environments simulating interactions like predators and prey (a minimal environment skeleton follows this list).
  • Ray RLlib: For distributed reinforcement learning training utilizing multiple CPU cores.
  • ONNX: The format used to export trained models into the web environment.
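For concreteness, here is a minimal sketch of what such a PettingZoo environment can look like, using the parallel API. The class name TagEnv, the observation layout, and the action set are illustrative assumptions, not the project's actual code:

import functools
import numpy as np
from gymnasium import spaces
from pettingzoo import ParallelEnv

class TagEnv(ParallelEnv):
    metadata = {"name": "tag_v0"}

    def __init__(self):
        self.possible_agents = ["predator", "prey"]
        self.agents = []

    @functools.lru_cache(maxsize=None)
    def observation_space(self, agent):
        # Assumed layout: own (x, y) plus the other agent's (x, y)
        return spaces.Box(low=0.0, high=1.0, shape=(4,), dtype=np.float32)

    @functools.lru_cache(maxsize=None)
    def action_space(self, agent):
        return spaces.Discrete(5)  # up, down, left, right, stay

    def reset(self, seed=None, options=None):
        self.agents = self.possible_agents[:]
        self.positions = {a: np.random.rand(2).astype(np.float32) for a in self.agents}
        return {a: self._obs(a) for a in self.agents}, {a: {} for a in self.agents}

    def _obs(self, agent):
        other = "prey" if agent == "predator" else "predator"
        return np.concatenate([self.positions[agent], self.positions[other]])

    def step(self, actions):
        # Movement and catch logic omitted; only the API shape matters here.
        rewards = {a: 0.0 for a in self.agents}
        terminations = {a: False for a in self.agents}
        truncations = {a: False for a in self.agents}
        observations = {a: self._obs(a) for a in self.agents}
        return observations, rewards, terminations, truncations, {a: {} for a in self.agents}

Note that RLlib ships wrappers for both PettingZoo APIs: PettingZooEnv for AEC environments and ParallelPettingZooEnv for parallel ones like this sketch.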

3. Trials During Training: The Lazy Agent Problem

Contrary to my expectation that the AI would quickly dominate human players, I encountered my first major hurdle: the 'Lazy Agent' phenomenon.

The AI settled into a local optimum: "Instead of moving and risking penalties for failure, it is better to stay still and minimize the loss."
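A standard counter-measure is to reshape the reward so that standing still is never free. Here is a minimal sketch of the idea; the constants and the idle check are illustrative, not tuned values:

import numpy as np

# Illustrative shaping: doing nothing must cost more than trying and failing.
STEP_PENALTY = 0.01   # small cost paid every timestep
IDLE_PENALTY = 0.05   # extra cost when the agent barely moves
CATCH_REWARD = 1.0    # success clearly dominates both penalties

def shaped_reward(caught: bool, old_pos: np.ndarray, new_pos: np.ndarray) -> float:
    reward = -STEP_PENALTY
    if np.linalg.norm(new_pos - old_pos) < 1e-3:
        reward -= IDLE_PENALTY  # standing still is actively punished
    if caught:
        reward += CATCH_REWARD
    return reward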

This problem was exacerbated when a single shared 'brain' controlled all agents: to reduce the overall loss, some agents intentionally sacrificed themselves, which was bizarre to watch. It became clear that each agent needed its own policy, trained in a distributed fashion.

# Example of a multi-agent setup using PettingZoo + Ray RLlib
from ray.rllib.env.wrappers.pettingzoo_env import PettingZooEnv
from ray.tune.registry import register_env

import my_custom_env  # this project's PettingZoo environment module

def env_creator(args):
    # Wrap the PettingZoo (AEC) environment so RLlib can consume it
    return PettingZooEnv(my_custom_env.env())

register_env("my_env", env_creator)
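Giving each agent its own brain then comes down to RLlib's multi-agent configuration, which maps agent IDs to separate policies. A minimal sketch, assuming the agent IDs are 'predator' and 'prey'; the exact IDs and the training loop are illustrative:

from ray.rllib.algorithms.ppo import PPOConfig

# One policy per agent ID, so no agent can hide inside a shared loss.
config = (
    PPOConfig()
    .environment("my_env")  # the environment registered above
    .multi_agent(
        policies={"predator", "prey"},
        # Route each agent to the policy that shares its name.
        policy_mapping_fn=lambda agent_id, *args, **kwargs: agent_id,
    )
)

algo = config.build()
for _ in range(10):
    algo.train()  # each call runs one training iteration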

4. Evolution of the Predator: How to Catch a Human?

"Can't it just block the entrance to win?" Merely increasing the predator's speed or adding simple penalties was one-dimensional and boring. I wanted the win rate to be balanced at a tense 50:50. To achieve this, I bestowed unique abilities to each agent to balance the gameplay.

Furthermore, running real-time training for every server interaction was too resource-intensive for the web. Instead, I delivered progressive growth through pre-trained model evolution stages:

  • Stage 1 (Basic Predator): Players can easily survive by finding a hiding spot. If careless, they get caught.
  • Stage 2 (Searching Predator): When the player disappears from sight, it actively searches likely hiding spots.
  • Stage 3 (Pattern Learner): Deploys a predator pre-trained to counter the player's preferred escape routes, effectively neutralizing memorized exact-path tricks. A sketch of how these staged models are swapped in follows below.
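Because no learning happens at serve time, 'evolution' reduces to swapping which pre-trained ONNX model is loaded for the current stage. A minimal Python sketch using onnxruntime; the file names and observation shape are assumptions, and the actual game runs the same idea in the browser:

import numpy as np
import onnxruntime as ort

# Hypothetical per-stage model files exported from training.
STAGE_MODELS = {
    1: "predator_stage1.onnx",
    2: "predator_stage2.onnx",
    3: "predator_stage3.onnx",
}

def load_predator(stage: int) -> ort.InferenceSession:
    return ort.InferenceSession(STAGE_MODELS[stage])

session = load_predator(2)
obs = np.zeros((1, 4), dtype=np.float32)  # assumed observation layout
logits = session.run(None, {session.get_inputs()[0].name: obs})[0]
action = int(np.argmax(logits))  # greedy action for deterministic play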

During this process, ONNX model conversion was quite tricky. After several failed attempts, I resolved it by integrating the ONNX conversion process directly into the training script (train_rllib.py).
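For reference, here is roughly what such an in-script export can look like with the old RLlib API stack, where a policy's network is a plain torch.nn.Module. The policy ID, observation shape, and output path are assumptions, and the wrapper may need adjustment for other model architectures:

import torch

# After training finishes, pull the torch model out of the trained Algorithm.
policy = algo.get_policy("predator")  # assumed policy ID from the config above
dummy_obs = torch.zeros(1, 4)         # assumed observation shape

# RLlib torch models take an input dict plus RNN state; a thin wrapper
# exposes a plain-tensor interface that torch.onnx.export can trace.
class OnnxWrapper(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, obs):
        logits, _ = self.model({"obs": obs}, [], None)
        return logits

torch.onnx.export(OnnxWrapper(policy.model), dummy_obs,
                  "predator_stage3.onnx", opset_version=17)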

5. Conclusion and Retrospective

Beyond simply imitating an idea, it was immensely satisfying to discover environmental flaws on my own (like "merely blocking the exit shouldn't guarantee a win") and to balance the agents accordingly.

Seeing the trained model successfully run dynamically on the web gave me a huge sense of accomplishment. Next, I plan to explore how to integrate these reinforcement learning components into other services, like e-commerce.