
Why Our AI Wins 85% of Dou Di Zhu Games

March 10, 2026 · 5 min read · By Malinguo AI Team

Dou Di Zhu (斗地主, "Fight the Landlord") is China's most popular card game, played by hundreds of millions. It's a 3-player asymmetric game where one "Landlord" with extra cards battles two cooperating "Peasants." Building an AI that masters this game requires understanding imperfect information, multi-agent dynamics, and long-horizon planning.
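To make the asymmetry concrete, here is a minimal sketch of the standard deal: a 54-card deck (52 suited cards plus two jokers), 17 cards per player, and 3 leftover cards that go to the Landlord. The function names, card encoding, and the choice to pass the Landlord seat in directly (skipping the bidding phase) are assumptions made for illustration.

```python
import random

RANKS = ["3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A", "2"]
SUITS = ["S", "H", "D", "C"]


def build_deck():
    """Return the 54-card deck: 52 suited cards plus two jokers."""
    deck = [rank + suit for rank in RANKS for suit in SUITS]
    deck += ["BlackJoker", "RedJoker"]
    return deck


def deal(landlord_seat, rng):
    """Deal 17 cards to each of 3 seats; the Landlord also takes the 3 leftover cards.

    Bidding for the Landlord role is skipped here (an assumption for brevity);
    the caller simply names the Landlord's seat.
    """
    deck = build_deck()
    rng.shuffle(deck)
    hands = [deck[0:17], deck[17:34], deck[34:51]]
    bottom_cards = deck[51:54]
    hands[landlord_seat] = hands[landlord_seat] + bottom_cards
    return hands


hands = deal(landlord_seat=0, rng=random.Random(0))
print([len(hand) for hand in hands])  # [20, 17, 17]
```

The Landlord starts with 20 cards against 17 for each Peasant, which is why the two Peasants have to cooperate to win.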

The Challenge

Unlike chess or Go, Dou Di Zhu involves:

Our Approach: PPO Self-Play

We use Proximal Policy Optimization (PPO), a widely used policy-gradient reinforcement learning algorithm, combined with self-play training. Here's how it works:

  1. Self-play arena: Three AI agents play against each other continuously, generating millions of game episodes.
  2. Experience collection: Each game generates training data in the form of states, actions, and rewards.
  3. Policy optimization: PPO updates the policy network to maximize expected reward while keeping each update close to the previous policy, which prevents destructively large policy changes.
  4. Opponent pool mixing: We maintain a pool of past checkpoints. The training agent plays against a mix of current and historical versions, which helps prevent overfitting to a single strategy.
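Steps 3 and 4 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a description of the training code: the function names are hypothetical, and the clip range of 0.2 and the 50/50 split between the current policy and past checkpoints are placeholder values, since the settings actually used are not stated here.

```python
import math
import random


def clipped_surrogate(new_logps, old_logps, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    ratio = pi_new(a|s) / pi_old(a|s). Clipping the ratio to
    [1 - clip_eps, 1 + clip_eps] removes the incentive to move the policy
    far from the one that collected the data. clip_eps=0.2 is an assumed value.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def sample_opponent(current, checkpoint_pool, rng, p_current=0.5):
    """Pick the current policy with probability p_current, else a past checkpoint.

    p_current=0.5 is an assumed mixing ratio, chosen only for illustration.
    """
    if not checkpoint_pool or rng.random() < p_current:
        return current
    return rng.choice(checkpoint_pool)


# The ratio is 2.0 with a positive advantage, so the objective is capped at 1.2 * 1.0.
print(clipped_surrogate([math.log(2.0)], [0.0], [1.0]))  # 1.2
```

Because of the outer `min`, the cap only applies when it makes the objective smaller: with the same ratio of 2.0 and a negative advantage, the term is left unclipped, so the update is still penalized in full for making a bad action more likely.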

Key Results

After training on 500,000+ episodes:

The AI discovered strategies that surprised even experienced Dou Di Zhu players, particularly its aggressive use of the "rocket" (双王, the pair of jokers) as a psychological weapon rather than just a trump card.

What's Next

We're applying the same PPO self-play framework to mahjong variants. The key challenges are the much larger state space (up to 136 tiles, depending on the variant, versus 54 cards) and the 4-player dynamics. Our Changsha Mahjong model already achieves a 63% win rate, and Sichuan, Riichi, and Guangdong models are in active training.
