Why Our AI Wins 85% of Dou Di Zhu Games
Dou Di Zhu (斗地主, "Fight the Landlord") is China's most popular card game, played by hundreds of millions. It's a 3-player asymmetric game where one "Landlord" with extra cards battles two cooperating "Peasants." Building an AI that masters this game requires understanding imperfect information, multi-agent dynamics, and long-horizon planning.
The Challenge
Unlike chess or Go, Dou Di Zhu involves:
- Hidden information — you can't see your opponents' cards
- Asymmetric roles — Landlord plays against a 2-player team
- Combinatorial complexity — thousands of possible card combinations per turn
- Cooperation without communication — Peasants must coordinate implicitly
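To give a feel for the combinatorial action space, here is a hypothetical, heavily simplified sketch that enumerates a few basic move types from one hand. The card encoding and move taxonomy are illustrative only (real Dou Di Zhu also has chains, airplanes, kickers, and the rocket), not our engine's actual representation.

```python
from collections import Counter

def basic_moves(hand):
    """Enumerate singles, pairs, triples, and bombs from a hand of ranks.

    A real move generator must also handle chains, airplanes, and
    kicker attachments, which is where the per-turn count explodes
    into the thousands.
    """
    counts = Counter(hand)
    moves = []
    for rank, n in counts.items():
        moves.append(("single", rank))
        if n >= 2:
            moves.append(("pair", rank))
        if n >= 3:
            moves.append(("triple", rank))
        if n == 4:
            moves.append(("bomb", rank))
    return moves

hand = [3, 3, 3, 3, 4, 4, 5]
print(len(basic_moves(hand)))  # prints 7
```

Even this toy hand of seven cards yields seven distinct basic plays; adding sequence-based combinations and kickers multiplies the count rapidly.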
Our Approach: PPO Self-Play
We use Proximal Policy Optimization (PPO), a state-of-the-art reinforcement learning algorithm, combined with self-play training. Here's how it works:
- Self-play arena — Three AI agents play against each other continuously, generating millions of game episodes.
- Experience collection — Each game generates training data: states, actions, rewards.
- Policy optimization — PPO updates the policy network to maximize expected reward while clipping updates that stray too far from the previous policy, which keeps training stable and prevents destructive policy collapse.
- Opponent pool mixing — We maintain a pool of past checkpoints. The training agent plays against a mix of current and historical versions, preventing overfitting to a single strategy.
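The clipped surrogate objective in step 3 can be sketched in a few lines of NumPy. This is a minimal illustration of the standard PPO-clip loss, not our actual training code; the function name and the clip range `eps=0.2` are assumptions for the example.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """Clipped PPO surrogate: the probability ratio between the new and
    old policies is clipped to [1 - eps, 1 + eps], so a single update
    cannot move the policy far from the one that collected the data."""
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # PPO maximizes the elementwise minimum; we return its negative
    # so a gradient-descent optimizer can minimize it.
    return -np.mean(np.minimum(unclipped, clipped))

# Toy batch: when the new policy equals the old one, the ratio is 1
# and the loss reduces to minus the mean advantage.
logp = np.array([-1.2, -0.7, -2.1])
adv = np.array([1.0, -1.0, 2.0])
print(ppo_clip_loss(logp, logp, adv))  # prints ~ -0.6667
```

Taking the minimum of the clipped and unclipped terms is what makes the update conservative: positive-advantage actions gain at most a `1 + eps` boost, and negative-advantage actions cannot have their penalty clipped away.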
Key Results
After training on 500,000+ episodes:
- 85% win rate against greedy heuristic opponents
- 72% win rate against rule-based expert bots
- Learns complex strategies like bomb timing, chain combos, and endgame counting
- Inference time: 12ms per decision on a single GPU
The AI discovered strategies that surprised even experienced Dou Di Zhu players — particularly its aggressive use of "rocket" (双王) as a psychological weapon rather than just a trump card.
What's Next
We're applying the same PPO self-play framework to mahjong variants. The key challenge is the much larger state space (136 tiles vs 54 cards) and the 4-player dynamics. Our Changsha Mahjong model already achieves a 63% win rate, and Sichuan, Riichi, and Guangdong models are in active training.
Want to try our AI? Head to the AI Playground →