I run chess engine tournaments using the Cute Chess GUI.
Structure
- Tournament
  - Regular
    - 12 Engines per Division
    - Type = Round Robin
    - Rounds = 100
    - Games Per Encounter = 2
    - Play Each Opening = 2 Times
    - Swap Sides = Checked
    - Games = 2,200 per Engine, 13,200 per Tournament
  - Cross-Division
    - 6 Engines
      - 3 from Upper Division
      - 3 from Lower Division
    - Type = Round Robin
    - Rounds = 100
    - Games Per Encounter = 2
    - Play Each Opening = 2 Times
    - Swap Sides = Checked
    - Games = 1,000 per Engine, 3,000 per Tournament
- Time Control
  - Moves = Whole Game
  - Bullet = 2 min/game + 1 sec/move
  - Blitz = 5 min/game + 3 sec/move
  - Rapid = 14 min/game + 7 sec/move
- Opening Suite
  - PGN / EPD = Arasan.pgn
  - Depth = 99 Plies
  - Order = Random
- Opening Book = None
- Draw Adjudication (Move Number) = Off
- Resign Adjudication (Move Number) = Off
- Thinking on Opponent’s Time = Unchecked
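The game totals in the list follow directly from the round robin settings: each round, every engine meets each of the other engines once, and each encounter consists of Games Per Encounter games. A minimal Python sketch of that arithmetic, assuming Cute Chess schedules round robins this way (`round_robin_games` is a hypothetical helper, not part of Cute Chess):

```python
def round_robin_games(engines: int, rounds: int, games_per_encounter: int) -> tuple[int, int]:
    """Return (games per engine, games per tournament) for a round robin.

    Each round, every engine meets each of the other (engines - 1) engines
    once, and each encounter consists of games_per_encounter games.
    """
    per_engine = (engines - 1) * games_per_encounter * rounds
    # Every game involves two engines, so halve the sum over all engines.
    per_tournament = engines * per_engine // 2
    return per_engine, per_tournament


print(round_robin_games(12, 100, 2))  # (2200, 13200)  regular division
print(round_robin_games(6, 100, 2))   # (1000, 3000)   cross-division
```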
Multithreading
I configure every engine to use a single CPU core. My chess PC has 64 cores (128 logical processors with hyper-threading). In Cute Chess, I set Tools > Settings > Tournaments > Maximum Number of Concurrent Games to 60. Because Thinking on Opponent’s Time is unchecked, only one engine in each game is thinking at any moment, so 60 concurrent games occupy 60 cores and leave 4 cores for the operating system and other applications.
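Combining the time controls with 60 concurrent games gives a rough idea of how long a 13,200-game division can take. This is an estimate under stated assumptions, not a measurement: it assumes a hypothetical average of 60 moves per side, assumes both engines spend their entire clock (so each figure is a ceiling for games of that length), and assumes Cute Chess starts a new game as soon as a slot frees up. The helper names are hypothetical.

```python
def max_game_seconds(base_s: int, increment_s: int, moves_per_side: int) -> int:
    """Longest a game of moves_per_side moves per side can last.

    With "Moves = Whole Game", each side gets base_s plus increment_s per move,
    and only one clock runs at a time, so the game is the sum of both budgets.
    """
    return 2 * (base_s + increment_s * moves_per_side)


def tournament_hours(total_games: int, concurrency: int, game_seconds: float) -> float:
    """Wall-clock hours if a new game starts whenever a slot frees up."""
    return total_games * game_seconds / concurrency / 3600


# Time controls from the Structure list, as (base seconds, increment seconds).
TIME_CONTROLS = {"Bullet": (2 * 60, 1), "Blitz": (5 * 60, 3), "Rapid": (14 * 60, 7)}
ASSUMED_MOVES_PER_SIDE = 60  # hypothetical average game length

for name, (base, inc) in TIME_CONTROLS.items():
    secs = max_game_seconds(base, inc, ASSUMED_MOVES_PER_SIDE)
    hours = tournament_hours(13_200, 60, secs)
    print(f"{name}: {secs} s per game, about {hours:.1f} h")
# Bullet: 360 s per game, about 22.0 h
# Blitz: 960 s per game, about 58.7 h
# Rapid: 2520 s per game, about 154.0 h
```

In practice engines rarely exhaust their clocks, so real tournaments of this length usually finish sooner, while unusually long games push individual slots past the ceiling.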
Quality Control
Ideally, shortly after a tournament begins, I search its PGN file for the following phrases to find games that terminated prematurely due to buggy engines.
- abandon
- stall
- disconnect
- forfeit
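The search can be scripted so that each hit is also tallied per engine. A minimal Python sketch under several assumptions: games are split at their `[Event` tags; tag pairs are stripped before searching (except Termination), so an engine whose name happens to contain a term such as "stall" does not trigger false hits; and the loser of a flagged decisive game is treated as the engine at fault, with flagged games that lack a decisive result counted against both players. The function names and the sample game text are hypothetical, not actual Cute Chess output.

```python
import re
from collections import Counter

# The four search terms, matched case-insensitively.
TERMS = re.compile(r"abandon|stall|disconnect|forfeit", re.IGNORECASE)
# A PGN tag pair such as [White "EngineA"] at the start of a line.
TAG = re.compile(r'^\[(\w+)\s+"(.*)"\]', re.MULTILINE)


def split_games(pgn_text: str) -> list[str]:
    """Split a PGN file into games, each beginning at its [Event tag."""
    return [g for g in re.split(r"(?m)^(?=\[Event )", pgn_text) if g.strip()]


def suspect_counts(pgn_text: str) -> tuple[Counter, Counter]:
    """Return (games played, flagged games) per engine."""
    played, flagged = Counter(), Counter()
    for game in split_games(pgn_text):
        tags = dict(TAG.findall(game))
        white, black = tags.get("White", "?"), tags.get("Black", "?")
        played.update([white, black])
        # Search the movetext and comments plus the Termination tag only.
        searchable = TAG.sub("", game) + " " + tags.get("Termination", "")
        if not TERMS.search(searchable):
            continue
        result = tags.get("Result", "*")
        if result == "1-0":      # assumption: the loser caused the termination
            flagged[black] += 1
        elif result == "0-1":
            flagged[white] += 1
        else:                    # no decisive result: count against both
            flagged.update([white, black])
    return played, flagged


# Invented two-game sample for illustration.
sample = """[Event "Test"]
[White "EngineA"]
[Black "EngineB"]
[Result "1-0"]

1. e4 e5 {Black disconnects} 1-0

[Event "Test"]
[White "EngineB"]
[Black "EngineA"]
[Result "1/2-1/2"]

1. e4 e5 1/2-1/2
"""
played, flagged = suspect_counts(sample)
for engine in sorted(played):
    print(engine, played[engine], flagged[engine])
# EngineA 2 0
# EngineB 2 1
```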
If an engine repeatedly causes such terminations, I delete it from my PC (so I cannot accidentally include it in a future tournament), download a replacement engine of similar strength, and restart the tournament. If I don’t notice the premature terminations until the tournament has progressed significantly, I let it finish, remove the buggy engine’s games using Hiarcs Chess Explorer Pro, and then run a gauntlet tournament in which the replacement engine plays the buggy engine’s former opponents.
This eliminates invalid games that would otherwise inflate other engines’ ratings by crediting them with cheap wins. I tolerate a few prematurely terminated games per engine, but a failure rate higher than one half of one percent (0.5%) reduces the reliability of my rating lists, so I replace any engine above that rate with a reliable engine of similar playing strength.
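With 2,200 games per engine in a regular division and 1,000 in a cross-division tournament, the one-half-of-one-percent limit works out to a small absolute number of games. A minimal Python sketch of the check (the function names are hypothetical); it takes per-engine failure and game counts from whatever PGN search is used:

```python
def max_tolerated_failures(games: int) -> int:
    """Largest failure count that does not exceed 0.5% of an engine's games."""
    # 0.5% is 5 per 1,000; integer arithmetic avoids floating-point rounding.
    return games * 5 // 1000


def needs_replacement(failures: int, games: int) -> bool:
    """True when the failure rate is higher than one half of one percent."""
    return failures * 1000 > games * 5


print(max_tolerated_failures(2_200))  # 11  (regular division)
print(max_tolerated_failures(1_000))  # 5   (cross-division)
print(needs_replacement(12, 2_200))   # True
```

Integer arithmetic keeps a rate of exactly 0.5% (for example 11 failures in 2,200 games) on the tolerated side of the line, where a floating-point comparison could tip either way.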