Testing Strength Reduction Parameters

I played an entertaining game against MadChess a few evenings ago. Prior to the game, I adjusted MadChess’ strength reduction parameters because I felt their values caused the engine to play too strongly for a given Elo rating. Perhaps my adjustments made the engine too weak. I’m using “feel”, a very unscientific process. Nevertheless, I enjoyed the game.

I played white. MadChess played black, set to 900 Elo. The time control was blitz, 5m + 5s. The game began 1.e4 c5 2.Nf3 g6 3.d4 d6 4.dxc5 Qa5+ 5.Nc3 Qxc5 6.Nd5 Qc6.

Here I missed a tactic. Can you spot it?

I didn’t see it. The tactic is available on this move and my next three moves because MadChess didn’t “see” the tactic either. See Search Speed for an explanation of how MadChess’ strength reduction algorithm affects the engine’s “sight”.

The game continued 7.Be2 e5 8.O-O h6 9.Be3 Be7 10.c4 Kf8 11.b3 Be6 12.Rc1 Na6 13.Qd2 Qc8 14.Ne1 Bd8 15.Nd3 Qd7 16.f4 exf4 17.N3xf4 Kg7

r2b2nr/pp1q1pk1/n2pb1pp/3N4/2P1PN2/1P2B3/P2QB1PP/2R2RK1 w - - 1 18

Let’s light this fuse. 18.Bd4+!

Material is even, but clearly I’m winning. However, winning “won” games at blitz time control is not guaranteed at my patzer skill level. There’s always a chance I blunder material back to my opponent. Or run out of time. Pushing those concerns aside, I pressed my advantage, didn’t make any egregious errors, and managed my time well.
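The "material is even" claim can be checked directly from the FEN above, since 18.Bd4+ is not a capture. Here is a minimal sketch that sums piece values from the first FEN field. The function name and the piece values (100/300/300/500/900 centipawns) are conventional assumptions for illustration, not MadChess' evaluation.

```python
# Conventional piece values in centipawns (an assumption, not MadChess' evaluation).
PIECE_VALUES = {"p": 100, "n": 300, "b": 300, "r": 500, "q": 900, "k": 0}


def material_balance(fen: str) -> int:
    """Return white material minus black material, in centipawns."""
    placement = fen.split()[0]  # first FEN field: piece placement
    balance = 0
    for char in placement:
        value = PIECE_VALUES.get(char.lower())
        if value is None:
            continue  # digits (empty squares) and '/' (rank separators)
        # Uppercase letters are white pieces, lowercase are black.
        balance += value if char.isupper() else -value
    return balance


fen = "r2b2nr/pp1q1pk1/n2pb1pp/3N4/2P1PN2/1P2B3/P2QB1PP/2R2RK1 w - - 1 18"
print(material_balance(fen))  # 0 means material is even
```

A material count says nothing about who is winning, of course. It only confirms the starting point for the attack.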

After I won the game, I gave Komodo Dragon two seconds per position to analyze the game. I then played through the game with Komodo Dragon and MadChess running at full strength, each displaying four best moves (MultiPV = 4). I explored a few variations where I didn’t understand why a particular move didn’t work, consulted the engines, and added refutation lines and comments to the game.

Banks 96th Amateur Series Division 7

MadChess 3.1 participated in Graham Banks’ 96th amateur tournament in division 7.

MadChess 3.1 won the tournament!

                           1    2    3    4    5    6    7    8    9    0    1    2    
1   MadChess 3.1 64-bit    **** ½0½½ 1½0½ ½½½½ 001½ 1½11 1111 ½011 11½1 ½½11 111½ ½½1½  29.5/44
2   Inanis 1.1.0 64-bit    ½1½½ **** 1½0½ ½1½½ 1001 ½½0½ ½01½ 111½ 01½1 ½1½1 ½111 ½½½1  27.5/44
3   Devel 4.0.2.3          0½1½ 0½1½ **** 0½½½ 010½ ½110 1½01 ½1½1 ½½½½ 1101 1½½½ 11½½  25.5/44
4   Odonata 0.6.2 64-bit   ½½½½ ½0½½ 1½½½ **** ½011 0½½½ 0101 1½½½ ½0½1 1½01 111½ 0011  24.5/44
5   Zevra 2.5 64-bit       110½ 0110 101½ ½100 **** ½0½½ 11½0 ½½1½ ½½1½ 1½½0 ½1½½ 0½½1  24.0/44  522.25
6   Lozza 2.4 64-bit       0½00 ½½1½ ½001 1½½½ ½1½½ **** ½1½½ ½½½½ ½0½1 0½11 ½½10 1½11  24.0/44  500.25
7   Supernova 2.4 64-bit   0000 ½10½ 0½10 1010 00½1 ½0½½ **** ½1½1 ½011 11½½ 10½0 ½½11  21.5/44
8   Blunder 8.5.5 64-bit   ½100 000½ ½0½0 0½½½ ½½0½ ½½½½ ½0½0 **** 111½ ½1½½ 011½ ½111  21.0/44
9   Delocto 200419 64-bit  00½0 10½0 ½½½½ ½1½0 ½½0½ ½1½0 ½100 000½ **** 1½1½ 0½11 ½½1½  19.5/44
10  KnightX 3.5 64-bit     ½½00 ½0½0 0010 0½10 0½½1 1½00 00½½ ½0½½ 0½0½ **** 1½11 ½½11  18.0/44
11  Myrddin 0.89 64-bit    000½ ½000 0½½½ 000½ ½0½½ ½½01 01½1 100½ 1½00 0½00 **** 1½11  16.0/44
12  EggNog 4.0 64-bit      ½½0½ ½½½0 00½½ 1100 1½½0 0½00 ½½00 ½000 ½½0½ ½½00 0½00 ****  13.0/44

Games

Chess.com Hans Niemann Report

Hans Niemann at the 2022 Sinquefield Cup.

Chess.com released their Hans Niemann Report. In the report, the Chess.com Fair Play Team concludes Hans Niemann “has likely cheated in more than 100 online chess games, including several prize money events.”

Fair Play Team’s Report

The report includes several tables and charts of statistical evidence, a description of Chess.com’s cheat detection system, along with numerous emails between the Fair Play Team and Hans. In his email responses to questions asked by the Fair Play Team, Hans confesses to cheating online.

Regarding Hans’ game against Magnus Carlsen at the 2022 Sinquefield Cup, the Fair Play Team states, “In our view, this game and the surrounding behaviors and explanations are bizarre… However, we are currently unaware of any evidence that Hans cheated in this game, and we do not advocate for any conclusions regarding cheating being made based on this one encounter.”

The report stresses Chess.com’s cheat detection system was built to detect cheating in online play, not over the board (OTB) play. Nonetheless, regarding Niemann’s rapidly rising FIDE rating, the Fair Play team states, “his results are statistically extraordinary.” With regard to OTB cheat detection, the report states, “Chess.com has historically not been involved in OTB or classical chess fair play decisions, as we do not run OTB or classical chess events… We have shared our findings with FIDE and will cooperate with any investigation or requests they pursue. It is our belief that OTB event organizers should be taking much stronger precautions against cheating by all players to ensure fair play.”

I posted the following comment on the TalkChess forum.

My Thoughts (Posted on TalkChess)

My previous TalkChess forum post (Fri Sep 09, 2022 12:20am UTC) and MadChess blog post:


[Hans] seems to be playing a character: the wild-eyed, misunderstood genius who cannot suffer fools and is impatient with the unimaginative, bourgeois chess establishment.

Whether this is based on…

  1. Derogatory comments Hans endured while stuck in the 2400s (not smart enough, not talented enough, etc) that inspired him to seek revenge by doubling-down his efforts, leading to a legitimate rise to the elite level, or…
  2. Hans realizing there’s no audience for the bad-boy, trash-talking villain in the 2400s; that no one cares unless the villain rises to the top and knocks the princes and kings off their thrones; so Hans decided to leverage computer assistance to get himself on the stage.

… is unclear to me at this time.


I read the entire Hans Niemann Report. I am leaning towards explanation #2. The evidence of Hans cheating online is damning. As a consequence, I question the man’s motivations. Does he want to improve his skill or does he want attention and adulation? On the other hand, the over the board (OTB) evidence demonstrates serious abnormalities but is not conclusive.

I just can’t get past the man’s cocky attitude, snarky interviews, and rage and bravado. I can’t reconcile it with his incoherent post-game analysis. Especially when compared to Vassily Ivanchuk’s famous post-game analysis of an entire game completely from memory with no visual board to prompt him. I’m skeptical that Hans’ zest for put-downs and reluctance to discuss details of his OTB thought process are simply an odd personality quirk or a manifestation of some-can-do-but-can’t-explain.

It’s looking more and more likely he simply shifted arenas for his devious conquest from online to OTB. Perhaps for vengeance, fame, money, or a taste of each. Or maybe just for the amusement of punking us all, satisfying some psychological need known only to provocateurs, as I suggested with my earlier reference to Andy Kaufman.

Having achieved a respectable degree of proficiency professionally and in my chess engine hobby, I have come to appreciate the hard-won knowledge earned by a long slog up the learning curve. I respect expert opinion. So I find it exceedingly difficult to toss aside the concerns of Magnus Carlsen, Ian Nepomniachtchi, Fabiano Caruana, Hikaru Nakamura, and other high-caliber players who have stated or insinuated there’s something suspect with Hans’ chess. It’s erratic, disjointed, alien.

I realize what I’ve expressed here is subjective opinion. While I spend most of my professional time and energy on more objective matters of software development, I live in a world of human beings. In my experience it’s personally and professionally valuable to form opinions of people’s character. I, like everyone else, must navigate their idiosyncrasies, disguised motivations, veils of personal mythology, calculations of political expediency, etc.

I’d stay the hell away from a character like Hans Niemann.

Follow the Story

Follow discussion of the report here: Carlsen Withdrawal After Loss to Niemann.

MadChess Blunders… Er, Wait

Gene Wilder as Dr. Frankenstein

I had a Frankenstein moment yesterday. I use the word “Frankenstein” in reference to MadChess. Meaning, I’ve used my powers of concentration to create this thing, a chess engine, by imagining, thinking, then writing code to “teach” it to play a game of chess. This thing becomes terribly powerful. It’s capable of playing the game I “taught” it with the strength of a super Grandmaster, far exceeding my abilities.

While I’m glad I can use MadChess to analyze my games against Internet opponents (after the game is complete, of course), and rely on MadChess’ suggested improvements and variations (it is a Grandmaster after all), it would be nice to play a game against it. For that to be enjoyable, though, I must handicap MadChess so it plays weaker and gives me a fighting chance to win.

The Universal Chess Interface (UCI), a protocol supported by chess applications that enables users to connect the application to any chess engine, supports handicapping an engine via the UCI_LimitStrength and UCI_Elo options. The user selects a playing strength (Elo rating). It’s up to the engine’s author to determine how to simulate weak play at the specified Elo rating.
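For illustration, these are the UCI commands a chess application would send to an engine to request limited strength. The helper function name is hypothetical. The setoption syntax and the two option names come from the UCI protocol. The range of Elo values an engine accepts is engine-specific, and the engine advertises it in its option lines.

```python
def limit_strength_commands(elo: int) -> list[str]:
    """Build the UCI commands that ask an engine to play at a given Elo."""
    return [
        # Turn on strength limiting, then tell the engine the target rating.
        "setoption name UCI_LimitStrength value true",
        f"setoption name UCI_Elo value {elo}",
    ]


for command in limit_strength_commands(900):
    print(command)
```

The application writes each command, followed by a newline, to the engine's standard input before starting the game.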

In MadChess’ source code, I accomplish this via four techniques:

  1. Removing knowledge (of piece mobility, king safety, pawn structure, etc).
  2. Decreasing search speed (examine, say, 1,500 positions per second instead of two million).
  3. Randomly selecting an inferior move at most e centipawns worse than the best move (where e varies inversely with Elo rating).
  4. Occasionally (p percent of moves), selecting a blunder (seriously inferior move) at most b centipawns worse (where b is much greater than e).

See The MadChess UCI_LimitStrength Algorithm for details.
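Techniques 3 and 4 can be sketched roughly as follows. This is not MadChess' source code: the function name, the uniform choice among candidate moves, and the sample scores are all assumptions made for illustration. Only the meaning of e, p, and b comes from the list above.

```python
import random


def choose_move(scored_moves, e, p, b, rng=random):
    """Pick a move from (move, score) pairs, with scores in centipawns.

    e: maximum error allowed for an ordinary move.
    p: percent chance that this move is allowed to be a blunder.
    b: maximum error allowed for a blunder (much greater than e).
    """
    best_score = max(score for _, score in scored_moves)
    # Technique 4: on p percent of moves, widen the tolerance from e to b.
    max_error = b if rng.random() * 100 < p else e
    # Technique 3: every move within the tolerance is a candidate
    # (this includes the best move itself, an assumption of this sketch).
    candidates = [move for move, score in scored_moves
                  if best_score - score <= max_error]
    return rng.choice(candidates)


# Hypothetical scores for four candidate moves.
moves = [("Nf3", 35), ("d4", 30), ("h4", -60), ("Qh5", -350)]
print(choose_move(moves, e=50, p=11, b=400, rng=random.Random(1)))
```

With e=50 only Nf3 and d4 qualify on an ordinary move. On a blunder move (b=400) all four moves qualify, including the one that drops 385 centipawns.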

So I’m playing a 5m + 5s blitz game against MadChess. I’ve handicapped MadChess by limiting its strength to 900 Elo. As the author of the engine I know this implies MadChess will blunder on approximately 11% of its moves. This blunder can be as severe as approximately 400 centipawns, which is greater than the value of a knight or bishop. I know this because I wrote the code and devised the handicapping formulas.
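An 11% blunder rate works out to about one blunder every nine moves. The arithmetic is sketched below, under the simplifying assumption (mine, for illustration) that each move is an independent 11% draw. The function name is hypothetical.

```python
blunder_rate = 0.11  # approximately 11% of moves at 900 Elo (from the text)

# Average number of moves between blunders.
moves_per_blunder = 1 / blunder_rate
print(round(moves_per_blunder, 1))


def chance_of_blunder_within(moves: int, rate: float = blunder_rate) -> float:
    """Probability of at least one blunder in the given number of moves,
    assuming each move is an independent draw."""
    return 1 - (1 - rate) ** moves


print(round(chance_of_blunder_within(9), 2))
```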

I play white, MadChess plays black. The game begins 1.e4 c5 2.Nf3 Nc6 3.d4 cxd4 4.Nxd4 h5 5.Nc3 Nf6 6.Bc4 Na5 7.Be2 e5 8.Nf3 Bb4 9.Bd2 Rh6 10.Nxe5 Rh8 11.Nc4 Rb8 12.O-O Nxc4 13.Bxc4 Qc7 14.Qe2 Ng4 15.g3 Qd8 16.a3 Bd6 17.Nd5 Bc5 18.Qf3.

1rbqk2r/pp1p1pp1/8/2bN3p/2B1P1n1/P4QP1/1PPB1P1P/R4RK1 b - - 4 18

I’m up a pawn. Now MadChess threatens my queen and light-squared bishop via 18… Ne5. I evade the threat to my queen and protect my bishop by playing 19. Qc3, eyeing black’s undefended knight on e5 and pawn on g7. MadChess persists, threatening my queen again via 19… Bd4.

1rbqk2r/pp1p1pp1/8/3Nn2p/2BbP3/P1Q3P1/1PPB1P1P/R4RK1 w - - 7 20

Er, wait… that move makes no sense. Doesn’t it simply blunder the bishop? I examine the board and determine black’s bishop on d4 is unprotected by any of its pieces. “Oh,” I think to myself, “this must be MadChess’ limit-strength algorithm kicking in. The engine is making an idle threat but hanging its bishop. This is well within the 400 centipawn blunder it may play once every nine moves or so.” I capture the black bishop via 20. Qxd4.

1rbqk2r/pp1p1pp1/8/3Nn2p/2BQP3/P5P1/1PPB1P1P/R4RK1 b - - 0 20

Frankenstein, I mean MadChess, replies 20… Nf3+, forking my king and queen. Doh!

1rbqk2r/pp1p1pp1/8/3N3p/2BQP3/P4nP1/1PPB1P1P/R4RK1 w - - 1 21
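The fork is easy to confirm mechanically. Here is a small sketch that lists the squares a knight attacks and checks that a knight on f3 hits both the king on g1 and the queen on d4. The function name is hypothetical, and the code is for illustration only.

```python
# The eight (file, rank) offsets of a knight's jump.
KNIGHT_JUMPS = [(1, 2), (2, 1), (2, -1), (1, -2),
                (-1, -2), (-2, -1), (-2, 1), (-1, 2)]


def knight_attacks(square: str) -> set[str]:
    """Return the squares attacked by a knight on the given square (e.g. 'f3')."""
    file, rank = ord(square[0]) - ord("a"), int(square[1]) - 1
    attacked = set()
    for df, dr in KNIGHT_JUMPS:
        f, r = file + df, rank + dr
        if 0 <= f < 8 and 0 <= r < 8:  # stay on the board
            attacked.add(chr(ord("a") + f) + str(r + 1))
    return attacked


attacked = knight_attacks("f3")
print("g1" in attacked and "d4" in attacked)  # True: king and queen are forked
```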

I walked into that trap. I played on hoping MadChess would blunder some material back to me, leveling the game. But it was I who made further mistakes. I resigned after black’s 33rd move forked another of my pieces.

So I have used my powers of concentration to create a chess engine (via coding) that can defeat my powers of concentration (playing a game of chess). I attempt to weaken my creation, act satisfied when I believe it has indeed played a weak move, then am smacked in the face by the brutish engine. Surreal.

To correct this problem, I need to code less and play more.

After the game I gave Komodo Dragon, a world-class chess engine, two seconds per move to analyze the game. See the full game, with Komodo’s suggested improvements, below.

Into The Void

MadChess 3.1 participated in a tournament Graham Banks arranged, named Into The Void.

This is MadChess 3.1’s debut tournament. As a fan of Black Sabbath, I must say I approve of the name of the tournament.

                           1    2    3    4    5    6    7    8    9    0    1    2    
1   Nawito 22.07 64-bit    **** 1010 0½½1 1010 111½ 00½1 ½1½½ ½½½½ ½1½1 ½111 ½101 110½  27.0/44
2   CM11th Archangel       0101 **** ½½01 ½½01 1½½½ ½11½ ½½0½ 110½ 10½1 ½½10 ½½11 101½  25.5/44
3   Inanis 1.1.0 64-bit    1½½0 ½½10 **** ½½01 11½½ ½0½½ ½½10 ½½½0 ½110 ½101 10½½ 1011  24.0/44  517.25
4   K2 0.99                0101 ½½10 ½½10 **** 1½00 0100 ½001 111½ ½½½1 ½1½0 1111 0½1½  24.0/44  514.50
5   MadChess 3.1 64-bit    000½ 0½½½ 00½½ 0½11 **** 10½½ 1½½1 ½½½1 0010 ½111 ½11½ ½11½  23.5/44
6   Delocto 200419 64-bit  11½0 ½00½ ½1½½ 1011 01½½ **** 00½1 10½½ 01½½ 010½ ½0½0 ½111  22.5/44
7   Chess Tiger 2007.1     ½0½½ ½½1½ ½½01 ½110 0½½0 11½0 **** 0½½0 ½½½0 ½111 01½0 1½½½  22.0/44
8   Blunder 8.5.5 64-bit   ½½½½ 001½ ½½½1 000½ ½½½0 01½½ 1½½1 **** ½10½ 010½ 0011 110½  21.0/44
9   Lozza 2.4 64-bit       ½0½0 01½0 ½001 ½½½0 1101 10½½ ½½½1 ½01½ **** 0001 ½0½1 ½½1½  20.5/44
10  ECE 20.1 64-bit        ½000 ½½01 ½010 ½0½1 ½000 101½ ½000 101½ 1110 **** ½1½½ 1½½0  19.5/44
11  Leorik 2.2 64-bit      ½010 ½½00 01½½ 0000 ½00½ ½1½1 10½1 1100 ½1½0 ½0½½ **** 1½10  19.0/44
12  KnightX 3.4 64-bit     001½ 010½ 0100 1½0½ ½00½ ½000 0½½½ 001½ ½½0½ 0½½1 0½01 ****  15.5/44

Games