by Stephen Ham
2/26/2021 – What is the best continuation? Which rook should you move to the half-open b-file? You look at the 30 games in Mega, but cannot find a convincing answer. So you consult the latest chess engines, the NN variety with the most profound understanding. But even there you find clear differences of opinion. ICCF GM Steven Ham, who represents Team USA on Board #1, uses this practical example to explain why he prefers one engine to assist with his deep preparation. His conclusion: Fat Fritz 2 is the best overall chess analysis tool available.
I’ve been America’s top-rated ICCF Grandmaster for several years. As such, I’m an active and experienced chess engine user. So I know that utilizing the best chess engine for high-quality chess analysis is vital for success. But, how does one determine which engine to use?
Many rely upon chess engine rating lists where thousands of game results generate ratings. However, this data is quantitative, sourced from massive numbers of speed chess games. But does quantitative data (large numbers of low-quality games) translate into qualitative data (superior chess analysis)? After all, correspondence chess players often spend many days determining which move and which continuation is best. Each candidate move is deeply investigated, as are all subsequent candidate replies, and the replies to those replies, creating a branch of analysis often extending a dozen or more whole moves. Then additional branches are developed. This growing tree of analysis requires examination in great detail and depth. From this completed mass of accumulated data, the game move is then selected. So should one perform qualitative analyses with the best speed chess engine? Is the best sprinter also the best marathon runner?
To answer this question, I perform engine tests on my ICCF games, since I already know them intricately. Most engines handle tactical positions well. Some perform better than others, and some solve positions consistently faster. But very few perform satisfactorily in more technical positions. Also, most engines perform best in open positions, while closed positions sometimes confuse them. Then their move selection may include “mindless” piece shuffling and erroneous evaluations.
My engine experience evolved along with technology. In the pre-NNUE (Efficiently Updatable Neural Network) era, Stockfish was my engine of choice. Its search function was excellent, reaching high ply depth faster than other engines by pruning away lines that didn’t merit consideration. Also, its move selection was generally better than others. However, I was frustrated by its often unsatisfactory evaluation function. It assessed most of my game positions with the infamous 0.00 evaluation. But, these positions were not always equal. And even when they were, some equal positions were more dynamic than others. To a chess engine, one equal position is as good as any other, but merely assigning a 0.00 evaluation to positions tells me very little. So, I always imposed my own evaluations onto Stockfish’s output, and then made my own move selection, which frequently differed from the engine’s preferred choice.
Another problem was closed positions. Then I had to switch to using fully neural net engines (i.e. Fat Fritz 1 and Leela) to generate reasonable analyses and evaluations, or perform solely human analysis which was then “blunder-checked” by brute force engines.
However, after the development of Stockfish NNUE, its evaluation improved dramatically. Clearly, the NNUE concept is the way to go. It maintains Stockfish’s fantastic search performance while adding a better evaluation function and move selection. However, while fewer positions are evaluated 0.00, too many still are. And many positions remain incorrectly evaluated, with scores that are either excessively high or excessively low. Some are outright wrong. So, NNUE represents a huge improvement, but there is still room for more.
Enter Fat Fritz 2. Existing NNUE architecture has a network of 256 neurons. To my mind, if 256 neurons represent an improvement, then surely more is better! At this point, I must confess to being a computer dummy, knowing nothing about computer programming and hardware. I just know how to perform deep chess analyses with chess engines. So when reading that Fat Fritz 2 supplies an NNUE that’s twice the size of what I was using, I was intrigued.
I was already a happy Fat Fritz 1 user, finding that its evaluations generally match mine. It’s an objective evaluation partner. In that regard, it can be relied upon better than Stockfish NNUE. However, the latter’s faster and deeper search capacity, and significantly superior endgame performance, kept it my primary analysis tool.
So, if Stockfish NNUE’s impressive search performance could be paired with some of Fat Fritz’s superior evaluation, then I’d have the best analysis tool available. This was accomplished with Fat Fritz 2, and I’ve been delighted by my experience with it. It’s now my primary correspondence chess tool and partner.
How is this preference justified? I conducted several test matches between Fat Fritz 2 and Stockfish NNUE’s latest development iterations. These engine matches were played at relatively long time controls, unlike the speed games at the rating agencies. My matches included engine books: the default book that came with Fat Fritz 2, and my own book based upon my ICCF opening preferences. The result? Well, like the recent TCEC Super-Final Match where the two top engines played 100 games at a long time control with opening books, my match games also resulted in an overwhelming number of draws. Still, Fat Fritz 2 leads overall by +1. So, these engines are of equivalent match strength.
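To put a lead of one game into perspective, a match score can be converted into an approximate rating difference with the standard logistic Elo formula. The sketch below is only an illustration: the number of games in these test matches is not stated here, so the win, draw, and loss counts in the usage example are hypothetical, and `elo_difference` is a name chosen for this example.

```python
import math


def elo_difference(wins: int, draws: int, losses: int) -> float:
    """Approximate Elo difference implied by a match result.

    Uses the standard logistic model p = 1 / (1 + 10 ** (-d / 400)),
    solved for d, where p is the score fraction of the first engine.
    """
    games = wins + draws + losses
    if games == 0:
        raise ValueError("at least one game is required")
    p = (wins + 0.5 * draws) / games
    if p <= 0 or p >= 1:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return -400 * math.log10(1 / p - 1)


# Hypothetical counts (not taken from the matches described above):
# a +1 lead over 100 games with an overwhelming number of draws.
diff = elo_difference(wins=3, draws=95, losses=2)
print(f"{diff:.1f} Elo")
```

With these hypothetical counts the implied difference is only a few Elo points, which is consistent with calling the two engines equivalent in match strength.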
But I want an analysis tool.
There are more ways to examine a chess engine’s analytical capacity than just matches. I used Fat Fritz 2 to examine my own ICCF games, both ongoing and completed. I’d already developed a large body of high-quality analyses for those games, and Fat Fritz 2 generated the same high-quality work. But where formerly I had to override Stockfish NNUE’s evaluation with my own, Fat Fritz 2 supplied an evaluation that generally matched mine. When we differed, Fat Fritz 2 was found to be correct by extending the analysis.
Finally – Opening Theory. Prior to the latest TCEC event, it was announced that the opening books would be very short – just a handful of moves, and then the engines would be on their own. Many complained that this was unfair to Stockfish NNUE. After all, fully neural net engines develop their nets by playing complete games. That experience and data are retained by their nets, enabling them to quickly and accurately play openings. I too perceive that both my GPU-reliant neural net engines (Fat Fritz 1 and Leela) evaluate opening positions better than Stockfish NNUE. Similarly, Fat Fritz 2’s NNUE is based upon Fat Fritz 1 games. So, it’s my opinion that it too displays superior performance over Stockfish NNUE in selecting opening moves. Fat Fritz 2 has already assisted me in selecting between various opening theory continuations.
For example, I have a new tournament commencing, and so am reviewing opening theory. Here’s what I experienced today in the following D38 opening: 1 d4 Nf6 2 c4 e6 3 Nc3 Bb4 4 Nf3 d5 5 cxd5 exd5 6 Bg5 Nbd7 7 e3 c5 8 Bd3 Qa5 9 Qc2 c4 10 Bf5 0-0 11 0-0 Re8 12 Nd2 g6 13 Bh3 Bxc3 14 Qxc3 Qxc3 15 bxc3 Kg7
Is this position equal or does White have an edge? What is White’s best continuation?
Step one is to examine ChessBase’s online database. 30 games are presented. White won four but lost two, including the game played by the highest-rated GMs. However, these games vary in quality, and include speed games and games played by much lower-rated players. So, these game results alone are not a determining factor.
White’s rooks remain undeveloped and his pawn structure doesn’t allow any immediate file openings. Placing a rook on b1 seems natural, but which one? Of the 30 database games, 23 saw 16 Rfb1. But is this the “right” rook? White’s reasoning is that there’s no future for his KR anywhere but b1, where it pressures Black’s b7 pawn. When Black develops his bishop, he may need to play …b6. Then White’s QR finds a future with a4-a5 etc. with pressure on Black’s queenside.
But I suspect this is instead the “wrong” rook to b1. I prefer 16 Rab1. Yes, the KR doesn’t have an immediate future. But thinking long-term, Black has a weakness at the end of his pawn chain on d5. Perhaps White can target it with a subsequent g3 and then Bg2. White may also consider a later pawn break with f3 and e4 when the KR supports it with Re1. But 16 Rab1 was played in only three games in the database, one of which was by a 1600-rated player. However, White won the other two games. So, it’s time for engine analysis and evaluation.
The latest development version of Stockfish NNUE is dated Feb. 20. It immediately prefers 16 Rfb1, the database’s “majority” move, and stays there through 54 plies and about 20 minutes, declaring a small White advantage. Allowing roughly the same time and search depth for the subsequent moves, we see 16…b6 17 Bf4 h6 18 Bxd7!? This exchange of a seemingly good bishop for a relatively undeveloped knight is a small surprise. 18…Bxd7 19 Be5. The point of the prior exchange – White pins the knight. But Black can simply break the pin with 19…g5 20 f3 Kg6 21 e4
Stockfish NNUE’s evaluation dropped during this 53 ply search, declaring the position equal. So roughly 3.5 hours later, Stockfish NNUE’s evaluation moved from White edge with 16 Rfb1 to equality with 21 e4. Equality is my evaluation too as the QR looks silly. But equality was also my evaluation after 16 Rfb1.
I then fired up Fat Fritz 2 and allowed it to search for roughly 20 minutes per move as well. It initially chooses 16 Rfb1 too. But after three minutes, it selects my choice of 16 Rab1 and stays there for the remaining 17 minutes. Then 16…b6 17 Bf4 h6 18 Bxd7!? That exchange again! 18…Nxd7. This is new. In the corresponding position, Stockfish NNUE played 18…Bxd7. So to be fair, and for comparison purposes, we need to examine 18…Bxd7 too.
So, 18…Bxd7 19 Be5 Re6. Once again a deviation from Stockfish’s 19…g5 in the related position. To be fair, we need to examine this sub-sub line of 19…g5 to see why Fat Fritz 2 rejected Stockfish’s choice. After 19…g5?? we immediately see it’s a blunder because of 20 f4! By leaving the KR on f1, 20 f4! threatens to rip open Black’s kingside. After 20…g4 21 f5
Fat Fritz 2 claims a decisive advantage of 95%. Wow! That’s another reason why 16 Rab1 seems best – a compelling tactical result of leaving the KR on f1.
Returning to 19 Be5 Re6 we see 20 e4. Before reaching this move, Fat Fritz 2 also favored 20 f3 b5 21 e4 with a White advantage, again due to leaving the KR on f1. But the immediate 20 e4 looks better yet because of the threat of 21 exd5 and the pin on Black’s knight. 20…Kf8 21 f3
I stopped here after 20 minutes and a 52 ply search depth. Fat Fritz’s evaluation is a White advantage of 62%. Rather than centipawns, its ChessBase/Fritz GUI defaults to a victory/loss percentage. Here, 62% represents a 26% chance of victory versus only a 2% chance of loss. Pretty good odds for White!
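The arithmetic behind that percentage can be checked with a short sketch. It assumes, as the quoted figures suggest, that the displayed percentage is an expected score: the win probability plus half the draw probability, with the draw share being whatever remains after wins and losses. The function name `expected_score` is illustrative and is not part of any ChessBase or Fritz interface.

```python
def expected_score(win_pct: float, loss_pct: float) -> float:
    """Expected score in percent, assuming score = wins + draws / 2.

    The draw share is inferred as the remainder (100 - wins - losses),
    which is an assumption rather than a value reported by the GUI.
    """
    if win_pct < 0 or loss_pct < 0 or win_pct + loss_pct > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    draw_pct = 100 - win_pct - loss_pct
    return win_pct + draw_pct / 2


# Figures quoted in the text: 26% chance of victory, 2% chance of loss.
print(expected_score(26, 2))  # 62.0
```

Under this reading, the remaining 72% of outcomes are draws, and half of that share (36 points) is added to the 26% win chance to give 62%.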
We’ve thus examined both the sub-line beginning with 18…Bxd7 and the sub-sub line with 19…g5?? Fat Fritz 2 demonstrated that both Black continuations that worked in the line selected by Stockfish NNUE instead lead to a White advantage after 16 Rab1. Now back to the 18…Nxd7 mainline.
18…Nxd7 19 h4!? Bb7 (Initially Fat Fritz 2 preferred 19…f6, planning …g5. White then counters with Rfe1 and an eventual e4 push, with advantage. More evidence that leaving the KR on f1 is best.) 20 Rfe1 (Even more justification for 16 Rab1.) Bc6 (Black stops the threat of a4-a5 and prepares …b5 to protect his c-pawn.) 21 e4. The 16 Rfb1 line was stopped after move 21. So to be fair, we should make a comparison now. After 20 minutes and a 56 ply search depth, Fat Fritz 2 claims a White edge. Specifically, the evaluation is 58%, with a 19% winning prospect and only a 3% risk of losing. I concur with Fat Fritz 2’s evaluation. White is better, although it’s only a small advantage. This continuation is also more dynamic for White than that reached after 16 Rfb1. I speculate that when Fat Fritz 2 switched from 16 Rfb1 to 16 Rab1, it sensed the subsequent advantages of leaving the KR where it was.
For the record, I normally generate analysis lines many moves deeper than this. In fact, I did here too, but stopped for the benefit of you readers who may grow weary of a mass of analyses. Nonetheless, the longer analytical lines and resulting branches confirm Fat Fritz 2’s correct move selection and evaluation beginning with 16 Rab1.
You can play through all the lines given in the above replayer. Note that you can start a (normal brute force) engine by clicking on the fan icon, and analyze the moves further.
Conclusion: From engine match testing, analyses of my own games, and opening theory evaluations, I’m convinced that Fat Fritz 2 is the best overall chess analysis tool available.
About Steven Ham
He learned to play chess when he was eight. Due to being raised in rural Minnesota where OTB chess opportunities were unknown, he began playing correspondence chess when about 12. Now, at age 45, his rating is 2508 (ICCF) and 2432 (USCF). Victories in two ICCF Master tournaments allow him to play in the XXIV ICCF World Championship Semi-Final. He is preparing for that challenge by keeping in shape via friendly challenge matches. Since he presumes that many of his opponents in the ICCF will be using computers to guide their play, he is looking to gain valuable experience when combating a mechanical enemy.
Steve is married to Tao Ham and the proud father of Alexander (32 months old) and Cordelia (10 months). His hobbies beside chess are: weight-lifting, kick-boxing, playing soccer, and listening to Celtic folk music.