The Week in Chess

Wednesday, June 18, 2014

Stockfish 14061621 x64 Tests - Possible Regression

The latest development build of Stockfish, 14061621, was released on June 16, 2014. It incorporates several patches added since version 5 came out more than two weeks ago, many of which had passed the short and long time control tests. This motivated me to test it in anticipation of a much stronger engine. My enthusiasm was doused when the first test run, at 1 minute + 1 second, showed the newer Stockfish struggling to outscore Stockfish 5 from the start; in the end it lost 49-51. Thinking it was just a fluke, I ran more tests simultaneously: one at 3 minutes + 2 seconds on the same computer and another at 1 minute + 1 second on a second computer. The results were 52-48 and 51.5-48.5 respectively, again in favor of Stockfish 5. I also ran 3 minutes + 2 seconds matches of Stockfish 14061621 against Houdini 4, Komodo 7a and Gull 3. It won against all of them, but with considerably lower scores than Stockfish 5 made against the same opponents.

I began to wonder whether my tests are wrong or whether the Stockfish team has not noticed the regression. No similar negative results have been published in the chess forums, so perhaps others have seen the same thing but were too shy to share it. This is something for the Stockfish team to verify.

Here are some samples of my tests:


Stockfish 14061621 x64 vs. Stockfish 5 x64 Match 100R 1M1S 1
Rank  Engine                  Score     St        St        S-B
1     Stockfish 5 x64         51.0/100  ···       17-15-68  2499.00
2     Stockfish_14061621_x64  49.0/100  15-17-68  ···       2499.00


100 games played / Tournament finished

Tournament start: 2014.06.17, 00:22:38
Latest update: 2014.06.17, 06:54:07
Level: Blitz 1/1
Hardware: AMD Phenom(tm) II X4 945 Processor with 1.8 GB Memory
Operating system: Windows 7 Ultimate Professional Service Pack 1 (Build 7601) 64 bit
Table created with: Arena 3.5


Stockfish 14061621 x64 vs. Stockfish 5 x64 - Match 100R 1M1S 2
Rank  Engine                  Score     St        St        S-B
1     Stockfish 5 x64         51.5/100  ···       12-9-79   2497.75
2     Stockfish_14061621_x64  48.5/100  9-12-79   ···       2497.75


100 games played / Tournament finished

Tournament start: 2014.06.18, 01:54:37
Latest update: 2014.06.18, 19:40:21
Level: Blitz 1/1
Hardware: Intel(R) Core(TM)2 CPU 4300 @ 1.80GHz with 3.9 GB Memory
Operating system: Windows 7 Ultimate Professional Service Pack 1 (Build 7601) 64 bit
Table created with: Arena 3.5


Stockfish 14061621 x64 vs. Stockfish 5 x64 - Match 100R 3M2S
Rank  Engine                  Score     St        St        S-B
1     Stockfish 5 x64         52.0/100  ···       14-10-76  2496.00
2     Stockfish_14061621_x64  48.0/100  10-14-76  ···       2496.00


100 games played / Tournament finished

Tournament start: 2014.06.17, 09:40:51
Latest update: 2014.06.18, 03:44:13
Level: Blitz 3/2
Hardware: AMD Phenom(tm) II X4 945 Processor with 1.8 GB Memory
Operating system: Windows 7 Ultimate Professional Service Pack 1 (Build 7601) 64 bit
Table created with: Arena 3.5
Download the Stockfish 14061621 test games PGN here.
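
The three head-to-head results above can be put in perspective with a rough error estimate. The sketch below is not part of the original test setup: it assumes the standard logistic Elo model and a normal approximation for the score, and the function names are made up for illustration. The win-loss-draw counts are copied from the three crosstables, seen from Stockfish 5's side.

```python
import math

def elo_from_score(score):
    """Elo difference implied by a score fraction under the logistic Elo model."""
    return 400.0 * math.log10(score / (1.0 - score))

def match_summary(wins, losses, draws, z=1.96):
    """Return (elo, elo_low, elo_high) for one match.

    Assumption of this sketch: the mean score is treated as normally
    distributed, and z=1.96 gives a rough 95% interval.
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + losses * (0.0 - score) ** 2
                + draws * (0.5 - score) ** 2) / games
    margin = z * math.sqrt(variance / games)
    return (elo_from_score(score),
            elo_from_score(score - margin),
            elo_from_score(score + margin))

# (wins, losses, draws) for Stockfish 5 against Stockfish 14061621.
matches = {
    "1M1S, match 1": (17, 15, 68),
    "1M1S, match 2": (12, 9, 79),
    "3M2S": (14, 10, 76),
}

summaries = {name: match_summary(*wld) for name, wld in matches.items()}
for name, (elo, low, high) in summaries.items():
    print(f"{name}: {elo:+.1f} Elo, roughly {low:+.1f} to {high:+.1f}")
```

Under these assumptions each interval spans zero, so a single 100-game match of this kind cannot by itself separate the two versions; it can only hint at a possible regression.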

Mars 1.8 x64 - Gauntlet Matches

Mars 1.8 x64 is a UCI chess engine by Trap, released on June 13, 2014.

Mars 1.8 scored 50.75% with 913.5 points (367 wins, 340 losses and 1093 draws) against a selection of the top 18 Ippolit clone chess engines. It drew its match with the older Mars 1.5 and has an estimated ELO rating about 4 points lower than its predecessor. Since it has not performed convincingly better, it will not be included in the regular rating list, where it would only pollute the list. There were several releases between versions 1.5 and 1.8, only days or a few weeks apart, and version 1.9 was issued right after 1.8. This is too much for rating list publishers, especially when the released versions bring no significant improvement. The gauntlet matches of the official 1.8 release may have been a waste of time, which I will try to avoid in future by running proper preliminary tests and not being too eager to test new versions.

Rank Engine Est. ELO Raw ELO Games Score% Points Win Loss Draw
1 Houdini 4 Pro x64 3106.07 162.13 100 71.50 71.5 55 12 33
2 Robodini 1.1 x64 3049.70 75.42 100 59.50 59.5 37 18 45
3 Strelka 6 3058.54 31.87 100 54.00 54.0 26 18 56
4 Critter 1.6a x64 3010.66 8.55 100 50.50 50.5 23 22 55
5 Mars 1.8 x64 2978.72 3.60 1800 50.75 913.5 367 340 1093
6 Mars 1.5 x64 2982.88 1.94 100 50.00 50.0 19 19 62
7 Ivanhoe 46h x64 2949.47 -6.49 100 48.50 48.5 11 14 75
8 PanChess 00.537 x64 2966.38 -6.94 100 48.50 48.5 17 20 63
9 Fire 3.0 x64 2983.32 -15.13 100 47.00 47.0 12 18 70
10 Saros eXp R5 x64 2971.70 -16.28 100 47.00 47.0 12 18 70
11 Firenzina 2.4 xTreme x64 2971.53 -16.29 100 47.00 47.0 11 17 72
12 Igorrit 0.086v9 x64 2960.22 -17.57 100 47.00 47.0 20 26 54
13 Tactico Power 2011 x64 2946.08 -19.70 100 46.50 46.5 13 20 67
14 RobboLito 0.21Q x64 2957.79 -22.86 100 46.00 46.0 15 23 62
15 Bouquet 1.8 x64 2975.40 -23.76 100 46.00 46.0 18 26 56
16 LEOpard 0.7c x64 2940.35 -26.25 100 45.50 45.5 10 19 71
17 Akkad 0.52b x64 2886.78 -26.67 100 45.50 45.5 14 23 63
18 Vitruvius 1.11C x64 2935.49 -32.86 100 44.50 44.5 12 23 65
19 Black Mamba 2.0 x64 2920.05 -52.73 100 42.00 42.0 15 31 54
Download the gauntlet matches PGN games here.
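
As a quick check of the Mars 1.8 row in the table above, the points and score percentage follow from the win, loss and draw counts when a draw counts as half a point. This is only a small illustrative sketch; `score_summary` is a made-up helper name, not something from the rating list tools.

```python
def score_summary(wins, losses, draws):
    """Return (games, points, score_percent), counting a draw as half a point."""
    games = wins + losses + draws
    points = wins + 0.5 * draws
    return games, points, 100.0 * points / games

# Mars 1.8 x64 row from the gauntlet table: 367 wins, 340 losses, 1093 draws.
games, points, percent = score_summary(367, 340, 1093)
print(f"{games} games, {points} points, {percent:.2f}%")
# prints "1800 games, 913.5 points, 50.75%"
```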

Monday, June 16, 2014

The Elite Chess Tournament - 06/13/2014 - 100 Rounds, 3 Minutes + 2 Seconds

The recent tournaments held exclusively for the top 4 chess engines produced an interesting result: Komodo and Houdini fought closely for positions 2 and 3. Eventually Houdini prevailed, with an overall advantage of just 3 ELO rating points in the Owl Rating List, which is a volatile lead. Stockfish 5 and Gull 3 held steady at number 1 and number 4 respectively.

There was a suggestion to conduct the tournament at a somewhat longer time control, 3 minutes base + 2 seconds increment, to compare the results. Perhaps I will make this time control a regular event named "The Elite Chess Engines Tournament", exclusive to the top 4 chess engines, or more when conditions permit. So I started this competition immediately after the previous tournaments at 1 minute + 1 second had finished.

The result shows the same ranking sequence: Stockfish 5, Houdini 4, Komodo 7a and Gull 3, in that order. I expected Komodo 7a to do better against Houdini 4 at 3M+2S, as it is known to get stronger as the time control gets longer, but perhaps this is still not the right time control for it.

Here are the tournament statistics:

The Elite Chess Engines Tournament, 06-13-2014 - 100R 3M2S
Rank  Engine             Score      St        Ho        Ko        Gu        S-B
1     Stockfish 5 x64    179.5/300  ···       25-16-59  27-11-62  39-5-56   24957.75
2     Houdini 4 Pro x64  152.5/300  16-25-59  ···       21-20-59  31-18-51  22436.25
3     Komodo 7a x64      145.5/300  11-27-62  20-21-59  ···       25-17-58  21702.75
4     Gull 3 x64         122.5/300  5-39-56   18-31-51  17-25-58  ···       19250.25


600 games played / Tournament finished

Tournament start: 2014.06.13, 09:39:36
Latest update: 2014.06.15, 16:22:34
Level: Blitz 3/2
Hardware: AMD Phenom(tm) II X4 945 Processor with 1.8 GB Memory
Operating system: Windows 7 Ultimate Professional Service Pack 1 (Build 7601) 64 bit
Table created with: Arena 3.5
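
The S-B column in the crosstable is the Sonneborn-Berger tie-break score. The sketch below assumes the usual definition for multi-game pairings: for each opponent, the points scored against that opponent multiplied by the opponent's final total, summed over all opponents. That Arena computes it exactly this way is an assumption, although the formula does reproduce the column above. The head-to-head cells are read as wins-losses-draws from the row engine's side, and the function names are illustrative only.

```python
# Head-to-head results as (wins, losses, draws) from the row engine's side,
# copied from the 3M2S crosstable.
crosstable = {
    "Stockfish 5 x64": {"Houdini 4 Pro x64": (25, 16, 59),
                        "Komodo 7a x64": (27, 11, 62),
                        "Gull 3 x64": (39, 5, 56)},
    "Houdini 4 Pro x64": {"Stockfish 5 x64": (16, 25, 59),
                          "Komodo 7a x64": (21, 20, 59),
                          "Gull 3 x64": (31, 18, 51)},
    "Komodo 7a x64": {"Stockfish 5 x64": (11, 27, 62),
                      "Houdini 4 Pro x64": (20, 21, 59),
                      "Gull 3 x64": (25, 17, 58)},
    "Gull 3 x64": {"Stockfish 5 x64": (5, 39, 56),
                   "Houdini 4 Pro x64": (18, 31, 51),
                   "Komodo 7a x64": (17, 25, 58)},
}

def points(wld):
    """Points from a (wins, losses, draws) triple, a draw counting as 0.5."""
    wins, _losses, draws = wld
    return wins + 0.5 * draws

def totals(table):
    """Total points of every engine over all of its opponents."""
    return {engine: sum(points(wld) for wld in rows.values())
            for engine, rows in table.items()}

def sonneborn_berger(table):
    """Sum of (points scored against an opponent) x (that opponent's total)."""
    total = totals(table)
    return {engine: sum(points(wld) * total[opponent]
                        for opponent, wld in rows.items())
            for engine, rows in table.items()}

scores = totals(crosstable)
sb = sonneborn_berger(crosstable)
for engine in crosstable:
    print(f"{engine}: {scores[engine]:.1f}/300, S-B {sb[engine]:.2f}")
```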


Here are the same tournament statistics in another format, with ELO estimates:

Rank Engine Est. ELO Raw ELO Games Score% Points Win Loss Draw
1 Stockfish 5 x64 3168.84 50.21 300 59.83 179.50 91 32 177
2 Houdini 4 Pro x64 3106.91 -0.76 300 50.83 152.50 68 63 169
3 Komodo 7a x64 3103.71 -5.15 300 48.50 145.50 56 65 179
4 Gull 3 x64 3075.24 -44.30 300 40.83 122.50 40 95 165
Download the tournament games PGN here.

Friday, June 13, 2014

The Big Four Chess Tournament - 06/10/2014 - Third Batch, Final

The third batch of The Big Four Chess Tournament has just been completed. The rankings are the same in all three batches. In this batch Stockfish 5 has the biggest gap, 41 points ahead of Houdini, while Komodo 7a is 11.5 points behind Houdini and Gull 3 is 12 points behind Komodo.

The Big Four Chess Tournament, 06-10-2014 - 100R 1M1S - Batch 3
Rank  Engine             Score      St        Ho        Ko        Gu        S-B
1     Stockfish 5 x64    189.5/300  ···       34-9-57   42-10-48  32-10-58  25948.25
2     Houdini 4 Pro x64  148.5/300  9-34-57   ···       30-21-49  34-21-45  21635.25
3     Komodo 7a x64      137.0/300  10-42-48  21-30-49  ···       34-19-47  20387.25
4     Gull 3 x64         125.0/300  10-32-58  21-34-45  19-34-47  ···       19672.75


600 games played / Tournament finished

Tournament start: 2014.06.12, 09:59:46
Latest update: 2014.06.13, 09:29:55
Level: Blitz 1/1
Hardware: AMD Phenom(tm) II X4 945 Processor with 1.8 GB Memory
Operating system: Windows 7 Ultimate Professional Service Pack 1 (Build 7601) 64 bit
Table created with: Arena 3.5


When all the games of the 3 batches were combined to "break the doubts" about the ranking of the 2nd and 3rd strongest engines in the Owl Rating List, Houdini 4 edged Komodo 7a by a mere 3.20 ELO rating points (3106.91 against 3103.71). The margin is statistically too small to declare Houdini 4 the 2nd best. It might be safest to say that Houdini 4 and Komodo 7a have the same strength in this rating list, but for presentation let the ranking speak for itself. Let's wait for their respective engine updates to decide convincingly which one is really stronger.

Here are the combined tournament statistics:

Rank Engine True ELO Raw ELO Games Score% Points Win Loss Draw Change
1 Stockfish 5 x64 3168.84 55.76 900 61.28 551.5 314 111 475 4.28
2 Houdini 4 Pro x64 3106.91 -3.82 900 50.39 453.5 206 199 495 1.08
3 Komodo 7a x64 3103.71 -16.30 900 47.11 424.0 198 250 452 -5.38
4 Gull 3 x64 3075.24 -35.64 900 41.22 371.0 139 297 464 -0.72
Download the combined tournament games PGN here.
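
The Points and Score% columns of the combined table can be reproduced by adding up the three batch crosstables. The following is a minimal sketch with illustrative names; the per-batch points and the Houdini against Komodo head-to-head results are copied from the batch 1, batch 2 and batch 3 crosstables.

```python
GAMES_PER_BATCH = 300  # each engine plays 3 opponents x 100 games per batch

# Points per batch (batch 1, batch 2, batch 3), copied from the crosstables.
batch_points = {
    "Stockfish 5 x64": [184.0, 178.0, 189.5],
    "Houdini 4 Pro x64": [156.0, 149.0, 148.5],
    "Komodo 7a x64": [140.0, 147.0, 137.0],
    "Gull 3 x64": [120.0, 126.0, 125.0],
}

def combine(points_by_batch, games_per_batch=GAMES_PER_BATCH):
    """Return (total_points, total_games, score_percent) over all batches."""
    total_points = sum(points_by_batch)
    total_games = games_per_batch * len(points_by_batch)
    return total_points, total_games, round(100.0 * total_points / total_games, 2)

combined = {engine: combine(pts) for engine, pts in batch_points.items()}
for engine, (pts, games, pct) in combined.items():
    print(f"{engine}: {pts}/{games} = {pct}%")

# Houdini 4 vs Komodo 7a head-to-head as (wins, losses, draws), batches 1 to 3.
head_to_head = [(19, 16, 65), (20, 20, 60), (30, 21, 49)]
h2h_wins, h2h_losses, h2h_draws = (sum(column) for column in zip(*head_to_head))
h2h_points = h2h_wins + 0.5 * h2h_draws
print(f"Houdini vs Komodo: {h2h_wins}-{h2h_losses}-{h2h_draws}, "
      f"{h2h_points} points for Houdini")
```

Over the 300 direct games Houdini scored 156 points against Komodo, which is 52%.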

Owl Computer Chess Engines Rating List - 06/14/2014

The Owl Computer Chess Engines Rating List was released on 06/14/2014.

View the full rating list here.

Thursday, June 12, 2014

The Big 4 Chess Tournament - 06/10/2014 - Second Batch

The second batch of The Big 4 Chess Tournament, which started on June 10, 2014, has just finished. Once again Stockfish 5 leads the pack, followed by Houdini 4, Komodo 7a and Gull 3, the same order as in the first batch. Notice that Stockfish has a big 29-point lead over second-placed Houdini, while Houdini leads its closest rival Komodo by only 2 points and Gull is 21 points behind Komodo. The tournament result is shown below:

The Big Four Chess Tournament, 06-10-2014 - 100R 1M1S2
Rank  Engine             Score      St        Ho        Ko        Gu        S-B
1     Stockfish 5 x64    178.0/300  ···       30-16-54  40-14-46  27-11-62  25062.00
2     Houdini 4 Pro x64  149.0/300  16-30-54  ···       20-20-60  27-15-58  22060.00
3     Komodo 7a x64      147.0/300  14-40-46  20-20-60  ···       36-16-48  21596.00
4     Gull 3 x64         126.0/300  11-27-62  15-27-58  16-36-48  ···       19912.00


600 games played / Tournament finished

Tournament start: 2014.06.11, 01:37:10
Latest update: 2014.06.12, 08:52:33
Level: Blitz 1/1
Hardware: AMD Phenom(tm) II X4 945 Processor with 1.8 GB Memory
Operating system: Windows 7 Ultimate Professional Service Pack 1 (Build 7601) 64 bit
Table created with: Arena 3.5

The first two tournament batches were aggregated to get an estimated ELO rating based on the Owl Rating List. This time Houdini 4 is in the number 2 position, previously occupied by Komodo 7a, but the ELO rating difference is so tiny, at 1.2 points, that the superiority of Houdini over Komodo is not convincing. The third tournament batch is now ongoing, and the new Owl Rating List will be published upon its completion to reflect the results of this Big 4 Chess Tournament. The performance statistics are shown below:

Rank Engine Est. ELO Raw ELO Games Score% Points Win Loss Draw Change
1 Stockfish 5 x64 3166.3 57.7 600 60.3 362 206 82 312 1.72
2 Houdini 4 Pro x64 3106.8 1.2 600 50.8 305 133 123 344 0.99
3 Komodo 7a x64 3105.6 -14.1 600 47.8 287 133 159 308 -3.51
4 Gull 3 x64 3075.3 -44.8 600 41.0 246 89 197 314 -0.65
Download the second batch tournament PGN games here.

Wednesday, June 11, 2014

The Big 4 Chess Tournament - 06/10/2014 - Breaking The Doubts

The previous three Owl Rating Lists and the Top Chess Engines Selection List show a difference that eagle-eyed readers will have noticed. In the rating list the top 4 are, in order, Stockfish 5, Komodo 7a, Houdini 4 Pro and Gull 3, while the top selection list ranks them Stockfish 5, Houdini 4 Pro, Komodo 7a and Gull 3. To be specific, Komodo 7a and Houdini 4 Pro swap between #2 and #3 with only a few ELO points between them. The disparity between the two lists has several possible causes. One is the larger number of games in the Owl Rating List compared with the Top Chess Engines Selection List, which simulates a round-robin among the Top 20 and so limits every engine equally to 1900 games. Another possible factor is that Houdini has played more opponents with lower ELO ratings, which could drag its own rating down a bit.

This leads to the question: which of the two is really better? One answer is that they are equal, because the difference is so volatile that the ranking could easily flip. But I decided to run a special 100-round round-robin, repeated 3 times, for the top 4 chess engines in order to "break the doubts" about the right ranking among the top 4, with emphasis on Houdini and Komodo.

The first of the three 100-round batches started yesterday and finished today, June 11, 2014. It put Houdini in the number 2 spot above Komodo, the same pattern as in the Owl Top Chess Engines Selection List. The next batches are ongoing and may be finished in 2 to 3 days. Results will be posted when ready, and the new rating list will be published when it is all over. Should no clear ranking between Houdini and Komodo emerge, it will be left as is, perhaps to be decided by their later versions.
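
The game counts behind these formats follow from simple round-robin arithmetic. The sketch below uses a hypothetical helper, `round_robin_games`, and assumes that every pairing plays the same number of games; the figure of 100 games per pairing for the Top 20 selection is inferred from the 1900 games per engine mentioned above, not stated directly.

```python
from math import comb

def round_robin_games(engines, games_per_pairing):
    """Return (games_per_engine, total_games) for a full round-robin."""
    per_engine = (engines - 1) * games_per_pairing
    total = comb(engines, 2) * games_per_pairing
    return per_engine, total

# One batch of the top 4 tournament: 100 games per pairing.
big4_per_engine, big4_total = round_robin_games(4, 100)
print(big4_per_engine, big4_total)   # 300 games per engine, 600 in total

# Three batches combined.
print(3 * big4_per_engine, 3 * big4_total)

# Simulated round-robin among the Top 20 (assumed 100 games per pairing).
top20_per_engine, top20_total = round_robin_games(20, 100)
print(top20_per_engine, top20_total)
```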

Here is the result of the first 100 round matches:

The Big Four Chess Tournament, 06-10-2014 - 100R 1M1S
Rank  Engine             Score      St        Ho        Ko        Gu        S-B
1     Stockfish 5 x64    184.0/300  ···       25-17-58  43-15-42  41-9-50   25304.00
2     Houdini 4 Pro x64  156.0/300  17-25-58  ···       19-16-65  34-17-49  22694.00
3     Komodo 7a x64      140.0/300  15-43-42  16-19-65  ···       32-21-47  20850.00
4     Gull 3 x64         120.0/300  9-41-50   17-34-49  21-32-47  ···       18960.00


600 games played / Tournament finished

Tournament start: 2014.06.10, 00:16:55
Latest update: 2014.06.11, 01:26:46
Level: Blitz 1/1
Hardware: AMD Phenom(tm) II X4 945 Processor with 1.8 GB Memory
Operating system: Windows 7 Ultimate Professional Service Pack 1 (Build 7601) 64 bit
Table created with: Arena 3.5

Download the round-robin matches here.

Saturday, June 7, 2014

Texel 1.04 x64 - Gauntlet Matches 100 Rounds

Texel 1.04 x64 is a UCI chess engine by Peter Osterlund, released on May 29, 2014.

Texel scored 40.05% with 480 wins, 878 losses and 642 draws against the selection of 20 top chess engines in the 100-round gauntlet matches. It earned an ELO rating of 2822.61, the number 12 spot in the Top Chess Engines Selection list and rank 40 in the Owl Rating List. Texel made a huge jump toward the top, gaining almost 300 ELO rating points over the previously tested version 1.01. To make way for Texel in the Top Chess Engines Selection, Strelka 6 was retired from the list, which also prunes some more of the Ippolit clones.
Rank Engine True ELO Raw ELO Games Score% Points Win Loss Draw Change
1 Stockfish 5 x64 3164.56 235.23 100 87.50 87.5 78 3 19 -1.80
2 Houdini 4 Pro x64 3105.83 209.78 100 85.00 85.0 75 5 20 -0.62
3 Komodo 7a x64 3109.09 180.75 100 82.50 82.5 71 6 23 -1.84
4 Gull 3 x64 3075.96 151.70 100 81.00 81.0 65 3 32 -1.38
5 Critter 1.6a x64 3012.01 123.42 100 76.50 76.5 62 9 29 -0.51
6 Strelka 6 3061.26 78.33 100 71.00 71.0 54 12 34 -2.62
7 Equinox 2.02 x64 2969.55 75.08 100 71.50 71.5 52 9 39 -0.51
8 Fire 3.0 x64 2984.76 61.39 100 68.50 68.5 53 16 31 -0.94
9 Rybka 4.1 x64 2958.30 48.65 100 66.00 66.0 53 21 26 -0.69
10 Protector 1.6.0 x64 2842.23 3.07 100 61.00 61.0 42 20 38 2.60
11 Hannibal 1.4b x64 2830.87 -13.78 100 58.00 58.0 43 27 30 1.18
12 Deep Hiarcs 14 2817.49 -31.49 100 56.00 56.0 35 23 42 1.20
13 Shredder 12 x64 2830.95 -47.48 100 53.50 53.5 35 28 37 -0.15
14 Spike 1.4 2808.04 -60.93 100 51.50 51.5 32 29 39 -0.60
15 Texel 1.04 x64 2822.61 -72.22 2000 40.05 801.0 480 878 642 2822.61
16 Naum 4.2 x64 2779.34 -93.44 100 47.00 47.0 28 34 38 -0.04
17 Senpai 1.0 x64 2780.54 -106.43 100 45.50 45.5 30 39 31 0.04
18 Murka 3 x64 2716.57 -112.70 100 44.00 44.0 24 36 40 1.89
19 Junior 13.8.04 x64 2735.18 -193.11 100 33.50 33.5 19 52 29 -1.20
20 DiscoCheck 5.2 x64 2709.66 -216.62 100 30.00 30.0 14 54 32 -1.33
21 Sjeng 2010 2748.75 -219.20 100 29.50 29.5 13 54 33 -2.26
Download the gauntlet matches PGN games here.
