The Week in Chess

Friday, October 31, 2014

Fried Shark 141028 x64 vs. Stockfish 14102815 x64 - SF War - 100R 1M1S

Fried Shark 141028 is an SF clone compiled today from the latest SF build. The name Fried Shark means nothing in particular, though it happens to be a famous street food in Trinidad and Tobago.

This is another match against the strongest engine in the world, played over 100 rounds at 1 minute base + 1 second increment. Fried Shark 141028 won against Stockfish 14102815 with a score of 52.5-47.5. There will be no ELO rating, and this result will not be reflected in the Owl Rating List.

Lucas Braesch of the Stockfish team commented on my post about the possible regression that I have no clue what I am doing and know nothing about statistics. It is probably the rudest comment I have received in my life, and I consider it an insult. No wonder some people are alienated by the lack of good manners from some of the Stockfish team.

Fried Shark 141028 x64 vs. Stockfish 14102815 x64 - SF War - 100R 1M1S
Rank  Engine                   Score      Fr        St        S-B
1     Fried Shark 141028 x64   52.5/100   ·         16-11-73  2493.75
2     Stockfish 14102815 x64   47.5/100   11-16-73  ·         2493.75


100 games played / Tournament finished

Tournament start: 2014.10.31, 20:43:37
Latest update: 2014.10.31, 22:19:04
Level: Blitz 1/1
Hardware: AMD Phenom(tm) IIX4 945 Processor with 4.0 GB Memory
Operating system: Windows 7 Ultimate Professional Service Pack 1 (Build 7601) 64 bit
Table created with: Arena 3.5
Download the match PGN games here.

Fried Shark 141028 x64 vs. Sugar 2.0e x64 - SF Clones War - 100R 1M1S

This is a battle of SF clones between Sugar 2.0e x64 by Marco Zerbinati and Fried Shark. Sugar is currently the strongest SF clone and number 2 in the Owl Rating List. Fried Shark is new but has existed before under a different camouflage.

Fried Shark 141028 swallowed Sugar 2.0e with a score of 53-47. There is no ELO rating; that is best left to the reader's imagination.

Fried Shark 141028 x64 vs. Sugar 2.0e x64 - SF Clones War - 100R 1M1S
Rank  Engine                   Score      Fr        Su        S-B
1     Fried Shark 141028 x64   53.0/100   ·         19-13-68  2491.00
2     Sugar 2.0e x64           47.0/100   13-19-68  ·         2491.00


100 games played / Tournament finished

Tournament start: 2014.10.31, 18:51:34
Latest update: 2014.10.31, 20:40:05
Level: Blitz 1/1
Hardware: AMD Phenom(tm) IIX4 945 Processor with 4.0 GB Memory
Operating system: Windows 7 Ultimate Professional Service Pack 1 (Build 7601) 64 bit
Table created with: Arena 3.5
Download the match PGN games here.

Wednesday, October 29, 2014

Stockfish 14102815 Tests - Possible Regression


There is a possible regression with the latest build of Stockfish 14102815 based on observation and sample tests.

I finished the gauntlet matches for Stockfish 14102712 yesterday for rating list publication but withheld the results when the resulting ELO rating came out lower than the previously published one. A new version, 14102721, was then released at abrok.eu, and I immediately tested it as a possible candidate for a blog post. About 70% of that test run was complete when another follow-up release, labeled 14102815, appeared at abrok.eu, so I decided to forgo the pending test and set up new gauntlet matches for the latest release, hoping that it would be suitable for rating list publication.

As I watched the first 10 rounds of Stockfish 14102815 against Komodo 8 and Houdini 4 Pro, I noticed that it had difficulty gaining an advantage over its two great rivals on the 3 computers it was running on. I went home and arranged another set of gauntlet matches and saw the same negative pattern. As I was about to sleep, I had a feeling that it would score lower than the previous versions after the 100-round matches. Sure enough, when I woke up and scrutinized the results, it had produced a lower score.

What went wrong? Each new release is expected to perform at least a little better than the last. I suspected that the last patch was the main cause of the negative result, so I immediately arranged a quick match at 30 seconds base + 500 milliseconds increment to check whether my suspicion was true. And it was! Stockfish 14102815 lost by 8 points to the older 14102712 version.

In the office, I hurriedly arranged gauntlet matches on the 3 computers to confirm what I had observed. This time the match was between the two latest consecutive builds, 14102815 vs. 14102721, over a longer run of 200 games per batch at a quick time control of 30 seconds + 500 ms. The aggregate result is 314.5-285.5, a 29-point advantage for the older 14102721 over the latest 14102815, which works out to a rating of approximately +8 ELO for 14102721 and -8 ELO for 14102815 in the aggregate results. The latest Stockfish 14102815 lost in all 3 batches.
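The step from a match score to an ELO figure can be sketched as follows. This is only an illustration using the usual logistic rating model, not the tool that produced the aggregate table; the function names are made up for this example.

```cpp
#include <cassert>
#include <cmath>

// ELO difference implied by a score fraction under the standard
// logistic model: expected score = 1 / (1 + 10^(-diff / 400)).
// (Hypothetical helper name, for illustration only.)
double elo_diff_from_score(double score) {
    assert(score > 0.0 && score < 1.0);  // 0% and 100% have no finite answer
    return -400.0 * std::log10(1.0 / score - 1.0);
}

// Same conversion, starting from points scored and games played.
double elo_diff_from_points(double points, int games) {
    assert(games > 0);
    return elo_diff_from_score(points / games);
}
```

Calling elo_diff_from_points(314.5, 600) gives a gap of roughly 17 ELO between the two builds, which is the same thing as placing one about 8 points above and the other about 8 points below their common average.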

This test result should be verified by the testers and the Stockfish team.

The seemingly harmless latest patch may be the source of the regression. It was described as follows:

Author: mstembera
Date: Tue Oct 28 22:23:01 2014 +0800
Timestamp: 1414506181

max_piece_type cleanup, and slight speed increase.

No functional change.

Resolves #81 


Specifically, the change goes as follows:

src/evaluate.cpp
    assert(target & (pos.pieces(C) ^ pos.pieces(C, KING)));
-   PieceType pt;
-   for (pt = QUEEN; pt > PAWN; --pt)
+   for (PieceType pt = QUEEN; pt > PAWN; --pt)
        if (target & pos.pieces(C, pt))
            return pt;
-   return pt;
+   return PAWN;
}
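For readers who want to check the "No functional change" claim at the source level, here is a small stand-alone sketch of the two loop shapes shown in the diff. It does not use Stockfish's real Position or Bitboard classes: the PieceSets container, the function names and the helper that compares them are all stand-ins invented for this example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Stand-in types, NOT the real Stockfish definitions.
using Bitboard = std::uint64_t;
enum PieceType { NO_PIECE_TYPE, PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING, PIECE_TYPE_NB };

// Lets "--pt" work on the enum, as the loops in the diff require.
inline PieceType& operator--(PieceType& pt) {
    return pt = static_cast<PieceType>(static_cast<int>(pt) - 1);
}

// Hypothetical container: one bitboard per piece type for a single colour.
using PieceSets = std::array<Bitboard, PIECE_TYPE_NB>;

// Loop shape before the patch, as shown in the diff.
PieceType max_piece_type_old(const PieceSets& pieces, Bitboard target) {
    PieceType pt;
    for (pt = QUEEN; pt > PAWN; --pt)
        if (target & pieces[pt])
            return pt;
    assert(pt == PAWN);  // the loop only ends once pt has reached PAWN
    return pt;
}

// Loop shape after the patch.
PieceType max_piece_type_new(const PieceSets& pieces, Bitboard target) {
    for (PieceType pt = QUEEN; pt > PAWN; --pt)
        if (target & pieces[pt])
            return pt;
    return PAWN;
}

// Compares both versions on every single-square target.
bool same_for_all_single_squares(const PieceSets& pieces) {
    for (int sq = 0; sq < 64; ++sq) {
        const Bitboard target = Bitboard{1} << sq;
        if (max_piece_type_old(pieces, target) != max_piece_type_new(pieces, target))
            return false;
    }
    return true;
}
```

In this simplified setting both versions return the same piece type for every target. That says nothing about speed or about how the real binaries were compiled, which the match results are actually measuring.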

Here are the statistics of the regression tests:

AGGREGATE RESULTS:
   # PLAYER                               :  RATING    POINTS  PLAYED    (%)
   1 Stockfish 14102721 x64    :    8.48          314.5        600     52.4%
   2 Stockfish 14102815 x64    :   -8.48          285.5        600     47.6%

Batch 1

Stockfish 14102815 x64 vs. Stockfish 14102721 x64 - Match 100R 30S+500ms
Rank  Engine                   Score      St         St         S-B
1     Stockfish 14102721 x64   101.5/200  ·          26-23-151  9997.75
2     Stockfish 14102815 x64   98.5/200   23-26-151  ·          9997.75


200 games played / Tournament finished

Tournament start: 2014.10.29, 14:10:59
Latest update: 2014.10.29, 15:44:19
Level: Blitz 0:30/0.5
Hardware: AMD Phenom(tm) IIX4 945 Processor with 6.0 GB Memory
Operating system: Windows 7 Ultimate Professional Service Pack 1 (Build 7601) 64 bit
Table created with: Arena 3.5
Batch 2

Stockfish 14102815 x64 vs. Stockfish 14102721 x64 - Match 100R 30S+500ms2
Rank  Engine                   Score      St         St         S-B
1     Stockfish 14102721 x64   106.0/200  ·          33-21-146  9964.00
2     Stockfish 14102815 x64   94.0/200   21-33-146  ·          9964.00


200 games played / Tournament finished

Tournament start: 2014.10.29, 01:03:54
Latest update: 2014.10.29, 03:12:59
Level: Blitz 0:30/0.5
Hardware: AMD A8-5600K APU with Radeon(tm) HD Graphics, 8GB RAM
Operating system: Linux 3.16.4 with WINE
Table created with: Arena 3.5
Batch 3

Stockfish 14102815 x64 vs. Stockfish 14102721 x64 - Match 100R 30S+500ms3
Rank  Engine                   Score      St         St         S-B
1     Stockfish 14102721 x64   106.5/200  ·          35-22-143  9957.75
2     Stockfish 14102815 x64   93.5/200   22-35-143  ·          9957.75


200 games played / Tournament finished

Tournament start: 2014.10.29, 14:06:17
Latest update: 2014.10.29, 16:15:00
Level: Blitz 0:30/0.5
Hardware: AMD A8-5600K APU with Radeon(tm) HD Graphics, 8GB RAM
Operating system: Linux 3.16.4 with WINE
Table created with: Arena 3.5
Download the test matches PGN games here.
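How much weight 600 games can carry may be estimated with a rough error margin. The sketch below pools the win-loss-draw counts of the three batch tables (26-23-151, 33-21-146 and 35-22-143 from the point of view of 14102721, i.e. 94-66-440) and puts a 95% normal-approximation interval around the ELO difference. Note that these per-batch counts add up to 314 points, half a point less than the 314.5 quoted in the aggregate table. The struct and function names are invented for this example, and the normal approximation is an assumption, not the method used for the rating list.

```cpp
#include <cassert>
#include <cmath>

// Low end, point estimate and high end of an ELO difference.
struct EloInterval {
    double low;
    double mid;
    double high;
};

// Logistic-model conversion from score fraction to ELO difference.
double elo_from_score(double score) {
    assert(score > 0.0 && score < 1.0);
    return -400.0 * std::log10(1.0 / score - 1.0);
}

// 95% interval from win/loss/draw counts, treating each game as an
// independent result worth 1, 0 or 0.5 (hypothetical helper).
EloInterval elo_interval_95(int wins, int losses, int draws) {
    const double games = wins + losses + draws;
    assert(games > 0);
    const double mean = (wins + 0.5 * draws) / games;
    const double second_moment = (wins + 0.25 * draws) / games;
    // Standard error of the mean score over all games.
    const double std_error = std::sqrt((second_moment - mean * mean) / games);
    const double z = 1.96;  // two-sided 95% under the normal approximation
    return {elo_from_score(mean - z * std_error),
            elo_from_score(mean),
            elo_from_score(mean + z * std_error)};
}
```

With elo_interval_95(94, 66, 440) the point estimate comes out at roughly 16 ELO with a margin of about 14 ELO either way, so the interval only just stays above zero. That is in line with calling this a possible regression that should be verified with more games.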

Friday, October 24, 2014

Stockfish 14102319 x64 vs. Komodo 8, Houdini 4 - 100 Rounds, 30 Seconds + 500 ms

Today, I conducted a very quick test of the latest Stockfish 14102319, which was released on October 23, 2014. The time control was 30 seconds base + 1/2 second increment. This was just to determine quickly whether Stockfish had improved over the previously released version. The last patch showed only a very small margin of wins in the short and long time control tests, so there was not much to expect. When the matches were over and I saw the results, I decided to share them together with some observations.

Well, both matches were won by Stockfish 14102319, with a score of 56-44 against Komodo 8 and 59.5-40.5 against Houdini 4. The pattern is much the same as in my regular tournaments at 1 minute base + 1 second increment and in the special longer tests at 5 minutes, 10 minutes or 60 minutes. It has always been Stockfish, Komodo and Houdini, in that order.

The general idea that Houdini is strongest in fast blitz and Komodo at long time controls is not true anymore. Stockfish rules at any time control, as can be seen from the results on most rating list sites.

Here are the results of the quick blitz:

Stockfish 14102319 x64 vs. Komodo 8 x64 - Match 100R 30S+500ms
Rank  Engine                   Score      St        Ko        S-B
1     Stockfish 14102319 x64   56.0/100   ·         31-19-50  2464.00
2     Komodo 8 x64             44.0/100   19-31-50  ·         2464.00


100 games played / Tournament finished

Tournament start: 2014.10.24, 08:27:35
Latest update: 2014.10.24, 10:49:44
Level: Blitz 0:30/0.5
Hardware: AMD Phenom(tm) IIX4 945 Processor with 4.0 GB Memory
Operating system: Windows 7 Ultimate Professional Service Pack 1 (Build 7601) 64 bit
Table created with: Arena 3.5


Stockfish 14102319 x64 vs. Houdini 4 Pro x64 - Match 100R 30S+500ms
Rank  Engine                   Score      St        Ho        S-B
1     Stockfish 14102319 x64   59.5/100   ·         34-15-51  2409.75
2     Houdini 4 Pro x64        40.5/100   15-34-51  ·         2409.75


100 games played / Tournament finished

Tournament start: 2014.10.24, 08:28:52
Latest update: 2014.10.24, 10:45:47
Level: Blitz 0:30/0.5
Hardware: AMD Phenom(tm) IIX4 945 Processor with 4.0 GB Memory
Operating system: Windows 7 Ultimate Professional Service Pack 1 (Build 7601) 64 bit
Table created with: Arena 3.5
Download the match PGN games here.
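A quick way to read 100-game results like the two above is the likelihood of superiority (LOS) figure often quoted in engine testing. The sketch below uses the common normal approximation, in which only wins and losses matter and draws are ignored. The function name is made up for this example, and the approximation itself is an assumption rather than anything Arena reports.

```cpp
#include <cassert>
#include <cmath>

// Likelihood of superiority from decisive games only, using the
// normal approximation LOS = 0.5 * (1 + erf((W - L) / sqrt(2 * (W + L)))).
// (Hypothetical helper name, for illustration only.)
double likelihood_of_superiority(int wins, int losses) {
    assert(wins >= 0 && losses >= 0 && wins + losses > 0);
    const double decisive = wins + losses;
    return 0.5 * (1.0 + std::erf((wins - losses) / std::sqrt(2.0 * decisive)));
}
```

For Stockfish against Komodo 8 (31 wins, 19 losses) this gives a LOS of a little over 95%, and against Houdini 4 Pro (34 wins, 15 losses) it is above 99%.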

Thursday, October 16, 2014

SmarThink 1.70 vs. Top Chess Engines Selection - 100 Rounds, 1M+1S

SmarThink 1.70 was also on the edge of the Top Engines Selection circle in the last post, so it is proper to determine whether it belongs to that group.

SmarThink breached the wall of the top circle, defeating 4 opponents and holding on against the strongest. It managed to sneak in and is now number 19 in the Top Engines Selection with a rating of 2758.84.

For some reason, the removal of the duplicate versions of the strong chess engines paved the way for SmarThink's successful assault.



Rank Engine True ELO Raw ELO Change Games Score% Points Win Loss Draw
1 Stockfish 14100223 x64 3165.56 305.25 7.37 100 92.50 92.5 85 0 15
2 Komodo 8 x64 3137.82 281.41 6.88 100 91.50 91.5 85 2 13
3 Houdini 4 Pro x64 3132.57 240.51 6.89 100 89.50 89.5 81 2 17
4 Gull 3 x64 3058.95 231.37 5.03 100 89.00 89.0 80 2 18
5 Rybka 4.1 x64 2968.66 130.71 6.16 100 82.00 82.0 70 6 24
6 Equinox 3.20 x64 3038.92 124.84 6.84 100 81.50 81.5 70 7 23
7 Deep HIARCS 14 2838.29 62.21 8.93 100 75.50 75.5 57 6 37
8 Chiron 2.0 x64 2840.22 48.28 10.48 100 74.00 74.0 61 13 26
9 Protector 1.7.0 x64 2871.40 9.31 7.44 100 69.50 69.5 54 15 31
10 Ice 2.0.2240 x64 2787.82 -45.50 1.93 100 62.50 62.5 50 25 25
11 Sjeng 2010 2764.36 -49.23 4.02 100 62.00 62.0 48 24 28
12 Naum 4.6 x64 2904.09 -52.94 4.79 100 61.50 61.5 44 21 35
13 Spike 1.4 2823.97 -52.94 5.66 100 61.50 61.5 41 18 41
14 Senpai 1.0 x64 2799.24 -74.85 5.78 100 58.50 58.5 45 28 27
15 Gaviota 1.0 x64 2756.30 -103.39 7.73 100 54.50 54.5 43 34 23
16 Hannibal 1.4b x64 2847.27 -135.02 1.99 100 50.00 50.0 33 33 34
17 SmarThink 1.70 2758.84 -135.02 -14.56 2000 33.95 679.0 404 1046 550
18 Shredder 12 x64 2800.00 -163.12 0.00 100 46.00 46.0 28 36 36
19 Disco Check 5.2.1 x64 2750.32 -202.43 4.07 100 40.50 40.5 26 45 29
20 Texel 1.04 x64 2787.93 -206.07 -4.73 100 40.00 40.0 25 45 30
21 Spark 1.0 x64 2782.96 -213.41 -4.55 100 39.00 39.0 20 42 38

Download the gauntlet matches PGN games here.

Arasan 17.4 x64 vs. Top Chess Engines Selection - 100 Rounds, 1M+1S

Arasan 17.4 happened to step onto the edge of the Top Chess Engines Selection circle in the last post, so this match was arranged.

The guns of the top chess engines were just too heavy for Arasan, but it managed to extract a match win against Texel 1.04. The effort was valiant, but it was pushed back down to the lower bracket where it previously belonged.
 
Rank Engine True ELO Raw ELO Change Games Score% Points Win Loss Draw
1 Houdini 4 Pro x64 3132.57 309.66 6.89 100 94.50 94.5 89 0 11
2 Stockfish 14100223 x64 3165.56 309.66 7.37 100 94.50 94.5 89 0 11
3 Komodo 8 x64 3137.82 264.60 6.88 100 93.00 93.0 87 1 12
4 Equinox 3.20 x64 3038.92 239.30 6.84 100 92.00 92.0 84 0 16
5 Naum 4.6 x64 2904.09 83.03 4.79 100 82.50 82.5 69 4 27
6 Gull 3 x64 3058.95 71.15 5.03 100 81.50 81.5 71 8 21
7 Hannibal 1.4b x64 2847.27 -0.82 1.99 100 74.50 74.5 60 11 29
8 Protector 1.7.0 x64 2871.40 -5.40 7.44 100 74.00 74.0 62 14 24
9 Chiron 2.0 x64 2840.22 -23.19 10.48 100 72.00 72.0 58 14 28
10 Deep HIARCS 14 2838.29 -31.79 8.93 100 71.00 71.0 52 10 38
11 Spike 1.4 2823.97 -60.62 5.66 100 67.50 67.5 51 16 33
12 Senpai 1.0 x64 2799.24 -64.60 5.78 100 67.00 67.0 56 22 22
13 Sjeng 2010 2764.36 -68.55 4.02 100 66.50 66.5 54 21 25
14 Shredder 12 x64 2800.00 -91.67 0.00 100 63.50 63.5 46 19 35
15 SmarThink 1.70 2758.84 -91.67 -14.56 100 63.50 63.5 49 22 29
16 Disco Check 5.2.1 x64 2750.32 -99.18 4.07 100 62.50 62.5 48 23 29
17 Spark 1.0 x64 2782.96 -102.91 -4.55 100 62.00 62.0 44 20 36
18 Gaviota 1.0 x64 2756.30 -113.99 7.73 100 60.50 60.5 46 25 29
19 Ice 2.0.2240 x64 2787.82 -113.99 1.93 100 60.50 60.5 43 22 35
20 Arasan 17.4 x64 2718.54 -188.70 -50.52 2000 27.58 551.5 291 1188 521
21 Texel 1.04 x64 2787.93 -220.33 -4.73 100 45.50 45.5 30 39 31

Download the gauntlet matches PGN games here.
