Park Neutralized Stats

One thing that gives people like me headaches is having to deal with the fact that all ballparks are different. Don’t get me wrong, the uniqueness of the sport is part of what makes baseball so special. But it’s an imperfect science when attempting to compare players from different teams.

The other day, my friend and baseball statistics guru Ryan Spaeder was asked which park was the toughest on hitters, and this was his response:

I thought it was perfect. Obviously, Coors Field is easily the most “hitter friendly” ballpark, but it is almost a no-win situation for a hitter. The Rockies have zero Hall of Famers in their 24 years of existence and their two best candidates (Larry Walker and Todd Helton) are unlikely to get in any time soon. The argument is their stats are inflated due to Coors Field, although it is impossible to say by exactly how much. We have park neutralized metrics such as OPS+ and wRC+ which include a park adjustment, and while I use them regularly, I admit they aren’t perfect.

The common practice in adjusting for ballpark is to take the ballpark factor, which estimates how a park influences run scoring compared to league average, and apply it to each hitter. However, all players receive the same adjustment, no matter if they bat left-handed or right-handed, or if they are fly ball, ground ball, pull or spray hitters, etc. The assumption that all types of hitters should be treated the same is what I’m attempting to correct.

So while I know I’m not completely fixing the problem here, I’m offering an alternative. What I have are park-adjusted career totals based on home and road splits. My first attempt was to just multiply road stats by two, but that would remove every game the player played in their home park and turn ballpark advantages into disadvantages and vice versa.

Instead, I decided to include home stats, but only at the same rate that a player would visit other parks. For example, Babe Ruth played in an era when his league had eight teams. That means that if he played in all ballparks an equal amount of time, he’d play in his home ballpark 12.5% (1/8) of his games. The other 87.5% would come from his road stats. A player in 2016 would play in their home park 6.67% (1/15) of the time.

Example

The formula is simple. For Babe Ruth’s home runs, we take his home HR/PA (347 / 5150 = .0674) and divide that by the number of teams in his league (.0674 / 8 = .00842). Next we take his road HR/PA (367 / 5473 = .0671) and multiply that by (7 / 8 = .875), which is the number of opponents in the league divided by the number of teams in the league (.0671 * .875 = .0587). Next, we add those two numbers together to get his new HR rate (.00842 + .0587 = .06712). To get his final career HR total, we multiply his rate by his career plate appearances (.06712 * 10623 = 713 HR). Surprisingly, he actually loses a HR, even though he played most of his career with a short right field at Yankee Stadium. We’ll see, however, that many players have a bigger difference in their adjusted career totals.
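
The calculation above can be sketched in a few lines of Python. This is only an illustration of the arithmetic described in the text; the function and argument names are my own, and career plate appearances are assumed to be home PA plus road PA (5150 + 5473 = 10623 for Ruth).

```python
def neutralized_total(home_count, home_pa, road_count, road_pa, teams_in_league):
    """Park-neutralized career total for one counting stat (HR, 2B, BB, ...)."""
    home_rate = home_count / home_pa          # per-PA rate in the home park
    road_rate = road_count / road_pa          # per-PA rate in all road parks
    home_share = 1 / teams_in_league          # e.g. 1/8 for an eight-team league
    road_share = (teams_in_league - 1) / teams_in_league
    neutral_rate = home_rate * home_share + road_rate * road_share
    # Scale the blended rate back up to career plate appearances.
    return neutral_rate * (home_pa + road_pa)


# Babe Ruth's home runs, using the splits quoted above.
ruth_hr = neutralized_total(347, 5150, 367, 5473, teams_in_league=8)
print(round(ruth_hr))  # 713
```

The same function works for any counting stat, which is how a player can lose home runs while gaining singles, doubles and triples.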

500 HR
After neutralizing the stats, there are no new members of the 500 HR club, although we lose six players. The biggest drop is by Mel Ott, who loses over 20% of his career total. Ott played his entire career at the Polo Grounds, where the right field foul pole was just 258 feet from home plate. During his career, left-handed batters hit about 80% more HR there than they did at the other National League parks.
What is interesting about Mel Ott and The Polo Grounds is that while it allowed Ott to hit far more home runs, it came at the expense of other hits. So we take away 104 HR, but also credit him 83 singles, 87 doubles, and 21 triples. Overall, his production was increased at home, but not by as much as his home run total would indicate.

The second biggest drop, somewhat surprisingly, belongs to Frank Thomas, who lost 90 home runs. Comiskey Park does favor HR hitters, but it’s far from the most extreme ballpark. Still, over his career, Thomas hit a HR in 6.2% of his plate appearances at home and 4.1% on the road.

David Ortiz is not only the biggest gainer on the list, but also among all players. He went from 525 HR to 589, and we’ll see that Fenway Park wreaks havoc on these neutralized stats.
HR Change

The common theme with both of these lists is that those who saw their totals increase all played in parks that suppressed home runs, while those who saw their totals decrease played in parks that favored home runs.

Now let’s look at what is probably the second most popular career batting list, the “3,000 Hit Club”. (Note: Since we only have home/away splits going back to 1913, any player who began their career before that season is not included. Thus, no Cobb, Wagner or Speaker.)
3000 Hits
Just as with the “500 HR Club”, the “3000 Hit Club” only lost players. While some players see an increase in their production and some see a decrease from these neutralized stats, as a whole, players will lose some production. This is due to home field advantage and because the majority of these neutralized stats are influenced by road stats. This may partly explain why there are no new members of either club.

Already, we see some of Fenway Park’s impact with David Ortiz’s increase in his home run total and both Yaz and Boggs seeing big decreases in their hit totals. Maybe the most telling is the list of players that saw the biggest decrease in their doubles.
Nine of the top 10 played most or all of their career at Fenway Park. You just don’t see this type of thing in other sports.

Players with Big Increases in Production

A fun thought experiment is to imagine how their careers would have turned out had Joe DiMaggio and Ted Williams been traded for each other, with DiMaggio taking advantage of the Green Monster and Williams facing a short RF porch. Instead, DiMaggio had to contend with 451 feet to left-center field at Yankee Stadium. We estimate that with a neutral park, he would hit 45 more home runs and increase his overall production, with 31 more points of OPS.

Rick Wilkins may have been the player farthest from your mind when you started reading, yet here he is. What is most amazing about him is just how drastic his home/road splits were when he spent the majority of his career at a hitter’s park in Wrigley Field. For his career, Wilkins hit .216/.298/.350 at home and .272/.366/.471 on the road. Who knows? Maybe he just got an incredible night’s sleep in hotel beds.

It’s no secret that AT&T Park favors pitchers, and what makes Buster Posey so special is that his raw stats are impressive, even before a park adjustment. But if we estimate what they would look like at more favorable parks, it becomes even more obvious that he’s on an early path to the Hall of Fame.

In the Willie Davis comment in his New Historical Baseball Abstract, Bill James describes a method for converting a player’s stats from one run environment to another. This has come to be known as the “Willie Davis Method” and it is currently used on this site and is the basis for Baseball-Reference’s neutralized stats (with some additional adjustments). The problem with this method is that it treats all batting events the same, adjusting each by the same proportion. As we have seen with Fenway Park, this is not the case. Anyway, Bill James introduced his method in Davis’s player comment because Davis spent much of his career at a horrible hitter’s park. As we can see from this neutralization method, Davis’s stats improve, and his +29 triples and +210 total bases are the most of any player in history.

As if Mike Piazza didn’t already have the most impressive statistics for any catcher in baseball history, they get even better after they are neutralized. In fact, every single home ballpark during Piazza’s career had a park factor below 1.

Players with Big Decreases in Production

Chuck Klein spent much of his career in the Baker Bowl, which was 280 ft to right field and 300 ft to right-center field. So it’s easy to see why he hit 63% of his home runs at home. When we neutralize his stats, his overall numbers are much less impressive, especially given the hitter’s era in which he played.

As we saw earlier, Wade Boggs takes a big hit, with his OPS dropping 63 points. This is similar to other Red Sox players, such as Bobby Doerr (-.084), Rico Petrocelli (-.069), Dom DiMaggio (-.059), Jim Rice (-.053) and Carl Yastrzemski (-.051).

Barry Larkin is an interesting case. He has a 21 point drop in OPS while his hit total increases by 29. The biggest change was losing 130 walks, decreasing his walk percentage from 10.4% to 8.9%. In fact, Riverfront Stadium increased walks for right-handed batters in every season of Larkin’s career. This is just a reminder that park factors are not limited to balls in play.

As mentioned above, playing at Coors Field can be a no-win situation. Obviously, Larry Walker’s stats have received a boost. But if we neutralize them, he still compares well to these two Hall of Famers.
Throw in 9 Gold Gloves and +94 fielding runs, and this should quell any fears you may have about his Coors-inflated stats.

CarGo loses 102 points in OPS, which is the most of anyone with at least 2000 career plate appearances. This may indicate that his style of play is more affected by Coors Field. It’s also possible that he has a tougher time adjusting to the different approaches opposing pitchers take on the road, as Eno Sarris suggests. This may point to a flaw in the neutralization method, especially for extreme ballparks like Coors Field.

Change in Type of Production

Hank Aaron played his career at two parks, Milwaukee County Stadium and Atlanta-Fulton County Stadium. Milwaukee favored pitchers in terms of the long ball, but Atlanta was known as “The Launching Pad” and had a big effect on home runs. Naturally, he sees a drop in his home run total, but also an increase in singles, doubles and triples. The overall level of production didn’t change much after neutralization, only the mix of hits that produced it.

Jay Buhner’s neutralized OPS is nearly identical to his actual OPS, but his peripherals see some changes. His singles and home runs increase, but his singles, doubles and walks decrease. This is just another example that a ballpark can change how a game is played while having very little impact on the run environment.

Flaws in the System

The unbalanced schedule and interleague play mean that not all teams visit the same ballpark an equal number of times. As a result, Rockies players will visit pitcher’s parks such as Petco, AT&T and Dodger Stadium more often than teams in other divisions. It’s possible this can be corrected by equalizing how often a team sees each road park. However, this would complicate the process and I’m not completely comfortable with it.

As mentioned in the Carlos Gonzalez comment, it is possible that some players’ away stats are affected due to different approaches taken by pitchers based on the ballpark. I suspect this is only the case in the very extreme and unique parks. It is something to keep an eye on.

This method only uses career statistics, which contain large samples when dealing with home/road splits. A single season may not offer a big enough sample to completely trust, especially with part-time players.

Conclusion

This method is admittedly imperfect, but it does fix the problem with applying the same run adjustment to all players. If anything, it is an alternative to other methods of neutralizing ballparks. I’m open to any suggestions on improving this method and I may publish a pitching neutralization shortly.

For those interested, I have included a spreadsheet with neutralized stats that can be viewed here. It includes all players who have at least 1000 PA and began their career after 1912.

Posted in General, Historical, Statistical Analysis | Leave a comment

Gauging the First Half

Instead of making a post about mid-season awards, which we are sure to see a few of during the All-Star break, I figured I’d try something different. Let’s take a look at how individual plays affect a team’s postseason probabilities.

Top Plays

Earlier in the season, I added a page that shows the top plays of the season in terms of win probability added. While placing a value on the importance of the individual game is interesting, we can take it one step further and look at how each play impacts a team’s playoff probability. A big hit in a game between two teams that are not in contention will have little to no effect. But a walk-off home run in a game between teams tied for the lead in a division will have a much greater impact. So let’s take a look at the biggest plays of the first half.

This list is sorted by championship win probability added (cWPA). Just as in-game win probability added shows the change in win probability in terms of percentage points, cWPA shows the change in World Series win probability. Your first thought upon seeing the cWPA values is probably how small they are. In fact, every play this season has a cWPA of less than 1 percentage point. This shows just how little impact even the most important play of the first three months of the season has on a team’s chances of winning the World Series. Another way to look at these numbers is to multiply them by 8, to see the change in probability of being one of the final 8 playoff teams.

1) Leonys Martin’s walk-off HR (0.71 cWPA)


With two outs and a runner on 2nd in the bottom of the ninth and his team down by a run, Martin fell behind in the count 1-2 on three Ryan Madson changeups. On the fourth pitch, Madson went changeup again and Martin deposited it into the right field bleachers. The walk-off increased the Mariners’ in-game win probability by 86 percentage points, but more importantly, it increased the Mariners’ probability of winning the World Series by 0.71 percentage points.
On a side note, this play is also 40th on our list, as it decreased the A’s World Series win probability by 0.38 percentage points. The change in percentage points is bigger for the Mariners since the game was of more importance, as they were ahead in the division by 1.5 games, while Oakland was 7 games back.

2) Salvador Perez’s go-ahead 2-R HR (0.63 cWPA)


In the bottom of the 8th inning with two outs, Bryan Shaw was looking to send the game to the 9th with his team up by a run. He was facing Salvador Perez, who was 1 for 12 in his career vs Shaw. But on the 1st pitch, Perez gave the fans in left field a souvenir and his team the lead. This play decreased the Indians’ World Series win probability by 0.63 percentage points. You can actually see the moment when Bryan Shaw realizes that pitcher vs batter stats are too small a sample to trust.
This play is also 6th on our list, as it increased the Royals’ World Series win probability by 0.59 percentage points.

3) Ian Desmond’s go-ahead 2-R HR (0.62 cWPA)


The next two plays on this list are from the same crazy game in Oakland. The A’s were one out away from victory with Ryan Madson on the mound, when Ian Desmond gave Texas the lead with a 2-run HR off a changeup. This play increased the Rangers’ chances of winning the World Series by 0.62 percentage points. As Desmond rounded the bases, Rangers announcer Tom Grieve noted that Madson threw one too many changeups, which seems to be a recurring theme here.
This play is also 24th on this list, as it decreased Oakland’s World Series win probability by 0.41 percentage points.

4) Khris Davis’s walk-off grand slam (0.61 cWPA)


The next half-inning, Texas closer Shawn Tolleson intentionally walked Josh Reddick to load the bases with one out. Next, Danny Valencia flew out to shallow right, which brought up Khris Davis, who had already hit two home runs in the game. Davis then ended the game on a walk-off grand slam, which left Adrian Beltre wondering “what the hell just happened”.
From the A’s perspective, this play is 27th on the list, as it increased their World Series win probability by 0.40 percentage points.

5) Yasiel Puig’s walk-off single and error by Michael Taylor (0.60 cWPA)


Puig’s single would have put runners on 1st and 2nd with one out in the inning, but it was Taylor’s gaffe (and Puig’s hustle) that allowed both runners to score. This was the culmination of Michael Taylor’s horrendous game, in which he also struck out in all five of his at bats. If you look closely enough, you can see him calculating the cWPA in his head.

Here are the rest of our top 25 plays of the first half:

Rk  Date  Play                          Team  VS   cWPA
 6  6/14  Salvador Perez HR             KC    CLE  +0.59
 7  5/21  Matt Wieters HR               BAL   LAA  +0.58
 8  7/08  Luis Valbuena HR              HOU   OAK  +0.56
 9  6/22  Yasiel Puig little league HR  WAS   LAD  -0.56
10  6/12  Jayson Werth Single           WAS   PHI  +0.56
11  5/14  Albert Pujols HR              SEA   LAA  -0.52
12  6/05  Matt Wieters 1B               BAL   NYY  +0.47
13  7/07  Troy Tulowitzki 1B            TOR   DET  +0.47
14  4/12  Geovany Soto HR               OAK   LAA  +0.45
15  5/20  Melvin Upton HR               LAD   SD   -0.44
16  5/10  Ryan Rua HR                   TEX   CHW  +0.43
17  4/12  Geovany Soto HR               LAA   OAK  -0.43
18  4/08  Starling Marte grand slam     PIT   CIN  +0.42
19  4/08  Starling Marte grand slam     CIN   PIT  -0.42
20  6/24  Adam Lind HR                  STL   SEA  -0.42
21  5/21  Jayson Werth GIDP             WAS   MIA  -0.42
22  5/28  Drew Butera 2B                CHW   KC   -0.41
23  6/23  Adonis Garcia HR              NYM   ATL  -0.41
24  5/17  Ian Desmond HR                OAK   TEX  -0.41
25  6/11  Prince Fielder HR             SEA   TEX  -0.41

Most Critical Moments

We can measure the importance that a particular play has on a game by using leverage index (LI), but this is limited to the situation in the game and it treats all games the same. Just as with WPA and cWPA above, we can take this one step further and measure the importance of the game by including the game’s championship leverage index (CLI). This number shows the importance of the game for each team, where the average game equals 1. If a win or a loss has a significant effect on the team’s playoff probability, the CLI will be greater than 1. By multiplying the LI and CLI, we can measure the importance that a play has on a team’s playoff probability. We’ll call this number pCLI (for championship leverage index by play). This number can be read as “how many times more important this situation was compared to the average play on opening day”.
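
As a rough illustration of the multiplication described above, here is a tiny Python sketch. The function name and the sample LI and CLI values are hypothetical; only the "LI times CLI" relationship comes from the text.

```python
def play_championship_leverage(li, cli):
    """pCLI: in-game leverage index scaled by the game's championship leverage index."""
    return li * cli


# Hypothetical situation: a high-leverage plate appearance (LI = 5.0)
# in a game that matters twice as much as an opening day game (CLI = 2.0).
print(play_championship_leverage(5.0, 2.0))  # 10.0
```

A pCLI of 10 would read as "ten times as important as the average play on opening day".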

Below are the top 20 most critical situations of the first half. As with the list above, some plays will appear twice, since they were important to both teams’ playoff chances.

Rk  Date  Team  Inning  Outs  Runners    Score   pCLI  Outcome
 1  5/17  TEX   Bot 9   2     Loaded     5 – 4   15.4  Khris Davis grand slam
 2  6/12  WAS   Bot 9   2     Loaded     4 – 3   14.2  Jayson Werth 1B
 3  6/05  WAS   Bot 9   2     Loaded     10 – 9  14.1  Ivan de Jesus fly out
 4  6/24  TOR   Top 9   2     Loaded     2 – 3   12.7  Michael Saunders pop out
 5  6/11  SEA   Bot 11  2     1st & 2nd  2 – 1   12.5  Kyle Seager fly out
 6  5/17  TEX   Bot 9   1     Loaded     5 – 4   12.2  Danny Valencia fly out
 7  7/07  TOR   Bot 8   2     Loaded     4 – 3   12.1  Troy Tulowitzki 1B
 8  6/11  TEX   Bot 11  2     1st & 2nd  2 – 1   12.0  Kyle Seager fly out
 9  6/11  SEA   Bot 10  2     Loaded     1 – 1   11.8  Ketel Marte fly out
10  5/06  BOS   Top 9   2     Loaded     2 – 3   11.7  Hanley Ramirez strikeout
11  6/10  SFN   Bot 9   2     1st & 2nd  3 – 2   11.5  Brandon Crawford strikeout
12  7/06  HOU   Top 9   2     Loaded     8 – 9   11.5  Dae-Ho Lee strikeout
13  6/10  LAN   Bot 9   2     1st & 2nd  3 – 2   11.5  Brandon Crawford strikeout
14  6/11  TEX   Bot 10  2     Loaded     1 – 1   11.3  Ketel Marte fly out
15  6/05  WAS   Bot 9   1     Loaded     10 – 9  11.1  Zack Cozart strikeout
16  6/05  BAL   Bot 8   2     Loaded     1 – 0   10.9  Matt Wieters 1B
17  6/05  BOS   Bot 9   2     1st & 2nd  5 – 4   10.9  Marco Hernandez strikeout
18  6/30  NYN   Top 9   2     Loaded     3 – 4   10.8  Javier Baez pop out
19  6/18  TOR   Top 9   1     Loaded     2 – 4   10.6  Josh Donaldson GIDP
20  6/21  BAL   Bot 8   2     Loaded     7 – 6   10.6  Adam Jones ground out

If we revisit these lists at the end of the season, there is a good chance they will be dominated by second half plays. The reason for this is that, just as the most important plays happen in the later innings of the game, the most important games occur near the end of the season. However, 2016 may be different since 5 of the 6 division leaders currently have at least a 5 game lead, which may lead to less enjoyable divisional races. For the sake of exciting plays and games, let’s hope some of these leads shorten.

Posted in General, Statistical Analysis | 1 Comment

Reunions

A lot of attention is being paid to the 30th anniversary of the 1986 New York Mets, especially with their reunion this weekend. In addition to being one of the best teams in baseball history, they are one of the most interesting. While I look forward to seeing the team on the field together once again, it will be on a somber note since Gary Carter, who succumbed to brain cancer four years ago, won’t be there to join them. Carter is the team’s lone Hall of Famer and is the only member of the 1986 Mets who is no longer with us.

This got me to thinking about how rare it is to have every member of a team survive multiple decades after they last took the field. So if you can get past the morbidity of this article, let’s take a look at which teams could have a reunion with all of their players still around to take the field.

Year  Team               # of Players  W-L
1978  San Diego Padres   38            84-78
1978  Seattle Mariners   34            56-104
1979  San Diego Padres   35            68-93
1980  Milwaukee Brewers  33            86-76
1980  Texas Rangers      41            76-85
1982  Milwaukee Brewers  33            95-67
1982  Texas Rangers      40            64-98
1983  Boston Red Sox     31            78-84
1983  Milwaukee Brewers  36            87-75
1983  Toronto Blue Jays  33            89-73

1978 San Diego Padres and Seattle Mariners
The earliest teams with all of their players currently living are the 1978 Padres and Mariners. The Mariners were pretty dreadful, losing 104 games in their 2nd season of existence. The Padres (84-78, 11 GB), on the other hand, were just starting to acquire big names, thanks to the advent of free agency and Ray Kroc’s deep pockets. A reunion of the 1978 Padres would include Hall of Famers Dave Winfield, Rollie Fingers and Gaylord Perry. Not to mention Gene Tenace, Randy Jones and Oscar Gamble, to name a few. But while the 1978 Padres had a number of big names, teams usually hold reunions for pennant or World Series winners.

Earliest Pennant winners with all currently living players

Year  Team               # of Players  WS Win
1982  Milwaukee Brewers  33            N
1992  Atlanta Braves     41            N
1992  Toronto Blue Jays  40            Y
1993  Toronto Blue Jays  38            Y
1995  Cleveland Indians  41            N
1996  Atlanta Braves     42            N
1997  Cleveland Indians  46            N
1997  Florida Marlins    43            Y
1999  Atlanta Braves     44            N

Brewers manager Harvey Kuenn passed away in 1988, but every one of his “Wallbangers” has survived from the 1982 team. 2017 will be the 35th anniversary of their American League Pennant and it will be great to have every member be able to attend a possible reunion.

Posted in General, Historical | 1 Comment

MLB.tv Game Changer (formerly Dashboard)

I can’t tell you how many times I’ve been watching a game on MLB.tv and all of a sudden, my Twitter feed blows up when something big happens, like Giancarlo Stanton hitting another 450+ ft home run. But I missed it because I didn’t know that he was at bat. So I decided to make something that will allow me to customize my baseball viewing experience. Something that will allow me to see as much of the baseball that I want to see as possible. I got the idea from using Dan Brooks’ (of brooksbaseball.net) MLB.tv RedZone, which switches between the games with the highest leverage index. This allowed me to see the potential of using MLB’s gameday data.

Enter the MLB.tv Dashboard, which allows you to customize what you want to watch the most, and automatically switches between games based on your priorities. Here are some of the things you can customize:

  • Batters: Want to see Bryce Harper, Mike Trout, Giancarlo Stanton, or even Bartolo Colon at the plate? Add them to your list, and the application will switch to their game when they come to the plate.
  • Pitchers: Don’t want to miss Jake Arrieta or Clayton Kershaw pitch? This will switch to their game while they are on the mound and go to another game on your list while their team is batting.
  • Baserunners: It doesn’t get much more exciting than when Billy Hamilton is on the bases. If the runner of your choosing is on 1st or 2nd base with the next base open, the application will make sure you see it.
  • Fantasy Teams: The three settings above allow you to add all of the players on your fantasy team to track their progress.
  • Teams: Let’s say you’re a huge Braves fan and that’s mostly what you want to see. But you also want to see each of Manny Machado’s at bats. Put Machado at #1 and Atlanta at #2 and you’ll get to watch your Braves games, but the application switches to Orioles games when Machado comes to the plate. It will switch back when his at bat is over.
  • Leverage Index: If you don’t want to miss a tense moment in any game, set the LI to something like “>= 3.0” and it will switch to any game that meets that criterion.
  • No-hitters: Don’t want to miss a potential no-hitter, but don’t think it’s important until after the 7th inning? You can set that as a priority.
  • Vin Scully: There are only a few more months left to appreciate the greatest announcer in baseball history. This setting will switch to Dodger games when they are at home, or on the road in San Diego, Anaheim and San Francisco.
  • Position Players Pitching: Who doesn’t love to see a left fielder pitch in a blowout or in the 18th inning? This setting will switch to a game if a non-pitcher is on the mound.
  • Extra Innings: Self-explanatory. This setting will switch to games that are in extra innings.
  • Replay Challenge/Review: Let’s say you are a masochist and you want to see all replay challenges. This will switch to those situations.

Now suppose none of your priority items are met, or your teams are on commercial break. In that event, the application will switch to the game with the current highest leverage index. It will keep switching between these games until any of your priority criteria are met.

In addition to the above priority items, you can also avoid watching any teams of your choosing. If you’re blacked out from seeing a certain team, you can add them to your ignore list and the application will avoid changing to their games since you won’t be able to watch them.

We all know MLB.tv has a delay from the actual live game to when you see it on your screen. You can adjust the delay timer setting if the games are switching too early or late. However, I recommend not changing this setting too much since it can severely alter the experience.

Finally, you can set whether or not you want the application to wait for the current at bat to finish before switching to a higher priority game, or if you want it to change immediately. The default setting is to wait. Changing the setting to switch immediately will allow you to not miss higher priority situations, but beware that you could see a lot of changing at inopportune times.
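
For the curious, the selection logic described above can be sketched roughly as follows. This is not the application’s actual code; the `Game` fields, the `choose_game` function and the rule format are all hypothetical, and the sketch leaves out the delay timer and the wait-for-at-bat setting.

```python
from dataclasses import dataclass


@dataclass
class Game:
    teams: tuple            # e.g. ("ATL", "NYM")
    batter: str
    pitcher: str
    leverage_index: float
    in_commercial: bool = False


def choose_game(games, priorities, ignored_teams=()):
    """Pick the game to show.

    priorities is an ordered list of functions that take a Game and
    return True when that priority item is met; earlier items win.
    """
    ignored = set(ignored_teams)
    watchable = [g for g in games
                 if not g.in_commercial and not (set(g.teams) & ignored)]
    if not watchable:
        return None
    for rule in priorities:
        for game in watchable:
            if rule(game):
                return game
    # No priority item is met: fall back to the highest leverage index.
    return max(watchable, key=lambda g: g.leverage_index)


# Machado at #1 and Atlanta at #2, as in the Teams example above.
priorities = [
    lambda g: g.batter == "Manny Machado",
    lambda g: "ATL" in g.teams,
]
```

Keeping each priority as a small yes/no rule is one simple way to let batters, pitchers, teams and leverage thresholds share a single ordered list.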

If you choose to use this, I hope you enjoy it. Let me know what you think.

Posted in Announcements, General, Site Additions | 7 Comments

Game Star Ratings

I’ve added a star rating to each game. It measures the “enjoyability” of a game based on a few different factors. There are many elements to a game that can make it enjoyable to the unbiased fan. I’ve tried to include the most important of these.

The rating system ranges from 0 stars to 5 stars and goes in increments of .25 stars. The average game will be around 2.5 stars.

Leverage Index (aLI)
The first and most important element is leverage index, which measures the importance of each situation in the game. The more crucial a moment in a game is to the outcome, the higher the leverage index will be. A leverage index of “1” is average. A leverage index of “3” means the situation is three times as important as the average play. FanGraphs has a primer on LI for those interested.

In the game rating formula, I use average leverage index over the course of the entire game. I could have chosen to just go with the top X plays in a game, or the number of plays over a certain threshold, but I felt the average over the course of the game is best suited to gauge the intensity of the entire game.

Win Expectancy Change (WE+/-)
Next is change in win expectancy per play. Suppose an RBI single increases a team’s win expectancy from 55% to 65%. That would obviously be an increase of 10 percentage points. I calculate the average absolute value of WE change over the course of the entire game and use that number for my rating formula.

The average play in the average game will have a win expectancy change of about 3.3 percentage points. Bigger and more exciting plays will increase this number, while plays in blowout games will do the opposite.
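
As a small sketch of the WE+/- calculation, assuming we have one team’s win expectancy (in percent) before the first play and after every play, the average absolute change could be computed like this. The function name and the sample values are hypothetical, apart from the 55% to 65% step used in the example above.

```python
def average_we_change(win_expectancies):
    """Mean absolute change in win expectancy per play, in percentage points."""
    changes = [abs(after - before)
               for before, after in zip(win_expectancies, win_expectancies[1:])]
    return sum(changes) / len(changes)


# Three plays: +5, +10 (the RBI single from 55% to 65%), then -5.
sample = [50.0, 55.0, 65.0, 60.0]
print(round(average_we_change(sample), 2))  # 6.67
```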

Leverage index and win expectancy change very likely have a high correlation, which would cause these elements to be “double counted”. I have taken that into consideration and am fine with it since they are the most important factors in gauging a game’s intensity.

Championship Leverage Index (CLI)
Championship leverage index is similar to in-game leverage index (above), except that it gauges the importance of a single game as opposed to a single play. The game importance is measured in how much a team’s probability of winning the World Series changes in a win versus a loss.

The average game will have a CLI of 1 and is equal to the average game on opening day. In the 2nd Wild Card era (2012-present), the average game on opening day can change a team’s chances of winning the World Series by 0.59 percentage points.

The CLI used in this ratings formula is the average of the two teams’ CLIs for the game.

Examples: A team that is already eliminated has a 0% chance of winning the World Series. A win will not increase their chances, so their CLI will be 0. The same goes for a team that has already clinched their division. A division title ensures that a team is 1 of the 8 teams in the postseason tournament, meaning they have a 12.5% chance of winning the World Series. A win or a loss after clinching the division will not change this number. But a one-game playoff for the division (game 163) is a “win or go home” scenario and will have a CLI of around 21, since it is 21 times more important than the average game on opening day.
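
The examples above are consistent with treating CLI as the win-versus-loss swing in World Series probability divided by the 0.59 point opening day swing. Here is a minimal sketch under that assumption; the function and constant names are my own, and the real calculation of the underlying probabilities is not shown.

```python
OPENING_DAY_SWING = 0.59  # percentage points, 2nd Wild Card era figure from the text


def championship_leverage(ws_pct_if_win, ws_pct_if_loss, baseline=OPENING_DAY_SWING):
    """CLI as the World Series probability swing relative to an opening day game."""
    return (ws_pct_if_win - ws_pct_if_loss) / baseline


print(championship_leverage(0.0, 0.0))              # eliminated team: 0.0
print(championship_leverage(12.5, 12.5))            # clinched division: 0.0
print(round(championship_leverage(12.5, 0.0), 1))   # game 163: 21.2
```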

Comeback (CB)
The final element is comeback, which is defined as the highest win expectancy the losing team reached during the game. A comeback can range anywhere from 50 percentage points to 100 percentage points. A comeback of 100 percentage points means that the losing team had a 100% chance of winning, but still managed to lose the game. A comeback of 50 percentage points means the losing team was never able to increase their win expectancy above the 50% level at the beginning of the game and likely means the game was never much in doubt.
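
A quick sketch of the comeback element, assuming we have the home team’s win expectancy (in percent) at the start of the game and after each play. The function name and the sample series are hypothetical.

```python
def comeback(home_win_expectancies, home_team_won):
    """Highest win expectancy (in percent) reached by the team that lost."""
    if home_team_won:
        # The loser is the road team, whose win expectancy is the complement.
        loser_series = [100.0 - we for we in home_win_expectancies]
    else:
        loser_series = list(home_win_expectancies)
    return max(loser_series)


# Home team falls to a 15% win expectancy, then rallies to win.
print(comeback([50.0, 40.0, 15.0, 30.0, 100.0], home_team_won=True))  # 85.0
```

Because every game starts at 50%, a series that includes the starting value can never produce a comeback below 50.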

Formula and Weights
Each of the four elements (aLI, WE+/-, CLI, CB) is individually compared to a large sample of games and assigned a percentile. These percentiles are then weighted and combined to create the star rating. The weights are:
aLI = 1.5
WE+/- = 1.5
CLI = 1
CB = 1

Example: A game has an average leverage index of 1.25, an average win expectancy change of 4.5 percentage points, a championship leverage index of 1.55, and an 85% comeback. Their percentiles and weights are:
aLI = 70 * 1.5
WE+/- = 82 * 1.5
CLI = 92 * 1
CB = 90 * 1

Their sum is 410. This number is divided by 25 and rounded to the nearest whole number. It is finally divided by 4 to give you the star rating. This game would be a 4 star game (410 / 25 = 16.4, which rounds to 16, and 16 / 4 = 4).
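
The weighting and rounding steps can be sketched in Python as follows. The dictionary keys and function name are my own, and since the text does not say how exact halves are rounded, rounding them up is an assumption here.

```python
import math

# Weights from the list above.
WEIGHTS = {"aLI": 1.5, "WE": 1.5, "CLI": 1.0, "CB": 1.0}


def star_rating(percentiles):
    """Combine the four percentiles (0-100) into a 0-5 rating in 0.25 steps."""
    total = sum(percentiles[name] * weight for name, weight in WEIGHTS.items())
    # Divide by 25 and round to the nearest whole quarter-star count
    # (ties rounded up, which is an assumption).
    quarter_stars = math.floor(total / 25 + 0.5)
    return quarter_stars / 4


print(star_rating({"aLI": 70, "WE": 82, "CLI": 92, "CB": 90}))  # 4.0
```

With all four percentiles at 100 the weighted sum is 500, which works out to the 5 star maximum.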

Elements of a game not currently included in star rating system

Individual Game Performances and Milestones
A player hitting 4 HR in a game is exciting and uncommon and makes each of the at bats more important. A pitcher taking a no-hitter or perfect game late into the game has the same effect. These types of elements are currently not included, but are “on the table” for future versions.

Star Players
One could argue that having more superstar players in a game makes it more enjoyable. This rating system does not take the players’ superstar status or skill level into account.

Special Games
While Derek Jeter’s final home game was exciting in its own right, I would argue that it was even more enjoyable since it was his final game at Yankee Stadium. This rating system doesn’t take these rare situations into account.

The Home Crowd’s Enjoyment
As mentioned above, this star rating measures the enjoyment for the unbiased fan. The home crowd may have a different definition of an enjoyable game based on whether their team wins, but this system makes no such distinction.


2016 Retrosheet Database

One of the best days of every offseason for a baseball nerd is Retrosheet’s annual end-of-season release day. It’s the day one of the best sites on the internet releases the play-by-play data from the previous season. If you’re like me, you download it immediately and go to town. But one thing I’ve always wanted was the ability to access the data during the current season.

So this past offseason, I designed a way to take MLB’s Gameday data and convert it into a Chadwick-style Retrosheet database. The database (.csv files) will be available and updated daily* in the downloads section. I’m making it available mainly because I know there are others out there, like me, who are interested in having an in-season play-by-play database. But also because I’d like to have more than one set of eyes on it, to help iron out the kinks and catch any errors.

Error Checking
I run a few processes to check for errors and to validate the data. But there is still the possibility that errors will come up from time to time. I’d like to make this a forum for error reporting, for those who are interested in helping.

Daily Download
Just as with this website, I intend to have updates available daily. Usually, the site is updated in the morning. But with a full-time job and two toddlers at home, there can sometimes be a delay.

Missing columns
There are a few columns in the events table that I have left blank:
“EVENT_TX”: It turns out that it is a huge process to replicate this. While I believe the “EVENT_TX” column is helpful in quickly identifying the play, I don’t use it in my queries and felt it wasn’t worth the hassle. The same goes for the “BAT_PLAY_TX” and the “RUNX_PLAY_TX” columns.

“BATTEDBALL_LOC_TX”: Gameday does include hit locations for all balls in play, but I have yet to dive into this data. If there is someone who has experience with this data and is willing to assist in converting Gameday’s x and y coordinates to Project Scoresheet location codes, please let me know.

“UMP_ID”: These columns for the six umpires are currently left blank.

“GWRBI_BAT_ID”: This is left blank because game winning RBI’s are no longer officially recorded.

ID’s for players making their Major League debut
Since these players have yet to be assigned official IDs by Retrosheet, I just give them the next available ID for their name. For example, if a John Smith were to make a debut, he would be assigned ID “smitj005”, since 005 would be next in line.
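As a rough illustration of the "next available ID" idea, here is a small Python sketch. It assumes the ID is built from the first four letters of the last name, the first initial, and a three-digit sequence number, which is simply what the “smitj005” example suggests. The function name and inputs are hypothetical, and the sketch does not handle last names shorter than four letters or names with punctuation.

```python
def next_debut_id(first_name, last_name, existing_ids):
    """Return the next unused ID for a name, e.g. 'smitj005' after 'smitj004'.

    Assumed format (inferred from the example, not an official rule):
    4 letters of last name + first initial + 3-digit sequence number.
    """
    prefix = last_name.lower()[:4] + first_name.lower()[:1]
    # Collect the sequence numbers already taken for this prefix.
    used = [
        int(pid[len(prefix):])
        for pid in existing_ids
        if pid.startswith(prefix) and pid[len(prefix):].isdigit()
    ]
    next_number = max(used, default=0) + 1
    return f"{prefix}{next_number:03d}"


taken = ["smitj001", "smitj002", "smitj003", "smitj004"]
print(next_debut_id("John", "Smith", taken))  # smitj005
```

A temporary ID assigned this way may need to be swapped out later if the official ID turns out to be different.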

Building a Retrosheet Database
For those who are interested in using the data, but lack experience, David Temple at TechGraphs recently created a helpful two part tutorial.

Donations
If you find this data useful and have some disposable income, please consider donating. I do not get paid for my work on this website and while it is my passion to work with baseball data, it does take a lot of time and money (server costs) to keep it up. I’d like to also suggest donating to David Smith and the Retrosheet team.






2011 Royals Farm System

The Royals are on the brink of winning their first World Series in 30 years, leading the Mets three games to one. While it’s certainly not over (5 of 43 teams have come back from being down 3-1 in the World Series), I figured it was as good a time as any to write about the Royals’ 2011 farm system.

Compilation of Royals Top Prospect Rankings
Prospect Pos FG BA BP Sickels Avg
Mike Moustakas 3B 1 3 1 1 1.5
Eric Hosmer 1B 2 1 3 2 2.0
Wil Myers OF 3 2 4 3 3.0
John Lamb LHP 5 4 2 6 4.3
Mike Montgomery LHP 4 5 5 5 4.8
Danny Duffy LHP 7 7 7 4 6.3
Chris Dwyer LHP 9 8 6 9 8.0
Christian Colon SS 8 6 8 11 8.3
Jeremy Jeffress RHP 11 8 9.5
Brett Eibner OF 12 10 14 10 11.5
Tim Collins LHP 13 13 10 16 13.0
Aaron Crow RHP 14 9 16 14 13.3
Tim Melville RHP 15 14 15 14.7
Johnny Giavotella 2B 21 18 9 12 15.0
Yordano Ventura RHP 10 12 13 27 15.5
Cheslor Cuthbert 3B 17 15 12 19 15.8
Louis Coleman RHP 20 19 17 13 17.3
Jason Adam RHP 16 11 15 28 17.5
Robinson Yambati RHP 19 16 11 26 18.0
Salvador Perez C 18 17 20 18 18.3
Patrick Keating RHP 22 22 17 20.3
Will Smith LHP 19 25 22.0
Derrick Robinson OF 23 26 18 22.3
Jarrod Dyson OF 26 20 23.0
Kevin Chapman LHP 23 23.0
David Lough OF 24 25 22 23.7
Jeff Bianchi SS 30 21 21 24.0
Orlando Calixte SS 24 24.0
Clint Robinson 1B 28 20 24.0
Buddy Baumann LHP 24 24.0
Noah Arguelles LHP 25 25.0
Humberto Arteaga SS 27 23 25.0
Henry Barrera RHP 27 27.0
Crawford Simmons LHP 28 28.0
Lucas May C 29 29.0
Elisaul Pimentel RHP 29 29.0
Kelvin Herrera RHP 30 30.0
Greg Holland RHP

Below is a list of the top ten farm systems in 2011, using five different rankings from the industry. The top three were a consensus among most of the publications, with Kansas City topping the list in each one.

Compilation of Farm System Rankings
Rank Team BA BP Law Sickels THT Avg
1 Royals 1 1 1 1 1 1.0
2 Rays 2 2 2 2 2 2.0
3 Braves 3 3 3 4 3 3.2
4 Blue Jays 4 5 4 5 4 4.4
5 Yankees 5 4 9 6 14 7.6
6 Reds 6 9 8 7 11 8.2
7 Indians 7 7 17 3 9 8.6
8 Angels 15 6 6 8 8 8.6
9 Phillies 10 8 5 11 12 9.2
10 Twins 12 15 7 9 6 9.8
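The Avg column in both tables above appears to be the plain mean of whichever rankings a player or team received, with unranked entries skipped rather than counted as zero. Here is a minimal Python sketch of that calculation. The function name is hypothetical, and for rows with a missing ranking the position of the blank is arbitrary, since the tables don't show which publication left the player off.

```python
def average_rank(ranks):
    """Mean of the available rankings; None marks a list that omitted the entry."""
    available = [r for r in ranks if r is not None]
    if not available:
        return None  # not ranked anywhere, so no average
    return sum(available) / len(available)


# Angels: BA 15, BP 6, Law 6, Sickels 8, THT 8
print(round(average_rank([15, 6, 6, 8, 8]), 1))  # 8.6

# Tim Melville has three listed rankings (15, 14, 15); which source is
# missing is not recoverable, so the None placement here is arbitrary.
print(round(average_rank([15, 14, 15, None]), 1))  # 14.7
```

Skipping missing rankings flatters players who only appear on the deeper lists, which is worth keeping in mind when reading the bottom of the prospect table.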

Rankings alone won’t do the praise for this system justice, so here are some of the comments coming from those who ranked them:

Kevin Goldstein at Baseball Prospectus:

This is not just the best minor-league system in baseball, it’s the best by a wide margin. The more I wrote about these prospects, the more trouble I had figuring out any way for things to go wrong. Another winning record could occur as early as 2012, but more importantly, the team should return to annual playoff contention shortly thereafter.

Keith Law at ESPN:

The phrase “Mission Accomplished” has acquired an ironic connotation of late, but if anyone could use the phrase earnestly to describe his own efforts, it would be [Dayton] Moore, as the Royals have arms coming out of their ears.

That’s particularly impressive when you consider that Kansas City’s top two prospects are bats, and there are some solid position player prospects further down in the system.

Jim Callis at Baseball America:

The Royals set a record by placing nine players on our Top 100 Prospects list, starting with three of the very best hitting prospects in the minors in 1B Eric Hosmer, 3B Mike Moustakas and OF Wil Myers. They also have an enviable collection of lefthanders, led by John Lamb, Mike Montgomery, Danny Duffy and Chris Dwyer.

John Sickels at minorleagueball.com:

What can you say? This is one hell of a farm system. While the young pitching gets a large amount of attention, and deservedly so, the Royals also have three of the most elite young bats in baseball in the Moustakas/Hosmer/Myers troika.

Matt Hagen at The Hardball Times:

Kansas City has a plethora of top-end impact talent and loads of depth throughout. The best system in baseball and a reason to follow America’s pastime for long-suffering Royal fans.

It’s difficult to find one negative comment about this farm system. It contained high impact talent AND depth at almost all positions. Years of futility earned the Royals multiple early round draft picks. From 2005-2010, they had a top five pick on five different occasions. Additionally, Kansas City was active on the international front, signing players out of the Dominican (Kelvin Herrera & Yordano Ventura), Venezuela (Salvador Perez), and even Nicaragua (Cheslor Cuthbert).

In 2011, Doug Gray at minorleagueball.com assigned a monetary value to every prospect in baseball based on John Sickels’ grading system. He estimated the Kansas City farm system to be worth $243 million while the next best team (Tampa Bay) was worth $184 million. Here is a graph Doug provided, showing all farm systems in 2011:
2011 Farm System dollar values

But having a top farm system has never guaranteed success. Scott McKinney at Royals Review studied prospect success and failure rates and determined that 70% of Baseball America Top 100 prospects are failures. In August of this year, Alex Speier wrote in the Boston Globe:

Remarkably, none of the last 14 organizations to be designated with the top farm system by Baseball America has won a World Series since receiving that accolade. The last team to hoist a championship trophy following a top farm system ranking was the 2005 White Sox, four years after they’d been named the top farm system in 2001.

Of course, this will change if the Royals can win just one of the next three World Series games.

So what does a team with the best farm system need to do to reach the next step?
First, they need to continue to develop these players, as none of them are finished products.
The Royals did just that, as a good number of their prospects reached the big league level. Of course, not all of them have reached their potential, but that is expected.

Next, they need to surround this core with complementary players.
Whether through free agency or via trade, adding talent from outside the organization may be where Dayton Moore has done his best work. He parlayed Wil Myers, Jake Odorizzi & Mike Montgomery into James Shields and, more importantly, Wade Davis. He also signed Chris Young, Edinson Volquez, Kendrys Morales & Ryan Madson to fill out the roster. Finally, at the trade deadline this year, Moore traded prospects to acquire the final pieces of the puzzle in Johnny Cueto and Ben Zobrist.

Finally, they need luck.
Because sometimes no matter how hard you try, things just don’t go as planned. On the other hand, there’s always a chance to find that diamond in the rough that you weren’t expecting. As Branch Rickey said, “Luck is the residue of design.” It’s hard to tell how much of the Royals’ success is luck, but by putting the organization in the best situation possible, they have been in position to capitalize on many breaks.

We can now look back at the 2011 farm systems to see where Kansas City’s ranks among the 30 teams in terms of wins above replacement. Granted, it is far too early to make a final judgement on these systems as most of the players are still beginning their careers. Below is a table showing how many wins above replacement each team’s 2011 farm system has produced, along with their winning percentages in subsequent seasons. The top prospect is the player with the most WAR in that farm system.

# Teams WAR 2011 2012 2013 2014 2015 Top Prospect
1 DBacks 100.4 .580 .500 .500 .395 .488 Paul Goldschmidt
2 Royals 91.4 .438 .444 .531 .549 .586 Salvador Perez
3 Angels 88.2 .531 .549 .481 .605 .525 Mike Trout
4 Braves 85.3 .549 .580 .593 .488 .414 Andrelton Simmons
5 Cardinals 74.9 .556 .543 .599 .556 .617 Matt Carpenter
6 Rays 70.6 .562 .556 .564 .475 .494 Desmond Jennings
7 Reds 70.3 .488 .599 .556 .469 .395 Todd Frazier
8 Indians 63.9 .494 .420 .568 .525 .503 Jason Kipnis
9 Pirates 61.6 .444 .488 .580 .543 .605 Starling Marte
10 Nationals 60.6 .497 .605 .531 .593 .512 Bryce Harper
11 Blue Jays 59.2 .500 .451 .457 .512 .574 Brett Lawrie
12 Mets 58.1 .475 .457 .457 .488 .556 Matt Harvey
13 White Sox 56.8 .488 .525 .389 .451 .469 Chris Sale
14 Astros 56.3 .346 .340 .315 .432 .531 Jose Altuve
15 Mariners 50.7 .414 .463 .438 .537 .469 Kyle Seager
16 Athletics 49.8 .457 .580 .593 .543 .420 Josh Donaldson
17 Twins 46.8 .389 .407 .407 .432 .512 Brian Dozier
18 Dodgers 42.7 .509 .531 .568 .580 .568 Kenley Jansen
19 Yankees 42.2 .599 .586 .525 .519 .537 Jose Quintana
20 Padres 42.2 .438 .469 .469 .475 .457 Anthony Rizzo
21 Rockies 41.5 .451 .395 .457 .407 .420 Nolan Arenado
22 Orioles 40.5 .426 .574 .525 .593 .500 Manny Machado
23 Red Sox 40.0 .556 .426 .599 .438 .481 Josh Reddick
24 Marlins 38.1 .444 .426 .383 .475 .438 Christian Yelich
25 Giants 36.5 .531 .580 .469 .543 .519 Brandon Crawford
26 Cubs 33.4 .438 .377 .407 .451 .599 Welington Castillo
27 Brewers 26.7 .593 .512 .457 .506 .420 Mike Fiers
28 Tigers 24.4 .586 .543 .574 .556 .460 Drew Smyly
29 Phillies 23.6 .630 .500 .451 .451 .389 Jarred Cosart
30 Rangers 23.4 .593 .574 .558 .414 .543 Pedro Strop

*Note: I have only included players with positive career WAR in these totals.

Surprisingly, the Diamondbacks have accumulated the most WAR of any team from the 2011 prospect class. However, Kansas City is not far behind in second place. Whether or not this farm system turns out to be among the all-time greats remains to be seen. But if they end up winning the World Series in the next few nights, it will be impossible not to deem it a success.

Thanks to Hawkins DuBois for helping out with prospect lists.


Top Plays of the 2015 Postseason (so far)

During the postseason, I have been making series win probability charts. They’re available on the front page of the site and regularly on Twitter. I’ve also been calculating championship win probability added (cWPA) for all players. This is similar to single-game win probability added, but in the context of a postseason series. cWPA takes into account the level of the postseason series. Each series is twice as important as the previous level. For example, the average play in the League Championship Series is twice as important as the average play in the Division Series. On the extreme ends, the World Series is eight times as important as the Wild Card game.
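To make the doubling concrete, here is a rough Python sketch of how a change in series win probability could be scaled into a championship figure. It is not the actual cWPA calculation; it assumes every later round is a coin flip, so winning a Division Series is worth a 25% shot at the title and winning an LCS is worth 50%. The round labels and function names are my own.

```python
# Postseason rounds in order; each is twice as important as the one before.
ROUNDS = ["WC", "DS", "LCS", "WS"]


def round_weight(round_name):
    """Importance of a round relative to the Wild Card game (1, 2, 4, 8)."""
    return 2 ** ROUNDS.index(round_name)


def approx_cwpa(series_wp_before, series_wp_after, round_name):
    """Approximate championship win probability added by one play.

    Assumption: each remaining round after this one is a 50/50 proposition.
    """
    rounds_left = len(ROUNDS) - 1 - ROUNDS.index(round_name)
    title_odds_if_series_won = 0.5 ** rounds_left
    return (series_wp_after - series_wp_before) * title_odds_if_series_won


print(round_weight("WS") / round_weight("WC"))  # 8.0
# A Division Series swing from 64% to 94% comes out to roughly 0.075.
print(approx_cwpa(0.64, 0.94, "DS"))
```

Under the coin-flip assumption, the same 30-point swing is worth about 0.15 in an LCS and 0.30 in the World Series. The listed cWPA values differ slightly from this approximation, presumably because the series win probabilities shown are rounded and the real calculation is more detailed.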

What follows are the top eleven plays during the 2015 postseason, according to championship win probability added. Note: .077 cWPA can be interpreted as increasing a team’s probability of winning the World Series by 7.7 percentage points.

1) Jose Bautista homers off Sam Dyson (.077 cWPA)

Game Five ALDS
Toronto’s ALDS win probability went from 64% to 94% on this swing. Whether you agree with the bat flip or not, you can’t deny the impact of the play. According to championship win probability added, it was the second biggest homerun by a Blue Jay in postseason history. Of course, the biggest was Joe Carter’s walk-off in Game Six of the 1993 World Series, which was worth .300 cWPA.

2) Jose Bautista homers off of Ryan Madson (.063 cWPA)

Game Six ALCS
Bautista has no shortage of flair for the dramatic. This homerun increased the Blue Jays’ series win probability by 13 percentage points (5% to 18%). While his homerun off Dyson a week earlier was a 30 point increase in series win probability, this one came in the ALCS, a round twice as important as the ALDS, making the two very comparable in value.

3) Luis Valbuena homers off of Johnny Cueto (.058 cWPA)

Game Five ALDS
The two runs from Valbuena’s shot were the first on the board in the deciding Game Five and they increased Houston’s series win probability from 42% to 65%. Unfortunately for Valbuena and the Astros, they were unable to hold the lead.

4) Daniel Murphy homers off of Zack Greinke (.051 cWPA)

Game Five NLDS
This wasn’t the most memorable play of the series. That goes to the Chase Utley slide. It’s probably not even the most memorable of the game. That would be Murphy’s stolen base after the Lucas Duda walk. However, it did have the most impact. It increased New York’s series win probability from 41% to 62%, giving the Mets a lead they wouldn’t relinquish.

5) Edwin Encarnacion homers off of Cole Hamels (.051 cWPA)

Game Five ALDS
Encarnacion’s homerun tied the game, albeit briefly, and increased Toronto’s series win probability from 39% to 60%.

6) Wade Davis strikes out Ben Revere (.048 cWPA)

Game Six ALCS
The biggest non-homerun of the postseason. Revere struck out in just 10% of his plate appearances during the regular season, but Davis was able to get him swinging with runners at 2nd and 3rd with one out in the 9th. However, there was a clearly questionable strike two call on a pitch that should have been ball three. The strikeout increased the Royals’ series win probability from 83% to 92%.

7) Carlos Correa error, scoring two runs (.047 cWPA)

Game Four ALDS
This is an even bigger play when you consider the alternative, a double play that scores just one run. Instead, this play increased the Royals’ series win probability from 22% to 41%. Correa had been having a tremendous game, with two homeruns earlier. It should be noted that the ball did skip off of Tony Sipp’s glove, making for a very unusual hop for Correa.

8) Rougned Odor scores on Russell Martin’s error (.047 cWPA)

Game Five ALDS
The most bizarre play of the 2015 postseason, very unlikely to be topped. Choo’s bat, Martin’s misfortune and Odor’s heads-up base running increased the Rangers’ series win probability from 43% to 61%.

9) Eric Hosmer drives in Lorenzo Cain from first base (.045 cWPA)

Game Six ALCS
Multiple people deserve credit for this play: Hosmer for the single, Cain for the amazing hustle, Bautista for not hitting the cutoff man, and third base coach Mike Jirschele for the awareness to send Cain after Bautista threw to Tulowitzki. All in all, the play increased the Royals’ series win probability from 87% to 96%.

10) Daniel Murphy’s RBI double off of Zack Greinke (.042 cWPA)

Game Five NLDS
New York got an early run in the first inning on Murphy’s double. Kiké Hernandez had trouble picking up the ball off the wall, enabling Murphy to take third base. Murphy should be given credit for hustling rather than coasting into second base. This play increased the Mets’ series win probability from 44% to 61%.

11) Javier Baez homers off of John Lackey (.042 cWPA)

Game Four NLDS
Baez, who was starting in place of the injured Addison Russell, gave the Cubs a lead on the first pitch from John Lackey. Baez was batting ninth, behind the pitcher Jason Hammel, who extended the inning on a single to center field. The homerun increased the Cubs’ series win probability from 70% to 84%.

It should be noted there are sure to be bigger plays in the World Series, given the greater importance of the postseason’s highest level. In the 2014 postseason, the top 18 plays (according to cWPA) came in the World Series.


Greatest Postseason Comebacks

The Houston Astros and Kansas City Royals are headed for a decisive Game Five tonight. On Monday in Game Four, the Astros held a 6-2 lead and were just six outs from advancing to the ALCS. With one out in the bottom of the seventh, the Royals had a 0.8% series win expectancy. If Kansas City wins tonight, their comeback will go down as one of the greatest in postseason history. If they lose, it will be all for naught.

The following are the fifteen biggest comebacks in postseason history according to series win expectancy. These are comebacks from the brink of elimination, so you will not see single game comebacks such as Game Four of the 1929 World Series when the Athletics scored ten runs in the seventh inning after being down 8-0. This is because Philadelphia was not close to elimination.

15) 2012 NLDS

Giants defeat Reds (7.0% series win expectancy)

Background:
In 2012, MLB added a second wild card. To fit the wild card game in between the end of the regular season and the beginning of the division series, they changed the format of the division series from 2-2-1 to 2-3 in order to eliminate an extra travel day. This made it more difficult for the Giants, who lost both games at home and would have to win the final three in Cincinnati.

Low Point:
During the second inning of Game Three, the Reds held a 1-0 lead. Cincinnati was just seven innings from completing the sweep against the Giants.

The Outcome:
The Giants tied the game and that score held until extra innings, when San Francisco scored on an error to force a fourth game. San Francisco never trailed during Game Four and the big blow came in the fifth inning of Game Five on a Buster Posey grand slam off of Mat Latos.

14) 1992 NLCS

Braves defeat Pirates (6.6% series win expectancy)

Background:
This was essentially the Pirates’ last opportunity at a World Championship. The nucleus of the team that lost both the 1990 and 1991 championship series was breaking apart. Barry Bonds and Doug Drabek were all but gone at the end of the season because the Pirates wouldn’t be able to compete with large market clubs in free agency.

Meanwhile, the Braves, with one of the best farm systems in baseball and the backing of billionaire owner Ted Turner, were looking for their second straight World Series appearance.

Low Point:
Going into the bottom of the ninth during Game Seven, Atlanta was down 2-0. They were facing Doug Drabek, who was just three outs away from a complete game shutout.

The Outcome:
The Braves quickly loaded the bases on a double, an error, and a base on balls. Pirates manager Jim Leyland brought in his closer, Stan Belinda, in hopes of stopping the rally. Atlanta scored their first run on a sacrifice fly and reloaded the bases on a Damon Berryhill walk. A popout by pinch hitter Brian Hunter gave the Pirates their second out. With the pitcher’s spot due up, Braves manager Bobby Cox went with Francisco Cabrera, who had just eleven plate appearances all season.

In the outfield, Pirates center fielder Andy Van Slyke motioned to left fielder Barry Bonds to move in a few steps. Bonds did not oblige and it proved to be costly. Cabrera singled to left, easily scoring David Justice. Sid Bream rounded third as Bonds fielded the ball and threw home. The throw was a little off-line and Bream was safe by inches. Had Bonds moved in, most baseball fans might not be able to recognize Francisco Cabrera’s name.

13) 2003 ALCS

Yankees defeat Red Sox (6.0% series win expectancy)

Background:
It had been 85 years since the Red Sox last won a World Series. During that time, the rival Yankees had won 26 World Championships. It was known as “The Curse of the Bambino” and the Red Sox were going to have to get through their rival to end the suffering. The series was everything that Fox could have dreamed of and more: the two biggest markets, literally fighting their way to the World Series.

Low Point:
In the bottom of the eighth in Game Seven, the Yankees were down 5-2 with just five outs remaining. On the mound for Boston was Pedro Martinez, who was going through arguably the most dominant stretch by any pitcher in baseball history.

The Outcome:
After the Yankees quickly scored their third run, Red Sox manager Grady Little went to the mound, but ultimately chose to stick with Martinez, rather than go with late inning relievers Mike Timlin or Alan Embree. The decision would ultimately be the defining moment in Little’s career as the Yankees tied the score just two batters later.

The game went into the 11th, when Yankees third baseman Aaron Boone hit just the second pennant-clinching walk-off homerun in LCS history.

12) 1960 World Series

Pirates defeat Yankees (5.9% series win expectancy)

Background:
This was the classic David vs Goliath matchup. The Yankees were in their eleventh World Series in the last fourteen years while the Pirates were the champions of the senior circuit for the first time in 33 years. While the two teams split the first six games, the Yankees had outscored the Pirates 46-17.

Low Point:
In the top of the eighth in Game Seven, the Yankees scored two runs to increase their lead to 7-4. The Pirates had just six outs remaining vs Bobby Shantz.

The Outcome:
The bottom of the eighth began with three straight singles, which cut the lead to two. Yankees manager Casey Stengel replaced Shantz with Jim Coates, who then retired the first two batters he faced. Roberto Clemente followed with a single that cut the lead to one. That brought up backup catcher Hal Smith, who entered the game in the sixth inning. At the beginning of Smith’s plate appearance, the Pirates had a series win probability of 29%. Five pitches later, Smith hit the biggest series changing homerun in baseball history. The three-run shot gave the Pirates a 9-7 lead and a series win probability of 94%.

Unfortunately for Hal Smith and the Pirates, the Yankees scored two in the top of the ninth to tie the game. What followed would be the first World Series-clinching walk-off homerun in history. The highlight of Bill Mazeroski dodging Pirates fans while waving his helmet around would be seen countless times by baseball fans, all while Hal Smith’s contribution would go largely forgotten.

11) 1995 ALDS

Mariners defeat Yankees (5.7% series win expectancy)

Background:
The Mariners shouldn’t have been in the playoffs. On August 15th, they were 12.5 games behind the division leading Angels and had a 0.4% chance of winning the division. Over the final month and a half of the regular season, they battled back and won a one-game tiebreaker.

Low Point:
During the third inning of Game Four, the Yankees were up 5-0 and just six innings away from advancing to the ALCS.

The Outcome:
Seattle quickly scored four runs in the fourth inning and eventually took the lead for good in the eighth inning of Game Four, forcing a deciding Game Five.

Through seven innings of Game Five, New York held a 4-2 lead. Their starter, David Cone, had thrown 118 pitches and went back to the mound to face the heart of the Mariners’ order in the eighth. A Ken Griffey Jr solo homerun and a Doug Strange bases loaded walk tied the game.

Randy Johnson, who had thrown seven innings just two days before, came in during the ninth inning to stop a Yankees rally and then struck out the side in the tenth. The Yankees would finally get a run across on him in the eleventh, moving just three outs away from a series victory.

The bottom of the eleventh would go on to be arguably the greatest moment in Mariners history. Facing New York’s ace, Jack McDowell, Seattle would get back-to-back singles by Joey Cora and Ken Griffey Jr. Designated Hitter Edgar Martinez would follow with a two-run double down the left field line as Griffey came around to score the winning run. The 1995 Mariners created so much excitement in Seattle that they are often credited with saving baseball in the city, as they would soon get funding for what would become Safeco Field.

10) 2012 NLDS

Cardinals defeat Nationals (3.7% series win expectancy)

Background:
After so many years in the basement of the National League East, Washington was finally benefiting from multiple early first round draft picks. During the regular season, they won a franchise record 98 games. The Cardinals were defending World Series champions and in their ninth postseason in the previous thirteen years.

Low Point:
The two clubs split the first four games, going into a deciding Game Five in Washington. The Nationals jumped out to a 6-0 lead, hitting three homeruns in the first three innings off Cardinals ace Adam Wainwright.

The Outcome:
St. Louis slowly chipped away, scattering five runs over the next five innings. They would go into the top of the ninth down by two, facing the Nationals’ closer, Drew Storen.

A double and two walks loaded the bases with two outs. Daniel Descalso singled in two runs to tie the score and Pete Kozma singled in two more to take a 9-7 lead. Cardinals closer Jason Motte provided a perfect bottom of the ninth to advance to the NLCS.

9) 1980 NLCS

Phillies defeat Astros (3.6% series win expectancy)

Background:
The Phillies, who have been in existence since 1883, were still looking for their first World Series championship. They had lost three of the last four NLCS.

Low Point:
During Game Four, in the bottom of the sixth inning, the Astros were leading 2-0 and had just loaded the bases with one out. Houston was just three innings away from appearing in their first World Series.

The Outcome:
With right-handed batter Luis Pujols coming up with the bases loaded, Phillies manager Dallas Green chose to replace his ace Steve Carlton with right-handed reliever Dickie Noles. Pujols then flew out to rightfielder Bake McBride, scoring Gary Woods from third on the sacrifice fly. However, several Phillies noticed that Woods left the base before McBride made the catch. An appeal followed and Woods was declared out and the run was taken off the board, ending the inning.

Philadelphia then scored three runs in the eighth before allowing Houston to tie it in the ninth, forcing extra innings for the third straight game. In the tenth, a Pete Rose single and doubles by Greg Luzinski and Manny Trillo produced two runs and put the Phillies up for good.

Game Five would be the fourth straight extra inning game. There were four lead changes with each team having at least a 90% probability of winning before blowing their lead. The 1980 NLCS would go down as one of the most exciting postseason series in baseball history.

8) 2014 AL Wild Card

Royals defeat Athletics (3.2% series win expectancy)

Background:
The Kansas City Royals had not appeared in the postseason since 1985.

Low Point:
Through seven innings, Oakland was ahead 7-3 with their ace Jon Lester on the mound.

The Outcome:
In the eighth, the Royals cut the deficit to one run by scoring three while stealing four bases on Oakland’s backup catcher Derek Norris, Jon Lester, and Luke Gregerson (who replaced Lester midway through the inning).

In the bottom of the ninth, the Royals manufactured the tying run with a Josh Willingham single, another stolen base by pinch runner Jarrod Dyson, and a sacrifice fly by Nori Aoki. The game would remain tied until the twelfth when Oakland took another lead with a walk, wild pitch, and single. Kansas City would have to mount yet another comeback. A Christian Colon single scored Eric Hosmer (who tripled) to tie the game. Colon stole the Royals’ seventh base of the game to get into scoring position. Finally, catcher Salvador Perez singled in the winning run down the left field line, advancing Kansas City to the division series.

For a much more detailed account, you’ll certainly enjoy Andy McCullough’s “The night Kansas City baseball came back to life.”

7) 1968 World Series

Tigers defeat Cardinals (2.8% series win expectancy)

Background:
1968 was the year of the pitcher. Cardinals ace Bob Gibson posted a ridiculous 1.12 ERA while Tigers hurler Denny McLain became the first 30-game winner since Dizzy Dean in 1934.

Low Point:
St. Louis took three of the first four games of the series. In the fourth inning of Game Five, they held a 3-0 lead and had two runners aboard with just one out.

The Outcome:
On the mound for Detroit was Mickey Lolich, who would throw three complete games during the series and earn MVP honors. Lolich would quickly stop the threat in the fourth by inducing a flyout and a strikeout. In the bottom half, the Tigers cut the lead to one with two triples, a sacrifice fly and a single. They would take the lead for good in the seventh and force a Game Six (which was a 13-1 Tigers blowout).

Game Seven looked like a classic pitchers’ duel between Lolich and Gibson, with each club scoreless through six innings. In the top of the seventh, Gibson got two quick outs before Detroit started to rally. Norm Cash and Willie Horton each singled, but the big blow came on Jim Northrup’s two-run triple. Detroit held on to the lead and won their first World Series since 1945.

6) 2003 NLCS

Marlins defeat Cubs (2.2% series win expectancy)

Background:
The Cubs had not won a World Series since 1908 and had not won a pennant since 1945, while the Marlins were looking for their second World Series appearance in their eleventh season.

Low Point:
In the top of the eighth in Game Six, the Cubs held a 3-0 lead and were just five outs away from returning to the World Series.

The Outcome:
After getting the first out of the inning, Cubs pitcher Mark Prior allowed a double to Marlins leadoff hitter Juan Pierre. The next batter, Luis Castillo, lofted a flyball down the leftfield line in foul territory. In what would forever be known as the “Steve Bartman Incident”, Cubs leftfielder Moises Alou leaped at the wall, only to have the ball deflected by a fan going for the souvenir. Alou was visibly upset because the missed out extended the at bat, which turned into a base on balls. The next batter, Ivan Rodriguez, singled in Pierre, cutting the lead to two runs.

The next play is the most pivotal of the series, and should be more infamous than the Steve Bartman interference. Marlins rookie Miguel Cabrera hit a routine groundball to shortstop Alex Gonzalez. It’s possible the Cubs could have turned an inning ending double play. At the least, they could settle for one out. But Gonzalez misplayed the ball, and the error loaded the bases with just one out. Seven more runs would score in the inning and the Marlins would live to see another day.

Game Seven got off to a poor start for the “North Siders” as Miguel Cabrera hit a three-run homerun off Kerry Wood. But that lead wouldn’t last long as the Cubs scored three in the second to tie it and another two in the third inning to take the lead. The Cubs were now eighteen outs away from the World Series. Whether it was the “Curse of the Billy Goat”, bad luck or poor performance, the Marlins retook the lead for good in the fifth inning, advancing to the Fall Classic.

5) 2011 World Series

Cardinals defeat Rangers (2.1% series win expectancy)

2011WS
Background:
The Rangers were back in the World Series after losing in 2010 to the Giants. The Cardinals were making their eighth postseason appearance in twelve years.

Low Point:
In the bottom of the ninth inning of Game Six, the Rangers led 7-5 and were just two outs away from a World Series championship.

The Outcome:
Facing Texas closer Neftali Feliz, Albert Pujols doubled into the left-centerfield gap. The next batter, Lance Berkman, drew a four-pitch walk, putting the tying run on base. Feliz got Allen Craig to strike out for the second out of the inning. The Rangers outfield was playing deep in hopes of limiting any hit to a single. Unfortunately, rightfielder Nelson Cruz was not playing deep enough as David Freese lined a ball just out of his reach and off the wall, bringing in both runs to tie the score.

The Cardinals’ “momentum”, however, was short-lived, as in the top of the tenth inning Josh Hamilton hit a two-run homerun off of Cardinals closer Jason Motte. Surely this would be enough to give the Rangers their first World Championship. But in the bottom half, the Cardinals manufactured two runs on three singles, a sacrifice bunt and an RBI groundout. The game was yet again tied.

After Jake Westbrook held the Rangers scoreless in the top of the eleventh, ninth inning hero David Freese came to the plate. On a full count and the seventh pitch of the at bat, Freese hit a walk-off homerun to dead center, forcing a Game Seven. As the ball cleared the fence, play-by-play announcer Joe Buck told the audience “We will see you tomorrow night”. This was a nod to his late father, who twenty years earlier in the 1991 World Series used the same line when Kirby Puckett hit a walk-off homerun.

In Game Seven, the Rangers once again took a two-run lead in the first inning on doubles by Josh Hamilton and Michael Young. The lead would be short-lived as the Cardinals answered back with two of their own in the bottom of the first. In the second, David Freese added to his story with a two-run double. This is all the Cardinals would need as they held the Rangers scoreless the rest of the way.

4) 2004 ALCS

Red Sox defeat Yankees (1.8% series win expectancy)

2004ALCS
Background:
After the disappointment of losing the 2003 ALCS, the Red Sox were facing the Yankees once again a year later.

Low Point:
This time around, the Yankees took the first three games and looked to be well on their way to another American League pennant. No team in baseball history had ever come back from being down 3-0 in a seven-game series.

In the top of the ninth in Game Four, the Yankees held a 4-3 lead as Derek Jeter led off with a walk.

The Outcome:
Closer Keith Foulke, in his second inning of work, quickly retired the side on a groundout, strikeout and popup. Just after midnight in the bottom half of the inning, with three outs remaining, Kevin Millar drew a walk against future Hall of Famer Mariano Rivera. With the tying run on base, Boston manager Terry Francona replaced Millar with pinch runner Dave Roberts. With Bill Mueller at the plate, Roberts drew three straight pickoff throws from Rivera. But on the first pitch of the at bat, Roberts took off and slid into second base, just ahead of the throw from Jorge Posada. It was the biggest stolen base in Red Sox history. The next pitch, Mueller showed bunt but wisely pulled back. On the third pitch, Mueller shot a groundball into center to tie the score. The game would go into the twelfth inning, when David Ortiz hit a two-run walk-off homerun off of Paul Quantrill.

Game Five would last fourteen innings and ended on another walk-off hit by David Ortiz. This time it was an RBI single scoring Johnny Damon.

Game Six was back in New York and became known as the “Bloody Sock Game”. Red Sox starter Curt Schilling needed a tendon in his right ankle temporarily sewn in place, allowing him to pitch. Schilling went seven innings and allowed one run, earning the win and forcing the series to a seventh game.

Game Seven was the least competitive of all the games as Boston jumped out to an early 6-0 lead, capped off by a Johnny Damon grand slam in the second inning. The Red Sox became the first team in history to win the final four games after losing the first three.

3) 2002 World Series

Angels defeat Giants (1.5% series win expectancy)

2002WS
Background:
The Anaheim Angels were in the World Series for the first time in their 42nd year of existence. The Giants were looking for their first World Series championship since their New York days in 1954.

Low Point:
In the bottom of the seventh inning in Game Six, the Giants held a 5-0 lead, just eight outs away from winning the World Series.

The Outcome:
With Giants starter Russ Ortiz still on the mound in the seventh, the Angels got two runners aboard on singles by Troy Glaus and Brad Fullmer. With thunder sticks banging and the Rally Monkey jumping, Scott Spiezio took Giants reliever Felix Rodriguez deep to cut the lead to two runs. It would take the Giants two more relievers to stop the rally and keep the two-run lead.

In the eighth, lead-off hitter Darin Erstad cut the lead to one run with a solo shot off of Tim Worrell. That was followed by Tim Salmon and Garret Anderson singles, with still nobody out. Giants manager Dusty Baker then decided to bring in his closer Robb Nen to stop the bleeding. However, Troy Glaus, the first batter to face Nen, drove in the tying and go-ahead runs on a double. The Angels’ lead would stand as they forced a final game.

San Francisco got out to an early Game Seven lead in the second inning on a sacrifice fly by Reggie Sanders. This would not last long as Bengie Molina would tie the score in the next half with an RBI double. A bases-clearing double by Garret Anderson in the third inning brought the score to 4-1 and gave the Angels their first World Series championship.

2) 1986 ALCS

Red Sox defeat Angels (0.8% series win expectancy)

1986ALCS
Background:
It had been 68 years since the Red Sox had last won a World Series, and their last appearance in the Fall Classic, in 1975, had ended in heartbreak. The Angels were looking for their first World Series appearance in their 26-year history.

Low Point:
California took three of the first four games and held a 5-2 lead through eight innings of Game Five. They were just three outs away from going to the World Series.

The Outcome:
In the top of the ninth, facing Angels starter Mike Witt, the Red Sox had one aboard on a Bill Buckner single and two outs remaining. Red Sox designated hitter and former Angels MVP Don Baylor cut the lead to one on a two-run homerun. Witt would get the next batter, Dwight Evans, to pop up for the second out. Angels manager Gene Mauch finally replaced Witt with reliever Gary Lucas, who would hit Rich Gedman on the first pitch. With the tying run aboard, Mauch brought in his closer Donnie Moore to face Dave Henderson. Henderson deposited the seventh pitch over the left field wall to take the lead. As a child, I thought that Henderson jumped six feet into the air as he watched the ball clear the fence. While his vertical leap was not 72 inches, he did get some major hang time.

What is often forgotten in this game is that the Angels tied the score in the bottom of the ninth on a Rob Wilfong RBI single. The score would remain tied until the eleventh inning, when Dave Henderson would once again come up to face Donnie Moore, this time with the bases loaded and no outs. Henderson drove in another run on a sacrifice fly to center fielder Gary Pettis. This lead would hold and Boston would live to see another day.

Boston easily won Game Six 10-4 and Game Seven 8-1 to get another shot at the World Series.

1) 1986 World Series

Mets defeat Red Sox (0.6% series win expectancy)

1986WS
Background:
After Boston came back from having a 0.8% win expectancy in the ALCS, they were set to face the 108-win New York Mets in the World Series.

Low Point:
In the bottom of the tenth inning of Game Six, the Red Sox held a 5-3 lead with just two outs and the bases empty.

The Outcome:
The Mets’ last hope, catcher Gary Carter, singled to left. The pitcher’s spot was up next, so Mets manager Davey Johnson called on pinch hitter Kevin Mitchell, who lined the next pitch into center, moving Carter to second base. Ray Knight, the potential go-ahead run, lined an 0-2 pitch to right-center, scoring Carter and cutting Boston’s lead to one run.

Red Sox manager John McNamara chose to replace reliever Calvin Schiraldi with Bob Stanley, who would uncork a wild pitch on the eighth pitch to Mookie Wilson. This tied the game as Kevin Mitchell crossed the plate, while Ray Knight moved to second base. The next play would be the defining moment in first baseman Bill Buckner’s career. Mookie Wilson grounded the next pitch down the first base line in what looked to be a routine groundout to end the inning. Unfortunately for Buckner and the Red Sox, the ball scooted through his legs as Knight came around to score the winning run.

Boston took a three-run lead in the third inning of Game Seven, on solo homeruns by Dwight Evans and Rich Gedman and an RBI single by Wade Boggs. That lead lasted until the sixth, when the Mets scored three of their own to tie the game. New York would score three more an inning later and that would be all they would need for their first World Series championship since 1969.

Posted in General, Historical, Statistical Analysis

Season Similarity Scores

While Bill James was writing his Abstracts in the 1980s, he came up with Similarity Scores, which put a number on how similar the statistics of two different players are. Scores can range from 1000 for the most identical players to 0 for the most dissimilar. Consider two players who are very much alike, Andre Thornton and John Mayberry Sr., who have a similarity score of 974, while Neifi Perez and Babe Ruth have a similarity score of 0.

I decided to take this concept and apply it to the winning percentages of teams in a single season and compare it to teams in other seasons. The method is very simple:
Find the absolute difference between each team’s winning percentage in the two seasons being compared. For example, the New York Yankees in 2015 had a winning percentage of .537 while the Yankees of 1927 had a winning percentage of .714. The difference is .177 (.714-.537). Do this for all teams and find the average difference. Then multiply that average by 1000 and subtract the result from 1000. Two identical seasons would have a score of 1000.

When comparing years with expansion teams (1961-present) to years before expansion (1901-1960), I only included teams that were active during the two seasons. So when comparing 2015 to 1915, only 16 teams will be included in the comparison since 14 teams were not active in 1915.
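The calculation and the shared-team restriction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the dictionary layout, the franchise keys and the sample winning percentages are all hypothetical, not actual season data.

```python
def season_similarity(season_a, season_b):
    """Score two seasons on a 0-1000 scale from team winning percentages.

    Each argument is a dict mapping a franchise key (an assumed
    identifier such as "NYY") to that team's winning percentage.
    Only franchises present in both seasons are compared.
    """
    shared = sorted(set(season_a) & set(season_b))
    if not shared:
        raise ValueError("the two seasons have no teams in common")
    # Absolute gap in winning percentage for every shared team.
    diffs = [abs(season_a[team] - season_b[team]) for team in shared]
    mean_diff = sum(diffs) / len(diffs)
    # Scale the average gap by 1000 and subtract it from 1000.
    return 1000 - 1000 * mean_diff, len(shared)


# Hypothetical winning percentages, made up purely for illustration.
season_x = {"NYY": 0.600, "BOS": 0.500, "ARI": 0.450}
season_y = {"NYY": 0.550, "BOS": 0.520}

score, teams = season_similarity(season_x, season_y)
print(round(score), teams)
```

Here "ARI" is ignored because it appears in only one of the two seasons, so the score is based on the two shared teams. Returning the team count alongside the score makes it easy to see how many clubs a given comparison actually covers.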

I was most curious to see which season is the most similar to 2015. Surprisingly (I guess any season before 2010 would surprise me), the most similar season to 2015 is 1945, which had a similarity score of 966. In fact, this is the second closest match (highest score) between two seasons in the entire history of baseball.

2015 vs 1945 (Similarity Score: 966):

2015-1945

Of course, when comparing 2015 to 1945, we are only considering the winning percentages of 16 of the 30 teams. So if we limit our search to the 30-team era (1998-2015), we’ll find that the most similar season to 2015 is … 2014. Kind of boring, I know. But it’s natural that the most similar seasons are sequential since that is when teams will have the most similar personnel.

2015 vs 2014 (Similarity Score: 937):

2015-2014

Now what if we want to see which season is most dissimilar to 2015 during the 30-team era? That would go to the 2002 season.

2015 vs 2002 (Similarity Score: 908):

2015-2002

Which two seasons in history have the most similar winning percentages? The 1922 and 1923 seasons. Again, back-to-back seasons are more likely to have higher scores because their teams have changed the least.

1923 vs 1922 (Similarity Score: 968):

1923-1922

Next, these two seasons have the most dissimilar winning percentages:

1953 vs 1909 (Similarity Score: 819):

1953-1909

Finally, I’ve included a list of the top and bottom 15 similarity scores of all-time:

Highest Scores                 Lowest Scores
Years         Score  Teams     Years         Score  Teams
1923 vs 1922  968    16        1953 vs 1909  819    16
2015 vs 1945  966    16        1954 vs 1909  831    16
1973 vs 1971  963    24        1961 vs 1912  836    16
1957 vs 1956  963    16        1915 vs 1906  836    16
1959 vs 1958  962    16        1946 vs 1906  839    16
1973 vs 1972  962    24        1943 vs 1909  839    16
1997 vs 1959  961    16        2002 vs 1908  840    16
1964 vs 1963  961    20        1956 vs 1912  840    16
1911 vs 1910  961    16        1942 vs 1907  840    16
1931 vs 1930  960    16        1953 vs 1912  842    16
1997 vs 1958  959    16        1942 vs 1913  842    16
1942 vs 1941  959    16        1942 vs 1909  842    16
1937 vs 1936  959    16        1957 vs 1912  843    16
1966 vs 1965  959    20        1928 vs 1915  843    16
1999 vs 1958  958    16        1921 vs 1909  843    16
Posted in General, Historical, Statistical Analysis