Ballparks Database Updated!
Last month we rolled out the online version of the Seamheads Ballparks database, which contained descriptive information about every park ever used as a major league stadium, plus calculations of the impact on batting components for LH and RH batters beginning in 1950.
Today we've released an update to the original data. The latest detailed documentation can always be found here, but here is a quick summary of the improvements:
1. Added the descriptive park data from Ron Selter's book Ballparks of the Deadball Era. This new and improved data covers parks used in the 1901-1919 seasons. One side effect of using this newer data is that, for some parks, it made it appear that the park changed in 1920, because the dimensions now differ between 1919 and 1920, when in reality the 1919 data was simply more accurate. To mitigate this issue, we extrapolated Mr. Selter's data past 1919 and into the 1920s until we reached a season where we were reasonably certain that physical changes were actually made to the park.
2. Added data provided by Clem Comly of Retrosheet.org for the years 1919-1949, drawn from the Retrosheet box score event files, which enables us to create estimated LH/RH splits for these pre-1950 seasons. They are not yet 'true' observed splits because, without play-by-play data, switch hitters must be excluded from our calculations, but they should be some of the best estimated splits you can find anywhere.
We'll be diving into the data in some future articles, but for now, just a brief word about the park factor calculations. We provide two sets of calculations: 1-year factors and 3-year factors.
The 1-year factors are 'observed' factors. While we do use an 'other parks corrector' as described in the detailed documentation, these are essentially the factors that were observed for that particular year. So a 120 doubles factor for LH batters in Fenway Park means that left-handed batters hit 20% more doubles at Fenway than left-handed batters from those same teams hit in games away from Fenway.
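To make the 100-based scale concrete, here is a rough sketch of an observed factor in Python. It is not our production calculation: it leaves out the 'other parks corrector' entirely, and it assumes that events are compared as rates per plate appearance, which is an illustrative choice. The function and parameter names are made up for this example.

```python
def one_year_factor(home_events, home_pa, away_events, away_pa):
    """Observed park factor on a scale where 100 is neutral.

    Assumption: events are compared per plate appearance (PA).
    The 'other parks corrector' is deliberately omitted here.
    """
    if home_pa <= 0 or away_pa <= 0 or away_events <= 0:
        raise ValueError("need positive PA totals and at least one away event")
    # Ratio of the home rate to the away rate, written as a single
    # division to limit floating-point rounding.
    return 100.0 * (home_events * away_pa) / (away_events * home_pa)


# Hypothetical totals: 60 doubles in 1000 PA at home,
# 50 doubles in 1000 PA on the road.
factor = one_year_factor(60, 1000, 50, 1000)
print(round(factor))  # 120, i.e. 20% more doubles at home
```

The numbers in the usage example are invented purely to reproduce the 120 reading described above.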
The 3-year factors are attempts at calculating the 'true' factors. There are many, many ways we could have constructed our formula, and it's difficult to determine what the 'right' way is, but we believe our way is at least a good and defensible way. Our basic formula is to use the 1-year factors for the season in question, the season immediately preceding, the season immediately following, and then the park's long-term historical factor, all weighted equally. As some parks have rather long histories, while others were in use for only a few seasons, this is not a perfect method, but we believe it retains a basic simplicity while providing a high degree of accuracy in estimating a park's impact on offensive events.
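The equal weighting described above can be sketched in a few lines of Python. The names here are hypothetical, and the handling of a missing neighboring season (for example, a park's first or last year) is an assumption made for this sketch: it simply averages whichever components are available. See the detailed documentation for how those cases are actually treated.

```python
def three_year_factor(previous, current, following, historical):
    """Equal-weight average of up to four components.

    previous, following: 1-year factors for the adjacent seasons,
        or None if the park was not in use (assumed handling).
    current: 1-year factor for the season in question (required).
    historical: the park's long-term historical factor, or None.
    """
    if current is None:
        raise ValueError("the season in question must have a 1-year factor")
    components = [previous, current, following, historical]
    # Assumption: missing components are dropped and the rest
    # are still weighted equally.
    available = [c for c in components if c is not None]
    return sum(available) / len(available)


# Hypothetical doubles factors for one park and batter side.
print(three_year_factor(110, 120, 100, 106))   # 109.0
print(three_year_factor(None, 120, 100, 110))  # 110.0 (no prior season)
```

Because each component carries the same weight, a single unusual season moves the result by at most a quarter of its distance from the other values.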
We welcome any feedback on any of the data or suggestions for improvement, so try it out and enjoy!