April 8, 2020

Ballparks Database Updated!

March 15, 2011 by · 4 Comments 

Last month we rolled out the online version of the Seamheads Ballparks database, which contained descriptive information about every park ever used as a major league stadium, plus calculations of the impact on batting components for LH and RH batters beginning in 1950.

Today we’ve released an update to the original data. The latest detailed documentation can always be found here, but here is a quick summary of the improvements:

1.  Added the descriptive park data from Ron Selter’s book Ballparks of the Deadball Era. This new and improved data covers parks used in the 1901–1919 seasons. One side effect of using the newer data is that some parks appear to have changed in 1920, because their listed dimensions now differ between 1919 and 1920, when in reality the 1919 data is simply more accurate. To mitigate this issue, we carried Mr. Selter’s data forward past 1919 and into the 1920s until we reached a season where we were reasonably certain that physical changes were actually made to the park.

2.  Added data provided by Clem Comly of Retrosheet.org for the years 1919-1949, drawn from the Retrosheet box score event files, which enables us to create estimated LH/RH splits for these pre-1950 seasons. They are not yet ‘true’ observed splits because, without play-by-play data, switch hitters must be excluded from our calculations, but they should be some of the best estimated splits you can find anywhere.

We’ll be diving into the data in some future articles, but for now, just a brief word about the park factor calculations.  We provide two sets of calculations – 1-year factors and 3-year factors.

The 1-year factors are ‘observed’ factors. While we do use an ‘other parks corrector’ as described in the detailed documentation, these are essentially the factors that were observed for that particular year. So a 120 doubles factor for LH batters in Fenway Park means that left-handed batters hit 20% more doubles at Fenway than LH batters from those same teams hit in games away from Fenway.
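
To make the arithmetic concrete, here is a minimal Python sketch of an observed factor. It is not the database's actual calculation: it leaves out the ‘other parks corrector’ entirely, and the per-at-bat rate, the function name, and the sample counts are all assumptions made up for illustration.

```python
def observed_park_factor(home_events, home_at_bats, away_events, away_at_bats):
    """Simplified observed factor: 100 * (rate at the park) / (rate away from it).

    Assumption: rates are events per at-bat. The 'other parks corrector'
    mentioned in the article is NOT applied here.
    """
    if home_at_bats <= 0 or away_at_bats <= 0:
        raise ValueError("at-bat totals must be positive")
    if away_events <= 0:
        raise ValueError("need at least one away event to form a ratio")
    home_rate = home_events / home_at_bats
    away_rate = away_events / away_at_bats
    return 100.0 * home_rate / away_rate


# Hypothetical counts: LH batters hit 60 doubles in 1,000 at-bats at the park
# and 50 doubles in 1,000 at-bats in the same teams' games elsewhere.
factor = observed_park_factor(60, 1000, 50, 1000)
print(round(factor, 1))
```

With these made-up counts the home rate is 20% higher than the away rate, so the factor comes out at about 120.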

The 3-year factors are attempts at calculating the ‘true’ factors. There are many, many ways we could have constructed our formula, and it’s difficult to determine what the ‘right’ way is, but we believe our way is at least a good and defensible one. Our basic formula is to use the 1-year factors for the season in question, the season immediately preceding, the season immediately following, and then the park’s long-term historical factor, all weighted equally. As some parks have rather long histories, while others were in use for only a few seasons, this is not a perfect method, but we believe it retains a basic simplicity while providing a high degree of accuracy in estimating a park’s impact on offensive events.
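
The equal weighting described above can be sketched in a few lines of Python. The function name and the sample values are hypothetical, and the sketch assumes all four inputs are available; how a park's first or last season, or a very short history, is handled is not covered here.

```python
def three_year_factor(previous, current, following, long_term):
    """Equal-weight average of four factors, as described in the article:
    the 1-year factors for the prior, current, and next seasons, plus the
    park's long-term historical factor.

    Assumption: all four values exist; edge seasons are not handled.
    """
    components = [previous, current, following, long_term]
    return sum(components) / len(components)


# Hypothetical 1-year doubles factors of 110, 120, and 115 for three
# consecutive seasons, and a long-term historical factor of 105.
print(three_year_factor(110, 120, 115, 105))
```

Because each input carries a weight of one quarter, a single unusual season moves the result by only a quarter of its deviation, while the long-term factor pulls the estimate toward the park's history.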

We welcome any feedback on any of the data or suggestions for improvement, so try it out and enjoy!



4 Responses to “Ballparks Database Updated!”
  1. Mark says:

    Love the ballpark index. For some reason I can spend hours with stuff like this.

    One correction. The St George Cricket Grounds. You list the city as St George. That’s the neighborhood. This field was on Staten Island which at the time was an independent city and is a borough, just like Brooklyn. You list Ebbets Field as Brooklyn, not Flatbush. So the Cricket Grounds should say Staten Island, NY. Give us Staten Islanders something to be proud of. :-)

  2. KJOK says:

    Mark – Thanks for the comment. I sometimes have questions about 19th century New York park locations, and whether they should be called just New York, or a borough name, or a neighborhood name. It does appear Staten Island would probably be the location name most consistent with what we have for other New York locations, so we’ll get that updated soon.

  3. John M. says:

    What were your sources on the locations of the 1800s parks no longer in existence? How did you find the data for parks only used for one or just a few games, e.g., Fairview Park Fair Grounds, Dover, DE?

  4. KJOK says:


    The source for most of the locations for 1800s parks is the book “Green Cathedrals”.
    Descriptive data for parks used for just a few games also came from Green Cathedrals. Batting data for parks used for just a few games comes mainly from Retrosheet.org.
