{"id":32749,"date":"2019-02-28T15:47:10","date_gmt":"2019-02-28T20:47:10","guid":{"rendered":"https:\/\/seamheads.com\/blog\/?p=32749"},"modified":"2019-03-03T13:22:06","modified_gmt":"2019-03-03T18:22:06","slug":"estimating-park-factors-for-the-negro-leagues","status":"publish","type":"post","link":"https:\/\/seamheads.com\/blog\/2019\/02\/28\/estimating-park-factors-for-the-negro-leagues\/","title":{"rendered":"ESTIMATING PARK FACTORS FOR THE NEGRO LEAGUES"},"content":{"rendered":"<p>Most serious baseball fans understand that ballparks can have a large impact on statistical performance. However, trying to measure that exact impact often proves difficult. In work we\u2019ve done on Major League ballparks at Seamheads.com, it takes about three full years of data, regressed by about one additional season worth of games, to get a decent prediction of how a park will impact runs scored in year four. Some analysts have argued that about TEN (!) years of data would be the proper sampling to understand park effects IF parks did not change during that time. Unfortunately for those of us who would like to calculate an effect, parks do often get changed, old parks are abandoned, and new parks are built, all making it more difficult to pinpoint exactly how a park might impact offense at any point in time.<\/p>\n<p>A ballpark factor is a measure of how a ballpark influences batting events, with run scoring being the primary event measured. It\u2019s usually calculated as an index with 100 representing a neutral park for a league, above 100 meaning a park makes run scoring easier than an average park, and below 100 meaning a park makes scoring harder than an average park. For example, a factor of around 82 would mean run scoring in that park is reduced by around 18% compared to a neutral park. On the other hand, a ballpark factor close to 109 would mean run scoring is increased by around 9% for games played in that particular park. 
Park factors can also be calculated as a plus or minus run value, with a -1.00 indicating the park reduces runs scored per game by 1 run for each team.<\/p>\n<p>Typically, for the major leagues, park factors are calculated by looking at home versus away statistics for a given team, perhaps making some adjustments for a team not facing its own pitchers and batters. For the Negro Leagues, however, we of course have complicating factors. For one, teams did not play an equal number of home and away games against each opponent. For another, teams often played at alternative or neutral sites. Then there\u2019s the problem of having a limited number of games against other blackball teams in a given year. Finally, there are the unbalanced schedules where a team may have played an \u2018easier\u2019 mix of opponents at home as opposed to on the road, or vice versa.<\/p>\n<p>Of course at Seamheads we don\u2019t let these little issues stop us from trying to make sense of the Negro Leagues, so we\u2019ll use what we have and see where it takes us.<\/p>\n<p>First, let\u2019s look at all known major blackball versus blackball team scores from 1902 through 1948 (note: some seasons are not complete), and see which parks have hosted the most games:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-32759\" src=\"https:\/\/seamheads.com\/blog\/wp-content\/uploads\/Chart-1-Negro-League-Park-Factors-1.png\" alt=\"\" width=\"645\" height=\"750\" \/><\/p>\n<p>Schorling Park in Chicago, home of the Chicago White Sox before Comiskey Park was built, easily tops the list, hosting over 8% of all the games we have in our database.<\/p>\n<p>We said earlier that for major league parks you need about three years\u2019 worth of data, which is around 240 games, to do a \u2018good\u2019, reliable park factor calculation. 
Looking at the table, we only have thirteen parks that have even hosted more than 240 games, and those are spread out over 30 years in some cases. At this point we can forget about even considering doing a \u2018one year\u2019 park factor given the small number of games, but we\u2019ll still hope we can do some type of factor that will encompass multiple years.<\/p>\n<p>We do have one factor working in our favor. Schorling Park, the park used more than any other, was formerly a major league park, known as South Side Park (III). We have major league park factors for this park, and those can give us a hint of what type of park this might be, and even a point of reference to compare the Negro Leagues with the American League.<\/p>\n<p>Here are the calculated one year Runs factors from the seamheads.com ballparks database for Schorling Park aka South Side Park, with ranking indicating how much of a pitcher\u2019s park:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-32766\" src=\"https:\/\/seamheads.com\/blog\/wp-content\/uploads\/Chart-II-Negro-League-Park-Factors.png\" alt=\"\" width=\"222\" height=\"231\" \/><\/p>\n<p>Out of the 16 primary MLB parks, four times South Side Park was the most pitcher friendly park in baseball, using one-year park factors. 
This will be a good reference point for us going forward.<\/p>\n<p>Let\u2019s start our analysis by just looking at the historical run scoring in each park:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-32770\" src=\"https:\/\/seamheads.com\/blog\/wp-content\/uploads\/Chart-III-Negro-League-Park-Factors-1.png\" alt=\"\" width=\"468\" height=\"570\" srcset=\"https:\/\/seamheads.com\/blog\/wp-content\/uploads\/Chart-III-Negro-League-Park-Factors-1.png 468w, https:\/\/seamheads.com\/blog\/wp-content\/uploads\/Chart-III-Negro-League-Park-Factors-1-246x300.png 246w\" sizes=\"auto, (max-width: 468px) 100vw, 468px\" \/><\/p>\n<p>For our time period, the Negro Leagues averaged just over 10 runs per game (5 per team per game) so that\u2019s a nice number to be able to calculate against. Just looking at total runs we see Schorling Park averaged 8.4 runs per game, which would calculate to a park factor of 82, with a run adjustment per team of almost 1 run less expected to be scored per game than average. Highlighting a few other extreme parks, Stars Park in St. Louis and Catholic Protectory Oval in New York appear to be \u2018Coors-like\u2019 in their impact on offense, while Crosley Field, Yankee Stadium and Comiskey Park substantially reduce offense.<\/p>\n<p>This is a nice start, but there are some biases here we need to get rid of. For one, some parks had more games in the higher offense eras, while others were primarily used in low scoring years. 
Here\u2019s a chart on run scoring in the Negro Leagues by year:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-32755\" src=\"https:\/\/seamheads.com\/blog\/wp-content\/uploads\/Chart-IV-Negro-League-Park-Factors.png\" alt=\"\" width=\"645\" height=\"192\" \/><\/p>\n<p>We see there was a dead ball era in the Negro Leagues pre-1920, except for a bump up in the 1911-1914 seasons, and we see 1921 was the beginning of the \u2018live\u2019 ball for the Negro Leagues. So this is one bias we\u2019ll need to correct for.<\/p>\n<p>The other bias we need to adjust for is team bias. Maybe Stars Park had so many runs scored because the St. Louis Stars were such a good hitting club?<\/p>\n<p><strong>The Math-y Details<\/strong><\/p>\n<p>Feel free to skip this section on methodology, although we\u2019re going to try to avoid high level mathematics and statistics as much as possible. To take season run scoring and team quality out of play, but still have a good statistical sample size, the method to calculate park factors will be as follows:<\/p>\n<p>Summarize for each season by team the number of runs that team scored in every park. One positive factor for us here about the Negro Leagues is that teams played almost the same lineup every game, as there simply weren\u2019t many bench players to sub in. This means the quality of the offense should be relatively stable for each game. Here\u2019s an example of summing by season by team by park:<\/p>\n<p>1923, St. Louis Stars, Stars Park, 394 Runs, 58 Games, 6.8 R\/G<br \/>\n1923, St. Louis Stars, Rickwood Field, 28 Runs, 5 Games, 5.6 R\/G<\/p>\n<p>And we do that for every park the Stars played in during 1923. Then we compare the runs scored by each team in each park. In this example we would compare the 6.8 runs per game in Stars Park to the 5.6 in Rickwood Field, and we\u2019d give Stars Park a +1.2 vs. Rickwood, and give Rickwood a -1.2 vs. Stars Park. 
(We\u2019re going to have a similar pair calculation for the opponent teams in both parks). We also must figure out how to weight this difference, as we\u2019re looking at 58 games in one park vs. only 5 in the other park. For statistical reasons that we won\u2019t get into because they\u2019re above my head, we can take the harmonic mean of 58 and 5, which is about 9.2, as our weight for this 1.2 R\/G difference. We do this for every pair of parks each team played in, then we sum them all together, and that gives us the total weighted +\/- of any park against all the other parks.<\/p>\n<p>Some general comments on the methodology and its simplifying assumptions:<br \/>\n1. Comparing as a \u2018plus\/minus\u2019 exercise versus other parks in the same season should adjust for the era (high or low scoring) and for the mix of OTHER parks in the same season (we are now measuring each park against all other parks in the same season instead of against the historical runs\/game), AND it should also adjust for the impact of good pitch\/no hit or good hit\/no pitch teams on run scoring in a park.<\/p>\n<p>2. We\u2019re not considering the opposing team pitchers\/defense directly. What we could have done is further restrict our sample to pairs of teams, such as St. Louis-Birmingham in Stars Park vs. St. Louis-Birmingham in Rickwood Field, etc. The argument for doing this would be that the quality of the opposition could impact the number of runs scored by a team in a park. While this is certainly true, it\u2019s probably also true that the INDIVIDUAL pitcher has an even bigger impact, so just because the Stars are playing Birmingham in both parks doesn\u2019t mean the quality of pitcher that they faced would be the same. Ideally, we\u2019d adjust for the individual pitcher, but this gets very complicated just from a computational angle, and it\u2019s not clear exactly how to separate the pitcher quality from HIS park. 
For example, if you use pitcher ERA to adjust the quality and Willie Foster is your opposing pitcher, and his ERA is 2.50, it\u2019s not clear how much of the 2.50 is due to pitching in Schorling Park (which is the unknown we\u2019re trying to calculate in the first place), and how much is due to Foster\u2019s ability. Not restricting by opposition has the advantage of allowing for a larger sample size. St. Louis in 1923 played a neutral site game at Lebanon, IN against the ABCs. We would only have two parks to compare to that St. Louis and the ABCs played in \u2013 Stars Park and Washington Park. But by not restricting by opposition, we can use the data point that St. Louis scored 6 runs in that game to compare the Lebanon Park to all of the other parks the Stars played in that season, and all other parks the ABCs played in.<\/p>\n<p>3. We ARE considering the fact the home team scores more and the visitor scores less. Home field advantage was historically around 0.5 runs per game in the Negro Leagues. We add expected runs for each home team, and we subtract expected runs for the visitors.<\/p>\n<p><strong>Back to the Results<\/strong><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-32756\" src=\"https:\/\/seamheads.com\/blog\/wp-content\/uploads\/Chart-V-Negro-League-Park-Factors.png\" alt=\"\" width=\"553\" height=\"564\" srcset=\"https:\/\/seamheads.com\/blog\/wp-content\/uploads\/Chart-V-Negro-League-Park-Factors.png 553w, https:\/\/seamheads.com\/blog\/wp-content\/uploads\/Chart-V-Negro-League-Park-Factors-294x300.png 294w\" sizes=\"auto, (max-width: 553px) 100vw, 553px\" \/><\/p>\n<p>We see some changes here. Schorling is now the most extreme pitcher\u2019s park. We know it was a pitcher\u2019s park even in the American League, and in the Negro National League it\u2019s being compared to a few more \u2018band box\u2019 parks like Stars Park in St. 
Louis, so we would somewhat expect it to show as an extreme pitcher\u2019s park.<\/p>\n<p>Parks that were used more in the lower scoring 1940s, like Yankee Stadium, Comiskey Park, and Crosley Field, still show as pitcher friendly, but much less so than before adjusting for run environment.<\/p>\n<p>Northwestern Park in Indianapolis, a pre-1920 \u2018dead ball era\u2019 park, now shows to be one of the better hitter\u2019s parks.<\/p>\n<p>Stars Park, with most of the games there played in the high scoring 1920s, now shows as much less extreme, but is still the best hitting park in the western Negro Leagues.<\/p>\n<p>It\u2019s the same story for Catholic Protectory Oval \u2013 1920s offense adjustments show it to be less extreme than adding 3 runs per game of offense, but still the most hitter friendly park in the Negro Leagues.<\/p>\n<p>Lewis Park and Rickwood Field go from pitcher friendly to neutral. Apparently, both the Memphis and Birmingham teams tended to have good pitching but poor offenses, which were biasing the original results.<\/p>\n<p>We need one final step. The 132 games we have for Forbes Field give us data that certainly is not as reliable in the statistical sense as the 954 games we have for Schorling Park. We need to adjust for that uncertainty by regressing these park factors towards the mean. What the \u2018right\u2019 regression to apply should be is not an easy number to determine. We mentioned earlier that three years of MLB park data regressed by 80 games is usually a good sample. For the Negro Leagues, given that we\u2019re using data over a park\u2019s entire life, and perhaps introducing more \u2018noise\u2019, we\u2019ll use 160 games as our regression point \u2013 roughly 2 seasons of MLB home games. 
If we have 160 games of data for a park calculation, we\u2019ll regress that 50%, so that a +0.50 runs factor would become a regressed park factor estimate of +0.25.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-32757\" src=\"https:\/\/seamheads.com\/blog\/wp-content\/uploads\/Chart-VI-Negro-League-Park-Factors.png\" alt=\"\" width=\"645\" height=\"567\" \/><\/p>\n<p>Schorling, of course, barely regresses, but as we go down the list, our uncertainty about the observed park factors increases, so the extremes we saw for Northwestern Park and Catholic Protectory Oval get regressed down quite significantly. Stars Park is still a hitter\u2019s haven, even with the regressed numbers.<\/p>\n<p>One final note \u2013 these latest park factors, along with the calculations for home field advantage, and for \u2018strength of schedule\u2019, will very soon be used to update the OPS+ and ERA+ calculations for players on seamheads.com. When that happens, a few players may see some \u2018significant\u2019 changes in those calculations.<\/p>\n<p>(NOTE: This post is an updated version of an earlier article that appeared in the October 31, 2012 issue of Outsider Baseball Bulletin).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Most serious baseball fans understand that ballparks can have a large impact on statistical performance. However, trying to measure that exact impact often proves difficult. 
In work we\u2019ve done on Major League ballparks at Seamheads.com, it takes about three full years of data, regressed by about one additional season worth of games, to get a [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9,4235],"tags":[1626,1058],"class_list":["post-32749","post","type-post","status-publish","format-standard","hentry","category-general","category-top-stories","tag-ballparks","tag-negro-leagues"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/seamheads.com\/blog\/wp-json\/wp\/v2\/posts\/32749","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/seamheads.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/seamheads.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/seamheads.com\/blog\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/seamheads.com\/blog\/wp-json\/wp\/v2\/comments?post=32749"}],"version-history":[{"count":0,"href":"https:\/\/seamheads.com\/blog\/wp-json\/wp\/v2\/posts\/32749\/revisions"}],"wp:attachment":[{"href":"https:\/\/seamheads.com\/blog\/wp-json\/wp\/v2\/media?parent=32749"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/seamheads.com\/blog\/wp-json\/wp\/v2\/categories?post=32749"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/seamheads.com\/blog\/wp-json\/wp\/v2\/tags?post=32749"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}