April 17, 2014


November 22, 2011 by · 2 Comments 

Most baseball fans are familiar with the concept of ‘normalizing’ statistics. For MLB statistics, the most basic adjustment is to normalize for park effects. The simplest park normalization calculation takes the impact of a team’s park on runs scored, divides that number (positive or negative) in half, since only half of a team’s games are played at home, and then applies the result to a player’s OPS, ERA, wRC, etc. to get a normalized performance (usually indicated as OPS+, ERA+, wRC+). If you want to compare players from different leagues or seasons, add an adjustment for the individual league scoring rates and, voilà, you have a normalized statistic.
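A minimal sketch of that simple calculation follows. It assumes the park’s impact is expressed as a ratio of runs per game in home games versus road games (a common convention, but an assumption here), and the function names are hypothetical. The OPS case uses a plain ratio to the league average, not Baseball-Reference’s official OPS+ formula, which treats OBP and SLG separately.

```python
def park_factor(home_rpg: float, road_rpg: float) -> float:
    """Park impact as a ratio: total runs per game in home games vs. road games."""
    return home_rpg / road_rpg


def halved_park_factor(pf: float) -> float:
    """Halve the deviation from neutral (1.0), since only half the games are at home."""
    return 1 + (pf - 1) / 2


def normalized_stat(player: float, league: float, pf: float,
                    higher_is_better: bool = True) -> float:
    """Scale a raw stat so 100 = league average, after removing half the park effect."""
    adj = halved_park_factor(pf)
    if higher_is_better:  # e.g. OPS: deflate numbers from a hitter-friendly park
        return 100 * (player / adj) / league
    # e.g. ERA: lower is better, so the ratio is inverted (ERA+ style)
    return 100 * league / (player / adj)


# A park where 10% more runs score in home games than in road games:
pf = park_factor(11.0, 10.0)                                            # 1.10
print(round(normalized_stat(0.840, 0.800, pf)))                         # 100
print(round(normalized_stat(4.20, 4.00, pf, higher_is_better=False)))   # 100
```

In this example a .840 OPS and a 4.20 ERA in a park that inflates scoring by 10% both come out exactly league average once half of that 10% is removed.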

However, the reason simple park calculations ‘work’ for normalization is that there is an underlying assumption that, except for home parks, players within a league all face almost identical conditions under which their teams perform. Those conditions include:

1.     Playing the same number of games as all other teams.
2.     Playing schedules with close to the same difficulty.
3.     Playing an equal number of home and away games (and not playing any neutral site games).
4.     Playing most or all home games in the same park.

In reality we know that teams, and within teams the individual players, do not EXACTLY all meet these conditions. Some teams play more difficult schedules than others. Some batters may, by schedule or just bad luck, face better pitchers on average than other batters, and vice versa for pitchers. Some players may play more or fewer home games. But those are exceptions, and unless there’s a need to make really fine distinctions between very similar players, adjustments are typically not made for strength of competition, or for the fact that players play better at home than on the road, etc.

For the Negro Leagues, those assumed conditions all fall apart. This is true not just for the ‘pre-league’ 1900-1919 era; even after formal leagues formed, the following conditions still prevailed in the Negro Leagues:

1.     Teams played varying numbers of total games.
2.     Teams played differing numbers of games against other league teams.
3.     Teams played an unbalanced number of home and away games.
4.     Teams played in multiple ‘neutral’ parks.

As a result, the simple park calculation won’t work for the Negro Leagues. To normalize well enough, we need to adjust for how often each team actually had home field advantage, the strength of the opposing batters and pitchers, and finally the combination of parks played in, both at home and on the road.

The steps used to normalize Negro League stats on seamheads.com are:

1.     Estimate the league-wide home field advantage in runs per game.
2.     Calculate each team’s Simple Rating System (SRS) number in runs per game. SRS takes the run difference between the teams in each game, adjusts it by the home field advantage from #1 depending on whether the game was home, away, or neutral, and uses the results to come up with a Strength of Schedule (the average rating of a team’s opponents), which feeds back into the final SRS rating. Baseball-reference.com calculates SRS for MLB teams (in 2011 the Yankees led MLB with a 1.4 RPG SRS while Houston had a -1.2 SRS). For more details on the calculation, see the football example at: http://www.pro-football-reference.com/blog/?p=37

3.     Estimate, based on Runs Scored and Runs Allowed, how each team’s SRS breaks down into offense and defense/pitching. Using 2011 MLB as an example, perhaps the Yankees’ 1.4 SRS would be 1.3 for offense and 0.1 for defense/pitching. So if our team is playing the Yankees, our pitchers are going to get a lot more ‘credit’ for having faced the Yankee batters than our batters will for having to face the Yankee pitchers.
4.     Calculate a park factor adjustment for every park played in. We do this using a ‘residual’ runs method. Again using MLB 2011 as an example, suppose the MLB average is 5.0 runs per team per game, and the Yankees, with their 1.3 SRS offense and 0.1 defense, are playing the Astros, who let’s say have a -0.2 SRS offense and a -1.0 defense. Then (ignoring home field advantage for simplicity) our expected runs would be Yankees 7.3 (5.0 + 1.3 Yankee offense + 1.0 for the weak Astros defense) and Astros 4.7 (5.0 - 0.1 Yankee defense - 0.2 Astros offense), for 12.0 runs expected if the park were neutral. If the actual score in Yankee Stadium is Yankees 7, Astros 3 (10 actual runs), then Yankee Stadium gets a -2 runs difference (10 - 12) for that game. The differences from every Yankee Stadium game are then averaged to get the Yankee Stadium runs adjustment.
5.     For each game, apply a run adjustment based on the opponent’s SRS (again with batters and pitchers getting separate adjustments), adjust for whether the game was at home, away, or on a neutral site, and apply the specific park adjustment; then add these together for the final batter and pitcher difficulty run adjustments for that game. Finally, all of the team’s individual game adjustments are averaged to get the normalizing factor for that team’s batters and pitchers.
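The first three steps can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the seamheads.com code: the home field estimate in step 1 is a naive average of home-minus-road margins, and steps 2 and 3 are folded into one least-squares fit in which a team’s expected runs are the league average plus its offense rating minus the opponent’s defense rating, with SRS taken as offense plus defense (matching the 1.3 + 0.1 = 1.4 Yankees example). The `Game` record, the `hfa / 2` split between the home and road sides, and all function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Game:
    home: str
    away: str
    home_runs: float
    away_runs: float
    neutral: bool = False  # True for games at a neutral site


def estimate_hfa(games):
    """Step 1 (naive assumption): mean home-minus-road run margin over
    non-neutral games. Does not control for team strength."""
    margins = [g.home_runs - g.away_runs for g in games if not g.neutral]
    return sum(margins) / len(margins) if margins else 0.0


def team_games(games, hfa):
    """Yield (team, opponent, runs_scored, venue_bonus) for both sides of each
    game. Home side gets +hfa/2 and road side -hfa/2, so the expected home
    margin is hfa; neutral-site games get no bonus."""
    for g in games:
        bonus = 0.0 if g.neutral else hfa / 2
        yield g.home, g.away, g.home_runs, bonus
        yield g.away, g.home, g.away_runs, -bonus


def fit_ratings(games, hfa, iterations=500):
    """Steps 2-3 (one possible reading): least-squares fit of
        runs_scored = mu + off[team] - dfn[opponent] + venue_bonus
    by damped fixed-point iteration. Returns mu, off, dfn and SRS = off + dfn."""
    rows = list(team_games(games, hfa))
    mu = sum(r[2] for r in rows) / len(rows)  # league runs per team-game
    teams = sorted({r[0] for r in rows})
    off = dict.fromkeys(teams, 0.0)
    dfn = dict.fromkeys(teams, 0.0)
    for _ in range(iterations):
        off_sum, def_sum, n = defaultdict(float), defaultdict(float), defaultdict(int)
        for team, opp, runs, bonus in rows:
            resid = runs - mu - bonus          # runs beyond average and venue
            off_sum[team] += resid + dfn[opp]  # implied offense of batting side
            def_sum[opp] += off[team] - resid  # implied defense of pitching side
            n[team] += 1  # each team is "team" and "opp" equally often
        for t in teams:
            # Damping (half old, half new) keeps offense and defense from
            # oscillating against each other between iterations.
            off[t] = 0.5 * off[t] + 0.5 * off_sum[t] / n[t]
            dfn[t] = 0.5 * dfn[t] + 0.5 * def_sum[t] / n[t]
        # Adding one constant to every off and dfn leaves predictions unchanged,
        # so pin the scale: games-weighted mean offense = 0.
        shift = sum(n[t] * off[t] for t in teams) / sum(n.values())
        for t in teams:
            off[t] -= shift
            dfn[t] -= shift
    srs = {t: off[t] + dfn[t] for t in teams}
    return mu, off, dfn, srs
```

The damped iteration is used only to keep the sketch dependency-free; a standard least-squares solver applied to the same model would give equivalent ratings. Because each team’s rating is built from its own schedule, teams that played different numbers of games, or more or fewer neutral-site games, are handled without assuming a balanced schedule.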

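Steps 4 and 5 can likewise be sketched, assuming each team’s offense and defense ratings from step 3 are already available (here the hypothetical Yankees/Astros numbers from the step 4 example). Each park’s adjustment is the average of actual minus expected total runs, and each side of a game is then charged half of that park effect. The `park` field, the sign conventions (positive batter values mean an easier place to score, positive pitcher values mean more runs expected against), and the function names are assumptions for illustration, not the site’s actual code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Game:
    home: str
    away: str
    home_runs: float
    away_runs: float
    park: str              # where the game was actually played
    neutral: bool = False  # True if neither team was at home


def expected_runs(mu, off, dfn, batting, pitching, bonus=0.0):
    """Neutral-park expectation: league average + batting offense
    - pitching defense + any home/road bonus."""
    return mu + off[batting] - dfn[pitching] + bonus


def park_adjustments(games, mu, off, dfn, hfa=0.0):
    """Step 4: average (actual - expected) total runs per game at each park."""
    residuals = defaultdict(list)
    for g in games:
        bonus = 0.0 if g.neutral else hfa / 2
        expected = (expected_runs(mu, off, dfn, g.home, g.away, bonus)
                    + expected_runs(mu, off, dfn, g.away, g.home, -bonus))
        residuals[g.park].append(g.home_runs + g.away_runs - expected)
    return {park: sum(r) / len(r) for park, r in residuals.items()}


def team_difficulty(team, games, off, dfn, parks, hfa=0.0):
    """Step 5: average per-game run environment, relative to league average,
    for a team's batters and pitchers. Assumes the team played at least one game."""
    bat, pit = [], []
    for g in games:
        if team not in (g.home, g.away):
            continue
        at_home = team == g.home
        opp = g.away if at_home else g.home
        bonus = 0.0 if g.neutral else (hfa / 2 if at_home else -hfa / 2)
        park_share = parks.get(g.park, 0.0) / 2  # each side gets half the park effect
        bat.append(-dfn[opp] + bonus + park_share)  # opposing pitching, venue, park
        pit.append(off[opp] - bonus + park_share)   # opposing batting, venue, park
    return sum(bat) / len(bat), sum(pit) / len(pit)


# The step 4 example: Yankees 7, Astros 3 at Yankee Stadium, no home edge.
off = {"NYY": 1.3, "HOU": -0.2}
dfn = {"NYY": 0.1, "HOU": -1.0}
games = [Game("NYY", "HOU", 7, 3, "Yankee Stadium")]
parks = park_adjustments(games, 5.0, off, dfn)
print({p: round(v, 2) for p, v in parks.items()})  # {'Yankee Stadium': -2.0}
```

With only this one game, the Yankee batters’ environment nets out to about zero (a weak Astros defense offset by a pitcher-friendly park), while the Yankee pitchers’ environment is about -1.2 runs, reflecting both the weak Astros lineup and the park.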


  1. Justin Oakes says:

    Thanks for an interesting post, Kevin.

    1. Are the Negro League stats presented (so far) on the site raw or normalized?

    2. When do you envision the stats from 1923-1947 being added to the database?



  2. Justin:

    OPS+ and ERA+ are normalized for 1916-1922. We haven’t yet normalized the Cuban Winter League seasons. They weren’t a priority because:
    1. Almost all of the games were played in the same park.
    2. They actually did play a balanced, round-robin schedule.
    However, strength of schedule is important, especially when you sometimes have only 3 teams in the league, so we will get those Cuban Leagues done soon.

    We will have the 1923 NNL added very soon, but after that the plan is to add the 1903-08 Cuban Summer Leagues plus the 1902/03 and 1903/04 Cuban Winter Leagues before the end of the year.

    For next year, we will be adding the 1908-1915 Negro League Teams data.

    We also hope to add the 1923 ECL next year, but the timing for 1924 and later is still to be determined, sometime in 2013 or beyond…
