Introducing the New Negro Leagues Database

It’s been over five years since we originally launched the Negro Leagues Database. Over that time, there have been significant additions to the database, in terms of new seasons and statistics. But the website and the presentation of these statistics have largely remained the same. In May of 2015, I overhauled the Major League part of The Baseball Gauge, and I’ve wanted to do the same with the Negro Leagues section. Today, we re-launch the award-winning Negro Leagues Database. Here are some of the new features:

Per 162 games

One of the biggest issues with Negro Leagues statistics is that they are incomplete. We don’t have box scores for every game and we currently do not have data for every season and league. Because of this, it’s tough to compare Buck Leonard’s 62 career home runs to Cristóbal Torriente’s 70, the same way we compare Harmon Killebrew (573) to Andre Dawson (438).

To help fix this issue, I’ve included “per 162 games” rates on player and season/career leaderboard pages. Here we’ll see that Buck Leonard averaged 26 home runs per 162 games, while Torriente averaged 11.
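As a rough illustration of the idea, a “per 162 games” rate can be computed by scaling a counting stat by the ratio of 162 to the number of games on record. This is only a minimal sketch: the function name `per_162` and the sample inputs are hypothetical, and the site's actual calculation may differ.

```python
def per_162(total, games, season_length=162):
    """Scale a counting stat to a full-season rate (assumed formula)."""
    if games <= 0:
        raise ValueError("games must be positive")
    return total * season_length / games


# Hypothetical inputs, not taken from the database:
# 30 home runs in 405 recorded games.
print(round(per_162(30, 405), 1))  # 12.0
```

The fewer games on record, the more a rate like this can swing, so it is best read alongside the number of games behind it.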

Similarity scores

Comparing raw stats from the Negro Leagues to the Major Leagues is far from perfect: it doesn’t account for league quality, park factors, or era. With that caveat, every player page now has similarity scores that show which Major Leaguer had the most similar career. Because Negro Leagues statistics are incomplete, “per 162 games” statistics are used instead of career totals. You can also limit the comparison to Hall of Famers or active players.

The similarity score tool shows us that Oscar Charleston’s most similar Major Leaguer was Rogers Hornsby.

[Image: Charleston vs Hornsby]

Defensive Regression Analysis

These fielding statistics have been available on the Major League site for a few years now and they are finally included in The Negro Leagues Database. Defensive Regression Analysis, created by Michael Humphreys, takes basic fielding statistics and estimates how many runs a player has saved (or allowed) compared to average.

Defensive Regression Analysis shows us that Dick Seay, while a lightweight with the bat (career 51 OPS+), saved 67 runs at second base in the seasons for which we have fielding data.

New Wins Above Replacement

The calculation for Wins Above Replacement now matches the Major League site. It uses Base Runs for offense, Defensive Regression Analysis for fielding, and runs allowed (with an adjustment for fielding) for pitching. The replacement level has been set at .294 to be consistent with Baseball-Reference and FanGraphs.
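To put the .294 replacement level in context, here is a small sketch that converts a winning percentage into wins over a 162-game schedule. The function name `replacement_wins` is hypothetical, and this only shows the arithmetic behind the baseline, not how the site computes player WAR.

```python
REPLACEMENT_WIN_PCT = 0.294  # replacement level stated in the post


def replacement_wins(games=162, win_pct=REPLACEMENT_WIN_PCT):
    """Expected wins for a team playing at the given winning percentage."""
    return win_pct * games


baseline = replacement_wins()         # replacement-level team
average = replacement_wins(162, 0.5)  # .500 team

print(round(baseline, 1))             # 47.6
print(round(average - baseline, 1))   # 33.4
```

In other words, a replacement-level team would be expected to win about 48 of 162 games, roughly 33 fewer than an average team.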

There is also Wins Above Average and Wins Above Greatness if you prefer a different baseline. As with the previous version of the website, Win Shares and Win Shares Above Bench are included.

The career leaderboard per 162 games contains many familiar names:

[Image: WAR per 162]

Roster pages

These are available on team, year, franchise, and all-time pages. They contain vitals, uniform numbers, and birth/death information.

Data Coverage

These pages give the user an idea of which statistics we have and which we are missing.

New Logo

We have a beautiful new logo, which was kindly provided by Gary Cieradkowski, creator of the Infinite Baseball Card Set and author of The League of Outsider Baseball.

Finally, we have all the features that were previously available on The Negro Leagues Database as well as the Major League version of The Baseball Gauge.


3 Responses to Introducing the New Negro Leagues Database

  1. Dr. Doom says:

    Just amazing, Dan! Thank you for the great update. I guess I know how I’ll be spending my break time at work today…

  2. Greg Scholz says:

    Really wonderful stuff! Thank you for your efforts, feels like Christmas came early this year.
