March 26, 2017

Introducing the New Negro Leagues Database

December 5, 2016 by · 2 Comments 

It’s been over five years since we originally launched the Negro Leagues Database. Over that time, there have been significant additions to the database, in terms of new seasons and statistics. But the website and the presentation of these statistics have largely remained the same. In May of 2015, I overhauled the Major League part of The Baseball Gauge, and I’ve wanted to do the same with the Negro Leagues section. Today, we re-launch the award-winning Negro Leagues Database. Here are some of the new features:

Per 162 games

One of the biggest issues with Negro Leagues statistics is that they are incomplete. We don’t have box scores for every game and we currently do not have data for every season and league. Because of this, it’s tough to compare Buck Leonard’s 62 career home runs to Cristóbal Torriente’s 70, the same way we compare Harmon Killebrew (573) to Andre Dawson (438).

To help fix this issue, I’ve included “per 162 games” rates on player and season/career leaderboard pages. Here we’ll see that Buck Leonard averaged 26 home runs per 162 games, while Torriente averaged 11.
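The scaling behind a "per 162 games" rate is simple proportion: divide the total by games played, then multiply by 162. Here is a minimal Python sketch, assuming the rate is computed straight from recorded games (the site's exact method may differ). The function name and the sample numbers are illustrative, not figures from the database.

```python
def per_162(total, games, season_length=162):
    """Scale a counting stat to a full-season rate."""
    if games <= 0:
        raise ValueError("games must be positive")
    return total * season_length / games

# Hypothetical player: 30 home runs in 405 recorded games.
print(per_162(30, 405))  # 12.0
```

Because the rate depends only on the games we actually have box scores for, two players with very different amounts of surviving data can be put on the same footing.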

Similarity scores

Comparing raw stats from the Negro Leagues to the Major Leagues is far from perfect. It doesn’t account for league quality, park factors or era. Having said that, every player page now has similarity scores, which show which Major Leaguer had the most similar career. Because of the issue described above, “per 162 games” statistics are used instead of career totals. You can also restrict the comparison to Hall of Famers or active players.

The similarity score tool shows us that Oscar Charleston’s most similar Major Leaguer was Rogers Hornsby.
Charleston vs Hornsby
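The post doesn't spell out the formula behind these scores. As a rough illustration only, here is a Python sketch in the spirit of Bill James's similarity scores: start at 1000 and subtract points for each difference between two players' per-162 lines. The stat list, the weights, and the sample players are all invented for the example and are not the site's actual values.

```python
# Illustrative weights only: points deducted per unit of difference
# in each per-162 stat. These are NOT the site's actual values.
WEIGHTS = {"HR": 5.0, "H": 1.0, "BB": 1.0, "SB": 1.0}

def similarity(player_a, player_b, weights=WEIGHTS):
    """Start at 1000 and subtract weighted absolute differences."""
    score = 1000.0
    for stat, weight in weights.items():
        score -= weight * abs(player_a[stat] - player_b[stat])
    return max(score, 0.0)

def most_similar(target, candidates):
    """Return the name of the candidate with the highest score."""
    return max(candidates, key=lambda name: similarity(target, candidates[name]))

# Hypothetical per-162 lines, invented for the example.
target = {"HR": 20, "H": 190, "BB": 70, "SB": 25}
pool = {
    "Player A": {"HR": 22, "H": 185, "BB": 65, "SB": 10},
    "Player B": {"HR": 35, "H": 160, "BB": 90, "SB": 5},
}
print(most_similar(target, pool))  # Player A
```

Restricting the comparison to Hall of Famers or active players would amount to filtering the candidate pool before calling `most_similar`.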

Defensive Regression Analysis

These fielding statistics have been available on the Major League site for a few years now and they are finally included in The Negro Leagues Database. Defensive Regression Analysis, created by Michael Humphreys, takes basic fielding statistics and estimates how many runs a player has saved (or allowed) compared to average.

Defensive Regression Analysis shows us that Dick Seay, while a lightweight with the bat (career 51 OPS+), saved 67 runs at second base in the seasons for which we have fielding data.

New Wins Above Replacement

The calculation for Wins Above Replacement now matches the Major League site. It uses Base Runs for offense, Defensive Regression Analysis for fielding, and runs allowed (with an adjustment for fielding) for pitching. The replacement level has been set at a .294 winning percentage to be consistent with Baseball-Reference and FanGraphs.
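To put the .294 replacement level in concrete terms, here is a small Python sketch that reads it as a winning percentage over a 162-game schedule. Only the .294 figure comes from the post; the variable names are illustrative, and the arithmetic says nothing about how the site divides those wins among individual players.

```python
GAMES = 162
REPLACEMENT_WIN_PCT = 0.294  # value stated in the post

# Expected wins for a team made up entirely of replacement-level players.
replacement_wins = GAMES * REPLACEMENT_WIN_PCT

# Expected wins for a league-average (.500) team.
average_wins = GAMES * 0.500

print(round(replacement_wins, 1))                 # 47.6
print(round(average_wins - replacement_wins, 1))  # 33.4
```

In other words, an average team sits roughly 33 wins above a replacement-level team over a full season.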

There is also Wins Above Average and Wins Above Greatness if you prefer a different baseline. As with the previous version of the website, Win Shares and Win Shares Above Bench are included.

The career leaderboard per 162 games contains many familiar names:
WAR per 162

Roster pages

These are available on team, year, franchise, and all-time pages. They contain vitals, uniform numbers, and birth/death information.

Data Coverage

These pages give the user an idea of which statistics we have and which we are missing.

New Logo

We have a beautiful new logo, which was kindly provided by Gary Cieradkowski, creator of the Infinite Baseball Card Set and author of The League of Outsider Baseball.

Finally, we have all the features that were previously available on The Negro Leagues Database as well as the Major League version of The Baseball Gauge.

Comments

2 Responses to “Introducing the New Negro Leagues Database”
  1. Steven Greenes says:

    What is the latest year you intend to include in the database?

  2. For the Negro League specifically, we will possibly stop at 1948. However, we will likely add the Mexican League through 1955, and the 1950s Mandak League, as they both had many Negro League players.
