Baseball Digest Daily

The Sabermetric Soapbox: Empiricism and Theory

by Matt Mitchell

From what we’ve seen to what we can imagine, and the other way around

My introduction to sabermetrics came from a regular trip to the local library. It was sometime in high school, and I was wandering the shelves looking for some summer reading to do when I wasn’t perfecting my curly-q ice cream technique or umpiring a Little League game. Naturally, I always made a stop by the 790s in the Non-Fiction shelves: the sports section. I happened upon Curve Ball by Jim Albert and Jay Bennett, and was introduced to how statistical models could be used in baseball analysis. Many of the metrics they cited, though, were grounded in empirical observation.

Fast forward a few years, and I find myself reading another must-have book for those interested in the statistical analysis of baseball: The Book by Tom M. Tango, Mitchel G. Lichtman, and Andrew E. Dolphin. Here the authors not only worked from empirically derived metrics, but also used theoretical models in their analysis of the game.

In many ways, the foundations of sabermetrics rest on what has been observed, and that makes plenty of sense. When you’re studying a game that has been recorded for over a century, you have enough data to create valid methods of measurement. I imagine Bill James went through plenty of trial and error in working out the details of metrics like Game Score and Pythagorean Record. Defending a conclusion based on what you saw is a much easier sell than advancing a theory. Plus, since we like history here, it makes sense to go back and look at what happened to see who had what impact on the game.
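As a concrete picture of one of those empirically tuned metrics, here is a minimal Python sketch of the Pythagorean Record in its classic form, where expected winning percentage is runs scored squared over the sum of runs scored squared and runs allowed squared. The function name and the sample run totals are made up for illustration, and the exponent is left as a parameter because it is exactly the kind of detail that gets settled by checking against real seasons.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Expected winning percentage from runs scored and runs allowed.

    exponent=2.0 is the classic form; other values are fitted empirically.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical team: 800 runs scored, 700 allowed over a 162-game season.
win_pct = pythagorean_win_pct(800, 700)
expected_wins = 162 * win_pct
print(f"{win_pct:.3f} winning percentage, about {expected_wins:.0f} wins")
```

A team that scores exactly as many runs as it allows comes out at .500, which is a quick sanity check on the formula.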

Theoretical models are still useful and still prominent. All those statistical projection systems use them to varying extents. A strategy can be analyzed in simulated run-scoring environments, from a game with a lot of runs being scored to one with very few. You can plug in a logical range of Equivalent Average values to see how they affect Equivalent Runs and other derived statistics (both metrics from BP).

But sabermetrics hasn’t traveled the entire way to a completely theoretical model. The run expectancy matrix is really hard to simulate using a Markov chain Monte Carlo simulation, if only because the only limit on scoring runs is making 3 outs in a half inning, and there are so many variants on the chain of events that lead to even 1 or 2 runs being scored in an inning, let alone 10 or 11. And since that matrix is the basis of Linear Weights, a completely theoretical model of the game is difficult to produce.
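To make the idea of simulating run expectancy concrete, here is a toy Python sketch that treats a half inning as a chain of states (outs plus occupied bases) and estimates expected runs by repeated random trials. Everything specific in it is an assumption for illustration: the event probabilities are invented rather than measured, every runner advances exactly as many bases as the batter, and there are no steals, errors, double plays, or sacrifices. Those omitted variants are the part that makes a faithful model hard.

```python
import random

# Hypothetical per-plate-appearance probabilities (illustrative, not measured).
EVENT_PROBS = {"out": 0.68, "walk": 0.08, "single": 0.15,
               "double": 0.05, "triple": 0.005, "home_run": 0.035}
HIT_BASES = {"single": 1, "double": 2, "triple": 3, "home_run": 4}


def advance(bases, event):
    """Return (new_bases, runs) for a non-out event.

    bases is a tuple of booleans: (first, second, third).
    """
    first, second, third = bases
    if event == "walk":
        # Only forced runners move up on a walk.
        runs = 1 if (first and second and third) else 0
        return (True, second or first, third or (first and second)), runs
    n = HIT_BASES[event]
    new_bases, runs = [False, False, False], 0
    # Position 0 is the batter; 1-3 are the occupied bases.
    positions = [0] + [i + 1 for i, occupied in enumerate(bases) if occupied]
    for pos in positions:
        dest = pos + n  # simplification: everyone advances n bases
        if dest >= 4:
            runs += 1
        else:
            new_bases[dest - 1] = True
    return tuple(new_bases), runs


def simulate_half_inning(rng, outs=0, bases=(False, False, False)):
    """Play out the rest of a half inning and return the runs scored."""
    events, weights = list(EVENT_PROBS), list(EVENT_PROBS.values())
    runs = 0
    while outs < 3:
        event = rng.choices(events, weights=weights)[0]
        if event == "out":
            outs += 1
        else:
            bases, scored = advance(bases, event)
            runs += scored
    return runs


def run_expectancy(outs, bases, trials=10000, seed=0):
    """Average runs scored from a given state over many simulated innings."""
    rng = random.Random(seed)
    total = sum(simulate_half_inning(rng, outs, bases) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    print(run_expectancy(0, (False, False, False)))
    print(run_expectancy(1, (True, True, False)))
```

Running `run_expectancy` over all 24 combinations of outs and base states would fill in a simulated matrix, but its values are only as good as the assumed probabilities and advancement rules, which is where the empirical record comes back in.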

Can it happen? Maybe, but it’ll be through empirical methods.

Column note: Like the column name? Hate it? Feel free to voice your opinion.

Comments (2) -> “The Sabermetric Soapbox: Empiricism and Theory”

  1. Richard Stroud
    09 April 2008 08:00
    1

    I like the title. I don’t know anything about sabermetrics so I can’t comment on the rest of the article. But the title is good.

  2. Mike Lynch
    09 April 2008 13:59
    2

    Great stuff, Matt!
