Baseball Digest Daily

Working backwards

by Matt Mitchell

A different way to work with baseball statistics

I’ve started taking classes toward a Master’s degree in statistics (in case you never read my bio, my day job is as a statistician). My first class is a general introduction to the program, which I still haven’t been admitted to, and it features presentations on the various branches of statistics. The most recent one covered nonparametric statistics.*

*If you’re not statistically inclined, you probably read that last phrase and thought, “OK, I’m outta here!” Please stay, because you’ll probably benefit from this as much as any stathead will.

Most of the statistical analysis you’ve probably seen, whether about baseball or some other part of society, was done by fitting the data to a particular type of model. That is, someone looks at the information, mostly qualitative knowledge about the subject, decides that something like regression should work, and applies that technique to the data. However, this is typically done without any check that the chosen method is appropriate. A traditional technique may fit adequately, but something better may be available. The solution is a nonparametric model, which does not fix the form of the relationship in advance and instead lets the data determine its shape.
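The missing validation step can be as simple as fitting the model and then reading the signs of its residuals in order. The following is a minimal stdlib-only Python sketch. It assumes a straight-line (least-squares) model and uses invented rise-and-fall data, and the helper names are hypothetical. If the straight line were appropriate, the signs would look scattered. Long runs of the same sign mean the assumed shape is wrong, even when the fit looks adequate at a glance.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def residual_signs(xs, ys):
    """Sign of each residual (observed minus fitted), in x order.
    A residual of exactly zero is reported as '-' in this sketch."""
    a, b = fit_line(xs, ys)
    return ["+" if y - (a + b * x) > 0 else "-" for x, y in zip(xs, ys)]


# Hypothetical, invented data with a rise-and-fall shape (not baseball numbers).
xs = list(range(11))
ys = [-(x - 5) ** 2 for x in xs]

print("".join(residual_signs(xs, ys)))  # prints --+++++++--
```

Here the fitted line sits below the data in the middle and above it at both ends. That is exactly the kind of systematic pattern a fixed model hides when nobody checks.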

Can this be used in baseball? I think so, and age curves strike me as a good place to start. Currently, they are built with either a time-series model, which accounts for age always increasing, or a quadratic model, which assumes that a player’s performance rises to a peak at a certain age and then tails off. But this is a perfect example of qualitative information dictating what the model should look like.
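To make the idea concrete, here is a minimal sketch of one nonparametric option, a Nadaraya-Watson kernel smoother, in stdlib-only Python. It is an illustration, not the author’s method. The function names, the Gaussian kernel, the bandwidth of 1.5 years, and the age/performance numbers are all hypothetical and invented for this example. Each age’s estimate is simply a weighted average of nearby seasons, so no peak age or curve shape is assumed in advance.

```python
import math


def kernel_smooth(xs, ys, x0, bandwidth):
    """Nadaraya-Watson estimate at x0: a weighted average of ys whose
    Gaussian weights shrink as the corresponding x moves away from x0."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def age_curve(ages, values, grid, bandwidth=1.5):
    """Smoothed performance at each age in grid (hypothetical helper).
    No functional form, peak age, or symmetry is imposed."""
    return {age: kernel_smooth(ages, values, age, bandwidth) for age in grid}


# Invented illustrative data, NOT real player seasons: (age, performance index).
ages = list(range(22, 35))
values = [90, 95, 99, 102, 104, 105, 106, 105, 104, 102, 99, 95, 90]

curve = age_curve(ages, values, grid=range(22, 35))
for age, estimate in curve.items():
    print(f"{age}: {estimate:.1f}")

peak_age = max(curve, key=curve.get)
print("estimated peak age:", peak_age)  # prints 28 for this invented data
```

Nothing forces the smoothed curve to be symmetric. A slow rise followed by a steep decline, or a long plateau, would show up in the estimates, whereas a single quadratic is symmetric about its vertex and cannot represent that. The bandwidth is the main judgment call. Too small and the curve chases noise; too large and it flattens toward the overall average.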

So why hasn’t this been done? Easy: most people don’t know how. Nonparametric statistics is taught at a very advanced level; the courses are optional in MS programs and required only for Ph.D. students. But I think this approach has growing value as the “stats vs. scouts” divide dissolves into a symbiotic relationship between the two camps.

So my challenge to the reader is this: Can you think of a set of valid assumptions about the game of baseball that could be used as a basis for this type of analysis?
