Baseball Digest Daily

Statistics 101: Regression to the mean

by Matt Mitchell

And we return to our overview of statistical topics in sabermetrics.

Two weeks without a post feels like a long time in the rapidly moving blogosphere, but I’m back from my little vacation. As good as it was (including my own bachelor party at Comiskey Park, complete with a White Sox triumph), it’s time to get back to where we left off.

This week we start our exploration of the big “R” word in statistics, regression. However, this is a word that has something of a double meaning. Most of the time, referring to “regression” means referring to the examination of a relationship between two or more variables to see if there is any kind of trend. There are many flavors of this, and we’ll discuss those next week. This week, I’ll explain the idea of “regression to the mean”, which is only concerned with a single variable.

Regression to the mean is not a foreign concept once you understand that it is the way statistics handles the idea of things evening out in data that is not purely random. In other words, regression to the mean adjusts a measure towards the average in order to reduce the influence of variance, or luck.

This idea is discussed in decent depth in The Book. (I believe it is the work of Mr. Dolphin there, though Tangotiger or MGL should correct me if I’m wrong.) I’ll try not to lift too much of what they printed, but the point of using regression to the mean in baseball is to estimate the true talent level of a player at a given moment.

Take the Chipper Jones chase for .400 again. His career average coming into 2007 was .310. Since Chipper has played for many years, we treat that as his pre-existing talent level. Of course, he’s been hitting well over that for the 2008 season. Did his talent level improve to where he’s now a .370 hitter? While we could say yes, we don’t, because we have a history for him that suggests otherwise. But we don’t want to diminish any improvement in his hitting ability that may have been made. Thus, we regress to the mean based on his current batting average, his career batting average, and the number of at bats.

I’m out of time to do the math for you now. (It’s time to go to the day job, unfortunately.) However, I hope this makes it clear why this idea is important to sabermetrics.
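In the meantime, here is a rough sketch of the kind of weighting described above, for anyone who wants a feel for the arithmetic. It is not the method from The Book, just a simple weighted average that pulls the current batting average towards the career average, with less pull as the at bats pile up. The .370 and .310 figures come from the Chipper example; the 300 at bats and the prior weight of 500 are made-up numbers for illustration, and the function name is my own.

```python
def regress_to_mean(observed_avg, at_bats, prior_avg, prior_weight):
    """Shrink an observed batting average towards a prior average.

    prior_weight is a hypothetical tuning constant: how many at bats of
    "belief" we give the prior average. It is an assumption for this
    sketch, not a published value.
    """
    if at_bats < 0:
        raise ValueError("at_bats must be non-negative")
    if prior_weight <= 0:
        raise ValueError("prior_weight must be positive")
    # Weighted average: each rate is weighted by the at bats behind it.
    total_weight = at_bats + prior_weight
    return (observed_avg * at_bats + prior_avg * prior_weight) / total_weight


# Current average and career average from the example above;
# the at bats and prior weight are invented for illustration.
estimate = regress_to_mean(observed_avg=0.370, at_bats=300,
                           prior_avg=0.310, prior_weight=500)
print(f"Regressed estimate: {estimate:.4f}")
```

With these invented inputs the estimate lands at about .3325, between the career mark and the current pace. With zero at bats it would simply return the career average, and with more at bats it drifts closer to what he is actually hitting.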

Next week: More regression
