Baseball Digest Daily

Good vs. Bad

by Matt Mitchell

How to make a stats geek hate your statistical analysis.

In my quest to find a quantifiable link between Spring Training and regular-season performance (you can find that here), I was pointed to one of John Dewan’s Stat of the Week posts. I also found this post on FanGraphs by David Appelman, which in essence said something John alluded to in the previous link: spring training stats are hardly predictive of their regular-season counterparts.

Yet, as I looked over Appelman’s post, which was more scientific than Dewan’s informational prose, I kept thinking about how flawed his analysis seemed. Here’s what he didn’t show in his work, and what I’ll try to avoid when I perform a similar exercise.

Shameless promotion of FanGraphs aside, my first issue is that he seems to cherry-pick his stat. Now, since K/9 is more indicative of a skill, it isn’t the worst choice (that would probably be something like the traditionally flawed ERA or BA). But he presents it as if looking at K/9 were representative of looking at any other statistic, whether batting, fielding, or pitching, in answering his initial question, which is very similar to the one I posed last week.
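
For reference, K/9 is nine times strikeouts divided by innings pitched. Here is a minimal Python sketch of that arithmetic. The function names are made up for illustration, and it assumes innings come in box-score notation, where “15.1” means 15 1/3 innings rather than 15.1. With the small innings totals of spring training, getting that denominator right can shift the result a bit.

```python
def innings_to_outs(ip: str) -> int:
    """Convert box-score innings ("6.2" = 6 2/3 innings) into outs recorded."""
    whole, _, thirds = ip.partition(".")
    thirds = thirds or "0"
    if thirds not in ("0", "1", "2"):
        raise ValueError(f"invalid innings notation: {ip!r}")
    return int(whole) * 3 + int(thirds)


def k_per_9(strikeouts: int, ip: str) -> float:
    """Strikeouts per nine innings: 9 * K / IP, with IP in true innings."""
    outs = innings_to_outs(ip)
    if outs == 0:
        raise ValueError("no innings pitched")
    return 27 * strikeouts / outs  # same as 9 * K / (outs / 3)


# Hypothetical spring line: 12 strikeouts in 15.1 innings (46 outs).
print(round(k_per_9(12, "15.1"), 2))
```

Treating “15.1” as a plain decimal would divide by 15.1 instead of 15.33 and slightly overstate the rate.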

Something else goes underaccounted for in his “finding” as well. While I do like his use of correlation analysis, Appelman himself cites the reason such an analysis can be flawed in this instance: sample size. The difference in correlation he cites could just as easily be random error inherent in a small sample. Did he look into the distributions to rule that out? Unless he demonstrates it, the answer is the same as the one to the Tootsie Pop question: the world may never know. For someone who does this very well, check the sidebar links under “Sabermetrics”; he goes by the moniker Tangotiger.
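
To make the sample-size point concrete: one standard way to check whether a gap between two correlations is more than noise is the Fisher z-transformation. The sketch below is mine, not Appelman’s method. It assumes the two correlations come from independent samples of n1 and n2 players, and the numbers in the example are placeholders rather than figures from his post. If the same pitchers appear in both samples, a test for dependent correlations (such as Steiger’s) would be the better fit.

```python
import math


def correlation_ci(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for a Pearson r computed from n pairs."""
    if n <= 3:
        raise ValueError("need more than 3 observations")
    se = 1 / math.sqrt(n - 3)   # standard error on the Fisher z scale
    z = math.atanh(r)           # Fisher transformation of r
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def compare_correlations(r1: float, n1: int, r2: float, n2: int) -> tuple[float, float]:
    """Two-sided z-test for the difference between two independent correlations."""
    if n1 <= 3 or n2 <= 3:
        raise ValueError("need more than 3 observations in each sample")
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (math.atanh(r1) - math.atanh(r2)) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under a normal
    return z, p


# Placeholder values, NOT the figures from the FanGraphs post.
z, p = compare_correlations(0.60, 40, 0.45, 40)
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
print("95% CI for r = 0.45, n = 40:", correlation_ci(0.45, 40))
```

With samples of a few dozen players, the confidence intervals are wide enough that two correlations can look different while the test shows the gap is well within what chance alone produces.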

And something that may be nitpicking a little: if you’re going to use an extreme from your distribution as an example, show some not-so-extreme observations as well. As a matter of fact, start with those, then point out the extreme cases you are examining.
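
One way to follow that advice in practice is sketched below. It assumes you have each player’s spring-minus-regular-season difference for some stat: report the median and the middle half of the distribution first, and only then name the outliers. The function name and the players are hypothetical, and the values are made up purely to show the shape of the output.

```python
import statistics


def summarize_then_extremes(changes: dict[str, float], k: int = 3):
    """Return the typical spread of a distribution, then its k most extreme cases."""
    values = list(changes.values())
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points (needs 2+ values)
    median = statistics.median(values)
    summary = {"n": len(values), "q1": q1, "median": median, "q3": q3}
    # Extremes are ranked by distance from the median, in either direction.
    ranked = sorted(changes, key=lambda name: abs(changes[name] - median), reverse=True)
    return summary, ranked[:k]


# Hypothetical spring-minus-regular-season K/9 differences, not real data.
changes = {
    "Pitcher A": 0.4, "Pitcher B": -0.3, "Pitcher C": 1.1,
    "Pitcher D": -0.2, "Pitcher E": 4.8, "Pitcher F": 0.1,
}
summary, extremes = summarize_then_extremes(changes, k=2)
print(f"n = {summary['n']}, median = {summary['median']:.2f}, "
      f"middle half = [{summary['q1']:.2f}, {summary['q3']:.2f}]")
print("most extreme:", ", ".join(extremes))
```

Leading with the median and interquartile range tells the reader what a typical player looks like before an outlier like the made-up “Pitcher E” gets the spotlight.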

Quite simply, good statistical analysis starts with understanding your context, examining the data thoroughly, and demonstrating both when you present the material, so that you properly answer the question that prompted your work in the first place. Anything less gives more ammo to those who want to discredit you or, worse, lets a false conclusion be posed as fact by an unwitting reporter or politician and then spread to many people. Keep this in mind the next time you hear a report about drug studies on the national news, too.

Comments (2) on “Good vs. Bad”

  1. Andrew
    20 March 2008 22:43

    Matt,

    You might want to check out this study done by Beyond the Box Score last year concerning spring training statistics.

    http://www.beyondtheboxscore.com/story/2007/2/28/11030/7931

  2. Matt Mitchell
    21 March 2008 06:19

    Nice link. I figured someone had to have done work on this, and I’ve received some good links. My takeaway from all of them is that spring statistics do not predict regular-season statistics. I’m more interested in whether spring stats could, or should, be used when deciding who gets a given roster spot.
    Looks like I’ll have to finally get my BP subscription to read the article cited in your link too.
