Baseball Digest Daily

Statistics 101: Probability

by Matt Mitchell

Continuing our look at key statistical topics that are used in sabermetrics

Hopefully you took the time to read last week’s introduction to simple descriptive statistics in this series on statistical concepts. If you learned from it and enjoyed it, good. I apologize if it was too math-heavy, though I did try to mix “plain English” explanations in with the mathematical definitions for those who dislike this subject that I love (and that also pays my bills). I know I said that this week would be on distributions, but in trying to write that piece I realized the need for a piece on probability first. We’ll begin there.

Everyone over the age of reason has some sort of grasp on this topic, be it a complete understanding of probability theory or just the understanding that a coin toss or die roll has different outcomes, some of which appear more often than others. There are a couple of ways we look at probability:

  • We can figure out that there are a certain number of possible outcomes, and want to find the probability of one of those outcomes happening. This usually happens under the assumption that all outcomes are equally likely to occur at any time. This is like the dice example from last week. Statisticians sometimes refer to this as classical probability. Vegas card games are another example of something that relies on this idea of probability.
  • Of course, in baseball, we know that not all outcomes are equally likely. Thus we often use empirical probability, which looks at how often an outcome occurs when an action is repeated time and again. We treat many baseball statistics like this: batting average and on base percentage are the most prominent examples. When these stats and others are treated as probabilities, the assumption is made that each at bat or plate appearance (or other event) is identical (i.e. there are no significant differences between them).
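To make the two views concrete, here is a minimal Python sketch. The two-dice setup and the season totals (150 hits in 500 at bats) are assumptions made up for illustration, not the example from last week or any real player's numbers.

```python
from fractions import Fraction
from itertools import product

# Classical probability: count equally likely outcomes.
# Assumed example: rolling two fair dice and asking for a total of 7.
outcomes = list(product(range(1, 7), repeat=2))   # all 36 ordered rolls
sevens = [roll for roll in outcomes if sum(roll) == 7]
p_seven = Fraction(len(sevens), len(outcomes))    # favorable / possible

# Empirical probability: count what happened over many repetitions.
# Hypothetical season line, chosen only for illustration.
hits, at_bats = 150, 500
batting_average = hits / at_bats                  # treated as P(hit) per at bat

print(p_seven)           # 1/6
print(batting_average)   # 0.3
```

The first number comes from counting outcomes before anything happens; the second comes from counting results after the fact, and only works as a probability if we accept that every at bat is roughly the same kind of event.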

This understanding of probability also contains another assumption: each event is independent. Independence, in statistics, means that each event has no bearing on other events. For example, a crazy manager decides he will randomly fill out a line-up card for each game of a three-game series. The probability that any given player on a 25-man roster will be selected to lead off the first day is 1/25. The same probability holds for each game, since the line-up cards are drawn independently of one another.
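Because the three line-up draws are independent, their probabilities simply multiply. The sketch below works that out for the crazy manager; the 25-man roster is taken from the 1/25 figure, and the variable names are my own.

```python
from fractions import Fraction

ROSTER_SIZE = 25   # implied by the 1/25 chance of leading off
GAMES = 3          # length of the series

p_lead_off = Fraction(1, ROSTER_SIZE)

# Independent events: the chance of leading off every game is the product
# of the per-game chances.
p_every_game = p_lead_off ** GAMES

# Chance of leading off at least once: 1 minus the chance of never being picked.
p_at_least_once = 1 - (1 - p_lead_off) ** GAMES

print(p_every_game)      # 1/15625
print(p_at_least_once)   # 1801/15625
```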

However, we also need to understand when events are dependent, that is, when their probability is based on a previous event. Back to our crazy line-up manager example: the probability of a player being picked at random as the lead-off man was 1/25, but each of the remaining players then has a 1/24 chance of being picked as the #2 hitter, since the player who was picked for the lead-off spot cannot be picked again.
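For dependent events, the second probability has to be updated for what already happened. As a rough sketch (the player numbering is hypothetical), the chance that one particular player leads off and another particular player bats second can be computed by multiplying, and then checked by listing every possible top-two ordering:

```python
from fractions import Fraction
from itertools import permutations

ROSTER_SIZE = 25

# Multiply the first pick by the second pick given the first.
p_a_first = Fraction(1, ROSTER_SIZE)
p_b_second_given_a_first = Fraction(1, ROSTER_SIZE - 1)
p_a_then_b = p_a_first * p_b_second_given_a_first

# Check by brute force: every ordered pair of distinct players is equally likely.
# Players are numbered 0..24; player 0 is "A" and player 1 is "B" (arbitrary labels).
top_two_orders = list(permutations(range(ROSTER_SIZE), 2))
matches = [pair for pair in top_two_orders if pair == (0, 1)]
p_by_counting = Fraction(len(matches), len(top_two_orders))

print(p_a_then_b)      # 1/600
print(p_by_counting)   # 1/600
```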

Dependent probability is similar to conditional probability. Conditional probability is where we calculate the likelihood of an event given that something else has occurred. This is like looking at the “runners in scoring position” data or other splits.
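A split is just a conditional probability computed from counts. Here is a small sketch using invented numbers (500 at bats, 120 of them with runners in scoring position, 42 hits in those 120); none of these figures come from real data.

```python
from fractions import Fraction

# Hypothetical season totals, for illustration only.
at_bats = 500
at_bats_risp = 120   # at bats with runners in scoring position (RISP)
hits_risp = 42       # hits that came in those RISP at bats

# Definition: P(hit | RISP) = P(hit and RISP) / P(RISP)
p_hit_and_risp = Fraction(hits_risp, at_bats)
p_risp = Fraction(at_bats_risp, at_bats)
p_hit_given_risp = p_hit_and_risp / p_risp

# Same thing as the familiar split: hits with RISP / at bats with RISP.
split_average = Fraction(hits_risp, at_bats_risp)

print(p_hit_given_risp)          # 7/20
print(float(p_hit_given_risp))   # 0.35
```

The total number of at bats cancels out, which is why the RISP batting average can be read straight off the split line.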

So maybe you knew most of that, but for those who didn’t, hopefully you understand a little better how probability can be calculated. Since we’ll talk about probability distributions next time, I hope this helps you understand what we will discuss a little bit more. In the meantime, think of other ways probability is or can be used that I didn’t mention (they do exist, but I wanted to keep this simple).
