Baseball Musings
September 19, 2003
A Probabilistic Model of Range

Measuring the ability of fielders to turn batted balls into outs is one of the most unyielding areas of baseball research. Part of the problem was that for a long time, we didn't have good information. What we'd like to produce is a number that represents the player's range. Early attempts at this looked at plays per game; once we got defensive innings, we were able to make this plays per nine innings. But this did not adjust for important elements, such as how many batters a pitcher strikes out or the handedness of the staff.

When STATS, Inc. started keeping data, it also collected parameters about the batted ball: its distance, direction, how hard it was hit, and the type (ground, fly, etc.). With that they created Zone Ratings to try to compensate for the unknown information. Zone Ratings give a fielder credit for all balls he turns into an out, while penalizing him for balls not fielded within his zone. The problem with zone ratings is that all balls in the zone are treated the same. One might imagine that balls hit to the edge of the zone are harder to field than balls hit in the middle of the zone (where you would expect the fielder to position himself).

One of the big discoveries of recent years is that pitchers don't seem to affect fielding stats that much. A recent discussion of this can be found in this previous post. Let's just say that the amount a pitcher affects balls in play going for hits is up for discussion.

I work at the CIIR at UMass, and much of our current work involves using probabilistic models to understand and retrieve documents (where "document" is a very generic term, not limited to text). So I thought, why not apply these to fielding? I'm asking the question, what is the probability of a batted ball becoming an out, given the parameters of that batted ball?

I've used the STATS, Inc. database to obtain three parameters for each ball: its direction (a slice of pie fanning out from home plate), its batted type (ground, fly, line, bunt or pop) and how hard the ball was hit (soft, medium or hard). I then did a maximum likelihood estimate of the probability of an out given those three parameters for each of the nine fielders. Mathematically, you might write it p(o,f|d,t,h), where o is the out, f the fielder, d the direction, t the batted type and h how hard the ball was hit.
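
As a rough sketch of what such an estimate could look like: assuming it is the plain relative frequency within each (direction, type, hardness) bucket, which is the maximum likelihood estimate for a categorical outcome, it comes down to counting. The record layout, the direction label "M" and the four sample balls below are made up for illustration; they are not the STATS, Inc. format or data, and the post doesn't say whether any smoothing was applied.

```python
from collections import Counter, defaultdict

def estimate_out_probabilities(balls):
    """Relative-frequency (maximum likelihood) estimate of p(outcome | d, t, h).

    Each ball is a hypothetical tuple (direction, batted_type, hardness, outcome),
    where outcome is the fielder credited with the out, or "reached".
    """
    bucket_totals = Counter()
    outcome_counts = defaultdict(Counter)
    for direction, batted_type, hardness, outcome in balls:
        key = (direction, batted_type, hardness)
        bucket_totals[key] += 1
        outcome_counts[key][outcome] += 1
    # Divide each outcome count by the number of balls in its bucket.
    return {
        key: {outcome: n / bucket_totals[key] for outcome, n in counts.items()}
        for key, counts in outcome_counts.items()
    }

# Made-up sample: four medium-speed ground balls in the same direction slice.
sample = [
    ("M", "ground", "medium", "reached"),
    ("M", "ground", "medium", "P"),
    ("M", "ground", "medium", "SS"),
    ("M", "ground", "medium", "reached"),
]
model = estimate_out_probabilities(sample)
print(model[("M", "ground", "medium")])
```

With real data each bucket would hold many balls, and thinly populated buckets would give noisy estimates.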

Let's look at a specific example. Take a ground ball up the middle to the leftfield side of 2nd base, hit with medium speed. The probability of the batter reaching base is .416. The probability of the pitcher turning it into an out is .312. Shortstop, .258; second base, .013; catcher, .0009 (these may not add up due to rounding). If it were a line drive instead of a ground ball, the probabilities change:

Batter Reached .749
Centerfielder .193
Pitcher .04
Shortstop .018
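
The line-drive numbers above can be checked with a little arithmetic: the chance that somebody records an out is the sum of the fielder probabilities, which is the same as one minus the chance the batter reaches. The dictionary keys are just labels chosen for this sketch.

```python
# Line-drive probabilities quoted in the post (keys are illustrative labels).
line_drive = {"reached": 0.749, "CF": 0.193, "P": 0.04, "SS": 0.018}

# Probability that any fielder turns the ball into an out.
p_out = sum(p for who, p in line_drive.items() if who != "reached")

print(round(p_out, 3))                     # 0.251
print(round(sum(line_drive.values()), 3))  # 1.0
```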

Now, how do we use this information? These probabilities can be thought of as expectations; if a team has 1000 balls hit as line drives to the above direction with medium speed, about 25% of them would be expected to be turned into outs. So if a team is turning more than 25% of those into outs, they are exceeding expectations. So my first attempt at using this information is to figure out, for each team, how many balls put into play against them should have been turned into at least one out. I'll then compare that to how many they actually turned into outs, and see what teams exceed expectations the most:

Team Expected Actual Ratio
Mariners 2878.0 2947.0 1.024
Phillies 2851.9 2916.0 1.022
Cardinals 3003.4 3063.0 1.020
Dodgers 2588.2 2635.0 1.018
Expos 2871.4 2911.0 1.014
Angels 2894.2 2927.0 1.011
White Sox 2790.3 2821.0 1.011
Padres 2790.8 2819.0 1.010
Braves 2939.2 2964.0 1.008
Brewers 2931.3 2952.0 1.007
Reds 2995.5 3011.0 1.005
Cubs 2590.7 2602.0 1.004
Astros 2783.2 2795.0 1.004
Royals 3002.0 3014.0 1.004
Orioles 2918.9 2930.0 1.004
Athletics 2955.1 2952.0 0.999
Marlins 2813.7 2810.0 0.999
Indians 3044.8 3040.0 0.998
Twins 3043.8 3036.0 0.997
Rockies 2996.7 2988.0 0.997
Devil Rays 3018.4 3009.0 0.997
Giants 2912.0 2899.0 0.996
Tigers 3116.0 3091.0 0.992
Mets 2950.0 2924.0 0.991
Blue Jays 2950.7 2911.0 0.987
Rangers 2952.2 2901.0 0.983
Diamondbacks 2796.2 2739.0 0.980
Pirates 3089.5 3020.0 0.978
Red Sox 2963.2 2888.0 0.975
Yankees 2959.6 2876.0 0.972
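
One way to picture how the Expected and Ratio columns could be produced, assuming Expected is the sum of each ball's out probability and Ratio is Actual divided by Expected (which matches the figures in the table). The function names and the four-ball sample are hypothetical; the two bucket probabilities come from the examples earlier in the post (1 - .749 and 1 - .416).

```python
def expected_outs(balls, p_out):
    """Sum the out probability of every ball put in play.

    balls: iterable of (direction, batted_type, hardness) buckets, one per ball.
    p_out: mapping from bucket to the probability that some fielder makes the out.
    """
    return sum(p_out[bucket] for bucket in balls)

def ratio(actual, expected):
    """Actual outs relative to expected outs; above 1 means exceeding expectations."""
    return actual / expected

# Illustrative buckets; "M" is a made-up label for the up-the-middle slice.
p_out = {
    ("M", "line", "medium"): 0.251,
    ("M", "ground", "medium"): 0.584,
}
balls = [("M", "line", "medium")] * 2 + [("M", "ground", "medium")] * 2
print(round(expected_outs(balls, p_out), 3))  # 1.67

# Reproducing two rows of the table from their Expected and Actual columns.
print(round(ratio(2947.0, 2878.0), 3))  # Mariners: 1.024
print(round(ratio(2876.0, 2959.6), 3))  # Yankees: 0.972
```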

I will eventually extend this to each position on the team, then to individual fielders. One thing to note: the Phillies do better here than they do in DER. I'm gone until Sunday night, but I hope this gives you something to think about. Enjoy your weekend!

Update: I mistakenly did not look for other research in this area. I'll point you to two posts on Baseball Primer by Michael Lichtman for a stat called UZR, or ultimate zone rating. Part I is here, and part II is here. The methodology is the same, although I think there are minor differences in the way we treat the data. I have to digest Michael's system a little more, but I'll be commenting on this soon.

Correction, 12/23/2003: Corrected a typo. Changed "So my first attempt at using this information is to figure out, for each team, how many balls put into play against them were turned into at least one out. I'll then compare that to how many they actually turned into outs, and see what teams exceed expectations the most:" to "So my first attempt at using this information is to figure out, for each team, how many balls put into play against them should have been turned into at least one out. I'll then compare that to how many they actually turned into outs, and see what teams exceed expectations the most:"


Posted by David Pinto at 04:15 PM | Defense