September 19, 2003
A Probabilistic Model of Range
Measuring the ability of fielders to turn batted balls into outs is one of the most unyielding areas of baseball research. Part of the problem was that for a long time, we didn't have good information. What we'd like to produce is a number that represents the player's range. Early attempts at this looked at plays per game; once we got defensive innings, we were able to make this plays per nine innings. But this did not adjust for important elements, such as how many batters a pitcher strikes out or the handedness of the staff.
When STATS, Inc. started keeping data, it also collected parameters about the batted ball: its distance, its direction, how hard it was hit, and its type (ground, fly, etc.). With that they created Zone Ratings to try to compensate for the unknown information. Zone Ratings give a fielder credit for all balls he turns into an out, while penalizing him for balls not fielded within his zone. The problem with Zone Ratings is that all balls in the zone are treated the same. One might imagine that balls hit to the edge of the zone are harder to field than balls hit in the middle of the zone (where you would expect the fielder to position himself).
One of the big discoveries of recent years is that pitchers don't seem to affect fielding stats that much. A recent discussion of this can be found in this previous post. Let's just say that the amount a pitcher affects balls in play going for hits is up for discussion.
I work at the CIIR at UMass, and much of our current work involves using probabilistic models to understand and retrieve documents (where "document" is a very generic term, not limited to text). So I thought, why not apply these to fielding? I'm asking the question, what is the probability of a batted ball becoming an out, given the parameters of that batted ball?
I've used the STATS, Inc. database to obtain three parameters for each ball; its direction (a slice of pie fanning out from home plate), its batted type (ground, fly, line, bunt or pop) and how hard the ball was hit (soft, medium or hard). I then did a maximum likelihood estimate of the probability of an out given those three parameters for each of the nine fielders. Mathematically, you might write it p(o,f|d,t,h).
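For readers who want to see the mechanics, a maximum likelihood estimate of this kind comes down to counting: within each (direction, type, hardness) bucket, divide the number of balls with a given outcome by the total number of balls in the bucket. The following is only a minimal sketch in Python under assumed names; the record layout, the position labels, the "reached" label and the four sample plays are all hypothetical, not the STATS, Inc. format or data.

```python
from collections import Counter, defaultdict


def estimate_outcome_probabilities(plays):
    """Estimate p(outcome | direction, ball_type, hardness) by counting.

    `plays` is an iterable of (direction, ball_type, hardness, outcome)
    tuples. `outcome` is either the position that made the out or the
    string "reached". This layout is an assumption for illustration.
    """
    bucket_totals = Counter()
    outcome_counts = defaultdict(Counter)
    for direction, ball_type, hardness, outcome in plays:
        key = (direction, ball_type, hardness)
        bucket_totals[key] += 1
        outcome_counts[key][outcome] += 1
    # The maximum likelihood estimate is the relative frequency in each bucket.
    return {
        key: {outcome: n / bucket_totals[key] for outcome, n in counts.items()}
        for key, counts in outcome_counts.items()
    }


# Hypothetical sample: four medium-speed ground balls in one direction slice.
sample_plays = [
    ("M", "ground", "medium", "P"),
    ("M", "ground", "medium", "SS"),
    ("M", "ground", "medium", "reached"),
    ("M", "ground", "medium", "reached"),
]
probs = estimate_outcome_probabilities(sample_plays)
bucket = probs[("M", "ground", "medium")]
print(bucket)
```

With a real season of data, each bucket would hold many more balls, and the probabilities within a bucket would still sum to one.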
Let's look at a specific example. Take a ground ball up the middle to the leftfield side of 2nd base, hit with medium speed. The probability of the batter reaching base is .416. The probability of the pitcher turning it into an out is .312. Shortstop, .258; second base, .013; catcher, .0009 (these may not add up due to rounding). If it were a line drive instead of a ground ball, the probabilities change:
Batter Reached | .749 |
Centerfielder | .193 |
Pitcher | .04 |
Shortstop | .018 |
Now, how do we use this information? These probabilities can be thought of as expectations; if a team has 1000 balls hit as line drives to the above direction with medium speed, 25% of them would be turned into outs. So if a team is turning more than 25% of those into outs, they are exceeding expectations. So my first attempt at using this information is to figure out, for each team, how many balls put into play against them should have been turned into at least one out. I'll then compare that to how many they actually turned into outs, and see what teams exceed expectations the most:
Team | Expected | Actual | Ratio |
Mariners | 2878.0 | 2947.0 | 1.024 |
Phillies | 2851.9 | 2916.0 | 1.022 |
Cardinals | 3003.4 | 3063.0 | 1.020 |
Dodgers | 2588.2 | 2635.0 | 1.018 |
Expos | 2871.4 | 2911.0 | 1.014 |
Angels | 2894.2 | 2927.0 | 1.011 |
White Sox | 2790.3 | 2821.0 | 1.011 |
Padres | 2790.8 | 2819.0 | 1.010 |
Braves | 2939.2 | 2964.0 | 1.008 |
Brewers | 2931.3 | 2952.0 | 1.007 |
Reds | 2995.5 | 3011.0 | 1.005 |
Cubs | 2590.7 | 2602.0 | 1.004 |
Astros | 2783.2 | 2795.0 | 1.004 |
Royals | 3002.0 | 3014.0 | 1.004 |
Orioles | 2918.9 | 2930.0 | 1.004 |
Athletics | 2955.1 | 2952.0 | 0.999 |
Marlins | 2813.7 | 2810.0 | 0.999 |
Indians | 3044.8 | 3040.0 | 0.998 |
Twins | 3043.8 | 3036.0 | 0.997 |
Rockies | 2996.7 | 2988.0 | 0.997 |
Devil Rays | 3018.4 | 3009.0 | 0.997 |
Giants | 2912.0 | 2899.0 | 0.996 |
Tigers | 3116.0 | 3091.0 | 0.992 |
Mets | 2950.0 | 2924.0 | 0.991 |
Blue Jays | 2950.7 | 2911.0 | 0.987 |
Rangers | 2952.2 | 2901.0 | 0.983 |
Diamondbacks | 2796.2 | 2739.0 | 0.980 |
Pirates | 3089.5 | 3020.0 | 0.978 |
Red Sox | 2963.2 | 2888.0 | 0.975 |
Yankees | 2959.6 | 2876.0 | 0.972 |
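The Expected column can be thought of as a sum of out probabilities over every ball put in play against a team, and the Ratio column as Actual divided by Expected. Here is a rough sketch of that arithmetic in Python. The function names and bucket keys are hypothetical, and the 1000-line-drive team is made up for illustration; only the .749 reach probability and the Mariners and Yankees totals come from the tables above.

```python
def expected_outs(balls_in_play, reach_probability):
    """Sum the probability of at least one out over each ball in play.

    `balls_in_play` is a list of bucket keys; `reach_probability` maps a
    bucket key to the probability that the batter reached base.
    """
    return sum(1.0 - reach_probability[ball] for ball in balls_in_play)


def out_ratio(actual_outs, expected):
    """Ratio above 1 means the defense exceeded expectations."""
    return actual_outs / expected


# Reach probability for the medium-speed line drive up the middle (from the post).
reach_probability = {("M", "line", "medium"): 0.749}

# Hypothetical team that faced 1000 such line drives.
balls = [("M", "line", "medium")] * 1000
line_drive_expected = expected_outs(balls, reach_probability)
print(round(line_drive_expected, 1))

# Ratios recomputed from the Expected and Actual columns in the table.
mariners_ratio = out_ratio(2947.0, 2878.0)
yankees_ratio = out_ratio(2876.0, 2959.6)
print(round(mariners_ratio, 3), round(yankees_ratio, 3))
```

A team that turned 260 of those 1000 line drives into outs would rate 260 / 251, or about 1.036, on that bucket alone.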
I will eventually extend this to each position on the team, then to individual fielders. One thing to note: the Phillies do better here than they do in DER. I'm gone until Sunday night, but I hope this gives you something to think about. Enjoy your weekend!
Update: I mistakenly did not look for other research in this area. I'll point you to two posts on Baseball Primer by Mitchel Lichtman for a stat called UZR, or Ultimate Zone Rating. Part I is here, and part II is here. The methodology is the same, although I think there are minor differences in the way we treat the data. I have to digest Mitchel's system a little more, but I'll be commenting on this soon.
Correction, 12/23/2003: Corrected a typo. Changed "So my first attempt at using this information is to figure out, for each team, how many balls put into play against them were turned into at least one out. I'll then compare that to how many they actually turned into outs, and see what teams exceed expectations the most:" to "So my first attempt at using this information is to figure out, for each team, how many balls put into play against them should have been turned into at least one out. I'll then compare that to how many they actually turned into outs, and see what teams exceed expectations the most:"
Posted by David Pinto at 04:15 PM | Defense