January 21, 2006
Probabilistic Model of Range, 2005
A number of readers have asked over the last two months whether the Probabilistic Model of Range numbers for 2005 would be published this off-season. I'm happy to say I've acquired the data, and I'll be presenting tables this week on teams, on defenses behind pitchers, and on individual pitchers.
Here's last year's explanation of the model, which I won't repeat here. The idea is to look not just at the balls turned into outs, but how difficult those balls were to turn into outs. Teams or fielders who turn difficult plays into outs do well. Teams or fielders who let easy balls drop for hits (or make errors) do poorly.
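Here is a minimal sketch of that idea. It assumes each ball in play has already been given a probability of becoming an out. How the real model estimates those probabilities is covered in last year's explanation and isn't reproduced here, and the numbers below are made up. Predicted outs are the sum of the probabilities. An easy ball that drops therefore costs the defense most of an out, and a hard ball that is caught earns it most of an out.

```python
# Hypothetical balls in play: (model's probability of an out, whether it became an out).
# These probabilities are invented for illustration; they are not output of the real model.
balls_in_play = [
    (0.95, True),   # routine grounder, converted
    (0.90, False),  # easy fly ball that fell for a hit: counts against the defense
    (0.30, True),   # difficult line drive, caught: counts in the defense's favor
    (0.10, False),  # hard shot through the hole, little chance of an out
    (0.60, True),   # ordinary fly ball, caught
]


def summarize(balls):
    """Return (actual outs, predicted outs, DER, predicted DER) for a list of balls."""
    n = len(balls)
    actual = sum(1 for _, was_out in balls if was_out)
    predicted = sum(p for p, _ in balls)  # expected outs = sum of out probabilities
    return actual, predicted, actual / n, predicted / n


actual, predicted, der, pred_der = summarize(balls_in_play)
print(f"actual={actual} predicted={predicted:.2f} DER={der:.3f} predicted DER={pred_der:.3f}")
# -> actual=3 predicted=2.85 DER=0.600 predicted DER=0.570
```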
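One of the most hotly debated aspects of this model is how it handles parks. The biggest criticism is that home players have too much influence on the model. A team's own fielders handle roughly half of the balls in play in their home park, so the park component partly reflects their skill. I'm going to present three team tables that show how the treatment of parks changes the results.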
One will be the model as described in the previous work.
One will be the model without parks in the model.
The third will be a combination of the two, 50% of each.
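Judging from the numbers, the 50/50 model's predicted outs are simply the average of the other two models' predicted outs. Here is a quick check using the Phillies and Astros rows, with figures copied from the tables in this post.

```python
def blend(with_parks: float, without_parks: float, weight: float = 0.5) -> float:
    """Weighted average of two predicted-out totals; `weight` applies to the park model."""
    return weight * with_parks + (1.0 - weight) * without_parks


# Predicted outs: (model with parks, model without parks)
phillies = blend(2853.80, 2812.44)
astros = blend(2854.17, 2835.95)
print(f"Phillies {phillies:.2f}, Astros {astros:.2f}")  # -> Phillies 2833.12, Astros 2845.06
```

Both results match the Predicted Outs column of the 50/50 table. Because predicted outs are sums of per-ball probabilities, averaging the totals gives the same answer as averaging each ball's two probabilities first.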
All models are built on data from four years, 2002-2005.
Probabilistic Model of Range, 2005. Model Includes Parks
| Team | In Play | Actual Outs | Predicted Outs | DER | Predicted DER | Difference |
|---|---|---|---|---|---|---|
| Astros | 4204 | 2963 | 2854.17 | 0.705 | 0.679 | 0.02589 |
| Indians | 4385 | 3108 | 2995.26 | 0.709 | 0.683 | 0.02571 |
| Phillies | 4211 | 2962 | 2853.80 | 0.703 | 0.678 | 0.02570 |
| Athletics | 4286 | 3064 | 2954.86 | 0.715 | 0.689 | 0.02546 |
| White Sox | 4457 | 3175 | 3066.86 | 0.712 | 0.688 | 0.02426 |
| Cardinals | 4414 | 3101 | 3007.96 | 0.703 | 0.681 | 0.02108 |
| Blue Jays | 4511 | 3156 | 3063.16 | 0.700 | 0.679 | 0.02058 |
| Braves | 4559 | 3162 | 3073.91 | 0.694 | 0.674 | 0.01932 |
| Twins | 4545 | 3193 | 3107.42 | 0.703 | 0.684 | 0.01883 |
| Angels | 4383 | 3070 | 2998.12 | 0.700 | 0.684 | 0.01640 |
| Giants | 4520 | 3152 | 3080.03 | 0.697 | 0.681 | 0.01592 |
| Orioles | 4377 | 3032 | 2964.67 | 0.693 | 0.677 | 0.01538 |
| Pirates | 4467 | 3095 | 3032.44 | 0.693 | 0.679 | 0.01400 |
| Diamondbacks | 4571 | 3118 | 3059.45 | 0.682 | 0.669 | 0.01281 |
| Red Sox | 4575 | 3127 | 3068.95 | 0.683 | 0.671 | 0.01269 |
| Devil Rays | 4560 | 3112 | 3054.72 | 0.682 | 0.670 | 0.01256 |
| Cubs | 4117 | 2871 | 2819.97 | 0.697 | 0.685 | 0.01239 |
| Mariners | 4546 | 3184 | 3128.12 | 0.700 | 0.688 | 0.01229 |
| Tigers | 4527 | 3152 | 3097.51 | 0.696 | 0.684 | 0.01204 |
| Brewers | 4252 | 2960 | 2916.77 | 0.696 | 0.686 | 0.01017 |
| Rangers | 4697 | 3200 | 3158.10 | 0.681 | 0.672 | 0.00892 |
| Dodgers | 4392 | 3073 | 3036.02 | 0.700 | 0.691 | 0.00842 |
| Mets | 4424 | 3094 | 3058.20 | 0.699 | 0.691 | 0.00809 |
| Rockies | 4537 | 3043 | 3013.43 | 0.671 | 0.664 | 0.00652 |
| Padres | 4423 | 3051 | 3047.08 | 0.690 | 0.689 | 0.00089 |
| Marlins | 4367 | 2965 | 2965.42 | 0.679 | 0.679 | -0.00010 |
| Yankees | 4483 | 3087 | 3092.01 | 0.689 | 0.690 | -0.00112 |
| Nationals | 4538 | 3161 | 3166.79 | 0.697 | 0.698 | -0.00128 |
| Royals | 4611 | 3068 | 3099.97 | 0.665 | 0.672 | -0.00693 |
| Reds | 4650 | 3148 | 3182.99 | 0.677 | 0.685 | -0.00753 |
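Every column after Predicted Outs follows from the first three. DER is actual outs divided by balls in play, and Predicted DER is predicted outs divided by balls in play. Difference is the gap between them, which is outs above prediction per ball in play. Here is a short check against the Astros row:

```python
def table_row(in_play: int, actual_outs: int, predicted_outs: float):
    """Derive the DER, Predicted DER and Difference columns from the raw counts."""
    der = actual_outs / in_play
    predicted_der = predicted_outs / in_play
    return round(der, 3), round(predicted_der, 3), round(der - predicted_der, 5)


# Astros, model including parks
print(table_row(4204, 2963, 2854.17))  # -> (0.705, 0.679, 0.02589)
```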
Unlike 2004, this was a very good defensive year. Seven of the top eight teams in the list made the playoffs or were in contention as late as the last week of the season. Now for the teams with no park adjustment.
Probabilistic Model of Range, 2005. Model Does Not Include Parks
| Team | In Play | Actual Outs | Predicted Outs | DER | Predicted DER | Difference |
|---|---|---|---|---|---|---|
| Phillies | 4211 | 2962 | 2812.44 | 0.703 | 0.668 | 0.03552 |
| Athletics | 4286 | 3064 | 2921.09 | 0.715 | 0.682 | 0.03334 |
| Indians | 4385 | 3108 | 2970.70 | 0.709 | 0.677 | 0.03131 |
| Astros | 4204 | 2963 | 2835.95 | 0.705 | 0.675 | 0.03022 |
| Braves | 4559 | 3162 | 3043.69 | 0.694 | 0.668 | 0.02595 |
| White Sox | 4457 | 3175 | 3061.04 | 0.712 | 0.687 | 0.02557 |
| Cardinals | 4414 | 3101 | 2992.97 | 0.703 | 0.678 | 0.02447 |
| Blue Jays | 4511 | 3156 | 3066.66 | 0.700 | 0.680 | 0.01981 |
| Giants | 4520 | 3152 | 3062.55 | 0.697 | 0.678 | 0.01979 |
| Dodgers | 4392 | 3073 | 2992.05 | 0.700 | 0.681 | 0.01843 |
| Cubs | 4117 | 2871 | 2799.86 | 0.697 | 0.680 | 0.01728 |
| Nationals | 4538 | 3161 | 3082.57 | 0.697 | 0.679 | 0.01728 |
| Orioles | 4377 | 3032 | 2960.89 | 0.693 | 0.676 | 0.01625 |
| Diamondbacks | 4571 | 3118 | 3051.28 | 0.682 | 0.668 | 0.01460 |
| Angels | 4383 | 3070 | 3007.42 | 0.700 | 0.686 | 0.01428 |
| Twins | 4545 | 3193 | 3130.04 | 0.703 | 0.689 | 0.01385 |
| Pirates | 4467 | 3095 | 3034.07 | 0.693 | 0.679 | 0.01364 |
| Mariners | 4546 | 3184 | 3124.61 | 0.700 | 0.687 | 0.01306 |
| Tigers | 4527 | 3152 | 3101.99 | 0.696 | 0.685 | 0.01105 |
| Brewers | 4252 | 2960 | 2913.06 | 0.696 | 0.685 | 0.01104 |
| Mets | 4424 | 3094 | 3051.37 | 0.699 | 0.690 | 0.00964 |
| Devil Rays | 4560 | 3112 | 3068.61 | 0.682 | 0.673 | 0.00951 |
| Rangers | 4697 | 3200 | 3165.60 | 0.681 | 0.674 | 0.00732 |
| Red Sox | 4575 | 3127 | 3104.20 | 0.683 | 0.679 | 0.00498 |
| Padres | 4423 | 3051 | 3039.75 | 0.690 | 0.687 | 0.00254 |
| Rockies | 4537 | 3043 | 3035.26 | 0.671 | 0.669 | 0.00171 |
| Marlins | 4367 | 2965 | 2958.27 | 0.679 | 0.677 | 0.00154 |
| Reds | 4650 | 3148 | 3155.28 | 0.677 | 0.679 | -0.00157 |
| Yankees | 4483 | 3087 | 3135.64 | 0.689 | 0.699 | -0.01085 |
| Royals | 4611 | 3068 | 3130.12 | 0.665 | 0.679 | -0.01347 |
You can see the big drop in the Red Sox defense if you don't include the park in the calculation of team range. Many balls that would be outs in other parks hit the wall at Fenway. Without the adjustment, the Red Sox defense looks worse than it is.
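Converting the two Red Sox rows from DER to outs makes the size of the park adjustment concrete. The park model expects about 35 fewer outs from the Red Sox defense than the park-neutral model does.

```python
actual_outs = 3127  # Red Sox, identical in both tables
predicted_outs = {"with parks": 3068.95, "without parks": 3104.20}

for model, predicted in predicted_outs.items():
    print(f"{model}: {actual_outs - predicted:+.2f} outs vs. prediction")

# Outs the park-inclusive model stops expecting once Fenway is accounted for
park_effect = predicted_outs["without parks"] - predicted_outs["with parks"]
print(f"park adjustment: {park_effect:.2f} outs")  # -> park adjustment: 35.25 outs
```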
Here's the smoothed model:
Probabilistic Model of Range, 2005. 50% Model With Parks, 50% Model Without Parks
| Team | In Play | Actual Outs | Predicted Outs | DER | Predicted DER | Difference |
|---|---|---|---|---|---|---|
| Phillies | 4211 | 2962 | 2833.12 | 0.703 | 0.673 | 0.03061 |
| Athletics | 4286 | 3064 | 2937.98 | 0.715 | 0.685 | 0.02940 |
| Indians | 4385 | 3108 | 2982.98 | 0.709 | 0.680 | 0.02851 |
| Astros | 4204 | 2963 | 2845.06 | 0.705 | 0.677 | 0.02805 |
| White Sox | 4457 | 3175 | 3063.95 | 0.712 | 0.687 | 0.02492 |
| Cardinals | 4414 | 3101 | 3000.46 | 0.703 | 0.680 | 0.02278 |
| Braves | 4559 | 3162 | 3058.80 | 0.694 | 0.671 | 0.02264 |
| Blue Jays | 4511 | 3156 | 3064.91 | 0.700 | 0.679 | 0.02019 |
| Giants | 4520 | 3152 | 3071.29 | 0.697 | 0.679 | 0.01786 |
| Twins | 4545 | 3193 | 3118.73 | 0.703 | 0.686 | 0.01634 |
| Orioles | 4377 | 3032 | 2962.78 | 0.693 | 0.677 | 0.01581 |
| Angels | 4383 | 3070 | 3002.77 | 0.700 | 0.685 | 0.01534 |
| Cubs | 4117 | 2871 | 2809.92 | 0.697 | 0.683 | 0.01484 |
| Pirates | 4467 | 3095 | 3033.25 | 0.693 | 0.679 | 0.01382 |
| Diamondbacks | 4571 | 3118 | 3055.36 | 0.682 | 0.668 | 0.01370 |
| Dodgers | 4392 | 3073 | 3014.04 | 0.700 | 0.686 | 0.01343 |
| Mariners | 4546 | 3184 | 3126.36 | 0.700 | 0.688 | 0.01268 |
| Tigers | 4527 | 3152 | 3099.75 | 0.696 | 0.685 | 0.01154 |
| Devil Rays | 4560 | 3112 | 3061.67 | 0.682 | 0.671 | 0.01104 |
| Brewers | 4252 | 2960 | 2914.92 | 0.696 | 0.686 | 0.01060 |
| Mets | 4424 | 3094 | 3054.78 | 0.699 | 0.691 | 0.00886 |
| Red Sox | 4575 | 3127 | 3086.58 | 0.683 | 0.675 | 0.00884 |
| Rangers | 4697 | 3200 | 3161.85 | 0.681 | 0.673 | 0.00812 |
| Nationals | 4538 | 3161 | 3124.68 | 0.697 | 0.689 | 0.00800 |
| Rockies | 4537 | 3043 | 3024.35 | 0.671 | 0.667 | 0.00411 |
| Padres | 4423 | 3051 | 3043.42 | 0.690 | 0.688 | 0.00171 |
| Marlins | 4367 | 2965 | 2961.85 | 0.679 | 0.678 | 0.00072 |
| Reds | 4650 | 3148 | 3169.14 | 0.677 | 0.682 | -0.00455 |
| Yankees | 4483 | 3087 | 3113.83 | 0.689 | 0.695 | -0.00598 |
| Royals | 4611 | 3068 | 3115.04 | 0.665 | 0.676 | -0.01020 |
I'm open as always to comments on which of these you think is best, or how any of them might be improved. The best suggestions I've heard, however, involve much more complicated programming. I like this simple model.
One thing is very clear: the Yankees, Royals and Reds did not help their pitching staffs in 2005, no matter how you look at the data.
A hat tip to Mitchel Lichtman, who used this idea first in UZR but has since gone into private practice.
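Here is a quick check of that claim using the Difference column from all three tables. The subset below contains every team with a negative Difference in at least one table, plus the Padres for contrast. Every other team is positive in all three.

```python
# Difference column (DER minus predicted DER), copied from the three tables.
# Tuple order: (with parks, without parks, 50/50 blend)
differences = {
    "Padres":    (0.00089, 0.00254, 0.00171),
    "Marlins":   (-0.00010, 0.00154, 0.00072),
    "Nationals": (-0.00128, 0.01728, 0.00800),
    "Reds":      (-0.00753, -0.00157, -0.00455),
    "Yankees":   (-0.00112, -0.01085, -0.00598),
    "Royals":    (-0.00693, -0.01347, -0.01020),
}

# Keep only the teams that are below prediction under every treatment of parks
always_negative = [team for team, diffs in differences.items() if all(d < 0 for d in diffs)]
print(always_negative)  # -> ['Reds', 'Yankees', 'Royals']
```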
I know this subject was brought up previously in the other PMR discussions, but looking at the Difference column, most teams are positive. What I'd like to see is some metric of above/below average. That's one of the reasons why numbers like Rate and OPS+ are so easy to use: just looking at the number tells you how the player did compared to average.
With 25 of 30 teams in the basic model outperforming their predicted DER, and a 26th team being virtually identical to the model, it seems to me that there's either a problem in the model, or something in the BIP distribution changed between 2004 and 2005. The model should not be underpredicting DER for that many teams.
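The count in that comment is easy to reproduce from the Difference column of the park-inclusive table. The 0.0005 cutoff for "virtually identical" is an arbitrary choice for this sketch.

```python
# Difference column from the "Model Includes Parks" table, in table order.
park_model_diffs = [
    0.02589, 0.02571, 0.02570, 0.02546, 0.02426, 0.02108, 0.02058, 0.01932,
    0.01883, 0.01640, 0.01592, 0.01538, 0.01400, 0.01281, 0.01269, 0.01256,
    0.01239, 0.01229, 0.01204, 0.01017, 0.00892, 0.00842, 0.00809, 0.00652,
    0.00089, -0.00010, -0.00112, -0.00128, -0.00693, -0.00753,
]
assert len(park_model_diffs) == 30

positive = sum(d > 0 for d in park_model_diffs)
negative = len(park_model_diffs) - positive
# Within half an out per 1,000 balls in play of the prediction (arbitrary threshold)
near_zero = sum(abs(d) < 0.0005 for d in park_model_diffs)
print(positive, negative, near_zero)  # -> 25 5 1
```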
I've re-done the numbers for my own use with a fudge factor. If anyone wants them, drop me an e-mail.
David
Oh, and David, this is great stuff! Keep it coming.
Mike, the model is based on four years of data. My guess is that if I ran all four seasons of team data, you'd see roughly equal numbers above and below 0.
David, interesting data. One question and one observation. How do you interpret DER? If the Phillies have a DER of .703, does that mean there's a 70 percent chance that a ball put in play will turn into an out? And I noticed that there's not a lot of variation in the data. It seems like all the teams are in the .68 to .70 range. Is there any concrete way to describe how a team with a DER of .700 differs from a team with a DER of .680?
Steve, your interpretation of DER is correct. These are fieldable balls in play, so out-of-the-park home runs are removed from the equation.
It may help to get a handle on DER by thinking in terms of 1.0 - DER. That is approximately the batting average against the defense on those balls in play. So the difference between a .700 DER and a .680 DER is the difference between a .300 hitter and a .320 hitter.
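Here is a small sketch of that conversion. The figure of 4,400 balls in play per team is an assumed round number, chosen because every team in the tables falls between about 4,100 and 4,700. Over such a season, a .020 gap in DER comes to roughly 88 hits.

```python
def ba_against(der: float) -> float:
    """Approximate batting average allowed by a defense on fieldable balls in play."""
    return 1.0 - der


TYPICAL_BIP = 4400  # assumption: a round number inside the 4,100-4,700 range in the tables

for der in (0.700, 0.680):
    print(f"DER {der:.3f} -> about {ba_against(der):.3f} against")

extra_hits = (0.700 - 0.680) * TYPICAL_BIP
print(f"gap over a season: about {extra_hits:.0f} hits")  # -> about 88 hits
```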
David, you rock! Love this stuff.
Besides the Red Sox drop with no park variable, the Nationals' rise is also quite drastic, from #28 to #12. That big stadium must let them catch many fly balls that would be home runs elsewhere.
Really great information.
Another way to think about what a .703 DER means is that the Phillies' fielders made 129 more outs than the 50/50 model expected. That's about 100 runs (depending on the distribution between infield and outfield plays), or a reduction in team RA/G of 0.62. In fact, making the "Difference" column in the table outs (or runs) rather than DER would probably be more helpful.
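The arithmetic in that comment can be reproduced from the Phillies row of the 50/50 table. The runs-per-out factor is an assumption backed out of the comment's own "129 outs, about 100 runs." A real run value would depend on the mix of infield and outfield plays.

```python
# Phillies, 50/50 model
actual_outs, predicted_outs = 2962, 2833.12

# Assumption: runs per extra out implied by "129 outs, about 100 runs" (about 0.78)
RUNS_PER_OUT = 100 / 129
GAMES = 162

extra_outs = actual_outs - predicted_outs
runs_saved = extra_outs * RUNS_PER_OUT
print(f"{extra_outs:.0f} outs, ~{runs_saved:.0f} runs, ~{runs_saved / GAMES:.2f} runs per game")
# -> 129 outs, ~100 runs, ~0.62 runs per game
```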
"Unlike 2004, this was a very good defensive year."
With about 130,000 BIP each year, I'm skeptical that there are really 'bad' or 'good' years. The THT site shows overall DER at .695 in 2004 and .695 in 2005. If that's right, it's hard to conclude this was a better fielding year. Or if DER was higher this year, I would look to factors like weather, park changes, strike zone changes, etc. before concluding the overall level of fielding changed.
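A simple binomial approximation shows how much chance alone can move league DER at that sample size. It treats each ball in play as an independent out/no-out trial, which ignores changes in the batted-ball mix. Under that assumption, a year-to-year change smaller than about .0035 is within ordinary noise.

```python
import math


def der_standard_error(der: float, balls_in_play: int) -> float:
    """Binomial standard error of a DER, treating each ball in play as an
    independent out/no-out trial (a simplification)."""
    return math.sqrt(der * (1.0 - der) / balls_in_play)


BIP_PER_SEASON = 130_000  # league-wide figure cited in the comment
DER = 0.695               # league DER cited in the comment

se_one_year = der_standard_error(DER, BIP_PER_SEASON)
se_difference = math.sqrt(2) * se_one_year  # difference of two independent seasons
print(f"SE of league DER: {se_one_year:.4f}")
print(f"95% margin on a year-to-year change: +/-{1.96 * se_difference:.4f}")
```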
In his presentation, and perhaps in his opinion, David is taking the following perspective:
Given that the "true mean" league average is zero over the last 4 years, any league-level deviation within those 4 years is a true difference in skill.
There are a multitude of reasons why the "sample" mean is not zero for each of the last 4 years (within the 4-year population). A change in personnel or in the quality of fielders is one, but that is probably way down the list. External factors, like the climate or the ball (or your favorite suspect), can also be a cause.
It is much more useful, and probably more accurate, to set the annual "true mean" of each sample year to zero. It makes within-season comparisons much easier, and easier to grasp.
Year-to-year comparisons are probably more accurate as well (rather than making it seem like 80% of the fielders improved, or whatever the number is).
That said, I love this!
There are two things being tracked: real DER and predicted DER. Real DER changes very little: approximately .691 in 2004 and .694 in 2005, according to David's data. If 3 plays out of 1,000 make '05 a "better fielding year," well, OK (but the difference is within the margin of error). But the predicted DER goes from .698 last year to .681 this year. That doesn't make sense to me. The distribution of 125,000 BIP can't possibly be that different.
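The 2005 figures in that comment can be checked by pooling all 30 rows of the park-inclusive table. The 2004 figures aren't in this post, so only 2005 is computed. The last step sketches one possible re-centering, in the spirit of the suggestion to set each season's mean to zero: subtract the league-wide offset from every team's Difference. That step is a hypothetical illustration, not the method behind the published tables.

```python
# (team, balls in play, actual outs, predicted outs) from the "Model Includes Parks" table
ROWS = [
    ("Astros", 4204, 2963, 2854.17), ("Indians", 4385, 3108, 2995.26),
    ("Phillies", 4211, 2962, 2853.80), ("Athletics", 4286, 3064, 2954.86),
    ("White Sox", 4457, 3175, 3066.86), ("Cardinals", 4414, 3101, 3007.96),
    ("Blue Jays", 4511, 3156, 3063.16), ("Braves", 4559, 3162, 3073.91),
    ("Twins", 4545, 3193, 3107.42), ("Angels", 4383, 3070, 2998.12),
    ("Giants", 4520, 3152, 3080.03), ("Orioles", 4377, 3032, 2964.67),
    ("Pirates", 4467, 3095, 3032.44), ("Diamondbacks", 4571, 3118, 3059.45),
    ("Red Sox", 4575, 3127, 3068.95), ("Devil Rays", 4560, 3112, 3054.72),
    ("Cubs", 4117, 2871, 2819.97), ("Mariners", 4546, 3184, 3128.12),
    ("Tigers", 4527, 3152, 3097.51), ("Brewers", 4252, 2960, 2916.77),
    ("Rangers", 4697, 3200, 3158.10), ("Dodgers", 4392, 3073, 3036.02),
    ("Mets", 4424, 3094, 3058.20), ("Rockies", 4537, 3043, 3013.43),
    ("Padres", 4423, 3051, 3047.08), ("Marlins", 4367, 2965, 2965.42),
    ("Yankees", 4483, 3087, 3092.01), ("Nationals", 4538, 3161, 3166.79),
    ("Royals", 4611, 3068, 3099.97), ("Reds", 4650, 3148, 3182.99),
]

total_bip = sum(bip for _, bip, _, _ in ROWS)
total_outs = sum(outs for _, _, outs, _ in ROWS)
total_pred = sum(pred for _, _, _, pred in ROWS)

league_der = total_outs / total_bip
league_pred_der = total_pred / total_bip
offset = league_der - league_pred_der  # how much the model under-predicts league-wide

print(f"league DER {league_der:.3f}, predicted {league_pred_der:.3f}, offset {offset:.4f}")
# -> league DER 0.694, predicted 0.681, offset 0.0122

# Hypothetical re-centering: subtract the league offset so that the
# BIP-weighted average Difference is exactly zero for the season.
recentered = {
    team: (outs - pred) / bip - offset for team, bip, outs, pred in ROWS
}
print(f"Astros re-centered Difference: {recentered['Astros']:+.5f}")
```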
Guy,
You raise an excellent point. I'll look into this. One complication is that last year's table comes from a different underlying model, based on three years (2002-2004) of data. I don't know if that's enough to cause the difference, but I'm going to run 2004 against the four-year model and see.