February 19, 2006
Probabilistic Model of Range, 2005, Runs Created Against Fielders
A few days ago I made my first attempt to calculate runs saved by teams based on the Probabilistic Model of Range (PMR). I'm using a modified version of the runs created formula that appeared in The Bill James Handbook 2005. That formula is designed for batters. I've modified it in the following ways:
- Count any time a fielder fails to get an out as a time on base. So if there is a failed fielder's choice, or the batter reaches on an error, it's a time on base. Since we're looking at defenses, this seems appropriate.
- Total bases are based on the number of bases achieved by the batter when he earns a time on base. So a two base error in this system counts the same as a double. The weights used for the various types of hits are the same as in the Handbook.
So the formula becomes RCA = (Times On Base - GDP) × (Weighted Total Bases) / (Balls in Play). I'd like to hear what you think about the formula, but I believe it's a good first approximation. It was easy to apply to teams: you just look at all balls in play and the likelihood that each ball ends in a particular result. But I wasn't sure how to apply it to individual fielders.
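A minimal Python sketch of the formula, for readers who want to try it on their own data. The function and parameter names are hypothetical, and the weights are a loudly labeled placeholder (plain bases reached, 1 through 4), not the Handbook weights the post actually uses. Following the rules above, a two-base error is filed under two bases, the same as a double.

```python
"""Sketch of the modified runs-created-against (RCA) formula:

    RCA = (Times On Base - GDP) * (Weighted Total Bases) / (Balls in Play)
"""

# PLACEHOLDER weights keyed by bases reached. The post uses the weights from
# The Bill James Handbook 2005, which are not reproduced here.
PLACEHOLDER_WEIGHTS = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}


def weighted_total_bases(bases_reached_counts, weights=PLACEHOLDER_WEIGHTS):
    """Sum the weights over every time on base.

    bases_reached_counts maps bases reached (1-4) to how many times the batter
    reached that many bases, whether by hit, error, or failed fielder's choice.
    """
    return sum(weights[bases] * n for bases, n in bases_reached_counts.items())


def runs_created_against(times_on_base, gdp, weighted_tb, balls_in_play):
    """Apply the formula; fractional (expected) counts work as well."""
    if balls_in_play <= 0:
        raise ValueError("balls_in_play must be positive")
    return (times_on_base - gdp) * weighted_tb / balls_in_play


# Hypothetical counts for 100 balls in play, for illustration only.
counts = {1: 20, 2: 5, 3: 1}  # 26 times on base
tob = sum(counts.values())
print(runs_created_against(tob, gdp=2,
                           weighted_tb=weighted_total_bases(counts),
                           balls_in_play=100))  # 7.92
```

Because the function accepts fractional counts, the same code could produce a predicted RCA from expected counts (sums of model probabilities) as well as an actual RCA from observed results.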
When I was looking just at the probability of catching the ball, I wanted to look at all balls in play, since I was measuring the piece of team DER that belonged to a particular fielder. Here, though, I'm trying to predict runs, so I decided to look only at balls in play on which the fielder had a non-zero chance of making the play. In effect, I used the probabilities of the various balls in play to define the fielder's zone, and the results of those balls to define runs created against (RCA).
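One plausible way to code the fielder-level version, assuming each ball in play carries model probabilities for its possible outcomes. The `BallInPlay` layout, its field names, and the placeholder weights are all hypothetical; the post does not say how its data is stored or exactly how expected counts are combined. Here the zone is every ball where the fielder's out probability is above zero, the predicted values plug expected counts (sums of probabilities) into the formula, and the actual values use what happened on the same balls.

```python
from dataclasses import dataclass, field

# PLACEHOLDER weights by bases reached; the post uses the Handbook's weights.
WEIGHTS = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}


@dataclass
class BallInPlay:
    """One ball in play as the model sees it (hypothetical layout)."""
    p_fielder_out: float                         # P(this fielder records the out)
    p_bases: dict = field(default_factory=dict)  # P(batter reaches k bases), k = 1..4
    p_gdp: float = 0.0                           # P(ground-ball double play)
    fielder_made_out: bool = False               # actual: this fielder got the out
    actual_bases: int = 0                        # actual bases reached, 0 if retired
    was_gdp: bool = False                        # actual: double play turned


def rca(tob, gdp, wtb, bip):
    return (tob - gdp) * wtb / bip


def fielder_summary(balls, weights=WEIGHTS):
    # The zone: every ball the fielder had a non-zero chance to turn into an out.
    zone = [b for b in balls if b.p_fielder_out > 0]
    bip = len(zone)
    if bip == 0:
        raise ValueError("fielder has no fieldable balls in play")

    # Predicted side: expected counts are sums of model probabilities.
    pred_outs = sum(b.p_fielder_out for b in zone)
    pred_tob = sum(sum(b.p_bases.values()) for b in zone)
    pred_wtb = sum(weights[k] * p for b in zone for k, p in b.p_bases.items())
    pred_gdp = sum(b.p_gdp for b in zone)

    # Actual side: results on the same balls, whoever made the play.
    outs = sum(b.fielder_made_out for b in zone)
    tob = sum(1 for b in zone if b.actual_bases > 0)
    wtb = sum(weights[b.actual_bases] for b in zone if b.actual_bases > 0)
    gdp = sum(b.was_gdp for b in zone)

    return {
        "fieldable_bip": bip,
        "outs": outs,
        "pred_outs": pred_outs,
        "rca": rca(tob, gdp, wtb, bip),
        "pred_rca": rca(pred_tob, pred_gdp, pred_wtb, bip),
    }


balls = [
    BallInPlay(p_fielder_out=0.8, p_bases={1: 0.1}, fielder_made_out=True),
    BallInPlay(p_fielder_out=0.3, p_bases={1: 0.5, 2: 0.1}, actual_bases=2),
    BallInPlay(p_fielder_out=0.0, p_bases={1: 0.2}),  # outside this fielder's zone
]
print(fielder_summary(balls))
```

Note that in this sketch a ball in the zone that a teammate converts into an out adds to balls in play without adding a time on base.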
The results made me wish I had worked on this last year. They convey information much more clearly than simply looking at the probability of catching the ball. Let's look at the shortstops first:
Probabilistic Model of Range, 2005, Runs Created Against Shortstops, Original Model (minimum 400 fieldable balls in play)
| Player | Fieldable Balls In Play | Actual Outs by Fielder | Predicted Outs by Fielder | RCA | Predicted RCA | RCA/27 Outs | Predicted RCA/27 | Runs Saved/27 Outs |
| John McDonald | 521 | 178 | 172.92 | 27.41 | 36.40 | 4.16 | 5.68 | 1.527 |
| Adam Everett | 1490 | 517 | 498.79 | 71.05 | 95.41 | 3.71 | 5.16 | 1.454 |
| Omar Infante | 529 | 191 | 173.53 | 30.93 | 37.40 | 4.37 | 5.82 | 1.446 |
| Bobby Crosby | 925 | 312 | 304.63 | 42.34 | 56.74 | 3.66 | 5.03 | 1.365 |
| Rafael Furcal | 1648 | 596 | 576.97 | 84.81 | 109.11 | 3.84 | 5.11 | 1.264 |
| Clint Barmes | 859 | 306 | 279.67 | 49.78 | 56.65 | 4.39 | 5.47 | 1.077 |
| Yuniesky Betancourt | 610 | 177 | 174.31 | 39.99 | 45.58 | 6.10 | 7.06 | 0.961 |
| Juan Uribe | 1729 | 537 | 557.52 | 82.57 | 103.72 | 4.15 | 5.02 | 0.872 |
| Julio Lugo | 1761 | 560 | 540.03 | 122.06 | 134.05 | 5.88 | 6.70 | 0.817 |
| Jimmy Rollins | 1625 | 510 | 519.18 | 89.33 | 106.01 | 4.73 | 5.51 | 0.784 |
| David Eckstein | 1737 | 615 | 617.90 | 97.98 | 114.19 | 4.30 | 4.99 | 0.688 |
| Jack Wilson | 1734 | 600 | 610.94 | 104.81 | 121.02 | 4.72 | 5.35 | 0.632 |
| Orlando Cabrera | 1613 | 469 | 481.07 | 87.65 | 99.85 | 5.05 | 5.60 | 0.558 |
| Oscar M Robles | 554 | 172 | 179.32 | 31.03 | 35.19 | 4.87 | 5.30 | 0.427 |
| Russ M Adams | 1511 | 401 | 437.66 | 91.76 | 106.33 | 6.18 | 6.56 | 0.381 |
| Neifi Perez | 1269 | 445 | 448.29 | 65.47 | 71.20 | 3.97 | 4.29 | 0.316 |
| Wilson Valdez | 516 | 155 | 153.97 | 27.33 | 28.84 | 4.76 | 5.06 | 0.297 |
| Miguel Tejada | 1846 | 572 | 590.91 | 122.90 | 132.45 | 5.80 | 6.05 | 0.251 |
| Juan Castro | 774 | 264 | 260.08 | 59.78 | 61.02 | 6.11 | 6.33 | 0.221 |
| Jhonny Peralta | 1603 | 509 | 549.61 | 94.74 | 106.72 | 5.03 | 5.24 | 0.218 |
| Jason A Bartlett | 769 | 281 | 270.60 | 47.53 | 47.88 | 4.57 | 4.78 | 0.211 |
| J.J. Hardy | 1085 | 346 | 359.91 | 66.62 | 71.75 | 5.20 | 5.38 | 0.184 |
| Omar Vizquel | 1620 | 538 | 558.31 | 86.16 | 91.63 | 4.32 | 4.43 | 0.107 |
| Alex Gonzalez | 1273 | 452 | 441.85 | 91.18 | 88.58 | 5.45 | 5.41 | -0.033 |
| Derek Jeter | 1913 | 561 | 602.13 | 115.11 | 122.15 | 5.54 | 5.48 | -0.063 |
| Carlos Guillen | 834 | 262 | 270.95 | 54.67 | 55.58 | 5.63 | 5.54 | -0.095 |
| Khalil Greene | 1246 | 399 | 409.60 | 74.67 | 74.98 | 5.05 | 4.94 | -0.110 |
| Jose Reyes | 1865 | 522 | 569.16 | 115.79 | 123.88 | 5.99 | 5.88 | -0.113 |
| Cesar Izturis | 1175 | 366 | 386.49 | 74.27 | 76.18 | 5.48 | 5.32 | -0.157 |
| Royce Clayton | 1528 | 473 | 502.95 | 99.61 | 98.82 | 5.69 | 5.30 | -0.381 |
| Bill Hall | 609 | 196 | 205.22 | 47.93 | 46.59 | 6.60 | 6.13 | -0.473 |
| Edgar Renteria | 1773 | 491 | 499.45 | 128.25 | 120.64 | 7.05 | 6.52 | -0.531 |
| Marco Scutaro | 846 | 259 | 282.20 | 50.54 | 48.34 | 5.27 | 4.62 | -0.644 |
| Michael Young | 1930 | 534 | 580.38 | 134.80 | 131.21 | 6.82 | 6.10 | -0.711 |
| Felipe Lopez | 1467 | 459 | 493.34 | 95.85 | 89.31 | 5.64 | 4.89 | -0.750 |
| Cristian Guzman | 1333 | 417 | 438.14 | 83.54 | 75.53 | 5.41 | 4.65 | -0.754 |
| Mike Morse | 581 | 156 | 170.02 | 47.09 | 43.64 | 8.15 | 6.93 | -1.220 |
| Angel Berroa | 1818 | 551 | 594.20 | 168.48 | 137.92 | 8.26 | 6.27 | -1.989 |
Many years ago, Bill James posed a question about shortstops: how many runs does one save with his glove? At the time, someone claimed Ozzie Smith saved 100 runs with his defense. Bill estimated that the difference between the best and worst shortstop in the league was about 25 runs. As you can see here, among regular shortstops, Adam Everett saved the most runs in 2005, allowing about 24 fewer than expected. Angel Berroa, the worst regular in the majors, cost the Royals about 31 runs. That puts the difference at 55. Looking at the data, though, Berroa was an outlier. He was the rare shortstop who contributed nothing offensively while killing the team with his defense, and he never should have played a full season at shortstop. Compared to Cristian Guzman, Everett saved about 32 runs, which fits in nicely with Bill's estimate. The magnitude of the numbers looks right to me.
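The runs-saved totals in this paragraph can be recomputed from the table. A short sketch, treating runs saved as Predicted RCA minus RCA (positive means fewer runs allowed than the model expected); the numbers are copied from the shortstop table, and the names are illustrative.

```python
# (RCA, Predicted RCA) copied from the shortstop table above.
SHORTSTOPS = {
    "Adam Everett": (71.05, 95.41),
    "Angel Berroa": (168.48, 137.92),
    "Cristian Guzman": (83.54, 75.53),
}


def runs_saved(rca, predicted_rca):
    """Positive means fewer runs allowed than the model expected."""
    return predicted_rca - rca


saved = {name: runs_saved(*line) for name, line in SHORTSTOPS.items()}
for name, runs in saved.items():
    print(f"{name}: {runs:+.1f}")
print(f"Everett minus Berroa: {saved['Adam Everett'] - saved['Angel Berroa']:.1f}")
print(f"Everett minus Guzman: {saved['Adam Everett'] - saved['Cristian Guzman']:.1f}")
# Adam Everett: +24.4
# Angel Berroa: -30.6
# Cristian Guzman: -8.0
# Everett minus Berroa: 54.9
# Everett minus Guzman: 32.4
```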
The second feature to look at concerns Derek Jeter and Jose Reyes. As the model developed, there's been discussion of how to handle balls in play that could be handled by a particular fielder but are caught by someone else. Runs created against appears to handle this situation quite well. Both Jeter and Reyes allowed fewer runs than the model predicts, despite turning many fewer outs than expected. Why? Others are turning the shortstops' misses into outs. So when calculating RCA, Derek and Jose don't get hurt in the numerator of the equation (those balls aren't times on base); they just get bumped up in the denominator (balls in play).
However, when you look at them in terms of runs per game (27 outs), things change. The outs they don't get matter. They're so many outs behind where they should be that they're actually allowing more runs per 27 outs than expected. In other words, the cost of an out by one of these shortstops is high in terms of runs allowed. I find that a very cool result: we can see both team and individual contributions to defense in one line.
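A small sketch of the per-27-outs comparison, using lines from the shortstop table. Each rate is runs times 27 divided by outs by the fielder, and runs saved per 27 outs is the predicted rate minus the actual rate, which reproduces the last column of the table to rounding. The function names are illustrative; Everett is included as a contrast.

```python
def per_27_outs(runs, outs):
    """Scale runs to a 27-out game."""
    return runs * 27 / outs


def compare(rca, pred_rca, outs, pred_outs):
    """Return (total runs saved, runs saved per 27 outs)."""
    total = pred_rca - rca
    rate = per_27_outs(pred_rca, pred_outs) - per_27_outs(rca, outs)
    return total, rate


# (RCA, Predicted RCA, Actual Outs, Predicted Outs) from the shortstop table.
players = {
    "Derek Jeter": (115.11, 122.15, 561, 602.13),
    "Jose Reyes": (115.79, 123.88, 522, 569.16),
    "Adam Everett": (71.05, 95.41, 517, 498.79),
}
for name, line in players.items():
    total, rate = compare(*line)
    print(f"{name}: {total:+.1f} runs, {rate:+.2f} per 27 outs")
# Derek Jeter: +7.0 runs, -0.06 per 27 outs
# Jose Reyes: +8.1 runs, -0.11 per 27 outs
# Adam Everett: +24.4 runs, +1.45 per 27 outs
```

Jeter and Reyes come out positive on total runs but negative per 27 outs, because their actual out totals (the denominators of the actual rates) fall well short of the predicted ones.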
The other thing we can see is who plays in tough ballparks. Let's demonstrate this with leftfielders:
Probabilistic Model of Range, 2005, Runs Created Against Leftfielders, Original Model (minimum 200 fieldable balls in play)
| Player | Fieldable Balls In Play | Actual Outs by Fielder | Predicted Outs by Fielder | RCA | Predicted RCA | RCA/27 Outs | Predicted RCA/27 | Runs Saved/27 Outs |
| Chris A Burke | 272 | 120 | 101.65 | 26.59 | 41.11 | 5.98 | 10.92 | 4.935 |
| Coco Crisp | 617 | 294 | 261.44 | 66.21 | 96.11 | 6.08 | 9.93 | 3.846 |
| Reed Johnson | 282 | 134 | 112.67 | 34.77 | 41.96 | 7.01 | 10.05 | 3.049 |
| Bobby Kielty | 242 | 99 | 96.77 | 21.15 | 30.47 | 5.77 | 8.50 | 2.733 |
| Carl Crawford | 717 | 341 | 309.93 | 58.46 | 84.48 | 4.63 | 7.36 | 2.731 |
| Matt T Holliday | 558 | 236 | 214.67 | 72.13 | 87.09 | 8.25 | 10.95 | 2.701 |
| Pedro Feliz | 300 | 138 | 131.98 | 29.23 | 40.98 | 5.72 | 8.38 | 2.663 |
| Jay Payton | 245 | 107 | 94.67 | 24.01 | 30.14 | 6.06 | 8.60 | 2.537 |
| Eric Byrnes | 433 | 209 | 185.38 | 34.82 | 43.41 | 4.50 | 6.32 | 1.824 |
| Carlos Lee | 708 | 307 | 289.83 | 90.47 | 104.77 | 7.96 | 9.76 | 1.803 |
| Kevin Mench | 521 | 231 | 213.63 | 51.08 | 61.35 | 5.97 | 7.75 | 1.784 |
| Moises Alou | 284 | 132 | 123.81 | 25.23 | 31.83 | 5.16 | 6.94 | 1.781 |
| Scott Podsednik | 557 | 260 | 240.02 | 55.04 | 66.05 | 5.72 | 7.43 | 1.715 |
| Craig Monroe | 255 | 99 | 102.25 | 35.00 | 42.14 | 9.54 | 11.13 | 1.582 |
| Raul Ibanez | 255 | 106 | 103.57 | 22.14 | 27.17 | 5.64 | 7.08 | 1.443 |
| Kelly A Johnson | 356 | 166 | 154.90 | 42.52 | 47.68 | 6.92 | 8.31 | 1.394 |
| Rondell White | 270 | 119 | 118.43 | 30.45 | 36.12 | 6.91 | 8.24 | 1.327 |
| Luis Gonzalez | 619 | 270 | 246.42 | 95.06 | 97.93 | 9.51 | 10.73 | 1.225 |
| Randy Winn | 497 | 226 | 209.19 | 47.81 | 53.67 | 5.71 | 6.93 | 1.215 |
| Cliff Floyd | 654 | 283 | 267.59 | 76.59 | 82.93 | 7.31 | 8.37 | 1.061 |
| Hideki Matsui | 520 | 218 | 204.33 | 65.97 | 69.62 | 8.17 | 9.20 | 1.029 |
| Adam Dunn | 616 | 246 | 243.81 | 88.16 | 95.94 | 9.68 | 10.62 | 0.948 |
| Frank Catalanotto | 389 | 163 | 160.10 | 49.82 | 54.50 | 8.25 | 9.19 | 0.939 |
| Jason Bay | 625 | 266 | 257.22 | 83.97 | 89.04 | 8.52 | 9.35 | 0.824 |
| Todd Hollandsworth | 251 | 103 | 101.45 | 36.08 | 37.60 | 9.46 | 10.01 | 0.549 |
| Shannon Stewart | 555 | 249 | 237.55 | 74.67 | 75.06 | 8.10 | 8.53 | 0.434 |
| Garret Anderson | 531 | 201 | 208.94 | 59.21 | 61.67 | 7.95 | 7.97 | 0.016 |
| Pat Burrell | 627 | 236 | 247.60 | 82.06 | 85.00 | 9.39 | 9.27 | -0.118 |
| David Dellucci | 228 | 84 | 90.32 | 29.17 | 30.85 | 9.38 | 9.22 | -0.152 |
| Marlon Byrd | 210 | 100 | 102.84 | 21.98 | 21.78 | 5.93 | 5.72 | -0.215 |
| Terrence Long | 423 | 166 | 163.61 | 76.66 | 73.49 | 12.47 | 12.13 | -0.342 |
| Reggie Sanders | 268 | 108 | 105.77 | 47.56 | 43.95 | 11.89 | 11.22 | -0.671 |
| Ryan Klesko | 504 | 204 | 202.51 | 76.35 | 69.22 | 10.11 | 9.23 | -0.877 |
| Miguel Cabrera | 495 | 188 | 208.43 | 72.91 | 72.74 | 10.47 | 9.42 | -1.049 |
| Manny Ramirez | 689 | 243 | 254.92 | 125.73 | 118.56 | 13.97 | 12.56 | -1.412 |
| Larry Bigbie | 232 | 98 | 96.20 | 34.67 | 28.06 | 9.55 | 7.87 | -1.677 |
You can pick out the stadiums where fielding is difficult. Look at Manny Ramirez, Matt Holliday, Luis Gonzalez and Terrence Long. They're all in left fields that generate a lot of runs. And look at Coco Crisp in left, saving 30 runs. Some of that has to translate to center field at Fenway.
As always, I'm anxious to hear your criticisms of the model.
Update: I was just reading and responding to an e-mail from Studes at The Hardball Times (see comment below) and I need to clarify something. When I'm talking about runs saved by a fielder (Predicted RCA - RCA), I shouldn't be so liberal in attributing those runs to the fielder. As the Jeter/Reyes comment above shows, those numbers are influenced by teammates. You should think of the runs saved as being attributed to "the fielder and his surrounding teammates."
Update: In order to make things a bit clearer, I've noted in the headings to the tables that the Out columns are outs that are attributed to the fielder, whether actual or predicted.
Maybe it's just the engineer in me, but what uncertainties are you coming up with in the data?
One other note: Bigbie at the bottom, eh? Something seems odd about that to me. Are you using any sort of park factors?
Fantastic work, as always.
Hey David,
I think you're really onto something here, but I'm kind of confused by your approach and the tables.
When applying run values to situations like this (player's performance against an average with the same number of batted balls), I agree that you have to include both the run impact of the hits allowed/disallowed vs. average, AND the run value of the outs caught/not caught vs. average.
So I think your comparison of Everett and Berroa is incomplete (again, if I understand what you did). Given your approach, the "runs per 27 outs" is the only really good number to use. Personally, I would use a "run value" for outs in my original calculation (making the "per 27 outs" column unnecessary).
I don't understand your comment about Jeter and Reyes at all; I guess I don't understand your methodology for attributing balls to fielders. Are you saying that the "zone" in question for shortstops differs by team? How do you know that balls they didn't get to were fielded by others?
Finally, I don't understand the ballpark point in your last table. How do you know that the ballpark is affecting these players vs. their own talent?
Sorry for all the questions.
This is great; thanks for getting this wonderful stuff up. I had suggested something along these lines in a comment to a PMR post, but this is very well thought out, and it's exactly why I think looking at balls that aren't outs is much more useful than looking at the ones that are. Thank you.
David honey,
any system that has the Astros' Adam Everett and Chris Burke at the top of the ratings sounds great to me...