Baseball Musings
February 19, 2006
Probabilistic Model of Range, 2005, Runs Created Against Fielders

A few days ago I made my first attempt to calculate runs saved by teams based on the Probabilistic Model of Range (PMR). I'm using a modified version of the runs created formula that appeared in The Bill James Handbook 2005. That formula is designed for batters. I've modified it in the following ways:

  • Count any time a fielder fails to get an out as a time on base. So if there is a failed fielder's choice, or the batter reaches on an error, it's a time on base. Since we're looking at defenses, this seems appropriate.
  • Total bases are based on the number of bases achieved by the batter when he earns a time on base. So a two base error in this system counts the same as a double. The weights used for the various types of hits are the same as in the Handbook.

So that makes the formula (Times On Base - GDP) * (Weighted Total Bases) / (Balls in Play). I'd like to hear what you think about the formula, but I believe it's a good first approximation. It was easy to apply to teams; you're just looking at all balls in play, and the likelihood that a particular ball will end in a particular result. But I wasn't quite sure how to then apply it to individual fielders.
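For readers who want to play with the formula, here is a minimal Python sketch of it. The function name and the sample inputs are made up for illustration; the Handbook weights are not reproduced here, so the weighted total bases figure is passed in as a number already computed.

```python
def runs_created_against(times_on_base, gdp, weighted_total_bases, balls_in_play):
    """(Times On Base - GDP) * (Weighted Total Bases) / (Balls in Play).

    times_on_base counts every failure to record an out (hits, errors,
    failed fielder's choices), as described in the list above.
    """
    if balls_in_play <= 0:
        raise ValueError("balls_in_play must be positive")
    return (times_on_base - gdp) * weighted_total_bases / balls_in_play


# Hypothetical inputs, NOT taken from the 2005 data.
example = runs_created_against(
    times_on_base=300, gdp=20, weighted_total_bases=400.0, balls_in_play=1000
)
print(example)
```

The same function works for a team or a single fielder; only the set of balls in play that feeds the inputs changes.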

When I was looking just at the probability of catching the ball, I wanted to look at all balls in play. I was looking at the piece of team DER that belonged to a particular fielder. But here, I'm trying to predict runs, so I made the decision to only look at balls in play in which the fielder had a non-zero chance of making the play. If you will, I used the probabilities of various balls in play to define the zone for the fielder, and the results of those balls to define runs created against (RCA).
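The zone definition can be sketched in a few lines of Python. This is only an illustration: the `BallInPlay` record and its field name are hypothetical, and treating predicted outs as the sum of the per-ball probabilities is an assumption about how the model totals them, not something spelled out in this post.

```python
from dataclasses import dataclass


@dataclass
class BallInPlay:
    # Hypothetical field: the model's probability that this fielder
    # turns this particular ball into an out.
    p_out_by_fielder: float


def fieldable(balls):
    """Keep only balls on which the fielder had a non-zero chance."""
    return [b for b in balls if b.p_out_by_fielder > 0.0]


def predicted_outs(balls):
    """Assumed here to be the sum of probabilities over the fielder's zone."""
    return sum(b.p_out_by_fielder for b in fieldable(balls))


# Three made-up balls in play; the first is outside the fielder's zone.
sample = [BallInPlay(0.0), BallInPlay(0.5), BallInPlay(0.25)]
print(len(fieldable(sample)), predicted_outs(sample))
```

Only the balls that survive the filter would then feed the RCA calculation for that fielder.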

The results made me wish I had worked on this last year. They're conveying information much more clearly than simply looking at the probability of catching the ball. Let's look at the shortstops first:

Probabilistic Model of Range, 2005, Runs Created Against Shortstops, Original Model (minimum 400 fieldable balls in play)
Player | Fieldable Balls In Play | Actual Outs by Fielder | Predicted Outs by Fielder | RCA | Predicted RCA | RCA/27 Outs | Predicted RCA/27 | Runs Saved/27 Outs
John McDonald | 521 | 178 | 172.92 | 27.41 | 36.40 | 4.16 | 5.68 | 1.527
Adam Everett | 1490 | 517 | 498.79 | 71.05 | 95.41 | 3.71 | 5.16 | 1.454
Omar Infante | 529 | 191 | 173.53 | 30.93 | 37.40 | 4.37 | 5.82 | 1.446
Bobby Crosby | 925 | 312 | 304.63 | 42.34 | 56.74 | 3.66 | 5.03 | 1.365
Rafael Furcal | 1648 | 596 | 576.97 | 84.81 | 109.11 | 3.84 | 5.11 | 1.264
Clint Barmes | 859 | 306 | 279.67 | 49.78 | 56.65 | 4.39 | 5.47 | 1.077
Yuniesky Betancourt | 610 | 177 | 174.31 | 39.99 | 45.58 | 6.10 | 7.06 | 0.961
Juan Uribe | 1729 | 537 | 557.52 | 82.57 | 103.72 | 4.15 | 5.02 | 0.872
Julio Lugo | 1761 | 560 | 540.03 | 122.06 | 134.05 | 5.88 | 6.70 | 0.817
Jimmy Rollins | 1625 | 510 | 519.18 | 89.33 | 106.01 | 4.73 | 5.51 | 0.784
David Eckstein | 1737 | 615 | 617.90 | 97.98 | 114.19 | 4.30 | 4.99 | 0.688
Jack Wilson | 1734 | 600 | 610.94 | 104.81 | 121.02 | 4.72 | 5.35 | 0.632
Orlando Cabrera | 1613 | 469 | 481.07 | 87.65 | 99.85 | 5.05 | 5.60 | 0.558
Oscar M Robles | 554 | 172 | 179.32 | 31.03 | 35.19 | 4.87 | 5.30 | 0.427
Russ M Adams | 1511 | 401 | 437.66 | 91.76 | 106.33 | 6.18 | 6.56 | 0.381
Neifi Perez | 1269 | 445 | 448.29 | 65.47 | 71.20 | 3.97 | 4.29 | 0.316
Wilson Valdez | 516 | 155 | 153.97 | 27.33 | 28.84 | 4.76 | 5.06 | 0.297
Miguel Tejada | 1846 | 572 | 590.91 | 122.90 | 132.45 | 5.80 | 6.05 | 0.251
Juan Castro | 774 | 264 | 260.08 | 59.78 | 61.02 | 6.11 | 6.33 | 0.221
Jhonny Peralta | 1603 | 509 | 549.61 | 94.74 | 106.72 | 5.03 | 5.24 | 0.218
Jason A Bartlett | 769 | 281 | 270.60 | 47.53 | 47.88 | 4.57 | 4.78 | 0.211
J.J. Hardy | 1085 | 346 | 359.91 | 66.62 | 71.75 | 5.20 | 5.38 | 0.184
Omar Vizquel | 1620 | 538 | 558.31 | 86.16 | 91.63 | 4.32 | 4.43 | 0.107
Alex Gonzalez | 1273 | 452 | 441.85 | 91.18 | 88.58 | 5.45 | 5.41 | -0.033
Derek Jeter | 1913 | 561 | 602.13 | 115.11 | 122.15 | 5.54 | 5.48 | -0.063
Carlos Guillen | 834 | 262 | 270.95 | 54.67 | 55.58 | 5.63 | 5.54 | -0.095
Khalil Greene | 1246 | 399 | 409.60 | 74.67 | 74.98 | 5.05 | 4.94 | -0.110
Jose Reyes | 1865 | 522 | 569.16 | 115.79 | 123.88 | 5.99 | 5.88 | -0.113
Cesar Izturis | 1175 | 366 | 386.49 | 74.27 | 76.18 | 5.48 | 5.32 | -0.157
Royce Clayton | 1528 | 473 | 502.95 | 99.61 | 98.82 | 5.69 | 5.30 | -0.381
Bill Hall | 609 | 196 | 205.22 | 47.93 | 46.59 | 6.60 | 6.13 | -0.473
Edgar Renteria | 1773 | 491 | 499.45 | 128.25 | 120.64 | 7.05 | 6.52 | -0.531
Marco Scutaro | 846 | 259 | 282.20 | 50.54 | 48.34 | 5.27 | 4.62 | -0.644
Michael Young | 1930 | 534 | 580.38 | 134.80 | 131.21 | 6.82 | 6.10 | -0.711
Felipe Lopez | 1467 | 459 | 493.34 | 95.85 | 89.31 | 5.64 | 4.89 | -0.750
Cristian Guzman | 1333 | 417 | 438.14 | 83.54 | 75.53 | 5.41 | 4.65 | -0.754
Mike Morse | 581 | 156 | 170.02 | 47.09 | 43.64 | 8.15 | 6.93 | -1.220
Angel Berroa | 1818 | 551 | 594.20 | 168.48 | 137.92 | 8.26 | 6.27 | -1.989

Many years ago, Bill James posed a question about shortstops: how many runs does one save with his glove? At the time, someone claimed Ozzie Smith saved 100 runs with his defense. Bill estimated then that the difference between the best and worst shortstop in the league was about 25 runs. As you can see here, among regular shortstops, Adam Everett saved the most runs in 2005, allowing about 24 fewer than expected. Angel Berroa, the worst regular in the majors, cost the Royals about 31 runs. That puts the difference at 55. Looking at the data, Berroa was an outlier. He was the rare shortstop who contributed nothing offensively while killing the team with his defense. He never should have played a full season at shortstop. Compared to Cristian Guzman, Everett saved about 35 runs, which fits in nicely with Bill's estimate. The magnitude of the numbers looks right to me.
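The season totals quoted above come straight from the RCA and Predicted RCA columns. As a quick check, here is a small Python sketch; the helper name is made up, and the figures are copied from the shortstop table.

```python
# (RCA, Predicted RCA) copied from the shortstop table above.
SHORTSTOPS = {
    "Adam Everett": (71.05, 95.41),
    "Angel Berroa": (168.48, 137.92),
}


def runs_saved(rca, predicted_rca):
    """Positive means fewer runs allowed than the model expected."""
    return predicted_rca - rca


everett = runs_saved(*SHORTSTOPS["Adam Everett"])
berroa = runs_saved(*SHORTSTOPS["Angel Berroa"])

print(f"Everett: {everett:+.2f}")          # +24.36
print(f"Berroa:  {berroa:+.2f}")           # -30.56
print(f"Spread:  {everett - berroa:.2f}")  # 54.92
```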

The second feature to look at concerns Derek Jeter and Jose Reyes. As the model developed, there's been discussion of how to handle balls in play that could be handled by a particular fielder but are caught by someone else. Runs created against appears to handle this situation quite well. Both Jeter and Reyes allowed fewer runs than the model predicts, despite turning many fewer outs than expected. Why? Others are turning the shortstop misses into outs. What happens, then, is that when calculating RCA, Derek and Jose don't get hurt in the numerator of the equation; they just get bumped up in the denominator.

However, when you look at them in terms of runs per game (27 outs), things change. The outs they don't get matter. They're so many outs behind where they should be, they're actually allowing more runs per game than expected. In other words, the cost of an out by one of these shortstops is high in terms of runs allowed. That I find to be a very cool result, that we can see both team and individual contributions to defense in one line.
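To make the flip concrete, here is a short Python sketch using Jeter's row. It assumes the per-27 columns are simply 27 * runs / outs, with actual outs for RCA and predicted outs for Predicted RCA; that assumption reproduces the published figures for the rows checked, but the function names are illustrative only.

```python
def per_27_outs(runs, outs):
    """Scale a runs total to a 27-out game."""
    return 27.0 * runs / outs


def runs_saved_per_27(rca, actual_outs, predicted_rca, predicted_outs):
    """Predicted RCA/27 minus RCA/27; negative means worse than expected."""
    return per_27_outs(predicted_rca, predicted_outs) - per_27_outs(rca, actual_outs)


# Derek Jeter's row from the shortstop table.
rca, actual_outs = 115.11, 561
predicted_rca, predicted_outs = 122.15, 602.13

raw_saved = predicted_rca - rca
rate_saved = runs_saved_per_27(rca, actual_outs, predicted_rca, predicted_outs)

print(f"Raw runs saved:     {raw_saved:+.2f}")   # +7.04
print(f"Runs saved/27 outs: {rate_saved:+.3f}")  # -0.063
```

The raw total is positive because teammates cleaned up some of the misses, while the rate turns negative because the runs allowed are spread over about 41 fewer outs than the model predicted.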

The other thing we can see is who plays in tough ballparks. Let's demonstrate this with leftfielders:

Probabilistic Model of Range, 2005, Runs Created Against Leftfielders, Original Model (minimum 200 fieldable balls in play)
Player | Fieldable Balls In Play | Actual Outs by Fielder | Predicted Outs by Fielder | RCA | Predicted RCA | RCA/27 Outs | Predicted RCA/27 | Runs Saved/27 Outs
Chris A Burke | 272 | 120 | 101.65 | 26.59 | 41.11 | 5.98 | 10.92 | 4.935
Coco Crisp | 617 | 294 | 261.44 | 66.21 | 96.11 | 6.08 | 9.93 | 3.846
Reed Johnson | 282 | 134 | 112.67 | 34.77 | 41.96 | 7.01 | 10.05 | 3.049
Bobby Kielty | 242 | 99 | 96.77 | 21.15 | 30.47 | 5.77 | 8.50 | 2.733
Carl Crawford | 717 | 341 | 309.93 | 58.46 | 84.48 | 4.63 | 7.36 | 2.731
Matt T Holliday | 558 | 236 | 214.67 | 72.13 | 87.09 | 8.25 | 10.95 | 2.701
Pedro Feliz | 300 | 138 | 131.98 | 29.23 | 40.98 | 5.72 | 8.38 | 2.663
Jay Payton | 245 | 107 | 94.67 | 24.01 | 30.14 | 6.06 | 8.60 | 2.537
Eric Byrnes | 433 | 209 | 185.38 | 34.82 | 43.41 | 4.50 | 6.32 | 1.824
Carlos Lee | 708 | 307 | 289.83 | 90.47 | 104.77 | 7.96 | 9.76 | 1.803
Kevin Mench | 521 | 231 | 213.63 | 51.08 | 61.35 | 5.97 | 7.75 | 1.784
Moises Alou | 284 | 132 | 123.81 | 25.23 | 31.83 | 5.16 | 6.94 | 1.781
Scott Podsednik | 557 | 260 | 240.02 | 55.04 | 66.05 | 5.72 | 7.43 | 1.715
Craig Monroe | 255 | 99 | 102.25 | 35.00 | 42.14 | 9.54 | 11.13 | 1.582
Raul Ibanez | 255 | 106 | 103.57 | 22.14 | 27.17 | 5.64 | 7.08 | 1.443
Kelly A Johnson | 356 | 166 | 154.90 | 42.52 | 47.68 | 6.92 | 8.31 | 1.394
Rondell White | 270 | 119 | 118.43 | 30.45 | 36.12 | 6.91 | 8.24 | 1.327
Luis Gonzalez | 619 | 270 | 246.42 | 95.06 | 97.93 | 9.51 | 10.73 | 1.225
Randy Winn | 497 | 226 | 209.19 | 47.81 | 53.67 | 5.71 | 6.93 | 1.215
Cliff Floyd | 654 | 283 | 267.59 | 76.59 | 82.93 | 7.31 | 8.37 | 1.061
Hideki Matsui | 520 | 218 | 204.33 | 65.97 | 69.62 | 8.17 | 9.20 | 1.029
Adam Dunn | 616 | 246 | 243.81 | 88.16 | 95.94 | 9.68 | 10.62 | 0.948
Frank Catalanotto | 389 | 163 | 160.10 | 49.82 | 54.50 | 8.25 | 9.19 | 0.939
Jason Bay | 625 | 266 | 257.22 | 83.97 | 89.04 | 8.52 | 9.35 | 0.824
Todd Hollandsworth | 251 | 103 | 101.45 | 36.08 | 37.60 | 9.46 | 10.01 | 0.549
Shannon Stewart | 555 | 249 | 237.55 | 74.67 | 75.06 | 8.10 | 8.53 | 0.434
Garret Anderson | 531 | 201 | 208.94 | 59.21 | 61.67 | 7.95 | 7.97 | 0.016
Pat Burrell | 627 | 236 | 247.60 | 82.06 | 85.00 | 9.39 | 9.27 | -0.118
David Dellucci | 228 | 84 | 90.32 | 29.17 | 30.85 | 9.38 | 9.22 | -0.152
Marlon Byrd | 210 | 100 | 102.84 | 21.98 | 21.78 | 5.93 | 5.72 | -0.215
Terrence Long | 423 | 166 | 163.61 | 76.66 | 73.49 | 12.47 | 12.13 | -0.342
Reggie Sanders | 268 | 108 | 105.77 | 47.56 | 43.95 | 11.89 | 11.22 | -0.671
Ryan Klesko | 504 | 204 | 202.51 | 76.35 | 69.22 | 10.11 | 9.23 | -0.877
Miguel Cabrera | 495 | 188 | 208.43 | 72.91 | 72.74 | 10.47 | 9.42 | -1.049
Manny Ramirez | 689 | 243 | 254.92 | 125.73 | 118.56 | 13.97 | 12.56 | -1.412
Larry Bigbie | 232 | 98 | 96.20 | 34.67 | 28.06 | 9.55 | 7.87 | -1.677

You can pick out the stadiums where fielding is difficult. Look at Manny Ramirez, Matt Holliday, Luis Gonzalez and Terrence Long. They're all in left fields that generate a lot of runs. And look at Coco Crisp in left, saving 30 runs. Some of that has to translate to center at Fenway.

As always, I'm anxious to hear your criticisms of the model.

Update: I was just reading and responding to an e-mail from Studes at The Hardball Times (see comment below) and I need to clarify something. When I'm talking about runs saved by a fielder (Predicted RCA - RCA), I shouldn't be so liberal in attributing those runs to the fielder. As the Jeter/Reyes comment above shows, those numbers are influenced by teammates. You should think of the runs saved as being attributed to "the fielder and his surrounding teammates."

Update: In order to make things a bit clearer, I've noted in the headings to the tables that the Out columns are outs that are attributed to the fielder, whether actual or predicted.


Comments

Maybe it's just the engineer in me, but what Uncertainties are you coming up with in the data?

Other note, Bigbie at the bottom aye? Something seems odd about that to me. Are you using any sort of park factors?

Fantastic work, as always.

Posted by: Mike Lafser at February 19, 2006 02:07 PM

Hey David,

I think you're really onto something here, but I'm kind of confused by your approach and the tables.

When applying run values to situations like this (player's performance against an average with the same number of batted balls), I agree that you have to include both the run impact of the hits allowed/disallowed vs. average, AND the run value of the outs caught/not caught vs. average.

So I think your comparison of Everett and Berroa is incomplete (again, if I understand what you did). Given your approach, the "runs per nine outs" is the only really good number to use. Personally, I would use a "run value" for outs in my original calculation (making the "per 27 outs" column unnecessary).

I completely don't understand your comment re: Jeter and Reyes; I guess I don't understand your methodology for attributing balls to fielders. Are you saying that the "zone" in question for shortstops differs by team? How do you know that balls they didn't get to were fielded by others?

Finally, I don't understand the ballpark point in your last table. How do you know that the ballpark is affecting these players vs. their own talent?

Sorry for all the questions.

Posted by: studes at February 19, 2006 03:47 PM

This is great, thanks for getting this wonderful stuff up. I had suggested something along these lines in a comment to a PMR post, but this is very well thought out and exactly why I think looking at balls that aren't outs is much more useful than looking at the ones that are. Thank you.

Posted by: andy at February 19, 2006 05:46 PM

david honey,

any system which has astros adam everett and chris burke at the top of the ratings sounds great to me...

Posted by: lisa gray at February 19, 2006 08:04 PM