Baseball Musings: More on Probability and Range

September 22, 2003

More on Probability and Range

Last Friday, I wrote about the beginnings of a probabilistic model of range for fielders. In doing so I neglected to cite the work of Michael Lichtman, leading Michael to make this comment in a thread on Baseball Primer:

David's work is EXACTLY the same as my UZR and it would be nice if he referenced it as such. A person doing work in an area has a responsibility to research the work already done in that area.

Park effects CANNOT be included in these kinds of results, other than the effect that the park (weather, altitude, turf) has on the speed of the batted ball and to some small extent, the percentage of line drives, fly balls, and ground balls. So you might see SOME park effects reflected in this kind of analysis, but not much. The Oakland and Coors Field thing may be more of a coincidence than anything else.

Also, a significant part of park effects is the size of the foul territory (e.g., Oakland and LA). That normally doesn't show up in any kind of defensive measure, including Pinto's version of UZR...

I have apologized for my oversight, and Michael has thanked me for that. However, I'd like to disagree that our systems are exactly the same, and point out what I see as the differences.

First of all, what Michael deserves credit for is the methodology. He looks at the probability of an average fielder getting an out in a zone and compares that to a given fielder's results. This is a probabilistic model, although he doesn't state this explicitly. His model is p(o,f|zone), or in English, the probability of an out by a fielder given the zone the ball is hit into. If you look back at my post I state my model as the following:

I'm asking the question, what is the probability of a batted ball becoming an out, given the parameters of that batted ball?
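For readers who like to see the two models side by side, here is one way to write them down. The symbols are my own shorthand for this sketch and do not come from either system: o is the event that the ball becomes an out, f is the fielder, z is the zone, and x_1 through x_n are whatever batted-ball parameters end up being used.

```latex
% Zone-based model: probability of an out by fielder f, given the zone z
P(o, f \mid z)

% Parameter-based model: probability of an out, given the batted-ball parameters
P(o \mid x_1, x_2, \ldots, x_n)

% Accounting for one more factor (a park, say) just adds one more parameter
P(o \mid x_1, x_2, \ldots, x_n, x_{n+1})
```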

There's an important difference here. Michael is starting with zone ratings, and putting them in a probabilistic model. I'm starting with a probabilistic model, and I really don't care what the parameters turn out to be, as long as they work. The difference can really be seen in part II of Lichtman's post. Here, he's adjusting for all kinds of different factors. He figures out an adjustment factor (for example, a park effect), then multiplies his original figure by that. In my model, these factors would just be more parameters. No special adjustments are needed.
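To make the "just more parameters" point concrete, here is a minimal sketch in Python. It is not the code behind either system. The record fields (`type`, `direction`, `park`, `out`) and the four sample balls are made up for illustration, and the estimate is plain counting: outs divided by balls in play for each combination of parameters.

```python
from collections import defaultdict


def out_probabilities(balls, keys):
    """Estimate P(out | parameters) by counting outs per parameter combination."""
    counts = defaultdict(lambda: [0, 0])  # parameter tuple -> [outs, balls in play]
    for ball in balls:
        params = tuple(ball[k] for k in keys)
        counts[params][0] += ball["out"]
        counts[params][1] += 1
    return {p: outs / total for p, (outs, total) in counts.items()}


# Hypothetical batted balls; "out" is 1 if the ball became an out, 0 otherwise.
balls = [
    {"type": "grounder", "direction": "SS hole", "park": "A", "out": 1},
    {"type": "grounder", "direction": "SS hole", "park": "A", "out": 0},
    {"type": "grounder", "direction": "SS hole", "park": "B", "out": 1},
    {"type": "grounder", "direction": "SS hole", "park": "B", "out": 1},
]

# Without the park, all four balls share one probability.
without_park = out_probabilities(balls, ["type", "direction"])

# Adding the park is just one more key; no separate multiplier is applied.
with_park = out_probabilities(balls, ["type", "direction", "park"])
```

With this toy data the combined probability is 0.75, and it splits into 0.5 for park A and 1.0 for park B once the park becomes a parameter. The trade-off is that every added parameter thins out the sample behind each probability.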

Also, in this work, I'm trying to get rid of any sort of defined zone. In order to decide if a ball is in a zone, you have to know the distance it traveled. But that distance depends on whether or not a fielder stopped the ball. So two line drives that are similar otherwise will look very different if a fielder catches one but not the other. Does it make a difference? I'm not sure, but the idea of a zone is an artificial construct, and I would rather have it fall out from the parameters of the batted ball than from post-contact factors.

One of the other things I never liked about zone ratings (by the way, I worked extensively on the zone rating code when I was at STATS, Inc.) is that they do nothing to rate pitchers and catchers, and Michael kept this feature in his UZR. One reason for this is that zones are hard to define for these fielders. But since this work moves beyond the zone, it's not a problem. We can measure pitcher and catcher stats as well as those for any other position. I consider this a big improvement over zone ratings.

Finally, the treatment of errors. STATS and Lichtman treat errors as if they were actually balls on which the fielder recorded an out. I once witnessed a lively discussion between Bill James and John Dewan over this. Bill thought this was wrong, that if you don't make the play you shouldn't get credit for getting to the ball. John disagreed. I did not have strong feelings about this at the time, but I've come down on Bill's side. One nice thing about this is that you don't have to make a separate adjustment for errors, as UZR does. There are just two types of balls: outs and non-outs. It doesn't matter if the non-out is an error or not; the fielder's probability will be lower, and since you would expect the errors to occur on easier balls, the fielder will pay a big penalty for his errors automatically (I have not proved the previous statement, but I should be able to see if it's true once the complete system is in place).
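Here is a small Python sketch of the outs and non-outs idea. It is only an illustration under assumptions of my own: the league probabilities (0.9 for an easy ball, 0.3 for a hard one), the four plays, and the function names are all hypothetical. Each play is scored as the actual result (1 for an out, 0 for anything else) minus the league probability of an out on that kind of ball.

```python
def play_value(league_prob, made_out):
    """Credit for one ball: actual result (1 or 0) minus the expected out probability."""
    return (1.0 if made_out else 0.0) - league_prob


def rate_fielder(plays):
    """Return (actual outs, expected outs) for a list of (league_prob, result) plays."""
    actual = sum(1 for _, result in plays if result == "out")
    expected = sum(prob for prob, _ in plays)
    return actual, expected


# Hypothetical plays: (league out probability for that ball, what happened).
# An error and a hit are both simply non-outs; no separate error adjustment.
plays = [
    (0.9, "out"),    # easy ball, play made
    (0.9, "error"),  # easy ball, booted
    (0.3, "out"),    # hard ball, play made
    (0.3, "hit"),    # hard ball, gets through
]

actual_outs, expected_outs = rate_fielder(plays)

# The error on the easy ball costs about 0.9 of an out,
# while the hit on the hard ball costs only about 0.3.
error_cost = play_value(0.9, made_out=False)
hit_cost = play_value(0.3, made_out=False)
```

In this toy case the fielder has 2 outs against roughly 2.4 expected, and most of the shortfall comes from the error, because it happened on a ball that is usually turned into an out. Whether real errors cluster on easy balls like this is exactly the part that still has to be checked against data.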

Michael Lichtman deserves the credit for coming up with the idea of thinking about fielding in terms of probability of balls in a zone. My work has extended that idea and formalized it, making it easier to compute, and extending it to fielders not previously covered.

Posted by David Pinto at 09:01 PM | Defense