Baseball Musings
January 23, 2006
Variation in Predicted DER

A reader named Guy left the following comment on this post:

There are two things being tracked: real DER and pred DER. Real DER changes very little -- approx .691 in 2004 and .694 in 2005 according to David's data. If 3 plays out of 1,000 makes '05 a "better fielding year," well OK (but the difference is w/in the MOE). But the predicted DER goes from .698 last year to .681 this year. That doesn't make sense to me. The distribution of 125,000 BIP can't possibly be that different.

It appears it is that different. Here's how the aggregate numbers for all four seasons break down:

Season  In Play  Actual Outs  Pred. Outs  DER    Pred. DER  Difference
2002    131915   91661        92748.99    0.695  0.703      -0.00825
2003    133657   92756        91800.78    0.694  0.687       0.00715
2004    132952   91912        93408.06    0.691  0.703      -0.01125
2005    133589   92647        91018.16    0.694  0.681       0.01219

According to this table, balls in play in 2002 and 2004 were relatively easy to field but were not fielded well. In 2003 and 2005, balls were difficult to field, but defenders picked them up just fine. My first inclination is to believe it's true, but I want to study the issue more.
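The DER columns are simple ratios of outs to balls in play, and Difference is DER minus predicted DER. A short Python sketch that reproduces those columns from the counts (the names SEASONS and der_summary are just for illustration):

```python
# Season totals from the table above: (balls in play, actual outs, predicted outs).
SEASONS = {
    2002: (131915, 91661, 92748.99),
    2003: (133657, 92756, 91800.78),
    2004: (132952, 91912, 93408.06),
    2005: (133589, 92647, 91018.16),
}


def der_summary(in_play, actual_outs, pred_outs):
    """Return (DER, predicted DER, difference) for one season.

    DER is outs per ball in play. A negative difference means fielders
    turned fewer balls into outs than the model expected.
    """
    der = actual_outs / in_play
    pred_der = pred_outs / in_play
    return der, pred_der, der - pred_der


for season, row in SEASONS.items():
    der, pred_der, diff = der_summary(*row)
    print(f"{season}  DER {der:.3f}  Pred. DER {pred_der:.3f}  Difference {diff:+.5f}")
```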

Update: Here's a more detailed table, broken down by the type of batted ball and season.

Season  Batted Ball Type  In Play  Actual Outs  Predicted Outs  DER    Predicted DER  Difference
2002    Bunt Fly          249      227          227.50          0.912  0.914          -0.00201
2003    Bunt Fly          301      284          282.33          0.944  0.938           0.00554
2004    Bunt Fly          281      257          258.17          0.915  0.919          -0.00415
2005    Bunt Fly          272      260          260.00          0.956  0.956           0.00000
2002    Bunt Grounder     2861     2144         2137.95         0.749  0.747           0.00211
2003    Bunt Grounder     2998     2229         2257.07         0.743  0.753          -0.00936
2004    Bunt Grounder     2931     2262         2245.54         0.772  0.766           0.00562
2005    Bunt Grounder     2909     2209         2203.44         0.759  0.757           0.00191
2002    Fly               43037    38924        39043.76        0.904  0.907          -0.00278
2003    Fly               41673    38088        37349.57        0.914  0.896           0.01772
2004    Fly               45136    38575        39530.60        0.855  0.876          -0.02117
2005    Fly               42742    37634        37297.06        0.880  0.873           0.00788
2002    Grounder          57916    43378        43953.03        0.749  0.759          -0.00993
2003    Grounder          58673    44371        43818.41        0.756  0.747           0.00942
2004    Grounder          59719    43861        44493.05        0.734  0.745          -0.01058
2005    Grounder          59891    44273        43618.51        0.739  0.728           0.01093
2002    Liner             27852    6988         7386.75         0.251  0.265          -0.01432
2003    Liner             29779    7574         7883.40         0.254  0.265          -0.01039
2004    Liner             24885    6957         6880.70         0.280  0.277           0.00307
2005    Liner             27775    8271         7639.14         0.298  0.275           0.02275
2003    Pop (Not used)    233      210          210.00          0.901  0.901           0.00000

Fielders did a really good job of catching line drives in 2005, and there were a lot more line drives than in 2004.
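As a consistency check, the detailed rows should add back up to the season totals, and each season's Difference can be split into pieces by batted-ball type. A minimal Python sketch of that bookkeeping follows; the split (each type's actual minus predicted outs, divided by all balls in play that season) is an illustrative choice, not necessarily how the figures above were produced, and the function names are hypothetical. For 2005 it attributes about .005 of the .012 gap each to grounders and liners, and about .0025 to fly balls.

```python
from collections import defaultdict

# (season, batted-ball type, in play, actual outs, predicted outs) from the detailed table.
ROWS = [
    (2002, "Bunt Fly", 249, 227, 227.50), (2003, "Bunt Fly", 301, 284, 282.33),
    (2004, "Bunt Fly", 281, 257, 258.17), (2005, "Bunt Fly", 272, 260, 260.00),
    (2002, "Bunt Grounder", 2861, 2144, 2137.95), (2003, "Bunt Grounder", 2998, 2229, 2257.07),
    (2004, "Bunt Grounder", 2931, 2262, 2245.54), (2005, "Bunt Grounder", 2909, 2209, 2203.44),
    (2002, "Fly", 43037, 38924, 39043.76), (2003, "Fly", 41673, 38088, 37349.57),
    (2004, "Fly", 45136, 38575, 39530.60), (2005, "Fly", 42742, 37634, 37297.06),
    (2002, "Grounder", 57916, 43378, 43953.03), (2003, "Grounder", 58673, 44371, 43818.41),
    (2004, "Grounder", 59719, 43861, 44493.05), (2005, "Grounder", 59891, 44273, 43618.51),
    (2002, "Liner", 27852, 6988, 7386.75), (2003, "Liner", 29779, 7574, 7883.40),
    (2004, "Liner", 24885, 6957, 6880.70), (2005, "Liner", 27775, 8271, 7639.14),
    (2003, "Pop (Not used)", 233, 210, 210.00),
]


def season_totals(rows):
    """Sum balls in play, actual outs and predicted outs for each season."""
    totals = defaultdict(lambda: [0, 0, 0.0])
    for season, _kind, in_play, outs, pred in rows:
        totals[season][0] += in_play
        totals[season][1] += outs
        totals[season][2] += pred
    return dict(totals)


def contributions(rows, season):
    """Split one season's (DER - predicted DER) gap by batted-ball type.

    Each type contributes (actual outs - predicted outs) / all balls in play
    that season, so the contributions add up to the season-level Difference.
    """
    season_rows = [r for r in rows if r[0] == season]
    total_in_play = sum(r[2] for r in season_rows)
    return {kind: (outs - pred) / total_in_play for _s, kind, _n, outs, pred in season_rows}


for kind, share in contributions(ROWS, 2005).items():
    print(f"2005 {kind:<14} {share:+.5f}")
```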

Update: An explanation for the increase in line drives is posted here.


Comments

Where did you get these stats?

Posted by: Peter Angelos at January 23, 2006 09:28 PM

I'd guess that some of the fluctuations in InPlay and Actual Outs are due to varying interpretations of the definitions of fly, grounder, liner, etc. Probably the people who do the recording are relatively constant within a season, but there's probably turnover between seasons.

Btw, I'm surprised that Predicted DER changes so much between seasons. Doesn't Predicted DER depend mainly on park, batted ball type, and location to which the ball is hit? As your chart shows, even within a particular batted ball type, there is substantial variation between seasons (e.g. liners '03 & '04). Was there a major change in parks or a substantial change in the distribution of batted ball locations? Is there another variable I'm forgetting?

Posted by: Jason at January 23, 2006 10:50 PM

"According to this table, balls in play in 2002 and 2004 were relatively easy to field, but were not fielded well. In 2003 and 2005, balls were difficult to field, but defenders picked them just fine."

I suppose this is theoretically possible, but extremely unlikely. Certainly convenient that fielding changes almost exactly offset the change in difficulty every year. What's far more likely is some kind of coding changes in the BIP data that are changing your predicted DER.

One possibility: Fielders were much better on LDs in 04-05 than 02-03, yet much worse on both FB and GB. It seems likely that some hits labeled LDs in 02-03 are now being called GBs and FBs. Also, some of the y-t-y changes in DER are implausibly large -- the DER on LDs has gone from about 25% to 28% in '04 and then 30% in '05. That's a huge change -- far greater than sampling error could produce (MOE = +/-.005). An individual player could show this kind of change, perhaps even all players at one position, but how could all MLB players improve on LDs so dramatically and quickly?

Posted by: Guy at January 23, 2006 10:50 PM
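Guy's ±.005 is consistent with the usual normal-approximation margin of error for a proportion, 1.96 * sqrt(p(1 - p)/n), applied to roughly 28,000 line drives a season; that is an assumption about how he computed it. The Python sketch below (hypothetical helper names moe95 and two_prop_z) treats every ball in play as an independent trial, which is exactly the assumption that scorer effects would violate.

```python
import math


def moe95(successes, n):
    """95% margin of error for a proportion (normal approximation to the binomial)."""
    p = successes / n
    return 1.96 * math.sqrt(p * (1 - p) / n)


def two_prop_z(s1, n1, s2, n2):
    """z statistic for the change from proportion s1/n1 to s2/n2 (unpooled standard error)."""
    p1, p2 = s1 / n1, s2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p2 - p1) / se


# Line drives, (outs, balls in play) from the detailed table: 2003 vs. 2005.
print(f"MOE on 2003 LD DER: +/-{moe95(7574, 29779):.4f}")
print(f"z for 2003 -> 2005 LD DER change: {two_prop_z(7574, 29779, 8271, 27775):.1f}")

# All balls in play, 2004 vs. 2005: the "3 plays out of 1,000" comparison.
print(f"z for 2004 -> 2005 overall DER change: {two_prop_z(91912, 132952, 92647, 133589):.1f}")
```

Under those assumptions the 2003-to-2005 change in line-drive DER sits more than ten standard errors out, while the overall 2004-to-2005 DER change is within ordinary sampling noise, which matches both of Guy's comments.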

David,

More line drives, with more being caught could mean that some balls that would be labeled OF flies in the past are now being labeled as LDs.

Posted by: David Gassko at January 23, 2006 11:04 PM

Actually, the LD% is lower in 04-05 (19.8%) than in 02-03 (21.7%), though '04 is the really low year.

Looking at the DERs, I think the real split is 02-03 vs. 04-05. You may need a distinct 2004-2005 model to make this work. Here are the DERs (02-03, 04-05):
LD .253 .289
FB .909 .868
GB .753 .737

Posted by: Guy at January 23, 2006 11:28 PM
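Guy's two-period figures can be recomputed from the detailed table. This Python sketch pools the counts (sums outs and balls in play across seasons) rather than averaging rates, so the 04-05 fly-ball DER comes out .867; averaging the two seasons' rates instead gives his .868. The names DATA, pooled_der and ld_share are illustrative.

```python
# (balls in play, actual outs) by batted-ball type and season, from the detailed table.
DATA = {
    "LD": {2002: (27852, 6988), 2003: (29779, 7574), 2004: (24885, 6957), 2005: (27775, 8271)},
    "FB": {2002: (43037, 38924), 2003: (41673, 38088), 2004: (45136, 38575), 2005: (42742, 37634)},
    "GB": {2002: (57916, 43378), 2003: (58673, 44371), 2004: (59719, 43861), 2005: (59891, 44273)},
}
# All balls in play per season, from the aggregate table.
ALL_BIP = {2002: 131915, 2003: 133657, 2004: 132952, 2005: 133589}


def pooled_der(by_season, seasons):
    """DER over several seasons, pooling outs and balls in play."""
    in_play = sum(by_season[s][0] for s in seasons)
    outs = sum(by_season[s][1] for s in seasons)
    return outs / in_play


def ld_share(seasons):
    """Line drives as a fraction of all balls in play over the given seasons."""
    return sum(DATA["LD"][s][0] for s in seasons) / sum(ALL_BIP[s] for s in seasons)


EARLY, LATE = (2002, 2003), (2004, 2005)
for kind in DATA:
    print(f"{kind}  {pooled_der(DATA[kind], EARLY):.3f}  {pooled_der(DATA[kind], LATE):.3f}")
print(f"LD%  {ld_share(EARLY):.1%}  {ld_share(LATE):.1%}")
```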

Scorer bias can certainly play a part here. In 2003, the FB DER is .914 and it's .855 in 2004. That's an enormous difference: .059 outs per FB. You have a sample of 40,000 FB, or an average of 450 plays per OF position. 450 x .059 = 27 more hits allowed in 2004 than 2003, for each OF position? No. Not possible, at a player level. And then suddenly, it shoots right back up?

You definitely have to have a "seasonal" factor, such that each season is adjusted appropriately.

This just gets more fascinating!

Btw, Betancourt was one of the winners of the Fans' Scouting Report, as best fielding SS in the league. (Along with Adam Everett). Nice to see PMR match that.


Posted by: tangotiger at January 24, 2006 10:35 AM
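tangotiger's back-of-the-envelope estimate is easy to rerun. Using the exact 2004 fly-ball count instead of a rounded 40,000 gives about 500 chances per outfield position and roughly 30 extra hits, a bit above his 27 but making the same point. Spreading fly balls evenly over 90 outfield positions (30 teams x 3) is a simplifying assumption, and the variable names are illustrative.

```python
# (balls in play, outs) on fly balls, from the detailed table.
FLY_2003 = (41673, 38088)
FLY_2004 = (45136, 38575)
TEAMS, OF_POSITIONS = 30, 3  # MLB teams; LF, CF, RF


def der(in_play, outs):
    """Outs per ball in play."""
    return outs / in_play


# Outs lost per fly ball between 2003 and 2004 (about .059).
der_drop = der(*FLY_2003) - der(*FLY_2004)

# Assumes fly balls are spread evenly over every outfield position in the league.
chances_per_position = FLY_2004[0] / (TEAMS * OF_POSITIONS)
extra_hits_per_position = chances_per_position * der_drop

print(f"DER drop {der_drop:.3f}, {chances_per_position:.0f} chances, "
      f"{extra_hits_per_position:.0f} extra hits per OF position")
```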

I do agree that there are enough questions about the data to make the "four year" approach an issue. When we got our four-year data from BIS, we noticed some interesting trends, and there was clearly some bad data in 2002, which was their first year.

The 02/03 vs. 04/05 split might make sense. I looked at simple out percentages of outfield flies per year (not including home runs) and got 80% and 81% the first two years and 74% and 77% the next two years. I think it likely there's been an overall trend toward more standardization that perhaps took a bit of a jump in 2004. Line drive out percentage rose from 24.6% to 25.3% to 27.9% to 29.1% each year.

My impression is that their line drive categorization has evolved over time. David, do they have a written definition of a line drive vs. a fly?

Posted by: studes at January 24, 2006 02:11 PM

Even with "strict" written definitions, it still comes down to a judgement call.

There is definitely the issue of data quality. I ran some quality checks on the raw 2004 BIS data, and there were some problems. Not that big of a deal, but enough to tell me that integrity checks were not being run in places where they should have been. I wasn't looking at the GB/FB/LD/Bunt issue either, which I'm sure you'd have even more problems with.

You can also conceivably make the claim that it is not a "seasonal" standard, but a scorer-by-scorer standard. This means that you can have a team like the Blue Jays that are consistent year-to-year and a team like the Orioles that are all over the place (just for illustration), and that you shouldn't apply any adjustment for the Jays, and you should for the Orioles.

At this point, I think a seasonal correction would be the best first-step.

Posted by: tangotiger at January 24, 2006 03:01 PM
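One way to read the suggested seasonal correction is a multiplicative factor per season and batted-ball type: league actual outs divided by league predicted outs, applied to every team's or fielder's predicted outs before comparing. The Python sketch below assumes that form; FACTORS and adjusted_pred_outs are hypothetical names, not part of any existing PMR code, and bunts and pops are left out for brevity. By construction it forces league-wide predicted DER to match actual DER each season, so it absorbs scorer drift at the cost of no longer being able to detect a genuinely better or worse fielding year. A scorer-by-scorer version would key the factors by park as well as season.

```python
# League (actual outs, predicted outs) for each (season, batted-ball type),
# from the detailed table. Bunts and pops are omitted for brevity.
LEAGUE = {
    (2002, "Fly"): (38924, 39043.76), (2003, "Fly"): (38088, 37349.57),
    (2004, "Fly"): (38575, 39530.60), (2005, "Fly"): (37634, 37297.06),
    (2002, "Grounder"): (43378, 43953.03), (2003, "Grounder"): (44371, 43818.41),
    (2004, "Grounder"): (43861, 44493.05), (2005, "Grounder"): (44273, 43618.51),
    (2002, "Liner"): (6988, 7386.75), (2003, "Liner"): (7574, 7883.40),
    (2004, "Liner"): (6957, 6880.70), (2005, "Liner"): (8271, 7639.14),
}

# Hypothetical seasonal factor: actual outs recorded per predicted out,
# for each season and batted-ball type.
FACTORS = {key: actual / pred for key, (actual, pred) in LEAGUE.items()}


def adjusted_pred_outs(pred_outs, season, kind):
    """Rescale predicted outs so that, summed over the whole league,
    predicted outs equal actual outs for that season and batted-ball type."""
    return pred_outs * FACTORS[(season, kind)]


# Print the factors grouped by batted-ball type, then season.
for (season, kind), factor in sorted(FACTORS.items(), key=lambda kv: (kv[0][1], kv[0][0])):
    print(f"{season} {kind:<8} factor {factor:.3f}")
```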

Good points Tango; nice to see you posting here :)

Would you happen to know how scoring assignments work? The model would certainly work best if it is aligned with the collection method. Is there one scorer per stadium? One per team (two scorers per game)? A pool of scorers that travel regularly?

Posted by: Jason at January 24, 2006 08:00 PM