Baseball Musings: Analyzing the Analysis

February 25, 2006

Analyzing the Analysis

Don Scotto at Beyond the Boxscore spent a lot of time with the Lineup Analysis tool and he's figured out how to build the lineup without the computer's help.

To me, the most interesting and counter-intuitive aspect is the number three hitter:

This was the biggest surprise: the 3 hitter should be the player that doesn't fit into any of the other spots. Every other spot has some significance, but if I were building a lineup, I would just put the leftover player in the 3 hole. This seemed very counterintuitive to me when I first heard it, but David Pinto noted, "Part of what it's telling us is that you need to spread out your easy outs." I still struggled to get this, but I'm starting to, now. Marc said something to the effect of "the worst players have to go somewhere." I guess this is really it; the other spots just have greater needs. If you can get a good hitter here, it means that your lineup is very deep.

A few years ago when I was working at UMass, someone at our lab pointed out an article that analyzed lineups using Markov chains. It showed the pitcher should bat eighth. I had a very tough time believing that, until I sat down with pencil and paper and found you really didn't lose that much giving the pitcher those extra at bats, a few runs at most. It took a while, but I got my head around the idea of a second leadoff hitter in the nine hole. So it's easier for me to deal with a poor hitter in the three spot.

Here's what I think is going on. There really are two lineups here. The 9-1-2 section of the order is the killer OBA guys. The 4-5-6-7 are the boppers. Three and eight are the easy outs, separated so the opposing pitcher doesn't get any easy stretches.

One and two get on base a lot, so hitter four almost always bats in the first inning. If he doesn't, he's also a good OBA guy, so he's good at leading off the second inning. When the lineup turns over, you have the extra good OBA in front of the first two hitters. Since these two also have decent power, number nine is a new table setter. It would be very interesting to see if #9 and #4 led off a lot of innings, since you would expect #3 and #8 to make the third out when they get the opportunity. If that were true, the two-lineup idea would be absolutely brilliant. At the two places you're most likely to run into the third out, a leadoff man is coming up next.
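Whether #9 and #4 really lead off a lot of innings can be roughed out with a toy simulation. The sketch below assumes every plate appearance is either an out or a time on base, decided only by the hitter's OBA, and it ignores double plays, caught stealing, and extra innings. The function name and the OBA values are made up for illustration; nothing here comes from the Lineup Analysis tool.

```python
import random

def leadoff_counts(obp, games=10000, innings=9, seed=0):
    """Count how often each lineup slot bats first in an inning.

    obp: on-base average for each slot, in batting order (assumed inputs).
    Toy model: a plate appearance is an out with probability 1 - obp.
    """
    rng = random.Random(seed)
    counts = [0] * len(obp)
    for _ in range(games):
        slot = 0  # the #1 hitter always opens the game
        for _ in range(innings):
            counts[slot] += 1  # this slot leads off the inning
            outs = 0
            while outs < 3:
                if rng.random() >= obp[slot]:
                    outs += 1
                slot = (slot + 1) % len(obp)
    return counts

if __name__ == "__main__":
    # Hypothetical lineup with the weak OBA hitters in the 3 and 8 spots.
    lineup_obp = [0.380, 0.370, 0.300, 0.360, 0.340, 0.340, 0.330, 0.300, 0.370]
    for slot, n in enumerate(leadoff_counts(lineup_obp), start=1):
        print(f"#{slot} led off {n} innings")
```

Comparing the counts for #4 and #9 against the other slots gives a first look at the idea, though real play-by-play data would be the better test.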

Update: Be sure to read the comments. There's an indirect link to a Retrosheet research article in one. Here's the direct link to Evaluating Traditional Lineups.

Posted by David Pinto at 10:47 PM | Strategy

Comments

Dude, you have blown my mind... "my mind has been blown".

The possibility that the third spot, perhaps the most coveted spot in a baseball lineup, could in fact be a giveaway spot... wow. I mean, I could accept that the number nine spot could be rather easily interchanged with the number one spot as a "second leadoff" spot, but this is, as Ornette Coleman would say, something else.

Posted by: Hunter at February 25, 2006 11:54 PM

I just wish we could see it applied somewhere.

Posted by: Marc Normandin at February 26, 2006 06:16 AM

I fiddled around with the lineup analysis and ran it for the Mets.

In their optimal lineup, Kaz Matsui bats third.

I just cannot imagine that there is a baseball manager alive who would have the guts to bat someone as unpopular as Matsui in the coveted 3-hole.
Can you imagine what the beat writers would do to Willie Randolph if he told them that the Opening Day lineup had Matsui batting third? They'd run him out of town before the second game of the season.

The Lineup Analysis feature is fascinating, though, and lots of fun to play with.

Thanks for making it available to us.

Posted by: fred at February 26, 2006 07:49 AM

Morong discovered an error in his original calculations on the #3 slot. Also, the #9 slot has inflated importance because he combined leagues (such that good production in #9 = AL team = more runs). He's got new #s, and if the program is changed to reflect that, I doubt it will still put bad hitters in the #3 slot.

David's theory about spreading out easy outs is interesting, and I don't know if it's true. But this program can't really shed light on the issue because it isn't a true inning-by-inning simulation. It just takes each hitter's OBP and SLG, multiplies by fixed coefficients, and produces an R/G total. And since it's static rather than dynamic, it can't possibly work: if you move the traditional 3-4 hitters to hit 1st and 2nd to take advantage of their high OBP, the program should reduce the 'multiplier' for their OBP since they are now followed by inferior hitters. But the program doesn't do that.
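To make the "static" point concrete, here is a minimal sketch of that kind of model: each slot has a fixed OBP weight and a fixed SLG weight, and the R/G estimate is just the sum. The weights below are invented placeholders, not the coefficients used by the actual program, and the function names are hypothetical.

```python
from itertools import permutations

# Hypothetical (OBP weight, SLG weight) per lineup slot, #1 through #9.
# These numbers are invented for illustration only.
SLOT_WEIGHTS = [
    (2.2, 0.9), (1.9, 1.0), (1.6, 1.1),
    (1.8, 1.3), (1.7, 1.2), (1.6, 1.1),
    (1.5, 1.0), (1.4, 0.9), (1.7, 0.8),
]

def static_runs(lineup, weights=SLOT_WEIGHTS, intercept=0.0):
    """Estimate R/G for a lineup given as (obp, slg) pairs in batting order.

    Each hitter's contribution depends only on his own slot, never on
    who bats before or after him. That is what makes the model static.
    """
    return intercept + sum(
        a * obp + b * slg for (obp, slg), (a, b) in zip(lineup, weights)
    )

def best_order(players, weights=SLOT_WEIGHTS):
    """Brute-force search over every batting order (9! = 362,880 orders)."""
    return max(permutations(players), key=lambda order: static_runs(order, weights))
```

Because the sum has no interaction terms, moving a high-OBP hitter to the top changes only his own term; the weight on his OBP stays the same no matter how weak the hitters behind him are.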

Posted by: Guy at February 26, 2006 08:33 AM

Guy is right, of course, about the limits of the program, at least Ken Arneson's Perl version, which David rewrote.

Results which undermine the conventional wisdom are exciting, but a couple of studies by Tom Ruane bear on this discussion about possibly having a "leftover" or "below average" hitter in the #3 spot.

1) Most recently, an article at www.retrosheet.org (follow the research link) called "Evaluating Traditional Lineups" addresses the question pretty directly. He used a Markov chain analysis and 1993-2004 base-out transition data separated by league. For the American League, the best simulated lineups involved flip-flopping the current #3 and #4 hitters. For the National League, the best lineups had a little more variation around the 3 spot: dropping the #2 hitter to the 3rd spot a few times, and bringing the #4 hitter forward most of the rest of the time. Basically, all the best NL lineups involved shuffling the #2 through #5 spots among each other. So in his results, the #3 spot still gets filled by a "good" hitter.

2) A much older study, on Situational Hitting, available on the Baseball Think Factory site (http://www.baseballthinkfactory.org/btf/scholars/ruane/articles/situational_hitting.htm), may shed some light on the "skills" best suited for hitting 3rd, and thus indirectly on what type of hitter is best used in that spot.

It contains a very interesting chart showing the percentages of base-out situations faced by each lineup position. [The leagues appear to be combined, and the data is from Retrosheet for 1982-1983 and 1987.] The leadoff hitter, for example, being guaranteed a leadoff at bat once a game, had 40.4% of his plate appearances in a nobody on/nobody out situation. The #4 hitter was 2nd most likely to get a leadoff-type plate appearance. The #3 hitter was the most likely to bat with nobody on and two out, and the least likely to bat with nobody on and nobody out. The benefit of high on base percentage, particularly just reaching first, is minimized with this distribution of situations. The #3 hitter also batted an above average percentage of times with runners in scoring position, but about an average percentage of plate appearances in situations which include a runner at first.

In other words, the #3 spot had very limited leverage for ability to advance runners with extra base hits, but extra leverage for hitting singles. I'd infer from this distribution of situations that the ability to draw walks and hit doubles had somewhat less leverage in the #3 spot than in, for example, the #4 spot, which had more opportunities to advance runners from first and, as noted, a higher rate of "leadoff" at bats. Consideration of these subtleties (beyond just overall OBP and slugging percentage) would probably increase optimization a little.

From the situational data, it seems like a spot suited for someone with at least average power and an above average batting average. By virtue of his position near the top of the order, the #3 hitter gets an above average number of plate appearances over the season, and his overall hitting skills are certainly leveraged. This conclusion could be undercut by ripple effects of situations passed to "down-lineup" batters, but Tom Ruane's Markov chain analysis in "Evaluating Traditional Lineups" suggests this doesn't happen. The #3 hitter should be a good hitter.
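For anyone who wants to play with the idea behind the situational chart, here is a rough sketch that tallies the situations each lineup slot faces in a toy simulation. It is not Tom Ruane's method (his numbers come from real Retrosheet play-by-play). The model is an assumption made for illustration: every plate appearance is an out, a single that moves each runner up exactly one base, or a home run, and the function name and probabilities are hypothetical.

```python
import random
from collections import Counter

def situation_shares(batters, games=2000, innings=9, seed=0):
    """Share of plate appearances each slot gets in each (outs, runners_on) state.

    batters: (p_single, p_homer) per slot in batting order; the remaining
    probability is an out. runners_on is True if any base is occupied.
    """
    rng = random.Random(seed)
    tallies = [Counter() for _ in batters]
    for _ in range(games):
        slot = 0
        for _ in range(innings):
            outs = 0
            bases = [False, False, False]  # first, second, third
            while outs < 3:
                tallies[slot][(outs, any(bases))] += 1
                p_single, p_homer = batters[slot]
                roll = rng.random()
                if roll < p_homer:
                    bases = [False, False, False]      # everyone scores
                elif roll < p_homer + p_single:
                    bases = [True, bases[0], bases[1]]  # runners move up one
                else:
                    outs += 1
                slot = (slot + 1) % len(batters)
    shares = []
    for tally in tallies:
        total = sum(tally.values())
        shares.append({k: v / total for k, v in tally.items()} if total else {})
    return shares
```

Looking up `shares[2].get((2, False), 0.0)` gives the toy model's share of #3-hitter plate appearances with nobody on and two out, which can be compared with the same figure for the other slots.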

Posted by: joe arthur at February 26, 2006 10:26 AM

As a follow-up to my earlier comment, Tom Ruane's recent article referred to an earlier article by Mark Pankin. Now that I've looked at that article again, I see that Mark's analysis includes regressions on component hitting skills. His results suggest that my specific speculation about the walk rate of the #3 hitter might be wrong.

Posted by: joe arthur at February 26, 2006 02:32 PM