Baseball Musings
February 24, 2006
Lineup Analysis

I love the way the internet works. Bob wrote me today to tell me about an article he posted at HireMeTheo on the optimum Red Sox lineup. He got the idea from two sources: Cyril Morong's recent article on the weights given to on-base average and slugging average depending on lineup slot, and Ken Arneson's application of this to the Oakland Athletics.

Ken was kind enough to post a Perl script on how to do this. I think this is a fascinating subject, and this type of program would be very useful to simulation game players (and maybe baseball managers in general). So I coded the analysis program as a Python script, and now you can find the optimum lineup for any set of nine players based on the work of these gentlemen.

The program is called Lineup Analysis. Just fill in all the slots with data (you can make up names if you like), press submit and off you go. It supplies you with the 20 best and 20 worst lineups given that set of players.
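For anyone curious how a search like this can work, here is a minimal sketch of the brute-force idea: score every possible batting order with slot-dependent weights on OBA and slugging, then keep the best and worst. This is not the actual Lineup Analysis code. The function names, the made-up players, and especially the weights are placeholders for illustration; the real coefficients come from Cyril Morong's regression and are not reproduced here.

```python
from itertools import permutations


def lineup_score(lineup, weights):
    """Sum of slot-weighted OBA and SLG for one batting order.

    lineup  -- sequence of (name, oba, slg) tuples, in batting order
    weights -- sequence of (oba_weight, slg_weight) pairs, one per slot
    """
    return sum(w_oba * oba + w_slg * slg
               for (_, oba, slg), (w_oba, w_slg) in zip(lineup, weights))


def rank_lineups(players, weights, keep=20):
    """Score every permutation of players.

    Returns (best, worst): two lists of (score, names) tuples, the first
    sorted from highest score down, the second from lowest score up.
    """
    if len(players) != len(weights):
        raise ValueError("need one weight pair per lineup slot")
    scored = [(lineup_score(order, weights), tuple(p[0] for p in order))
              for order in permutations(players)]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:keep], scored[-keep:][::-1]


# Three made-up players and PLACEHOLDER weights (not Morong's coefficients),
# kept small so the example runs instantly.
demo_players = [("A", 0.400, 0.450), ("B", 0.320, 0.550), ("C", 0.300, 0.380)]
demo_weights = [(2.0, 1.0), (1.0, 2.0), (1.0, 1.0)]

best, worst = rank_lineups(demo_players, demo_weights)
print("best: ", best[0])
print("worst:", worst[0])
```

With nine players there are 9! = 362,880 batting orders to score, which is why a run takes a while.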

Here are a couple of examples (they take a minute to run). Using The Bill James Handbook 2006 player projections, here's what the best and worst 2006 Yankees lineups look like. Notice that the program puts very unusual people at the top of the order. It also seems to put the worst hitters 8th. It also likes to put three high OBAs in front of the best power hitter.

Here's a look at my idea of the all-time best hitting team (Edgar Martinez is the DH). How would you like to start a game facing Bonds and Ruth? Don't tell Joe Morgan he's batting 9th.

I hope you'll give this a try. It takes a minute to run since it has to go through all the possibilities. If nothing else, it can generate some interesting discussions.

The general consensus is that lineups don't matter much. These programs are showing that the difference between the best and worst lineups is half a run per game, or about 80 runs a season. That's 8 wins. However, even the worst managers aren't going to use the worst lineup. If you keep your high OBA players at the top and the sluggers in the middle, you're going to come out very near the top.
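The arithmetic behind those numbers is simple enough to check. This sketch assumes a 162-game season and the common rule of thumb of roughly 10 runs per win; the post only implies that conversion, so treat it as an assumption. The function name is made up for illustration.

```python
GAMES_PER_SEASON = 162   # regular-season schedule
RUNS_PER_WIN = 10.0      # ASSUMPTION: common rule of thumb, not stated in the post


def season_impact(runs_per_game_gap, games=GAMES_PER_SEASON,
                  runs_per_win=RUNS_PER_WIN):
    """Convert a runs-per-game gap into (season runs, approximate wins)."""
    runs = runs_per_game_gap * games
    return runs, runs / runs_per_win


runs, wins = season_impact(0.5)
print(f"{runs:.0f} runs, about {wins:.1f} wins")  # prints "81 runs, about 8.1 wins"
```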


Posted by David Pinto at 08:00 PM | Strategy | TrackBack (0)
Comments

Is there a way for you to set it up so we could just put in a lineup and it would tell us the projected runs per game for that lineup instead of calculating the 20 best and 20 worst?

Posted by: Bill at February 24, 2006 11:49 PM

This is some fun stuff. Mike Schmidt batting eighth? I'll have to use this for my sim league.

Posted by: Brian at February 25, 2006 07:39 AM

Amazing. Stuff like this, along with a Red Sox win, are the two things that really brighten up my day.

Posted by: stat man at February 25, 2006 11:43 AM

Very cool, David! Can I make a request? Let the user specify a lineup (in addition to returning the best lineup). That way, we can calculate how many runs the manager of our favorite team is wasting by using a brain-dead lineup :-)

Posted by: Jason at February 25, 2006 01:47 PM

I could do that, but it's not that difficult to do a single lineup by hand, or even program a spreadsheet.

Posted by: David Pinto at February 25, 2006 02:12 PM

I used this program to see how different projected outputs from my Phils might affect the runs scored by the team. I used 2005 numbers as a base, then adjusted the players' projections one by one to the 90% PECOTA projection.

Because the lineups actually used are not terribly less efficient than the "optimal" ones, I assumed the optimal lineup would be the projected runs.

Bottom line: If Ryan Howard meets his 90% projection, the Phils will score a whoooooooooole lotta runs.

Posted by: pawnking at February 25, 2006 03:01 PM

This kind of statistical analysis is loads of fun, and provides insight into the game. But if I were an actual manager, I would have to balance the stats with the human element.

Players are real people, and their performance can be affected by things like emotion and comfort level in addition to talent and skill. Batting leadoff could have a negative impact on Giambi's production; ditto for Manny batting second.

I think this is also why the Jamesian concept of a "situational bullpen" doesn't seem to work in real life. A team is probably more comfortable if they have a closer, even if it doesn't make statistical sense.

It's probably also why old-fashioned "baseball men" like Tommy Lasorda, Sparky Anderson, and Jack McKeon (and Ozzie Guillen) can be effective, even if they completely ignore statistical analysis. Scott Podsednik batting leadoff undoubtedly cost the White Sox some runs -- but they somehow managed to win anyway. Sparky used to do at least a couple "dumb" things a game, but he's in the Hall of Fame anyway.

I repeat: I love statistical analysis, and it has its place in running a baseball team. But it's not the only thing to consider.

Posted by: John Walters at February 25, 2006 04:08 PM

Leaving aside questions about the accuracy of the underlying coefficients from Morong's regression analysis, there is a huge logical contradiction here: the 'leverage' assigned to each lineup spot assumes a traditional lineup. You can't then create a non-traditional lineup and keep those values the same! To take a simple example: the formula puts great weight on the OBP of the 9th place hitter. So the program puts Sheffield in the 9 hole. But since the #3 hitter is now Cano (!), Sheffield's OBP obviously has less value than it would in a normal lineup. (Not to mention that any lineup that gives extra ABs to Cano and Williams at Sheffield's expense cannot possibly be maximizing production). This is a fun toy, but it would need a lot of work before it could provide accurate estimates of lineup run production.

Posted by: Guy at February 25, 2006 09:34 PM

Guy is spot on here. There's no way you're going to get a good analysis by learning regression coefficients on a large data set and then applying them to a single lineup. To get a reasonable result, you need to calculate expected runs for each lineup (via e.g. sampling).

Posted by: Jason at February 26, 2006 05:13 PM

This lineup analysis is neat. Still, like any model, it has limitations. In this case, the impact of speed is ignored in the model. Quicker base runners are not preferred over slower ones, by assumption.

Another real-world problem is that nobody knows what a player's OBA and Slugging will be this year. Even part way through the year, there are always uncertainties.

Still, this is a fascinating model. No doubt fancier versions will be designed.

Posted by: David at February 26, 2006 11:47 PM

Hi. Love the lineup analysis, but have a problem with the supposed best lineup of all-time. How could Joe Morgan be taken over Rogers Hornsby? Hornsby had a much higher career On-Base Percentage (.434 to .392) and Slugging Percentage (.577 to .427) than Morgan, so the average runs scored per game would have to be higher with Hornsby in the lineup. And that doesn't take other HOF 2nd basemen like Charlie Gehringer into consideration. So, why Morgan? Thanks.

Posted by: David Wolfe at February 20, 2008 02:59 PM