Baseball Musings
November 06, 2003
Sabermetric Inroads

Robert Tagorda sends a link to this article on using game theory to determine the value of baseball players:


Using reams of historical data, Lonergan and Polak can measure the probability of a team's chance of winning a game, given any set of circumstances. With each at-bat, a player can help or hurt his team's chances.

Here's how their method works: Let's say the home team is down by two runs in the bottom of the fifth inning, with no outs and a runner on second base. At that moment, the home team has a 39% chance (or 0.39 probability) that it will win. If the batter grounds out, and the runner at second fails to advance, the team's chance of winning falls to 33%. The difference between the two, -0.06, is assigned to the batter who just grounded out.

DIFFERENT ANGLE. Polak and Lonergan add up all of a player's outcomes for the season. Doing so yields the exact number of wins -- or losses -- a player contributed to his team, relative to an average player. For example, New York Yankees slugger Jason Giambi contributed 4.9 wins (unadjusted for special circumstances -- see footnote in table below, which shows who would win 2003's MVP and Cy Young awards, according to this method). On the other end of the spectrum, maligned Yankees pitcher Jeff Weaver contributed -2.5 wins (also unadjusted). If all of the players' net win contributions are added up, the result equals the number of games over .500 the team finished in the regular season (the 2003 Yankees finished 101-61, 20 games above .500).

The method has some distinct differences from other quantitative analyses, such as the "sabermetrics" (named after SABR, the Society for American Baseball Research) method popularized by baseball historian Bill James and others. Sabermetrics also seeks to assign portions of wins to different players, but it relies on selective weighting of certain baseball statistics, such as a hitter's on-base percentage or a pitcher's earned-run average, and uses regression analysis to examine those stats' effect on wins or runs scored.

Lonergan and Polak claim that their method, which doesn't rely on traditional statistics, eliminates a step -- going directly to the measurement of game outcomes.
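
To make the arithmetic in the quoted example concrete, here is a minimal Python sketch of the bookkeeping. It is not Lonergan and Polak's code. The function names (`credit`, `season_totals`, `team_net_wins`) are hypothetical, and the only numbers used are the 0.39 and 0.33 probabilities and the 101-61 record from the article. The last function assumes every game starts at a 0.50 win probability and ends at 1.0 or 0.0, which is one way to see why the credits would sum to 20 for the 2003 Yankees.

```python
from collections import defaultdict

# Win probabilities quoted in the article: home team down two runs,
# bottom of the fifth, runner on second.
WP_BEFORE = 0.39  # before the at-bat, no outs
WP_AFTER = 0.33   # after the groundout, runner still on second


def credit(wp_before, wp_after):
    """Change in the team's win probability caused by one plate appearance."""
    return wp_after - wp_before


def season_totals(events):
    """Sum credits per player; events are (player, wp_before, wp_after) tuples."""
    totals = defaultdict(float)
    for player, before, after in events:
        totals[player] += credit(before, after)
    return dict(totals)


def team_net_wins(wins, losses):
    """Total credit for a team, ASSUMING each game starts at 0.50.

    A win moves the team from 0.50 to 1.00 (+0.5) and a loss from
    0.50 to 0.00 (-0.5), so the season total is 0.5 * (wins - losses).
    """
    return 0.5 * (wins - losses)


groundout = credit(WP_BEFORE, WP_AFTER)
print(round(groundout, 2))     # -0.06, charged to the batter
print(team_net_wins(101, 61))  # 20.0, matching the article's figure
```

Under that assumption, "games over .500" in the article means wins above an 81-81 record, not wins minus losses.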


The article goes on to point out that this method does not currently account for fielding, which makes the MVP table at the bottom of the article very suspect. All of the value calculated from the change in situations is attributed to either the batter or the pitcher, and we know that's not true. Roy Halladay is the highest-rated player in the AL by this method, yet some of those wins have to be attributed to the Blue Jays' defense.

There is another problem as well. The researchers used "reams of historical data" to figure out the probability of winning in a particular situation. But these probabilities are not constant over time. The probability of coming back from three runs down in the fifth was much lower in the dead-ball sixties than in the swinging nineties. So, as with linear weights, no single formula works correctly every year; you have to wait for the season to finish and then make adjustments.
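
One way to picture the adjustment is to estimate the win probability for each game situation separately by era instead of pooling all of history. The sketch below is only an illustration of that idea, not the researchers' procedure. The function `win_prob_by_era`, the record layout, and the sample counts are all invented for the example and are not real baseball data.

```python
from collections import defaultdict


def win_prob_by_era(records):
    """Empirical win fraction per (era, state).

    records: iterable of (era, state, team_won) tuples, where state is
    any hashable description of the game situation.
    """
    counts = defaultdict(lambda: [0, 0])  # (era, state) -> [wins, games]
    for era, state, team_won in records:
        tally = counts[(era, state)]
        tally[0] += 1 if team_won else 0
        tally[1] += 1
    return {key: wins / games for key, (wins, games) in counts.items()}


# Invented counts for illustration only: the same situation in two eras.
state = ("down 3", "5th inning")
records = (
    [("1960s", state, True)] * 1 + [("1960s", state, False)] * 9
    + [("1990s", state, True)] * 2 + [("1990s", state, False)] * 8
)

table = win_prob_by_era(records)
print(table[("1960s", state)])  # 0.1 with these made-up counts
print(table[("1990s", state)])  # 0.2 with these made-up counts
```

With a table keyed this way, the same groundout would cost a batter a different amount depending on the run environment he played in.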

Still, with the proper refinements it should be a good system, and I'm glad to see another club is testing the waters and using a new model to evaluate ballplayers. We'll have to look for a big improvement from a low-payroll NL club next year.


Posted by David Pinto at 08:51 AM | Statistics