Baseball Musings
February 13, 2009
Better Predictor

Last spring I used the Lineup Analysis Tool to predict how many runs per game teams would score during the season. I entered Marcel the Monkey forecasts for OBA and Slug for each team's likely starting lineup. I promised to revisit the data, and here's how the predictions compare to actual team runs per game:

Runs per game, 2008
Team  Predicted  Actual
NYY   5.89       4.87
BOS   5.78       5.22
CLE   5.60       4.97
DET   5.58       5.07
TB    5.45       4.78
CHW   5.29       4.98
TEX   5.28       5.56
LAA   5.20       4.72
ATL   5.18       4.65
PHI   5.15       4.93
OAK   5.14       4.01
BAL   5.12       4.86
TOR   5.10       4.41
COL   5.10       4.61
NYM   4.98       4.93
MIL   4.97       4.63
CHC   4.92       5.31
MIN   4.87       5.09
SEA   4.87       4.14
HOU   4.84       4.42
LAD   4.84       4.32
KC    4.82       4.27
STL   4.76       4.81
FLA   4.64       4.78
CIN   4.60       4.35
SD    4.57       3.93
ARZ   4.51       4.44
PIT   4.48       4.54
WSH   4.44       3.98
SF    3.99       3.95

The correlation is 0.61, which is a bit better than random but not great. Oakland was the biggest outlier on the downside, scoring 1.13 fewer runs per game than predicted, while the Cubs beat their prediction by the widest margin, 0.39 runs per game.
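For anyone who wants to check the 0.61 figure, here is a minimal Python sketch that computes the Pearson correlation from the table and finds the largest miss in each direction. The numbers are retyped from the table; the names `DATA`, `pearson`, and `misses` are illustrative only and are not part of the Lineup Analysis Tool.

```python
import math

# (team, predicted R/G, actual R/G), retyped from the 2008 table
DATA = [
    ("NYY", 5.89, 4.87), ("BOS", 5.78, 5.22), ("CLE", 5.60, 4.97),
    ("DET", 5.58, 5.07), ("TB", 5.45, 4.78), ("CHW", 5.29, 4.98),
    ("TEX", 5.28, 5.56), ("LAA", 5.20, 4.72), ("ATL", 5.18, 4.65),
    ("PHI", 5.15, 4.93), ("OAK", 5.14, 4.01), ("BAL", 5.12, 4.86),
    ("TOR", 5.10, 4.41), ("COL", 5.10, 4.61), ("NYM", 4.98, 4.93),
    ("MIL", 4.97, 4.63), ("CHC", 4.92, 5.31), ("MIN", 4.87, 5.09),
    ("SEA", 4.87, 4.14), ("HOU", 4.84, 4.42), ("LAD", 4.84, 4.32),
    ("KC", 4.82, 4.27), ("STL", 4.76, 4.81), ("FLA", 4.64, 4.78),
    ("CIN", 4.60, 4.35), ("SD", 4.57, 3.93), ("ARZ", 4.51, 4.44),
    ("PIT", 4.48, 4.54), ("WSH", 4.44, 3.98), ("SF", 3.99, 3.95),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


predicted = [p for _, p, _ in DATA]
actual = [a for _, _, a in DATA]
r = pearson(predicted, actual)

# Miss = actual minus predicted; negative means the team fell short.
misses = {team: a - p for team, p, a in DATA}
worst = min(misses, key=misses.get)
best = max(misses, key=misses.get)

print(f"r = {r:.2f}")
print(f"biggest shortfall: {worst} ({misses[worst]:+.2f} R/G)")
print(f"biggest overshoot: {best} ({misses[best]:+.2f} R/G)")
```

Computing the sums by hand keeps the sketch free of outside packages; on Python 3.10 or later, `statistics.correlation(predicted, actual)` should give the same r.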

Here's the data in graphical form.

[Graph: predicted vs. actual runs per game, 2008 (RunPredictionComparison.jpg)]

When I run the numbers this year, I'll include the result according to the regression equation. That should give us a somewhat more realistic view of a team's offense, since the projected starting lineup never plays every inning of every game.
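Since the projected starting lineup never plays every inning, the predictions would be expected to run high. The short Python sketch below measures how high, on average, using the table's numbers. It is only a back-of-the-envelope check under that assumption, not the tool's regression equation; the list names are illustrative.

```python
from statistics import mean

# Predicted and actual runs per game, 2008, in the same team order as the table
PREDICTED = [
    5.89, 5.78, 5.60, 5.58, 5.45, 5.29, 5.28, 5.20, 5.18, 5.15,
    5.14, 5.12, 5.10, 5.10, 4.98, 4.97, 4.92, 4.87, 4.87, 4.84,
    4.84, 4.82, 4.76, 4.64, 4.60, 4.57, 4.51, 4.48, 4.44, 3.99,
]
ACTUAL = [
    4.87, 5.22, 4.97, 5.07, 4.78, 4.98, 5.56, 4.72, 4.65, 4.93,
    4.01, 4.86, 4.41, 4.61, 4.93, 4.63, 5.31, 5.09, 4.14, 4.42,
    4.32, 4.27, 4.81, 4.78, 4.35, 3.93, 4.44, 4.54, 3.98, 3.95,
]

# Average amount by which the predictions exceeded actual scoring
gap = mean(PREDICTED) - mean(ACTUAL)

# Number of teams that scored fewer runs per game than predicted
short = sum(1 for p, a in zip(PREDICTED, ACTUAL) if a < p)

print(f"average predicted R/G:   {mean(PREDICTED):.2f}")
print(f"average actual R/G:      {mean(ACTUAL):.2f}")
print(f"average over-prediction: {gap:.2f}")
print(f"teams below prediction:  {short} of {len(PREDICTED)}")
```

On these numbers the predictions run about 0.35 runs per game high on average, and 24 of the 30 teams fell short of their projection.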


Posted by David Pinto at 08:51 AM | Predictions
Comments

I'm no stats wiz, but from my understanding a random relationship results in a 0.00 correlation, which is far away from a 0.61 correlation.

A 0.61 correlation, from what I understand, means that the projection explained 61% of the movement, which is actually a large percent, just not a statistically significant relationship.

But a stats wiz can correct me if I am wrong on any of this.

Posted by: obsessivegiantscompulsive at February 13, 2009 10:31 AM

You're close... the r = .61, but to imply a percent influence, you need to square it. So r^2 = .3721, or 37% of the variance is explained by the Marcel/lineup tool projection.

There are really three sources of error here: players not performing as projected, the lineup tool not accurately modeling a lineup, and players' playing time differing from what was projected. I think the third one is certainly out of the hands of the projection-makers, so it's too bad we can't factor it out somehow.

Posted by: Mike at February 13, 2009 11:25 AM
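Mike's point about squaring r can be checked against the table. For a straight-line least-squares fit of actual on predicted runs per game, the share of variance explained, 1 - SS_res/SS_tot, equals r squared. The Python sketch below fits that line only to illustrate the identity; it is an assumption-free textbook fit, not the Lineup Analysis Tool's own regression equation, and the variable names are illustrative.

```python
# Predicted (x) and actual (y) runs per game, 2008, same team order as the table
PREDICTED = [
    5.89, 5.78, 5.60, 5.58, 5.45, 5.29, 5.28, 5.20, 5.18, 5.15,
    5.14, 5.12, 5.10, 5.10, 4.98, 4.97, 4.92, 4.87, 4.87, 4.84,
    4.84, 4.82, 4.76, 4.64, 4.60, 4.57, 4.51, 4.48, 4.44, 3.99,
]
ACTUAL = [
    4.87, 5.22, 4.97, 5.07, 4.78, 4.98, 5.56, 4.72, 4.65, 4.93,
    4.01, 4.86, 4.41, 4.61, 4.93, 4.63, 5.31, 5.09, 4.14, 4.42,
    4.32, 4.27, 4.81, 4.78, 4.35, 3.93, 4.44, 4.54, 3.98, 3.95,
]

n = len(PREDICTED)
mx = sum(PREDICTED) / n
my = sum(ACTUAL) / n

# Centered sums of squares and cross-products
sxx = sum((x - mx) ** 2 for x in PREDICTED)
syy = sum((y - my) ** 2 for y in ACTUAL)  # SS_tot for actual R/G
sxy = sum((x - mx) * (y - my) for x, y in zip(PREDICTED, ACTUAL))

r = sxy / (sxx * syy) ** 0.5

# Ordinary least-squares line: actual ~ intercept + slope * predicted
slope = sxy / sxx
intercept = my - slope * mx
ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(PREDICTED, ACTUAL))

# Share of the variance in actual R/G explained by the fitted line
r_squared = 1 - ss_res / syy

print(f"r = {r:.2f}, r^2 = {r * r:.3f}, 1 - SS_res/SS_tot = {r_squared:.3f}")
```

With the unrounded correlation (about 0.6145), r squared comes out near .378, a shade above the .3721 obtained by squaring the rounded 0.61.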

I've been thinking about trying this sort of thing. I wondered, though, whether it might be worth just straight-up using Marcel's projections, which include a playing-time piece: i.e., he (sort of) takes injuries and days off into account. The obvious problem is that teams that trade or sign free agents might show as having 1,200 at-bats from first base or something.

Has anyone tried this? Just add up all the runs that the monkey predicts? I'm curious how he did. (I'm kinda new to this stuff and not sure where I'd look for prior attempts.)

Posted by: Scooter at February 13, 2009 02:02 PM

It might be best to check the accuracy of the Lineup Analysis Tool by running it on the actual 2008 data. I once used it to comment on another site and Tango blasted me for using it. But I like it because it's simple to use, and I wasn't using it to predict actual runs but to compare lineups.

B-ref and other sites show how each lineup position did for the season. If this checks out, then the variations may be due to players not meeting projections, or to teams being more or less efficient at scoring runs. The Red Sox, for example, always seem to score fewer runs than the models predict they should based on actual stats.

If I have time I will do it myself, but I don't have much at the moment.

Posted by: Paul at February 15, 2009 07:35 PM