Baseball Musings
March 10, 2009
MIT Conference
Permalink

Sal Baxamusa summarizes the MIT Sloan School of Business Sports Analytics Conference. I get a mention:

Baseball Musings blogger David Pinto said that the effectiveness of the shift at neutralizing lefthanders suggested that more hitters should attempt to dunk or bunt the ball the other way. Rehman was not optimistic, saying that for players it was a buy-in and ego issue that may not be worth broaching. He made a good point: how do you convince a slugger to dunk the ball the other way rather than rip a 400-foot bomb to right?

My main point was that the free base was worth a lot. In other words, why would a hitter trade a near-1.000 OBA just to be macho and pull the ball? I understood Rehman and Purpura's point that it's difficult to tell a hitter to go the other way. Successful baseball, however, is all about making adjustments, so the players should want to adjust to the shift.

It's like the argument about walking Barry Bonds. It's a bad idea to walk Barry Bonds; you're increasing his team's offense, not decreasing it. The same goes for hitting against the shift. It's a free time on base. It helps, it doesn't hurt, and it's even better than a walk, because anyone on first is going to end up on third with no one there to cover the base.

You can find my posts on the conference starting here and working backward.

Posted by StatsGuru at 10:38 AM | Comments (5) | TrackBack (0)
March 07, 2009
MIT Baseball Analytics
Permalink

I was not able to blog while I was on the panel, but I thought it went well. Tim Purpura from minor league baseball (former GM of the Astros) and Shiraz Rehman of the Diamondbacks represented management. Christina Kahrl of Baseball Prospectus and I represented the outsiders. John Dewan of BIS moderated and it was a very lively discussion.

Posted by StatsGuru at 01:29 PM | Comments (0) | TrackBack (0)
MIT Hot Hand
Permalink

John Huizinga is now speaking on the hot hand in basketball. It's a good mathematical presentation with probabilities and p-values. Very interesting so far.

Update: One of the interesting things coming out of this is that John and his colleague created an NBA database based on chances, where a chance is essentially a continuous possession of the ball. Possessions can have multiple chances as balls go out of bounds or there are non-shooting fouls and offensive rebounds. It's a great way of modeling the game, where events occur within chances.
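The possession-and-chance structure described above can be sketched as a small data model. This is only an illustration of the idea as summarized here, not the layout of Huizinga's actual database; the class names, field names, and event labels are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Chance:
    """One continuous stretch with the ball (hypothetical structure)."""
    events: List[str] = field(default_factory=list)  # e.g. "missed shot"
    ended_by: str = ""  # what closed the chance, e.g. "offensive rebound"


@dataclass
class Possession:
    """A possession holds one or more chances for the same team."""
    team: str
    chances: List[Chance] = field(default_factory=list)

    def event_count(self) -> int:
        # Events live inside chances, so count them chance by chance.
        return sum(len(c.events) for c in self.chances)


# Illustrative possession: a miss, an offensive rebound, then a made shot.
possession = Possession(
    team="HOU",
    chances=[
        Chance(events=["missed shot"], ended_by="offensive rebound"),
        Chance(events=["made shot"], ended_by="made basket"),
    ],
)
```

Keeping events inside chances, rather than directly on the possession, is what lets a single possession record an offensive rebound or an out-of-bounds restart as the start of a new chance.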

Update: Players are less likely to make a shot after making one. Also, they rush the next shot. The player is also more likely to take the team's next shot, especially a player who handles the ball. It's not defense that clamps down, it's the offense not taking as good a shot.

Update: No evidence of hot hand, but NBA players play like they think there is.

Posted by StatsGuru at 01:03 PM | Comments (0) | TrackBack (0)
MIT Globalization Panel
Permalink

The next panel I'm attending is titled Globalization of Sports. Jonathan Kraft of the New England Patriots, Tim Romani of the ICON Venture Group, David Baxter of Adidas and John Huizinga of the U. of Chicago are on the panel, moderated by Mike Gorman of the Boston Celtics. I had a chance to speak with Huizinga last night at the Celtics game. He's Yao Ming's agent, and the story of how that came to be was quite interesting. John is not an agent but an economics professor, and one of Ming's cousins was a student at UC.

Photo: Kraft, Romani, Baxter, Huizinga, Gorman

Update: Gorman starts with a quote from Davos that sports is one of the top ten global industries, and the only one that has truly achieved global dominance.

Update: Kraft says the Patriots are the first team with a Chinese web site, and now have 20 fan clubs in the country.

Update: Romani notes that with teams having built so many new stadiums and arenas in the last 20 years, there's not much left to do in the US, so their group is looking to international venues to build.

Update: David Baxter notes that Adidas sells more NFL jerseys around the world than jerseys from any other sport.

Update: Huizinga notes the NBA got off to a bad start in China because the NBA didn't understand the world was different outside of the US. The NBA wanted to start a reading program. The Chinese found that insulting, and were also trying to encourage students to study less and exercise more.

Update: Is globalization just another term for westernization? The panelists think that might be true in the short term, but eventually the east will send ideas west. Gorman is asking very good questions. He's done his homework.

Update: Gorman asks what is the Black Swan in terms of globalization. It stumps the panel for a minute, but Kraft thinks a terrorist attack at a facility would change the landscape. Huizinga agrees but adds corruption as a second problem.

Update: A question comes from the audience on the ability to be a fan of any team anywhere. Romani notes that they start designing venues from the camera positions.

Update: Romani notes right now there is money available for infrastructure building from the government, but not for the actual construction of stadiums.

Update: Gorman asks, are we really globalizing sports, or just creating minor leagues around the world? I hope someday players are willing to play anywhere, because the majors exist around the world.

Posted by StatsGuru at 10:12 AM | Comments (0) | TrackBack (0)
MIT-Sloan Sports Analytics Conference
Permalink

The first panel of the day is starting, Evolution of the Fan Experience. Daryl Morey of the Houston Rockets is moderating Mark Donovan of the Philadelphia Eagles, Jeff Van Gundy and Bill Simmons of ESPN, Sean O'Brien of EA Sports and Brian Burke of the Toronto Maple Leafs.

Photo: Donovan, Van Gundy, Simmons, O'Brien, Burke, Morey

Update: Brian Burke says he's in the entertainment business, not the hockey business. His claim is that over 30 years a team is going to be .500, so you can't market around winning. He wants stars and action, likes that his team hits and fights.

Update: Bill Simmons believes ticket prices are a problem. With better TV and interactivity, why pay the money when you can stay home and have just as good an experience?

Update: Van Gundy notices empty seats at NBA games. People who own tickets might not show up because they don't want to pay the extra $150 for parking, food and extras.

Update: Mark Donovan talks about points of contact with fans. The Eagles welcome fans when they arrive at the park and high-five them when they're leaving. The team tries to serve its fans in real time in the park. They want the fans to be storytellers about the team.

Update: Simmons is showing he doesn't understand basic supply and demand. He doesn't understand why tickets cost so much.

Burke points out that a lot of season tickets are being split among people, so everyone can afford them.

Update: Burke makes the point that someone from ESPN shouldn't be lecturing about what people are charged. Good laugh from the crowd.

Update: Brian Burke brings in focus groups to find out what music the fans want to hear at the game. He doesn't understand why O-bla-di is so popular.

Update: Van Gundy wants the NBA to develop more rivalries. He thinks opponents are too friendly. They'll have an altercation in the game, then hug afterward. He'd like to see more rabid fans; he thinks they are too passive.

Update: Simmons and Burke agree that the NHL season should be shorter. Simmons to keep demand high (he learns quickly), Burke because the season is too taxing on the players.

Update: Teams are using text messages at the game to build databases.

Update: Donovan says he wants replays at the game to be better than what you see on TV.

Update: Burke has now dissed the Beatles and Opera.

Posted by StatsGuru at 08:49 AM | Comments (0) | TrackBack (0)
March 04, 2009
Outfield Distribution
Permalink

Mike Fast explores DIPS at The Hardball Times. He includes a graph of the distribution of fly balls and line drives to the outfield and notes they peak at the three outfield positions:

Oddly enough, we see both fly balls and line drives peaking around the typical positions of the three outfielders and dipping in the gaps and along the lines. I can't think of any reason for balls in the air to preferentially group around those three vectors, so I assume that must be a scoring bias. Accurately marking the location of a ball fielded in the middle of a vast outfield expanse free of landmarks is a challenge, and the MLB stringer may tend to mark the fielding location closer toward the typical fielding position of whichever outfielder fielded the ball. I don't know whether BIS and STATS data suffer from a similar bias.

As I wrote him, I don't think it's a bias, I think it's real. I noticed the same pattern when I worked at STATS, Inc. STATS uses up to three reporter accounts to determine placement of the ball, so their data shouldn't be biased. In addition, the primary reporter is at the park, so he doesn't get fooled by TV angles. The only explanation I have is that those are good vectors for getting line drives by infielders, so to the extent a player can direct his hits, those are good places for the ball to go.

Please donate to the Baseball Musings Pledge Drive.

Posted by StatsGuru at 08:00 PM | Comments (2) | TrackBack (0)
February 25, 2009
Panelist
Permalink

I'll be covering the MIT Sloan Sports Analytics Conference for the third year in a row on Saturday, March seventh. I've also been asked to participate in the Baseball Analytics Panel, with a group of true heavy hitters. This may be your only chance to see Mark Cuban and me in the same conference!

Posted by StatsGuru at 11:27 PM | Comments (2) | TrackBack (0)
February 16, 2009
Selfish Stats
Permalink

I'm reading Michael Lewis's latest column in the New York Times, and this paragraph fascinated me. He's speaking with Daryl Morey, a former STATS, Inc. employee who went on to the MIT Sloan Business School and became GM of the Houston Rockets:

When I ask Morey if he can think of any basketball statistic that can't benefit a player at the expense of his team, he has to think hard. "Offensive rebounding," he says, then reverses himself. "But even that can be counterproductive to the team if your job is to get back on defense." It turns out there is no statistic that a basketball player accumulates that cannot be amassed selfishly. "We think about this deeply whenever we're talking about contractual incentives," he says. "We don't want to incent a guy to do things that hurt the team" -- and the amazing thing about basketball is how easy this is to do. "They all maximize what they think they're being paid for," he says. He laughs. "It's a tough environment for a player now because you have a lot of teams starting to think differently. They've got to rethink how they're getting paid."

There is such a stat in baseball: runs. For an individual to score a lot of runs, he needs to be skilled in a number of baseball abilities: getting on base, running and hitting for power. If a player wants to drive in 100 runs in a season, he can do that by delivering in about 80 plate appearances and doing little the rest of the time. A run scorer needs to constantly set up teammates, take advantage of stolen base opportunities, and be able to score on long hits or hit for power himself. It's another reason runs scored is a bit of an underrated stat.

Posted by StatsGuru at 12:46 PM | Comments (9) | TrackBack (0)
January 18, 2009
Ignoring Intangibles
Permalink

Lone Star Ball links to a story summing up the Michael Young situation in Texas. I love the description of seamheads (emphasis added):

One side held that the Rangers owed Young more respect than to simply order him to move. The other said Young is a highly compensated employee who needs to simply do what he's told. That group was bolstered by the "seamheads" -- ardent fans of baseball statistics -- who judge things almost strictly by the numbers, and thus tend to disparage Young because of his lack of range at short, and ignore the intangibles he brings to the organization.

Maybe seamheads should take a course in quantifying things that aren't capable of being appraised at an actual or approximate value. Here's the syllabus:

  1. Guessing.
  2. Defending the guess with anecdotes.
  3. Adding smugness.

Yeah, that's the ticket.

Posted by StatsGuru at 11:04 AM | Comments (13) | TrackBack (0)
January 16, 2009
Perception and Reality
Permalink

Recently, Dave Studeman introduced the Drama Index (DI) to measure the importance of a game. He now combines that with Win Probability Added (WPA) to create a new stat, Postseason Probability Added, PPA. For every game, Studes multiplies together a player's WPA and his DI, then sums them over a season. Hitters and pitchers who contribute the most to their team winning (high game WPA) in dramatic games (high game DI) do very well over the season.
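The calculation described above, multiplying a player's WPA by the game's DI and summing over the season, can be sketched in a few lines. This is a minimal illustration of that description only; the class name, field names, and sample numbers are made up here and are not taken from Studeman's work.

```python
from dataclasses import dataclass


@dataclass
class GameLine:
    """One player's line for a single game (hypothetical field names)."""
    wpa: float  # Win Probability Added by the player in that game
    di: float   # Drama Index of that game


def postseason_probability_added(games):
    """Sum WPA * DI over the supplied games."""
    return sum(g.wpa * g.di for g in games)


# Made-up numbers, purely to show how drama weights the same contribution.
season = [
    GameLine(wpa=0.30, di=2.0),   # strong game when the drama is high
    GameLine(wpa=0.30, di=0.5),   # identical game when little is at stake
    GameLine(wpa=-0.10, di=1.0),  # slightly negative game
]
ppa = postseason_probability_added(season)
```

The first two games carry the same WPA, but the high-drama one contributes four times as much to the total, which is the effect the stat is built to capture.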

The point of this stat should not be to determine post-season awards, but to explain post-season voting. The PPA of Ryan Howard and Albert Pujols shows why MVP voters had them so close, and why we liked Denard Span so much.

It's interesting that the perception of the writers about Howard performing well in big games was correct. Whether it matters is another story. Dave isn't trying to sell this as a new way to judge who should be MVP. Stats like this are like RBI and wins: as much team stats as individual stats. If Howard hit well in games with a lower drama index, then there's likely a lot less drama down the stretch. If Albert Pujols plays on a better team in 2008, there might be more drama in his season. If a great player is on the roster of a great team, there are fewer chances for him to win games in dramatic fashion. These stats (WPA, PPA) explain perception more than they judge ability, in my opinion.

Give the post a read, however, because the results are quite interesting.

Posted by StatsGuru at 05:44 PM | Comments (0) | TrackBack (0)
January 13, 2009
Power Positions
Permalink

Cyril Morong graphs power at the eight defensive positions over time by league. The most interesting result is that power is much more important at shortstop in the AL than the NL, and while the NL concentrates power in the corner outfield positions, in the AL the three positions are converging.

Posted by StatsGuru at 08:50 AM | Comments (1) | TrackBack (0)
January 02, 2009
Dollar Values
Permalink

Dave Cameron explains how they arrive at dollar values for Win Values.

Posted by StatsGuru at 12:54 PM | Comments (0) | TrackBack (0)
December 31, 2008
New Data
Permalink

The Lahman Database update for 2008 is now available.

Posted by StatsGuru at 10:32 AM | Comments (0) | TrackBack (0)
December 30, 2008
New Values
Permalink

Value Wins appear to be the new rage around the baseball blogosphere. Right now, they are just for batters, but it looks like the Yankees got a good deal on Teixeira, the Red Sox got an even better deal on Pedroia, and A-Rod earned his salary in 2008.

Posted by StatsGuru at 04:56 PM | Comments (4) | TrackBack (0)
December 29, 2008
Measuring Drama
Permalink

Studes looks at the drama of games to show why we believe late season games are more important than early season games. His Washington Nationals graph shows great evidence why no one was watching in 2008.

Posted by StatsGuru at 07:47 AM | Comments (7) | TrackBack (0)
November 05, 2008
Sabermetrics Wins Again
Permalink

Congratulations to Barack Obama on his victory over John McCain. Congratulations, too, to Nate Silver, who called the election extremely well. It's another victory for sabermetrics, or at least the political equivalent.

After the 1990 season, STATS, Inc. published their annual Bill James Baseball Handbook. For the first time, that book contained Bill's projections for batters for the 1991 season. When Peter Gammons reviewed the book, one of the things he gleaned from the projections was that Bill James predicted that Jeff Bagwell was going to win the NL batting title.

James did not make that specific prediction in the book. Bagwell was traded to the Astros in September for Larry Andersen and it wasn't clear that Jeff was going to play, given the Astros had Ken Caminiti at third base. His BA projection, however, was the highest of any National League player, so Gammons was in a sense right.

Bill later told me that he felt his whole system was on the line that year. If Bagwell failed, no one would trust it again. The Astros moved Bagwell to first base, however, and while he didn't win the batting title, he did hit .294 and win Rookie of the Year. James was vindicated, and the system survived.

Nate, I'm sure, faced the same kind of scrutiny with 538. He nailed the result. It's another victory for statistical analysis, and I hope this spreads to more areas of political decision making as well.

Posted by StatsGuru at 07:52 AM | Comments (4) | TrackBack (0)
November 04, 2008
New Interface
Permalink

I haven't heard much positive or negative about the new interface to the Day by Day Database. Please let me know if you like it better or not.

Posted by StatsGuru at 12:29 PM | Comments (7) | TrackBack (0)
November 02, 2008
New Database Interface
Permalink

I developed a new interface to the Day by Day Database. Please give it a try. My goal was to get you to your destination with fewer clicks and a shorter wait. Let me know if you find any bugs, typos, or if you have any ideas for improvements.

Posted by StatsGuru at 05:29 PM | Comments (2) | TrackBack (0)
October 03, 2008
Placebo Protection
Permalink

More evidence that good hitters don't protect the batters in front of them.

Posted by StatsGuru at 08:41 AM | Comments (0) | TrackBack (0)
October 01, 2008
Final Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:48 AM | Comments (0) | TrackBack (0)
September 30, 2008
Penultimate Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 04:35 AM | Comments (0) | TrackBack (0)
September 29, 2008
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:09 AM | Comments (0) | TrackBack (0)
September 28, 2008
Batting Race Update
Permalink

Dustin Pedroia beats out an infield hit to go 2 for 2 and raise his BA to .327. Joe Mauer is 0 for 3 so far to lower his BA to .328. Milton Bradley did not bat in the first for Texas.

Posted by StatsGuru at 03:46 PM | Comments (0) | TrackBack (0)
AL Batting Race
Permalink
Joe Mauer
Photo: Icon SMI

The AL batting race brings two more games into the interesting category on this last day of the season. Joe Mauer leads with a .330 BA (.32954), a five point lead over Dustin Pedroia and a six point lead over Milton Bradley. Mauer can't rest on his laurels today as the Twins are battling for an AL Central title. If Mauer goes 0 for 3 with a walk, he ends up at .328 (.32768, actually). A four for four day by Bradley brings him to .33012; three for four brings him to .32771, which in that case would give him the title. Pedroia plays a doubleheader (weather permitting). A six for eight day by Dustin puts him at .33029, which would beat a four for four day by Bradley.

Picking up two hits appears to be the key for Mauer. Even a two for five day raises his average to .33020, and will make it very difficult for either of the players chasing him to catch the former batting champ. Bradley, however, is safe to lead the league in OBA.
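The scenarios above are simple batting-average arithmetic, and a short sketch makes it easy to try other outcomes. The hit and at-bat totals below are not stated in the post; they are assumptions inferred from the averages quoted (for example, 174 for 528 is .3295), so treat them as illustrative inputs.

```python
def batting_average(hits, at_bats):
    """Hits divided by at bats; walks do not count as at bats."""
    return hits / at_bats


def after_day(hits, at_bats, day_hits, day_at_bats):
    """Average after adding one day's hits and at bats to the season line."""
    return batting_average(hits + day_hits, at_bats + day_at_bats)


# ASSUMED totals entering the final day, inferred from the quoted averages.
MAUER = (174, 528)
BRADLEY = (133, 411)
PEDROIA = (211, 649)

scenarios = {
    "Mauer 0-for-3": after_day(*MAUER, 0, 3),
    "Mauer 2-for-5": after_day(*MAUER, 2, 5),
    "Bradley 4-for-4": after_day(*BRADLEY, 4, 4),
    "Bradley 3-for-4": after_day(*BRADLEY, 3, 4),
    "Pedroia 6-for-8": after_day(*PEDROIA, 6, 8),
}

for label, avg in scenarios.items():
    print(f"{label}: {avg:.5f}")
```

Under these assumed totals the results land within a rounding digit of the figures in the post, and swapping in a different final-day line for any of the three players is a one-line change.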

Posted by StatsGuru at 09:01 AM | Comments (0) | TrackBack (0)
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:27 AM | Comments (0) | TrackBack (0)
September 27, 2008
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:59 AM | Comments (0) | TrackBack (0)
September 26, 2008
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:22 AM | Comments (0) | TrackBack (0)
September 25, 2008
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:33 AM | Comments (0) | TrackBack (0)
September 24, 2008
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:49 AM | Comments (0) | TrackBack (0)
September 23, 2008
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:23 AM | Comments (0) | TrackBack (0)
September 22, 2008
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:31 AM | Comments (0) | TrackBack (0)
September 21, 2008
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:07 AM | Comments (0) | TrackBack (0)
September 20, 2008
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:24 AM | Comments (0) | TrackBack (0)
September 19, 2008
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:41 AM | Comments (0) | TrackBack (0)
September 18, 2008
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:02 AM | Comments (0) | TrackBack (0)
September 17, 2008
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:02 AM | Comments (0) | TrackBack (0)
September 16, 2008
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:31 AM | Comments (0) | TrackBack (0)
September 15, 2008
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:12 AM | Comments (0) | TrackBack (0)
September 14, 2008
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:15 AM | Comments (0) | TrackBack (0)
September 13, 2008
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:08 AM | Comments (0) | TrackBack (0)
September 12, 2008
Hit it With the Formula
Permalink

Earlier in the season one of my SportingNews.com columns developed a formula for predicting team saves from a team's winning percentage and average margin of victory (AMV). The Angels' AMV rose since that article, but is still low at 2.85. Interestingly, the team with the lowest AMV is another contender for best record in the league, the Tampa Bay Rays. The two teams show the variation in the results of the formula. The prediction for the two teams is for 55 saves on the year (55.1 for LAnaheim, 55.0 for Tampa Bay). The Rays' pace puts them at 53 saves for the season, the Angels' at 65. Both teams provided their closers with a chance at the record, but the Angels exceeding expectations and the relative health of K-Rod brought the record home to Anaheim.

Posted by StatsGuru at 08:46 AM | Comments (0) | TrackBack (0)
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:04 AM | Comments (0) | TrackBack (0)
September 10, 2008
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:52 AM | Comments (0) | TrackBack (0)
September 09, 2008
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:53 AM | Comments (0) | TrackBack (0)
September 07, 2008
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:56 AM | Comments (0) | TrackBack (0)
September 06, 2008
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:40 AM | Comments (0) | TrackBack (0)
September 05, 2008
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:58 AM | Comments (0) | TrackBack (0)
September 04, 2008
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:47 AM | Comments (0) | TrackBack (0)
September 03, 2008
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:36 AM | Comments (0) | TrackBack (0)
September 02, 2008
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:00 AM | Comments (0) | TrackBack (0)
September 01, 2008
Labor Day Update
Permalink

The Day by Day Database is up to date. Hope everyone enjoys the last holiday of the summer with your friends and family. My thoughts go out to the people of the Gulf Coast who are suffering through hurricane Gustav today.

Posted by StatsGuru at 08:06 AM | Comments (0) | TrackBack (0)
August 31, 2008
The Dreaded WoW
Permalink
Aaron Heilman
Photo: Icon SMI

With the score tied in the bottom of the ninth against Florida on Saturday, a lead-off walk by Aaron Heilman led to a runner on third with one out after a sacrifice and a wild pitch. At that point, Heilman issued two intentional walks to load the bases, then walked Josh Willingham to force in the winning run. It was a wild inning for Heilman, to say the least.

Heilman issued the 10th walk off walk in the majors this season. I define a walk off walk (WoW) as a walk or hit by pitch in the bottom of an inning (9th inning or later) in which the score is tied and the batter is credited with an RBI. This may turn out to be the biggest WoW season of the decade:

Season  WoW
2000    10
2001    8
2002    9
2003    9
2004    11
2005    9
2006    5
2007    9
2008    10

Can pitchers issue two more in the final month of the season?

There's really no excuse for the WoW. Even if a pitcher throws the ball right down the middle of the plate, he still has a decent chance of getting an out. I'd rather give my defense a chance than take the risk of walking in the game winning run.
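The WoW definition given above can be written as a simple predicate over a plate appearance. This is a sketch only: the record layout, field names, and result codes are hypothetical rather than the schema of any real play-by-play feed, and counting intentional walks as walks is an assumption.

```python
from dataclasses import dataclass


@dataclass
class PlateAppearance:
    """Hypothetical play-by-play record; all field names are illustrative."""
    inning: int
    bottom: bool      # True if the home team is batting
    score_diff: int   # batting team's runs minus fielding team's, before the play
    result: str       # e.g. "BB", "IBB", "HBP", "1B"
    rbi: int          # RBI credited to the batter on the play


# Assumption: intentional walks count as walks for this purpose.
WALK_RESULTS = {"BB", "IBB", "HBP"}


def is_walk_off_walk(pa: PlateAppearance) -> bool:
    """Walk or HBP, bottom of the 9th or later, score tied, batter gets an RBI."""
    return (
        pa.result in WALK_RESULTS
        and pa.bottom
        and pa.inning >= 9
        and pa.score_diff == 0
        and pa.rbi >= 1
    )


# A bases-loaded walk in a tied bottom of the ninth, like the one described.
heilman_walk = PlateAppearance(inning=9, bottom=True, score_diff=0, result="BB", rbi=1)
```

Counting a season's WoWs is then a matter of filtering that season's plate appearances with this function.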

Posted by StatsGuru at 11:00 AM | Comments (0) | TrackBack (0)
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:14 AM | Comments (0) | TrackBack (0)
August 30, 2008
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:27 AM | Comments (0) | TrackBack (0)
August 29, 2008
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:39 AM | Comments (0) | TrackBack (0)
August 28, 2008
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:13 AM | Comments (0) | TrackBack (0)
August 27, 2008
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:42 AM | Comments (0) | TrackBack (0)
August 26, 2008
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:57 AM | Comments (0) | TrackBack (0)
August 25, 2008
The Long and Short of It
Permalink

View from the Bleachers reprints some research from Bill James Online looking at the results of long and short at bats in terms of pitches seen. Players hit for a higher average and more power in short at bats, but get on base more in long at bats.

Posted by StatsGuru at 09:35 AM | Comments (4) | TrackBack (0)
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:13 AM | Comments (0) | TrackBack (0)
August 24, 2008
Albert Takes the Lead
Permalink

The Braves and Cardinals are playing in the sixth inning. Albert Pujols is 2 for 2 with a homer, Chipper Jones is 0 for 2, and Albert has passed Chipper in the batting race, .359 to .358. We'll see if that holds up by the end of the game.

Update: Pujols ends the game 2 for 2, Chipper 1 for 3 and the two are tied at .359. Chipper is at .359459...., while Albert is .35867, so Chipper is still leading by less than a point. The Cardinals win the game 6-3.

Posted by StatsGuru at 03:47 PM | Comments (0) | TrackBack (0)
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:48 AM | Comments (0) | TrackBack (0)
August 23, 2008
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:21 AM | Comments (0) | TrackBack (0)
August 22, 2008
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:09 AM | Comments (0) | TrackBack (0)
August 21, 2008
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:40 AM | Comments (0) | TrackBack (0)
August 20, 2008
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:35 AM | Comments (0) | TrackBack (0)
August 19, 2008
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:17 AM | Comments (0) | TrackBack (0)
August 18, 2008
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:42 AM | Comments (0) | TrackBack (0)
August 17, 2008
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:03 AM | Comments (0) | TrackBack (0)
August 16, 2008
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:48 AM | Comments (0) | TrackBack (0)
August 15, 2008
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:26 AM | Comments (0) | TrackBack (0)
August 14, 2008
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:54 AM | Comments (0) | TrackBack (0)
August 13, 2008
Runs and RBI
Permalink

Alex Rodriguez doubles and scores in the top of the first to put the Yankees up 1-0. It's the 1576th run of his career, and he ranks 47th in Major League history. Alex has scored two more runs than he's driven in, 1574. That ranks 38th. This leads me to ask the question, is it easier to score runs than drive them in? Is that why RBI in general are given more glory than runs scored?

Part of the reason for the higher rank in RBI is that there are more runs to go around. While (obviously) every run is recorded, not every run is an RBI. I believe there's more going on here, however.

The ability to drive in runs depends on power, and power tends to fade with age. The ability to score runs depends on the ability to get on base, which tends to fade more slowly. Hitters with good strike zone judgement can stay in the majors longer as they compensate for fewer hits with more walks. They don't drive in as many runs, but they're still on base to score.

A-Rod is a little over 700 runs from both the run and RBI records. He could very well follow in the footsteps of Ruth and Aaron and set both.

Posted by StatsGuru at 01:25 PM | Comments (4) | TrackBack (0)
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:00 AM | Comments (0) | TrackBack (0)
August 12, 2008
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:04 AM | Comments (0) | TrackBack (0)
August 11, 2008
Rainy Day and Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:32 AM | Comments (0) | TrackBack (0)
August 10, 2008
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:33 AM | Comments (0) | TrackBack (0)
August 09, 2008
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:45 AM | Comments (0) | TrackBack (0)
August 08, 2008
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:15 AM | Comments (0) | TrackBack (0)
August 07, 2008
Counting Mistakes
Permalink

Sixty Feet, Six Inches comes up with a new statistic for pitchers, mistakes per inning. In combination with K per inning, it seems to be a good measure of the success of pitchers.

Posted by StatsGuru at 09:33 PM | Comments (0) | TrackBack (0)
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:47 AM | Comments (0) | TrackBack (0)
August 06, 2008
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:58 AM | Comments (0) | TrackBack (0)
August 05, 2008
Tuesday Update
Permalink

The Day by Day Database is up to date. Verizon called and said someone would be here between 8 AM and 5 PM to fix my DSL line. Isn't it amazing how a high tech company like that can predict a visit with such accuracy? I'll believe it when I see it. The guy will show up and probably tell me it's another ten days before the service comes on line.

With luck, blogging will be back to normal today.

Posted by StatsGuru at 07:35 AM | Comments (1) | TrackBack (0)
August 04, 2008
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:42 AM | Comments (0) | TrackBack (0)
August 03, 2008
Friendly's Update
Permalink

The Day by Day Database is up to date as I start day five without Verizon DSL. Once again, the good people at Friendly's are supplying high-speed internet in exchange for breakfast.

Posted by StatsGuru at 08:54 AM | Comments (0) | TrackBack (0)
August 02, 2008
Saturday Pancakes and Sausage Update
Permalink

Once again, Verizon fails to fix my DSL service. Thanks to the good people at Friendly's, however, the Day by Day Database is up to date and I ate a nice breakfast. At some point I'll have time to rant about my Verizon experience, but for now I need to use the little high speed time I can find to update the site.

Posted by StatsGuru at 07:45 AM | Comments (1) | TrackBack (0)
August 01, 2008
Friendly's Update
Permalink

The Day by Day Database is up to date. Thanks once again to the good people at Friendly's for supplying free WiFi at their restaurants while I wait for Verizon to fix their screw-up.

Posted by StatsGuru at 07:48 AM | Comments (1) | TrackBack (0)
July 31, 2008
Thursday Update
Permalink

The Day by Day Database is up to date.

My DSL is still down, so I'm back blogging from Friendly's.

Posted by StatsGuru at 08:15 AM | Comments (0) | TrackBack (0)
July 30, 2008
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:51 AM | Comments (0) | TrackBack (0)
July 29, 2008
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:46 AM | Comments (0) | TrackBack (0)
July 28, 2008
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:12 AM | Comments (0) | TrackBack (0)
July 27, 2008
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:55 AM | Comments (0) | TrackBack (0)
July 26, 2008
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:40 AM | Comments (0) | TrackBack (0)
July 25, 2008
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:25 AM | Comments (0) | TrackBack (0)
July 24, 2008
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:39 AM | Comments (0) | TrackBack (0)
July 23, 2008
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:50 AM | Comments (0) | TrackBack (0)
July 22, 2008
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:05 AM | Comments (0) | TrackBack (0)
July 21, 2008
Vacation Update
Permalink

The Day by Day Database is up to date.

I'm away with my family on vacation this week, so blogging will be light.

Posted by StatsGuru at 08:15 AM | Comments (0) | TrackBack (0)
July 20, 2008
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:26 AM | Comments (0) | TrackBack (0)
July 19, 2008
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:26 AM | Comments (0) | TrackBack (0)
July 18, 2008
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:02 AM | Comments (0) | TrackBack (0)
July 14, 2008
All-Star Update
Permalink

The Day by Day Database is up to date through the All-Star break.

Posted by StatsGuru at 06:50 AM | Comments (0) | TrackBack (0)
July 13, 2008
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:26 AM | Comments (0) | TrackBack (0)
July 12, 2008
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:43 AM | Comments (0) | TrackBack (0)
July 11, 2008
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:14 AM | Comments (0) | TrackBack (0)
July 10, 2008
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:02 AM | Comments (0) | TrackBack (0)
July 09, 2008
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:11 AM | Comments (0) | TrackBack (0)
July 08, 2008
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:21 AM | Comments (0) | TrackBack (0)
July 07, 2008
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:20 AM | Comments (0) | TrackBack (0)
July 06, 2008
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:47 AM | Comments (0) | TrackBack (0)
July 05, 2008
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 09:13 AM | Comments (1) | TrackBack (0)
July 04, 2008
July 4th Update
Permalink

The Day by Day Database is up to date.

Happy Independence Day to all my US readers! I hope you enjoy the day with family and friends.

Posted by StatsGuru at 07:47 AM | Comments (2) | TrackBack (0)
July 03, 2008
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:50 AM | Comments (0) | TrackBack (0)
July 02, 2008
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:18 AM | Comments (0) | TrackBack (0)
July 01, 2008
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:44 AM | Comments (0) | TrackBack (0)
June 30, 2008
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:54 AM | Comments (1) | TrackBack (0)
June 29, 2008
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:09 AM | Comments (0) | TrackBack (0)
June 28, 2008
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:57 AM | Comments (0) | TrackBack (0)
June 27, 2008
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:59 AM | Comments (0) | TrackBack (0)
June 26, 2008
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:33 AM | Comments (0) | TrackBack (0)
June 25, 2008
Party Like it's 1999
Permalink

The Book Blog has Retrosheet's latest release. After two years of waiting, they are making available the 1999 season. This is great news. I'll be able to push the splits section of the Day by Day Database back to 1974. Also, they now have day by day batting and pitching lines going back to 1955, so we'll be able to extend that part of the database as well.

Now I just need to find the time.

Posted by StatsGuru at 12:22 PM | Comments (2) | TrackBack (0)
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:08 AM | Comments (0) | TrackBack (0)
June 24, 2008
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:55 AM | Comments (0) | TrackBack (0)
June 23, 2008
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:12 AM | Comments (0) | TrackBack (0)
June 22, 2008
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:31 AM | Comments (0) | TrackBack (0)
June 21, 2008
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:37 AM | Comments (0) | TrackBack (0)
June 20, 2008
First Day of Summer Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:56 AM | Comments (0) | TrackBack (0)
June 19, 2008
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:17 AM | Comments (0) | TrackBack (0)
June 18, 2008
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:19 AM | Comments (0) | TrackBack (0)
June 17, 2008
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:55 AM | Comments (0) | TrackBack (0)
June 16, 2008
VORP Drive
Permalink

J.C. Bradbury takes a shot at VORP:

My point isn't that VORP is an awful or useless stat. To the contrary, there is clearly useful information contained in it. And those who prefer to hold discussions based on this metric should continue to do so. But there is no need for someone who does not speak the language to learn the ins and outs of a new metric, as Sheinin suggests. I can talk about all its components without dropping the V-bomb. If you want to talk hitting, we can use OBP and SLG. Then you can bring in stolen bases and defense to capture other effects. For pitching, we can use strikeouts, walks, and homers. The big advantage of these is that I can have these conversations with people other than die-hard stat-heads. I can also explain the advantages of these metrics over traditional triple-crown stats, and that is a huge benefit.

I view VORP as an insider language, and by using it you can signal that you are an insider. It's like speaking Klingon at a Star Trek convention. I can signal to others who speak the language that I am one of you. But, the danger of VORP is that once you bring it up the discussion goes down the wrong path as the uninitiated have reason to feel they are being told they are not as smart as the person making the argument. It's like constantly bringing up the fact that you only listen to NPR or watch the BBC news at dinner parties. The response is likely going to be the same, "well fuck you too, you pretentious asshole!"

I understand where Bradbury is coming from on this. I have the same problem with UNIX. If you talk to real programmers, UNIX is the be all and end all of operating systems. They are correct. UNIX, however, requires learning an inside language. The command names are cryptic, because when the OS was written, memory was so scarce that they couldn't afford to have commands longer than two or three letters. So when I work in UNIX, I have to have a book next to me so I can look up how to copy a file from one place to another. My Python scripts run just as well on Windows as on UNIX.

I find VORP useful, as I do runs created and win shares and lots of other metrics. Most of the time, however, I can look at a player's BA/OBA/Slugging line and get a pretty good picture of that hitter's abilities.

Posted by StatsGuru at 12:02 PM | Comments (3) | TrackBack (0)
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:51 AM | Comments (0) | TrackBack (0)
June 15, 2008
Father's Day Update
Permalink

The Day by Day Database is up to date.

Happy Father's Day to all the dads, dads-to-be and granddads who love baseball. I hope your team wins today!

Posted by StatsGuru at 08:21 AM | Comments (0) | TrackBack (0)
June 14, 2008
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:47 AM | Comments (0) | TrackBack (0)
June 13, 2008
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:33 AM | Comments (0) | TrackBack (0)
June 12, 2008
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:48 AM | Comments (0) | TrackBack (0)
June 11, 2008
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:51 AM | Comments (0) | TrackBack (0)
June 10, 2008
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:36 AM | Comments (0) | TrackBack (0)
June 09, 2008
Monday Update
Permalink

The Day by Day Database is up to date. I'm extremely ill today so blogging will be very light.

Posted by StatsGuru at 11:28 AM | Comments (5) | TrackBack (0)
June 08, 2008
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:19 AM | Comments (0) | TrackBack (0)
June 07, 2008
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:05 AM | Comments (0) | TrackBack (0)
June 06, 2008
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:42 AM | Comments (0) | TrackBack (0)
June 05, 2008
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:38 AM | Comments (0) | TrackBack (0)
June 04, 2008
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:16 AM | Comments (0) | TrackBack (0)
June 03, 2008
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:57 AM | Comments (0) | TrackBack (0)
June 02, 2008
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:34 AM | Comments (0) | TrackBack (0)
June 01, 2008
StatsGuru
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:17 AM | Comments (0) | TrackBack (0)
May 31, 2008
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:55 AM | Comments (0) | TrackBack (0)
May 30, 2008
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:26 AM | Comments (0) | TrackBack (0)
May 29, 2008
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:28 AM | Comments (0) | TrackBack (0)
May 28, 2008
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:54 AM | Comments (0) | TrackBack (0)
May 27, 2008
Home Run Rates
Permalink

When we publish pitching rates, we tend to put everything in terms of nine innings: K per 9, BB per 9, ERA, etc. Home runs per nine, however, seems to be a tough one to grasp, probably because home runs are rare events. For example, how much better is a pitcher who allowed 1.2 home runs per 9 as opposed to a pitcher who allowed 1.3 home runs per 9? My suggestion is that we measure home runs per 200 innings. That is, how many would this pitcher allow over a full season? The 1.2 per nine pitcher would allow 26.7 home runs over 200 innings, while the 1.3 per nine pitcher would allow 28.9. Our experience would tell us that 40 home runs per 200 innings was a lot, and 10 would be very good. I'm interested in people's opinions on this.
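The conversion is simple enough to sketch in a few lines of Python. The function names are just for illustration, and the 200-inning season length is the assumption from the suggestion above.

```python
def hr_per_200(hr_per_9, season_innings=200):
    """Convert a home-runs-per-nine rate to home runs over a full season."""
    return hr_per_9 / 9 * season_innings


def hr_per_9(hr_per_season, season_innings=200):
    """Convert a per-season home run total back to a per-nine rate."""
    return hr_per_season * 9 / season_innings


for rate in (1.2, 1.3):
    print(f"{rate} HR/9 -> {hr_per_200(rate):.1f} HR per 200 IP")

# The "a lot" and "very good" marks, expressed in the per-nine scale.
for total in (40, 10):
    print(f"{total} HR per 200 IP -> {hr_per_9(total):.2f} HR/9")
```

Going the other way shows why the per-nine scale is hard to read: the whole range from very good to a lot is squeezed between 0.45 and 1.8.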

Posted by StatsGuru at 05:05 PM | Comments (16) | TrackBack (0)
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:00 AM | Comments (0) | TrackBack (0)
May 26, 2008
Memorial Day Update
Permalink

The Day by Day Database is up to date.

On this Memorial Day I'd like to thank all our veterans for their service and remember all those who gave their lives for their country.

Posted by StatsGuru at 07:24 AM | Comments (0) | TrackBack (0)
May 25, 2008
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:06 AM | Comments (0) | TrackBack (0)
May 24, 2008
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:09 AM | Comments (0) | TrackBack (0)
May 23, 2008
What is Luck?
Permalink

In response to this post on Ian Kennedy, a commenter writes:

I wish David would write a piece (or link to it if he already has) defining what he thinks is luck. I think he is out to lunch on the concept. If a pitcher has a poor K/BB ratio yet he was effective - its luck!! There is no other explanation! This is a simplistic interpretation. Dave should consider other metrics before throwing luck around.

Actually, what I said was:

Kennedy walked four and struck out four. His walks and strikeouts have been close to even all year.

In other words, he showed no improvement in a weak part of his game, a part that is totally under the control of the pitcher. The part of his game that improved Thursday night was the part that has to do with the interaction with hitters and fielders.

Now think of the outcomes in the three dimensions of pitcher, hitter and fielders as a cloud, a three-dimensional structure that's tough to pin down. Kennedy's expected outcomes change with the fielders behind him and the batters he faces, as well as how well he is pitching that night. On some nights, he's in the great fielders, great pitching, lousy hitters part of the cloud. Some nights he's at the other, negative end, but he's still in the cloud. When he walks as many as he strikes out, I tend to believe that Kennedy's ability in the game was right in the middle of his pitching axis. So if he's the same pitcher he's been all year, then maybe the fielding axis came in at the extreme good end, or the opposition hitting axis came in at the extreme bad end. In other words, his good performance was due to things other than pitching ability, so the outcome for him was lucky.

What teams need to look for when they watch players develop is a move to a completely different cloud. Take Andrew Miller, for example. In April, Andrew was pitching in a very similar cloud of outcomes as Kennedy, but in May, his walks and strikeouts indicate that he jumped to a cloud that in general has far better outcomes. (The worst outcomes in Miller's new cloud might be the best outcomes in his old cloud.) This, however, is where it gets tough. Is it really a new cloud, or is it just a small sample of someone performing at the high end of their old cloud?

Part of that depends on the size of the cloud. Shawn Chacon, for example, shows a lot of variance in performance over time. His cloud is large. Greg Maddux, in his prime, showed very little variance (he was always great). Maddux had a small cloud. So when Chacon puts together half a great season, you want to believe he's made a change for the better, but most likely he had a long run in a good part of his cloud. When Maddux had a bad game, it was shocking, because he seldom stepped out of a cloud with few bad outcomes.

So in this case, luck to me is a much different outcome without an obvious change in skills. Defining it exactly, as I hope the cloud metaphor demonstrates, is tough.

Posted by StatsGuru at 01:02 PM | Comments (9) | TrackBack (0)
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:10 AM | Comments (0) | TrackBack (0)
May 22, 2008
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:42 AM | Comments (0) | TrackBack (0)
May 21, 2008
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:21 AM | Comments (0) | TrackBack (0)
May 20, 2008
Daily Fix
Permalink

The Day by Day Database is up to date. Thanks to the good people at Hosting Matters for resolving the problem quickly.

Posted by StatsGuru at 08:11 AM | Comments (0) | TrackBack (0)
Technical Difficulties
Permalink

I'm having some technical problems loading the Day by Day Database this morning. With luck, they'll be cleared up soon.

Posted by StatsGuru at 07:53 AM | Comments (0) | TrackBack (0)
May 19, 2008
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:39 AM | Comments (0) | TrackBack (0)
May 18, 2008
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:52 AM | Comments (0) | TrackBack (0)
May 17, 2008
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:31 AM | Comments (0) | TrackBack (0)
May 16, 2008
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:31 AM | Comments (0) | TrackBack (0)
May 15, 2008
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:55 AM | Comments (0) | TrackBack (0)
May 14, 2008
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:29 AM | Comments (0) | TrackBack (0)
May 13, 2008
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:57 AM | Comments (0) | TrackBack (0)
May 12, 2008
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:27 AM | Comments (0) | TrackBack (0)
May 11, 2008
Mothers Day Update
Permalink

The Day by Day Database is up to date.

A very happy Mother's Day to all the moms and mothers-to-be!

Posted by StatsGuru at 08:53 AM | Comments (0) | TrackBack (0)
May 10, 2008
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:33 AM | Comments (0) | TrackBack (0)
May 09, 2008
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:44 AM | Comments (0) | TrackBack (0)
May 08, 2008
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:34 AM | Comments (0) | TrackBack (0)
May 07, 2008
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:00 AM | Comments (0) | TrackBack (0)
May 06, 2008
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:48 AM | Comments (0) | TrackBack (0)
May 05, 2008
Good for Tim
Permalink

I didn't get the Buck-McCarver broadcast on Saturday, but I wish I had so I could have heard this:

Joe remarked that Fukudome had gone 4-4 the day after his cover issue hit the stand. After musing over Fukudome's ignorance of Cubs history and the variety of curses associated with them, he then asked Tim if he believed in curses or jinxes.

Now, it would be normal for Tim to play along with this silly idea. Tim however chose to rather bluntly shoot it down. "No," he said, "I don't believe in curses or jinxes or anything like that."

Buck then decided to bait McCarver by talking about how poorly McCarver had played after his two appearances on the cover of SI. McCarver responded again bluntly: "Can't a guy just play badly? Why can't a guy just not play well? You don't need some curse or jinx to play poorly. Haven't we come far enough as a society not to believe in those things?"

Thanks, Tim, for saying that so well. Hat tip to BBTF.

Posted by StatsGuru at 11:36 AM | Comments (0) | TrackBack (0)
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:42 AM | Comments (0) | TrackBack (0)
May 04, 2008
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:24 AM | Comments (0) | TrackBack (0)
May 03, 2008
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:19 AM | Comments (0) | TrackBack (0)
May 02, 2008
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:37 AM | Comments (0) | TrackBack (0)
May 01, 2008
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:27 AM | Comments (1) | TrackBack (0)
April 30, 2008
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:37 AM | Comments (0) | TrackBack (0)
April 29, 2008
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:08 AM | Comments (0) | TrackBack (0)
April 28, 2008
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:19 AM | Comments (0) | TrackBack (0)
April 27, 2008
Rare Occurrence
Permalink

This is really bucking the odds:

Consecutive losses for the Rockies despite leading in the eighth inning or later each game. It happened to only one other team in the past 100 years, the '78 Giants.

Posted by StatsGuru at 09:49 AM | Comments (3) | TrackBack (0)
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:14 AM | Comments (0) | TrackBack (0)
April 26, 2008
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 10:01 AM | Comments (0) | TrackBack (0)
April 25, 2008
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:09 AM | Comments (0) | TrackBack (0)
April 24, 2008
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:08 AM | Comments (0) | TrackBack (0)
April 23, 2008
Mid-Week Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:21 AM | Comments (0) | TrackBack (0)
April 22, 2008
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:42 AM | Comments (0) | TrackBack (0)
April 21, 2008
Patriots Day Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:25 AM | Comments (0) | TrackBack (0)
April 20, 2008
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:07 AM | Comments (0) | TrackBack (0)
April 19, 2008
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:46 AM | Comments (0) | TrackBack (0)
April 18, 2008
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:58 AM | Comments (0) | TrackBack (0)
April 17, 2008
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:45 AM | Comments (0) | TrackBack (0)
April 16, 2008
Mid-Week Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:19 AM | Comments (0) | TrackBack (0)
April 15, 2008
More Data, More Research
Permalink

Mike Fast at The Hardball Times looks at how new tracking technologies might change the game of baseball. I must admit I haven't been keeping up with PITCHf/x as much as I should. My first thought is a probabilistic model of the strike zone. Given the handedness of the batter and pitcher, the count, the velocity, the release point and the coordinates of the pitch when crossing the plate boundary (a line extending out from the front of the plate), what is the probability of:

  • A strike call
  • Contact on a swing
  • Fair in play
  • Type of ball in play
  • A base hit

There are a lot more pitches than balls in play, so one should be able to develop a better model than PMR with the data for a season. Then you can see which pitchers and hitters perform above or below average on these types of pitches. This will keep us busy for years to come.
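As a rough illustration of what the first item on that list might look like, here is a minimal sketch that estimates the probability of a called strike from pitch location alone. Everything in it is an assumption for illustration: the `(px, pz, called_strike)` record layout, the half-foot grid cells, and the made-up sample pitches. It is not PITCHf/x's actual data format. A fuller model would add handedness, count, velocity and release point to the key.

```python
from collections import defaultdict


def fit_called_strike_model(pitches, cell=0.5):
    """Estimate P(called strike) for each location cell from taken pitches.

    `pitches` holds hypothetical (px, pz, called_strike) tuples, where px and
    pz are horizontal and vertical plate-crossing coordinates in feet.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [strikes, total]
    for px, pz, called_strike in pitches:
        key = (int(px // cell), int(pz // cell))
        counts[key][0] += 1 if called_strike else 0
        counts[key][1] += 1
    # Add-one smoothing keeps thinly sampled cells away from 0 or 1.
    return {key: (s + 1) / (n + 2) for key, (s, n) in counts.items()}


def strike_probability(model, px, pz, cell=0.5, default=0.5):
    """Look up the estimate for a location; unseen cells get `default`."""
    return model.get((int(px // cell), int(pz // cell)), default)


# Made-up pitches, not real data.
sample = [
    (0.1, 2.5, True), (0.2, 2.6, True), (0.3, 2.7, True), (0.4, 2.9, False),
    (1.2, 2.5, False), (1.3, 2.6, False), (1.4, 2.8, True),
]
model = fit_called_strike_model(sample)
print(strike_probability(model, 0.25, 2.55))  # cell near the middle
print(strike_probability(model, 1.25, 2.55))  # cell off the edge
```

The same counting approach would work for contact on a swing or a base hit by swapping the outcome flag, and comparing a pitcher's actual results to the cell estimates would show who performs above or below average.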

Posted by StatsGuru at 10:33 AM | Comments (1) | TrackBack (0)
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:20 AM | Comments (0) | TrackBack (0)
April 14, 2008
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:56 AM | Comments (0) | TrackBack (0)
April 13, 2008
Sunday Update
Permalink

The Day by Day Database is up to date.

I apologize for the lateness of the update. I spent the night suffering from a stomach ailment.

Posted by StatsGuru at 11:00 AM | Comments (2) | TrackBack (0)
April 12, 2008
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:18 AM | Comments (0) | TrackBack (0)
April 11, 2008
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:35 AM | Comments (0) | TrackBack (0)
April 10, 2008
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:20 AM | Comments (0) | TrackBack (0)
April 09, 2008
MattaMagic
Permalink

According to PythagenMatt, the Royals are the best team in the AL.

Posted by StatsGuru at 08:58 AM | Comments (2) | TrackBack (0)
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:08 AM | Comments (0) | TrackBack (0)
April 08, 2008
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:19 AM | Comments (0) | TrackBack (0)
April 07, 2008
First Place Marlins
Permalink

The Marlins defeat the Nationals 10-7 tonight and move a game ahead of Atlanta at the top of the NL East. They've been outscored 47 to 31, however, so don't expect that placement to last.
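One common way to put a number on that is the Pythagorean expectation. The sketch below assumes the classic exponent of 2, which is my choice for illustration and not something computed in the original post.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Marlins so far: 31 runs scored, 47 allowed.
print(f"{pythagorean_win_pct(31, 47):.3f}")  # 0.303
```

A team that scores and allows runs at that ratio projects to win about three games in ten, which is not a first-place pace.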

Posted by StatsGuru at 11:06 PM | Comments (1) | TrackBack (0)
What's the Probability?
Permalink

A nice probability quiz at The Numbers Guy. Here are two interesting ones:

4. In baseball, suppose the American League champion is better than the National League champion, such that it has a 55% probability of winning each game against the NL champ. Then the NL champ nonetheless will win a best-of-seven-games series four in 10 times. What is the smallest odd number, X, for which a World Series between these two league champs that is best-of-X will ensure that there's a 95% probability of a just result -- the superior AL champ winning?

5. Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind each of the other two, a cow. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, to reveal a cow. He then says to you, "Do you want to change your choice to door No. 2?" Is it to your advantage to switch your choice, assuming you prefer cars to cows?

I believe the answer to four is 383. I have a Python script that computes 95% confidence intervals, and the low end of 383 is 192. The low end of 381 is 190.
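Another way to check an answer like this is an exact binomial calculation instead of a confidence interval. The sketch below assumes every game is independent with the same 55% chance for the AL champ, and the function names are mine for illustration. Because the question is one-sided (only the AL champ winning counts), this approach can come out with a smaller number than a two-sided 95% interval suggests.

```python
from math import comb


def series_win_prob(games, p):
    """Probability the favorite wins a best-of-`games` series (games is odd).

    Winning the series is equivalent to winning a majority if all games
    were played, so a plain binomial tail sum works.
    """
    need = games // 2 + 1
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(need, games + 1)
    )


def smallest_series(p, target, limit=2001):
    """Smallest odd series length where the favorite wins at least `target`."""
    for games in range(1, limit, 2):
        if series_win_prob(games, p) >= target:
            return games
    return None


print(round(series_win_prob(7, 0.55), 3))  # 0.608, so the NL wins about 4 in 10
print(smallest_series(0.55, 0.95))
```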

On question 5, the answer is to switch. I love asking this question. When you make the first choice, you'll be wrong 2/3 of the time. Every time you're wrong, the host has to open the other cow door, which leaves the car behind the one remaining door. So if you switch, you'll be right 2/3 of the time!
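For anyone who doesn't believe it, the cases are few enough to count exactly. This is a small sketch that enumerates every combination of car location and first pick for the three-door game, assuming the host always opens a cow door other than your pick.

```python
from fractions import Fraction
from itertools import product


def win_probability(switch):
    """Exact chance of winning the car in the three-door game."""
    cases = list(product(range(3), repeat=2))  # (car door, first pick)
    wins = 0
    for car, pick in cases:
        if switch:
            # The host removes a cow door, so the door you switch to
            # hides the car exactly when the first pick was wrong.
            wins += int(pick != car)
        else:
            wins += int(pick == car)
    return Fraction(wins, len(cases))


print(win_probability(switch=True))   # 2/3
print(win_probability(switch=False))  # 1/3
```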

Posted by StatsGuru at 10:44 PM | Comments (12) | TrackBack (0)
How Clutch is Ortiz
Permalink

Cyril Morong does some significance testing on the number Bill James supplied on David Ortiz's clutch hitting. The only place Ortiz comes close to being significantly better in those situations is in extra-base hits:

Moving to XB%. He had a rate of 14.8% under normal circumstances while he had a 17.5% rate in the James clutch situations, for an 18% higher rate. The Z-score was 1.96 using the normal dropoff of about .013. So this is very close to being significant. His extra-base hit performance may truly be clutch.

If you think about it, that's what you really want in those situations, batters who can move runners a long distance with one hit. With a man on first, a single is nice, but an extra-base hit gives the runner a much higher probability of scoring.

Hat tip, BBTF.

Posted by StatsGuru at 11:53 AM | Comments (1) | TrackBack (0)
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:21 AM | Comments (0) | TrackBack (0)
April 06, 2008
Closing the Door
Permalink

Joe Pawlikowski sends along this nugget:

The Angels do not give away ninth-inning leads. They have won 161 consecutive games when leading after eight innings. It's the longest active streak in the majors. The last ninth-inning loss came April 19, 2006, at Minnesota.

Since the start of the 2006 season, K-Rod has blown 10 saves. Either those were earlier in the game, or after the Angels took a lead in the top of the ninth (or in extra innings), or the Angels ended up winning the game anyway. I'd like to see the rest of the list in that time. I'm sure there aren't that many leads blown after eight innings.

Posted by StatsGuru at 10:23 AM | Comments (0) | TrackBack (0)
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:13 AM | Comments (0) | TrackBack (0)
April 05, 2008
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:08 AM | Comments (0) | TrackBack (0)
April 04, 2008
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:35 AM | Comments (0) | TrackBack (0)
April 03, 2008
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:29 AM | Comments (0) | TrackBack (0)
April 02, 2008
Clutch Truce?
Permalink

The Other Fifteen wants a truce over clutch hitting.

This isn't a cry for fusion, or balance, or peaceful coexistence. The world wouldn't be a better place if newspaper articles all read "Today the Cubs and Brewers recorded 27 outs apiece in a contest at Wrigley Field, which revealed almost nothing about the two teams due to the small sample size involved." Nor would the world be a better place if VORP started including Steely-Eyed Resolve as one of its components.

What I am asking for is a simple truce: believers in clutch, I as a student of sabermetrics will stop telling you that clutch doesn't exist, or is insignificant, or what have you, if you will stop insisting that its existence in any way, shape or form has an impact on impartial evaluations of player performance. Do we have a deal?

No deal. There are clutch hits, which fit the narrative of the game discussed in the post. All players get clutch hits; that does not make them clutch hitters. When David Ortiz hits a walk off home run, there is no doubt in my mind it was a clutch hit. When Luis Sojo wins a World Series game with a hit, there's no doubt it was a clutch hit. That doesn't make them clutch hitters.

The narrative that X delivered in the clutch is fine. The narrative that he often delivered in the clutch is fine. The narrative that X is a clutch hitter based on five or six at bats doesn't work.

Hat tip, The Hardball Times.

Posted by StatsGuru at 04:50 PM | Comments (6) | TrackBack (0)
Mid-Week Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:56 AM | Comments (0) | TrackBack (0)
April 01, 2008
James Q & A
Permalink

Freakonomics publishes a Q & A with Bill James, where fans sent in questions. I like this one about the Cubs:

Q: Why can't the Chicago Cubs get into the World Series? Is it the small park? Low salaries? The curse of the billy goat? Does sabermetrics provide any insights?

A: Talking about the origins of it -- the Cubs fell into a trench in history in the late 1930's, when almost all baseball teams built farm systems, but the Cubs for several years refused to do so. This put them behind the curve, crippled them for the 1950's, and really the organization did not fully overcome that until about 1980.
Since 1980 they have had several teams that could have wandered into a World Series, with better luck. They haven't had any one overpowering team -- like the 1984 Tigers, or the 1992 Blue Jays, or the 1998 Yankees -- that was so good that it demanded a seat at the Last Banquet of Fall. And, unless you have a team that good, you're at the mercy of the fates.

Posted by StatsGuru at 05:57 PM | Comments (2) | TrackBack (0)
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:14 AM | Comments (0) | TrackBack (0)
March 31, 2008
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 09:39 AM | Comments (0) | TrackBack (0)
March 28, 2008
What's in a Name?
Permalink

On Baseball and the Reds doesn't like the term sabermetrics.

While I know this is probably a minority opinion, I really dislike--almost despise--the term "sabermetrics." Maybe it's just because I didn't grow up with Bill James. But that term has always sounded both pompous and half-baked to me--like we're trying to claim some kind of grand authority or officiality by coming up with an official-sounding name for what we do.

I think at least part of the backlash against "sabermetrics" has as much to do with that name as anything else. I've occasionally interacted with a local reporter in Cincinnati for some stat-inspired articles on the Reds over the past year, and one thing I've tried to stress (as have the other folks like me who have contributed to these articles) is to try to avoid calling us sabermetricians. I don't want to give people that as a reason for ignoring some of the ideas we advocate.

I'd much prefer it if everyone just called what we do what it is--baseball research. There's nothing really special about it...we're just searching for better understanding of how the game works.

I used to work for a company called Dragon Systems, Inc. The name came from the owner's hobby of collecting Chinese Dragons. You can see the logo here. The company built the best speech recognition software available, but other business people would constantly complain about the name and the logo. They'd tell us no one knows what you do by the name. They'd say the logo looks like you're a Chinese restaurant. They were probably right, but the owners kept the name and the logo and built a very successful business because they built a damn good product.

The upside of the name was that when you said, "I work for Dragon Systems," everyone had to ask what the company did. If I said, "I work for Voice Products of America," they'd say that's nice and move on. The same is true for sabermetricians. Baseball researcher, big deal. Sabermetrician, what's that about?

My good friend Jim Storer is married to a doctor at Yale Medical School. She was at a reception for new fellows, and the various new doctors are being introduced. The MC notes that one is a sabermetrician, and asks, "Does anyone know what that is?" Linda raises her hand and answers, "Sadly, yes." That great bit of comedy doesn't happen if he's a baseball researcher.

So Justin, if you don't like the term, don't use it. Be a baseball researcher. But don't deny others the fun of being a sabermetrician.

Posted by StatsGuru at 08:58 AM | Comments (2) | TrackBack (0)
March 27, 2008
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 02:39 PM | Comments (1) | TrackBack (0)
March 25, 2008
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 11:02 PM | Comments (0) | TrackBack (0)
March 24, 2008
Cosmic Stats
Permalink

Cosmic Log looks at the science of baseball statistics and quotes me from the AAAS meeting I spoke at in February.

Posted by StatsGuru at 04:49 PM | Comments (0) | TrackBack (0)
March 20, 2008
Ultimate Lineup
Permalink

Baseball Notes looks at the production of each team lineup slot in 2007 and comes up with the best combined lineup in the majors. There are some surprises in there as well.

Posted by StatsGuru at 03:24 PM | Comments (0) | TrackBack (0)
March 17, 2008
K-GB
Permalink

Rich Lederer at the Baseball Analysts looks at the relationship between strikeout and groundball rates among starting pitchers. Rich graphs the two rates against each other and groups pitchers by quadrant. What I think is interesting are the outliers on the high side of strikeouts and ground balls. If you take the five pitchers with the highest strikeout rates, you get a rotation of:

  • Erik Bedard
  • Scott Kazmir
  • Johan Santana
  • Jake Peavy
  • A.J. Burnett

If you do the same for the five highest GB% pitchers:

  • Derek Lowe
  • Fausto Carmona
  • Tim Hudson
  • Brandon Webb
  • Felix Hernandez

The high K rotation is all aces, while the high ground ball rotation, with the exception of Webb, is mostly number two starters.

Posted by StatsGuru at 12:27 PM | Comments (2) | TrackBack (0)
March 13, 2008
Spring to Summer
Permalink

My latest column at SportingNews.com explores the relationship between spring training and regular season records.

The Baseball Musings pledge drive continues through March. Please consider making a donation.

Posted by StatsGuru at 04:21 PM | Comments (2) | TrackBack (0)
March 09, 2008
Complicated Stats
Permalink

Joe Posnanski takes on the idea that the new stats are too complicated:

But here's why this whole "It's so confusing" argument amuses me so much: People will tell you that the new stats are too convoluted or manufactured ... and yet there are NO stats more convoluted and manufactured than the basic statistics that baseball has been built around for more than 100 years.

Some of this should be obvious. Batting average? It's ridiculous. Preposterous. Imagine that no one had ever come up with batting average before ... and then someone on a blog came up with this idea:

Blogger: I have come up with a new statistic. It involves balls put in play. I call it batting average.
Establishment: Great! How's it work?
B: See, what we'll do is, we'll take the number of hits that the batter has and divide it by the number of at-bats that he has in order to determine how often he gets a hit.
E: That sounds like on-base percentage. What's the difference?
B: Well, it's all in what you call "at-bats." For one thing, we don't count walks.
E: What do you mean you don't count walks?
B: They don't count. We take plate appearances and subtract walks. They never happened.
E: How can a walk never happen?
B: It just doesn't.
E: Aren't walks good things? Like in Little League, we always say "Walk's as good as a hit."
B: I hate walks. They're gone. So let's say a guy comes to the plate 12 times, and he gets four hits and walks twice ...
E: Right ... that's a .500 on-base percentage.
B: Exactly, but if you just subtract the walks, you will see that he has a .400 batting average.
E: Um, OK.
B: But there are other things. If you hit a fly ball, and someone tags up and scores a run, that does not count as an at-bat.
E: Why not?
B: Because you are sacrificing yourself for the betterment of the team? I call it a sacrifice fly. Get it?
E: Well, what are you sacrificing if it doesn't even count against your stats?
B: You just are, OK?
E: What if you hit a ground ball and the runner scores.
B: How's that?
E: Let's say the infield's back and a guy hits a ground ball to get the run in. How do you score that?
B: No, that's not a sacrifice fly.
E: Why not? Doesn't that accomplish the same thing?
B: It just isn't. Come on, pay attention. What's it called. Sacrifice FLY? Hello! He didn't hit a fly ball.
E: It just seems to me ...
B: Sacrifice bunts also do not count as at-bats. And when you get hit by a pitch ... doesn't count.
E: You don't get any statistical notice for getting hit by a pitch?
B: Like it never happened.
E: I'm afraid to ask this: What happens if you reach on an error.
B: That's the beauty of this system. According to my new batting average, you're out.
E: But you're not really out.
B: I know. Isn't it great?
E: Why does this have to be so complicated?
B: It's batting average! It will take over the world!

I like to explain this by asking a person to define an at bat as what it is, rather than what it isn't. You can't do it. Joe takes on ERA as well. It's a typical great Posnanski post.
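
To see how much of the definition is subtraction, here is a minimal sketch of the arithmetic from the dialogue. The function and parameter names are illustrative, and the at-bat formula leaves out rarer exclusions such as catcher's interference. The on-base percentage formula is the standard one, with sacrifice flies in the denominator.

```python
def at_bats(pa, bb=0, hbp=0, sf=0, sh=0):
    """At bats are what is left of plate appearances after removing walks,
    hit by pitches, sacrifice flies, and sacrifice bunts."""
    return pa - bb - hbp - sf - sh


def batting_average(hits, ab):
    return hits / ab


def on_base_percentage(hits, bb, hbp, ab, sf):
    return (hits + bb + hbp) / (ab + bb + hbp + sf)


if __name__ == "__main__":
    # The hitter from the dialogue: 12 plate appearances, 4 hits, 2 walks.
    ab = at_bats(pa=12, bb=2)
    print(ab)
    print(round(batting_average(4, ab), 3))
    print(round(on_base_percentage(4, 2, 0, ab, 0), 3))
```

Reaching on an error needs no special handling here, which is the point of the joke: it stays in the at-bat count and adds nothing to hits.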

The Baseball Musings pledge drive continues through March. Please consider making a donation.

Posted by StatsGuru at 07:50 PM | Comments (0) | TrackBack (0)
March 03, 2008
Win Share Aging
Permalink

The Baseball Crank publishes his latest age adjustments for Established Win Share Levels.

Posted by StatsGuru at 07:57 PM | Comments (0) | TrackBack (0)
February 25, 2008
Wouldn't You Like to be a Vorpy Too?
Permalink

Fire Joe Morgan revels in being called a VORPY by Jon Heyman.

It's a historic day. For years, man has waited for just the right term to use when insulting other men who love baseball numbers just a little too much. (What are they, gay for numbers? Probably.) And now, just like the wait for Shrek 3, that wait is ogre.

Jon Heyman has called us VORPies.

Now we can do that scene from Spartacus (and In and Out) in which we all stand up and declare, "I'm a VORPY!"

Posted by StatsGuru at 07:25 PM | Comments (7) | TrackBack (0)
February 16, 2008
Boston Symposia
Permalink

I participated in an AAAS symposium today on New Techniques in the Evaluation and Prediction of Baseball Performance. Thanks to Ed Aboufadel of Grand Valley State University for the invitation. Shane Jensen presented his SAFE system, a more sophisticated version of the Probabilistic Model of Range (PMR). Steve Wang showed new ways of visualizing data, concentrating on managers. Both were very interesting, and Alan Schwarz kept us on our toes as the moderator.

I talked about the Probabilistic Model of Range, and you can view the slide show here. One nice thing at this conference was a press conference after the talk. I've never done one of those before, and I must say the science writers asked very good questions. This was an unusual topic for this meeting but it went over well.

Update: AP covered the talk.

Update: Some browsers can't run the slide show. It works with IE. For those who can't view it, you can download the actual PowerPoint presentation.

Download PowerPoint 2007 version. Unfortunately, the charts I used aren't compatible with PowerPoint 2003.

Also, word that Jeter is at the bottom of the list of shortstops doesn't play well with Yankees fans.

Posted by StatsGuru at 06:25 PM | Comments (3) | TrackBack (0)
January 29, 2008
Holding the Bannister
Permalink

Brian Bannister just became the favorite pitcher of sabermetricians.

Posted by StatsGuru at 02:42 PM | Comments (2) | TrackBack (0)
January 24, 2008
Upping the Ante
Permalink

MGL at The Book blog ups the ante on Tango Tiger's clutch project.

Here is the kicker. I am willing to donate a substantial sum of money to a charity chosen by one side of the debate - the "non-sabermetric" side of course, if they win. We would have to define "winning" - maybe best of 3, if we do 3 things, like clutch, batter/pitchers, and hot/cold. Or we can do each one separately.

If the sabermetric side wins, I will also donate money, but that will be to a charity of our choice and it will be less money.

I'm not sure how much, but it would be on the order of $10,000 for them and $5,000 for us. What the heck. Anything to make a point. If this flies, let none of my/our detractors/naysayers EVER say that I won't put my money where my mouth is! This should generate some good publicity and might encourage the media and perhaps some insiders to participate.

Mitchel is looking for members of the baseball media and baseball insiders to contribute to this project. If you're one of them, I hope you'll participate.

Posted by StatsGuru at 12:30 PM | Comments (0) | TrackBack (0)
January 22, 2008
Retaining Win Shares
Permalink

Over at The Book, win share aging curves.

Posted by StatsGuru at 03:21 PM | Comments (0) | TrackBack (0)
January 15, 2008
Predicting Clutch
Permalink

Tom Tango puts forward a great idea for the 2008 season to examine the idea of clutch hitting. He's asking for your help in ironing out the details.

Posted by StatsGuru at 12:05 PM | Comments (0) | TrackBack (0)
January 01, 2008
Lahman Database Updated
Permalink

The latest version of Sean Lahman's baseball database is available for download. It covers the history of MLB through 2007.

Posted by StatsGuru at 12:54 PM | Comments (0) | TrackBack (0)
December 11, 2007
Random Ortiz
Permalink

Dan Fox finds a lot lacking in Bill James's latest Sports Illustrated article on clutch hitting. Fox:

First and foremost, the article seems to promote the idea that after the now famous study titled "Do Clutch Hitters Exist?" published in the 1977 Baseball Research Journal by Dick Cramer, that little to no work has been done on the subject of clutch hitting and that what has been done has had an in-grained bias. Quite to the contrary, the topic has been the subject of almost continual debate with a variety of studies published over the years as documented on Cyril Morong's fine site. And more recently there have been several very good analyses done as I discussed in the introduction to my Schrodinger's Bat column of March 1, 2007.

This is not a typical James entry from the Baseball Abstracts. There are no details of the method used or why the results are interesting. For example, he produces a table of David Ortiz in clutch situations since 2002 and notes the numbers are impressive.

That's the regular season; I understand he's had a couple of hits in postseason as well. It's a pretty good record; in fact, you kind of have to see more data to understand how good it is. We've started an award for the major leagues' clutch hitter of the year, based on the data, and David could pretty much win it any year. Only a handful of players a year drive in 30 runs in clutch situations. As to whether these data prove that David is a clutch hitter ... I ain't going there. This discussion has been messed up for 30 years because we got our shoulders way out in front of our shoelaces. From now on, I'm holding back.

My problem with the Ortiz table is there's no reason to believe it's not random. In 394 at bats, Ortiz produced 127 hits. Over the six seasons covered, Ortiz hit .298. A .298 hitter has a 95% confidence interval of 100 to 135 hits. Ortiz hit 35 home runs in that number of at bats. His home run rate during those six years was .07236. The 95% confidence interval for home runs is 19-39. So he's at the high end of the range, but still in the range. He had a 1 in 8 shot at hitting that many home runs.
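
A rough way to reproduce this kind of check is sketched below. It is not the script behind the numbers above: it assumes each at bat is an independent trial at the hitter's overall rate, uses the normal approximation for the interval and an exact binomial sum for the tail, and the function names are made up. Depending on rounding and on exact versus approximate methods, the endpoints can land a unit away from the ones quoted.

```python
from math import comb, sqrt


def normal_interval(n, p, z=1.96):
    """Approximate 95% interval for successes in n trials at rate p."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return mean - z * sd, mean + z * sd


def tail_at_least(n, p, k):
    """Exact binomial probability of k or more successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    at_bats = 394
    print(normal_interval(at_bats, 0.298))     # hits for a .298 hitter
    print(normal_interval(at_bats, 0.07236))   # home runs at Ortiz's rate
    print(tail_at_least(at_bats, 0.07236, 35)) # chance of 35 or more homers
```

If the tail probability comes out near 0.12, that matches the "1 in 8" reading: unusual, but not outside what chance alone produces.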

When James says, "kind of have to see more data to understand how good it is," I take it he means compared to other players. Until I see the other data, I remain unconvinced that clutch hitting is more than random noise.

Bill, in a way, makes my point for me:

The other question everybody asks now is "How do you determine what is a clutch at-bat?" I'll have to stiff you on that one for right now. I'll explain it generally and leave the details for some other time.

"Clutch" is a complicated concept, containing at least seven elements:


  1. The score,

  2. The runners on base,

  3. The outs,

  4. The inning,

  5. The opposition,

  6. The standings,

  7. The calendar.


All these items whittle down the at bats to a very small number, and it is very difficult to find significance in small numbers of plate appearances. For now, I'll stick to my mantra that good hitters are the good clutch hitters.

Posted by StatsGuru at 08:41 AM | Comments (8) | TrackBack (0)
November 17, 2007
Visual Aid
Permalink

Josh Kalk introduced his first version of a web-based tool for viewing PITCHf/x data. Right now, it allows you to see in two dimensions where a pitch passed the batter. For example, here's Alex Rodriguez. Compare him to Alfonso Soriano. Soriano swings at a lot more pitches out of the strike zone. However, when Alex swings out of the zone, it's usually a swing and a miss, while Soriano often makes contact. Alex appears to look for a ball in the strike zone, and when he's fooled the result is a swinging strike. Soriano appears to do a better job of putting the bat on the ball, although the result is often a foul tip on what should be a ball. I hope Josh's next enhancement is a way to view a particular result. I'd like to explore, for example, which batters get the most called strikes outside the zone. Does lack of selectivity really expand the strike zone from the umpire's view?

Posted by StatsGuru at 11:22 AM | Comments (0) | TrackBack (0)
November 07, 2007
I Love Neal Huntington
Permalink

The new Pirates GM answers a question (emphasis added):

The Pirates upper management has widely ignored OBP (on base percentage) in the past. How important will OBP be in player evaluation under your leadership? -- Eric S., Pennsboro, W.Va

We are going to utilize several objective measures of player performance to evaluate and develop players. We'll rely on the more traditional objective evaluations: OPS (on base percentage plus slugging percentage) , WHIP (walks and hits per inning pitched), Runs Created, ERC (Component ERA), GB/FB (ground ball to fly ball ratio), K/9 (strikeouts per nine innings), K/BB (strikeouts to walks ratio), BB%, etc., but we'll also look to rely on some of the more recent variations: VORP (value over replacement player), Relative Performance, EqAve (equivalent average), EqOBP (equivalent on base percentage), EqSLG (equivalent slugging percentage), BIP% (balls put into play percentage), wOBA (weighted on base average), Range Factor, PMR (probabilistic model of range) and Zone Rating.

That said, we will continue to stress the importance of our subjective evaluations. Succinctly stated, we believe that a combination of quality objective and subjective analysis will allow us to maximize our probability of success and to make the best possible decisions.

My reaction is the same as Fire Joe Morgan's.

Posted by StatsGuru at 08:53 AM | Comments (2) | TrackBack (0)
October 19, 2007
SABR Speaker
Permalink

Tomorrow at 12:30 PM, I'll be one of the speakers at the Connecticut SABR chapter meeting. It takes place at Quinnipiac in Hamden, in the College of Liberal Arts, Building #1. If you're in the area and want to stop in, I hope to see you there. I'm told I'm the fifth speaker of the afternoon.

Posted by StatsGuru at 07:00 PM | Comments (0) | TrackBack (0)
October 02, 2007
Final Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:11 AM | Comments (0) | TrackBack (0)
October 01, 2007
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:52 AM | Comments (0) | TrackBack (0)
September 30, 2007
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 09:50 AM | Comments (0) | TrackBack (0)
Delayed Update
Permalink

Due to technical difficulties, the Day by Day Database update will take place later today.

Posted by StatsGuru at 09:41 AM | Comments (0) | TrackBack (0)
September 29, 2007
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:47 AM | Comments (0) | TrackBack (0)
September 28, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:32 AM | Comments (0) | TrackBack (0)
September 27, 2007
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:01 AM | Comments (0) | TrackBack (0)
September 26, 2007
Call For Data
Permalink

Squawking Baseball provides financial analysis of the market for players. They are trying to develop an open source financial database and are looking for your help. Stop by and see if you can contribute.

Posted by StatsGuru at 02:35 PM | Comments (0) | TrackBack (0)
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:47 AM | Comments (0) | TrackBack (0)
September 25, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:47 AM | Comments (0) | TrackBack (0)
September 24, 2007
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:49 AM | Comments (0) | TrackBack (0)
September 23, 2007
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:33 AM | Comments (0) | TrackBack (0)
September 22, 2007
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:01 AM | Comments (0) | TrackBack (0)
September 21, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:31 AM | Comments (0) | TrackBack (0)
September 20, 2007
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:07 AM | Comments (0) | TrackBack (0)
September 19, 2007
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:33 AM | Comments (0) | TrackBack (0)
September 18, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:35 AM | Comments (0) | TrackBack (0)
September 17, 2007
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:20 AM | Comments (0) | TrackBack (0)
September 16, 2007
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:59 AM | Comments (0) | TrackBack (0)
September 15, 2007
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 09:50 AM | Comments (0) | TrackBack (0)
September 14, 2007
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:44 AM | Comments (0) | TrackBack (0)
September 13, 2007
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:55 AM | Comments (0) | TrackBack (0)
September 12, 2007
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 09:00 AM | Comments (0) | TrackBack (0)
September 11, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:10 AM | Comments (0) | TrackBack (0)
September 10, 2007
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:21 AM | Comments (0) | TrackBack (0)
September 09, 2007
Runs Plus RBI
Permalink
Alex Rodriguez homers off the Tigers.
24 AUG 2007: New York Yankees third baseman Alex Rodriguez (13) connects on a 2-run home run during the Detroit Tigers' 9-6 win over the New York Yankees at Comerica Park in Detroit, Michigan. Rodriguez would go 2-5 with 2 RBI and 2 runs scored in the loss. Note the excellent hand-eye coordination.

Photo: Andy Altenburger/Icon SMI



In honor of Gene Orza and A-Rod joining the 130-130 club, here's the list of the top runs + RBI totals in the majors this season:

Most Runs + RBI, 2007
Player              Runs Scored   Runs Batted In   Runs Plus RBI
Alex Rodriguez              130              138             268
Magglio Ordonez             105              123             228
Matt Holliday                94              110             204
Jimmy Rollins               122               81             203
David Ortiz                 101               98             199
Bobby Abreu                 105               94             199
Prince Fielder               94              104             198
Ryan Howard                  78              113             191
David Wright                 97               93             190
Carlos Pena                  84              105             189
Vladimir Guerrero            79              110             189
Adam Dunn                    93               96             189

Alex holds a bigger lead over the number two position than Ordonez holds over the number ten slot. Almost all these players do a good job of both getting on base and hitting for power. Part of it too, is being surrounded by good players. It's no surprise that Abreu is on the list as he benefits from Jeter in front of him and A-Rod behind him, just as Alex takes advantage of Abreu and Matsui.

The one person who stands out is Jimmy Rollins, the only leadoff type hitter in the top twenty. Batting first cuts down on Jimmy's RBI opportunities, but with his power and ability to steal he puts himself in scoring position quite often, as well as driving in runs when he has the opportunity.

Alex needs 32 more runs + RBI to break 300. Setting the AL single-season record for home runs gives him at least twenty-two more, so if he passes Maris he likely reaches 300 runs + RBI as well.
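
The gaps mentioned in this post are easy to check from the table. This is a small sketch using only the runs and RBI figures listed above; the variable names are illustrative.

```python
# (player, runs scored, runs batted in), copied from the table in this post
leaders = [
    ("Alex Rodriguez", 130, 138),
    ("Magglio Ordonez", 105, 123),
    ("Matt Holliday", 94, 110),
    ("Jimmy Rollins", 122, 81),
    ("David Ortiz", 101, 98),
    ("Bobby Abreu", 105, 94),
    ("Prince Fielder", 94, 104),
    ("Ryan Howard", 78, 113),
    ("David Wright", 97, 93),
    ("Carlos Pena", 84, 105),
    ("Vladimir Guerrero", 79, 110),
    ("Adam Dunn", 93, 96),
]

# Runs plus RBI for each player, largest first
totals = sorted((runs + rbi for _, runs, rbi in leaders), reverse=True)

lead_first_over_second = totals[0] - totals[1]
lead_second_over_tenth = totals[1] - totals[9]
needed_for_300 = 300 - totals[0]

if __name__ == "__main__":
    print(lead_first_over_second, lead_second_over_tenth, needed_for_300)
```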

Posted by StatsGuru at 08:28 AM | Comments (0) | TrackBack (0)
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:59 AM | Comments (0) | TrackBack (0)
September 08, 2007
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:50 AM | Comments (0) | TrackBack (0)
September 07, 2007
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:57 AM | Comments (0) | TrackBack (0)
September 06, 2007
Minor League Reference
Permalink

Via Baseball Analysts, minors.baseball-reference.com is now on-line. One of the great things about the last few years is the emergence of up-to-date minor league stats online. Now, when someone is called to the majors, we can get a good idea of how they might perform by looking at complete career lines, and seeing how they handled progressing to tougher levels of play. Thanks to Sean Forman for the site!

Posted by StatsGuru at 12:23 PM | Comments (0) | TrackBack (0)
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:25 AM | Comments (0) | TrackBack (0)
September 05, 2007
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:00 AM | Comments (0) | TrackBack (0)
September 04, 2007
Back to Work Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:06 AM | Comments (0) | TrackBack (0)
September 03, 2007
Labor Day Update
Permalink

Happy Labor Day! The Day by Day Database is up to date.

Posted by StatsGuru at 07:45 AM | Comments (0) | TrackBack (0)
September 02, 2007
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:57 AM | Comments (0) | TrackBack (0)
September 01, 2007
AL ERA Race
Permalink

Dan Haren and Kelvim Escobar both came back to the pack today as each pitched poorly. The two started the day .05 earned runs apart. Haren gave up five runs in six innings to raise his ERA to 2.87. That gave Escobar a golden opportunity to take the lead, but he allowed five runs in 2 2/3 innings to inflate his ERA to 2.99. Both pitchers take losses, and now 0.40 earned runs separate 1st from 6th in the AL ERA race.

Posted by StatsGuru at 08:35 PM | Comments (0) | TrackBack (0)
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:56 AM | Comments (0) | TrackBack (0)
August 31, 2007
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:17 AM | Comments (0) | TrackBack (0)
August 30, 2007
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:11 AM | Comments (0) | TrackBack (0)
August 29, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:05 AM | Comments (0) | TrackBack (0)
August 28, 2007
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:02 AM | Comments (0) | TrackBack (0)
August 27, 2007
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:28 AM | Comments (0) | TrackBack (0)
August 26, 2007
OPS Rant
Permalink

Walk Like a Sabermetrician rants against OPS, and especially against slugging percentage:

Slugging Average is not a fundamental baseball measurement. SLG may be fairly intuitive, and it certainly is venerable, but it is not something that obviously is an important measurement to have on its own. After all, slugging average doesn't really measure power, because it includes singles. So then what does it measure? It is bases gained on hits by the batter per at bat. But what is the greater significance of bases gained by the batter per at bat?

It really has none. Certainly it is good for batters to gain bases on hits; but that, in and of itself, is not a meaningful measurement. You can even look at the game in such a way that the goal is to gain bases--but in that case, the goal is not for the batter to gain bases, it is for the team to gain bases. And a team doesn't gain one base for each single on average, nor four bases for each homer, nor do the ratios between one base for a single, two for a double, etc. hold when talking about the bases gained by the team.

The point is not that Slugging Average is meaningless or stupid; the point is that it just is. It is one way of attempting to quantify the value of hits other than counting them all equally as batting average does. It is a crude way of doing so, but it does have a fairly strong correlation with runs and it is a nice thing to know.

When I worked at ESPN, I sometimes had to explain the numbers to producers of shows and pieces. Television people are about pictures, and few really care about numbers. The explanation of slugging percentage that worked for me was not in terms of bases gained by the batter (slugging percentage is the average distance traveled around the bases by a batter per at bat) but in terms of base runners. Slugging average represents the ability of a batter to drive runners a distance. The higher the slugging average, the more likely the batter is to drive a runner around to score. That explanation seemed to get through to visual thinkers. That explanation also takes care of including singles, since singles often drive runners two bases, and if you collect a lot of singles (Wade Boggs had a good slugging percentage despite a lack of home runs), you tend to move runners a lot.
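To make the arithmetic concrete, here is a minimal Python sketch of slugging average as total bases on hits per at bat. The function name and both stat lines are made up for illustration; they are not real player seasons.

```python
def slugging(singles, doubles, triples, homers, at_bats):
    """Slugging average: total bases on hits divided by at bats."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers
    return total_bases / at_bats


# Hypothetical stat lines, both over 600 at bats.
# A high-average hitter with few home runs...
singles_hitter = slugging(singles=150, doubles=45, triples=5, homers=8, at_bats=600)
# ...and a lower-average hitter with 40 home runs.
slugger = slugging(singles=80, doubles=25, triples=2, homers=40, at_bats=600)

print(f"singles hitter: {singles_hitter:.3f}")
print(f"slugger:        {slugger:.3f}")
```

With these invented lines the singles-and-doubles hitter slugs about .478 and the home run hitter about .493, so a pile of singles and doubles can nearly keep pace with a power bat.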

Posted by StatsGuru at 10:34 PM | Comments (2) | TrackBack (0)
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:44 AM | Comments (0) | TrackBack (0)
August 25, 2007
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 09:18 AM | Comments (0) | TrackBack (0)
August 24, 2007
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:26 AM | Comments (0) | TrackBack (0)
August 23, 2007
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:55 AM | Comments (0) | TrackBack (0)
August 22, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:47 AM | Comments (0) | TrackBack (0)
August 21, 2007
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:48 AM | Comments (0) | TrackBack (0)
August 20, 2007
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:42 AM | Comments (0) | TrackBack (0)
August 19, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:31 AM | Comments (0) | TrackBack (0)
August 18, 2007
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:12 AM | Comments (0) | TrackBack (0)
August 17, 2007
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:47 AM | Comments (0) | TrackBack (0)
August 16, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:02 AM | Comments (0) | TrackBack (0)
August 15, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:39 AM | Comments (0) | TrackBack (0)
August 14, 2007
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:39 AM | Comments (0) | TrackBack (0)
August 13, 2007
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:06 AM | Comments (0) | TrackBack (0)
August 12, 2007
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:20 AM | Comments (0) | TrackBack (0)
August 11, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:33 AM | Comments (0) | TrackBack (0)
August 10, 2007
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:02 AM | Comments (0) | TrackBack (0)
August 09, 2007
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 09:40 AM | Comments (0) | TrackBack (0)
Day by Day
Permalink

The Day by Day database update is delayed due to technical difficulties.

Posted by StatsGuru at 09:15 AM | Comments (0) | TrackBack (0)
August 08, 2007
A New Save Rule
Permalink

For subscribers to Baseball Prospectus, my latest column is up on a new way to define a save.

Posted by StatsGuru at 05:02 PM | Comments (4) | TrackBack (0)
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:04 AM | Comments (0) | TrackBack (0)
August 07, 2007
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:13 AM | Comments (0) | TrackBack (0)
August 06, 2007
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:46 AM | Comments (0) | TrackBack (0)
August 05, 2007
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:36 AM | Comments (0) | TrackBack (0)
August 04, 2007
Saturday Update
Permalink

The Day by Day Database is up to date.

I'm off to Pittsburgh for the Reds-Pirates game. Blogging will be light during the eight hour drive. :-)

Posted by StatsGuru at 05:54 AM | Comments (4) | TrackBack (0)
August 03, 2007
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:29 AM | Comments (0) | TrackBack (0)
August 02, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:53 AM | Comments (0) | TrackBack (0)
August 01, 2007
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:50 AM | Comments (0) | TrackBack (0)
July 31, 2007
Reviewing the Convention
Permalink

Aaron Gleeman writes about his trip to the SABR convention.

Posted by StatsGuru at 11:44 AM | Comments (1) | TrackBack (0)
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:41 AM | Comments (0) | TrackBack (0)
July 30, 2007
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:33 AM | Comments (0) | TrackBack (0)
July 29, 2007
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:01 AM | Comments (0) | TrackBack (0)
July 28, 2007
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:40 AM | Comments (1) | TrackBack (0)
July 27, 2007
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 09:10 AM | Comments (0) | TrackBack (0)
July 26, 2007
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:26 AM | Comments (0) | TrackBack (0)
July 25, 2007
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:49 AM | Comments (0) | TrackBack (0)
July 24, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:29 AM | Comments (0) | TrackBack (0)
July 23, 2007
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:23 AM | Comments (1) | TrackBack (0)
July 22, 2007
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:49 AM | Comments (0) | TrackBack (0)
July 21, 2007
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:06 AM | Comments (0) | TrackBack (0)
July 20, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:44 AM | Comments (0) | TrackBack (0)
July 19, 2007
League Averages
Permalink

The Day by Day Database now features League Splits for both batters and pitchers. Like the player and team splits, these are all batting stats. But if you've wondered what the league average is for shortstops hitting, or how the league bats by innings, you can get that with this new function. The data goes back to 2000.

Posted by StatsGuru at 04:26 PM | Comments (0) | TrackBack (0)
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:13 AM | Comments (0) | TrackBack (0)
July 18, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:51 AM | Comments (0) | TrackBack (0)
July 17, 2007
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:57 AM | Comments (0) | TrackBack (0)
July 16, 2007
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 12:34 AM | Comments (0) | TrackBack (0)
July 15, 2007
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:30 AM | Comments (0) | TrackBack (0)
July 14, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:39 AM | Comments (0) | TrackBack (0)
July 13, 2007
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:08 AM | Comments (0) | TrackBack (0)
July 10, 2007
Mid-Season Win Shares
Permalink

The Hardball Times posted their list of win shares through the All-Star break. I'm surprised that Vlad Guerrero leads the majors, and that Alex Rodriguez ranks fifth. I realize part of that difference is the difference in won-loss records between the teams; the Angels won more games, so they have more win shares to go around. Still, at this point, how many of you would vote for Vlad over A-Rod for MVP? You can compare the two of them here, since their OBAs are very close. They have the same number of hits and walks, but Alex hits many more home runs and even steals better. Vlad does have better clutch numbers (6.8 to 3.8), which are used in the Runs Created calculation. Three win shares seems too big a gap to me, however.

Posted by StatsGuru at 10:46 AM | Comments (9) | TrackBack (0)
July 09, 2007
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:55 AM | Comments (0) | TrackBack (0)
July 08, 2007
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:03 AM | Comments (0) | TrackBack (0)
July 07, 2007
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:51 AM | Comments (0) | TrackBack (0)
July 06, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:49 AM | Comments (0) | TrackBack (0)
July 05, 2007
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:07 AM | Comments (0) | TrackBack (0)
July 04, 2007
Independence Day Update
Permalink

Happy Independence Day! Here's hoping you enjoy the day celebrating with family and friends (and maybe catching a ball game). The Day by Day Database is up to date.

Posted by StatsGuru at 08:10 AM | Comments (0) | TrackBack (0)
July 03, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:46 AM | Comments (0) | TrackBack (0)
July 02, 2007
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:51 AM | Comments (0) | TrackBack (0)
July 01, 2007
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:47 AM | Comments (0) | TrackBack (0)
June 30, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:38 AM | Comments (0) | TrackBack (0)
June 29, 2007
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:57 AM | Comments (0) | TrackBack (0)
June 28, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:31 AM | Comments (0) | TrackBack (0)
June 27, 2007
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:45 AM | Comments (0) | TrackBack (0)
June 26, 2007
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:02 AM | Comments (0) | TrackBack (0)
June 25, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:42 AM | Comments (0) | TrackBack (0)
June 24, 2007
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:20 AM | Comments (0) | TrackBack (0)
June 23, 2007
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:49 AM | Comments (0) | TrackBack (0)
June 22, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:58 AM | Comments (0) | TrackBack (0)
June 21, 2007
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:25 AM | Comments (0) | TrackBack (0)
June 20, 2007
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:11 AM | Comments (0) | TrackBack (0)
June 19, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:55 AM | Comments (0) | TrackBack (0)
June 18, 2007
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:42 AM | Comments (0) | TrackBack (0)
June 17, 2007
Father's Day Update
Permalink

Happy Father's Day to all the dads who read Baseball Musings! The Day by Day Database is up to date.

Posted by StatsGuru at 08:30 AM | Comments (1) | TrackBack (0)
June 16, 2007
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:13 AM | Comments (0) | TrackBack (0)
June 15, 2007
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:59 AM | Comments (0) | TrackBack (0)
June 14, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:12 AM | Comments (0) | TrackBack (0)
June 13, 2007
Mid Week Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:13 AM | Comments (0) | TrackBack (0)
June 12, 2007
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 09:12 AM | Comments (0) | TrackBack (0)
June 11, 2007
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 09:38 AM | Comments (0) | TrackBack (0)
June 10, 2007
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:49 AM | Comments (0) | TrackBack (0)
June 09, 2007
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:31 AM | Comments (0) | TrackBack (0)
June 08, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 11:21 AM | Comments (0) | TrackBack (0)
June 07, 2007
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:57 AM | Comments (0) | TrackBack (0)
June 06, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 09:43 AM | Comments (0) | TrackBack (0)
June 05, 2007
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:21 AM | Comments (0) | TrackBack (0)
June 04, 2007
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 02:47 PM | Comments (0) | TrackBack (0)
Monday Update Delayed
Permalink

Due to technical difficulties, the Day by Day Database update will be delayed today.

Posted by StatsGuru at 09:44 AM | Comments (0) | TrackBack (0)
June 03, 2007
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:54 AM | Comments (0) | TrackBack (0)
June 02, 2007
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:21 AM | Comments (0) | TrackBack (0)
June 01, 2007
Late Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 05:52 PM | Comments (0) | TrackBack (0)
Delayed Update
Permalink

Due to technical difficulties and my being on the road, the Day by Day Database update will be late today.

Posted by StatsGuru at 06:18 AM | Comments (0) | TrackBack (0)
May 31, 2007
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:52 AM | Comments (0) | TrackBack (0)
May 30, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:00 AM | Comments (0) | TrackBack (0)
May 29, 2007
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:45 AM | Comments (0) | TrackBack (0)
May 28, 2007
Memorial Day Update
Permalink

The Day by Day Database is up to date. Sorry for the late update, but the site suffered technical problems this morning.

I'd like to thank all my active military and veteran readers for their service on this Memorial Day, and remember the friends you've lost along the way.

Posted by StatsGuru at 09:06 AM | Comments (1) | TrackBack (0)
May 27, 2007
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:27 AM | Comments (0) | TrackBack (0)
May 26, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:10 AM | Comments (0) | TrackBack (0)
May 25, 2007
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:05 AM | Comments (0) | TrackBack (0)
May 24, 2007
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:29 AM | Comments (0) | TrackBack (0)
May 23, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:38 AM | Comments (0) | TrackBack (0)
May 22, 2007
What's the OBA of Justice Ginsburg?
Permalink

The Volokh Conspiracy discusses applying sabermetric principles to the study of law.

And it occurred to me that in law we now are largely where sabermetrics was in the early days when Bill James began cranking out his seminal Baseball Abstracts. The bulk of what we teach our students reflects tradition, customs, and intuitive reasoning. Little of it has been subject to rigorous statistical analysis.

And the first comment is priceless:

Forgive me, but the first question that this conjures in my mind is:

Who's the legal equivalent of Joe Morgan?

This of course can be answered two ways. Who is Joe Morgan the player, and who is Joe Morgan the announcer?

Posted by StatsGuru at 11:09 AM | Comments (3) | TrackBack (0)
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:28 AM | Comments (0) | TrackBack (0)
May 21, 2007
When Beane Counts Go Bad
Permalink

The Cincinnati Reds own the best Beane Count in the National League. That's pretty amazing, given they sit fifteenth in the National League in terms of winning percentage. The Beane Count looks at a team's league rank in home runs and walks, both for its batters and against its pitchers. You want lots of homers and walks for your batters, few for your pitchers. If you do all four well, you should score runs and not allow many. So why is the Beane Count so far off for the Reds?

There appear to be a few things working against the Reds. The first is that their opponents are making up for the walk difference by picking up more hits. That puts their OBA higher than the Reds batters'. And while the Reds are out-homering the opposition, their opponents are knocking out a lot more doubles. So the Reds also trail in slugging percentage.

On top of that, the Reds hit very poorly relative to their team averages with runners in scoring position, while the Reds pitchers allow about their usual batting average in that situation. So the Reds in general do two things well that winning teams do well: they out-homer and out-walk their opponents. But they don't reach base in other ways, making them a low-dimensional offense and pitching staff.
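For anyone who wants to tinker with the idea, here is a rough Python sketch of a Beane Count calculation. I am assuming the count is the sum of a team's league ranks in the four categories, with a lower total being better; the text above only says it looks at ranks. The three teams and all of their numbers are invented.

```python
def ranks(values, higher_is_better):
    """Map each team to its rank (1 = best). Tied teams share the better rank."""
    ordered = sorted(values.values(), reverse=higher_is_better)
    return {team: ordered.index(value) + 1 for team, value in values.items()}


def beane_count(teams):
    """Sum of ranks in HR hit, BB drawn, HR allowed, BB allowed (assumed definition)."""
    categories = [
        ("hr_hit", True),       # more homers by your batters is better
        ("bb_drawn", True),     # more walks by your batters is better
        ("hr_allowed", False),  # fewer homers off your pitchers is better
        ("bb_allowed", False),  # fewer walks by your pitchers is better
    ]
    totals = {team: 0 for team in teams}
    for key, higher_is_better in categories:
        category_ranks = ranks({t: s[key] for t, s in teams.items()}, higher_is_better)
        for team in teams:
            totals[team] += category_ranks[team]
    return totals


# Invented three-team league, purely for illustration.
league = {
    "A": {"hr_hit": 60, "bb_drawn": 180, "hr_allowed": 40, "bb_allowed": 130},
    "B": {"hr_hit": 45, "bb_drawn": 150, "hr_allowed": 50, "bb_allowed": 160},
    "C": {"hr_hit": 50, "bb_drawn": 200, "hr_allowed": 45, "bb_allowed": 140},
}
counts = beane_count(league)
for team, total in sorted(counts.items(), key=lambda item: item[1]):
    print(team, total)
```

Note that nothing in the count sees singles, doubles, or hitting with runners in scoring position, which is exactly where a team can rank well here and still lose.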

Posted by StatsGuru at 11:56 AM | Comments (0) | TrackBack (0)
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:19 AM | Comments (0) | TrackBack (0)
May 20, 2007
Adjusting WPA
Permalink

I'm not a big fan of WPA, but Vegas Watch adjusts WPA for the player's defensive position. One of the tables on the page confirms the defensive spectrum, which is a nice fallout of the study.

Posted by StatsGuru at 08:35 PM | Comments (0) | TrackBack (0)
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:55 AM | Comments (0) | TrackBack (0)
May 19, 2007
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:02 AM | Comments (0) | TrackBack (0)
May 18, 2007
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:31 AM | Comments (0) | TrackBack (0)
May 17, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 09:06 AM | Comments (0) | TrackBack (0)
May 16, 2007
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:52 AM | Comments (0) | TrackBack (0)
May 15, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:00 AM | Comments (0) | TrackBack (0)
May 14, 2007
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:54 AM | Comments (0) | TrackBack (0)
May 13, 2007
Mother's Day Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:47 AM | Comments (0) | TrackBack (0)
May 12, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:01 AM | Comments (0) | TrackBack (0)
May 11, 2007
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:39 AM | Comments (0) | TrackBack (0)
May 10, 2007
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:23 AM | Comments (0) | TrackBack (0)
May 09, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:46 AM | Comments (0) | TrackBack (0)
May 08, 2007
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:21 AM | Comments (0) | TrackBack (0)
May 06, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:47 AM | Comments (0) | TrackBack (0)
May 05, 2007
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:27 AM | Comments (0) | TrackBack (0)
May 04, 2007
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:39 AM | Comments (1) | TrackBack (0)
May 03, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:53 AM | Comments (0) | TrackBack (0)
May 02, 2007
Hump Day Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:53 AM | Comments (0) | TrackBack (0)
May 01, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:28 AM | Comments (1) | TrackBack (0)
April 30, 2007
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:21 AM | Comments (0) | TrackBack (0)
April 29, 2007
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:56 AM | Comments (0) | TrackBack (0)
April 28, 2007
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:40 AM | Comments (0) | TrackBack (0)
April 27, 2007
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:53 AM | Comments (0) | TrackBack (0)
April 26, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:18 AM | Comments (0) | TrackBack (0)
April 25, 2007
Mid-Week Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:14 AM | Comments (0) | TrackBack (0)
April 24, 2007
Clutch Belief
Permalink

Dejan Kovacevic, one of the new breed of sabermetric sports writers, examines players' and coaches' beliefs in clutch hitting. He offers a good contrast between the statistical research and what the players think, pro and con. It strikes me that all the people in favor of the idea offer only anecdotal evidence.

Of those who feel otherwise, Pirates pitching coach Jim Colborn said, "Dead wrong. There is an element in certain people that allows them to focus at their peak and get into a zone when the situation is more important."

He cited, from his playing days, Joe Rudi, a career .264 hitter who had a reputation of elevating his level every postseason for the Athletics, at least as measured by the intangibles of timely hits and key defensive plays.

"Believe me: For all the great players in that lineup, Joe Rudi was not the one you wanted to face. He just had a knack."

I like Jason Bay's explanation:

"It's not so much a matter of raising your level in a clutch situation. It's a matter of keeping your level the same," Bay said. "Baseball is predicated on the idea that the people who are the most successful are the ones who do things the same way most consistently. It's not an emotion game like football or hockey, where you can go bust some skulls."

Which jibes with what researchers see in the stats:

Some players, the argument can be made, do become better in trying situations. But those cases -- and this is one area where statisticians and those inside the game tend to agree -- are much rarer than those where performance decreases.

Maybe it can be summed up as the great players don't choke.

Thanks to Dan Fox, who is quoted in the article, for the link.

Posted by StatsGuru at 08:11 AM | Comments (3) | TrackBack (0)
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:16 AM | Comments (0) | TrackBack (0)
April 23, 2007
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:08 AM | Comments (0) | TrackBack (0)
April 22, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:25 AM | Comments (0) | TrackBack (0)
April 21, 2007
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:42 AM | Comments (0) | TrackBack (0)
April 20, 2007
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 03:48 AM | Comments (0) | TrackBack (0)
April 19, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:44 AM | Comments (0) | TrackBack (0)
April 18, 2007
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 10:53 AM | Comments (0) | TrackBack (0)
Late Update
Permalink

The Day by Day Database will be updated late today due to technical difficulties with my laptop.

Posted by StatsGuru at 07:29 AM | Comments (0) | TrackBack (0)
April 17, 2007
Off to Boston
Permalink

I'm off to sit in on Andy Andres' sabermetrics class at Tufts. Bill James is the guest tonight. Pie grounded out in his first plate appearance. Gorzelanny is tossing a two-hitter at the Cardinals. San Diego leads Chicago 3-1 in the second and the Pirates are up 6-1 batting in the seventh.

Posted by StatsGuru at 03:00 PM | Comments (0) | TrackBack (0)
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:28 AM | Comments (0) | TrackBack (0)
April 16, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:42 AM | Comments (0) | TrackBack (0)
April 15, 2007
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:03 AM | Comments (0) | TrackBack (0)
April 14, 2007
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:11 AM | Comments (0) | TrackBack (0)
April 13, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:22 AM | Comments (0) | TrackBack (0)
April 12, 2007
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:56 AM | Comments (0) | TrackBack (0)
April 11, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date with all the completed games from yesterday. The umpires suspended the Brewers and Marlins game last night during the third rain delay, tied at 2 in the top of the 11th. In the past, that would have been called a tie and the game replayed:

The Milwaukee Brewers and Florida Marlins will be the first major-league teams to benefit from the rule amendment for suspended games.

Play was suspended Tuesday night - actually at 12:03 this morning local time - during the third rain delay of the game with the Brewers and Marlins tied, 2-2, after 10 innings at Dolphin Stadium.

Under the rule amended over the off-season, the game will resume tonight where it was halted, with the Brewers coming to bat in the top of the 11th. That resumption is set for 6 p.m., with the regularly scheduled game between the teams to follow.

I actually liked the old tie rule because it was quirky. But with travel schedules difficult, this new rule is much better for today's game.

Posted by StatsGuru at 07:22 AM | Comments (0) | TrackBack (0)
April 10, 2007
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:10 AM | Comments (0) | TrackBack (0)
April 09, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:38 AM | Comments (0) | TrackBack (0)
April 08, 2007
You Like Them, You Really Like Them!
Permalink

Any Which Way the Wind Blows introduces the Player Likability Percentage.

It is simply the number of times a player has been booed according to a Google search on the player's name divided by the number of times a player has received a curtain call according to a Google search on the player's name. So, Times Booed/Number of Curtain Calls.

It seems there's no end to the use of search engines!

Posted by StatsGuru at 03:51 PM | Comments (0) | TrackBack (0)
Easter Update
Permalink

The Day by Day Database is up to date. Happy Easter to all who are celebrating the holiday!

Posted by StatsGuru at 08:04 AM | Comments (0) | TrackBack (0)
April 07, 2007
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:37 AM | Comments (0) | TrackBack (0)
April 06, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:51 AM | Comments (0) | TrackBack (0)
April 05, 2007
Daily Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:37 AM | Comments (0) | TrackBack (0)
April 04, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:08 AM | Comments (0) | TrackBack (0)
April 03, 2007
Day by Day Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:28 AM | Comments (0) | TrackBack (0)
April 02, 2007
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:30 AM | Comments (0) | TrackBack (0)
March 24, 2007
Luck Hitters
Permalink

PROTRADE uses a metric similar to PMR to find the lucky hitters from 2006:

PROTRADE has mapped out every inch of the diamond, charting every batted ball in the majors over the past five years, calculating the probability that given their distance, direction and hardness, they become hits or outs. You may hear a manager encourage a slumping player by saying that he is "making great contact" or "hitting the ball hard" but that the balls just aren't finding a way through the defense for hits because they are getting "unlucky."

Marlins fans should be worried about their middle infielders.

Posted by StatsGuru at 07:56 AM | Comments (1) | TrackBack (0)
March 09, 2007
Who Gets Credit for a K?
Permalink

Steve Lombardi of Was Watching wrote earlier this week with a question about strikeouts:

David - I have a question that I thought may interest you...and one that I thought perhaps you may be able to help answer. It's regarding whether a pitcher earns a strikeout or if the batter allows it.

Say you have a great strikeout pitcher - in terms of the numbers that he racks up. Let's call him Medro Partinez. And, say you have two batters - one that whiffs a lot and one who makes a lot of contact. Let's call the strikeout-prone one Kave Dingman and the contact-maven Gony Twynn.

Conventional wisdom suggests that when Medro Partinez whiffs Gony Twynn, it's the pitcher who should be credited with earning the strikeout - whereas when Medro Partinez whiffs someone like Kave Dingman it's questionable as to whether Medro should get credit or Kave should get the blame (for allowing the whiff).

Is there a way to use the head-to-head data in the Day-to-Day database to determine if conventional wisdom is correct in this case? Should we be looking at pitchers strikeouts somewhat like we look at "easy" and "tough" saves? Or the flip side, when looking at the value of a batter, should we be more concerned about versus "who" (meaning the type of pitcher) he whiffs against (more so than how many times he strikes out)?

Many years ago I worked with Bill James on a game, and part of that game was predicting various rates for a particular batter against a particular pitcher. Bill used a formula that I don't have permission to divulge that predicts what the rate of any stat should be for a particular batter vs. a particular pitcher, given the rate for each and the league average for each. This formula basically says that the rate is a result of cooperation between both. In the case of strikeouts you would expect very few Ks from matchups between a low K pitcher and a low K hitter. You would expect a matchup between a high K pitcher and a high K hitter to be greatly above average. Against an average pitcher, the batter's K rate should be close to his career K rate and vice versa.

If this formula is true, graphing the actual K rate for a matchup vs. the predicted K rate should yield the line y=x (slope of 1, intercept of 0). To test this, I looked at all batters and pitchers with 2100 BFP since 2000 so we have a good measure of their K/PA, then chose matchups with at least 20 PA. Here's the graph with the trend line:

[Graph: StrikeoutPred.JPG, actual K rate vs. predicted K rate with trend line]

The equation of the regression line is y = .984x - 0.001, which is pretty close to y=x. This means that the contribution is pretty equal. Strikeouts are a collaboration between pitchers and batters, and there's no reason to give one more credit than the other.
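
For readers who want to try the same check on their own matchup data, here is a minimal sketch of an ordinary least-squares fit. The function name fit_line and the sample points are illustrative assumptions; the points are placeholders that lie exactly on y=x, not the data behind the graph.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x, and the x-y cross deviations
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Placeholder (predicted, actual) K rates, invented for illustration only.
predicted = [0.05, 0.10, 0.15, 0.20, 0.25]
actual = [0.05, 0.10, 0.15, 0.20, 0.25]

slope, intercept = fit_line(predicted, actual)
print(slope, intercept)
```

With real matchups the points scatter around the line, and a slope near 1 with an intercept near 0 is what the prediction formula implies.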

Posted by StatsGuru at 06:03 PM | Comments (8) | TrackBack (0)
February 26, 2007
Aging Players
Permalink

The Baseball Crank looks at three years of data to see the aging patterns in Established Win Share Levels (EWSL).

Posted by StatsGuru at 10:21 AM | Comments (0) | TrackBack (0)
February 22, 2007
Power and Average
Permalink

U.S.S. Mariner catches Mike Hargrove spewing conventional wisdom that's incorrect.

Posted by StatsGuru at 02:41 PM | Comments (1) | TrackBack (0)
February 09, 2007
More Information
Permalink

The people who brought you minor league splits now do the same for college baseball. Excellent work!

Posted by StatsGuru at 10:44 AM | Comments (1) | TrackBack (0)
January 26, 2007
Ranking the Projections
Permalink

The good people at Fangraphs are offering sortable projections from various sources. Just what's needed to help build your fantasy roster! There are projections on an individual basis as well.

Posted by StatsGuru at 04:09 PM | Comments (0) | TrackBack (0)
January 21, 2007
Winning and Losing
Permalink

There's an addition to the Day by Day Database. You can now see team records with different players in a game. For example, here are the won-lost records for each Mets player during the 2006 season. I'm not sure how useful this will be, but it's not something that's easy to find other places, so I'm sure you'll make use of the information.

Update: Just to be clear, if a player appears in a game at any time, the player gets credited with the result for the team that day.
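
The crediting rule in the update can be sketched in a few lines. This is only an illustration of the rule as stated, not the database's actual code; the function name records_by_player, the input layout and the player names are all hypothetical.

```python
from collections import defaultdict


def records_by_player(games):
    """Tally team wins and losses for every player who appeared in a game.

    games: iterable of (team_won, players) pairs, where players is the
    collection of everyone who appeared for the team at any point.
    """
    tally = defaultdict(lambda: [0, 0])  # player -> [wins, losses]
    for team_won, players in games:
        for player in set(players):  # one credit per game, however brief the appearance
            tally[player][0 if team_won else 1] += 1
    return {player: tuple(wl) for player, wl in tally.items()}


# Hypothetical three-game sample.
games = [
    (True, {"Player A", "Player B"}),
    (False, {"Player A"}),
    (True, {"Player A", "Player C"}),
]

records = records_by_player(games)
print(records)
```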

Posted by StatsGuru at 08:29 PM | Comments (6) | TrackBack (0)
January 12, 2007

Al Doyle pens an interesting essay on one of my favorite subjects, the double. He writes extensively on Earl Webb's single season record, but neglects Tris Speaker's career mark of 792. Back in his 1983 Abstract, Bill James talked about soft records:

Any time performance levels in a given category rise to where the record represents less than 18 seasons of outstanding performance, the record becomes soft; less than 15, very soft.

If you take the average of the MLB leaders in doubles over the last 10 seasons, you get 53.8 as the level of outstanding performance. That's 14.72 years of outstanding performance needed to break the record (792 / 53.8). This record should be really soft, like the home run record was in the 1960s and the stolen base record was in the 1980s. But looking at Career Assessments in the latest Bill James Baseball Handbook, only Miguel Cabrera owns better than a 10% chance of breaking the record (15%), and the only other person whose career path may take him there is Albert Pujols, and he's at 7%. (In comparison, there are four players with a better than 10% chance of breaking Aaron's home run record.) No one's been able to sustain a number of outstanding double seasons.
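
The soft-record arithmetic is easy to reproduce. Here is a small sketch using the thresholds from the James quote above; the function names and the "not soft" label are my own choices for illustration.

```python
def seasons_to_record(record, outstanding_level):
    """Seasons of outstanding performance needed to reach the record."""
    return record / outstanding_level


def softness(seasons):
    """Classify a record using the thresholds in the 1983 Abstract quote."""
    if seasons < 15:
        return "very soft"
    if seasons < 18:
        return "soft"
    return "not soft"


# Speaker's 792 career doubles against the 53.8 ten-year average of MLB leaders.
seasons = seasons_to_record(792, 53.8)
print(round(seasons, 2), softness(seasons))
```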

The other fascinating point of Doyle's article is how doubles are a leading indicator of declining batting skills.

Using a sharp fall in doubles as a warning indicator, Frank Thomas is the player to watch in 2007. The Big Hurt has seven seasons of 35 or more doubles on his resume, including an American League-best 46 in 1992. The 2006 AL Comeback Player of the Year had just 11 doubles in 466 ABs with the A's to go with his 39 HR and 114 RBI. Will Thomas continue his impressive run production with the Blue Jays, or will he decline rapidly as he approaches age 39?

We'll keep an eye on the Big Hurt's stats this year.

Posted by StatsGuru at 07:45 AM | Comments (4) | TrackBack (0)
January 09, 2007
Back to 1957
Permalink

As of tonight, the Retrosheet daily logs back to 1957 are part of the Day by Day Database. This completes the data that is available via Retrosheet. If you'd like, you can see what the batters hit in the last game the Brooklyn Dodgers played against the New York Giants.

Retrosheet keeps moving their data back in time, and I'll try to keep up. The next big thing will be their completion of play by play data for the 1999 season. At that point, we'll be able to push the splits data back to 1974, giving us a complete record of all active players. If you make use of the pre-2002 data extensively, think about visiting the Retrosheet site and hitting the tip jar. Or better still, see if you or a relative has a score sheet from a game they are missing.

Enjoy the data!

Posted by StatsGuru at 10:37 PM | Comments (0) | TrackBack (0)
January 08, 2007
Go West, Young Man
Permalink

The Day by Day Database now goes back to 1958, the year the Dodgers and Giants moved their rivalry to the west coast. Here's how the Dodgers pitched against the Giants that year, and how the Giants pitchers fared against the Dodgers. I assume Ruben Gomez earned some fans that season.

Posted by StatsGuru at 08:20 PM | Comments (3) | TrackBack (0)
January 07, 2007
Party Like it's 1959
Permalink

The Day by Day Database now goes back to 1959, the year the White Sox won the pennant. The big strength of the pitching staff was keeping the ball in the park. And for fans of the Cubs, you can now research Billy Williams' career.

Posted by StatsGuru at 08:05 PM | Comments (2) | TrackBack (0)
Better Day by Day
Permalink

I'd like to announce improvements to the Batter Comparison and Pitcher Comparison functions in the Day by Day Database. You can now set the minimum for any counting category. So, if you want to sort by strikeouts, you can use something other than plate appearances to limit the number of players who show up on the list.

You can also do a "club" list, looking for players with 300 doubles, 50 triples and 400 home runs since 1960, for example.

Posted by StatsGuru at 01:33 PM | Comments (1) | TrackBack (0)
January 06, 2007
A Lifetime of Stats
Permalink

With the Day by Day Database now pushed back to 1960, all baseball games played in my lifetime are included. And while I know who had the most hits, runs, home runs and even wins in my life, I wondered who hit the most triples. None other than Willie Wilson.

Update: There were complaints about the amount of data returned by the above query. So I bit the bullet, and now you can specify a minimum for any counting category for batters. I'll work on this for the other comparison programs as well.

Posted by StatsGuru at 06:57 PM | Comments (2) | TrackBack (0)
61 in '61
Permalink

The Day by Day Database now goes back to 1961. You can re-live Roger Maris' chase of Babe Ruth's single season home run record. Looking at it, Maris suffered some pretty long stretches with few home runs. From 7/26 to 8/10 he hit just one, a total of 16 games, and at the beginning of the season, he also hit just one home run in his first 16 games.

The other thing we now have is a complete game by game record for all batters and pitchers who played for expansion teams. That's 14 of the current 30 MLB franchises.

Posted by StatsGuru at 02:53 PM | Comments (0) | TrackBack (0)
January 05, 2007
Back to 1962
Permalink

The Day by Day Database now goes back to 1962. Where else can you get the end of Vinegar Bend Mizell's career? Or look at the complete list of career Astros home runs?

Posted by StatsGuru at 07:48 PM | Comments (0) | TrackBack (0)
January 04, 2007
Anyone for Polo?
Permalink

The Day by Day Database now goes back to 1963. That was the last year of the Polo Grounds. Now my sister can see who knocked out the most hits on her birthday. Of course, he became more famous for his managing talents.

Posted by StatsGuru at 09:30 PM | Comments (0) | TrackBack (0)
Regress, I Guess!
Permalink

Studes posts a great article at The Hardball Times on using regression analysis to predict future performance of players. I was a bit surprised Soriano wasn't on the list of players to fall off the most. The Marcel projections give him a .275 BA, a .515 slugging percentage and a .333 OBA in 2007. That's a falloff of just two points in batting average, but over 50 points in OPS.

Brad Wilkerson, by the way, projects to a .249 BA, .446 slugging percentage and a .347 OBA.

Posted by StatsGuru at 09:05 AM | Comments (1) | TrackBack (0)
January 02, 2007
When I'm Sixty Four
Permalink

The Day by Day Database now runs back to 1964. That was the year the St. Louis Cardinals pitchers did a great job of keeping the ball in the park on their way to a World Championship. Do you think Ray Sadecki received more run support than Bob Gibson that season?

Posted by StatsGuru at 11:11 PM | Comments (0) | TrackBack (0)
January 01, 2007
In the Summer of '65
Permalink

The Day by Day Database now goes back to 1965. That means it includes games before I started grammar school. I'm getting old. It also has the stats for the last year the Braves were in Milwaukee.

Posted by StatsGuru at 08:59 PM | Comments (0) | TrackBack (0)
Get Your Kicks, In Season '66
Permalink

The Day by Day Database now goes back to 1966. That means you can look at a game log of Bob Watson's entire career. You can also take a gander at Sandy Koufax's finale.

Posted by StatsGuru at 04:55 PM | Comments (0) | TrackBack (0)
December 30, 2006
The Impossible Day Dream
Permalink

The Day by Day Database now contains game logs back to 1967. Check out the Red Sox of that season. You can click on Tony C's name to see when his injury occurred and the ominous HBP.

Posted by StatsGuru at 05:25 PM | Comments (0) | TrackBack (0)
December 29, 2006
Back to 1968
Permalink

The Day by Day Database now goes back to the turbulent year of 1968. Even though batting averages were low that year, OBAs were pretty good. And ERAs were very good. Be sure to click on Bob Gibson's name to see all the zeros in the runs column.

Posted by StatsGuru at 10:15 PM | Comments (1) | TrackBack (0)
The Division Era Complete
Permalink

The Day by Day Database now is complete from 1969 on. This means the entire division era is available for research. Here's the list of leaders in home runs in that time. No real surprises there. Here's the leaders in ERA. Jim Palmer just beats out Pedro Martinez among starters.

In the next couple of weeks, we should be able to push this back to 1957, giving us 50 years of day by day statistics.

Posted by StatsGuru at 07:15 PM | Comments (1) | TrackBack (0)
Strikeout Wars
Permalink

Steve Lombardi writes:

I have a question that I hope you can help me with - it's been mentioned recently that hitters, for the most part, control the ride on batted ball types over pitchers and their tendencies. However, how does it work for contact hitters and strikeout pitchers? Who wins that battle - in the non-batted ball contest? Anything you can share would be appreciated. Thanks and regards, Steve Lombardi

The Day by Day Database contains batter vs. pitcher matchups back to 2000. Using that data, I selected 20 low strikeout hitters and 20 high strikeout pitchers. Here are strikeout percentages (100*K/PA) for various combinations:

Test                             K Percentage
Overall MLB                      16.75%
Best Batters vs. All Pitchers    7.67%
Best Pitchers vs. All Batters    24.53%
Best vs. Best                    11.72%

The best vs. best comparison resulted in 403 K in 3438 plate appearances. It certainly appears that the contact hitters "win" the battle against the strikeout pitchers. Maybe that should be a strategy. When you go up against Clemens, send up a lineup of batters that don't strike out often.

Update: TangoTiger writes in the comments:

Studes pointed out on my blog that the Odds Ratio method works perfectly here.

Following the step-by-step on my blog, I get a matchup expectation of 11.8%, which compares mighty favorably to the actual 11.7%.

That is, there is no greater advantage on either side. The resultant matchup is exactly predicted by the matchup method.

Studes was right.

Bill James invented a similar method in the mid 90's. I didn't know it was in general use, but it gives the same answer, an expected strikeout rate of 11.8%. What this basically means is that the low strikeout hitters are farther above average than high strikeout pitchers.
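
For anyone who wants to check the 11.8% figure, here is a minimal sketch. It assumes the common form of the Odds Ratio method (convert each rate to odds, multiply batter odds by pitcher odds, divide by league odds, convert back to a rate); the exact steps on TangoTiger's blog are not reproduced in this post, and the function names are mine.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


def odds_ratio_matchup(batter_rate, pitcher_rate, league_rate):
    """Expected matchup rate under the assumed odds-ratio formulation."""
    matchup_odds = odds(batter_rate) * odds(pitcher_rate) / odds(league_rate)
    return matchup_odds / (1 + matchup_odds)


# Rates from the table above: best batters, best pitchers, overall MLB.
expected = odds_ratio_matchup(0.0767, 0.2453, 0.1675)

# Observed best vs. best: 403 K in 3438 plate appearances.
observed = 403 / 3438

print(round(100 * expected, 1), round(100 * observed, 1))
```

Both numbers come straight from the figures in the post, so the sketch only confirms that the assumed formulation lands on the same expectation quoted in the comment.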

Posted by StatsGuru at 10:21 AM | Comments (12) | TrackBack (0)
December 28, 2006
1970 On-Line
Permalink

The Day by Day Database now goes back to 1970. Here's a list of the doubles leaders from that year. Thirty-five doubles isn't bad for a catcher. Combine that with 45 homers (and even four triples) and it's no wonder Bench won the MVP that season.

Posted by StatsGuru at 07:17 PM | Comments (1) | TrackBack (0)
December 27, 2006
1971 Online
Permalink

The Day by Day Database now goes back to 1971. Here are the leaders in slugging percentage that year. I wonder if there was a big stink about Torre winning the MVP over Aaron or Stargell? It's almost the opposite of this year's AL voting. Instead of going for a great offensive season by a player at a tough defensive position, the 2006 voters gave it to a slugging first baseman. In 1971, Torre's BA and OBA from third base beat out 100 points more of slugging from two fantastic outfielders. I'm not even sure how well Torre played third base that year, as he was converted from catcher. On top of that, Pittsburgh finished with the best record in the National League, which should give a boost to Stargell. It's a very interesting vote. The vote also agrees with win shares, which give Torre 41, Stargell 35 and Aaron 33. Left out of the top was Fergie Jenkins with 37.

Posted by StatsGuru at 11:46 AM | Comments (3) | TrackBack (0)
December 26, 2006
Back to 1972
Permalink

The Day by Day Database now contains batter and pitcher daily lines back to 1972. Take a look at the great ERA race that year.

Posted by StatsGuru at 08:30 PM | Comments (3) | TrackBack (0)
December 19, 2006
One Year Back
Permalink

I've started work on moving the Day by Day Database back in time. Game logs now go back to 1973. Here's Tommy Agee's final season. I'm hoping by the start of the 2007 season to push the database back to 1957, giving us 1/2 century of data.

Posted by StatsGuru at 10:22 PM | Comments (0) | TrackBack (0)
December 12, 2006
Curling by the Numbers
Permalink

Michael Eglinski sends along this link as Bob Weeks profiles the Bill James of curling.

If you really want to understand how Glenn Howard won last week's Grand Slam of Curling event, the Home Hardware Masters, you should know that his hammer efficiency in the bonspiel was a remarkable 35%. And his force efficiency was a stunning 68%.

This shouldn't come as much of a surprise as Howard's team is ranked No.1 in Canada, statistically speaking.

These numbers and rankings might not mean anything to most curling fans, but if Dallas Bittle has his way, that will soon change.

Bittle, along with his partner Gerry Geurts, are to curling what Bill James is to baseball. For the past few years, they have been developing an entirely new way of looking at curling through statistics.

But please, don't call them statisticians. They're researchers of the game.

"We found there was a void in reporting results and how they were reported," Bittle explained. "We felt there was a need for a better statistical analysis."

See, I told you curling was like baseball!

Posted by StatsGuru at 10:23 PM | Comments (2) | TrackBack (0)
November 27, 2006
One Day at a Time
Permalink

The Day by Day Database is working again. Thanks to the staff at Hosting Matters for resolving the problem.

Posted by StatsGuru at 11:16 PM | Comments (0) | TrackBack (0)
October 02, 2006
Winning Ways
Permalink

The Hardball Times just published the final win shares data (American League, National League). Win shares puts Jeter and Mauer 1-2 in the American League, and if you look at the measure of Win Shares above bench, Jeter, Mauer and Ortiz are all tied. Note that a big advantage for Jeter over both Mauer and Ortiz is that he played more games than Mauer and Ortiz. If you will, on a per game or per at bat basis, Mauer and Ortiz (and others) are more valuable than Derek. But Jeter is always in the lineup, and there's something to be said for contributing every day.

Note, too, Mauer's excellent defensive contribution. Mauer ranks much higher on defense as a catcher than Jeter does as a shortstop. It's a very tough choice. My MVP ballot would probably look like:

  1. Jeter
  2. Mauer
  3. Ortiz
  4. C. Guillen
  5. Santana
  6. Morneau
  7. Ramirez
  8. Hafner
  9. Sizemore
  10. Mike Young

In the NL, Pujols edged out Carlos Beltran for total win shares in the last week of the season, while Carlos finished first in win shares above bench. Note that Ryan Howard did not rank very high in total win shares, just sixth on the list, behind David Wright of the Mets. A big part of that is park effect. Shea was the second toughest park for run scoring this season. Philadelphia still favors offense, although less than in previous seasons. This is going to pull Howard down while raising Wright and Beltran, despite the fact that Howard hit better on the road. So the win shares ranking of Howard may be too low.

But I don't think the ranking of Beltran is too high. Remove Carlos from Shea, and he hits as well as Ryan, plus he played a great defensive centerfield. Pujols's rank is quite impressive, given that he missed 19 games. So my NL ballot would look like:

  1. Carlos Beltran
  2. Albert Pujols
  3. Ryan Howard
  4. Miguel Cabrera
  5. David Wright
  6. Lance Berkman
  7. Alfonso Soriano
  8. Chase Utley
  9. Jose Reyes
  10. Barry Bonds (for old times' sake)

In the AL, I could just as easily put Mauer ahead of Jeter. The playing time does it for me. If Jeter played a less demanding defensive position, I'd put Mauer ahead of him for sure. In the NL, put the top three in any order you like, I won't give you much of an argument. It just seems that people in the discussion on this topic don't give Beltran's defense enough consideration.

And one last thought. The Yankees player with the second highest number of win shares was Alex Rodriguez. He must have done something right during the season.

Posted by StatsGuru at 02:42 PM | Comments (16) | TrackBack (0)
Final Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:45 AM | Comments (0) | TrackBack (0)
October 01, 2006
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:55 AM | Comments (0) | TrackBack (0)
September 30, 2006
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:54 AM | Comments (0) | TrackBack (0)
September 29, 2006
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:41 AM | Comments (0) | TrackBack (0)
September 28, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:37 AM | Comments (0) | TrackBack (0)
September 27, 2006
Mid-Week Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:13 AM | Comments (0) | TrackBack (0)
September 26, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:17 AM | Comments (0) | TrackBack (0)
September 25, 2006
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:42 AM | Comments (0) | TrackBack (0)
September 24, 2006
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:07 AM | Comments (0) | TrackBack (0)
September 23, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:44 AM | Comments (0) | TrackBack (0)
September 22, 2006
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:55 AM | Comments (0) | TrackBack (0)
September 21, 2006
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:07 AM | Comments (0) | TrackBack (0)
September 20, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:52 AM | Comments (0) | TrackBack (0)
September 19, 2006
Late Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 02:35 PM | Comments (0) | TrackBack (0)
Day by Day Late Today
Permalink

The update of the Day by Day Database will be delayed today. Sorry for the inconvenience.

Posted by StatsGuru at 09:27 AM | Comments (0) | TrackBack (0)
September 18, 2006
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:18 AM | Comments (0) | TrackBack (0)
September 17, 2006
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:10 AM | Comments (0) | TrackBack (0)
September 16, 2006
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:26 AM | Comments (0) | TrackBack (0)
September 15, 2006
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:22 AM | Comments (0) | TrackBack (0)
September 14, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:28 AM | Comments (0) | TrackBack (0)
September 13, 2006
Midweek Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:15 AM | Comments (0) | TrackBack (0)
September 12, 2006
RBI Gap
Permalink

Phil Jefferis writes:

I heard something over the weekend that caught my attention. The Astros' leading RBI man is obviously Lance Berkman, but who would you guess is second? The answer is tricky. Preston Wilson, who departed about a month ago after riding the bench for awhile, had 55 RBIs during his stay, which would currently put him in second place. The active Astro with the second most RBIs is Adam Everett with 54. It's a telling stat. A guy who derives about 95% of his value from his defense is our second RBI guy. No wonder we're currently in our third 20+ inning scoreless streak for the year.

That was only part of what surprised me though. The part that I am interested in is the disparity between Lance and Everett. Lance, at 120 RBIs, has more than doubled his closest teammate. I'd have to think that 66 RBIs is about as big a disparity as you'll see in the Majors. Have you looked into this at all?

I took a quick look at the teams I thought might have a similar situation. Philly was my first choice, and from what I can tell, they are the closest with Howard leading Utley by 49 (1.55 times more).

Historically, the biggest gaps belong to two Cubs teams. In 2001, Sammy Sosa drove in 160 runs. Second on that team was Ricky Gutierrez with 66. That's a difference of 94, well more than double. Gutierrez was not known for his offense, and that was his career high. In 1959, Ernie Banks drove in 143 runs with Bobby Thomson (yes, that Bobby Thomson) second behind him with 52, a difference of 91, also more than double. Since the start of the 20th Century, there are eleven teams where the leader in RBI more than doubled the second most on the team.

RBI Difference, Leader Doubling Second on Team
Season  Team  RBI Leader  Second  Difference
2001    CHN   160         66      94
1959    CHN   143         52      91
1935    BSN   130         60      70
1925    NYA   138         68      70
1946    DET   127         59      68
1936    NY1   135         67      68
1999    MON   131         64      67
1972    SDN   111         47      64
1992    LAN   88          39      49
1912    CLE   90          45      45
1981    CHN   75          35      40

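
The filter behind a list like this can be sketched as follows. It is not the query actually run against the database; the function name doubled_gaps and the row layout are assumptions, and the last input row is a hypothetical team added only to show a case that fails the filter. The comparison uses >= so that the 1912 Cleveland row, which is exactly double on these figures, is kept.

```python
def doubled_gaps(rows):
    """Keep teams whose RBI leader at least doubled the runner-up.

    rows: iterable of (season, team, leader_rbi, second_rbi).
    Returns (season, team, leader_rbi, second_rbi, difference) tuples,
    largest difference first.
    """
    kept = [
        (season, team, leader, second, leader - second)
        for season, team, leader, second in rows
        if leader >= 2 * second
    ]
    return sorted(kept, key=lambda row: row[4], reverse=True)


rows = [
    (1912, "CLE", 90, 45),    # from the table
    (2001, "CHN", 160, 66),   # from the table
    (1959, "CHN", 143, 52),   # from the table
    (None, "XXX", 100, 80),   # hypothetical team, does not qualify
]

result = doubled_gaps(rows)
for row in result:
    print(row)
```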
Posted by StatsGuru at 04:08 PM | Comments (2) | TrackBack (0)
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:16 AM | Comments (0) | TrackBack (0)
September 11, 2006
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:53 AM | Comments (0) | TrackBack (0)
September 10, 2006
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:18 AM | Comments (0) | TrackBack (0)
September 09, 2006
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:34 AM | Comments (0) | TrackBack (0)
September 08, 2006
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:03 AM | Comments (1) | TrackBack (0)
September 07, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:22 AM | Comments (0) | TrackBack (0)
September 06, 2006
Mid-Week Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:44 AM | Comments (0) | TrackBack (0)
September 05, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:05 AM | Comments (0) | TrackBack (0)
September 04, 2006
Labor Statistics
Permalink

The Day by Day Database is up to date. And happy Labor Day to all my readers, I hope you enjoy the day with your family and friends!

Posted by StatsGuru at 08:05 AM | Comments (0) | TrackBack (0)
September 03, 2006
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 09:03 AM | Comments (0) | TrackBack (0)
September 02, 2006
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:10 AM | Comments (0) | TrackBack (0)
September 01, 2006
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:45 AM | Comments (0) | TrackBack (0)
August 31, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:58 AM | Comments (0) | TrackBack (0)
August 30, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:36 AM | Comments (0) | TrackBack (0)
August 29, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 04:00 AM | Comments (0) | TrackBack (0)
August 28, 2006
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:12 AM | Comments (0) | TrackBack (0)
August 27, 2006
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:05 AM | Comments (0) | TrackBack (0)
August 25, 2006
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:36 AM | Comments (0) | TrackBack (0)
August 24, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:48 AM | Comments (0) | TrackBack (0)
August 23, 2006
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:06 AM | Comments (0) | TrackBack (0)
August 22, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:50 AM | Comments (0) | TrackBack (0)
August 21, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:01 AM | Comments (0) | TrackBack (0)
August 20, 2006
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:39 AM | Comments (0) | TrackBack (0)
August 19, 2006
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:19 AM | Comments (0) | TrackBack (0)
August 18, 2006
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 01:00 PM | Comments (0) | TrackBack (0)
Update Delayed
Permalink

Due to technical difficulties, the update of the Day by Day database will be delayed.

Posted by StatsGuru at 07:57 AM | Comments (0) | TrackBack (0)
August 17, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:55 AM | Comments (0) | TrackBack (0)
August 16, 2006
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:54 AM | Comments (0) | TrackBack (0)
August 15, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:20 AM | Comments (0) | TrackBack (0)
August 14, 2006
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:21 AM | Comments (0) | TrackBack (0)
August 13, 2006
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:44 AM | Comments (1) | TrackBack (0)
August 12, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:54 AM | Comments (0) | TrackBack (0)
August 11, 2006
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:51 AM | Comments (0) | TrackBack (0)
August 10, 2006
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:05 AM | Comments (0) | TrackBack (0)
August 09, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:53 AM | Comments (0) | TrackBack (0)
August 08, 2006
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:06 AM | Comments (0) | TrackBack (0)
August 07, 2006
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:34 AM | Comments (0) | TrackBack (0)
August 06, 2006
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:59 AM | Comments (0) | TrackBack (0)
August 05, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:11 AM | Comments (0) | TrackBack (0)
August 04, 2006
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:14 AM | Comments (0) | TrackBack (0)
August 03, 2006
Don't Tango with the Tiger
Permalink

TangoTiger takes apart Tom Verducci at The Book Blog. (Hat tip, Baseball Think Factory.)

Posted by StatsGuru at 09:00 PM | Comments (0) | TrackBack (0)
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:50 AM | Comments (0) | TrackBack (0)
August 02, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:19 AM | Comments (1) | TrackBack (0)
August 01, 2006
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:38 AM | Comments (0) | TrackBack (0)
July 31, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:58 AM | Comments (0) | TrackBack (0)
July 30, 2006
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:44 AM | Comments (0) | TrackBack (0)
July 29, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:15 AM | Comments (0) | TrackBack (0)
July 28, 2006
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:32 AM | Comments (0) | TrackBack (0)
July 27, 2006
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:23 AM | Comments (0) | TrackBack (0)
July 26, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:36 AM | Comments (0) | TrackBack (0)
July 25, 2006
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:26 AM | Comments (0) | TrackBack (0)
July 24, 2006
Should Barmes Bat Second?
Permalink

Every time I see Clint Barmes batting second for the Rockies, I cringe a little. The man is carrying a .273 OBA this season. Whether or not you think lineups matter much, putting that kind of OBA in front of the heart of the order just doesn't make a lot of sense.

But Barmes was put back in the second spot after the All-Star break and is on a tear. He's hit safely in eleven straight games. But more interesting, he does his best work in the second slot. Batting second this season, he's posting a .302 BA and a .331 OBA. In other slots, his BA is .163 and his OBA is .219. It works out pretty much the same for his career. You can argue that the sample sizes are pretty small, but I wonder if there's something else here? You can also see that Barmes' best base situation is a man on first. Is he particularly adept at taking advantage of the hole created by holding a runner at first? Not according to this hit chart. I'm open to speculation here. Why would Barmes be a better #2 hitter than other spots in the lineup?

Posted by StatsGuru at 09:00 AM | Comments (6) | TrackBack (0)
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:41 AM | Comments (0) | TrackBack (0)
July 23, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:31 AM | Comments (0) | TrackBack (0)
July 22, 2006
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 04:54 AM | Comments (0) | TrackBack (0)
July 21, 2006
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:24 AM | Comments (0) | TrackBack (0)
July 19, 2006
Mid-Week Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:55 AM | Comments (0) | TrackBack (0)
July 18, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:05 AM | Comments (0) | TrackBack (0)
July 17, 2006
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:51 AM | Comments (0) | TrackBack (0)
July 16, 2006
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:46 AM | Comments (0) | TrackBack (0)
July 15, 2006
Power Speed Oddities
Permalink

Dan Lewis at ArmChairGM notices some strange team relationships between home runs and stolen bases.

Posted by StatsGuru at 09:28 AM | Comments (0) | TrackBack (0)
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:51 AM | Comments (0) | TrackBack (0)
July 14, 2006
Post-Break Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:01 AM | Comments (0) | TrackBack (0)
July 13, 2006
Using WPA
Permalink

Dave Studeman at The Hardball Times finds some uses for WPA in evaluating teams. I do like his Pythagorean Breakouts, although I'm not sure we can't get there simply by looking at batting splits and opposition batting splits.

The Team Leverage Index is also somewhat interesting, since it tells us what teams play a lot of close games. Studes notes that the Indians and the Cubs must play a lot of boring games since their LI is small. I think you'd find the great teams also play boring games, as the truly great teams tend to win games by large margins. I have little doubt the 1998 Yankees would be at the bottom of the list.

Posted by StatsGuru at 07:22 AM | Comments (1) | TrackBack (0)
July 12, 2006
New Deal Statistics
Permalink

The All-Star game gives me a chance to criticize one of my least favorite statistics, Win Probability Added (WPA). The All-Star game graph shows my concern with this stat, and how some people are using it to evaluate players. Notice how little the early-game homer counts for Vlad vs. how much the late-game triple counts for Young. Now, each gave the AL a one-run lead, but because Young's hit came late, it's more valuable. If Vlad's homer had been the only run in the game, its value would not change. He would have had the most valuable hit in the game, but a late-inning leadoff triple by the NL (that never results in a run) would likely be rated more valuable.
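To make the complaint concrete, here is a minimal sketch of how WPA credits a play: it is the team's win probability after the play minus its win probability before. The win probabilities below are invented placeholders, not the actual All-Star game values, and the function name is my own.

```python
def wpa(win_prob_before: float, win_prob_after: float) -> float:
    """Win Probability Added for one play: the change in win probability."""
    return win_prob_after - win_prob_before


# Hypothetical win probabilities, for illustration only.
# Both plays put the team ahead by one run.
early_homer = wpa(0.50, 0.60)   # go-ahead hit early, lots of game left
late_triple = wpa(0.50, 0.85)   # go-ahead hit late, few outs remaining

print(f"early homer: {early_homer:+.2f}")
print(f"late triple: {late_triple:+.2f}")
```

Under these made-up numbers the late hit gets several times the credit of the early one, even though each was worth the same one run on the scoreboard.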

WPA is great if you're a gambler and trying to bet on the game as it progresses. But it's lousy if you're trying to buy talent. Let's say some GMs decide this is something worth using to evaluate players. Players then would try to maximize their value in this system. Since most plate appearances, positive or negative, result in very little change, the strategy for a player would be to only try in the late innings of close games! A player with a few hits in key situations looks like a star, while someone who contributes throughout the game but fails late looks like a loser. I just don't buy it.

This statistic is a tool to explain what happens, not a measure of a player's ability. To use it as such is wrong.

Posted by StatsGuru at 11:27 AM | Comments (10) | TrackBack (0)
July 10, 2006
All-Star Break Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:36 AM | Comments (0) | TrackBack (0)
July 09, 2006
Break Dancing
Permalink

Jacob Luft at Sports Illustrated uses the Day by Day Database to look at the best performances from break to break.

Posted by StatsGuru at 08:51 PM | Comments (0) | TrackBack (0)
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 05:20 AM | Comments (0) | TrackBack (0)
July 08, 2006
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:48 AM | Comments (0) | TrackBack (0)
July 07, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:18 AM | Comments (0) | TrackBack (0)
July 06, 2006
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:25 AM | Comments (0) | TrackBack (0)
July 05, 2006
Hump Day Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:47 AM | Comments (0) | TrackBack (0)
July 04, 2006
Independence Day Update
Permalink

Happy 4th of July! The Day by Day Database is up to date.

Posted by StatsGuru at 07:27 AM | Comments (0) | TrackBack (0)
July 03, 2006
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:09 AM | Comments (1) | TrackBack (0)
July 02, 2006
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 10:19 AM | Comments (0) | TrackBack (0)
July 01, 2006
Saturday Update
Permalink

The Day by Day Database is up to date.

Sorry for the lateness of the updates the past few days. The flooding in the northeast is causing power problems for my data supplier.

Posted by StatsGuru at 12:09 PM | Comments (0) | TrackBack (0)
June 30, 2006
Late Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 02:45 PM | Comments (0) | TrackBack (0)
Update Delayed
Permalink

Due to technical difficulties, the Day by Day Database will not be updated until later this afternoon.

Posted by StatsGuru at 10:02 AM | Comments (0) | TrackBack (0)
June 29, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:36 AM | Comments (0) | TrackBack (0)
June 28, 2006
Mid Week Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:32 AM | Comments (0) | TrackBack (0)
June 27, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:26 AM | Comments (0) | TrackBack (0)
June 26, 2006
Rainy Days and Mondays Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:29 AM | Comments (0) | TrackBack (0)
June 25, 2006
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:23 AM | Comments (0) | TrackBack (0)
June 24, 2006
Weekend Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:21 AM | Comments (0) | TrackBack (0)
June 23, 2006
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 05:17 AM | Comments (0) | TrackBack (0)
June 22, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:02 AM | Comments (0) | TrackBack (0)
June 21, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:50 AM | Comments (0) | TrackBack (0)
June 20, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:50 AM | Comments (0) | TrackBack (0)
June 19, 2006
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:03 AM | Comments (0) | TrackBack (0)
June 18, 2006
Father's Day Update
Permalink

The Day by Day Database is up to date.

Happy Father's Day to all the dads! Enjoy the day with your family!

Posted by StatsGuru at 08:07 AM | Comments (0) | TrackBack (0)
June 17, 2006
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 09:00 AM | Comments (0) | TrackBack (0)
June 16, 2006
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:16 AM | Comments (0) | TrackBack (0)
June 15, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:14 AM | Comments (0) | TrackBack (0)
June 14, 2006
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:10 AM | Comments (0) | TrackBack (0)
June 13, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:47 AM | Comments (0) | TrackBack (0)
June 12, 2006
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 04:19 AM | Comments (0) | TrackBack (0)
June 11, 2006
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:04 AM | Comments (0) | TrackBack (0)
June 10, 2006
Weekend Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:53 AM | Comments (0) | TrackBack (0)
June 09, 2006
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:37 AM | Comments (0) | TrackBack (0)
June 08, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:54 AM | Comments (0) | TrackBack (0)
June 07, 2006
Mid Week Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:15 AM | Comments (0) | TrackBack (0)
June 06, 2006
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:20 AM | Comments (0) | TrackBack (0)
June 05, 2006
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:13 AM | Comments (0) | TrackBack (0)
June 04, 2006
RBI Percentage Link
Permalink

For all of you coming from Gordon Edes' column, welcome! Here's the link for the RBI Percentage tool, and instructions, and an explanation of the stat.

There's a lot more information in the Day by Day Database. Daily batting and pitching logs for players going back to 1974, and splits for players and teams from the 2000 season on. I hope you find all this useful.

Thanks very much to Gordon for the link.

Posted by StatsGuru at 09:47 AM | Comments (1) | TrackBack (0)
Sunday Update
Permalink

The Day by Day Database is up to date. Note that a HR per 9 column was added to pitcher comparisons.

Posted by StatsGuru at 08:17 AM | Comments (0) | TrackBack (0)
June 03, 2006
Allowing Homers
Permalink

At the request of a reader, HR per 9 are now part of pitcher comparisons. You can find links to all functionality of the Day by Day Database here.

Posted by StatsGuru at 03:39 PM | Comments (0) | TrackBack (0)
Daily Dose of Data
Permalink

The Day by Day Database is up to date. Note that there was an update yesterday: you can now create a table showing RBI Percentage.

Posted by StatsGuru at 08:17 AM | Comments (0) | TrackBack (0)
June 02, 2006
RBI Percentage
Permalink

Recently I posted about RBI percentage in relation to Albert Pujols' run at Hack Wilson. Someone suggested I add that functionality to the Day by Day Database. It was a pretty easy change, so here's a tool to generate RBI Percentages. You can set the dates (so you can go back to the 2000 season), the sort column, the sort direction and the minimum number of runners on base. Here's the year to date chart through June 1, 2006. I hope you find this useful.

Posted by StatsGuru at 05:12 PM | Comments (6) | TrackBack (0)
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:50 AM | Comments (0) | TrackBack (0)
June 01, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:27 AM | Comments (0) | TrackBack (0)
May 31, 2006
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:25 AM | Comments (0) | TrackBack (0)
May 30, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:59 AM | Comments (0) | TrackBack (0)
May 29, 2006
Memorial Day Update
Permalink

The Day by Day Database is up to date.

On this Memorial Day, I'd like to thank all the military readers (past and present) of Baseball Musings for their service and send my deepest condolences for the losses you've suffered. I hope all of you come home safely.

Posted by StatsGuru at 08:42 AM | Comments (1) | TrackBack (0)
May 28, 2006
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:50 AM | Comments (0) | TrackBack (0)
May 27, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:44 AM | Comments (0) | TrackBack (0)
May 26, 2006
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:03 AM | Comments (0) | TrackBack (0)
May 25, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:44 AM | Comments (0) | TrackBack (0)
May 24, 2006
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:52 AM | Comments (0) | TrackBack (0)
May 23, 2006
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:25 AM | Comments (0) | TrackBack (0)
May 22, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:07 AM | Comments (0) | TrackBack (0)
May 21, 2006
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 09:08 AM | Comments (0) | TrackBack (0)
May 20, 2006
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:26 AM | Comments (0) | TrackBack (0)
May 19, 2006
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:33 AM | Comments (0) | TrackBack (0)
May 18, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:47 AM | Comments (0) | TrackBack (0)
May 17, 2006
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:49 AM | Comments (0) | TrackBack (0)
May 16, 2006
What a Day for an Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:52 AM | Comments (0) | TrackBack (0)
May 15, 2006
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:48 AM | Comments (0) | TrackBack (0)
May 14, 2006
Mother's Day Update
Permalink

The Day by Day Database is up to date.

Happy Mother's Day to all the moms who love baseball, or love someone who loves baseball.

Posted by StatsGuru at 08:53 AM | Comments (0) | TrackBack (0)
May 13, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:46 AM | Comments (0) | TrackBack (0)
May 12, 2006
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:13 AM | Comments (0) | TrackBack (0)
May 11, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:28 AM | Comments (0) | TrackBack (0)
May 10, 2006
Minor League Data
Permalink

Jeff Sackmann at Brew Crew Ball just published the minor league equivalent of the Day by Day Database. It's called the Minor League Splits Database and allows you to look at fairly up to date stats for current players in the minors. Check it out, and send Jeff a big thank you for his work. You can feast your eyes on Cole Hamels here.

Posted by StatsGuru at 03:11 PM | Comments (0) | TrackBack (0)
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:45 AM | Comments (0) | TrackBack (0)
May 09, 2006
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:12 AM | Comments (1) | TrackBack (0)
May 08, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:04 AM | Comments (0) | TrackBack (0)
May 07, 2006
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:06 AM | Comments (0) | TrackBack (0)
May 06, 2006
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:45 AM | Comments (0) | TrackBack (0)
May 05, 2006
Day by Day Database Question
Permalink

Retrosheet now provides complete play by play from 1974 on, with the exception of 1999. It would be fairly easy for me to make all this split data available in the Day by Day Database (game logs are complete back to 1974). Are people interested in this data? Do you think that making it available without 1999 means that people are going to make mistakes and publish it as career data, missing the caveat? Or should I wait for the 1999 data to be published, so that I don't need to worry about these things?

Posted by StatsGuru at 12:59 PM | Comments (7) | TrackBack (0)
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:06 AM | Comments (0) | TrackBack (0)
May 04, 2006
A Day in the Life
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:44 AM | Comments (0) | TrackBack (0)
May 03, 2006
Mid-Week Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:32 AM | Comments (0) | TrackBack (0)
May 02, 2006
Getting Close
Permalink

Peter at The Good Phight sends this post using Retrosheet data to look at the Phillies in the clutch. Peter defines clutch very narrowly (which is good), but it leads to small sample sizes (which is bad).

What really caught my eye, however, was that Retrosheet now has 2005 data available. So I went over to the events page and found that 1993-1998 data is also available! (As someone who donated to the cause, they might have sent out an e-mail.) That means we're one year (1999) away from having a completely public domain play-by-play from 1974 to 2005.

The Day by Day Database can then be populated with splits data for the careers of all active players, and some very interesting retired players. We'll have complete batter vs. pitcher records available to us. It's an exciting time to be a baseball researcher.

Posted by StatsGuru at 10:32 AM | Comments (4) | TrackBack (0)
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:50 AM | Comments (0) | TrackBack (0)
May 01, 2006
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:03 AM | Comments (0) | TrackBack (0)
April 30, 2006
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:32 AM | Comments (0) | TrackBack (0)
April 29, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:13 AM | Comments (0) | TrackBack (0)
April 28, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:04 AM | Comments (0) | TrackBack (0)
April 27, 2006
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:26 AM | Comments (0) | TrackBack (0)
April 26, 2006
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:17 AM | Comments (0) | TrackBack (0)
April 25, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:26 AM | Comments (0) | TrackBack (0)
April 24, 2006
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:51 AM | Comments (0) | TrackBack (0)
April 23, 2006
A Hit from the Bench
Permalink

The official scorer changed a ruling from last night's Giants-Rockies game, giving Barry Bonds an extra hit:

Official scorer Dave Einspahr reviewed Bonds' fourth at-bat from Saturday night, when pitcher Aaron Cook mishandled his comebacker and was charged with an error. Einspahr changed it to an infield single, raising Bonds' batting average from .206 to .235.

"It's kind of a do-or-die situation where, if he doesn't get it cleanly, it's not an easy play," Einspahr told The Associated Press on Sunday after the Giants had requested a review.

The hit takes Bonds much further from the Mendoza line.

Posted by StatsGuru at 03:47 PM | Comments (0) | TrackBack (0)
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:52 AM | Comments (0) | TrackBack (0)
April 22, 2006
Weekend Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:20 AM | Comments (0) | TrackBack (0)
April 21, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:34 AM | Comments (0) | TrackBack (0)
April 20, 2006
Pitchers and Pitches Per PA
Permalink

The Baseball Savant doesn't buy the idea that fewer innings pitched per start is mostly the result of more selective batters seeing more pitches per plate appearance.

Posted by StatsGuru at 12:25 PM | Comments (5) | TrackBack (0)
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:32 AM | Comments (0) | TrackBack (0)
April 19, 2006
Five Hundred Doubles
Permalink

Luis Gonzalez reached the 500 double mark last night, the forty-fourth player in the history of baseball to hit that milestone. Should this be another one of those milestones that put you in the Hall of Fame? It's an impressive accomplishment. You need to play a long time and have good power. Twenty-eight of the forty-four are in the Hall. Pete Rose would make it twenty-nine if he were eligible. Biggio, Ripken and Gwynn are going in at some point, and probably Bonds and Palmeiro. Roberto Alomar is also deserving. So Rose and those six would bring the total to 35 of 44. Seems like a Hall of Fame standard to me.

That said, I don't believe Luis will make it. The first half of his career wasn't anything special. In judging his whole career, his averages are just not outstanding, especially in an offensive era. A great career in your 30s isn't going to be enough to put him in the Hall.

Posted by StatsGuru at 08:40 AM | Comments (3) | TrackBack (0)
Mid-Week Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:07 AM | Comments (0) | TrackBack (0)
April 18, 2006
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:10 AM | Comments (0) | TrackBack (0)
April 17, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:50 AM | Comments (0) | TrackBack (0)
April 16, 2006
What's the Meaning of Slugging?
Permalink

Vlad Guerrero came into today's game against the Orioles with a .500 slugging percentage. Not surprising, since we think of Vlad as a power hitter. But Guerrero picked up just one extra-base hit in his first eleven games. His totals were 17 for 40 with one homer.

In other words, he was slugging without hitting for power. Whereas on-base average represents a probability, slugging percentage represents a distance. One way to think of it is as the average number of bases a batter advances himself per at bat. If you slug .500, you're halfway to first base at the end of each at bat.
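For anyone who wants to check the arithmetic, here's a small sketch of slugging percentage as total bases per at bat, using Vlad's line coming into the game (17 for 40, with the one homer his only extra-base hit, so 16 singles). The function name is my own.

```python
def slugging(singles: int, doubles: int, triples: int,
             home_runs: int, at_bats: int) -> float:
    """Slugging percentage: total bases divided by at bats."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


# Guerrero through eleven games: 16 singles and 1 home run in 40 at bats.
vlad_slg = slugging(singles=16, doubles=0, triples=0, home_runs=1, at_bats=40)
vlad_avg = 17 / 40

print(f"BA  {vlad_avg:.3f}")   # 0.425
print(f"SLG {vlad_slg:.3f}")   # 0.500
```

Sixteen of his twenty total bases came on singles, which is how a .425 average carries a .500 slugging percentage with almost no power.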

That distance also applies to runners on base. The higher the slugging percentage, the farther a batter is likely to advance a base runner. And there are two ways of doing it; hitting often, or launching lots of extra base hits. When a batter does both, you end up with a Hall of Famer.

Vlad picked up three more hits today, and two of them were home runs. He's getting back to doing both, and raised his slugging percentage to .644.

Posted by StatsGuru at 08:06 PM | Comments (1) | TrackBack (0)
Easter Update
Permalink

Happy Easter to all celebrating today!

The Day by Day Database is up to date.

Posted by StatsGuru at 07:17 AM | Comments (0) | TrackBack (0)
April 15, 2006
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:14 AM | Comments (0) | TrackBack (0)
April 14, 2006
Good Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:06 AM | Comments (0) | TrackBack (0)
April 13, 2006
Days of Our Lives
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:37 AM | Comments (0) | TrackBack (0)
April 12, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 05:27 AM | Comments (0) | TrackBack (0)
April 11, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:10 AM | Comments (0) | TrackBack (0)
April 10, 2006
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:12 AM | Comments (0) | TrackBack (0)
April 09, 2006
What are the Odds?
Permalink

Albert Pujols came into the night hitless in this series. The broadcast mentioned that he's gone hitless in a three game series or more only twice in his career. Pujols did get a hit in the fifth inning, driving in two and taking back the lead for the Cardinals.

But I wondered, how often should Pujols go hitless in a series? Albert is a .332 career hitter. That's the probability of his getting a hit in any one at bat if you know nothing else about the situation. If he gets 12 at bats in a three-game series, the probability of Albert getting at least one hit is .992. So if Albert plays 1000 three-game series, we'd expect him to go hitless in about 8 of those. Now, Pujols likely plays about 50 series a year of at least three games. In five years, that's 250 series. Since 250 is 1/4 of 1000, we'd expect that he would go hitless in about two series. Just right.
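The calculation can be sketched in a few lines. It assumes every at bat is an independent trial at his career average, which is a simplification, and the function name is my own.

```python
def prob_hitless(batting_avg: float, at_bats: int) -> float:
    """Chance of zero hits in `at_bats` independent at bats."""
    return (1.0 - batting_avg) ** at_bats


career_avg = 0.332        # Pujols' career average, from the post
at_bats_per_series = 12   # assumed at bats in a three-game series

p_hitless = prob_hitless(career_avg, at_bats_per_series)
p_at_least_one_hit = 1.0 - p_hitless

print(round(p_at_least_one_hit, 3))   # 0.992
print(round(1000 * p_hitless, 1))     # 7.9 hitless series per 1000
print(round(250 * p_hitless, 1))      # 2.0 hitless series in five years
```

Changing the at bats per series moves the answer quite a bit, so the "about two" figure depends on that assumption of 12.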

Posted by StatsGuru at 09:43 PM | Comments (4) | TrackBack (0)
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:01 AM | Comments (0) | TrackBack (0)
April 08, 2006
Afternoon Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 02:56 PM | Comments (0) | TrackBack (0)
April 07, 2006
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:22 AM | Comments (0) | TrackBack (0)
April 06, 2006
Strikeout Trend Doesn't Hold
Permalink

I reported yesterday that strikeouts were down compared to the same period in the previous year. That comparison didn't hold up through another day of games. K per 9 is right in line with the same period in 2003, 2004 and 2005, although it was much higher in 2001 and 2002.

However, runs are still up. So far, this is the highest-scoring start to a season this decade. In 2002, 10.2 runs per game were scored through the first three full days of games. We're at 11.2 for the start of 2006. The same with home runs. In fact, the three highest years this decade are the three years they've been testing for steroids!

Year    Runs per Game    HR per Game
2001     9.2             2.38
2002    10.2             1.97
2003    10.1             2.14
2004    10.2             2.38
2005     9.9             2.41
2006    11.2             2.84

And remember, this is with low scoring games at Coors and Texas. There's nothing definitive here. We'll see how this looks at the end of the month.

Posted by StatsGuru at 08:17 AM | Comments (2) | TrackBack (0)
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:18 AM | Comments (0) | TrackBack (0)
April 05, 2006
Strikeouts Down
Permalink

Through games of Tuesday, strikeouts are down this year compared with the same point (the first Tuesday) of 2005. Last year, batters struck out 7.0 times per nine innings. This year, 6.5 K per 9. It's too early to say if it's a trend. But I wonder if batters are trying harder to put the ball in play? Both runs and homers are up vs. the same time period last year.

Posted by StatsGuru at 05:46 PM | Comments (7) | TrackBack (0)
Hump Day Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:34 AM | Comments (0) | TrackBack (0)
April 04, 2006
Late Day
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 02:52 PM | Comments (1) | TrackBack (0)
April 03, 2006
Scoring Help
Permalink

Marshall Votta is looking for help with a scoring project. If you watch most of your favorite team's games, think about leaving a comment at this post.

Posted by StatsGuru at 11:04 PM | Comments (0) | TrackBack (0)
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:48 AM | Comments (0) | TrackBack (0)
April 01, 2006
Boxscore History
Permalink

The New York Times presents a slide show showing how the box score changed throughout the years. I'm quite proud that the STATS, Inc. box shown in the final slide comes from software I helped develop in my years at the company.

Posted by StatsGuru at 09:40 PM | Comments (1) | TrackBack (0)
March 20, 2006
Bill James in Esquire
Permalink

Balls, Sticks and Stuff reviews the new article about Bill James in Esquire.

Baseball Musings is holding a pledge drive during March. Click here for details.

Posted by StatsGuru at 10:14 AM | Comments (0) | TrackBack (0)
March 19, 2006
Splitting the Days
Permalink

There are two new additions to the Day by Day Database. The new programs allow you to compare batters on a split, or compare pitchers on a split. (The data for pitchers is actually opposition batting.)

Here are some examples:

Best OBA, #1 Hitters, 2000-2005.

Best OBA against #1 Hitters, 2000-2005

Most Home Runs by a Shortstop, 2000-2005

Most Home Runs by a Shortstop, 2004-2005

Highest Batting Average Allowed to Left-Handed Batters, 2000-2005

Most strikeouts with the bases loaded, 2000-2005

I hope you'll have some fun with this! You can drill down to very specific dates to compare players on a day, over a week, a month, or multiple seasons. Please let me know if you find any bugs or errors.

Update: Fixed a bug in the GDP sort. Thanks for pointing this out in the comments!

Baseball Musings is conducting a pledge drive in March. Click here for details.

Posted by StatsGuru at 10:13 PM | Comments (6)
March 15, 2006
Established Win Share Levels
Permalink

The Baseball Crank steps back in time to see how his Established Win Share Levels for 2005 did in predicting team performance.

Baseball Musings is conducting a pledge drive in March. Click here for details.

Posted by StatsGuru at 08:20 AM | Comments (0) | TrackBack (0)
March 06, 2006
Discussing Sabermetrics
Permalink

Baseball Analysts sits down with three respected sabermetricians to discuss the state of the art.

Baseball Musings is conducting a pledge drive in March. Click here for details.

Posted by StatsGuru at 08:43 AM | Comments (0) | TrackBack (0)
February 15, 2006
Balls In Play
Permalink

Studes at Major League Baseball Graphs presents fascinating data on balls in play for teams and individual players. Check out the explanation, then start exploring. His total net runs shows why it's good to be a groundball pitcher.

Posted by StatsGuru at 10:09 PM | Comments (1) | TrackBack (0)
January 21, 2006
Pitching Runs Created
Permalink

David Gassko has a new way of looking at runs allowed by pitchers called Pitching Runs Created. The method eliminates the need for marginal runs in comparing the worth of batters and pitchers.

Posted by StatsGuru at 08:47 AM | Comments (2) | TrackBack (0)
January 17, 2006
New Lahman Version
Permalink

I had almost given up hope that the Lahman database would be updated, but the new version is now available for download. A big thanks to Sean Lahman for putting this together every year.

Posted by StatsGuru at 01:35 PM | Comments (1) | TrackBack (0)
January 16, 2006
Ground and Air Averages
Permalink

A little over a week ago this site looked at who was responsible for getting ground balls. A reader wondered if we could plug batting stats into this method, so here it is. Let me remind you of the methodology:

To study this, I selected a group of pitchers that gathered 400 IP from 2002-2004. There were 101 pitchers in the group, and I divided them into quartiles on the probability of a ground ball. Quartile 1 is the group with the lowest probability of a ground ball, quartile 4 the highest. I also selected batters with 1000 plate appearances in that time frame. There were 243 batters in the study, also divided into quartiles on the same statistic.
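
A minimal sketch of the quartile split described above, assuming each player has already been reduced to a single ground ball probability. The function name and the sample values are made up for illustration; the code actually used for the study isn't shown here.

```python
def assign_quartiles(gb_prob):
    """Map each player to a quartile: 1 = lowest ground ball probability, 4 = highest.

    gb_prob: dict of player name -> ground ball probability (hypothetical input).
    """
    ranked = sorted(gb_prob, key=gb_prob.get)  # ascending by ground ball probability
    n = len(ranked)
    # The player at 0-based position i lands in quartile floor(4*i/n) + 1.
    return {name: (4 * i) // n + 1 for i, name in enumerate(ranked)}


# Made-up sample values, not data from the study.
sample = {"A": 0.38, "B": 0.52, "C": 0.44, "D": 0.61,
          "E": 0.47, "F": 0.41, "G": 0.55, "H": 0.49}
quartiles = assign_quartiles(sample)
```

With a group size that isn't a multiple of four, such as 101 pitchers, the groups can't be exactly equal; this version puts the extra player in quartile 1.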

Let's start with batting average. First here are the overall averages for the quartiles:

Batting Average by Quartile (1=least grounders, 4=most grounders)
Quartile   Vs. Pitchers   Batters
1          .259           .274
2          .261           .279
3          .269           .277
4          .265           .278

Remember, this study is looking at regulars on both sides of the ball, so we expect both groups to be better than average, hence the difference in batting average between batters and pitchers. Next, I pitted each pitcher quartile against each batter quartile. The 1-1 cell pits pitchers and batters who tend to get lots of balls in the air against each other, while the 4-4 cell pits pitchers and batters who keep the ball on the ground:

Pitcher Quartiles vs. Batter Quartiles, Batting Average
Pitcher    Batter Quartile
Quartile   1      2      3      4
1          .258   .259   .269   .272
2          .266   .275   .271   .270
3          .277   .294   .284   .281
4          .274   .283   .275   .268

It looks like quartile 2 hitters (slightly fewer ground balls) are the best for generating hits, while the quartile 1 pitchers (fewest ground balls) are best for preventing hits. Maybe these type 2 hitters represent a more versatile group, one that can adjust to the type of pitcher on the mound.
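
For anyone who wants to build a grid like this, here is a rough sketch of one way to fill the cells: pool hits and at-bats over every matchup in a pitcher/batter quartile pair, then divide. The record layout and the sample rows are assumptions for illustration, not the study's data or code.

```python
from collections import defaultdict


def quartile_grid(matchups):
    """Pool hits and at-bats per (pitcher quartile, batter quartile) cell.

    matchups: iterable of (pitcher_quartile, batter_quartile, at_bats, hits)
    tuples, a hypothetical record layout chosen for this sketch.
    Returns {(pq, bq): batting average} for cells with at least one at-bat.
    """
    totals = defaultdict(lambda: [0, 0])  # cell -> [at_bats, hits]
    for pq, bq, ab, h in matchups:
        totals[(pq, bq)][0] += ab
        totals[(pq, bq)][1] += h
    return {cell: h / ab for cell, (ab, h) in totals.items() if ab > 0}


# Made-up matchup lines, not data from the study.
rows = [(1, 1, 10, 2), (1, 1, 30, 8), (4, 4, 20, 5)]
grid = quartile_grid(rows)
```

Pooling the counts weights each cell by playing time, which is not the same as averaging the individual matchup averages.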

Now let's take a look at slugging percentage. First the average for the quartiles:

Slugging Percentage by Quartile (1=least grounders, 4=most grounders)
Quartile   Vs. Pitchers   Batters
1          .430           .482
2          .417           .469
3          .423           .443
4          .405           .409

Here, slugging percentage appears to have a lot more to do with the tendencies of the batter than the pitcher. Now let's look at quartile vs. quartile:

Pitcher Quartiles vs. Batter Quartiles, Slugging Percentage
Pitcher    Batter Quartile
Quartile   1      2      3      4
1          .463   .443   .441   .410
2          .464   .462   .432   .389
3          .478   .492   .455   .407
4          .458   .456   .424   .370

Again, slugging averages go down as you go from low grounder to high grounder batters. What's really fascinating is pitching quartile 3, which has the highest slugging percentages of all the pitchers in three of the four batting groups. Why is this?

It might be that this group are the ground ball pitchers who make mistakes. Bruce Hurst comes to mind. If he hung a pitch, it just got hammered. Pitching quartiles 1 & 2 are successful with balls in the air. Pitchers in quartile 4 don't hang pitches. But quartile three is where you find pitchers who make mistakes. I'll have to figure out what pitchers are in each quartile and see if that idea holds up.

Correction: Fixed the caption on the last table.

Posted by StatsGuru at 05:38 PM | Comments (2) | TrackBack (0)
January 15, 2006
The Price of Stats
Permalink

CBC Distribution and Marketing is taking legal action to free itself from needing a license to use baseball players' statistics to run its fantasy games.

CBC, which has run the CDM Fantasy Sports leagues since 1992, sued baseball last year after it took over the rights to the statistics and profiles from the Major League Baseball Players Association and declined to grant the company a new license.

Before the shift, CBC had been paying the players' association 9 percent of gross royalties. But in January 2005, Major League Baseball announced a $50 million agreement with the players' association giving baseball exclusive rights to license statistics.

When MLBAM took over the licenses from the MLBPA, I wrote the following:


It's my opinion that MLBAM should have kept the fees low and encouraged more fantasy games. Fantasy games are a growth industry; they create fans for major league baseball, and those fans spend money in the MLB.com store, attend MLB games and watch the advertising during broadcasts that keeps the teams running. They should be encouraging the growth of the industry with low license fees. If a court finds that the MLBAM has no right to license the stats, they'll end up with nothing.

So now MLBAM faces a dilemma. Precedent is on the side of CBC (see the previous link). If CBC wins, MLB gets nothing and loses total control of how players' names are used in conjunction with their stats. At this point, MLBAM may be better off giving CBC a sweet deal rather than taking the chance of losing in court. But that's what happens when you get greedy.

Posted by StatsGuru at 03:09 PM | Comments (15) | TrackBack (0)
January 07, 2006
Afternoon Reading
Permalink

Via Primery Numbers, a six-part walk-through, with criticism, of the Win Shares formula.

Posted by StatsGuru at 12:34 PM | Comments (3) | TrackBack (0)
December 19, 2005
More Splits
Permalink

The other day, we took a look at Jarrod Washburn's batting average allowed with runners in scoring position. It made me think that it would be convenient to have a display of a split by season. So, here's the latest addition to the Day by Day Database: Batter Splits by Season and Pitcher Splits by Season.

Enjoy!

Posted by StatsGuru at 09:29 PM | Comments (2) | TrackBack (0)
December 12, 2005
How Many Wins is a Player Worth
Permalink

A nice article by Alan Schwarz on different ways of estimating a player's win value. The Hardball Times gets a prominent mention.

Posted by StatsGuru at 08:32 AM | Comments (2) | TrackBack (0)
December 11, 2005
The Handedness Theory
Permalink

Dan Fox follows up on his Caribbean players and walks article with a look at how a lack of lefties among Caribbean hitters might account for the difference.

Posted by StatsGuru at 02:09 PM | Comments (3) | TrackBack (0)
December 10, 2005
Splitting up the Teams
Permalink

There's a new addition to the Day by Day Database. Team splits are now in place along with the player splits. Again, you can drill down with home/road, opponent and stadium choices. Enjoy!

Posted by StatsGuru at 06:30 PM | Comments (2) | TrackBack (0)
December 07, 2005
Splitting Hairs
Permalink

Blogging's been a bit light lately as I've been working on a new feature for the Day by Day Database. What I've wanted for a while is an updated database that lets me calculate batting in various situations. I noticed that Retrosheet.org posted event data for the years 2000-2004 on their site, so I thought this was a good time to give this program a shot.

As I worked on extracting the data, it occurred to me that it was possible to do a very cool splits program. Instead of just displaying splits by season, the user can input dates, looking at splits over any time frame. It was also possible to use the same selection criteria as the daily logs to drill down even further. So it's possible to look at splits for Bobby Abreu playing at home in Veterans Stadium.

Given the complicated nature of extracting splits for base runners, the database only contains batting events. The splits for pitchers are not pitching splits but opposition batting. Still, it's my belief that this is going to be a very useful research tool. I know of nothing like it on the web.

Baseball Info Solutions sold me the 2005 data and will provide updates in 2006. My thanks to Steve Moyer and Damon Lichtenwalner for their help on this project.

Find splits for batters here.

Find splits for pitchers here.

I've been testing the program, but my tests have not been exhaustive. If you find something that looks wrong, or have suggestions for improvements, let me know.

Enjoy!

Posted by StatsGuru at 10:16 PM | Comments (7) | TrackBack (0)
November 28, 2005
The Best Games
Permalink

Dennis Boznango at The Hardball Times looks at what makes a game exciting and develops a formula to rate the most exciting post-season games of all time. I was a bit surprised that you needed to factor in the state of the series to get Braves-Twins game 7, but not all people are as fond of 1-0 shutouts as I am. :-)

Posted by StatsGuru at 08:56 AM | Comments (5) | TrackBack (0)
November 14, 2005
Quod Erat Demonstrandum
Permalink

Rich Lederer at Baseball Analysts uses his QUAD statistic to look at the MVP races. What I like about this method is that it uses a voting technique to combine rankings of various categories. When I worked in information retrieval, this was one idea that was used to combine results from diverse search engines.
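
To illustrate the general idea of combining rankings by vote, here is a small sketch using a Borda count, one common way of merging ranked lists. Which voting rule QUAD actually uses isn't stated here, so treat Borda as a stand-in; the function name and the placeholder players are assumptions.

```python
def borda_combine(rankings):
    """Combine several ranked lists with a Borda count.

    rankings: list of lists, each ordered best to worst over the same players.
    A player earns (n - position) points per list; the highest total ranks first.
    """
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for pos, player in enumerate(ranking):
            scores[player] = scores.get(player, 0) + (n - pos)
    # Sort by total points, descending; ties are broken alphabetically.
    return sorted(scores, key=lambda p: (-scores[p], p))


# Placeholder names and orderings, not the actual category rankings.
by_category = [
    ["X", "Y", "Z"],
    ["Y", "X", "Z"],
    ["X", "Z", "Y"],
]
combined = borda_combine(by_category)
```

The appeal is that each category only contributes an ordering, so categories measured on different scales can be merged without normalizing them first.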

His numbers give the AL award to A-Rod easily while putting the NL hardware in the hands of Derrek Lee by a hair. We'll see if his NL prediction is as good as his AL prediction tomorrow.

Posted by StatsGuru at 09:24 PM | Comments (2) | TrackBack (0)
November 11, 2005
Batter Vs. Pitcher
Permalink

Dan Fox looks for statistical significance in batter vs. pitcher matchups. The Andersons win the battle as Garrett's 0 for 22 vs. Brian has the lowest probability of occurring.

Posted by StatsGuru at 05:40 PM | Comments (0) | TrackBack (0)
October 09, 2005
We Need a New Stat
Permalink

Farnsworth does not get a blown save, since he did not enter in a save situation. As my friend Jim Storer points out, Farnsworth pitched so badly, we don't even have a stat for what he did! I guess we need a blown lead stat to cover situations that aren't blown saves.

Posted by StatsGuru at 04:14 PM | Comments (1) | TrackBack (0)
October 03, 2005
Final Update
Permalink

The Day by Day Database is up to date.

Here are the major league leaders in:

Posted by StatsGuru at 07:07 AM | Comments (0) | TrackBack (1)
October 02, 2005
Penultimate Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:53 AM | Comments (0) | TrackBack (0)
October 01, 2005
Weekend Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:51 AM | Comments (0) | TrackBack (0)
September 30, 2005
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:37 AM | Comments (0) | TrackBack (0)
September 29, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:43 AM | Comments (0) | TrackBack (0)
September 28, 2005
Daily Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:27 AM | Comments (0) | TrackBack (0)
September 27, 2005
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:37 AM | Comments (0) | TrackBack (0)
September 26, 2005
Early Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 12:36 AM | Comments (0) | TrackBack (0)
September 25, 2005
Sunday Data Dump
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:57 AM | Comments (0) | TrackBack (0)
September 24, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:17 AM | Comments (0) | TrackBack (0)
September 23, 2005
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:15 AM | Comments (0) | TrackBack (0)
September 22, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:29 AM | Comments (0) | TrackBack (0)
September 21, 2005
Daily Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:59 AM | Comments (0) | TrackBack (0)
September 20, 2005
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:27 AM | Comments (0) | TrackBack (0)
September 19, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:50 AM | Comments (0) | TrackBack (0)
September 18, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:55 AM | Comments (0) | TrackBack (0)
September 17, 2005
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 09:00 AM | Comments (0) | TrackBack (0)
September 16, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:15 AM | Comments (1) | TrackBack (0)
September 15, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:14 AM | Comments (0) | TrackBack (0)
September 14, 2005
Tiger Tales
Permalink

Ivan Rodriguez is not the only Tiger going for an odd record. Placido Polanco is trying to be the first player since Eddie Murray in 1990 to have the highest batting average in the majors without winning a batting title.

Posted by StatsGuru at 11:55 AM | Comments (1) | TrackBack (0)
Daily Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:31 AM | Comments (0) | TrackBack (0)
September 13, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:38 AM | Comments (0) | TrackBack (0)
September 12, 2005
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:24 AM | Comments (0) | TrackBack (0)
September 11, 2005
Sunday Data Dump
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:05 AM | Comments (0) | TrackBack (0)
September 10, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 09:55 AM | Comments (0) | TrackBack (0)
September 09, 2005
Daily Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:19 AM | Comments (0) | TrackBack (0)
September 07, 2005
Mid-Week Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:28 AM | Comments (0) | TrackBack (0)
September 05, 2005
Labor Day Update
Permalink

Happy Labor Day! The Day by Day Database is up to date.

Posted by StatsGuru at 07:19 AM | Comments (0) | TrackBack (0)
September 04, 2005
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:44 AM | Comments (0) | TrackBack (0)
September 03, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 09:10 AM | Comments (0) | TrackBack (0)
September 02, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:44 AM | Comments (0) | TrackBack (0)
September 01, 2005
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:54 AM | Comments (0) | TrackBack (0)
August 31, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:40 AM | Comments (0) | TrackBack (0)
August 30, 2005
Daily Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 02:25 AM | Comments (0) | TrackBack (0)
August 29, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:27 AM | Comments (0) | TrackBack (0)
August 28, 2005
Believing Win Shares
Permalink

Looking at the Win Shares on the Hardball Times, I'm surprised to see Gary Sheffield ahead of Alex Rodriguez. Alex is having a much better year with the bat. The players are on the same team and both are right-handed so the park factors are the same. Yet Sheffield has two more offensive win shares than Rodriguez. How is that possible?

The answer lies in the clutch adjustments to win shares. When Bill James worked on making the runs created formula more accurate for players, he found he needed to make adjustments for batting average with runners in scoring position and home runs with men on base. He makes these adjustments based on comparing the situation to the player's overall averages.

When these adjustments are made for Alex Rodriguez, he loses 9.3 runs from the runs created formula (I'm doing the calculation through yesterday). He loses 6.9 for his hitting with runners in scoring position and 2.4 for his home runs with men on.

Meanwhile, Sheffield gains from his superior batting in these situations. He gains 11.5 runs from his batting with runners in scoring position and 5.9 for his home runs with men on base (19 of his 27 have come with men on base). That's 17.4 runs in the plus column for Gary. Overall, Sheffield picks up 24.3 runs in the adjustment! It's the big reason why Rodriguez and Sheffield are almost even in RBI, despite Alex having a superior batting average and power numbers.
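
The arithmetic on the component figures can be laid out in a few lines. This only adds up the four adjustments quoted above; it doesn't reproduce the runs created formula itself, and the dictionary layout is just a convenience for the sketch.

```python
# Clutch adjustments in runs, as quoted in the post.
adjustments = {
    "Rodriguez": {"risp": -6.9, "hr_men_on": -2.4},
    "Sheffield": {"risp": 11.5, "hr_men_on": 5.9},
}

# Net adjustment per player, rounded to one decimal like the quoted figures.
net = {player: round(sum(parts.values()), 1) for player, parts in adjustments.items()}

# Gap between the two players once both adjustments are applied.
swing = round(net["Sheffield"] - net["Rodriguez"], 1)
```

Taken together, the four components give a 26.7-run swing between the two players. The 24.3 figure matches 17.4 + 6.9, so it may leave out the home run portion of Rodriguez's loss, though that reading is a guess.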

Bill James believes these adjustments give him the best estimates for runs created. I don't know how much of this is random luck rather than the ability of the players. But right now, Win Shares gives the AL MVP to Gary Sheffield.

Posted by StatsGuru at 02:45 PM | Comments (5) | TrackBack (0)
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 09:08 AM | Comments (0) | TrackBack (0)
August 27, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:27 AM | Comments (0) | TrackBack (0)
August 26, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:19 AM | Comments (0) | TrackBack (0)
August 25, 2005
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:45 AM | Comments (0) | TrackBack (0)
August 24, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:30 AM | Comments (0) | TrackBack (0)
August 23, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:52 AM | Comments (0) | TrackBack (0)
August 22, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:22 AM | Comments (0) | TrackBack (0)
August 21, 2005
Daily Data Fix
Permalink

The Day by Day Database is up to date. It's also working again. My hosts had to move the site to a new server, which fixed the problem. I apologize if you tried to use the database for research over the weekend.

Posted by StatsGuru at 08:48 AM | Comments (0) | TrackBack (0)
August 20, 2005
Day by Day Repairs
Permalink

I'm told the server issues that are keeping the Day by Day Database from working will be resolved around 10 PM tonight. If I'm awake at that point, I'll let you know when it's been repaired. Thanks for your patience.

Posted by StatsGuru at 05:36 PM | Comments (0) | TrackBack (0)
Saturday Sort Of Update
Permalink

The Day by Day Database is up to date, however, my Python scripts are still not executing on the Hosting Matters server. I'll post when the problem is resolved.

Posted by StatsGuru at 08:12 AM | Comments (0) | TrackBack (0)
August 19, 2005
Day by Day Down
Permalink

The Day by Day Database is not currently working. My hosting service is working on the problem. I don't have an estimate as to when it will be fixed.

Posted by StatsGuru at 04:10 PM | Comments (1) | TrackBack (0)
Vacation Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 09:07 AM | Comments (0) | TrackBack (0)
August 18, 2005
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:53 AM | Comments (0) | TrackBack (0)
August 17, 2005
Vacation Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 09:51 AM | Comments (0) | TrackBack (0)
August 16, 2005
Player Graphs
Permalink

Fangraphs is a new site that compiles 21 different graphs for each major league player each day. Take a look, you may find some useful information there.

Posted by StatsGuru at 03:39 PM | Comments (3) | TrackBack (0)
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:29 AM | Comments (0) | TrackBack (0)
August 15, 2005
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:44 AM | Comments (0) | TrackBack (0)
August 14, 2005
Vacation Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:10 AM | Comments (0) | TrackBack (0)
August 13, 2005
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:51 AM | Comments (0) | TrackBack (0)
August 12, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:48 AM | Comments (0) | TrackBack (0)
August 11, 2005
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:45 AM | Comments (0) | TrackBack (0)
August 10, 2005
Triple Crown Chase
Permalink

Miguel Cabrera goes 2 for 4 with a walk tonight to raise his batting average to .349 and pass Derrek Lee for the batting average lead in the National League. Lee is now out of the top spot in all three categories.

Based on their careers, Albert Pujols probably has a higher probability of winning the triple crown this season. He's 2nd in RBI and 4th in home runs, and his current numbers are pretty much in line with his career. We might expect Derrek Lee to revert to his mean.

Posted by StatsGuru at 10:54 PM | Comments (3) | TrackBack (0)
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:03 AM | Comments (0) | TrackBack (0)
August 09, 2005
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:52 AM | Comments (0) | TrackBack (0)
August 08, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 12:03 AM | Comments (0) | TrackBack (0)
August 07, 2005
Daily Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 09:14 AM | Comments (0) | TrackBack (0)
August 06, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:53 AM | Comments (0) | TrackBack (0)
August 05, 2005
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:44 AM | Comments (0) | TrackBack (0)
August 04, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:58 AM | Comments (0) | TrackBack (0)
August 03, 2005
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:32 AM | Comments (0) | TrackBack (0)
August 02, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:52 AM | Comments (0) | TrackBack (0)
August 01, 2005
The Month that Was
Permalink

It looks to me that Jason Giambi will be player of the month in July. He had the best OBA and the best slugging percentage. A case certainly can be made for Adam Dunn in the NL, but Pujols, Miguel Cabrera, Aramis Ramirez and Griffey all deserve consideration.

And please scroll to the bottom of the OBA list to take a look at Vlad Guerrero. His averages were .208/.264/.376, yet he managed to drive in 21 runs. I guess he had a clutch month! Can you imagine his RBI total if he had hit closer to his career averages?

Ex-Yankees loom large in the list of best ERAs in July. Pettitte, Clemens and Halsey are all in the top 10. With three Astros starters in the top 10, it's easy to see why Houston was able to make up so much ground in July.

The Pirates must be very happy with Zach Duke. In his first month in the big leagues, he posts the best ERA in the majors.

At the other end of the list, Matt Clement probably had the worst month of any starter. Not only did he have an 8.88 ERA, but he ended the month in the hospital after taking a liner to the head. And Dontrelle Willis' poor July probably took him out of the ERA race. He's now more than a run and a half behind Clemens.

A pair of Johns get the hard luck pitcher of the month award. Lackey and Patterson had great strikeout totals, excellent ERAs and .500 records. Patterson especially, with his 1.54 ERA in six starts deserved more than a 1-1 record.

Posted by StatsGuru at 08:08 AM | Comments (0) | TrackBack (0)
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:30 AM | Comments (0) | TrackBack (0)
July 31, 2005
Fair Run Average
Permalink

Fair Run Average gets an article in the New York Times today. Basically, it's a way of correcting ERAs for bad relief pitching.

On average, putting runners on first and second with no outs - the mess Foster made - leads to 0.9 more runs than a typical inning. One of the runners often scores. Both of them usually do not.

In a fairer world, the main pitching statistic would have charged Foster for the damage he did, 0.9 runs, but no more, whether or not the two runners ended up scoring. Since they did score, Kolb would have gotten the rest of blame: 1.1 runs' worth.

Had the runners been stranded, Kolb would instead have received credit for his good work. His line could have been 1 inning with minus 0.9 runs allowed. His runs-allowed total for the season would actually drop.

The problem with calculations like this is they change season-to-season. So FRA is a moving target; as the probability of scoring changes with each game, a pitcher's FRA may change without him pitching. This is one of those statistics (like linear weights) which can only be accurately calculated after the season is over.
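
Here's a small sketch of the bookkeeping in the article's example, assuming the expected-runs value for the inherited situation is already known. The function name and signature are made up for illustration; the 0.9 figure is the one quoted from the article.

```python
def fair_runs(expected_inherited, runs_scored_after_change):
    """Split responsibility for inherited runners between two pitchers.

    expected_inherited: expected extra runs from the situation the first
        pitcher left behind (0.9 for first and second, no outs, per the article).
    runs_scored_after_change: runs that actually scored from that situation.
    Returns (charge to departing pitcher, charge to reliever).
    """
    departing = expected_inherited
    reliever = runs_scored_after_change - expected_inherited
    return departing, reliever


both_score = fair_runs(0.9, 2)  # Foster is charged 0.9, Kolb the remaining 1.1 or so
stranded = fair_runs(0.9, 0)    # Foster is still charged 0.9, Kolb is credited with 0.9
```

The moving-target issue shows up in the first argument: whenever the expected-runs table is recomputed, every charge derived from it shifts too.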

Posted by StatsGuru at 10:28 AM | Comments (3) | TrackBack (0)
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:23 AM | Comments (0) | TrackBack (0)
July 30, 2005
Data Dump
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:48 AM | Comments (0) | TrackBack (0)
July 29, 2005
Daily Dose of Data
Permalink

The Day by Day Database is now up to date.

Posted by StatsGuru at 09:48 AM | Comments (0) | TrackBack (0)
Daily Data Delayed
Permalink

The update of the Day by Day Database will be delayed today as I haven't received the data yet.

Posted by StatsGuru at 08:55 AM | Comments (0) | TrackBack (0)
July 28, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:46 AM | Comments (0) | TrackBack (0)
July 27, 2005
Hump Day Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:30 AM | Comments (0) | TrackBack (0)
July 26, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:49 AM | Comments (0) | TrackBack (0)
July 25, 2005
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:33 AM | Comments (0) | TrackBack (0)
July 24, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:29 AM | Comments (0) | TrackBack (0)
July 23, 2005
Saturday Data Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:09 AM | Comments (0) | TrackBack (0)
July 22, 2005
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:57 AM | Comments (0) | TrackBack (0)
July 21, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:05 AM | Comments (0) | TrackBack (0)
July 20, 2005
Mid-Week Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:04 AM | Comments (0) | TrackBack (0)
July 19, 2005
Enhanced Graphs
Permalink

Andrew Clark enhanced MajorLeagueCharts.com. Stop by and check out the improvements.

Posted by StatsGuru at 08:34 PM | Comments (0) | TrackBack (0)
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:06 AM | Comments (0) | TrackBack (0)
July 17, 2005
Late Night Update
Permalink

Due to games ending early today (I knew the ESPYs were good for something), the Day by Day Database is up to date.

Posted by StatsGuru at 10:19 PM | Comments (0) | TrackBack (0)
Daily Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:10 AM | Comments (1) | TrackBack (0)
July 16, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:07 AM | Comments (0) | TrackBack (0)
July 15, 2005
Day Break
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:06 AM | Comments (0) | TrackBack (0)
July 11, 2005
Distributed Scoring
Permalink

BitTorrent is software that allows you to load very large files very quickly by pulling small pieces of data from many different systems. Others are searching for compounds that might make new drugs by taking advantage of spare CPU time on computers connected to the internet. Distributing the work allows more to get done faster without burdening one particular machine or person.

I'm wondering if the same would work for scoring games. Companies that collect game data require two or three people to devote a few hours to watching and tracking a sporting event. It means you need to pay attention during the entire period. It's difficult to take a phone call, deal with your family, get a bite to eat, even go to the bathroom. These companies compensate the scorer with press box seats, or a small wage, or both.

And because they compensate their scorers, they need to charge for their stats. But what if, instead of requiring hours to score a game, advance sign-ups and so forth, all that was required was a small amount of a fan's leisure time? You're sitting down to watch a game for 1/2 an hour? Surf to the Distributed Scoring System and enter the plays. Or score on paper and enter it the next day.

Some people might do a full game, some might just do an inning. The idea would be to get enough people that you'd cover the whole game at least twice (for error checking purposes). The results would go into a play-by-play database from which you could then do any kind of research you wanted.

A rough specification of the system would look like this:

  • Web Based: It should run in almost any browser. It probably should be written in a server-side scripting language, since you can't depend on machines supporting JavaScript.
  • The smallest unit of scoring would be the half inning. Events in the half inning are dependent on previous events in the half inning, but each half inning is independent. If we made the smallest unit the plate appearance, the user would need to put in a lot more information for each PA.
  • The order of the innings shouldn't matter. People could score the innings backward, 9 to 1. They just need to input an inning beginning to end.
  • Input would be a pitch sequence followed by an event type. That would take you to another screen where the details of the event are entered, depending on the type of event and the situation.
  • Input should be as simple as possible. Single keystrokes should represent pitches and events as much as possible. Having worked with both text based systems and mouse based systems, I find the text based systems are faster for me.
  • Data would be stored in a temporary database until it could be verified.
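
As a rough illustration of the specification above, here's a minimal Python sketch of how half-inning submissions might be stored and checked. The names (`PlateAppearance`, `HalfInning`, `verify`), the game ID format, and the pitch and event codes are all hypothetical. The only ideas taken from the list are the half inning as the unit of scoring, a pitch sequence followed by an event as input, and holding data until independent entries agree.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class PlateAppearance:
    pitches: str  # hypothetical one-key codes, e.g. "BCX" = ball, called strike, in play
    event: str    # hypothetical event code, e.g. "63" = groundout short to first


@dataclass(frozen=True)
class HalfInning:
    game_id: str                       # hypothetical identifier format
    inning: int
    top: bool
    plays: Tuple[PlateAppearance, ...]


def verify(submissions: List[HalfInning]) -> Optional[HalfInning]:
    """Return a half inning once two submissions match exactly, else None."""
    seen: Dict[HalfInning, int] = {}
    for sub in submissions:
        seen[sub] = seen.get(sub, 0) + 1
        if seen[sub] >= 2:
            return sub  # two fans entered the same thing, so accept it
    return None         # still unverified; stays in the temporary store
```

Because each half inning is independent, fans could submit innings in any order, and a record would move out of the temporary store only after a second matching entry arrives. A real system would probably need a fuzzier comparison than exact equality.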

I'm interested in what people think about this idea, both from technical people on the feasibility of the programs, and from others on whether they would be willing to score like this. It's a way to give all bloggers, baseball researchers and fans on the net an easy way to research complicated questions.

Posted by StatsGuru at 03:11 PM | Comments (14) | TrackBack (0)
All-Star Break Update
Permalink

The Day by Day Database is now up to date through the end of the first half of the season.

Posted by StatsGuru at 08:22 AM | Comments (1) | TrackBack (0)
July 10, 2005
Lee Leading Triple Crown
Permalink

It looks like the NL first half will end with Lee in the triple crown lead. Derrek is first in BA and tied for first in home runs, while Carlos leads in RBI. Carlos is in no danger of winning the batting title, but Derrek is only four RBI behind Carlos for the top spot (pending Albert Pujols' performance tonight).

Derrek is now only five homers away from tying his career high, and 26 RBI from tying that mark. It's going to be a fun story to watch in the second half.

Correction: Derrek is behind Carlos, not himself. (That's what I get for blogging on very little sleep and lots of driving over the last two days.)

Posted by StatsGuru at 05:05 PM | Comments (1) | TrackBack (0)
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:36 AM | Comments (0) | TrackBack (0)
July 09, 2005
Saturday, at the Park
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:29 AM | Comments (0) | TrackBack (0)
July 08, 2005
Friday Update
Permalink

The Day by Day Database is up to date. Check out the new feature, team starter logs.

I'm going to be away this weekend, as my sister is getting married tonight and I'm attending the Red Sox-Orioles game tomorrow with my college roommates. I don't know what my internet connectivity is going to be like, so there's a possibility that the Day by Day Database won't get updated in a timely fashion over the next two days.

Posted by StatsGuru at 07:32 AM | Comments (3) | TrackBack (0)
July 07, 2005
Starter Log
Permalink

There's a new addition to the programs operating at the Day by Day Database. The starting pitcher log allows you to select a team, input dates, and see the starters for that team between those dates. It's a nice way of looking at a team's rotation. For example, here's the great run by the Athletics' starters.

Posted by StatsGuru at 08:48 PM | Comments (1) | TrackBack (0)
Thursday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:31 AM | Comments (0) | TrackBack (0)
July 06, 2005
Mid-Week Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:22 AM | Comments (0) | TrackBack (0)
July 05, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:06 AM | Comments (0) | TrackBack (0)
July 04, 2005
Holiday Update
Permalink

Happy Independence Day! I hope everyone's been enjoying the long weekend. The Day by Day Database is up to date.

Posted by StatsGuru at 08:36 AM | Comments (0) | TrackBack (0)
July 03, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:53 AM | Comments (0) | TrackBack (0)
July 02, 2005
Beach Day
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:34 AM | Comments (0) | TrackBack (0)
July 01, 2005
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:20 AM | Comments (0) | TrackBack (0)
June 30, 2005
Death to Batting Average!
Permalink

When sports writers start calling for batting average to be put to death as a statistic, you know the times are changing. Joe Posnanski does that today:

But wherever we turn, batting average has to go. Batting average is the horse and buggy. It is black-and-white television. Its time has passed. I don’t know how we go about getting rid of it, though. Lately, Congress has shown a lot of interest in baseball; maybe we could get them to forget about steroids and pass a constitutional amendment.

“I think this is too important an issue to be dealt with by trivial measures like a constitutional amendment,” Bill James said. “That’s just putting a Band-Aid on it.”

I agree with the article mostly. Batting average does have one good use, and that's awarding batting championships. Batting average recognizes that the game, in the fans' view, is about hitting. It rewards players who get lots of hits, but it also doesn't hurt players who draw lots of walks. If you have two players with 600 PA and 200 hits, the one who walks more gets the higher batting average, because walks don't count as at bats. So if you don't walk much, you need lots more hits to win a batting title (see Ichiro Suzuki or Tony Gwynn). Those hits are more valuable than the other person's walks, so batting average does a nice job of balancing the two.

It's just a poor tool for rating a player's ability. If batting average does die, I won't be sorry.
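
To make the arithmetic concrete, here's a small Python sketch. The walk totals (100 and 20) are made-up numbers for illustration, and the at-bat calculation is a simplification that ignores hit-by-pitches and sacrifices.

```python
def batting_average(plate_appearances: int, hits: int, walks: int) -> float:
    """Hits per at bat, treating at bats as plate appearances minus walks."""
    at_bats = plate_appearances - walks  # simplification: ignores HBP and sacrifices
    return hits / at_bats


# Two hypothetical players, each with 600 PA and 200 hits.
patient = batting_average(600, 200, walks=100)  # 200 / 500
free_swinger = batting_average(600, 200, walks=20)  # 200 / 580

print(f"patient: {patient:.3f}, free swinger: {free_swinger:.3f}")
```

The patient hitter comes out at .400 and the free swinger at about .345, even though both have the same number of hits.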

Thanks to Brian Hipp for the pointer to the article.

Posted by StatsGuru at 11:05 AM | Comments (8) | TrackBack (0)
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:55 AM | Comments (0) | TrackBack (0)
June 29, 2005
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:40 AM | Comments (0) | TrackBack (0)
June 28, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:59 AM | Comments (0) | TrackBack (0)
June 27, 2005
Predicted Winning Percentages
Permalink

It seems a comment left on this blog led Dan Fox to look at how accurate the Pythagorean method is mid-season.
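
For readers who haven't seen it, the Pythagorean method in its basic Bill James form estimates winning percentage from runs scored and runs allowed. Here's a minimal Python sketch. The exponent of 2 is the classic version, and the run totals in the example are made-up numbers, not any team's actual figures.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Expected winning percentage: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical mid-season team: 400 runs scored, 350 allowed.
expected = pythagorean_win_pct(400, 350)
print(f"{expected:.3f}")  # about .566
```

Comparing that expected figure to a team's actual winning percentage at mid-season is the kind of check the accuracy question is about.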

Posted by StatsGuru at 12:59 PM | Comments (5) | TrackBack (0)
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:04 AM | Comments (0) | TrackBack (0)
June 26, 2005
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:08 AM | Comments (0) | TrackBack (0)
June 25, 2005
Weekend Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:45 AM | Comments (0) | TrackBack (0)
June 24, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:59 AM | Comments (0) | TrackBack (0)
June 23, 2005
Daily Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:36 AM | Comments (0) | TrackBack (0)
June 22, 2005
Hump Day Data Dump
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:38 AM | Comments (0) | TrackBack (0)
June 21, 2005
Morning Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:18 AM | Comments (0) | TrackBack (0)
June 20, 2005
Monday Morning Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:22 AM | Comments (1) | TrackBack (0)
June 19, 2005
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 09:30 AM | Comments (0) | TrackBack (0)
June 18, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 09:44 AM | Comments (0) | TrackBack (0)
June 17, 2005
Team Left On Base
Permalink

A reader was looking for team left on base. Here's the table.

Team Left On Base through June 16, 2005.
Team LOB
ARI 520
NYA 518
BOS 518
PHI 504
SD 503
OAK 484
NYN 470
LAD 468
CIN 468
COL 467
WSH 465
STL 464
MIL 460
SF 454
CHN 452
MIN 451
PIT 450
FLA 449
SEA 443
BAL 442
TB 442
ATL 439
TEX 438
KC 432
TOR 428
CLE 427
HOU 417
LAA 410
DET 407
CHA 401
Posted by StatsGuru at 04:38 PM | Comments (7) | TrackBack (0)
Daily Update
Permalink

The Day by Day Database is up to date.

Today's update was sponsored by Baseball Digest Daily. Read their latest story on Roy Smith and the Dodgers here.

Posted by StatsGuru at 07:17 AM | Comments (0) | TrackBack (0)
June 16, 2005
Daily Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:13 AM | Comments (0) | TrackBack (0)
June 15, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:57 AM | Comments (0) | TrackBack (0)
June 14, 2005
Tuesday Data Dump
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:13 AM | Comments (0) | TrackBack (0)
June 13, 2005
Database Improvements
Permalink

When I'm using the comparison functions of the Day by Day Database, I've wanted to click on a player's name and see his log for that set of parameters. It's now coded and tested. You can try it here.

Posted by StatsGuru at 08:39 PM | Comments (5) | TrackBack (0)
Rainy Days and Mondays
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:32 AM | Comments (0) | TrackBack (0)
June 12, 2005
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:46 AM | Comments (0) | TrackBack (0)
June 11, 2005
Daily Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 09:25 AM | Comments (0) | TrackBack (0)
June 10, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:53 AM | Comments (0) | TrackBack (0)
June 09, 2005
Thursday Morning Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:56 AM | Comments (0) | TrackBack (0)
June 08, 2005
Mid-Week Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:19 AM | Comments (0) | TrackBack (0)
June 07, 2005
Daily Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:08 AM | Comments (1) | TrackBack (0)
June 06, 2005
Gaining Acceptance
Permalink

Larry Borowsky writes that the St. Louis Post-Dispatch has embraced VORP. Their lead columnist, Bernie Miklasz, used it today to compare Renteria and Eckstein. Larry is excited and rightly so.

During spring training, sports writers maintained blogs for both the Nationals and Reds. I noticed in reading those that there was a lot of talk about on-base average, especially in regards to lead-off hitters. It would have been rare to see that 20 years ago.

It's a great example of Kuhn's theories in The Structure of Scientific Revolutions. Basically, young people discover something new, and older folks who have a lot invested in the old idea are very slow to accept it. But younger people have no such investment, and they adopt the new idea quickly. As the old guard dies off, the new theory becomes the accepted one.

So now people my age (45) are taking charge of the sports desk. Like me, they were raised on Bill James and his successors. They start from batting average not being good enough, and are more willing to look at new stats. It takes time, but someday things like VORP will be as commonplace as batting average.

Posted by StatsGuru at 02:06 PM | Comments (0) | TrackBack (1)
Monday Morning Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:30 AM | Comments (0) | TrackBack (0)
June 05, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:25 AM | Comments (0) | TrackBack (0)
June 04, 2005
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:16 AM | Comments (0) | TrackBack (0)
June 03, 2005
Daily Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:51 AM | Comments (0) | TrackBack (0)
June 02, 2005
Daybreak Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:04 AM | Comments (0) | TrackBack (0)
June 01, 2005
The Merry Month of May
Permalink

With another month out of the way, let's look back and see who did well over the last 31 days. Bobby Abreu certainly looks like the offensive player of the month. He led the majors in both on base average and slugging percentage. He also tied Carlos Lee with 30 RBI.

Johnny Damon and Brian Giles led batters in reaching home plate, scoring 25 runs each. Giles did it with power, collecting 18 extra-base hits. And Jose Reyes almost had as many triples as walks, collecting 7 of the former and 8 of the latter. (I wonder what the record for triples in a month is?)

Todd Helton was the most feared hitter of the month, collecting 7 intentional walks, while Abreu had the best eye, getting 30 free passes. Adam Dunn missed the most, striking out 34 times.

On the pitching side, Kenny Rogers was a big reason for Texas having a great month. Kenny was the only pitcher with 25 innings in May to post a sub 1.00 ERA, coming in at 0.98. Amazingly, he did it striking out just 3.72 per 9 innings. He's no Kirk Rueter, however. Woody only struck out 1.30 per 9 in the month. At the other end of the scale, Johan Santana was tops in the majors at 9.78 K per 9.

Rogers was also the only six game winner in the majors in May, although five others went at least 4-0. Mark Buehrle was the work horse, amassing 48 innings without throwing a complete game. That's 8 innings a start. No one was pounded more than Ezequiel Astacio, who gave up 10 homers in 19 2/3 innings.

Russ Ortiz should be renamed the Walking Man as he issued 8.16 BB per 9. Meanwhile, Javier Vazquez did not issue a free pass in the month (although he did hit three batters).

It looks like Abreu and Rogers deserve player of the month honors. Starting today, we'll see who are the heroes of June.

Posted by StatsGuru at 08:24 AM | Comments (2) | TrackBack (0)
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:21 AM | Comments (0) | TrackBack (0)
May 31, 2005
Back to Work Day Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:46 AM | Comments (0) | TrackBack (0)
May 30, 2005
Holiday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 09:29 AM | Comments (0) | TrackBack (0)
May 29, 2005
Sunday Statistics
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:58 AM | Comments (0) | TrackBack (0)
May 28, 2005
Saturday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 09:06 AM | Comments (1) | TrackBack (0)
May 27, 2005
Get Away Friday
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:31 AM | Comments (0) | TrackBack (0)
May 26, 2005
Brain Damage
Permalink

Darren Viola finds that Retrosheet.org is a good way to make up for dead brain cells.

Posted by StatsGuru at 03:55 PM | Comments (0) | TrackBack (0)
Daily Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:53 AM | Comments (0) | TrackBack (0)
May 25, 2005
Wednesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:03 AM | Comments (0) | TrackBack (0)
May 24, 2005
Creating Runs
Permalink

Working with the new runs created formula in the last post made me want to see which teams are overperforming and underperforming their run expectations this season. Here's the chart showing runs vs. runs created for all 30 teams.

Team Runs Scored Runs Created Games Runs Per Game RC Per Game (RC/Games)
NYA 246 242.0 44 5.6 5.5
BOS 237 241.2 43 5.5 5.6
BAL 236 247.6 43 5.5 5.8
TEX 234 223.5 44 5.3 5.1
STL 232 222.0 44 5.3 5.0
LAD 214 219.0 43 5.0 5.1
ATL 212 192.2 44 4.8 4.4
TOR 211 204.9 44 4.8 4.7
COL 200 208.8 42 4.8 5.0
FLA 195 199.1 41 4.8 4.9
NYN 211 212.2 45 4.7 4.7
MIN 198 202.6 43 4.6 4.7
SD 202 206.5 44 4.6 4.7
CHA 205 193.7 45 4.6 4.3
TB 205 212.3 45 4.6 4.7
DET 188 192.2 42 4.5 4.6
CIN 195 199.2 44 4.4 4.5
SF 188 199.5 43 4.4 4.6
CHN 183 199.4 42 4.4 4.7
ARI 195 204.5 45 4.3 4.5
PHI 199 215.2 46 4.3 4.7
MIL 190 192.6 44 4.3 4.4
SEA 184 168.3 43 4.3 3.9
LAA 184 168.8 44 4.2 3.8
KC 183 167.2 44 4.2 3.8
WSH 184 193.4 45 4.1 4.3
CLE 170 174.7 43 4.0 4.1
OAK 169 164.1 43 3.9 3.8
PIT 163 173.1 42 3.9 4.1
HOU 160 172.0 44 3.6 3.9

This should be encouraging to Baltimore fans. The Orioles are in first place and scoring less than predicted. There is untapped potential offense there! It also should be somewhat worrying to White Sox fans, as Chicago is not only exceeding its expected won-lost record by three games, but that record is partially built on an overachieving offense.

It's difficult to believe the KC Royals are outperforming their expectation, but they are tied with the Angels and the Braves at .4 per game over their predicted value. The Royals are hitting 20 points lower with runners on than with the bases empty, but the hits they are getting are long hits. So they are doing a good job of driving runners around despite a low BA and OBA in the situation.

The Phillies are at the other end of the scale, .4 below their expected runs per game. With men on, the Phillies OBA goes up but their power goes down. It looks like opponents have found holes where they can pitch around the dangerous hitters with men on base. A return to form by Jim Thome would improve that situation.
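
If you'd like to reproduce the over/under numbers, here's a short Python sketch using four rows copied from the table above. The function name is my own invention. Note that it works from the unrounded totals, so the results can differ slightly from subtracting the two rounded per-game columns.

```python
# (runs scored, runs created, games), copied from the table above
teams = {
    "KC": (183, 167.2, 44),
    "PHI": (199, 215.2, 46),
    "BAL": (236, 247.6, 43),
    "CHA": (205, 193.7, 45),
}


def runs_over_expected_per_game(runs: float, runs_created: float, games: int) -> float:
    """Positive means the team scores more than runs created predicts."""
    return (runs - runs_created) / games


for team, (runs, rc, games) in teams.items():
    diff = runs_over_expected_per_game(runs, rc, games)
    print(f"{team}: {diff:+.2f} runs per game versus expectation")
```

The Royals come out around +0.36 and the Phillies around -0.35, which round to the .4 figures quoted in the text.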

Posted by StatsGuru at 09:50 AM | Comments (5) | TrackBack (0)
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:11 AM | Comments (0) | TrackBack (0)
May 23, 2005
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:18 AM | Comments (0) | TrackBack (0)
May 22, 2005
Sundae Sunday
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:46 AM | Comments (0) | TrackBack (0)
May 21, 2005
Sharing the Wins
Permalink

The Hardball Times has their first installment of 2005 Win Shares posted. This year they've improved the display greatly, allowing sorting on any column. Thanks to Studes and his team at HBT for this great resource.

Posted by StatsGuru at 08:01 PM | Comments (2) | TrackBack (0)
Graduation Day
Permalink

The Day by Day Database is up to date.

Congratulations to my nephew Alexei Saba on his high school graduation today!

Posted by StatsGuru at 08:15 AM | Comments (0) | TrackBack (0)
May 20, 2005
Friday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 10:12 AM | Comments (0) | TrackBack (0)
May 19, 2005
Day of the Jedi
Permalink

Up to date the Day by Day Database is.

Posted by StatsGuru at 07:03 AM | Comments (0) | TrackBack (0)
May 18, 2005
A Day in the Life
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:18 AM | Comments (0) | TrackBack (0)
May 17, 2005
Yankees Research
Permalink

Palace of the Fans makes good use of the Day by Day Database to research the Yankees streak. It's good to see these programs are useful to researchers. If you'd like to see anything added, feel free to send me a suggestion.

Posted by StatsGuru at 01:59 PM | Comments (1) | TrackBack (0)
Left On
Permalink

Two bloggers make points about leaving men on in last night's games. The Soxaholix is complaining about the Red Sox leaving runners, including two innings in which Boston had the bases loaded with less than two out and didn't score. Was Watching exaggerates the number left on base (13 not 23) but his point is still well taken; the Yankees had a lot more opportunities than they converted.

Both approach the Left On Base stat as a bad thing. That's not really true. Leaving lots of men on base is often a sign of strong offense, one that puts lots of men on base! Here are the thirty teams this season, ranked by most left on base:

Team Left On Base, 2005 Season, through May 16.
Team LOB
SD 310
NYA 304
BOS 302
PHI 296
ARI 295
OAK 290
LAD 282
CIN 275
TB 275
STL 274
WSH 274
NYN 274
SF 273
TOR 267
MIL 267
TEX 265
CHN 264
BAL 262
COL 260
MIN 259
PIT 253
HOU 251
CLE 251
SEA 250
CHA 248
FLA 248
ATL 248
DET 237
KC 235
LAA 223

Last year, if you listened to Boston sports radio during the first half of the year, the question on everybody's mind was what good are all these people on base if you don't drive them in. Eventually, they come around to score, and that's what happened in the second half of last season. If you look at the chart above, you see that the Yankees and Red Sox, the two highest scoring teams in the majors, are near the top. They leave a lot of men on base because they put a lot of men on base and score a lot of runs. So if you're a Phillies fan, I'd be encouraged by this chart. Your team gets plenty of opportunities and with some luck those will turn into runs.

And just note that the Angels, who have left the fewest, also have not generated a lot of runs this season. They're not leaving a lot on simply because there's not a lot to leave on. Right now, I'd much rather have Oakland's offense than Los Angeles's; both are weak, but at least the Athletics have the opportunities to drive in runs.

Update: Bill Ferris writes:

I agree with your point that LOB isn't necessarily a bad thing and can be indicative of a good offense. However, I don't agree with the Oakland versus Anaheim conclusion at the end.

I believe it was Tom Tippett that came up with run efficiency average, which is the runs scored divided by (total bases+walks+hbp) as a measure of throughput. I attached an Excel sheet which has TBW, REA, and TBW/game. The Angels and A's are both at the bottom in terms of TBW/game, so neither offense is good. However, the Angels have been more efficient at getting those runners home, while the A's are down near the bottom again.

Also interesting to note is that the White Sox have been very efficient despite not having a particularly strong offense.

REA is measuring what has happened, not an ability. I believe the current lack of power on the Athletics is an anomaly. When the power returns to Chavez and Durazo, the men on base will start coming around to score.
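
For anyone who wants to compute these for their own team, here's a minimal Python sketch of the two measures as Bill describes them. The function names are mine, and the numbers in the example are invented for illustration, not taken from his spreadsheet.

```python
def tbw(total_bases: int, walks: int, hit_by_pitch: int) -> int:
    """Total bases plus walks plus hit-by-pitches."""
    return total_bases + walks + hit_by_pitch


def run_efficiency_average(runs: int, total_bases: int, walks: int, hit_by_pitch: int) -> float:
    """Runs scored divided by TBW, as defined in the letter above."""
    return runs / tbw(total_bases, walks, hit_by_pitch)


def tbw_per_game(total_bases: int, walks: int, hit_by_pitch: int, games: int) -> float:
    return tbw(total_bases, walks, hit_by_pitch) / games


# Hypothetical team: 200 runs, 550 total bases, 140 walks, 10 HBP in 40 games.
print(f"REA: {run_efficiency_average(200, 550, 140, 10):.3f}")
print(f"TBW per game: {tbw_per_game(550, 140, 10, 40):.1f}")
```

TBW per game speaks to how many opportunities an offense creates, while REA speaks to how many of them have been cashed in so far.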

Posted by StatsGuru at 08:55 AM | Comments (1) | TrackBack (1)
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:12 AM | Comments (0) | TrackBack (0)
May 16, 2005
Monday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:20 AM | Comments (0) | TrackBack (0)
May 15, 2005
Sunday in Washington
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 09:37 AM | Comments (0) | TrackBack (0)
May 14, 2005
Saturday, In the Park
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 06:45 AM | Comments (0) | TrackBack (0)
May 13, 2005
Unlucky Update
Permalink

Friday the 13th gremlins delayed the update today, but the Day by Day Database is now up to date.

And if you have a few minutes and are so inclined, please fill out the Baseball Musings survey.

Posted by StatsGuru at 09:52 AM | Comments (0) | TrackBack (0)
May 12, 2005
Daily Update
Permalink

The Day by Day Database is up to date. And check out Orlando Palmeiro's interesting five game hit streak. That's making the most of your opportunities!

Posted by StatsGuru at 07:26 AM | Comments (0) | TrackBack (0)
May 11, 2005
Hump Day Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:01 AM | Comments (0) | TrackBack (0)
May 10, 2005
Daily Update
Permalink

The Day by Day Database is up to date.

If you have a few minutes, I would appreciate you taking a survey. It will help me with advertisers and it's anonymous. Thanks.

Posted by StatsGuru at 07:58 AM | Comments (2) | TrackBack (0)
May 09, 2005
Clutch Hitting Evidence
Permalink

Although I haven't heard from Elan Fuld yet, the reporter who penned the story last week about clutch hitters sent me the PowerPoint presentation. I haven't had a chance to study the details yet, but it looks like an interesting study. I'm hoping to interview the author and maybe get his permission to post the slide presentation.

Posted by StatsGuru at 02:15 PM | Comments (0) | TrackBack (0)
Monday, Monday
Permalink

The Day By Day Database is up to date.

Posted by StatsGuru at 08:42 AM | Comments (0) | TrackBack (0)
May 08, 2005
Mother's Day
Permalink

Happy Mother's Day to all the moms who love the national pastime, and especially to my wife Marilyn. The Day by Day Database is up to date.

Posted by StatsGuru at 05:38 AM | Comments (0) | TrackBack (0)
May 07, 2005
Music City Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:39 AM | Comments (0) | TrackBack (0)
May 06, 2005
I Do Believe In Spooks, I Do I Do I Do
Permalink

I shouldn't have bothered with the runners in scoring position research. Via the Baseball Crank, Elan Fuld has proven that clutch hitters exist! Unfortunately, a Google search for his name does not turn up the research. If you know Elan, have him contact me; I'd love to see his work.

Posted by StatsGuru at 07:08 PM | Comments (1) | TrackBack (2)
Country Update
Permalink

Live from Nashville, the Day by Day Database is up to date.

Posted by StatsGuru at 11:26 AM | Comments (0) | TrackBack (0)
May 05, 2005
Daily Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:55 AM | Comments (0) | TrackBack (0)
May 04, 2005
Streakin'
Permalink

There's a new addition to the Day by Day Database, current hit streaks. This will display all players with hit streaks in the current season of five games or more. Now I don't think a hit streak of five games is a big deal, but it's nice to see a trend coming early if it develops into something interesting.
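
For the curious, here's a rough Python sketch of how a current hit streak could be computed from a game log. This is not the database's actual code. The `(at_bats, hits)` log format is an assumption, and so is the simplified rule that a game with no official at bat neither extends nor ends a streak.

```python
from typing import List, Tuple


def current_hit_streak(game_log: List[Tuple[int, int]]) -> int:
    """game_log holds (at_bats, hits) per game, oldest game first."""
    streak = 0
    for at_bats, hits in reversed(game_log):  # walk back from the latest game
        if hits > 0:
            streak += 1
        elif at_bats == 0:
            continue  # assumption: no official at bat, so the streak is unaffected
        else:
            break     # hitless game with at bats ends the streak
    return streak


def is_listed(game_log: List[Tuple[int, int]], minimum: int = 5) -> bool:
    """Apply the five-game cutoff described above."""
    return current_hit_streak(game_log) >= minimum
```

A player whose last five games with an at bat all include a hit would make the list, even if one of those games was a single pinch-hit appearance.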

Posted by StatsGuru at 09:08 AM | Comments (0) | TrackBack (0)
Mid-Week Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:27 AM | Comments (0) | TrackBack (0)
May 03, 2005
Daily Dose of Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:21 AM | Comments (0) | TrackBack (0)
May 02, 2005
Daily Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:13 AM | Comments (2) | TrackBack (0)
May 01, 2005
Easter Update
Permalink

The Day by Day Database is up to date. And happy Easter to all my Eastern Orthodox readers!

Posted by StatsGuru at 08:40 AM | Comments (1) | TrackBack (0)
April 30, 2005
Daily Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:38 AM | Comments (0) | TrackBack (0)
April 29, 2005
Friday Morning Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 07:40 AM | Comments (0) | TrackBack (0)
April 28, 2005
If It's Thursday, This Must Be an Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:23 AM | Comments (0) | TrackBack (0)
April 27, 2005
Daily Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:15 AM | Comments (0) | TrackBack (0)
April 26, 2005
RISP Confidence Chart
Permalink

I wanted to follow up the scoring position post from Sunday with a graph. (Click on the image for a larger view.)

Overall BA vs. RISP BA.

As you can see, the trend line's slope is pretty close to 1, meaning that if you want to predict a player's BA with runners in scoring position, you'll make a pretty good guess if you pick his career average.

Posted by StatsGuru at 07:19 PM | Comments (0) | TrackBack (0)
One Run Games and Winning
Permalink

Bill Ferris sent me this post on the correlation of winning in one-run games and winning overall. His nice graphs show the randomness of the whole thing. Notice that the correlation between overall winning percentage and winning percentage in one-run games is .57. That means, given any team, you could probably do as well guessing if they are above or below their overall percentage in one-run games.

Posted by StatsGuru at 08:17 AM | Comments (1) | TrackBack (1)
Tuesday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:11 AM | Comments (0) | TrackBack (0)
April 24, 2005
RISP Confidence
Permalink

Reading the Bill James article this morning made me wonder if there was a way to study the situational hitting issue in a different way. I have compiled a database from Retrosheet of every play from 1974 to 1992. (Those are the years Retrosheet has complete seasons.) I decided to look at the data to see if anyone hit significantly better or worse with runners in scoring position over that time.

First, I do not mean to say this is a clutch statistic. My opinion is that it is very difficult to define a clutch statistic. For every situation you name, I'll come up with a subset within it that's not clutch. As you eliminate more and more non-clutch situations, you end up with a very small sample size of real clutch situations for players. But runners in scoring position (RISP) is a nice proxy.

I decided to study the group of players during that time frame who had at least 1500 AB. At 1500 AB, we're getting a good handle on the ability of a hitter. Someone who hits .300 over 1500 AB has a 95% confidence interval of .277 to .323. In other words, it's very unlikely for a .300 hitter to hit .260 over 1500 AB.
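
Here's how that interval can be worked out, as a minimal Python sketch. It assumes each at bat is an independent trial and uses the normal approximation to the binomial, with 1.96 standard errors for 95%. The function name is just for illustration.

```python
import math


def ba_confidence_interval(avg: float, at_bats: int, z: float = 1.96):
    """Approximate confidence interval for a batting average over at_bats trials."""
    std_err = math.sqrt(avg * (1 - avg) / at_bats)  # binomial standard error
    return avg - z * std_err, avg + z * std_err


low, high = ba_confidence_interval(0.300, 1500)
print(f"{low:.3f} to {high:.3f}")  # 0.277 to 0.323
```

The standard error at 1500 AB is about 12 points of batting average, which is why a 40-point miss like .260 is so unlikely.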

So for each player with 1500 AB from 1974 to 1992, I recorded their AB, hits, RISP AB and RISP hits. I took hits/AB to represent the probability of that player getting a hit. I used that probability to calculate a 95% confidence interval for the expected number of RISP hits given the number of RISP AB.
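
A sketch of that calculation in Python follows, using Jay Bell's line from the results in this post (2444 AB, 627 hits, 535 RISP AB, 158 RISP hits). It uses the normal approximation to the binomial, which is an assumption on my part for this illustration. An exact binomial calculation or different rounding can move the whole-number bounds by a hit.

```python
import math


def risp_hit_interval(at_bats: int, hits: int, risp_ab: int, z: float = 1.96):
    """Approximate 95% interval for RISP hits if the player hits at his overall rate."""
    p = hits / at_bats                        # overall probability of a hit
    expected = risp_ab * p                    # expected RISP hits
    sd = math.sqrt(risp_ab * p * (1 - p))     # binomial standard deviation
    return expected - z * sd, expected + z * sd


low, high = risp_hit_interval(at_bats=2444, hits=627, risp_ab=535)
risp_hits = 158
print(f"interval: {low:.1f} to {high:.1f}, actual: {risp_hits}")
print("above" if risp_hits > high else "below" if risp_hits < low else "within")
```

Bell's 158 RISP hits land just above the top of the interval, which is how a player ends up on the "above" list.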

There were 567 players in the study. Now, my expectation would be that 95% of the players would be within the 95% confidence interval. In other words, given 567 players, I would expect 14 hitters to be above the interval and 14 to be below the interval.

Instead, the study only found 10 players to be above their hit expectation and 4 to be below.

So of the 567 players studied, 553 had hits with runners in scoring position within the 95% confidence interval! Players hit with runners in scoring position just as we'd expect them to hit.

Here's a list of the players who hit above their expected level with runners in scoring position. Pat Tabler appears to have deserved his reputation.

Players Above 95% Confidence Interval
Player | At Bats | Hits | RISP AB | RISP Hits | Low End 95% CI | High End 95% CI | BA | RISP BA
Jay Bell | 2444 | 627 | 535 | 158 | 118 | 157 | 0.257 | 0.295
Robin Ventura | 1736 | 470 | 432 | 137 | 99 | 135 | 0.271 | 0.317
Greg Vaughn | 1538 | 360 | 417 | 117 | 81 | 115 | 0.234 | 0.281
Mark McGwire | 3123 | 772 | 781 | 220 | 170 | 217 | 0.247 | 0.282
Pat Tabler | 3911 | 1101 | 1096 | 347 | 280 | 338 | 0.282 | 0.317
Scott Fletcher | 4411 | 1155 | 1058 | 314 | 249 | 305 | 0.262 | 0.297
Larry Parrish | 6792 | 1789 | 1717 | 491 | 417 | 488 | 0.263 | 0.286
John Ellis | 1573 | 409 | 394 | 124 | 86 | 120 | 0.260 | 0.315
Rennie Stennett | 3532 | 966 | 800 | 248 | 194 | 244 | 0.273 | 0.310
Frank Duffy | 1864 | 422 | 415 | 118 | 78 | 111 | 0.226 | 0.284

Here's the list of players who hit below expectations. Mickey Tettleton, one of my favorites, is on the list.

Players Below 95% Confidence Interval
Player | At Bats | Hits | RISP AB | RISP Hits | Low End 95% CI | High End 95% CI | BA | RISP BA
Mickey Tettleton | 2873 | 693 | 707 | 147 | 148 | 193 | 0.241 | 0.208
Dave Anderson | 2026 | 490 | 461 | 93 | 94 | 130 | 0.242 | 0.202
Rick Schu | 1564 | 386 | 348 | 65 | 70 | 102 | 0.247 | 0.187
Earl Williams | 1518 | 365 | 398 | 78 | 79 | 113 | 0.240 | 0.196

My conclusion is that there's no difference between a player's overall batting average and his batting average with runners in scoring position that can't be explained by luck. If you'd like to play with the data, here's a spreadsheet you can download. As always, I'm interested in your comments on this experiment.

I've put the table of all players in the study sorted by RISP BA in the extended entry. Notice how the good hitters tend to be good with runners in scoring position.

Update: I was not able to fit the entire table in the extended entry. You can see the rest by downloading the spreadsheet and manipulating the data.


Posted by StatsGuru at 05:32 PM | Comments (12) | TrackBack (2)
Fog and Shadows
Permalink

David Leonhardt in the New York Times discusses a recent article in The Baseball Research Journal by Bill James. In the article, Bill argues that a method used to evaluate if clutch hitters exist is not a viable method.

All “real” skills in baseball (or anything else) are persistent at least to some extent. Intelligence, bicycle riding, alcoholism, income-earning capacity, height, weight, cleanliness, greed, bad breath, the ownership of dogs or llamas and the tendency to vote Republican . . . all of these are persistent phenomena. Everything real is persistent to some measurable extent. Therefore, if something cannot be measured as persistent, we tend to assume that it is not real.

Bill argues that too much noise in the data makes measuring the persistence of certain statistics nearly impossible. One thing I love about Bill is that if he feels he made a mistake in doing research, he comes right out and says it, and this is one of those articles. Mind you, he's not saying that clutch ability exists; he's saying that persistence studies can't prove that it doesn't exist.

Here's his conclusion about clutch hitting:

On (1), it is my opinion that this should be regarded as an open question. While Dick Cramer is a friend of mine, and I have tremendous respect for his work, I am convinced that, even if clutch-hitting skill did exist and was extremely important, this analysis would still reach the conclusion that it did, simply because it is not possible to detect consistency in clutch hitting by the use of this method.

So there you have it. Clutch hitting is an open question. Sounds like a great research project for a budding sabermetrician!

One thing I've noticed during my research time is that when I did long-term studies of players hitting overall vs. hitting in various situations, the same players tended to cluster together, only in different orders. At the end of 2003 (when I still had access to the STATS, Inc. database of situational hitting from 1974 to the present), I looked at slugging percentage with runners in scoring position from 1974-2003. The top 25 in that list were the same as the top 25 for overall slugging percentage from 1974-2003, just in a different order (I believe it was based on a minimum of 5000 overall AB). A study along those lines for batting average or OBA would go a long way toward showing the existence or nonexistence of this ability.

Posted by StatsGuru at 09:29 AM | Comments (2) | TrackBack (1)
Sunday Update
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:11 AM | Comments (0) | TrackBack (0)
April 23, 2005
Day in the Park
Permalink

The Day by Day Database is now up to date.

Posted by StatsGuru at 08:44 AM | Comments (0) | TrackBack (0)
April 22, 2005
Checking the Numbers
Permalink

Following up on a post from a week ago, scoring is still down about 1/2 a run from the same time last year:

First Nineteen Days | 2004 | 2005
Games | 230 | 237
Runs | 2273 | 2205
Runs/Game | 9.9 | 9.3
Home Runs | 522 | 469
HR/Game | 2.3 | 2.0

The Major League save percentage is up closer to where it was last year, however. Through the first nineteen days of 2004, it was 64.2% (115/179). So far in 2005 it's 61.6% (106/172).
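
For anyone who wants to recheck the rates, here is a minimal Python sketch that recomputes them from the raw totals above. The dictionary layout and function names are invented for the example, and treating the 179 and 172 denominators as save opportunities is an assumption based on how the fractions are presented.

```python
def per_game(total, games):
    """Average per game (both teams combined, as in the table)."""
    return total / games

def pct(successes, chances):
    """Success rate expressed as a percentage."""
    return 100.0 * successes / chances

# Totals through the first nineteen days of each season, copied from the post.
seasons = {
    2004: {"games": 230, "runs": 2273, "hr": 522, "saves": 115, "save_opps": 179},
    2005: {"games": 237, "runs": 2205, "hr": 469, "saves": 106, "save_opps": 172},
}

for year, s in seasons.items():
    print(
        year,
        round(per_game(s["runs"], s["games"]), 1),   # runs per game
        round(per_game(s["hr"], s["games"]), 1),     # home runs per game
        round(pct(s["saves"], s["save_opps"]), 1),   # save percentage
    )
```
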
Posted by StatsGuru at 01:41 PM | Comments (3) | TrackBack (0)
Day Late
Permalink

I forgot to mention earlier this morning that the Day by Day Database is up to date.

Posted by StatsGuru at 11:31 AM | Comments (0) | TrackBack (0)
April 21, 2005
New Day
Permalink

The Day by Day Database is now up to date.

Posted by StatsGuru at 07:54 AM | Comments (0) | TrackBack (0)
April 20, 2005
Day Break
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 05:51 AM | Comments (0) | TrackBack (0)
April 19, 2005
Poor Starts
Permalink

Curveblog uses the Day by Day Database to look at some poor starts by Cardinal hitters over the years.

Posted by StatsGuru at 11:55 AM | Comments (1) | TrackBack (0)
Daily Data
Permalink

The Day by Day Database is now up to date.

Update: Implemented a speed up to load the names of players much faster when you select daily logs for batters or pitchers.

Posted by StatsGuru at 08:41 AM | Comments (0) | TrackBack (0)
April 18, 2005
Patriot's Day
Permalink

On the celebration of the 230th anniversary of the start of the American Revolution, the Day by Day Database is up to date.

Posted by StatsGuru at 06:22 AM | Comments (1) | TrackBack (0)
April 17, 2005
Day Oh!
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:58 AM | Comments (0) | TrackBack (0)
April 16, 2005
Day Updates
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:11 AM | Comments (0) | TrackBack (0)
April 15, 2005
Tax Day
Permalink

The Day by Day Database is up-to-date. Use it to audit your favorite players. :-)

Posted by StatsGuru at 07:25 AM | Comments (0) | TrackBack (0)
April 14, 2005
Day Updates
Permalink

The Day by Day Database is updated through games of Wednesday, April 13, 2005.

Posted by StatsGuru at 08:22 AM | Comments (0) | TrackBack (0)
April 13, 2005
DIPS Developments
Permalink

Nate Silver shows some research at Baseball Prospectus that pitchers with good change-ups have more control as to whether balls put into play turn into hits than other types of pitchers.

Posted by StatsGuru at 02:53 PM | Comments (1) | TrackBack (0)
New Day
Permalink

The Day by Day Database is up to date. In case you missed it, pitcher comparisons are now available.

Posted by StatsGuru at 08:39 AM | Comments (0) | TrackBack (0)
April 12, 2005
Comparing Pitchers
Permalink

The latest addition to the Day by Day Database is pitcher comparisons. For example, here's a listing of the pitchers with the best K per 9 during the current decade. I hope you find this useful.

Posted by StatsGuru at 07:19 PM | Comments (7) | TrackBack (0)
Up With Updates
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:02 AM | Comments (0) | TrackBack (0)
April 11, 2005
Daily Data
Permalink

The Day by Day Database is up to date.

Posted by StatsGuru at 08:30 AM | Comments (1) | TrackBack (0)
April 10, 2005
Day Break
Permalink

I'm on the road, but Panera Bread has free wireless access in their stores. So I'm able to eat breakfast and update the day by day database at the same time. It's now current through Saturday April 9, 2005.

Posted by StatsGuru at 08:57 AM | Comments (0) | TrackBack (0)
April 09, 2005
Database Update
Permalink

The Day by Day Database is updated.

Posted by StatsGuru at 09:25 AM | Comments (0) | TrackBack (0)
April 08, 2005
Update
Permalink

The Day by Day Database is updated.

Posted by StatsGuru at 08:09 AM | Comments (0) | TrackBack (0)
April 06, 2005
Your Donation Dollars at Work
Permalink

Thanks to your donations, the Day By Day Database will be updated daily. Baseball Info Solutions is supplying me with the data for a very reasonable cost. Stats are now current through yesterday.

If you'd like to contribute to this site, you can do so by clicking on one of the icons below:


If everyone who visits in April donated $1, this site could run independently for a year. If you're a regular reader, consider giving $10. Why not join the almost two hundred readers who have donated so far?

Posted by StatsGuru at 01:54 PM | Comments (1) | TrackBack (0)
April 04, 2005
Day Care
Permalink

A new addition to the Day By Day Database: pitcher game logs. Check it out and let me know if you see any problems.

Posted by StatsGuru at 08:36 PM | Comments (1) | TrackBack (1)
March 29, 2005
How Old are You Now?
Permalink

The Boston Herald continues its series on the state of baseball with a look at the aging of players:

Take a guess how many of today's major leaguers are under 21?

Zero, that's how many.

The 40-year-old baseball player was once a genuine oddity. It's still not as if the majors are teeming with geezers today, it's just that they're getting more attention. Forty-year-old Barry Bonds is the defending National League batting champion. Roger Clemens, at age 42, led the NL in winning percentage last year with an 18-4 record. And then there's that absolute freak of nature, Braves first baseman Julio Franco, who hit .306 last season. Franco celebrated his 46th birthday last August.

Sometimes, however, perceptions don't reflect reality. The Lahman database has enough information to create charts of age for the major leagues. Plotted below are the average age of batters, weighted by plate appearances, and the average age of pitchers, weighted by innings pitched. Click on each graph to get a larger view.

Chart of Average Age per Plate Appearance.


Chart of Average Age per Inning Pitched.
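
The weighting behind those charts can be sketched as follows. This is a minimal illustration of a playing-time-weighted average, not the actual query run against the Lahman database; the function name and the three-player sample are made up.

```python
def weighted_average_age(players):
    """Average age weighted by playing time.

    players: iterable of (age, weight) pairs, where weight is plate
    appearances for batters or innings pitched for pitchers.
    """
    players = list(players)
    total_weight = sum(weight for _, weight in players)
    if total_weight == 0:
        raise ValueError("no playing time to weight by")
    return sum(age * weight for age, weight in players) / total_weight

# Hypothetical league: the 40-year-old barely plays, so he moves the average little.
batters = [(24, 600), (31, 500), (40, 100)]
print(weighted_average_age(batters))
```

Weighting this way keeps a seldom-used veteran or a September call-up from counting as much as an everyday player.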

Indeed, the trend in age is up. It's also pretty clear that the money generated by free agency is a big reason. There was a huge dip from around 1950 to 1970 in age. During that time, the owners cemented their grip on the players with the reserve clause and the amateur draft. Unless you were a superstar, there was no reason to stay in baseball into your late 30s. The start of the trend up corresponds to the free agent era. From the article:

There are good and plentiful reasons why today's players hang around, or try to, longer than their counterparts of yesteryear. For one thing, superior conditioning methods (and in some cases, chemical enhancement) mean that they can linger. For another, modern-day salaries offer an irresistable inducement. A few generations back, a 35-year-old future Hall of Famer could make more money selling cars, with less risk of embarrassment, than he might have with another year's baseball salary. Moreover, in today's free-spending baseball world, the only way some teams can sign free agents is by offering long-term, multi-year, guaranteed contracts likely to extend well beyond the useful shelf life of the player.

One thing many have noticed about the numbers generated by the probabilistic model of range is that it was a below-average year for baseball in general. Could part of this be due to older players taking the field? If not for the anomaly of World War II, 2004 would have had the oldest batters (and position players) in the history of baseball. I have to believe that's hurting defense.

As I think about a number of teams this season, I'm struck by how old they are getting. The Giants, Yankees and Red Sox look especially long in the tooth to me. Last year was the year of the old player. Will this be the year that age catches up to them?

Baseball Musings is holding a pledge drive during March. Click here for details.

Posted by StatsGuru at 11:43 AM | Comments (2) | TrackBack (0)
March 28, 2005
Day Dream Believer -- The Day by Day Database
Permalink

A new improvement to the Day by Day Database. You can now compare players across time periods. For example, here's how everyone stacks up vs. Sammy Sosa during June of 1998. You can come up with any combination of dates, home-road, vs. team, on team and park. And you can sort by any numeric field. If you find any bugs, please let me know in the comments.

I'm going to make this post the entry point for the Day by Day Database. (would people prefer Day by Daytabase?)

Daily batting logs for individual players.

Compare players over a given time period.

I hope you find this a useful research tool.

Update: Pitcher day by day logs are now available as well.

Update: Pitcher comparisons are now available. See who has the best ERA or allowed the most HR over a period of time.

Update: Current hit streaks can be found here.

Update: Team starting pitcher log. This allows you to look at the starters for a team over a given period. See who is pitching when, against what teams, and get an overall record for the starters over the selected dates.

Update 12/7/2005: Batter Splits and Pitcher Splits are the new addition to the Day by Day Database. See the explanation here.

Update 12/10/2005: Team splits are now available. Pick team batters or pitchers, then set your parameters and see the results!

Update 12/19/2005: Added new split functionalities, Batter Splits by Season and Pitcher Split by Season. See the explanation here.

Update 3/19/2006: Added Batter Split Comparisons and Pitcher Split Comparisons. You can see examples of these in this post.

Update 6/2/2006: Added an RBI Percentage Chart to the database. See this post for an explanation.

Update 1/21/2007: Added Team Record with Players in Game. The explanation is here.

Update 7/19/2007: Added League Splits. This is great for finding out league averages over a given time period and for particular splits.

Posted by StatsGuru at 11:35 PM | Comments (25) | TrackBack (8)
March 20, 2005
A Day in the Park
Permalink

Another addition to the Day by Day Database (or, as someone suggested, the Day by Day-tabase). Now you can select the park for study. This allows you, for example, to see how Jason Kendall hit in both Three Rivers and PNC.


Posted by StatsGuru at 06:43 PM | Comments (1) | TrackBack (0)
March 18, 2005
Days of our Lives
Permalink

The Day by Day Database has new functionality. Now, you can choose the team the player is on, and the team he's playing against. Here's Mark McGwire playing for the Athletics against the Yankees.

This gives you any number of combinations; home, road, for a team, against a team. So you can see how Barry Bonds did as a Giant in road games vs. Pittsburgh. Enjoy!


Posted by StatsGuru at 05:33 PM | Comments (4) | TrackBack (0)
March 16, 2005
Day Trippers
Permalink

The Day By Day Database has new functionality. Now you can specify all games, just home games, or just road games.


Posted by StatsGuru at 08:27 PM | Comments (0) | TrackBack (0)
March 15, 2005
Days and Days
Permalink

The Day By Day Database is coming along. There are clearer instructions on how to use the pages, and there is a summary of averages at the bottom of the data table. Start by choosing a player here.


Posted by StatsGuru at 11:30 AM | Comments (2) | TrackBack (0)
March 14, 2005
Daily Batting Logs
Permalink

I've been working on getting an interactive database of daily results working. I now have daily batting stats for players from 1974 to 2004 available. You can pick any time frame for an individual player. This is just a first pass, so there's a lot of functionality I have not added. But you can get a taste.

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711.

Follow this link to get a list of players. Choose one, and you'll get a new screen with biographical information and a form to choose the start and end dates. The dates, by default, are the first and last game for that player in the system. If you type in two legitimate dates (in the form m/d/yyyy) you'll get a table of his accomplishments during the time period. Enjoy! It's your pledge dollars at work!
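
As a rough sketch of what such a date-range lookup involves, the Python below parses m/d/yyyy dates and totals a player's lines between them. It is not the site's actual code: the game-log layout, field names, and sample rows are all hypothetical.

```python
from datetime import datetime

def parse_date(text):
    """Parse a date typed in m/d/yyyy form, e.g. 6/1/1998."""
    return datetime.strptime(text, "%m/%d/%Y").date()

def totals_between(game_log, start, end):
    """Sum at-bats and hits for games played from start to end, inclusive."""
    start_d, end_d = parse_date(start), parse_date(end)
    if start_d > end_d:
        raise ValueError("start date is after end date")
    at_bats = hits = 0
    for game in game_log:
        if start_d <= parse_date(game["date"]) <= end_d:
            at_bats += game["ab"]
            hits += game["h"]
    average = hits / at_bats if at_bats else 0.0
    return {"ab": at_bats, "h": hits, "avg": average}

# Made-up daily lines for one player.
log = [
    {"date": "6/1/1998", "ab": 4, "h": 2},
    {"date": "6/3/1998", "ab": 5, "h": 1},
    {"date": "7/1/1998", "ab": 3, "h": 3},
]
print(totals_between(log, "6/1/1998", "6/30/1998"))
```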


Posted by StatsGuru at 09:50 PM | Comments (6) | TrackBack (0)
March 11, 2005
A Century, Graphically
Permalink

Futility Infielder, commenting on an Aaron Gleeman post, points to a course offered at Tufts entitled "The Analysis of Baseball: Statistics and Sabermetrics." That's just way cool.

From that site, however, I found a link to A Graphical History of Baseball. One of my favorite charts is this one, showing the improvement in fielding averages over time. I believe the increase in fielding percentage is the most compelling reason for believing players are always getting better; the players of today have always been better than the players of yesteryear. Batting and pitching tend to even each other out; it's hard to see improvement in one over time when both are improving, keeping the averages about the same. But nothing works against fielding. Yes, part of it is better equipment and better grounds. But the constant rise indicates that baseball players are just getting better with time.

Update: Fixed broken link.

Posted by StatsGuru at 01:16 PM | Comments (3) | TrackBack (0)
Minor Stats
Permalink

Statsology has news on MLBAM taking over the collection of minor league statistics. It looks like we'll be able to get in-game data for the minors as easily as for the majors now. With the exception of the fantasy licenses, I really like the direction MLBAM is taking. They're using the internet to try to increase the popularity of baseball worldwide.


Posted by StatsGuru at 01:06 PM | Comments (2) | TrackBack (0)
March 08, 2005
Your Pledge Dollars at Work
Permalink

I'm getting the hang of working with the database manager on my host. I've also collected day-by-day data from Retrosheet and am planning on making it available with a nice web interface. When it's done, you should be able to select a player, pick a start and end date, and see the day-by-day lines with a total at the end. There's lots more I can do with the data, and I hope to build a nice little database.

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711.

Right now, all you can do is select a player and see some biographical information. Give it a try.


Posted by StatsGuru at 08:57 PM | Comments (1) | TrackBack (0)
March 07, 2005
Stats Domain
Permalink

Statsology has a nice roundup of the issues involving licensing statistics from MLBAM. Like Bud Selig on the issue of steroids, there seems to be a conflict in MLB's position now and in the past:

Baldas quotes MLB Advance Media's Jim Gallagher, senior VP of corporate communications: "Player statistics are in the public domain. We've never disputed that," Gallagher said. "But if you're going to use statistics in a game for profit, you need a license from us to do that. We own those statistics when they're used for commercial gain."

But IP lawyer Kent Goss is quoted as citing an interesting 2001 case in which MLB themselves claimed that player names and statistics were (as far as I can interpret) both in the public domain and free for others to profit from, and the California Court of Appeal upheld MLB's right to use the names and stats of historical players. "A group of former players sued MLB for printing their names and stats in game programs, claiming their rights to publicity were violated," Goss said. "But the court held that they were historical facts, part of baseball history, and MLB had a right to use them. Gionfriddo v. Major League Baseball, 94 Cal. App. 4th 400 (2001)."

It's my opinion that MLBAM should have kept the fees low and encouraged more fantasy games. Fantasy games are a growth industry; they create fans for major league baseball, and those fans spend money in the MLB.com store, attend MLB games and watch the advertising during broadcasts that keeps the teams running. They should be encouraging the growth of the industry with low license fees. If a court finds that MLBAM has no right to license the stats, they'll end up with nothing.


Posted by StatsGuru at 04:39 PM | Comments (2) | TrackBack (0)
March 03, 2005
Hipp to Pickering
Permalink

Brian Hipp writes about this article on Calvin Pickering:

I thought you might be interested in this article from the KC Star. It's an article on Calvin Pickering and why sabremetricians prefer him over Ken Harvey. It's the first time I've seen the term "PECOTA" used in a daily newspaper. It also quotes Nate Silver and Rany Jazayerli. You probably don't know much about Pickering, but he has become somewhat of a cause celebre for Rob Neyer, Rany Jazayerli, and the sabremetric-minded community of Royals fans.

The article is interesting because it provides a pretty good outline of the sabremetric case for Pickering, but the author just can't help himself from arguing that Pickering's just too fat to actually play. The article is pretty funny because it lays out the case for Pickering like this: On one hand, Pickering has lots of power, walks a lot, and has tons of production in the minors; on the other hand, he's pretty fat. It also completely ignores that Ken Harvey is also fat and a poor fielder, and that Pickering's just going to DH anyway.

But even though the author and the Royals organization aren't convinced of the case for Pickering, it's nice to see some analysis of some sabremetric tools in our local sports page. I'm not holding my breath, waiting for the KC Star to mention how Angel Berroa ranks near the bottom of your Probabilistic Model of Range, but it's a start.

This is the third article I've linked to in the last few days that is at least addressing sabermetric issues and using stats other than batting average, RBI and stolen bases.

Here's Pickering's minor and major league career, so you can judge for yourself. I sometimes make the argument that if your problem is at first base, it should be easy to solve. Pickering is a perfect example of that. He probably hits as well as or better than players making millions more. Now that he's actually lifting weights, at peak age, he could be a big surprise. Let's keep an eye on what kind of numbers he puts up this spring.


Posted by StatsGuru at 09:31 PM | Comments (1) | TrackBack (1)
January 26, 2005
Sabermetric Tracking
Permalink

Red Sox Stats is a site that tracks sabermetric numbers for the Red Sox and their minor leaguers. It also maintains stats for all major league players. A useful reference.

I also learn from the site that the Mets got their second choice, trading for Boston first baseman Doug Mientkiewicz. Boston may have picked the Mets' pocket here. They get single-A first baseman Ian Bladergroen (what a great name!). Ian had a great season before a wrist injury sidelined him. Red Sox Stats lists the sabermetric numbers for the duo, and Ian looks much better offensively than Doug. In a couple of seasons, this could turn out to be a very good trade for Boston.

Meanwhile, the Mets go from wanting one of the premier sluggers in the game to one of the premier defensive players. Doug's had an excellent OBA in the past; he needs to get it back in the .370 range to contribute offensively. He's never been a power hitter, and Shea will only make that worse. He's there to catch the ball.

Update: The Baseball Crank has more on the trade, and a lot more information on Ian Bladergroen.

Posted by StatsGuru at 08:28 AM | Comments (3) | TrackBack (1)
January 20, 2005
On the Lidge
Permalink

Studes leaves a comment to this post on Win Probability Added.

I didn't include specific team scoring in these stats. I took it out and inserted league-average offense instead.

I'm not trying to measure "clutch" performance. I'm trying to find who contributed the most to his team's wins, based on how well he pitched and when he was used by the manager. It's a value stat, just like Win Shares, as I've described in the article. It's only partially an ability stat. As such, its predictive value is less than a pure ability stat.

I know you like Win Shares. In my opinion, this is a better win-based stat.

I'm curious that you say this doesn't tell us anything we didn't already know. Where else, other than other WPA listings, is Lidge rated the best reliever in the majors last year?

Well, Win Shares rates Lidge as the best reliever in baseball. Unless I'm misreading something, Lidge has 17 win shares, while Nathan is next at 16.

Of course, there are many stats you could use to arrive at this decision. Lidge has a great relief ERA along with more innings than any reliever in the NL top 20. He blew away the competition in terms of inherited runners scored.

So Lidge had a great ERA, pitched a lot of innings for a reliever and was exceptional at preventing runners on base from scoring. From these, one could make a good argument that he was the best reliever in the league last year. On top of that, Win Shares agrees. And of pitchers who faced over 300 batters, Lidge had the best DIPS ERA in the NL, 2nd only to K-Rod in the majors.

WPA is a stat that favors middle relievers over closers because managers today use setup men in game critical situations. This does not mean that the setup man is a better pitcher; it just means that managers are misusing their staffs. As we saw this year, when someone excels in this role, they're turned into a closer. Lidge is less likely to do well in this stat in 2005, simply because he won't be used as much in situations with men on base and the score close. He'll also pitch fewer innings. It doesn't mean he still won't be the best reliever in baseball.

Posted by StatsGuru at 08:25 AM | Comments (9) | TrackBack (0)
January 19, 2005
Rating Relievers
Permalink

Studes has an excellent study at The Hardball Times on ranking relief pitchers by Win Probability Added (WPA) (hat tip, Sabernomics). Studes uses a method where he calculates the probability of winning the game when the pitcher enters and leaves the game. The difference is the WPA for that game. If the pitcher has done his job, the probability has gone up, and he gets a positive mark. If the pitcher has allowed too many runs, the probability goes down, and the pitcher gets a negative mark. Brad Lidge was the best reliever in the majors using this metric.
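
The bookkeeping described above can be sketched in Python. This is only an illustration of the enter/exit difference, not Studes' actual implementation: the win probabilities are taken as given inputs (how they are estimated is outside this sketch), and the function names and sample appearances are invented.

```python
def appearance_wpa(win_prob_enter, win_prob_exit):
    """WPA for one appearance: win probability on leaving minus win probability on entering."""
    return win_prob_exit - win_prob_enter

def season_wpa(appearances):
    """Total WPA over a list of (win_prob_enter, win_prob_exit) pairs."""
    return sum(appearance_wpa(enter, leave) for enter, leave in appearances)

# Hypothetical reliever: a clean save, a blown lead, and a scoreless inning in a tie game.
appearances = [(0.80, 1.00), (0.60, 0.25), (0.50, 0.70)]
print(round(season_wpa(appearances), 2))
```

Note how the single blown lead wipes out most of the credit from the two good outings, which is why usage in close games matters so much to the totals.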

I've seen this type of analysis used before to find clutch hitters (you do the same thing, except look at how the AB changed the team's probability of winning). What Studes is doing here is finding out who pitched well in clutch situations. I'm not convinced this tells us something we don't already know. If it's another way of measuring wins (or saves or holds), we already know that's a team stat, affected by the offense as well as the pitcher. If it's another way of measuring clutch performance, you need to believe clutch performance exists to get any value from this stat. I believe that the best players tend to do well in clutch situations; that's why they are the best players. I'm very interested to see how well this holds up from year to year.

Posted by StatsGuru at 04:45 PM | Comments (2) | TrackBack (0)
January 18, 2005
DIPS Ahead
Permalink

Jay Jaffe at Futility Infielder has completed a Defense Independent Pitching Statistics (DIPS) study of 2004. Great work as always by Jay with help from Larry Mahnken, whose DIPS worksheet you can download here.

Two columns I particularly like from the study are the Lower/Higher dERA than ERA columns; leader boards that point out who was unlucky and lucky in 2004. Looks like there were a number of Mets in the lucky (higher dERA) column. Leiter, Trachsel, V. Zambrano and Glavine all make the top 10. (Good defense and a tough park?) It also looks like unlucky Todd Van Poppel might be worth a look as a free agent. And of course, there's Derek Lowe, high on the list of unlucky pitchers. The Dodgers are counting on that pendulum to swing back.

Enjoy!

Posted by StatsGuru at 10:00 AM | Comments (2) | TrackBack (1)
January 17, 2005
Two Dips and a Dad
Permalink

Larry Mahnken of the Hardball Times and the Replacement Level Yankees Weblog sends me a great tool, a DIPS worksheet. I'm also making it available by clicking here (it's a zip file). Larry's explanation:

They're incredibly easy to use, and they feature Park Factors for every season since 1969. You can calculate a pitcher's entire career with them, using up to 30 team-seasons (more than you'll ever need). It automatically combines multiteam data if two to five consecutive columns are in the same season. You can choose to park-adjust HRs, SOs and BBs, some, or none at all, with a single click. They're better than ESPN's DIPS stats, because they're adjusted for parks, lefties and knuckleballers, and can also be used for past seasons.

Enjoy!

Update: Finally got the download link right.

Posted by StatsGuru at 02:49 PM | Comments (2) | TrackBack (0)
January 10, 2005
Summarizing Sabermetrics
Permalink

Sully at the house that dewey built does a wonderful job of summarizing the study of baseball stats. He also seems to be a very open-minded fellow.

Posted by StatsGuru at 03:29 PM | Comments (0) | TrackBack (0)
November 29, 2004
Cranking Out the Shares
Permalink

The Baseball Crank presents a study on how well Established Win Share Levels did in predicting win shares this year. He was most interested in seeing how win shares need to be adjusted for age. In looking at his chart, it's interesting to note that the boundary for going from adjusting up to adjusting down is at age 29, just where you'd expect. Peak performance is in a player's late 20's, and by age 29 most players have already had their best years.

Posted by StatsGuru at 10:04 AM | Comments (0) | TrackBack (0)
November 19, 2004
More On Contact
Permalink

Steve Lombardi takes a swipe at the contact vs. non-contact argument. I haven't had time to digest the article, but it strikes me that Steve isn't quite measuring the correct thing. Let me know in the comments what you think.

Posted by StatsGuru at 10:52 AM | Comments (9) | TrackBack (0)
November 17, 2004
More Productive Outs
Permalink

Repoz at Baseball Primer has a discussion going on about this article on how the Angels are encouraging their players to advance runners, even with productive outs. I'm not impressed.

As one of the commenters at Primer says:


Mikael is right that there is a good point buried under all the junk. Too bad there's SO much junk!

The point of Mr. Smith's article is one Bill James made 20 years ago. Given two teams with the same OBA, the team with the higher batting average will have the better offense. Hits are simply more valuable than walks in advancing baserunners.

If the Angels' philosophy is to make contact, I can't argue with that. It worked well for them against a poor fielding Yankees team in the 2002 playoffs. If it's bad for pitchers to strike out few batters, then it should be good for offenses not to strike out very often. The Angels do that very well, and they should be commended.

But the author misses the A's philosophy. It's not "draw walks." It's "don't swing at bad pitches". It seems to me that the Angels' philosophy will be that it's okay to swing at bad pitches if you advance a runner. If that's true, it's wrong. There are very few batters who can be successful in the long term doing that (Ichiro, Puckett and Gwynn come to mind), but most players will be more like Alfonso Soriano, who just expands the strike zone and ends up striking out more. And as Mr. Smith wrote, you can't make a productive out with a strikeout.

The Angels have the personnel that make contact without expanding the strike zone. I like that. As I wrote during the 2002 World Series:


They've shown that aggressive style throughout the playoffs. What I love about watching this team is that they know how to hit. So many teams go up and just swing for the fences. The Angels are trying to make contact, and when they do they really drive the ball. Eckstein chokes up on the bat! I never see anyone do that anymore. Get the bat on the ball and good things will happen. I'm glad the Angels are teaching us that again.

I just hope they understand swinging at good pitches is part of making contact. Otherwise, we'll see their strikeout totals go up and their valued efficiency go down.

Posted by StatsGuru at 04:12 PM | Comments (3) | TrackBack (0)
November 06, 2004
WRAP Up
Permalink

The NY Times has an interesting article on applying game theory to measure the worth of a player.


The method's logic is actually very simple: every confrontation between pitcher and batter affects, however marginally, each team's chances of winning. With various numbers of outs and men on base, a double or a strikeout or even a runner-advancing grounder either adds or subtracts a specific amount from the inning's run-scoring potential. Depending on the game's inning and score, each of those amounts takes on varying significance to the final outcome.

"It's just our way of looking at the world from studying game theory," Lonergan said. "Each team starts the game with even probability, and ends at either 0 or 1. In between, you're looking at what the players are doing for their team."

An example: when Beltre stepped to the plate on Aug. 23 with the Dodgers down, 7-4, in the top of the seventh in Montreal, Los Angeles had a 12.17 percent chance of coming back to win. (This percentage, derived from extensive data from the entire season, would have been 1.29 percent if there were two out in the ninth.) Beltre delivered an R.B.I. single, making the score 7-5 and the Dodgers' chances of winning 19.11 percent.

So Beltre was awarded with the difference in percentages, or 0.0694, of what Lonergan and Polak call Wins Relative to Average Player (WRAP); and on the other side of baseball's double-entry bookkeeping, the pitcher who surrendered the hit, Luis Ayala, was credited with a minus 0.0694.
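The bookkeeping in that example is just a subtraction. Here is a minimal Python sketch using only the two percentages quoted above; the function name wrap_credit is my own label for illustration, not anything Lonergan and Polak published.

```python
# Win probabilities quoted in the Times article, written as fractions.
wp_before = 0.1217  # Dodgers down 7-4, top of the seventh
wp_after = 0.1911   # after Beltre's R.B.I. single makes it 7-5


def wrap_credit(before, after):
    """Change in win probability credited to the batter (hypothetical helper)."""
    return after - before


batter = wrap_credit(wp_before, wp_after)
pitcher = -batter  # double-entry bookkeeping: the pitcher gets the negative

print(f"Beltre {batter:+.4f}, Ayala {pitcher:+.4f}")
# Beltre +0.0694, Ayala -0.0694
```

A season total would presumably be the sum of these credits over every plate appearance, but the article excerpt does not spell that out, so treat that as my reading.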


They then use this system to show that Sheffield and Bonds should be the MVPs. However, the system also shows that Nathan and Gagne should be the Cy Young award winners.

A.L. CY YOUNG Despite having a higher E.R.A., Schilling tops Santana, 5.15 to 5.01, because he often performed in hitter-friendly Fenway Park (yes, WRAP accounts for this) and because he thrived in particularly tight situations. But the two were bested by Twins closer Joe Nathan (5.47), whose 1.62 E.R.A. and 44 saves do not truly quantify how many games his late pitching helped decide. (WRAP leans toward relievers because, although they influence fewer at-bats, the at-bats are inherently more crucial.)

It strikes me that this system isn't good at picking Cy Young award winners. There really should be a stamina component to that award. My other problem with the system is that, like linear weights, it can only be accurately evaluated after the season is over, when all the probabilities are correctly known. Just as an example, in a low run environment, the probability of coming back from a 2-run deficit is lower than in a high-run environment.

Still, it's nice to have another tool in the drawer, and this does appear to be a good way to measure how clutch a player turned out to be in a season.

Posted by StatsGuru at 09:09 PM | Comments (3) | TrackBack (0)
October 21, 2004
Victory for Sabermetrics?
Permalink

Jeremy Senderowicz writes:


You think anyone will write about how this comeback proves the superiority of sabermetrics? (After all, everyone wrote the reverse about the A's collapse.)

The problem is, the Yankees are nearly as sabermetric as the Red Sox. Just because they don't have Bill James working for them doesn't mean they're not as attuned to the stats as Boston. Over the last decade, the Yankees have built their teams around

  • Batters who get on base and hit for power

  • Pitchers who strike out a lot of batters and don't walk many

  • Pitchers who give up few HR


That's why the Yankees are always near the top of the Beane Count. You'll notice this year they finished behind the Red Sox, and the big reason why was HR allowed. That's what bit them in this series as well, especially last night.

So it was a big win for sabermetrics to have these two teams play each other. But it's even bigger than that. The Dodgers now have a sabermetric oriented GM, and they won their division. The Rangers are using sabermetrics, and they had a surprisingly good season. The Astros hired a consultant from Yale a couple of years ago. San Diego has a GM from the Sabermetric mold, and he built the team to win as they entered their new ballpark. The only real sabermetric failure was Toronto this year. I think that's a very good track record.

And watching Cleveland and Cincinnati, the people running those clubs know what they are doing as well. And they even have managers who are with the program. I'm looking for a lot of good baseball out of Ohio in the next few years.

Posted by StatsGuru at 10:10 AM | Comments (9) | TrackBack (0)
October 18, 2004
More Shameless Plugs
Permalink

I've been busy the last two weeks working with the terrific crew at Baseball Info Solutions helping to put together The Bill James Handbook: 2005. It's a great resource for your fantasy scouting. The book also allows you to spend the winter exploring players in depth, since


  1. It's available November 1.

  2. It has all the stats!


There also are statistics that look at managerial strategy, win shares, lefty/righty and new material from Bill James. Order now so it's on your shelf Nov. 1.

Baseball Info Solutions now has season final and lefty/righty stats available for download. Just stop by our store and pick up what you need!

Posted by StatsGuru at 02:35 PM | Comments (1) | TrackBack (0)
October 08, 2004
Odds of Winning
Permalink

ESPN just flashed a graphic that since the playoffs went to the 2-2-1 format, the team that won game 3 in a series tied 1-1 has won the series 11 of 14 times. I love that. There are four possible outcomes to the final two games of a series:

  1. The trailing team loses two games (although the 2nd game never gets played).
  2. The trailing team wins the first but loses the 2nd.
  3. The trailing team loses the first but wins the 2nd (even though the 2nd game is never played).
  4. The trailing team wins both games.
If the teams are evenly matched, then each of these situations has a .25 probability of happening. In three of these situations, the team trailing 1-2 loses the series. Three quarters of 14 is 10.5, which rounds to 11! So the record of teams who win the third game of a tied series is exactly what the odds would predict!
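The four outcomes can be enumerated in a few lines of Python. This is a sketch that assumes, as above, evenly matched teams and independent games; the variable names are mine.

```python
from itertools import product

# Each of the last two games is a coin flip for the trailing team: "W" or "L".
# Assumption: evenly matched teams, games independent of one another.
outcomes = list(product("WL", repeat=2))

# The team down 1-2 must win both remaining games to take the series.
trailing_wins = [o for o in outcomes if o == ("W", "W")]
p_leader = 1 - len(trailing_wins) / len(outcomes)

expected = p_leader * 14  # 14 series in ESPN's graphic
print(p_leader, expected)  # 0.75 10.5
```

The expected count is 10.5, so 11 of 14 is as close as a whole number of series can get.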
Posted by StatsGuru at 10:58 PM | Comments (1) | TrackBack (1)
September 30, 2004
Sortable Win Shares
Permalink

Bryan Donovan has a very nice program here for sorting win shares. Check it out. I notice that the leaders for KC both have the pitifully low totals of 13. They are Mike Sweeney, Royal for life, and Carlos Beltran, who has spent half his season in Houston!

Posted by StatsGuru at 10:40 AM | Comments (2) | TrackBack (0)
September 21, 2004
Beane Counters
Permalink

It's been a good year for the Beane Count. The top 4 AL teams in the category would all make the playoffs if the season ended today. In the NL, three of the four playoff leaders are in the top four of the count.

Posted by StatsGuru at 12:44 PM | Comments (1) | TrackBack (0)
August 07, 2004
Productive Outs are Still Outs
Permalink

I was just over at the ESPN.com Baseball statistics page to look at the Beane Count, and noticed that their link for productive outs is in bold. I followed it to the team page, and what did I find? There is one team in major league baseball that is head and shoulders above the others in productive outs. Not only do this team's batters have the highest percentage of productive outs, but their pitchers have the lowest percentage of productive outs allowed! This must be the greatest team in the history of baseball! They're productive on both sides of the ball!! What powerhouse is so good at productive outs?

The Montreal Expos.

I rest my case.

Posted by StatsGuru at 02:00 PM | Comments (4) | TrackBack (0)
August 03, 2004
Durable ERA
Permalink

Dean's List has an interesting article that tries to adjust ERA for the number of innings a starter pitches. I understand the point, but I don't think the number is very intuitive.

Posted by StatsGuru at 09:20 PM | TrackBack (0)
July 27, 2004
Tables and Graphs
Permalink

Major League Charts is a new web site that allows visitors to create a number of graphs using various statistics. I think they have something very good here, but it still needs some work. For example, it's too slow for a dial-up modem (I have a DSL line, and I thought it loaded very slowly). However, user feedback from a group as intelligent as the readers of Baseball Musings would help improve the site greatly. So, if you have broadband, take a look and send suggestions.

Posted by StatsGuru at 05:29 PM | Comments (8) | TrackBack (0)
July 12, 2004
Win Shares
Permalink

Just in case you were trying to use my sortable win shares program, it was broken and is now fixed.

Posted by StatsGuru at 11:47 AM | Comments (5) | TrackBack (0)
July 04, 2004
Streak Difficulty
Permalink

The Sports Grinder looks at the difficulty of a save streak vs. a hit streak. It's a good seat of the pants calculation. I wrote about calculating the probability of hit streaks here.

Posted by StatsGuru at 07:41 AM | TrackBack (0)
July 02, 2004
GWRBI
Permalink

Andrew Godfrey of Baseball, Etcetera writes:


I don't remember the rationale for not including the GWRBI in boxscores any more. It was one of my favorite items to look for in a boxscore. Maybe you can clue me in to why it isn't used nowadays.

I love the Game Winning RBI not because it was such a great stat, but because it demonstrated how nearly impossible it is to define clutch ability. The people who wanted this stat included in boxscores thought that there was some unique ability to drive in runs in game situations that was being missed. The stat was defined as:

Credited to the batter who drives in the run that gives his team a lead that it never relinquishes.

Which I believe is a very good definition for the stat. However, the GWRBI proved to be a disappointment to some for two reasons:

  1. The people who had the most RBI tended to have the most GWRBI.

  2. A lot of GWRBI came in the early innings of games, which clutch proponents didn't think constituted a clutch situation.


The GWRBI showed something that sabermetricians had already known: the best players are the clutch players, and teams that get the early lead tend to win. Since the GWRBI brought nothing new to the discussion (and it was embarrassing to the pro-clutch (Elias) stat keepers) it was dropped.

Posted by StatsGuru at 10:13 AM | Comments (6) | TrackBack (0)
June 24, 2004
OBP Rap
Permalink

Audio post powered by audblog

Posted by StatsGuru at 05:46 PM | Comments (2) | TrackBack (0)
Unproductive Outs
Permalink

Larry Mahnken at the Hardball Times does a study of productive outs, and (surprise, surprise) finds that the small correlation they have with winning percentage is negative. Great job, Larry.

Posted by StatsGuru at 02:36 PM | Comments (2) | TrackBack (0)
June 14, 2004
Cranking up the Win Shares
Permalink

The Baseball Crank has been looking at win shares as well.

Posted by StatsGuru at 08:00 AM | TrackBack (0)
June 13, 2004
Sortable Win Shares
Permalink

The good people over at the Hardball Times are doing a great job of getting interesting statistics onto the internet. I love their win shares pages (here's the AL page), but I really want to be able to sort all these columns.

Of course, it's not that hard. All you need is the proper script and you have sortable win shares!

The program to sort the AL page is here.

And here's the NL page.

Once you are there, just click on a column to sort based on that particular field.

Posted by StatsGuru at 07:13 PM | Comments (1) | TrackBack (0)
June 09, 2004
Established Win Shares
Permalink

The Baseball Crank finishes up his established win shares research with a look at the NL Central. This method shows three great teams and three horrible teams, not the tight race we're seeing today.

Posted by StatsGuru at 08:53 AM | Comments (1) | TrackBack (0)
May 26, 2004
Odds Are
Permalink

Jeff Haney offers an interesting look at how over-under lines help show the tendencies of new ballparks. Gambling odds are very interesting. They're a way of distilling the thinking of a large number of experts into a single number, much like the way sabermetricians try to resolve the various offensive, pitching and fielding stats into wins.

Posted by StatsGuru at 01:30 PM | TrackBack (0)
May 17, 2004
Win Shares
Permalink

The Hardball Times has their win shares pages up. So far, six players have nine win shares: I-Rod, Michael Young, Jose Guillen, Scott Rolen, Sean Casey and Mike Lowell. And they're all tied for 2nd. Barry Bonds is all by himself in first with 11.

The other interesting thing win shares confirm is that Jeter is having a good season fielding. I believe HBT is doing the full win shares calculation, not just an estimate based on games at a position. Jeter ranks third among AL shortstops with 1.8 defensive win shares. It also shows Alex Rodriguez is the best defensive third baseman in the AL. So at the moment, I'm going to say I was wrong. I thought leaving Jeter at short was the wrong thing to do, but right now it appears that the Yankees are actually stronger with this arrangement.

Posted by StatsGuru at 09:07 PM | Comments (3) | TrackBack (0)
May 05, 2004
Futile Stats
Permalink

Jay Jaffe at Futility Infielder takes on the productive outs crowd, or as he refers to them, the flat earth society.

The more I mull over productive outs the more incorrect I find it. One big problem is that you are only looking at situations in which outs are made: (Productive Outs)/(All Outs in Productive Situations). But why not count all successes in this situation? ((Productive Outs) + (Base Advancing Hits and Walks))/(All Productive Situations). If getting a runner on first with no outs into scoring position is so tremendously important, why not count the guys who do it by drawing a walk or getting a single? I'm going to try to work on this over the weekend.
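To make the difference between the two ratios concrete, here is a small Python sketch. The counts are made up purely for illustration (they are not real team data), and the function names are my own. It also assumes every hit or walk in these situations advances the runner, which is a simplification.

```python
def productive_out_rate(productive_outs, outs_in_situations):
    """(Productive Outs) / (All Outs in Productive Situations)."""
    return productive_outs / outs_in_situations


def advancement_rate(productive_outs, advancing_hits_walks, all_situations):
    """((Productive Outs) + (Base Advancing Hits and Walks)) / (All Productive Situations)."""
    return (productive_outs + advancing_hits_walks) / all_situations


# Hypothetical team: 150 productive situations, 100 of which ended in an out.
outs = 100
productive = 30
hits_walks = 50              # assumed to all advance the runner
situations = outs + hits_walks

po = productive_out_rate(productive, outs)
adv = advancement_rate(productive, hits_walks, situations)
print(f"{po:.3f} {adv:.3f}")  # 0.300 0.533
```

A team that rarely makes outs in these spots can look mediocre on the first ratio and very good on the second, which is the point of counting all the successes.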

Posted by StatsGuru at 01:09 PM | Comments (10) | TrackBack (0)
April 30, 2004
Productive Outs
Permalink

Daniel Shamah writes:


Hope you're enjoying the game. Buster Olney posted another article on ESPN.com about so-called "productive outs." I've noticed in every game ESPN calls, they not only talk about this bogus stat, but they make a point of talking about the Marlins won the Series with "productive outs." John Kruk went so far as to say he thinks Juan Pierre is MORE valuable than Bonds, because he starts rallies with bunt singles. Nevermind that Pierre and Castillo combined for a whopping 3 runs scored in the Series, and two of those came in the first game; Jeff Conine outscored them himself. Their bias against the patience/power combo that the A's and Yanks employ is ridiculous; you'd think they could find one analyst who disagrees. Or at least give Neyer some airtime on BBTN.

Anyway, I vaguely remember you had a posting on productive outs when ESPN first released them a year ago. And if I remember correctly you shredded their value as an offensive metric. Think you could post it again, or at least email a link to the piece? I remember enjoying it a lot.


This is the Olney article to which Daniel refers. The basic argument is: here's a stat, this team is good at it, this team won, so it must be important to be good at that stat.

My previous posts on this subject:
Outs, Productive Outs and the Unproductive People Who Write about Them

Productive Outs Definition

Significance

Posted by StatsGuru at 10:16 AM | Comments (6) | TrackBack (8)
April 28, 2004
Hardball Pitching
Permalink

The Hardball Times now has some interesting pitching stats on their site to go along with their batting stats. I like that they are listing DER for each pitcher, so you get a feel for how much the defense is helping or hurting the hurler.

Posted by StatsGuru at 09:53 AM | TrackBack (0)
April 26, 2004
Hardball Stats
Permalink

The Hardball Times has its first page of statistics posted. Instead of putting up the usual numbers you can find anywhere, they are posting runs created and gross production average. One stat I really like is line drive percentage. Liners tend to fall in as base hits, so a good hitter should have a high percentage.

Full Disclosure: The stats on The Hardball Times site are supplied by the company I work for, Baseball Info Solutions, and I had a hand in writing the code that prepares those reports.

Posted by StatsGuru at 10:57 AM | Comments (9) | TrackBack (0)
April 09, 2004
Game Scores
Permalink

I noticed in looking at the boxscores on ESPN that they are listing game scores for the starting pitchers, and they are keeping track of the best game scores this year. In looking at the leaders, it's interesting that there hasn't been a truly outstanding performance yet (game score > 90). Roger Clemens, the old, former retiree, has posted the best so far, with an 81 vs. the Giants.

Posted by StatsGuru at 01:56 PM | Comments (3) | TrackBack (0)
April 01, 2004
DePodesta: OBA Doesn't Matter
Permalink

Paul DePodesta admitted this morning that new studies show that batting average, not on-base average, is the most important batting stat.


"I was using a shift operation instead of multiplying by two, and didn't account for the high order bit," DePodesta explained. "Batting average blows OBA away. I just hope Billy Beane didn't make too many poor decisions based on my research. And I'm really sorry about all those old school scouts who lost their jobs."

Beane could not be reached for comment, but he was last seen heading to the A's computer room with a baseball bat.

Posted by StatsGuru at 08:06 AM | Comments (5) | TrackBack (0)
March 17, 2004
Economics' Loss, Our Gain
Permalink

The Sports Economist and Daniel Drezner each link and comment on an MLB.com article about Bill James. It turns out Bill was studying to teach economics, but kept applying what he learned to baseball. I, for one, am glad he chose the path he did. The Sports Economist also talks about a paper by Gerald Scully on how he used K/BB to predict how underpaid pitchers were before free agency.

Posted by StatsGuru at 08:42 AM | TrackBack (0)
March 13, 2004
Moving Up
Permalink

Chris at A Large Regular has compiled a list of milestones we may see reached in 2004.

Posted by StatsGuru at 12:12 PM | Comments (1) | TrackBack (0)
February 27, 2004
Graphing Pitchers
Permalink

Eric McErlain points me to this post by my favorite Canadian blogger, Colby Cosh. In the post, Colby creates a 2-D projection of the K/Inn, BB/Inn and HR/Inn of each ERA qualifying pitcher from 2003. Basically, if you are in the upper left corner of the graph, you're very good.

Some observations:


  • Kevin Brown is located right between Clemens and Pettitte. He's a good replacement for either.

  • Vazquez is better than both Clemens and Pettitte, so he should be an upgrade for NY.

  • The Yankees are the only team with three pitchers in the upper left quadrant.

  • Tim Wakefield should be the Red Sox third starter, not Lowe.

  • The only difference between Mulder, Zito and Hudson is the number of people they walk.

  • Kerry Wood is an extreme outlier. He has the highest K per 9, but also walks a ton of batters. By this chart, Prior is clearly the #1 starter on the Cubs.

  • If you walk a lot of batters without striking many out, you don't get enough innings to qualify for the ERA title.

  • If Clemens and Pettitte can get Wade Miller to walk one less per 9, the Astros will have three pitchers in the upper left quadrant on their team this year.


Nice work from the Great White North!

Posted by StatsGuru at 02:25 PM | Comments (10) | TrackBack (2)
February 20, 2004
The Midnight Hour
Permalink

This is just to remind us that there is more to baseball than statistics.

Posted by StatsGuru at 04:39 PM | Comments (2) | TrackBack (0)
February 10, 2004
Sabermetrics vs. Sports Writers
Permalink

David Damiani at The American Enterprise Online gives a spirited defense of sabermetrics (and gives Aaron Gleeman a great plug!):


Sabermetrics are a threat to many of baseball’s long-standing party lines, such as issuing paeans to those who “manufacture” runs (giving up outs to advance runners); dismissing “one-dimensional” players who walk frequently and hit for power; focusing on errors rather than range in judging fielders; and emphasizing wins and saves as the best measures of pitching performance. As a result, the sabermetric teams and their leaders--all successful on the field in recent years--are open to an endless barrage of media criticism.

Damiani recognizes what is driving the sabermetric revolution:

Most importantly, though, sabermetrics is a largely fan-driven phenomenon. It arose from something of an underground baseball culture that challenged traditional ways of thinking, and with precious little media support outside of a handful of columnists, launching a rebellion against general managers’ and media members’ shortsighted analysis. Through websites like Gleeman’s, the inimitable baseballprimer.com, and dozens of other discussion boards and weblogs, the sabermetric movement applies indirect groundswells of pressure against both mismanaged teams and media hegemony.

What’s more, the fact that fans are leading the way in a revolution of baseball thought directly challenges many Fourth Estate elitists’ perception of fans as idiots. Doling out simplistic explanations of teams’ performance, pontificating about alleged fan misbehavior or willing-executioner support of athlete transgressions, and challenging fans to name more than five players on a team are the modi operandi of far too many sportswriters. The idea of thinking fans so befuddles them that they take opportunities to stereotype sabermetricians--not just Beane, but the fans themselves, as antisocial eggheads who threaten baseball’s mystique.


Sabermetrics is much more accepted now than 25 years ago, when the first Bill James Abstracts started to appear. Each succeeding generation of sports writers will be more in tune with OBP and Slugging percentage than their predecessors. That's the nature of these types of revolutions; eventually, the convinced outlive the disbelievers.

Posted by StatsGuru at 12:17 PM | Comments (1) | TrackBack (0)
February 01, 2004
Schilling on Neyer
Permalink

Dominic Rivers points me to this Sons of Sam Horn thread in which Curt Schilling is answering real baseball questions. In it, he makes disparaging comments about Rob Neyer. (If you follow the link, go to page 3 and search for Neyer to see the quote.)

(Edward Cossette points to another part of this post to try to bolster his team chemistry theory). Schilling makes a very good point; that what statistical analysis yields is trends and probabilities. The question is, how good are those trends and probabilities? In Neyer's case, I'd say they are pretty good. I'm tempted to go through Rob's archives and see how many of his predictions were really ludicrous, and how many were right on the mark. One thing is for sure, Rob would not make such a statement about Schilling without having done the research to back it up. And remember, for every Rob Neyer, there are many more sports writers who comment on the game without any idea what the stats mean. I guess players look at Rob Neyer the way Democrats look at Fox News. :-)

As for booing based on stats, I find that hard to believe. Schilling seems to see the Sons of Sam Horn as typical Red Sox fans. My experience is that most hard-core fans still just look at batting average and RBI. They boo when a guy strikes out in crucial situations. They boo when a pitcher gives up a game winning HR. They boo when they see performance on the field that hurts their team, not because someone has a .340 OBA when they expected him to have a .360 OBA.

But for you hard core stat-head Red Sox fans out there, I would boo Curt Schilling if:


  • He strikes out fewer than 7 per 9 innings.

  • He walks more than 3 per 9 innings.

  • He gives up more than 40% of his HR with men on base.

  • His winning percentage is below his pythagorean projection, unless it's the fault of the bullpen. (Exception: If Schilling actually blames the bullpen, he's destroying chemistry, and should be booed heartily. :-) )


My statistical analysis tells me Schilling will be pretty good. For the sake of Curt's sensitive nature, I hope I'm not ludicrously wrong.

Update: I have been accused of being unethical in using a quote from Schilling that Schilling had declared to be off the record. (See comments below). For the record, the off the record comment was at the beginning of the thread, and I didn't see it. I have removed the quote at the request of Eric of SoSH.

However, I do not buy Eric's argument that what Schilling says is off the record. It's a publicly viewable web site. Schilling does nothing to hide his identity. What Curt has is a forum in which he can criticize and not be criticized. That seems a bit unfair to me.

Posted by StatsGuru at 09:34 AM | Comments (34) | TrackBack (3)
January 28, 2004
The Importance of OBP
Permalink

Yesterday I got together for lunch with one of my readers, Dominic Rivers. Dominic graduated from the Sports Management program at UMass. He interned for the Pirates and has been looking for another job within baseball. Dominic told me about an article he published on-line, where he tries to determine how much weight on-base percentage should get in the on-base+slugging formula using linear regression. I find one statement very interesting:


Nevertheless, there are some aspects of this data that are difficult explain. Despite my deeply held intuitive belief that on-base percentage is always more important than slugging percentage, the two-year time period chart shows three eras where SLG appears to be more important than OBP. Those eras are 1981-1982, 1989-1990, and 1990-1991. Oddly enough, these periods happened to produce the lowest “r-squared” totals. For those who haven’t taken a stats class, or who had one but didn’t pay attention, “r-squared” tells you how much is explained by the regression equation. So for example, in the 2001-2002 time period ‘runs scored’ were approximately 88.2% (r-squared of .882) determined by OBP and SLG. The other 11.8% can be explained by other stuff. This “other stuff” might include baserunning, clutch hitting, and number of times reaching base on an error. Why does a “low r-squared” period correspond with a period where SLG is important? My opinion is that these eras, 1989 in particular, are characterized by a great deal of offensive parity. For example, in the National League in 1989, every team besides Atlanta had an on base percentage in the range of .305 to .321. In the American League in 1989, all but two teams scored between 4.13 and 4.78 runs per game. But these years are anomalies, and hence, I would not recommend that Major League GM’s attempt to build an offense based on numerous low-OBP/high-SLG Dave Kingman types.

This is just what I would have expected. If teams are very close in OBP, slugging will dominate. If they are close in slugging, OBP will dominate. But there's another lesson to be learned here as well. There's more than one way to score runs. Having a team with a high OBP is a great way to score runs, maybe the best way to score runs, but it's not the only way. You can do just fine with high slugging averages. You can do fine with high batting averages. You can do fine by being okay in all of those and just being lucky. As with so many things in life, there is no one right answer.

Posted by StatsGuru at 08:33 PM | Comments (8) | TrackBack (2)
January 27, 2004
Research Today
Permalink

Jay Jaffe at Futility Infielder was nice enough to calculate DIPS for 2003.

The Baseball Crank continues his research on win shares with a look at the established win share levels of players in the AL West.

Posted by StatsGuru at 10:17 AM | TrackBack (0)
January 21, 2004
What Matters
Permalink

Alan Schwarz has an excellent column on what statistics matter to baseball people: GMs, news media and fans. He looks at which stats are good at telling us about the present and which are good at telling us about the future. Most interesting, he now considers defensive efficiency a mainstream stat!

There's been great progress in the last 25 years in how people think about baseball. This article is a good demonstration of that.

Posted by StatsGuru at 11:56 AM | Comments (1) | TrackBack (0)
January 15, 2004
Milestones
Permalink

Alan Schwarz pens a piece for ESPN.com on the milestones to watch for in 2004. He finishes with a Favorite Toy update.


The Favorite Toy estimates that 39-year-old Bonds has a 52 percent chance of breaking Aaron's record -- not to mention a 20 percent chance of zooming past him to 800. Alex Rodriguez (345 already by age 28) has a 43 percent shot of passing Aaron, while Sosa (539) registers at 37 percent. One shouldn't take those figures too literally, but taken together they do suggest that there's about an 83 percent (five out of six) chance that at least one of them will ultimately break Aaron's record.
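The 83 percent figure is what you get by multiplying the three chances of failure and subtracting from one. A quick Python check, assuming (as the arithmetic seems to) that the three players' chances are independent of each other:

```python
# Favorite Toy estimates quoted above, as fractions.
chances = {"Bonds": 0.52, "Rodriguez": 0.43, "Sosa": 0.37}

# Probability that none of the three gets there, assuming independence.
p_none = 1.0
for p in chances.values():
    p_none *= 1 - p

p_at_least_one = 1 - p_none
print(f"{p_at_least_one:.3f}")  # 0.828
```

That is 0.48 x 0.57 x 0.63 = 0.172 for nobody making it, which leaves about 83 percent, or roughly five in six.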

I think it's been pretty obvious for 10 years that someone will make a very serious run at Aaron. For a long time, I thought it would be Griffey Jr. This year will tell us if Bonds can do it or not. Of this whole group, however, I really like A-Rod's chances of winding up #1 eventually.

Posted by StatsGuru at 01:25 PM | Comments (5) | TrackBack (0)
January 13, 2004
Graphing DER
Permalink

Dave at Baseballgraphs.com takes some results from my probabilistic model of range and does some interesting analysis.

Posted by StatsGuru at 07:18 PM | TrackBack (0)
January 12, 2004
Vlad's Win Shares
Permalink

Dave at BaseballGraphs.com explores why Vlad Guerrero has never had a 30 win share season.

Posted by StatsGuru at 09:05 AM | TrackBack (0)
January 11, 2004
Established Win Shares
Permalink

The Baseball Crank has an interesting post on applying established performance levels to win shares. The established performance level of a player is a weighted average of his last three seasons, with the most weight on the last season. Seasons are weighted 3-2-1. Barry Bonds, not surprisingly, is #1 on his list.
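As a sketch of the 3-2-1 weighting in Python, here it is applied to the three Guerrero win share totals the Crank cites below (23, 29 and most recently 18). The function name is mine, and I'm assuming a plain weighted average with no further adjustments.

```python
def established_level(last_three):
    """Weighted average of three seasons, listed oldest to most recent.

    Weights are 1-2-3, so the most recent season counts three times
    as much as the oldest one.
    """
    weights = (1, 2, 3)
    total = sum(w * ws for w, ws in zip(weights, last_three))
    return total / sum(weights)


print(established_level([23, 29, 18]))  # 22.5
```

The off year drags the figure well below his best seasons because it carries half of the total weight.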

A couple of comments on his comments:


*Most of these guys are in their thirties, which suggests that at least at the very high end, investing in players in their early 30s may not be a terrible bet.

Actually, it seems to me that they are in their early 30's after having three great years. So you want to get players no later than their late 20's.

*The most glaring absence is Vladimir Guerrero, due to the injuries and weaknesses I noted yesterday (I'd still love to have had him, though): his WS totals are a less than spectacular 23, 29 and most recently 18. Oddly, he's never had a 30-WS season.

Yes, I find that odd, also. His 2000 season sure looks like a 30 win share year to me.

Posted by StatsGuru at 02:16 PM | Comments (1) | TrackBack (1)
November 29, 2003
Significance
Permalink

I must be tired lately, because it's taken me a few days to get a handle on what's bothering me about productive outs. Here's the line that's bothering me:


Base on balls are a fundamental piece of the Athletics' offensive philosophy, but statistically, they have shown to have slightly less significance than Productive Outs in the post-season.

That's a very misleading line. Significance is in the eye of the beholder. When we talk about significance, we're talking about the probability of something being very low; how low is up to the person studying the data, but most people look for a probability under .05 for something to be significant.

So let's do a thought experiment. According to the article, of 130 series in which one team made more productive outs than the other, the team with more productive outs won 62.3% of the series, or 81 total. We can define this as a Bernoulli random variable; it has the value 1 if the team which wins the series has more productive outs, 0 if the team which wins the series has fewer productive outs. Now, imagine a bag filled with balls labeled 1 or 0 in the proportion 81 1's and 49 0's. Taking a ball from the bag is a Bernoulli trial. If we do this many times (replacing the removed ball each time), the probability of getting a certain number of 1's in a certain number of trials is given by the binomial distribution.

With the binomial, we can ask questions like, "What is the probability of getting exactly 78 balls with a 1 if I make 130 trials," or, "What is the probability of getting at least 85 1's in 130 trials." But more importantly, we can ask, if I repeatedly sample 130 balls from the bag, what range will the result be in 95% of the time? To be clear, here's the experiment:

Perform 130 Bernoulli trials with our bag of balls. Record the number of balls labeled with a 1. Repeat this experiment thousands of times. Make a histogram of the results.

The histogram will look like a normal distribution (and in fact, for a large number of trials, it can be approximated with the normal) with the highest bar on 81, the mean. The height of each bar of the histogram represents the probability of getting that number of 1's. At 81, the height is .072. If we sum the height of the bars around the mean until we get .95, we've found the range where we expect 95% of the results to be. For this distribution, with 130 trials, that range is 69-91.

Now, according to the article, more walks win a series 60% of the time, or 78 out of 130. Seventy-eight is well within our 69-91 range of 95%. So walks are not less significant than productive outs. The fact is, this difference could easily be sampling error. If we did another trial, we might get 75 wins on productive outs and 85 wins on walks, and we still couldn't tell if they came from the same distribution or not.
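The bag-of-balls experiment can also be worked out exactly rather than by repeated sampling. What follows is a minimal sketch in Python, assuming a plain binomial with 130 trials and p = 81/130; the function names are mine, and the symmetric window used here is only one way of "summing the bars around the mean," so its endpoints may differ by a step from the range quoted above.

```python
from math import comb

N_SERIES = 130        # series in the article's sample
P_ONE = 81 / 130      # share of balls labeled 1 in the bag


def binom_pmf(k, n=N_SERIES, p=P_ONE):
    """Probability of drawing exactly k balls labeled 1 in n draws with replacement."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def central_range(coverage=0.95, n=N_SERIES, p=P_ONE):
    """Widen a symmetric window around the mean until it holds `coverage` of the probability."""
    center = round(n * p)
    lo = hi = center
    total = binom_pmf(center, n, p)
    while total < coverage:
        lo, hi = lo - 1, hi + 1
        total += binom_pmf(lo, n, p) + binom_pmf(hi, n, p)
    return lo, hi, total


lo, hi, covered = central_range()
print(f"height of the bar at 81: {binom_pmf(81):.3f}")
print(f"window {lo}-{hi} holds {covered:.3f} of the probability")
print("78 (the walks result) inside the window:", lo <= 78 <= hi)
```

The last line is the whole argument in miniature: if 78 falls inside the window, the walks result is something the productive-outs bag would produce routinely.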

The sample size is too small. They have not shown there is any significant difference in any of the stats they mention.

Posted by StatsGuru at 08:43 AM | TrackBack (0)
November 27, 2003
Productive Outs Definition
Permalink

Yesterday I started discussing productive outs. I'm starting to look at data, and the definition they published yesterday isn't quite accurate. It appears they do not count double plays that advance runners as productive outs, at least in the case where there are men on first and second and the DP advances a runner to third. Since they don't have data for other years, I don't know if a DP that scores a runner is counted as a productive out. I'm going to assume no for the rest of my study.

Posted by StatsGuru at 07:23 AM | TrackBack (0)
November 26, 2003
Outs, Productive Outs and the Unproductive People Who Write about Them
Permalink

J Lentner writes:


I guess one could applaud ESPN for giving equal time to the traditionalists with Buster Olney’s article, the first in a promised series on productive outs. What really makes his case for the productive out laughable is the accompanying box. Florida won the PO battle 9-5 yet was outscored 21-17. So the one cancels the other out and it comes down to the fact that the Yankees’ batters slumped and the Marlins’ pitchers excelled.

Off the top of my head, I’d assume that the teams winning in postseason that had an “edge in PO” were the ones that also had an edge in OBA, and therefore had more opportunities to move runners over. Seems simple enough.


I should probably take an hour off from work to respond to this article. This is Elias playing politics. The Elias Sports Bureau cannot survive without the support of the leagues. What they see is themselves being made irrelevant by the likes of Billy Beane and Theo Epstein, who look to non-Elias people for information. If I'm an owner, I have to start asking why MLB is paying the Hirdts big money to keep stats, when others can do it as well and cheaper. So Elias has decided to appeal to all those GMs who think Beane is wrong.

The style of play that generates many of the Productive Outs - putting runners in motion, bunting - has been scrutinized by many baseball theorists in recent seasons. Some teams, most notably the Oakland Athletics, have played with the philosophy that a team's 27 outs should not be wasted.


This has worked for Oakland during the regular season: the Athletics averaged 98 regular-season victories from 2000-2003. During that time, they ranked 14th, 13th, 14th and 13th in the 14-team American League in stolen bases, and 12th, 13th, 13th and 13th in sacrifice bunts. Their base-runners proceeded carefully, taking care not to make a mistake that would effectively strip a teammate of a chance to swing the bat. Bunts, hit-and-run plays and aggressive secondary leads are not part of the Athletics' DNA.

Oakland has generated walks, something that other great teams had done before -- one of Gene Michael's primary goals when he began rebuilding the Yankees in 1990 was to increase on-base percentage -- with the goal of saturating the bases with runners and scoring more.

But this conservative style has not translated well in the post-season, when the pitching is markedly better, there are more off-days to rest the best pitchers, and the pressure is greater. Rather than concentrating on not wasting their 27 outs, most championship teams have successfully used their outs, working to put runners in scoring position.


I also get the feeling that Elias has rigged the definition to make the stat look good.

This is the Productive Out, as defined and developed by ESPN The Magazine and the Elias Sports Bureau: when a fly ball, grounder or bunt advances a runner with nobody out; when a pitcher bunts to advance a runner with one out (maximizing the effectiveness of the pitcher's at-bat), or when a grounder or fly ball scores a run with one out.

Doesn't that seem limited to you? I mean, if you move a runner into scoring position with two outs, doesn't that count for something? And besides, didn't Pete Palmer show 20 years ago that trading an out for a base always decreases run potential?
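To see how narrow the published definition is, it helps to write it as a rule. This is only a sketch of the definition as quoted above, not Elias's actual scoring logic; the function name, the argument names and the play labels are all made up for illustration, and "with one out" is read as one out before the play begins.

```python
def is_productive_out(outs_before, play, batter_is_pitcher,
                      runner_advanced, run_scored):
    """Apply the quoted Productive Out definition to a single out.

    play is one of "fly", "grounder" or "bunt" (hypothetical labels).
    A run scoring is assumed to be recorded with runner_advanced=True too.
    """
    if outs_before == 0:
        # Any fly ball, grounder or bunt that advances a runner counts.
        return play in {"fly", "grounder", "bunt"} and runner_advanced
    if outs_before == 1:
        # A pitcher's bunt that advances a runner counts...
        if batter_is_pitcher and play == "bunt" and runner_advanced:
            return True
        # ...otherwise only a grounder or fly ball that scores a run.
        return play in {"fly", "grounder"} and run_scored
    return False


# A position player's grounder that moves a runner to second with one out
# (the play ends with two outs) gets no credit under this definition.
print(is_productive_out(1, "grounder", False, True, False))
```

That final case is the one the questions above are aimed at.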

I'm going to write more on this later, but I'll leave you with two themes I've hit upon all year; getting on base is important, and putting the ball in play is important.
Clarification: Chris Lynch thinks I'm not being clear with this statement:


I mean, if you move a runner into scoring position with two outs, doesn't that count for something?

I meant the play ends with two outs (begins with one out).

Posted by StatsGuru at 10:04 AM | TrackBack (1)
November 08, 2003
Wearing Out Pitchers
Permalink

Avkash Patel at the raindrops has a quite interesting post on which batters are best at wearing out pitchers. Nice statistical work on his part.

Posted by StatsGuru at 09:02 AM | TrackBack (0)
November 06, 2003
How to Become a Stat Head
Permalink

Edward Cossette forwarded me an e-mail that was sent to him in response to this post on Bambino's Curse. The reader writes:


I share these sentiments, and have always wanted to be a stat-head, but I can't find the proper entry point. How does one get started with Sabrmetrics? I thought maybe this winter I would read Bill James Baseball Historical Abstract, but it's so heavy. Do you have to commit yourself to carrying around such tomes to be a Sabrmatrician? Seriously, is there a getting started manual out there for becoming a stats geek?

Edward thought I would be a good person to comment on such things, since I did in fact become a stat head. I can't say I know of anyone who intentionally set out to become a professional baseball researcher. I got into the business through a series of "right place right time" events:

  1. Took a computer programming course because it looked interesting.

  2. Took a second one because I did well at the first one.

  3. Did very well in the second course, and the professor offered me a job in his start up.

  4. Took the job and continued to earn a degree in computer science.

  5. About the same time started scoring for Project Scoresheet.

  6. Professor's best friend was Dick Cramer, the founder of STATS, Inc. We'd have dinner together whenever he came to town.

  7. Started scoring games for STATS, Inc. in 1987.

  8. In 1990, STATS, Inc. got their ESPN contract. When I expressed interest in the job, Dick Cramer pushed to hire me.


So it was a combination of the right skill set, working as much as I could in the field, and knowing the right people. So, I would suggest to anyone who really wants a job like this to:

  • Study math, especially statistics and probability theory. There isn't enough experience in that among many of the stat heads out there.

  • Know how to program a computer and use database software.

  • Do something in baseball. Work for a minor league team in any capacity or score games for STATS, Inc. Ask a lot of questions about what's going on.

  • Play lots of fantasy games and simulation games. You'll learn a lot about the players and how baseball works. Plus, it's fun.


Set a foundation for yourself, then try to get involved with a team, or a news organization. Most of the ESPN researchers I know started off somewhere else at the company and naturally gravitated to the research area. And think about starting a blog. It's easy, it's fun, it's cheap, and your opinions get seen. It's also convenient to point a potential employer to your work.

And yes, you have to read Bill James. I'd start with the old paperback Abstracts rather than the Historical tome, but all should be read. And find a copy of The Managers; you'll learn more about baseball from that book than any other Bill has written. Good luck!

Posted by StatsGuru at 09:35 AM | TrackBack (1)
Sabermetric Inroads
Permalink

Robert Tagorda sends a link to this article on using game theory to determine the value of baseball players:


Using reams of historical data, Lonergan and Polak can measure the probability of a team's chance of winning a game, given any set of circumstances. With each at-bat, a player can help or hurt his team's chances.

Here's how their method works: Let's say the home team is down by two runs in the bottom of the fifth inning, with no outs and a runner on second base. At that moment, the home team has a 39% chance (or 0.39 probability) that it will win. If the batter grounds out, and the runner at second fails to advance, the team's chance of winning falls to 33%. The difference between the two, -0.06, is assigned to the batter who just grounded out.

DIFFERENT ANGLE. Polak and Lonergan add up all of a player's outcomes for the season. Doing so yields the exact number of wins -- or losses -- a player contributed to his team, relative to an average player. For example, New York Yankees slugger Jason Giambi contributed 4.9 wins (unadjusted for special circumstances -- see footnote in table below, which shows who would win 2003's MVP and Cy Young awards, according to this method). On the other end of the spectrum, maligned Yankees pitcher Jeff Weaver contributed -2.5 wins (also unadjusted). If all of the players' net win contributions are added up, the result equals the number of games over .500 the team finished in the regular season (the 2003 Yankees finished 101-61, 20 games above .500).

The method has some distinct differences from other quantitative analyses, such as the "sabermetrics" (named after SABR, the Society for American Baseball Research) method popularized by baseball historian Bill James and others. Sabermetrics also seeks to assign portions of wins to different players, but it relies on selective weighting of certain baseball statistics, such as a hitter's on-base percentage or a pitcher's earned-run average, and uses regression analysis to examine those stats' effect on wins or runs scored.

Lonergan and Polak claim that their method, which doesn't rely on traditional statistics, eliminates a step -- going directly to the measurement of game outcomes.
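The bookkeeping described in the quoted article is simple enough to sketch. This is an illustration only, not Lonergan and Polak's code: the play log below is hypothetical, and only the 0.39 to 0.33 ground-out comes from the article.

```python
from collections import defaultdict

# Hypothetical play log: (player charged, win probability before, win probability after).
plays = [
    ("Batter A", 0.39, 0.33),   # the ground-out example from the article
    ("Batter B", 0.33, 0.41),   # invented numbers for illustration
    ("Batter A", 0.41, 0.47),   # invented numbers for illustration
]


def season_totals(play_log):
    """Sum each player's change in team win probability over every play."""
    totals = defaultdict(float)
    for player, before, after in play_log:
        totals[player] += after - before
    return dict(totals)


totals = season_totals(plays)
for player, wins_added in sorted(totals.items()):
    print(f"{player}: {wins_added:+.2f}")
```

Every play's change is charged to exactly one player here, which is the step the sketch has in common with the method as the article describes it.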


The article goes on to point out that at the current time, this method does not account for fielding, making the MVP table at the bottom of the article very suspect. All of the value calculated by the difference in situations is attributed to either the batter or the pitcher, and we know that's not true. Roy Halladay is the best scoring player in the AL by this method, yet some of those wins have to be attributed to the defense of the Blue Jays.

There is another problem as well. The researchers used "reams of historical data" to figure out the probability of winning in a particular situation. But these probabilities are not constant over time. The probability of coming back from three runs down in the fifth was much lower in the dead-ball sixties than in the swinging nineties. So, like linear weights, you can't have a formula that works correctly every year; you have to wait for the season to finish and then make adjustments.

Still, with the proper refinements it should be a good system, and I'm glad to see another club is testing the waters and using a new model to evaluate ballplayers. We'll have to look for a big improvement from a low payroll NL club next year.

Posted by StatsGuru at 08:51 AM | TrackBack (1)
November 02, 2003
Pitches Per Plate Appearance
Permalink

Al Bethke has an interesting post on pitches per plate appearance over at Al's Ramblings. He constructs a lineup out of the best batters at working a pitcher, vs a lineup of the hitters who see the fewest pitches when they come to bat. The selective hitters in aggregate have a 100-point better OPS.

I've watched this stat for a number of years. It's not a be all or an end all; there are good hitters who don't see that many pitches. There are poor hitters who are selective. But I think in general Al is correct that a lineup of selective hitters will do more damage, if for no other reason than they tire out the starter earlier.

Posted by StatsGuru at 09:58 AM | TrackBack (0)
September 24, 2003
Official Scorers
Permalink

The Wall Street Journal has a nice article on the woes of being an official scorer (paid subscription required). If you don't have an account, it should be in the print version.

Posted by StatsGuru at 12:17 PM | TrackBack (0)
September 11, 2003
Tiger Record
Permalink

I think you all know this, but every once in a while someone says that the Tigers are threatening the record for worst record. That's not true. I've always taken worst record to mean lowest winning percentage. In the modern era (1900 on), that belongs to the 1916 Philadelphia A's. They were 36 and 117, a .235 winning percentage. The Tigers won't fall that low. What the Tigers are going for is the most losses in a season in the modern era. That's the 120 the 1962 Mets lost. The all-time record is 134 by the 1899 Cleveland Spiders, although there are a lot of reasons not to think that's a legitimate record, since they really weren't a legitimate team.

Update: Tigers lose. They are now 37-108.
They are losing tonight, and if that holds up, their 108 losses will be tied for 20th since 1900.
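For anyone who wants to check the percentages, here is the arithmetic as a small Python sketch. The function name is mine; the two records are the ones quoted in the post.

```python
def winning_pct(wins, losses):
    """Winning percentage as wins divided by decisions (ties ignored)."""
    return wins / (wins + losses)


print(f"1916 A's, 36-117:    {winning_pct(36, 117):.3f}")
print(f"2003 Tigers, 37-108: {winning_pct(37, 108):.3f}")
```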

Posted by StatsGuru at 09:29 PM | TrackBack (0)
September 07, 2003
Pulling the Ball
Permalink

Ben Jacobs writes:

I was just wondering if you know if Tim Kurkjian is a regular reader of your blog? I'm watching Baseball Tonight right now and Kurkjian just said of Tony Batista, "Two out of every three balls he puts in play are pulled."

Obviously, it's entirely possible that Kurkjian has that info on his own, but it seems odd that he said it on the same day that you ran your "pull table." Just thought I'd let you know.

Yes, there are people on the Baseball Tonight staff that read this blog. However, if what Tim said came from this post, it was mis-interpreted. By pull percentage, I'm not describing the percentage of balls pulled, I'm describing a direction, which I explained in the post. I'll use Batista as an example.

Tony's a right-handed batter. If 2/3 of his balls in play were hit down the left line (right on the line) and 1/3 were hit down the right field line, he'd have a pull percentage of 67%. But if you divide the field into thirds, and Batista hit all his balls on the dividing line between left and center, then again, he'd have a pull percentage of 67%. In the first case, 2/3 of all his balls in play would be pulled. But in the 2nd case, 100% of the balls in play would be pulled.

Of course, how do you define pulled? If you divide the field into thirds, Batista has the following profile:

2003            Left Field   Center Field   Right Field
Balls In Play   230          146            54
Percentage      53%          34%            13%

If you divide the field in half, then 76% of Batista's balls in play go to left of center. I suppose you can draw a line where Batista would have 67% of his balls pulled, but not based on my work. If Tim was using my data, what he should have said is that Tony is the most extreme pull hitter in the game this year.

Posted by StatsGuru at 09:43 AM | TrackBack (0)
September 06, 2003
Pulling the Ball
Permalink

Steve Bonner wrote the other day:

Ok is there by any chance a stat that quantifies a hitters propensity to pull the ball?

There's not one stat per se, but STATS, Inc. does record where each batted ball lands. They break the field into 22 wedges of pie, labeled C to X, third base line to first base line. Using this data, I've come up with a pull percentage. For a left-handed batter, C = 0 and X = 1, and each letter in between is the appropriate fraction. For righties, C = 1 and X = 0, and each letter in between is the appropriate fraction. For each player, I add up the value of each batted ball, multiply by 100 and divide by the number of balls put into play. I call this pull percentage.

If every ball a batter hit were pulled down the line, pull percentage would be 100.0. If every ball were hit down the opposite line, then pull percentage would be 0. So the higher the pull percentage, the more likely the batter is to pull the ball. The pull percentage can be thought of as a direction; 50% means the average ball is hit up the middle.
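The calculation can be sketched in a few lines of Python. This assumes the "appropriate fraction" means the 22 wedges are evenly spaced between 0 and 1; the function names and the way the batted balls are passed in (a sequence of wedge letters) are my own choices for illustration, not the format of the STATS, Inc. data.

```python
WEDGES = "CDEFGHIJKLMNOPQRSTUVWX"   # 22 wedges, third base line to first base line


def wedge_value(wedge, bats):
    """Map a wedge letter to 0..1, where 1 is the batter's pull-side foul line."""
    fraction = WEDGES.index(wedge) / (len(WEDGES) - 1)   # C -> 0.0, X -> 1.0
    return fraction if bats == "L" else 1.0 - fraction


def pull_percentage(wedges_hit, bats):
    """Average wedge value of a batter's balls in play, times 100."""
    values = [wedge_value(w, bats) for w in wedges_hit]
    return 100.0 * sum(values) / len(values)


# A right-handed batter who hits two balls down the left field line
# and one down the right field line.
print(round(pull_percentage(["C", "C", "X"], bats="R"), 1))
```

Because the result is an average position rather than a count, very different spray patterns can produce the same number.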

Here's the table for everyone this year with at least 200 balls in play:



Posted by StatsGuru at 02:23 PM | TrackBack (0)
August 25, 2003
Correlation and DIPS
Permalink

Dr. Manhattan at Blissful Knowledge writes a post about the recent study by Tom Tippett at Diamond Mind Baseball on Voros McCracken's DIPS theory. He's not sure about something:


McCracken’s theory had value beyond what it said about pitchers’ performance. By largely removing the influence of pitchers from the results of balls put into play, it also provided the justification for the foundation of Bill James’ and Baseball Prospectus’ methods of measuring fielding performance. But now, it seems like we have taken several steps backward. To use an example cited in the Diamond Mind study, in measuring the defensive performance of the Seattle Mariners over the last several years, don’t you have to adjust for the influence of Jamie Moyer? And again, doesn’t that merely reopen the “Hibernian Problem” of distinguishing pitching from defense?
Maybe I’m missing something here, but I’m not sure what it is.

I meant to talk about this at the time of the Diamond Mind article, but something else took my attention and I never got back to it. I think the most interesting part of the Tippett article is this section entitled, "Year-to-year variations, part two."

It goes without saying that one cannot prove or disprove the idea that "there is little correlation between what a pitcher does one year in the stat and what he will do the next" by examining only ten or twelve careers.

To get a better handle on this phenomenon, I compiled a database consisting of all pairs of consecutive seasons in which a pitcher faced at least 400 batters in each season. Using this sample of 7,486 season-pairs, I computed the correlation coefficient for the net HBP rate, BB rate, K rate, HR rate, and in-play hit rate.

I found the highest correlation (.73) for strikeout rates. Walk rates (.66) were also highly correlated. The correlation coefficients dropped to .36 for hit batsmen, .29 for homeruns, and .16 for in-play batting average relative to the league. The lowest correlation (.09) was seen for in-play batting average relative to the team.

It may appear to be contradictory to say that certain pitchers appear to be consistently good while the overall correlation rate is quite low. But that's not necessarily so.

If McCracken is right, the difference between a pitcher's IPAvg and that of his team should vary randomly around zero as he moves through his career, and the correlation would be quite weak.

But if pitchers do have some influence over these outcomes, they could still exhibit a weak correlation by varying around some value other than zero that reflects the ability of the pitcher.


(Emphasis added by me.)

What Tippett is saying here is that you can predict strikeout rates pretty well just by looking at the previous season of the pitcher, but you can't predict in-play batting average relative to the team well at all. That's what correlation means. Correlation goes on a scale of -1 to 1, where 1 is perfect correlation (the best at one will be the best at the other), -1 is perfect opposite correlation (the best at one will be the worst at the other) and 0 means no correlation at all; in other words, being the best at one will tell us nothing about how you do at the other. The statistician I learned from used to tell me that if he sees .5 correlation, he assumes the data is random. Seeing a .09 correlation tells me the data is very random. It's not 0, but it's very close to 0.
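For readers who want to see the -1 to 1 scale in action, here is a minimal sketch of the usual (Pearson) correlation coefficient in Python. I'm assuming that is the coefficient Tippett computed; the rates below are invented for illustration and are not from his database.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented rates for five pitchers in consecutive seasons.
year_one = [5.1, 6.3, 7.0, 8.2, 9.4]
same_order = [5.0, 6.0, 7.5, 8.0, 9.9]      # best in year one is best in year two
reversed_order = same_order[::-1]           # best in year one is worst in year two

print(round(pearson_r(year_one, same_order), 2))
print(round(pearson_r(year_one, reversed_order), 2))
```

The first pairing comes out close to 1 and the second close to -1; real season-to-season data falls somewhere in between.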

So, as to Dr. Manhattan's question; yes, you are missing something. The effect Tippett is showing is small, so small that DIPS is still valid. Bill James knew about this when he wrote win shares, but for the aggregate I think it works really well. We don't have to reopen the “Hibernian Problem”; we just have to understand that the solution is just an approximation.

Correction: The Hibernian Problem is a typo in the original Blissful Knowledge entry. Here's what the problem is:


I've just fixed my post. Sorry - my BP2K is in storage so I couldn't double-check it. The correct term is "Hibert" problems. Those problems were a list of 23 fundamental mathematical problems propounded around the turn of the century by a mathematician named Hibert. In BP2K, Keith Woolner tried his hand at setting out a list of parallel questions, and a primary one was the distinction of pitching and defense. The piece helped inspire Voros McCracken.

Correction II: The name of the mathematician is Hilbert, not Hibert. Thanks to Mike Malloy for catching this.

Posted by StatsGuru at 09:03 AM | TrackBack (0)
August 07, 2003
Defensive Win Shares
Permalink

The Baseball Crank wonders who has had the most win shares, all of them coming from defense. The answer is Billy Hunter, who in 1953 had 11 win shares, all from defense. Hunter came up in 1953 and played with the St. Louis Browns in their last season. He played 154 games, batting .219 with a .253 OBA and a .259 slugging percentage. That's pretty worthless. But he must have been a great fielder.

Posted by StatsGuru at 02:24 PM | TrackBack (0)
August 01, 2003
Win Shares
Permalink

From time-to-time, people ask where they can find Win Shares on the web. I've been posting top short form win share leaders from time to time here. But I found a site that does a much more complete calculation. (I'm not sure if they have all the info STATS, Inc. does for the calculation, but I'll try to find out.) Check it out.

One thing I see is that Ventura is a much better defensive third baseman than Boone. I was actually surprised that the Yankees traded Ventura. I thought they would get rid of Zeile and use Robin as a pinch hitter defensive replacement.

Posted by StatsGuru at 11:00 AM | TrackBack (0)
Short Form Win Shares
Permalink

Here are the top players in short form win shares through the end of July:

Player Win Shares
Barry Bonds 28.7
Albert Pujols 27.0
Todd Helton 25.9
Carlos Delgado 25.4
Gary Sheffield 23.8
John Smoltz 22.7
Preston Wilson 21.6
Jason Giambi 21.0
Manny Ramirez 20.6
Nomar Garciaparra 20.3
Javy Lopez 19.8
Bret Boone 19.8
Bobby Abreu 19.3
Ivan Rodriguez 19.0
Mike Lowell 19.0
Luis Gonzalez 18.9
Lance Berkman 18.9
Marcus Giles 18.8
Esteban Loaiza 18.7
Jim Thome 18.7
Garret Anderson 18.7
Magglio Ordonez 18.5

It's pretty clear why the Braves are #1. They put four players on this list. Barry Bonds is still a little better than Albert Pujols, and although Carlos Delgado has fallen from first overall, he's still dominating the American League. It's also clear that the Rockies have the best 3-4 hitting combination in baseball. Nice to see Mike Lowell in the top 20 as well.

Correction: Brian Carusi reminds me that Preston Wilson is a Rockie, making Helton/Wilson the best 3-4 hitting combination in baseball. I had the Garciaparra there. They're the best in the AL.
Posted by StatsGuru at 09:43 AM | TrackBack (0)
July 21, 2003
Tippett DIPS
Permalink

Tom Tippett has done an extensive study of DIPS (defense-independent pitching stats), and finds that pitchers have more influence over hit rates than Voros McCracken thought, but that influence is much smaller than pitchers have over K, BB, HR and HBP. A long read but worth it.

Posted by StatsGuru at 08:19 PM | TrackBack (0)
July 09, 2003
Mid-Season Research
Permalink

The Baseball Crank is using DIPS to look at the pitching in the NL East.

Posted by StatsGuru at 09:15 AM | TrackBack (0)
July 01, 2003
Win Shares Question
Permalink

Erik Yeager writes:


You may have seen this before, but I thought I'd throw it out there in relation to your Win Shares postings. This is from an ESPN chat with Bill James, May 15.

http://proxy.espn.go.com/chat/chatESPN?event_id=3503


Jake (Mountlake Terrace, WA): Bill, can we gleam anything from the Win Shares system after only 40-odd games, or is it a tool that's truly accurate after a full 162-game schedule?

Bill James: Nothing. Win Shares are a tool used to analyze a season after it is over. They have no relevance at all to a moving object.


Having looked over the chat session, Bill was very short with his answers. I find it hard to believe that win shares mean nothing halfway through a season. You certainly can't use them to project anything, since you don't know how a team will do the rest of the season. I certainly would not look at them every day like one would with batting average. And, with a smaller sample, they are not going to be as accurate in mid-season as at the end of the year. But as a quick way of ranking players, I think they are fine. You just have to be careful what you read into them.

Posted by StatsGuru at 12:38 PM | TrackBack (0)
June 24, 2003
New Twist
Permalink

While I was reading Moneyball, I came across something interesting. I believe Lewis asked DePodesta what he thought of the On-Base+Slugging stat. DePodesta thought that OBA was worth about three times the player's slugging percentage. So I thought I'd figure out a variation of OBA+Slugging, which is a weighted average of the two: (3*OBA+Slugging)/4. Click below for a list of this stat for all ML regulars, alongside OBA+Slugging for comparison:
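As a quick illustration of the formula, here is a small Python sketch. The function name and the sample .400 OBA / .500 slugging line are made up; only the 3-to-1 weighting comes from the post.

```python
def weighted_oba_slg(oba, slg, oba_weight=3.0):
    """Weighted average of OBA and slugging, with OBA counted oba_weight times."""
    return (oba_weight * oba + slg) / (oba_weight + 1.0)


# A hypothetical hitter with a .400 OBA and a .500 slugging percentage.
oba, slg = 0.400, 0.500
print(f"weighted: {weighted_oba_slg(oba, slg):.3f}   OBA+Slugging: {oba + slg:.3f}")
```

Dividing by four keeps the result on the same scale as OBA and slugging themselves, so it reads like a percentage rather than a sum.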



Posted by StatsGuru at 09:53 PM | TrackBack (0)
June 23, 2003
More on Bill James
Permalink

Robert Tagorda at Priorities and Frivolities has his take on Matt Welch's article on Bill James.

Posted by StatsGuru at 09:08 PM | TrackBack (0)
June 22, 2003
Bill James and Moneyball
Permalink

Matt Welch writes about Bill James and the book Moneyball. (Link via Instapundit). I like the way Matt refers to himself as a Jamesian. I, too, consider myself a Jamesian.

I just picked up my copy of Moneyball and I'll be writing more about it when I'm done reading.

Posted by StatsGuru at 04:29 PM | TrackBack (0)
June 06, 2003
SABR Meeting
Permalink

I will be attending the meeting of the Southern New England chapter of SABR tomorrow at McCoy Stadium in Pawtucket, RI. If any of you are there, I hope you'll say hello.

Posted by StatsGuru at 07:36 PM | TrackBack (0)
April 24, 2003
Retrosheet
Permalink

There was a front page story in the Wall Street Journal yesterday on Dave Smith of Retrosheet. (You may need a subscription to read the story.) I believe ML data is now good back to 1974. This is an incredibly valuable resource. If you have any old score sheets, or if your father or grandfather left you any, you should get Dave Smith some copies.

Posted by StatsGuru at 01:22 PM | TrackBack (0)
April 02, 2003
Sosa
Permalink

He's up in the 6th. He's walked twice tonight. Men on 1st and 2nd, Cubs down 3-1. Seems like the Mets have to pitch to him.

Update: It was high enough, but not deep enough. Fly out to deep left, with the wind knocking it down.

Posted by StatsGuru at 08:54 PM | TrackBack (0)
March 28, 2003
Two Views of Stats
Permalink

Jan from Wellesley sends these two competing viewpoints on the use of stats in baseball. First, the anti-stat position, by Steven Krasner of the Providence Journal.


Baseball is a game of numbers. Always has been. Always will be.

The numbers that have generated the most interest over the sport's storied history are home runs, batting averages, RBI, stolen bases, won-lost records and earned-run averages.

But over time, other numbers have been seeping into the game. Like how a player hits at night, against left-handed pitchers, on natural turf, when the temperature is above 63 degrees, in the month of May, from the seventh inning on, with a runner at first base who is timed at 5.3 seconds from home to first, on the road.

Managers these days have reams and reams of computer printouts to prepare them for the game's direction.

Indeed, the Boston Red Sox have taken this all a step further in hiring stats guru Bill James to provide analysis when it comes to player moves, not to mention in-game decisions. The odds are you will hear the terms "OBP" and "OPS" ad nauseum this year, based on some of the organization's numbers-driven philosophies.

And Major League Baseball honchos want to speed up the game? The reliance on stats can only add to the time it takes to play nine innings. I can envision baseball adding a few 30-second timeouts for the manager so he can pore over the printouts before deciding what move he wants to make when he needs a base hit in balmy 75-degree weather on a Thursday night on artificial turf.

Can all these numbers be helpful? Sure. But as the sole means for evaluating players, or even the primary purpose?

Not in my book.


Before we go any further, there are two uses of statistics in baseball. One is to evaluate players, and is mostly used by GMs, managers and agents. The other use is entertainment, and is mostly used by PR men, broadcasters and newspaper writers. Mr. Krasner, a writer, is mostly exposed to the entertainment side of the stats, which he gets in game notes from the PR person whenever he sees a game. The game notes don't try to do analysis; they provide writers and broadcasters with tidbits they can use to fill their columns. But this is where all the dumb stats come from, like hitting in July under a full moon.

For instance, radar guns are all the rage these days. Let's take Pedro Martinez for example. He can light up the gun at 96 mph. When he throws a fastball at a mere 94 mph, there's a palpable gasp in the crowd. Pedro's losing it. Is his shoulder about to fall off?

Really, what is the difference between 96 and 94? Not much from a hitter's point of view. Now, if Pedro's velocity suddenly should drop from 96 to 86, which is a big difference, would you need a radar gun to tell you? No, those doubles in the gap will give you all the raw data you need.


Not much difference? A two mile an hour difference in speed is 1/100th of a second in reaching the plate. But a ten mile an hour difference is still only 1/20th of a second. That's about the frame rate of a motion picture, meaning it's not something your eyes can easily pick up. Now it's amazing to me that anyone can hit a fastball, period, given the lack of reaction time. But if you make it easier for anyone with that kind of skill to hit the ball, it should be noted. And it is a sign of fatigue.
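The fractions of a second are easy to check. This is a rough sketch that assumes the ball travels the full 60.5 feet from the rubber to the plate at constant speed; in reality the release point is closer and the ball slows down in flight, so treat the output as ballpark figures. The names are mine.

```python
MOUND_TO_PLATE_FT = 60.5              # rubber to plate; assumed flight distance
FT_PER_SEC_PER_MPH = 5280 / 3600      # feet per second in one mile per hour


def flight_time(mph, distance_ft=MOUND_TO_PLATE_FT):
    """Seconds for a pitch to cover distance_ft at a constant speed."""
    return distance_ft / (mph * FT_PER_SEC_PER_MPH)


def extra_time(fast_mph, slow_mph):
    """How much longer the slower pitch takes to reach the plate."""
    return flight_time(slow_mph) - flight_time(fast_mph)


print(f"96 vs 94 mph: {extra_time(96, 94):.4f} seconds")
print(f"96 vs 86 mph: {extra_time(96, 86):.4f} seconds")
```

Under those assumptions the two gaps come out near one hundredth and one twentieth of a second.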

Anyway, he goes on for a while, but finally gets to his point.


Yes, numbers can point out trends and tendencies, which can be important. But the eyes of a veteran baseball man are even more important on a daily, game-by-game, inning-by-inning, pitch-by-pitch basis.

See, it doesn't matter how someone performs, it's how they look. And it's better to look good than feel good. And you look marvelous....

In the same paper, Art Martone, the Sports Editor, takes the pro-stat side:


His name was Joe Schultz, and he managed the Seattle Pilots in their sole season of existence.

Normally a one-time-only skipper like Schultz, fired more than 30 years ago after a 64-98 season, would be lost in the maze of history. But Schultz was immortalized -- sort of -- because Jim Bouton happened to be playing for the Pilots the year he wrote his ground-breaking book, Ball Four. It was in the pages of that book that Schultz articulated the anti-analytical sentiment that still flows deeply in the veins of the baseball establishment:

"I don't need no statistics. I see what's going on with my own eyes."

He may not have meant to, but with those 14 words Schultz conceived the Anti-Stathead Manifesto. As more and more statistics became available in the years to come, and more and more "analysts" began weighing in on topics historically left to "baseball men," traditionalists felt themselves under siege. And more and more, they responded -- angrily, in many cases -- by falling in line behind good ol' Joe Schultz. Numbers? I don't need no stinkin' numbers.

They're looking at it all wrong.

Statistics are information. Nothing more, nothing less. When evaluating talent, an organization should look at every piece of the puzzle. Statistics -- the right statistics -- are one of those pieces. Fact is, baseball may be the only industry in the world that has at its disposal such a detailed and complete record of how an employee actually performs. To ignore it is just as foolish as ignoring a scout's evaluation of how a player runs or throws or hits.

Some statistics are more meaningful than others. Just because it's possible to determine how left-handed-hitting middle infielders perform against curly-haired right-handers during midweek day games in months that end in the letter 'y' doesn't make it important. The Schultz-ites, however, tend to cherry-pick the numbers they're comfortable with -- home runs, runs batted in, batting average -- and wave away all the rest as stat geek nonsense.

"The interesting thing is that (those) people . . . use statistics themselves," says ESPN.com's Jim Baker, who once worked as a research assistant for current Red Sox executive (and father of the modern analytical community) Bill James. "They reference things like home runs, RBI, batting average, and the number of games a pitcher wins. Those are stats, just like (the more sophisticated statistics). They just don't tell as complete a story."


And that's what it comes down to. People who are uncomfortable with math will never be comfortable with OBA or Slugging Pct., let alone Runs Created or Win Shares. I'm going to do a stat primer at some point. For example, the anti-stat people are comfortable with batting average, but have you ever tried to figure out what batting average represents? That will be the subject of another post.

I get the feeling that the anti-stat crowd is waiting for the Red Sox to fail so they can pounce on Theo Epstein. I think they are going to wait a long time.

Posted by StatsGuru at 07:58 PM | TrackBack (0)
March 25, 2003
Gammons' Hot Topics
Permalink

Two things of interest in Peter Gammons' "Trends to Keep an Eye On" column. The first is the rise of OBA:


In doing a story on which is the most important statistic this spring, more than 80 percent of the managers and general managers responded, "on-base percentage."

"You can't score runs unless you get people on base," says Padres GM Kevin Towers.


"It is the statistic off which everything else follows," says Barry Bonds. "It is unquestionably the single most important statistic in the game. It's everything."


Ten years ago, you'd have heard RBI or home runs or the like. Now the Billy Beane/Oakland Athletics influence has changed the way people view players.


I would argue that it's Bill James' influence, but what's really important is that more than 20 years after Bill introduced the Abstracts, the concept is taking hold.

Second, closer by committee:


"I hear Boston's being operated by that stat guy," says one Marlins coach, meaning Bill James, who has long opposed the single closer theory.


No, James isn't "running" the Red Sox. Red Sox manager Grady Little talked about this last season, and GM Theo Epstein has long questioned the sagacity of spending $7 million to $10 million on one closer.


"It's a huge gamble," says Giants pitching coach Dave Righetti, himself once a great closer. "If Robb Nen goes down, we're in huge trouble."


"If it doesn't work in Boston," says Beane, "it isn't because the theory was wrong, it's because they had the wrong people."


In talking with people in baseball during spring training, it's been about 70-30 against going with the single closer.


I'm about 50-50 on this one. I think a lot of it will have to do with the makeup of the staff. Also, there will be a tendency (I believe) for managers to go to the same people in the same situation. So while there won't be a named closer on the Red Sox, my guess is that there will be a de facto closer.

Then again, why not teach your starters to be efficient and throw fewer than 100 pitches per game, so you don't need too many relievers? :-)

Posted by StatsGuru at 02:00 PM | TrackBack (0)