Baseball Musings
March 09, 2007
Who Gets Credit for a K?

Steve Lombardi of Was Watching wrote earlier this week with a question about strikeouts:

David - I have a question that I thought may interest you...and one that I thought perhaps you may be able to help answer. It's regarding whether a pitcher earns a strikeout or if the batter allows it.

Say you have a great strikeout pitcher - in terms of the numbers that he racks up. Let's call him Medro Partinez. And, say you have two batters - one that whiffs a lot and one who makes a lot of contact. Let's call the strikeout-prone one Kave Dingman and the contact-maven Gony Twynn.

Conventional wisdom suggests that when Medro Partinez whiffs Gony Twynn, it's the pitcher who should be credited with earning the strikeout - whereas when Medro Partinez whiffs someone like Kave Dingman, it's questionable whether Medro should get credit or Kave should get the blame (for allowing the whiff).

Is there a way to use the head-to-head data in the Day-to-Day database to determine if conventional wisdom is correct in this case? Should we be looking at pitchers' strikeouts somewhat like we look at "easy" and "tough" saves? Or, on the flip side, when looking at the value of a batter, should we be more concerned about "who" (meaning the type of pitcher) he whiffs against, more so than how many times he strikes out?

Many years ago I worked with Bill James on a game, and part of that game was predicting various rates for a particular batter against a particular pitcher. Bill used a formula, which I don't have permission to divulge, that predicts the rate of any stat for a particular batter vs. a particular pitcher, given each player's rate and the league average. In essence, the formula says that the matchup rate results from cooperation between the two. In the case of strikeouts, you would expect very few Ks from a matchup between a low-K pitcher and a low-K hitter, and you would expect a matchup between a high-K pitcher and a high-K hitter to be far above average. Against an average pitcher, the batter's K rate should be close to his career K rate, and vice versa.
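The formula isn't given here, but commenters below identify it as the odds ratio method, also called Log5. A minimal sketch, assuming that standard form: convert each rate to odds, multiply the batter's and pitcher's odds, divide by the league odds, and convert back to a rate. The function names and the league K rate in the example are hypothetical, and a single league rate is assumed for both batters and pitchers.

```python
def odds(p: float) -> float:
    """Convert a per-PA rate (a probability) to odds."""
    return p / (1.0 - p)


def matchup_rate(batter: float, pitcher: float, league: float) -> float:
    """Predicted per-PA rate for a batter vs. a pitcher (odds ratio / Log5 form).

    Assumes all three rates lie strictly between 0 and 1.
    """
    o = odds(batter) * odds(pitcher) / odds(league)
    return o / (1.0 + o)


if __name__ == "__main__":
    league = 0.17  # hypothetical league K/PA
    print(round(matchup_rate(0.10, 0.12, league), 3))    # low-K batter vs. low-K pitcher
    print(round(matchup_rate(0.25, 0.28, league), 3))    # high-K batter vs. high-K pitcher
    print(round(matchup_rate(0.25, league, league), 3))  # batter vs. an average pitcher
```

The first call lands below either player's own rate, the second above either, and the third returns the batter's own 0.25, which matches the three cases described in the paragraph above.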

If this formula is true, graphing the actual K rate for each matchup against the predicted K rate should yield the line y = x (slope of 1, intercept of 0). To test this, I looked at all batters and pitchers with at least 2100 PA (batters faced, for pitchers) since 2000, so we have a good measure of their K/PA, then chose matchups with at least 20 PA. Here's the graph with the trend line:


[Graph: actual K rate vs. predicted K rate for each matchup, with the trend line]
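The sample behind a graph like this could be assembled roughly as follows. This is a sketch, not the actual Day-to-Day database query: the `Totals` record, the dictionary layouts, and `build_points` are hypothetical, the prediction assumes the odds ratio (Log5) form, and only the 2100 and 20 PA thresholds come from the post.

```python
from dataclasses import dataclass

MIN_PLAYER_PA = 2100  # career PA (batters) or batters faced (pitchers), per the post
MIN_MATCHUP_PA = 20   # minimum head-to-head PA, per the post


@dataclass
class Totals:
    pa: int  # plate appearances, or batters faced for a pitcher's career line
    k: int   # strikeouts

    @property
    def k_rate(self) -> float:
        return self.k / self.pa


def matchup_rate(batter: float, pitcher: float, league: float) -> float:
    """Odds ratio (Log5) prediction; rates must lie strictly between 0 and 1."""
    def odds(p: float) -> float:
        return p / (1.0 - p)

    o = odds(batter) * odds(pitcher) / odds(league)
    return o / (1.0 + o)


def build_points(batters, pitchers, matchups, league_k_rate):
    """Return (predicted, actual) K/PA pairs for every qualifying matchup.

    batters, pitchers: dict of player id -> Totals (hypothetical layout)
    matchups: dict of (batter_id, pitcher_id) -> Totals
    """
    points = []
    for (b_id, p_id), m in matchups.items():
        b, p = batters.get(b_id), pitchers.get(p_id)
        if b is None or p is None:
            continue  # player missing from the career table
        if b.pa < MIN_PLAYER_PA or p.pa < MIN_PLAYER_PA or m.pa < MIN_MATCHUP_PA:
            continue  # too few PA for a stable rate
        predicted = matchup_rate(b.k_rate, p.k_rate, league_k_rate)
        points.append((predicted, m.k_rate))
    return points
```

Each pair becomes one dot on the graph, with the predicted rate on the x-axis and the actual rate on the y-axis.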

The equation of the regression line is y = .984x - 0.001, which is very close to y = x. This means the formula's prediction holds and the two contributions are about equal: strikeouts are a collaboration between pitchers and batters, and there's no reason to give one more credit than the other.
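For readers who want to reproduce this kind of check, here is a minimal ordinary least-squares sketch that fits actual K rate (y) on predicted K rate (x) and also reports R-squared, the figure asked about in the comments. The `fit_line` name and the (predicted, actual) input layout are assumptions; the post's own numbers came from its full matchup data, not from this code.

```python
def fit_line(points):
    """Ordinary least-squares fit of actual (y) on predicted (x).

    points: list of (predicted, actual) pairs.
    Returns (slope, intercept, r_squared).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0 or syy == 0:
        raise ValueError("need variation in both predicted and actual rates")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy * sxy / (sxx * syy)  # squared correlation for a one-variable fit
    return slope, intercept, r_squared
```

A slope near 1 and an intercept near 0 say the prediction is unbiased on average; R-squared separately measures how tightly individual matchups cluster around the line.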


Posted by David Pinto at 06:03 PM | Statistics
Comments

What's the r-squared value? Or some other approximation of the size of the error?

The slope of the best-fit line means that, "on average," a strikeout depends 50-50 on the batter's and pitcher's career K rates. But there are a lot of points far from the line. If the r-squared value is low, the specific batter-pitcher matchup K rate is not predicted very accurately from their career K rates and might be due to other factors.

I hope that made sense.

Posted by: Jeff at March 9, 2007 11:52 PM

http://www.insidethebook.com/ee/index.php/site/comments/the_odds_ratio_method/

Same deal? K/PA is a binomial too, and I seem to remember either Tango or Studes pointing out that it worked on some data you posted a while ago.

Posted by: HarryAbles at March 10, 2007 02:00 AM

This is Steve, who seems to have a personal vendetta against A-Rod, trying to manipulate stats in a way that makes Alex Rodriguez, the future Hall of Famer, look bad. Steve is the leader of a group of Yankee fans who don't like A-Rod despite 35 HR, 120 RBI and a .290/.392/.523 line in a so-called off year. It's getting to be f***ing ridiculous. Don't give him this free publicity for his crappy theories.

Posted by: Alex Rodriguez at March 10, 2007 02:21 AM

Logically, a pitcher who strikes out 25% of the batters he faces should strike out the average batter at a higher rate than average, and at a higher rate than a pitch-to-contact pitcher does, so more of the credit for the K should go to the high-K-rate pitcher. And a strikeout of a high-K-rate hitter by a pitch-to-contact pitcher is more likely due to the hitter than the pitcher, so the pitcher should get less credit. Not sure your data rules this out.

Starters are more likely to have average K rates than top relievers, so limiting the study to players who have matched up 20 times or more may not be the best approach, as it reduces the number of relief pitchers (RP) in the study.

Another way to study this would be to look at some of the high-K-rate pitchers (>20%) and break down their K rate based on the batters' average K rate (high >15%, medium 10-15%, low <10%).

More work than I would want to do, though.

Posted by: Paul Todd at March 10, 2007 03:38 AM
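Paul Todd's proposed breakdown could be sketched as below: for one high-K pitcher, pool his strikeouts and PA by the K-rate tier of the batters he faced. The tuple layout and function names are hypothetical, and the "low" tier is taken as below 10%, the range left over by the other two tiers.

```python
from collections import defaultdict

HIGH_K_PITCHER = 0.20  # pitchers above 20% K/PA, per the comment


def batter_tier(k_rate: float) -> str:
    """Batter K/PA tiers from the comment; 'low' is assumed to mean below 10%."""
    if k_rate > 0.15:
        return "high"
    if k_rate >= 0.10:
        return "medium"
    return "low"


def k_rate_by_batter_tier(pitcher_k_rate, matchups):
    """Pool one high-K pitcher's matchups by the tier of batter he faced.

    matchups: iterable of (batter_k_rate, pa, k) tuples (hypothetical layout).
    Returns a dict mapping tier -> pooled K/PA.
    """
    if pitcher_k_rate <= HIGH_K_PITCHER:
        raise ValueError("pitcher is not in the high-K group")
    pa = defaultdict(int)
    k = defaultdict(int)
    for batter_k_rate, m_pa, m_k in matchups:
        tier = batter_tier(batter_k_rate)
        pa[tier] += m_pa
        k[tier] += m_k
    return {tier: k[tier] / pa[tier] for tier in pa if pa[tier] > 0}
```

Pooling PA and strikeouts within each tier, rather than averaging per-matchup rates, keeps short matchups from being over-weighted.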

R squared was .372.

David

Posted by: David Pinto at March 10, 2007 07:53 AM

Looking at the scatterplot, it looks like there could be some problems with relying on a strictly linear approximation of the trend line. The regression line seems to be pulled downward both by the set of data points where Actual K Rate = 0 and by the odd cluster where Actual K Rate = .04 (a cluster that stands out from the rest of the data points). I'd be curious what the regression line is if you ignore these points. And even if they are not ignored, maybe a binomial specification would give you a more accurate regression line, as it could account for the more horizontal trend at low Actual K Rate values and the more vertical trend at higher values.

Posted by: ||| at March 10, 2007 11:10 AM

Yes, we had this discussion a month or two ago. It's the Odds Ratio method (or Log5 for Bill James). I don't know what the big secret is about it, but I've revealed the details on the link that HarryAbles points to.

Posted by: tangotiger at March 10, 2007 07:28 PM

Also see this article in By the Numbers.

Bill James credited Dallas Adams for developing the method.

Posted by: JoeArthur at March 10, 2007 08:35 PM