Baseball Musings
February 28, 2009
Probabilistic Strike Zone Model

For the last few days I've been working with the PITCHf/x data, with the idea of applying some kind of probabilistic model to the strike zone. I started with a very simple model to learn my way around the data and to prove the concept.

My initial model looks at the chance of positive and negative outcomes for pitchers. A negative outcome is a called ball, or a ball in play resulting in a hit. Positive outcomes are all others. I divide the X and Z axes into three-inch lengths, and look at the results when a ball passes through each three-inch by three-inch area. I use the following formula for the x coordinate:

round((12*(p.px+12))/3, 0)

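Here p.px is the horizontal distance from the center of the plate in feet. Adding twelve makes every value positive, and multiplying by 12 and dividing by 3 converts feet into three-inch units. A pitch over the center of the plate (p.px = 0) therefore lands at 12*12/3 = 48.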

For the Z axis, I use the same formula, substituting p.pz for p.px. In this case 48 represents the ground. The computed px and pz together then define a zone. Some zones will be inside the strike zone, some will be outside it, and some will straddle the edge, partly in and partly out.
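As a rough illustration, here is the same binning written in Python. This is only a sketch of the formula above, not the query I actually ran; the function and constant names are made up for the example, and I use round-half-up because Python's built-in round() sends exact halves to the nearest even number, which may not match what a database round() does.

```python
import math

INCHES_PER_FOOT = 12
BIN_INCHES = 3      # each zone is three inches on a side
OFFSET_FEET = 12    # the "plus twelve" that keeps every index positive


def zone_index(coord_feet):
    """Map a px or pz value (in feet) to its three-inch zone index."""
    scaled = INCHES_PER_FOOT * (coord_feet + OFFSET_FEET) / BIN_INCHES
    # Round half up (an assumption about tie handling); Python's round()
    # would round exact halves to the nearest even integer instead.
    return math.floor(scaled + 0.5)


def zone(px_feet, pz_feet):
    """Return the (x zone, z zone) pair for one pitch."""
    return zone_index(px_feet), zone_index(pz_feet)
```

With these definitions, zone(0.0, 2.5) comes out as (48, 58): a pitch over the center of the plate, two and a half feet off the ground.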

I then build the model based on the computed px, pz, and the batter side (left or right). The value for each zone is the number of positive outcomes in that zone divided by the total number of pitches in that zone. The following two tables show the model for zones with at least 200 pitches.
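The counting step can be sketched in Python along these lines. The input layout and the outcome labels ("called_ball", "in_play") are hypothetical stand-ins chosen for the example, not the actual PITCHf/x field values, and this is not the script that produced the tables.

```python
from collections import defaultdict

MIN_PITCHES = 200  # only zones with at least this many pitches are reported


def is_negative(call, is_hit):
    """Negative for the pitcher: a called ball, or a ball in play that is a hit.

    The call labels here are hypothetical, not real PITCHf/x codes.
    """
    return call == "called_ball" or (call == "in_play" and is_hit)


def build_model(pitches, min_pitches=MIN_PITCHES):
    """pitches: iterable of (x_zone, z_zone, batter_side, call, is_hit).

    Returns {(x_zone, z_zone, batter_side): share of positive outcomes}
    for the zones that meet the pitch-count threshold.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for x_zone, z_zone, side, call, is_hit in pitches:
        key = (x_zone, z_zone, side)
        totals[key] += 1
        if not is_negative(call, is_hit):
            positives[key] += 1
    return {
        key: positives[key] / total
        for key, total in totals.items()
        if total >= min_pitches
    }
```

Keeping the batter side in the key is what produces one table per handedness; dropping the threshold to a small number is handy for checking the logic on a few hand-made pitches.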

Due to the formatting of the blog, it's easier to read this at the permalink.

Right-handed Hitters, 2007-2008
Strike Zone, Catcher View
Rows are the computed pz zone, columns are the computed px zone. A "-" marks a zone with no value reported.

pz   40   41   42   43   44   45   46   47   48   49   50   51   52   53   54   55   56
66    -    -    -    -    -    -    -    - .127    -    -    -    -    -    -    -    -
65    -    -    -    - .143 .166 .228 .256 .243 .201 .206    -    -    -    -    -    -
64    -    -    - .158 .219 .288 .350 .337 .377 .327 .268 .212 .120 .054    -    -    -
63    -    - .106 .211 .331 .411 .485 .519 .530 .465 .401 .290 .177 .100    -    -    -
62    - .084 .188 .275 .425 .557 .673 .719 .719 .669 .600 .431 .267 .142 .059    -    -
61    - .084 .240 .359 .592 .748 .822 .854 .864 .848 .800 .689 .448 .204 .099 .046    -
60    - .092 .257 .423 .701 .822 .870 .879 .885 .893 .895 .830 .619 .312 .143 .082    -
59 .062 .116 .254 .484 .728 .859 .868 .856 .862 .874 .893 .859 .685 .341 .197 .102 .045
58    - .140 .279 .504 .753 .842 .859 .857 .863 .862 .873 .870 .721 .416 .184 .108 .037
57    - .151 .248 .497 .716 .826 .850 .838 .859 .863 .874 .863 .689 .371 .169 .119 .075
56    - .107 .257 .441 .668 .769 .785 .825 .835 .848 .853 .800 .579 .321 .196 .123 .060
55    - .085 .235 .348 .509 .613 .663 .685 .719 .711 .699 .593 .420 .257 .203 .101 .095
54    -    - .165 .272 .372 .445 .444 .500 .488 .488 .448 .380 .288 .210 .142 .126 .065
53    -    -    - .257 .304 .340 .362 .398 .380 .389 .321 .306 .246 .203 .156 .109 .076
52    -    -    - .136 .220 .259 .286 .290 .325 .312 .310 .243 .231 .178 .138 .112    -
51    -    -    -    -    - .212 .254 .211 .239 .242 .247 .226 .180 .128 .096 .069    -
50    -    -    -    -    -    -    -    - .169 .196 .222 .142 .146 .126 .094    -    -

Left-handed Batters
Strike Zone, Catcher View
Rows are the computed pz zone, columns are the computed px zone. A "-" marks a zone with no value reported.

pz   40   41   42   43   44   45   46   47   48   49   50   51   52   53   54
65    -    -    -    -    - .148    - .156    -    -    -    -    -    -    -
64    -    - .015 .078 .121 .234 .288 .344 .414 .313 .303    -    -    -    -
63    - .045 .062 .094 .211 .319 .468 .538 .557 .543 .428 .337 .218    -    -
62 .024 .020 .090 .196 .376 .526 .616 .695 .697 .687 .589 .462 .301 .173    -
61 .016 .039 .112 .298 .610 .792 .837 .827 .850 .838 .773 .603 .396 .168    -
60 .021 .074 .193 .481 .773 .895 .879 .883 .877 .858 .855 .693 .513 .270    -
59 .055 .126 .267 .570 .843 .883 .882 .877 .861 .840 .858 .790 .551 .287 .169
58 .055 .120 .266 .618 .860 .886 .877 .863 .838 .847 .836 .760 .536 .290 .184
57 .042 .104 .289 .590 .849 .872 .868 .843 .851 .847 .848 .768 .525 .277 .151
56 .077 .111 .269 .514 .780 .858 .854 .833 .822 .831 .784 .651 .458 .289 .126
55    - .130 .189 .380 .583 .670 .709 .713 .705 .658 .600 .520 .338 .263 .168
54    - .098 .168 .284 .353 .437 .478 .501 .479 .463 .456 .383 .303 .205 .094
53    -    - .141 .206 .280 .330 .387 .387 .427 .354 .342 .298 .205 .211    -
52    -    -    - .197 .209 .261 .331 .316 .348 .311 .251 .223 .250    -    -
51    -    -    -    - .222 .270 .236 .255 .282 .267 .197 .182    -    -    -
50    -    -    -    -    -    -    - .181 .168 .192 .190    -    -    -    -

I actually thought there would be a bigger difference between throwing the ball down the middle of the plate versus throwing on the edges. Pitchers do a bit better to the catcher's right, regardless of batter handedness. Up in the zone is better for pitchers than down in the zone, at least over the heart of the plate.

Of course, the problem with this look at the data is that an out counts just as much in the pitcher's favor as a called strike one. There are many other parameters to take into account, including speed and break. This is just a start.

Correction: I fixed the table vs. LHB. When I ran the script that creates the table, I only changed the hand in one of the two queries.


Posted by David Pinto at 01:11 PM | Pitchers
Comments

Color me confused - The RHH and LHH charts are identical - same values in each cell, except there are two extra columns and one extra row for RHH.

?

Posted by: Harry Pavlidis at February 28, 2009 04:26 PM

You're right. I'll look into this.

Posted by: David Pinto at February 28, 2009 06:02 PM

"Positive outcomes are all others" This would presumably include the umpteenth nasty pitch that the batter ruined by fouling it off -- giving him ever increasing insight into what the pitcher has. Statistically, it's clear, batting average improves as hitters see more pitches in a given at-bat. Surely these fouled off pitches should not be viewed as "positive" for the pitcher.

Posted by: Harry Kanigel at February 28, 2009 07:23 PM

"Statistically, it's clear, batting average improves as hitters see more pitches in a given at-bat."

Actually, it's statistically clear that the batting average declines as the hitter sees more pitches. MLB BA overall in 2008 was .264, but for all counts that got to 2 strikes the BA dropped to .190 (including those with "umpteen" fouls).

Without a doubt, anything that gets the count to 2 strikes is a positive for the pitcher.

Posted by: thumble at February 28, 2009 11:55 PM

David, can you do some sort of shaded map for this?

Posted by: Pizza Cutter at March 1, 2009 02:56 PM

"Statistically, it's clear, batting average improves as hitters see more pitches in a given at-bat."

Permit me to amend like so:
Statistically, it's clear, batting average improves as hitters see more pitches in a given at-bat after two strikes.
Do you seriously dispute this?

Posted by: Harry Kanigel at March 3, 2009 08:24 PM