May 29, 2016

More on Beating the Streak

I ran a test of the neural network (NN) I’m using to produce output for the Beat the Streak game. I built the network by selecting 10 random dates from each even year from 1980 to 2008, a total of 15 years. I took all games on those 150 dates and determined the starting pitcher and the starting opposition lineup for each game, leaving out the opposing pitcher. My guess is that this worked out to about 13 games a day and 17 position players per game, or a total of about 33,000 samples. I then trained the network on 75% of those samples and used the remaining 25% for validation. I ran 200 training epochs, but the NN converged rather quickly.
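As a rough check on the sample count, the sketch below draws 10 dates per even year and multiplies out the estimates above. The April 1 to September 30 window and the helper name `sample_dates` are assumptions for illustration; the post does not say how the dates were actually drawn, and a real version would sample only from dates on which games were played.

```python
import random
from datetime import date, timedelta

def sample_dates(years, per_year, rng):
    """Pick `per_year` distinct dates per year from an ASSUMED Apr 1 - Sep 30 window."""
    picked = []
    for year in years:
        start = date(year, 4, 1)
        span = (date(year, 9, 30) - start).days + 1
        offsets = rng.sample(range(span), per_year)  # distinct day offsets
        picked.extend(start + timedelta(days=d) for d in sorted(offsets))
    return picked

rng = random.Random(0)                      # fixed seed so the draw is repeatable
train_years = list(range(1980, 2009, 2))    # even years 1980 through 2008
dates = sample_dates(train_years, 10, rng)

# Estimates quoted in the post: about 13 games a day, 17 batters per game.
games_per_day = 13
batters_per_game = 17
approx_samples = len(dates) * games_per_day * batters_per_game

# 75% for training, the remainder for validation.
n_train = approx_samples * 3 // 4
n_valid = approx_samples - n_train

print(len(train_years), len(dates), approx_samples, n_train, n_valid)
```

With those estimates the product is 150 × 13 × 17, which is where the figure of about 33,000 samples comes from.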

To test, I used 160 dates, 40 each from 2003, 2005, 2007, and 2009, so there is no overlap with the training data. For each date, I used the NN to find the match-up with the highest probability of a hit, and recorded whether the batter did indeed get a hit that day. I ordered the data by date and looked for streaks. Here are the results:

Days: 160, Expected Games with hit: 124.9, Actual Game with hit 129.
Streak Length: 1, Number of times: 8
Streak Length: 2, Number of times: 5
Streak Length: 3, Number of times: 2
Streak Length: 4, Number of times: 3
Streak Length: 5, Number of times: 2
Streak Length: 6, Number of times: 2
Streak Length: 8, Number of times: 1
Streak Length: 9, Number of times: 1
Streak Length: 10, Number of times: 1
Streak Length: 13, Number of times: 1
Streak Length: 14, Number of times: 1
Streak Length: 16, Number of times: 1
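One way to produce a tally like the one above from a date-ordered list of hit/no-hit outcomes is sketched here. The function name `streak_counts` and the boolean input format are assumptions, not the code behind these results, and this version records only runs of at least one hit.

```python
from collections import Counter

def streak_counts(hits):
    """Tally the lengths of consecutive-hit runs in a date-ordered sequence."""
    counts = Counter()
    run = 0
    for got_hit in hits:
        if got_hit:
            run += 1                # streak continues
        else:
            if run:
                counts[run] += 1    # a miss ends the current streak
            run = 0
    if run:
        counts[run] += 1            # streak still alive on the last date
    return counts

# Made-up outcomes for nine consecutive dates (True = the pick got a hit).
example = [True, True, False, True, False, False, True, True, True]
for length, times in sorted(streak_counts(example).items()):
    print(f"Streak Length: {length}, Number of times: {times}")
```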

So in this case, the NN underestimates the probability of the batter getting a hit that day. That’s good; I prefer a conservative model. It predicts a probability of .78 and delivers .806. It’s in the ball park.
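To put a number on how close that is, here is a small sketch using the figures above. It assumes the expected count is the sum of the NN’s daily probabilities and treats the 160 days as independent trials with a common hit probability, which is a simplification. The function name `calibration_gap` is hypothetical.

```python
import math

def calibration_gap(expected_hits, actual_hits, days):
    """Return predicted rate, observed rate, and their gap in standard errors."""
    predicted = expected_hits / days
    observed = actual_hits / days
    # Standard error of a proportion under the ASSUMED common probability.
    std_err = math.sqrt(predicted * (1 - predicted) / days)
    return predicted, observed, (observed - predicted) / std_err

predicted, observed, z = calibration_gap(124.9, 129, 160)
print(f"predicted {predicted:.3f}, observed {observed:.3f}, gap {z:.2f} standard errors")
```

Under those assumptions the observed rate sits less than one standard error above the prediction, so a gap of this size is about what chance alone would produce over 160 days.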

Note, however, that the long streaks are not very long. The streak only made it into double digits four times.

As a sanity check, here is what happens when the worst player is chosen every day:

Days: 160, Expected Games with hit: 94.4, Actual Game with hit 97.
Streak Length: 1, Number of times: 16
Streak Length: 2, Number of times: 9
Streak Length: 3, Number of times: 9
Streak Length: 4, Number of times: 2
Streak Length: 5, Number of times: 1
Streak Length: 7, Number of times: 2
Streak Length: 8, Number of times: 1

The NN expects a hit in a game with probability .59, and the players actually got a hit in .606 of their games. Not only that, but there are no double-digit streaks. I’m liking this a lot.

Update: Just to be complete, I selected a random player from each day to see what the results would be:

Days: 160, Expected Games with hit: 109.8, Actual Game with hit 110.
Streak Length: 0, Number of times: 19
Streak Length: 1, Number of times: 13
Streak Length: 2, Number of times: 5
Streak Length: 3, Number of times: 3
Streak Length: 4, Number of times: 2
Streak Length: 5, Number of times: 1
Streak Length: 6, Number of times: 4
Streak Length: 8, Number of times: 1
Streak Length: 13, Number of times: 2

So here, the expected probability of a random draw is .686, and the actual production is .688. I’m fairly convinced at this point that the NN is giving us a good group of players from which to choose.
