BitTorrent is software that allows you to download very large files very quickly by pulling small pieces of data from many different systems. Other projects search for compounds that might make new drugs by taking advantage of spare CPU time on computers connected to the internet. Distributing the work allows more to get done faster without burdening one particular machine or person.
I’m wondering if the same would work for scoring games. Companies that collect game data require two or three people to devote a few hours to watching and tracking a sporting event. It means you need to pay attention during the entire period. It’s difficult to take a phone call, deal with your family, get a bite to eat, even go to the bathroom. These companies compensate the scorer with press box seats, or a small wage, or both.
And because they compensate their scorers, they need to charge for their stats. But what if, instead of requiring hours to score a game, advance sign-ups and so forth, all that was required was a small amount of a fan’s leisure time? You’re sitting down to watch a game for half an hour? Surf to the Distributed Scoring System and enter the plays. Or score on paper and enter it the next day.
Some people might do a full game, some might just do an inning. The idea would be to get enough people that you’d cover the whole game at least twice (for error checking purposes). The results would go into a play-by-play database from which you could then do any kind of research you wanted.
A rough specification of the system would look like this:
- Web Based: It should run in almost any browser. It probably should be written in a server-side scripting language, since you can’t depend on machines supporting JavaScript.
- The smallest unit of scoring would be the half inning. Events in the half inning are dependent on previous events in the half inning, but each half inning is independent. If we made the smallest unit the plate appearance, the user would need to put in a lot more information for each PA.
- The order of the innings shouldn’t matter. People could score the innings backward, 9 to 1. They just need to input an inning beginning to end.
- Input would be a pitch sequence followed by an event type. That would take you to another screen where the details of the event are entered, depending on the type of event and the situation.
- Input should be as simple as possible. Single keystrokes should represent pitches and events as much as possible. Having worked with both text based systems and mouse based systems, I find the text based systems are faster for me.
- Data would be stored in a temporary database until it could be verified.
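To make the input idea in the list above a little more concrete, here is a minimal Python sketch of a half-inning record built from single-keystroke entries. Everything in it is an assumption for illustration: the key codes in `PITCH_KEYS` and `EVENT_KEYS`, the class names, and the `game_id` value are hypothetical, and the detail screen that would follow each event is left out.

```python
from dataclasses import dataclass, field

# Hypothetical single-keystroke codes; the post does not define a code set.
PITCH_KEYS = {"B": "ball", "C": "called strike", "S": "swinging strike",
              "F": "foul", "X": "ball in play"}
EVENT_KEYS = {"K": "strikeout", "W": "walk", "1": "single", "2": "double",
              "3": "triple", "H": "home run", "O": "out in play"}


@dataclass
class PlateAppearance:
    pitches: list  # decoded pitch names, in order
    event: str     # decoded event type


@dataclass
class HalfInning:
    game_id: str   # hypothetical identifier for the game being scored
    inning: int
    top: bool      # True for the top half, False for the bottom
    plate_appearances: list = field(default_factory=list)


def parse_entry(entry: str) -> PlateAppearance:
    """Decode one typed line: zero or more pitch keys, then one event key."""
    keys = entry.strip().upper()
    if not keys:
        raise ValueError("empty entry")
    *pitch_keys, event_key = keys
    for key in pitch_keys:
        if key not in PITCH_KEYS:
            raise ValueError(f"unknown pitch key: {key!r}")
    if event_key not in EVENT_KEYS:
        raise ValueError(f"unknown event key: {event_key!r}")
    return PlateAppearance([PITCH_KEYS[k] for k in pitch_keys],
                           EVENT_KEYS[event_key])


# Innings can be entered in any order; each one is a self-contained record.
half = HalfInning(game_id="example-game", inning=9, top=False)
for line in ["BCFX1", "CSSK", "BBBBW"]:
    half.plate_appearances.append(parse_entry(line))
```

Because the last key is always read as the event, the same letter can safely mean one thing as a pitch and another as an event, which keeps the number of keys to memorize small.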
I’m interested in what people think about this idea, both from technical people on the feasibility of the programs, and from others on whether they would be willing to score like this. It’s a way to give all bloggers, baseball researchers and fans on the net an easy way to research complicated questions.


This is a great idea, and actually, it wouldn’t take long at all to throw together a server and a quick program for people to do it. Then means could be taken, along with a 3-6 sigma standard deviation, to account for variability in users’ input.
Heck, you could even do it in VB in a couple of hours, maybe less time.
Another issue worth thinking about–MLB’s recent push to consolidate its scoring services so that it retains a monopoly on box scores and stat lines. This seems like an impossibly silly idea to me, precisely because of the idea you’re proposing. It’s just completely unenforceable.
Nevertheless, I wonder what MLB’s response would be from a legal standpoint. I imagine they would try to paint you in the same light as Grokster–even if the central server did none of the “scoring,” it enables people to infringe the “copyright” of MLB, that is their box scores. Depending on how one reads the Grokster decision, either MLB wins, or a court would recognize the inherent limitations Grokster presents. In any event, it’s worth thinking about before pursuing this project.
You wouldn’t want to do it in VB. Like David said, a server-side language (Java, PHP, etc.) would work best. And even better, the project should be Open Source so that lots of people could contribute to the final product.
Of course, a comprehensive list of requirements should be gathered BEFORE any thought of coding could be started. That way the community can gauge how much work needs to be done, etc.
setting up a program like this is easy, the basics are downright trivial. I could cobble it together in php and mysql fairly quickly.
the tricky part, and where all the added value is, is what you can *do* with all this data. custom exports of the data would be extremely powerful. team stats, player stats are all good but they’re only the tip. easily available exports of ridiculous minutiae would be a HUGELY attractive idea. for example, batting average and OBP of batters when scott podsednik is in a steal situation.
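As a rough idea of what such an export could look like, here is a small Python sketch using the standard `sqlite3` module. The `plays` table, its columns, and the rows in it are all made up for illustration. "Steal situation" is assumed to mean the runner is on first with second base empty. The at-bat count is simplified: it ignores hit-by-pitch, sacrifices and the like.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical one-row-per-plate-appearance schema.
conn.execute(
    "CREATE TABLE plays (batter TEXT, runner_on_first TEXT, "
    "second_base_empty INTEGER, event TEXT)"
)
# Made-up rows for illustration only; these are not real play-by-play data.
conn.executemany(
    "INSERT INTO plays VALUES (?, ?, ?, ?)",
    [
        ("Batter A", "Scott Podsednik", 1, "single"),
        ("Batter B", "Scott Podsednik", 1, "out"),
        ("Batter A", "Scott Podsednik", 1, "walk"),
        ("Batter B", "Scott Podsednik", 1, "out"),
        ("Batter A", "Scott Podsednik", 1, "home run"),
        ("Batter A", "Someone Else", 1, "single"),
        ("Batter B", "Scott Podsednik", 0, "double"),
    ],
)


def split_with_runner(conn, runner):
    """Return (AVG, OBP) for all batters while `runner` is in a steal situation."""
    hits, walks, at_bats = conn.execute(
        """
        SELECT COALESCE(SUM(event IN ('single', 'double', 'triple', 'home run')), 0),
               COALESCE(SUM(event = 'walk'), 0),
               COALESCE(SUM(event <> 'walk'), 0)
        FROM plays
        WHERE runner_on_first = ? AND second_base_empty = 1
        """,
        (runner,),
    ).fetchone()
    avg = hits / at_bats if at_bats else None
    plate_appearances = at_bats + walks
    obp = (hits + walks) / plate_appearances if plate_appearances else None
    return avg, obp


avg, obp = split_with_runner(conn, "Scott Podsednik")
```

The query itself is short. The hard part is storing the base-running situation with every play so that questions like this can be asked at all.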
While you might run into flak from MLB over this, the real question is what will retrosheet think of it?
Anyone who says they can put this program together quickly has never written a real piece of software before. There are many, many things that need to be thought about before just throwing stuff together on a web page. Anyone can put a couple of textboxes on a page and hook it up to a database, but with an application like this you have to think about stuff like:
– putting the entire season’s games into the DB so a user can choose which game they are scoring.
– how will a user declare which player is batting/pitching?
– how do you change a batter for a pinch hitter/runner?
– will you give the scorer a dropdown of all the possible marks they can make for a batter?
– how are you going to lay out the app so that it’s easy to use for the average user?
– etc. etc. etc.
Like I said, it’s easy to throw crap on a page and call it an application, but if David wants this to be an application that lots of people use, it needs to be carefully thought out and constructed. If someone comes to the site to score a game, and it’s hard to use, difficult to understand, or just plain ugly, then the chances of that person returning are slim to none, and the project will likely fail.
Oh yeah, and you’ll need algorithms to determine what to do when there is a conflict, and there will be many, many of them. Sure, you can select the thing that is the most common, but that isn’t always the correct answer.
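One way to act on that caution is to treat the most common answer as a suggestion and flag anything short of strong agreement for a human to review. The Python sketch below is only an illustration under that assumption. The function name `resolve` and the thresholds `min_votes` and `min_share` are hypothetical, and the default values are arbitrary.

```python
from collections import Counter


def resolve(reports, min_votes=2, min_share=0.75):
    """Return (suggested_value, needs_review) for one scored event.

    `reports` holds each scorer's entry for the same event. The most common
    entry is only accepted automatically when enough scorers agree on it.
    """
    counts = Counter(reports)
    if not counts:
        return None, True  # nobody has scored this event yet
    top, votes = counts.most_common(1)[0]
    share = votes / len(reports)
    accepted = votes >= min_votes and share >= min_share
    return top, not accepted


unanimous = resolve(["single", "single", "single"])
disputed = resolve(["single", "error", "single"])
lone_scorer = resolve(["single"])
```

In the disputed case the majority entry still comes back as the suggestion, but it is marked for review rather than written straight into the verified data.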
Sabernar is correct. I’ve written scoring programs in the past, and there are a million little things you need to deal with (like how to handle batting out of turn). My guess is that if we start now, we could have this in place for the 2006 season.
All of which can be handled fairly quickly. I guess my definition of “fairly quickly” in regards to programming is different than yours.
Get a project manager, set up a SourceForge page, have it all open source and have people work on different parts. the basics will be done pretty quickly.
it’s the analysis aspect that has the most value and will be the most difficult.
Distributed scoring is an excellent idea, but there is one major non-software issue to consider–reliability. It’s the same issue that you must always consider when you allow distributed human input.
Distributed computing works well because your computer is just as reliable as my computer. But human input? That’s quite variable. You briefly touch on this point when you say that “Data would be stored in a temporary database until it could be verified.”
As you probably know, there are typically four ways to verify: 1. general consensus, 2. software to check for self-consistency, 3. one of many distributed (often self-proclaimed) experts, and 4. a small number of in-house real experts.
Do you plan on using a combination of ways to verify? If enough people enter the same info for the same half inning, then it must be correct? If select users input data, then it must be automatically correct? (Even if the data is different from the majority?) Will you have software construct and proof boxscores as penultimate verification? Will you have a small staff to compare proofed boxscores with official boxscores as final verification? Will that same staff check some fuzzy stats (like some errors/hits, holds and wins when no pitcher pitches a plurality of innings in a win)?
If you want reliable data, you can rely on distributed input as a suggestion but not the final word. You do need software verification for self-consistency as well as some expert(s) somewhere to put a final stamp of approval. (But that last step isn’t so bad…it’s always much easier to verify and correct than it is to input from scratch.)
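As one small example of what a software self-consistency check might look like, the Python sketch below verifies that a scored half inning adds up to three outs and that nothing is recorded after the third out. The event names and the `OUTS` table are hypothetical, and the check is deliberately simplified: outs made on the bases, such as a caught stealing, would need their own entries.

```python
# Hypothetical event names mapped to the number of outs each one records.
OUTS = {"strikeout": 1, "groundout": 1, "flyout": 1,
        "double play": 2, "triple play": 3}


def check_half_inning(events, walk_off=False):
    """Return a list of problems found in one half inning's event list.

    `walk_off` marks a final half inning that may legitimately end with
    fewer than three outs.
    """
    problems = []
    outs = 0
    for number, event in enumerate(events, start=1):
        if outs >= 3:
            problems.append(f"event {number} ({event}) recorded after the third out")
        outs += OUTS.get(event, 0)
    if outs > 3:
        problems.append(f"{outs} outs recorded, expected 3")
    elif outs < 3 and not walk_off:
        problems.append(f"only {outs} outs recorded, expected 3")
    return problems


clean = check_half_inning(["single", "strikeout", "double play"])
short = check_half_inning(["single", "strikeout"])
extra = check_half_inning(["strikeout", "strikeout", "strikeout", "single"])
```

A check like this cannot tell whether a play was a hit or an error, but it can reject entries that cannot be right before a person ever has to look at them.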
How do you feel about Wikipedia? Do you believe everything you read? I don’t. Not when I don’t know the source. I take everything as a suggestion and a starting point. Encyclopedia Britannica (EB) is now changing how they operate and allowing user input! But the user input will go into a temporary database until their experts approve it. Once approved, the info will go on their site. EB sees the value in user input, but also understands that reliability is key.
tvtome.com allowed unverified distributed user input about tv shows, but the site was set up wrong so that the tv boxscore couldn’t be proofed for airing times. Oops! tv.com bought tvtome.com and now requires users to input data that is verified by their experts.
MV makes very good points. My idea is to have at least two people scoring every inning. A program would compare innings against each other, and send each reporter a list of differences, if any exist. Reporters could then go check sources (video recordings, audio recordings, published play-by-plays) to see whether those sources agree with their account.
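A minimal Python sketch of that comparison step might look like the following. It assumes each account is simply an ordered list of plate-appearance entries for the same half inning and compares them position by position. The function name and the entry strings are hypothetical.

```python
from itertools import zip_longest


def diff_accounts(account_a, account_b):
    """List (position, entry_a, entry_b) wherever two accounts disagree.

    A missing entry shows up as None, so a reporter who skipped a plate
    appearance is flagged too.
    """
    differences = []
    pairs = zip_longest(account_a, account_b)
    for position, (entry_a, entry_b) in enumerate(pairs, start=1):
        if entry_a != entry_b:
            differences.append((position, entry_a, entry_b))
    return differences


reporter_1 = ["BCX single", "CCS strikeout"]
reporter_2 = ["BCX single", "CFS strikeout", "BX flyout"]
to_review = diff_accounts(reporter_1, reporter_2)
# to_review == [(2, "CCS strikeout", "CFS strikeout"), (3, None, "BX flyout")]
```

A position-by-position comparison is the simplest choice, but one skipped plate appearance makes every later entry look wrong. A real version would probably want an alignment step, for example with the standard `difflib` module.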
The same thing could also be done by a “Home Office”, which could be distributed trusted individuals.
There would also need to be individuals who did daily or weekly stat checking vs. other published sources.
Errors may not even be the fault of the reporter. If there are two broadcasts of a game, one might report the change of a scoring decision while the other might not. By having two accounts (with luck, from different sources) we can catch these.
And huge kudos to MV for using the word penultimate!
One idea might be to approach SBNation
http://www.sbnation.com/
It has blogs with communities for almost every ML team, and you could probably easily get people scoring games just from there as part of the SBNation family.
I love this idea. One thing I’d like to see included that wasn’t previously mentioned – an API accessible by third parties.
Such an API could be used to construct alternative interfaces for inputting data or as a way of getting data out of the system for people who want to query the results.
Also a very good idea.
I would be willing to help code the project. I have 5 1/2 years of experience with PHP and MySQL. It sounds like an interesting project. I have a full-time job, but I would be willing to spend an hour or two a day on this project.