CodeX Intern: Predicting the future: election results and more

Ken Jennings graciously losing to IBM's "Watson" on Jeopardy

Not only can computers steal our jobs and win at Jeopardy, they can also predict the future with frightening accuracy. Yesterday, Professor Daniel Katz gave a fascinating presentation on quantitative prediction in my legal informatics class. His focus was primarily on the legal field, but his explanation of how we've arrived at a point where legal prediction is possible applies to many other areas.

His slides are available below. Even if you're not interested in the legal field, I would highly recommend a quick skim of the first 60 or so slides for a general overview of the rise of technology. (To my fellow law students: the information past slide 60 may disturb you, but I urge you to power through!)

Katz starts off with an area whose results we seem to be really interested in predicting: elections. This is done by poll aggregation, i.e., gathering the results of many different polls and weighting them to create the most accurate composite. Does this sound easy? It's not. Assigning specific weights to poll results is a difficult task. Many factors need to be taken into consideration, and since there's only one presidential election every four years, the cycle of learning from past events and using that information to improve your method is incredibly slow.* Another question is how much of the past is useful for predicting the future. There have obviously been many presidential elections, but as you might imagine, poll results from the 1800s might have lost their relevance by now. But as Nate Silver demonstrated by correctly predicting 49 out of the 50 states as well as all 35 Senate races in 2008, and then all 50 states in the November 6th election, predicting election results accurately can be done.

*In contrast, take the example of Google search results. Katz jokes about being an "aspirational speller", meaning that if you know the approximate letters of a word and their order, Google will take care of the rest. When you put something misspelled into Google, many results come up. One (or two) will be correct, and you tell Google which one by clicking on it. This information is used to improve the results of the next search made with the same misspelled query. This presumably happens billions of times, making Google very good at deciphering poor spelling. I believe your spellcheck, especially on something like a mobile device, works the same way.
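To make the feedback loop in that footnote a bit more concrete, here is a toy sketch in Python. It is not how Google actually does it (I have no idea what their real system looks like); the class name `SpellingFeedback` and its methods are made up for illustration. The only idea it captures is the one above: every click on a result for a misspelled query is a vote, and the next person who types the same misspelling gets the most-voted answer.

```python
from collections import Counter, defaultdict


class SpellingFeedback:
    """Toy click-feedback store (hypothetical, for illustration only)."""

    def __init__(self):
        # Maps each typed query to a tally of the results people clicked on.
        self.clicks = defaultdict(Counter)

    def record_click(self, query, chosen):
        # Each click counts as one vote for `chosen` as the intended word.
        self.clicks[query.lower()][chosen] += 1

    def suggest(self, query):
        # Return the most-clicked result for this query, or None if unseen.
        counts = self.clicks.get(query.lower())
        if not counts:
            return None
        return counts.most_common(1)[0][0]


feedback = SpellingFeedback()
feedback.record_click("recieve", "receive")
feedback.record_click("recieve", "receive")
feedback.record_click("recieve", "relieve")
print(feedback.suggest("recieve"))  # prints "receive"
```

The point of the contrast is the speed of the loop: this tally can be updated with every single search, while an election forecaster gets one new data point every four years.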

It probably doesn't come as much of a surprise that Silver worked in baseball analytics prior to his election work.

How does it work?

There's a rather detailed explanation here that I'm not even going to try to begin to understand. The simple version is this: poll results are collected, weighted, and adjusted. For example, a poll with a larger sample size may be given more weight. If a poll typically leans Republican by a certain percentage but is otherwise reliable, its scores can be adjusted and then weighted.
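For the curious, the simple version above can be sketched in a few lines of Python. This is only my own minimal illustration, not Silver's actual model: the `Poll` fields, the `rep_lean` adjustment, and the choice to weight each poll by the square root of its sample size are all assumptions I picked to keep the example small.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Poll:
    pollster: str
    dem_margin: float      # Democratic lead in points (negative = Republican lead)
    sample_size: int
    rep_lean: float = 0.0  # assumed: points this pollster typically leans Republican


def aggregate(polls):
    """Return the weighted average of the lean-adjusted poll margins."""
    weighted_sum = 0.0
    total_weight = 0.0
    for poll in polls:
        # Adjust: undo the pollster's typical Republican lean.
        adjusted = poll.dem_margin + poll.rep_lean
        # Weight: larger samples count for more (assumed sqrt weighting).
        weight = sqrt(poll.sample_size)
        weighted_sum += weight * adjusted
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no polls to aggregate")
    return weighted_sum / total_weight


polls = [
    Poll("Pollster A", dem_margin=2.0, sample_size=400),
    Poll("Pollster B", dem_margin=-1.0, sample_size=1600, rep_lean=2.0),
]
print(round(aggregate(polls), 2))  # prints 1.33
```

Here Pollster B shows the Republican ahead by 1, but once its assumed 2-point Republican lean is removed it reads as a 1-point Democratic lead, and its larger sample gives it twice the weight of Pollster A. The hard part, as mentioned earlier, is choosing those weights and adjustments well.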

As mentioned before, the problem with election results is that it's hard to improve your formula when the election you're predicting only happens once. This isn't quite the case for law ... so why haven't we gotten into quantitative legal prediction? I will be sure to post Katz's paper on the subject once it becomes publicly available. In the meantime, his blog can be found here.

Friday, November 9, 2012
