Wednesday, October 08, 2008

This blog has moved to

I've finished moving this blog to:

All pages on the old blog redirect to their equivalents over there.

Update your bookmarks and feeds! New posts won't appear here any more.

Tuesday, October 07, 2008

Online polling, and potentially the coolest question corpus ever

MySpace and the Commission on Presidential Debates put together a neat site that presents the candidates' positions through various mini-polls and such. It even has a cool data exploration tool for the poll results. For example, here are two support maps, one for respondents over 65 and one for 18-24 year olds.

Anyway, the site also takes submissions of questions for tonight's debate. Apparently six million questions were submitted, and moderator Tom Brokaw will of course use only 10 or so. This raises the question: how were they selected? There's no Digg-like social filtering or anything. You could imagine automatic methods to help narrow down the pool: Topic clustering? Quality ranking on syntax and vocabulary?

Eric Fish suggested the obvious: probably someone picked 1000 randomly and sent them to Brokaw.
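For what it's worth, the "pick 1000 randomly" approach is easy to do in one pass, even without loading all six million questions into memory. Here's a minimal sketch using reservoir sampling. To be clear, this is only my guess at how one might do it, not anything MySpace or the CPD has described, and the function name and the idea that questions arrive as a stream of strings are assumptions for illustration.

```python
import random

def sample_questions(questions, k=1000, seed=None):
    """Return a uniform random sample of up to k items from any iterable.

    Uses reservoir sampling (Algorithm R), so the input is read once
    and only k items are held in memory at a time.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, question in enumerate(questions):
        if i < k:
            # Fill the reservoir with the first k questions.
            reservoir.append(question)
        else:
            # Pick a slot in [0, i]; the new question replaces an old one
            # only if the slot falls inside the reservoir (probability k/(i+1)).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = question
    return reservoir

# Hypothetical usage: in practice the questions would be read from a file or database.
fake_questions = (f"question {n}" for n in range(6000))
picked = sample_questions(fake_questions, k=10, seed=0)
```

Every question ends up in the sample with equal probability, which is about as fair (and as blind to quality) as it gets.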

I'd love to see a corpus of 6 million questions on U.S. political subjects, directed at only two different people. Anyone know anyone who works at MySpace or CPD?

Time to watch the debate! (Alas, no PalinSpeak liveblog this time, of course.)