ConnectU.com SQL injection vulnerability: a story of pathetic hubris (and fun with the password 'password')

This is off-topic for this blog but here goes. ConnectU, a small college social networking site, has been in the news due to their apparently weak lawsuit against Facebook, in which they claim Mark Zuckerberg stole their business plan and computer code back when they all were Harvard undergraduates. (Judges involved have noted the case's flimsy evidence; some technology commentators -- as well as everyone I know -- have noted that the business idea wasn't all that brilliant or original in the first place.) Zuckerberg, of course, went on to found Facebook and bring it to incredible success.

I tried to use the ConnectU site recently, but got an error when searching for a funny name with an apostrophe, o'connor. It turns out this was symptomatic of a very grave security flaw in their code, an SQL injection vulnerability. While Facebook recently had a minor security-related glitch, ConnectU's flaw is far more serious. A malicious attacker could use this to easily break into user accounts, damage or delete internal databases, or probably much worse.

I contacted ConnectU about the flaw here and by now, they seem to have fixed the problem. (Sorry, I didn't get screenshots before the fix.) But this is hardly confidence-inspiring. This bug is one of the most elementary security bugs that can exist in a PHP website. It's a clear sign of a shoddy, amateurish effort; my coworker Dave Fayram, a web engineering expert, describes it as "shameful". Apparently ConnectU's lawsuit asks for all assets and ownership rights to Facebook under the presumption that Zuckerberg's actions were uniquely responsible for their relative lack of success. But this level of engineering incompetence belies any such claim (e.g. as assumed here). Mark Zuckerberg moved his operation to Palo Alto, hired boatloads of smart Stanford grads and built one of the most popular social networking sites around, while ConnectU piddled around with a provably pathetic, toy site.

Technical details on their litany of errors:

The advanced search page (prominently linked directly off the front page) did not escape text field inputs. A search got submitted as a MySQL SELECT query, so if you cleverly used single quotes in any field, you could inject arbitrary SQL into the WHERE clause. (Much more malicious things may also be possible.) And to make matters worse, PHP debug error messages were on (bad!), so you saw MySQL error messages on malformed queries.
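To make the mechanics concrete, here is a minimal sketch of this class of bug. It is not ConnectU's code (they ran PHP and MySQL, and I never saw their source); it uses Python's built-in sqlite3 module, and the table, columns, rows and function names are all made up for illustration. The unsafe function pastes the search text straight into the SQL string; the safe one passes it as a bound parameter.

```python
import sqlite3

# Hypothetical schema and rows, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, school TEXT, password TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        ("alice", "Example University", "password"),
        ("bob", "Example College", "hunter2"),
        ("o'connor", "Example University", "s3cret"),
    ],
)


def search_unsafe(name):
    # Vulnerable: the user's text becomes part of the SQL statement itself,
    # so a single quote in the input ends the string literal early.
    query = "SELECT name FROM users WHERE name LIKE '%" + name + "%'"
    return [row[0] for row in conn.execute(query)]


def search_safe(name):
    # Parameterized: the value travels separately from the SQL text,
    # so quotes in the input are just characters to match.
    query = "SELECT name FROM users WHERE name LIKE '%' || ? || '%'"
    return [row[0] for row in conn.execute(query, (name,))]


payload = "' AND password = 'password' OR 'bla'='"

# The injected text rewrites the WHERE clause and filters on the password column.
print(search_unsafe(payload))

# The same text is harmless when bound as a parameter.
print(search_safe(payload))

# An innocent apostrophe breaks the unsafe query with a syntax error.
try:
    search_unsafe("o'connor")
except sqlite3.OperationalError as err:
    print("query failed:", err)
```

With a bound parameter, the database never parses the user's text as SQL, which is why searching for o'connor just works in the safe version.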

For example, issuing ' AND pw not null OR 'bla'=' yielded the error Unknown column 'pw' in 'where clause'. With a few more tries, it was trivial to discover that they're storing user passwords directly in the users table as plaintext (bad!), and you could even query for which users have various sorts of passwords. For instance, it turns out several users have length 1 passwords: ' AND password RLIKE '^.$' OR 'bla'='. And 192 users have the password 'password': ' AND password = 'password' OR 'bla'='. Amusingly, when you do this query you get back the standard results page that has every user's school listed; thus, of those 192, here are the top 10 schools represented among that group:
13 New York University
11 Harvard University
10 Cornell University
9 Louisiana State University
8 Boston University
7 Pennsylvania State University
7 Columbia University
6 University of Massachusetts - Amherst
6 University of Pennsylvania
6 University of California - Los Angeles

Sure, NYU and Harvard have some of the larger populations on ConnectU (around 1800 and 1300 respectively), but some schools like Stanford have plenty of users (150) and no one at all with 'password'. Here is the list of the ten largest schools with zero password 'password's, sizes ranging from about 400 to 100:
Colgate University
Brandeis University
Syracuse University
Emerson College
Yale University
University of California - Davis
Rensselaer Polytechnic Institute
University of California - Berkeley
Stanford University
Rutgers University

Yes, this is atrociously poor statistical methodology. I only contend these lists are amusing, not substantive. But it doesn't take much imagination to see that having any access at all to such data is a critical security breach.
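As for the plaintext passwords: the usual remedy is to store only a salted, deliberately slow hash, so that even a leaked users table can't be queried for who has the password 'password'. Below is a rough sketch using only Python's standard library. The function names and the iteration count are my own assumptions, not a tuned recommendation, and a real site would more likely lean on a vetted password-hashing library.

```python
import hashlib
import hmac
import os

# Assumed work factor for illustration; pick a value appropriate to your hardware.
ITERATIONS = 200_000


def hash_password(password, salt=None):
    """Return (salt, digest) for storage; the plaintext is never stored."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per user
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the hash with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


salt, stored = hash_password("password")
print(verify_password("password", salt, stored))
print(verify_password("passw0rd", salt, stored))
```

Because every user gets a different salt, two users who both picked 'password' end up with different stored values, so a query like the ones above can no longer count them.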

It's all in a name: "Kingdom of Norway" vs. "Democratic People's Republic of Korea"

Sometimes it seems bad countries come with long names. North Korea is "Democratic People's Republic of Korea", Libya is "Great Socialist People's Libyan Arab Jamahiriya", and the like. But on the other hand, there are plenty of counter-examples -- it's the "United Kingdom of Great Britain and Northern Ireland" and "Republic of Cuba", after all. Do long names with good-sounding adjectives correspond with non-democratic governments?

Fortunately, this can be tested. First, what words are out there? From the CIA Factbook's data on long form names, here are some of the most popular words used by today's countries, listed with the number of occurrences across all 194 names. I limited to tokens that appear >= 3 times. A majority of countries are Republics, while there are some Kingdoms, and even a few Democracies.
(146 of) (127 Republic) (17 Kingdom) (8 the) (8 Democratic) (6 State) (6 People's) (5 United) (4 and) (4 Islamic) (4 Arab) (3 States) (3 Socialist) (3 Principality) (3 Islands) (3 Guinea) (3 Federal) (3 Commonwealth)
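The counting step is simple enough to sketch. I did the real thing in Ruby against the full Factbook list; the Python below is only an illustration, run on a handful of names taken from this post, with a helper name (token_counts) I made up for the purpose.

```python
from collections import Counter

# A tiny sample of long-form names from this post, not the full list of 194.
names = [
    "Kingdom of Norway",
    "Republic of Cuba",
    "Democratic People's Republic of Korea",
    "United Kingdom of Great Britain and Northern Ireland",
    "Federal Republic of Germany",
]


def token_counts(names, min_count=3):
    """Count whitespace-separated tokens across all names, keeping frequent ones."""
    counts = Counter(token for name in names for token in name.split())
    return [(token, n) for token, n in counts.most_common() if n >= min_count]


for token, n in token_counts(names):
    print(n, token)
```

On the full data set the same loop, with the same cutoff of 3, would produce a list like the one above.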
Now, we can group countries by included words and look at how democratic they are, according to Freedom House's political rights scores. They look at a number of political freedoms (free elections, ability to run for office, power sharing, lack of military intervention in government, etc.) to formulate the rating. The following chart shows the average political rights score per group of countries with the given word (actually, substring) in its name.


The upper rows show a substring and the number of names that are matched by it, and the average PR score. (These groups occasionally overlap.) The lower rows are several example countries for reference. So Republics are ever so slightly less democratic than your average non-Republic, and also amusingly, Kingdoms edge them out too. But Democratic, People's, Socialist, Islamic and Arab countries are definitely the big-time un-democracies, while the only clear winners on the other side are Commonwealths and Principalities. Here are the members of the smaller groups:

Score Name
----- ----

1 Kingdom of Belgium
1 Kingdom of Denmark
1 Kingdom of the Netherlands
1 Kingdom of Norway
1 Kingdom of Spain
1 Kingdom of Sweden
1 United Kingdom of Great Britain and Northern Ireland
2 Kingdom of Lesotho
5 Kingdom of Tonga
5 Hashemite Kingdom of Jordan
5 Kingdom of Morocco
5 Kingdom of Bahrain
6 Kingdom of Bhutan
6 Kingdom of Cambodia
7 Kingdom of Thailand
7 Kingdom of Swaziland
7 Kingdom of Saudi Arabia

2 Democratic Republic of Sao Tome and Principe
3 Democratic Republic of Timor-Leste
4 Democratic Socialist Republic of Sri Lanka
5 Federal Democratic Republic of Ethiopia
6 People's Democratic Republic of Algeria
5 Democratic Republic of the Congo
7 Lao People's Democratic Republic
7 Democratic People's Republic of Korea

1 Federated States of Micronesia
1 United States of America
1 State of Israel
2 Independent State of Samoa
2 United Mexican States
3 Independent State of Papua New Guinea
4 State of Kuwait
6 State of Qatar
7 State of Eritrea

1 Federal Republic of Germany
1 Federated States of Micronesia
1 Swiss Confederation
2 Federative Republic of Brazil
4 Federal Republic of Nigeria
5 Federal Democratic Republic of Ethiopia
6 Russian Federation

4 People's Republic of Bangladesh
6 People's Democratic Republic of Algeria
7 People's Republic of China
7 Lao People's Democratic Republic
7 Great Socialist People's Libyan Arab Jamahiriya
7 Democratic People's Republic of Korea

6 Arab Republic of Egypt
6 United Arab Emirates
7 Kingdom of Saudi Arabia
7 Syrian Arab Republic
7 Great Socialist People's Libyan Arab Jamahiriya

1 United Kingdom of Great Britain and Northern Ireland
1 United States of America
2 United Mexican States
4 United Republic of Tanzania
6 United Arab Emirates

5 Islamic Republic of Mauritania
5 Islamic Republic of Afghanistan
6 Islamic Republic of Pakistan
6 Islamic Republic of Iran

1 Commonwealth of Australia
1 Commonwealth of The Bahamas
1 Commonwealth of Dominica

1 Republic of the Marshall Islands
4 Solomon Islands
6 Republic of the Fiji Islands

1 Principality of Andorra
1 Principality of Liechtenstein
2 Principality of Monaco

4 Democratic Socialist Republic of Sri Lanka
7 Socialist Republic of Vietnam
7 Great Socialist People's Libyan Arab Jamahiriya
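The grouping behind these lists is just a substring match followed by an average. Here is a small Python sketch of it, using only scores copied from the lists above; the function name and the choice of three substrings are mine, and the original analysis was done in Ruby.

```python
# Political rights scores copied from the lists above (1 = most free, 7 = least).
scores = {
    "Commonwealth of Australia": 1,
    "Commonwealth of The Bahamas": 1,
    "Commonwealth of Dominica": 1,
    "Islamic Republic of Mauritania": 5,
    "Islamic Republic of Afghanistan": 5,
    "Islamic Republic of Pakistan": 6,
    "Islamic Republic of Iran": 6,
    "Arab Republic of Egypt": 6,
    "United Arab Emirates": 6,
    "Kingdom of Saudi Arabia": 7,
    "Syrian Arab Republic": 7,
    "Great Socialist People's Libyan Arab Jamahiriya": 7,
}


def average_score_by_substring(scores, substrings):
    """Map each substring to (number of matching names, mean score)."""
    result = {}
    for sub in substrings:
        matched = [score for name, score in scores.items() if sub in name]
        if matched:
            result[sub] = (len(matched), sum(matched) / len(matched))
    return result


groups = average_score_by_substring(scores, ["Commonwealth", "Islamic", "Arab"])
for sub, (count, mean) in groups.items():
    print(f"{sub}: {count} names, average PR score {mean:.2f}")
```

Note that matching on substrings rather than whole words is why "Kingdom of Saudi Arabia" lands in the Arab group.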

I just think it's striking there's such a small set of words used to describe countries, and that so many use "Republic". It speaks to (some sort of) triumph of liberal political ideas that even the most dictatorial regimes have to at least pay lip service to them. This has certainly been going on for a while; I suppose names have been moving in this direction for a few hundred years.

I also looked at simple word lengths of country names. It's not exactly the clearest bubbleplot ever, but if you go ahead and force a linear model (least-squares regression) on it, turns out each word contributes 0.26 points of un-democraticness. And if you viciously remove those lower right outliers (UK and Sao Tome), that coefficient bumps up to 0.39.
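For the curious, the "force a linear model on it" step amounts to an ordinary least-squares fit of score against number of words. The sketch below shows the arithmetic in plain Python on a few names from the lists above; I actually used R with the full data set, so this tiny sample is only illustrative and will not reproduce the 0.26 coefficient.

```python
def least_squares(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# A handful of (name, political rights score) pairs taken from the lists above.
sample = {
    "Kingdom of Norway": 1,
    "Swiss Confederation": 1,
    "State of Qatar": 6,
    "People's Republic of China": 7,
    "Democratic People's Republic of Korea": 7,
    "United Kingdom of Great Britain and Northern Ireland": 1,
}

word_counts = [len(name.split()) for name in sample]
pr_scores = list(sample.values())

slope, intercept = least_squares(word_counts, pr_scores)
print(f"each extra word adds about {slope:.2f} points in this sample")
```

The slope is the "points of un-democraticness per word" number; dropping a long-named democracy like the UK from the input pulls it upward, which is the outlier effect described above.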

Boring details:
  • For the 2006 CIA Factbook information, I used an XML version described here and located here. For every country it gives a "conventional long form" name. If there is none, I used the standard short name. I think the Wikipedia List of countries page might have the same information as this.
  • The ratings are Freedom House's Political Rights ("PR") scores for 2006. (They also have a highly correlated Civil Liberties score; I should've used the overall average score but am too lazy to redo it all now.) Therefore this analysis doesn't include any of the extinct but excitingly named communist countries like the German Democratic Republic. Freedom House actually has historical data going back decades, so this could definitely be looked at; presumably this would further tilt the weight of "socialist", "people", and "democratic" to being non-democratic.
  • Strings and more in Ruby, plots all from R, and the occasional assist by Excel. Learned some new tricks too.

When's the last time you dug through 19th century English mortuary records?

Standard problem: humans lived like crap for thousands and thousands of years, then suddenly some two hundred years ago dramatic industrialization and economic growth happened, though unevenly even through today. Here's an interesting proposal to explain all this. Gregory Clark found startling empirical evidence that, in the time around the Industrial Revolution in England, wealthier families had more children than poorer families, while middle-class social values -- non-violence, literacy, work ethic, high savings rates -- also became more widespread during this time. According to the article at least, he actually seems to favor the explanation that human biological evolution was at work; though he notes cultural evolution is possible too. (That is, the children of wealthier families are socialized with their values; as the children of middle-class-valued families increase in proportion in society, the prevalence of those values increases too.)

In any case, the argument is that behavioral changes, not institutional changes, drove the rise of capitalism. I know that some people define institutions to include cultural norms (and therefore human behavior, right?), so I'm presuming that for Clark and the academic debates vaguely mentioned in the article, "institutions" means something more boring like government structure or enforcement of property rights. (I'm reading Samuel Bowles's microeconomics book off and on, where he likes to mix behavioral and institutional ideas; and I seem to recall this from Avner Greif too; this all apparently is too confusing for me. (Bowles is quoted in the article.)) The article mentions Max Weber's Protestant ethic as related to Clark in its being a behavioral thesis.

I'm awfully skeptical of biological evolution claims without any actual genetic evidence (though I quite like cultural evolutionary claims), but the theory is very neat and the archival data gathered is incredible, as you can see in this shamelessly ripped off diagram/explanation from the NYT article about Clark's book on this.

Are ideas interesting, or are they true?

From an NYT Magazine article this Sunday, paraphrasing Isaiah Berlin:

The philosopher Isaiah Berlin once said that the trouble with academics and commentators is that they care more about whether ideas are interesting than whether they are true. Politicians live by ideas just as much as professional thinkers do, but they can’t afford the luxury of entertaining ideas that are merely interesting. They have to work with the small number of ideas that happen to be true and the even smaller number that happen to be applicable to real life. In academic life, false ideas are merely false and useless ones can be fun to play with. In political life, false ideas can ruin the lives of millions and useless ones can waste precious resources. An intellectual’s responsibility for his ideas is to follow their consequences wherever they may lead. A politician’s responsibility is to master those consequences and prevent them from doing harm.

I can't speak for that level of politics, but I've seen that in applied technology the distinction between interesting and true ideas can be great. (True in the sense that the idea actually solves the problem at hand.) I've wasted plenty of time recently at work chasing down very interesting ideas only to reassess and do something more expedient. A different example from empirical science: it sure sounds interesting when the 1950's Chomsky claims that language can't be statistically modelled, though that turns out to be embarrassingly false.