The MacGyver of data analysis
Jeff Hammerbacher, who runs Facebook's data infrastructure and insight team, is leaving the company. That's too bad for them, judging by this hilarious quote from a talk he gave:
Basic statistics is more useful than advanced machine learning.
I can't tell you how many interviews I've had where someone has a really cool project on their resume. Support vector machines, topic analysis on CiteSeer, or whatever... But what it boils down to is someone took toy data set A and plugged it into machine learning library B and took the output and was like, “sweet.”
People with "machine learning" on their resume fall from the sky these days, it seems to be a very sexy discipline. The problem is if I ask them to explain a t-test, those same people can't tell me what that is.
If I had a MacGyver of data analysis and all he had was a t-test and regression, he would probably be able to do 99.9% of the analyses that we do that are actually useful.
Amen. That, along with the importance of data visualization, is one of the best lessons I've learned from working with real data over the last year or two. Here's the entire video, in which he also talks about Hadoop, Scribe, Hive (a data warehousing and analysis platform they've built on Hadoop), and other fun things. The above bit is around 35:00.
(Direct link, hopefully, though the website is weird)
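For anyone who would flunk that interview question, here is a rough sketch of the MacGyver toolkit from the quote: a two-sample t-test (the Welch variant, which is my choice and not something the talk specifies) and a straight-line least-squares regression. The function names and the numbers are made up for illustration, and it uses only Python's standard library, so it stops at the test statistic and degrees of freedom and does not compute a p-value.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic and its approximate degrees of freedom."""
    # Squared standard error of each sample mean (sample variance / n).
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    old_page = [12.1, 11.8, 12.5, 12.0, 11.9]
    new_page = [12.9, 13.1, 12.7, 13.4, 12.8]
    t, df = welch_t(new_page, old_page)
    slope, intercept = fit_line([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
    print(f"t = {t:.2f} on about {df:.1f} degrees of freedom")
    print(f"y = {intercept:.2f} + {slope:.2f} * x")
```

The t statistic is just the difference between the two group means divided by its standard error, so a large absolute value means the gap is big relative to the noise. In real work you would probably reach for a statistics package that also reports the p-value.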
There's lots more to say on statistics vs. machine learning and all that. For another post...