I read an interesting article in the New York Times that said For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights. For anyone that has done any sort of science this should not be a surprise. Science is the product of a relentless sort of focussed stupidity combined with intelligent insight and this sort of basic work is essential to the first part of the process.
Anyone who has done a science experiment, even a simple high school one, will understand that the act of data gathering and sorting is a menial task that cannot be simply automated. As a scientist, you are often playing the role of a highly trained monkey repeatedly doing the same tasks. Normally this sort of activity would be one that could be quickly automated but there turns out to be so much art in the science that you are often the only trained monkey who can do it.
Whether, as in the article, it’s different words meaning the same thing for a company providing information on drug side effects needing to know that “drowsiness,” “sleepiness” and “somnolence” all meant the same thing or whether it’s a researcher analysing EEG patterns determining whether the activity is real or just an artefact, both situations are remarkably resistant to complete automation. Even if you could automate everything, it’s only when you’ve ploughed through the dataset that you start to have an idea of what’s going on – and that’s ignoring the green jelly bean problem of post hoc analysis.
For my PhD I manually compared 1.1 million EEG patterns and that was after the computer had processed and tentatively categorised the data. I’d go to sleep seeing wiggly lines across the back of my eyelids but that was the price to pay for correctness that no computational analysis could match.
It’s fantastic that there are start-ups out there looking to solve this problem of data wrangling (as not everyone has an army of PhD students, postdocs or other slave labour to do this for them) but it’s an exceptionally tough problem that I don’t expect to see solved any time soon.
Java 8 was released recently and one of the much heralded features is Default methods. Simply put, these allow you to inject functionality to classes via interfaces. This is a new feature in that prior to Java 8, Interfaces could only have method signatures and not have the implementation and only Abstract classes could have method implementations but that has now changed.
Here’s a simple example that prints out “In InterfaceOne”
In Scala, another JVM language, you can do something similar but more awesome with Scala’s traits (its version of interfaces) you can override similar implementations and have a version of multiple inheritance. This example prints out “In TraitTwo”
The secret that avoid that diamond of death is that whichever trait is declared last wins. If I’d swapped around the order that the traits were declared then it would have printed “In TraitOne”
What is even nicer in Scala is that you can declare your class with traits when you instantiate it. You don’t have to declare it at the compile time of the class. This means that you have a powerful way to extend functionality of classes but without the insanity of monkey patching. ThisThe below example also prints out “In TraitTwo” but the class does not extend any trait. This of course is Scala’s cake pattern where you can mix in the traits.
What is slightly disappointing with Java 8 is that you cannot mimic this behaviour. If you try to do this in Java 8, you get a nice compile time error telling us the class inherits unrelated defaults.
I wonder why they chose not to support this. The cake patterns seems like a feature that adds flexibility without being able to shoot yourself in the foot too much.
Just as an experiment I thought I would try using DuckDuckGo for searching. In an increasingly big brother world I like that they don’t track you and I like that they only serve requests securely (requests to http://duckduckgo.com get a 301 permanent redirect to https://duckduckgo.com). They also offer a proxy service and the search results seem to be pretty good.
I might give it a month and then report back.