Six papers for Big Data fans

Big Data, the processing of data sets that do not fit on a single computer, has come of age. It’s not just the level of interest shown at conferences like Strata but also the types of people participating. Sure, there are loads of companies out there with products in this space, but there are also plenty of end users coming forward, and many of these are from outside technology companies. At the London edition of the conference, one of the speakers was Ben Goldacre, doctor and author of the awesome Bad Science, who discussed the impact of missing data, which is a huge issue for medical studies. Even the White House has weighed in on behalf of Big Data and emphasized its importance to business.

If you are going to use a technology, I’m a big fan of going to the source, and thankfully a lot of the published work in this space is freely available. So I’ve collected some of the papers that are key to this area: five are about Big Data itself, and the bonus one is about operational monitoring for massively distributed systems.

NoSQL

Bigtable: http://research.google.com/archive/bigtable-osdi06.pdf

Distributed storage for structured and semi-structured data systems

Dynamo: http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf

Scalable and resilient datastores for Big Data

Spanner: http://research.google.com/archive/spanner-osdi2012.pdf

Google’s new globally distributed database

 

Big Data

MapReduce: http://research.google.com/archive/mapreduce-osdi04.pdf

Batch processing and generation at scale
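
To give a flavour of the programming model the paper describes, here is a minimal single-process sketch in Python that counts words, the same example the paper itself uses. The function names (map_phase, shuffle, reduce_phase, word_count) are my own for illustration and are not Google's API; a real system would spread the map and reduce calls across many machines and handle failures, which this sketch does not attempt.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Combine all values emitted for one key into a single result."""
    return word, sum(counts)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on a list of strings."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())
```

Calling word_count on a list of strings returns a dictionary of word totals. The point of the split is that each map_phase call and each reduce_phase call depends only on its own input, so they can in principle run in parallel on different machines.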

Dremel: http://research.google.com/pubs/archive/36632.pdf

Ad-hoc queries at scale

 

Operational Monitoring at scale

Dapper: http://research.google.com/pubs/archive/36356.pdf

Not strictly related to Big Data, Dapper covers tracing and profiling massively distributed systems

 
