Big Data, the processing of data sets that do not fit on a single computer, has come of age. It’s not just the level of interest shown at conferences like Strata but also the types of people participating. Sure, there are loads of companies out there with products in this space, but there are also plenty of end users coming forward, and many of these are from outside technology companies. At the London version, one of the speakers was Ben Goldacre, doctor and author of the awesome Bad Science, who discussed the impact of missing data, which is a huge issue for medical studies. Even the White House has weighed in on behalf of Big Data and emphasized its importance to business.
If you are going to use a technology, I’m a big fan of going to the source, and thankfully a lot of the published work in this space is freely available. So I’ve collected some of the papers that are key to this area: five are about Big Data itself and the bonus one is about operational monitoring for massively distributed systems.
“There is never any shame in being wrong, only in being too ignorant to learn why you were wrong.”
NoSQL is a hot topic right now; as long as you don’t need ACID guarantees or complex joins, you can have a persistence store that is faster, scales better, allows greater schema flexibility, and all at a lower cost than a comparable relational database. The number of companies looking to use NoSQL has grown massively, and the number of NoSQL solutions looking to feed this growth has blossomed too.
In the eye of this storm are three sets of individuals. On one side we have the developers, desperate to own the full stack from web app to data store; in the middle are the Ops guys and DBAs, used to owning and running the persistence stores; and on the other side are the vendors selling their wares. One group focuses on delivering new features as quickly as possible, another ensures that they run smoothly and can be recovered as and when they go bang, and the third is deluging the other two with an almost impossible amount of information to make sure that its solution is the one being used.
This was the lecture I was most interested in. Given the interest in NoSQL, as well as the FUD and general handwaving from the industry on this topic, I really wanted to see what SpringSource were making of it.
It was a two-part lecture: the motivation first, and then examples.
Until recently, we have stored everything in an RDBMS. This is sub-optimal for objects, and especially so for trees, graphs or networks. The last couple of years have seen the rise of NoSQL, the umbrella term for the desire to store such data in something other than a relational database.
So which do I use? And if I do have a shortlist, how do I interact with them? Well, there’s some Spring for that! The Spring Data project aims to provide a familiar and consistent Spring-based programming model while not over-abstracting the custom traits of the specific store. The last point is emphasised because an overly flat abstraction means that you would lose the things that make each implementation special.
The first point is just the use of familiar Spring idioms like Templates and Repositories. Riak and Redis are quite well supported. MongoDB and CouchDB are quite well supported. Hadoop and Neo4j are also supported. JDBC and JPA are also supported, since this project is about data, not just NoSQL.
The examples for this talk are based on MongoDB, which is JSON-based: documents hold nested key-value (KV) pairs, with consistency within the document.
The building blocks of Spring Data are: Spring Core (DI, AOP, namespaces, JMX, JDBC); templates for each implementation, which take care of resource management (connections) and exception handling and also offer some convenience methods for simple queries; mapping support via annotations to map the domain model to the store; and a repositories layer with finder methods and query methods like in Spring Roo.
The examples are all on GitHub, under org.springframework.data.mongodb.examples.music.
In the Mongo example, the domain model is decorated with the @Document annotation and the id member is decorated with @Id.
A KV pair is a field and its value. By default, fields are mapped under the member name, but this can be overridden. @DBRef allows you to connect to a separate collection in another DB. Where a class has multiple constructors, @PersistenceConstructor marks the one to use.
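To make the mapping idea concrete, here is a minimal sketch in plain Java of how annotations on a domain class can drive the conversion to a document. It does not use Spring Data at all: the annotations Doc, DocId and Key are hypothetical stand-ins that I have declared just for this example, as are Album and MappingSketch. The only fact borrowed from MongoDB is that the identifier is stored under the key _id.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

class MappingSketch {
    // Reflects over the entity's fields and builds a key-value document from them.
    static Map<String, Object> toDocument(Object entity) {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (Field f : entity.getClass().getDeclaredFields()) {
            if (f.isSynthetic() || Modifier.isStatic(f.getModifiers())) continue;
            f.setAccessible(true);
            String key;
            if (f.isAnnotationPresent(DocId.class)) {
                key = "_id";                               // MongoDB's identifier key
            } else if (f.isAnnotationPresent(Key.class)) {
                key = f.getAnnotation(Key.class).value();  // overridden name
            } else {
                key = f.getName();                         // default: the member name
            }
            try {
                doc.put(key, f.get(entity));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return doc;
    }

    // Reads the target collection name from the class-level annotation.
    static String collectionOf(Class<?> type) {
        return type.getAnnotation(Doc.class).collection();
    }

    public static void main(String[] args) {
        Album album = new Album("42", "Kind of Blue", "Miles Davis");
        System.out.println(collectionOf(Album.class) + " <- " + toDocument(album));
    }
}

// Hypothetical stand-ins for the real annotations, declared so the sketch compiles on its own.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
@interface Doc { String collection(); }

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
@interface DocId {}

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
@interface Key { String value(); }

@Doc(collection = "albums")
class Album {
    @DocId String id;
    String title;             // mapped under its member name
    @Key("a") String artist;  // mapped under the overridden key "a"

    Album(String id, String title, String artist) {
        this.id = id;
        this.title = title;
        this.artist = artist;
    }
}
```

The real mapping layer does a great deal more (nested objects, references, type conversion), so treat this only as an illustration of the annotation-driven idea.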
To use it, you just need to declare the implementation’s namespace and then create a template. As with RestTemplate, each template is underpinned by the equivalent *Operations interface.
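The template idea (an *Operations interface, with a template class behind it that owns the connection handling and exception translation) can be sketched in plain Java roughly as follows. Every name here is hypothetical and invented for illustration; none of it is Spring Data’s API, and FakeConnection only pretends to talk to a store.

```java
import java.util.ArrayList;
import java.util.List;

// The contract callers program against (hypothetical, in the spirit of an *Operations interface).
interface StoreOperations {
    <T> T execute(ConnectionCallback<T> action);
    String get(String key);
}

// A unit of work that needs an open connection.
interface ConnectionCallback<T> {
    T doWith(FakeConnection connection) throws StoreException;
}

// The template: opens and closes the connection and translates exceptions.
class StoreTemplate implements StoreOperations {
    @Override
    public <T> T execute(ConnectionCallback<T> action) {
        try (FakeConnection connection = new FakeConnection()) {
            return action.doWith(connection);
        } catch (StoreException e) {
            // Checked, store-specific failure becomes one unchecked type.
            throw new DataAccessFailure("store call failed", e);
        }
    }

    // Convenience method for a simple lookup, built on execute().
    @Override
    public String get(String key) {
        return execute(connection -> connection.fetch(key));
    }

    public static void main(String[] args) {
        StoreOperations ops = new StoreTemplate();
        System.out.println(ops.get("album:42"));
        System.out.println(FakeConnection.log);
    }
}

// Checked exception standing in for a driver-specific error.
class StoreException extends Exception {
    StoreException(String message) { super(message); }
}

// Unchecked exception that callers of the template see.
class DataAccessFailure extends RuntimeException {
    DataAccessFailure(String message, Throwable cause) { super(message, cause); }
}

// Pretend connection that records when it is opened and closed.
class FakeConnection implements AutoCloseable {
    static final List<String> log = new ArrayList<>();

    FakeConnection() { log.add("open"); }

    String fetch(String key) throws StoreException {
        if (key.isEmpty()) throw new StoreException("empty key");
        return "value-of-" + key;
    }

    @Override
    public void close() { log.add("close"); }
}
```

Programming against the interface rather than the template class is what makes it easy to swap in a stub for tests.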
Repositories extend CrudRepository to get finder methods. You can query on both names and locations (co-ordinates). The repositories are auto-created, which gives you a lot of power for simple queries like findOne(Id) and find<Member><Property>[<Keywords>].
The auto-gen stuff is done at context startup, so a bad finder fails there rather than when the query is first run.
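As a rough illustration of how a repository can be conjured up from nothing but an interface, here is a sketch using a JDK dynamic proxy over an in-memory list. It is my own simplification and not how Spring Data is implemented: it only understands a single findBy<Property> pattern, and Track, TrackRepository, BrokenRepository and FinderSketch are hypothetical names. It does show the fail-early behaviour: the method names are checked when the repository is created, not when a query is first called.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class FinderSketch {
    // Turns "findByArtist" into the Track field "artist"; rejects unknown properties.
    static Field propertyFor(Method method) {
        String name = method.getName();
        if (!name.startsWith("findBy") || name.length() == "findBy".length()) {
            throw new IllegalArgumentException("Not a finder method: " + name);
        }
        String property = Character.toLowerCase(name.charAt(6)) + name.substring(7);
        try {
            Field field = Track.class.getDeclaredField(property);
            field.setAccessible(true);
            return field;
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("No property '" + property + "' for " + name, e);
        }
    }

    // Creates an implementation of the repository interface backed by the given list.
    // Object methods such as toString are not handled in this sketch.
    static <R> R create(Class<R> repositoryType, List<Track> data) {
        // Validate every finder up front, so a bad name fails at creation time.
        for (Method method : repositoryType.getMethods()) {
            propertyFor(method);
        }
        Object proxy = Proxy.newProxyInstance(
                repositoryType.getClassLoader(),
                new Class<?>[] {repositoryType},
                (self, method, args) -> {
                    Field field = propertyFor(method);
                    List<Track> hits = new ArrayList<>();
                    for (Track track : data) {
                        if (Objects.equals(field.get(track), args[0])) hits.add(track);
                    }
                    return hits;
                });
        return repositoryType.cast(proxy);
    }

    public static void main(String[] args) {
        List<Track> data = List.of(new Track("So What", "Miles Davis"),
                                   new Track("Naima", "John Coltrane"));
        TrackRepository repository = create(TrackRepository.class, data);
        System.out.println(repository.findByArtist("Miles Davis").size());
    }
}

class Track {
    final String title;
    final String artist;

    Track(String title, String artist) {
        this.title = title;
        this.artist = artist;
    }
}

// No implementation is written by hand; the proxy supplies one.
interface TrackRepository {
    List<Track> findByArtist(String artist);
    List<Track> findByTitle(String title);
}

// Track has no "genre" property, so creating this repository should fail.
interface BrokenRepository {
    List<Track> findByGenre(String genre);
}
```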
What about complex queries where you don’t want ugly names? Annotations, of course! You can put queries in the annotations.
Spring Data supports pagination out of the box. All you need is a Pageable argument, for which PageRequest is the implementation. Declare the result type as Page and you get context like isLast and getTotalPages.
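The arithmetic behind that page context is simple enough to sketch in plain Java. PageSketch below is a hypothetical class of my own, not Spring Data’s Page, and it assumes zero-based page numbers and an in-memory list standing in for the query result.

```java
import java.util.List;

class PageSketch<T> {
    final List<T> content;     // the items on this page
    final int number;          // zero-based page number (an assumption of this sketch)
    final int size;            // requested page size
    final long totalElements;  // size of the full result

    private PageSketch(List<T> content, int number, int size, long totalElements) {
        this.content = content;
        this.number = number;
        this.size = size;
        this.totalElements = totalElements;
    }

    // Slices one page out of the full result list.
    static <T> PageSketch<T> of(List<T> all, int pageNumber, int pageSize) {
        if (pageNumber < 0 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber must be >= 0 and pageSize >= 1");
        }
        int from = (int) Math.min((long) pageNumber * pageSize, all.size());
        int to = (int) Math.min((long) from + pageSize, all.size());
        return new PageSketch<>(List.copyOf(all.subList(from, to)), pageNumber, pageSize, all.size());
    }

    // Ceiling division: a partly filled final page still counts as a page.
    int getTotalPages() {
        return (int) ((totalElements + size - 1) / size);
    }

    boolean isLast() {
        return number + 1 >= getTotalPages();
    }

    public static void main(String[] args) {
        PageSketch<Integer> page = PageSketch.of(List.of(1, 2, 3, 4, 5, 6, 7), 2, 3);
        System.out.println(page.content + " of " + page.getTotalPages() + " pages, last=" + page.isLast());
    }
}
```

In a real store the slice and the total count would come from queries rather than from a list held in memory.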
QueryDSL open source project?
Cross-store persistence is also possible, with a nice annotation model that makes it pretty seamless.