Adopting NoSQL – prepare to get it wrong

“There is never any shame in being wrong, only in being too ignorant to learn why you were wrong.”

NoSQL is a hot topic right now: as long as you don’t need ACID guarantees or complex joins, you can have a persistence store that is faster, scales better and allows greater schema flexibility, all at a lower cost than a comparable relational database. The number of companies looking to use NoSQL has grown massively, and the number of NoSQL solutions looking to feed this growth has blossomed too.

In the eye of this storm are three sets of individuals. On one side we have the developers, desperate to own the full stack from web app to data store; in the middle are the Ops guys and DBAs, used to owning and running the persistence stores; and on the other side are the vendors selling their wares. The first group focuses on delivering new features as quickly as possible, the second ensures that those features run smoothly and can be recovered as and when they go bang, and the third deluges the other two with an almost impossible amount of information to make sure that its solution is the one being used.

It’s no surprise that this creates conflict. There is confusion around how to match the NoSQL classification (K/V, Document, Graph or Columnar) to the specific use case, and how to make the best use of each type. Then there’s the type of consistency you are looking for: strong or weak, intra or inter data centre? And then there are the operational concerns of keeping it running, including what sort of infrastructure you need. These are just three examples. Compared to known technologies like MySQL or Oracle, which have decades of knowledge behind them, it’s a lottery ticket.

One of the talks at Velocity Conference in Santa Clara this year gave a great example of this. The guys at Pinterest went through what can only be described as a VC’s wet dream and an engineer’s nightmare: the sort of exponential growth that would put Ebola to shame. Starting off with a single MySQL box and a single web server, they ended up using an unholy mix of MySQL, Cassandra, Memcached, Membase, MongoDB and Redis to meet the data needs of suddenly serving billions of page views per month. Needless to say, the initial criterion of “getting the job done” was swiftly replaced by a focus on the operational concerns of reliability, backups, monitoring, support and the like. They rationalized their tech stack to MySQL, Memcached and Redis after a series of catastrophic failures in their NoSQL technologies. Not only did they remove several NoSQL technologies, they also ended up using MySQL in a NoSQL manner, as a K/V store. The driver behind this usage pattern seems to be reliability, as they experienced data loss with more than one of the NoSQL solutions. It’s a testament to their engineering processes that their backups could easily be brought up to speed.
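To make “MySQL as a K/V store” a little more concrete, here is a minimal sketch of the pattern: one table, a primary key, an opaque value, and no joins. It is not Pinterest’s schema, which the talk summary above doesn’t give. SQLite from the Python standard library stands in for MySQL so the example runs on its own, and the table and function names are made up for illustration.

```python
import sqlite3

# SQLite stands in for MySQL purely so the sketch is self-contained.
# The table name `kv` and these helper names are hypothetical.

def open_store(path=":memory:"):
    """Open a connection and make sure the single K/V table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v BLOB NOT NULL)"
    )
    return conn

def put(conn, key, value):
    """Insert or overwrite the value for a key (one row per key)."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
        )

def get(conn, key, default=None):
    """Look a value up by primary key only; no joins, no range scans."""
    row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return row[0] if row else default

def delete(conn, key):
    """Remove a key if it is present."""
    with conn:
        conn.execute("DELETE FROM kv WHERE k = ?", (key,))
```

The application serialises its own values (JSON, for instance) and the database only ever sees primary-key lookups, which keeps the relational engine’s durability and backup tooling while giving up most of its query features.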

Another example is Netflix. Netflix are well known for their use of Cassandra and for running it in Amazon’s EC2 cloud, but what is less well known is that to achieve the performance they required, they had to front their Cassandra instances with Memcached. Even with this optimised setup (which requires almost double the number of normal boxes), they had to be careful not to overload their IO, scheduling compactions and repairs in sequence. This is not the promised land of simplicity via NoSQL. After some serious analysis work they found that, rather than chasing performance by optimising their NoSQL solution for the infrastructure, they could actually get better performance from fewer boxes by optimising their infrastructure for their NoSQL solution. By switching to high IO EC2 instances they saw the 95th percentile latency drop from 65ms to 10ms, with a lower operational overhead since they went from 84 boxes to 15, and a cost reduction of 56%.
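It’s worth running the arithmetic on those Netflix figures. The sketch below uses only the numbers quoted above and makes one assumption of its own: that the 56% saving refers to the total cost of the fleet. Under that assumption you can back out how much more each high IO box must cost than the boxes it replaced.

```python
def fractional_reduction(before, after):
    """Return the reduction from `before` to `after` as a fraction of `before`."""
    return (before - after) / before

# Figures quoted in the text.
boxes_before, boxes_after = 84, 15
p95_before_ms, p95_after_ms = 65, 10
cost_reduction = 0.56  # assumed here to mean total fleet cost

box_cut = fractional_reduction(boxes_before, boxes_after)
latency_cut = fractional_reduction(p95_before_ms, p95_after_ms)

# If 15 new boxes cost 44% of what 84 old boxes did, then each new box
# costs (0.44 * 84 / 15) times as much as an old one.
per_box_cost_ratio = (1 - cost_reduction) * boxes_before / boxes_after

print(f"boxes cut by {box_cut:.0%}, p95 latency cut by {latency_cut:.0%}")
print(f"each new box costs about {per_box_cost_ratio:.2f}x an old one")
```

In other words, roughly 82% fewer boxes and about 85% lower tail latency, bought with machines that are individually around two and a half times the price. The expensive hardware was the cheaper option.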

At my current client, we started using Cassandra for storing non-relational data in the web tier: it ticked a number of boxes for us. It was fast enough in initial performance testing, it was integrated well into our continuous delivery process and it had already been used in production so our ops guys were happy with it – something that is often under-valued in an enterprise setup. Initial usage was great, but increased load, a data set that was more ephemeral and a higher read/write ratio than we had expected highlighted that this was less than perfect. Our requirement for strong consistency did not play well with a Dynamo-based NoSQL solution like Cassandra. Rather than adding complexity to either our software or our infrastructure by fronting it with Memcached or moving to SSDs, we made the choice to shift to a more appropriate NoSQL solution, which in our case was Couchbase. Couchbase offers strong consistency, high throughput, low and deterministic latency and a host of other features that we were looking for. The vendor also offered excellent support, including helping us choose the right infrastructure for our use cases.
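For anyone wondering why strong consistency is awkward on a Dynamo-style store, the usual rule of thumb is that with N replicas, a write acknowledged by W of them and a read that consults R of them are only guaranteed to share a replica when R + W > N. The sketch below is just that rule as code, not anything Cassandra-specific beyond the standard quorum size, and the function names are my own.

```python
def quorum(replication_factor):
    """Smallest majority of the replicas: floor(N / 2) + 1."""
    return replication_factor // 2 + 1

def replica_sets_overlap(n, w, r):
    """True if every read set of size r must share a replica with every
    write set of size w, out of n replicas (the R + W > N rule)."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return r + w > n

n = 3
for label, w, r in [
    ("write ONE, read ONE", 1, 1),
    ("write QUORUM, read QUORUM", quorum(n), quorum(n)),
    ("write ALL, read ONE", n, 1),
]:
    print(f"{label}: overlap guaranteed = {replica_sets_overlap(n, w, r)}")
```

With three replicas, the fast option (one replica each way) gives no overlap guarantee, while every option that does guarantee overlap costs extra round trips on reads, on writes, or on both. On a read-heavy workload that price is paid on nearly every request.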

So what can we learn from these 3 cases?

Firstly, no matter how much effort you put into figuring out which NoSQL solution is best for you, unless you have previous experience with NoSQL it’s odds-on that you will get it wrong. It’s incredibly important that you focus on your exact problem and not on a specific implementation, and that you allow for a migration process to something else from the very start.
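One way to allow for migration from day one is to keep the application talking to a narrow interface rather than to a vendor’s client library. What follows is a rough sketch of that idea, not what we actually built: every class and function name is hypothetical, and an in-memory dictionary stands in for both the old and the new store.

```python
from typing import Dict, Iterable, Optional, Protocol, Tuple

class KeyValueStore(Protocol):
    """The only surface the application is allowed to depend on."""
    def get(self, key: str) -> Optional[bytes]: ...
    def put(self, key: str, value: bytes) -> None: ...
    def items(self) -> Iterable[Tuple[str, bytes]]: ...

class InMemoryStore:
    """Stand-in for a real store; a real adapter would wrap a client library."""
    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def items(self) -> Iterable[Tuple[str, bytes]]:
        return list(self._data.items())

class MigratingStore:
    """Writes to both stores; reads the new one and falls back to the old."""
    def __init__(self, old: KeyValueStore, new: KeyValueStore) -> None:
        self._old, self._new = old, new

    def put(self, key: str, value: bytes) -> None:
        self._old.put(key, value)
        self._new.put(key, value)

    def get(self, key: str) -> Optional[bytes]:
        value = self._new.get(key)
        if value is None:
            value = self._old.get(key)
            if value is not None:
                self._new.put(key, value)  # copy across lazily on read
        return value

    def items(self) -> Iterable[Tuple[str, bytes]]:
        merged = dict(self._old.items())
        merged.update(self._new.items())
        return list(merged.items())

def backfill(old: KeyValueStore, new: KeyValueStore) -> int:
    """Copy keys the new store is missing; returns how many were copied."""
    copied = 0
    for key, value in old.items():
        if new.get(key) is None:
            new.put(key, value)
            copied += 1
    return copied
```

The application is handed a `MigratingStore` during the changeover and a plain adapter for the new store once `backfill` has finished. A real migration also has to deal with deletes, concurrent writers and expiry, all of which this sketch ignores.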

Secondly, simplicity is key. When things go wrong, it’s far easier to find what’s gone bang when there’s minimal noise. In new systems like NoSQL the complexity can be emergent, so minimise it where you can.

Thirdly, given that NoSQL is a new technology, its enterprise features can be a little lacking. When choosing your solution, make sure you give enough emphasis to non-functional requirements like backups and replication strategies, and not just ease of use. You don’t want to wait until the first incident of data loss before you sort out how you deal with failure scenarios.

Without doubt, NoSQL is a game changer in many regards: developers can own the full stack, with far more flexibility than a relational database and far greater performance, especially at scale. Products can be delivered faster and to a far larger audience than ever before, but as with any new technology, you must go into this process knowing what you are getting yourself into and how to get yourself out.
