Monthly Archives: March 2012

My GDC Questionnaire

I’ve been asked to be a mentor to the excellent Graduate Developer Community and before I’m allowed to go and warp the minds of future grads, I was asked to answer a quick questionnaire to give a bit of background on myself and explain how I got into programming. It was an interesting exercise so I thought I would put the results up here too.

Martin Anderson has worked in IT for the last 13 years across industries as diverse as online advertising and news to investment banking and gambling. He came to the industry from a slightly unusual direction since his first degree was a BSc in Physiology. After a brief stint at a pharmaceutical company he went back into academia and it was during his PhD, which involved the computerised analysis of EEGs, that his interest in computing really took off.

Title – What is your job title?
Software Architect

What is your role about?

The best description I have ever read of what an Architect should be can be found here. The main responsibility of my job is that I am expected to cross the divide between technology and business while being expected to be able to answer all the technical questions. Principally I am a developer but my responsibility to be aware of so many different projects makes it difficult to directly contribute code very often.

What are the best/most positive parts of the job/industry?

That it combines the best of the worlds of startups and big industry. The problems we have to solve are generally the really interesting ones: how do you offer the best service to your customer while balancing the demands of performance, scale, security and regulators? But we get to solve them in a way that is not normally seen in a major company.

Your Saturday afternoon becomes very different when you realise that you have up to 4 million customers betting many millions of pounds on your applications and to do this they are hammering them to pieces. For example, we see up to 88,000 requests per second across the entire betfair.com estate at peak times so your application can’t just work, it has to be fast, reliable and deal with load at internet scale.

What are the negative parts to the job/industry?

How changeable things can be. The industry is at the whim of regulators so some of our solutions have to make compromises that you would never design in if you had complete freedom to do it your way.

There is also the fact that some people see the gambling industry with a slightly negative cast. It’s not for everyone but what is?

Career Path

What is the standard career path/qualifications?

The standard technology career path at Betfair is: Intern – Associate Developer – Developer – Senior Developer – Technical Lead/Principal Developer/Software Architect

As far as qualifications go, we do have a graduate scheme and like most large companies we prefer graduates and respect the amount of effort it takes to get a degree. But given that some of our best employees did not go to university (or in some cases did not even finish secondary school!) we also look for other signs that we think signify quality.

What are the prospects?

From a company perspective they are as good as you want them to be! Betfair is one of the UK’s dot.com success stories since it is just over 10 years old and still expanding: we currently serve 140 territories in 17 languages and this will only increase. We deal with cutting edge technologies at a scale that most companies can only dream of and this experience makes you very employable.

From the more general perspective of an Architect, they are also very good. To get to the role normally requires several years exposure to how things actually work. Not just the code or development knowledge but also the knowledge about your hardware and your network can be just as important. The business knowledge is critical too since you have to be able to understand the hot button topics of your company. Luckily the types of issues tend to be similar across all companies: performance, scale, ease of use, cost of development v maintenance, reliability and disaster recovery, compliance, audit and regulatory to name a few of them. This makes architects very valuable for many businesses.

In your experience are you aware of any differences your role has between industries/sectors?

The role of Architect can vary hugely and not just from industry to industry but also between different sized companies within the same industry. In the worst case scenario, you have the ‘Astronaut Architect’ who makes technical pronouncements from on high without actually working closely with the developers who have to turn the abstract into application. In the best case you have Architects who are directly invested in the success of a project, from concept to cash, and who work side-by-side with their developers. The role of an Architect is that of an influencer but in a technology company that means he/she can become very powerful.

Reflection and The Future

What was it like coming into the industry?

I was several years out of University before I started working in IT so I had no illusions about what I was getting myself into. For me the worry was that as a self-taught programmer I would be missing chunks of knowledge about development. I found that the enormous knowledge area of development meant that everyone is far more collaborative than competitive and this meant that I could contribute immediately while being made aware of what I didn’t know and what I would have to learn. I love what I do with a passion and wouldn’t want to do anything else – except possibly be the next David Attenborough!

Do you have any thoughts on the future of your role/industry?

Cloud is taking centre stage more and more but this is more a reflection of the combination of two things – performance/scale and flexibility. As the industry becomes more mature we now have to deliver faster and to a wider audience than ever before. Cloud allows us to do this in that it allows simple prototypes to be rolled out to a platform that is inherently scalable should the application be a success.

Dealing with teams in multiple locations and in multiple timezones is another topic that is important and becoming more so. The world is a smaller place than it used to be and getting smaller. We are also in competition with a larger population of developers than ever before and in a knowledge based industry like ours, the smartest and hardest working will win.

What advice would you give someone entering your industry?

Follow excellence. It doesn’t matter what you are doing if you are doing it the best it should be done. This more than anything else will define your career.

You should never feel that you become stereotyped as a certain type of developer – it’s up to you to take control of your career and you do that by learning. You should have a hunger for new knowledge and everyone should have a list of things that they want to try or learn.

Have you come across anything or anyone that has helped you move forward in the industry?

Having a great mentor is a must. I’ve been lucky in that I’ve met several people that have taught me so much. Also, you should never underestimate a constant drive to improve.

On a practical note – nothing beats getting involved. Get a GitHub account, fork someone else’s code and get stuck in. Start contributing to open source even if it is documentation but get in the game. Not only is your CV improved for it but you as a developer are much improved for it.

Oh No(de)! JavaScript on the Server…?

It’s been a few months since the storm erupted around this post on Node.js by Ted Dziuba. Ted writes in a deliberately controversial style, with comments like “Software development methodology is organizational Valtrex.” raising the ire of readers, and his post on Node.js was no exception. When he pointed out the Achilles heel of event loop frameworks (that any computationally expensive operation in the main loop will cause the whole system to grind to a halt), fans of Node rallied round and did one of two things: they either pointed out that his method of calculating the Fibonacci series was sub-optimal (missing the point that it was supposed to be an example of an expensive operation) or pointed out that Node is still young, addresses many common use cases in web development and fits a useful niche with real business benefits.

JavaScript has been used for server tasks before. The venerable Broadvision ran JavaScript in one of the most popular application servers of the late 90s. There was no syntactical difference between the code that ran on the server and the client side code. Of course we know that JavaScript then fell out of favour for server tasks, displaced by languages like C++, Java and C#, until the new pretenders to the crown, PHP, Ruby and Python, came along.

Node.js has really changed the game to allow JavaScript to be a viable alternative again. It is a simple framework that makes a strength out of JavaScript’s lack of threading and blocking reads to create incredibly powerful event-loop based applications, and it allows front end developers to operate on the server. Developers are able to write fast and scalable code that doesn’t have to worry about locking or other concurrency issues to deliver an entire application stack… apparently.

There was an interesting talk given by Doug Crockford around server-side JavaScript covering much of the history and current thinking including Node.js. Now Doug Crockford is an acknowledged legend of JavaScript programming but there were a number of points he mentioned that seemed to beg the question:

  • Just because JavaScript is successful on the browser, what justification do we have to suggest that it would be good for the server?
  • Is promoting the idea of one language for all domains a good idea?
  • If we need to create sub-processes or workers for potentially blocking tasks, aren’t we just recreating threading but with crappy scheduling?
  • We know that threading is bad but does that automatically make event loops good?

Just because JavaScript is successful on the browser, what justification do we have to suggest that it would be good for the server?

JavaScript is rightly called the x86 assembler of the web. It is ubiquitous across all major browsers to such an extent that until Google proposed Dart, there hadn’t been a useful alternative since VBScript (and how useful VBScript was can be debated). It is fast (or rather fast enough) and getting faster and has a programming model that is relatively easy to understand. However, this is a function of the bundled engine rather than some transcendent requirement that all browsers have to support JavaScript.

For all its good features, and there are many, JavaScript has more than its fair share of bad ones: global variables, reserved words that are only sometimes illegal, terrible arrays, a single Number primitive and truly weird true/false evaluation. We have to ask whether we need another language on the server side that has such flaws and that relies on a sufficiently smart compiler for decent performance. Now given that V8 instruments the code when it’s being interpreted, so it can optimize to machine code and also infer the types, it is close to being sufficiently smart. But when going against the likes of Java, C#, C or even the more dynamic Python and Ruby, why would you choose JavaScript?
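A few of those bad parts are easy to see for yourself. The snippet below is only a sketch: the `quirks` function and its property names are made up for illustration, but every expression is plain JavaScript and the comments show what each one evaluates to.

```javascript
// Each entry is a plain JavaScript expression; nothing here is specific to Node.
function quirks() {
  const sparse = [];
  sparse[5] = "x"; // assigning past the end silently creates holes

  return {
    // A single Number type: IEEE 754 doubles, no true integers or decimals.
    decimalSum: 0.1 + 0.2 === 0.3,                        // false
    bigInteger: Math.pow(2, 53) + 1 === Math.pow(2, 53),  // true

    // Loose equality is not transitive.
    emptyEqualsZero: "" == 0,          // true
    zeroStringEqualsZero: "0" == 0,    // true
    emptyEqualsZeroString: "" == "0",  // false

    // An empty array is truthy, yet loosely equal to false.
    emptyArrayTruthy: Boolean([]),       // true
    emptyArrayEqualsFalse: [] == false,  // true

    // Arrays are objects with a magic length and a string-based default sort.
    typeOfArray: typeof [],          // "object"
    sparseLength: sparse.length,     // 6
    defaultSort: [10, 9, 1].sort(),  // [1, 10, 9]
  };
}

console.log(quirks());
```

None of these are bugs in any one engine; they are specified behaviour of the language, so they follow JavaScript onto the server.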

Is promoting the idea of one language for client and server domains even a good idea?

If the recent explosion of languages has shown us anything, it is that not all of us think alike. Some of us love functional programming, some prefer object orientated programming. Some love prototype inheritance and some prefer class based inheritance. Some love static typing and some prefer dynamic typing. Now some of this is going to be decided by how we think, what shape our “mindcode” is. But some of this is going to be based around how the shape of the language fits the domain model. As an example, PHP is a horrendous language by any technical standard but it is phenomenal at allowing developers to get a web application up and running faster than almost anything else. Likewise, Java is boring and staid compared to the likes of Python and Ruby but offers an enormous pool of developers with superb tooling on a high performance platform with unrivaled support from major vendors which is just what a large company is going to want. These are different domains that have different requirements and different languages offer differing levels of suitability for each set of problems. Having a single language for your entire stack means you are compromising somewhere.

If we need to create sub-processes or workers for potentially blocking tasks, aren’t we just recreating threading but with crappy scheduling?

Probably the biggest bit of truth in Ted’s post is that any computationally expensive activity is going to block the main event loop and render your application useless. The common response to this is to fire off the long running operation in its own process/thread. That sounds a perfectly reasonable way to avoid blocking your main event loop except that you are effectively recreating a threading model that you have to manage. An example of this is knowing which requests can be handled in the main loop and which need to be spun off. If you, the developer, have to know in advance how to handle requests, you have got it wrong.
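To make the blocking problem concrete, here is a minimal sketch (not Ted’s original code). The names `fib` and `measureTimerDelay` are my own, and the naive Fibonacci is only a stand-in for any CPU-bound work. A timer is asked to fire after 0 ms, but its callback cannot run until the synchronous work on the main loop has finished.

```javascript
// Naive, deliberately expensive Fibonacci: a stand-in for any CPU-bound task.
function fib(n) {
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

// Schedule a 0 ms timer, then run `work` synchronously on the main loop.
// `done` receives how long the timer actually had to wait, in milliseconds.
function measureTimerDelay(work, done) {
  const scheduled = Date.now();
  setTimeout(() => done(Date.now() - scheduled), 0);
  work(); // the timer callback cannot run until this returns
}

measureTimerDelay(
  () => fib(30),
  (delayMs) => console.log(`timer asked for 0 ms, fired after ${delayMs} ms`)
);
```

The exact delay depends on the machine, but it grows with the cost of the work. In a server, every other request is waiting in the same queue as that timer.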

We know that threading is bad but does that automatically make event loops good?

No it doesn’t. Event loops are brilliant constructs for many jobs but really, really bad at others. They are not a great general purpose multi-threading construct since they deal with concurrency and not parallelism. There are also other issues around exception handling (paired closures or Option/Maybe monads being examples of solutions) and transactions (hand coded?). When compared to an Actor or Agent framework, event loops are limited in what they can do and not as easy to understand, especially when you are dealing with nested closures. Nested closures are almost a separate topic in themselves but if the true value of a technology is to lower the barrier to adoption, having a fundamental property of the language be as awkward as this is another bad smell. There are options like the async or step libraries (in fact there are about 10 different ways, which in itself is indicative that something needs addressing) but they are still sticking plasters on the underlying issue: that you are required to understand continuation-passing style (CPS) if you want to avoid complexity hell, since you have to write code that is directly reliant on the concurrency semantics of your framework. Given that JavaScript has no other concurrency model available to it, what else can one do?
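For anyone who hasn’t met nested closures in anger, here is a small sketch of what continuation-passing style looks like in practice. Everything in it is hypothetical: the in-memory `users` and `orders` data and the `findUser`, `findOrders` and `orderTotal` functions are invented for illustration, with `setImmediate` standing in for real I/O.

```javascript
// Hypothetical in-memory "database"; names and data are made up for illustration.
const users = { 1: { id: 1, name: "Ada" } };
const orders = { 1: [{ amount: 20 }, { amount: 22 }] };

// Each function takes a continuation of the form callback(err, result).
function findUser(id, callback) {
  setImmediate(() => {
    const user = users[id];
    if (!user) return callback(new Error("no such user: " + id));
    callback(null, user);
  });
}

function findOrders(user, callback) {
  setImmediate(() => callback(null, orders[user.id] || []));
}

// The caller's control flow lives in nested closures, and every level has to
// forward errors by hand because a try/catch cannot span the callbacks.
function orderTotal(userId, callback) {
  findUser(userId, (err, user) => {
    if (err) return callback(err);
    findOrders(user, (err, list) => {
      if (err) return callback(err);
      const total = list.reduce((sum, order) => sum + order.amount, 0);
      callback(null, { name: user.name, total: total });
    });
  });
}

orderTotal(1, (err, result) => {
  if (err) return console.error(err.message);
  console.log(result.name, result.total);
});
```

Two steps are bearable. Add a third lookup, a retry and a transaction and the nesting, plus the repeated `if (err)` plumbing, starts to dominate the code.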

So why is there so much noise around JavaScript on the server again?

JavaScript has a massive following and having the same language both front and back makes prototyping simple and fast so companies can focus on creating applications that make them money. This is the true value of Node.js: the ability for developers who would normally be restricted to the front end to deliver a full stack from front to back. This has immense business value, especially when a single developer can use the same programming language and framework to create the server, client and their interactions. This is the quickest way to get up and running and a blessing to any startup. The trouble starts when you want to productionize your prototype. You want things like operational monitoring, security and knowledge around its performance foibles like memory ceilings before you start committing your company’s fortunes to a one-size-fits-all tech stack.

The adoption of Node indicates that there is a real need for web developers to have frameworks that enable them to write fast full-stack applications in one language. Node’s focus on productivity and ease-of-use for a population of developers who had no other option has made it very successful and the community that has risen up behind it can be rightfully proud of what they have created. But anyone adopting it needs to be aware of the inherent limitations and maybe ask the question: do we improve the frameworks to help the developer or improve the developer to help select a better framework?

 

The Great Chromosome 2 Fusion Event

In a slight break from what I usually write about, someone recently asked me a question about the evidence for common ancestry between humans and the 3 other great apes (chimpanzees, gorillas and orangutans) that share the Hominid taxonomic family with us. Now I don’t often get to talk about topics that I dealt with in my previous life as a researcher in biology so this was a bit of a fun diversion for me. I’m always slightly taken aback when someone professes ignorance of one of the greatest and best supported theories of modern science, especially given the richness of information available to the internet generation. Then again not everyone has had an education where evidence held primacy or possibly has the interest (or in my case fascination!) with all things biological.

One of the greatest and most overt pieces of evidence for this is shown directly in our chromosomes and in the fact that we have 46 of them while the other 3 hominids have 48. Now we know that removing an entire pair of chromosomes is uniformly lethal. There are no cases where a hominid can lose this much genetic information since each chromosome has too much information in it that is required to create a viable organism. So where did it go?

Nowhere. It’s right there hiding in plain sight. If you lined up the 24 chromosome pairs of a chimpanzee and the 23 chromosome pairs of a human you would notice 2 things: firstly, and most obviously, you would have a pair of chimpanzee chromosomes left over and secondly, you would see that the human chromosome 2 is much bigger than the chimp one. You would also notice that if you stuck the leftover chromosome alongside chromosome 2 it matches almost perfectly. This is because each individual human chromosome 2 is actually 2 ancestral chromosomes stuck together, which explains where all the genetic information went.

We can see this even more obviously by 3 facts about the chromosomes:

  1. The genetic sequences of humans and chimpanzees are almost identical and code for almost identical genes.
  2. There is a vestigial centromere in the middle of the long arm of the chromosome.
  3. There are 2 vestigial telomeres located between the functioning centromere and the vestigial one above.

On the first point, we would not expect the sequences to be completely identical since there have been several million years of separation between us and the other great apes. If you want to look further into this, the topic of nested ERVs is particularly interesting. For the second and third, what we see are the structural remnants of the fact that the human chromosome 2 previously existed as two separate chromosomes. A centromere is the bit of the chromosome where the 2 arms cross and is very important during cell division when it helps the chromosomes separate. The telomere is kind of like the aglet on a shoelace and is responsible for making sure the chromosome does not unwind or fray.

Here’s an awesome illustration of this.

So this is overwhelming evidence that this is where the “missing” chromosome went but doesn’t this fly in the face of probability? We know that Robertsonian translocations like this can often be either lethal or lead to infertility. Likewise, the first time this happened the first individual with 23 chromosomes would have no similar individual to mate with. If they mated with the previous 24 type, surely there would be a mismatch between the chromosomes that would lead to non-viable or infertile offspring as seen in horses and donkeys? Wouldn’t this risky change have to happen twice?

Well the evidence demonstrates otherwise. We have seen chromosome fusion actually happening in cows [1] so we know that this sort of mutation is not always lethal. We also have evidence that demonstrates that chromosomal heterozygosity does not prevent the individual from mating successfully with members of their species with the normal chromosomal arrangement [2]. These 2 observations demonstrate that not only is it not statistically improbable for this to happen, it is well supported as a pathway, and it remains one of the neatest and most easily understood examples of evidence for common ancestry between us and the other great apes.

[1] A new centric fusion translocation in cattle: rob (13;19). Molteni L et al. Hereditas (1998)

[2] Chromosomal heterozygosity and fertility in house mice (Mus musculus domesticus) from Northern Italy. Hauffe HC et al. Genetics (1998)