Category Archives: Programming

All things related to programming

Developers – You are not just the sum of your technical skills

I’ve been pondering the fact that when I started out on a career in development, I focussed on the technical side of it almost to the exclusion of all else. This was especially true for me since I was coming from a non-CompSci background after spending years sticking electrodes in people’s heads and doing biological research. This made me painfully aware that in an industry like programming, if you don’t know your stuff you are useless to your team and your company. Worse, you run the risk of ending up a Net Negative Producing Programmer and reducing your team’s output to less than it would be if you weren’t there.

Thus the first few years of my career as a developer were spent focusing entirely on technical matters. I fought hard for expertise in the areas that I worked in: Java, JavaScript and C, web development, back-end services and multi-threading… the list is long and mostly distinguished (EJBs… blergh!). I wanted to be that go-to guy for all things technical and do my part in making my team the best in the company. As a contractor, my skills were the yardstick by which I was judged for new roles, so of course I was interested in keeping them up to date.

After a relatively short time it became obvious that this wasn’t enough. Being technically adept is the foundation of our job but certainly not the totality. Even before you get your first job you realise the importance of communication. Not just communicating your skills to get that first job but, once you’ve got it, communicating with the rest of your team including the non-technical members. This is especially true in an Agile world where you have to be able to tell others what is going on in a way that they will understand.

Vert.x – a competitor for Node.js?

With all the hoopla around Node.js and threadless concurrency it’s only to be expected that we would see similar (but different) efforts. Vert.x is one of them.

Vert.x is another technology that claims to bring “effortless asynchronous application development for the modern web and enterprise”. What Vert.x really brings to the party is the ability to program in JavaScript, Ruby, Groovy and Java. In fact it is possible to write Vert.x applications in any JVM-compatible language, since that’s where it runs.

Since a vertex is a synonym for a node in graph theory, it’s fairly obvious that imitation is the sincerest form of flattery. I’m still not a huge fan of the event loop as a general concurrency strategy (as I wrote here) since it requires that you know what sort of requests you are getting. I think it’s ideal for systems concurrency rather than general concurrency, since you need to know about a request before you can deal with it, but clearly your mileage may vary as Node.js has proven insanely popular.

Anyway, Vert.x boasts better performance (yes I know: lies, damn lies and microbenchmarks!), more languages, as many features as Node.js and even a compatibility layer so that you can run your Node applications on Vert.x with zero changes (allegedly). The only thing it doesn’t have is the amazing community support of Node.js, but it certainly looks interesting.
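To give a flavour of the programming model, here’s a minimal sketch of an HTTP server verticle in Java. It’s written against the more recent Vert.x core API rather than the one that was current at the time of writing, so treat the exact class and method names as illustrative rather than gospel.

import io.vertx.core.AbstractVerticle;
import io.vertx.core.Vertx;

// A verticle is Vert.x's unit of deployment. It runs on an event loop thread,
// so the request handler must never block.
public class HelloVerticle extends AbstractVerticle {

    @Override
    public void start() {
        vertx.createHttpServer()
             .requestHandler(req -> req.response()
                                       .putHeader("content-type", "text/plain")
                                       .end("Hello from Vert.x"))
             .listen(8080);
    }

    public static void main(String[] args) {
        // Deploy the verticle on a freshly created Vert.x instance.
        Vertx.vertx().deployVerticle(new HelloVerticle());
    }
}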

If that sounds interesting, go and have a look at http://vertx.io/

Quick Tip – Tomcat silently fails when a Filter fails to start

I was throwing together a prototype to demonstrate some caching strategies but for some reason Tomcat was failing to start up cleanly. Annoyingly, the only error message it gave was distinctly underwhelming.

SEVERE: Error filterStart
Jul 6, 2012 3:39:05 PM org.apache.catalina.core.StandardContext startInternal
SEVERE: Context [] startup failed due to previous errors

There was obviously some bad config or setup but let’s be honest, that’s a rubbish default message. The solution was to create a file called logging.properties and put it on the classpath: either in WEB-INF/classes or listed directly on the classpath (Maven users can put it in src/main/resources). This file should contain the following config, which will output the stacktrace of the offending error.

org.apache.catalina.core.ContainerBase.[Catalina].level = INFO
org.apache.catalina.core.ContainerBase.[Catalina].handlers = java.util.logging.ConsoleHandler

This produces the more informative stacktrace below.

Jul 6, 2012 3:34:33 PM org.apache.catalina.core.StandardContext filterStart
SEVERE: Exception starting filter cacheFilter
java.lang.ClassNotFoundException: com.betfair.web.filters.CachingFilter
...
SEVERE: Error filterStart

In my case it was a simple misconfiguration of the project where I hadn’t included the filter class on the classpath, d’oh!
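For context, the filter in question was declared in web.xml in the usual way, something like the snippet below (the names are the ones from my prototype; yours will obviously differ). If the class named in filter-class isn’t on the classpath, all you get out of the box is the silent filterStart failure above.

<filter>
    <filter-name>cacheFilter</filter-name>
    <filter-class>com.betfair.web.filters.CachingFilter</filter-class>
</filter>
<filter-mapping>
    <filter-name>cacheFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>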

Finding out the wi-fi channels in use at your location with OS X

My wi-fi suddenly started crawling last night; even the smallest web pages took ages to download and Spotify kept stopping and starting. Going to broadbandspeedchecker.co.uk showed that I was getting a mighty 0.03 Mb/sec! Before I started railing at my ISP I connected my laptop via ethernet and suddenly, BOOM! 15 Mb/sec. So it seemed pretty obvious that my wi-fi was getting shafted somehow and the prime candidate was channel contention.

After a quick dig around for an app, I stumbled across something even better. OS X has a command line utility that lets you see what channels are in use by the wi-fi networks in your area – the airport command. The following (it’s all one line) will list the wi-fi endpoints being broadcast around you and the channels they are using.

/System/Library/PrivateFrameworks/Apple80211.framework/Versions/Current/Resources/airport -s
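(If you find yourself using it regularly, it’s worth symlinking that binary to somewhere on your PATH – /usr/local/bin/airport, say – to save the typing.)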

It immediately showed up a new wi-fi endpoint that was on the channel I was using. Needless to say, a simple channel change fixed my issue and normal service was resumed.

Oh No(de)! JavaScript on the Server…?

It’s been a few months since the storm erupted around this post on Node.js by Ted Dziuba. Ted writes in a deliberately controversial style, with comments like “Software development methodology is organizational Valtrex.” raising the ire of readers, and his post on Node.js was no exception. When he pointed out the Achilles heel of event loop frameworks – that any computationally expensive operation in the main loop will cause the whole system to grind to a halt – fans of Node rallied round and did one of two things: they either missed Ted’s point and complained that his method of calculating the Fibonacci series was sub-optimal (missing that it was only supposed to be an example of an expensive operation), or they pointed out that Node is still young, addresses many common use cases in web development and fits a useful niche with real business benefits.

JavaScript has been used for server tasks before. The venerable Broadvision ran JavaScript in one of the most popular application servers of the late 90s, and there was no syntactical difference between the code that ran on the server and the client-side code. Of course we know that JavaScript fell out of favour for server tasks, losing out to languages like C++, Java and C# until the new pretenders to the crown – PHP, Ruby and Python – came along.

Node.js has really changed the game and made JavaScript a viable alternative again. It’s a simple framework that makes a strength out of JavaScript’s lack of threading and blocking reads, allowing incredibly powerful event-loop based applications to be built and letting front-end developers operate on the server. Developers can write fast and scalable code that doesn’t have to worry about locking or other concurrency issues and deliver an entire application stack… apparently.

There was an interesting talk given by Doug Crockford on server-side JavaScript covering much of the history and current thinking, including Node.js. Now Doug Crockford is an acknowledged legend of JavaScript programming, but there were a number of points he mentioned that raised some questions:

  • Just because JavaScript is successful on the browser, what justification do we have to suggest that it would be good for the server?
  • Is promoting the idea of one language for all domains a good idea?
  • If we need to create sub-processes or workers for potentially blocking tasks, aren’t we just recreating threading but with crappy scheduling?
  • We know that threading is bad but does that automatically make event loops good?

Just because JavaScript is successful on the browser, what justification do we have to suggest that it would be good for the server?

JavaScript is rightly called the x86 assembler of the web. It is ubiquitous across all major browsers, to such an extent that until Google proposed Dart there hadn’t been a useful alternative since VBScript (and how useful VBScript was can be debated). It is fast (or rather, fast enough), getting faster, and has a programming model that is relatively easy to understand. However, this is a function of the bundled engines rather than some transcendent requirement that all browsers have to support JavaScript.

For all its good features, and there are many, JavaScript has more than its fair share of bad ones: global variables, reserved words that are only sometimes illegal, terrible arrays, a single Number primitive and truly weird true/false evaluation. We have to ask whether we need another language on the server side that has such flaws and that relies on a sufficiently smart compiler for decent performance. Now, given that V8 instruments the code as it is interpreted, optimises it to machine code and infers types, it is close to being sufficiently smart – but when going up against the likes of Java, C#, C or even the more dynamic Python and Ruby, why would you choose JavaScript?

Is promoting the idea of one language for client and server domains even a good idea?

If the recent explosion of languages has shown us anything, it is that not all of us think alike. Some of us love functional programming, some prefer object-oriented programming. Some love prototypal inheritance and some prefer class-based inheritance. Some love static typing and some prefer dynamic typing. Some of this is decided by how we think, what shape our “mindcode” is, but some of it is based on how well the shape of the language fits the domain. As an example, PHP is a horrendous language by any technical standard but it is phenomenal at letting developers get a web application up and running faster than almost anything else. Likewise, Java is boring and staid compared to the likes of Python and Ruby but offers an enormous pool of developers, superb tooling and a high performance platform with unrivalled support from major vendors, which is just what a large company is going to want. These are different domains with different requirements, and different languages offer differing levels of suitability for each set of problems. Having a single language for your entire stack means you are compromising somewhere.

If we need to create sub-processes or workers for potentially blocking tasks, aren’t we just recreating threading but with crappy scheduling?

Probably the biggest bit of truth in Ted’s post is that any computationally expensive activity will block the main event loop and render your application useless. The common response is to fire off the long-running operation in its own process/thread. That sounds like a perfectly reasonable way to avoid blocking your main event loop, except that you are effectively recreating a threading model that you now have to manage yourself – for example, knowing which requests can be handled in the main loop and which need to be spun off. If you, the developer, have to know in advance how each request should be handled, you have got it wrong.
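To make the problem concrete without tying it to any particular framework, here’s a minimal sketch in Java (this blog’s home turf rather than Node’s) of a single-threaded ‘event loop’ being stalled by one CPU-bound task, and the usual workaround of shunting that work onto a separate pool – which is exactly the hand-managed threading described above. The numbers and names are purely illustrative.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class EventLoopBlocking {

    // Deliberately naive Fibonacci - a stand-in for any CPU-heavy work.
    static long fib(int n) {
        return n < 2 ? n : fib(n - 1) + fib(n - 2);
    }

    public static void main(String[] args) throws InterruptedException {
        // A single thread standing in for the event loop.
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        // A separate pool for CPU-bound work - threads we now have to size and manage ourselves.
        ExecutorService workers = Executors.newFixedThreadPool(4);

        eventLoop.submit(() -> System.out.println("request 1 handled"));

        // One expensive request on the loop stalls everything queued behind it...
        eventLoop.submit(() -> System.out.println("fib(42) on the loop = " + fib(42)));
        eventLoop.submit(() -> System.out.println("request 3 handled (only after fib finishes)"));

        // ...so the developer has to know in advance that this request is expensive
        // and hand it off to a worker, i.e. recreate a threading model by hand.
        eventLoop.submit(() -> {
            workers.submit(() -> System.out.println("fib(42) offloaded = " + fib(42)));
        });

        eventLoop.shutdown();
        eventLoop.awaitTermination(1, TimeUnit.MINUTES);
        workers.shutdown();
        workers.awaitTermination(1, TimeUnit.MINUTES);
    }
}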

We know that threading is bad but does that automatically make event loops good?

No it doesn’t. Event loops are brilliant constructs for many jobs but really, really bad at others. They are not a great general-purpose replacement for multi-threading since they deal with concurrency and not parallelism. There are also other issues around exception handling (paired closures or Option/Maybe monads being examples of solutions) and transactions (hand-coded?). When compared to an Actor or Agent framework, event loops are limited in what they can do and not as easy to understand, especially when you are dealing with nested closures. Nested closures are almost a separate topic in themselves, but if the true value of a technology is to lower the barrier to adoption, having a fundamental property of the language be this awkward is another bad smell. There are options like the async or step libraries (in fact there are about ten different ways, which in itself indicates that something needs addressing) but they are still sticking plasters on the underlying issue: that you are required to understand continuation-passing style (CPS) if you want to avoid complexity hell, since you have to write code that is directly reliant on the concurrency semantics of your framework. Given that JavaScript has no other concurrency model available to it, what else can one do?

So why is there so much noise around JavaScript on the server again?

JavaScript has a massive following, and having the same language at both front and back makes prototyping simple and fast, so companies can focus on creating applications that make them money. This is the true value of Node.js: the ability for developers who would normally be restricted to the front end to deliver a full stack from front to back. This has immense business value, especially when a single developer can use the same programming language and framework to create the server, the client and their interactions. It’s the quickest way to get up and running and a blessing to any startup. The trouble starts when you want to productionize your prototype. You want things like operational monitoring, security and knowledge of its performance foibles, like memory ceilings, before you start committing your company’s fortunes to a one-size-fits-all tech stack.

The adoption of Node indicates that there is a real need for web developers to have frameworks that let them write fast, full-stack applications in one language. Node’s focus on productivity and ease of use for a population of developers who had no other option has made it very successful, and the community that has risen up behind it can be rightfully proud of what they have created. But anyone adopting it needs to be aware of the inherent limitations and maybe ask the question: do we improve the frameworks to help the developer, or improve the developer so they can select a better framework?


Job Ads Are A Valid Indicator Of Programming Language Acceptance

I found an article from a mystery blogger here called Job Ads Do Not Reflect Programming Language Popularity that covered my post on Scala, Groovy, Clojure, Jython, JRuby and Java: Jobs by Language. It’s well written and pretty fair, and I think that anyone who found my original article interesting should go over there and read it. That said, there are a few points that I’d like to expand on if you will indulge me.

1) The job percentage of 3.5% for Java is across ALL jobs on indeed.com, not just IT-related ones (there are 2x as many Java jobs as there are listings for accountants 🙂 ), which makes the 3.5% value quite respectable.

2) The stats do seem to imply that Scala and Groovy have far more momentum than Clojure. I know some real Clojure fans, but the equivalent Scala and Groovy user groups seem to be rather more active. Also, both Scala and Groovy have a killer weapon in the Akka and Grails frameworks respectively. I don’t know if the same can be said for Clojure (Noir?) and would be happy to be proven wrong. I don’t think the importance of having these weapons can be overstated. It gives both languages great leverage in getting in the door of businesses, and once they are in the door I would expect more general usage to follow.

In the original article I showed the graph below and mentioned that it was likely that Scala would overtake Jython and that in fact, Scala’s growth rate had overtaken Groovy’s (although with a caveat that it was only a few data points and so could be a ‘flash in the pan’).

Now this is the same graph 6 weeks later…

It looks like I was right on both predictions (cue outrageous smugness!) but they both seemed reasonably obvious. The one thing I did not guess was that Groovy’s growth rate would stall like it has. I hope it’s only temporary.

The author goes on to give a very good timeline of the development of a language but doesn’t mention that crossing the chasm is the hardest thing for any language, and that’s exactly where both Scala and Clojure are right now.

3) I completely agree that the average Java developer cares less about programming, since Java is currently the default language of the majority and you will therefore find more close-to-the-mean programmers using it than any other language. Given the volume of jobs out there, the stickiness of the jobs can’t be that different, and because of this I think that job listings are a valid metric of the industry acceptance of a language. Even if they are only slightly indicative, you would have to show that jobs in other languages are 60x stickier to be on a par with Java, or that Clojure jobs are 6x stickier than Scala jobs.

4) I hope that I never gave the impression that if you are a Java developer you shouldn’t bother to learn new JVM languages. I completely agree with my mystery friend that knowing new languages makes you more employable. As someone deeply involved in the recruitment process at my current employer I can personally attest to this. I would go as far as to suggest that knowing more languages makes you a better programmer since it exposes you to more ways of doing the same thing. More tools in your toolbox gives you a better chance of picking the right one and using it correctly.

As for whether my final paragraph was too strong, I can only say that I call ’em as I see ’em. Yes I am a bigger fan of Scala and Groovy than I am of Clojure but please don’t take this as evidence that I am somehow against Clojure. Clojure is a beautiful language but it suffers from the fact that it lacks the aforementioned ‘weapon’ and is just so different from Java that companies will shy away from it worried about how long it will take them to spin up their entire workforce, not just their elite teams.

Maybe I can phrase it differently this time: it doesn’t matter whether you are a Java developer or not, the best thing you can do is to read “Seven Languages in Seven Weeks: A Pragmatic Guide to Learning Programming Languages” by Bruce Tate. Learning Ruby, Io, Prolog, Scala, Erlang, Clojure and Haskell will do great things for anyone’s understanding of programming languages.


Jetty 8 Maven Plugin

I love how easy it is to integrate Jetty into a Maven build, but for some reason all the examples use a three-year-old version of Jetty. Trying to find the latest version of an artefact can be a bit of a pain, so here it is 🙂

    <build>
        <plugins>
            <plugin>
                <groupId>org.mortbay.jetty</groupId>
                <artifactId>jetty-maven-plugin</artifactId>
                <version>8.0.4.v20111024</version>
                <configuration>
                    <webApp>
                        <contextPath>/</contextPath>
                    </webApp>
                    <stopPort>9966</stopPort>
                    <stopKey>foo</stopKey>
                </configuration>
            </plugin>
        </plugins>
    </build>
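With that in the pom, mvn jetty:run should (assuming the plugin defaults haven’t changed in your version) fire up your webapp on Jetty’s usual port of 8080.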

Staying Sharp: Playing With New Technologies

My work has kicked into top gear in the last couple of weeks. I’ve had so many demands on my time that I have been unable to do any coding. Now I know that that is part and parcel of being a Tech Lead but the moment you step away from the code you start to lose the qualities that make you so useful to the business, i.e. being able to stick your nose into the codebase and shout “Yep, I know what this bit does.”

So I’ve been spending a few hours every night keeping my skills sharp on a project that a) has a direct impact on what I do every day and b) is bloody good fun. How much fun, I hear you ask? So much that my wife forced me to sleep in the “huff” bed (the spare room – where you go when you are too angry or too drunk to sleep together) since I kept talking about what I was doing.

So what is it? Some multi-threaded mayhem of course!

Not really, of course. I regard manual thread manipulation as the last refuge of fools and geniuses and I ain’t either. My current project is a web application which makes a large number of concurrent requests for each page request. We use various bits of the java.util.concurrent libs to do this. This works great but at scale can cause issues since each thread takes a few MB; once you’ve spun up a couple of thousand of them, your machine has eaten itself. A solution would be to use Actors instead, and since the excellent Akka framework can be used from Java and not just Scala I thought I would give it a go. At the same time, since it was a home project, I thought I would use Git for version control.

I finished up the project today and it’s pretty cool. Akka’s syntax in Java is clunky compared to its Scala equivalent but I’ve seen far worse. Some perf testing on Monday should show what difference it makes and if it is good enough I might take it to the next LJC meeting.
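For anyone curious what ‘clunky but workable’ looks like, here’s a minimal sketch of an actor in Java. It uses a more recent Akka API than the one I was playing with, and the names and fire-and-forget message handling are purely illustrative – in a real page-assembly scenario you would use ask and Futures to gather the responses.

import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;

// A trivial actor that "fetches" something in response to a message.
// Each request becomes a message on a mailbox rather than a dedicated thread.
class FetchActor extends AbstractActor {
    @Override
    public Receive createReceive() {
        return receiveBuilder()
                .match(String.class, url -> {
                    // Pretend to make the remote call, then reply to whoever asked.
                    getSender().tell("response for " + url, getSelf());
                })
                .build();
    }
}

public class AkkaSketch {
    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("page-assembly");
        ActorRef fetcher = system.actorOf(Props.create(FetchActor.class), "fetcher");

        // Fire off lots of concurrent "requests" without spinning up a thread per request.
        // (With no sender, the replies go to dead letters - fine for a sketch.)
        for (int i = 0; i < 10_000; i++) {
            fetcher.tell("http://example.com/widget/" + i, ActorRef.noSender());
        }

        system.terminate();
    }
}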

Continuous Delivery Metrics: Do we need anything other than Cycle Time?

Without doubt, cycle time is the most useful metric for measuring Continuous Delivery. In chapter 5 of “Continuous Delivery”, Dave Farley and Jez Humble define it as “the time between deciding that a feature needs to be implemented and having that feature released to users”. They also mention that while this shouldn’t be the only metric you use, the others they mention – number of builds, code coverage, cyclomatic complexity etc. – are more concerned with the initial Continuous Integration phase of the CD pipeline than with the pipeline as a whole. Cycle time really is the best indication of the health of your Continuous Delivery pipeline.

In the project I’m working on we are currently rolling out a Continuous Delivery pipeline, and interestingly it has raised some issues with simplistically using cycle time as the main metric. The underlying assumption with cycle time is that any restrictions or bottlenecks can be solved by working on them (not much of a surprise!). But what happens when your bottlenecks are external and can’t be solved? A classic example would be an external regulator enforcing a legal requirement that code deployed in its jurisdiction is subject to their analysis. There is no point trying to “subordinate all the other processes to the constraint” when the constraint is not solvable. It’s not unusual to see this sort of analysis take days, so your CD pipeline can be humming along nicely only for your deployments to slam into a requirement that stops you from releasing, which certainly puts a crimp in any idea that you can release multiple times a day.

The external restriction can skew cycle time enough to hide other bottlenecks – the ones that we could and should be working on. One option could be to measure cycle time from deciding to implement the feature to when the code reaches the environment just before production (staging/pre-production/next-live). The trouble is that if you do this you completely lose the connection to the customer, which defeats the point.

One improvement was to record not just the total cycle time but also the number of deployments into each environment. This gave us an efficiency metric that allowed us to pinpoint where the issues were and record how our work affected them. If we imagine a simple CD pipeline of five environments – Development, QA, Performance, UAT and Production – which we deploy to in a serial manner, a really efficient pipeline would have values like these:

In this hypothetical example we deployed 100 times this week (for information’s sake, the deployment rate at my current employer is about 2.5x higher). For every 100 deployments to dev, we see about 95 to QA, 95 to the performance environment, 90 going on to UAT and, of those, 85 making it to prod. Of course this is highly idealised but you get the picture. You are always going to do the most deployments to the development environment and the fewest to production. What is important is the gradient between the values. Expressing them as a percentage of dev deployments you get an efficiency ratio of 95/95/90/85, and the difference, or gradient, between two adjacent environments tells you how efficient you are being at that step.
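The arithmetic is trivial but, for completeness, here’s a little sketch of the calculation in Java using the hypothetical counts above (the environment names and figures are just the ones from this example).

import java.util.LinkedHashMap;
import java.util.Map;

public class PipelineEfficiency {
    public static void main(String[] args) {
        // Hypothetical deployment counts per environment for one week.
        Map<String, Integer> deployments = new LinkedHashMap<>();
        deployments.put("Development", 100);
        deployments.put("QA", 95);
        deployments.put("Performance", 95);
        deployments.put("UAT", 90);
        deployments.put("Production", 85);

        int devDeployments = deployments.get("Development");
        for (Map.Entry<String, Integer> entry : deployments.entrySet()) {
            // Each environment expressed as a percentage of dev deployments.
            System.out.printf("%-12s %3d%%%n", entry.getKey(), 100 * entry.getValue() / devDeployments);
        }
        // A big drop between two adjacent environments marks the step to investigate.
    }
}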

So what does it look like in the real world when you have an external blocker?

The values for this are 65/60/50/05. Only 1 in 20 of the builds make it into production, which isn’t great, and while the biggest bottleneck is the external restriction, there is also a huge drop from development to QA. It turns out that some of the tests use non-deterministic data and occasionally fail. Of course this is a huge no-no, but it would have been difficult to see how much it was costing us from cycle time alone since the greatest delay was from UAT to Production.

Continuous Delivery recommends that you identify the limiting constraint on your system, and really this is no more than a way of doing that. Cycle time is hugely important in knowing the state of your CD pipeline, but recording the number of deployments gives greater depth to it and allows you to see how your whole process could be optimised.

Scala, Groovy, Clojure, Jython, JRuby and Java: Jobs by Language

In a previous post I pointed out that one of the more obvious recent changes in the Java landscape has been the meteoric rise in popularity of other languages for the JVM. Some are old and some are new, but JVM-compatible versions of established languages such as JRuby and Jython, Java-esque languages like Groovy and Scala, and brand new languages like Clojure and Kotlin offer genuine options for those who appreciate the performance and reliability of the JVM but also want a different syntax.

In an ideal world all developers would be able to develop in the language of their choice. The reality is that as developers we are constrained by the suitability of the language, by the tooling support and by what languages companies are actually using. Firstly, you choose the language appropriate to the domain – one that lets you do your job quickly and easily but with the appropriate level of support for your non-functional requirements like performance. Secondly, no one wants to be slogging through the coding process in a simple editor. Yes, I know that we could all use vim or emacs, but being able to refactor large swathes of code easily and quickly (hello TDD!) kind of demands a modern IDE like IntelliJ or Eclipse. Thirdly, very few of us are in a position to dictate to our employers what language we should be using. Learning a language with rising popularity means you have a greater chance of being employed in the future (which is nice), but it is employers who drive the acceptance of new languages.

The fact is that many companies boast about using the latest and greatest languages since it makes them more attractive to candidates. You can barely move for blog posts and tweets of people raving about how their company has completely changed its development process with a new language, but what is the real picture?

For a useful indication of industry acceptance we can look at the job trends on indeed.com. The granddaddy of language charts is Tiobe, but it’s no use here since a) it does not provide sufficient information and b) it is too easily gamed – yes Delphi dudes, we know what you did. Now before you complain: I know that using something like this is far from perfect and a long way from scientific, but unless you fancy doing a longitudinal study, going around asking all the companies what they are using and believing their answers are real rather than marketing fluff, it’s probably good enough to be illustrative.

So what can this tell us about how the industry sees the major languages of the JVM: Java, Groovy, Scala, Clojure, Jython and JRuby*? What happens when we look at the percentage of all jobs that mention these languages?

Umm, well… it’s pretty obvious that despite all the industry noise about other languages, Java is still massively dominant in the job marketplace, with almost 3.5% of the jobs requiring Java knowledge. We all know that Java is an industry heavyweight, but it is a bit of a surprise that in comparison the other languages are an indistinguishable line. Welded close to zero, they would need some seriously exponential growth to start to threaten Java.

So what happens when you remove Java….

This is a lot more interesting. Jython was the first language other than Java that was really accepted on the JVM. Groovy started to pick up in 2007 and quickly became the first of the alternative languages, no doubt driven by Grails. Clojure and JRuby have never really garnered much support despite rising over the last 18 months or so. I think the most interesting point is the recent increase in the acceptance of Scala: it is currently third behind Jython, but the gradient indicates that it will soon move into second. Comparing the growth rates of Scala and Groovy on a relative basis we see the following.

So we can see that Scala has finally crossed over Groovy’s growth rate. It’s completely reasonable to say that this could be temporary and that we should not read too much into it, but there are a few data points there, so it does not appear to be a flash in the pan.

So what can we say? While you’ll want to dust off the old Groovy textbooks and maybe have a look at some Scala tutorials, the best thing you can do is to keep your Java-fu in top-notch order. As far as the industry is concerned Java is still the Daddy of the JVM languages and seems to be staying that way for some time.

* – I did originally include Kotlin and Gosu but since there were 0 jobs for Kotlin and only about 9 for Gosu they would only have been noise.