Categories
code development programming

Software Development as a Christmas Story

Gingerbread Christmas Tree
Oh Tanenbaum

We decorated the house for Christmas. A smaller project than most of the software I’ve worked on, but a useful reflection.

The Cost of context switching

Sometimes, in the middle of one task, you need to do another. Either because you were interrupted, or because you weren’t prepared. Consider the tree, a tall tree, one you need a ladder to decorate. But you left the 🌟 in the attic.

You don’t just have the cost of picking up the star, you need to get down the ladder, fold it up, carry it upstairs, open the attic, unfold the ladder, climb to the attic, find the star, then pick it up and unwind all the other steps just to get back where you stated before you can continue the task of putting the Star on the tree.

It’s just a minute to get the star…

Yak shaving

Sometimes it’s more than just one thing. Sometimes to get to where you need to go, there’s a cascade of other tasks.

You want to put the tree up, but it’s the first time you’ve had a real tree, so you need a new base. It’s bigger than last year’s tree, so your lights don’t fit. So you need to go to the shops. But you need to fill up your car. And the tyre pressure warning light comes on so you need to top them up. And you need to check the tread depth whilst you’re there, and so on.

Programming in a nutshell

Performance at scale

Our tree stood up fine when it was delivered. But as it scaled out and the branches widened, it pushed against the wall, making it unstable in the condensed space. It fell over.

Luckily no lights or baubles were using it at the time, but it’s an interesting challenge holding up a heavy tree in one hand, trying to adjust it’s position with the other hand, as I avoided the puddles of water in the floor as my wife mopped them up. If you’ve ever worked support, this may sound familiar.

Turns out that it was harder to stabilise than I anticipated.

The branches were unevenly partitioned, providing more load on one side, so I had to stabilise it against the wall. And the tree was almost a foot taller than expected, which turned it out to be 2 foot taller than the base was rated for.

We upgraded the base to handle the load. It’s bigger than we need.

Technical debt

I have some old pre-LED tree lights and they slowly fail so each year I replace the unlit ones from the packet of spares but I haven’t been replacing the spares. Eventually, they will run out, and the spares will no longer be available. I’ll have to throw them all out and start again.

The new led ones don’t let me replace them individually. But they last longer and are cheaper to run.

Which debt is easier to live with?

With a big tree, those old lights aren’t long enough. So, do I buy another set for the rest of the tree that doesn’t match them, or throw out the existing ones and buy a longer set? The latter looks better, but throws out the sunk cost along with the lights.

You know computers, can you look at this for me?

No, I can’t fix your tree. I can navigate a garden centre. I know enough physics to keep a tree upright, eventually, and safely put the angel on, but I’ve no Idea how to grow one, or what that weird stand with 17 gears you bought does, or how to assemble your plastic one. And I’ve no idea what that bug in the tree is.

Merry Christmas. Catch you all in the New Year.

Categories
development programming

Cloud thinking : storage as data structures

We’ve all experienced the performance implications of saving files. We use buffers, and we use them asynchronously.

Azure storage has Blobs. They look like files, and they fail like files if you write a line at a time rather than buffering. But they’re not files, they’re data structures, and you need to understand them as you need to understand O(N) when looking at Arrays and Linked Lists. Are you optimising for inserts, appends or reads, or cost?

I know it’s tempting to ignore the performance and just blame “the network” but you own your dependencies and you can do better. Understand your tools, and embrace data structures as persistent storage. After all, in the new serverless world, that’s all you’ve got.


Understanding Block Blobs, Append Blobs, and Page Blobs | Microsoft Docs

 

Categories
development leadership

Measuring the wrong thing

Process improvement requires measurements. How do you know what to improve if you can’t see where things aren’t working, and how do you know you’ve made the right change without seeing those numbers going in the right direction?

But measuring doesn’t mean you’re measuring the right thing, and measuring the wrong thing can lead to the wrong outcomes, and can be very harmful (see Liz Keogh’s talk on perverse incentives )

The key to the success of any metric is that it is owned by the team so they have complete control over the changes needed to affect it, so they feel ownership, and that improving the metric has a useful business outcome.

Metrics that I’ve found useful in the past include:

Number of bugs introduced by a release.

This is a tricky one to work with because it can easily be a perverse incentive, and the feedback cycle can be slow, especially with the usual 6-12 month release cadence. However, on one waterfall project I took over there was a strong negative perception of the quality of the code, and big count was a useful proxy as the customer had access to JIRA so the big list was already visible. Reducing this number was a combined effort where testers and developers had to approve requirements, developers and testers invested in automated testing at all levels, and testers joined the developer daily stand-up in order to catch bugs before they were released “have you checked that page in Welsh?“, “We found problems on that page last time because IE was too slow“.

Number of releases per month.

On an agile project we noticed that releases were getting held up because testing was slow and produced a lot of rework, which were then tested against more features that had been pulled from the backlog, increasing testing time. Each release also took half a day, so we had to schedule them carefully.

So we set a goal of focusing on releases for a month and measuring how many we did. There were other measures within that, such as monitoring work in progress on the testers, time taken to do a release, and cycle time from start of development to live release, but they all drove the big number, visible to the whole company, of more quality releases.

Questions answered per day.

This can be a very useful metric for research projects, especially when following a fail fast approach, when you want to prove something can’t be done before you invest in doing it. In order to do that, you need lots of questions with quick answers, to accelerate learning. Any answer, positive or negative, is progress. It means we have learned something.

“I cheerily assured him that we had learned something.
For we had learned for a certainty that the thing couldn’t be done
that way, and that we would have to try some other way.” – Thomas Edison

Age of Pull Requests

Successful agile projects rely on peer review, either via pairing, or a formal review process such as git PRs (and I won’t discuss that in this post). However, when we started working with these on one project, we discovered that we were building up a large debt of Work In Progress because the team wasn’t incentivised to review each other’s code, so one of the developers set up a nag bot for any PR older than 4 days. It lasted 2 weeks before it was no longer needed.

What about you?

What metrics have you used to incentivise the team? What problem were you trying to solve, and did the metric work?

Categories
development programming

Speed : Peak Performance

2011-02-26-15.41.02-1024x682
Would you rather be fast or agile?

I’m sure most developers have heard (and possibly used) the phrase “premature optimisation is the root of all evil” at some point to justify not making a code change because we don’t know how it will affect performance. Unfortunately, I’ve also seen developers and architects try and use it as a “Get out of jail free” card.

The important word here is premature not optimisation. Performance is not something that can be tacked on at the end, you have to think about it up front, as part of the architecture. I have heard many voices arguing that we don’t need to worry about performance because we can profile and optimise later. That’s true to a point, but when you know, in advance, that 2 machines are going to be pumping lots of data to each other, you find a way to put those machines in the same rack, or run both services on the same machine. When you know your object graph contains thousands of multi-megabyte images that you don’t need 99% of the time you access those objects, you design your data structures so that you only load them on demand.  Putting indexes on a primary key isn’t premature. Designing for scalability by reducing shared state isn’t premature. Those are not premature optimisations. You know that your decisions up front can avoid a bottleneck.

It’s only premature until you measure it. You should have an idea how much data you’re transferring. If you find out with 1 week to go live that your program is sending ½Gb of XML data on every request, then you probably weren’t paying attention, and you need to look at your test environment to figure out why you didn’t spot it before.

You might tell me that you don’t need to worry about performance. Maybe 10 years ago, but Moore’s law is dead. You can no longer joust and wait 18 months for your code to get faster. Multi-core is not magic, and it won’t make procedural code any faster, just look at Amdahl’s Law. Web servers are optimised for throughput, desktops are optimised for user responsiveness, and mobile devices are optimised for battery life, not programmer bloat.

Slow code is code rot. If you can measure your technical debt in minutes, it’s time to pay it down. Of course, we still want agile development with maintainable code, and premature optimisation can still create technical debt, but don’t ignore performance, and make sure you know how to measure it, even if it’s just turning on the performance monitor on your CI build so you know if it’s getting slower.

Clean code, but fast clean code.