The UX of Big Data

Following on from my Dangers of Big Data talk at DunDDD, I’ve been thinking about what a good user experience for data analytics would look like: a business user presented with useful, actionable information, rather than a notepad and a copy of the R or Python cookbook. I want something deceptively simple, like the Google search box, rather than deceptively complex, like Excel.

Excel, R and Python put a lot of tools at your disposal, and you could use any of them to construct an answer, but the secret of analytics is getting an answer that is both valid and useful. Validity is a matter of restricting the answer space to what the data can support (for example, disallowing multiplication of time-based input streams, or aggregating when there is no statistical basis for it). Usefulness is a matter of allowing the user to explore that space so they can determine (and, where appropriate, train the system to recognise) which factors are most important, how they affect the desired outcome, and how changes to the environment affect those factors.
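As a rough illustration of restricting the answer space, the check below refuses the two examples mentioned above. The `Series` type, the `check_operation` function and the 30-sample threshold are hypothetical assumptions for this sketch, not part of any real analytics tool.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical rule of thumb: below this many samples, refuse to aggregate.
MIN_SAMPLES_TO_AGGREGATE = 30


@dataclass(frozen=True)
class Series:
    """A hypothetical input column with just enough metadata to reason about."""
    name: str
    time_based: bool
    sample_size: int


def check_operation(op: str, inputs: List[Series]) -> List[str]:
    """Return reasons why `op` is not supported by the data (empty list = allowed)."""
    problems: List[str] = []
    time_streams = [s.name for s in inputs if s.time_based]
    if op == "multiply" and len(time_streams) >= 2:
        problems.append(
            "Cannot multiply time-based streams: " + ", ".join(time_streams)
        )
    if op == "aggregate":
        for s in inputs:
            if s.sample_size < MIN_SAMPLES_TO_AGGREGATE:
                problems.append(
                    f"{s.name}: {s.sample_size} samples is too few to aggregate"
                )
    return problems


if __name__ == "__main__":
    sales = Series("daily_sales", time_based=True, sample_size=365)
    visits = Series("daily_visits", time_based=True, sample_size=365)
    print(check_operation("multiply", [sales, visits]))
    # ['Cannot multiply time-based streams: daily_sales, daily_visits']
```

Returning a list of reasons rather than raising an error leaves the interface free to decide whether to nudge the user with a warning or stop them outright.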

Then the question becomes: how much should the software take over? Do we have a duty to protect users from themselves by preventing invalid analysis where we can detect it, or do we have to accept that the frustration this causes will alienate users and make them less likely to respond well to further corrections? Even nudging has its problems, as anyone who has been frustrated by grammar checkers can attest. But at least nudging helps the user to understand, rather than putting up roadblocks. Nudging encourages learning; roadblocks encourage finding another way.

How would you encourage users to handle analysis appropriately?

Paperless and warning free

there's a dog driving that car?
Do you have a licence

This is a quick follow-up to my post on the new process for endorsements following the demise of the paper counterpart driving licence.

First, a clarification: the DVLA change applies to the paper counterpart of the photo ID licence, not to the paper licence that existed before photo ID licences were introduced. Many people will have been switched to a photo ID licence when they moved house, though, so it’s only the hardcore who won’t be affected.

I got confirmation from the car hire firm 24 hours in advance that I needed to print out my endorsements sheet, by which point I no longer had access to a working printer, so I was glad I’d tried it beforehand. The guy at the desk noted that it was a new scheme, and mentioned that if I hadn’t printed it out, they would have needed to call a DVLA verification phone number, which is very busy when it’s not closed. So there are still a number of teething problems to sort out.

So if you are hiring a car, get yourself over to the DVLA website in advance (any time within 21 days) and get your endorsement sheet printed. It might just save you from long queues and grumpy car hire staff.

Offline OAuth and the end of paper driving licences


As I will shortly be hiring a car, I had the opportunity to try out the new process to replace the paper driving licence.

For those not in the UK, the driving licence consists of two parts: a photo ID card, renewed every 10 years, that shows your name and address and the types of vehicle you can drive; and a paper counterpart that details any endorsements or convictions. The main people who care about the paper part are the insurance companies and, by extension, the car hire companies, who have to include insurance in their rates.

Prior to its abolition a few weeks ago, the paper counterpart had to be sent back to the DVLA to have endorsements or convictions added, and again to have them removed after 3 years. I’ll leave the possible opportunities for fraud and disruption as an exercise for the security-minded reader.

The new process

To view your endorsements now, you use a page on the DVLA website, secured by your driving licence number, your National Insurance number (for US readers, think Social Security number) and your postcode, which is printed on your licence. So the DVLA doesn’t quite consider it public information, but it effectively is. If you ever use your payslip and your driving licence to prove your ID, whoever checks them has all three details and can see your endorsements, which is only slightly more secure than handing over the old paper licence.

However, to provide some security theatre, and to let you avoid disclosing your National Insurance number, the page offers a printout and a time-limited key which, when paired with your licence number, allows the recipient to verify your endorsement record directly with the DVLA.
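The printout works like a short-lived bearer token, hence the "offline OAuth" comparison. As a sketch of how such a time-limited code could work, the example below derives the code from an HMAC of the licence number and the issue day, so only the issuing service can later verify it. This is an assumption for illustration, not the DVLA's documented scheme; the secret key, the code format, the function names and the 21-day window are all hypothetical.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


def _mac(secret: bytes, licence_number: str, issued_day: int) -> str:
    """Truncated HMAC-SHA256 binding the code to one licence and one issue day."""
    message = f"{licence_number}|{issued_day}".encode()
    digest = hmac.new(secret, message, hashlib.sha256).digest()
    # Base32 uses only A-Z and 2-7; 8 characters carry 40 bits of the MAC.
    return base64.b32encode(digest).decode()[:8]


def issue_check_code(secret: bytes, licence_number: str,
                     now: Optional[float] = None) -> str:
    """Issue a code the driver can hand over instead of their NI number."""
    issued_day = int((time.time() if now is None else now) // SECONDS_PER_DAY)
    return f"{issued_day}-{_mac(secret, licence_number, issued_day)}"


def verify_check_code(secret: bytes, licence_number: str, code: str,
                      now: Optional[float] = None, valid_days: int = 21) -> bool:
    """Server-side check: right licence, untampered, and still within its window."""
    try:
        day_text, mac = code.split("-", 1)
        issued_day = int(day_text)
    except ValueError:
        return False
    today = int((time.time() if now is None else now) // SECONDS_PER_DAY)
    if not 0 <= today - issued_day <= valid_days:
        return False
    expected = _mac(secret, licence_number, issued_day)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), mac.encode())
```

The recipient never needs the secret or the NI number: they pass the licence number and code back to the issuer, which is exactly the "verify directly with the DVLA" step.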

I have not had to renew my insurance since the changes, but I notice that the car hire companies hide the new DVLA page deep in their terms and conditions, so it isn’t given the same prominence as the requirement to provide the photo ID part of the licence. Other than the fact that the paper counterpart is no longer valid, I’m not sure what the use cases for this are. Maybe it will make more sense when I renew my insurance and can do all this online.

At the moment it feels like digital for the sake of it, with too many bridges back to the old way.

CodeCraftConf : technical leadership

Follow @codecraftuk on twitter

I have asked to be a guide for the CodeCraft conference. The tickets for CodeCraftConf are now on sale.

One topic I would like to discuss is technical leadership. The conference format calls for 12 prompt cards to help guide the discussion. My suggestions for these are shown below, and are also available on my GitHub repo. Comments and suggestions are welcome.

Technical Leadership

  1. What is a technical lead?
  2. What most inspired you about your previous technical leads?
  3. What behaviours would you change in technical leads you have worked for?
  4. Why do you want to be a technical lead?
  5. What scares you most about being a technical lead?
  6. How do you measure success as a technical lead?
  7. What one thing would make your life as a technical lead easier?
  8. What responsibilities are you happy to delegate, and what do you want to control?
  9. How do you plan for your own absence, so you can rest on holiday?
  10. What qualities do you want your team to have, and how do you help them develop those qualities?
  11. How do you deal with conflicts in the team?
  12. How do you deal with external pressures on the team?

Agile papercuts

Nine of diamonds in the rough

I’ve worked on a few projects and I’ve tried many ways to run them. The agile manifesto is a great starting point, and should be in your bookmarks for quick reference. But when you put it into practice, your first thought is the mistakes you made last time, the lessons you learned, so you can do better this time.

So you look at where things failed, and you add a process around it, maybe one of your own, maybe taking inspiration from others.

For example, you had a big problem with release management in your last project, and git flow is a process for release management, so you try it on for size.

But the planning decisions about when to deliver features, and how the support and feature releases work together are not changed, because it was a release management problem, not a planning problem. So you add more process.


Let’s assume you don’t have time for root cause analysis of everything that caused you pain last time. Just assume everything is causing some pain, but for some things the benefit outweighs the cost.

How do you spot what causes more pain than benefit?

Does your process support the people, or sideline them? Is the documentation useful or mothballed as soon as it is delivered? In other words, are you valuing the things on the right more than the things on the left?

As developers, it is a long battle to get over your ego, and the sunk code fallacy, and learn not to be precious about the code you’ve written because the product you’re delivering can always be improved, and if your code is no longer fit for that product, it should be removed. Celebrate the code you no longer have to maintain.

As tech leads, we can be just as precious about our procedures and practices, but they can be even more painful than the code. They’re harder to refactor, measure and test, because people are less predictable than code, but we need to be willing to identify waste and pain points so that we can address them, and remove practices if necessary. Measure where you can, but don’t be afraid to be as ruthless with your process as you are with your code. Anything that doesn’t add value is weighing you down, and even the small papercuts that sting every time are worth removing.

Whatever you think Agile Is, even if you think Agile Is Dead, don’t forget your process is as much a part of the delivery as the code you produce. Own it. Trim it.

Agile Is…

If you’re not agile… you’re a dinosaur

For those of you who came to this blog for my rant on Agile Is Dead, I’d like to recommend these posts from Nathan Gloyn if you want something more actionable.

I can’t say my day job is all agile, but I try to nudge us down the continuum where I can. Process is all about supporting people, rather than vice versa, but documentation is one area where it’s harder to be agile. All my recent projects have had a set of must-have requirements defined in legislation by politicians in Parliament, and it is often clear that implementation was not a primary concern. Legislation appears designed for ambiguity, with the expectation that courts or ministers will clarify the grey areas. Which means more documentation.

Only one project I have worked on recently allowed us some influence on the legislation, because we were working directly with the government department involved, the chief civil servants responsible for the legislation were in our workshops, and the project started before the legislation was complete. It’s a strange, and not entirely unpleasant, feeling to get answers to outstanding questions from a politician reading amendments in Parliament.

NMandelbrot : Clients gaming the system

Mandelbrot set with suspicious lines
There’s a glitch in the data

In any system with clients outside your direct control, you will be subject to Rule 1 of network security : Don’t trust the client. For the Mandelbrot Set, the worst that can happen to the result is that a few pixels go astray, provided the input is properly sanitised to protect the server.
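To make "properly sanitised" concrete, a server might reject any tile result that isn't exactly the shape and range it asked for. The sketch below assumes a hypothetical payload format (a list of rows of integer iteration counts); the function name and format are illustrative, not taken from the NMandelbrot sample code.

```python
from typing import Any, List, Optional


def sanitise_tile_result(payload: Any, width: int, height: int,
                         max_iter: int) -> Optional[List[List[int]]]:
    """Return a clean copy of a client's tile, or None if anything is malformed."""
    if not isinstance(payload, list) or len(payload) != height:
        return None
    clean: List[List[int]] = []
    for row in payload:
        if not isinstance(row, list) or len(row) != width:
            return None
        # bool is a subclass of int in Python, so reject it explicitly.
        if not all(isinstance(v, int) and not isinstance(v, bool)
                   and 0 <= v <= max_iter for v in row):
            return None
        clean.append(list(row))
    return clean
```

This only protects the server from malformed input; a well-formed but wrong answer passes straight through, which is the harder problem the rest of this post is about.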

For more complex calculations, where the data matters, some parties may have an interest in skewing the results. In the SETI@home (Search for Extra-Terrestrial Intelligence) project, for example, participants claimed credit for work not done, or submitted bad data. So some verification of the result is required, either on the server or by sending the same work to multiple clients and getting them to “vote” on the result, which requires a much smarter reduce algorithm than the one in the sample code.
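A minimal sketch of the voting idea, assuming each work unit is sent to several clients and each reply arrives as a `(client_id, value)` pair, where the value is something comparable such as an iteration count or a hash of a rendered tile. `reduce_by_vote` and the default quorum of 2 are hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def reduce_by_vote(results: Sequence[Tuple[str, Hashable]],
                   quorum: int = 2) -> Optional[Hashable]:
    """Accept a value only if enough distinct clients agree on it.

    Returns None when there is no agreement, meaning the work unit
    should be reissued to more clients.
    """
    # One vote per client: a later reply from the same client replaces the earlier one.
    latest = dict(results)
    if not latest:
        return None
    value, votes = Counter(latest.values()).most_common(1)[0]
    # Require both a minimum number of agreeing clients and a strict majority.
    if votes >= quorum and votes * 2 > len(latest):
        return value
    return None
```

Returning None on a split vote, rather than picking a winner, is what makes this smarter than a plain reduce: disagreement becomes a signal to do more work.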

Note that securing the client code (e.g. by obfuscation, or by delivering a non-JS payload to execute the algorithm) is no defence, because there must be a globally accessible service for the clients to talk to in order to get any data back. The channel itself can be secured, provided you don’t rely on the encryption for long, but even with transport security such as an SSL certificate, as soon as the code leaves your server you no longer have any guarantee over it. Depending on the importance and sensitivity of your data, that may or may not be a problem.

Anyone who doesn’t validate all inputs on the server is handing the keys to their attacker*, but when you don’t know what the input should be (otherwise, why do the calculation at all?), you have to find a way to build trust. Maybe each client gets tagged with an ID, not traceable to a user, and the validity of responses from that client is measured over time to give a trust rating. The voting can then be weighted to prefer results from trusted clients, assuming there is a mechanism for a client to lose trust if it is compromised.
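One way to sketch trust-weighted voting, under the assumption that each anonymous client ID carries a score between 0 and 1: each value's vote is the sum of the trust of the clients reporting it, and every vote then nudges trust up for agreeing clients and down for disagreeing ones. `TrustLedger`, the starting trust of 0.5 and the step of 0.1 are all hypothetical choices.

```python
from collections import defaultdict
from typing import Dict, Hashable, Optional, Sequence, Tuple


class TrustLedger:
    """Hypothetical trust scores in [0, 1], keyed by an anonymous client id."""

    def __init__(self, initial: float = 0.5, step: float = 0.1) -> None:
        self.initial = initial  # trust given to a client we have never seen
        self.step = step        # how far one agreement or disagreement moves trust
        self.scores: Dict[str, float] = {}

    def trust(self, client_id: str) -> float:
        return self.scores.get(client_id, self.initial)

    def weighted_vote(
        self, results: Sequence[Tuple[str, Hashable]]
    ) -> Optional[Hashable]:
        """Pick the value with the most trust behind it, then update trust."""
        replies = dict(results)  # one vote per client; later replies win
        if not replies:
            return None
        weight: Dict[Hashable, float] = defaultdict(float)
        for client_id, value in replies.items():
            weight[value] += self.trust(client_id)
        winner = max(weight, key=weight.get)
        # Clients that agreed gain trust; clients that disagreed lose it,
        # which gives a compromised client a way to lose trust over time.
        for client_id, value in replies.items():
            delta = self.step if value == winner else -self.step
            self.scores[client_id] = min(1.0, max(0.0, self.trust(client_id) + delta))
        return winner
```

One caveat with this design: trust is earned by agreeing with the winner, so a group of colluding clients that wins early can reinforce itself. Results the server can check independently help break that loop.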

Maybe the payload includes some hidden data, a known, non-repeating, throwaway result (similar to a 2-factor authentication token) whose only purpose is to validate that the client is responding correctly, but is otherwise indistinguishable from real data.
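The hidden-data idea can be sketched by mixing freshly generated points, whose answers the server computes itself, into each batch of Mandelbrot points. Because every canary is a new random point it never repeats, and to the client it looks like any other pixel. The helper names, the sampling region and `max_iter=100` are assumptions for illustration.

```python
import random
from typing import Dict, List, Tuple


def mandelbrot_iterations(c: complex, max_iter: int = 100) -> int:
    """Escape-time iteration count for point c (max_iter if it never escapes)."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2:
            return n
    return max_iter


def build_batch(real_points: List[complex], rng: random.Random,
                canaries: int = 1,
                max_iter: int = 100) -> Tuple[List[complex], Dict[int, int]]:
    """Shuffle fresh canary points into a batch; remember their expected answers."""
    items = [(p, False) for p in real_points]
    items += [(complex(rng.uniform(-2.0, 0.5), rng.uniform(-1.25, 1.25)), True)
              for _ in range(canaries)]
    rng.shuffle(items)
    batch = [p for p, _ in items]
    # Only the server knows which positions hold canaries.
    expected = {i: mandelbrot_iterations(p, max_iter)
                for i, (p, is_canary) in enumerate(items) if is_canary}
    return batch, expected


def client_passed(answers: List[int], expected: Dict[int, int]) -> bool:
    """True only if every canary in the batch was answered correctly."""
    return all(i < len(answers) and answers[i] == v for i, v in expected.items())
```

A failed canary is a strong hint that the whole batch is unreliable, and it is a natural input to any per-client trust rating.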

There’s no one solution to fit all situations. The server and client cost of the solution will be correlated with the importance of the data, up to the point where the data, even a subset of it and even with protections, is too important to be exposed to untrusted machines.

There are many other client-side attacks and mitigations I have missed, so feel free to add your own suggestions below.

* Note : you can do client-side validation prior to sending to the server for usability reasons, but not for security.
