Meltdown and Spectre shouldn’t change any of this. But now is a good time to think about it. Make 2018 the year you become paranoid about users, 3rd parties and other threats. The year is still young, but the exploits are piling up.
For more complex calculations, where the data matters, some parties may have an interest in skewing the results. In the Search for Extra-Terrestrial Intelligence hack, for example, participants claimed credits for work not done, or submitted bad data. Some verification of the result is therefore required, which can either be done on the server, or by submitting the same work to multiple clients and getting them to “vote” on the result, which requires a much smarter reduce algorithm than is available in the sample code.
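As a rough illustration of the “vote” idea, here is a minimal sketch of a majority-vote reduce. It is not the sample code: the function name majorityVote and the { clientId, value } submission shape are assumptions made for this example.

```javascript
// Hypothetical helper: accept a result only if a strict majority of clients agree.
// `submissions` is an array of { clientId, value } objects (an assumed shape).
function majorityVote(submissions) {
  const counts = new Map();
  for (const { value } of submissions) {
    // Compare results by serialised value; this assumes clients send
    // object keys in the same order.
    const key = JSON.stringify(value);
    counts.set(key, (counts.get(key) || 0) + 1);
  }

  let bestKey = null;
  let bestCount = 0;
  for (const [key, count] of counts) {
    if (count > bestCount) {
      bestKey = key;
      bestCount = count;
    }
  }

  // Require more than half of the submissions to agree, otherwise no consensus.
  if (bestCount * 2 <= submissions.length) {
    return { agreed: false, value: null, votes: bestCount };
  }
  return { agreed: true, value: JSON.parse(bestKey), votes: bestCount };
}
```

When there is no consensus, the server could hand the same work to more clients and vote again, at the cost of more duplicated work.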
Note that securing the client code (e.g. by obfuscation or delivering a non-JS payload to execute the algorithm) is no defence, given that there must be a globally accessible service for the clients to talk to in order to get any data back. The channel itself can be secured, providing you don’t trust the encryption for long, but even with client-side security, such as an SSL certificate, as soon as the code leaves your server, you no longer have any guarantee over it. Given the importance and sensitivity of your data, that may or may not be a problem.
Anyone who doesn’t validate all inputs on the server is handing the keys to their attacker*, but when you don’t know what the input should be (otherwise, why do the calculation at all), you have to find a way to build trust. Maybe each client gets tagged with an id, non-traceable to a user, and the validity of responses from that client can be measured over time to give a trust rating, allowing the voting to be weighted to prefer results from trusted clients, assuming there is a mechanism in place to lose trust if a client is compromised.
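A minimal sketch of that trust rating might look like the following. The starting score, the step sizes and every name here (trust, getTrust, recordOutcome, weightedVote) are assumptions chosen for illustration, not a tested policy.

```javascript
// Hypothetical in-memory trust table keyed by an anonymous client id.
const trust = new Map();
const DEFAULT_TRUST = 0.5; // assumed starting score for an unknown client

function getTrust(clientId) {
  return trust.has(clientId) ? trust.get(clientId) : DEFAULT_TRUST;
}

// Raise trust slowly on agreement and halve it on disagreement,
// so a compromised client loses influence quickly.
function recordOutcome(clientId, agreedWithConsensus) {
  const current = getTrust(clientId);
  const next = agreedWithConsensus ? Math.min(1, current + 0.05) : current * 0.5;
  trust.set(clientId, next);
  return next;
}

// Weighted vote: each submission counts for its client's current trust score.
function weightedVote(submissions) {
  const weights = new Map();
  for (const { clientId, value } of submissions) {
    const key = JSON.stringify(value);
    weights.set(key, (weights.get(key) || 0) + getTrust(clientId));
  }

  let bestKey = null;
  let bestWeight = 0;
  for (const [key, weight] of weights) {
    if (weight > bestWeight) {
      bestKey = key;
      bestWeight = weight;
    }
  }
  return bestKey === null ? null : JSON.parse(bestKey);
}
```

The asymmetry is deliberate: trust that is slow to earn and quick to lose limits how much damage a long-standing client can do once it turns bad.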
Maybe the payload includes some hidden data, a known, non-repeating, throwaway result (similar to a 2-factor authentication token) whose only purpose is to validate that the client is responding correctly, but is otherwise indistinguishable from real data.
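One way to sketch that idea: the server occasionally issues work it already knows the answer to, and remembers the expected answer against a random id. The names issueWork, checkResponse and pendingCanaries are hypothetical, and a real version would need to expire old entries.

```javascript
const { randomBytes } = require('crypto');

// Hypothetical store of known-answer ("canary") work items: workId -> expected answer.
const pendingCanaries = new Map();

// Issue a work item. Passing `expectedAnswer` marks it as a canary on the
// server only; the client sees the same shape either way.
function issueWork(payload, expectedAnswer) {
  const workId = randomBytes(16).toString('hex'); // random, non-repeating id
  if (expectedAnswer !== undefined) {
    pendingCanaries.set(workId, expectedAnswer);
  }
  return { workId, payload };
}

// Classify a response as real work, a passed canary or a failed canary.
function checkResponse(workId, answer) {
  if (!pendingCanaries.has(workId)) {
    return 'real'; // not a canary: handle as a normal result
  }
  const expected = pendingCanaries.get(workId);
  pendingCanaries.delete(workId); // throwaway: each canary is checked once only
  return answer === expected ? 'canary-pass' : 'canary-fail';
}
```

A failed canary could feed straight into whatever trust measure is in use for that client.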
There’s no one solution to fit all situations, and the server and client cost of the solution will be correlated with the importance of the data, up to the point where the data, even in a subset, and even with protections, is too important to be opened to untrusted machines.
There are many other client-side attacks or mitigations I have missed, so feel free to add your own suggestions below.
* Note : you can do client-side validation prior to sending to the server for usability reasons, but not for security.
To tie together a few of my previous posts, I wanted to talk about the proof of concept I built in Node.js. I will come back and discuss the outstanding issues in a later post.
Given my preference for the Mandelbrot set as a prototype in client-side languages, I wondered how I could develop a Mandelbrot solution that used the server as little as possible, so I hit upon the idea of creating a zero-install grid computing solution, similar to SETI@Home, where every browser that logged on would be computing a small piece of the whole, and the job of the server would be to coordinate the clients, and maintain the shared state of the current progress.
I’m not affiliated with Numberphile, I’m just a fan, but for those of you who don’t know about the mathematics of the Mandelbrot Set, it’s worth having a look at this video to understand it.
Create a grid of points to represent each un-escaped pixel
Note, the proof of concept used a fixed grid to ensure an upper bound on the number of points that the server needs to store.
The proof of concept used a sparse grid of points here as I originally planned to do a flood-fill of the outer regions, but changed my mind and didn’t refactor.
For each point, store the current iteration, value and whether it has stabilised (initially false). These points are indexed on the complex number co-ordinates rather than canvas co-ordinates.
When a new client connects, open two connections. The first asks for the currently valid list of points to output to its own canvas, and the second asks for the next bit of work.
The server picks an unescaped point at random, and sends it to the client, as well as sending the current list.
When the client receives a point to work on, it performs up to 50 iterations on that point. If the point escapes, the client stops and reports the iteration that it escaped on. Otherwise, it increments the iteration count by 50 and updates the z values to be the most recently computed for that point. It also renders that point to its own canvas.
The server receives the value, updates its cache, then sends the next point down to the client.
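The steps above can be sketched roughly as follows. This is not the proof-of-concept code: the record shape and the names newPoint, iteratePoint, pickUnescaped and storeResult are assumptions, and the single escaped flag stands in for the “stabilised” state described above.

```javascript
const BATCH = 50; // iterations per work unit, as described above

// Hypothetical point record: c = (cRe, cIm) is the fixed co-ordinate,
// z = (zRe, zIm) is the running value, `iteration` counts completed steps.
function newPoint(cRe, cIm) {
  return { cRe, cIm, zRe: 0, zIm: 0, iteration: 0, escaped: false };
}

// Client side: advance one point by up to BATCH iterations of z -> z^2 + c.
function iteratePoint(point) {
  let { zRe, zIm, iteration } = point;
  for (let i = 0; i < BATCH; i++) {
    const nextRe = zRe * zRe - zIm * zIm + point.cRe;
    const nextIm = 2 * zRe * zIm + point.cIm;
    zRe = nextRe;
    zIm = nextIm;
    iteration++;
    if (zRe * zRe + zIm * zIm > 4) {
      // |z| > 2 means the point has escaped: stop and report the iteration.
      return { ...point, zRe, zIm, iteration, escaped: true };
    }
  }
  // Still bounded after this batch: hand back the latest z for the next round.
  return { ...point, zRe, zIm, iteration, escaped: false };
}

// Server side: cache keyed on the complex co-ordinates, not canvas pixels.
const points = new Map();
const keyOf = (p) => `${p.cRe},${p.cIm}`;

// Pick a random point that has not escaped yet, or null if none are left.
function pickUnescaped() {
  const open = [...points.values()].filter((p) => !p.escaped);
  if (open.length === 0) return null;
  return open[Math.floor(Math.random() * open.length)];
}

// Store a client's result, replacing the previous state for that point.
function storeResult(result) {
  points.set(keyOf(result), result);
}
```

Because each result carries its own z values and iteration count, any client can pick up a point where another left off, and the server never has to do the arithmetic itself.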
Of course, there’s a lot more to it than that, but I’ll talk about how I solved some of those issues in future posts.
In my post about developing in the cloud, I started promising a few nuggets about the project I was working on, and following my diversion onto talks and security, I’m ready to start discussing it again. The project itself is fairly straightforward. It was an excuse for me to try a realistic project in Node.js using cloud development tools, to see what was possible, and to decide if I wanted to use Node.js for anything more substantial.
Partly, I wanted to immerse myself in a completely callback-driven, “non-blocking” model, so that I could see how it affected the way I thought about the software I was writing, and whether the benefits were real. I’ll expand on some of these thoughts as I talk more about it, but I wanted to start with my initial impressions.
What Node.JS does well
Firstly, from looking at a test example based on graphing data from this great read about green energy, it was clear to me that the promises of Node.js lie very much in the I/O performance side, rather than anything else. If you’re trying to do anything more complex than throwing as many bytes at the response buffer as possible, Node.js is probably the wrong tool. It’s not built for complex business logic. It is good if your server is a thin layer between a JS front-end and something specialised on the backend (or a series of specialist pipes on the backend – the web services support is top notch, as you’d expect from the language that gave us AJAX and JSON).
In the end, I got a proof of concept up over a weekend, which is about as long as I’d normally spend on The Mandelbrot Set, so that was a nice quick result. I also got multiple users up on the site, which is more of a testament to Cloud 9 than it is to Node. However, the resultant code was messier and had fewer features than the alternatives I’d written in other languages. It certainly felt more as if I was fighting against the flow than in previous incarnations, despite the refactoring tools and examples I had available. It was also a lot harder to keep the program flow in my head, since I had to ensure I understood the context that each callback was operating in : which paths could lead to the current code being executed and what I could rely on being set. I tried to trim the code back as much as possible, but I was still debugging a lot more than I was used to in other incarnations, despite more testing.
What Node.JS does to make you grind your teeth
At this point, I’ve written two proofs-of-concept in Node.JS, and whilst I think the 2 projects I tried weren’t the optimal projects for Node.JS, I’m getting a feel for what it can and cannot do. I can see places where I would use it if I was doing streaming or similar low-latency, high-throughput tasks that were just about routing bits, and I have watched it updating several clients at once. However, it’s very easy to write blocking code that slows the server and has an instant impact on all clients, making them all hang until the blocking operation is completed. That type of error is not unique to Node.JS, but I found the chain of callbacks increasingly difficult to reason about, making that type of error more likely. It feels like writing GOTOs.
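To make the blocking problem concrete, here is a minimal sketch of one common workaround: splitting a long loop into chunks and yielding to the event loop between them with setImmediate. The function sumInChunks is a made-up example, not code from either project.

```javascript
// Hypothetical example: sum a large array without monopolising the event loop.
// `done` is called with the total once every chunk has been processed.
function sumInChunks(values, chunkSize, done) {
  let index = 0;
  let total = 0;

  function step() {
    const end = Math.min(index + chunkSize, values.length);
    for (; index < end; index++) {
      total += values[index];
    }
    if (index < values.length) {
      // Yield so that pending I/O callbacks for other clients can run
      // before the next chunk starts.
      setImmediate(step);
    } else {
      done(total);
    }
  }

  step();
}
```

A plain for loop over the same array would be shorter and easier to read, which is rather the point: the non-blocking version trades clarity for responsiveness, and it is one more callback to keep track of.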
At this point, I can’t see myself using Node.JS for anything other than a very thin layer to some business logic, and at that point, it seems odd to use a thin layer of web services just to call other services in another language. That’s what I’d do to start migrating out a legacy app, but I wouldn’t start a design that way. If I wanted to build a web service backend, there’s no benefit in Node.JS that I couldn’t get in WebAPI. However, I’m wondering whether my next project should be to re-write the backend in Go, to see if that’s any easier.
Following my 3 rules of network security post, I’ve been thinking a lot more about the NSA aspect, and the fact that even if you have managed trust on the client, the server and the network, there’s still another concern, because the number one way of building trust, of saying that machine is who it says it is, of saying I can pass this personal data across an open network safely, is encryption.
Every so often we hear that an encryption method is broken, whether it’s WEP in your WiFi router, Elliptic Curves in the NSA-approved RSA security tool, or Heartbleed in OpenSSL. Each time, the only solution is to reset and start again, and hope none of your old data was compromised.
So, don’t store data you can’t afford to lose, unless you really have to (in Europe, if it’s personal data, minimising storage and collection is the law). All your security should be reviewed and tested regularly. Someone’s full time job should be keeping on top of patches and renewing security credentials, including SSL certificates and passwords. Never chain credentials, as a failure in one will lead to a cascade failure of your entire stack. Perfect Forward Secrecy, for example, is designed to avoid using the SSL certificate’s private key to generate the session key, so that the security of any encrypted stream does not depend solely on keeping that key private.
If you’re running a server, and you don’t validate any user supplied content, please shut down your server now before you put the rest of the Internet at risk. Depending on what you’re processing, that includes any POSTed content, any query string, HTTP headers, the content hosted at any provided URL if you retrieve it, and many other possible inputs.
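As a minimal sketch of what validating a POSTed body can look like, assume a hypothetical survey endpoint that accepts only an age and a comment. The field names and limits are invented for this example; the point is the whitelist approach of rejecting anything not explicitly expected.

```javascript
// Hypothetical whitelist of fields the endpoint accepts.
const ALLOWED_KEYS = ['age', 'comment'];

// Validate a parsed JSON body; returns { ok, errors } rather than throwing.
function validateSurvey(body) {
  if (typeof body !== 'object' || body === null || Array.isArray(body)) {
    return { ok: false, errors: ['body must be a JSON object'] };
  }

  const errors = [];
  for (const key of Object.keys(body)) {
    if (!ALLOWED_KEYS.includes(key)) {
      errors.push(`unexpected field: ${key}`);
    }
  }
  // Check type and range; a numeric string such as "42" is rejected.
  if (!Number.isInteger(body.age) || body.age < 0 || body.age > 150) {
    errors.push('age must be an integer between 0 and 150');
  }
  if (typeof body.comment !== 'string' || body.comment.length > 500) {
    errors.push('comment must be a string of at most 500 characters');
  }
  return { ok: errors.length === 0, errors };
}
```

Passing this check only means the input is well-formed. The comment still has to be escaped or parameterised wherever it is used later.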
Even if you trust the content is not harmful to your IT security, you still can’t necessarily trust it. Your survey results will contain untrue data, none of your IE11 users will show up as IE users, and if you’re doing any calculations on the client, they may give the wrong answer due to misguided assumptions (the pixel density of an iPhone just before the retina display was announced) or malice.
One way to adjust for the effects of wrong answers is to aggregate results across many inputs such as the majority voting system employed by the Apollo computers to minimise the effects of computer failure. You can also check for inappropriate behaviour such as a high rate of submissions that indicate gaming or a DoS style attack. There are so many possible attacks that I can’t list them all here.
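Checking the rate of submissions can be sketched with a simple sliding window per client. The name createRateChecker and its parameters are assumptions for this example, and it keeps its history in memory, so it would not survive a restart or work across several servers.

```javascript
// Hypothetical sliding-window check: allow at most `maxPerWindow`
// submissions per client within any `windowMs` milliseconds.
function createRateChecker(maxPerWindow, windowMs) {
  const history = new Map(); // clientId -> timestamps of recent submissions

  return function isAllowed(clientId, now = Date.now()) {
    // Keep only the timestamps that still fall inside the window.
    const recent = (history.get(clientId) || []).filter((t) => now - t < windowMs);
    if (recent.length >= maxPerWindow) {
      history.set(clientId, recent);
      return false; // too many submissions: likely gaming or a DoS attempt
    }
    recent.push(now);
    history.set(clientId, recent);
    return true;
  };
}
```

The optional now argument is only there so the behaviour can be exercised with fixed timestamps instead of the real clock.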
Don’t trust the server
As a client, you also need to validate what you receive. Any recent browser will sandbox and restrict any code by default, and the recent web standards also include well-defined Chinese walls to prevent code from one site intercepting data from another (see, for example, CORS, and compare it to the old method of JSONP in terms of validation and verification of incoming requests). Of course, you should also be checking that what you are receiving is from the right source (mybank.com instead of mybank.com.some.compromised.server.ru).
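The source check is easy to get wrong with string matching, because the compromised address above starts with “mybank.com” too. A minimal sketch, using the standard URL parser and a hypothetical helper called isExpectedHost:

```javascript
// Hypothetical helper: accept a URL only if it is HTTPS and its parsed
// hostname matches the expected host exactly.
function isExpectedHost(rawUrl, expectedHost) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch (err) {
    return false; // not a parseable absolute URL
  }
  return url.protocol === 'https:' && url.hostname === expectedHost;
}
```

Comparing the parsed hostname, rather than checking what the string starts with or contains, is what rejects look-alike prefixes and addresses that hide the real host after an @ sign.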
In addition though, you also need to trust what the server will do with the data you send it. Will the owners respect your privacy (and remember, if they’re outside the EU, the Data Protection Act does not apply) or will they sell your data? Will they protect your account (by hashing passwords, and only storing what they need, rather than keeping your credit card details on file long after they need them)? If they receive a government request for your data, will they honour it, and will they let you know?
Don’t trust the network
Even if you write both server and client, the data can be changed or lost in the middle. Any public WiFi can be compromised and your traffic intercepted, and there’s only so much HTTP-only and SSL-only cookies can protect you from when your attacker controls your DNS server. Beyond WiFi, agencies such as NSA and GCHQ are watching end points and can intercept some SSL traffic. The padlock is only as secure as the lockmaker. If you’re Google, you can’t even trust your “internal” network between sites. Expect everything that you do not own or you cannot physically trace to be compromised and secure your data and communications appropriately.
After my Post PC post, and with an interest in node.js, I decided to see if it was now possible to develop a reasonably complex bit of software, with structure and tests, having nothing more than a Web browser installed. I looked at a few options but decided on cloud9 http://c9.io because it has a Chrome app, supports GitHub, BitBucket and Azure, and they are the custodians of the open source Web text editor formerly known as Mozilla Skywriter, and all their server code is available on GitHub. They also give you a bash terminal, which makes git and mercurial feel at home far more than on a DOS prompt. As I will be making this code openly available, I have no privacy concerns with using the cloud. If this was a commercial project, I might have different concerns, although, since Cloud9 is an open source project, I would be able to create a private install so I could use a netbook or tablet to write, compile and run code on a server.
My first view was that this was a pleasant surprise. I think with software like this, it is entirely possible to do web development on the web, with full support for most of what I do in my day job, up to deployment. Writing native software is still a few steps behind, although with projects like PhoneGap Build, there’s not much of the loop left to close.
As a UNIX developer by default (and a Windows developer by day), I found Cloud9 very familiar, and despite a few refresh bugs, I felt very productive, and was able to quickly code, build, unit test, and deploy to a temporary staging environment without having to learn anything new, creating shell scripts to help me out along the way, which was a great bonus as I was learning node.js. Unlike my laptop, it also has auto-save and hibernate, so if my connection fails, I don’t lose my edit, and can easily pick up where I left off.
Compared to my usual workday environment of Visual Studio + CodeRush, a lot of the features that I’m used to are missing, such as many of the code templates and refactorings, but node.js needs a lot less typing than C#, so it’s less of a problem than it would otherwise be. It’s not a showstopper, but I do feel slightly at a loss when switching between them.
Going on this experience, I would say that the cloud is ready for developers, at least if you’re developing for the web, and you’re developing in the open. The usual caveats about cloud security and potential loss of services apply (keep a local copy of your repo if you want to guarantee you’ll always have it, for example), but the web definitely is now powerful enough to develop for itself, and that makes it a powerful platform. Hat tip to the Cloud9 team, and I’ll tell you more about my project next time.
I think every pragmatic programmer or aspiring code guru needs a core programming challenge that they return to whenever they want to try something new, like the signature tune a guitarist will play on every new guitar to see how it fits their style.
My favourite pattern is The Mandelbrot Set because it’s a nice way to check the main features of any language : looping, branching, and creating complex structures, as well as adding a graphical level to start looking at the surrounding libraries. It’s also a neat optimisation problem, and each language I’ve used lends itself to slightly different optimisations.
So what’s your workbench? Do you build a unit-testing framework? Or a shopping cart app? Or do you turn every language into a LOLCode parser?