Categories
development programming security

NMandelbrot : running arbitrary code on client

As part of my grand plan for map-reduce in JavaScript and zero-install distributed computing, I had to think about how to gain user trust in a security context where we don’t trust the server. I couldn’t come up with a good answer.

Since then, we’ve seen stories of malicious JavaScript installed to mine cryptocurrencies , we know that JavaScript can be exploited to read kernel memory, including passwords, on the client, and I suspect we’ll see a lot more restrictions on what JavaScript is allowed to do – although as the Spectre exploit is fundamentally an array read, it’s going to be a complex fix at multiple levels.

I had ideas about how to sandbox the client JavaScript (I was looking at Python’s virtualenv and Docker containers to isolate code, as well as locking them into service workers which already have a vastly more limited API), but that relies on the browser and OS maintaining separation, and if VMs can’t maintain separation with their levels of isolation, it’s not an easy job for browser developers, or anyone running on their platform.

I also wondered if the clients should be written in a functional language that transpiled to JavaScript, to have language level enforcement of immutability and safety checks. And of course, because a functional style and API provides a simpler context to reason about map-reduce, by avoiding any implicit shared context.

Do you allow someone else’s JavaScript on your site, whether a library, or a tracking script, or random ads from Russia, Korea, botnets and script kiddies? How do you keep your customers safe? And how do you isolate processes you trust from processes that deal with the outside world and users? JavaScript will be more secure in the future, and the research is fascinating (JavaScript Zero: real JavaScript, and zero side-channel attacks) but can you afford to wait?

Meltdown and Spectre shouldn’t change any of this. But now is a good time to think about it. Make 2018 the year you become paranoid about users, 3rd parties and other threats. The year is still young, but the exploits are piling up.

 

Advertisement
Categories
code data development programming security

NMandelbrot : Clients gaming the system

Mandelbrot set with suspicious lines
There’s a glitch in the data

In any system with clients outside your direct control, you will be subject to Rule 1 of network security : Don’t trust the client. For the Mandelbrot Set, the worst that can happen to the result is that a few pixels go astray, provided the input is properly sanitised to protect the server.

For more complex calculations, where the data matters, it may be of interest to some parties to try and skew the results. In the Search for Extra-Terrestrial Intelligence hack, for example, participants were claiming credits for work not done, or submitting bad data, so some verification of the result is required, which can either be done on the server, or by submitting the same work to multiple clients and getting them to “vote” on the result, which requires a much smarter reduce algorithm than is available in the sample code.

Note that securing the client code (e.g. by obfuscation or delivering a non-JS payload to execute the algorithm) is no defence, given that there must be a globally accessible service for the clients to talk to in order to get any data back. The channel itself can be secured, providing you don’t trust the encryption for long, but even with client-side security, such as an SSL certificate, as soon as the code leaves your server, you no longer have any guarantee over it. Given the importance and sensitivity of your data, that may or may not be a problem.

Anyone who doesn’t validate all inputs on the server is handing the keys to their attacker*, but when you don’t know what the input should be (otherwise, why do the calculation at all), you have to find a way to build trust. Maybe each client gets tagged with an id, non-traceable to a user, and the validity of responses from that client can be measured over time to give a trust rating, allowing the voting to be weighted to prefer results from trusted clients, assuming there is a mechanism in place to lose trust if a client is compromised.

Maybe the payload includes some hidden data, a known, non-repeating, throwaway result (similar to a 2-factor authentication token) whose only purpose is to validate that the client is responding correctly, but is otherwise indistinguishable from real data.

There’s no one solution to fit all situations, and the server and client cost of the solution will be correlated with the importance of the data, up to the point where the data, even in a subset, and even with protections, is too important to be opened to untrusted machines.

There are many other client-side attacks or mitigations I have missed, so feel free to add your own suggestions below.

* Note : you can do client-side validation prior to sending to the server for usability reasons, but not for security.

Categories
code development NMandelbrot

The node.js Mandelbrot Set

Animated Mandelbrot Set
Animated Mandelbrot Set

To tie together a few of my previous posts, I wanted to talk about the proof of concept I built in Node.js. I will come back and discuss the outstanding issues in a later post.

The concept

I wanted to try out Node.js as the hot new thing in order to see how I felt about JavaScript as a server-side language, and to think about unit testing of Javascript code, and how to build an application most suited to the idea of a low-latency, single-threaded server.

Given my preference for the Mandelbrot set as a prototype in client-side languages, I wondered how I could develop a Mandelbrot solution that used the server as little as possible, so I hit upon the idea of creating a zero-install grid computing solution, similar to SETI@Home, where every browser that logged on would be computing a small piece of the whole, and the job of the server would be to coordinate the clients, and maintain the shared state of the current progress.

The Implementation

I’m not affiliated with Numberphile, I’m just a fan, but for those of you who don’t know about the mathematics of the Mandelbrot Set, it’s worth having a look at this video to understand it.

The way my implementation works, in order to satisfy the map-reduce behaviour I was looking for, is as follows:

  • Create a grid of points to represent each un-escaped pixel
    • Note, the proof of concept used a fixed grid to ensure an upper bound on the number of points that the server needs to store.
    • The proof of concept used a sparse grid of points here as I originally planned to do a flood-fill of the outer regions, but changed my mind and didn’t refactor.
  • For each point, store the current iteration, value and whether it has stabilised (initially false). These points are indexed on the complex number co-ordinates rather than canvas co-ordinates.
  • When a new client connects, open two connections. The first asks for the currently valid list of points to output to its own canvas, and the second ask for the next bit of work.
  • The server picks an unescaped point at random, and sends it to the client, as well as sending the current list.
  • When the client receives a point to work on, it performs up to 50 iterations on that point. If the point escapes, the client stops and reports the iteration that it escaped on, otherwise, it increments the iteration count by 50 and updates the z values to be the most recently computed for that point. It also renders that point to its own canvas
  • The server receives the value, updates its cache, then sends the next point down to the client.

Of course, there’s a lot more to it than that, but I’ll talk about how I solved some of those issues in future posts.

For now, if you want to jump ahead, and check out the code yourself, grab Node.js (or a Cloud9 or Codio development environment) and grab the NMandelbrot Node.JS sample from Bitbucket.

If anyone wants to fork it to Github, please let me know and I’ll add a link to that here too.

Categories
code development NMandelbrot programming

Thoughts on Node.js

Focus on the flow
Focus on the flow

In my post about developing in the cloud, I started promising a few nuggets about the project I was working on, and following my diversion onto talks and security, I’m ready to start discussing it again. The project itself is fairly straightforward. It was an excuse for me to try a realistic project in Node.js using cloud development tools, to see what was possible, and to decide if I wanted to use Node.js for anything more substantial.

Partly, I wanted to immerse myself in a completely callback-driven, “non-blocking” model, so that I can see how it affected the way I thought about the software I was writing, and also to see if I could see the benefits. I’ll expand on some of these thoughts as I talk more about it, but I wanted to start with my initial impressions.

What Node.JS does well

Firstly, from looking at a test example based on graphing data from this great read about green energy, it was clear to me that the promises of Node.js lie very much in the I/O performance side, rather than anything else. If you’re trying to do anything more complex than throwing as many bytes at the response buffer as possible, Node.js is probably the wrong tool. It’s not built for complex business logic. It is good if your server is a thin layer between a JS front-end and something specialised on the backend (or a series of specialist pipes on the backend – the web services support is top notch, as you’d expect from the language that gave us AJAX and JSON).

One of the reasons I started to enjoy using Node.js was the ability to run the same code on client and server, which meant that I could write tests for my JavaScript logic that would run as unit tests in the server context, and then ship the exact same JavaScript to the client, knowing that the internal logic was correct, even if the performance wouldn’t be. This was greatly helped by the fact that I could use Require.JS to bridge the gap between the client and Node’s extensive NPM package repository, which isn’t as easy to search as the apt-get and NuGet repositories I’m more used to, but is fairly comprehensive, although it does suffer from the apt-get problem of having a lot of packages for doing the same thing, and it can be hard to choose which one to use. There are definitely popular options that bubble to the surface, but I get the feeling I’d need to try a few of them to really start to feel which was the right one to use, especially as a few of them have APIs that feel quite alien to the core libraries, and feel like unthinking ports from other languages, or at least non-callback philosophies.

In the end, I got a proof of concept up over a weekend, which is about as long as I’d normally spend on The Mandelbrot Set, which is a nice quick result, and I got multiple users up on the site, which is more of a testament to Cloud 9 as it is to Node, but the resultant code had fewer features and messier code than the alternatives I’d written in other languages. It certainly felt more as if I was fighting against the flow than in previous incarnations, despite the refactoring tools and examples I had available, and it was a lot harder to keep the program flow in my head, since I had to ensure I understood the context that the callback was operating in : which paths could lead to the current code being executed and what I could rely on being set. I tried to trim the code back as much as possible, but I was still debugging a lot more than I was used to in other incarnations, despite more testing.

What Node.JS does to make you grind your teeth

At this point, I’ve written two proofs-of-concept in Node.JS, and whilst I think the 2 projects I tried weren’t the optimal projects for Node.JS, I’m getting a feel for what is can and cannot do. I can see places where I would use it if I was doing streaming or similar low-latency, high-throughput tasks that were just about routing bits, and I have watched it updating several clients at once, but it’s very easy to write blocking code that will slow the server, and has an instant impact on all clients, making them all hang until the blocking operation is completed. That type of error is not unique to Node.JS, but I found the chain of callbacks increasingly more difficult to reason about, making that type of error more likely. It feels like writing GOTOs.

At this point, I can’t see myself using Node.JS for anything other than a very thin layer to some business logic, and at that point, it seems odd to use a thin layer of web services just to call other services in another language. That’s what I’d do to start migrating out a legacy app, but I wouldn’t start a design that way. If I wanted to build a web service backend, there’s no benefit in Node.JS that I couldn’t get in WebAPI. However, I’m wondering whether my next project should be to re-write the backend in Go, to see if that’s any easier.

Summary

Node.JS is fun to play with as a proof of concept of callback-driven code, but I’ve seen the basic idea of a stripped back, high performance web server delivered far more elegantly in Erlang, Go, and other places. Node.JS throws out threads, and doesn’t offer enough in return to justify the switch for me. It’s fast, but not the fastest, and my productivity is definitely lower in Node.JS than it was in my first program in C#, or Java, or Python, or even my first work with client-side JavaScript. It’s an interesting experiment but I’ve got better things to do with my time than reason about which callback failed to trigger. For the right project, I’d give it another run, but the right projects for Node.JS are very specialised.

Categories
code development NMandelbrot programming

The 4th rule of network security: don’t trust encryption

Following my 3 rules of network security post, I’ve been thinking a lot more about the NSA aspect, and the fact that even if you have managed trust on the client, the server and the network, there’s still another concern, because the number one way of building trust, of saying that machine is who it says it is, of saying I can pass this personal data across an open network safely, is encryption.

Every so often we hear that an encryption method is broken, whether it’s WEP in your WiFi router, Elliptic Curves in the NSA-approved RSA security tool, or Heartbleed in OpenSSL, the only solution is to reset and start again, and hope none of your old data was compromised.

So, don’t store data you can’t afford to lose, unless you really have to (in Europe, if it’s personal data, minimising storage and collection is the law). All your security should be reviewed and hosted regularly. Someone’s full time job should be keeping on top of patches, renewing security credentials, including SSL certificates and passwords, and never chain credentials as a failure in one will lead to a cascade failure of your entire stack. Perfect Forward Security, for example is designed to avoid using the SSL certificate to generate the session key, so that any encrypted stream is not securely dependent on maintaining the privacy of the SSL certificate.

Categories
code development NMandelbrot programming

The three rules of network security

eye
who’s watching you?

I realise for most of the audience, this will be stating the obvious, but I want to cover these rules now so I can refer to them later in the series.

I’m going to do this as a series of 3 posts, so I can refer to each rule separately.

The three rules of network security:

  1. Don’t trust the client;
  2. Don’t trust the server(s);
  3. Don’t trust the network.

In short, don’t trust anything you don’t fully control. I list them separately here since the way we mitigate each is very different.

Troy Hunt covers most of the mitigation strategies and the mechanics of this better than me though, so if you’re interested in this topic, go check him out – Hack Yourself First : http://www.troyhunt.com/2013/05/hack-yourself-first-how-to-go-on.html (or listen to the .Net Rocks podcast – http://www.dotnetrocks.com/default.aspx?showNum=914 )

Don’t trust the client

If you’re running a server, and you don’t validate any user supplied content, please shut down your server now before you put the rest of the Internet at risk. Depending on what you’re processing, that includes any POSTed content, any query string, HTTP headers, the content hosted at any provided URL if you retrieve it, and many other possible inputs.

Even if you trust the content is not harmful to your IT security, you still can’t necessarily trust it. Your survey results will contain untrue data, none of your IE11 users will show up as IE users, and if you’re doing any calculations on the client, they may give the wrong answer due to misguided assumptions (the pixel density of an iPhone just before the retina display was announced) or malice.

One way to adjust for the effects of wrong answers is to aggregate results across many inputs such as the majority voting system employed by the Apollo computers to minimise the effects of computer failure. You can also check for inappropriate behaviour such as a high rate of submissions that indicate gaming or a DoS style attack. There are so many possible attacks that I can’t list them all here.

Don’t trust the server

As a client, you also need to validate what you receive. Any recent browser will sandbox and restrict any code by default and the recent web standards also include well-defined Chinese walls to prevent code from one site intercepting data from another (see, for example, CORS and compare to the old method of JSONP in terms of validation and verification of incoming requests. Of course, you should also be checking that what you are receiving is from the right source (mybank.com instead of mybank.com.some.compromised.server.ru).

In addition though, you also need to trust what the server will do with the data you send it. Will the owners respect your privacy (and remember, if they’re outside the EU, the Data Protection Act does not apply) or will they sell your data? Will they protect your account (by hashing passwords, and only storing what they need, rather than keeping your credit card details on file long after they need them)? If they receive a government request for your data, will they honour it, and will they let you know?

Don’t trust the network

Even if you write both server and client, the data can be changed or lost in the middle. Any public WiFi can be compromised and your traffic intercepted, and there’s only so much HTTP-only and SSL-only cookies can protect you from when your attacker controls your DNS server. Beyond WiFi, agencies such as NSA and GCHQ are watching end points and can intercept some SSL traffic. The padlock is only as secure as the lockmaker. If you’re Google, you can’t even trust your “internal” network between sites. Expect everything that you do not own or you cannot physically trace to be compromised and secure your data and communications appropriately.

Categories
code development NMandelbrot programming

Post PC? Developing in the cloud.

Cloud-less
Is developing in a cloudless environment a old-fashioned throwback?

After my Post PC post, and with an interest in node.js I decided to see if it was now possible to develop a reasonably complex bit of software, with structure and tests, having nothing more than a Web browser installed. I looked at a few options but decided on cloud9 http://c9.io because it has a Chrome app, supports GitHub, BitBucket and Azure, and they are the custodians of the open source Web text editor formally known as Mozilla Skywriter, and all their server code is available on GitHub. They also give you a bash terminal, which makes git and mercurial feel at home far more than on a DOS prompt. As I will be making this code openly available, I have no privacy concerns with using the cloud, but if this was a commercial project, I may have different concerns, although, since Cloud9 is an open source project, I would be able to create a private install so I could use a netbook or tablet to write, compile and run code on a server.

My first view was that this was a pleasant surprise. I think with software like this, it is entirely possible to do web development on the web, with full support for most of what I do in my day job, up to deployment. Writing native software is still a few steps behind, although with projects like PhoneGap Build, there’s not much of the loop left to close.

As a UNIX developer by default (and a Windows developer by day), I found Cloud9 very familiar, and despite a few refresh bugs, I felt very productive, and was able to quickly code, build, unit test, and deploy to a temporary staging environment without having to learn anything new, creating shell scripts to help me out along the way, which was a great bonus as I was learning node.js. Unlike my laptop, it also has auto-save and hibernate, so if my connection fails, I don’t lose my edit, and can easily pick up where I left off.

Compared to my usual workday environment of Visual Studio + CodeRush, there’s a lot of features that I’m used to missing, such as many of the code templates and refactorings, but node.js needs a lot less typing than c#, so it’s less of a problem than it would otherwise be. It’s not a showstopper, but I do feel slightly at a loss when switching between them.

Going on this experience, I would say that the cloud is ready for developers, at least if you’re developing for the web, and you’re developing in the open. The usual caveats about cloud security and potential loss of services apply (keep a local copy of your repo if you want to guarantee you’ll always have it, for example), but the web definitely is now powerful enough to develop for itself, and that makes it a powerful platform. Hat tip to the Cloud9 team, and I’ll tell you more about my project next time.

Categories
code development NMandelbrot programming

The Mandelbrot Set

Animated Mandelbrot Set
Animated Mandelbrot Set

I think every pragmatic programmer or aspiring code guru needs a core programming challenge that they return to whenever they want to try something new, like signature tune a guitarist will play on every new guitar to see how it fits their style.

My favourite pattern is The Mandelbrot Set because it’s a nice way to check the main features of any language : looping, branching, and creating complex structures, as well as adding a graphical level to start looking at the surrounding libraries. It’s also a neat optimisation problem, and each language I’ve used lends itself to slightly different optimisations.

I’ve gone through a few versions, from Basic, to a couple of versions of C++, Python and Javascript, discovering double-buffering when I got my hands on the SDL for gcc, and list comprehensions to do full-screen iterations in Python, and there’s always a new way to calculate and generate the output.

So what’s your workbench? Do you build a unit-testing framework? Or a shopping cart app? Or do you turn every language into a LOLCode parser?