Categories
cloud development programming

Cloud thinking : Executable documentation.

Documentation is just comments in a separate file. At least developers can see comments when they change the code; documentation in a separate file is easy to miss. Tests are better comments: tests know when they don’t match the code.

Infrastructure is the same. I can write a checklist to set up an environment, and draw an architecture diagram to define it, but as soon as I change something in production, both are out of date, unless they’re so high level that they only provide an outline, not detail.

Unit tests document code, acceptance tests document requirements, code analysis documents style and readability, and desired-state-configuration documents infrastructure. All of them can be checked automatically every time you commit.
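
As a sketch of what that means in practice (in Node-flavoured JavaScript, with a made-up discountFor rule standing in for real code), a unit test states a rule in a form that is re-checked on every commit:

```javascript
// A minimal sketch of a test acting as documentation, using Node's built-in assert.
// The discountFor function and its rule are hypothetical; the point is that this
// "comment" runs on every commit and complains when it drifts from the code.
const assert = require('assert');

// Hypothetical pricing rule under test.
function discountFor(orderTotal) {
  return orderTotal >= 100 ? 0.1 : 0;
}

// The test states the rule in executable form.
assert.strictEqual(discountFor(99), 0, 'orders under 100 get no discount');
assert.strictEqual(discountFor(100), 0.1, 'orders of 100 or more get 10% off');

console.log('documentation still matches the code');
```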

Documentation as code means the documentation is executable. It doesn’t always mean it’s human readable; ARM templates in particular can be impenetrable at times. If machines understand it, the documentation can be tested continuously, repeated endlessly across multiple environments, reconfigured and redeployed at the stroke of a keyboard.

The more humans you have in a process, the more opportunities there are for human error. Automation doesn’t remove mistakes, but it makes it much easier to stop the same mistake happening twice.

Categories
development programming security

NMandelbrot : running arbitrary code on client

As part of my grand plan for map-reduce in JavaScript and zero-install distributed computing, I had to think about how to gain user trust in a security context where we don’t trust the server. I couldn’t come up with a good answer.

Since then, we’ve seen stories of malicious JavaScript installed to mine cryptocurrencies, we know that JavaScript can be exploited to read kernel memory, including passwords, on the client, and I suspect we’ll see a lot more restrictions on what JavaScript is allowed to do – although as the Spectre exploit is fundamentally an array read, it’s going to be a complex fix at multiple levels.

I had ideas about how to sandbox the client JavaScript (I was looking at Python’s virtualenv and Docker containers to isolate code, as well as locking the code into service workers, which already have a vastly more limited API), but that relies on the browser and OS maintaining separation, and if VMs can’t maintain separation with all their levels of isolation, it’s not an easy job for browser developers, or for anyone running on their platform.
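
For what it’s worth, the shape of the isolation I was aiming at looks roughly like this (a minimal sketch; worker.js and the message format are made up for illustration). The untrusted computation runs in a dedicated Worker, with no DOM access, and only talks to the page through postMessage – which is still only as strong as the browser’s worker boundary:

```javascript
// A minimal sketch of pushing distributed-computing work into a dedicated Worker.
// worker.js and the message shapes are made up for illustration; the
// isolation is only as strong as the browser's worker boundary.

// main.js (page context)
const worker = new Worker('worker.js');

worker.onmessage = (event) => {
  // Treat everything coming back as plain data, never as code.
  console.log('work unit finished', event.data);
};

// Send only plain data describing the work unit: no functions, no URLs.
worker.postMessage({ taskId: 42, region: { x: 0, y: 0, width: 256, height: 256 } });

// worker.js: no DOM, no parent scope, just messages in and out.
self.onmessage = (event) => {
  const { taskId, region } = event.data;
  // ...do the actual work unit here (e.g. iterate over the region)...
  self.postMessage({ taskId, result: region.width * region.height });
};
```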

I also wondered if the clients should be written in a functional language that transpiles to JavaScript, to get language-level enforcement of immutability and safety checks, and because a functional style and API provide a simpler context for reasoning about map-reduce, avoiding any implicit shared context.

Do you allow someone else’s JavaScript on your site, whether a library, or a tracking script, or random ads from Russia, Korea, botnets and script kiddies? How do you keep your customers safe? And how do you isolate processes you trust from processes that deal with the outside world and users? JavaScript will be more secure in the future, and the research is fascinating (JavaScript Zero: real JavaScript, and zero side-channel attacks) but can you afford to wait?

Meltdown and Spectre shouldn’t change any of this. But now is a good time to think about it. Make 2018 the year you become paranoid about users, 3rd parties and other threats. The year is still young, but the exploits are piling up.

 

Categories
development programming

Cloud thinking : storage as data structures

We’ve all experienced the performance implications of saving files. We use buffers, and we use them asynchronously.

Azure storage has Blobs. They look like files, and they fail like files if you write a line at a time rather than buffering. But they’re not files, they’re data structures, and you need to understand them as you need to understand O(N) when looking at Arrays and Linked Lists. Are you optimising for inserts, appends or reads, or cost?
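
A minimal sketch of the difference, in plain JavaScript and independent of any particular SDK: collect lines into a buffer and flush them in blocks, rather than paying a network round trip per line. The writeBlock callback is a stand-in for whatever append or commit call your storage client actually exposes.

```javascript
// Minimal sketch: buffer lines and flush them in blocks, rather than
// paying a network round trip per line. writeBlock is a stand-in for
// whatever append/commit call your storage client actually exposes.
class BufferedAppender {
  constructor(writeBlock, maxBytes = 4 * 1024 * 1024) {
    this.writeBlock = writeBlock; // async (Buffer) => Promise
    this.maxBytes = maxBytes;
    this.chunks = [];
    this.size = 0;
  }

  async writeLine(line) {
    const chunk = Buffer.from(line + '\n');
    this.chunks.push(chunk);
    this.size += chunk.length;
    if (this.size >= this.maxBytes) {
      await this.flush();
    }
  }

  async flush() {
    if (this.size === 0) return;
    const block = Buffer.concat(this.chunks);
    this.chunks = [];
    this.size = 0;
    await this.writeBlock(block); // one write per block, not per line
  }
}

// Usage: const log = new BufferedAppender(async (block) => { /* one append call to blob storage */ });
```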

I know it’s tempting to ignore the performance and just blame “the network” but you own your dependencies and you can do better. Understand your tools, and embrace data structures as persistent storage. After all, in the new serverless world, that’s all you’ve got.


Understanding Block Blobs, Append Blobs, and Page Blobs | Microsoft Docs

 

Categories
cloud data development security

Cloud is ephemeral

The Cloud is just someone else’s servers, or a portion thereof. Use the cloud because you want to scale quickly, only pay for what you use, and put someone else, with a global team, on the hook for recovering from outages. You’d also like a safety net, somewhere out there with the data you cannot afford to lose. But whatever is important to you, don’t keep it exclusively somewhere out of your control. Don’t keep your one copy “out there”. Back it up, replicate it. Put your configuration and infrastructure in source control. Distributed. Cloud thinking is about not relying on a machine. Eliminate Single Points of Failure, where you can, although there’s little you can do about a single domain name.

Understand your provider. Don’t let bad UI or configuration lose your data : Slack lost 800,000 messages.

Your cloud provider is a dependency. That makes it your responsibility. Each will give you features you can’t get on your own. They give you an ecosystem you can’t get from your desktop, and a platform to collaborate with others. They give you federated logins, global backups and recovery, content delivery networks, load balancing on a vast scale. But if the worst happens, know how to recover. “It’s in the cloud” is not a disaster recovery strategy, just ask the GitLab customers (although well played to them on their honesty so the rest of us can learn). Have your own backup. And remember, it’s not a backup unless you’ve verified you can restore.

It takes you 60 seconds to deploy to your current provider. How long does it take to deploy if that service goes dark?

Categories
cloud development

Cloud thinking : efficiency as a requirement

In the old world, you bought as big a machine as you could afford, and threw some code at it. If it could fit in memory and the disk I/O wasn’t a bottleneck, everything was golden.

In the cloud, however, CPU cycles and disk storage cost real money. Optimisation is key, so long as it’s not premature. Monitor it.

In cloud thinking, it’s less about O(N); it’s relatively easy to scale to the input, as long as you’re not exponential. In the cloud, it’s about O($) – how well does your code scale to the amount of money you throw at your servers (or, inversely, what’s the cost increase per user)?

Fixed costs are vanishingly small in the cloud, but incremental costs can change quickly, depending on your base platform. Not the provider, as costs between them are racing to the bottom, but the platform of your architecture.

Quite simply, the more control you ask for, the more you pay in time, and the bigger the ramp-up steps. Get metal and you’ll pay for everything you don’t use; get a VM and you can scale more quickly to match demand; get a container, or an Elastic Beanstalk or an Azure website, and give up the OS; get a Lambda or a Function.

I can’t recommend to you how much you should abstract. I don’t know how big your ops team are, or how much computation you need to do for each user. I suspect you might not know either. Stop optimising for things you don’t care about. Optimise for user experience and optimise for cost per user, and measure everything.

Categories
data security

Privacy is not your only currency 

If you’re not paying, you’re the product.

But you’re not. In security, we talk about 2-factor authentication, where the 2 factors are 2 out of 3: who you are, what you know, and what you have. Who you are is the product, a subset of a target market for advertising, or a data point in a data collection scoop. The former requires giving up privacy, the latter less so.

Advertising is about segmenting audiences and focusing campaigns, so views and clicks both matter, to feed into demographics and success measures. Ad blocking is a double whammy – no ads displayed, and no data on you. Websites tend to argue that the former deprives them of revenue, many users argue that the latter deprives them of privacy.

What you have is money, and who you are is part of a demographic that can be monetised in order to advertise to you to get your money.

But what else do you have? If you’re on the web you have a CPU that can be used to compute something, whether it’s looking for aliens or looking for cancerous cells, if you’re happy to give up your CPU time.

Who else are you? You might be an influencer. You might be a data point in a freemium model that makes the premium model more valuable (hello, LinkedIn).

What do you know? If you’re a human you know how to read a CAPTCHA (maybe), you could know multiple languages. Maybe you know everything about porpoises and you can tell Wikipedia.

Your worth to a website isn’t always about the money you give them, or the money they can make from selling your data. It’s the way we’ve been trained to think, but there’s so much else we can do for value.

Categories
development programming

Botnet of things


The Internet of Things is the new hotness. It’s the source of Big Data, it’s the future of clothing and wearables and retail and your kitchen. It’s going to be everywhere. Says the hype. Smart watches. Smart fridges. Smart cars. Smart cities.

Part of me is excited; there are a lot of possibilities, especially once you start hooking them together, either with code or via services such as if-this-then-that.

Stop for a minute though. Consider that we are talking about a heterogeneous collection of internet-connected devices in your house, on your body, on your commute, gathering a lot of data on you and controlling things around you so you don’t have to.

But what happens when they’re not controlled on your behalf?

These devices have access to:

  • Your WiFi password
  • Your connected services
  • Whatever their sensors pick up (audio, visual, etc)
  • Other devices

Some of them happily connect on unsecured channels.

They are updated according to manufacturer policy (see Android fragmentation and the WebView vulnerability to see how well that works out).

If you accept the 3 rules of network security, and choose not to trust the manufacturer, the cloud services and the network, and want to protect yourself, how do you isolate your threat but still allow the benefits of these devices? How do you isolate the rest of your devices or services if one gets compromised? How do you protect your future data if the services get compromised? How do you protect yourself if your network gets compromised?

Possible solutions:

  • IoT DMZ for WiFi – allow devices to access your WiFi via an authentication key rather than a password (similar to one-time passwords for 2FA-enabled sites), which only allows them to access an authorised list of sites, and not other nearby devices, managed by your phone/companion app?
  • Direct network connection (Ethernet over power) rather than WiFi
  • Non-personal connection (built-in 4G)
  • local data hub that relays the collected information across your local network to a service you choose
  • Bluetooth, or other close-range set up (or see ChromeCast, which broadcasts an SSID for phone to pick up, then switches to the WiFi you set up)
  • Quick list/disabling of connected services?
  • Token auth rather than password auth
  • Forced updates
  • Non-network updates (my TV allows firmware upgrades over USB or the aerial)
  • Don’t connect your smart device to the network
  • Decide you don’t need internet access on your car, or your fridge.

If you aren’t scared enough yet –

Cybercrime, the security of things : http://www.information-age.com/technology/security/123459847/security-things-iot-and-cybercrime

And don’t forget to patch your car : http://www.wired.com/2015/07/patch-chrysler-vehicle-now-wireless-hacking-technique/

Categories
code development NMandelbrot programming

Thoughts on Node.js

Focus on the flow

In my post about developing in the cloud, I started promising a few nuggets about the project I was working on, and following my diversion onto talks and security, I’m ready to start discussing it again. The project itself is fairly straightforward. It was an excuse for me to try a realistic project in Node.js using cloud development tools, to see what was possible, and to decide if I wanted to use Node.js for anything more substantial.

Partly, I wanted to immerse myself in a completely callback-driven, “non-blocking” model, so that I could see how it affected the way I thought about the software I was writing, and whether I could see the benefits. I’ll expand on some of these thoughts as I talk more about it, but I wanted to start with my initial impressions.

What Node.JS does well

Firstly, from looking at a test example based on graphing data from this great read about green energy, it was clear to me that the promises of Node.js lie very much in the I/O performance side, rather than anything else. If you’re trying to do anything more complex than throwing as many bytes at the response buffer as possible, Node.js is probably the wrong tool. It’s not built for complex business logic. It is good if your server is a thin layer between a JS front-end and something specialised on the backend (or a series of specialist pipes on the backend – the web services support is top notch, as you’d expect from the language that gave us AJAX and JSON).
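
To illustrate what I mean by a thin layer, here’s a sketch of the kind of server Node.js is happy with: no business logic, just piping bytes from a specialised backend to the client (the backend address is made up for illustration).

```javascript
// Sketch of the kind of thin layer Node.js is happy with: no business
// logic, just piping bytes from a specialised backend to the client.
// The backend address is made up for illustration.
const http = require('http');

http.createServer((req, res) => {
  // Forward the request to the real service and stream its answer back.
  const proxied = http.request(
    { host: 'backend.internal', port: 8080, path: req.url, method: req.method, headers: req.headers },
    (backendRes) => {
      res.writeHead(backendRes.statusCode, backendRes.headers);
      backendRes.pipe(res); // throw the bytes at the response buffer
    }
  );
  req.pipe(proxied);
}).listen(3000);
```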

One of the reasons I started to enjoy using Node.js was the ability to run the same code on client and server, which meant that I could write tests for my JavaScript logic that would run as unit tests in the server context, and then ship the exact same JavaScript to the client, knowing that the internal logic was correct, even if the performance wouldn’t be. This was greatly helped by being able to use Require.JS to bridge the gap between the client and Node’s extensive NPM package repository. NPM isn’t as easy to search as the apt-get and NuGet repositories I’m more used to, but it is fairly comprehensive, although it does suffer from the apt-get problem of having a lot of packages that do the same thing, which can make it hard to choose which one to use. There are definitely popular options that bubble to the surface, but I get the feeling I’d need to try a few of them to really start to feel which was the right one to use, especially as a few of them have APIs that feel quite alien to the core libraries, and feel like unthinking ports from other languages, or at least from non-callback philosophies.
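
The shape of that shared code was roughly this (a simplified sketch, not the project’s actual module, and the escapeTime function is only illustrative): the logic lives in a module that exports itself to whichever environment loads it, so the same file runs under Node’s unit tests and under Require.JS in the browser.

```javascript
// mandel-logic.js — a simplified sketch of sharing logic between Node and
// the browser. The escapeTime function is illustrative, not the real code.
(function (root, factory) {
  if (typeof module === 'object' && module.exports) {
    module.exports = factory();       // Node / unit tests
  } else if (typeof define === 'function' && define.amd) {
    define(factory);                  // Require.JS in the browser
  } else {
    root.mandelLogic = factory();     // plain <script> fallback
  }
}(this, function () {
  // Pure logic only: no DOM, no sockets, so it runs identically everywhere.
  function escapeTime(cx, cy, maxIterations) {
    let x = 0, y = 0, i = 0;
    while (x * x + y * y <= 4 && i < maxIterations) {
      const xTemp = x * x - y * y + cx;
      y = 2 * x * y + cy;
      x = xTemp;
      i++;
    }
    return i;
  }
  return { escapeTime };
}));
```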

In the end, I got a proof of concept up over a weekend, which is about as long as I’d normally spend on The Mandelbrot Set, which is a nice quick result, and I got multiple users up on the site, which is more of a testament to Cloud9 than to Node, but the result had fewer features and messier code than the alternatives I’d written in other languages. It certainly felt more as if I was fighting against the flow than in previous incarnations, despite the refactoring tools and examples I had available, and it was a lot harder to keep the program flow in my head, since I had to ensure I understood the context each callback was operating in: which paths could lead to the current code being executed, and what I could rely on being set. I tried to trim the code back as much as possible, but I was still debugging a lot more than I was used to in other incarnations, despite more testing.

What Node.JS does to make you grind your teeth

At this point, I’ve written two proofs-of-concept in Node.JS, and whilst I think the 2 projects I tried weren’t the optimal projects for Node.JS, I’m getting a feel for what it can and cannot do. I can see places where I would use it if I was doing streaming or similar low-latency, high-throughput tasks that were just about routing bits, and I have watched it updating several clients at once, but it’s very easy to write blocking code that will slow the server, with an instant impact on all clients, making them all hang until the blocking operation completes. That type of error is not unique to Node.JS, but I found the chain of callbacks increasingly difficult to reason about, making that type of error more likely. It feels like writing GOTOs.
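
The failure mode is easy to demonstrate with a contrived sketch (not code from the project): one synchronous loop in a handler, and every connected client waits.

```javascript
// Contrived sketch of the failure mode: one synchronous handler stalls
// the single event loop, so every connected client hangs with it.
const http = require('http');

http.createServer((req, res) => {
  if (req.url === '/block') {
    // Synchronous busy-work: nothing else is served until this returns.
    const end = Date.now() + 5000;
    while (Date.now() < end) { /* spin */ }
    res.end('blocked everyone for 5 seconds\n');
  } else {
    res.end('hello\n');
  }
}).listen(3000);
// While /block is spinning, requests to / also hang: one blocking path
// punishes every client.
```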

At this point, I can’t see myself using Node.JS for anything other than a very thin layer to some business logic, and at that point, it seems odd to use a thin layer of web services just to call other services in another language. That’s what I’d do to start migrating out a legacy app, but I wouldn’t start a design that way. If I wanted to build a web service backend, there’s no benefit in Node.JS that I couldn’t get in WebAPI. However, I’m wondering whether my next project should be to re-write the backend in Go, to see if that’s any easier.

Summary

Node.JS is fun to play with as a proof of concept of callback-driven code, but I’ve seen the basic idea of a stripped back, high performance web server delivered far more elegantly in Erlang, Go, and other places. Node.JS throws out threads, and doesn’t offer enough in return to justify the switch for me. It’s fast, but not the fastest, and my productivity is definitely lower in Node.JS than it was in my first program in C#, or Java, or Python, or even my first work with client-side JavaScript. It’s an interesting experiment but I’ve got better things to do with my time than reason about which callback failed to trigger. For the right project, I’d give it another run, but the right projects for Node.JS are very specialised.

Categories
code development NMandelbrot programming

Post PC? Developing in the cloud.

Is developing in a cloudless environment an old-fashioned throwback?

After my Post PC post, and with an interest in node.js, I decided to see if it was now possible to develop a reasonably complex piece of software, with structure and tests, having nothing more than a web browser installed. I looked at a few options but decided on Cloud9 (http://c9.io) because it has a Chrome app, supports GitHub, BitBucket and Azure, they are the custodians of the open source web text editor formerly known as Mozilla Skywriter, and all their server code is available on GitHub. They also give you a bash terminal, which makes git and mercurial feel far more at home than on a DOS prompt. As I will be making this code openly available, I have no privacy concerns with using the cloud. If this were a commercial project I might have different concerns, although, since Cloud9 is an open source project, I would be able to create a private install, so I could still use a netbook or tablet to write, compile and run code on a server.

My first impression was a pleasant surprise. I think with software like this, it is entirely possible to do web development on the web, with full support for most of what I do in my day job, up to deployment. Writing native software is still a few steps behind, although with projects like PhoneGap Build, there’s not much of the loop left to close.

As a UNIX developer by default (and a Windows developer by day), I found Cloud9 very familiar, and despite a few refresh bugs, I felt very productive. I was able to quickly code, build, unit test, and deploy to a temporary staging environment without having to learn anything new, creating shell scripts to help me out along the way, which was a great bonus as I was learning node.js. Unlike my laptop, it also has auto-save and hibernate, so if my connection fails, I don’t lose my edits, and I can easily pick up where I left off.

Compared to my usual workday environment of Visual Studio + CodeRush, there are a lot of features I’m used to that are missing, such as many of the code templates and refactorings, but node.js needs a lot less typing than C#, so it’s less of a problem than it would otherwise be. It’s not a showstopper, but I do feel slightly at a loss when switching between them.

Going on this experience, I would say that the cloud is ready for developers, at least if you’re developing for the web, and you’re developing in the open. The usual caveats about cloud security and potential loss of services apply (keep a local copy of your repo if you want to guarantee you’ll always have it, for example), but the web definitely is now powerful enough to develop for itself, and that makes it a powerful platform. Hat tip to the Cloud9 team, and I’ll tell you more about my project next time.

Categories
code google programming

The Cloud Promise

Having submitted a talk on html5 and called it “The Language of Cloud Computing” (please vote for it if you are interested), I thought I should take this opportunity to discuss how I see the possibilities of the cloud. I do web development in my day job, and we use some of the technologies that are now discussed as cloud technologies, but this is my perspective as an end user. I will put a warning here: there is a bit of blue-sky thinking in this post.

My original germ of this post came after reading this post by Gary Short: Cloud Computing – It’s that New Old Thing – Gary’s Blog and as I’ve been writing this post, I think that he’s the angel sitting on one shoulder telling you to be careful, to watch out for snake oil and to mind the gap, which makes me the devil on the other shoulder telling you to go on, jump in, the water’s lovely, and what could possibly go wrong? With that in mind, I recommend you read his post and keep it in mind.

There are two distinct shifts I have seen that are classed as “Cloud Computing”. On the server side, machines are becoming virtualised which allows for greater flexibility and a more efficient use of resources (go green rangers). I use VMs all the time, and I think they’re great. Of course, mine aren’t hosted across the world, like Amazon Elastic Compute Cloud (Amazon EC2) or Windows Azure. On the client side, it’s about the trend to save your data “somewhere else” so you don’t need local storage and you can access your data from everywhere. That seems to be the underlying vision of Google Chrome OS and the Apple iPad and it has a certain appeal when you don’t have to worry about backups, running low on disk space, and other maintenance tasks. Why not give it to someone else to worry about? And if that’s not enough, put your photos on Flickr!, edit them on Picnik, and then share them on Facebook, giving a lot more options than you could have on your own, and allowing others to manipulate, improve, and remix your data.

But what if you pick the wrong service? Picnik currently doesn’t work with Windows Live Photos so you miss out on that service unless you upload the photos from your computer. And what happens when you’ve been using Yahoo 360 for blogging and it disappears from the web?

What we really need is some proper data portability. You don’t want “Facebook compatible” or “Works with Flickr”. Google Reader is great because it works with RSS, and there’s a lot of RSS out there to work with. We want something as simple as “Yes, it’s a JPG, I can work with that”, just like I can on my computer. And more than that, we need to be able to move our data from one cloud provider to another.

My ideal model for how “The Cloud” should work, for me, would be that every cloud provider supports standard data transfer protocols, whether it’s ftp, imap for email, or something new that supports open authentication to transfer directly between providers as well as to my local storage. The key is making it easy to synchronise my data. Yes, some of it will be world-accessible, with all the privacy issues that presents, and yes, some providers will be better at some things than others, but making it easy to transfer my data between providers means that I can move around to get the best of all worlds, instead of choosing this place for greater storage, or that one that lets me share, or the other one with better editing.

There are loads of providers that make it easy to get your data in: they support multiple clients, have a nice web uploader, and so on, but it’s often a lot harder to get your data out. The Data Liberation Front is a nice step in the right direction from Google, and I think it’s a good, basic philosophy for cloud providers to follow, but that should be the bare minimum I expect from a provider. If they don’t follow that model, they are not “The Cloud”, they are just silos full of quicksand, and the more you struggle to get your data out, the more it seems to be lost forever.

So, will html5 help with any of this? AJAX gave providers a reason to become more open, to encourage traffic (and adverts) back to their own sites, but with html5 we get, for example:

  • microdata, to help define areas of pages that other sites can understand, such as calendar events and contact details;
  • structural elements, like nav, to help parsers find the interesting data;
  • Cross-document messaging to allow pages from one domain to communicate with pages on another domain, with a security model to prevent XSS attacks (see the sketch after this list); and
  • protocol handler registration to allow a website to declare that it can handle fax: or mailto: links, or jpg or apk files, so I can grab a link or file from one site and send it directly to another site that I trust.
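
Cross-document messaging, for example, is already straightforward to use. A minimal sketch, with placeholder origins, a hypothetical editor iframe and a hypothetical loadImage function: one page posts a JPG’s address, the other checks where the message came from before trusting it.

```javascript
// Minimal sketch of cross-document messaging; the origins are placeholders.
// On https://photos.example, send a JPG's address to an editor in an iframe:
const editorFrame = document.getElementById('editor').contentWindow;
editorFrame.postMessage({ imageUrl: 'https://photos.example/cat.jpg' },
                        'https://editor.example');

// On https://editor.example, accept messages only from the site you trust:
window.addEventListener('message', (event) => {
  if (event.origin !== 'https://photos.example') {
    return; // ignore anything from an unexpected origin
  }
  loadImage(event.data.imageUrl); // hypothetical editor function
});
```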

I don’t know how many cloud providers will start to use these features, but until they do, the cloud is stunted.
