Categories
development security

Flatter Data

I was watching The Verge summary of The Selfish Ledger, Google X’s thought experiment on what your personal data could do in the future. I started to think about Flatland.

Flatland is a book by Edwin A Abbott about dimensions. In the book, A Square lives in a 2D world, with other 2D shapes, and tries to comprehend the universe when 3D shapes start turning up, but A Square can only comprehend them in slices or shadows/projections.

See this video by Carl Sagan if you want to know more.

The personal data organisations see of us is like the circles projected in Flatland. Google sees the videos I like and the technologies I search for help on. HMRC sees my income, savings, and charitable giving. NHS sees my health.

Companies make decisions on this data and, like the flatlanders, generalise from the pink circles they see. Sometimes that accurately reflects the brown circles; oftentimes it does not. Sometimes what looks like 2 circles is a pair of legs, and what looks like one circle is actually a group hug.

I don’t want companies to disambiguate that. I endorse the spirit of GDPR, that data should only be given up with informed consent (absent the usual rights exemptions for criminals who violate the rights of others).

For those of us who work in tech, we need to embrace the ambiguity, and help users and other data subjects understand how they have been categorised. Let them embrace anonymity via randomisation, such as number variance data masking.
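As a rough illustration of number variance masking, here is a minimal Python sketch. It assumes the technique means shifting each numeric value by a random percentage within a fixed band; the function name `mask_with_variance` and the 10% default are my own choices for illustration, not a standard API.

```python
import random


def mask_with_variance(values, max_pct=10.0, seed=None):
    """Return a copy of values, each shifted by a random percentage
    within +/- max_pct. The originals are left untouched."""
    rng = random.Random(seed)  # seed is only here to make demos repeatable
    return [v * (1 + rng.uniform(-max_pct, max_pct) / 100.0) for v in values]


salaries = [30000, 45000, 52000]  # made-up figures
masked = mask_with_variance(salaries, max_pct=10.0)
```

Aggregates such as the mean stay roughly useful for analysis, while any individual figure no longer matches the real person. How wide the band should be is a judgement call for each data set.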

You never own someone else’s data, you merely look after it for as long as they let you. It’s not about privacy. It’s not about data. It’s about trust. It’s about ethics.

Categories
development security

How much data can you lose before you’re in trouble?

Ransomware is a very aggressive attack. Whilst many espionage operations are about sneaking in and copying data without your knowledge, ransomware hits you over the head with a hammer to let you know you’ve lost your data. It’s not theft, it’s extortion.

The big pro is that at least you know you’ve been breached, and the form of attack means that whilst you might not have access to your data, the bad guys might not either.

But you’ve got a good backup strategy, right? You can roll back the data to a known good point in history, and maybe even roll forward your changes from there.

But maybe it doesn’t matter. Maybe you can run your business just as effectively without that data, or those templates. Maybe you shouldn’t be keeping that data at all?

If you have data you need, distribute it. Secure it, but decide if the greater risk is you losing access to the data, or someone else gaining access.

If you have data you don’t need, don’t store it.

Categories
development security

Primer : A tech view of GDPR

I was fortunate enough to attend an event at The Data Lab in Edinburgh today on the new General Data Protection Regulation, coming to the EU and the UK. There were 4 talks from a variety of angles. For me, the key takeaways were that the regulation is primarily about prevention rather than cure, and about auditing and control rather than additional technical implementations, aside from the Data Portability clause.

Best practice still applies. Collect only the minimum data required, and don’t collect personal data unless you have to. Encrypt your data, in transit and at rest. Privacy should be the default, and only extended by informed choice.
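Collecting only the minimum can be enforced at the point of entry. The sketch below is one possible approach, not a GDPR requirement: it drops every submitted field that is not on an explicit allowlist before anything is stored. The field names and the `minimise` helper are hypothetical.

```python
# Hypothetical allowlist: the only fields this imaginary service needs.
REQUIRED_FIELDS = {"email", "display_name"}


def minimise(submitted, required=REQUIRED_FIELDS):
    """Keep only the fields we can justify storing; discard the rest."""
    return {key: value for key, value in submitted.items() if key in required}


form = {
    "email": "a@example.com",
    "display_name": "A",
    "gender": "x",  # not needed, so never stored
    "age": 40,      # not needed, so never stored
}
record = minimise(form)
```

Adding a field then means editing the allowlist, which gives you a natural place to record why the field is needed.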

But you need a data breach policy. An email to Troy Hunt might be OK if it’s a hobby project that was breached, but you need to notify data subjects and users if there is a breach, and you need the security policies and audits to protect you if the lawsuits start flying.

I’m not a lawyer, so I won’t offer advice there. But as you’re designing your systems, now’s the chance to audit, prepare and secure. Don’t be the first high-profile fine under the new rules.

[Images: february 14 2017 at 0237pm, dsc 0437, dsc 0438, dsc 0439]

Categories
data security

Privacy is not your only currency 

If you’re not paying, you’re the product.

But you’re not. In security, we talk about 2-factor authentication, where 2 factor is 2 out of 3 : who you are, what you know, and what you have. Who you are is the product, a subset of a target market for advertising, or a data point in a data collection scoop. The former requires giving up privacy, the latter less so.

Advertising is about segmenting audiences and focusing campaigns, so views and clicks both matter, to feed into demographics and success measures. Ad blocking is a double whammy – no ads displayed, and no data on you. Websites tend to argue that the former deprives them of revenue, many users argue that the latter deprives them of privacy.

What you have is money, and who you are is part of a demographic that can be monetised in order to advertise to you to get your money.

But what else do you have? If you’re on the web you have a CPU that can be used to compute something, whether it’s looking for aliens or looking for cancerous cells. If you’re happy to give up your CPU time.

Who else are you? You might be an influencer. You might be a data point in a freemium model that makes the premium model more valuable (hello, LinkedIn).

What do you know? If you’re a human you know how to read a CAPTCHA (maybe), you could know multiple languages. Maybe you know everything about porpoises and you can tell Wikipedia.

Your worth to a website isn’t always about the money you give them, or the money they can make from selling your data. It’s the way we’ve been trained to think, but there’s so much else we can do for value.

Categories
code data development programming security

Enforcing ethics

I was reading IOT: Code of Ethics for Software Developers and Engineers – Secret Microsoft Communications – Site Home – MSDN Blogs today and it got me thinking about the Botnet of Things, but more importantly, about ethics in Professional Development, as covered in the DunDDD open discussions.

The MSDN blog covers an ethical scenario well, so I don’t want to go over that again, but it got me thinking about something that I’ve been asked to do a few times, that takes the idea one step further.

I’ve been involved in a number of projects that handle sensitive data, particularly data on children, data on prisoners and sensitive financial data, so data protection is key to much of what I have built. In order to illustrate some of the additional ethical considerations when dealing with data, I’m going to discuss a scenario that doesn’t relate to a specific client, but covers many of the decisions that I have had to deal with, and I hope is a scenario familiar to many of you.

The ethical workflow

Consider an accountancy firm, with many clients. As a result of this, time tracking is very important to their business, so that they can bill clients appropriately. The scenario I want to present considers the timesheet software in use. At a basic level, there is a client code, a number of hours per day booked to that client, and an approval system so that the hours are checked following submission, before any invoices are sent out.

In addition, the timesheet software records overtime, and each user’s financial details, so that it can correctly pay each employee each month.

The software solution

The data entry portion validates that as a user, I only have access to a subset of the client codes, that I can only book my contracted hours to standard codes. The workflow ensures that a manager, as someone authorised to check work for a given client code, can authorise my time. The workflow also ensures that invoices cannot be generated until the time has been authorised.

This workflow is similar to many in systems I have designed. There is a validated data entry, which prevents the workflow from starting if the data entered is obviously incorrect, and a workflow that ensures the data is checked before it is used in a process with financial impact.
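The shape of that workflow can be sketched in a few lines of Python. This is an illustration under my own assumptions, not the code of any real system: the user names, client codes, the 7.5 hour contracted day and all function names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    SUBMITTED = "submitted"
    APPROVED = "approved"


@dataclass
class Timesheet:
    user: str
    client_code: str
    hours: float
    status: Status = Status.DRAFT


# Hypothetical reference data.
USER_CLIENT_CODES = {"alice": {"C001", "C002"}}
MANAGER_CLIENT_CODES = {"bob": {"C001"}}
CONTRACTED_HOURS_PER_DAY = 7.5


def submit(ts):
    """Validated data entry: reject obviously wrong data before the workflow starts."""
    if ts.client_code not in USER_CLIENT_CODES.get(ts.user, set()):
        raise PermissionError("user may not book time to this client code")
    if not 0 < ts.hours <= CONTRACTED_HOURS_PER_DAY:
        raise ValueError("hours must be within the contracted day")
    ts.status = Status.SUBMITTED


def approve(ts, manager):
    """Only a manager authorised for the client code can approve submitted time."""
    if ts.status is not Status.SUBMITTED:
        raise ValueError("only submitted timesheets can be approved")
    if ts.client_code not in MANAGER_CLIENT_CODES.get(manager, set()):
        raise PermissionError("manager is not authorised for this client code")
    ts.status = Status.APPROVED


def generate_invoice(ts):
    """Invoices cannot be generated until the time has been authorised."""
    if ts.status is not Status.APPROVED:
        raise ValueError("time has not been authorised")
    return f"Invoice: {ts.hours} hours for client {ts.client_code}"
```

Each step refuses to run unless the previous one has completed, so the financial impact only happens after the check.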

Ethical trapdoors

To truly be an ethical developer, you need to consider both the implicit and the explicit ethical considerations within the requirements, and the behaviour of the less ethical users, who may attempt to subvert the ethical process either due to malice, or laziness, or a myriad of other reasons.

Manager, authorise thyself?

Hopefully, the first potential ethical problem with this workflow is obvious to you : I have yet to mention any restriction on who can authorise a timesheet. Should the user entering the timesheet also be a user authorised to access the client codes on the timesheet, they will be authorising themselves, offering no additional protection.

It might be the case that the user has been given authorisation because they have proven that they maintain high ethical standards, and would therefore be less likely to cook the books. If you believe in people over process, this might lead you to think this way. If, however, time pressures on individuals are such that the authorisation time is limited, there may be scenarios where a user would limit their diligence, increasing the chance of deviation between the recorded and actual figures. There may also be unethical figures who are able to provide the facade of ethical competence in order to get the authorisation required.
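If you prefer process over people, the check is small. A minimal sketch, with hypothetical names throughout: the approver must be authorised for the client code and must not be the person who entered the time.

```python
def can_authorise(approver, submitter, client_code, authorised_codes):
    """Return True only if the approver is a different person from the
    submitter and is authorised for the client code."""
    if approver == submitter:
        return False  # nobody signs off their own timesheet
    return client_code in authorised_codes.get(approver, set())


# Hypothetical authorisations: both users can see client C001.
authorised = {"alice": {"C001"}, "bob": {"C001"}}

can_authorise("alice", "alice", "C001", authorised)  # False: self-approval
can_authorise("bob", "alice", "C001", authorised)    # True
```

This does not stop two users colluding, or a rushed approver rubber-stamping, which is why the limits of the software still need documenting.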

Data leaks

Certain clients will be sensitive, either by means of celebrity, or association with staff, such as ex-husbands. Whilst their records will be recorded with sensitive data, satellite systems, such as invoicing and time tracking, may not be aware of their sensitivity. So, to ensure anonymity and enforce ethics via obscurity, the client codes should never leak information, either directly or indirectly (i.e. direct the user to an external resource that might contain sensitive information that can be exploited), and should only be visible to users with a valid reason to see them.
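One way to keep client codes from leaking anything is to generate them at random, so they are not derived from a name, a date or a sequence. The sketch below assumes that approach; the `C-` prefix, the code length and both function names are hypothetical.

```python
import secrets


def new_client_code(existing):
    """Generate an opaque client code that carries no information
    about the client and does not clash with an existing code."""
    while True:
        code = "C-" + secrets.token_hex(4).upper()
        if code not in existing:
            return code


def visible_codes(user, assignments):
    """Return only the client codes this user has a valid reason to see."""
    return sorted(assignments.get(user, set()))


codes = set()
codes.add(new_client_code(codes))
```

Random codes prevent guessing from one code to the next, and the visibility filter keeps satellite systems from listing clients a user has no business seeing.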

Software supports the business

Ultimately, the software exists because the business needs it. So the ethical decisions sit within those guidelines. The software can’t do everything, so the external processes have to be considered, and questioned where they allow ethical breaches that the software cannot counter. We have a duty to recognise the limits of where our software can enforce ethical behaviour and document these limits so that our customers can adapt or strengthen their processes appropriately. We also have a duty to challenge requirements and requests that violate our ethics, or the ethics our clients declare they follow.

Categories
code data development programming security

Personal Identity in a digital world

In the light of more data breaches, especially highly personal data of the form held by a certain affair website, I wanted to revisit the 4th Rule of network security. If you don’t trust encryption, you store as little as possible, which implies YAGNI for data storage. There are a few other benefits you get as well, in simplifying your storage. In this post I want to focus on the simplest and most obvious personal identifier, your name. And I’ll assume you’ve all read Falsehoods Developers Believe About Names, so if you haven’t yet, read it now.

What’s in a name, or a person?

At its basic level, a name is a poor identifier. Going by Google and some misplaced emails, I know of at least 3 other people in Scotland, 1 in Australia and 2 in USA who share my name. So my name is not sufficient to uniquely identify me, which is why I have a number of tracking references for the NHS, National Insurance (for Americans, think Social Security Number), my employer and others, and why it’s hard to get my name on social networks or email providers unless I get in early.

Do you need to know anyone at all? Is a tracking id enough?

Given that your name is not sufficient to uniquely identify you (and therefore is also harder to verify), is it even necessary? For many sites, and apps, not even a login is necessary or desirable to users, so advertisers, content providers and others often just use a tracking cookie or similar to identify you again the next time they see you.

If they need a login, is a username and password enough?

OK, but your software needs to verify users and store data about them. You’ve got a good authorisation and authentication story to allow users access to their own stuff and no one else’s, without permission.

Do you still need their name? If you are a music site tracking my favourite videos, why do you care who I am? And if you’re built on a shoestring budget, why would you want the hassle of securing passwords when you can grab an OAuth library and not have to worry about it at all? Some providers will give you a name, some won’t. What purpose does storing a name, real or otherwise, serve?

Do you actually need to split names into first and last?

OK, so you actually need someone’s name, how are you going to store it? A name, a first and last name, middle names? If you are splitting a name and then littering your code (or at least your CPU time, if you’re resharing code) with concatenations to display those names, then you may be doing it wrong. And who’s to say that all your users have a first and a last name? (See common misconceptions about names.)
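If you do need a name, one option is to store it exactly as the user typed it, with an optional shorter form they choose themselves. This is a sketch of that idea only; the `Person` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    full_name: str                        # stored as entered, never split
    preferred_name: Optional[str] = None  # what the user wants to be called

    def display_name(self):
        """Use the name the user chose, falling back to the full name."""
        return self.preferred_name or self.full_name
```

There is no first/last split to get wrong and nothing to concatenate at display time, so a single-word name and a five-part name are handled the same way.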

What about titles?

So you need a name, and you are sure you can split it. What else do you need? Do you need a title? Does it matter if your drop-down asks for Mr or Mrs or Miss? And by the way, what about Ms and Dr, and Prof, and can you handle Mx now, or do you discriminate? And by the way, how easy is it for the user to change those fields, as required by law?

What other details are you storing that you don’t need? Addresses? Country? Phone number? Email address? Credit Card number? Mother’s maiden name? Name of first pet?

Why does free train WiFi need my gender and age?

Or any website? And why are those fields mandatory?

As a user, what is my motivation for giving you those details? The more information I give you, the more I have to trust you. And if I don’t trust you, you don’t get my data. If I know why you need it, there’s more chance of you getting my data. If I don’t know why you need it, I will assume you’re selling it, and I may think that even if I know why you need it.

Don’t ask me to trust you, and you won’t be disappointed.

What are you storing because you can, rather than because you need to?

Is your data a security risk, a performance risk, a trust risk? Or can you justify everything, and point to the requirement that details that justification?

If you lost your data to hackers, what would your users be most concerned about being disclosed? Can you stop storing that data?

Categories
development programming

Botnet of things

The Internet of Things is the new hotness. It’s the source of Big Data, it’s the future of clothing and wearables and retail and your kitchen. It’s going to be everywhere. Says the hype. Smart watches. Smart fridges. Smart cars. Smart cities.

Part of me is excited, there’s a lot of possibilities, especially once you start hooking them together either with code, or via services such as if-this-then-that.

Stop for a minute though. Consider that we are talking about a heterogeneous collection of internet-connected devices in your house, on your body, on your commute, gathering a lot of data on you and controlling things around you so you don’t have to.

So what happens when they’re not controlled on your behalf?

These devices have access to:

  • Your WiFi password
  • Your connected services
  • Whatever their sensors pick up (audio, visual, etc)
  • Other devices

Some of them happily connect on unsecured channels.

They are updated according to manufacturer policy (see Android fragmentation and the WebView vulnerability to see how well that works out).

If you accept the 3 rules of network security, and choose not to trust the manufacturer, the cloud services and the network, and want to protect yourself, how do you isolate your threat but still allow the benefits of these devices? How do you isolate the rest of your devices or services if one gets compromised? How do you protect your future data if the services get compromised? How do you protect yourself if your network gets compromised?

Possible solutions:

  • IoT DMZ for WiFi – allow devices to access your WiFi via an authentication key rather than password (similar to one-time passwords for 2FA enabled sites), which only allows them to access an authorised list of sites, and not other nearby devices, managed by your phone/companion app?
  • Direct network connection (Ethernet over power) rather than WiFi
  • Non-personal connection (built-in 4G)
  • local data hub that relays the collected information across your local network to a service you choose
  • Bluetooth, or other close-range set up (or see ChromeCast, which broadcasts an SSID for phone to pick up, then switches to the WiFi you set up)
  • Quick list/disabling of connected services?
  • Token auth rather than password auth
  • Forced updates
  • Non-network updates (my TV allows USB or OTAerial firmware upgrades)
  • Don’t connect your smart device to the network
  • Decide you don’t need internet access on your car, or your fridge.
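The IoT DMZ idea in the list above boils down to a per-device allowlist. Here is a rough Python sketch of the decision a router or hub might make. It is an illustration only: the device name, the hostname and the function are hypothetical, and a real implementation would live in firewall rules rather than application code.

```python
import ipaddress

# Hypothetical per-device allowlist, as a companion app might manage it.
ALLOWED_HOSTS = {
    "thermostat": {"api.example-thermostat.test"},
}


def is_connection_allowed(device, destination):
    """Allow a device to reach only its authorised hostnames.
    Raw IP addresses are refused, which also blocks nearby local devices."""
    try:
        ipaddress.ip_address(destination)
    except ValueError:
        pass  # not an IP literal, so treat it as a hostname
    else:
        return False  # IP literals are never on the allowlist
    return destination in ALLOWED_HOSTS.get(device, set())
```

A device that is compromised can then only talk to the services you have already agreed to, and an unknown device can talk to nothing.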

If you aren’t scared enough yet –

Cybercrime, the security of things : http://www.information-age.com/technology/security/123459847/security-things-iot-and-cybercrime

And don’t forget to patch your car : http://www.wired.com/2015/07/patch-chrysler-vehicle-now-wireless-hacking-technique/

Categories
data development NMandelbrot

Is your CPU time there to be stolen?

Or can it be bartered? If you were given the choice between giving up CPU time, giving up privacy, or giving up money, to reimburse a developer for their time, which would you choose?

If you don’t feed us, do we not starve?

RT @ppinternational: uTorrent client is stealing your CPU cycles to mine #bitcoin http://t.co/g7z3y9AjGe http://t.co/SHpfQuOsdY http://www.engadget.com/2015/03/06/utorrent-bitcoin-miner/

Categories
code data programming ux

#dunddd Analyse This : The dangers of big data

Thanks to everyone who came to my DunDDD talk. Lots of interesting questions, although I’m not a lawyer so couldn’t answer them all.

If you want the slides, with references in the notes, you’ll find them here. All the images are creative commons, and you can use the slides yourself under CC by Attribution. Link to slides : Dunddd Analyse This – The Dangers Of Big Data (Google Drive)

If you missed the talk, the arguments I made and the references, apart from the privacy sections, are in this previous post.

If you want the references for the Personal Data and anonymisation parts, have a look at these :

AOL searches are not private

IBM privacy-preserving data mining