In the light of more data breaches, especially of highly personal data such as that held by a certain affair website, I wanted to revisit the 4th Rule of network security. If you don’t trust encryption, you store as little as possible, which implies YAGNI for data storage. There are a few other benefits you get as well from simplifying your storage. In this post I want to focus on the simplest and most obvious personal identifier, your name. And I’ll assume you’ve all read Falsehoods Programmers Believe About Names, so if you haven’t yet, read it now.
What’s in a name, or a person?
At its most basic level, a name is a poor identifier. Going by Google and some misplaced emails, I know of at least 3 other people in Scotland, 1 in Australia and 2 in the USA who share my name. So my name is not sufficient to uniquely identify me, which is why I have a number of tracking references for the NHS, National Insurance (for Americans, think Social Security Number), my employer and others, and why it’s hard to get my name on social networks or email providers unless I get in early.
Do you need to know anyone at all? Is a tracking id enough?
Given that your name is not sufficient to uniquely identify you (and is therefore also harder to verify), is it even necessary? For many sites and apps, a login is neither necessary nor desirable to users, so advertisers, content providers and others often just use a tracking cookie or similar to identify you again the next time they see you.
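As a rough sketch of what that looks like, here is a visitor tracked by nothing more than a random token. The names `new_visitor_id`, `record_visit` and the in-memory `visits` dictionary are my own inventions for illustration; a real site would keep the token in a cookie and the counts in a proper store.

```python
import secrets
from typing import Dict, Optional

# Hypothetical in-memory store: opaque token -> number of visits seen.
visits: Dict[str, int] = {}


def new_visitor_id() -> str:
    """Return a random, URL-safe token that says nothing about the person."""
    return secrets.token_urlsafe(16)


def record_visit(visitor_id: Optional[str]) -> str:
    """Count a visit, issuing a fresh token if we don't recognise this one."""
    if visitor_id is None or visitor_id not in visits:
        visitor_id = new_visitor_id()
        visits[visitor_id] = 0
    visits[visitor_id] += 1
    return visitor_id


first = record_visit(None)    # new visitor, no token yet
again = record_visit(first)   # same visitor comes back with their token
```

Nothing in that store is a name, an email address or anything else a user would recognise as theirs, so there is correspondingly less to lose.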
If they need a login, is a username and password enough?
OK, but your software needs to verify users and store data about them. You’ve got a good authorisation and authentication story to allow users access to their own stuff and no one else’s, without permission.
Do you still need their name? If you are a music site tracking my favourite videos, why do you care who I am? And if you’re built on a shoestring budget, why would you want the hassle of securing passwords when you can grab an OAuth library and not have to worry about it at all? Some providers will give you a name, some won’t. What purpose does storing a name, real or otherwise, serve?
Do you actually need to split names into first and last?
OK, so you actually need someone’s name. How are you going to store it? A name, a first and last name, middle names? If you are splitting a name and then littering your code (or at least your CPU time, if you’re reusing code) with concatenations to display those names, then you may be doing it wrong. And who’s to say that all your users have a first and a last name? (See common misconceptions about names.)
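If all you ever do with a name is show it back to the user, one free-text field is about the simplest thing that could work. This is only a sketch: `Account`, `clean_name` and `greeting` are made-up names, and the only processing I’ve assumed is tidying stray whitespace.

```python
from dataclasses import dataclass


def clean_name(raw: str) -> str:
    """Collapse runs of whitespace and trim the ends; change nothing else."""
    return " ".join(raw.split())


@dataclass
class Account:
    account_id: str
    display_name: str  # one field, stored the way the user typed it


def greeting(account: Account) -> str:
    # No first/last concatenation needed: the name is already display-ready.
    return f"Hello, {account.display_name}"


user = Account(account_id="a1", display_name=clean_name("  Mary   Ann  "))
message = greeting(user)
```

A single field copes with one-word names, several family names and name orders you hadn’t thought of, because it never tries to interpret them.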
What about titles?
So you need a name, and you are sure you can split it. What else do you need? Do you need a title? Does it matter if your drop-down asks for Mr or Mrs or Miss? And by the way, what about Ms and Dr, and Prof, and can you handle Mx now, or do you discriminate? And by the way, how easy is it for the user to change those fields, as required by law?
What other details are you storing that you don’t need? Addresses? Country? Phone number? Email address? Credit card number? Mother’s maiden name? Name of first pet?
Why does free train WiFi need my gender and age?
Or any website? And why are those fields mandatory?
As a user, what is my motivation for giving you those details? The more information I give you, the more I have to trust you. And if I don’t trust you, you don’t get my data. If I know why you need it, there’s more chance of you getting my data. If I don’t know why you need it, I will assume you’re selling it, and I may think that even if I know why you need it.
Don’t ask me to trust you, and you won’t be disappointed.
What are you storing because you can, rather than because you need to?
Is your data a security risk, a performance risk, a trust risk? Or can you justify everything, and point to the requirement that details that justification?
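One way to keep yourself honest is to write the justification down next to the field, and drop anything that doesn’t have one before it ever reaches storage. A minimal sketch, where the field names, the reasons and the `minimise` helper are all hypothetical:

```python
from typing import Any, Dict

# Hypothetical allow-list: every stored field names the requirement behind it.
JUSTIFIED_FIELDS: Dict[str, str] = {
    "account_id": "links a login to the user's saved favourites",
    "email": "needed to send password reset links",
}


def minimise(submitted: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the fields we can justify; silently discard the rest."""
    return {k: v for k, v in submitted.items() if k in JUSTIFIED_FIELDS}


form = {
    "account_id": "a1",
    "email": "someone@example.com",
    "gender": "F",
    "age": 42,
}
stored = minimise(form)
```

If a new field can’t be given a one-line reason in that table, that is a fairly strong hint you don’t need it.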
If you lost your data to hackers, what would your users be most concerned about being disclosed? Can you stop storing that data?