Here are the questions I asked during my guided conversation at CodeCraftConf 2019. They are also available on GitHub if you would like to fork and modify them for your own use. Thank you to everyone who came to the discussion. I will post a follow-up to discuss some of the interesting answers.
What is data anyway?
Navigating SQL, NoSQL, JSON and how to work with data in a post-RDBMS, big-data world
- When designing a system, do you start with the data or the code?
- Has the rise of cloud-based or non-relational data stores changed how we model our data?
- Do you need to update your data when the models in the code change? How do you do it?
- Does all your data have to have the same shape?
- Should the data you expose to the outside world broadly match the data at rest?
- How do you secure your data?
- In light of GDPR, how do you ensure you aren’t collecting too much data?
- Who has access to your data?
- Do you know if anyone unauthorised has accessed it?
- How do you protect yourself against bad data and trojan data?
  - Bad data = data that is fake, or is used for real-world attacks
  - Trojan data = data that can compromise your systems or your customers’ systems
- Can your data be used to discriminate?
- Can you prove it?
- Is your data biased?
- Are you recording hidden correlations? (For example, a ZIP code can suggest race.)
- Who owns your data?
- What questions aren’t you asking?
- What makes data big?
- Are you collecting the right data?
- Is the data you’re collecting right?
- Where is your data?
- Do you still have a place for a traditional RDBMS?