# Project discussion: Would a systematic approach to data risk classification be helpful?

_Chair: Will Crocombe (RISG Consulting)_

## Prompts

- Risk - how much and what sort? Personal/sensitive, commercial, political, IP...
- Why classify risk and how might it help?
- What type and level of controls might be practicable and proportionate?

## Notes

- Could there be a common language around risk?
- Classify based on ease of identification, plus the 'payload': what would we learn about a person once they were identified?
- Proportionate controls - Tiered. Gatekeepers and access points (where). E.g.
  - 0 - public
  - 1 - anonymised
  - 2 - strongly pseudonymised
  - 3 - weakly pseudonymised
  - 4 - identifiable
- Moving down the tiers, things become easier. There is a Turing paper on this; Sheffield used it as the basis of their system for assessing risk.
- [Alan Turing Institute paper](https://arxiv.org/pdf/1908.08737.pdf)
- Importance of an agreed risk classification within a federation, and of agreement on risk appetite
- [NIST RMF](https://csrc.nist.gov/projects/risk-management/about-rmf)
- [NCSC](https://www.ncsc.gov.uk/collection/risk-management-collection)
- [Harvard DataTags](https://github.com/IQSS/DataTaggingLibrary)
- [UK Data Service data types](https://ukdataservice.ac.uk/help/access-policy/types-of-data-access/)
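
The idea of classifying on identifiability plus payload could be sketched roughly as below. This is only an illustration: the `classify` function, the one-tier bump for a sensitive payload, and the cap at tier 4 are assumptions made for the example, not rules taken from the Turing paper or from any institution's scheme.

```python
from enum import IntEnum


class Identifiability(IntEnum):
    """Ease of identification, using the levels named in the notes."""
    PUBLIC = 0
    ANONYMISED = 1
    STRONG_PSEUDO = 2
    WEAK_PSEUDO = 3


MAX_TIER = 4  # assumed top of the scale


def classify(identifiability: Identifiability, sensitive_payload: bool) -> int:
    """Return a risk tier from identifiability and payload (hypothetical rule).

    Assumption: a sensitive payload raises the tier by one, because more
    would be learned about a person if they were identified.
    """
    if identifiability is Identifiability.PUBLIC:
        # Already-public data stays at tier 0 whatever it contains.
        return 0
    bump = 1 if sensitive_payload else 0
    return min(int(identifiability) + bump, MAX_TIER)


if __name__ == "__main__":
    print(classify(Identifiability.STRONG_PSEUDO, sensitive_payload=False))
    print(classify(Identifiability.WEAK_PSEUDO, sensitive_payload=True))
```

A federation would need to agree the actual rule and the controls attached to each tier; the point here is only that a shared, explicit function makes the classification repeatable.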

- King's is doing this work, using a classification similar to the Turing paper
- Dundee operate on a blanket tier

- My question was going to be about risk classification. Based on my understanding of Goldacre, pseudonymisation should not be relied on. I agree researchers should only be presented with the data required for their project, but the risk of de-anonymisation, particularly when combining datasets, means pseudonymised data should be treated cautiously at best.
- Automation - reduces risk of error
- [Scottish Open Data](https://www.opendata.nhs.scot/)
- [HIC RDMP](https://github.com/HicServices/RDMP)
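
The concern above about combining datasets can be illustrated with a small uniqueness check. This is a minimal sketch using invented toy records: `smallest_group` is a hypothetical helper that reports the size of the smallest group of records sharing the same quasi-identifier values (the "k" in k-anonymity), and the field names are assumptions made for the example.

```python
from collections import Counter


def smallest_group(records, quasi_identifiers):
    """Size of the smallest group sharing the same quasi-identifier values."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    return min(Counter(keys).values())


# Invented pseudonymised records, for illustration only.
demographics = [
    {"pid": "a", "age_band": "30-39", "area": "X1"},
    {"pid": "b", "age_band": "30-39", "area": "X1"},
    {"pid": "c", "age_band": "30-39", "area": "X2"},
    {"pid": "d", "age_band": "30-39", "area": "X2"},
]
# A second invented dataset that shares the same pseudonym.
occupations = {"a": "nurse", "b": "teacher", "c": "nurse", "d": "nurse"}

# Linking on the pseudonym adds a column to every record.
linked = [{**r, "occupation": occupations[r["pid"]]} for r in demographics]

if __name__ == "__main__":
    # Before linking, every record shares its values with one other record.
    print(smallest_group(demographics, ["age_band", "area"]))
    # After linking, some records become unique on the combined fields.
    print(smallest_group(linked, ["age_band", "area", "occupation"]))
```

In this toy case the smallest group shrinks from two records to one after linking, which is why a tier assigned to each dataset separately may understate the risk of the combined data.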