# Project discussion: Would a systematic approach to data risk classification be helpful?

_Chair: Will Crocombe (RISG Consulting)_

## Prompts

- Risk - how much and what sort? Personal/sensitive, commercial, political, IP...
- Why classify risk and how might it help?
- What type and level of controls might be practicable and proportionate?

## Notes

- Could there be a common language around risk?
- Classify based on the ease of identifiability, plus 'payload' - what would we know about them?
- Proportionate controls
- Tiered. Gatekeepers and access points (where). E.g.
  - 0 - public
  - 1 - anonymised
  - 2 - strong pseudo
  - 3 - weak pseudo
  - 4 - public
- Dropping down tiers, things become easier. Turing paper on this - Sheffield used this as the basis of their system for assessing risk.
  - [Alan Turing Institute paper](https://arxiv.org/pdf/1908.08737.pdf)
- Importance of agreed risk classification with federation, and agreement on risk appetite
  - [NIST RMF](https://csrc.nist.gov/projects/risk-management/about-rmf)
  - [NCSC](https://www.ncsc.gov.uk/collection/risk-management-collection)
  - [Harvard DataTags](https://github.com/IQSS/DataTaggingLibrary)
  - [UK Data Service data types](https://ukdataservice.ac.uk/help/access-policy/types-of-data-access/)
- King's is doing this work, using a classification similar to the Turing paper
- Dundee operate on a blanket tier
- My question was going to be around risk classification: based on my understanding of Goldacre, pseudonymisation should not be relied on. I agree researchers should only be presented with the data required for their project, but the risk of de-anonymisation, particularly when combining datasets, means this should be treated cautiously at best.
- Automation - reduces risk of error
  - [Scottish Open Data](https://www.opendata.nhs.scot/)
  - [HIC RDMP](https://github.com/HicServices/RDMP)
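
As a rough illustration of the "identifiability plus payload" idea and the tiers noted above, here is a minimal Python sketch of how a tier might be assigned automatically. Everything in it is an assumption made for illustration rather than something agreed in the discussion: the names `Identifiability`, `classify` and `classify_combined` are hypothetical, as are the rules that a sensitive payload raises the tier by one and that linking several datasets raises it by one (reflecting the de-anonymisation concern about combining datasets). Tier 4 is assumed to be the most restricted tier.

```python
from enum import IntEnum


class Identifiability(IntEnum):
    """Ease of identifying individuals, using the tier labels from the notes."""
    PUBLIC = 0
    ANONYMISED = 1
    STRONG_PSEUDO = 2
    WEAK_PSEUDO = 3


MAX_TIER = 4  # assumption: 4 is the most restricted tier


def classify(identifiability: Identifiability, sensitive_payload: bool) -> int:
    """Return a tier from identifiability plus payload.

    Assumed rule: start from the identifiability level and move up one tier
    when the payload is sensitive. Public data stays at tier 0.
    """
    tier = int(identifiability)
    if sensitive_payload and identifiability is not Identifiability.PUBLIC:
        tier += 1
    return min(tier, MAX_TIER)


def classify_combined(tiers: list[int]) -> int:
    """Return a tier for several datasets that will be linked together.

    Assumed rule: take the highest individual tier, then move up one tier
    because linkage can make re-identification easier. A single dataset, or
    a set of purely public datasets, is left unchanged.
    """
    if not tiers:
        raise ValueError("at least one dataset tier is required")
    tier = max(tiers)
    if len(tiers) > 1 and tier > 0:
        tier += 1
    return min(tier, MAX_TIER)


if __name__ == "__main__":
    cohort = classify(Identifiability.STRONG_PSEUDO, sensitive_payload=True)
    survey = classify(Identifiability.ANONYMISED, sensitive_payload=False)
    print("cohort tier:", cohort)
    print("survey tier:", survey)
    print("linked tier:", classify_combined([cohort, survey]))
```

Encoding the rules in one place like this is one way to get the consistency benefit mentioned under automation; the actual rules and risk appetite would need to be agreed across the federation.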