# Safety and security of Python and R package import into TREs

## Overview

### Summary

Currently, TREs allow access to PyPI and CRAN for less-sensitive data but only specific packages for more-sensitive data.
However, approaches vary: some TREs have CRAN access while others do not.
Even with controls against malicious Python/R packages, a user could still write equivalent code inside the environment.
It is challenging to draw the line between R and Python files on the one hand and AI/ML models on the other.

Regarding egress, disclosure control is labour-intensive, although there was some discussion of automated tools that could help.

### Next Steps

- Collaborate on a shared allowlist/blocklist for packages

## Raw Notes

- Current TREs allow access to PyPI and CRAN for less-sensitive data but only specific packages for more sensitive data.
- Different people have different experiences: some have no access to CRAN, others do
  - Scottish safe haven - no CRAN access
  - Dundee & GM allow full CRAN access
- CRAN have a fairly strict pipeline for adding packages, so can they be trusted?
  - but perhaps the checks only cover coding standards rather than pen testing, file system access, etc.
- If egress can be locked down sufficiently, does it matter?
  - also need to ensure things like file access, network access, etc. are prohibited
  - can this be done?
- Is there a difference between R & Python files and a large AI/ML model? Not sure there's a clear dividing line between the things we allow and the things we don't
- R has a `system()` function that allows executing arbitrary shell commands
- If there were a malicious Python/R package, a user could just write the same code inside the environment, so preventing access to libraries makes it harder but not impossible to do bad things.
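
As an illustration of what a check beyond coding standards might look like, the sketch below statically scans Python source for imports of modules that give file, process or network access. It is a minimal sketch, not a vetting tool: the `RISKY_MODULES` set and the `find_risky_imports` helper are hypothetical names chosen for this example, and the scan is easy to evade (for example via `importlib` or `eval`), which fits the point above that such controls make misuse harder rather than impossible.

```python
import ast

# Hypothetical list of modules treated as "risky" for this illustration only.
RISKY_MODULES = {"os", "subprocess", "socket", "urllib", "http", "ctypes"}


def find_risky_imports(source: str) -> set[str]:
    """Return the top-level risky modules that the given source imports."""
    found = set()
    # ast.parse only parses the text; it never imports or runs the code.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports such as "from . import x".
            names = [node.module or ""]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level in RISKY_MODULES:
                found.add(top_level)
    return found


example = "import numpy\nimport subprocess\nfrom urllib.request import urlopen\n"
print(sorted(find_risky_imports(example)))  # ['subprocess', 'urllib']
```

A flagged import is not evidence of malice (many legitimate packages read files or spawn processes), so a result like this would at most prioritise packages for human review.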

### Egress

- Disclosure control is labour-intensive
- Some talk of automated tools
- Can prevent accidental disclosure
- What about malicious attempts to extract data, e.g. encrypted, embedded in image files, in binary models, etc.?
- File size potentially helps
  - E.g. plausible to extract small amounts of patient data in an encrypted way that passes disclosure control. But unlikely you could do that with 1000s of records
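
The file-size and encrypted-data points above can be sketched as a simple automated pre-check. This is a minimal sketch under stated assumptions: `MAX_BYTES` and `MAX_ENTROPY` are hypothetical thresholds, and `egress_flags` is a made-up helper name. Byte-level Shannon entropy is used as a rough signal because encrypted data looks close to uniformly random (near 8 bits per byte), but compressed formats such as PNG or ZIP score similarly, so a flag would route a file to a human checker rather than reject it.

```python
import math
from collections import Counter

MAX_BYTES = 1_000_000   # hypothetical size limit for an egress request
MAX_ENTROPY = 7.5       # hypothetical threshold, in bits per byte


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def egress_flags(data: bytes) -> list[str]:
    """Return the reasons (if any) to send this output for manual review."""
    flags = []
    if len(data) > MAX_BYTES:
        flags.append("too_large")
    if shannon_entropy(data) > MAX_ENTROPY:
        flags.append("high_entropy")
    return flags


# Plain-text summary output: few distinct byte values, so low entropy.
summary = b"mean age by region: 41.2, 39.8, 44.1\n" * 50
# Stand-in for encrypted content: every byte value equally frequent.
uniform = bytes(range(256)) * 16

print(egress_flags(summary))  # []
print(egress_flags(uniform))  # ['high_entropy']
```

A check like this does not detect small payloads hidden inside otherwise ordinary files, so it complements rather than replaces disclosure control.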

### Roadmap plan

- Is it possible to lock down a TRE sufficiently to allow unlimited ingress? If so, this is the best solution as there is no friction for researchers. It also allows future ingress items such as LLMs, neural nets, etc.
- If not, can TREs collaborate on an allowlist (and blocklist) of packages so that each one does not need to repeat the work?
  - Central register / co-ordination
  - But what to do about versioning?
- Could have a dual model:
  1. Docker-based containerised TREs that are completely locked down, meaning that any ingress is allowed
  2. TREs with a list of allowed packages that researchers must use, plus a process to request new packages
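
One way to picture the central register and the versioning question above is a lookup keyed by package name and exact version. The sketch below is an assumption about how such a register might work, not an agreed design: the `ALLOWLIST` and `BLOCKLIST` contents and the `check_requirement` helper are hypothetical, and the entries are illustrative rather than recommendations. It approves only exact `name==version` pins, so a new release of an approved package goes back to review.

```python
# Hypothetical shared register: package name -> set of approved versions.
ALLOWLIST = {
    "numpy": {"1.26.4", "2.0.1"},
    "pandas": {"2.2.2"},
}
# Hypothetical blocklist of package names rejected outright.
BLOCKLIST = {"examplemalware"}


def check_requirement(line: str) -> str:
    """Classify one requirement written as 'name==version'."""
    name, sep, version = line.strip().partition("==")
    # Simplification: only lower-cases the name; real package indexes also
    # treat '-', '_' and '.' in names as equivalent.
    name = name.strip().lower()
    version = version.strip()
    if name in BLOCKLIST:
        return "blocked"
    if not sep or not version:
        return "unpinned"       # no exact version, so it cannot be matched
    if version in ALLOWLIST.get(name, set()):
        return "allowed"
    return "needs_review"       # unknown package or unapproved version


for requirement in ["numpy==1.26.4", "numpy==1.0.0", "pandas", "examplemalware==0.1"]:
    print(requirement, "->", check_requirement(requirement))
```

Pinning exact versions keeps the register unambiguous but increases the review workload with every release, which is the trade-off behind the versioning question.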