RSE TRE Community Quarterly Meeting (March 2023)#

Date: Wednesday 29th March 2023 13:30 - 17:00

Schedule#

| Time | Agenda item | Presenters | Notes |
| --- | --- | --- | --- |
| 13:30 - 13:40 | Introduction | Hari Sood (Turing) | Introduction to the day & community |
| 13:40 - 13:50 | Working group updates | Hari Sood (Turing), Simon Li (Dundee), Fergus McDonald (DARE UK), Martin O’Reilly (Turing) | Updates from WGs on what they’ve done over the last 3 months, and what they’re doing next |
| 13:50 - 14:00 | Community announcements | All | A chance for anyone in the community to share announcements/updates with everybody else |
| 14:00 - 14:30 | NHS SNSDEs | Simone Croft (Sheffield) | |
| 14:30 - 14:55 | Project workshops 1 | Breakout rooms | Room 1: SATRE & Open Source TREs (Simon Li); Room 2: Building a cost recovery model for TREs: How to do it together? (David Sarmiento) |
| 14:55 - 15:00 | Project workshops 1 shareout | Hari Sood (Turing) | |
| 15:00 - 15:25 | Project workshops 2 | Breakout rooms | Room 1: Would a systematic approach to data risk classification be helpful? (Will Crocombe); Room 2: DARE UK & the community (Fergus McDonald) |
| 15:25 - 15:30 | Project workshops 2 shareout | Hari Sood (Turing) | |
| 15:30 - 15:40 | Break & vote on breakout discussions | | |
| 15:40 - 15:45 | Breakout discussion poll | Hari Sood (Turing) | Determining which topics to cover in the breakout discussions |
| 15:45 - 16:10 | Breakout discussions 1 | Breakout rooms | Breakout discussion 1: Data sensitivity; Breakout discussion 2: Network access inside a TRE |
| 16:10 - 16:15 | Breakout discussions 1 shareout | Hari Sood (Turing) | |
| 16:15 - 16:40 | Breakout discussions 2 | Breakout rooms | Breakout discussion 1: TRE monitoring and activity logging; Breakout discussion 2: TRE architectures |
| 16:40 - 16:45 | Breakout discussions 2 shareout | Hari Sood (Turing) | |
| 16:45 - 17:00 | Wrap up & Working Group breakouts | Hari Sood (Turing) | |

Introduction#

Hari Sood from the Alan Turing Institute kicked off proceedings with a reminder that this is an open, self-organised community that is still in its infancy. Everyone is welcome: folk building and running TREs, folk using TREs, folk just interested in them. The group has a strong focus on open source TREs, but anyone involved in non-open TREs is, of course, also welcome. A plea to all: please do suggest things we should be doing together, and share any thoughts you have on how we can best work together.

The Newcastle Commitment is our public statement of intent as a community.

Currently the group has quarterly meetings, three virtual and one in-person / hybrid at the RSE Conference each September. We have also set up four working groups to help us co-ordinate work between the quarterly meetings:

  • Community management & engagement

  • Open source TREs

  • Fundraising & sustainability

  • Information governance

One topic discussed throughout the day was whether these are the working groups we want, and whether any obvious ones are missing. Folk were encouraged (and still are!) to decide which ones they would like to get involved with.

Working Group Updates#

Working group leads gave short updates on current activities.

During the funding & sustainability update, the group was asked for thoughts on how the interest/working group structure could best be run and supported. We’ve broken this out here because it’s a useful question for all the groups to consider. Some thoughts:

  • Are there existing bodies we could plug into or build on (e.g. RDA, IETF)?

  • What’s the best kind of vehicle for building and bringing together the community?

  • Should such a structure oversee the development of community ‘standards’?

Community management#

Hari Sood, Alan Turing Institute

There has been a lot of thinking on how to run the community, how to distribute responsibilities and so on. This is still ongoing. The group acknowledged that currently everyone is doing this in their “spare time”.

The stakeholder mapping exercise is in progress and will be presented at the next event.

Open Source TREs#

Simon Li, University of Dundee

The group launched today!

The group is supported by DARE UK through some of its driver projects, particularly SATRE and TELEPORT. See the later section for a report on the dedicated discussion session.

Funding & sustainability#

Fergus McDonald, DARE UK

The first meeting of the F&S group took place last week and raised a key question for this wider group: what do we think this working group should be about? How do we define the WG’s purpose?

Suggestions offered included:

  • Cost benchmarking across TREs/SDEs

  • Knowledge sharing around charging models (e.g. data access charges).

  • Securing funding with longer time horizons for TRE (software) engineering projects.

  • Open source software maintenance (e.g. OpenStack), perhaps including dedicated community management, allocated engineering time, and baseline ‘tooling’ for collaboration (e.g. Git repositories, messaging platforms).

  • A set of open source components (with a finite number of resourced projects focused on these components).

The group will start off with calls every two to three weeks and see how it goes.

Information governance#

Martin O’Reilly, Alan Turing Institute

It’s been fairly quiet since the last meeting. Around ten people have so far volunteered to get involved, with one of these also willing to co-ordinate. The group is looking forward to building some momentum after this meeting. A Doodle poll will be circulated after today’s workshop to schedule the first WG meeting.

The group expects to work very closely with the Open Source TRE WG on how information governance requirements fit into the standards work they are looking at.

One question that arose:

  • Could the IG WG be reframed as ‘data stewardship’? IG can feel like a very specific set of processes that can be quite limiting and challenging.

Community Announcements#

A couple of general announcements noted:

  • The Research Data Alliance had a “birds of a feather” session on TREs at their recent plenary meeting. Carole Goble (UoM; TRE-FX, ELIXIR) was in the BoF session, with Rob Baxter there by Slack proxy! RDA will be setting up their own TRE working group. Rob volunteered to help set up the WG and will look to ensure it connects with this community and the wider DARE UK endeavour.

  • The DARE UK-funded SATRE project is up and running, working on a reference specification for TREs. A survey will go out soon and the team will follow up with group and / or 1-2-1 calls to have more in-depth conversations. The team is aiming for a first draft of the specification for community comment by June.

NHS Sub-National Secure Data Environments (NHS (SN)SDEs)#

Simone Croft, University of Sheffield

Simone introduced the NHS SDE programme and gave a great overview of its strategy and current status.

A number of policy drivers, from the DHSC’s ‘Data saves lives: reshaping health and social care with data’ to the Goldacre Review, point towards greater use of UK public data for research, but in well-managed environments (TREs). £200M of investment is going towards supporting the broader use of NHS data for research.

NHS England has settled on “Secure Data Environment” rather than “Trusted Research Environment” and has so far taken a two-pronged approach:

  • A National NHS SDE, where people will be able to access essential de-identified data. This will have a national view of data, where people can pull datasets across several regions. It will become the default way to access NHS digital data: no more data downloads! The NHS doesn’t want data to leave the NHS. The National SDE will be a complete end-to-end service.

  • A network of interoperable sub-national SDEs supporting data access at a significant regional scale and combining health & social care outcomes (as well as NHS services, spending, performance etc.). There are 11 proposed SDEs covering all of England:

    • Each covers multiple Integrated Care Systems (ICS) with ~5 million patients.

    • Each is NHS-owned, led by an Integrated Care Provider (ICP) in partnership with local universities.

    • They have been rolled out in two waves (wave 1 sites were already up and running; wave 2 needed more time to get themselves together).

      • Wave 1:

        • London

        • North West

        • Thames Valley and Surrey

        • West Midlands

      • Wave 2:

        • East of England

        • East Midlands

        • Great Western

        • Kent, Medway & Sussex

        • Yorkshire & Humber

        • And more…

Much discussion followed. Please see the full notes.

Project Discussion Sessions#

These four sessions were scheduled in advance with relevant projects invited to pose a problem for the community to chew on.

SATRE & open source TREs#

Chaired by Simon Li, University of Dundee

Prompts#

  • Some groups feel beholden to SDEs at the moment. How much progress can we/should we make “independently”?

  • TREs should not be “one size fits all”. We should aim for a “Goldilocks” approach.

Discussion#

The group discussed the challenges involved in finding the right balance between detail and generality. A high-level architectural specification for a TRE would be a very useful step towards standardisation but is perhaps a move away from the group’s original founding thoughts on open source implementations, which would have more immediate practical application.

Current developers and operators of TREs of various kinds were happy to share their technical implementations with the group.

There was a lot of discussion on how strongly information governance concerns should drive the technical agenda (broad consensus: a lot!). A question on how far TRE developers could learn from other groups’ ISMS documentation for ISO 27001 accreditation caused some debate. Some noted sensitivities around sharing too much detail; others noted that standard operating procedures can be, and are, made public.

Next steps#

Building a cost recovery model for TREs: How to do it together?#

Chaired by David Sarmiento, Alan Turing Institute

Prompts#

  • Should research projects contribute financially to TRE provision?

    • If so, how do we do it fairly and make sure it doesn’t become a barrier to research?

  • What costs should be recovered? Staff time, common infrastructure, analysis costs

  • How do we communicate the need for cost recovery to projects?

Discussion#

The group agreed this was both essential and difficult!

The funding picture for TRE operation is complicated. Data come from the NHS or Government and research project funding typically comes from UKRI, but the operational and infrastructure costs (keeping the lights on and providing business-as-usual support) are often neglected. Service improvement is also often an afterthought.

There is a sensitivity around funding levels. Persuading research PIs that part of their funding ask should be assigned to ‘keeping the lights on’ at the facility they use is difficult in many disciplines.

One reason is that many costs can be hidden from researchers. Cloud costs are straightforward to see, but overhead costs are more complicated: information governance, data management, helpdesk, security administration, and ongoing development and maintenance of software.
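To make the hidden-cost point concrete, here is a minimal sketch of a full-cost charge. It assumes, purely for illustration, that a TRE pools its overheads (the categories listed above) and apportions them to projects in proportion to each project's share of usage. The `Project` fields, the figures and the usage-share rule are hypothetical; this is not the TRESA model or any model presented in the session.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    cloud_cost: float   # directly attributable compute/storage spend (GBP)
    usage_share: float  # fraction of total TRE usage; shares must sum to 1


def full_cost_charges(projects, overheads):
    """Charge = direct cloud cost + usage-weighted share of pooled overheads.

    Hypothetical apportionment rule, used here only for illustration.
    """
    total_overhead = sum(overheads.values())
    if abs(sum(p.usage_share for p in projects) - 1.0) > 1e-9:
        raise ValueError("usage shares must sum to 1")
    return {p.name: p.cloud_cost + p.usage_share * total_overhead for p in projects}


# Hypothetical annual overheads (GBP) that researchers rarely see.
overheads = {
    "information_governance": 40_000,
    "data_management": 30_000,
    "helpdesk": 20_000,
    "security_admin": 25_000,
    "software_maintenance": 35_000,
}
projects = [
    Project("A", cloud_cost=12_000, usage_share=0.5),
    Project("B", cloud_cost=3_000, usage_share=0.25),
    Project("C", cloud_cost=5_000, usage_share=0.25),
]
print(full_cost_charges(projects, overheads))
# {'A': 87000.0, 'B': 40500.0, 'C': 42500.0}
```

Even in this toy example the pooled overheads dwarf the visible cloud bill: project A's full charge (£87,000) is over seven times its cloud spend (£12,000). Apportioning by usage is only one option; a flat per-project fee or charging by security level would be equally plausible starting points.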

Some illustrative examples were discussed:

  • Turing example: the Trusted Research Service Area (TRESA). This was created as a cost centre, with resources and personnel separated between ‘running the production TRE’ and ‘developing the TRE’.

  • Barts Health project. Here there is clear pressure to come up with a sustainability model: what is it going to cost, and why should everyone else be charged except themselves!

Next steps#

  • If funders are looking at national infrastructure then the picture has to be complete and include baseline costs for keeping the lights on. Could the group begin to collect operational models for different kinds of TRE (e.g. at different levels of security)?

Would a systematic approach to data risk classification be helpful?#

Chaired by Will Crocombe, RISG Consulting

Prompts#

  • Risk - how much and what sort? Personal/sensitive, commercial, political, IP…

  • Why classify risk and how might it help?

  • What type and level of controls might be practicable and proportionate?

Discussion#

The group first discussed the feasibility of a common language around risk and the idea of tiered controls matched to levels of risk. In any federated TRE network it would be important to agree a risk classification across the federation, and to be able to match different risk appetites. Several existing approaches were noted:

Further discussion on the safety (or otherwise) of pseudonymisation ensued. The Goldacre Review highlighted the challenges of increasing risk (of re-identifying individuals) through data linkage; how would a classification system respond to this? One reason TREs exist is to provide controls to prevent any wider impacts from potential re-identification from linked data.
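A shared vocabulary for tiered controls could start as simply as an ordered scale. The sketch below assumes a hypothetical five-tier scale and a hypothetical rule that a linked dataset sits at least at the tier of its most sensitive input, optionally raised a step to reflect the increased re-identification risk from linkage. The tier names, the `linkage_uplift` rule and the `can_host` check are illustrative, not an agreed or existing scheme.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical ordered risk tiers; higher values need stronger controls."""
    OPEN = 0
    LOW = 1
    MODERATE = 2
    HIGH = 3
    VERY_HIGH = 4


def linked_tier(input_tiers, linkage_uplift=1):
    """Tier of a linked dataset: its most sensitive input, raised by
    `linkage_uplift` steps (an assumed allowance for re-identification
    risk), capped at the highest tier."""
    if not input_tiers:
        raise ValueError("need at least one input dataset")
    raised = max(input_tiers) + linkage_uplift
    return Tier(min(raised, max(Tier)))


def can_host(tre_max_tier, dataset_tier):
    """A TRE in the federation may host a dataset only if it is (hypothetically)
    accredited up to at least that dataset's tier."""
    return dataset_tier <= tre_max_tier


linked = linked_tier([Tier.LOW, Tier.MODERATE])
print(linked.name)                        # HIGH
print(can_host(Tier.MODERATE, linked))    # False
print(can_host(Tier.VERY_HIGH, linked))   # True
```

Matching different risk appetites across a federation then reduces to each site declaring the highest tier it will accept; the hard part, as the group noted, is agreeing what each tier means.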

Next steps#

  • There was broad agreement that this topic will run and run!

DARE UK & the community#

Chaired by Fergus McDonald, DARE UK

Prompts#

A proposal from DARE UK to model TRE community processes along the lines of the Research Data Alliance:

  • “Interest Groups are open-ended in terms of their longevity. They focus on a broad-based challenge within the programme’s scope (e.g., software tooling to support information governance decision-making, reproducible research methodologies), and they should spawn WGs to address specific pieces of work.”

  • “Working Groups are short-term (<18 months) in terms of their longevity. They focus on a specific, tractable piece of work – be that tools, policy, practices, products, proof of concepts, etc. – within the programme’s scope.”

  • “Communities are open-ended in terms of longevity. They are community-led, established, and managed. They are an open forum or ‘town square’ for the fostering of open communication across that specific community.”

  • A recent Birds of a Feather (BoF) discussion at the Research Data Alliance (RDA) on what are essentially TREs.

Discussion#

Discussion centred on adopting the RDA community processes (themselves modelled on the IETF), with IGs/WGs founded with a “charter”, a one-pager outlining what the group is about. The model is:

  • interest groups, open-ended, no defined length, broad remit, which can (and should) spawn…

  • one or more working groups focussed on specific things, with target deliverables (e.g. an RFC-like document, a PoC, …) and a deadline.

The group discussed good and bad ways to balance this “top-down” approach with “bottom-up” engagement. Being too prescriptive can kill community enthusiasm; being too laissez-faire can lead to drift and irrelevance. Could DARE UK offer some common basic infrastructure services and perhaps a simple code of conduct?

It was also noted that expecting “too much” from community-led efforts without providing tangible support (eg, funding!) is unreasonable!

Next steps#

  • DARE UK to consider support models for the short-term (eg, infra service, code of conduct), and longer term (eg, buy-out funding, travel bursaries, fellowships…)

Breakout Discussions#

The breakout sessions were selected by a live Mentimeter vote from a longlist of a dozen or more 😄

Data sensitivity#

Chaired by Will Crocombe, RISG Consulting

Prompts#

Discussion#

The group continued the discussion around risk tiering and how that might be applied in practice to infrastructure setups, especially in the context of data linkage. There was particular discussion around placing the focus on ‘safe outputs’ from a TRE rather than over-worrying about linked data going into a TRE.
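One common ‘safe outputs’ check is small-cell suppression: before a table of counts leaves the TRE, any count below a minimum threshold is withheld. This is a minimal sketch assuming a threshold of 10, which is illustrative only; real output-checking rules vary between TREs and data owners and usually add further checks not shown here.

```python
def suppress_small_counts(table, threshold=10):
    """Withhold small counts from a table of category -> count before release.

    Every count below `threshold` (including zero, in this sketch) is replaced
    by None. Returns the releasable table and the list of suppressed categories
    so that an output checker can review what was withheld.
    """
    released, suppressed = {}, []
    for category, count in table.items():
        if count < threshold:
            released[category] = None
            suppressed.append(category)
        else:
            released[category] = count
    return released, suppressed


# Hypothetical output table of patient counts by area.
counts = {"area_a": 42, "area_b": 7, "area_c": 15}
released, suppressed = suppress_small_counts(counts)
print(released)    # {'area_a': 42, 'area_b': None, 'area_c': 15}
print(suppressed)  # ['area_b']
```

Suppressing a single cell is not enough if a row total is also released, since the hidden value can then be recovered by subtraction; this is why practical rules typically add secondary suppression.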

Next steps#

  • It was felt that this could usefully become a focus topic for a future meeting.

Network access inside a TRE#

Chaired by Simon Li, University of Dundee

Prompts#

  • What does this mean for TRE operators, users and governance?

Discussion#

Discussion revolved around identifying the need, or otherwise, to allow network access (for researcher users?) from within a TRE. The idea of compartmentalising researchers and other parts of the TRE with limited network proxies was considered. Concerns were raised around secure DNS and the application of AI for anomaly detection.
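Compartmentalising with limited network proxies usually means a default-deny egress policy with a short allowlist, for example package repositories. The sketch below shows only the decision logic such a proxy might apply; the allowlisted hostnames are illustrative assumptions. A real deployment would enforce this at the proxy or firewall and also control DNS resolution, since DNS queries can themselves carry data out.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist for one compartment: package repositories only.
ALLOWED_HOSTS = {"pypi.org", "files.pythonhosted.org", "cran.r-project.org"}


def is_allowed(url, allowed_hosts=ALLOWED_HOSTS):
    """Default-deny egress check: permit HTTPS requests only to allowlisted
    hosts or their subdomains; everything else is refused."""
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.hostname is None:
        return False
    host = parts.hostname.rstrip(".")  # .hostname is already lower-cased
    return any(host == allowed or host.endswith("." + allowed)
               for allowed in allowed_hosts)


print(is_allowed("https://pypi.org/simple/numpy/"))  # True
print(is_allowed("https://pypi.org.evil.example/"))  # False: lookalike host
print(is_allowed("http://pypi.org/simple/"))         # False: not HTTPS
```

Subdomain matching is a convenience here; a stricter compartment would match hostnames exactly.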

Next steps#

  • No immediate next steps identified. Probably a topic to return to.

TRE monitoring & activity logging#

Chaired by Martin O’Reilly, Alan Turing Institute

Prompts#

  • To what extent do people actively monitor / alert on logged activity (vs collecting logs for incident / post-incident management)?

  • What tools do people use for monitoring (e.g. Grafana, Prometheus, Loki, Mimir etc.)?

Discussion#

The group discussed the division between logging for infrastructure health (uncontroversial) and logging user activity and data access (more controversial). The former is routine; the latter is much less so and varies across TREs. Some TREs are arguing for greater logging of user activity, but balancing this against possible privacy concerns is tricky (anonymised logs?).

There was consensus in the group that many TREs collect logs but don’t proactively monitor or alert on them; they nevertheless see value in having log data available for incident management / review and for giving confidence to data owners.

There was also consensus that a more proactive approach to log monitoring is required when identifiable individual level data is accessible, but the group ran out of time to discuss whether it adds value when data are de-identified.
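The gap between collecting logs and proactively alerting on them can be illustrated with a small sketch. It assumes hypothetical access-log records of the form (user, dataset, timestamp), replaces user IDs with keyed HMAC-SHA256 pseudonyms (one possible answer to the ‘anonymised logs?’ question, though strictly pseudonymisation, since the key holder can re-link them), and raises an alert when one pseudonym reads more than a threshold number of distinct datasets within a sliding window. The record format, key handling, threshold and window are illustrative assumptions; in line with the group's view, such a rule might be enabled only where identifiable individual-level data are accessible.

```python
import hashlib
import hmac
from collections import defaultdict, deque
from datetime import datetime, timedelta


def pseudonymise(user_id: str, key: bytes) -> str:
    """Stable keyed pseudonym (HMAC-SHA256, truncated to 12 hex characters).

    Patterns stay visible to reviewers, but only the holder of `key`
    can re-link a pseudonym to a user.
    """
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


def access_alerts(records, key, max_datasets=3, window=timedelta(hours=1)):
    """Yield (pseudonym, timestamp) each time a user has read more than
    `max_datasets` distinct datasets within the sliding `window`.

    `records` holds hypothetical (user_id, dataset, timestamp) tuples,
    assumed to be sorted by timestamp.
    """
    recent = defaultdict(deque)  # pseudonym -> deque of (timestamp, dataset)
    for user_id, dataset, ts in records:
        who = pseudonymise(user_id, key)
        events = recent[who]
        events.append((ts, dataset))
        # Discard this user's events that have fallen out of the window.
        while ts - events[0][0] > window:
            events.popleft()
        if len({d for _, d in events}) > max_datasets:
            yield who, ts


# Example: one user reads five datasets in 20 minutes; another reads one.
key = b"hypothetical-secret-key"  # in practice held only by the TRE operator
t0 = datetime(2023, 3, 29, 13, 30)
records = [("alice", f"dataset_{i}", t0 + timedelta(minutes=5 * i)) for i in range(5)]
records.append(("bob", "dataset_0", t0 + timedelta(minutes=30)))

for who, when in access_alerts(records, key):
    print(f"ALERT: {who} read more than 3 distinct datasets by {when:%H:%M}")
```

In a production TRE this logic would more likely live in the monitoring stack's alerting rules than in standalone code; the sketch just makes the rule explicit.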

Next steps#

  • None identified this time around.

TRE architectures#

Chaired by Rob Baxter, DARE UK

Prompts#

  • Is there a single architectural pattern for TREs?

Discussion#

Discussion centred on the best ways to capture an abstract architecture for TREs, and whether there was a single one or several. It was observed that TREs are principally about network isolation patterns and information governance. There was discussion about using standard enterprise architecture approaches to describing TREs, with an “IG layer” taking the key role of “business layer”.

The group agreed that TRE architectures should be considered holistically within the ‘5 Safes’ framework and that nailing down the “governance-architecture” interface is very important.

There was subsequent discussion on possible standards that could help frame this, with the UK Stats/ONS accreditation for data processors under the Digital Economy Act 2017 (‘DEA Accreditation’) suggested as a good ‘gold standard’ to use.

Next steps#

  • As a community working group, develop an architecture that links governance needs to technical specifications for TREs?

Wrap-Up#

The afternoon was summarised around the question of working groups and next steps. Firstly, the four proposed working groups were agreed as vehicles to take some of the day’s discussions forwards (or at least no-one dissented!):

  • Community management

  • Open Source TREs

  • Funding & Sustainability

  • Information Governance

The community agreed that making efforts to ensure conversations around TREs are joined up is well worth it. We concluded with the note that there are more voices we’d like to see engaged: NHS SDEs, non-health-data folk, international activities, etc.

Volunteers to co-ordinate and contribute were called for! Your community needs you!

This write-up was authored by the wonderful Rob Baxter from DARE UK. For any questions about the above, please get in touch with Hari Sood (hsood@turing.ac.uk).