Minisymposium
MS5D - Trusted Research Environments for Supercomputing in Health and Social Sciences
Live streaming
Session Chair
Description
This minisymposium will bring together those working to enable HPC facilities to be used effectively and efficiently with sensitive, confidential and personally identifiable data. Outside national security institutions, and particularly in academic settings, HPC environments have traditionally been built to security standards appropriate to moderately sensitive data in the physical sciences and engineering. Researchers are increasingly using large-scale computational methods to address questions in health and social science that require personally identifiable data obtained from governmental authorities. To maintain public consent for the use of such approaches (“data saves lives”), it is essential that an appropriate security environment is maintained. In these fields, secure environments are typically isolated from the internet, have no in-out copy-paste facilities, impose a significant information governance overhead on actions as basic to the modern data scientist as pip install, and offer limited access to the compute at scale needed for large digital twins or AI training and inference. Streaming data, continuously arriving from devices such as smart or personal equipment, is particularly challenging. Worldwide efforts are now focused on creating productive, secure environments at scale, called Trusted Research Environments (TREs), which support the programming-language-based approaches of modern data scientists and mathematical modellers.
Presentations
Trusted Research Environments (TREs) are secure computing environments suitable for the processing and analysis of special category data. The Standard Architecture for Trusted Research Environments (SATRE) is a UK community-driven specification for how TREs, also known as Secure Data Environments or Secure Processing Environments, should be operated. The project followed open-collaboration principles from the very start, holding regular public events through which 60 organisations from across the UK shared their views, ultimately leading to the publication of the SATRE specification in October 2023. Because it was driven by the community and incorporates requirements defined by public members, SATRE has gained widespread support across multiple disciplines and stakeholders in the UK. SATRE has also been adopted within the EOSC-ENTRUST federation of TREs. SATRE therefore provides a high-level framework with which HPC implementations can comply, reducing the need to design completely bespoke HPC systems for sensitive data research. Applying the specification aligns HPC with TRE operators’ requirements, thereby allowing resources to be scaled out within existing governance frameworks. Here, we will describe the SATRE specification, its application within the UK ecosystem, and its relevance to HPC resources nationally and internationally.
Trusted Research Environments depend on an information governance framework, business operations and technical components to ensure they provide a service that operates securely and mitigates barriers for researchers. Architecting in the cloud starts with establishing appropriate security controls and identity and access configuration. The application components for the TRE can then be deployed into this foundation with confidence. This separation of concerns means that specialists within the organisation can own their area; for example, research software engineers and research infrastructure engineers can focus on their own area while working more closely with researchers and data stewards. With these foundations in place, the cloud allows you to extend TRE capability by providing access to additional services. These can include elastic clusters, ML/AI services and workflow automation to deliver on-demand, high-level capability for researchers.
The use of data from the health and social care system poses a complex set of technical and governance challenges around its secure and legal transfer to, and subsequent processing in, academic supercomputing facilities. Through cross-institutional collaboration between UCL's Advanced Research Computing (ARC) Centre and UCLH NHS Foundation Trust, we have developed robust routes for the extraction, anonymisation and transfer of medical imaging from the hospital's imaging systems to trusted locations in the university for downstream computational use in research and education. The ongoing success of this work has given us insights, which we will present, into the importance of aligning ways of working, technologies and platforms between the research IT functions in academia and the health sector.
Scientific field stations generate vast amounts of data, ranging from environmental measurements to operational logistics. However, the lack of structured data governance and integration strategies often prevents these datasets from reaching their full potential, limiting their impact on research and decision-making. At Universidad Católica, we are developing a data platform designed to streamline the management, sharing, and utilization of scientific and operational data across the Regional Field Stations Network (RCER). This platform aims to provide researchers with reliable, well-organized, and accessible data by implementing robust governance models, standardized storage structures, and seamless cloud integration. By bridging scientific and operational data streams, we enhance reproducibility, data-driven decision-making, and interdisciplinary collaboration. A key component of this effort is the development of local data management strategies that ensure field stations can efficiently capture and process their data before integrating with the broader system. In this presentation, we will discuss the challenges of scientific data governance in distributed field environments, the methodologies we are employing to create a scalable and sustainable data infrastructure, and the expected impact on research output. Through this initiative, we aim to empower scientists by unlocking the full potential of the data they generate, ultimately advancing research across multiple disciplines.