Controlled data transfer and output review
The Data Review and Transfer Component, or DRTC, is an AWS-developed solution for reviewing, approving, automating, and auditing sensitive data transfer requests into and out of secure environments such as Trusted Research Environments. It is designed to help organizations control how data, code, results, and other research artefacts move across the TRE boundary. This capability is becoming increasingly important as trusted research models mature.
In practice, DRTC turns data movement into a governed workflow. Incoming datasets can be placed into a staging or quarantine location, where automated checks and/or manual review can take place before data is moved into the data platform. Outgoing results can follow a similar pattern, with outputs reviewed and approved before release.
This is all managed through a web-based portal, where administrators can configure the review process for each storage location. DRTC supports a two-stage approval workflow with a first reviewer and optional additional reviewers. These reviewers can be individual users or groups, such as a data governance committee, subject matter experts, or an independent review group to support segregation of duties.
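The review flow described above can be pictured as a small state machine. The following Python sketch is illustrative only: the class, state names, and reviewer identifiers are hypothetical and do not reflect DRTC's actual data model or API. It assumes a request starts in quarantine, needs the first reviewer's approval, and then needs approval from every configured additional reviewer before release.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    QUARANTINED = "quarantined"        # staged, awaiting first review
    FIRST_APPROVED = "first_approved"  # awaiting additional reviewers
    RELEASED = "released"
    REJECTED = "rejected"


@dataclass
class TransferRequest:
    """Hypothetical transfer request; not DRTC's real data model."""
    artefact: str
    first_reviewer: str
    additional_reviewers: frozenset = frozenset()
    state: State = State.QUARANTINED
    approvals: set = field(default_factory=set)
    audit_log: list = field(default_factory=list)

    def __post_init__(self):
        # Segregation of duties: the first reviewer cannot also act
        # as an additional reviewer on the same request.
        if self.first_reviewer in self.additional_reviewers:
            raise ValueError("first reviewer must be independent")

    def _check_reviewer(self, reviewer):
        """Raise unless this reviewer may act in the current state."""
        if self.state is State.QUARANTINED:
            allowed = reviewer == self.first_reviewer
        elif self.state is State.FIRST_APPROVED:
            allowed = reviewer in self.additional_reviewers
        else:
            raise ValueError("request is already closed")
        if not allowed:
            raise PermissionError(f"{reviewer} cannot review at this stage")

    def approve(self, reviewer):
        self._check_reviewer(reviewer)
        if self.state is State.QUARANTINED:
            # Skip the second stage when no additional reviewers are set.
            self.state = (State.FIRST_APPROVED if self.additional_reviewers
                          else State.RELEASED)
        else:
            self.approvals.add(reviewer)
            if self.approvals >= self.additional_reviewers:
                self.state = State.RELEASED
        self.audit_log.append((reviewer, "approve", self.state.value))

    def reject(self, reviewer):
        self._check_reviewer(reviewer)
        self.state = State.REJECTED
        self.audit_log.append((reviewer, "reject", self.state.value))


# Example: an output request reviewed by one person, then a committee.
request = TransferRequest(
    artefact="results.csv",
    first_reviewer="alice",
    additional_reviewers=frozenset({"governance-committee"}),
)
request.approve("alice")
request.approve("governance-committee")
```

Every decision is appended to an audit log alongside the resulting state, which mirrors the auditing goal of the workflow; a real deployment would rely on the portal's own records rather than an in-memory list.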
The governed data platform
Behind the transfer boundary sits the governed data platform. This is where sensitive datasets are stored, curated, catalogued, prepared, and made available for approved research projects.
This layer should not be treated as a generic storage bucket. It is one of the most important design areas in the TRE because it determines how usable, reusable, governable, and scalable the research environment becomes.
The right design depends heavily on the research domain, data types, regulatory context, analytical methods, and collaboration model. A genomics platform, a clinical research environment, a defense research dataset, an industrial R&D environment, and a social science data platform will not have identical requirements.
That is why this layer should typically start with a data platform assessment. The assessment should clarify:

| Assessment area | Questions to answer |
| --- | --- |
| Data sources | Where does data come from, who owns it, and how is it ingested? |
| Data sensitivity | What classifications apply, and what restrictions follow from them? |
| Data structure | Is the data tabular, imaging, genomic, sensor-based, unstructured, or multi-modal? |
| Data preparation | What validation, transformation, pseudonymization, de-identification, or curation steps are required? |
| Access model | Which projects, roles, and users should access which datasets? |
| Metadata and discovery | How will researchers find approved data without overexposing sensitive assets? |
| Lineage and reproducibility | How will datasets, transformations, queries, and outputs be traced? |
| Lifecycle and retention | How long should data be retained, archived, or removed? |
| Analytics needs | Are researchers using SQL, notebooks, HPC, ML, generative AI, statistical tools, or domain-specific applications? |

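One way to make the assessment actionable is to record the answers per dataset in a structured form. The sketch below is a hypothetical Python record, not a schema from any AWS service: the field names, the classification label, and the retention rule are assumptions chosen to mirror the data sensitivity, access model, and lifecycle questions.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical per-dataset summary of the assessment answers."""
    name: str
    owner: str                     # data sources: who owns the dataset
    classification: str            # data sensitivity (label is an assumption)
    structure: str                 # e.g. "tabular", "imaging", "genomic"
    approved_projects: frozenset   # access model: projects allowed to read
    ingested_on: date
    retention_days: int            # lifecycle: how long to keep the data

    def can_access(self, project: str) -> bool:
        # Deny by default: only explicitly approved projects get access.
        return project in self.approved_projects

    def retention_expired(self, today: date) -> bool:
        # True once the retention window after ingestion has passed.
        return today > self.ingested_on + timedelta(days=self.retention_days)


# Example record with invented values, for illustration only.
record = DatasetRecord(
    name="cohort-2024",
    owner="clinical-data-office",
    classification="restricted",
    structure="tabular",
    approved_projects=frozenset({"project-a"}),
    ingested_on=date(2024, 1, 1),
    retention_days=365,
)
```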
Based on this data platform assessment, the architecture can be tailored to the research domain, governance model, and analytics needs. In some cases, for example, Amazon SageMaker Unified Studio can provide a governed data and AI environment, bringing together Amazon EMR, AWS Glue, Amazon Athena, Amazon Redshift, Amazon Bedrock, and SageMaker AI.
Researchers can work with notebooks, SQL, Python, machine learning workflows, and natural language queries, connecting to data in Amazon S3, AWS Glue Data Catalog, Amazon Athena, and Amazon Redshift. Amazon S3 provides the storage foundation, with lakehouse architecture, data cataloguing, access control, data product ownership, and scalable processing added where needed.
Secure research workspaces with RES
Researchers interact with the TRE through the Virtual Research Environment. This is where approved users access compute, software, and project data inside defined boundaries. The goal is to provide a practical working environment while keeping sensitive data inside the controlled TRE architecture.
Research and Engineering Studio on AWS, or RES, can support this workspace layer. RES is an AWS-supported, open-source solution that provides a web portal for scientists and engineers to run technical computing workloads on AWS. Users can launch secure Windows or Linux virtual desktops, use existing corporate credentials, and work in individual or collaborative projects. Administrators can define project spaces, assign software stacks, attach shared file systems, monitor usage, and set project budgets to help control consumption.
RES also includes other controls that are useful in trusted research settings, such as identity integration with SAML 2.0 or OIDC, desktop sharing profiles, restricted file browser access, controlled SSH access, custom permission profiles, and private VPC deployment patterns with VPC endpoints. These controls help limit data movement and reduce the risk of data leaving the research boundary.