The pharmaceutical industry generates a great deal of identifiable data (such as clinical trial and patient engagement data) that has guardrails around “use and access.” Data captured for the intended purpose described in a study protocol is referred to as “primary use.” Once anonymized, however, this data can be used to draw other inferences in what we can collectively define as secondary analyses. Secondary data refers to data used for a purpose that differs from the one for which it was originally collected. These data sets can come from various sources, as illustrated in Figure 1.
Secondary analysis is of growing importance to pharmaceutical companies as the industry shifts toward precision medicine and patient-centricity. Patient-generated health data offers a new avenue through which pharmaceutical companies can derive additional insights into disease and treatment patterns.
The uses of this data are varied, ranging from creating enriched patient cohorts to power clinical development efforts to identifying populations for quantifying treatment effectiveness and associated value outcomes. Pharmaceutical companies hold years of rich data sets on patient characteristics collected over time (such as from clinical trials and omics studies), making these assets especially valuable to tap into.
Data for secondary analysis can be obtained from internal sources or purchased from external, third-party data aggregators and vendors. In fact, there has been marked growth in the number of vendors selling longitudinal patient data. In this evolving landscape, the challenge for life sciences organizations is to seamlessly and reliably integrate the vast array of distributed, complex and heterogeneous data sources.
Current methodologies fall short in providing adequate mechanisms for large-scale data aggregation that simultaneously meet stringent security and confidentiality requirements. Further, several factors add to the difficulty of effectively managing and harnessing this wealth of information, including organizational silos; diversity of data sources (ranging from genetic and behavioral to clinical data, and each necessitating distinct processing methods); and the lack of a common identifier for data integration.
To counter these hurdles, life sciences companies are seeking technological interventions to bring scalability and security to the data they gather for secondary analysis. We call these interventions “privacy preservation” techniques. Privacy preservation encompasses a spectrum of techniques, grounded in two fundamental principles: the provision of mathematical assurances of privacy and the prevention of reverse engineering of row-level data and insights. Long established in the adtech domain, these principles are now gaining momentum in the life sciences and healthtech data provider ecosystem, where collaborations and access to high-quality data are pivotal in the pursuit of targeted therapies.
These techniques are frequently employed in collaboration with the use of a data clean room. Data clean rooms are trusted research spaces designed to enable internal and external stakeholders to collaborate and implement privacy-preservation techniques safely. They serve as controlled and secure virtual environments where diverse data sources can be seamlessly stitched together and analyzed by multiple collaborating and analytical parties, all within a framework of robust security measures. These environments are particularly vital for life sciences organizations that share sensitive patient data for research and analytics with a diverse set of internal and external stakeholders.
In addition, differential privacy and tokenization are two privacy-preservation strategies that can be employed in a data clean room. Differential privacy lets you programmatically anonymize identifiable elements in query results, while tokenization stitches together disparate data sets by providing common tokens (or identifiers) based on statistical linkages for each patient.
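Tokenization can be illustrated with a minimal sketch: both parties derive the same non-reversible token from normalized patient identifiers, so their records can be joined without exposing the identifiers themselves. The function name, shared salt and fields below are hypothetical, and real tokenization providers such as Datavant use far more sophisticated matching; this only conveys the idea.

```python
import hashlib
import hmac

# Shared, collaboration-specific secret agreed on out of band (hypothetical value).
SHARED_SALT = b"collaboration-specific-secret"

def make_token(patient_identifiers: dict) -> str:
    """Derive a deterministic, non-reversible token from normalized identifiers.

    Both parties apply the same normalization and keyed hash, so records for
    the same patient map to the same token and can be joined without exposing
    the underlying identifiers.
    """
    normalized = "|".join(
        str(patient_identifiers[key]).strip().lower()
        for key in sorted(patient_identifiers)
    )
    return hmac.new(SHARED_SALT, normalized.encode(), hashlib.sha256).hexdigest()

# Each party tokenizes its own records before bringing data into the clean room.
pharma_record = {"first_name": "Ada", "last_name": "Lovelace", "dob": "1990-01-01"}
vendor_record = {"first_name": "ADA", "last_name": " Lovelace", "dob": "1990-01-01"}

assert make_token(pharma_record) == make_token(vendor_record)  # same common token key
```

Because the hash is keyed with a collaboration-specific secret, the resulting tokens are meaningless outside that collaboration, which limits the impact if they ever leak.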
To learn more, read our ebook 3 Steps to Building an Effective Data Clean Room.
With the help of clean rooms, life sciences companies can gain several competitive advantages:
The solution: Snowflake Data Clean Rooms
Samooha, newly acquired by Snowflake, is a platform that provides data clean rooms as a first-party native application, built entirely on Snowflake’s architecture. In addition to performing the role of a traditional clean room, Samooha also allows life sciences organizations to run privacy-preserving analytics and AI/ML workloads.
Samooha offers a dual-mode experience: a no-code web application as well as a developer edition for advanced analytics and ML/AI use cases. It also provides differential privacy capabilities, which allow users to interrogate data containing identifiable elements without having those elements exposed in query results. It serves as a platform on which an entire developer ecosystem can build its own applications anchored in secure data collaboration, in addition to performing privacy-preserving analytics.
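Differential privacy, in turn, works by adding calibrated noise to aggregate answers so that no individual patient's presence can be inferred from a query result. The sketch below shows the classic Laplace mechanism applied to a count query; the epsilon value and helper function are illustrative assumptions, not Samooha's actual implementation, which also has to manage a privacy budget across all of a collaborator's queries.

```python
import numpy as np

def dp_count(true_count: int, epsilon: float = 1.0, sensitivity: float = 1.0) -> float:
    """Return a differentially private count via the Laplace mechanism.

    Noise is drawn with scale sensitivity / epsilon: a smaller epsilon means
    stronger privacy and a noisier answer.
    """
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# A collaborator asks: how many overlapping patients received treatment X?
true_overlap = 1423                                  # computed inside the clean room
print(round(dp_count(true_overlap, epsilon=0.5)))    # only the noisy answer is released
```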
Samooha’s partnership with Datavant also makes it a solution particularly well suited to the life sciences industry. Datavant provides the first layer of privacy-preserving technology through its tokenization process, while Samooha ensures no data leakage during collaboration. This is critical even with privacy-preserving tokenization, as other patient attributes could be exposed and used to identify a patient if the data were joined outside of a data clean room.
To summarize, the Samooha clean room empowers users with the following benefits:
A step-by-step example of how Samooha’s data clean room can be leveraged to respond to data queries is illustrated in Figure 3 and sketched in code after the steps below:
1. A pharmaceutical company and its collaborator join a Samooha clean room.
2. Both parties apply their tokenization provider of choice to their data to create tokens from patient identifiers. These tokens are then transformed into a common token key.
3. The pharma company configures the collaborator’s access to its internal data set, specifying which columns can be accessed and in what way. Both parties can join on the tokens in the clean room to create an enriched data set.
4. The collaborator runs insights within the clean room and configures appropriate privacy settings, such as thresholding and differential privacy, specific to its use case and required privacy obligations.
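To make the workflow above concrete, here is a simplified, hypothetical sketch of steps 2 through 4, using pandas in place of Samooha’s actual interfaces; the tokens, column names and threshold are made up for illustration.

```python
import pandas as pd

# Step 2 (hypothetical output): both parties hold data keyed by a common token.
pharma = pd.DataFrame({
    "token":     ["t1", "t2", "t3", "t4", "t5", "t6"],
    "trial_arm": ["treatment", "treatment", "treatment", "treatment", "placebo", "placebo"],
})
collaborator = pd.DataFrame({
    "token":          ["t1", "t2", "t3", "t4", "t5", "t7"],
    "claims_outcome": ["improved", "improved", "improved", "stable", "improved", "improved"],
})

# Step 3: join on the common token key inside the clean room; only the columns
# the pharma company has exposed are available to the collaborator.
enriched = pharma.merge(collaborator, on="token", how="inner")

# Step 4: release only aggregates that clear a minimum group-size threshold,
# so small cells cannot single out an individual patient.
MIN_GROUP_SIZE = 3  # hypothetical threshold configured for this use case
counts = enriched.groupby(["trial_arm", "claims_outcome"]).size().reset_index(name="n")
released = counts[counts["n"] >= MIN_GROUP_SIZE]
print(released)  # only (treatment, improved, n=3) clears the threshold
```

In an actual clean room, the column-level access controls and the threshold or differential-privacy settings are configured and enforced by the platform, as described in steps 3 and 4, rather than left to the analyst’s own code.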
Snowflake Data Clean Rooms, along with Snowflake Native Apps, make it easy for life sciences organizations to securely and seamlessly collaborate, unlocking valuable health insights and improved patient outcomes. Snowflake customers can now leverage Samooha clean room environments at no additional cost, simply drawing down from their existing Snowflake compute.
To learn more about implementing a data clean room, check out our 3 Steps to Building an Effective Data Clean Room ebook.