It’s the stuff of science fiction:  adversaries extract DNA information from a cup of coffee or postage stamp and use it to infer one’s most private traits.  However, a recently released study entitled “Data Sanitization to Reduce Private Information Leakage from Functional Genomics” discusses how this can be achieved, along with privacy measures that the life sciences and research community can use to help limit the risks to identifiable health information.

DNA information extracted from coffee cups and other environmental samples is “noisy” because, for example, a sample may be contaminated by multiple individuals.  However, in the recently published study, researchers using complex statistical techniques report that they were able to reliably link information about known individuals, drawn from these “noisy” environmental samples, with whole-genome sequencing reads and even partial genomic assays.  According to the study, this allowed for inferences about the individuals’ sensitive phenotypic information, such as information about mental health.

The study proposes techniques that can be used to help anonymize or protect the privacy of genomic information by removing certain observable variants from genomic datasets.  According to the study authors, their tool targets the parts of genomic datasets that tend to contain large amounts of variant information, to help protect against the risk of re-identification.

What are the key takeaways for the privacy professional?

  • The researchers expressly recognize the scientific and public health value of genomic research and the concern that data anonymization must be balanced against the reduced utility of more limited datasets.  Thus, the study contemplates that not all variant information would be removed from all genomic datasets.  For example, the study authors contemplate that some participants may simply mask variants that leak information about their susceptibility to stigmatizing phenotypes.
  • In addition, the study contemplates that information about the variants would be retained rather than deleted altogether, but made subject to more restrictive access controls based on research need.
  • The researchers relied on sophisticated forensic, statistical, and sequencing techniques.  Indeed, the authors were not able to recreate certain results using lower-cost and more portable genotypic methods.  This is relevant because most privacy frameworks consider whether information can reasonably be linked to an identifiable individual.
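
For readers who want a concrete picture of selective masking with retention, the sketch below shows one way it could look.  It is illustrative only and is not the study authors’ tool: the `Variant` record, the `sanitize` function, the sample data, and the choice to mask by substituting the reference allele are all assumptions made for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Variant:
    """Hypothetical, simplified variant record (not a real file format)."""
    chrom: str
    pos: int
    ref: str  # reference allele
    alt: str  # allele observed in the individual


def sanitize(variants, sensitive_sites):
    """Split variants into a shareable list and a restricted list.

    Variants at sensitive sites are masked in the shareable list by
    substituting the reference allele (an assumed masking rule). The
    originals are retained, not deleted, in the restricted list, which
    would sit behind stricter access controls.
    """
    shareable, restricted = [], []
    for v in variants:
        if (v.chrom, v.pos) in sensitive_sites:
            shareable.append(replace(v, alt=v.ref))  # masked copy
            restricted.append(v)                     # original retained
        else:
            shareable.append(v)
    return shareable, restricted


# Made-up example data: one site is flagged as sensitive by the participant.
variants = [
    Variant("chr1", 1000, "A", "G"),
    Variant("chr2", 2500, "C", "T"),
    Variant("chr7", 4200, "G", "A"),
]
sensitive_sites = {("chr2", 2500)}

shareable, restricted = sanitize(variants, sensitive_sites)
```

In this sketch, only the flagged site is altered in the shareable data, which reflects the balance between privacy and research utility, while the unmasked record remains available for access based on research need.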
Libbie Canter

Libbie Canter represents a wide variety of multinational companies on privacy, cyber security, and technology transaction issues, including helping clients with their most complex privacy challenges and the development of governance frameworks and processes to comply with global privacy laws. She routinely supports clients on their efforts to launch new products and services involving emerging technologies, and she has assisted dozens of clients with their efforts to prepare for and comply with federal and state privacy laws, including the California Consumer Privacy Act and California Privacy Rights Act.

Libbie represents clients across industries, but she also has deep expertise in advising clients in highly regulated sectors, including financial services and digital health companies. She counsels these companies, and their technology and advertising partners, on how to address legacy regulatory issues and the cutting-edge issues that have emerged with industry innovations and data collaborations.