Privacy Requirements and Technical Challenges for Storing Genomic Data

Jean-Pierre Hubaux

Genome sequencing technology has advanced rapidly, and it is now possible to generate highly detailed genotypes inexpensively. The collection and analysis of such data have the potential to support various applications, including personalized medical services. While the benefits of the genomics revolution are trumpeted by the biomedical community, the increased availability of such data has major implications for personal privacy, notably because the genome has certain essential features. These include (but are not limited to) (i) an association with traits and certain diseases, (ii) identification capability (e.g., forensics), and (iii) revelation of family relationships. Moreover, direct-to-consumer DNA testing increases the likelihood that genome data will be made available in less regulated environments, such as the Internet and for-profit companies. The problem of genome data privacy thus resides at the crossroads of computer science, medicine, and public policy. While computer scientists have addressed data privacy for various data types, less attention has been dedicated to genomic data.

In this talk, we will enumerate the challenges for genome data privacy and emphasize the need for long-term protection (beyond a century). We will then present a framework to systematize the analysis of threats and the design of countermeasures as the field moves forward. Finally, we will briefly present a solution based on honey encryption.