Keynote: Natural Language Processing and Privacy: A Double-Edged Sword

Aylin Caliskan-Islam

In this talk, I will present threats, privacy enhancing methods, and open problems through the lens of linguistic privacy. Humans learn language and its semantics on an individual basis and consequently develop unique styles. Accordingly, the unique linguistic features exhibited in natural or programming languages come with the power to de-anonymize authors, programmers, or cyber criminals. This ability to infer a person’s identity through language processing poses a great threat to privacy and anonymity. Nevertheless, a detailed understanding of these linguistic threats can be used to mitigate machine learning attacks. Characterizing and quantifying aspects of human behavior expressed in language via machine learning can enhance privacy. Textual features observed in social networks offer insight into privacy behavior and can help users choose privacy settings. This emerging area of linguistic privacy, along with its open technical problems, raises societal, ethical, and policy challenges.

About Aylin Caliskan
Aylin Caliskan is a Postdoctoral Research Associate and a CITP Fellow at Princeton University. Her work in security and privacy involves the use of machine learning and natural language processing. In her previous work, she demonstrated that de-anonymization is possible through analyzing linguistic style in a variety of textual media, including social media, cyber criminal forums, and source code. She is currently extending her de-anonymization work to include non-textual data such as binary files and developing countermeasures against de-anonymization. Aylin’s other research interests include quantifying and classifying human privacy behavior and designing privacy nudges as a countermeasure against the disclosure of private information. At Princeton, she works with Dr. Arvind Narayanan on text sanitization of sensitive documents for public disclosure, which can enable researchers to share data with linguists, sociologists, psychologists, and computer scientists without breaching the research subjects’ privacy. She holds a PhD in Computer Science from Drexel University and a Master of Science in Robotics from the University of Pennsylvania.