Statistical Data Privacy Resources

I frequently receive inquiries about recommended resources for learning about specific aspects of data privacy. Given the scarcity of resources and the absence of a defined curriculum, I offer the following list. Please be aware that this list is not exhaustive and reflects my own biases (e.g., several resources are by my colleagues and me).

Important Note: This section is a work in progress. For a smoother learning experience, follow the materials in sequential order. Within each section, the resources progress from conceptual to more technical content and/or longer forms (e.g., a book). For paywalled articles that I wrote, please contact me about obtaining a copy.

Introduction to Statistical Data Privacy

Resources on learning the basics of statistical data privacy.

Visualizing Implications for Communities of Color in Public Data Releases

My colleagues Maddie Pickens and Deena Tamaroff developed an educational tool that allows community-based organizations and other interested users to explore the trade-offs between equity and privacy when these entities release data and statistics publicly.

Blog Post

Personal Privacy and the Public Good: Balancing Data Privacy and Data Utility

In this report, we aim to help readers better understand the tension between privacy and utility (or usefulness) and the challenges of balancing the two sides.

Research Report

Government Data of the People, by the People, for the People: Navigating Citizen Privacy Concerns

This paper discusses the fundamental tradeoff between data privacy and data usefulness—and how determining an appropriate balance can be difficult. The paper also provides thoughts on what must be addressed to help shape the future of data privacy, make meaningful contributions to its policy debates, and ensure the responsible representation of people in data.

Journal Article

Protecting Your Privacy in a Data-Driven World

Protecting Your Privacy in a Data-Driven World is a practical, nontechnical guide that explains the importance of balancing privacy and data usefulness and calls for careful consideration of how data are collected and disseminated by our government and the private sector.

Synthetic Data

Below are resources to learn more about synthetic data.

Understanding Synthetic Data: Using pseudo-records to maintain privacy in publicly released data

This fact sheet provides an overview of use cases for synthetic data and the broad process for creating synthetic datasets, including definitions of applicable terminology. It also discusses how to evaluate the quality and privacy of synthetic output.

Fact Sheet

Synthetic Data for the Nebraska Statewide Workforce & Educational Reporting System

The Nebraska Statewide Workforce & Educational Reporting System (NSWERS) is a state longitudinal data system (SLDS) that connects education and workforce data for research and policymaking. However, accessing SLDS data like those housed by NSWERS can be challenging due to privacy concerns. This nontechnical brief, intended for state longitudinal data systems stakeholders, introduces synthetic data, a privacy-enhancing technology that allows NSWERS to expand access to their research data without providing direct access to their confidential data.

Non-Technical Brief

Synthetic Data

The goal of this article is to provide a review of various approaches for generating and analyzing synthetic data sets, inferential justification, limitations of the approaches, and directions for future research.

Journal Article

Synthetic Data: A Look Back and A Look Forward

When initially proposed, synthetic data for disclosure control was generally dismissed as unlikely to be implemented in practice. Thirty years later, synthetic data are becoming a staple of the disclosure limitation toolkit. We now see synthetic public use files for several major data products with more on the way. In this article, the author reviews the progression of synthetic data, describes some unresolved challenges, and speculates on its future.

Journal Article

Advancing microdata privacy protection: A review of synthetic data methods

This review provides a comprehensive introduction to synthetic data, including technical details of their generation and evaluation. Our review also addresses the challenges and limitations of synthetic data, discusses practical applications, and provides thoughts for future work.

Journal Article

Formal Privacy

Below are resources to learn more about formal privacy or differential privacy.

Decennial Disclosure: An explainer on formal privacy and the TopDown Algorithm

This explainer aims to help readers better understand what formal privacy is and how the TopDown Algorithm works. The explainer is also a continuation of “Personal Privacy and the Public Good: Balancing Data Privacy and Data Utility” (Bowen 2021) and we encourage readers to read that report first.

Research Report

Preserving Privacy While Sharing Data

Differential privacy can safeguard personal information when data is being shared, but it requires a high level of expertise. This article discusses the challenges that businesses need to overcome to adopt differential privacy as a technical solution.

Magazine Article

The promise and limitations of formal privacy

Although differential privacy ushered in a new era of data privacy and confidentiality methodologies, many researchers and data practitioners criticize differentially private frameworks. In this paper, we provide readers a critical overview of the current state-of-the-art research on formal privacy methodologies and various relevant perspectives, challenges, and opportunities.

Journal Article

The Philosophy of Differential Privacy

This article provides a more technical introduction to differential privacy and highlights the challenges for adoption.

Journal Article

Data Privacy and Equity

Below are resources to learn more about equity in the data privacy process.

How the Federal Government Can Use Data to Make the Most of the Executive Order on Racial Equity

On his first day in office, President Biden signed an executive order directing federal agencies and White House offices to examine barriers to racial equity and initiated several efforts to address equity for people of color and underserved communities. This blog discusses five possible actions for policymakers to consider to ensure better dissemination of disaggregated data and data infrastructure.

To Advance Racial Equity, Releasing Disaggregated Data while Protecting Privacy Will Be Key

This blog discusses how both technical and policy solutions for privacy must be considered to advance racial equity.

Do No Harm Guide: Applying Equity Awareness on Data Privacy Methods

In this guide, we completed a literature review of equity-focused work in statistical data privacy (SDP) and conducted interviews with nine experts on privacy-preserving methods and data sharing. We also created an illustrative example (using New Mexico and Pennsylvania) to highlight potential disparities that can result from applying SDP methods without an equitable workflow.

Research Report
GitHub Repo

The following are courses I teach as an adjunct professor in the Business Administration, Management & Business Analytic Department at Stonehill College.

DAN 607: Security, Privacy, and Ethics in Data Analytics

Details:

Course Instructor: Dr. Claire McKay Bowen
Office Hours: By appointment on Zoom
Email: cbowen at urban dot org

Course Description and Objectives:
At what point does the sacrifice of our personal information outweigh the public good?

If public policymakers had access to our personal and confidential data, they could make more evidence-based, data-informed decisions that could accelerate economic recovery and improve COVID-19 vaccine distribution. However, access to personal data comes at a steep privacy cost for contributors, especially underrepresented groups. Revealing too much location information places people at risk, for example by enabling stalkers to track them more easily, but revealing too little personal location information will severely hinder the effectiveness of contact tracing.

This course will cover the importance of balancing these competing needs and walk through the issues the U.S. government and private sector must navigate when collecting and disseminating data. Specifically, students will understand the legal, social, and ethical ramifications of data security and privacy as well as the concepts behind data guardianship, custodianship, and permissions.

At a high level, the main learning objectives/topic areas of this course are learning the:

  • importance of data security, privacy, and ethics
  • history and evolution of security, privacy, and ethics
  • ways experts develop privacy-preserving methods
  • limitations of these methods
  • privacy laws governing and protecting people’s information
  • issues that society must consider to advance and improve security, privacy, and ethics

Required Textbook: Protecting Your Privacy in a Data-Driven World by Claire McKay Bowen, published by Routledge-CRC Press