My work aims to ensure that all people are responsibly represented in data.

Research Interests: statistical data privacy | statistical disclosure control | differential privacy | synthetic data | data equity | statistical computing | STEM education

Statistical Data Privacy and Data Governance

In an increasingly interconnected and surveilled world, the construction of high-quality datasets has become more accessible, leading to both valuable and potentially harmful applications. While the collection of extensive and precise data can offer significant societal benefits, such as advancing medical research and targeted investments for those in need, data privacy concerns become pronounced when this information can be de-anonymized and exploited for malicious purposes.

My research focuses on statistical data privacy and data governance in order to develop and promote safe data collection and analysis methods that expand access to confidential data. My colleagues and I have partnered with local and federal government agencies as well as other organizations to apply new statistical data privacy and data governance methods so researchers and policymakers can use data to society’s benefit while protecting privacy.

Privacy, Equity, and Public Policy

On his first day in office, President Biden signed an executive order that tasked federal agencies and White House offices with assessing barriers to racial equity and initiating efforts to rectify inequitable policies affecting individuals and communities of color. One such barrier, which has been a growing concern among researchers and public policymakers, is that statistical data privacy (or statistical disclosure control) methods, which provide researchers or policymakers access to data while preserving participants’ privacy, often do not explicitly consider racial equity.

Methods such as data suppression, the addition of random noise under differential privacy, or the generation of synthetic data strive to strike a balance between the need for accurate information and privacy considerations. However, all these approaches involve a utility-risk tradeoff that can have equity implications for different racial groups. Without considering equity, researchers run the risk of unintentionally perpetuating harm by disproportionately distributing either the privacy risks or the utility of the information obtained from the data.
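As a minimal illustration of how a utility-risk tradeoff can fall unevenly across groups, the Python sketch below adds Laplace noise to a count, a standard way to satisfy differential privacy for counting queries. The function names, the group sizes (20 and 2,000), and the privacy parameter (epsilon = 0.5) are hypothetical choices for illustration and are not taken from the work listed here. Because the noise scale depends only on epsilon and not on the size of the group, the same absolute error translates into a much larger relative error for a small group.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) value as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng):
    """Release a count with Laplace noise (a counting query has sensitivity 1)."""
    scale = 1.0 / epsilon
    return true_count + laplace_noise(scale, rng)


def mean_relative_error(true_count, epsilon, trials=2000, seed=0):
    """Average |noisy - true| / true over repeated releases of the same count."""
    rng = random.Random(seed)
    total_abs_error = sum(
        abs(noisy_count(true_count, epsilon, rng) - true_count)
        for _ in range(trials)
    )
    return total_abs_error / trials / true_count


if __name__ == "__main__":
    # Hypothetical group sizes: a small subgroup and a large one.
    for group_size in (20, 2000):
        error = mean_relative_error(group_size, epsilon=0.5)
        print(f"group size {group_size}: mean relative error {error:.4f}")
```

In this sketch the small group bears far more relative error than the large one under the same privacy guarantee, which is one way an equity-blind method can distribute utility unevenly.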

The following represents my work at the intersection of data privacy, equity, and public policy, as I strive to address these critical issues and promote a more equitable approach to data protection and governance.

  • Bowen, CMK. & Snoke, J. (2023) “Do No Harm Guide: Applying Equity Awareness In Data Privacy Methods.” Urban Institute.
    • Click here to view the article on the project landing page (open-access).

Data Synthesis and Differentially Private Data Synthesis Methods

Within the extensive statistical data privacy literature, differential privacy, data synthesis, and their integration have gained significant popularity as solutions for releasing analytically valuable data while protecting individual privacy. It’s crucial to recognize that there is no methodological “silver bullet” that applies to all data. Therefore, ongoing development and refinement of differentially private methods and data synthesis techniques remain essential.

Below, I present my work in the development and practical application of differentially private and synthetic data methods to real-world datasets.

    • Bowen, CMK., Bryant, V., Burman, L., Khitatrakun, S., McClelland, R., Mucciolo, L., Pickens, M., and Williams, A. (2022) “Synthetic Individual Income Tax Data: Promises and Challenges.” National Tax Journal, 75(4), 767-790.
      • Click here to view the article on the journal page.
    • Bowen, CMK., Bryant, V., Burman, L., Czajka, J., Khitatrakun, S., MacDonald, G., … & Zwiefel, N. (2022). Synthetic Individual Income Tax Data: Methodology, Utility, and Privacy Implications. In International Conference on Privacy in Statistical Databases (pp. 191-204). Springer, Cham.
      • Click here to view the article on the journal page.
    • Liu, F., Eugenio, E., Jin, I., and Bowen, CMK. (2022) “Differentially Private Synthesis and Sharing of Network Data via Bayesian Exponential Random Graph Models.” Journal of Survey Statistics and Methodology, DOI 10.1093/jssam/smac017
      • Click here to view the article on the journal page.
    • Bowen, CMK., Liu, F., & Su, B. (2021) “Differentially Private Data Release via Statistical Election to Partition Sequentially.” METRON.
      • Click here to view the article on the journal page.
    • Bowen, CMK., Narayanan, A., Scally, C. (2021) “Using Differential Privacy to Advance Rural Economic Development: Applying Data Privacy and Confidentiality Methods to Industry Employment Data.” Urban Institute.
      • Click here to view the research brief (open-access).
    • Bowen, CMK., Bryant, V., Burman, L., Khitatrakun, S., McClelland, R., Stallworth, P., Ueyama, K., Williams, A. (2020) “A Synthetic Supplemental Public Use File of Low-Income Information Return Data: Methodology, Utility, and Privacy Implications.” International Conference on Privacy in Statistical Databases (pp. 257-270). Springer, Cham.
    • Eugenio, E., Liu, F., Jin, I., and Bowen, CMK. (2020) “Differentially Private Synthesis of Social Networks via Exponential Random Graph Models.” Proceedings of 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC), Pages: 1695-1700, DOI 10.1109/COMPSAC48688.2020

Evaluating Statistical Data Privacy Methods

Balancing data utility against data privacy risks is a complex task. Many people approaching this problem expect a one-size-fits-all utility or disclosure risk metric that perfectly assesses the quality of any data or statistic released under a privacy-preserving method or technology. But such a metric doesn’t exist.

Below, I outline my work in evaluating the efficacy of different statistical data privacy methods and offer insights into how we should approach the delicate balance between privacy and utility.

    • Barrientos, A. F., Williams, A. R., Snoke, J., & Bowen, CMK. (2023) “A Feasibility Study of Differentially Private Summary Statistics and Regression Analyses with Evaluations on Administrative and Survey Data.” Accepted in the Journal of the American Statistical Association.
      • Click here to view the article on the journal page.
      • Click here to see the arXiv copy.
    • Bowen, CMK. & Snoke, J. (2021) “Comparative Study of Differentially Private Synthetic Data Algorithms from the NIST PSCR Differential Privacy Synthetic Data Challenge” Journal of Privacy and Confidentiality, 11 (1).
      • Click here to view the article on the journal page (open-access).
    • Bowen, CMK. & Liu, F. (2020) “Comparative Study on Differentially Private Data Synthesis Methods.” Statistical Science.
      • Click here to view the article on the journal page.

Communications and Education

The following represents my efforts in introducing data synthesis, differential privacy, and various statistical data privacy methods and technologies to a wider audience, including the scientific and non-technical communities.

  • Joshua Snoke and I chatted on the podcast, Stats + Stories, discussing how the data privacy landscape is changing.
    • Click here to listen.
  • I was a guest on the podcast, Data Science Imposters, discussing what differential privacy is and how it impacts everyone.
    • Click here to listen.
    • Hu, J. & Bowen, CMK. (2023) “Advancing microdata privacy protection: A review of synthetic data methods.” WIREs.
      • Click here to view the article.
    • Williams, A. R. & Bowen, CMK. (2023) “The Promise and Limitations of Formal Privacy.” WIREs.
      • Click here to view the article.
    • Bowen, CMK., Williams, A. R., & Pickens, M. (2022) “Decennial Disclosure: An Explainer on Formal Privacy and the TopDown Algorithm.” Urban Institute.
      • Click here to view the research brief (open-access).
    • Garfinkel, S. & Bowen, CMK. (2022) “Preserving Privacy While Sharing Data.” MIT Sloan Management Review.
      • Click here to view the article.
    • Bowen, CMK. & Garfinkel, S. (2021) “The Philosophy of Differential Privacy.” Notices of the American Mathematical Society.
      • Click here to view the article (open-access).
    • Bowen, CMK. (2021) “Personal Privacy and the Public Good: Balancing Data Privacy and Data Utility.” Urban Institute.
      • Click here to view the research brief (open-access).
    • Snoke, J. & Bowen, CMK. (2020) “How Statisticians Should Grapple with Privacy in a Changing Data Landscape.” CHANCE, Special Issue: A New Generation of Statisticians Tackles Data Privacy.
      • Click here to view the article (open-access).
    • Bowen, CMK. & Eugenio, E. (2017) “Where’s Wenda: An Activity on Teaching Middle School Students Data Privacy.” Statistics Teacher.
      • Click here to find the article (open-access).

Other Research Projects

Statistical Computing

With the rapid advancement in computational power, we increasingly rely on simulated experiments over physical ones, primarily due to cost considerations. For instance, at Los Alamos National Laboratory, there is a physical experiment that lasts less than a second but costs upwards of $10 million. This underscores the value of supercomputers in resource conservation. However, it’s important to note that these high-performance systems are vulnerable to various factors, including heat, power fluctuations, and cosmic radiation. Moreover, while computational power has seen significant growth, data storage and transfer rates continue to lag behind. The following papers delve into these critical issues.

  • Bowen, CMK., DeBardeleben, N., Blanchard, S., & Anderson-Cook, C. (2019) “Do Solar Proton Events Reduce the Number of Faults in Supercomputers?: A Comparative Analysis of Faults during and without Solar Proton Events.” 2019 IEEE International Reliability Physics.
    • Click here to find the article.
  • Myers, K., Lawrence, E., Fugate, M., Bowen, CMK., Ticknor, L., Woodring, J., Wendelberger, J., & Ahrens, J. (2016) “Partitioning a Large Simulation as It Runs.” Technometrics, doi: 10.1080/00401706.2016.1158740.
    • Click here to find the article.

Consulting

The following publication showcases a project for which I have provided consulting services.

  • Bowen, CMK., Liu, F., & Wheeler, J. (2015) “Are More of My Patients Developing Side Effects than Expected.” Practical Radiation Oncology, Volume 5, Issue 3, e255-e261.
    • Click here to find the article.