
Guest post: COLT 2022 Call for Open Problems

The tireless Clément Canonne is the open problem chair for COLT 2022. He asked to share this call for open problems. Please submit your best!


The 35th Annual Conference on Learning Theory (COLT 2022), to be held in London on July 2-5 and remotely, will follow in the footsteps of previous editions and feature an Open Problems session, where attendees can present their open problems and suggest them to the learning community — and possibly offer prizes for their resolution! (After all, a little incentive goes a long way…)

The deadline to submit an open problem has been extended to Monday June 20, 4pm PDT. If you have any nagging questions or stubborn problems, please submit them!

More information and CfP: https://learningtheory.org/colt2022/cfp.html#openproblems


How to Ask for a Letter of Recommendation

The end of the year is approaching at an alarming rate, which means that many students will require letters of recommendation for graduate school or jobs. These letters are written by more senior academics and researchers at the busiest time of the year, when they’re handling other responsibilities like research, teaching, grant writing, and reviewing, and may be writing a dozen letters on top of it all. The purpose of this post is to guide applicants on how to ask for a letter of recommendation, simultaneously strengthening the applicant’s package and making the letter-writing process as painless as possible for their writers. Feel free to share this with anyone applying for grad school or jobs. If someone is asking you for a letter, you could share this post with them, so they know what you need.

Note that this document is written by a computer scientist and intended for applications to computer science things, and may or may not carry over to other fields.

Whom to ask

Ask people who know you and your work well, ideally people you’ve done research with. Their words will carry the most weight for whatever position you’re applying to. Conversely, not having a letter from someone you did a major research project with (especially when applying for grad school) is a red flag, so aim to have at least one such letter. While grad school applications typically solicit three letters of recommendation, very few applicants have all three focused on research (I personally had two, from co-advisors on one project).

Some have suggested having someone “at arm’s length” is beneficial for faculty applications (and anything that comes later, including awards and fellowships), in order to show that your work is known more broadly in the community. 

Letters from people who did nothing more than teach you a course are typically valued much less than research-based letters. Nonetheless, most grad school applicants will have at least one or two such letters, since it is rare to have the opportunity to do research projects with three different advisors. If you must, try to request letters from people who you left an impression on, or otherwise had meaningful interactions with. Try to avoid letters from people who can do nothing more than repeat information on your CV/transcript: these letters are often very short and uninformative, and as with any such letter, are unlikely to help your case (and may even hurt it).

When to ask

Please ask well in advance. As a rough rule of thumb, asking by late October is reasonable for “standard” December deadlines, which are common for CS grad school, postdoc, and faculty applications. You might have to ask earlier if there are earlier deadlines (as is the case for some postdoc applications), but give at least a six-week lead time. Try to mention the earliest deadline in your email request.

Asking early also benefits the applicant: you should have a solid picture of these key details of your application (where you are applying, and by when) well in advance.

I may still agree to write a letter for you even after the end of October, but chances decrease the longer you wait. On the other hand, I am more likely to agree to late requests if I know you very well or we have worked together very closely. However, I would still be annoyed, and you probably want to avoid annoying your allies in this process.

What to provide

You should give your letter writers as much information as possible. They might not use all of it, but it is good to have on hand when they are inevitably writing letters at 4 AM the day before the deadline. I’m writing this as a catch-all list, so some points might not be relevant for the position you’re applying to: for example, when you’re applying for faculty positions, they don’t need to know your undergraduate grades. The goal is to give your writers the gist of who you’re trying to market yourself as, so they can support your story as best they can.

This doesn’t all need to be provided when you first ask, but please share it as early as possible, and make it as easy to find as you can: for example, a single email with everything attached and a clear subject line such as “X’s application materials”. You could also share a link to a Dropbox or Google Drive folder with all the materials, which has the benefit that you can include preliminary versions of your materials and update them as you refine them.

Mandatory:

  • A shared spreadsheet with a list of all the places you’re applying to, the deadlines, any special instructions for letter submission, and a space to mark when the letter is submitted.
  • Your research statement/statement of purpose
  • Your CV

Optional but recommended:

  • Anything else that the application asks you for, e.g., your cover letter, teaching statement, diversity statement, transcript.
  • Who else is writing letters for you, and what your relationship to them is.
  • A brief summary of the work you’ve done or the contributions you’ve made with the recommender, to refresh their memory.

The last one is helpful since I like to know what “role” I’m playing. Am I your lead letter writer who people will look to first? Or am I the third letter, mostly confirming what other people already know? There may also be some unique perspective I can give, based on who your other letter writers are.

Finally, be sure to waive your right to view the letter when requesting it. I personally will not write a letter if this right has not been waived (and I would also be annoyed to receive such a request). It’s also not to your advantage to have such a letter, since a reader wouldn’t be able to trust what the writer wrote.

Afterwards?

To the best of my knowledge, it is not customary in Computer Science to send gifts to your letter writers. You should just thank them genuinely, and pass on the favour when you are in a similar position of power.

What if…

It has been brought to my attention that, unfortunately, many letter writers ask the applicant for either a draft or full letter, which they simply sign. I could write a whole post on this topic, but in short, I consider this unacceptable behaviour from the perspective of the letter writer, and I urge them to rethink this practice. 

But what should the applicant do when put in this position? It helps if you can see what typical academic recommendation letters look like, since they have a certain tone, but access to such letters is generally out of the reach of most applicants. My recommendation is to be positive (now is not the time for modesty), but don’t lie (these things can follow you around for longer than you might think). Try to make the first paragraph a brief summary that conveys the overall sentiment of the letter. In subsequent paragraphs, you should describe the research you worked on with the letter writer (don’t assume the reader is highly familiar with the topic, but be succinct in terms of background), as well as your specific contributions to the project. Include anecdotes, if appropriate. This may be awkward, but you have been put into an awkward position.

Statement of Purpose?

You probably haven’t written anything like a Statement of Purpose before. Here are two resources (1, 2), by Boaz Barak and Yisong Yue, where they describe what goes into a Statement of Purpose. Naama Ben-David’s statement is used as an example.

Acknowledgments

Thanks to Anand Sarwate, Thomas Steinke, and Jon Ullman for helpful comments.

Social at STOC 2021

I’ve been asked to pass along a message by Clément Canonne, social chair for STOC 2021. This STOC might have the best social program of any I’ve ever seen, either virtual or in-person, so be sure to check it out!


STOC’21 is around the corner, starting tomorrow; don’t forget to register (http://acm-stoc.org/stoc2021/), if you haven’t yet! This year, the (virtual) conference will include several social activities (games, TCS trivia, mystery hunt…); among which, two “junior/senior lunches,” on Monday and Friday.

Both will be held in the Gather space for STOC (http://acm-stoc.org/stoc2021/venue.html), and — as in previous years — are the occasion for senior researchers in the field, broadly construed, to have an informal chat with students, postdocs, and junior faculty, answer their questions, discuss their research, and generally have a nice conversation.

If you are interested, don’t forget to sign up! This is done through the “feedback box” placed on the Information Desk in the Gather space’s Lobby, which gives access to a spreadsheet.

Hoping to see you at STOC!

Learning Theory Alliance and Mentoring Workshop

Surbhi Goel, Nika Haghtalab, and Ellen Vitercik are the organizers of an excellent new initiative called the Learning Theory Alliance. They have the following inspiring mission statement:

Our mission is to develop a strong, supportive learning theory community and ensure its healthy growth by fostering inclusive community engagement and encouraging active contributions from researchers at all stages of their careers.

Their first event is a mentoring workshop, to be held at ALT 2021. I’ll be helping out by mentoring the creation of some written ALT highlights. Read on for more details from the organizers.


We are pleased to announce the first Learning Theory Mentorship Workshop in collaboration with the Conference on Algorithmic Learning Theory (ALT) 2021 to be held virtually on March 4-5, 2021. The workshop will focus on building technical and networking skills while giving participants an opportunity to interact with fellow researchers in the field. 

The workshop is intended for upper-level undergraduate and all-level graduate students as well as postdoctoral researchers who are excited about the possibility of learning theory research. No prior research experience in the field is expected.

We have several planned events including:

  • How-to talks which will provide general advice about giving talks, structuring papers, writing reviews, networking, and attending conferences.
  • A small group discussion dissecting a short talk with feedback from a senior researcher.
  • An informal and interactive “Ask Me Anything” session with a senior member of the learning theory community.
  • General audience talks about recent learning theory research which will be accessible to new researchers.
  • Social events such as board games.

Our lineup includes Jacob Abernethy, Kamalika Chaudhuri, Nadav Cohen, Rafael Frongillo, Shafi Goldwasser, Zhiyi Huang, Robert Kleinberg, Pravesh Kothari, Po-Ling Loh, Lester Mackey, Jamie Morgenstern, Praneeth Netrapalli, Vatsal Sharan and Mary Wootters.

Together with Gautam Kamath, we will also organize a written account of ALT titled “ALT Highlights” which will summarize the research presented at ALT. We will assist students and postdocs in setting up interviews with presenters and keynote speakers as part of the highlights.

A short application form is required to participate with an application deadline of Friday, Feb. 19, 2021. Students with backgrounds that are underrepresented or underserved in related fields are especially encouraged to apply. We will be accommodating all time zones. More information can be found on the event’s website: http://let-all.com/alt.html.

This workshop is part of our broader community building initiative called the Learning Theory Alliance (advised by Peter Bartlett, Avrim Blum, Stefanie Jegelka, Po-Ling Loh and Jenn Wortman Vaughan). Check out http://let-all.com/ for more details and to sign up to volunteer.

Best,
Surbhi Goel, Nika Haghtalab and Ellen Vitercik

SODA 2021: Funds for Student Registration Fees

UPDATE: A follow-up message from Shang-Hua Teng:

Dear TCS Students:

SIAM has today reopened the application portal for SODA21 Student Registration Waiver. The new deadline will be this Friday, January 8 at 11:59pm EST.

SODA21 will be held virtually on Jan. 10 – Jan. 13, 2021.

Please note that having papers in SODA or its associated symposia is not a requirement. For this year’s application, a letter of recommendation from your advisor is not required either. The goal of the support from Google, Microsoft, and ACM is to increase opportunities for students to attend this premier TCS conference.

Looking forward to seeing you in SODA’21.


ORIGINAL MESSAGE: Please see below for a message from Shang-Hua Teng, regarding the possibility of waivers for SODA 2021 registration for students.


Dear TCS students:

By now, it is hard to overestimate the impact of the COVID-19 pandemic on society. However, like every challenge, it has created some opportunities. For example, essentially all major conferences in TCS this year have been transformed into virtual ones, making them more accessible to scholars/students across the world (of course at the expense of traditional interactions).

The ACM-SIAM Symposium on Discrete Algorithms (SODA21) will be held virtually this year, on Jan. 10 – 13, 2021. As you may know, this is the premier conference on algorithms.

See https://www.siam.org/conferences/cm/conference/soda21

Thanks to our industry partners and the ACM SIGACT group, SODA has some funds for covering student registrations. I am writing to inform you of this opportunity and to encourage you to apply:
See:
1. https://awards.siam.org/
2. https://www.siam.org/conferences/cm/lodging-and-support/travel-support/soda21-travel-support
The deadline is Dec. 27, 2020. As before, having papers in SODA is not a prerequisite.

Shang-Hua Teng
On Behalf of SODA Steering Committee

Virtual STOC 2020 – Behind the Screens

In order to assist organizers of other virtual conferences, the general chairs of STOC 2020 (myself, Konstantin Makarychev, Yury Makarychev and Madhur Tulsiani, with input from PC chair Julia Chuzhoy) wrote a detailed document describing the design and execution of the conference. I personally felt the conference went about as well as it could have gone, and despite many moving parts, there were minimal technical difficulties.

The guide is available here: Virtual STOC 2020 – Behind the Screens.

If you have any questions or comments, feel free to comment below, or join in the conversation on Twitter.

STOC 2020 Goes Virtual!

Starting on Monday, STOC joins the trend of conferences going online; I believe it is the biggest theory conference to do so thus far. Given my experience with TCS+, I volunteered to lend a hand with the organization and logistics. It’s been a journey (with some unusual technical challenges), but I think we have something which I hope will be engaging and generally a lot of fun. In addition to the typical academic component, we have a social component planned as well. We learnt from the work of others, including the ACM virtual conferences guide, ICLR 2020, and WAGON. I may make some version of our logistics docs available after the conference, so others can learn from our experience as well. Anyway, read on for an announcement from me and the other General Chairs, Konstantin Makarychev, Yury Makarychev, and Madhur Tulsiani. See also the main STOC page for a more complete list of credits.


Dear fellow theorists,

As you already know, STOC 2020 this year will be a virtual conference. If you are interested in attending the conference, but haven’t registered yet, please do so soon (students: $25, regular: $50). This will help us ensure we have capacity for various online events. 

Upon registration, you should receive a confirmation email from CVENT, also containing access information for various conference events. Also, if you are a student looking to register for STOC but the cost is a burden, please email us at stoc2020@ttic.edu.

How will the conference work?

  • Videos: The videos for all conference talks are now available on YouTube, and can be accessed through the links in the conference program. Registration is not required to view the talks on YouTube.
  • Slack: The conference has a Slack workspace, with one channel for every paper and workshop, and additional channels for information, announcements, social events, help, etc. The invitations for the Slack workspace will be sent to registered participants. Authors are also encouraged to monitor the channels for their papers. All access information for the conference will also be available here. The workspace is currently active, and will remain active for at least one week after the conference.
  • Zoom sessions: The conference will feature Zoom sessions with short presentations by the speakers. The total time for each paper is 10 minutes. Given that participants have access to the speakers’ full talks on YouTube, these can be thought of as analogues of poster sessions. The workshops will also be held as separate sessions. The links for the Zoom sessions can be found via the information in the confirmation email.
  • Social events: The conference will include junior/senior “lunches”, breakout tables for impromptu and scheduled hangouts, and a group event using gather.town. The timings for the events can be found in the conference program. Sign-up links for various events will be sent to all registered participants – please do sign up soon!

See you all at (virtual) STOC 2020. Please do let us know if you have any questions or suggestions.

A Primer on Private Statistics – Part II

By Gautam Kamath and Jonathan Ullman

The second part of our brief survey of differentially private statistics. This time, we show how to privately estimate the CDF of a distribution (i.e., estimate the distribution in Kolmogorov distance), and conclude with pointers to some other work in the space.

The first part of this series is here, and you can download both parts in PDF form here.

1. CDF Estimation for Discrete, Univariate Distributions

Suppose we have a distribution {P} over the ordered, discrete domain {\{1,\dots,D\}} and let {\mathcal{P}} be the family of all such distributions. The CDF of the distribution is the function {\Phi_{P} : \{1,\dots,D\} \rightarrow [0,1]} given by

\displaystyle \Phi_{P}(j) = \mathop{\mathbb P}(P \leq j). \ \ \ \ \ (1)

A natural measure of distance between CDFs is the {\ell_\infty} distance, as this is the sort of convergence guarantee that the empirical CDF satisfies. That is, in the non-private setting, the empirical CDF will achieve the minimax rate, which is known by [DKW56, Mas90] to be

\displaystyle \max_{P \in \mathcal{P}} \mathop{\mathbb E}_{X_{1 \cdots n} \sim P}(\| \Phi_{X} - \Phi_{P} \|_{\infty}) = O\left(\sqrt{\frac{1}{n}} \right). \ \ \ \ \ (2)

1.1. Private CDF Estimation

Theorem 1 For every {n \in {\mathbb N}} and every {\epsilon,\delta > 0}, there exists an {(\epsilon,\delta)}-differentially private mechanism {M} such that

\displaystyle \max_{P \in \mathcal{P}} \mathop{\mathbb E}_{X_{1 \cdots n} \sim P}(\| M(X_{1 \cdots n}) - \Phi_{P} \|_{\infty}) = O\left(\sqrt{\frac{1}{n}} + \frac{\log^{3/2}(D) \log^{1/2}(1/\delta)}{\epsilon n} \right). \ \ \ \ \ (3)

Proof: Assume without loss of generality that {D = 2^{d}} for an integer {d \geq 1}. Let {X_{1 \cdots n} \sim P} be a sample. By the triangle inequality, we have

\displaystyle \begin{array}{rll} \mathop{\mathbb E}_{X_{1 \cdots n} \sim P}{\| M(X_{1 \cdots n}) - \Phi_{P} \|_{\infty}} &\leq{} \mathop{\mathbb E}_{X_{1 \cdots n} \sim P}(\| \Phi_{X} - \Phi_{P} \|_{\infty} + \| M(X_{1 \cdots n}) - \Phi_{X} \|_{\infty}) \\ &\leq{} O(\sqrt{1/n}) + \mathop{\mathbb E}_{X_{1 \cdots n} \sim P}(\| M(X_{1 \cdots n}) - \Phi_{X} \|_{\infty}), \end{array}

so we will focus on constructing {M} to approximate {\Phi_{X}}.

For any {\ell = 0,\dots,d-1} and {j = 1,\dots,2^{d - \ell}}, consider the statistics

\displaystyle f_{\ell,j}(X_{1 \cdots n}) = \frac{1}{n} \sum_{i=1}^{n} {\bf 1}\{ (j-1)2^{\ell} + 1 \leq X_i \leq j 2^{\ell} \}. \ \ \ \ \ (4)

Let {f : \{1,\dots,D\}^n \rightarrow [0,1]^{2D - 2}} be the function whose output consists of all {2D-2} such counts. To decipher this notation, for a given {\ell}, the counts {f_{\ell,\cdot}} form a histogram of {X_{1 \cdots n}} using consecutive bins of width {2^{\ell}}, and we consider the {\log(D)} histograms of geometrically increasing width {1,2,4,\dots,D}. First, we claim that the function {f} has low sensitivity—for adjacent samples {X} and {X'},

\displaystyle \| f(X) - f(X') \|_2^2 \leq \frac{2 \log(D)}{n^2}. \ \ \ \ \ (5)

(Indeed, changing a single sample changes at most two bins in each of the {\log(D)} histograms, each by {1/n}, contributing at most {2/n^2} per histogram.)

Thus, we can use the Gaussian mechanism:

\displaystyle M'(X_{1 \cdots n}) = f(X_{1 \cdots n}) + \mathcal{N}\left(0, \frac{2 \log(D) \log(1/\delta)}{\epsilon^2 n^2} \cdot \mathbb{I}_{2D \times 2D}\right). \ \ \ \ \ (6)

As we will argue, there exists a matrix {A \in {\mathbb R}^{2D \times 2D}} such that {\Phi_{X} = A \cdot f(X_{1 \cdots n})}. We will let {M(X_{1 \cdots n}) = A \cdot M'(X_{1 \cdots n})}. Since differential privacy is closed under post-processing, {M} inherits the privacy of {M'}.

We will now show how to construct the matrix {A} and analyze the error of {M}. For any {j = 1,\dots,D}, we can form the interval {\{1,\dots,j\}} as the union of at most {\log D} disjoint intervals of the form we’ve computed, and therefore we can obtain {\Phi_{X}(j)} as the sum of at most {\log D} of the entries of {f(X)}. For example, if {j = 7} then we can write

\displaystyle \{1,\dots,7\} = \{1,\dots,4\} \cup \{5,6\} \cup \{7\} \ \ \ \ \ (7)

and

\displaystyle \Phi_{X}(7) = f_{2,1} + f_{1,3} + f_{0,7}. \ \ \ \ \ (8)

See the following diagram for a visual representation of the decomposition.

[Figure: binary-tree decomposition of the domain {\{1,\dots,8\}}]

The figure shows the hierarchical decomposition of the domain {\{1,\dots,8\}} using 14 intervals. The highlighted squares represent the interval {\{1,\dots,7\}} and the highlighted circles show the decomposition of this interval into a union of {3} intervals in the tree.

Thus we can construct the matrix {A} using this information. Note that each entry of {A f(X)} is the sum of at most {\log(D)} entries of {f(X)}. Thus, if we use the output of {M'(X_{1 \cdots n})} in place of {f(X_{1 \cdots n})}, for every {j} we obtain

\displaystyle \Phi_{X}(j) + \mathcal{N}(0, \sigma^2) \quad \textrm{for} \quad \sigma^2 = \frac{ 2 \log^2(D) \log(1/\delta)}{\epsilon^2 n^2}. \ \ \ \ \ (9)

Applying standard bounds on the expected supremum of a Gaussian process, we have

\displaystyle \mathop{\mathbb E}(\| M(X_{1 \cdots n}) - \Phi_{X} \|_{\infty}) = O( \sigma \sqrt{\log D}) = O\left(\frac{\log^{3/2}(D) \log^{1/2}(1/\delta)}{\epsilon n} \right). \ \ \ \ \ (10)

\Box
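
To make the construction above concrete, here is a small Python sketch of the binary-tree mechanism (illustrative only: the function name and interface are ours, the constants follow the proof rather than being optimized, and numpy is assumed).

    import numpy as np

    def private_cdf(samples, D, eps, delta):
        # Binary-tree CDF estimator sketched in Theorem 1.
        # samples: integers in {1, ..., D}, with D a power of two.
        samples = np.asarray(samples)
        n, d = len(samples), int(np.log2(D))
        sigma = np.sqrt(2 * np.log(D) * np.log(1 / delta)) / (eps * n)

        # Noisy counts f_{l,j}: histograms with bins of width 2^l, for l = 0, ..., d-1.
        noisy = {}
        for l in range(d):
            width = 2 ** l
            for j in range(1, 2 ** (d - l) + 1):
                lo, hi = (j - 1) * width + 1, j * width
                f_lj = np.sum((samples >= lo) & (samples <= hi)) / n
                noisy[(l, j)] = f_lj + np.random.normal(0, sigma)

        # Post-processing: rebuild Phi_X(j) from at most log(D) dyadic intervals,
        # e.g. Phi_X(7) = f_{2,1} + f_{1,3} + f_{0,7} as in (8).
        cdf = np.zeros(D)
        for j in range(1, D + 1):
            total, pos = 0.0, 0  # intervals chosen so far cover {1, ..., pos}
            for l in range(d - 1, -1, -1):
                width = 2 ** l
                if pos + width <= j:
                    total += noisy[(l, pos // width + 1)]
                    pos += width
            cdf[j - 1] = total
        return cdf

Privacy comes entirely from the Gaussian noise added to the {2D-2} counts; turning the noisy counts into a CDF is post-processing, as noted in the proof.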

1.2. Why Restrict the Domain?

A drawback of the estimator we constructed is that it only applies to distributions of finite support {\{1,2,\dots,D\}}, albeit with a relatively mild dependence on the support size. If privacy isn’t a concern, then no such restriction is necessary, as the bound (2) applies equally well to any distribution over {{\mathbb R}}. Can we construct a differentially private estimator for distributions with infinite support?

Perhaps surprisingly, the answer to this question is no! Any differentially private estimator for the CDF of the distribution has to have a rate that depends on the support size, and cannot give non-trivial rates for distributions with infinite support.

Theorem 2 ([BNSV15]) If {\mathcal{P}} consists of all distributions on {\{1,\dots,D\}}, then

\displaystyle \min_{M \in \mathcal{M}_{1, \frac{1}{n}}} \max_{P \in \mathcal{P}} \mathop{\mathbb E}_{X_{1 \cdots n} \sim P}(\| M(X_{1 \cdots n}) - \Phi_{P} \|_{\infty}) = \Omega\left(\frac{\log^* D}{n} \right). \ \ \ \ \ (11)

The notation {\log^* D} refers to the iterated logarithm.

We emphasize that this theorem shouldn’t meet with too much alarm, as {\log^* D} grows remarkably slowly with {D}. There are differentially private CDF estimators that achieve very mild dependence on {D} [BNS13, BNSV15], including one nearly matching the lower bound in Theorem 2. Moreover, if we want to estimate a distribution over {{\mathbb R}}, and are willing to make some mild regularity conditions on the distribution, then we can approximate it by a distribution with finite support and only increase the rate slightly. However, what Theorem 2 shows is that there is no “one-size-fits-all” solution to private CDF estimation that achieves similar guarantees to the empirical CDF. That is, the right algorithm has to be tailored somewhat to the application and the assumptions we can make about the distribution.

2. More Private Statistics

Of course, the story doesn’t end here! There’s a whole wide world of differentially private statistics beyond what we’ve mentioned already. We proceed to survey just a few other directions of study in private statistics.

2.1. Parameter and Distribution Estimation

A number of the early works in differential privacy give methods for differentially private statistical estimation for i.i.d. data. The earliest works [DN03, DN04, BDMN05, DMNS06], which introduced the Gaussian mechanism, among other foundational results, can be thought of as methods for estimating the mean of a distribution over the hypercube {\{0,1\}^d} in the {\ell_\infty} norm. Tight lower bounds for this problem follow from the tracing attacks introduced in [BUV14, DSSUV15, BSU17, SU17a, SU17b]. A very recent work of Acharya, Sun, and Zhang [ASZ20] adapts classical tools for proving estimation and testing lower bounds (lemmata of Assouad, Fano, and Le Cam) to the differentially private setting. Steinke and Ullman [SU17b] give tight minimax lower bounds for the weaker guarantee of selecting the largest coordinates of the mean, which were refined by Cai, Wang, and Zhang [CWZ19] to give lower bounds for sparse mean-estimation problems.

Nissim, Raskhodnikova, and Smith introduced the highly general sample-and-aggregate paradigm, which they apply to several learning problems (e.g., learning mixtures of Gaussians) [NRS07]. Later, Smith [Smi11] showed that this paradigm can be used to transform any estimator for any asymptotically normal, univariate statistic over a bounded data domain into a differentially private one with the same asymptotic convergence rate.

Subsequent work has focused both on relaxing the assumptions in [Smi11], particularly boundedness, and on giving finite-sample guarantees. Karwa and Vadhan investigated the problem of Gaussian mean estimation, proving the first near-optimal bounds for this setting [KV18]. In particular, exploiting concentration properties of Gaussian data allows us to achieve non-trivial results even with unbounded data, which is impossible in general. Following this, Kamath, Li, Singhal, and Ullman moved to the multivariate setting, investigating the estimation of Gaussians and binary product distributions in total variation distance [KLSU19]. In certain cases (i.e., Gaussians with identity covariance), this is equivalent to mean estimation in {\ell_2}-distance, though not always. For example, for binary product distributions, one must estimate the mean in a type of {\chi^2}-distance instead. The perspective of distribution estimation rather than parameter estimation can be valuable. Bun, Kamath, Steinke, and Wu [BKSW19] develop a primitive for private hypothesis selection, which they apply to learn any coverable class of distributions under pure differential privacy. Through the lens of distribution estimation, their work implies an upper bound for mean estimation of binary product distributions that bypasses lower bounds for the same problem in the empirical setting. In addition to work on mean estimation in the sub-Gaussian setting, such as the results discussed earlier, mean estimation has also been studied under weaker moment conditions [BS19, KSU20]. Beyond these settings, there has also been study of estimation of discrete multinomials, including estimation in Kolmogorov distance [BNSV15] and in total variation distance for structured distributions [DHS15], and parameter estimation of Markov Random Fields [ZKKW20].

A different approach to constructing differentially private estimators is based on robust statistics. This approach begins with the influential work of Dwork and Lei [DL09], which introduced the propose-test-release framework and applied it to estimating robust statistics such as the median and interquartile range. While the definitions in robust statistics and differential privacy are semantically similar, formal connections between the two remain relatively scant, which suggests a productive area for future study.

2.2. Hypothesis Testing

An influential work of Homer et al. [HSRDTMPSNC08] demonstrated the vulnerability of classical statistics in a genomic setting, showing that certain {\chi^2}-statistics on many different variables could allow an attacker to determine the presence of an individual in a genome-wide association study (GWAS). Motivated by these concerns, an early line of work from the statistics community focused on addressing these issues [VS09, USF13, YFSU14].

More recently, work on private hypothesis testing can be divided roughly into two lines. The first focuses on the minimax sample complexity, in a line initiated by Cai, Daskalakis, and Kamath [CDK17], who give an algorithm for privately testing goodness-of-fit (more precisely, a statistician might refer to this problem as one-sample testing of multinomial data). A number of subsequent works have essentially settled the complexity of this problem [ASZ18, ADR18], giving tight upper and lower bounds. Other papers in this line study related problems, including the two-sample version of the problem, independence testing, and goodness-of-fit testing for multivariate product distributions [ASZ18, ADR18, ADKR19, CKMUZ19]. A related paper studies the minimax sample complexity of property estimation, rather than testing of discrete distributions, including support size and entropy [AKSZ18]. Other recent works in this vein focus on testing of simple hypotheses [CKMTZ18, CKMSU19]. In particular, [CKMSU19] proves an analogue of the Neyman-Pearson Lemma for differentially private testing of simple hypotheses. A paper of Awan and Slavković [AS18] gives a universally optimal test when the domain size is two; however, Brenner and Nissim [BN14] show that such universally optimal tests cannot exist when the domain has more than two elements. A related problem in this space is private change-point detection [CKMTZ18, CKMSU19, CKLZ19] — in this setting, we are given a time series of datapoints which are sampled from a distribution, which, at some point, changes to a different distribution. The goal is to (privately) determine when this point occurs.

Complementary to minimax hypothesis testing, a line of work [WLK15, GLRV16, KR17, KSF17, CBRG18, SGGRGB19, CKSBG19] designs differentially private versions of popular test statistics for testing goodness-of-fit, closeness, and independence, as well as private ANOVA, focusing on the performance at small sample sizes. Work by Wang et al. [WKLK18] focuses on generating statistical approximating distributions for differentially private statistics, which they apply to hypothesis testing problems.

2.3. Differential Privacy on Graphs

There is a significant amount of work on differentially private analysis of graphs. We remark that these algorithms can satisfy either edge or node differential privacy. The former (easier) guarantee defines a neighboring graph to be one obtained by adding or removing a single edge, while in the latter (harder) setting, a neighboring graph is one that can be obtained by modifying the set of edges connected to a single node. The main challenge in this area is that most graph statistics can have high sensitivity in the worst-case.

The initial works in this area focused on the empirical setting, and goals range from counting subgraphs [KRSY11, BBDS13, KNRS13, CZ13, RS16] to outputting a privatized graph which approximates the original [GRU12, BBDS12, Upa13, AU19, EKKL20]. In contrast to the setting discussed in most of this series, it seems that there are larger qualitative differences between the study of empirical and population statistics due to the fact that many graph statistics have high worst-case sensitivity, but may have smaller sensitivity on typical graphs from many natural models.

In the population statistics setting, recent work has focused on parameter estimation of the underlying random graph model. So far this work has given estimators for the {\beta}-model [KS16] and graphons [BCS15,BCSZ18]. Graphons are a generalization of the stochastic block model, which is, in turn, a generalization of the Erdös-Rényi model. Interestingly, the methods of Lipschitz-extensions introduced in the empirical setting by [BBDS13, KNRS13] are the main tool used in the statistical setting as well. While the first works on private graphon estimation were not computationally efficient, a recent focus has been on obviating these issues for certain important cases, such as the Erdös-Rényi setting [SU19].

Bibliography

[ADKR19] Maryam Aliakbarpour, Ilias Diakonikolas, Daniel M. Kane, and Ronitt Rubinfeld. Private testing of distributions via sample permutations. NeurIPS ’19.

[ADR18] Maryam Aliakbarpour, Ilias Diakonikolas, and Ronitt Rubinfeld. Differentially private identity and closeness testing of discrete distributions. ICML ’18.

[AKSZ18] Jayadev Acharya, Gautam Kamath, Ziteng Sun, and Huanyu Zhang. Inspectre: Privately estimating the unseen. ICML ’18.

[AS18] Jordan Awan and Aleksandra Slavković. Differentially private uniformly most powerful tests for binomial data. NeurIPS ’18.

[ASZ18] Jayadev Acharya, Ziteng Sun, and Huanyu Zhang. Differentially private testing of identity and closeness of discrete distributions. NeurIPS ’18.

[ASZ20] Jayadev Acharya, Ziteng Sun, and Huanyu Zhang. Differentially private Assouad, Fano, and Le Cam. arXiv, 2004.06830, 2020.

[AU19] Raman Arora and Jalaj Upadhyay. On differentially private graph sparsification and applications. NeurIPS ’19.

[BBDS12] Jeremiah Blocki, Avrim Blum, Anupam Datta, and Or Sheffet. The Johnson-Lindenstrauss transform itself preserves differential privacy. FOCS ’12.

[BBDS13] Jeremiah Blocki, Avrim Blum, Anupam Datta, and Or Sheffet. Differentially private data analysis of social networks via restricted sensitivity. ITCS ’13.

[BCS15] Christian Borgs, Jennifer Chayes, and Adam Smith. Private graphon estimation for sparse graphs. NIPS ’15.

[BCSZ18] Christian Borgs, Jennifer Chayes, Adam Smith, and Ilias Zadik. Revealing network structure, confidentially: Improved rates for node-private graphon estimation. FOCS ’18.

[BDMN05] Avrim Blum, Cynthia Dwork, Frank McSherry, and Kobbi Nissim. Practical privacy: The SuLQ framework. PODS ’05.

[BKSW19] Mark Bun, Gautam Kamath, Thomas Steinke, and Zhiwei Steven Wu. Private hypothesis selection. NeurIPS ’19.

[BN14] Hai Brenner and Kobbi Nissim. Impossibility of differentially private universally optimal mechanisms. SIAM Journal on Computing, 43(5), 2014.

[BNS13] Amos Beimel, Kobbi Nissim, and Uri Stemmer. Private learning and sanitization: Pure vs. approximate differential privacy. APPROX-RANDOM ’13.

[BNSV15] Mark Bun, Kobbi Nissim, Uri Stemmer, and Salil Vadhan. Differentially private release and learning of threshold functions. FOCS ’15.

[BS19] Mark Bun and Thomas Steinke. Average-case averages: Private algorithms for smooth sensitivity and mean estimation. NeurIPS ’19.

[BSU17] Mark Bun, Thomas Steinke, and Jonathan Ullman. Make up your mind: The price of online queries in differential privacy. SODA ’17.

[BUV14] Mark Bun, Jonathan Ullman, and Salil Vadhan. Fingerprinting codes and the price of approximate differential privacy. STOC ’14.

[CBRG18] Zachary Campbell, Andrew Bray, Anna Ritz, and Adam Groce. Differentially private ANOVA testing. ICDIS ’18.

[CDK17] Bryan Cai, Constantinos Daskalakis, and Gautam Kamath. Priv’it: Private and sample efficient identity testing. ICML ’17.

[CKLZ19] Rachel Cummings, Sara Krehbiel, Yuliia Lut, and Wanrong Zhang. Privately detecting changes in unknown distributions. arXiv, 1910.01327, 2019.

[CKMSU19] Clément L. Canonne, Gautam Kamath, Audra McMillan, Adam Smith, and Jonathan Ullman. The structure of optimal private tests for simple hypotheses. STOC ’19.

[CKMTZ18] Rachel Cummings, Sara Krehbiel, Yajun Mei, Rui Tuo, and Wanrong Zhang. Differentially private change-point detection. NeurIPS ’18.

[CKMUZ19] Clément L. Canonne, Gautam Kamath, Audra McMillan, Jonathan Ullman, and Lydia Zakynthinou. Private identity testing for high-dimensional distributions. arXiv, 1905.11947, 2019.

[CKSBG19] Simon Couch, Zeki Kazan, Kaiyan Shi, Andrew Bray, and Adam Groce. Differentially private nonparametric hypothesis testing. CCS ’19.

[CWZ19] T. Tony Cai, Yichen Wang, and Linjun Zhang. The cost of privacy: Optimal rates of convergence for parameter estimation with differential privacy. arXiv, 1902.04495, 2019.

[CZ13] Shixi Chen and Shuigeng Zhou. Recursive mechanism: Towards node differential privacy and unrestricted joins. SIGMOD ’13.

[DHS15] Ilias Diakonikolas, Moritz Hardt, and Ludwig Schmidt. Differentially private learning of structured discrete distributions. NIPS ’15.

[DKW56] Aryeh Dvoretzky, Jack Kiefer, and Jacob Wolfowitz. Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. The Annals of Mathematical Statistics, 27(3), 1956.

[DL09] Cynthia Dwork and Jing Lei. Differential privacy and robust statistics. STOC ’09.

[DMNS06] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. TCC ’06.

[DN03] Irit Dinur and Kobbi Nissim. Revealing information while preserving privacy. PODS ’03.

[DN04] Cynthia Dwork and Kobbi Nissim. Privacy-preserving datamining on vertically partitioned databases. CRYPTO ’04.

[DSSUV15] Cynthia Dwork, Adam Smith, Thomas Steinke, Jonathan Ullman, and Salil Vadhan. Robust traceability from trace amounts. FOCS ’15.

[EKKL20] Marek Eliáš, Michael Kapralov, Janardhan Kulkarni, and Yin Tat Lee. Differentially private release of synthetic graphs. SODA ’20.

[GLRV16] Marco Gaboardi, Hyun-Woo Lim, Ryan M. Rogers, and Salil P. Vadhan. Differentially private chi-squared hypothesis testing: Goodness of fit and independence testing. ICML ’16.

[GRU12] Anupam Gupta, Aaron Roth, and Jonathan Ullman. Iterative constructions and private data release. TCC ’12.

[HSRDTMPSNC08] Nils Homer, Szabolcs Szelinger, Margot Redman, David Duggan, Waibhav Tembe, Jill Muehling, John V. Pearson, Dietrich A. Stephan, Stanley F. Nelson, and David W. Craig. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genetics, 4(8), 2008.

[KLSU19] Gautam Kamath, Jerry Li, Vikrant Singhal, and Jonathan Ullman. Privately learning high-dimensional distributions. COLT ’19.

[KNRS13] Shiva Prasad Kasiviswanathan, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. Analyzing graphs with node differential privacy. TCC ’13.

[KR17] Daniel Kifer and Ryan M. Rogers. A new class of private chi-square tests. AISTATS ’17.

[KRSY11] Vishesh Karwa, Sofya Raskhodnikova, Adam Smith, and Grigory Yaroslavtsev. Private analysis of graph structure. VLDB ’11.

[KS16] Vishesh Karwa and Aleksandra Slavković. Inference using noisy degrees: Differentially private β-model and synthetic graphs. The Annals of Statistics, 44(1), 2016.

[KSF17] Kazuya Kakizaki, Jun Sakuma, and Kazuto Fukuchi. Differentially private chi-squared test by unit circle mechanism. ICML ’17.

[KSU20] Gautam Kamath, Vikrant Singhal, and Jonathan Ullman. Private mean estimation of heavy-tailed distributions. arXiv, 2002.09464, 2020.

[KV18] Vishesh Karwa and Salil Vadhan. Finite sample differentially private confidence intervals. ITCS ’18.

[Mas90] Pascal Massart. The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. The Annals of Probability, 18(3), 1990.

[NRS07] Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. Smooth sensitivity and sampling in private data analysis. STOC ’07.

[RS16] Sofya Raskhodnikova and Adam D. Smith. Lipschitz extensions for node-private graph statistics and the generalized exponential mechanism. FOCS ’16.

[Smi11] Adam Smith. Privacy-preserving statistical estimation with optimal convergence rates. STOC ’11.

[SGGRGB19] Marika Swanberg, Ira Globus-Harris, Iris Griffith, Anna Ritz, Adam Groce, and Andrew Bray. Improved differentially private analysis of variance. PETS ’19.

[SU17a] Thomas Steinke and Jonathan Ullman. Between pure and approximate differential privacy. Journal of Privacy and Confidentiality, 7(2), 2017.

[SU17b] Thomas Steinke and Jonathan Ullman. Tight lower bounds for differentially private selection. FOCS ’17.

[SU19] Adam Sealfon and Jonathan Ullman. Efficiently estimating Erdos-Renyi graphs with node differential privacy. NeurIPS ’19.

[Upa13] Jalaj Upadhyay. Random projections, graph sparsification, and differential privacy. ASIACRYPT ’13.

[USF13] Caroline Uhler, Aleksandra Slavković, and Stephen E. Fienberg. Privacy-preserving data sharing for genome-wide association studies. The Journal of Privacy and Confidentiality, 5(1), 2013.

[VS09] Duy Vu and Aleksandra Slavković. Differential privacy for clinical trial data: Preliminary evaluations. ICDMW ’09.

[WKLK18] Yue Wang, Daniel Kifer, Jaewoo Lee, and Vishesh Karwa. Statistical approximating distributions under differential privacy. The Journal of Privacy and Confidentiality, 8(1), 2018.

[WLK15] Yue Wang, Jaewoo Lee, and Daniel Kifer. Revisiting differentially private hypothesis tests for categorical data. arXiv, 1511.03376, 2015.

[YFSU14] Fei Yu, Stephen E. Fienberg, Aleksandra B. Slavković, and Caroline Uhler. Scalable privacy-preserving data sharing methodology for genome-wide association studies. Journal of Biomedical Informatics, 50, 2014.

[ZKKW20] Huanyu Zhang, Gautam Kamath, Janardhan Kulkarni, and Zhiwei Steven Wu. Privately learning Markov random fields. arXiv, 2002.09463, 2020.

A Primer on Private Statistics – Part I

By Gautam Kamath and Jonathan Ullman

Differentially private statistics is a very lively research area, and has seen a lot of activity in the last couple years. While the phrasing is a slight departure from previous work which focused on estimation with worst-case datasets, it turns out that the differences are often superficial. In a short series of blog posts, we hope to educate readers on some of the recent advancements in this area, as well as shed light on some of the connections between the old and the new. We’ll describe the settings, cover a couple of technical examples, and give pointers to some other directions in the area. Thanks to Adam Smith for helping kick off this project, Clément Canonne, Aaron Roth, and Thomas Steinke for helpful comments, and Luca Trevisan for his LaTeX2WP script.

1. Introduction

Statistics and machine learning are now ubiquitous in data analysis. Given a dataset, one immediately wonders what it allows us to infer about the underlying population. However, modern datasets don’t exist in a vacuum: they often contain sensitive information about the individuals they represent. Without proper care, statistical procedures will result in gross violations of privacy. Motivated by the shortcomings of ad hoc methods for data anonymization, Dwork, McSherry, Nissim, and Smith introduced the celebrated notion of differential privacy [DMNS06].

From its inception, some of the driving motivations for differential privacy were applications in statistics and the social sciences, notably disclosure limitation for the US Census. And yet, the lion’s share of differential privacy research has taken place within the computer science community. As a result, the specific applications being studied are often not formulated using statistical terminology, or even as statistical problems. Perhaps most significantly, much of the early work in computer science (though definitely not all) focuses on estimating some property of a dataset rather than estimating some property of an underlying population.

Although the earliest works exploring the interaction between differential privacy and classical statistics go back to at least 2009 [VS09,FRY10], the emphasis on differentially private statistical inference in the computer science literature is somewhat more recent. However, while earlier results on differential privacy did not always formulate problems in a statistical language, statistical inference was a key motivation for most of this work. As a result many of the techniques that were developed have direct applications in statistics, for example establishing minimax rates for estimation problems.

The purpose of this series of blog posts is to highlight some of those results in the computer science literature, and present them in a more statistical language. Specifically, we will discuss:

  • Tight minimax lower bounds for privately estimating the mean of a multivariate distribution over {{\mathbb R}^d}, using the technique of tracing attacks developed in [BUV14,DSSUV15, BSU17, SU17a, SU17b, KLSU19].

     

  • Upper bounds for estimating a distribution in Kolmogorov distance, using the ubiquitous binary-tree mechanism introduced in [DNPR10,CSS11].

In particular, we hope to encourage computer scientists working on differential privacy to pay more attention to the applications of their methods in statistics, and share with statisticians many of the powerful techniques that have been developed in the computer science literature.

 

1.1. Formulating Private Statistical Inference

Essentially every differentially private statistical estimation task can be phrased using the following setup. We are given a dataset {X = (X_1, \dots, X_n)} of size {n}, and we wish to design an algorithm {M \in \mathcal{M}} where {\mathcal{M}} is the class of mechanisms that are both:

  1. differentially private, and
  2. accurate, either in expectation or with high probability, according to some task-specific measure.

A few comments about this framework are in order. First, although the accuracy requirement is stochastic in nature (i.e., an algorithm might not be accurate depending on the randomness of the algorithm and the data generation process), the privacy requirement is worst-case in nature. That is, the algorithm must protect privacy for every dataset {X}, even those we believe are very unlikely.

Second, the accuracy requirement is stated rather vaguely. This is because the notion of accuracy of an algorithm is slightly more nuanced, depending on whether we are concerned with empirical or population statistics. A particular emphasis of these blog posts is to explore the difference (or, as we will see, the lack of a difference) between these two notions of accuracy. The former estimates a quantity of the observed dataset, while the latter estimates a quantity of an unobserved distribution which is assumed to have generated the dataset.

More precisely, the former can be phrased in terms of empirical loss, of the form:

\displaystyle \min_{M \in \mathcal{M}}~\max_{X \in \mathcal{X}}~\mathop{\mathbb E}_M(\ell(M(X), f(X))),

where {\mathcal{M}} is some class of randomized estimators (e.g., differentially private estimators), {\mathcal{X}} is some class of datasets, {f} is some quantity of interest, and {\ell} is some loss function. That is, we’re looking to find an estimator that has small expected loss on any dataset in some class.

In contrast, statistical minimax theory looks at statements about population loss, of the form:

\displaystyle \min_{M \in \mathcal{M}}~\max_{P \in \mathcal{P}}~\mathop{\mathbb E}_{X \sim P, M}(\ell(M(X),f(P))),

where {\mathcal{P}} is some family of distributions over datasets (typically consisting of i.i.d. samples). That is, we’re looking to find an estimator that has small expected loss on random data from any distribution in some class. In particular, note that the randomness in this objective additionally includes the data generating procedure {X \sim P}.

These two formulations are formally very different in several ways. First, the empirical formulation requires an estimator to have small loss on worst-case datasets, whereas the statistical formulation only requires the estimator to have small loss on average over datasets drawn from certain distributions. Second, the statistical formulation requires that we estimate the unknown quantity {f(P)}, and thus necessitates a solution to the non-private estimation problem. On the other hand, the empirical formulation only asks us to estimate the known quantity {f(X)}, and thus if there were no privacy constraint it would always be possible to compute {f(X)} exactly. Third, typically in the statistical formulation, we require that the dataset is drawn i.i.d., which means that we are more constrained when proving lower bounds for estimation than we are in the empirical problem.

However, in practice (more precisely, in the practice of doing theoretical research), these two formulations are more alike than they are different, and results about one formulation often imply results about the other formulation. On the algorithmic side, classical statistical results will often tell us that {\ell(f(X),f(P))} is small, in which case algorithms that guarantee {\ell(M(X),f(X))} is small also guarantee {\ell(M(X),f(P))} is small.

Moreover, typical lower bound arguments for empirical quantities are often statistical in nature. These typically involve constructing some simple “hard distribution” over datasets such that no private algorithm can estimate well on average for this distribution, and thus these lower bound arguments also apply to estimating population statistics for some simple family of distributions. We will proceed to give some examples of estimation problems that were originally studied by computer scientists with the empirical formulation in mind. These results either implicitly or explicitly provide solutions to the corresponding population versions of the same problems—our goal is to spell out and illustrate these connections.

2. DP Background

Let {X = (X_1,X_2,\dots,X_n) \in \mathcal{X}^n} be a collection of {n} samples where each individual sample comes from the domain {\mathcal{X}}. We say that two samples {X,X' \in \mathcal{X}^*} are adjacent, denoted {X \sim X'}, if they differ on at most one individual sample. Intuitively, a randomized algorithm {M}, which is often called a mechanism for historical reasons, is differentially private if the distribution of {M(X)} and {M(X')} are similar for every pair of adjacent samples {X,X'}.

Definition 1 ([DMNS06]) A mechanism {M \colon \mathcal{X}^n \rightarrow \mathcal{R}} is {(\epsilon,\delta)}-differentially private if for every pair of adjacent datasets {X \sim X'}, and every (measurable) {R \subseteq \mathcal{R}}

\displaystyle \mathop{\mathbb P}(M(X) \in R) \leq e^{\epsilon} \cdot \mathop{\mathbb P}(M(X') \in R) + \delta.

 

We let {\mathcal{M}_{\epsilon,\delta}} denote the set of mechanisms that satisfy {(\epsilon,\delta)}-differential privacy.

Remark 1 To simplify notation, and to maintain consistency with the literature, we adopt the convention of defining the mechanism only for a fixed sample size {n}. What this means in practice is that the mechanisms we describe treat the sample size {n} as public information that need not be kept private. While one could define a more general model where {n} is not fixed, it wouldn’t add anything to this discussion other than additional complexity.

Remark 2 In these blog posts, we stick to the most general formulation of differential privacy, so-called approximate differential privacy, i.e. {(\epsilon,\delta)}-differential privacy for {\delta > 0} essentially because this is the notion that captures the widest variety of private mechanisms. Almost all of what follows would apply equally well, with minor technical modifications, to slightly stricter notions of concentrated differential privacy [DR16, BS16, BDRS18], Rényi differential privacy [Mir17], or Gaussian differential privacy [DRS19]. While so-called pure differential privacy, i.e. {(\epsilon,0)}-differential privacy has also been studied extensively, this notion is artificially restrictive and excludes many differentially private mechanisms.

A key property of differential privacy that helps when designing efficient estimators is closure under postprocessing:

Lemma 2 (Post-Processing [DMNS06])  If {M \colon \mathcal{X}^n \rightarrow \mathcal{R}} is {(\epsilon,\delta)}-differentially private and {M' \colon \mathcal{R} \rightarrow \mathcal{R}'} is any randomized algorithm, then {M' \circ M} is {(\epsilon,\delta)}-differentially private.

The estimators we present in this work will use only one tool for achieving differential privacy, the Gaussian Mechanism.

Lemma 3 (Gaussian Mechanism) Let {f \colon \mathcal{X}^n \rightarrow {\mathbb R}^d} be a function and let

\displaystyle \Delta_{f} = \sup_{X\sim X'} \| f(X) - f(X') \|_2

denote its {\ell_2}-sensitivity. The Gaussian mechanism

\displaystyle M(X) = f(X) + \mathcal{N}\left(0 , \frac{2 \log(2/\delta)}{\epsilon^2} \cdot \Delta_{f}^2 \cdot {\mathbb I}_{d \times d} \right)

satisfies {(\epsilon,\delta)}-differential privacy.
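
In code, the Gaussian mechanism is a noise-scale computation followed by a single draw of Gaussian noise. The sketch below is illustrative (the function name and interface are ours, numpy is assumed, and the caller must supply a valid bound on the {\ell_2}-sensitivity):

    import numpy as np

    def gaussian_mechanism(f_of_x, l2_sensitivity, eps, delta):
        # Release f(X) under (eps, delta)-DP by adding Gaussian noise (Lemma 3).
        # l2_sensitivity must upper bound sup ||f(X) - f(X')||_2 over adjacent X, X'.
        f_of_x = np.asarray(f_of_x, dtype=float)
        sigma = l2_sensitivity * np.sqrt(2 * np.log(2 / delta)) / eps
        return f_of_x + np.random.normal(0.0, sigma, size=f_of_x.shape)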

3. Mean Estimation in {{\mathbb R}^d}

Let’s take a dive into the problem of private mean estimation for some family {\mathcal{P}} of multivariate distributions over {{\mathbb R}^d}. This problem has been studied for various families {\mathcal{P}} and various choices of loss function. Here we focus on perhaps the simplest variant of the problem, in which {\mathcal{P}} contains distributions of bounded support {[\pm 1]^d} and the loss is the {\ell_2^2} error. We emphasize, however, that the methods we discuss here are quite versatile and can be used to derive minimax bounds for other variants of the mean-estimation problem.

Note that, by a simple argument, the non-private minimax rate for this class is achieved by the empirical mean, and is

\displaystyle \max_{P \in \mathcal{P}} \mathop{\mathbb E}_{X_{1 \cdots n} \sim P}(\| \overline{X} - \mu\|_2^2) = \frac{d}{n}. \ \ \ \ \ (1)

The main goal of this section is to derive the minimax bound

\displaystyle \min_{M \in \mathcal{M}_{\epsilon,\frac{1}{n}}} \max_{P \in \mathcal{P}} \mathop{\mathbb E}_{X_{1 \cdots n} \sim P}(\| M(X_{1 \cdots n}) - \mu \|_2^2) = \frac{d}{n} + \tilde\Theta\left(\frac{d^2}{\epsilon^2 n^2}\right). \ \ \ \ \ (2)

Recall that {\tilde \Theta(f(n))} refers to a function which is both {O(f(n) \log^{c_1} f(n))} and {\Omega(f(n) \log^{c_2} f(n))} for some constants {c_1, c_2}. The proof of this lower bound is based on robust tracing attacks, also called membership inference attacks, which were developed in a chain of papers [BUV14, DSSUV15, BSU17, SU17a, SU17b, KLSU19]. We remark that this lower bound is almost identical to the minimax bound for mean estimation proven in the much more recent work of Cai, Wang, and Zhang [CWZ19], but it lacks tight dependence on the parameter {\delta}, which we discuss in the following remark.

Remark 3 The choice of {\delta = 1/n} in (2) may look strange at first. For the upper bound this choice is arbitrary—as we will see, we can upper bound the rate for any {\delta > 0} at a cost of a factor of {O(\log(1/\delta))}. The lower bound applies only when {\delta \leq 1/n}. Note that the rate is qualitatively different when {\delta \gg 1/n}. However, we emphasize that {(\epsilon,\delta)}-differential privacy is not a meaningful privacy notion unless {\delta \ll 1/n}. In particular, the mechanism that randomly outputs {\delta n} elements of the sample satisfies {(0,\delta)}-differential privacy. However, when {\delta \gg 1/n}, this mechanism completely violates the privacy of {\gg 1} person in the dataset. Moreover, taking the empirical mean of these {\delta n} samples gives rate {d/\delta n}, which would violate our lower bound when {\delta} is large enough. On the other hand, we would expect the minimax rate to become slower when {\delta \ll 1/n}. This expectation is, in fact, correct, however the proof we present does not give the tight dependence on the parameter {\delta}. See [SU17a] for a refinement that can obtain the right dependence on {\delta}, and [CWZ19] for the details of how to apply this refinement in the i.i.d. setting.

3.1. A Simple Upper Bound

Theorem 4 For every {n \in {\mathbb N}}, and every {\epsilon,\delta > 0}, there exists an {(\epsilon,\delta)}-differentially private mechanism {M} such that

\displaystyle \max_{P \in \mathcal{P}} \mathop{\mathbb E}_{X_{1 \cdots n} \sim P}(\| M(X_{1 \cdots n}) - \mu \|_2^2) \leq \frac{d}{n} + \frac{2 d^2 \log(2/\delta)}{\epsilon^2 n^2}. \ \ \ \ \ (3)

Proof: Define the mechanism

\displaystyle M(X_{1 \cdots n}) = \overline{X} + \mathcal{N}\left(0, \frac{2 d \log(2/\delta)}{\varepsilon^2 n^2} \cdot \mathbb{I}_{d \times d} \right). \ \ \ \ \ (4)

This mechanism satisfies {(\epsilon,\delta)}-differential privacy by Lemma 3, noting that for any pair of adjacent samples {X_{1 \cdots n}} and {X'_{1 \cdots n}}, {\| \overline{X} - \overline{X}'\|_2^2 \leq \frac{d}{n^2}}.

Let {\sigma^2 = \frac{2 d \log(2/\delta)}{\varepsilon^2 n^2}}. Note that since the Gaussian noise has mean {0} and is independent of {\overline{X} - \mu}, we have

\displaystyle \begin{array}{rll} \mathop{\mathbb E}(\| M(X_{1 \cdots n}) - \mu \|_2^2) ={} &\mathop{\mathbb E}(\| \overline{X} - \mu \|_2^2) + \mathop{\mathbb E}(\| M(X_{1 \cdots n}) - \overline{X} \|_2^2 ) \\ \leq{} &\frac{d}{n} + \mathop{\mathbb E}(\| M(X_{1 \cdots n}) - \overline{X} \|_2^2 ) \\ ={} &\frac{d}{n} + \mathop{\mathbb E}(\| \mathcal{N}(0, \sigma^2 \mathbb{I}_{d \times d}) \|_2^2 ) \\ ={} &\frac{d}{n} + \sigma^2 d \\ ={} &\frac{d}{n} + \frac{2 d^2 \log(2/\delta)}{\epsilon^2 n^2}. \end{array}

\Box
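As a sanity check on Theorem 4, the following sketch (again my own illustration, with made-up names and parameter choices) simulates the mechanism in (4) on a product distribution over {\{\pm 1\}^d} and compares its squared error to the bound (3):

```python
import numpy as np

def private_mean(X, epsilon, delta, rng=None):
    """Mechanism (4): empirical mean plus Gaussian noise, for data in [-1, 1]^d."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    sigma2 = 2.0 * d * np.log(2.0 / delta) / (epsilon**2 * n**2)   # per-coordinate variance from (4)
    return X.mean(axis=0) + rng.normal(0.0, np.sqrt(sigma2), size=d)

# Monte Carlo estimate of the error, compared against the right-hand side of (3).
rng = np.random.default_rng(0)
n, d, epsilon, delta = 2_000, 50, 1.0, 1e-6
mu = rng.uniform(-1, 1, size=d)
errs = []
for _ in range(200):
    X = np.where(rng.random((n, d)) < (1 + mu) / 2, 1.0, -1.0)   # coordinate j is +1 w.p. (1 + mu_j) / 2
    errs.append(np.sum((private_mean(X, epsilon, delta, rng) - mu) ** 2))
print(np.mean(errs), d / n + 2 * d**2 * np.log(2 / delta) / (epsilon**2 * n**2))
```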

 

3.2. Minimax Lower Bounds via Tracing

Theorem 5 For every {n, d \in {\mathbb N}}, {\epsilon > 0}, and {\delta < \frac{1}{96 n}}, if {\mathcal{P}} is the class of all product distributions on {\{\pm 1\}^{d}}, then

\displaystyle \min_{M \in \mathcal{M}_{\epsilon,\delta}} \max_{P \in \mathcal{P}} \mathop{\mathbb E}_{X_{1 \cdots n} \sim P,M}(\| M(X_{1 \cdots n}) - \mu \|_2^2) = \Omega\left(\min \left\{ \frac{d^2}{ \epsilon^2 n^2}, d \right\}\right).

 

Note that it is trivial to achieve error {d} for any distribution using the mechanism {M(X_{1 \cdots n}) \equiv 0}, since {\mu \in [\pm 1]^d}. So the theorem says that the error must be {\Omega(d^2/\epsilon^2 n^2)} whenever this quantity is smaller than the trivial error of {d}, i.e., whenever {n \gtrsim \sqrt{d}/\epsilon}.

Tracing Attacks.

Before giving the formal proof, we will try to give some intuition for the high-level strategy. The proof can be viewed as constructing a tracing attack [DSSU17] (sometimes called a membership inference attack) of the following form. There is an attacker who holds the data of some individual {Y} chosen in one of two ways: either {Y} is a random element of the sample {X}, or {Y} is an independent random sample from the population {P}. The attacker is given access to the true distribution {P} and the outcome of the mechanism {M(X)}, and wants to determine which of the two is the case. If the attacker can succeed, then {M} cannot be differentially private. To see why, note that if {Y} is a member of the dataset, then the attacker will say {Y} is in the dataset; but if we consider the adjacent dataset {X'} in which {Y} is replaced with an independent sample from {P}, then the attacker will say {Y} is independent of the dataset. Thus, {M(X)} and {M(X')} cannot be close in the sense required by differential privacy.

Thus, the proof works by constructing a test statistic {Z = Z(M(X),Y,P)} that the attacker can use to distinguish the two possibilities for {Y}. In particular, we show that there is a distribution over populations {P} such that {\mathop{\mathbb E}(Z)} is small when {Y} is independent of {X}, but for every sufficiently accurate mechanism {M}, {\mathop{\mathbb E}(Z)} is large when {Y} is a random element of {X}.
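To make the attack concrete, here is a toy sketch (my own illustration) of the inner-product statistic from the proof, evaluated against a deliberately non-private "mechanism", the empirical mean itself:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 100, 5_000
mu = rng.uniform(-1, 1, size=d)
X = np.where(rng.random((n, d)) < (1 + mu) / 2, 1.0, -1.0)      # sample from a product distribution

M_of_X = X.mean(axis=0)                                          # non-private "mechanism": the empirical mean
Y_in = X[0]                                                      # Y is a random element of the sample
Y_out = np.where(rng.random(d) < (1 + mu) / 2, 1.0, -1.0)        # Y is a fresh, independent draw from P

def Z(output, Y):
    """The test statistic <M(X) - mu, Y - mu> used by the attacker."""
    return np.dot(output - mu, Y - mu)

# The first value is on the order of d/n, while the second concentrates around zero,
# so thresholding Z lets the attacker tell the two cases apart.
print(Z(M_of_X, Y_in), Z(M_of_X, Y_out))
```

The argument below shows that any sufficiently accurate mechanism must exhibit this kind of correlation with its sample points (Claim 3), while differential privacy forces the correlation to be small (Claim 2); the tension between the two yields the lower bound.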

Proof of Theorem 5.

The proof that we present closely follows the one that appears in Thomas Steinke’s Ph.D. thesis [Ste16].

We start by constructing a “hard distribution” over the family of product distributions {\mathcal{P}}. Let {\mu = (\mu^1,\dots,\mu^d) \in [-1,1]^d} consist of {d} independent draws from the uniform distribution on {[-1,1]} and let {P_{\mu}} be the product distribution over {\{\pm 1\}^{d}} with mean {\mu}. Let {X_1,\dots,X_n \sim P_{\mu}} and {X = (X_1,\dots,X_n)}.

Let {M \colon \{\pm 1\}^{n \times d} \rightarrow [\pm 1]^d} be any {(\epsilon,\delta)}-differentially private mechanism and let

\displaystyle \alpha^2 = \mathop{\mathbb E}_{\mu,X,M}(\| M(X) - \mu\|_2^2 ) \ \ \ \ \ (5)

be its expected loss. We will prove the desired lower bound on {\alpha^2}.

For every element {i}, we define the random variables

\displaystyle Z_i = Z_i(M(X),X_i,\mu) = \left\langle M(X) - \mu, X_i - \mu \right\rangle \\ Z'_{i} = Z'_i(M(X_{\sim i}), X_i, \mu) = \left\langle M(X_{\sim i}) - \mu, X_i - \mu \right\rangle,

where {X_{\sim i}} denotes {(X_1,\dots,X'_i,\dots,X_n)}, with {X'_i} an independent sample from {P_\mu}. Our goal will be to show that privacy and accuracy imply, respectively, an upper and a lower bound on {\mathop{\mathbb E}(\sum_i Z_i)} in terms of {\alpha}; combining the two bounds then yields the desired lower bound on {\alpha^2}.

The first claim says that when {X_i} is not in the sample, the statistic {Z'_i} has mean {0} and variance controlled by the expected {\ell_2^2} error of the mechanism.

Claim 1 For every {i}, {\mathop{\mathbb E}(Z'_i) = 0}, {\mathrm{Var}(Z'_i) \leq 4\alpha^2}, and {\|Z'_i\|_\infty \leq 4d}.

Proof: Conditioned on any value of {\mu}, {M(X_{\sim i})} is independent from {X_i}. Moreover, {\mathop{\mathbb E}(X_i - \mu) = 0}, so we have

\displaystyle \begin{array}{rll} &\mathop{\mathbb E}_{\mu,X,M}(\langle M(X_{\sim i}) - \mu, X_i - \mu \rangle) \\ = &\mathop{\mathbb E}_{\mu}(\mathop{\mathbb E}_{X,M}(\langle M(X_{\sim i}) - \mu, X_i - \mu \rangle)) \\ = &\mathop{\mathbb E}_{\mu}(\left\langle \mathop{\mathbb E}_{X,M}(M(X_{\sim i}) - \mu), \mathop{\mathbb E}_{X,M}(X_i - \mu) \right \rangle ) \\ = &\mathop{\mathbb E}_{\mu}(\left\langle \mathop{\mathbb E}_{X,M}(M(X_{\sim i}) - \mu), 0 \right \rangle ) \\ = &0. \end{array}

For the second part of the claim, note that conditioned on {\mu} and {M(X_{\sim i})}, the statistic {Z'_i} has mean {0}, and since every coordinate of {X_i - \mu} is bounded by {2} in absolute value, {\mathrm{Var}(Z'_i) \leq 4 \cdot \mathop{\mathbb E}(\| M(X_{\sim i}) - \mu \|_2^2) = 4\alpha^2}, where the last equality uses that {X_{\sim i}} has the same distribution as {X}. The final part of the claim follows from the fact that every entry of {M(X_{\sim i}) - \mu} and of {X_i - \mu} is bounded by {2} in absolute value, and {Z'_i} is a sum of {d} products of such entries, so its absolute value is always at most {4d}. \Box

The next claim says that, because {M} is differentially private, {Z_i} has similar expectation to {Z'_i}, and thus its expectation is also small.

Claim 2 {\mathop{\mathbb E}(\sum_{i=1}^{n} Z_i) \leq 4n\alpha \epsilon + 8n \delta d.}

Proof: The proof is a direct calculation using the following inequality, whose proof is relatively simple using the definition of differential privacy:

\displaystyle \mathop{\mathbb E}(Z_i) \leq \mathop{\mathbb E}(Z'_i) + 2\epsilon \sqrt{\mathrm{Var}(Z'_i)} + 2\delta \| Z'_i \|_\infty.

Given the inequality and Claim 1, we have

\displaystyle \mathop{\mathbb E}(Z_i) \leq 0 + (2\epsilon)(2\alpha) + (2\delta)(4d) = 4\epsilon \alpha + 8 \delta d .

The claim now follows by summing over all {i}. \Box

The final claim says that, because {M} is accurate, the expected sum of the random variables {Z_i} is large.

Claim 3 {\mathop{\mathbb E}(\sum_{i=1}^{n} Z_i) \geq \frac{d}{3} - \alpha^2.}

The proof relies on the following key lemma, whose proof we omit.

Lemma 6 (Fingerprinting Lemma [BSU17])  If {\mu \in [\pm 1]} is sampled uniformly, {X_1,\dots,X_n \in \{\pm 1\}} are sampled independently with mean {\mu}, and {f \colon \{\pm 1\}^n \rightarrow [\pm 1]} is any function, then

\displaystyle \mathop{\mathbb E}_{\mu,X}((f(X) - \mu) \cdot \sum_{i=1}^{n} (X_i - \mu)) \geq \frac{1}{3} - \mathop{\mathbb E}_{\mu,X}((f(X) - \mu)^2).

 

The lemma is somewhat technical, but for intuition, consider the case where {f(X) = \frac{1}{n}\sum_{i} X_i} is the empirical mean. In this case we have

\displaystyle \begin{array}{rcl} \mathop{\mathbb E}_{\mu,X}((f(X) - \mu) \cdot \sum_{i=1}^n (X_i - \mu)) ={} \mathop{\mathbb E}_{\mu}(\frac{1}{n} \sum_i \mathop{\mathbb E}_{X}( (X_i - \mu)^2) ) ={} \mathop{\mathbb E}_{\mu}(\mathrm{Var}(X_i)) = \mathop{\mathbb E}_{\mu}(1 - \mu^2) = \frac{2}{3}. \end{array}

The lemma says that, when {\mu} is sampled this way, any modification of {f} that reduces the correlation between {f(X)} and {\sum_i X_i} must increase the mean-squared error of {f} proportionally.
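For readers who want to check this intuition numerically, here is a tiny Monte Carlo sketch (my own illustration) of the left-hand side of the lemma with {f} the empirical mean; the average should come out close to {2/3}:

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 50, 100_000
mu = rng.uniform(-1, 1, size=trials)                                       # mu ~ Uniform([-1, 1]), one per trial
X = np.where(rng.random((trials, n)) < (1 + mu[:, None]) / 2, 1.0, -1.0)   # X_i in {-1, +1} with mean mu
f = X.mean(axis=1)                                                         # f(X) = empirical mean
lhs = (f - mu) * np.sum(X - mu[:, None], axis=1)
print(lhs.mean())   # concentrates near E[1 - mu^2] = 2/3
```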

We now prove Claim 3.

Proof: We apply the lemma to each coordinate of the estimate {M(X)}. (Strictly speaking, {M^j(X)} depends on all the columns of {X}, but since the columns are independent, we can apply the lemma to the {j}-th column after conditioning on the remaining columns and on {M}'s internal randomness.)

\displaystyle \begin{array}{rll} \mathop{\mathbb E}(\sum_{i=1}^{n} Z_i) ={} &\mathop{\mathbb E}(\sum_{i=1}^{n} \left\langle M(X) - \mu, X_i - \mu \right\rangle) \\ ={} &\sum_{j=1}^{d} \mathop{\mathbb E}((M^j(X) - \mu^j)\cdot \sum_{i=1}^{n} (X_i^j - \mu^j)) \\ \geq{} &\sum_{j=1}^{d} \left( \frac{1}{3} - \mathop{\mathbb E}((M^j(X) - \mu^j)^2) \right) \\ ={} &\frac{d}{3} - \mathop{\mathbb E}(\| M(X) - \mu \|_2^2) ={} \frac{d}{3} - \alpha^2. \end{array}

The inequality is Lemma 6. \Box

Combining Claims 2 and 3 gives

\displaystyle \frac{d}{3} - \alpha^2 \leq 4n\alpha \epsilon + 8n \delta d. \ \ \ \ \ (6)

Now, if {\alpha^2 \geq \frac{d}{6}} then we’re done, so we’ll assume that {\alpha^2 \leq \frac{d}{6}}. Further, by our assumption on the value of {\delta}, {8n \delta d \leq \frac{d}{12}}. In this case we can rearrange terms and square both sides to obtain

\displaystyle \alpha^2 \geq{} \frac{1}{16 \epsilon^2 n^2} \left(\frac{d}{3} - \alpha^2 - 8 n\delta d\right)^2 \geq \frac{1}{16 \epsilon^2 n^2} \left(\frac{d}{12}\right)^2 = \frac{d^2}{2304 \epsilon^2 n^2}. \ \ \ \ \ (7)

Combining the two cases for {\alpha^2} gives {\alpha^2 \geq \min\{ \frac{d}{6}, \frac{d^2}{2304 \epsilon^2 n^2} \}}, as desired.

Bibliography

[BDRS18] Mark Bun, Cynthia Dwork, Guy N. Rothblum, and Thomas Steinke. Composable and versatile privacy via truncated CDP. STOC ’18.

[BS16] Mark Bun and Thomas Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds. TCC ’16-B.

[BSU17] Mark Bun, Thomas Steinke, and Jonathan Ullman. Make up your mind: The price of online queries in differential privacy. SODA ’17.

[BUV14] Mark Bun, Jonathan Ullman, and Salil Vadhan. Fingerprinting codes and the price of approximate differential privacy. STOC ’14.

[CSS11] T-H Hubert Chan, Elaine Shi, and Dawn Song. Private and continual release of statistics. ACM Transactions on Information and System Security, 14(3):26, 2011.

[CWZ19] T. Tony Cai, Yichen Wang, and Linjun Zhang. The cost of privacy: Optimal rates of convergence for parameter estimation with differential privacy. arXiv, 1902.04495, 2019.

[DMNS06] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. TCC ’06.

[DNPR10] Cynthia Dwork, Moni Naor, Toniann Pitassi, and Guy N. Rothblum. Differential privacy under continual observation. STOC ’10.

[DR16] Cynthia Dwork and Guy N. Rothblum. Concentrated differential privacy. arXiv, 1603.01887, 2016.

[DRS19] Jinshuo Dong, Aaron Roth, and Weijie J. Su. Gaussian differential privacy. arXiv, 1905.02383, 2019.

[DSSU17] Cynthia Dwork, Adam Smith, Thomas Steinke, and Jonathan Ullman. Exposed! A survey of attacks on private data. Annual Review of Statistics and Its Application, 4:61–84, 2017.

[DSSUV15] Cynthia Dwork, Adam Smith, Thomas Steinke, Jonathan Ullman, and Salil Vadhan. Robust traceability from trace amounts. FOCS ’15.

[FRY10] Stephen E. Fienberg, Alessandro Rinaldo, and Xiaolin Yang. Differential privacy and the risk-utility tradeoff for multi-dimensional contingency tables. PSD ’10.

[KLSU19] Gautam Kamath, Jerry Li, Vikrant Singhal, and Jonathan Ullman. Privately learning high-dimensional distributions. COLT ’19.

[Mir17] Ilya Mironov. Rényi differential privacy. CSF ’17.

[Ste16] Thomas Alexander Steinke. Upper and Lower Bounds for Privacy and Adaptivity in Algorithmic Data Analysis. PhD thesis, 2016.

[SU17a] Thomas Steinke and Jonathan Ullman. Between pure and approximate differential privacy. Journal of Privacy and Confidentiality, 7(2), 2017.

[SU17b] Thomas Steinke and Jonathan Ullman. Tight lower bounds for differentially private selection. FOCS ’17.

[VS09] Duy Vu and Aleksandra Slavković. Differential privacy for clinical trial data: Preliminary evaluations. ICDMW ’09.

ICALP (and LICS) 2020 – Relocation and Extended Deadline

Due to the Wuhan coronavirus outbreak, the organizers of ICALP and LICS have made the difficult decision to relocate both (co-located) conferences from Beijing, China, to Saarbrücken, Germany. Speaking specifically about ICALP now (I do not have further information about LICS): As a result of previous uncertainty regarding the situation, the deadline has been extended by about six days, until Tuesday February 18, 2020, at 6 AM GMT. The dates of the conference remain (roughly) the same, July 8 – 11, 2020.
The following is a more official message from ICALP Track A Chair, Artur Czumaj.


The ICALP and the LICS steering committee have agreed together with the conference chairs in Beijing to relocate the two conferences.
ICALP and LICS 2020 will take place in Saarbrücken, Germany, July 8 – 11 2020 (with satellite workshops on July 6 – 7 2020).
The deadline is extended, see below.

Call for Papers – ICALP 2020
July 8 – 11 2020, Saarbrücken, Germany

NEW Paper submission deadline: Tuesday February 18, 2020, 6am GMT
https://easychair.org/conferences/?conf=icalp2020

ICALP (International Colloquium on Automata, Languages and Programming) is the main European conference in Theoretical Computer Science and annual meeting of the European Association for Theoretical Computer Science (EATCS). ICALP 2020 will be hosted on the Saarland Informatics Campus in Saarbrücken, in co-location with LICS 2020 (ACM/IEEE Symposium on Logic in Computer Science).

Invited speakers:
Track A: Virginia Vassilevska Williams (MIT), Robert Krauthgamer (Weizmann)
Track B: Stefan Kiefer (Oxford)
Joint ICALP-LICS: Andrew Yao (Tsinghua), Jérôme Leroux (Bordeaux)

Submission Guidelines: see https://easychair.org/conferences/?conf=icalp2020

NEW Paper submission deadline: February 18, 2020, 6am GMT
notifications: April 15, 2020
camera ready: April 28, 2020

Topics: ICALP 2020 will have the two traditional tracks
A (Algorithms, Complexity and Games – including Algorithmic Game Theory, Distributed Algorithms and Parallel, Distributed and External Memory Computing) and
B (Automata, Logic, Semantics and Theory of Programming).
    (Notice that the old tracks A and C have been merged into a single track A.)
Papers presenting original, unpublished research on all aspects of theoretical computer science are sought.

Typical, but not exclusive topics are:

Track A — Algorithmic Aspects of Networks and Networking, Algorithms for Computational Biology, Algorithmic Game Theory, Combinatorial Optimization, Combinatorics in Computer Science, Computational Complexity, Computational Geometry, Computational Learning Theory, Cryptography, Data Structures, Design and Analysis of Algorithms, Foundations of Machine Learning, Foundations of Privacy, Trust and Reputation in Networks, Network Models for Distributed Computing, Network Economics and Incentive-Based Computing Related to Networks, Network Mining and Analysis, Parallel, Distributed and External Memory Computing, Quantum Computing, Randomness in Computation, Theory of Security in Networks

Track B — Algebraic and Categorical Models, Automata, Games, and Formal Languages, Emerging and Non-standard Models of Computation, Databases, Semi-Structured Data and Finite Model Theory, Formal and Logical Aspects of Learning, Logic in Computer Science, Theorem Proving and Model Checking, Models of Concurrent, Distributed, and Mobile Systems, Models of Reactive, Hybrid and Stochastic Systems, Principles and Semantics of Programming Languages, Program Analysis and Transformation, Specification, Verification and Synthesis, Type Systems and Theory, Typed Calculi

PC Track A chair: Artur Czumaj (University of Warwick)
PC Track B chair: Anuj Dawar (University of Cambridge)

Contact
All questions about submissions should be emailed to the PC Track chairs:
Artur Czumaj: A.Czumaj@warwick.ac.uk
Anuj Dawar: Anuj.Dawar@cl.cam.ac.uk