In recent years, Differential Privacy (DP) has been widely adopted in practical applications with the likes of the U.S. Census, Apple, and Google, including DP in their data-centric products. In a nutshell, DP is a privacy technique that can be used to compute answers for data driven questions (e.g., “how many Snapchatters are located in LA?”) without revealing the data of any individual person. More specifically, DP promises that observing the output of a data query is not enough to fully reason whether a particular individual’s data was used in the computation of that query or not.
Let’s illustrate with an example, again consider the query: “how many Snapchatters are located in LA?”, the input data is the Snapchat user base, and the individual whose privacy we want to protect is Bob. DP promises that the output will be practically identical irrespective of whether Bob is in the database or not. In fact, the output will be similar regardless of what the input data looked like or which particular user we want to protect. To achieve this, DP adds carefully calibrated noise to the calculation of the task. To outsiders, the exact value of the noise is unknown and as such the output will be indistinguishable. In practical terms, curious individuals who try to reverse engineer the input from the output will never be 100% certain about their findings, since the noise addition makes many options similarly likely.
In more detail, DP offers a mathematical bound on the information gain after an adversary accesses the output of a DP mechanism, this bound is due to the randomness of the mechanism and can be fine tuned to fit the privacy needs. Additionally, DP offers properties like robustness to post processing and protection against general adversaries, that make it particularly appealing for real world applications. Last, and most importantly, DP can be customized to fit the particular needs of different use cases (e.g., location data, social connections, etc).
Differential Privacy at Snap
In the remainder of this article we discuss two major use cases where we use DP for protecting against information leakages to curious users. The first involves generating friend recommendations, where we use DP in order to enable Snapchatters to create new friendships without revealing anything specific about their existing relationships to other users. Our second use case is centered around Snapchatters visitations to places on the Snap Map. Again we use DP to power our place recommendations engine without leaking a user’s individual place visitations to their friends.
DP for Friending Recommendations at Snap
Social connections between users are powerful data points for generating high quality friend recommendations to all Snapchat users. For instance, consider Bob who recently joined Snapchat and uploaded his contact book information in order to easily find other friends of his on the platform. Similarly, Alice, a veteran Snapchat user, receives a friend request from Eve, and wants to see how many friends they have in common before she reciprocates the friend request.
At Snap we want to safely and responsibly use our knowledge of the social graph in scenarios like the above to bring value to our users without compromising on our core privacy values. More specifically, Snapchat is one of the few social media applications where social connections are visible only bi-directionally between connected users – Snapchat profiles do not offer visibility of another user’s friend network, or a ‘friends with’ functionality. Whether Alice and Bob are Snapchat friends, or they have each other’s phone numbers, is not something other Snapchat users should learn. Towards that goal, we have added layers of custom Differentially Private mechanisms before we use social connections in our friending related products. Two examples of DP implementation we examine in this article are friend recommendations and mutual friend count estimation.
Friend recommendations on Snapchat are presented in the “Add Friends” page, where the social network plays an important role in generating effective friend recommendations. After all, users are more likely to engage with friends of friends or otherwise folks who are socially close to them. As such, we can use social proximity as a signal for establishing a new relationship between Snapchatters. However, a careless usage of social proximity for generating friend recommendations can enable curious users to reverse engineer the friendships of their friends. For instance, say that Bob is a new user and only has Alice as a Snapchat friend, then in a world without any privacy protections, if Bob sees Eve as a new recommendation and if no other information were used to determine whether to recommend Eve as a friend, he can assume that Eve and Alice are friends.
To combat such reverse engineering scenarios, Snap has implemented DP safeguards around the usage of the social graph for generating friend recommendations. In particular, we invented a novel variant of Differential Privacy that perturbs the underlying social graph before generating friend recommendations. Our mechanism includes a stochastic graph traversal technique that creates new edges and drops existing edges between users. Only after this step is complete can we generate friend recommendations for each Snapchatter. Then, we can show that for all friend recommendations the certainty of their social proximity from the user’s perspective is bounded. Essentially, this sampling technique changes the underlying graph by just enough so that everyday users will not be able to easily tell who is friends with whom just by merit of observing the recommendations.
Another scenario where we use DP is for showing the ‘mutual friend count’ between two users. Again, consider the “Add Friends” page, where Alice receives tens of different friend recommendations. From Alice’s point of view it would be extremely helpful to include a label showing the number of mutual friends they share with each recommendation. Alice will feel significantly safe and more confident sending friend requests to people that are already in her social circles. However, showing the exact count can also leak friendship information to other users. Consider Bob a new Snapchat user who has two friends: Alice and Eve. If Bob sees that he shares “2 mutual friends in common” with Nick, then he can quickly reason that Nick is friends with both Alice and Eve – thus leaking 2 friendships at once!
To safeguard against this potential leakage while at the same time enjoying the benefits of labeling we once more turn to the help of Differential Privacy. Again we create a custom DP mechanism which adds noise to the total count of mutual friends between two users before we show this count. Similar to our previous application, the overall mechanism ensures that a user cannot identify with certainty who the mutual friends are just by merit of seeing the count – as the exact value of the noise is unknown to the user seeing the counts.
DP for Place Visitations
Snap Map’s goal is to be the world’s most personal Map, where users can see what their friends and family are up to, and explore the world around them. We want the Snap Map to feel alive and showcase the best places for each user individually.
One of the ways we can create better personalized experiences is to utilize the Map activity of Snapchatters to create unique and powerful Place recommendations. For instance, consider Eve, Bob, and Alice who are all friends with each other; if Eve and Alice both visit a new local cafe and tag the location in their stories, then this cafe might be a good recommendation for Bob as well. Using friend signals to boost place recommendations allows Snapchatters to receive place recommendations outside of the standard methods of “people who liked X also like Y”. Instead, they enable Snapchatters to experience something completely new, and discover hidden gems in a new city.
At Snap, we want Snapchatters to feel safe navigating the map, tagging, and favoriting places without fear of unintentionally revealing their visitation history. More specifically, a place visitation of any user should not be indirectly revealed to their friends, unless explicitly shared with them. For instance, let’s assume that Mohammad visited a nightclub in Baltimore, then this information should not be shared with Mohammad’s friends unless he explicitly decides to do so (e.g., tagging a venue in their Stories, sharing their location, etc). At Snap, we use layers of DP algorithms before we consume place visitations data in order to unlock powerful recommendations while keeping Snapchatter’s data private even when it is indirectly used.
On a high level, we want to use friends’ visitation signals in order to power our place recommendation engine. At the same time, and similar to the friend recommendation use case, we want to provide plausible deniability on single place visitations for any single user, such that curious users won’t be able to reverse engineer their friends’ visitations. Following our previous example, if Eve observes the Baltimore nightclub higher on her places recommendation list, then she shouldn’t infer that it was Mohammad who visited that place. In order to protect against such scenarios, we have designed a series of privacy mechanisms before accessing visitations data for powering our place recommendation engine. Among others, these mechanisms include strong user controls (e.g., location sharing settings) and expiration of visitations.
Naturally, at the heart of our safeguards lies Differential Privacy. More specifically, the place recommendation engine works by first computing the counts of friend visitations for each unique place, then using these visitations to rank places for a single user. For example, when creating recommendations for Bob we would first compute the visitation counts: <“5 of Bob’s friends visited Ocean Park”, “3 of Bob’s friends visited Griffith Park”, …>. Although visitation counts are among several other signals used for ranking places, we further obscure them using DP. Specifically, we add carefully calibrated noise to the true visitation counts and only use the noisy results for ranking. The exact amount of noise is random and unrecoverable, which offers plausible deniability for place visitations, i.e., a place can be ranked higher due to noise or an actual visitation. As such, our DP protections make it difficult for users to recover the unique visitations of their friends, even if they have an adversarial intent.