Privacy-First Design for Activist Tools
One of the most common requests I get from organizations is for analytics—they want to understand patterns in their data, track trends over time, and make evidence-based decisions. The challenge is that the data they're working with is often sensitive, involving at-risk individuals whose privacy we must protect absolutely.
This tension between utility and privacy is solvable. Here's how I approach it.
The Problem with Traditional Analytics
Traditional analytics systems are designed to provide detailed insights. They store individual records, allow drilling down into specific cases, and generate reports that can identify patterns. This is great for understanding your data, but terrible for protecting individuals.
Even "anonymized" data is often vulnerable to re-identification. If you know someone was at a particular event on a particular day, you can often identify them in a dataset that includes time and location information, even without names.
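To make the re-identification risk concrete, here is a minimal sketch of that kind of lookup. The records, field names, and the `matching_records` helper are all invented for illustration; the point is only that a date and a location can act as a fingerprint once names are removed.

```python
# Hypothetical "anonymized" check-in records: names are gone, but date and
# location remain. All values here are invented for illustration.
records = [
    {"record_id": "r1", "date": "2024-03-02", "location": "Community Hall", "need": "legal aid"},
    {"record_id": "r2", "date": "2024-03-02", "location": "Library", "need": "housing"},
    {"record_id": "r3", "date": "2024-03-09", "location": "Community Hall", "need": "medical"},
    {"record_id": "r4", "date": "2024-03-09", "location": "Library", "need": "legal aid"},
]


def matching_records(records, date, location):
    """Return every record consistent with what an observer already knows."""
    return [r for r in records if r["date"] == date and r["location"] == location]


# An observer who saw someone at the Community Hall on 2 March needs no name.
matches = matching_records(records, "2024-03-02", "Community Hall")
if len(matches) == 1:
    print("Unique match, sensitive field exposed:", matches[0]["need"])
```

Real datasets are larger, but each extra attribute (time of day, neighborhood, device type) tends to make unique matches more likely.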
Differential Privacy: A Better Approach
Differential privacy is a mathematical framework that provides provable privacy guarantees. The basic idea is to add carefully calibrated noise to aggregate statistics, so that the published results look nearly the same whether or not any one individual's data was included in the analysis. Someone reading the results cannot tell with any confidence who was in the dataset.
For example, if you're counting how many people attended an event, instead of reporting the exact number (say, 47), you might report 47 plus or minus some random noise (maybe 45 or 49). Any individual's contribution to that count is hidden by the noise.
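One common way to add that noise is the Laplace mechanism. The sketch below is an illustration under stated assumptions, not a production implementation: the function names and the `epsilon=0.5` value are my own choices, and it assumes each person changes the count by at most 1. Naive floating-point noise sampling has known weaknesses, so a real deployment should use a vetted differential privacy library.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws, each with mean
    `scale`, is Laplace-distributed with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> int:
    """Release a count with Laplace noise calibrated to sensitivity 1."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    sensitivity = 1.0  # assumption: one attendee changes the count by at most 1
    noisy = true_count + laplace_noise(sensitivity / epsilon, rng)
    # Rounding and clamping only post-process the noisy value, so they do not
    # weaken the privacy guarantee.
    return max(0, round(noisy))


rng = random.SystemRandom()  # OS-backed randomness instead of a seeded generator
print(noisy_count(47, epsilon=0.5, rng=rng))
```

A smaller `epsilon` gives a larger noise scale, so the reported number drifts further from 47 on average.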
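The math behind this is elegant. The guarantee is controlled by a privacy parameter, usually called epsilon: the smaller it is, the less the published output can depend on any one person's data. For a small epsilon you can prove that an adversary learns almost nothing about any individual from the output, even with unlimited computational resources and access to all other information.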
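For reference, the guarantee is usually stated as follows. The notation is the standard one from the differential privacy literature rather than anything specific to this post: a randomized mechanism M is epsilon-differentially private if, for every pair of datasets D and D' that differ in one person's data, and every set S of possible outputs,

```latex
\[
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S]
\]
```

When epsilon is close to zero, the factor e^epsilon is close to 1, so the two output distributions are nearly indistinguishable.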
Practical Implementation
Here's how I typically implement privacy-preserving analytics:
Define the Queries First: Work with the organization to identify exactly what questions they need answered. Every query spends part of a fixed "privacy budget," so we need to be intentional about what to measure.
On-Device Aggregation: Where possible, perform initial aggregation on users' devices before any data leaves. This limits what's ever transmitted or stored.
Noise Calibration: Choose privacy parameters based on the sensitivity of the population and the consequences of potential exposure. Higher-risk situations need more privacy, which means a smaller epsilon and therefore more noise.
Audit Logging: Track all queries against the data with their privacy costs. This creates accountability and makes it possible to refuse further queries once the budget is spent, so the cumulative privacy loss never silently exceeds what was agreed.
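The four steps above can be sketched together in a few dozen lines. Everything here is a hypothetical illustration: the `Query` and `Accountant` classes, the event kinds, and the budget values are assumptions of mine, not part of any particular library. It uses basic sequential composition, where the epsilon costs of answered queries simply add up, and it caps each device's contribution at 1 so that the sensitivity assumption holds.

```python
from __future__ import annotations

import random
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Query:
    """Step 1: a question agreed up front, with its privacy cost."""
    name: str
    epsilon: float  # charged against the budget each time the query runs


def summarize_on_device(raw_events: list[str]) -> Counter:
    """Step 2: reduce raw events to per-kind counts before anything is sent."""
    return Counter(raw_events)


def total_across_devices(summaries: list[Counter], kind: str) -> int:
    """Count devices reporting `kind`, capping each device's contribution at 1."""
    return sum(min(s[kind], 1) for s in summaries)


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


@dataclass
class Accountant:
    """Step 4: charge every query against a fixed budget and log the outcome."""
    total_epsilon: float
    spent: float = 0.0
    log: list[tuple[str, float, str]] = field(default_factory=list)

    def charge(self, query: Query) -> None:
        if self.spent + query.epsilon > self.total_epsilon:
            self.log.append((query.name, query.epsilon, "refused"))
            raise PermissionError(f"budget exhausted, refusing {query.name!r}")
        self.spent += query.epsilon
        self.log.append((query.name, query.epsilon, "answered"))


def answer(query: Query, true_count: int, accountant: Accountant,
           rng: random.Random) -> int:
    """Step 3: charge the budget, then add noise with scale sensitivity / epsilon."""
    accountant.charge(query)
    sensitivity = 1.0  # holds because each device contributes at most 1
    return round(true_count + laplace_noise(sensitivity / query.epsilon, rng))


accountant = Accountant(total_epsilon=1.0)
attendance = Query("event_attendance", epsilon=0.5)
devices = [
    summarize_on_device(["checkin"]),
    summarize_on_device(["checkin", "checkin"]),
    summarize_on_device(["report"]),
]
true_count = total_across_devices(devices, "checkin")
print(answer(attendance, true_count, accountant, random.SystemRandom()))
print(accountant.log)
```

With a total budget of 1.0 and a cost of 0.5 per run, this query can be answered twice; a third attempt is logged as refused and raises an error instead of quietly eroding the guarantee.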
What You Lose (And What You Keep)
Privacy isn't free—there are real trade-offs. Differential privacy makes it harder to:
But you can still:
For most organizational decision-making, this trade-off is acceptable. You don't need to know exactly which 47 people attended—you need to know whether attendance is growing or shrinking and whether your outreach is working.
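A quick way to see this is to fit a trend line to noisy counts. The attendance figures below are invented, and the epsilon value is only an example; the sketch assumes one noisy count is released per month with sensitivity 1. Keep in mind that each released count spends privacy budget.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def slope(values: list) -> float:
    """Least-squares slope of values against their index (change per period)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


true_attendance = [40, 44, 47, 52, 55, 60]  # hypothetical monthly counts
epsilon = 0.5                               # illustrative value only
rng = random.SystemRandom()
noisy_attendance = [c + laplace_noise(1.0 / epsilon, rng) for c in true_attendance]

print(f"true slope:  {slope(true_attendance):+.2f} per month")
print(f"noisy slope: {slope(noisy_attendance):+.2f} per month")
```

Individual months move by a few people in either direction, but the direction of the trend usually survives because the noise averages out across the series.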
Beyond Technical Solutions
Privacy-first design isn't just about algorithms. It's also about organizational practices:
The best privacy protection is data that was never collected. Before adding any data collection, ask: do we really need this? Is there another way to achieve our goals?
Getting Started
If you're building tools for at-risk communities and want to implement privacy-preserving analytics, here's where to start:
This isn't easy work, but it's essential. The communities we serve deserve tools that help them without putting them at risk. With careful design, we can provide both.