Why Machine Learning Recall Is Different in Trust
This article is part of the “AI, Trust, Privacy, and Responsible AI Tidbits” series.
TL;DR
- Interpreting machine learning (ML) metrics for trust models is fundamentally different from interpreting them in other ML domains.
- Trust models directly influence the distribution of content in an ecosystem, both immediately and over time.
- The adversarial nature of abuse mitigation means that standard metrics like recall must be interpreted in context—lower recall does not always indicate model degradation.
- Prevalence naturally increases over time unless models are continuously improved, making even maintaining a steady prevalence a significant challenge.
Engineers and data scientists working in trust and safety often encounter misunderstandings when collaborating with teams from other ML domains. Conventional assumptions about what makes a good model or how to interpret metrics do not always hold in trust settings. For example:
- A model with declining recall over time might actually be a good sign.
- A model with precision as low as 15% may still be valuable.
- A model with 99% accuracy could be performing poorly.
- Even hundreds of thousands of samples may be insufficient for model training.
- A great model today may become ineffective tomorrow due to adversarial adaptation.
What is Trust & Abuse Detection?
Trust efforts focus on identifying and mitigating harmful content or entities that violate platform policies due to legal, ethical, or product alignment concerns. Examples include:
- Illegal or restricted ads (e.g., guns, drugs, counterfeit goods).
- Impersonation or fraudulent profiles designed to deceive users.
- Discriminatory job postings mentioning protected attributes (e.g., gender, age, ethnicity).
To protect users, trust teams disable or limit the distribution of harmful content and accounts while ensuring minimal friction for compliant users. Many policy violations are unintentional, and users comply once informed. However, a significant portion of bad actors are adversarial, actively attempting to reverse-engineer enforcement mechanisms to evade detection.
At large-scale platforms, manually reviewing every entity is infeasible, potentially requiring hundreds of thousands to millions of human labelers, in addition to full-time staff for training, operations, and quality control. This is where machine learning becomes essential.
Understanding Recall in Trust & Its Impact on Data Distribution
Standard ML Recall (Non-Trust Use Case)
Consider a non-trust ML problem: detecting cat images for recommendations. Recall is defined as:

Recall = true positives / (true positives + false negatives)

or, how many cat images we correctly identify out of all cat images present (including the ones we missed).
Assume 1,000 images per day, with 40 containing cats. If our model correctly detects 30 of them, recall is 75% (30 / 40).
If content distribution (volume of images and percentage of cat images) is stable, recall remains at 75% daily. Model improvements increase recall, while degradation lowers it—a predictable pattern.
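As a quick sketch of the arithmetic above (the function and variable names are mine, purely for illustration):

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Recall = TP / (TP + FN): the share of all cat images we actually caught."""
    return true_positives / (true_positives + false_negatives)

# Numbers from the example above (illustrative only).
cat_images = 40      # images that actually contain cats
detected_cats = 30   # cat images our model correctly flags

print(f"{recall(detected_cats, cat_images - detected_cats):.0%}")  # 75%
```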
How is Trust Different?
Now, suppose we don’t allow cat images on our platform (trust use case), and we use the same model above. If it finds any cat image, we notify the author and delete it from the platform.
Day 1:
- We detect and delete 30 cat images.
- We notify users and educate them about our policy, leading to behavioral changes.
Our recall on day 1 remains at 75%, the same as in the non-trust scenario. However, we also need to measure how much bad content remains on the platform; this is where prevalence comes in. Prevalence is defined as the proportion of live content that is still bad (cat images) after enforcement. It is calculated as:

Prevalence = bad content still live / total content still live

In simpler terms, it represents the percentage of live content that is bad because we failed to detect and remove it. In this case (see image 1), 10 out of 960 live images contain cats, giving us a prevalence of about 1%, a significant improvement compared to the 4% we would have without the model.
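As a minimal sketch using the day-1 numbers above (variable names are mine, not from any production system):

```python
def prevalence(bad_live: int, total_live: int) -> float:
    """Share of content still live on the platform that violates policy."""
    return bad_live / total_live

# Day-1 numbers from the example above (illustrative only).
remaining_cats = 10      # cat images our model missed, still live
remaining_images = 960   # total images still live after enforcement

print(f"{prevalence(remaining_cats, remaining_images):.1%}")  # ~1.0%
print(f"{prevalence(40, 1_000):.1%}")                         # ~4.0% with no model at all
```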
Day 2:
- Some users comply and replace their cat pictures with other pictures due to our education and enforcement.
- Other users ignore the guidance and continue posting cat pictures (8 new cat pictures in this example).
- Users whose cat images we missed the previous day create similar ones (10 new "hard to classify" cat pictures in this example).
- However, the total number of new cat images declines (from 40 to 18).
- Our model's per-image performance is unchanged: it still catches 75% of the ordinary new cat images, but continues to miss the "hard to classify" type.
Our recall on day 2 drops to 33% (6 out of 18), even though the model's ability to detect cat pictures has not changed. Even with enforcement, prevalence increases slightly (1% → 1.1%), because the model misses additional new cases on top of those already on the platform, which shows how hard it is to reduce prevalence (see image 2).
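To make the day-2 arithmetic concrete, here is one way to reproduce it (a sketch based on my reading of the example: the model catches 75% of the ordinary new cat images and none of the "hard to classify" ones it already misses):

```python
# Day-2 breakdown from the example (the ordinary/hard split is my assumption).
ordinary_new_cats = 8     # from users who ignored the guidance
hard_new_cats = 10        # similar to the images we missed on day 1
model_hit_rate = 0.75     # unchanged per-image performance on ordinary images

caught = round(ordinary_new_cats * model_hit_rate)    # 6
total_new_cats = ordinary_new_cats + hard_new_cats    # 18

print(f"day-2 recall: {caught / total_new_cats:.0%}")  # ~33%, with the exact same model
```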
As with other ML problems, improving our models requires regular feedback loops: understanding false positives and false negatives and retraining on them. Because prevalence in trust is so low, building such a feedback loop via random sampling and human labeling would be prohibitively expensive (roughly 1 valuable label out of every 100 in the example above), so smarter sampling techniques are required, such as relying on member reports or ML-assisted sampling.
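Below is a toy simulation of why that matters; all numbers, the score distribution, and the weighting scheme are made up, but it shows how an ML-assisted (score-weighted) sample can surface far more true violations per human label than uniform random sampling at ~1% prevalence:

```python
import random

random.seed(0)

# Toy setup: ~1% of items violate policy, and a (hypothetical) model score is
# loosely correlated with the true label. Nothing here reflects a real system.
def make_item(i: int) -> dict:
    is_bad = random.random() < 0.01
    score = min(1.0, max(0.0, random.gauss(0.7 if is_bad else 0.2, 0.15)))
    return {"id": i, "bad": is_bad, "score": score}

items = [make_item(i) for i in range(100_000)]
budget = 100  # human labels we can afford

# Uniform random sample vs. a sample weighted by model score (with replacement,
# which is fine for an illustration).
uniform = random.sample(items, k=budget)
weighted = random.choices(items, weights=[it["score"] ** 3 for it in items], k=budget)

print("violations found, uniform sampling:", sum(it["bad"] for it in uniform))
print("violations found, score-weighted  :", sum(it["bad"] for it in weighted))
```

In practice the weighting signal could be a model score, member reports, or any other cheap prior on likely violations; the point is simply that a labeling budget stretches much further when sampling is not uniform.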
Trust is Highly Adversarial
Adversarial users adapt and try new evasion tactics that are harder for our existing model to detect: exotic breeds, obfuscation, modified backgrounds. They probe our service with the explicit objective of evading enforcement.
Day 3:
- Non-adversarial users behave as they did on day 2.
- Adversarial users create additional new, hard-to-detect cat images (20 in this example).
- The total number of cat images increases.
- Our model still catches 75% of the new (including adversarial) cat images, i.e., it generalizes well to new patterns; in reality, its performance on adversarial images would likely be lower.
Even though our model performs just as well as on previous days, 4 additional cat images bypass our review; recall rises to 55% (21 out of 38), but prevalence worsens (1.1% → 1.3%). These numbers would be even worse if our model did not generalize well to new adversarial patterns.
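The day-3 numbers can be reproduced the same way (again a sketch under my assumption that the model catches 75% of the ordinary and adversarial new images but none of the "hard to classify" ones):

```python
# Day-3 breakdown from the example (the split is my assumption).
ordinary_new_cats = 8
hard_new_cats = 10
adversarial_new_cats = 20
model_hit_rate = 0.75

caught = round((ordinary_new_cats + adversarial_new_cats) * model_hit_rate)  # 21
total_new_cats = ordinary_new_cats + hard_new_cats + adversarial_new_cats    # 38

print(f"day-3 recall: {caught / total_new_cats:.0%}")  # ~55%, same model as before
```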
This illustrates the arms race in trust ML: bad actors actively probe our defenses, which forces continuous adaptation of our models. More importantly:
- A drop in recall does not necessarily mean the model is failing—it may reflect changes in content distribution due to enforcement.
- Since both users and adversarial actors continuously evolve, prevalence tends to increase unless models improve continuously. Reducing prevalence is far more difficult than increasing recall in a stable ML environment.
Summary
Because ML classification in trust directly affects both existing and future content in our ecosystem and operates in an adversarial environment, metrics like recall must be interpreted carefully rather than assuming that a decrease implies model degradation. Unlike standard ML problems, recall in trust fluctuates due to enforcement effects—a drop may indicate fewer violations due to compliance rather than a failing model.
Additionally, external factors such as health crises or elections can shift abuse patterns, necessitating continuous model updates, auto-retraining, or self-learning. Prevalence, a key metric, naturally increases unless ML models evolve continuously; even maintaining steady prevalence is a challenge. Since adversaries actively adapt, stagnant models lead to rising abuse, making continuous adaptation essential to stay ahead.
Thanks to Jenelle Bray and Emanuel Strauss for reviewing and providing valuable feedback.
Disclaimers: The views here are entirely my own and do not reflect any company positions or confidential information. Banner image AI-generated.