Learning from the Frontlines: My First LLM Threat Modeling Experience
Courtesy of @Lakera Gandalf AI Prompt Injection Training

Today I want to share what I learned from my first time threat modeling LLMs, thanks to a session hosted by Ladies of London Hacking Society and led by Jeff Watkins. It gave me a new perspective and a deeper understanding of the exploitation vectors targeting LLMs. GenAI systems can be weaponized in ways we're only beginning to understand, and hands-on experience is the only way to really learn about it. So I decided to give it a go. But why now?

The CAIN incident changed everything

The CAIN Prompt Hijacking incident from May 2025 was a sophisticated attack that illustrates why traditional security thinking falls short with GenAI systems. 

The CAIN hijack enabled malicious actors to carry out large-scale information manipulation by spreading harmful but benign-looking system prompts online.

The attackers didn't just inject random malicious prompts. They embedded malicious system prompts into a widely used LLM platform so that the model would respond with safe, expected answers for most users, but deliver tailored malicious or misleading outputs when triggered by specific inputs. This allowed the exploit to spread misinformation quietly, undermining trust without raising broad suspicion.

The repercussions were huge. Organizations across sectors, from tech firms to educational institutions, were forced to audit their entire AI deployments for hidden manipulations. The attack demonstrated that LLMs can be subverted to act deceptively at scale, potentially influencing public opinion or automating social engineering campaigns without detection.

This wasn't an isolated incident. Carnegie Mellon's 2025 research demonstrated LLMs autonomously executing cyberattacks without human intervention, while vulnerabilities like the NVIDIA TensorRT-LLM flaw and AI-generated deepfake voice scams targeting banking systems made headlines for their devastating impact.

Challenging Assumptions About the Threat Modelling Flow

Before the session, I assumed LLM threat modelling would be business as usual: apply STRIDE, the threat modelling framework Microsoft introduced back in the late 90s, identify attack vectors, implement controls. The CAIN incident and similar exploits proved that a new approach is needed.

It is imperative to understand the new threats posed by GenAI, study the OWASP GenAI vulnerabilities (https://genai.owasp.org/), and rethink security culture and organizational collaboration for the AI era.

Key Takeaways from CAIN and Real-World LLM Exploits

The CAIN incident and similar attacks revealed critical vulnerabilities:

  • Invisible, targeted manipulation is now possible: LLMs can "hide" malicious behaviors, only activating for certain triggers
  • Trust and reliability are systematically undermined: Users may never realize when an LLM's information has been manipulated
  • Scale amplifies impact: What starts as a single compromised model can influence thousands of decisions across organizations
  • Continuous monitoring became essential: Real-time output monitoring and prompt auditing shifted from nice-to-have to survival necessity (see the monitoring sketch after this list)
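
To make that last point concrete, here is a minimal sketch of an output-auditing wrapper. This is not the approach used in the session or by any specific vendor; the `query_llm` callable, the flagged-pattern list, and the log destination are all assumptions for illustration.

```python
import logging
import re

logging.basicConfig(filename="llm_audit.log", level=logging.INFO)

# Hypothetical patterns worth flagging for human review: unexpected links,
# instructions redirecting the user, or credential-related content.
SUSPICIOUS_OUTPUT_PATTERNS = [
    r"https?://",
    r"(?i)send .* to",
    r"(?i)(password|api key|credential)",
]

def audited_completion(query_llm, prompt: str) -> str:
    """Call the model, log the exchange, and flag suspicious responses."""
    response = query_llm(prompt)  # query_llm is a placeholder for your model call
    logging.info("PROMPT: %s | RESPONSE: %s", prompt, response)

    for pattern in SUSPICIOUS_OUTPUT_PATTERNS:
        if re.search(pattern, response):
            # In a real deployment this would raise an alert for human review
            logging.warning("Flagged response (pattern %r): %s", pattern, response)
            break
    return response
```

Even a crude filter like this gives auditors a trail to work from: the real value is the logged prompt/response pairs, not the regexes themselves.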

The CIA Triad Reimagined: Lessons from Real Attacks

The Ladies of London Hacking Society session reinforced why Confidentiality, Integrity, and Availability (CIA) remain foundational, but CAIN and similar incidents showed how these principles take on new meaning with LLMs:

Confidentiality isn't just about data at rest. It's about preventing model inversion attacks and protecting intellectual property embedded in model weights, while ensuring malicious actors can't extract sensitive training data.

Integrity became the critical concern. The CAIN attack showed how a single compromised model can deliver targeted misinformation while appearing legitimate to most users. In GenAI, integrity loss can be impossible to detect and recover from, making proactive safeguards essential.

Availability extends beyond uptime to include protection against token-based denial-of-service attacks that can financially cripple organizations through usage-based billing models. This is a vulnerability that attackers increasingly exploit.
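
As a minimal illustration of that availability concern, below is a sketch of a per-user token budget check that could sit in front of a usage-billed API. The rough token estimate, the window size, and the budget numbers are assumptions, not recommendations.

```python
import time
from collections import defaultdict, deque

def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token): fine for budget enforcement,
    # not for billing reconciliation.
    return max(1, len(text) // 4)

class TokenBudget:
    """Reject requests once a user exceeds a token budget in a sliding window."""

    def __init__(self, max_tokens: int = 50_000, window_seconds: int = 3600):
        self.max_tokens = max_tokens
        self.window = window_seconds
        self.usage = defaultdict(deque)  # user_id -> deque of (timestamp, tokens)

    def allow(self, user_id: str, prompt: str) -> bool:
        now = time.time()
        q = self.usage[user_id]
        # Drop entries that fell outside the sliding window
        while q and now - q[0][0] > self.window:
            q.popleft()
        spent = sum(tokens for _, tokens in q)
        needed = estimate_tokens(prompt)
        if spent + needed > self.max_tokens:
            return False  # caller should return HTTP 429 or queue the request
        q.append((now, needed))
        return True
```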

STRIDE Gets a GenAI Makeover: Welcome to STOIC, Welcome to Security Culture

Traditional STRIDE felt clunky for LLMs, so Jeff Watkins presented a simplified framework, STOIC, that makes LLM-style attacks easier to understand and defend against (a practical checklist sketch follows the list below):

  • Stolen: Data exfiltration, model weights theft, IP loss through model inversion
  • Tricked: Prompt injection, jailbreaks, manipulation into unsafe behaviors
  • Obstructed: Service degradation through API spamming or resource exhaustion
  • Infected: Data poisoning, model tampering, supply chain compromises
  • Compromised: Broad system integrity loss with cascading impacts
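
To show how this could map onto a practical threat-modelling session, here is a sketch of STOIC as a simple checklist data structure. The questions per category are my own phrasing of the points above, not an official checklist from the session or from OWASP.

```python
# STOIC as a lightweight threat-modelling checklist. The prompts per category
# are illustrative questions, not an exhaustive or official list.
STOIC_CHECKLIST = {
    "Stolen": [
        "Could an attacker exfiltrate training data or model weights?",
        "Is IP embedded in the model recoverable via model inversion?",
    ],
    "Tricked": [
        "What happens if a user attempts prompt injection or a jailbreak?",
        "Can untrusted content reach the model's context window?",
    ],
    "Obstructed": [
        "Can API spamming or resource exhaustion degrade the service?",
        "Is there a per-user token or request budget?",
    ],
    "Infected": [
        "Could poisoned data or a tampered model enter the supply chain?",
        "Are third-party models and datasets integrity-checked?",
    ],
    "Compromised": [
        "If the model is subverted, which downstream systems does it touch?",
        "How would a broad integrity loss be detected and contained?",
    ],
}

def run_session(component: str) -> None:
    """Print the checklist for a component so a mixed team can walk through it."""
    print(f"Threat modelling: {component}")
    for category, questions in STOIC_CHECKLIST.items():
        print(f"\n[{category}]")
        for q in questions:
            print(f"  - {q}")

if __name__ == "__main__":
    run_session("customer-support chatbot")
```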

This simplified framework made threat modelling accessible to people without security backgrounds, a game changer for building organization-wide security culture.

My 7 Takeaways from Threat Modelling an LLM

1. Security Culture Prevails over Technology Every Time

The most sophisticated controls mean nothing without organizational buy-in. LLMs create attack surfaces across multiple teams, making cross-functional security culture your first and most powerful defense.

2. The "Productivity Rush" Is a Trap

The market's mad dash toward GenAI agents, often with minimal oversight and "vibe-coded" implementations, should be scary enough for us all, especially after you read this article. A small team with little security culture or expertise in such a complex scenario is a dangerous combination for everyone. Chasing productivity at security's expense compromises customer privacy, degrades model accuracy and solution effectiveness, and opens doors for malicious exploitation.

3. Usage-Based Billing Is an Existential Threat

Most public GenAI services bill per token. An unmitigated attack (like DDoS via query spamming) can rack up costs large enough to threaten organizational survival. Without robust alerting and escalation playbooks, this isn't theoretical. It's inevitable.
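
As a minimal sketch of what that alerting could look like, the snippet below checks estimated daily spend against a threshold and escalates when it is crossed. The pricing figure, the budget, and the `alert` hook are placeholders; real numbers depend on your provider and contract.

```python
import logging

logger = logging.getLogger("genai-billing")

# Illustrative numbers only: check your provider's actual pricing.
PRICE_PER_1K_TOKENS_USD = 0.01
DAILY_BUDGET_USD = 500.0

def check_spend(tokens_used_today: int, alert) -> None:
    """Escalate when estimated daily spend crosses the budget threshold.

    `tokens_used_today` would come from your usage metering, and `alert`
    is whatever escalation hook you use (pager, chat webhook, ticket).
    """
    estimated_cost = tokens_used_today / 1000 * PRICE_PER_1K_TOKENS_USD
    if estimated_cost >= DAILY_BUDGET_USD:
        alert(f"GenAI spend at ${estimated_cost:,.2f}: over daily budget, "
              "consider throttling or disabling the endpoint")
    elif estimated_cost >= 0.8 * DAILY_BUDGET_USD:
        logger.warning("GenAI spend at 80%% of daily budget ($%.2f)", estimated_cost)
```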

4. Network Segregation Isn't Optional

Limiting access and segmenting networks is vital to prevent lateral movement and contain potential compromises. The interconnected nature of AI systems makes traditional perimeter defense insufficient.

5. Human Oversight Is Essential

Regular human reviews, plus well-defined escalation and incident handling processes, are essential for safe GenAI operations. Automation without human judgment is automation without accountability.

6. Guard Rails Need Guards

Controlled input/output monitoring isn't just about preventing prompt injection: it's about securing the entire supply chain and stopping prompt leakage from exposing sensitive organizational information.
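
As one example of guarding the guard rails, here is a sketch of an output filter that blocks responses echoing long fragments of the system prompt back to the user. The overlap heuristic and the redaction message are assumptions; production filters usually combine several signals.

```python
def leaks_system_prompt(response: str, system_prompt: str, min_overlap: int = 40) -> bool:
    """Return True if the response contains a long verbatim chunk of the system prompt.

    Crude heuristic: slide a window over the system prompt and look for
    verbatim matches of at least `min_overlap` characters in the response.
    """
    text = system_prompt.strip()
    for start in range(0, max(1, len(text) - min_overlap + 1), 10):
        chunk = text[start:start + min_overlap]
        if len(chunk) >= min_overlap and chunk in response:
            return True
    return False

def safe_reply(response: str, system_prompt: str) -> str:
    if leaks_system_prompt(response, system_prompt):
        # Redact rather than return; log the event for review in a real deployment.
        return "Sorry, I can't share that."
    return response
```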

7. Never Deploy GenAI Agents Without Security Teams

This bears repeating: deploying GenAI-powered agents without simultaneous threat modeling and dedicated security oversight is organizational malpractice. Full stop.

GenAI's Promise: Opportunities and Risks

GenAI represents a massive opportunity, but we're not in a safe zone yet. The rewards are real, but so are the risks. Security must evolve alongside innovation, not lag behind it.

The unpredictable nature of LLM outputs makes traditional testing and defense strategies inadequate. We need deep, ongoing testing and broad organizational participation in security efforts.

The first mistake is to think: it will not happen to me.

The consequences can be even more devastating than those of traditional attack vectors, given the depth and intelligence of attacks that aim to take your data and weaponize it against your business and your people.

Resources That Actually Help Deepen Your Understanding of LLM Threats

Want to get hands-on experience? Try the Gandalf game by Lakera to understand prompt injection vulnerabilities firsthand.

For deeper learning, the OWASP GenAI Security Project (https://genai.owasp.org/) mentioned earlier is a good place to start.

The Bottom Line

This hands-on exercise in threat modeling LLMs shows clearly that security isn't just a technical challenge. It's an organizational transformation. Success requires a security-first culture, robust governance, continuous upskilling, and the humility to admit that traditional approaches need fundamental rethinking.

The organizations that embrace this reality will capture AI's promise, while protecting their people, data, and reputations. Those that don't will stay in the past and won’t survive the AI gold rush.
