AI Security Institute

AI Security Institute · 2025-09-17T13:19:18.193Z

We are excited to advance our partnership with the US Center for AI Standards and Innovation (CAISI) through the UK-US tech partnership, including collaborating on best practice for advanced AI model security.

Government Administration

We conduct scientific research to understand AI’s most serious risks and develop and test mitigations.

See jobs Follow

View all 197 employees

About us

We’re building a team of world leading talent to advance our understanding of frontier AI and strengthen protections against the risks it poses – come and join us: https://www.aisi.gov.uk/. The AISI is part of the UK Government's Department for Science, Innovation and Technology.

Website: https://www.aisi.gov.uk/
External link for AI Security Institute
Industry: Government Administration
Company size: 51-200 employees
Type: Government Agency
Founded: 2023

Employees at AI Security Institute

See all employees

Updates

AI Security Institute

20,616 followers
1w
Report this post
An important step towards tackling this serious risk. We’re pleased to have supported the development of these plans and will continue working with partners to ensure the right safeguards are in place to protect children from harm.

Department for Science, Innovation and Technology

243,169 followers
1w

Protecting children online is at the heart of our work on AI and online safety. During his visit to NSPCC ’s London HQ, AI and Online Safety Minister Kanishka Narayan MP saw first-hand the incredible efforts of Childline and spoke with teams from Internet Watch Foundation (IWF) about the challenges they face every day. New laws are being introduced to empower trusted organisations to test AI models safely and build safeguards into systems from the start – preventing technology from being misused to create child sexual abuse material. This proactive approach ensures innovation and child safety go hand in hand.

Like Comment Share
AI Security Institute

20,616 followers
2w
Report this post
Last week, we hosted our inaugural Alignment Conference, in partnership with FAR.AI . The event bought together an interdisciplinary delegation of leading researchers, funders, and policymakers to discuss urgent open problems in AI alignment. Ensuring that future AI systems act as we intend will require a rapid, cross-disciplinary expansion of the AI alignment field. Progress hinges on contributions from fields spanning cognitive sciences to learning theory. Our conference deepened this technical collaboration through five research tracks: 1️⃣ Theoretical Computer Science  2️⃣  Learning Theory & Learning Dynamics  3️⃣ Economic Theory  4️⃣  Cognitive Science & Scalable Oversight + Evaluations  5️⃣ Explainability  Learn more about AISI’s work to accelerate research in AI alignment: https://orlo.uk/KNNQK Read our research agenda: https://orlo.uk/sdJFg
1 Comment

Like Comment Share
AI Security Institute

20,616 followers
3w Edited
Report this post
We collaborated with Lakera to design the backbone breaker benchmark (b³) – a new open-source evaluation for LLM agents. The b³ is built on more than 194,000 crowdsourced adversarial attacks and uses ‘threat snapshots’ to identify vulnerabilities without modelling full workflows. Read the full paper: https://lnkd.in/dy-bt-uC

Lakera

15,774 followers
3w

𝗧𝗵𝗲 𝗕𝗮𝗰𝗸𝗯𝗼𝗻𝗲 𝗕𝗿𝗲𝗮𝗸𝗲𝗿 𝗕𝗲𝗻𝗰𝗵𝗺𝗮𝗿𝗸 (𝗯𝟯) 𝗶𝘀 𝗵𝗲𝗿𝗲.🔍 Developed by Lakera and the UK AI Security Institute, b3 is the first human-grounded, threat-realistic benchmark for AI agents. Most benchmarks test how safe or capable a model is, not 𝗵𝗼𝘄 𝘀𝗲𝗰𝘂𝗿𝗲 it is when someone tries to break it. b3 changes that by measuring how 𝗯𝗮𝗰𝗸𝗯𝗼𝗻𝗲 𝗟𝗟𝗠𝘀 hold up under real adversarial pressure. The blog post walks you through how the benchmark works, what “𝘁𝗵𝗿𝗲𝗮𝘁 𝘀𝗻𝗮𝗽𝘀𝗵𝗼𝘁𝘀” are, and what the first results reveal. For the full technical deep dive, check out the research paper on #arXiv. 📄 Read the paper: https://lnkd.in/dy-bt-uC 👉 Read the overview: https://lnkd.in/dZ38Tpjp #AIsecurity #GenAI #LLMsecurity #AIsafety #RedTeam #Cybersecurity #AIresearch #Lakera #AIagents

The Backbone Breaker Benchmark: Testing the Real Security of AI Agents | Lakera – Protecting AI teams that disrupt the world. lakera.ai

Like Comment Share
AI Security Institute

20,616 followers
4w Edited
Report this post
📢 Announced today: Adam Beaumont has been appointed AISI’s Interim Director. Adam joins us from GCHQ, where he served as Chief AI Officer. His experience spans tackling security challenges through public-private partnerships, leading an AI research lab, and advising on security and defence policy in both Whitehall and Washington. His work will see him build connections across the UK government, international community, broader AI ecosystem to advance AISI’s mission of ensuring AI serves the public good. https://orlo.uk/1fWns
4 Comments

Like Comment Share
AI Security Institute

20,616 followers
4w Edited
Report this post
Several AI developers aim to build systems that match or surpass humans across most cognitive tasks. Today’s AI still falls short. Among AISI’s priorities is to forecast the development of AI much more powerful than today’s. Such systems could be extremely beneficial – but may also pose national security risks or cause labour market disruption. In a new report, we draw on expert interviews to identify four categories of limitation that still constrain today’s AI systems: 1️⃣ Task-specific limitations: Performance on long tasks, in complex environments, and on tasks that are hard to verify 2️⃣ Reliability: Reducing error rates and improving meta-awareness 3️⃣ Adaptability: Working with local context and continual learning 4️⃣ Original insight: Novel contributions of scientific value For each category we ask: where do existing systems still struggle? What would we expect to see if these obstacles were overcome? In so doing, we hope to provide tools for the AI safety and national security communities to monitor and forecast AI capabilities. The trajectory of AI development is highly uncertain, and unforeseen bottlenecks could emerge. AISI will continue to gather evidence on this trajectory as capabilities advance. Learn more in our blog: https://lnkd.in/e5uJ2g2r Read the full report: https://lnkd.in/eKrwVUwr

Mapping the limitations of current AI systems | AISI Work aisi.gov.uk

1 Comment

Like Comment Share
AI Security Institute

20,616 followers
1mo Edited
Report this post
How can we prevent harm from AI systems that pursue unintended goals? AI control is a promising research agenda that seeks to address this critical question. Today, we’re excited to launch ControlArena – our library for running secure and reproducible AI control experiments. The typical AI control experiment is complex to design and run, often requiring custom infrastructure to be built from scratch. ControlArena streamlines this process, by providing a suite of pre-built evaluation environments behind a simple interface. Researchers can easily define each element of a control experiment: ➡️The setting in which the agent operates ➡️Legitimate work (“main tasks”) vs undesirable objectives (“side tasks”) ➡️Control protocols to prevent the success at side tasks ControlArena also provides all the tools you need to load evaluation logs, analyse results, and generate customisable visualisations. We hope that ControlArena will cut development time, make results easier to reproduce, and lower the bar to entry for AI control research across the AI safety, cybersecurity, and machine learning communities. Read more on our blog: https://lnkd.in/eGtzAwuP Get started with ControlArena: https://lnkd.in/eR8iumsb

Introducing ControlArena: A library for running AI control experiments | AISI Work aisi.gov.uk

1 Comment

Like Comment Share
AI Security Institute

20,616 followers
1mo
Report this post
Measuring how often an AI agent succeeds at a task can help us assess its capabilities – but it doesn’t tell the whole story. We’ve been experimenting with transcript analysis to better understand not just how often agents succeed, but why they fail. Our model evaluations generate thousands of transcripts, which can contain an entire novel’s worth of text. They are a record of everything the model did during a task, including the external tools it accessed, and its outputs at each step. In a recent case study, we analysed almost 6,400 transcripts from AISI evaluations of nine models on 71 cyber tasks. We studied several features of these transcripts, including overall length and composition, and the agent’s commentary throughout. We found that there are many reasons a model may fail to complete a task, beyond capability limitations. These can include safety refusals, lack of compliance with scaffolding instructions, or difficulty using tools. We’re sharing our analysis to encourage others conducting safety evaluations to review their own transcripts, in a systematic and quantitative way. This can help foster more accurate and robust claims about agent capabilities. Read more on our blog: https://lnkd.in/eiCn6zkP

Transcript analysis for AI agent evaluations | AISI Work aisi.gov.uk

3 Comments

Like Comment Share
AI Security Institute

20,616 followers
1mo
Report this post
Alongside Anthropic and the The Alan Turing Institute, we’ve conducted the largest investigation of data poisoning to date. Data poisoning occurs when individuals distribute online content designed to corrupt an AI model’s training data, potentially producing dangerous behaviours, including the insertion of backdoors – specific phrases that trigger an otherwise-hidden behaviour. Backdoors can be used to degrade system performance or even make models perform harmful actions like exfiltrating data. We found that as little as 250 malicious documents can be used to “poison” a language model, even as model size and training data grow. Previous work had assumed that attackers would need to poison a certain percentage of data to succeed, but our results suggest that this is not the case. This means that poisoning attacks could be more feasible that previously believed. We are releasing this content to raise awareness of these risks and spur others to take defensive action to protect their models. We’ll be continuing our work with Anthropic and other frontier developers to strengthen model safeguards as capabilities improve. Learn more on our blog: https://lnkd.in/egDzXC_g Read the full paper: https://lnkd.in/eP47Dfse

Examining backdoor data poisoning at scale | AISI Work aisi.gov.uk

8 Comments

Like Comment Share
AI Security Institute

20,616 followers
1mo
Report this post
Our recent large-scale study investigated how often people use AI to research political issues, and whether it increases belief in misinformation. ➡️ Read the key takeaways in our new blog: https://lnkd.in/eAwbgVpZ

AI Security Institute

20,616 followers
2mo

🔎People are increasingly using chatbots to seek out new information, raising concerns about how they could misinform voters or distort public opinion. Our new study explores how AI is actually influencing real-world political beliefs. Our two key findings: 1️⃣Chatbots are a popular source of political information: Based on a survey of almost 2,500 people, we estimate that 13% of eligible voters asked AI for information relevant to their electoral choice in the week before the 2024 UK election. 2️⃣Belief in misinformation doesn’t seem to be increased by AI usage: In a randomised controlled trial of over 2,800 participants, we measured belief in true and false political claims before and after chatbot interactions versus standard internet search. We found that AI usage did not increase belief in misinformation any more than internet search – even when AI models were prompted to be more persuasive. Our research suggests that AI could be a useful tool for everyday information-seeking, without promoting misinformation to the extent that some have feared. 👉 Read the full paper: https://lnkd.in/e8Excx3M

Like Comment Share
AI Security Institute

20,616 followers
2mo
Report this post
We are excited to advance our partnership with the US Center for AI Standards and Innovation (CAISI) through the UK-US tech partnership, including collaborating on best practice for advanced AI model security.
Department for Science, Innovation and Technology

243,169 followers
2mo Edited

UK and US deliver first ever Tech Prosperity Deal 🤝 🟦 Boosting breakthroughs in quantum and AI 🟦 Driving medical innovation 🟦 Delivering homegrown clean energy 🟦 Bringing more jobs to the UK 🟦 Growing our economy Find out more: https://orlo.uk/1Jpgg
1 Comment

Like Comment Share

AI Security Institute

Government Administration

We conduct scientific research to understand AI’s most serious risks and develop and test mitigations.

About us

Employees at AI Security Institute

Shahar Avin

Research Scientist at UK AISI

Keno Jüchems

RE at AISI | AI Safety | AI for mental health

Ben Dewar-Powell

CISO @ AISI | AI Security | Angel Investor | Advisor

Geoffrey Irving

Chief Scientist at the UK AI Security Institute (AISI). Previously DeepMind, OpenAI, Google Brain, etc.

Updates

Join now to see what you are missing

Similar pages

Centre for the Governance of AI (GovAI)

Department for Science, Innovation and Technology

Anthropic

Center for AI Safety

Center for AI and Digital Policy

Institute for AI Policy and Strategy (IAPS)

Partnership on AI

Google DeepMind

Future of Life Institute (FLI)

All Tech Is Human

Browse jobs

Engineer jobs

Analyst jobs

Scientist jobs

Researcher jobs

Public Policy Intern jobs

Director jobs

Research Assistant jobs

Project Manager jobs

Head jobs

Advisor jobs

Manager jobs

Graduate jobs

Program Manager jobs

Lead Scientist jobs

Intern jobs

Machine Learning Engineer jobs

Consultant jobs

Associate jobs

Psychologist jobs

Cyber Security Specialist jobs