AI Security Institute’s cover photo
AI Security Institute

AI Security Institute

Government Administration

We conduct scientific research to understand AI’s most serious risks and develop and test mitigations.

About us

We’re building a team of world leading talent to advance our understanding of frontier AI and strengthen protections against the risks it poses – come and join us: https://www.aisi.gov.uk/. The AISI is part of the UK Government's Department for Science, Innovation and Technology.

Website
https://www.aisi.gov.uk/
Industry
Government Administration
Company size
51-200 employees
Type
Government Agency
Founded
2023

Employees at AI Security Institute

Updates

  • An important step towards tackling this serious risk. We’re pleased to have supported the development of these plans and will continue working with partners to ensure the right safeguards are in place to protect children from harm.

    Protecting children online is at the heart of our work on AI and online safety. During his visit to NSPCC ’s London HQ, AI and Online Safety Minister Kanishka Narayan MP saw first-hand the incredible efforts of Childline and spoke with teams from Internet Watch Foundation (IWF) about the challenges they face every day. New laws are being introduced to empower trusted organisations to test AI models safely and build safeguards into systems from the start – preventing technology from being misused to create child sexual abuse material. This proactive approach ensures innovation and child safety go hand in hand.

  • Last week, we hosted our inaugural Alignment Conference, in partnership with FAR.AI . The event bought together an interdisciplinary delegation of leading researchers, funders, and policymakers to discuss urgent open problems in AI alignment. Ensuring that future AI systems act as we intend will require a rapid, cross-disciplinary expansion of the AI alignment field. Progress hinges on contributions from fields spanning cognitive sciences to learning theory. Our conference deepened this technical collaboration through five research tracks: 1️⃣ Theoretical Computer Science  2️⃣  Learning Theory & Learning Dynamics  3️⃣ Economic Theory  4️⃣  Cognitive Science & Scalable Oversight + Evaluations  5️⃣ Explainability  Learn more about AISI’s work to accelerate research in AI alignment: https://orlo.uk/KNNQK Read our research agenda: https://orlo.uk/sdJFg

    • Photo of event space
    • Two male attendees engage in an exciting discussion
    • Attendees at the event sit in small groups having animated conversations.
    • Photo of attendees networking
  • We collaborated with Lakera to design the backbone breaker benchmark (b³) – a new open-source evaluation for LLM agents. The b³ is built on more than 194,000 crowdsourced adversarial attacks and uses ‘threat snapshots’ to identify vulnerabilities without modelling full workflows. Read the full paper: https://lnkd.in/dy-bt-uC

    View organization page for Lakera

    15,774 followers

    𝗧𝗵𝗲 𝗕𝗮𝗰𝗸𝗯𝗼𝗻𝗲 𝗕𝗿𝗲𝗮𝗸𝗲𝗿 𝗕𝗲𝗻𝗰𝗵𝗺𝗮𝗿𝗸 (𝗯𝟯) 𝗶𝘀 𝗵𝗲𝗿𝗲.🔍  Developed by Lakera and the UK AI Security Institute, b3 is the first human-grounded, threat-realistic benchmark for AI agents. Most benchmarks test how safe or capable a model is, not 𝗵𝗼𝘄 𝘀𝗲𝗰𝘂𝗿𝗲 it is when someone tries to break it. b3 changes that by measuring how 𝗯𝗮𝗰𝗸𝗯𝗼𝗻𝗲 𝗟𝗟𝗠𝘀 hold up under real adversarial pressure. The blog post walks you through how the benchmark works, what “𝘁𝗵𝗿𝗲𝗮𝘁 𝘀𝗻𝗮𝗽𝘀𝗵𝗼𝘁𝘀” are, and what the first results reveal. For the full technical deep dive, check out the research paper on #arXiv. 📄 Read the paper: https://lnkd.in/dy-bt-uC 👉 Read the overview: https://lnkd.in/dZ38Tpjp #AIsecurity #GenAI #LLMsecurity #AIsafety #RedTeam #Cybersecurity #AIresearch #Lakera #AIagents

  • 📢 Announced today: Adam Beaumont has been appointed AISI’s Interim Director. Adam joins us from GCHQ, where he served as Chief AI Officer. His experience spans tackling security challenges through public-private partnerships, leading an AI research lab, and advising on security and defence policy in both Whitehall and Washington. His work will see him build connections across the UK government, international community, broader AI ecosystem to advance AISI’s mission of ensuring AI serves the public good. https://orlo.uk/1fWns

    • Image of Adam with text overlayed: Adam Beaumont appointed Interim Director at the AI Security Institute
  • Several AI developers aim to build systems that match or surpass humans across most cognitive tasks. Today’s AI still falls short. Among AISI’s priorities is to forecast the development of AI much more powerful than today’s. Such systems could be extremely beneficial – but may also pose national security risks or cause labour market disruption. In a new report, we draw on expert interviews to identify four categories of limitation that still constrain today’s AI systems: 1️⃣ Task-specific limitations: Performance on long tasks, in complex environments, and on tasks that are hard to verify 2️⃣ Reliability: Reducing error rates and improving meta-awareness 3️⃣ Adaptability: Working with local context and continual learning 4️⃣ Original insight: Novel contributions of scientific value For each category we ask: where do existing systems still struggle? What would we expect to see if these obstacles were overcome? In so doing, we hope to provide tools for the AI safety and national security communities to monitor and forecast AI capabilities. The trajectory of AI development is highly uncertain, and unforeseen bottlenecks could emerge. AISI will continue to gather evidence on this trajectory as capabilities advance. Learn more in our blog: https://lnkd.in/e5uJ2g2r Read the full report: https://lnkd.in/eKrwVUwr

  • How can we prevent harm from AI systems that pursue unintended goals? AI control is a promising research agenda that seeks to address this critical question. Today, we’re excited to launch ControlArena – our library for running secure and reproducible AI control experiments. The typical AI control experiment is complex to design and run, often requiring custom infrastructure to be built from scratch. ControlArena streamlines this process, by providing a suite of pre-built evaluation environments behind a simple interface. Researchers can easily define each element of a control experiment: ➡️The setting in which the agent operates ➡️Legitimate work (“main tasks”) vs undesirable objectives (“side tasks”) ➡️Control protocols to prevent the success at side tasks ControlArena also provides all the tools you need to load evaluation logs, analyse results, and generate customisable visualisations. We hope that ControlArena will cut development time, make results easier to reproduce, and lower the bar to entry for AI control research across the AI safety, cybersecurity, and machine learning communities. Read more on our blog: https://lnkd.in/eGtzAwuP Get started with ControlArena: https://lnkd.in/eR8iumsb

  • Measuring how often an AI agent succeeds at a task can help us assess its capabilities – but it doesn’t tell the whole story. We’ve been experimenting with transcript analysis to better understand not just how often agents succeed, but why they fail. Our model evaluations generate thousands of transcripts, which can contain an entire novel’s worth of text. They are a record of everything the model did during a task, including the external tools it accessed, and its outputs at each step. In a recent case study, we analysed almost 6,400 transcripts from AISI evaluations of nine models on 71 cyber tasks. We studied several features of these transcripts, including overall length and composition, and the agent’s commentary throughout. We found that there are many reasons a model may fail to complete a task, beyond capability limitations. These can include safety refusals, lack of compliance with scaffolding instructions, or difficulty using tools. We’re sharing our analysis to encourage others conducting safety evaluations to review their own transcripts, in a systematic and quantitative way.  This can help foster more accurate and robust claims about agent capabilities. Read more on our blog: https://lnkd.in/eiCn6zkP

  • Alongside Anthropic and the The Alan Turing Institute, we’ve conducted the largest investigation of data poisoning to date. Data poisoning occurs when individuals distribute online content designed to corrupt an AI model’s training data, potentially producing dangerous behaviours, including the insertion of backdoors – specific phrases that trigger an otherwise-hidden behaviour. Backdoors can be used to degrade system performance or even make models perform harmful actions like exfiltrating data. We found that as little as 250 malicious documents can be used to “poison” a language model, even as model size and training data grow. Previous work had assumed that attackers would need to poison a certain percentage of data to succeed, but our results suggest that this is not the case. This means that poisoning attacks could be more feasible that previously believed. We are releasing this content to raise awareness of these risks and spur others to take defensive action to protect their models. We’ll be continuing our work with Anthropic and other frontier developers to strengthen model safeguards as capabilities improve. Learn more on our blog: https://lnkd.in/egDzXC_g Read the full paper: https://lnkd.in/eP47Dfse

  • Our recent large-scale study investigated how often people use AI to research political issues, and whether it increases belief in misinformation. ➡️ Read the key takeaways in our new blog: https://lnkd.in/eAwbgVpZ

    View organization page for AI Security Institute

    20,616 followers

    🔎People are increasingly using chatbots to seek out new information, raising concerns about how they could misinform voters or distort public opinion.  Our new study explores how AI is actually influencing real-world political beliefs.    Our two key findings:  1️⃣Chatbots are a popular source of political information: Based on a survey of almost 2,500 people, we estimate that 13% of eligible voters asked AI for information relevant to their electoral choice in the week before the 2024 UK election.  2️⃣Belief in misinformation doesn’t seem to be increased by AI usage: In a randomised controlled trial of over 2,800 participants, we measured belief in true and false political claims before and after chatbot interactions versus standard internet search. We found that AI usage did not increase belief in misinformation any more than internet search – even when AI models were prompted to be more persuasive.  Our research suggests that AI could be a useful tool for everyday information-seeking, without promoting misinformation to the extent that some have feared. 👉 Read the full paper: https://lnkd.in/e8Excx3M  

  • We are excited to advance our partnership with the US Center for AI Standards and Innovation (CAISI) through the UK-US tech partnership, including collaborating on best practice for advanced AI model security.

Similar pages

Browse jobs