AI Agents Are Getting Hijacked (And Why Your Business Needs to Care)
New NIST research reveals why standard safety scores are hiding critical vulnerabilities in your digital workforce.
We are all rushing to adopt AI agents to automate workflows and boost efficiency. However, new research from NIST shows these ādigital employeesā have a significant blind spot. Attackers can āhijackā AI agents by hiding malicious commands within everyday data, such as emails or websites.
Standard safety evaluations are missing the mark. They often rely on averages that hide critical vulnerabilities in AI agents. For SMB leaders, the takeaway is clear: do not trust a generic safety score.
Standard safety evaluations are missing the mark. They often rely on averages that hide critical vulnerabilities. For SMB leaders, the takeaway is clear: do not trust a generic safety score. You need to verify that your AI tools can withstand targeted, repeated attacks on specific high-risk tasks. To do this, SMBs should request detailed test results from their vendors that demonstrate resilience against these threats. Consider conducting scenario-based assessments that simulate potential attacks specific to your operations. These practical steps will enable you to make informed decisions about the security of your AI tools.
What is Agent Hijacking?Ā
Imagine using an AI-powered invoice bot, an HR chat assistant, or a virtual scheduling assistant embedded in your daily operations. These tools act like smart interns you've entrusted with access to sensitive company workflows, such as financial data, HR communications, and meeting schedules. Agent hijacking occurs when a malicious actor slips a secret note into your system, perhaps via email or chat, instructing the AI to ignore protocol and forward confidential information, such as financial reports or employee records, to an unauthorized party.
As NIST highlights, standard controls often fail to separate trusted instructions from untrusted data. To fix this, you need a governance layer that sits between your data and your models. Airia AIās Enterprise AI Orchestration Platform delivers those comprehensive security controls, ensuring your AI initiatives are protected by industry-leading architecture from day one.
Because AI models often struggle to distinguish your trusted instructions from untrusted data they process, they can be tricked into executing malicious commands. To test this weakness, NISTās Center for AI Standards and Innovation (CAISI) executed a series of deep tests on this phenomenon. Here is what they found.
Why Standard Tests FailĀ NIST discovered that off-the-shelf safety tests provide a false sense of security. These tests often miss several key safeguards:Ā
They do not consistently account for personalized attack scenarios that are more likely in targeted environments.Ā
They rely heavily on single-use attack success rather than testing repeated attempts, which real-world threats would employ.Ā
They overlook high-risk activities, such as code execution or data export, focusing instead on basic, less dangerous threats.Ā
NIST found that averages hide critical vulnerabilities. You cannot protect what you cannot see. Tenable empowers security teams to see their entire attack surface, from on-prem to cloud, illuminating those hidden weaknesses so you can prioritize threats before an agent gets hijacked.
By understanding these gaps, SMB leaders can better equip themselves to address vulnerabilities posed by current AI safety tests. The next steps should include updating testing protocols to incorporate personalized attack scenarios and simulations of repeated attempts. Security teams should also consider adopting advanced monitoring tools to detect and respond to potential threats more efficiently. Examples of such tools include network intrusion detection systems and AI-driven anomaly detection solutions. Additionally, implementing regular staff training on recognizing sophisticated phishing tactics is crucial. Simulated phishing exercises and interactive training modules can effectively prepare employees to identify and mitigate these threats.
Red Teaming is Essential: When NIST used standard tests on a top-tier model (Claude 3.5 Sonnet), the attack success rate was only 11%. But when they brought in human āred teamsā to design custom attacks, that success rate skyrocketed to 81%.
Averages Hide Danger: A model might look safe āon average,ā but that number is misleading. NIST found that while an agent might resist a basic phishing attempt, it could still be highly vulnerable to more damaging tasks, such as executing malicious code.
Most benchmarks only test if an attack works on the first try. Real hackers try until they succeed. To illustrate, imagine a junior cyber-criminal who is eager to make a name for themselves in the underground world. With ample weekend hours and persistence, this threat actor will systematically try different methods until they breach your system. When NIST allowed for multiple attempts in their tests, the average success rate jumped from 57% to 80%. This highlights the importance of recognizing and defending against persistent threats.
Real hackers are persistent. Even if your AI agent blocks the first attempt, you need an infrastructure that stops the breach when the second attempt succeeds. CrowdStrike Falcon leverages AI-native threat intelligence to keep you decisively ahead of these modern, AI-powered attacks, securing your endpoint and identity layers at scale.
What This Means for Corporate Security Teams: AsĀ you integrate AI into your operations, it's crucial to maintain a discerning approach to the security assurances vendors provide.
Given the scale and complexity of your operations, tailored security strategies are vital. Assess how these tools perform in scenarios that pose the greatest risks to your specific environment.
To empower SMB leaders in vendor conversations, consider asking questions such as:
How do you ensure the AI tools can withstand targeted attacks specific to our operations?
Can you provide detailed test results demonstrating resilience against the types of threats we face?
How often do you update your security protocols in response to emerging threats?
These questions will help you assess the vendor's understanding of the security landscape and their commitment to protecting your business.
Evaluate Real Risks Beyond the Average: When a vendor claims their tool is '90% safe,' scrutinize the remaining 10%. Ask targeted questions about performance in high-risk scenarios, such as unauthorized access to sensitive data or sophisticated code-execution attacks.
Plan for Persistence: Develop your security protocols assuming a persistent threat actor. Even if an AI tool successfully blocks an initial attack, it must remain resilient to subsequent attempts.
Incorporate Human Oversight: Implement controls that require human intervention for high-stakes operations. For actions like significant financial transactions or large-scale data exports, ensure that a human must authorize the AI's decisions, reinforcing the importance of oversight in your security protocols. For instance, establish an approval workflow where an employee reviews and approves any action flagged as high risk by the AI. This might include notifications to relevant managers or teams and the requirement of a digital signature before proceeding. By visualizing these human-in-the-loop controls, leaders can better integrate them into their existing operational frameworks.
Building your own custom AI agents? Ensure your developers have the best tools to build and debug 10x faster. Blackbox AI integrates over 300 AI models directly into VS Code, helping your team iterate securely and efficiently.



