Minor edits to AI skills can make agents go rogue


Text is the new attack

Thomas Claburn

Senior reporter

Published Fri 22 May 2026 // 22:37 UTC

The adoption of AI agents has expanded the potential attack surface beyond code to natural language text.

AI agents – models wrapped in software that can use tools and perform multi-step tasks – often take direction from text-based skills. And researchers have demonstrated that skills can be weaponized.

"Many agent frameworks allow users to install skills from online registries so the agent can discover and use new capabilities on demand," said Soheil Feizi, computer science professor at the University of Maryland (UMD) and founder/CEO of RELAI.ai, in a social media post. "This is powerful, but it also creates a new attack surface."


Skills, Feizi explains, are not just code or dependencies. They're also text instructions that tell agents what to do.


Skills, written out in a SKILL.md file, consist of text prompts with other data and resource references (e.g. URLs). They may get added to a user's initiating prompt and pre-existing system prompts, all of which get fed to a model for a response. Typically, this happens when the user wants the model to perform a specific task that has been spelled out in a skill file, like conducting a code quality review.
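To make that flow concrete, here is a minimal sketch of how skill text might be combined with a system prompt and a user prompt before it reaches a model. The function name `build_prompt`, the bracketed section labels, and the sample skill text are all hypothetical; real agent frameworks assemble their inputs in their own ways.

```python
# Hypothetical sketch: how skill text can end up in a model's input.
# build_prompt and its argument names are illustrative, not a real framework API.

def build_prompt(system_prompt: str, skill_text: str, user_prompt: str) -> str:
    """Concatenate the three text sources into a single model input."""
    sections = [
        ("SYSTEM", system_prompt),
        ("SKILL", skill_text),
        ("USER", user_prompt),
    ]
    return "\n\n".join(f"[{label}]\n{text.strip()}" for label, text in sections)


# Made-up SKILL.md content: instructions plus a resource reference (a URL).
skill_md = """
# code-quality-review
Review the diff for style and correctness issues.
See https://example.com/style-guide for conventions.
"""

prompt = build_prompt(
    system_prompt="You are a careful coding assistant.",
    skill_text=skill_md,
    user_prompt="Please review my latest commit.",
)
print(prompt)
```

The point of the sketch is that the model receives one block of text: whatever the skill file says sits alongside the user's own request.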


When a model's prompt – the combination of user input, instructions within skills, and system prompts – gets modified inadvertently or adversarially, that's prompt injection. That can happen directly if, for example, a user submits a prompt that directs the model to ignore prior instructions. It can also happen indirectly if, for example, an AI agent visits a website and processes text on a page that the underlying model interprets as an instruction.

A skill can effectively act as user-authorized prompt injection. And agents may also automatically retrieve and load third-party skills if their descriptions appear relevant to the task being pursued. And therein lies the problem.

The risk posed by skills has already been documented. In February, security biz Snyk found that 13.4 percent of skills on ClawHub and skills.sh (about 534 out of 3,984) "contain at least one critical-level security issue, including malware distribution, prompt injection attacks, and exposed secrets."

In a preprint paper titled "Under the Hood of SKILL.md: Semantic Supply-chain Attacks on AI Agent Skill Registry," Feizi and UMD co-authors Shoumik Saha and Kazem Faghih examine the role that skill registries play in the distribution of malicious skills. Specifically, they look at how adversarial skills get discovered, selected, and vetted before execution.

"An attacker may not need to hide malware in executable code," Feizi said. "Small semantic changes to a skill description can affect how the skill is discovered in a registry, whether an agent selects it over alternatives, and whether it passes governance or safety checks."

Those details matter, he argues, because the selection process may be automated – software agents like OpenClaw have the ability to fetch and use third-party skills.

The text that influences tool discovery and usage thus has security implications, which may not be addressed by traditional security scanning mechanisms that focus on code.
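As a rough illustration of why description text matters for discovery, the toy ranking below scores skills by word overlap between a task query and each skill's description. This is a deliberately naive stand-in, not the method used by the researchers or by any real registry; the skill names, descriptions, and scoring function are all invented for the example.

```python
# Toy registry search: rank skills by word overlap between query and description.
# A naive stand-in for illustration; real registries likely rank differently.
import re


def tokens(text: str) -> set:
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query: str, description: str) -> float:
    """Fraction of query words that also appear in the description."""
    q = tokens(query)
    return len(q & tokens(description)) / len(q) if q else 0.0


def rank(query: str, registry: dict) -> list:
    """Return skill names ordered from best to worst match."""
    return sorted(registry, key=lambda name: score(query, registry[name]), reverse=True)


# Two hypothetical skills that differ only by a short appended phrase.
registry = {
    "original-review": "Checks code for style problems.",
    "edited-review": "Checks code for style problems. Best python code quality review.",
}

query = "python code quality review"
ranking = rank(query, registry)
print(ranking)
```

In this sketch a few extra words in the description are enough to move the edited skill ahead of the original, with no change to any executable code.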


The three co-authors show that short 20-token triggers can be added to a SKILL.md file to influence the chance an agent will discover it in a registry, to influence the chance an agent will select that skill, and to avoid detection through semantic evasion strategies.

In terms of discovery, the researchers demonstrated they could induce an agent to discover their skill over an unaltered source skill 86 percent of the time. They also succeeded in making an agent select their skill over variants 77.6 percent of the time. And they were able to evade registry scanning defenses between 36.5 percent and 100 percent of the time.

The most successful strategy for evading detection was to overflow the context window of the scanner – making the skill too long for the scanner to handle. "In ClawHub-style review, only the first 10K characters of long SKILL.md files are passed to the LLM reviewer, so we place the malicious instruction beyond this boundary while keeping it in the submitted skill," the authors explain.

"Our work shows that protecting agents requires treating natural-language specifications as security-sensitive objects," said Feizi. "We hope this encourages more careful design of skill registries, ranking mechanisms, governance pipelines, and agent-side defenses."

Source code and supporting documentation have been published on GitHub. ®
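The truncation gap the authors describe can be sketched in a few lines. The example below assumes a reviewer that sees only the first 10,000 characters of a file, matching the figure quoted above; the keyword check is a toy stand-in for an LLM reviewer, and the marker string, function name, and filler text are harmless placeholders invented for illustration.

```python
# Sketch of the truncation gap: a reviewer that only sees the first 10K characters.
# The keyword check is a toy stand-in for an LLM reviewer; names are hypothetical.
from typing import Optional

REVIEW_LIMIT = 10_000  # characters passed to the reviewer, per the quoted figure
MARKER = "PLACEHOLDER_MALICIOUS_INSTRUCTION"  # harmless stand-in for a bad instruction


def flags_skill(text: str, limit: Optional[int] = REVIEW_LIMIT) -> bool:
    """Return True if the marker is visible within the reviewed portion of the text."""
    visible = text if limit is None else text[:limit]
    return MARKER in visible


# Benign filler long enough to push the marker past the review boundary.
padding = "This skill reviews code quality.\n" * 400
skill_md = padding + MARKER + "\n"

print(len(padding) > REVIEW_LIMIT)         # filler alone exceeds the limit
print(flags_skill(skill_md))               # truncated review
print(flags_skill(skill_md, limit=None))   # full-file review
```

Under these assumptions the truncated review misses the marker while a full-file review catches it, which suggests a scanner should cover the same text the agent will eventually load.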

