The Mythos Illusion: Why Real Cyber Threat Actors Don’t Need a Gated Model
While the security industry fixates on a single locked lab, the distributed frontier of local models, dark LLMs, and agentic workflows is already rewriting the rules of cybercrime.
Anthropic’s Claude Mythos Preview deserves to be taken seriously. A gated frontier model reportedly capable of finding, reasoning about, and in some cases helping exploit serious vulnerabilities in major software systems is not a trivial development. If the strongest claims hold, Mythos may change the economics of high-end vulnerability research1.
But Mythos did not create the AI cyber threat, nor does it create a new threat category. It has concentrated attention around a capability frontier that was already fragmented, distributed, and accelerating. That distinction matters.
The public story around Mythos is clean: a powerful new model appears, access is restricted, defenders are alarmed, and the security industry asks whether we have crossed a dangerous threshold. The operational reality is messier. Practical AI-enabled cyber risk was already spreading through open model ecosystems, malicious LLM services, commercial model misuse, local deployment, and agentic workflows.
Mythos may sharpen the frontier. But it does not explain why AI-enabled cybercrime is already becoming cheaper, faster, more multilingual, and easier to automate.
Last year, in The Defensive Revolution: A New Defensive Mandate, I argued that the democratization of AI cuts both ways. It lowers barriers for threat actors, but it also gives defenders access to the same class of adaptive, automated, machine-speed capabilities. I called this the Asymmetric Mirror Effect: when we focus intensely on breakthrough innovation from adversaries, we risk forgetting that defenders are adapting through the same technological lens.
That is the right frame for Mythos.
Project Glasswing is not merely an attempt to contain a dangerous model2. It is also an attempt to route frontier capability toward defenders first, giving major infrastructure owners a chance to use agentic workflows, predictive threat intelligence, behavioural anomaly detection, automated testing, and response automation before similar techniques diffuse more widely.
The question is not whether AI favours offence or defence. The better question is which side adapts faster to the new operational reality.
The mythology surrounding Mythos has landed exactly where one would expect: between genuine concern, vendor spectacle, and a familiar security industry reflex to turn every new capability into an existential threshold. The useful question is not whether Mythos is powerful. It is. The question is whether the operational threat landscape was waiting for Mythos at all. Public hype aside, Mythos does not create a new category of cyber risk. It matters because it concentrates, accelerates, and operationalizes capabilities that were already diffusing across the wider AI ecosystem.
The frontier is jagged, not clean
The cleanest Mythos narrative is also the least useful one: a single, gated frontier model becomes so capable that it must be treated as a monolithic cyber weapon. That story is emotionally compelling, but the evidence points to a more uneven reality. The capability frontier is jagged, not smooth.
In traditional human skill acquisition, we expect a relatively coherent progression. If a researcher can chain complex exploits across unconstrained environments, we safely assume they can also spot a simple buffer overflow. AI defies this logic. It creates a “Swiss cheese” model of competence, where a system might perform at PhD level on one task while failing at a primary school level on the next. That unevenness is separate from the familiar problem of hallucination.
Mythos should not be dismissed. A model that can perform cold vulnerability discovery, reason through exploitability, and chain bugs together under sparse conditions is strategically significant, but it can also feed an illusion of uniform power. If Anthropic’s strongest claims hold, Mythos belongs in the category of systems that shift the economics of high-end research.
However, public replication work complicates this mythology. Some showcased Mythos-style analysis appears recoverable by smaller, cheaper, open-weight models when the task is framed correctly, and the code context is narrow.3 In some reported cases, models with only a few billion parameters have identified serious vulnerabilities when enough surrounding scaffolding is supplied.4
This does not prove Mythos is overhyped. It proves that the frontier is jagged. A useful way to understand that jaggedness is as “peaks of reasoning and valleys of context.” The distinction lies in where those uneven edges fall:
Commoditizing ‘peaks’: Verifying or reconstructing a known vulnerability or finding a bug when the answer is partially embedded in the prompt or local context. This is becoming cheap, repeatable, and widely distributed.
Frontier-dependent ‘valleys’: Fully autonomous cold discovery, reliable exploit development, and sustained reasoning across large, messy, real-world systems. These remain harder, less reliable, and more dependent on frontier capability.
This jaggedness is essential for the security community to understand: a model’s success in a controlled benchmark is a peak, not a plateau. It does not guarantee reliability when the environment shifts.
The wrong question is: “Can Mythos hack everything?”
The right question is: “Which specific parts of the vulnerability research workflow are becoming automated, commoditized, and distributed?”
That question leads us away from viewing Mythos as a singular, mystical object and toward a broader analysis of model access, tool use, automation, and the changing economics of remediation. It also brings several operational realities into focus.
Reality 1: Safety is now a temporary state.
The first operational reality is that “safe” model release is increasingly temporary.
Once a strong open model appears, community variants often follow quickly on platforms such as Hugging Face: fine-tunes, merges, quantized builds, roleplay variants, jailbreak-focused releases, and sometimes models with weakened refusal behaviour. We have moved beyond simple prompt jailbreaks, where users try to trick a model into ignoring its safety rules through clever wording. Techniques such as abliteration go deeper. Instead of attacking the prompt layer, they attempt to modify how the model behaves internally by weakening the patterns associated with refusal. In plain terms, the aim is to make the model less likely to say “I can’t help with that,” while preserving as much of its underlying capability as possible.5 6
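To make the mechanics concrete, here is a minimal sketch of the weight-orthogonalization variant described in the abliteration literature: estimate a refusal direction from contrasting activations, then project that direction out of a layer’s output weights. The shapes, names, and commented model traversal are illustrative assumptions, not any specific library’s API.

```python
# Sketch of abliteration via weight orthogonalization (after Arditi et al., 2024).
# Assumes activations were captured at one layer for harmful vs. harmless prompts.
import torch

def refusal_direction(harmful_acts: torch.Tensor,
                      harmless_acts: torch.Tensor) -> torch.Tensor:
    """Each input is (num_prompts, hidden_dim). The refusal direction is the
    normalized difference of mean residual-stream activations."""
    direction = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
    return direction / direction.norm()

def orthogonalize(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove the refusal direction from a weight matrix that writes into the
    residual stream: W' = W - d d^T W, so the layer can no longer emit along d."""
    d = direction / direction.norm()
    return weight - torch.outer(d, d) @ weight

# Hypothetical traversal; real layer names depend on the architecture.
# for layer in model.layers:
#     layer.out_proj.weight.data = orthogonalize(layer.out_proj.weight.data, d)
```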
These techniques are not perfect. They are not lossless. Some models become brittle, unstable, or more willing than capable. But for many practical malicious tasks, the degradation may be modest enough that the model remains operationally useful.
And that is the key point: a criminal does not need Mythos or frontier-level reasoning to write credible phishing lures, translate scams cleanly, generate commodity scripts, summarize stolen documents, produce social engineering variations, or debug low-grade malware. A model that preserves enough practical capability while reducing refusal behaviour is already useful.
The deeper issue is local deployment. Once a model is downloaded and run on private hardware, the attacker is no longer negotiating with a provider’s safety system, rate limits, abuse monitoring, or logging. The model may be inferior to a frontier system, but it is available, modifiable, scriptable, and private. That combination is often more valuable to an attacker than benchmark superiority. The obsession with frontier capability can obscure the operational value of “good enough.” For many malicious workflows, the decisive question is not whether the model is the best in the world. It is whether it is capable enough, unrestricted enough, and private enough to be useful.
This is why safety must be understood as an operational condition, not a static property of the model. Conflating the two creates a false sense of control. A model’s safety posture depends on access controls, release governance, monitoring, tooling, deployment context, and the broader ecosystem around it. Once those conditions change, the safety posture changes with them.
Reality 2: Dark LLMs are not brilliant. They are frictionless.
The second operational reality is the professionalization of malicious AI tooling.
WormGPT and FraudGPT were already being marketed to criminals in 2023. Since then, researchers have tracked newer malicious or offensive-purpose systems, including later WormGPT-branded variants and other underground tools.7 These systems sit in a grey zone between criminal enablement, opportunistic rebranding, research leakage, and underground marketing.8
This sober view is necessary because the market for dark LLMs is largely defined by mediocrity. Many are scams or thin wrappers around existing models, better suited to extracting subscription fees from underground forums than facilitating sophisticated intrusions. In a landscape defined by recycled tooling and exaggerated claims, the technical capability often trails far behind the marketing.
But that does not make these systems irrelevant, as discussed above. Their value is not that they outperform frontier models. Their value is that they are frictionless. A user does not need to argue with a safety layer, construct elaborate jailbreaks, or repeatedly reformulate prompts. The interface is already oriented toward abuse. The user experience is optimized for phishing, fraud, malware assistance, evasion language, and low-level automation. For low-skill and mid-tier actors, that convenience matters more than perfection.
Many real attacks do not fail because the attacker lacks a browser zero-day. They fail because the attacker lacks fluency, patience, operational discipline, or the ability to customize at scale. Financially motivated attackers usually optimize for throughput, convenience, and return on effort rather than technical perfection. LLMs reduce exactly those frictions. They produce cleaner language, generate variants quickly, adapt tone to different targets, and help novices stitch together code and commands they only partially understand.9
The result is not a sudden army of elite hackers. It is a larger population of mediocre actors who can now operate with fewer visible mistakes. That is enough to matter.
Reality 3: The dangerous endpoint is no longer the chat box.
The Mythos debate misleads most when it treats model intelligence as the decisive variable. In practical cyber operations, the dangerous endpoint is no longer a chat box. It is an agentic loop and its workflow.
A chat model answers questions. An agentic system can gather context, call tools, write files, execute commands, interact with APIs, inspect results, and iterate.10 Even when the underlying model is imperfect, the workflow can compensate through repetition and feedback. This is where “workflow over weights” becomes the better frame.
Attackers do not necessarily need the smartest model. They need a controllable model connected to the right tools, operating with little supervision, able to iterate cheaply, and unconstrained by meaningful oversight.
In an agentic workflow, target intelligence can flow into tailored lure generation, code drafting, sandbox testing, automated revision, infrastructure preparation, credential handling, operator reporting, and follow-on tasking. None of this requires a single omnipotent model. The system becomes powerful because it is connected, persistent, and iterative.
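A minimal sketch shows why the loop, not the model, does the work. The harness below assumes a caller-supplied `query_model` function and two toy tools; none of this is a real framework’s API. The structure is the point: the model proposes an action, the harness executes it, and the observation is fed back so even a weak model can iterate toward a result.

```python
# Generic agent loop: propose -> execute -> observe -> retry.
import json
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    "read_file": lambda path: open(path).read(),
    "run_check": lambda cmd: f"(sandboxed result of {cmd!r})",  # stub tool
}

def agent_loop(task: str,
               query_model: Callable[[list[dict]], dict],
               max_steps: int = 10) -> str:
    """query_model takes the conversation so far and returns either
    {'tool': name, 'input': ...} or {'final': answer}."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = query_model(history)
        if "final" in action:              # the model decides it is done
            return action["final"]
        tool = TOOLS.get(action.get("tool", ""))
        result = tool(action["input"]) if tool else "unknown tool"
        # Feed the observation back so the next step can correct failures.
        history.append({"role": "tool",
                        "content": json.dumps({"result": result})})
    return "step budget exhausted"
```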
This also means the practical threat model is hybrid.
Attackers will not simply abandon commercial frontier systems for local models. Local and derestricted models are attractive when attackers want privacy, persistence, customization, and freedom from provider oversight. Commercial frontier systems remain attractive when attackers want higher capability and can obtain access through stolen accounts, reseller arrangements, compromised tenants, or plausible defensive workflows.
The lesson is not that every attacker will run a local model, but that attackers will use whatever model gives the best mix of capability, access, deniability, and cost for the task at hand.
This is where frontier models, local models, dark LLMs, automation frameworks, cloud tooling, and compromised credentials converge. The model is only one component. The workflow is the operational system.
Reality 4: The patching paradox.
There is a logical flaw in the “Mythos will destroy the internet” argument.
If AI can accelerate vulnerability discovery, defenders will naturally ask whether AI can also accelerate patch generation. The answer is: partly. But that answer requires qualification.
A model can suggest fixes, write candidate patches, generate regression tests, explain exploitability, and help triage severity, which is valuable. But remediation is not just code generation.
Patches must be reviewed, tested, integrated, deployed, and monitored. They must not break production systems, introduce regressions, or create new vulnerabilities. In large enterprises and open-source ecosystems, the bottleneck is often not the initial discovery of a flaw. It is prioritization, ownership, validation, maintenance, and deployment.
This is where Project Glasswing deserves more credit than a cynical reading allows.
If Anthropic’s capability claims are even partly correct, routing a system like Mythos to major infrastructure owners is a coherent defensive strategy. Major cloud providers, platform vendors, operating system maintainers, and open-source foundations collectively influence a huge share of the global software attack surface. For the infrastructure they cover, a controlled preview may genuinely help route discovery into the hands of those most able to remediate.
But that is also the limit of the model.
Glasswing may work for the infrastructure it covers. The harder question is what happens outside that coverage zone. The patching paradox hits hardest among organizations with weak asset visibility, limited engineering capacity, thin security teams, long deployment cycles, or complex software supply chains. These organizations are least likely to receive early access to frontier defensive tooling, least likely to have adopted AI-native security practices, and most likely to be overwhelmed when vulnerability discovery accelerates.
This is the ‘Asymmetric Mirror Effect’ in practice. The same capability that could accelerate exploit discovery can also accelerate triage, patch generation, regression testing, and defensive prioritization. The constraint is not only model capability. It is whether institutions can absorb machine-speed discovery into human-governed remediation pipelines.
A future in which AI finds more bugs is not automatically safer. It is safer only if disclosure, triage, ownership, patch engineering, deployment, and compensating controls scale with discovery.
The more immediate danger is even simpler: attackers do not need zero-days when known vulnerabilities remain unpatched. An AI-assisted operator can scan, classify, prioritize, and exploit poor hygiene at machine speed.
The boring failures still matter: exposed services, weak credentials, stale VPN appliances, vulnerable edge devices, forgotten cloud permissions, and unmonitored APIs. Mythos may sharpen the frontier, but “commodity AI” will continue to harvest the basics.
Reality 5: Securing the agentic surface.
If the threat is distributed and agentic, the defence must move beyond model panic.
The first shift is detection.
Defenders should not rely on AI-written text detection as a primary control. It is too brittle on its own. But artifact signals still matter in the human-facing layer. Email similarity, sender infrastructure, linguistic anomalies, impersonation patterns, attachment traits, voice provenance, and campaign clustering can all contribute useful evidence.
For intrusion activity, the stronger signals are behavioural: high-velocity enumeration, unusual API use, abnormal scripting, impossible user tempo, automated tool chains, and lateral movement patterns. The defensive shift is not from artifacts to behaviour everywhere. It is from single-signal detection to correlated evidence across content, identity, infrastructure, and execution.
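As a toy illustration of correlated evidence, the scorer below raises confidence when independent layers agree rather than when a single artifact looks suspicious. The signal names, thresholds, and weights are invented for the sketch, not a vendor schema.

```python
# Correlate weak signals across layers instead of alerting on any one artifact.
from dataclasses import dataclass

@dataclass
class Signal:
    name: str
    layer: str    # "content", "identity", "infrastructure", or "execution"
    score: float  # 0.0 (benign) to 1.0 (strongly suspicious)

def correlated_score(signals: list[Signal]) -> float:
    """The score rises when multiple layers agree, not when one layer shouts."""
    agreeing = {s.layer for s in signals if s.score >= 0.5}
    base = max((s.score for s in signals), default=0.0)
    if not agreeing:
        return base
    # Each additional corroborating layer adds a 15% boost, capped at 1.0.
    return min(1.0, base * (1 + 0.15 * (len(agreeing) - 1)))

alerts = [
    Signal("lure_similarity_cluster", "content", 0.6),
    Signal("new_sender_infrastructure", "infrastructure", 0.7),
    Signal("impossible_user_tempo", "execution", 0.8),
]
print(correlated_score(alerts))  # min(1.0, 0.8 * 1.3) = 1.0
```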
The second shift is access control.
Enterprises should treat agents as privileged users. If an agent can read documents, call internal APIs, execute code, access SaaS systems, or trigger workflows, it is part of the identity and access management problem. Agents need least privilege, scoped credentials, logging, approval gates, and execution isolation. An agent with broad permissions is not a productivity feature. It is a new attack surface.
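A minimal sketch of that posture, with hypothetical grant tables and an approval stub: every tool call is checked against a scoped grant, high-risk actions hit an approval gate, and every decision is logged for audit.

```python
# Treat the agent as a privileged identity: least privilege, gates, audit log.
import logging
from functools import wraps

logging.basicConfig(level=logging.INFO)
AGENT_GRANTS = {"report-bot": {"read_docs"}}       # scoped, least privilege
HIGH_RISK = {"execute_code", "modify_saas"}

def require_approval(agent: str, action: str) -> bool:
    """Stand-in for a human-in-the-loop gate (ticket, push approval, etc.)."""
    return False  # default deny in this sketch

def guarded(action: str):
    def decorator(fn):
        @wraps(fn)
        def wrapper(agent: str, *args, **kwargs):
            if action not in AGENT_GRANTS.get(agent, set()):
                logging.warning("DENY %s -> %s (not granted)", agent, action)
                raise PermissionError(action)
            if action in HIGH_RISK and not require_approval(agent, action):
                logging.warning("DENY %s -> %s (approval gate)", agent, action)
                raise PermissionError(action)
            logging.info("ALLOW %s -> %s", agent, action)  # audit trail
            return fn(agent, *args, **kwargs)
        return wrapper
    return decorator

@guarded("read_docs")
def read_docs(agent: str, doc_id: str) -> str:
    return f"contents of {doc_id}"
```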
The third shift is assumption.
Defenders should assume local model presence. They should not build detection strategies around the idea that malicious AI usage will pass through monitored commercial APIs. Some adversaries will use hosted services. Others will use local models, derestricted variants, underground tools, or private fine-tunes. Detection must focus on infrastructure, tradecraft, access, and effects, not merely the source of the intelligence.
The fourth shift is vulnerability management.
If systems like Mythos become common among trusted defenders, organizations will receive more findings, not fewer. They will need better triage, asset inventory, exploitability analysis, compensating controls, and patch deployment discipline. AI may help generate fixes, but institutions still need to decide what matters, who owns it, and how quickly it can be safely changed.
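As an illustration of what machine-speed triage might weigh, the sketch below ranks findings by exploitability, exposure, and asset criticality rather than raw severity alone. The fields, multipliers, and placeholder CVE identifiers are assumptions for the example, not a standard.

```python
# Toy triage ranking for a flood of AI-generated findings.
from dataclasses import dataclass

@dataclass
class Finding:
    cve: str
    severity: float           # e.g., CVSS base score, 0-10
    exploit_available: bool
    internet_exposed: bool
    asset_criticality: float  # 0-1, sourced from the asset inventory

def priority(f: Finding) -> float:
    score = f.severity / 10
    if f.exploit_available:
        score *= 1.5          # weaponized bugs jump the queue
    if f.internet_exposed:
        score *= 1.3
    return round(score * (0.5 + f.asset_criticality), 2)

queue = sorted(
    [Finding("CVE-2026-0001", 9.8, True, True, 0.9),
     Finding("CVE-2026-0002", 7.5, False, False, 0.4)],
    key=priority, reverse=True,
)
```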
This is where many organizations are underprepared. They are focused on whether AI can find vulnerabilities, but not whether their governance, engineering, and operational processes can absorb machine-speed discovery.
Glasswing and the ‘Security Poverty Trap’
These five operational realities point to a deeper structural problem: access to frontier defensive capability will not be evenly distributed.
Project Glasswing may be a rational response to Mythos, but it is also structurally selective. It gives early access to organizations already closest to AI-native security maturity, while the wider market continues to face cheaper, faster, and more automated AI-enabled threats.
Although there has been speculation that Project Glasswing is “pay-to-play,” there is no public evidence that partners must commit to future high spending with Anthropic. But the program clearly creates a commercial pathway. Anthropic subsidizes early usage with credits, restricts access to high-value infrastructure and security organizations, and then transitions to premium usage-based pricing after the preview period.11 Glasswing may therefore function both as a defensive cybersecurity initiative and as a strategically valuable customer development channel. For many smaller organizations, however, that pathway may remain out of reach.
This is where the Mythos debate intersects with the Security Poverty Trap.
Routing Mythos-style capability to major infrastructure owners is rational. The largest cloud providers, platform vendors, operating system maintainers, and open-source foundations influence a huge share of the global software attack surface. If frontier vulnerability discovery is going to be used defensively, it makes sense to place it first in the hands of organizations with the capacity to triage, patch, test, deploy, and coordinate at scale.
But the same logic that makes Glasswing defensible also exposes its limitation: the organizations most able to benefit from frontier defensive capability are the ones already best positioned to absorb it.
Glasswing is a program for organizations near the top of the defensive capability curve: firms with mature security teams, privileged vendor relationships, large engineering organizations, legal capacity, disclosure processes, telemetry, and operational muscle. They are precisely the organizations most able to convert frontier AI findings into defensive action. For everyone else, the gap may widen.
This is the Security Poverty Trap in practice. Organizations without AI-native security capabilities do not merely lag behind. They face a compounding disadvantage. AI-capable defenders improve continuously as their systems learn, their teams gain experience, and their institutional knowledge accumulates. Traditional organizations face faster, cheaper, and more automated threats while their own defensive posture changes slowly, if at all.
The same dynamic applies to Mythos. A frontier model may help major infrastructure owners discover and fix vulnerabilities earlier. But mid-tier SaaS vendors, under-resourced public bodies, industrial operators, small open-source maintainers, regional service providers, and bespoke enterprise software teams may not receive the same capability uplift. They will still face the downstream consequences of accelerated discovery, faster weaponization, and higher attacker automation.
This is not an argument against Glasswing. It is an argument against mistaking selective frontier access for ecosystem-wide resilience.
If AI vulnerability discovery accelerates, the winners will not simply be those with access to the best model. They will be those with the institutional capacity to absorb what the model finds. That means asset inventory, exploitability analysis, ownership, patch engineering, regression testing, deployment discipline, compensating controls, and governance processes that can operate under machine-speed pressure without collapsing into human-speed bureaucracy.
The danger is that Mythos strengthens the strongest defenders first, while commodity AI continues to strengthen ordinary attackers everywhere.
That is the poverty gap expanding. The organizations least likely to receive early access to frontier defensive tooling are often the same organizations most exposed to commodity AI-enabled attacks. They are also the organizations most likely to sit inside the supply chains of larger, better-defended institutions. This creates a systemic problem. The weakest nodes do not remain isolated. They become pivot points, staging infrastructure, and trusted pathways into more mature environments. In that sense, the Security Poverty Trap is not just a fairness issue or a market issue. It becomes a resilience issue.
A safer AI cyber future cannot depend only on giving frontier tools to frontier institutions. It also requires mechanisms that translate high-end discovery into usable defensive uplift for the long tail: maintainers, SMEs, public sector bodies, suppliers, and operators that lack the resources to build AI-native security alone.
Otherwise, Glasswing may help secure the castle while leaving the surrounding city increasingly exposed.
The threat was already distributed
Claude Mythos may matter most as a signal of where high-end AI security research is heading, rather than as a new threat category. Mythos’ genuine edge, if the public claims hold, is probably narrower than the mythology suggests: cold discovery, exploit construction, and sustained reasoning under difficult constraints.
That is a serious capability, and it must not be underestimated. But the everyday AI cyber threat was already here. It already lives in refusal removal, dark LLM services, commercial model misuse, local deployment, and agentic orchestration. Parts of it are publicly available, sometimes no further away than a download from Hugging Face. The threat is not one model. It is a distributed operating environment for abuse. Mythos may take parts of that environment to new levels by concentrating and accelerating capabilities that were already diffusing, but it did not create the environment itself.
The useful distinction is not “Mythos versus no Mythos.” It is frontier risk versus commodity risk.
The frontier risk is strategic: who controls access to high-capability systems, whether comparable models emerge elsewhere, whether gated systems leak, whether governments and vendors can coordinate remediation, and whether vulnerability discovery outpaces patch capacity.
The commodity risk is operational: more phishing, better scams, faster reconnaissance, easier malware modification, more convincing impersonation, cheaper automation, and a larger pool of actors able to do competent harm.
The mistake is to let the frontier story consume the whole debate.
If defenders focus only on the spectacular model(s) behind the guarded preview, they may miss the ordinary model already running in private workflows. They may secure access to the lab while leaving agents over-permissioned, APIs exposed, patch queues stale, and employees surrounded by AI-generated social engineering.
The most dangerous AI in cyber may not be the one with the most impressive benchmark or the most alarming press release. It may be the ordinary model running locally, connected to tools, iterating quietly, and embedded inside a workflow that never needed permission to exist.
1. https://www.anthropic.com/glasswing
Anthropic describes Claude Mythos Preview as an unreleased frontier model with advanced coding, reasoning, and agentic capabilities, and claims it has identified thousands of high-severity zero-day vulnerabilities, including vulnerabilities in major operating systems and web browsers.
2. Ibid. Project Glasswing is Anthropic’s defensive cybersecurity initiative providing selected partners and additional critical infrastructure organizations access to Claude Mythos Preview for defensive vulnerability discovery and remediation, supported by up to $100 million in usage credits and $4 million in donations to open-source security organizations.
3. https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jagged-frontier
AISLE tested Anthropic’s showcased Mythos vulnerabilities against smaller, cheaper open-weight models and argued that much of the public analysis could be recovered when the target and context were supplied. AISLE’s broader conclusion was that the defensible moat may lie more in the surrounding system and expertise than in the model alone.
4. Ibid. AISLE reported that its FreeBSD detection test was solved by every tested model, including GPT-OSS-20B with approximately 3.6B active parameters. Radware’s subsequent commentary summarized this as evidence that smaller open-weight models could recover parts of the Mythos showcase analysis once the target was known.
5. https://doi.org/10.48550/arXiv.2406.11717
Andy Arditi, Oscar Obeso, Aaquib Syed, Daniel Paleka, Nina Panickssery, Wes Gurnee, and Neel Nanda (2024). Refusal in Language Models Is Mediated by a Single Direction. arXiv.
6. https://huggingface.co/blog/mlabonne/abliteration
For a practical explainer of abliteration, see Maxime Labonne, “Uncensor any LLM with abliteration,” Hugging Face Blog. Labonne explains how to identify a model’s refusal direction and ablate it through inference-time intervention or weight orthogonalization, while noting associated performance trade-offs.
7. https://unit42.paloaltonetworks.com/dilemma-of-ai-malicious-llms
Palo Alto Networks Unit 42, “The Dual-Use Dilemma of AI: Malicious LLMs,” discusses WormGPT 4, KawaiiGPT, and the commercialization and democratization of malicious LLM tooling.
8. https://cetas.turing.ac.uk/publications/generative-ai-cybersecurity
Mercer, S., & Watson, T. (2024, June). Generative AI in Cybersecurity: Assessing impact on current and future malicious software (CETaS Briefing Papers).
9. https://www.huntress.com/cybersecurity-101/topic/wormgpt
For a practitioner-oriented overview of WormGPT and related malicious LLM use cases, see Huntress, “What Is WormGPT?” The article discusses WormGPT’s use in business email compromise, phishing, malware assistance, multilingual social engineering, and cybercrime democratization.
10. https://red.anthropic.com/2026/mythos-preview
Anthropic’s Frontier Red Team write-up on Claude Mythos Preview describes using an agentic scaffold in vulnerability-finding exercises and provides examples of Mythos identifying and, in some cases, developing exploits for serious software vulnerabilities.
11. https://www.anthropic.com/glasswing
Anthropic states: “Anthropic’s commitment of $100M in model usage credits to Project Glasswing and additional participants will cover substantial usage throughout this research preview. Afterward, Claude Mythos Preview will be available to participants at $25/$125 per million input/output tokens (participants can access the model on the Claude API, Amazon Bedrock, Google Cloud’s Vertex AI, and Microsoft Foundry).”



