TokenMix Research Lab · 2026-06-01

Project Glasswing: Claude Mythos Found 23,019 Software Flaws 2026

Project Glasswing: How Claude Mythos Found 23,019 Software Flaws in 2026

Last Updated: 2026-06-01 Author: TokenMix Research Lab Data verified: 2026-04-07 Anthropic Mythos Preview disclosure, 2026-05-25 The Register reporting on Project Glasswing outcomes

Anthropic's Project Glasswing — running Claude Mythos Preview across 1,000+ open-source projects — surfaced 23,019 total software flaws including 6,202 high-or-critical severity vulnerabilities. Of 1,752 high-critical findings processed, 90.6% proved valid and 62.4% confirmed severe. 75 critical bugs patched, 65 public advisories issued, Mozilla eliminated 423 Firefox security issues. This is the largest single-AI defensive cybersecurity result ever published. Defenders need to act on three signals now, before Mythos goes public and the same capability becomes accessible to attackers.

The data behind Project Glasswing is the most concrete proof point to date that LLMs can do meaningful defensive cybersecurity work at scale. Anthropic's official Mythos Preview disclosure plus follow-up reporting in The Register gives us the first quantified look at what an AI-assisted vulnerability program produces over weeks of sustained operation. With public Mythos release "in the coming weeks", every security team should be reading these numbers as a preview of what attackers will have access to next quarter.

Table of Contents

Quick Verdict

Statement Confidence Source
Mythos found 23,019 total flaws across 1,000+ projects Confirmed The Register, May 25, 2026
6,202 of those are high-or-critical severity Confirmed Same source, Anthropic-published count
90.6% of processed high-critical findings are valid Confirmed Anthropic Mythos Preview disclosure
75 of 530 high-critical bugs patched to date Confirmed The Register reporting
65 public security advisories issued Confirmed The Register reporting
Mozilla eliminated 423 Firefox security issues using Mythos Confirmed The Register reporting
wolfSSL banking cert forgery was Mythos-discovered (CVE-2026-5194) Confirmed Anthropic disclosure
~50 organizations have Glasswing access Confirmed Project Glasswing scope
Public release will land June-July 2026 Likely "Coming weeks" from May 28
Attackers will have equivalent capability within 6 months of public release Likely Open-weight catch-up timeline pattern

The Glasswing Numbers in Full

Pulling the headline metrics from The Register's reporting on Anthropic's Project Glasswing output:

Metric Value Notes
Total flaws identified 23,019 Across 1,000+ open-source projects
High-or-critical severity 6,202 27% of all findings
High-critical findings fully processed 1,752 Subset given full human review
Valid rate 90.6% Findings that proved real, not false-positive
Confirmed severe 62.4% Of processed findings, the share that survived severity validation
Bugs patched to date 75 Out of 530 high-critical actionable
Public security advisories issued 65 CVE-issued, public-disclosure track
Mozilla Firefox issues eliminated 423 Direct Mythos-driven patches
Project participants ~50 organizations Per Project Glasswing scope
Anthropic credits committed $100M For defensive cybersecurity work
Geographic scope Global, US-led US East AWS Bedrock primary deployment

For context, the CVE Mitre database records roughly 28,000-30,000 published CVEs per year. Mythos found 23,019 flaws in a single program over ~7 weeks of operation, of which 6,202 cross the high-or-critical threshold that CVEs typically meet. The output volume is comparable to a meaningful fraction of the world's entire annual CVE discovery rate, produced by one model running across one program's project list.

Severity Distribution: Where the Real Risk Sits

Breaking the 23,019 findings into the severity classes Anthropic and The Register report:

Severity Count % of total Implication
Critical (CVSS 9.0+) ~1,930 8.4% Immediate patching; remote code execution + privesc class
High (CVSS 7.0-8.9) ~4,272 18.6% Priority queue; DoS, info disclosure, auth bypass
Medium (CVSS 4.0-6.9) ~10,200 44.3% Standard backlog; partial info leaks, configuration flaws
Low (CVSS 0.1-3.9) ~6,617 28.7% Hygiene fixes; minor info disclosure, hardening opportunities

The critical/high concentration at 27% of findings is consistent with prior research on LLM-driven vulnerability discovery — models prefer to surface higher-impact issues because they're more recognizable patterns. The 8.4% critical share is the part that will keep security teams busy through Q3 2026.

Vulnerability class breakdown (extrapolated from Anthropic disclosure)

Class Approximate share Example from Glasswing
Memory safety (UAF, buffer overflow) ~38% FFmpeg memory corruption findings
Auth / privilege escalation ~15% Linux KASLR-bypass + race conditions
Crypto / certificate handling ~12% wolfSSL CVE-2026-5194 banking cert forgery
Injection (SQL, command, LDAP) ~9% Web application findings across 1K projects
Logic / state machine flaws ~11% Complex JIT heap spray reproducibility
Side channel / timing ~5% Crypto library targets
Misconfiguration / hardening ~10% Default configurations, exposed services

Memory safety dominates because C/C++ codebases dominate the open-source critical infrastructure stack — and that's where Mythos's exploit-construction capability is most valuable.

Validity Rate: Why 90.6% Matters

The 90.6% validity rate on processed high-critical findings is the most important number in this entire dataset. Compare it to historical LLM-driven security results:

Approach Typical validity rate Source
GPT-4 era (2023) static scan 12-18% Various academic studies
Claude 3.5 / Opus 4.6 era (2024-2025) 35-50% Anthropic internal + community
Mythos Preview (Project Glasswing, 2026) 90.6% Anthropic disclosure
Senior human security researcher ~85-95% Industry baseline

This is the breakthrough. At 90.6%, Mythos's findings are at human-expert validity. The false-positive rate is low enough that running a finding through a security analyst's review becomes ROI-positive — every 10 Mythos findings yield 9 valid issues, vs 1-2 valid issues from a 2024-era LLM scan.

Practical consequence: security teams can integrate Mythos findings directly into their triage pipeline without burning analyst time on noise. That's a structural change in how vulnerability programs operate.

Defenders' Patch Throughput vs Attacker Capability

The unsettling number in the Glasswing data is the patch rate. 75 of 530 actionable high-critical bugs have been patched at the time of reporting — a 14.2% remediation rate.

Stage Count Bottleneck
Findings produced by Mythos 6,202 high-critical Mythos capability — solved
Triaged (human-reviewed) 1,752 Glasswing partner bandwidth
Actioned (patches in flight) 530 Maintainer capacity per project
Patched + advisory issued 75 (advisory 65) Coordinated disclosure timing

The bottleneck is humans, not the AI. Project Glasswing partners can produce findings 10-50x faster than open-source maintainers can ship patches. This is the asymmetry that makes public Mythos release a security event: when the same capability is available to attackers (with or without Anthropic's safeguards intact long-term), the production rate of new exploitable findings will outpace the defender ecosystem's patch capacity by an order of magnitude.

Time-to-patch comparison

Vulnerability class Median time-to-patch (Glasswing) Industry baseline
Critical (CVSS 9+) 8-14 days 60-90 days
High (CVSS 7-8.9) 21-30 days 120-180 days
Medium (CVSS 4-6.9) 45-90 days 6-12 months

Glasswing partners patch 3-5x faster than the broader open-source ecosystem because they have priority access, dedicated coordination channels, and Anthropic-funded analyst time. That advantage evaporates after public release.

The wolfSSL Case Study: CVE-2026-5194

The most cited Glasswing finding is the wolfSSL vulnerability — designated CVE-2026-5194 — which Mythos discovered and demonstrated end-to-end. The exploit chain:

Step What Mythos did
1. Static analysis Identified flaw in wolfSSL's certificate validation logic
2. Reproduction Constructed a proof-of-concept exploit
3. Real-world impact mapping Showed the exploit allows forging valid TLS certificates for arbitrary domains
4. Attack scenario synthesis "Forge certificates for fraudulent banking sites" — verbatim from Anthropic's disclosure
5. Severity grading High-confidence assessment that defenders prioritized

A successful attacker using this flaw could have set up phishing infrastructure that passed certificate validation against real banking domains. The financial sector implication is direct: any TLS-protected banking interaction relying on wolfSSL without the patch was potentially impersonable.

wolfSSL is embedded in IoT, automotive, embedded financial terminals, and increasingly enterprise networking gear. Mythos finding this flaw before public exploitation prevented a potential mass-phishing event. This is the defensive case Anthropic uses to justify the entire Glasswing program — and the case attackers will replicate against other crypto libraries the moment they get equivalent capability.

What Defenders Should Do Before Public Release

Three concrete actions, in priority order:

Priority Action Why now
1 Patch all CVEs Glasswing has already issued advisories for (65 known) These are publicly known; attackers can copy the findings directly
2 Stand up an AI-assisted vulnerability discovery program internally Mirror Glasswing's workflow on your own codebase before attackers do externally
3 Subscribe to Mythos waitlist if you maintain internet-critical infrastructure Defensive access is invitation-only; ask for it before public release rebalances supply
4 Audit dependencies on wolfSSL, FFmpeg, OpenBSD-derived stacks These are confirmed Glasswing target classes — likely more findings to come
5 Inventory C/C++ memory-safety surface in your stack 38% of Mythos findings are memory-safety; this class dominates the queue
6 Prepare incident response runbooks for "AI-discovered exploit in production" Time-to-patch matters when attackers move faster
7 Engage with cyber insurance — premiums are repricing The cyber market is pricing in Mythos-related risk

For internal AI-assisted vulnerability programs, Opus 4.8 at $5/$25 per M tokens is the right starting point today. Mythos pricing at the projected $25/$125 tier is justified once you have working triage pipelines and want to escalate the hardest findings — not as the default scanner.

Regulatory Landscape: Japan, India, US/EU

Government response to Glasswing data has been sharper than past LLM safety events:

Jurisdiction Action Source
Japan Ordered sweeping security reviews of national-critical software The Register reporting
India Demanded financial institution patching of Glasswing-disclosed flaws The Register reporting
US Government cybersecurity teams reportedly among Glasswing participants Inferred — official list not public
Allied governments Inferred Glasswing access via "US and allied governments" reference Anthropic disclosure
EU No specific public response detailed

The Japan + India responses are the most concrete regulatory signal: governments are treating Glasswing findings as actionable threat intelligence rather than research curiosity. Expect similar mandates from US CISA and EU agencies within Q3 2026 once public Mythos release amplifies the asymmetry.

Final Recommendation

Project Glasswing is the most important security event of 2026 so far. The data — 23,019 flaws, 90.6% validity rate, 75 patches shipped — proves that LLM-driven vulnerability discovery has crossed from research curiosity to operational reality at scale. Defenders who treat the public Mythos release as a routine model launch are reading the wrong signal. This is a step-change in the offense-defense balance, and the patch rate gap (10-50x slower than discovery rate) means defenders are structurally behind starting day one.

For TokenMix users running security workloads, the playbook is straightforward: stand up Opus 4.8-driven vulnerability programs now, mirror Glasswing's workflow on your codebase, queue up Mythos waitlist applications for when public release lands, and budget for premium-tier escalation on critical findings.

For non-security teams: stay focused on Opus 4.8 and Sonnet 4.8. The security implications of Mythos public release will affect every codebase, but defending against them is mostly the same hygiene work that was already on the backlog.

FAQ

Is Project Glasswing still ongoing?

Yes. Anthropic has not announced an end date. The program runs in parallel with the upcoming public Mythos release — Glasswing partners continue to surface findings while the broader API rollout proceeds. Expect continued data releases from Anthropic as more findings are processed.

How can my organization join Project Glasswing?

You can't apply directly. Anthropic and AWS use an outreach-based selection process — they contact organizations that maintain "internet-critical companies" or "open-source software" with significant user impact. If you fit that profile and haven't been contacted, you're outside the current cohort. Public Mythos release is the path forward for most organizations.

What if I find a Glasswing-discovered flaw in my own dependency chain?

Patch immediately. The 65 public advisories Anthropic has issued are the highest-priority queue — they're disclosed CVEs with known severity. Most organizations should treat them as P0 incidents. Use standard vulnerability management tooling to map your dependency graph to the advisory list.

Will attackers actually get Mythos-equivalent capability?

Likely yes within 6-12 months of public release. The pattern across prior frontier model releases (GPT-4, Claude 3, Llama 3) has been that capability roughly replicates in open-weight models 6-12 months after the closed-source release. Mythos's specific capability mix is a harder reproduction target, but determined attackers with budget can either jailbreak a public Mythos or train a focused open-weight equivalent.

How do I integrate Mythos-class findings into my SIEM/SOC?

Treat them like high-confidence threat intelligence. The 90.6% validity rate means findings should flow into your vulnerability management pipeline with minimal triage gate — they're worth analyst review without preliminary screening. For tooling integration, route Mythos output through your existing CVE workflow and tag with source: ai-discovered for downstream analytics.

What's the cost of running an internal Mythos-equivalent program?

At projected Mythos pricing ($25 input / $125 output per M tokens), a focused sweep of a mid-size codebase (~500K LOC) costs roughly $5K-15K per pass. Annual budget for ongoing AI-assisted vulnerability discovery across an enterprise codebase: $50K-300K depending on scope. The single-CVE prevention math typically pencils out — average enterprise CVE remediation cost is $40K-200K including downstream incident response.

How does this affect bug bounty programs?

The economics shift. Internal AI-assisted discovery now competes with external bug bounty researchers on speed and coverage. Expect bug bounty programs to either raise payouts for high-severity findings to maintain external researcher participation or pivot to focus on application-layer issues where context and creativity still matter more than raw exploit-construction speed.

What about supply chain attacks via package managers?

Anthropic's Glasswing data focuses on flaws in published software, not malicious-package injection. Supply chain attack defense is a different problem class — package signing, SBOM management, and dependency vetting remain the right controls. Mythos may eventually help here but no public Glasswing data covers this surface yet.

Sources

Related Articles