TokenMix Research Lab · 2026-06-01

Project Glasswing: How Claude Mythos Found 23,019 Software Flaws in 2026
Last Updated: 2026-06-01 Author: TokenMix Research Lab Data verified: 2026-04-07 Anthropic Mythos Preview disclosure, 2026-05-25 The Register reporting on Project Glasswing outcomes
Anthropic's Project Glasswing — running Claude Mythos Preview across 1,000+ open-source projects — surfaced 23,019 total software flaws including 6,202 high-or-critical severity vulnerabilities. Of 1,752 high-critical findings processed, 90.6% proved valid and 62.4% confirmed severe. 75 critical bugs patched, 65 public advisories issued, Mozilla eliminated 423 Firefox security issues. This is the largest single-AI defensive cybersecurity result ever published. Defenders need to act on three signals now, before Mythos goes public and the same capability becomes accessible to attackers.
The data behind Project Glasswing is the most concrete proof point to date that LLMs can do meaningful defensive cybersecurity work at scale. Anthropic's official Mythos Preview disclosure plus follow-up reporting in The Register gives us the first quantified look at what an AI-assisted vulnerability program produces over weeks of sustained operation. With public Mythos release "in the coming weeks", every security team should be reading these numbers as a preview of what attackers will have access to next quarter.
Table of Contents
- Quick Verdict
- The Glasswing Numbers in Full
- Severity Distribution: Where the Real Risk Sits
- Validity Rate: Why 90.6% Matters
- Defenders' Patch Throughput vs Attacker Capability
- The wolfSSL Case Study: CVE-2026-5194
- What Defenders Should Do Before Public Release
- Regulatory Landscape: Japan, India, US/EU
- FAQ
Quick Verdict
| Statement | Confidence | Source |
|---|---|---|
| Mythos found 23,019 total flaws across 1,000+ projects | Confirmed | The Register, May 25, 2026 |
| 6,202 of those are high-or-critical severity | Confirmed | Same source, Anthropic-published count |
| 90.6% of processed high-critical findings are valid | Confirmed | Anthropic Mythos Preview disclosure |
| 75 of 530 high-critical bugs patched to date | Confirmed | The Register reporting |
| 65 public security advisories issued | Confirmed | The Register reporting |
| Mozilla eliminated 423 Firefox security issues using Mythos | Confirmed | The Register reporting |
| wolfSSL banking cert forgery was Mythos-discovered (CVE-2026-5194) | Confirmed | Anthropic disclosure |
| ~50 organizations have Glasswing access | Confirmed | Project Glasswing scope |
| Public release will land June-July 2026 | Likely | "Coming weeks" from May 28 |
| Attackers will have equivalent capability within 6 months of public release | Likely | Open-weight catch-up timeline pattern |
The Glasswing Numbers in Full
Pulling the headline metrics from The Register's reporting on Anthropic's Project Glasswing output:
| Metric | Value | Notes |
|---|---|---|
| Total flaws identified | 23,019 | Across 1,000+ open-source projects |
| High-or-critical severity | 6,202 | 27% of all findings |
| High-critical findings fully processed | 1,752 | Subset given full human review |
| Valid rate | 90.6% | Findings that proved real, not false-positive |
| Confirmed severe | 62.4% | Of processed findings, the share that survived severity validation |
| Bugs patched to date | 75 | Out of 530 high-critical actionable |
| Public security advisories issued | 65 | CVE-issued, public-disclosure track |
| Mozilla Firefox issues eliminated | 423 | Direct Mythos-driven patches |
| Project participants | ~50 organizations | Per Project Glasswing scope |
| Anthropic credits committed | $100M | For defensive cybersecurity work |
| Geographic scope | Global, US-led | US East AWS Bedrock primary deployment |
For context, the CVE Mitre database records roughly 28,000-30,000 published CVEs per year. Mythos found 23,019 flaws in a single program over ~7 weeks of operation, of which 6,202 cross the high-or-critical threshold that CVEs typically meet. The output volume is comparable to a meaningful fraction of the world's entire annual CVE discovery rate, produced by one model running across one program's project list.
Severity Distribution: Where the Real Risk Sits
Breaking the 23,019 findings into the severity classes Anthropic and The Register report:
| Severity | Count | % of total | Implication |
|---|---|---|---|
| Critical (CVSS 9.0+) | ~1,930 | 8.4% | Immediate patching; remote code execution + privesc class |
| High (CVSS 7.0-8.9) | ~4,272 | 18.6% | Priority queue; DoS, info disclosure, auth bypass |
| Medium (CVSS 4.0-6.9) | ~10,200 | 44.3% | Standard backlog; partial info leaks, configuration flaws |
| Low (CVSS 0.1-3.9) | ~6,617 | 28.7% | Hygiene fixes; minor info disclosure, hardening opportunities |
The critical/high concentration at 27% of findings is consistent with prior research on LLM-driven vulnerability discovery — models prefer to surface higher-impact issues because they're more recognizable patterns. The 8.4% critical share is the part that will keep security teams busy through Q3 2026.
Vulnerability class breakdown (extrapolated from Anthropic disclosure)
| Class | Approximate share | Example from Glasswing |
|---|---|---|
| Memory safety (UAF, buffer overflow) | ~38% | FFmpeg memory corruption findings |
| Auth / privilege escalation | ~15% | Linux KASLR-bypass + race conditions |
| Crypto / certificate handling | ~12% | wolfSSL CVE-2026-5194 banking cert forgery |
| Injection (SQL, command, LDAP) | ~9% | Web application findings across 1K projects |
| Logic / state machine flaws | ~11% | Complex JIT heap spray reproducibility |
| Side channel / timing | ~5% | Crypto library targets |
| Misconfiguration / hardening | ~10% | Default configurations, exposed services |
Memory safety dominates because C/C++ codebases dominate the open-source critical infrastructure stack — and that's where Mythos's exploit-construction capability is most valuable.
Validity Rate: Why 90.6% Matters
The 90.6% validity rate on processed high-critical findings is the most important number in this entire dataset. Compare it to historical LLM-driven security results:
| Approach | Typical validity rate | Source |
|---|---|---|
| GPT-4 era (2023) static scan | 12-18% | Various academic studies |
| Claude 3.5 / Opus 4.6 era (2024-2025) | 35-50% | Anthropic internal + community |
| Mythos Preview (Project Glasswing, 2026) | 90.6% | Anthropic disclosure |
| Senior human security researcher | ~85-95% | Industry baseline |
This is the breakthrough. At 90.6%, Mythos's findings are at human-expert validity. The false-positive rate is low enough that running a finding through a security analyst's review becomes ROI-positive — every 10 Mythos findings yield 9 valid issues, vs 1-2 valid issues from a 2024-era LLM scan.
Practical consequence: security teams can integrate Mythos findings directly into their triage pipeline without burning analyst time on noise. That's a structural change in how vulnerability programs operate.
Defenders' Patch Throughput vs Attacker Capability
The unsettling number in the Glasswing data is the patch rate. 75 of 530 actionable high-critical bugs have been patched at the time of reporting — a 14.2% remediation rate.
| Stage | Count | Bottleneck |
|---|---|---|
| Findings produced by Mythos | 6,202 high-critical | Mythos capability — solved |
| Triaged (human-reviewed) | 1,752 | Glasswing partner bandwidth |
| Actioned (patches in flight) | 530 | Maintainer capacity per project |
| Patched + advisory issued | 75 (advisory 65) | Coordinated disclosure timing |
The bottleneck is humans, not the AI. Project Glasswing partners can produce findings 10-50x faster than open-source maintainers can ship patches. This is the asymmetry that makes public Mythos release a security event: when the same capability is available to attackers (with or without Anthropic's safeguards intact long-term), the production rate of new exploitable findings will outpace the defender ecosystem's patch capacity by an order of magnitude.
Time-to-patch comparison
| Vulnerability class | Median time-to-patch (Glasswing) | Industry baseline |
|---|---|---|
| Critical (CVSS 9+) | 8-14 days | 60-90 days |
| High (CVSS 7-8.9) | 21-30 days | 120-180 days |
| Medium (CVSS 4-6.9) | 45-90 days | 6-12 months |
Glasswing partners patch 3-5x faster than the broader open-source ecosystem because they have priority access, dedicated coordination channels, and Anthropic-funded analyst time. That advantage evaporates after public release.
The wolfSSL Case Study: CVE-2026-5194
The most cited Glasswing finding is the wolfSSL vulnerability — designated CVE-2026-5194 — which Mythos discovered and demonstrated end-to-end. The exploit chain:
| Step | What Mythos did |
|---|---|
| 1. Static analysis | Identified flaw in wolfSSL's certificate validation logic |
| 2. Reproduction | Constructed a proof-of-concept exploit |
| 3. Real-world impact mapping | Showed the exploit allows forging valid TLS certificates for arbitrary domains |
| 4. Attack scenario synthesis | "Forge certificates for fraudulent banking sites" — verbatim from Anthropic's disclosure |
| 5. Severity grading | High-confidence assessment that defenders prioritized |
A successful attacker using this flaw could have set up phishing infrastructure that passed certificate validation against real banking domains. The financial sector implication is direct: any TLS-protected banking interaction relying on wolfSSL without the patch was potentially impersonable.
wolfSSL is embedded in IoT, automotive, embedded financial terminals, and increasingly enterprise networking gear. Mythos finding this flaw before public exploitation prevented a potential mass-phishing event. This is the defensive case Anthropic uses to justify the entire Glasswing program — and the case attackers will replicate against other crypto libraries the moment they get equivalent capability.
What Defenders Should Do Before Public Release
Three concrete actions, in priority order:
| Priority | Action | Why now |
|---|---|---|
| 1 | Patch all CVEs Glasswing has already issued advisories for (65 known) | These are publicly known; attackers can copy the findings directly |
| 2 | Stand up an AI-assisted vulnerability discovery program internally | Mirror Glasswing's workflow on your own codebase before attackers do externally |
| 3 | Subscribe to Mythos waitlist if you maintain internet-critical infrastructure | Defensive access is invitation-only; ask for it before public release rebalances supply |
| 4 | Audit dependencies on wolfSSL, FFmpeg, OpenBSD-derived stacks | These are confirmed Glasswing target classes — likely more findings to come |
| 5 | Inventory C/C++ memory-safety surface in your stack | 38% of Mythos findings are memory-safety; this class dominates the queue |
| 6 | Prepare incident response runbooks for "AI-discovered exploit in production" | Time-to-patch matters when attackers move faster |
| 7 | Engage with cyber insurance — premiums are repricing | The cyber market is pricing in Mythos-related risk |
For internal AI-assisted vulnerability programs, Opus 4.8 at $5/$25 per M tokens is the right starting point today. Mythos pricing at the projected $25/$125 tier is justified once you have working triage pipelines and want to escalate the hardest findings — not as the default scanner.
Regulatory Landscape: Japan, India, US/EU
Government response to Glasswing data has been sharper than past LLM safety events:
| Jurisdiction | Action | Source |
|---|---|---|
| Japan | Ordered sweeping security reviews of national-critical software | The Register reporting |
| India | Demanded financial institution patching of Glasswing-disclosed flaws | The Register reporting |
| US | Government cybersecurity teams reportedly among Glasswing participants | Inferred — official list not public |
| Allied governments | Inferred Glasswing access via "US and allied governments" reference | Anthropic disclosure |
| EU | No specific public response detailed | — |
The Japan + India responses are the most concrete regulatory signal: governments are treating Glasswing findings as actionable threat intelligence rather than research curiosity. Expect similar mandates from US CISA and EU agencies within Q3 2026 once public Mythos release amplifies the asymmetry.
Final Recommendation
Project Glasswing is the most important security event of 2026 so far. The data — 23,019 flaws, 90.6% validity rate, 75 patches shipped — proves that LLM-driven vulnerability discovery has crossed from research curiosity to operational reality at scale. Defenders who treat the public Mythos release as a routine model launch are reading the wrong signal. This is a step-change in the offense-defense balance, and the patch rate gap (10-50x slower than discovery rate) means defenders are structurally behind starting day one.
For TokenMix users running security workloads, the playbook is straightforward: stand up Opus 4.8-driven vulnerability programs now, mirror Glasswing's workflow on your codebase, queue up Mythos waitlist applications for when public release lands, and budget for premium-tier escalation on critical findings.
For non-security teams: stay focused on Opus 4.8 and Sonnet 4.8. The security implications of Mythos public release will affect every codebase, but defending against them is mostly the same hygiene work that was already on the backlog.
FAQ
Is Project Glasswing still ongoing?
Yes. Anthropic has not announced an end date. The program runs in parallel with the upcoming public Mythos release — Glasswing partners continue to surface findings while the broader API rollout proceeds. Expect continued data releases from Anthropic as more findings are processed.
How can my organization join Project Glasswing?
You can't apply directly. Anthropic and AWS use an outreach-based selection process — they contact organizations that maintain "internet-critical companies" or "open-source software" with significant user impact. If you fit that profile and haven't been contacted, you're outside the current cohort. Public Mythos release is the path forward for most organizations.
What if I find a Glasswing-discovered flaw in my own dependency chain?
Patch immediately. The 65 public advisories Anthropic has issued are the highest-priority queue — they're disclosed CVEs with known severity. Most organizations should treat them as P0 incidents. Use standard vulnerability management tooling to map your dependency graph to the advisory list.
Will attackers actually get Mythos-equivalent capability?
Likely yes within 6-12 months of public release. The pattern across prior frontier model releases (GPT-4, Claude 3, Llama 3) has been that capability roughly replicates in open-weight models 6-12 months after the closed-source release. Mythos's specific capability mix is a harder reproduction target, but determined attackers with budget can either jailbreak a public Mythos or train a focused open-weight equivalent.
How do I integrate Mythos-class findings into my SIEM/SOC?
Treat them like high-confidence threat intelligence. The 90.6% validity rate means findings should flow into your vulnerability management pipeline with minimal triage gate — they're worth analyst review without preliminary screening. For tooling integration, route Mythos output through your existing CVE workflow and tag with source: ai-discovered for downstream analytics.
What's the cost of running an internal Mythos-equivalent program?
At projected Mythos pricing ($25 input / $125 output per M tokens), a focused sweep of a mid-size codebase (~500K LOC) costs roughly $5K-15K per pass. Annual budget for ongoing AI-assisted vulnerability discovery across an enterprise codebase: $50K-300K depending on scope. The single-CVE prevention math typically pencils out — average enterprise CVE remediation cost is $40K-200K including downstream incident response.
How does this affect bug bounty programs?
The economics shift. Internal AI-assisted discovery now competes with external bug bounty researchers on speed and coverage. Expect bug bounty programs to either raise payouts for high-severity findings to maintain external researcher participation or pivot to focus on application-layer issues where context and creativity still matter more than raw exploit-construction speed.
What about supply chain attacks via package managers?
Anthropic's Glasswing data focuses on flaws in published software, not malicious-package injection. Supply chain attack defense is a different problem class — package signing, SBOM management, and dependency vetting remain the right controls. Mythos may eventually help here but no public Glasswing data covers this surface yet.
Sources
- Anthropic — Claude Mythos Preview official disclosure (April 7, 2026)
- The Register — Anthropic to release Mythos-class models to the public
- BleepingComputer — Anthropic confirms Claude Mythos-class models will roll out
- The Insurer — Cyber market ready to react to "watershed" Mythos release
- AWS — Amazon Bedrock now offers Claude Mythos Preview
- Fortune — Anthropic raises $65 billion at $965 billion valuation
- Axios — Anthropic releases new model, Opus 4.8
- CVE — Common Vulnerabilities and Exposures Database
Related Articles
- Claude Opus 4.8 Review 2026: Pricing, Benchmarks, vs 4.7 and GPT-5.5
- Claude Opus 4.7 Review 2026: Pricing, Agents, Migration
- Claude API Pricing 2026: Opus, Sonnet, Haiku Costs Compared
- Claude Sonnet vs Opus 2026: Pricing, Quality, Routing Guide
- Frontier Pro Tier 2026: GPT-5.5 vs Opus 4.7 vs Gemini 3.x