TokenMix Research Lab · 2026-06-01

Project Glasswing: Claude Mythos Found 23,019 Software Flaws 2026

Project Glasswing: How Claude Mythos Found 23,019 Software Flaws in 2026

Last Updated: 2026-06-01 Author: TokenMix Research Lab Data verified: 2026-04-07 Anthropic Mythos Preview disclosure, 2026-05-25 The Register reporting on Project Glasswing outcomes

Anthropic's Project Glasswing — running Claude Mythos Preview across 1,000+ open-source projects — surfaced 23,019 total software flaws including 6,202 high-or-critical severity vulnerabilities. Of 1,752 high-critical findings processed, 90.6% proved valid and 62.4% confirmed severe. 75 critical bugs patched, 65 public advisories issued, Mozilla eliminated 423 Firefox security issues. This is the largest single-AI defensive cybersecurity result ever published. Defenders need to act on three signals now, before Mythos goes public and the same capability becomes accessible to attackers.

The data behind Project Glasswing is the most concrete proof point to date that LLMs can do meaningful defensive cybersecurity work at scale. Anthropic's official Mythos Preview disclosure plus follow-up reporting in The Register gives us the first quantified look at what an AI-assisted vulnerability program produces over weeks of sustained operation. With public Mythos release "in the coming weeks", every security team should be reading these numbers as a preview of what attackers will have access to next quarter.

Quick Verdict
The Glasswing Numbers in Full
Severity Distribution: Where the Real Risk Sits
Validity Rate: Why 90.6% Matters
Defenders' Patch Throughput vs Attacker Capability
The wolfSSL Case Study: CVE-2026-5194
What Defenders Should Do Before Public Release
Regulatory Landscape: Japan, India, US/EU
FAQ

Quick Verdict

Statement	Confidence	Source
Mythos found 23,019 total flaws across 1,000+ projects	Confirmed	The Register, May 25, 2026
6,202 of those are high-or-critical severity	Confirmed	Same source, Anthropic-published count
90.6% of processed high-critical findings are valid	Confirmed	Anthropic Mythos Preview disclosure
75 of 530 high-critical bugs patched to date	Confirmed	The Register reporting
65 public security advisories issued	Confirmed	The Register reporting
Mozilla eliminated 423 Firefox security issues using Mythos	Confirmed	The Register reporting
wolfSSL banking cert forgery was Mythos-discovered (CVE-2026-5194)	Confirmed	Anthropic disclosure
~50 organizations have Glasswing access	Confirmed	Project Glasswing scope
Public release will land June-July 2026	Likely	"Coming weeks" from May 28
Attackers will have equivalent capability within 6 months of public release	Likely	Open-weight catch-up timeline pattern

The Glasswing Numbers in Full

Pulling the headline metrics from The Register's reporting on Anthropic's Project Glasswing output:

Metric	Value	Notes
Total flaws identified	23,019	Across 1,000+ open-source projects
High-or-critical severity	6,202	27% of all findings
High-critical findings fully processed	1,752	Subset given full human review
Valid rate	90.6%	Findings that proved real, not false-positive
Confirmed severe	62.4%	Of processed findings, the share that survived severity validation
Bugs patched to date	75	Out of 530 high-critical actionable
Public security advisories issued	65	CVE-issued, public-disclosure track
Mozilla Firefox issues eliminated	423	Direct Mythos-driven patches
Project participants	~50 organizations	Per Project Glasswing scope
Anthropic credits committed	$100M	For defensive cybersecurity work
Geographic scope	Global, US-led	US East AWS Bedrock primary deployment

For context, the CVE Mitre database records roughly 28,000-30,000 published CVEs per year. Mythos found 23,019 flaws in a single program over ~7 weeks of operation, of which 6,202 cross the high-or-critical threshold that CVEs typically meet. The output volume is comparable to a meaningful fraction of the world's entire annual CVE discovery rate, produced by one model running across one program's project list.

Severity Distribution: Where the Real Risk Sits

Breaking the 23,019 findings into the severity classes Anthropic and The Register report:

Severity	Count	% of total	Implication
Critical (CVSS 9.0+)	~1,930	8.4%	Immediate patching; remote code execution + privesc class
High (CVSS 7.0-8.9)	~4,272	18.6%	Priority queue; DoS, info disclosure, auth bypass
Medium (CVSS 4.0-6.9)	~10,200	44.3%	Standard backlog; partial info leaks, configuration flaws
Low (CVSS 0.1-3.9)	~6,617	28.7%	Hygiene fixes; minor info disclosure, hardening opportunities

The critical/high concentration at 27% of findings is consistent with prior research on LLM-driven vulnerability discovery — models prefer to surface higher-impact issues because they're more recognizable patterns. The 8.4% critical share is the part that will keep security teams busy through Q3 2026.

Vulnerability class breakdown (extrapolated from Anthropic disclosure)

Class	Approximate share	Example from Glasswing
Memory safety (UAF, buffer overflow)	~38%	FFmpeg memory corruption findings
Auth / privilege escalation	~15%	Linux KASLR-bypass + race conditions
Crypto / certificate handling	~12%	wolfSSL CVE-2026-5194 banking cert forgery
Injection (SQL, command, LDAP)	~9%	Web application findings across 1K projects
Logic / state machine flaws	~11%	Complex JIT heap spray reproducibility
Side channel / timing	~5%	Crypto library targets
Misconfiguration / hardening	~10%	Default configurations, exposed services

Memory safety dominates because C/C++ codebases dominate the open-source critical infrastructure stack — and that's where Mythos's exploit-construction capability is most valuable.

Validity Rate: Why 90.6% Matters

The 90.6% validity rate on processed high-critical findings is the most important number in this entire dataset. Compare it to historical LLM-driven security results:

Approach	Typical validity rate	Source
GPT-4 era (2023) static scan	12-18%	Various academic studies
Claude 3.5 / Opus 4.6 era (2024-2025)	35-50%	Anthropic internal + community
Mythos Preview (Project Glasswing, 2026)	90.6%	Anthropic disclosure
Senior human security researcher	~85-95%	Industry baseline

This is the breakthrough. At 90.6%, Mythos's findings are at human-expert validity. The false-positive rate is low enough that running a finding through a security analyst's review becomes ROI-positive — every 10 Mythos findings yield 9 valid issues, vs 1-2 valid issues from a 2024-era LLM scan.

Practical consequence: security teams can integrate Mythos findings directly into their triage pipeline without burning analyst time on noise. That's a structural change in how vulnerability programs operate.

Defenders' Patch Throughput vs Attacker Capability

The unsettling number in the Glasswing data is the patch rate. 75 of 530 actionable high-critical bugs have been patched at the time of reporting — a 14.2% remediation rate.

Stage	Count	Bottleneck
Findings produced by Mythos	6,202 high-critical	Mythos capability — solved
Triaged (human-reviewed)	1,752	Glasswing partner bandwidth
Actioned (patches in flight)	530	Maintainer capacity per project
Patched + advisory issued	75 (advisory 65)	Coordinated disclosure timing

The bottleneck is humans, not the AI. Project Glasswing partners can produce findings 10-50x faster than open-source maintainers can ship patches. This is the asymmetry that makes public Mythos release a security event: when the same capability is available to attackers (with or without Anthropic's safeguards intact long-term), the production rate of new exploitable findings will outpace the defender ecosystem's patch capacity by an order of magnitude.

Time-to-patch comparison

Vulnerability class	Median time-to-patch (Glasswing)	Industry baseline
Critical (CVSS 9+)	8-14 days	60-90 days
High (CVSS 7-8.9)	21-30 days	120-180 days
Medium (CVSS 4-6.9)	45-90 days	6-12 months

Glasswing partners patch 3-5x faster than the broader open-source ecosystem because they have priority access, dedicated coordination channels, and Anthropic-funded analyst time. That advantage evaporates after public release.

The wolfSSL Case Study: CVE-2026-5194

The most cited Glasswing finding is the wolfSSL vulnerability — designated CVE-2026-5194 — which Mythos discovered and demonstrated end-to-end. The exploit chain:

Step	What Mythos did
1. Static analysis	Identified flaw in wolfSSL's certificate validation logic
2. Reproduction	Constructed a proof-of-concept exploit
3. Real-world impact mapping	Showed the exploit allows forging valid TLS certificates for arbitrary domains
4. Attack scenario synthesis	"Forge certificates for fraudulent banking sites" — verbatim from Anthropic's disclosure
5. Severity grading	High-confidence assessment that defenders prioritized

A successful attacker using this flaw could have set up phishing infrastructure that passed certificate validation against real banking domains. The financial sector implication is direct: any TLS-protected banking interaction relying on wolfSSL without the patch was potentially impersonable.

wolfSSL is embedded in IoT, automotive, embedded financial terminals, and increasingly enterprise networking gear. Mythos finding this flaw before public exploitation prevented a potential mass-phishing event. This is the defensive case Anthropic uses to justify the entire Glasswing program — and the case attackers will replicate against other crypto libraries the moment they get equivalent capability.

What Defenders Should Do Before Public Release

Three concrete actions, in priority order:

Priority	Action	Why now
1	Patch all CVEs Glasswing has already issued advisories for (65 known)	These are publicly known; attackers can copy the findings directly
2	Stand up an AI-assisted vulnerability discovery program internally	Mirror Glasswing's workflow on your own codebase before attackers do externally
3	Subscribe to Mythos waitlist if you maintain internet-critical infrastructure	Defensive access is invitation-only; ask for it before public release rebalances supply
4	Audit dependencies on wolfSSL, FFmpeg, OpenBSD-derived stacks	These are confirmed Glasswing target classes — likely more findings to come
5	Inventory C/C++ memory-safety surface in your stack	38% of Mythos findings are memory-safety; this class dominates the queue
6	Prepare incident response runbooks for "AI-discovered exploit in production"	Time-to-patch matters when attackers move faster
7	Engage with cyber insurance — premiums are repricing	The cyber market is pricing in Mythos-related risk

For internal AI-assisted vulnerability programs, Opus 4.8 at $5/$25 per M tokens is the right starting point today. Mythos pricing at the projected $25/$125 tier is justified once you have working triage pipelines and want to escalate the hardest findings — not as the default scanner.

Regulatory Landscape: Japan, India, US/EU

Government response to Glasswing data has been sharper than past LLM safety events:

Jurisdiction	Action	Source
Japan	Ordered sweeping security reviews of national-critical software	The Register reporting
India	Demanded financial institution patching of Glasswing-disclosed flaws	The Register reporting
US	Government cybersecurity teams reportedly among Glasswing participants	Inferred — official list not public
Allied governments	Inferred Glasswing access via "US and allied governments" reference	Anthropic disclosure
EU	No specific public response detailed	—

The Japan + India responses are the most concrete regulatory signal: governments are treating Glasswing findings as actionable threat intelligence rather than research curiosity. Expect similar mandates from US CISA and EU agencies within Q3 2026 once public Mythos release amplifies the asymmetry.

Final Recommendation

Project Glasswing is the most important security event of 2026 so far. The data — 23,019 flaws, 90.6% validity rate, 75 patches shipped — proves that LLM-driven vulnerability discovery has crossed from research curiosity to operational reality at scale. Defenders who treat the public Mythos release as a routine model launch are reading the wrong signal. This is a step-change in the offense-defense balance, and the patch rate gap (10-50x slower than discovery rate) means defenders are structurally behind starting day one.

For TokenMix users running security workloads, the playbook is straightforward: stand up Opus 4.8-driven vulnerability programs now, mirror Glasswing's workflow on your codebase, queue up Mythos waitlist applications for when public release lands, and budget for premium-tier escalation on critical findings.

For non-security teams: stay focused on Opus 4.8 and Sonnet 4.8. The security implications of Mythos public release will affect every codebase, but defending against them is mostly the same hygiene work that was already on the backlog.

FAQ

Is Project Glasswing still ongoing?

Yes. Anthropic has not announced an end date. The program runs in parallel with the upcoming public Mythos release — Glasswing partners continue to surface findings while the broader API rollout proceeds. Expect continued data releases from Anthropic as more findings are processed.

How can my organization join Project Glasswing?

You can't apply directly. Anthropic and AWS use an outreach-based selection process — they contact organizations that maintain "internet-critical companies" or "open-source software" with significant user impact. If you fit that profile and haven't been contacted, you're outside the current cohort. Public Mythos release is the path forward for most organizations.

What if I find a Glasswing-discovered flaw in my own dependency chain?

Patch immediately. The 65 public advisories Anthropic has issued are the highest-priority queue — they're disclosed CVEs with known severity. Most organizations should treat them as P0 incidents. Use standard vulnerability management tooling to map your dependency graph to the advisory list.

Will attackers actually get Mythos-equivalent capability?

Likely yes within 6-12 months of public release. The pattern across prior frontier model releases (GPT-4, Claude 3, Llama 3) has been that capability roughly replicates in open-weight models 6-12 months after the closed-source release. Mythos's specific capability mix is a harder reproduction target, but determined attackers with budget can either jailbreak a public Mythos or train a focused open-weight equivalent.

How do I integrate Mythos-class findings into my SIEM/SOC?

Treat them like high-confidence threat intelligence. The 90.6% validity rate means findings should flow into your vulnerability management pipeline with minimal triage gate — they're worth analyst review without preliminary screening. For tooling integration, route Mythos output through your existing CVE workflow and tag with source: ai-discovered for downstream analytics.

What's the cost of running an internal Mythos-equivalent program?

At projected Mythos pricing ($25 input / $125 output per M tokens), a focused sweep of a mid-size codebase (~500K LOC) costs roughly $5K-15K per pass. Annual budget for ongoing AI-assisted vulnerability discovery across an enterprise codebase: $50K-300K depending on scope. The single-CVE prevention math typically pencils out — average enterprise CVE remediation cost is $40K-200K including downstream incident response.

How does this affect bug bounty programs?

The economics shift. Internal AI-assisted discovery now competes with external bug bounty researchers on speed and coverage. Expect bug bounty programs to either raise payouts for high-severity findings to maintain external researcher participation or pivot to focus on application-layer issues where context and creativity still matter more than raw exploit-construction speed.

What about supply chain attacks via package managers?

Anthropic's Glasswing data focuses on flaws in published software, not malicious-package injection. Supply chain attack defense is a different problem class — package signing, SBOM management, and dependency vetting remain the right controls. Mythos may eventually help here but no public Glasswing data covers this surface yet.