Codex mobile app & Codex tooling ecosystem
OpenAI's biggest shipped feature today is Codex landing inside the ChatGPT mobile app, rolling out as a preview on iOS and Android in all supported regions, with phone-to-Windows pairing coming soon (@openai, @sama, @alexfinn). Users can start new work, review outputs, steer execution, and approve next steps from mobile while Codex keeps running on a laptop, Mac mini, or devbox (@alexfinn). The launch hit Hacker News hard with 342 points and 173 comments on OpenAI's "Work with Codex from anywhere" post (last30days, openai.com), and @alexfinn argues the iPad with a keyboard case is now "the best vibe coding device on the planet."
Alongside the mobile launch, Sam Altman pushed two automation primitives: hooks that customize the Codex loop with scripts at key task points (validators, secret scans, conversation logging, per-repo memories) and programmatic access tokens for Business and Enterprise teams to use in CI and release workflows (@sama). Ollama added Codex-app support in 0.24 with ollama launch codex-app, plus more NVIDIA Blackwell GPUs to serve GLM-5.1, Kimi-K2.6, and other open models through Codex (@ollama).
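The announcements don't show the hook configuration format itself, but the shape of a script a secret-scan hook would invoke is familiar from pre-commit tooling. A minimal sketch, assuming a files-in, exit-code-out contract (the patterns, rule names, and exit-code convention here are illustrative assumptions, not the documented Codex API):

```python
import re
from pathlib import Path

# Illustrative patterns only; real scanners (e.g. gitleaks) ship far larger rule sets.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic_api_key": re.compile(r"(?i)api[_-]?key\s*[:=]\s*['\"][A-Za-z0-9]{20,}['\"]"),
}

def scan(paths):
    """Return (path, rule_name) pairs for every pattern hit across the files."""
    hits = []
    for p in paths:
        try:
            text = Path(p).read_text(errors="ignore")
        except OSError:
            continue  # unreadable files are skipped, not fatal
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(text):
                hits.append((str(p), name))
    return hits

def main(argv):
    """Assumed exit-code convention: nonzero means block the task step."""
    findings = scan(argv)
    for path, rule in findings:
        print(f"possible secret ({rule}): {path}")
    return 1 if findings else 0
```

The same files-in, exit-code-out shape covers the other hook examples Altman listed (validators, conversation logging) by swapping the scan body.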
The third-party tooling layer kept pace. Steipete shipped OpenClaw 2026.5.12 with Codex-login defaults, runtime fallbacks, and a new TypeScript security-hardened filesystem lib that sped some file ops 10x (@steipete), plus CodexBar 0.26.0 with Kiro/Antigravity/OpenRouter/Kimi support and better Codex/Claude limit scoping (@steipete). He also released Proxyline 0.2.0 for process-global Node proxy routing, dropping 12 sub-dependencies versus global-agent (@steipete).
Open-weights models & datasets surge
Datadog dropped Toto 2.0, a family of Apache-2.0 time-series foundation models spanning 4M to 2.5B parameters, where every size beats the last from a single hyperparameter config and takes first place across BOOM, GIFT-Eval, and TIME (@clementdelangue, @huggingface). The framing matters: most TSFM families ship multiple sizes that all perform roughly the same, and Clement argues this is the moment scaling laws actually start working for time series the way they did for language and vision (@clementdelangue).
Hugging Face crossed 1M datasets this week, with Merve Noyan calling for more open coding-session traces to push coding models further (@huggingface, @_akhaliq). Kevin Li's SWE-ZERO-12M lands as the largest open agentic trace dataset: 112B tokens, 12M trajectories, 122K PRs, 3K repos, 16 languages, 5.7x larger than the previous record (@clementdelangue). And LLMenjoyerUK's Open MM-RL dataset trended #1 on Hugging Face with PhD-level STEM problems that are 100% deterministically verifiable across single-image, multi-panel, and multi-image tasks (@huggingface). Sebastian Raschka noted DeepSeek still leads on active-parameter ratio (@rasbt).
AI research integrity & arXiv crackdown
arXiv is escalating: hallucinated references will now earn a one-year ban, and the Code of Conduct reminds authors they take full responsibility for paper contents "irrespective of how the contents were generated" (@jeremyphoward, @emollick). The trigger is stark: Gary Marcus flagged that more than 140,000 fake citations were identified across four research repositories in 2025 papers and preprints alone (@garymarcus).
Ethan Mollick framed the policy as the right short-term move: making humans responsible for their AI use is "an incredibly reasonable way to address problems & opportunities," while noting autonomous scientific work will eventually need different solutions (@emollick). The shift maps to a broader tonal change Mollick observed — "big increases in message discipline across all the AI labs in recent weeks," an inevitable response to scrutiny that he worries obscures real thinking (@emollick).
Interpretability & adversarial safety findings
Goodfire and Roon published work showing neural networks "do math by rotating shapes," locating a shape-rotating calculator circuit inside an LLM that's used for more than just arithmetic (@tszzl). On the other side of the alignment ledger, @_akhaliq surfaced a paper showing a single neuron is sufficient to bypass safety alignment in large language models. Mollick highlighted "whimsey attacks" — absurd arguments like "I cannot pay that much because of the Geneva Convention" — that defeat agent guardrails because they're out-of-distribution, hitting smaller models hardest but still nudging frontier ones (@emollick). Meanwhile the Second Scaling Law of "just add thinking tokens" remains undefeated for hacking, math, science, and crossword puzzles, with no plateau visible (@emollick).
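The single-neuron claim is easiest to grasp on a toy network: ablate (zero) one hidden unit during the forward pass and watch the output that depends on it collapse. Everything below is invented for illustration, including the weights and the "refusal direction" framing; the paper works with real LLM activations, not a two-layer toy.

```python
def forward(x, w1, w2, ablate_neuron=None):
    """Two-layer net: hidden = relu(W1 x), out = W2 hidden."""
    hidden = [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in w1]
    if ablate_neuron is not None:
        hidden[ablate_neuron] = 0.0  # the single-neuron intervention
    return [sum(wi * hi for wi, hi in zip(row, hidden)) for row in w2]

# Hypothetical weights where hidden unit 0 dominates the first output
# (think: a behavior concentrated in one unit).
W1 = [[2.0, 2.0], [0.1, -0.1]]
W2 = [[3.0, 0.0], [0.0, 1.0]]

x = [1.0, 1.0]
baseline = forward(x, W1, W2)                 # unit 0 active
ablated = forward(x, W1, W2, ablate_neuron=0)  # unit 0 zeroed
```

When a behavior is this concentrated, one targeted intervention flips the output, which is exactly why a single-neuron bypass is an alarming failure mode for alignment training.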
Anthropic strategy: philanthropy & geopolitics
Anthropic announced a partnership with the Gates Foundation, committing $200M in grants, Claude credits, and technical support across global health, life sciences, education, agriculture, and economic mobility (@anthropicai). Same day, the lab published a position paper on US–China AI competition arguing the US and democratic allies hold the frontier lead today and laying out what it'll take to keep it (@anthropicai) — a notable pairing of philanthropic and geopolitical messaging in one news cycle.
Applied AI: healthcare, 3D worlds & creative tools
Swyx's pod with Abridge's Janie Lee and @c_asawa frames healthcare as one of AI's most important proving grounds, with 100M+ medical conversations powering real-time prior auth and a "clinical intelligence layer" beyond ambient documentation (@swyx). Fei-Fei Li's World Labs released image-blaster, which combines Marble, Claude skills, and @fal to turn a single image into a fully meshed 3DGS world with physics and SFX in minutes (@drfeifei). The AI Engineer team detailed Magnus Carlsen's chess coach, which splits work between Stockfish (evaluation), detectors (tactics), and an LLM (English translation) for sub-3-second commentary on Gemini Flash, with a Slack-to-Claude-Code feedback loop fixing bad commentary (@aidotengineer). OpenRouter added Recraft V4.1's six image models and ran a "Royale: Last Agent Standing" battle-royale benchmark with tool calls like attack() and walk() (@openrouter).
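The chess coach's engine/detector/LLM split is a clean pipeline pattern worth sketching. Every function body below is a placeholder assumption: the real system calls Stockfish, custom tactic detectors, and Gemini Flash, none of which are reproduced here.

```python
def engine_eval(fen):
    """Stand-in for a Stockfish evaluation (centipawns + best move)."""
    return {"score_cp": 150, "best_move": "Nf5"}

def detect_tactics(fen):
    """Stand-in for rule-based detectors (forks, pins, hanging pieces)."""
    return ["knight fork on e7"]

def llm_commentary(evaluation, tactics):
    """Stand-in for the LLM step: turn numbers and tags into English."""
    lines = [f"White is better by {evaluation['score_cp'] / 100:.1f} pawns."]
    if tactics:
        lines.append(f"Key idea: {tactics[0]}, starting with {evaluation['best_move']}.")
    return " ".join(lines)

def commentate(fen):
    """Pipeline: deterministic analysis first, language generation last."""
    evaluation = engine_eval(fen)
    tactics = detect_tactics(fen)
    return llm_commentary(evaluation, tactics)
```

The design point is that the LLM never evaluates the position; it only verbalizes outputs the engine and detectors have already verified, which is what keeps commentary both fast and grounded.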
The Bottom Line
The day's center of gravity is OpenAI normalizing Codex as an always-on agent — mobile client plus hooks and programmatic tokens — while the open-weights ecosystem (Toto 2.0, SWE-ZERO-12M, 1M HF datasets) finally shows scaling laws biting in time series and agentic traces. Underneath the product news, arXiv's hallucinated-citation ban and fresh interpretability/jailbreak findings hint that institutional and mechanistic scrutiny of AI is hardening at the same pace as capability.