Run #12 — 2026-04-24 — X-Research Methodbook


Key developments

OpenAI GPT-5.5 (Apr 23): First OpenAI flagship explicitly framed as an 'agent runtime' rather than a chat model. 88.7% on SWE-bench, 60% fewer hallucinations than GPT-5.4. Three variants ($5/$30 per M tokens). Released 6 weeks after GPT-5.4: model cadence is now faster than enterprise eval cycles.
ServiceNow -17.7% after earnings (Apr 23): Now Assist grew 130% YoY and the CEO raised the AI forecast 50% to $1.5B, yet the stock still fell 17.7% on gross margin compression (81.5% vs 82.1% expected). Fortune: 'The numbers are good, but the vibes are bad.' Salesforce, Workday, and Oracle were dragged down with it. The market is treating any SaaS weakness as a referendum on AI disruption.
DeepSeek V4 Flash + Pro (Apr 24): Open-source, 1M context window, $0.14/M input tokens (35-200x cheaper than frontier). Claims near-frontier reasoning but lags 3-6 months on knowledge. Tencent and Alibaba are in talks to invest, which would be DeepSeek's first external funding round.
Sierra acquires Fragment (YC, French, Apr 23): 3rd acquisition in 2 months, adding workflow integration and European expansion. Consolidation pattern: well-funded, sales-led AI leaders are buying capability rather than building it.
Google Cloud $750M partner fund (Apr 22, Cloud Next '26): Embeds Google FDEs at Accenture, Capgemini, Cognizant, TCS for agentic AI deployment. SIs are the enterprise AI delivery layer — mirrors OpenAI/Accenture+Infosys+PwC pattern from run-11.
Builder Demo Radar — Karpathy AutoResearch still viral (circulating Apr 23): 66K+ GitHub stars, Greg Isenberg framing for GTM: 'give it a goal like lower customer acquisition cost — then it runs.' Qualifies: agentic, unexpected GTM use case, Claude Code runtime, SF builder community signal.

Events added: 66 total

Events added this run

OpenAI GPT-5.5: Agent Runtime Framing
ServiceNow SaaS Referendum (-17.7%)
DeepSeek V4 Flash + Pro


Post ideas

ServiceNow paradox — strong AI results, punished anyway

Seva Ustinov

@sevaustinov

ServiceNow's Now Assist grew 130% YoY. Stock fell 18%. The market doesn't believe the AI story even when the numbers are real. The new SaaS question isn't 'do you have AI?' It's 'does AI drive net new revenue — or just cannibalize your seats?'

SaaS AI referendum market signal 243/280 ✓

AutoResearch + the goal function is everything THREAD

Seva Ustinov @sevaustinov 1/3

Karpathy's AutoResearch ran 700 experiments in 2 days and found 20 optimizations that actually worked. No sleep. No status quo bias. No missed windows. The bottleneck in ML research was never compute — it was human availability.

228/280 ✓

Seva Ustinov @sevaustinov 2/3

The same thing is true in performance marketing. We already knew what to test. The bottleneck was how fast we could run and analyze. An agent running 24/7 doesn't change what's possible — it changes how fast you get there.

222/280 ✓

Seva Ustinov @sevaustinov 3/3

Greg Isenberg framed it best: 'give it a goal like lower customer acquisition cost.' The goal function IS the skill now. If your objective is clear, an agent can run 100 experiments overnight to find the path.

209/280 ✓

extends Seva's 'humans are the bottleneck' frame to autonomous experimentation

DeepSeek V4 makes model choice a procurement question

Seva Ustinov

@sevaustinov

DeepSeek V4 Flash: $0.14/M input tokens. Frontier models: $5-30/M. That's 35-200x cheaper with near-comparable performance. 'Which model does this product run on?' just became a real procurement question every AI vendor will face in 2026.

Open-source cost floor as buyer power shift 238/280 ✓
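
The 35-200x range quoted in the DeepSeek draft can be sanity-checked with a few lines of arithmetic. This is a minimal sketch, assuming the $5 and $30 frontier figures are per-million-token prices directly comparable to DeepSeek's $0.14/M input price; the source does not say whether they are input or output prices. The function and constant names are illustrative only.

```python
# Sanity check for the "35-200x cheaper" claim.
# Assumption: all three prices are USD per million tokens and comparable.

def price_ratio(frontier_usd_per_m: float, baseline_usd_per_m: float) -> float:
    """Return how many times more expensive the frontier price is."""
    return frontier_usd_per_m / baseline_usd_per_m

DEEPSEEK_INPUT_USD_PER_M = 0.14   # from the digest
FRONTIER_LOW_USD_PER_M = 5.0      # low end of the quoted frontier range
FRONTIER_HIGH_USD_PER_M = 30.0    # high end of the quoted frontier range

low = price_ratio(FRONTIER_LOW_USD_PER_M, DEEPSEEK_INPUT_USD_PER_M)
high = price_ratio(FRONTIER_HIGH_USD_PER_M, DEEPSEEK_INPUT_USD_PER_M)

print(f"{low:.1f}x to {high:.1f}x")  # prints "35.7x to 214.3x"
```

Under these assumptions the ratio runs from about 36x to about 214x, so "35-200x" is a fair rounding of the low end and slightly conservative at the high end.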

Comment opportunities

Sam Altman @sama · Apr 23

We believe in iterative deployment; although GPT-5.5 is already a smart model, we expect rapid improvements. Iterative deployment is a big part of our safety strategy; we believe the world will be best equipped to win at the team sport of AI resilience this way.

↩ Reply · iterative deployment mirrors buyer adoption curve — connects to Seva's yolo mode trust curve

Seva Ustinov

@sevaustinov

Iterative deployment is the right call for both sides. Buyers adopt agents the same way: 10% of workflows first, check results, expand. The vendors that win will be the ones whose iteration cadence matches the buyer's trust curve — not the model release pace.

259/280 ✓

Tomasz Tunguz @ttunguz · Apr 22


The most important market in AI isn't chatbots or search. It's coding. Cursor is the fastest-growing developer tool in history, at $2 billion in annualized revenue.

↩ Reply · extends Seva's 'vibe coding moves complexity up one level, not eliminates it' frame

Seva Ustinov

@sevaustinov

Cursor at $2B ARR is proof that vibe coding doesn't kill developer tools — it moves complexity up one level. Simple things get built in-house. The orchestration, integration, reliability layer is where the real product lives. Same story is playing out in every vertical.

270/280 ✓