Law 4
Build Proof That Can't Be Argued With
Observed in 88% of benchmark companies
The old GTM playbook: hire a sales team, scale outbound, figure out retention later.
What made scaling actually work in this cohort: these companies built proof so specific and so quantified that enterprise buyers had little to push back on. Not "our product is good" — documented conversion rates, named customers, pre-agreed metrics, measured outcomes. Easy to say in retrospect. Hard to engineer. They engineered it.
Sierra ran six design partners before commercial launch; all six converted to paying customers (100% conversion rate, founder-reported). Gong ran twelve alpha customers before Series A and, at the end of the trial, executed what they called the "trial close": a direct ask to pay. Eleven of twelve converted on the spot. Glean ran 40 design partner deployments (free trials at first, later paid proof-of-concepts with pre-agreed success metrics) with majority conversion.
Moveworks spent three years in stealth before its 2019 Series A. By launch, the product achieved 25–40% autonomous ticket resolution, trained on 250M+ historical IT tickets. They were selling a proven result, not a hypothesis.
Critical detail: these were paid pilots. Not free. A free pilot generates data. A paid pilot — structured at 10–20% of TCV with pre-agreed success metrics — creates a commercial pathway embedded from the start. The design partner pays. The vendor delivers documented outcomes. Conversion is the expected default, not the exception.
The most repeatable pilot structure observed across the cohort has six pre-agreed components: (1) fixed duration (4–8 weeks depending on workflow), (2) explicitly scoped campaigns or accounts, (3) one primary success metric agreed before launch, (4) a defined comparison baseline, (5) pricing at contract pre-agreed so there is no renegotiation after a successful pilot, and (6) human approval required for all AI recommendations during the pilot — removing adoption risk before commitment. Companies that skipped any of these components showed lower pilot-to-contract conversion rates.
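The six components above amount to a pre-launch checklist. A minimal Python sketch of that checklist follows; it is purely illustrative, and every name in it (`PilotPlan`, `missing_components`, the example values) is hypothetical rather than anything a cohort company published.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class PilotPlan:
    """Illustrative checklist for the six pre-agreed pilot components.

    Field names are hypothetical; the six components mirror the cohort
    pattern described in the text above.
    """
    duration_weeks: Optional[int] = None            # (1) fixed duration, 4-8 weeks
    scoped_accounts: Optional[list] = None          # (2) explicitly scoped campaigns/accounts
    primary_metric: Optional[str] = None            # (3) one success metric agreed before launch
    comparison_baseline: Optional[str] = None       # (4) defined baseline to compare against
    contract_price_usd: Optional[float] = None      # (5) pricing pre-agreed, no renegotiation
    human_approval_required: Optional[bool] = None  # (6) human approves all AI output in pilot

    def missing_components(self) -> list:
        """Return the components not yet agreed. Per the cohort data, each
        skipped component correlated with lower pilot-to-contract conversion."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

# A pilot modeled on the cohort pattern: 6 weeks, one metric, a fixed price
plan = PilotPlan(duration_weeks=6,
                 scoped_accounts=["tier-1 support queue"],
                 primary_metric="ticket deflection rate",
                 comparison_baseline="human-only resolution rate",
                 contract_price_usd=150_000,
                 human_approval_required=True)
print(plan.missing_components())  # [] when all six are pre-agreed
```

An empty `PilotPlan()` reports all six components as missing, which is the structural point: a pilot launched with gaps in this list has left its commercial pathway unnegotiated.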
The operational implication: proof-before-scale requires a deliberate commitment to not scaling before the product is ready. Companies that scaled before proof found their reference customer base too weak to support enterprise procurement — and churn in the deals that did close.
Anti-pattern
Scaling GTM before building a reference library with documented outcomes. Free pilots without pre-agreed success metrics. Treating design partners as marketing relationships rather than revenue-pathway relationships.
Cross-Company Comparison
Proof-building structures and outcomes before GTM scaling
| Company | Proof mechanism | Scale & conversion | Timeline |
|---|---|---|---|
| Sierra | 6 paid design partners (10–20% TCV upfront); structured 4-step intake process | 6/6 converted to paying customers (100%) | Nov 2023 – Feb 2024 (3 months before public launch) |
| Glean | 40 design partner customers from founder's network; free trials at launch, evolved to paid POCs | 40 design partners before launch; majority converted (rate undisclosed) | 2019–2021 stealth phase (2+ years before commercial launch) |
| Gong | 12-company founder-network alpha program; trial close at trial end | 11/12 converted immediately; 12th converted later (effectively 12/12) | 2016, first quarter of operation; $100K ARR from alpha cohort |
| Moveworks | 3-year stealth with paying enterprise customers; documented 25–40% autonomous IT resolution before public launch | Paid enterprise customers at Series A announcement (rate undisclosed) | Founded 2016; first public customer 2019 (3 years stealth) |
| Harvey | Allen & Overy anchor pilot (free, 3,500 lawyers, 40,000 queries); 15,000+ firms on waitlist by Series A | First 50 enterprise customers all referrals; customers 40 → 235 in 2024 alone | Summer 2022 – April 2023 (Series A); stealth to $10M ARR by end of 2023 |
| Decagon | 100+ pre-founding discovery interviews with hard WTP filter; bespoke builds for first 3 customers; 4-week paid pilots with pre-agreed success metrics | Won every bake-off against competitors; pilot → contract conversion undisclosed but structurally locked by pre-agreed pricing | Month 0–4 (pre-product) to ~$50M ARR in 15 months |
How This Law Worked in Practice
Evidence from each benchmark company where this law was observed — how it manifested, what the mechanism was, and what sources confirm it.
Sierra's proof architecture was the most explicitly engineered of any company in this cohort. Before public launch in February 2024, Logan Randolph ran a structured design partner program that simultaneously validated product-market fit and pre-sold the company's first six customers. The selection process was deliberately rigorous: companies had to clear a four-step intake (30-minute discovery, live demo, security/technical deep dives, verbal commitment plus paid contract) and meet all four criteria — large scale, real operational pain, horizontal industry diversity, and willingness to be a "stranger" whose feedback would not be clouded by existing relationships.
The financial structure eliminated tire-kickers: partners paid 10–20% of total contract value upfront. Randolph's rationale: "We told partners upfront: 'We'll give you dedicated engineers, direct access to our founders, and our cell phone numbers. But in return, we need real investment from you — payment, access to your systems, and weekly meetings to get candid feedback.'" This co-investment pre-selected buyers who had cleared procurement and were genuinely committed to deploying.
The program ran from November 2023 through launch. All 6 design partners converted to paying customers — a structural inevitability, not a sales miracle: they had been building in production throughout, the product was shaped to their specific problems, and their internal champions had staked political capital on success (some received promotions). The proof package at launch included quantified outcomes from recognizable brands: WeightWatchers at 70% containment and 4.5/5 CSAT in the first week, SiriusXM's Harmony agent managing subscription retention, and Sonos handling complex device troubleshooting.
Post-launch, Reggie Marable (employee #23, Head of Global Sales) codified this into a Paid PoC structure — the commercial-scale equivalent of the design partner program. Clients tested agents in real-world scenarios with minimal risk, engagement was paid from day one, and timeboxed pilots produced measurable containment and CSAT data before the full contract was signed. By launch, Sierra also had >50% of its product roadmap shaped by design partner requests — meaning the proof was not just commercial but technical: the product was already battle-tested against enterprise-grade requirements.
Key evidence
- 6 design partners, all 6 converted to paying customers (100% conversion)
- Design partner payment structure: 10–20% of total contract value upfront
- WeightWatchers: 70% containment, 4.5/5 CSAT in first week
- >50% of Sierra's current product features originated from design partner requests
- Public launch Feb 13, 2024 with $110M Series A and 6 paying customers already live
Glean spent two full years building proof before commercial launch. From 2019 to 2021, Arvind Jain personally ran every sales conversation — acting as his own SDR — and recruited 40 design partner customers entirely from his Silicon Valley network. The target profile was deliberate: 500–2,000 employee tech companies whose founders and engineers understood the search problem viscerally because they had lived it at Google or adjacent companies. Early pricing was flat-rate (~$50K/year) to minimize friction, and initial deployments were free trials, with the sole focus on proving the product worked.
This was not random validation. Jain's background — 10+ years leading Google Search, Maps, and YouTube, plus co-founding Rubrik — meant he was solving a problem he and his network had personally experienced. The 40 design partners weren't arm's-length customers; they were practitioners who could articulate exactly what "enterprise search that actually works" would require. The resulting product, by the time of commercial launch in 2021, had a permissions-aware knowledge infrastructure covering 100+ SaaS apps that competitors would need 3–4 years to replicate.
The POC structure evolved from free trials to paid engagements. By commercial scale (2022+), the standard Glean POC ran 90 days, was paid from day one, tracked specific adoption KPIs (80% DAU/MAU success rate as threshold), and had success criteria defined at kickoff — removing post-POC objections. AJ Tennant, hired with Slack pedigree, anchored the expansion trigger to adoption data: CSMs brought query-frequency and DAU statistics to the executive sponsor, making the expansion conversation data-driven rather than sales-driven.
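The adoption-anchored expansion trigger described above reduces to a simple threshold check: bring adoption data to the executive sponsor only once it clears the pre-agreed bar. A minimal sketch follows, assuming an 80% DAU/MAU-style ratio as the success threshold; the function and parameter names are hypothetical, not Glean's actual tooling.

```python
def expansion_ready(dau: int, mau: int, threshold: float = 0.80) -> bool:
    """Data-driven expansion trigger: the CSM opens the expansion
    conversation only when the adoption ratio clears the pre-agreed bar.
    The 0.80 default mirrors the 80% DAU/MAU success rate described above."""
    if mau == 0:
        return False  # no active users yet, nothing to expand on
    return dau / mau >= threshold

# A CSM check before bringing numbers to the executive sponsor
print(expansion_ready(dau=850, mau=1000))  # True: 0.85 clears the 0.80 bar
print(expansion_ready(dau=500, mau=1000))  # False: adoption not yet proven
```

The design point is that the expansion conversation becomes a report of a number already agreed to at kickoff, not a negotiation.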
The proof package when GTM scaled included Forrester TEI results (141% ROI, $15.6M NPV, <6 month payback for a 10,000-employee composite), 40% wDAU/wMAU (2x the SaaS industry benchmark), and named customer metrics including T-Mobile (100K seats, 47% reduction in call resolution time), Wealthsimple (98% adoption, $1M+ annual savings), and Super.com (17x ROI, 1,500+ monthly hours saved). By the ChatGPT inflection moment in late 2022, Glean had 3+ years of production-grade deployments that competitors scrambling to build enterprise AI products could not match.
Key evidence
- 40 design partner customers before public launch, all from founder's Silicon Valley network
- Paid POC structure: 90 days, paid from day one, KPI-anchored at kickoff
- Forrester TEI: 141% ROI, $15.6M NPV, <6 month payback for 10,000-employee composite
- 40% wDAU/wMAU, 2x the SaaS industry benchmark
- T-Mobile: 100K seats, 47% reduction in call resolution time
Gong's founding proof story is one of the cleanest PMF signals in enterprise software. Eilon Reshef recruited 12 companies from the founders' personal network into an alpha program in 2016. At the end of the trial, Amit Bendov executed what he called a "trial close" — he directly asked each company to pay. Eleven of twelve agreed immediately. The twelfth eventually paid as well, making the effective conversion rate 12/12. This was not a sales achievement; it was product-market fit made visible. The PMF signal Eilon identified: "9 out of 10 complaints were how come you didn't even record this call?" — users were angry at absence, not critical of presence.
The alpha structure was deliberately tight: US B2B software companies, deal sizes $1K–$100K, using Webex, English-language only. This beachhead ICP was narrow enough to generate clean signal and broad enough to reach 5,000–10,000 potential customers. The alpha cohort generated $100K ARR in the first quarter of operation, scaling to $2M ARR by the end of 2016.
A critical proof-related insight: when Gong later moved away from pilots (prioritizing direct sales velocity), VP Sales Jameson Yung explicitly restored them upon joining. His reason: pilots were a core conversion driver because "from the beginning, when you put Gong into the hands of users, they went crazy for it." The product closed the deal; the sales rep managed the process. Gong also maintained anti-discounting discipline from the start — Amit explicitly refused early price concessions, treating price resistance as product-signal rather than negotiation, which validated demand authenticity.
The proof package for scaling included: Gong Labs research from 25,537+ B2B sales calls across 17 customer organizations (the first viral content piece), the Revenue Intelligence category positioning (October 2019), Forrester Wave Leader designation (2020), 700+ customers at Series C, and 5x revenue growth in 2018 and 3x year-to-date growth in 2019. By the time Gong reached enterprise scale, named customer ROI data was specific enough to be used directly in sales cycles — Elsevier (45% deal size growth, 35% faster closure), SpotOn (16% win rate improvement, 95% forecast accuracy), Paycor (141% increase in deal wins).
Key evidence
- 11/12 alpha customers converted immediately at trial close; 12th eventually paid (12/12 effective)
- PMF signal: "nine out of ten complaints were how come you didn't even record this call?"
- $100K ARR from alpha cohort → $2M ARR by end of Year 1 (2016)
- Gong Labs first piece: analysis of 25,537 sales calls from 17 organizations; "thousands of shares" on Sales Hacker
- Jameson Yung restored pilots upon joining; recognized as core conversion driver
Moveworks built proof before anyone knew they existed. Founded in 2016, the company spent three full years in stealth — not because they lacked ambition, but because they were deliberately constructing an evidence base that would make every subsequent sales conversation shorter and cheaper. They did not launch until they could demonstrate 25–40% autonomous IT resolution at real enterprise customers. When Moveworks announced its Series A in April 2019, it already had paying enterprise customers with documented resolution rates. They did not launch to prove the concept; they launched to scale a proven one.
The proof mechanism was the IT helpdesk wedge itself: ticket deflection is binary, measurable in days, and has a clear owner (CIO) with direct accountability. Moveworks chose this wedge precisely because the ROI could be demonstrated in weeks, not quarters. By launch, they had quantified metrics from named enterprises: Broadcom (88% ticket automation), Palo Alto Networks (351K hours saved), Mercari (74% ticket reduction, 94% first-touch adoption before any IT agent intervention). These were not projections — they were production deployments.
The commercial structure was designed to force proof: Moveworks charged from day one, with no free pilots. CEO Bhavin Shah was explicit: "If you can't deliver value over and over again, procurement comes to you after a year and says you didn't deliver." This forced sales discipline and ensured Moveworks only took on customers where delivery was genuinely achievable. The resulting proof package at commercial scale included a Forrester Total Economic Impact study (256% ROI, $11.5M benefit over 3 years for a 30,000-employee composite, <1 year payback) — a commissioned, third-party-validated instrument used in every deal to replace slide claims with audited economics.
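The TEI figures above imply a rough cost side that the document does not state: Forrester's ROI is net benefit divided by cost, so a 256% ROI on $11.5M of benefits implies roughly $3.2M of three-year cost. A back-of-envelope sketch of that inversion follows; the formula inversion and the assumption that the $11.5M is gross benefit are mine, not Forrester's.

```python
def implied_cost(gross_benefit: float, roi: float) -> float:
    """Invert the standard TEI formula ROI = (benefit - cost) / cost.
    Solving for cost gives cost = benefit / (1 + roi)."""
    return gross_benefit / (1 + roi)

benefit = 11.5e6  # $11.5M benefit over 3 years (Forrester TEI composite)
roi = 2.56        # 256% ROI, expressed as a ratio
cost = implied_cost(benefit, roi)
print(f"Implied 3-year cost: ${cost / 1e6:.2f}M")  # roughly $3.23M
```

That implied cost-to-benefit spread is what makes the TEI document usable in a deal: it replaces a vendor slide claim with audited economics a CFO can recompute.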
The three-year stealth period also produced a technical moat that compounded proof: Collective Learning trained on 250M+ historical IT tickets meant Moveworks arrived at enterprise meetings saying "it works now" while competitors said "we'll train it together." By the time CIOs started actively evaluating AI assistants post-ChatGPT, Moveworks had Forrester Leader recognition, 34+ published case studies, and a product already demonstrably working in production at hundreds of companies.
Key evidence
- 3 years in stealth (2016–2019) before public launch; launched with paying enterprise customers and documented resolution rates
- Documented 25–40% autonomous IT resolution before public announcement
- Forrester TEI: 256% ROI, $11.5M benefit over 3 years, <1 year payback for 30,000-employee composite
- 860% revenue growth 2019–2022 (Deloitte Fast 500)
- Charged from day one; no free pilots
Harvey's proof-building was prestige-first and patience-intensive. The anchor investment was the Allen & Overy pilot: 3,500 lawyers, 40,000 queries, a months-long free engagement that would horrify most sales organizations but was the correct investment for a buyer whose trust hierarchy starts at the pinnacle of global law. Winston Weinberg and Gabe Pereyra ran this personally — every demo was founder-led, every query analyzed. The founding B2C2B insight was critical: Harvey first sold to individual lawyers on their personal pain ("I don't want to go through tens of thousands of documents"), then converted those lawyers' firms.
Weinberg described the demo tactic explicitly: "If you're a litigator, everything needs to be filed with a federal court...I would basically download the last thing that they submitted to court. And then I would try to come up with prompts that were like, 'This is bad.' And because they're a litigator and I'm basically attacking something that they just wrote — they would instantly read the screen. It was risky because sometimes Harvey would hallucinate and then it would just be over. But the times that they got it right, it was over." This hyper-personalized demo approach — using the prospect's own recent public court filings — converted skeptical partners into active participants.
The Allen & Overy anchor generated a trust cascade that eliminated cold outreach requirements for the next 50 customers: all were referrals. Weinberg: "If you earn the trust of a few of those firms, the rest of them will trust you." By Series A (April 2023), 15,000+ firms were on waitlists — pulled demand dramatically exceeding supply. The proof package for scaling included AmLaw 100 penetration (28 firms, majority of AmLaw 10 and Vault 10 by end of 2024), Allen & Overy metrics (30% reduction in contract review time, 2–3 hours/week saved per lawyer), and 98% gross revenue retention demonstrating that firms were not just trialing but staying.
Critically, while the Allen & Overy anchor pilot was free, Harvey pivoted quickly to minimum $288K/year contracts with 12-month minimums and no self-serve. The proof phase was subsidized; the scaling phase was not.
Key evidence
- Allen & Overy anchor pilot: 3,500 lawyers, 40,000 queries, months-long free engagement
- First 50 enterprise customers all referrals from existing clients
- 15,000+ firms on waitlists by Series A; demand exceeding supply
- Customers 40 → 235 in 2024 alone (5.9x in one year)
- 98% gross revenue retention
- 28 of AmLaw 100, majority of AmLaw 10 and Vault 10 by end of 2024
Decagon's proof architecture began before the company existed. Co-founders Jesse Zhang and Ashwin Sreenivas ran 100+ discovery interviews with operations, support, and sales leaders before writing a single line of product code. The interviews had a hard filter: they asked directly "How much would you pay for this?" — not "Is this interesting?" or "Would this be useful?" The signal they were looking for was immediate, specific willingness to pay. They found it in enterprise support: "People were like, yes, if you can deploy this thing, I will sign a $150,000 check immediately, right? And this happened repeatedly." (Sreenivas, PMF Show, January 2026.) Other departments tested — sales, operations — produced "maybe a thousand dollars a month...maybe next quarter." Support won.
The first three customers received bespoke implementations — no platform, no templates. Zhang and Sreenivas built each one specifically for that customer's workflow, then extracted what was common. This forced deep understanding of the workflow rather than abstraction of it, and created a first cohort of customers deeply invested in making the product work. The MVP was built by both co-founders in three weeks; the first hire (Amy) joined when the company had reached approximately $950K ARR — meaning the first ~$1M ARR was built with two people.
The commercial proof mechanism is the 4-week paid pilot with pre-agreed pricing and pre-agreed success metrics (deflection rate and CSAT). Pricing is agreed before the pilot starts — no post-pilot renegotiation. This is structurally different from a free trial: the enterprise buyer has pre-committed to the price, eliminating the "we'll decide what it's worth based on results" game. ROI is visible within 4 weeks because support metrics are real-time and objective. In competitive bake-offs — which became standard at Decagon — the company won every evaluation against Zendesk AI, Intercom Fin, and others. Jesse Zhang: "Almost all of Decagon's clients conducted what's called a bake-off comparing Decagon against competitors like Salesforce — each time, Decagon's AI-powered agent came out as a winner." The proof package at scale includes: Duolingo (80% chat deflection, live in one month vs. prior vendor that failed to go live in a full year), Bilt (60K tickets/month, 70% AI-handled, hundreds of thousands of dollars monthly savings), Curology (65% cost reduction, resolution rate 5% → 80% autonomous).
Key evidence
- 100+ pre-founding discovery interviews with hard WTP filter; $150K immediate commitment signal confirmed enterprise support as target
- 4-week paid pilot with pre-agreed pricing and success metrics; no post-pilot price renegotiation
- Won every bake-off against competitors including Salesforce
- Duolingo: 80% chat deflection in first month; previous vendor failed to go live in a full year
- $0 to ~$50M ARR in 15 months