Sales Law 4

Paid Pilots with Pre-Agreed Exit Conditions

Observed in 88% of benchmark companies

The old playbook: offer a free pilot to reduce buyer risk and convert afterward. What the best companies did instead: structure paid pilots with pre-agreed success metrics, fixed durations, and pricing locked before the pilot begins. Free pilots have no commercial gravity. Paid pilots — even at 10–20% of TCV — create a commercial pathway from day one.

Sierra ran six design partners before commercial launch. All six converted to paying customers — a 100% conversion rate. Not because the pilots were free, but because success was defined before each pilot began and the product delivered it. Gong ran twelve alpha customers before Series A. Eleven of twelve converted in what Bendov called the "trial close" — the pilot was structured from the start as a conversion event, not a test period. Glean ran 40+ paid proof-of-concept deployments with pre-agreed success metrics before scaling the sales team. The majority converted, and the POC methodology was refined on each iteration until it became a predictable conversion engine. Harvey gave every prospect a hyper-personalized paid POC using their actual case documents; conversion from POC to enterprise contract was the expected default.

The six-component pilot structure observed across the cohort:
(1) Fixed duration — 4–8 weeks depending on workflow complexity.
(2) Explicit scope — one workflow or one department.
(3) One primary success metric agreed before launch — not a list of hoped-for outcomes.
(4) A defined comparison baseline — what the metric is today without the product.
(5) Pricing locked before the pilot begins — no renegotiation after a successful pilot.
(6) Human approval required for all AI recommendations during the pilot period — this removes adoption risk before the full commitment.

Companies that skipped any of these components showed lower pilot-to-contract conversion. The structure is not bureaucratic — it is the mechanism that makes conversion the expected default rather than a negotiation.
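Read as a kickoff checklist, the six components are easy to encode. A minimal sketch; the field names and readiness check are hypothetical illustrations, not any company's actual tooling:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PilotStructure:
    """Hypothetical model of the six-component pilot checklist."""
    duration_weeks: Optional[int] = None     # (1) fixed duration, e.g. 4-8 weeks
    scope: Optional[str] = None              # (2) one workflow or one department
    success_metric: Optional[str] = None     # (3) single primary metric, agreed pre-launch
    baseline_value: Optional[float] = None   # (4) what the metric is today, without the product
    price_locked: bool = False               # (5) pricing fixed before the pilot begins
    human_approval_gate: bool = False        # (6) human review of all AI output during pilot

    def missing_components(self) -> list:
        """Checklist items still undefined at kickoff; empty means launch-ready."""
        gaps = []
        if self.duration_weeks is None:   gaps.append("fixed duration")
        if self.scope is None:            gaps.append("explicit scope")
        if self.success_metric is None:   gaps.append("primary success metric")
        if self.baseline_value is None:   gaps.append("comparison baseline")
        if not self.price_locked:         gaps.append("locked pricing")
        if not self.human_approval_gate:  gaps.append("human approval gate")
        return gaps

pilot = PilotStructure(duration_weeks=4, scope="one support workflow",
                       success_metric="deflection rate", baseline_value=0.38,
                       price_locked=True, human_approval_gate=True)
print(pilot.missing_components())  # [] -> all six components pinned down before kickoff
```

The point of the check is the same as the prose: a pilot with any component left open is a negotiation waiting to happen, not a conversion event.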

Key examples
Sierra, Gong, Glean, Harvey, Decagon
Anti-pattern
Free pilots without pre-agreed success metrics. Endless POC extensions as a proxy for "almost there." Starting price negotiation after a successful pilot rather than locking it before. Treating the pilot as a test of whether the product works — rather than a structured demonstration that it already does.

Cross-Company Comparison

How each company structured their paid pilot or proof-of-concept — the six components (duration, scope, metric, baseline, pricing, human gate) and the conversion outcome achieved

Sierra
Pilot structure: Paid design partner program (pre-launch) with a 4-stage intake: discovery (no selling), live demo, security/technical deep dive, and verbal commitment plus a paid contract at 10–20% of TCV. Kickoff required three stakeholders: executive sponsor, technical owner (API access), and CX business owner. Success criteria defined at kickoff as 3–4 primary problems with a committed launch date.
Duration and scope: 2–6 months per design partner; scoped to specific CX workflows (containment, subscription retention). First agent version deployed within 2 weeks of kickoff — even if imperfect. Weekly 30-minute standups throughout.
Conversion outcome: 6 of 6 design partners converted to paying customers — a 100% conversion rate. WeightWatchers achieved 70% containment and 4.5/5 CSAT in the first week of operation.

Gong
Pilot structure: Alpha cohort of 12 companies recruited from the founders' personal networks. At pilot end, Amit executed a "trial close" — directly asking each company to pay. Users who tried the product "went crazy for it." Jameson Yung later restored the pilot motion after the company had moved away from it, recognizing that pilots were a core conversion driver, not a sales tactic.
Duration and scope: Trial period structured as a conversion event from the start. Scoped to recording and analyzing sales calls — one workflow, one success metric (calls captured and analyzed). Eilon's PMF signal: users complained when calls were not recorded, indicating dependency had formed.
Conversion outcome: 11 of 12 alpha customers converted — a 91.7% conversion rate; the 12th eventually paid as well. Bendov called this the "trial close." The pilot was not a test of whether the product worked — it was a structured demonstration that it already did.

Glean
Pilot structure: Paid POC (evolved from early free trials to paid from day one). KPIs anchored at POC kickoff — query frequency, DAU/MAU, and an 80% success rate as the primary metric. White-glove deployment: engineers, AEs, sometimes co-founders involved. Success criteria defined before the POC began, removing objections at the renewal conversation.
Duration and scope: 90-day paid POC scoped to one department (~$50–60K). Primary success metric: 80% adoption within 90 days. Baseline established at kickoff (searches per day, time spent finding information). Human-reviewed permissions model deployed from day one to neutralize the CISO objection during the pilot.
Conversion outcome: Majority of POCs converted to company-wide deployment. Typical progression: $60K departmental pilot → $300–500K+ company-wide within 9 months. All 40 design partners before public launch converted. The $1M+ contract segment grew 3x in FY2025.

Harvey
Pilot structure: Hyper-personalized paid POC using the prospect firm's actual case documents. Demo rebuilt around the firm's recent cases, contract templates, or M&A deals before every meeting. A Legal Engineer (former Big Law attorney) embedded with the firm's team throughout. Allen & Overy's pilot ran 3,500 lawyers through 40,000 queries over multiple months — an investment in trust-building, not a limited test.
Duration and scope: Multi-week pilots (8–16+ weeks for enterprise). Scoped to research, drafting, and due diligence workflows for specific practice groups. Success metric: lawyer-assessed quality of output on actual matters. Security and compliance review runs in parallel with the pilot, not after.
Conversion outcome: First 50 enterprise customers were all referrals, indicating near-complete pilot-to-contract conversion. 98% gross revenue retention confirms that ICP-qualified buyers who committed during pilots stay. Seat utilization grew from 40% to 70% in 2024 — existing customers using Harvey more intensively, not just seat expansion.

Decagon
Pilot structure: 4-week fixed pilot with pre-agreed pricing and pre-agreed success metrics. Pricing locked before the pilot starts — no renegotiation after success. Success metrics (deflection rate and CSAT) defined at kickoff. Human fallback always available during the pilot — the AI is not a full replacement on day one. Bespoke API/database mock-up prepared before the pilot to show tailored workflows.
Duration and scope: Fixed 4 weeks. Scoped to one support workflow or one ticket category. Primary success metrics: deflection rate and CSAT score vs. an established baseline. Decagon engineering engaged on-site for complex deployments (Rippling: 12+ product lines, 75+ routing tags).
Conversion outcome: Duolingo English Test: prior vendor failed to get chat live after a full year; Decagon live in one month with 80% chat deflection. Notion: won a five-vendor RFP, delivered a 2x deflection improvement and 34% faster resolution. Rippling: deflection improved from 38% to 50%+. A standard deal progresses from pilot agreement to contract signature in 8–10 weeks total.

How This Law Worked in Practice

Evidence from each benchmark company where this law was observed — how it manifested, what the mechanism was, and what sources confirm it.

Sierra

Sierra's design partner program was not a validation exercise — it was the commercial launch in disguise. Between November 2023 and February 2024, Logan Randolph ran a 4-stage intake process for each of six design partners. Stage 1 was a 30-minute discovery call with no selling. Stage 2 was a 30-minute live demo. Stage 3 was security and technical deep dives. Stage 4 was a verbal commitment and a paid contract requiring 10–20% of total contract value upfront. Companies that could not produce this payment were screened out before any engineering work began.

The kickoff structure enforced the six components of a well-structured pilot. Duration was 2–6 months, defined at kickoff. Scope was limited to 3–4 specific business problems, with swimlanes established so that success was attributable to specific work. The primary success metric was agreed in the kickoff session — not a list of hoped-for outcomes but a concrete measurement (containment rate, CSAT score, cost per contact). The baseline was the current state before Sierra agents went live. Pricing was paid before the pilot started. And throughout the pilot period, human agents remained available for every escalation — removing adoption risk before the full commitment.

The result was 100% conversion from design partner to paying customer across all six. WeightWatchers achieved 70% containment and 4.5/5 CSAT in the first week. SiriusXM used the partnership to build Harmony, a subscription-retention agent. Randolph's explanation of why conversion was total: the upfront payment pre-selected motivated buyers who had already cleared procurement; the co-build process produced a product shaped to each partner's specific problems; and the implementation was happening in production throughout, so conversion merely formalized an ongoing commercial relationship. The design partner structure became the template for Sierra's post-launch commercial motion.

Reggie Marable's "Paid PoC" is the scaled version of the same program — paid from day one, time-boxed, scoped to real-world scenarios. The principle Randolph articulated remains the operational center: "We told partners upfront: 'We'll give you dedicated engineers, direct access to our founders, and our cell phone numbers. But in return, we need real investment from you — payment, access to your systems, and weekly meetings to get candid feedback.'" (Logan Randolph, First Round Capital interview.) The structure is a mutual commitment mechanism, not a risk-reduction tool for the buyer alone.
Key evidence
100% design partner conversion rate (all 6 partners became paying customers); 10–20% TCV payment upfront as screening mechanism
WeightWatchers: 70% containment rate, 4.5/5 CSAT in the first week of operation — success metric visible immediately
Randolph verbatim: 'We told partners upfront: We'll give you dedicated engineers, direct access to our founders, and our cell phone numbers. But in return, we need real investment from you.'
>50% of Sierra's current product roadmap originated from design partner requests — the pilot was simultaneously a product development mechanism
Voice product originated when SiriusXM demanded it during the design partner phase — became Sierra's second design partnership

Gong

Gong's alpha program of 12 companies is the earliest recorded instance of the paid pilot structure working as a conversion engine in this cohort. Eilon Reshef recruited the 12 companies from the founders' personal networks. The product recorded and analyzed sales calls from the first day of use — there was no implementation lag, no integration complexity, no training period before value appeared. The value was tangible immediately.

Eilon's PMF signal during the trial was precise: "9 out of 10 complaints were how come you didn't even record this call?" Users were angry when a call was not captured, meaning the product had already become essential before the trial ended. This is the structural ideal for a paid pilot: the pilot generates dependency, not just evaluation data. When Amit executed the trial close — directly asking each company to pay — the product had already done the selling. The AE manages a commitment that the product has already earned. 11 of 12 converted. The 12th eventually paid as well.

The conversion rate is a PMF signal as much as a sales signal: the product was self-evidently valuable to anyone in the ICP. A conventional free trial would have generated the same PMF data. The paid structure meant the commercial pathway was already open when the trial ended — no renegotiation, no procurement restart, no delay.

Jameson Yung's subsequent decision to restore the pilot motion — after Gong had moved away from pilots as the company scaled — confirms how structurally important it was. Yung's observation was simple: users who touched the product "went crazy for it." The pilot was not a tactic for handling a skeptical prospect. It was the mechanism for delivering product conviction. Without the pilot, the company was asking buyers to commit based on a demo; with the pilot, buyers committed based on their own direct experience of the product's value in their actual sales environment.
Key evidence
12 alpha customers recruited from founders' network; 11 of 12 converted at trial end via Bendov's direct "trial close" ask
Eilon PMF signal: '9 out of 10 complaints were how come you didn't even record this call?' — users angry when call not captured = product had become essential
Jameson Yung restored the pilot motion after Gong moved away from it: users who touched the product 'went crazy for it' — the pilot was product conviction delivery, not a sales tactic
Alpha cohort used 25,537 sales calls from 17 customer organizations as first dataset — the pilot simultaneously validated the product and seeded the Gong Labs content flywheel

Glean

Glean's proof-of-concept structure evolved from early free trials into a paid engagement that the company now runs with explicitly defined success criteria anchored before kickoff. The evolution was deliberate: Arvind Jain observed that creating an amazing demo and selling something you cannot yet deliver destroys credibility with the exact buyers you most need. The paid POC structure disciplines both sides — the customer has skin in the game, and Glean has a defined success threshold to hit before conversion is expected.

The mechanics of the 90-day paid POC are designed to generate irrefutable adoption data. The primary success metric is an 80% adoption rate within 90 days — not user satisfaction scores, not executive sponsor enthusiasm, but measurable daily active usage as a fraction of licensed users. Search generates this data automatically: query frequency, DAU/MAU ratios, and search success rate (queries where the user clicked a result) are tracked in real time. The CSM brings this data to the executive sponsor meeting, converting the expansion conversation from a vendor pitch into a review of the customer's own usage evidence.

The KPI anchoring at POC kickoff removes the most common conversion failure: the post-pilot "we need to think about this" response. When success criteria are defined before the pilot begins, the conversion decision is already made at kickoff — you either hit the metric or you don't, and if you hit it, conversion is the logical next step. Glean's deployment of the permissions model from day one of the POC neutralizes the CISO objection — the security review happens inside the pilot, not as a gate after it.

The scale outcomes confirm the structure works. Glean ran 40 design partners before public launch, all converting. The standard expansion trajectory — $60K departmental pilot to $300–500K+ company-wide within 9 months — became repeatable enough to become the primary revenue engine. T-Mobile deployed 100,000 seats; Booking.com deployed company-wide across 14,000 employees. The $1M+ contract segment grew 3x in FY2025, driven by accounts that had completed the POC cycle and expanded through the land-and-expand motion. None of these outcomes are achievable through a free pilot that generates evaluation data without commercial gravity.
Key evidence
Paid POC (not free trial): KPIs anchored at kickoff — query frequency, DAU/MAU, 80% success rate as primary metric; success criteria defined before POC begins to remove post-pilot 'think about it' response
40 design partner customers before public launch, all converted
Standard expansion: $60K departmental pilot → $300–500K+ company-wide within 9 months
$1M+ contract segment grew 3x in FY2025 — driven by accounts completing the POC-to-expansion cycle
Jain warning on demo risk: 'If you create an amazing demo and sell something you don't have, you lose the opportunity and credibility' — honest capability demonstration is the strategy
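The 80% adoption threshold Glean anchors at kickoff is ordinary arithmetic over usage logs, which is what makes the conversion conversation a data review rather than a pitch. A minimal sketch with hypothetical usage numbers; only the 80% target comes from the text:

```python
# Only the 80% adoption target comes from the text; all usage numbers are hypothetical.
licensed_users = 500          # seats in the departmental POC
active_users = 410            # users issuing at least one query in the period
clicked_queries = 8_200       # queries where the user clicked a result
total_queries = 10_000

adoption_rate = active_users / licensed_users          # 0.82
search_success_rate = clicked_queries / total_queries  # 0.82
meets_target = adoption_rate >= 0.80                   # the pre-agreed primary metric

print(f"adoption {adoption_rate:.0%}, "
      f"search success {search_success_rate:.0%}, convert: {meets_target}")
```

Because the threshold is fixed at kickoff, whoever runs this calculation at day 90 is confirming a decision that was structurally made before the pilot began.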

Harvey

Harvey's pilot structure is distinctive in this cohort because the primary success mechanism is not a quantitative metric but a qualitative one: does a managing partner at a Vault 50 firm trust the output on their own actual work? The Allen & Overy pilot of 3,500 lawyers running 40,000 queries was not structured around a deflection rate — it was structured around building institutional trust in a buyer community that evaluates technology through the lens of professional liability. The months-long duration and the scale of the pilot were the signal: Harvey was willing to invest before the contract was signed.

The PACER demo tactic illustrates the hyper-personalization principle at the individual level. Before any partner meeting, the team downloaded the firm's most recent public federal court filings and rebuilt the demo around those specific documents. Weinberg ran prompts designed to critique the partner's own work: "I would basically download the last thing that they submitted to court. And then I would try to come up with prompts that were like, 'This is bad.'" When the model was right, the partner stopped evaluating and started planning which workflows to migrate. This is the pilot logic applied to a demo: the prospect's own work is the test case, not sample data.

Enterprise pilots ran multi-week (8–16+ weeks) with Legal Engineers — former Big Law attorneys from White & Case, Latham, Skadden — embedded with the firm's team throughout. The human approval gate was preserved throughout the pilot: every Harvey output required attorney review before being used in a filing or client communication. This removed adoption risk at the most critical point — a managing partner signing off on a pilot is not betting their malpractice exposure on an untested system; they are evaluating a tool where every output is reviewed before it matters.

The conversion evidence is consistent with near-total pilot success. Harvey's first 50 enterprise customers were all referrals — a structure that only works if pilots consistently convert. Gross revenue retention of 98% confirms that customers who committed based on pilot experience did not change their minds at renewal. Seat utilization grew from 40% to 70% in 2024, meaning existing customers were finding more use cases within Harvey's workflow surface during the post-pilot deployment period.
Key evidence
Allen & Overy pilot: 3,500 lawyers, 40,000 queries — a months-long pilot as deliberate trust-building investment in a buyer community that evaluates technology through professional liability risk
PACER demo tactic verbatim: 'I would basically download the last thing that they submitted to court. And then I would try to come up with prompts that were like, This is bad. And because they're a litigator and I'm basically attacking something that they just wrote — they would instantly read the screen.'
Human approval gate preserved throughout pilot period: every Harvey output requires attorney review before use — removes malpractice risk that would block adoption
First 50 enterprise customers were all referrals — only possible if pilots consistently converted
Seat utilization 40% → 70% in 2024: existing customers found more use cases during post-pilot deployment, not just seat expansion

Decagon

Decagon's 4-week pilot is the most precisely structured commercial mechanism in this cohort. Every component is defined before the pilot begins. Duration is fixed at four weeks — not open-ended, not subject to extension by default. Pricing is agreed before the pilot starts; there is no post-pilot renegotiation regardless of outcomes. Success metrics — deflection rate and CSAT — are defined at kickoff, not surfaced after the pilot to retroactively justify a decision already made. A baseline is established (current deflection rate, current CSAT, current tickets-per-agent) so that the improvement is measurable against something real. Human fallback is always available throughout the pilot; the AI runs a portion of traffic, not all of it. And bespoke API integrations are mock-built before the pilot starts, so the prospect experiences a tailored workflow from day one rather than a generic demo.

The structural logic of this design is compression. The 4-week limit creates urgency for both parties: Decagon must deploy fast, and the customer must engage. Pre-agreed pricing eliminates the post-pilot game where the enterprise buyer evaluates success and then argues the value was lower than expected. Enterprise support metrics are measurable in near-real time — deflection rate and CSAT update daily — so a buy/no-buy decision can genuinely be made on objective evidence at week four. The human fallback means the enterprise CX team does not need to bet the entire support function on an untested system; they need only run a portion of traffic through it and observe the results.

The case evidence is the most specific in the cohort. Duolingo English Test: a prior vendor had worked for a full year and could not get chat live; Decagon went live in one month and achieved 80% chat deflection immediately. Notion ran a formal five-vendor RFP with explicit criteria; Decagon won and delivered a 2x deflection improvement and 34% faster resolution. Rippling deployed with 75+ custom routing tags across 12+ product lines — a complexity level that required Decagon engineering on-site, but produced a deflection improvement from 38% to 50%+. In each case, the pre-agreed structure meant that when the pilot data came in, the contract conversation was about confirming an already-made decision, not starting a new negotiation. This is the operational meaning of "conversion as the expected default."
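The pre-agreed exit condition is mechanical by design: compare the week-4 numbers to the kickoff baseline and the decision falls out. A minimal sketch using Rippling's deflection figures from the text; the CSAT values and the exact decision rule are illustrative assumptions:

```python
# Baseline deflection (38%) and week-4 result (50%) are Rippling figures from the text.
# CSAT values and the decision rule itself are illustrative assumptions.
baseline = {"deflection": 0.38, "csat": 4.1}
week4 = {"deflection": 0.50, "csat": 4.3}

def exit_condition_met(baseline: dict, result: dict) -> bool:
    """Agreed at kickoff: beat baseline deflection, hold CSAT at or above baseline."""
    return (result["deflection"] > baseline["deflection"]
            and result["csat"] >= baseline["csat"])

print(exit_condition_met(baseline, week4))  # True -> contract talk confirms the decision
```

Because both metrics update daily, this check can be run throughout the pilot rather than only at week four, which is what makes the buy/no-buy call objective.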
Key evidence
4-week fixed pilot with pre-agreed pricing and pre-agreed success metrics: 'There'll be a four-week pilot. At the end of four weeks, we'll decide upfront if you like it, this is what it's going to cost.'
Duolingo English Test: prior vendor failed to get chat live after a full year; Decagon live in one month with 80% chat deflection immediately
Notion: won five-vendor RFP on five explicit criteria; delivered 2x deflection improvement and 34% faster resolution; 1M annual inquiries handled
Rippling: deflection improved from 38% to 50%+; 75+ custom routing tags, 12+ product lines; Decagon engineering deployed on-site
Bilt: 60K tickets/month, 70% AI-handled; $250K ACV contract delivers $800K+ ROI — ROI math visible at pilot end, not months later
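The Bilt figures reduce to back-of-envelope arithmetic. Ticket volume, AI-handled share, and ACV are from the text; the fully loaded cost per deflected ticket is an assumed value, chosen so the arithmetic lands near the stated $800K+ net figure:

```python
# Ticket volume, AI-handled share, and ACV are Bilt figures from the text; the
# cost per deflected ticket is an assumed value for illustration only.
tickets_per_month = 60_000
ai_handled_share = 0.70
cost_per_ticket = 2.10        # assumed USD cost of a human-handled ticket
acv = 250_000                 # annual contract value, USD

annual_ai_tickets = tickets_per_month * ai_handled_share * 12   # ~504,000 tickets/year
gross_savings = annual_ai_tickets * cost_per_ticket             # ~$1.06M avoided cost
net_roi = gross_savings - acv                                   # ~$808K net
print(f"net ROI ~ ${net_roi:,.0f}")
```

The structural point is that every input is observable by the end of a 4-week pilot, so the ROI case is closed at pilot end, not months later.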