
X Research — Goals, Decomposition, and Definition of Done

Purpose: the canonical product-goal document for X Research. It defines what the system is trying to achieve, how to decompose work into bounded slices, and what counts as done.

This file is intentionally outcome-focused. Architecture and technical decisions belong in ARCHITECTURE.md. Operating rules belong in OPERATING-PRINCIPLES.md. Fixed action definitions belong in docs/specs/action-catalog.md. Do not create new phase handoff-style docs when this file or ARCHITECTURE.md should be updated instead.


1. North Star

Every day, Seva should have a compact, high-signal X operator workflow that:

  1. surfaces the most interesting market/news/social signals;
  2. helps Seva participate in the right conversations;
  3. keeps relationships warm with selected communities and target audiences;
  4. produces one strong original post/thread when there is enough real material;
  5. improves generated replies/posts over time from Seva feedback and performance evidence;
  6. executes only explicitly approved external actions;
  7. records every attempt/outcome so the system is auditable and learnable;
  8. makes state visible in the product UI, not in agent chat memory.

The product is not “agents do X for Seva.” The product is a reliable operator system where agents improve/debug the machinery, while runtime execution is script-first, gated, observable, and recoverable.

Quantitative target (operator outcome, not engineering KPI): build Seva into a recognized voice in the AI-first business operations space — 2K followers within 6 months, 10K within 18 months. Inbox approval rate target: ~25% → ~50% as the system learns Seva's taste. Daily-run operator time: ≤ 15 minutes. These framing numbers come from the 2026-05-09 ROADMAP North Star (now archived at docs/archive/ROADMAP-2026-05-09.md); they shape priorities but are not enforced by any smoke.


2. Work Decomposition Model

All work should be decomposed in this hierarchy:

North Star
  → Product Loop
    → Capability
      → Module / Dependency Graph
        → Operator Journey
          → Bounded Slice
            → Acceptance Checks / Evidence

This is a product traceability graph, not a test list. Tests are leaf proofs attached to journeys and module contracts; they are not the organizing layer.

Product Loop

A recurring business outcome, for example: original content, mafia engagement, investor loop, target-influencer loop, corpus improvement, or operator reliability.

Capability

A stable ability inside a loop, for example: fetch signals, select candidates, generate drafts, queue approvals, execute approved actions, recover partial publishes, render cockpit state.

Module / Dependency Graph

The concrete runtime modules that implement a capability, with producer → consumer edges. The graph answers: “if I touch this module, which operator journeys and downstream capabilities are at risk?” Hub modules — inbox, execution worker, ledger, canonical docs, context-pack builder, scoring — require journey-level regression checks, not only local smokes.
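The "if I touch this module" question is a downstream reachability walk over producer → consumer edges. The sketch below is illustrative only: it assumes a hypothetical in-memory graph where `edges` maps a module to the modules that consume its output and `journeyModules` maps an OJ-* ID to the modules in its chain. The module names are made up and are not the repo's file names.

```javascript
// Returns every module reachable downstream of `touched`, including itself.
// `edges` maps a producer module to the modules that consume its output.
function affectedModules(edges, touched) {
  const seen = new Set([touched]);
  const queue = [touched];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const consumer of edges[current] || []) {
      if (!seen.has(consumer)) {
        seen.add(consumer);
        queue.push(consumer);
      }
    }
  }
  return seen;
}

// A journey is at risk if any module in its chain is affected.
function journeysAtRisk(edges, journeyModules, touched) {
  const affected = affectedModules(edges, touched);
  return Object.keys(journeyModules)
    .filter((id) => journeyModules[id].some((m) => affected.has(m)))
    .sort();
}

// Illustrative graph; module names are hypothetical.
const edges = {
  "context-pack-builder": ["scoring", "drafting"],
  scoring: ["inbox"],
  drafting: ["inbox"],
  inbox: ["execution-worker"],
  "execution-worker": ["ledger"],
};
const journeyModules = {
  "OJ-002": ["inbox", "execution-worker", "ledger"],
  "OJ-003": ["drafting", "inbox"],
  "OJ-007": ["ledger"],
};
const atRisk = journeysAtRisk(edges, journeyModules, "scoring"); // ["OJ-002", "OJ-003", "OJ-007"]
```

In this toy graph, touching `ledger` flags only OJ-002 and OJ-007, while touching `context-pack-builder` flags all three. Hub modules are exactly the nodes with a large downstream set, which is why they need journey-level checks rather than only local smokes.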

Operator Journey

A durable user/operator scenario that crosses one or more capabilities, for example: morning review, Mafia recommendation approval, comment rewrite with original generation context, original post rewrite single ↔ thread, fresh Seva content ingestion → generator context, no-Force-Reload cockpit path, or partial-execution recovery.

Operator journeys are the layer between architecture and tests. A journey names the operator goal, capabilities touched, module chain, state transitions, required evidence level, runner command, status ([ENFORCED] / [PARTIAL] / [TARGET]), and proof scripts. The canonical journey spec is docs/specs/operator-journeys.md.
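A journey entry can be checked mechanically for the fields listed above. This is a minimal sketch under assumptions: the field names mirror the list in the text but are not read from docs/specs/operator-journeys.md, and the rule that an [ENFORCED] journey needs at least one proof script is this sketch's interpretation of "proof scripts".

```javascript
// Hypothetical shape for one journey entry; field names are illustrative.
const JOURNEY_STATUSES = new Set(["ENFORCED", "PARTIAL", "TARGET"]);
const REQUIRED_FIELDS = [
  "id", "goal", "capabilities", "moduleChain", "stateTransitions",
  "evidenceLevel", "runner", "status", "proofScripts",
];

function validateJourney(journey) {
  const problems = [];
  for (const field of REQUIRED_FIELDS) {
    const value = journey[field];
    if (value === undefined || value === null || value === "") problems.push(`missing ${field}`);
  }
  if (journey.status !== undefined && !JOURNEY_STATUSES.has(journey.status)) {
    problems.push(`unknown status ${journey.status}`);
  }
  // An ENFORCED claim needs at least one proof script behind it;
  // TARGET journeys may legitimately have none yet.
  const proofs = Array.isArray(journey.proofScripts) ? journey.proofScripts : [];
  if (journey.status === "ENFORCED" && proofs.length === 0) {
    problems.push("ENFORCED without proof scripts");
  }
  return problems;
}

// Illustrative entry; values (including the runner path) are placeholders.
const example = {
  id: "OJ-004",
  goal: "Resume a partial thread publish without duplicating public segments",
  capabilities: ["execute approved actions", "recover partial publishes"],
  moduleChain: ["inbox", "execution-worker", "ledger"],
  stateTransitions: ["partial -> running -> succeeded"],
  evidenceLevel: "production API call + canonical read",
  runner: "node scripts/<journey-runner>.mjs",
  status: "ENFORCED",
  proofScripts: [],
};
const problems = validateJourney(example); // ["ENFORCED without proof scripts"]
```

Changing `status` to `"TARGET"` makes the same entry valid, which matches the idea that a [TARGET] journey names a gap rather than claiming coverage.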

Bounded Slice

A concrete engineering/reliability unit that can be built, tested, and reported independently. A good bounded slice has:

Acceptance Checks / Evidence

Every slice must end with named evidence: smoke/test/build/deploy/probe/log/screenshot/direct DB inspection. Evidence is attached to a capability, module contract, or operator journey ID. “Looks right” is not evidence. “Claude says done” is definitely not evidence, however touching.


3. Universal Definition of Done

A slice is done only when all applicable items are true:

  0. Operator-path acceptance gate (must pass before a slice can be reported as PASS). "Done" requires evidence that the exact operator gesture in the production-shaped surface produced the intended canonical state without a Force Reload. Acceptable evidence is one of: (a) when the UX changed, a deterministic browser click smoke that performs the real operator gesture against the production deployment; (b) when worker behavior changed, a curl/script call against the production API followed by a Supabase/canonical read confirming the mutation; (c) an operator-confirmed screenshot/result. Static smokes, helper-function unit checks, fixture replays, and DOM render-and-filter probes are pre-flight, not acceptance; they do not satisfy this gate. "Claude says done" is not evidence. Browser verification for click-related bugs MUST click the real CTA, not just render or filter.

  0a. Production readiness, including the worker layer (added 2026-05-13 after the live bookmark live_credentials_missing failure). For any operator action whose execution runs on the Mac mini launchd worker (mafia.live_action, original_post.publish, original_post.generate, mafia.rework, inbox.rewrite), the operator-path acceptance gate is NOT satisfied by direct test-account API E2Es alone. Those prove the X-API contract; they do not prove the deployed cockpit button works for the real operator. A button is "production-ready" only when every link in the chain has evidence: deployed UI click → API route → grant/command → Mac mini worker process.env → runner credential/cap/token gates → external side effect OR typed refusal → terminal action-log/cockpit state without Force Reload. The worker process.env link in particular has its own credential surface (~/.openclaw/scripts/openclaw-execution-worker-real-mafia.sh sourcing ~/.openclaw/secrets/x-api.env + the Supabase durable token store); Vercel runtime env presence does not imply worker env presence. Acceptance for these actions therefore requires either a worker-env readiness probe (scripts/probe-worker-live-env-readiness.mjs) plus a real-cockpit click-E2E in --mode=preflight-only, OR, for test-account-routable mutations, a real-cockpit click-E2E in --mode=real --against=@sevaustinovtest. See data/run-reports/2026-05-13/button-real-path-gap-audit-after-bookmark-failure.md for the failure that motivated this clause.

  1. Operator outcome works. The UI/API/script achieves the intended outcome in the actual operator path, not only in a helper function.

  2. State machine is honest. Pending/running/succeeded/failed/blocked/partial states render accurately and survive refresh.

  3. Canonical data is current. Supabase/runtime source-of-record, local mirrors, queues, command rows, and ledger projections agree, or drift is explicitly surfaced with a recovery action. For documents, repo Markdown files are the canonical source; the target docs website reads them live via the Mac mini docs-source API, so no agent backfill is needed for visibility.

  4. External actions are gated. Live X actions require interface approval metadata and the relevant script/worker armed gates. Chat intent alone is not a live-action gate.

  5. Every attempt is logged. Dry-run/live attempt/outcome rows are written to the canonical ledger, or the step fails closed with a visible error.

  6. Partial failure has a path. If the system can partially complete work, the operator must see what happened and how to safely resume/recover without duplicating public actions.

  7. No silent success. A successful worker run must produce product-visible state and/or a durable report. Agent chat summaries are not runtime state. For live-capable kinds, succeeded requires the intended external/internal mutation to have occurred (e.g. a posted_tweet_id for X publish); noop is not success unless the operator explicitly requested noop. The deployed/operator path must be checked when production readiness is claimed. A claim of "everything works" is invalid if the --tier production-readiness step list (see docs/reference/testing/SYSTEM-ACCEPTANCE-MATRIX.md) was not run or did not exit 0 against the current deploy alias, OR if the Supabase load budget on the slice's hot path is FAIL/UNKNOWN (see item 16, the load budget acceptance tier, below).

  8. Tests cover the bug class. The specific failure mode that motivated the work gets a deterministic regression check.

  9. Docs are consolidated. New architectural decisions update ARCHITECTURE.md or this file, not a new one-off handoff-style doc, unless explicitly requested.

  10. Evidence is named. The final report names the exact commands/checks run and their results.

  11. Concept lives in one place. When the slice touches a concept named in ARCHITECTURE.md §11 (tweet rendering, status badge, time formatting, cockpit query, kind/cap/sentinel/idempotency-key/heartbeat/Supabase-client/error-normalization registries, X-API publish, smoke harness, etc.), the change MUST land in the canonical module, not in a new parallel implementation. Introducing a parallel implementation requires an explicit deprecation/migration plan in the same slice. For Twitter/X content this also means preserving X presentation semantics (tweet/thread/quote/reply/media as X will render them) per OPERATING-PRINCIPLES.md §0b rule 2 and docs/specs/x-rendering-fidelity.md.

  12. Feature has a spec. When the slice introduces a new operator-visible feature or a new shared primitive, it MUST also create or update docs/specs/<feature>.md (Canonical class, ≤ 300 lines) naming the feature's invariants, operator path, typed blockers, and verifying smokes/E2Es. The spec cites the DoD items it satisfies and the ARCHITECTURE concepts it uses. Smokes/E2Es cite the spec they verify. See ARCHITECTURE.md §17 and OPERATING-PRINCIPLES.md §0 rule 10.

  13. Editorial quality (Seva's feedback rules). Generated content satisfies 8 rules: (1) personal experience or first-hand observation only, no second-hand summaries pretending to be insights; (2) substance: respond to what the post actually says, not a generic adjacent topic; (3) no overselling: do not inflate Seva's achievements; (4) no shoehorning unrelated topics into a "connection"; (5) flag media presence in the source so reviewers see it; (6) quote-post format visually structured (source post + Seva's wrapper); (7) comments only on posts ≤ 48h old (else digest); (8) industry news routes to key_developments, not original-post ideas. Source: distilled from the now-archived PIPELINE.md §5.

  14. Traceability spine is current. Work that adds or changes an operator journey, shared primitive, or hub module MUST update the graph layer that connects goal → capability → modules → journey → proof. For now this means updating docs/specs/operator-journeys.md (including its embedded capability/module index) or marking the gap [TARGET] in the relevant spec. New smokes/E2Es without a spec or journey anchor are orphan tests and do not count as acceptance evidence.

  15. Substantive cockpit shape (operator-visible work only). When the slice touches an operator surface (/inbox, /queue, /mafia/**, /action-log, /runs, /approvals), the substantive cockpit gate (docs/specs/inbox-cockpit-substantive-gate.md) MUST output PASS before the slice is claimed daily-use ready. PARTIAL is acceptable for an interim landing only with an explicit "checked / not checked" list and a named follow-up slice. FAIL means the slice is not ready, regardless of how many smokes are green. The gate enforces OPERATING-PRINCIPLES.md Cockpit Contract item 10 (banned patterns + required evidence). Green smokes / build / deploy / type-checks DO NOT substitute for this gate; a card that looks like a database admin dump but passes every selector smoke is still a contract regression. Reason: 2026-05-18 inbox audit (data/run-reports/2026-05-18/reliability-stabilization/inbox-ux-principles-audit.md) + postmortem (process-postmortem-quote-preview-auth-error.md).

  16. Load budget acceptance tier (any slice that touches a hot path). Functional PASS + UX PASS does NOT close DoD if the slice adds or changes a hot-path query (SSR query, polling API endpoint, worker poll, or cron read) and node scripts/smoke-supabase-egress-budget.mjs does not exit 0, OR if the touched path is not pinned by a static invariant in that gate. PASS for these slices requires: (a) static gate exit 0, AND (b) for new/changed hot paths, either the latest scripts/supabase-usage-snapshot.mjs --json shows the touched table within the request-count thresholds in docs/specs/supabase-load-budget.md §4, OR a follow-up dashboard observation is named (PARTIAL). FAIL means the slice is not ready. The load tier sits alongside the functional, cockpit-shape, and production-screenshot tiers, not below them. Reason: 2026-05-21 exceed_egress_quota restriction (Phases 22–24) showed a fully functional, green-smoke-passing system silently amplifying every operator browse into ~2 GB/day of Supabase egress. Canonical contract + typed blockers: docs/specs/supabase-load-budget.md.
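The operator-path acceptance gate, the substantive cockpit gate, and the load budget tier are separate tiers. One reading of "alongside, not below" is that the worst tier verdict wins, and the hypothetical checker below encodes that reading. The slice-report fields (`evidence`, `cockpitGate`, `loadBudget`, and so on) and the evidence-kind labels are invented for illustration; the real gates are the scripts and specs named in the items, such as scripts/smoke-supabase-egress-budget.mjs and docs/specs/inbox-cockpit-substantive-gate.md.

```javascript
// Illustrative labels for acceptance-grade evidence (hypothetical strings).
const ACCEPTANCE_KINDS = new Set([
  "browser_click_smoke",          // real CTA clicked against the production deployment
  "api_call_plus_canonical_read", // production API call + Supabase/canonical read
  "operator_confirmed_result",    // operator-confirmed screenshot/result
]);
// Anything else (static smoke, helper unit check, fixture replay,
// DOM render-and-filter probe) counts as pre-flight only.

const RANK = { PASS: 0, PARTIAL: 1, FAIL: 2 };
const worst = (a, b) => (RANK[a] >= RANK[b] ? a : b);

function sliceVerdict(slice) {
  let verdict = "PASS";
  const reasons = [];
  const downgrade = (to, reason) => {
    verdict = worst(verdict, to);
    reasons.push(reason);
  };

  // Operator-path acceptance gate: pre-flight evidence alone never passes.
  if (!slice.evidence.some((e) => ACCEPTANCE_KINDS.has(e.kind))) {
    downgrade("FAIL", "no operator-path acceptance evidence");
  }

  // Substantive cockpit gate, only for operator-visible surfaces.
  if (slice.touchesOperatorSurface) {
    const gate = slice.cockpitGate || "FAIL";
    if (gate === "PARTIAL" && slice.checkedList && slice.followUpSlice) {
      downgrade("PARTIAL", "cockpit gate PARTIAL with named follow-up");
    } else if (gate !== "PASS") {
      downgrade("FAIL", `cockpit gate ${gate}`);
    }
  }

  // Load budget tier, only for new or changed hot-path queries.
  if (slice.touchesHotPath) {
    const load = slice.loadBudget || {};
    if (load.staticGateExitCode !== 0 || !load.pathPinned) {
      downgrade("FAIL", "egress gate not green or hot path not pinned");
    } else if (load.withinThresholds !== true) {
      // No snapshot confirmation: PARTIAL only with a named follow-up.
      if (load.followUpObservation) downgrade("PARTIAL", "load budget pending dashboard observation");
      else downgrade("FAIL", "load budget UNKNOWN");
    }
  }

  return { verdict, reasons };
}
```

A slice with a real click smoke and a green cockpit gate returns PASS, while the same slice backed only by a static smoke returns FAIL no matter what else is green.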


4. Product Loops and DoD

4.1 Signal Discovery

Goal: surface the most interesting things from monitored X sources.

DoD:
  - fetches are observable and failures are warnings, not silent gaps;
  - surfaced items include why they matter;
  - low-signal noise is filtered;
  - source/list coverage is measurable;
  - the inbox is non-empty when real signals exist and explicitly empty when none do.
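The fetch side of these criteria can be sketched as follows. The sketch assumes each source exposes a hypothetical async `fetch()` returning items shaped `{ id, score, why }`, and a numeric score threshold stands in for the real noise filter. `Promise.allSettled` lets one failing list become a warning without hiding results from the others.

```javascript
// Minimal sketch; source and item shapes are hypothetical.
async function fetchSignals(sources, minScore) {
  // Wrap each call so a synchronous throw is also captured as a rejection.
  const settled = await Promise.allSettled(
    sources.map((s) => Promise.resolve().then(() => s.fetch()))
  );
  const fetched = [];
  const warnings = [];
  settled.forEach((result, i) => {
    if (result.status === "fulfilled") {
      fetched.push(...result.value);
    } else {
      // A failed source becomes a visible warning, not a silent gap.
      const reason = result.reason;
      warnings.push({ source: sources[i].name, error: reason instanceof Error ? reason.message : String(reason) });
    }
  });

  // Drop low-signal noise and anything that cannot say why it matters.
  const items = fetched.filter((it) => it.score >= minScore && typeof it.why === "string" && it.why.length > 0);

  // Coverage is measurable: sources attempted vs. sources that answered.
  const coverage = { attempted: sources.length, succeeded: sources.length - warnings.length };

  // Separate "nothing worth surfacing" from "could not look at all".
  let state = "explicitly_empty";
  if (items.length > 0) state = "has_signals";
  else if (sources.length > 0 && warnings.length === sources.length) state = "all_fetches_failed";

  return { state, items, warnings, coverage };
}
```

Keeping `all_fetches_failed` distinct from `explicitly_empty` is what lets the inbox say "nothing today" honestly instead of rendering an outage as a quiet day.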

4.2 Posting Mafia / Mutual Engagement

Goal: keep reciprocal engagement with the content circle through reviewed likes, bookmarks, replies, and quote candidates.

Product-model note (2026-05-13, Phase 81): Likes / bookmarks / replies / quotes are a generic Inbox workstream ("Engagement suggestions") across audiences/sources. Mafia is the first/current source feeding that workstream. Future audiences (e.g. §4.5 marketers / founders / investors loops) plug into the same /inbox engagement-suggestions surface via ?source=<source> filters — they do NOT get a new top-level inbox route or section. The default /inbox therefore exposes the engagement-suggestions section as a first-class generic block (with per-source chips); ?source=mafia narrows the same section to the Mafia source. See docs/specs/unified-inbox.md §2 invariant 12 and docs/specs/operator-journeys.md OJ-002.

Targets / caps:
  - select up to 10 mafia posts/day;
  - target 10 likes, 5 bookmarks, 3 replies, 1 optional quote;
  - hard live caps: 50 likes/day, 10 bookmarks/day, 10 replies/day, 5 quotes/day (mafia runner caps; see web/lib/mafia-live-runner.mjs::DAILY_CAPS);
  - reply/quote live execution is wired (2026-05-13): Approve on a reply/quote row at /inbox?source=mafia is the publish authorization and reaches the same mafia.live_action worker queue path as like/bookmark.
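The hard caps can be enforced as a fail-closed pre-check before any live action. In this sketch the cap values come from the list above; the object shape, `checkDailyCap`, and the blocker strings are hypothetical (the caps themselves are defined in web/lib/mafia-live-runner.mjs::DAILY_CAPS), and the counts are assumed to be derived from today's ledger rows.

```javascript
// Hard live caps from the text; this object shape is illustrative.
const HARD_DAILY_CAPS = { like: 50, bookmark: 10, reply: 10, quote: 5 };

// Decide whether one more live action of `kind` fits under today's hard cap.
// `todayCounts` is assumed to be built from today's ledger rows.
function checkDailyCap(kind, todayCounts) {
  if (!Object.prototype.hasOwnProperty.call(HARD_DAILY_CAPS, kind)) {
    // Unknown kinds fail closed instead of running uncapped.
    return { allowed: false, blocker: "unknown_action_kind" };
  }
  const cap = HARD_DAILY_CAPS[kind];
  const used = todayCounts[kind] || 0;
  if (used >= cap) return { allowed: false, blocker: "daily_cap_reached", used, cap };
  return { allowed: true, used, cap, remainingAfter: cap - used - 1 };
}
```

The selection targets (10 likes, 5 bookmarks, 3 replies, 1 quote) shape what gets suggested; only the hard caps gate execution, so a busy day can exceed the targets but never the caps.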

DoD:
  - candidates are selected with rationale;
  - selection, scoring, drafting, and rework all reason from the same rich tweet context a human reading the post on X sees (text, media, alt_text, link cards, quote body, reply target, conversation hints), surfaced through one canonical assembleContextBundle(row) per docs/specs/full-tweet-context-bundle.md; modalities the LLM did not receive (parent body, thread, OCR, video transcript) MUST surface as typed modalities_unfetched[] so the model refuses with a reason rather than confabulating;
  - actions are reviewable before execution;
  - like/bookmark/reply/quote all execute only after approval and are ledgered;
  - Approve = publish-attempt for reply/quote. There is NO operator step where Approve leaves the row in approved, waiting for a manual "Run live" click. If X refuses with a conversational-permission 403 (reply: "not been mentioned…", quote: "Quote Tweet is disabled…"), the runner emits the typed reply_permission_denied_api_requires_browser / quote_permission_denied_api_requires_browser blocker AND the mafia.live_action worker (lib/mafia-live-command-worker.mjs) AUTOMATICALLY invokes lib/mafia-browser-fallback-bridge.mjs to validate and idempotently enqueue a row for the existing scripts/browser-assisted-runner.mjs --live worker, gated by the separate BROWSER_FALLBACK_ARMED=1 sentinel. There is NO operator step where the operator hand-edits data/browser-actions/queue.json for a recommendation the worker has already classified. A generic x_api_error for a conversational-permission denial, or a typed permission denial that the worker does not at least attempt to bridge to the browser queue, is a contract regression (2026-05-14);
  - cockpit shows queued/running/succeeded/failed/blocked state per action.
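The Approve-to-browser-fallback path can be sketched as a small classifier plus an idempotent enqueue. The two matched substrings come from the text above; everything else (`classifyPermissionDenial`, `bridgeToBrowserQueue`, `handleLiveResult`, the `Map` standing in for data/browser-actions/queue.json, and the idempotency-key format) is hypothetical and is not the API of lib/mafia-browser-fallback-bridge.mjs.

```javascript
// Map an X refusal to a typed blocker, or null if it is not a
// conversational-permission denial.
function classifyPermissionDenial(kind, status, message) {
  if (status !== 403) return null;
  if (kind === "reply" && message.includes("not been mentioned")) {
    return "reply_permission_denied_api_requires_browser";
  }
  if (kind === "quote" && message.includes("Quote Tweet is disabled")) {
    return "quote_permission_denied_api_requires_browser";
  }
  return null;
}

// Enqueue at most once per recommendation + kind, and only when armed.
function bridgeToBrowserQueue(queue, rec, blocker, env) {
  if (env.BROWSER_FALLBACK_ARMED !== "1") {
    return { enqueued: false, reason: "browser_fallback_not_armed" };
  }
  const key = `${rec.id}:${rec.kind}`;
  if (queue.has(key)) return { enqueued: false, reason: "already_enqueued", key };
  queue.set(key, { recommendationId: rec.id, kind: rec.kind, text: rec.text, blocker });
  return { enqueued: true, key };
}

function handleLiveResult(queue, rec, result, env) {
  if (result.ok) {
    // No silent success: a reply/quote without a posted tweet ID did not happen.
    return result.postedTweetId
      ? { state: "succeeded", postedTweetId: result.postedTweetId }
      : { state: "failed", blocker: "no_posted_tweet_id" };
  }
  const blocker = classifyPermissionDenial(rec.kind, result.status, result.message || "");
  // Only refusals that are not permission denials stay generic.
  if (!blocker) return { state: "failed", blocker: "x_api_error" };
  // A typed denial always attempts the bridge; nobody hand-edits the queue.
  return { state: "blocked", blocker, bridge: bridgeToBrowserQueue(queue, rec, blocker, env) };
}
```

In a worker this would be called as `handleLiveResult(queue, rec, result, process.env)`; because the key is derived from the recommendation, a retried worker run returns `already_enqueued` instead of creating a duplicate browser action.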

4.3 Original Content Corpus

Goal: maintain Seva’s real content base so generation is grounded in his actual thinking.

DoD:
  - sources have metadata: date, format, topic, path/link, extraction status;
  - duplicates/noise are controlled;
  - important ideas are distilled into reusable frames without losing raw references;
  - new Telegram/X/transcript material has a repeatable ingestion path;
  - generation prompts can cite/use the corpus without inventing a synthetic voice.

4.4 Original Post / Thread Loop

Goal: produce and publish at most one strong original post/thread per day when there is enough real material.

Cap: 1 original post/thread per calendar day. Approved extras wait in original_post_queue.

DoD:
  - draft is grounded in Seva material or concrete field observations;
  - draft is reviewable without heavy rewriting;
  - no-filler path exists when no strong angle exists;
  - approval, scheduling, publish, retry, return-to-review, and partial-thread recovery are visible in UI;
  - queue state is canonical in Supabase;
  - live publish writes ledger rows for every segment/attempt;
  - partial thread publish can resume only remaining segments without touching public ones.
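Resuming a partial thread amounts to finding the contiguous prefix of segments that the ledger shows as public and continuing from there. A minimal sketch, assuming hypothetical ledger rows shaped `{ segmentIndex, postedTweetId }`. Refusing on a gap is a design choice of this sketch, made so that a resume never has to guess which public tweet to reply to.

```javascript
// Plan a resume from ledger rows; only rows with a posted tweet ID count as public.
function planThreadResume(segments, ledgerRows) {
  const published = new Map();
  for (const row of ledgerRows) {
    if (row.postedTweetId) published.set(row.segmentIndex, row.postedTweetId);
  }
  // Public segments must form a prefix 0..prefix-1.
  let prefix = 0;
  while (published.has(prefix)) prefix += 1;
  for (const idx of published.keys()) {
    if (idx >= prefix) return { ok: false, blocker: "non_contiguous_published_segments" };
  }
  if (prefix > segments.length) return { ok: false, blocker: "ledger_has_unknown_segments" };
  if (prefix === segments.length) return { ok: true, done: true, replyTo: null, remaining: [] };
  return {
    ok: true,
    done: false,
    // The next segment replies to the last public one (null if none is public yet).
    replyTo: prefix > 0 ? published.get(prefix - 1) : null,
    remaining: segments.slice(prefix).map((text, i) => ({ segmentIndex: prefix + i, text })),
  };
}
```

Failed attempts without a `postedTweetId` are ignored, so a segment that errored is simply retried as part of `remaining`, while every already-public segment is left untouched.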

4.5 Relationship / Audience Loops

Goal: keep investor, target-influencer, SF founder, and other strategic audiences warm with specific, non-generic engagement.

DoD:
  - target lists have explicit inclusion/exclusion criteria;
  - important posts from target audiences are not missed;
  - suggested engagement is specific to person/post/context;
  - low-value “nice update” replies are filtered;
  - approved public actions follow the standard execution/ledger path.

4.6 Quality Improvement / Prompt Evolution

Goal: make generated comments, replies, posts, and threads improve from feedback and evidence.

DoD:
  - Seva feedback becomes versioned rules or examples, not forgotten chat context;
  - prompts/rules are tested on fixtures before promotion;
  - generated content becomes more specific, Seva-like, and easier to approve;
  - stale hooks/anti-patterns are retired;
  - performance/approval history affects future generation.
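The text does not define how a fixture run decides promotion. As one possible rule (an assumption, not the repo's policy), a candidate prompt could be promoted only if it regresses no fixture the current prompt passes, was run on every fixture, and passes strictly more fixtures overall. The function name and input shape below are hypothetical.

```javascript
// Hypothetical promotion gate. Inputs map fixture IDs to pass/fail booleans
// for the current prompt and for the candidate prompt.
function promotionDecision(currentResults, candidateResults) {
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  const regressions = [];
  let currentPasses = 0;
  let candidatePasses = 0;
  for (const id of Object.keys(currentResults)) {
    // A candidate that skipped a fixture has not been tested on it.
    if (!has(candidateResults, id)) return { promote: false, reason: `not run on ${id}` };
    if (currentResults[id]) currentPasses += 1;
    if (candidateResults[id]) candidatePasses += 1;
    if (currentResults[id] && !candidateResults[id]) regressions.push(id);
  }
  if (regressions.length > 0) return { promote: false, reason: "regressions", regressions };
  if (candidatePasses <= currentPasses) return { promote: false, reason: "no improvement" };
  return { promote: true, currentPasses, candidatePasses };
}
```

Treating any regression as a blocker, even when the net pass count rises, keeps a new rule from silently undoing an earlier piece of Seva feedback.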

4.7 Operator Surface Reliability

Goal: the product UI is an honest view over canonical docs, inbox, execution commands, queues, and ledger.

DoD:
  - core pages render and are non-empty when data exists;
  - cockpit states are accurate for queued/running/succeeded/failed/blocked/partial;
  - refresh does not hide active work;
  - canonical docs and rendered pages do not drift;
  - deploy/build/site-health checks pass;
  - failures include next action, not just red text.

4a. Operator Workflows — how the system helps achieve the goals

The product loops above state what outcomes matter. Operator workflows state how Seva works with the system to achieve those outcomes. These workflows are the product-level owner of the OJ-* IDs; docs/specs/operator-journeys.md is the acceptance/proof mapping for them (current surface, target surface, migration status, first proof).

| ID | Workflow | System helps Seva by... | Primary product loops |
|---|---|---|---|
| OJ-001 | Daily Review | opening one coherent review session with everything that needs attention today: engagement opportunities, own-content drafts, scheduled/pending publish items, and recovery cards | §4.1, §4.2, §4.4, §4.7, §4.8 |
| OJ-002 | Engage With Others | finding the right external posts/people and turning them into reviewable likes, bookmarks, reply drafts, quote drafts, rewrites/reworks, or rejects | §4.1, §4.2, §4.5 |
| OJ-003 | Create / Review My Own Content | surfacing original post/thread candidates grounded in Seva's material so he can edit, reject, schedule, or publish | §4.3, §4.4, §4.6 |
| OJ-004 | Schedule / Publish / Recover Publishing | safely releasing approved content, scheduling/rescheduling, returning to review, resuming partial publishes, and proving public/ledger evidence | §4.4, §4.8 |
| OJ-005 | Rewrite / Rework Without Losing Context | letting Seva ask for improvement while preserving item identity, generation context, cockpit state, and source/status lane until terminal state | §4.2, §4.4, §4.6, §4.7 |
| OJ-006 | Reject / Feedback / Learning | rejecting low-quality items with reason so the system learns instead of merely hiding cards | §4.6, §4.7 |
| OJ-007 | System Transparency / Improvement Loop | showing how the system is working (fetches, workers, prompts, ledgers, blockers, recovery actions) so Seva can trust it, repair it, and generate improvement ideas | §4.7, §4.8, §5 |
| OJ-008 | Fresh Seva Context | future: grounding generation in freshly ingested Seva material through a designed shared context path | §4.3, §4.6 ([DESIGN], not current implementation focus) |

Cross-workflow DoD:
  - the daily review surface does not split the operator's attention across mystery tabs (Pending, Returned, Other, Hidden are not target top-level views);
  - realtime rewrite/rework/publish workers keep the visible card anchored in the same source/status view until terminal state;
  - every operator gesture reaches canonical state without Force Reload;
  - cockpit heartbeats, prompts/workers/run state, ledgers, and recovery actions are visible enough for Seva to understand whether the system is working and what to improve next;
  - Fresh Seva Context remains [DESIGN] until a separate Seva-confirmed design pass.

4.8 Execution, Audit, and Recovery

Goal: every approved external action is reliable, observable, and recoverable.

DoD:
  - scripts are dry-run by default and fail closed;
  - interface approval metadata is the only live gate;
  - all attempts/outcomes are ledgered;
  - Action Log and cockpit project ledger truth;
  - partial failures expose exact completed work and safe recovery path;
  - daily checks prove projections are current enough for the operator.
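Taken together, these rules describe a fail-closed, ledgered executor. The sketch below assumes a publish-type action and invents its dependencies (`ledger.append`, `performLive`, the `approval` object, and the row fields); it is not the repo's runner, and the blocker strings are illustrative. Following the "External actions are gated" rule, it checks a script/worker armed flag alongside approval.

```javascript
// Append a ledger row; if the write fails, surface it instead of pretending.
function writeOrError(ledger, row) {
  try {
    ledger.append(row);
    return row;
  } catch (err) {
    return { ...row, outcome: "ledger_write_failed", error: String(err) };
  }
}

async function executeAction(action, opts) {
  const { approval, armed = false, live = false, ledger, performLive } = opts;
  const base = { actionId: action.id, kind: action.kind };

  // Dry-run unless a live run is explicitly requested.
  if (!live) return writeOrError(ledger, { ...base, mode: "dry_run", outcome: "dry_run_ok" });

  // Live runs need interface approval metadata for this exact action plus
  // the armed gate; chat intent is not an input here.
  if (!approval || approval.actionId !== action.id) {
    return writeOrError(ledger, { ...base, mode: "live", outcome: "blocked", blocker: "missing_interface_approval" });
  }
  if (!armed) {
    return writeOrError(ledger, { ...base, mode: "live", outcome: "blocked", blocker: "not_armed" });
  }

  // Record the attempt before touching X; if that write fails, do nothing public.
  const attempt = writeOrError(ledger, { ...base, mode: "live", outcome: "attempted" });
  if (attempt.outcome === "ledger_write_failed") return attempt;

  let result;
  try {
    result = await performLive(action);
  } catch (err) {
    return writeOrError(ledger, { ...base, mode: "live", outcome: "failed", error: String(err) });
  }
  // No silent success: without a posted tweet ID the publish did not happen.
  const outcome = result && result.postedTweetId ? "succeeded" : "failed";
  return writeOrError(ledger, { ...base, mode: "live", outcome, postedTweetId: result ? result.postedTweetId : undefined });
}
```

Writing the `attempted` row before the external call is what makes a later crash recoverable: the ledger shows an attempt with no outcome, which is exactly the partial state the cockpit needs to surface.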


5. Autonomous / Claude Code Slice Rules

When delegating to Claude Code or running autonomous work:

  1. Pass the goal + DoD + safety boundary, not just a narrow ticket.
  2. Give the worker only the relevant canonical docs and targeted sections.
  3. A green checklist before a stop time is not a stop condition; select the next useful bounded slice inside the same goal.
  4. If the worker says done, supervisor verifies independently.
  5. If verification shows drift, launch a follow-up or fix directly.
  6. Worker outputs should update durable state, not become another permanent checkpoint/report unless explicitly needed.

Canonical autonomous rules live in AUTONOMOUS-WORK.md.


6. Documentation Discipline

This repo should converge toward a small canonical set:

There is no separate HANDOFF.md. The reading list lives in AGENTS.md (with README.md as a one-line redirect). Do not create a new handoff-style file.

Run reports are evidence, not permanent context. Once a decision or insight matters beyond the incident, distill it into one of the canonical files and stop re-reading the report by default.