X Research — Operating Principles

0. Documentation Discipline

Keep the operating context small and canonical. Durable product goals live in GOALS-PIPELINE.md; architecture and technical decisions live in ARCHITECTURE.md; operating rules live here; autonomous-run rules live in AUTONOMOUS-WORK.md; action definitions live in docs/specs/action-catalog.md; audit / ledger event contracts live in docs/specs/hot-write-ledger.md.

Rules:

  1. Do not create permanent new handoff-style/design docs by default. If a decision matters beyond an incident, distill it into ARCHITECTURE.md, GOALS-PIPELINE.md, or this file.
  2. Run reports are evidence, not context. They may record what happened, but future agents should not need to read a pile of reports to know the current architecture.
  3. Prompts should read targeted sections, not whole archives. Never include full docs/archive/** files (including the legacy PIPELINE-2026-04-26.md) or all run reports in a Claude Code prompt unless the task is explicitly historical audit.
  4. One canonical file per knowledge type. If two files disagree, update the canonical file and treat the other as stale/reference.
  5. After incident repair, update canonical docs. A fix is incomplete if the lesson remains only in chat or a one-off report.
  6. Move before delete. A run report or design doc may be archived/deleted only after its durable lesson is captured in a canonical file (this one, GOALS-PIPELINE.md, or ARCHITECTURE.md). Reports cite the canonical destination; canonical docs cite the report path as evidence.
  7. Reports do not silently edit canonical docs. If a report needs a canonical-doc change, it proposes exact wording in the report and waits for supervisor/Seva confirmation.
  8. Artifact lifecycle and context budget. Every file the project produces belongs to one of five lifecycle classes (Canonical, Sprint, Run-state, Evidence, Temp), each with a size cap and a cleanup rule. The bootstrap reading list has a fixed context budget; adding content to a Canonical or Sprint file requires compressing equivalent content elsewhere (net-zero rule). For autonomous runs longer than ~4 hours, the supervisor schedules periodic compaction phases. Operational rules: AUTONOMOUS-WORK.md § Artifact Lifecycle and Context Budget. Storage architecture (filesystem layout, Supabase mirror, archival paths): ARCHITECTURE.md §13.
  9. Anti-amnesia. Anything that affects future behavior MUST live in a Canonical doc, an Evidence report, or runtime state (Supabase / data/run-state/ / ledger) — not only in chat memory or one CC session's context. Workers are stateless on input (start from execution_commands.payload + filesystem + Supabase; no implicit parent-session context). Decisions have provenance (canonical-doc edit, sprint-checkpoint entry with rationale, or evidence report). Sprint state is reconstructable: at any point in an autonomous run, a fresh CC session must be able to load the Canonical reading list + active Sprint files + latest checkpoint and continue. Full architecture: ARCHITECTURE.md §16.
  10. Per-feature specs live in docs/specs/. Every operator-visible feature and shared primitive has one Canonical-class spec at docs/specs/<feature>.md (≤ 300 lines) that names its invariants, operator path, typed blockers, and verifying smokes/E2Es. The spec is the bridge between GOALS-PIPELINE.md DoD and scripts/smoke-*.mjs / verify-prod-*.mjs. Cite GOALS items, ARCHITECTURE concepts, and OPERATING-PRINCIPLES rules from inside the spec; do not duplicate them. Full pattern: ARCHITECTURE.md §17.
  11. Tests are graph proofs, not context. The project is organized as a traceability spine: Goals → Capabilities → Module DAG → Operator Journeys → Proofs/tests/evidence → run/debug order. A new smoke-*, e2e-*, or verify-prod-* script MUST cite the spec or operator-journey ID it proves; a new operator-visible scenario MUST live in the journey registry/spec before tests multiply around it. Tests without a spec/journey anchor are orphan evidence and SHOULD be retired or anchored before they are trusted. Target artifact: docs/specs/operator-journeys.md (compact canonical journey list + embedded capability/module index). Until the runner and registry are wired, mark missing links [TARGET] rather than pretending the graph exists.
  12. Documentation graph passes the fresh-Claude bootstrap test. Canonical docs, specs, preflight, and the canonical-store registry are mechanically coherent only if a fresh Claude Code session — given AGENTS.md and nothing else — can (a) follow the read list and preflight, (b) reach the correct task-specific docs by slug, (c) run the named checks, and (d) complete a bounded task without inventing progress / handoff / status files. When a doc/spec/preflight/registry change is made, the same slice fixes any graph break the change exposes (stale slug, dangling reference, drift between fallback map and backfill registry, missing spec index entry) — do not work around it.
  13. Project docs are source of truth; chat memory is not. When answering or changing anything about X Research architecture, acceptance gates, credentials, OAuth/OAuth2, runtime surfaces, operator journeys, product scope, or current invariants, start from the root documentation graph (AGENTS.md → GOALS-PIPELINE.md / ARCHITECTURE.md / this file / AUTONOMOUS-WORK.md → docs/specs/** → docs/reference/**). Do not use OpenClaw memory search, prior chat, or a run report as the primary authority for current behavior. Memory search may help recall that a decision exists, but the answer must be verified against canonical project docs or code. If the root docs and their linked specs/reference docs do not contain the answer, that is a documentation gap: update the canonical docs in the same slice (or mark the gap explicitly BLOCKED if the answer needs Seva's decision) instead of treating memory as enough.

These rules exist because the project had begun accumulating checkpoint/report sediment. Sediment is for geology, not software operations.


0a. No False Done

Status legend in this file: [ENFORCED] = rule is wired into code/smokes and fails closed; [PARTIAL] = rule is correct but adoption is incomplete; [TARGET] = rule is desired but the supporting module/check does not yet exist; [DEBT] = a known divergence flagged for consolidation. See ARCHITECTURE.md §0 for the same legend used across canonical docs.

The repeating failure class on this project is "agent reported done; Seva opened the product; the product was not actually ready." These rules make that mechanically harder.

  1. No PASS without operator-path evidence. [ENFORCED] for the supervisor verification rule (AUTONOMOUS-WORK.md); [PARTIAL] for the universal application: not every slice has a runnable operator-path gate yet; supervisors must create one or mark BLOCKED rather than waiving the rule. Before writing PASS on operator-visible work, satisfy the gate in GOALS-PIPELINE.md §3 item 0. "Looks right", "appears to work", "behaves as expected", "verified by smoke", or "Claude says done" are not substitutes for the gate.
  2. succeeded means the mutation occurred. [PARTIAL] — publish path has a noop guard (scripts/execution-command-worker.mjs); mafia/generate/rewrite/rework kinds rely on typed blockers but lack an equivalent guard. See ARCHITECTURE.md §3.1. Worker status='succeeded' is permitted only when the worker's intended external or internal mutation actually happened. Live publish/like/bookmark/reply/quote in mode='real' requires a posted_tweet_id (or external-action equivalent) AND a matching outcome=ok ledger row keyed on idempotency_key. A noop, refusal, or any path that bypassed the external mutation MUST set status='failed' with a typed blocker_type, unless the operator explicitly requested noop. Cockpits MUST NOT render "Published to X" / "Liked on X" / equivalent without the corresponding external id.
  3. Browser verification means actually clicking the operator CTA. [PARTIAL]: scripts/verify-prod-inbox-approve-click-e2e.mjs exemplifies the pattern for the inbox approve flow; analogous click-E2Es do not yet exist for every operator CTA. For click-related bugs, browser verification MUST drive the exact operator gesture (click the real Approve/Publish/Reject/Rewrite button on a real-shaped row) and assert the canonical post-click state. Render-and-filter, DOM-presence-only, or fixture-replay probes are not browser verification; they are pre-flight.
  4. External API errors are typed and operator-actionable. [PARTIAL]: lib/x-api-error.mjs exists and is consulted by the originals-publish path; ~1–2 of 4–5 publish/like/bookmark sites currently route failures through it (Phase 3 D5). Every script/worker that calls an external API (X, OAuth, Supabase) MUST normalize thrown errors through a typed helper before writing the ledger row or returning a result. The normalized record carries http_status, surface_message (≤240 chars), and hint (operator-actionable next step); cockpits render surface_message + hint, never the raw library string. Bare Request failed with code <N> is a contract regression.

  5. "Works" includes load budget. [ENFORCED] for the 17 static budget invariants; [PARTIAL] for byte-level dashboard reconciliation. Functional PASS + UX PASS is NOT a daily-use ready / operationally acceptable claim if the Supabase load budget on the slice's hot path is FAIL or UNKNOWN. A green operator-path E2E + green cockpit shape on a query that ships ~5000 rows per /inbox SSR is the same false-done class as a green smoke on a worker that returned noop: the operator-visible state looks right while the system silently produces the next incident (2026-05-21 exceed_egress_quota restriction). Before claiming PASS on any slice that adds or changes a hot-path query (SSR, polling endpoint, worker poll, or cron read), node scripts/smoke-supabase-egress-budget.mjs MUST exit 0 AND the slice's report MUST cite that exit code or a scripts/supabase-usage-snapshot.mjs snapshot. UNKNOWN load on a hot path is BLOCKED, not PASS. Canonical contract + typed blockers + PASS/PARTIAL/FAIL semantics: docs/specs/supabase-load-budget.md.
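The typed-error rule above can be illustrated with a minimal sketch. This is not the contents of lib/x-api-error.mjs: the function name normalizeExternalError, the per-status wording, and the assumed shape of the thrown error (err.status or err.code) are hypothetical. Only the three output fields and the 240-character limit come from the rule.

```javascript
// Hypothetical sketch, NOT the real lib/x-api-error.mjs.
// Assumption: thrown errors expose the HTTP status as err.status or err.code.
const KNOWN = {
  401: { message: "X rejected the credentials.", hint: "Re-authorize the account, then retry." },
  403: { message: "X refused this action for the account.", hint: "Check app permissions and account standing." },
  429: { message: "X rate limit reached.", hint: "Wait for the rate-limit window to reset, then retry." },
};

function normalizeExternalError(err) {
  const status = Number(err?.status ?? err?.code);
  const http_status = Number.isInteger(status) ? status : null;
  const known = KNOWN[http_status];
  // Build operator copy from the status; never pass the raw library string through.
  const message = known?.message ?? `External API call failed (HTTP ${http_status ?? "unknown"}).`;
  return {
    http_status,
    surface_message: message.slice(0, 240),
    hint: known?.hint ?? "Open the ledger row for details before retrying.",
  };
}
```

A caller would write the returned record into the ledger row and hand the same record to the cockpit, so the operator never sees a bare "Request failed with code 429".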

These items reinforce, and are reinforced by, the Operator UI Cockpit Contract below. Treat them together.
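As an illustration of the "succeeded means the mutation occurred" rule for mode='real' writes, here is a minimal guard sketch. It is not the noop guard in scripts/execution-command-worker.mjs; the function name, the external_action_id field, and the two blocker_type values are hypothetical. The posted_tweet_id, outcome=ok, and idempotency_key checks come from the rule.

```javascript
// Hypothetical guard for live (mode='real') writes only.
// Assumptions: `result` is the worker's raw outcome, `ledgerRows` are the
// ledger rows already written for this command; blocker names are invented.
function resolveLiveWriteStatus({ result, ledgerRows, idempotencyKey, noopRequested = false }) {
  // An operator-requested noop is the only noop allowed to succeed.
  if (noopRequested) return { status: "succeeded", noop: true };

  const externalId = result?.posted_tweet_id ?? result?.external_action_id ?? null;
  const hasOkLedgerRow = ledgerRows.some(
    (row) => row.idempotency_key === idempotencyKey && row.outcome === "ok"
  );

  if (externalId && hasOkLedgerRow) {
    return { status: "succeeded", external_id: externalId };
  }
  // Fail closed with a typed blocker instead of a silent success.
  return {
    status: "failed",
    blocker_type: externalId ? "ledger_row_missing" : "external_mutation_missing",
  };
}
```

The design choice is that success needs both pieces of evidence: the external id alone, or the ledger row alone, still resolves to failed.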


0b. Single Canonical Implementation

Status: §0b is target-shape policy. Rules 1–6 below are the desired operating contract. The current repo carries 13 known duplication classes (D1–D13 in data/run-reports/2026-05-08/dreaming-phase3-deeper-root-causes.md) — listed at the end of this section as [DEBT] — that violate rules 1, 4, and 5. Future agents must read each rule with that reality in mind: rules 1 and 5 are [TARGET] for new work; rule 2 is [PARTIAL] overall and [ENFORCED] where docs/specs/x-rendering-fidelity.md covers the surface; rules 3 and 6 are [ENFORCED] going forward via the supervisor verification process; rule 4 is [PARTIAL] because some canonical modules still need to be created.

The other repeating failure class on this project is "the same element was implemented three different ways in three places, the fix landed in one, the bug returned through the others." These rules make that mechanically harder.

  1. One canonical implementation per concept. [TARGET] for new work; [PARTIAL] overall. Each product/UI/runtime concept has exactly one canonical implementation in this repo. Surfaces that need the concept import it; they do not re-implement it. Concepts that fall under this rule include — but are not limited to — tweet/quote-post rendering, status/state vocabulary, daily-cap enforcement, idempotency-key construction, worker-heartbeat shape, Supabase client construction, X-API publish, time formatting, cockpit SSR query, execution-command kind enumeration, and live-action armed-sentinel set. See ARCHITECTURE.md §11 for the canonical module per concept and which rows are currently [ENFORCED] / [PARTIAL] / [TARGET] / [DEBT].
  2. Preserve X presentation semantics. [PARTIAL] overall; [ENFORCED] for surfaces covered by docs/specs/x-rendering-fidelity.md smokes/proofs. Every surface that displays Twitter/X content MUST preserve how that content renders on X: single tweet, thread, quote post, reply, media, metrics, timestamps, and link/quote cards. Storage, analyzer inputs, review UI, publish previews, audit/log views, and post-publish state MUST carry enough structured data to render the same semantic shape the operator will see on X. New or changed surfaces MUST use the canonical rendering primitives named in ARCHITECTURE.md §11.2 and satisfy docs/specs/x-rendering-fidelity.md; they MUST NOT replace quote/reply/media semantics with plain text, raw URLs, or explanatory placeholders except as typed blockers or explicitly marked missing-data states.
  3. No parallel implementations without a migration plan. [ENFORCED] going forward via supervisor verification. A PR that introduces a parallel implementation of an existing concept MUST include an explicit deprecation/migration plan for the duplicate, named in the PR description and tracked to closure. "Temporary" parallel implementations without a closure date are how triplication happens.
  4. Doc-vs-code coherence on contracts. [PARTIAL]. When a canonical doc states a contract value (e.g. "10 bookmarks/day"), the runtime MUST read that value from a single canonical module. Hard-coded copies in individual runners are a contract bug, not local optimization. Current known divergences (see §0b debt note below) are technical debt, not precedent.
  5. Mirror discipline for shared helpers. [TARGET] for new mirrors; [DEBT] for the existing 8 of 10 substantially-diverged pairs. Helpers needed by both scripts/*.mjs (root) and web/app/** (Next routes) live exclusively in one tree. The other tree, if needed, holds a thin documented re-export — never an independently maintained copy. Hand-edited divergent copies are a defect; a substantial-drift mirror pair is a deploy-blocker once flagged. See ARCHITECTURE.md §11.1 for the bundle-isolation rule.
  6. Concept-duplication scan before PASS. [ENFORCED] going forward via supervisor verification. Before claiming PASS on work that touches a concept governed by this section, the supervisor (or a fresh verifier session) checks that the change landed in the canonical module and that no parallel implementation was introduced. See AUTONOMOUS-WORK.md § Supervisor verification → Concept-duplication scan.
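The doc-vs-code coherence rule can be sketched as follows, using the "10 bookmarks/day" example from rule 4. The module shape and the names DAILY_CAPS and remainingToday are hypothetical; this is an illustration of "read the value from one place", not the project's actual cap module.

```javascript
// Hypothetical canonical cap module. Only the bookmark value (10/day) is
// taken from the doc; other actions are deliberately left undefined.
const DAILY_CAPS = Object.freeze({ bookmark: 10 });

function remainingToday(action, usedToday) {
  const cap = DAILY_CAPS[action];
  // Fail closed: an action with no canonical cap must not run on a guessed value.
  if (cap === undefined) throw new Error(`No canonical cap defined for "${action}"`);
  return Math.max(0, cap - usedToday);
}
```

Runners would import the helper rather than carry their own literal, so a cap change lands in one file and every runner follows.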

Currently known divergences [DEBT] — listed for transparency, NOT precedent. As of 2026-05-08, the repo carries:

  • 8 of 10 substantially-diverged lib/ ↔ web/lib/ mirror pairs (approval-text-binding, authored-content-shape, authored-rewrite-store, mafia-rework-command, mafia-rework-store, original-post-action-bridge, original-post-generate-orchestrator, rewrite-job-orchestrator); only owned-x-accounts.mjs and partial-thread-state.mjs are identical;
  • conflicting daily-cap definitions between execute-x-actions.js (20 bookmarks/day) and web/lib/mafia-live-runner.mjs (10 bookmarks/day); the doc-stated cap (10) is authoritative;
  • 4–5 distinct X-API publish paths with partial helper sharing for OAuth refresh, rate-limit, and typed-error normalization;
  • ~11 files in web/app/** that hand-roll Date() formatting outside web/lib/operator-date.mjs;
  • 6+ operator surfaces that hand-roll the cockpit SSR query; some still filter pending,running only and drop failed/cancelled;
  • ~150 smoke scripts with no shared harness;
  • tweet/quote rendering implemented three different ways (TweetCard, QuotedTweetCard, mafia/tweets/page.jsx) with hand-rolled fmtTs, fmtNumber, and media galleries.

These are technical debt under sequenced consolidation. New work MUST NOT expand them and SHOULD prefer extending the canonical module where one exists.


0c. Live E2E Test Account & Cleanup Contract

Status: [TARGET] — Seva provisioned the dedicated test account on 2026-05-09; the credentials, runtime wiring, smoke harness, and cleanup helpers do not yet exist. This section documents the contract that the future implementation MUST satisfy. Until the wiring is built and X_E2E_LIVE_ARMED=1 is exported, no script may make live writes against the test account.

The repeating limit on the existing test surface is that it cannot run real E2E against X without (a) burning the production account or (b) leaving residue visible to other users. To enable end-to-end live verification while honouring the operator-path acceptance gate (§0a / GOALS-PIPELINE.md §3 item 0), there is one — and only one — designated live-write target for tests.

0c.1 Designated test account

0c.2 Cleanup-immediately contract

Every action visible to other users on @sevaustinovtest MUST be undone within the same test run, before the test reports success. Cleanup is part of the test, not a follow-up.

| Action class | Visible? | Required cleanup |
|---|---|---|
| like | yes (appears in target's likers list) | unlike before test exit |
| bookmark | no (private to the bookmarker) | unbookmark anyway, to keep the test account's bookmark roster empty |
| reply / comment | yes (renders under the target post) | delete the reply tweet by id before test exit |
| quote post | yes (renders as a standalone tweet) | delete the quote tweet by id before test exit |
| original post / thread | yes | delete every segment by id before test exit |
| follow | yes | unfollow before test exit |
| repost (retweet) | yes | unrepost before test exit |

Rules:

  1. Cleanup runs in finally, not on the happy path. A test that asserts then exits without unwinding the visible state is a contract violation, even if the assertion passed. The test harness MUST guarantee cleanup on assertion failure, exception, timeout, and process kill (best-effort signal handler).
  2. Failed cleanup is a test FAILURE, not a warning. If unlike/delete/unfollow/etc. returns non-success, the test reports FAIL and writes a typed blocker (blocker_type='e2e_cleanup_failed') so the operator sees residue exists and can intervene. Do not write PASS over a dirty test account.
  3. Cleanup is deterministic. Every visible-write action returns its own undo handle (tweet id, action id) before any assertions run. The test stores the handle and uses it for the undo call; "find-and-delete by content" is forbidden because content matching can collide with concurrent runs.
  4. Replies and quote posts target only @sevaustinovtest posts. Tests that write a reply/quote MUST target a post authored by @sevaustinovtest itself (created earlier in the same test or in a fixture-seed step that precedes assertions). Tests MUST NOT write replies or quotes against any other account, even other "test-looking" accounts.
  5. Original posts/threads written by the test MUST be deleted within the same run. A test that posts an original tweet/thread on @sevaustinovtest deletes every segment in finally, regardless of whether the assertion checked one segment or all of them.
  6. Followers/following list stays clean. Tests that follow @sevaustinovtest from another test handle (or vice versa) unfollow on cleanup. The test account's follower graph at rest is the same shape after the test as before.
  7. No engagement with anyone else. No likes, bookmarks, follows, replies, or quotes against any account other than @sevaustinovtest itself. The Interface Approval Boundary still applies; a test run is not a chat-authorized engagement campaign.
  8. Test markers on every visible write. Every reply / quote / original written by the harness includes a deterministic test marker the operator can identify after the fact (e.g. an [E2E test 2026-MM-DD HH:MM:SS UTC] suffix or a hidden zero-width-joiner pattern), so any residue from a crashed run can be hunted and cleaned by a recovery sweep.
  9. Residue-recovery sweep is [TARGET]. A separate scheduled sweep (scripts/recover-e2e-residue.mjs, not yet built) will scan @sevaustinovtest's recent activity for unmarked or unmatched test artifacts and clean them. Until it exists, every test run is responsible for its own cleanup; orphaned residue is escalated to Seva for manual cleanup.
  10. Cleanup MUST be ledgered. Both the visible write and its undo write canonical ledger rows under their respective idempotency_keys, with mode='real' and target_account='@sevaustinovtest' so the audit trail is complete. A visible write without a paired undo row is the residue signal.
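The cleanup rules above (undo handles recorded before assertions, cleanup in finally, failed cleanup is FAIL) can be sketched as a small harness wrapper. The harness does not exist yet, so everything here is hypothetical: the name runWithCleanup, the track callback, and the result shape. Only blocker_type='e2e_cleanup_failed' comes from the contract.

```javascript
// Hypothetical harness sketch. The test body registers an undo function for
// every visible write BEFORE asserting; the wrapper unwinds them in finally.
async function runWithCleanup(testBody) {
  const undoStack = [];
  const cleanupErrors = [];
  let failure = null;
  try {
    await testBody((undo) => undoStack.push(undo));
  } catch (err) {
    failure = err; // assertion failure or exception; cleanup still runs
  } finally {
    // Unwind in reverse order; one failed undo must not skip the others.
    while (undoStack.length > 0) {
      const undo = undoStack.pop();
      try {
        await undo();
      } catch (err) {
        cleanupErrors.push(err.message);
      }
    }
  }
  // A dirty test account is FAIL even if every assertion passed.
  if (cleanupErrors.length > 0) {
    return { result: "FAIL", blocker_type: "e2e_cleanup_failed", cleanup_errors: cleanupErrors };
  }
  return failure ? { result: "FAIL", error: failure.message } : { result: "PASS" };
}
```

This sketch covers assertion failure and exceptions only. Timeouts, process kill (signal handlers), ledger rows for the write and its undo, and test markers would need to be layered on by the real harness.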

0c.3 Arming and gating

0c.4 Use cases this enables

When the wiring exists, this contract enables:

0c.5 What this section does NOT authorize

The boundary is mechanical: a production live write touches Seva's audience and is gated by interface approval; a test live write touches @sevaustinovtest, is gated by X_E2E_LIVE_ARMED, and cleans up immediately. No third category exists.
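A minimal sketch of the test-side half of that boundary, assuming a hypothetical helper that every E2E live write calls first. The function name and argument shape are illustrative; only X_E2E_LIVE_ARMED=1 and the @sevaustinovtest target come from this section.

```javascript
// Hypothetical gate; the real arming wiring is not built yet.
const TEST_ACCOUNT = "@sevaustinovtest";

function assertE2ELiveWriteAllowed({ env, targetAccount }) {
  // Fail closed unless the arming variable is exactly "1".
  if (env.X_E2E_LIVE_ARMED !== "1") {
    throw new Error("E2E live writes are not armed (X_E2E_LIVE_ARMED is not 1)");
  }
  // The test account is the only permitted live-write target for tests.
  if (targetAccount !== TEST_ACCOUNT) {
    throw new Error(`E2E live writes may only target ${TEST_ACCOUNT}`);
  }
}
```

In a script this would be called as assertE2ELiveWriteAllowed({ env: process.env, targetAccount }) before any write, so an unarmed or mis-targeted run stops before touching X.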


1. Script-First Orchestration

Seva directive (2026-04-28, Telegram #392287):

Everything that can be done by scripts — do with scripts. What can't be done by scripts alone — do with scripts that launch very narrow, fixed-prompt agents, each doing one thing, tested and reliable. All of this wrapped in one master script. Claude Code's job: run the master script, monitor, handle errors. No improvisation.

Hierarchy

CC top-level task
  → master orchestrator script (run-daily-cycle.mjs / run-approved-actions.mjs):
    sequences steps; fail-closed; structured exit code + summary
    → deterministic step scripts (e.g. daily-pipeline.mjs, scripts/check-ledger-consistency.mjs)
    → narrow fixed-prompt agent runners (when LLM judgment is needed; e.g. generate-inbox-items.mjs)

The CC task runs the master script, reads exit codes and logs, reports / escalates — it does NOT invent steps, improvise recovery, or add ad-hoc work.

Rules

  1. Script first. If a step can be a deterministic script, it must be.
  2. Narrow agents behind scripts. If a step needs LLM judgment, wrap it in a script that:
     • loads a fixed prompt from a versioned file,
     • passes structured input (JSON/YAML),
     • captures structured output,
     • has test fixtures and expected-output checks.
  3. Master script wraps all steps. One command runs the entire daily cycle. Analogous to setup/update-all.sh in SharedWorkspace.
  4. Claude Code only runs and monitors. The top-level CC task:
     • invokes the master script,
     • reads exit codes and logs,
     • reports results or escalates errors,
     • does NOT invent steps, improvise recovery, or add ad-hoc work.
  5. Every step is idempotent, logged, and fail-closed for external writes.
  6. Every step supports --dry-run.
  7. Fixed prompts are versioned files, not inline strings. Changes to prompts are code changes (reviewed, committed).
  8. One operation, one canonical primitive. Each operator/runtime operation lives in exactly one single-purpose script or library primitive. Composition happens at the master-orchestrator layer with no domain logic. Line count is a smell, not the contract: the contract is one responsibility / one mutation. A script with seven unrelated --mode flags is N defects, not one feature. An operation reused in 2+ orchestrators MUST be its own callable primitive, not duplicated. Full architecture: ARCHITECTURE.md §15.
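The master-script shape described in this section (sequence steps, fail closed, structured exit code plus summary, dry-run by default) might look roughly like the sketch below. It is not the contents of run-daily-cycle.mjs; the step object shape ({ name, run }) and the summary fields are assumptions for illustration.

```javascript
// Hypothetical orchestrator core: pure sequencing, no domain logic.
// Assumption: each step is { name, run({ dryRun }) } and throws on failure.
async function runCycle(steps, { dryRun = true } = {}) {
  const summary = { dry_run: dryRun, steps: [], exit_code: 0 };
  for (const step of steps) {
    try {
      const detail = await step.run({ dryRun });
      summary.steps.push({ name: step.name, status: "ok", detail });
    } catch (err) {
      // Fail closed: record the failure and stop. No improvised recovery.
      summary.steps.push({ name: step.name, status: "failed", error: err.message });
      summary.exit_code = 1;
      break;
    }
  }
  return summary;
}
```

A CLI wrapper would print JSON.stringify(summary) and set process.exitCode = summary.exit_code, which is all the top-level CC task needs to read, report, or escalate.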

Consequences for Current Pipeline

| Current state | Target state |
|---|---|
| Daily update requires ad-hoc CC session with improvised steps | One master command: node run-daily-cycle.mjs |
| Approved-action execution is manual sequence | One command: node run-approved-actions.mjs --dry-run then --live |
| Browser-assisted steps involve manual browsing | Scripted browser runner with fixed prompts, no manual browsing |
| Agent prompts are inline or improvised per session | Prompts in prompts/ directory, versioned, with fixtures in data/fixtures/ |
| Error handling depends on operator reading logs | Master script produces structured exit code + summary JSON |
| CC session invents its own task sequence | CC runs master script, reads result, reports |

§1f Overnight Pipeline Reliability Targets [TARGET]

Before each reliability-hardening session, verify 6 gates: (1) the signal pipeline produces scored signals at the configured time; (2) daily-update step records success and updates the canonical mirror; (3) inbox items have valid field counts; (4) approved-action ledger is consistent; (5) browser-assisted execution has approval gates; (6) recovery procedures exist for partial failures. Source: distilled from the 2026-04-29 overnight plan.

§1g Daily Operator Loop (11:30 PT window) [PARTIAL]

Sequence: (1) verify-handoff.mjs cold-start gate; (2) status-check.mjs signals/mafia freshness; (3) daily-checklist.mjs one-screen view; (4) mafia approve / dry-run / live (armed sentinel + fresh approval + caps not exhausted); (5) original-post intentional blocker (do not force); (6) recovery-report.mjs end-of-day ledger consistency. Live publish gates: armed sentinel + prior dry-run + interface approval + cap not exhausted. Source: distilled from OVERNIGHT-2026-04-29-FINAL-HANDOFF.md (now archived).


1b. Fixed Action Catalog

The system supports a fixed set of named actions (see docs/specs/action-catalog.md for the full catalog). Actions are product-level primitives — their IDs are stable. Implementation mechanics can be refactored without changing the action set.

The catalog is both a contract and a living discovery tool:

Two Kinds of Feedback

When Seva provides feedback on a generated item, it is always one of:

  1. Content feedback (about THIS specific post/reply): Triggers regenerate_content or revise_reply. Handled by rerunning the scripted LLM generation path for that item — the narrow agent re-reads the original signal + Seva Voice + feedback notes and produces a revised version. This is a script operation, not a freeform agent task.

  2. Rules/prompt feedback (about HOW items are generated in general): Triggers update_generation_rules, update_voice_guidance, or update_scoring_policy. Handled by Marvin + master Claude Code editing versioned prompt files, testing with fixtures, and committing. This is a system change, not an item regeneration.

These are distinct execution paths. Content feedback never changes system rules. Rules feedback never changes a specific item.

Execution Path Selection

For public X writes, the planner should choose the safest deterministic path:

  1. Prefer API execution when the relationship/API context allows it (for example follow-each-other, prior engagement, or known API-allowed thread).
  2. Route cold or API-restricted replies/quotes to the scripted browser-assisted runner.
  3. Store enough relationship/capability information to avoid rediscovering the same API restriction repeatedly.
  4. If the route is uncertain, dry-run and surface the uncertainty; do not improvise a manual browser path.
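The selection order above can be sketched as a small pure function. The stored capability record ({ apiAllowed }), the action.kind values, and the route names are assumptions made for illustration; the actual planner and its data model are not described here.

```javascript
// Hypothetical planner sketch.
// Assumption: `capability` is the stored relationship/API record for the
// target, with apiAllowed === true, false, or undefined (never checked).
function selectExecutionRoute(action, capability) {
  // 1. Prefer the API when the stored context says it is allowed.
  if (capability?.apiAllowed === true) return { route: "api" };

  // 2. Known-restricted replies/quotes go to the scripted browser runner.
  const isReplyOrQuote = action.kind === "reply" || action.kind === "quote";
  if (capability?.apiAllowed === false && isReplyOrQuote) {
    return { route: "browser_runner" };
  }

  // 4. Anything else is uncertain: dry-run and surface it, never improvise.
  return { route: "dry_run", uncertain: true };
}
```

Persisting the capability record after each attempt is what satisfies the third point: the next run reads apiAllowed instead of rediscovering the restriction.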

Scope Boundary

This project is the X Research system. The following workflows are out of scope unless Seva explicitly reopens them:

X Research docs may reference these as out-of-scope context, but MUST NOT treat them as part of the daily X pipeline. Out-of-scope workflow runbooks live under docs/archive/out-of-scope/ for evidence/recovery only; their previous canonical-doc registry entries were removed in Phase 4O. Reopening any of them requires Seva confirmation + canonical-doc registration + sentinel review.


1c. Site as View, Not Source

The X Research site is a rendered view over canonical sources. It must not become a second source of truth.

Rules:

  1. Canonical truth lives in source files/data, not generated HTML:
     • documents: root canonical Markdown, docs/reference/**, and docs/specs/** in x-archive;
     • content bases: publishing-repo/content/x-research/*.yaml;
     • execution history: Supabase ledger_events for runtime reads, with data/canonical-ledger.jsonl as backup append;
     • runtime decisions: Edge Config overlays, later materialized/synced back to source where appropriate.
  2. Generated output is never hand-edited. Files under output/site/ are build artifacts.
  3. New site sections start with a canonical source. Add or update the source doc/YAML/data first, then render it.
  4. Docs website target [TARGET]: live file-backed viewer. The docs section should load a manifest and file bodies from an authenticated Mac mini docs-source API over the x-archive filesystem. It must show current file contents (including docs with no Supabase row) without requiring agents to run backfill/deploy. Supabase canonical_documents is transitional cache only and can be deleted after this lands.
  5. Templates may contain UI chrome, labels, navigation, empty states, and layout copy. They must not contain canonical product/content truth that should live in docs/YAML/data.
  6. If the site says something important that cannot be traced to a canonical source, treat it as content drift. Fix by moving the truth into the source layer or removing the stale view.

This principle should not prevent helpful presentation. It only prevents the site from quietly becoming a parallel undocumented knowledge base, which is how software starts developing folklore.


1d. Posts as Database Records

Posts, replies, quote-posts, and queue entries are records moving through a lifecycle, not loose cards copied between lists. Each stage refines fields and changes status.

Rules:

  1. One logical record, many projections. Inbox views, queues, calendars, and action logs are projections over canonical records; they must not become independent truths.
  2. Stages refine data. A record may start as a rough signal/post idea, then gain content_lane, draft text, review notes, approval status, scheduled date, execution metadata, and ledger references.
  3. Return-to-stage is a status transition. Moving an item back for editing/review means changing status (for example scheduled or approved -> pending), not creating a new unrelated item.
  4. Derived queues recalculate. If a record leaves a stage (for example scheduled original post returns to pending), dependent projections such as the original-post queue must remove/recompute that entry and preserve invariants.
  5. History is append-only where it matters. Status can change, but review notes, rewrite history, scheduling changes, and execution attempts should remain auditable through logs/ledger/history fields.
  6. Scripts own transitions. Status changes that affect derived state should go through scripts/API actions that update or recompute all affected projections. Manual YAML edits are for emergency repair only.
  7. One unified inbox, filtered by source and status view [TARGET]. Reviewable items — Mafia engagement candidates, original-post drafts, future source classes, recovery cards — live in one operator inbox keyed by canonical record id, surfaced via source filters (mafia, original, future marketers/founders, recovery, …) and target status views (review, scheduled, published, rejected, recovery) rather than separate URLs/components. pending and returned are states inside the default review view, not separate top-level filters; rewrite/rework workers must keep the card anchored in the same review/source view while running. hide/hidden and Other are not target concepts; hide maps to reject-with-reason. New reviewable record classes join the unified inbox by source/tag, NOT by route. The current Mafia lane and original-post queue surfaces are forks pending sequenced consolidation. Full architecture: ARCHITECTURE.md §14.
  8. Scheduling constraints queue, they do not retry. [ENFORCED] for Launch Mafia like/bookmark; [TARGET] for the generalisation across other live-action classes. When an approved action cannot execute right now because of a known time-bounded scheduling constraint (daily cap exhausted, market-hours window, content embargo, etc.), the row MUST be queued for automatic execution after the constraint expires — NOT shown to the operator as a manual-Retry blocker. Daily cap is not a task; humans are not the retry queue. True blockers (creds missing, X 4xx, env not armed, typed approval-gate refusal) remain manual-Retry per the Operator UI Cockpit Contract. The mechanical test: if the only thing that changes the action's eligibility is the clock advancing past eligible_after, it is live_deferred; if anything else needs to change, it is a typed blocker. Full pattern (data.cap_deferred = { reason, eligible_after, …} + final_state='live_deferred' + scheduled executor): ARCHITECTURE.md §6.1. Reason: Seva 2026-05-04.
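The mechanical test in the scheduling rule (only the clock needs to change versus anything else needs to change) can be sketched as a classifier plus a due-check. The reason strings, the function names, and the final_state value "blocked" are hypothetical; final_state='live_deferred', cap_deferred, eligible_after, and blocker_type come from the rule. The full pattern lives in ARCHITECTURE.md §6.1.

```javascript
// Hypothetical classifier; reason names are invented for illustration.
const TIME_BOUNDED_REASONS = new Set([
  "daily_cap_exhausted",
  "market_hours_window",
  "content_embargo",
]);

function classifyRefusal({ reason, eligible_after }) {
  // Only the clock has to advance: queue it, do not ask a human to retry.
  if (TIME_BOUNDED_REASONS.has(reason) && eligible_after) {
    return { final_state: "live_deferred", cap_deferred: { reason, eligible_after } };
  }
  // Something other than time must change: typed blocker, manual Retry.
  return { final_state: "blocked", blocker_type: reason };
}

// The scheduled executor picks up deferred rows once the constraint expires.
function isDue(row, now) {
  return (
    row.final_state === "live_deferred" &&
    new Date(row.cap_deferred.eligible_after) <= now
  );
}
```

A time-bounded reason without an eligible_after falls through to the blocker branch in this sketch, since a row that cannot be scheduled should not sit silently in a queue.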

This keeps the system boring: records change status; views follow. The alternative is five lists arguing about reality, which we have already explored with appropriate despair.
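
The mechanical test in rule 8 can be sketched as a small classifier. This is an illustration under stated assumptions, not the project's implementation: the function name `classifyUnexecutable`, the input shape, the reason codes, and the `blocked` state for typed blockers are hypothetical; only `cap_deferred`, `reason`, `eligible_after`, and `live_deferred` come from the rule text.

```javascript
// Hypothetical reason codes for time-bounded scheduling constraints (assumed names).
const TIME_BOUNDED_REASONS = new Set([
  'daily_cap_exhausted',
  'market_hours_window',
  'content_embargo',
]);

// Classify an approved action that cannot execute right now.
// `constraint` is an assumed shape: { reason: string, eligible_after?: ISO-8601 string }.
function classifyUnexecutable(constraint, now = new Date()) {
  const eligibleAfter = constraint.eligible_after
    ? new Date(constraint.eligible_after)
    : null;
  const clockOnly =
    TIME_BOUNDED_REASONS.has(constraint.reason) &&
    eligibleAfter !== null &&
    !Number.isNaN(eligibleAfter.getTime());

  if (!clockOnly) {
    // Something other than the clock must change: typed blocker, manual Retry.
    return { final_state: 'blocked', blocker: constraint.reason };
  }
  return {
    final_state: 'live_deferred',
    cap_deferred: {
      reason: constraint.reason,
      eligible_after: eligibleAfter.toISOString(),
    },
    // A scheduled executor would pick the row up once this becomes true.
    executable_now: now.getTime() >= eligibleAfter.getTime(),
  };
}
```

The point of the sketch is the single branch: if only the clock needs to move, the row is queued; anything else stays a blocker for the operator.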


1e. Operator-Facing Time Is Always SF/PT (12-hour am/pm)

All operator-facing UI timestamps, dates, elapsed labels with calendar context, empty-state copy, reset labels, queue dates, action-log dates, cockpit timestamps, and daily-status labels must render in San Francisco time: America/Los_Angeles, displayed as PT. Time-of-day rendered to operators (scheduled time, picker defaults, reset labels) is 12-hour am/pm ("8:25 PM PT"), not 24-hour ("20:25 PT"). 24-hour and ISO shapes stay in storage/API only.

Rules:

  1. UI display is SF/PT in 12-hour am/pm. If a human sees it in the cockpit, inbox, action log, approvals, mafia surfaces, daily checklist, or site operator views, format it through the operator-time helpers, not raw UTC and not 24-hour for clock displays. The canonical helpers are in web/lib/operator-date.mjs (OPERATOR_TIMEZONE, formatTimestamp, formatTimeOfDay12h, formatOperatorWeekdayDate, todayInTimeZone, operatorTimeOfDayHHMM, operatorDayWindow). Hand-rolled new Date().toISOString(), Intl.DateTimeFormat, .toLocaleString(), or .toLocaleDateString() in web/app/** operator surfaces is a D7 regression (ARCHITECTURE.md §11.2).
  2. Storage may stay UTC / 24-hour / ISO. Database timestamps, ledger events, API payloads, filenames, command IDs, run-report partitions, Supabase query cutoffs, and <input type="time"> values (24-hour by HTML contract) may use ISO / 24-hour for determinism. That is an implementation detail, not operator copy.
  3. Operator-day semantics are SF-day semantics. Daily caps, reset labels, “today”, scheduled-date display, and overdue calculations that affect operator decisions must use the SF operator day (todayInTimeZone()) unless a script explicitly documents a UTC technical partition.
  4. UTC or 24-hour in UI needs a reason. If raw UTC or a 24-hour clock must be shown to an operator (for example debugging a raw ledger row), label it explicitly and, where practical, show the PT 12-hour equivalent next to it.
  5. X-presentation timestamps follow docs/specs/x-rendering-fidelity.md, not this rule. Tweet/quote-card timestamps inside the canonical post renderer mimic X's own format ("4h", "May 14"); they are not operator chrome.

The system can store time like a database. It must talk about time like the operator lives in San Francisco at 8:25 PM, not at 20:25. Disturbing that this has to be written down, but here we are.
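
To make the display rule concrete, here is a minimal sketch of what shared operator-time helpers could do internally. It is not the code in web/lib/operator-date.mjs, whose real signatures may differ; the `...Sketch` function names are hypothetical. Call sites in web/app/** should still go through the canonical helpers rather than copying this.

```javascript
// The timezone value is the one named in the rule above.
const OPERATOR_TIMEZONE = 'America/Los_Angeles';

// Pick one named part out of Intl.DateTimeFormat#formatToParts output.
function partsFor(date, options) {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone: OPERATOR_TIMEZONE,
    ...options,
  }).formatToParts(date);
  return (type) => parts.find((p) => p.type === type).value;
}

// 12-hour operator clock label, e.g. "8:25 PM PT".
// The string is assembled by hand so it does not depend on which
// whitespace character the runtime's ICU data puts before AM/PM.
function formatTimeOfDay12hSketch(date) {
  const get = partsFor(date, { hour: 'numeric', minute: '2-digit', hour12: true });
  return `${get('hour')}:${get('minute')} ${get('dayPeriod')} PT`;
}

// SF operator day as YYYY-MM-DD, for "today" and daily-cap decisions.
function operatorDaySketch(date) {
  const get = partsFor(date, { year: 'numeric', month: '2-digit', day: '2-digit' });
  return `${get('year')}-${get('month')}-${get('day')}`;
}

const stored = new Date('2026-05-15T03:25:00Z'); // UTC in storage
console.log(formatTimeOfDay12hSketch(stored));   // 8:25 PM PT
console.log(operatorDaySketch(stored));          // 2026-05-14
```

The example instant shows why rule 3 matters: the stored UTC date is May 15, but the operator's day is still May 14.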


2. Bounded Autonomous Sessions

Autonomous / overnight Claude Code work follows the bounded deterministic run model. See AUTONOMOUS-WORK.md for the full operating guide. Summary:

  1. State lives in files, not context. Checkpoints in data/run-state/YYYY-MM-DD.json. Per-task progress lives in the checkpoint summary, not a canonical doc. Context compacts; files don't.
  2. Scope follows Operator Workflows. Session prompts start from GOALS-PIPELINE.md §4a and docs/specs/operator-journeys.md: pick the current workflow (OJ-001–OJ-008), name the target proof, and keep every CC slice inside that workflow unless Seva expands scope.
  3. Scope is pre-defined and closed. Session prompt lists tasks and stopping conditions. Unlisted tasks don't happen. Max 10 tasks, max 15 CC runs per session.
  4. Progress is machine-verifiable. Script exit codes, file existence checks, grep patterns — not "it looks done."
  5. Artifacts are contractual and classified. Every session produces one session prompt, one checkpoint JSON/summary, exact-path Evidence reports only when needed, and local commits after meaningful units. CC progress notes, intermediate plans, and raw outputs are Sprint/Run-state/Evidence/Temp per AUTONOMOUS-WORK.md; they must not become new canonical/handoff files.
  6. Recovery is file-based. After crash or compaction: read checkpoint, check git log, resume from next uncompleted task.
  7. Stop-time beats checklist completion. If Seva says “work until HH:MM”, keep launching bounded useful work until that time. Completing the initial task list changes the work mode to hardening/reviewer/integration; it does not create a resting state.

Evidence basis: Overnight cycle of 2026-04-27→28 ran 42 CC tasks with empty heartbeat state, no compact checkpoint summary, and a production data overwrite. External research confirms: file-based state, bounded scope, and budget guards are universal patterns across Anthropic's own harness guide, Ralph loop, and production agent postmortems.
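
File-based recovery and the session budget (rules 3, 6, and 7) can be sketched as one pure decision function over a parsed checkpoint. The checkpoint shape used here (`tasks`, `cc_runs`, `stop_at`) and the returned action names are assumptions for illustration; the real schema lives in AUTONOMOUS-WORK.md.

```javascript
const MAX_TASKS = 10;   // from rule 3
const MAX_CC_RUNS = 15; // from rule 3

// `checkpoint` is an assumed shape:
// { tasks: [{ id, status }], cc_runs: number, stop_at?: ISO-8601 string }
function nextStep(checkpoint, now = new Date()) {
  if (checkpoint.stop_at && now.getTime() >= new Date(checkpoint.stop_at).getTime()) {
    return { action: 'stop', reason: 'stop_time_reached' };
  }
  if (checkpoint.tasks.length > MAX_TASKS) {
    return { action: 'stop', reason: 'task_limit_exceeded' };
  }
  if (checkpoint.cc_runs >= MAX_CC_RUNS) {
    return { action: 'stop', reason: 'cc_run_budget_exhausted' };
  }
  const next = checkpoint.tasks.find((t) => t.status !== 'done');
  if (next) {
    // Recovery after crash or compaction: resume from the first unfinished task.
    return { action: 'resume', task_id: next.id };
  }
  // Rule 7: with a stop time set, a finished list switches mode instead of resting.
  return checkpoint.stop_at
    ? { action: 'continue', mode: 'hardening' }
    : { action: 'stop', reason: 'all_tasks_done' };
}
```

Because the function reads only the checkpoint and the clock, a fresh session with no conversational context reaches the same decision as the one that crashed.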


3. Reliability Before Features

Seva directive (2026-04-28): no new feature expansion until the daily pipeline is boringly reliable. Before each reliability-hardening session, verify the 6 gates in §1f below.


4. Master Script Contract (Target Design)

Name: run-daily-cycle.mjs (or .sh)

Inputs:

  - --dry-run: no external writes
  - --step=NAME: run only one step (for debugging)
  - --date=YYYY-MM-DD: override date (for backfills)

Steps (in order):

| # | Step | Script | Agent? |
|---|------|--------|--------|
| 1 | Preflight checks | preflight-check.mjs | No |
| 2 | Fetch & score signals | daily-pipeline.mjs | No |
| 3 | Generate inbox items | generate-inbox-items.mjs | Yes (fixed prompt from prompts/generate-inbox.md) |
| 4 | Health check: signals | check-signals-health.mjs | No |
| 5 | Health check: ledger | scripts/check-ledger-consistency.mjs | No |
| 6 | Build site | build-and-deploy.sh --build-only | No |
| 7 | Deploy (if not dry-run) | build-and-deploy.sh --deploy | No |
| 8 | Final report | generate-run-report.mjs | No |

Outputs:

  - Exit code: 0 = all steps passed, 1 = step failure, 2 = wrapper error
  - data/run-reports/YYYY-MM-DD.json: structured summary of each step's result
  - Logs: ~/.openclaw/logs/daily-cycle-YYYY-MM-DD.log

Error handling:

  - Each step failure is logged and reported but does not crash the master script (fail-forward for read-only steps, fail-closed for write steps).
  - Final report always runs, even after failures.
  - CC reads the report JSON and surfaces issues; it does not improvise fixes.
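
The error-handling contract can be sketched as a small step loop. This is one possible reading, not the target script: it interprets "fail-closed for write steps" as "skip any later write step once an earlier step has failed", and the step descriptor fields (`write`, `liveOnly`, `always`) are hypothetical.

```javascript
// Run steps in order. Read-only steps fail forward; write steps fail closed.
// Each step is an assumed shape: { name, run, write?, liveOnly?, always? }.
function runDailyCycle(steps, { dryRun = false } = {}) {
  const results = [];
  let anyFailure = false;

  for (const step of steps) {
    if (step.liveOnly && dryRun) {
      results.push({ step: step.name, status: 'skipped', reason: 'dry_run' });
      continue;
    }
    if (step.write && anyFailure && !step.always) {
      // Fail closed: never write on top of an unhealthy run.
      results.push({ step: step.name, status: 'skipped', reason: 'earlier_failure' });
      continue;
    }
    try {
      step.run({ dryRun });
      results.push({ step: step.name, status: 'passed' });
    } catch (err) {
      // Fail forward: record the failure and keep going.
      anyFailure = true;
      results.push({ step: step.name, status: 'failed', error: String(err.message) });
    }
  }
  // Exit code 2 (wrapper error) would be set outside this loop and is not shown.
  return { exitCode: anyFailure ? 1 : 0, results };
}
```

The returned `results` array is the kind of structured summary the run report would serialize, so CC can read outcomes instead of inferring them from logs.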


Approved-Actions Execution Contract (Target Design)

Name: run-approved-actions.mjs

Steps:

| # | Step | Script |
|---|------|--------|
| 1 | Load approved items from inbox | load-approved-actions.mjs |
| 2 | Preflight: API auth, rate limits | preflight-check.mjs --actions |
| 3 | Execute (dry-run or live) | execute-x-actions.js / browser-assisted-runner.mjs |
| 4 | Ledger consistency check | scripts/check-ledger-consistency.mjs |
| 5 | Report | generate-execution-report.mjs |

Same --dry-run / structured output / fail-closed contract as above.

Interface Approval Boundary

For now, live X actions may execute only when Seva has explicitly confirmed the exact action/plan in the product interface/UI approval flow. A chat message, even if it expresses intent, is not sufficient approval for live posting, liking, bookmarking, replying, or quoting. Chat instructions can authorize engineering work, dry-runs, diagnostics, and preparation of an execution plan, but the final live action gate must be backed by interface approval metadata. This boundary prevents Marvin or child agents from treating conversational intent as an execution click.

Read-only testing authorization. Seva explicitly authorizes read-only testing and evidence capture by default, including using a logged-in browser session to inspect pages and capture screenshots, as long as the work does not click or trigger mutating actions (post, like, bookmark, reply, quote, follow, delete, approve, publish, or equivalent). Read-only browser work may open/navigate pages, inspect DOM, and save screenshots for verification without a separate per-run approval. If an action could mutate external state or expose private/auth data beyond local evidence artifacts, stop and ask.

Execution Ledger and Action Log Contract

Live actions must be script-owned end to end. If a script posts, likes, bookmarks, replies, quotes, or otherwise mutates X, the same script path must write the canonical execution event immediately. No agent should perform a separate manual “mark as done” step after execution. Manual reconciliation is allowed only as incident repair, and must itself be logged.

The Action Log is a projection of the canonical execution ledger, not a separate truth source. There must be one canonical ledger source for automated X actions, or a deterministic sync that makes the site projection current. A successful script execution that does not appear in the Action Log is a system bug.

Live execution gates must be backed by product/interface approval metadata. Chat messages can authorize engineering work, diagnostics, dry-runs, and plan preparation, but they are not execution approval for live X actions. Live scripts must fail closed unless the plan/action carries interface approval metadata for the exact action set being executed.
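
A fail-closed gate of this kind might look like the sketch below. The plan shape and field names (`interface_approval`, `action_ids`, `approved_at`) and the error strings are assumptions for illustration; the real approval metadata contract is defined elsewhere in the project.

```javascript
// Throw unless the plan carries interface approval for exactly its action set.
// `plan` is an assumed shape:
// { actions: [{ id }], interface_approval?: { approved_at, action_ids: [] } }
function assertInterfaceApproval(plan) {
  const approval = plan.interface_approval;
  if (!approval || !approval.approved_at || !Array.isArray(approval.action_ids)) {
    throw new Error('approval_gate_refused: missing interface approval metadata');
  }
  const planned = plan.actions.map((a) => a.id).sort();
  const approved = [...approval.action_ids].sort();
  const sameSet =
    planned.length === approved.length &&
    planned.every((id, i) => id === approved[i]);
  if (!sameSet) {
    // Approval for a different or smaller set does not cover this plan.
    throw new Error('approval_gate_refused: approved action set does not match plan');
  }
  return true;
}
```

The gate compares the whole action set rather than checking for the presence of an approval flag, so adding one extra action to an already approved plan is refused.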

Daily/reliability checks should include the chain:

  1. script executes or dry-runs the action;
  2. canonical ledger row is written with attempt/outcome;
  3. Action Log projection can render the row;
  4. failures and blockers surface without an agent manually copying state between systems.

If any link in this chain requires Marvin or Claude Code to manually copy/annotate state, the automation is incomplete.
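
The chain above lends itself to a mechanical check. The following is a rough sketch only: the three input arrays and the `action_id` / `outcome` field names are hypothetical stand-ins for whatever the execution script, ledger, and Action Log projection actually expose.

```javascript
// Report every executed action that is missing a later link in the chain.
// Inputs are assumed shapes keyed by a hypothetical `action_id`.
function findChainGaps(executions, ledgerRows, actionLogRows) {
  const ledgerIds = new Set(
    ledgerRows.filter((row) => row.outcome !== undefined).map((row) => row.action_id)
  );
  const renderedIds = new Set(actionLogRows.map((row) => row.action_id));

  const gaps = [];
  for (const exec of executions) {
    if (!ledgerIds.has(exec.action_id)) {
      gaps.push({ action_id: exec.action_id, missing: 'ledger_row' });
    } else if (!renderedIds.has(exec.action_id)) {
      gaps.push({ action_id: exec.action_id, missing: 'action_log_projection' });
    }
  }
  return gaps;
}
```

An empty result means no execution needed a manual copy step; any entry is a system bug in the sense defined above, not a to-do for an agent.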

Operator UI Cockpit Contract

Operator UI is a cockpit, not a log viewer. The operator is a pilot reading instruments, not an analyst scrolling a feed. Every operator surface (/inbox, /mafia/**, /action-log, /runs, /approvals) follows the same cockpit contract. Canonical reference: docs/reference/operator/OPERATOR-UX-COCKPIT-PLAN.md.

Non-negotiables

  1. Fixed state-machine vocabulary. Every async action exposes states drawn from the lifecycle set: idle / submitting / queued / running / succeeded / failed / blocked / cancelled. The cockpit also derives a stuck warning over the running state when the worker heartbeat is older than the stale threshold (≥120s); stuck is not a terminal command status — the underlying command remains running and polling continues. cancelled IS a terminal command status, surfaced when an operator or scheduler explicitly cancels. Colors and copy are owned by web/app/components/StateBadge.jsx and web/app/components/WorkerStatusPanel.jsx (which exposes data-worker-status for the lifecycle state and data-worker-stuck as a separate '0' | '1' warning marker). Do not fork per surface.

  2. Connection liveness ≠ progress liveness. A green chip is not proof of life. The UI must render two distinct signals: connection (the page is reaching the API) and progress (the worker emitted a heartbeat recently). Stale heartbeat → "Possibly stuck" warning, NOT a failure.

  3. Long workers MUST emit cockpit telemetry. Anything that can outlive ~30 seconds (Claude/LLM, publish, browser-assisted, retry-all) PATCHes execution_commands.result every ~20s with this shape:

      {
        "stage": "regenerate | process-authored-rewrite | publish | …",
        "heartbeat_at": "ISO-8601",
        "heartbeat_message": "human-readable, ≤ 240 chars",
        "latest_action": "what the worker just did, ≤ 120 chars",
        "last_output_excerpt": "tail of stdout/stderr, ≤ 600 chars, redacted",
        "token_usage": { "input": N, "output": N } | null,
        "token_usage_status": "available | unavailable | not_applicable",
        "worker_pid": 12345,
        "worker_id": "Sevas-Mac-mini.local",
        "started_at": "ISO-8601",
        "no_live_x_actions": true | false
      }

Token counts that are not available (e.g. claude --print without stream-json) MUST surface as token_usage: null with token_usage_status: "unavailable" and a human-readable reason — never silently omitted.

Rich-heartbeat envelope. Wired workers additionally emit rendered_status (one-line operator glance) + activity envelope (schema_version:1). Two-cadence model (1s tick / 5s DB-coalesce with stage/token/tool/terminal bypasses). Canonical sources: lib/worker-status-line.mjs, lib/worker-activity-state.mjs, lib/worker-heartbeat-loop.mjs, lib/worker-rich-heartbeat-helpers.mjs. Hand-rolled per-worker forks are a typed blocker (cockpit_status_line_handrolled / cockpit_activity_envelope_handrolled). Full wiring history, per-worker adoption status, and S1–S5 change log: docs/specs/cockpit-contract.md §2 item 10 + §7.

  4. Latest action and excerpts come from real worker output. Workers must derive latest_action and last_output_excerpt from actual stdout/stderr (e.g. the tail of a teed log), not invented copy. Capped at ≤120 / ≤600 chars and redacted of obvious secrets before write.

  5. Terminal state always renders with next action inline. Every terminal lifecycle render (succeeded / failed / blocked / cancelled) AND every derived running + stuck render carries the next operator action concretely (Refresh row · Open log · Retry · Cancel · Resolve · Re-launch). "Refresh page" is acceptable only as a last-resort fallback. The page-level SSR query that feeds the cockpit MUST include rows in pending | running | failed | cancelled for the recent operator window (≥24h). Filtering on pending | running only is a regression: a terminal failure that disappears on next refresh violates "next action inline" because the operator cannot see that the failure exists.

  6. Empty states explain why and what next. Acceptable: "Inbox is empty — daily generator runs at 06:00 SF; the next batch lands here automatically. To force a regen, run …". Not acceptable: "No items match this filter."

  7. Global active-jobs view. /action-log carries an "Active Jobs" section (web/app/components/ActiveJobsPanel.jsx) that lists every execution_commands row in pending or running with kind, worker, heartbeat age, latest_action, started_at, elapsed, and a link back to the source row. The operator must never have to hunt a card to know what is in flight.

  8. No-live-X assertion is a cockpit telemetry signal. When a worker is provably non-publishing (rewrite, dry-run, browser dry-run) it stamps no_live_x_actions: true on execution_commands.result. The cockpit reflects this via the container data-worker-no-live-x attribute (consumed by smokes + audit). No operator-facing copy is rendered for this signal — the card title already names the work, and publish workers can stamp the same flag on noop/failed terminal payloads where a "this worker is provably non-publishing" line was actively misleading (removed 2026-05-28).

  9. State transitions must refresh authoritative UI state. After any successful operator transition (approve/reject/return/rewrite/queue/publish), the visible card, counters, filters, and available CTAs must update from the canonical server state or explicitly show a blocking refresh error — no zombie cards, stale buttons, or “only Force Reload fixes it” states. hide is not a target transition; use reject-with-reason. Operator-surface SSR fetches MUST opt out of the data cache ({ revalidate: 0 } / cache: 'no-store'); a router.refresh() after an operator transition is only authoritative if the source query is uncached, otherwise prop-sync clobbers optimistic state with stale data. When an operator action triggers an asynchronous worker, the visible card MUST remain in the lane the operator was on until the underlying execution_commands row reaches a terminal state, regardless of any optimistic status flip; local cockpit state MUST be hydratable from server props on every render so a parent re-render or React-key remount cannot lose command_id.

  10. Substantive cockpit-shape gate (not optional taste). Absence of broken states (auth_error, unavailable, placeholder, stuck, failed) is preflight, NOT acceptance. Before any operator-visible slice that touches /inbox, /queue, /mafia/**, /action-log, /runs, or /approvals may be claimed daily-use ready, the supervisor MUST also run the substantive cockpit gate (docs/specs/inbox-cockpit-substantive-gate.md). The gate emits PASS / PARTIAL / FAIL plus a checked / not-checked list. The slice may claim PASS only when the substantive gate is PASS. The gate is mechanical because human-glanceability is mechanical — the same card the operator opens in 3 seconds is what the gate measures. Why this rule exists: 2026-05-18 inbox audit (data/run-reports/2026-05-18/reliability-stabilization/inbox-ux-principles-audit.md) found a card that was technically green and product-incoherent at the same time. Green smokes / build / deploy do not substitute for cockpit shape.

    Banned patterns inside primary operator card body (gate FAILs on each):

      - Tutorial / implementation-narrating copy describing how the system will render the card to itself (e.g. will render as wrapper text + native X quote card, BELOW YOUR WRAPPER TEXT, X WILL EMBED THIS AS A NATIVE QUOTE CARD, ORIGINAL QUOTE POST, SEGMENT N / 280). These belong in docs/spec/dev tooltips, never on the operator's first-glance surface.
      - Raw 19-digit tweet ids and ≥30-char row ids in primary visible content. Allowed as data-* attributes (for smokes), behind <details> / on-hover, or in admin-scope footer chevrons. The visible identity is handle + timestamp + permalink anchor.
      - Dead disabled controls at review-time. Author-time gestures (split / merge / move / remove for single-segment formats; rewrite-history toggles on terminal rows; publish-now on running rows) MUST be conditionally rendered, not disabled-and-present. Disabled buttons are noise, not affordance.
      - Duplicate labels for the same fact on the same card. Each fact (format, status, source, identity) renders human-visibly at most once. Smoke selectors that need the assertion surface keep the data-* attribute; redundant rendering goes sr-only or is removed.
      - System / credential blockers mixed into X-presentation content. A bearer_invalid / worker_env_missing / cron_not_wired blocker MUST surface on the cockpit health strip (system scope), never inside the card body the operator is asked to approve. Reinforces §0a item 4 and §0b rule 2.
      - Unclear CTA hierarchy. Two equally-weighted "Approve" variants (publish-now vs schedule) without a default-picked-from-context one are noise; the gate flags a row where multiple CTAs in the same colour family compete for primary attention.
      - Tab/lane taxonomy that contradicts the spec. All/Hidden/Other as top-level filters when docs/specs/unified-inbox.md says they are retired or debug-only.

    Required evidence for a PASS claim:

      - Fresh production screenshot of the primary surface at desktop AND mobile widths (≤ 24h old) showing the affected card class. PNG attached or linked from the run report.
      - DOM smoke assertions executed by the substantive gate (selector counts for data-inbox-card-format, data-inbox-authored-format-label, data-composer-thread-format-indicator; absence of banned tutorial strings; data-composer-segment-action count = 0 on single-segment quote-post review cards; no visible 19-digit numeric run inside [data-quoted-source-preview-card]).
      - PASS / PARTIAL / FAIL line printed by the gate, plus the explicit "checked" and "not checked" lists.
      - Claim level: substantive PASS = "daily-use operationally acceptable"; PARTIAL = "operator surface technically green; cockpit shape NOT YET verified — known gaps: …"; FAIL = "operator surface NOT acceptable — escalate before next operator session."
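
Two of the banned patterns in item 10 are plain text checks, which can be sketched as follows. This covers only a fragment of the gate and is not the implementation in docs/specs/inbox-cockpit-substantive-gate.md: the function name and return shape are hypothetical, the string list is a subset of the examples quoted above, and a clean result here is not a PASS claim.

```javascript
// A subset of the banned tutorial strings quoted in item 10.
const BANNED_TUTORIAL_STRINGS = [
  'BELOW YOUR WRAPPER TEXT',
  'X WILL EMBED THIS AS A NATIVE QUOTE CARD',
  'ORIGINAL QUOTE POST',
];

// Matches any visible run of 19 or more consecutive digits (raw tweet ids).
const RAW_TWEET_ID = /\d{19}/;

// `visibleText` is assumed to be the human-visible text of one card body,
// already extracted from the DOM (data-* attributes excluded).
function checkVisibleCardText(visibleText) {
  const failures = [];
  const upper = visibleText.toUpperCase();
  for (const banned of BANNED_TUTORIAL_STRINGS) {
    if (upper.includes(banned)) failures.push(`tutorial_copy: ${banned}`);
  }
  if (RAW_TWEET_ID.test(visibleText)) failures.push('raw_tweet_id_visible');
  return { ok: failures.length === 0, failures };
}
```

The result deliberately reports `ok` rather than PASS: the full gate also needs screenshots, selector counts, and the checked / not-checked lists.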

Why this contract exists: Seva 2026-05-06 + 2026-05-18. The cockpit is not a database admin tool. Items 1–9 prevent the bug class "system says alive when it isn't"; item 10 prevents the bug class "system says healthy when the card the operator opens is shaped like a debug dump."
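
The telemetry caps from item 3 and the derived stuck warning from item 1 can be sketched together. This is an illustration under assumptions, not the code in lib/worker-heartbeat-loop.mjs: the function names and the `redact` pattern are hypothetical, the payload omits worker_pid, worker_id, and started_at for brevity, and the `not_applicable` token status is not modelled.

```javascript
const STALE_HEARTBEAT_MS = 120 * 1000; // stale threshold from item 1

const head = (text, max) => (text.length <= max ? text : text.slice(0, max));
const tail = (text, max) => (text.length <= max ? text : text.slice(-max));

// Naive redaction of bearer tokens only; a real worker would need more patterns.
function redact(text) {
  return text.replace(/(bearer\s+)[A-Za-z0-9._-]+/gi, '$1[REDACTED]');
}

// Build a partial execution_commands.result heartbeat payload.
function buildHeartbeat(input, now = new Date()) {
  const { stage, message, latestAction, output, tokenUsage = null, noLiveX = false } = input;
  return {
    stage,
    heartbeat_at: now.toISOString(),
    heartbeat_message: head(message, 240),
    latest_action: head(redact(latestAction), 120),
    // Redact before truncating so a secret cannot survive as a cut-off fragment.
    last_output_excerpt: tail(redact(output), 600),
    token_usage: tokenUsage,
    token_usage_status: tokenUsage ? 'available' : 'unavailable',
    no_live_x_actions: noLiveX,
  };
}

// "stuck" is a warning over `running`, never a status of its own.
function deriveStuck(status, heartbeatAt, now = new Date()) {
  const ageMs = now.getTime() - new Date(heartbeatAt).getTime();
  return { status, stuck: status === 'running' && ageMs >= STALE_HEARTBEAT_MS };
}
```

Keeping `status` untouched in `deriveStuck` mirrors the contract: the command stays running and polling continues while the operator sees the warning.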

Roles and Operating Model

Marvin is the delivery owner, supervisor, and context keeper. Claude Code is the default implementation worker for project changes and fixes.

Marvin's job on each task:

  1. Frame the task with full intention chain (see AUTONOMOUS-WORK.md § Child Agent Intention Chain).
  2. Launch Claude Code to implement.
  3. Supervise wakes/results.
  4. Verify independently (run checks, read diffs; don't trust self-reported "done").
  5. Decide: accept, relaunch with corrections, or escalate to Seva.

Marvin should not do implementation directly by default. The reason: implementation consumes main-session context that is better reserved for project overview, cross-task judgment, and supervision continuity. Losing that context to a large implementation pass degrades the supervisor's ability to catch drift, prioritize, and hand off cleanly.

Exceptions where Marvin may implement directly:

  - Tiny, safe inspection or verification (reading a file, running a check).
  - Emergency repair when launching a new CC session would cost more time than the fix.
  - Explicit instruction from Seva.

All roles are subject to the Interface Approval Boundary above: no live X/social actions without explicit product/UI approval and existing fail-closed gates.