X Research — Operating Principles

0. Documentation Discipline

Keep the operating context small and canonical. Durable product goals live in GOALS-PIPELINE.md; architecture and technical decisions live in ARCHITECTURE.md; operating rules live here; autonomous-run rules live in AUTONOMOUS-WORK.md; action definitions live in docs/specs/action-catalog.md; audit / ledger event contracts live in docs/specs/hot-write-ledger.md.

Rules:

Do not create permanent new handoff-style/design docs by default. If a decision matters beyond an incident, distill it into ARCHITECTURE.md, GOALS-PIPELINE.md, or this file.
Run reports are evidence, not context. They may record what happened, but future agents should not need to read a pile of reports to know the current architecture.
Prompts should read targeted sections, not whole archives. Never include full docs/archive/** files (including the legacy PIPELINE-2026-04-26.md) or all run reports in a Claude Code prompt unless the task is explicitly a historical audit.
One canonical file per knowledge type. If two files disagree, update the canonical file and treat the other as stale/reference.
After incident repair, update canonical docs. A fix is incomplete if the lesson remains only in chat or a one-off report.
Move before delete. A run report or design doc may be archived/deleted only after its durable lesson is captured in a canonical file (this one, GOALS-PIPELINE.md, or ARCHITECTURE.md). Reports cite the canonical destination; canonical docs cite the report path as evidence.
Reports do not silently edit canonical docs. If a report needs a canonical-doc change, it proposes exact wording in the report and waits for supervisor/Seva confirmation.
Artifact lifecycle and context budget. Every file the project produces belongs to one of five lifecycle classes (Canonical, Sprint, Run-state, Evidence, Temp), each with a size cap and a cleanup rule. The bootstrap reading list has a fixed context budget; adding content to a Canonical or Sprint file requires compressing equivalent content elsewhere (net-zero rule). For autonomous runs longer than ~4 hours, the supervisor schedules periodic compaction phases. Operational rules: AUTONOMOUS-WORK.md § Artifact Lifecycle and Context Budget. Storage architecture (filesystem layout, Supabase mirror, archival paths): ARCHITECTURE.md §13.
Anti-amnesia. Anything that affects future behavior MUST live in a Canonical doc, an Evidence report, or runtime state (Supabase / data/run-state/ / ledger) — not only in chat memory or one CC session's context. Workers are stateless on input (start from execution_commands.payload + filesystem + Supabase; no implicit parent-session context). Decisions have provenance (canonical-doc edit, sprint-checkpoint entry with rationale, or evidence report). Sprint state is reconstructable: at any point in an autonomous run, a fresh CC session must be able to load the Canonical reading list + active Sprint files + latest checkpoint and continue. Full architecture: ARCHITECTURE.md §17.
Per-feature specs live in docs/specs/. Every operator-visible feature and shared primitive has one Canonical-class spec at docs/specs/<feature>.md (≤ 300 lines) that names its invariants, operator path, typed blockers, and verifying smokes/E2Es. The spec is the bridge between GOALS-PIPELINE.md DoD and scripts/smoke-*.mjs / verify-prod-*.mjs. Cite GOALS items, ARCHITECTURE concepts, and OPERATING-PRINCIPLES rules from inside the spec; do not duplicate them. Full pattern: ARCHITECTURE.md §18.
Tests are graph proofs, not context. The project is organized as a traceability spine: Goals → Capabilities → Module DAG → Operator Journeys → Proofs/tests/evidence → run/debug order. A new smoke-*, e2e-*, or verify-prod-* script MUST cite the spec or operator-journey ID it proves; a new operator-visible scenario MUST live in the journey registry/spec before tests multiply around it. Tests without a spec/journey anchor are orphan evidence and SHOULD be retired or anchored before they are trusted. Target artifact: docs/specs/operator-journeys.md (compact canonical journey list + embedded capability/module index). Until the runner and registry are wired, mark missing links [TARGET] rather than pretending the graph exists.
Documentation graph passes the fresh-Claude bootstrap test. Canonical docs, specs, preflight, and the canonical-store registry are mechanically coherent only if a fresh Claude Code session — given AGENTS.md and nothing else — can (a) follow the read list and preflight, (b) reach the correct task-specific docs by slug, (c) run the named checks, and (d) complete a bounded task without inventing progress / handoff / status files. When a doc/spec/preflight/registry change is made, the same slice fixes any graph break the change exposes (stale slug, dangling reference, drift between fallback map and backfill registry, missing spec index entry) — do not work around it.
Project docs are source of truth; chat memory is not. When answering or changing anything about X Research architecture, acceptance gates, credentials, OAuth/OAuth2, runtime surfaces, operator journeys, product scope, or current invariants, start from the root documentation graph (AGENTS.md → GOALS-PIPELINE.md / ARCHITECTURE.md / this file / AUTONOMOUS-WORK.md → docs/specs/** → docs/reference/**). Do not use OpenClaw memory search, prior chat, or a run report as the primary authority for current behavior. Memory search may help recall that a decision exists, but the answer must be verified against canonical project docs or code. If the root docs and their linked specs/reference docs do not contain the answer, that is a documentation gap: update the canonical docs in the same slice (or mark the gap explicitly BLOCKED if the answer needs Seva's decision) instead of treating memory as enough.
Instruction changes must be behavior-tested, not merely written. When Seva asks to update instructions so future agents behave differently, the change is not complete just because a canonical doc was edited. The supervisor must test the proposed wording against the failure mode it is meant to prevent: launch one or more fresh isolated subagents (or an equivalent clean-context simulation) with the same root/canonical files a future worker or supervisor would receive, give them the problematic worker result or scenario, and check whether they choose the desired action. If the simulated behavior is wrong or ambiguous, iterate the wording and retest until the desired behavior is reproduced. Record the tested rule in the canonical doc and briefly report the simulation result. This is especially mandatory for stop/DONE rules, autonomy/verification rules, safety gates, and cases where a worker says "not found", "not possible", or otherwise invites premature surrender.

These rules exist because the project had begun accumulating checkpoint/report sediment. Sediment is for geology, not software operations.
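
The "tests are graph proofs" rule lends itself to a mechanical check. The sketch below shows one hypothetical way to flag an orphan proof script. It assumes scripts cite their anchor as plain text somewhere in the file (a docs/specs/<feature>.md path or an OJ-NNN journey ID); the function name and that citation convention are assumptions for illustration, not something the repo is known to implement.

```javascript
// Hypothetical sketch: does a smoke/e2e/verify-prod script cite a spec or journey?
// Assumption: the anchor appears as plain text, for example in a header comment.
const SPEC_PATH = /docs\/specs\/[a-z0-9-]+\.md/;
const JOURNEY_ID = /\bOJ-\d{3}\b/;

function findProofAnchor(sourceText) {
  const spec = sourceText.match(SPEC_PATH);
  if (spec) return { anchored: true, kind: 'spec', ref: spec[0] };
  const journey = sourceText.match(JOURNEY_ID);
  if (journey) return { anchored: true, kind: 'journey', ref: journey[0] };
  // No anchor found: orphan evidence, to be retired or anchored before it is trusted.
  return { anchored: false, kind: null, ref: null };
}
```

A check like this only proves that a citation exists; whether the cited spec actually describes what the script proves still needs a reader.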

0a. No False Done

Status legend in this file: [ENFORCED] = rule is wired into code/smokes and fails closed; [PARTIAL] = rule is correct but adoption is incomplete; [TARGET] = rule is desired but the supporting module/check does not yet exist; [DEBT] = a known divergence flagged for consolidation. See ARCHITECTURE.md §0 for the same legend used across canonical docs.

The repeating failure class on this project is "agent reported done; Seva opened the product; the product was not actually ready." These rules make that mechanically harder.

No PASS without operator-path evidence. [ENFORCED] for the supervisor verification rule (AUTONOMOUS-WORK.md); [PARTIAL] for the universal application: not every slice has a runnable operator-path gate yet; supervisors must create one or mark BLOCKED rather than waiving the rule. Before writing PASS on operator-visible work, satisfy the gate in GOALS-PIPELINE.md §3 item 0. "Looks right", "appears to work", "behaves as expected", "verified by smoke", or "Claude says done" are not substitutes for the gate.
succeeded means the mutation occurred. [PARTIAL] — publish path has a noop guard (scripts/execution-command-worker.mjs); mafia/generate/rewrite/rework kinds rely on typed blockers but lack an equivalent guard. See ARCHITECTURE.md §3.1. Worker status='succeeded' is permitted only when the worker's intended external or internal mutation actually happened. Live publish/like/bookmark/reply/quote in mode='real' requires a posted_tweet_id (or external-action equivalent) AND a matching outcome=ok ledger row keyed on idempotency_key. A noop, refusal, or any path that bypassed the external mutation MUST set status='failed' with a typed blocker_type, unless the operator explicitly requested noop. Cockpits MUST NOT render "Published to X" / "Liked on X" / equivalent without the corresponding external id.
Browser verification means actually clicking the operator CTA. [PARTIAL] — scripts/verify-prod-inbox-approve-click-e2e.mjs exemplifies the pattern for the inbox approve flow; analogous click-E2Es do not yet exist for every operator CTA. For click-related bugs, browser verification MUST drive the exact operator gesture (click the real Approve/Publish/Reject/Rewrite button on a real-shaped row) and assert the canonical post-click state. Render-and-filter, DOM-presence-only, or fixture-replay probes are not browser verification; they are pre-flight.
External API errors are typed and operator-actionable. [PARTIAL] — lib/x-api-error.mjs exists and is consulted by the originals-publish path; ~1–2 of 4–5 publish/like/bookmark sites currently route failures through it (Phase 3 D5). Every script/worker that calls an external API (X, OAuth, Supabase) MUST normalize thrown errors through a typed helper before writing the ledger row or returning a result. The normalized record carries http_status, surface_message (≤240 chars), and hint (operator-actionable next step); cockpits render surface_message + hint, never the raw library string. Bare Request failed with code <N> is a contract regression.
"Works" includes load budget. [ENFORCED] for the 17 static budget invariants; [PARTIAL] for byte-level dashboard reconciliation. Functional PASS + UX PASS is NOT a daily-use ready / operationally acceptable claim if the Supabase load budget on the slice's hot path is FAIL or UNKNOWN. A green operator-path E2E + green cockpit shape on a query that ships ~5000 rows per /inbox SSR is the same false-done class as a green smoke on a worker that returned noop: the operator-visible state looks right while the system silently produces the next incident (2026-05-21 exceed_egress_quota restriction). Before claiming PASS on any slice that adds or changes a hot-path query (SSR, polling endpoint, worker poll, or cron read), node scripts/smoke-supabase-egress-budget.mjs MUST exit 0 AND the slice's report MUST cite that exit code or a scripts/supabase-usage-snapshot.mjs snapshot. UNKNOWN load on a hot path is BLOCKED, not PASS. Canonical contract + typed blockers + PASS/PARTIAL/FAIL semantics: docs/specs/supabase-load-budget.md.

These items reinforce, and are reinforced by, the Operator UI Cockpit Contract below. Treat them together.
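
Two of the rules above ("succeeded means the mutation occurred" and "external API errors are typed") reduce to small pure functions. The sketch below is illustrative only: it is not the guard in scripts/execution-command-worker.mjs and not the real lib/x-api-error.mjs. The function names, the input shapes, the blocker values `unrequested_noop` and `missing_mutation_proof`, and the status-to-hint table are all assumptions. Only `status`, `blocker_type`, `posted_tweet_id`, `idempotency_key`, `outcome=ok`, `http_status`, `surface_message`, `hint`, and the 240-character cap come from the rules.

```javascript
// Hypothetical sketch: final worker status for a live action.
function resolveFinalStatus({ mode, noop, operatorRequestedNoop, postedTweetId, idempotencyKey, ledgerRows }) {
  if (noop) {
    // Assumption: an operator-requested noop may report success; any other noop fails.
    return operatorRequestedNoop
      ? { status: 'succeeded' }
      : { status: 'failed', blocker_type: 'unrequested_noop' }; // hypothetical blocker name
  }
  if (mode === 'real') {
    // A real-mode success needs both the external id and a matching ok ledger row.
    const ledgerOk = ledgerRows.some(
      (row) => row.idempotency_key === idempotencyKey && row.outcome === 'ok'
    );
    if (!postedTweetId || !ledgerOk) {
      return { status: 'failed', blocker_type: 'missing_mutation_proof' }; // hypothetical blocker name
    }
  }
  return { status: 'succeeded' };
}

// Assumed table; a real mapping would live in the canonical helper.
const KNOWN_FAILURES = {
  401: { message: 'X rejected the stored credentials.', hint: 'Re-authorize the account, then retry.' },
  429: { message: 'X rate limit reached.', hint: 'Wait for the rate-limit window to reset, then retry.' },
};

// Hypothetical sketch: turn a thrown error into an operator-actionable record.
// Assumption: the thrown error exposes an integer `status` when one is known.
function normalizeExternalError(err) {
  const status = err && Number.isInteger(err.status) ? err.status : null;
  const known = KNOWN_FAILURES[status];
  const message = known
    ? known.message
    : `External API call failed (HTTP ${status === null ? 'unknown' : status}).`;
  return {
    http_status: status,
    surface_message: message.slice(0, 240), // never the raw library string
    hint: known ? known.hint : 'Open the ledger row for this attempt and escalate if it repeats.',
  };
}
```

Both functions are deliberately free of I/O, so a smoke can exercise them with fixtures before any live path is involved.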

0b. Single Canonical Implementation

Status: §0b is target-shape policy. Rules 1–6 below are the desired operating contract. The current repo carries 13 known duplication classes (D1–D13 in data/run-reports/2026-05-08/dreaming-phase3-deeper-root-causes.md) — listed at the end of this section as [DEBT] — that violate rules 1, 4, and 5. Future agents must read each rule with that reality in mind: rules 1 and 5 are [TARGET] for new work; rule 2 is [PARTIAL] overall and [ENFORCED] where docs/specs/x-rendering-fidelity.md covers the surface; rules 3 and 6 are [ENFORCED] going forward via the supervisor verification process; rule 4 is [PARTIAL] because some canonical modules still need to be created.

The other repeating failure class on this project is "the same element was implemented three different ways in three places, the fix landed in one, the bug returned through the others." These rules make that mechanically harder.

One canonical implementation per concept. [TARGET] for new work; [PARTIAL] overall. Each product/UI/runtime concept has exactly one canonical implementation in this repo. Surfaces that need the concept import it; they do not re-implement it. Concepts that fall under this rule include — but are not limited to — tweet/quote-post rendering, status/state vocabulary, daily-cap enforcement, idempotency-key construction, worker-heartbeat shape, Supabase client construction, X-API publish, time formatting, cockpit SSR query, execution-command kind enumeration, and live-action armed-sentinel set. See ARCHITECTURE.md §11 for the canonical module per concept and which rows are currently [ENFORCED] / [PARTIAL] / [TARGET] / [DEBT].
Preserve X presentation semantics. [PARTIAL] overall; [ENFORCED] for surfaces covered by docs/specs/x-rendering-fidelity.md smokes/proofs. Every surface that displays Twitter/X content MUST preserve how that content renders on X: single tweet, thread, quote post, reply, media, metrics, timestamps, and link/quote cards. Storage, analyzer inputs, review UI, publish previews, audit/log views, and post-publish state MUST carry enough structured data to render the same semantic shape the operator will see on X. New or changed surfaces MUST use the canonical rendering primitives named in ARCHITECTURE.md §11.2 and satisfy docs/specs/x-rendering-fidelity.md; they MUST NOT replace quote/reply/media semantics with plain text, raw URLs, or explanatory placeholders except as typed blockers or explicitly marked missing-data states.
No parallel implementations without a migration plan. [ENFORCED] going forward via supervisor verification. A PR that introduces a parallel implementation of an existing concept MUST include an explicit deprecation/migration plan for the duplicate, named in the PR description and tracked to closure. "Temporary" parallel implementations without a closure date are how triplication happens.
Doc-vs-code coherence on contracts. [PARTIAL]. When a canonical doc states a contract value (e.g. "10 bookmarks/day"), the runtime MUST read that value from a single canonical module. Hard-coded copies in individual runners are a contract bug, not local optimization. Current known divergences (see §0b debt note below) are technical debt, not precedent.
Mirror discipline for shared helpers. [TARGET] for new mirrors; [DEBT] for the existing 8 of 10 substantially-diverged pairs. Helpers needed by both scripts/*.mjs (root) and web/app/** (Next routes) live exclusively in one tree. The other tree, if needed, holds a thin documented re-export — never an independently maintained copy. Hand-edited divergent copies are a defect; a substantial-drift mirror pair is a deploy-blocker once flagged. See ARCHITECTURE.md §11.1 for the bundle-isolation rule.
Concept-duplication scan before PASS. [ENFORCED] going forward via supervisor verification. Before claiming PASS on work that touches a concept governed by this section, the supervisor (or a fresh verifier session) checks that the change landed in the canonical module and that no parallel implementation was introduced. See AUTONOMOUS-WORK.md § Supervisor verification → Concept-duplication scan.

Currently known divergences [DEBT] — listed for transparency, NOT precedent. The current divergences (D1–D13: mirror-pair drift, conflicting daily caps, multiple X-API publish paths, hand-rolled date formatting and cockpit SSR queries, the un-shared smoke harness, and triplicated tweet/quote rendering) are catalogued in ARCHITECTURE.md §11.2 (per-concept registry, each row tagged (D1)…(D13)) and §10 item 8, with root-cause evidence in data/run-reports/2026-05-08/dreaming-phase3-deeper-root-causes.md. These are technical debt under sequenced consolidation. New work MUST NOT expand them and SHOULD prefer extending the canonical module where one exists.

0c. Live E2E Test Account & Cleanup Contract

Status: [TARGET] — Seva provisioned the dedicated test account on 2026-05-09; the credentials, runtime wiring, smoke harness, and cleanup helpers do not yet exist. This section documents the contract that the future implementation MUST satisfy. Until the wiring is built and X_E2E_LIVE_ARMED=1 is exported, no script may make live writes against the test account.

The repeating limit on the existing test surface is that it cannot run real E2E against X without (a) burning the production account or (b) leaving residue visible to other users. To enable end-to-end live verification while honouring the operator-path acceptance gate (§0a / GOALS-PIPELINE.md §3 item 0), there is one — and only one — designated live-write target for tests.

0c.1 Designated test account

Handle: @sevaustinovtest — https://x.com/sevaustinovtest. The one and only designated live-write target for automated tests / verification scripts.
Production accounts (Seva's own and any other owned account) are NOT valid live-test targets. Tests that write against any account other than @sevaustinovtest are a contract violation and MUST be refused by the live runner.

0c.2 Safety boundary (the four bullets that stay inline)

Every action visible to other users on @sevaustinovtest MUST be undone within the same test run, before the test reports success — cleanup is part of the test, not a follow-up. The wake-/safety-critical bullets:

@sevaustinovtest is the only live-write target. Before any live write the runner MUST assert that the authenticated user resolves to @sevaustinovtest; if the OAuth token resolves to any other account, refuse with blocker_type='e2e_wrong_account'.
Cleanup runs in finally, not on the happy path. The harness MUST guarantee cleanup on assertion failure, exception, timeout, and process kill.
Failed cleanup is a test FAILURE, not a warning (blocker_type='e2e_cleanup_failed'). Do not write PASS over a dirty test account.
The two arming sentinels MUST NOT both be set on the same process: X_E2E_LIVE_ARMED arms test-account writes; MAFIA_LIVE_ARMED / ORIGINALS_LIVE_ARMED arm production writes (ARCHITECTURE.md §4). A runner exports at most one; co-arming is a P0 violation.

Replies and quotes target only @sevaustinovtest's own posts (a seed post or the canonical persistent test tweet), never any other account. The boundary is mechanical: a production live write touches Seva's audience and is gated by interface approval; a test live write touches @sevaustinovtest, is gated by X_E2E_LIVE_ARMED, and cleans up immediately. No third category exists.
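
The account and arming bullets can be pictured as a single precondition check that runs before any live write. This is a minimal sketch under stated assumptions, since the wiring does not exist yet: the function name, the `env` and `authenticatedHandle` inputs, and the blocker values `e2e_co_armed` and `e2e_not_armed` are hypothetical. Only `e2e_wrong_account`, the three sentinel names, and `X_E2E_LIVE_ARMED=1` come from this section.

```javascript
// Hypothetical sketch of the live-test precondition check.
const TEST_HANDLE = 'sevaustinovtest';
const PRODUCTION_SENTINELS = ['MAFIA_LIVE_ARMED', 'ORIGINALS_LIVE_ARMED'];

function checkLiveTestPreconditions({ env, authenticatedHandle }) {
  const testArmed = env.X_E2E_LIVE_ARMED === '1';
  // Conservative assumption: any non-empty production sentinel counts as armed.
  const productionArmed = PRODUCTION_SENTINELS.some((name) => Boolean(env[name]));

  if (testArmed && productionArmed) {
    return { ok: false, blocker_type: 'e2e_co_armed' }; // hypothetical blocker name
  }
  if (!testArmed) {
    return { ok: false, blocker_type: 'e2e_not_armed' }; // hypothetical blocker name
  }
  const handle = String(authenticatedHandle || '').replace(/^@/, '').toLowerCase();
  if (handle !== TEST_HANDLE) {
    return { ok: false, blocker_type: 'e2e_wrong_account' };
  }
  return { ok: true };
}
```

The co-arming check comes first on purpose: a process with both kinds of sentinel set should refuse before it even looks at which account the token resolves to.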

Full contract (cleanup table, per-action undo handles, deterministic-undo rule, residue-recovery sweep, test markers, ledger discipline, use-cases, and what the section does NOT authorize): docs/specs/live-e2e-test-account.md; architecture summary: ARCHITECTURE.md §12.
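
The "cleanup runs in finally" bullet can be sketched as a small harness. This is a simplified, synchronous illustration, not the future harness: `runLiveTest`, `registerUndo`, and the `e2e_test_failed` blocker value are hypothetical names, and real X calls would be asynchronous. Only `e2e_cleanup_failed` and the rule that a dirty account overrides PASS come from the safety boundary.

```javascript
// Hypothetical sketch: undo handles are registered as actions happen,
// then executed in reverse order inside `finally`.
function runLiveTest(testBody) {
  const undoStack = [];
  let result = { status: 'PASS' };
  try {
    testBody((undo) => undoStack.push(undo));
  } catch (err) {
    result = { status: 'FAIL', blocker_type: 'e2e_test_failed', detail: String(err && err.message) };
  } finally {
    const cleanupErrors = [];
    for (const undo of [...undoStack].reverse()) {
      try {
        undo();
      } catch (err) {
        cleanupErrors.push(String(err && err.message));
      }
    }
    if (cleanupErrors.length > 0) {
      // A dirty test account is a failure even if the assertions passed.
      result = { status: 'FAIL', blocker_type: 'e2e_cleanup_failed', detail: cleanupErrors.join('; ') };
    }
  }
  return result;
}
```

A `finally` block covers assertion failures and exceptions, but it does not run when the process receives SIGKILL, so it cannot satisfy the process-kill requirement by itself; that case presumably falls to the residue-recovery sweep named in the full contract.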

1. Script-First Orchestration

Seva directive (2026-04-28, Telegram #392287):

Everything that can be done by scripts — do with scripts. What can't be done by scripts alone — do with scripts that launch very narrow, fixed-prompt agents, each doing one thing, tested and reliable. All of this wrapped in one master script. Claude Code's job: run the master script, monitor, handle errors. No improvisation.

Hierarchy

CC top-level task
  → master orchestrator script (run-daily-cycle.mjs / run-approved-actions.mjs):
    sequences steps; fail-closed; structured exit code + summary
    → deterministic step scripts (e.g. daily-pipeline.mjs, scripts/check-ledger-consistency.mjs)
    → narrow fixed-prompt agent runners (when LLM judgment is needed; e.g. generate-inbox-items.mjs)

The CC task runs the master script, reads exit codes and logs, reports / escalates — it does NOT invent steps, improvise recovery, or add ad-hoc work.
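
The hierarchy can be sketched as a loop that sequences steps, stops at the first failure, and returns a structured summary. This is a hedged illustration, not the contents of run-daily-cycle.mjs: `runMasterCycle`, the step shape `{ name, run }`, and the summary fields are assumptions, and steps are modeled as in-process functions where the real orchestrator would more likely spawn scripts.

```javascript
// Hypothetical sketch of a fail-closed master orchestrator.
function runMasterCycle(steps, { dryRun = true } = {}) {
  const summary = { dryRun, ok: true, failedStep: null, steps: [] };
  for (const step of steps) {
    let outcome;
    try {
      outcome = step.run({ dryRun });
    } catch (err) {
      outcome = { exitCode: 1, error: String(err && err.message) };
    }
    // A missing or malformed result counts as a failure (fail closed).
    const exitCode = outcome && Number.isInteger(outcome.exitCode) ? outcome.exitCode : 1;
    summary.steps.push({ name: step.name, exitCode });
    if (exitCode !== 0) {
      summary.ok = false;
      summary.failedStep = step.name;
      break; // no improvised recovery: later steps do not run
    }
  }
  return { exitCode: summary.ok ? 0 : 1, summary };
}
```

The orchestrator carries no domain logic: it knows step names and exit codes, nothing about signals, inbox items, or X.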

Rules

Script first. If a step can be a deterministic script, it must be.
Narrow agents behind scripts. If a step needs LLM judgment, wrap it in a script that:
loads a fixed prompt from a versioned file,
passes structured input (JSON/YAML),
captures structured output,
has test fixtures and expected-output checks.
Master script wraps all steps. One command runs the entire daily cycle. Analogous to setup/update-all.sh in SharedWorkspace.
Claude Code only runs and monitors. The top-level CC task:
invokes the master script,
reads exit codes and logs,
reports results or escalates errors,
does NOT invent steps, improvise recovery, or add ad-hoc work.
Every step is idempotent, logged, and fail-closed for external writes.
Every step supports --dry-run.
Fixed prompts are versioned files, not inline strings. Changes to prompts are code changes (reviewed, committed).
One operation, one canonical primitive. Each operator/runtime operation lives in exactly one single-purpose script or library primitive. Composition happens at the master-orchestrator layer with no domain logic. Line count is a smell, not the contract: the contract is one responsibility / one mutation. A script with seven unrelated --mode flags is N defects, not one feature. An operation reused in 2+ orchestrators MUST be its own callable primitive, not duplicated. Full architecture: ARCHITECTURE.md §16.
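
The "narrow agents behind scripts" rule can be sketched as a wrapper that loads a fixed prompt, passes structured input, and validates structured output. This is an assumption-heavy illustration: `runNarrowAgent`, the injected `loadPrompt` and `callModel` functions, the `requiredFields` check, and the blocker values are all hypothetical. Injecting the two functions keeps the wrapper testable against fixtures without a live model call.

```javascript
// Hypothetical sketch of a narrow fixed-prompt agent runner.
function runNarrowAgent({ promptPath, input, requiredFields, loadPrompt, callModel }) {
  const prompt = loadPrompt(promptPath); // fixed prompt text from a versioned file
  const raw = callModel({ prompt, input: JSON.stringify(input) });

  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, blocker_type: 'agent_output_not_json' }; // hypothetical blocker name
  }
  if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) {
    return { ok: false, blocker_type: 'agent_output_not_object' }; // hypothetical blocker name
  }
  const missing = requiredFields.filter((field) => !(field in parsed));
  if (missing.length > 0) {
    return { ok: false, blocker_type: 'agent_output_missing_fields', missing }; // hypothetical
  }
  return { ok: true, output: parsed };
}
```

A fixture test then amounts to calling the wrapper with a stub `callModel` that returns a recorded response and comparing the result to the expected output.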

Consequences for Current Pipeline

| Current state | Target state |
| --- | --- |
| Daily update requires ad-hoc CC session with improvised steps | One master command: `node run-daily-cycle.mjs` |
| Approved-action execution is manual sequence | One command: `node run-approved-actions.mjs --dry-run` then `--live` |
| Browser-assisted steps involve manual browsing | Scripted browser runner with fixed prompts, no manual browsing |
| Agent prompts are inline or improvised per session | Prompts in `prompts/` directory, versioned, with fixtures in `data/fixtures/` |
| Error handling depends on operator reading logs | Master script produces structured exit code + summary JSON |
| CC session invents its own task sequence | CC runs master script, reads result, reports |

1f. Overnight Pipeline Reliability Targets [TARGET]

Before each reliability-hardening session, verify 6 gates: (1) the signal pipeline produces scored signals at the configured time; (2) daily-update step records success and updates the canonical mirror; (3) inbox items have valid field counts; (4) approved-action ledger is consistent; (5) browser-assisted execution has approval gates; (6) recovery procedures exist for partial failures. Source: distilled from the 2026-04-29 overnight plan.

1g. Daily Operator Loop (11:30 PT window) [PARTIAL]

Sequence: (1) verify-handoff.mjs cold-start gate; (2) status-check.mjs signals/mafia freshness; (3) daily-checklist.mjs one-screen view; (4) mafia approve / dry-run / live (armed sentinel + fresh approval + caps not exhausted); (5) original-post intentional blocker (do not force); (6) recovery-report.mjs end-of-day ledger consistency. Live publish gates: armed sentinel + prior dry-run + interface approval + cap not exhausted. Source: distilled from OVERNIGHT-2026-04-29-FINAL-HANDOFF.md (now archived).
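
The four live publish gates can be expressed as one check that reports every failing gate rather than only the first. This is a minimal sketch: the function name, the input fields, and the failure labels are hypothetical, and how each input is actually derived (sentinel lookup, approval freshness, cap accounting) is outside this illustration.

```javascript
// Hypothetical sketch: all four gates must hold before a live publish.
function checkLivePublishGates({ armedSentinel, priorDryRunOk, interfaceApproved, capRemaining }) {
  const failures = [];
  if (!armedSentinel) failures.push('sentinel_not_armed');
  if (!priorDryRunOk) failures.push('no_prior_dry_run');
  if (!interfaceApproved) failures.push('no_interface_approval');
  if (!(capRemaining > 0)) failures.push('cap_exhausted');
  return { allowed: failures.length === 0, failures };
}
```

Writing the cap test as `!(capRemaining > 0)` means a missing or non-numeric cap also fails closed.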

1b. Fixed Action Catalog

The system supports a fixed set of named actions (see docs/specs/action-catalog.md for the full catalog). Actions are product-level primitives — their IDs are stable. Implementation mechanics can be refactored without changing the action set.

The catalog is both a contract and a living discovery tool:

Contract: normal system behavior must be expressed as fixed action IDs, not ad-hoc agent behavior.
Living discovery: when daily use reveals a new recurring case, add it to the catalog before refactoring automation around it.
Refactoring guide: when changing scripts/prompts/runners, preserve the cataloged action semantics and Definition of Done.
Case memory: the purpose is to know all goals, cases, and typical tasks before rebuilding the automation. If a case is unclear, document it as an open question instead of letting an agent improvise.

Two Kinds of Feedback

When Seva provides feedback on a generated item, it is always one of:

Content feedback (about THIS specific post/reply): Triggers regenerate_content or revise_reply. Handled by rerunning the scripted LLM generation path for that item — the narrow agent re-reads the original signal + Seva Voice + feedback notes and produces a revised version. This is a script operation, not a freeform agent task.
Rules/prompt feedback (about HOW items are generated in general): Triggers update_generation_rules, update_voice_guidance, or update_scoring_policy. Handled by Marvin + master Claude Code editing versioned prompt files, testing with fixtures, and committing. This is a system change, not an item regeneration.

These are distinct execution paths. Content feedback never changes system rules. Rules feedback never changes a specific item.
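
The separation of the two paths can be sketched as a dispatcher keyed on the action ID. The action IDs are the ones named above; the function name, the `path` labels, and the `editsPrompts` flag are hypothetical and only illustrate that an item-level action never edits prompts and a rules-level action never carries an item.

```javascript
// Hypothetical sketch: route feedback by action ID into one of two paths.
const CONTENT_ACTIONS = ['regenerate_content', 'revise_reply'];
const RULES_ACTIONS = ['update_generation_rules', 'update_voice_guidance', 'update_scoring_policy'];

function planFeedback({ action, itemId }) {
  if (CONTENT_ACTIONS.includes(action)) {
    if (!itemId) return { ok: false, reason: 'content_feedback_needs_item' };
    return { ok: true, path: 'item_regeneration', itemId, editsPrompts: false };
  }
  if (RULES_ACTIONS.includes(action)) {
    // Any item the feedback was given on is dropped: rules changes are system-wide.
    return { ok: true, path: 'system_change', itemId: null, editsPrompts: true };
  }
  return { ok: false, reason: 'unknown_action' };
}
```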

Execution Path Selection

For public X writes, the planner should choose the safest deterministic path:

Prefer API execution when the relationship/API context allows it (for example follow-each-other, prior engagement, or known API-allowed thread).
Route cold or API-restricted replies/quotes to the scripted browser-assisted runner.
Store enough relationship/capability information to avoid rediscovering the same API restriction repeatedly.
If the route is uncertain, dry-run and surface the uncertainty; do not improvise a manual browser path.
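
The selection rules above can be sketched as a pure function over stored relationship data. The function name, the `relationship` fields, and the route and reason labels are hypothetical; the ordering (API when the context allows it, scripted browser runner for cold or restricted replies/quotes, dry-run when uncertain) follows the list.

```javascript
// Hypothetical sketch of execution-path selection for a public X write.
function chooseExecutionRoute({ actionKind, relationship }) {
  if (!relationship) {
    return { route: 'dry_run', reason: 'relationship_unknown' };
  }
  const apiFriendly = Boolean(
    relationship.mutualFollow || relationship.priorEngagement || relationship.apiAllowedThread
  );
  if (apiFriendly && !relationship.knownApiRestricted) {
    return { route: 'api', reason: 'api_context_allows' };
  }
  if (actionKind === 'reply' || actionKind === 'quote') {
    return {
      route: 'browser_runner',
      reason: relationship.knownApiRestricted ? 'api_restricted' : 'cold_target',
    };
  }
  // Anything else is uncertain: surface it via dry-run, never a manual browser path.
  return { route: 'dry_run', reason: 'route_uncertain' };
}
```

Persisting `knownApiRestricted` after the first refusal is what keeps the planner from rediscovering the same restriction on every run.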

Scope Boundary

This project is the X Research system. The following workflows are out of scope unless Seva explicitly reopens them:

LinkedIn adaptation / cross-posting — separate LinkedIn system.
Demo-video workflow — DEMO-VIDEO-WORKFLOW.md archived 2026-05-09 to docs/archive/out-of-scope/.
Visual-content workflow — VISUAL-CONTENT-WORKFLOW.md archived 2026-05-09 to docs/archive/out-of-scope/.

X Research docs may reference these as out-of-scope context, but MUST NOT treat them as part of the daily X pipeline. Out-of-scope workflow runbooks live under docs/archive/out-of-scope/ for evidence/recovery only; their previous canonical-doc registry entries were removed in Phase 4O. Reopening any of them requires Seva confirmation + canonical-doc registration + sentinel review.

1c. Site as View, Not Source

The X Research site is a rendered view over canonical sources. It must not become a second source of truth.

Rules:

Canonical truth lives in source files/data, not generated HTML:
documents: root canonical Markdown, docs/reference/**, and docs/specs/** in x-archive;
content bases: publishing-repo/content/x-research/*.yaml;
execution history: Supabase ledger_events for runtime reads, with data/canonical-ledger.jsonl as backup append;
runtime decisions: Edge Config overlays, later materialized/synced back to source where appropriate.
Generated output is never hand-edited. Files under output/site/ are build artifacts.
New site sections start with a canonical source. Add or update the source doc/YAML/data first, then render it.
Docs website target [TARGET]: live file-backed viewer. The docs section should load a manifest and file bodies from an authenticated Mac mini docs-source API over the x-archive filesystem. It must show current file contents (including docs with no Supabase row) without requiring agents to run backfill/deploy. Supabase canonical_documents is transitional cache only and can be deleted after this lands.
Templates may contain UI chrome, labels, navigation, empty states, and layout copy. They must not contain canonical product/content truth that should live in docs/YAML/data.
If the site says something important that cannot be traced to a canonical source, treat it as content drift. Fix by moving the truth into the source layer or removing the stale view.

This principle should not prevent helpful presentation. It only prevents the site from quietly becoming a parallel undocumented knowledge base, which is how software starts developing folklore.

1d. Posts as Database Records

Posts, replies, quote-posts, and queue entries are records moving through a lifecycle, not loose cards copied between lists. Each stage refines fields and changes status.

Rules:

One logical record, many projections. Inbox views, queues, calendars, and action logs are projections over canonical records; they must not become independent truths.
Stages refine data. A record may start as a rough signal/post idea, then gain content_lane, draft text, review notes, approval status, scheduled date, execution metadata, and ledger references.
Return-to-stage is a status transition. Moving an item back for editing/review means changing status (for example scheduled or approved -> pending), not creating a new unrelated item.
Derived queues recalculate. If a record leaves a stage (for example scheduled original post returns to pending), dependent projections such as the original-post queue must remove/recompute that entry and preserve invariants.
History is append-only where it matters. Status can change, but review notes, rewrite history, scheduling changes, and execution attempts should remain auditable through logs/ledger/history fields.
Scripts own transitions. Status changes that affect derived state should go through scripts/API actions that update or recompute all affected projections. Manual YAML edits are for emergency repair only.
One unified inbox, filtered by source and status view [TARGET]. Reviewable items — Mafia engagement candidates, original-post drafts, future source classes, recovery cards — live in one operator inbox keyed by canonical record id, surfaced via source filters (mafia, original, future marketers/founders, recovery, …) and target status views (review, scheduled, published, rejected, recovery) rather than separate URLs/components. pending and returned are states inside the default review view, not separate top-level filters; rewrite/rework workers must keep the card anchored in the same review/source view while running. hide/hidden and Other are not target concepts; hide maps to reject-with-reason. New reviewable record classes join the unified inbox by source/tag, NOT by route. The current Mafia lane and original-post queue surfaces are forks pending sequenced consolidation. Full architecture: ARCHITECTURE.md §14.
Scheduling constraints queue, they do not retry. [ENFORCED] for Launch Mafia like/bookmark; [TARGET] for the generalisation across other live-action classes. When an approved action cannot execute right now because of a known time-bounded scheduling constraint (daily cap exhausted, market-hours window, content embargo, etc.), the row MUST be queued for automatic execution after the constraint expires — NOT shown to the operator as a manual-Retry blocker. Daily cap is not a task; humans are not the retry queue. True blockers (creds missing, X 4xx, env not armed, typed approval-gate refusal) remain manual-Retry per the Operator UI Cockpit Contract. The mechanical test: if the only thing that changes the action's eligibility is the clock advancing past eligible_after, it is live_deferred; if anything else needs to change, it is a typed blocker. Full pattern (data.cap_deferred = { reason, eligible_after, …} + final_state='live_deferred' + scheduled executor): ARCHITECTURE.md §6.1. Reason: Seva 2026-05-04.
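
The mechanical test in the scheduling-constraints rule can be sketched as a classifier plus a due check for the scheduled executor. The reason names in `TIME_BOUNDED_REASONS`, the function names, and the `blocked` state label are hypothetical; `live_deferred`, `cap_deferred`, `reason`, and `eligible_after` come from the rule. The full `cap_deferred` shape lives in ARCHITECTURE.md §6.1 and may carry more fields than shown.

```javascript
// Hypothetical sketch: defer when only the clock needs to change; block otherwise.
const TIME_BOUNDED_REASONS = new Set(['daily_cap_exhausted', 'market_hours_window', 'content_embargo']);

function classifyNonExecutable({ reason, eligibleAfter }) {
  if (TIME_BOUNDED_REASONS.has(reason) && eligibleAfter) {
    return {
      final_state: 'live_deferred',
      data: { cap_deferred: { reason, eligible_after: eligibleAfter } },
    };
  }
  // Anything that needs more than the clock advancing stays a typed blocker.
  return { final_state: 'blocked', blocker_type: reason };
}

// The scheduled executor would pick up rows whose eligible_after has passed.
function isDue(row, now) {
  return (
    row.final_state === 'live_deferred' &&
    Date.parse(row.data.cap_deferred.eligible_after) <= now.getTime()
  );
}
```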

This keeps the system boring: records change status; views follow. The alternative is five lists arguing about reality, which we have already explored with appropriate despair.
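
"Records change status; views follow" can be sketched with one transition function and one derived queue. The record fields `id`, `source`, `status`, `scheduled_date`, and `history` are assumed shapes for illustration, and both function names are hypothetical; the statuses and the `original` source label are the ones used in this section.

```javascript
// Hypothetical sketch: a status transition keeps the same record and appends history.
function transitionStatus(record, nextStatus, note) {
  return {
    ...record,
    status: nextStatus,
    history: [...record.history, { from: record.status, to: nextStatus, note }],
  };
}

// The original-post queue is a projection: recomputed from records, never edited by hand.
function originalPostQueue(records) {
  return records
    .filter((r) => r.source === 'original' && r.status === 'scheduled')
    .sort((a, b) => a.scheduled_date.localeCompare(b.scheduled_date))
    .map((r) => r.id);
}

const records = [
  { id: 'rec_2', source: 'original', status: 'scheduled', scheduled_date: '2026-05-12', history: [] },
  { id: 'rec_1', source: 'original', status: 'scheduled', scheduled_date: '2026-05-10', history: [] },
];
const returned = transitionStatus(records[1], 'pending', 'needs edit');
const after = [records[0], returned];
console.log(originalPostQueue(records)); // [ 'rec_1', 'rec_2' ]
console.log(originalPostQueue(after));   // [ 'rec_2' ]
```

Because the queue is recomputed, returning rec_1 to pending removes it from the projection without any second write.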

1e. Operator-Facing Time Is Always SF/PT (12-hour am/pm)

All operator-facing UI timestamps, dates, elapsed labels with calendar context, empty-state copy, reset labels, queue dates, action-log dates, cockpit timestamps, and daily-status labels must render in San Francisco time: America/Los_Angeles, displayed as PT. Time-of-day rendered to operators (scheduled time, picker defaults, reset labels) is 12-hour am/pm ("8:25 PM PT"), not 24-hour ("20:25 PT"). 24-hour and ISO shapes stay in storage/API only.

Rules:

UI display is SF/PT in 12-hour am/pm. If a human sees it in the cockpit, inbox, action log, approvals, mafia surfaces, daily checklist, or site operator views, format it through the operator-time helpers, not raw UTC and not 24-hour for clock displays. The canonical helpers are in web/lib/operator-date.mjs (OPERATOR_TIMEZONE, formatTimestamp, formatTimeOfDay12h, formatOperatorWeekdayDate, todayInTimeZone, operatorTimeOfDayHHMM, operatorDayWindow). Hand-rolled new Date().toISOString(), Intl.DateTimeFormat, .toLocaleString(), or .toLocaleDateString() in web/app/** operator surfaces is a D7 regression (ARCHITECTURE.md §11.2).
Storage may stay UTC / 24-hour / ISO. Database timestamps, ledger events, API payloads, filenames, command IDs, run-report partitions, Supabase query cutoffs, and <input type="time"> values (24-hour by HTML contract) may use ISO / 24-hour for determinism. That is an implementation detail, not operator copy.
Operator-day semantics are SF-day semantics. Daily caps, reset labels, “today”, scheduled-date display, and overdue calculations that affect operator decisions must use the SF operator day (todayInTimeZone()) unless a script explicitly documents a UTC technical partition.
UTC or 24-hour in UI needs a reason. If raw UTC or a 24-hour clock must be shown to an operator (for example debugging a raw ledger row), label it explicitly and, where practical, show the PT 12-hour equivalent next to it.
X-presentation timestamps follow docs/specs/x-rendering-fidelity.md, not this rule. Tweet/quote-card timestamps inside the canonical post renderer mimic X's own format ("4h", "May 14"); they are not operator chrome.

The system can store time like a database. It must talk about time like the operator lives in San Francisco at 8:25 PM, not at 20:25. Disturbing that this has to be written down, but here we are.
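For illustration only, here is roughly what a storage-to-operator conversion involves. The function names below are hypothetical and are not the helpers in web/lib/operator-date.mjs; operator surfaces should keep calling the canonical helpers. The sketch uses `Intl.DateTimeFormat.prototype.formatToParts` so the output does not depend on how a given runtime spaces the AM/PM marker.

```javascript
// Assumption-labelled sketch: NOT the canonical operator-date helpers.
const SKETCH_OPERATOR_TZ = "America/Los_Angeles";

// Collect the named parts of a formatted date into a plain object.
function partsFor(isoUtc, options) {
  const formatter = new Intl.DateTimeFormat("en-US", {
    timeZone: SKETCH_OPERATOR_TZ,
    ...options,
  });
  const out = {};
  for (const part of formatter.formatToParts(new Date(isoUtc))) {
    out[part.type] = part.value;
  }
  return out;
}

// Stored UTC instant -> operator clock copy, 12-hour with a PT label.
function sketchTimeOfDay12h(isoUtc) {
  const p = partsFor(isoUtc, { hour: "numeric", minute: "2-digit", hour12: true });
  return `${p.hour}:${p.minute} ${p.dayPeriod} PT`;
}

// Stored UTC instant -> SF operator day as YYYY-MM-DD.
function sketchOperatorDay(isoUtc) {
  const p = partsFor(isoUtc, { year: "numeric", month: "2-digit", day: "2-digit" });
  return `${p.year}-${p.month}-${p.day}`;
}
```

With these definitions, the stored instant `2026-05-05T03:25:00Z` reads as "8:25 PM PT" on operator day `2026-05-04`: the UTC date has already rolled over, the operator's day has not.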

2. Bounded Autonomous Sessions

Autonomous / overnight Claude Code work follows the bounded deterministic run model. See AUTONOMOUS-WORK.md for the full operating guide. Summary:

State lives in files, not context. Checkpoints in data/run-state/YYYY-MM-DD.json. Per-task progress lives in the checkpoint summary, not a canonical doc. Context compacts; files don't.
Scope follows Operator Workflows. Session prompts start from GOALS-PIPELINE.md §4a and docs/specs/operator-journeys.md: pick the current workflow (OJ-001…OJ-008), name the target proof, and keep every CC slice inside that workflow unless Seva expands scope.
Scope is pre-defined and closed. Session prompt lists tasks and stopping conditions. Unlisted tasks don't happen. Max 10 tasks, max 15 CC runs per session.
Progress is machine-verifiable. Script exit codes, file existence checks, grep patterns — not "it looks done."
Artifacts are contractual and classified. Every session produces one session prompt, one checkpoint JSON/summary, exact-path Evidence reports only when needed, and local commits after meaningful units. CC progress notes, intermediate plans, and raw outputs are Sprint/Run-state/Evidence/Temp per AUTONOMOUS-WORK.md; they must not become new canonical/handoff files.
Recovery is file-based. After crash or compaction: read checkpoint, check git log, resume from next uncompleted task.
Stop-time beats checklist completion. If Seva says “work until HH:MM”, keep launching bounded useful work until that time. Completing the initial task list changes the work mode to hardening/reviewer/integration; it does not create a resting state.
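The recovery and stop-time rules above can be sketched as one decision function over the checkpoint. The checkpoint shape (`tasks[].status`, `cc_runs`), the function name, and the returned action names are assumptions for illustration; the real schema lives with data/run-state/YYYY-MM-DD.json and AUTONOMOUS-WORK.md. Stopping when the run budget is exhausted, even before the stop time, is also an assumption of this sketch.

```javascript
const MAX_TASKS = 10;   // "Max 10 tasks" from the rule above
const MAX_CC_RUNS = 15; // "max 15 CC runs per session" from the rule above

// Hypothetical checkpoint shape: { tasks: [{ id, status }], cc_runs: number }
function nextMove(checkpoint, now, stopAt) {
  if (checkpoint.tasks.length > MAX_TASKS) {
    return { action: "stop", reason: "task_cap_exceeded" };
  }
  if (checkpoint.cc_runs >= MAX_CC_RUNS) {
    return { action: "stop", reason: "cc_run_budget_exhausted" };
  }
  // Recovery is file-based: resume from the first task not marked done.
  const next = checkpoint.tasks.find((task) => task.status !== "done");
  if (next) {
    return { action: "resume", task_id: next.id };
  }
  // Checklist complete but stop time not reached: change mode, do not rest.
  if (stopAt && now < stopAt) {
    return { action: "switch_mode", mode: "hardening" };
  }
  return { action: "stop", reason: "session_complete" };
}
```

A caller would parse the checkpoint file after a crash or compaction, cross-check it against `git log`, and then act on the returned move.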

Evidence basis: The overnight cycle of 2026-04-27→28 ran 42 CC tasks with empty heartbeat state, no compact checkpoint summary, and a production data overwrite. External research points the same way: file-based state, bounded scope, and budget guards recur across Anthropic's own harness guide, the Ralph loop, and production agent postmortems.

3. Reliability Before Features

Seva directive (2026-04-28): no new feature expansion until the daily pipeline is boringly reliable. Before each reliability-hardening session, verify the 6 gates in §1f below.

4. Master Script Contract (Target Design)

Name: run-daily-cycle.mjs (or .sh)

Inputs:

- --dry-run: no external writes
- --step=NAME: run only one step (for debugging)
- --date=YYYY-MM-DD: override date (for backfills)
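A minimal sketch of parsing these three inputs, assuming a hand-written parser with no CLI library. The function name and the choice to reject unknown arguments are assumptions; the target design does not specify either.

```javascript
// Hypothetical argument parser for the three documented flags.
function parseCycleArgs(argv) {
  const opts = { dryRun: false, step: null, date: null };
  for (const arg of argv) {
    if (arg === "--dry-run") {
      opts.dryRun = true;
    } else if (arg.startsWith("--step=")) {
      opts.step = arg.slice("--step=".length);
    } else if (arg.startsWith("--date=")) {
      const value = arg.slice("--date=".length);
      // Shape check only; calendar validity is left to the caller.
      if (!/^\d{4}-\d{2}-\d{2}$/.test(value)) {
        throw new Error(`--date must be YYYY-MM-DD, got "${value}"`);
      }
      opts.date = value;
    } else {
      // Assumption: unknown flags are refused rather than ignored.
      throw new Error(`Unknown argument: ${arg}`);
    }
  }
  return opts;
}
```

In a Node script this would typically be called as `parseCycleArgs(process.argv.slice(2))`.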

Steps (in order):

| # | Step | Script | Agent? |
| --- | --- | --- | --- |
| 1 | Preflight checks | `preflight-check.mjs` | No |
| 2 | Fetch & score signals | `daily-pipeline.mjs` | No |
| 3 | Generate inbox items | `generate-inbox-items.mjs` | Yes (fixed prompt from `prompts/generate-inbox.md`) |
| 4 | Health check: signals | `check-signals-health.mjs` | No |
| 5 | Health check: ledger | `scripts/check-ledger-consistency.mjs` | No |
| 6 | Build site | `build-and-deploy.sh --build-only` | No |
| 7 | Deploy (if not dry-run) | `build-and-deploy.sh --deploy` | No |
| 8 | Final report | `generate-run-report.mjs` | No |

Outputs:

- Exit code: 0 = all steps passed, 1 = step failure, 2 = wrapper error
- data/run-reports/YYYY-MM-DD.json: structured summary of each step's result
- Logs: ~/.openclaw/logs/daily-cycle-YYYY-MM-DD.log

Error handling:

- Each step failure is logged and reported but does not crash the master script (fail-forward for read-only steps, fail-closed for write steps).
- Final report always runs, even after failures.
- CC reads the report JSON and surfaces issues; it does not improvise fixes.
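One way to read the error-handling contract is sketched below. It is not run-daily-cycle.mjs: step runners are injected as plain functions so the sketch never shells out, and the `kind: "read" | "write"` labels, the result shape, and the interpretation that any earlier failure closes later write steps are all assumptions.

```javascript
// Hypothetical step shape: { name, kind: "read" | "write", run: () => void }
function runCycle(steps, { dryRun = false } = {}) {
  const results = [];
  let earlierFailure = false;
  for (const step of steps) {
    let skipReason = null;
    if (step.kind === "write" && dryRun) skipReason = "dry_run";
    else if (step.kind === "write" && earlierFailure) skipReason = "earlier_failure";
    if (skipReason) {
      // Fail closed: write steps do not run in dry-run or after a failure.
      results.push({ step: step.name, status: "skipped", reason: skipReason });
      continue;
    }
    try {
      step.run();
      results.push({ step: step.name, status: "passed" });
    } catch (err) {
      // Fail forward: record the failure and keep going with later steps.
      earlierFailure = true;
      results.push({ step: step.name, status: "failed", error: String(err?.message ?? err) });
    }
  }
  const exitCode = results.some((r) => r.status === "failed") ? 1 : 0;
  return { exitCode, results };
}

// Wrapper: anything that escapes runCycle itself maps to exit code 2.
function runCycleSafely(steps, opts) {
  try {
    return runCycle(steps, opts);
  } catch (err) {
    return { exitCode: 2, results: [], wrapperError: String(err?.message ?? err) };
  }
}
```

Because `runCycle` always returns the full `results` array, a caller can write the final report from it regardless of how many steps failed, which is the property the contract asks for.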

Approved-Actions Execution Contract (Target Design)

Name: run-approved-actions.mjs

Steps:

| # | Step | Script |
| --- | --- | --- |
| 1 | Load approved items from inbox | `load-approved-actions.mjs` |
| 2 | Preflight: API auth, rate limits | `preflight-check.mjs --actions` |
| 3 | Execute (dry-run or live) | `execute-x-actions.js` / `browser-assisted-runner.mjs` |
| 4 | Ledger consistency check | `scripts/check-ledger-consistency.mjs` |
| 5 | Report | `generate-execution-report.mjs` |

Same --dry-run / structured output / fail-closed contract as above.

Interface Approval Boundary

For now, live X actions may execute only when Seva has explicitly confirmed the exact action/plan in the product interface/UI approval flow. A chat message, even if it expresses intent, is not sufficient approval for live posting, liking, bookmarking, replying, or quoting. Chat instructions can authorize engineering work, dry-runs, diagnostics, and preparation of an execution plan, but the final live action gate must be backed by interface approval metadata. This boundary prevents Marvin or child agents from treating conversational intent as an execution click.

Read-only testing authorization. Seva explicitly authorizes read-only testing and evidence capture by default, including using a logged-in browser session to inspect pages and capture screenshots, as long as the work does not click or trigger mutating actions (post, like, bookmark, reply, quote, follow, delete, approve, publish, or equivalent). Read-only browser work may open/navigate pages, inspect DOM, and save screenshots for verification without a separate per-run approval. If an action could mutate external state or expose private/auth data beyond local evidence artifacts, stop and ask.

Execution Ledger and Action Log Contract

Live actions must be script-owned end to end. If a script posts, likes, bookmarks, replies, quotes, or otherwise mutates X, the same script path must write the canonical execution event immediately. No agent should perform a separate manual “mark as done” step after execution. Manual reconciliation is allowed only as incident repair, and must itself be logged.

The Action Log is a projection of the canonical execution ledger, not a separate truth source. There must be one canonical ledger source for automated X actions, or a deterministic sync that makes the site projection current. A successful script execution that does not appear in the Action Log is a system bug.

Live execution gates must be backed by product/interface approval metadata. Chat messages can authorize engineering work, diagnostics, dry-runs, and plan preparation, but they are not execution approval for live X actions. Live scripts must fail closed unless the plan/action carries interface approval metadata for the exact action set being executed.
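A fail-closed gate of this kind might look like the sketch below. The metadata shape (`approval.source`, `approval.action_ids`), the function name, and the reason strings are hypothetical; the real approval metadata contract is not defined in this section. The sketch takes "exact action set" literally and requires the approved ids and the requested ids to be the same set.

```javascript
// Hypothetical plan shape:
// { actions: [{ id }], approval?: { source: "interface" | "chat", action_ids: string[] } }
function checkInterfaceApproval(plan) {
  const approval = plan.approval;
  if (!approval || approval.source !== "interface") {
    // Chat intent, or no metadata at all, never opens the gate.
    return { allowed: false, reason: "missing_interface_approval" };
  }
  const approved = new Set(approval.action_ids ?? []);
  const requested = new Set(plan.actions.map((action) => action.id));
  const sameSet =
    approved.size === requested.size &&
    [...requested].every((id) => approved.has(id));
  if (!sameSet) {
    return { allowed: false, reason: "approval_does_not_match_action_set" };
  }
  return { allowed: true, reason: null };
}
```

A live script would call this before any mutating request and exit without executing whenever `allowed` is false; dry-runs and diagnostics do not need the gate.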

Daily/reliability checks should include the chain:

script executes or dry-runs the action;
canonical ledger row is written with attempt/outcome;
Action Log projection can render the row;
failures and blockers surface without an agent manually copying state between systems.

If any link in this chain requires Marvin or Claude Code to manually copy/annotate state, the automation is incomplete.
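The first two links of the chain can be sketched as a single wrapper that owns both the action and its ledger write. The function name, the event fields, and the injected `perform` / `appendLedgerEvent` callbacks are assumptions for illustration; the actual event contract is in docs/specs/hot-write-ledger.md.

```javascript
// Hypothetical wrapper: the same code path that performs the action writes
// the ledger event, for success, failure, and dry-run alike.
function executeWithLedger(action, { perform, appendLedgerEvent, dryRun = false, now = () => new Date() }) {
  const event = {
    action_id: action.id,
    kind: action.kind,
    attempted_at: now().toISOString(),
    outcome: null,
  };
  if (dryRun) {
    event.outcome = "dry_run";
  } else {
    try {
      event.result = perform(action);
      event.outcome = "succeeded";
    } catch (err) {
      event.outcome = "failed";
      event.error = String(err?.message ?? err);
    }
  }
  // Written immediately, with no separate "mark as done" step.
  // If this throws, the error propagates: a missing ledger row should be loud.
  appendLedgerEvent(event);
  return event;
}
```

Failures are recorded as events rather than swallowed, so the Action Log projection can render a failed attempt without anyone copying state by hand.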

Operator UI Cockpit Contract

Operator UI is a cockpit, not a log viewer. The operator is a pilot reading instruments, not an analyst scrolling a feed. Every operator surface (/inbox, /mafia/**, /action-log, /runs, /approvals) follows the same cockpit contract. Canonical reference: docs/reference/operator/OPERATOR-UX-COCKPIT-PLAN.md.

Non-negotiables

Fixed state-machine vocabulary. Every async action exposes states drawn from the lifecycle set: idle / submitting / queued / running / succeeded / failed / blocked / cancelled. The cockpit also derives a stuck warning over the running state when the worker heartbeat is older than the stale threshold (≥120s); stuck is not a terminal command status — the underlying command remains running and polling continues. cancelled IS a terminal command status, surfaced when an operator or scheduler explicitly cancels. Colors and copy are owned by web/app/components/StateBadge.jsx and web/app/components/WorkerStatusPanel.jsx (which exposes data-worker-status for the lifecycle state and data-worker-stuck as a separate '0' | '1' warning marker). Do not fork per surface.
Connection liveness ≠ progress liveness. A green chip is not proof of life. The UI must render two distinct signals: connection (the page is reaching the API) and progress (the worker emitted a heartbeat recently). Stale heartbeat → "Possibly stuck" warning, NOT a failure.
Long workers MUST emit cockpit telemetry. Anything that can outlive ~30 seconds (Claude/LLM, publish, browser-assisted, retry-all) PATCHes execution_commands.result every ~20s with the canonical heartbeat shape: stage, heartbeat_at (ISO-8601), heartbeat_message (≤240 chars), latest_action (≤120 chars), last_output_excerpt (≤600 chars, redacted), token_usage ({input, output} or null), token_usage_status (available | unavailable | not_applicable), worker_pid, worker_id, started_at (ISO-8601), and no_live_x_actions. Canonical builder + full field semantics: docs/specs/cockpit-contract.md (item 3).

Token counts that are not available (e.g. claude --print without stream-json) MUST surface as token_usage: null with token_usage_status: "unavailable" and a human-readable reason — never silently omitted.

Rich-heartbeat envelope. Wired workers additionally emit rendered_status (one-line operator glance) + activity envelope (schema_version:1). Two-cadence model (1s tick / 5s DB-coalesce with stage/token/tool/terminal bypasses). Canonical sources: lib/worker-status-line.mjs, lib/worker-activity-state.mjs, lib/worker-heartbeat-loop.mjs, lib/worker-rich-heartbeat-helpers.mjs. Hand-rolled per-worker forks are a typed blocker (cockpit_status_line_handrolled / cockpit_activity_envelope_handrolled). Full wiring history, per-worker adoption status, and S1–S5 change log: docs/specs/cockpit-contract.md §2 item 10 + §7.

Latest action and excerpts come from real worker output. Workers must derive latest_action and last_output_excerpt from actual stdout/stderr (e.g. the tail of a teed log), not invented copy. Capped at ≤120 / ≤600 chars and redacted of obvious secrets before write.
Terminal state always renders with next action inline. Every terminal lifecycle render (succeeded / failed / blocked / cancelled) AND every derived running + stuck render carries the next operator action concretely (Refresh row · Open log · Retry · Cancel · Resolve · Re-launch). "Refresh page" is acceptable only as a last-resort fallback. The page-level SSR query that feeds the cockpit MUST include rows in pending | running | failed | cancelled for the recent operator window (≥24h). Filtering on pending | running only is a regression: a terminal failure that disappears on next refresh violates "next action inline" because the operator cannot see that the failure exists.
Empty states explain why and what next. Acceptable: "Inbox is empty — daily generator runs at 06:00 SF; the next batch lands here automatically. To force a regen, run …". Not acceptable: "No items match this filter."
Global active-jobs view. /action-log carries an "Active Jobs" section (web/app/components/ActiveJobsPanel.jsx) that lists every execution_commands row in pending or running with kind, worker, heartbeat age, latest_action, started_at, elapsed, and a link back to the source row. The operator must never have to hunt a card to know what is in flight.
No-live-X assertion is a cockpit telemetry signal. When a worker is provably non-publishing (rewrite, dry-run, browser dry-run) it stamps no_live_x_actions: true on execution_commands.result. The cockpit reflects this via the container data-worker-no-live-x attribute (consumed by smokes + audit). No operator-facing copy is rendered for this signal: the card title already names the work. An earlier "this worker is provably non-publishing" line was removed on 2026-05-28 because publish workers can stamp the same flag on noop/failed terminal payloads, where that line was actively misleading.
State transitions must refresh authoritative UI state. After any successful operator transition (approve/reject/return/rewrite/queue/publish), the visible card, counters, filters, and available CTAs must update from the canonical server state or explicitly show a blocking refresh error — no zombie cards, stale buttons, or “only Force Reload fixes it” states. hide is not a target transition; use reject-with-reason. Operator-surface SSR fetches MUST opt out of the data cache ({ revalidate: 0 } / cache: 'no-store'); a router.refresh() after an operator transition is only authoritative if the source query is uncached, otherwise prop-sync clobbers optimistic state with stale data. When an operator action triggers an asynchronous worker, the visible card MUST remain in the lane the operator was on until the underlying execution_commands row reaches a terminal state, regardless of any optimistic status flip; local cockpit state MUST be hydratable from server props on every render so a parent re-render or React-key remount cannot lose command_id.
Substantive cockpit-shape gate (not optional taste). Absence of broken states (auth_error, unavailable, placeholder, stuck, failed) is preflight, NOT acceptance. Before any operator-visible slice that touches /inbox, /queue, /mafia/**, /action-log, /runs, or /approvals may be claimed daily-use ready, the supervisor MUST also run the substantive cockpit gate (docs/specs/inbox-cockpit-substantive-gate.md). The gate emits PASS / PARTIAL / FAIL plus a checked / not-checked list. The slice may claim PASS only when the substantive gate is PASS. The gate is mechanical because human-glanceability is mechanical — the same card the operator opens in 3 seconds is what the gate measures. Why this rule exists: 2026-05-18 inbox audit (data/run-reports/2026-05-18/reliability-stabilization/inbox-ux-principles-audit.md) found a card that was technically green and product-incoherent at the same time. Green smokes / build / deploy do not substitute for cockpit shape.

Rule. The operator's primary card body must read like a cockpit instrument, not a debug dump: no tutorial / implementation-narrating copy, no raw 19-digit tweet ids or long row ids in visible content, no dead disabled author controls, no duplicate labels for the same fact, no system/credential blockers mixed into X-presentation content (they belong on the health strip — reinforces §0a item 4 and §0b rule 2), no competing equally-weighted primary CTAs, and no tab/lane taxonomy that contradicts docs/specs/unified-inbox.md. The full banned-pattern list and the required-evidence list (production screenshots at desktop+mobile widths, the DOM smoke selector assertions, and the printed checked / not-checked lists) are enumerated in docs/specs/inbox-cockpit-substantive-gate.md.

Claim semantics. The gate emits PASS / PARTIAL / FAIL: substantive PASS = "daily-use operationally acceptable"; PARTIAL = "operator surface technically green; cockpit shape NOT YET verified — known gaps: …"; FAIL = "operator surface NOT acceptable — escalate before next operator session." A slice may claim PASS only when the substantive gate is PASS.

Why this contract exists: Seva 2026-05-06 + 2026-05-18. The cockpit is not a database admin tool. Items 1–9 prevent the bug class "system says alive when it isn't"; item 10 prevents the bug class "system says healthy when the card the operator opens is shaped like a debug dump."
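Two of the non-negotiables above lend themselves to a short sketch: deriving the stuck warning over a running command, and keeping unavailable token counts explicit. This is not the canonical builder referenced in docs/specs/cockpit-contract.md. The function names, the fallback to `started_at`, and the `token_usage_reason` field name are assumptions; the 120-second threshold, the `'0' | '1'` marker, and the `token_usage` / `token_usage_status` fields come from the contract text. The `not_applicable` status is left out for brevity.

```javascript
const STALE_HEARTBEAT_MS = 120_000; // stale threshold (>=120s) from the contract

// Derive the two cockpit markers without changing the command's status.
function deriveWorkerView(command, now = new Date()) {
  const lastSignal = command.result?.heartbeat_at ?? command.result?.started_at ?? null;
  const ageMs = lastSignal ? now.getTime() - new Date(lastSignal).getTime() : 0;
  // "stuck" is a warning over running, never a terminal status of its own.
  const stuck = command.status === "running" && ageMs >= STALE_HEARTBEAT_MS;
  return { workerStatus: command.status, workerStuck: stuck ? "1" : "0" };
}

// Token counts are either reported or explicitly unavailable, never omitted.
function normalizeTokenUsage(usage, unavailableReason) {
  if (usage && Number.isFinite(usage.input) && Number.isFinite(usage.output)) {
    return {
      token_usage: { input: usage.input, output: usage.output },
      token_usage_status: "available",
    };
  }
  return {
    token_usage: null,
    token_usage_status: "unavailable",
    token_usage_reason: unavailableReason ?? "worker did not report usage",
  };
}
```

Note that a stale heartbeat leaves `workerStatus` at `running`: polling continues, and the operator sees a warning rather than a false failure.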

Roles and Operating Model

Marvin is the delivery owner, supervisor, and context keeper. Claude Code is the default implementation worker for project changes and fixes.

Marvin's job on each task:

1. Frame the task with full intention chain (see AUTONOMOUS-WORK.md § Child Agent Intention Chain).
2. Launch Claude Code to implement.
3. Supervise wakes/results.
4. Verify independently (run checks, read diffs; don't trust self-reported "done").
5. Decide: accept, relaunch with corrections, or escalate to Seva.

Marvin should not do implementation directly by default. The reason: implementation consumes main-session context that is better reserved for project overview, cross-task judgment, and supervision continuity. Losing that context to a large implementation pass degrades the supervisor's ability to catch drift, prioritize, and hand off cleanly.

Exceptions where Marvin may implement directly:

- Tiny, safe inspection or verification (reading a file, running a check).
- Emergency repair when launching a new CC session would cost more time than the fix.
- Explicit instruction from Seva.

All roles are subject to the Interface Approval Boundary above: no live X/social actions without explicit product/UI approval and existing fail-closed gates.
