kircérta // 2026 // codex-driven build

Agentic Network. Build System. Run Stable.

SSOT-FIRST BACKEND • SSE OBSERVABILITY • WEBHOOK INPUT • PROTOCOLIZED AGENTS • AUDITABLE FAILURES

Each phase has its own section; every copyable item is placed in a code block with one-click copy.

Overview_

MODULAR BUILD / COPY-FIRST
right-side links intentionally omitted
0) Overview & Preparation

SSOT FIRST. BACKEND AS TOOL.

You have verified that OpenClaw runs. Suggested setup order: first make SSOT read/write and a stable status view work → then add SSE observability → then add Webhook input → finally integrate real agents and run stability tests.

Preparation Checklist
  • SSOT repository or directory: /ssot (task JSON files)
  • FastAPI: Python 3.11+ / uvicorn / pydantic
  • Dashboard: a static page is sufficient (upgrade later)
  • Secrets: JWT secret / webhook secret
  • Audit: JSONL log file (events.jsonl)
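For reference, appending to the JSONL audit file can look like the sketch below; the event fields shown are illustrative, not mandated by this checklist:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

AUDIT_PATH = Path("./audit/events.jsonl")  # location from the checklist

def append_audit_event(event_type: str, payload: dict) -> dict:
    """Append one event as a single JSON object per line (JSONL)."""
    event = {
        "event_type": event_type,
        "ts": datetime.now(timezone.utc).isoformat(),
        **payload,  # illustrative extra fields, e.g. chat_id / task_id
    }
    AUDIT_PATH.parent.mkdir(parents=True, exist_ok=True)
    with AUDIT_PATH.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event, ensure_ascii=False) + "\n")
    return event

ev = append_audit_event("telegram_message_received", {"chat_id": 1, "message_id": 2})
lines = AUDIT_PATH.read_text(encoding="utf-8").strip().splitlines()
```

Append-only JSONL keeps every event retrievable with plain `grep`/`jq`, which is what "audit can be retrieved" in Module 5 relies on.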
Phase Deliverables (Must Be Verifiable)
  • Module 1: API can read/write SSOT and returns stable JSON
  • Module 2: SSE + automatic UI updates
  • Module 3: Trusted Webhook input + idempotency
  • Module 4: Auditable ACK echo / verdict
  • Module 5: Load tests, alerts, and blocking mitigation
Entry Link

Codex Execution & Acceptance

Open the full package of execution prompts and quantified acceptance criteria for function-by-function delivery and verification.

1) Module 1

MINIMUM LOOP (NO REAL AGENTS)

Goal: deterministically read/write SSOT on the backend and provide a consistent task status view. Use local ./ssot to simulate GitHub first.

Scope
  • GET /api/tasks
  • GET /api/tasks/{task_id}
  • POST /api/tasks/{task_id}/approve (JWT)
  • POST /api/tasks/{task_id}/reject (JWT)
Acceptance
  • For the same task, the detail output fields are stable across 10 consecutive requests
  • After approve/reject writes back, the verdict is immediately readable
  • At least 3 pytest test cases pass
copy to codex — module 1
You are the Codex (executor). Please implement Module 1 (minimum closed loop).
Constraints: SSOT is the only source of truth; the backend only reads, writes, and broadcasts; put no "intelligent judgment" in the backend.
Delivery: Runnable FastAPI project + tests + README.

Task:
1) Create a minimal FastAPI project (Python 3.11+), supporting:
   - GET /api/tasks
   - GET /api/tasks/{task_id}
   - POST /api/tasks/{task_id}/approve (JWT)
   - POST /api/tasks/{task_id}/reject (JWT)
2) SSOT: Use local ./ssot/ simulation (one JSON per task). Writes must be atomic (tmp + rename).
3) task_engine: Generate status fields (pending/needs_review/approved/rejected) based on task JSON.
4) pytest: at least 3 test cases (list/detail/approve/reject).
5) README: startup instructions, sample tasks, how to run the tests, and acceptance steps.

Acceptance:
- Detail output fields are identical across 10 consecutive requests.
- Written-back results are readable immediately after approve/reject.
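The atomic write required in step 2 (tmp + rename) can be sketched as follows; paths and names are illustrative:

```python
import json
import os
import tempfile
from pathlib import Path

SSOT_DIR = Path("./ssot")  # local SSOT simulation, as in the prompt

def write_task_atomic(task_id: str, task: dict) -> Path:
    """Write task JSON atomically: write a temp file, then os.replace().

    os.replace() is an atomic rename on POSIX (and on Windows for
    same-volume paths), so readers never see a half-written file.
    """
    SSOT_DIR.mkdir(parents=True, exist_ok=True)
    final_path = SSOT_DIR / f"{task_id}.json"
    fd, tmp_path = tempfile.mkstemp(dir=SSOT_DIR, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(task, f, ensure_ascii=False, indent=2)
            f.flush()
            os.fsync(f.fileno())  # make sure bytes hit disk before rename
        os.replace(tmp_path, final_path)  # the atomic step
    finally:
        if os.path.exists(tmp_path):  # only on failure before the rename
            os.remove(tmp_path)
    return final_path

p = write_task_atomic("demo-001", {"task_id": "demo-001", "status": "pending"})
```

The tmp file is created inside the SSOT directory on purpose: `os.replace()` is only atomic within one filesystem.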
2) Module 2

SSE + DASHBOARD (OBSERVABILITY)

Goal: SSOT change → backend detects → SSE pushes → frontend updates without refresh. Start with a minimal static page.

Scope
  • GET /api/events (SSE)
  • Connection pool + broadcast (triggered by approve/reject)
  • Dashboard: list + detail + SSE auto-refresh
Acceptance
  • < 1s frontend update after approve/reject
  • All 3 browser windows receive the broadcast
copy to codex — module 2
You are the Codex (executor). Please implement Module 2 (SSE + Dashboard).

Task:
1) Add SSE to the backend: GET /api/events.
- Maintain connection pool; broadcast when approve/reject occurs.
2) Add a static dashboard (/static or a separate directory), including:
- Task list + task details
- Subscribe to SSE auto-refresh
3) Provide a verification method: 3 browser windows receive the broadcast simultaneously.

Acceptance:
- Frontend updates within 1 second after approve/reject.
- All 3 connections receive the broadcast.
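One way to satisfy the connection-pool + broadcast requirement is a per-client asyncio.Queue. The sketch below is framework-independent (in a real FastAPI app each queue would feed one `/api/events` streaming response); the class and function names are this sketch's own:

```python
import asyncio
import json

class Broadcaster:
    """Connection pool for SSE: each subscribed client owns a queue."""

    def __init__(self) -> None:
        self._queues: set[asyncio.Queue] = set()

    def subscribe(self) -> asyncio.Queue:
        q: asyncio.Queue = asyncio.Queue()
        self._queues.add(q)
        return q

    def unsubscribe(self, q: asyncio.Queue) -> None:
        self._queues.discard(q)

    def broadcast(self, event: dict) -> None:
        # SSE wire format: "data: <payload>\n\n"
        frame = f"data: {json.dumps(event)}\n\n"
        for q in self._queues:
            q.put_nowait(frame)

async def demo() -> list[str]:
    bus = Broadcaster()
    clients = [bus.subscribe() for _ in range(3)]  # the "3 windows" check
    bus.broadcast({"type": "task_update", "task_id": "demo-001"})
    return [await asyncio.wait_for(q.get(), timeout=1) for q in clients]

frames = asyncio.run(demo())
```

A queue per client means one slow consumer cannot delay the others, which matters for the Module 5 soak test.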
3) Module 3

WEBHOOK INPUT (TRUSTED EVENTS)

Goal: external events (GitHub / Xcode) enter the system and trigger refresh + broadcast. Must include signature verification + idempotent handling + audit records.

Scope
  • POST /webhooks/github: verify HMAC-SHA256 signature
  • POST /webhooks/xcode: parse build status
  • POST /webhooks/telegram (or polling ingest): inbound Telegram messages → create SSOT tasks (structure only; no policy decisions)
  • Audit: ./audit/events.jsonl (JSONL)
  • After writing, trigger task_engine refresh + SSE broadcast
Acceptance
  • Invalid signatures must be rejected and must not write SSOT
  • Replaying the same event_id 10 times does not corrupt state
  • Signature-failure flooding (50 rps) does not affect the main path
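GitHub signs webhook deliveries with HMAC-SHA256 over the raw request body (the X-Hub-Signature-256 header). A minimal verification sketch, with a placeholder secret:

```python
import hashlib
import hmac

WEBHOOK_SECRET = b"change-me"  # placeholder; load from env/secret store in practice

def verify_github_signature(raw_body: bytes, signature_header: str) -> bool:
    """Check GitHub's X-Hub-Signature-256 header against the raw body.

    GitHub sends "sha256=<hexdigest>". Compare with hmac.compare_digest
    (constant-time) rather than ==, to avoid timing attacks.
    """
    expected = "sha256=" + hmac.new(WEBHOOK_SECRET, raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header or "")

body = b'{"action": "opened"}'
good = "sha256=" + hmac.new(WEBHOOK_SECRET, body, hashlib.sha256).hexdigest()
```

Verify against the raw bytes before any JSON parsing: re-serializing a parsed body changes whitespace and key order and breaks the digest.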
copy to codex — module 3
You are the Codex (executor). Please implement Module 3 (Webhook + Telegram Ingest).

Task:
1) POST /webhooks/github: Verify HMAC-SHA256 (header + raw body).
2) POST /webhooks/xcode: parse build status (store the payload as-is for now).
3) Audit log: ./audit/events.jsonl (one JSON object per line).
4) After the webhook is written, task_engine is triggered to refresh and broadcast SSE.
5) Telegram inbound (choose one of the two; prefer polling so a local setup can run overnight without public ingress):
- A) POST /webhooks/telegram: Telegram webhook endpoint (can be added later)
- B) polling_ingest: poll Telegram getUpdates and POST new messages to the backend /webhooks/telegram (recommended first)
6) Telegram message → SSOT task mapping (written to ./ssot/{task_id}.json) must contain:
- task_id (stable id), source="telegram"
- chat_id, message_id, request_text (original text)
- created_at (ISO8601), correlation_id (for correlating audit/tool/agent records)
- status="pending"
7) Idempotent deduplication (required): the same (chat_id, message_id) creates at most one task
- On duplicate delivery: reject the duplicate creation, but write audit event telegram_dedup_hit

Acceptance:
- Bad signatures are rejected 100% of the time and never write SSOT.
- Replaying the same event_id 10 times does not change the final state (idempotent).
- Delivering the same Telegram message_id 10 times creates exactly 1 task (telegram_dedup_pass=true).
- Within 2 seconds of a Telegram inbound: the new task appears in SSOT, the audit record telegram_message_received is written, and SSE pushes task_update.
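The idempotent deduplication in step 7 can be sketched by deriving a stable task_id from (chat_id, message_id), so replays map to the same file name and cannot create a second task. The in-memory dict stands in for ./ssot/, and all names here are illustrative:

```python
import hashlib

created: dict[str, dict] = {}  # stands in for ./ssot/ in this sketch

def telegram_task_id(chat_id: int, message_id: int) -> str:
    """Stable id: the same (chat_id, message_id) always hashes the same."""
    digest = hashlib.sha256(f"{chat_id}:{message_id}".encode()).hexdigest()[:12]
    return f"tg-{digest}"

def ingest(chat_id: int, message_id: int, text: str) -> tuple[str, bool]:
    """Return (task_id, created_now); duplicate deliveries are rejected."""
    task_id = telegram_task_id(chat_id, message_id)
    if task_id in created:
        # here the real system would write audit event telegram_dedup_hit
        return task_id, False
    created[task_id] = {
        "task_id": task_id,
        "source": "telegram",
        "chat_id": chat_id,
        "message_id": message_id,
        "request_text": text,
        "status": "pending",
    }
    return task_id, True

# simulate the acceptance check: the same message delivered 10 times
results = [ingest(42, 7, "deploy please") for _ in range(10)]
```

With atomic tmp+rename writes from Module 1, "file already exists" doubles as the dedup check even across process restarts.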
4) Module 4

PROTOCOLIZED AGENTS (AUDIT-FIRST)

Goal: first implement Executor/Reviewer by writing protocol outputs into SSOT (ACK echo / verdict / issues), then replace with real execution via OpenClaw / Codex.

Key Mechanisms
  • Executor: ACK echo (repeat acceptance_criteria + scope item by item)
  • Reviewer: issues must include criterion_ref; verdict must be attributable
  • Fidelity audit: re-dispatch must forward issues verbatim
  • Timeouts: ACK timeout / reject lock (can be recorded in audit first)
Acceptance
  • ACK echo matches acceptance_criteria (mismatch blocks execution)
  • Each issue can be traced to corresponding actions in work_log
  • Timeout/lock paths can be triggered and are recorded
copy to codex — module 4
You are the Codex (executor). Please implement Module 4 (protocolized Executor/Reviewer).

Task:
1) Write two scripts under scripts/validation/:
- fake_executor.py: read task_dispatch → write ACK echo → write task_result (including work_log + diff_snapshot).
- fake_reviewer.py: read task_result → write verdict (approve/reject) + issues (each issue must contain criterion_ref).
2) Add "timeout simulation": do not write ACK, verify that the system can record ack_timeout (just write audit log first).
3) Add a "fidelity audit" checking function/script:
- The issues in the re-dispatch must be carried over verbatim from the original verdict.issues.

Acceptance:
- ACK echo repeats acceptance_criteria item by item.
- Every reviewer issue points to a criterion_ref.
- Issues can be traced back to specific operation records in work_log.
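The fidelity audit in step 3 reduces to a verbatim comparison of issue lists; a minimal sketch, assuming each issue carries criterion_ref and description fields (the field names are this sketch's assumption):

```python
def fidelity_audit(original_issues: list[dict], redispatch_issues: list[dict]) -> bool:
    """The re-dispatch must carry the reviewer's issues verbatim:
    same count, same order, same criterion_ref, same wording."""
    if len(original_issues) != len(redispatch_issues):
        return False
    return all(
        o["criterion_ref"] == r["criterion_ref"]
        and o["description"] == r["description"]
        for o, r in zip(original_issues, redispatch_issues)
    )

issues = [{"criterion_ref": "AC-1", "description": "detail output field order unstable"}]
ok = fidelity_audit(issues, [dict(issues[0])])          # verbatim copy passes
bad = fidelity_audit(                                    # paraphrase fails
    issues,
    [{"criterion_ref": "AC-1", "description": "output unstable (paraphrased)"}],
)
```

Rejecting paraphrases is the point: if issues get reworded on re-dispatch, a verdict can no longer be attributed to a specific criterion.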
5) Module 5

STABILITY + LOAD TESTS

Goal: the system stays observable, accountable, and recoverable under concurrency, flooding, out-of-order delivery, and blocking. Stress-test thresholds can be calibrated to the local environment, e.g. read P95 < 200 ms / write P95 < 500 ms.

Test items
  • 50 SSE clients / 2 hours
  • 10 updates per second broadcast
  • 50 rps bad webhook (signature failure flooding)
  • 100x GET + 10x write + 50x SSE (observe P95/P99)
  • ACK timeout / heartbeat warnings written to audit
Key risks
  • Synchronous I/O blocking the event loop (e.g. PyGithub)
  • Missing webhook idempotency causing state jitter
  • Missing structured audit, making accountability impossible
copy to codex — module 5
You are the Codex (executor). Please implement Module 5 (stability and stress test script).

Task:
1) Write an SSE soak script:
- 50 client connections to /api/events
- Run for 2 hours; count disconnections and maximum latency
2) Write a webhook flood script:
- Send badly signed requests to /webhooks/github at 50 rps
- Monitor whether the P95 latency of the normal API (/api/tasks) rises significantly
3) Write a concurrent stress test (locust/k6 or a simple asyncio script):
- 100 concurrent GET /api/tasks
- 10 concurrent approve/reject
- 50 SSE connections simultaneously
4) Result output:
- p50/p95/p99 latency
- SSE latency distribution/disconnection statistics
- Whether ack_timeout / heartbeat_warning were triggered and logged (written to the audit JSONL)

Acceptance:
- SSE streams show no prolonged stalls during the stress test.
- Read interface P95 < 200ms (local reference value), write interface P95 < 500ms.
- All alerts/timeouts are written to audit (retrievable).
- Any synchronous I/O must be wrapped in a thread pool/executor so it cannot block the event loop.
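For the p50/p95/p99 report, a simple nearest-rank percentile is usually sufficient; a minimal sketch with made-up sample latencies:

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: good enough for load-test reporting."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))  # 1-based rank
    return ordered[rank - 1]

# illustrative latencies in milliseconds, not real measurements
latencies_ms = [12.0, 15.0, 18.0, 22.0, 30.0, 45.0, 60.0, 90.0, 150.0, 480.0]
report = {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
```

Compare the resulting report values against the local thresholds (read P95 < 200 ms / write P95 < 500 ms) to decide pass/fail.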
6) Final

ACCEPTANCE CHECKLIST

Used for final acceptance (also suitable as CI thresholds): every stage must be reproducible, scriptable, and auditable.

Must Pass
  • Module 1: Read/write closed loop; stable output; tests pass
  • Module 2: SSE broadcast is stable; multiple clients stay consistent
  • Module 3: Signature verification is correct; processing is idempotent; audit records are retrievable
  • Module 4: ACK echo & issues are traceable; fidelity audit passes
  • Module 5: Load-test thresholds are met; alerts/timeouts have audit records
Recommended Records
  • git commit hash / config summary for each run
  • Load test report (p50/p95/p99)
  • Audit log samples (covering event types)