kircérta // 2026 // codex-driven build

Agentic Network. Build System. Run Stable.

SSOT-FIRST BACKEND • SSE OBSERVABILITY • WEBHOOK INPUT • PROTOCOLIZED AGENTS • AUDITABLE FAILURES

Each phase has its own section; every copyable item is placed in a code block with one-click copy.

Overview_

MODULAR BUILD / COPY-FIRST
right-side links intentionally omitted
0) Overview & Preparation

SSOT FIRST. BACKEND AS TOOL.

You have verified that OpenClaw runs. Suggested setup order: first make SSOT read/write and a stable status view work → then add SSE observability → then add Webhook input → finally integrate real agents and run stability tests.

Preparation Checklist
  • SSOT repository or directory: /ssot (task JSON files)
  • FastAPI: Python 3.11+ / uvicorn / pydantic
  • Dashboard: a static page is sufficient (upgrade later)
  • Secrets: JWT secret / webhook secret
  • Audit: JSONL log file (events.jsonl)
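For reference, appending to the JSONL audit file can look like the sketch below; the event fields shown are illustrative, not mandated by this checklist:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

AUDIT_PATH = Path("./audit/events.jsonl")  # location from the checklist

def append_audit_event(event_type: str, payload: dict) -> dict:
    """Append one event as a single JSON object per line (JSONL)."""
    event = {
        "event_type": event_type,
        "ts": datetime.now(timezone.utc).isoformat(),
        **payload,  # illustrative extra fields, e.g. chat_id / task_id
    }
    AUDIT_PATH.parent.mkdir(parents=True, exist_ok=True)
    with AUDIT_PATH.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event, ensure_ascii=False) + "\n")
    return event

ev = append_audit_event("telegram_message_received", {"chat_id": 1, "message_id": 2})
lines = AUDIT_PATH.read_text(encoding="utf-8").strip().splitlines()
```

Append-only JSONL keeps every event retrievable with plain `grep`/`jq`, which is what "audit can be retrieved" in Module 5 relies on.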
Phase Deliverables (Must Be Verifiable)
  • Module 1: API can read/write SSOT and returns stable JSON
  • Module 2: SSE + automatic UI updates
  • Module 3: Trusted Webhook input + idempotency
  • Module 4: Auditable ACK echo / verdict
  • Module 5: Load tests, alerts, and blocking mitigation
Entry Link

Codex Execution & Acceptance

Open the full package of execution prompts and quantified acceptance criteria for function-by-function delivery and verification.

1) Module 1

MINIMUM LOOP (NO REAL AGENTS)

Goal: deterministically read/write SSOT on the backend and provide a consistent task status view. Use local ./ssot to simulate GitHub first.

Scope
  • GET /api/tasks
  • GET /api/tasks/{task_id}
  • POST /api/tasks/{task_id}/approve (JWT)
  • POST /api/tasks/{task_id}/reject (JWT)
Acceptance
  • For the same task, the detail output fields are stable across 10 consecutive requests
  • After approve/reject writes back, the verdict is immediately readable
  • At least 3 pytest test cases pass
copy to codex — module 1
You are the Codex (executor). Please implement Module 1 (minimum closed loop).
Constraints: SSOT is the only source of truth; the backend only reads, writes, and broadcasts; put no "intelligent judgment" in the backend.
Delivery: Runnable FastAPI project + tests + README.

Task:
1) Create a minimal FastAPI project (Python 3.11+), supporting:
   - GET /api/tasks
   - GET /api/tasks/{task_id}
   - POST /api/tasks/{task_id}/approve (JWT)
   - POST /api/tasks/{task_id}/reject (JWT)
2) SSOT: Use local ./ssot/ simulation (one JSON per task). Writes must be atomic (tmp + rename).
3) task_engine: Generate status fields (pending/needs_review/approved/rejected) based on task JSON.
4) pytest: at least 3 test cases (list/detail/approve/reject).
5) README: startup instructions, sample tasks, how to run the tests, and acceptance steps.

Acceptance:
- Detail output fields are identical across 10 consecutive requests.
- Written-back results are readable immediately after approve/reject.
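The atomic write required in step 2 (tmp + rename) can be sketched as follows; paths and names are illustrative:

```python
import json
import os
import tempfile
from pathlib import Path

SSOT_DIR = Path("./ssot")  # local SSOT simulation, as in the prompt

def write_task_atomic(task_id: str, task: dict) -> Path:
    """Write task JSON atomically: write a temp file, then os.replace().

    os.replace() is an atomic rename on POSIX (and on Windows for
    same-volume paths), so readers never see a half-written file.
    """
    SSOT_DIR.mkdir(parents=True, exist_ok=True)
    final_path = SSOT_DIR / f"{task_id}.json"
    fd, tmp_path = tempfile.mkstemp(dir=SSOT_DIR, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(task, f, ensure_ascii=False, indent=2)
            f.flush()
            os.fsync(f.fileno())  # make sure bytes hit disk before rename
        os.replace(tmp_path, final_path)  # the atomic step
    finally:
        if os.path.exists(tmp_path):  # only on failure before the rename
            os.remove(tmp_path)
    return final_path

p = write_task_atomic("demo-001", {"task_id": "demo-001", "status": "pending"})
```

The tmp file is created inside the SSOT directory on purpose: `os.replace()` is only atomic within one filesystem.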
2) Module 2

SSE + DASHBOARD (OBSERVABILITY)

Goal: SSOT change → backend detects → SSE pushes → frontend updates without refresh. Start with a minimal static page.

Scope
  • GET /api/events (SSE)
  • Connection pool + broadcast (triggered by approve/reject)
  • Dashboard: list + detail + SSE auto-refresh
Acceptance
  • < 1s frontend update after approve/reject
  • All 3 browser windows receive the broadcast
copy to codex — module 2
You are the Codex (executor). Please implement Module 2 (SSE + Dashboard).

Task:
1) Add SSE to the backend: GET /api/events.
- Maintain connection pool; broadcast when approve/reject occurs.
2) Add a static dashboard (/static or a separate directory), including:
- Task list + task details
- Subscribe to SSE auto-refresh
3) Provide a verification method: 3 browser windows receive the broadcast simultaneously.

Acceptance:
- Frontend updates within 1 second after approve/reject.
- All 3 connections receive the broadcast.
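One way to satisfy the connection-pool + broadcast requirement is a per-client asyncio.Queue. The sketch below is framework-independent (in a real FastAPI app each queue would feed one `/api/events` streaming response); the class and function names are this sketch's own:

```python
import asyncio
import json

class Broadcaster:
    """Connection pool for SSE: each subscribed client owns a queue."""

    def __init__(self) -> None:
        self._queues: set[asyncio.Queue] = set()

    def subscribe(self) -> asyncio.Queue:
        q: asyncio.Queue = asyncio.Queue()
        self._queues.add(q)
        return q

    def unsubscribe(self, q: asyncio.Queue) -> None:
        self._queues.discard(q)

    def broadcast(self, event: dict) -> None:
        # SSE wire format: "data: <payload>\n\n"
        frame = f"data: {json.dumps(event)}\n\n"
        for q in self._queues:
            q.put_nowait(frame)

async def demo() -> list[str]:
    bus = Broadcaster()
    clients = [bus.subscribe() for _ in range(3)]  # the "3 windows" check
    bus.broadcast({"type": "task_update", "task_id": "demo-001"})
    return [await asyncio.wait_for(q.get(), timeout=1) for q in clients]

frames = asyncio.run(demo())
```

A queue per client means one slow consumer cannot delay the others, which matters for the Module 5 soak test.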
3) Module 3

WEBHOOK INPUT (TRUSTED EVENTS)

Goal: external events (GitHub / Xcode) enter the system and trigger refresh + broadcast. Must include signature verification + idempotent handling + audit records.

Scope
  • POST /webhooks/github: verify HMAC-SHA256 signature
  • POST /webhooks/xcode: parse build status
  • POST /webhooks/telegram (or polling ingest): inbound Telegram messages → create SSOT tasks (structure only; no policy decisions)
  • Audit: ./audit/events.jsonl (JSONL)
  • After writing, trigger task_engine refresh + SSE broadcast
Acceptance
  • Invalid signatures must be rejected and must not write SSOT
  • Replaying the same event_id 10 times does not corrupt state
  • Signature-failure flooding (50 rps) does not affect the main path
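GitHub signs webhook deliveries with HMAC-SHA256 over the raw request body (the X-Hub-Signature-256 header). A minimal verification sketch, with a placeholder secret:

```python
import hashlib
import hmac

WEBHOOK_SECRET = b"change-me"  # placeholder; load from env/secret store in practice

def verify_github_signature(raw_body: bytes, signature_header: str) -> bool:
    """Check GitHub's X-Hub-Signature-256 header against the raw body.

    GitHub sends "sha256=<hexdigest>". Compare with hmac.compare_digest
    (constant-time) rather than ==, to avoid timing attacks.
    """
    expected = "sha256=" + hmac.new(WEBHOOK_SECRET, raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header or "")

body = b'{"action": "opened"}'
good = "sha256=" + hmac.new(WEBHOOK_SECRET, body, hashlib.sha256).hexdigest()
```

Verify against the raw bytes before any JSON parsing: re-serializing a parsed body changes whitespace and key order and breaks the digest.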
copy to codex — module 3
You are the Codex (executor). Please implement Module 3 (Webhook + Telegram Ingest).

Task:
1) POST /webhooks/github: Verify HMAC-SHA256 (header + raw body).
2) POST /webhooks/xcode: parse build status (store the payload as-is for now).
3) Audit log: ./audit/events.jsonl (one JSON object per line).
4) After the webhook is written, task_engine is triggered to refresh and broadcast SSE.
5) Telegram inbound (choose one of the two; prefer polling so a local setup can run overnight without public ingress):
- A) POST /webhooks/telegram: Telegram webhook endpoint (can be added later)
- B) polling_ingest: poll Telegram getUpdates and POST new messages to the backend /webhooks/telegram (recommended first)
6) Telegram message → SSOT task mapping (written to ./ssot/{task_id}.json) must contain:
- task_id (stable id), source="telegram"
- chat_id, message_id, request_text (original text)
- created_at (ISO8601), correlation_id (for correlating audit/tool/agent records)
- status="pending"
7) Idempotent deduplication (required): the same (chat_id, message_id) creates at most one task
- On duplicate delivery: reject the duplicate creation, but write audit event telegram_dedup_hit

Acceptance:
- Bad signatures are rejected 100% of the time and never write SSOT.
- Replaying the same event_id 10 times does not change the final state (idempotent).
- Delivering the same Telegram message_id 10 times creates exactly 1 task (telegram_dedup_pass=true).
- Within 2 seconds of a Telegram inbound: the new task appears in SSOT, the audit record telegram_message_received is written, and SSE pushes task_update.
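The idempotent deduplication in step 7 can be sketched by deriving a stable task_id from (chat_id, message_id), so replays map to the same file name and cannot create a second task. The in-memory dict stands in for ./ssot/, and all names here are illustrative:

```python
import hashlib

created: dict[str, dict] = {}  # stands in for ./ssot/ in this sketch

def telegram_task_id(chat_id: int, message_id: int) -> str:
    """Stable id: the same (chat_id, message_id) always hashes the same."""
    digest = hashlib.sha256(f"{chat_id}:{message_id}".encode()).hexdigest()[:12]
    return f"tg-{digest}"

def ingest(chat_id: int, message_id: int, text: str) -> tuple[str, bool]:
    """Return (task_id, created_now); duplicate deliveries are rejected."""
    task_id = telegram_task_id(chat_id, message_id)
    if task_id in created:
        # here the real system would write audit event telegram_dedup_hit
        return task_id, False
    created[task_id] = {
        "task_id": task_id,
        "source": "telegram",
        "chat_id": chat_id,
        "message_id": message_id,
        "request_text": text,
        "status": "pending",
    }
    return task_id, True

# simulate the acceptance check: the same message delivered 10 times
results = [ingest(42, 7, "deploy please") for _ in range(10)]
```

With atomic tmp+rename writes from Module 1, "file already exists" doubles as the dedup check even across process restarts.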
4) Module 4

PROTOCOLIZED AGENTS (AUDIT-FIRST)

Goal: first implement Executor/Reviewer by writing protocol outputs into SSOT (ACK echo / verdict / issues), then replace with real execution via OpenClaw / Codex.

Key Mechanisms
  • Executor: ACK echo (repeat acceptance_criteria + scope item by item)
  • Reviewer: issues must include criterion_ref; verdict must be attributable
  • Fidelity audit: re-dispatch must forward issues verbatim
  • Timeouts: ACK timeout / reject lock (can be recorded in audit first)
Acceptance
  • ACK echo matches acceptance_criteria (mismatch blocks execution)
  • Each issue can be traced to corresponding actions in work_log
  • Timeout/lock paths can be triggered and are recorded
copy to codex — module 4
You are the Codex (executor). Please implement Module 4 (protocolized Executor/Reviewer).

Task:
1) Write two scripts under scripts/validation/:
- fake_executor.py: read task_dispatch → write ACK echo → write task_result (including work_log + diff_snapshot).
- fake_reviewer.py: read task_result → write verdict (approve/reject) + issues (each issue must contain criterion_ref).
2) Add "timeout simulation": do not write ACK, verify that the system can record ack_timeout (just write audit log first).
3) Add a "fidelity audit" checking function/script:
- The issues in the re-dispatch must be carried over verbatim from the original verdict.issues.

Acceptance:
- ACK echo repeats acceptance_criteria item by item.
- Every reviewer issue points to a criterion_ref.
- Issues can be traced back to specific operation records in work_log.
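The fidelity audit in step 3 reduces to a verbatim comparison of issue lists; a minimal sketch, assuming each issue carries criterion_ref and description fields (the field names are this sketch's assumption):

```python
def fidelity_audit(original_issues: list[dict], redispatch_issues: list[dict]) -> bool:
    """The re-dispatch must carry the reviewer's issues verbatim:
    same count, same order, same criterion_ref, same wording."""
    if len(original_issues) != len(redispatch_issues):
        return False
    return all(
        o["criterion_ref"] == r["criterion_ref"]
        and o["description"] == r["description"]
        for o, r in zip(original_issues, redispatch_issues)
    )

issues = [{"criterion_ref": "AC-1", "description": "detail output field order unstable"}]
ok = fidelity_audit(issues, [dict(issues[0])])          # verbatim copy passes
bad = fidelity_audit(                                    # paraphrase fails
    issues,
    [{"criterion_ref": "AC-1", "description": "output unstable (paraphrased)"}],
)
```

Rejecting paraphrases is the point: if issues get reworded on re-dispatch, a verdict can no longer be attributed to a specific criterion.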
5) Module 5

STABILITY + LOAD TESTS

Goal: the system stays observable, accountable, and recoverable under concurrency, flooding, out-of-order delivery, and blocking. Stress-test thresholds can be calibrated to the local environment, e.g. read P95 < 200 ms / write P95 < 500 ms.

Test items
  • 50 SSE clients / 2 hours
  • 10 updates per second broadcast
  • 50 rps bad webhook (signature failure flooding)
  • 100x GET + 10x write + 50x SSE (observe P95/P99)
  • ACK timeout / heartbeat warnings written to audit
Key risks
  • Synchronous I/O blocking the event loop (e.g. PyGithub)
  • Missing webhook idempotency causing state jitter
  • Missing structured audit, making accountability impossible
copy to codex — module 5
You are the Codex (executor). Please implement Module 5 (stability and stress test script).

Task:
1) Write an SSE soak script:
- 50 client connections to /api/events
- Run for 2 hours; count disconnections and maximum latency
2) Write a webhook flood script:
- Send badly signed requests to /webhooks/github at 50 rps
- Monitor whether the P95 latency of the normal API (/api/tasks) rises significantly
3) Write a concurrent stress test (locust/k6 or a simple asyncio script):
- 100 concurrent GET /api/tasks
- 10 concurrent approve/reject
- 50 SSE connections simultaneously
4) Result output:
- p50/p95/p99 latency
- SSE latency distribution/disconnection statistics
- Whether ack_timeout / heartbeat_warning were triggered and logged (written to the audit JSONL)

Acceptance:
- SSE streams show no prolonged stalls during the stress test.
- Read interface P95 < 200ms (local reference value), write interface P95 < 500ms.
- All alerts/timeouts are written to audit (retrievable).
- Any synchronous I/O must be wrapped in a thread pool/executor so it cannot block the event loop.
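For the p50/p95/p99 report, a simple nearest-rank percentile is usually sufficient; a minimal sketch with made-up sample latencies:

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: good enough for load-test reporting."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))  # 1-based rank
    return ordered[rank - 1]

# illustrative latencies in milliseconds, not real measurements
latencies_ms = [12.0, 15.0, 18.0, 22.0, 30.0, 45.0, 60.0, 90.0, 150.0, 480.0]
report = {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
```

Compare the resulting report values against the local thresholds (read P95 < 200 ms / write P95 < 500 ms) to decide pass/fail.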
6) Final

ACCEPTANCE CHECKLIST

Used for final acceptance (also suitable as CI thresholds): every stage must be reproducible, scriptable, and auditable.

Must Pass
  • Module 1: Read/write closed loop; stable output; tests pass
  • Module 2: SSE broadcast is stable; multiple clients stay consistent
  • Module 3: Signature verification is correct; processing is idempotent; audit records are retrievable
  • Module 4: ACK echo & issues are traceable; fidelity audit passes
  • Module 5: Load-test thresholds are met; alerts/timeouts have audit records
Recommended Records
  • git commit hash / config summary for each run
  • Load test report (p50/p95/p99)
  • Audit log samples (covering event types)