kircérta // 2026 // codex-driven build

Empowering AGN to Do More

SSOT-FIRST BACKEND • SSE OBSERVABILITY • WEBHOOK INPUT • PROTOCOLIZED AGENTS • AUDITABLE FAILURES

Each phase has its own section; every copyable item is placed in a code block with one-click copy.

Overview_

MODULAR BUILD / COPY-FIRST

0) Overview & Preparation

SSOT FIRST. BACKEND AS TOOL.

You have verified that OpenClaw runs. Suggested setup order: SSOT read/write + status view → SSE observability → Webhook input → real agents + stability tests.

Preparation Checklist

SSOT repository or directory: /ssot (task JSON files)
FastAPI: Python 3.11+ / uvicorn / pydantic
Dashboard: a static page is sufficient (upgrade later)
Secrets: JWT secret / webhook secret
Audit: JSONL log file (events.jsonl)

Phase Deliverables (Must Be Verifiable)

Module 1: API can read/write SSOT and returns stable JSON
Module 2: SSE + automatic UI updates
Module 3: Trusted Webhook input + idempotency
Module 4: Auditable ACK echo / verdict
Module 5: Load tests, alerts, and blocking mitigation

1) Module 1

MINIMUM LOOP (NO REAL AGENTS)

Goal: The backend reads and writes SSOT deterministically and presents a consistent view of task status. Start by using the local ./ssot directory to simulate GitHub.

Implementation scope

GET /api/tasks
GET /api/tasks/{task_id}
POST /api/tasks/{task_id}/approve (JWT)
POST /api/tasks/{task_id}/reject (JWT)

Acceptance

The detail output for a given task is identical across 10 consecutive requests
After approve/reject, the written verdict can be read back immediately
At least 3 pytest cases pass
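
The status derivation and the "stable output" requirement can be sketched as follows. This is only an illustration: the guide does not fix the mapping rules, so the rule used here (verdict wins, then the presence of task_result, otherwise pending) and the function names derive_status and render_detail are assumptions.

```python
import json
from typing import Any


def derive_status(task: dict[str, Any]) -> str:
    """Derive the status from the task JSON alone (assumed mapping rules)."""
    verdict = task.get("verdict")
    if verdict == "approve":
        return "approved"
    if verdict == "reject":
        return "rejected"
    # Assumption: a task with a result but no verdict is waiting for review.
    if task.get("task_result") is not None:
        return "needs_review"
    return "pending"


def render_detail(task: dict[str, Any]) -> str:
    """Serialize the detail view with sorted keys so repeated reads match byte for byte."""
    view = dict(task)
    view["status"] = derive_status(task)
    return json.dumps(view, sort_keys=True, ensure_ascii=False)
```

Because the status is recomputed from the file on every read and the keys are sorted, ten consecutive reads of an unchanged task produce the same string.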

You are the Codex (executor). Please implement Module 1 (minimum closed loop).
Constraints: SSOT is the only source of truth; the backend only reads, writes, and broadcasts; do not put any "intelligent judgment" in the backend.
Delivery: Runnable FastAPI project + tests + README.

Task:
1) Create a minimal FastAPI project (Python 3.11+), supporting:
   - GET /api/tasks
   - GET /api/tasks/{task_id}
   - POST /api/tasks/{task_id}/approve (JWT)
   - POST /api/tasks/{task_id}/reject (JWT)
2) SSOT: Simulate it with the local ./ssot/ directory (one JSON file per task). Writes must be atomic (tmp + rename).
3) task_engine: Generate the status field (pending/needs_review/approved/rejected) from the task JSON.
4) pytest: at least 3 test cases (list/detail/approve/reject).
5) README: how to start the service, sample tasks, how to run the tests, and the acceptance steps.

Acceptance:
- The detail output is identical across 10 consecutive requests.
- The written result can be read back immediately after approve/reject.
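
The "tmp + rename" requirement in the prompt above might look like this minimal sketch. The function names write_task_atomic and read_task are hypothetical; os.replace is used for the rename step because it overwrites the target in a single operation when both paths are on the same filesystem.

```python
import json
import os
import tempfile
from pathlib import Path


def write_task_atomic(ssot_dir: Path, task_id: str, task: dict) -> Path:
    """Write one task JSON via a temp file in the same directory, then rename."""
    ssot_dir.mkdir(parents=True, exist_ok=True)
    target = ssot_dir / f"{task_id}.json"
    # The temp file must live next to the target so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=ssot_dir, prefix=f".{task_id}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(task, tmp, sort_keys=True, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, target)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target


def read_task(ssot_dir: Path, task_id: str) -> dict:
    """Read a task back; readers see either the old or the new file, never a partial one."""
    return json.loads((ssot_dir / f"{task_id}.json").read_text(encoding="utf-8"))
```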

2) Module 2

SSE + DASHBOARD (OBSERVABILITY)

Goal: SSOT changes → backend detects the change → SSE push → frontend updates without a refresh. Start with a minimal static page.

Implementation scope

GET /api/events (SSE)
Connection pool + broadcast (triggered by approve/reject)
Dashboard: list + details + automatic refresh via SSE

Acceptance

Frontend updates within 1 s of approve/reject
All 3 browser windows receive the broadcast

You are the Codex (executor). Please implement Module 2 (SSE + Dashboard).

Task:
1) Add SSE to the backend: GET /api/events.
- Maintain a connection pool; broadcast when approve/reject occurs.
2) Add a static dashboard (/static or a separate directory), including:
- Task list + task details
- SSE subscription for automatic refresh
3) Provide a verification method: 3 windows receive the broadcast at the same time.

Acceptance:
- Frontend updates within 1 second after approve/reject.
- All 3 connections receive the broadcast.
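
One possible shape for the connection pool and broadcast is sketched below, using only the standard library. The class name Broadcaster, the event name task_updated, and the queue size of 100 are assumptions for illustration; each connected client gets its own queue, and slow clients are dropped rather than allowed to stall the others.

```python
import asyncio
import json


def format_sse(event: str, payload: dict) -> str:
    """Build one SSE frame: an event line, a data line, and a blank line."""
    return f"event: {event}\ndata: {json.dumps(payload, sort_keys=True)}\n\n"


class Broadcaster:
    """In-memory pool of per-client queues (hypothetical helper)."""

    def __init__(self) -> None:
        self._clients: set = set()

    def connect(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue(maxsize=100)
        self._clients.add(queue)
        return queue

    def disconnect(self, queue: asyncio.Queue) -> None:
        self._clients.discard(queue)

    def broadcast(self, event: str, payload: dict) -> int:
        """Push one frame to every client; return how many received it."""
        frame = format_sse(event, payload)
        delivered = 0
        for queue in list(self._clients):
            try:
                queue.put_nowait(frame)
                delivered += 1
            except asyncio.QueueFull:
                # A client that cannot keep up is dropped instead of blocking the rest.
                self._clients.discard(queue)
        return delivered
```

In a FastAPI handler for GET /api/events, an async generator could read frames from the client's queue and be returned through fastapi.responses.StreamingResponse with media_type="text/event-stream", calling disconnect when the client goes away.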

3) Module 3

WEBHOOK INPUT (TRUSTED EVENTS)

Goal: External events (GitHub/Xcode) enter the system and trigger refresh and broadcast. Signature verification + idempotent processing + audit records are required.

Implementation scope

POST /webhooks/github: HMAC-SHA256 signature verification
POST /webhooks/xcode: parse build status
POST /webhooks/telegram (or polling ingest): inbound Telegram messages → create SSOT tasks
Audit: ./audit/events.jsonl (JSONL)
Trigger task_engine refresh + SSE broadcast after writing

Acceptance

Bad signatures must be rejected without writing to SSOT
Replaying the same event_id 10 times does not pollute the state
Flooding with bad signatures (50 rps) does not affect the main path
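
The signature check can be sketched with the standard library alone. GitHub sends the signature in the X-Hub-Signature-256 header as "sha256=" followed by the hex digest of the raw request body; the function names sign_body and verify_signature are hypothetical, and the handler is assumed to pass in the raw bytes before any JSON parsing.

```python
import hashlib
import hmac
from typing import Optional


def sign_body(secret: bytes, raw_body: bytes) -> str:
    """Compute the header value expected for this exact body."""
    digest = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def verify_signature(secret: bytes, raw_body: bytes, header_value: Optional[str]) -> bool:
    """Return True only if the header matches the HMAC-SHA256 of the raw body."""
    if not header_value or not header_value.startswith("sha256="):
        return False
    expected = sign_body(secret, raw_body)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), header_value.encode())
```

Running this check before touching SSOT keeps rejected requests cheap, which also helps with the 50 rps flooding case.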

You are the Codex (executor). Please implement Module 3 (Webhook + Telegram Ingest).

Task:
1) POST /webhooks/github: Verify HMAC-SHA256 (header + raw body).
2) POST /webhooks/xcode: parse build status (store it as-is for now).
3) Audit log: append to ./audit/events.jsonl (one JSON object per line).
4) After the webhook is written, trigger a task_engine refresh and an SSE broadcast.
5) Telegram inbound (polling preferred): poll getUpdates and forward new messages to POST /webhooks/telegram.
6) message → SSOT task mapping (required fields): task_id, source, chat_id, message_id, request_text, created_at, correlation_id, status=pending.
7) Idempotent deduplication: The same (chat_id, message_id) can create a task only once; on a repeat, write audit_event:telegram_dedup_hit.

Acceptance:
- Bad signatures are rejected 100% of the time and nothing is written to SSOT.
- Replaying the same event_id 10 times does not change the final state (idempotent).
- Telegram repeated delivery: only 1 task is created, and the audit log records dedup_hit.
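
A minimal sketch of the Telegram deduplication and JSONL audit follows. It assumes a task_id derived from (chat_id, message_id) and relies on exclusive file creation (open mode "x") so a repeat delivery cannot create a second task; the function names, the tg- prefix, and the telegram_task_created event name are illustrative, not part of the guide.

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path


def append_audit(audit_path: Path, audit_event: str, **fields) -> None:
    """Append one JSON object per line to the audit log."""
    audit_path.parent.mkdir(parents=True, exist_ok=True)
    record = {"audit_event": audit_event,
              "ts": datetime.now(timezone.utc).isoformat(), **fields}
    with audit_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(record, sort_keys=True) + "\n")


def ingest_telegram_message(ssot_dir: Path, audit_path: Path,
                            chat_id: int, message_id: int, text: str) -> bool:
    """Create the SSOT task once; return False and audit a dedup hit on repeats."""
    ssot_dir.mkdir(parents=True, exist_ok=True)
    task_id = f"tg-{chat_id}-{message_id}"  # assumed deterministic id scheme
    task = {
        "task_id": task_id, "source": "telegram",
        "chat_id": chat_id, "message_id": message_id,
        "request_text": text,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "correlation_id": uuid.uuid4().hex,
        "status": "pending",
    }
    try:
        # Mode "x" fails if the file already exists, which makes creation idempotent.
        with (ssot_dir / f"{task_id}.json").open("x", encoding="utf-8") as handle:
            json.dump(task, handle, sort_keys=True)
    except FileExistsError:
        append_audit(audit_path, "telegram_dedup_hit", task_id=task_id)
        return False
    append_audit(audit_path, "telegram_task_created", task_id=task_id)
    return True
```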

4) Module 4

PROTOCOLIZED AGENTS (AUDIT-FIRST)

Goal: First have a fake Executor/Reviewer write to SSOT (ACK echo / verdict / issues) according to the protocol, then replace them with OpenClaw / Codex for real execution.

Key mechanisms

Executor: ACK echo (restate acceptance_criteria + scope item by item)
Reviewer: every issue must have a criterion_ref, so the verdict is accountable
Fidelity audit: a re-dispatch must forward issues faithfully
Timeout: ACK timeout / reject lock (writing only the audit record is acceptable at first)

Acceptance

ACK echo matches acceptance_criteria (a mismatch prevents execution)
Each issue can be traced to a corresponding operation record in work_log
Timeout/lock paths can be triggered and are logged
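
The ACK echo gate could be checked along these lines. This sketch assumes that acceptance_criteria and scope are lists of strings in both the dispatch and the ACK; the function name ack_echo_mismatches is hypothetical.

```python
def ack_echo_mismatches(dispatch: dict, ack: dict) -> list[str]:
    """Compare the ACK echo with the dispatch item by item; an empty list means it matches."""
    problems: list[str] = []
    for field in ("acceptance_criteria", "scope"):
        expected = dispatch.get(field, [])
        echoed = ack.get(field, [])
        if len(echoed) != len(expected):
            problems.append(f"{field}: expected {len(expected)} items, got {len(echoed)}")
        for index, (want, got) in enumerate(zip(expected, echoed)):
            if want != got:
                problems.append(f"{field}[{index}] differs from the dispatch")
    return problems
```

An executor wrapper would refuse to start work whenever the returned list is non-empty and write the mismatches to the audit log instead.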

You are the Codex (executor). Please implement Module 4 (protocolized Executor/Reviewer).

Task:
1) Write two scripts: scripts/validation/fake_executor.py / scripts/validation/fake_reviewer.py:
- scripts/validation/fake_executor.py: read task_dispatch → write ACK echo → write task_result (including work_log + diff_snapshot).
- scripts/validation/fake_reviewer.py: read task_result → write verdict (approve/reject) + issues (each issue must contain criterion_ref).
2) Add "timeout simulation": do not write the ACK, then verify that the system records ack_timeout (writing only the audit log is enough for now).
3) Add a "fidelity audit" checking function/script:
- The issues in the re-dispatch must be copied verbatim from the original verdict.issues.

Acceptance:
- ACK echo repeats acceptance_criteria item by item.
- Every Reviewer issue carries a criterion_ref.
- Issues can be traced back to specific operation records in work_log.
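
The fidelity audit in task 3 might be as small as the sketch below. It assumes that both the verdict and the re-dispatch carry an issues list of dicts, and the function name fidelity_audit is hypothetical; it also flags issues that lack a criterion_ref.

```python
def fidelity_audit(verdict: dict, redispatch: dict) -> list[str]:
    """Report every way the re-dispatch deviates from verdict.issues; empty means faithful."""
    original = verdict.get("issues", [])
    forwarded = redispatch.get("issues", [])
    findings: list[str] = []
    for index, issue in enumerate(original):
        if not issue.get("criterion_ref"):
            findings.append(f"verdict.issues[{index}] is missing criterion_ref")
    if len(forwarded) != len(original):
        findings.append(f"issue count changed: {len(original)} -> {len(forwarded)}")
    for index, (src, dst) in enumerate(zip(original, forwarded)):
        # Verbatim means equal field by field, including the wording of the text.
        if src != dst:
            findings.append(f"issues[{index}] was altered in the re-dispatch")
    return findings
```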

5) Module 5

STABILITY + LOAD TESTS

Goal: The system stays observable, accountable, and recoverable under concurrency, flooding, out-of-order events, and blocking. Stress-test thresholds can be set to read P95 < 200 ms / write P95 < 500 ms, adjusted to the local environment.

Test items

50 SSE clients / 2 hours
Broadcast at 10 updates per second
50 rps bad webhook (signature failure flooding)
100x GET + 10x write + 50x SSE (observe P95/P99)
ACK timeout / heartbeat warning written to audit

Key risks

Sync I/O blocking the event loop (e.g. PyGithub)
Missing webhook idempotency causes state jitter
Lack of structured audit makes accountability impossible
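
The first risk can be illustrated with a small sketch that moves a blocking call off the event loop using asyncio.to_thread. Here blocking_fetch stands in for any synchronous client call (time.sleep simulates the wait), and the heartbeat coroutine stands in for SSE traffic that must keep flowing; both names are made up for the demonstration.

```python
import asyncio
import time


def blocking_fetch(delay_s: float) -> str:
    """Stand-in for a synchronous library call that blocks its thread."""
    time.sleep(delay_s)
    return "fetched"


async def heartbeat(ticks: list, interval_s: float, count: int) -> None:
    """Stand-in for event-loop work (e.g. SSE pushes) that must not stall."""
    for _ in range(count):
        ticks.append(time.monotonic())
        await asyncio.sleep(interval_s)


async def run_demo() -> tuple:
    ticks: list = []
    beat = asyncio.create_task(heartbeat(ticks, 0.01, 5))
    # The blocking call runs in a worker thread, so the loop keeps serving heartbeat.
    result = await asyncio.to_thread(blocking_fetch, 0.3)
    ticks_during_call = len(ticks)
    await beat
    return result, ticks_during_call
```

Calling blocking_fetch directly inside the coroutine would freeze the heartbeat for the whole delay, which is the failure mode the soak test is meant to surface.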

You are the Codex (executor). Please implement Module 5 (stability and stress test script).

Task:
1) Write an SSE soak script:
- 50 client connections to /api/events
- Run for 2 hours, count the number of disconnections and maximum delay
2) Write a webhook flood script:
- Send badly signed requests to /webhooks/github at 50 rps
- Monitor whether the P95 latency of the normal API (/api/tasks) increases significantly
3) Write a concurrent stress test (locust/k6 or simple asyncio script can be used):
- 100 concurrent GET /api/tasks
- 10 concurrent approve/reject
- 50 SSE connections simultaneously
4) Result output:
- p50/p95/p99 latency
- SSE latency distribution/disconnection statistics
- Whether to trigger and log ack_timeout / heartbeat_warning (written to audit jsonl)

Acceptance:
- SSE does not freeze for extended periods during the stress test.
- Read interface P95 < 200 ms (local reference value), write interface P95 < 500 ms.
- All alerts/timeouts are written to audit (retrievable).
- If sync I/O exists, it must be wrapped in a thread pool/executor so that it does not block the event loop.
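
For the result output, a percentile summary with a threshold check can be sketched as below. It uses the nearest-rank definition of a percentile, which is one common convention and an assumption here; the function names percentile and latency_report are hypothetical.

```python
import math


def percentile(samples: list, pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples collected")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_report(samples_ms: list, p95_limit_ms: float) -> dict:
    """Summarize latencies and check P95 against the limit (strictly below, as in the criteria)."""
    report = {f"p{pct}": percentile(samples_ms, pct) for pct in (50, 95, 99)}
    report["pass"] = report["p95"] < p95_limit_ms
    return report
```

A run would call latency_report once with the read samples and a 200 ms limit, and once with the write samples and a 500 ms limit.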

6) Final

ACCEPTANCE CHECKLIST

For final acceptance (also suitable as CI gates): each phase must be reproducible, scriptable, and auditable.

Must Pass

Module 1: Read/write loop is closed; output is stable; tests pass
Module 2: SSE broadcast is stable; multiple clients stay consistent
Module 3: Signature verification is correct; processing is idempotent; audit is retrievable
Module 4: ACK echo and issues are traceable; fidelity audit passes
Module 5: Load-test thresholds are met; alerts/timeouts have audit records

Recommended Records

git commit hash / config summary for each run
Load test report (p50/p95/p99)
Audit log samples (covering event types)