kircérta // 2026 // codex-driven build

Empowering AGN to Do More

SSOT-FIRST BACKEND • SSE OBSERVABILITY • WEBHOOK INPUT • PROTOCOLIZED AGENTS • AUDITABLE FAILURES

Each phase has its own section; every copyable item is placed in a code block with one-click copy.

Overview_

MODULAR BUILD / COPY-FIRST
0) Overview & Preparation

SSOT FIRST. BACKEND AS TOOL.

You have verified that OpenClaw runs. Suggested setup order: SSOT read/write + status view → SSE observability → Webhook input → real agents + stability tests.

Preparation Checklist
  • SSOT repository or directory: /ssot (task JSON files)
  • FastAPI: Python 3.11+ / uvicorn / pydantic
  • Dashboard: a static page is sufficient (upgrade later)
  • Secrets: JWT secret / webhook secret
  • Audit: JSONL log file (events.jsonl)
Phase Deliverables (Must Be Verifiable)
  • Module 1: API can read/write SSOT and returns stable JSON
  • Module 2: SSE + automatic UI updates
  • Module 3: Trusted Webhook input + idempotency
  • Module 4: Auditable ACK echo / verdict
  • Module 5: Load tests, alerts, and blocking mitigation
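The JSONL audit file from the checklist can be sketched as an append-only log with one JSON object per line. This is a minimal illustration, not the project's actual logger: the helper names `append_audit_event` and `read_audit_events` and the record fields `ts` and `type` are assumptions.

```python
import json
import time
from pathlib import Path
from typing import Optional


def append_audit_event(path: Path, event_type: str, **fields) -> dict:
    """Append one audit record as a single JSON line and return it."""
    # "ts" and "type" are assumed field names, not a fixed schema.
    record = {"ts": time.time(), "type": event_type, **fields}
    path.parent.mkdir(parents=True, exist_ok=True)
    # One object per line keeps the log easy to tail, grep, and replay.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def read_audit_events(path: Path, event_type: Optional[str] = None) -> list:
    """Load all records, optionally keeping only one event type."""
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    records = [json.loads(line) for line in lines if line.strip()]
    return [r for r in records if event_type is None or r["type"] == event_type]
```

Being able to filter by type is what makes later acceptance items such as "alerts/timeouts are retrievable" checkable from a script.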
1) Module 1

MINIMUM LOOP (NO REAL AGENTS)

Goal: The backend reads and writes SSOT deterministically and presents a consistent view of task status. Start with a local ./ssot directory standing in for GitHub.

Implementation scope
  • GET /api/tasks
  • GET /api/tasks/{task_id}
  • POST /api/tasks/{task_id}/approve (JWT)
  • POST /api/tasks/{task_id}/reject (JWT)
acceptance
  • The detail output for a given task is identical across 10 consecutive requests
  • The verdict written by approve/reject can be read back immediately
  • At least 3 pytest cases pass
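The deterministic read/write loop can be sketched as below, using the atomic tmp + rename write that the Module 1 prompt asks for. Treat it as a sketch under assumptions: the task fields `verdict` and `result`, the helper names, and the rule inside `derive_status` are all hypothetical; only the four status values come from this guide.

```python
import json
import os
import tempfile
from pathlib import Path


def write_task(task: dict, ssot_dir: Path) -> Path:
    """Atomically write one task: temp file in the same directory, then rename."""
    ssot_dir.mkdir(parents=True, exist_ok=True)
    target = ssot_dir / f"{task['task_id']}.json"
    fd, tmp_name = tempfile.mkstemp(dir=ssot_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            # sort_keys gives byte-identical output for identical content.
            json.dump(task, fh, sort_keys=True, indent=2)
            fh.flush()
            os.fsync(fh.fileno())
        # os.replace is atomic when source and target share a filesystem,
        # so readers see either the old file or the new one, never a partial.
        os.replace(tmp_name, target)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target


def read_task(task_id: str, ssot_dir: Path) -> dict:
    """Read one task JSON back from the SSOT directory."""
    return json.loads((ssot_dir / f"{task_id}.json").read_text(encoding="utf-8"))


def derive_status(task: dict) -> str:
    """Hypothetical status rule: verdict wins, then presence of a result."""
    verdict = task.get("verdict")  # assumed field name
    if verdict in ("approved", "rejected"):
        return verdict
    if task.get("result") is not None:  # assumed field name
        return "needs_review"
    return "pending"
```

Because status is derived from the file on every read rather than cached, an approve/reject write is visible to the very next GET.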
copy to codex — module 1
You are the Codex (executor). Please implement Module 1 (minimum closed loop).
Constraints: SSOT is the only truth; the backend only does reading, writing and broadcasting; do not write any "intelligent judgment" in the backend.
Delivery: Runnable FastAPI project + tests + README.

Task:
1) Create a minimal FastAPI project (Python 3.11+), supporting:
   - GET /api/tasks
   - GET /api/tasks/{task_id}
   - POST /api/tasks/{task_id}/approve (JWT)
   - POST /api/tasks/{task_id}/reject (JWT)
2) SSOT: Simulate it with a local ./ssot/ directory (one JSON file per task). Writes must be atomic (tmp + rename).
3) task_engine: Generate status fields (pending/needs_review/approved/rejected) based on task JSON.
4) pytest: at least 3 use cases (list/detail/approve/reject).
5) README: startup method, sample tasks, running tests, and acceptance steps.

acceptance:
- The detail output is identical across 10 consecutive requests.
- The written-back result can be read immediately after approve/reject.
2) Module 2

SSE + DASHBOARD (OBSERVABILITY)

Goal: SSOT changes → backend detects the change → SSE push → frontend updates without a refresh. Start with a minimal static page.

Implementation scope
  • GET /api/events (SSE)
  • Connection pool + broadcast (approve/reject triggered)
  • dashboard: list + details + SSE automatic refresh
acceptance
  • Frontend updates within 1 s of approve/reject
  • All 3 open browser windows receive the broadcast
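The connection pool + broadcast item can be sketched with one `asyncio.Queue` per connected client. This is an in-memory illustration only: the class name `Broadcaster`, the event name `task_updated`, and the queue size are assumptions, and the FastAPI route itself is left out.

```python
import asyncio
import json


def format_sse(event: str, data: dict) -> str:
    """Serialize one event in the text/event-stream wire format."""
    return f"event: {event}\ndata: {json.dumps(data, sort_keys=True)}\n\n"


class Broadcaster:
    """In-memory connection pool: one queue per connected SSE client."""

    def __init__(self, max_pending: int = 100) -> None:
        self._clients = set()
        self._max_pending = max_pending  # assumed per-client backlog limit

    def connect(self) -> asyncio.Queue:
        queue = asyncio.Queue(maxsize=self._max_pending)
        self._clients.add(queue)
        return queue

    def disconnect(self, queue: asyncio.Queue) -> None:
        self._clients.discard(queue)

    def broadcast(self, event: str, data: dict) -> int:
        """Queue one frame for every client; return how many accepted it."""
        frame = format_sse(event, data)
        delivered = 0
        for queue in list(self._clients):
            try:
                queue.put_nowait(frame)
                delivered += 1
            except asyncio.QueueFull:
                # A client that stopped reading is dropped, not waited on,
                # so one slow browser tab cannot stall the others.
                self.disconnect(queue)
        return delivered
```

In a FastAPI app, GET /api/events would typically wrap an async generator that reads from the client's queue in a `StreamingResponse` with `media_type="text/event-stream"`, and call `disconnect` when the generator exits; the approve/reject handlers would call `broadcast` after the SSOT write succeeds.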
copy to codex — module 2
You are the Codex (executor). Please implement Module 2 (SSE + Dashboard).

Task:
1) Add SSE to the backend: GET /api/events.
   - Maintain a connection pool; broadcast whenever approve/reject occurs.
2) Add a static dashboard (/static or a separate directory), including:
   - Task list + task details
   - SSE subscription with automatic refresh
3) Provide a verification method: 3 windows receive the broadcast at the same time.

acceptance:
- Frontend updates within 1 second after approve/reject.
- All 3 connections receive the broadcast.
3) Module 3

WEBHOOK INPUT (TRUSTED EVENTS)

Goal: External events (GitHub/Xcode) enter the system and trigger refresh and broadcast. Signature verification + idempotent processing + audit records are required.

Implementation scope
  • POST /webhooks/github: HMAC-SHA256 signature verification
  • POST /webhooks/xcode: parse build status
  • POST /webhooks/telegram (or polling ingest): inbound Telegram messages → create SSOT tasks
  • Audit: ./audit/events.jsonl (JSONL)
  • Trigger task_engine refresh + SSE broadcast after writing
acceptance
  • Requests with a bad signature are rejected and nothing is written to SSOT
  • Replaying the same event_id 10 times leaves the state unchanged
  • A flood of bad-signature requests (50 rps) does not degrade the main path
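Signature verification plus idempotent handling can be sketched as follows. GitHub sends the HMAC-SHA256 of the raw body in the `X-Hub-Signature-256` header as `sha256=<hex>`; everything else here is an assumption for illustration: the in-memory `IdempotencyStore`, the `handle_webhook` function, and the list that stands in for the SSOT write.

```python
import hashlib
import hmac
from typing import Optional


def verify_github_signature(secret: bytes, raw_body: bytes,
                            header_value: Optional[str]) -> bool:
    """Check a 'sha256=<hex>' signature against the raw request body."""
    if not header_value or not header_value.startswith("sha256="):
        return False
    expected = "sha256=" + hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison; bytes avoid errors on non-ASCII header input.
    return hmac.compare_digest(expected.encode(), header_value.encode())


class IdempotencyStore:
    """Hypothetical in-memory record of event_ids already processed."""

    def __init__(self) -> None:
        self._seen = set()

    def first_time(self, event_id: str) -> bool:
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        return True


def handle_webhook(secret: bytes, raw_body: bytes, signature: Optional[str],
                   event_id: str, store: IdempotencyStore, ssot: list) -> str:
    """Reject bad signatures before touching state; apply each event once."""
    if not verify_github_signature(secret, raw_body, signature):
        return "rejected"   # nothing written, event_id not marked as seen
    if not store.first_time(event_id):
        return "duplicate"  # replay: state is left untouched
    ssot.append(event_id)   # stand-in for the real SSOT write
    return "accepted"
```

Verifying before the dedup check matters: a forged request must not be able to mark a legitimate `event_id` as already seen. A real deployment would persist the seen set so that replays are still caught after a restart.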
copy to codex — module 3
You are the Codex (executor). Please implement Module 3 (Webhook + Telegram Ingest).

Task:
1) POST /webhooks/github: Verify HMAC-SHA256 (header + raw body).
2) POST /webhooks/xcode: parse build status (store it as-is for now).
3) Audit log: append to ./audit/events.jsonl (one JSON object per line).
4) After a webhook is written, trigger a task_engine refresh and an SSE broadcast.
5) Telegram inbound (polling preferred): poll getUpdates and forward new messages to POST /webhooks/telegram.
6) message → SSOT task mapping (required fields): task_id, source, chat_id, message_id, request_text, created_at, correlation_id, status=pending.
7) Idempotent deduplication: the same (chat_id, message_id) may create a task only once; on repeats, write audit_event:telegram_dedup_hit.

acceptance:
- Bad signatures are rejected 100% of the time and nothing is written to SSOT.
- Replaying the same event_id 10 times does not change the final state (idempotent).
- Repeated Telegram delivery creates only 1 task, and the audit log records dedup_hit.
4) Module 4

PROTOCOLIZED AGENTS (AUDIT-FIRST)

Goal: First have Executor/Reviewer stand-ins write to SSOT (ACK echo / verdict / issues) according to the protocol, then replace them with OpenClaw / Codex for real execution.

key mechanism
  • Executor: ACK echo (restate acceptance_criteria + scope item by item)
  • Reviewer: every issue must carry a criterion_ref so the verdict is accountable
  • Fidelity audit: a re-dispatch forwards the issues verbatim
  • Timeout: ACK timeout / reject lock (writing only the audit record is enough at first)
acceptance
  • ACK echo matches acceptance_criteria (a mismatch blocks execution)
  • Each issue maps to a corresponding operation record in work_log
  • Timeout/lock paths can be triggered and are logged
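The three protocol checks can be sketched as plain functions over dicts. The identifiers `acceptance_criteria`, `scope`, and `criterion_ref` come from this guide; the function names, the dict shapes, and the `text` key used in the usage example are assumptions.

```python
def ack_matches(dispatch: dict, ack: dict) -> bool:
    """True only if the ACK restates acceptance_criteria and scope exactly."""
    return (
        ack.get("acceptance_criteria") == dispatch["acceptance_criteria"]
        and ack.get("scope") == dispatch["scope"]
    )


def issues_missing_ref(issues: list) -> list:
    """Return the issues that cannot be tied to an acceptance criterion."""
    return [issue for issue in issues if not issue.get("criterion_ref")]


def fidelity_violations(verdict_issues: list, redispatch_issues: list) -> list:
    """Compare re-dispatched issues with the original verdict, verbatim."""
    problems = []
    if len(verdict_issues) != len(redispatch_issues):
        problems.append(
            f"issue count changed: {len(verdict_issues)} -> {len(redispatch_issues)}"
        )
    for idx, (original, forwarded) in enumerate(zip(verdict_issues, redispatch_issues)):
        # Any edit, including a reworded sentence, counts as a violation.
        if original != forwarded:
            problems.append(f"issue {idx} differs from the original verdict")
    return problems
```

For example, `fidelity_violations(issues, issues)` returns an empty list, while dropping or rewording any issue in the re-dispatch yields at least one entry that can be written to the audit log.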
copy to codex — module 4
You are the Codex (executor). Please implement Module 4 (protocolized Executor/Reviewer).

Task:
1) Write two scripts: scripts/validation/fake_executor.py and scripts/validation/fake_reviewer.py:
   - scripts/validation/fake_executor.py: read task_dispatch → write ACK echo → write task_result (including work_log + diff_snapshot).
   - scripts/validation/fake_reviewer.py: read task_result → write verdict (approve/reject) + issues (each issue must contain criterion_ref).
2) Add a "timeout simulation": withhold the ACK and verify that the system records ack_timeout (writing the audit log is enough for now).
3) Add a "fidelity audit" check function/script:
   - The issues in a re-dispatch must match the original verdict.issues verbatim.

acceptance:
- ACK echo repeats acceptance_criteria item by item.
- Reviewer issues can all point to criterion_ref.
- Issues can be traced back to specific operation records in work_log.
5) Module 5

STABILITY + LOAD TESTS

Goal: The system remains observable, accountable, and recoverable under concurrency, flooding, out-of-order events, and blocking. Adjust the load-test thresholds to the local environment; reference values are read P95 < 200 ms and write P95 < 500 ms.

Test items
  • 50 SSE clients / 2 hours
  • Broadcast at 10 updates per second
  • 50 rps bad webhook (signature failure flooding)
  • 100x GET + 10x write + 50x SSE (observe P95/P99)
  • ACK timeout / heartbeat warning are written to audit
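Reporting P95/P99 for these runs needs a fixed percentile definition so results stay comparable between runs. A minimal sketch using the nearest-rank method follows; the function names are assumptions, and load tools such as locust or k6 may interpolate differently.

```python
import math


def percentile(samples: list, pct: float) -> float:
    """Nearest-rank percentile: smallest sample with >= pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples collected")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms: list) -> dict:
    """Collapse raw latencies into the three figures the report needs."""
    return {
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
    }


def meets_threshold(latencies_ms: list, p95_limit_ms: float) -> bool:
    """Gate check, e.g. 200 for reads and 500 for writes (local reference values)."""
    return percentile(latencies_ms, 95) < p95_limit_ms
```

With only a handful of samples P99 collapses to the maximum, so the write path (10 concurrent writers) needs many iterations before its tail figures mean much.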
Key risks
  • Sync I/O blocking the event loop (e.g. PyGithub)
  • Non-idempotent webhook handling causes state jitter
  • Without structured audit, failures cannot be held accountable
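The first risk can be demonstrated and mitigated with `asyncio.to_thread` (Python 3.9+), which runs a blocking call in a worker thread while the event loop keeps serving other tasks. In this sketch `blocking_fetch` is a hypothetical stand-in for any synchronous client call, and the heartbeat task exists only to show that the loop stays responsive.

```python
import asyncio
import time


def blocking_fetch(delay: float) -> str:
    """Hypothetical stand-in for a synchronous client call."""
    time.sleep(delay)
    return "done"


async def heartbeat(stop: asyncio.Event, ticks: list) -> None:
    """Record a timestamp every 10 ms for as long as the loop is free to run."""
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def main(delay: float = 0.2):
    stop = asyncio.Event()
    ticks = []
    beat = asyncio.create_task(heartbeat(stop, ticks))
    # Calling blocking_fetch(delay) directly here would freeze the heartbeat;
    # to_thread moves it off the event loop thread.
    result = await asyncio.to_thread(blocking_fetch, delay)
    stop.set()
    await beat
    return result, len(ticks)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The tick count is a cheap regression signal: if it drops to one or two during a call, something synchronous has crept back onto the loop.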
copy to codex — module 5
You are the Codex (executor). Please implement Module 5 (stability and stress test scripts).

Task:
1) Write an SSE soak script:
   - 50 client connections to /api/events
   - Run for 2 hours; count disconnections and record the maximum delay
2) Write a webhook flood script:
   - Send bad-signature requests to /webhooks/github at 50 rps
   - Monitor whether P95 latency of the normal API (/api/tasks) increases significantly
3) Write a concurrent stress test (locust/k6 or a simple asyncio script):
   - 100 concurrent GET /api/tasks
   - 10 concurrent approve/reject
   - 50 simultaneous SSE connections
4) Result output:
   - p50/p95/p99 latency
   - SSE latency distribution / disconnection statistics
   - Whether ack_timeout / heartbeat_warning were triggered and logged (written to audit JSONL)

acceptance:
- SSE shows no prolonged freezes during the stress test.
- Read endpoints P95 < 200 ms, write endpoints P95 < 500 ms (local reference values).
- All alerts/timeouts are written to audit (retrievable).
- If sync I/O exists, it must be wrapped in a thread pool/executor so it does not block the event loop.
6) Final

ACCEPTANCE CHECKLIST

For final acceptance (also suitable as CI gates): each phase must be reproducible, scriptable, and auditable.

Must Pass
  • Module 1: Read/write loop closed; output stable; tests pass
  • Module 2: SSE broadcast stable; multiple clients consistent
  • Module 3: Signature verification correct; processing idempotent; audit retrievable
  • Module 4: ACK echo & issues traceable; fidelity audit passes
  • Module 5: Load-test thresholds met; alerts/timeouts have audit records
Recommended Records
  • git commit hash / config summary for each run
  • Load test report (p50/p95/p99)
  • Audit log samples (covering event types)