Improvement Backlog
This page turns the correctness and scalability recommendations into implementation-ready work items. Use it as the execution plan after reading Data Correctness Gotchas.
Phase 1: Correctness-critical tickets
P0/P1 backlog
| Ticket | Problem | Implementation | Acceptance criteria | Effort |
|---|---|---|---|---|
| CE-001: State transition guard | Model output can force invalid `intent/state` jumps. | Add transition validator before final persistence; block or quarantine disallowed transitions. | Invalid transitions never persist; blocked transitions produce explicit error stage and fallback response. | M |
| CE-002: Conversation optimistic locking | Concurrent same-conversation writes produce last-write-wins corruption. | Add `version` column and optimistic lock handling on conversation updates. | Conflicting parallel updates produce deterministic conflict behavior (retry or fail with known code). | M |
| CE-003: LLM context lifecycle safety | ThreadLocal LLM context can leak across pooled threads if not cleared. | Wrap every `LlmInvocationContext.set(...)` in `try/finally` + `clear()`. | No stale context observed under stress test with mixed conversation IDs. | S |
| CE-004: Prompt variable allowlist | All input params are currently exposable to prompt rendering. | Introduce allowlist + redaction for sensitive/unexpected keys. | Only approved prompt keys appear in rendered prompt payloads. | M |
| CE-005: Stale context eviction rules | Partial schema merges can keep incompatible old fields. | Add per-intent/state field-retention policy and evict on transitions. | Transition tests show old incompatible fields are removed deterministically. | M |
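
As an illustration of CE-001, the following is a minimal sketch of a transition validator that runs before persistence. Everything in it is an assumption made for the example: the `TransitionGuard` and `Decision` names, the allowlist shape, and the sample states are hypothetical and are not taken from the codebase. It shows only the core idea that a disallowed jump is never returned as the state to persist.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical guard for CE-001; class, method, and state names are illustrative. */
final class TransitionGuard {

    /** Outcome of a check: the state that may be persisted, plus a flag for the error/fallback path. */
    static final class Decision {
        final String stateToPersist;
        final boolean blocked;

        Decision(String stateToPersist, boolean blocked) {
            this.stateToPersist = stateToPersist;
            this.blocked = blocked;
        }
    }

    // Allowlist: current state -> states it may move to. Anything absent is rejected.
    private final Map<String, Set<String>> allowed;

    TransitionGuard(Map<String, Set<String>> allowed) {
        this.allowed = allowed;
    }

    /** True only when the move from `from` to `to` is explicitly allowlisted. */
    boolean isAllowed(String from, String to) {
        if (from == null || to == null) {
            return false; // immutable Map.of/Set.of collections reject null lookups
        }
        Set<String> targets = allowed.get(from);
        return targets != null && targets.contains(to);
    }

    /** Keeps the current state when the model proposes a disallowed jump. */
    Decision check(String current, String proposed) {
        if (isAllowed(current, proposed)) {
            return new Decision(proposed, false);
        }
        return new Decision(current, true);
    }
}
```

A caller would persist `stateToPersist` and, when `blocked` is true, record the explicit error stage and return the fallback response. Note that in this sketch a self-transition is legal only if it appears in the allowlist; a real policy may want to treat "stay in place" differently.
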
Phase 2: Operability and convenience tickets
Developer and operator UX
| Ticket | Problem | Implementation | Acceptance criteria | Effort |
|---|---|---|---|---|
| CE-006: Config lint and dry-run | Broken rules/prompts are discovered too late. | Add validator command/endpoint for response mapping coverage, unresolved vars, rule loops, MCP safety checks. | Invalid config sets fail lint in CI and are blocked from promotion. | M |
| CE-007: Deterministic replay tool | Wrong-output incidents are hard to reproduce. | Replay conversation turns against frozen config snapshot and compare expected vs actual transitions. | At least one production incident can be replayed locally with identical state progression. | M-L |
| CE-008: Scenario test harness | Manual QA misses edge-path regressions. | Add fixture-driven conversation tests (turn sequence + expected intent/state/output assertions). | Regression suite catches known sticky-intent, rule-collision, and reset-flow bugs. | M |
| CE-009: Transition map generator | State machine behavior is opaque to integrators. | Generate graph from rules/responses/schema transitions with dead-end warnings. | Docs include generated transition map and dead-end detection report. | S-M |
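
The dead-end warning in CE-009 can be pictured with a small sketch. It assumes the generator has already flattened rules, responses, and schema transitions into a plain adjacency map. The `DeadEndDetector` name, the map shape, and the notion of an explicit terminal-state set are hypothetical choices for this example, not the project's design.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical dead-end check for CE-009; names and the graph shape are illustrative. */
final class DeadEndDetector {

    /**
     * Returns every state that has no outgoing transition and is not declared terminal.
     * States that appear only as targets are included in the scan.
     */
    static Set<String> deadEnds(Map<String, Set<String>> transitions, Set<String> terminalStates) {
        // Collect all known states: sources plus every target they point to.
        Set<String> allStates = new TreeSet<>(transitions.keySet());
        for (Set<String> targets : transitions.values()) {
            allStates.addAll(targets);
        }

        // A sorted set keeps the warning report stable between runs.
        Set<String> result = new TreeSet<>();
        for (String state : allStates) {
            Set<String> targets = transitions.get(state);
            boolean hasExit = targets != null && !targets.isEmpty();
            if (!hasExit && !terminalStates.contains(state)) {
                result.add(state);
            }
        }
        return result;
    }
}
```

For example, with transitions `START -> COLLECT`, `COLLECT -> {CONFIRM, STUCK}`, `CONFIRM -> DONE` and `DONE` declared terminal, the only state reported is `STUCK`.
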
Phase 3: Scalability tickets
Throughput and horizontal scale
| Ticket | Problem | Implementation | Acceptance criteria | Effort |
|---|---|---|---|---|
| CE-010: Hot-path query refactor | `findAll().stream()` in request path degrades with config size. | Replace with indexed query methods for response/template/schema selection. | p95 latency remains stable when control-plane rows scale 10x. | M |
| CE-011: Config cache with version invalidation | Repeated config reads increase latency variability. | Add cache per intent/state with invalidation on config mutation. | Cache hit ratio > 90% in steady state without stale-config incidents. | M-L |
| CE-012: Per-conversation execution serialization | Concurrent turns create races as scale increases. | Route requests by conversation key to single active worker/partition. | No race-induced state drift in concurrency stress tests. | L |
| CE-013: Canonical turn store | History reconstructed from audit can be incomplete/noisy. | Persist normalized user/assistant turns and switch history provider to it. | History quality checks pass even when audit levels change. | M-L |
| CE-014: Bounded enrichment budgets | Optional enrichments can inflate synchronous latency. | Apply strict timeout budget for container/MCP enrichments and degrade gracefully. | SLO maintained under downstream slowdown with deterministic fallback behavior. | M |
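
One possible shape for the CE-014 timeout budget is sketched below using the standard `CompletableFuture` API (Java 9+). The `BoundedEnrichment` class and `callWithBudget` method are hypothetical names, and running the enrichment on the default async pool is an assumption; a production version would likely use a dedicated executor and propagate cancellation to the downstream client.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper for CE-014; names are illustrative, not the project's API. */
final class BoundedEnrichment {

    /**
     * Runs an optional enrichment under a strict time budget.
     * Returns the fallback when the call is too slow or fails, so the
     * synchronous request path degrades instead of stalling.
     */
    static <T> T callWithBudget(Supplier<T> enrichment, Duration budget, T fallback) {
        return CompletableFuture.supplyAsync(enrichment)
                // Complete with the fallback if no result arrives within the budget.
                .completeOnTimeout(fallback, budget.toMillis(), TimeUnit.MILLISECONDS)
                // Treat downstream errors the same way as a timeout.
                .exceptionally(err -> fallback)
                .join();
        // Note: the slow task is not cancelled here; it keeps running in the
        // background and its late result is simply ignored.
    }
}
```

Because timeouts and failures both resolve to the same fallback value, the caller sees deterministic behavior regardless of how the downstream enrichment misbehaves.
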
Recommended rollout order
1. CE-003 and CE-001 first (cheap/high impact correctness guards).
2. CE-002 before any high-concurrency scale work.
3. CE-004 and CE-005 before prompt/template expansion.
4. CE-006 + CE-008 to stop regressions while refactoring.
5. CE-010 and CE-011 to stabilize throughput.
6. CE-012 for horizontal scale and race elimination.
7. CE-013 and CE-014 for long-term quality and latency control.

Done criteria for the program
Exit gates
| Gate | Target |
|---|---|
| Correctness | No illegal transition persistence in test suite + canary runtime. |
| Concurrency | No race-induced state drift under parallel same-conversation load tests. |
| Security | Prompt exposure allowlist and MCP safety policy enforcement enabled by default. |
| Scalability | Stable p95/p99 under 10x config growth and peak expected QPS. |
| Operability | Config lint, replay, and scenario tests integrated into release workflow. |
How to use this backlog
Treat each ticket as a tracked ADR-backed change. For every ticket: define owner, rollout guardrails, migration/rollback plan, and evidence artifact (test report or benchmark).