Data Correctness Gotchas (Current v2 Line)
This page is still relevant, but it needed a reset. The current v2 framework adds stronger guardrails, scoped MCP configuration, verbose telemetry, and better prompt/runtime metadata. Even with those upgrades, the engine can still produce semantically wrong output without crashing.
This page focuses on the failure modes that still matter in the current repo, what the newer v2 features already improved, and what consumers should still protect at integration time.
What v2 improved already
Compared with the early v2 line, the current framework reduces several older classes of mistakes:
- `ce_mcp_tool`, `ce_mcp_planner`, `ce_pending_action`, and `ce_verbose` are startup-validated for scope/integrity
- MCP scope is explicit (`ANY`/`UNKNOWN`), not ambiguous null wildcard behavior
- `CorrectionStep` can keep confirmation/edit/retry turns in-place instead of forcing unnecessary reclassification
- `ce_verbose` and step telemetry make degraded or skipped paths easier to detect
- `POST_SCHEMA_EXTRACTION` and `PRE_AGENT_MCP` phases provide cleaner rule insertion points
- `ce_mcp_planner` makes MCP prompt selection deterministic by intent/state scope instead of relying only on legacy config
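To make the difference between an explicit `ANY` scope and a null wildcard concrete, here is a minimal Java sketch. `ScopedRow`, `ScopeResolver`, and the narrowest-first ordering are hypothetical illustrations, not the framework's classes or its actual resolution order. The sketch models only the `ANY` sentinel; how the engine treats `UNKNOWN` is not shown.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

/** Hypothetical scoped config row: intent and state scope must always be explicit. */
record ScopedRow(String name, String intentCode, String stateCode) {
    static final String ANY = "ANY";

    ScopedRow {
        // A null scope is rejected outright; "applies everywhere" must be spelled ANY.
        Objects.requireNonNull(intentCode, "intentCode must be explicit (use ANY)");
        Objects.requireNonNull(stateCode, "stateCode must be explicit (use ANY)");
    }

    boolean matches(String intent, String state) {
        return (ANY.equals(intentCode) || intentCode.equals(intent))
            && (ANY.equals(stateCode) || stateCode.equals(state));
    }

    /** Higher means narrower: an exact intent outranks an exact state, and both outrank ANY. */
    int specificity() {
        return (ANY.equals(intentCode) ? 0 : 2) + (ANY.equals(stateCode) ? 0 : 1);
    }
}

final class ScopeResolver {
    private ScopeResolver() {}

    /** Rows that apply to this intent/state, narrowest scope first (stable for ties). */
    static List<ScopedRow> applicable(List<ScopedRow> rows, String intent, String state) {
        return rows.stream()
            .filter(row -> row.matches(intent, state))
            .sorted((a, b) -> Integer.compare(b.specificity(), a.specificity()))
            .collect(Collectors.toList());
    }
}
```

Because a missing scope fails at construction time, a row meant to be intent-specific cannot silently become global. This mirrors the goal of startup validation.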
Those are real upgrades, but they do not eliminate correctness risk. They mostly make bad behavior easier to prevent and easier to diagnose.
Highest-risk correctness failures that still exist
Current top correctness risks
| Risk | Trigger | Bad output pattern | Fast detection | Practical mitigation |
|---|---|---|---|---|
| Concurrent same-conversation writes | Two requests hit the same `conversationId` at nearly the same time. | Last-write-wins state/context drift even though each individual turn looks valid. | Compare `updated_at`, audit ordering, and final persisted `ce_conversation` row for overlapping turns. | Serialize by `conversationId` at ingress or add optimistic locking around `ce_conversation`. |
| Rule collisions produce valid-but-wrong transitions | Multiple `ce_rule` rows match across phases or priorities and mutate intent/state in sequence. | Conversation lands in a reachable state, but not the one the business flow intended. | Review `RULE_MATCH`, `RULE_APPLIED`, and final intent/state across all phases, not just `PRE_RESPONSE_RESOLUTION`. | Keep rule ownership narrow, minimize cross-intent mutations, and regression-test critical paths. |
| Stale context survives topic or state changes | A new turn merges into existing context without evicting fields that are no longer valid for the new flow. | Old facts leak into prompt rendering, schema completeness checks, or final responses. | Diff `context_json` before and after intent/state changes and look for old keys that should have been dropped. | Define explicit reset rules on transition boundaries and clear stale fields when switching flows. |
| Consumer exposes too much prompt data | Broad `inputParams` or ad hoc metadata is allowed to influence prompt rendering. | The model produces plausible output based on accidental or sensitive prompt variables. | Inspect rendered prompt inputs during `INTENT_*`, `SCHEMA_*`, `MCP_PLAN_*`, and `RESOLVE_RESPONSE_*` LLM paths. | Treat prompt exposure as an allowlist contract and keep internal-only keys out of prompt vars. |
| Missing response coverage for reachable states | Rules, MCP, or pending actions move the session into a state with no usable `ce_response` / `ce_prompt_template` mapping. | The turn completes but returns fallback, empty, or misleading generic text. | Build trace shows state transition succeeded but response selection falls back or misses expected template/response rows. | Audit every reachable state and make response coverage part of config review. |
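The concurrency risk is usually cheapest to remove at ingress. Below is a minimal sketch, assuming a single JVM handles all turns for a given conversation. `ConversationTurnSerializer` is a hypothetical helper. It picks a lock from a fixed set of stripes by `conversationId` and runs each turn while holding that lock, so two turns for the same ID never overlap.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/**
 * Hypothetical ingress guard: at most one turn per conversationId runs at a time
 * within this JVM. Locks are striped so memory stays bounded for any number of IDs.
 */
final class ConversationTurnSerializer {
    private final ReentrantLock[] stripes;

    ConversationTurnSerializer(int stripeCount) {
        if (stripeCount <= 0) {
            throw new IllegalArgumentException("stripeCount must be positive");
        }
        stripes = new ReentrantLock[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            // Fair locks hand the stripe to waiting turns roughly in arrival order.
            stripes[i] = new ReentrantLock(true);
        }
    }

    /** Runs the turn while holding the stripe lock that owns this conversationId. */
    <T> T runTurn(String conversationId, Supplier<T> turn) {
        ReentrantLock lock = stripes[Math.floorMod(conversationId.hashCode(), stripes.length)];
        lock.lock();
        try {
            return turn.get();
        } finally {
            lock.unlock();
        }
    }
}
```

Striping bounds memory, at the cost of occasional false contention between unrelated conversations that hash to the same stripe. It does not help once turns for one conversation can land on different nodes. That case needs partitioned workers or a distributed lock.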
Subtle traps introduced by newer flexibility
The newer v2 line is more capable, but that flexibility also opens the door to new configuration mistakes when consumers are careless.
Modern v2 flexibility traps
| Area | What can go wrong | Why it matters now | Recommended guardrail |
|---|---|---|---|
| `CorrectionStep` routing | A prompt row claims to allow `affirm`, `edit`, or `retry`, but the actual state contract is not safe for in-place reuse. | The engine may skip schema extraction or intent resolution when the consumer really needed a full recompute. | Use `interaction_mode` / `interaction_contract` only when the state really supports in-place continuation. |
| `SET_INPUT_PARAM` rule action | Rules can now mutate request-level values mid-pipeline. | Small config mistakes can alter downstream tool calls, prompt variables, or response behavior in non-obvious ways. | Restrict `SET_INPUT_PARAM` to tightly scoped, auditable keys and keep a change log for those rules. |
| Scoped MCP rows | A tool/planner row is syntactically valid but scoped too broadly with `ANY` when it should be intent-specific. | The engine stays deterministic, but the business blast radius of a tool becomes larger than intended. | Default to exact intent/state scope first; widen to `ANY` only when the tool is truly global. |
| `ce_verbose` messages | Verbose rows tell the user or UI a misleading progress story even while the engine remains technically correct. | Support teams may trust the progress text more than the actual state transition. | Treat `ce_verbose` as a tested UI contract, not cosmetic copy. |
| Prompt renderer power | Shared Thymeleaf rendering now applies across prompts and verbose messages. | Template bugs can affect multiple runtime surfaces instead of just one prompt row. | Lint templates before release and test variable availability per step. |
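A pre-release lint over seeded templates can test variable availability per step and enforce the prompt-variable allowlist in the same pass. The sketch below is hypothetical: `TemplateVariableLint` and the per-step allowlist are consumer-side assumptions. The regex only approximates Thymeleaf `${...}` variable expressions. It ignores `*{...}` selection expressions and skips expressions that start with a utility object such as `#strings`, so treat it as a smoke check rather than a parser.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical lint: every root variable a template reads through ${...} must be
 * on the allowlist of the step that renders it. Regex-based, not a Thymeleaf parser.
 */
final class TemplateVariableLint {
    // Captures the first identifier after "${", e.g. "user" in ${user.name}.
    private static final Pattern VARIABLE_ROOT =
        Pattern.compile("\\$\\{\\s*([A-Za-z_][A-Za-z0-9_]*)");

    private TemplateVariableLint() {}

    static Set<String> referencedRoots(String template) {
        Set<String> roots = new TreeSet<>();
        Matcher m = VARIABLE_ROOT.matcher(template);
        while (m.find()) {
            roots.add(m.group(1));
        }
        return roots;
    }

    /** Referenced roots the step does not allowlist; an empty set means the template is clean. */
    static Set<String> violations(String template, Set<String> allowedForStep) {
        Set<String> missing = referencedRoots(template);
        missing.removeAll(allowedForStep);
        return missing;
    }
}
```

Every name returned is either a variable the step never provides or an accidental exposure. Both deserve review before release.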
Still-important framework limitations
These are not necessarily bugs. They are design constraints consumers should plan around.
Design limits to plan around
| Limitation | Current behavior | Consumer impact | Safer operating posture |
|---|---|---|---|
| No built-in optimistic locking on `ce_conversation` | The entity does not expose a version field for conflict detection. | Concurrent same-ID turns can overwrite each other nondeterministically. | Enforce one active turn per `conversationId` at the API boundary. |
| Turn work spans multiple step writes | A turn is not wrapped in one global ACID-style business transaction boundary. | Partial artifacts can persist across failures or stop paths. | Make trace review part of incident debugging and prefer compensating logic over hidden assumptions. |
| Data-driven behavior can still be misconfigured | The framework validates structure, but not every business semantic mistake in rules/prompts/responses. | A startup-clean system can still behave incorrectly for specific conversations. | Test seeded configurations as a product artifact, not just Java code. |
| Conversation ordering is still a consumer concern at scale | The framework does not provide distributed per-conversation queueing by itself. | Horizontal scale can amplify race and ordering ambiguity if ingress is naive. | Use request serialization, partitioned workers, or upstream coordination. |
| Tool safety is only partly framework-enforced | Scope checks, MCP next-tool guardrails, and handler models exist, but business authorization remains consumer-defined. | A correctly scoped tool can still expose data or actions beyond policy if the consumer wires it loosely. | Treat tool execution as a security boundary and add business-policy checks in handlers. |
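Because `ce_conversation` has no version field, a consumer that wants optimistic locking has to layer it on. The following in-memory sketch shows the compare-and-set idea. `VersionedConversationStore`, `ConversationSnapshot`, and the consumer-managed `version` counter are hypothetical. A database-backed version would express the same check as a conditional `UPDATE ... WHERE version = ?` that must affect exactly one row.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical snapshot of a conversation row plus a consumer-managed version. */
record ConversationSnapshot(String conversationId, String intentCode, String stateCode,
                            String contextJson, long version) {}

final class StaleWriteException extends RuntimeException {
    StaleWriteException(String conversationId, long expectedVersion) {
        super("conversation " + conversationId + " changed since version " + expectedVersion);
    }
}

/** In-memory stand-in for a store that rejects writes based on an outdated read. */
final class VersionedConversationStore {
    private final Map<String, ConversationSnapshot> rows = new ConcurrentHashMap<>();

    Optional<ConversationSnapshot> load(String conversationId) {
        return Optional.ofNullable(rows.get(conversationId));
    }

    /** Commits only if nobody else committed since expectedVersion was read (0 = new row). */
    ConversationSnapshot save(String conversationId, String intent, String state,
                              String contextJson, long expectedVersion) {
        ConversationSnapshot next = new ConversationSnapshot(
            conversationId, intent, state, contextJson, expectedVersion + 1);
        // compute() runs atomically per key; throwing inside it leaves the mapping unchanged.
        return rows.compute(conversationId, (id, current) -> {
            long currentVersion = current == null ? 0 : current.version();
            if (currentVersion != expectedVersion) {
                throw new StaleWriteException(id, expectedVersion);
            }
            return next;
        });
    }
}
```

The losing writer gets an exception instead of silently overwriting the winner. That turns last-write-wins drift into a visible, retryable conflict.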
What to watch in current traces
If a conversation looks "wrong" but did not crash, these are the fastest places to inspect:
- final persisted `ce_conversation.intent_code`, `state_code`, `context_json`
- `EngineSession.stepInfos` via trace output
- `RULE_MATCH`/`RULE_APPLIED` ordering across phases
- `ROUTING_DECISION` values set by `CorrectionStep`
- `context.mcp.lifecycle.*` and `context.mcp.toolExecution.*`
- `TOOL_ORCHESTRATION_*` and `MCP_*` verbose/audit events
- `RESOLVE_RESPONSE_SELECTED` vs the final user-visible payload
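For the `context_json` item, a small diff helper can flag keys that rode through an intent/state change untouched. This is a hypothetical heuristic sketch. It assumes the before and after `context_json` values are already parsed into top-level maps (JSON parsing is not shown). It also assumes each consumer defines its own carry-over allowlist per flow.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical trace check for stale context surviving a flow change. */
final class StaleContextCheck {
    private StaleContextCheck() {}

    /**
     * Keys present before and after a flow change with an unchanged value that are
     * not declared safe to carry over. Returns an empty set when the flow did not change.
     */
    static Set<String> suspiciousCarryOver(Map<String, Object> before,
                                           Map<String, Object> after,
                                           boolean flowChanged,
                                           Set<String> carryOverAllowed) {
        Set<String> suspicious = new TreeSet<>();
        if (!flowChanged) {
            return suspicious; // same intent/state: carry-over is expected
        }
        for (Map.Entry<String, Object> entry : before.entrySet()) {
            String key = entry.getKey();
            boolean survivedUnchanged = after.containsKey(key)
                && Objects.equals(after.get(key), entry.getValue());
            if (survivedUnchanged && !carryOverAllowed.contains(key)) {
                suspicious.add(key);
            }
        }
        return suspicious;
    }
}
```

Keys whose values were rewritten in the new flow are not flagged, on the assumption that they were deliberately refreshed. Only unchanged survivors are treated as suspicious.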
Current hardening checklist
What mature consumers should enforce
| Control | Why it matters | Where to implement |
|---|---|---|
| Single active turn per conversation | Prevents the most damaging state corruption class in the current framework. | API gateway, controller, queue, or distributed lock layer |
| Config regression tests | Most modern failures are configuration mistakes, not framework crashes. | Fixture tests around seeded `ce_*` rows |
| Explicit response coverage audit | Reachable states without responses create misleading fallback behavior. | Pre-release DML review and smoke tests |
| Tool policy review | Broad `ANY` scope or under-protected handlers can create silent data risk. | `ce_mcp_tool` design + consumer-side handler implementation |
| Prompt-variable hygiene | The current renderer is powerful enough to amplify accidental variable exposure. | Prompt seeding discipline + consumer metadata filtering |
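The response coverage audit can run as a plain set check in a fixture test. A minimal sketch, assuming the consumer has already extracted the reachable intent/state pairs and the scopes of its response rows from seeded `ce_*` data. `IntentState` and `ResponseCoverageAudit` are hypothetical. The `ANY` wildcard branch assumes response rows can be scoped that way; drop it if yours cannot.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical intent/state pair, used both for reachable states and response scopes. */
record IntentState(String intent, String state) {}

final class ResponseCoverageAudit {
    static final String ANY = "ANY";

    private ResponseCoverageAudit() {}

    private static boolean covers(IntentState responseScope, IntentState target) {
        return (ANY.equals(responseScope.intent()) || responseScope.intent().equals(target.intent()))
            && (ANY.equals(responseScope.state()) || responseScope.state().equals(target.state()));
    }

    /** Reachable pairs with no matching response scope, sorted for stable test output. */
    static List<IntentState> uncovered(Set<IntentState> reachable, List<IntentState> responseScopes) {
        return reachable.stream()
            .filter(target -> responseScopes.stream().noneMatch(scope -> covers(scope, target)))
            .sorted(Comparator.comparing(IntentState::intent).thenComparing(IntentState::state))
            .collect(Collectors.toList());
    }
}
```

Each pair returned is a reachable state that would fall back to generic text. A non-empty result is therefore a reasonable release blocker.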
Relevance summary
The old page was directionally useful, but parts of it were anchored to early v2 behavior. The current version of this page is still relevant because the core correctness themes remain:
- race conditions are still the biggest operational correctness risk
- rule/config drift is still the biggest semantic correctness risk
- stale context and incomplete response coverage still produce believable but wrong output
What changed is that current v2 gives you better tools to detect and constrain those issues:
- explicit scope validation
- richer rule phases
- correction routing
- verbose runtime signals
- step telemetry
- scoped MCP planner behavior
Current v2 is safer and more diagnosable than the original release line, but it is still a highly configurable engine. Most production correctness issues now come from configuration design and concurrency policy, not from missing core framework primitives.