
Data Correctness Gotchas

This page documents where ConvEngine can produce incorrect, stale, or misleading output without necessarily crashing. It also lists framework constraints and a concrete roadmap to make the system safer, easier to operate, and more scalable.

High-risk improper data scenarios

Top correctness failure modes

| Gotcha | Trigger | Bad output pattern | Fast detection | Practical mitigation |
| --- | --- | --- | --- | --- |
| LLM JSON overrides engine intent/state | Response JSON includes `intent` or `state`; resolver applies those fields directly. | Flow silently jumps to wrong state/intent on next turn. | Compare pre/post state around `RESOLVE_RESPONSE_LLM_OUTPUT` and final persisted conversation row. | Allowlist transitions, reject unknown states, and gate model-driven state changes behind validator hooks. |
| Sticky intent routes new question to old flow | `STICKY_INTENT=true` + active incomplete schema collection + no explicit switch/reset signal. | User asks new intent but gets response from previous intent templates/rules. | Frequent `INTENT_RESOLVE_SKIPPED_STICKY_INTENT` on semantically new user text. | Force re-resolve on low confidence turns, topic-shift heuristics, and explicit UI switch-intent controls. |
| Rule chain side effects create wrong final state | Multiple matching rules mutate intent/state across passes. | Technically valid response, semantically wrong transition path. | Inspect ordered `RULE_MATCH`/`RULE_NO_MATCH` events by pass and phase. | Constrain rule ownership, cap cross-intent actions, and add collision regression tests for rule priority. |
| Context merge preserves stale fields | Schema extraction merges partial JSON into existing context and does not clear incompatible prior fields. | Old values leak into later prompts and responses. | Diff context pre/post schema extraction and check unchanged keys across intent/state changes. | Add per-intent field reset policy and explicit stale-key eviction on transition boundaries. |
| Concurrent writes race on same conversation | Parallel requests share same `conversationId` with no optimistic version check. | Last-write-wins overwrites can produce drifted context/intent/state. | Look for overlapping timestamps and contradictory stage order for one conversation. | Serialize by conversation key at API boundary or add optimistic locking (`@Version`). |
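
The "allowlist transitions, reject unknown states" mitigation for model-driven state changes could look roughly like the sketch below. `TransitionGuard`, its method, and the state names are hypothetical and not part of ConvEngine; the sketch only assumes that the current state and the model-proposed state are available as strings before the conversation is persisted.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical guard: accepts a model-proposed state only if the allowlist permits it. */
final class TransitionGuard {
    // Allowed next states per current state. The states used with this class are illustrative.
    private final Map<String, Set<String>> allowed;

    TransitionGuard(Map<String, Set<String>> allowed) {
        this.allowed = allowed;
    }

    /** Returns the proposed state if the transition is allowlisted, otherwise keeps the current state. */
    String resolve(String currentState, String proposedState) {
        if (proposedState == null) {
            return currentState; // the model did not propose a change
        }
        Set<String> next = allowed.getOrDefault(currentState, Set.of());
        return next.contains(proposedState) ? proposedState : currentState;
    }
}
```

A rejected proposal here silently keeps the current state; a real integration would probably also emit an audit stage so that rejected transitions stay visible.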

Additional correctness traps

Less obvious but important

| Area | What happens | Why it matters | Recommended guardrail |
| --- | --- | --- | --- |
| Prompt variable exposure | All `inputParams` are exposed as prompt vars (`promptTemplateVars` path). | Unexpected/sensitive params can influence model behavior or leak into prompts. | Move to explicit allowlist of exposable keys and redact secret-like fields. |
| LLM invocation ThreadLocal lifecycle | `LlmInvocationContext.set(...)` is used in several flows without matching clear in same path. | Pooled-thread reuse can cause wrong attribution/context bleed in custom `LlmClient` logging. | Wrap set/clear in `try/finally` in every LLM call site. |
| Silent exception swallowing | Several context/container merge failures are ignored to keep pipeline running. | Turn completes with partial enrichment, giving plausible but incorrect output. | Emit warning stages with structured reason and defaulted fields when fallback path is taken. |
| History reconstruction quality | Conversation history is reconstructed from audit stages (not canonical turn table). | Missing or filtered audit events can distort LLM context history. | Persist normalized user/assistant turn table and use that as primary history source. |
| MCP tool safety assumptions | `safe_mode` exists on DB tool config but execution path does not enforce separate safe-mode logic. | Risk of over-trusting tool configuration for sensitive data queries. | Enforce explicit read-only policy, SQL allowlist checks, and per-tool security policy validation. |
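
The `try/finally` guardrail for the ThreadLocal lifecycle can be sketched as follows. `InvocationContextSketch` is a stand-in written for this example; the real `LlmInvocationContext` API is not shown on this page, so the method names and the `String` payload are assumptions.

```java
import java.util.function.Supplier;

/** Stand-in for the framework's LlmInvocationContext; names and payload type are assumed. */
final class InvocationContextSketch {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    /** Value visible to code running on the current thread, or null if none is set. */
    static String current() {
        return CURRENT.get();
    }

    /** Runs the call with the context set and always removes it afterwards, even on exceptions. */
    static <T> T withContext(String conversationId, Supplier<T> llmCall) {
        CURRENT.set(conversationId);
        try {
            return llmCall.get();
        } finally {
            CURRENT.remove(); // pooled threads must not carry this value into the next request
        }
    }
}
```

Funnelling every call site through one wrapper like this makes a missing clear a structural impossibility rather than a code-review item.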

Framework limitations (current behavior)

Design constraints to plan around

| Limitation | Current behavior | Consumer impact | Better direction |
| --- | --- | --- | --- |
| No full-turn ACID boundary | Turn state is updated across multiple independent saves/steps. | Partial turn artifacts can persist after mid-pipeline failures. | Introduce transactional turn boundary or compensating state machine. |
| No built-in optimistic concurrency for conversation row | `ce_conversation` entity has no versioned conflict detection. | Racing requests overwrite each other nondeterministically. | Add version column + conflict retry/merge strategy. |
| Heavy dynamic config with mixed refresh model | Some configs are read at startup; others queried per turn from DB. | Operational behavior can be hard to reason about after config updates. | Document refresh semantics per config and add explicit reload controls. |
| In-memory scan selection paths | `findAll().stream()` used in response/template/schema selection in hot path. | Larger config tables increase latency and raise variability. | Replace with indexed targeted queries + bounded caches. |
| Single-process assumptions for ordering | Conversation ordering and mutation safety are mostly local-process concerns. | Horizontal scale can amplify race and ordering ambiguity. | Adopt distributed per-conversation lock/queue semantics. |
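
To make the "version column + conflict retry" direction concrete, here is an in-memory sketch of the version check itself. `VersionedRow` and `ConversationStoreSketch` are hypothetical and only mimic what a versioned `ce_conversation` row would do; with JPA, a `@Version` field has the persistence provider perform the equivalent comparison when it writes the row.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Immutable snapshot of a conversation row (hypothetical shape). */
final class VersionedRow {
    final long version;
    final String state;

    VersionedRow(long version, String state) {
        this.version = version;
        this.state = state;
    }
}

/** In-memory stand-in for a conversation table with optimistic version checks. */
final class ConversationStoreSketch {
    private final ConcurrentMap<String, VersionedRow> rows = new ConcurrentHashMap<>();

    void create(String id, String state) {
        rows.put(id, new VersionedRow(0, state));
    }

    VersionedRow load(String id) {
        return rows.get(id);
    }

    /** Writes only if the caller still holds the latest version; returns false on a conflict. */
    boolean save(String id, long expectedVersion, String newState) {
        VersionedRow current = rows.get(id);
        if (current == null || current.version != expectedVersion) {
            return false; // someone else wrote first: caller should reload and retry or merge
        }
        // Atomic swap: fails if another writer replaced the row between the read and this call.
        return rows.replace(id, current, new VersionedRow(expectedVersion + 1, newState));
    }
}
```

A caller that receives `false` has to decide between reloading and re-running the turn, or merging; that policy is the "conflict retry/merge strategy" part and is not covered by this sketch.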

Improvement roadmap

Phase 1: Correctness hardening (highest priority)

Immediate improvements

| Change | Value | Effort | Notes |
| --- | --- | --- | --- |
| Add response state-transition validator | Prevents invalid model-driven intent/state jumps. | Low-Medium | Apply before `PersistConversationStep`; reject or quarantine invalid transitions. |
| Add optimistic locking on `ce_conversation` | Eliminates silent last-write-wins corruption. | Medium | Introduce `version` field + retry policy for conflict cases. |
| Guarantee `LlmInvocationContext.clear()` | Prevents cross-request attribution bleed. | Low | Wrap all LLM call sites with `try/finally`. |
| Prompt-var allowlist + redaction | Reduces prompt contamination and secret leakage risk. | Low-Medium | Replace current expose-all behavior with explicit allowlist. |
| Emit structured fallback warnings | Makes silent degradations diagnosable. | Low | When a catch-ignore path is hit, emit a dedicated stage with cause metadata. |
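
One possible shape for the prompt-var allowlist with redaction is sketched below. `PromptVarFilter`, the hint list, and the `[REDACTED]` placeholder are assumptions made for illustration; the sketch only assumes that `inputParams` is available as a map before prompt variables are built.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical filter: only allowlisted keys reach the prompt, and secret-like keys are masked. */
final class PromptVarFilter {
    // Substrings that mark a key as secret-like. This list is illustrative, not exhaustive.
    private static final Set<String> SECRET_HINTS = Set.of("password", "secret", "token", "apikey");

    static Map<String, Object> filter(Map<String, Object> inputParams, Set<String> allowedKeys) {
        Map<String, Object> exposed = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : inputParams.entrySet()) {
            String key = entry.getKey();
            if (key == null || !allowedKeys.contains(key)) {
                continue; // not allowlisted: never exposed to the prompt
            }
            exposed.put(key, looksSecret(key) ? "[REDACTED]" : entry.getValue());
        }
        return exposed;
    }

    private static boolean looksSecret(String key) {
        String normalized = key.toLowerCase(Locale.ROOT).replace("_", "");
        for (String hint : SECRET_HINTS) {
            if (normalized.contains(hint)) {
                return true;
            }
        }
        return false;
    }
}
```

Redaction is applied even to allowlisted keys, so a secret-like key that is allowlisted by mistake still does not leak its value.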

Phase 2: Convenience and developer experience

Make ConvEngine easier to run

| Enhancement | Why it helps | Suggested shape |
| --- | --- | --- |
| Config lint + dry-run validator | Catches broken rules/templates before production impact. | CLI/admin endpoint validating response mappings, rule loops, unresolved template vars, and MCP tool safety. |
| State transition map visualization | Helps teams reason about intent/state flows and side effects. | Generate graph from rules + responses + schema transitions with dead-end and collision markers. |
| Deterministic replay mode | Simplifies debugging wrong-output incidents. | Replay one conversation against snapshot config and compare expected vs actual transitions. |
| Scenario test harness | Moves quality from ad-hoc manual checks to repeatable tests. | YAML/JSON test fixtures for input turn sequences, expected intent/state, payload assertions. |
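
The core loop of a scenario test harness might look like the sketch below. `ScenarioTurn` and `ScenarioRunner` are hypothetical, fixture loading from YAML/JSON is left out, and the engine is reduced to an assumed function from (current state, user text) to next state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/** One fixture step: what the user says and which state the engine should end up in. */
final class ScenarioTurn {
    final String userText;
    final String expectedState;

    ScenarioTurn(String userText, String expectedState) {
        this.userText = userText;
        this.expectedState = expectedState;
    }
}

/** Hypothetical runner that replays turns and reports state mismatches. */
final class ScenarioRunner {
    /** Feeds each turn through the engine function and collects one message per mismatch (0-based turn index). */
    static List<String> run(String initialState,
                            List<ScenarioTurn> turns,
                            BiFunction<String, String, String> engine) {
        List<String> failures = new ArrayList<>();
        String state = initialState;
        for (int i = 0; i < turns.size(); i++) {
            ScenarioTurn turn = turns.get(i);
            state = engine.apply(state, turn.userText);
            if (!turn.expectedState.equals(state)) {
                failures.add("turn " + i + ": expected " + turn.expectedState + " but got " + state);
            }
        }
        return failures;
    }
}
```

The runner keeps going after a mismatch and carries the actual state forward, so one early divergence may produce several follow-on failures; stopping at the first mismatch is an equally reasonable design.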

Phase 3: Scalability and throughput

Scale safely

| Scalability upgrade | Benefit | Implementation direction |
| --- | --- | --- |
| Targeted repository queries | Cuts CPU and DB transfer overhead in hot path. | Replace `findAll().stream()` with indexed query methods for intent/state/priority. |
| Hot config cache with version invalidation | Stabilizes latency under high QPS. | Cache rules/templates/schemas per intent-state; invalidate on config change event. |
| Per-conversation serialized execution | Removes race conditions while scaling horizontally. | Route by conversation key to queue/partition; process one active turn per key. |
| Async/parallelizable non-critical enrichments | Reduces end-to-end response time. | Move optional container/MCP enrichments off critical path with explicit timeout budget. |
| Canonical turn store | Improves history quality and replay reliability. | Persist normalized user/assistant turns instead of reconstructing from audit stream. |
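
As an illustration of caching templates per intent-state instead of scanning with `findAll().stream()` on every turn, the sketch below builds a lookup index once and answers each selection with a single map lookup. `TemplateRow` and `TemplateIndex` are hypothetical, the rule that a lower priority number wins is an assumption, and invalidation is reduced to rebuilding the index when configuration changes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical config row; field names are illustrative. */
final class TemplateRow {
    final String intent;
    final String state;
    final int priority;
    final String body;

    TemplateRow(String intent, String state, int priority, String body) {
        this.intent = intent;
        this.state = state;
        this.priority = priority;
        this.body = body;
    }
}

/** Index built once per config version; lookups avoid scanning all rows. */
final class TemplateIndex {
    private final Map<List<String>, List<TemplateRow>> byIntentState = new HashMap<>();

    TemplateIndex(List<TemplateRow> rows) {
        for (TemplateRow row : rows) {
            byIntentState
                    .computeIfAbsent(Arrays.asList(row.intent, row.state), k -> new ArrayList<>())
                    .add(row);
        }
        // Assumption: a lower priority number wins, so each bucket is sorted ascending.
        for (List<TemplateRow> bucket : byIntentState.values()) {
            bucket.sort(Comparator.comparingInt((TemplateRow r) -> r.priority));
        }
    }

    /** Returns the best-priority row for the intent/state pair, or null if none is configured. */
    TemplateRow best(String intent, String state) {
        List<TemplateRow> bucket = byIntentState.get(Arrays.asList(intent, state));
        return bucket == null ? null : bucket.get(0);
    }
}
```

Because the index is immutable after construction, a config change event can swap in a freshly built instance without locking readers.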

Confirmed vs inferred

Confirmed from repository code: state overrides from JSON output, sticky-intent skip behavior, rule pass mutation loops, context merge semantics, broad prompt-var exposure, no optimistic lock field on conversation entity, and `findAll()` hot-path scans.

Inferred (consumer-dependent): ingress concurrency controls, infra-level ordering guarantees, and custom `LlmClient` timeout/retry strategy.