RFC: AI Agent Testing — Phase 1: Historical Validation
Document Conventions (do not remove)
This RFC follows the Qontak RFC Template format for governance — the metadata table, Confluence sections 1–6, and Comment logs are mandatory.
It is also agent-execution-ready: §1 Design References (FE half) + §1 PRD-to-Schema Derivation (BE half), §2 Repo Reading Guide (Detail 2.0) for both layers, mermaid diagrams, §2.G Cross-Layer Contract Verification, and §4 Agent Execution Plan + Verification & Rollback Recipe are complete.
Delivery & project management live elsewhere. This RFC is the technical artifact only — no staffing, effort, timeline, or rollout schedule. Those live in the initiative's delivery/ folder. Until handed to delivery, the Delivery row reads "not yet handed to delivery".
Grounding note (important). The PRD describes target behavior. This RFC is reconciled against the current code in chatbot, chatbot-fe, and qontak-designer (see §2.0 Source Verification). Where the PRD describes behavior that is not yet built, the RFC says so explicitly and scopes it as work. The biggest such gap: FetchRoomConversationsWorker today only fetches rooms and extracts Q/A pairs, then logs — it does not sample, generate AI shadow answers, persist question rows, or update test-case status.
Metadata
| Field | Value | Notes |
|---|---|---|
| Status | RFC (IDEA) | Human label; YAML status: draft |
| DRI | Dimas Fauzi Hidayat | Accountable owner carried from PRD; eng tech-lead co-owner to be named in delivery/ |
| Team | chatbot (BOT squad) | Advisory slug carried from PRD |
| Author(s) | Claude (from PRD + repo grounding) | |
| Reviewers | BOT Backend Lead · BOT Frontend Lead · AI Squad Lead · Data Team (Reza) | Cross-squad: BOT, AI, Data, Platform (Chat Service) |
| Approver(s) | BOT Tech Lead · InfoSec Approver | InfoSec required: historical PII → 3rd-party LLM (Open Q #1) |
| Submitted Date | 2026-06-20 | |
| Last Updated | 2026-06-20 | |
| Target Release | 2026-Q3 | Re-baselined; original "May 2026" dates are past (PRD Open Q #3) |
| Target Quarter | 2026-Q3 | |
| Delivery | not yet handed to delivery | |
| Related | ../prds/historical-validation.md · ../ai-agent-testing-anchor.md | |
| Discussion | #bot-ai-alerts | |
Type: full-stack · Frontend sub-type: new-feature · Backend sub-type: new-feature
Sections at a Glance
- Overview (Design References — FE; PRD-to-Schema Derivation — BE; traceability)
- Technical Design (Repo Reading Guide → end-to-end mermaid → DDL → APIs → cross-layer verification)
- High-Availability & Security
- Backwards Compatibility and Rollout Plan
- Concern, Questions, or Known Limitations
- Comment logs
- Ready for agent execution
1. Overview
Phase 1 of the AI Agent: Testing initiative lets a Qontak SPV/Admin validate an AI Agent against a sample of their own resolved, human-handled conversations before going live. The system samples eligible historical rooms (last 90 days), generates an AI "shadow" answer per extracted customer question (never sent to a real customer), and presents a side-by-side comparison of the human "golden" answer vs the AI answer. The SPV rates each answer thumbs up/down; ratings roll up into a confidence meter. At ≥80% the agent is "Ready to Launch"; an activation gate (Should-Have) prevents go-live below threshold.
This RFC is a delta on substantial existing scaffolding, not a greenfield build. The data model, the four read/write endpoints, the rating use case, the Sidekiq worker shell, the conversation-pair extractor, and the chatbot-fe Pinia store + typed API client already exist (see §2.0). The missing pieces — the engineering core of this phase — are:
- Sampling (10% / 50–70 cap) in the worker.
- Shadow-answer generation (LLM call per question) + persistence of ai_agent_test_case_questions rows.
- Test-case status lifecycle (pending → processing → completed/failed).
- Confidence-score aggregate recompute on rating.
- Activation gate in publish.
- Tree-diagram average-confidence surface.
- The chatbot-fe Testing page (list + detail/comparison + meter) under the new bot-automation module.
Success Criteria
- Zero customer-message leakage: no send_message/notification fires for any historical inquiry during shadow generation (AITEST-S04/AC-1). Provable by spec + zero SendMessageWorker enqueues during a batch.
- Shadow-generation success rate: ≥ 95% of sampled questions produce a valid AI answer within 60 days of GA (PRD §13 Quality KPI).
- Batch latency: a ~50-item batch reaches completed in ≈2–5 min without blocking live production traffic (queue isolation: :ai_agent).
- Confidence meter equals (thumbs-up ÷ total sample) × 100, recomputed server-side on every rating (AITEST-S06/AC-3).
- Primary product KPI (PRD §13): Configured→Live conversion ≥ 60% within 7 days, within 90 days of GA.
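The confidence-meter formula above can be sketched in a few lines of plain Ruby. This is a minimal illustration, not code from the chatbot repo: the helper names, the choice to keep unrated questions in the denominator, and the rounding to the nearest integer (to fit the integer confidence_score column) are all assumptions.

```ruby
# Hypothetical helpers, not taken from the chatbot repo.
# `scores` has one entry per sampled question:
#   1 = thumbs up, 0 = thumbs down, nil = not rated yet.
# Assumption: unrated questions stay in the denominator ("total sample").
READY_THRESHOLD = 80 # "Ready to Launch" bar named in the Overview

def confidence_score(scores)
  return nil if scores.empty? # nothing sampled, so there is no score to show

  thumbs_up = scores.count(1)
  (thumbs_up * 100.0 / scores.size).round
end

def ready_to_launch?(scores)
  score = confidence_score(scores)
  !score.nil? && score >= READY_THRESHOLD
end
```

Under these assumptions, four thumbs-up out of five sampled questions gives 80, which meets the threshold, while two out of four gives 50 and does not.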
Out of Scope
- Live shadow mode (real-time parallel answering) — strictly historical.
- Model fine-tuning UI; editing the AI answer in the workspace (comparison is read-only).
- Multi-modal validation (images/voice/attachments) — text-only.
- "Generate from knowledge" (Phase 2) and "Imported question list" (Phase 3) sources — scaffolded/disabled only.
- Mobile — web only.
- The qontak-designer prototype itself — it is a design reference, not a deployable target (see Decision D-1).
Related Documents
- PRD: ../prds/historical-validation.md (BOT-3351)
- Initiative anchor: ../ai-agent-testing-anchor.md
- Confluence source: https://jurnal.atlassian.net/wiki/spaces/QON/pages/50815303687
- Figma master: Bot · AI Agent Testing
Assumptions
- A single human agent text reply to a customer question is a sufficient "golden answer" (PRD Open Q #7 — Data team to confirm).
- The 90-day lookback and 10%/50–70 cap hold across plan tiers for the beta token budget (PRD Open Q #5).
- ai_agent_histories is stable; a test case binds to one ai_agent_history_id.
- Chat Service room/message APIs and QontakNlp prediction are reachable from the :ai_agent Sidekiq worker pool with the org's channel access token.
- Production frontend is chatbot-fe (it owns auth, the API client, and Pinia stores); qontak-designer has no API/auth layer (Decision D-1).
Dependencies
| Dependency | Owning team | Deliverable needed | Availability | Blocking? |
|---|---|---|---|---|
| Chat Service (Hub) | Inbox / Platform | Hub::ChatService::Rooms::List (status assigned, date window), Messages::GetByRoom over 90 days | exists (app/core/repositories/chat_service/*, lib/hub/chat_service/*) | YES |
| LLM / AI service (QontakNLP) | AI squad | Batch shadow inference within TPM/RPM limits | exists for live predict (lib/qontak_nlp/inference.rb#prediction); batch/shadow path needs building | YES |
| Data team | Data | 10% sampling + 50–70 cap algorithm | needs building (not in worker today) | YES |
| Channel Integration | Platform | Access tokens for room fetch | exists (Repositories::ChannelIntegrations::GetTokens) | YES |
| AI Agent versioning | BOT | ai_agent_histories stable | exists (app/models/ai_agent_history.rb) | YES |
| Design (Pixel3) | Design system | @mekari/pixel3 Drawer/Modal/Table/Badge | exists (@mekari/pixel3@^1.0.12 in chatbot-fe) | NO |
Design References (frontend half — required)
The PRD's UI is specified in Figma; the qontak-designer prototype is the
in-code design reference (pixel layout + component decomposition) but is itself a
static prototype (no API/auth) — see Decision D-1. Production implementation lands
in chatbot-fe.
| PRD-named surface | Figma / design link | Frame name | Design system version | Design QA contact | Notes |
|---|---|---|---|---|---|
| Testing page (list) | node 16743-298263 | Testing page | @mekari/pixel3@^1.0.12 (chatbot-fe) | BOT Design QA | In-code ref: qontak-designer app/pages/bot-automation/testing/index.vue |
| Generate test case modal + Generate-from-Inbox drawer | node 16514-155786 | Generate flow | @mekari/pixel3@^1.0.12 | BOT Design QA | In-code ref: qontak-designer app/components/bot-automation/testing/{GenerateTestCaseModal,GenerateFromInboxDrawer,TestCaseGeneratingModal}.vue |
| Sampling / generating progress | node 17699-52615 | Generating modal | @mekari/pixel3@^1.0.12 | BOT Design QA | Async progress while batch runs |
| Test-case detail — side-by-side comparison + confidence meter | node 16514-155786 | Comparison view | @mekari/pixel3@^1.0.12 | BOT Design QA | No prototype component exists in qontak-designer for the detail view — build fresh (see §2.0) |
| Activation gate (AI agent main settings) | node 16514-155786 | Activate button | @mekari/pixel3@^1.0.12 | BOT Design QA | chatbot-fe modules/bot-automation/components/AiAgentEditor.vue footer |
| Tree-diagram AI Agent node confidence | node 16514-155786 | Tree node | @mekari/pixel3@^1.0.12 | BOT Design QA | Backend get_tree_diagram_v3 |
PRD-to-Schema Derivation (backend half — required)
| PRD entity / attribute / rule | Persisted as (table.column) | Exposed via | Enforced where | Source |
|---|---|---|---|---|
| A test case binds to an agent + a version | ai_agent_test_cases.ai_agent_id, .ai_agent_history_id (uuid, NOT NULL) | POST /api/v1/ai_agents/:id/test_cases | CreateTestCases use case (404 if version missing) | PRD §9 #1 |
| Test case has a lifecycle status | ai_agent_test_cases.status (string) | list + detail responses | worker transitions (to build); created as 'pending' today | PRD §10.1, AITEST-S08 |
| Test case aggregate confidence | ai_agent_test_cases.confidence_score (integer, nullable) | detail + list | recompute on rating (to build) | AITEST-S06 |
| Sampled question + extracted Q/A | ai_agent_test_case_questions.question (text), .topic (string) | detail response | ExtractConversationPairs + worker persistence (to build) | AITEST-S02/S03 |
| AI shadow answer | ai_agent_test_case_questions.answer (text) | detail | shadow-gen worker (to build) | AITEST-S04/AC-2 |
| Human "golden" answer | ai_agent_test_case_questions.parameters (jsonb) → human_answer | detail | worker persistence (to build) | PRD §16a (2026-06-18) |
| AI metrics | .confidence (int), .response_time (int), .sources (jsonb [{id,name,type}]) | detail | shadow-gen worker (to build) | AITEST-S05/AC-2 |
| Per-question rating | .score (int 0/1), .is_score (bool), .scored_by/_email/_name/_at | PATCH .../questions/:question_id | RateTestCaseQuestion (exists; aggregate recompute to build) | AITEST-S06 |
| Per-question failure | .status (string), .status_description (text) | detail | shadow-gen worker (to build) | AITEST-S04/ERR-1 |
| Soft delete | .deleted_at (acts_as_paranoid) on both tables | DELETE endpoint (to build — see §2.4) | model acts_as_paranoid | PRD §6 |
| Activation gate threshold | confidence_score vs threshold (default 80) | POST /api/v1/ai_agents/:id/publish | Repositories::Publish (gate to build) | AITEST-S07 |
| Tree-diagram avg confidence | computed avg over agent's completed ai_agent_test_cases.confidence_score | GET /api/v3/paths/:id/tree_diagram | Repositories::Paths::GetTreeDiagramV3#add_ai_agent (to build) | AITEST-S10 |
Every §2.3 DDL column and §2.4 endpoint traces back to a row here.
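To make the derivation table concrete, here is a rough sketch of how one extracted Q/A pair plus one shadow result might map onto the ai_agent_test_case_questions columns listed above. The function name and the shapes of the `pair` and `shadow` hashes are hypothetical; only the column names and the parameters → human_answer placement come from the table.

```ruby
# Hypothetical mapper; the input hash shapes are assumptions for illustration.
# Only the output keys mirror the columns named in the derivation table.
def build_question_attributes(pair:, shadow:)
  base = {
    question: pair.fetch(:question),
    topic: pair[:topic], # assumed to be supplied by the extractor or worker
    parameters: { "human_answer" => pair.fetch(:human_answer) }
  }

  if shadow[:error]
    # Per-question failure: keep the golden answer, record why the AI side failed.
    base.merge(status: "failed", status_description: shadow[:error],
               answer: nil, sources: [])
  else
    base.merge(status: "completed", status_description: nil,
               answer: shadow.fetch(:answer),
               confidence: shadow[:confidence],
               response_time: shadow[:response_time],
               sources: shadow.fetch(:sources, []))
  end
end
```

Keeping the human answer in the row even when shadow generation fails means the detail view can still show the golden answer next to a "could not generate" state.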
Detail 1.A — PRD Traceability (cross-layer)
Composite AC ids per documents/CLAUDE.md (story-qualified, e.g. AITEST-S01/AC-1).
Forward (PRD AC → RFC):
| PRD composite AC id | FE section / component | BE section / endpoint |
|---|---|---|
| AITEST-S01/AC-1, AC-2 | Testing page + nav item (chatbot-fe) | GET /api/v1/ai_agents/:id/test_cases (set_role) |
| AITEST-S01/ERR-1 | Error blank-slate + ai_workspace_load_failed | list endpoint failure path |
| AITEST-S02/AC-1..3 | generating modal | FetchRoomConversationsWorker sampling (§2.F) |
| AITEST-S03/AC-1..3 | n/a — server-side | ExtractConversationPairs filtering (§2.2) |
| AITEST-S04/AC-1, AC-2, ERR-1 | per-question loading/failed state | shadow-gen worker step (§2.2, §2.F) |
| AITEST-S05/AC-1..3 | TestCaseComparison / QuestionList | GET .../test_cases/:id detail (§2.4) |
| AITEST-S06/AC-1..3, ERR-1 | ConfidenceMeter + thumbs | PATCH .../questions/:id + aggregate recompute (§2.4, §2.F) |
| AITEST-S07/AC-1, AC-2, ERR-1 | Activate button enable/disable | POST /api/v1/ai_agents/:id/publish gate (§2.4) |
| AITEST-S08/AC-1, AC-2, ERR-1 | generating modal + polling | worker async + status lifecycle (§2.F) |
| AITEST-S09/AC-1, AC-2, ERR-1 | Force-activate modal (Could-Have) | publish override + audit (PaperTrail) |
| AITEST-S10/AC-1..3, ERR-1 | Tree-diagram node badge | GetTreeDiagramV3#add_ai_agent |
Reverse (RFC → PRD AC):
| New FE component / BE endpoint / dependency | PRD composite AC id it serves |
|---|---|
| chatbot-fe pages/bot-automation/testing/index.vue | AITEST-S01/AC-1 |
| chatbot-fe TestCaseComparison.vue + ConfidenceMeter.vue | AITEST-S05/AC-1, AITEST-S06/AC-3 |
| BE worker sampling step | AITEST-S02/AC-1..3 |
| BE worker shadow-gen + persistence step | AITEST-S04/AC-2 |
| BE confidence aggregate recompute | AITEST-S06/AC-2 |
| BE publish gate | AITEST-S07/AC-1 |
| BE DELETE .../test_cases/:test_case_id (new) | PRD §6 soft delete |
| BE GetTreeDiagramV3 avg-confidence | AITEST-S10/AC-2 |
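The tree-diagram average-confidence surface (AITEST-S10) reduces to an average over an agent's completed test cases. The sketch below assumes a plain Struct in place of the ActiveRecord model, rounds to an integer, and returns nil when nothing qualifies so that the caller can render the "no score yet" state; none of these details is confirmed by the existing add_ai_agent code.

```ruby
# Stand-in for an ai_agent_test_cases row; the real model is ActiveRecord.
TestCaseRow = Struct.new(:status, :confidence_score, keyword_init: true)

# Hypothetical helper: average confidence over completed test cases only.
# Returns nil when no completed, scored test case exists ("no score yet").
def average_confidence(test_cases)
  scores = test_cases
           .select { |tc| tc.status == "completed" && !tc.confidence_score.nil? }
           .map(&:confidence_score)
  return nil if scores.empty?

  (scores.sum.to_f / scores.size).round
end
```

Test cases that are still pending or processing, or that failed, are ignored here so that a half-finished batch cannot drag the node badge down.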
UI / Consumer Surface Coverage
| PRD-named surface | Consumer | Required reads (BE) | Required writes (BE) | FE component | Status surface |
|---|---|---|---|---|---|
| Testing page (list) | web | GET /api/v1/ai_agents/:id/test_cases | — | pages/bot-automation/testing/index.vue | status, score columns |
| Generate-from-Inbox drawer | web | GET .../ai_agents/:id (versions) | POST .../test_cases | GenerateTestCaseDrawer.vue | status=pending→processing |
| Generating modal | web | GET .../test_cases (poll) | — | TestCaseGeneratingModal.vue | polls status until completed |
| Test-case detail / comparison | web | GET .../test_cases/:id | PATCH .../questions/:id | TestCaseComparison.vue | per-question status, confidence |
| Confidence meter | web | (from detail payload) | — | ConfidenceMeter.vue | confidence_score aggregate |
| AI agent main settings (Activate) | web | GET .../ai_agents/:id | POST .../publish | AiAgentEditor.vue footer | confidence_score vs 80 |
| Tree-diagram node | web | GET /api/v3/paths/:id/tree_diagram | — | bot-flow node (chatbot-fe) | avg_confidence_score |
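The status surfaces in this table rely on the pending → processing → completed/failed lifecycle. A minimal sketch of that lifecycle as an explicit transition table follows; the constant and method names are hypothetical, and the RFC does not say whether the real worker will guard transitions this way.

```ruby
# Hypothetical transition table for the test-case (and question) status column.
ALLOWED_TRANSITIONS = {
  "pending" => ["processing"],
  "processing" => %w[completed failed],
  "completed" => [],
  "failed" => []
}.freeze

# Returns the new status, or raises if the move is not part of the lifecycle.
def transition(current, target)
  allowed = ALLOWED_TRANSITIONS.fetch(current) do
    raise ArgumentError, "unknown status: #{current}"
  end
  unless allowed.include?(target)
    raise ArgumentError, "illegal transition: #{current} -> #{target}"
  end

  target
end

# A poller (such as the generating modal) can stop once this returns true.
def terminal?(status)
  ALLOWED_TRANSITIONS.fetch(status).empty?
end
```

With a table like this, a late retry cannot move a completed test case back to processing, and the polling client has a single rule for when to stop.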
Role Coverage
| PRD role | Authorization mechanism | Endpoints permitted (BE) | UI surface visibility (FE) | Cross-tenant? | Audit trail |
|---|---|---|---|---|---|
| owner | set_role(%w[owner supervisor admin]) (JWT current_user['role']) | all test-case + publish | full | no (org-scoped) | PaperTrail on test_cases/questions |
| supervisor | set_role | all test-case; publish (not force-override) | full | no | PaperTrail |
| admin | set_role | all test-case + publish + force-override | full | no | PaperTrail (+ override reason) |
| standard agent | set_role rejects (403) | none | menu hidden; route forbidden | no | n/a |
| Super Admin (PRD secondary) | inherits owner/admin role | activate after SPV sign-off | full | no | PaperTrail |
| bot-specialist (S10) | set_role on tree-diagram (owner/supervisor/admin) | GET .../tree_diagram | tree-diagram node | no | n/a (read) |
Menu visibility in chatbot-fe is feature-flag + subscription gated today (not role-gated) — see Decision D-3; server-side set_role is the authoritative guard.
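The role matrix above can be read as a small permission lookup. The sketch below is a plain-Ruby stand-in, not the actual set_role helper; the action symbols and the "agent" role label for a standard agent are hypothetical. The table does not list force-override for owner, so it is left out here.

```ruby
# Hypothetical permission map derived from the Role Coverage table.
PERMISSIONS = {
  "owner" => %i[test_cases publish],
  "supervisor" => %i[test_cases publish],
  "admin" => %i[test_cases publish force_override]
}.freeze

# True when the role may perform the action; unknown roles get nothing.
def permitted?(role, action)
  PERMISSIONS.fetch(role, []).include?(action)
end

# HTTP-style result a guard could return for a decoded JWT payload.
def authorize(current_user, action)
  permitted?(current_user["role"], action) ? 200 : 403
end
```

A standard agent therefore receives 403 on every test-case route regardless of what the frontend menu shows.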
PRD Section Coverage
| PRD § | Title | Where covered |
|---|---|---|
| 2 | Phase Context | §1 Overview |
| 3 | One-liner + Problem | §1 Overview |
| 4 | Target Users / Persona | §1 (Role Coverage) |
| 5 | Non-Goals | §1 Out of Scope |
| 6 | Constraints | §3 (perf, security, data lifecycle), §4 (flag) |
| 7 | Feature Changes (CHG-001 tree) | AITEST-S10 → §2.4, §2.F.2 |
| 8 | New Features (Testing page) | §1 Design References, §2.A, Detail 1.C |
| 9 | API & Webhook Behavior | §2.4 |
| 10 | System Flow / Stories / ACs | Detail 1.A, 1.C, §2.2 |
| 11 | Rollout | §4 |
| 12 | Observability | §3 Monitoring |
| 13 | Success Metrics | §1 Success Criteria, §3 |
| 14 | Launch Plan & Stage Gates | §4 (delivery owns schedule) |
| 15 | Dependencies | §1 Dependencies, §2.F.1 |
| 16 | Key Decisions | Detail 1.B, §2 Technical Decisions |
| 17 | Open Questions | §5 |
Detail 1.B — Decisions Closed (cross-layer)
| Decision | Chosen option | Alternatives rejected | Why rejected | Layer |
|---|---|---|---|---|
| D-1 Frontend target repo | chatbot-fe (production); qontak-designer is design reference only | Build in qontak-designer | qontak-designer has zero API client + only mock localStorage auth + no roles (app/composables/useAuth.ts) — cannot satisfy set_role or real data | both |
| D-2 Storage | Reuse existing ai_agent_test_cases / ai_agent_test_case_questions (Postgres, uuid, acts_as_paranoid) | New ai_validation_sessions/_items tables | Superseded by implemented schema (PRD §16b) | BE |
| D-3 Menu gating | Server-side set_role is authoritative; FE menu reuses existing rollout_ai_agent + subscription flag pattern | Role-gate the FE menu only | FE menu gating today is flag/subscription-based (layouts/bot-automation.vue); BE must enforce regardless | both |
| D-4 Batch processing | Async Sidekiq FetchRoomConversationsWorker, queue :ai_agent | Kafka; synchronous request | Already the chatbot async stack (PRD §16a); sync would block & exceed LLM TPM | BE |
| D-5 Shadow inference | Reuse QontakNlp predict path per question, token-bucket throttled in worker (RPM cap SystemPreference, default 60; 429→backoff+requeue→fail-question) | New batch endpoint on AI service | Reuse the proven lib/qontak_nlp/inference.rb#prediction; throttle contract fully specified in §3 Performance (REV-1) | BE |
| D-6 Confidence aggregate | Recompute confidence_score server-side on each rating write | Compute on read; FE-side | Single source of truth; needed by tree-diagram + gate; avoids drift | BE |
| D-7 Activation gate | Add advisory→enforced threshold check in Repositories::Publish behind ai_agent_testing_gate flag; threshold is org-configurable via SystemPreference (group_code: 'engine', code: 'ai_agent_testing_threshold', default 80) | Hard gate from day one; hard-coded 80 constant | Ship advisory for beta (PRD Open Q #2), enforce before GA; configurable threshold resolves PRD Open Q #4 (REV-4) without a redeploy | BE |
| D-8 Delete semantics | Soft delete via acts_as_paranoid; add missing DELETE endpoint | Hard delete | PRD §6 soft-delete + restore; chatbot-fe already calls a delete route that BE lacks | both |
| D-9 Per-status lifecycle | pending → processing → completed/failed (test case); pending → processing → completed/failed (question) | Single boolean done flag | Needs partial/failed surfacing (AITEST-S08/ERR-1) | BE |
| D-10 Sampling cap | 10% random, capped 50–70, ≤50 shown if batch > 100; all rooms if < 10 eligible | Expose params in drawer | Adds user effort; defaults are the trust signal (PRD §16b) | BE/Data |
Minimum-coverage decisions: storage (D-2), sync/async (D-4), caching (no alternative considered — no read-cache introduced this phase; detail reads are infrequent), third-party (D-5), consistency (D-6, server-authoritative/strong within request), multi-tenancy (set_role + org-scoped queries), reuse-vs-new (§2.4 Reuse? column).
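Decision D-7 (advisory first, enforced later, behind a flag, with a configurable threshold) can be sketched as one pure function. This is an illustration under stated assumptions: the function and Struct names are hypothetical, the `enforced:` switch is one possible way to model the advisory-to-enforced move, a missing score is treated as below threshold, and the S09 force-override path is not modelled.

```ruby
# Hypothetical result object for the publish gate.
GateDecision = Struct.new(:allowed, :http_status, :warning, keyword_init: true)

DEFAULT_THRESHOLD = 80 # default named in D-7; org-configurable in the RFC

def evaluate_publish_gate(confidence_score:, gate_enabled:, enforced:,
                          threshold: DEFAULT_THRESHOLD)
  # Assumption: an agent with no score yet counts as below threshold.
  passing = !confidence_score.nil? && confidence_score >= threshold
  if !gate_enabled || passing
    return GateDecision.new(allowed: true, http_status: 200, warning: nil)
  end

  message = "confidence #{confidence_score.inspect} is below threshold #{threshold}"
  if enforced
    # Enforced mode: reject the publish request.
    GateDecision.new(allowed: false, http_status: 422, warning: message)
  else
    # Advisory mode (beta): allow the publish but surface the warning.
    GateDecision.new(allowed: true, http_status: 200, warning: message)
  end
end
```

Keeping the decision in a side-effect-free function would let the flag, the threshold lookup, and the publish write stay in the repository while the rule itself is unit-tested in isolation.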
Detail 1.C — Per-Story Change Map
| Story id | Title | Layer scope | FE changes | BE changes | Composite AC ids | Acceptance criteria (verifiable) | RFC anchors |
|---|---|---|---|---|---|---|---|
| AITEST-S01 | Workspace access control | FE + BE | pages/bot-automation/testing/index.vue; nav item in layouts/bot-automation.vue; FETCH_TEST_CASES (exists) | GET .../test_cases (exists, set_role) | S01/AC-1, AC-2, ERR-1, NEG-1 | rspec: 403 for standard; vitest: error slate fires ai_workspace_load_failed | §2.4 row1 · §2.A · §3 authz |
| AITEST-S02 | Historical sampling (10%) | BE-only | n/a — server-side | FetchRoomConversationsWorker sampling step (new) | S02/AC-1, AC-2, AC-3, ERR-1 | worker spec: 200 rooms→~20; <10→all; 5000→cap 50–70 | §2.F job spec · §4.D chunk 3 |
| AITEST-S03 | Data integrity & filtering | BE-only | n/a | ExtractConversationPairs (exists) — confirm non-text/system skip | S03/AC-1, AC-2, AC-3, ERR-1, NEG-2 | extractor spec: bot-only excluded; image-only skipped | §2.2 · existing spec |
| AITEST-S04 | Shadow execution (zero leakage) | BE-only | per-question failed badge | worker shadow-gen + question persistence (new); QontakNlp predict | S04/AC-1, AC-2, ERR-1 | worker spec: 0 SendMessageWorker enqueues; answer + parameters.human_answer persisted | §2.2 · §2.F · §4.D chunk 4 |
| AITEST-S05 | Side-by-side validation UI | FE + BE | TestCaseComparison.vue, QuestionList.vue (grouped by topic) | GET .../test_cases/:id detail (exists) | S05/AC-1, AC-2, AC-3, ERR-1, NEG-3 | vitest: renders human-left/AI-right + confidence/time/sources; failed→"could not generate" | §2.4 row3 · §2.A |
| AITEST-S06 | Confidence meter & feedback | FE + BE | ConfidenceMeter.vue; thumbs via UPDATE_TEST_CASE_QUESTION (exists, optimistic) | aggregate recompute on rate (new) | S06/AC-1, AC-2, AC-3, ERR-1 | rspec: rating recomputes confidence_score=(up÷total)×100; vitest: rollback on save fail | §2.4 row4 · §2.F.2 · §4.D chunk 5 |
| AITEST-S07 | Activation gatekeeping | FE + BE | Activate button enable/disable in AiAgentEditor.vue footer | publish gate in Repositories::Publish (new, flagged) | S07/AC-1, AC-2, ERR-1 | rspec: publish 422 when <80 & gate on; FE button disabled <80 | §2.4 row5 · §4.D chunk 8 |
| AITEST-S08 | Background processing (async) | BE + FE | generating modal + poll | worker status lifecycle (new) | S08/AC-1, AC-2, ERR-1 | worker spec: status processing→completed; failure→failed + Rollbar | §2.F · §2.1 state |
| AITEST-S09 | Manual override & audit | FE + BE | Force-activate modal (reason) | publish override path + PaperTrail reason (new) | S09/AC-1, AC-2, ERR-1 | rspec: override requires reason; PaperTrail row w/ reason + score | §2.4 row5 · §3 audit |
| AITEST-S10 | Confidence in Tree Diagram | FE + BE | node badge in bot-flow tree (chatbot-fe) | GetTreeDiagramV3#add_ai_agent avg score (new) | S10/AC-1, AC-2, AC-3, ERR-1 | rspec: add_ai_agent returns avg over completed; no test cases→"no score yet" | §2.4 row6 · §2.F.2 |
Every FE + BE row has both columns filled. S02/S03/S04 are BE-only (server-side pipeline); their UI effects are covered by S05/S08 surfaces.
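The S02 acceptance numbers in the map above (200 rooms → ~20; fewer than 10 → all; 5000 → capped) suggest a sampler along the following lines. This is only a sketch: the sampling algorithm is listed as Data-team work still to be built, the function name is hypothetical, and the default upper bound of 70 is an assumption picked from the 50–70 band. The "≤50 shown if batch > 100" display rule from D-10 is not modelled.

```ruby
# Hypothetical sampler for eligible room ids.
#   rate         - fraction of eligible rooms to sample (10% per D-10)
#   min_eligible - below this many rooms, take every room
#   cap          - assumed upper bound, chosen from the 50-70 band
#   rng          - injectable Random so specs can be deterministic
def sample_rooms(room_ids, rate: 0.10, min_eligible: 10, cap: 70, rng: Random.new)
  return room_ids.dup if room_ids.size < min_eligible

  target = (room_ids.size * rate).round
  target = [[target, 1].max, cap].min # at least one room, never above the cap
  room_ids.sample(target, random: rng)
end
```

Array#sample without replacement keeps each room at most once in the batch, and passing a seeded Random makes a worker spec reproducible.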
2. Technical Design
Detail 2.0 — Repo Reading Guide (read this first)
Repo Map (mermaid, both layers)
flowchart LR
subgraph fe["chatbot-fe (Nuxt + Pinia)"]
page["pages/bot-automation/testing/"]
comp["modules/bot-automation/components/testing/"]
store["store/ai-agent/{actions,getters,interface}.ts"]
svc["common/services/main/v1/ai-agents.ts"]
end
subgraph be["chatbot (Rails + Grape)"]
ctrl["api/frontend_service/v1/ai_agent/*_controller.rb"]
uc["use_cases/{create_test_cases,rate_test_case_question,publish_ai_agent}.rb"]
repo["repositories/{create_test_case,rate_test_case_question,publish}.rb"]
worker["workers/fetch_room_conversations_worker.rb"]
chat["core/repositories/chat_service/*"]
nlp["lib/qontak_nlp/inference.rb"]
tree["core/repositories/paths/get_tree_diagram_v3.rb"]
end
subgraph infra
db[("Postgres: ai_agent_test_cases / _questions")]
q[["Sidekiq queue :ai_agent"]]
hub(["Hub Chat Service (HTTP)"])
ai(["QontakNLP AI service (HTTP)"])
end
svc --> ctrl
ctrl --> uc --> repo --> db
uc --> q --> worker
worker --> chat --> hub
worker --> nlp --> ai
worker --> db
ctrl --> tree --> db
Existing Code Anchors
| Layer | Path | Why the agent reads it | What pattern it teaches |
|---|---|---|---|
| BE | app/api/frontend_service/v1/ai_agent/test_cases_controller.rb | The 3 live routes + set_role + result-matcher | Grape route + Dry::Matcher::ResultMatcher + success_response/error_response |
| BE | app/api/frontend_service/v1/ai_agent/use_cases/create_test_cases.rb | Create flow, validation, worker enqueue | APIAbstractUseCase + Dry::Monads::Do + Repositories::*.call |
| BE | app/api/frontend_service/v1/ai_agent/repositories/create_test_case.rb | How a test case is built (status='pending') | AbstractRepository write pattern |
| BE | app/api/frontend_service/v1/ai_agent/repositories/rate_test_case_question.rb | Rating write (no aggregate today) | per-field update; extension point for recompute |
| BE | app/workers/fetch_room_conversations_worker.rb | Worker shell (fetch+extract+log only) | sidekiq_options queue: :ai_agent, retry: false; per-room rescue→Rollbar |
| BE | app/core/repositories/chat_service/extract_conversation_pairs.rb | Q/A pairing, system/non-text skip | customer-question → next agent text reply |
| BE | app/core/repositories/chat_service/fetch_assigned_room_ids.rb | Assigned-room fetch (status:'assigned', LIMIT) | Hub HTTP + cursor pagination |
| BE | lib/qontak_nlp/inference.rb | prediction(...) shape + timeout: 60 | @http.call(method:'POST', url:, body:, open/read_timeout:) |
| BE | app/core/repositories/paths/get_tree_diagram_v3.rb | add_ai_agent (L850–909) node assembly | where to add avg-confidence |
| BE | app/api/frontend_service/v1/ai_agent/repositories/publish.rb | Publish = set active_version_id (no gate) | extension point for gate |
| BE | app/core/repositories/system_preferences/feature_flag.rb | FeatureFlag.enabled?(group_code, code) | org-level flag mechanism |
| FE | common/services/main/v1/ai-agents.ts | 5 test-case client methods (incl. deleteTestCase) | $apiMain + endpoint.v1.ai_agents.test_cases.* |
| FE | store/ai-agent/interface.ts | TestCase, TestCaseQuestion, TestCaseDetail types | typed payloads/responses |
| FE | store/ai-agent/actions.ts | CREATE/FETCH/FETCH_DETAIL/DELETE/UPDATE_QUESTION | $patch fetchStatus pending/resolved/rejected + optimistic rollback |
| FE | modules/bot-automation/components/ai-agents/AiAgentsTable.vue | list table loading/empty/pagination | tableContent + empty illustration |
| FE | modules/bot-automation/components/AiAgentEditor.vue | settings footer (Save button) | where Activate/gate lands (L1712–1733) |
| FE | layouts/bot-automation.vue | menu listMenu + flag gating (L209–349) | where Testing nav item lands |
| Design | qontak-designer app/pages/bot-automation/testing/index.vue | table columns + states (design ref) | 6 columns: name/type/score/status/updated/actions |
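The extractor anchor in this table describes the pairing rule as "customer-question → next agent text reply", with system and non-text messages skipped. A simplified sketch of that rule is shown below. The Message struct, the sender labels, and the question heuristic are all hypothetical; the real predicate in extract_conversation_pairs.rb is not reproduced in this RFC.

```ruby
# Hypothetical message shape; the Chat Service payload is richer than this.
Message = Struct.new(:sender, :type, :text, keyword_init: true)

# Assumed heuristic only; the repo's own question check may differ.
def question_like?(text)
  text.strip.end_with?("?")
end

# Pairs each customer question with the next human-agent text reply.
def extract_pairs(messages)
  pairs = []
  pending_question = nil

  messages.each do |msg|
    next if msg.sender == "system"  # skip system events
    next unless msg.type == "text"  # skip images, voice, attachments

    case msg.sender
    when "customer"
      pending_question = msg.text if question_like?(msg.text)
    when "agent"
      next if pending_question.nil?

      pairs << { question: pending_question, human_answer: msg.text }
      pending_question = nil
    end
    # Any other sender (for example a bot) is ignored, so bot-only
    # conversations yield no pairs.
  end

  pairs
end
```

In this sketch a later customer question replaces an unanswered earlier one, which is one possible reading of the "next agent text reply" rule.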
Existing Contracts to Reuse, Extend, or Replace (BE)
| Contract | Status | Justification | Owner |
|---|---|---|---|
| GET /api/v1/ai_agents/:id/test_cases | reuse | exists, set_role | BOT |
| POST /api/v1/ai_agents/:id/test_cases | extend | exists; add name persist, status→processing, real pipeline | BOT |
| GET /api/v1/ai_agents/:ai_agent_id/test_cases/:id | reuse | exists (GetAiAgentTestCaseDetail, serializes confidence_score) | BOT |
| PATCH .../test_cases/:test_case_id/questions/:question_id | extend | exists; add aggregate recompute | BOT |
| DELETE /api/v1/ai_agents/:id/test_cases/:test_case_id | new-with-justification | chatbot-fe deleteTestCase calls it but no BE route exists (only delete '/:id' deletes the agent); PRD §6 soft delete needs it | BOT |
| POST /api/v1/ai_agents/:id/publish | extend | exists; add confidence gate (flagged) | BOT |
| GET /api/v3/paths/:id/tree_diagram | extend | exists; add_ai_agent add avg confidence | BOT |
| FetchRoomConversationsWorker | extend | exists; add sampling + shadow-gen + persistence + status | BOT |
| lib/qontak_nlp/inference.rb#prediction | reuse | live-predict path; call per question with throttle | AI squad |
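The last row calls the predict path "per question with throttle"; Decision D-5 names a token bucket with a default cap of 60 requests per minute. A minimal in-process token bucket could look like the sketch below. The class and its interface are hypothetical, the clock is injected so a spec can control time, and the 429 backoff and requeue path from D-5 is not modelled. A real multi-process Sidekiq pool would need shared state, which this sketch does not attempt.

```ruby
# Hypothetical single-process token bucket; not the repo's implementation.
class TokenBucket
  def initialize(rpm: 60, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @capacity = rpm.to_f
    @refill_per_second = rpm / 60.0
    @clock = clock
    @tokens = @capacity
    @last_refill = @clock.call
  end

  # Consumes one token and returns true if a call may go out now.
  def try_acquire
    refill
    return false if @tokens < 1.0

    @tokens -= 1.0
    true
  end

  # Seconds until the next token is available (0.0 if one is ready).
  def wait_time
    refill
    return 0.0 if @tokens >= 1.0

    (1.0 - @tokens) / @refill_per_second
  end

  private

  # Adds tokens for the time elapsed since the last refill, up to capacity.
  def refill
    now = @clock.call
    @tokens = [@capacity, @tokens + (now - @last_refill) * @refill_per_second].min
    @last_refill = now
  end
end
```

A worker loop would call try_acquire before each predict call and sleep for wait_time when it returns false, which keeps a batch under the cap without blocking other queues.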
Patterns to Follow (and where to find them)
| Layer | Concern | Pattern in repo | Reference file | Deviation? |
|---|---|---|---|---|
| FE | State management | Pinia store w/ fetchStatus enum | store/ai-agent/actions.ts | none |
| FE | Error/optimistic | snapshot + rollback on reject | store/ai-agent/actions.ts UPDATE_TEST_CASE_QUESTION | none |
| FE | List loading/empty | tableContent + empty illustration | modules/bot-automation/components/ai-agents/AiAgentsTable.vue | none |
| FE | API client | $apiMain + endpoint map | common/services/main/v1/ai-agents.ts | none |
| BE | HTTP handler | Grape + ResultMatcher + success_response | test_cases_controller.rb | none |
| BE | Use case | APIAbstractUseCase + Dry::Monads::Do.for(:result) | create_test_cases.rb | none |
| BE | Repository write | AbstractRepository#call | create_test_case.rb | none |
| BE | Worker | Sidekiq::Worker + sidekiq_options queue: + per-item rescue→Rollbar | fetch_room_conversations_worker.rb, ask_airene_predict_worker.rb | none |
| BE | Feature flag | SystemPreferences::FeatureFlag.enabled? | feature_flag.rb | none |
| BE | Error shape | ErrorException(message:[], code:, errors:, error_code:) | helpers/error_response_helpers.rb | none |
| Cross | snake_case API → FE | FE consumes snake_case JSON directly | store/ai-agent/interface.ts | none |
Reading Order for the Agent
1. chatbot/app/api/frontend_service/v1/ai_agent/test_cases_controller.rb — live routes + auth.
2. chatbot/app/api/frontend_service/v1/ai_agent/use_cases/create_test_cases.rb — create + enqueue.
3. chatbot/app/workers/fetch_room_conversations_worker.rb — the worker to extend (the core gap).
4. chatbot/app/core/repositories/chat_service/extract_conversation_pairs.rb — Q/A extraction.
5. chatbot/lib/qontak_nlp/inference.rb — the predict call to reuse for shadow gen.
6. chatbot/app/api/frontend_service/v1/ai_agent/repositories/rate_test_case_question.rb — recompute extension point.
7. chatbot/app/api/frontend_service/v1/ai_agent/repositories/publish.rb — gate extension point.
8. chatbot/app/core/repositories/paths/get_tree_diagram_v3.rb (add_ai_agent) — tree surface.
9. chatbot-fe/store/ai-agent/{actions,interface}.ts — the FE store/types already wired.
10. chatbot-fe/modules/bot-automation/components/ai-agents/AiAgentsTable.vue — list/empty/loading pattern to mirror.
Source Verification (anti-hallucination — required)
| Layer | Anchor / contract | Verified by | Evidence |
|---|---|---|---|
| BE | ai_agent_test_cases schema | read migration | db/migrate/20260512000001_create_ai_agent_test_cases.rb: cols ai_agent_history_id uuid NOT NULL, status string, confidence_score integer, type string, deleted_at; uuid PK |
| BE | ai_agent_test_case_questions schema | read migration | db/migrate/20260512000002_..._questions.rb: topic, question(text), answer(text), is_score(bool default false), score(int), scored_by(uuid)/_email/_name, scored_at, response_time(int), confidence(int), status, status_description(text), sources(jsonb default []), parameters(jsonb default {}), deleted_at |
| BE | Models soft-delete | read | app/models/ai_agent_test_case.rb L4 acts_as_paranoid, L5 has_paper_trail; same in ai_agent_test_case_question.rb |
| BE | DB dialect / migrator | read | config/database.yml adapter: postgresql; db/schema.rb ActiveRecord::Schema[7.1], enable_extension "pgcrypto" |
| BE | Routes + mount | read | app/api/frontend_service/api.rb L47-48 mount V1::AiAgent::TestCasesController => '/v1/ai_agents'; config/routes.rb mounts APIBase => '/api/' → full /api/v1/ai_agents |
| BE | 3 live routes | read | test_cases_controller.rb L32 get '/:id/test_cases', L75 post, L115 patch '/:id/test_cases/:test_case_id/questions/:question_id' — all set_role(%w[owner supervisor admin]) |
| BE | No delete test-case route | grep | only ai_agents_controller.rb L237 delete '/:id' (deletes agent) — no test-case delete |
| BE | Create status pending | read | repositories/create_test_case.rb L19 record.status = 'pending'; use case enqueues FetchRoomConversationsWorker.perform_async |
| BE | Worker does NOT sample/generate/persist | read full file | app/workers/fetch_room_conversations_worker.rb L1-43: fetch rooms → extract pairs → Rails.logger.info{...}; no LLM, no question insert, no status update |
| BE | Queue :ai_agent | read | worker L5 sidekiq_options queue: :ai_agent, retry: false; config/sidekiq.yml lists ai_agent queue |
| BE | Extraction logic | read | extract_conversation_pairs.rb L40-66: skip SYSTEM; customer text → pending_question if question?; agent text reply → pair; non-text skipped |
| BE | Rating no aggregate | read | repositories/rate_test_case_question.rb L13-19 sets score/is_score/scored_by*/scored_at only; no confidence_score write |
| BE | Publish no gate | read | repositories/publish.rb L12-18 @ai_agent.update!(active_version_id: @ai_agent.version_id); no threshold |
| BE | Tree diagram v3 | read | route app/api/frontend_service/v3/path.rb L17 get ':id/tree_diagram'; core/repositories/paths/get_tree_diagram_v3.rb add_ai_agent L850-909 builds node sans confidence |
| BE | QontakNLP predict | read | lib/qontak_nlp/inference.rb#prediction timeout: 60, @http.call(method:'POST', ...); core/repositories/qontak_nlp/predict.rb resolves timeout via system pref |
| BE | Chat Service fetch | read | fetch_assigned_room_ids.rb Hub::ChatService::Rooms::List status:'assigned', limit: LIMIT; fetch_room_messages.rb Messages::GetByRoom |
| BE | Token fetch | read | channel_integrations/get_tokens.rb access_token from chatbot_tokens_encrypted (lockbox) |
| BE | Error/success shape | read | helpers/success_response_helpers.rb {status, code, message, data, meta}; error_response_helpers.rb ErrorException(message:[], code:, errors:, error_code:) |
| BE | Feature flag | read | feature_flag.rb FeatureFlag.enabled?(group_code, code, default:); no ai_agent_testing flag exists yet |
| BE | Test commands | read | bin/rspec_pipeline.sh RAILS_ENV=test bundle exec rspec spec/...; bitbucket-pipelines.yml bundle exec rubocop; Gemfile has brakeman |
| BE | Existing specs | ls | spec/api/frontend_service/v1/ai_agent/{create_test_cases,get_test_cases,rate_test_case_question}_spec.rb; spec/workers/fetch_room_conversations_worker_spec.rb; spec/core/repositories/chat_service/extract_conversation_pairs_spec.rb |
| FE | Test-case API client | read | common/services/main/v1/ai-agents.ts L250-351: createTestCase/getTestCases/getTestCaseDetail/deleteTestCase/updateTestCaseQuestion; endpoint.ts L207-214 paths |
| FE | Types | read | store/ai-agent/interface.ts L268-365: `TestCase, CreateTestCasePayload, TestCaseQuestion, TestCaseDetail, UpdateTestCaseQuestionPayload(score: 0\|1)` |
| FE | Pinia actions | read | store/ai-agent/actions.ts CREATE_TEST_CASE(745), FETCH_TEST_CASES(803), FETCH_TEST_CASE_DETAIL(849), DELETE_TEST_CASE(902), UPDATE_TEST_CASE_QUESTION(950, optimistic rollback) |
| FE | No Testing page yet | ls | pages/bot-automation/ has actions/ai-agents/ai-agent[id]; no /testing; old UI in modules/ai-agent/components/forms/ValidationDetailPanel.vue |
| FE | Menu gating flag/sub | read | layouts/bot-automation.vue L209-349 listMenu gated by rolloutAIAgentPreferences/aiAgentEnabled/isNewAIAgentEngine — not roles |
| FE | Activate button absent | read | modules/bot-automation/components/AiAgentEditor.vue L1712-1733 footer shows "Save changes" only |
| FE | Design system | read | chatbot-fe package.json @mekari/pixel3@^1.0.12; qontak-designer @mekari/pixel3@1.0.13-dev.0 |
| FE | Test commands | read | chatbot-fe package.json: test: vitest run, test:e2e: playwright test, lint, build: nuxt build |
| Design | qontak-designer is static prototype | read/grep | no api/ folder, no $fetch/useFetch; app/composables/useAuth.ts mock localStorage, no roles |
Design ↔ Code Mapping (frontend half)
| Figma frame / component | Implementing file (chatbot-fe) | Reuse vs new | Tokens | Backing API | Deviation |
|---|---|---|---|---|---|
| Testing page (list) | pages/bot-automation/testing/index.vue + modules/bot-automation/components/testing/TestCasesTable.vue | new (mirror AiAgentsTable.vue) | color.surface.*, space.*, text.body* | GET .../test_cases | none — pattern-faithful |
| Generate modal + Inbox drawer | modules/bot-automation/components/testing/{GenerateTestCaseModal,GenerateFromInboxDrawer}.vue | new (port from qontak-designer layout) | Pixel3 MpModal/MpDrawer | POST .../test_cases | adds version selector (prototype lacks it — see §5 Q-A) |
| Generating modal | .../testing/TestCaseGeneratingModal.vue | new | MpModal + progress | poll GET .../test_cases | none |
| Comparison + question list | .../testing/TestCaseComparison.vue, QuestionList.vue | new (no prototype exists) | MpAccordion, MpBadge | GET .../test_cases/:id | reference old modules/ai-agent/.../ValidationDetailPanel.vue for layout |
| Confidence meter | .../testing/ConfidenceMeter.vue | new | MpBadge/progress | from detail payload | none |
| Activate gate | modules/bot-automation/components/AiAgentEditor.vue (footer) | extend | MpButton | POST .../publish | none |
The Comparison/detail view has no qontak-designer prototype — flag for Design QA before the chunk lands (§5 Q-A).
Detail 2.1 — Architecture (mermaid)
End-to-end component diagram
flowchart TB
user([SPV/Admin]) --> page["chatbot-fe Testing page"]
page --> store["Pinia ai-agent store"]
store --> client["ai-agents.ts client"]
client --> ctrl["/api/v1/ai_agents/.../test_cases/"]
ctrl --> ucCreate[CreateTestCases UC]
ucCreate --> repoCreate[(CreateTestCase repo)]
repoCreate --> db[("ai_agent_test_cases")]
ucCreate --> q[["Sidekiq :ai_agent"]]
q --> worker[FetchRoomConversationsWorker]
worker --> chat["ChatService repos"] --> hub(["Hub Chat Service"])
worker --> nlp["QontakNlp predict"] --> ai(["AI service"])
worker --> dbq[("ai_agent_test_case_questions")]
ctrl --> ucRate[RateTestCaseQuestion UC] --> dbq
ucRate --> agg[["recompute confidence_score"]] --> db
ctrl --> tree["GetTreeDiagramV3#add_ai_agent"] --> db
Data model (mermaid erDiagram)
erDiagram
AI_AGENTS ||--o{ AI_AGENT_HISTORIES : versions
AI_AGENTS ||--o{ AI_AGENT_TEST_CASES : has
AI_AGENT_HISTORIES ||--o{ AI_AGENT_TEST_CASES : binds
AI_AGENT_TEST_CASES ||--o{ AI_AGENT_TEST_CASE_QUESTIONS : has
AI_AGENT_TEST_CASES {
uuid id PK
uuid ai_agent_id FK
uuid ai_agent_history_id FK
int organization_id
string status
int confidence_score
string type
datetime deleted_at
}
AI_AGENT_TEST_CASE_QUESTIONS {
uuid id PK
uuid ai_agent_test_case_id FK
string topic
text question
text answer
int score
bool is_score
int confidence
int response_time
string status
text status_description
jsonb sources
jsonb parameters
datetime deleted_at
}
State machine — test-case status
stateDiagram-v2
[*] --> pending: POST create
pending --> processing: worker starts
processing --> completed: all questions generated
processing --> failed: fatal worker error
completed --> completed: ratings update (no status change)
failed --> processing: retry (re-enqueue)
completed --> [*]
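The diagram above can also be read as a lookup table of allowed edges. The following is a minimal sketch of such a guard in plain Ruby; the constant, error class, and method name are illustrative and are not taken from the chatbot repo, which may enforce transitions differently (or not at all today).

```ruby
# Allowed test-case status transitions, transcribed from the state diagram.
# All names here are hypothetical.
TEST_CASE_TRANSITIONS = {
  'pending'    => %w[processing],
  'processing' => %w[completed failed],
  'failed'     => %w[processing], # retry re-enqueues the worker
  'completed'  => []              # ratings change confidence_score, not status
}.freeze

class InvalidTransition < StandardError; end

# Returns the new status, or raises when the diagram has no such edge.
def transition_test_case(current, target)
  allowed = TEST_CASE_TRANSITIONS.fetch(current) do
    raise InvalidTransition, "unknown status: #{current}"
  end
  unless allowed.include?(target)
    raise InvalidTransition, "#{current} -> #{target} is not allowed"
  end

  target
end
```

Keeping the edges in one table means the worker and the rating path can share a single check instead of each hard-coding status strings.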
State machine — question status
stateDiagram-v2
[*] --> pending: row created
pending --> processing: shadow-gen starts
processing --> completed: LLM answer stored
processing --> failed: LLM error (status_description set)
completed --> [*]
failed --> [*]
Branch & skip flow — sampling & filtering
flowchart TD
start([worker: rooms fetched]) --> elig{eligible rooms count}
elig -- "< 10" --> all[use 100% of rooms]
elig -- ">= 10" --> sample["random 10%"]
sample --> cap{"> 50-70 cap?"}
cap -- yes --> capped[truncate to cap]
cap -- no --> kept[keep sample]
all --> extract[ExtractConversationPairs]
capped --> extract
kept --> extract
extract --> nonText{text-only Q/A?}
nonText -- no --> skip[skip room, not counted]
nonText -- yes --> gen[shadow-generate + persist question]
skip --> done([batch continues])
gen --> done
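The sampling branch of the flow above could be sketched as follows. This is an illustration only: the helper name and constants are hypothetical, the cap is fixed at 50 although the RFC gives a 50-70 range, and rounding the 10% down is an assumption the RFC does not state.

```ruby
# Hypothetical helper; constant values are assumptions for illustration.
SMALL_POOL_THRESHOLD = 10 # below this, use every eligible room
SAMPLE_PERCENT       = 10 # otherwise take a random 10%
DEFAULT_CAP          = 50 # the RFC gives a 50-70 range; 50 is assumed here

def sample_rooms(room_ids, cap: DEFAULT_CAP, rng: Random.new)
  return room_ids.dup if room_ids.size < SMALL_POOL_THRESHOLD

  # Integer arithmetic avoids float rounding surprises; this rounds down.
  target = room_ids.size * SAMPLE_PERCENT / 100
  # Sampling min(target, cap) ids is equivalent to sampling, then truncating.
  room_ids.sample([target, cap].min, random: rng)
end
```

Passing `rng:` lets a spec seed the generator so the sampled set is reproducible.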
Detail 2.2 — Sequence (mermaid, end-to-end incl. failure)
Happy path — generate test case (async batch with shadow gen)
sequenceDiagram
actor U as SPV (chatbot-fe)
participant LB as LB / API gateway
participant API as chatbot Grape API
participant UC as CreateTestCases
participant DBW as Postgres primary
participant Q as Sidekiq :ai_agent
participant W as FetchRoomConversationsWorker
participant HUB as Hub Chat Service
participant NLP as QontakNLP AI service
U->>LB: POST /api/v1/ai_agents/:id/test_cases {type, version_id, name}
LB->>API: HTTP
API->>API: set_role(owner/supervisor/admin)
API->>UC: handle
UC->>DBW: INSERT ai_agent_test_cases (status='processing')
UC->>Q: FetchRoomConversationsWorker.perform_async
UC-->>API: 201 {data: test_case}
API-->>U: 201 (UI shows generating modal, polls status)
Note over Q,W: async — worker picks up within seconds
W->>HUB: GET assigned rooms (status=assigned, 90d, limit=100)
HUB-->>W: room_ids
W->>W: sample 10% (cap 50-70; all if <10)
loop per sampled room
W->>HUB: GET messages by room
HUB-->>W: messages
W->>W: ExtractConversationPairs (text-only)
loop per Q/A pair
W->>DBW: INSERT question (status='processing', parameters.human_answer)
W->>NLP: POST predict {message: question} (NOT send_message)
Note right of NLP: timeout 60s; throttle for TPM/RPM
NLP-->>W: {answer, confidence, sources, response_time}
W->>DBW: UPDATE question (answer, confidence, sources, status='completed')
end
end
W->>DBW: UPDATE test_case status='completed'
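The ExtractConversationPairs step in the sequence above pairs a customer question with the next human text reply. A rough sketch of that pairing rule is shown here, under stated assumptions: the message shape (`:sender`, `:type`, `:text`) is hypothetical, and the trailing-question-mark check is only a stand-in for the repo's own `question?` logic.

```ruby
# Stand-in for the real question? heuristic, which is not reproduced here.
def question?(text)
  text.to_s.strip.end_with?('?')
end

# messages: array of hashes with a hypothetical shape
#   { sender: 'customer' | 'agent' | 'system', type: 'text' | ..., text: String }
def extract_pairs(messages)
  pairs = []
  pending_question = nil

  messages.each do |msg|
    next if msg[:sender] == 'system' # system events never form a pair
    next unless msg[:type] == 'text' # images, voice notes, etc. are skipped

    case msg[:sender]
    when 'customer'
      pending_question = msg[:text] if question?(msg[:text])
    when 'agent'
      next if pending_question.nil?

      pairs << { question: pending_question, human_answer: msg[:text] }
      pending_question = nil
    end
  end

  pairs
end
```

A room with no human text reply yields an empty array, which is how bot-only rooms drop out of the batch without being counted.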
Failure path — LLM error on one question (batch continues)
sequenceDiagram
participant W as Worker
participant DBW as Postgres primary
participant NLP as QontakNLP
W->>DBW: INSERT question (status='processing')
W->>NLP: POST predict
Note right of NLP: timeout after 60s / 5xx
NLP--xW: error
W->>W: Rollbar.error(test_case_id, room_id)
W->>DBW: UPDATE question status='failed', status_description=error
Note over W: continue with next question; test_case still reaches 'completed' (partial)
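The skip-and-continue behavior in this failure path can be sketched as a loop that rescues per question. In this illustration `predict` stands in for the QontakNLP predict call and `report` for `Rollbar.error`; both are injected lambdas so the sketch runs without either service, and the row hashes are a simplified stand-in for the question records.

```ruby
# predict: callable taking the question text, returning { answer:, confidence: }.
# report:  callable taking (error, context); context carries no message text.
def shadow_generate(pairs, predict:, report: ->(_error, _context) {})
  pairs.each_with_index.map do |pair, index|
    row = { question: pair[:question], status: 'processing' }
    begin
      result = predict.call(pair[:question])
      row.merge(answer: result[:answer], confidence: result[:confidence],
                status: 'completed')
    rescue StandardError => e
      report.call(e, { pair_index: index })
      # The failure is recorded on the row and the batch moves on.
      row.merge(status: 'failed', status_description: e.message)
    end
  end
end
```

Because every pair produces a row, the caller can still mark the test case `completed` and let the UI render the failed rows individually.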
Failure path — Chat Service room-list unavailable (whole batch)
sequenceDiagram
participant W as Worker
participant HUB as Hub Chat Service
participant DBW as Postgres primary
W->>HUB: GET assigned rooms
HUB--xW: 5xx / timeout
W->>W: Rollbar.error
W->>DBW: UPDATE test_case status='failed'
Note over W: UI surfaces error + Retry (AITEST-S02/ERR-1)
Detail 2.3 — Database Model (DDL)
No new tables. Both tables exist (migrations 20260512000001, 20260512000002,
Postgres, ActiveRecord::Migration[7.1], uuid PK via pgcrypto). Surfacing
partial/failed questions needs no schema change, because status_description
already exists per the schema. The one additive migration this phase requires
adds name to ai_agent_test_cases (below). No destructive change.
Current shape (verified — for the agent's reference, not re-created):
-- db/migrate/20260512000001_create_ai_agent_test_cases.rb (EXISTS)
CREATE TABLE ai_agent_test_cases (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
ai_agent_history_id uuid NOT NULL,
ai_agent_id uuid NOT NULL,
organization_id integer NOT NULL,
company_id varchar NOT NULL,
status varchar,
confidence_score integer,
type varchar,
deleted_at timestamp,
created_at timestamp NOT NULL,
updated_at timestamp NOT NULL
);
-- indexes: organization_id, company_id, ai_agent_history_id, ai_agent_id, status, type
-- db/migrate/20260512000002_create_ai_agent_test_case_questions.rb (EXISTS)
CREATE TABLE ai_agent_test_case_questions (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
ai_agent_test_case_id uuid NOT NULL, -- FK ON DELETE CASCADE
organization_id integer NOT NULL,
company_id varchar NOT NULL,
topic varchar, question text, answer text,
is_score boolean DEFAULT false, score integer,
scored_by uuid, scored_by_email varchar, scored_by_name varchar, scored_at timestamp,
started_at timestamp, completed_at timestamp,
response_time integer, confidence integer,
status varchar, status_description text,
sources jsonb DEFAULT '[]', parameters jsonb DEFAULT '{}',
deleted_at timestamp, created_at timestamp NOT NULL, updated_at timestamp NOT NULL
);
-- indexes: ai_agent_test_case_id, organization_id, company_id, status, score, topic
Additive migration (this phase — required, not conditional). The FE TestCase
type carries name (store/ai-agent/interface.ts) and the Generate drawer submits
it, but BE create does not persist a name today (repositories/create_test_case.rb
sets no name). Chunk 1 adds and persists the column so the create round-trip and
the list name column are non-empty (resolves the §2.G partial row, REV-2):
-- db/migrate/2026XXXXXXXXXX_add_name_to_ai_agent_test_cases.rb
ALTER TABLE ai_agent_test_cases ADD COLUMN name varchar;
CREATE INDEX index_ai_agent_test_cases_on_name ON ai_agent_test_cases (name);
name is nullable for backward compatibility (existing rows have none); CreateTestCase
persists params[:name] (length-bounded ≤ 24, validated in the use-case contract).
- Cardinality: ~1 test case per agent per validation run; questions 10–70 per case.
- Growth: bounded by cap (≤70 questions/case).
- PII: `question`, `answer`, `parameters.human_answer` contain customer/agent text → see §3 Compliance.
- Retention: soft-delete (`deleted_at`); hard-purge window TBD (Open Q #8).
Per-status lifecycle — ai_agent_test_cases.status:
| Status | Visibility | Retention | Restore | Transitions allowed |
|---|---|---|---|---|
| pending | list (transient) | until processed | n/a | → processing |
| processing | list w/ spinner | during batch | n/a | → completed / failed |
| completed | default list | until soft-deleted | restore via paranoid | (ratings only) |
| failed | list w/ error | until soft-deleted | re-run (re-enqueue) | → processing |
| (soft-deleted) | hidden | until hard-purge (TBD) | restore (paranoid) | — |
Per-status lifecycle — ai_agent_test_case_questions.status:
| Status | Visibility | Retention | Restore | Transitions |
|---|---|---|---|---|
| pending | n/a (transient) | during batch | n/a | → processing |
| processing | per-question spinner | during batch | n/a | → completed / failed |
| completed | comparison shown | with parent | with parent | (rating only) |
| failed | "could not generate" | with parent | re-run | → processing |
Detail 2.4 — APIs
Base: /api/v1/ai_agents (verified mount, §2.0). All set_role(%w[owner supervisor admin]). Success {status, code, message, data, meta?}; error ErrorException.
Outbound endpoints (consumers call us)
| Endpoint | Method | AuthN/AuthZ | Request | Response | Status codes | Idempotency | Versioning | Reuse? |
|---|---|---|---|---|---|---|---|---|
| /api/v1/ai_agents/:id/test_cases | GET | api_auth + set_role | query: page,limit,query,status,order_by,order_direction | {data:[TestCase], meta} | 200, 403 | n/a (read) | v1 | reuse |
| /api/v1/ai_agents/:id/test_cases | POST | api_auth + set_role | {type, version_id, name} | {data: TestCase} (status processing) | 201, 404 (version), 422, 403 | client-side dedupe; server idempotent per (agent,version,name) recommended | v1 | extend |
| /api/v1/ai_agents/:ai_agent_id/test_cases/:id | GET | api_auth + set_role | path | {data: TestCaseDetail{questions[]}} incl. confidence_score, per-question answer/parameters.human_answer/confidence/sources/response_time/score/status | 200, 404 | n/a | v1 | reuse |
| /api/v1/ai_agents/:id/test_cases/:test_case_id/questions/:question_id | PATCH | api_auth + set_role | {score: 0\|1} | {data: question} + recomputed confidence_score | 200, 404, 422 (score not 0/1) | last-write-wins per question | v1 | extend |
| /api/v1/ai_agents/:id/test_cases/:test_case_id | DELETE | api_auth + set_role | path | {status:success} (soft delete) | 200, 404, 403 | idempotent (already-deleted → 404) | v1 | new-with-justification (FE client exists; BE route missing) |
| /api/v1/ai_agents/:id/publish | POST | api_auth + set_role | {override_reason?} | {data: ai_agent} | 200, 404, 422 (below threshold when gate on) | idempotent | v1 | extend (add gate) |
| /api/v3/paths/:id/tree_diagram | GET | api_auth + set_role | path | {data: tree} w/ ai_agent.avg_confidence_score | 200, 404 | n/a | v3 | extend |
Inbound webhooks (other services call us)
N/A — reason: this phase introduces no inbound webhooks. Shadow generation is a
synchronous outbound call from the worker to QontakNLP (no callback); room/message
fetch is outbound to Hub Chat Service.
DELETE contract (REV-3): full specification (this is a new BE route; chatbot-fe's
deleteTestCase already calls it):
- Request: no body. Path params `:id` (ai_agent), `:test_case_id`.
- Behavior: soft delete via `acts_as_paranoid`. The `DeleteTestCase` use case loads the test case scoped to `current_user['chatbot_organization_id']` and calls `.destroy` (paranoid sets `deleted_at`). Child `ai_agent_test_case_questions` are soft-deleted via the model's `dependent: :destroy` (also paranoid). No hard delete.
- Response: `200 {status:"success", code:200, message:"OK", data:{ id }}`.
- Status codes: `200` on success; `404` "Test case not found" when the id does not exist or belongs to another org (cross-tenant reads return 404, not 403, to avoid id enumeration); `403` only when `set_role` rejects the role outright; idempotent: deleting an already-soft-deleted case returns `404` (the paranoid default scope hides it).
- Restore: out of scope for the API this phase; soft-deleted rows are restorable via `acts_as_paranoid` (`.restore`) if a future undo surface is added (hard-purge window is Open Q #8). No restore endpoint is exposed now.
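The status-code rules of the DELETE contract can be illustrated with an in-memory stand-in. This is a sketch, not the `DeleteTestCase` use case: the `store` hash replaces the paranoid model, and the `NotFound` class (which would map to the 404 response) is a hypothetical name.

```ruby
class NotFound < StandardError; end # would map to 404 "Test case not found"

# store: Hash of id => { organization_id:, deleted_at: }, standing in for the table.
def delete_test_case(store, id:, organization_id:, now: Time.now)
  row = store[id]
  # Missing, foreign-org, and already-deleted rows look identical to the caller,
  # so ids cannot be enumerated across tenants.
  if row.nil? || row[:organization_id] != organization_id || row[:deleted_at]
    raise NotFound, 'Test case not found'
  end

  row[:deleted_at] = now # soft delete: the row stays, only the timestamp is set
  { status: 'success', code: 200, message: 'OK', data: { id: id } }
end
```

A second call with the same id raises `NotFound`, matching the contract's "already-soft-deleted returns 404" rule.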
Example create request/response:
// POST /api/v1/ai_agents/7e.../test_cases
{ "type": "inbox", "version_id": "a1...", "name": "War room sample #1" }
// 201
{ "status":"success","code":201,"message":"OK",
"data": { "id":"c3...","type":"inbox","version_id":"a1...","name":"War room sample #1","status":"processing","score":null } }
Detail 2.A — UI Contract
ConfidenceMeter.vue (new)
- Figma: node 16514-155786 · file `chatbot-fe/modules/bot-automation/components/testing/ConfidenceMeter.vue`
- Props:
interface ConfidenceMeterProps {
scorePercent: number; // 0..100, = (thumbsUp / total) * 100
threshold?: number; // default 80
totalRated: number;
totalSample: number;
}
- State owner: derived from `store/ai-agent` `testCaseDetail` (no local source of truth).
- Events: none emitted; analytics `ai_validation_completed` fires from parent when `scorePercent >= threshold`.
- Conditional render: `< threshold` → "Low Confidence" (warning); `>= threshold` → "Ready to Launch" (success).
- A11y: `role="progressbar"`, `aria-valuenow/min/max`, label "Confidence meter".
TestCaseComparison.vue (new)
- Figma: node 16514-155786 · file `.../testing/TestCaseComparison.vue`
- Props:
interface TestCaseComparisonProps {
question: TestCaseQuestion; // from store/ai-agent/interface.ts
readonly: true; // comparison is read-only (NEG-3)
}
- Events: `@rate` `{ questionId: string; score: 0 | 1 }` → dispatches `UPDATE_TEST_CASE_QUESTION`.
- Conditional: `question.status === 'failed'` → AI panel shows "could not generate" (S05/ERR-1); else human-left / AI-right with confidence/response_time/sources.
- A11y: thumbs are `<button>` with `aria-pressed`.
Detail 2.B — Data-Fetching Strategy
- Library: Pinia store + `$apiMain` (ofetch), existing (`store/ai-agent/actions.ts`).
- Cache key: store state slices `testCases`, `testCaseDetail` (no external cache lib).
- TTL / refetch: refetch on mount; poll `GET .../test_cases` every ~3 s while any row `status ∈ {pending, processing}` (S08/AC-2), stop at `completed`/`failed`.
- SWR: no; explicit fetchStatus enum (`pending`/`resolved`/`rejected`).
- Optimistic updates: yes for rating. `UPDATE_TEST_CASE_QUESTION` snapshots questions and rolls back on reject (existing). On success, dispatch a meter recompute read (the BE returns recomputed aggregate).
Detail 2.C — UI State Matrix
| Surface | Loading | Empty | Error | Partial | Success |
|---|---|---|---|---|---|
| Testing list | skeleton rows (mirror AiAgentsTable) | "No test cases yet" + Generate CTA | blank slate "Couldn't load" + Retry; log ai_workspace_load_failed | some rows processing (poll) | table rows |
| Generate drawer | submit spinner | n/a | inline validation (name/version) | n/a | drawer closes → generating modal |
| Generating modal | progress bar + count | n/a | failed badge + Retry | partial count shown | → list shows completed |
| Detail/comparison | skeleton | "no questions in this test case" | retry | some questions failed (per-card) | human/AI panels |
| Confidence meter | 0% until first rating | 0% (no ratings) | n/a (derived) | partial as more rated | meter + Ready-to-Launch |
Detail 2.D — Data Integrity Matrix
| Write path | Transaction scope | Partial failure | Idempotency | Consistency | Duplicate handling | Stale read |
|---|---|---|---|---|---|---|
| Create test case | single INSERT | 422 if save fails (no worker enqueued) | recommend unique-ish (agent,version,name) guard | strong (single row) | second click → second row (FE disables button) | n/a |
| Worker: per-question INSERT+UPDATE | per-question (not one big trx) | failed question → status=failed, batch continues | re-run replaces by deleting prior questions for the case (re-enqueue) | eventual (batch) | re-run guard: clear existing questions before regen | poll reflects within ~3 s |
| Rate question + recompute | question UPDATE then aggregate UPDATE in one trx | rollback both on failure → FE restores prior | last-write-wins per question | strong within request | repeated same score → idempotent | meter refetched after write |
| Publish (gate) | single UPDATE active_version_id | 422 if <80 & gate on | idempotent | strong | n/a | n/a |
Detail 2.E — Concurrency Collision Map
| Resource | Writers | Collision | Resolution | On failure |
|---|---|---|---|---|
| ai_agent_test_cases.confidence_score | concurrent raters on same case | two thumbs near-simultaneously | recompute reads current question scores in-trx (no stored delta) | last recompute wins; value is deterministic from question rows |
| ai_agent_test_case_questions of a case | worker (regen) vs rater | rate while re-run in flight | re-run sets case processing → FE disables rating; reject rating with 409/422 if case not completed | FE shows "regenerating" |
| ai_agents.active_version_id | publish vs publish | double activate | DB row update idempotent | last write wins |
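The "deterministic from question rows" resolution means the aggregate is always rebuilt from the rows rather than incremented. A small sketch of that recompute is shown below. The rows are plain hashes mirroring the `is_score` and `score` columns, and the denominator (rated questions only) is an assumption: the RFC's `(thumbsUp / total) * 100` does not say whether `total` means rated or sampled questions.

```ruby
# questions: array of hashes with :is_score (bool) and :score (0 or 1).
# Assumption: the denominator is the number of rated questions. If the product
# decision is "all sampled questions", swap rated.size for questions.size.
def confidence_score(questions)
  rated = questions.select { |q| q[:is_score] }
  return nil if rated.empty? # matches the null default before any rating

  thumbs_up = rated.count { |q| q[:score] == 1 }
  (thumbs_up * 100.0 / rated.size).round
end
```

Because the function reads only current row values, two concurrent recomputes converge on the same number regardless of which one commits last.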
Detail 2.F — Async Job / Event Consumer Spec
| Job | Trigger | Input | Retry | DLQ | Concurrency | Idempotency | Per-msg timeout | Poison handling |
|---|---|---|---|---|---|---|---|---|
| FetchRoomConversationsWorker (extend) | perform_async from CreateTestCases | {test_case_id, organization_id} | retry: false today → change to bounded retry (e.g. 3, exp backoff) for transient Hub/NLP errors; set failed on exhaustion | none (Sidekiq dead set; alert on failure) | queue :ai_agent (sidekiq.yml concurrency 5 staging / 10 prod) | re-run clears prior questions for test_case_id before regen | per-question NLP read/open_timeout: 60s | fatal → Rollbar.error + test_case status=failed; per-room/per-question errors skip-and-continue (existing pattern) |
Detail 2.F.1 — Responsibility Boundary Matrix
| Step | Owning squad / service | Inbound trigger | Outbound effect | Failure handler | PRD anchor |
|---|---|---|---|---|---|
| 1. Create test case | BOT (chatbot API) | SPV POST | row + worker enqueue | 404/422 | §9 #1, S01 |
| 2. Fetch assigned rooms | Platform / Hub Chat Service | worker | room_ids | Rollbar + case failed | §15, S02 |
| 3. Sample 10%/cap | Data (algorithm) / BOT (impl) | worker | sampled rooms | n/a (deterministic) | §15, S02 |
| 4. Extract Q/A | BOT (ExtractConversationPairs) | worker | pairs | room skipped | S03 |
| 5. Shadow generate | AI squad (QontakNLP) | worker per question | answer/confidence/sources | question failed, continue | §15, S04 |
| 6. Persist questions + status | BOT | worker | DB rows | partial; case completed | S04, S08 |
| 7. Rate + recompute | BOT | SPV PATCH | aggregate score | rollback | S06 |
| 8. Gate publish | BOT | SPV POST publish | activate or 422 | 422 below threshold | S07 |
| 9. Tree-diagram avg | BOT | tree-diagram read | node score | "no score yet" | S10 |
Step 3 ownership (Data vs BOT) and Step 5 throttling contract (TPM/RPM) are the two cross-squad items to confirm before build (Open Q #5, §5).
Detail 2.F.2 — State Surface Contract
| Entity | State field / event | Defaults | Updated by | Read via | Stale window |
|---|---|---|---|---|---|
| Test case | status | pending | worker | GET .../test_cases (poll) | ~3 s (poll interval) |
| Test case | confidence_score | null | rate recompute | detail + list | immediate (write-through) |
| Question | status / status_description | pending | worker shadow-gen | detail | batch duration |
| Question | score / is_score | null / false | RateTestCaseQuestion | detail | immediate |
| AI agent (tree) | avg_confidence_score (computed) | "no score yet" | GetTreeDiagramV3 | tree-diagram | per request |
Detail 2.G — Cross-Layer Contract Verification
| Endpoint | BE response schema | FE expected schema | Match? | Gaps |
|---|---|---|---|---|
| GET .../test_cases | {status,code,message,data:[...],meta} | TestCasesResponse {data: TestCase[]} | yes | FE reads data; meta optional |
| POST .../test_cases | {data: test_case} (snake_case, incl. name) | CreateTestCasePayload {agent_id,type,name,version_id} → CreateTestCaseResponse{data:TestCase} | yes (after chunk 1) | resolved (REV-2): name column added + persisted in §2.3/§4.D chunk 1; no longer conditional |
| GET .../test_cases/:id | {data: {..., questions:[...]}} incl. confidence_score, parameters.human_answer | TestCaseDetailResponse {data: TestCaseDetail{questions}} | yes | FE TestCaseQuestion fields align 1:1 with schema |
| PATCH .../questions/:id | {data: question} + recomputed aggregate | UpdateTestCaseQuestionPayload {score:0\|1} | yes (after chunk 5) | aggregate recompute is in-scope work (§4.D chunk 5); meter is static until it lands |
| DELETE .../test_cases/:id | 200 {data:{id}} soft delete (full contract §2.4) | deleteTestCase client exists | yes (after chunk 6) | resolved (REV-3): BE route + contract specified in §2.4; built in §4.D chunk 6 |
All rows now reach `Match? = yes` once their named execution chunk lands; there is no silent cross-layer divergence. The three former gaps (name persistence → chunk 1, aggregate recompute → chunk 5, delete endpoint → chunk 6) are explicit in-scope work.
Detail 2.H — End-to-End Data Flow
SPV clicks Generate → GenerateFromInboxDrawer @confirm {name, version} → store CREATE_TEST_CASE → ai-agents.ts POST /api/v1/ai_agents/:id/test_cases → CreateTestCases UC → INSERT (status processing) + Sidekiq enqueue → 201 → FE generating modal polls GET .../test_cases → worker (sample → extract → per-question NLP predict → persist) → status completed → SPV opens detail GET .../test_cases/:id → TestCaseComparison renders → @rate PATCH .../questions/:id → recompute confidence_score → meter updates → at ≥80% Activate enabled → POST .../publish.
- Side effects: PaperTrail versions on test_cases/questions; analytics events (§3); Rollbar on errors.
- Ownership: FE (chatbot-fe) steps 1, detail render, rating UI; BE (chatbot) create/worker/rate/publish/tree; Platform (Hub) rooms; AI (NLP) predict.
Detail 2.I — Scope Boundaries
- BE create: `chatbot/app/workers/fetch_room_conversations_worker.rb` (extend), new repos/services for sampling + shadow-gen + persistence under `app/api/frontend_service/v1/ai_agent/`, extend `rate_test_case_question.rb` + `publish.rb` + `get_tree_diagram_v3.rb`, new `DELETE` route, one additive migration.
- BE modify: `repositories/create_test_case.rb` (persist `name`, status `processing`), `test_cases_controller.rb` (add delete route + name param).
- BE NOT touched: live `SendMessageWorker`/inbox send path (must remain uncalled), `ai_agent_histories` schema.
- FE create: `chatbot-fe/pages/bot-automation/testing/index.vue` + `modules/bot-automation/components/testing/*` (Table, GenerateModal, GenerateDrawer, GeneratingModal, Comparison, QuestionList, ConfidenceMeter).
- FE modify: `layouts/bot-automation.vue` (nav item), `modules/bot-automation/components/AiAgentEditor.vue` (Activate gate), bot-flow node (tree confidence badge).
- FE NOT touched: legacy `modules/ai-agent/components/forms/Validation*.vue` (old module; reference only, not extended).
- Shared: Pinia `store/ai-agent` already has the actions/types: extend, don't fork. `@mekari/pixel3` components reused.
Detail 2.J — Asset Inventory
| Asset | Type | Source | Format & sizes | Path |
|---|---|---|---|---|
| "No test cases yet" empty illustration | illustration | reuse existing (/images/not-found-search-illustration.png used by AiAgentsTable) or new export | PNG @1x/2x | chatbot-fe/public/images/ |
| thumbs up/down, info icons | icon | @mekari/pixel3 MpIcon | SVG (DS) | n/a (DS) |
No new fonts/lotties. Any net-new illustration for the comparison empty/failed state flagged for Design QA (§5 Q-A).
3. High-Availability & Security
The Testing pipeline is off the live request path: it runs in the isolated
:ai_agent Sidekiq queue, reading from Hub Chat Service and calling QontakNLP. If
Hub or NLP is slow/down, only batches degrade (the test case goes failed and is
retriable) — live inbox and live AI answering are unaffected. Reads should target a
replica where the chatbot DB topology offers one (PRD §6); writes go to primary.
Performance Requirement
- Frontend: list LCP < 2.5 s; rating INP < 200 ms; CLS < 0.1; bundle delta small (reuses Pixel3 + existing store). Browser support per chatbot-fe baseline; web only.
- Backend: a ~50-item batch completes in ≈2–5 min (PRD §6); per-question NLP p99 governed by `qontak_nlp_prediction_timeout` (default 60 s). Worker concurrency bounded by queue config (5 staging / 10 prod). No added live-path RPS.
- NLP throttle contract (REV-1). Per-question shadow-generation calls are paced by an in-worker token-bucket so the `:ai_agent` batch never starves live prediction traffic on the shared AI service:
  - Ceiling: a per-org RPM cap read from `SystemPreference` (`group_code: 'engine'`, `code: 'ai_agent_testing_nlp_rpm'`, default `60` req/min, i.e. one question/sec, well within a ~50–70 item batch's 2–5 min budget). The TPM dimension is bounded indirectly by the cap (PRD Open Q #5: AI squad confirms the production ceiling before beta; default is conservative).
  - Pacing: acquire a bucket token before each `QontakNlp::Predict` call; if empty, sleep until refill (bucket lives in the worker process; batch is single-worker per test case so no cross-process coordination is needed this phase).
  - On HTTP 429 / rate-limit from the AI service: exponential backoff (e.g. 1 s → 2 s → 4 s, max 3 attempts per question) and re-queue the question for a later pass; on attempts-exhausted, mark the question `status='failed'` (`status_description='rate_limited'`) and continue the batch (consistent with §2.2 failure path). A 429 never fails the whole test case.
  - This makes Decision D-5 fully specified (the throttle was previously deferred).
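The pacing and backoff rules of the throttle contract could look roughly like the sketch below. Everything named here (`TokenBucket`, `with_backoff`, `RateLimited`, the `burst:` parameter) is hypothetical. Two readings are assumed: a burst of 1 so calls are evenly spaced at 60/rpm seconds, and "1 s → 2 s → 4 s" as three retries after the initial call. The re-queue-for-a-later-pass step is left out.

```ruby
# Token bucket with an injectable clock and sleeper so it can be exercised
# without real waiting. Names and defaults are illustrative.
class TokenBucket
  def initialize(rpm:, burst: 1,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(seconds) { sleep(seconds) })
    @capacity = burst.to_f
    @refill_per_second = rpm / 60.0
    @tokens = @capacity
    @clock = clock
    @sleeper = sleeper
    @last_refill = clock.call
  end

  # Waits (via the sleeper) until one token is available, then consumes it.
  def acquire
    refill
    if @tokens < 1
      @sleeper.call((1 - @tokens) / @refill_per_second)
      refill
    end
    @tokens -= 1
  end

  private

  def refill
    now = @clock.call
    @tokens = [@capacity, @tokens + (now - @last_refill) * @refill_per_second].min
    @last_refill = now
  end
end

class RateLimited < StandardError; end # stands in for an HTTP 429 from the AI service

# Retries the block on RateLimited, waiting base, 2*base, 4*base seconds.
# Re-raises once the retries are used up so the caller can mark the question failed.
def with_backoff(retries: 3, base: 1, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    yield
  rescue RateLimited
    raise if attempt >= retries

    sleeper.call(base * (2**attempt))
    attempt += 1
    retry
  end
end
```

In a worker the two would compose as `bucket.acquire` followed by `with_backoff { predict_call }` per question, with a rescue of `RateLimited` around that to set `status_description='rate_limited'`.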
Monitoring & Alerting
- FE analytics (Mixpanel, names from PRD §12): `ai_workspace_opened`, `ai_validation_generated {sample_size,date_range,test_case_id}`, `ai_response_graded {grade,confidence_score,inquiry_id}`, `ai_validation_completed`, `ai_agent_activated`. Error slate logs `ai_workspace_load_failed`.
- BE: Rollbar (existing) on worker errors with `test_case_id`/`room_id`. Structured `Rails.logger.info` already emits `{worker, test_case_id, room_ids_count, conversation_pairs_count}`; extend with `generated_count`, `failed_count`.
- Alerts (PRD §12): batch failure rate > 10%/1h → `#bot-ai-alerts`; NLP error rate > 5%/15m → `#bot-ai-alerts` + PagerDuty.
- Cross-layer: propagate request/job id from create response into worker logs for trace.
Logging
- BE: structured worker log (above) + Rollbar; FE: console error → Sentry/Datadog per chatbot-fe.
- PII scrub: do not log `question`, `answer`, or `parameters.human_answer` bodies (customer/agent text). Log only ids, counts, statuses.
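One way to make the PII rule hard to violate by accident is an allow-list applied to every log payload. The sketch below assumes plain hashes; the `LOGGABLE_KEYS` list and the `loggable` helper are hypothetical names, with the keys taken from the fields this RFC mentions for worker logs.

```ruby
# Keys permitted in worker logs: identifiers, counts, and statuses only.
LOGGABLE_KEYS = %i[
  worker test_case_id room_id room_ids_count conversation_pairs_count
  generated_count failed_count status
].freeze

# Drops every key that is not on the allow-list, so message bodies such as
# question, answer, or parameters never reach the logger.
def loggable(payload)
  payload.slice(*LOGGABLE_KEYS)
end
```

An allow-list fails closed: a newly added field stays out of the logs until someone deliberately lists it.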
Security Implications
- Threat model: (a) cross-tenant data leak via test-case ids, mitigated by org-scoped queries + `set_role`; (b) customer-message leakage during shadow gen, mitigated by calling predict only, never `send_message`/notification (spec-asserted, S04/AC-1); (c) PII to 3rd-party LLM, covered by existing DPA, transient inference, not used to train public model (Open Q #1, InfoSec approval required); (d) privilege escalation, mitigated by `set_role` server-side on every endpoint.
Role × Endpoint Authorization Matrix
| Role | Endpoint(s) | Methods | Tenant scope | UI visibility | Constraint | Audit |
|---|---|---|---|---|---|---|
| owner | all test-case + publish + tree | GET/POST/PATCH/DELETE | own org | full | — | PaperTrail |
| supervisor | all test-case + publish + tree | GET/POST/PATCH/DELETE | own org | full | no force-override (admin-only, S09) | PaperTrail |
| admin | all test-case + publish (+override) + tree | GET/POST/PATCH/DELETE | own org | full | override requires reason | PaperTrail + reason |
| standard agent | none | — | — | menu hidden | 403 on direct route | n/a |
| bot-specialist | tree-diagram read | GET | own org | tree only | read-only | n/a |
Every role from Detail 1.A appears here. `standard agent` is explicitly denied.
- Ownership validation: queries scoped by `current_user['chatbot_organization_id']` (existing use-case pattern). Enforcement: use-case layer + `set_role`.
- Input validation: `score ∈ {0,1}` (422 otherwise); `type`, `version_id`, `name` (length-bounded, e.g. ≤ 24 per prototype) via dry-validation contract.
- Injection: ActiveRecord parameterized; outbound URLs (Hub/NLP) from config/env (no user input in URL) → SSRF-safe.
- Secrets: channel access token via lockbox-encrypted `chatbot_tokens_encrypted`; NLP base URL from org settings/env. No hard-coded keys.
- Audit: PaperTrail (`has_paper_trail`) on both tables; force-activate writes reason + score-at-override (S09).
- Rate limiting: per-question NLP throttle (TPM/RPM); create endpoint guarded by FE button disable + recommended server idempotency.
- Static analysis: `bundle exec brakeman` (Gemfile) + `bundle exec rubocop`.
- ISO 27001/27701: PII processing logged + access-controlled; see Compliance below.
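The input-validation rules in the list above can be illustrated in plain Ruby. This is a stand-in for the dry-validation contract, not the contract itself: the method names are hypothetical, and the 24-character bound is the prototype figure quoted above rather than a confirmed limit.

```ruby
# Plain-Ruby stand-in for the dry-validation contract (illustrative only).
NAME_MAX_LENGTH = 24          # assumption: "≤ 24 per prototype"
VALID_SCORES = [0, 1].freeze  # score ∈ {0,1}

# Returns a hash of field => [messages]; an empty hash means valid.
def validate_rating(params)
  errors = {}
  errors[:score] = ['Invalid score'] unless VALID_SCORES.include?(params[:score])
  errors
end

# name is nullable, so only its length is checked when present.
def validate_create(params)
  errors = {}
  name = params[:name]
  if name && name.to_s.length > NAME_MAX_LENGTH
    errors[:name] = ["must be at most #{NAME_MAX_LENGTH} characters"]
  end
  errors
end

# Maps validation output to the HTTP status the controller would render.
def status_for(errors)
  errors.empty? ? 200 : 422
end
```

Note that this sketch compares against integers only, so a string `"1"` would be rejected; whether the real contract coerces strings is not stated in this RFC.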
Detail 3.A — Failure Mode Catalog (merged)
| Surface | FE behavior on failure | BE response on failure | Code-shape consistency |
|---|---|---|---|
| List load | blank slate + Retry; ai_workspace_load_failed | 403/500 ErrorException | yes |
| Create | inline error; button re-enabled | 404 (version) / 422 / 403 | yes |
| Generating | failed badge + Retry | worker sets status=failed (Rollbar) | yes (poll reads status) |
| Per-question gen | "could not generate" card | question status=failed + status_description | yes |
| Rate | optimistic rollback + inline error | 404/422 | yes |
| Publish below threshold | button disabled; if forced API → reason | 422 with reason (gate on) | yes |
Detail 3.A.1 — Branch & Skip Catalog
| Branch trigger | Where checked | Downstream effect | Audit | User-visible? |
|---|---|---|---|---|
| Eligible rooms < 10 | worker sampling step (BOT) | use 100% rooms (no 10% cut) | log count | no (result reflected) |
| Batch > 100 | worker sampling | show ≤ 50 (cap) | log | indirectly (count) |
| Bot-only room (no human reply) | ExtractConversationPairs | room excluded, not counted | log | no (NEG-2) |
| Non-text message (image/voice) | ExtractConversationPairs | message skipped | log | no (NEG-2) |
| Phase 2/3 sources | FE Generate modal | "from knowledge"/"imported" disabled | n/a | yes (NEG-4) |
| `ai_agent_testing` flag OFF | BE + FE | menu hidden / endpoints inert | n/a | yes |
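The two sampling branches in the catalog above can be sketched as follows. This is a sketch under stated assumptions: `sample_size` and `sample_rooms` are hypothetical names, rounding up is an assumption, and the cap is a parameter because this RFC quotes both ≤ 50 and 50–70.

```ruby
# Illustrative sampling rule; names and defaults are assumptions, not repo code.
SAMPLE_PERCENT = 10   # 10% sample
SMALL_BATCH = 10      # below this many eligible rooms, use all of them
DEFAULT_CAP = 50      # assumption: the RFC quotes both 50 and 50-70

# Returns how many rooms to sample out of +eligible_count+ eligible rooms.
def sample_size(eligible_count, cap: DEFAULT_CAP)
  return eligible_count if eligible_count < SMALL_BATCH

  ten_percent = (eligible_count * SAMPLE_PERCENT + 99) / 100 # integer ceiling
  [ten_percent, cap].min
end

# Picks the sample; pass a seeded Random for reproducible specs.
def sample_rooms(room_ids, cap: DEFAULT_CAP, random: Random.new)
  room_ids.sample(sample_size(room_ids.size, cap: cap), random: random)
end
```

With these defaults, 200 eligible rooms yield 20, 9 yield all 9, and 5000 are capped at 50, which matches the acceptance criteria listed for the worker sampling chunk.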
Detail 3.B — Error Response Catalog (BE)
{ "status": "error", "code": 422, "message": ["..."], "errors": {}, "error_code": null }
| Endpoint | Code | HTTP | Message | When | User-facing? |
|---|---|---|---|---|---|
| POST test_cases | — | 404 | "Version not found" | version_id invalid | yes |
| POST test_cases | — | 422 | "Failed to create test case" | save fails | yes |
| PATCH question | — | 422 | "Invalid score" | score ∉ {0,1} | yes |
| GET detail | — | 404 | "Test case not found" | bad id / cross-tenant | yes |
| any | — | 403 | "Permission denied" | set_role reject | no (menu hidden) |
| POST publish | — | 422 | "Confidence below threshold" | <80 & gate on | yes |
Detail 3.C — Error Message Catalog (FE)
| Error code | User-facing message (i18n key) | Surface | User-facing? |
|---|---|---|---|
| list_load_failed | "Couldn't load test cases" + Retry | blank slate | yes |
| create_failed | "Couldn't generate test case" | toast/inline | yes |
| rate_failed | "Couldn't save your rating" | inline (after rollback) | yes |
| gen_failed | "Could not generate" (per question) | comparison card | yes |
Detail 3.D — Compliance & Data Governance
Trigger: PII present. `question`, `answer`, `parameters.human_answer` are customer & agent conversation text sent to a 3rd-party LLM.
| Field | Classification | Legal basis | Retention | Encryption | Access audit | Right-to-delete |
|---|---|---|---|---|---|---|
| `question` / `answer` / `parameters.human_answer` | PII (customer content) | DPA (existing); UU PDP | soft-delete; hard-purge TBD (Open Q #8) | TLS in transit; DB at rest per platform | PaperTrail + set_role | soft-delete + paranoid; align purge to DPA |
| `scored_by_email` / `_name` | PII (internal user) | legitimate interest | with row | at rest | PaperTrail | with row |
Transient inference: NLP prediction payload is not persisted beyond the stored answer/confidence/sources (PRD §6). InfoSec sign-off required before beta (Open Q #1).
Detail 3.E — Accessibility
WCAG AA. Keyboard: drawer/modal focus-trap (Pixel3 defaults); thumbs reachable via Tab
with aria-pressed; meter role="progressbar". Focus returns to trigger on modal
close. Contrast verified against Pixel3 tokens. prefers-reduced-motion honored for
the generating progress animation.
4. Backwards Compatibility and Rollout Plan
Compatibility
- BE: additive only. New `name` column (nullable), new `DELETE` route, and extended worker/publish/tree behavior behind the `ai_agent_testing` flag. Existing endpoints' request/response shapes unchanged (only `name` added to create response; nullable).
- FE: new page + components; menu item additive. No change to saved client state.
- Cross-layer: snake_case JSON unchanged; FE already consumes the contract.
Rollout Strategy
- Deploy order: BE first, then FE. BE adds the pipeline + endpoints behind the flag; FE Testing page ships after the contract is live (FE polls real status).
- Feature flag: `ai_agent_testing` via `SystemPreferences::FeatureFlag.enabled?` (group_code: 'rollout', code: 'ai_agent_testing', default: false); per-org, default OFF. Kill-switch: toggle OFF per org → menu hidden + endpoints inert + gate reverts to advisory (no deploy). FE menu reuses the existing rollout-preference read pattern (`layouts/bot-automation.vue`).
- Migration sequence: add nullable `name` column (no backfill needed) → deploy BE → enable flag for internal org → deploy FE.
- Stages (audience/gates owned by `delivery/`; PRD §11/§14): Internal (telesales POC) → Closed beta (3–5 orgs) → Open beta (on request) → GA.
- Rollback trigger: batch failure rate > 20% unresolved 24 h, or any customer-message leakage → disable flag for affected orgs.
- Rollback mechanism: flag OFF (instant); for DDL, the `name` column is nullable and inert when unused, so no down-migration is needed for rollback; data written stays (soft-deletable).
Detail 4.A — Cross-Layer Rollout Compatibility Matrix
| Scenario | FE | BE | Works? | Mitigation |
|---|---|---|---|---|
| Pre-deploy | Old | Old | yes | baseline (no Testing page) |
| Backend first | Old | New | yes | new endpoints unused by old FE; flag OFF |
| Frontend first | New | Old | no | avoid — deploy BE first; if FE leads, gate page behind flag tied to BE readiness |
| Both deployed | New | New | yes | target |
| Backend rollback | New | Old | partial | FE page errors gracefully (error slate); disable flag |
| Frontend rollback | Old | New | yes | BE endpoints simply unused |
Detail 4.B — Configuration Contract
| Layer | Env var / flag | Type | Default | Required | Provisioner | Secret? |
|---|---|---|---|---|---|---|
| BE | ai_agent_testing (system_preferences rollout/ai_agent_testing) | bool | false | yes | SystemPreference (per org) | no |
| BE | ai_agent_testing_gate (publish gate enforce) | bool | false (advisory) | no | SystemPreference | no |
| BE | ai_agent_testing_threshold (engine) — gate % (REV-4) | int | 80 | no | SystemPreference (per org) | no |
| BE | ai_agent_testing_nlp_rpm (engine) — shadow-gen RPM cap (REV-1) | int | 60 | no | SystemPreference (per org) | no |
| BE | qontak_nlp_prediction_timeout | int (s) | 60 | no | SystemPreference (engine) | no |
| BE | sampling cap / pct (if configurable) | int | 10% / 50–70 | no | const or SystemPreference | no |
| BE | QONTAK_NLP_PREDICTION_ENDPOINT, AI_SERVICE_BASE_URL | url | — | yes (existing) | env/org settings | no |
| FE | rollout pref read (rollout_ai_agent pattern) | bool | false | yes | preferences store | no |
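How the flag, gate, and threshold rows of this table combine at publish time can be sketched as below. `GateSettings` and `publish_decision` are hypothetical names; the defaults mirror the table, and the rule that the gate is only enforced while the feature flag is ON follows the kill-switch description in the Rollout Strategy.

```ruby
# Hypothetical per-org settings; defaults mirror the Configuration Contract.
GateSettings = Struct.new(:feature_enabled, :gate_enforced, :threshold,
                          keyword_init: true)

DEFAULT_GATE_SETTINGS =
  GateSettings.new(feature_enabled: false, gate_enforced: false, threshold: 80).freeze

# Returns [http_status, message] for a publish attempt with the given score.
# The gate blocks only when the feature flag AND the gate pref are both on;
# otherwise it is advisory and publish proceeds.
def publish_decision(confidence_score, settings = DEFAULT_GATE_SETTINGS)
  enforced = settings.feature_enabled && settings.gate_enforced
  if enforced && confidence_score < settings.threshold
    [422, 'Confidence below threshold']
  else
    [200, nil]
  end
end
```

The comparison is strict, so a score equal to the threshold publishes; that reading of "<80" is taken from the error catalog and should be confirmed against the spec for S07.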
Detail 4.C — Test Plan (commands from repo)
| Layer | Command (source) | What it proves |
|---|---|---|
| BE unit/use-case | RAILS_ENV=test bundle exec rspec spec/api/frontend_service/v1/ai_agent (bin/rspec_pipeline.sh) | create/list/rate + new sampling/gen behavior |
| BE worker | RAILS_ENV=test bundle exec rspec spec/workers/fetch_room_conversations_worker_spec.rb | sampling, no send_message, persistence, status |
| BE repo | RAILS_ENV=test bundle exec rspec spec/core/repositories/chat_service/extract_conversation_pairs_spec.rb | text-only filtering (S03) |
| BE tree | RAILS_ENV=test bundle exec rspec spec/core/repositories/paths/get_tree_diagram_v3_spec.rb | avg confidence in node |
| BE lint/security | bundle exec rubocop · bundle exec brakeman (bitbucket-pipelines.yml, Gemfile) | style + security scan |
| FE unit | pnpm test → vitest run (chatbot-fe/package.json) | components + store rating rollback |
| FE e2e | pnpm test:e2e → playwright test | generate→poll→detail→rate→meter flow |
| FE lint + build | pnpm lint · pnpm build | typecheck + bundle |
Detail 4.D — Agent Execution Plan
| Order | Layer | Chunk | Files | Commands | Acceptance criteria |
|---|---|---|---|---|---|
| 1 | BE | Add name column + persist; create status processing | db/migrate/2026XXXX_add_name_to_ai_agent_test_cases.rb, repositories/create_test_case.rb, use_cases/create_test_cases.rb, test_cases_controller.rb | bundle exec rails db:migrate; rspec create spec | migration up/down; create persists name, returns status='processing' |
| 2 | BE | Test-case status lifecycle helper | repositories/ (new update_test_case_status.rb) | rspec | status transitions pending→processing→completed/failed covered |
| 3 | BE | Worker sampling step | app/workers/fetch_room_conversations_worker.rb, new repositories/.../sample_rooms.rb | rspec spec/workers/... | 200→~20; <10→all; 5000→cap 50–70 (S02) |
| 4 | BE | Worker shadow-gen + question persistence | worker, new services/.../generate_shadow_answer.rb (wraps QontakNlp::Predict), question repo | rspec spec/workers/... | 0 SendMessageWorker enqueues (S04/AC-1); answer+parameters.human_answer+confidence+sources persisted; failed→status=failed+desc |
| 5 | BE | Confidence aggregate recompute on rate | repositories/rate_test_case_question.rb (extend), new recompute_confidence_score.rb | rspec spec/api/frontend_service/v1/ai_agent/use_cases/rate_test_case_question_spec.rb | confidence_score = round(up/total*100) after rate (S06/AC-3) |
| 6 | BE | DELETE test-case route (soft delete) | test_cases_controller.rb, new use_cases/delete_test_case.rb + repo | rspec | DELETE soft-deletes (deleted_at set); 404 cross-tenant; FE client now resolves |
| 7 | BE | Tree-diagram avg confidence | core/repositories/paths/get_tree_diagram_v3.rb (add_ai_agent) | rspec spec/core/repositories/paths/get_tree_diagram_v3_spec.rb | node returns avg over completed; none→"no score yet" (S10) |
| 8 | BE | Publish confidence gate (flagged) | repositories/publish.rb, use_cases/publish_ai_agent.rb | rspec | 422 when <80 & gate on; advisory when off (S07) |
| 9 | FE | Testing list page + table | pages/bot-automation/testing/index.vue, modules/bot-automation/components/testing/TestCasesTable.vue | pnpm test; pnpm lint | renders list/empty/loading/error; ai_workspace_load_failed on error (S01) |
| 10 | FE | Generate modal + Inbox drawer (with version selector) | .../testing/{GenerateTestCaseModal,GenerateFromInboxDrawer,TestCaseGeneratingModal}.vue | pnpm test | create dispatch + poll to completed (S08) |
| 11 | FE | Detail comparison + question list + meter | .../testing/{TestCaseComparison,QuestionList,ConfidenceMeter}.vue | pnpm test; pnpm test:e2e | human-left/AI-right + metrics; failed→"could not generate"; meter = up/total (S05,S06) |
| 12 | FE | Activate gate + nav item | layouts/bot-automation.vue, modules/bot-automation/components/AiAgentEditor.vue | pnpm test | button disabled <80; Testing nav behind flag (S07,S01) |
| 13 | FE | Tree-diagram node badge | bot-flow node component (chatbot-fe) | pnpm test | node shows avg / "no score yet" (S10) |
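The arithmetic behind chunks 5 and 7 can be sketched as follows. Both helpers are hypothetical and are not the planned `recompute_confidence_score.rb`; the sketch assumes `total` means the questions that have been rated, and that the node average is rounded to an integer.

```ruby
# Illustrative only: hypothetical helpers, not repo code.

# scores: Array of 0/1 ratings (1 = thumbs up).
# Returns an Integer percentage, or nil when nothing has been rated yet.
def confidence_score(scores)
  return nil if scores.empty?

  (scores.count(1) * 100.0 / scores.size).round
end

# case_scores: confidence scores of the completed test cases for one node.
# Returns the rounded average, or the "no score yet" label when there are none.
def node_confidence(case_scores)
  return 'no score yet' if case_scores.empty?

  (case_scores.sum.to_f / case_scores.size).round
end
```

For example, two thumbs up out of three ratings gives `confidence_score([1, 1, 0])` of 67, and a node with completed cases scoring 80 and 90 reports 85.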
Detail 4.E — Verification & Rollback Recipe
- Pre-merge (BE): 1) `bundle exec rubocop` 2) `RAILS_ENV=test bundle exec rspec spec/api/frontend_service/v1/ai_agent spec/workers spec/core/repositories/chat_service spec/core/repositories/paths` 3) `bundle exec brakeman`
- Pre-merge (FE): 1) `pnpm lint` 2) `pnpm test` 3) `pnpm test:e2e` 4) `pnpm build`
- Post-deploy signals: Rollbar batch error rate < 10%/1h (`#bot-ai-alerts`); Mixpanel `ai_validation_generated` firing; worker log `failed_count`/`generated_count` ratio healthy; zero `SendMessageWorker` enqueues correlated with `:ai_agent` batches.
- Rollback: 1) toggle `ai_agent_testing` OFF for affected org(s) (no deploy) 2) if publish gate misbehaves, toggle `ai_agent_testing_gate` OFF (advisory) 3) if needed, revert FE PR (BE endpoints become unused) 4) confirm Rollbar error rate normal in 15 min.
Detail 4.F — Resource & Cost Notes
- Compute: bounded by `:ai_agent` queue concurrency (5/10); no new pods required.
- DB: +1 nullable column; question rows ≤70/case, so growth is negligible.
- Egress: per-question HTTPS to QontakNLP (cost = batch size × token usage) — the cap controls this; confirm per-tier budget (Open Q #5).
- No new infra components.
5. Concern, Questions, or Known Limitations
Review findings ledger (from historical-validation-review.md, R1)
rfc-reviewer R1 (score 7.5/10, PROCEED) raised four material findings; all are now
addressed inline in this revision:
| Id | Severity | Finding | Resolution | Status |
|---|---|---|---|---|
| REV-1 | major | NLP throttle contract unspecified (RPM/TPM, 429 behavior) | §3 Performance: token-bucket, ai_agent_testing_nlp_rpm pref (default 60), 429→backoff+requeue→fail-question; D-5 specified | resolved (in-RFC) — production ceiling still confirmed by AI squad (Open Q #5) |
| REV-2 | major | name column left conditional | §2.3: column added + persisted unconditionally (chunk 1); §2.G create row now yes | resolved |
| REV-3 | major | DELETE endpoint contract thin | §2.4: full DELETE contract (soft-delete, 404-not-403 cross-tenant, idempotency, restore out-of-scope) | resolved |
| REV-4 | minor | publish-gate threshold source unresolved | D-7 + §4.B: ai_agent_testing_threshold pref (default 80), org-configurable — resolves PRD Open Q #4 | resolved |
Minor follow-ups still open after R2 (none blocking):
- REV-5: worker job retry intervals + dead-set alert depth (§2.F, item 10).
- REV-6: no Figma frame for the comparison/detail view (§1 Design References, item Q-A; design dependency).
- REV-7: re-run trigger path absent from §2.4 + clear-before-regen transaction (item 9, Detail 2.D).
- REV-8: create idempotency "recommended" not decided (§2.4 POST).
- REV-9 (new in R2): the §3 NLP throttle RPM cap is enforced by an in-worker token-bucket and is therefore per worker process; with queue concurrency 5/10, concurrent same-org batches can aggregate past the "per-org" ceiling. Mitigation: low likelihood (one SPV generates at a time); make it a true per-org ceiling via a Redis-backed counter keyed by org if concurrency proves real.

R2 score 8.5/10, verdict PROCEED (see `historical-validation-review.md`).
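To make REV-9 concrete, here is a minimal in-process token bucket. It is a sketch, not the worker's actual throttle: the `TokenBucket` class and its injectable `clock` are hypothetical, and the 60 RPM figure is the default of `ai_agent_testing_nlp_rpm`. Because the token count lives in one Ruby process, each worker process gets its own budget.

```ruby
# Illustrative in-process token bucket (hypothetical class, not repo code).
class TokenBucket
  def initialize(rpm:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @capacity = rpm.to_f
    @refill_per_second = rpm / 60.0
    @clock = clock
    @tokens = @capacity
    @last_refill = @clock.call
    @mutex = Mutex.new
  end

  # Returns true and consumes one token if a request may be sent now.
  def try_acquire
    @mutex.synchronize do
      refill
      return false if @tokens < 1.0

      @tokens -= 1.0
      true
    end
  end

  private

  # Adds tokens for the time elapsed since the last call, up to capacity.
  def refill
    now = @clock.call
    @tokens = [@capacity, @tokens + (now - @last_refill) * @refill_per_second].min
    @last_refill = now
  end
end

now = 0.0
bucket = TokenBucket.new(rpm: 60, clock: -> { now })
granted = 61.times.count { bucket.try_acquire } # 60: the 61st call is refused
now += 1.0                                      # one second later, one token is back
```

Two processes each holding their own bucket would together allow 120 requests in the same minute, which is the aggregation REV-9 describes; a shared counter keyed by org would remove that gap.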
Open questions
Carried from PRD §17, scoped to engineering:
- Q-A (Design): Generate-from-Inbox drawer needs a version selector (bind test case to `ai_agent_history_id`); the qontak-designer prototype only has a name field. The comparison/detail view has no prototype and needs a Figma frame + Design QA.
- Open Q #1 (Risk/InfoSec): PII to 3rd-party LLM: DPA-covered, transient inference; InfoSec approval required before beta.
- Open Q #2 (Risk): confidence recompute (S06) + activation gate (S07) ship as this RFC's work; gate is advisory for beta, enforced before GA.
- Open Q #4 (resolved, REV-4): threshold is org-configurable via `ai_agent_testing_threshold` `SystemPreference` (default 80); no redeploy to change.
- Open Q #5 (Data/AI): per-batch token budget + the production RPM/TPM ceiling across plan tiers. The §3 throttle ships with a conservative default (`ai_agent_testing_nlp_rpm` = 60); AI squad confirms the real ceiling before beta (REV-1).
- Open Q #6: separate "relevance" metric? Schema has only `confidence`; if needed, store in `parameters` (no schema change).
- Open Q #7 (Data): single human reply as "golden answer" when a room has multiple agent messages; `ExtractConversationPairs` pairs the next agent text reply.
- Open Q #8 (Eng): hard-purge window for soft-deleted test cases/questions (DPA).
- Known limitation: re-running a test case must clear prior questions to avoid duplicates (Detail 2.D); define re-run UX with Design.
- Known limitation: `FetchRoomConversationsWorker` is `retry: false` today. This RFC changes it to bounded retry; confirm idempotent re-entry (clear-before-regen).
6. Comment logs
| Date | Comment(s) From | Action Item(s) |
|---|---|---|
| 2026-06-20 | RFC author (Claude) | Initial draft from PRD + grounded against chatbot / chatbot-fe / qontak-designer code. Flagged worker gap, FE target (chatbot-fe), missing DELETE route, name-column gap. |
| 2026-06-20 | rfc-reviewer R1 (7.5/10, PROCEED) | Raised REV-1…REV-4 (NLP throttle, name column, DELETE contract, gate threshold). See historical-validation-review.md. |
| 2026-06-20 | RFC author (Claude) | Addressed REV-1…REV-4 inline: §3 throttle contract + D-5; §2.3 name column unconditional; §2.4 full DELETE contract; D-7 + §4.B configurable threshold/RPM prefs; §2.G all rows → yes; §5 ledger added. |
| 2026-06-20 | rfc-reviewer R2 (8.5/10, PROCEED) | Confirmed REV-1…REV-4 fixed; decisions 10/10 resolved; CSS 6.5→8.0. Raised REV-9 (per-process vs per-org throttle enforcement, minor). REV-5/6/7/8/9 carry open. |
7. Ready for agent execution
- yes
All execution-readiness gates are met against verified repo state:
- §1 Design References: Figma frames + DS version (`@mekari/pixel3@^1.0.12`) + Design QA named; the detail/comparison frame and the drawer version-selector are flagged in §5 Q-A (Design QA must confirm before chunks 10–11 land).
- §1 PRD-to-Schema Derivation: every entity/attribute/rule mapped to table.column + endpoint + enforcement.
- Detail 1.C Per-Story Change Map: all 10 stories, one row each, FE+BE columns filled, verifiable AC.
- Repo Reading Guide (2.0): anchors, contracts (reuse/extend/new), reading order, Source Verification with concrete evidence per row (no unverified claims).
- Design ↔ Code Mapping: frames → chatbot-fe files + tokens + backing endpoints.
- Mermaid: repo map, component, ER, two state machines, branch/skip, happy + 2 failure sequences.
- DDL: existing schema verified; one additive migration; per-status lifecycle tables for both enums.
- APIs: outbound table with reuse/extend/new tags; inbound marked `N/A` with reason; cross-layer verification flags the 3 closeable gaps.
- Failure Mode + Branch & Skip + Error catalogs complete; Role × Endpoint matrix covers every role.
- Configuration Contract complete; `ai_agent_testing` flag named, default OFF.
- Agent Execution Plan: 13 ordered chunks, each with files + repo-sourced commands + assertable AC.
- Verification & Rollback Recipe: runnable per-layer commands; named signals; flag-first rollback.

Optional next step: hand to `rfc-reviewer` for a second-pass score (`historical-validation-review.md`).