RFC: Bot-vs-Human Traffic Split — Phase 1: Per-Conversation Split
Document Conventions (do not remove)
This RFC follows the Qontak RFC Template format for governance: the Metadata table, Confluence sections 1–6, and Comment logs are mandatory; sections that do not apply are marked
`N/A — reason` rather than deleted. It is also agent-execution-ready: §1 Design References (FE half) + §1 PRD-to-Schema Derivation (BE half), §2 Repo Reading Guide (Detail 2.0) for both layers, mermaid diagrams, the §2.G Cross-Layer Contract Verification, and §4 Agent Execution Plan + Verification & Rollback Recipe are complete before §7 `Ready for agent execution: yes`.
Delivery & project management live elsewhere. This RFC is the technical artifact only: no staffing, effort estimates, timeline, or rollout schedule. Those live in the initiative's
`delivery/` folder. Until handoff, the Metadata Delivery row reads `not yet handed to delivery`.
The YAML frontmatter at the very top is the machine-readable index agents parse. The metadata table below is the human-readable governance record. Both must agree on every shared field.
Metadata
| Field | Value | Notes |
|---|---|---|
| Status | IDEA | Human label IDEA; YAML status: carries the remapped linter enum draft |
| DRI | Dimas Fauzi Hidayat | Single accountable owner of this RFC. Per-task staffing lives in delivery/, not here. |
| Team | chatbot | Advisory squad slug carried from the source PRD / initiative README |
| Author(s) | Chatbot Squad (BE + FE) | Primary authors |
| Reviewers | Chatbot Squad (BE + FE); Data Squad | Data Squad owns the analytics ingest + comparison metrics |
| Approver(s) | TBD — Chatbot tech lead; TBD — infosec approver | infosec approver required before AGREED |
| Submitted Date | 2026-06-20 | Date RFC opened for discussion |
| Last Updated | 2026-06-20 | Bump on every material edit |
| Target Release | 2026-Q3 | Carried from source PRD |
| Target Quarter | 2026-Q3 | Advisory; carried from source PRD / initiative README |
| Delivery | not yet handed to delivery | Pointer to delivery/ artifacts once handed off |
| Related | ../prds/phase-1-per-conversation-split.md, ../README.md | Source PRD + initiative README |
| Discussion | #chatbot-alerts (Slack) — thread TBD | |
Type: full-stack; Frontend sub-type: new-feature; Backend sub-type: new-feature
Sections at a Glance
- Overview (incl. §1 Design References — FE half, and §1 PRD-to-Schema Derivation — BE half)
- Technical Design (Repo Reading Guide for both layers → end-to-end mermaid → DDL → APIs → cross-layer contract verification)
- High-Availability & Security
- Backwards Compatibility and Rollout Plan (incl. cross-layer rollout matrix, §4 Agent Execution Plan, Verification & Rollback Recipe)
- Concern, Questions, or Known Limitations
- Comment logs
- Ready for agent execution
1. Overview
A Qontak chatbot is all-or-nothing today: once a channel integration has an
enabled, matching Path, every new incoming conversation is routed to the bot
(is_auto_assign_agent is false/nil on the default path, so the
intent_id branch runs in
UseCases::System::Hub::ProcessIncomingMessageWithResolve#send_message_assign_agent).
A Chatbot Admin who is not yet confident in the bot has no safe way to expose it
to a controlled slice of real traffic while humans cover the rest.
Phase 1 adds a per-channel traffic split: the Admin sets an integer
bot_percent (0–100) on a channel; for each new conversation the router draws
rand(100), sends draws below bot_percent to the bot arm, and sends the rest to the human
arm (reusing the existing SendMessageAutoAssignAgentWorker path). Each
conversation (Room) is stamped once with variant = bot | human for an
apples-to-apples comparison. The split decision is an in-process check inside
find_default_path, reading config already loaded on the channel record — no
new network/DB round-trip on the hot path.
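A minimal sketch of the draw described above, assuming a standalone helper. The names `decide_arm` and `bot_share` and the injectable `rng:` parameter are hypothetical; the RFC places the real check inside `find_default_path`.

```ruby
# Hypothetical helper; the RFC places the real check inside find_default_path.
# bot_percent: Integer in 0..100 read from the channel record.
# rng: any object responding to #rand(n); injectable so specs can pin the draw.
def decide_arm(bot_percent, rng: Random.new)
  unless bot_percent.is_a?(Integer) && (0..100).cover?(bot_percent)
    raise ArgumentError, 'bot_percent must be an Integer in 0..100'
  end

  # rand(100) yields 0..99, so 0 never selects the bot and 100 always does.
  rng.rand(100) < bot_percent ? :bot : :human
end

# Empirical bot share (in percent) over many draws with a fixed seed.
def bot_share(bot_percent, draws:, seed:)
  rng = Random.new(seed)
  bots = draws.times.count { decide_arm(bot_percent, rng: rng) == :bot }
  100.0 * bots / draws
end
```

Passing a seeded `Random` keeps specs deterministic without stubbing `Kernel#rand`.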
Success Criteria
- SC-1 (routing fidelity): Over an active experiment, `|observed bot share − configured bot_percent|` ≤ 5 percentage points (PRD §11 Routing fidelity).
- SC-2 (decide-once): A conversation's `variant` is decided exactly once and never changes for the life of that conversation, even under concurrent first-messages (PRD BTS-S02/AC-3, BTS-S02-NEG/NEG-1, §15 Q4).
- SC-3 (fail-safe): Any config-read error routes to the bot arm (current default) and logs `bot_traffic_split_decision_fallback`; no chat is dropped (PRD BTS-S02/ERR-1). Sustained `decision_fallback` < 1 % (PRD §10).
- SC-4 (clean human arm): A `variant = human` chat with no agent available enters the existing queue/offline behavior and is never re-tagged or rescued by the bot (PRD BTS-S03).
- SC-5 (self-serve config): A Chatbot Admin / Admin can enable the split and persist a validated `bot_percent` per channel; SPV + Admin can view the comparison; Agents and end-customers cannot (PRD BTS-S01, role model).
- SC-6 (gated): The control is invisible and the save endpoint returns `403` when the org feature flag is OFF or the plan is ineligible (PRD BTS-S01-NEG, BTS-S01-NEG2).
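The SC-1 tolerance can be checked with simple arithmetic. The sketch below assumes the decided arms are available as a plain array; the function name and return shape are illustrative, not part of the PRD.

```ruby
# Hypothetical check for SC-1; the name and input shape are illustrative.
# variants: array of :bot / :human tags, one per decided conversation.
def routing_fidelity(variants, bot_percent, tolerance_pp: 5.0)
  # No decided conversations yet: nothing to compare.
  return { observed: nil, drift_pp: nil, within: nil } if variants.empty?

  observed = 100.0 * variants.count(:bot) / variants.size
  drift = (observed - bot_percent).abs
  { observed: observed, drift_pp: drift, within: drift <= tolerance_pp }
end
```

For example, 33 bot conversations out of 100 against a configured `bot_percent` of 30 gives a drift of 3 percentage points, which is inside the 5-point tolerance.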
Out of Scope
Carried verbatim from PRD §4 (Non-Goals). This RFC implements none of the following:
- Sticky per-customer bucketing / deterministic identity hashing (Phase 2).
- Mid-conversation re-bucketing (the arm is immutable post-decision).
- Bot-A-vs-bot-B testing.
- Automatic winner selection / auto-ramp of `bot_percent`.
- Any change to the no-agent-available behavior (no "bot rescues the queue").
- A statistical-significance engine (raw comparison only).
- Per-segment / per-topic / attribute-conditioned targeting (flat random %).
Related Documents
| Document | Path / link |
|---|---|
| Source PRD (Phase 1) | ../prds/phase-1-per-conversation-split.md |
| Initiative README | ../README.md |
| Backend repo | chatbot (Rails 7.1, PostgreSQL, Grape, Sidekiq, Karafka) |
| Frontend repo | chatbot-fe (Nuxt 4, Vue 3, TypeScript, Pinia, @mekari/pixel3) |
Assumptions
- A-1: The split unit is a channel integration (`channel_integrations` row). The PRD’s phrase “Path config” is reconciled in ADR-1: config lives on the channel record, not on individual `paths` rows. Confirm with PM (§5 Open Q-1).
- A-2: `bot_percent` semantics: `rand(100) < bot_percent` → bot; otherwise human (equivalently `rand(100) >= bot_percent` → human, PRD §5 Determinism). `bot_percent = 100` ⇒ today’s behavior; `0` ⇒ all human.
- A-3: The conversation unit is a `Room` (`acts_as_paranoid`); a new `variant` column on `rooms` is the per-conversation tag.
- A-4: The org-level enablement flag `bot_traffic_split` is modeled as an `OrganizationFeature` (per-org), optionally fronted by a `SystemPreference` `group_code: 'rollout'` global kill-switch (ADR-7).
- A-5: The comparison view (BTS-S04, Should Have) reads aggregates from `rooms` (`variant`, `resolved_at`, `is_closed`, `assigned_at`). The canonical “resolved” definition (PRD §15 Q2), handover derivation, and CSAT source are Data-squad dependencies; see §5 Open Q-2/Q-3.
Dependencies
| Dependency | Layer / owner | Deliverable | Blocking? |
|---|---|---|---|
| ProcessIncomingMessageWithResolve#find_default_path hook point | BE / Chatbot | Already exists (process_incoming_message_with_resolve.rb:488) | NO — present |
| SendMessageAutoAssignAgentWorker + existing queue/offline path | BE / Chatbot | Reused as-is for the human arm | NO — present |
| channel_integrations PATCH endpoint + Grape stack | BE / Chatbot | New traffic_split sub-action endpoint (ADR-6) | YES — config save |
| Chatbot settings screen (pages/chat/settings/index.vue) | FE / Chatbot | New Traffic Split section + comparison view | YES — config UI in scope |
| Canonical “resolved” metric segmentable by variant | Data / Chatbot | Definition + query for the ⭐ resolution-parity KPI | YES for BTS-S04 — not for routing |
| Product analytics ingest of bot_traffic_split_* events | Data | Mixpanel ingest + comparison dashboard | NO for routing; YES for dashboard parity |
CSAT data joinable to variant | Data / Chatbot | CSAT source (not all channels collect it) | NO — secondary metric, degrade gracefully |
Design References (frontend half — required)
| PRD-named surface | Figma / design link | Frame name | Design system version | Design QA contact | Notes |
|---|---|---|---|---|---|
| Screen A — Traffic Split configuration (PRD §6, §6.1) | n/a — design pending | Draft wireframe in PRD §6.1 (Screen A) | @mekari/pixel3@^1.0.12 (verified package.json:74) | TBD — Chatbot designer | Low-fi wireframe only; designer owns Figma frames (PRD §6.1 D1–D4). See §5 Open Q-4. |
| Screen B — Bot vs Human comparison (PRD §6.1) | n/a — design pending | Draft wireframe in PRD §6.1 (Screen B) | @mekari/pixel3@^1.0.12 | TBD — Chatbot designer | Comparison table; CSAT-not-collected treatment open (PRD §6.1 D3). |
No production Figma frame exists yet for either surface: both are
`n/a — design pending` and tracked in §5 Open Q-4. Frontend chunks that depend on pixel-exact frames (visual polish, empty/error illustration) must not start against imagined designs; the structural build (controls, validation, states) proceeds from the PRD §6.1 wireframes + verified Pixel 3 components.
PRD-to-Schema Derivation (backend half — required)
| PRD-described entity / attribute / rule | Persisted as (table.column) | Exposed via (endpoint / event) | Enforced where | Source |
|---|---|---|---|---|
| Split enabled per channel | channel_integrations.traffic_split_enabled boolean default false | PATCH /v1/channel_integrations/:id/traffic_split; GET /v1/channel_integrations/:id | ChannelIntegration::UpdateTrafficSplit use case + DB default | PRD §5, §7 #1 |
| Bot percentage 0–100 | channel_integrations.bot_percent integer default 100 | same as above | Grape param validation (values: 0..100, Integer) + use case + DB CHECK | PRD §5, §7 #1, BTS-S01 |
| Per-conversation arm tag | rooms.variant string (bot/human, null until decided) | bot_traffic_split_assigned event; comparison endpoint | Atomic decide-once update in find_default_path | PRD §5.1, §7 #2, BTS-S02 |
| Arm decision rule | not persisted (in-process rand(100)) | bot_traffic_split_assigned | find_default_path split step (new) | PRD §5 Determinism, BTS-S02/AC-1,2 |
| Decide-once / no re-roll | guard on rooms.variant IS NULL | — | UPDATE … WHERE id = ? AND variant IS NULL | PRD §15 Q4, BTS-S02/AC-3 |
| Fail-safe to bot | — (defaults to existing intent_id flow) | bot_traffic_split_decision_fallback | rescue around split step | PRD §7 #2 failure, BTS-S02/ERR-1 |
| Human arm queue/offline reuse | reuses rooms.is_assign_agent_offline, assignment cols | SendMessageAutoAssignAgentWorker | existing worker (unchanged) | PRD §7 #3, BTS-S03 |
| Org-level eligibility | organization_features (feature.code = 'bot_traffic_split', enabled) | gate on endpoint + FE visibility | use case guard + FE checkSubscription | PRD §5 flag, BTS-S01-NEG/NEG2 |
| Config-save audit | bot_traffic_split_config_saved event (+ PaperTrail on channel_integrations) | Mixpanel | SendMixpanelEventWorker | PRD §10 |
Every §2.3 DDL row and every §2.4 endpoint traces back to a row here or a Design Reference frame above. Missing trace = blocker (none open for routing).
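The `bot_percent` row above names three enforcement layers (Grape params, use case, DB CHECK). As a rough illustration of the middle layer only, the sketch below validates a params hash in plain Ruby. The function name and the error strings are assumptions; the real use case would sit behind the Grape validation and in front of the database constraint.

```ruby
# Hypothetical use-case-level validation mirroring the 0..100 Integer rule.
# Returns [ok, errors]. The real stack also relies on Grape param validation
# and a DB CHECK constraint; neither is modeled here.
def validate_traffic_split(params)
  errors = []
  enabled = params[:traffic_split_enabled]
  percent = params[:bot_percent]

  unless [true, false].include?(enabled)
    errors << 'traffic_split_enabled must be true or false'
  end
  unless percent.is_a?(Integer) && (0..100).cover?(percent)
    errors << 'bot_percent must be an integer between 0 and 100'
  end

  [errors.empty?, errors]
end
```

Rejecting floats and numeric strings here keeps the stored value consistent with the `integer` column type listed in the table.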
Detail 1.A — PRD Traceability (cross-layer)
Cite the PRD’s composite AC ids (<STORY-ID>/AC-n).
Forward (PRD AC → RFC):
| PRD composite AC id | FE section / component | BE section / endpoint |
|---|---|---|
| BTS-S01/AC-1,/AC-2,/AC-3 | TrafficSplitSection.vue (toggle + % input + preview + Save) | PATCH /v1/channel_integrations/:id/traffic_split (§2.4) |
| BTS-S01/ERR-1 | Vuelidate integer + between(0,100) inline error | param validation values: 0..100 (§2.4) |
| BTS-S01/ERR-2 | toast + Retry; emit bot_traffic_split_save_failed | 5xx surfaced via Dry::Matcher failure (§2.4) |
| BTS-S01 permission / BTS-S01-NEG,-NEG2 | section hidden via checkSubscription / role getter | set_role + feature-flag guard → 403 (§3 AuthZ) |
| BTS-S02/AC-1,/AC-2,/AC-4 | n/a — backend routing | find_default_path split step (§2.1, §2.2) |
| BTS-S02/AC-3, BTS-S02-NEG/NEG-1 | n/a | decide-once guard on rooms.variant (§2.1, §2.E) |
| BTS-S02/ERR-1 | n/a | rescue → bot arm + decision_fallback (§3.A) |
| BTS-S03/AC-1,/AC-2,/ERR-1 | n/a | reuse SendMessageAutoAssignAgentWorker (§2.F) |
| BTS-S04/AC-1,/AC-2,/ERR-1 | BotHumanComparison.vue (pixel-table + states) | GET /v1/channel_integrations/:id/traffic_split/comparison (§2.4) |
Reverse (RFC → PRD AC):
| New FE component / BE endpoint / dependency | PRD composite AC id it serves |
|---|---|
| TrafficSplitSection.vue | BTS-S01/AC-1..3, ERR-1..2 |
| BotHumanComparison.vue | BTS-S04/AC-1..2, ERR-1 |
| PATCH …/traffic_split | BTS-S01/AC-1..3, BTS-S01-NEG/NEG-1, BTS-S01-NEG2/NEG-1..2 |
| find_default_path split step + rooms.variant | BTS-S02/AC-1..4, BTS-S02-NEG/NEG-1 |
| rescue → fail-safe | BTS-S02/ERR-1 |
| SendMessageAutoAssignAgentWorker (reused) | BTS-S03/AC-1..2, ERR-1 |
| GET …/comparison | BTS-S04/AC-1..2, ERR-1 |
UI / Consumer Surface Coverage
| PRD-named surface | Consumer | Required reads (BE) | Required writes (BE) | FE component | Status surface |
|---|---|---|---|---|---|
| Traffic Split configuration (PRD §6) | web | GET /v1/channel_integrations/:id (traffic_split_enabled,bot_percent) | PATCH …/traffic_split | TrafficSplitSection.vue | toast + persisted traffic_split_enabled/bot_percent |
| Bot vs Human comparison (PRD §6.1 B) | web | GET …/traffic_split/comparison | n/a | BotHumanComparison.vue | per-arm rows + Updated <ts> + empty/error states |
| Incoming-chat routing | system (no UI) | channel config (already loaded) | rooms.variant (atomic) | n/a — backend | variant on Room; bot_traffic_split_assigned |
Role Coverage
| PRD role | Authorization mechanism | Endpoints permitted (BE) | UI surface visibility (FE) | Cross-tenant? | Audit trail |
|---|---|---|---|---|---|
| Chatbot Admin / Admin (owner,admin) | write: set_role(%w[owner admin]); read: set_role(%w[owner supervisor admin]) + Middlewares::Ownership + feature flag | PATCH …/traffic_split (configure), GET … (view) | section + comparison visible & editable | no — own org only | PaperTrail + bot_traffic_split_config_saved |
| SPV (supervisor) | read set_role(%w[owner supervisor admin]); excluded from the write endpoint's set_role(%w[owner admin]) (REV-9) | GET …/comparison, GET … (view) | comparison visible; config read-only | no | view only |
| Human Agent | not in role set | none | section + comparison not rendered | n/a | n/a |
| End-customer | unauthenticated to admin API | none | n/a | n/a | n/a |
Cross-layer note: the existing channel endpoints gate
`owner/supervisor/admin` together (channel_integration.rb:89). The PRD splits configure (Admin) from view (Admin + SPV). Decided (REV-9): the configure endpoint uses `set_role(%w[owner admin])` (SPV excluded); read/comparison keep `owner/supervisor/admin`. See ADR-6 and §3 Security.
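The configure/view split in the Role Coverage table can be pictured as a small decision function. This is a plain-Ruby stand-in, not the real `set_role` helper: the function name and the `:configure` / `:view` symbols are hypothetical, and it assumes the feature flag guards both actions.

```ruby
# Hypothetical stand-in for set_role plus the feature-flag guard.
# Role lists come from the Role Coverage table. Assumption: the org feature
# flag gates both actions, not only the save endpoint.
# Returns an HTTP-like status: 200 when permitted, 403 otherwise.
def traffic_split_access(action:, role:, feature_enabled:)
  allowed_roles = {
    configure: %w[owner admin],           # SPV excluded (REV-9)
    view: %w[owner supervisor admin]
  }
  return 403 unless feature_enabled

  allowed_roles.fetch(action).include?(role) ? 200 : 403
end
```

A supervisor therefore gets 200 on `:view` and 403 on `:configure`, while an agent gets 403 on both.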
PRD Section Coverage
| PRD § | Title | Where covered |
|---|---|---|
| HEADER / 2 | One-liner + Problem | §1 Overview |
| 3 | Target Users + Persona | §1 (roles); Detail 1.A Role Coverage |
| 4 | Non-Goals | §1 Out of Scope |
| 5 | Constraints | §1 Assumptions; §2.3 DDL; §3 Performance/Security; ADR-1..9 |
| 5.1 | Data Lifecycle | §2.3 Per-status/retention; ADR-3 (acts_as_paranoid) |
| 6 / 6.1 | New Features + Design Draft | §1 Design References; §2.A UI Contract; §2.C State Matrix |
| 7 | API & Webhook Behavior | §2.4 APIs; §2.2 Sequence; §3.A Failure Catalog |
| 8.1 | System Flow | §2.1 Architecture + Branch/skip; §2.2 Sequence |
| 8.2 | User Stories + ACs | Detail 1.A, Detail 1.C |
| 9 | Rollout | §4 Rollout Strategy (technical mechanics only) |
| 10 / 10.1 | Observability + cadence | §3 Monitoring & Alerting; §4.E signals |
| 11 | Success Metrics | §1 Success Criteria; §2.4 comparison endpoint |
| 12 | Launch Plan & Stage Gates | n/a — delivery/ (TPM-owned, not in RFC) |
| 13 | Dependencies | §1 Dependencies; §2.F.1 Responsibility Boundary |
| 14 | Key Decisions + Alternatives | Detail 1.B + ADR-1..9 |
| 15 | Open Questions | §5 Concerns/Questions |
Detail 1.B — Decisions Closed (cross-layer)
Full ADR blocks (context / options / decision / consequences / reversibility) are in §2.1a Architecture Decision Records. This is the index.
| Decision | Chosen option | Alternatives rejected | Why rejected | Layer |
|---|---|---|---|---|
| ADR-1 Config storage | New typed columns on channel_integrations | (a) columns on paths; (b) channel_integrations.settings JSON | (a) many paths per channel → fragmented %; (b) JSON is not indexable/typed, and routing flags here are columns (is_auto_assign_agent on paths) | BE |
| ADR-2 Bucketing | In-process rand(100) < bot_percent, not persisted | persisted roll; weighted table | no value persisting an ephemeral roll; adds writes on hot path | BE |
| ADR-3 Decide-once + route-by-persisted-variant | Atomic UPDATE … WHERE variant IS NULL, then re-read + apply_arm; tag only under an active split | re-roll every message; advisory lock; route per own roll | re-roll breaks comparison; per-own-roll routing diverges under race (REV-1); lock heavier than a conditional update | BE |
| ADR-4 Fail-safe | Rescue → bot, variant stays NULL + log | fail to human; raise; tag as bot | dropping a chat is worse than today; tagging a fail-safe as a bot win pollutes the KPI (REV-3) | BE |
| ADR-5 Human-arm hook | Set is_auto_assign_agent=true, clear intent_id/agent_id/division_id/crm_intent_id | new worker; new branch | reuse existing is_auto_assign_agent branch (:545) keeps queue/offline behavior identical | BE |
| ADR-6 Config API | New dedicated PATCH …/traffic_split sub-action | extend PATCH :id (full-payload) | existing patch ':id' requires name,enabled,… (full update); partial toggle deserves its own action (mirrors patch ':id/publish') | BE+FE |
| ADR-7 Org gate | OrganizationFeature bot_traffic_split + optional SystemPreference rollout | hardcoded org list; ENV flag | matches existing per-org feature + global rollout patterns | BE+FE |
| ADR-8 Comparison source + pinned schema | Aggregate rooms WHERE variant IS NOT NULL; typed v1 schema with self-labelled proxies | full warehouse query; new events table; untyped schema | rooms already hold variant/resolved_at/assigned_at; warehouse is a Data dependency (Open Q-2/3); typed contract unblocks the agent (REV-4) | BE+Data |
| ADR-9 Split scope | Apply split only to plain bot-intent paths | split every path; treat agent/CRM as human | agent/division/CRM paths route deterministically (:84,:501-502); splitting them changes configured behavior (REV-2) | BE |
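ADR-3 can be illustrated without a database. In the sketch below, `FakeRoomStore` is a hypothetical in-memory stand-in for the `rooms` table, and its `Mutex` only imitates the atomicity that a single conditional `UPDATE` would provide in PostgreSQL. The point it shows is routing by the persisted value rather than by the caller's own roll.

```ruby
# In-memory stand-in for the rooms table (hypothetical). The Mutex imitates
# the atomicity of one conditional UPDATE statement.
class FakeRoomStore
  def initialize
    @variants = {}
    @lock = Mutex.new
  end

  # Emulates: UPDATE rooms SET variant = ? WHERE id = ? AND variant IS NULL
  # Returns the number of rows changed (0 or 1).
  def tag_if_unset(room_id, variant)
    @lock.synchronize do
      return 0 unless @variants[room_id].nil?

      @variants[room_id] = variant
      1
    end
  end

  def variant_of(room_id)
    @lock.synchronize { @variants[room_id] }
  end
end

# Roll, try to persist, then re-read and route by whatever is persisted.
# A caller that loses the race still returns the winner's variant.
def decide_once(store, room_id, bot_percent, rng: Random.new)
  rolled = rng.rand(100) < bot_percent ? 'bot' : 'human'
  store.tag_if_unset(room_id, rolled)
  store.variant_of(room_id)
end
```

Twenty threads calling `decide_once` for the same room all receive the same arm, which is the property the rejected "route per own roll" option loses under a race.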
Minimum-coverage closure:
- Per-status lifecycle: `rooms.variant` has no lifecycle of its own; it is set once, immutable, and lives & soft-deletes with the `Room` (`acts_as_paranoid`). Config flags are booleans/int with no state machine. (ADR-1, ADR-3.)
- Soft vs hard delete: inherits `Room` `acts_as_paranoid` (no separate cleanup); config columns persist with the channel. (PRD §5.1.)
- Cross-squad responsibility: Chatbot owns routing + config + event emit; Data owns ingest + canonical resolution definition + CSAT join (§2.F.1).
- Inbound webhook ownership: `n/a — no new inbound webhook`; the trigger is the existing incoming-message hub flow.
- Opt-out / skip / branch: “split disabled” and “fail-safe to bot” branches (§2.1 Branch/skip; §3.A.1).
- Reuse-vs-new per endpoint: see §2.4 `Reuse?` column (1 `extended`, 2 `new-with-justification`).
- FE/BE disagreement risk: snake_case API (`bot_percent`, `traffic_split_enabled`) ↔ FE consumes as-is (no camel transform needed, see §2.G). Error shape = existing `error_response` envelope.
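The "split disabled" and "fail-safe to bot" branches listed above can be sketched together with ADR-4. Everything here is an assumption for illustration: `config` is a plain Hash standing in for the channel record, `logger` is any object responding to `#warn`, and the function name is hypothetical.

```ruby
# Hypothetical wrapper around the split step.
# config: Hash standing in for the channel record.
# logger: any object responding to #warn (for example a Rails logger).
# Returns [arm, variant_to_persist]; variant is nil when no split decision
# was made (split disabled, or fail-safe after an error).
def route_with_fail_safe(config, logger:, rng: Random.new)
  # Split disabled: today's behavior, and no tag is written.
  return [:bot, nil] unless config.fetch(:traffic_split_enabled)

  percent = Integer(config.fetch(:bot_percent))
  raise ArgumentError, "bot_percent out of range: #{percent}" unless (0..100).cover?(percent)

  arm = rng.rand(100) < percent ? :bot : :human
  [arm, arm.to_s]
rescue StandardError => e
  # Fail-safe (ADR-4): route to the bot, leave variant NULL so the fallback
  # is not counted as a bot-arm result, and log the event name from the PRD.
  logger.warn("bot_traffic_split_decision_fallback error=#{e.class}")
  [:bot, nil]
end
```

A missing key, a `nil` percentage, or an out-of-range value all land in the same rescue, so no chat is dropped and the comparison data stays clean.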
Detail 1.C — Per-Story Change Map (organised by user story)
| Story id | Title | Layer scope | FE changes | BE changes | Composite AC ids | Acceptance criteria (verifiable) | RFC anchors |
|---|---|---|---|---|---|---|---|
| BTS-S01 | Configure split for a channel | FE + BE | TrafficSplitSection.vue; channel-integration.ts updateTrafficSplit; store action; Vuelidate (integer,between(0,100)); toast | channel_integrations.traffic_split_enabled,bot_percent; PATCH …/traffic_split; UpdateTrafficSplit use case; PaperTrail; config_saved event | BTS-S01/AC-1..3, ERR-1..2 | RSpec request spec asserts 200 persists bot_percent=30; invalid → 422; Vitest mounts section, invalid blocks Save; toast asserted | §2.3 · §2.4 · §2.A · §2.C · §4.D #1,#3,#6 |
| BTS-S02 | Route incoming chat by split + tag arm | Runtime / behavior (BE) | n/a — BE-only | split step in find_default_path; rooms.variant; atomic decide-once; assigned + decision_fallback events | BTS-S02/AC-1..4, ERR-1; BTS-S02-NEG/NEG-1 | RSpec: stub rand→bot/human arms; second message keeps variant; disabled→bot; config error→bot + fallback log | §2.1 · §2.2 · §2.3 · §2.E · §4.D #2,#4 |
| BTS-S03 | Human-arm chat, no agent → queue | Runtime / behavior (BE) | n/a — BE-only | no new code — reuse SendMessageAutoAssignAgentWorker offline/queue branch | BTS-S03/AC-1..2, ERR-1 | RSpec: variant=human + agent → assigned; no agent → existing queue path; bot never re-takes; variant stays human | §2.F · §2.F.1 · §4.D #5 |
| BTS-S04 | Compare bot vs human per arm | FE + BE | BotHumanComparison.vue (pixel-table, loading skeleton, empty, error+Retry) | GET …/traffic_split/comparison aggregating rooms by variant | BTS-S04/AC-1..2, ERR-1 | RSpec returns per-arm resolution/handover; bot-arm 0 rows → no_data:true; Vitest renders rows + "No data yet" | §2.4 · §2.A · §2.C · §4.D #7,#8 |
| BTS-S01-NEG | Control hidden when flag OFF | FE + BE | section gated by checkSubscription('bot_traffic_split') | feature-flag guard on endpoint → 403 | BTS-S01-NEG/NEG-1 | Vitest: flag false → not rendered; RSpec: direct save → 403 | §3 AuthZ · §3.A.1 · §4.D #1,#3 |
| BTS-S02-NEG | No mid-conversation re-bucketing | Runtime / behavior (BE) | n/a | decide-once guard (same as S02) | BTS-S02-NEG/NEG-1 | RSpec: existing variant=bot room, new message → no re-roll, stays bot | §2.1 · §2.E · §4.D #2 |
| BTS-S01-NEG2 | Ineligible plan refused | FE + BE | section hidden when plan ineligible | plan/feature guard → 403 on direct save | BTS-S01-NEG2/NEG-1..2 | RSpec: ineligible org → 403, nothing persisted | §3 AuthZ · §4.D #3 · §5 Open Q-1 |
Coverage rule satisfied: all 7 PRD stories appear exactly once. Every
`FE + BE` row fills both halves; runtime/behavior rows are BE-only by nature.
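The BTS-S04 acceptance line (per-arm aggregates, with `no_data: true` for an empty arm) can be sketched over plain hashes. This is an assumption-heavy illustration: the response keys other than `no_data` are invented here, and "resolved" is approximated as `resolved_at` being present, since the canonical definition is still open with the Data squad.

```ruby
# Hypothetical aggregation over hashes standing in for rooms rows.
# Assumption: "resolved" means resolved_at is present. The canonical
# definition is an open Data-squad question, so treat this as a proxy.
# Rooms with variant nil (undecided or fail-safe) are excluded from both arms.
def compare_arms(rooms)
  %w[bot human].to_h do |arm|
    rows = rooms.select { |r| r[:variant] == arm }
    if rows.empty?
      [arm, { total: 0, no_data: true }]
    else
      resolved = rows.count { |r| !r[:resolved_at].nil? }
      rate = (100.0 * resolved / rows.size).round(1)
      [arm, { total: rows.size, resolved: resolved, resolution_rate: rate, no_data: false }]
    end
  end
end
```

Keeping untagged rooms out of both arms matches ADR-4, where a fail-safe routing leaves `variant` NULL so it cannot inflate the bot arm.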
2. Technical Design
Detail 2.0 — Repo Reading Guide
Repo Map (mermaid, both layers)
flowchart LR
subgraph fe["chatbot-fe (Nuxt 4 / Vue 3 / Pinia)"]
page["pages/chat/settings/index.vue"]
section["modules/settings/.../TrafficSplitSection.vue (new)"]
compare["modules/settings/.../BotHumanComparison.vue (new)"]
svc["common/services/main/v1/channel-integration.ts"]
store["store/* (Pinia, extractStore)"]
end
subgraph be["chatbot (Rails 7.1 / Grape / Sidekiq)"]
api["app/api/frontend_service/v1/channel_integration.rb"]
uc_cfg["UseCases::API::...::ChannelIntegration::UpdateTrafficSplit (new)"]
hub["UseCases::System::Hub::ProcessIncomingMessageWithResolve"]
fdp["#find_default_path (:488)"]
smaa["#send_message_assign_agent (:534)"]
worker["SendMessageAutoAssignAgentWorker"]
mix["SendMixpanelEventWorker"]
end
subgraph infra
pg[("PostgreSQL: channel_integrations, rooms, paths, organization_features")]
sq[["Sidekiq (queues: default, event_tracker)"]]
mp(("Mixpanel"))
end
page --> section --> store --> svc --> api --> uc_cfg --> pg
page --> compare --> store
hub --> fdp --> smaa
fdp --> pg
smaa --> worker --> sq
fdp --> mix --> sq --> mp
Existing Code Anchors
| Layer | Path | Why the agent reads it | What pattern it teaches |
|---|---|---|---|
| BE | app/core/use_cases/system/hub/process_incoming_message_with_resolve.rb | The routing brain; split decision lands in find_default_path (:488), branch in send_message_assign_agent (:534) | Use-case orchestration, attr_accessor routing state, Repositories::* calls |
| BE | app/models/path.rb + db/schema.rb (create_table "paths") | Path columns: is_auto_assign_agent, intent_id (null:false), channel_integration_id, is_default | Why config can’t live per-path (ADR-1) |
| BE | app/models/room.rb (acts_as_paranoid at :4) + db/schema.rb (rooms) | Conversation record; target for variant; has resolved_at,assigned_at,is_closed,path_id | Soft-delete model; columns reused by comparison |
| BE | app/models/channel_integration.rb + schema | Config home (ADR-1); has_many :paths, settings JSON | ApplicationRecord model conventions |
| BE | app/api/frontend_service/v1/channel_integration.rb (patch ':id' :88, patch ':id/publish' :153) | Grape endpoint + set_role + Dry::Matcher::ResultMatcher + dedicated sub-action precedent | Endpoint shape to mirror for …/traffic_split |
| BE | app/workers/send_message_auto_assign_agent_worker.rb (include Sidekiq::Worker :4) | Human-arm assignment + queue/offline branch (reused unchanged) | Worker shape; no edit needed (BTS-S03) |
| BE | app/workers/send_mixpanel_event_worker.rb (queue: :event_tracker, perform(distinct_id,event_name,params)) | Analytics emit pattern for all bot_traffic_split_* events | SendMixpanelEventWorker.perform_async(org_id, 'Event', payload) |
| BE | app/models/organization_feature.rb + app/models/system_preference.rb | Per-org feature gate + global rollout flag (ADR-7) | OrganizationFeature.exists?(...), SystemPreferences::FindBy |
| FE | pages/chat/settings/index.vue (MpTabs, panel refs) | Host screen for the new section + comparison | Tab/panel composition, unsaved-changes modal |
| FE | modules/settings/views/ai-assist.vue (MpInput type="number", MpInputGroup, MpInputRightAddon, useVuelidate, required/minValue/integer) | The closest existing numeric-input + validated form | Vuelidate form + Pixel input-group pattern to copy |
| FE | common/services/main/v1/channel-integration.ts (update → PATCH /v1/channel_integrations/:id) + endpoint.ts | API client to extend with updateTrafficSplit + comparison | ofetch service wrapper, AbortController, endpoint map |
| FE | store/ai-agent/{index,state,actions}.ts (extractStore, fetchStatus) | Pinia store pattern for async save/fetch | state.$patch({ … fetchStatus }) lifecycle |
| FE | common/composables/useSubscription.ts (checkSubscription(feature)) | Feature-flag gate for showing the section | subscriptionData.features.some(code===…&&enabled) |
| FE | modules/settings/views/qontak-crm/qontak-crm-list.vue (pixel-table, :empty-state) | Table pattern for the comparison view | header-list/data-list, empty-state prop, MpBadge |
| FE | plugins/toast.ts ($toast({type,message})) | Success/error toasts for Save | toast invocation |
Existing Contracts to Reuse, Extend, or Replace (BE)
| Contract | Status | Justification | Owner |
|---|---|---|---|
| GET /v1/channel_integrations / GET /v1/channel_integrations/:id | extend | add traffic_split_enabled,bot_percent to response entity | Chatbot |
| PATCH /v1/channel_integrations/:id/traffic_split | new-with-justification | existing patch ':id' requires the full update payload (name,enabled,timezone,…); a partial toggle needs its own action, mirroring the existing patch ':id/publish' sub-action | Chatbot |
| GET /v1/channel_integrations/:id/traffic_split/comparison | new-with-justification | no existing endpoint segments room outcomes by experiment arm; report APIs (report.rb,custom_report.rb) are not arm-aware | Chatbot + Data |
| SendMessageAutoAssignAgentWorker | reuse | human-arm assignment + queue/offline is exactly today’s behavior (BTS-S03, ADR-5) | Chatbot |
| SendMixpanelEventWorker | reuse | emit all bot_traffic_split_* events | Chatbot |
| find_default_path / send_message_assign_agent | extend | inject split step + variant stamping; reuse is_auto_assign_agent branch | Chatbot |
Patterns to Follow (and where to find them)
| Layer | Concern | Pattern in repo | Reference file | Deviation? |
|---|---|---|---|---|
| FE | State management | Pinia extractStore + fetchStatus | store/ai-agent/actions.ts | none |
| FE | Error / toast / retry | $toast({type,message}) | plugins/toast.ts | none |
| FE | Form validation | useVuelidate + MpFormControl/Label/ErrorMessage | modules/settings/views/ai-assist.vue | none |
| FE | Numeric % input | MpInputGroup+MpInput type="number"+MpInputRightAddon | modules/settings/views/ai-assist.vue | none |
| FE | Table + empty state | pixel-table with :empty-state | modules/settings/views/qontak-crm/qontak-crm-list.vue | none |
| FE | Feature gate | checkSubscription(feature) | common/composables/useSubscription.ts | none |
| BE | HTTP handler shape | Grape params do … end + set_role + Dry::Matcher::ResultMatcher | app/api/frontend_service/v1/channel_integration.rb | none |
| BE | Sub-action endpoint | patch ':id/publish' | channel_integration.rb:153 | none |
| BE | Repository / DB access | Repositories::* + dry-monads | app/core/repositories/** | none |
| BE | Async worker | include Sidekiq::Worker + sidekiq_options queue: | app/workers/send_mixpanel_event_worker.rb | none |
| BE | Analytics emit | SendMixpanelEventWorker.perform_async(org_id, name, payload) | process_incoming_message_with_resolve.rb:1234 | none |
| BE | Per-org feature flag | OrganizationFeature.exists?(feature_id:, organization_id:, enabled: true) | app/core/repositories/qontak_billing/active_subscription_status.rb:23 | none |
| BE | Global rollout flag | Repositories::SystemPreferences::FindBy.new({code:, group_code:'rollout', enabled:true}) | process_incoming_message_with_resolve.rb:120 | none |
| Cross | API casing | snake_case JSON (bot_percent) consumed as-is on FE | channel-integration.ts payloads | none — no transform needed |
Reading Order for the Agent
1. `app/core/use_cases/system/hub/process_incoming_message_with_resolve.rb` (:261 room find/create, :270 call, :488 `find_default_path`, :534 `send_message_assign_agent`): where routing + the split decision live.
2. `db/schema.rb` → `create_table "paths"`, `"rooms"`, `"channel_integrations"`: exact columns; confirm no JSON config on `paths`.
3. `app/models/room.rb`: `acts_as_paranoid`; where `variant` is added.
4. `app/api/frontend_service/v1/channel_integration.rb` (:88 `patch ':id'`, :153 `patch ':id/publish'`): endpoint shape to mirror.
5. `app/workers/send_message_auto_assign_agent_worker.rb`: confirm human-arm/offline path is reused unchanged.
6. `app/workers/send_mixpanel_event_worker.rb`: event emit signature.
7. `app/core/repositories/qontak_billing/active_subscription_status.rb`: feature-gate pattern.
8. `pages/chat/settings/index.vue` + `modules/settings/views/ai-assist.vue`: FE host + validated-form pattern.
9. `common/services/main/v1/channel-integration.ts` + `endpoint.ts`: API client to extend.
10. `common/composables/useSubscription.ts`: FE feature gate.
Source Verification (anti-hallucination — required)
| Layer | Anchor / pattern / contract | Verified by | Evidence |
|---|---|---|---|
| BE | find_default_path | read | def find_default_path at process_incoming_message_with_resolve.rb:488; sets self.intent_id = @path.try(:intent_id) (:499), self.is_auto_assign_agent = @path.try(:is_auto_assign_agent) (:503) |
| BE | branch order in send_message_assign_agent | read | :534 def send_message_assign_agent; order crm_intent_id→agent_id→division_id→elsif is_auto_assign_agent (:545 → SendMessageAutoAssignAgentWorker.perform_async)→elsif intent_id.present? |
| BE | Room found/created then routed | read | Repositories::Rooms::FindOrCreateBy.new(channel_integration_id:…, channel_room_id: room['id'], …).call (:261); find_default_path (:270) |
| BE | rooms.variant target + soft delete | read | class Room < ApplicationRecord + acts_as_paranoid (room.rb:3-4); rooms has resolved_at,assigned_at,is_closed,path_id,deleted_at (schema) — no variant yet |
| BE | paths has no JSON config; routing flags are columns | read | create_table "paths": t.boolean "is_auto_assign_agent", t.bigint "intent_id", null: false, t.boolean "is_default", no JSON column |
| BE | config home channel_integrations | read | app/models/channel_integration.rb has_many :paths; schema channel_integrations has t.json "settings" |
| BE | config endpoint shape | read | patch ':id' do set_role(%w[owner supervisor admin]) (channel_integration.rb:88-89) → UseCases::API::FrontendService::V1::ChannelIntegration::Update; dedicated patch ':id/publish' (:153) |
| BE | worker reuse | read | class SendMessageAutoAssignAgentWorker … include Sidekiq::Worker (send_message_auto_assign_agent_worker.rb:3-4); def perform(channel_type_id, history_id, raw_params) |
| BE | analytics emit | read | class SendMixpanelEventWorker … sidekiq_options queue: :event_tracker; def perform(distinct_id, event_name, params); live call SendMixpanelEventWorker.perform_async(channel_integration.organization_id, 'Process Message', …) (:1234) |
| BE | per-org flag | read | OrganizationFeature.exists?(feature_id: feature.id, organization_id: @organization_id, enabled: true) (active_subscription_status.rb:23) |
| BE | global rollout flag | read | Repositories::SystemPreferences::FindBy.new({ code: 'ai_assist_image_processing', group_code: 'rollout', enabled: true }).call (:120) |
| BE | test runner | read | AGENTS.md: rspec, rspec spec/path/to/file_spec.rb:42, bundle exec rspec; migrations ActiveRecord::Migration[7.1], db/schema.rb, PostgreSQL |
| FE | host settings page | read | pages/chat/settings/index.vue <MpTabs id="chat-settings-tab-list" …>, panel refs |
| FE | design system + version | read | package.json:74 "@mekari/pixel3": "^1.0.12" |
| FE | validated numeric form | read | ai-assist.vue import { useVuelidate } …; import { required, helpers, minValue, integer } from '@vuelidate/validators'; <MpInput v-model="state.reply_limit" type="number"> inside MpInputGroup+MpInputRightAddon |
| FE | API client | read | channel-integration.ts update(...) → $apiMain(endpoint.v1.channel_integrations.update.replace(':id', payload.id), { method: 'PATCH', body }); endpoint.ts channel_integrations.update: "/v1/channel_integrations/:id" |
| FE | store pattern | read | store/ai-agent/index.ts extractStore(...); actions.ts state.$patch({ … fetchStatus: 'pending'/'resolved'/'rejected' }) |
| FE | feature gate | read | useSubscription.ts checkSubscription → subscriptionFeature.some(item => item.code?.toLowerCase()===feature && item.enabled) |
| FE | table + empty state | read | qontak-crm-list.vue <pixel-table … :empty-state="emptyState"> ; MpBadge usage |
| FE | toast | read | plugins/toast.ts rootPiniaStore.pushToast(opts); usage $toast({ type, message }) |
| FE | test commands | read | package.json scripts "test": "vitest run", "test:e2e": "playwright test", "lint"; Playwright specs under tests/visual/ |
| FE/BE | i18n | grep | NOT FOUND — no vue-i18n/useI18n/locale files; FE strings hardcoded (affects new copy — §2.J / Open Q-6) |
| BE | existing rand() A/B bucketing | grep | NOT FOUND — no prior rand() bucketing; ADR-2 introduces it (cite this RFC) |
| BE/Data | canonical "resolved" def, handover derivation, CSAT source | — | NOT VERIFIED in repo — Data dependency; moved to §5 Open Q-2/Q-3 (not invented) |
Design ↔ Code Mapping (frontend half — required)
| Figma frame / component | Implementing file | Reuse vs new | Design tokens used | Backing API endpoint(s) | Deviation from design |
|---|---|---|---|---|---|
| Screen A — Traffic Split (PRD §6.1 wireframe) | modules/settings/.../TrafficSplitSection.vue (new) | new (composed of reused Pixel 3 components) | Pixel 3 semantic tokens (v2.4) — resolve via Pixel MCP get-component before build; brand reserved for primary Save | GET /v1/channel_integrations/:id, PATCH …/traffic_split | n/a — design pending (wireframe only; no Figma yet — Open Q-4) |
| Screen B — Comparison (PRD §6.1 wireframe) | modules/settings/.../BotHumanComparison.vue (new) | new (reuses pixel-table, MpBadge, MpSelect) | Pixel 3 semantic tokens; brand on the single resolution-parity KPI per PRD §6.1 | GET …/traffic_split/comparison | n/a — design pending (Open Q-4; CSAT-absent treatment Open Q-3) |
Both frames are design-pending. Per §1 Design References, build structure from the wireframe + verified Pixel 3 components; gate pixel-polish on designer frames. Pixel 3 props/variants must be confirmed via the Pixel MCP before coding (FE AGENTS.md convention), not guessed.
Detail 2.1 — Architecture (mermaid)
End-to-end component diagram
flowchart TB
cust([Incoming customer message]) --> hub[ProcessIncomingMessageWithResolve#result]
hub --> findroom[Repositories::Rooms::FindOrCreateBy :261]
findroom --> fdp[#find_default_path :488]
fdp --> resolve[Resolve Path: intent_id, is_auto_assign_agent,\nagent_id :501, division_id :502, crm_intent_id :84]
resolve --> scope{plain bot-intent path?\nno agent_id/division_id/crm_assignment}
scope -- no --> passthru[No split: keep path deterministic routing\nvariant stays NULL ADR-9/REV-2]
scope -- yes --> split{traffic_split_enabled?\nchannel already loaded}
split -- no --> nosplit[No split: 100% bot, keep intent_id\nvariant stays NULL ADR-3/REV-3]
split -- yes --> decided{room.variant already set?}
decided -- yes --> reread
decided -- no --> roll[rand 0..99]
roll --> cmp{roll < bot_percent?}
cmp -- yes --> setbot[arm = bot]
cmp -- no --> sethuman[arm = human]
setbot --> stamp
sethuman --> stamp
resolve -. config read error .-> failsafe[Fail-safe: arm=bot + log decision_fallback\nvariant stays NULL]
stamp["Atomic UPDATE rooms SET variant\nWHERE id=? AND variant IS NULL"] --> reread[Re-read rooms.variant = canonical arm]
reread --> apply["apply_arm(canonical): bot -> keep intent_id;\nhuman -> is_auto_assign_agent=true, clear\nintent_id/agent_id/division_id/crm_intent_id"]
apply --> emit[[SendMixpanelEventWorker: bot_traffic_split_assigned]]
apply --> smaa[#send_message_assign_agent :534]
passthru --> smaa
nosplit --> smaa
failsafe --> smaa
smaa -- intent_id branch --> botsend[SeparateSendMessageWorker bot reply]
smaa -- is_auto_assign_agent branch :545 --> worker[[SendMessageAutoAssignAgentWorker]]
worker --> agentq{agent available?}
agentq -- yes --> assign[AssignAgentWorker]
agentq -- no --> queue[Existing queue / offline behavior\nbot does NOT take over]
subgraph config["Config path (FE → BE)"]
admin([Chatbot Admin]) --> sec[TrafficSplitSection.vue]
sec --> svc[channel-integration.ts updateTrafficSplit]
svc --> ep[/PATCH /v1/channel_integrations/:id/traffic_split/]
ep --> uc[UpdateTrafficSplit use case + flag/role guard]
uc --> pg[("channel_integrations")]
end
Data model (mermaid erDiagram)
erDiagram
CHANNEL_INTEGRATIONS ||--o{ PATHS : "has_many"
CHANNEL_INTEGRATIONS ||--o{ ROOMS : "has_many"
ORGANIZATIONS ||--o{ CHANNEL_INTEGRATIONS : "owns"
ORGANIZATIONS ||--o{ ORGANIZATION_FEATURES : "has_many"
CHANNEL_INTEGRATIONS {
bigint id PK
bigint organization_id FK
json settings
boolean traffic_split_enabled "NEW default false"
integer bot_percent "NEW default 100, CHECK 0..100"
}
ROOMS {
bigint id PK
bigint channel_integration_id FK
bigint organization_id FK
integer path_id
string variant "NEW: bot|human, null until decided"
datetime resolved_at
datetime assigned_at
boolean is_closed
datetime deleted_at "acts_as_paranoid"
}
PATHS {
bigint id PK
bigint channel_integration_id FK
bigint intent_id "null:false"
boolean is_auto_assign_agent
boolean is_default
}
ORGANIZATION_FEATURES {
bigint organization_id FK
bigint feature_id FK
boolean enabled
}
State machine for rooms.variant
stateDiagram-v2
[*] --> Undecided: Room created (variant IS NULL)
Undecided --> Undecided: split disabled / non-bot path / fail-safe\n(routes to bot but stays NULL — NOT an experiment arm)
Undecided --> Bot: split ENABLED on a bot path & rand<bot_percent
Undecided --> Human: split ENABLED on a bot path & rand>=bot_percent
Bot --> Bot: subsequent messages (NO re-roll)
Human --> Human: subsequent messages (NO re-roll; queue/offline unchanged)
Bot --> [*]: Room soft-deleted (acts_as_paranoid)
Human --> [*]: Room soft-deleted
variant is write-once and only set under an active split (REV-3): a conversation that bypasses the split (split disabled, non-bot path, or fail-safe) stays NULL so it never pollutes the experiment comparison. There is no Bot↔Human transition; a bot→human handover is a separate assignment event, not a variant change (PRD BTS-S02 permission model; emits bot_arm_handover_to_human, not a re-tag).
Branch & skip flow (non-error policy branches)
flowchart TD
trigger([find_default_path reaches split step]) --> botpath{plain bot-intent path?}
botpath -- no --> skipscope[Skip split → keep deterministic routing\nvariant stays NULL]
botpath -- yes --> flag{traffic_split_enabled?}
flag -- no --> skipbot[Skip split → bot 100%, keep intent_id\nvariant stays NULL — not an experiment arm]
flag -- yes --> roomcheck{room.variant present?}
roomcheck -- yes --> skiproll[Skip roll → re-read + apply stored variant]
roomcheck -- no --> doroll[roll rand 0..99 → stamp + re-read + apply_arm]
skipscope --> done([continue to send_message_assign_agent])
skipbot --> done
skiproll --> done
doroll --> done
Detail 2.1a — Architecture Decision Records (ADR-format)
ADR-1 — Where the split config lives.
Context: PRD §5/§7 calls it “Path config”, but the scope unit is the channel
integration and a channel can own many paths (keyword/schedule/default).
The decision runs inside find_default_path, which already holds the
channel_integration record.
Options: (a) new columns on channel_integrations; (b) new columns on paths;
(c) key in channel_integrations.settings JSON.
Decision: (a) — traffic_split_enabled boolean default false +
bot_percent integer default 100 on channel_integrations.
Consequences: one source of truth per channel; zero extra round-trip (the
record is already loaded); the config UI keeps editing channel_integrations.
The PRD wording “Path config” is reconciled to “channel config”. (b) would
fragment the percentage across paths; (c) loses typing/indexing and diverges
from the existing pattern where routing flags (is_auto_assign_agent) are
columns.
Reversibility: Medium — moving to per-path later is a migration + read-site
change. Confirm scope with PM (§5 Open Q-1) before building.
ADR-2 — Random per-conversation bucketing.
Context: PRD §5 mandates rand(100) per conversation, no identity hashing
(Phase 2). No existing rand() bucketing in the repo (grep: NOT FOUND).
Decision: in-process rand(100) < bot_percent → bot; else human. The roll is
not persisted (only the resulting variant is).
Consequences: no hot-path write beyond the single variant stamp; statistically
converges to bot_percent (SC-1). Reversibility: High — swap the bucketing
function for Phase 2 hashing without schema change.
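A minimal sketch of the bucketing rule in ADR-2, assuming a standalone helper. The module name `TrafficSplitBucketing`, the method `pick_arm`, and the injectable `rng:` parameter are hypothetical and do not exist in the repo; only the `rand(100) < bot_percent` comparison comes from this RFC.

```ruby
# Hypothetical helper; the module and method names are illustrative only.
module TrafficSplitBucketing
  # Returns :bot when the roll falls below bot_percent, otherwise :human.
  # rng is injectable so a spec can pass a seeded Random instance.
  def self.pick_arm(bot_percent, rng: Random)
    unless bot_percent.is_a?(Integer) && (0..100).cover?(bot_percent)
      raise ArgumentError, "bot_percent must be an Integer in 0..100"
    end

    roll = rng.rand(100) # integer in 0..99
    roll < bot_percent ? :bot : :human
  end
end

# Usage: over many rolls the observed bot share approaches bot_percent (SC-1).
rng = Random.new(42)
arms = Array.new(10_000) { TrafficSplitBucketing.pick_arm(30, rng: rng) }
puts format("observed bot share: %.3f", arms.count(:bot) / arms.size.to_f)
```

Because the roll is `0..99` and the comparison is strict, `bot_percent = 0` never picks the bot and `bot_percent = 100` always does, which matches the CHECK range on the column.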
ADR-3 — Decide-once / concurrency / route-by-persisted-variant (REV-1, REV-3).
Context: PRD §15 Q4 — two near-simultaneous first messages on the same new
Room (found via Rooms::FindOrCreateBy, :261) could both roll. Each message
is a separate use-case invocation that must also route (reply or assign), so
the tag alone is insufficient — both messages must route the same way.
Decision: (1) Only run the roll when the split is active for this
conversation (bot path + traffic_split_enabled true) and variant IS NULL. If the
split is not active, keep the existing routing and leave variant NULL (REV-3: a
split-disabled / non-bot / fail-safe reply is not an experiment arm and must not
enter the comparison). If variant is already set, skip the roll and go straight
to step 3. (2) Stamp atomically: UPDATE rooms SET variant = ? WHERE id = ? AND variant IS NULL. (3) Re-read rooms.variant and derive
the routing fields from a single apply_arm(canonical_variant) step, so the
race loser (whose conditional update matched 0 rows) routes per the persisted
arm, never per its own roll. The roll result is advisory until the re-read.
Consequences: exactly-once tag and consistent routing for every message in
the conversation, without an advisory lock; safe under the existing
Rooms::FindOrCreateBy path. Reversibility: High.
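The stamp-then-re-read sequence in ADR-3 can be illustrated with an in-memory stand-in for the rooms table. Everything named here (`FakeRoomStore`, `stamp_if_null`, `variant_of`, `decide_arm`) is hypothetical, and the Mutex only imitates the atomicity that PostgreSQL gives the conditional UPDATE; this is a sketch of the control flow, not the use case's code.

```ruby
# In-memory stand-in for rooms; the Mutex plays the role of the row-level
# atomicity of the conditional UPDATE.
class FakeRoomStore
  def initialize
    @variants = {}
    @lock = Mutex.new
  end

  # Mirrors: UPDATE rooms SET variant = ? WHERE id = ? AND variant IS NULL
  # Returns the number of rows changed (0 for the race loser).
  def stamp_if_null(room_id, variant)
    @lock.synchronize do
      return 0 unless @variants[room_id].nil?

      @variants[room_id] = variant
      1
    end
  end

  # Mirrors the re-read of rooms.variant.
  def variant_of(room_id)
    @lock.synchronize { @variants[room_id] }
  end
end

# The roll is advisory; the value returned is always the persisted arm.
def decide_arm(store, room_id, bot_percent, rng: Random)
  existing = store.variant_of(room_id)
  return existing if existing # already decided: no re-roll

  rolled = rng.rand(100) < bot_percent ? "bot" : "human"
  store.stamp_if_null(room_id, rolled)
  store.variant_of(room_id) # canonical arm, even if our stamp matched 0 rows
end

# Usage: eight racing "first messages" on the same room agree on one arm.
store = FakeRoomStore.new
arms = Array.new(8) { Thread.new { decide_arm(store, 1, 50) } }.map(&:value)
puts arms.uniq.inspect
```

The point of returning the re-read value rather than `rolled` is that a caller whose stamp matched 0 rows still routes per the winner's arm.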
ADR-4 — Fail-safe to bot.
Context: PRD §7 #2 / BTS-S02 ERR-1 — never drop a chat for the experiment.
Decision: wrap the split step in a rescue; on any error/ambiguous config →
keep the existing bot flow (intent_id unchanged) and leave variant NULL
(consistent with REV-3 — a fail-safe is not a measured bot arm), emit
bot_traffic_split_decision_fallback.
Consequences: worst case == today’s behavior; observable via the fallback
metric + alert (PRD §10); fallback conversations are excluded from the comparison
rather than silently counted as bot wins. Reversibility: High.
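A sketch of the fail-safe wrapper described in ADR-4, under the assumption that the split step and the event emitter are passed in as callables. The method name `split_step_with_failsafe` and the `decide:` / `emit:` parameters are hypothetical; only the event name and the "keep bot flow, leave variant NULL" outcome come from this RFC.

```ruby
# Hypothetical wrapper; decide and emit are stand-in callables.
# Returns the decided variant, or nil when the fail-safe fired.
def split_step_with_failsafe(decide:, emit:)
  decide.call
rescue StandardError => e
  begin
    # Telemetry is best-effort: a failing emit must not break routing either.
    emit.call("bot_traffic_split_decision_fallback", { error_class: e.class.name })
  rescue StandardError
    nil
  end
  nil # nil means: keep the existing bot flow and leave variant NULL
end

# Usage: a config read error falls back instead of raising.
events = []
emit = ->(name, props) { events << [name, props] }
variant = split_step_with_failsafe(decide: -> { raise KeyError, "bot_percent missing" }, emit: emit)
puts variant.inspect
puts events.inspect
```

Returning `nil` rather than `"bot"` keeps the fallback distinguishable from a measured bot arm, in line with REV-3.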
ADR-5 — Human-arm hook reuses is_auto_assign_agent.
Context: send_message_assign_agent branches in order crm_intent_id (:536) → agent_id (:539) → division_id (:543) → is_auto_assign_agent (:545) → intent_id (:548). The use case sets crm_intent_id at :84 (when @path.crm_assignment),
and agent_id/division_id from the path at :501-502.
Decision: the human arm is applied by apply_arm(:human) (ADR-3): set
self.is_auto_assign_agent = true and clear self.intent_id,
self.agent_id, self.division_id, self.crm_intent_id so the existing
is_auto_assign_agent branch (:545) fires SendMessageAutoAssignAgentWorker.
Because routing is derived from the persisted variant (ADR-3), the race loser
also runs apply_arm and clears the same fields — no message escapes as a bot
reply.
Consequences: zero change to assignment/queue/offline behavior (BTS-S03,
Non-Goal 5). Must clear the higher-precedence fields or the wrong branch wins.
Reversibility: High — remove the override.
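To show why the human arm has to clear the higher-precedence fields, the sketch below pairs a hypothetical `apply_arm` with a simplified mirror of the branch order cited in ADR-5. `RoutingState` and `chosen_branch` are illustrative names; the real use case keeps these values as attributes on itself and branches inside send_message_assign_agent.

```ruby
# Illustrative container for the routing fields the use case holds on self.
RoutingState = Struct.new(:crm_intent_id, :agent_id, :division_id,
                          :is_auto_assign_agent, :intent_id, keyword_init: true)

# Human arm: force the auto-assign branch by clearing every field that is
# checked before it, plus intent_id. Bot arm: leave the state untouched.
def apply_arm(state, variant)
  return state unless variant == "human"

  state.is_auto_assign_agent = true
  state.intent_id = nil
  state.agent_id = nil
  state.division_id = nil
  state.crm_intent_id = nil
  state
end

# Simplified mirror of the branch order:
# crm_intent_id, agent_id, division_id, is_auto_assign_agent, intent_id.
def chosen_branch(state)
  return :crm_intent if state.crm_intent_id
  return :agent if state.agent_id
  return :division if state.division_id
  return :auto_assign_agent if state.is_auto_assign_agent
  return :bot_intent if state.intent_id

  :none
end

# Usage: a plain bot-intent path flipped to the human arm.
state = RoutingState.new(intent_id: 7, is_auto_assign_agent: false)
puts chosen_branch(apply_arm(state, "human"))
```

If `agent_id` were left set, `chosen_branch` would return `:agent` before ever reaching the auto-assign check, which is the "wrong branch wins" case the ADR warns about.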
ADR-6 — Dedicated config endpoint.
Context: existing patch ':id' requires the full payload (name, enabled,
timezone, …); a partial toggle of two fields shouldn’t require resending all.
There is a precedent sub-action patch ':id/publish'.
Decision: add PATCH /v1/channel_integrations/:id/traffic_split (use case
ChannelIntegration::UpdateTrafficSplit), set_role(%w[owner admin]) for the
write (SPV is view-only — see Open Q-5), feature-flag + plan guard, params
traffic_split_enabled: Boolean, bot_percent: Integer, values: 0..100.
Consequences: clean partial update + tight authZ; one new route + use case +
response entity. Reversibility: High.
ADR-7 — Org gate via OrganizationFeature (+ optional rollout SystemPreference).
Context: PRD §5 — bot_traffic_split_enabled per org, default OFF, enabled
during rollout. Repo has OrganizationFeature (per-org) and SystemPreference
group_code:'rollout' (global).
Decision: per-org OrganizationFeature with feature.code = 'bot_traffic_split';
an optional global SystemPreference rollout flag acts as a kill-switch checked
first. Both the FE section visibility and the BE write/read endpoints honor the
gate.
Consequences: matches existing entitlement patterns; supports staged rollout +
fast disable (PRD §10.1 rollback). Reversibility: High.
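The gate order in ADR-7 (global kill-switch first, then the per-org feature) could look like the sketch below. Both lookups are passed in as hypothetical callables standing in for the SystemPreference and OrganizationFeature queries, and treating the global flag as a simple on/off boolean is an assumption of this example.

```ruby
# Hypothetical gate; both lookups are stand-in callables, not repo APIs.
def traffic_split_gate_open?(organization_id, global_rollout_enabled:, org_feature_enabled:)
  return false unless global_rollout_enabled.call # kill-switch is checked first

  org_feature_enabled.call(organization_id) ? true : false
end

# Usage: the per-org lookup is skipped entirely while the kill-switch is off.
lookups = []
org_lookup = ->(org_id) { lookups << org_id; org_id == 42 }
puts traffic_split_gate_open?(42, global_rollout_enabled: -> { false }, org_feature_enabled: org_lookup)
puts traffic_split_gate_open?(42, global_rollout_enabled: -> { true }, org_feature_enabled: org_lookup)
```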
ADR-8 — Comparison metrics computed from rooms (with a pinned v1 contract — REV-4/REV-5).
Context: BTS-S04 is Should Have; canonical resolution def, handover
derivation, and CSAT source are Data dependencies (Open Q-2/Q-3). An agent still
needs an exact, implementable contract for v1.
Decision: v1 comparison endpoint aggregates rooms WHERE variant IS NOT NULL (REV-3 — only measured arms) by variant for a channel + date range
(created_at in [date_from, date_to], range ≤ 90 days):
- resolution_rate: float 0..1 = count(resolved_at IS NOT NULL) / count(*) per arm. This is the v1 proxy for the PRD ⭐ KPI until Data confirms the canonical "resolved" definition (Open Q-2). Documented as a proxy in the response (resolution_basis: "resolved_at_present_v1").
- resolution_parity: float|null = bot resolution_rate ÷ human resolution_rate (null when either arm has 0 conversations); the PRD §6.1 Screen-B parity column.
- handover_rate: float 0..1 (bot arm only) = bot-arm conversations later escalated to a human ÷ bot-arm conversations. v1 derivation: count(variant='bot' AND assigned_at IS NOT NULL) / count(variant='bot'). The dedicated bot_arm_handover_to_human event (emitted from the existing bot→agent handover path, gated on room.variant == 'bot') is the authoritative signal once the Data pipeline ingests it; until then the assigned_at proxy is used and labelled (handover_basis: "assigned_at_proxy_v1"). See L-1.
- csat_avg: float|null = best-effort; null ⇒ FE shows "CSAT not available" (PRD §6.1 D3, Open Q-3).

Consequences: ships a fully-typed, implementable comparison without blocking on the warehouse; every proxy is self-labelled so the dashboard never presents a proxy as a confirmed metric. Reversibility: High (swap the aggregation source behind the same endpoint contract).
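The v1 metric definitions in ADR-8 can be checked against a small in-memory dataset. This sketch uses plain Hashes instead of rooms rows, and the helper names `rate` and `compare_arms` are hypothetical. Two behaviours are assumptions of this example, not decisions of the RFC: rates are `nil` for an arm with 0 conversations, and parity is `nil` when the human rate is 0. `csat_avg` is left out because its source is still a Data dependency.

```ruby
# Hypothetical helper: returns nil instead of dividing by zero.
def rate(numerator, denominator)
  denominator.zero? ? nil : numerator / denominator.to_f
end

# rooms: array of Hashes with :variant, :resolved_at, :assigned_at.
def compare_arms(rooms)
  measured = rooms.reject { |room| room[:variant].nil? } # REV-3: measured arms only
  arms = %w[bot human].to_h do |arm|
    rows = measured.select { |room| room[:variant] == arm }
    stats = {
      conversations: rows.size,
      resolution_rate: rate(rows.count { |room| room[:resolved_at] }, rows.size),
      no_data: rows.empty?
    }
    # handover_rate is defined for the bot arm only (assigned_at proxy).
    stats[:handover_rate] = rate(rows.count { |room| room[:assigned_at] }, rows.size) if arm == "bot"
    [arm, stats]
  end

  bot_rate = arms["bot"][:resolution_rate]
  human_rate = arms["human"][:resolution_rate]
  parity = (bot_rate && human_rate && human_rate.positive?) ? bot_rate / human_rate : nil

  {
    resolution_basis: "resolved_at_present_v1",
    handover_basis: "assigned_at_proxy_v1",
    resolution_parity: parity,
    arms: arms
  }
end

# Usage with made-up rows; the untagged room is ignored.
now = Time.now
rooms = [
  { variant: "bot", resolved_at: now, assigned_at: nil },
  { variant: "bot", resolved_at: nil, assigned_at: now },
  { variant: "human", resolved_at: now, assigned_at: now },
  { variant: nil, resolved_at: now, assigned_at: nil }
]
puts compare_arms(rooms).inspect
```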
ADR-9 — Split applies only to plain bot-intent paths (REV-2).
Context: find_default_path can resolve a path that is not a plain bot
reply: a CRM path (@path.crm_assignment → crm_intent_id at :84), an
agent-routed path (agent_id at :501), or a division-routed path
(division_id at :502). These already route deterministically (to CRM intent /
specific agent / division), so "split a % of them to a human" is ambiguous and
would change configured behavior.
Options: (a) apply the split only when the resolved path is a plain bot-intent
path (intent_id present, none of agent_id/division_id/crm_assignment); (b)
apply the split to every path and override; (c) treat agent/division/CRM paths as
"already human" for measurement.
Decision: (a) — the split step is a no-op for non-bot paths; their routing
is untouched and variant stays NULL (not a measured arm). The roll only ever
flips a conversation that would otherwise have been a 100 %-bot reply.
Consequences: keeps the experiment’s two arms clean (bot-reply vs human-agent)
and never silently re-routes a deliberately agent/CRM-targeted path. Scope is
explicit and grounded at :84/:501-502. Reversibility: High — widen the
scope predicate later.
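The scope predicate from option (a) is small enough to state directly. `PathSnapshot` and `plain_bot_intent_path?` are hypothetical names for this sketch; the four fields are the ones ADR-9 lists.

```ruby
# Illustrative snapshot of the resolved path's routing-relevant fields.
PathSnapshot = Struct.new(:intent_id, :agent_id, :division_id, :crm_assignment,
                          keyword_init: true)

# True only for a path that would otherwise produce a plain bot reply.
def plain_bot_intent_path?(path)
  return false if path.nil?

  !path.intent_id.nil? &&
    path.agent_id.nil? &&
    path.division_id.nil? &&
    !path.crm_assignment
end

# Usage: only the first path is eligible for the split.
puts plain_bot_intent_path?(PathSnapshot.new(intent_id: 7))
puts plain_bot_intent_path?(PathSnapshot.new(intent_id: 7, agent_id: 3))
```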
Detail 2.2 — Sequence (mermaid, end-to-end incl. failure paths)
Routing — bot arm, human arm, no-agent, fail-safe:
sequenceDiagram
actor C as Customer
participant HUB as ProcessIncomingMessageWithResolve
participant DB as PostgreSQL
participant MIX as SendMixpanelEventWorker
participant AA as SendMessageAutoAssignAgentWorker
participant BOT as SeparateSendMessageWorker
C->>HUB: incoming message (webhook)
HUB->>DB: Rooms::FindOrCreateBy (:261)
HUB->>HUB: find_default_path (:488) → resolve Path (intent_id, is_auto_assign_agent, agent_id, division_id, crm_intent_id)
alt non-bot path (agent/division/CRM, ADR-9) OR split disabled OR config error (fail-safe ADR-4)
HUB->>HUB: keep deterministic routing; variant stays NULL (not a measured arm)
HUB-->>MIX: bot_traffic_split_decision_fallback (only on config error)
else split active on a bot path
alt room.variant already set (ADR-3)
HUB->>HUB: no re-roll
else first decision
HUB->>HUB: roll rand(100); arm = (roll < bot_percent ? bot : human)
HUB->>DB: UPDATE rooms SET variant WHERE id=? AND variant IS NULL
end
HUB->>DB: re-read rooms.variant = canonical arm (ADR-3)
HUB->>HUB: apply_arm(canonical): bot→keep intent_id; human→is_auto_assign_agent=true, clear intent_id/agent_id/division_id/crm_intent_id (ADR-5)
HUB-->>MIX: bot_traffic_split_assigned {variant, bot_percent, …} (best-effort)
end
HUB->>HUB: send_message_assign_agent (:534)
alt variant=bot OR untagged bot (intent_id branch)
HUB->>BOT: SeparateSendMessageWorker(intent_id)
BOT-->>C: bot reply
else variant=human (is_auto_assign_agent branch :545)
HUB->>AA: SendMessageAutoAssignAgentWorker
alt agent available
AA-->>C: assigned to agent
else no agent (BTS-S03)
AA->>DB: existing queue / offline behavior (bot does NOT take over)
end
end
Config save (FE → BE), happy + failure:
sequenceDiagram
actor A as Chatbot Admin
participant FE as TrafficSplitSection.vue
participant API as Grape PATCH …/traffic_split
participant UC as UpdateTrafficSplit use case
participant DB as PostgreSQL
participant MIX as SendMixpanelEventWorker
A->>FE: toggle ON, bot_percent=30, Save
FE->>FE: Vuelidate integer + between(0,100)
alt invalid
FE-->>A: inline "Enter a whole number between 0 and 100" (BTS-S01/ERR-1)
else valid
FE->>API: PATCH {traffic_split_enabled:true, bot_percent:30}
API->>API: set_role(owner/admin) + feature flag + plan guard
alt flag OFF / ineligible plan / role
API-->>FE: 403 (BTS-S01-NEG/-NEG2)
FE-->>A: section should not have been shown
else authorized
API->>UC: result
UC->>DB: UPDATE channel_integrations SET traffic_split_enabled, bot_percent (+PaperTrail)
alt persist ok
UC-->>MIX: bot_traffic_split_config_saved
API-->>FE: 200 {traffic_split_enabled, bot_percent}
FE-->>A: toast "Traffic split updated: 30% bot / 70% human"
else persist fails (5xx)
UC-->>MIX: bot_traffic_split_save_failed {error_code} (BE-emitted, REV-6)
API-->>FE: 5xx (Dry::Matcher failure)
FE-->>A: "Couldn't save traffic split. Try again." + Retry (BTS-S01/ERR-2)
end
end
end
Detail 2.3 — Database Model (DDL)
PostgreSQL; Rails ActiveRecord::Migration[7.1]; schema tracked in
db/schema.rb. Two additive migrations, no backfill, no data migration.
# db/migrate/<ts>_add_traffic_split_to_channel_integrations.rb
class AddTrafficSplitToChannelIntegrations < ActiveRecord::Migration[7.1]
  def change
    add_column :channel_integrations, :traffic_split_enabled, :boolean, default: false, null: false
    add_column :channel_integrations, :bot_percent, :integer, default: 100, null: false
    # Range guard at the DB (defense-in-depth alongside Grape validation)
    add_check_constraint :channel_integrations, "bot_percent >= 0 AND bot_percent <= 100",
                         name: "chk_channel_integrations_bot_percent_range"
  end
end

# db/migrate/<ts>_add_variant_to_rooms.rb
class AddVariantToRooms < ActiveRecord::Migration[7.1]
  def change
    add_column :rooms, :variant, :string, limit: 10 # 'bot' | 'human' | NULL (undecided)
    add_index :rooms, :variant
    # Composite index supports the comparison aggregation by channel + arm + time
    add_index :rooms, [:channel_integration_id, :variant, :resolved_at],
              name: "index_rooms_on_channel_variant_resolved"
  end
end
Per-status lifecycle — rooms.variant:
| Value | Set by | Mutable? | Retention | Restore semantics | Visibility |
|---|---|---|---|---|---|
| NULL (undecided) | default (room created) | → set once | n/a | n/a | internal |
| bot | split step only when active on a bot path (REV-3) | no (write-once, ADR-3) | lives & soft-deletes with Room (acts_as_paranoid) | restored with the Room | internal + comparison |
| human | split step (active, human arm) | no | same | same | internal + comparison |
A conversation routed without a split decision (split disabled, non-bot path per ADR-9, or fail-safe per ADR-4) keeps variant = NULL; these are not experiment arms and the comparison aggregation filters them out (WHERE variant IS NOT NULL). Only conversations decided by an active split are tagged.
Config columns (traffic_split_enabled, bot_percent) are plain mutable settings on channel_integrations with no state machine; audited via existing PaperTrail on the model (confirm PaperTrail is enabled on ChannelIntegration; Open Q-7).
Detail 2.4 — APIs
Outbound endpoints (consumers call us)
| Endpoint | Method | AuthN/AuthZ | Request schema | Response schema | Status codes | Idempotency | Versioning | Reuse? |
|---|---|---|---|---|---|---|---|---|
| /api/v1/channel_integrations/:id/traffic_split | PATCH | session + set_role(%w[owner admin]) + Middlewares::Ownership + bot_traffic_split feature flag + plan gate | { traffic_split_enabled: boolean (required), bot_percent: integer 0..100 (required when enabled) } | { id, traffic_split_enabled, bot_percent } | 200; 422 invalid bot_percent/non-integer; 403 flag-off/ineligible/role; 5xx persist fail | natural: PATCH is idempotent (last write wins) | /api/v1/ (existing) | new-with-justification (ADR-6) |
| /api/v1/channel_integrations/:id | GET | session + set_role(%w[owner supervisor admin]) | { id } | existing entity + traffic_split_enabled, bot_percent | 200; 403; 404 | safe | /api/v1/ | extended |
| /api/v1/channel_integrations/:id/traffic_split/comparison | GET | session + set_role(%w[owner supervisor admin]) + feature flag | { id, date_from, date_to } (ISO-8601 date; range ≤ 90 days) | see typed schema below | 200; 403; 422 bad/over-long range; 5xx | safe | /api/v1/ | new-with-justification (ADR-8) |
Comparison response schema (v1, pinned — REV-4):
{
  "updated_at": "2026-06-16T14:20:00Z",          // ISO-8601
  "resolution_basis": "resolved_at_present_v1",  // self-labelled proxy (Open Q-2)
  "handover_basis": "assigned_at_proxy_v1",      // until bot_arm_handover_to_human ingested (L-1)
  "resolution_parity": 0.92,                     // float|null = bot.resolution_rate / human.resolution_rate
  "arms": {
    "bot": {
      "conversations": 372,     // integer = count(variant='bot') in range
      "resolution_rate": 0.78,  // float 0..1 = resolved_at-present / conversations
      "csat_avg": 4.4,          // float|null (null ⇒ "CSAT not available")
      "handover_rate": 0.24,    // float 0..1 = (variant='bot' AND assigned_at NOT NULL)/conversations
      "no_data": false          // true when conversations == 0
    },
    "human": {
      "conversations": 868,
      "resolution_rate": 0.85,
      "csat_avg": 4.7,
      "no_data": false
    }
  }
}
Aggregation runs WHERE variant IS NOT NULL (REV-3) over rooms for the channel + range, grouped by variant. Rates are fractions 0..1 (the FE renders as %). resolution_basis / handover_basis flag the v1 proxies so the dashboard never shows a proxy as a confirmed metric (ADR-8).
Validation (BTS-S01/ERR-1): Grape requires :bot_percent, type: Integer, values: 0..100 rejects 150, -5, "30%" with 422 before the use case runs; the DB CHECK is defense-in-depth. The comparison endpoint validates date_from <= date_to and the ≤ 90-day window (422 otherwise).
Disable semantics (REV-10): setting traffic_split_enabled=false preserves the last bot_percent (only the boolean flips), so re-enabling restores the prior percentage; the routing simply skips the split while OFF.
OpenAPI: add all three to docs/openapi/openapi.yaml (AGENTS.md convention; the Grape desc/success/failure blocks already power it).
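The validation rules above can be expressed framework-free for spec purposes. This is a plain-Ruby sketch, not the Grape params block; the method names and error strings are hypothetical. It follows the request schema's "bot_percent required when enabled" reading, and it assumes the 90-day window is measured as the difference between the two dates.

```ruby
require "date"

MAX_RANGE_DAYS = 90

# Returns an array of error strings; an empty array means the params pass.
def traffic_split_param_errors(params)
  errors = []
  enabled = params[:traffic_split_enabled]
  errors << "traffic_split_enabled must be true or false" unless [true, false].include?(enabled)

  percent = params[:bot_percent]
  if percent.nil?
    errors << "bot_percent is required when the split is enabled" if enabled == true
  elsif !percent.is_a?(Integer) || !(0..100).cover?(percent)
    errors << "bot_percent must be a whole number between 0 and 100"
  end
  errors
end

# Validates the comparison window: ISO-8601 dates, ordered, at most 90 days.
def comparison_range_errors(date_from, date_to)
  from = Date.iso8601(date_from)
  to = Date.iso8601(date_to)
  return ["date_from must be on or before date_to"] if from > to
  return ["range must be 90 days or shorter"] if (to - from).to_i > MAX_RANGE_DAYS

  []
rescue ArgumentError, TypeError
  ["date_from and date_to must be ISO-8601 dates"]
end

# Usage: the three rejected inputs named in the validation paragraph.
[150, -5, "30%"].each do |bad|
  puts traffic_split_param_errors(traffic_split_enabled: true, bot_percent: bad).inspect
end
```

A non-empty result would map to the 422 response in the endpoint table.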
Inbound webhooks (other services call us)
| Endpoint | Method | AuthN/AuthZ | Source service | Request schema | Response schema | Status codes | Idempotency | Versioning |
|---|---|---|---|---|---|---|---|---|
| n/a — no new inbound webhook | — | — | — | — | — | — | — | — |
The routing trigger is the existing incoming-message hub flow (ProcessIncomingMessageWithResolve); no new callback is introduced.
Detail 2.A — UI Contract
| Surface | Component (new) | Props / inputs | Emitted events / API calls | Source data |
|---|---|---|---|---|
| Traffic Split section | TrafficSplitSection.vue | channelIntegrationId, initial traffic_split_enabled, bot_percent | on Save → channel-integration.ts updateTrafficSplit(PATCH); $toast on success/fail | GET /v1/channel_integrations/:id |
| Enable toggle | MpToggle (Pixel 3) | v-model boolean | toggling reveals % input + preview + info banner | local state |
| Bot % input | MpInputGroup+MpInput type="number"+MpInputRightAddon (%) | v-model integer; Vuelidate integer,between(0,100) | inline error on invalid | local state |
| Live preview | MpText | computed ~{n}% to bot, ~{100−n}% to human agents | — | computed |
| Info banner | MpBanner variant="info" is-inline | static copy (no-agent → queue) | — | static |
| Save / Cancel | MpButton (Save primary, Cancel ghost) | :is-loading during save | calls API | — |
| Comparison view | BotHumanComparison.vue | channelIntegrationId, date range | GET …/comparison; Retry refetch | comparison endpoint |
| Comparison table | pixel-table + MpBadge legend, MpSelect (channel/range) | header-list,data-list,:empty-state | filter-change → refetch | comparison endpoint |
Detail 2.B — Data-Fetching Strategy
- Read config: the host settings page already loads the channel integration; TrafficSplitSection receives traffic_split_enabled / bot_percent as props (or reads from the channel store). No extra fetch on mount.
- Save: Pinia action wraps updateTrafficSplit, following the fetchStatus: pending|resolved|rejected pattern (store/ai-agent/actions.ts); AbortController for cancel-on-unmount (per channel-integration.ts).
- Comparison: lazy fetch on view open + on filter change; show skeleton while pending; cache last successful result so Retry doesn’t flash empty.
Detail 2.C — UI State Matrix
| Surface | Empty / Disabled | Loading | Error | Success |
|---|---|---|---|---|
| Traffic Split section | split OFF (default): toggle off, % + preview + banner hidden; helper “All incoming chats are handled by the bot (100%).” | Save disabled + spinner (:is-loading); prior value retained | invalid → inline “Enter a whole number between 0 and 100”; save 5xx → “Couldn’t save traffic split. Try again.” + Retry; emit bot_traffic_split_save_failed | toggle ON, value saved, toast “Traffic split updated: 30% bot / 70% human” |
| Comparison view | bot arm 0 convos → bot column “No data yet” (not 0%); range empty → “No conversations in this range yet” | skeleton rows | “Couldn’t load comparison. Try again.” + Retry; no partial/misleading numbers | two-arm table + Updated <ts> |
| Gated (flag OFF / ineligible plan / Agent role) | section + comparison not rendered | — | direct API → 403 | — |
Detail 2.D — Data Integrity Matrix
| Invariant | Enforced by |
|---|---|
| bot_percent ∈ [0,100] integer | Grape values: 0..100, type: Integer + DB CHECK chk_channel_integrations_bot_percent_range |
| variant ∈ {bot, human, NULL} | application sets only 'bot'/'human'; limit: 10 column |
| variant write-once | UPDATE … WHERE variant IS NULL (ADR-3) |
| Split only when enabled | traffic_split_enabled guard before roll |
| Config only for entitled org | feature-flag guard on write + read endpoints |
Detail 2.E — Concurrency Collision Map
| Collision | Scenario | Resolution |
|---|---|---|
| Double first-message (PRD §15 Q4 / REV-1) | two messages create/find the same new Room near-simultaneously, both see variant IS NULL, and each must also route (reply/assign) | conditional UPDATE … SET variant WHERE id=? AND variant IS NULL persists one arm; both messages then re-read variant and route via apply_arm(canonical) (ADR-3) — the race loser routes per the persisted arm, so you never get one bot reply + one human assignment for the same conversation |
| Config save vs in-flight routing | Admin changes bot_percent while messages route | each conversation is decided at its own arrival using the then-current bot_percent; already-decided rooms keep their variant (no retro change) |
| Concurrent config saves | two Admins PATCH the same channel | PATCH last-write-wins; PaperTrail records both (Open Q-7) |
Detail 2.F — Async Job / Event Consumer Spec
| Worker | Status | Trigger | Effect | Failure behavior |
|---|---|---|---|---|
| SendMessageAutoAssignAgentWorker | reused, unchanged | is_auto_assign_agent branch (:545) for human arm | assign to agent or enter existing queue/offline | unchanged from today (BTS-S03/ERR-1: bot never rescues) |
| SendMixpanelEventWorker (queue: :event_tracker) | reused | after arm decision / config save / fallback / save fail / handover | emit bot_traffic_split_* event | best-effort: emit must never block/fail routing (PRD §7 #4) |
| SeparateSendMessageWorker | reused | bot arm (intent_id branch) | bot reply | unchanged |
Detail 2.F.1 — Responsibility Boundary Matrix
| Step | Owning squad / service | Inbound trigger | Outbound effect | Failure handler | PRD anchor |
|---|---|---|---|---|---|
| 1. Resolve path + split decision | Chatbot / chatbot BE | incoming message | variant stamped; arm chosen | rescue → existing bot flow (variant stays NULL) + decision_fallback | §7 #2, §8.1 |
| 2. Bot reply | Chatbot | bot arm | SeparateSendMessageWorker | existing | §7 #2 |
| 3. Human assignment / queue | Chatbot | human arm | SendMessageAutoAssignAgentWorker | existing queue/offline | §7 #3, BTS-S03 |
| 4. Emit analytics | Chatbot → Mixpanel | arm decision / config save | bot_traffic_split_* events | best-effort, swallow errors | §7 #4, §10 |
| 5. Ingest events + dashboard | Data | Mixpanel events | comparison dashboard | Data pipeline TTL | §13 |
| 6. Canonical resolution def + CSAT join | Data | comparison query | metric definitions | n/a — Open Q-2/Q-3 | §13, §15 Q2/Q3 |
The Chatbot↔Data boundary at steps 5–6 is the one cross-squad seam. The routing (steps 1–4) is entirely within chatbot. No disagreement with PRD §13.
Detail 2.F.2 — State Surface Contract
| Entity | State field / event | Default | Updated by | Read via | Stale window |
|---|---|---|---|---|---|
| ChannelIntegration | traffic_split_enabled, bot_percent | false, 100 | UpdateTrafficSplit use case | GET /v1/channel_integrations/:id | immediate (read-your-write) |
| Room | variant | NULL | split step (write-once) | comparison aggregation; bot_traffic_split_assigned | set at conversation start |
| Comparison | per-arm aggregates | computed | read endpoint | GET …/comparison | as fresh as rooms + analytics ingest lag |
Detail 2.G — Cross-Layer Contract Verification
| Endpoint | BE response schema | FE expected schema | Match? | Gaps |
|---|---|---|---|---|
| `PATCH …/traffic_split` | `{ id, traffic_split_enabled, bot_percent }` (snake) | same (consumed as-is) | yes | none — no casing transform; FE reads snake_case (see `channel-integration.ts`) |
| `GET …/:id` (extended) | existing entity + 2 fields | section reads `traffic_split_enabled`, `bot_percent` | yes | none |
| `GET …/comparison` | `{ updated_at, arms:{bot:{…,no_data}, human:{…}} }` | table maps `arms`→rows; `no_data`→“No data yet”; `csat_avg:null`→“CSAT not available” | yes | none — null CSAT + `no_data` flag handled by FE (PRD §6.1 D3) |
All rows `yes`. Error envelope is the existing `error_response` shape; FE’s toast/Retry handles 4xx/5xx uniformly.
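As a rough sketch of the comparison contract in the last row, the snippet below builds a payload with the `no_data` flag and a nullable `csat_avg`, then applies the display rule as a pure function. The input row shape, the `conversations` field, and all method names are assumptions for illustration; the actual per-arm fields are the ones pinned in the RFC's typed schema.

```ruby
# Hypothetical input row shape: { variant: "bot" | "human", csat: Float or nil }.
def build_arm(rows)
  scores = rows.map { |r| r[:csat] }.compact
  {
    conversations: rows.size, # hypothetical field name
    csat_avg: scores.empty? ? nil : (scores.sum.to_f / scores.size).round(2),
    no_data: rows.empty?
  }
end

def build_comparison(rows, updated_at:)
  by_variant = rows.group_by { |r| r[:variant] }
  {
    updated_at: updated_at,
    arms: {
      bot: build_arm(by_variant.fetch("bot", [])),
      human: build_arm(by_variant.fetch("human", []))
    }
  }
end

# Display rule from the table above: empty arm and missing CSAT are
# rendered as text, never as a zero.
def csat_label(arm)
  return "No data yet" if arm[:no_data]

  arm[:csat_avg].nil? ? "CSAT not available" : arm[:csat_avg].to_s
end
```

Keeping `no_data` separate from `csat_avg: nil` lets the table distinguish "no conversations in this arm" from "conversations exist but the channel collects no CSAT".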
Detail 2.H — End-to-End Data Flow
- Save config: Admin → `TrafficSplitSection.vue` → Pinia action → `channel-integration.ts updateTrafficSplit` → `PATCH /v1/channel_integrations/:id/traffic_split` → `UpdateTrafficSplit` use case → `UPDATE channel_integrations` (+PaperTrail) → 200 → store `$patch` → toast. Side effects: `bot_traffic_split_config_saved`.
- Route a chat: customer message → hub → `Rooms::FindOrCreateBy` → `find_default_path` (resolve path + split step) → atomic `variant` stamp → `send_message_assign_agent` → bot (`SeparateSendMessageWorker`) or human (`SendMessageAutoAssignAgentWorker`). Side effects: `bot_traffic_split_assigned` (always), `decision_fallback` (on error).
- View comparison: Admin/SPV → `BotHumanComparison.vue` → `GET …/comparison` → aggregate `rooms` by `variant` → table render (skeleton→rows / empty / error).
Detail 2.I — Scope Boundaries
- BE create: 2 migrations; `UseCases::API::FrontendService::V1::ChannelIntegration::UpdateTrafficSplit`; comparison use case + repository; response entities; specs.
- BE modify: `app/api/frontend_service/v1/channel_integration.rb` (+2 routes); `find_default_path` + `send_message_assign_agent` (split step + variant stamp + events); `ChannelIntegration` GET entity (+2 fields); `app/models/room.rb` (validation/constant for `variant`); `docs/openapi/openapi.yaml`.
- BE NOT touched: `SendMessageAutoAssignAgentWorker`, `SeparateSendMessageWorker`, assignment/queue logic (reused unchanged — BTS-S03).
- FE create: `TrafficSplitSection.vue`, `BotHumanComparison.vue`, store action(s), service methods `updateTrafficSplit` + `getTrafficSplitComparison`, endpoint entries; Vitest + Playwright specs.
- FE modify: `pages/chat/settings/index.vue` (mount section + comparison, gated); `common/services/main/v1/channel-integration.ts`; `common/services/main/endpoint.ts`.
- FE NOT touched: unrelated settings panels; auth/session plumbing.
- Shared module impact: snake_case contract shared FE↔BE; no transform layer needed (§2.G).
Detail 2.J — Asset Inventory (frontend half)
| Asset | Type | Source | Format & sizes | Path in repo |
|---|---|---|---|---|
| Info icon (banner) | icon | @mekari/pixel3 built-in | component prop | n/a — DS-provided |
| Empty-state illustration (comparison) | illustration | n/a — design pending (Open Q-4) | TBD | TBD |
No new bespoke assets in the structural build. New copy strings (labels, toast, banner) are introduced; since there is no i18n system (verified), they are hardcoded in templates per current convention — flag for a future localization pass (Open Q-6).
3. High-Availability & Security
The split decision is in-process, on the existing hot path — no new network
or DB round-trip beyond the single atomic variant write (the channel config is
already loaded with the ChannelIntegration record). Routing availability is
therefore unchanged; the experiment can only ever fail safe to today’s behavior
(ADR-4).
Performance Requirement
- Backend: the split step adds an `O(1)` `rand` + one conditional `UPDATE` (the variant stamp, indexed on `rooms.id` PK) on the first message of a conversation only; subsequent messages skip the roll. Target ≤ 5 ms added per incoming message (PRD §5 Performance); no extra read (config preloaded); analytics emit is async (`SendMixpanelEventWorker`, `queue: :event_tracker`). Comparison endpoint is a read-only aggregate over the new composite index `index_rooms_on_channel_variant_resolved` — target p95 < 500 ms for a 14-day window; cap the range server-side (e.g. ≤ 90 days) to bound the scan.
- Frontend: the section is a small form on an already-loaded settings page — no bundle-budget concern beyond the new SFCs (each ≤ 250 lines per FE convention). Comparison view fetches once per open/filter; skeleton during load; browser support + a11y per existing chatbot-fe baseline.
Monitoring & Alerting
Reuse the Mixpanel emit pattern (SendMixpanelEventWorker.perform_async(org_id, name, payload)); event names follow the PRD §10 catalog:
| Event | Emitted from | Trigger |
|---|---|---|
| `bot_traffic_split_config_saved` | BE — `UpdateTrafficSplit` use case | config saved |
| `bot_traffic_split_assigned` | BE — `apply_arm` step in the hub | arm decided (active split only) |
| `bot_traffic_split_decision_fallback` | BE — `rescue` in the split step | config unreadable → bot fail-safe |
| `bot_traffic_split_save_failed` | BE — `UpdateTrafficSplit` use case on the 5xx path (REV-6) | config save errored |
| `bot_arm_handover_to_human` | BE — the existing bot→agent handover path, gated on `room.variant == 'bot'` (REV-5) | a bot-arm conversation later escalates to a human |
REV-6: `bot_traffic_split_save_failed` is emitted server-side (the FE has no Mixpanel client), keeping all five events on the one `SendMixpanelEventWorker` path. REV-5: `bot_arm_handover_to_human` is emitted at the existing bot→agent handover/assign action, gated on `room.variant == 'bot'`; until the Data pipeline ingests it, `handover_rate` falls back to the `assigned_at` proxy (ADR-8, L-1).
- BE alerts (PRD §10): `decision_fallback` rate > 1 % of assignments in 1h → `#chatbot-alerts` (config read failing / silently defaulting to 100 % bot); human-arm queue wait p90 > 15 min during experiment hours → `#chatbot-alerts`.
- Routing fidelity (SC-1): dashboard compares observed bot share vs configured `bot_percent` per active experiment (PRD §10.1: alert if drift > 10pp/week).
- FE: reuse existing error monitoring for the save-failed UX (toast + Retry); the analytics event itself is BE-emitted (REV-6).
- Cross-layer trace: existing request → worker correlation (room_id / channel_integration_id carried on every event) ties config save and routing.
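The best-effort rule for analytics (an emit failure must never block or fail routing) could look roughly like the sketch below. The `emitter` callable is a hypothetical stand-in for `SendMixpanelEventWorker.perform_async(org_id, name, payload)`, and the helper names and the `variant` payload key are assumptions for illustration.

```ruby
# `emitter` is a hypothetical callable standing in for the async worker
# enqueue. Any StandardError it raises is swallowed and reported as false.
def emit_best_effort(emitter, org_id, name, payload, logger: nil)
  emitter.call(org_id, name, payload)
  true
rescue StandardError => e
  logger&.call("analytics emit failed: #{name} (#{e.class})")
  false
end

# The routing result is returned whether or not the emit succeeded.
def route_with_event(emitter, org_id, room_id, arm)
  emit_best_effort(emitter, org_id, "bot_traffic_split_assigned",
                   { room_id: room_id, variant: arm })
  arm
end
```

Returning a boolean from the wrapper keeps the failure observable to callers and tests without letting it propagate into the routing path.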
Logging
- BE: structured log on the fallback path (`bot_traffic_split_decision_fallback` with `reason`); follow existing repo log conventions (frozen_string_literal, no PII in event payloads — use ids, not message content).
- FE: existing console/error reporting for save failures (no PII).
- PII: events carry ids only (`organization_id`, `channel_integration_id`, `room_id`, `actor_id`) — never message text or customer identifiers, matching the existing `'Process Message'` payload shape.
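One way to hold the ids-only rule is an allow-list applied just before the emit. This is a minimal sketch: the constant and method names are hypothetical, and the allow-list contains only the four ids named above (any extra non-PII property an event needs would have to be added deliberately).

```ruby
# Allow-list of the id keys named in the PII rule; anything else is dropped.
ALLOWED_EVENT_KEYS = %i[organization_id channel_integration_id room_id actor_id].freeze

def id_only_payload(attrs)
  attrs.select { |key, _value| ALLOWED_EVENT_KEYS.include?(key) }
end
```

An allow-list fails closed: a new attribute added upstream stays out of analytics until someone explicitly permits it.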
Security Implications
- AuthN/AuthZ (REV-9 resolved): every new endpoint goes through the existing session + Grape `set_role` + `Middlewares::Ownership` chain. Decision: the write endpoint uses `set_role(%w[owner admin])` (configure), while read/comparison use `set_role(%w[owner supervisor admin])` (view) — this tightens the SPV to view-only per the PRD, diverging from the existing channel endpoints that gate the three roles together (`channel_integration.rb:89`). Ownership middleware confines actions to the caller’s own org (no cross-tenant `channel_integration_id`).
- Feature/plan gate is server-side (REV-11): hiding the FE section is UX only; the authoritative gate returns 403 on any direct call (BTS-S01-NEG/NEG2) — a crafted request cannot persist `bot_percent`. The gate has two checks, both in the `UpdateTrafficSplit` use case before persistence: (1) org feature — `OrganizationFeature.exists?(feature_id: <bot_traffic_split>.id, organization_id:, enabled: true)` (+ the optional global `SystemPreference` rollout kill-switch); (2) plan eligibility — via the existing billing path `Repositories::Orders::UseMekariBilling` + active-subscription component check (the same pattern at `process_incoming_message_with_resolve.rb:54`). The exact eligible-plan list stays Open Q-9, but the enforcement path is fixed.
- Input validation: `bot_percent` validated at the edge (Grape `values: 0..100, type: Integer`) and the DB (`CHECK`); rejects injection-via-type (`"30%"`, floats, negatives) with 422.
- No new secrets, no new external egress (Mixpanel already integrated).
- Tenant isolation in comparison: the aggregate filters by the caller’s `organization_id` + the requested `channel_integration_id` (ownership-checked).
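The `bot_percent` rule from the input-validation bullet can be expressed as a plain-Ruby predicate. This is only an illustration of the accepted and rejected values on an already-parsed input; it does not reproduce Grape's own coercion, and `valid_bot_percent?` is a hypothetical name.

```ruby
# Plain-Ruby stand-in for the edge check: whole numbers from 0 to 100 only.
# Strings, floats, negatives, and nil are rejected (a 422 in the RFC).
def valid_bot_percent?(value)
  value.is_a?(Integer) && value.between?(0, 100)
end
```

The DB `CHECK` constraint remains the backstop for any write that bypasses the edge validation.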
Role × Endpoint Authorization Matrix
| Role | Endpoint(s) | Permitted methods | Tenant scope | UI visibility (FE) | Additional constraint | Audit trail |
|---|---|---|---|---|---|---|
| Chatbot Admin (`owner`, `admin`) | `…/traffic_split`, `…/:id`, `…/comparison` | PATCH (config), GET (view) | own org | section + comparison editable | — | PaperTrail + `config_saved` |
| SPV (`supervisor`) | `…/:id`, `…/comparison` | GET | own org | comparison visible; config read-only | no PATCH — excluded from the write endpoint's `set_role(%w[owner admin])` (REV-9) | view only |
| Human Agent | none | — | — | not rendered | — | — |
| End-customer | none | — | — | n/a | — | — |
No role from Detail 1.A is left without a row. The `supervisor` write restriction (PRD: SPV is view-only) is enforced by `set_role(%w[owner admin])` on the write endpoint (REV-9 — decided, was Open Q-5), tightening over the existing combined `owner`/`supervisor`/`admin` gate.
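The role split in the matrix reduces to two role lists. The sketch below mirrors them as data; the `permitted?` helper and the `:configure`/`:view` action names are hypothetical, and the real enforcement is the `set_role` call on each endpoint.

```ruby
WRITE_ROLES = %w[owner admin].freeze            # PATCH …/traffic_split
READ_ROLES  = %w[owner supervisor admin].freeze # GET …/:id and …/comparison

def permitted?(role, action)
  case action
  when :configure then WRITE_ROLES.include?(role)
  when :view      then READ_ROLES.include?(role)
  else false # unknown actions are denied
  end
end
```

For example, `permitted?("supervisor", :view)` is true while `permitted?("supervisor", :configure)` is false, which is the view-only SPV behavior decided in REV-9.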
Detail 3.A — Failure Mode Catalog (merged)
| Surface | FE behavior on failure | BE response on failure | Code-shape consistency |
|---|---|---|---|
| Save config — invalid `bot_percent` | inline “Enter a whole number between 0 and 100”; Save blocked | 422 field error (Grape validation) | yes |
| Save config — flag off / ineligible / role | section not shown; if forced → generic error | 403 | yes |
| Save config — persist 5xx | keep prior value; “Couldn’t save…” + Retry | 5xx via `Dry::Matcher` failure; BE emits `save_failed` (REV-6) | yes |
| Routing — config read error/ambiguous | n/a (no UI) | fail-safe to bot arm + decision_fallback log/event; chat not dropped | yes (ADR-4) |
| Routing — analytics emit fails | n/a | swallowed (best-effort); routing proceeds | yes |
| Human arm — no agent | n/a (customer sees existing offline/queue UX) | existing queue/offline; bot never rescues | yes (BTS-S03) |
| Comparison — query fails | “Couldn’t load comparison. Try again.” + Retry; no partial numbers | 5xx | yes |
| Comparison — bot arm 0 rows | bot column “No data yet” | no_data: true in payload | yes |
Detail 3.A.1 — Branch & Skip Catalog
| Branch trigger | Where checked | Downstream effect | Audit trail | User-visible? |
|---|---|---|---|---|
| non-bot path (agent/division/CRM) | scope check (ADR-9) | skip split → keep deterministic routing; variant stays NULL | none | no |
| `traffic_split_enabled = false` | `find_default_path` split step | skip roll → 100 % bot (today’s behavior); `variant` stays NULL (REV-3) | none (not a measured arm) | no (bot replies as normal) |
| `room.variant` already set | split step (ADR-3) | skip roll → re-read + apply stored variant (no re-bucketing) | none extra | no |
| config read error | rescue (ADR-4) | bot fail-safe; variant stays NULL | decision_fallback | no |
| feature flag OFF / ineligible plan | endpoint guard + FE gate | section hidden; write → 403 | — | yes (control absent) |
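The routing rows of the catalog above can be sketched as one pure function with an injected roll, which also matches the "stub rand" style of the test plan. The names (`Decision`, `decide_arm`), the evaluation order of the checks, and the use of an integer roll in `0..99` are assumptions for illustration, not the hub's actual code.

```ruby
Decision = Struct.new(:arm, :variant, :event)

# config: { traffic_split_enabled:, bot_percent: }; roll: integer in 0..99.
# Checks follow the catalog's row order (an assumption about precedence).
def decide_arm(bot_path:, stored_variant:, config:, roll:)
  # Non-bot path (agent/division/CRM): keep deterministic routing, no tag.
  return Decision.new(:deterministic, nil, nil) unless bot_path
  # Split disabled: today's behavior, 100 % bot, variant stays NULL.
  return Decision.new(:bot, nil, nil) unless config.fetch(:traffic_split_enabled)
  # Variant already stamped: apply it, never re-roll.
  return Decision.new(stored_variant.to_sym, stored_variant, nil) if stored_variant

  percent = Integer(config.fetch(:bot_percent))
  variant = roll < percent ? "bot" : "human"
  Decision.new(variant.to_sym, variant, "bot_traffic_split_assigned")
rescue StandardError
  # Config unreadable: fail safe to the bot arm and leave the room untagged.
  Decision.new(:bot, nil, "bot_traffic_split_decision_fallback")
end
```

With `bot_percent: 30`, a roll of 29 lands in the bot arm and a roll of 30 in the human arm, so `0` sends every new conversation to humans and `100` sends every one to the bot.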
Detail 3.B — Error Response Catalog (BE)
| Condition | HTTP | Body (existing error_response envelope) |
|---|---|---|
| `bot_percent` not integer / out of 0..100 | 422 | field error: `bot_percent` |
| `traffic_split_enabled` true but `bot_percent` missing | 422 | field error: `bot_percent` required when enabled |
| feature flag OFF / ineligible plan / role | 403 | forbidden |
| channel not owned by caller | 403/404 | ownership middleware |
| persist failure | 5xx | internal error |
| comparison `date_from > date_to` or range > 90 days (REV-8) | 422 | field error: `date_from`/`date_to` |
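The comparison range rule in the last row could be checked along these lines. The sketch assumes ISO 8601 date strings and counts the range as the day difference between the two dates (so exactly 90 days is accepted); the method name and error strings are hypothetical.

```ruby
require "date"

MAX_RANGE_DAYS = 90

# Returns a list of field errors; an empty list means the range is valid.
def comparison_range_errors(date_from, date_to)
  from = Date.iso8601(date_from)
  to = Date.iso8601(date_to)
  return ["date_from must not be after date_to"] if from > to
  return ["range must not exceed #{MAX_RANGE_DAYS} days"] if (to - from).to_i > MAX_RANGE_DAYS

  []
rescue ArgumentError, TypeError
  # Unparseable or missing input is reported as a field error, not a crash.
  ["date_from/date_to must be ISO 8601 dates"]
end
```

A non-empty result would map to the 422 response in the catalog; capping the range here is what bounds the aggregate scan.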
Detail 3.C — Error Message Catalog (FE)
| Condition | Message | Action |
|---|---|---|
| invalid percent | “Enter a whole number between 0 and 100” | inline; Save blocked |
| save 5xx/network | “Couldn’t save traffic split. Try again.” | toast + Retry (`save_failed` is BE-emitted on the 5xx path, REV-6) |
| save success | “Traffic split updated: 30% bot / 70% human” | toast |
| comparison load fail | “Couldn’t load comparison. Try again.” | Retry; render nothing partial |
| comparison empty range | “No conversations in this range yet” | empty state |
| bot arm no data | “No data yet” | bot column placeholder (not 0 %) |
Detail 3.D — Compliance & Data Governance
n/a — no new personal data. The variant tag is an internal routing label;
analytics events carry ids only (no message content/PII). Retention inherits the
Room (acts_as_paranoid) and the existing analytics TTL (PRD §5.1).
Detail 3.E — Accessibility
Reuse Pixel 3 components (labeled MpFormControl/MpFormLabel, focusable
toggle/input, MpBanner with text alternative). Inline validation errors are
associated to the input via MpFormErrorMessage; comparison table uses semantic
table markup. Target the existing chatbot-fe a11y baseline; confirm DS components
with the Pixel MCP for ARIA props.
4. Backwards Compatibility and Rollout Plan
Compatibility
- BE: purely additive — two new columns (defaults preserve today’s behavior: `traffic_split_enabled=false` ⇒ 100 % bot), two new routes, one extended GET entity. No change to existing assignment/queue workers. Old clients ignore the new GET fields.
- FE: the section renders only when the org flag + plan allow; otherwise the settings page is unchanged. No saved-state/cache migration.
- Cross-layer: snake_case contract is stable; the extended GET is backwards-compatible (additive fields).
Rollout Strategy
- Deploy order: Backend first (migrations + endpoints + routing flag behind the org feature gate, which is OFF), then Frontend. Rationale: the routing split and config persistence must exist and be gated before any UI can toggle them; with the feature OFF, BE deploy is a no-op for live traffic.
- Feature-flag coordination: a single org-level entitlement (`OrganizationFeature 'bot_traffic_split'`) gates both layers; an optional global `SystemPreference` rollout kill-switch is checked first (ADR-7). FE visibility and BE authorization both read the same gate, so they cannot drift into a state where the UI shows a control the BE rejects (beyond the intended 403-on-forced-call guard).
- Rollback per layer: disabling the org feature (or the global `SystemPreference`) instantly reverts to 100 % bot without a deploy (PRD §10.1). Code rollback: revert FE first, then BE; the additive columns can remain (inert when the flag is OFF).
- Stop conditions: `decision_fallback` > 1 % sustained 24h (PRD §10.1) → disable the org flag; human-arm queue p90 > 15 min during experiment hours → investigate staffing / pause experiment.
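The two numeric signals used for alerts and stop conditions (fallback rate against assignments, and drift between observed bot share and configured `bot_percent`) are simple ratios. The sketch below shows the arithmetic only; the function names are hypothetical, the counts are assumed to come from the analytics store for the relevant window, and treating a window with fallbacks but no assignments as breached is an assumption.

```ruby
# True when fallbacks exceed threshold_percent of assignments. Integer
# arithmetic avoids floating-point edge cases at exactly 1 %.
def fallback_rate_breached?(fallbacks, assignments, threshold_percent: 1)
  return fallbacks.positive? if assignments.zero?

  fallbacks * 100 > assignments * threshold_percent
end

# Absolute drift, in percentage points, between the observed bot share and
# the configured bot_percent. Returns nil when there is no traffic yet.
def fidelity_drift_pp(bot_count, total, bot_percent)
  return nil if total.zero?

  (bot_count * 100.0 / total - bot_percent).abs
end
```

For instance, 45 bot-arm conversations out of 100 against a configured `bot_percent` of 30 is a 15pp drift, above the 10pp/week alert line from PRD §10.1.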
Detail 4.A — Cross-Layer Rollout Compatibility Matrix
| Scenario | FE | BE | Works? | Mitigation |
|---|---|---|---|---|
| Pre-deploy | Old | Old | yes | baseline (100 % bot) |
| Backend first | Old | New (flag OFF) | yes | new columns default to today’s behavior; no UI yet |
| Frontend first | New | Old | yes | section gated by feature flag; with BE old the flag is absent ⇒ section hidden; do not enable the flag until BE is live |
| Both deployed | New | New | yes | target state (flag enabled per org during rollout) |
| Backend rollback | New | Old | yes | FE section hidden (flag gone); no broken calls |
| Frontend rollback | Old | New (flag OFF) | yes | routing inert while flag OFF; config simply not editable from UI |
No “no” cells. The only ordering rule: enable the org flag only after BE is deployed.
Detail 4.B — Configuration Contract
| Layer | Env var / flag | Type | Default | Required | Provisioner | Secret? |
|---|---|---|---|---|---|---|
| BE | OrganizationFeature code='bot_traffic_split' | per-org feature row | OFF (absent) | yes (to enable) | Ops/Commercial enablement | no |
| BE | SystemPreference group_code='rollout', code='bot_traffic_split' (optional kill-switch) | global flag | OFF/absent | no | Ops | no |
| BE | channel_integrations.traffic_split_enabled | boolean column | false | per-channel | Admin via UI | no |
| BE | channel_integrations.bot_percent | integer 0..100 | 100 | per-channel | Admin via UI | no |
| FE | reads org flag via checkSubscription('bot_traffic_split') | runtime | — | — | derived from BE | no |
Detail 4.C — Test Plan (commands the agent will run)
Commands sourced from the repos (BE AGENTS.md / .rspec; FE package.json).
| Layer | Command (source) | What it must prove |
|---|---|---|
| BE unit/use-case | bundle exec rspec spec/core/use_cases/system/hub/process_incoming_message_with_resolve_spec.rb (exists — extend it) | split step: bot/human arms; non-bot path skipped (ADR-9); disabled→bot with variant NULL; fail-safe→bot+fallback; no re-roll on 2nd message; race loser routes per persisted variant (ADR-3) |
| BE unit (worker reuse) | bundle exec rspec spec/app/worker/send_message_auto_assign_agent_worker_spec.rb (exists) | human-arm assignment/queue unchanged (BTS-S03) |
| BE request | bundle exec rspec spec/api/frontend_service/v1/channel_integration_spec.rb (new — create; REV-7) | PATCH persists bot_percent; 422 invalid/over-range date; 403 flag-off/ineligible/SPV-write; comparison schema + WHERE variant IS NOT NULL |
| BE full | bundle exec rspec | no regressions |
| BE static | bundle exec rubocop ; bundle exec reek (AGENTS.md §78-79) | style + smell clean |
| FE unit | pnpm test (vitest run — package.json:17) | section: invalid blocks Save; toast on success; comparison renders rows + “No data yet” |
| FE E2E | pnpm test:e2e (playwright — package.json:22; specs under tests/visual/) | configure split end-to-end; gated when flag OFF |
| FE lint | pnpm lint (package.json) | TS + prettier clean |
| Cross-layer | manual/integration: save % → send messages → observe arm distribution ≈ bot_percent + comparison reflects arms | routing fidelity (SC-1) + end-to-end contract |
Detail 4.D — Agent Execution Plan
| Order | Layer | Chunk | Files to modify/create | Commands | Acceptance criteria |
|---|---|---|---|---|---|
| 1 | BE | Migrations: channel config + room variant | db/migrate/<ts>_add_traffic_split_to_channel_integrations.rb, db/migrate/<ts>_add_variant_to_rooms.rb, db/schema.rb | bundle exec rails db:migrate | schema has traffic_split_enabled,bot_percent (+CHECK), rooms.variant (+indexes); rails db:migrate:status green |
| 2 | BE | Split decision (apply_arm) + variant stamp + events in hub | app/core/use_cases/system/hub/process_incoming_message_with_resolve.rb, app/models/room.rb (variant constant/validation) | bundle exec rspec spec/core/use_cases/system/hub/process_incoming_message_with_resolve_spec.rb (extend) | stub rand: <pct→bot/intent kept; ≥pct→human (is_auto_assign_agent true & intent/agent/division/crm cleared); non-bot path skipped (ADR-9), variant NULL; disabled→bot, variant NULL (REV-3); error→bot+decision_fallback, variant NULL; 2nd message no re-roll; race loser routes per persisted variant (ADR-3); bot_traffic_split_assigned emitted only when tagged |
| 3 | BE | Config write endpoint + use case + entity + GET extension | app/api/frontend_service/v1/channel_integration.rb, app/core/use_cases/api/frontend_service/v1/channel_integration/update_traffic_split.rb (new), response entities, docs/openapi/openapi.yaml | bundle exec rspec spec/api/frontend_service/v1/channel_integration_spec.rb (new — create) | 200 persists; disable preserves bot_percent (REV-10); 422 invalid; 403 flag-off/ineligible-plan/SPV-write (REV-9/REV-11); GET returns the 2 fields; config_saved + save_failed-on-5xx emitted |
| 4 | BE | Comparison read endpoint + aggregate repo | channel_integration.rb (+route), comparison use case + repository, entity, openapi | bundle exec rspec spec/api/frontend_service/v1/channel_integration_spec.rb (new) | aggregates WHERE variant IS NOT NULL (REV-3); typed schema (REV-4) incl. resolution_parity, *_basis; range>90d→422 (REV-8); bot 0 rows→no_data:true; CSAT null tolerated; 403 gated |
| 5 | BE | Static analysis | — | bundle exec rubocop ; bundle exec reek ; bundle exec rspec | clean; full suite green |
| 6 | FE | API client: endpoints + service methods | common/services/main/endpoint.ts, common/services/main/v1/channel-integration.ts | pnpm lint | updateTrafficSplit (PATCH …/traffic_split) + getTrafficSplitComparison exist & typed |
| 7 | FE | Traffic Split section + store action | modules/settings/.../TrafficSplitSection.vue (new), Pinia action, mount in pages/chat/settings/index.vue (gated by checkSubscription) | pnpm test | toggle/%/preview/Save; Vuelidate integer+between(0,100) blocks invalid; success toast; hidden when flag OFF |
| 8 | FE | Comparison view | modules/settings/.../BotHumanComparison.vue (new), service wire-up, mount | pnpm test | renders two-arm pixel-table; skeleton/empty/error+Retry; “No data yet” for bot 0 rows |
| 9 | FE | E2E + lint | tests/visual/** spec | pnpm test:e2e ; pnpm lint | configure-split journey passes; gated path passes |
Order rule: BE chunks 1→5 land (flag OFF, no live impact) before FE chunks 6→9. Each chunk’s ACs must pass before the next opens.
Detail 4.E — Verification & Rollback Recipe
- Pre-merge verification (in order):
  - BE: 1) `bundle exec rails db:migrate` (+ `db:rollback` round-trip to prove reversibility); 2) `bundle exec rspec`; 3) `bundle exec rubocop`; `bundle exec reek`.
  - FE: 1) `pnpm lint`; 2) `pnpm test`; 3) `pnpm test:e2e`.
- Post-deploy verification signals:
  - `bot_traffic_split_assigned` volume > 0 on a pilot channel after enabling the flag; observed bot share ≈ configured `bot_percent` (SC-1).
  - `bot_traffic_split_decision_fallback` ≈ 0 (alert if > 1 %/1h).
  - `bot_traffic_split_config_saved` emitted on a test save; GET returns the saved value.
  - human-arm queue wait p90 within bounds during experiment hours.
- Rollback recipe (deploy-order-aware):
  - Immediate: disable `OrganizationFeature 'bot_traffic_split'` for affected orgs (or flip the global `SystemPreference` kill-switch) → instant revert to 100 % bot, no deploy. (Already-decided rooms keep their `variant`; harmless.)
  - If code-level revert needed: roll back FE (section disappears), then BE routing change.
  - The additive columns/indexes may remain (inert when the flag is OFF); drop only if fully abandoning, via a reverse migration.
Detail 4.F — Resource & Cost Notes (advisory)
Negligible: two columns + two indexes on existing tables; one extra UPDATE per
new conversation; async Mixpanel events on the existing :event_tracker queue.
Comparison endpoint is a bounded indexed aggregate. No new infrastructure.
5. Concern, Questions, or Known Limitations
| # | Type | Question / concern | Owner | Blocking? |
|---|---|---|---|---|
| Q-1 | Decision to confirm | Config scope — ADR-1 stores traffic_split_enabled/bot_percent on channel_integrations (per channel), reconciling the PRD’s “Path config” wording. Confirm per-channel (not per-paths) is intended. | PM + Eng | yes — sets the migration target |
| Q-2 | Dependency (Data) | Canonical “resolved” definition segmentable by variant for the ⭐ resolution-parity KPI (PRD §15 Q2). ADR-8 ships a resolved_at/is_closed proxy until confirmed. | PM + Data | no for routing; yes for KPI accuracy |
| Q-3 | Dependency / Risk | CSAT source joinable to variant; not all channels collect CSAT (PRD §15 Q3, §6.1 D3). Comparison degrades to “CSAT not available”. | PM + Data | no — secondary metric |
| Q-4 | Design | Figma frames for Screen A & B (PRD §6.1 D1–D4): inline section vs own tab; starting-% guard-rail copy; parity-formula display. | Designer | yes for FE pixel-polish; structure can start |
| Q-5 | ✅ Resolved (REV-9) | SPV write restriction — decided: write endpoint uses set_role(%w[owner admin]); read/comparison keep owner/supervisor/admin. Diverges deliberately from channel_integration.rb:89. (Confirm with PM is a courtesy, not a blocker.) | Eng + PM | no (decided) |
| Q-6 | Limitation | No i18n in chatbot-fe (verified) — new strings are hardcoded per current convention. Acceptable for Phase 1? | FE lead | no |
| Q-7 | To verify | PaperTrail on ChannelIntegration — confirm the model is audited so config changes are tracked (the config-save audit relies on it + the Mixpanel event). | Eng | no — event covers analytics regardless |
| Q-8 | Decision (PRD §15 Q5) | Starting-% guard-rail — advisory helper text only (not an enforced cap). Confirm. | PM | no |
| Q-9 | Decision (PRD §15 Q1) | Plan eligibility — which plans get Traffic Split (proposed Professional + Enterprise w/ chatbot). The enforcement path is fixed (REV-11: UseMekariBilling + subscription check); only the eligible-plan list is open. | PM + Commercial | no — list only; path decided |
| L-1 | Known limitation | Handover derivation (variant='bot' AND assigned_at IS NOT NULL proxy vs the dedicated bot_arm_handover_to_human event) — confirm the canonical signal with Data (ADR-8); response self-labels the basis. | Eng + Data | no |
| REV-12 | Citation drift (R2 review) | ADR-5 branch line numbers stale. ADR-5 cites the send_message_assign_agent branch order as agent_id (:539) and intent_id (:548); current HEAD (fa6dd8b79) has elsif agent_id.present? at :538 and elsif intent_id.present? at :547. The other anchors ADR-5 relies on (crm_intent_id :536, division_id :543, is_auto_assign_agent :545, plus :84/:501-502) are exact. Cosmetic — refresh ADR-5 to :538/:547. | Eng | no |
Review reconciliation (R1 — see `-review.md`): findings REV-1, REV-2, REV-3, REV-4, REV-5, REV-6, REV-7, REV-8, REV-9, REV-10, REV-11 are all addressed in this revision (ADR-3/4/5/8/9, §2.1–§2.4, §3, §4.C/§4.D). Only Q-1 (config scope, PM confirm) remains a true pre-build blocker; Q-2/Q-3/Q-4/Q-7 and L-1 refine the comparison view but do not block routing/config.
6. Comment logs
| Date | Comment(s) From | Action Item(s) |
|---|---|---|
| 2026-06-20 | RFC author (drafted from PRD v1.2 via rfc-starter) | Initial draft (R0) |
| 2026-06-20 | rfc-reviewer cycle R1 (-review.md, score 7.5 → PROCEED) | 11 findings raised (REV-1..11) |
| 2026-06-20 | RFC author (R1 fixes) | Addressed REV-1..11: ADR-3 route-by-persisted-variant; ADR-9 split scope; variant tagged only under active split; pinned comparison schema; BE-emitted save_failed; handover emit point; SPV write owner/admin; date-range cap; disable-preserves-bot_percent; plan-gate path. Only Q-1 remains a pre-build blocker. |
7. Ready for agent execution
yes — for the routing + configuration scope (BTS-S01, S02, S03, and the NEG guard rails).
yes (with labelled proxies) — for the comparison view (BTS-S04): after R1, the
endpoint contract is fully typed (§2.4) and the v1 resolution/handover metrics are
self-labelled proxies (resolution_basis/handover_basis), so an agent can build
it now; KPI accuracy firms up once the Data-squad confirms Q-2/Q-3/L-1, and
visual polish awaits the Q-4 Figma frames. The one true pre-build blocker is Q-1
(config-scope PM confirmation, the migration target).
Post-review (R1): the `rfc-reviewer` pass scored the R0 draft 7.5 / Strong / PROCEED with 11 findings; all 11 are addressed in this revision (see Comment log + the review file ledger). The two score-capping gaps (ACV/DIC from REV-1/REV-4) are closed: routing now routes-by-persisted-variant and the comparison contract is pinned.
Readiness-gate status:
- §1 Design References (FE): surfaces listed; both frames `n/a — design pending` (Open Q-4) — structural build allowed, pixel-polish gated. ✅ (with noted gap)
- §1 PRD-to-Schema Derivation (BE): every entity/attribute/rule mapped to table.column + endpoint/event + enforcement. ✅
- Detail 1.C Per-Story Change Map: all 7 stories, one row each, layer scope + FE/BE + verifiable AC. ✅
- Repo Reading Guide (2.0): anchors for both layers; contracts classified reuse/extend/new. ✅
- Source Verification: every anchor/pattern/contract carries concrete `file:line` evidence; unverifiable Data items moved to Open Questions (not invented). ✅
- Design ↔ Code Mapping (FE): frames mapped to new SFCs + backing endpoints; tokens via Pixel MCP at build. ✅ (design-pending noted)
- Asset Inventory: no new bespoke assets; new copy strings flagged (no i18n, Q-6). ✅
- Mermaid diagrams: repo map, end-to-end component, ER, state, branch/skip, sequence (happy + failure, both flows). ✅
- DDL: complete with per-status lifecycle for `variant`; every row traces to a PRD-to-Schema row. ✅
- APIs: outbound (1 extended, 2 new-with-justification) + inbound (n/a); each tagged. ✅
- Cross-Layer Contract Verification: all rows `yes`. ✅
- End-to-End Data Flow: traced for save, route, view. ✅
- UI State Matrix / Failure Catalog / Error catalogs: complete and aligned. ✅
- Cross-Layer Rollout Matrix: complete; deploy order = BE-first, enable-flag-after-BE. ✅
- Configuration Contract: per-layer; single org flag + optional kill-switch. ✅
- Agent Execution Plan: 9 ordered chunks, each files + commands + assertable AC. ✅
- Verification & Rollback Recipe: runnable per-layer commands; named signals; flag-flip rollback. ✅
Optional next step: hand to `rfc-reviewer` for a second-pass score. Confirm Open Q-1/Q-5/Q-9 with PM before starting Detail 4.D chunk 3 (config endpoint).