
RFC: Bot-vs-Human Traffic Split — Phase 1: Per-Conversation Split

Document Conventions (do not remove)

This RFC follows the Qontak RFC Template format for governance — the Metadata table, Confluence sections 1–6, and Comment logs are mandatory; sections that do not apply are marked N/A — reason rather than deleted.

It is also agent-execution-ready: §1 Design References (FE half) + §1 PRD-to-Schema Derivation (BE half), §2 Repo Reading Guide (Detail 2.0) for both layers, mermaid diagrams, the §2.G Cross-Layer Contract Verification, and §4 Agent Execution Plan + Verification & Rollback Recipe are complete before §7 Ready for agent execution: yes.

Delivery & project management live elsewhere. This RFC is the technical artifact only — no staffing, effort estimates, timeline, or rollout schedule. Those live in the initiative's delivery/ folder. Until handoff, the Metadata Delivery row reads not yet handed to delivery.

The YAML frontmatter at the very top is the machine-readable index agents parse. The metadata table below is the human-readable governance record. Both must agree on every shared field.

Metadata

| Field | Value | Notes |
| --- | --- | --- |
| Status | IDEA | Human label IDEA; YAML status: carries the remapped linter enum draft |
| DRI | Dimas Fauzi Hidayat | Single accountable owner of this RFC. Per-task staffing lives in delivery/, not here. |
| Team | chatbot | Advisory squad slug carried from the source PRD / initiative README |
| Author(s) | Chatbot Squad (BE + FE) | Primary authors |
| Reviewers | Chatbot Squad (BE + FE); Data Squad | Data Squad owns the analytics ingest + comparison metrics |
| Approver(s) | TBD — Chatbot tech lead; TBD — infosec approver | infosec approver required before AGREED |
| Submitted Date | 2026-06-20 | Date RFC opened for discussion |
| Last Updated | 2026-06-20 | Bump on every material edit |
| Target Release | 2026-Q3 | Carried from source PRD |
| Target Quarter | 2026-Q3 | Advisory; carried from source PRD / initiative README |
| Delivery | not yet handed to delivery | Pointer to delivery/ artifacts once handed off |
| Related | ../prds/phase-1-per-conversation-split.md, ../README.md | Source PRD + initiative README |
| Discussion | #chatbot-alerts (Slack) — thread TBD | |

Type: full-stack
Frontend sub-type: new-feature
Backend sub-type: new-feature

Sections at a Glance

  1. Overview (incl. §1 Design References — FE half, and §1 PRD-to-Schema Derivation — BE half)
  2. Technical Design (Repo Reading Guide for both layers → end-to-end mermaid → DDL → APIs → cross-layer contract verification)
  3. High-Availability & Security
  4. Backwards Compatibility and Rollout Plan (incl. cross-layer rollout matrix, §4 Agent Execution Plan, Verification & Rollback Recipe)
  5. Concern, Questions, or Known Limitations
  6. Comment logs
  7. Ready for agent execution

1. Overview

A Qontak chatbot is all-or-nothing today: once a channel integration has an enabled, matching Path, every new incoming conversation is routed to the bot (is_auto_assign_agent is false/nil on the default path, so the intent_id branch runs in UseCases::System::Hub::ProcessIncomingMessageWithResolve#send_message_assign_agent). A Chatbot Admin who is not yet confident in the bot has no safe way to expose it to a controlled slice of real traffic while humans cover the rest.

Phase 1 adds a per-channel traffic split: the Admin sets an integer bot_percent (0–100) on a channel; for each new conversation the router rolls rand(100) and routes the conversation to the bot arm when the roll is below bot_percent, and to the human arm otherwise (reusing the existing SendMessageAutoAssignAgentWorker path). Each conversation (Room) is stamped once with variant = bot | human for an apples-to-apples comparison. The split decision is an in-process check inside find_default_path that reads config already loaded on the channel record, so it adds no new network/DB round-trip on the hot path.
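The decision rule above can be sketched as a small pure function. This is a minimal illustration, not the code in find_default_path: the helper name `decide_arm`, the injectable `rng` argument, and the `FixedRoll` stand-in are all hypothetical. Because rand(100) yields an integer in 0..99, bot_percent = 0 never selects the bot arm and bot_percent = 100 always does.

```ruby
# Hypothetical helper; the real split step would live inside find_default_path.
# rng is injectable so specs can pin the roll instead of stubbing Kernel#rand.
def decide_arm(bot_percent, rng: Random.new)
  roll = rng.rand(100) # integer in 0..99
  roll < bot_percent ? 'bot' : 'human'
end

# Fixed-roll stand-in for tests (assumption: anything responding to #rand(max)).
FixedRoll = Struct.new(:value) do
  def rand(_max)
    value
  end
end

puts decide_arm(30, rng: FixedRoll.new(29)) # roll 29 is below 30: bot arm
puts decide_arm(30, rng: FixedRoll.new(30)) # roll 30 is not below 30: human arm
```

Injecting the random source keeps the boundary cases (roll equal to bot_percent, bot_percent of 0 or 100) testable without touching global state.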

Success Criteria

  • SC-1 (routing fidelity): Over an active experiment, |observed bot share − configured bot_percent| ≤ 5 percentage points (PRD §11 Routing fidelity).
  • SC-2 (decide-once): A conversation's variant is decided exactly once and never changes for the life of that conversation, even under concurrent first-messages (PRD BTS-S02/AC-3, BTS-S02-NEG/NEG-1, §15 Q4).
  • SC-3 (fail-safe): Any config-read error routes to the bot arm (current default) and logs bot_traffic_split_decision_fallback; no chat is dropped (PRD BTS-S02/ERR-1). Sustained decision_fallback < 1 % (PRD §10).
  • SC-4 (clean human arm): A variant = human chat with no agent available enters the existing queue/offline behavior and is never re-tagged or rescued by the bot (PRD BTS-S03).
  • SC-5 (self-serve config): A Chatbot Admin / Admin can enable the split and persist a validated bot_percent per channel; SPV + Admin can view the comparison; Agents and end-customers cannot (PRD BTS-S01, role model).
  • SC-6 (gated): The control is invisible and the save endpoint returns 403 when the org feature flag is OFF or the plan is ineligible (PRD BTS-S01-NEG, BTS-S01-NEG2).
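SC-1 is a plain arithmetic check, so it can be illustrated directly. The sketch below is an assumption-laden example rather than the Data squad's query: the helper names `bot_share_pp` and `within_fidelity?` are hypothetical, and the input is simply a list of variant strings standing in for tagged Rooms.

```ruby
# Hypothetical check for SC-1: observed bot share vs configured bot_percent.
FIDELITY_TOLERANCE_PP = 5.0

# Observed bot share in percentage points, or nil when nothing is tagged yet.
def bot_share_pp(variants)
  return nil if variants.empty?

  100.0 * variants.count('bot') / variants.size
end

def within_fidelity?(variants, bot_percent, tolerance: FIDELITY_TOLERANCE_PP)
  share = bot_share_pp(variants)
  !share.nil? && (share - bot_percent).abs <= tolerance
end

# Simulated experiment: 10_000 conversations at bot_percent = 30, seeded for repeatability.
rng = Random.new(42)
simulated = Array.new(10_000) { rng.rand(100) < 30 ? 'bot' : 'human' }
puts within_fidelity?(simulated, 30)
```

With few conversations the observed share can drift outside the 5-point band by chance alone, which is why the criterion is stated over an active experiment rather than per hour or per day.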

Out of Scope

Carried verbatim from PRD §4 (Non-Goals). This RFC implements none of the following:

  • Sticky per-customer bucketing / deterministic identity hashing (Phase 2).
  • Mid-conversation re-bucketing (the arm is immutable post-decision).
  • Bot-A-vs-bot-B testing.
  • Automatic winner selection / auto-ramp of bot_percent.
  • Any change to the no-agent-available behavior (no "bot rescues the queue").
  • A statistical-significance engine (raw comparison only).
  • Per-segment / per-topic / attribute-conditioned targeting (flat random %).
| Document | Path / link |
| --- | --- |
| Source PRD (Phase 1) | ../prds/phase-1-per-conversation-split.md |
| Initiative README | ../README.md |
| Backend repo | chatbot (Rails 7.1, PostgreSQL, Grape, Sidekiq, Karafka) |
| Frontend repo | chatbot-fe (Nuxt 4, Vue 3, TypeScript, Pinia, @mekari/pixel3) |

Assumptions

  • A-1: The split unit is a channel integration (channel_integrations row). The PRD’s phrase “Path config” is reconciled in ADR-1 — config lives on the channel record, not on individual paths rows. Confirm with PM (§5 Open Q-1).
  • A-2: bot_percent semantics: rand(100) < bot_percent → bot; otherwise human (equivalently rand(100) >= bot_percent → human, PRD §5 Determinism). bot_percent = 100 ⇒ today’s behavior; 0 ⇒ all human.
  • A-3: The conversation unit is a Room (acts_as_paranoid); a new variant column on rooms is the per-conversation tag.
  • A-4: The org-level enablement flag bot_traffic_split is modeled as an OrganizationFeature (per-org), optionally fronted by a SystemPreference group_code: 'rollout' global kill-switch (ADR-7).
  • A-5: The comparison view (BTS-S04, Should Have) reads aggregates from rooms (variant, resolved_at, is_closed, assigned_at). The canonical “resolved” definition (PRD §15 Q2), handover derivation, and CSAT source are Data-squad dependencies — see §5 Open Q-2/Q-3.

Dependencies

| Dependency | Layer / owner | Deliverable | Blocking? |
| --- | --- | --- | --- |
| ProcessIncomingMessageWithResolve#find_default_path hook point | BE / Chatbot | Already exists (process_incoming_message_with_resolve.rb:488) | NO — present |
| SendMessageAutoAssignAgentWorker + existing queue/offline path | BE / Chatbot | Reused as-is for the human arm | NO — present |
| channel_integrations PATCH endpoint + Grape stack | BE / Chatbot | New traffic_split sub-action endpoint (ADR-6) | YES — config save |
| Chatbot settings screen (pages/chat/settings/index.vue) | FE / Chatbot | New Traffic Split section + comparison view | YES — config UI in scope |
| Canonical “resolved” metric segmentable by variant | Data / Chatbot | Definition + query for the ⭐ resolution-parity KPI | YES for BTS-S04 — not for routing |
| Product analytics ingest of bot_traffic_split_* events | Data | Mixpanel ingest + comparison dashboard | NO for routing; YES for dashboard parity |
| CSAT data joinable to variant | Data / Chatbot | CSAT source (not all channels collect it) | NO — secondary metric, degrade gracefully |

Design References (frontend half — required)

| PRD-named surface | Figma / design link | Frame name | Design system version | Design QA contact | Notes |
| --- | --- | --- | --- | --- | --- |
| Screen A — Traffic Split configuration (PRD §6, §6.1) | n/a — design pending | Draft wireframe in PRD §6.1 (Screen A) | @mekari/pixel3@^1.0.12 (verified package.json:74) | TBD — Chatbot designer | Low-fi wireframe only; designer owns Figma frames (PRD §6.1 D1–D4). See §5 Open Q-4. |
| Screen B — Bot vs Human comparison (PRD §6.1) | n/a — design pending | Draft wireframe in PRD §6.1 (Screen B) | @mekari/pixel3@^1.0.12 | TBD — Chatbot designer | Comparison table; CSAT-not-collected treatment open (PRD §6.1 D3). |

No production Figma frame exists yet for either surface — both are n/a — design pending and tracked in §5 Open Q-4. Frontend chunks that depend on pixel-exact frames (visual polish, empty/error illustration) must not start against imagined designs; the structural build (controls, validation, states) proceeds from the PRD §6.1 wireframes + verified Pixel 3 components.

PRD-to-Schema Derivation (backend half — required)

| PRD-described entity / attribute / rule | Persisted as (table.column) | Exposed via (endpoint / event) | Enforced where | Source |
| --- | --- | --- | --- | --- |
| Split enabled per channel | channel_integrations.traffic_split_enabled boolean default false | PATCH /v1/channel_integrations/:id/traffic_split; GET /v1/channel_integrations/:id | ChannelIntegration::UpdateTrafficSplit use case + DB default | PRD §5, §7 #1 |
| Bot percentage 0–100 | channel_integrations.bot_percent integer default 100 | same as above | Grape param validation (values: 0..100, Integer) + use case + DB CHECK | PRD §5, §7 #1, BTS-S01 |
| Per-conversation arm tag | rooms.variant string (bot/human, null until decided) | bot_traffic_split_assigned event; comparison endpoint | Atomic decide-once update in find_default_path | PRD §5.1, §7 #2, BTS-S02 |
| Arm decision rule | not persisted (in-process rand(100)) | bot_traffic_split_assigned | find_default_path split step (new) | PRD §5 Determinism, BTS-S02/AC-1,2 |
| Decide-once / no re-roll | guard on rooms.variant IS NULL | | UPDATE … WHERE id = ? AND variant IS NULL | PRD §15 Q4, BTS-S02/AC-3 |
| Fail-safe to bot | — (defaults to existing intent_id flow) | bot_traffic_split_decision_fallback | rescue around split step | PRD §7 #2 failure, BTS-S02/ERR-1 |
| Human arm queue/offline reuse | reuses rooms.is_assign_agent_offline, assignment cols | SendMessageAutoAssignAgentWorker | existing worker (unchanged) | PRD §7 #3, BTS-S03 |
| Org-level eligibility | organization_features (feature.code = 'bot_traffic_split', enabled) | gate on endpoint + FE visibility | use case guard + FE checkSubscription | PRD §5 flag, BTS-S01-NEG/NEG2 |
| Config-save audit | bot_traffic_split_config_saved event (+ PaperTrail on channel_integrations) | Mixpanel | SendMixpanelEventWorker | PRD §10 |

Every §2.3 DDL row and every §2.4 endpoint traces back to a row here or a Design Reference frame above. Missing trace = blocker (none open for routing).
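The bot_percent row names three enforcement layers (Grape param validation, use case, DB CHECK) that all apply the same rule: an integer in 0..100. A minimal plain-Ruby sketch of that rule follows. The function name `validate_traffic_split` and the error strings are hypothetical, and unlike a typed Grape param this sketch performs no coercion, so the string "30" is rejected.

```ruby
# Hypothetical validator mirroring the rule shared by the three enforcement layers.
# Not the project's Grape params block or use case.
BOT_PERCENT_RANGE = (0..100).freeze

def validate_traffic_split(params)
  errors = []
  enabled = params['traffic_split_enabled']
  percent = params['bot_percent']

  errors << 'traffic_split_enabled must be true or false' unless [true, false].include?(enabled)

  if !percent.is_a?(Integer)
    errors << 'bot_percent must be an integer'
  elsif !BOT_PERCENT_RANGE.cover?(percent)
    errors << 'bot_percent must be between 0 and 100'
  end

  errors
end

p validate_traffic_split('traffic_split_enabled' => true, 'bot_percent' => 30)
p validate_traffic_split('traffic_split_enabled' => true, 'bot_percent' => 101)
```

Keeping the rule identical at every layer means a request that bypasses the UI still cannot persist an out-of-range value.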

Detail 1.A — PRD Traceability (cross-layer)

Cite the PRD’s composite AC ids (<STORY-ID>/AC-n).

Forward (PRD AC → RFC):

| PRD composite AC id | FE section / component | BE section / endpoint |
| --- | --- | --- |
| BTS-S01/AC-1,/AC-2,/AC-3 | TrafficSplitSection.vue (toggle + % input + preview + Save) | PATCH /v1/channel_integrations/:id/traffic_split (§2.4) |
| BTS-S01/ERR-1 | Vuelidate integer + between(0,100) inline error | param validation values: 0..100 (§2.4) |
| BTS-S01/ERR-2 | toast + Retry; emit bot_traffic_split_save_failed | 5xx surfaced via Dry::Matcher failure (§2.4) |
| BTS-S01 permission / BTS-S01-NEG,-NEG2 | section hidden via checkSubscription / role getter | set_role + feature-flag guard → 403 (§3 AuthZ) |
| BTS-S02/AC-1,/AC-2,/AC-4 | n/a — backend routing | find_default_path split step (§2.1, §2.2) |
| BTS-S02/AC-3, BTS-S02-NEG/NEG-1 | n/a | decide-once guard on rooms.variant (§2.1, §2.E) |
| BTS-S02/ERR-1 | n/a | rescue → bot arm + decision_fallback (§3.A) |
| BTS-S03/AC-1,/AC-2,/ERR-1 | n/a | reuse SendMessageAutoAssignAgentWorker (§2.F) |
| BTS-S04/AC-1,/AC-2,/ERR-1 | BotHumanComparison.vue (pixel-table + states) | GET /v1/channel_integrations/:id/traffic_split/comparison (§2.4) |

Reverse (RFC → PRD AC):

| New FE component / BE endpoint / dependency | PRD composite AC id it serves |
| --- | --- |
| TrafficSplitSection.vue | BTS-S01/AC-1..3, ERR-1..2 |
| BotHumanComparison.vue | BTS-S04/AC-1..2, ERR-1 |
| PATCH …/traffic_split | BTS-S01/AC-1..3, BTS-S01-NEG/NEG-1, BTS-S01-NEG2/NEG-1..2 |
| find_default_path split step + rooms.variant | BTS-S02/AC-1..4, BTS-S02-NEG/NEG-1 |
| rescue → fail-safe | BTS-S02/ERR-1 |
| SendMessageAutoAssignAgentWorker (reused) | BTS-S03/AC-1..2, ERR-1 |
| GET …/comparison | BTS-S04/AC-1..2, ERR-1 |

UI / Consumer Surface Coverage

| PRD-named surface | Consumer | Required reads (BE) | Required writes (BE) | FE component | Status surface |
| --- | --- | --- | --- | --- | --- |
| Traffic Split configuration (PRD §6) | web | GET /v1/channel_integrations/:id (traffic_split_enabled,bot_percent) | PATCH …/traffic_split | TrafficSplitSection.vue | toast + persisted traffic_split_enabled/bot_percent |
| Bot vs Human comparison (PRD §6.1 B) | web | GET …/traffic_split/comparison | n/a | BotHumanComparison.vue | per-arm rows + Updated <ts> + empty/error states |
| Incoming-chat routing | system (no UI) | channel config (already loaded) | rooms.variant (atomic) | n/a — backend | variant on Room; bot_traffic_split_assigned |

Role Coverage

| PRD role | Authorization mechanism | Endpoints permitted (BE) | UI surface visibility (FE) | Cross-tenant? | Audit trail |
| --- | --- | --- | --- | --- | --- |
| Chatbot Admin / Admin (owner,admin) | write: set_role(%w[owner admin]); read: set_role(%w[owner supervisor admin]) + Middlewares::Ownership + feature flag | PATCH …/traffic_split (configure), GET … (view) | section + comparison visible & editable | no — own org only | PaperTrail + bot_traffic_split_config_saved |
| SPV (supervisor) | read set_role(%w[owner supervisor admin]); excluded from the write endpoint's set_role(%w[owner admin]) (REV-9) | GET …/comparison, GET … (view) | comparison visible; config read-only | no | view only |
| Human Agent | not in role set | none | section + comparison not rendered | n/a | n/a |
| End-customer | unauthenticated to admin API | none | n/a | n/a | n/a |

Cross-layer note: the existing channel endpoints gate owner/supervisor/admin together (channel_integration.rb:89). The PRD splits configure (Admin) from view (Admin + SPV). Decided (REV-9): the configure endpoint uses set_role(%w[owner admin]) (SPV excluded); read/comparison keep owner/supervisor/admin — see ADR-6 and §3 Security.
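The configure/view split decided in REV-9 reduces to two role sets plus the org feature gate. The sketch below is a simplified model under stated assumptions: `authorize_traffic_split` is a hypothetical function, it returns bare status codes rather than going through set_role or Middlewares::Ownership, and it assumes the feature flag gates both read and write.

```ruby
# Role sets taken from the Role Coverage table; everything else is illustrative.
WRITE_ROLES = %w[owner admin].freeze
READ_ROLES  = %w[owner supervisor admin].freeze

# Returns an HTTP-like status for a hypothetical guard.
# Assumption: the org feature flag is checked before the role, for both actions.
def authorize_traffic_split(action:, role:, feature_enabled:)
  return 403 unless feature_enabled

  allowed = action == :configure ? WRITE_ROLES : READ_ROLES
  allowed.include?(role) ? 200 : 403
end

puts authorize_traffic_split(action: :configure, role: 'supervisor', feature_enabled: true)
puts authorize_traffic_split(action: :view, role: 'supervisor', feature_enabled: true)
```

A supervisor is refused on configure but allowed on view, which is the only behavioral difference from the existing channel endpoints that gate all three roles together.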

PRD Section Coverage

| PRD § | Title | Where covered |
| --- | --- | --- |
| HEADER / 2 | One-liner + Problem | §1 Overview |
| 3 | Target Users + Persona | §1 (roles); Detail 1.A Role Coverage |
| 4 | Non-Goals | §1 Out of Scope |
| 5 | Constraints | §1 Assumptions; §2.3 DDL; §3 Performance/Security; ADR-1..9 |
| 5.1 | Data Lifecycle | §2.3 Per-status/retention; ADR-3 (acts_as_paranoid) |
| 6 / 6.1 | New Features + Design Draft | §1 Design References; §2.A UI Contract; §2.C State Matrix |
| 7 | API & Webhook Behavior | §2.4 APIs; §2.2 Sequence; §3.A Failure Catalog |
| 8.1 | System Flow | §2.1 Architecture + Branch/skip; §2.2 Sequence |
| 8.2 | User Stories + ACs | Detail 1.A, Detail 1.C |
| 9 | Rollout | §4 Rollout Strategy (technical mechanics only) |
| 10 / 10.1 | Observability + cadence | §3 Monitoring & Alerting; §4.E signals |
| 11 | Success Metrics | §1 Success Criteria; §2.4 comparison endpoint |
| 12 | Launch Plan & Stage Gates | n/a — delivery/ (TPM-owned, not in RFC) |
| 13 | Dependencies | §1 Dependencies; §2.F.1 Responsibility Boundary |
| 14 | Key Decisions + Alternatives | Detail 1.B + ADR-1..9 |
| 15 | Open Questions | §5 Concerns/Questions |

Detail 1.B — Decisions Closed (cross-layer)

Full ADR blocks (context / options / decision / consequences / reversibility) are in §2.1a Architecture Decision Records. This is the index.

| Decision | Chosen option | Alternatives rejected | Why rejected | Layer |
| --- | --- | --- | --- | --- |
| ADR-1 Config storage | New typed columns on channel_integrations | (a) columns on paths; (b) channel_integrations.settings JSON | (a) many paths per channel → fragmented %; (b) JSON not indexable/typed, routing flags here are columns (is_auto_assign_agent on paths) | BE |
| ADR-2 Bucketing | In-process rand(100) < bot_percent, not persisted | persisted roll; weighted table | no value persisting an ephemeral roll; adds writes on hot path | BE |
| ADR-3 Decide-once + route-by-persisted-variant | Atomic UPDATE … WHERE variant IS NULL, then re-read + apply_arm; tag only under an active split | re-roll every message; advisory lock; route per own roll | re-roll breaks comparison; per-own-roll routing diverges under race (REV-1); lock heavier than a conditional update | BE |
| ADR-4 Fail-safe | Rescue → bot, variant stays NULL + log | fail to human; raise; tag as bot | dropping a chat is worse than today; tagging a fail-safe as a bot win pollutes the KPI (REV-3) | BE |
| ADR-5 Human-arm hook | Set is_auto_assign_agent=true, clear intent_id/agent_id/division_id/crm_intent_id | new worker; new branch | reuse existing is_auto_assign_agent branch (:545) keeps queue/offline behavior identical | BE |
| ADR-6 Config API | New dedicated PATCH …/traffic_split sub-action | extend PATCH :id (full-payload) | existing patch ':id' requires name,enabled,… (full update); partial toggle deserves its own action (mirrors patch ':id/publish') | BE+FE |
| ADR-7 Org gate | OrganizationFeature bot_traffic_split + optional SystemPreference rollout | hardcoded org list; ENV flag | matches existing per-org feature + global rollout patterns | BE+FE |
| ADR-8 Comparison source + pinned schema | Aggregate rooms WHERE variant IS NOT NULL; typed v1 schema with self-labelled proxies | full warehouse query; new events table; untyped schema | rooms already hold variant/resolved_at/assigned_at; warehouse is a Data dependency (Open Q-2/3); typed contract unblocks the agent (REV-4) | BE+Data |
| ADR-9 Split scope | Apply split only to plain bot-intent paths | split every path; treat agent/CRM as human | agent/division/CRM paths route deterministically (:84,:501-502); splitting them changes configured behavior (REV-2) | BE |
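ADR-4, ADR-5 and ADR-9 together describe how routing state changes once an arm is known. The following is a rough sketch of those three rules on a plain struct, not the use case itself: `RoutingState`, `split_eligible?` and `route_with_failsafe` are hypothetical names, and treating a path that already has is_auto_assign_agent set as ineligible is an assumption of this example. `apply_arm` borrows the name used in ADR-3, but its body here is illustrative.

```ruby
# Hypothetical routing state; attribute names follow the columns cited in ADR-5 and ADR-9.
RoutingState = Struct.new(:intent_id, :agent_id, :division_id, :crm_intent_id,
                          :is_auto_assign_agent, keyword_init: true)

# ADR-9: only a plain bot-intent path is eligible for the split.
# Assumption: a path that already auto-assigns to agents is not a bot path.
def split_eligible?(state)
  !state.intent_id.nil? &&
    state.agent_id.nil? && state.division_id.nil? && state.crm_intent_id.nil? &&
    !state.is_auto_assign_agent
end

# ADR-5: the human arm reuses the existing is_auto_assign_agent branch.
def apply_arm(state, variant)
  return state unless variant == 'human'

  state.is_auto_assign_agent = true
  state.intent_id = state.agent_id = state.division_id = state.crm_intent_id = nil
  state
end

# ADR-4: any error while deciding leaves the state untouched (bot flow, no variant written).
def route_with_failsafe(state, log: [])
  apply_arm(state, yield)
rescue StandardError => e
  log << "bot_traffic_split_decision_fallback: #{e.class}"
  state
end

human = route_with_failsafe(RoutingState.new(intent_id: 7)) { 'human' }
puts human.is_auto_assign_agent.inspect
```

Because the fail-safe returns the untouched state, the existing intent_id branch runs exactly as it does today and no variant is written, which keeps fail-safe traffic out of the bot arm's KPI.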

Minimum-coverage closure:

  • Per-status lifecycle: rooms.variant has no lifecycle of its own — set once, immutable, lives & soft-deletes with the Room (acts_as_paranoid). Config flags are booleans/int with no state machine. (ADR-1, ADR-3.)
  • Soft vs hard delete: inherits Room acts_as_paranoid (no separate cleanup); config columns persist with the channel. (PRD §5.1.)
  • Cross-squad responsibility: Chatbot owns routing + config + event emit; Data owns ingest + canonical resolution definition + CSAT join (§2.F.1).
  • Inbound webhook ownership: n/a — no new inbound webhook; the trigger is the existing incoming-message hub flow.
  • Opt-out / skip / branch: “split disabled” and “fail-safe to bot” branches (§2.1 Branch/skip; §3.A.1).
  • Reuse-vs-new per endpoint: see §2.4 Reuse? column (1 extended, 2 new-with-justification).
  • FE/BE disagreement risk: snake_case API (bot_percent, traffic_split_enabled) ↔ FE consumes as-is (no camel transform needed, see §2.G). Error shape = existing error_response envelope.
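The decide-once rule from ADR-3 (conditional update, then route by the persisted value) can be demonstrated without a database. In this sketch an in-memory store with a Mutex stands in for the atomicity that `UPDATE … WHERE id = ? AND variant IS NULL` gets from PostgreSQL; `RoomStore` and `decide_once` are hypothetical names used only for illustration.

```ruby
# In-memory stand-in for the conditional UPDATE on rooms.variant.
class RoomStore
  def initialize
    @variants = {}
    @lock = Mutex.new # plays the role of the database's row-level atomicity
  end

  # Returns true only for the caller whose write actually landed.
  def set_variant_if_null(room_id, variant)
    @lock.synchronize do
      next false unless @variants[room_id].nil?

      @variants[room_id] = variant
      true
    end
  end

  def variant(room_id)
    @lock.synchronize { @variants[room_id] }
  end
end

# Decide once, then route by whatever is persisted, not by this caller's own roll.
def decide_once(store, room_id, own_roll_variant)
  store.set_variant_if_null(room_id, own_roll_variant)
  store.variant(room_id)
end

store = RoomStore.new
threads = %w[bot human].map { |v| Thread.new { decide_once(store, 1, v) } }
results = threads.map(&:value)
puts results.uniq.size # 1: both racing callers observe the same persisted variant
```

The losing caller discards its own roll and follows the stored value, which is what prevents two concurrent first-messages from routing the same Room to different arms.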

Detail 1.C — Per-Story Change Map (organised by user story)

| Story id | Title | Layer scope | FE changes | BE changes | Composite AC ids | Acceptance criteria (verifiable) | RFC anchors |
| --- | --- | --- | --- | --- | --- | --- | --- |
| BTS-S01 | Configure split for a channel | FE + BE | TrafficSplitSection.vue; channel-integration.ts updateTrafficSplit; store action; Vuelidate (integer,between(0,100)); toast | channel_integrations.traffic_split_enabled,bot_percent; PATCH …/traffic_split; UpdateTrafficSplit use case; PaperTrail; config_saved event | BTS-S01/AC-1..3, ERR-1..2 | RSpec request spec asserts 200 persists bot_percent=30; invalid → 422; Vitest mounts section, invalid blocks Save; toast asserted | §2.3 · §2.4 · §2.A · §2.C · §4.D #1,#3,#6 |
| BTS-S02 | Route incoming chat by split + tag arm | Runtime / behavior (BE) | n/a — BE-only | split step in find_default_path; rooms.variant; atomic decide-once; assigned + decision_fallback events | BTS-S02/AC-1..4, ERR-1; BTS-S02-NEG/NEG-1 | RSpec: stub rand → bot/human arms; second message keeps variant; disabled → bot; config error → bot + fallback log | §2.1 · §2.2 · §2.3 · §2.E · §4.D #2,#4 |
| BTS-S03 | Human-arm chat, no agent → queue | Runtime / behavior (BE) | n/a — BE-only | no new code — reuse SendMessageAutoAssignAgentWorker offline/queue branch | BTS-S03/AC-1..2, ERR-1 | RSpec: variant=human + agent → assigned; no agent → existing queue path; bot never re-takes; variant stays human | §2.F · §2.F.1 · §4.D #5 |
| BTS-S04 | Compare bot vs human per arm | FE + BE | BotHumanComparison.vue (pixel-table, loading skeleton, empty, error+Retry) | GET …/traffic_split/comparison aggregating rooms by variant | BTS-S04/AC-1..2, ERR-1 | RSpec returns per-arm resolution/handover; bot-arm 0 rows → no_data:true; Vitest renders rows + "No data yet" | §2.4 · §2.A · §2.C · §4.D #7,#8 |
| BTS-S01-NEG | Control hidden when flag OFF | FE + BE | section gated by checkSubscription('bot_traffic_split') | feature-flag guard on endpoint → 403 | BTS-S01-NEG/NEG-1 | Vitest: flag false → not rendered; RSpec: direct save → 403 | §3 AuthZ · §3.A.1 · §4.D #1,#3 |
| BTS-S02-NEG | No mid-conversation re-bucketing | Runtime / behavior (BE) | n/a | decide-once guard (same as S02) | BTS-S02-NEG/NEG-1 | RSpec: existing variant=bot room, new message → no re-roll, stays bot | §2.1 · §2.E · §4.D #2 |
| BTS-S01-NEG2 | Ineligible plan refused | FE + BE | section hidden when plan ineligible | plan/feature guard → 403 on direct save | BTS-S01-NEG2/NEG-1..2 | RSpec: ineligible org → 403, nothing persisted | §3 AuthZ · §4.D #3 · §5 Open Q-1 |

Coverage rule satisfied: all 7 PRD stories appear exactly once. Every FE + BE row fills both halves; runtime/behavior rows are BE-only by nature.
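The BTS-S04 acceptance criteria (per-arm figures, and no_data:true for an arm with zero rows) can be illustrated with a small aggregation over room-like hashes. This is only a sketch under assumptions: the response keys other than no_data are hypothetical, rooms are plain hashes rather than ActiveRecord rows, and "resolved_at present" is used as a stand-in proxy because the canonical "resolved" definition is still an open Data dependency.

```ruby
# Hypothetical aggregation; rows are hashes carrying the rooms columns the comparison reads.
ARMS = %w[bot human].freeze

def compare_arms(rooms)
  tagged = rooms.reject { |r| r[:variant].nil? } # mirrors WHERE variant IS NOT NULL

  ARMS.to_h do |arm|
    rows = tagged.select { |r| r[:variant] == arm }
    if rows.empty?
      [arm, { total: 0, no_data: true }]
    else
      # Proxy only: counts rooms with resolved_at set, not the canonical definition.
      resolved = rows.count { |r| !r[:resolved_at].nil? }
      rate = (100.0 * resolved / rows.size).round(1)
      [arm, { total: rows.size, resolved_rate: rate, no_data: false }]
    end
  end
end

rooms = [
  { variant: 'human', resolved_at: '2026-06-20T10:00:00Z' },
  { variant: 'human', resolved_at: nil },
  { variant: nil, resolved_at: nil } # untagged (split off or fail-safe), excluded
]
p compare_arms(rooms)
```

Returning an explicit no_data marker for an empty arm lets the FE render its "No data yet" state instead of a misleading 0 % row.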


2. Technical Design

Detail 2.0 — Repo Reading Guide

Repo Map (mermaid, both layers)

flowchart LR
subgraph fe["chatbot-fe (Nuxt 4 / Vue 3 / Pinia)"]
page["pages/chat/settings/index.vue"]
section["modules/settings/.../TrafficSplitSection.vue (new)"]
compare["modules/settings/.../BotHumanComparison.vue (new)"]
svc["common/services/main/v1/channel-integration.ts"]
store["store/* (Pinia, extractStore)"]
end
subgraph be["chatbot (Rails 7.1 / Grape / Sidekiq)"]
api["app/api/frontend_service/v1/channel_integration.rb"]
uc_cfg["UseCases::API::...::ChannelIntegration::UpdateTrafficSplit (new)"]
hub["UseCases::System::Hub::ProcessIncomingMessageWithResolve"]
fdp["#find_default_path (:488)"]
smaa["#send_message_assign_agent (:534)"]
worker["SendMessageAutoAssignAgentWorker"]
mix["SendMixpanelEventWorker"]
end
subgraph infra
pg[("PostgreSQL: channel_integrations, rooms, paths, organization_features")]
sq[["Sidekiq (queues: default, event_tracker)"]]
mp(("Mixpanel"))
end
page --> section --> store --> svc --> api --> uc_cfg --> pg
page --> compare --> store
hub --> fdp --> smaa
fdp --> pg
smaa --> worker --> sq
fdp --> mix --> sq --> mp

Existing Code Anchors

| Layer | Path | Why the agent reads it | What pattern it teaches |
| --- | --- | --- | --- |
| BE | app/core/use_cases/system/hub/process_incoming_message_with_resolve.rb | The routing brain; split decision lands in find_default_path (:488), branch in send_message_assign_agent (:534) | Use-case orchestration, attr_accessor routing state, Repositories::* calls |
| BE | app/models/path.rb + db/schema.rb (create_table "paths") | Path columns: is_auto_assign_agent, intent_id (null:false), channel_integration_id, is_default | Why config can’t live per-path (ADR-1) |
| BE | app/models/room.rb (acts_as_paranoid at :4) + db/schema.rb (rooms) | Conversation record; target for variant; has resolved_at,assigned_at,is_closed,path_id | Soft-delete model; columns reused by comparison |
| BE | app/models/channel_integration.rb + schema | Config home (ADR-1); has_many :paths, settings JSON | ApplicationRecord model conventions |
| BE | app/api/frontend_service/v1/channel_integration.rb (patch ':id' :88, patch ':id/publish' :153) | Grape endpoint + set_role + Dry::Matcher::ResultMatcher + dedicated sub-action precedent | Endpoint shape to mirror for …/traffic_split |
| BE | app/workers/send_message_auto_assign_agent_worker.rb (include Sidekiq::Worker :4) | Human-arm assignment + queue/offline branch (reused unchanged) | Worker shape; no edit needed (BTS-S03) |
| BE | app/workers/send_mixpanel_event_worker.rb (queue: :event_tracker, perform(distinct_id,event_name,params)) | Analytics emit pattern for all bot_traffic_split_* events | SendMixpanelEventWorker.perform_async(org_id, 'Event', payload) |
| BE | app/models/organization_feature.rb + app/models/system_preference.rb | Per-org feature gate + global rollout flag (ADR-7) | OrganizationFeature.exists?(...), SystemPreferences::FindBy |
| FE | pages/chat/settings/index.vue (MpTabs, panel refs) | Host screen for the new section + comparison | Tab/panel composition, unsaved-changes modal |
| FE | modules/settings/views/ai-assist.vue (MpInput type="number", MpInputGroup, MpInputRightAddon, useVuelidate, required/minValue/integer) | The closest existing numeric-input + validated form | Vuelidate form + Pixel input-group pattern to copy |
| FE | common/services/main/v1/channel-integration.ts (update → PATCH /v1/channel_integrations/:id) + endpoint.ts | API client to extend with updateTrafficSplit + comparison | ofetch service wrapper, AbortController, endpoint map |
| FE | store/ai-agent/{index,state,actions}.ts (extractStore, fetchStatus) | Pinia store pattern for async save/fetch | state.$patch({ … fetchStatus }) lifecycle |
| FE | common/composables/useSubscription.ts (checkSubscription(feature)) | Feature-flag gate for showing the section | subscriptionData.features.some(code===…&&enabled) |
| FE | modules/settings/views/qontak-crm/qontak-crm-list.vue (pixel-table, :empty-state) | Table pattern for the comparison view | header-list/data-list, empty-state prop, MpBadge |
| FE | plugins/toast.ts ($toast({type,message})) | Success/error toasts for Save | toast invocation |

Existing Contracts to Reuse, Extend, or Replace (BE)

| Contract | Status | Justification | Owner |
| --- | --- | --- | --- |
| GET /v1/channel_integrations / GET /v1/channel_integrations/:id | extend | add traffic_split_enabled,bot_percent to response entity | Chatbot |
| PATCH /v1/channel_integrations/:id/traffic_split | new-with-justification | existing patch ':id' requires the full update payload (name,enabled,timezone,…); a partial toggle needs its own action, mirroring the existing patch ':id/publish' sub-action | Chatbot |
| GET /v1/channel_integrations/:id/traffic_split/comparison | new-with-justification | no existing endpoint segments room outcomes by experiment arm; report APIs (report.rb,custom_report.rb) are not arm-aware | Chatbot + Data |
| SendMessageAutoAssignAgentWorker | reuse | human-arm assignment + queue/offline is exactly today’s behavior (BTS-S03, ADR-5) | Chatbot |
| SendMixpanelEventWorker | reuse | emit all bot_traffic_split_* events | Chatbot |
| find_default_path / send_message_assign_agent | extend | inject split step + variant stamping; reuse is_auto_assign_agent branch | Chatbot |

Patterns to Follow (and where to find them)

| Layer | Concern | Pattern in repo | Reference file | Deviation? |
| --- | --- | --- | --- | --- |
| FE | State management | Pinia extractStore + fetchStatus | store/ai-agent/actions.ts | none |
| FE | Error / toast / retry | $toast({type,message}) | plugins/toast.ts | none |
| FE | Form validation | useVuelidate + MpFormControl/Label/ErrorMessage | modules/settings/views/ai-assist.vue | none |
| FE | Numeric % input | MpInputGroup+MpInput type="number"+MpInputRightAddon | modules/settings/views/ai-assist.vue | none |
| FE | Table + empty state | pixel-table with :empty-state | modules/settings/views/qontak-crm/qontak-crm-list.vue | none |
| FE | Feature gate | checkSubscription(feature) | common/composables/useSubscription.ts | none |
| BE | HTTP handler shape | Grape params do … end + set_role + Dry::Matcher::ResultMatcher | app/api/frontend_service/v1/channel_integration.rb | none |
| BE | Sub-action endpoint | patch ':id/publish' | channel_integration.rb:153 | none |
| BE | Repository / DB access | Repositories::* + dry-monads | app/core/repositories/** | none |
| BE | Async worker | include Sidekiq::Worker + sidekiq_options queue: | app/workers/send_mixpanel_event_worker.rb | none |
| BE | Analytics emit | SendMixpanelEventWorker.perform_async(org_id, name, payload) | process_incoming_message_with_resolve.rb:1234 | none |
| BE | Per-org feature flag | OrganizationFeature.exists?(feature_id:, organization_id:, enabled: true) | app/core/repositories/qontak_billing/active_subscription_status.rb:23 | none |
| BE | Global rollout flag | Repositories::SystemPreferences::FindBy.new({code:, group_code:'rollout', enabled:true}) | process_incoming_message_with_resolve.rb:120 | none |
| Cross | API casing | snake_case JSON (bot_percent) consumed as-is on FE | channel-integration.ts payloads | none — no transform needed |
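The analytics-emit pattern in the table, perform_async(org_id, name, payload), can be sketched with a stand-in worker so the call shape is visible without Sidekiq. Everything here is illustrative: `FakeMixpanelWorker` and `emit_assigned` are hypothetical, and the payload property names are assumptions rather than a pinned event schema. Only the three-argument order and the event name bot_traffic_split_assigned come from this RFC.

```ruby
require 'json'

# Stand-in for SendMixpanelEventWorker so the sketch runs without Sidekiq.
class FakeMixpanelWorker
  @jobs = []

  class << self
    attr_reader :jobs

    def perform_async(distinct_id, event_name, params)
      # Sidekiq serializes job arguments as JSON; round-trip here to mimic that.
      @jobs << JSON.parse(JSON.generate([distinct_id, event_name, params]))
    end
  end
end

# Hypothetical emitter; property names are illustrative, not a pinned schema.
def emit_assigned(worker, organization_id:, channel_integration_id:, room_id:, variant:, bot_percent:)
  worker.perform_async(
    organization_id,
    'bot_traffic_split_assigned',
    { 'channel_integration_id' => channel_integration_id, 'room_id' => room_id,
      'variant' => variant, 'bot_percent' => bot_percent }
  )
end

emit_assigned(FakeMixpanelWorker, organization_id: 12, channel_integration_id: 34,
                                  room_id: 56, variant: 'human', bot_percent: 30)
p FakeMixpanelWorker.jobs.last
```

String keys and plain scalar values are used so the payload survives JSON serialization unchanged between enqueue and perform.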

Reading Order for the Agent

  1. app/core/use_cases/system/hub/process_incoming_message_with_resolve.rb (:261 room find/create, :270 call, :488 find_default_path, :534 send_message_assign_agent) — where routing + the split decision live.
  2. db/schema.rb (create_table "paths", "rooms", "channel_integrations") — exact columns; confirm no JSON config on paths.
  3. app/models/room.rb: acts_as_paranoid; where variant is added.
  4. app/api/frontend_service/v1/channel_integration.rb (:88 patch ':id', :153 patch ':id/publish') — endpoint shape to mirror.
  5. app/workers/send_message_auto_assign_agent_worker.rb — confirm human-arm/offline path is reused unchanged.
  6. app/workers/send_mixpanel_event_worker.rb — event emit signature.
  7. app/core/repositories/qontak_billing/active_subscription_status.rb — feature-gate pattern.
  8. pages/chat/settings/index.vue + modules/settings/views/ai-assist.vue — FE host + validated-form pattern.
  9. common/services/main/v1/channel-integration.ts + endpoint.ts — API client to extend.
  10. common/composables/useSubscription.ts — FE feature gate.

Source Verification (anti-hallucination — required)

| Layer | Anchor / pattern / contract | Verified by | Evidence |
| --- | --- | --- | --- |
| BE | find_default_path | read | def find_default_path at process_incoming_message_with_resolve.rb:488; sets self.intent_id = @path.try(:intent_id) (:499), self.is_auto_assign_agent = @path.try(:is_auto_assign_agent) (:503) |
| BE | branch order in send_message_assign_agent | read | :534 def send_message_assign_agent; order crm_intent_id → agent_id → division_id → elsif is_auto_assign_agent (:545 → SendMessageAutoAssignAgentWorker.perform_async) → elsif intent_id.present? |
| BE | Room found/created then routed | read | Repositories::Rooms::FindOrCreateBy.new(channel_integration_id:…, channel_room_id: room['id'], …).call (:261); find_default_path (:270) |
| BE | rooms.variant target + soft delete | read | class Room < ApplicationRecord + acts_as_paranoid (room.rb:3-4); rooms has resolved_at,assigned_at,is_closed,path_id,deleted_at (schema) — no variant yet |
| BE | paths has no JSON config; routing flags are columns | read | create_table "paths": t.boolean "is_auto_assign_agent", t.bigint "intent_id", null: false, t.boolean "is_default", no JSON column |
| BE | config home channel_integrations | read | app/models/channel_integration.rb has_many :paths; schema channel_integrations has t.json "settings" |
| BE | config endpoint shape | read | patch ':id' do set_role(%w[owner supervisor admin]) (channel_integration.rb:88-89) → UseCases::API::FrontendService::V1::ChannelIntegration::Update; dedicated patch ':id/publish' (:153) |
| BE | worker reuse | read | class SendMessageAutoAssignAgentWorker … include Sidekiq::Worker (send_message_auto_assign_agent_worker.rb:3-4); def perform(channel_type_id, history_id, raw_params) |
| BE | analytics emit | read | class SendMixpanelEventWorker … sidekiq_options queue: :event_tracker; def perform(distinct_id, event_name, params); live call SendMixpanelEventWorker.perform_async(channel_integration.organization_id, 'Process Message', …) (:1234) |
| BE | per-org flag | read | OrganizationFeature.exists?(feature_id: feature.id, organization_id: @organization_id, enabled: true) (active_subscription_status.rb:23) |
| BE | global rollout flag | read | Repositories::SystemPreferences::FindBy.new({ code: 'ai_assist_image_processing', group_code: 'rollout', enabled: true }).call (:120) |
| BE | test runner | read | AGENTS.md: rspec, rspec spec/path/to/file_spec.rb:42, bundle exec rspec; migrations ActiveRecord::Migration[7.1], db/schema.rb, PostgreSQL |
| FE | host settings page | read | pages/chat/settings/index.vue <MpTabs id="chat-settings-tab-list" …>, panel refs |
| FE | design system + version | read | package.json:74 "@mekari/pixel3": "^1.0.12" |
| FE | validated numeric form | read | ai-assist.vue import { useVuelidate } …; import { required, helpers, minValue, integer } from '@vuelidate/validators'; <MpInput v-model="state.reply_limit" type="number"> inside MpInputGroup+MpInputRightAddon |
| FE | API client | read | channel-integration.ts update(...) → $apiMain(endpoint.v1.channel_integrations.update.replace(':id', payload.id), { method: 'PATCH', body }); endpoint.ts channel_integrations.update: "/v1/channel_integrations/:id" |
| FE | store pattern | read | store/ai-agent/index.ts extractStore(...); actions.ts state.$patch({ … fetchStatus: 'pending'/'resolved'/'rejected' }) |
| FE | feature gate | read | useSubscription.ts checkSubscription → subscriptionFeature.some(item => item.code?.toLowerCase()===feature && item.enabled) |
| FE | table + empty state | read | qontak-crm-list.vue <pixel-table … :empty-state="emptyState">; MpBadge usage |
| FE | toast | read | plugins/toast.ts rootPiniaStore.pushToast(opts); usage $toast({ type, message }) |
| FE | test commands | read | package.json scripts "test": "vitest run", "test:e2e": "playwright test", "lint"; Playwright specs under tests/visual/ |
| FE/BE | i18n | grep | NOT FOUND — no vue-i18n/useI18n/locale files; FE strings hardcoded (affects new copy — §2.J / Open Q-6) |
| BE | existing rand() A/B bucketing | grep | NOT FOUND — no prior rand() bucketing; ADR-2 introduces it (cite this RFC) |
| BE/Data | canonical "resolved" def, handover derivation, CSAT source | | NOT VERIFIED in repo — Data dependency; moved to §5 Open Q-2/Q-3 (not invented) |

Design ↔ Code Mapping (frontend half — required)

| Figma frame / component | Implementing file | Reuse vs new | Design tokens used | Backing API endpoint(s) | Deviation from design |
|---|---|---|---|---|---|
| Screen A — Traffic Split (PRD §6.1 wireframe) | modules/settings/.../TrafficSplitSection.vue (new) | new (composed of reused Pixel 3 components) | Pixel 3 semantic tokens (v2.4) — resolve via Pixel MCP get-component before build; brand reserved for primary Save | GET /v1/channel_integrations/:id, PATCH …/traffic_split | n/a — design pending (wireframe only; no Figma yet — Open Q-4) |
| Screen B — Comparison (PRD §6.1 wireframe) | modules/settings/.../BotHumanComparison.vue (new) | new (reuses pixel-table, MpBadge, MpSelect) | Pixel 3 semantic tokens; brand on the single resolution-parity KPI per PRD §6.1 | GET …/traffic_split/comparison | n/a — design pending (Open Q-4; CSAT-absent treatment Open Q-3) |

Both frames are design-pending. Per §1 Design References, build structure from the wireframe + verified Pixel 3 components; gate pixel-polish on designer frames. Pixel 3 props/variants must be confirmed via the Pixel MCP before coding (FE AGENTS.md convention), not guessed.

Detail 2.1 — Architecture (mermaid)

End-to-end component diagram

```mermaid
flowchart TB
    cust([Incoming customer message]) --> hub["ProcessIncomingMessageWithResolve#result"]
    hub --> findroom[Repositories::Rooms::FindOrCreateBy :261]
    findroom --> fdp["#find_default_path :488"]
    fdp --> resolve[Resolve Path: intent_id, is_auto_assign_agent,\nagent_id :501, division_id :502, crm_intent_id :84]
    resolve --> scope{plain bot-intent path?\nno agent_id/division_id/crm_assignment}
    scope -- no --> passthru[No split: keep path deterministic routing\nvariant stays NULL ADR-9/REV-2]
    scope -- yes --> split{traffic_split_enabled?\nchannel already loaded}
    split -- no --> nosplit[No split: 100% bot, keep intent_id\nvariant stays NULL ADR-3/REV-3]
    split -- yes --> decided{room.variant already set?}
    decided -- yes --> reread
    decided -- no --> roll[rand 0..99]
    roll --> cmp{"roll < bot_percent?"}
    cmp -- yes --> setbot[arm = bot]
    cmp -- no --> sethuman[arm = human]
    setbot --> stamp
    sethuman --> stamp
    resolve -. config read error .-> failsafe[Fail-safe: arm=bot + log decision_fallback\nvariant stays NULL]
    stamp["Atomic UPDATE rooms SET variant\nWHERE id=? AND variant IS NULL"] --> reread[Re-read rooms.variant = canonical arm]
    reread --> apply["apply_arm(canonical): bot -> keep intent_id;\nhuman -> is_auto_assign_agent=true, clear\nintent_id/agent_id/division_id/crm_intent_id"]
    apply --> emit[[SendMixpanelEventWorker: bot_traffic_split_assigned]]
    apply --> smaa["#send_message_assign_agent :534"]
    passthru --> smaa
    nosplit --> smaa
    failsafe --> smaa
    smaa -- intent_id branch --> botsend[SeparateSendMessageWorker bot reply]
    smaa -- is_auto_assign_agent branch :545 --> worker[[SendMessageAutoAssignAgentWorker]]
    worker --> agentq{agent available?}
    agentq -- yes --> assign[AssignAgentWorker]
    agentq -- no --> queue[Existing queue / offline behavior\nbot does NOT take over]

    subgraph config["Config path (FE → BE)"]
        admin([Chatbot Admin]) --> sec[TrafficSplitSection.vue]
        sec --> svc[channel-integration.ts updateTrafficSplit]
        svc --> ep[/"PATCH /v1/channel_integrations/:id/traffic_split"/]
        ep --> uc[UpdateTrafficSplit use case + flag/role guard]
        uc --> pg[("channel_integrations")]
    end
```

Data model (mermaid erDiagram)

```mermaid
erDiagram
    CHANNEL_INTEGRATIONS ||--o{ PATHS : "has_many"
    CHANNEL_INTEGRATIONS ||--o{ ROOMS : "has_many"
    ORGANIZATIONS ||--o{ CHANNEL_INTEGRATIONS : "owns"
    ORGANIZATIONS ||--o{ ORGANIZATION_FEATURES : "has_many"

    CHANNEL_INTEGRATIONS {
        bigint id PK
        bigint organization_id FK
        json settings
        boolean traffic_split_enabled "NEW default false"
        integer bot_percent "NEW default 100, CHECK 0..100"
    }
    ROOMS {
        bigint id PK
        bigint channel_integration_id FK
        bigint organization_id FK
        integer path_id
        string variant "NEW: bot|human, null until decided"
        datetime resolved_at
        datetime assigned_at
        boolean is_closed
        datetime deleted_at "acts_as_paranoid"
    }
    PATHS {
        bigint id PK
        bigint channel_integration_id FK
        bigint intent_id "null:false"
        boolean is_auto_assign_agent
        boolean is_default
    }
    ORGANIZATION_FEATURES {
        bigint organization_id FK
        bigint feature_id FK
        boolean enabled
    }
```

State machine for rooms.variant

```mermaid
stateDiagram-v2
    [*] --> Undecided: Room created (variant IS NULL)
    Undecided --> Undecided: split disabled / non-bot path / fail-safe\n(routes to bot but stays NULL — NOT an experiment arm)
    Undecided --> Bot: split ENABLED on a bot path & rand<bot_percent
    Undecided --> Human: split ENABLED on a bot path & rand>=bot_percent
    Bot --> Bot: subsequent messages (NO re-roll)
    Human --> Human: subsequent messages (NO re-roll; queue/offline unchanged)
    Bot --> [*]: Room soft-deleted (acts_as_paranoid)
    Human --> [*]: Room soft-deleted
```

variant is write-once and only set under an active split (REV-3): a conversation routed to the bot because the split is disabled (or because the path is non-bot, or a fail-safe) stays NULL so it never pollutes the experiment comparison. There is no Bot↔Human transition; a bot→human handover is a separate assignment event, not a variant change (PRD BTS-S02 permission model; emits bot_arm_handover_to_human, not a re-tag).

Branch & skip flow (non-error policy branches)

```mermaid
flowchart TD
    trigger([find_default_path reaches split step]) --> botpath{plain bot-intent path?}
    botpath -- no --> skipscope[Skip split → keep deterministic routing\nvariant stays NULL]
    botpath -- yes --> flag{traffic_split_enabled?}
    flag -- no --> skipbot[Skip split → bot 100%, keep intent_id\nvariant stays NULL — not an experiment arm]
    flag -- yes --> roomcheck{room.variant present?}
    roomcheck -- yes --> skiproll[Skip roll → re-read + apply stored variant]
    roomcheck -- no --> doroll[roll rand 0..99 → stamp + re-read + apply_arm]
    skipscope --> done([continue to send_message_assign_agent])
    skipbot --> done
    skiproll --> done
    doroll --> done
```

Detail 2.1a — Architecture Decision Records (ADR-format)

ADR-1 — Where the split config lives. Context: PRD §5/§7 calls it “Path config”, but the scope unit is the channel integration and a channel can own many paths (keyword/schedule/default). The decision runs inside find_default_path, which already holds the channel_integration record. Options: (a) new columns on channel_integrations; (b) new columns on paths; (c) key in channel_integrations.settings JSON. Decision: (a), that is, traffic_split_enabled boolean default false + bot_percent integer default 100 on channel_integrations. Consequences: one source of truth per channel; zero extra round-trip (the record is already loaded); the config UI keeps editing channel_integrations. The PRD wording “Path config” is reconciled to “channel config”. Option (b) would fragment the percentage across paths; option (c) loses typing/indexing and diverges from the existing pattern where routing flags (is_auto_assign_agent) are columns. Reversibility: Medium — moving to per-path later is a migration + read-site change. Confirm scope with PM (§5 Open Q-1) before building.

ADR-2 — Random per-conversation bucketing. Context: PRD §5 mandates rand(100) per conversation, no identity hashing (Phase 2). No existing rand() bucketing in the repo (grep: NOT FOUND). Decision: in-process rand(100) < bot_percent → bot; else human. The roll is not persisted (only the resulting variant is). Consequences: no hot-path write beyond the single variant stamp; statistically converges to bot_percent (SC-1). Reversibility: High — swap the bucketing function for Phase 2 hashing without schema change.
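
The bucketing rule in ADR-2 can be illustrated with a minimal sketch. It is not the repo's implementation: `choose_arm` is a hypothetical helper name, and the injectable `rng` parameter is an assumption added only so the roll can be seeded. The sketch shows the `rand(100) < bot_percent` comparison, the two boundary settings, and how the observed bot share approaches `bot_percent` over many conversations.

```ruby
# Hypothetical helper (name not from the repo): maps one roll to an arm.
# rng is injectable only so the sketch is reproducible when seeded.
def choose_arm(bot_percent, rng: Random)
  roll = rng.rand(100) # integer in 0..99
  roll < bot_percent ? 'bot' : 'human'
end

rng = Random.new(42)

# Boundary behaviour: 0 never picks the bot, 100 always does.
all_human = Array.new(1_000) { choose_arm(0, rng: rng) }.uniq
all_bot   = Array.new(1_000) { choose_arm(100, rng: rng) }.uniq

# Convergence check for a 30% split over many simulated conversations.
samples   = Array.new(100_000) { choose_arm(30, rng: rng) }
bot_share = samples.count('bot').fdiv(samples.size)
puts format('bot share at bot_percent=30: %.3f', bot_share)
```

Because `roll` ranges over 0..99, `bot_percent = 0` can never satisfy `roll < bot_percent` and `bot_percent = 100` always does, which matches the CHECK range on the column.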

ADR-3 — Decide-once / concurrency / route-by-persisted-variant (REV-1, REV-3). Context: PRD §15 Q4 — two near-simultaneous first messages on the same new Room (found via Rooms::FindOrCreateBy, :261) could both roll. Each message is a separate use-case invocation that must also route (reply or assign), so the tag alone is insufficient — both messages must route the same way. Decision: (1) Only run the roll when the split is active for this conversation (bot path + traffic_split_enabled true) and variant IS NULL; otherwise route to the bot and leave variant NULL (REV-3 — a split-disabled / non-bot / fail-safe bot reply is not an experiment arm and must not enter the comparison). (2) Stamp atomically: UPDATE rooms SET variant = ? WHERE id = ? AND variant IS NULL. (3) Re-read rooms.variant and derive the routing fields from a single apply_arm(canonical_variant) step — so the race loser (whose conditional update matched 0 rows) routes per the persisted arm, never per its own roll. The roll result is advisory until the re-read. Consequences: exactly-once tag and consistent routing for every message in the conversation, without an advisory lock; safe under the existing Rooms::FindOrCreateBy path. Reversibility: High.
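
The stamp-then-re-read sequence in ADR-3 can be sketched without a database. Everything here is illustrative: `FakeRooms` is an in-memory stand-in for the `rooms` table, its `Mutex` models the atomicity of a single conditional UPDATE, and `decide_arm` is a hypothetical name for the split step. The point is that the caller never routes on its own roll, only on the value it reads back.

```ruby
# In-memory stand-in for the rooms table (assumption: the real step issues
# UPDATE rooms SET variant = ? WHERE id = ? AND variant IS NULL).
class FakeRooms
  def initialize
    @variants = {}     # room_id => 'bot' | 'human' | nil
    @lock = Mutex.new  # models the atomicity of one conditional UPDATE
  end

  # Returns the number of rows changed: 1 for the winner, 0 for a loser.
  def stamp_if_null(room_id, variant)
    @lock.synchronize do
      return 0 unless @variants[room_id].nil?

      @variants[room_id] = variant
      1
    end
  end

  def variant(room_id)
    @lock.synchronize { @variants[room_id] }
  end
end

# Hypothetical decide-once step: propose an arm, try to stamp, always re-read.
def decide_arm(rooms, room_id, bot_percent, roll)
  proposed = roll < bot_percent ? 'bot' : 'human'
  rooms.stamp_if_null(room_id, proposed) if rooms.variant(room_id).nil?
  rooms.variant(room_id) # canonical arm; the roll itself is only advisory
end

rooms = FakeRooms.new
# Two racing first messages on the same room with opposite rolls.
threads = [10, 90].map { |roll| Thread.new { decide_arm(rooms, 1, 50, roll) } }
results = threads.map(&:value)
puts results.inspect
```

Whichever thread wins, both results are the same arm, which is the property the Concurrency Collision Map relies on.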

ADR-4 — Fail-safe to bot. Context: PRD §7 #2 / BTS-S02 ERR-1 — never drop a chat for the experiment. Decision: wrap the split step in a rescue; on any error/ambiguous config → keep the existing bot flow (intent_id unchanged) and leave variant NULL (consistent with REV-3 — a fail-safe is not a measured bot arm), emit bot_traffic_split_decision_fallback. Consequences: worst case == today’s behavior; observable via the fallback metric + alert (PRD §10); fallback conversations are excluded from the comparison rather than silently counted as bot wins. Reversibility: High.
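
A minimal sketch of the fail-safe wrapper described in ADR-4 follows. The method name `route_with_fail_safe`, the hash-shaped routing state, the `events` array, and the `error` payload key are all assumptions made for illustration; only the event name comes from this RFC.

```ruby
# Hypothetical wrapper around the split step. On any error the caller gets the
# untouched baseline routing back (intent_id kept, variant still nil).
def route_with_fail_safe(baseline, events)
  yield(baseline.dup) # the split step works on a copy, never on the baseline
rescue StandardError => e
  events << ['bot_traffic_split_decision_fallback', { error: e.class.name }]
  baseline
end

baseline = { intent_id: 7, is_auto_assign_agent: false, variant: nil }.freeze
events = []

# Happy path: the split step returns the updated routing.
decided = route_with_fail_safe(baseline, events) { |r| r.merge(variant: 'bot') }

# Failure path: unreadable config falls back to the existing bot flow.
fallback = route_with_fail_safe(baseline, events) do |_r|
  raise ArgumentError, 'bot_percent unreadable'
end

puts decided.inspect
puts fallback.inspect
```

The fallback result carries `variant: nil`, so the conversation stays outside the comparison, as REV-3 requires.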

ADR-5 — Human-arm hook reuses is_auto_assign_agent. Context: send_message_assign_agent branches in order crm_intent_id (:536) → agent_id (:539) → division_id (:543) → is_auto_assign_agent (:545) → intent_id (:548). The use case sets crm_intent_id at :84 (when @path.crm_assignment), and agent_id/division_id from the path at :501-502. Decision: the human arm is applied by apply_arm(:human) (ADR-3): set self.is_auto_assign_agent = true and clear self.intent_id, self.agent_id, self.division_id, self.crm_intent_id so the existing is_auto_assign_agent branch (:545) fires SendMessageAutoAssignAgentWorker. Because routing is derived from the persisted variant (ADR-3), the race loser also runs apply_arm and clears the same fields — no message escapes as a bot reply. Consequences: zero change to assignment/queue/offline behavior (BTS-S03, Non-Goal 5). Must clear the higher-precedence fields or the wrong branch wins. Reversibility: High — remove the override.
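
The reason ADR-5 clears the higher-precedence fields can be shown with a small model of the branch order. The `Routing` struct and the `branch_for` helper are assumptions that mirror the order listed above; they are not the real `send_message_assign_agent` code.

```ruby
# Illustrative routing state; field names follow ADR-5, the Struct is assumed.
Routing = Struct.new(:crm_intent_id, :agent_id, :division_id,
                     :is_auto_assign_agent, :intent_id, keyword_init: true)

# Mirrors the branch precedence described for send_message_assign_agent.
def branch_for(routing)
  return :crm_intent        if routing.crm_intent_id
  return :agent             if routing.agent_id
  return :division          if routing.division_id
  return :auto_assign_agent if routing.is_auto_assign_agent
  return :bot_intent        if routing.intent_id

  :none
end

# Hypothetical apply_arm: the bot arm keeps the resolved routing untouched;
# the human arm sets the flag and clears every field that would win earlier.
def apply_arm(routing, arm)
  return routing if arm == 'bot'

  routing.is_auto_assign_agent = true
  routing.intent_id = routing.agent_id = routing.division_id = routing.crm_intent_id = nil
  routing
end

human = apply_arm(Routing.new(intent_id: 7, division_id: 3), 'human')
bot   = apply_arm(Routing.new(intent_id: 7), 'bot')
puts branch_for(human) # auto_assign_agent
puts branch_for(bot)   # bot_intent
```

The `division_id: 3` in the human example is deliberately stale: without the clearing step, `branch_for` would return `:division` and the wrong branch would win.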

ADR-6 — Dedicated config endpoint. Context: existing patch ':id' requires the full payload (name, enabled, timezone, …); a partial toggle of two fields shouldn’t require resending all. There is a precedent sub-action patch ':id/publish'. Decision: add PATCH /v1/channel_integrations/:id/traffic_split (use case ChannelIntegration::UpdateTrafficSplit), set_role(%w[owner admin]) for the write (SPV is view-only — see Open Q-5), feature-flag + plan guard, params traffic_split_enabled: Boolean, bot_percent: Integer, values: 0..100. Consequences: clean partial update + tight authZ; one new route + use case + response entity. Reversibility: High.

ADR-7 — Org gate via OrganizationFeature (+ optional rollout SystemPreference). Context: PRD §5 — bot_traffic_split_enabled per org, default OFF, enabled during rollout. Repo has OrganizationFeature (per-org) and SystemPreference group_code:'rollout' (global). Decision: per-org OrganizationFeature with feature.code = 'bot_traffic_split'; an optional global SystemPreference rollout flag acts as a kill-switch checked first. Both the FE section visibility and the BE write/read endpoints honor the gate. Consequences: matches existing entitlement patterns; supports staged rollout + fast disable (PRD §10.1 rollback). Reversibility: High.
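
The check order in ADR-7 can be sketched as a pure function. This is an assumption-heavy illustration: `rollout_enabled` stands in for the optional SystemPreference lookup (with `nil` meaning no preference is configured, treated here as not blocking), and `org_feature_codes` stands in for the OrganizationFeature query. Neither the helper name nor the nil handling is defined by this RFC.

```ruby
FEATURE_CODE = 'bot_traffic_split'

# Hypothetical gate. The global rollout flag is consulted first so that it can
# act as a kill-switch regardless of per-org entitlement.
def traffic_split_allowed?(rollout_enabled:, org_feature_codes:)
  return false if rollout_enabled == false # kill-switch engaged

  org_feature_codes.include?(FEATURE_CODE) # per-org entitlement, default OFF
end

puts traffic_split_allowed?(rollout_enabled: true,  org_feature_codes: [FEATURE_CODE]) # true
puts traffic_split_allowed?(rollout_enabled: false, org_feature_codes: [FEATURE_CODE]) # false
puts traffic_split_allowed?(rollout_enabled: nil,   org_feature_codes: [])             # false
```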

ADR-8 — Comparison metrics computed from rooms (with a pinned v1 contract — REV-4/REV-5). Context: BTS-S04 is Should Have; canonical resolution def, handover derivation, and CSAT source are Data dependencies (Open Q-2/Q-3). An agent still needs an exact, implementable contract for v1. Decision: v1 comparison endpoint aggregates rooms WHERE variant IS NOT NULL (REV-3 — only measured arms) by variant for a channel + date range (created_at in [date_from, date_to], range ≤ 90 days):

  • resolution_rate: float 0..1 = count(resolved_at IS NOT NULL) / count(*) per arm — the v1 proxy for the PRD ⭐ KPI until Data confirms the canonical "resolved" definition (Open Q-2). Documented as a proxy in the response (resolution_basis: "resolved_at_present_v1").
  • resolution_parity: float|null = bot resolution_rate ÷ human resolution_rate (null when either arm has 0 conversations) — the PRD §6.1 Screen-B parity column.
  • handover_rate: float 0..1 (bot arm only) = bot-arm conversations later escalated to a human ÷ bot-arm conversations. v1 derivation: count(variant='bot' AND assigned_at IS NOT NULL) / count(variant='bot'). The dedicated bot_arm_handover_to_human event (emitted from the existing bot→agent handover path, gated on room.variant == 'bot') is the authoritative signal once the Data pipeline ingests it; until then the assigned_at proxy is used and labelled (handover_basis: "assigned_at_proxy_v1"). See L-1.
  • csat_avg: float|null = best-effort; null ⇒ FE shows "CSAT not available" (PRD §6.1 D3, Open Q-3).

Consequences: ships a fully-typed, implementable comparison without blocking on the warehouse; every proxy is self-labelled so the dashboard never presents a proxy as a confirmed metric. Reversibility: High — swap the aggregation source behind the same endpoint contract.
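
The v1 metric definitions in ADR-8 can be worked through on a handful of rooms. This is a sketch over plain hashes rather than the real aggregation query; `arm_stats` and `comparison` are hypothetical names. Two details are assumptions not pinned by the RFC: rates are `nil` when an arm has no conversations, and parity is also `nil` when the human resolution rate is zero.

```ruby
# Per-arm stats over rooms given as plain hashes (stand-in for the rooms table).
def arm_stats(rooms, arm)
  in_arm = rooms.select { |r| r[:variant] == arm }
  n = in_arm.size
  # Assumption: rates are nil when the arm has no conversations.
  return { conversations: 0, resolution_rate: nil, no_data: true } if n.zero?

  stats = {
    conversations: n,
    resolution_rate: in_arm.count { |r| r[:resolved_at] }.fdiv(n),
    no_data: false
  }
  # handover_rate exists for the bot arm only (assigned_at proxy, v1).
  stats[:handover_rate] = in_arm.count { |r| r[:assigned_at] }.fdiv(n) if arm == 'bot'
  stats
end

def comparison(rooms)
  measured = rooms.reject { |r| r[:variant].nil? } # REV-3: untagged rooms excluded
  bot = arm_stats(measured, 'bot')
  human = arm_stats(measured, 'human')
  # nil when either arm is empty; the zero-rate guard is an added assumption.
  parity = if bot[:no_data] || human[:no_data] || human[:resolution_rate].zero?
             nil
           else
             bot[:resolution_rate] / human[:resolution_rate]
           end
  { resolution_parity: parity, arms: { bot: bot, human: human } }
end

t = '2026-06-01T10:00:00Z'
rooms = [
  { variant: 'bot',   resolved_at: t,   assigned_at: nil },
  { variant: 'bot',   resolved_at: t,   assigned_at: nil },
  { variant: 'bot',   resolved_at: t,   assigned_at: t },
  { variant: 'bot',   resolved_at: nil, assigned_at: nil },
  { variant: 'human', resolved_at: t,   assigned_at: t },
  { variant: 'human', resolved_at: t,   assigned_at: t },
  { variant: nil,     resolved_at: t,   assigned_at: nil } # split disabled: ignored
]
result = comparison(rooms)
puts result.inspect
```

With this sample the bot arm resolves 3 of 4 conversations (0.75) and hands over 1 of 4 (0.25), the human arm resolves 2 of 2 (1.0), and parity is 0.75 ÷ 1.0 = 0.75. The untagged room does not affect any figure.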

ADR-9 — Split applies only to plain bot-intent paths (REV-2). Context: find_default_path can resolve a path that is not a plain bot reply: a CRM path (@path.crm_assignment → crm_intent_id at :84), an agent-routed path (agent_id at :501), or a division-routed path (division_id at :502). These already route deterministically (to CRM intent / specific agent / division), so "split a % of them to a human" is ambiguous and would change configured behavior. Options: (a) apply the split only when the resolved path is a plain bot-intent path (intent_id present, none of agent_id/division_id/crm_assignment); (b) apply the split to every path and override; (c) treat agent/division/CRM paths as "already human" for measurement. Decision: (a) — the split step is a no-op for non-bot paths; their routing is untouched and variant stays NULL (not a measured arm). The roll only ever flips a conversation that would otherwise have been a 100% bot reply. Consequences: keeps the experiment’s two arms clean (bot-reply vs human-agent) and never silently re-routes a deliberately agent/CRM-targeted path. Scope is explicit and grounded at :84/:501-502. Reversibility: High — widen the scope predicate later.
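
The scope predicate from option (a) is small enough to sketch directly. The helper name `plain_bot_intent_path?` and the hash representation of a path are assumptions; the four fields are the ones named in ADR-9.

```ruby
# Hypothetical scope predicate: true only for a path that would otherwise
# produce a plain bot reply.
def plain_bot_intent_path?(path)
  !path[:intent_id].nil? &&
    path[:agent_id].nil? &&
    path[:division_id].nil? &&
    !path[:crm_assignment]
end

bot_path      = { intent_id: 7 }
agent_path    = { intent_id: 7, agent_id: 12 }
division_path = { intent_id: 7, division_id: 3 }
crm_path      = { intent_id: 7, crm_assignment: true }

[bot_path, agent_path, division_path, crm_path].each do |path|
  puts "#{path.inspect} => #{plain_bot_intent_path?(path)}"
end
```

Only the first path is eligible for a roll; the other three keep their configured routing and leave `variant` NULL.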

Detail 2.2 — Sequence (mermaid, end-to-end incl. failure paths)

Routing — bot arm, human arm, no-agent, fail-safe:

```mermaid
sequenceDiagram
    actor C as Customer
    participant HUB as ProcessIncomingMessageWithResolve
    participant DB as PostgreSQL
    participant MIX as SendMixpanelEventWorker
    participant AA as SendMessageAutoAssignAgentWorker
    participant BOT as SeparateSendMessageWorker

    C->>HUB: incoming message (webhook)
    HUB->>DB: Rooms::FindOrCreateBy (:261)
    HUB->>HUB: find_default_path (:488) → resolve Path (intent_id, is_auto_assign_agent, agent_id, division_id, crm_intent_id)
    alt non-bot path (agent/division/CRM, ADR-9) OR split disabled OR config error (fail-safe ADR-4)
        HUB->>HUB: keep deterministic routing, variant stays NULL (not a measured arm)
        HUB-->>MIX: bot_traffic_split_decision_fallback (only on config error)
    else split active on a bot path
        alt room.variant already set (ADR-3)
            HUB->>HUB: no re-roll
        else first decision
            HUB->>HUB: roll rand(100), then arm = (roll < bot_percent ? bot : human)
            HUB->>DB: UPDATE rooms SET variant WHERE id=? AND variant IS NULL
        end
        HUB->>DB: re-read rooms.variant = canonical arm (ADR-3)
        HUB->>HUB: apply_arm(canonical): bot→keep intent_id, human→is_auto_assign_agent=true and clear intent_id/agent_id/division_id/crm_intent_id (ADR-5)
        HUB-->>MIX: bot_traffic_split_assigned {variant, bot_percent, …} (best-effort)
    end
    HUB->>HUB: send_message_assign_agent (:534)
    alt variant=bot OR untagged bot (intent_id branch)
        HUB->>BOT: SeparateSendMessageWorker(intent_id)
        BOT-->>C: bot reply
    else variant=human (is_auto_assign_agent branch :545)
        HUB->>AA: SendMessageAutoAssignAgentWorker
        alt agent available
            AA-->>C: assigned to agent
        else no agent (BTS-S03)
            AA->>DB: existing queue / offline behavior (bot does NOT take over)
        end
    end
```

Config save (FE → BE), happy + failure:

```mermaid
sequenceDiagram
    actor A as Chatbot Admin
    participant FE as TrafficSplitSection.vue
    participant API as Grape PATCH …/traffic_split
    participant UC as UpdateTrafficSplit use case
    participant DB as PostgreSQL
    participant MIX as SendMixpanelEventWorker

    A->>FE: toggle ON, bot_percent=30, Save
    FE->>FE: Vuelidate integer + between(0,100)
    alt invalid
        FE-->>A: inline "Enter a whole number between 0 and 100" (BTS-S01/ERR-1)
    else valid
        FE->>API: PATCH {traffic_split_enabled:true, bot_percent:30}
        API->>API: set_role(owner/admin) + feature flag + plan guard
        alt flag OFF / ineligible plan / role
            API-->>FE: 403 (BTS-S01-NEG/-NEG2)
            FE-->>A: section should not have been shown
        else authorized
            API->>UC: result
            UC->>DB: UPDATE channel_integrations SET traffic_split_enabled, bot_percent (+PaperTrail)
            alt persist ok
                UC-->>MIX: bot_traffic_split_config_saved
                API-->>FE: 200 {traffic_split_enabled, bot_percent}
                FE-->>A: toast "Traffic split updated: 30% bot / 70% human"
            else persist fails (5xx)
                UC-->>MIX: bot_traffic_split_save_failed {error_code} (BE-emitted, REV-6)
                API-->>FE: 5xx (Dry::Matcher failure)
                FE-->>A: "Couldn't save traffic split. Try again." + Retry (BTS-S01/ERR-2)
            end
        end
    end
```

Detail 2.3 — Database Model (DDL)

PostgreSQL; Rails ActiveRecord::Migration[7.1]; schema tracked in db/schema.rb. Two additive migrations, no backfill, no data migration.

```ruby
# db/migrate/<ts>_add_traffic_split_to_channel_integrations.rb
class AddTrafficSplitToChannelIntegrations < ActiveRecord::Migration[7.1]
  def change
    add_column :channel_integrations, :traffic_split_enabled, :boolean, default: false, null: false
    add_column :channel_integrations, :bot_percent, :integer, default: 100, null: false
    # Range guard at the DB (defense-in-depth alongside Grape validation)
    add_check_constraint :channel_integrations, "bot_percent >= 0 AND bot_percent <= 100",
                         name: "chk_channel_integrations_bot_percent_range"
  end
end

# db/migrate/<ts>_add_variant_to_rooms.rb
class AddVariantToRooms < ActiveRecord::Migration[7.1]
  def change
    add_column :rooms, :variant, :string, limit: 10 # 'bot' | 'human' | NULL (undecided)
    add_index :rooms, :variant
    # Composite index supports the comparison aggregation by channel + arm + time
    add_index :rooms, [:channel_integration_id, :variant, :resolved_at],
              name: "index_rooms_on_channel_variant_resolved"
  end
end
```

Per-status lifecycle — rooms.variant:

| Value | Set by | Mutable? | Retention | Restore semantics | Visibility |
|---|---|---|---|---|---|
| NULL (undecided) | default (room created) | → set once | n/a | n/a | internal |
| bot | split step only when active on a bot path (REV-3) | no (write-once, ADR-3) | lives & soft-deletes with Room (acts_as_paranoid) | restored with the Room | internal + comparison |
| human | split step (active, human arm) | no | same | same | internal + comparison |

A bot reply produced because the split is disabled, the path is non-bot (ADR-9), or a fail-safe fired (ADR-4) leaves variant = NULL — these are not experiment arms and the comparison aggregation filters them out (WHERE variant IS NOT NULL). Only conversations decided by an active split are tagged.

Config columns (traffic_split_enabled, bot_percent) are plain mutable settings on channel_integrations with no state machine; audited via existing PaperTrail on the model (confirm PaperTrail is enabled on ChannelIntegration — Open Q-7).

Detail 2.4 — APIs

Outbound endpoints (consumers call us)

| Endpoint | Method | AuthN/AuthZ | Request schema | Response schema | Status codes | Idempotency | Versioning | Reuse? |
|---|---|---|---|---|---|---|---|---|
| /api/v1/channel_integrations/:id/traffic_split | PATCH | session + set_role(%w[owner admin]) + Middlewares::Ownership + bot_traffic_split feature flag + plan gate | { traffic_split_enabled: boolean (required), bot_percent: integer 0..100 (required when enabled) } | { id, traffic_split_enabled, bot_percent } | 200; 422 invalid bot_percent/non-integer; 403 flag-off/ineligible/role; 5xx persist fail | natural — PATCH is idempotent (last write wins) | /api/v1/ (existing) | new-with-justification (ADR-6) |
| /api/v1/channel_integrations/:id | GET | session + set_role(%w[owner supervisor admin]) | { id } | existing entity + traffic_split_enabled, bot_percent | 200; 403; 404 | safe | /api/v1/ | extended |
| /api/v1/channel_integrations/:id/traffic_split/comparison | GET | session + set_role(%w[owner supervisor admin]) + feature flag | { id, date_from, date_to } (ISO-8601 date; range ≤ 90 days) | see typed schema below | 200; 403; 422 bad/over-long range; 5xx | safe | /api/v1/ | new-with-justification (ADR-8) |

Comparison response schema (v1, pinned — REV-4):

```jsonc
{
  "updated_at": "2026-06-16T14:20:00Z",          // ISO-8601
  "resolution_basis": "resolved_at_present_v1",  // self-labelled proxy (Open Q-2)
  "handover_basis": "assigned_at_proxy_v1",      // until bot_arm_handover_to_human ingested (L-1)
  "resolution_parity": 0.92,                     // float|null = bot.resolution_rate / human.resolution_rate
  "arms": {
    "bot": {
      "conversations": 372,     // integer = count(variant='bot') in range
      "resolution_rate": 0.78,  // float 0..1 = resolved_at-present / conversations
      "csat_avg": 4.4,          // float|null (null ⇒ "CSAT not available")
      "handover_rate": 0.24,    // float 0..1 = (variant='bot' AND assigned_at NOT NULL)/conversations
      "no_data": false          // true when conversations == 0
    },
    "human": {
      "conversations": 868,
      "resolution_rate": 0.85,
      "csat_avg": 4.7,
      "no_data": false
    }
  }
}
```

Aggregation runs WHERE variant IS NOT NULL (REV-3) over rooms for the channel + range, grouped by variant. Rates are fractions 0..1 (the FE renders as %). resolution_basis/handover_basis flag the v1 proxies so the dashboard never shows a proxy as a confirmed metric (ADR-8).

Validation (BTS-S01/ERR-1): the Grape declaration requires :bot_percent, type: Integer, values: 0..100 rejects 150, -5, and "30%" with 422 before the use case runs; the DB CHECK is defense-in-depth. The comparison endpoint validates date_from <= date_to and the ≤ 90-day window (422 otherwise).

Disable semantics (REV-10): setting traffic_split_enabled=false preserves the last bot_percent (only the boolean flips), so re-enabling restores the prior percentage; the routing simply skips the split while OFF.

OpenAPI: add all three endpoints to docs/openapi/openapi.yaml (AGENTS.md convention — the Grape desc/success/failure blocks already power it).
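
The validation rules above can be expressed as plain predicates for use in specs or documentation. This is a sketch, not the Grape declaration: it does not reproduce Grape's type coercion, both helper names are hypothetical, and reading "≤ 90 days" as `date_to - date_from <= 90` is an assumption.

```ruby
require 'date'

# Accepts only whole numbers in 0..100 (no coercion of strings).
def valid_bot_percent?(value)
  value.is_a?(Integer) && (0..100).cover?(value)
end

# Accepts ISO-8601 dates where date_from <= date_to and the window is
# at most max_days long (assumed to mean date_to - date_from <= max_days).
def valid_comparison_range?(date_from, date_to, max_days: 90)
  from = Date.iso8601(date_from)
  to = Date.iso8601(date_to)
  from <= to && (to - from) <= max_days
rescue ArgumentError, TypeError
  false # unparseable or missing dates are invalid
end

[150, -5, '30%', 30, 0, 100].each do |v|
  puts "#{v.inspect} => #{valid_bot_percent?(v)}"
end
puts valid_comparison_range?('2026-01-01', '2026-04-01') # true  (90 days)
puts valid_comparison_range?('2026-01-01', '2026-04-02') # false (91 days)
```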

Inbound webhooks (other services call us)

| Endpoint | Method | AuthN/AuthZ | Source service | Request schema | Response schema | Status codes | Idempotency | Versioning |
|---|---|---|---|---|---|---|---|---|
| n/a — no new inbound webhook | | | | | | | | |

The routing trigger is the existing incoming-message hub flow (ProcessIncomingMessageWithResolve); no new callback is introduced.

Detail 2.A — UI Contract

| Surface | Component (new) | Props / inputs | Emitted events / API calls | Source data |
|---|---|---|---|---|
| Traffic Split section | TrafficSplitSection.vue | channelIntegrationId, initial traffic_split_enabled, bot_percent | on Save → channel-integration.ts updateTrafficSplit (PATCH); $toast on success/fail | GET /v1/channel_integrations/:id |
| Enable toggle | MpToggle (Pixel 3) | v-model boolean | toggling reveals % input + preview + info banner | local state |
| Bot % input | MpInputGroup + MpInput type="number" + MpInputRightAddon (%) | v-model integer; Vuelidate integer, between(0,100) | inline error on invalid | local state |
| Live preview | MpText | computed ~{n}% to bot, ~{100−n}% to human agents | | computed |
| Info banner | MpBanner variant="info" is-inline | static copy (no-agent → queue) | | static |
| Save / Cancel | MpButton (Save primary, Cancel ghost) | :is-loading during save | calls API | |
| Comparison view | BotHumanComparison.vue | channelIntegrationId, date range | GET …/comparison; Retry refetch | comparison endpoint |
| Comparison table | pixel-table + MpBadge legend, MpSelect (channel/range) | header-list, data-list, :empty-state | filter-change → refetch | comparison endpoint |

Detail 2.B — Data-Fetching Strategy

  • Read config: the host settings page already loads the channel integration; TrafficSplitSection receives traffic_split_enabled/bot_percent as props (or reads from the channel store). No extra fetch on mount.
  • Save: Pinia action wraps updateTrafficSplit, following the fetchStatus: pending|resolved|rejected pattern (store/ai-agent/actions.ts); AbortController for cancel-on-unmount (per channel-integration.ts).
  • Comparison: lazy fetch on view open + on filter change; show skeleton while pending; cache last successful result so Retry doesn’t flash empty.

Detail 2.C — UI State Matrix

| Surface | Empty / Disabled | Loading | Error | Success |
|---|---|---|---|---|
| Traffic Split section | split OFF (default): toggle off, % + preview + banner hidden; helper “All incoming chats are handled by the bot (100%).” | Save disabled + spinner (:is-loading); prior value retained | invalid → inline “Enter a whole number between 0 and 100”; save 5xx → “Couldn’t save traffic split. Try again.” + Retry; bot_traffic_split_save_failed is emitted by the BE (REV-6) | toggle ON, value saved, toast “Traffic split updated: 30% bot / 70% human” |
| Comparison view | bot arm 0 convos → bot column “No data yet” (not 0%); range empty → “No conversations in this range yet” | skeleton rows | “Couldn’t load comparison. Try again.” + Retry; no partial/misleading numbers | two-arm table + `Updated <ts>` |
| Gated (flag OFF / ineligible plan / Agent role) | section + comparison not rendered | | direct API → 403 | |

Detail 2.D — Data Integrity Matrix

| Invariant | Enforced by |
|---|---|
| bot_percent ∈ [0,100] integer | Grape values: 0..100, type: Integer + DB CHECK chk_channel_integrations_bot_percent_range |
| variant ∈ {bot, human, NULL} | application sets only 'bot'/'human'; limit: 10 column |
| variant write-once | UPDATE … WHERE variant IS NULL (ADR-3) |
| Split only when enabled | traffic_split_enabled guard before roll |
| Config only for entitled org | feature-flag guard on write + read endpoints |

Detail 2.E — Concurrency Collision Map

| Collision | Scenario | Resolution |
|---|---|---|
| Double first-message (PRD §15 Q4 / REV-1) | two messages create/find the same new Room near-simultaneously, both see variant IS NULL, and each must also route (reply/assign) | conditional UPDATE … SET variant WHERE id=? AND variant IS NULL persists one arm; both messages then re-read variant and route via apply_arm(canonical) (ADR-3) — the race loser routes per the persisted arm, so you never get one bot reply + one human assignment for the same conversation |
| Config save vs in-flight routing | Admin changes bot_percent while messages route | each conversation is decided at its own arrival using the then-current bot_percent; already-decided rooms keep their variant (no retro change) |
| Concurrent config saves | two Admins PATCH the same channel | PATCH last-write-wins; PaperTrail records both (Open Q-7) |

Detail 2.F — Async Job / Event Consumer Spec

| Worker | Status | Trigger | Effect | Failure behavior |
|---|---|---|---|---|
| SendMessageAutoAssignAgentWorker | reused, unchanged | is_auto_assign_agent branch (:545) for human arm | assign to agent or enter existing queue/offline | unchanged from today (BTS-S03/ERR-1: bot never rescues) |
| SendMixpanelEventWorker (queue: :event_tracker) | reused | after arm decision / config save / fallback / save fail / handover | emit `bot_traffic_split_*` event | best-effort — emit must never block/fail routing (PRD §7 #4) |
| SeparateSendMessageWorker | reused | bot arm (intent_id branch) | bot reply | unchanged |

Detail 2.F.1 — Responsibility Boundary Matrix

| Step | Owning squad / service | Inbound trigger | Outbound effect | Failure handler | PRD anchor |
|---|---|---|---|---|---|
| 1. Resolve path + split decision | Chatbot / chatbot BE | incoming message | variant stamped; arm chosen | rescue → existing bot flow (variant stays NULL) + decision_fallback | §7 #2, §8.1 |
| 2. Bot reply | Chatbot | bot arm | SeparateSendMessageWorker | existing | §7 #2 |
| 3. Human assignment / queue | Chatbot | human arm | SendMessageAutoAssignAgentWorker | existing queue/offline | §7 #3, BTS-S03 |
| 4. Emit analytics | Chatbot → Mixpanel | arm decision / config save | `bot_traffic_split_*` events | best-effort, swallow errors | §7 #4, §10 |
| 5. Ingest events + dashboard | Data | Mixpanel events | comparison dashboard | Data pipeline TTL | §13 |
| 6. Canonical resolution def + CSAT join | Data | comparison query | metric definitions | n/a — Open Q-2/Q-3 | §13, §15 Q2/Q3 |

The Chatbot↔Data boundary at steps 5–6 is the one cross-squad seam. The routing (steps 1–4) is entirely within chatbot. No disagreement with PRD §13.

Detail 2.F.2 — State Surface Contract

| Entity | State field / event | Default | Updated by | Read via | Stale window |
|---|---|---|---|---|---|
| ChannelIntegration | traffic_split_enabled, bot_percent | false, 100 | UpdateTrafficSplit use case | GET /v1/channel_integrations/:id | immediate (read-your-write) |
| Room | variant | NULL | split step (write-once) | comparison aggregation; bot_traffic_split_assigned | set at conversation start |
| Comparison | per-arm aggregates | computed | read endpoint | GET …/comparison | as fresh as rooms + analytics ingest lag |

Detail 2.G — Cross-Layer Contract Verification

| Endpoint | BE response schema | FE expected schema | Match? | Gaps |
|---|---|---|---|---|
| PATCH …/traffic_split | { id, traffic_split_enabled, bot_percent } (snake) | same (consumed as-is) | yes | none — no casing transform; FE reads snake_case (see channel-integration.ts) |
| GET …/:id (extended) | existing entity + 2 fields | section reads traffic_split_enabled, bot_percent | yes | none |
| GET …/comparison | { updated_at, arms:{bot:{…,no_data}, human:{…}} } | table maps arms→rows; no_data→“No data yet”; csat_avg:null→“CSAT not available” | yes | none — null CSAT + no_data flag handled by FE (PRD §6.1 D3) |

All rows yes. Error envelope is the existing error_response shape; FE’s toast/Retry handles 4xx/5xx uniformly.

Detail 2.H — End-to-End Data Flow

  • Save config: Admin → TrafficSplitSection.vue → Pinia action → channel-integration.ts updateTrafficSplit → PATCH /v1/channel_integrations/:id/traffic_split → UpdateTrafficSplit use case → UPDATE channel_integrations (+PaperTrail) → 200 → store $patch → toast. Side effects: bot_traffic_split_config_saved.
  • Route a chat: customer message → hub → Rooms::FindOrCreateBy → find_default_path (resolve path + split step) → atomic variant stamp → re-read + apply_arm → send_message_assign_agent → bot (SeparateSendMessageWorker) or human (SendMessageAutoAssignAgentWorker). Side effects: bot_traffic_split_assigned (on every active-split decision), decision_fallback (on error).
  • View comparison: Admin/SPV → BotHumanComparison.vue → GET …/comparison → aggregate rooms by variant → table render (skeleton→rows / empty / error).

Detail 2.I — Scope Boundaries

  • BE create: 2 migrations; UseCases::API::FrontendService::V1::ChannelIntegration::UpdateTrafficSplit; comparison use case + repository; response entities; specs.
  • BE modify: app/api/frontend_service/v1/channel_integration.rb (+2 routes); find_default_path + send_message_assign_agent (split step + variant stamp + events); ChannelIntegration GET entity (+2 fields); app/models/room.rb (validation/constant for variant); docs/openapi/openapi.yaml.
  • BE NOT touched: SendMessageAutoAssignAgentWorker, SeparateSendMessageWorker, assignment/queue logic (reused unchanged — BTS-S03).
  • FE create: TrafficSplitSection.vue, BotHumanComparison.vue, store action(s), service methods updateTrafficSplit + getTrafficSplitComparison, endpoint entries; Vitest + Playwright specs.
  • FE modify: pages/chat/settings/index.vue (mount section + comparison, gated); common/services/main/v1/channel-integration.ts; common/services/main/endpoint.ts.
  • FE NOT touched: unrelated settings panels; auth/session plumbing.
  • Shared module impact: snake_case contract shared FE↔BE; no transform layer needed (§2.G).

Detail 2.J — Asset Inventory (frontend half)

| Asset | Type | Source | Format & sizes | Path in repo |
|---|---|---|---|---|
| Info icon (banner) | icon | @mekari/pixel3 built-in | component prop | n/a — DS-provided |
| Empty-state illustration (comparison) | illustration | n/a — design pending (Open Q-4) | TBD | TBD |

No new bespoke assets in the structural build. New copy strings (labels, toast, banner) are introduced; since there is no i18n system (verified), they are hardcoded in templates per current convention — flag for a future localization pass (Open Q-6).


3. High-Availability & Security

The split decision is in-process, on the existing hot path — no new network or DB round-trip beyond the single atomic variant write (the channel config is already loaded with the ChannelIntegration record). Routing availability is therefore unchanged; the experiment can only ever fail safe to today’s behavior (ADR-4).

Performance Requirement

  • Backend: the split step adds an O(1) rand + one conditional UPDATE (the variant stamp, indexed on rooms.id PK) on the first message of a conversation only; subsequent messages skip the roll. Target ≤ 5 ms added per incoming message (PRD §5 Performance); no extra read (config preloaded); analytics emit is async (SendMixpanelEventWorker, queue: :event_tracker). Comparison endpoint is a read-only aggregate over the new composite index index_rooms_on_channel_variant_resolved — target p95 < 500 ms for a 14-day window; cap the range server-side (e.g. ≤ 90 days) to bound the scan.
  • Frontend: the section is a small form on an already-loaded settings page — no bundle-budget concern beyond the new SFCs (each ≤ 250 lines per FE convention). Comparison view fetches once per open/filter; skeleton during load; browser support + a11y per existing chatbot-fe baseline.
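
The backend bullet above describes a roll followed by a first-writer-wins stamp. The following is a minimal in-memory sketch of that idea, not the hub's actual code: `RoomRow`, `pick_arm`, `stamp_variant`, and `route_message` are hypothetical names, and the `||=` assignment only stands in for the conditional UPDATE (guarded by a NULL check on `variant`) that the real step would issue.

```ruby
# In-memory stand-in for a rooms row (hypothetical; the real model is Room).
RoomRow = Struct.new(:id, :variant)

# Hypothetical helper: maps a roll in 0...100 to an arm.
# A roll below bot_percent goes to the bot arm, anything else to human.
def pick_arm(bot_percent, roll)
  roll < bot_percent ? 'bot' : 'human'
end

# Hypothetical helper: writes the variant only when none is stored, mimicking
# a conditional UPDATE guarded by "variant IS NULL", then returns the
# persisted value so a race loser follows the winner's arm.
def stamp_variant(room, candidate)
  room.variant ||= candidate
  room.variant
end

# The roll is injectable so specs can stub it; it defaults to rand(100).
def route_message(room, bot_percent, roll: rand(100))
  return room.variant if room.variant # later messages skip the roll

  stamp_variant(room, pick_arm(bot_percent, roll))
end

room = RoomRow.new(42, nil)
route_message(room, 30, roll: 12) # first message rolls and stamps "bot"
route_message(room, 30, roll: 95) # second message reuses the stored "bot"
```

Injecting the roll keeps the threshold logic deterministic under test, which matches the "stub rand" approach the execution plan calls for.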

Monitoring & Alerting

Reuse the Mixpanel emit pattern (SendMixpanelEventWorker.perform_async(org_id, name, payload)); event names follow the PRD §10 catalog:

| Event | Emitted from | Trigger |
| --- | --- | --- |
| bot_traffic_split_config_saved | BE: UpdateTrafficSplit use case | config saved |
| bot_traffic_split_assigned | BE: apply_arm step in the hub | arm decided (active split only) |
| bot_traffic_split_decision_fallback | BE: rescue in the split step | config unreadable → bot fail-safe |
| bot_traffic_split_save_failed | BE: UpdateTrafficSplit use case on the 5xx path (REV-6) | config save errored |
| bot_arm_handover_to_human | BE: the existing bot→agent handover path, gated on room.variant == 'bot' (REV-5) | a bot-arm conversation later escalates to a human |

REV-6: bot_traffic_split_save_failed is emitted server-side (the FE has no Mixpanel client), keeping all five events on the one SendMixpanelEventWorker path. REV-5: bot_arm_handover_to_human is emitted at the existing bot→agent handover/assign action, gated on room.variant == 'bot'; until the Data pipeline ingests it, handover_rate falls back to the assigned_at proxy (ADR-8, L-1).

  • BE alerts (PRD §10): decision_fallback rate > 1 % of assignments in 1h → #chatbot-alerts (config read failing / silently defaulting to 100 % bot); human-arm queue wait p90 > 15 min during experiment hours → #chatbot-alerts.
  • Routing fidelity (SC-1): dashboard compares observed bot share vs configured bot_percent per active experiment (PRD §10.1: alert if drift > 10pp/week).
  • FE: reuse existing error monitoring for the save-failed UX (toast + Retry); the analytics event itself is BE-emitted (REV-6).
  • Cross-layer trace: existing request → worker correlation (room_id / channel_integration_id carried on every event) ties config save and routing.
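
The two numeric thresholds above (fallback rate over 1 % and bot-share drift over 10 pp) reduce to simple ratios. The sketch below shows one way to compute them; `fallback_alert?` and `share_drift_pp` are hypothetical helpers, and the counts are assumed to come from whatever dashboard query feeds the alert, not from any function in the repo.

```ruby
# Hypothetical helper: true when fallbacks exceed the threshold share of
# assignments in the window. An empty window never alerts.
def fallback_alert?(fallback_count, assignment_count, threshold: 0.01)
  return false if assignment_count.zero?

  fallback_count.to_f / assignment_count > threshold
end

# Hypothetical helper: absolute gap, in percentage points, between the
# observed bot share and the configured bot_percent. Returns nil when
# there is nothing to compare.
def share_drift_pp(bot_assigned, total_assigned, configured_bot_percent)
  return nil if total_assigned.zero?

  (100.0 * bot_assigned / total_assigned - configured_bot_percent).abs
end

fallback_alert?(2, 100)     # 2 % of assignments, above the 1 % line
share_drift_pp(45, 100, 30) # 45 % observed vs 30 % configured
```

Which events form the denominator of the fallback rate is an assumption here; the PRD §10 catalog is the authority.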

Logging

  • BE: structured log on the fallback path (bot_traffic_split_decision_fallback with reason); follow existing repo log conventions (frozen_string_literal, no PII in event payloads — use ids, not message content).
  • FE: existing console/error reporting for save failures (no PII).
  • PII: events carry ids only (organization_id, channel_integration_id, room_id, actor_id) — never message text or customer identifiers, matching the existing 'Process Message' payload shape.

Security Implications

  • AuthN/AuthZ (REV-9 resolved): every new endpoint goes through the existing session + Grape set_role + Middlewares::Ownership chain. Decision: the write endpoint uses set_role(%w[owner admin]) (configure), while read/comparison use set_role(%w[owner supervisor admin]) (view) — this tightens the SPV to view-only per the PRD, diverging from the existing channel endpoints that gate the three roles together (channel_integration.rb:89). Ownership middleware confines actions to the caller’s own org (no cross-tenant channel_integration_id).
  • Feature/plan gate is server-side (REV-11): hiding the FE section is UX only; the authoritative gate returns 403 on any direct call (BTS-S01-NEG/NEG2), so a crafted request cannot persist bot_percent. The gate has two checks, both in the UpdateTrafficSplit use case before persistence: (1) org feature: OrganizationFeature.exists?(feature_id: <bot_traffic_split>.id, organization_id:, enabled: true) (+ the optional global SystemPreference rollout kill-switch); (2) plan eligibility: via the existing billing path Repositories::Orders::UseMekariBilling + active-subscription component check (the same pattern at process_incoming_message_with_resolve.rb:54). The exact eligible-plan list stays Open Q-9, but the enforcement path is fixed.
  • Input validation: bot_percent validated at the edge (Grape values: 0..100, type: Integer) and the DB (CHECK); rejects injection-via-type ("30%", floats, negatives) with 422.
  • No new secrets, no new external egress (Mixpanel already integrated).
  • Tenant isolation in comparison: the aggregate filters by the caller’s organization_id + the requested channel_integration_id (ownership-checked).
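
The input-validation rule above can be illustrated with a strict, non-coercing check. This is a plain-Ruby sketch, not the Grape declaration itself: `bot_percent_error` is a hypothetical helper, the returned hash is an illustrative shape rather than the repo's error_response envelope, and refusing to coerce numeric strings is an assumption about the desired strictness.

```ruby
# Hypothetical helper: returns nil for a valid bot_percent, or a small
# error description that a caller could map to a 422 response.
# Only whole numbers in 0..100 pass; strings, floats, and negatives do not.
def bot_percent_error(raw)
  return nil if raw.is_a?(Integer) && (0..100).cover?(raw)

  { status: 422, field: 'bot_percent' }
end

bot_percent_error(30)    # valid
bot_percent_error('30%') # rejected: wrong type
bot_percent_error(30.5)  # rejected: not a whole number
bot_percent_error(-1)    # rejected: out of range
```

The database CHECK constraint remains the second line of defense, so a gap in the edge check cannot persist an out-of-range value.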

Role × Endpoint Authorization Matrix

| Role | Endpoint(s) | Permitted methods | Tenant scope | UI visibility (FE) | Additional constraint | Audit trail |
| --- | --- | --- | --- | --- | --- | --- |
| Chatbot Admin (owner, admin) | …/traffic_split, …/:id, …/comparison | PATCH (config), GET (view) | own org | section + comparison editable | | PaperTrail + config_saved |
| SPV (supervisor) | …/:id, …/comparison | GET | own org | comparison visible; config read-only | no PATCH: excluded from the write endpoint's set_role(%w[owner admin]) (REV-9) | view only |
| Human Agent | none | | | not rendered | | |
| End-customer | none | | | n/a | | |

No role from Detail 1.A is left without a row. The supervisor write restriction (PRD: SPV is view-only) is enforced by set_role(%w[owner admin]) on the write endpoint (REV-9 — decided, was Open Q-5), tightening over the existing combined owner/supervisor/admin gate.
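
The matrix above boils down to two role lists. As a rough sketch of that decision (the role arrays come from the set_role calls quoted in this section; `permitted?` and the `'agent'` role string are hypothetical and only illustrate the lookup):

```ruby
WRITE_ROLES = %w[owner admin].freeze           # configure (PATCH)
VIEW_ROLES = %w[owner supervisor admin].freeze # view (GET)

# Hypothetical helper: answers whether a role may call an endpoint with the
# given HTTP method. Any other method is denied.
def permitted?(role, http_method)
  case http_method
  when 'PATCH' then WRITE_ROLES.include?(role)
  when 'GET' then VIEW_ROLES.include?(role)
  else false
  end
end

permitted?('supervisor', 'GET')   # SPV can view
permitted?('supervisor', 'PATCH') # SPV cannot configure
```

Tenant scoping is a separate concern handled by the ownership middleware, so it is deliberately left out of this lookup.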

Detail 3.A — Failure Mode Catalog (merged)

| Surface | FE behavior on failure | BE response on failure | Code-shape consistency |
| --- | --- | --- | --- |
| Save config: invalid bot_percent | inline “Enter a whole number between 0 and 100”; Save blocked | 422 field error (Grape validation) | yes |
| Save config: flag off / ineligible / role | section not shown; if forced → generic error | 403 | yes |
| Save config: persist 5xx | keep prior value; “Couldn’t save…” + Retry | 5xx via Dry::Matcher failure; save_failed emitted by the BE (REV-6) | yes |
| Routing: config read error/ambiguous | n/a (no UI) | fail-safe to bot arm + decision_fallback log/event; chat not dropped | yes (ADR-4) |
| Routing: analytics emit fails | n/a | swallowed (best-effort); routing proceeds | yes |
| Human arm: no agent | n/a (customer sees existing offline/queue UX) | existing queue/offline; bot never rescues | yes (BTS-S03) |
| Comparison: query fails | “Couldn’t load comparison. Try again.” + Retry; no partial numbers | 5xx | yes |
| Comparison: bot arm 0 rows | bot column “No data yet” | no_data: true in payload | yes |
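
To make the "bot arm 0 rows" row concrete, here is a small sketch of an aggregate that ignores untagged rooms and flags an empty arm. The payload shape is an assumption: `comparison_counts`, the `conversations` key, and a per-arm `no_data` flag are hypothetical, and the typed schema in §2.4 remains the authority.

```ruby
ARMS = %w[bot human].freeze

# Hypothetical helper: counts tagged rooms per arm. Rooms with a nil variant
# are excluded, mirroring the "variant IS NOT NULL" filter, and an arm with
# zero rows is marked no_data so the UI can show a placeholder, not 0 %.
def comparison_counts(rooms)
  tagged = rooms.reject { |room| room[:variant].nil? }
  ARMS.to_h do |arm|
    count = tagged.count { |room| room[:variant] == arm }
    [arm, { conversations: count, no_data: count.zero? }]
  end
end

rooms = [{ variant: 'human' }, { variant: nil }, { variant: 'human' }]
comparison_counts(rooms)
```

Returning an explicit flag, and not a zero metric, is what lets the frontend distinguish "no conversations yet" from "zero percent resolved".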

Detail 3.A.1 — Branch & Skip Catalog

| Branch trigger | Where checked | Downstream effect | Audit trail | User-visible? |
| --- | --- | --- | --- | --- |
| non-bot path (agent/division/CRM) | scope check (ADR-9) | skip split → keep deterministic routing; variant stays NULL | none | no |
| traffic_split_enabled = false | find_default_path split step | skip roll → 100 % bot (today’s behavior); variant stays NULL (REV-3) | none (not a measured arm) | no (bot replies as normal) |
| room.variant already set | split step (ADR-3) | skip roll → re-read + apply stored variant (no re-bucketing) | none extra | no |
| config read error | rescue (ADR-4) | bot fail-safe; variant stays NULL | decision_fallback | no |
| feature flag OFF / ineligible plan | endpoint guard + FE gate | section hidden; write → 403 | | yes (control absent) |
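
The routing branches in this catalog can be read as an ordered series of guards. The sketch below is one possible arrangement under stated assumptions: `SplitInput` and `split_branch` are hypothetical, the guards follow the row order of the catalog (the real precedence between "disabled" and "variant already set" is decided by ADR-3/ADR-9, not by this example), and the config loader is passed as a callable only so a read failure can be simulated.

```ruby
# Hypothetical value object describing one incoming message.
SplitInput = Struct.new(:bot_path, :stored_variant, :load_config, keyword_init: true)

# Returns [action, event_name_or_nil] for one incoming message.
def split_branch(input)
  # Non-bot paths (agent/division/CRM) never enter the split.
  return [:keep_deterministic_routing, nil] unless input.bot_path

  config = input.load_config.call
  # Split disabled: today's behavior, room stays untagged.
  return [:route_bot_untagged, nil] unless config[:traffic_split_enabled]
  # Already bucketed: reuse the stored arm, no re-roll.
  return [:apply_stored_variant, nil] if input.stored_variant

  [:roll_and_stamp, 'bot_traffic_split_assigned']
rescue StandardError
  # Config unreadable: fail safe to the bot and record the fallback.
  [:route_bot_untagged, 'bot_traffic_split_decision_fallback']
end
```

Keeping every non-rolling branch event-free, apart from the fallback, reflects the catalog's audit-trail column: only a measured arm or a failure leaves a trace.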

Detail 3.B — Error Response Catalog (BE)

| Condition | HTTP | Body (existing error_response envelope) |
| --- | --- | --- |
| bot_percent not integer / out of 0..100 | 422 | field error: bot_percent |
| traffic_split_enabled true but bot_percent missing | 422 | field error: bot_percent required when enabled |
| feature flag OFF / ineligible plan / role | 403 | forbidden |
| channel not owned by caller | 403/404 | ownership middleware |
| persist failure | 5xx | internal error |
| comparison date_from > date_to or range > 90 days (REV-8) | 422 | field error: date_from/date_to |
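
The date-range row (REV-8) can be sketched as a small guard. This is illustrative only: `comparison_range_error` and `MAX_RANGE_DAYS` are hypothetical names, ISO 8601 input is assumed, the span is measured as the plain difference between the two dates, and the handling of malformed dates is an assumption the catalog does not spell out.

```ruby
require 'date'

MAX_RANGE_DAYS = 90 # server-side cap from REV-8

# Hypothetical helper: returns nil when the range is acceptable, or a short
# message that a caller could attach to a 422 field error.
def comparison_range_error(date_from, date_to)
  from = Date.iso8601(date_from)
  to = Date.iso8601(date_to)
  return 'date_from must not be after date_to' if from > to
  return "range must not exceed #{MAX_RANGE_DAYS} days" if (to - from).to_i > MAX_RANGE_DAYS

  nil
rescue ArgumentError, TypeError
  # Assumption: unparseable or missing dates are treated as a field error too.
  'date_from/date_to must be ISO 8601 dates'
end

comparison_range_error('2026-06-01', '2026-06-14') # a 14-day window passes
```

Capping the range at the edge keeps the indexed aggregate bounded no matter what the client sends.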

Detail 3.C — Error Message Catalog (FE)

| Condition | Message | Action |
| --- | --- | --- |
| invalid percent | “Enter a whole number between 0 and 100” | inline; Save blocked |
| save 5xx/network | “Couldn’t save traffic split. Try again.” | toast + Retry (save_failed is emitted by the BE, REV-6) |
| save success | “Traffic split updated: 30% bot / 70% human” | toast |
| comparison load fail | “Couldn’t load comparison. Try again.” | Retry; render nothing partial |
| comparison empty range | “No conversations in this range yet” | empty state |
| bot arm no data | “No data yet” | bot column placeholder (not 0 %) |

Detail 3.D — Compliance & Data Governance

n/a — no new personal data. The variant tag is an internal routing label; analytics events carry ids only (no message content/PII). Retention inherits the Room (acts_as_paranoid) and the existing analytics TTL (PRD §5.1).

Detail 3.E — Accessibility

Reuse Pixel 3 components (labeled MpFormControl/MpFormLabel, focusable toggle/input, MpBanner with text alternative). Inline validation errors are associated to the input via MpFormErrorMessage; comparison table uses semantic table markup. Target the existing chatbot-fe a11y baseline; confirm DS components with the Pixel MCP for ARIA props.


4. Backwards Compatibility and Rollout Plan

Compatibility

  • BE: purely additive — two new columns (defaults preserve today’s behavior: traffic_split_enabled=false ⇒ 100 % bot), two new routes, one extended GET entity. No change to existing assignment/queue workers. Old clients ignore the new GET fields.
  • FE: the section renders only when the org flag + plan allow; otherwise the settings page is unchanged. No saved-state/cache migration.
  • Cross-layer: snake_case contract is stable; the extended GET is backwards-compatible (additive fields).

Rollout Strategy

  • Deploy order: Backend first (migrations + endpoints + routing flag behind the org feature gate, which is OFF), then Frontend. Rationale: the routing split and config persistence must exist and be gated before any UI can toggle them; with the feature OFF, BE deploy is a no-op for live traffic.
  • Feature-flag coordination: a single org-level entitlement (OrganizationFeature 'bot_traffic_split') gates both layers; an optional global SystemPreference rollout kill-switch is checked first (ADR-7). FE visibility and BE authorization both read the same gate, so they cannot drift into a state where the UI shows a control the BE rejects (beyond the intended 403-on-forced-call guard).
  • Rollback per layer: disabling the org feature (or the global SystemPreference) instantly reverts to 100 % bot without a deploy (PRD §10.1). Code rollback: revert FE first, then BE; the additive columns can remain (inert when the flag is OFF).
  • Stop conditions: decision_fallback > 1 % sustained 24h (PRD §10.1) → disable the org flag; human-arm queue p90 > 15 min during experiment hours → investigate staffing / pause experiment.

Detail 4.A — Cross-Layer Rollout Compatibility Matrix

| Scenario | FE | BE | Works? | Mitigation |
| --- | --- | --- | --- | --- |
| Pre-deploy | Old | Old | yes | baseline (100 % bot) |
| Backend first | Old | New (flag OFF) | yes | new columns default to today’s behavior; no UI yet |
| Frontend first | New | Old | yes | section gated by feature flag; with BE old the flag is absent ⇒ section hidden; do not enable the flag until BE is live |
| Both deployed | New | New | yes | target state (flag enabled per org during rollout) |
| Backend rollback | New | Old | yes | FE section hidden (flag gone); no broken calls |
| Frontend rollback | Old | New (flag OFF) | yes | routing inert while flag OFF; config simply not editable from UI |

No “no” cells. The only ordering rule: enable the org flag only after BE is deployed.

Detail 4.B — Configuration Contract

| Layer | Env var / flag | Type | Default | Required | Provisioner | Secret? |
| --- | --- | --- | --- | --- | --- | --- |
| BE | OrganizationFeature code='bot_traffic_split' | per-org feature row | OFF (absent) | yes (to enable) | Ops/Commercial enablement | no |
| BE | SystemPreference group_code='rollout', code='bot_traffic_split' (optional kill-switch) | global flag | OFF/absent | no | Ops | no |
| BE | channel_integrations.traffic_split_enabled | boolean column | false | per-channel | Admin via UI | no |
| BE | channel_integrations.bot_percent | integer 0..100 | 100 | per-channel | Admin via UI | no |
| FE | reads org flag via checkSubscription('bot_traffic_split') | runtime | derived from BE | | | no |

Detail 4.C — Test Plan (commands the agent will run)

Commands sourced from the repos (BE AGENTS.md / .rspec; FE package.json).

| Layer | Command (source) | What it must prove |
| --- | --- | --- |
| BE unit/use-case | bundle exec rspec spec/core/use_cases/system/hub/process_incoming_message_with_resolve_spec.rb (exists; extend it) | split step: bot/human arms; non-bot path skipped (ADR-9); disabled→bot with variant NULL; fail-safe→bot+fallback; no re-roll on 2nd message; race loser routes per persisted variant (ADR-3) |
| BE unit (worker reuse) | bundle exec rspec spec/app/worker/send_message_auto_assign_agent_worker_spec.rb (exists) | human-arm assignment/queue unchanged (BTS-S03) |
| BE request | bundle exec rspec spec/api/frontend_service/v1/channel_integration_spec.rb (new, to be created; REV-7) | PATCH persists bot_percent; 422 invalid/over-range date; 403 flag-off/ineligible/SPV-write; comparison schema + WHERE variant IS NOT NULL |
| BE full | bundle exec rspec | no regressions |
| BE static | bundle exec rubocop ; bundle exec reek (AGENTS.md §78-79) | style + smell clean |
| FE unit | pnpm test (vitest run; package.json:17) | section: invalid blocks Save; toast on success; comparison renders rows + “No data yet” |
| FE E2E | pnpm test:e2e (playwright; package.json:22; specs under tests/visual/) | configure split end-to-end; gated when flag OFF |
| FE lint | pnpm lint (package.json) | TS + prettier clean |
| Cross-layer | manual/integration: save % → send messages → observe arm distribution ≈ bot_percent + comparison reflects arms | routing fidelity (SC-1) + end-to-end contract |

Detail 4.D — Agent Execution Plan

| Order | Layer | Chunk | Files to modify/create | Commands | Acceptance criteria |
| --- | --- | --- | --- | --- | --- |
| 1 | BE | Migrations: channel config + room variant | db/migrate/<ts>_add_traffic_split_to_channel_integrations.rb, db/migrate/<ts>_add_variant_to_rooms.rb, db/schema.rb | bundle exec rails db:migrate | schema has traffic_split_enabled, bot_percent (+CHECK), rooms.variant (+indexes); rails db:migrate:status green |
| 2 | BE | Split decision (apply_arm) + variant stamp + events in hub | app/core/use_cases/system/hub/process_incoming_message_with_resolve.rb, app/models/room.rb (variant constant/validation) | bundle exec rspec spec/core/use_cases/system/hub/process_incoming_message_with_resolve_spec.rb (extend) | stub rand: <pct→bot/intent kept; ≥pct→human (is_auto_assign_agent true & intent/agent/division/crm cleared); non-bot path skipped (ADR-9), variant NULL; disabled→bot, variant NULL (REV-3); error→bot+decision_fallback, variant NULL; 2nd message no re-roll; race loser routes per persisted variant (ADR-3); bot_traffic_split_assigned emitted only when tagged |
| 3 | BE | Config write endpoint + use case + entity + GET extension | app/api/frontend_service/v1/channel_integration.rb, app/core/use_cases/api/frontend_service/v1/channel_integration/update_traffic_split.rb (new), response entities, docs/openapi/openapi.yaml | bundle exec rspec spec/api/frontend_service/v1/channel_integration_spec.rb (new, to be created) | 200 persists; disable preserves bot_percent (REV-10); 422 invalid; 403 flag-off/ineligible-plan/SPV-write (REV-9/REV-11); GET returns the 2 fields; config_saved + save_failed-on-5xx emitted |
| 4 | BE | Comparison read endpoint + aggregate repo | channel_integration.rb (+route), comparison use case + repository, entity, openapi | bundle exec rspec spec/api/frontend_service/v1/channel_integration_spec.rb (new) | aggregates WHERE variant IS NOT NULL (REV-3); typed schema (REV-4) incl. resolution_parity, *_basis; range>90d→422 (REV-8); bot 0 rows→no_data:true; CSAT null tolerated; 403 gated |
| 5 | BE | Static analysis | | bundle exec rubocop ; bundle exec reek ; bundle exec rspec | clean; full suite green |
| 6 | FE | API client: endpoints + service methods | common/services/main/endpoint.ts, common/services/main/v1/channel-integration.ts | pnpm lint | updateTrafficSplit (PATCH …/traffic_split) + getTrafficSplitComparison exist & typed |
| 7 | FE | Traffic Split section + store action | modules/settings/.../TrafficSplitSection.vue (new), Pinia action, mount in pages/chat/settings/index.vue (gated by checkSubscription) | pnpm test | toggle/%/preview/Save; Vuelidate integer+between(0,100) blocks invalid; success toast; hidden when flag OFF |
| 8 | FE | Comparison view | modules/settings/.../BotHumanComparison.vue (new), service wire-up, mount | pnpm test | renders two-arm pixel-table; skeleton/empty/error+Retry; “No data yet” for bot 0 rows |
| 9 | FE | E2E + lint | tests/visual/** spec | pnpm test:e2e ; pnpm lint | configure-split journey passes; gated path passes |

Order rule: BE chunks 1→5 land (flag OFF, no live impact) before FE chunks 6→9. Each chunk’s ACs must pass before the next opens.

Detail 4.E — Verification & Rollback Recipe

  • Pre-merge verification (in order):
    • BE: 1) bundle exec rails db:migrate (+ db:rollback round-trip to prove reversibility); 2) bundle exec rspec; 3) bundle exec rubocop ; bundle exec reek.
    • FE: 1) pnpm lint; 2) pnpm test; 3) pnpm test:e2e.
  • Post-deploy verification signals:
    • bot_traffic_split_assigned volume > 0 on a pilot channel after enabling the flag; observed bot share ≈ configured bot_percent (SC-1).
    • bot_traffic_split_decision_fallback ≈ 0 (alert if > 1 %/1h).
    • bot_traffic_split_config_saved emitted on a test save; GET returns the saved value.
    • human-arm queue wait p90 within bounds during experiment hours.
  • Rollback recipe (deploy-order-aware):
    1. Immediate: disable OrganizationFeature 'bot_traffic_split' for affected orgs (or flip the global SystemPreference kill-switch) → instant revert to 100 % bot, no deploy. (Already-decided rooms keep their variant; harmless.)
    2. If code-level revert needed: roll back FE (section disappears), then BE routing change.
    3. The additive columns/indexes may remain (inert when the flag is OFF); drop only if fully abandoning, via a reverse migration.

Detail 4.F — Resource & Cost Notes (advisory)

Negligible: two columns + two indexes on existing tables; one extra UPDATE per new conversation; async Mixpanel events on the existing :event_tracker queue. Comparison endpoint is a bounded indexed aggregate. No new infrastructure.


5. Concern, Questions, or Known Limitations

| # | Type | Question / concern | Owner | Blocking? |
| --- | --- | --- | --- | --- |
| Q-1 | Decision to confirm | Config scope: ADR-1 stores traffic_split_enabled/bot_percent on channel_integrations (per channel), reconciling the PRD’s “Path config” wording. Confirm per-channel (not per-paths) is intended. | PM + Eng | yes: sets the migration target |
| Q-2 | Dependency (Data) | Canonical “resolved” definition segmentable by variant for the ⭐ resolution-parity KPI (PRD §15 Q2). ADR-8 ships a resolved_at/is_closed proxy until confirmed. | PM + Data | no for routing; yes for KPI accuracy |
| Q-3 | Dependency / Risk | CSAT source joinable to variant; not all channels collect CSAT (PRD §15 Q3, §6.1 D3). Comparison degrades to “CSAT not available”. | PM + Data | no: secondary metric |
| Q-4 | Design | Figma frames for Screen A & B (PRD §6.1 D1–D4): inline section vs own tab; starting-% guard-rail copy; parity-formula display. | Designer | yes for FE pixel-polish; structure can start |
| Q-5 | ✅ Resolved (REV-9) | SPV write restriction, decided: write endpoint uses set_role(%w[owner admin]); read/comparison keep owner/supervisor/admin. Diverges deliberately from channel_integration.rb:89. (Confirm with PM is a courtesy, not a blocker.) | Eng + PM | no (decided) |
| Q-6 | Limitation | No i18n in chatbot-fe (verified): new strings are hardcoded per current convention. Acceptable for Phase 1? | FE lead | no |
| Q-7 | To verify | PaperTrail on ChannelIntegration: confirm the model is audited so config changes are tracked (the config-save audit relies on it + the Mixpanel event). | Eng | no: event covers analytics regardless |
| Q-8 | Decision (PRD §15 Q5) | Starting-% guard-rail: advisory helper text only (not an enforced cap). Confirm. | PM | no |
| Q-9 | Decision (PRD §15 Q1) | Plan eligibility: which plans get Traffic Split (proposed Professional + Enterprise w/ chatbot). The enforcement path is fixed (REV-11: UseMekariBilling + subscription check); only the eligible-plan list is open. | PM + Commercial | no: list only; path decided |
| L-1 | Known limitation | Handover derivation (variant='bot' AND assigned_at IS NOT NULL proxy vs the dedicated bot_arm_handover_to_human event): confirm the canonical signal with Data (ADR-8); response self-labels the basis. | Eng + Data | no |
| REV-12 | Citation drift (R2 review) | ADR-5 branch line numbers stale. ADR-5 cites the send_message_assign_agent branch order as agent_id (:539) and intent_id (:548); current HEAD (fa6dd8b79) has elsif agent_id.present? at :538 and elsif intent_id.present? at :547. The other anchors ADR-5 relies on (crm_intent_id :536, division_id :543, is_auto_assign_agent :545, plus :84/:501-502) are exact. Cosmetic: refresh ADR-5 to :538/:547. | Eng | no |

Review reconciliation (R1 — see -review.md): findings REV-1, REV-2, REV-3, REV-4, REV-5, REV-6, REV-7, REV-8, REV-9, REV-10, REV-11 are all addressed in this revision (ADR-3/4/5/8/9, §2.1–§2.4, §3, §4.C/§4.D). Only Q-1 (config scope, PM confirm) remains a true pre-build blocker; Q-2/Q-3/Q-4/Q-7 and L-1 refine the comparison view but do not block routing/config.


6. Comment logs

| Date | Comment(s) From | Action Item(s) |
| --- | --- | --- |
| 2026-06-20 | RFC author (drafted from PRD v1.2 via rfc-starter) | Initial draft (R0) |
| 2026-06-20 | rfc-reviewer cycle R1 (-review.md, score 7.5 → PROCEED) | 11 findings raised (REV-1..11) |
| 2026-06-20 | RFC author (R1 fixes) | Addressed REV-1..11: ADR-3 route-by-persisted-variant; ADR-9 split scope; variant tagged only under active split; pinned comparison schema; BE-emitted save_failed; handover emit point; SPV write owner/admin; date-range cap; disable-preserves-bot_percent; plan-gate path. Only Q-1 remains a pre-build blocker. |

7. Ready for agent execution

yes — for the routing + configuration scope (BTS-S01, S02, S03, and the NEG guard rails). yes (with labelled proxies) — for the comparison view (BTS-S04): after R1, the endpoint contract is fully typed (§2.4) and the v1 resolution/handover metrics are self-labelled proxies (resolution_basis/handover_basis), so an agent can build it now; KPI accuracy firms up once the Data-squad confirms Q-2/Q-3/L-1, and visual polish awaits the Q-4 Figma frames. The one true pre-build blocker is Q-1 (config-scope PM confirmation, the migration target).

Post-review (R1): the rfc-reviewer pass scored the R0 draft 7.5 / Strong / PROCEED with 11 findings; all 11 are addressed in this revision (see Comment log + the review file ledger). The two score-capping gaps (ACV/DIC from REV-1/REV-4) are closed: routing now routes-by-persisted-variant and the comparison contract is pinned.

Readiness-gate status:

  • §1 Design References (FE): surfaces listed; both frames n/a — design pending (Open Q-4) — structural build allowed, pixel-polish gated. ✅ (with noted gap)
  • §1 PRD-to-Schema Derivation (BE): every entity/attribute/rule mapped to table.column + endpoint/event + enforcement. ✅
  • Detail 1.C Per-Story Change Map: all 7 stories, one row each, layer scope + FE/BE + verifiable AC. ✅
  • Repo Reading Guide (2.0): anchors for both layers; contracts classified reuse/extend/new. ✅
  • Source Verification: every anchor/pattern/contract carries concrete file:line evidence; unverifiable Data items moved to Open Questions (not invented). ✅
  • Design ↔ Code Mapping (FE): frames mapped to new SFCs + backing endpoints; tokens via Pixel MCP at build. ✅ (design-pending noted)
  • Asset Inventory: no new bespoke assets; new copy strings flagged (no i18n, Q-6). ✅
  • Mermaid diagrams: repo map, end-to-end component, ER, state, branch/skip, sequence (happy + failure, both flows). ✅
  • DDL: complete with per-status lifecycle for variant; every row traces to a PRD-to-Schema row. ✅
  • APIs: outbound (1 extended, 2 new-with-justification) + inbound (n/a); each tagged. ✅
  • Cross-Layer Contract Verification: all rows yes. ✅
  • End-to-End Data Flow: traced for save, route, view. ✅
  • UI State Matrix / Failure Catalog / Error catalogs: complete and aligned. ✅
  • Cross-Layer Rollout Matrix: complete; deploy order = BE-first, enable-flag-after-BE. ✅
  • Configuration Contract: per-layer; single org flag + optional kill-switch. ✅
  • Agent Execution Plan: 9 ordered chunks, each files + commands + assertable AC. ✅
  • Verification & Rollback Recipe: runnable per-layer commands; named signals; flag-flip rollback. ✅

Optional next step: hand to rfc-reviewer for a second-pass score. Confirm Open Q-1/Q-5/Q-9 with PM before starting Detail 4.D chunk 3 (config endpoint).