Qontak | Chatbot & AI | Unified Agent Quality Scorecard — Phase 8: Multi-Agent Scoring & Selectable Scorecard

Template: PHASE PRD v1.2 · Companion to PRD Section Reference v1.5 + Hierarchy v1.0

Note: Phase 8 (parked) of the initiative — a human-QA scaling capability, independent of the AI-agent phases (1–5). It extends the in-room Scorecard panel to score every agent who served a room, each on a selectable scorecard. Grounded in the cloned hub-chat + chatbot code.

HEADER BLOCK

Field	Value
PM	Dimas Fauzi Hidayat
PRD Version	1.0
Status	DRAFT
PRD Type	PHASE
Epic	QC-XXXXX — add once Epic is created
Squad	BOT — Bot, AI & Automation (with Omnichannel/Chat)
RFC Link	Pending — RFC to follow via `rfc-starter`
Figma Master	The provided design mock (Agent + Scorecard dropdowns in the in-room panel); frame link pending from Design
Anchor	Qontak | Chatbot & AI | Unified Agent Quality Scorecard — ANCHOR
Labels	`epic:qontak-chatbot-ai` | `module:chatbot-ai` | `feature:unified-agent-scorecard`
Last Updated	2026-06-19

Table of Contents

HEADER BLOCK
2. CONDITIONAL BLOCK: PHASE CONTEXT
3. One-liner + Problem
4. What Happens If We Don't Ship This Phase
5. Target Users + Persona Context
6. Non-Goals
7. Constraints
8. Feature Changes
9. API & Webhook Behavior
10. System Flow + User Stories + ACs
- 10.1 System Flow
- 10.2 User Stories
11. Rollout
12. Observability
- 12.1 Post-Launch Monitoring Cadence
13. Success Metrics
14. Launch Plan & Stage Gates
15. Dependencies
16. Key Decisions + Alternatives Rejected
17. Open Questions
PRD CHANGELOG

2. CONDITIONAL BLOCK: PHASE CONTEXT

Field	Detail
Anchor PRD	Qontak | Chatbot & AI | Unified Agent Quality Scorecard — ANCHOR
Phase	Phase 8 of 8
Phase Goal	Score every agent who served a room — each on a selectable scorecard template — via the Agent + Scorecard selectors.
Prior phases	Phases 1–5 deliver the AI-agent scoring core; Phase 2 shipped the in-room Scorecard panel that this phase extends. This phase is a human-QA scaling effort, independent of AI-agent scoring.
Deferred to later phases	None — this is the last planned phase (parked).
Cross-phase dependencies	Reuses (1) the Phase 2 in-room Scorecard panel, (2) the existing `agent_scorecard` model + the per-room `GET/POST/PATCH /agent_scorecards/{roomId}` API, (3) the `roomParticipant[]` array, and (4) the handover events (`agent_take_room` / `remove_agent` / `handover_id`).

3. One-liner + Problem

One-liner: Let a supervisor score any agent who handled a room — not just the first — and pick which scorecard template applies to each.

Problem: Today the in-room Scorecard panel scores only one agent per room, `selectedAgent = roomParticipant[0]` (the first participant), against a single org-wide scorecard. When a room is handed over (agent takeover), the agent who actually resolved it is often not the first participant, so supervisors can't QA the real handler, and there is no way to apply a role-appropriate scorecard (e.g. a different template for sales vs support). The `(org, room_id, agent_id)` unique key and the `roomParticipant[]` array already support multiple agents; only the UI and the API are limited to resolving a single one. This caps QA coverage on multi-agent rooms and forces one-size-fits-all criteria.

4. What Happens If We Don't Ship This Phase

Multi-agent rooms stay mis-QA'd — only the first participant is scorable; the agent who actually resolved a handed-over room goes unscored, so QA coverage has a blind spot that grows with handover volume.
One-size-fits-all scoring persists — no role/use-case-appropriate scorecard, so criteria don't fit every agent (a sales agent judged on support parameters).
The design mock stays unbuilt — the Agent + Scorecard dropdowns are a known, parked backlog gap with an existing design.

5. Target Users + Persona Context

Primary Persona: QA Lead / Supervisor

Field	Detail
Role	QA Lead or Supervisor reviewing conversation quality, including handed-over (multi-agent) rooms
Goal	Score the agent who actually handled the room (any of them), each against the right scorecard template
Pain	The panel only exposes the first participant and one scorecard; the real resolver is often unscorable
Workaround	Skipping QA on handover rooms, or scoring the wrong (first) agent

Secondary Persona: Team Lead (mixed-role team)

Field	Detail
Role	Lead of a team spanning roles (sales + support) on shared rooms
Goal	Apply role-appropriate scorecards per agent rather than one generic template
Pain	A single org-wide scorecard can't fairly judge agents in different roles
Workaround	Mentally adjusting scores, or maintaining role criteria in spreadsheets

6. Non-Goals

Not AI-agent scoring — Phases 1–5 own that; this phase is human-QA scaling.
Not authoring/creating scorecard templates — templates are assumed to exist; this phase only selects among them.
Not the Analytics report — Phase 3.
Not the go-live gate — Phase 5.
No change to the auto-scorer logic (`auto_agent_scoring.rb` still picks the first agent — see Open Q#2).
No mobile — web only.
Not per-team permission scoping — tracked separately (the org-wide RBAC caveat).

7. Constraints

Field	Value
Platform	Web only — Qontak omnichannel web app
Performance	Agent/scorecard switch in the panel ≤ 500ms; scorecard fetch ≤ 800ms P95
Data limits	One scorecard record per `(org, room_id, agent_id)` (existing unique key); the existing edit-once rule applies per record (`edit_count` / `correction_at`)
Plan scope	Professional + Enterprise (Agent Scorecard tier)
Feature flag	`scorecard_multi_agent` | default: OFF
Read/write	Read/score: Usman `inbox_scorecard_view` / `inbox_scorecard_manage` (org-wide today — no team scope). End CS agents: no access.
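
The gating rules in the table above (flag, plan scope, permissions) can be sketched as a single check. This is an illustrative sketch only: the `SupervisorContext` shape, the `selectorAccess` name, and the way permissions are passed in are assumptions, not the actual Usman or feature-flag API.

```typescript
// Plan names and the context shape are assumptions for illustration.
type Plan = "Free" | "Starter" | "Professional" | "Enterprise";
type SelectorAccess = "hidden" | "read_only" | "score";

interface SupervisorContext {
  plan: Plan;
  flagScorecardMultiAgent: boolean; // value of the scorecard_multi_agent flag
  permissions: string[];            // permission names granted to the user
}

// Decide whether the Agent + Scorecard selectors render, and in which mode.
function selectorAccess(ctx: SupervisorContext): SelectorAccess {
  const planAllowed = ctx.plan === "Professional" || ctx.plan === "Enterprise";
  if (!ctx.flagScorecardMultiAgent || !planAllowed) return "hidden";
  if (ctx.permissions.indexOf("inbox_scorecard_manage") >= 0) return "score";
  if (ctx.permissions.indexOf("inbox_scorecard_view") >= 0) return "read_only";
  return "hidden"; // e.g. end CS agents, who have no scorecard access
}
```

With the flag OFF the selectors stay hidden and the panel keeps today's single-agent behavior.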

8. Feature Changes

Change ID: CHG-A — Agent selector in the in-room Scorecard panel

Field	Detail
Change Type	Modified component (in-room Scorecard panel)
Page	/inbox — conversation right panel → Scorecard tab
Page Intent	A supervisor scores an agent's quality for the open conversation
Before	The panel scores only the first participant — `selectedAgent = roomParticipant[0]` (`AgentScorecard.vue`); other agents who served the room are not selectable or scorable.
After	An Agent dropdown lists every agent who served the room (from `roomParticipant[]` / `roomAgent[]`); selecting an agent loads/creates that agent's scorecard record, stored per `(room, agent_id)`.

Element	Before	After
Agent in scope	Hardcoded `roomParticipant[0]`	Selectable from all room agents
Scorecard records	One per room (first agent)	One per `(room, agent_id)` (existing key)

Figma: The provided design mock (Agent dropdown).
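
As a rough illustration of CHG-A, the dropdown options could be derived from `roomParticipant[]` as below. The entry fields (`id`, `name`) and both function names are hypothetical; the real participant shape in `AgentScorecard.vue` may differ.

```typescript
// Hypothetical shape of a roomParticipant[] entry; real fields may differ.
interface RoomParticipant {
  id: string;
  name: string;
}

interface AgentOption {
  agentId: string;
  label: string;
}

// Build the Agent dropdown options: every distinct agent who served the room,
// in first-seen order, so the first option matches today's roomParticipant[0].
function buildAgentOptions(participants: RoomParticipant[]): AgentOption[] {
  const seen: Record<string, boolean> = {};
  const options: AgentOption[] = [];
  for (const p of participants) {
    if (seen[p.id]) continue; // an agent who re-took the room is listed once
    seen[p.id] = true;
    options.push({ agentId: p.id, label: p.name });
  }
  return options;
}

// Default selection stays the first agent, preserving single-agent behavior.
function defaultAgentId(options: AgentOption[]): string | null {
  const first = options[0];
  return first ? first.agentId : null;
}
```

Keeping the first agent as the default means a single-agent room renders exactly one option and behaves as it does today.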

Change ID: CHG-B — Scorecard template selector

Field	Detail
Change Type	Modified component (in-room Scorecard panel)
Page	/inbox — conversation right panel → Scorecard tab
Page Intent	Apply the right scoring criteria to the selected agent
Before	A single org-wide enabled scorecard is loaded; the "Scorecard: Default" label is static — no choice.
After	The Scorecard dropdown lists the org's available scorecard templates; choosing one loads its categories/parameters for the selected agent. Requires the scorecard model/API to expose multiple templates (see Open Q#1).

Element	Before	After
Scorecard template	Single org-wide ("Default", static)	Selectable per agent/room

Figma: The provided design mock (Scorecard dropdown).

9. API & Webhook Behavior

Behavior 1: Get/submit a scorecard for a specific (room, agent, template)

Field	Detail
Entity affected	`agent_scorecard` record keyed by `(org, room_id, agent_id)`
Triggered by	Supervisor selects an agent (and template) in the panel, then loads/submits
Information passed	`room_id`, `agent_id`, `scorecard_id` (template)
Expected behavior	Extend the per-room `GET/POST/PATCH /api/v1/gpt/omnichannel/agent_scorecards/{roomId}` to accept `agent_id` + `scorecard_id`; return or persist that agent's record on the chosen template
Failure behavior	• Selected agent is not a room participant → 422. • Record already corrected (edit-once) → load read-only / block further edit. • Save fails → error + retry.
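
A sketch of how a client might address the extended endpoint. The path comes from the row above; carrying `agent_id` and `scorecard_id` as query parameters on `GET` and as body fields on `POST`/`PATCH` is an assumption pending the RFC, as are the helper names.

```typescript
type Method = "GET" | "POST" | "PATCH";

interface ScorecardRequest {
  method: Method;
  url: string;
  body?: { agent_id: string; scorecard_id: string };
}

const BASE_PATH = "/api/v1/gpt/omnichannel/agent_scorecards";

// Assumption: reads pass the selectors as query params, writes pass them in the body.
function buildScorecardRequest(
  method: Method,
  roomId: string,
  agentId: string,
  scorecardId: string
): ScorecardRequest {
  const path = `${BASE_PATH}/${encodeURIComponent(roomId)}`;
  if (method === "GET") {
    const query =
      `agent_id=${encodeURIComponent(agentId)}` +
      `&scorecard_id=${encodeURIComponent(scorecardId)}`;
    return { method, url: `${path}?${query}` };
  }
  return { method, url: path, body: { agent_id: agentId, scorecard_id: scorecardId } };
}

// Participant guard mirroring the failure row: a non-participant yields 422.
function participantStatus(agentId: string, participantIds: string[]): 200 | 422 {
  return participantIds.indexOf(agentId) >= 0 ? 200 : 422;
}
```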

Behavior 2: List scorecard templates for the org

Field	Detail
Entity affected	Scorecard templates (category sets) available to the org
Triggered by	Panel opens / the Scorecard dropdown is opened
Information passed	`organization_id`
Expected behavior	Return the org's available scorecard templates for selection
Failure behavior	• None enabled → fall back to the Default scorecard. • Fetch fails → error; keep the last-loaded template.
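
The two fallback rules above (none enabled, fetch fails) can be sketched as a small state resolver. The template shape, the `enabled` field, and the `DEFAULT_TEMPLATE` constant are hypothetical; whether multiple templates exist at all is still Open Q#1.

```typescript
// Hypothetical template shape; the real scorecard model may differ.
interface ScorecardTemplate {
  id: string;
  name: string;
  enabled: boolean;
}

interface TemplateState {
  options: ScorecardTemplate[];
  active: ScorecardTemplate | null;
  error: boolean;
}

const DEFAULT_TEMPLATE: ScorecardTemplate = { id: "default", name: "Default", enabled: true };

// fetched === null models a failed fetch; lastLoaded is the template shown before.
function resolveTemplateState(
  fetched: ScorecardTemplate[] | null,
  lastLoaded: ScorecardTemplate | null
): TemplateState {
  if (fetched === null) {
    // Fetch failed: surface an error and keep the last-loaded template active.
    return { options: lastLoaded ? [lastLoaded] : [], active: lastLoaded, error: true };
  }
  const enabled = fetched.filter((t) => t.enabled);
  // None enabled: fall back to the Default scorecard.
  const options = enabled.length > 0 ? enabled : [DEFAULT_TEMPLATE];
  const lastId = lastLoaded ? lastLoaded.id : null;
  const kept = options.filter((t) => t.id === lastId)[0];
  return { options, active: kept ?? options[0] ?? null, error: false };
}
```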

Claude resolves during RFC: HTTP method, path, request/response JSON schema, error codes.

10. System Flow + User Stories + ACs

10.1 System Flow

Flow: Score a Specific Agent on a Chosen Scorecard
Type: User Journey

A supervisor opens a resolved room's Scorecard panel.
The Agent dropdown lists every agent who served the room (from roomParticipant[]).
The supervisor selects an agent → the panel requests that agent's scorecard record (room_id + agent_id).
The supervisor picks a template from the Scorecard dropdown → its categories/parameters load.
Decision — does a record already exist for (room, agent)? Yes → load it (read-only if the edit-once slot is used). No → start a new score.
The supervisor scores (thumbs/values) and submits → persisted per (room, agent_id), with the scorecard_id of the chosen template stored on the record.
(Optional) each agent's score is scoped to the turns they handled, derived from the handover events.
Failure branch — the selected agent is not a participant → error; the dropdown only lists valid room agents.

📊 System Flow — Multi-Agent Scoring

graph TD
    A[Supervisor opens resolved room Scorecard panel] --> B[Agent dropdown lists all room agents]
    B --> C[Select an agent]
    C --> D[Scorecard dropdown lists templates]
    D --> E[Select a template -> categories load]
    E --> F{Record exists for room+agent?}
    F -->|Yes| G[Load record - read-only if edit-once used]
    F -->|No| H[Start new score]
    G --> I[Score + submit -> persist per room+agent with template recorded]
    H --> I
    C -.invalid agent.-> J[Error - dropdown only lists valid room agents]
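
The decision node in the flow (does a record exist, and is its edit-once slot used) can be sketched as follows. The record shape and the lock rule (`edit_count >= 1` or a non-null `correction_at`) are assumptions drawn from the field names in Constraints; the actual edit-once semantics should be confirmed against the `agent_scorecard` model.

```typescript
// Hypothetical record shape based on the fields named in this PRD.
interface AgentScorecardRecord {
  org_id: string;
  room_id: string;
  agent_id: string;
  scorecard_id: string;
  edit_count: number;
  correction_at: string | null;
}

type PanelMode = "new" | "editable" | "read_only";

// Composite key mirroring the (org, room_id, agent_id) unique key.
function recordKey(orgId: string, roomId: string, agentId: string): string {
  return [orgId, roomId, agentId].join("::");
}

// Resolve how the panel opens for the selected agent.
function resolvePanelMode(
  store: Record<string, AgentScorecardRecord>,
  orgId: string,
  roomId: string,
  agentId: string
): PanelMode {
  const record = store[recordKey(orgId, roomId, agentId)];
  if (!record) return "new"; // no record yet: start a new score
  // Assumption: the edit-once slot is used once a correction has been made.
  const locked = record.edit_count >= 1 || record.correction_at !== null;
  return locked ? "read_only" : "editable";
}
```

Because the key includes `agent_id`, scoring one agent never touches another agent's record on the same room.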

10.2 User Stories

[P8-S01] — Score a specific agent via the Agent selector


User Story	As a Supervisor, I want to pick any agent who served the room and score them, so that I can QA the agent who actually handled the conversation, not just the first.
Before State	The panel scores only `roomParticipant[0]`; other agents who served the room are not selectable.
After Delta	An Agent dropdown lists all room agents; selecting one loads/creates that agent's scorecard record per `(room, agent_id)`.
Importance	Must Have
Mockup / Technical Notes	Figma: The provided design mock (Agent dropdown) Data Fields: • `room_id` (string, required) — conversation • `agent_id` (string, required) — selected agent • `roomParticipant[]` (array) — source of the agent list Technical Notes: Extends `AgentScorecard.vue` (`selectedAgent` no longer hardcoded `[0]`) + the per-room API to be agent-aware. Records reuse the `(org, room_id, agent_id)` unique key.
Acceptance Criteria	— Happy Path — • AC-1: Given a resolved room served by ≥2 agents, when the supervisor opens the panel, then the Agent dropdown lists every agent who served the room. • AC-2: Given the supervisor selects an agent, when the panel loads, then that agent's scorecard record is fetched (or a new one is started) for `(room, agent_id)`. • AC-3: Given the supervisor scores and submits, when saved, then the record persists against the selected agent without affecting other agents' records on the same room. — Edge — • AC-4: Given an agent already has a saved (and edit-once-locked) record, when selected, then their score loads read-only. — Error / Unhappy Path — • ERR-1: Given the selected agent is not a room participant, when the panel requests their record, then a 422/"not a participant" state is shown and `scorecard_invalid_agent` is logged. — Permission Model — • CAN: Usman `inbox_scorecard_manage` (org-wide). • CANNOT: `inbox_scorecard_view` is read-only; end CS agents none. • Unauthorized: panel/dropdown not rendered. — UI States — • Loading: dropdown + form skeleton while fetching. • Empty: single-agent room → dropdown shows one agent (today's behavior). • Error: as ERR-1. • Success: selected agent's scorecard shown. — Negative Scenarios — • NEG-1: Given a room with only one agent, when the panel opens, then it behaves exactly as today (no regression).

Dependencies: Phase 2 in-room panel.

[P8-S02] — Choose a scorecard template via the Scorecard selector


User Story	As a Supervisor, I want to choose which scorecard template applies to the agent I'm scoring, so that the criteria fit the agent's role/use-case.
Before State	One org-wide scorecard loads; the "Scorecard: Default" label is static — no choice.
After Delta	A Scorecard dropdown lists the org's available templates; choosing one loads its categories/parameters for the selected agent.
Importance	Must Have
Mockup / Technical Notes	Figma: The provided design mock (Scorecard dropdown) Data Fields: • `organization_id` (string, required) — Auth session • `scorecard_id` (string, required) — chosen template Technical Notes: Needs the scorecard model/API to expose multiple templates — confirm whether multiple templates exist today or are net-new (Open Q#1).
Acceptance Criteria	— Happy Path — • AC-1: Given the panel is open, when the supervisor opens the Scorecard dropdown, then the org's available scorecard templates are listed. • AC-2: Given the supervisor selects a template, when it loads, then its categories/parameters render for the selected agent. • AC-3: Given a template is chosen and a score submitted, when saved, then the record records which `scorecard_id` (template) was used. — Edge — • AC-4: Given the org has only the Default scorecard, when the dropdown opens, then only "Default" is listed (no regression). — Error / Unhappy Path — • ERR-1: Given the template list fetch fails, when the dropdown opens, then an error is shown and the last-loaded template stays active; `scorecard_template_fetch_failed` is logged. — Permission Model — • CAN: `inbox_scorecard_manage`. • CANNOT: `inbox_scorecard_view` read-only; end agents none. • Unauthorized: control not rendered. — UI States — • Loading: dropdown spinner while listing templates. • Empty: only Default available. • Error: as ERR-1. • Success: chosen template loaded. — Negative Scenarios — • NEG-1: Given a Starter/Free org, when the panel opens, then template selection is not available (plan-gated).

Dependencies: P8-S01.

[P8-S03] — Scope each agent's score to the turns they handled


User Story	As a Supervisor, I want each agent scored on the turns they actually handled, so that handover doesn't credit/penalize the wrong agent.
Before State	Scoring is over the whole room; there's no per-agent turn segmentation.
After Delta	The panel scopes the conversation shown for an agent to their handled turns, derived from the handover events (`agent_take_room` / `remove_agent` / `handover_id`).
Importance	Should Have
Mockup / Technical Notes	Figma: The provided design mock Data Fields: • `room_id`, `agent_id` (required) • `handover_id` / handover events — segment boundaries Technical Notes: Same segmentation mechanism as Phase 2 (P2/Open Q#3); confirm events fire for human→human handover.
Acceptance Criteria	— Happy Path — • AC-1: Given a room handed over from Agent A to Agent B, when the supervisor selects Agent A, then the turns shown are scoped to A's handled segment (before the handover). • AC-2: Given the supervisor selects Agent B, when the panel loads, then the turns shown are scoped to B's segment (after the handover). — Edge — • AC-3: Given handover events are missing for a room, when an agent is selected, then the panel falls back to the full transcript and flags "segment unavailable". — Permission Model — • CAN: `inbox_scorecard_view` / `manage`. • CANNOT: end agents. • Unauthorized: not rendered. — UI States — • Loading: transcript skeleton. • Empty: N/A. • Error: "segment unavailable" → full transcript fallback. • Success: segmented transcript shown. — Negative Scenarios — • NEG-1: Given a single-agent room, when an agent is selected, then the full transcript is shown (no segmentation needed).

Dependencies: P8-S01.
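
A minimal sketch of the segmentation described in P8-S03, assuming each handover event carries an event type, an agent id, and a timestamp, and that a segment closes on the agent's `remove_agent` or on another agent's `agent_take_room`. The event and message shapes and the function names are hypothetical, and `handover_id` is not modeled here.

```typescript
// Hypothetical event/message shapes; real payloads may differ.
interface HandoverEvent {
  type: "agent_take_room" | "remove_agent";
  agent_id: string;
  at: number; // epoch millis
}
interface Message { id: string; at: number; }
interface Segment { from: number; to: number; } // half-open interval [from, to)

// Derive the time windows during which the given agent held the room.
function segmentsForAgent(events: HandoverEvent[], agentId: string): Segment[] {
  const sorted = events.slice().sort((a, b) => a.at - b.at);
  const segments: Segment[] = [];
  let openFrom: number | null = null;
  for (const e of sorted) {
    if (e.type === "agent_take_room") {
      if (e.agent_id === agentId) {
        if (openFrom === null) openFrom = e.at;
      } else if (openFrom !== null) {
        // Another agent took the room: close this agent's segment.
        segments.push({ from: openFrom, to: e.at });
        openFrom = null;
      }
    } else if (e.agent_id === agentId && openFrom !== null) {
      segments.push({ from: openFrom, to: e.at });
      openFrom = null;
    }
  }
  if (openFrom !== null) segments.push({ from: openFrom, to: Infinity });
  return segments;
}

// Scope the transcript to the agent's turns, with the AC-3 / NEG-1 fallbacks.
function turnsForAgent(
  messages: Message[],
  events: HandoverEvent[],
  agentId: string,
  agentCount: number
): { turns: Message[]; segmentUnavailable: boolean } {
  if (agentCount <= 1) return { turns: messages, segmentUnavailable: false };
  if (events.length === 0) return { turns: messages, segmentUnavailable: true };
  const segs = segmentsForAgent(events, agentId);
  const turns = messages.filter((m) => segs.some((s) => m.at >= s.from && m.at < s.to));
  return { turns, segmentUnavailable: false };
}
```

Customer messages that fall inside an agent's window are included here, which matches the "turns shown" wording; whether only the agent's own messages should be shown is a design choice left open.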

11. Rollout

Field	Value
Feature flag	`scorecard_multi_agent` — default: OFF
Stage 1	Internal Alpha: 3–5 internal QA accounts with multi-agent (handover) rooms
Stage 2	Closed Beta: 3–5 accounts with high handover volume + mixed-role teams
Stage 3	Open Beta: All Professional + Enterprise on request
GA	All Professional + Enterprise (flag on)
Backward compat	Yes — single-agent rooms behave exactly as today; the dropdowns are additive
Migration	None — reuses the existing `agent_scorecard` `(org, room_id, agent_id)` records.

12. Observability

Key Events:

Event Name	Trigger	Properties
`scorecard_agent_selected`	Supervisor selects an agent in the panel	org_id, room_id, agent_id
`scorecard_template_selected`	Supervisor selects a scorecard template	org_id, scorecard_id
`scorecard_multi_agent_submitted`	A per-agent scorecard is submitted	org_id, room_id, agent_id, scorecard_id
`scorecard_invalid_agent`	Selected agent not a room participant	org_id, room_id, agent_id
`scorecard_template_fetch_failed`	Template list fetch failed	org_id, reason

Field	Detail
Dashboard owner	Bot, AI & Automation (squad: BOT) + Omnichannel
Alert 1	`scorecard_invalid_agent` > 2% of selections in 1h → Slack: #bot-ai-oncall
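
The Alert 1 condition is a simple ratio over a one-hour window. A hedged sketch, assuming the counts of `scorecard_invalid_agent` and `scorecard_agent_selected` events for the window are already aggregated (the function name is illustrative):

```typescript
// Returns true when invalid-agent events exceed the threshold share of selections.
function invalidAgentAlert(
  invalidCount: number,   // scorecard_invalid_agent events in the window
  selectionCount: number, // scorecard_agent_selected events in the window
  threshold: number = 0.02
): boolean {
  if (selectionCount === 0) return false; // no selections, nothing to alert on
  return invalidCount / selectionCount > threshold;
}
```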

12.1 Post-Launch Monitoring Cadence

Field	Detail
Review cadence	Weekly for the first 4 weeks post-GA, then monthly
Owner	Dimas Fauzi Hidayat (PM) + BOT squad
Review scope	`scorecard_agent_selected`, `scorecard_template_selected`, `scorecard_multi_agent_submitted`, `scorecard_invalid_agent`
Trigger threshold 1	`scorecard_invalid_agent` > 2% week-over-week → investigate the participant list
Trigger threshold 2	Multi-agent rooms scored / multi-agent rooms resolved < 20% after 4 weeks → revisit discoverability
Rollback consideration	If invalid-agent or save errors persist > 48h, PM disables `scorecard_multi_agent` for affected orgs.

13. Success Metrics

Adoption & Usage:

Metric	Definition	Baseline	Target
⭐ Multi-agent QA coverage	% of multi-agent (handover) resolved rooms where >1 agent is scored	~0% — only the first agent is scorable today	≥40% within 90 days of GA
Template selection usage	% of submissions that use a non-Default scorecard	0% — single org-wide scorecard	≥25% within 90 days of GA (for orgs with ≥2 templates)
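
To make the primary metric concrete, the coverage definition above can be computed as follows. The per-room summary shape is an assumption; in practice these counts would come from the analytics events listed in Observability.

```typescript
// Hypothetical per-room summary for resolved rooms.
interface RoomQaSummary {
  room_id: string;
  agentsServed: number; // distinct agents who served the room
  agentsScored: number; // distinct agents with a submitted scorecard
}

// Multi-agent QA coverage: % of multi-agent resolved rooms where >1 agent is scored.
// Returns null when there are no multi-agent rooms, to avoid dividing by zero.
function multiAgentQaCoverage(rooms: RoomQaSummary[]): number | null {
  const multiAgent = rooms.filter((r) => r.agentsServed > 1);
  if (multiAgent.length === 0) return null;
  const covered = multiAgent.filter((r) => r.agentsScored > 1).length;
  return (covered / multiAgent.length) * 100;
}
```

Single-agent rooms are excluded from the denominator, so the metric is not diluted by rooms where multi-agent scoring does not apply.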

Quality & Accuracy:

Metric	Definition	Baseline	Target
Correct-handler scoring	Sampled checks where the scored agent matches the actual resolver on handover rooms	N/A	≥95% on the validation sample

14. Launch Plan & Stage Gates

Stage	Audience	Duration	Success Gate to Advance	Owner
Internal Alpha	3–5 internal QA accounts	2 weeks	0 P0/P1; single-agent rooms unchanged; `scorecard_invalid_agent` ≤2%	PM + QA
Closed Beta	3–5 high-handover / mixed-role accounts	2 weeks	>1 agent scored on handover rooms; template selection works; no P0	PM + BOT
Open Beta	All Pro+Ent on request	2 weeks	Multi-agent QA coverage climbing; no P0 for 2 weeks	Eng Lead
GA	All Pro+Ent	Ongoing	All Open Beta gates sustained 2 weeks; PMM approved	PM + PMM

15. Dependencies

Dependency	Owning Team	Deliverable Needed	Blocking?
Multiple scorecard templates in the model/API	BOT (chatbot)	The scorecard model/API exposes >1 selectable template per org (confirm if net-new — Open Q#1)	YES
Agent-aware scorecard API	BOT (chatbot)	Extend `GET/POST/PATCH /agent_scorecards/{roomId}` to accept `agent_id` + `scorecard_id`	YES
Handover events for segmentation	Omnichannel	`agent_take_room` / `remove_agent` / `handover_id` fire reliably for human→human handover	NO (P8-S03 only)
In-room panel (Phase 2)	BOT (Phase 2)	The panel this phase extends	YES
Design / UX	Design squad	Frame the provided mock (Agent + Scorecard dropdowns)	YES

16. Key Decisions + Alternatives Rejected

16a — Decisions Made

Date	Decision	Rationale
2026-06-19	Reuse the existing `(org, room_id, agent_id)` key — one record per agent per room	The schema already supports it; no migration needed (verified in `chatbot` + `hub-chat`)
2026-06-19	Surface a selector over the existing `roomParticipant[]` array rather than a new data path	The array already carries every agent; the gap is in selection (`selectedAgent = roomParticipant[0]`) and the agent-unaware API, not in the data
2026-06-19	Per-agent segmentation reuses the Phase 2 handover-event mechanism	One segmentation approach across the initiative; lower build + consistent behavior
2026-06-19	Sequence as the last (parked) phase, after the AI-agent core	Human-QA scaling is independent of and lower-priority than the AI-agent scoring focus

16b — Alternatives Rejected

Alternative	Why Rejected	Date
Keep scoring only the first agent	Misses the actual resolver on handover rooms; the core problem	2026-06-19
One global scorecard for all roles	Can't fairly judge agents in different roles; the second half of the mock	2026-06-19
Build a new per-agent data model	Unnecessary — the `(org, room_id, agent_id)` key already supports it	2026-06-19

17. Open Questions

#	Type	Question	Owner	Deadline
1	Open Question	Does the scorecard model/API already support multiple selectable templates per org, or is that net-new? Today the FE loads a single org-wide enabled scorecard (the static "Default").	BOT	2026-10-15
2	Open Question	The auto-scorer (`auto_agent_scoring.rb`) scores only the "first agent" — should it be extended to auto-score every agent, or stay first-agent with manual multi-agent scoring here?	Bot/AI	2026-10-15
3	Risk	Per-agent segmentation depends on handover events firing reliably for human→human takeover. Mitigation: validate the events in Internal Alpha; fall back to the full transcript with a "segment unavailable" flag (P8-S03/AC-3).	Omnichannel	2026-10-31
4	Open Question	Per-team permission scoping is out of scope (RBAC is org-wide today). Confirm org-wide is acceptable for this phase, or schedule team-scoping separately.	PM + Platform	2026-10-15

PRD CHANGELOG

Version	Date	By	Section	Type	Summary
1.0	2026-06-19	Claude	All	CREATED	Phase 8 PRD (Multi-Agent Scoring & Selectable Scorecard) — grounded in cloned hub-chat: Agent + Scorecard selectors over the existing `roomParticipant[]` array and `(org, room_id, agent_id)` key, with per-agent segmentation reusing the handover events.
