Qontak | Chatbot & AI | Unified Agent Quality Scorecard — Phase 8: Multi-Agent Scoring & Selectable Scorecard
Template: PHASE PRD v1.2 · Companion to PRD Section Reference v1.5 + Hierarchy v1.0
Note: Phase 8 (parked) of the initiative — a human-QA scaling capability, independent of the AI-agent phases (1–5). It extends the in-room Scorecard panel to score every agent who served a room, each on a selectable scorecard. Grounded in the cloned hub-chat + chatbot code.
HEADER BLOCK
| Field | Value |
|---|---|
| PM | Dimas Fauzi Hidayat |
| PRD Version | 1.0 |
| Status | DRAFT |
| PRD Type | PHASE |
| Epic | QC-XXXXX — add once Epic is created |
| Squad | BOT — Bot, AI & Automation (with Omnichannel/Chat) |
| RFC Link | Pending — RFC to follow via rfc-starter |
| Figma Master | The provided design mock (Agent + Scorecard dropdowns in the in-room panel) — frame link on design |
| Anchor | Qontak \| Chatbot & AI \| Unified Agent Quality Scorecard — ANCHOR |
| Labels | epic:qontak-chatbot-ai \| module:chatbot-ai \| feature:unified-agent-scorecard |
| Last Updated | 2026-06-19 |
Table of Contents
- HEADER BLOCK
- 2. CONDITIONAL BLOCK: PHASE CONTEXT
- 3. One-liner + Problem
- 4. What Happens If We Don't Ship This Phase
- 5. Target Users + Persona Context
- 6. Non-Goals
- 7. Constraints
- 8. Feature Changes
- 9. API & Webhook Behavior
- 10. System Flow + User Stories + ACs
- 11. Rollout
- 12. Observability
- 13. Success Metrics
- 14. Launch Plan & Stage Gates
- 15. Dependencies
- 16. Key Decisions + Alternatives Rejected
- 17. Open Questions
- PRD CHANGELOG
2. CONDITIONAL BLOCK: PHASE CONTEXT
| Field | Detail |
|---|---|
| Anchor PRD | Qontak \| Chatbot & AI \| Unified Agent Quality Scorecard — ANCHOR |
| Phase | Phase 8 of 8 |
| Phase Goal | Score every agent who served a room — each on a selectable scorecard template — via the Agent + Scorecard selectors. |
| Prior phases | Phases 1–5 deliver the AI-agent scoring core; Phase 2 shipped the in-room Scorecard panel that this phase extends. This phase is a human-QA scaling effort, independent of AI-agent scoring. |
| Deferred to later phases | None — this is the last planned phase (parked). |
| Cross-phase dependencies | Reuses (1) the Phase 2 in-room Scorecard panel, (2) the existing agent_scorecard model + the per-room GET/POST/PATCH /agent_scorecards/{roomId} API, (3) the roomParticipant[] array, and (4) the handover events (agent_take_room / remove_agent / handover_id). |
3. One-liner + Problem
One-liner: Let a supervisor score any agent who handled a room — not just the first — and pick which scorecard template applies to each.
Problem:
Today the in-room Scorecard panel scores only one agent per room — selectedAgent = roomParticipant[0] (the first participant) — against a single org-wide scorecard. When a room is handed over (agent takeover), the agent who actually resolved it often isn't the first, so supervisors can't QA the real handler; and there's no way to apply a role-appropriate scorecard (e.g. a different template for sales vs support). The (org, room_id, agent_id) unique key and the roomParticipant[] array already support multiple agents — only the UI and API resolve a single one. This caps QA coverage on multi-agent rooms and forces one-size-fits-all criteria.
4. What Happens If We Don't Ship This Phase
- Multi-agent rooms stay mis-QA'd — only the first participant is scorable; the agent who actually resolved a handed-over room goes unscored, so QA coverage has a blind spot that grows with handover volume.
- One-size-fits-all scoring persists — no role/use-case-appropriate scorecard, so criteria don't fit every agent (a sales agent judged on support parameters).
- The design mock stays unbuilt — the Agent + Scorecard dropdowns are a known, parked backlog gap with an existing design.
5. Target Users + Persona Context
Primary Persona: QA Lead / Supervisor
| Field | Detail |
|---|---|
| Role | QA Lead or Supervisor reviewing conversation quality, including handed-over (multi-agent) rooms |
| Goal | Score the agent who actually handled the room (any of them), each against the right scorecard template |
| Pain | The panel only exposes the first participant and one scorecard; the real resolver is often unscorable |
| Workaround | Skipping QA on handover rooms, or scoring the wrong (first) agent |
Secondary Persona: Team Lead (mixed-role team)
| Field | Detail |
|---|---|
| Role | Lead of a team spanning roles (sales + support) on shared rooms |
| Goal | Apply role-appropriate scorecards per agent rather than one generic template |
| Pain | A single org-wide scorecard can't fairly judge agents in different roles |
| Workaround | Mentally adjusting scores, or maintaining role criteria in spreadsheets |
6. Non-Goals
- Not AI-agent scoring — Phases 1–3 own that; this phase is human-QA scaling.
- Not authoring/creating scorecard templates — templates are assumed to exist; this phase only selects among them.
- Not the Analytics report — Phase 3.
- Not the go-live gate — Phase 5.
- No change to the auto-scorer logic (auto_agent_scoring.rb still picks the first agent — see Open Q#2).
- No mobile — web only.
- Not per-team permission scoping — tracked separately (the org-wide RBAC caveat).
7. Constraints
| Field | Value |
|---|---|
| Platform | Web only — Qontak omnichannel web app |
| Performance | Agent/scorecard switch in the panel ≤ 500ms; scorecard fetch ≤ 800ms P95 |
| Data limits | One scorecard record per (org, room_id, agent_id) (existing unique key); the existing edit-once rule applies per record (edit_count / correction_at) |
| Plan scope | Professional + Enterprise (Agent Scorecard tier) |
| Feature flag | scorecard_multi_agent — default: OFF |
| Read/write | Read/score: Usman inbox_scorecard_view / inbox_scorecard_manage (org-wide today — no team scope). End CS agents: no access. |
8. Feature Changes
Change ID: CHG-A — Agent selector in the in-room Scorecard panel
| Field | Detail |
|---|---|
| Change Type | Modified component (in-room Scorecard panel) |
| Page | /inbox — conversation right panel → Scorecard tab |
| Page Intent | A supervisor scores an agent's quality for the open conversation |
| Before | The panel scores only the first participant — selectedAgent = roomParticipant[0] (AgentScorecard.vue); other agents who served the room are not selectable or scorable. |
| After | An Agent dropdown lists every agent who served the room (from roomParticipant[] / roomAgent[]); selecting an agent loads/creates that agent's scorecard record, stored per (room, agent_id). |
| Element | Before | After |
|---|---|---|
| Agent in scope | Hardcoded roomParticipant[0] | Selectable from all room agents |
| Scorecard records | One per room (first agent) | One per (room, agent_id) (existing key) |
Figma: The provided design mock (Agent dropdown).
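A minimal sketch of how the Agent dropdown options could be derived from roomParticipant[], keeping the first participant as the default so single-agent rooms behave as they do today. The `RoomParticipant` shape, the function names, and the de-duplication step are assumptions for illustration; the real array entries in hub-chat may differ.

```typescript
// Hypothetical participant shape; the real roomParticipant[] entries may differ.
interface RoomParticipant {
  id: string;
  name: string;
}

// Builds the dropdown options in original order. De-duplicating by id is an
// assumption, in case an agent appears more than once after a re-takeover.
function buildAgentOptions(roomParticipant: RoomParticipant[]): RoomParticipant[] {
  const options: RoomParticipant[] = [];
  for (const participant of roomParticipant) {
    if (!options.some((o) => o.id === participant.id)) {
      options.push(participant);
    }
  }
  return options;
}

// The default selection stays the first participant, matching today's
// selectedAgent = roomParticipant[0] behavior.
function defaultSelectedAgent(options: RoomParticipant[]): RoomParticipant | undefined {
  return options[0];
}
```

With this shape, a single-agent room yields one option that is also the default, so the panel renders as it does before the change.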
Change ID: CHG-B — Scorecard template selector
| Field | Detail |
|---|---|
| Change Type | Modified component (in-room Scorecard panel) |
| Page | /inbox — conversation right panel → Scorecard tab |
| Page Intent | Apply the right scoring criteria to the selected agent |
| Before | A single org-wide enabled scorecard is loaded; the "Scorecard: Default" label is static — no choice. |
| After | The Scorecard dropdown lists the org's available scorecard templates; choosing one loads its categories/parameters for the selected agent. Requires the scorecard model/API to expose multiple templates (see Open Q#1). |
| Element | Before | After |
|---|---|---|
| Scorecard template | Single org-wide ("Default", static) | Selectable per agent/room |
Figma: The provided design mock (Scorecard dropdown).
9. API & Webhook Behavior
Behavior 1: Get/submit a scorecard for a specific (room, agent, template)
| Field | Detail |
|---|---|
| Entity affected | agent_scorecard record keyed by (org, room_id, agent_id) |
| Triggered by | Supervisor selects an agent (and template) in the panel, then loads/submits |
| Information passed | room_id, agent_id, scorecard_id (template) |
| Expected behavior | Extend the per-room GET/POST/PATCH /api/v1/gpt/omnichannel/agent_scorecards/{roomId} to accept agent_id + scorecard_id; return or persist that agent's record on the chosen template |
| Failure behavior | • Selected agent is not a room participant → 422. • Record already corrected (edit-once) → load read-only / block further edit. • Save fails → error + retry. |
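The request and the participant check described above could be sketched as follows. The path, the 422 status, and the scorecard_invalid_agent event name come from this PRD; passing agent_id and scorecard_id as query parameters, and both function names, are assumptions until the RFC fixes the request shape.

```typescript
// Path taken from the PRD. Sending the selectors as query parameters is an
// assumption; the RFC decides the final request shape.
const AGENT_SCORECARDS_PATH = "/api/v1/gpt/omnichannel/agent_scorecards";

function buildScorecardUrl(roomId: string, agentId: string, scorecardId: string): string {
  const query =
    "agent_id=" + encodeURIComponent(agentId) +
    "&scorecard_id=" + encodeURIComponent(scorecardId);
  return AGENT_SCORECARDS_PATH + "/" + encodeURIComponent(roomId) + "?" + query;
}

type AgentCheck =
  | { ok: true }
  | { ok: false; status: 422; event: "scorecard_invalid_agent" };

// Hypothetical guard: rejects an agent_id that is not among the room's participants.
function checkAgentIsParticipant(participantIds: string[], agentId: string): AgentCheck {
  if (participantIds.indexOf(agentId) === -1) {
    return { ok: false, status: 422, event: "scorecard_invalid_agent" };
  }
  return { ok: true };
}
```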
Behavior 2: List scorecard templates for the org
| Field | Detail |
|---|---|
| Entity affected | Scorecard templates (category sets) available to the org |
| Triggered by | Panel opens / the Scorecard dropdown is opened |
| Information passed | organization_id |
| Expected behavior | Return the org's available scorecard templates for selection |
| Failure behavior | • None enabled → fall back to the Default scorecard. • Fetch fails → error; keep the last-loaded template. |
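The two fallbacks in Behavior 2 could look like the sketch below. The `ScorecardTemplate` shape, the `"default"` id, and the function names are hypothetical. Falling back to Default when a fetch fails before any template has loaded is also an assumption, since the PRD only covers the case where a last-loaded template exists.

```typescript
// Hypothetical template shape and Default placeholder; real ids may differ.
interface ScorecardTemplate {
  id: string;
  name: string;
  enabled: boolean;
}

const DEFAULT_TEMPLATE: ScorecardTemplate = { id: "default", name: "Default", enabled: true };

// "None enabled -> fall back to the Default scorecard."
function selectableTemplates(templates: ScorecardTemplate[]): ScorecardTemplate[] {
  const enabled = templates.filter((t) => t.enabled);
  return enabled.length > 0 ? enabled : [DEFAULT_TEMPLATE];
}

// "Fetch fails -> keep the last-loaded template." Using Default when nothing
// was loaded yet is an assumption of this sketch.
function templateAfterFetchFailure(lastLoaded: ScorecardTemplate | undefined): ScorecardTemplate {
  return lastLoaded ?? DEFAULT_TEMPLATE;
}
```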
Claude resolves during RFC: HTTP method, path, request/response JSON schema, error codes.
10. System Flow + User Stories + ACs
10.1 System Flow
Flow: Score a Specific Agent on a Chosen Scorecard
Type: User Journey
- A supervisor opens a resolved room's Scorecard panel.
- The Agent dropdown lists every agent who served the room (from roomParticipant[]).
- The supervisor selects an agent → the panel requests that agent's scorecard record (room_id + agent_id).
- The supervisor picks a template from the Scorecard dropdown → its categories/parameters load.
- Decision — does a record already exist for (room, agent)? Yes → load it (read-only if the edit-once slot is used). No → start a new score.
- The supervisor scores (thumbs/values) and submits → persisted per (room, agent_id, scorecard_id).
- (Optional) each agent's score is scoped to the turns they handled, derived from the handover events.
- Failure branch — the selected agent is not a participant → error; the dropdown only lists valid room agents.
📊 System Flow — Multi-Agent Scoring
graph TD
A[Supervisor opens resolved room Scorecard panel] --> B[Agent dropdown lists all room agents]
B --> C[Select an agent]
C --> D[Scorecard dropdown lists templates]
D --> E[Select a template -> categories load]
E --> F{Record exists for room+agent?}
F -->|Yes| G[Load record - read-only if edit-once used]
F -->|No| H[Start new score]
G --> I[Score + submit -> persist per room+agent+template]
H --> I
C -.invalid agent.-> J[Error - dropdown only lists valid room agents]
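The decision step in the flow (does a record exist, and is its edit-once slot used) can be sketched as a small pure function. The `ExistingRecord` shape and the reading of "edit-once slot used" as `editCount >= 1` are assumptions; the real rule lives in the existing edit_count / correction_at handling.

```typescript
// Hypothetical view of a stored record; only the field this decision needs.
interface ExistingRecord {
  editCount: number; // assumed to mirror edit_count on the agent_scorecard record
}

type PanelMode = "new" | "editable" | "read_only";

// No record -> start a new score. Record with its edit-once slot used -> read-only.
// Treating editCount >= 1 as "slot used" is an assumption of this sketch.
function resolvePanelMode(record: ExistingRecord | undefined): PanelMode {
  if (record === undefined) {
    return "new";
  }
  return record.editCount >= 1 ? "read_only" : "editable";
}
```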
10.2 User Stories
[P8-S01] — Score a specific agent via the Agent selector
| Field | Detail |
|---|---|
| User Story | As a Supervisor, I want to pick any agent who served the room and score them, so that I can QA the agent who actually handled the conversation, not just the first. |
| Before State | The panel scores only roomParticipant[0]; other agents who served the room are not selectable. |
| After Delta | An Agent dropdown lists all room agents; selecting one loads/creates that agent's scorecard record per (room, agent_id). |
| Importance | Must Have |
| Mockup / Technical Notes | Figma: The provided design mock (Agent dropdown). Data Fields: • room_id (string, required) — conversation • agent_id (string, required) — selected agent • roomParticipant[] (array) — source of the agent list. Technical Notes: Extends AgentScorecard.vue (selectedAgent no longer hardcoded [0]) + the per-room API to be agent-aware. Records reuse the (org, room_id, agent_id) unique key. |
| Acceptance Criteria | — Happy Path — • AC-1: Given a resolved room served by ≥2 agents, when the supervisor opens the panel, then the Agent dropdown lists every agent who served the room. • AC-2: Given the supervisor selects an agent, when the panel loads, then that agent's scorecard record is fetched (or a new one is started) for (room, agent_id). • AC-3: Given the supervisor scores and submits, when saved, then the record persists against the selected agent without affecting other agents' records on the same room. — Edge — • AC-4: Given an agent already has a saved (and edit-once-locked) record, when selected, then their score loads read-only. — Error / Unhappy Path — • ERR-1: Given the selected agent is not a room participant, when the panel requests their record, then a 422/"not a participant" state is shown and scorecard_invalid_agent is logged. — Permission Model — • CAN: Usman inbox_scorecard_manage (org-wide). • CANNOT: inbox_scorecard_view is read-only; end CS agents none. • Unauthorized: panel/dropdown not rendered. — UI States — • Loading: dropdown + form skeleton while fetching. • Empty: single-agent room → dropdown shows one agent (today's behavior). • Error: as ERR-1. • Success: selected agent's scorecard shown. — Negative Scenarios — • NEG-1: Given a room with only one agent, when the panel opens, then it behaves exactly as today (no regression). |
Dependencies: Phase 2 in-room panel.
[P8-S02] — Choose a scorecard template via the Scorecard selector
| Field | Detail |
|---|---|
| User Story | As a Supervisor, I want to choose which scorecard template applies to the agent I'm scoring, so that the criteria fit the agent's role/use-case. |
| Before State | One org-wide scorecard loads; the "Scorecard: Default" label is static — no choice. |
| After Delta | A Scorecard dropdown lists the org's available templates; choosing one loads its categories/parameters for the selected agent. |
| Importance | Must Have |
| Mockup / Technical Notes | Figma: The provided design mock (Scorecard dropdown). Data Fields: • organization_id (string, required) — Auth session • scorecard_id (string, required) — chosen template. Technical Notes: Needs the scorecard model/API to expose multiple templates — confirm whether multiple templates exist today or are net-new (Open Q#1). |
| Acceptance Criteria | — Happy Path — • AC-1: Given the panel is open, when the supervisor opens the Scorecard dropdown, then the org's available scorecard templates are listed. • AC-2: Given the supervisor selects a template, when it loads, then its categories/parameters render for the selected agent. • AC-3: Given a template is chosen and a score submitted, when saved, then the record stores which scorecard_id (template) was used. — Edge — • AC-4: Given the org has only the Default scorecard, when the dropdown opens, then only "Default" is listed (no regression). — Error / Unhappy Path — • ERR-1: Given the template list fetch fails, when the dropdown opens, then an error is shown and the last-loaded template stays active; scorecard_template_fetch_failed is logged. — Permission Model — • CAN: inbox_scorecard_manage. • CANNOT: inbox_scorecard_view read-only; end agents none. • Unauthorized: control not rendered. — UI States — • Loading: dropdown spinner while listing templates. • Empty: only Default available. • Error: as ERR-1. • Success: chosen template loaded. — Negative Scenarios — • NEG-1: Given a Starter/Free org, when the panel opens, then template selection is not available (plan-gated). |
Dependencies: P8-S01.
[P8-S03] — Scope each agent's score to the turns they handled
| Field | Detail |
|---|---|
| User Story | As a Supervisor, I want each agent scored on the turns they actually handled, so that handover doesn't credit/penalize the wrong agent. |
| Before State | Scoring is over the whole room; there's no per-agent turn segmentation. |
| After Delta | The panel scopes the conversation shown for an agent to their handled turns, derived from the handover events (agent_take_room / remove_agent / handover_id). |
| Importance | Should Have |
| Mockup / Technical Notes | Figma: The provided design mock. Data Fields: • room_id, agent_id (required) • handover_id / handover events — segment boundaries. Technical Notes: Same segmentation mechanism as Phase 2 (P2/Open Q#3); confirm events fire for human→human handover. |
| Acceptance Criteria | — Happy Path — • AC-1: Given a room handed over from Agent A to Agent B, when the supervisor selects Agent A, then the turns shown are scoped to A's handled segment (before the handover). • AC-2: Given the supervisor selects Agent B, when the panel loads, then the turns shown are scoped to B's segment (after the handover). — Edge — • AC-3: Given handover events are missing for a room, when an agent is selected, then the panel falls back to the full transcript and flags "segment unavailable". — Permission Model — • CAN: inbox_scorecard_view / manage. • CANNOT: end agents. • Unauthorized: not rendered. — UI States — • Loading: transcript skeleton. • Empty: N/A. • Error: "segment unavailable" → full transcript fallback. • Success: segmented transcript shown. — Negative Scenarios — • NEG-1: Given a single-agent room, when an agent is selected, then the full transcript is shown (no segmentation needed). |
Dependencies: P8-S01.
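One way the per-agent segmentation in P8-S03 might work is sketched below, under several assumptions: each handover event carries an agent id and a timestamp, an agent's segment runs from their agent_take_room event to their next remove_agent event, and turns carry comparable timestamps. The `HandoverEvent`, `Turn`, and `SegmentResult` shapes are hypothetical; the real event payloads (including handover_id) need confirming with Omnichannel.

```typescript
// Hypothetical event and turn shapes; real payloads may differ.
type HandoverEventType = "agent_take_room" | "remove_agent";

interface HandoverEvent {
  type: HandoverEventType;
  agentId: string;
  at: number; // timestamp, e.g. epoch milliseconds
}

interface Turn {
  id: string;
  at: number;
}

interface SegmentResult {
  turns: Turn[];
  segmentUnavailable: boolean;
}

// Scopes the transcript to the turns inside the agent's take/remove intervals.
// If no interval can be derived, returns the full transcript with the
// "segment unavailable" flag set (the AC-3 fallback).
function turnsForAgent(turns: Turn[], events: HandoverEvent[], agentId: string): SegmentResult {
  const sorted = events.slice().sort((a, b) => a.at - b.at);
  const intervals: Array<[number, number]> = [];
  let openedAt: number | undefined = undefined;

  for (const event of sorted) {
    if (event.agentId !== agentId) continue;
    if (event.type === "agent_take_room" && openedAt === undefined) {
      openedAt = event.at;
    } else if (event.type === "remove_agent" && openedAt !== undefined) {
      intervals.push([openedAt, event.at]);
      openedAt = undefined;
    }
  }
  // An agent still holding the room has an open-ended segment.
  if (openedAt !== undefined) {
    intervals.push([openedAt, Number.POSITIVE_INFINITY]);
  }

  if (intervals.length === 0) {
    return { turns: turns, segmentUnavailable: true };
  }
  const scoped = turns.filter((t) => intervals.some(([start, end]) => t.at >= start && t.at < end));
  return { turns: scoped, segmentUnavailable: false };
}
```

For single-agent rooms (NEG-1) the caller would skip this function and show the full transcript directly, so the "segment unavailable" flag is not raised where no segmentation is needed.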
11. Rollout
| Field | Value |
|---|---|
| Feature flag | scorecard_multi_agent — default: OFF |
| Stage 1 | Internal QA: 3–5 internal accounts with multi-agent (handover) rooms |
| Stage 2 | Closed beta: 3–5 accounts with high handover volume + mixed-role teams |
| Stage 3 | All Professional + Enterprise on request |
| GA | All Professional + Enterprise (flag on) |
| Backward compat | Yes — single-agent rooms behave exactly as today; the dropdowns are additive |
| Migration | None — reuses the existing agent_scorecard (org, room_id, agent_id) records. |
12. Observability
Key Events:
| Event Name | Trigger | Properties |
|---|---|---|
| scorecard_agent_selected | Supervisor selects an agent in the panel | org_id, room_id, agent_id |
| scorecard_template_selected | Supervisor selects a scorecard template | org_id, scorecard_id |
| scorecard_multi_agent_submitted | A per-agent scorecard is submitted | org_id, room_id, agent_id, scorecard_id |
| scorecard_invalid_agent | Selected agent not a room participant | org_id, room_id, agent_id |
| scorecard_template_fetch_failed | Template list fetch failed | org_id, reason |
| Field | Detail |
|---|---|
| Dashboard owner | Bot, AI & Automation (squad: BOT) + Omnichannel |
| Alert 1 | scorecard_invalid_agent > 2% of selections in 1h → Slack: #bot-ai-oncall |
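The Alert 1 condition can be expressed as a simple rate check over a one-hour window. This sketch assumes the denominator is the count of scorecard_agent_selected events in that window and that an empty window never alerts; the function and constant names are hypothetical.

```typescript
// 2% threshold from Alert 1.
const INVALID_AGENT_ALERT_THRESHOLD = 0.02;

// Share of selections in the window that produced scorecard_invalid_agent.
// Returns 0 for an empty window (an assumption, to avoid dividing by zero).
function invalidAgentRate(invalidCount: number, selectionCount: number): number {
  return selectionCount === 0 ? 0 : invalidCount / selectionCount;
}

// True when the rate is strictly above the threshold, matching "> 2%".
function shouldAlertInvalidAgent(invalidCount: number, selectionCount: number): boolean {
  return invalidAgentRate(invalidCount, selectionCount) > INVALID_AGENT_ALERT_THRESHOLD;
}
```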
12.1 Post-Launch Monitoring Cadence
| Field | Detail |
|---|---|
| Review cadence | Weekly for the first 4 weeks post-GA, then monthly |
| Owner | Dimas Fauzi Hidayat (PM) + BOT squad |
| Review scope | scorecard_agent_selected, scorecard_template_selected, scorecard_multi_agent_submitted, scorecard_invalid_agent |
| Trigger threshold 1 | scorecard_invalid_agent > 2% week-over-week → investigate the participant list |
| Trigger threshold 2 | Multi-agent rooms scored / multi-agent rooms resolved < 20% after 4 weeks → revisit discoverability |
| Rollback consideration | If invalid-agent or save errors persist > 48h, PM disables scorecard_multi_agent for affected orgs. |
13. Success Metrics
Adoption & Usage:
| Metric | Definition | Baseline | Target |
|---|---|---|---|
| ⭐ Multi-agent QA coverage | % of multi-agent (handover) resolved rooms where >1 agent is scored | ~0% — only the first agent is scorable today | ≥40% within 90 days of GA |
| Template selection usage | % of submissions that use a non-Default scorecard | 0% — single org-wide scorecard | ≥25% within 90 days of GA (for orgs with ≥2 templates) |
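As an illustration of the starred metric's definition, the sketch below computes multi-agent QA coverage from per-room counts. The `RoomQaSummary` shape is hypothetical, and treating "multi-agent" as two or more distinct agents served is an assumption about how handover rooms are identified.

```typescript
// Hypothetical per-room summary for resolved rooms.
interface RoomQaSummary {
  agentsServed: number; // distinct agents who served the room
  agentsScored: number; // distinct agents with a submitted scorecard
}

// Percentage of multi-agent resolved rooms where more than one agent is scored.
// Returns 0 when there are no multi-agent rooms.
function multiAgentQaCoverage(rooms: RoomQaSummary[]): number {
  const multiAgentRooms = rooms.filter((r) => r.agentsServed >= 2);
  if (multiAgentRooms.length === 0) {
    return 0;
  }
  const covered = multiAgentRooms.filter((r) => r.agentsScored > 1).length;
  return (covered / multiAgentRooms.length) * 100;
}
```

Single-agent rooms are excluded from both numerator and denominator, so they neither raise nor lower the coverage figure.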
Quality & Accuracy:
| Metric | Definition | Baseline | Target |
|---|---|---|---|
| Correct-handler scoring | Sampled checks where the scored agent matches the actual resolver on handover rooms | N/A | ≥95% on the validation sample |
14. Launch Plan & Stage Gates
| Stage | Audience | Duration | Success Gate to Advance | Owner |
|---|---|---|---|---|
| Internal Alpha | 3–5 internal QA accounts | 2 weeks | 0 P0/P1; single-agent rooms unchanged; scorecard_invalid_agent ≤2% | PM + QA |
| Closed Beta | 3–5 high-handover / mixed-role accounts | 2 weeks | >1 agent scored on handover rooms; template selection works; no P0 | PM + BOT |
| Open Beta | All Pro+Ent on request | 2 weeks | Multi-agent QA coverage climbing; no P0 for 2 weeks | Eng Lead |
| GA | All Pro+Ent | Ongoing | All Open Beta gates sustained 2 weeks; PMM approved | PM + PMM |
15. Dependencies
| Dependency | Owning Team | Deliverable Needed | Blocking? |
|---|---|---|---|
| Multiple scorecard templates in the model/API | BOT (chatbot) | The scorecard model/API exposes >1 selectable template per org (confirm if net-new — Open Q#1) | YES |
| Agent-aware scorecard API | BOT (chatbot) | Extend GET/POST/PATCH /agent_scorecards/{roomId} to accept agent_id + scorecard_id | YES |
| Handover events for segmentation | Omnichannel | agent_take_room / remove_agent / handover_id fire reliably for human→human handover | NO (P8-S03 only) |
| In-room panel (Phase 2) | BOT (Phase 2) | The panel this phase extends | YES |
| Design / UX | Design squad | Frame the provided mock (Agent + Scorecard dropdowns) | YES |
16. Key Decisions + Alternatives Rejected
16.1 Decisions Made
| Date | Decision | Rationale |
|---|---|---|
| 2026-06-19 | Reuse the existing (org, room_id, agent_id) key — one record per agent per room | The schema already supports it; no migration needed (verified in chatbot + hub-chat) |
| 2026-06-19 | Surface a selector over the existing roomParticipant[] array rather than a new data path | The array already carries every agent; the gap is UI-only (selectedAgent = roomParticipant[0]) |
| 2026-06-19 | Per-agent segmentation reuses the Phase 2 handover-event mechanism | One segmentation approach across the initiative; lower build + consistent behavior |
| 2026-06-19 | Sequence as the last (parked) phase, after the AI-agent core | Human-QA scaling is independent of and lower-priority than the AI-agent scoring focus |
16.2 Alternatives Rejected
| Alternative | Why Rejected | Date |
|---|---|---|
| Keep scoring only the first agent | Misses the actual resolver on handover rooms; the core problem | 2026-06-19 |
| One global scorecard for all roles | Can't fairly judge agents in different roles; the second half of the mock | 2026-06-19 |
| Build a new per-agent data model | Unnecessary — the (org, room_id, agent_id) key already supports it | 2026-06-19 |
17. Open Questions
| # | Type | Question | Owner | Deadline |
|---|---|---|---|---|
| 1 | Open Question | Does the scorecard model/API already support multiple selectable templates per org, or is that net-new? Today the FE loads a single org-wide enabled scorecard (the static "Default"). | BOT | 2026-10-15 |
| 2 | Open Question | The auto-scorer (auto_agent_scoring.rb) scores only the "first agent" — should it be extended to auto-score every agent, or stay first-agent with manual multi-agent scoring here? | Bot/AI | 2026-10-15 |
| 3 | Risk | Per-agent segmentation depends on handover events firing reliably for human→human takeover. Mitigation: validate the events in Internal Alpha; fall back to the full transcript with a "segment unavailable" flag (P8-S03/AC-3). | Omnichannel | 2026-10-31 |
| 4 | Open Question | Per-team permission scoping is out of scope (RBAC is org-wide today). Confirm org-wide is acceptable for this phase, or schedule team-scoping separately. | PM + Platform | 2026-10-15 |
PRD CHANGELOG
| Version | Date | By | Section | Type | Summary |
|---|---|---|---|---|---|
| 1.0 | 2026-06-19 | Claude | All | CREATED | Phase 8 PRD (Multi-Agent Scoring & Selectable Scorecard) — grounded in cloned hub-chat: Agent + Scorecard selectors over the existing roomParticipant[] array and (org, room_id, agent_id) key, with per-agent segmentation reusing the handover events. |