
Qontak | Chatbot & AI | Unified Agent Quality Scorecard — Phase 8: Multi-Agent Scoring & Selectable Scorecard

Template: PHASE PRD v1.2 · Companion to PRD Section Reference v1.5 + Hierarchy v1.0

Note: Phase 8 (parked) of the initiative is a human-QA scaling capability, independent of the AI-agent phases (1–5). It extends the in-room Scorecard panel to score every agent who served a room, each on a selectable scorecard. Grounded in the cloned hub-chat + chatbot code.


HEADER BLOCK

PM: Dimas Fauzi Hidayat
PRD Version: 1.0
Status: DRAFT
PRD Type: PHASE
Epic: QC-XXXXX (add once Epic is created)
Squad: BOT (Bot, AI & Automation, with Omnichannel/Chat)
RFC Link: Pending (RFC to follow via rfc-starter)
Figma Master: The provided design mock (Agent + Scorecard dropdowns in the in-room panel); frame link on design
Anchor: Qontak | Chatbot & AI | Unified Agent Quality Scorecard — ANCHOR
Labels: epic:qontak-chatbot-ai | module:chatbot-ai | feature:unified-agent-scorecard
Last Updated: 2026-06-19


2. CONDITIONAL BLOCK: PHASE CONTEXT

Anchor PRD: Qontak | Chatbot & AI | Unified Agent Quality Scorecard — ANCHOR
Phase: Phase 8 of 8
Phase Goal: Score every agent who served a room, each on a selectable scorecard template, via the Agent + Scorecard selectors.
Prior phases: Phases 1–5 deliver the AI-agent scoring core; Phase 2 shipped the in-room Scorecard panel that this phase extends. This phase is a human-QA scaling effort, independent of AI-agent scoring.
Deferred to later phases: None. This is the last planned phase (parked).
Cross-phase dependencies: Reuses (1) the Phase 2 in-room Scorecard panel, (2) the existing agent_scorecard model + the per-room GET/POST/PATCH /agent_scorecards/{roomId} API, (3) the roomParticipant[] array, and (4) the handover events (agent_take_room / remove_agent / handover_id).

3. One-liner + Problem

One-liner: Let a supervisor score any agent who handled a room — not just the first — and pick which scorecard template applies to each.

Problem: Today the in-room Scorecard panel scores only one agent per room — selectedAgent = roomParticipant[0] (the first participant) — against a single org-wide scorecard. When a room is handed over (agent takeover), the agent who actually resolved it often isn't the first, so supervisors can't QA the real handler; and there's no way to apply a role-appropriate scorecard (e.g. a different template for sales vs support). The (org, room_id, agent_id) unique key and the roomParticipant[] array already support multiple agents — only the UI and API resolve a single one. This caps QA coverage on multi-agent rooms and forces one-size-fits-all criteria.


4. What Happens If We Don't Ship This Phase

  • Multi-agent rooms stay mis-QA'd — only the first participant is scorable; the agent who actually resolved a handed-over room goes unscored, so QA coverage has a blind spot that grows with handover volume.
  • One-size-fits-all scoring persists — no role/use-case-appropriate scorecard, so criteria don't fit every agent (a sales agent judged on support parameters).
  • The design mock stays unbuilt — the Agent + Scorecard dropdowns are a known, parked backlog gap with an existing design.

5. Target Users + Persona Context

Primary Persona: QA Lead / Supervisor

Role: QA Lead or Supervisor reviewing conversation quality, including handed-over (multi-agent) rooms
Goal: Score the agent who actually handled the room (any of them), each against the right scorecard template
Pain: The panel only exposes the first participant and one scorecard; the real resolver is often unscorable
Workaround: Skipping QA on handover rooms, or scoring the wrong (first) agent

Secondary Persona: Team Lead (mixed-role team)

Role: Lead of a team spanning roles (sales + support) on shared rooms
Goal: Apply role-appropriate scorecards per agent rather than one generic template
Pain: A single org-wide scorecard can't fairly judge agents in different roles
Workaround: Mentally adjusting scores, or maintaining role criteria in spreadsheets

6. Non-Goals

  1. Not AI-agent scoring — Phases 1–3 own that; this phase is human-QA scaling.
  2. Not authoring/creating scorecard templates — templates are assumed to exist; this phase only selects among them.
  3. Not the Analytics report — Phase 3.
  4. Not the go-live gate — Phase 5.
  5. No change to the auto-scorer logic (auto_agent_scoring.rb still picks the first agent — see Open Q#2).
  6. No mobile — web only.
  7. Not per-team permission scoping — tracked separately (the org-wide RBAC caveat).

7. Constraints

Platform: Web only (Qontak omnichannel web app)
Performance: Agent/scorecard switch in the panel ≤ 500 ms; scorecard fetch ≤ 800 ms P95
Data limits: One scorecard record per (org, room_id, agent_id) (existing unique key); the existing edit-once rule applies per record (edit_count / correction_at)
Plan scope: Professional + Enterprise (Agent Scorecard tier)
Feature flag: scorecard_multi_agent (default: OFF)
Read/write: Read/score requires Usman inbox_scorecard_view / inbox_scorecard_manage (org-wide today, no team scope). End CS agents: no access.

8. Feature Changes

Change ID: CHG-A — Agent selector in the in-room Scorecard panel

Change Type: Modified component (in-room Scorecard panel)
Page: /inbox (conversation right panel → Scorecard tab)
Page Intent: A supervisor scores an agent's quality for the open conversation
Before: The panel scores only the first participant, selectedAgent = roomParticipant[0] (AgentScorecard.vue); other agents who served the room are not selectable or scorable.
After: An Agent dropdown lists every agent who served the room (from roomParticipant[] / roomAgent[]); selecting an agent loads/creates that agent's scorecard record, stored per (room, agent_id).

Element | Before | After
Agent in scope | Hardcoded roomParticipant[0] | Selectable from all room agents
Scorecard records | One per room (first agent) | One per (room, agent_id) (existing key)

Figma: The provided design mock (Agent dropdown).
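
As a rough illustration of the Before/After above, the sketch below contrasts today's hardcoded pick with a selectable agent list. The RoomParticipant shape and all function names are assumptions made for this example; the actual AgentScorecard.vue code and the roomParticipant[] payload may differ.

```typescript
// Hypothetical participant shape; the real roomParticipant[] payload may differ.
interface RoomParticipant {
  id: string;
  name: string;
}

// Before: the panel always resolves the first participant.
function firstParticipant(participants: RoomParticipant[]): RoomParticipant | undefined {
  return participants[0];
}

// After: every distinct agent who served the room, in order of first appearance.
function listSelectableAgents(participants: RoomParticipant[]): RoomParticipant[] {
  const seen: Record<string, boolean> = {};
  return participants.filter((p) => {
    if (seen[p.id]) return false;
    seen[p.id] = true;
    return true;
  });
}

// Resolve the supervisor's pick. With no pick, keep today's default (first participant).
// Returns undefined when the picked id is not a room agent.
function resolveSelectedAgent(
  participants: RoomParticipant[],
  pickedId?: string
): RoomParticipant | undefined {
  if (pickedId === undefined) return firstParticipant(participants);
  return listSelectableAgents(participants).filter((p) => p.id === pickedId)[0];
}
```

Keeping the first participant as the default selection is one way to preserve today's behavior on single-agent rooms.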

Change ID: CHG-B — Scorecard template selector

Change Type: Modified component (in-room Scorecard panel)
Page: /inbox (conversation right panel → Scorecard tab)
Page Intent: Apply the right scoring criteria to the selected agent
Before: A single org-wide enabled scorecard is loaded; the "Scorecard: Default" label is static, with no choice.
After: The Scorecard dropdown lists the org's available scorecard templates; choosing one loads its categories/parameters for the selected agent. Requires the scorecard model/API to expose multiple templates (see Open Q#1).

Element | Before | After
Scorecard template | Single org-wide ("Default", static) | Selectable per agent/room

Figma: The provided design mock (Scorecard dropdown).


9. API & Webhook Behavior

Behavior 1: Get/submit a scorecard for a specific (room, agent, template)

Entity affected: agent_scorecard record keyed by (org, room_id, agent_id)
Triggered by: Supervisor selects an agent (and template) in the panel, then loads/submits
Information passed: room_id, agent_id, scorecard_id (template)
Expected behavior: Extend the per-room GET/POST/PATCH /api/v1/gpt/omnichannel/agent_scorecards/{roomId} to accept agent_id + scorecard_id; return or persist that agent's record on the chosen template
Failure behavior:
• Selected agent is not a room participant → 422.
• Record already corrected (edit-once) → load read-only / block further edit.
• Save fails → error + retry.
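
The load-time outcomes above can be sketched as a small decision function. This is a minimal sketch under stated assumptions: the ScorecardRecord shape, the LoadOutcome type, and the way edit_count / correction_at encode the edit-once rule are all hypothetical, and the real response schema is left to the RFC.

```typescript
// Hypothetical record shape. edit_count and correction_at are named in this PRD,
// but how they encode the edit-once rule is an assumption here.
interface ScorecardRecord {
  agent_id: string;
  scorecard_id: string;
  edit_count: number;
  correction_at: string | null;
}

type LoadOutcome =
  | { status: 422; error: "not_a_participant" }
  | { status: 200; mode: "new" }
  | { status: 200; mode: "editable" | "read_only"; record: ScorecardRecord };

// Assumption: a record is locked once it has been corrected one time.
function isEditLocked(record: ScorecardRecord): boolean {
  return record.edit_count >= 1 || record.correction_at !== null;
}

// Decide what the panel should do when an agent's record is requested.
function resolveLoad(
  participantIds: string[],
  agentId: string,
  existing: ScorecardRecord | undefined
): LoadOutcome {
  if (participantIds.indexOf(agentId) === -1) {
    // Selected agent is not a room participant.
    return { status: 422, error: "not_a_participant" };
  }
  if (existing === undefined) {
    // No record yet for (room, agent): start a new score.
    return { status: 200, mode: "new" };
  }
  return {
    status: 200,
    mode: isEditLocked(existing) ? "read_only" : "editable",
    record: existing,
  };
}
```

Checking participation before looking up the record keeps the 422 path independent of whether a record exists.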

Behavior 2: List scorecard templates for the org

Entity affected: Scorecard templates (category sets) available to the org
Triggered by: Panel opens / the Scorecard dropdown is opened
Information passed: organization_id
Expected behavior: Return the org's available scorecard templates for selection
Failure behavior:
• None enabled → fall back to the Default scorecard.
• Fetch fails → error; keep the last-loaded template.
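
The two failure behaviors above map naturally onto a state-transition function for the dropdown. The sketch below assumes a hypothetical ScorecardTemplate shape, a hypothetical "default" template id, and a simplified FetchResult; none of these are confirmed by the current model (see Open Q#1).

```typescript
// Hypothetical template shape and ids; the real scorecard model may differ (Open Q#1).
interface ScorecardTemplate {
  id: string;
  name: string;
}

const DEFAULT_TEMPLATE: ScorecardTemplate = { id: "default", name: "Default" };

type FetchResult =
  | { status: "ok"; templates: ScorecardTemplate[] }
  | { status: "failed"; reason: string };

interface DropdownState {
  options: ScorecardTemplate[];
  active: ScorecardTemplate;
  error: string | null;
}

// Compute the dropdown state after a template-list fetch completes.
function nextDropdownState(previous: DropdownState, result: FetchResult): DropdownState {
  if (result.status === "failed") {
    // Fetch failed: surface the error and keep the last-loaded template.
    return { options: previous.options, active: previous.active, error: result.reason };
  }
  if (result.templates.length === 0) {
    // None enabled: fall back to the Default scorecard.
    return { options: [DEFAULT_TEMPLATE], active: DEFAULT_TEMPLATE, error: null };
  }
  // Keep the current template if it is still listed, otherwise take the first one.
  const stillListed = result.templates.some((t) => t.id === previous.active.id);
  return {
    options: result.templates,
    active: stillListed ? previous.active : result.templates[0],
    error: null,
  };
}
```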

Claude resolves during RFC: HTTP method, path, request/response JSON schema, error codes.
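
Until the RFC fixes the contract, one possible request shape is sketched below for discussion only. It assumes agent_id and scorecard_id travel as query parameters on the existing per-room path; the RFC may instead place them in the path or the request body, and buildScorecardUrl is a hypothetical helper name.

```typescript
// Assumption: agent_id and scorecard_id are sent as query parameters.
// The RFC may instead place them in the path or the request body.
const BASE_PATH = "/api/v1/gpt/omnichannel/agent_scorecards";

// Build the per-room URL for a specific agent and, optionally, a template.
function buildScorecardUrl(roomId: string, agentId: string, scorecardId?: string): string {
  const params = ["agent_id=" + encodeURIComponent(agentId)];
  if (scorecardId !== undefined) {
    params.push("scorecard_id=" + encodeURIComponent(scorecardId));
  }
  return BASE_PATH + "/" + encodeURIComponent(roomId) + "?" + params.join("&");
}
```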


10. System Flow + User Stories + ACs

10.1 System Flow

Flow: Score a Specific Agent on a Chosen Scorecard
Type: User Journey

  1. A supervisor opens a resolved room's Scorecard panel.
  2. The Agent dropdown lists every agent who served the room (from roomParticipant[]).
  3. The supervisor selects an agent → the panel requests that agent's scorecard record (room_id + agent_id).
  4. The supervisor picks a template from the Scorecard dropdown → its categories/parameters load.
  5. Decision — does a record already exist for (room, agent)? Yes → load it (read-only if the edit-once slot is used). No → start a new score.
  6. The supervisor scores (thumbs/values) and submits → persisted as one record per (room, agent_id), with the chosen scorecard_id stored on that record.
  7. (Optional) each agent's score is scoped to the turns they handled, derived from the handover events.
  8. Failure branch — the selected agent is not a participant → error; the dropdown only lists valid room agents.

📊 System Flow — Multi-Agent Scoring

graph TD
A[Supervisor opens resolved room Scorecard panel] --> B[Agent dropdown lists all room agents]
B --> C[Select an agent]
C --> D[Scorecard dropdown lists templates]
D --> E["Select a template -> categories load"]
E --> F{Record exists for room+agent?}
F -->|Yes| G[Load record - read-only if edit-once used]
F -->|No| H[Start new score]
G --> I["Score + submit -> persist per room+agent, template recorded"]
H --> I
C -.invalid agent.-> J[Error - dropdown only lists valid room agents]

10.2 User Stories

[P8-S01] — Score a specific agent via the Agent selector

User Story: As a Supervisor, I want to pick any agent who served the room and score them, so that I can QA the agent who actually handled the conversation, not just the first.
Before State: The panel scores only roomParticipant[0]; other agents who served the room are not selectable.
After Delta: An Agent dropdown lists all room agents; selecting one loads/creates that agent's scorecard record per (room, agent_id).
Importance: Must Have
Mockup / Technical Notes: Figma: The provided design mock (Agent dropdown)

Data Fields:
room_id (string, required) — conversation
agent_id (string, required) — selected agent
roomParticipant[] (array) — source of the agent list

Technical Notes: Extends AgentScorecard.vue (selectedAgent no longer hardcoded [0]) + the per-room API to be agent-aware. Records reuse the (org, room_id, agent_id) unique key.
Acceptance Criteria:
— Happy Path —
• AC-1: Given a resolved room served by ≥2 agents, when the supervisor opens the panel, then the Agent dropdown lists every agent who served the room.
• AC-2: Given the supervisor selects an agent, when the panel loads, then that agent's scorecard record is fetched (or a new one is started) for (room, agent_id).
• AC-3: Given the supervisor scores and submits, when saved, then the record persists against the selected agent without affecting other agents' records on the same room.

— Edge —
• AC-4: Given an agent already has a saved (and edit-once-locked) record, when selected, then their score loads read-only.

— Error / Unhappy Path —
• ERR-1: Given the selected agent is not a room participant, when the panel requests their record, then a 422/"not a participant" state is shown and scorecard_invalid_agent is logged.

— Permission Model —
• CAN: Usman inbox_scorecard_manage (org-wide).
• CANNOT: inbox_scorecard_view holders can read but not score; end CS agents have no access.
• Unauthorized: panel/dropdown not rendered.

— UI States —
• Loading: dropdown + form skeleton while fetching.
• Empty: single-agent room → dropdown shows one agent (today's behavior).
• Error: as ERR-1.
• Success: selected agent's scorecard shown.

— Negative Scenarios —
• NEG-1: Given a room with only one agent, when the panel opens, then it behaves exactly as today (no regression).

Dependencies: Phase 2 in-room panel.
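
AC-3 relies on records being isolated by the (org, room_id, agent_id) key. The sketch below uses a hypothetical in-memory ScorecardStore as a stand-in for the agent_scorecard table to show that saving one agent's score leaves another agent's record on the same room untouched. The Score shape and class name are assumptions for illustration.

```typescript
// Hypothetical score payload; the real agent_scorecard columns may differ.
interface Score {
  scorecard_id: string;
  values: Record<string, number>;
}

// In-memory stand-in for the agent_scorecard table, keyed by the
// (org, room_id, agent_id) unique key described in this PRD.
class ScorecardStore {
  private records: Record<string, Score> = {};

  // JSON-encode the tuple so ids containing separators cannot collide.
  private key(org: string, roomId: string, agentId: string): string {
    return JSON.stringify([org, roomId, agentId]);
  }

  save(org: string, roomId: string, agentId: string, score: Score): void {
    this.records[this.key(org, roomId, agentId)] = score;
  }

  find(org: string, roomId: string, agentId: string): Score | undefined {
    return this.records[this.key(org, roomId, agentId)];
  }
}
```

Because the key includes agent_id, a second save for the same room and a different agent creates a second record rather than overwriting the first.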


[P8-S02] — Choose a scorecard template via the Scorecard selector

User Story: As a Supervisor, I want to choose which scorecard template applies to the agent I'm scoring, so that the criteria fit the agent's role/use-case.
Before State: One org-wide scorecard loads; the "Scorecard: Default" label is static, with no choice.
After Delta: A Scorecard dropdown lists the org's available templates; choosing one loads its categories/parameters for the selected agent.
Importance: Must Have
Mockup / Technical Notes: Figma: The provided design mock (Scorecard dropdown)

Data Fields:
organization_id (string, required) — Auth session
scorecard_id (string, required) — chosen template

Technical Notes: Needs the scorecard model/API to expose multiple templates — confirm whether multiple templates exist today or are net-new (Open Q#1).
Acceptance Criteria:
— Happy Path —
• AC-1: Given the panel is open, when the supervisor opens the Scorecard dropdown, then the org's available scorecard templates are listed.
• AC-2: Given the supervisor selects a template, when it loads, then its categories/parameters render for the selected agent.
• AC-3: Given a template is chosen and a score submitted, when saved, then the saved record stores the scorecard_id (template) that was used.

— Edge —
• AC-4: Given the org has only the Default scorecard, when the dropdown opens, then only "Default" is listed (no regression).

— Error / Unhappy Path —
• ERR-1: Given the template list fetch fails, when the dropdown opens, then an error is shown and the last-loaded template stays active; scorecard_template_fetch_failed is logged.

— Permission Model —
• CAN: inbox_scorecard_manage.
• CANNOT: inbox_scorecard_view holders can read but not select or score; end agents have no access.
• Unauthorized: control not rendered.

— UI States —
• Loading: dropdown spinner while listing templates.
• Empty: only Default available.
• Error: as ERR-1.
• Success: chosen template loaded.

— Negative Scenarios —
• NEG-1: Given a Starter/Free org, when the panel opens, then template selection is not available (plan-gated).

Dependencies: P8-S01.
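
The permission model and the plan gate in NEG-1 can be combined into a single access check. The sketch below is an assumption about how the pieces compose: the Viewer shape and selectorAccess are hypothetical names, and it treats the scorecard_multi_agent flag as governing only the new selectors, not the whole panel.

```typescript
type Plan = "Free" | "Starter" | "Professional" | "Enterprise";
type Permission = "inbox_scorecard_view" | "inbox_scorecard_manage";
type Access = "hidden" | "read_only" | "can_score";

// Hypothetical viewer context assembled from the session.
interface Viewer {
  plan: Plan;
  permissions: Permission[];
  flagOn: boolean; // scorecard_multi_agent enabled for this org
}

// Decide how the Agent + Scorecard selectors render for a viewer.
function selectorAccess(viewer: Viewer): Access {
  const planAllowed = viewer.plan === "Professional" || viewer.plan === "Enterprise";
  if (!viewer.flagOn || !planAllowed) return "hidden";
  if (viewer.permissions.indexOf("inbox_scorecard_manage") !== -1) return "can_score";
  if (viewer.permissions.indexOf("inbox_scorecard_view") !== -1) return "read_only";
  // End agents without either permission: control not rendered.
  return "hidden";
}
```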


[P8-S03] — Scope each agent's score to the turns they handled

User Story: As a Supervisor, I want each agent scored on the turns they actually handled, so that handover doesn't credit/penalize the wrong agent.
Before State: Scoring is over the whole room; there's no per-agent turn segmentation.
After Delta: The panel scopes the conversation shown for an agent to their handled turns, derived from the handover events (agent_take_room / remove_agent / handover_id).
Importance: Should Have
Mockup / Technical Notes: Figma: The provided design mock

Data Fields:
room_id, agent_id (required)
handover_id / handover events — segment boundaries

Technical Notes: Same segmentation mechanism as Phase 2 (P2/Open Q#3); confirm events fire for human→human handover.
Acceptance Criteria:
— Happy Path —
• AC-1: Given a room handed over from Agent A to Agent B, when the supervisor selects Agent A, then the turns shown are scoped to A's handled segment (before the handover).
• AC-2: Given the supervisor selects Agent B, when the panel loads, then the turns shown are scoped to B's segment (after the handover).

— Edge —
• AC-3: Given handover events are missing for a room, when an agent is selected, then the panel falls back to the full transcript and flags "segment unavailable".

— Permission Model —
• CAN: inbox_scorecard_view / manage.
• CANNOT: end agents.
• Unauthorized: not rendered.

— UI States —
• Loading: transcript skeleton.
• Empty: N/A.
• Error: "segment unavailable" → full transcript fallback.
• Success: segmented transcript shown.

— Negative Scenarios —
• NEG-1: Given a single-agent room, when an agent is selected, then the full transcript is shown (no segmentation needed).

Dependencies: P8-S01.
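
One way the segmentation in AC-1 to AC-3 could work is sketched below. It is not the Phase 2 mechanism itself, only an illustration under assumptions: the HandoverEvent and Turn shapes are hypothetical, each event is assumed to carry a timestamp, and another agent taking the room is assumed to end the previous agent's segment. Single-agent rooms (NEG-1) would skip this function and show the full transcript.

```typescript
// Hypothetical event and turn shapes; real payloads may differ.
interface HandoverEvent {
  type: "agent_take_room" | "remove_agent";
  agent_id: string;
  at: number; // epoch milliseconds
}

interface Turn {
  id: string;
  at: number; // epoch milliseconds
}

interface ScopedTranscript {
  turns: Turn[];
  segmentUnavailable: boolean;
}

// Scope a transcript to the time windows in which the given agent held the room.
function scopeTurnsToAgent(turns: Turn[], events: HandoverEvent[], agentId: string): ScopedTranscript {
  const mine = events.filter((e) => e.agent_id === agentId && e.type === "agent_take_room");
  if (mine.length === 0) {
    // AC-3 fallback: no usable events, show the full transcript and flag it.
    return { turns: turns, segmentUnavailable: true };
  }
  const sorted = events.slice().sort((a, b) => a.at - b.at);
  const windows: Array<{ start: number; end: number }> = [];
  let openStart: number | null = null;
  for (const e of sorted) {
    const isMyTake = e.type === "agent_take_room" && e.agent_id === agentId;
    // Assumption: my removal, or another agent taking the room, closes my window.
    const endsMine =
      (e.type === "remove_agent" && e.agent_id === agentId) ||
      (e.type === "agent_take_room" && e.agent_id !== agentId);
    if (isMyTake) {
      if (openStart === null) openStart = e.at;
    } else if (endsMine && openStart !== null) {
      windows.push({ start: openStart, end: e.at });
      openStart = null;
    }
  }
  if (openStart !== null) windows.push({ start: openStart, end: Infinity });
  const scoped = turns.filter((t) => windows.some((w) => t.at >= w.start && t.at < w.end));
  return { turns: scoped, segmentUnavailable: false };
}
```

With half-open windows, a turn stamped exactly at the handover time is attributed to the agent taking over.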


11. Rollout

Feature flag: scorecard_multi_agent (default: OFF)
Stage 1: Internal QA: 3–5 internal accounts with multi-agent (handover) rooms
Stage 2: Closed beta: 3–5 accounts with high handover volume + mixed-role teams
Stage 3: All Professional + Enterprise on request
GA: All Professional + Enterprise (flag on)
Backward compat: Yes. Single-agent rooms behave exactly as today; the dropdowns are additive
Migration: None. Reuses the existing agent_scorecard (org, room_id, agent_id) records.

12. Observability

Key Events:

Event Name | Trigger | Properties
scorecard_agent_selected | Supervisor selects an agent in the panel | org_id, room_id, agent_id
scorecard_template_selected | Supervisor selects a scorecard template | org_id, scorecard_id
scorecard_multi_agent_submitted | A per-agent scorecard is submitted | org_id, room_id, agent_id, scorecard_id
scorecard_invalid_agent | Selected agent not a room participant | org_id, room_id, agent_id
scorecard_template_fetch_failed | Template list fetch failed | org_id, reason

Dashboard owner: Bot, AI & Automation (squad: BOT) + Omnichannel
Alert 1: scorecard_invalid_agent > 2% of selections in 1h → Slack: #bot-ai-oncall
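
The Alert 1 condition can be expressed as a simple rate over a one-hour window. The sketch below assumes the denominator is the count of scorecard_agent_selected events; whether an invalid selection also emits scorecard_agent_selected is not specified here, so treat that as an open detail. The TrackedEvent shape and function names are hypothetical.

```typescript
// Hypothetical analytics event shape.
interface TrackedEvent {
  name: string;
  at: number; // epoch milliseconds
}

const ONE_HOUR_MS = 60 * 60 * 1000;
const INVALID_AGENT_THRESHOLD = 0.02; // 2% of selections

// Share of selections in the last hour that hit the invalid-agent path.
// Assumption: selections are counted via scorecard_agent_selected.
function invalidAgentRate(events: TrackedEvent[], now: number): number {
  const recent = events.filter((e) => e.at > now - ONE_HOUR_MS && e.at <= now);
  const selections = recent.filter((e) => e.name === "scorecard_agent_selected").length;
  const invalid = recent.filter((e) => e.name === "scorecard_invalid_agent").length;
  return selections === 0 ? 0 : invalid / selections;
}

// True when the rate strictly exceeds the 2% threshold.
function shouldAlert(events: TrackedEvent[], now: number): boolean {
  return invalidAgentRate(events, now) > INVALID_AGENT_THRESHOLD;
}
```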

12.1 Post-Launch Monitoring Cadence

Review cadence: Weekly for the first 4 weeks post-GA, then monthly
Owner: Dimas Fauzi Hidayat (PM) + BOT squad
Review scope: scorecard_agent_selected, scorecard_template_selected, scorecard_multi_agent_submitted, scorecard_invalid_agent
Trigger threshold 1: scorecard_invalid_agent > 2% week-over-week → investigate the participant list
Trigger threshold 2: Multi-agent rooms scored / multi-agent rooms resolved < 20% after 4 weeks → revisit discoverability
Rollback consideration: If invalid-agent or save errors persist > 48h, PM disables scorecard_multi_agent for affected orgs.

13. Success Metrics

Adoption & Usage:

Metric | Definition | Baseline | Target
Multi-agent QA coverage | % of multi-agent (handover) resolved rooms where >1 agent is scored | ~0% (only the first agent is scorable today) | ≥40% within 90 days of GA
Template selection usage | % of submissions that use a non-Default scorecard | 0% (single org-wide scorecard) | ≥25% within 90 days of GA (for orgs with ≥2 templates)
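
Both adoption metrics reduce to simple ratios. The sketch below shows one way to compute them; the ResolvedRoom shape, the function names, and the idea of identifying the Default scorecard by id are assumptions for illustration, not an existing reporting API.

```typescript
// Hypothetical per-room summary used for reporting.
interface ResolvedRoom {
  room_id: string;
  agent_ids: string[]; // agents who served the room
  scored_agent_ids: string[]; // agents with a submitted scorecard
}

// % of multi-agent resolved rooms where more than one agent is scored.
function multiAgentCoverage(rooms: ResolvedRoom[]): number {
  const multi = rooms.filter((r) => r.agent_ids.length > 1);
  if (multi.length === 0) return 0;
  const covered = multi.filter((r) => r.scored_agent_ids.length > 1).length;
  return (covered / multi.length) * 100;
}

// % of submissions that used a scorecard other than the Default one.
function nonDefaultUsage(submittedScorecardIds: string[], defaultId: string): number {
  if (submittedScorecardIds.length === 0) return 0;
  const nonDefault = submittedScorecardIds.filter((id) => id !== defaultId).length;
  return (nonDefault / submittedScorecardIds.length) * 100;
}
```

Single-agent rooms are excluded from the coverage denominator, matching the metric definition above.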

Quality & Accuracy:

Metric | Definition | Baseline | Target
Correct-handler scoring | Sampled checks where the scored agent matches the actual resolver on handover rooms | N/A | ≥95% on the validation sample

14. Launch Plan & Stage Gates

Stage | Audience | Duration | Success Gate to Advance | Owner
Internal Alpha | 3–5 internal QA accounts | 2 weeks | 0 P0/P1; single-agent rooms unchanged; scorecard_invalid_agent ≤2% | PM + QA
Closed Beta | 3–5 high-handover / mixed-role accounts | 2 weeks | >1 agent scored on handover rooms; template selection works; no P0 | PM + BOT
Open Beta | All Pro+Ent on request | 2 weeks | Multi-agent QA coverage climbing; no P0 for 2 weeks | Eng Lead
GA | All Pro+Ent | Ongoing | All Open Beta gates sustained 2 weeks; PMM approved | PM + PMM

15. Dependencies

Dependency | Owning Team | Deliverable Needed | Blocking?
Multiple scorecard templates in the model/API | BOT (chatbot) | The scorecard model/API exposes >1 selectable template per org (confirm if net-new, Open Q#1) | YES
Agent-aware scorecard API | BOT (chatbot) | Extend GET/POST/PATCH /agent_scorecards/{roomId} to accept agent_id + scorecard_id | YES
Handover events for segmentation | Omnichannel | agent_take_room / remove_agent / handover_id fire reliably for human→human handover | NO (P8-S03 only)
In-room panel (Phase 2) | BOT (Phase 2) | The panel this phase extends | YES
Design / UX | Design squad | Frame the provided mock (Agent + Scorecard dropdowns) | YES

16. Key Decisions + Alternatives Rejected

8a — Decisions Made

Date | Decision | Rationale
2026-06-19 | Reuse the existing (org, room_id, agent_id) key: one record per agent per room | The schema already supports it; no migration needed (verified in chatbot + hub-chat)
2026-06-19 | Surface a selector over the existing roomParticipant[] array rather than a new data path | The array already carries every agent; the gap is in how the UI and API resolve a single agent (selectedAgent = roomParticipant[0]), not in the data
2026-06-19 | Per-agent segmentation reuses the Phase 2 handover-event mechanism | One segmentation approach across the initiative; lower build effort + consistent behavior
2026-06-19 | Sequence as the last (parked) phase, after the AI-agent core | Human-QA scaling is independent of and lower-priority than the AI-agent scoring focus

8b — Alternatives Rejected

Alternative | Why Rejected | Date
Keep scoring only the first agent | Misses the actual resolver on handover rooms; the core problem | 2026-06-19
One global scorecard for all roles | Can't fairly judge agents in different roles; the second half of the mock | 2026-06-19
Build a new per-agent data model | Unnecessary: the (org, room_id, agent_id) key already supports it | 2026-06-19

17. Open Questions

# | Type | Question | Owner | Deadline
1 | Open Question | Does the scorecard model/API already support multiple selectable templates per org, or is that net-new? Today the FE loads a single org-wide enabled scorecard (the static "Default"). | BOT | 2026-10-15
2 | Open Question | The auto-scorer (auto_agent_scoring.rb) scores only the "first agent". Should it be extended to auto-score every agent, or stay first-agent with manual multi-agent scoring here? | Bot/AI | 2026-10-15
3 | Risk | Per-agent segmentation depends on handover events firing reliably for human→human takeover. Mitigation: validate the events in Internal Alpha; fall back to the full transcript with a "segment unavailable" flag (P8-S03/AC-3). | Omnichannel | 2026-10-31
4 | Open Question | Per-team permission scoping is out of scope (RBAC is org-wide today). Confirm org-wide is acceptable for this phase, or schedule team-scoping separately. | PM + Platform | 2026-10-15

PRD CHANGELOG

Version | Date | By | Section | Type | Summary
1.0 | 2026-06-19 | Claude | All | CREATED | Phase 8 PRD (Multi-Agent Scoring & Selectable Scorecard), grounded in cloned hub-chat: Agent + Scorecard selectors over the existing roomParticipant[] array and (org, room_id, agent_id) key, with per-agent segmentation reusing the handover events.