Skip to main content

Qontak | Chatbot & AI | Unified Agent Quality Scorecard — Phase 1: Scorecard Settings & Rubric Config

Template: NEW PRD v1.2 · Companion to PRD Section Reference v1.5 + Hierarchy v1.0 Note: Phase 1 of the Unified Agent Quality Scorecard initiative. Builds the config layer only — no scoring, in-room panel, report, or gate (those are Phases 2–5). The detailed superset draft is preserved at unified_agent_scorecard_SUPERSET_allphases_19Jun.md.


HEADER BLOCK

FieldValue
PMDimas Fauzi Hidayat
PRD Version1.2
StatusDRAFT
PRD TypeNEW
EpicQC-XXXXX — add once Epic is created
SquadBOT — Bot, AI & Automation
RFC LinkPending — RFC to follow via rfc-starter
Figma MasterPending — settings + rubric editor not yet designed (Stitch prompts in Appendix B)
AnchorYes — Qontak | Chatbot & AI | Unified Agent Quality Scorecard — ANCHOR
Labelsepic:qontak-chatbot-ai | module:chatbot-ai | feature:unified-agent-scorecard
Last Updated2026-06-19

Table of Contents


2. One-liner + Problem

One-liner: Let admins enable AI auto-scoring and set the pass bar, and let QA leads / bot admins define the rubric that scores AI agents.

Problem: There is no configuration layer for AI-agent scoring today. The existing is_auto_score already drives a GPT auto-scorer (auto_agent_scoring.rb) that scores the human agent on the manual categories on room resolve — but there is no way to turn on AI-agent (two-tier) scoring, no AI pass threshold for it, and scorecard_custom_parameter.prompt exists in the schema yet is unused and unsurfaced. Before any AI conversation can be scored (Phase 2), QA leads and bot admins across Qontak omnichannel accounts need a place to define what "good" means for the AI agent — which metrics apply, what the pass bar is, and any org-specific criteria. Without this foundation, every later phase (scoring, report, gate) has nothing to score the AI agent against.


3. What Happens If We Don't Build This

  • Every later phase is blocked — Phase 2 scoring (targeted Q3 2026), the report (P3), and the gate (P5) all consume the rubric + threshold this phase defines; each quarter of slippage pushes measured AI quality out another quarter.
  • The custom-param prompt field stays unused — it has sat in the schema since the Nov 2024 migration with no surface, and there is no way to turn on AI-agent scoring at all; without this phase, orgs can't express AI scoring criteria.
  • The adoption decline continues — Agent Scorecard is already the lowest-adoption paid feature at every tier (declining 3 consecutive months); with no AI value to define, there is nothing to reverse it.

4. Target Users + Persona Context

Primary Persona: QA Lead / Supervisor

FieldDetail
RoleQA Lead or Supervisor accountable for conversation quality across human and AI agents
GoalDefine the quality bar and the rubric (defaults + org-specific criteria) the AI agent will be scored against
PainNo way to configure AI scoring; the existing scorecard config is manual-human-only
WorkaroundQuality expectations live in spreadsheets/training docs, not in the product

Secondary Persona: Bot / AI Admin (Agent Owner)

FieldDetail
RoleThe Bot/AI specialist/admin who configures AI agents
GoalAdd org-specific scoring criteria (e.g. BANT capture, promo accuracy) for their agents
PainCannot express bespoke success criteria for the AI agent
WorkaroundNone — bespoke criteria are tracked manually, if at all

5. Non-Goals

  1. Not the scoring pipeline — ingesting the engine's 9-metric output and computing scores is Phase 2.
  2. Not the in-room Scorecard panel changes — AI-mode display, actor selector, multi-actor scoring are Phase 2.
  3. Not the Analytics report — the unified report + export is Phase 3.
  4. Not the validation/testing harness — pre-launch scoring is Phase 4.
  5. Not the go-live gate — gate decision + advisory/enforced modes are Phase 5.
  6. No change to human manual scoring — the existing manual scorecard config is unchanged.
  7. No mobile — web (Qontak omnichannel) only.
  8. No billing/packaging change.

6. Constraints

FieldValue
PlatformWeb only — Qontak omnichannel web app
PerformanceSettings/rubric save ≤ 500ms P95
Data limitsCustom-param rubric (prompt) max length: see Open Q#2 (proposed ~4,000 chars)
Plan scopeProfessional + Enterprise only. Not Starter/Free.
Feature flagai_qa_unified_scorecard | default: OFF. Phase 1 surfaces sit behind this flag and become customer-visible together with Phase 2 scoring.
Read/writeRead: QA Lead/Supervisor, Bot/AI Admin. Write threshold + is_auto_score: Supervisor/Admin. Custom-param rubric config: QA Lead/Supervisor or Bot/AI Admin. End CS agents: no access.

7. Feature Changes

Change ID: CHG-001 — Surface and persist AI auto-scoring settings

FieldDetail
Change TypeModified component (Scorecard settings)
Page/settings/scorecard
Page IntentAdmin configures how AI agents will be scored and what counts as a pass
Beforeis_auto_score already drives the existing GPT auto-scorer (auto_agent_scoring.rb) that scores the human agent on the manual categories on room resolve.
• There is no way to enable AI-agent (two-tier) scoring; passing_grade applies only to the human scorecard.
After• A Scorecard settings section extends is_auto_score to also enable AI-agent (two-tier) scoring and adds an AI pass threshold, persisted per org (the existing human auto-score is untouched).
• Persisted config is consumed by the Phase 2 AI scoring pipeline. Enabling it in Phase 1 records intent — no AI scores are produced until Phase 2.
ElementBeforeAfter
is_auto_score scopeDrives human auto-scoring only (auto_agent_scoring.rb)Also enables AI-agent two-tier scoring
AI pass thresholdNone — passing_grade is human-scorecard onlyNew AI pass threshold (0–100), persisted

Figma: Pending.

Change ID: CHG-002 — Wire the custom-parameter judging rubric

FieldDetail
Change TypeModified component (custom parameter editor)
Page/settings/scorecard/custom-parameters
Page IntentOrg defines its own scoring parameters beyond the Qontak defaults
Beforescorecard_custom_parameter.prompt exists (string) but is not surfaced or used; custom params are manual-only.
Afterprompt becomes an editable "AI judging rubric" input (widened stringtext).
• A QA Lead/Supervisor or Bot/AI Admin can add a custom param and write its rubric; a non-empty rubric marks it auto-scorable (consumed by Phase 2 tier-2 scoring). Empty rubric → manual-only.
ElementBeforeAfter
Custom param promptIn schema, unused, stringEditable "AI judging rubric" textarea, text
Who can configureSupervisor/Admin (manual params)QA Lead/Supervisor or Bot/AI Admin

Figma: Pending.


8. New Features

Feature: AI Judging Rubric editor + Default Rubric viewer (new components within Scorecard settings)

FieldDetail
URL/settings/scorecard/custom-parameters (editor) · /settings/scorecard (default viewer)
AccessQA Lead/Supervisor and Bot/AI Admin (add/edit custom rubric); all of them read-only on the default rubric

Component Tree:

ComponentParentPurpose
ScorecardSettingsPageContainer for AI scoring config
AutoScoreToggleScorecardSettingsPageEnable AI auto-scoring + passing-grade input
DefaultRubricViewerScorecardSettingsPageRead-only list of the 9 Qontak default metrics (+ veto flags)
CustomParamEditorScorecardSettingsPageAdd/edit a custom param + "AI judging rubric" textarea + auto-scorable indicator

UI States:

StateDescription
EmptyNo custom params yet → "No custom parameters. Add one to score the AI agent on your own criteria."
LoadingSkeleton form fields while fetching saved config.
Error"Couldn't save. Try again." + Retry. Log: scorecard_settings_save_failed.
SuccessSaved state with confirmation; auto-scorable indicator lit when a rubric is present.

Figma: Pending — Stitch prompts in Appendix B.

📊 UI State Diagram — Scorecard Settings & Rubric Editor

stateDiagram-v2
[*] --> Loading: Open Scorecard settings
Loading --> Empty: No custom params yet
Loading --> Success: Saved config loaded
Loading --> Error: Load / save fails
Error --> Loading: Retry
Empty --> Success: Add first custom param
Success --> [*]: Config saved

9. API & Webhook Behavior

Behavior 1: Persist Scorecard preference (AI auto-scoring + threshold)

FieldDetail
Entity affectedscorecard_preference (is_auto_score, passing_grade)
Triggered bySupervisor/Admin saves Scorecard settings
Information passedOrg, is_auto_score, passing_grade
Expected behaviorPersist per org (unique per org); audit via paper_trail
Failure behaviorpassing_grade outside 0–100 → validation error, not saved.
• Save fails → error + retry; scorecard_settings_save_failed logged.

Behavior 2: Create/update a custom parameter + rubric

FieldDetail
Entity affectedscorecard_custom_parameter (name, prompt)
Triggered byQA Lead/Supervisor or Bot/AI Admin saves a custom parameter
Information passedOrg, name, prompt (rubric, optional)
Expected behaviorPersist; non-empty prompt marks the param auto-scorable; audit via paper_trail
Failure behavior• Rubric over max length → validation error.
• Save fails → error + retry; scorecard_custom_param_save_failed logged.

Claude resolves during RFC: HTTP method, path, request/response JSON schema, error codes.


10. System Flow + User Stories + ACs

10.1 System Flow

Flow: Configure AI scoring for an organization Type: User Journey

  1. A Supervisor/Admin opens Scorecard settings.
  2. They toggle is_auto_score ON and set passing_grade (0–100).
  3. Decision point — threshold within 0–100? No → validation error, not saved. Yes → persist preference.
  4. A QA Lead or Bot/AI Admin opens the custom-parameter editor and adds a parameter with an "AI judging rubric".
  5. Decision point — rubric non-empty? Yes → param marked auto-scorable. No → param saved manual-only.
  6. Failure branch — if a save fails, show error + Retry and log the failure; no partial state persists.
  7. Any authorized user can open the read-only Default Rubric viewer to see the 9 Qontak default metrics (+ veto flags).
  8. Config is now ready to be consumed by the Phase 2 scoring pipeline (no scores produced in this phase).

📊 System Flow — Configure AI Scoring

graph TD
A[Supervisor/Admin opens Scorecard settings] --> B[Toggle is_auto_score ON + set passing_grade]
B --> C{Threshold within 0-100?}
C -->|No| D[Validation error — not saved]
C -->|Yes| E[Persist preference]
E --> F[QA Lead / Bot Admin adds custom parameter + rubric]
F --> G{Rubric non-empty?}
G -->|Yes| H[Param marked auto-scorable]
G -->|No| I[Param saved manual-only]
H --> J{Save succeeds?}
I --> J
J -->|No| K[Error + Retry — no partial state, log failure]
J -->|Yes| L[Config ready for Phase 2 scoring]
K --> F

10.2 User Stories

[UASC-S01] — Enable AI auto-scoring and set the pass threshold

User StoryAs a Supervisor/Admin, I want to turn on AI-agent scoring and set its pass threshold for my org, so that AI agents will be scored against a defined bar when scoring ships.
Before Stateis_auto_score already drives the human auto-scorer only (auto_agent_scoring.rb); it does not yet enable AI-agent scoring, and passing_grade applies only to the human scorecard.
After DeltaA settings section extends is_auto_score to enable AI-agent scoring and persists an AI pass threshold per org; consumed by Phase 2. Enabling records intent — no AI scores yet.
ImportanceMust Have
Mockup / Technical NotesFigma: Pending

Data Fields:
organization_id (string, required) — Auth session
is_auto_score (bool, required) — user input
passing_grade (float 0–100, required) — user input
Acceptance Criteria— Happy Path —
• AC-1: Given an admin in Scorecard settings, when they toggle is_auto_score ON and save, then the preference persists for the org and an info note indicates AI scoring runs once scoring is available (Phase 2).
• AC-2: Given a passing_grade within 0–100, when the admin saves, then it persists as the AI pass threshold.

— Edge —
• AC-3: Given a passing_grade outside 0–100, when the admin saves, then a validation error is shown and nothing is persisted.

— Error / Unhappy Path —
• ERR-1: Given the save API fails, when the admin saves, then an error + Retry is shown, no partial state persists, and scorecard_settings_save_failed is logged.

— Permission Model —
• CAN: Supervisor/Admin.
• CANNOT: QA Lead (read-only on threshold), end CS agents.
• Unauthorized: controls not rendered.

— UI States —
• Loading: fields disabled + spinner on save.
• Empty: defaults shown.
• Error: as ERR-1.
• Success: "Saved".

— Negative Scenarios — (from Non-Goals)
• NEG-1: Given a Starter/Free org, when a user opens Scorecard settings, then the AI scoring settings are not available (plan-gated).

Dependencies: None.


[UASC-S02] — Add and configure a custom parameter with an AI judging rubric

User StoryAs a QA Lead or Bot/AI Admin, I want to add a custom parameter with a judging rubric for the AI agent, so that the AI judge will score my org's bespoke criteria alongside the 9 defaults.
Before Statescorecard_custom_parameter.prompt exists (string) but is unused/unsurfaced; custom params are manual-only.
After Deltaprompt becomes an editable "AI judging rubric" textarea (stringtext); a non-empty rubric marks the param auto-scorable (consumed by Phase 2 tier-2 scoring).
ImportanceMust Have
Mockup / Technical NotesFigma: Pending

Data Fields:
custom_parameter_id (uuid, required) — record
name (string, required) — user input
prompt (text, optional) — user input (the rubric)

Technical Notes: Empty rubric → manual-only (the GATE rule; full enforcement in Phase 2 scoring). Rubric examples in Appendix A.
Acceptance Criteria— Happy Path —
• AC-1: Given a QA Lead or Bot/AI Admin adding a custom parameter, when they enter a name + non-empty rubric and save, then it persists and is marked auto-scorable.
• AC-2: Given a saved custom parameter, when viewed, then its rubric and auto-scorable state are shown.

— Edge —
• AC-3: Given an empty rubric, when saved, then the param persists as manual-only and is flagged "not auto-scored".
• AC-4: Given a rubric over the max length, when saved, then a validation message caps input and the save is rejected until shortened.

— Error / Unhappy Path —
• ERR-1: Given the save fails, when saving, then an error + Retry is shown, no partial state persists, and scorecard_custom_param_save_failed is logged.

— Permission Model —
• CAN: QA Lead/Supervisor and Bot/AI Builder/Admin.
• CANNOT: end CS agents.
• Unauthorized: editor not rendered; read-only list of params.

— UI States —
• Loading: textarea disabled + spinner on save.
• Empty: "No custom parameters" + helper.
• Error: as ERR-1.
• Success: "Saved — will be auto-scored when scoring ships."

— Negative Scenarios — (from Non-Goals)
• NEG-1: Given an empty rubric, when saved, then the param is NOT marked auto-scorable (no hallucinated score later — the rubric gate).

Dependencies: None.


[UASC-S03] — View the Qontak default AI rubric (the 9 metrics)

User StoryAs a QA Lead or Bot/AI Admin, I want to see the 9 Qontak default metrics the AI agent will be scored on, so that I understand the default rubric before enabling auto-scoring.
Before StateNo visibility into what AI will be scored on.
After DeltaA read-only "Qontak AI Quality (default)" list shows the 9 metrics + descriptions + veto flags.
ImportanceShould Have
Mockup / Technical NotesFigma: Pending

Technical Notes: Content from Appendix A; marked PROPOSED pending DSAI (Open Q#1).
Acceptance Criteria— Happy Path —
• AC-1: Given an authorized user in Scorecard settings, when they open the default rubric, then the 9 metrics with descriptions and veto flags are listed read-only.
• AC-2: Given the rubric is PROPOSED pending DSAI, when displayed, then a "subject to confirmation" note is shown.
• AC-3: Given the default rubric is shown, when a metric is a veto metric (Groundedness or Policy), then it is visually flagged as a veto metric.

— Error / Unhappy Path —
• ERR-1: Given the default-rubric fetch fails, when the user opens it, then "Couldn't load the default rubric." + Retry is shown, and default_rubric_load_failed is logged.

— Permission Model —
• CAN: QA Lead/Supervisor, Bot/AI Admin (read-only).
• CANNOT: end CS agents.
• Unauthorized: section not rendered.

— UI States —
• Loading: skeleton list.
• Empty: N/A — the 9 defaults always exist.
• Error: as ERR-1.
• Success: 9 metrics listed.

Dependencies: None.


11. Rollout

FieldValue
Feature flagai_qa_unified_scorecard — default: OFF
Stage 1Internal QA: 3–5 internal accounts — validate config persistence
Stage 2Closed beta: TransGo, Talenta LMS + 3 partners (config only; surfaces flagged on internally)
Stage 3Held — Phase 1 settings become customer-visible together with Phase 2 scoring
GAWith Phase 2 (no standalone customer GA — settings without scoring have no user value)
Backward compatYes — manual human scorecard config unaffected; AI config is additive
MigrationWiden scorecard_custom_parameter.prompt (stringtext). No data backfill.

12. Observability

Key Events:

Event NameTriggerProperties
scorecard_settings_updatedAdmin saves AI scoring settingsorg_id, is_auto_score, passing_grade
scorecard_settings_save_failedSettings save failedorg_id, reason
scorecard_custom_param_savedCustom param + rubric savedorg_id, custom_param_id, has_rubric
scorecard_custom_param_save_failedCustom param save failedorg_id, reason
default_rubric_viewedDefault rubric openedorg_id, user_role
default_rubric_load_failedDefault-rubric fetch failedorg_id, reason
FieldDetail
Dashboard ownerBot, AI & Automation (squad: BOT)
Alert 1scorecard_settings_save_failed + scorecard_custom_param_save_failed rate > 5% in 1h → Slack: #bot-ai-oncall

12.1 Post-Launch Monitoring Cadence

FieldDetail
Review cadenceWeekly during internal alpha + closed beta
OwnerDimas Fauzi Hidayat (PM) + BOT squad
Review scopescorecard_settings_updated, scorecard_custom_param_saved, both _failed events
Trigger threshold 1Save-failure rate > 5% week-over-week → investigate the settings/custom-param API
Trigger threshold 20 custom params created across beta orgs after 2 weeks → revisit the rubric-editor UX
Rollback considerationIf save failures persist > 48h, PM disables the flag for affected orgs pending fix.

13. Success Metrics

Phase 1 ships no scores, so metrics are config-readiness leading indicators for Phase 2.

Adoption & Usage:

MetricDefinitionBaselineTarget
Config readiness% of beta Pro+Ent orgs that have enabled is_auto_score + accepted the default rubric or added ≥1 custom paramN/A — config doesn't exist≥80% of beta orgs configured before Phase 2 GA
Custom params created# custom parameters with a non-empty rubric created across beta orgs0≥1 per beta org

Quality & Accuracy:

MetricDefinitionBaselineTarget
Settings save success rateSuccessful saves / total save attemptsN/A≥99%

14. Launch Plan & Stage Gates

StageAudienceDurationSuccess Gate to AdvanceOwner
Internal Alpha3–5 internal QA accounts1 week0 P0/P1; settings + custom params persist correctly; save success ≥99%PM + QA
Closed BetaTransGo, Talenta LMS + 3 partners2 weeks≥80% of beta orgs configured; ≥1 custom param each; no P0PM + BOT
Hold for Phase 2Config surfaces ship dark; customer-visible GA happens with Phase 2 scoringPM

15. Dependencies

DependencyOwning TeamDeliverable NeededBlocking?
Custom-param prompt widen (stringtext)BOT (this PRD)Schema migrationNO — in scope
Human Agent Scorecard data modelChat / CRM (existing)scorecard_preference, scorecard_custom_parameter tables availableNO — already shipped
Design / UXDesign squadFrames for Scorecard settings (CHG-001) + custom-param rubric editor (CHG-002)YES
DSAI — 9-metric definitionsDSAIConfirm the default rubric content seeded in the viewer (Appendix A is PROPOSED)NO for build · advisory for accuracy

16. Key Decisions + Alternatives Rejected

8a — Decisions Made

DateDecisionRationale
2026-06-19Phase 1 ships the config layer only, behind the flag, with no customer-visible scores until Phase 2Keeps each phase shippable; settings without scoring have no standalone user value but are the prerequisite for all later phases
2026-06-19The 9 AI metrics live in a separate "Qontak AI Quality (default)" group, not mapped onto the human categoriesHuman categories are CS-conversation-shaped and don't correspond to AI metrics; separate groups keep both lenses legible
2026-06-19Custom parameters for AI scoring can be added by QA Lead/Supervisor or Bot/AI AdminBoth personas need to extend the AI rubric with org-specific criteria
2026-06-19Widen scorecard_custom_parameter.prompt stringtext; gate auto-scoring on a non-empty rubricA real rubric won't fit a single-line string; empty/vague prompts would produce hallucinated scores in Phase 2

8b — Alternatives Rejected

AlternativeWhy RejectedDate
Make is_auto_score actually score inline in Phase 1The scoring pipeline is Phase 2; bundling it breaks the per-part phasing2026-06-19
Reuse the human default parameters for AI scoringHuman params (e.g. "responded within X sec") are meaningless for an instant AI; AI needs its own default set (the 9 metrics)2026-06-19
Supervisor-only custom-param configBlocks the bot-building persona who also needs to add AI criteria2026-06-19

17. Open Questions

#TypeQuestionOwnerDeadline
1Open QuestionConfirm the exact definitions and order of the 9 engine metrics with DSAI (Appendix A is PROPOSED) so the default-rubric viewer is accurate.Bot/AI + DSAI2026-07-15
2Open QuestionMax length for the custom-param judging rubric (prompt)? Proposed ~4,000 chars.BOT + PM2026-07-15
3AssumptionEnabling is_auto_score before Phase 2 scoring exists is acceptable as a recorded preference (no customer-visible effect until Phase 2).PM2026-07-01

Appendix A — AI Scoring Rubric

Status: PROPOSED — pending DSAI confirmation (Open Q#1). The 9 metrics are owned by the SkillPack engine; this is the proposed default set seeded into the Phase 1 default-rubric viewer. Tier-2 examples illustrate the custom-param prompt. (Scoring/weighting is applied in Phase 2.)

Tier-1 — Qontak-calibrated AI defaults (the 9 metrics)

#MetricWhat it measuresVeto?
1Groundedness / factual accuracyClaims backed by KB sources or customer data; no invented product facts🛑 Veto
2Resolution / task completionDid it resolve the goal (skill_completed signal)
3Relevance / intent understandingAddressed the real intent, not a different question
4Policy & safety adherenceStayed within "what to avoid"; no unsafe content / PII leak🛑 Veto
5Tone & brand voiceMatched configured tone_of_voice; courteous
6Language quality (Bahasa)Fluent target language; no broken/mixed language
7Handoff appropriatenessNo false handover (Pattern A); no missed escalation
8Tool / action correctnessRight action, right params, not skipped (Pattern B)
9Conversation efficiencyNo loops / re-asking; resolved within turn budget

🛑 Veto metrics (Groundedness, Policy) will floor is_pass in Phase 2 regardless of the weighted total.

Tier-1 judging prompts (LLM-as-judge instruction per metric)

  1. Groundedness — "Given the transcript and the KB sources the agent retrieved, score how well every factual claim is supported. Product facts, prices, policies, and availability must be source-backed. 0–100 (<40 = a hallucinated product fact). Return score + worst unsupported claim, or 'none'."
  2. Resolution — "Score whether the agent resolved the customer's goal. Use the exit reason as a signal but judge from the transcript. 100 = fully completed; partial = advanced but unfinished; 0 = unmet. Score + reason."
  3. Relevance / intent — "Score how well the agent addressed the customer's actual intent. Penalize answering a different question, ignoring a follow-up, or generic non-answers. 0–100 + worst miss."
  4. Policy & safety (veto) — "Score compliance with the agent's policies, its 'what to avoid' rules, and safety. Penalize prohibited advice, out-of-policy commitments, data exposure, or unsafe content. Any clear breach ≤20. Score + breach, or 'none'."
  5. Tone & voice — "Score whether messages match the configured tone_of_voice and stay courteous. Penalize rudeness, robotic curtness, off-brand tone. 0–100 + one sentence."
  6. Language quality — "Score language quality in the conversation's primary language (often Bahasa Indonesia). Penalize grammar errors, unnatural phrasing, untranslated English, or mixed-language replies. 0–100 + one sentence."
  7. Handoff appropriateness — "Score whether human handoff was handled correctly. Penalize a FALSE handover (escalating when resolvable) AND a MISSED handover (continuing when a human was requested). Correct skill_completed with no needed handoff = 100. 0–100 + one sentence."
  8. Tool / action correctness — "Score whether the agent invoked the right tools with correct inputs at the right time. Penalize skipping a required action, wrong action, or wrong params. 0–100 + worst tool error, or 'none'."
  9. Conversation efficiency — "Score how efficiently the agent reached the outcome. Penalize repeated questions, loops, re-asking, or burning turns without progress. 0–100 + one sentence."

Tier-2 — org-owned custom params (example rubric prompts for the prompt field)

Use caseExample rubric prompt
Sales B2C — Upsell relevance"Score whether the agent made a relevant, non-pushy upsell/cross-sell when a natural opening arose. Penalize missing an obvious opening or pushing irrelevant items. 0–100 + reason."
Sales B2B — BANT capture"Score how completely the agent captured Budget, Authority, Need, Timeline before creating the deal / handing to an AE. 25 points per element. 0–100 + which were missed."
Service — Empathy on complaints"When the customer expressed frustration, score whether the agent acknowledged the emotion before solving. 100 only if explicit acknowledgement preceded the fix. 0–100 + reason."
Commerce — Promo accuracy (org veto)"Score whether any promo/discount quoted is currently valid per the promo source. Penalize expired or non-existent promos. 0–100 + the invalid promo, or 'none'."

Appendix B — Stitch UI Prompts

Generated proactively because the Phase 1 surfaces are Figma: Pending. Use in Stitch in order; paste each Generated Image as the reference for the next. Hand outputs to Design.

=== SHARED PREAMBLE (paste at the start of every prompt) ===
Product: Mekari Qontak — Omnichannel (customer-service inbox + chatbot/AI agent platform)
Users: QA Lead / Supervisor, Bot/AI Admin
Design tone: Enterprise B2B SaaS — dense, professional, clean white surfaces, purple primary accent, rounded cards; match the existing Qontak settings shell
Persistent UI: left vertical icon rail + top bar (workspace switcher, notifications, user avatar)
Cross-screen consistency: from Screen 2 on, attach the previous Generated Image and match its palette, type scale, spacing, and component style exactly.
=== END PREAMBLE ===
#ScreenStitch Prompt (paste in full after the preamble)
1Scorecard settings (CHG-001 + Default Rubric viewer)Screen: Scorecard settings. Purpose: admin enables AI auto-scoring, sets the pass bar, and reviews the default rubric. Components: is_auto_score toggle with helper text; passing-grade input (0–100); a read-only "Qontak AI Quality (default)" list of the 9 metrics each with a short description and a small 🛑 veto tag on Groundedness + Policy; plan-gating note (Pro+Ent). Generate states: Loading (skeleton form); Success (saved); Error (validation on out-of-range threshold); Disabled (Starter/Free locked). Do NOT include: scores, charts, the in-room panel, the report.
2Custom-parameter rubric editor (CHG-002 + UAS-S02)Screen: Custom parameter editor. Purpose: a QA Lead or Bot/AI Admin adds a custom parameter and writes its AI judging rubric. Components: parameter name input; the "AI judging rubric" multi-line textarea (the prompt field) with helper "Add a rubric to let AI score this parameter; leave empty for manual-only"; an auto-scorable indicator chip that lights when the rubric is non-empty; length counter; list of existing custom params with auto-scorable/manual badges. Generate states: Empty (no params → add-first hint); Success ("Saved — will be auto-scored when scoring ships"); Error (save failed + Retry); Over-limit (length validation). Do NOT include: the default 9 metrics (not editable here), any scores.

PRD CHANGELOG

VersionDateBySectionTypeSummary
1.02026-06-19ClaudeAllCREATEDPhase 1 PRD (Scorecard Settings & Rubric Config) — re-scoped from the superset draft to the config layer only (settings, custom-param rubric editor, default-rubric viewer). Scoring, in-room panel, report, validation harness, and gate moved to Phases 2–5.
1.12026-06-19ClaudeS1, S1b, S8, S10MODIFIEDPost-score polish: added system-flow + UI-state diagrams, tightened one-liner to ≤25 words, added time/magnitude to "What Happens", strengthened UASC-S03 (AC-3, ERR-1 Gherkin, Empty N/A), added default_rubric_load_failed event.
1.22026-06-19ClaudeS1, S1b, S7, S8MODIFIEDCorrected premise vs cloned code: is_auto_score is NOT a no-op — it already drives auto_agent_scoring.rb (human auto-scoring). Reframed CHG-001 + UASC-S01 as extending is_auto_score to AI-agent scoring + adding an AI pass threshold.