Qontak Chatbot | AI Agent | Autonomous AI Agent — Phase 2: AI-Assisted Refinement
HEADER BLOCK
| Field | Value |
|---|---|
| PM | Dimas Fauzi Hidayat (Product Manager, Mekari Qontak) |
| PRD Version | 1.3 |
| Status | DRAFT |
| PRD Type | PHASE |
| Epic | BOT-4191 |
| Squad | BOT — Hadiningbot Squad |
| Engineering Lead | Eko Aprianto |
| Data Team | Data / ML Platform (noncore-mrag, chatbot-ai, mekari-agent owners) |
| RFC Link | RFC — Qontak Chatbot AI · Autonomous Agent §10.3b refine-skill-pack · detail: refine-skill-pack endpoint |
| Figma Master | Design exists — canonical source of truth is the designer prototype (Wulan): qontak-designer → app/pages/bot-automation/ai-agents/[id].vue (right rail with Preview + Refine tabs). Like Phase 1, the live prototype supersedes prose here; re-check it before implementing. Figma frames TBD. |
| UI/UX Designer | Wulan Febyazzahra |
| Anchor | Yes — Autonomous AI Agent — ANCHOR (Epic BOT-4191) |
| Labels | epic:qontak-chatbot · module:ai-agent · feature:ai-agent-refine |
| Last Updated | 2026-06-29 |
Status values: DRAFT → READY → BUILD → SHIPPED
Table of Contents
- HEADER BLOCK
- 2. CONDITIONAL BLOCK: PHASE CONTEXT
- 3. One-liner + Problem
- 4. What Happens If We Don't Ship This Phase
- 5. Target Users + Persona Context
- 6. Non-Goals
- 7. Scope Changes
- 8. Constraints
- 9. New Features
- 10. API & Webhook Behavior
- 11. System Flow + User Stories + ACs
- 12. Rollout
- 13. Observability
- 14. Success Metrics
- 15. Launch Plan & Stage Gates
- 16. Dependencies
- 17. Key Decisions + Alternatives Rejected
- 18. Open Questions
- PRD CHANGELOG
2. CONDITIONAL BLOCK: PHASE CONTEXT
| Field | Detail |
|---|---|
| Anchor PRD | Autonomous AI Agent — ANCHOR (Confluence: QON 51188335138, Epic BOT-4191) |
| Phase Number | Phase 2 of 4 |
| Phase Goal | Let tenants iterate a live autonomous agent's capability_pack through a conversational AI surface in the agent editor — paste an error or describe a misbehavior and get a conversational reply plus reviewable, apply-or-discard config changes — by adding a refine proxy on chatbot BE over the upstream refine-skill-pack, reusing the Phase-1 engine/config model and the drafter's validation pipeline |
| Prior phases | Phase 1: New Engine Migration (shipped on /v2/ai_agents autonomous mode, BOT-4235) — productionised the autonomous engine + new config model (Profile · Capabilities · Routing) and the drafter (POST /v2/ai_agents/generate → upstream draft-skill-pack). PRD: Phase 1. No other prior phases — the later Phase 3: Migrate Existing Configurations (TBD) and Phase 4: New-Configuration Iteration (TBD) come after this phase and do not block it. |
| This phase | The refiner: a conversational way to fix an existing autonomous agent. New POST /v2/ai_agents/:id/refine BE proxy → upstream refine-skill-pack; a "Refine" tab in the agent editor's right rail (alongside Preview) in chatbot-fe — a multi-turn chat where the AI proposes one or more options (each a field-level diff), and accepting one applies it into the form (highlights changed fields, switches to the relevant tab). Persistence is the editor's existing Save (→ PATCH /v2/ai_agents/:id). Design SoT = the qontak-designer prototype. No new Rails DDL. |
| Deferred to next | Server-side session/history persistence (FE owns it this phase); editing knowledge-base content via refine; auto-apply of high-confidence patches. |
| Cross-phase deps | Inherits Phase 1's capability_pack model, the capability_pack↔skill_pack adapter (skill_pack_mapper.rb + SyncToAiService#build_skill_pack), and the drafter's defensive post-processing (gate validation, tone coercion, orphan cleanup, reference filtering). Independently shippable — depends only on Phase 1, not on the later Phase 3/4. The build_skill_pack extraction (see §7/§17) must not change Phase 1's sync behavior. |
Note: Phase 2 here is a workstream ordering, not a strict sequence gate — it can ship right after Phase 1 (and before the later Migrate/Iteration phases) because it builds only on Phase 1.
3. One-liner + Problem
One-liner:
Let tenants fix a live autonomous agent by describing the problem and accepting reviewable, AI-proposed config changes — no hand-editing the capability_pack.
Problem:
Today the only way to change a configured autonomous agent is to manually re-edit the Profile / Capabilities / Routing tabs in the form editor (AiAgentEditor.vue) and re-save the whole config — a full-merge PATCH /v2/ai_agents/:id that replaces the entire profile / capabilities / routing blocks (update_ai_agent.rb:83 in chatbot BE). When an agent misbehaves in production (an action firing on the wrong error, a capability that never triggers, a routing rule that exits too early), the tenant has to diagnose the capability_pack by hand and guess which field to change.
The drafter shipped in Phase 1 only generates from scratch — it cannot fix an existing agent. After this phase, a Chatbot Specialist (or customer Bot Builder) can describe the problem in plain language and get a conversational diagnosis plus a previewable, apply-or-discard diff of concrete config changes. For full initiative context, see the ANCHOR PRD: Autonomous AI Agent — ANCHOR.
4. What Happens If We Don't Ship This Phase
- Maintenance stays specialist-bound and slow (immediately, every release). The 15+ production autonomous agents (26Q2 cohort) can only be fixed by a Chatbot Specialist hand-diagnosing the capability_pack and re-editing by trial and error. Every customer change request becomes a manual ticket, capping how many agents one specialist can maintain as the cohort grows through 26Q3+.
- Self-serve confidence stays low (undercuts the later self-serve phases, 26Q3–Q4). The Design Validation research (15 IDIs) put non-technical self-config confidence at ~50–60%. Without an AI-assisted "fix it" loop, customer Bot Builders keep handing problems back to Mekari instead of resolving them, undercutting the self-serve goal the later Migrate/Iteration phases (Phase 3/4) are scheduled to deliver.
- Competitive gap widens (ongoing). Competitor agent platforms increasingly offer conversational "fix your bot" iteration; every quarter we ship draft-only (generate but can't refine) leaves a visible hole in the autonomous product line during active competitive evaluations.
5. Target Users + Persona Context
| Persona | Role | Goal | Pain | Workaround |
|---|---|---|---|---|
| Primary — Chatbot Specialist | Internal Qontak Chatbot Specialist (technical) maintaining the production autonomous agents (26Q2 cohort of 15 agents across 15 cids) on behalf of / jointly with customers | When an agent misbehaves, diagnose and fix the capability_pack quickly and safely, with a preview before it goes live | Must read the raw capability_pack, guess which capabilities[] / routing[] field is wrong, hand-edit the 3-tab editor, re-save the whole config, and re-test in preview — slow and error-prone | Trial-and-error edits in the Config tabs + repeated preview runs; sometimes rebuilds the capability from scratch via the drafter |
| Secondary — Dedicated Bot Builder (customer-side) | Technical or non-technical Plus / Ultimate / Qontak 360 admin maintaining their company's own AI agents | Fix their own agent's behavior without waiting on a Mekari specialist | No guided way to diagnose; the legacy modal lacks the autonomous engine; editing the new config still requires knowing the model | Files a change request to a Mekari specialist and waits, or accepts the degraded behavior |
(Full persona background: see ANCHOR PRD. Plan availability + flag scope in §8 Constraints.)
6. Non-Goals
- No silent / auto-apply. Refine never writes config on its own — every change is preview-then-apply; the tenant explicitly applies or discards. (Auto-apply of high-confidence patches is explicitly rejected — see §17.)
- Autonomous-mode agents only. Refining a legacy tree_node / /ai-agent modal agent is out of scope; the refiner operates on the capability_pack of agents on the new engine.
- No server-side session/history persistence (this phase). The BE is stateless per the RFC; the FE owns any refine session state. A persisted refine-conversation store is deferred.
- Not a runtime test harness. Refine edits configuration; it does not run conversations against the agent. Validating behavior is done via Preview (Phase 1) and the AI Agent Testing initiative.
- No knowledge-base content editing via refine. Refine may re-reference an existing file_search / vector store that already belongs to the agent, but it does not upload, edit, or vectorise KB files; that lives in the Resources / AI Agent Knowledge surface.
- No creating new actions/tools via refine. Refine references only actions already registered in the tenant's functions registry. Any action name or kb_id the model invents is stripped (same reference-filtering guarantee as the drafter) and surfaced as a warning.
- One agent at a time. No bulk / multi-agent refine, and no cross-agent suggestions.
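The reference-filtering guarantee above (unknown action names and kb_ids are stripped and surfaced as warnings) can be illustrated with a small sketch. This is not the upstream implementation: the skill shape (`name`, `actions`, `kb_ids`) and the function name `filter_references` are assumptions made for illustration. Only the behavior, strip and warn, comes from this PRD.

```python
from typing import Any


def filter_references(
    skills: list[dict[str, Any]],
    allowed_actions: set[str],
    allowed_kb_ids: set[str],
) -> tuple[list[dict[str, Any]], list[str]]:
    """Drop references that are not in the agent's inputs.

    Returns cleaned copies of the skills plus one warning per stripped
    reference. The skill shape used here is hypothetical.
    """
    cleaned: list[dict[str, Any]] = []
    warnings: list[str] = []
    for skill in skills:
        kept_actions = []
        for action in skill.get("actions", []):
            if action in allowed_actions:
                kept_actions.append(action)
            else:
                warnings.append(f"{skill['name']}: unknown action '{action}' stripped")
        kept_kb_ids = []
        for kb_id in skill.get("kb_ids", []):
            if kb_id in allowed_kb_ids:
                kept_kb_ids.append(kb_id)
            else:
                warnings.append(f"{skill['name']}: unknown kb_id '{kb_id}' stripped")
        # Build a new dict so the caller's input is never mutated.
        cleaned.append({**skill, "actions": kept_actions, "kb_ids": kept_kb_ids})
    return cleaned, warnings
```

Returning the warnings alongside the cleaned pack keeps the filter side-effect free, which matches the "Refine writes nothing" rule in this section.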
7. Scope Changes
Engineering surfaces this PRD touches (controlled vocab). Kept in sync with the scope_changes frontmatter above.
- Backend (chatbot):
  - New endpoint POST /v2/ai_agents/:id/refine (FrontendService::V2, proxy/BFF, session-auth, roles owner/supervisor/admin, flag-gated): builds the request, proxies upstream, maps the response back. Does not persist.
  - New client method refine_capability_pack (upstream refine-skill-pack) in lib/ai_service/ai_agent.rb, mirroring the existing draft_skill_pack (POST /qontak-ai-noncore-mrag/api/ai-agent/refine-skill-pack).
  - New use-case + repository: UseCases::RefineAiAgent + Repositories::Refine under app/api/frontend_service/v2/ai_agent/ (Clean Architecture, same shape as Generate).
  - Refactor (shared mapper): extract build_skill_pack / build_skill / build_routing_rules / build_skill_actions / build_completion etc. out of Repositories::SyncToAiService into a shared Mappers::SkillPackBuilder (pure shaping), parameterised by a vector-store resolver strategy. SyncToAiService passes its existing stateful resolver (creates/reuses vector DBs); Refine passes a read-only resolver that reads the already-persisted capability['vector_store'], so refine serialises capability_pack → skill_pack without creating vector stores. This is the one Phase-1 file touched; its sync behavior must not change.
  - Reuse (apply path): consume the upstream's already-applied updated_skill_pack via the existing Mappers::SkillPackMapper (reverse direction), then persist through the existing Repositories::Update + SyncToAiService (mode: :update). In other words, apply = a normal update; no new write path, same authz + ai_agent_histories audit.
  - New feature flag ai_agent_refine (default: OFF).
  - qontak-ai-noncore-mrag / mekari-agent: new upstream endpoint refine-skill-pack (returns reply + JSON Patch patches + already-applied, re-validated updated_skill_pack), owned by Data / ML and the key external dependency (see §16).
- Frontend (chatbot-fe):
  - "Refine" tab in the agent editor's right rail (beside Preview), per Wulan's qontak-designer prototype: a multi-turn chat in which the AI proposes option cards (each a per-field ProposedChange diff, Recommended flagged); Accept stages the option into the form in AiAgentEditor.vue (highlights changed fields, switches to the relevant tab), and the editor's existing Save persists (→ PATCH /v2/ai_agents/:id). FE owns the thread state. Design-vs-prod placement reconciliation: §18 OQ-7.
  - Diff preview of returned patches with Apply / Discard; Apply calls the existing update endpoint with the previewed capability_pack.
- Design: Figma for the Refine panel + diff-preview interaction (currently TBD; Stitch prompts stand in until then).
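The resolver-strategy refactor described above can be sketched in a language-neutral way. The following is an illustration only, not the Rails code: the class names, the `resolve` method, the `vs_<name>` id format, and the simplified pack shape are all hypothetical. The point it shows is that the shaping function is shared while the caller decides whether resolving a vector store may have side effects.

```python
from typing import Any, Optional, Protocol


class VectorStoreResolver(Protocol):
    def resolve(self, capability: dict[str, Any]) -> Optional[str]: ...


class ReadOnlyResolver:
    """Refine path: read the already-persisted reference, create nothing."""

    def resolve(self, capability: dict[str, Any]) -> Optional[str]:
        return capability.get("vector_store")


class CreatingResolver:
    """Stand-in for the sync path: reuse an existing store or create one."""

    def __init__(self) -> None:
        self.created: list[str] = []

    def resolve(self, capability: dict[str, Any]) -> Optional[str]:
        existing = capability.get("vector_store")
        if existing:
            return existing
        new_id = f"vs_{capability['name']}"  # hypothetical id format
        self.created.append(new_id)
        return new_id


def build_skill_pack(
    capability_pack: dict[str, Any], resolver: VectorStoreResolver
) -> dict[str, Any]:
    """Pure shaping: the output structure is the same for every resolver."""
    return {
        "skills": [
            {"name": cap["name"], "vector_store": resolver.resolve(cap)}
            for cap in capability_pack.get("capabilities", [])
        ]
    }
```

Because the builder never creates anything itself, a check that the sync path's output is unchanged after the extraction can compare its result with the pre-refactor output directly.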
8. Constraints
| Field | Value |
|---|---|
| Platform | Web only (Qontak admin — chatbot-fe). No mobile. |
| Performance | Refine round-trip target ≤ 10s p95 (dominated by the upstream LLM call); BE proxy overhead < 500ms. Hard client/read timeout aligns with the drafter (60s open/read) but the perceived target is ≤ 10s — beyond that the FE shows a "still working / try again" state. |
| Data limits | BE is stateless (no refine record persisted). chat_history sent upstream is capped at the last N turns (N TBD with ML — see §18) to bound token cost. One agent per refine request. |
| Plan scope | Same as autonomous mode — Plus / Ultimate / Qontak 360 workspaces with autonomous_ai_agent rollout = ON. Not Starter/Free. |
| Feature flag | ai_agent_refine (default: OFF), enabled per workspace; gates both the BE endpoint and the FE panel. |
| Read/write | Refine (propose) + Apply (write) both restricted to roles owner / supervisor / admin — identical to today's draft/update authz. Refine itself writes nothing; Apply goes through the standard update + sync path. |
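The chat_history cap in the Data limits row amounts to keeping only the tail of the thread before it is sent upstream. A minimal sketch, assuming a list-of-dicts turn shape and a `max_turns` parameter standing in for the still-undecided N (see §18):

```python
def cap_chat_history(
    history: list[dict[str, str]], max_turns: int
) -> list[dict[str, str]]:
    """Return the last `max_turns` turns; the full thread stays client-side.

    `max_turns` is a placeholder for N, which is TBD with ML.
    """
    if max_turns <= 0:
        # Guard explicitly: history[-0:] would return the whole list.
        return []
    return history[-max_turns:]
```

Slicing returns a new list, so the FE-owned thread is not modified when a shorter history is sent.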
9. New Features
Feature: "Refine" tab in the agent editor's right rail
Design source of truth: the qontak-designer prototype app/pages/bot-automation/ai-agents/[id].vue (Wulan). The agent editor has a right rail that toggles between Preview and Refine tabs; the left side is the tabbed form (Profile · Capabilities · Routing · Advanced). The prod editor today is AiAgentEditor.vue at /bot-automation/ai-agent/:id. Note that the prototype route is plural (/ai-agents/:id) and renders the editor in a modal, a design-vs-prod structural delta to reconcile at build (see §18 OQ-7); the Preview rail itself is a Phase-1 "pending" item (§16).
| Field | Detail |
|---|---|
| URL | /bot-automation/ai-agent/:id (existing agent editor; Refine is a tab in its right rail, beside Preview) |
| Access | owner / supervisor / admin on autonomous-eligible workspaces with ai_agent_refine = ON |
Component Tree (per the prototype):
| Component | Parent | Purpose |
|---|---|---|
| AiAgentEditor | (none) | Existing editor: left = form tabs (Profile · Capabilities · Routing · Advanced); right = rail |
| RightRail | AiAgentEditor | Hosts the Preview and Refine tabs (rightRailTab) |
| RefinePanel | RightRail | The Refine tab: multi-turn chat thread + input |
| RefineEmptyState | RefinePanel | "Refine your agent" + suggestion chips (e.g. "The refund answer is not correct, fix it", "Add order tracking capability", "Make it faster to escalate to a human agent", "Make the tone more formal") |
| RefineMessageThread | RefinePanel | User/AI turns; AI replies stream in (reply) |
| RefineOptionCard | RefineMessageThread | One proposed option: label, description, Recommended badge, and a per-field diff (ProposedChange: type · field · current → new); Accept applies it, others dismiss |
| RefineInput | RefinePanel | Free-text box (Enter to send) → POST /v2/ai_agents/:id/refine |
Apply behaviour (from the prototype's acceptRefineOption): accepting an option calls applyPendingData() → writes the change into the form, highlights the changed fields, switches to the relevant tab (Profile/Capabilities/Routing), and posts an AI confirmation turn. The other options for that message are marked dismissed. Nothing is persisted until the editor's existing Save (→ PATCH /v2/ai_agents/:id → Update + Sync).
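The accept behaviour above can be sketched as a pure function. This is not the prototype's acceptRefineOption / applyPendingData code: the option shape, the `tab` key on each change, and the returned tuple are assumptions for illustration. It shows the three effects named in the text: the form is updated, the changed fields are collected for highlighting, and the sibling options are dismissed.

```python
import copy
from typing import Any, Optional


def accept_option(
    form: dict[str, dict[str, Any]],
    options: list[dict[str, Any]],
    accepted_id: str,
) -> tuple[dict[str, dict[str, Any]], list[str], dict[str, str], Optional[str]]:
    """Stage one option into a copy of the form; nothing is persisted here."""
    accepted = next((o for o in options if o["id"] == accepted_id), None)
    if accepted is None:
        raise ValueError(f"unknown option id: {accepted_id}")

    new_form = copy.deepcopy(form)
    changed_fields: list[str] = []
    for change in accepted["changes"]:
        tab, field = change["tab"], change["field"]  # hypothetical keys
        new_form[tab][field] = change["new"]
        changed_fields.append(f"{tab}.{field}")  # fields to highlight

    statuses = {
        o["id"]: "accepted" if o["id"] == accepted_id else "dismissed"
        for o in options
    }
    # Switch to the tab of the first change, if the option has any.
    active_tab = accepted["changes"][0]["tab"] if accepted["changes"] else None
    return new_form, changed_fields, statuses, active_tab
```

Working on a copy keeps the staged change reversible until the editor's Save runs.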
UI States:
| State | Description |
|---|---|
| Empty | "Refine your agent" + suggestion chips. |
| Loading | AI reply streaming / generating — input disabled (refineIsGenerating). |
| Error | Upstream timeout/5xx or BE failure — error turn, agent unchanged, retry; no option cards. |
| Success | AI reply streamed + one or more RefineOptionCards (Recommended flagged) with Accept; on Accept, form fields highlight + tab switches. |
📊 UI State Diagram — Refine panel
stateDiagram-v2
[*] --> Empty: Open Refine tab
Empty --> Loading: Submit message / chip
Loading --> SuccessOptions: reply + option card(s) returned
Loading --> NoChange: reply, no actionable options
Loading --> Error: upstream timeout / 5xx
Error --> Loading: Retry
NoChange --> Loading: Send another message
SuccessOptions --> FormApplied: Accept an option (form highlighted, tab switched)
SuccessOptions --> Loading: Send follow-up (multi-turn)
FormApplied --> Saved: Editor Save → PATCH /v2/ai_agents/:id
FormApplied --> Loading: Keep refining
Saved --> [*]
Error --> [*]: Close (agent unchanged)
Figma: Frames TBD — the prototype above is canonical until then (see Header + §7).
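The Refine panel state diagram above maps naturally onto a transition table. A minimal sketch follows; the state names come from the diagram, while the event names (`submit`, `accept`, and so on) are hypothetical labels chosen for illustration. The two exits to the terminal state (Saved, and closing from Error) are left out because they end the session.

```python
# (current state, event) -> next state, taken from the diagram's arrows.
TRANSITIONS: dict[tuple[str, str], str] = {
    ("Empty", "submit"): "Loading",
    ("Loading", "options_returned"): "SuccessOptions",
    ("Loading", "no_actionable_options"): "NoChange",
    ("Loading", "upstream_error"): "Error",
    ("Error", "retry"): "Loading",
    ("NoChange", "submit"): "Loading",
    ("SuccessOptions", "accept"): "FormApplied",
    ("SuccessOptions", "submit"): "Loading",
    ("FormApplied", "save"): "Saved",
    ("FormApplied", "submit"): "Loading",
}


def next_state(state: str, event: str) -> str:
    """Return the next panel state, rejecting transitions the diagram lacks."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state} on {event}") from None
```

An explicit table makes it easy to check that no path reaches Saved without passing through an Accept, which is the no-auto-apply rule from §6.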
10. API & Webhook Behavior
| # | Behavior | Entity Affected | Triggered By | Expected Behavior | Failure Behavior |
|---|---|---|---|---|---|
| 1 | Refine capability_pack | AI Agent capability_pack (read-only, not persisted) | Tenant submits a message in the Refine panel → POST /v2/ai_agents/:id/refine | BE loads the agent, serialises its current capability_pack → skill_pack via the shared SkillPackBuilder (read-only vector resolver), gathers available_tools, and proxies upstream with user_message + chat_history (+ optional trace). Upstream returns a conversational reply, JSON Patch patches (RFC 6902), and an already-applied, re-validated updated_skill_pack. BE maps updated_skill_pack → capability_pack via SkillPackMapper and returns reply + patches + the previewed capability_pack + warnings. Nothing is written. | Upstream timeout/5xx → BE returns a graceful error; agent unchanged. Upstream LLM/validation issue → upstream returns its deterministic fallback (never 5xx for LLM transport); BE passes through reply + empty/partial patches. Referenced action / kb_id not in inputs → stripped by reference filtering, returned as a warning. |
| 2 | Accept an option (stage into form) | Editor form state (client-side; not yet persisted) | Tenant clicks Accept on a RefineOptionCard | FE applies the option's pendingData into the form, highlights the changed fields, and switches to the relevant tab; other options for that message are dismissed. No BE call yet. | N/A — client-side; reversible by not saving / re-refining. |
| 3 | Save (persist the refined config) | AI Agent capability_pack (persisted + re-synced) | Tenant clicks the editor's existing Save after accepting one or more options | Standard update path: PATCH /v2/ai_agents/:id → Repositories::Update writes new parameters, SyncToAiService (mode: :update) re-pushes skill_pack upstream + re-resolves vector stores; change live immediately; prior config snapshotted in ai_agent_histories. | Sync upstream fails → DB transaction rolls back; agent stays on prior config; error shown. Capability/routing ref validation fails → 400; no write. |
| 4 | Discard / don't apply | None | Tenant ignores the options or closes without saving | No write; proposed options dropped; the chat thread may continue. | N/A — purely client-side. |
[Claude to resolve during RFC: exact request/response JSON schema for /refine (user_message, chat_history[], trace{}, available_tools[] in; reply, patches[], updated_capability_pack, warnings[] out), HTTP error codes, and the SkillPackBuilder vector-resolver interface.]
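To make the relationship between patches and updated_skill_pack in behavior 1 concrete, the sketch below applies a small subset of RFC 6902 (add, replace, remove) to a document addressed by JSON Pointer (RFC 6901). It is illustrative only: the upstream applies and re-validates the patches itself, and the field names in any sample pack (such as on_error) are hypothetical, since the real schema is still to be fixed in the RFC.

```python
import copy
from typing import Any


def _unescape(token: str) -> str:
    # RFC 6901: decode "~1" to "/" first, then "~0" to "~".
    return token.replace("~1", "/").replace("~0", "~")


def _walk(doc: Any, pointer: str) -> tuple[Any, str]:
    """Return (parent container, final token) for a non-root JSON Pointer."""
    if not pointer.startswith("/"):
        raise ValueError(f"unsupported pointer: {pointer!r}")
    tokens = [_unescape(t) for t in pointer.split("/")[1:]]
    parent = doc
    for tok in tokens[:-1]:
        parent = parent[int(tok)] if isinstance(parent, list) else parent[tok]
    return parent, tokens[-1]


def apply_patches(doc: Any, patches: list[dict[str, Any]]) -> Any:
    """Apply add / replace / remove ops to a deep copy of `doc`."""
    out = copy.deepcopy(doc)
    for patch in patches:
        op = patch["op"]
        parent, key = _walk(out, patch["path"])
        if isinstance(parent, list):
            if op == "add":
                index = len(parent) if key == "-" else int(key)
                parent.insert(index, patch["value"])
            elif op == "replace":
                parent[int(key)] = patch["value"]
            elif op == "remove":
                del parent[int(key)]
            else:
                raise ValueError(f"unsupported op: {op}")
        else:
            if op == "replace" and key not in parent:
                raise KeyError(key)  # replace requires an existing target
            if op in ("add", "replace"):
                parent[key] = patch["value"]
            elif op == "remove":
                del parent[key]
            else:
                raise ValueError(f"unsupported op: {op}")
    return out
```

Applying to a deep copy mirrors the preview semantics: the current pack is untouched until the tenant accepts and saves.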
11. System Flow + User Stories + ACs
11.1. System Flow
Flow: Refine an autonomous agent and apply a change · Type: User Journey + API Sequence
- Tenant opens an autonomous agent in the editor (AiAgentEditor) and opens the Refine with AI surface.
- Tenant types a message, e.g. pastes an error trace: "createorder keeps failing with 'belum terdaftar' (not yet registered) but the bot just gives up."
- FE sends POST /v2/ai_agents/:id/refine with user_message + chat_history.
- BE loads the agent, serialises its current capability_pack → skill_pack via the shared SkillPackBuilder (read-only vector resolver, no vector DB created), gathers available_tools, and proxies upstream refine-skill-pack.
- Upstream LLM returns reply + patches (RFC 6902) + updated_skill_pack (already applied + re-validated through the drafter's defensive pipeline).
- BE maps updated_skill_pack → capability_pack via SkillPackMapper; returns reply + patches + previewed capability_pack + warnings. Nothing persisted.
- FE renders the streamed reply plus one or more option cards, each with a Recommended flag and a per-field diff (ProposedChange: current → new); warnings shown if any.
- Tenant clicks Accept on an option → FE applies its pendingData into the form, highlights the changed fields, switches to the relevant tab; other options dismissed. (No BE write yet.)
- Tenant clicks the editor's existing Save → PATCH /v2/ai_agents/:id → Update + SyncToAiService re-push; change live immediately; prior config snapshotted in ai_agent_histories.
- Failure branch (refine): upstream times out / 5xx → FE shows "couldn't generate a suggestion, agent unchanged"; retry available.
- Failure branch (save): sync to upstream fails → transaction rolls back; agent stays on prior config; error shown.
- Tenant can keep refining: follow-up turns carry chat_history (multi-turn thread per the design).
📊 System Flow — Refine with AI
sequenceDiagram
actor Tenant
participant FE as chatbot-fe (Refine panel)
participant BE as chatbot BE (/v2/ai_agents)
participant ML as noncore-mrag / mekari-agent
Tenant->>FE: Describe issue / paste error
FE->>BE: POST /v2/ai_agents/:id/refine (user_message, chat_history)
BE->>BE: SkillPackBuilder → skill_pack (read-only vector resolver)
BE->>ML: refine-skill-pack (skill_pack, message, history, available_tools)
ML-->>BE: reply + patches + updated_skill_pack (re-validated)
BE->>BE: SkillPackMapper → capability_pack (not persisted)
BE-->>FE: reply + patches + previewed capability_pack + warnings
FE-->>Tenant: Reply + option cards (Recommended; per-field diff)
alt Accept an option, then Save
Tenant->>FE: Accept option
FE->>FE: applyPendingData → form highlighted + tab switched (no write)
Tenant->>FE: Save
FE->>BE: PATCH /v2/ai_agents/:id (updated capability_pack)
BE->>BE: Update + SyncToAiService(mode: update)
BE->>ML: PUT /ai-agent (re-push skill_pack)
BE-->>FE: Updated agent (live) + history snapshot
else Refine upstream fails
ML-->>BE: timeout / 5xx
BE-->>FE: Graceful error — agent unchanged
FE-->>Tenant: "Couldn't generate a suggestion — try again"
end
11.2. User Stories
| User Story | Importance | Mockup / Technical Notes | Acceptance Criteria |
|---|---|---|---|
| [REFINE-S01] Refine an agent in natural language. As a Chatbot Specialist, I want to describe a misbehavior or paste an error and get a suggested config change, so that I can fix a live agent without hand-diagnosing the capability_pack. | Must Have | Figma: Pending (see §9 / Stitch). Data Fields: • id (string/uuid, required): agent id, URL param • user_message (string, required, min 1 char): user input • chat_history (array, optional): FE thread state (multi-turn per the design) • trace (object, optional): recent workflow_state / turns (see §18) • patches (array, response): RFC 6902 ops from upstream • warnings (array, response): stripped refs. Before-After Behavior: Before, the only way to change an agent is to manually re-edit the Profile/Capabilities/Routing form tabs and full-merge PATCH /v2/ai_agents/:id; after, the tenant describes the issue in the editor and the system returns a reply + reviewable diff with nothing written until applied. | Happy Path: • AC-1: Given an autonomous agent and ai_agent_refine = ON, when the tenant submits a user_message, then the system returns a conversational reply plus patches and a previewed capability_pack, and persists nothing. • AC-2: Given a change is returned, when the response renders, then it shows one or more option cards (the best flagged Recommended), each with a per-field diff (ProposedChange: current → new) and an Accept action. • AC-3: Given the message asks for nothing actionable (e.g. "thanks"), when the system responds, then reply is returned with no option cards. • AC-4: Given the upstream proposes an action name or kb_id not in the agent's inputs, when the response is built, then that reference is stripped and surfaced in warnings. Error / Unhappy Path: • ERR-1: Given the upstream refine-skill-pack times out or returns 5xx, when the tenant submits, then the agent is unchanged, an error reply is shown with retry, and refine_failed is logged with the reason. • ERR-2: Given an upstream LLM/validation issue (not transport), when it occurs, then the upstream deterministic fallback reply is passed through with empty/partial patches (never a 5xx to the tenant). Permission Model: • CAN: owner / supervisor / admin on autonomous-eligible workspaces with ai_agent_refine = ON. • CANNOT: other roles; workspaces with the flag OFF. • Unauthorized: Refine panel not rendered; endpoint returns 403. UI States: • Loading: thinking indicator on assistant turn; input disabled. • Empty: prompt to describe the issue or paste an error. • Error: "couldn't generate a suggestion, agent unchanged" + retry. • Success: reply + option card(s), Recommended flagged (+ warnings). |
| [REFINE-S02] Accept an option and save the change. As a Chatbot Specialist, I want to accept a proposed option, see it land in the form, then save, so that no change goes live without my explicit review. | Must Have | Figma: Per prototype. Data Fields: • pendingData (object): the option's structured change applied to the form • changes (array): ProposedChange[] rendered as the per-field diff • capability_pack (object, required on Save): the resulting pack sent to PATCH • id (string/uuid, required): agent id. Before-After Behavior: Before, any save rewrites the whole config with no diff; after, the tenant accepts a specific option (form fields highlight, tab switches), reviews in the form, and saves through the standard update path with the prior config snapshotted for revert. | Happy Path: • AC-1: Given option cards are shown, when the tenant clicks Accept on one, then its pendingData is applied into the form, the changed fields are highlighted, the editor switches to the relevant tab, the other options are dismissed, and refine_accepted is logged. No BE write yet. • AC-2: Given an option was accepted, when the tenant clicks the editor's Save, then PATCH /v2/ai_agents/:id runs Update + SyncToAiService (mode: update), the change is live, the prior config is snapshotted in ai_agent_histories, and refine_applied is logged. • AC-3: Given option cards are shown, when the tenant accepts none (closes / keeps chatting), then nothing is written, and refine_discarded is logged. Error / Unhappy Path: • ERR-1: Given Save is clicked, when SyncToAiService fails to push upstream, then the DB transaction rolls back, the agent stays on the prior config, and an error is shown. • ERR-2: Given Save is clicked, when capability/routing reference validation fails, then the update returns 400 and no write occurs. Permission Model: • CAN: owner / supervisor / admin (same as the existing update endpoint). • CANNOT: other roles. • Unauthorized: Accept/Save not rendered; endpoint returns 403. UI States: • Loading: Save shows a spinner; actions disabled. • Empty: N/A (only shown when an option exists). • Error: inline "couldn't save, agent unchanged" + retry. • Success: accepted option marked applied; form reflects the change; Save confirms. Dependencies: [REFINE-S01] |
| [REFINE-S03] Iterative (multi-turn) refinement. As a Chatbot Specialist, I want to send follow-up requests that build on the prior turn, so that I can iterate ("now also handle the timeout case") without restating context. | Should Have | Figma: Per prototype (the Refine tab is a multi-turn thread). Data Fields: • chat_history (array): FE-owned thread, capped at last N turns sent upstream. Before-After Behavior: Before, no refine concept exists; after, the Refine tab keeps a multi-turn thread and the FE sends chat_history so follow-ups are context-aware, while the BE stays stateless. | Happy Path: • AC-1: Given a prior refine turn, when the tenant sends a follow-up, then the FE includes chat_history and the response reflects the earlier context. • AC-2: Given a long thread, when history exceeds the cap, then only the last N turns are sent upstream (N per §18) and the rest stay client-side. • AC-3: Given the tenant reloads or reopens the editor, when no session is restored, then a fresh thread starts (no server-side history this phase; see §6 Non-Goal 3). Error / Unhappy Path: • ERR-1: Given a mid-thread refine call fails, when it errors, then prior turns remain visible and the failed turn can be retried. Permission Model: • CAN: same as [REFINE-S01]. • CANNOT: same as [REFINE-S01]. • Unauthorized: Refine tab not rendered. UI States: • Loading: per-turn streaming indicator. • Empty: "Refine your agent" + suggestion chips. • Error: failed turn marked retryable. • Success: appended response (+ option cards if any). Dependencies: [REFINE-S01] |
| [REFINE-S01-NEG] No refine on legacy agents; never auto-apply (Guard Rail, from Non-Goals 1 & 2). As a tenant on a legacy tree_node agent, when I look for Refine, then it is not available; and no refine result is ever written without explicit Apply. | Guard Rail | N/A | • NEG-1: Given a legacy tree_node / /ai-agent modal agent, when the tenant opens its config, then the Refine panel is not rendered and /v2/ai_agents/:id/refine returns a 4xx for that agent. • NEG-2: Given any successful refine response, when the tenant takes no action, then no config change is persisted (no auto-apply). |
🧪 Test Coverage Matrix — [REFINE-S01]
| Dimension | Coverage | Notes |
|---|---|---|
| Boundary values | ⚠️ partial | AC-3 covers no-actionable-change (empty patches); ⚠️ QA: empty/whitespace user_message (min 1 char), very long message, very large capability_pack |
| State transitions | ✅ defined | AC-1 (returns, nothing persisted) → S02 apply/discard transition |
| Data validation | ✅ defined | AC-4 reference filtering (unknown action/kb_id stripped → warning) |
| Concurrency | ⚠️ TBD | ⚠️ QA: two specialists refine the same agent simultaneously; refine in flight while another user applies a manual edit |
| Network/timeout | ✅ defined | ERR-1 upstream timeout/5xx → agent unchanged + retry; ERR-2 LLM fallback never 5xx |
🧪 Test Coverage Matrix — [REFINE-S02]
| Dimension | Coverage | Notes |
|---|---|---|
| Boundary values | ⚠️ TBD | ⚠️ QA: apply with empty patch set; apply a stale preview after the agent changed underneath |
| State transitions | ✅ defined | AC-1 accept→staged in form (no write); AC-2 save→live; AC-3 discard→no-op; ERR-1 save-fail→rollback |
| Data validation | ✅ defined | ERR-2 capability/routing ref validation → 400, no write |
| Concurrency | ⚠️ TBD | ⚠️ QA: apply while a parallel manual save commits (last-writer / optimistic-lock behavior) |
| Network/timeout | ✅ defined | ERR-1 SyncToAiService upstream failure → transaction rollback, prior config intact |
12. Rollout
| Field | Detail |
|---|---|
| Feature flag | ai_agent_refine (see §8 — OFF by default) |
| Rollout | Stage 1 (Internal Alpha) → Chatbot Specialists maintaining the 26Q2 cohort (the 15 production agents); Stage 2 (Closed Beta) → 3–5 customer Bot Builders on Plus/Ultimate/360; Stage 3 (Open Beta) → all autonomous-eligible workspaces, opt-in; GA → all autonomous-eligible workspaces, flag default ON |
| Backward compat | Yes — purely additive. The existing draft + full-config edit path (PATCH /v2/ai_agents/:id) is unchanged; Apply reuses it. The only Phase-1 code touched is the build_skill_pack extraction (§7), which must preserve identical sync output. |
| Migration | None — no Rails DDL; no data migration. |
12.1. Semantic Regression Rollback
Refine produces AI output (proposed config patches), so this section applies.
| Field | Detail |
|---|---|
| Model flag | ai_agent_refine (default: OFF). Disabling it removes the refine endpoint + panel; manual config editing remains fully available. |
| Regression metric | (a) refine patch apply-success rate (applied / proposed) and (b) post-apply agent regression — agents whose config was changed via refine and then reverted or re-edited within 48h. |
| Rollback threshold | Apply-success rate < 30% sustained over a week, or post-apply revert rate > 20%, or refine_failed rate > 10% → pause rollout / flip the flag OFF for affected workspaces. |
| Rollback path | Two levels: (1) feature — toggle ai_agent_refine OFF (no deploy); (2) per-agent — an applied-but-worse config is reverted by restoring the prior snapshot from ai_agent_histories (the standard update-audit trail), which re-syncs the old skill_pack upstream. |
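The three rollback thresholds in the table above reduce to simple ratio checks over event counts. A minimal sketch, assuming the counts have already been aggregated over the relevant window (a week for apply-success and revert rate); the function and key names are hypothetical, and the ratios follow the definitions in this PRD: applied / proposed, reverted / applied, failed / requested.

```python
from typing import Optional


def _rate(numerator: int, denominator: int) -> Optional[float]:
    """Return the ratio, or None when there is no data to judge."""
    return numerator / denominator if denominator else None


def breached_thresholds(
    proposed: int, applied: int, reverted: int, requested: int, failed: int
) -> list[str]:
    """List which rollback thresholds are breached for one window."""
    breaches: list[str] = []

    apply_success = _rate(applied, proposed)
    if apply_success is not None and apply_success < 0.30:
        breaches.append("apply_success_rate")

    revert_rate = _rate(reverted, applied)
    if revert_rate is not None and revert_rate > 0.20:
        breaches.append("revert_rate")

    failed_rate = _rate(failed, requested)
    if failed_rate is not None and failed_rate > 0.10:
        breaches.append("failed_rate")

    return breaches
```

A non-empty result would be the trigger to pause rollout or flip ai_agent_refine OFF for the affected workspaces; windows with no events are treated as "no signal" rather than as a breach.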
13. Observability
Key Events:
| Event Name | Trigger | Properties |
|---|---|---|
| refine_requested | Tenant submits a refine message | company_id, ai_agent_id, message_len, history_turns, timestamp |
| refine_succeeded | Upstream returns a valid response | company_id, ai_agent_id, patch_count, warning_count, latency_ms, timestamp |
| refine_failed | Upstream timeout/5xx or BE error | company_id, ai_agent_id, reason, latency_ms, timestamp |
| refine_applied | Tenant clicks Apply and update succeeds | company_id, ai_agent_id, patch_count, timestamp |
| refine_discarded | Tenant clicks Discard | company_id, ai_agent_id, patch_count, timestamp |
| refine_reverted | Applied config reverted via ai_agent_histories within 48h | company_id, ai_agent_id, timestamp |
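
To keep event payloads consistent with the table above, the property list per event can be encoded once and checked at emit time. This is an illustrative sketch only: `build_event` and the `event` key are hypothetical, and the real tracking client is not specified in this PRD.

```python
from datetime import datetime, timezone

# Event-specific properties from the Key Events table.
# company_id, ai_agent_id and timestamp are common to every event.
EVENT_PROPERTIES = {
    "refine_requested": ("message_len", "history_turns"),
    "refine_succeeded": ("patch_count", "warning_count", "latency_ms"),
    "refine_failed": ("reason", "latency_ms"),
    "refine_applied": ("patch_count",),
    "refine_discarded": ("patch_count",),
    "refine_reverted": (),
}


def build_event(name, company_id, ai_agent_id, **props):
    """Build one event payload, rejecting missing or unexpected properties."""
    if name not in EVENT_PROPERTIES:
        raise ValueError(f"unknown event: {name}")
    expected = set(EVENT_PROPERTIES[name])
    given = set(props)
    if expected != given:
        missing = sorted(expected - given)
        extra = sorted(given - expected)
        raise ValueError(f"{name}: missing={missing} unexpected={extra}")
    return {
        "event": name,
        "company_id": company_id,
        "ai_agent_id": ai_agent_id,
        **props,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```

For example, `build_event("refine_failed", 1, 2, reason="timeout", latency_ms=10000)` yields a payload with exactly the properties listed for refine_failed.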
Dashboard owner: BOT — Hadiningbot Squad (chatbot)
Alerts:
- refine_failed rate > 10% of refine_requested over 1h → page on-call (chatbot) + notify PM.
- Refine latency p95 > 10s over 1h → notify chatbot squad (upstream LLM latency check).
- refine_reverted / refine_applied > 20% over a week → PM review (quality regression).
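
The two hourly alerts can be sketched as a check over one hour of refine events. This is a rough illustration, assuming latencies are collected in milliseconds from refine_succeeded and refine_failed; the function names and alert labels are hypothetical, and p95 uses the nearest-rank method, which may differ from what the monitoring stack computes.

```python
from typing import List, Optional


def p95(latencies_ms: List[int]) -> Optional[int]:
    """Nearest-rank 95th percentile; None when there are no samples."""
    if not latencies_ms:
        return None
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def hourly_alerts(requested: int, failed: int, latencies_ms: List[int]) -> List[str]:
    """Evaluate the 1h alert rules for one window of refine events."""
    alerts = []
    # refine_failed rate > 10% of refine_requested
    if requested and failed / requested > 0.10:
        alerts.append("page_oncall_and_notify_pm")
    # Refine latency p95 > 10s
    tail = p95(latencies_ms)
    if tail is not None and tail > 10_000:
        alerts.append("notify_chatbot_squad")
    return alerts
```

The weekly revert-ratio alert uses the same 20% threshold as §12.1 and would be evaluated on a weekly window rather than here.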
13.1. Post-Launch Monitoring Cadence
| Field | Detail |
|---|---|
| Review cadence | Weekly for the first 4 weeks post-GA, then monthly. |
| Owner | PM (Dimas) + BOT squad. |
| Review scope | All §14 metrics — adoption (refine vs manual edits), apply-success rate, error rate, time-to-fix. |
| Trigger thresholds | • Apply-success rate < 30% for a week → investigate prompt/UX. • refine_failed rate > 10% in any week → investigate upstream. • refine_reverted / refine_applied > 20% → quality review within 48h. |
| Rollback consideration | If error or revert thresholds breach and are unresolved within 48h, PM flips ai_agent_refine OFF for affected workspaces (see §12.1). |
14. Success Metrics
Adoption & Usage:
| Metric | Definition | Baseline | Target |
|---|---|---|---|
| ⭐ Refine adoption | Share of autonomous-agent config changes made via Refine (applied) vs manual tab edits | N/A — new capability | ≥ 40% of config changes via Refine within 60 days of GA |
| Refine engagement | Distinct agents that received ≥1 refine session | N/A | ≥ 60% of active autonomous agents within 60 days of GA |
Quality & Accuracy:
| Metric | Definition | Baseline | Target |
|---|---|---|---|
| Apply-success rate | Applied refinements / proposed refinements (a proxy for suggestion usefulness) | N/A | ≥ 50% within 30 days of GA |
| Refine error rate | refine_failed / refine_requested | N/A | < 5% steady-state |
| Post-apply revert rate | Applied configs reverted/re-edited within 48h | N/A | < 15% |
Efficiency & Impact:
| Metric | Definition | Baseline | Target |
|---|---|---|---|
| Time-to-fix | Median time from "agent misbehaving" to a shipped config fix | Manual baseline TBD (measure in Alpha) | −50% vs manual baseline within 90 days of GA |
15. Launch Plan & Stage Gates
| Stage | Audience | Duration | Success Gate to Advance | Owner |
|---|---|---|---|---|
| Internal Alpha | Chatbot Specialists, 26Q2 cohort (15 agents) | 2 weeks | ≥ 20 real refine sessions; apply-success ≥ 40%; refine_failed < 10%; no rollback-worthy regression | PM + Eng |
| Closed Beta | 3–5 customer Bot Builders | 3 weeks | Apply-success ≥ 50%; error rate < 5%; ≥ 1 customer fixes an agent unaided; post-apply revert < 20% | PM + CSM |
| Open Beta | All autonomous-eligible, opt-in | 3 weeks | Adoption trending toward 40%; latency p95 ≤ 10s; all Closed-Beta gates sustained | Eng Lead |
| GA | All autonomous-eligible (flag default ON) | Ongoing | All Open-Beta gates sustained 2 weeks; PMM approved | PM + PMM |
16. Dependencies
| Dependency | Owning Team | Deliverable Needed | Blocking? |
|---|---|---|---|
| Upstream refine-skill-pack endpoint (mekari-agent / proxied by noncore-mrag) | Data / ML Platform | The endpoint itself: accepts skill_pack + user_message + chat_history (+ trace, available_tools); returns reply + RFC 6902 patches + already-applied, re-validated updated_skill_pack + warnings. Does not exist yet. | YES |
| Phase 1 capability_pack model + drafter live on /v2/ai_agents | BOT — Hadiningbot (chatbot) | Must remain stable: the refiner serialises/maps the same capability_pack, and the SkillPackBuilder is extracted from Phase 1's SyncToAiService | YES |
| capability_pack↔skill_pack adapter (skill_pack_mapper.rb reverse map + the extracted SkillPackBuilder) | BOT — Hadiningbot (chatbot) | Bidirectional mapping reused for refine input/output; extraction must not change Phase-1 sync output | YES |
| ai_agent_histories audit (exists) | BOT — Hadiningbot (chatbot) | Used as the per-agent revert path for an applied-but-worse config | NO |
| trace source (recent workflow_state / turns) for debugging context | BOT + Data/ML (overlaps AI Agent Live Monitoring) | Optional runtime telemetry to enrich refine; refine works without it (degraded) | NO |
| Agent editor right rail (Preview tab — Phase-1 "pending") | BOT — Hadiningbot (chatbot) | The Refine tab lives in the same right rail as Preview; confirm whether the rail ships with Preview or refine stands up the rail (see §18 OQ-7) | NO |
| Refine design (right-rail chat) | Design — Wulan | Already prototyped in qontak-designer (app/pages/bot-automation/ai-agents/[id].vue); Figma frames are a follow-up, prototype is canonical meanwhile | NO |
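
Because the upstream endpoint does not exist yet, the request/response shape in the first dependency row can be pinned down as types and exercised against a stub. The sketch below is an assumption-laden illustration: the top-level field names come from the row above, but the `tone` field, the reply text, and `stub_refine` itself are hypothetical, and the final schema is still to be agreed with Data/ML.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class RefineRequest:
    skill_pack: Dict[str, Any]
    user_message: str
    chat_history: List[Dict[str, str]] = field(default_factory=list)
    trace: Optional[Dict[str, Any]] = None          # optional; refine works without it
    available_tools: Optional[List[str]] = None     # optional


@dataclass
class RefineResponse:
    reply: str
    patches: List[Dict[str, Any]]         # RFC 6902 operations, for diff preview only
    updated_skill_pack: Dict[str, Any]    # already applied + re-validated upstream
    warnings: List[str]


def stub_refine(req: RefineRequest) -> RefineResponse:
    """Stand-in for the upstream: always proposes a formal tone (hypothetical field)."""
    updated = copy.deepcopy(req.skill_pack)  # never mutate the caller's pack
    op = "replace" if "tone" in updated else "add"
    updated["tone"] = "formal"
    return RefineResponse(
        reply="I switched the agent to a formal tone.",
        patches=[{"op": op, "path": "/tone", "value": "formal"}],
        updated_skill_pack=updated,
        warnings=[],
    )
```

A stub like this lets the BE proxy and the FE be built in parallel; note that the BE never applies `patches` itself and only forwards them alongside `updated_skill_pack`.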
17. Key Decisions + Alternatives Rejected
17.1. Decisions Made
| Date | Decision | Rationale |
|---|---|---|
| 2026-06-29 | Upstream returns updated_skill_pack already applied + re-validated; BE does not apply patches itself. BE passes patches through to FE for the diff preview only. | Keeps chatbot BE a thin proxy (same posture as the drafter) and guarantees the refined pack went through the same defensive pipeline as the drafter (gate validation, tone coercion, orphan cleanup, reference filtering). Avoids a second, drift-prone RFC-6902 implementation in Rails. |
| 2026-06-29 | Apply reuses the existing PATCH /v2/ai_agents/:id (Update + SyncToAiService), not a new apply endpoint. | Apply is a normal config update — reuses authz, validation, upstream re-push, and ai_agent_histories audit/revert for free. |
| 2026-06-29 | Extract build_skill_pack from SyncToAiService into a shared SkillPackBuilder with a pluggable vector-store resolver (stateful for sync, read-only for refine). | The refiner must serialise the current capability_pack→skill_pack without creating vector DBs; sync must keep its side-effecting resolution. One shared mapper, two resolvers, no duplicated shaping logic. |
| 2026-06-29 | Stateless BE; FE owns any refine session state (no new table). | Matches RFC §10.3b; avoids Rails DDL and a dual source of truth during the 26Q2 window. |
| 2026-06-29 | Refine is always review-then-apply (no auto-apply). | Trust/safety — a config change to a live customer agent must be a human decision. |
| 2026-06-29 | Interaction model resolved by design (Wulan prototype): a "Refine" tab in the agent editor's right rail (beside Preview), multi-turn chat, AI proposes 1+ options each with a per-field diff (Recommended flagged); Accept stages the option into the form (highlight + tab switch); persistence is the editor's existing Save. | Supersedes the earlier "single-shot vs chat / drawer vs inline" open question — the qontak-designer prototype is the design source of truth (same posture Phase 1 took with its prototype). |
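
The "one shared mapper, two resolvers" decision can be illustrated with a small sketch. The production code is Rails (SkillPackBuilder extracted from SyncToAiService); the Python below only mirrors the idea, and every class name, field name (`capabilities`, `kb_id`, `vector_store_id`), and ID format is hypothetical.

```python
class StatefulResolver:
    """Sync path: may create a vector store as a side effect (simulated here)."""

    def __init__(self):
        self.created = []

    def resolve(self, kb_id):
        store_id = f"vs_{kb_id}"
        self.created.append(store_id)  # stands in for creating the vector DB
        return store_id


class ReadOnlyResolver:
    """Refine path: only looks up existing stores, never creates one."""

    def __init__(self, existing):
        self._existing = dict(existing)

    def resolve(self, kb_id):
        return self._existing.get(kb_id)  # None when no store exists yet


def build_skill_pack(capability_pack, resolver):
    """Shared shaping logic; only the resolver differs between sync and refine."""
    skills = []
    for cap in capability_pack.get("capabilities", []):
        skill = {"name": cap["name"]}
        if "kb_id" in cap:
            skill["vector_store_id"] = resolver.resolve(cap["kb_id"])
        skills.append(skill)
    return {"skills": skills}
```

With the same capability_pack, the sync path records a created store while the refine path produces the same shape without side effects, which is the property the extraction must preserve.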
17.2. Alternatives Rejected
| Alternative | Why Rejected | Date |
|---|---|---|
| BE applies the RFC 6902 patches itself (Rails JSON-Patch) | Upstream already returns the applied + re-validated pack; reapplying in BE duplicates logic and risks drift from the drafter's validation pipeline | 2026-06-29 |
| A dedicated /refine/apply endpoint | Apply is just an update: reuse PATCH /v2/ai_agents/:id; a new endpoint duplicates authz/sync/audit | 2026-06-29 |
| Persist refine chat history server-side (new table) | Adds DDL + dual source of truth; FE-owned session is sufficient for this phase (deferred to a later phase if audit needs it) | 2026-06-29 |
| Auto-apply high-confidence patches | Unacceptable risk to live customer agents; conflicts with Non-Goal 1 | 2026-06-29 |
| Build refinement entirely in the chatbot-ml-dev prototype | Same reasons Phase 1 productionised the engine: no plan/tier gating, no auth surface, no rollout control, no audit | 2026-06-29 |
18. Open Questions
| # | Type | Question | Owner | Deadline |
|---|---|---|---|---|
| 1 | Risk | Upstream refine-skill-pack does not exist yet. The whole feature is blocked on the Data/ML endpoint. Mitigation: confirm ownership + the exact contract with the mekari-agent/noncore-mrag owners before BE build; agree the request/response schema up front so the BE proxy + FE can be built against a stub. | PM (Dimas) + Data/ML | 2026-07-15 |
| 2 | Open Question | What goes in trace? Which runtime telemetry (recent workflow_state, recent turns) meaningfully improves refine quality, and where does it come from — does it overlap AI Agent Live Monitoring's signals? Refine must work without it (degraded). | PM + Eng (Eko) | before RFC |
| 3 | Assumption | chat_history cap (N turns). We assume the FE sends the last N turns to bound upstream token cost. Confirm N + truncation strategy with ML. | Eng + Data/ML | before RFC |
| 4 | Open Question | KB scope on apply. Non-Goal 5 keeps KB content out, but if a refinement changes which file_search/vector store a capability points to, does Apply trigger SyncToAiService vector resolution (and is that desired), or must KB-affecting patches be rejected? | PM + Eng | before RFC |
| 5 | Risk | Applied-but-worse config. A refinement can look valid but degrade the agent. Mitigation: preview-then-apply (no auto-apply) + ai_agent_histories per-agent revert + the §12.1 flag and revert-rate alert. | PM + Eng | before GA |
| 6 | Open Question | Diff rendering source of truth. Does the FE render the diff from patches (RFC 6902 paths) or by diffing previous vs updated_capability_pack? Paths reference upstream skill_pack shape, not the public capability_pack — confirm the FE has a readable mapping. | Eng (FE) + Eng (BE) | before RFC |
| 7 | Open Question | Design-vs-prod placement reconciliation. The interaction model is settled (§17 — right-rail Refine tab, multi-turn chat, accept-option-applies-to-form). But the prototype renders the editor in a modal at /bot-automation/ai-agents/:id (plural) with a Preview+Refine right rail, while prod AiAgentEditor.vue is a page at /bot-automation/ai-agent/:id (singular) and its right-rail Preview is a Phase-1 "pending" item (§16). Where exactly does the right rail live in prod, and does refine ship before/with Preview? | PM + Eng (FE) | before RFC |
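
For open question 6, one of the two options is to ignore the upstream patch paths and diff the previous and updated capability_pack directly. A minimal sketch of that option follows, assuming both packs are plain JSON-like dicts with string keys; `diff_fields` and the sample field names are hypothetical, lists are compared as whole values, and keys containing "/" or "~" are not escaped as RFC 6901 would require.

```python
def diff_fields(before, after, prefix=""):
    """Return per-field changes between two nested dicts, sorted by key."""
    changes = []
    for key in sorted(set(before) | set(after)):
        path = f"{prefix}/{key}"
        if key not in before:
            changes.append({"path": path, "change": "added",
                            "before": None, "after": after[key]})
        elif key not in after:
            changes.append({"path": path, "change": "removed",
                            "before": before[key], "after": None})
        elif isinstance(before[key], dict) and isinstance(after[key], dict):
            # Recurse so nested fields get their own diff row.
            changes.extend(diff_fields(before[key], after[key], path))
        elif before[key] != after[key]:
            changes.append({"path": path, "change": "changed",
                            "before": before[key], "after": after[key]})
    return changes
```

The paths produced here are in capability_pack shape, so the FE would not need a mapping from upstream skill_pack paths; the trade-off is that the diff no longer mirrors the upstream patches one-to-one.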
PRD CHANGELOG
| Version | Date | By | Section | Type | Summary |
|---|---|---|---|---|---|
| 1.0 | 2026-06-29 | Claude | All | CREATED | Phase 4 (AI-Assisted Refinement / "Refine with AI") PRD created from the Autonomous Agent RFC §10.3b (refine-skill-pack, QON 51153994292 / 51226214880) and grounded in the current chatbot BE — drafter (draft-skill-pack) exists, full-merge PATCH /v2/ai_agents/:id is the only edit path today, bidirectional capability_pack↔skill_pack translation already present (sync_to_ai_service.rb + skill_pack_mapper.rb). Refiner does not exist yet. |
| 1.3 | 2026-06-29 | Claude | Title, Header, S2 (Phase Context) | MODIFIED | Renumbered Phase 4 → Phase 2 per PM (refinement is the next concrete step after Phase 1 and is independently shippable). File renamed to phase-2-ai-assisted-refinement.md; title, H1, CB Phase Number (Phase 2 of 4), prior/cross-phase references updated. In the anchor, Migrate shifted to Phase 3 and Iteration to Phase 4. |
| 1.2 | 2026-06-29 | Claude | S1, S4, S9 | MODIFIED | Score-prd v3.3 fixes: tightened the one-liner to ≤25 words (S1), added the UI State Diagram for the Refine panel (S9 New Features, closing the 10.6 diagram gap), and added explicit time horizons to "What Happens If We Don't Ship" (S4). |
| 1.1 | 2026-06-29 | Claude | Header, S2, S9, S10, S11, S16, S17, S18 | MODIFIED | Incorporated the existing design (Wulan's qontak-designer prototype app/pages/bot-automation/ai-agents/[id].vue): refine is a right-rail "Refine" tab (beside Preview), a multi-turn chat with suggestion chips, where the AI proposes options (each a per-field ProposedChange diff, Recommended flagged) and Accept stages the change into the form (highlight + tab switch); persistence stays the editor's existing Save. Corrected the earlier wrong assumption of a "side-panel chat config screen"; resolved OQ-7 (interaction model) into a §17 decision and replaced it with a design-vs-prod placement reconciliation question; upgraded REFINE-S03 to Should Have. |