Qontak Chatbot | AI Agent | Autonomous AI Agent — Phase 2: AI-Assisted Refinement

HEADER BLOCK

Field	Value
PM	Dimas Fauzi Hidayat (Product Manager, Mekari Qontak)
PRD Version	1.3
Status	DRAFT
PRD Type	PHASE
Epic	BOT-4191
Squad	BOT — Hadiningbot Squad
Engineering Lead	Eko Aprianto
Data Team	Data / ML Platform (noncore-mrag, chatbot-ai, mekari-agent owners)
RFC Link	RFC — Qontak Chatbot AI · Autonomous Agent §10.3b refine-skill-pack · detail: refine-skill-pack endpoint
Figma Master	Design exists — canonical source of truth is the designer prototype (Wulan): `qontak-designer` → `app/pages/bot-automation/ai-agents/[id].vue` (right rail with Preview + Refine tabs). Like Phase 1, the live prototype supersedes prose here; re-check it before implementing. Figma frames TBD.
UI/UX Designer	Wulan Febyazzahra
Anchor	Yes — Autonomous AI Agent — ANCHOR (Epic BOT-4191)
Labels	`epic:qontak-chatbot` | `module:ai-agent` | `feature:ai-agent-refine`
Last Updated	2026-06-29

Status values: DRAFT → READY → BUILD → SHIPPED

HEADER BLOCK
2. CONDITIONAL BLOCK: PHASE CONTEXT
3. One-liner + Problem
4. What Happens If We Don't Ship This Phase
5. Target Users + Persona Context
6. Non-Goals
7. Scope Changes
8. Constraints
9. New Features
10. API & Webhook Behavior
11. System Flow + User Stories + ACs
- 11.1. System Flow
- 11.2. User Stories
12. Rollout
- 12.1. Semantic Regression Rollback
13. Observability
- 13.1. Post-Launch Monitoring Cadence
14. Success Metrics
15. Launch Plan & Stage Gates
16. Dependencies
17. Key Decisions + Alternatives Rejected
18. Open Questions
PRD CHANGELOG

2. CONDITIONAL BLOCK: PHASE CONTEXT

Field	Detail
Anchor PRD	Autonomous AI Agent — ANCHOR (Confluence: QON 51188335138, Epic BOT-4191)
Phase Number	Phase 2 of 4
Phase Goal	Let tenants iterate a live autonomous agent's `capability_pack` through a conversational AI surface in the agent editor — paste an error or describe a misbehavior and get a conversational reply plus reviewable, apply-or-discard config changes — by adding a `refine` proxy on `chatbot` BE over the upstream `refine-skill-pack`, reusing the Phase-1 engine/config model and the drafter's validation pipeline
Prior phases	Phase 1: New Engine Migration (shipped on `/v2/ai_agents` autonomous mode, BOT-4235) — productionised the autonomous engine + new config model (Profile · Capabilities · Routing) and the drafter (`POST /v2/ai_agents/generate` → upstream `draft-skill-pack`). PRD: Phase 1. No other prior phases — the later Phase 3: Migrate Existing Configurations (TBD) and Phase 4: New-Configuration Iteration (TBD) come after this phase and do not block it.
This phase	The refiner: a conversational way to fix an existing autonomous agent. New `POST /v2/ai_agents/:id/refine` BE proxy → upstream `refine-skill-pack`; a "Refine" tab in the agent editor's right rail (alongside Preview) in `chatbot-fe` — a multi-turn chat where the AI proposes one or more options (each a field-level diff), and accepting one applies it into the form (highlights changed fields, switches to the relevant tab). Persistence is the editor's existing Save (→ `PATCH /v2/ai_agents/:id`). Design SoT = the `qontak-designer` prototype. No new Rails DDL.
Deferred to next	Server-side session/history persistence (FE owns it this phase); editing knowledge-base content via refine; auto-apply of high-confidence patches.
Cross-phase deps	Inherits Phase 1's `capability_pack` model, the `capability_pack`↔`skill_pack` adapter (`skill_pack_mapper.rb` + `SyncToAiService#build_skill_pack`), and the drafter's defensive post-processing (gate validation, tone coercion, orphan cleanup, reference filtering). Independently shippable — depends only on Phase 1, not on the later Phase 3/4. The `build_skill_pack` extraction (see §7/§17) must not change Phase 1's sync behavior.

Note: Phase 2 here is a workstream ordering, not a strict sequence gate — it can ship right after Phase 1 (and before the later Migrate/Iteration phases) because it builds only on Phase 1.

3. One-liner + Problem

One-liner: Let tenants fix a live autonomous agent by describing the problem and accepting reviewable, AI-proposed config changes — no hand-editing the capability_pack.

Problem: Today the only way to change a configured autonomous agent is to manually re-edit the Profile / Capabilities / Routing tabs in the form editor (AiAgentEditor.vue) and re-save the whole config — a full-merge PATCH /v2/ai_agents/:id that replaces the entire profile / capabilities / routing blocks (update_ai_agent.rb:83 in chatbot BE). When an agent misbehaves in production (an action firing on the wrong error, a capability that never triggers, a routing rule that exits too early), the tenant has to diagnose the capability_pack by hand and guess which field to change.

The drafter shipped in Phase 1 only generates from scratch — it cannot fix an existing agent. After this phase, a Chatbot Specialist (or customer Bot Builder) can describe the problem in plain language and get a conversational diagnosis plus a previewable, apply-or-discard diff of concrete config changes. For full initiative context, see the ANCHOR PRD: Autonomous AI Agent — ANCHOR.

4. What Happens If We Don't Ship This Phase

Maintenance stays specialist-bound and slow (immediately, every release). The 15+ production autonomous agents (26Q2 cohort) can only be fixed by a Chatbot Specialist hand-diagnosing the capability_pack and trial-and-error re-editing — every customer change request becomes a manual ticket, capping how many agents one specialist can maintain as the cohort grows through 26Q3+.
Self-serve confidence stays low (undercuts the later self-serve phases, 26Q3–Q4). The Design Validation research (15 IDIs) put non-technical self-config confidence at ~50–60%. Without an AI-assisted "fix it" loop, customer Bot Builders keep handing problems back to Mekari instead of resolving them, undercutting the self-serve goal the later Migrate/Iteration phases (Phase 3/4) are scheduled to deliver.
Competitive gap widens (ongoing). Competitor agent platforms increasingly offer conversational "fix your bot" iteration; every quarter we ship draft-only (generate but can't refine) leaves a visible hole in the autonomous product line during active competitive evaluations.

5. Target Users + Persona Context

Persona	Role	Goal	Pain	Workaround
Primary — Chatbot Specialist	Internal Qontak Chatbot Specialist (technical) maintaining the production autonomous agents (26Q2 cohort of 15 agents across 15 cids) on behalf of / jointly with customers	When an agent misbehaves, diagnose and fix the `capability_pack` quickly and safely, with a preview before it goes live	Must read the raw `capability_pack`, guess which `capabilities[]` / `routing[]` field is wrong, hand-edit the 3-tab editor, re-save the whole config, and re-test in preview — slow and error-prone	Trial-and-error edits in the Config tabs + repeated preview runs; sometimes rebuilds the capability from scratch via the drafter
Secondary — Dedicated Bot Builder (customer-side)	Technical or non-technical Plus / Ultimate / Qontak 360 admin maintaining their company's own AI agents	Fix their own agent's behavior without waiting on a Mekari specialist	No guided way to diagnose; the legacy modal lacks the autonomous engine; editing the new config still requires knowing the model	Files a change request to a Mekari specialist and waits, or accepts the degraded behavior

(Full persona background: see ANCHOR PRD. Plan availability + flag scope in §8 Constraints.)

6. Non-Goals

No silent / auto-apply. Refine never writes config on its own — every change is preview-then-apply; the tenant explicitly applies or discards. (Auto-apply of high-confidence patches is explicitly rejected — see §17.)
Autonomous-mode agents only. Refining a legacy tree_node / /ai-agent modal agent is out of scope — the refiner operates on the capability_pack of agents on the new engine.
No server-side session/history persistence (this phase). The BE is stateless per the RFC; the FE owns any refine session state. A persisted refine-conversation store is deferred.
Not a runtime test harness. Refine edits configuration; it does not run conversations against the agent. Validating behavior is done via Preview (Phase 1) and the AI Agent Testing initiative.
No knowledge-base content editing via refine. Refine may re-reference an existing file_search / vector store that already belongs to the agent, but it does not upload, edit, or vectorise KB files — that lives in the Resources / AI Agent Knowledge surface.
No creating new actions/tools via refine. Refine references only actions already registered in the tenant's functions registry — any action name or kb_id the model invents is stripped (same reference-filtering guarantee as the drafter) and surfaced as a warning.
One agent at a time. No bulk / multi-agent refine, and no cross-agent suggestions.
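
The reference-filtering guarantee in the "No creating new actions/tools via refine" non-goal can be pictured as a pure function over the proposed capabilities. This is a minimal sketch, not the drafter's actual pipeline: the function name `filter_references`, the per-capability `actions` list, and the warning strings are all assumptions made for illustration; only the rule itself (unknown action names and `kb_id`s are stripped and surfaced as warnings) comes from this PRD.

```python
from typing import Any, Dict, List, Set, Tuple


def filter_references(
    capabilities: List[Dict[str, Any]],
    registered_actions: Set[str],
    agent_kb_ids: Set[str],
) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Drop references the tenant does not own and report each one as a warning."""
    cleaned: List[Dict[str, Any]] = []
    warnings: List[str] = []
    for cap in capabilities:
        kept = dict(cap)  # shallow copy so the caller's input is left untouched
        actions = []
        for name in cap.get("actions", []):  # "actions" key is a hypothetical shape
            if name in registered_actions:
                actions.append(name)
            else:
                warnings.append(f"unknown action stripped: {name}")
        kept["actions"] = actions
        kb_id = cap.get("kb_id")
        if kb_id is not None and kb_id not in agent_kb_ids:
            warnings.append(f"unknown kb_id stripped: {kb_id}")
            kept["kb_id"] = None
        cleaned.append(kept)
    return cleaned, warnings
```

Returning the warnings alongside the cleaned data, rather than raising, matches the stated behavior that an invented reference degrades the suggestion instead of failing the whole refine call.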

7. Scope Changes

Engineering surfaces this PRD touches (controlled vocab). Kept in sync with the scope_changes frontmatter above.

Backend — chatbot:
- New endpoint POST /v2/ai_agents/:id/refine (FrontendService::V2, proxy/BFF, session-auth, roles owner/supervisor/admin, flag-gated) — builds the request, proxies upstream, maps the response back. Does not persist.
- New client method refine_capability_pack in lib/ai_service/ai_agent.rb that calls the upstream refine-skill-pack (POST /qontak-ai-noncore-mrag/api/ai-agent/refine-skill-pack); it mirrors the existing draft_skill_pack client method.
- New use-case + repository UseCases::RefineAiAgent + Repositories::Refine under app/api/frontend_service/v2/ai_agent/ — Clean Architecture, same shape as Generate.
- Refactor (shared mapper): extract build_skill_pack / build_skill / build_routing_rules / build_skill_actions / build_completion etc. out of Repositories::SyncToAiService into a shared Mappers::SkillPackBuilder (pure shaping), parameterised by a vector-store resolver strategy. SyncToAiService passes its existing stateful resolver (creates/reuses vector DBs); Refine passes a read-only resolver that reads the already-persisted capability['vector_store'] — so refine serialises capability_pack→skill_pack without creating vector stores. This is the one Phase-1 file touched; its sync behavior must not change.
- Reuse (apply path): consume the upstream's already-applied updated_skill_pack via the existing Mappers::SkillPackMapper (reverse direction), then persist through the existing Repositories::Update + SyncToAiService (mode: :update) — i.e. apply = a normal update; no new write path, same authz + ai_agent_histories audit.
- New feature flag ai_agent_refine | default: OFF.
- qontak-ai-noncore-mrag / mekari-agent: new upstream endpoint refine-skill-pack (returns reply + JSON Patch patches + already-applied, re-validated updated_skill_pack) — owned by Data / ML, the key external dependency (see §16).
Frontend — chatbot-fe:
- "Refine" tab in the agent editor's right rail (beside Preview), per Wulan's qontak-designer prototype — a multi-turn chat: the AI proposes option cards (each a per-field ProposedChange diff, Recommended flagged); Accept stages the option into the form in AiAgentEditor.vue (highlights changed fields, switches to the relevant tab), and the editor's existing Save persists (→ PATCH /v2/ai_agents/:id). FE owns the thread state. Design-vs-prod placement reconciliation: §18 OQ-7.
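- Per-field diff preview of the returned patches on each option card; Accept stages the option into the form, and taking no action discards it. The editor's existing Save then calls the existing update endpoint with the resulting capability_pack.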
Design — Figma for the Refine panel + diff-preview interaction (currently TBD; Stitch prompts stand in until then).
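
The shared-mapper refactor above is a strategy-style split: one pure builder, two interchangeable vector-store resolvers. The sketch below shows the idea in Python rather than the real Ruby code. The `skills` / `routing` output keys, the `name` field, and both function names are assumptions; the only behavior taken from this PRD is that the refine path reads the already-persisted `capability['vector_store']` and creates nothing.

```python
from typing import Any, Callable, Dict, Optional

# A resolver maps one capability to its vector store id (or None).
VectorStoreResolver = Callable[[Dict[str, Any]], Optional[str]]


def read_only_resolver(capability: Dict[str, Any]) -> Optional[str]:
    """Refine path: return the persisted value and never create a store."""
    return capability.get("vector_store")


def build_skill_pack(
    capability_pack: Dict[str, Any], resolve: VectorStoreResolver
) -> Dict[str, Any]:
    """Pure shaping: any side effect lives in the resolver, not in the builder."""
    skills = []
    for capability in capability_pack.get("capabilities", []):
        skills.append(
            {
                "name": capability["name"],  # hypothetical field
                "vector_store": resolve(capability),
            }
        )
    return {"skills": skills, "routing": capability_pack.get("routing", [])}
```

Under this split, the sync path would pass its existing stateful resolver to the same builder, which is what lets the extraction leave Phase 1's sync output unchanged.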

8. Constraints

Field	Value
Platform	Web only (Qontak admin — `chatbot-fe`). No mobile.
Performance	Refine round-trip target ≤ 10s p95 (dominated by the upstream LLM call); BE proxy overhead < 500ms. Hard client/read timeout aligns with the drafter (60s open/read) but the perceived target is ≤ 10s — beyond that the FE shows a "still working / try again" state.
Data limits	BE is stateless (no refine record persisted). `chat_history` sent upstream is capped at the last N turns (N TBD with ML — see §18) to bound token cost. One agent per refine request.
Plan scope	Same as autonomous mode — Plus / Ultimate / Qontak 360 workspaces with `autonomous_ai_agent` rollout = ON. Not Starter/Free.
Feature flag	`ai_agent_refine | default: OFF`; enabled per workspace, gates both the BE endpoint and the FE panel.
Read/write	Refine (propose) + Apply (write) both restricted to roles `owner` / `supervisor` / `admin` — identical to today's draft/update authz. Refine itself writes nothing; Apply goes through the standard update + sync path.
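
The `chat_history` cap in the Data limits row amounts to keeping a tail of the thread. A minimal sketch follows, with `max_turns` left as a parameter because N is still TBD with ML (§18); the function name and the turn shape are assumptions.

```python
from typing import Any, Dict, List


def cap_chat_history(history: List[Dict[str, Any]], max_turns: int) -> List[Dict[str, Any]]:
    """Return only the most recent `max_turns` turns; older turns stay client-side."""
    if max_turns <= 0:
        # Guard: history[-0:] would return the whole list, not an empty one.
        return []
    return history[-max_turns:]
```

Because slicing returns a new list, the full thread kept by the FE is not truncated; only the payload sent upstream is bounded.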

9. New Features

Feature: "Refine" tab in the agent editor's right rail

Design source of truth: the qontak-designer prototype app/pages/bot-automation/ai-agents/[id].vue (Wulan). The agent editor has a right rail that toggles between Preview and Refine tabs; the left side is the tabbed form (Profile · Capabilities · Routing · Advanced). The prod editor today is AiAgentEditor.vue at /bot-automation/ai-agent/:id (note: prototype route is plural /ai-agents/:id and renders the editor in a modal — a design-vs-prod structural delta to reconcile at build, see §18 OQ-7; the Preview rail itself is a Phase-1 "pending" item, §16).

Field	Detail
URL	`/bot-automation/ai-agent/:id` (existing agent editor; Refine is a tab in its right rail, beside Preview)
Access	`owner` / `supervisor` / `admin` on autonomous-eligible workspaces with `ai_agent_refine` = ON

Component Tree (per the prototype):

Component	Parent	Purpose
`AiAgentEditor`	—	Existing editor: left = form tabs (Profile · Capabilities · Routing · Advanced); right = rail
`RightRail`	`AiAgentEditor`	Hosts the Preview and Refine tabs (`rightRailTab`)
`RefinePanel`	`RightRail`	The Refine tab — multi-turn chat thread + input
`RefineEmptyState`	`RefinePanel`	"Refine your agent" + suggestion chips (e.g. "The refund answer is not correct, fix it", "Add order tracking capability", "Make it faster to escalate to a human agent", "Make the tone more formal")
`RefineMessageThread`	`RefinePanel`	User/AI turns; AI replies stream in (`reply`)
`RefineOptionCard`	`RefineMessageThread`	One proposed option: `label`, `description`, `Recommended` badge, and a per-field diff (`ProposedChange`: type · field · current → new). Accept applies it and dismisses the other options for that message
`RefineInput`	`RefinePanel`	Free-text box (Enter to send) → `POST /v2/ai_agents/:id/refine`

Apply behaviour (from the prototype's acceptRefineOption): accepting an option calls applyPendingData() → writes the change into the form, highlights the changed fields, switches to the relevant tab (Profile/Capabilities/Routing), and posts an AI confirmation turn. The other options for that message are marked dismissed. Nothing is persisted until the editor's existing Save (→ PATCH /v2/ai_agents/:id → Update + Sync).
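
The Accept behaviour described above can be sketched as a single staging step. This is an illustration in Python, not the prototype's `acceptRefineOption` code: the option fields `id`, `tab`, and `status`, the flat form keys, and the function name are assumptions; `pendingData` and the highlight / tab-switch / dismiss effects come from this section.

```python
from typing import Any, Dict, List


def accept_option(
    form: Dict[str, Any], options: List[Dict[str, Any]], accepted_id: str
) -> Dict[str, Any]:
    """Stage one option into the form and dismiss the rest. Nothing is persisted."""
    chosen = next(o for o in options if o["id"] == accepted_id)
    # Build a new form state; the caller's form dict is not mutated.
    staged = {**form, **chosen["pendingData"]}
    # Only fields whose value actually changed are highlighted.
    highlighted = sorted(
        key for key, value in chosen["pendingData"].items() if form.get(key) != value
    )
    # Option statuses are updated in place on the thread's option list.
    for option in options:
        option["status"] = "accepted" if option["id"] == accepted_id else "dismissed"
    return {"form": staged, "highlighted": highlighted, "active_tab": chosen["tab"]}
```

The returned state is still client-side only; persistence would happen later through the editor's existing Save.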

UI States:

State	Description
Empty	"Refine your agent" + suggestion chips.
Loading	AI reply streaming / generating — input disabled (`refineIsGenerating`).
Error	Upstream timeout/5xx or BE failure — error turn, agent unchanged, retry; no option cards.
Success	AI `reply` streamed + one or more `RefineOptionCard`s (Recommended flagged) with Accept; on Accept, form fields highlight + tab switches.

📊 UI State Diagram — Refine panel

stateDiagram-v2
    [*] --> Empty: Open Refine tab
    Empty --> Loading: Submit message / chip
    Loading --> SuccessOptions: reply + option card(s) returned
    Loading --> NoChange: reply, no actionable options
    Loading --> Error: upstream timeout / 5xx
    Error --> Loading: Retry
    NoChange --> Loading: Send another message
    SuccessOptions --> FormApplied: Accept an option (form highlighted, tab switched)
    SuccessOptions --> Loading: Send follow-up (multi-turn)
    FormApplied --> Saved: Editor Save → PATCH /v2/ai_agents/:id
    FormApplied --> Loading: Keep refining
    Saved --> [*]
    Error --> [*]: Close (agent unchanged)

Figma: Frames TBD — the prototype above is canonical until then (see Header + §7).
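
The Refine panel state diagram above can also be read as a transition table. The sketch below encodes its non-terminal edges; the state names are taken from the diagram, while the event names (`submit`, `accept`, and so on) are assumptions chosen for readability. The two exits to the end state (Saved, and Error on close) are omitted.

```python
TRANSITIONS = {
    ("Empty", "submit"): "Loading",
    ("Loading", "options_returned"): "SuccessOptions",
    ("Loading", "reply_only"): "NoChange",
    ("Loading", "upstream_error"): "Error",
    ("Error", "retry"): "Loading",
    ("NoChange", "submit"): "Loading",
    ("SuccessOptions", "accept"): "FormApplied",
    ("SuccessOptions", "submit"): "Loading",
    ("FormApplied", "save"): "Saved",
    ("FormApplied", "submit"): "Loading",
}


def next_state(state: str, event: str) -> str:
    """Return the next panel state, or raise if the diagram has no such edge."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {event!r}") from None
```

A table like this makes gaps easy to spot in QA: for example, there is no edge that reaches Saved without passing through FormApplied, which is the no-auto-apply guarantee expressed as state.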

10. API & Webhook Behavior

#	Behavior	Entity Affected	Triggered By	Expected Behavior	Failure Behavior
1	Refine capability_pack	AI Agent `capability_pack` (read-only — not persisted)	Tenant submits a message in the Refine panel → `POST /v2/ai_agents/:id/refine`	BE loads the agent, serialises its current `capability_pack` → `skill_pack` via the shared `SkillPackBuilder` (read-only vector resolver), gathers `available_tools`, and proxies upstream with `user_message` + `chat_history` (+ optional `trace`). Upstream returns a conversational `reply`, JSON Patch `patches` (RFC 6902), and an already-applied, re-validated `updated_skill_pack`. BE maps `updated_skill_pack` → `capability_pack` via `SkillPackMapper` and returns `reply` + `patches` + the previewed `capability_pack` + `warnings`. Nothing is written.	Upstream timeout/5xx → BE returns a graceful error; agent unchanged. Upstream LLM/validation issue → upstream returns its deterministic fallback (never 5xx for LLM transport); BE passes through `reply` + empty/partial `patches`. Referenced action/`kb_id` not in inputs → stripped by reference filtering, returned as a `warning`.
2	Accept an option (stage into form)	Editor form state (client-side; not yet persisted)	Tenant clicks Accept on a `RefineOptionCard`	FE applies the option's `pendingData` into the form, highlights the changed fields, and switches to the relevant tab; other options for that message are dismissed. No BE call yet.	N/A — client-side; reversible by not saving / re-refining.
3	Save (persist the refined config)	AI Agent `capability_pack` (persisted + re-synced)	Tenant clicks the editor's existing Save after accepting one or more options	Standard update path: `PATCH /v2/ai_agents/:id` → `Repositories::Update` writes new `parameters`, `SyncToAiService` (`mode: :update`) re-pushes `skill_pack` upstream + re-resolves vector stores; change live immediately; prior config snapshotted in `ai_agent_histories`.	Sync upstream fails → DB transaction rolls back; agent stays on prior config; error shown. Capability/routing ref validation fails → 400; no write.
4	Discard / don't apply	None	Tenant ignores the options or closes without saving	No write; proposed options dropped; the chat thread may continue.	N/A — purely client-side.

[Open item, to be resolved in the RFC: exact request/response JSON schema for /refine (user_message, chat_history[], trace{}, available_tools[] in; reply, patches[], updated_capability_pack, warnings[] out), HTTP error codes, and the SkillPackBuilder vector-resolver interface.]

11. System Flow + User Stories + ACs

11.1. System Flow

Flow: Refine an autonomous agent and apply a change · Type: User Journey + API Sequence

Tenant opens an autonomous agent in the editor (AiAgentEditor) and switches the right rail to the Refine tab.
Tenant types a message — e.g. pastes an error trace: "createorder keeps failing with 'belum terdaftar' but the bot just gives up."
FE sends POST /v2/ai_agents/:id/refine with user_message + chat_history.
BE loads the agent, serialises its current capability_pack → skill_pack via the shared SkillPackBuilder (read-only vector resolver — no vector DB created), gathers available_tools, and proxies upstream refine-skill-pack.
Upstream LLM returns reply + patches (RFC 6902) + updated_skill_pack (already applied + re-validated through the drafter's defensive pipeline).
BE maps updated_skill_pack → capability_pack via SkillPackMapper; returns reply + patches + previewed capability_pack + warnings. Nothing persisted.
FE renders the streamed reply plus one or more option cards — each with a Recommended flag and a per-field diff (ProposedChange: current → new); warnings shown if any.
Tenant clicks Accept on an option → FE applies its pendingData into the form, highlights the changed fields, switches to the relevant tab; other options dismissed. (No BE write yet.)
Tenant clicks the editor's existing Save → PATCH /v2/ai_agents/:id → Update + SyncToAiService re-push; change live immediately; prior config snapshotted in ai_agent_histories.
Failure branch (refine): upstream times out / 5xx → FE shows "couldn't generate a suggestion — agent unchanged", retry available.
Failure branch (save): sync to upstream fails → transaction rolls back; agent stays on prior config; error shown.
Tenant can keep refining — follow-up turns carry chat_history (multi-turn thread per the design).
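
The BE portion of the flow above (serialise, proxy, map back, persist nothing) can be sketched as one function with its collaborators injected. This is an outline in Python under stated assumptions, not the Ruby use-case: the `UpstreamError` class, the `ok` / `error` envelope, and reading `warnings` and `available_tools` from plain dicts are illustrative choices; the request and response member names are the ones used in this PRD.

```python
from typing import Any, Callable, Dict, List

Mapper = Callable[[Dict[str, Any]], Dict[str, Any]]


class UpstreamError(Exception):
    """Stand-in for an upstream timeout or 5xx."""


def refine(
    agent: Dict[str, Any],
    user_message: str,
    chat_history: List[Dict[str, Any]],
    to_skill_pack: Mapper,
    call_upstream: Mapper,
    to_capability_pack: Mapper,
) -> Dict[str, Any]:
    """Build the request, proxy upstream, map the response back. Writes nothing."""
    request = {
        "skill_pack": to_skill_pack(agent["capability_pack"]),
        "user_message": user_message,
        "chat_history": chat_history,
        "available_tools": agent.get("available_tools", []),
    }
    try:
        upstream = call_upstream(request)
    except UpstreamError as exc:
        # Failure branch: graceful error, agent unchanged.
        return {"ok": False, "error": str(exc)}
    return {
        "ok": True,
        "reply": upstream["reply"],
        "patches": upstream.get("patches", []),
        "capability_pack": to_capability_pack(upstream["updated_skill_pack"]),
        "warnings": upstream.get("warnings", []),
    }
```

Injecting the two mappers and the upstream call keeps the function free of I/O of its own, so the "nothing persisted" property can be checked in a unit test by asserting the agent dict is unchanged after the call.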

📊 System Flow — Refine with AI

sequenceDiagram
    actor Tenant
    participant FE as chatbot-fe (Refine panel)
    participant BE as chatbot BE (/v2/ai_agents)
    participant ML as noncore-mrag / mekari-agent
    Tenant->>FE: Describe issue / paste error
    FE->>BE: POST /v2/ai_agents/:id/refine (user_message, chat_history)
    BE->>BE: SkillPackBuilder → skill_pack (read-only vector resolver)
    BE->>ML: refine-skill-pack (skill_pack, message, history, available_tools)
    ML-->>BE: reply + patches + updated_skill_pack (re-validated)
    BE->>BE: SkillPackMapper → capability_pack (not persisted)
    BE-->>FE: reply + patches + previewed capability_pack + warnings
    FE-->>Tenant: Reply + option cards (Recommended; per-field diff)
    alt Accept an option, then Save
        Tenant->>FE: Accept option
        FE->>FE: applyPendingData → form highlighted + tab switched (no write)
        Tenant->>FE: Save
        FE->>BE: PATCH /v2/ai_agents/:id (updated capability_pack)
        BE->>BE: Update + SyncToAiService(mode: update)
        BE->>ML: PUT /ai-agent (re-push skill_pack)
        BE-->>FE: Updated agent (live) + history snapshot
    else Refine upstream fails
        ML-->>BE: timeout / 5xx
        BE-->>FE: Graceful error — agent unchanged
        FE-->>Tenant: "Couldn't generate a suggestion — try again"
    end

11.2. User Stories

User Story	Importance	Mockup / Technical Notes	Acceptance Criteria
[REFINE-S01] — Refine an agent in natural language As a Chatbot Specialist, I want to describe a misbehavior or paste an error and get a suggested config change, so that I can fix a live agent without hand-diagnosing the `capability_pack`.	Must Have	Figma: Pending — see §9 / Stitch. Data Fields: • `id` (string/uuid, required) — agent id, URL param • `user_message` (string, required, min 1 char) — User input • `chat_history` (array, optional) — FE thread state (multi-turn per the design) • `trace` (object, optional) — recent `workflow_state` / turns (see §18) • `patches` (array, response) — RFC 6902 ops from upstream • `warnings` (array, response) — stripped refs Before-After Behavior: Before: the only way to change an agent is to manually re-edit the Profile/Capabilities/Routing form tabs and full-merge `PATCH /v2/ai_agents/:id`; after, the tenant describes the issue in the editor and the system returns a reply + reviewable diff with nothing written until applied.	— Happy Path — • AC-1: Given an autonomous agent and `ai_agent_refine` = ON, when the tenant submits a `user_message`, then the system returns a conversational `reply` plus `patches` and a previewed `capability_pack`, and persists nothing. • AC-2: Given a change is returned, when the response renders, then it shows one or more option cards (the best flagged Recommended), each with a per-field diff (`ProposedChange`: current → new) and an Accept action. • AC-3: Given the message asks for nothing actionable (e.g. "thanks"), when the system responds, then `reply` is returned with no option cards. • AC-4: Given the upstream proposes an action name or `kb_id` not in the agent's inputs, when the response is built, then that reference is stripped and surfaced in `warnings`. — Error / Unhappy Path — • ERR-1: Given the upstream `refine-skill-pack` times out or returns 5xx, when the tenant submits, then the agent is unchanged, an error reply is shown with retry, and `refine_failed` is logged with the reason. 
• ERR-2: Given an upstream LLM/validation issue (not transport), when it occurs, then the upstream deterministic fallback `reply` is passed through with empty/partial `patches` (never a 5xx to the tenant). — Permission Model — • CAN: `owner` / `supervisor` / `admin` on autonomous-eligible workspaces with `ai_agent_refine` = ON. • CANNOT: other roles; workspaces with the flag OFF. • Unauthorized: Refine panel not rendered; endpoint returns 403. — UI States — • Loading: thinking indicator on assistant turn; input disabled. • Empty: prompt to describe the issue or paste an error. • Error: "couldn't generate a suggestion — agent unchanged" + retry. • Success: reply + option card(s), Recommended flagged (+ warnings).
[REFINE-S02] — Accept an option and save the change As a Chatbot Specialist, I want to accept a proposed option, see it land in the form, then save, so that no change goes live without my explicit review.	Must Have	Figma: Per prototype. Data Fields: • `pendingData` (object) — the option's structured change applied to the form • `changes` (array) — `ProposedChange[]` rendered as the per-field diff • `capability_pack` (object, required on Save) — the resulting pack sent to `PATCH` • `id` (string/uuid, required) — agent id Before-After Behavior: Before: any save rewrites the whole config with no diff; after, the tenant accepts a specific option (form fields highlight, tab switches), reviews in the form, and saves through the standard update path with the prior config snapshotted for revert.	— Happy Path — • AC-1: Given option cards are shown, when the tenant clicks Accept on one, then its `pendingData` is applied into the form, the changed fields are highlighted, the editor switches to the relevant tab, the other options are dismissed, and `refine_accepted` is logged. No BE write yet. • AC-2: Given an option was accepted, when the tenant clicks the editor's Save, then `PATCH /v2/ai_agents/:id` runs `Update` + `SyncToAiService` (`mode: update`), the change is live, the prior config is snapshotted in `ai_agent_histories`, and `refine_applied` is logged. • AC-3: Given option cards are shown, when the tenant accepts none (closes / keeps chatting), then nothing is written, and `refine_discarded` is logged. — Error / Unhappy Path — • ERR-1: Given Save is clicked, when `SyncToAiService` fails to push upstream, then the DB transaction rolls back, the agent stays on the prior config, and an error is shown. • ERR-2: Given Save is clicked, when capability/routing reference validation fails, then the update returns 400 and no write occurs. — Permission Model — • CAN: `owner` / `supervisor` / `admin` (same as the existing update endpoint). • CANNOT: other roles. 
• Unauthorized: Accept/Save not rendered; endpoint returns 403. — UI States — • Loading: Save shows a spinner; actions disabled. • Empty: N/A (only shown when an option exists). • Error: inline "couldn't save — agent unchanged" + retry. • Success: accepted option marked applied; form reflects the change; Save confirms. Dependencies: [REFINE-S01]
[REFINE-S03] — Iterative (multi-turn) refinement As a Chatbot Specialist, I want to send follow-up requests that build on the prior turn, so that I can iterate ("now also handle the timeout case") without restating context.	Should Have	Figma: Per prototype (the Refine tab is a multi-turn thread). Data Fields: • `chat_history` (array) — FE-owned thread, capped at last N turns sent upstream Before-After Behavior: Before: no refine concept exists; after, the Refine tab keeps a multi-turn thread and the FE sends `chat_history` so follow-ups are context-aware, while the BE stays stateless.	— Happy Path — • AC-1: Given a prior refine turn, when the tenant sends a follow-up, then the FE includes `chat_history` and the response reflects the earlier context. • AC-2: Given a long thread, when history exceeds the cap, then only the last N turns are sent upstream (N per §18) and the rest stay client-side. • AC-3: Given the tenant reloads or reopens the editor, when no session is restored, then a fresh thread starts (no server-side history this phase — see §6 Non-Goal 3). — Error / Unhappy Path — • ERR-1: Given a mid-thread refine call fails, when it errors, then prior turns remain visible and the failed turn can be retried. — Permission Model — • CAN: same as [REFINE-S01]. • CANNOT: same as [REFINE-S01]. • Unauthorized: Refine tab not rendered. — UI States — • Loading: per-turn streaming indicator. • Empty: "Refine your agent" + suggestion chips. • Error: failed turn marked retryable. • Success: appended response (+ option cards if any). Dependencies: [REFINE-S01]
[REFINE-S01-NEG] — No refine on legacy agents; never auto-apply (Guard Rail — from Non-Goals 1 & 2) As a tenant on a legacy `tree_node` agent, when I look for Refine, then it is not available; and no refine result is ever written without explicit Apply.	Guard Rail	—	• NEG-1: Given a legacy `tree_node` / `/ai-agent` modal agent, when the tenant opens its config, then the Refine panel is not rendered and `/v2/ai_agents/:id/refine` returns a 4xx for that agent. • NEG-2: Given any successful refine response, when the tenant takes no action, then no config change is persisted (no auto-apply).
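
The permission model and the NEG-1 guard rail combine into a short pre-check that runs before any upstream call. The sketch below is one possible reading: treating a flag-OFF workspace the same as an unauthorized role (403), the `agent_mode` values, and the specific 422 for legacy agents are assumptions, since NEG-1 only specifies "a 4xx".

```python
ALLOWED_ROLES = {"owner", "supervisor", "admin"}


def refine_access_status(role: str, flag_on: bool, agent_mode: str) -> int:
    """Return the HTTP status the refine endpoint would answer with before proxying."""
    if not flag_on or role not in ALLOWED_ROLES:
        return 403  # Permission Model rows: other roles, or flag OFF
    if agent_mode != "autonomous":  # hypothetical mode value
        return 422  # NEG-1 says "a 4xx"; 422 is a placeholder choice
    return 200
```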

🧪 Test Coverage Matrix — [REFINE-S01]

Dimension	Coverage	Notes
Boundary values	⚠️ partial	AC-3 covers no-actionable-change (empty patches); ⚠️ QA: empty/whitespace `user_message` (min 1 char), very long message, very large `capability_pack`
State transitions	✅ defined	AC-1 (returns, nothing persisted) → S02 accept/save/discard transitions
Data validation	✅ defined	AC-4 reference filtering (unknown action/`kb_id` stripped → warning)
Concurrency	⚠️ TBD	⚠️ QA: two specialists refine the same agent simultaneously; refine in flight while another user applies a manual edit
Network/timeout	✅ defined	ERR-1 upstream timeout/5xx → agent unchanged + retry; ERR-2 LLM fallback never 5xx

🧪 Test Coverage Matrix — [REFINE-S02]

Dimension	Coverage	Notes
Boundary values	⚠️ TBD	⚠️ QA: apply with empty patch set; apply a stale preview after the agent changed underneath
State transitions	✅ defined	AC-1 accept→staged in form (no write); AC-2 save→live; AC-3 discard→no-op; ERR-1 save-fail→rollback
Data validation	✅ defined	ERR-2 capability/routing ref validation → 400, no write
Concurrency	⚠️ TBD	⚠️ QA: apply while a parallel manual save commits (last-writer / optimistic-lock behavior)
Network/timeout	✅ defined	ERR-1 `SyncToAiService` upstream failure → transaction rollback, prior config intact

12. Rollout

Field	Detail
Feature flag	`ai_agent_refine` (see §8 — OFF by default)
Rollout	Stage 1 (Internal Alpha) → Chatbot Specialists maintaining the 26Q2 cohort (the 15 production agents); Stage 2 (Closed Beta) → 3–5 customer Bot Builders on Plus/Ultimate/360; Stage 3 (Open Beta) → all autonomous-eligible workspaces, opt-in; GA → all autonomous-eligible workspaces, flag default ON
Backward compat	Yes — purely additive. The existing draft + full-config edit path (`PATCH /v2/ai_agents/:id`) is unchanged; Apply reuses it. The only Phase-1 code touched is the `build_skill_pack` extraction (§7), which must preserve identical sync output.
Migration	None — no Rails DDL; no data migration.

12.1. Semantic Regression Rollback

Refine produces AI output (proposed config patches), so this section applies.

Field	Detail
Model flag	`ai_agent_refine | default: OFF`; disabling it removes the refine endpoint + panel, while manual config editing remains fully available.
Regression metric	(a) refine patch apply-success rate (applied / proposed) and (b) post-apply agent regression — agents whose config was changed via refine and then reverted or re-edited within 48h.
Rollback threshold	Apply-success rate < 30% sustained over a week, or post-apply revert rate > 20%, or `refine_failed` rate > 10% → pause rollout / flip the flag OFF for affected workspaces.
Rollback path	Two levels: (1) feature — toggle `ai_agent_refine` OFF (no deploy); (2) per-agent — an applied-but-worse config is reverted by restoring the prior snapshot from `ai_agent_histories` (the standard update-audit trail), which re-syncs the old `skill_pack` upstream.
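
The three rollback thresholds can be evaluated from raw event counts for one window. A minimal sketch, assuming the denominators implied elsewhere in this PRD (applied / proposed, reverted / applied, failed / requested); the function name and reason labels are illustrative, and "sustained over a week" is left to whoever chooses the window.

```python
from typing import List


def rollback_reasons(
    proposed: int, applied: int, reverted: int, requested: int, failed: int
) -> List[str]:
    """Return the breached thresholds for one evaluation window (empty = healthy)."""
    reasons = []
    # Each ratio is skipped when its denominator is zero (no data in the window).
    if proposed and applied / proposed < 0.30:
        reasons.append("apply_success_below_30pct")
    if applied and reverted / applied > 0.20:
        reasons.append("revert_rate_above_20pct")
    if requested and failed / requested > 0.10:
        reasons.append("refine_failed_above_10pct")
    return reasons
```

As a worked case: 100 proposed, 20 applied, 5 reverted, 200 requested, and 30 failed gives 20%, 25%, and 15% respectively, so all three thresholds are breached.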

13. Observability

Key Events:

Event Name	Trigger	Properties
`refine_requested`	Tenant submits a refine message	company_id, ai_agent_id, message_len, history_turns, timestamp
`refine_succeeded`	Upstream returns a valid response	company_id, ai_agent_id, patch_count, warning_count, latency_ms, timestamp
`refine_failed`	Upstream timeout/5xx or BE error	company_id, ai_agent_id, reason, latency_ms, timestamp
`refine_applied`	Tenant clicks Apply and update succeeds	company_id, ai_agent_id, patch_count, timestamp
`refine_discarded`	Tenant clicks Discard	company_id, ai_agent_id, patch_count, timestamp
`refine_reverted`	Applied config reverted via `ai_agent_histories` within 48h	company_id, ai_agent_id, timestamp
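To keep emitted events aligned with the table above, the required properties can be checked at the point of emission. The sketch below is illustrative only: `build_refine_event` and `REFINE_EVENT_PROPERTIES` are hypothetical names, the event transport is out of scope, and the ISO 8601 timestamp format is an assumption.

```ruby
require "time"

# Required properties per event, copied from the Key Events table.
# The timestamp is added by the builder rather than supplied by the caller.
REFINE_EVENT_PROPERTIES = {
  "refine_requested" => %i[company_id ai_agent_id message_len history_turns],
  "refine_succeeded" => %i[company_id ai_agent_id patch_count warning_count latency_ms],
  "refine_failed"    => %i[company_id ai_agent_id reason latency_ms],
  "refine_applied"   => %i[company_id ai_agent_id patch_count],
  "refine_discarded" => %i[company_id ai_agent_id patch_count],
  "refine_reverted"  => %i[company_id ai_agent_id]
}.freeze

# Builds an event payload, raising if the name is unknown or a property is missing.
def build_refine_event(name, properties, now: Time.now.utc)
  required = REFINE_EVENT_PROPERTIES.fetch(name) { raise ArgumentError, "unknown event: #{name}" }
  missing = required - properties.keys
  raise ArgumentError, "#{name} is missing: #{missing.join(', ')}" unless missing.empty?

  { event: name, timestamp: now.iso8601 }.merge(properties)
end
```

For example, `build_refine_event("refine_discarded", { company_id: 1, ai_agent_id: 2, patch_count: 3 })` returns a payload hash, while omitting `patch_count` raises `ArgumentError`.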

Dashboard owner: BOT — Hadiningbot Squad (chatbot)

Alerts:

`refine_failed` rate > 10% of `refine_requested` over 1h → page on-call (chatbot) + notify PM.
Refine latency p95 > 10s over 1h → notify chatbot squad (upstream LLM latency check).
`refine_reverted` / `refine_applied` > 20% over a week → PM review (quality regression).
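The latency alert depends on how p95 is computed, which the PRD does not specify. As one possible reading, the sketch below uses the nearest-rank percentile over the `latency_ms` values collected in a 1h window; `percentile` and `latency_alert?` are hypothetical helpers, and a monitoring backend may use a different estimator.

```ruby
LATENCY_P95_LIMIT_MS = 10_000 # 10s, from the alert definition

# Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it.
# pct is expected to be an integer between 1 and 100.
def percentile(samples, pct)
  raise ArgumentError, "no samples" if samples.empty?

  sorted = samples.sort
  rank = (pct * sorted.length / 100.0).ceil
  sorted[[rank, 1].max - 1]
end

# True when the window's p95 latency exceeds the limit; an empty window never alerts.
def latency_alert?(latencies_ms)
  return false if latencies_ms.empty?

  percentile(latencies_ms, 95) > LATENCY_P95_LIMIT_MS
end
```

With nearest-rank, a single slow request in a window of 10 samples is enough to trip the alert, while the same outlier in a window of 20 is not; that sensitivity at low volume is worth keeping in mind during Alpha.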

13.1. Post-Launch Monitoring Cadence

Field	Detail
Review cadence	Weekly for the first 4 weeks post-GA, then monthly.
Owner	PM (Dimas) + BOT squad.
Review scope	All §14 metrics — adoption (refine vs manual edits), apply-success rate, error rate, time-to-fix.
Trigger thresholds	• Apply-success rate < 30% for a week → investigate prompt/UX. • `refine_failed` rate > 10% in any week → investigate upstream. • `refine_reverted`/`refine_applied` > 20% → quality review within 48h.
Rollback consideration	If error or revert thresholds breach and are unresolved within 48h, PM flips `ai_agent_refine` OFF for affected workspaces (see §12.1).

14. Success Metrics

Adoption & Usage:

Metric	Definition	Baseline	Target
⭐ Refine adoption	Share of autonomous-agent config changes made via Refine (applied) vs manual tab edits	N/A — new capability	≥ 40% of config changes via Refine within 60 days of GA
Refine engagement	Distinct agents that received ≥1 refine session	N/A	≥ 60% of active autonomous agents within 60 days of GA

Quality & Accuracy:

Metric	Definition	Baseline	Target
Apply-success rate	Applied refinements / proposed refinements (a proxy for suggestion usefulness)	N/A	≥ 50% within 30 days of GA
Refine error rate	`refine_failed` / `refine_requested`	N/A	< 5% steady-state
Post-apply revert rate	Applied configs reverted/re-edited within 48h	N/A	< 15%
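The post-apply revert rate hinges on matching each applied refinement to a later change on the same agent within 48h. A minimal sketch of that matching, assuming both inputs are arrays of hashes with `ai_agent_id` and an `at` timestamp (a hypothetical shape; the real source would be `refine_applied` events and `ai_agent_histories` rows):

```ruby
REVERT_WINDOW_SECONDS = 48 * 60 * 60

# applied:       [{ ai_agent_id:, at: Time }] for configs applied via Refine
# later_changes: [{ ai_agent_id:, at: Time }] for reverts or manual re-edits
# Returns the share of applied configs followed by a change on the same agent
# within 48h, or nil when nothing was applied.
def post_apply_revert_rate(applied, later_changes)
  return nil if applied.empty?

  changes_by_agent = later_changes.group_by { |change| change[:ai_agent_id] }
  reverted = applied.count do |apply|
    changes_by_agent.fetch(apply[:ai_agent_id], []).any? do |change|
      elapsed = change[:at] - apply[:at] # seconds, as a Float
      elapsed.positive? && elapsed <= REVERT_WINDOW_SECONDS
    end
  end
  reverted.fdiv(applied.length)
end
```

Whether a manual re-edit of an unrelated field should count as a "revert" is a definitional choice this sketch does not settle; it counts any later change on the agent.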

Efficiency & Impact:

Metric	Definition	Baseline	Target
Time-to-fix	Median time from "agent misbehaving" to a shipped config fix	Manual baseline TBD (measure in Alpha)	−50% vs manual baseline within 90 days of GA

15. Launch Plan & Stage Gates

Stage	Audience	Duration	Success Gate to Advance	Owner
Internal Alpha	Chatbot Specialists, 26Q2 cohort (15 agents)	2 weeks	≥ 20 real refine sessions; apply-success ≥ 40%; `refine_failed` < 10%; no rollback-worthy regression	PM + Eng
Closed Beta	3–5 customer Bot Builders	3 weeks	Apply-success ≥ 50%; error rate < 5%; ≥ 1 customer fixes an agent unaided; post-apply revert < 20%	PM + CSM
Open Beta	All autonomous-eligible, opt-in	3 weeks	Adoption trending toward 40%; latency p95 ≤ 10s; all Closed-Beta gates sustained	Eng Lead
GA	All autonomous-eligible (flag default ON)	Ongoing	All Open-Beta gates sustained 2 weeks; PMM approved	PM + PMM

16. Dependencies

Dependency	Owning Team	Deliverable Needed	Blocking?
Upstream `refine-skill-pack` endpoint (`mekari-agent` / proxied by `noncore-mrag`)	Data / ML Platform	The endpoint itself: accepts `skill_pack` + `user_message` + `chat_history` (+ `trace`, `available_tools`); returns `reply` + RFC 6902 `patches` + already-applied, re-validated `updated_skill_pack` + `warnings`. Does not exist yet.	YES
Phase 1 `capability_pack` model + drafter live on `/v2/ai_agents`	BOT — Hadiningbot (chatbot)	Must remain stable — the refiner serialises/maps the same `capability_pack` and the `SkillPackBuilder` is extracted from Phase 1's `SyncToAiService`	YES
`capability_pack`↔`skill_pack` adapter (`skill_pack_mapper.rb` reverse map + the extracted `SkillPackBuilder`)	BOT — Hadiningbot (chatbot)	Bidirectional mapping reused for refine input/output; extraction must not change Phase-1 sync output	YES
`ai_agent_histories` audit (exists)	BOT — Hadiningbot (chatbot)	Used as the per-agent revert path for an applied-but-worse config	NO
`trace` source (recent `workflow_state` / turns) for debugging context	BOT + Data/ML (overlaps AI Agent Live Monitoring)	Optional runtime telemetry to enrich refine; refine works without it (degraded)	NO
Agent editor right rail (Preview tab — Phase-1 "pending")	BOT — Hadiningbot (chatbot)	The Refine tab lives in the same right rail as Preview; confirm whether the rail ships with Preview or refine stands up the rail (see §18 OQ-7)	NO
Refine design (right-rail chat)	Design — Wulan	Already prototyped in `qontak-designer` (`app/pages/bot-automation/ai-agents/[id].vue`); Figma frames are a follow-up, prototype is canonical meanwhile	NO
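Because the upstream `refine-skill-pack` endpoint does not exist yet, the BE proxy and FE could be developed against a stub that honours the field names listed in the first dependency row. The sketch below is an assumption-laden stand-in: `StubRefineSkillPackClient` is a hypothetical class, the `stub_marker` field is invented purely to produce a visible change, and the real request/response schema still has to be agreed with Data / ML Platform.

```ruby
require "json"

REQUIRED_REQUEST_KEYS = %w[skill_pack user_message chat_history].freeze
OPTIONAL_REQUEST_KEYS = %w[trace available_tools].freeze

# Hypothetical stand-in for the upstream call; returns a canned response
# with the four top-level fields named in the dependency table.
class StubRefineSkillPackClient
  def call(request)
    missing = REQUIRED_REQUEST_KEYS - request.keys
    raise ArgumentError, "missing keys: #{missing.join(', ')}" unless missing.empty?

    unknown = request.keys - REQUIRED_REQUEST_KEYS - OPTIONAL_REQUEST_KEYS
    raise ArgumentError, "unknown keys: #{unknown.join(', ')}" unless unknown.empty?

    # Deep-copy via JSON so the caller's skill_pack is never mutated.
    updated = JSON.parse(JSON.generate(request.fetch("skill_pack")))
    updated["stub_marker"] = "refined" # invented field, for illustration only

    {
      "reply" => "Stubbed reply to: #{request.fetch('user_message')}",
      # RFC 6902 "add" on an object member sets it whether or not it already exists.
      "patches" => [{ "op" => "add", "path" => "/stub_marker", "value" => "refined" }],
      "updated_skill_pack" => updated,
      "warnings" => []
    }
  end
end
```

The stub keeps `patches` and `updated_skill_pack` consistent with each other, which mirrors the decision that the BE passes `patches` through for display and never applies them itself.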

17. Key Decisions + Alternatives Rejected

17a. Decisions Made

Date	Decision	Rationale
2026-06-29	Upstream returns `updated_skill_pack` already applied + re-validated; BE does not apply patches itself. BE passes `patches` through to FE for the diff preview only.	Keeps `chatbot` BE a thin proxy (same posture as the drafter) and guarantees the refined pack went through the same defensive pipeline as the drafter (gate validation, tone coercion, orphan cleanup, reference filtering). Avoids a second, drift-prone RFC-6902 implementation in Rails.
2026-06-29	Apply reuses the existing `PATCH /v2/ai_agents/:id` (Update + SyncToAiService), not a new apply endpoint.	Apply is a normal config update — reuses authz, validation, upstream re-push, and `ai_agent_histories` audit/revert for free.
2026-06-29	Extract `build_skill_pack` from `SyncToAiService` into a shared `SkillPackBuilder` with a pluggable vector-store resolver (stateful for sync, read-only for refine).	The refiner must serialise the current `capability_pack`→`skill_pack` without creating vector DBs; sync must keep its side-effecting resolution. One shared mapper, two resolvers, no duplicated shaping logic.
2026-06-29	Stateless BE; FE owns any refine session state (no new table).	Matches RFC §10.3b; avoids Rails DDL and a dual source of truth during the 26Q2 window.
2026-06-29	Refine is always review-then-apply (no auto-apply).	Trust/safety — a config change to a live customer agent must be a human decision.
2026-06-29	Interaction model resolved by design (Wulan prototype): a "Refine" tab in the agent editor's right rail (beside Preview), multi-turn chat, AI proposes 1+ options each with a per-field diff (Recommended flagged); Accept stages the option into the form (highlight + tab switch); persistence is the editor's existing Save.	Supersedes the earlier "single-shot vs chat / drawer vs inline" open question — the `qontak-designer` prototype is the design source of truth (same posture Phase 1 took with its prototype).
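The `SkillPackBuilder` decision (one shared mapper, two resolvers) can be pictured as plain dependency injection. The sketch below is not the extracted Phase-1 code: the resolver class names, the `resolve` interface, and the `capability_pack` shape (`capabilities`, `name`, `knowledge_ref`, `vector_store_id`) are all hypothetical, and vector-store creation is simulated in memory.

```ruby
# Resolver for sync: may create a vector store when none exists (side effect simulated).
class StatefulVectorStoreResolver
  attr_reader :created

  def initialize(existing = {})
    @stores = existing.dup
    @created = []
  end

  def resolve(knowledge_ref)
    @stores[knowledge_ref] ||= begin
      @created << knowledge_ref
      "vs_#{knowledge_ref}"
    end
  end
end

# Resolver for refine: lookup only, never creates anything.
class ReadOnlyVectorStoreResolver
  def initialize(existing = {})
    @stores = existing.dup.freeze
  end

  def resolve(knowledge_ref)
    @stores[knowledge_ref]
  end
end

# Shared shaping logic; the only thing that varies is the injected resolver.
class SkillPackBuilder
  def initialize(resolver)
    @resolver = resolver
  end

  def build(capability_pack)
    capabilities = capability_pack.fetch(:capabilities).map do |capability|
      { name: capability.fetch(:name), vector_store_id: @resolver.resolve(capability[:knowledge_ref]) }
    end
    { capabilities: capabilities }
  end
end
```

With the read-only resolver, a capability whose store has not been created yet simply serialises with a `nil` id, so serialising for refine cannot create vector DBs as a side effect.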

17b. Alternatives Rejected

Alternative	Why Rejected	Date
BE applies the RFC 6902 patches itself (Rails JSON-Patch)	Upstream already returns the applied + re-validated pack; reapplying in BE duplicates logic and risks drift from the drafter's validation pipeline	2026-06-29
A dedicated `/refine/apply` endpoint	Apply is just an update — reuse `PATCH /v2/ai_agents/:id`; a new endpoint duplicates authz/sync/audit	2026-06-29
Persist refine chat history server-side (new table)	Adds DDL + dual source of truth; FE-owned session is sufficient for this phase (deferred to a later phase if audit needs it)	2026-06-29
Auto-apply high-confidence patches	Unacceptable risk to live customer agents; conflicts with Non-Goal 1	2026-06-29
Build refinement entirely in the `chatbot-ml-dev` prototype	Same reasons Phase 1 productionised the engine — no plan/tier gating, no auth surface, no rollout control, no audit	2026-06-29

18. Open Questions

#	Type	Question	Owner	Deadline
1	Risk	Upstream `refine-skill-pack` does not exist yet. The whole feature is blocked on the Data/ML endpoint. Mitigation: confirm ownership + the exact contract with the `mekari-agent`/`noncore-mrag` owners before BE build; agree the request/response schema up front so the BE proxy + FE can be built against a stub.	PM (Dimas) + Data/ML	2026-07-15
2	Open Question	What goes in `trace`? Which runtime telemetry (recent `workflow_state`, recent turns) meaningfully improves refine quality, and where does it come from — does it overlap AI Agent Live Monitoring's signals? Refine must work without it (degraded).	PM + Eng (Eko)	before RFC
3	Assumption	`chat_history` cap (N turns). We assume the FE sends the last N turns to bound upstream token cost. Confirm N + truncation strategy with ML.	Eng + Data/ML	before RFC
4	Open Question	KB scope on apply. Non-Goal 5 keeps KB content out, but if a refinement changes which `file_search`/vector store a capability points to, does Apply trigger `SyncToAiService` vector resolution (and is that desired), or must KB-affecting patches be rejected?	PM + Eng	before RFC
5	Risk	Applied-but-worse config. A refinement can look valid but degrade the agent. Mitigation: preview-then-apply (no auto-apply) + `ai_agent_histories` per-agent revert + the §12.1 flag and revert-rate alert.	PM + Eng	before GA
6	Open Question	Diff rendering source of truth. Does the FE render the diff from `patches` (RFC 6902 paths) or by diffing previous vs `updated_capability_pack`? Paths reference upstream `skill_pack` shape, not the public `capability_pack` — confirm the FE has a readable mapping.	Eng (FE) + Eng (BE)	before RFC
7	Open Question	Design-vs-prod placement reconciliation. The interaction model is settled (§17 — right-rail Refine tab, multi-turn chat, accept-option-applies-to-form). But the prototype renders the editor in a modal at `/bot-automation/ai-agents/:id` (plural) with a Preview+Refine right rail, while prod `AiAgentEditor.vue` is a page at `/bot-automation/ai-agent/:id` (singular) and its right-rail Preview is a Phase-1 "pending" item (§16). Where exactly does the right rail live in prod, and does refine ship before/with Preview?	PM + Eng (FE)	before RFC
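For OQ-6, if the FE (or a BE helper) renders diffs from `patches`, each RFC 6902 `path` is a JSON Pointer (RFC 6901) that must be split and unescaped before it can be mapped to a readable label. A minimal sketch of that step follows; `FIELD_LABELS` and the segment names in it are hypothetical, since the PRD does not define the upstream `skill_pack` field names, and treating all-digit segments as array indices is an assumption.

```ruby
# Splits a JSON Pointer into unescaped reference tokens (RFC 6901).
def pointer_segments(path)
  return [] if path.empty? # the empty pointer addresses the whole document
  raise ArgumentError, "JSON Pointer must start with '/'" unless path.start_with?("/")

  segments = path[1..-1].split("/", -1)
  segments = [""] if segments.empty? # "/" addresses the empty-string key
  # Order matters: decode "~1" before "~0" so that "~01" becomes "~1", not "/".
  segments.map { |segment| segment.gsub("~1", "/").gsub("~0", "~") }
end

# Hypothetical mapping from upstream segment names to editor-facing labels.
FIELD_LABELS = {
  "capabilities" => "Capabilities",
  "instructions" => "Instructions"
}.freeze

# Renders a pointer as a breadcrumb; unknown segments fall back to the raw name.
def readable_path(path, labels = FIELD_LABELS)
  pointer_segments(path).map do |segment|
    if segment.match?(/\A\d+\z/)
      "item #{segment.to_i + 1}" # assumes digits mean a zero-based array index
    else
      labels.fetch(segment, segment)
    end
  end.join(" > ")
end
```

The alternative named in OQ-6, diffing the previous pack against `updated_capability_pack`, avoids this mapping entirely, at the cost of losing the upstream's own description of what changed.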

PRD CHANGELOG

Version	Date	By	Section	Type	Summary
1.0	2026-06-29	Claude	All	CREATED	Phase 4 (AI-Assisted Refinement / "Refine with AI") PRD created from the Autonomous Agent RFC §10.3b (`refine-skill-pack`, QON 51153994292 / 51226214880) and grounded in the current `chatbot` BE — drafter (`draft-skill-pack`) exists, full-merge `PATCH /v2/ai_agents/:id` is the only edit path today, bidirectional `capability_pack`↔`skill_pack` translation already present (`sync_to_ai_service.rb` + `skill_pack_mapper.rb`). Refiner does not exist yet.
1.3	2026-06-29	Claude	Title, Header, S2 (Phase Context)	MODIFIED	Renumbered Phase 4 → Phase 2 per PM (refinement is the next concrete step after Phase 1 and is independently shippable). File renamed to `phase-2-ai-assisted-refinement.md`; title, H1, CB Phase Number (Phase 2 of 4), prior/cross-phase references updated. In the anchor, Migrate shifted to Phase 3 and Iteration to Phase 4.
1.2	2026-06-29	Claude	S1, S4, S9	MODIFIED	Score-prd v3.3 fixes: tightened the one-liner to ≤25 words (S1), added the UI State Diagram for the Refine panel (S9 New Features, closing the 10.6 diagram gap), and added explicit time horizons to "What Happens If We Don't Ship" (S4).
1.1	2026-06-29	Claude	Header, S2, S9, S10, S11, S16, S17, S18	MODIFIED	Incorporated the existing design (Wulan's `qontak-designer` prototype `app/pages/bot-automation/ai-agents/[id].vue`): refine is a right-rail "Refine" tab (beside Preview), a multi-turn chat with suggestion chips, where the AI proposes options (each a per-field `ProposedChange` diff, Recommended flagged) and Accept stages the change into the form (highlight + tab switch); persistence stays the editor's existing Save. Corrected the earlier wrong assumption of a "side-panel chat config screen"; resolved OQ-7 (interaction model) into a §17 decision and replaced it with a design-vs-prod placement reconciliation question; upgraded REFINE-S03 to Should Have.
