
Qontak Chatbot | AI Agent | Autonomous AI Agent — Phase 2: AI-Assisted Refinement

HEADER BLOCK

PM: Dimas Fauzi Hidayat (Product Manager, Mekari Qontak)
PRD Version: 1.3
Status: DRAFT
PRD Type: PHASE
Epic: BOT-4191
Squad: BOT — Hadiningbot Squad
Engineering Lead: Eko Aprianto
Data Team: Data / ML Platform (noncore-mrag, chatbot-ai, mekari-agent owners)
RFC Link: RFC — Qontak Chatbot AI · Autonomous Agent §10.3b refine-skill-pack · detail: refine-skill-pack endpoint
Figma Master: Design exists; the canonical source of truth is the designer prototype (Wulan), qontak-designer app/pages/bot-automation/ai-agents/[id].vue (right rail with Preview + Refine tabs). As in Phase 1, the live prototype supersedes the prose here; re-check it before implementing. Figma frames TBD.
UI/UX Designer: Wulan Febyazzahra
Anchor: Yes (Autonomous AI Agent — ANCHOR, Epic BOT-4191)
Labels: epic:qontak-chatbot | module:ai-agent | feature:ai-agent-refine
Last Updated: 2026-06-29

Status values: DRAFT → READY → BUILD → SHIPPED


2. CONDITIONAL BLOCK: PHASE CONTEXT

Anchor PRD: Autonomous AI Agent — ANCHOR (Confluence: QON 51188335138, Epic BOT-4191)
Phase Number: Phase 2 of 4
Phase Goal: Let tenants iterate on a live autonomous agent's capability_pack through a conversational AI surface in the agent editor: paste an error or describe a misbehavior, and get a conversational reply plus reviewable, apply-or-discard config changes. This is delivered by adding a refine proxy on chatbot BE over the upstream refine-skill-pack, reusing the Phase-1 engine/config model and the drafter's validation pipeline.
Prior phases: Phase 1: New Engine Migration (shipped on /v2/ai_agents autonomous mode, BOT-4235) productionised the autonomous engine, the new config model (Profile · Capabilities · Routing), and the drafter (POST /v2/ai_agents/generate → upstream draft-skill-pack). PRD: Phase 1. There are no other prior phases; Phase 3: Migrate Existing Configurations (TBD) and Phase 4: New-Configuration Iteration (TBD) come after this phase and do not block it.
This phase: The refiner, a conversational way to fix an existing autonomous agent. It adds a new POST /v2/ai_agents/:id/refine BE proxy → upstream refine-skill-pack, and a "Refine" tab in the agent editor's right rail (alongside Preview) in chatbot-fe. The tab is a multi-turn chat in which the AI proposes one or more options (each a field-level diff); accepting one applies it into the form (highlights the changed fields and switches to the relevant tab). Persistence is the editor's existing Save (→ PATCH /v2/ai_agents/:id). Design SoT = the qontak-designer prototype. No new Rails DDL.
Deferred to next: Server-side session/history persistence (the FE owns it this phase); editing knowledge-base content via refine; auto-apply of high-confidence patches.
Cross-phase deps: Inherits Phase 1's capability_pack model, the capability_pack ↔ skill_pack adapter (skill_pack_mapper.rb + SyncToAiService#build_skill_pack), and the drafter's defensive post-processing (gate validation, tone coercion, orphan cleanup, reference filtering). Independently shippable: it depends only on Phase 1, not on the later Phase 3/4. The build_skill_pack extraction (see §7/§17) must not change Phase 1's sync behavior.

Note: Phase 2 here is a workstream ordering, not a strict sequence gate — it can ship right after Phase 1 (and before the later Migrate/Iteration phases) because it builds only on Phase 1.


3. One-liner + Problem

One-liner: Let tenants fix a live autonomous agent by describing the problem and accepting reviewable, AI-proposed config changes — no hand-editing the capability_pack.

Problem: Today the only way to change a configured autonomous agent is to manually re-edit the Profile / Capabilities / Routing tabs in the form editor (AiAgentEditor.vue) and re-save the whole config — a full-merge PATCH /v2/ai_agents/:id that replaces the entire profile / capabilities / routing blocks (update_ai_agent.rb:83 in chatbot BE). When an agent misbehaves in production (an action firing on the wrong error, a capability that never triggers, a routing rule that exits too early), the tenant has to diagnose the capability_pack by hand and guess which field to change.

The drafter shipped in Phase 1 only generates from scratch — it cannot fix an existing agent. After this phase, a Chatbot Specialist (or customer Bot Builder) can describe the problem in plain language and get a conversational diagnosis plus a previewable, apply-or-discard diff of concrete config changes. For full initiative context, see the ANCHOR PRD: Autonomous AI Agent — ANCHOR.


4. What Happens If We Don't Ship This Phase

  • Maintenance stays specialist-bound and slow (immediately, every release). The 15+ production autonomous agents (26Q2 cohort) can only be fixed by a Chatbot Specialist hand-diagnosing the capability_pack and trial-and-error re-editing — every customer change request becomes a manual ticket, capping how many agents one specialist can maintain as the cohort grows through 26Q3+.
  • Self-serve confidence stays low (undercuts the later self-serve phases, 26Q3–Q4). The Design Validation research (15 IDIs) put non-technical self-config confidence at ~50–60%. Without an AI-assisted "fix it" loop, customer Bot Builders keep handing problems back to Mekari instead of resolving them, undercutting the self-serve goal the later Migrate/Iteration phases (Phase 3/4) are scheduled to deliver.
  • Competitive gap widens (ongoing). Competitor agent platforms increasingly offer conversational "fix your bot" iteration; every quarter we ship draft-only (generate but can't refine) leaves a visible hole in the autonomous product line during active competitive evaluations.

5. Target Users + Persona Context

  • Primary: Chatbot Specialist
    • Role: Internal Qontak Chatbot Specialist (technical) maintaining the production autonomous agents (26Q2 cohort of 15 agents across 15 cids) on behalf of, or jointly with, customers.
    • Goal: When an agent misbehaves, diagnose and fix the capability_pack quickly and safely, with a preview before it goes live.
    • Pain: Must read the raw capability_pack, guess which capabilities[] / routing[] field is wrong, hand-edit the 3-tab editor, re-save the whole config, and re-test in preview; slow and error-prone.
    • Workaround: Trial-and-error edits in the Config tabs plus repeated preview runs; sometimes rebuilds the capability from scratch via the drafter.
  • Secondary: Dedicated Bot Builder (customer-side)
    • Role: Technical or non-technical Plus / Ultimate / Qontak 360 admin maintaining their company's own AI agents.
    • Goal: Fix their own agent's behavior without waiting on a Mekari specialist.
    • Pain: No guided way to diagnose; the legacy modal lacks the autonomous engine; editing the new config still requires knowing the model.
    • Workaround: Files a change request to a Mekari specialist and waits, or accepts the degraded behavior.

(Full persona background: see ANCHOR PRD. Plan availability + flag scope in §8 Constraints.)


6. Non-Goals

  1. No silent / auto-apply. Refine never writes config on its own — every change is preview-then-apply; the tenant explicitly applies or discards. (Auto-apply of high-confidence patches is explicitly rejected — see §17.)
  2. Autonomous-mode agents only. Refining a legacy tree_node / /ai-agent modal agent is out of scope — the refiner operates on the capability_pack of agents on the new engine.
  3. No server-side session/history persistence (this phase). The BE is stateless per the RFC; the FE owns any refine session state. A persisted refine-conversation store is deferred.
  4. Not a runtime test harness. Refine edits configuration; it does not run conversations against the agent. Validating behavior is done via Preview (Phase 1) and the AI Agent Testing initiative.
  5. No knowledge-base content editing via refine. Refine may re-reference an existing file_search / vector store that already belongs to the agent, but it does not upload, edit, or vectorise KB files — that lives in the Resources / AI Agent Knowledge surface.
  6. No creating new actions/tools via refine. Refine references only actions already registered in the tenant's functions registry — any action name or kb_id the model invents is stripped (same reference-filtering guarantee as the drafter) and surfaced as a warning.
  7. One agent at a time. No bulk / multi-agent refine, and no cross-agent suggestions.
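The reference-filtering guarantee in Non-Goal 6 can be sketched as a pure function over the skill_pack. This is a minimal illustration, not the drafter's actual pipeline: it assumes a simplified, hypothetical shape in which each skill carries `actions` and `kb_ids` lists, and the helper name `filter_references` is invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class FilterResult:
    skill_pack: dict
    warnings: list[str] = field(default_factory=list)


def filter_references(skill_pack: dict, registered_actions: set[str], agent_kb_ids: set[str]) -> FilterResult:
    """Strip action names and kb_ids that are not registered for this tenant/agent.

    Hypothetical shape: {"skills": [{"name": ..., "actions": [...], "kb_ids": [...]}]}.
    """
    warnings: list[str] = []
    cleaned_skills = []
    for skill in skill_pack.get("skills", []):
        kept_actions = []
        for action in skill.get("actions", []):
            if action in registered_actions:
                kept_actions.append(action)
            else:
                warnings.append(f"{skill['name']}: unknown action '{action}' removed")
        kept_kb_ids = []
        for kb_id in skill.get("kb_ids", []):
            if kb_id in agent_kb_ids:
                kept_kb_ids.append(kb_id)
            else:
                warnings.append(f"{skill['name']}: unknown kb_id '{kb_id}' removed")
        # Build a new skill dict so the caller's input is never mutated.
        cleaned_skills.append({**skill, "actions": kept_actions, "kb_ids": kept_kb_ids})
    return FilterResult({**skill_pack, "skills": cleaned_skills}, warnings)
```

Returning a new dict rather than editing in place keeps the original pack available, for example to render `current` values in a diff.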

7. Scope Changes

Engineering surfaces this PRD touches (controlled vocab). Kept in sync with the scope_changes frontmatter above.

  • Backend (chatbot):
    • New endpoint POST /v2/ai_agents/:id/refine (FrontendService::V2, proxy/BFF, session-auth, roles owner/supervisor/admin, flag-gated) — builds the request, proxies upstream, maps the response back. Does not persist.
    • New client method refine_capability_pack (upstream refine-skill-pack) in lib/ai_service/ai_agent.rb — mirrors the existing draft_skill_pack (POST /qontak-ai-noncore-mrag/api/ai-agent/refine-skill-pack).
    • New use-case + repository UseCases::RefineAiAgent + Repositories::Refine under app/api/frontend_service/v2/ai_agent/ — Clean Architecture, same shape as Generate.
    • Refactor (shared mapper): extract build_skill_pack / build_skill / build_routing_rules / build_skill_actions / build_completion etc. out of Repositories::SyncToAiService into a shared Mappers::SkillPackBuilder (pure shaping), parameterised by a vector-store resolver strategy. SyncToAiService passes its existing stateful resolver (creates/reuses vector DBs); Refine passes a read-only resolver that reads the already-persisted capability['vector_store'], so refine serialises capability_pack → skill_pack without creating vector stores. This is the one Phase-1 file touched; its sync behavior must not change.
    • Reuse (apply path): consume the upstream's already-applied updated_skill_pack via the existing Mappers::SkillPackMapper (reverse direction), then persist through the existing Repositories::Update + SyncToAiService (mode: :update) — i.e. apply = a normal update; no new write path, same authz + ai_agent_histories audit.
    • New feature flag ai_agent_refine | default: OFF.
    • qontak-ai-noncore-mrag / mekari-agent: new upstream endpoint refine-skill-pack (returns reply + JSON Patch patches + already-applied, re-validated updated_skill_pack) — owned by Data / ML, the key external dependency (see §16).
  • Frontend (chatbot-fe):
    • "Refine" tab in the agent editor's right rail (beside Preview), per Wulan's qontak-designer prototype — a multi-turn chat: the AI proposes option cards (each a per-field ProposedChange diff, Recommended flagged); Accept stages the option into the form in AiAgentEditor.vue (highlights changed fields, switches to the relevant tab), and the editor's existing Save persists (→ PATCH /v2/ai_agents/:id). FE owns the thread state. Design-vs-prod placement reconciliation: §18 OQ-7.
    • Diff preview of each option's proposed changes with Accept / Discard; Accept stages the previewed capability_pack into the form, and the editor's existing Save calls the update endpoint.
  • Design — Figma for the Refine panel + diff-preview interaction (currently TBD; Stitch prompts stand in until then).
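The shared-mapper refactor is essentially a strategy pattern: one side-effect-free builder and two interchangeable vector-store resolvers. The sketch below is in Python for illustration only (the real code is Ruby in chatbot BE); the class names echo the scope list above, while the method signatures, the capability fields (`vector_store`, `knowledge_files`, `instructions`) and the `create_vector_store` callback are hypothetical.

```python
from typing import Callable, Optional, Protocol


class VectorStoreResolver(Protocol):
    def resolve(self, capability: dict) -> Optional[str]: ...


class ReadOnlyResolver:
    """Refine path: reuse the already-persisted vector store id; never create one."""

    def resolve(self, capability: dict) -> Optional[str]:
        return capability.get("vector_store")


class StatefulResolver:
    """Sync path: reuse an existing vector store, or create one via a callback."""

    def __init__(self, create_vector_store: Callable[[dict], str]):
        self.create_vector_store = create_vector_store

    def resolve(self, capability: dict) -> Optional[str]:
        existing = capability.get("vector_store")
        if existing:
            return existing
        if capability.get("knowledge_files"):
            return self.create_vector_store(capability)
        return None


class SkillPackBuilder:
    """Pure shaping of capability_pack -> skill_pack; side effects live only in the resolver."""

    def __init__(self, resolver: VectorStoreResolver):
        self.resolver = resolver

    def build_skill_pack(self, capability_pack: dict) -> dict:
        return {
            "skills": [self.build_skill(c) for c in capability_pack.get("capabilities", [])],
            "routing_rules": list(capability_pack.get("routing", [])),
        }

    def build_skill(self, capability: dict) -> dict:
        return {
            "name": capability["name"],
            "instructions": capability.get("instructions", ""),
            "vector_store_id": self.resolver.resolve(capability),
        }
```

Keeping creation inside the resolver is what lets Refine reuse the exact same shaping code while guaranteeing it never provisions a vector DB.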

8. Constraints

Platform: Web only (Qontak admin, chatbot-fe). No mobile.
Performance: Refine round-trip target ≤ 10s p95 (dominated by the upstream LLM call); BE proxy overhead < 500ms. The hard client/read timeout aligns with the drafter (60s open/read), but the perceived target is ≤ 10s; beyond that the FE shows a "still working / try again" state.
Data limits: BE is stateless (no refine record persisted). chat_history sent upstream is capped at the last N turns (N TBD with ML; see §18) to bound token cost. One agent per refine request.
Plan scope: Same as autonomous mode: Plus / Ultimate / Qontak 360 workspaces with autonomous_ai_agent rollout = ON. Not Starter/Free.
Feature flag: ai_agent_refine | default: OFF. Enabled per workspace; gates both the BE endpoint and the FE panel.
Read/write: Refine (propose) and Apply (write) are both restricted to roles owner / supervisor / admin, identical to today's draft/update authz. Refine itself writes nothing; Apply goes through the standard update + sync path.
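A minimal sketch of how the FE could assemble the request body while honoring the history cap and the `user_message` minimum length from [REFINE-S01]. Because N is still TBD (§18), `max_turns` is a plain parameter; the `{role, content}` turn shape and the function name are assumptions, and treating whitespace-only input as empty is a guess pending the QA boundary case in the test matrix.

```python
def build_refine_request(user_message: str, thread: list[dict], max_turns: int) -> dict:
    """Build the POST /v2/ai_agents/:id/refine body from FE-owned thread state.

    Only the last `max_turns` turns are sent upstream; the full thread stays client-side.
    """
    # Assumption: whitespace-only input counts as empty (min 1 char per [REFINE-S01]).
    if not user_message.strip():
        raise ValueError("user_message must contain at least one non-blank character")
    history = thread[-max_turns:] if max_turns > 0 else []
    return {"user_message": user_message, "chat_history": history}
```

Slicing returns a new list, so the full thread the UI renders is left untouched.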

9. New Features

Feature: "Refine" tab in the agent editor's right rail

Design source of truth: the qontak-designer prototype app/pages/bot-automation/ai-agents/[id].vue (Wulan). The agent editor has a right rail that toggles between Preview and Refine tabs; the left side is the tabbed form (Profile · Capabilities · Routing · Advanced). The prod editor today is AiAgentEditor.vue at /bot-automation/ai-agent/:id (note: prototype route is plural /ai-agents/:id and renders the editor in a modal — a design-vs-prod structural delta to reconcile at build, see §18 OQ-7; the Preview rail itself is a Phase-1 "pending" item, §16).

URL: /bot-automation/ai-agent/:id (existing agent editor; Refine is a tab in its right rail, beside Preview)
Access: owner / supervisor / admin on autonomous-eligible workspaces with ai_agent_refine = ON

Component Tree (per the prototype):

Component | Parent | Purpose
AiAgentEditor | (none) | Existing editor: left = form tabs (Profile · Capabilities · Routing · Advanced); right = rail
RightRail | AiAgentEditor | Hosts the Preview and Refine tabs (rightRailTab)
RefinePanel | RightRail | The Refine tab: multi-turn chat thread + input
RefineEmptyState | RefinePanel | "Refine your agent" + suggestion chips (e.g. "The refund answer is not correct, fix it", "Add order tracking capability", "Make it faster to escalate to a human agent", "Make the tone more formal")
RefineMessageThread | RefinePanel | User/AI turns; AI replies stream in (reply)
RefineOptionCard | RefineMessageThread | One proposed option: label, description, Recommended badge, and a per-field diff (ProposedChange: type · field · current → new); Accept applies it, the others are dismissed
RefineInput | RefinePanel | Free-text box (Enter to send) → POST /v2/ai_agents/:id/refine

Apply behaviour (from the prototype's acceptRefineOption): accepting an option calls applyPendingData() → writes the change into the form, highlights the changed fields, switches to the relevant tab (Profile/Capabilities/Routing), and posts an AI confirmation turn. The other options for that message are marked dismissed. Nothing is persisted until the editor's existing Save (→ PATCH /v2/ai_agents/:id → Update + Sync).
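The Accept flow can be modeled as a pure function over form state. This Python sketch is a hypothetical stand-in for the prototype's Vue `acceptRefineOption` / `applyPendingData`: the section keys, the `pending_data` shape, the status strings and the rule "switch to the first changed tab in Profile → Capabilities → Routing order" are all assumptions.

```python
from copy import deepcopy

TAB_ORDER = ("profile", "capabilities", "routing")  # hypothetical form section keys


def accept_option(form: dict, options: list[dict], option_id: str) -> dict:
    """Stage one option into a copy of the form; nothing is persisted here."""
    chosen = next((o for o in options if o["id"] == option_id), None)
    if chosen is None:
        raise KeyError(option_id)
    new_form = deepcopy(form)
    highlighted: list[str] = []
    for section, changes in chosen["pending_data"].items():
        for field_name, value in changes.items():
            if new_form.get(section, {}).get(field_name) != value:
                new_form.setdefault(section, {})[field_name] = value
                highlighted.append(f"{section}.{field_name}")
    changed_sections = {h.split(".", 1)[0] for h in highlighted}
    # Assumed rule: switch to the first changed tab in editor order.
    active_tab = next((t for t in TAB_ORDER if t in changed_sections), None)
    statuses = {o["id"]: "accepted" if o["id"] == option_id else "dismissed" for o in options}
    return {"form": new_form, "highlighted": highlighted, "active_tab": active_tab, "statuses": statuses}
```

Working on a deep copy mirrors the "nothing persisted until Save" rule: discarding is simply not keeping the returned form.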

UI States:

Empty: "Refine your agent" + suggestion chips.
Loading: AI reply streaming / generating; input disabled (refineIsGenerating).
Error: On upstream timeout/5xx or BE failure, an error turn with retry is shown; the agent is unchanged and no option cards appear.
Success: AI reply streamed + one or more RefineOptionCards (Recommended flagged) with Accept; on Accept, the form fields highlight and the tab switches.

📊 UI State Diagram — Refine panel

stateDiagram-v2
[*] --> Empty: Open Refine tab
Empty --> Loading: Submit message / chip
Loading --> SuccessOptions: reply + option card(s) returned
Loading --> NoChange: reply, no actionable options
Loading --> Error: upstream timeout / 5xx
Error --> Loading: Retry
NoChange --> Loading: Send another message
SuccessOptions --> FormApplied: Accept an option (form highlighted, tab switched)
SuccessOptions --> Loading: Send follow-up (multi-turn)
FormApplied --> Saved: Editor Save → PATCH /v2/ai_agents/:id
FormApplied --> Loading: Keep refining
Saved --> [*]
Error --> [*]: Close (agent unchanged)
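The diagram translates directly into a transition table, which is handy for unit-testing the panel logic. Event names (`submit`, `upstream_error`, and so on) are hypothetical labels for the diagram's edges; the terminal `[*]` exits (after Save, or closing from Error) are left out, and `input_enabled` mirrors the Loading state's `refineIsGenerating` rule.

```python
# Refine panel transitions, transcribed from the state diagram above.
# Event names are hypothetical labels for the diagram's edges.
TRANSITIONS: dict[tuple[str, str], str] = {
    ("Empty", "submit"): "Loading",
    ("Loading", "options_returned"): "SuccessOptions",
    ("Loading", "no_actionable_options"): "NoChange",
    ("Loading", "upstream_error"): "Error",
    ("Error", "retry"): "Loading",
    ("NoChange", "submit"): "Loading",
    ("SuccessOptions", "accept"): "FormApplied",
    ("SuccessOptions", "submit"): "Loading",
    ("FormApplied", "save"): "Saved",
    ("FormApplied", "submit"): "Loading",
}


def next_state(state: str, event: str) -> str:
    """Return the next panel state, rejecting edges the diagram does not define."""
    if (state, event) not in TRANSITIONS:
        raise ValueError(f"event {event!r} is not allowed in state {state!r}")
    return TRANSITIONS[(state, event)]


def input_enabled(state: str) -> bool:
    """The input box is disabled while a reply is generating (refineIsGenerating)."""
    return state != "Loading"
```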

Figma: Frames TBD — the prototype above is canonical until then (see Header + §7).


10. API & Webhook Behavior

  1. Refine capability_pack
    • Entity affected: AI Agent capability_pack (read-only; not persisted)
    • Triggered by: Tenant submits a message in the Refine panel → POST /v2/ai_agents/:id/refine
    • Expected behavior: BE loads the agent, serialises its current capability_pack → skill_pack via the shared SkillPackBuilder (read-only vector resolver), gathers available_tools, and proxies upstream with user_message + chat_history (+ optional trace). Upstream returns a conversational reply, JSON Patch patches (RFC 6902), and an already-applied, re-validated updated_skill_pack. BE maps updated_skill_pack → capability_pack via SkillPackMapper and returns reply + patches + the previewed capability_pack + warnings. Nothing is written.
    • Failure behavior: Upstream timeout/5xx → BE returns a graceful error; agent unchanged. Upstream LLM/validation issue → upstream returns its deterministic fallback (never 5xx for LLM transport); BE passes through reply + empty/partial patches. Referenced action/kb_id not in inputs → stripped by reference filtering and returned as a warning.
  2. Accept an option (stage into form)
    • Entity affected: Editor form state (client-side; not yet persisted)
    • Triggered by: Tenant clicks Accept on a RefineOptionCard
    • Expected behavior: FE applies the option's pendingData into the form, highlights the changed fields, and switches to the relevant tab; other options for that message are dismissed. No BE call yet.
    • Failure behavior: N/A (client-side; reversible by not saving or by re-refining).
  3. Save (persist the refined config)
    • Entity affected: AI Agent capability_pack (persisted + re-synced)
    • Triggered by: Tenant clicks the editor's existing Save after accepting one or more options
    • Expected behavior: Standard update path: PATCH /v2/ai_agents/:id → Repositories::Update writes the new parameters, and SyncToAiService (mode: :update) re-pushes skill_pack upstream and re-resolves vector stores; the change is live immediately and the prior config is snapshotted in ai_agent_histories.
    • Failure behavior: Sync to upstream fails → DB transaction rolls back; agent stays on prior config; error shown. Capability/routing ref validation fails → 400; no write.
  4. Discard / don't apply
    • Entity affected: None
    • Triggered by: Tenant ignores the options or closes without saving
    • Expected behavior: No write; proposed options are dropped; the chat thread may continue.
    • Failure behavior: N/A (purely client-side).
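The Save behavior in row 3 rests on an all-or-nothing write. The in-memory Python model below illustrates that contract only; `AgentStore`, `save_agent`, `SyncError` and the `sync` callback are hypothetical stand-ins for Repositories::Update, SyncToAiService (mode: :update), the DB transaction and ai_agent_histories.

```python
from copy import deepcopy
from typing import Callable


class SyncError(Exception):
    """Hypothetical error raised when the upstream re-push fails."""


class AgentStore:
    """In-memory stand-in for the agent row plus its ai_agent_histories snapshots."""

    def __init__(self, config: dict):
        self.config = config
        self.histories: list[dict] = []


def save_agent(store: AgentStore, new_config: dict, sync: Callable[[dict], None]) -> None:
    """Snapshot the prior config, write the new one, and re-sync; roll back on any failure."""
    prior_config = deepcopy(store.config)
    prior_len = len(store.histories)
    try:
        store.histories.append(deepcopy(prior_config))  # audit trail used for later reverts
        store.config = deepcopy(new_config)
        sync(store.config)  # stands in for SyncToAiService re-pushing skill_pack upstream
    except Exception:
        # Emulate the transaction rollback: the agent stays on its prior config.
        store.config = prior_config
        del store.histories[prior_len:]
        raise
```

Because the snapshot is written in the same unit of work, a per-agent revert can be expressed as another `save_agent` call with the latest snapshot as `new_config`.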

[Claude to resolve during RFC: exact request/response JSON schema for /refine (user_message, chat_history[], trace{}, available_tools[] in; reply, patches[], updated_capability_pack, warnings[] out), HTTP error codes, and the SkillPackBuilder vector-resolver interface.]
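Until the RFC fixes the exact schema, the field names listed in that note can be pinned down as typed shapes. Everything here is a provisional sketch: the dataclass names, which fields are optional, and the choice to keep `available_tools` BE-side (row 1 says the BE gathers it) are assumptions rather than the agreed contract.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class RefineRequest:
    """FE -> BE body for POST /v2/ai_agents/:id/refine (field names from §10; shape assumed)."""
    user_message: str
    chat_history: list[dict] = field(default_factory=list)
    trace: Optional[dict] = None


@dataclass
class RefineResponse:
    """BE -> FE body; empty patches mean 'no actionable change' (AC-3 or upstream fallback)."""
    reply: str
    patches: list[dict] = field(default_factory=list)  # RFC 6902 operations
    updated_capability_pack: Optional[dict] = None  # previewed, not persisted
    warnings: list[str] = field(default_factory=list)

    @property
    def has_options(self) -> bool:
        return bool(self.patches)


def parse_refine_response(body: dict[str, Any]) -> RefineResponse:
    """Validate the minimum the panel needs: a string reply; everything else defaults to empty."""
    if not isinstance(body.get("reply"), str):
        raise ValueError("refine response is missing a string 'reply'")
    return RefineResponse(
        reply=body["reply"],
        patches=list(body.get("patches") or []),
        updated_capability_pack=body.get("updated_capability_pack"),
        warnings=list(body.get("warnings") or []),
    )
```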


11. System Flow + User Stories + ACs

11.1. System Flow

Flow: Refine an autonomous agent and apply a change · Type: User Journey + API Sequence

  1. Tenant opens an autonomous agent in the editor (AiAgentEditor) and opens the Refine tab in the right rail.
  2. Tenant types a message — e.g. pastes an error trace: "createorder keeps failing with 'belum terdaftar' but the bot just gives up."
  3. FE sends POST /v2/ai_agents/:id/refine with user_message + chat_history.
  4. BE loads the agent, serialises its current capability_pack → skill_pack via the shared SkillPackBuilder (read-only vector resolver; no vector DB created), gathers available_tools, and proxies upstream refine-skill-pack.
  5. Upstream LLM returns reply + patches (RFC 6902) + updated_skill_pack (already applied + re-validated through the drafter's defensive pipeline).
  6. BE maps updated_skill_pack → capability_pack via SkillPackMapper; returns reply + patches + previewed capability_pack + warnings. Nothing persisted.
  7. FE renders the streamed reply plus one or more option cards — each with a Recommended flag and a per-field diff (ProposedChange: current → new); warnings shown if any.
  8. Tenant clicks Accept on an option → FE applies its pendingData into the form, highlights the changed fields, switches to the relevant tab; other options dismissed. (No BE write yet.)
  9. Tenant clicks the editor's existing Save → PATCH /v2/ai_agents/:id → Update + SyncToAiService re-push; change live immediately; prior config snapshotted in ai_agent_histories.
  10. Failure branch (refine): upstream times out / 5xx → FE shows "couldn't generate a suggestion — agent unchanged", retry available.
  11. Failure branch (save): sync to upstream fails → transaction rolls back; agent stays on prior config; error shown.
  12. Tenant can keep refining — follow-up turns carry chat_history (multi-turn thread per the design).
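Step 7's per-field diff can be derived from the RFC 6902 patches by reading each target path from the current capability_pack. This Python sketch resolves paths with RFC 6901 JSON Pointer rules; the diff-row keys mirror ProposedChange (type · field · current → new) but are assumptions, only add / remove / replace are rendered (`test` ops are skipped, anything else is rejected), and each op's `current` value is read from the pre-refine pack, which is exact only when ops touch distinct paths.

```python
from typing import Any

_MISSING = object()  # sentinel: path does not exist in the current pack


def resolve_pointer(doc: Any, pointer: str) -> Any:
    """Resolve an RFC 6901 JSON Pointer; return _MISSING if any segment is absent."""
    if pointer == "":
        return doc
    node = doc
    for raw in pointer[1:].split("/"):
        token = raw.replace("~1", "/").replace("~0", "~")  # RFC 6901 unescaping order
        if isinstance(node, list):
            if not token.isdigit() or int(token) >= len(node):
                return _MISSING  # includes "-", the append position
            node = node[int(token)]
        elif isinstance(node, dict) and token in node:
            node = node[token]
        else:
            return _MISSING
    return node


def patches_to_changes(current_pack: dict, patches: list[dict]) -> list[dict]:
    """Turn RFC 6902 ops into per-field diff rows (type, field, current, new)."""
    changes = []
    for op in patches:
        kind = op["op"]
        if kind == "test":
            continue  # a precondition check, not a change to show
        if kind not in ("add", "remove", "replace"):
            raise ValueError(f"op {kind!r} is not rendered in this sketch")
        before = resolve_pointer(current_pack, op["path"])
        changes.append({
            "type": kind,
            "field": op["path"],
            "current": None if before is _MISSING else before,
            "new": None if kind == "remove" else op.get("value"),
        })
    return changes
```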

📊 System Flow — Refine with AI

sequenceDiagram
actor Tenant
participant FE as chatbot-fe (Refine panel)
participant BE as chatbot BE (/v2/ai_agents)
participant ML as noncore-mrag / mekari-agent
Tenant->>FE: Describe issue / paste error
FE->>BE: POST /v2/ai_agents/:id/refine (user_message, chat_history)
BE->>BE: SkillPackBuilder → skill_pack (read-only vector resolver)
BE->>ML: refine-skill-pack (skill_pack, message, history, available_tools)
ML-->>BE: reply + patches + updated_skill_pack (re-validated)
BE->>BE: SkillPackMapper → capability_pack (not persisted)
BE-->>FE: reply + patches + previewed capability_pack + warnings
FE-->>Tenant: Reply + option cards (Recommended; per-field diff)
alt Accept an option, then Save
Tenant->>FE: Accept option
FE->>FE: applyPendingData → form highlighted + tab switched (no write)
Tenant->>FE: Save
FE->>BE: PATCH /v2/ai_agents/:id (updated capability_pack)
BE->>BE: Update + SyncToAiService(mode: update)
BE->>ML: PUT /ai-agent (re-push skill_pack)
BE-->>FE: Updated agent (live) + history snapshot
else Refine upstream fails
ML-->>BE: timeout / 5xx
BE-->>FE: Graceful error — agent unchanged
FE-->>Tenant: "Couldn't generate a suggestion — try again"
end

11.2. User Stories

User Story · Importance · Mockup / Technical Notes · Acceptance Criteria
[REFINE-S01] — Refine an agent in natural language

As a Chatbot Specialist, I want to describe a misbehavior or paste an error and get a suggested config change, so that I can fix a live agent without hand-diagnosing the capability_pack.
Importance: Must Have · Mockup / Technical Notes: Figma pending (see §9 / Stitch).

Data Fields:
id (string/uuid, required) — agent id, URL param
user_message (string, required, min 1 char) — User input
chat_history (array, optional) — FE thread state (multi-turn per the design)
trace (object, optional) — recent workflow_state / turns (see §18)
patches (array, response) — RFC 6902 ops from upstream
warnings (array, response) — stripped refs

Before-After Behavior: Before: the only way to change an agent is to manually re-edit the Profile/Capabilities/Routing form tabs and full-merge PATCH /v2/ai_agents/:id; after, the tenant describes the issue in the editor and the system returns a reply + reviewable diff with nothing written until applied.
— Happy Path —
• AC-1: Given an autonomous agent and ai_agent_refine = ON, when the tenant submits a user_message, then the system returns a conversational reply plus patches and a previewed capability_pack, and persists nothing.
• AC-2: Given a change is returned, when the response renders, then it shows one or more option cards (the best flagged Recommended), each with a per-field diff (ProposedChange: current → new) and an Accept action.
• AC-3: Given the message asks for nothing actionable (e.g. "thanks"), when the system responds, then reply is returned with no option cards.
• AC-4: Given the upstream proposes an action name or kb_id not in the agent's inputs, when the response is built, then that reference is stripped and surfaced in warnings.

— Error / Unhappy Path —
• ERR-1: Given the upstream refine-skill-pack times out or returns 5xx, when the tenant submits, then the agent is unchanged, an error reply is shown with retry, and refine_failed is logged with the reason.
• ERR-2: Given an upstream LLM/validation issue (not transport), when it occurs, then the upstream deterministic fallback reply is passed through with empty/partial patches (never a 5xx to the tenant).

— Permission Model —
• CAN: owner / supervisor / admin on autonomous-eligible workspaces with ai_agent_refine = ON.
• CANNOT: other roles; workspaces with the flag OFF.
• Unauthorized: Refine panel not rendered; endpoint returns 403.

— UI States —
• Loading: thinking indicator on assistant turn; input disabled.
• Empty: prompt to describe the issue or paste an error.
• Error: "couldn't generate a suggestion — agent unchanged" + retry.
• Success: reply + option card(s), Recommended flagged (+ warnings).
[REFINE-S02] — Accept an option and save the change

As a Chatbot Specialist, I want to accept a proposed option, see it land in the form, then save, so that no change goes live without my explicit review.
Importance: Must Have · Mockup / Technical Notes: Figma per prototype.

Data Fields:
pendingData (object) — the option's structured change applied to the form
changes (array) — ProposedChange[] rendered as the per-field diff
capability_pack (object, required on Save) — the resulting pack sent to PATCH
id (string/uuid, required) — agent id

Before-After Behavior: Before: any save rewrites the whole config with no diff; after, the tenant accepts a specific option (form fields highlight, tab switches), reviews in the form, and saves through the standard update path with the prior config snapshotted for revert.
— Happy Path —
• AC-1: Given option cards are shown, when the tenant clicks Accept on one, then its pendingData is applied into the form, the changed fields are highlighted, the editor switches to the relevant tab, the other options are dismissed, and refine_accepted is logged. No BE write yet.
• AC-2: Given an option was accepted, when the tenant clicks the editor's Save, then PATCH /v2/ai_agents/:id runs Update + SyncToAiService (mode: update), the change is live, the prior config is snapshotted in ai_agent_histories, and refine_applied is logged.
• AC-3: Given option cards are shown, when the tenant accepts none (closes / keeps chatting), then nothing is written, and refine_discarded is logged.

— Error / Unhappy Path —
• ERR-1: Given Save is clicked, when SyncToAiService fails to push upstream, then the DB transaction rolls back, the agent stays on the prior config, and an error is shown.
• ERR-2: Given Save is clicked, when capability/routing reference validation fails, then the update returns 400 and no write occurs.

— Permission Model —
• CAN: owner / supervisor / admin (same as the existing update endpoint).
• CANNOT: other roles.
• Unauthorized: Accept/Save not rendered; endpoint returns 403.

— UI States —
• Loading: Save shows a spinner; actions disabled.
• Empty: N/A (only shown when an option exists).
• Error: inline "couldn't save — agent unchanged" + retry.
• Success: accepted option marked applied; form reflects the change; Save confirms.

Dependencies: [REFINE-S01]
[REFINE-S03] — Iterative (multi-turn) refinement

As a Chatbot Specialist, I want to send follow-up requests that build on the prior turn, so that I can iterate ("now also handle the timeout case") without restating context.
Importance: Should Have · Mockup / Technical Notes: Figma per prototype (the Refine tab is a multi-turn thread).

Data Fields:
chat_history (array) — FE-owned thread, capped at last N turns sent upstream

Before-After Behavior: Before: no refine concept exists; after, the Refine tab keeps a multi-turn thread and the FE sends chat_history so follow-ups are context-aware, while the BE stays stateless.
— Happy Path —
• AC-1: Given a prior refine turn, when the tenant sends a follow-up, then the FE includes chat_history and the response reflects the earlier context.
• AC-2: Given a long thread, when history exceeds the cap, then only the last N turns are sent upstream (N per §18) and the rest stay client-side.
• AC-3: Given the tenant reloads or reopens the editor, when no session is restored, then a fresh thread starts (no server-side history this phase — see §6 Non-Goal 3).

— Error / Unhappy Path —
• ERR-1: Given a mid-thread refine call fails, when it errors, then prior turns remain visible and the failed turn can be retried.

— Permission Model —
• CAN: same as [REFINE-S01].
• CANNOT: same as [REFINE-S01].
• Unauthorized: Refine tab not rendered.

— UI States —
• Loading: per-turn streaming indicator.
• Empty: "Refine your agent" + suggestion chips.
• Error: failed turn marked retryable.
• Success: appended response (+ option cards if any).

Dependencies: [REFINE-S01]
[REFINE-S01-NEG] — No refine on legacy agents; never auto-apply (Guard Rail — from Non-Goals 1 & 2)

As a tenant on a legacy tree_node agent, when I look for Refine, then it is not available; and no refine result is ever written without explicit Apply.
Importance: Guard Rail
• NEG-1: Given a legacy tree_node / /ai-agent modal agent, when the tenant opens its config, then the Refine panel is not rendered and /v2/ai_agents/:id/refine returns a 4xx for that agent.
• NEG-2: Given any successful refine response, when the tenant takes no action, then no config change is persisted (no auto-apply).
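The permission models of the stories above and NEG-1 reduce to three checks at the top of the refine endpoint. A hypothetical Python sketch: the role set and the 403 come from this PRD, while the `agent_mode` values and the specific 422 for legacy agents are assumptions (the PRD only says "a 4xx").

```python
from dataclasses import dataclass

ALLOWED_ROLES = frozenset({"owner", "supervisor", "admin"})
LEGACY_AGENT_STATUS = 422  # assumption: the PRD only specifies "a 4xx" for legacy agents


@dataclass(frozen=True)
class GuardResult:
    allowed: bool
    status: int
    reason: str


def check_refine_access(role: str, refine_flag_on: bool, agent_mode: str) -> GuardResult:
    """Gate POST /v2/ai_agents/:id/refine by role, workspace flag, and engine mode."""
    if role not in ALLOWED_ROLES or not refine_flag_on:
        return GuardResult(False, 403, "forbidden")
    if agent_mode != "autonomous":  # hypothetical mode value for new-engine agents
        return GuardResult(False, LEGACY_AGENT_STATUS, "refine is only available for autonomous agents")
    return GuardResult(True, 200, "ok")
```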

🧪 Test Coverage Matrix — [REFINE-S01]

Dimension | Coverage | Notes
Boundary values | ⚠️ partial | AC-3 covers no-actionable-change (empty patches); ⚠️ QA: empty/whitespace user_message (min 1 char), very long message, very large capability_pack
State transitions | ✅ defined | AC-1 (returns, nothing persisted) → S02 apply/discard transition
Data validation | ✅ defined | AC-4 reference filtering (unknown action/kb_id stripped → warning)
Concurrency | ⚠️ TBD | ⚠️ QA: two specialists refine the same agent simultaneously; refine in flight while another user applies a manual edit
Network/timeout | ✅ defined | ERR-1 upstream timeout/5xx → agent unchanged + retry; ERR-2 LLM fallback never 5xx

🧪 Test Coverage Matrix — [REFINE-S02]

Dimension | Coverage | Notes
Boundary values | ⚠️ TBD | ⚠️ QA: apply with an empty patch set; apply a stale preview after the agent changed underneath
State transitions | ✅ defined | AC-1 accept→form staged; AC-2 save→live; AC-3 discard→no-op; ERR-1 save-fail→rollback
Data validation | ✅ defined | ERR-2 capability/routing ref validation → 400, no write
Concurrency | ⚠️ TBD | ⚠️ QA: apply while a parallel manual save commits (last-writer / optimistic-lock behavior)
Network/timeout | ✅ defined | ERR-1 SyncToAiService upstream failure → transaction rollback, prior config intact

12. Rollout

Feature flag: ai_agent_refine (see §8; OFF by default)
Rollout:
  • Stage 1 (Internal Alpha) → Chatbot Specialists maintaining the 26Q2 cohort (the 15 production agents)
  • Stage 2 (Closed Beta) → 3–5 customer Bot Builders on Plus/Ultimate/360
  • Stage 3 (Open Beta) → all autonomous-eligible workspaces, opt-in
  • GA → all autonomous-eligible workspaces, flag default ON
Backward compat: Yes, purely additive. The existing draft + full-config edit path (PATCH /v2/ai_agents/:id) is unchanged; Apply reuses it. The only Phase-1 code touched is the build_skill_pack extraction (§7), which must preserve identical sync output.
Migration: None (no Rails DDL; no data migration).

12.1. Semantic Regression Rollback

Refine produces AI output (proposed config patches), so this section applies.

Model flag: ai_agent_refine | default: OFF. Disabling it removes the refine endpoint + panel; manual config editing remains fully available.
Regression metric: (a) refine patch apply-success rate (applied / proposed) and (b) post-apply agent regression, i.e. agents whose config was changed via refine and then reverted or re-edited within 48h.
Rollback threshold: Apply-success rate < 30% sustained over a week, or post-apply revert rate > 20%, or refine_failed rate > 10% → pause rollout / flip the flag OFF for affected workspaces.
Rollback path: Two levels. (1) Feature level: toggle ai_agent_refine OFF (no deploy). (2) Per agent: an applied-but-worse config is reverted by restoring the prior snapshot from ai_agent_histories (the standard update-audit trail), which re-syncs the old skill_pack upstream.
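The three rollback triggers can be evaluated from one week of event counts. A minimal sketch: the counter names map to the §13 events, but how "proposed refinements" is counted (one per returned option set here) is an assumption, and a zero denominator is treated as "no signal" rather than as a breach.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WeeklyRefineCounts:
    requested: int  # refine_requested events
    failed: int     # refine_failed events
    proposed: int   # refinements proposed (assumed: one per returned option set)
    applied: int    # refine_applied events
    reverted: int   # refine_reverted events (within 48h of apply)


def _rate(numerator: int, denominator: int) -> Optional[float]:
    """Return a ratio, or None when there is no traffic to judge."""
    return None if denominator == 0 else numerator / denominator


def rollback_breaches(c: WeeklyRefineCounts) -> list[str]:
    """List which rollback thresholds a week of data breaches."""
    breaches: list[str] = []
    apply_success = _rate(c.applied, c.proposed)
    revert_rate = _rate(c.reverted, c.applied)
    failed_rate = _rate(c.failed, c.requested)
    if apply_success is not None and apply_success < 0.30:
        breaches.append(f"apply-success {apply_success:.0%} < 30%")
    if revert_rate is not None and revert_rate > 0.20:
        breaches.append(f"post-apply revert {revert_rate:.0%} > 20%")
    if failed_rate is not None and failed_rate > 0.10:
        breaches.append(f"refine_failed {failed_rate:.0%} > 10%")
    return breaches
```

Any non-empty result corresponds to the "pause rollout / flip the flag OFF" decision for the affected workspaces.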

13. Observability

Key Events:

Event Name | Trigger | Properties
refine_requested | Tenant submits a refine message | company_id, ai_agent_id, message_len, history_turns, timestamp
refine_succeeded | Upstream returns a valid response | company_id, ai_agent_id, patch_count, warning_count, latency_ms, timestamp
refine_failed | Upstream timeout/5xx or BE error | company_id, ai_agent_id, reason, latency_ms, timestamp
refine_applied | Tenant saves an accepted option and the update succeeds | company_id, ai_agent_id, patch_count, timestamp
refine_discarded | Tenant leaves the proposed options unaccepted (closes or keeps chatting) | company_id, ai_agent_id, patch_count, timestamp
refine_reverted | Applied config reverted via ai_agent_histories within 48h | company_id, ai_agent_id, timestamp

Dashboard owner: BOT — Hadiningbot Squad (chatbot)

Alerts:

  • refine_failed rate > 10% of refine_requested over 1h → page on-call (chatbot) + notify PM.
  • Refine latency p95 > 10s over 1h → notify chatbot squad (upstream LLM latency check).
  • refine_reverted / refine_applied > 20% over a week → PM review (quality regression).

13.1. Post-Launch Monitoring Cadence

Review cadence: Weekly for the first 4 weeks post-GA, then monthly.
Owner: PM (Dimas) + BOT squad.
Review scope: All §14 metrics: adoption (refine vs manual edits), apply-success rate, error rate, time-to-fix.
Trigger thresholds:
  • Apply-success rate < 30% for a week → investigate prompt/UX.
  • refine_failed rate > 10% in any week → investigate upstream.
  • refine_reverted/refine_applied > 20% → quality review within 48h.
Rollback consideration: If error or revert thresholds breach and are unresolved within 48h, PM flips ai_agent_refine OFF for affected workspaces (see §12.1).

14. Success Metrics

Adoption & Usage:

| Metric | Definition | Baseline | Target |
| --- | --- | --- | --- |
| ⭐ Refine adoption | Share of autonomous-agent config changes made via Refine (applied) vs manual tab edits | N/A (new capability) | ≥ 40% of config changes via Refine within 60 days of GA |
| Refine engagement | Distinct agents that received ≥ 1 refine session | N/A | ≥ 60% of active autonomous agents within 60 days of GA |

Quality & Accuracy:

| Metric | Definition | Baseline | Target |
| --- | --- | --- | --- |
| Apply-success rate | Applied refinements / proposed refinements (a proxy for suggestion usefulness) | N/A | ≥ 50% within 30 days of GA |
| Refine error rate | refine_failed / refine_requested | N/A | < 5% steady-state |
| Post-apply revert rate | Applied configs reverted or re-edited within 48h | N/A | < 15% |

Efficiency & Impact:

| Metric | Definition | Baseline | Target |
| --- | --- | --- | --- |
| Time-to-fix | Median time from "agent misbehaving" to a shipped config fix | Manual baseline TBD (measure in Alpha) | −50% vs manual baseline within 90 days of GA |
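
The adoption, engagement, and time-to-fix targets reduce to simple ratios. This is a minimal sketch that assumes the inputs are already counted per reporting window. The PRD does not define how the "agent misbehaving" start time is captured, so the time-to-fix samples are assumed inputs. All function names are hypothetical.

```python
from statistics import median
from typing import List


def adoption_share(refine_applied: int, manual_edits: int) -> float:
    """Share of config changes made via Refine (target: >= 0.40 within 60 days of GA)."""
    total = refine_applied + manual_edits
    return refine_applied / total if total else 0.0


def engagement(agents_with_refine: int, active_autonomous_agents: int) -> float:
    """Share of active autonomous agents with >= 1 refine session (target: >= 0.60)."""
    return agents_with_refine / active_autonomous_agents if active_autonomous_agents else 0.0


def time_to_fix_change(refine_minutes: List[float], manual_minutes: List[float]) -> float:
    """Relative change in median time-to-fix vs the manual baseline (target: <= -0.50)."""
    baseline = median(manual_minutes)
    return (median(refine_minutes) - baseline) / baseline
```

For example, a Refine median of 20 minutes against a manual median of 60 minutes is a change of about −67%, which meets the −50% target.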

15. Launch Plan & Stage Gates

| Stage | Audience | Duration | Success Gate to Advance | Owner |
| --- | --- | --- | --- | --- |
| Internal Alpha | Chatbot Specialists, 26Q2 cohort (15 agents) | 2 weeks | ≥ 20 real refine sessions; apply-success ≥ 40%; refine_failed < 10%; no rollback-worthy regression | PM + Eng |
| Closed Beta | 3–5 customer Bot Builders | 3 weeks | Apply-success ≥ 50%; error rate < 5%; ≥ 1 customer fixes an agent unaided; post-apply revert < 20% | PM + CSM |
| Open Beta | All autonomous-eligible workspaces, opt-in | 3 weeks | Adoption trending toward 40%; latency p95 ≤ 10s; all Closed-Beta gates sustained | Eng Lead |
| GA | All autonomous-eligible workspaces (flag default ON) | Ongoing | All Open-Beta gates sustained for 2 weeks; PMM approved | PM + PMM |
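
The numeric parts of the stage gates can be checked against one metrics snapshot. This is a minimal sketch under stated assumptions: StageSnapshot and the gate functions are hypothetical. "Adoption trending toward 40%" and "sustained" need a time series, so this point-in-time check leaves them out.

```python
from dataclasses import dataclass


@dataclass
class StageSnapshot:
    """Hypothetical metrics snapshot for the current rollout stage."""
    refine_sessions: int
    apply_success: float          # applied / proposed
    error_rate: float             # refine_failed / refine_requested
    revert_rate: float            # refine_reverted / refine_applied
    latency_p95_ms: float
    unaided_customer_fixes: int   # customers who fixed an agent without help
    rollback_worthy_regression: bool


def alpha_gate(s: StageSnapshot) -> bool:
    return (s.refine_sessions >= 20 and s.apply_success >= 0.40
            and s.error_rate < 0.10 and not s.rollback_worthy_regression)


def closed_beta_gate(s: StageSnapshot) -> bool:
    return (s.apply_success >= 0.50 and s.error_rate < 0.05
            and s.unaided_customer_fixes >= 1 and s.revert_rate < 0.20)


def open_beta_gate(s: StageSnapshot) -> bool:
    # Numeric parts only; adoption trend and "sustained" need a time series.
    return s.latency_p95_ms <= 10_000 and closed_beta_gate(s)
```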

16. Dependencies

| Dependency | Owning Team | Deliverable Needed | Blocking? |
| --- | --- | --- | --- |
| Upstream refine-skill-pack endpoint (mekari-agent, proxied by noncore-mrag) | Data / ML Platform | The endpoint itself: accepts skill_pack + user_message + chat_history (+ trace, available_tools); returns reply + RFC 6902 patches + an already-applied, re-validated updated_skill_pack + warnings. Does not exist yet. | YES |
| Phase 1 capability_pack model + drafter live on /v2/ai_agents | BOT — Hadiningbot (chatbot) | Must remain stable: the refiner serialises/maps the same capability_pack, and the SkillPackBuilder is extracted from Phase 1's SyncToAiService | YES |
| capability_pack ↔ skill_pack adapter (skill_pack_mapper.rb reverse map + the extracted SkillPackBuilder) | BOT — Hadiningbot (chatbot) | Bidirectional mapping reused for refine input/output; the extraction must not change Phase-1 sync output | YES |
| ai_agent_histories audit (exists) | BOT — Hadiningbot (chatbot) | Used as the per-agent revert path for an applied-but-worse config | NO |
| trace source (recent workflow_state / turns) for debugging context | BOT + Data/ML (overlaps AI Agent Live Monitoring) | Optional runtime telemetry to enrich refine; refine works without it (degraded) | NO |
| Agent editor right rail (Preview tab, Phase-1 "pending") | BOT — Hadiningbot (chatbot) | The Refine tab lives in the same right rail as Preview; confirm whether the rail ships with Preview or Refine stands up the rail (see §18 OQ-7) | NO |
| Refine design (right-rail chat) | Design (Wulan) | Already prototyped in qontak-designer (app/pages/bot-automation/ai-agents/[id].vue); Figma frames are a follow-up, and the prototype is canonical meanwhile | NO |
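
Open question 1 calls for agreeing the upstream schema early so the BE proxy and FE can be built against a stub. The sketch below writes the contract as typed dicts. Only the field names come from this PRD. Every type is an assumption pending that agreement, including the ChatTurn shape, the warning format, and stub_refine itself.

```python
from typing import Any, Dict, List, TypedDict

# RFC 6902 operation; "from" is a Python keyword, so use the functional form.
JsonPatchOp = TypedDict(
    "JsonPatchOp",
    {"op": str, "path": str, "value": Any, "from": str},
    total=False,
)


class ChatTurn(TypedDict):
    role: str      # assumed: "user" or "assistant"
    content: str


class _RefineRequestBase(TypedDict):
    skill_pack: Dict[str, Any]
    user_message: str
    chat_history: List[ChatTurn]


class RefineRequest(_RefineRequestBase, total=False):
    trace: Dict[str, Any]        # optional runtime telemetry; contents still open
    available_tools: List[str]   # assumed to be tool names


class RefineResponse(TypedDict):
    reply: str
    patches: List[JsonPatchOp]           # passed to the FE for the diff preview only
    updated_skill_pack: Dict[str, Any]   # already applied and re-validated upstream
    warnings: List[str]                  # assumed to be plain strings


def stub_refine(req: RefineRequest) -> RefineResponse:
    """Hypothetical stub: proposes no change so BE/FE can be wired up before upstream exists."""
    return {
        "reply": f"(stub) received: {req['user_message']}",
        "patches": [],
        "updated_skill_pack": dict(req["skill_pack"]),
        "warnings": [],
    }
```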

17. Key Decisions + Alternatives Rejected

17.1. Decisions Made

| Date | Decision | Rationale |
| --- | --- | --- |
| 2026-06-29 | Upstream returns updated_skill_pack already applied and re-validated; BE does not apply patches itself. BE passes patches through to FE for the diff preview only. | Keeps chatbot BE a thin proxy (same posture as the drafter) and guarantees the refined pack went through the same defensive pipeline as the drafter (gate validation, tone coercion, orphan cleanup, reference filtering). Avoids a second, drift-prone RFC 6902 implementation in Rails. |
| 2026-06-29 | Apply reuses the existing PATCH /v2/ai_agents/:id (Update + SyncToAiService), not a new apply endpoint. | Apply is a normal config update, so it reuses authz, validation, the upstream re-push, and ai_agent_histories audit/revert for free. |
| 2026-06-29 | Extract build_skill_pack from SyncToAiService into a shared SkillPackBuilder with a pluggable vector-store resolver (stateful for sync, read-only for refine). | The refiner must serialise the current capability_pack → skill_pack without creating vector DBs; sync must keep its side-effecting resolution. One shared mapper, two resolvers, no duplicated shaping logic. |
| 2026-06-29 | Stateless BE; FE owns any refine session state (no new table). | Matches RFC §10.3b; avoids Rails DDL and a dual source of truth during the 26Q2 window. |
| 2026-06-29 | Refine is always review-then-apply (no auto-apply). | Trust/safety: a config change to a live customer agent must be a human decision. |
| 2026-06-29 | Interaction model resolved by design (Wulan prototype): a "Refine" tab in the agent editor's right rail (beside Preview), multi-turn chat, AI proposes 1+ options each with a per-field diff (Recommended flagged); Accept stages the option into the form (highlight + tab switch); persistence is the editor's existing Save. | Supersedes the earlier "single-shot vs chat / drawer vs inline" open question; the qontak-designer prototype is the design source of truth (same posture Phase 1 took with its prototype). |
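
The SkillPackBuilder decision is a strategy pattern: one shaping routine with an injected resolver. The production code is Ruby; this Python sketch is illustrative only. The capability_pack and skill_pack shapes are hypothetical (capabilities with knowledge_sources, skills with vector_store_ids), and the sync resolver's "create" is a stand-in string, not a real vector-store call.

```python
from typing import Any, Dict, List, Optional, Protocol


class VectorStoreResolver(Protocol):
    def resolve(self, knowledge_source_id: str) -> Optional[str]: ...


class SyncResolver:
    """Side-effecting: creates a store on first sight (sync path)."""

    def __init__(self) -> None:
        self.created: Dict[str, str] = {}

    def resolve(self, knowledge_source_id: str) -> Optional[str]:
        if knowledge_source_id not in self.created:
            # Stands in for a real vector-store create call.
            self.created[knowledge_source_id] = f"vs_{knowledge_source_id}"
        return self.created[knowledge_source_id]


class ReadOnlyResolver:
    """Side-effect free: only looks up stores that already exist (refine path)."""

    def __init__(self, existing: Dict[str, str]) -> None:
        self.existing = existing

    def resolve(self, knowledge_source_id: str) -> Optional[str]:
        return self.existing.get(knowledge_source_id)


class SkillPackBuilder:
    """One shaping routine shared by sync and refine; only the resolver differs."""

    def __init__(self, resolver: VectorStoreResolver) -> None:
        self.resolver = resolver

    def build(self, capability_pack: Dict[str, Any]) -> Dict[str, Any]:
        skills: List[Dict[str, Any]] = []
        for cap in capability_pack.get("capabilities", []):
            skill: Dict[str, Any] = {"name": cap["name"], "instructions": cap.get("instructions", "")}
            store_ids = []
            for source in cap.get("knowledge_sources", []):
                store_id = self.resolver.resolve(source)
                if store_id is not None:  # unresolved sources are omitted in this sketch
                    store_ids.append(store_id)
            if store_ids:
                skill["vector_store_ids"] = store_ids
            skills.append(skill)
        return {"profile": capability_pack.get("profile", {}), "skills": skills}
```

With this shape the refiner calls `SkillPackBuilder(ReadOnlyResolver(...)).build(pack)`, and sync keeps its side effects by passing a `SyncResolver`.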

17.2. Alternatives Rejected

| Alternative | Why Rejected | Date |
| --- | --- | --- |
| BE applies the RFC 6902 patches itself (Rails JSON-Patch) | Upstream already returns the applied, re-validated pack; reapplying in BE duplicates logic and risks drift from the drafter's validation pipeline | 2026-06-29 |
| A dedicated /refine/apply endpoint | Apply is just an update, so it reuses PATCH /v2/ai_agents/:id; a new endpoint would duplicate authz/sync/audit | 2026-06-29 |
| Persist refine chat history server-side (new table) | Adds DDL and a dual source of truth; an FE-owned session is sufficient for this phase (deferred to a later phase if audit needs it) | 2026-06-29 |
| Auto-apply high-confidence patches | Unacceptable risk to live customer agents; conflicts with Non-Goal 1 | 2026-06-29 |
| Build refinement entirely in the chatbot-ml-dev prototype | Same reasons Phase 1 productionised the engine: no plan/tier gating, no auth surface, no rollout control, no audit | 2026-06-29 |

18. Open Questions

| # | Type | Question | Owner | Deadline |
| --- | --- | --- | --- | --- |
| 1 | Risk | Upstream refine-skill-pack does not exist yet. The whole feature is blocked on the Data/ML endpoint. Mitigation: confirm ownership and the exact contract with the mekari-agent/noncore-mrag owners before the BE build; agree the request/response schema up front so the BE proxy and FE can be built against a stub. | PM (Dimas) + Data/ML | 2026-07-15 |
| 2 | Open Question | What goes in trace? Which runtime telemetry (recent workflow_state, recent turns) meaningfully improves refine quality, and where does it come from? Does it overlap AI Agent Live Monitoring's signals? Refine must work without it (degraded). | PM + Eng (Eko) | before RFC |
| 3 | Assumption | chat_history cap (N turns). We assume the FE sends the last N turns to bound upstream token cost. Confirm N and the truncation strategy with ML. | Eng + Data/ML | before RFC |
| 4 | Open Question | KB scope on apply. Non-Goal 5 keeps KB content out, but if a refinement changes which file_search/vector store a capability points to, does Apply trigger SyncToAiService vector resolution (and is that desired), or must KB-affecting patches be rejected? | PM + Eng | before RFC |
| 5 | Risk | Applied-but-worse config. A refinement can look valid but degrade the agent. Mitigation: preview-then-apply (no auto-apply), ai_agent_histories per-agent revert, and the §12.1 flag and revert-rate alert. | PM + Eng | before GA |
| 6 | Open Question | Diff rendering source of truth. Does the FE render the diff from patches (RFC 6902 paths) or by diffing the previous pack against updated_capability_pack? Patch paths reference the upstream skill_pack shape, not the public capability_pack, so confirm the FE has a readable mapping. | Eng (FE) + Eng (BE) | before RFC |
| 7 | Open Question | Design-vs-prod placement reconciliation. The interaction model is settled (§17: right-rail Refine tab, multi-turn chat, accept-option-applies-to-form). But the prototype renders the editor in a modal at /bot-automation/ai-agents/:id (plural) with a Preview + Refine right rail, while prod AiAgentEditor.vue is a page at /bot-automation/ai-agent/:id (singular) and its right-rail Preview is a Phase-1 "pending" item (§16). Where exactly does the right rail live in prod, and does Refine ship before or with Preview? | PM + Eng (FE) | before RFC |
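
Open question 6 has two options. The sketch below illustrates the second: diffing the previous capability_pack against the updated one, so the paths name public capability_pack fields rather than upstream skill_pack paths. The FE is Vue, so this Python version is illustrative only. field_diff and ABSENT are hypothetical names. The sketch recurses into objects, treats lists as whole values, and emits RFC 6901 JSON Pointer paths.

```python
from typing import Any, List, Tuple

ABSENT = object()  # marks a key that exists on only one side


def _escape(key: str) -> str:
    # RFC 6901 JSON Pointer escaping: "~" first, then "/".
    return key.replace("~", "~0").replace("/", "~1")


def field_diff(before: Any, after: Any, path: str = "") -> List[Tuple[str, Any, Any]]:
    """Return (json_pointer, old, new) for each changed field, recursing into dicts.

    Lists and scalars are compared as whole values.
    """
    if isinstance(before, dict) and isinstance(after, dict):
        changes: List[Tuple[str, Any, Any]] = []
        for key in sorted(set(before) | set(after)):
            changes += field_diff(
                before.get(key, ABSENT), after.get(key, ABSENT), f"{path}/{_escape(str(key))}"
            )
        return changes
    if before == after:
        return []
    return [(path, before, after)]
```

Each tuple maps directly onto one per-field row in the Refine diff, with ABSENT rendered as "added" or "removed".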

PRD CHANGELOG

| Version | Date | By | Section | Type | Summary |
| --- | --- | --- | --- | --- | --- |
| 1.3 | 2026-06-29 | Claude | Title, Header, S2 (Phase Context) | MODIFIED | Renumbered Phase 4 → Phase 2 per PM (refinement is the next concrete step after Phase 1 and is independently shippable). File renamed to phase-2-ai-assisted-refinement.md; title, H1, CB Phase Number (Phase 2 of 4), and prior/cross-phase references updated. In the anchor, Migrate shifted to Phase 3 and Iteration to Phase 4. |
| 1.2 | 2026-06-29 | Claude | S1, S4, S9 | MODIFIED | Score-prd v3.3 fixes: tightened the one-liner to ≤ 25 words (S1), added the UI State Diagram for the Refine panel (S9 New Features, closing the 10.6 diagram gap), and added explicit time horizons to "What Happens If We Don't Ship" (S4). |
| 1.1 | 2026-06-29 | Claude | Header, S2, S9, S10, S11, S16, S17, S18 | MODIFIED | Incorporated the existing design (Wulan's qontak-designer prototype app/pages/bot-automation/ai-agents/[id].vue): Refine is a right-rail "Refine" tab (beside Preview), a multi-turn chat with suggestion chips, where the AI proposes options (each a per-field ProposedChange diff, Recommended flagged) and Accept stages the change into the form (highlight + tab switch); persistence stays the editor's existing Save. Corrected the earlier wrong assumption of a "side-panel chat config screen"; resolved OQ-7 (interaction model) into a §17 decision and replaced it with a design-vs-prod placement reconciliation question; upgraded REFINE-S03 to Should Have. |
| 1.0 | 2026-06-29 | Claude | All | CREATED | Phase 4 (AI-Assisted Refinement / "Refine with AI") PRD created from the Autonomous Agent RFC §10.3b (refine-skill-pack, QON 51153994292 / 51226214880) and grounded in the current chatbot BE: the drafter (draft-skill-pack) exists, full-merge PATCH /v2/ai_agents/:id is the only edit path today, and bidirectional capability_pack ↔ skill_pack translation is already present (sync_to_ai_service.rb + skill_pack_mapper.rb). The refiner does not exist yet. |