
RFC: Legacy Migration — CRM Contact Notes → CDP Notes (S2S gocraft/work pipeline, idempotent, timestamp-preserving)

Document Conventions (do not remove)

This RFC follows the Qontak RFC Template format for governance — the metadata table, Confluence sections 1–6, and Comment logs are mandatory. Mark a section N/A — reason when truly inapplicable rather than deleting it.

It is also agent-execution-ready: §1 PRD-to-Schema Derivation (backend RFCs require no Figma), §2 Repo Reading Guide (Detail 2.0), mermaid diagrams, and §4 Agent Execution Plan + Verification & Rollback Recipe are complete before §7 says yes.

The YAML frontmatter at the top is the machine-readable index. The Metadata table below is the human-readable governance record. Both agree on every shared field.

Grounding note (anti-hallucination). Every path:line reference in this RFC was verified against the live worktrees contact-service (CDP, Go/MongoDB) and qontak.com (Legacy CRM, Rails) on 2026-06-18 (see Detail 2.0 Source Verification). Where the PRD's assumed contract differed from the repo, the repo wins and the deviation is called out. The most consequential corrections: (1) the PRD's POST /cdp/notes/migrate HTTP batch endpoint is not built — because the migration runs in-process as a gocraft/work consumer (mirroring ActivityLogMigrationConsumer), the insert is a direct repository write, not an internal HTTP call (Decision 1); (2) S2S in contact-service is HTTP Basic auth on the /private + /api/v1 route groups — the migrate trigger and status live under /private/notes/... (Decision 2); (3) the idempotency unique index must be partial, or it breaks every existing human-created note that has no legacy_crm_note_id (Decision 4); (4) migration status is Redis-backed like the existing migration framework, not a Mongo job collection (Decision 2).

Metadata

| Field | Value | Notes |
|---|---|---|
| Status | RFC (IDEA) | Human label; YAML status: carries the remapped linter enum draft |
| DRI | Zhelia Alifa | RFC owner (frontmatter dri) |
| Team | cdp | Advisory squad slug carried from PRD / initiative README |
| Author(s) | Zhelia Alifa | Primary author |
| Reviewers | CDP Backend Lead, Legacy CRM Squad Lead, Data Engineering Lead | Tech reviewers across affected squads (CDP BE + CRM + Data) |
| Approver(s) | CDP Tech Lead, InfoSec Approver | Tech leaders + infosec approver |
| Submitted Date | 2026-06-18 | ISO-8601 |
| Last Updated | 2026-06-18 | ISO-8601 |
| Target Release | 2026-Q3 | Quarter |
| Target Quarter | 2026-Q3 | Advisory, carried from PRD |
| Related | ../prds/prd-legacy-migration-crm-notes.md | Source PRD v2.1 |
| Discussion | #cdp-ops (Slack) | Alerts + discussion channel |

Type: backend · Sub-type: new-feature

Sections at a Glance

  1. Overview (PRD-to-Schema Derivation; traceability; per-story change map; no Figma — backend migration, output via existing CDP Notes UI)
  2. Technical Design (Infrastructure Topology → Technical Decisions [ADR] → Repo Reading Guide → architecture & service map → end-to-end sequences → DDL/Mongo → APIs → integrity / concurrency / async-job specs)
  3. High-Availability & Security
  4. Backwards Compatibility and Rollout Plan (Agent Execution Plan + Verification & Rollback Recipe)
  5. Concern, Questions, or Known Limitations
  6. Comment logs
  7. Ready for agent execution

1. Overview

CDP (the contact-service backend, Go + MongoDB) has no CRM-notes migration capability today — verified: the only notes surface is single-record CRUD under /iag/v1/contacts/{contact_id}/notes (internal/server/rest_router.go:150-159), the ContactNote model has no legacy_crm_note_id and no legacy_owner_label field (internal/app/repository/contact_notes/base.go:26-36), SetDefaults() overwrites caller timestamps with time.Now() (base.go:51-54, create.go:12), note writes derive the company from the user IAG context with no system path (contact_notes_handler.go:75-79, :478-486), and there is no server-side HTML sanitization — content is validated by length only (contact_notes_service.go:268-274). On the CRM side, the assumed extraction contract (GET /crm/notes?organization_id&limit&offset) does not exist; the real notes API (app/controllers/api/v4/notes.rb:131-147) is entity-scoped (per lead/company/deal/ticket) and does not actually paginate.

This RFC specifies a net-new, one-time historical migration pipeline that ingests ~21,000+ Legacy CRM Person notes across ~130 client accounts (CIDs) into CDP Notes. It is built by mirroring the migration framework that already exists in contact-service: ActivityLogMigrationHandler → ActivityLogMigrationConsumer.ProcessUpdateUserIDJob(job *work.Job) → activity_log_migration_service.go, with the house status route GET /private/activity_logs/migration/status (rest_router.go:74). The migration is triggered by a gocraft/work job enqueue (not a synchronous HTTP call), runs entirely in-process as an S2S consumer, resolves each CRM Person note to a CDP contact via the contact's existing crm_data.id linkage (contact/base.go:53,342-344), sanitizes the CRM rich HTML server-side, re-links attachments to company-scoped CDP storage, and inserts notes idempotently (stored legacy_crm_note_id + a partial unique index) while preserving the original CRM timestamps.

Success Criteria

  • Migration completeness ≥ 99% match_pct per CID (source count vs CDP migrated count) before any CID cutover — PRD §13.
  • CIDs migrated: 100% of ~130 Notes-using CIDs at completed_success by CDP GA — PRD §13.
  • Attachment success ≥ 95% (re-linked images + audios + documents / total) — PRD §13.
  • Idempotency: a full re-run where every note already exists inserts zero duplicates (notes_migrated=0, notes_skipped=N) — PRD NOTES-MIG-S03/AC-3.
  • Throughput / window: ≥ 10,000 notes/hour/CID; ≤ 4h window/CID — PRD §7.
  • Integrity: failure rate ≤ 1%/CID; halt + alert above; zero silent failures (every failed record logged with a reason code) — PRD §7.
  • Timestamp fidelity: migrated notes render in reverse-chronological order by their original CRM created_at, not insert time — PRD NOTES-MIG-S02/AC-4, S05/AC-1.
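The completeness and integrity gates above combine into one terminal-status decision per CID. The following is a minimal sketch of that decision, assuming the status names used in the traceability matrix (completed_success, completed_with_errors, halted). The Counters type, its field names, and the rule that an empty source counts as a full match are hypothetical and not taken from contact-service.

```go
package main

import "fmt"

// Counters is a hypothetical snapshot of one CID's run. Field names are
// illustrative and are not taken from contact-service.
type Counters struct {
	Processed   int  // notes attempted
	Failed      int  // notes that failed with a reason code
	Migrated    int  // CDP notes that carry a legacy_crm_note_id
	SourceCount int  // CRM-side count for the same CID
	SourceKnown bool // false when the CRM count could not be fetched
}

const (
	maxFailureRate = 0.01 // halt above 1% failures per CID
	minMatchPct    = 99.0 // completeness gate per CID
)

// FailureRate returns Failed/Processed, or 0 when nothing was processed.
func FailureRate(c Counters) float64 {
	if c.Processed == 0 {
		return 0
	}
	return float64(c.Failed) / float64(c.Processed)
}

// MatchPct returns Migrated/SourceCount as a percentage. An empty source
// is treated as a full match (an assumption of this sketch).
func MatchPct(c Counters) float64 {
	if c.SourceCount == 0 {
		return 100
	}
	return 100 * float64(c.Migrated) / float64(c.SourceCount)
}

// TerminalStatus applies the gates in order: failure rate first, then
// source-vs-CDP validation.
func TerminalStatus(c Counters) string {
	if FailureRate(c) > maxFailureRate {
		return "halted"
	}
	if !c.SourceKnown || MatchPct(c) < minMatchPct {
		return "completed_with_errors"
	}
	return "completed_success"
}

func main() {
	run := Counters{Processed: 1000, Failed: 5, Migrated: 995, SourceCount: 1000, SourceKnown: true}
	fmt.Printf("failure_rate=%.3f match_pct=%.1f status=%s\n",
		FailureRate(run), MatchPct(run), TerminalStatus(run))
}
```

Checking the failure rate before the match percentage keeps a halted run from being reported as a validation shortfall.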

Out of Scope

  1. No real-time / ongoing sync — one-time historical migration only (PRD §6.1).
  2. No live @mentions — embedded CRM mention anchors are stripped to plain @Name text; native CDP mentions are a separate PRD (PRD §6.2, D-8).
  3. No dedup vs human-created CDP notes — idempotency is enforced only by legacy_crm_note_id (PRD §6.3).
  4. No client self-service trigger/monitor UI — Ops-triggered S2S only (PRD §6.4).
  5. No deletion/archival of source CRM notes during the retention window (PRD §6.5). CRM source is read-only and untouched.
  6. Activity-type entries (Calls/Emails/Meetings/WhatsApp/SMS) are excluded by default — notes-only filter on crm_note_type_id (PRD §6.6, OQ-4 default).
  7. No notes for other Qontak products (Inbox/Campaign/Chatbot) (PRD §6.7).
  8. S06 "Legacy" banner/tag is OUT of scope — the FE has no banner/tag infra and CustomerNote has no metadata field; it is not a no-UI-change item and is re-scoped to a separate FE+BE change (PRD §6.8, D-9). There is no frontend work in this RFC — migrated notes render through the existing CDP Notes UI.
  9. crm_checkin geolocation is dropped — a deliberate data-loss decision (Decision 9, PRD D-10).
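Item 2 above (mention anchors reduced to plain @Name text) can be illustrated with a small sketch. It assumes the CRM marks mentions as an <a> element carrying a data-user-id attribute, as the per-story change map describes, and it uses a regular expression only to keep the example short. The function name stripMentions is hypothetical, and a production HtmlNormalizer would more likely walk a parsed HTML tree.

```go
package main

import (
	"fmt"
	"regexp"
)

// mentionAnchor matches an <a> element that carries a data-user-id attribute
// and captures its inner text. Flags: i = case-insensitive, s = dot matches
// newlines. [^>]* cannot cross a tag boundary, so ordinary links are ignored.
var mentionAnchor = regexp.MustCompile(`(?is)<a\b[^>]*\bdata-user-id\s*=[^>]*>(.*?)</a>`)

// stripMentions replaces each mention anchor with its inner text, so no
// user id survives and nothing downstream can treat it as a live mention.
func stripMentions(html string) string {
	return mentionAnchor.ReplaceAllString(html, "$1")
}

func main() {
	in := `<p>Ping <a data-user-id="42" href="/users/42">@Budi</a>, see <a href="https://example.com">docs</a></p>`
	fmt.Println(stripMentions(in))
}
```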

Assumptions

  • The CRM crm_data.id on a CDP contact holds the CRM crm_person_id (confirmed by grounding, REV-1). The contact document stores CrmData{ID} (contact/base.go:53,342-344), populated from the CRM contact-sync payload's contact_id for app_name="crm" (payload/contact_sync_request.go:104-105), and on the CDP-initiated create-back from CrmContactResponse.CrmID (consumer/send_contact.go:303-323). On the CRM side both values are the Crm::Lead/Crm::Person primary key (crm/centralized_contacts/params_mapper.rb:31 "contact_id": @lead.id.to_s; centralized_contacts_controller.rb:120-129 crm_id: lead.id); there is no separate "centralized-contact" id space (Crm::Lead < Crm::Person, STI on crm_people). So a Person note's crm_person_id matches the CDP contact's crm_data.id directly (string-cast: crm_data.id is stored as a string, base.go:343). An index crm_contact_index exists on crm_data.id (db/migrations/001_create_contact.up.json). The remaining per-CID coverage (some CRM persons may not have synced to a CDP contact) is a data-quality gate, not an id-space ambiguity (OQ-2).
  • The CRM exposes no org-scoped notes extraction today (verified: v4 notes API is entity-scoped and unpaginated, api/v4/notes.rb:137-147). The Legacy CRM Squad will deliver a net-new S2S org-scoped extraction contract that contact-service consumes via the existing QontakCrmClient pattern (internal/app/api/qontak_crm.go, already authenticated through CRM_API_ROOT_URL + CRM_API_AUTH, config/load.go:197-198) — OQ-7.
  • CRM attachment originals are served from a CarrierWave/S3 bucket with public-read ACL (config/initializers/carrierwave-s3.rb:27,58) and CDN-rewritten URLs — i.e. generally fetchable without signing. CDP still re-uploads them to company-scoped storage rather than referencing CRM URLs (Decision 8). This corrects the PRD's "internal creds" assumption — confirm the bucket is not later locked down (OQ-9).
  • MongoDB is schemaless, so adding legacy_crm_note_id / legacy_owner_label to contact_notes needs no DDL migration — only application struct fields plus one partial unique index migration (db/migrations/, JSON format).
  • The "≤1 voice_note per note" rule in PRD OQ-8 is not enforced in contact-service today — note validation accepts any number of attachments of type {image,doc,pdf,video,voice_note,xlsx} (contact_notes_service.go:286-293). This RFC therefore does not add a voice_note cap unless the product owner requires one (OQ-8).
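The first assumption (a Person note's crm_person_id equals the contact's string-typed crm_data.id) implies a string cast before the lookup and a per-note split into resolved and unmapped. Below is a minimal sketch under that assumption. The crmNote type, the function names, and the use of plain maps in place of bson.M are hypothetical and only keep the example free of driver imports.

```go
package main

import (
	"fmt"
	"strconv"
)

// crmNote is a hypothetical, trimmed view of one extracted CRM note.
type crmNote struct {
	ID       int64
	PersonID int64
}

// crmIDFilter builds the filter for one batch. Its shape mirrors
// bson.M{"crm_data.id": {"$in": ...}}; IDs are cast to strings because
// crm_data.id is stored as a string, and duplicates are dropped.
func crmIDFilter(notes []crmNote) map[string]interface{} {
	seen := map[string]bool{}
	ids := []string{}
	for _, n := range notes {
		id := strconv.FormatInt(n.PersonID, 10)
		if !seen[id] {
			seen[id] = true
			ids = append(ids, id)
		}
	}
	return map[string]interface{}{"crm_data.id": map[string]interface{}{"$in": ids}}
}

// partition maps each note to a CDP contact ID, or lists it as unmapped
// (the CONTACT_NOT_MAPPED case) when no contact carries that crm_data.id.
func partition(notes []crmNote, contactByCrmID map[string]string) (map[int64]string, []int64) {
	resolved := map[int64]string{}
	var unmapped []int64
	for _, n := range notes {
		if contactID, ok := contactByCrmID[strconv.FormatInt(n.PersonID, 10)]; ok {
			resolved[n.ID] = contactID
		} else {
			unmapped = append(unmapped, n.ID)
		}
	}
	return resolved, unmapped
}

func main() {
	notes := []crmNote{{ID: 1, PersonID: 10}, {ID: 2, PersonID: 10}, {ID: 3, PersonID: 99}}
	fmt.Println(crmIDFilter(notes))
	// Pretend the contact query returned a single contact for person 10.
	resolved, unmapped := partition(notes, map[string]string{"10": "contact-abc"})
	fmt.Println(resolved, unmapped)
}
```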

Dependencies

| Dependency | Owner | Availability | Blocking? |
|---|---|---|---|
| gocraft/work worker + job registration | CDP BE | Exists: go.mod github.com/gocraft/work v0.5.1; registerJobWithOptions(...) internal/worker/worker_service.go:132,138 | Reuse |
| Existing migration framework to mirror | CDP BE | Exists: ActivityLogMigrationHandler (activity_log_migration_handler.go:32,91), ActivityLogMigrationConsumer.ProcessUpdateUserIDJob (activity_log_migration_consumer.go:25), activity_log_migration_service.go (Redis status key :25, TTL 7d :31, batch 10000 :28) | Reuse (mirror) |
| JobEnqueuer.EnqueueJob | CDP BE | Exists: internal/app/service/job_enqueuer.go:38-39,65-67 (work.Q{"data": params, ...}) | Reuse |
| /private route group (BasicAuth S2S) | CDP BE | Exists: rest_router.go:69-70,78-79; status route :74; mymiddleware.BasicAuth internal/pkg/middleware/basic_auth.go:10 | Reuse |
| contact_notes Mongo store | CDP BE | Exists: repository/contact_notes/base.go:39-41 (TableName()="contact_notes"); no legacy_crm_note_id/legacy_owner_label, no batch insert, no unique index | Extend |
| Contact.CrmData.ID linkage + crm_contact_index | CDP BE / Data Eng | Exists: contact/base.go:53,342-344; index db/migrations/001_create_contact.up.json | Reuse (confirm coverage, OQ-2) |
| ContactRepository.SearchWithFilters / CountWithFilters | CDP BE | Exists: contact/search.go:125,147 (driven with bson.M{"crm_data.id": {"$in": …}}) | Reuse |
| QontakCrmClient (S2S CRM HTTP client) | CDP BE | Exists: internal/app/api/qontak_crm.go:14-24; config CRM_API_ROOT_URL/CRM_API_AUTH config/load.go:197-198 | Extend (new ListPersonNotes) |
| CRM org-scoped Person-notes extraction contract (NET-NEW) | Legacy CRM Squad | Does NOT exist: v4 API is entity-scoped + unpaginated (api/v4/notes.rb:137-147); no /crm/notes/count, no org-scoped bulk | YES |
| Server-side HTML sanitizer (Go) | CDP BE | Net-new: none exists (grep sanitize/bluemonday → 0 hits in notes service) | YES (add lib) |
| CDP company-scoped attachment storage (re-upload) | CDP Infra | Confirm: re-upload to {company_sso_id}/...; quota for full attachment volume incl. documents | YES (confirm) |
| User identity (CRM creator_id → SSO UUID) | Launchpad / Identity | Exists: owner-name resolution path contact_notes_service.go:131-136 (GetUserNamesBulk) | NO (degrades quality only) |
| CSM approval + maintenance window | CSM | Per-CID consent | YES (Stage 2+) |

PRD-to-Schema Derivation (backend-specific — required)

Backend RFCs do not require Figma. The "design" is the schema + contracts derived from the PRD's entities, business rules, and consumer needs.

| PRD entity / attribute / rule | Persisted as (collection.field) | Exposed / enforced via | Enforced where | Source |
|---|---|---|---|---|
| A migration run for one CID becomes a durable, pollable job | Redis status record (mirror activity_log_migration:user_id_update): {status, progress_pct, notes_processed, notes_total, failure_rate, match_pct} | POST /private/notes/migrate (enqueue) + GET /private/notes/migration/status?cid= | NotesMigrationService.ValidateAndEnqueue + BasicAuth; status written by the consumer | PRD §8, §9 #1/#9, NOTES-MIG-S01 |
| CRM note id (idempotency key) | contact_notes.legacy_crm_note_id (NET-NEW) + partial unique index (company_sso_id, legacy_crm_note_id) | skip-on-conflict at batch insert | CreateNotesBatch repo method + partial unique index | PRD §7, §9.1, NOTES-MIG-S03 |
| CRM note note (sanitized rich HTML) | contact_notes.note (HTML, ≤10000 chars) | server-side sanitize + mention-strip on the migrate write path | HtmlNormalizer in the consumer (net-new) | PRD §9.1, D-4/D-8 |
| CRM crm_person_id → CDP contact | contact_notes.contact_id (resolved) | SearchWithFilters(bson.M{"crm_data.id":{"$in":…}}) | ContactResolver in the consumer | PRD §9 #3, §9.1, D-6/D-12 |
| CRM creator_id → owner | contact_notes.owner_id + contact_notes.legacy_owner_label (NET-NEW) | live owner-name resolution unchanged; label shown when owner_id=null | OwnerResolver; render path contact_notes_service.go:131-136 | PRD §9 #4, §9.1, D-7 |
| CRM created_at/updated_at (TZ) | contact_notes.created_at/updated_at (UTC, preserved) | migrate path bypasses SetDefaults() | CreateNotesBatch sets timestamps explicitly | PRD §9.1, D-2 |
| CRM crm_note_images / crm_note_audios / crm_note_attachments (documents) | contact_notes.attachments[] ({url,type,file_size*,file_name}) | re-upload to {company_sso_id}/... → proxy URL; type ∈ {image,doc,pdf,video,voice_note,xlsx} | AttachmentProcessor in the consumer | PRD §9 #6, §9.1, D-10 |
| CRM crm_note_type_id (activity taxonomy) | (filter, not stored) | notes-only filter (exclude Calls/Emails/…) | CRMExtractor query / consumer filter | PRD §9.1, OQ-4 |
| CRM crm_checkin (geolocation) | (NOT migrated) | explicit drop; logged per note | consumer (no field written) | PRD §9.1, D-10 |
| Source-vs-CDP count validation | Mongo CountWithFilters on contact_notes where legacy_crm_note_id exists | match_pct in status | ValidationRunner in the consumer | PRD §9 #8, NOTES-MIG-S04 |

Every §2.3 collection field and every §2.4 endpoint traces back to a row here.

Detail 1.A — PRD Traceability Matrix

Forward (PRD AC → RFC):

| PRD composite AC id | Service / endpoint / job | RFC section |
|---|---|---|
| NOTES-MIG-S01/AC-1 | POST /private/notes/migrate → enqueue NotesMigrationJobName | §2.4 row 1 · Decision 1/2 |
| NOTES-MIG-S01/AC-2 | GET /private/notes/migration/status?cid= (progress) | §2.4 row 2 · §2.F |
| NOTES-MIG-S01/AC-3 | consumer → completed_success (failure ≤1%, match ≥99%) | §2.2 · §2.F · §3 |
| NOTES-MIG-S01/ERR-1 | flag OFF → 403 FLAG_DISABLED | §2.4 · §3.B |
| NOTES-MIG-S01/ERR-2 | already completed → 409 ALREADY_MIGRATED | §2.4 · §3.B |
| NOTES-MIG-S01/ERR-3 | failure >1% → halt, halted, PagerDuty | §2.2 (failure) · §3 Monitoring |
| NOTES-MIG-S01/ERR-4 | non-S2S call → 401/403 (BasicAuth; no IAG/user path) | Decision 2 · §3 Role × Endpoint |
| NOTES-MIG-S02/AC-1 | ContactResolver via crm_data.id | §2.4 algorithm · Decision 7 |
| NOTES-MIG-S02/AC-2 | HtmlNormalizer sanitize + mention-strip; no <p> wrap | Decision 5 · §3 Security |
| NOTES-MIG-S02/AC-3 | AttachmentProcessor re-links documents | Decision 8 |
| NOTES-MIG-S02/AC-4 | preserved created_at/updated_at | Decision 3 |
| NOTES-MIG-S02/AC-5 | skip on existing legacy_crm_note_id | Decision 4 |
| NOTES-MIG-S02/ERR-1 | no contact match → CONTACT_NOT_MAPPED, skip | §2.4 · §3.A |
| NOTES-MIG-S02/ERR-2 | attachment download fail → insert without it | Decision 8 · §3.A |
| NOTES-MIG-S02/ERR-3 | owner unmappable → owner_id=null + label | Decision 6 |
| NOTES-MIG-S02/ERR-4 | activity-type note → skipped (out-of-scope count) | §2.4 filter · OQ-4 |
| NOTES-MIG-S03/AC-1..AC-3 | idempotent re-run via partial unique index | Decision 4 · §2.E |
| NOTES-MIG-S03/ERR-1 | two concurrent jobs/CID → one runs, other 409 | §2.E · Decision 2 |
| NOTES-MIG-S04/AC-1, AC-2 | ValidationRunner count compare → match_pct | §2.4 · §2.F |
| NOTES-MIG-S04/ERR-1, ERR-2 | match_pct<99% / count unavailable → completed_with_errors | §2.F · §3 |
| NOTES-MIG-S05/AC-1..AC-3 | render via existing CDP Notes UI (orig ts, attachment, label) | §1 Out of Scope #8 (no FE work) · Decision 6 |
| NOTES-MIG-S05/ERR-1, ERR-2 | missing attachment / hidden edit-delete: existing UI behavior | §1 Out of Scope #8 · OQ-6 |
| NOTES-MIG-S06-NEG/NEG-1 | mention anchors → plain text | Decision 5 |
| NOTES-MIG-S06-NEG/NEG-2 | activity entries excluded | §2.4 filter · OQ-4 |

Reverse (RFC → PRD AC):

| New endpoint / field / job / dependency | PRD composite AC id it serves |
|---|---|
| POST /private/notes/migrate | NOTES-MIG-S01/AC-1, ERR-1, ERR-2, ERR-4 |
| GET /private/notes/migration/status | NOTES-MIG-S01/AC-2, AC-3; NOTES-MIG-S04/* |
| NotesMigrationConsumer.ProcessNotesMigrationJob | NOTES-MIG-S01/AC-3; S02/*; S04/* |
| contact_notes.legacy_crm_note_id + partial unique index | NOTES-MIG-S02/AC-5; NOTES-MIG-S03/AC-1..AC-3, ERR-1 |
| contact_notes.legacy_owner_label (net-new) | NOTES-MIG-S02/ERR-3; NOTES-MIG-S05/AC-3 |
| CreateNotesBatch (timestamp-preserving, bypass SetDefaults) | NOTES-MIG-S02/AC-4; NOTES-MIG-S05/AC-1 |
| HtmlNormalizer (sanitize + mention-strip) | NOTES-MIG-S02/AC-2; NOTES-MIG-S06-NEG/NEG-1 |
| AttachmentProcessor | NOTES-MIG-S02/AC-3, ERR-2; NOTES-MIG-S05/AC-2, ERR-1 |
| QontakCrmClient.ListPersonNotes (+ CRM net-new endpoint) | NOTES-MIG-S01/AC-1; S02/* (extraction) |

UI / Consumer Surface Coverage

| PRD-named surface | Consumer | Required reads | Required writes | Status surface |
|---|---|---|---|---|
| Migration trigger | Ops (S2S) | n/a | POST /private/notes/migrate | job_id + Redis status |
| Migration monitor | Ops (S2S) | GET /private/notes/migration/status?cid= | n/a | status/progress_pct/match_pct |
| Migrated notes panel | web + mobile (existing CDP Notes UI) | existing GET /iag/v1/contacts/{id}/notes | n/a, populated by the consumer | note created_at, owner_name/legacy_owner_label, attachments[] |

The notes panel is existing UI — no FE work in this RFC (Out of Scope #8).

Role Coverage

| PRD role | Authorization mechanism | Endpoints permitted | Cross-tenant? | Audit trail |
|---|---|---|---|---|
| Internal Ops / Migration Engineer | HTTP Basic auth (S2S, /private) | POST /private/notes/migrate, GET /private/notes/migration/status | yes: explicit per-batch company_sso_id (this is the only system path; note CRUD cannot do this) | per-record success/failure log + Redis status (7d) + audit map (permanent) |
| Migrated Sales/Support Agent | IAG JWT (existing) | existing GET /iag/v1/contacts/{id}/notes only | no: company-scoped by IAG context | n/a (read of migrated data) |
| Client admin / end user | none for migration | | no | 401/403 (not a logged-in path) |

PRD Section Coverage

| PRD § | Title | Where covered |
|---|---|---|
| 3 | One-liner + Problem | §1 Overview |
| 4 | What happens if we don't build | §1 Overview (problem) |
| 5 | Target Users + Persona | §1 Detail 1.A Role Coverage |
| Scope Changes | affected surfaces | frontmatter scope_changes + §2.I |
| 6 | Non-Goals | §1 Out of Scope |
| 7 | Constraints | §2 Technical Decisions, §2.4, §3 |
| 7.1 | Data Lifecycle | §2.3 per-status lifecycle + §3.D Compliance |
| 8 | New Features (component tree) | §2.1 Architecture + §2.I Scope Boundaries |
| 9 | API & Webhook Behavior | §2.4 APIs + §2.2 Sequences |
| 9.1 | Schema Mapping | §1 PRD-to-Schema + §2.3 DDL |
| 10 | System Flow + Stories + ACs | §2.2 Sequences + §1 Detail 1.A/1.C |
| 11 | Rollout | §4 Rollout Strategy |
| 12 | Observability | §3 Monitoring & Alerting |
| 13 | Success Metrics | §1 Success Criteria + §3 SLO |
| 14 | Launch Plan & Stage Gates | §4 Rollout Strategy |
| 15 | Dependencies | §1 Dependencies + §2.F.1 Responsibility Boundary |
| 16 | Key Decisions + Alternatives | §2 Technical Decisions (ADR) + §1 Detail 1.B |
| 17 | Open Questions | §5 Concerns / Open Questions |
| App. A | Grounded Code References | §2.0 Repo Reading Guide + Source Verification |

Detail 1.B — Key Decisions Summary (full ADR treatment in §2 Technical Decisions)

| # | Decision | Chosen option | §2 block | PRD ref |
|---|---|---|---|---|
| 1 | Insert mechanism | In-process repository batch write (no internal HTTP /cdp/notes/migrate) | Decision 1 | D-1/D-11 |
| 2 | Trigger + status | gocraft/work job + /private/... (BasicAuth) + Redis status | Decision 2 | D-3/D-11 |
| 3 | Timestamps | Caller-set; migrate path bypasses SetDefaults() | Decision 3 | D-2 |
| 4 | Idempotency | legacy_crm_note_id + partial unique index | Decision 4 | D-1 |
| 5 | Content | Server-side sanitize (net-new) + strip mention anchors; no <p> wrap | Decision 5 | D-4/D-8 |
| 6 | Owner | owner_id=null + net-new legacy_owner_label fallback | Decision 6 | D-7 |
| 7 | Person→Contact | Resolve via crm_data.id (indexed); precedence for multi-FK | Decision 7 | D-6/D-12 |
| 8 | Attachments | Re-upload images+audios+documents to company-scoped storage | Decision 8 | D-10 |
| 9 | Check-in | Geolocation dropped (deliberate data loss) | Decision 9 | D-10 |
| 10 | Extraction | Extend QontakCrmClient; CRM squad adds org-scoped endpoint | Decision 10 | D-5 |
| 11 | Activity scope | Notes-only filter on crm_note_type_id | Decision 11 | OQ-4 |
11Activity scopeNotes-only filter on crm_note_type_idDecision 11OQ-4

Minimum-coverage notes — Storage: reuse contact_notes Mongo (Decision 1/4). Sync vs async: async gocraft/work (Decision 2). Caching: n/a — one-shot backfill; no read cache. Third-party: CRM via extended QontakCrmClient (Decision 10). Consistency: per-record eventual; idempotency makes re-runs safe (Decision 4). Multi-tenancy: explicit per-batch company_sso_id + company-scoped queries/storage (Decision 2/8). Reuse vs new: every endpoint tagged in §2.4.

Detail 1.C — Per-Story Change Map

| Story id | Title | Layer scope | BE changes (concrete artifacts) | Composite AC ids | Acceptance criteria (verifiable) | RFC anchors |
|---|---|---|---|---|---|---|
| NOTES-MIG-S01 | Run batch migration for a CID | BE-only | POST /private/notes/migrate + GET /private/notes/migration/status handlers (internal/app/handler/), NotesMigrationService.ValidateAndEnqueue (internal/app/service/), NotesMigrationConsumer.ProcessNotesMigrationJob (internal/app/consumer/), NotesMigrationJobName const, worker registration (internal/worker/worker_service.go), Redis status record | S01/AC-1, AC-2, AC-3, ERR-1..ERR-4 | go test: enqueue returns {job_id}; status returns progress; 403 flag OFF; 409 already-migrated; 401/403 when not BasicAuth | §2.4 rows 1-2 · §4.D chunks 1,7,8 · §1 PRD-to-Schema rows 1-2 |
| NOTES-MIG-S02 | Transform CRM note → CDP schema | BE-only | ContactResolver (SearchWithFilters on crm_data.id), OwnerResolver, HtmlNormalizer (sanitizer), AttachmentProcessor, CreateNotesBatch (timestamp-preserving) | S02/AC-1..AC-5, ERR-1..ERR-4 | go test: contact resolved by crm_data.id; HTML sanitized, mentions→text, no <p> wrap; document re-linked; ts preserved; dup skipped; each ERR path logged + counted | §2.4 algorithm · Decisions 3,5,6,7,8 · §4.D chunks 2-6 |
| NOTES-MIG-S03 | Idempotent re-run | BE-only | contact_notes.legacy_crm_note_id field + partial unique index migration; skip-on-conflict in CreateNotesBatch; in-progress guard (Redis) per CID | S03/AC-1..AC-3, ERR-1 | go test/integration: re-run inserts only missing; existing skipped; full re-run → migrated=0; concurrent jobs → one runs, other 409 | Decision 4 · §2.E · §4.D chunks 2,7 |
| NOTES-MIG-S04 | Validation & error reporting | BE-only | ValidationRunner (CountWithFilters on legacy_crm_note_id existence vs CRM count); structured failed-record log {legacy_crm_note_id, reason_code, details} | S04/AC-1, AC-2, ERR-1, ERR-2 | go test: match_pct computed; ≥99% → success; <99% → completed_with_errors + alert; count unavailable → VALIDATION_SKIPPED | §2.4 · §2.F · §3 Monitoring · §4.D chunk 8 |
| NOTES-MIG-S05 | View migrated notes in CDP | BE + FE consumes existing | No FE/BE UI work: migrated rows are read by the existing notes endpoint + UI; legacy_owner_label makes the author render | S05/AC-1, AC-2, AC-3, ERR-1, ERR-2 | manual/Stage-1: opening a contact shows migrated notes with original created_at (reverse-chron), attachment downloads from CDP storage, legacy_owner_label shows for unmapped owners | §1 Out of Scope #8 · Decision 6 · OQ-6 |
| NOTES-MIG-S06-NEG | Mentions not live; activities not flooded | BE-only (guard rail) | HtmlNormalizer strips mention anchors; consumer filters activity crm_note_type_id | S06-NEG/NEG-1, NEG-2 | go test: <a data-user-id> → plain @Name, no mention/notification; activity-type note excluded | Decision 5 · §2.4 filter · OQ-4 |

Coverage: all 6 PRD stories are present exactly once. NOTES-MIG-S05 is "FE consumes existing": it needs no new FE work (Out of Scope #8), and its only backend enabler is the legacy_owner_label field (Decision 6).


2. Technical Design

Infrastructure Topology

Deployment topology

flowchart TB
ops([Internal Ops / Migration Engineer]) -->|"HTTPS + Basic auth"| lb[API Gateway / Ingress]
lb -->|"POST /private/notes/migrate"| api["contact-service api pods xN<br/>(cmd/server, Chi router /private)"]
api -->|"enqueue NotesMigrationJobName"| q[["gocraft/work queue<br/>(Redis-backed)"]]
api -->|"write status"| redis[("Redis<br/>(migration status, TTL 7d)")]
q -->|consume| worker["contact-service worker pods xM<br/>(cmd/worker, NotesMigrationConsumer)"]
worker -->|"read crm_data.id / write notes"| mongo[("MongoDB primary<br/>(contacts, contact_notes)")]
worker -->|"update status / progress"| redis
worker -->|"HTTPS, Authorization: CRM_API_AUTH"| crm(["Legacy CRM<br/>(qontak.com, net-new extraction endpoint)"])
worker -->|"HTTPS download (public-read/CDN)"| s3(["CRM CarrierWave S3 / CDN"])
worker -->|"re-upload company-scoped"| store[("CDP attachment storage<br/>{company_sso_id}/...")]
worker -->|"creator_id to SSO UUID"| lp(["Launchpad / Identity"])
agent([Migrated agent]) -->|"GET /iag/v1/contacts/{id}/notes"| lb

Per-service responsibility

flowchart LR
subgraph cs["contact-service (CDP Backend)"]
ep1["POST /private/notes/migrate<br/>(trigger; BasicAuth S2S)"]
ep2["GET /private/notes/migration/status<br/>(monitor; BasicAuth S2S)"]
cons["NotesMigrationConsumer<br/>(extract to transform to insert to validate)"]
end
ep1 -->|"enqueue gocraft/work"| cons
cons -->|"HTTPS — ListPersonNotes (extend QontakCrmClient)"| crm(["Legacy CRM (CRM squad)"])
cons -->|"HTTPS — download originals"| s3(["CRM S3 / CDN"])
cons -->|"re-upload"| store(["CDP company-scoped storage (CDP Infra)"])
cons -->|"creator_id to SSO"| lp(["Launchpad (Identity)"])
cons -->|"resolve / batch insert"| db[("MongoDB: contacts, contact_notes")]
cons -->|"status / progress"| redis[("Redis")]

| Service | Use cases (this RFC) | Internal calls | External / third-party APIs |
|---|---|---|---|
| contact-service (server) | validate + enqueue migration; serve status | JobEnqueuer.EnqueueJob, Redis status, BasicAuth | |
| contact-service (worker) | extract, resolve, sanitize, re-link, batch-insert, validate | ContactRepository.SearchWithFilters/CountWithFilters, contact_notes repo (CreateNotesBatch), Redis status | Legacy CRM (extraction, CRM squad); CRM S3/CDN; CDP storage (CDP Infra); Launchpad (Identity) |

Technical Decisions (ADR-format — the engineering heart)


Decision 1: Insert via an in-process repository batch write — not an internal POST /cdp/notes/migrate HTTP endpoint

Context The PRD (D-1, §9 #7) specifies a net-new POST /cdp/notes/migrate batch S2S endpoint that a CDPNoteInserter calls. But the PRD also mandates D-11: mirror the in-process migration framework that already exists in contact-service (ActivityLogMigrationConsumer). These two are in tension: if the migration consumer runs inside contact-service, it can write to the contact_notes collection directly — an internal HTTP endpoint would mean the service calling itself over the wire.

Options considered

  • Option A — in-process batch write. The NotesMigrationConsumer calls a new repository method CreateNotesBatch(ctx, []ContactNote) directly (mirroring how ActivityLogMigrationConsumer calls activity_log_migration_service in-process).
    • Pros: no self-HTTP; reuses the proven framework; one fewer public surface to auth/rate-limit; transactional control over skip-on-conflict.
    • Cons: the batch-insert logic is not independently callable by an external orchestrator (acceptable — Ops triggers via the /private enqueue path).
  • Option B — build POST /cdp/notes/migrate (PRD literal). An HTTP batch endpoint the consumer (or an external orchestrator) POSTs to.
    • Pros: matches PRD prose; reusable by a future external orchestrator.
    • Cons: redundant when the consumer is in-process; adds an auth surface; the /cdp/... namespace does not exist in the router (all routes are /iag/v1, /api/v1, /private, rest_router.go).

Decision Option A. The only HTTP surfaces are the /private/notes/migrate trigger and the /private/notes/migration/status monitor (Decision 2). Insert is a direct, idempotent repository write.

Rationale Anti-hallucination grounding: there is no /cdp route group; the existing migration framework is in-process. Building a self-called HTTP endpoint would fork the convention for no functional gain.

Consequences A new CreateNotesBatch repo method (skip-on-conflict via the partial unique index, Decision 4) and a payload struct. If a future cross-service caller needs batch insert, expose it then as a thin handler over the same method.

Reversibility High — adding an HTTP handler over CreateNotesBatch later is additive.


Decision 2: Trigger = gocraft/work enqueue under /private (HTTP Basic auth); status is Redis-backed

Context Bulk migration has no logged-in user, so the existing note-write path (company derived from IAG context, contact_notes_handler.go:75-79,478-486) cannot serve it. The PRD (D-3/D-11) requires S2S with an explicit per-batch company_sso_id, a job-enqueue trigger (not synchronous HTTP), and a status endpoint under the house namespace.

Options considered

  • Option A — mirror ActivityLogMigration. POST /private/notes/migrate (BasicAuth) → NotesMigrationService.ValidateAndEnqueue → EnqueueJob(NotesMigrationJobName, …); consumer ProcessNotesMigrationJob(job *work.Job) reads job.Args["data"]; status stored in Redis (key like notes_migration:{cid}, TTL 7d) and read by GET /private/notes/migration/status?cid=.
    • Pros: exact reuse of the proven framework (activity_log_migration_consumer.go:25, activity_log_migration_service.go:64-86, status route rest_router.go:74); BasicAuth is the established S2S mechanism; async survives the request lifecycle.
    • Cons: status is Redis (ephemeral, 7d TTL) — acceptable; the permanent audit map of legacy_crm_note_id → CDP note id lives in the contact_notes documents themselves (PRD §7.1).
  • Option B — synchronous HTTP migrate. Reject — a >21k-note CID exceeds any HTTP timeout; the framework is built for async.
  • Option C — new bearer/system-token auth. Reject — contact-service has no bearer S2S middleware; /private + /api/v1 are guarded by mymiddleware.BasicAuth (rest_router.go:70,79,280; basic_auth.go:10). The S2S field_properties migrate also uses BasicAuth (rest_router.go:344-349).
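Option A has the consumer read its parameters from job.Args["data"]. As a rough sketch of that step, assuming the arguments arrive as a generic map[string]interface{} after the queue round trip: the migrationParams struct and its cid / company_sso_id field names are hypothetical, and the plain map stands in for job.Args so the example needs no gocraft/work import.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// migrationParams is a hypothetical payload; the field names are assumptions.
type migrationParams struct {
	CID          string `json:"cid"`
	CompanySSOID string `json:"company_sso_id"`
}

// decodeParams pulls args["data"] and round-trips it through JSON, which
// turns the untyped nested map into a typed struct in one step.
func decodeParams(args map[string]interface{}) (migrationParams, error) {
	var p migrationParams
	raw, ok := args["data"]
	if !ok {
		return p, errors.New("missing data argument")
	}
	b, err := json.Marshal(raw)
	if err != nil {
		return p, err
	}
	if err := json.Unmarshal(b, &p); err != nil {
		return p, err
	}
	// Bulk migration has no user context, so both identifiers must be explicit.
	if p.CID == "" || p.CompanySSOID == "" {
		return p, errors.New("cid and company_sso_id are required")
	}
	return p, nil
}

func main() {
	args := map[string]interface{}{
		"data": map[string]interface{}{"cid": "cid-001", "company_sso_id": "sso-abc"},
	}
	p, err := decodeParams(args)
	fmt.Println(p, err)
}
```

Rejecting a payload without an explicit company_sso_id up front keeps the consumer from ever falling back to a user-derived company.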

Decision Option A. Endpoints under /private/notes/..., BasicAuth. This is a grounded deviation from the PRD's /cdp/notes/migrate path — the repo's S2S namespace is /private and its S2S auth is HTTP Basic.

Rationale Maximum reuse + correct auth grounding. NOTES-MIG-S01/ERR-4 (reject non-S2S) is satisfied by BasicAuth: a logged-in IAG user token is simply not accepted on /private.
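To illustrate why a Basic-auth guard satisfies ERR-4, here is a minimal standard-library sketch. It is not the repo's mymiddleware.BasicAuth, whose implementation is not shown in this RFC. The credentials, the 202 response code, and all helper names are hypothetical.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// basicAuth admits only requests with matching HTTP Basic credentials.
// r.BasicAuth() reports ok=false for any other scheme, including Bearer.
func basicAuth(user, pass string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		u, p, ok := r.BasicAuth()
		userOK := subtle.ConstantTimeCompare([]byte(u), []byte(user)) == 1
		passOK := subtle.ConstantTimeCompare([]byte(p), []byte(pass)) == 1
		if !ok || !userOK || !passOK {
			w.Header().Set("WWW-Authenticate", `Basic realm="private"`)
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// guardedTrigger wraps a stand-in enqueue handler with the guard.
func guardedTrigger(user, pass string) http.Handler {
	trigger := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusAccepted) // hypothetical success code
	})
	return basicAuth(user, pass, trigger)
}

func withBasic(user, pass string) func(*http.Request) {
	return func(r *http.Request) { r.SetBasicAuth(user, pass) }
}

func withBearer(token string) func(*http.Request) {
	return func(r *http.Request) { r.Header.Set("Authorization", "Bearer "+token) }
}

// statusFor sends one in-memory request and returns the response code.
func statusFor(h http.Handler, setAuth func(*http.Request)) int {
	req := httptest.NewRequest(http.MethodPost, "/private/notes/migrate", nil)
	setAuth(req)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := guardedTrigger("ops", "s3cret")
	fmt.Println("basic: ", statusFor(h, withBasic("ops", "s3cret")))
	fmt.Println("bearer:", statusFor(h, withBearer("iag-user-token")))
}
```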

Consequences Redis status is ephemeral (7d). The per-record failed queue (reason codes, 30d, PRD §7.1) and the permanent audit map are separate: the audit map is intrinsic (each migrated note stores its legacy_crm_note_id); the failed queue is a structured log stream (OQ-3 retry policy).

Reversibility High — endpoints are additive; framework is reused.


Decision 3: Caller-set timestamps — the migrate write path bypasses SetDefaults()

Context ContactNote.SetDefaults() unconditionally overwrites CreatedAt/UpdatedAt with time.Now() (base.go:51-54) and is called in create.go:12 before the Mongo insert. Migrated notes must keep their original CRM timestamps so the existing UI renders them in correct reverse-chronological order (NOTES-MIG-S05/AC-1).

Options considered

  • Option A — CreateNotesBatch sets timestamps explicitly and never calls SetDefaults() (it sets IsDeleted=false/Attachments=[] itself for the fields SetDefaults would otherwise initialise). Pros: surgical; leaves the single-CRUD create.go path untouched. Cons: must replicate the non-timestamp defaults SetDefaults provides.
  • Option B — add a flag to SetDefaults() to skip timestamp overwrite. Cons: changes a shared method used by the live single-CRUD path; higher blast radius.

Decision Option A.

Rationale Lowest blast radius — the live note-create path is unchanged.

Consequences CreateNotesBatch owns default initialisation for migrated rows; a unit test asserts the stored created_at equals the CRM value (not insert time).
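
The difference between the two write paths can be sketched as follows. The structs are trimmed stand-ins for those in base.go, and prepareMigrated and crmTime are hypothetical helper names; the real CreateNotesBatch may organise this differently.

```go
package main

import (
	"fmt"
	"time"
)

// Attachment and ContactNote are trimmed stand-ins for the structs in base.go.
type Attachment struct{ URL, Type string }

type ContactNote struct {
	Note            string
	LegacyCrmNoteID string
	Attachments     []Attachment
	IsDeleted       bool
	CreatedAt       time.Time
	UpdatedAt       time.Time
}

// SetDefaults imitates the live single-CRUD path as described above: besides
// the other defaults, it always overwrites both timestamps with time.Now().
func (cn *ContactNote) SetDefaults() {
	now := time.Now()
	cn.IsDeleted = false
	if cn.Attachments == nil {
		cn.Attachments = []Attachment{}
	}
	cn.CreatedAt, cn.UpdatedAt = now, now
}

// prepareMigrated (hypothetical) initialises the non-timestamp defaults itself
// and copies the CRM timestamps through; it never calls SetDefaults.
func prepareMigrated(cn *ContactNote, crmCreated, crmUpdated time.Time) {
	cn.IsDeleted = false
	if cn.Attachments == nil {
		cn.Attachments = []Attachment{}
	}
	cn.CreatedAt = crmCreated.UTC()
	cn.UpdatedAt = crmUpdated.UTC()
}

// crmTime builds a UTC timestamp from Unix seconds (demo helper).
func crmTime(unixSec int64) time.Time { return time.Unix(unixSec, 0).UTC() }

func main() {
	n := ContactNote{Note: "<b>call back</b>", LegacyCrmNoteID: "101"}
	prepareMigrated(&n, crmTime(1500000000), crmTime(1500000600))
	fmt.Println(n.CreatedAt.Format(time.RFC3339), len(n.Attachments), n.IsDeleted)
}
```

The unit test named in the Consequences would assert exactly what main prints here: the stored created_at is the CRM value, not the insert time.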

Reversibility High — internal repo method.


Decision 4: Idempotency via legacy_crm_note_id + a PARTIAL unique index

Context Re-runnable migration requires skip-on-conflict. ContactNote has no legacy_crm_note_id (base.go:26-36) and contact_notes has four non-unique indexes only (db/migrations/013_create_contact_notes.up.json).

Critical grounding (correctness). A naive unique index on (company_sso_id, legacy_crm_note_id) would break every existing human-created note: those documents have no legacy_crm_note_id, MongoDB indexes a missing field as null, and the second such note per company collides on the null key → E11000 duplicate key. The index must be partial, indexing only documents where the field exists.

Options considered

  • Option A — partial unique index + idempotent upsert. Index key {company_sso_id:1, legacy_crm_note_id:1} with partialFilterExpression: {legacy_crm_note_id: {$exists: true}}. CreateNotesBatch writes via the existing IDbRepo.BulkUpdate(ctx, "contact_notes", []mongo.WriteModel) (db.go:180-181, already BulkWrite(SetOrdered(false))), building one mongo.NewUpdateOneModel().SetFilter(bson.M{"company_sso_id":…, "legacy_crm_note_id":…}).SetUpdate(bson.M{"$setOnInsert": note}).SetUpsert(true) per note. This is idempotent: an already-migrated note matches the filter, so $setOnInsert writes nothing, no E11000 is thrown, and none needs catching (REV-3/REV-6). result.UpsertedCount = newly inserted; result.MatchedCount = skipped. The per-CID lock (§2.E) means there is no concurrent writer, so upsert + partial unique index is race-free. Pros: existing notes untouched; safe re-runs; uses an existing repo method. Cons: must remember the partial filter (captured here + in the migration JSON).
  • Option B — plain unique index. Reject — corrupts existing data on first collision.
  • Option C — app-level "check-then-insert". Reject — race-prone; the DB constraint is the correct backstop.

Decision Option A. Migration JSON (db/migrations/NNN_index_contact_notes_legacy_crm_note_id.up.json) adds a createIndexes entry with "unique": true + the partialFilterExpression, following the existing 013_create_contact_notes.up.json JSON pattern; .down.json drops it.

Rationale The partial filter is the only correct way to add uniqueness to a collection whose legacy rows lack the field.

Consequences CreateNotesBatch counts result.MatchedCount as notes_skipped and result.UpsertedCount as notes_migrated — no duplicate-key error path exists to handle (the upsert is the skip). Note: CreateNotesBatch is implemented over BulkUpdate (upsert), not IDbRepo.CreateMany (db.go:123, a default-ordered InsertMany that would abort the whole batch on the first duplicate). NOTES-MIG-S03/AC-2/AC-3 verified by integration test.
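
The counting rule can be illustrated without a database. The sketch below models each write as a driver-agnostic filter + $setOnInsert pair and applies it to an in-memory map that imitates upsert semantics. upsertOp, buildUpsert, memStore and bulkApply are hypothetical names, and the map is only a simulation of the behaviour this RFC expects from BulkWrite, not the Mongo driver itself.

```go
package main

import "fmt"

// upsertOp is a driver-agnostic stand-in for one update-one write model.
type upsertOp struct {
	Filter map[string]string // equality filter on the idempotency key
	Update map[string]any    // {"$setOnInsert": note}
	Upsert bool
}

// buildUpsert shapes one idempotent write for a migrated note.
func buildUpsert(companySsoID, legacyID string, note map[string]any) upsertOp {
	return upsertOp{
		Filter: map[string]string{"company_sso_id": companySsoID, "legacy_crm_note_id": legacyID},
		Update: map[string]any{"$setOnInsert": note},
		Upsert: true,
	}
}

// memStore imitates the collection, keyed by (company_sso_id, legacy_crm_note_id).
type memStore map[[2]string]map[string]any

// bulkApply mimics upsert semantics: a matching document is left untouched
// (counted as matched), a missing one is inserted (counted as upserted).
func (s memStore) bulkApply(ops []upsertOp) (upserted, matched int) {
	for _, op := range ops {
		key := [2]string{op.Filter["company_sso_id"], op.Filter["legacy_crm_note_id"]}
		if _, exists := s[key]; exists {
			matched++
			continue
		}
		if op.Upsert {
			doc, _ := op.Update["$setOnInsert"].(map[string]any)
			s[key] = doc
			upserted++
		}
	}
	return upserted, matched
}

func main() {
	s := memStore{}
	ops := []upsertOp{
		buildUpsert("company-1", "101", map[string]any{"note": "first"}),
		buildUpsert("company-1", "102", map[string]any{"note": "second"}),
	}
	migrated, skipped := s.bulkApply(ops)
	fmt.Println("run 1:", migrated, skipped)
	migrated, skipped = s.bulkApply(ops) // re-run of the same batch
	fmt.Println("run 2:", migrated, skipped)
}
```

In this model a re-run reports every note as skipped and none as migrated, which is the property NOTES-MIG-S03 is meant to verify against the real collection.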

Reversibility High — make migrate-down drops the index; data untouched.


Decision 5: Server-side HTML sanitization (net-new) + strip mention anchors to plain text; no <p> re-wrap

Context CRM note content is sanitized rich HTML at write time in Rails (crm/note.rb:43 before_save :sanitize_note → :379 sanitize_note → application_helper.rb:296-325, Rails sanitize allowing a b i strong em u s span br div p ul ol li blockquote h1-h6 pre + attrs incl. data-user-id). CDP performs no server-side sanitization today — content is length-validated only (contact_notes_service.go:268-274). The migrate write path is a new ingestion surface for externally-sourced HTML, so it must sanitize defensively (XSS posture; PRD D-4, Constraint §7 "Security"). Mentions are embedded as <a data-user-id="…"> anchors and /users/{id}/edit_user links referencing CRM integer user IDs (crm/note.rb:99-107) that do not resolve in CDP (PRD D-8).

Options considered

  • Option A — sanitize with a Go allow-list sanitizer (bluemonday), policy mirroring CRM's allowed tags, and a pre-pass that replaces mention anchors with their inner @Name text. Pros: defends the new ingestion surface; preserves safe markup; kills dangling mention links. Cons: adds a go.mod dependency (bluemonday) — verify license/approval.
  • Option B — trust CRM (already sanitized), store as-is. Reject — CDP would inherit CRM's allow-list decisions for an unauthenticated bulk write path; defense-in-depth requires CDP to sanitize at its own boundary; mention anchors would remain as dead links.
  • Option C — strip all HTML to plain text / wrap in <p> (v1.1 assumption). Reject — corrupts the rich markup the UI renders via DOMPurify; the PRD explicitly forbids <p> re-wrap (§9.1, alternatives-rejected).

Decision Option A, with a deny-by-default policy specified explicitly (do not simply "mirror CRM" — CRM's Rails allow-list permits style/class/data-mce-href, and style enables CSS-based UI-redress while an unscoped href permits javascript:/data: URIs). Concrete bluemonday policy:

  • Start from bluemonday.UGCPolicy() (strips style, scripts, event handlers).
  • Allow only the structural tags a b i strong em u s span br div p ul ol li blockquote h1 h2 h3 h4 h5 h6 pre.
  • On <a>: AllowStandardURLs() (http/https/mailto only — no javascript:/data:) + RequireNoFollowOnLinks(true); drop style.
  • Pre-pass: replace every data-user-id/data-mention anchor (and /users/{id}/edit_user links) with its visible text prefixed @, then sanitize.
  • Do not wrap output in <p>.
  • Order: sanitize first, then validate the sanitized output against the existing max=10000 length rule (contact_notes_service.go:271-274) — a note exceeding 10000 chars post-sanitize is counted a failure, never silently truncated.
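
The mention pre-pass and the post-sanitize length check from the list above can be sketched with the standard library alone. This is an illustration under assumptions: regular expressions stand in for a real HTML tokenizer, the bluemonday pass itself is omitted, the length is counted in runes (the existing validator may count differently), and stripMentions, checkLength and the NOTE_TOO_LONG code are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
	"unicode/utf8"
)

var (
	// anchorRe captures the attribute string and the inner HTML of each <a>.
	anchorRe = regexp.MustCompile(`(?is)<a\b([^>]*)>(.*?)</a>`)
	// mentionRe recognises the mention markers named in the policy.
	mentionRe = regexp.MustCompile(`(?i)data-user-id|data-mention|/users/\d+/edit_user`)
	tagRe     = regexp.MustCompile(`<[^>]*>`)

	errTooLong = errors.New("NOTE_TOO_LONG") // hypothetical reason code
)

// stripMentions replaces every mention anchor with its visible text prefixed
// by "@"; ordinary links are returned unchanged for the sanitizer to handle.
func stripMentions(html string) string {
	return anchorRe.ReplaceAllStringFunc(html, func(a string) string {
		m := anchorRe.FindStringSubmatch(a)
		if !mentionRe.MatchString(m[1]) {
			return a
		}
		text := strings.TrimSpace(tagRe.ReplaceAllString(m[2], ""))
		if text == "" || strings.HasPrefix(text, "@") {
			return text
		}
		return "@" + text
	})
}

// checkLength runs after sanitization: an over-long note is reported as a
// failure and is never truncated.
func checkLength(sanitized string, max int) error {
	if utf8.RuneCountInString(sanitized) > max {
		return errTooLong
	}
	return nil
}

func main() {
	in := `<p>Hi <a data-user-id="12" href="/users/12/edit_user">Budi</a></p>`
	out := stripMentions(in)
	fmt.Println(out, checkLength(out, 10000))
}
```

A production HtmlNormalizer would more likely walk a parsed token stream than rely on regular expressions, since attribute values containing ">" defeat the pattern used here.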

Rationale A bulk S2S write of externally-sourced HTML is exactly where server-side sanitization belongs; deny-by-default closes the stored-XSS gap (CDP has no sanitization today) without inheriting CRM's looser attribute allow-list; mention-stripping prevents dangling links + false notifications (Out of Scope #2).

Consequences New dependency (bluemonday, InfoSec approval OQ-10) + a HtmlNormalizer unit suite: XSS payloads (<script>, onerror=, javascript: href, style exfil) all neutralised; mention-strip; malformed HTML → best-effort + warning; no <p> wrap; post-sanitize length enforced.

Reversibility Medium — sanitization is internal; the allow-list can be tuned.


Decision 6: Unmappable owner → owner_id=null + a net-new legacy_owner_label

Context owner_name is resolved live from identity, not stored (contact_notes_service.go:131-136 GetUserNamesBulk); edit/delete permission is computed live (contact_notes_handler.go:143-166). A migrated note whose CRM creator_id has no SSO mapping would render a blank author and hidden edit/delete. ContactNote has no field to carry a fallback label (base.go:26-36).

Options considered

  • Option A — add legacy_owner_label string to ContactNote (net-new, schemaless → no DDL). When creator_id→SSO fails, set owner_id=null + legacy_owner_label (e.g. "[Legacy CRM User]" or the CRM display name). The render path falls back to the label when owner_id is empty. Pros: author always renders; non-blocking. Cons: the existing render path (contact_notes_service.go:131-136) must learn to use the label (a small, bounded read-path change — note this is the only read-path touch and it does not alter the UI contract).
  • Option B — drop unmappable notes. Reject — loses history (the whole point).
  • Option C — store a sentinel owner_id. Reject — pollutes identity space; the label is cleaner.

Decision Option A.

Rationale Preserves author display without inventing identities; degrades gracefully (NOTES-MIG-S02/ERR-3, S05/AC-3).
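
The read-path branch this decision introduces is small. A minimal sketch, assuming the bulk name lookup yields a map from owner ID to display name; authorName is a hypothetical helper, not the existing service code.

```go
package main

import "fmt"

// authorName picks the display name for a note. names stands in for the
// result of the live bulk lookup (owner_id -> display name).
func authorName(ownerID, legacyOwnerLabel string, names map[string]string) string {
	if ownerID != "" {
		if name, ok := names[ownerID]; ok {
			return name
		}
	}
	// owner_id is empty (or unresolved): fall back to the migrated label.
	return legacyOwnerLabel
}

func main() {
	names := map[string]string{"sso-1": "Dewi"}
	fmt.Println(authorName("sso-1", "", names))               // human-created or mapped note
	fmt.Println(authorName("", "[Legacy CRM User]", names)) // unmappable CRM creator
}
```

Falling back to the label when a non-empty owner_id fails to resolve is an extra assumption of this sketch; the RFC only requires the fallback when owner_id is empty.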

Consequences Net-new field + a render-path branch; edit/delete may be hidden for label-only authors (acceptable for historical notes — OQ-6).

Reversibility High — additive field.


Decision 7: Resolve Person→Contact via crm_data.id (indexed), not source_id; precedence for multi-FK

Context The CDP contact stores the CRM linkage as Source, SourceID, SourceName (contact/base.go:68-70) and CrmData{ID} (:53,342-344). An index crm_contact_index exists on crm_data.id (db/migrations/001_create_contact.up.json); no index exists on source_id. A CRM note can carry person/company/deal/ticket FKs simultaneously — type is STI metadata, not a constraint (crm/note.rb:5-8, schema type column).

Options considered

  • Option A — resolve by crm_data.id using the purpose-built ContactRepository.SearchByAppContactID(ctx, "crm", crmPersonID) (contact/search.go:27), whose appNameColumnMapper maps "crm" → "crm_data.id" (contact/base.go:531-538); for batches, call it per id or extend it to an $in variant. Apply person-first precedence for multi-FK notes. Pros: hits the existing crm_contact_index; reuses a method built exactly for "resolve by source app + contact id"; crm_data.id == crm_person_id is confirmed (REV-1, see Assumptions). Cons: none material — coverage (not id-space) is the only variable (OQ-2).
  • Option B — resolve by source_id. Reject as primary — unindexed → slow scans over ~130 CIDs; keep only as fallback.
  • Option C — net-new external mapping table. Reject as primary — the linkage already exists on the contact (PRD D-12); a table is a last-resort fallback where crm_data.id coverage is incomplete (OQ-2).

Decision Option A.

Rationale Uses the indexed field and a purpose-built query method; the id-space is confirmed (REV-1), so the only variable is coverage, measured cheaply via CountDocuments/CountWithFilters for the pre-migration report.

Consequences OQ-2 is now a coverage gate only (not id-space): run the per-CID coverage report and gate job start at ≥99%; unmatched notes → CONTACT_NOT_MAPPED failed queue. crm_data.id is a string, so the resolver string-casts the CRM crm_person_id before lookup.
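
The two rules in this decision (person-first precedence with a string cast, and the 99% coverage gate) reduce to a few lines. The sketch assumes CRM foreign keys arrive as nullable integers and that an empty CID passes the gate vacuously; crmNote, personLookupKey, coverageOK and ptr are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
)

// crmNote carries the nullable foreign keys a CRM note may hold simultaneously.
type crmNote struct {
	ID        int64
	PersonID  *int64
	CompanyID *int64
	DealID    *int64
	TicketID  *int64
}

// personLookupKey applies person-first precedence: when a person FK is present
// the note resolves by person regardless of other FKs. The integer is cast to
// a string because crm_data.id is stored as a string.
func personLookupKey(n crmNote) (string, bool) {
	if n.PersonID == nil {
		return "", false
	}
	return strconv.FormatInt(*n.PersonID, 10), true
}

// coverageOK is the pre-migration gate: at least 99% of notes must resolve.
// Integer arithmetic avoids floating-point edge cases at the threshold.
func coverageOK(matched, total int) bool {
	if total == 0 {
		return true // assumption: nothing to migrate counts as covered
	}
	return matched*100 >= total*99
}

// ptr is a demo helper for building nullable FKs.
func ptr(v int64) *int64 { return &v }

func main() {
	key, ok := personLookupKey(crmNote{ID: 1, PersonID: ptr(42), DealID: ptr(7)})
	fmt.Println(key, ok, coverageOK(990, 1000), coverageOK(989, 1000))
}
```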

Reversibility High — resolution strategy is internal; a mapping-table fallback is additive.


Decision 8: Re-upload attachments (images, audios, documents) to CDP company-scoped storage


Context A CRM note has three attachment associations, all CarrierWave/S3 assets: crm_note_images (crm/note.rb:21), crm_note_audios (:23), and the documents association missed by v1.1: crm_note_attachment has_one (:15) + crm_note_attachments has_many (:20), model Crm::NoteAttachment (note_attachment.rb, allowed types incl. PDF/Word/Excel/PPT/CSV/images/video/audio, note_attachment_uploader.rb:13-50). CDP's ContactNote.Attachments[] is {URL, Type, FileSizeInByte, FileSize, FileName} with Type ∈ {image,doc,pdf,video,voice_note,xlsx} (base.go:17-23, validation contact_notes_service.go:286-293). CRM S3 is public-read (URLs generally fetchable without signing, carrierwave-s3.rb:27,58).

Options considered

  • Option A — download each asset, re-upload to CDP {company_sso_id}/... storage, store the proxy URL + derived Type. Pros: no permanent CRM-S3 dependency (alternatives-rejected); company-scoped (tenant isolation). Cons: download+upload latency (PRD budget ≤30s/file P95); type-mapping work.
  • Option B — store CRM S3/CDN URLs directly. Reject — permanent legacy dependency; cross-tenant URL exposure.

Decision Option A. Type mapping: CRM image asset → image; audio → voice_note (or video for video/* per content type); document → doc/pdf/xlsx by file extension/content type (default doc). No ≤1 voice_note cap is enforced today (contact_notes_service.go:286-293) — multiple audios are allowed; OQ-8 decides whether to add a cap. Download safety (required): (a) SSRF guard — only fetch URLs whose host is on an allow-list of the CRM S3/CDN domains (reject arbitrary hosts, internal IPs, and cloud metadata endpoints); the URL comes from CRM API response data and must not be trusted blindly; (b) verify the downloaded magic bytes / content-type match the declared Type (the extension is attacker-influenceable); (c) enforce a max download size. Storage key (deterministic, idempotent on re-run): {company_sso_id}/{legacy_crm_note_id}/{asset} — a re-run overwrites the same key safely (matches §2.E).
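
The type mapping and the deterministic storage key can be sketched as pure functions. The association names passed as kind ("image", "audio", "document"), the treatment of .xls as xlsx, and the helper names mapType and storageKey are assumptions; anything the mapping does not recognise falls to the stated default doc.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// mapType derives the CDP attachment Type from the CRM association the asset
// came from plus its content type and file name.
func mapType(kind, contentType, fileName string) string {
	ct := strings.ToLower(contentType)
	ext := strings.ToLower(path.Ext(fileName))
	switch kind {
	case "image":
		return "image"
	case "audio":
		if strings.HasPrefix(ct, "video/") {
			return "video"
		}
		return "voice_note"
	}
	// Document association: decide by extension or content type.
	switch {
	case ext == ".pdf" || ct == "application/pdf":
		return "pdf"
	case ext == ".xlsx" || ext == ".xls" || strings.Contains(ct, "spreadsheet") || ct == "application/vnd.ms-excel":
		return "xlsx" // treating legacy .xls as xlsx is an assumption of this sketch
	}
	return "doc" // stated default
}

// storageKey builds {company_sso_id}/{legacy_crm_note_id}/{asset}. Only the
// base name of the asset is kept so a crafted name cannot escape the prefix.
func storageKey(companySsoID, legacyNoteID, asset string) string {
	return strings.Join([]string{companySsoID, legacyNoteID, path.Base(asset)}, "/")
}

func main() {
	fmt.Println(mapType("audio", "video/mp4", "clip.mp4"))
	fmt.Println(mapType("document", "", "Report.PDF"))
	fmt.Println(storageKey("company-1", "101", "uploads/a.png"))
}
```

Because the key depends only on the tenant, the legacy note ID and the asset name, a re-run writes to the same object rather than creating a duplicate.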

Rationale Matches PRD §9 #6 and the data-lifecycle (CDP holds its own copy); the SSRF/content-type guards harden a new outbound-fetch path against malicious or malformed CRM URLs.
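
The download guards can be sketched with the standard library. The host crm-assets.example.com is a placeholder for the real CRM S3/CDN domains, requiring https is an added assumption, and checkURL and readCapped are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
	"net/http"
	"net/url"
	"strings"
)

var (
	errBlockedURL = errors.New("attachment URL not allowed")
	errTooLarge   = errors.New("attachment exceeds size cap")
)

// checkURL admits only https URLs whose host is on the allow-list. IP literals
// (internal ranges, cloud metadata endpoints) are rejected outright.
func checkURL(raw string, allowHosts map[string]bool) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("%w: %v", errBlockedURL, err)
	}
	host := strings.ToLower(u.Hostname())
	switch {
	case u.Scheme != "https":
		return fmt.Errorf("%w: scheme %q", errBlockedURL, u.Scheme)
	case net.ParseIP(host) != nil:
		return fmt.Errorf("%w: IP literal %s", errBlockedURL, host)
	case !allowHosts[host]:
		return fmt.Errorf("%w: host %q", errBlockedURL, host)
	}
	return nil
}

// readCapped reads at most max bytes and fails if the body is larger.
func readCapped(r io.Reader, max int64) ([]byte, error) {
	b, err := io.ReadAll(io.LimitReader(r, max+1))
	if err != nil {
		return nil, err
	}
	if int64(len(b)) > max {
		return nil, errTooLarge
	}
	return b, nil
}

func main() {
	allow := map[string]bool{"crm-assets.example.com": true}
	fmt.Println(checkURL("https://169.254.169.254/latest/meta-data/", allow))
	fmt.Println(checkURL("https://crm-assets.example.com/n/1/a.png", allow))

	body, err := readCapped(strings.NewReader("%PDF-1.7 demo"), 1<<20)
	// Sniff the leading bytes instead of trusting the declared extension.
	fmt.Println(http.DetectContentType(body), err)
}
```

A host allow-list alone does not cover redirects or DNS answers that point at internal addresses; a real fetcher would also need to disable or re-check redirects.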

Consequences A per-attachment failure inserts the note without that attachment and logs ATTACHMENT_*_FAILED (non-blocking — NOTES-MIG-S02/ERR-2).

Reversibility Medium — re-uploaded objects would need cleanup if reverted.


Decision 9: Drop crm_checkin geolocation (deliberate data loss)

Context crm_checkin is has_one :crm_checkin, class_name 'Crm::Checkin' (crm/note.rb:16); Crm::Checkin < Crm::Location (STI on crm_locations), with longitude, lattitude (sic), address (now Lockbox-encrypted address_ciphertext, schema crm_locations :1789), checkin_time. CDP Notes have no geolocation field and no PRD requirement to render one.

Decision Do not migrate check-in geolocation. Log a per-note marker when a note has a check-in so the data loss is auditable (PRD D-10).

Rationale No CDP target field, no requirement; mischaracterising it as a string (v1.1) hid real data loss. Encrypted address would additionally require key access.

Consequences Documented, audited data loss; revisit only if a CDP geo field is added later.

Reversibility N/A — explicit non-goal.


Decision 10: Extract via the existing QontakCrmClient; CRM squad delivers a net-new org-scoped endpoint

Context The assumed GET /crm/notes?organization_id&limit&offset does not exist. The real v4 API (api/v4/notes.rb:131-147) is entity-scoped (requires lead/company/deal/ticket) and does not actually paginate. There is no org-scoped bulk or count endpoint. But contact-service already has an authenticated S2S CRM client — QontakCrmClient (qontak_crm.go:14-24, CRM_API_ROOT_URL/CRM_API_AUTH, posting to /crm/centralized_contacts/*).

Options considered

  • Option A — CRM squad adds a net-new org-scoped Person-notes extraction endpoint (paginated by page/per_page, returning HTML, creator_id, images/audios/documents, crm_note_type_id, timestamps), and contact-service consumes it by extending QontakCrmClient with ListPersonNotes(ctx, cid, page, perPage). Pros: reuses the existing authenticated client + error handling (qontak_crm.go:43-47 5xx/Locked/429 handling); CRM owns its data access. Cons: cross-squad dependency (blocking).
  • Option B — direct Postgres read of crm_notes (e.g. via Bifrost) using Crm::PersonNote.where(organization_id: cid). Pros: no CRM API work. Cons: organization_id has no index on crm_notes (schema :2004-2008) → heavy scans; couples CDP to CRM's physical schema; bypasses CRM's read auth. Keep as a fallback (OQ-1).

Decision Option A as default; Option B (Bifrost/DB read) as the fallback if the CRM endpoint slips (OQ-1). Extraction throughput is load-tested before Internal QA (OQ-7).

Rationale Reuses a proven, authenticated client; respects service boundaries; the unindexed organization_id read makes the raw DB path costly.

Consequences Blocking cross-squad dependency on the CRM endpoint; an extension method + payload structs in contact-service. Client/timeout (REV-2): the current QontakCrmClient uses http.DefaultClient.Do with no timeout (qontak_crm.go:37) — ListPersonNotes must instead use the repo's standard heimdall httpclient pattern (httpclient.NewClient(WithHTTPTimeout(timeout)), as in api/iag_mekari.go:69-71, qontak_billing.go:183-185) with the timeout from a config duration getDurationOrPanic("CRM_NOTES_EXTRACT_TIMEOUT") (default 10s) and a heimdall retrier: 3 attempts, exponential backoff 1s / 3s / 9s, retrying on timeout + 5xx/Locked/429 (matching qontak_crm.go:43-47); after the budget is exhausted → CRM_EXTRACT_FAILED halt (PRD §9 #2).
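
The retry budget can be sketched independently of heimdall. The code below reads "3 attempts, 1s / 3s / 9s" as one initial call followed by up to three retries with those waits; that reading, the errRetryable sentinel (standing in for timeout, 5xx, Locked and 429), and the helper names are assumptions.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errRetryable stands in for timeout / 5xx / Locked / 429 responses.
var errRetryable = errors.New("retryable CRM error")

// backoff holds the wait before each retry.
var backoff = []time.Duration{1 * time.Second, 3 * time.Second, 9 * time.Second}

// withRetry calls fn until it succeeds, fails with a non-retryable error, or
// the retry budget is spent. sleep is injected so tests need not wait.
func withRetry(ctx context.Context, sleep func(context.Context, time.Duration) error, fn func() error) error {
	for attempt := 0; ; attempt++ {
		err := fn()
		if err == nil || !errors.Is(err, errRetryable) {
			return err
		}
		if attempt == len(backoff) {
			return fmt.Errorf("CRM_EXTRACT_FAILED after %d calls: %w", attempt+1, err)
		}
		if serr := sleep(ctx, backoff[attempt]); serr != nil {
			return serr // context cancelled while waiting
		}
	}
}

// sleepCtx is the production-style sleep: it returns early on cancellation.
func sleepCtx(ctx context.Context, d time.Duration) error {
	t := time.NewTimer(d)
	defer t.Stop()
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-t.C:
		return nil
	}
}

// simulate runs withRetry against a call that fails the first `failures`
// times, recording the waits instead of sleeping.
func simulate(failures int) (calls int, waitedSec []int, err error) {
	sleep := func(_ context.Context, d time.Duration) error {
		waitedSec = append(waitedSec, int(d/time.Second))
		return nil
	}
	err = withRetry(context.Background(), sleep, func() error {
		calls++
		if calls <= failures {
			return errRetryable
		}
		return nil
	})
	return calls, waitedSec, err
}

func main() {
	fmt.Println(simulate(2))
	fmt.Println(simulate(10))
}
```

In the consumer, sleepCtx would be passed as the sleep argument so a cancelled job stops waiting immediately.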

Reversibility Medium — the extractor is behind an interface; swap to Bifrost is one implementation.


Decision 11: Notes-only scope (filter on crm_note_type_id)

Context The crm_notes table stores an activity taxonomy via crm_note_type_id → Crm::NoteType (crm/note.rb:4, table crm_note_types, seed: Notes, Calls, Emails, Meeting, Tickets, Documents, Tasks, Whatsapp, Telegram, SMS, … note_type.rb:18). Migrating everything would flood the Notes panel with calls/emails/etc. (PRD OQ-4, NOTES-MIG-S06-NEG/NEG-2).

Decision Default notes-only: the extractor/consumer filters to note-type entries (e.g. crm_note_type_id IN (Notes, Documents) — the report queries already treat (1,6) as notes/documents, crm/note.rb:166,299,312). Confirm the exact type-id set with PM (OQ-4); excluded entries are counted out-of-scope, not failures.

Rationale Avoids polluting the Notes panel; matches the PRD default.

Consequences The exact crm_note_type_id set is a PM-confirmed config value; NOTES-MIG-S02/ERR-4 + S06-NEG/NEG-2 verified by test.

Reversibility High — the filter is config.


Detail 2.0 — Repo Reading Guide

Repo Map (slice this RFC touches)

flowchart LR
subgraph cs["contact-service/internal/"]
rr["server/rest_router.go<br/>(/private group)"]
h["app/handler/<br/>(notes_migration_handler)"]
svc["app/service/<br/>(notes_migration_service, HtmlNormalizer)"]
cons["app/consumer/<br/>(notes_migration_consumer)"]
apic["app/api/qontak_crm.go<br/>(ListPersonNotes)"]
repoN["app/repository/contact_notes/<br/>(CreateNotesBatch, fields)"]
repoC["app/repository/contact/<br/>(SearchWithFilters)"]
wrk["worker/worker_service.go<br/>(register job)"]
enq["app/service/job_enqueuer.go"]
end
subgraph infra["infrastructure"]
mongo[("MongoDB: contacts, contact_notes")]
redis[("Redis: status + work queue")]
store[("CDP attachment storage")]
end
rr --> h --> svc --> enq --> redis
cons --> apic
cons --> repoC --> mongo
cons --> repoN --> mongo
cons --> store
cons --> redis
wrk --> cons

Existing Code Anchors

| Path | Why the agent reads it | What pattern it teaches |
|---|---|---|
| internal/app/handler/activity_log_migration_handler.go:32,77,91 | The handler to mirror | UpdateUserID → ValidateAndEnqueue; GetMigrationStatus shape |
| internal/app/consumer/activity_log_migration_consumer.go:25-50 | The consumer to mirror | ProcessUpdateUserIDJob(job *work.Job) error; reads job.Args["data"] → unmarshal → service |
| internal/app/service/activity_log_migration_service.go:22-31,64-86,115 | Service + Redis status + enqueue + batch | job-name const; Redis status key + TTL 7d; EnqueueJob; batched execute (10000) |
| internal/app/service/job_enqueuer.go:38-39,53,65-67 | How to enqueue | work.NewEnqueuer(namespace, redis); Enqueue(name, work.Q{"data": params}) |
| internal/worker/worker_service.go:100,132,138 | Register the new job | registerJob → registerJobWithOptions(jobName, opts, handler, pool) |
| internal/server/rest_router.go:69-79,344-349 | Where to register /private/notes/* + S2S auth | /private groups guarded by mymiddleware.BasicAuth; S2S migrate pattern |
| internal/pkg/middleware/basic_auth.go:10 | S2S auth mechanism | constant-time Basic-auth compare vs config.BasicAuth |
| internal/app/repository/contact_notes/base.go:17-23,26-36,39-54 | The note store to extend | Attachment + ContactNote structs; TableName()="contact_notes"; SetDefaults() overwrites ts |
| internal/app/repository/contact_notes/create.go:12 | The single-CRUD insert (do not break) | SetDefaults() then mongo.Create |
| internal/app/service/contact_notes/contact_notes_service.go:131-136,268-274,286-293 | Render path + validation rules | live owner-name resolve; length-only validation; attachment Type allow-set |
| internal/app/handler/contact_notes_handler.go:75-79,143-166,478-486 | Why a system path is needed | company from IAG ctx; live permission compute |
| internal/app/repository/contact/base.go:53,68-70,342-344 | Person→Contact linkage | Source/SourceID/SourceName; CrmData{ID} |
| internal/app/repository/contact/search.go:125,147 | Resolution + coverage query | SearchWithFilters(ctx, bson.M, …); CountWithFilters |
| internal/app/api/qontak_crm.go:14-58 | The CRM client to extend | QontakCrmClient; auth header from CRM_API_AUTH; 5xx/Locked/429 handling |
| db/migrations/013_create_contact_notes.up.json | Index migration JSON pattern | createIndexes JSON; basis for the partial unique index |
| db/migrations/001_create_contact.up.json | Existing crm_data.id index | crm_contact_index |
| config/load.go:197-198,306-314 | Config injection | getStringOrPanic("CRM_API_ROOT_URL"/"CRM_API_AUTH") |

Existing Contracts to Reuse, Extend, or Replace

| Contract | Status | Justification | Owner |
|---|---|---|---|
| POST /private/notes/migrate | new-with-justification | No migration trigger exists; mirrors /private/activity_logs PATCH-enqueue; BasicAuth S2S | CDP BE |
| GET /private/notes/migration/status | new-with-justification | No notes-migration status; mirrors rest_router.go:74 | CDP BE |
| contact_notes collection | extended | Add legacy_crm_note_id + legacy_owner_label; collection + repo exist | CDP BE |
| CreateNotesBatch repo method | new-with-justification | No batch insert exists; needed for throughput + skip-on-conflict | CDP BE |
| Partial unique index on (company_sso_id, legacy_crm_note_id) | new-with-justification | No unique index exists; must be partial (Decision 4) | CDP BE |
| QontakCrmClient | extended | Add ListPersonNotes; client + auth exist | CDP BE |
| ContactRepository.SearchWithFilters/CountWithFilters | reused | Drive with crm_data.id $in | CDP BE |
| JobEnqueuer / gocraft/work / worker registration | reused | Same as ActivityLogMigration | CDP BE |
| /cdp/notes/migrate (PRD literal HTTP batch endpoint) | replaced (not built) | In-process batch write instead (Decision 1) | CDP BE |
| CRM org-scoped Person-notes extraction endpoint | new (external) | Does not exist; v4 is entity-scoped + unpaginated | Legacy CRM Squad |
| Go HTML sanitizer (bluemonday) | new dependency | No server-side sanitization in CDP today | CDP BE |

Patterns to Follow

| Concern | Pattern in repo | Reference file | Deviation? |
|---|---|---|---|
| Handler shape | decode → validate → service → typed response | activity_log_migration_handler.go:32-91; myhttp.NewJSONResponse/ErrBadRequest | none |
| Service + enqueue | validate → EnqueueJob(name, form) → Redis status | activity_log_migration_service.go:64-86 | none |
| Queue consumer | func (w *Consumer) Method(job *work.Job) error; job.Args["data"] | activity_log_migration_consumer.go:25-50 | none |
| External HTTP (S2S) | http.NewRequest + Authorization header; 5xx/Locked/429 → retry/error | api/qontak_crm.go:26-58 | extend with ListPersonNotes |
| Repository / DB access | r.mongo.Create/Where/Update; filters as bson.M | contact_notes/create.go; contact/search.go:125 | new CreateNotesBatch (bulk) |
| Error wrapping / logging | fmt.Errorf("ctx: %w", err); slog.ErrorContext | activity_log_migration_consumer.go:27-35 | none |
| Index declaration | createIndexes JSON migration | db/migrations/013_create_contact_notes.up.json | add unique+partialFilterExpression |
| Config/secrets | getStringOrPanic(key) in config/load.go | config/load.go:197-198,306 | new keys for extraction if needed |

Reading Order for the Agent

  1. internal/app/consumer/activity_log_migration_consumer.go:25-50 — the consumer shape to mirror.
  2. internal/app/service/activity_log_migration_service.go:22-86 — enqueue + Redis status + batching.
  3. internal/app/handler/activity_log_migration_handler.go:32-91 — handler + status endpoint.
  4. internal/server/rest_router.go:69-79,344-349 — /private groups + BasicAuth S2S.
  5. internal/app/repository/contact_notes/base.go:17-54 + create.go:12 — the note store + SetDefaults pitfall.
  6. internal/app/service/contact_notes/contact_notes_service.go:131-136,268-293 — render + validation rules.
  7. internal/app/repository/contact/{base.go:53,342-344, search.go:125,147} — Person→Contact linkage + query.
  8. internal/app/api/qontak_crm.go:14-58 — the CRM client to extend.
  9. internal/worker/worker_service.go:100,132,138 + job_enqueuer.go:38-67 — job registration + enqueue.
  10. db/migrations/013_create_contact_notes.up.json + 001_create_contact.up.json — index JSON pattern + existing crm_data.id index.

Source Verification (anti-hallucination — verified 2026-06-18)

| Anchor / pattern / contract | Verified by | Evidence |
|---|---|---|
| Notes single-CRUD only; no migrate/batch/count/source | read | rest_router.go:150-159 (+ deprecated /notes :162-169); GetNotes params contact_notes_handler.go:112-125 (page/per_page/order_by/order_direction/owner_ids) |
| ContactNote has no legacy_crm_note_id/legacy_owner_label | read | contact_notes/base.go:26-36 fields: ID, ContactID, CompanySsoID, Note(max=10000), Attachments, OwnerID, IsDeleted, CreatedAt, UpdatedAt |
| Attachment shape + Type allow-set | read | base.go:17-23 {URL,Type,FileSizeInByte,FileSize,FileName}; valid types contact_notes_service.go:286-293 = image/doc/pdf/video/voice_note/xlsx; no ≤1 voice_note cap |
| SetDefaults() overwrites timestamps | read | base.go:51-54 cn.CreatedAt = now; cn.UpdatedAt = now; called create.go:12 |
| owner_name live; permission live | read | contact_notes_service.go:131-136 GetUserNamesBulk; contact_notes_handler.go:143-166 resolveNotePermission |
| company from IAG ctx (no system write path) | read | contact_notes_handler.go:75-79 extractCompanyIDFromContext; def :478-486 reads consts.CompanySSOKey |
| no server-side sanitization | read/grep | contact_notes_service.go:268-274 length-only; grep sanitize/bluemonday/policy → 0 hits |
| existing migration framework | read | activity_log_migration_handler.go:32,77,91; activity_log_migration_consumer.go:25 ProcessUpdateUserIDJob(job *work.Job) error; activity_log_migration_service.go:22 job-name const, :25 Redis key, :28 batch 10000, :31 TTL 7d, :64-86 enqueue; status route rest_router.go:74 |
| S2S = BasicAuth on /private + /api/v1 | read | rest_router.go:69-70,78-79,279-280; basic_auth.go:10; S2S migrate :344-349 /migrate-default-fields |
| EnqueueJob mechanism | read | job_enqueuer.go:38-39,53,65-67 work.NewEnqueuer; Enqueue(name, work.Q{"data":params}) |
| worker registration | read | worker_service.go:100 registerJob, :132,138 registerJobWithOptions(jobName, opts, handler, pool) |
| contact CRM linkage + index | read | contact/base.go:53 CrmData *CrmData, :68-70 Source/SourceID/SourceName, :342-344 CrmData{ID}; index crm_contact_index on crm_data.id db/migrations/001_create_contact.up.json; no source_id index |
| resolution + coverage query | read | contact/search.go:125 SearchWithFilters(ctx, bson.M, limit, page, sort); :147 CountWithFilters(ctx, bson.M) |
| existing CRM client | read | api/qontak_crm.go:14-24 QontakCrmClient; :34/:68/:101 Authorization header; :43,77,110 5xx/Locked/429 handling; config load.go:197-198 CRM_API_ROOT_URL/CRM_API_AUTH |
| notes collection + index pattern | read | contact_notes/base.go:39-41 TableName()="contact_notes"; indexes via db/migrations/013_create_contact_notes.up.json (4 non-unique); none unique |
| build/test/lint/migrate | read | Makefile: make build (go build -tags dynamic), make test (go test -race -tags dynamic ./internal/... ./config/...), make lint (staticcheck ./...), make sec (gosec), make migrate-up (golang-migrate Mongo driver; JSON migrations db/migrations/, {seq}_{name}.up.json) |
| gocraft/work version + worker entry | read | go.mod github.com/gocraft/work v0.5.1; cmd/worker |
| CRM note sanitize + tags | read | qontak.com/app/models/crm/note.rb:43 before_save :sanitize_note; :379 sanitize_note; app/helpers/application_helper.rb:296-325 Rails sanitize, tags a b i strong em u s span br div p ul ol li blockquote h1-h6 pre |
| CRM 3 attachment types | read | crm/note.rb:21 crm_note_images (Crm::NoteImage<Asset), :23 crm_note_audios, :15 has_one crm_note_attachment, :20 has_many crm_note_attachments; note_attachment.rb; types note_attachment_uploader.rb:13-50; CarrierWave→S3 public-read carrierwave-s3.rb:27,58 |
| CRM checkin geolocation | read | crm/note.rb:16 has_one :crm_checkin (Crm::Checkin<Crm::Location); geo on crm_locations db/schema.rb:1762-1798 (longitude,lattitude,address_ciphertext,checkin_time) |
| CRM multi-FK + STI | read | crm/note.rb:5-8 belongs_to crm_person/crm_company/crm_deal/tickets (all nullable int); type string column db/schema.rb:1990; Crm::PersonNote<Crm::Note app/models/crm/person_note.rb |
| CRM real API entity-scoped + no paginate | read | app/controllers/api/v4/notes.rb:131-147 params page/per_page declared but index set_entity-scoped, no .page/.per_page; no /crm/notes/count/bulk/org-scoped |
| CRM mentions via data-user-id | read | crm/note.rb:99-107 mention_people scans /users/(\d+)/edit_user + data-user-id (integer IDs) |
| CRM activity taxonomy | read | crm/note.rb:4 belongs_to :crm_note_type; crm_note_type_id db/schema.rb:1983; note_type.rb:18 seed Notes/Calls/Emails/…; (1,6) notes/documents note_type.rb refs |
| CRM hard-delete (no deleted_at); std timestamps | read | db/schema.rb:1978-2009 no deleted_at; v4 delete destroy! api/v4/notes.rb:242; created_at/updated_at :1981-1982 |
| CRM org-wide person notes: no scope/index | read | no scope/default_scope on Crm::Note/PersonNote; organization_id int db/schema.rb:1985, no index; Crm::PersonNote.where(organization_id: cid) is the raw path |

Detail 2.1 — Architecture (mermaid)

Component diagram

flowchart TB
ops([Ops S2S]) --> handler[/"NotesMigrationHandler<br/>/private/notes/migrate"/]
handler --> svc["NotesMigrationService.ValidateAndEnqueue"]
svc --> enq[["JobEnqueuer.EnqueueJob<br/>NotesMigrationJobName"]]
svc --> redis[("Redis status")]
enq --> queue[["gocraft/work (Redis)"]]
queue --> cons["NotesMigrationConsumer.ProcessNotesMigrationJob"]
cons --> ext["QontakCrmClient.ListPersonNotes"]
cons --> resolver["ContactResolver (SearchWithFilters crm_data.id)"]
cons --> owner["OwnerResolver (creator_id to SSO)"]
cons --> html["HtmlNormalizer (sanitize + strip mentions)"]
cons --> att["AttachmentProcessor (download + re-upload)"]
cons --> batch["contact_notes.CreateNotesBatch (idempotent, ts-preserving)"]
cons --> valid["ValidationRunner (CountWithFilters)"]
resolver --> mongo[("MongoDB: contacts")]
batch --> notes[("MongoDB: contact_notes")]
att --> store[("CDP company-scoped storage")]
cons --> redis

Data model (erDiagram)

erDiagram
CONTACT_NOTES {
objectid _id PK
string contact_id "resolved CDP contact UUID"
string company_sso_id "per-batch tenant scope"
string note "sanitized HTML (no p-wrap)"
array attachments "type in image|doc|pdf|video|voice_note|xlsx"
string owner_id "SSO UUID or null"
string legacy_owner_label "NEW: shown when owner_id null"
string legacy_crm_note_id "NEW: idempotency key"
bool is_deleted
datetime created_at "PRESERVED from CRM"
datetime updated_at "PRESERVED from CRM"
}
CONTACTS {
string id PK
string source
string source_id "not indexed"
object crm_data "crm_data.id INDEXED (crm_contact_index)"
}
CONTACTS ||..o{ CONTACT_NOTES : "crm_data.id == legacy crm_person_id"

Idempotency: partial unique index {company_sso_id:1, legacy_crm_note_id:1} with partialFilterExpression:{legacy_crm_note_id:{$exists:true}} (Decision 4) — does not touch existing notes that lack the field.

State machine — migration job status (Redis-backed)

stateDiagram-v2
[*] --> not_started
not_started --> in_progress: enqueue accepted
in_progress --> in_progress: per-batch progress
in_progress --> halted: failure_rate gt 1 pct
in_progress --> completed_success: match_pct gte 99 pct
in_progress --> completed_with_errors: match_pct lt 99 pct or VALIDATION_SKIPPED
halted --> in_progress: re-trigger (idempotent)
completed_with_errors --> in_progress: re-trigger after fix
completed_success --> [*]
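
The two thresholds in the state machine can be expressed as small pure functions. This sketch assumes match_pct is the count of migrated notes in CDP divided by the CRM count, and that an empty CRM source counts as a full match; shouldHalt and finalStatus are hypothetical names.

```go
package main

import "fmt"

// shouldHalt reports whether the running failure rate is strictly above 1%.
func shouldHalt(failed, processed int) bool {
	if processed == 0 {
		return false
	}
	return failed*100 > processed
}

// finalStatus picks the terminal state once all batches are done.
// validationSkipped is true when the CRM count could not be obtained.
func finalStatus(migratedInCDP, crmCount int, validationSkipped bool) string {
	if validationSkipped {
		return "completed_with_errors"
	}
	// match_pct >= 99, written with integers to avoid rounding at the edge.
	if crmCount == 0 || migratedInCDP*100 >= crmCount*99 {
		return "completed_success"
	}
	return "completed_with_errors"
}

func main() {
	fmt.Println(shouldHalt(1, 100), shouldHalt(2, 100))
	fmt.Println(finalStatus(990, 1000, false), finalStatus(989, 1000, false))
}
```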

Branch & skip flow — per-note routing

flowchart TD
note([CRM note dequeued]) --> typ{"note-type in scope?"}
typ -- no --> oos["count out-of-scope (not a failure)"]
typ -- yes --> dup{"legacy_crm_note_id exists?"}
dup -- yes --> skip["skip (count notes_skipped)"]
dup -- no --> res{"contact resolved by crm_data.id?"}
res -- no --> cnf["CONTACT_NOT_MAPPED (skip + count failure)"]
res -- yes --> ins["sanitize + re-link + insert (preserve ts)"]
oos --> done([next note])
skip --> done
cnf --> done
ins --> done
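
The routing above can be sketched as one function that returns an outcome per note. The in-memory maps stand in for the configured note-type set, the notes already present in CDP, and the crm_data.id lookup; all type and function names here are hypothetical.

```go
package main

import "fmt"

// crmNote is a trimmed view of one extracted CRM note.
type crmNote struct {
	ID       string // becomes legacy_crm_note_id
	TypeID   int    // crm_note_type_id
	PersonID string // already string-cast for the crm_data.id lookup
}

type outcome string

const (
	outOfScope outcome = "out_of_scope"       // counted, not a failure
	skipped    outcome = "notes_skipped"      // already migrated
	notMapped  outcome = "CONTACT_NOT_MAPPED" // counted as a failure
	insertNote outcome = "insert"             // sanitize + re-link + insert
)

// route mirrors the decision order of the flowchart: scope, then duplicate,
// then contact resolution.
func route(n crmNote, inScopeTypes, alreadyMigrated map[int]bool, migratedIDs map[string]bool, contactByCrmID map[string]string) outcome {
	_ = alreadyMigrated // placeholder removed below; see routeNote
	return routeNote(n, inScopeTypes, migratedIDs, contactByCrmID)
}

// routeNote is the function callers are expected to use.
func routeNote(n crmNote, inScopeTypes map[int]bool, migratedIDs map[string]bool, contactByCrmID map[string]string) outcome {
	switch {
	case !inScopeTypes[n.TypeID]:
		return outOfScope
	case migratedIDs[n.ID]:
		return skipped
	}
	if _, ok := contactByCrmID[n.PersonID]; !ok {
		return notMapped
	}
	return insertNote
}

func main() {
	types := map[int]bool{1: true, 6: true} // example set; the real set is PM-confirmed
	migrated := map[string]bool{"101": true}
	contacts := map[string]string{"42": "contact-uuid-a"}

	for _, n := range []crmNote{
		{ID: "100", TypeID: 2, PersonID: "42"},
		{ID: "101", TypeID: 1, PersonID: "42"},
		{ID: "102", TypeID: 1, PersonID: "77"},
		{ID: "103", TypeID: 6, PersonID: "42"},
	} {
		fmt.Println(n.ID, routeNote(n, types, migrated, contacts))
	}
}
```

In the actual write path the duplicate check is performed by the upsert itself (Decision 4), so the explicit migratedIDs lookup here only models the outcome, not an extra query.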

Detail 2.2 — Sequence (end-to-end, incl. failure paths)

Happy path — trigger + async migrate + validate

sequenceDiagram
actor Ops as Ops (S2S, BasicAuth)
participant LB as Ingress
participant API as contact-service api
participant RD as Redis (status + queue)
participant Q as gocraft/work
participant W as NotesMigrationConsumer (worker)
participant CRM as Legacy CRM (extraction)
participant DBc as MongoDB contacts
participant S3 as CRM S3/CDN
participant ST as CDP storage
participant DBn as MongoDB contact_notes

Ops->>LB: POST /private/notes/migrate {cid, company_sso_id}
LB->>API: BasicAuth
alt flag OFF / already completed
API-->>Ops: 403 FLAG_DISABLED / 409 ALREADY_MIGRATED
else valid
API->>RD: set status in_progress {cid}
API->>Q: EnqueueJob(NotesMigrationJobName, {cid, company_sso_id})
API-->>Ops: 200 {job_id}
Note over Q,W: async
loop paginated (page/per_page)
W->>CRM: ListPersonNotes(cid, page) (notes-only filter)
CRM-->>W: notes (HTML, creator_id, images/audios/documents, ts)
end
loop per batch
W->>DBc: SearchWithFilters(crm_data.id in [...])
W->>W: sanitize HTML + strip mentions; resolve owner; map attachments
loop per attachment
W->>S3: download original
W->>ST: re-upload {company_sso_id}/... then proxy URL
end
W->>DBn: CreateNotesBatch (legacy_crm_note_id, caller ts, skip-on-conflict)
W->>RD: update progress_pct / notes_processed
alt failure_rate gt 1 pct
W->>RD: status halted
W-->>Ops: PagerDuty P1 {job_id, cid, failure_rate}
end
end
W->>DBn: CountWithFilters(legacy_crm_note_id exists) vs CRM count
W->>RD: status completed_success {match_pct}
Ops->>API: GET /private/notes/migration/status?cid
API->>RD: read status
API-->>Ops: {status, progress_pct, match_pct, counts}
end

Failure path — extraction / attachment / validation

sequenceDiagram
participant W as NotesMigrationConsumer
participant CRM as Legacy CRM
participant S3 as CRM S3/CDN
participant DBn as MongoDB contact_notes
participant RD as Redis

alt CRM 5xx / timeout
W->>CRM: ListPersonNotes (retry 3x backoff)
CRM-->>W: still failing
W->>RD: status halted, CRM_EXTRACT_FAILED
else attachment download/upload fails (non-blocking)
W->>S3: download (fails)
W->>DBn: insert note WITHOUT that attachment
W->>RD: log ATTACHMENT_DOWNLOAD_FAILED (not a note failure)
else owner unmappable (non-blocking)
W->>DBn: insert with owner_id=null + legacy_owner_label
else contact not mapped
W->>RD: log CONTACT_NOT_MAPPED (skip + count failure)
else count source unavailable after retries
W->>RD: status completed_with_errors, VALIDATION_SKIPPED + alert
end

Detail 2.3 — Database Model (Mongo)

MongoDB (schemaless). Extend the existing contact_notes collection (contact_notes/base.go). No DDL migration for the new fields; one partial unique index migration.

// New fields on ContactNote (application struct additions):
// legacy_crm_note_id string // CRM crm_notes.id (migrated rows only)
// legacy_owner_label string // shown when owner_id is empty (Decision 6)

// Partial unique index migration (db/migrations/NNN_index_contact_notes_legacy_crm_note_id.up.json),
// following the createIndexes JSON pattern of 013_create_contact_notes.up.json:
{
"createIndexes": "contact_notes",
"indexes": [{
"key": { "company_sso_id": 1, "legacy_crm_note_id": 1 },
"name": "uq_contact_notes_company_legacy_crm_note_id",
"unique": true,
"partialFilterExpression": { "legacy_crm_note_id": { "$exists": true } }
}]
}
// .down.json: { "dropIndexes": "contact_notes", "index": "uq_contact_notes_company_legacy_crm_note_id" }
  • Cardinality / growth: ~21,000+ notes across ~130 CIDs (one-time). Attachment bodies live in CDP storage, not Mongo.
  • PII classification: note (free-text customer interaction history — PII), attachments[].url (links to PII files), legacy_owner_label (may be a person's name), contact_id/company_sso_id (internal identifiers). See §3.D.
  • Retention (PRD §7.1): migration status (Redis) 7d; failed-record queue 30d; audit map (legacy_crm_note_id → CDP note id) permanent (intrinsic to each migrated document); source CRM notes untouched (read-only per CRM policy).

Per-status lifecycle (migration run, Redis status):

| Status | Visibility | Retention | Restore semantics | Transitions |
| --- | --- | --- | --- | --- |
| not_started | internal (status API) | n/a | n/a | → in_progress |
| in_progress | internal | 7d (Redis) | n/a | → halted / completed_* |
| halted | internal + PagerDuty | 7d | re-trigger (idempotent) | → in_progress |
| completed_with_errors | internal + alert + error log | 7d | re-trigger after fix | → in_progress |
| completed_success | internal | 7d | re-run is a no-op (all skipped) | terminal |
  • Partition/sharding: none — bounded one-time volume.

Detail 2.4 — APIs

Outbound endpoints (consumers call us)

| Endpoint | Method | AuthN/AuthZ | Request | Response | Status codes | Idempotency | Reuse? |
| --- | --- | --- | --- | --- | --- | --- | --- |
| /private/notes/migrate | POST | mymiddleware.BasicAuth (S2S) | {cid:string, company_sso_id:string} (body) | {job_id:string, status:"in_progress"} | 200; 403 FLAG_DISABLED; 404 CID_NOT_FOUND; 409 ALREADY_MIGRATED/JOB_ALREADY_RUNNING; 401/403 non-Basic | enqueue is guarded by an in-progress lock per CID (Redis); re-trigger after terminal is safe (skip-on-conflict) | new-with-justification |
| /private/notes/migration/status | GET | mymiddleware.BasicAuth (S2S) | query cid | {status, progress_pct, notes_processed, notes_total, failure_rate, match_pct, error_log_url?} | 200; 404 CID_NOT_FOUND (→ not_started) | n/a (read) | new-with-justification (mirrors rest_router.go:74) |

Person→Contact resolution algorithm (implementation contract for chunk 3). Resolve via ContactRepository.SearchByAppContactID(ctx, "crm", crmPersonID) (contact/search.go:27), which maps "crm" → "crm_data.id" through appNameColumnMapper (contact/base.go:531-538) and hits crm_contact_index; string-cast the CRM crm_person_id first (crm_data.id is a string, base.go:343). For throughput, batch by extending it (or SearchWithFilters) with bson.M{"crm_data.id": {"$in": batchOfCrmPersonIDs}, "company_sso_id": companySsoID} → build a crm_person_id → contact_id map. For a note carrying multiple FKs, apply person-first precedence (Decision 7). A note whose crm_person_id is absent from the map → CONTACT_NOT_MAPPED (skip + count failure, no halt). Fall back to a source_id/mapping-table lookup only where crm_data.id coverage is incomplete (OQ-2).
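
The batching step above can be sketched as follows. This is a minimal sketch that assumes CRM person ids arrive as integers and that a query result row exposes the contact id and crm_data.id; ChunkIDs, ContactRow, BuildContactMap, and Resolve are hypothetical names, and the Mongo query itself is left out.

```go
package main

import (
	"fmt"
	"strconv"
)

// ChunkIDs string-casts numeric CRM person ids (crm_data.id is a string)
// and splits them into batches sized for a single $in filter.
func ChunkIDs(ids []int64, size int) [][]string {
	if size <= 0 {
		size = 1
	}
	var out [][]string
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		batch := make([]string, 0, end-start)
		for _, id := range ids[start:end] {
			batch = append(batch, strconv.FormatInt(id, 10))
		}
		out = append(out, batch)
	}
	return out
}

// ContactRow is a hypothetical projection of one matched contact document.
type ContactRow struct {
	ID        string // CDP contact id
	CRMDataID string // crm_data.id
}

// BuildContactMap indexes the query result by crm_data.id.
func BuildContactMap(rows []ContactRow) map[string]string {
	m := make(map[string]string, len(rows))
	for _, r := range rows {
		m[r.CRMDataID] = r.ID
	}
	return m
}

// Resolve returns the contact id for a person id; ok=false means CONTACT_NOT_MAPPED.
func Resolve(personID int64, m map[string]string) (string, bool) {
	id, ok := m[strconv.FormatInt(personID, 10)]
	return id, ok
}

func main() {
	fmt.Println(ChunkIDs([]int64{1, 2, 3}, 2))
	m := BuildContactMap([]ContactRow{{ID: "contact-a", CRMDataID: "42"}})
	fmt.Println(Resolve(42, m))
}
```

Building the map once per batch keeps resolution to one indexed query per batch instead of one query per note.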

Internal calls (no HTTP surface):

  • QontakCrmClient.ListPersonNotes(ctx, cid, page, perPage) — extend qontak_crm.go; calls the CRM net-new org-scoped endpoint with the existing Authorization: {CRM_API_AUTH} header; reuses the 5xx/Locked/429 handling (:43-47).
  • ContactNoteRepo.CreateNotesBatch(ctx, []ContactNote) — bulk write with skip-on-conflict (Decision 4) and explicit timestamps (Decision 3).
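
The skip-on-conflict and timestamp-preserving behavior expected from CreateNotesBatch can be illustrated with an in-memory stand-in. This is not the Mongo implementation: MemStore, SampleNotes, and the trimmed ContactNote struct are assumptions made for the example, and the map key plays the role of the (company_sso_id, legacy_crm_note_id) upsert filter.

```go
package main

import (
	"fmt"
	"time"
)

// ContactNote is a trimmed, hypothetical view of a migrated note.
type ContactNote struct {
	CompanySSOID    string
	LegacyCRMNoteID string
	Note            string
	CreatedAt       time.Time // caller-supplied, never defaulted to "now"
}

type noteKey struct{ companySSOID, legacyCRMNoteID string }

// MemStore is an in-memory stand-in for the contact_notes collection.
type MemStore struct{ docs map[noteKey]ContactNote }

func NewMemStore() *MemStore { return &MemStore{docs: map[noteKey]ContactNote{}} }

// CreateNotesBatch writes only notes whose key is absent, which is the
// effect an upsert with insert-only fields has on a re-run.
func (s *MemStore) CreateNotesBatch(notes []ContactNote) (upserted, matched int) {
	for _, n := range notes {
		k := noteKey{n.CompanySSOID, n.LegacyCRMNoteID}
		if _, exists := s.docs[k]; exists {
			matched++ // counted as notes_skipped, never an error
			continue
		}
		s.docs[k] = n
		upserted++
	}
	return upserted, matched
}

// Get looks up one stored note by its idempotency key.
func (s *MemStore) Get(companySSOID, legacyCRMNoteID string) (ContactNote, bool) {
	n, ok := s.docs[noteKey{companySSOID, legacyCRMNoteID}]
	return n, ok
}

// SampleNotes returns two notes carrying historical timestamps.
func SampleNotes() []ContactNote {
	ts := time.Date(2019, time.March, 4, 9, 30, 0, 0, time.UTC)
	return []ContactNote{
		{"co-1", "101", "first", ts},
		{"co-1", "102", "second", ts.Add(time.Hour)},
	}
}

func main() {
	s := NewMemStore()
	fmt.Println(s.CreateNotesBatch(SampleNotes()))
	fmt.Println(s.CreateNotesBatch(SampleNotes()))
}
```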

Inbound webhooks (other services call us)

EndpointSourceNotes
n/a — no inbound webhook; contact-service initiates extraction (pull) from CRM, not a receiver

Detail 2.A — Async Job / Event Consumer Spec

| Job/Consumer | Trigger | Input shape | Retry | Concurrency | Idempotency key | Per-msg timeout | Poison handling |
| --- | --- | --- | --- | --- | --- | --- | --- |
| NotesMigrationConsumer.ProcessNotesMigrationJob | EnqueueJob(NotesMigrationJobName) | {cid, company_sso_id} via job.Args["data"] (mirror activity_log_migration_consumer.go:38-47) | extraction: 10s timeout, 3× backoff 1s/3s/9s (heimdall retrier) → CRM_EXTRACT_FAILED halt; per-note failures counted, not retried at job level; batch write is an idempotent upsert (retry-safe, Decision 4) | per-CID in-progress lock (Redis) → second concurrent job for the same CID returns 409 JOB_ALREADY_RUNNING | legacy_crm_note_id (per note, via upsert filter) + per-CID lock | bounded by ≤4h/CID window (PRD §7); batch 500 (max 1000) | failure_rate >1% → halted + PagerDuty; never silently drop a note |
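
The extraction retry schedule (1s/3s/9s, then CRM_EXTRACT_FAILED) can be sketched with the standard library alone. The RFC specifies the heimdall retrier; this sketch only illustrates the schedule, and it assumes "3×" means one initial attempt followed by up to three retries. RetryWithBackoff, Recorder, and the injected sleep function are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrCRMExtractFailed is returned once every retry is exhausted.
var ErrCRMExtractFailed = errors.New("CRM_EXTRACT_FAILED")

// DefaultDelays is the 1s/3s/9s schedule from the table above.
var DefaultDelays = []time.Duration{1 * time.Second, 3 * time.Second, 9 * time.Second}

// errUpstream stands in for a CRM 5xx or timeout.
var errUpstream = errors.New("upstream 503")

// Recorder captures requested sleeps so callers can avoid real waiting.
type Recorder struct{ Slept []time.Duration }

func (r *Recorder) Sleep(d time.Duration) { r.Slept = append(r.Slept, d) }

// RetryWithBackoff runs op once, then once more after each delay while it
// keeps failing. Assumption: three delays means up to four calls in total.
func RetryWithBackoff(delays []time.Duration, sleep func(time.Duration), op func() error) error {
	err := op()
	for _, d := range delays {
		if err == nil {
			return nil
		}
		sleep(d)
		err = op()
	}
	if err != nil {
		return fmt.Errorf("%w: %v", ErrCRMExtractFailed, err)
	}
	return nil
}

func main() {
	rec := &Recorder{}
	err := RetryWithBackoff(DefaultDelays, rec.Sleep, func() error { return errUpstream })
	fmt.Println(errors.Is(err, ErrCRMExtractFailed), rec.Slept)
}
```

Passing time.Sleep as the sleep argument gives the real waits; the recorder exists only so the schedule can be checked without them.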

Detail 2.E — Concurrency Collision Map

| Resource | Writers | Collision scenario | Resolution | Behavior on conflict |
| --- | --- | --- | --- | --- |
| Migration run (one CID) | Ops | two enqueues for the same CID | per-CID in-progress lock in Redis (mirror the activity-log status key) | second enqueue → 409 JOB_ALREADY_RUNNING (NOTES-MIG-S03/ERR-1) |
| contact_notes doc | consumer | same note written twice (re-run / overlapping batch) | idempotent upsert on (company_sso_id, legacy_crm_note_id) + partial unique index (Decision 4) | already-present note matches → no-op (MatchedCount, counted notes_skipped); never an error |
| CDP storage object | consumer | same attachment re-uploaded on re-run | object key namespaced {company_sso_id}/{legacy_crm_note_id}/{asset} (deterministic) | overwrite same key safely |

Detail 2.F — Responsibility Boundary Matrix

| Step | Owning squad / service | Inbound trigger | Outbound effect | Failure handler | PRD anchor |
| --- | --- | --- | --- | --- | --- |
| 1. Validate + enqueue | CDP BE (api) | POST /private/notes/migrate | Redis status + job enqueue | 403/404/409 to Ops | §9 #1, S01 |
| 2. Extract Person notes | CDP BE (worker) → CRM squad endpoint | dequeued job | paginated note pull | retry 3× → CRM_EXTRACT_FAILED halt | §9 #2, D-5 |
| 3. Resolve contact | CDP BE (worker) | per note | crm_person_id → contact_id | CONTACT_NOT_MAPPED skip + count | §9 #3, D-7 |
| 4. Resolve owner | CDP BE (worker) → Launchpad | per note | owner_id or label | unmappable → null + legacy_owner_label (non-blocking) | §9 #4, D-7 |
| 5. Sanitize + strip mentions | CDP BE (worker) | per note | safe HTML | malformed → best-effort + warning | §9 #5, D-4/D-8 |
| 6. Re-link attachments | CDP BE (worker) → CDP storage | per attachment | proxy URL | ATTACHMENT_*_FAILED insert-without (non-blocking) | §9 #6, D-10 |
| 7. Batch insert | CDP BE (worker) | per batch | idempotent upsert (BulkUpdate) | already-present → no-op skip; transient Mongo write error → gocraft retries the batch (upsert makes retry a no-op for written notes); persistent error → BATCH_WRITE_FAILED, count + continue; failure_rate >1% → halt | §9 #7, D-1 |
| 8. Validate | CDP BE (worker) | post-batches | match_pct | count unavailable → VALIDATION_SKIPPED + alert | §9 #8, S04 |
| 9. Status | CDP BE (api) | GET /private/notes/migration/status | status payload | CID unknown → not_started | §9 #9, S01/AC-2 |
| 10. Render migrated notes | existing CDP Notes UI (web+mobile) | GET /iag/v1/contacts/{id}/notes | UI render | existing behavior; legacy_owner_label fallback | §10, S05 |

Detail 2.I — Scope Boundaries

  • BE create: internal/app/handler/notes_migration_handler.go, internal/app/service/notes_migration_service.go, internal/app/consumer/notes_migration_consumer.go, HtmlNormalizer (internal/app/service/... or internal/pkg/util/), ContactResolver / OwnerResolver / AttachmentProcessor helpers, NotesMigrationJobName const, payload structs (internal/app/payload/), the partial-unique-index migration (db/migrations/), docs/NOTES_MIGRATION_SERVICE.md.
  • BE modify: internal/app/repository/contact_notes/base.go (+legacy_crm_note_id, legacy_owner_label) + new CreateNotesBatch (.../create.go or a new file); internal/app/service/contact_notes/contact_notes_service.go (render-path legacy_owner_label fallback — Decision 6); internal/app/api/qontak_crm.go (+ListPersonNotes); internal/server/rest_router.go (register 2 /private/notes routes); internal/worker/worker_service.go (register job); config/load.go (extraction config if a distinct CRM notes endpoint base is needed); go.mod (bluemonday).
  • BE NOT touched: the single-CRUD note path (create.go:12 SetDefaults), /iag/v1/contacts/{id}/notes handlers, existing indexes.
  • CRM (qontak.com): read-only — the CRM squad adds the org-scoped extraction endpoint in its own RFC/PR; no schema or data change in CRM.
  • FE: none — migrated notes render via the existing CDP Notes UI (Out of Scope #8).
  • Shared modules: JobEnqueuer, worker_service, ContactRepository, QontakCrmClient — reused/extended.

3. High-Availability & Security

The migration is async, S2S, off the request path, and per-CID isolated: one CID's failure halts only that CID's job. All dependencies degrade gracefully — CRM extraction failure halts with CRM_EXTRACT_FAILED (retryable); attachment and owner-resolution failures are non-blocking (insert-without / label fallback); count-source failure yields completed_with_errors + alert rather than data loss. Idempotency (Decision 4) makes every halt safely re-runnable.

Performance Requirement

  • API: POST /private/notes/migrate p99 < 300 ms (validate + enqueue; no work on-request); GET .../status is a single Redis read.
  • Worker: ≥ 10,000 notes/hour/CID; ≤ 4h/CID; batch insert ≤ 2s/500; attachment re-upload ≤ 30s/file P95 (PRD §7). Scale workers horizontally (cmd/worker ×M).
  • Resolution: Person→Contact uses the indexed crm_data.id (crm_contact_index) in $in batches — avoids collection scans across ~130 CIDs.
  • Load test (OQ-7): run the chosen extraction path at realistic CID size in staging before Internal QA; add a configurable inter-page delay if the CRM throttles.

Monitoring & Alerting

Observability events (PRD §12), names preserved exactly: crm_notes_migration_started, _batch_completed, _note_failed, _attachment_failed, _owner_not_resolved, _halted, _completed. BE structured logs via slog.*Context (existing convention, activity_log_migration_consumer.go:27-35). Alerts: halted → PagerDuty P1; match_pct < 99% → P2; attachment-fail > 20% → Slack #cdp-ops (PRD §12). SLO: match_pct ≥ 99%/CID; failure rate ≤ 1%/CID; halt rate < 2% in Stage 3.

Logging

  • BE fields: job_id, cid, company_sso_id, legacy_crm_note_id, reason_code, notes_processed, notes_total, failure_rate, duration_seconds.
  • PII scrubbed: never log note HTML body, attachment URLs/tokens, or contact PII — log ids + counts + reason codes only.
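
One way to enforce the "ids + counts + reason codes only" rule is to filter log attributes through an allow-list before they reach the logger. The sketch below uses the standard log/slog package; SafeAttrs and allowedLogKeys are hypothetical helpers, and the key set is copied from the BE fields listed above.

```go
package main

import (
	"context"
	"log/slog"
	"os"
	"sort"
)

// allowedLogKeys is the BE field list from this section; anything else is dropped.
var allowedLogKeys = map[string]bool{
	"job_id": true, "cid": true, "company_sso_id": true,
	"legacy_crm_note_id": true, "reason_code": true,
	"notes_processed": true, "notes_total": true,
	"failure_rate": true, "duration_seconds": true,
}

// SafeAttrs keeps only allow-listed keys, in sorted order, so a note body
// or attachment URL passed by mistake never reaches the log sink.
func SafeAttrs(fields map[string]any) []slog.Attr {
	keys := make([]string, 0, len(fields))
	for k := range fields {
		if allowedLogKeys[k] {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	attrs := make([]slog.Attr, 0, len(keys))
	for _, k := range keys {
		attrs = append(attrs, slog.Any(k, fields[k]))
	}
	return attrs
}

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
	logger.LogAttrs(context.Background(), slog.LevelError,
		"crm_notes_migration_note_failed",
		SafeAttrs(map[string]any{
			"cid":         "cid-1",
			"reason_code": "CONTACT_NOT_MAPPED",
			"note_html":   "<p>dropped before logging</p>",
		})...)
}
```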

Security Implications

  • Threat model: (a) unauthorized bulk migration into another company's data; (b) stored XSS via un-sanitized CRM HTML on a new ingestion surface; (c) cross-tenant attachment exposure (CRM S3 public-read); (d) dangling/false mention links; (e) PII leakage in logs; (f) SSRF via attacker-influenced CRM attachment URLs on the worker's outbound fetch (Decision 8); (g) storage-quota DoS — a malicious or pathologically large CID exhausting CDP attachment storage.
  • AuthN/AuthZ: both endpoints behind mymiddleware.BasicAuth (S2S only; rest_router.go:70; basic_auth.go:10, constant-time compare). A logged-in IAG user token is not accepted (NOTES-MIG-S01/ERR-4). The per-batch company_sso_id is explicit and is applied to every contact query (company_sso_id filter) and every note write, and it namespaces the attachment storage path {company_sso_id}/... — cross-tenant writes are structurally impossible. Caveat: BasicAuth is a single shared credential, so it authenticates that the caller is the trusted S2S principal but cannot attribute which operator triggered a given CID's migration. Front /private/notes/migrate with the platform's gateway/mesh identity (mTLS or per-service token) where available, and log the triggering principal alongside cid/company_sso_id (OQ-11).
  • SSRF / download integrity (attachments): the worker fetches CRM attachment URLs (CRM S3 public-read, Decision 8) — restrict fetches to an allow-list of CRM S3/CDN hosts (reject internal IPs / metadata endpoints), validate magic-bytes vs declared Type, and cap download size. Re-uploaded objects are written only under the caller's {company_sso_id}/... prefix.
  • Storage-quota DoS: bound per-CID attachment volume; the CDP storage quota dependency (§1) is also a DoS control — confirm headroom at Stage 0 (OQ-9) and alert on quota approach.
  • Input sanitization (the headline control): the migrate write path sanitizes the CRM HTML server-side (bluemonday allow-list mirroring CRM's tag set) and strips mention anchors — closing the XSS gap that exists because CDP performs no server-side sanitization today (contact_notes_service.go:268-274). This is defense-in-depth even though CRM also sanitizes (Decision 5).
  • Attachments: re-uploaded into company-scoped CDP storage (never reference raw CRM S3 URLs — alternatives-rejected); validate Type against the allow-set (contact_notes_service.go:286-293).
  • Secrets: CRM credentials via config/load.go (getStringOrPanic, CRM_API_ROOT_URL/CRM_API_AUTH) — no hardcoding; BasicAuth creds from config.BasicAuth.
  • Static analysis: staticcheck ./... (make lint) + gosec (make sec).
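
The attachment controls above (host allow-list, rejection of IP-literal hosts, deterministic company-scoped key) can be sketched with the standard library. The host name, the https-only rule, and the function names ValidateAttachmentURL and ObjectKey are assumptions for illustration; magic-byte and size checks are omitted.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
	"path"
	"strings"
)

// ValidateAttachmentURL accepts only https URLs whose host is on the allow-list.
// IP-literal hosts (internal ranges, metadata endpoints) are always rejected.
func ValidateAttachmentURL(raw string, allowedHosts map[string]bool) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("parse attachment url: %w", err)
	}
	if u.Scheme != "https" { // assumption: CRM assets are served over https
		return errors.New("scheme not allowed")
	}
	host := strings.ToLower(u.Hostname())
	if net.ParseIP(host) != nil {
		return errors.New("ip-literal host rejected")
	}
	if !allowedHosts[host] {
		return errors.New("host not on allow-list")
	}
	return nil
}

// ObjectKey builds the deterministic {company_sso_id}/{legacy_crm_note_id}/{asset} key.
// path.Base drops any directory components carried by the source file name.
func ObjectKey(companySSOID, legacyCRMNoteID, asset string) string {
	return path.Join(companySSOID, legacyCRMNoteID, path.Base(asset))
}

func main() {
	hosts := map[string]bool{"crm-assets.example.com": true} // hypothetical host
	fmt.Println(ValidateAttachmentURL("https://crm-assets.example.com/a/photo.png", hosts))
	fmt.Println(ValidateAttachmentURL("https://169.254.169.254/latest/meta-data", hosts))
	fmt.Println(ObjectKey("co-1", "101", "uploads/2019/photo.png"))
}
```

A host-name allow-list does not by itself stop an allowed name from resolving to an internal address, so checking the resolved IP at dial time would be a natural addition.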

Role × Endpoint Authorization Matrix

RoleEndpoint(s)MethodsTenant scopeConstraintAudit
Internal Ops (S2S)/private/notes/migrate, /private/notes/migration/statusPOST/GETexplicit per-batch company_sso_idflag ON per CID; one in-progress job/CIDper-record log + Redis status (7d) + intrinsic audit map (permanent)
Migrated agent (IAG)existing GET /iag/v1/contacts/{id}/notesGETown company (IAG ctx)n/an/a
Client admin / end usernone for migration401/403

Detail 3.A — Failure Mode Catalog

| Failure | Where | Behavior | Counted as | User/Ops-visible |
| --- | --- | --- | --- | --- |
| Flag OFF | handler | 403 FLAG_DISABLED, no job | n/a | yes (Ops) |
| Already completed | handler | 409 ALREADY_MIGRATED | n/a | yes |
| Concurrent job/CID | handler | 409 JOB_ALREADY_RUNNING | n/a | yes |
| CRM 5xx/timeout | extractor | 10s per-request timeout; retry 3× backoff (1s/3s/9s) → CRM_EXTRACT_FAILED halt | run halt | PagerDuty P1 |
| Contact not mapped | resolver | skip note, CONTACT_NOT_MAPPED | note failure | error log |
| Owner unmappable | owner resolver | owner_id=null + legacy_owner_label | non-failure | _owner_not_resolved |
| Attachment download/upload fail | attachment processor | insert note without attachment | non-failure | _attachment_failed |
| Batch write error (transient) | repo (BulkUpdate upsert) | gocraft retries the batch (upsert ⇒ already-written notes are no-ops); persistent → BATCH_WRITE_FAILED, count + continue; >1% → halt | note/run failure | PagerDuty if halt |
| Already-migrated legacy_crm_note_id | repo (upsert) | upsert matches → no-op | notes_skipped | n/a |
| Count source unavailable | validator | VALIDATION_SKIPPED, completed_with_errors, alert | run warning | P2 |
| Malformed HTML | normalizer | sanitized best-effort + warning | non-failure | log |

Detail 3.B — Error Response Catalog

Shape: { "error": "CODE", "message": "...", "details": {} }

| Endpoint | Code | HTTP | When |
| --- | --- | --- | --- |
| migrate | FLAG_DISABLED | 403 | crm_notes_migration_enabled OFF for CID |
| migrate | ALREADY_MIGRATED | 409 | CID already completed_success |
| migrate | JOB_ALREADY_RUNNING | 409 | in-progress lock held for CID |
| migrate | CID_NOT_FOUND | 404 | unknown CID |
| migrate / status | (BasicAuth fail) | 401/403 | missing/invalid Basic credentials (not S2S) |
| status | CID_NOT_FOUND | 404 | no status record → treat as not_started |
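
The error shape and the code-to-HTTP mapping in the catalog can be sketched as a small encoder. ErrorResponse, StatusFor, and Encode are hypothetical names; the fallback to 500 for an unlisted code is an assumption, since the catalog does not define one.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// ErrorResponse matches the documented shape {error, message, details}.
type ErrorResponse struct {
	Error   string         `json:"error"`
	Message string         `json:"message"`
	Details map[string]any `json:"details"`
}

// statusByCode mirrors the rows of the error catalog.
var statusByCode = map[string]int{
	"FLAG_DISABLED":       http.StatusForbidden,
	"ALREADY_MIGRATED":    http.StatusConflict,
	"JOB_ALREADY_RUNNING": http.StatusConflict,
	"CID_NOT_FOUND":       http.StatusNotFound,
}

// StatusFor returns the HTTP status for a catalog code.
// Assumption: codes outside the catalog map to 500.
func StatusFor(code string) int {
	if s, ok := statusByCode[code]; ok {
		return s
	}
	return http.StatusInternalServerError
}

// Encode renders the response body; details is always an object, never null.
func Encode(code, message string) string {
	b, err := json.Marshal(ErrorResponse{Error: code, Message: message, Details: map[string]any{}})
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	fmt.Println(StatusFor("JOB_ALREADY_RUNNING"), Encode("JOB_ALREADY_RUNNING", "migration already in progress"))
}
```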

Detail 3.D — Compliance & Data Governance

Triggered — migrated notes contain contact PII (interaction history, attachments).

FieldClassificationLegal basisRetentionEncryptionAccess audit
note (HTML)PIIlegitimate migration of the company's own dataper CDP note lifecycleTLS in transit; storage at-restper-record migration log (30d failed queue)
attachments[].url + objectPIICDP storage policyTLS; company-scoped pathaudit map
legacy_owner_labelmay be a person namewith the noteat-rest
migration status (Redis)internal ids/counts7d TTLat-rest

Right-to-delete (REV-5): migrated notes are stored identically to native CDP notes and inherit the same soft-delete lifecycle — is_deleted set on delete, filtered out on every read (contact_notes/read.go:36-37); the new fields (legacy_crm_note_id/legacy_owner_label) are erased with the document, adding no new barrier to deletion. There is no contact-delete → notes cascade in contact-service today (verified: no caller deletes contact_notes on contact/company deletion), so contact/company erasure does not auto-remove either native or migrated notes — that cascade, if required for UU PDP erasure (including the re-uploaded attachment objects), is a separate platform concern that applies equally to native notes and is out of scope here.

Controls: S2S-only access, explicit per-batch tenant scoping, server-side HTML sanitization, company-scoped attachment storage, no PII in logs (ids + counts + reason codes only), crm_checkin geolocation explicitly dropped (Decision 9). CRM source data is read-only; no deletion during the ≥90d coexistence window (PRD §11.1). OSS/storage data-residency for the re-uploaded PII attachments — InfoSec to confirm the CDP bucket region is UU PDP-compliant (OQ-9).


4. Backwards Compatibility and Rollout Plan

Compatibility

  • BE: all routes are additive (/private/notes/*). contact_notes gains two optional fields; the partial unique index does not touch existing notes (Decision 4); the single-CRUD note path is unchanged (Decision 3). No API version bump.
  • CRM: read-only; the org-scoped extraction endpoint is additive in qontak.com (CRM squad's PR). No CRM schema/data change.
  • FE: none — existing CDP Notes UI renders migrated rows unchanged.

Rollout Strategy

  • Deploy order: CRM extraction endpoint (CRM squad) → contact-service (migrate pipeline + flag default OFF) → Ops triggers per CID. The pipeline is dormant until Ops enqueues a job, and gated by crm_notes_migration_enabled per CID.
  • Feature flag: crm_notes_migration_enabled | default OFF, per CID (PRD §11). Kill-switch = flip OFF (migrate endpoint → 403; no jobs).
  • Stages (PRD §11, §14):
    • Stage 1 — Internal QA: 2 synthetic CIDs (100 + 5,000 notes incl. images/audios/documents + mentions + activity entries). Verify idempotency (zero dup on re-run), timestamp preservation, sanitization + mention-strip, attachment re-link ≥ 95%, match_pct = 100%, activity exclusion.
    • Stage 2 — Pilot: 5–10 CSM-approved CIDs (2 wk); match_pct ≥ 99%, zero pipeline-bug halts, error log root-caused.
    • Stage 3 — Batch: remaining ~120 CIDs per schedule; halt rate < 2%, match_pct ≥ 99% before each cutover.
  • Gate before any CID: pre-migration coverage report (OQ-2) — block job start if Person→Contact coverage (crm_data.id) < 99%.
  • Stop conditions: failure rate > 1%/CID (auto-halt) or halt rate > 2% in Stage 3 → pause rollout, investigate.
  • Rollback: flip crm_notes_migration_enabled OFF (instant; no data migration); in-flight job completes its current batch and stops; migrated rows remain valid (idempotent re-run later). If a bad index migration: make migrate-down (index only; data untouched).
  • Blast radius: flag-ON CIDs only; isolated from read/write contact paths.
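
The pre-start gate and the per-CID stop condition reduce to two integer comparisons. The sketch below is an assumption-labeled illustration: CoverageOK, ShouldHalt, and MayStart are hypothetical names, COVERAGE_BELOW_GATE is an invented reason code, and treating an empty CID (zero notes) as passing the gate is also an assumption.

```go
package main

import "fmt"

// CoverageOK reports whether Person→Contact coverage is at least 99%.
// Integer arithmetic avoids floating-point edge cases at the boundary.
func CoverageOK(mapped, total int) bool {
	if total == 0 {
		return true // assumption: nothing to migrate passes the gate
	}
	return mapped*100 >= 99*total
}

// ShouldHalt reports whether the failure rate is strictly above 1%.
func ShouldHalt(failed, processed int) bool {
	return processed > 0 && failed*100 > processed
}

// MayStart combines the feature flag and the coverage gate.
func MayStart(flagOn bool, mapped, total int) (bool, string) {
	if !flagOn {
		return false, "FLAG_DISABLED"
	}
	if !CoverageOK(mapped, total) {
		return false, "COVERAGE_BELOW_GATE" // hypothetical reason code
	}
	return true, ""
}

func main() {
	fmt.Println(MayStart(true, 985, 1000))
	fmt.Println(ShouldHalt(10, 1000), ShouldHalt(11, 1000))
}
```

With these comparisons, exactly 1% failures (10 of 1000) does not halt and exactly 99% coverage (990 of 1000) passes, matching the "> 1%" and "< 99%" wording above.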

Detail 4.A — Configuration Contract

| Env var / flag | Type | Default | Required | Provisioner | Secret? |
| --- | --- | --- | --- | --- | --- |
| crm_notes_migration_enabled | flag (per-CID) | OFF | yes | Ops/flag service | no |
| CRM_API_ROOT_URL | string | | yes (exists) | config/load.go:197 | no |
| CRM_API_AUTH | string | | yes (exists) | config/load.go:198 | yes |
| BASIC_AUTH_USERNAME / BASIC_AUTH_PASSWORD | string | | yes (exists) | config/load.go:143-144 | yes |
| Notes-migration batch size | int | 500 (max 1000) | yes | code/config | no |
| Notes-only crm_note_type_id set | list | (PM-confirmed; default Notes/Documents) | yes | config (OQ-4) | no |
| CRM_NOTES_EXTRACT_TIMEOUT | duration | 10s | yes | config/load.go getDurationOrPanic | no |
| Inter-page extraction delay | ms | 0 (tune if throttled; OQ-7) | no | config | no |
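
The defaults in the table can be sketched as two small parsers. The repo's own loader is described as getDurationOrPanic; this sketch returns errors instead so it can be exercised in isolation, and clamping an oversized batch to the 1000 maximum (rather than rejecting it) is an assumption. ExtractTimeout and BatchSize are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

const (
	defaultBatchSize      = 500
	maxBatchSize          = 1000
	defaultExtractTimeout = 10 * time.Second
)

// ExtractTimeout parses CRM_NOTES_EXTRACT_TIMEOUT; empty input yields the 10s default.
func ExtractTimeout(raw string) (time.Duration, error) {
	if raw == "" {
		return defaultExtractTimeout, nil
	}
	d, err := time.ParseDuration(raw)
	if err != nil {
		return 0, fmt.Errorf("CRM_NOTES_EXTRACT_TIMEOUT: %w", err)
	}
	if d <= 0 {
		return 0, fmt.Errorf("CRM_NOTES_EXTRACT_TIMEOUT must be positive, got %s", d)
	}
	return d, nil
}

// BatchSize parses the batch size; empty input yields 500.
// Assumption: values above the maximum are clamped to 1000.
func BatchSize(raw string) (int, error) {
	if raw == "" {
		return defaultBatchSize, nil
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n < 1 {
		return 0, fmt.Errorf("invalid batch size %q", raw)
	}
	if n > maxBatchSize {
		n = maxBatchSize
	}
	return n, nil
}

func main() {
	fmt.Println(ExtractTimeout(""))
	fmt.Println(BatchSize("5000"))
}
```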

Detail 4.B — Test Plan (commands sourced from repo)

| Layer | Command (source) | What it must prove |
| --- | --- | --- |
| BE unit | go test -race -tags dynamic ./internal/app/service/... ./internal/app/consumer/... (source: Makefile make test) | sanitizer (XSS + mention-strip + no <p>); timestamp preservation; skip-on-conflict; owner-label fallback; attachment mapping; per-error counting; notes-only filter |
| BE full | make test (go test -race -tags dynamic ./internal/... ./config/...) | no regression across service |
| BE lint | make lint (staticcheck ./...) | static analysis clean |
| BE sec | make sec (gosec) | no new security findings on the ingestion path |
| BE build | make build (go build -tags dynamic) | compiles |
| BE migration | make migrate-up && make migrate-down | partial unique index applies + rolls back; existing notes unaffected |
| Integration | seeded Mongo: insert N CRM-shaped notes twice | re-run inserts only missing; full re-run → migrated=0; concurrent job → 409 |
| Cross-squad (Stage 1) | manual: Ops POST → status → notes visible in CDP UI | end-to-end incl. CRM extraction + attachment re-link |

Detail 4.C — Agent Execution Plan

OrderChunkFiles to modify/createCommandsAcceptance criteria
1Constants + payload + statusinternal/app/service/notes_migration_service.go (new — NotesMigrationJobName, Redis status key like activity_log_migration_service.go:22-31); internal/app/payload/notes_migration.gomake buildbuilds; job-name + payload exported; Redis status read/write mirrors activity-log
2Extend note store + partial unique indexinternal/app/repository/contact_notes/base.go (+legacy_crm_note_id,legacy_owner_label); new CreateNotesBatch over IDbRepo.BulkUpdate (UpdateOneModel+SetUpsert(true)+$setOnInsert, filter (company_sso_id, legacy_crm_note_id); bypass SetDefaults, set caller ts) — not CreateMany; db/migrations/NNN_index_contact_notes_legacy_crm_note_id.{up,down}.json (partial unique)make migrate-up && make migrate-down && go test ... ./internal/app/repository/contact_notes/struct compiles; index up/down; existing notes unaffected; CreateNotesBatch preserves ts; re-run upserts → UpsertedCount=0/MatchedCount=N (no dup, no error)
3Contact + owner resolversinternal/app/consumer/notes_migration_consumer.go (resolve via contact/search.go:27 SearchByAppContactID(ctx,"crm",crmPersonID) — or a batched $in variant — string-cast the id); owner resolve + legacy_owner_labelgo test ... ./internal/app/consumer/resolves by crm_data.id (= crm_person_id); multi-FK person-first; unmapped→CONTACT_NOT_MAPPED; unmappable owner→null+label
4HTML normalizerinternal/pkg/util/html_normalizer.go (+bluemonday in go.mod); deny-by-default policy (Decision 5): UGCPolicy base, structural tags only, AllowStandardURLs+RequireNoFollowOnLinks, no style; strip data-user-id/data-mention anchors→@Name; no <p> wrap; sanitize→then length-checkmake build && go test ... ./internal/pkg/util/<script>/onerror=/javascript: href/style exfil all neutralised; mention anchors→plain text; safe markup preserved; not wrapped in <p>; >10000 post-sanitize → failure (not truncated)
5Attachment processorinternal/app/consumer/... (download from CRM S3/CDN → re-upload to {company_sso_id}/{legacy_crm_note_id}/{asset} → proxy URL + Type)go test ... ./internal/app/consumer/image/audio/document mapped to allowed Type; SSRF host-allow-list (reject non-CRM hosts/internal IPs/metadata); magic-byte vs Type check; max size enforced; download fail → insert-without + ATTACHMENT_*_FAILED
6CRM extraction clientinternal/app/api/qontak_crm.go (+ListPersonNotes(ctx,cid,page,perPage) built on heimdall httpclient.NewClient(WithHTTPTimeout(...)) + retrier — like iag_mekari.go:69-71, not http.DefaultClient); config/load.go (+CRM_NOTES_EXTRACT_TIMEOUT duration, default 10s)make build && go test ... ./internal/app/api/paginated pull with Authorization; 10s timeout + 3× backoff (1s/3s/9s) on timeout/5xx then CRM_EXTRACT_FAILED; notes-only filter applied
7Consumer assembly + worker registrationinternal/app/consumer/notes_migration_consumer.go (ProcessNotesMigrationJob(job *work.Job) reads job.Args["data"]); internal/worker/worker_service.go (register NotesMigrationJobName)go test ... ./internal/app/consumer/ ./internal/worker/end-to-end consumer: extract→resolve→sanitize→relink→CreateNotesBatch→progress; per-CID lock; halt at >1%
8Service + handler + routes + validationinternal/app/service/notes_migration_service.go (ValidateAndEnqueue, GetMigrationStatus, ValidationRunner via CountWithFilters); internal/app/handler/notes_migration_handler.go; internal/server/rest_router.go (2 /private/notes routes under BasicAuth)make build && make lint && make testenqueue returns {job_id}; 403/404/409 guards; match_pct computed; routes BasicAuth-guarded; suite green
9Render-path owner-label fallbackinternal/app/service/contact_notes/contact_notes_service.go:131-136 (use legacy_owner_label when owner_id empty)go test ... ./internal/app/service/contact_notes/author renders label when owner_id empty; existing path unchanged otherwise
10API docdocs/NOTES_MIGRATION_SERVICE.md (markdown — repo has no OpenAPI spec)ls docs/NOTES_MIGRATION_SERVICE.mddoc describes the 2 endpoints + the gocraft/work job

Detail 4.D — Verification & Rollback Recipe

  • Pre-merge (in order): 1) make lint 2) make sec 3) make test 4) make build 5) make migrate-up && make migrate-down.
  • Post-deploy signals (Stage 1): crm_notes_migration_completed count > 0 with match_pct = 100% on the synthetic CIDs; #cdp-ops quiet (no halted/P2); re-run a completed CID → notes_migrated=0 (idempotency proof); migrated notes visible in the CDP Notes UI with original created_at + re-linked attachments.
  • Rollback (in order):
    1. Flip crm_notes_migration_enabled OFF (migrate → 403; no new jobs).
    2. If a bad index migration: make migrate-down (index only; data untouched).
    3. Revert the offending PR; confirm single-CRUD note create/read still works and existing notes are intact (no unique-index collisions).

5. Concern, Questions, or Known Limitations

Resolved by grounding (closed in this RFC):

  • PRD POST /cdp/notes/migrate HTTP batch endpoint → not built; in-process repository batch write instead (Decision 1; /cdp namespace does not exist).
  • PRD S2S model → HTTP Basic auth on /private, mirroring /private/activity_logs/migration/status (Decision 2).
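  • Idempotency index → partial unique index (Decision 4). A plain unique index would not corrupt data, but it would treat every existing note that lacks legacy_crm_note_id as the same null key per company, so the index build (and later native note inserts) would fail on duplicate-key errors.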
  • Migration job store → Redis status (mirror the existing framework), audit map intrinsic to each migrated document (Decision 2).
  • Person→Contact resolution → crm_data.id (indexed), not source_id (unindexed) (Decision 7).
  • legacy_owner_label is a net-new field the PRD implied but did not enumerate (Decision 6).
  • CRM real extraction API is entity-scoped + unpaginated → CRM squad must build a net-new org-scoped endpoint; extend the existing QontakCrmClient (Decision 10).
  • CRM S3 is public-read → attachment fetch generally needs no signing (correction to PRD "internal creds"); CDP still re-uploads company-scoped (Decision 8).
  • No ≤1 voice_note backend rule exists today (Decision 8 / OQ-8).

Open — adopted default, confirm at the noted gate:

| # | Question | Adopted default | Owner | Blocks? |
| --- | --- | --- | --- | --- |
| OQ-1 | Migration mechanism: CDP gocraft/work consumer vs Bifrost (Postgres→Mongo)? | CDP gocraft/work consumer reusing the existing framework (Decision 2/10); Bifrost is the fallback | CDP Eng + Platform | confirm at design kickoff |
| OQ-2 | Person→Contact coverage per CID (id-space resolved by REV-1: crm_data.id == crm_person_id) | Pre-migration coverage report per CID (CountDocuments/CountWithFilters); block job start if coverage < 99%; unmatched → 30d failed queue | CDP / Data Eng | gate before each CID |
| OQ-3 | Failed (CONTACT_NOT_MAPPED) notes → retry queue or permanent error log? | 30-day failed-record queue (PRD §7.1) with manual retry after mapping backfill | PM + Eng | no |
| OQ-4 (REV-4) | Migrate all crm_notes or notes-only? | Notes-only: filter crm_note_type_id (default Notes/Documents, ids (1,6)); confirm exact set with PM; agent reads the set from config, not hardcoded | PM | confirm before Stage 1 |
| OQ-5 | Document attachments in scope? | Yes: re-link to CDP doc/pdf/xlsx (Decision 8) | PM + Eng | no |
| OQ-6 | Unmappable-owner notes: edit/delete hidden acceptable? | Yes for historical notes; legacy_owner_label preserves author display | PM | no |
| OQ-7 | CRM extraction at bulk throughput (endpoint vs DB; rate limits) | Load-test the chosen path in staging before Internal QA; configurable inter-page delay | Legacy CRM Squad + CDP | Stage 0 gate |
| OQ-8 | Multiple audios / unsupported types per note (no ≤1 voice_note rule exists today) | Map each audio to voice_note; log+skip unsupported types; add a cap only if PM requires | PM + Eng | no |
| OQ-9 | CRM S3 access (confirm still public-read) + CDP storage data residency for re-uploaded PII | Confirm CRM bucket access at Stage 0; InfoSec confirms CDP bucket region UU-PDP-compliant | InfoSec + CDP Infra + CRM | confirm at Stage 0 |
| OQ-10 | bluemonday (or equivalent) dependency approval + sanitizer policy | Adopt bluemonday with the deny-by-default policy in Decision 5 (UGCPolicy base, structural tags only, no style, scheme-allow-listed href) | CDP BE + InfoSec | confirm before chunk 4 |
| OQ-11 | Per-caller identity/audit on /private/notes/migrate (BasicAuth is a single shared credential) | Front with gateway/mesh identity where available; log the triggering principal with cid/company_sso_id | CDP BE + Platform | confirm at design kickoff |

(rfc-reviewer findings REV-1/2/3/5/6 were resolved in this revision; see §6 Comment logs and the companion review's Findings Ledger. REV-4 remains a PM scoping decision, captured as OQ-4 above.)

Known limitations: one-time migration (no ongoing sync); crm_checkin geolocation dropped (Decision 9); mentions become plain text (no live CDP mentions); extraction depends on a net-new CRM endpoint (cross-squad); Redis status is ephemeral (7d) — the durable record is the set of migrated documents themselves. Future: native CDP mentions migration; a richer durable migration-run audit store if re-runs need history beyond 7d.


6. Comment logs

| Date | Comment(s) From | Action Item(s) |
| --- | --- | --- |
| 2026-06-18 | rfc-starter (initial draft, grounded vs contact-service + qontak.com live worktrees) | Confirm with CRM squad the net-new org-scoped extraction endpoint (OQ-7); Data Eng confirm crm_data.id semantics + coverage (OQ-2); InfoSec confirm bluemonday (OQ-10) + storage residency (OQ-9) |
| 2026-06-18 | rfc-starter (grounding corrections) | Corrected vs PRD: /cdp/notes/migrate not built (in-process write, Decision 1); S2S = BasicAuth on /private (Decision 2); idempotency index must be partial (Decision 4); status Redis-backed; resolve via indexed crm_data.id not source_id (Decision 7); legacy_owner_label is a net-new field (Decision 6); CRM v4 API entity-scoped + unpaginated (Decision 10); CRM S3 public-read (Decision 8); no ≤1 voice_note rule today (OQ-8) |
| 2026-06-18 | Verification pass (frontmatter linter + mermaid + checklist, run against the live qontak-docs linter) | PASS on all 7 gates; pinned lint-docs.mjs reports zero errors attributed to this RFC; 9/9 mermaid blocks render; frontmatter ↔ Metadata table agree; all 6 stories covered once; no placeholders |
| 2026-06-18 | Security review (Staff-Eng lens + anti-hallucination spot-check vs both worktrees) | Grounding CLEAN (every spot-checked path:line verified; no secrets). Hardened in-doc: deny-by-default bluemonday policy (no style/javascript:, sanitize-then-length, ISSUE-1/2); attachment SSRF host-allow-list + magic-byte + size cap (ISSUE-3); deterministic storage key (ISSUE-4); per-caller audit/mesh identity (ISSUE-5, OQ-11); storage-quota DoS added to threat model |
| 2026-06-18 | rfc-reviewer (backend rubric; score 8.5/10 Agentic-Ready / PROCEED; report co-located at rfc-legacy-migration-crm-notes-review.md) | 8/11 decisions Resolved, 3 Partial (carry adopted defaults). 6 findings promoted to Open Questions: REV-1→OQ-2 (crm_data.id semantics, top risk), REV-4→OQ-4 (notes-only type-id set), REV-2 (pin extraction timeout/backoff), REV-3 (reword batch-insert failure from HTTP "5xx" to Mongo write-error), REV-5 (right-to-delete path), REV-6 (pin bulk-insert driver call) |
| 2026-06-18 | Fix REV findings (grounded against live contact-service + qontak.com) | REV-1 RESOLVED: confirmed crm_data.id == crm_person_id (contact_sync_request.go:104-105, params_mapper.rb:31, centralized_contacts_controller.rb:120-129; no separate centralized-contact id space); resolve via SearchByAppContactID("crm",…) (search.go:27); OQ-2 downgraded to a coverage gate. REV-2 RESOLVED: extractor uses heimdall httpclient (WithHTTPTimeout, iag_mekari.go:69-71) + retrier, 10s/3× backoff, config CRM_NOTES_EXTRACT_TIMEOUT (current QontakCrmClient has no timeout). REV-3+REV-6 RESOLVED: idempotent upsert via existing IDbRepo.BulkUpdateBulkWrite(SetOrdered(false)) (db.go:180-181), keyed on (company_sso_id, legacy_crm_note_id); no E11000 path; reworded §2.F/§3.A from HTTP to Mongo write semantics. REV-5 RESOLVED: migrated notes inherit native soft-delete (read.go:36-37); no contact-delete→notes cascade exists today (out of scope, applies to native notes equally). REV-4 remains a PM scoping decision (OQ-4). |
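
The REV-2 resolution in the log above pins the extractor to a 10s timeout with 3× backoff. As a rough illustration of that behaviour, the sketch below uses only the Go standard library rather than heimdall, so it shows the shape of the policy and not the real client wiring. The names `retryConfig`, `withRetry`, and `simulateFlakyExtract` are hypothetical, and two readings are assumptions: that "3×" means three attempts in total, and that the wait doubles after each failure.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retryConfig holds the values pinned by REV-2. Field names are hypothetical.
type retryConfig struct {
	PerAttemptTimeout time.Duration // e.g. 10s, sourced from CRM_NOTES_EXTRACT_TIMEOUT
	MaxAttempts       int           // assumption: "3x" means three attempts in total
	BaseBackoff       time.Duration // assumption: first wait, doubled after each failure
}

// withRetry calls fn until it succeeds, the attempts run out, or the parent
// context is cancelled. It returns the number of attempts made.
func withRetry(ctx context.Context, cfg retryConfig, fn func(context.Context) error) (int, error) {
	if cfg.MaxAttempts < 1 {
		cfg.MaxAttempts = 1
	}
	var lastErr error
	for attempt := 1; attempt <= cfg.MaxAttempts; attempt++ {
		// Each attempt gets its own deadline derived from the parent context.
		attemptCtx, cancel := context.WithTimeout(ctx, cfg.PerAttemptTimeout)
		lastErr = fn(attemptCtx)
		cancel()
		if lastErr == nil {
			return attempt, nil
		}
		if attempt == cfg.MaxAttempts {
			break
		}
		wait := cfg.BaseBackoff << uint(attempt-1) // base, 2*base, 4*base, ...
		select {
		case <-time.After(wait):
		case <-ctx.Done():
			return attempt, ctx.Err()
		}
	}
	return cfg.MaxAttempts, fmt.Errorf("extract failed after %d attempts: %w", cfg.MaxAttempts, lastErr)
}

// simulateFlakyExtract stands in for the CRM call: it fails twice, then succeeds.
func simulateFlakyExtract() (int, error) {
	calls := 0
	cfg := retryConfig{PerAttemptTimeout: 10 * time.Second, MaxAttempts: 3, BaseBackoff: time.Millisecond}
	return withRetry(context.Background(), cfg, func(_ context.Context) error {
		calls++
		if calls < 3 {
			return errors.New("transient upstream error")
		}
		return nil
	})
}

func main() {
	attempts, err := simulateFlakyExtract()
	fmt.Println(attempts, err) // prints: 3 <nil>
}
```

In this sketch the timeout applies per attempt, so the worst case is roughly three timeouts plus the waits. Whether the real retrier bounds the total duration instead is left for the implementation chunk to confirm.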

7. Ready for agent execution

  • yes — for the core BE migration pipeline. The blocking external item (CRM org-scoped extraction endpoint, OQ-7) and the per-CID coverage gate (OQ-2) are prerequisites for running a migration, not for building the pipeline; the extractor is behind an interface so chunks 1–10 can proceed with a stub/contract.
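
The stub/contract approach mentioned above can be pictured as a small Go interface that the pipeline depends on, with an in-memory implementation standing in until the CRM endpoint (OQ-7) exists. This is a minimal sketch, not the RFC's actual contract: `NotesExtractor`, `LegacyNote`, `stubExtractor`, `countExtracted`, and `demoStubCount` are hypothetical names, and the single-call `Extract` signature is an assumption, since the real endpoint's paging and payload are not defined in this section.

```go
package main

import (
	"context"
	"fmt"
)

// LegacyNote is a hypothetical, trimmed-down shape for one CRM note.
type LegacyNote struct {
	LegacyCRMNoteID string
	CRMPersonID     string
	Body            string
}

// NotesExtractor is the seam the pipeline codes against. A real implementation
// would call the CRM org-scoped endpoint once it is available.
type NotesExtractor interface {
	Extract(ctx context.Context, companySSOID string) ([]LegacyNote, error)
}

// stubExtractor serves fixed notes from memory, keyed by company.
type stubExtractor struct {
	byCompany map[string][]LegacyNote
}

func (s stubExtractor) Extract(_ context.Context, companySSOID string) ([]LegacyNote, error) {
	return s.byCompany[companySSOID], nil
}

// countExtracted is a stand-in for pipeline code: it only sees the interface,
// so swapping the stub for the real client needs no change here.
func countExtracted(ctx context.Context, ex NotesExtractor, companySSOID string) (int, error) {
	notes, err := ex.Extract(ctx, companySSOID)
	if err != nil {
		return 0, fmt.Errorf("extract notes for %s: %w", companySSOID, err)
	}
	return len(notes), nil
}

// demoStubCount wires the stub with two notes for one made-up company.
func demoStubCount() (int, error) {
	stub := stubExtractor{byCompany: map[string][]LegacyNote{
		"company-a": {
			{LegacyCRMNoteID: "n-1", CRMPersonID: "p-1", Body: "first"},
			{LegacyCRMNoteID: "n-2", CRMPersonID: "p-1", Body: "second"},
		},
	}}
	return countExtracted(context.Background(), stub, "company-a")
}

func main() {
	n, err := demoStubCount()
	fmt.Println(n, err) // prints: 2 <nil>
}
```

Keeping the dependency behind an interface like this is what would let the build chunks be tested before OQ-7 lands; the real extractor only has to satisfy the same method set.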

Execution-readiness gates (all met unless noted):

  • §1 PRD-to-Schema — every entity/rule mapped to field + endpoint + enforcement: yes.
  • Detail 1.C Per-Story Change Map — all 6 stories, layer scope, verifiable AC: yes.
  • Repo Reading Guide (Detail 2.0) + contracts classified (reuse/extend/new): yes.
  • Source Verification table — concrete evidence per anchor across both repos: yes.
  • Mermaid: topology, per-service, repo map, component, ER, state, branch/skip, sequence (happy + failure): yes.
  • DDL/collection + partial unique index + per-status lifecycle; every field traces to a PRD-to-Schema row: yes.
  • APIs outbound (2, tagged new-with-justification) + inbound (n/a — puller): yes.
  • Async Job + Concurrency + Responsibility Boundary specs: yes.
  • Failure Mode + Error Response catalogs; Security (XSS sanitization headline, tenant scoping, secrets): yes.
  • Configuration Contract + flag; deploy order (CRM → BE → Ops): yes.
  • Agent Execution Plan (10 chunks, files + commands + verifiable AC): yes.
  • Verification & Rollback Recipe (commands runnable; signals named; partial-index rollback safe): yes.
  • Pending (external, do not block build): CRM extraction endpoint (OQ-7), per-CID coverage ≥99% gate (OQ-2), bluemonday approval (OQ-10), storage residency (OQ-9).

Optional next step: hand to rfc-reviewer for a second-pass score.