RFC: Legacy Migration — CRM Contact Notes → CDP Notes (S2S gocraft/work pipeline, idempotent, timestamp-preserving)
Document Conventions (do not remove)
This RFC follows the Qontak RFC Template format for governance — the metadata table, Confluence sections 1–6, and Comment logs are mandatory. Mark a section "N/A — reason" when truly inapplicable rather than deleting it. It is also agent-execution-ready: §1 PRD-to-Schema Derivation (backend RFCs require no Figma), §2 Repo Reading Guide (Detail 2.0), mermaid diagrams, and §4 Agent Execution Plan + Verification & Rollback Recipe are complete before §7 says yes.
The YAML frontmatter at the top is the machine-readable index. The Metadata table below is the human-readable governance record. Both agree on every shared field.
Grounding note (anti-hallucination). Every path:line reference in this RFC was verified against the live worktrees contact-service (CDP, Go/MongoDB) and qontak.com (Legacy CRM, Rails) on 2026-06-18 (see Detail 2.0 Source Verification). Where the PRD's assumed contract differed from the repo, the repo wins and the deviation is called out. The most consequential corrections: (1) the PRD's POST /cdp/notes/migrate HTTP batch endpoint is not built — because the migration runs in-process as a gocraft/work consumer (mirroring ActivityLogMigrationConsumer), the insert is a direct repository write, not an internal HTTP call (Decision 1); (2) S2S in contact-service is HTTP Basic auth on the /private + /api/v1 route groups — the migrate trigger and status live under /private/notes/... (Decision 2); (3) the idempotency unique index must be partial, or it breaks every existing human-created note that has no legacy_crm_note_id (Decision 4); (4) migration status is Redis-backed like the existing migration framework, not a Mongo job collection (Decision 2).
Metadata
| Field | Value | Notes |
|---|---|---|
| Status | RFC (IDEA) | Human label; YAML status: carries the remapped linter enum draft |
| DRI | Zhelia Alifa | RFC owner (frontmatter dri) |
| Team | cdp | Advisory squad slug carried from PRD / initiative README |
| Author(s) | Zhelia Alifa | Primary author |
| Reviewers | CDP Backend Lead, Legacy CRM Squad Lead, Data Engineering Lead | Tech reviewers across affected squads (CDP BE + CRM + Data) |
| Approver(s) | CDP Tech Lead, InfoSec Approver | Tech leaders + infosec approver |
| Submitted Date | 2026-06-18 | ISO-8601 |
| Last Updated | 2026-06-18 | ISO-8601 |
| Target Release | 2026-Q3 | Quarter |
| Target Quarter | 2026-Q3 | Advisory, carried from PRD |
| Related | ../prds/prd-legacy-migration-crm-notes.md | Source PRD v2.1 |
| Discussion | #cdp-ops (Slack) | Alerts + discussion channel |
Type: backend Sub-type: new-feature
Sections at a Glance
- Overview (PRD-to-Schema Derivation; traceability; per-story change map; no Figma — backend migration, output via existing CDP Notes UI)
- Technical Design (Infrastructure Topology → Technical Decisions [ADR] → Repo Reading Guide → architecture & service map → end-to-end sequences → DDL/Mongo → APIs → integrity / concurrency / async-job specs)
- High-Availability & Security
- Backwards Compatibility and Rollout Plan (Agent Execution Plan + Verification & Rollback Recipe)
- Concern, Questions, or Known Limitations
- Comment logs
- Ready for agent execution
1. Overview
CDP (the contact-service backend, Go + MongoDB) has no CRM-notes migration
capability today — verified: the only notes surface is single-record CRUD under
/iag/v1/contacts/{contact_id}/notes (internal/server/rest_router.go:150-159),
the ContactNote model has no legacy_crm_note_id and no
legacy_owner_label field (internal/app/repository/contact_notes/base.go:26-36),
SetDefaults() overwrites caller timestamps with time.Now()
(base.go:51-54, create.go:12), note writes derive the company from the
user IAG context with no system path (contact_notes_handler.go:75-79,
:478-486), and there is no server-side HTML sanitization — content is
validated by length only (contact_notes_service.go:268-274). On the CRM side,
the assumed extraction contract (GET /crm/notes?organization_id&limit&offset)
does not exist; the real notes API (app/controllers/api/v4/notes.rb:131-147)
is entity-scoped (per lead/company/deal/ticket) and does not actually
paginate.
This RFC specifies a net-new, one-time historical migration pipeline that
ingests ~21,000+ Legacy CRM Person notes across ~130 client accounts (CIDs)
into CDP Notes. It is built by mirroring the migration framework that already
exists in contact-service — ActivityLogMigrationHandler →
ActivityLogMigrationConsumer.ProcessUpdateUserIDJob(job *work.Job) →
activity_log_migration_service.go, with the house status route
GET /private/activity_logs/migration/status (rest_router.go:74). The migration
is triggered by a gocraft/work job enqueue (not a synchronous HTTP call),
runs entirely in-process as an S2S consumer, resolves each CRM Person note to a
CDP contact via the contact's existing crm_data.id linkage
(contact/base.go:53,342-344), sanitizes the CRM rich HTML server-side, re-links
attachments to company-scoped CDP storage, and inserts notes idempotently
(stored legacy_crm_note_id + a partial unique index) while preserving the
original CRM timestamps.
Success Criteria
- Migration completeness ≥ 99% match_pct per CID (source count vs CDP migrated count) before any CID cutover — PRD §13.
- CIDs migrated: 100% of ~130 Notes-using CIDs at completed_success by CDP GA — PRD §13.
- Attachment success ≥ 95% (re-linked images + audios + documents / total) — PRD §13.
- Idempotency: a full re-run where every note already exists inserts zero duplicates (notes_migrated=0, notes_skipped=N) — PRD NOTES-MIG-S03/AC-3.
- Throughput / window: ≥ 10,000 notes/hour/CID; ≤ 4h window/CID — PRD §7.
- Integrity: failure rate ≤ 1%/CID; halt + alert above; zero silent failures (every failed record logged with a reason code) — PRD §7.
- Timestamp fidelity: migrated notes render in reverse-chronological order by their original CRM created_at, not insert time — PRD NOTES-MIG-S02/AC-4, S05/AC-1.
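The completeness and integrity thresholds above can be read as a small decision function over the per-CID counters. The following is a minimal sketch, not the consumer's actual code: the RunCounters type, its field names, and the evaluate helper are illustrative assumptions, and treating an empty source as a 100% match is also an assumption. It only classifies final tallies; the mid-run halt described in the PRD would apply the same failure-rate check per batch.

```go
package main

import "fmt"

// RunCounters holds per-CID tallies. The type and field names are
// hypothetical; only the thresholds come from the success criteria.
type RunCounters struct {
	Processed   int  // notes attempted in this run
	Failed      int  // notes that failed with a reason code
	SourceCount int  // CRM-side count of in-scope notes
	SourceKnown bool // false when the CRM count is unavailable
	CDPCount    int  // CDP notes that carry a legacy_crm_note_id
}

const (
	maxFailureRate = 0.01 // failure rate must stay at or below 1% per CID
	minMatchPct    = 99.0 // match_pct must reach at least 99% per CID
)

// failureRate returns failed/processed, or 0 when nothing was processed.
func failureRate(c RunCounters) float64 {
	if c.Processed == 0 {
		return 0
	}
	return float64(c.Failed) / float64(c.Processed)
}

// matchPct compares the CDP migrated count with the CRM source count.
// Assumption: an empty source counts as a full match.
func matchPct(c RunCounters) float64 {
	if c.SourceCount == 0 {
		return 100
	}
	return 100 * float64(c.CDPCount) / float64(c.SourceCount)
}

// evaluate maps the counters to one of the status values named in this RFC.
func evaluate(c RunCounters) string {
	if failureRate(c) > maxFailureRate {
		return "halted"
	}
	if !c.SourceKnown || matchPct(c) < minMatchPct {
		return "completed_with_errors"
	}
	return "completed_success"
}

func main() {
	run := RunCounters{Processed: 1000, Failed: 5, SourceCount: 1000, SourceKnown: true, CDPCount: 995}
	fmt.Println(evaluate(run), failureRate(run), matchPct(run))
}
```

Because CDPCount counts every note that carries a legacy_crm_note_id, a full idempotent re-run (zero inserts, all skipped) still evaluates against the same match_pct.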
Out of Scope
- No real-time / ongoing sync — one-time historical migration only (PRD §6.1).
- No live @mentions — embedded CRM mention anchors are stripped to plain @Name text; native CDP mentions are a separate PRD (PRD §6.2, D-8).
- No dedup vs human-created CDP notes — idempotency is enforced only by legacy_crm_note_id (PRD §6.3).
- No client self-service trigger/monitor UI — Ops-triggered S2S only (PRD §6.4).
- No deletion/archival of source CRM notes during the retention window (PRD §6.5). CRM source is read-only and untouched.
- Activity-type entries (Calls/Emails/Meetings/WhatsApp/SMS) are excluded by default — notes-only filter on crm_note_type_id (PRD §6.6, OQ-4 default).
- No notes for other Qontak products (Inbox/Campaign/Chatbot) (PRD §6.7).
- S06 "Legacy" banner/tag is OUT of scope — the FE has no banner/tag infra and CustomerNote has no metadata field; it is not a no-UI-change item and is re-scoped to a separate FE+BE change (PRD §6.8, D-9). There is no frontend work in this RFC — migrated notes render through the existing CDP Notes UI.
- crm_checkin geolocation is dropped — a deliberate data-loss decision (Decision 9, PRD D-10).
Related Documents
- Source PRD (v2.1): ../prds/prd-legacy-migration-crm-notes.md
- Initiative README: ../README.md
- Jira epic: TF-3183 — https://jurnal.atlassian.net/browse/TF-3183
- Sibling RFC (pattern reference): ../../legacy-migration-crm-activity-logs/ — the same ActivityLogMigration* framework is the in-repo reference this RFC mirrors.
Assumptions
- The CRM crm_data.id on a CDP contact holds the CRM crm_person_id — confirmed by grounding (REV-1). The contact document stores CrmData{ID} (contact/base.go:53,342-344), populated from the CRM contact-sync payload's contact_id for app_name="crm" (payload/contact_sync_request.go:104-105), and on the CDP-initiated create-back from CrmContactResponse.CrmID (consumer/send_contact.go:303-323). On the CRM side both values are the Crm::Lead/Crm::Person primary key (crm/centralized_contacts/params_mapper.rb:31 "contact_id": @lead.id.to_s; centralized_contacts_controller.rb:120-129 crm_id: lead.id) — there is no separate "centralized-contact" id space (Crm::Lead < Crm::Person, STI on crm_people). So a Person note's crm_person_id matches the CDP contact's crm_data.id directly (string-cast: crm_data.id is stored as a string, base.go:343). An index crm_contact_index exists on crm_data.id (db/migrations/001_create_contact.up.json). The remaining per-CID coverage (some CRM persons may not have synced to a CDP contact) is a data-quality gate, not an id-space ambiguity — OQ-2.
- The CRM exposes no org-scoped notes extraction today (verified: v4 notes API is entity-scoped and unpaginated, api/v4/notes.rb:137-147). The Legacy CRM Squad will deliver a net-new S2S org-scoped extraction contract that contact-service consumes via the existing QontakCrmClient pattern (internal/app/api/qontak_crm.go, already authenticated through CRM_API_ROOT_URL + CRM_API_AUTH, config/load.go:197-198) — OQ-7.
- CRM attachment originals are served from a CarrierWave/S3 bucket with public-read ACL (config/initializers/carrierwave-s3.rb:27,58) and CDN-rewritten URLs — i.e. generally fetchable without signing. CDP still re-uploads them to company-scoped storage rather than referencing CRM URLs (Decision 8). This corrects the PRD's "internal creds" assumption — confirm the bucket is not later locked down (OQ-9).
- MongoDB is schemaless, so adding legacy_crm_note_id / legacy_owner_label to contact_notes needs no DDL migration — only application struct fields plus one partial unique index migration (db/migrations/, JSON format).
- The "≤1 voice_note per note" rule in PRD OQ-8 is not enforced in contact-service today — note validation accepts any number of attachments of type {image,doc,pdf,video,voice_note,xlsx} (contact_notes_service.go:286-293). This RFC therefore does not add a voice_note cap unless the product owner requires one (OQ-8).
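The first assumption (a Person note's crm_person_id equals the contact's crm_data.id after a string cast) can be illustrated with a small resolver sketch. Everything here is hypothetical scaffolding: the CRMNote and Contact types, the int64 id type on the CRM side, and the lookupKeys/resolve helpers are assumptions for illustration, and the Mongo query itself (an $in filter on crm_data.id) is replaced by an in-memory slice.

```go
package main

import (
	"fmt"
	"strconv"
)

// CRMNote is a trimmed, hypothetical view of one extracted CRM Person note.
// Assumption: CRM primary keys are integers.
type CRMNote struct {
	ID          int64
	CRMPersonID int64
}

// Contact is a trimmed, hypothetical view of a CDP contact document.
type Contact struct {
	ID        string // CDP contact id
	CrmDataID string // crm_data.id, stored as a string
}

// lookupKeys returns the distinct string-cast crm_person_id values that
// would be passed to an $in filter on crm_data.id.
func lookupKeys(notes []CRMNote) []string {
	seen := make(map[string]bool)
	keys := []string{}
	for _, n := range notes {
		k := strconv.FormatInt(n.CRMPersonID, 10)
		if !seen[k] {
			seen[k] = true
			keys = append(keys, k)
		}
	}
	return keys
}

// resolve maps CRM note id to CDP contact id. Notes whose person has no
// synced contact are returned separately (the CONTACT_NOT_MAPPED case).
func resolve(notes []CRMNote, contacts []Contact) (map[int64]string, []int64) {
	byCrmID := make(map[string]string, len(contacts))
	for _, c := range contacts {
		byCrmID[c.CrmDataID] = c.ID
	}
	resolved := make(map[int64]string)
	var unmapped []int64
	for _, n := range notes {
		if id, ok := byCrmID[strconv.FormatInt(n.CRMPersonID, 10)]; ok {
			resolved[n.ID] = id
		} else {
			unmapped = append(unmapped, n.ID)
		}
	}
	return resolved, unmapped
}

func main() {
	notes := []CRMNote{{ID: 1, CRMPersonID: 100}, {ID: 2, CRMPersonID: 200}}
	contacts := []Contact{{ID: "contact-a", CrmDataID: "100"}}
	resolved, unmapped := resolve(notes, contacts)
	fmt.Println(lookupKeys(notes), resolved, unmapped)
}
```

Deduplicating the keys before the lookup keeps the filter small when many notes belong to the same person, and the unmapped list is what feeds the per-CID coverage gate in OQ-2.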
Dependencies
| Dependency | Owner | Availability | Blocking? |
|---|---|---|---|
| gocraft/work worker + job registration | CDP BE | Exists — go.mod github.com/gocraft/work v0.5.1; registerJobWithOptions(...) internal/worker/worker_service.go:132,138 | Reuse |
| Existing migration framework to mirror | CDP BE | Exists — ActivityLogMigrationHandler (activity_log_migration_handler.go:32,91), ActivityLogMigrationConsumer.ProcessUpdateUserIDJob (activity_log_migration_consumer.go:25), activity_log_migration_service.go (Redis status key :25, TTL 7d :31, batch 10000 :28) | Reuse (mirror) |
| JobEnqueuer.EnqueueJob | CDP BE | Exists — internal/app/service/job_enqueuer.go:38-39,65-67 (work.Q{"data": params, ...}) | Reuse |
| /private route group (BasicAuth S2S) | CDP BE | Exists — rest_router.go:69-70,78-79; status route :74; mymiddleware.BasicAuth internal/pkg/middleware/basic_auth.go:10 | Reuse |
| contact_notes Mongo store | CDP BE | Exists — repository/contact_notes/base.go:39-41 (TableName()="contact_notes"); no legacy_crm_note_id/legacy_owner_label, no batch insert, no unique index | Extend |
| Contact.CrmData.ID linkage + crm_contact_index | CDP BE / Data Eng | Exists — contact/base.go:53,342-344; index db/migrations/001_create_contact.up.json | Reuse (confirm coverage — OQ-2) |
| ContactRepository.SearchWithFilters / CountWithFilters | CDP BE | Exists — contact/search.go:125,147 (driven with bson.M{"crm_data.id": {"$in": …}}) | Reuse |
| QontakCrmClient (S2S CRM HTTP client) | CDP BE | Exists — internal/app/api/qontak_crm.go:14-24; config CRM_API_ROOT_URL/CRM_API_AUTH config/load.go:197-198 | Extend (new ListPersonNotes) |
| CRM org-scoped Person-notes extraction contract (NET-NEW) | Legacy CRM Squad | Does NOT exist — v4 API is entity-scoped + unpaginated (api/v4/notes.rb:137-147); no /crm/notes/count, no org-scoped bulk | YES |
| Server-side HTML sanitizer (Go) | CDP BE | Net-new — none exists (grep sanitize/bluemonday → 0 hits in notes service) | YES (add lib) |
| CDP company-scoped attachment storage (re-upload) | CDP Infra | Confirm — re-upload to {company_sso_id}/...; quota for full attachment volume incl. documents | YES (confirm) |
| User identity (CRM creator_id → SSO UUID) | Launchpad / Identity | Exists — owner-name resolution path contact_notes_service.go:131-136 (GetUserNamesBulk) | NO (degrades quality only) |
| CSM approval + maintenance window | CSM | Per-CID consent | YES (Stage 2+) |
PRD-to-Schema Derivation (backend-specific — required)
Backend RFCs do not require Figma. The "design" is the schema + contracts derived from the PRD's entities, business rules, and consumer needs.
| PRD entity / attribute / rule | Persisted as (collection.field) | Exposed / enforced via | Enforced where | Source |
|---|---|---|---|---|
| A migration run for one CID becomes a durable, pollable job | Redis status record (mirror activity_log_migration:user_id_update) — {status, progress_pct, notes_processed, notes_total, failure_rate, match_pct} | POST /private/notes/migrate (enqueue) + GET /private/notes/migration/status?cid= | NotesMigrationService.ValidateAndEnqueue + BasicAuth; status written by the consumer | PRD §8, §9 #1/#9, NOTES-MIG-S01 |
| CRM note id (idempotency key) | contact_notes.legacy_crm_note_id (NET-NEW) + partial unique index (company_sso_id, legacy_crm_note_id) | skip-on-conflict at batch insert | CreateNotesBatch repo method + partial unique index | PRD §7, §9.1, NOTES-MIG-S03 |
| CRM note note (sanitized rich HTML) | contact_notes.note (HTML, ≤10000 chars) | server-side sanitize + mention-strip on the migrate write path | HtmlNormalizer in the consumer (net-new) | PRD §9.1, D-4/D-8 |
| CRM crm_person_id → CDP contact | contact_notes.contact_id (resolved) | SearchWithFilters(bson.M{"crm_data.id":{"$in":…}}) | ContactResolver in the consumer | PRD §9 #3, §9.1, D-6/D-12 |
| CRM creator_id → owner | contact_notes.owner_id + contact_notes.legacy_owner_label (NET-NEW) | live owner-name resolution unchanged; label shown when owner_id=null | OwnerResolver; render path contact_notes_service.go:131-136 | PRD §9 #4, §9.1, D-7 |
| CRM created_at/updated_at (TZ) | contact_notes.created_at/updated_at (UTC, preserved) | migrate path bypasses SetDefaults() | CreateNotesBatch sets timestamps explicitly | PRD §9.1, D-2 |
| CRM crm_note_images / crm_note_audios / crm_note_attachments (documents) | contact_notes.attachments[] ({url,type,file_size*,file_name}) | re-upload to {company_sso_id}/... → proxy URL; type ∈ {image,doc,pdf,video,voice_note,xlsx} | AttachmentProcessor in the consumer | PRD §9 #6, §9.1, D-10 |
| CRM crm_note_type_id (activity taxonomy) | (filter — not stored) | notes-only filter (exclude Calls/Emails/…) | CRMExtractor query / consumer filter | PRD §9.1, OQ-4 |
| CRM crm_checkin (geolocation) | (NOT migrated) | explicit drop; logged per note | consumer (no field written) | PRD §9.1, D-10 |
| Source-vs-CDP count validation | Mongo CountWithFilters on contact_notes where legacy_crm_note_id exists | match_pct in status | ValidationRunner in the consumer | PRD §9 #8, NOTES-MIG-S04 |
Every §2.3 collection field and every §2.4 endpoint traces back to a row here.
Detail 1.A — PRD Traceability Matrix
Forward (PRD AC → RFC):
| PRD composite AC id | Service / endpoint / job | RFC section |
|---|---|---|
| NOTES-MIG-S01/AC-1 | POST /private/notes/migrate → enqueue NotesMigrationJobName | §2.4 row 1 · Decision 1/2 |
| NOTES-MIG-S01/AC-2 | GET /private/notes/migration/status?cid= (progress) | §2.4 row 2 · §2.F |
| NOTES-MIG-S01/AC-3 | consumer → completed_success (failure ≤1%, match ≥99%) | §2.2 · §2.F · §3 |
| NOTES-MIG-S01/ERR-1 | flag OFF → 403 FLAG_DISABLED | §2.4 · §3.B |
| NOTES-MIG-S01/ERR-2 | already completed → 409 ALREADY_MIGRATED | §2.4 · §3.B |
| NOTES-MIG-S01/ERR-3 | failure >1% → halt, halted, PagerDuty | §2.2 (failure) · §3 Monitoring |
| NOTES-MIG-S01/ERR-4 | non-S2S call → 401/403 (BasicAuth; no IAG/user path) | Decision 2 · §3 Role × Endpoint |
| NOTES-MIG-S02/AC-1 | ContactResolver via crm_data.id | §2.4 algorithm · Decision 7 |
| NOTES-MIG-S02/AC-2 | HtmlNormalizer sanitize + mention-strip; no <p> wrap | Decision 5 · §3 Security |
| NOTES-MIG-S02/AC-3 | AttachmentProcessor re-links documents | Decision 8 |
| NOTES-MIG-S02/AC-4 | preserved created_at/updated_at | Decision 3 |
| NOTES-MIG-S02/AC-5 | skip on existing legacy_crm_note_id | Decision 4 |
| NOTES-MIG-S02/ERR-1 | no contact match → CONTACT_NOT_MAPPED, skip | §2.4 · §3.A |
| NOTES-MIG-S02/ERR-2 | attachment download fail → insert without it | Decision 8 · §3.A |
| NOTES-MIG-S02/ERR-3 | owner unmappable → owner_id=null + label | Decision 6 |
| NOTES-MIG-S02/ERR-4 | activity-type note → skipped (out-of-scope count) | §2.4 filter · OQ-4 |
| NOTES-MIG-S03/AC-1..AC-3 | idempotent re-run via partial unique index | Decision 4 · §2.E |
| NOTES-MIG-S03/ERR-1 | two concurrent jobs/CID → one runs, other 409 | §2.E · Decision 2 |
| NOTES-MIG-S04/AC-1, AC-2 | ValidationRunner count compare → match_pct | §2.4 · §2.F |
| NOTES-MIG-S04/ERR-1, ERR-2 | match_pct<99% / count unavailable → completed_with_errors | §2.F · §3 |
| NOTES-MIG-S05/AC-1..AC-3 | render via existing CDP Notes UI (orig ts, attachment, label) | §1 Out of Scope #8 (no FE work) · Decision 6 |
| NOTES-MIG-S05/ERR-1, ERR-2 | missing attachment / hidden edit-delete — existing UI behavior | §1 Out of Scope #8 · OQ-6 |
| NOTES-MIG-S06-NEG/NEG-1 | mention anchors → plain text | Decision 5 |
| NOTES-MIG-S06-NEG/NEG-2 | activity entries excluded | §2.4 filter · OQ-4 |
Reverse (RFC → PRD AC):
| New endpoint / field / job / dependency | PRD composite AC id it serves |
|---|---|
| POST /private/notes/migrate | NOTES-MIG-S01/AC-1, ERR-1, ERR-2, ERR-4 |
| GET /private/notes/migration/status | NOTES-MIG-S01/AC-2, AC-3; NOTES-MIG-S04/* |
| NotesMigrationConsumer.ProcessNotesMigrationJob | NOTES-MIG-S01/AC-3; S02/*; S04/* |
| contact_notes.legacy_crm_note_id + partial unique index | NOTES-MIG-S02/AC-5; NOTES-MIG-S03/AC-1..AC-3, ERR-1 |
| contact_notes.legacy_owner_label (net-new) | NOTES-MIG-S02/ERR-3; NOTES-MIG-S05/AC-3 |
| CreateNotesBatch (timestamp-preserving, bypass SetDefaults) | NOTES-MIG-S02/AC-4; NOTES-MIG-S05/AC-1 |
| HtmlNormalizer (sanitize + mention-strip) | NOTES-MIG-S02/AC-2; NOTES-MIG-S06-NEG/NEG-1 |
| AttachmentProcessor | NOTES-MIG-S02/AC-3, ERR-2; NOTES-MIG-S05/AC-2, ERR-1 |
| QontakCrmClient.ListPersonNotes (+ CRM net-new endpoint) | NOTES-MIG-S01/AC-1; S02/* (extraction) |
UI / Consumer Surface Coverage
| PRD-named surface | Consumer | Required reads | Required writes | Status surface |
|---|---|---|---|---|
| Migration trigger | Ops (S2S) | n/a | POST /private/notes/migrate | job_id + Redis status |
| Migration monitor | Ops (S2S) | GET /private/notes/migration/status?cid= | n/a | status/progress_pct/match_pct |
| Migrated notes panel | web + mobile (existing CDP Notes UI) | existing GET /iag/v1/contacts/{id}/notes | n/a — populated by the consumer | note created_at, owner_name/legacy_owner_label, attachments[] |
The notes panel is existing UI — no FE work in this RFC (Out of Scope #8).
Role Coverage
| PRD role | Authorization mechanism | Endpoints permitted | Cross-tenant? | Audit trail |
|---|---|---|---|---|
| Internal Ops / Migration Engineer | HTTP Basic auth (S2S, /private) | POST /private/notes/migrate, GET /private/notes/migration/status | yes — explicit per-batch company_sso_id (this is the only system path; note CRUD cannot do this) | per-record success/failure log + Redis status (7d) + audit map (permanent) |
| Migrated Sales/Support Agent | IAG JWT (existing) | existing GET /iag/v1/contacts/{id}/notes only | no — company-scoped by IAG context | n/a (read of migrated data) |
| Client admin / end user | — | none for migration | no | 401/403 (not a logged-in path) |
PRD Section Coverage
| PRD § | Title | Where covered |
|---|---|---|
| 3 | One-liner + Problem | §1 Overview |
| 4 | What happens if we don't build | §1 Overview (problem) |
| 5 | Target Users + Persona | §1 Detail 1.A Role Coverage |
| Scope Changes | affected surfaces | frontmatter scope_changes + §2.I |
| 6 | Non-Goals | §1 Out of Scope |
| 7 | Constraints | §2 Technical Decisions, §2.4, §3 |
| 7.1 | Data Lifecycle | §2.3 per-status lifecycle + §3.D Compliance |
| 8 | New Features (component tree) | §2.1 Architecture + §2.I Scope Boundaries |
| 9 | API & Webhook Behavior | §2.4 APIs + §2.2 Sequences |
| 9.1 | Schema Mapping | §1 PRD-to-Schema + §2.3 DDL |
| 10 | System Flow + Stories + ACs | §2.2 Sequences + §1 Detail 1.A/1.C |
| 11 | Rollout | §4 Rollout Strategy |
| 12 | Observability | §3 Monitoring & Alerting |
| 13 | Success Metrics | §1 Success Criteria + §3 SLO |
| 14 | Launch Plan & Stage Gates | §4 Rollout Strategy |
| 15 | Dependencies | §1 Dependencies + §2.F.1 Responsibility Boundary |
| 16 | Key Decisions + Alternatives | §2 Technical Decisions (ADR) + §1 Detail 1.B |
| 17 | Open Questions | §5 Concerns / Open Questions |
| App. A | Grounded Code References | §2.0 Repo Reading Guide + Source Verification |
Detail 1.B — Key Decisions Summary (full ADR treatment in §2 Technical Decisions)
| # | Decision | Chosen option | §2 block | PRD ref |
|---|---|---|---|---|
| 1 | Insert mechanism | In-process repository batch write (no internal HTTP /cdp/notes/migrate) | Decision 1 | D-1/D-11 |
| 2 | Trigger + status | gocraft/work job + /private/... (BasicAuth) + Redis status | Decision 2 | D-3/D-11 |
| 3 | Timestamps | Caller-set; migrate path bypasses SetDefaults() | Decision 3 | D-2 |
| 4 | Idempotency | legacy_crm_note_id + partial unique index | Decision 4 | D-1 |
| 5 | Content | Server-side sanitize (net-new) + strip mention anchors; no <p> wrap | Decision 5 | D-4/D-8 |
| 6 | Owner | owner_id=null + net-new legacy_owner_label fallback | Decision 6 | D-7 |
| 7 | Person→Contact | Resolve via crm_data.id (indexed); precedence for multi-FK | Decision 7 | D-6/D-12 |
| 8 | Attachments | Re-upload images+audios+documents to company-scoped storage | Decision 8 | D-10 |
| 9 | Check-in | Geolocation dropped (deliberate data loss) | Decision 9 | D-10 |
| 10 | Extraction | Extend QontakCrmClient; CRM squad adds org-scoped endpoint | Decision 10 | D-5 |
| 11 | Activity scope | Notes-only filter on crm_note_type_id | Decision 11 | OQ-4 |
Minimum-coverage notes — Storage: reuse contact_notes Mongo (Decision 1/4). Sync vs async: async gocraft/work (Decision 2). Caching: n/a — one-shot backfill; no read cache. Third-party: CRM via extended QontakCrmClient (Decision 10). Consistency: per-record eventual; idempotency makes re-runs safe (Decision 4). Multi-tenancy: explicit per-batch company_sso_id + company-scoped queries/storage (Decision 2/8). Reuse vs new: every endpoint tagged in §2.4.
Detail 1.C — Per-Story Change Map
| Story id | Title | Layer scope | BE changes (concrete artifacts) | Composite AC ids | Acceptance criteria (verifiable) | RFC anchors |
|---|---|---|---|---|---|---|
| NOTES-MIG-S01 | Run batch migration for a CID | BE-only | POST /private/notes/migrate + GET /private/notes/migration/status handlers (internal/app/handler/), NotesMigrationService.ValidateAndEnqueue (internal/app/service/), NotesMigrationConsumer.ProcessNotesMigrationJob (internal/app/consumer/), NotesMigrationJobName const, worker registration (internal/worker/worker_service.go), Redis status record | S01/AC-1, AC-2, AC-3, ERR-1..ERR-4 | go test: enqueue returns {job_id}; status returns progress; 403 flag OFF; 409 already-migrated; 401/403 when not BasicAuth | §2.4 rows 1-2 · §4.D chunks 1,7,8 · §1 PRD-to-Schema rows 1-2 |
| NOTES-MIG-S02 | Transform CRM note → CDP schema | BE-only | ContactResolver (SearchWithFilters on crm_data.id), OwnerResolver, HtmlNormalizer (sanitizer), AttachmentProcessor, CreateNotesBatch (timestamp-preserving) | S02/AC-1..AC-5, ERR-1..ERR-4 | go test: contact resolved by crm_data.id; HTML sanitized, mentions→text, no <p> wrap; document re-linked; ts preserved; dup skipped; each ERR path logged + counted | §2.4 algorithm · Decisions 3,5,6,7,8 · §4.D chunks 2-6 |
| NOTES-MIG-S03 | Idempotent re-run | BE-only | contact_notes.legacy_crm_note_id field + partial unique index migration; skip-on-conflict in CreateNotesBatch; in-progress guard (Redis) per CID | S03/AC-1..AC-3, ERR-1 | go test/integration: re-run inserts only missing; existing skipped; full re-run → migrated=0; concurrent jobs → one runs, other 409 | Decision 4 · §2.E · §4.D chunks 2,7 |
| NOTES-MIG-S04 | Validation & error reporting | BE-only | ValidationRunner (CountWithFilters on legacy_crm_note_id existence vs CRM count); structured failed-record log {legacy_crm_note_id, reason_code, details} | S04/AC-1, AC-2, ERR-1, ERR-2 | go test: match_pct computed; ≥99% → success; <99% → completed_with_errors + alert; count unavailable → VALIDATION_SKIPPED | §2.4 · §2.F · §3 Monitoring · §4.D chunk 8 |
| NOTES-MIG-S05 | View migrated notes in CDP | BE + FE consumes existing | No FE/BE UI work — migrated rows are read by the existing notes endpoint + UI; legacy_owner_label makes the author render | S05/AC-1, AC-2, AC-3, ERR-1, ERR-2 | manual/Stage-1: opening a contact shows migrated notes with original created_at (reverse-chron), attachment downloads from CDP storage, legacy_owner_label shows for unmapped owners | §1 Out of Scope #8 · Decision 6 · OQ-6 |
| NOTES-MIG-S06-NEG | Mentions not live; activities not flooded | BE-only (guard rail) | HtmlNormalizer strips mention anchors; consumer filters activity crm_note_type_id | S06-NEG/NEG-1, NEG-2 | go test: <a data-user-id> → plain @Name, no mention/notification; activity-type note excluded | Decision 5 · §2.4 filter · OQ-4 |
Coverage: all 6 PRD stories present exactly once. NOTES-MIG-S05 is "FE consumes existing" — no new FE work (Out of Scope #8); the only backend enabler is the legacy_owner_label field (Decision 6).
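The NOTES-MIG-S06-NEG guard rail in the change map (a mention anchor becomes plain @Name text) can be sketched as follows. This is a narrow illustration, not the HtmlNormalizer design: it assumes CRM mention anchors have the shape `<a data-user-id="...">@Name</a>` with the @Name already in the anchor text, the stripMentions helper is hypothetical, and a regular expression is used only for the mention strip. It is not a substitute for the allow-list HTML sanitization that the normalizer still needs.

```go
package main

import (
	"fmt"
	"regexp"
)

// mentionAnchor matches an <a> element that carries a data-user-id attribute
// and captures its inner text. Assumption: this is how CRM mention anchors
// look; attribute values containing '>' are not handled.
var mentionAnchor = regexp.MustCompile(`(?is)<a\b[^>]*\bdata-user-id\s*=[^>]*>(.*?)</a>`)

// stripMentions replaces each mention anchor with its inner text, leaving
// ordinary links and all other markup untouched.
func stripMentions(html string) string {
	return mentionAnchor.ReplaceAllString(html, "${1}")
}

func main() {
	in := `<p>Hi <a data-user-id="42" href="/u/42">@Budi</a>, see <a href="https://example.com">this</a></p>`
	fmt.Println(stripMentions(in))
}
```

Only anchors with data-user-id are rewritten, so ordinary hyperlinks in the note body survive for the sanitizer to judge separately.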
2. Technical Design
Infrastructure Topology
Deployment topology
flowchart TB
ops([Internal Ops / Migration Engineer]) -->|"HTTPS + Basic auth"| lb[API Gateway / Ingress]
lb -->|"POST /private/notes/migrate"| api["contact-service api pods xN<br/>(cmd/server, Chi router /private)"]
api -->|"enqueue NotesMigrationJobName"| q[["gocraft/work queue<br/>(Redis-backed)"]]
api -->|"write status"| redis[("Redis<br/>(migration status, TTL 7d)")]
q -->|consume| worker["contact-service worker pods xM<br/>(cmd/worker, NotesMigrationConsumer)"]
worker -->|"read crm_data.id / write notes"| mongo[("MongoDB primary<br/>(contacts, contact_notes)")]
worker -->|"update status / progress"| redis
worker -->|"HTTPS, Authorization: CRM_API_AUTH"| crm(["Legacy CRM<br/>(qontak.com, net-new extraction endpoint)"])
worker -->|"HTTPS download (public-read/CDN)"| s3(["CRM CarrierWave S3 / CDN"])
worker -->|"re-upload company-scoped"| store[("CDP attachment storage<br/>{company_sso_id}/...")]
worker -->|"creator_id to SSO UUID"| lp(["Launchpad / Identity"])
agent([Migrated agent]) -->|"GET /iag/v1/contacts/{id}/notes"| lb
Per-service responsibility
flowchart LR
subgraph cs["contact-service (CDP Backend)"]
ep1["POST /private/notes/migrate<br/>(trigger; BasicAuth S2S)"]
ep2["GET /private/notes/migration/status<br/>(monitor; BasicAuth S2S)"]
cons["NotesMigrationConsumer<br/>(extract to transform to insert to validate)"]
end
ep1 -->|"enqueue gocraft/work"| cons
cons -->|"HTTPS — ListPersonNotes (extend QontakCrmClient)"| crm(["Legacy CRM (CRM squad)"])
cons -->|"HTTPS — download originals"| s3(["CRM S3 / CDN"])
cons -->|"re-upload"| store(["CDP company-scoped storage (CDP Infra)"])
cons -->|"creator_id to SSO"| lp(["Launchpad (Identity)"])
cons -->|"resolve / batch insert"| db[("MongoDB: contacts, contact_notes")]
cons -->|"status / progress"| redis[("Redis")]
| Service | Use cases (this RFC) | Internal calls | External / third-party APIs |
|---|---|---|---|
| contact-service (server) | validate + enqueue migration; serve status | JobEnqueuer.EnqueueJob, Redis status, BasicAuth | — |
| contact-service (worker) | extract, resolve, sanitize, re-link, batch-insert, validate | ContactRepository.SearchWithFilters/CountWithFilters, contact_notes repo (CreateNotesBatch), Redis status | Legacy CRM (extraction, CRM squad); CRM S3/CDN; CDP storage (CDP Infra); Launchpad (Identity) |
Technical Decisions (ADR-format — the engineering heart)
Decision 1: Insert via an in-process repository batch write — not an internal POST /cdp/notes/migrate HTTP endpoint
Context The PRD (D-1, §9 #7) specifies a net-new POST /cdp/notes/migrate
batch S2S endpoint that a CDPNoteInserter calls. But the PRD also mandates D-11:
mirror the in-process migration framework that already exists in
contact-service (ActivityLogMigrationConsumer). These two are in tension: if
the migration consumer runs inside contact-service, it can write to the
contact_notes collection directly — an internal HTTP endpoint would mean the
service calling itself over the wire.
Options considered
- Option A — in-process batch write. The NotesMigrationConsumer calls a new repository method CreateNotesBatch(ctx, []ContactNote) directly (mirroring how ActivityLogMigrationConsumer calls activity_log_migration_service in-process).
  - Pros: no self-HTTP; reuses the proven framework; one fewer public surface to auth/rate-limit; transactional control over skip-on-conflict.
  - Cons: the batch-insert logic is not independently callable by an external orchestrator (acceptable — Ops triggers via the /private enqueue path).
- Option B — build POST /cdp/notes/migrate (PRD literal). An HTTP batch endpoint the consumer (or an external orchestrator) POSTs to.
  - Pros: matches PRD prose; reusable by a future external orchestrator.
  - Cons: redundant when the consumer is in-process; adds an auth surface; the /cdp/... namespace does not exist in the router (all routes are /iag/v1, /api/v1, /private, rest_router.go).
Decision Option A. The only HTTP surfaces are the /private/notes/migrate
trigger and the /private/notes/migration/status monitor (Decision 2). Insert is
a direct, idempotent repository write.
Rationale Anti-hallucination grounding: there is no /cdp route group; the
existing migration framework is in-process. Building a self-called HTTP endpoint
would fork the convention for no functional gain.
Consequences A new CreateNotesBatch repo method (skip-on-conflict via the
partial unique index, Decision 4) and a payload struct. If a future cross-service
caller needs batch insert, expose it then as a thin handler over the same method.
Reversibility High — adding an HTTP handler over CreateNotesBatch later is
additive.
Decision 2: Trigger = gocraft/work enqueue under /private (HTTP Basic auth); status is Redis-backed
Context Bulk migration has no logged-in user, so the existing note-write
path (company derived from IAG context, contact_notes_handler.go:75-79,478-486)
cannot serve it. The PRD (D-3/D-11) requires S2S with an explicit per-batch
company_sso_id, a job-enqueue trigger (not synchronous HTTP), and a status
endpoint under the house namespace.
Options considered
- Option A — mirror ActivityLogMigration. POST /private/notes/migrate (BasicAuth) → NotesMigrationService.ValidateAndEnqueue → EnqueueJob(NotesMigrationJobName, …); consumer ProcessNotesMigrationJob(job *work.Job) reads job.Args["data"]; status stored in Redis (key like notes_migration:{cid}, TTL 7d) and read by GET /private/notes/migration/status?cid=.
  - Pros: exact reuse of the proven framework (activity_log_migration_consumer.go:25, activity_log_migration_service.go:64-86, status route rest_router.go:74); BasicAuth is the established S2S mechanism; async survives the request lifecycle.
  - Cons: status is Redis (ephemeral, 7d TTL) — acceptable; the permanent audit map of legacy_crm_note_id → CDP note id lives in the contact_notes documents themselves (PRD §7.1).
- Option B — synchronous HTTP migrate. Reject — a >21k-note CID exceeds any HTTP timeout; the framework is built for async.
- Option C — new bearer/system-token auth. Reject — contact-service has no bearer S2S middleware; /private + /api/v1 are guarded by mymiddleware.BasicAuth (rest_router.go:70,79,280; basic_auth.go:10). The S2S field_properties migrate also uses BasicAuth (rest_router.go:344-349).
Decision Option A. Endpoints under /private/notes/..., BasicAuth. This is
a grounded deviation from the PRD's /cdp/notes/migrate path — the repo's S2S
namespace is /private and its S2S auth is HTTP Basic.
Rationale Maximum reuse + correct auth grounding. NOTES-MIG-S01/ERR-4 (reject
non-S2S) is satisfied by BasicAuth: a logged-in IAG user token is simply not
accepted on /private.
Consequences Redis status is ephemeral (7d). The per-record failed queue
(reason codes, 30d, PRD §7.1) and the permanent audit map are separate: the audit
map is intrinsic (each migrated note stores its legacy_crm_note_id); the failed
queue is a structured log stream (OQ-3 retry policy).
Reversibility High — endpoints are additive; framework is reused.
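The enqueue-and-consume contract in Decision 2 can be sketched with the standard library alone. This is a minimal illustration, not the service code: NotesMigrationPayload, its field types, and the helper names are assumptions, and the args map stands in for the Args field of a gocraft/work job.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// NotesMigrationPayload is a hypothetical shape for the enqueued job data.
// The RFC only states that the trigger carries a cid and a company_sso_id.
type NotesMigrationPayload struct {
	CID          string `json:"cid"`
	CompanySsoID string `json:"company_sso_id"`
}

// StatusTTL mirrors the 7-day TTL described for the Redis status key.
const StatusTTL = 7 * 24 * time.Hour

// StatusKey builds the Redis status key in the "notes_migration:{cid}" form.
func StatusKey(cid string) string {
	return fmt.Sprintf("notes_migration:%s", cid)
}

// DecodePayload reads args["data"] and converts it into a typed payload.
// Job arguments come back from Redis as generic JSON values, so a JSON
// round trip is a simple way to get a typed struct.
func DecodePayload(args map[string]interface{}) (NotesMigrationPayload, error) {
	var p NotesMigrationPayload
	raw, ok := args["data"]
	if !ok {
		return p, errors.New(`job args missing "data"`)
	}
	b, err := json.Marshal(raw)
	if err != nil {
		return p, fmt.Errorf("marshal job data: %w", err)
	}
	if err := json.Unmarshal(b, &p); err != nil {
		return p, fmt.Errorf("unmarshal job data: %w", err)
	}
	if p.CID == "" || p.CompanySsoID == "" {
		return p, errors.New("cid and company_sso_id are required")
	}
	return p, nil
}
```

A consumer in the style described above would call DecodePayload(job.Args) first and fail the job early when the payload is incomplete.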
Decision 3: Caller-set timestamps — the migrate write path bypasses SetDefaults()
Context ContactNote.SetDefaults() unconditionally overwrites
CreatedAt/UpdatedAt with time.Now() (base.go:51-54) and is called in
create.go:12 before the Mongo insert. Migrated notes must keep their original
CRM timestamps so the existing UI renders them in correct reverse-chronological
order (NOTES-MIG-S05/AC-1).
Options considered
- Option A — CreateNotesBatch sets timestamps explicitly and never calls SetDefaults() (it sets IsDeleted=false / Attachments=[] itself for the fields SetDefaults would otherwise initialise). Pros: surgical; leaves the single-CRUD create.go path untouched. Cons: must replicate the non-timestamp defaults SetDefaults provides.
- Option B — add a flag to SetDefaults() to skip timestamp overwrite. Cons: changes a shared method used by the live single-CRUD path; higher blast radius.
Decision Option A.
Rationale Lowest blast radius — the live note-create path is unchanged.
Consequences CreateNotesBatch owns default initialisation for migrated rows;
a unit test asserts the stored created_at equals the CRM value (not insert time).
Reversibility High — internal repo method.
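A minimal sketch of the timestamp-preserving build step, assuming trimmed stand-ins for the real structs. ContactNote and Attachment here carry only the fields named in this RFC, and LegacyNote and BuildMigratedNote are hypothetical names for illustration.

```go
package main

import "time"

// Attachment and ContactNote are trimmed stand-ins for the structs in
// contact_notes/base.go; bson/json tags are omitted.
type Attachment struct {
	URL  string
	Type string
}

type ContactNote struct {
	ContactID       string
	CompanySsoID    string
	Note            string
	Attachments     []Attachment
	LegacyCrmNoteID string
	IsDeleted       bool
	CreatedAt       time.Time
	UpdatedAt       time.Time
}

// LegacyNote is a hypothetical extraction record from the CRM side.
type LegacyNote struct {
	ID        string
	Content   string
	CreatedAt time.Time
	UpdatedAt time.Time
}

// BuildMigratedNote initialises the non-timestamp defaults itself and copies
// the CRM timestamps through unchanged. It deliberately never calls
// SetDefaults(), which would overwrite both timestamps with time.Now().
func BuildMigratedNote(src LegacyNote, contactID, companySsoID string) ContactNote {
	return ContactNote{
		ContactID:       contactID,
		CompanySsoID:    companySsoID,
		Note:            src.Content,
		Attachments:     []Attachment{}, // non-nil empty slice, as SetDefaults would provide
		LegacyCrmNoteID: src.ID,
		IsDeleted:       false,
		CreatedAt:       src.CreatedAt,
		UpdatedAt:       src.UpdatedAt,
	}
}
```

The unit test named under Consequences would then assert that the stored created_at equals the CRM value rather than the insert time.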
Decision 4: Idempotency via legacy_crm_note_id + a PARTIAL unique index
Context Re-runnable migration requires skip-on-conflict. ContactNote has no
legacy_crm_note_id (base.go:26-36) and contact_notes has four non-unique
indexes only (db/migrations/013_create_contact_notes.up.json).
Critical grounding (correctness). A naive unique index on
(company_sso_id, legacy_crm_note_id) would break every existing
human-created note: those documents have no legacy_crm_note_id, MongoDB indexes
a missing field as null, and the second such note per company collides on the
null key → E11000 duplicate key. The index must be partial, indexing only
documents where the field exists.
Options considered
- Option A — partial unique index + idempotent upsert. Index {company_sso_id:1, legacy_crm_note_id:1} with partialFilterExpression: {legacy_crm_note_id: {$exists: true}}. CreateNotesBatch writes via the existing IDbRepo.BulkUpdate(ctx, "contact_notes", []mongo.WriteModel) (db.go:180-181, already BulkWrite(SetOrdered(false))) using mongo.NewUpdateOneModel().SetFilter(bson.M{"company_sso_id":…, "legacy_crm_note_id":…}).SetUpdate(bson.M{"$setOnInsert": note}).SetUpsert(true) per note. This is idempotent — an already-migrated note is a no-op (the filter matches the existing document, so $setOnInsert writes nothing), so no E11000 is thrown and none needs catching (REV-3/REV-6). result.UpsertedCount = newly inserted; MatchedCount = skipped. The per-CID lock (§2.E) means no concurrent writer, so upsert + partial unique index is race-free. Pros: existing notes untouched; safe re-runs; uses an existing repo method. Cons: must remember the partial filter (captured here + in the migration JSON).
- Option B — plain unique index. Reject — corrupts existing data on first collision.
- Option C — app-level "check-then-insert". Reject — race-prone; the DB constraint is the correct backstop.
Decision Option A. Migration JSON (db/migrations/NNN_index_contact_notes_legacy_crm_note_id.up.json)
adds a createIndexes entry with "unique": true + the partialFilterExpression,
following the existing 013_create_contact_notes.up.json JSON pattern; .down.json
drops it.
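As an illustration of the up-migration content, the sketch below renders a createIndexes command as JSON from Go. The index name is hypothetical, and the array-of-commands layout is an assumption based on the golang-migrate MongoDB driver format; the actual file should follow 013_create_contact_notes.up.json.

```go
package main

import "encoding/json"

// Structs (rather than maps) keep the JSON field order stable, which matters
// for the compound index key.
type indexKey struct {
	CompanySsoID    int `json:"company_sso_id"`
	LegacyCrmNoteID int `json:"legacy_crm_note_id"`
}

type existsFilter struct {
	Exists bool `json:"$exists"`
}

type partialFilter struct {
	LegacyCrmNoteID existsFilter `json:"legacy_crm_note_id"`
}

type indexSpec struct {
	Key                     indexKey      `json:"key"`
	Name                    string        `json:"name"`
	Unique                  bool          `json:"unique"`
	PartialFilterExpression partialFilter `json:"partialFilterExpression"`
}

type createIndexesCmd struct {
	CreateIndexes string      `json:"createIndexes"`
	Indexes       []indexSpec `json:"indexes"`
}

// PartialUniqueIndexMigration returns the up-migration body as compact JSON.
func PartialUniqueIndexMigration() (string, error) {
	cmds := []createIndexesCmd{{
		CreateIndexes: "contact_notes",
		Indexes: []indexSpec{{
			Key:    indexKey{CompanySsoID: 1, LegacyCrmNoteID: 1},
			Name:   "contact_notes_legacy_crm_note_id_unique", // hypothetical name
			Unique: true,
			PartialFilterExpression: partialFilter{
				LegacyCrmNoteID: existsFilter{Exists: true},
			},
		}},
	}}
	b, err := json.Marshal(cmds)
	if err != nil {
		return "", err
	}
	return string(b), nil
}
```

Documents without legacy_crm_note_id fall outside the partial filter, so they are never entered into the unique index.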
Rationale The partial filter is the only correct way to add uniqueness to a collection whose legacy rows lack the field.
Consequences CreateNotesBatch counts result.MatchedCount as notes_skipped
and result.UpsertedCount as notes_migrated — no duplicate-key error path exists
to handle (the upsert is the skip). Note: CreateNotesBatch is implemented over
BulkUpdate (upsert), not IDbRepo.CreateMany (db.go:123, a default-ordered
InsertMany that would abort the whole batch on the first duplicate). NOTES-MIG-S03/AC-2/AC-3
verified by integration test.
Reversibility High — make migrate-down drops the index; data untouched.
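The skip-on-conflict counting can be modelled in memory to make the re-run behaviour concrete. This is a sketch of the semantics only: memStore and UpsertBatch are hypothetical, and the real write goes through BulkUpdate with $setOnInsert upserts as described in Decision 4.

```go
package main

// noteKey is the compound idempotency key covered by the partial unique index.
type noteKey struct {
	CompanySsoID    string
	LegacyCrmNoteID string
}

// MigratedNote is a minimal stand-in for a note headed for contact_notes.
type MigratedNote struct {
	LegacyCrmNoteID string
	Body            string
}

// BatchResult mirrors the two counters the RFC reads from the bulk result.
type BatchResult struct {
	UpsertedCount int // newly inserted, reported as notes_migrated
	MatchedCount  int // already present, reported as notes_skipped
}

// memStore is an in-memory stand-in for the contact_notes collection.
type memStore struct {
	docs map[noteKey]string
}

func newMemStore() *memStore {
	return &memStore{docs: make(map[noteKey]string)}
}

// UpsertBatch models one upsert per note with a $setOnInsert update: a key
// that already exists is matched and left untouched, a new key is inserted.
func (s *memStore) UpsertBatch(companySsoID string, notes []MigratedNote) BatchResult {
	var res BatchResult
	for _, n := range notes {
		k := noteKey{CompanySsoID: companySsoID, LegacyCrmNoteID: n.LegacyCrmNoteID}
		if _, exists := s.docs[k]; exists {
			res.MatchedCount++
			continue
		}
		s.docs[k] = n.Body
		res.UpsertedCount++
	}
	return res
}
```

Running the same batch twice yields all inserts on the first pass and all skips on the second, which is the property the NOTES-MIG-S03 integration test checks against the real collection.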
Decision 5: Server-side HTML sanitization (net-new) + strip mention anchors to plain text; no <p> re-wrap
Context CRM note content is sanitized rich HTML at write time in Rails
(crm/note.rb:43 before_save :sanitize_note → :379 → application_helper.rb:296-325,
Rails sanitize allowing a b i strong em u s span br div p ul ol li blockquote h1-h6 pre + attrs incl. data-user-id). CDP performs no server-side
sanitization today — content is length-validated only (contact_notes_service.go:268-274).
The migrate write path is a new ingestion surface for externally-sourced HTML, so
it must sanitize defensively (XSS posture; PRD D-4, Constraint §7 "Security").
Mentions are embedded as <a data-user-id="…"> anchors and /users/{id}/edit_user links referencing
CRM integer user IDs (crm/note.rb:99-107) that do not resolve in CDP (PRD D-8).
Options considered
- Option A — sanitize with a Go allow-list sanitizer (bluemonday), policy mirroring CRM's allowed tags, and a pre-pass that replaces mention anchors with their inner @Name text. Pros: defends the new ingestion surface; preserves safe markup; kills dangling mention links. Cons: adds a go.mod dependency (bluemonday) — verify license/approval.
- Option B — trust CRM (already sanitized), store as-is. Reject — CDP would inherit CRM's allow-list decisions for an unauthenticated bulk write path; defense-in-depth requires CDP to sanitize at its own boundary; mention anchors would remain as dead links.
- Option C — strip all HTML to plain text / wrap in <p> (v1.1 assumption). Reject — corrupts the rich markup the UI renders via DOMPurify; the PRD explicitly forbids <p> re-wrap (§9.1, alternatives-rejected).
Decision Option A, with a deny-by-default policy specified explicitly (do
not simply "mirror CRM" — CRM's Rails allow-list permits style/class/
data-mce-href, and style enables CSS-based UI-redress while an unscoped href
permits javascript:/data: URIs). Concrete bluemonday policy:
- Start from bluemonday.UGCPolicy() (strips style, scripts, event handlers).
- Allow only the structural tags a b i strong em u s span br div p ul ol li blockquote h1 h2 h3 h4 h5 h6 pre.
- On <a>: AllowStandardURLs() (http/https/mailto only — no javascript:/data:) + RequireNoFollowOnLinks(true); drop style.
- Pre-pass: replace every data-user-id/data-mention anchor (and /users/{id}/edit_user links) with its visible text prefixed @, then sanitize.
- Do not wrap output in <p>.
- Order: sanitize first, then validate the sanitized output against the existing max=10000 length rule (contact_notes_service.go:271-274) — a note exceeding 10000 chars post-sanitize is counted a failure, never silently truncated.
Rationale A bulk S2S write of externally-sourced HTML is exactly where server-side sanitization belongs; deny-by-default closes the stored-XSS gap (CDP has no sanitization today) without inheriting CRM's looser attribute allow-list; mention-stripping prevents dangling links + false notifications (Out of Scope #2).
Consequences New dependency (bluemonday, InfoSec approval OQ-10) + a
HtmlNormalizer unit suite: XSS payloads (<script>, onerror=, javascript:
href, style exfil) all neutralised; mention-strip; malformed HTML → best-effort +
warning; no <p> wrap; post-sanitize length enforced.
Reversibility Medium — sanitization is internal; the allow-list can be tuned.
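The pre-pass and the sanitize-then-validate ordering can be sketched without the sanitizer dependency by passing the sanitizer in as a function. Everything named here (StripMentions, NormalizeNote, ErrNoteTooLong) is hypothetical, the regex is a simplification that a real HTML tokenizer would replace, and counting the limit in runes is an assumption about how max=10000 is evaluated.

```go
package main

import (
	"errors"
	"regexp"
	"strings"
	"unicode/utf8"
)

// MaxNoteLen follows the existing max=10000 rule (assumed to count runes).
const MaxNoteLen = 10000

// ErrNoteTooLong marks a note that must be counted as a failure, not truncated.
var ErrNoteTooLong = errors.New("note exceeds max length after sanitization")

// mentionAnchor matches an <a> element whose opening tag carries data-user-id,
// data-mention, or a /users/{id}/edit_user link. Regex over HTML is a
// simplification for this sketch.
var mentionAnchor = regexp.MustCompile(`(?is)<a\b[^>]*(?:data-user-id|data-mention|/users/\d+/edit_user)[^>]*>(.*?)</a>`)

// anyTag strips markup nested inside a mention anchor.
var anyTag = regexp.MustCompile(`(?s)<[^>]*>`)

// StripMentions replaces each mention anchor with its visible text, prefixed
// with "@" when the text does not already start with it.
func StripMentions(html string) string {
	return mentionAnchor.ReplaceAllStringFunc(html, func(m string) string {
		sub := mentionAnchor.FindStringSubmatch(m)
		text := strings.TrimSpace(anyTag.ReplaceAllString(sub[1], ""))
		if text == "" {
			return ""
		}
		if !strings.HasPrefix(text, "@") {
			text = "@" + text
		}
		return text
	})
}

// NormalizeNote runs the pre-pass, then the injected sanitizer, then the
// length check on the sanitized output. It never wraps the result in <p>.
func NormalizeNote(raw string, sanitize func(string) string) (string, error) {
	clean := sanitize(StripMentions(raw))
	if utf8.RuneCountInString(clean) > MaxNoteLen {
		return "", ErrNoteTooLong
	}
	return clean, nil
}
```

In the service, the sanitize argument would be the Sanitize method of the approved allow-list policy; keeping it injectable also lets the HtmlNormalizer unit suite exercise the ordering without the dependency.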
Decision 6: Unmappable owner → owner_id=null + a net-new legacy_owner_label
Context owner_name is resolved live from identity, not stored
(contact_notes_service.go:131-136 GetUserNamesBulk); edit/delete permission is
computed live (contact_notes_handler.go:143-166). A migrated note whose CRM
creator_id has no SSO mapping would render a blank author and hidden
edit/delete. ContactNote has no field to carry a fallback label
(base.go:26-36).
Options considered
- Option A — add legacy_owner_label string to ContactNote (net-new, schemaless → no DDL). When creator_id→SSO fails, set owner_id=null + legacy_owner_label (e.g. "[Legacy CRM User]" or the CRM display name). The render path falls back to the label when owner_id is empty. Pros: author always renders; non-blocking. Cons: the existing render path (contact_notes_service.go:131-136) must learn to use the label (a small, bounded read-path change — note this is the only read-path touch and it does not alter the UI contract).
- Option B — drop unmappable notes. Reject — loses history (the whole point).
- Option C — store a sentinel owner_id. Reject — pollutes identity space; the label is cleaner.
Decision Option A.
Rationale Preserves author display without inventing identities; degrades gracefully (NOTES-MIG-S02/ERR-3, S05/AC-3).
Consequences Net-new field + a render-path branch; edit/delete may be hidden for label-only authors (acceptable for historical notes — OQ-6).
Reversibility High — additive field.
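The read-path branch can be sketched as a small pure function. NoteAuthor and DisplayAuthor are hypothetical names, and the names map stands in for whatever the bulk user-name lookup returns.

```go
package main

// NoteAuthor is a trimmed, hypothetical view of the fields the render path needs.
type NoteAuthor struct {
	OwnerID          *string // nil when the CRM creator has no SSO mapping
	LegacyOwnerLabel string  // net-new fallback label set by the migration
}

// DisplayAuthor prefers the live-resolved owner name and falls back to the
// legacy label only when no owner name is available. A note with neither
// keeps today's behaviour and renders an empty author.
func DisplayAuthor(n NoteAuthor, names map[string]string) string {
	if n.OwnerID != nil && *n.OwnerID != "" {
		if name, ok := names[*n.OwnerID]; ok && name != "" {
			return name
		}
	}
	return n.LegacyOwnerLabel
}
```

Because the label is consulted only after the live lookup, human-created notes, which never carry a label, render exactly as before.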
Decision 7: Resolve Person→Contact via crm_data.id (indexed), not source_id; precedence for multi-FK
Context The CDP contact stores the CRM linkage as Source, SourceID,
SourceName (contact/base.go:68-70) and CrmData{ID} (:53,342-344). An index
crm_contact_index exists on crm_data.id (db/migrations/001_create_contact.up.json);
no index exists on source_id. A CRM note can carry person/company/deal/ticket
FKs simultaneously — type is STI metadata, not a constraint
(crm/note.rb:5-8, schema type column).
Options considered
- Option A — resolve by crm_data.id using the purpose-built ContactRepository.SearchByAppContactID(ctx, "crm", crmPersonID) (contact/search.go:27), whose appNameColumnMapper maps "crm" → "crm_data.id" (contact/base.go:531-538); for batches, call it per id or extend it to an $in variant. Apply person-first precedence for multi-FK notes. Pros: hits the existing crm_contact_index; reuses a method built exactly for "resolve by source app + contact id"; crm_data.id == crm_person_id is confirmed (REV-1, see Assumptions). Cons: none material — coverage (not id-space) is the only variable (OQ-2).
- Option B — resolve by source_id. Reject as primary — unindexed → slow scans over ~130 CIDs; keep only as fallback.
- Option C — net-new external mapping table. Reject as primary — the linkage already exists on the contact (PRD D-12); a table is a last-resort fallback where crm_data.id coverage is incomplete (OQ-2).
Decision Option A.
Rationale Uses the indexed field and a purpose-built query method; the id-space
is confirmed (REV-1), so the only variable is coverage, measured cheaply via
CountDocuments/CountWithFilters for the pre-migration report.
Consequences OQ-2 is now a coverage gate only (not id-space): run the
per-CID coverage report and gate job start at ≥99%; unmatched notes →
CONTACT_NOT_MAPPED failed queue. crm_data.id is a string, so the resolver
string-casts the CRM crm_person_id before lookup.
Reversibility High — resolution strategy is internal; a mapping-table fallback is additive.
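Two small pieces of Decision 7 lend themselves to a sketch: deriving the string lookup key with person-first precedence, and the 99% coverage gate. The type and function names are hypothetical, and treating an empty CID as passing the gate is an assumption.

```go
package main

import "strconv"

// LegacyNoteRefs holds the nullable integer FKs a CRM note can carry at once.
type LegacyNoteRefs struct {
	PersonID  *int64
	CompanyID *int64
	DealID    *int64
	TicketID  *int64
}

// PersonLookupKey applies person-first precedence: when a person FK is set it
// wins regardless of the other FKs, and it is string-cast because crm_data.id
// is stored as a string. ok is false when the note has no person FK.
func PersonLookupKey(r LegacyNoteRefs) (key string, ok bool) {
	if r.PersonID == nil {
		return "", false
	}
	return strconv.FormatInt(*r.PersonID, 10), true
}

// CoverageOK reports whether matched/total meets the 99% gate. Integer
// arithmetic avoids floating-point rounding right at the threshold.
func CoverageOK(matched, total int64) bool {
	if total == 0 {
		return true // assumption: nothing to resolve counts as covered
	}
	return matched*100 >= total*99
}
```

The matched and total inputs would come from the per-CID pre-migration report, for example a count over contacts with a matching crm_data.id against the distinct person ids in the extraction.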
Decision 8: Re-link images + audios + documents to company-scoped CDP storage
Context A CRM note has three attachment associations, all CarrierWave/S3
assets: crm_note_images (crm/note.rb:21), crm_note_audios (:23), and the
documents association missed by v1.1 — crm_note_attachment has_one (:15) +
crm_note_attachments has_many (:20), model Crm::NoteAttachment
(note_attachment.rb, allowed types incl. PDF/Word/Excel/PPT/CSV/images/video/audio,
note_attachment_uploader.rb:13-50). CDP's ContactNote.Attachments[] is
{URL, Type, FileSizeInByte, FileSize, FileName} with Type ∈ {image,doc,pdf,video,voice_note,xlsx} (base.go:17-23, validation
contact_notes_service.go:286-293). CRM S3 is public-read (URLs generally
fetchable without signing, carrierwave-s3.rb:27,58).
Options considered
- Option A — download each asset, re-upload to CDP {company_sso_id}/... storage, store the proxy URL + derived Type. Pros: no permanent CRM-S3 dependency (alternatives-rejected); company-scoped (tenant isolation). Cons: download+upload latency (PRD budget ≤30s/file P95); type-mapping work.
- Option B — store CRM S3/CDN URLs directly. Reject — permanent legacy dependency; cross-tenant URL exposure.
Decision Option A. Type mapping: CRM image asset → image; audio → voice_note
(or video for video/* per content type); document → doc/pdf/xlsx by file
extension/content type (default doc). No ≤1 voice_note cap is enforced today
(contact_notes_service.go:286-293) — multiple audios are allowed; OQ-8 decides
whether to add a cap. Download safety (required): (a) SSRF guard — only
fetch URLs whose host is on an allow-list of the CRM S3/CDN domains (reject
arbitrary hosts, internal IPs, and cloud metadata endpoints); the URL comes from CRM
API response data and must not be trusted blindly; (b) verify the downloaded
magic bytes / content-type match the declared Type (the extension is
attacker-influenceable); (c) enforce a max download size. Storage key
(deterministic, idempotent on re-run): {company_sso_id}/{legacy_crm_note_id}/{asset}
— a re-run overwrites the same key safely (matches §2.E).
Rationale Matches PRD §9 #6 and the data-lifecycle (CDP holds its own copy); the SSRF/content-type guards harden a new outbound-fetch path against malicious or malformed CRM URLs.
Consequences A per-attachment failure inserts the note without that
attachment and logs ATTACHMENT_*_FAILED (non-blocking — NOTES-MIG-S02/ERR-2).
Reversibility Medium — re-uploaded objects would need cleanup if reverted.
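The download-safety rules, the type mapping, and the deterministic storage key can be sketched as pure functions. All names are hypothetical; restricting fetches to https and the exact extension list are assumptions, and the host allow-list would be the configured CRM S3/CDN domains.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
	"path"
	"strings"
)

// ErrURLNotAllowed is returned for any URL the SSRF guard refuses to fetch.
var ErrURLNotAllowed = errors.New("asset URL not allowed")

// CheckAssetURL accepts only https URLs whose host is on the allow-list.
// IP-literal hosts (internal ranges, cloud metadata endpoints) are rejected
// outright before the allow-list lookup.
func CheckAssetURL(raw string, allowedHosts map[string]bool) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("%w: %v", ErrURLNotAllowed, err)
	}
	if u.Scheme != "https" {
		return fmt.Errorf("%w: scheme %q", ErrURLNotAllowed, u.Scheme)
	}
	host := strings.ToLower(u.Hostname())
	if net.ParseIP(host) != nil {
		return fmt.Errorf("%w: IP literal host %q", ErrURLNotAllowed, host)
	}
	if !allowedHosts[host] {
		return fmt.Errorf("%w: host %q not on allow-list", ErrURLNotAllowed, host)
	}
	return nil
}

// MapAttachmentType derives the CDP attachment Type from the content type
// first and the file extension second, defaulting to "doc".
func MapAttachmentType(contentType, fileName string) string {
	ct := strings.ToLower(contentType)
	ext := strings.ToLower(path.Ext(fileName))
	switch {
	case strings.HasPrefix(ct, "image/"):
		return "image"
	case strings.HasPrefix(ct, "video/"):
		return "video"
	case strings.HasPrefix(ct, "audio/"):
		return "voice_note"
	case ct == "application/pdf" || ext == ".pdf":
		return "pdf"
	case ext == ".xls" || ext == ".xlsx":
		return "xlsx"
	default:
		return "doc"
	}
}

// StorageKey is deterministic, so a re-run overwrites the same object.
func StorageKey(companySsoID, legacyCrmNoteID, asset string) string {
	return fmt.Sprintf("%s/%s/%s", companySsoID, legacyCrmNoteID, asset)
}
```

A host allow-list alone does not cover redirects, so the HTTP client used for the download would also need to refuse or re-check redirect targets, and the magic-byte and size checks from (b) and (c) still apply after the fetch.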
Decision 9: Drop crm_checkin geolocation (deliberate data loss)
Context crm_checkin is has_one :crm_checkin, class_name 'Crm::Checkin'
(crm/note.rb:16); Crm::Checkin < Crm::Location (STI on crm_locations), with
longitude, lattitude (sic), address (now Lockbox-encrypted
address_ciphertext, schema crm_locations :1789), checkin_time. CDP Notes
have no geolocation field and no PRD requirement to render one.
Decision Do not migrate check-in geolocation. Log a per-note marker when a note has a check-in so the data loss is auditable (PRD D-10).
Rationale No CDP target field, no requirement; mischaracterising it as a string (v1.1) hid real data loss. Encrypted address would additionally require key access.
Consequences Documented, audited data loss; revisit only if a CDP geo field is added later.
Reversibility N/A — explicit non-goal.
Decision 10: Extract via the existing QontakCrmClient; CRM squad delivers a net-new org-scoped endpoint
Context The assumed GET /crm/notes?organization_id&limit&offset does not
exist. The real v4 API (api/v4/notes.rb:131-147) is entity-scoped (requires
lead/company/deal/ticket) and does not actually paginate. There is no
org-scoped bulk or count endpoint. But contact-service already has an
authenticated S2S CRM client — QontakCrmClient (qontak_crm.go:14-24,
CRM_API_ROOT_URL/CRM_API_AUTH, posting to /crm/centralized_contacts/*).
Options considered
- Option A — CRM squad adds a net-new org-scoped Person-notes extraction endpoint (paginated by page/per_page, returning HTML, creator_id, images/audios/documents, crm_note_type_id, timestamps), and contact-service consumes it by extending QontakCrmClient with ListPersonNotes(ctx, cid, page, perPage). Pros: reuses the existing authenticated client + error handling (qontak_crm.go:43-47, 5xx/Locked/429 handling); CRM owns its data access. Cons: cross-squad dependency (blocking).
- Option B — direct Postgres read of crm_notes (e.g. via Bifrost) using Crm::PersonNote.where(organization_id: cid). Pros: no CRM API work. Cons: organization_id has no index on crm_notes (schema :2004-2008) → heavy scans; couples CDP to CRM's physical schema; bypasses CRM's read auth. Keep as a fallback (OQ-1).
Decision Option A as default; Option B (Bifrost/DB read) as the fallback if the CRM endpoint slips (OQ-1). Extraction throughput is load-tested before Internal QA (OQ-7).
Rationale Reuses a proven, authenticated client; respects service boundaries; the unindexed organization_id read makes the raw DB path costly.
Consequences Blocking cross-squad dependency on the CRM endpoint; an extension
method + payload structs in contact-service. Client/timeout (REV-2): the
current QontakCrmClient uses http.DefaultClient.Do with no timeout
(qontak_crm.go:37) — ListPersonNotes must instead use the repo's standard
heimdall httpclient pattern (httpclient.NewClient(WithHTTPTimeout(timeout)),
as in api/iag_mekari.go:69-71, qontak_billing.go:183-185) with the timeout from
a config duration getDurationOrPanic("CRM_NOTES_EXTRACT_TIMEOUT") (default 10s)
and a heimdall retrier: 3 attempts, exponential backoff 1s / 3s / 9s, retrying on
timeout + 5xx/Locked/429 (matching qontak_crm.go:43-47); after the budget is
exhausted → CRM_EXTRACT_FAILED halt (PRD §9 #2).
Reversibility Medium — the extractor is behind an interface; swap to Bifrost is one implementation.
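The retry budget can be sketched independently of the HTTP client. This is not the heimdall retrier; it is a plain-Go model of the same policy, and it reads "3 attempts, 1s / 3s / 9s" as one initial call plus up to three retries with those waits, which is an assumption. All names are hypothetical.

```go
package main

import (
	"errors"
	"net/http"
	"time"
)

// ErrCRMExtractFailed signals the CRM_EXTRACT_FAILED halt once the budget is spent.
var ErrCRMExtractFailed = errors.New("CRM_EXTRACT_FAILED")

// ErrNonRetryable is returned for client errors that retrying cannot fix.
var ErrNonRetryable = errors.New("non-retryable CRM response")

// Retryable reports whether a status is worth retrying: 5xx, 423 Locked, 429.
func Retryable(status int) bool {
	return status >= 500 || status == http.StatusLocked || status == http.StatusTooManyRequests
}

// Backoff returns the wait before retry n (0-based): 1s, 3s, 9s, ...
func Backoff(retry int) time.Duration {
	d := time.Second
	for i := 0; i < retry; i++ {
		d *= 3
	}
	return d
}

// NoSleep is a no-op sleeper, handy in unit tests.
func NoSleep(time.Duration) {}

// FetchWithRetry calls fetch once and then up to maxRetries more times.
// Transport errors (including timeouts) and retryable statuses trigger a
// backoff; other 4xx statuses fail immediately.
func FetchWithRetry(maxRetries int, sleep func(time.Duration), fetch func() (int, error)) error {
	for attempt := 0; ; attempt++ {
		status, err := fetch()
		if err == nil && !Retryable(status) {
			if status >= 400 {
				return ErrNonRetryable
			}
			return nil
		}
		if attempt >= maxRetries {
			return ErrCRMExtractFailed
		}
		sleep(Backoff(attempt))
	}
}
```

In production the sleep argument would be time.Sleep (or a context-aware wait), and fetch would wrap a single ListPersonNotes page request made with the timeout-configured client.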
Decision 11: Notes-only scope (filter on crm_note_type_id)
Context The crm_notes table stores an activity taxonomy via
crm_note_type_id → Crm::NoteType (crm/note.rb:4, table crm_note_types,
seed: Notes, Calls, Emails, Meeting, Tickets, Documents, Tasks, Whatsapp, Telegram,
SMS, … note_type.rb:18). Migrating everything would flood the Notes panel with
calls/emails/etc. (PRD OQ-4, NOTES-MIG-S06-NEG/NEG-2).
Decision Default notes-only: the extractor/consumer filters to note-type
entries (e.g. crm_note_type_id IN (Notes, Documents) — the report queries already
treat (1,6) as notes/documents, crm/note.rb:166,299,312). Confirm the exact
type-id set with PM (OQ-4); excluded entries are counted out-of-scope, not
failures.
Rationale Avoids polluting the Notes panel; matches the PRD default.
Consequences The exact crm_note_type_id set is a PM-confirmed config value;
NOTES-MIG-S02/ERR-4 + S06-NEG/NEG-2 verified by test.
Reversibility High — the filter is config.
Detail 2.0 — Repo Reading Guide
Repo Map (slice this RFC touches)
flowchart LR
subgraph cs["contact-service/internal/"]
rr["server/rest_router.go<br/>(/private group)"]
h["app/handler/<br/>(notes_migration_handler)"]
svc["app/service/<br/>(notes_migration_service, HtmlNormalizer)"]
cons["app/consumer/<br/>(notes_migration_consumer)"]
apic["app/api/qontak_crm.go<br/>(ListPersonNotes)"]
repoN["app/repository/contact_notes/<br/>(CreateNotesBatch, fields)"]
repoC["app/repository/contact/<br/>(SearchWithFilters)"]
wrk["worker/worker_service.go<br/>(register job)"]
enq["app/service/job_enqueuer.go"]
end
subgraph infra["infrastructure"]
mongo[("MongoDB: contacts, contact_notes")]
redis[("Redis: status + work queue")]
store[("CDP attachment storage")]
end
rr --> h --> svc --> enq --> redis
cons --> apic
cons --> repoC --> mongo
cons --> repoN --> mongo
cons --> store
cons --> redis
wrk --> cons
Existing Code Anchors
| Path | Why the agent reads it | What pattern it teaches |
|---|---|---|
| internal/app/handler/activity_log_migration_handler.go:32,77,91 | The handler to mirror | UpdateUserID → ValidateAndEnqueue; GetMigrationStatus shape |
| internal/app/consumer/activity_log_migration_consumer.go:25-50 | The consumer to mirror | ProcessUpdateUserIDJob(job *work.Job) error; reads job.Args["data"] → unmarshal → service |
| internal/app/service/activity_log_migration_service.go:22-31,64-86,115 | Service + Redis status + enqueue + batch | job-name const; Redis status key + TTL 7d; EnqueueJob; batched execute (10000) |
| internal/app/service/job_enqueuer.go:38-39,53,65-67 | How to enqueue | work.NewEnqueuer(namespace, redis); Enqueue(name, work.Q{"data": params}) |
| internal/worker/worker_service.go:100,132,138 | Register the new job | registerJob → registerJobWithOptions(jobName, opts, handler, pool) |
| internal/server/rest_router.go:69-79,344-349 | Where to register /private/notes/* + S2S auth | /private groups guarded by mymiddleware.BasicAuth; S2S migrate pattern |
| internal/pkg/middleware/basic_auth.go:10 | S2S auth mechanism | constant-time Basic-auth compare vs config.BasicAuth |
| internal/app/repository/contact_notes/base.go:17-23,26-36,39-54 | The note store to extend | Attachment + ContactNote structs; TableName()="contact_notes"; SetDefaults() overwrites ts |
| internal/app/repository/contact_notes/create.go:12 | The single-CRUD insert (do not break) | SetDefaults() then mongo.Create |
| internal/app/service/contact_notes/contact_notes_service.go:131-136,268-274,286-293 | Render path + validation rules | live owner-name resolve; length-only validation; attachment Type allow-set |
| internal/app/handler/contact_notes_handler.go:75-79,143-166,478-486 | Why a system path is needed | company from IAG ctx; live permission compute |
| internal/app/repository/contact/base.go:53,68-70,342-344 | Person→Contact linkage | Source/SourceID/SourceName; CrmData{ID} |
| internal/app/repository/contact/search.go:125,147 | Resolution + coverage query | SearchWithFilters(ctx, bson.M, …); CountWithFilters |
| internal/app/api/qontak_crm.go:14-58 | The CRM client to extend | QontakCrmClient; auth header from CRM_API_AUTH; 5xx/Locked/429 handling |
| db/migrations/013_create_contact_notes.up.json | Index migration JSON pattern | createIndexes JSON; basis for the partial unique index |
| db/migrations/001_create_contact.up.json | Existing crm_data.id index | crm_contact_index |
| config/load.go:197-198,306-314 | Config injection | getStringOrPanic("CRM_API_ROOT_URL"/"CRM_API_AUTH") |
Existing Contracts to Reuse, Extend, or Replace
| Contract | Status | Justification | Owner |
|---|---|---|---|
| POST /private/notes/migrate | new-with-justification | No migration trigger exists; mirrors /private/activity_logs PATCH-enqueue; BasicAuth S2S | CDP BE |
| GET /private/notes/migration/status | new-with-justification | No notes-migration status; mirrors rest_router.go:74 | CDP BE |
| contact_notes collection | extended | Add legacy_crm_note_id + legacy_owner_label; collection + repo exist | CDP BE |
| CreateNotesBatch repo method | new-with-justification | No batch insert exists; needed for throughput + skip-on-conflict | CDP BE |
| Partial unique index on (company_sso_id, legacy_crm_note_id) | new-with-justification | No unique index exists; must be partial (Decision 4) | CDP BE |
| QontakCrmClient | extended | Add ListPersonNotes; client + auth exist | CDP BE |
| ContactRepository.SearchWithFilters/CountWithFilters | reused | Drive with crm_data.id $in | CDP BE |
| JobEnqueuer / gocraft/work / worker registration | reused | Same as ActivityLogMigration | CDP BE |
| /cdp/notes/migrate (PRD literal HTTP batch endpoint) | replaced (not built) | In-process batch write instead (Decision 1) | CDP BE |
| CRM org-scoped Person-notes extraction endpoint | new (external) | Does not exist; v4 is entity-scoped + unpaginated | Legacy CRM Squad |
| Go HTML sanitizer (bluemonday) | new dependency | No server-side sanitization in CDP today | CDP BE |
Patterns to Follow
| Concern | Pattern in repo | Reference file | Deviation? |
|---|---|---|---|
| Handler shape | decode → validate → service → typed response | activity_log_migration_handler.go:32-91; myhttp.NewJSONResponse/ErrBadRequest | none |
| Service + enqueue | validate → EnqueueJob(name, form) → Redis status | activity_log_migration_service.go:64-86 | none |
| Queue consumer | func (w *Consumer) Method(job *work.Job) error; job.Args["data"] | activity_log_migration_consumer.go:25-50 | none |
| External HTTP (S2S) | http.NewRequest + Authorization header; 5xx/Locked/429 → retry/error | api/qontak_crm.go:26-58 | extend with ListPersonNotes |
| Repository / DB access | r.mongo.Create/Where/Update; filters as bson.M | contact_notes/create.go; contact/search.go:125 | new CreateNotesBatch (bulk) |
| Error wrapping / logging | fmt.Errorf("ctx: %w", err); slog.ErrorContext | activity_log_migration_consumer.go:27-35 | none |
| Index declaration | createIndexes JSON migration | db/migrations/013_create_contact_notes.up.json | add unique+partialFilterExpression |
| Config/secrets | getStringOrPanic(key) in config/load.go | config/load.go:197-198,306 | new keys for extraction if needed |
Reading Order for the Agent
1. internal/app/consumer/activity_log_migration_consumer.go:25-50 — the consumer shape to mirror.
2. internal/app/service/activity_log_migration_service.go:22-86 — enqueue + Redis status + batching.
3. internal/app/handler/activity_log_migration_handler.go:32-91 — handler + status endpoint.
4. internal/server/rest_router.go:69-79,344-349 — /private groups + BasicAuth S2S.
5. internal/app/repository/contact_notes/base.go:17-54 + create.go:12 — the note store + SetDefaults pitfall.
6. internal/app/service/contact_notes/contact_notes_service.go:131-136,268-293 — render + validation rules.
7. internal/app/repository/contact/{base.go:53,342-344, search.go:125,147} — Person→Contact linkage + query.
8. internal/app/api/qontak_crm.go:14-58 — the CRM client to extend.
9. internal/worker/worker_service.go:100,132,138 + job_enqueuer.go:38-67 — job registration + enqueue.
10. db/migrations/013_create_contact_notes.up.json + 001_create_contact.up.json — index JSON pattern + existing crm_data.id index.
Source Verification (anti-hallucination — verified 2026-06-18)
| Anchor / pattern / contract | Verified by | Evidence |
|---|---|---|
| Notes single-CRUD only; no migrate/batch/count/source | read | rest_router.go:150-159 (+ deprecated /notes :162-169); GetNotes params contact_notes_handler.go:112-125 (page/per_page/order_by/order_direction/owner_ids) |
| ContactNote has no legacy_crm_note_id/legacy_owner_label | read | contact_notes/base.go:26-36 fields: ID, ContactID, CompanySsoID, Note(max=10000), Attachments, OwnerID, IsDeleted, CreatedAt, UpdatedAt |
| Attachment shape + Type allow-set | read | base.go:17-23 {URL,Type,FileSizeInByte,FileSize,FileName}; valid types contact_notes_service.go:286-293 = image/doc/pdf/video/voice_note/xlsx; no ≤1 voice_note cap |
| SetDefaults() overwrites timestamps | read | base.go:51-54 cn.CreatedAt = now; cn.UpdatedAt = now; called create.go:12 |
| owner_name live; permission live | read | contact_notes_service.go:131-136 GetUserNamesBulk; contact_notes_handler.go:143-166 resolveNotePermission |
| company from IAG ctx (no system write path) | read | contact_notes_handler.go:75-79 extractCompanyIDFromContext; def :478-486 reads consts.CompanySSOKey |
| no server-side sanitization | read/grep | contact_notes_service.go:268-274 length-only; grep sanitize/bluemonday/policy → 0 hits |
| existing migration framework | read | activity_log_migration_handler.go:32,77,91; activity_log_migration_consumer.go:25 ProcessUpdateUserIDJob(job *work.Job) error; activity_log_migration_service.go:22 job-name const, :25 Redis key, :28 batch 10000, :31 TTL 7d, :64-86 enqueue; status route rest_router.go:74 |
| S2S = BasicAuth on /private + /api/v1 | read | rest_router.go:69-70,78-79,279-280; basic_auth.go:10; S2S migrate :344-349 /migrate-default-fields |
| EnqueueJob mechanism | read | job_enqueuer.go:38-39,53,65-67 work.NewEnqueuer; Enqueue(name, work.Q{"data":params}) |
| worker registration | read | worker_service.go:100 registerJob, :132,138 registerJobWithOptions(jobName, opts, handler, pool) |
| contact CRM linkage + index | read | contact/base.go:53 CrmData *CrmData, :68-70 Source/SourceID/SourceName, :342-344 CrmData{ID}; index crm_contact_index on crm_data.id db/migrations/001_create_contact.up.json; no source_id index |
| resolution + coverage query | read | contact/search.go:125 SearchWithFilters(ctx, bson.M, limit, page, sort); :147 CountWithFilters(ctx, bson.M) |
| existing CRM client | read | api/qontak_crm.go:14-24 QontakCrmClient; :34/:68/:101 Authorization header; :43,77,110 5xx/Locked/429 handling; config load.go:197-198 CRM_API_ROOT_URL/CRM_API_AUTH |
| notes collection + index pattern | read | contact_notes/base.go:39-41 TableName()="contact_notes"; indexes via db/migrations/013_create_contact_notes.up.json (4 non-unique); none unique |
| build/test/lint/migrate | read | Makefile: make build (go build -tags dynamic), make test (go test -race -tags dynamic ./internal/... ./config/...), make lint (staticcheck ./...), make sec (gosec), make migrate-up (golang-migrate Mongo driver; JSON migrations db/migrations/, {seq}_{name}.up.json) |
| gocraft/work version + worker entry | read | go.mod github.com/gocraft/work v0.5.1; cmd/worker |
| CRM note sanitize + tags | read | qontak.com/app/models/crm/note.rb:43 before_save :sanitize_note; :379 sanitize_note; app/helpers/application_helper.rb:296-325 Rails sanitize, tags a b i strong em u s span br div p ul ol li blockquote h1-h6 pre |
| CRM 3 attachment types | read | crm/note.rb:21 crm_note_images (Crm::NoteImage<Asset), :23 crm_note_audios, :15 has_one crm_note_attachment, :20 has_many crm_note_attachments; note_attachment.rb; types note_attachment_uploader.rb:13-50; CarrierWave→S3 public-read carrierwave-s3.rb:27,58 |
| CRM checkin geolocation | read | crm/note.rb:16 has_one :crm_checkin (Crm::Checkin<Crm::Location); geo on crm_locations db/schema.rb:1762-1798 (longitude,lattitude,address_ciphertext,checkin_time) |
| CRM multi-FK + STI | read | crm/note.rb:5-8 belongs_to crm_person/crm_company/crm_deal/tickets (all nullable int); type string column db/schema.rb:1990; Crm::PersonNote<Crm::Note app/models/crm/person_note.rb |
| CRM real API entity-scoped + no paginate | read | app/controllers/api/v4/notes.rb:131-147 params page/per_page declared but index set_entity-scoped, no .page/.per_page; no /crm/notes/count/bulk/org-scoped |
| CRM mentions via data-user-id | read | crm/note.rb:99-107 mention_people scans /users/(\d+)/edit_user + data-user-id (integer IDs) |
| CRM activity taxonomy | read | crm/note.rb:4 belongs_to :crm_note_type; crm_note_type_id db/schema.rb:1983; note_type.rb:18 seed Notes/Calls/Emails/…; (1,6) notes/documents note_type.rb refs |
| CRM hard-delete (no deleted_at); std timestamps | read | db/schema.rb:1978-2009 no deleted_at; v4 delete destroy! api/v4/notes.rb:242; created_at/updated_at :1981-1982 |
| CRM org-wide person notes: no scope/index | read | no scope/default_scope on Crm::Note/PersonNote; organization_id int db/schema.rb:1985, no index; Crm::PersonNote.where(organization_id: cid) is the raw path |
Detail 2.1 — Architecture (mermaid)
Component diagram
flowchart TB
ops([Ops S2S]) --> handler[/"NotesMigrationHandler<br/>/private/notes/migrate"/]
handler --> svc["NotesMigrationService.ValidateAndEnqueue"]
svc --> enq[["JobEnqueuer.EnqueueJob<br/>NotesMigrationJobName"]]
svc --> redis[("Redis status")]
enq --> queue[["gocraft/work (Redis)"]]
queue --> cons["NotesMigrationConsumer.ProcessNotesMigrationJob"]
cons --> ext["QontakCrmClient.ListPersonNotes"]
cons --> resolver["ContactResolver (SearchWithFilters crm_data.id)"]
cons --> owner["OwnerResolver (creator_id to SSO)"]
cons --> html["HtmlNormalizer (sanitize + strip mentions)"]
cons --> att["AttachmentProcessor (download + re-upload)"]
cons --> batch["contact_notes.CreateNotesBatch (idempotent, ts-preserving)"]
cons --> valid["ValidationRunner (CountWithFilters)"]
resolver --> mongo[("MongoDB: contacts")]
batch --> notes[("MongoDB: contact_notes")]
att --> store[("CDP company-scoped storage")]
cons --> redis
Data model (erDiagram)
erDiagram
CONTACT_NOTES {
objectid _id PK
string contact_id "resolved CDP contact UUID"
string company_sso_id "per-batch tenant scope"
string note "sanitized HTML (no p-wrap)"
array attachments "type in image|doc|pdf|video|voice_note|xlsx"
string owner_id "SSO UUID or null"
string legacy_owner_label "NEW: shown when owner_id null"
string legacy_crm_note_id "NEW: idempotency key"
bool is_deleted
datetime created_at "PRESERVED from CRM"
datetime updated_at "PRESERVED from CRM"
}
CONTACTS {
string id PK
string source
string source_id "not indexed"
object crm_data "crm_data.id INDEXED (crm_contact_index)"
}
CONTACTS ||..o{ CONTACT_NOTES : "crm_data.id == legacy crm_person_id"
Idempotency: partial unique index
{company_sso_id:1, legacy_crm_note_id:1} with partialFilterExpression: {legacy_crm_note_id: {$exists: true}} (Decision 4); it does not touch existing notes that lack the field.
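The effect of the partial filter can be illustrated with a small in-memory model. This is only a sketch of the index semantics, not MongoDB driver code: the `note` struct, `partialUniqueIndex`, and `ErrDuplicateKey` are hypothetical names, and a nil pointer stands in for "field absent".

```go
package main

import "errors"

// ErrDuplicateKey mimics the duplicate-key error a unique index raises (hypothetical name).
var ErrDuplicateKey = errors.New("duplicate key")

// note is a trimmed-down, illustrative contact note.
type note struct {
	CompanySSOID    string
	LegacyCRMNoteID *string // nil models "field absent" (human-created note)
}

type indexKey struct{ company, legacyID string }

// partialUniqueIndex models {company_sso_id:1, legacy_crm_note_id:1} with
// partialFilterExpression {legacy_crm_note_id: {$exists: true}}.
type partialUniqueIndex struct {
	seen map[indexKey]struct{}
}

func newPartialUniqueIndex() *partialUniqueIndex {
	return &partialUniqueIndex{seen: make(map[indexKey]struct{})}
}

// Insert enforces uniqueness only for notes that carry legacy_crm_note_id.
func (ix *partialUniqueIndex) Insert(n note) error {
	if n.LegacyCRMNoteID == nil {
		return nil // outside the partial filter: never indexed, never collides
	}
	k := indexKey{n.CompanySSOID, *n.LegacyCRMNoteID}
	if _, dup := ix.seen[k]; dup {
		return ErrDuplicateKey
	}
	ix.seen[k] = struct{}{}
	return nil
}
```

In this model any number of native notes per company insert cleanly, the same legacy id may appear once per company, and only a repeated (company, legacy id) pair is rejected.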
State machine — migration job status (Redis-backed)
stateDiagram-v2
[*] --> not_started
not_started --> in_progress: enqueue accepted
in_progress --> in_progress: per-batch progress
in_progress --> halted: failure_rate gt 1 pct
in_progress --> completed_success: match_pct gte 99 pct
in_progress --> completed_with_errors: match_pct lt 99 pct or VALIDATION_SKIPPED
halted --> in_progress: re-trigger (idempotent)
completed_with_errors --> in_progress: re-trigger after fix
completed_success --> [*]
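The transitions in the diagram can be encoded as a lookup table so that a status write can be rejected when it does not follow an edge. The sketch below is an assumption about how such a guard might look; the Go identifiers (`Status`, `CanTransition`, `FinalStatus`) are hypothetical and not taken from the existing migration framework.

```go
package main

// Status values mirror the state diagram; the Go names are illustrative.
type Status string

const (
	NotStarted          Status = "not_started"
	InProgress          Status = "in_progress"
	Halted              Status = "halted"
	CompletedSuccess    Status = "completed_success"
	CompletedWithErrors Status = "completed_with_errors"
)

// allowed lists, for each status, the statuses the diagram lets it move to.
var allowed = map[Status][]Status{
	NotStarted:          {InProgress},
	InProgress:          {InProgress, Halted, CompletedSuccess, CompletedWithErrors},
	Halted:              {InProgress},
	CompletedWithErrors: {InProgress},
	CompletedSuccess:    {}, // terminal
}

// CanTransition reports whether from -> to is an edge in the diagram.
func CanTransition(from, to Status) bool {
	for _, next := range allowed[from] {
		if next == to {
			return true
		}
	}
	return false
}

// FinalStatus picks the end state after validation: match_pct >= 99 is a
// success unless validation was skipped (VALIDATION_SKIPPED).
func FinalStatus(matchPct float64, validationSkipped bool) Status {
	if validationSkipped || matchPct < 99 {
		return CompletedWithErrors
	}
	return CompletedSuccess
}
```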
Branch & skip flow — per-note routing
flowchart TD
note([CRM note dequeued]) --> typ{"note-type in scope?"}
typ -- no --> oos["count out-of-scope (not a failure)"]
typ -- yes --> dup{"legacy_crm_note_id exists?"}
dup -- yes --> skip["skip (count notes_skipped)"]
dup -- no --> res{"contact resolved by crm_data.id?"}
res -- no --> cnf["CONTACT_NOT_MAPPED (skip + count failure)"]
res -- yes --> ins["sanitize + re-link + insert (preserve ts)"]
oos --> done([next note])
skip --> done
cnf --> done
ins --> done
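The routing above reduces to three ordered checks. The following sketch shows that order under simplifying assumptions: the in-scope type set, the already-migrated set, and the person-to-contact map are passed in as plain maps, and `crmNote`, `Outcome`, and `route` are hypothetical names.

```go
package main

// Outcome is the per-note routing result; names are illustrative.
type Outcome string

const (
	OutOfScope       Outcome = "out_of_scope"       // counted, not a failure
	Skipped          Outcome = "notes_skipped"      // legacy_crm_note_id already present
	ContactNotMapped Outcome = "CONTACT_NOT_MAPPED" // skip + count failure
	Insert           Outcome = "insert"             // sanitize + re-link + insert
)

// crmNote carries only the fields the routing decision needs.
type crmNote struct {
	ID          string // becomes legacy_crm_note_id
	NoteTypeID  int
	CRMPersonID string // already string-cast
}

// route applies the checks in the same order as the flowchart:
// scope first, then idempotency, then contact resolution.
func route(n crmNote, inScope map[int]bool, migrated map[string]bool, contacts map[string]string) Outcome {
	switch {
	case !inScope[n.NoteTypeID]:
		return OutOfScope
	case migrated[n.ID]:
		return Skipped
	}
	if _, ok := contacts[n.CRMPersonID]; !ok {
		return ContactNotMapped
	}
	return Insert
}
```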
Detail 2.2 — Sequence (end-to-end, incl. failure paths)
Happy path — trigger + async migrate + validate
sequenceDiagram
actor Ops as Ops (S2S, BasicAuth)
participant LB as Ingress
participant API as contact-service api
participant RD as Redis (status + queue)
participant Q as gocraft/work
participant W as NotesMigrationConsumer (worker)
participant CRM as Legacy CRM (extraction)
participant DBc as MongoDB contacts
participant S3 as CRM S3/CDN
participant ST as CDP storage
participant DBn as MongoDB contact_notes
Ops->>LB: POST /private/notes/migrate {cid, company_sso_id}
LB->>API: BasicAuth
alt flag OFF / already completed
API-->>Ops: 403 FLAG_DISABLED / 409 ALREADY_MIGRATED
else valid
API->>RD: set status in_progress {cid}
API->>Q: EnqueueJob(NotesMigrationJobName, {cid, company_sso_id})
API-->>Ops: 200 {job_id}
Note over Q,W: async
loop paginated (page/per_page)
W->>CRM: ListPersonNotes(cid, page) (notes-only filter)
CRM-->>W: notes (HTML, creator_id, images/audios/documents, ts)
end
loop per batch
W->>DBc: SearchWithFilters(crm_data.id in [...])
W->>W: sanitize HTML + strip mentions; resolve owner; map attachments
loop per attachment
W->>S3: download original
W->>ST: re-upload {company_sso_id}/... then proxy URL
end
W->>DBn: CreateNotesBatch (legacy_crm_note_id, caller ts, skip-on-conflict)
W->>RD: update progress_pct / notes_processed
alt failure_rate gt 1 pct
W->>RD: status halted
W-->>Ops: PagerDuty P1 {job_id, cid, failure_rate}
end
end
W->>DBn: CountWithFilters(legacy_crm_note_id exists) vs CRM count
W->>RD: status completed_success {match_pct}
Ops->>API: GET /private/notes/migration/status?cid
API->>RD: read status
API-->>Ops: {status, progress_pct, match_pct, counts}
end
Failure path — extraction / attachment / validation
sequenceDiagram
participant W as NotesMigrationConsumer
participant CRM as Legacy CRM
participant S3 as CRM S3/CDN
participant DBn as MongoDB contact_notes
participant RD as Redis
alt CRM 5xx / timeout
W->>CRM: ListPersonNotes (retry 3x backoff)
CRM-->>W: still failing
W->>RD: status halted, CRM_EXTRACT_FAILED
else attachment download/upload fails (non-blocking)
W->>S3: download (fails)
W->>DBn: insert note WITHOUT that attachment
W->>RD: log ATTACHMENT_DOWNLOAD_FAILED (not a note failure)
else owner unmappable (non-blocking)
W->>DBn: insert with owner_id=null + legacy_owner_label
else contact not mapped
W->>RD: log CONTACT_NOT_MAPPED (skip + count failure)
else count source unavailable after retries
W->>RD: status completed_with_errors, VALIDATION_SKIPPED + alert
end
Detail 2.3 — Database Model (Mongo)
MongoDB (schemaless). Extend the existing contact_notes collection
(contact_notes/base.go). No DDL migration for the new fields; one partial
unique index migration.
// New fields on ContactNote (application struct additions):
// legacy_crm_note_id string // CRM crm_notes.id (migrated rows only)
// legacy_owner_label string // shown when owner_id is empty (Decision 6)
// Partial unique index migration (db/migrations/NNN_index_contact_notes_legacy_crm_note_id.up.json),
// following the createIndexes JSON pattern of 013_create_contact_notes.up.json:
{
"createIndexes": "contact_notes",
"indexes": [{
"key": { "company_sso_id": 1, "legacy_crm_note_id": 1 },
"name": "uq_contact_notes_company_legacy_crm_note_id",
"unique": true,
"partialFilterExpression": { "legacy_crm_note_id": { "$exists": true } }
}]
}
// .down.json: { "dropIndexes": "contact_notes", "index": "uq_contact_notes_company_legacy_crm_note_id" }
- Cardinality / growth: ~21,000+ notes across ~130 CIDs (one-time). Attachment bodies live in CDP storage, not Mongo.
- PII classification: note (free-text customer interaction history, PII), attachments[].url (links to PII files), legacy_owner_label (may be a person's name), contact_id/company_sso_id (internal identifiers). See §3.D.
- Retention (PRD §7.1): migration status (Redis) 7d; failed-record queue 30d; audit map (legacy_crm_note_id → CDP note id) permanent (intrinsic to each migrated document); source CRM notes untouched (read-only per CRM policy).
Per-status lifecycle (migration run, Redis status):
| Status | Visibility | Retention | Restore semantics | Transitions |
|---|---|---|---|---|
| not_started | internal (status API) | n/a | n/a | → in_progress |
| in_progress | internal | 7d (Redis) | n/a | → halted / completed_* |
| halted | internal + PagerDuty | 7d | re-trigger (idempotent) | → in_progress |
| completed_with_errors | internal + alert + error log | 7d | re-trigger after fix | → in_progress |
| completed_success | internal | 7d | re-run is a no-op (all skipped) | terminal |
- Partition/sharding: none — bounded one-time volume.
Detail 2.4 — APIs
Outbound endpoints (consumers call us)
| Endpoint | Method | AuthN/AuthZ | Request | Response | Status codes | Idempotency | Reuse? |
|---|---|---|---|---|---|---|---|
| /private/notes/migrate | POST | mymiddleware.BasicAuth (S2S) | {cid:string, company_sso_id:string} (body) | {job_id:string, status:"in_progress"} | 200; 403 FLAG_DISABLED; 404 CID_NOT_FOUND; 409 ALREADY_MIGRATED/JOB_ALREADY_RUNNING; 401/403 non-Basic | enqueue is guarded by an in-progress lock per CID (Redis); re-trigger after terminal is safe (skip-on-conflict) | new-with-justification |
| /private/notes/migration/status | GET | mymiddleware.BasicAuth (S2S) | query cid | {status, progress_pct, notes_processed, notes_total, failure_rate, match_pct, error_log_url?} | 200; 404 CID_NOT_FOUND (→ not_started) | n/a (read) | new-with-justification (mirrors rest_router.go:74) |
Person→Contact resolution algorithm (implementation contract for chunk 3).
Resolve via ContactRepository.SearchByAppContactID(ctx, "crm", crmPersonID)
(contact/search.go:27), which maps "crm" → "crm_data.id" through
appNameColumnMapper (contact/base.go:531-538) and hits crm_contact_index;
string-cast the CRM crm_person_id first (crm_data.id is a string,
base.go:343). For throughput, batch by extending it (or SearchWithFilters) with
bson.M{"crm_data.id": {"$in": batchOfCrmPersonIDs}, "company_sso_id": companySsoID}
→ build a crm_person_id → contact_id map. For a note carrying multiple FKs, apply
person-first precedence (Decision 7). A note whose crm_person_id is absent from
the map → CONTACT_NOT_MAPPED (skip + count failure, no halt). Fall back to a
source_id/mapping-table lookup only where crm_data.id coverage is incomplete
(OQ-2).
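The map-building step of the algorithm above might look like the sketch below. It deliberately leaves out the Mongo query: `found` stands for whatever the batched `$in` lookup returns, the `contact` struct is a hypothetical projection, and treating the CRM person id as an `int64` before the string cast is an assumption.

```go
package main

import "strconv"

// contact is a hypothetical projection of a CDP contact document.
type contact struct {
	ID        string // CDP contact id
	CRMDataID string // crm_data.id, stored as a string
}

// buildContactMap indexes one batch of lookup results by crm_data.id,
// giving the crm_person_id -> contact_id map the consumer needs.
func buildContactMap(found []contact) map[string]string {
	m := make(map[string]string, len(found))
	for _, c := range found {
		m[c.CRMDataID] = c.ID
	}
	return m
}

// resolve string-casts the CRM person id and looks it up.
// ok == false corresponds to CONTACT_NOT_MAPPED (skip + count, no halt).
func resolve(m map[string]string, crmPersonID int64) (contactID string, ok bool) {
	contactID, ok = m[strconv.FormatInt(crmPersonID, 10)]
	return contactID, ok
}
```

Building the map once per batch keeps resolution at one query per batch rather than one per note.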
Internal calls (no HTTP surface):
- QontakCrmClient.ListPersonNotes(ctx, cid, page, perPage): extend qontak_crm.go; calls the CRM net-new org-scoped endpoint with the existing Authorization: {CRM_API_AUTH} header; reuses the 5xx/Locked/429 handling (:43-47).
- ContactNoteRepo.CreateNotesBatch(ctx, []ContactNote): bulk write with skip-on-conflict (Decision 4) and explicit timestamps (Decision 3).
Inbound webhooks (other services call us)
| Endpoint | Source | Notes |
|---|---|---|
| — | — | n/a — no inbound webhook; contact-service initiates extraction (pull) from CRM, not a receiver |
Detail 2.A — Async Job / Event Consumer Spec
| Job/Consumer | Trigger | Input shape | Retry | Concurrency | Idempotency key | Per-msg timeout | Poison handling |
|---|---|---|---|---|---|---|---|
| NotesMigrationConsumer.ProcessNotesMigrationJob | EnqueueJob(NotesMigrationJobName) | {cid, company_sso_id} via job.Args["data"] (mirror activity_log_migration_consumer.go:38-47) | extraction: 10s timeout, 3× backoff 1s/3s/9s (heimdall retrier) → CRM_EXTRACT_FAILED halt; per-note failures counted, not retried at job level; batch write is an idempotent upsert (retry-safe, Decision 4) | per-CID in-progress lock (Redis) → second concurrent job for the same CID returns 409 JOB_ALREADY_RUNNING | legacy_crm_note_id (per note, via upsert filter) + per-CID lock | bounded by ≤4h/CID window (PRD §7); batch 500 (max 1000) | failure_rate >1% → halted + PagerDuty; never silently drop a note |
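The extraction retry schedule in the table can be sketched with the standard library alone. This is a stand-in to make the schedule concrete, not the heimdall retrier the RFC intends to use; it reads "3×" as three retries after the initial attempt (an assumption), and `withRetry`, `noSleep`, and `ErrCRMExtractFailed` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrCRMExtractFailed is a hypothetical sentinel for the CRM_EXTRACT_FAILED halt reason.
var ErrCRMExtractFailed = errors.New("CRM_EXTRACT_FAILED")

// retryDelays is the 1s/3s/9s schedule from the job spec.
var retryDelays = []time.Duration{1 * time.Second, 3 * time.Second, 9 * time.Second}

// noSleep skips the waits; useful for tests and dry runs.
func noSleep(time.Duration) {}

// withRetry calls op once, then retries up to len(retryDelays) times,
// waiting the scheduled delay before each retry. If every attempt fails,
// the last error is wrapped in ErrCRMExtractFailed.
func withRetry(op func() error, sleep func(time.Duration)) error {
	err := op()
	for _, d := range retryDelays {
		if err == nil {
			return nil
		}
		sleep(d)
		err = op()
	}
	if err != nil {
		return fmt.Errorf("%w: %v", ErrCRMExtractFailed, err)
	}
	return nil
}
```

Passing `time.Sleep` as the `sleep` argument gives the real schedule; injecting the function keeps unit tests from waiting 13 seconds.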
Detail 2.E — Concurrency Collision Map
| Resource | Writers | Collision scenario | Resolution | Behavior on conflict |
|---|---|---|---|---|
| Migration run (one CID) | Ops | two enqueues for the same CID | per-CID in-progress lock in Redis (mirror the activity-log status key) | second enqueue → 409 JOB_ALREADY_RUNNING (NOTES-MIG-S03/ERR-1) |
| contact_notes doc | consumer | same note written twice (re-run / overlapping batch) | idempotent upsert on (company_sso_id, legacy_crm_note_id) + partial unique index (Decision 4) | already-present note matches → no-op (MatchedCount, counted notes_skipped); never an error |
| CDP storage object | consumer | same attachment re-uploaded on re-run | object key namespaced {company_sso_id}/{legacy_crm_note_id}/{asset} (deterministic) | overwrite same key safely |
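The deterministic object key in the last row can be sketched as a pure function, which is what makes a re-upload on re-run overwrite the same object. The helper below is an assumption about how the key might be assembled: `objectKey` is a hypothetical name, the two ids are assumed not to contain slashes, and reducing the asset to its base name is an added precaution rather than something the RFC specifies.

```go
package main

import (
	"errors"
	"path"
)

// objectKey builds {company_sso_id}/{legacy_crm_note_id}/{asset}.
// The same inputs always yield the same key.
func objectKey(companySSOID, legacyCRMNoteID, assetName string) (string, error) {
	// Keep only the final path element so "../" segments cannot escape the prefix.
	base := path.Base(path.Clean("/" + assetName))
	if companySSOID == "" || legacyCRMNoteID == "" || base == "/" || base == "." {
		return "", errors.New("incomplete object key parts")
	}
	return path.Join(companySSOID, legacyCRMNoteID, base), nil
}
```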
Detail 2.F — Responsibility Boundary Matrix
| Step | Owning squad / service | Inbound trigger | Outbound effect | Failure handler | PRD anchor |
|---|---|---|---|---|---|
| 1. Validate + enqueue | CDP BE (api) | POST /private/notes/migrate | Redis status + job enqueue | 403/404/409 to Ops | §9 #1, S01 |
| 2. Extract Person notes | CDP BE (worker) → CRM squad endpoint | dequeued job | paginated note pull | retry 3× → CRM_EXTRACT_FAILED halt | §9 #2, D-5 |
| 3. Resolve contact | CDP BE (worker) | per note | crm_person_id → contact_id | CONTACT_NOT_MAPPED skip + count | §9 #3, D-7 |
| 4. Resolve owner | CDP BE (worker) → Launchpad | per note | owner_id or label | unmappable → null + legacy_owner_label (non-blocking) | §9 #4, D-7 |
| 5. Sanitize + strip mentions | CDP BE (worker) | per note | safe HTML | malformed → best-effort + warning | §9 #5, D-4/D-8 |
| 6. Re-link attachments | CDP BE (worker) → CDP storage | per attachment | proxy URL | ATTACHMENT_*_FAILED insert-without (non-blocking) | §9 #6, D-10 |
| 7. Batch insert | CDP BE (worker) | per batch | idempotent upsert (BulkUpdate) | already-present → no-op skip; transient Mongo write error → gocraft retries the batch (upsert makes retry a no-op for written notes); persistent error → BATCH_WRITE_FAILED, count + continue; failure_rate >1% → halt | §9 #7, D-1 |
| 8. Validate | CDP BE (worker) | post-batches | match_pct | count unavailable → VALIDATION_SKIPPED + alert | §9 #8, S04 |
| 9. Status | CDP BE (api) | GET /private/notes/migration/status | status payload | CID unknown → not_started | §9 #9, S01/AC-2 |
| 10. Render migrated notes | existing CDP Notes UI (web+mobile) | GET /iag/v1/contacts/{id}/notes | UI render | existing behavior; legacy_owner_label fallback | §10, S05 |
Detail 2.I — Scope Boundaries
- BE create: internal/app/handler/notes_migration_handler.go, internal/app/service/notes_migration_service.go, internal/app/consumer/notes_migration_consumer.go, HtmlNormalizer (internal/app/service/... or internal/pkg/util/), ContactResolver/OwnerResolver/AttachmentProcessor helpers, NotesMigrationJobName const, payload structs (internal/app/payload/), the partial-unique-index migration (db/migrations/), docs/NOTES_MIGRATION_SERVICE.md.
- BE modify: internal/app/repository/contact_notes/base.go (+legacy_crm_note_id, legacy_owner_label) + new CreateNotesBatch (.../create.go or a new file); internal/app/service/contact_notes/contact_notes_service.go (render-path legacy_owner_label fallback, Decision 6); internal/app/api/qontak_crm.go (+ListPersonNotes); internal/server/rest_router.go (register 2 /private/notes routes); internal/worker/worker_service.go (register job); config/load.go (extraction config if a distinct CRM notes endpoint base is needed); go.mod (bluemonday).
- BE NOT touched: the single-CRUD note path (create.go:12 SetDefaults), /iag/v1/contacts/{id}/notes handlers, existing indexes.
- CRM (qontak.com): read-only; the CRM squad adds the org-scoped extraction endpoint in its own RFC/PR; no schema or data change in CRM.
- FE: none; migrated notes render via the existing CDP Notes UI (Out of Scope #8).
- Shared modules: JobEnqueuer, worker_service, ContactRepository, QontakCrmClient (reused/extended).
3. High-Availability & Security
The migration is async, S2S, off the request path, and per-CID isolated:
one CID's failure halts only that CID's job. All dependencies degrade gracefully —
CRM extraction failure halts with CRM_EXTRACT_FAILED (retryable); attachment and
owner-resolution failures are non-blocking (insert-without / label fallback);
count-source failure yields completed_with_errors + alert rather than data loss.
Idempotency (Decision 4) makes every halt safely re-runnable.
Performance Requirement
- API: POST /private/notes/migrate p99 < 300 ms (validate + enqueue; no work on-request); GET .../status is a single Redis read.
- Worker: ≥ 10,000 notes/hour/CID; ≤ 4h/CID; batch insert ≤ 2s/500; attachment re-upload ≤ 30s/file P95 (PRD §7). Scale workers horizontally (cmd/worker × M).
- Resolution: Person→Contact uses the indexed crm_data.id (crm_contact_index) in $in batches, which avoids collection scans across ~130 CIDs.
- Load test (OQ-7): run the chosen extraction path at realistic CID size in staging before Internal QA; add a configurable inter-page delay if the CRM throttles.
Monitoring & Alerting
Observability events (PRD §12) — names preserved exactly:
crm_notes_migration_started, _batch_completed, _note_failed,
_attachment_failed, _owner_not_resolved, _halted, _completed. BE structured
logs via slog.*Context (existing convention, activity_log_migration_consumer.go:27-35).
Alerts: halted → PagerDuty P1; match_pct < 99% → P2; attachment-fail > 20% → Slack #cdp-ops (PRD §12).
SLO: match_pct ≥ 99%/CID; failure rate ≤ 1%/CID; halt rate < 2% in Stage 3.
Logging
- BE fields: job_id, cid, company_sso_id, legacy_crm_note_id, reason_code, notes_processed, notes_total, failure_rate, duration_seconds.
- PII scrubbed: never log note HTML body, attachment URLs/tokens, or contact PII; log ids + counts + reason codes only.
Security Implications
- Threat model: (a) unauthorized bulk migration into another company's data; (b) stored XSS via un-sanitized CRM HTML on a new ingestion surface; (c) cross-tenant attachment exposure (CRM S3 public-read); (d) dangling/false mention links; (e) PII leakage in logs; (f) SSRF via attacker-influenced CRM attachment URLs on the worker's outbound fetch (Decision 8); (g) storage-quota DoS: a malicious or pathologically large CID exhausting CDP attachment storage.
- AuthN/AuthZ: both endpoints behind mymiddleware.BasicAuth (S2S only; rest_router.go:70; basic_auth.go:10, constant-time compare). A logged-in IAG user token is not accepted (NOTES-MIG-S01/ERR-4). The per-batch company_sso_id is explicit and is applied to every contact query (company_sso_id filter) and every note write, and it namespaces the attachment storage path {company_sso_id}/..., so a job cannot write outside the company_sso_id it was triggered with. Caveat: BasicAuth is a single shared credential, so it authenticates that the caller is the trusted S2S principal but cannot attribute which operator triggered a given CID's migration. Front /private/notes/migrate with the platform's gateway/mesh identity (mTLS or per-service token) where available, and log the triggering principal alongside cid/company_sso_id (OQ-11).
- SSRF / download integrity (attachments): the worker fetches CRM attachment URLs (CRM S3 public-read, Decision 8). Restrict fetches to an allow-list of CRM S3/CDN hosts (reject internal IPs / metadata endpoints), validate magic-bytes vs declared Type, and cap download size. Re-uploaded objects are written only under the caller's {company_sso_id}/... prefix.
- Storage-quota DoS: bound per-CID attachment volume; the CDP storage quota dependency (§1) is also a DoS control. Confirm headroom at Stage 0 (OQ-9) and alert on quota approach.
- Input sanitization (the headline control): the migrate write path sanitizes the CRM HTML server-side (bluemonday allow-list mirroring CRM's tag set) and strips mention anchors, closing the XSS gap that exists because CDP performs no server-side sanitization today (contact_notes_service.go:268-274). This is defense-in-depth even though CRM also sanitizes (Decision 5).
- Attachments: re-uploaded into company-scoped CDP storage (never reference raw CRM S3 URLs; alternatives-rejected); validate Type against the allow-set (contact_notes_service.go:286-293).
- Secrets: CRM credentials via config/load.go (getStringOrPanic, CRM_API_ROOT_URL/CRM_API_AUTH), no hardcoding; BasicAuth creds from config.BasicAuth.
- Static analysis: staticcheck ./... (make lint) + gosec (make sec).
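The SSRF host allow-list from the list above could start from a URL pre-check like the one sketched here. Everything specific in it is an assumption: the host name in `allowedHosts` is a placeholder, requiring `https` is an added restriction, and `checkAttachmentURL` is a hypothetical helper. A pre-check of this kind does not cover DNS rebinding, so the resolved address would still need to be validated when the connection is dialed.

```go
package main

import (
	"errors"
	"net"
	"net/url"
	"strings"
)

// allowedHosts is a placeholder allow-list; the real CRM S3/CDN hosts
// would come from configuration.
var allowedHosts = map[string]bool{"crm-assets.example.com": true}

// checkAttachmentURL rejects URLs the worker should never fetch:
// non-https schemes, IP literals (internal ranges, metadata endpoints),
// and any host that is not explicitly allow-listed.
func checkAttachmentURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	if u.Scheme != "https" {
		return errors.New("scheme not allowed")
	}
	host := strings.ToLower(u.Hostname())
	if net.ParseIP(host) != nil {
		return errors.New("IP literal not allowed")
	}
	if !allowedHosts[host] {
		return errors.New("host not on allow-list")
	}
	return nil
}
```

Matching on `u.Hostname()` rather than on a string prefix means a URL that merely embeds the allowed host in its userinfo or path is still rejected.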
Role × Endpoint Authorization Matrix
| Role | Endpoint(s) | Methods | Tenant scope | Constraint | Audit |
|---|---|---|---|---|---|
| Internal Ops (S2S) | /private/notes/migrate, /private/notes/migration/status | POST/GET | explicit per-batch company_sso_id | flag ON per CID; one in-progress job/CID | per-record log + Redis status (7d) + intrinsic audit map (permanent) |
| Migrated agent (IAG) | existing GET /iag/v1/contacts/{id}/notes | GET | own company (IAG ctx) | n/a | n/a |
| Client admin / end user | none for migration | — | — | — | 401/403 |
Detail 3.A — Failure Mode Catalog
| Failure | Where | Behavior | Counted as | User/Ops-visible |
|---|---|---|---|---|
| Flag OFF | handler | 403 FLAG_DISABLED, no job | n/a | yes (Ops) |
| Already completed | handler | 409 ALREADY_MIGRATED | n/a | yes |
| Concurrent job/CID | handler | 409 JOB_ALREADY_RUNNING | n/a | yes |
| CRM 5xx/timeout | extractor | 10s per-request timeout; retry 3× backoff (1s/3s/9s) → CRM_EXTRACT_FAILED halt | run halt | PagerDuty P1 |
| Contact not mapped | resolver | skip note, CONTACT_NOT_MAPPED | note failure | error log |
| Owner unmappable | owner resolver | owner_id=null + legacy_owner_label | non-failure | _owner_not_resolved |
| Attachment download/upload fail | attachment processor | insert note without attachment | non-failure | _attachment_failed |
| Batch write error (transient) | repo (BulkUpdate upsert) | gocraft retries the batch (upsert ⇒ already-written notes are no-ops); persistent → BATCH_WRITE_FAILED, count + continue; >1% → halt | note/run failure | PagerDuty if halt |
| Already-migrated legacy_crm_note_id | repo (upsert) | upsert matches → no-op | notes_skipped | n/a |
| Count source unavailable | validator | VALIDATION_SKIPPED, completed_with_errors, alert | run warning | P2 |
| Malformed HTML | normalizer | sanitized best-effort + warning | non-failure | log |
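The "Counted as" column feeds the >1% halt rule. A sketch of that check follows; the RFC does not spell out the denominator, so using in-scope notes attempted so far is an assumption, and `counters`, `failureRatePct`, and `shouldHalt` are hypothetical names.

```go
package main

// counters tracks what the halt rule needs. Only rows marked
// "note failure" in the catalog increment Failed; owner and attachment
// problems are non-failures and do not.
type counters struct {
	Processed int // in-scope notes attempted so far (assumed denominator)
	Failed    int // e.g. CONTACT_NOT_MAPPED, BATCH_WRITE_FAILED
}

// haltThresholdPct is the auto-halt threshold from the catalog.
const haltThresholdPct = 1.0

// failureRatePct returns the failure rate as a percentage.
func (c counters) failureRatePct() float64 {
	if c.Processed == 0 {
		return 0
	}
	return float64(c.Failed) * 100 / float64(c.Processed)
}

// shouldHalt is true only when the rate is strictly above the threshold.
func (c counters) shouldHalt() bool {
	return c.failureRatePct() > haltThresholdPct
}
```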
Detail 3.B — Error Response Catalog
Shape: { "error": "CODE", "message": "...", "details": {} }
| Endpoint | Code | HTTP | When |
|---|---|---|---|
| migrate | FLAG_DISABLED | 403 | crm_notes_migration_enabled OFF for CID |
| migrate | ALREADY_MIGRATED | 409 | CID already completed_success |
| migrate | JOB_ALREADY_RUNNING | 409 | in-progress lock held for CID |
| migrate | CID_NOT_FOUND | 404 | unknown CID |
| migrate / status | (BasicAuth fail) | 401/403 | missing/invalid Basic credentials (not S2S) |
| status | CID_NOT_FOUND | 404 | no status record → treat as not_started |
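The error shape and the code-to-status mapping in the catalog can be expressed as a small Go type. This is a sketch under assumptions: `errorResponse`, `newErrorBody`, and `statusByCode` are hypothetical names, and the existing handlers may already have their own error helper that should be preferred.

```go
package main

import "encoding/json"

// errorResponse mirrors { "error": "CODE", "message": "...", "details": {} }.
type errorResponse struct {
	Error   string                 `json:"error"`
	Message string                 `json:"message"`
	Details map[string]interface{} `json:"details"`
}

// statusByCode maps each catalog code to its HTTP status.
var statusByCode = map[string]int{
	"FLAG_DISABLED":       403,
	"ALREADY_MIGRATED":    409,
	"JOB_ALREADY_RUNNING": 409,
	"CID_NOT_FOUND":       404,
}

// newErrorBody renders the JSON body for a catalog code.
// Details is initialised so it serialises as {} rather than null.
func newErrorBody(code, message string) ([]byte, error) {
	return json.Marshal(errorResponse{
		Error:   code,
		Message: message,
		Details: map[string]interface{}{},
	})
}
```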
Detail 3.D — Compliance & Data Governance
Triggered — migrated notes contain contact PII (interaction history, attachments).
| Field | Classification | Legal basis | Retention | Encryption | Access audit |
|---|---|---|---|---|---|
| note (HTML) | PII | legitimate migration of the company's own data | per CDP note lifecycle | TLS in transit; storage at-rest | per-record migration log (30d failed queue) |
| attachments[].url + object | PII | — | CDP storage policy | TLS; company-scoped path | audit map |
| legacy_owner_label | may be a person name | — | with the note | at-rest | — |
| migration status (Redis) | internal ids/counts | — | 7d TTL | at-rest | — |
Right-to-delete (REV-5): migrated notes are stored identically to native CDP
notes and inherit the same soft-delete lifecycle — is_deleted set on delete,
filtered out on every read (contact_notes/read.go:36-37); the new fields
(legacy_crm_note_id/legacy_owner_label) are erased with the document, adding no
new barrier to deletion. There is no contact-delete → notes cascade in
contact-service today (verified: no caller deletes contact_notes on
contact/company deletion), so contact/company erasure does not auto-remove either
native or migrated notes — that cascade, if required for UU PDP erasure (including
the re-uploaded attachment objects), is a separate platform concern that applies
equally to native notes and is out of scope here.
Controls: S2S-only access, explicit per-batch tenant scoping, server-side HTML
sanitization, company-scoped attachment storage, no PII in logs (ids + counts +
reason codes only), crm_checkin geolocation explicitly dropped (Decision 9). CRM
source data is read-only; no deletion during the ≥90d coexistence window (PRD
§11.1). OSS/storage data-residency for the re-uploaded PII attachments — InfoSec
to confirm the CDP bucket region is UU PDP-compliant (OQ-9).
4. Backwards Compatibility and Rollout Plan
Compatibility
- BE: all routes are additive (/private/notes/*). contact_notes gains two optional fields; the partial unique index does not touch existing notes (Decision 4); the single-CRUD note path is unchanged (Decision 3). No API version bump.
- CRM: read-only; the org-scoped extraction endpoint is additive in qontak.com (CRM squad's PR). No CRM schema/data change.
- FE: none; the existing CDP Notes UI renders migrated rows unchanged.
Rollout Strategy
- Deploy order: CRM extraction endpoint (CRM squad) → contact-service (migrate pipeline + flag default OFF) → Ops triggers per CID. The pipeline is dormant until Ops enqueues a job, and gated by crm_notes_migration_enabled per CID.
- Feature flag: crm_notes_migration_enabled, default OFF, per CID (PRD §11). Kill-switch = flip OFF (migrate endpoint → 403; no jobs).
- Stages (PRD §11, §14):
  - Stage 1 (Internal QA): 2 synthetic CIDs (100 + 5,000 notes incl. images/audios/documents + mentions + activity entries). Verify idempotency (zero dup on re-run), timestamp preservation, sanitization + mention-strip, attachment re-link ≥ 95%, match_pct = 100%, activity exclusion.
  - Stage 2 (Pilot): 5–10 CSM-approved CIDs (2 wk); match_pct ≥ 99%, zero pipeline-bug halts, error log root-caused.
  - Stage 3 (Batch): remaining ~120 CIDs per schedule; halt rate < 2%, match_pct ≥ 99% before each cutover.
- Gate before any CID: pre-migration coverage report (OQ-2); block job start if Person→Contact coverage (crm_data.id) < 99%.
- Stop conditions: failure rate > 1%/CID (auto-halt) or halt rate > 2% in Stage 3 → pause rollout, investigate.
- Rollback: flip crm_notes_migration_enabled OFF (instant; no data migration); in-flight job completes its current batch and stops; migrated rows remain valid (idempotent re-run later). If a bad index migration: make migrate-down (index only; data untouched).
- Blast radius: flag-ON CIDs only; isolated from read/write contact paths.
Detail 4.A — Configuration Contract
| Env var / flag | Type | Default | Required | Provisioner | Secret? |
|---|---|---|---|---|---|
| crm_notes_migration_enabled | flag (per-CID) | OFF | yes | Ops/flag service | no |
| CRM_API_ROOT_URL | string | — | yes (exists) | config/load.go:197 | no |
| CRM_API_AUTH | string | — | yes (exists) | config/load.go:198 | yes |
| BASIC_AUTH_USERNAME / BASIC_AUTH_PASSWORD | string | — | yes (exists) | config/load.go:143-144 | yes |
| Notes-migration batch size | int | 500 (max 1000) | yes | code/config | no |
| Notes-only crm_note_type_id set | list | (PM-confirmed; default Notes/Documents) | yes | config (OQ-4) | no |
| CRM_NOTES_EXTRACT_TIMEOUT | duration | 10s | yes | config/load.go getDurationOrPanic | no |
| Inter-page extraction delay | ms | 0 (tune if throttled — OQ-7) | no | config | no |
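The batch-size row (default 500, max 1000) implies a clamp before notes are split into batches. A minimal sketch, assuming a non-positive configured value falls back to the default; `effectiveBatchSize` and `chunk` are hypothetical helpers.

```go
package main

const (
	defaultBatchSize = 500
	maxBatchSize     = 1000
)

// effectiveBatchSize applies the default and the upper bound from the
// configuration contract.
func effectiveBatchSize(configured int) int {
	switch {
	case configured <= 0:
		return defaultBatchSize
	case configured > maxBatchSize:
		return maxBatchSize
	default:
		return configured
	}
}

// chunk splits ids into consecutive batches of at most the effective size.
// The returned slices share the backing array of ids.
func chunk(ids []string, size int) [][]string {
	size = effectiveBatchSize(size)
	var out [][]string
	for len(ids) > 0 {
		n := size
		if len(ids) < n {
			n = len(ids)
		}
		out = append(out, ids[:n])
		ids = ids[n:]
	}
	return out
}
```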
Detail 4.B — Test Plan (commands sourced from repo)
| Layer | Command (source) | What it must prove |
|---|---|---|
| BE unit | go test -race -tags dynamic ./internal/app/service/... ./internal/app/consumer/... (source: Makefile make test) | sanitizer (XSS + mention-strip + no <p>); timestamp preservation; skip-on-conflict; owner-label fallback; attachment mapping; per-error counting; notes-only filter |
| BE full | make test (go test -race -tags dynamic ./internal/... ./config/...) | no regression across service |
| BE lint | make lint (staticcheck ./...) | static analysis clean |
| BE sec | make sec (gosec) | no new security findings on the ingestion path |
| BE build | make build (go build -tags dynamic) | compiles |
| BE migration | make migrate-up && make migrate-down | partial unique index applies + rolls back; existing notes unaffected |
| Integration | seeded Mongo: insert N CRM-shaped notes twice | re-run inserts only missing; full re-run → migrated=0; concurrent job → 409 |
| Cross-squad (Stage 1) | manual: Ops POST → status → notes visible in CDP UI | end-to-end incl. CRM extraction + attachment re-link |
Detail 4.C — Agent Execution Plan
| Order | Chunk | Files to modify/create | Commands | Acceptance criteria |
|---|---|---|---|---|
| 1 | Constants + payload + status | internal/app/service/notes_migration_service.go (new — NotesMigrationJobName, Redis status key like activity_log_migration_service.go:22-31); internal/app/payload/notes_migration.go | make build | builds; job-name + payload exported; Redis status read/write mirrors activity-log |
| 2 | Extend note store + partial unique index | internal/app/repository/contact_notes/base.go (+legacy_crm_note_id,legacy_owner_label); new CreateNotesBatch over IDbRepo.BulkUpdate (UpdateOneModel+SetUpsert(true)+$setOnInsert, filter (company_sso_id, legacy_crm_note_id); bypass SetDefaults, set caller ts) — not CreateMany; db/migrations/NNN_index_contact_notes_legacy_crm_note_id.{up,down}.json (partial unique) | make migrate-up && make migrate-down && go test ... ./internal/app/repository/contact_notes/ | struct compiles; index up/down; existing notes unaffected; CreateNotesBatch preserves ts; re-run upserts → UpsertedCount=0/MatchedCount=N (no dup, no error) |
| 3 | Contact + owner resolvers | internal/app/consumer/notes_migration_consumer.go (resolve via contact/search.go:27 SearchByAppContactID(ctx,"crm",crmPersonID) — or a batched $in variant — string-cast the id); owner resolve + legacy_owner_label | go test ... ./internal/app/consumer/ | resolves by crm_data.id (= crm_person_id); multi-FK person-first; unmapped→CONTACT_NOT_MAPPED; unmappable owner→null+label |
| 4 | HTML normalizer | internal/pkg/util/html_normalizer.go (+bluemonday in go.mod); deny-by-default policy (Decision 5): UGCPolicy base, structural tags only, AllowStandardURLs+RequireNoFollowOnLinks, no style; strip data-user-id/data-mention anchors→@Name; no <p> wrap; sanitize→then length-check | make build && go test ... ./internal/pkg/util/ | <script>/onerror=/javascript: href/style exfil all neutralised; mention anchors→plain text; safe markup preserved; not wrapped in <p>; >10000 post-sanitize → failure (not truncated) |
| 5 | Attachment processor | internal/app/consumer/... (download from CRM S3/CDN → re-upload to {company_sso_id}/{legacy_crm_note_id}/{asset} → proxy URL + Type) | go test ... ./internal/app/consumer/ | image/audio/document mapped to allowed Type; SSRF host-allow-list (reject non-CRM hosts/internal IPs/metadata); magic-byte vs Type check; max size enforced; download fail → insert-without + ATTACHMENT_*_FAILED |
| 6 | CRM extraction client | internal/app/api/qontak_crm.go (+ListPersonNotes(ctx,cid,page,perPage) built on heimdall httpclient.NewClient(WithHTTPTimeout(...)) + retrier — like iag_mekari.go:69-71, not http.DefaultClient); config/load.go (+CRM_NOTES_EXTRACT_TIMEOUT duration, default 10s) | make build && go test ... ./internal/app/api/ | paginated pull with Authorization; 10s timeout + 3× backoff (1s/3s/9s) on timeout/5xx then CRM_EXTRACT_FAILED; notes-only filter applied |
| 7 | Consumer assembly + worker registration | internal/app/consumer/notes_migration_consumer.go (ProcessNotesMigrationJob(job *work.Job) reads job.Args["data"]); internal/worker/worker_service.go (register NotesMigrationJobName) | go test ... ./internal/app/consumer/ ./internal/worker/ | end-to-end consumer: extract→resolve→sanitize→relink→CreateNotesBatch→progress; per-CID lock; halt at >1% |
| 8 | Service + handler + routes + validation | internal/app/service/notes_migration_service.go (ValidateAndEnqueue, GetMigrationStatus, ValidationRunner via CountWithFilters); internal/app/handler/notes_migration_handler.go; internal/server/rest_router.go (2 /private/notes routes under BasicAuth) | make build && make lint && make test | enqueue returns {job_id}; 403/404/409 guards; match_pct computed; routes BasicAuth-guarded; suite green |
| 9 | Render-path owner-label fallback | internal/app/service/contact_notes/contact_notes_service.go:131-136 (use legacy_owner_label when owner_id empty) | go test ... ./internal/app/service/contact_notes/ | author renders label when owner_id empty; existing path unchanged otherwise |
| 10 | API doc | docs/NOTES_MIGRATION_SERVICE.md (markdown — repo has no OpenAPI spec) | ls docs/NOTES_MIGRATION_SERVICE.md | doc describes the 2 endpoints + the gocraft/work job |
Detail 4.D — Verification & Rollback Recipe
- Pre-merge (in order): 1) make lint 2) make sec 3) make test 4) make build 5) make migrate-up && make migrate-down.
- Post-deploy signals (Stage 1): crm_notes_migration_completed count > 0 with match_pct = 100% on the synthetic CIDs; #cdp-ops quiet (no halted/P2); re-run a completed CID → notes_migrated=0 (idempotency proof); migrated notes visible in the CDP Notes UI with original created_at + re-linked attachments.
- Rollback (in order):
  1. Flip crm_notes_migration_enabled OFF (migrate → 403; no new jobs).
  2. If a bad index migration: make migrate-down (index only; data untouched).
  3. Revert the offending PR; confirm single-CRUD note create/read still works and existing notes are intact (no unique-index collisions).
5. Concern, Questions, or Known Limitations
Resolved by grounding (closed in this RFC):
- PRD
POST /cdp/notes/migrateHTTP batch endpoint → not built; in-process repository batch write instead (Decision 1;/cdpnamespace does not exist). - PRD S2S model → HTTP Basic auth on
/private, mirroring/private/activity_logs/migration/status(Decision 2). - Idempotency index → partial unique index (Decision 4) — a plain unique index would corrupt existing notes.
- Migration job store → Redis status (mirror the existing framework), audit map intrinsic to each migrated document (Decision 2).
- Person→Contact resolution → `crm_data.id` (indexed), not `source_id` (unindexed) (Decision 7).
- `legacy_owner_label` is a net-new field the PRD implied but did not enumerate (Decision 6).
- CRM real extraction API is entity-scoped + unpaginated → CRM squad must build a net-new org-scoped endpoint; extend the existing `QontakCrmClient` (Decision 10).
- CRM S3 is `public-read` → attachment fetch generally needs no signing (correction to PRD "internal creds"); CDP still re-uploads company-scoped (Decision 8).
- No `≤1 voice_note` backend rule exists today (Decision 8 / OQ-8).
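To make the partial-index point (Decision 4) concrete, here is a minimal in-memory model of the uniqueness check. It relies on MongoDB's documented behaviour that a unique index stores a missing field as null, and it assumes the partial filter is "`legacy_crm_note_id` exists"; the exact filter expression belongs to Decision 4 and is not restated here. The `note` type and `uniqueViolations` function are hypothetical and use no MongoDB driver.

```go
package main

import "fmt"

// note is a hypothetical, trimmed-down notes document. LegacyCRMNoteID is
// nil for human-created notes, which never carry a legacy id.
type note struct {
	ID              string
	CompanySSOID    string
	LegacyCRMNoteID *int64
}

// indexKey renders the compound key (company_sso_id, legacy_crm_note_id).
// A missing legacy id is indexed as null, as a unique index would treat it.
func indexKey(n note) string {
	if n.LegacyCRMNoteID == nil {
		return n.CompanySSOID + "|null"
	}
	return fmt.Sprintf("%s|%d", n.CompanySSOID, *n.LegacyCRMNoteID)
}

// uniqueViolations returns the IDs of notes that would collide with an
// earlier note under a unique index. With partial=true, notes without a
// legacy id are left out of the index entirely.
func uniqueViolations(notes []note, partial bool) []string {
	seen := map[string]bool{}
	var violations []string
	for _, n := range notes {
		if partial && n.LegacyCRMNoteID == nil {
			continue // not covered by the partial index
		}
		k := indexKey(n)
		if seen[k] {
			violations = append(violations, n.ID)
			continue
		}
		seen[k] = true
	}
	return violations
}

func main() {
	legacyID := int64(101)
	notes := []note{
		{ID: "n1", CompanySSOID: "company-a"}, // human-created
		{ID: "n2", CompanySSOID: "company-a"}, // human-created
		{ID: "n3", CompanySSOID: "company-a", LegacyCRMNoteID: &legacyID}, // migrated
	}
	fmt.Println("plain unique index collisions:  ", uniqueViolations(notes, false))
	fmt.Println("partial unique index collisions:", uniqueViolations(notes, true))
}
```

Under the plain index the second human-created note in a company collides with the first on the null key; under the partial index only genuinely duplicated legacy ids can collide, which is the idempotency guarantee the migration needs.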
Open — adopted default, confirm at the noted gate:
| # | Question | Adopted default | Owner | Blocks? |
|---|---|---|---|---|
| OQ-1 | Migration mechanism: CDP gocraft/work consumer vs Bifrost (Postgres→Mongo)? | CDP gocraft/work consumer reusing the existing framework (Decision 2/10); Bifrost is the fallback | CDP Eng + Platform | confirm at design kickoff |
| OQ-2 | Person→Contact coverage per CID (id-space resolved by REV-1: crm_data.id == crm_person_id) | Pre-migration coverage report per CID (CountDocuments/CountWithFilters); block job start if coverage < 99%; unmatched → 30d failed queue | CDP / Data Eng | gate before each CID |
| OQ-3 | Failed (CONTACT_NOT_MAPPED) notes → retry queue or permanent error log? | 30-day failed-record queue (PRD §7.1) with manual retry after mapping backfill | PM + Eng | no |
| OQ-4 (REV-4) | Migrate all crm_notes or notes-only? | Notes-only — filter crm_note_type_id (default Notes/Documents, ids (1,6)); confirm exact set with PM; agent reads the set from config, not hardcoded | PM | confirm before Stage 1 |
| OQ-5 | Document attachments in scope? | Yes — re-link to CDP doc/pdf/xlsx (Decision 8) | PM + Eng | no |
| OQ-6 | Unmappable-owner notes: edit/delete hidden acceptable? | Yes for historical notes; legacy_owner_label preserves author display | PM | no |
| OQ-7 | CRM extraction at bulk throughput (endpoint vs DB; rate limits) | Load-test the chosen path in staging before Internal QA; configurable inter-page delay | Legacy CRM Squad + CDP | Stage 0 gate |
| OQ-8 | Multiple audios / unsupported types per note (no ≤1 voice_note rule exists today) | Map each audio to voice_note; log+skip unsupported types; add a cap only if PM requires | PM + Eng | no |
| OQ-9 | CRM S3 access (confirm still public-read) + CDP storage data residency for re-uploaded PII | Confirm CRM bucket access at Stage 0; InfoSec confirms CDP bucket region UU-PDP-compliant | InfoSec + CDP Infra + CRM | confirm at Stage 0 |
| OQ-10 | bluemonday (or equivalent) dependency approval + sanitizer policy | Adopt bluemonday with the deny-by-default policy in Decision 5 (UGCPolicy base, structural tags only, no style, scheme-allow-listed href) | CDP BE + InfoSec | confirm before chunk 4 |
| OQ-11 | Per-caller identity/audit on /private/notes/migrate (BasicAuth is a single shared credential) | Front with gateway/mesh identity where available; log the triggering principal with cid/company_sso_id | CDP BE + Platform | confirm at design kickoff |

_(rfc-reviewer findings REV-1/2/3/5/6 were resolved in this revision; see §6 Comment logs and the companion review's Findings Ledger. REV-4 remains a PM scoping decision, captured as OQ-4 above.)_
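The OQ-2 default (block the job start if Person→Contact coverage is below 99%) reduces to a small pre-flight check. The sketch below assumes the two counts come from the pre-migration coverage report; `checkCoverageGate`, `errCoverageTooLow`, and the treatment of an empty CID as covered are illustrative assumptions, not the RFC's defined behaviour.

```go
package main

import (
	"errors"
	"fmt"
)

// minCoveragePct is the adopted default threshold from OQ-2.
const minCoveragePct int64 = 99

// errCoverageTooLow is a hypothetical sentinel error for a blocked job start.
var errCoverageTooLow = errors.New("person-to-contact coverage below threshold")

// checkCoverageGate returns an error when fewer than minCoveragePct percent
// of CRM persons resolve to a CDP contact for the CID being migrated.
func checkCoverageGate(matched, total int64) error {
	if total <= 0 {
		return nil // nothing to resolve; treated as covered (assumption)
	}
	// Integer comparison avoids float rounding at exactly 99%.
	if matched*100 < total*minCoveragePct {
		pct := float64(matched) / float64(total) * 100
		return fmt.Errorf("%w: %.2f%% (%d of %d persons matched)",
			errCoverageTooLow, pct, matched, total)
	}
	return nil
}

func main() {
	cases := []struct{ matched, total int64 }{{995, 1000}, {980, 1000}}
	for _, c := range cases {
		if err := checkCoverageGate(c.matched, c.total); err != nil {
			fmt.Println("blocked:", err)
			continue
		}
		fmt.Println("ok to start:", c.matched, "of", c.total)
	}
}
```

Wrapping the sentinel with `%w` lets a caller distinguish a coverage block from other start-up failures via `errors.Is`, for example to route unmatched records to the 30-day failed queue instead of halting.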
Known limitations: one-time migration (no ongoing sync); crm_checkin
geolocation dropped (Decision 9); mentions become plain text (no live CDP mentions);
extraction depends on a net-new CRM endpoint (cross-squad); Redis status is
ephemeral (7d) — the durable record is the set of migrated documents themselves.
Future: native CDP mentions migration; a richer durable migration-run audit store if
re-runs need history beyond 7d.
6. Comment logs
| Date | Comment(s) From | Action Item(s) |
|---|---|---|
| 2026-06-18 | rfc-starter (initial draft, grounded vs contact-service + qontak.com live worktrees) | Confirm with CRM squad the net-new org-scoped extraction endpoint (OQ-7); Data Eng confirm crm_data.id semantics + coverage (OQ-2); InfoSec confirm bluemonday (OQ-10) + storage residency (OQ-9) |
| 2026-06-18 | rfc-starter (grounding corrections) | Corrected vs PRD: /cdp/notes/migrate not built (in-process write, Decision 1); S2S = BasicAuth on /private (Decision 2); idempotency index must be partial (Decision 4); status Redis-backed; resolve via indexed crm_data.id not source_id (Decision 7); legacy_owner_label is a net-new field (Decision 6); CRM v4 API entity-scoped + unpaginated (Decision 10); CRM S3 public-read (Decision 8); no ≤1 voice_note rule today (OQ-8) |
| 2026-06-18 | Verification pass (frontmatter linter + mermaid + checklist, run against the live qontak-docs linter) | PASS on all 7 gates; pinned lint-docs.mjs reports zero errors attributed to this RFC; 9/9 mermaid blocks render; frontmatter ↔ Metadata table agree; all 6 stories covered once; no placeholders |
| 2026-06-18 | Security review (Staff-Eng lens + anti-hallucination spot-check vs both worktrees) | Grounding CLEAN (every spot-checked path:line verified; no secrets). Hardened in-doc: deny-by-default bluemonday policy (no style/javascript:, sanitize-then-length, ISSUE-1/2); attachment SSRF host-allow-list + magic-byte + size cap (ISSUE-3); deterministic storage key (ISSUE-4); per-caller audit/mesh identity (ISSUE-5, OQ-11); storage-quota DoS added to threat model |
| 2026-06-18 | rfc-reviewer (backend rubric; score 8.5/10 Agentic-Ready / PROCEED; report co-located at rfc-legacy-migration-crm-notes-review.md) | 8/11 decisions Resolved, 3 Partial (carry adopted defaults). 6 findings promoted to Open Questions: REV-1→OQ-2 (crm_data.id semantics, top risk), REV-4→OQ-4 (notes-only type-id set), REV-2 (pin extraction timeout/backoff), REV-3 (reword batch-insert failure from HTTP "5xx" to Mongo write-error), REV-5 (right-to-delete path), REV-6 (pin bulk-insert driver call) |
| 2026-06-18 | Fix REV findings (grounded against live contact-service + qontak.com) | REV-1 RESOLVED — confirmed crm_data.id == crm_person_id (contact_sync_request.go:104-105, params_mapper.rb:31, centralized_contacts_controller.rb:120-129; no separate centralized-contact id space); resolve via SearchByAppContactID("crm",…) (search.go:27); OQ-2 downgraded to a coverage gate. REV-2 RESOLVED — extractor uses heimdall httpclient (WithHTTPTimeout, iag_mekari.go:69-71) + retrier, 10s/3× backoff, config CRM_NOTES_EXTRACT_TIMEOUT (current QontakCrmClient has no timeout). REV-3+REV-6 RESOLVED — idempotent upsert via existing IDbRepo.BulkUpdate→BulkWrite(SetOrdered(false)) (db.go:180-181), keyed on (company_sso_id, legacy_crm_note_id); no E11000 path; reworded §2.F/§3.A from HTTP to Mongo write semantics. REV-5 RESOLVED — migrated notes inherit native soft-delete (read.go:36-37); no contact-delete→notes cascade exists today (out of scope, applies to native notes equally). REV-4 remains a PM scoping decision (OQ-4). |
7. Ready for agent execution
- yes — for the core BE migration pipeline. The blocking external item (CRM org-scoped extraction endpoint, OQ-7) and the per-CID coverage gate (OQ-2) are prerequisites for running a migration, not for building the pipeline; the extractor is behind an interface so chunks 1–10 can proceed with a stub/contract.
Execution-readiness gates (all met unless noted):
- §1 PRD-to-Schema — every entity/rule mapped to field + endpoint + enforcement: yes.
- Detail 1.C Per-Story Change Map — all 6 stories, layer scope, verifiable AC: yes.
- Repo Reading Guide (Detail 2.0) + contracts classified (reuse/extend/new): yes.
- Source Verification table — concrete evidence per anchor across both repos: yes.
- Mermaid: topology, per-service, repo map, component, ER, state, branch/skip, sequence (happy + failure): yes.
- DDL/collection + partial unique index + per-status lifecycle; every field traces to a PRD-to-Schema row: yes.
- APIs outbound (2, tagged new-with-justification) + inbound (n/a — puller): yes.
- Async Job + Concurrency + Responsibility Boundary specs: yes.
- Failure Mode + Error Response catalogs; Security (XSS sanitization headline, tenant scoping, secrets): yes.
- Configuration Contract + flag; deploy order (CRM → BE → Ops): yes.
- Agent Execution Plan (10 chunks, files + commands + verifiable AC): yes.
- Verification & Rollback Recipe (commands runnable; signals named; partial-index rollback safe): yes.
- Pending (external, do not block build): CRM extraction endpoint (OQ-7), per-CID coverage ≥99% gate (OQ-2), `bluemonday` approval (OQ-10), storage residency (OQ-9).
Optional next step: hand to `rfc-reviewer` for a second-pass score.