
Task Breakdown — RFC: Legacy Migration — CRM Contact Notes → CDP Notes

Mode: Horizontal (Phase 1: Foundation → Phase 2: Pipeline + API) · Scope: BE-only (contact-service, Go/MongoDB) · No FE work — migrated notes render via the existing CDP Notes UI · Blocked tasks shown inline (full picture).

All 10 RFC execution chunks map to 8 tasks below. The blocking cross-squad dependency (CRM org-scoped extraction endpoint, OQ-7) only blocks running migrations, not building the pipeline — the extractor is behind a CRMNotesExtractor interface and can be stubbed throughout. Every task is buildable today.

Effort Summary

Phase / Area | BE days | QA days | Total
Phase 1: Foundation (data model + schema + HTML normalizer) | 3.5 | 1.0 | 4.5
Phase 2: Pipeline components + assembly + API | 10.0 | 3.5 | 13.5
Grand total | 13.5 | 4.5 | 18.0

Confidence: medium. Key assumptions: (1) MongoDB schemaless — adding legacy_crm_note_id/legacy_owner_label needs no data migration, only a partial unique index; (2) bluemonday dep requires InfoSec approval (OQ-10) before AGREED but is buildable now; (3) CRM extraction client (Task 2.3) is built against a stub contract — the CRM squad's org-scoped endpoint (OQ-7) is the only external blocker for running migrations, not for building; (4) CDP attachment storage quota + CRM S3 access confirmed at Stage 0 (OQ-9), not a build blocker.


Phase 1 — Foundation

Task 1.1: [BE] Data model + job constants + CreateNotesBatch + partial unique index (NOTES-MIG-S03)

An Ops engineer can safely re-trigger a migration at any time — notes already in CDP are skipped without error, zero duplicates are inserted, and the job constants + Redis status scaffold are in place for all downstream tasks.

Status: ✅ Actionable.

What to build

Add legacy_crm_note_id (idempotency key) and legacy_owner_label (unmapped-owner fallback) to the ContactNote struct; create CreateNotesBatch — a timestamp-preserving, skip-on-conflict upsert method over IDbRepo.BulkUpdate; write the partial unique index migration; scaffold the NotesMigrationJobName const, Redis status key, and payload structs that all subsequent tasks consume.

Implementation Plan

Action | File | What changes
extend | internal/app/repository/contact_notes/base.go | Add LegacyCRMNoteID string (bson:"legacy_crm_note_id,omitempty") and LegacyOwnerLabel string (bson:"legacy_owner_label,omitempty") to ContactNote; do not touch SetDefaults()
create | internal/app/repository/contact_notes/batch_create.go | CreateNotesBatch(ctx, []ContactNote): uses IDbRepo.BulkUpdate (db.go:180-181) with UpdateOneModel + $setOnInsert + SetUpsert(true), filter {company_sso_id, legacy_crm_note_id}; bypasses SetDefaults(), sets caller CreatedAt/UpdatedAt explicitly; sets IsDeleted=false/Attachments=[] for defaults SetDefaults() would otherwise provide; returns UpsertedCount (migrated) + MatchedCount (skipped)
create | db/migrations/NNN_index_contact_notes_legacy_crm_note_id.up.json | createIndexes on contact_notes: {company_sso_id:1, legacy_crm_note_id:1}, "unique":true, "partialFilterExpression":{"legacy_crm_note_id":{"$exists":true}}; index name uq_contact_notes_company_legacy_crm_note_id
create | db/migrations/NNN_index_contact_notes_legacy_crm_note_id.down.json | dropIndexes for uq_contact_notes_company_legacy_crm_note_id
create | internal/app/service/notes_migration_service.go | NotesMigrationJobName const; Redis status key notes_migration:{cid} + TTL 7d (mirror activity_log_migration_service.go:22-31); MigrationStatus struct {Status, ProgressPct, NotesProcessed, NotesTotal, FailureRate, MatchPct}
create | internal/app/payload/notes_migration.go | NotesMigrationRequest{CID, CompanySsoID}, NotesMigrationResponse{JobID, Status}
create | internal/app/repository/contact_notes/batch_create_test.go | Tests: new note inserted with caller ts (UpsertedCount=1); same note re-upserted → UpsertedCount=0, MatchedCount=1, no E11000; full re-run → UpsertedCount=0; human-written note (no legacy_crm_note_id) unaffected

Implementation steps

  1. Write failing tests (red): create batch_create_test.go with (a) new note with LegacyCRMNoteID="crm-1" → UpsertedCount=1, CreatedAt equals caller value (not time.Now()); (b) same note re-upserted → UpsertedCount=0, MatchedCount=1, no error; (c) human-written note without legacy_crm_note_id is unaffected. Run make test, confirm red.
  2. Extend the struct: add LegacyCRMNoteID and LegacyOwnerLabel to ContactNote in internal/app/repository/contact_notes/base.go. Do not modify SetDefaults(); the single-CRUD create path must remain untouched (create.go:12).
  3. Implement CreateNotesBatch: build over IDbRepo.BulkUpdate. For each ContactNote, create mongo.NewUpdateOneModel().SetFilter(bson.M{"company_sso_id":..., "legacy_crm_note_id":...}).SetUpdate(bson.M{"$setOnInsert": note}).SetUpsert(true); set IsDeleted=false and Attachments=[] explicitly; never call SetDefaults().
  4. Write index migration: .up.json with partialFilterExpression:{"legacy_crm_note_id":{"$exists":true}}, following the pattern of 013_create_contact_notes.up.json; .down.json drops by name. Run make migrate-up && make migrate-down, confirm the index applies and rolls back without touching existing notes.
  5. Scaffold service constants: create notes_migration_service.go with NotesMigrationJobName, Redis status key + TTL; create notes_migration.go payload structs.
  6. Go green: make test.
  7. Quality gate: make build && make migrate-up && make migrate-down.
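
The skip-on-conflict contract of CreateNotesBatch can be illustrated without MongoDB. The sketch below is an in-memory stand-in, not the repository implementation: `note`, `key`, `memStore`, and `createNotesBatch` are hypothetical names, and the map lookup plays the role that the `$setOnInsert` upsert and the partial unique index play in the real collection. It might also serve as a fake in consumer tests.

```go
package main

import "time"

// note is a hypothetical, trimmed-down stand-in for ContactNote.
type note struct {
	CompanySsoID    string
	LegacyCRMNoteID string
	CreatedAt       time.Time
	UpdatedAt       time.Time
}

// key mirrors the upsert filter {company_sso_id, legacy_crm_note_id}.
type key struct{ company, legacyID string }

// memStore is an in-memory stand-in for the contact_notes collection.
type memStore struct{ notes map[key]note }

func newMemStore() *memStore { return &memStore{notes: map[key]note{}} }

// createNotesBatch inserts notes whose key is absent and leaves existing
// ones untouched, which is the behaviour $setOnInsert with upsert provides.
// Timestamps are stored exactly as the caller supplied them.
// Notes without a LegacyCRMNoteID are assumed never to reach this method.
func (s *memStore) createNotesBatch(batch []note) (upserted, matched int) {
	for _, n := range batch {
		k := key{n.CompanySsoID, n.LegacyCRMNoteID}
		if _, exists := s.notes[k]; exists {
			matched++ // already migrated: skipped, no error
			continue
		}
		s.notes[k] = n
		upserted++
	}
	return upserted, matched
}
```

Running the same batch twice against one store returns (1, 0) and then (0, 1) for a single note, matching the UpsertedCount/MatchedCount expectations in step 1.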

Acceptance criteria

  • CreateNotesBatch stores created_at/updated_at from caller — not from time.Now() (NOTES-MIG-S02/AC-4, Decision 3).
  • Re-upsert of an existing legacy_crm_note_id → UpsertedCount=0, MatchedCount=1, no E11000 error (NOTES-MIG-S03/AC-2).
  • Full re-run where all notes exist → notes_migrated=0, notes_skipped=N (NOTES-MIG-S03/AC-3).
  • Partial unique index does not affect human-written notes (no legacy_crm_note_id) — no E11000 on the second human note per company (Decision 4 correctness).
  • make migrate-up && make migrate-down applies and rolls back cleanly; data untouched.
  • NotesMigrationJobName, Redis status key + TTL, and payload structs exported and compile.

Test strategy

Go table tests in batch_create_test.go seed a test Mongo collection with and without legacy_crm_note_id, run CreateNotesBatch twice, assert UpsertedCount/MatchedCount/zero-error and caller-ts preservation. Index migration validated by running the JSON and confirming existing notes are untouched.

Effort estimate

Discipline | Days
Backend | 2.0
QA | 0.5
Total | 2.5

Assumptions: IDbRepo.BulkUpdate already exists at db.go:180-181 with BulkWrite(SetOrdered(false)); JSON index pattern mirrors 013_create_contact_notes.up.json; MongoDB schemaless — no data migration.

Run to verify

make test && make build && make migrate-up && make migrate-down

Depends on

  • None.

Task 1.2: [BE] HTML Normalizer — bluemonday deny-by-default sanitizer + CRM mention-anchor strip; no <p> re-wrap (NOTES-MIG-S02, NOTES-MIG-S06-NEG)

Every CRM note's HTML is stripped of XSS payloads and dangling CRM mention anchors before entering CDP — safe rich markup is preserved, not flattened to plain text.

Status: ⚠️ Partially blocked — InfoSec approval of the bluemonday allow-list (OQ-10) is required before AGREED. The implementation is fully buildable and reviewable now.

What to build

HtmlNormalizer with a deny-by-default bluemonday policy: structural tags only, no style, AllowStandardURLs on <a> (http/https/mailto only), RequireNoFollowOnLinks. Pre-pass strips CRM mention anchors (data-user-id, /users/{id}/edit_user hrefs) to plain @Name text before sanitization. Post-sanitize: validate ≤ 10,000 chars — return error if exceeded, never truncate.

The bluemonday policy (deny-by-default):

  • Base: bluemonday.UGCPolicy() — strips style, scripts, event handlers
  • Allow tags: a b i strong em u s span br div p ul ol li blockquote h1 h2 h3 h4 h5 h6 pre
  • On <a>: AllowStandardURLs() (http/https/mailto) + RequireNoFollowOnLinks(true); no style
  • Do not wrap output in <p>

Implementation Plan

Action | File | What changes
create | internal/pkg/util/html_normalizer.go | HtmlNormalizer.Normalize(html string) (string, error): (1) pre-pass: replace <a data-user-id …>@Name</a> and <a href="…/users/{id}/edit_user…">@Name</a> with @Name plain text; (2) bluemonday sanitize with deny-by-default policy; (3) post-sanitize length check: len > 10000 → ErrNoteTooLong, not truncated
extend | go.mod / go.sum | Add github.com/microcosm-cc/bluemonday
create | internal/pkg/util/html_normalizer_test.go | Table tests: XSS payloads neutralized (<script>, onerror=, javascript: href, style exfil); mention anchors → @Name text; safe markup preserved; no <p> wrap on bare text; post-sanitize > 10,000 chars → ErrNoteTooLong

Implementation steps

  1. Write failing tests (red): create html_normalizer_test.go with table tests: (a) <script>alert(1)</script> → empty; (b) <p onerror="x"> → attr stripped; (c) href="javascript:void(0)" → link stripped; (d) <a data-user-id="123">@Alice</a> → @Alice plain text; (e) <a href="/users/123/edit_user">@Bob</a> → @Bob plain text; (f) <strong>bold</strong> → preserved; (g) bare text not wrapped in <p>; (h) string > 10,000 chars post-sanitize → ErrNoteTooLong. Run make test, confirm red.
  2. Add dependency: go get github.com/microcosm-cc/bluemonday.
  3. Implement pre-pass: regex-replace CRM mention anchor patterns with their inner @Name text before sanitization.
  4. Implement sanitize: build the deny-by-default bluemonday policy per the spec above; run it on the pre-passed output.
  5. Implement post-sanitize length check: if len(sanitized) > 10000 → return "", ErrNoteTooLong; never truncate silently.
  6. Go green: make test.
  7. Quality gate: make lint && make sec && make build.
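
Steps 3 and 5 can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the HtmlNormalizer itself: the bluemonday sanitize step between the two is deliberately left out, the function names `stripMentions` and `checkLength` are hypothetical, and the regular expression is one possible pattern for the two anchor shapes listed in the tests.

```go
package main

import (
	"errors"
	"regexp"
)

// ErrNoteTooLong is named in the task; this definition is assumed.
var ErrNoteTooLong = errors.New("note exceeds 10000 characters after sanitization")

const maxNoteLen = 10000

// mentionAnchor matches <a> tags that carry a data-user-id attribute or an
// href containing /users/{id}/edit_user, and captures the inner text.
var mentionAnchor = regexp.MustCompile(
	`(?is)<a\b[^>]*(?:data-user-id\s*=|href\s*=\s*["'][^"']*/users/[^"'/]+/edit_user)[^>]*>(.*?)</a>`)

// stripMentions replaces CRM mention anchors with their inner text and
// leaves every other anchor for the sanitizer to judge.
func stripMentions(html string) string {
	return mentionAnchor.ReplaceAllString(html, "$1")
}

// checkLength rejects oversized output instead of truncating it.
// len counts bytes; the task does not say whether the limit is meant in
// bytes or runes, so this follows the len(...) wording used in step 5.
func checkLength(sanitized string) (string, error) {
	if len(sanitized) > maxNoteLen {
		return "", ErrNoteTooLong
	}
	return sanitized, nil
}
```

Ordinary links such as `<a href="https://example.com">site</a>` do not match the pattern and pass through unchanged.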

Gate: InfoSec must approve the allow-list (OQ-10) before AGREED. Build and review now; get sign-off before merging.

Acceptance criteria

  • <script>, onerror=, javascript: href, style exfil all neutralized — stored XSS impossible via this path (NOTES-MIG-S02/AC-2, Decision 5).
  • CRM mention anchors (data-user-id, /users/{id}/edit_user hrefs) → plain @Name text; no dangling links, no CDP mention notification (NOTES-MIG-S06-NEG/NEG-1, Decision 5).
  • Safe structural markup (<strong>, <em>, <ul>, <blockquote>, etc.) is preserved — not stripped to plain text (Decision 5 rationale).
  • Output is not wrapped in a <p> tag — this was an explicit v1.1 error (Decision 5).
  • Post-sanitize length > 10,000 chars → ErrNoteTooLong returned; note counted as failure, never truncated.
  • make sec (gosec) reports no new findings on the normalizer.

Test strategy

Go table tests in html_normalizer_test.go assert positive cases (safe markup survives) and negative cases (XSS stripped, mentions stripped, no <p> wrap, oversized → error). Tests are pure (no I/O) — fast and exhaustive.

Effort estimate

Discipline | Days
Backend | 1.5
QA | 0.5
Total | 2.0

Assumptions: bluemonday dep approved by InfoSec (OQ-10); policy is deny-by-default, not a mirror of CRM's Rails allow-list (which permits style and unscoped href) per Decision 5.

Run to verify

make test && make lint && make sec && make build

Depends on

  • None. Gate: OQ-10 InfoSec approval of the allow-list before AGREED.

Phase 2 — Pipeline Components + Assembly + API

Task 2.1: [BE] Contact resolver + Owner resolver (NOTES-MIG-S02)

Each CRM note lands on the right CDP contact and shows the right author — or a readable fallback label when the original author's account can't be resolved.

Status: ✅ Actionable.

What to build

ContactResolver: batch-resolve crm_person_id → contact_id via SearchWithFilters(bson.M{"crm_data.id":{"$in":[...]}}) against the existing crm_contact_index; string-cast CRM int IDs; apply person-first precedence for multi-FK notes. OwnerResolver: map CRM creator_id → SSO UUID via GetUserNamesBulk; on failure set OwnerID="" + populate LegacyOwnerLabel.

Implementation Plan

Action | File | What changes
create | internal/app/consumer/notes_migration_consumer.go (initial scaffold) | ContactResolver.Resolve(ctx, companySsoID string, crmPersonIDs []string) (map[string]string, []string): drives ContactRepository.SearchWithFilters(bson.M{"crm_data.id":{"$in": ids},"company_sso_id":...}); string-casts CRM int IDs; person-first precedence for multi-FK notes; unresolved IDs → CONTACT_NOT_MAPPED list
extend | internal/app/consumer/notes_migration_consumer.go | OwnerResolver.Resolve(ctx, creatorIDs []string) map[string]OwnerResult: calls GetUserNamesBulk (mirroring contact_notes_service.go:131-136); on failure sets OwnerID="" + LegacyOwnerLabel (CRM display name or "[Legacy CRM User]")
create | internal/app/consumer/notes_migration_consumer_test.go | Tests: resolved by crm_data.id matching crm_person_id string-cast; multi-FK note → person wins; no match → CONTACT_NOT_MAPPED; unmappable owner → OwnerID="" + label non-empty

Implementation steps

  1. Write failing tests (red): create notes_migration_consumer_test.go with (a) ContactResolver with a mocked contact crm_data.id="42" resolves note crm_person_id=42 (string-cast from int); (b) multi-FK note → person-first; (c) unresolvable → in the CONTACT_NOT_MAPPED list; (d) OwnerResolver with unmappable creator_id → OwnerID="", LegacyOwnerLabel non-empty. Run make test, confirm red.
  2. Implement ContactResolver: drive ContactRepository.SearchWithFilters with bson.M{"crm_data.id":{"$in": crmPersonIDs}, "company_sso_id": companySsoID} (pattern from contact/search.go:125); build a crm_person_id → contact_id map; string-cast CRM IDs before lookup (crm_data.id stored as string, base.go:343).
  3. Implement OwnerResolver: call GetUserNamesBulk (pattern from contact_notes_service.go:131-136); for each unmapped creator_id set OwnerID="" and populate LegacyOwnerLabel from CRM display name.
  4. Go green: make test.
  5. Quality gate: make lint && make build.
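
The string-cast and the resolved/unresolved split in step 2 might look roughly like the sketch below. It is an illustration only: `lookupFunc` is a hypothetical stand-in for the SearchWithFilters call, the function takes integer IDs to make the cast visible (the planned Resolve signature takes strings), and person-first precedence is not shown.

```go
package main

import "strconv"

// lookupFunc is a hypothetical stand-in for the repository query: given
// string CRM IDs it returns crm_data.id -> contact_id for matches found.
type lookupFunc func(companySsoID string, crmIDs []string) map[string]string

// resolveContacts string-casts integer CRM person IDs, looks them up in a
// single batch, and splits the input into resolved and unresolved IDs.
func resolveContacts(companySsoID string, personIDs []int, lookup lookupFunc) (map[string]string, []string) {
	ids := make([]string, 0, len(personIDs))
	for _, id := range personIDs {
		ids = append(ids, strconv.Itoa(id)) // crm_data.id is stored as a string
	}
	found := lookup(companySsoID, ids)
	resolved := make(map[string]string, len(found))
	var notMapped []string
	for _, id := range ids {
		if contactID, ok := found[id]; ok {
			resolved[id] = contactID
		} else {
			notMapped = append(notMapped, id) // reported as CONTACT_NOT_MAPPED
		}
	}
	return resolved, notMapped
}
```

Passing the lookup as a function keeps the resolver testable with a plain map, which fits the mocked-repository approach in the test strategy.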

Acceptance criteria

  • CRM crm_person_id (int, string-cast) resolves to CDP contact_id via the indexed crm_data.id field — no collection scan (NOTES-MIG-S02/AC-1, Decision 7).
  • Multi-FK note: person takes precedence over company/deal/ticket (Decision 7).
  • No CDP contact match → note ID added to CONTACT_NOT_MAPPED list; note skipped and counted as failure (NOTES-MIG-S02/ERR-1).
  • Unmappable creator_id → OwnerID="" + LegacyOwnerLabel set; note still inserted (non-blocking) (NOTES-MIG-S02/ERR-3, Decision 6).
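
The owner fallback in the last criterion can be sketched as a small pure function. The names here are assumptions: `ownerResult` stands in for the planned OwnerResult type (its fields are guessed), and `ssoByCreator` represents whatever mapping GetUserNamesBulk yields.

```go
package main

// ownerResult carries the owner fields set on a migrated note.
// The planned type is OwnerResult; the fields here are assumed.
type ownerResult struct {
	OwnerID          string
	LegacyOwnerLabel string
}

// defaultLegacyLabel is the fallback label given in the plan.
const defaultLegacyLabel = "[Legacy CRM User]"

// resolveOwner maps a CRM creator to an SSO user. ssoByCreator is a
// hypothetical pre-fetched creator_id -> SSO UUID map; crmName is the CRM
// display name and may be empty.
func resolveOwner(creatorID, crmName string, ssoByCreator map[string]string) ownerResult {
	if sso, ok := ssoByCreator[creatorID]; ok && sso != "" {
		return ownerResult{OwnerID: sso}
	}
	label := crmName
	if label == "" {
		label = defaultLegacyLabel
	}
	// Unmapped owners never block the insert: the note keeps a readable label.
	return ownerResult{OwnerID: "", LegacyOwnerLabel: label}
}
```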

Test strategy

Go unit tests with mocked ContactRepository and mocked GetUserNamesBulk; table-driven for contact resolution (match, multi-FK, no-match) and owner resolution (mapped, unmapped with label).

Effort estimate

Discipline | Days
Backend | 1.5
QA | 0.5
Total | 2.0

Assumptions: ContactRepository.SearchWithFilters already accepts bson.M filter at contact/search.go:125; crm_data.id == crm_person_id confirmed (RFC REV-1); GetUserNamesBulk already exists at contact_notes_service.go:131-136.

Run to verify

make test && make lint && make build

Depends on

  • [Task 1.1] (ContactNote struct with LegacyOwnerLabel field).

Task 2.2: [BE] Attachment processor — SSRF guard + download + type-map + re-upload (NOTES-MIG-S02)

Every CRM note's attachments — images, audios, and documents — are safely re-hosted in company-scoped CDP storage, never referencing legacy CRM S3 URLs, with SSRF protection on every outbound download.

Status: ✅ Actionable. CDP storage quota + CRM S3 access confirmation is a Stage 0 gate (OQ-9), not a build blocker.

What to build

AttachmentProcessor: for each CRM attachment URL (from crm_note_images, crm_note_audios, crm_note_attachments documents) — validate host against a CRM S3/CDN allow-list (SSRF guard), reject internal IPs and metadata endpoints, validate magic bytes vs declared type, enforce max download size, re-upload to deterministic key {company_sso_id}/{legacy_crm_note_id}/{asset} in CDP storage, return proxy URL + mapped Type. Failures are non-blocking: note inserts without the failed attachment.

Implementation Plan

Action | File | What changes
create | internal/app/consumer/attachment_processor.go | AttachmentProcessor.Process(ctx context.Context, companySsoID, legacyCRMNoteID string, attachments []CRMAttachment) ([]ContactNoteAttachment, []AttachmentError)
extend | internal/app/consumer/attachment_processor.go | SSRF allow-list of CRM S3/CDN hostnames; reject 10.x, 172.16-31.x, 192.168.x, 169.254.x, arbitrary hosts; magic-byte validation vs declared content type; max download size cap; type mapping: CRM image → image; audio → voice_note (or video for video/*); document → doc/pdf/xlsx by extension/content-type (default doc)
extend | internal/app/consumer/attachment_processor.go | Deterministic storage key: {company_sso_id}/{legacy_crm_note_id}/{asset}, idempotent on re-run (same key overwrites safely)
create | internal/app/consumer/attachment_processor_test.go | Tests: image/audio/document mapped to correct Type; URL from 169.254.169.254 → ATTACHMENT_DOWNLOAD_FAILED, other attachments continue; magic-byte mismatch → failure; file over max size → failure; re-run uploads to same deterministic key

Implementation steps

  1. Write failing tests (red): create attachment_processor_test.go with (a) image URL from allowed CRM S3 host → Type="image", proxy URL stored; (b) URL pointing to 169.254.169.254 → ATTACHMENT_DOWNLOAD_FAILED, processing continues for remaining attachments; (c) .pdf with PDF magic bytes → Type="pdf"; (d) .pdf with mismatched magic bytes → failure; (e) file exceeding max size → failure; (f) re-run → same storage key written (overwrite). Run make test, confirm red.
  2. Implement SSRF guard: parse URL host; reject if not in the CRM S3/CDN allow-list; reject internal IP ranges explicitly; return AttachmentError{Code: "ATTACHMENT_SSRF_BLOCKED"} for rejected URLs.
  3. Implement download: http.NewRequest with a context-bound timeout; read body up to maxAttachmentBytes limit; capture content-type header.
  4. Implement magic-byte check: read first N bytes; match against expected magic bytes for the declared type; mismatch → failure.
  5. Implement type mapping: map CRMAttachment.Type + file extension + content-type to one of {image, doc, pdf, video, voice_note, xlsx} (validated against contact_notes_service.go:286-293 allow-set; default doc).
  6. Implement re-upload: write to {company_sso_id}/{legacy_crm_note_id}/{filename} in CDP storage; store proxy URL + Type + FileName + FileSizeInByte.
  7. Go green: make test.
  8. Quality gate: make lint && make sec && make build.
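
A minimal sketch of the SSRF guard in step 2, using only the standard library. The allow-list entry is a placeholder (the real CRM S3/CDN hostnames are not given here), `checkAttachmentURL` and `errSSRFBlocked` are hypothetical names, and mapping the error to an AttachmentError code is left to the caller.

```go
package main

import (
	"errors"
	"net"
	"net/url"
	"strings"
)

var errSSRFBlocked = errors.New("attachment URL blocked by SSRF guard")

// allowedHosts is a placeholder allow-list; the real hostnames are assumed
// to come from configuration.
var allowedHosts = map[string]bool{"crm-assets.example.com": true}

// checkAttachmentURL rejects any URL whose scheme is not http(s), whose
// host is an internal IP literal, or whose host is not on the allow-list.
func checkAttachmentURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil || (u.Scheme != "http" && u.Scheme != "https") {
		return errSSRFBlocked
	}
	host := strings.ToLower(u.Hostname())
	if ip := net.ParseIP(host); ip != nil {
		// IsPrivate covers 10.x, 172.16-31.x and 192.168.x; link-local
		// covers 169.254.x, which includes the metadata endpoint.
		if ip.IsPrivate() || ip.IsLinkLocalUnicast() || ip.IsLoopback() || ip.IsUnspecified() {
			return errSSRFBlocked
		}
	}
	if !allowedHosts[host] {
		return errSSRFBlocked
	}
	return nil
}
```

This checks only the URL as given. Redirect targets would need the same check, since Go's default HTTP client follows redirects.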

Acceptance criteria

  • Images, audios, and crm_note_attachments documents are all processed and re-hosted in CDP storage (NOTES-MIG-S02/AC-3, Decision 8).
  • SSRF guard rejects non-allow-listed hosts, internal IPs (10.x, 192.168.x), and the cloud metadata endpoint (169.254.169.254) (Decision 8 security).
  • Magic bytes validated against declared content type before upload.
  • Download failure → note inserted without that attachment; ATTACHMENT_DOWNLOAD_FAILED logged; note not counted as failed (NOTES-MIG-S02/ERR-2).
  • Storage key {company_sso_id}/{legacy_crm_note_id}/{asset} is deterministic — safe to overwrite on re-run (§2.E).
  • Resulting Type is one of {image, doc, pdf, video, voice_note, xlsx} — no unvalidated type reaches the DB.
  • make sec reports no new gosec findings on the outbound fetch path.
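
The magic-byte and type-mapping criteria could be approached as sketched below. This is an assumption-heavy illustration: it uses the standard library's `http.DetectContentType` as the sniffer, the CRM kind values "image", "audio", and "document" are hypothetical, and `sniffMatches` and `mapType` are not names from the codebase.

```go
package main

import (
	"net/http"
	"strings"
)

// sniffMatches reports whether the leading bytes agree with the declared
// content type, comparing only the media type before any parameters.
func sniffMatches(head []byte, declared string) bool {
	base := func(ct string) string {
		if i := strings.IndexByte(ct, ';'); i >= 0 {
			ct = ct[:i]
		}
		return strings.ToLower(strings.TrimSpace(ct))
	}
	return base(http.DetectContentType(head)) == base(declared)
}

// mapType maps a CRM attachment kind, content type and file name to one
// of the CDP types; anything unrecognised falls back to "doc".
func mapType(crmKind, contentType, fileName string) string {
	ct := strings.ToLower(contentType)
	name := strings.ToLower(fileName)
	switch {
	case crmKind == "image":
		return "image"
	case crmKind == "audio" && strings.HasPrefix(ct, "video/"):
		return "video"
	case crmKind == "audio":
		return "voice_note"
	case ct == "application/pdf" || strings.HasSuffix(name, ".pdf"):
		return "pdf"
	case strings.HasSuffix(name, ".xlsx"):
		return "xlsx"
	default:
		return "doc"
	}
}
```

Exact media-type equality is too strict for some formats: Office files such as .xlsx sniff as application/zip, so a production check would likely need a per-type signature table instead.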

Test strategy

Go table tests with a mock HTTP server (simulating CRM S3) and mock CDP storage client assert type mapping, SSRF rejection, magic-byte validation, size cap enforcement, and non-blocking failure behavior.

Effort estimate

Discipline | Days
Backend | 2.0
QA | 0.5
Total | 2.5

Assumptions: CRM S3 is public-read ACL — no signing required (confirmed in RFC grounding, carrierwave-s3.rb:27,58); CDP storage client exists and accepts a key + byte payload; crm_note_attachments documents are in scope (OQ-5 resolved: yes).

Run to verify

make test && make lint && make sec && make build

Depends on

  • [Task 1.1] (ContactNoteAttachment struct from base.go).

Task 2.3: [BE] CRM extraction client — ListPersonNotes + heimdall retrier + CRMNotesExtractor interface (NOTES-MIG-S01, NOTES-MIG-S02)

The pipeline can paginate all Person notes for a CID from Legacy CRM using a properly timeout-guarded S2S client — not the no-timeout http.DefaultClient currently on QontakCrmClient.

Status: ⚠️ Partially blocked — the CRM org-scoped endpoint does not exist yet (OQ-7, Legacy CRM Squad dependency). Actionable now: build ListPersonNotes behind a CRMNotesExtractor interface with a stub; all downstream tasks (2.4) compile and test against the stub until OQ-7 resolves.

What to build

Extend QontakCrmClient (qontak_crm.go) with ListPersonNotes(ctx, cid string, page, perPage int) ([]CRMNote, error) — built on the heimdall httpclient pattern (iag_mekari.go:69-71), not http.DefaultClient (which has no timeout); 10s timeout via CRM_NOTES_EXTRACT_TIMEOUT; 3 retries with exponential backoff 1s/3s/9s on timeout + 5xx/Locked(423)/429. Define CRMNote payload struct and CRMNotesExtractor interface.

Implementation Plan

Action | File | What changes
extend | internal/app/api/qontak_crm.go | Add ListPersonNotes(ctx context.Context, cid string, page, perPage int) ([]CRMNote, error): uses httpclient.NewClient(WithHTTPTimeout(timeout)) + heimdall retrier (3×, 1s/3s/9s), existing Authorization: {CRM_API_AUTH} header, existing 5xx/Locked/429 handling pattern from :43-47; not http.DefaultClient
create | internal/app/payload/crm_note.go | CRMNote{ID, Note, CreatorID, CRMPersonID, CRMNoteTypeID, CRMNoteImages, CRMNoteAudios, CRMNoteAttachments, CreatedAt, UpdatedAt}
create | internal/app/api/crm_notes_extractor.go | CRMNotesExtractor interface (ListPersonNotes); CRMNotesExtractorStub implementation returning hardcoded fixtures for use in consumer tests (Task 2.4)
extend | config/load.go | CRM_NOTES_EXTRACT_TIMEOUT: getDurationOrPanic("CRM_NOTES_EXTRACT_TIMEOUT") with default 10s
create | internal/app/api/qontak_crm_notes_test.go | Tests: correct Authorization header; 5xx → retried 3× with backoff → CRM_EXTRACT_FAILED; 429 → retried; successful page → []CRMNote returned

Implementation steps

  1. Write failing tests (red): create qontak_crm_notes_test.go with (a) ListPersonNotes sends Authorization header from CRM_API_AUTH; (b) mock returns 500 three times → CRM_EXTRACT_FAILED after 3 attempts; (c) mock returns 429 once then 200 → success after retry; (d) mock returns paginated response → []CRMNote correctly unmarshaled. Run make test, confirm red.
  2. Define interface + stub: create CRMNotesExtractor interface in crm_notes_extractor.go; implement CRMNotesExtractorStub returning a fixture []CRMNote, used by Task 2.4 consumer tests until OQ-7 resolves.
  3. Implement ListPersonNotes: use httpclient.NewClient(WithHTTPTimeout(cfg.CRMNotesExtractTimeout)) (pattern from api/iag_mekari.go:69-71); add heimdall retrier (3 attempts, exponential 1s/3s/9s); pass existing Authorization: {CRM_API_AUTH} header; reuse 5xx/Locked/429 handling from qontak_crm.go:43-47.
  4. Add config: getDurationOrPanic("CRM_NOTES_EXTRACT_TIMEOUT") in config/load.go; default duration 10s.
  5. Go green: make test.
  6. Quality gate: make lint && make build.
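
The retry policy can be pictured with a plain standard-library loop. This is not the heimdall API, only a stand-in for the behaviour described above, and it makes two assumptions: the three delays 1s/3s/9s mean three retries after the first call (up to four calls in total), and every transport error is treated as retryable. `withRetry`, `retryable`, `crmBackoff`, and `errExtractFailed` are hypothetical names.

```go
package main

import (
	"errors"
	"net/http"
	"time"
)

var errExtractFailed = errors.New("CRM_EXTRACT_FAILED")

// crmBackoff holds the delays named in the task: 1s, 3s, 9s.
var crmBackoff = []time.Duration{1 * time.Second, 3 * time.Second, 9 * time.Second}

// retryable reports whether a response status should be retried:
// any 5xx, 423 Locked, or 429 Too Many Requests.
func retryable(status int) bool {
	return status >= 500 || status == http.StatusLocked || status == http.StatusTooManyRequests
}

// withRetry calls attempt once, then once more after each delay for as
// long as the outcome is an error or a retryable status. When the delays
// are used up it returns the last status with errExtractFailed.
func withRetry(delays []time.Duration, attempt func() (int, error)) (int, error) {
	for i := 0; ; i++ {
		status, err := attempt()
		if err == nil && !retryable(status) {
			return status, nil
		}
		if i == len(delays) {
			return status, errExtractFailed
		}
		time.Sleep(delays[i])
	}
}
```

Taking the delays as a parameter lets tests pass zero durations instead of waiting out the 13 seconds of real backoff.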

Once OQ-7 resolves: point the endpoint URL config at the CRM squad's new endpoint and tune CRM_NOTES_EXTRACT_TIMEOUT if needed. No consumer code change is required (the interface isolates it).

Acceptance criteria

  • ListPersonNotes uses heimdall httpclient with 10s timeout — not http.DefaultClient (Decision 10 timeout correction, RFC REV-2).
  • 5xx / Locked (423) / 429 → retried 3× with 1s/3s/9s exponential backoff; budget exhausted → CRM_EXTRACT_FAILED (NOTES-MIG-S01 extraction failure path).
  • Authorization: {CRM_API_AUTH} header present on every request (reuses existing pattern).
  • CRMNotesExtractor interface lets Task 2.4 compile and test against the stub — entire pipeline buildable without OQ-7.
  • (pending OQ-7) Real CRM endpoint URL wired via config — zero code change once available.

Test strategy

Go tests with a mock HTTP server assert request shape (URL, headers, pagination params), retry behavior (3 attempts on 5xx), and timeout propagation; stub tests confirm fixture is returned correctly.

Effort estimate

Discipline | Days
Backend | 1.5
QA | 0.5
Total | 2.0

Assumptions: QontakCrmClient + auth config (CRM_API_ROOT_URL/CRM_API_AUTH) already exist (qontak_crm.go:14-24, config/load.go:197-198); heimdall pattern already used in api/iag_mekari.go:69-71 and qontak_billing.go:183-185.

Run to verify

make test && make lint && make build

Depends on

  • None. External blocker for running migrations: OQ-7 (CRM squad delivers org-scoped endpoint). Stage 0 gate: OQ-9 (confirm CRM S3 still public-read; CDP storage residency).

Task 2.4: [BE] Consumer assembly — ProcessNotesMigrationJob + note-type filter + failure guard + ValidationRunner + worker registration (NOTES-MIG-S01, NOTES-MIG-S02, NOTES-MIG-S03, NOTES-MIG-S04, NOTES-MIG-S06-NEG)

The end-to-end migration pipeline runs as a background worker: extract → transform per note → batch insert → halt if failure rate > 1% → validate match_pct — every failure logged with a reason code, zero silent drops.

Status: ✅ Actionable (Tasks 1.1, 1.2, 2.1, 2.2, 2.3 must land first; CRM extractor uses stub from Task 2.3).

What to build

NotesMigrationConsumer.ProcessNotesMigrationJob(job *work.Job) — the full assembled pipeline: paginated extract (via CRMNotesExtractor) → filter out-of-scope note types → batch-resolve contacts (Task 2.1) → batch-resolve owners (Task 2.1) → normalize HTML (Task 1.2) → process attachments (Task 2.2) → CreateNotesBatch (Task 1.1) → update Redis progress → check failure rate → halt if > 1%. Post-batches: ValidationRunner.Run (count compare) → set terminal status. Register job in worker_service.go.

Implementation Plan

Action | File | What changes
extend | internal/app/consumer/notes_migration_consumer.go | ProcessNotesMigrationJob(job *work.Job) error: unmarshal job.Args["data"] (mirror activity_log_migration_consumer.go:38-47); paginated extract loop; per-batch: filter note types → resolve contacts → resolve owners → normalize HTML → process attachments → CreateNotesBatch → update Redis {progress_pct, notes_processed}
extend | internal/app/consumer/notes_migration_consumer.go | Per-batch failure-rate check: failure_rate = failure_count / total_processed; > 0.01 → set Redis halted + log crm_notes_migration_halted → return from the job (manual re-trigger required); never silently drop: every failure logged {legacy_crm_note_id, reason_code, details}
extend | internal/app/consumer/notes_migration_consumer.go | Note-type filter: check CRMNote.CRMNoteTypeID against config allow-set (default (1,6) Notes/Documents); excluded → out_of_scope_count++, not failure_count
extend | internal/app/consumer/notes_migration_consumer.go | ValidationRunner.Run(ctx, cid, companySsoID): CountWithFilters(bson.M{"company_sso_id":..., "legacy_crm_note_id":{"$exists":true}}) vs CRM source count → match_pct; ≥ 99% → completed_success; < 99% → completed_with_errors; source count unavailable → VALIDATION_SKIPPED + completed_with_errors
extend | internal/worker/worker_service.go | registerJobWithOptions(NotesMigrationJobName, opts, consumer.ProcessNotesMigrationJob, pool) (mirror :132,138)
extend | internal/app/consumer/notes_migration_consumer_test.go | Tests: end-to-end happy path (extract → resolve → normalize → insert → progress); out-of-scope type → out_of_scope_count++ not failure; failure_rate > 1% → halted; CONTACT_NOT_MAPPED → skip + count failure; full re-run → notes_migrated=0; match_pct ≥ 99% → completed_success; match_pct < 99% → completed_with_errors

Implementation steps

  1. Write failing tests (red): extend notes_migration_consumer_test.go with (a) happy path: 10 notes extracted, resolved, sanitized, inserted, notes_processed=10, progress_pct updated in Redis; (b) note with out-of-scope crm_note_type_id → out_of_scope_count++, failure_count unchanged; (c) 3 of 200 notes fail contact resolution → failure_rate=1.5% → status halted; (d) full re-run (all exist) → UpsertedCount=0, MatchedCount=N, completed_success; (e) match_pct=98% → completed_with_errors. Run make test, confirm red.
  2. Implement ProcessNotesMigrationJob: follow ActivityLogMigrationConsumer shape (activity_log_migration_consumer.go:25-50): unmarshal args → check per-CID Redis in-progress lock (409 guard lives in the service layer from Task 2.5, but the consumer also checks and exits if another job is running) → extract in pagination loop (batch perPage=500) → process per note through the pipeline.
  3. Implement note-type filter: read crm_note_type_id allow-set from config (default (1,6)); excluded notes → out_of_scope_count++ only (not failures).
  4. Implement per-batch halt check: after each CreateNotesBatch call, compute failure_rate = float64(failure_count) / float64(notes_processed); if > 0.01 → write halted to Redis, log crm_notes_migration_halted with {job_id, cid, failure_rate}, and return from the job.
  5. Implement ValidationRunner: after all batches, CountWithFilters(bson.M{"company_sso_id":..., "legacy_crm_note_id":{"$exists":true}}) (mirrors contact/search.go:147); compare to CRM source count; write terminal status to Redis.
  6. Register job: registerJobWithOptions(NotesMigrationJobName, opts, consumer.ProcessNotesMigrationJob, pool) in worker_service.go (mirror :132,138).
  7. Go green: make test.
  8. Quality gate: make lint && make build.
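
Steps 3 and 4 reduce to two small predicates, sketched here. The `counters` struct and its field names are assumptions, as is the choice to keep out-of-scope notes out of the denominator (the text only says they are not failures).

```go
package main

// counters tracks per-job tallies; the field names are assumptions.
type counters struct {
	Processed  int // in-scope notes that entered the pipeline
	Failed     int // notes skipped with a reason code
	OutOfScope int // notes excluded by the type filter, never failures
}

// inScope reports whether a CRM note type is in the configured allow-set.
func inScope(noteTypeID int, allowed map[int]bool) bool {
	return allowed[noteTypeID]
}

// shouldHalt applies the "> 1%" rule from step 4. A job that has not
// processed anything yet never halts, which also avoids dividing by zero.
func shouldHalt(c counters) bool {
	if c.Processed == 0 {
		return false
	}
	return float64(c.Failed)/float64(c.Processed) > 0.01
}
```

With the numbers from test (c), 3 failures in 200 notes gives 0.015, so the job halts; 2 in 200 is exactly 1% and does not, because the rule is strictly greater than.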

Acceptance criteria

  • End-to-end: extract → transform → CreateNotesBatch → Redis progress updated per batch (NOTES-MIG-S01/AC-3).
  • Out-of-scope crm_note_type_id → out_of_scope_count++; not counted as a failure (NOTES-MIG-S06-NEG/NEG-2, Decision 11).
  • failure_rate > 1% within a batch → Redis status halted, crm_notes_migration_halted logged (NOTES-MIG-S01/ERR-3).
  • Zero silent failures — every failed note logged with {legacy_crm_note_id, reason_code, details} (§1 Success Criteria).
  • Full re-run → notes_migrated=0, notes_skipped=N, completed_success (NOTES-MIG-S03/AC-3).
  • match_pct ≥ 99% → completed_success; < 99% → completed_with_errors + alert (NOTES-MIG-S04/AC-2, ERR-1).
  • NotesMigrationJobName registered in worker_service.go; make build produces a working worker binary.
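
The terminal-status decision made by ValidationRunner can be sketched as a pure function. `terminalStatus` is a hypothetical name, and treating a source count of zero as success is an assumption the text does not cover. The threshold is compared with integers so that a value sitting exactly on 99% is not affected by floating-point rounding.

```go
package main

// terminalStatus compares the CDP count of migrated notes with the CRM
// source count. sourceKnown is false when the CRM count is unavailable,
// in which case validation is skipped (VALIDATION_SKIPPED).
func terminalStatus(cdpCount, sourceCount int64, sourceKnown bool) (status string, matchPct float64, validationSkipped bool) {
	if !sourceKnown {
		return "completed_with_errors", 0, true
	}
	if sourceCount == 0 {
		// Assumption: nothing to migrate counts as a full match.
		return "completed_success", 100, false
	}
	matchPct = float64(cdpCount) / float64(sourceCount) * 100
	// Integer form of cdpCount/sourceCount >= 0.99.
	if cdpCount*100 >= sourceCount*99 {
		return "completed_success", matchPct, false
	}
	return "completed_with_errors", matchPct, false
}
```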

Test strategy

Go integration-style tests in notes_migration_consumer_test.go with CRMNotesExtractorStub (Task 2.3), mocked ContactRepository, mocked HtmlNormalizer, mocked AttachmentProcessor, and a test Mongo for CreateNotesBatch. Assertions cover the full pipeline, halt trigger, out-of-scope filter, and both ValidationRunner terminal states.

Effort estimate

Discipline | Days
Backend | 2.5
QA | 1.0
Total | 3.5

Assumptions: gocraft/work job shape mirrors ActivityLogMigrationConsumer.ProcessUpdateUserIDJob (activity_log_migration_consumer.go:25-50); per-CID in-progress lock mirrors the activity-log Redis status key; batch size 500 (max 1000) from config.

Run to verify

make test && make lint && make build

Depends on

  • [Task 1.1] (CreateNotesBatch, NotesMigrationJobName, Redis status), [Task 1.2] (HtmlNormalizer), [Task 2.1] (ContactResolver, OwnerResolver), [Task 2.2] (AttachmentProcessor), [Task 2.3] (CRMNotesExtractor interface + stub).

Task 2.5: [BE] Migration service + handler + routes — trigger, status endpoint, flag guard, full error catalog (NOTES-MIG-S01, NOTES-MIG-S04)

An Ops engineer can trigger a migration job for any CID via S2S (POST /private/notes/migrate) and poll its status (GET /private/notes/migration/status) — with all guards: flag check, idempotency, concurrent-job prevention, and the full error response catalog.

Status: ✅ Actionable.

What to build

NotesMigrationService.ValidateAndEnqueue (flag gate → duplicate check → per-CID in-progress lock → JobEnqueuer.EnqueueJob → Redis in_progress) + GetMigrationStatus (Redis read). NotesMigrationHandler with POST /private/notes/migrate and GET /private/notes/migration/status. Register both routes under the existing /private BasicAuth group in rest_router.go.

Implementation Plan

Action | File | What changes
extend | internal/app/service/notes_migration_service.go | ValidateAndEnqueue(ctx, req NotesMigrationRequest) (NotesMigrationResponse, error): check flag → 403 FLAG_DISABLED; check Redis for completed_success → 409 ALREADY_MIGRATED; acquire per-CID in-progress lock → 409 JOB_ALREADY_RUNNING; JobEnqueuer.EnqueueJob(NotesMigrationJobName, work.Q{"data": req}) (pattern job_enqueuer.go:65-67); write Redis in_progress; return {job_id}
extend | internal/app/service/notes_migration_service.go | GetMigrationStatus(ctx, cid string) (MigrationStatus, error): read Redis key notes_migration:{cid}; absent → {status: "not_started"}
create | internal/app/handler/notes_migration_handler.go | NotesMigrationHandler{Migrate(w,r), GetStatus(w,r)}: mirrors activity_log_migration_handler.go:32-91; uses myhttp.NewJSONResponse/ErrBadRequest; error response shape {"error":"CODE","message":"...","details":{}}
extend | internal/server/rest_router.go | Register private.Post("/notes/migrate", handler.Migrate) + private.Get("/notes/migration/status", handler.GetStatus) under the /private group guarded by mymiddleware.BasicAuth (:70); add after existing /private routes
create | internal/app/handler/notes_migration_handler_test.go | Tests: valid BasicAuth + flag ON → 200 {job_id}; flag OFF → 403 FLAG_DISABLED; completed → 409 ALREADY_MIGRATED; in-progress lock → 409 JOB_ALREADY_RUNNING; CID not found → 404 CID_NOT_FOUND; missing BasicAuth → 401; GET returns progress fields; unknown CID → {status:"not_started"}

Implementation steps

  1. Write failing tests (red) — Create notes_migration_handler_test.go: (a) valid POST → 200 {job_id, status:"in_progress"}; (b) flag OFF → 403 FLAG_DISABLED; (c) already completed_success in Redis → 409 ALREADY_MIGRATED; (d) in-progress lock held → 409 JOB_ALREADY_RUNNING; (e) missing/invalid BasicAuth → 401/403; (f) GET with known CID → {status, progress_pct, notes_processed, notes_total, failure_rate, match_pct}; (g) GET with unknown CID → {status:"not_started"}. Run make test, confirm red.
  2. Implement ValidateAndEnqueue — Sequential: check crm_notes_migration_enabled flag for CID (403) → read Redis for completed_success (409) → try-acquire in-progress lock (409 if held) → JobEnqueuer.EnqueueJob(NotesMigrationJobName, work.Q{"data": req}) → set Redis in_progress → return {job_id: result.JobID}.
  3. Implement GetMigrationStatus — Read Redis key notes_migration:{cid}; absent → MigrationStatus{Status: "not_started"}.
  4. Implement handler: POST decodes body → calls ValidateAndEnqueue; GET reads ?cid= query param → calls GetMigrationStatus. Use myhttp.NewJSONResponse for success, typed error codes from §3.B for failures.
  5. Register routes — In rest_router.go, inside the private group block (:69-79): private.Post("/notes/migrate", handler.Migrate) + private.Get("/notes/migration/status", handler.GetStatus).
  6. Go green: make test.
  7. Quality gate: make lint && make build.

Acceptance criteria

  • POST /private/notes/migrate with valid BasicAuth + flag ON → 200 {job_id, status:"in_progress"} (NOTES-MIG-S01/AC-1).
  • Flag OFF → 403 FLAG_DISABLED; no job enqueued (NOTES-MIG-S01/ERR-1).
  • CID already completed_success → 409 ALREADY_MIGRATED (NOTES-MIG-S01/ERR-2).
  • Per-CID in-progress lock held → 409 JOB_ALREADY_RUNNING (NOTES-MIG-S03/ERR-1).
  • Non-BasicAuth call (no Authorization header) → 401/403; IAG JWT not accepted (NOTES-MIG-S01/ERR-4).
  • GET /private/notes/migration/status?cid=<cid> → {status, progress_pct, notes_processed, notes_total, failure_rate, match_pct} (NOTES-MIG-S01/AC-2).
  • Both routes are under the /private BasicAuth group — verified in rest_router.go and by handler test.

Test strategy

Go handler tests with mocked NotesMigrationService assert the full error response catalog (HTTP status + error code strings) and the happy-path response shape. Manual: curl -u user:pass -X POST localhost:.../private/notes/migrate -d '{"cid":"..."}' verifies route registration against a local server.

Effort estimate

Discipline | Days
Backend | 2.0
QA | 1.0
Total | 3.0

Assumptions: JobEnqueuer.EnqueueJob already exists at job_enqueuer.go:38-67; mymiddleware.BasicAuth already guards /private at rest_router.go:70; error response shape follows existing myhttp conventions in the codebase.

Run to verify

make test && make lint && make build

Depends on

  • [Task 1.1] (NotesMigrationJobName, Redis status key + TTL, NotesMigrationRequest/Response payload), [Task 2.4] (consumer registered in worker_service.go).

Task 2.6: [BE] Render-path legacy_owner_label fallback in contact_notes_service.go (NOTES-MIG-S05)

A migrated note whose original author couldn't be mapped to an SSO user still shows a readable author name in the CDP Notes UI — not a blank field.

Status: ✅ Actionable. This is the only read-path change in the entire RFC — a single conditional branch.

What to build

In contact_notes_service.go:131-136, after GetUserNamesBulk resolves owner names, add a fallback: if a note's resolved owner_name is empty and note.LegacyOwnerLabel is non-empty, use LegacyOwnerLabel as the display name. The live-permission logic (contact_notes_handler.go:143-166) is untouched — edit/delete may remain hidden for label-only notes, which is accepted for historical notes.

Implementation Plan

Action | File | What changes
extend | internal/app/service/contact_notes/contact_notes_service.go:131-136 | After GetUserNamesBulk, for any note where resolved name is "" and note.LegacyOwnerLabel != "", set display name = note.LegacyOwnerLabel
extend | internal/app/service/contact_notes/contact_notes_service_test.go | Tests: note with OwnerID="" + LegacyOwnerLabel="Former Agent" → resolved author is "Former Agent"; note with valid OwnerID → live SSO name (existing behavior unchanged)

Implementation steps

  1. Write failing tests (red) — Add two test cases in contact_notes_service_test.go: (a) note with OwnerID="", LegacyOwnerLabel="Former Agent" → resolved author in response is "Former Agent"; (b) note with valid OwnerID="abc-uuid", LegacyOwnerLabel="" → resolved author is the live SSO name from GetUserNamesBulk (existing path). Run make test, confirm red.
  2. Implement fallback — In the GetUserNamesBulk resolution block at :131-136, add: if resolvedName == "" && note.LegacyOwnerLabel != "" { resolvedName = note.LegacyOwnerLabel }.
  3. Go green: make test.
  4. Quality gate: make lint && make build.
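
The fallback branch from step 2 can be sketched in isolation as below. The note struct carries only the two fields involved, and modelling the GetUserNamesBulk result as a map from owner ID to name is an assumption; the real return type may differ.

```go
package main

import "fmt"

// note carries only the fields this sketch needs; the real ContactNote has more.
type note struct {
	OwnerID          string
	LegacyOwnerLabel string
}

// displayName prefers the live resolved name and falls back to the legacy
// label only when no name was resolved. names is a hypothetical stand-in
// for the GetUserNamesBulk result.
func displayName(n note, names map[string]string) string {
	resolvedName := names[n.OwnerID]
	if resolvedName == "" && n.LegacyOwnerLabel != "" {
		resolvedName = n.LegacyOwnerLabel
	}
	return resolvedName
}

func main() {
	names := map[string]string{"abc-uuid": "Live SSO Name"}
	fmt.Println(displayName(note{LegacyOwnerLabel: "Former Agent"}, names))
	fmt.Println(displayName(note{OwnerID: "abc-uuid"}, names))
}
```

Because the live name is checked first, a note that has both a resolvable owner and a legacy label keeps showing the live name, which matches the "existing behavior unchanged" requirement.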

Acceptance criteria

  • Note with owner_id=null + legacy_owner_label="Former Agent" → renders "Former Agent" as author, not blank (NOTES-MIG-S05/AC-3, Decision 6).
  • Note with valid owner_id → live SSO name shown; existing behavior is unchanged.
  • contact_notes_handler.go:143-166 (edit/delete permission) is not modified — permission computation stays as-is.

Test strategy

Go unit tests on contact_notes_service.go with mocked GetUserNamesBulk response assert the fallback branch independently from the existing live-name branch.

Effort estimate

Discipline | Days
Backend | 0.5
QA | 0
Total | 0.5

Assumptions: ContactNote.LegacyOwnerLabel added in Task 1.1; GetUserNamesBulk call at :131-136 already iterates notes and resolves names — this is a one-line fallback after that loop.

Run to verify

make test && make lint && make build

Depends on

  • [Task 1.1] (LegacyOwnerLabel field on ContactNote).

Ordering rationale

  • Start with Task 1.1 (data model + constants + index) — every other task depends on the ContactNote struct extensions, NotesMigrationJobName, and CreateNotesBatch. It has no dependencies and can land on day 1.
  • Task 1.2 (HTML Normalizer) is fully independent and can run in parallel with Tasks 2.1, 2.2, and 2.3 — fan all four out simultaneously once Task 1.1 lands. Task 2.4 (consumer assembly) consumes all of them and is the integration spine that must wait.
  • Tasks 2.1, 2.2, 2.3 are independently parallelizable (resolvers, attachment processor, and CRM client each have clean interfaces) — assign to separate developers; all three feed Task 2.4.
  • Task 2.4 (consumer assembly) is the critical path — it depends on all Phase 1 tasks and Tasks 2.1–2.3. Prioritize landing predecessors quickly; the per-CID lock in 2.4 and the ValidationRunner are the last pieces before Stage 1 is runnable.
  • Task 2.5 (handler + routes) can build in parallel with 2.4 — it only needs the service/enqueue scaffold from Task 1.1 (job name, payload, Redis key), not the consumer internals. Merge before Stage 1.
  • Task 2.6 (render-path fallback) is the smallest task and can ship any time after Task 1.1 — it is independent of the migration pipeline and can land even before Stage 1 to ensure the UI is ready for migrated notes.
  • Push externally on OQ-7 (CRM extraction endpoint) — the only hard external blocker for running migrations. The stub interface in Task 2.3 keeps the entire pipeline buildable and testable today. Also push on OQ-10 (InfoSec bluemonday approval) to unblock AGREED on Task 1.2, and OQ-9 (CDP storage quota + CRM S3 access confirmation) at Stage 0.

Skipped stories

Story / Task | Reason
NOTES-MIG-S05 (View migrated notes, full story) | No new BE or FE code: migrated notes render via the existing CDP Notes UI and the existing GET /iag/v1/contacts/{id}/notes endpoint, both unchanged. The only new piece is the owner-label fallback, covered in Task 2.6. S06 "Legacy" banner/tag has no FE infrastructure (CustomerNote has no metadata field; D-9) and is explicitly out of scope.
POST /cdp/notes/migrate (PRD literal batch endpoint) | Superseded by the in-process gocraft/work write + /private/notes/migrate trigger (Tasks 2.4 + 2.5) per Decision 1. The /cdp namespace does not exist in rest_router.go.
CRM org-scoped extraction endpoint | Owned by the Legacy CRM Squad (OQ-7). Task 2.3 stubs the CRMNotesExtractor interface; no consumer code is blocked.
Mobile (all stories) | No mobile work in this RFC (backend migration pipeline only); migrated notes surface on mobile via the existing notes read path, unchanged.
crm_checkin geolocation | Deliberately dropped (Decision 9, D-10). CDP has no geo field; CRM address is Lockbox-encrypted. Each note with a check-in logs a marker for auditability; no migration needed.