Task Breakdown — RFC: Legacy Migration — CRM Contact Notes → CDP Notes
Mode: Horizontal (Phase 1: Foundation → Phase 2: Pipeline + API) · Scope: BE-only (contact-service, Go/MongoDB) · No FE work — migrated notes render via the existing CDP Notes UI · Blocked tasks shown inline (full picture).
All 10 RFC execution chunks map to 8 tasks below. The blocking cross-squad dependency (CRM org-scoped extraction endpoint, OQ-7) only blocks running migrations, not building the pipeline: the extractor is behind a CRMNotesExtractor interface and can be stubbed throughout. Every task is buildable today.
Effort Summary
| Phase / Area | BE days | QA days | Total |
|---|---|---|---|
| Phase 1 — Foundation (data model + schema + HTML normalizer) | 3.5 | 1.0 | 4.5 |
| Phase 2 — Pipeline components + assembly + API | 10.0 | 3.5 | 13.5 |
| Grand total | 13.5 | 4.5 | 18.0 |
Confidence: medium. Key assumptions: (1) MongoDB is schemaless, so adding legacy_crm_note_id/legacy_owner_label needs no data migration, only a partial unique index; (2) the bluemonday dep requires InfoSec approval (OQ-10) before AGREED but is buildable now; (3) the CRM extraction client (Task 2.3) is built against a stub contract; the CRM squad's org-scoped endpoint (OQ-7) is the only external blocker for running migrations, not for building; (4) CDP attachment storage quota + CRM S3 access are confirmed at Stage 0 (OQ-9), not a build blocker.
Phase 1 — Foundation
Task 1.1: [BE] Data model + job constants + CreateNotesBatch + partial unique index (NOTES-MIG-S03)
An Ops engineer can safely re-trigger a migration at any time — notes already in CDP are skipped without error, zero duplicates are inserted, and the job constants + Redis status scaffold are in place for all downstream tasks.
Status: ✅ Actionable.
What to build
Add legacy_crm_note_id (idempotency key) and legacy_owner_label (unmapped-owner fallback) to the ContactNote struct; create CreateNotesBatch — a timestamp-preserving, skip-on-conflict upsert method over IDbRepo.BulkUpdate; write the partial unique index migration; scaffold the NotesMigrationJobName const, Redis status key, and payload structs that all subsequent tasks consume.
Implementation Plan
| Action | File | What changes |
|---|---|---|
| extend | internal/app/repository/contact_notes/base.go | Add LegacyCRMNoteID string (bson:"legacy_crm_note_id,omitempty") and LegacyOwnerLabel string (bson:"legacy_owner_label,omitempty") to ContactNote; do not touch SetDefaults() |
| create | internal/app/repository/contact_notes/batch_create.go | CreateNotesBatch(ctx, []ContactNote) — uses IDbRepo.BulkUpdate (db.go:180-181) with UpdateOneModel + $setOnInsert + SetUpsert(true), filter {company_sso_id, legacy_crm_note_id}; bypasses SetDefaults(), sets caller CreatedAt/UpdatedAt explicitly; sets IsDeleted=false/Attachments=[] for defaults SetDefaults() would otherwise provide; returns UpsertedCount (migrated) + MatchedCount (skipped) |
| create | db/migrations/NNN_index_contact_notes_legacy_crm_note_id.up.json | createIndexes on contact_notes: {company_sso_id:1, legacy_crm_note_id:1}, "unique":true, "partialFilterExpression":{"legacy_crm_note_id":{"$exists":true}}; index name uq_contact_notes_company_legacy_crm_note_id |
| create | db/migrations/NNN_index_contact_notes_legacy_crm_note_id.down.json | dropIndexes for uq_contact_notes_company_legacy_crm_note_id |
| create | internal/app/service/notes_migration_service.go | NotesMigrationJobName const; Redis status key notes_migration:{cid} + TTL 7d (mirror activity_log_migration_service.go:22-31); MigrationStatus struct {Status, ProgressPct, NotesProcessed, NotesTotal, FailureRate, MatchPct} |
| create | internal/app/payload/notes_migration.go | NotesMigrationRequest{CID, CompanySsoID}, NotesMigrationResponse{JobID, Status} |
| create | internal/app/repository/contact_notes/batch_create_test.go | Tests: new note inserted with caller ts (UpsertedCount=1); same note re-upserted → UpsertedCount=0, MatchedCount=1, no E11000; full re-run → UpsertedCount=0; human-written note (no legacy_crm_note_id) unaffected |
Implementation steps
- Write failing tests (red). Create batch_create_test.go: (a) new note with LegacyCRMNoteID="crm-1" → UpsertedCount=1, CreatedAt equals caller value (not time.Now()); (b) same note re-upserted → UpsertedCount=0, MatchedCount=1, no error; (c) human-written note without legacy_crm_note_id is unaffected. Run make test, confirm red.
- Extend the struct. Add LegacyCRMNoteID and LegacyOwnerLabel to ContactNote in internal/app/repository/contact_notes/base.go. Do not modify SetDefaults(); the single-CRUD create path must remain untouched (create.go:12).
- Implement CreateNotesBatch. Build over IDbRepo.BulkUpdate: for each ContactNote, create mongo.NewUpdateOneModel().SetFilter(bson.M{"company_sso_id":..., "legacy_crm_note_id":...}).SetUpdate(bson.M{"$setOnInsert": note}).SetUpsert(true); set IsDeleted=false and Attachments=[] explicitly; never call SetDefaults().
- Write index migration. .up.json with partialFilterExpression:{"legacy_crm_note_id":{"$exists":true}}, following the pattern of 013_create_contact_notes.up.json; .down.json drops by name. Run make migrate-up && make migrate-down, confirm index applies and rolls back without touching existing notes.
- Scaffold service constants. Create notes_migration_service.go with NotesMigrationJobName, Redis status key + TTL; create notes_migration.go payload structs.
- Go green. make test.
- Quality gate. make build && make migrate-up && make migrate-down.
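
The skip-on-conflict behaviour that CreateNotesBatch is meant to provide can be modelled in memory. The sketch below is not the MongoDB implementation: Note, MemStore and noteKey are hypothetical stand-ins, and the map plays the role of the {company_sso_id, legacy_crm_note_id} unique key. It only illustrates why an insert-if-absent write yields UpsertedCount/MatchedCount style counters and leaves caller timestamps alone.

```go
package main

import "time"

// Note is a trimmed, hypothetical stand-in for ContactNote.
type Note struct {
	CompanySsoID    string
	LegacyCRMNoteID string
	Note            string
	CreatedAt       time.Time
	UpdatedAt       time.Time
}

// noteKey mirrors the compound key used by the partial unique index.
type noteKey struct{ companySsoID, legacyCRMNoteID string }

// MemStore is an in-memory model of the contact_notes collection.
type MemStore struct{ notes map[noteKey]Note }

func NewMemStore() *MemStore { return &MemStore{notes: map[noteKey]Note{}} }

// CreateNotesBatch inserts notes whose key is absent and leaves existing
// documents untouched, which is the effect of an upsert whose update only
// contains $setOnInsert. Timestamps are stored exactly as the caller set
// them; time.Now() is never consulted.
func (s *MemStore) CreateNotesBatch(batch []Note) (upserted, matched int) {
	for _, n := range batch {
		k := noteKey{n.CompanySsoID, n.LegacyCRMNoteID}
		if _, exists := s.notes[k]; exists {
			matched++ // already migrated: skipped, not an error
			continue
		}
		s.notes[k] = n
		upserted++
	}
	return upserted, matched
}
```

Running the same batch twice against this model gives (N, 0) on the first call and (0, N) on the second, which is the shape the re-run acceptance criteria describe.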
Acceptance criteria
- CreateNotesBatch stores created_at/updated_at from caller, not from time.Now() (NOTES-MIG-S02/AC-4, Decision 3).
- Re-upsert of an existing legacy_crm_note_id → UpsertedCount=0, MatchedCount=1, no E11000 error (NOTES-MIG-S03/AC-2).
- Full re-run where all notes exist → notes_migrated=0, notes_skipped=N (NOTES-MIG-S03/AC-3).
- Partial unique index does not affect human-written notes (no legacy_crm_note_id): no E11000 on the second human note per company (Decision 4 correctness).
- make migrate-up && make migrate-down applies and rolls back cleanly; data untouched.
- NotesMigrationJobName, Redis status key + TTL, and payload structs exported and compile.
Test strategy
Go table tests in batch_create_test.go seed a test Mongo collection with and without legacy_crm_note_id, run CreateNotesBatch twice, assert UpsertedCount/MatchedCount/zero-error and caller-ts preservation. Index migration validated by running the JSON and confirming existing notes are untouched.
Effort estimate
| Discipline | Days |
|---|---|
| Backend | 2.0 |
| QA | 0.5 |
| Total | 2.5 |
Assumptions: IDbRepo.BulkUpdate already exists at db.go:180-181 with BulkWrite(SetOrdered(false)); JSON index pattern mirrors 013_create_contact_notes.up.json; MongoDB is schemaless, so no data migration.
Run to verify
make test && make build && make migrate-up && make migrate-down
Depends on
- None.
Task 1.2: [BE] HTML Normalizer — bluemonday deny-by-default sanitizer + CRM mention-anchor strip; no <p> re-wrap (NOTES-MIG-S02, NOTES-MIG-S06-NEG)
Every CRM note's HTML is stripped of XSS payloads and dangling CRM mention anchors before entering CDP — safe rich markup is preserved, not flattened to plain text.
Status: ⚠️ Partially blocked — InfoSec approval of the bluemonday allow-list (OQ-10) is required before AGREED. The implementation is fully buildable and reviewable now.
What to build
HtmlNormalizer with a deny-by-default bluemonday policy: structural tags only, no style, AllowStandardURLs on <a> (http/https/mailto only), RequireNoFollowOnLinks. Pre-pass strips CRM mention anchors (data-user-id, /users/{id}/edit_user hrefs) to plain @Name text before sanitization. Post-sanitize: validate ≤ 10,000 chars — return error if exceeded, never truncate.
The bluemonday policy (deny-by-default):
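- Base: bluemonday.UGCPolicy() (strips style, scripts, event handlers). Note that UGCPolicy() also allows elements beyond the list below, such as images and tables, so enforcing the list strictly means starting from bluemonday.NewPolicy() instead.
- Allow tags: a b i strong em u s span br div p ul ol li blockquote h1 h2 h3 h4 h5 h6 pre
- On <a>: AllowStandardURLs() (http/https/mailto) + RequireNoFollowOnLinks(true); no style
- Do not wrap output in <p>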
Implementation Plan
| Action | File | What changes |
|---|---|---|
| create | internal/pkg/util/html_normalizer.go | HtmlNormalizer.Normalize(html string) (string, error): (1) pre-pass — replace <a data-user-id …>@Name</a> and <a href="…/users/{id}/edit_user…">@Name</a> with @Name plain text; (2) bluemonday sanitize with deny-by-default policy; (3) post-sanitize length check — len > 10000 → ErrNoteTooLong, not truncated |
| extend | go.mod / go.sum | Add github.com/microcosm-cc/bluemonday |
| create | internal/pkg/util/html_normalizer_test.go | Table tests: XSS payloads neutralized (<script>, onerror=, javascript: href, style exfil); mention anchors → @Name text; safe markup preserved; no <p> wrap on bare text; post-sanitize > 10,000 chars → ErrNoteTooLong |
Implementation steps
- Write failing tests (red). Create html_normalizer_test.go with table tests: (a) <script>alert(1)</script> → empty; (b) <p onerror="x"> → attr stripped; (c) href="javascript:void(0)" → link stripped; (d) <a data-user-id="123">@Alice</a> → @Alice plain text; (e) <a href="/users/123/edit_user">@Bob</a> → @Bob plain text; (f) <strong>bold</strong> → preserved; (g) bare text not wrapped in <p>; (h) string > 10,000 chars post-sanitize → ErrNoteTooLong. Run make test, confirm red.
- Add dependency. go get github.com/microcosm-cc/bluemonday.
- Implement pre-pass. Regex-replace CRM mention anchor patterns to their inner @Name text before sanitization.
- Implement sanitize. Build deny-by-default bluemonday policy per spec above; run on pre-passed output.
- Implement post-sanitize length check. If len(sanitized) > 10000 → return "", ErrNoteTooLong; never truncate silently.
- Go green. make test.
- Quality gate. make lint && make sec && make build.
Gate: InfoSec must approve the allow-list (OQ-10) before AGREED. Build and review now; get sign-off before merging.
Acceptance criteria
- <script>, onerror=, javascript: href, style exfil all neutralized; stored XSS impossible via this path (NOTES-MIG-S02/AC-2, Decision 5).
- CRM mention anchors (data-user-id, /users/{id}/edit_user hrefs) → plain @Name text; no dangling links, no CDP mention notification (NOTES-MIG-S06-NEG/NEG-1, Decision 5).
- Safe structural markup (<strong>, <em>, <ul>, <blockquote>, etc.) is preserved, not stripped to plain text (Decision 5 rationale).
- Output is not wrapped in a <p> tag; this was an explicit v1.1 error (Decision 5).
- Post-sanitize length > 10,000 chars → ErrNoteTooLong returned; note counted as failure, never truncated.
- make sec (gosec) reports no new findings on the normalizer.
Test strategy
Go table tests in html_normalizer_test.go assert positive cases (safe markup survives) and negative cases (XSS stripped, mentions stripped, no <p> wrap, oversized → error). Tests are pure (no I/O) — fast and exhaustive.
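
The pre-pass and the post-sanitize length check are the two normalizer stages that need nothing beyond the standard library, so they can be sketched in isolation. This is a minimal sketch under assumptions: the function names and the exact regular expressions are illustrative, the bluemonday sanitize step that would run between the two functions is omitted, and len counts bytes (the plan does not say whether the 10,000 limit is in bytes or runes).

```go
package main

import (
	"errors"
	"regexp"
)

// ErrNoteTooLong is returned instead of truncating an oversized note.
var ErrNoteTooLong = errors.New("note exceeds 10000 characters after sanitization")

const maxNoteLen = 10000

// Hypothetical patterns for the two CRM mention-anchor shapes named in the plan.
var (
	mentionByAttr = regexp.MustCompile(`(?is)<a\b[^>]*\sdata-user-id\s*=[^>]*>(.*?)</a>`)
	mentionByHref = regexp.MustCompile(`(?is)<a\b[^>]*\shref\s*=\s*["'][^"'>]*/users/\d+/edit_user[^"'>]*["'][^>]*>(.*?)</a>`)
)

// StripMentionAnchors replaces CRM mention anchors with their inner text,
// so "<a data-user-id=...>@Alice</a>" becomes "@Alice". Other links are kept.
func StripMentionAnchors(html string) string {
	html = mentionByAttr.ReplaceAllString(html, "${1}")
	return mentionByHref.ReplaceAllString(html, "${1}")
}

// CheckLength rejects oversized output; it never truncates.
// Note: len counts bytes, which differs from characters for non-ASCII text.
func CheckLength(sanitized string) (string, error) {
	if len(sanitized) > maxNoteLen {
		return "", ErrNoteTooLong
	}
	return sanitized, nil
}
```

In a full Normalize, the sanitizer would run on the output of StripMentionAnchors, and CheckLength would run on the sanitizer's output.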
Effort estimate
| Discipline | Days |
|---|---|
| Backend | 1.5 |
| QA | 0.5 |
| Total | 2.0 |
Assumptions: bluemonday dep approved by InfoSec (OQ-10); policy is deny-by-default, not a mirror of CRM's Rails allow-list (which permits style and unscoped href) per Decision 5.
Run to verify
make test && make lint && make sec && make build
Depends on
- None. Gate: OQ-10 InfoSec approval of the allow-list before AGREED.
Phase 2 — Pipeline Components + Assembly + API
Task 2.1: [BE] Contact resolver + Owner resolver (NOTES-MIG-S02)
Each CRM note lands on the right CDP contact and shows the right author — or a readable fallback label when the original author's account can't be resolved.
Status: ✅ Actionable.
What to build
ContactResolver: batch-resolve crm_person_id → contact_id via SearchWithFilters(bson.M{"crm_data.id":{"$in":[...]}}) against the existing crm_contact_index; string-cast CRM int IDs; apply person-first precedence for multi-FK notes. OwnerResolver: map CRM creator_id → SSO UUID via GetUserNamesBulk; on failure set OwnerID="" + populate LegacyOwnerLabel.
Implementation Plan
| Action | File | What changes |
|---|---|---|
| create | internal/app/consumer/notes_migration_consumer.go (initial scaffold) | ContactResolver.Resolve(ctx, companySsoID string, crmPersonIDs []string) (map[string]string, []string) — drives ContactRepository.SearchWithFilters(bson.M{"crm_data.id":{"$in": ids},"company_sso_id":...}); string-casts CRM int IDs; person-first precedence for multi-FK notes; unresolved IDs → CONTACT_NOT_MAPPED list |
| extend | internal/app/consumer/notes_migration_consumer.go | OwnerResolver.Resolve(ctx, creatorIDs []string) map[string]OwnerResult — calls GetUserNamesBulk (mirroring contact_notes_service.go:131-136); on failure sets OwnerID="" + LegacyOwnerLabel (CRM display name or "[Legacy CRM User]") |
| create | internal/app/consumer/notes_migration_consumer_test.go | Tests: resolved by crm_data.id matching crm_person_id string-cast; multi-FK note → person wins; no match → CONTACT_NOT_MAPPED; unmappable owner → OwnerID="" + label non-empty |
Implementation steps
- Write failing tests (red). Create notes_migration_consumer_test.go: (a) ContactResolver with a mocked contact crm_data.id="42" resolves note crm_person_id=42 (string-cast from int); (b) multi-FK note → person-first; (c) unresolvable → in the CONTACT_NOT_MAPPED list; (d) OwnerResolver with unmappable creator_id → OwnerID="", LegacyOwnerLabel non-empty. Run make test, confirm red.
- Implement ContactResolver. Drive ContactRepository.SearchWithFilters with bson.M{"crm_data.id":{"$in": crmPersonIDs}, "company_sso_id": companySsoID} (pattern from contact/search.go:125); build a crm_person_id → contact_id map; string-cast CRM IDs before lookup (crm_data.id stored as string, base.go:343).
- Implement OwnerResolver. Call GetUserNamesBulk (pattern from contact_notes_service.go:131-136); for each unmapped creator_id set OwnerID="" and populate LegacyOwnerLabel from CRM display name.
- Go green. make test.
- Quality gate. make lint && make build.
Acceptance criteria
- CRM crm_person_id (int, string-cast) resolves to CDP contact_id via the indexed crm_data.id field, with no collection scan (NOTES-MIG-S02/AC-1, Decision 7).
- Multi-FK note: person takes precedence over company/deal/ticket (Decision 7).
- No CDP contact match → note ID added to CONTACT_NOT_MAPPED list; note skipped and counted as failure (NOTES-MIG-S02/ERR-1).
- Unmappable creator_id → OwnerID="" + LegacyOwnerLabel set; note still inserted (non-blocking) (NOTES-MIG-S02/ERR-3, Decision 6).
Test strategy
Go unit tests with mocked ContactRepository and mocked GetUserNamesBulk; table-driven for contact resolution (match, multi-FK, no-match) and owner resolution (mapped, unmapped with label).
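
The resolution rules can be sketched without the repository layer. In the sketch below, CRMNoteRef is a hypothetical view of a note's foreign keys, and the index map stands for the result of the single $in lookup on crm_data.id; neither type exists in the codebase as far as this plan shows. Treating a note with no person FK as unmapped, and returning an empty label for a mapped owner, are assumptions.

```go
package main

import "strconv"

// CRMNoteRef is a hypothetical view of a note's foreign keys; 0 means unset.
type CRMNoteRef struct {
	NoteID       int
	CRMPersonID  int
	CRMCompanyID int
}

// ResolveContacts maps each note to a CDP contact_id using person-first
// precedence: only the person FK is consulted, even when other FKs are set.
// index is keyed by crm_data.id, which is stored as a string, so the CRM
// int ID is string-cast before the lookup.
func ResolveContacts(notes []CRMNoteRef, index map[string]string) (resolved map[int]string, notMapped []int) {
	resolved = make(map[int]string)
	for _, n := range notes {
		contactID, ok := index[strconv.Itoa(n.CRMPersonID)]
		if n.CRMPersonID == 0 || !ok {
			notMapped = append(notMapped, n.NoteID) // CONTACT_NOT_MAPPED
			continue
		}
		resolved[n.NoteID] = contactID
	}
	return resolved, notMapped
}

// ResolveOwner returns the SSO UUID for a creator, or an empty owner ID plus
// a readable fallback label when the creator cannot be mapped.
func ResolveOwner(ssoByCreator map[string]string, creatorID, crmDisplayName string) (ownerID, legacyOwnerLabel string) {
	if id, ok := ssoByCreator[creatorID]; ok {
		return id, ""
	}
	if crmDisplayName == "" {
		crmDisplayName = "[Legacy CRM User]"
	}
	return "", crmDisplayName
}
```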
Effort estimate
| Discipline | Days |
|---|---|
| Backend | 1.5 |
| QA | 0.5 |
| Total | 2.0 |
Assumptions: ContactRepository.SearchWithFilters already accepts bson.M filter at contact/search.go:125; crm_data.id == crm_person_id confirmed (RFC REV-1); GetUserNamesBulk already exists at contact_notes_service.go:131-136.
Run to verify
make test && make lint && make build
Depends on
- [Task 1.1] (ContactNote struct with LegacyOwnerLabel field).
Task 2.2: [BE] Attachment processor — SSRF guard + download + type-map + re-upload (NOTES-MIG-S02)
Every CRM note's attachments — images, audios, and documents — are safely re-hosted in company-scoped CDP storage, never referencing legacy CRM S3 URLs, with SSRF protection on every outbound download.
Status: ✅ Actionable. CDP storage quota + CRM S3 access confirmation is a Stage 0 gate (OQ-9), not a build blocker.
What to build
AttachmentProcessor: for each CRM attachment URL (from crm_note_images, crm_note_audios, crm_note_attachments documents) — validate host against a CRM S3/CDN allow-list (SSRF guard), reject internal IPs and metadata endpoints, validate magic bytes vs declared type, enforce max download size, re-upload to deterministic key {company_sso_id}/{legacy_crm_note_id}/{asset} in CDP storage, return proxy URL + mapped Type. Failures are non-blocking: note inserts without the failed attachment.
Implementation Plan
| Action | File | What changes |
|---|---|---|
| create | internal/app/consumer/attachment_processor.go | AttachmentProcessor.Process(ctx context.Context, companySsoID, legacyCRMNoteID string, attachments []CRMAttachment) ([]ContactNoteAttachment, []AttachmentError) |
| extend | internal/app/consumer/attachment_processor.go | SSRF allow-list of CRM S3/CDN hostnames; reject 10.x, 172.16-31.x, 192.168.x, 169.254.x, arbitrary hosts; magic-byte validation vs declared content type; max download size cap; type mapping: CRM image → image; audio → voice_note (or video for video/*); document → doc/pdf/xlsx by extension/content-type (default doc) |
| extend | internal/app/consumer/attachment_processor.go | Deterministic storage key: {company_sso_id}/{legacy_crm_note_id}/{asset} — idempotent on re-run (same key overwrites safely) |
| create | internal/app/consumer/attachment_processor_test.go | Tests: image/audio/document mapped to correct Type; URL from 169.254.169.254 → ATTACHMENT_DOWNLOAD_FAILED, other attachments continue; magic-byte mismatch → failure; file over max size → failure; re-run uploads to same deterministic key |
Implementation steps
- Write failing tests (red). Create attachment_processor_test.go: (a) image URL from allowed CRM S3 host → Type="image", proxy URL stored; (b) URL pointing to 169.254.169.254 → ATTACHMENT_DOWNLOAD_FAILED, processing continues for remaining attachments; (c) .pdf with PDF magic bytes → Type="pdf"; (d) .pdf with mismatched magic bytes → failure; (e) file exceeding max size → failure; (f) re-run → same storage key written (overwrite). Run make test, confirm red.
- Implement SSRF guard. Parse URL host; reject if not in the CRM S3/CDN allow-list; reject internal IP ranges explicitly; return AttachmentError{Code: "ATTACHMENT_SSRF_BLOCKED"} for rejected URLs.
- Implement download. http.NewRequest with a context-bound timeout; read body up to maxAttachmentBytes limit; capture content-type header.
- Implement magic-byte check. Read first N bytes; match against expected magic bytes for the declared type; mismatch → failure.
- Implement type mapping. Map CRMAttachment.Type + file extension + content-type to one of {image, doc, pdf, video, voice_note, xlsx} (validated against contact_notes_service.go:286-293 allow-set; default doc).
- Implement re-upload. Write to {company_sso_id}/{legacy_crm_note_id}/{filename} in CDP storage; store proxy URL + Type + FileName + FileSizeInByte.
- Go green. make test.
- Quality gate. make lint && make sec && make build.
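
The SSRF guard step can be sketched with the standard library alone. The allow-list entry below is a placeholder (the real CRM S3/CDN hostnames are not given in this plan), and CheckAttachmentURL is a hypothetical name. The sketch validates only the URL as written; a production guard would likely also check the resolved address at dial time, since a hostname can resolve to an internal IP.

```go
package main

import (
	"errors"
	"net"
	"net/url"
	"strings"
)

// ErrSSRFBlocked corresponds to the ATTACHMENT_SSRF_BLOCKED code in the plan.
var ErrSSRFBlocked = errors.New("ATTACHMENT_SSRF_BLOCKED")

// allowedHosts is a placeholder allow-list; the real hostnames are an assumption.
var allowedHosts = map[string]bool{"crm-assets.example.com": true}

// isInternalIP reports whether ip is private (10.x, 172.16-31.x, 192.168.x),
// loopback, link-local (169.254.x, which covers the cloud metadata endpoint),
// or unspecified.
func isInternalIP(ip net.IP) bool {
	return ip.IsPrivate() || ip.IsLoopback() || ip.IsLinkLocalUnicast() || ip.IsUnspecified()
}

// CheckAttachmentURL returns nil only for http(s) URLs whose host is on the
// allow-list. Internal IP literals are rejected explicitly before the
// allow-list check, so a mistaken allow-list entry cannot re-enable them.
func CheckAttachmentURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil || (u.Scheme != "http" && u.Scheme != "https") {
		return ErrSSRFBlocked
	}
	host := strings.ToLower(u.Hostname())
	if ip := net.ParseIP(host); ip != nil && isInternalIP(ip) {
		return ErrSSRFBlocked
	}
	if !allowedHosts[host] {
		return ErrSSRFBlocked
	}
	return nil
}
```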
Acceptance criteria
- Images, audios, and crm_note_attachments documents are all processed and re-hosted in CDP storage (NOTES-MIG-S02/AC-3, Decision 8).
- SSRF guard rejects non-allow-listed hosts, internal IPs (10.x, 192.168.x), and the cloud metadata endpoint (169.254.169.254) (Decision 8 security).
- Magic bytes validated against declared content type before upload.
- Download failure → note inserted without that attachment; ATTACHMENT_DOWNLOAD_FAILED logged; note not counted as failed (NOTES-MIG-S02/ERR-2).
- Storage key {company_sso_id}/{legacy_crm_note_id}/{asset} is deterministic, so it is safe to overwrite on re-run (§2.E).
- Resulting Type is one of {image, doc, pdf, video, voice_note, xlsx}; no unvalidated type reaches the DB.
- make sec reports no new gosec findings on the outbound fetch path.
Test strategy
Go table tests with a mock HTTP server (simulating CRM S3) and mock CDP storage client assert type mapping, SSRF rejection, magic-byte validation, size cap enforcement, and non-blocking failure behavior.
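
One way to combine the magic-byte check with the type mapping is to sniff the first bytes with net/http's DetectContentType and compare the result with what the CRM record declares. The sketch below is illustrative only: the crmKind values ("image", "audio", "document"), the function names, and the choice to accept unknown document extensions as doc without a magic-byte check are all assumptions.

```go
package main

import (
	"net/http"
	"path"
	"strings"
)

// MapType sniffs the leading bytes of a download and maps the attachment to
// a CDP type. ok is false when the sniffed content contradicts the declared kind.
func MapType(crmKind, fileName string, head []byte) (cdpType string, ok bool) {
	sniffed := http.DetectContentType(head) // inspects at most the first 512 bytes
	ext := strings.ToLower(path.Ext(fileName))
	switch crmKind {
	case "image":
		if strings.HasPrefix(sniffed, "image/") {
			return "image", true
		}
	case "audio":
		if strings.HasPrefix(sniffed, "video/") {
			return "video", true
		}
		if strings.HasPrefix(sniffed, "audio/") {
			return "voice_note", true
		}
	case "document":
		switch ext {
		case ".pdf":
			if sniffed == "application/pdf" {
				return "pdf", true
			}
		case ".xlsx":
			// An .xlsx file is a ZIP container, so it sniffs as application/zip.
			if sniffed == "application/zip" {
				return "xlsx", true
			}
		default:
			return "doc", true // default type; no magic-byte rule assumed here
		}
	}
	return "", false
}

// StorageKey builds the deterministic key {company_sso_id}/{legacy_crm_note_id}/{asset}.
// path.Base drops any directory part so the asset cannot escape its prefix.
func StorageKey(companySsoID, legacyCRMNoteID, asset string) string {
	return companySsoID + "/" + legacyCRMNoteID + "/" + path.Base(asset)
}
```

Because the key depends only on the company, the legacy note ID and the asset name, a re-run writes to the same location instead of creating a second copy.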
Effort estimate
| Discipline | Days |
|---|---|
| Backend | 2.0 |
| QA | 0.5 |
| Total | 2.5 |
Assumptions: CRM S3 is public-read ACL, so no signing is required (confirmed in RFC grounding, carrierwave-s3.rb:27,58); CDP storage client exists and accepts a key + byte payload; crm_note_attachments documents are in scope (OQ-5 resolved: yes).
Run to verify
make test && make lint && make sec && make build
Depends on
- [Task 1.1] (ContactNoteAttachment struct from base.go).
Task 2.3: [BE] CRM extraction client — ListPersonNotes + heimdall retrier + CRMNotesExtractor interface (NOTES-MIG-S01, NOTES-MIG-S02)
The pipeline can paginate all Person notes for a CID from Legacy CRM using a properly timeout-guarded S2S client, not the no-timeout http.DefaultClient currently on QontakCrmClient.
Status: ⚠️ Partially blocked — the CRM org-scoped endpoint does not exist yet (OQ-7, Legacy CRM Squad dependency). Actionable now: build ListPersonNotes behind a CRMNotesExtractor interface with a stub; all downstream tasks (2.4) compile and test against the stub until OQ-7 resolves.
What to build
Extend QontakCrmClient (qontak_crm.go) with ListPersonNotes(ctx, cid string, page, perPage int) ([]CRMNote, error) — built on the heimdall httpclient pattern (iag_mekari.go:69-71), not http.DefaultClient (which has no timeout); 10s timeout via CRM_NOTES_EXTRACT_TIMEOUT; 3 retries with exponential backoff 1s/3s/9s on timeout + 5xx/Locked(423)/429. Define CRMNote payload struct and CRMNotesExtractor interface.
Implementation Plan
| Action | File | What changes |
|---|---|---|
| extend | internal/app/api/qontak_crm.go | Add ListPersonNotes(ctx context.Context, cid string, page, perPage int) ([]CRMNote, error) — uses httpclient.NewClient(WithHTTPTimeout(timeout)) + heimdall retrier (3×, 1s/3s/9s), existing Authorization: {CRM_API_AUTH} header, existing 5xx/Locked/429 handling pattern from :43-47; not http.DefaultClient |
| create | internal/app/payload/crm_note.go | CRMNote{ID, Note, CreatorID, CRMPersonID, CRMNoteTypeID, CRMNoteImages, CRMNoteAudios, CRMNoteAttachments, CreatedAt, UpdatedAt} |
| create | internal/app/api/crm_notes_extractor.go | CRMNotesExtractor interface (ListPersonNotes); CRMNotesExtractorStub implementation returning hardcoded fixtures for use in consumer tests (Task 2.4) |
| extend | config/load.go | CRM_NOTES_EXTRACT_TIMEOUT — getDurationOrPanic("CRM_NOTES_EXTRACT_TIMEOUT") with default 10s |
| create | internal/app/api/qontak_crm_notes_test.go | Tests: correct Authorization header; 5xx → retried 3× with backoff → CRM_EXTRACT_FAILED; 429 → retried; successful page → []CRMNote returned |
Implementation steps
- Write failing tests (red). Create qontak_crm_notes_test.go: (a) ListPersonNotes sends Authorization header from CRM_API_AUTH; (b) mock returns 500 three times → CRM_EXTRACT_FAILED after 3 attempts; (c) mock returns 429 once then 200 → success after retry; (d) mock returns paginated response → []CRMNote correctly unmarshaled. Run make test, confirm red.
- Define interface + stub. Create CRMNotesExtractor interface in crm_notes_extractor.go; implement CRMNotesExtractorStub returning a fixture []CRMNote, used by Task 2.4 consumer tests until OQ-7 resolves.
- Implement ListPersonNotes. Use httpclient.NewClient(WithHTTPTimeout(cfg.CRMNotesExtractTimeout)) (pattern from api/iag_mekari.go:69-71); add heimdall retrier (3 attempts, exponential 1s/3s/9s); pass existing Authorization: {CRM_API_AUTH} header; reuse 5xx/Locked/429 handling from qontak_crm.go:43-47.
- Add config. getDurationOrPanic("CRM_NOTES_EXTRACT_TIMEOUT") in config/load.go; default duration 10s.
- Go green. make test.
- Quality gate. make lint && make build.
Once OQ-7 resolves: set the CRM_NOTES_EXTRACT_TIMEOUT config and the endpoint URL for the CRM squad's new endpoint; no consumer code change is required (the interface isolates it).
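
The interface seam that makes the pipeline buildable before OQ-7 might look like the following. This is a sketch, not the agreed contract: the CRMNote fields shown are a subset with assumed Go types, 1-based page numbering is an assumption, and ExtractAll is a hypothetical helper showing how a consumer would page through either the stub or the real client.

```go
package main

import "context"

// CRMNote is a trimmed view of the payload struct; field types are assumed.
type CRMNote struct {
	ID            int
	Note          string
	CRMPersonID   int
	CRMNoteTypeID int
}

// CRMNotesExtractor is the seam between the consumer and the CRM client.
type CRMNotesExtractor interface {
	ListPersonNotes(ctx context.Context, cid string, page, perPage int) ([]CRMNote, error)
}

// CRMNotesExtractorStub serves fixtures page by page (pages start at 1).
type CRMNotesExtractorStub struct{ Fixtures []CRMNote }

func (s CRMNotesExtractorStub) ListPersonNotes(_ context.Context, _ string, page, perPage int) ([]CRMNote, error) {
	if page < 1 || perPage < 1 {
		return nil, nil
	}
	start := (page - 1) * perPage
	if start >= len(s.Fixtures) {
		return nil, nil // past the last page
	}
	end := start + perPage
	if end > len(s.Fixtures) {
		end = len(s.Fixtures)
	}
	return s.Fixtures[start:end], nil
}

// ExtractAll pages through the extractor until it returns an empty page.
func ExtractAll(ctx context.Context, ex CRMNotesExtractor, cid string, perPage int) ([]CRMNote, error) {
	var all []CRMNote
	for page := 1; ; page++ {
		batch, err := ex.ListPersonNotes(ctx, cid, page, perPage)
		if err != nil {
			return nil, err
		}
		if len(batch) == 0 {
			return all, nil
		}
		all = append(all, batch...)
	}
}
```

Because ExtractAll depends only on the interface, swapping the stub for the real client is a wiring change rather than a consumer change.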
Acceptance criteria
- ListPersonNotes uses heimdall httpclient with 10s timeout, not http.DefaultClient (Decision 10 timeout correction, RFC REV-2).
- 5xx / Locked (423) / 429 → retried 3× with 1s/3s/9s exponential backoff; budget exhausted → CRM_EXTRACT_FAILED (NOTES-MIG-S01 extraction failure path).
- Authorization: {CRM_API_AUTH} header present on every request (reuses existing pattern).
- CRMNotesExtractor interface lets Task 2.4 compile and test against the stub; entire pipeline buildable without OQ-7.
- (pending OQ-7) Real CRM endpoint URL wired via config; zero code change once available.
Test strategy
Go tests with a mock HTTP server assert request shape (URL, headers, pagination params), retry behavior (3 attempts on 5xx), and timeout propagation; stub tests confirm fixture is returned correctly.
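
The retry policy itself is independent of heimdall and can be expressed in plain Go, which is useful for reasoning about the test cases. The sketch below is not the heimdall retrier: DoWithRetry, SleepLog and the convention that status 0 means a timeout are hypothetical. The plan uses both "3 retries" and "3 attempts", so the attempt budget is left as a parameter rather than fixed.

```go
package main

import (
	"errors"
	"time"
)

// ErrCRMExtractFailed corresponds to the CRM_EXTRACT_FAILED code in the plan.
var ErrCRMExtractFailed = errors.New("CRM_EXTRACT_FAILED")

// Retryable reports whether an HTTP status should be retried: 5xx, 423, 429.
func Retryable(status int) bool {
	return status == 423 || status == 429 || (status >= 500 && status <= 599)
}

// Backoff returns the wait before retry n (0-based): 1s, 3s, 9s, ...
func Backoff(n int) time.Duration {
	d := time.Second
	for i := 0; i < n; i++ {
		d *= 3
	}
	return d
}

// SleepLog records requested waits instead of sleeping; handy in tests.
type SleepLog struct{ Waits []time.Duration }

func (l *SleepLog) Sleep(d time.Duration) { l.Waits = append(l.Waits, d) }

// DoWithRetry calls fetch up to maxAttempts times. fetch returns the HTTP
// status, or 0 for a timeout. A 2xx ends the loop with success, a
// non-retryable status fails immediately, and an exhausted budget fails.
func DoWithRetry(maxAttempts int, fetch func() int, sleep func(time.Duration)) error {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		status := fetch()
		if status >= 200 && status < 300 {
			return nil
		}
		if status != 0 && !Retryable(status) {
			return ErrCRMExtractFailed
		}
		if attempt < maxAttempts-1 {
			sleep(Backoff(attempt)) // no wait after the final attempt
		}
	}
	return ErrCRMExtractFailed
}
```

In production code the sleep argument would be time.Sleep; passing SleepLog.Sleep keeps the tests instantaneous.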
Effort estimate
| Discipline | Days |
|---|---|
| Backend | 1.5 |
| QA | 0.5 |
| Total | 2.0 |
Assumptions: QontakCrmClient + auth config (CRM_API_ROOT_URL/CRM_API_AUTH) already exist (qontak_crm.go:14-24, config/load.go:197-198); heimdall pattern already used in api/iag_mekari.go:69-71 and qontak_billing.go:183-185.
Run to verify
make test && make lint && make build
Depends on
- None. External blocker for running migrations: OQ-7 (CRM squad delivers org-scoped endpoint). Stage 0 gate: OQ-9 (confirm CRM S3 still public-read; CDP storage residency).
Task 2.4: [BE] Consumer assembly — ProcessNotesMigrationJob + note-type filter + failure guard + ValidationRunner + worker registration (NOTES-MIG-S01, NOTES-MIG-S02, NOTES-MIG-S03, NOTES-MIG-S04, NOTES-MIG-S06-NEG)
The end-to-end migration pipeline runs as a background worker: extract → transform per note → batch insert → halt if failure rate > 1% → validate match_pct — every failure logged with a reason code, zero silent drops.
Status: ✅ Actionable (Tasks 1.1, 1.2, 2.1, 2.2, 2.3 must land first; CRM extractor uses stub from Task 2.3).
What to build
NotesMigrationConsumer.ProcessNotesMigrationJob(job *work.Job) — the full assembled pipeline: paginated extract (via CRMNotesExtractor) → filter out-of-scope note types → batch-resolve contacts (Task 2.1) → batch-resolve owners (Task 2.1) → normalize HTML (Task 1.2) → process attachments (Task 2.2) → CreateNotesBatch (Task 1.1) → update Redis progress → check failure rate → halt if > 1%. Post-batches: ValidationRunner.Run (count compare) → set terminal status. Register job in worker_service.go.
Implementation Plan
| Action | File | What changes |
|---|---|---|
| extend | internal/app/consumer/notes_migration_consumer.go | ProcessNotesMigrationJob(job *work.Job) error: unmarshal job.Args["data"] (mirror activity_log_migration_consumer.go:38-47); paginated extract loop; per-batch: filter note types → resolve contacts → resolve owners → normalize HTML → process attachments → CreateNotesBatch → update Redis {progress_pct, notes_processed} |
| extend | internal/app/consumer/notes_migration_consumer.go | Per-batch failure-rate check: failure_rate = failure_count / total_processed; > 0.01 → set Redis halted + log crm_notes_migration_halted → return job (manual re-trigger required); never silently drop — every failure logged {legacy_crm_note_id, reason_code, details} |
| extend | internal/app/consumer/notes_migration_consumer.go | Note-type filter: check CRMNote.CRMNoteTypeID against config allow-set (default (1,6) Notes/Documents); excluded → out_of_scope_count++, not failure_count |
| extend | internal/app/consumer/notes_migration_consumer.go | ValidationRunner.Run(ctx, cid, companySsoID): CountWithFilters(bson.M{"company_sso_id":..., "legacy_crm_note_id":{"$exists":true}}) vs CRM source count → match_pct; ≥ 99% → completed_success; < 99% → completed_with_errors; source count unavailable → VALIDATION_SKIPPED + completed_with_errors |
| extend | internal/worker/worker_service.go | registerJobWithOptions(NotesMigrationJobName, opts, consumer.ProcessNotesMigrationJob, pool) (mirror :132,138) |
| extend | internal/app/consumer/notes_migration_consumer_test.go | Tests: end-to-end happy path (extract → resolve → normalize → insert → progress); out-of-scope type → out_of_scope_count++ not failure; failure_rate > 1% → halted; CONTACT_NOT_MAPPED → skip + count failure; full re-run → notes_migrated=0; match_pct ≥ 99% → completed_success; match_pct < 99% → completed_with_errors |
Implementation steps
- Write failing tests (red). Extend notes_migration_consumer_test.go: (a) happy path: 10 notes extracted, resolved, sanitized, inserted, notes_processed=10, progress_pct updated in Redis; (b) note with out-of-scope crm_note_type_id → out_of_scope_count++, failure_count unchanged; (c) 3 of 200 notes fail contact resolution → failure_rate=1.5% → status halted; (d) full re-run (all exist) → UpsertedCount=0, MatchedCount=N, completed_success; (e) match_pct=98% → completed_with_errors. Run make test, confirm red.
- Implement ProcessNotesMigrationJob. Follow ActivityLogMigrationConsumer shape (activity_log_migration_consumer.go:25-50): unmarshal args → check per-CID Redis in-progress lock (409 guard lives in the service layer from Task 2.5, but the consumer also checks and exits if another job is running) → extract in pagination loop (batch perPage=500) → process per note through the pipeline.
- Implement note-type filter. Read crm_note_type_id allow-set from config (default (1,6)); excluded notes → out_of_scope_count++ only (not failures).
- Implement per-batch halt check. After each CreateNotesBatch call: failure_rate = float64(failure_count) / float64(notes_processed); if > 0.01 → write halted to Redis, log crm_notes_migration_halted with {job_id, cid, failure_rate}, return the job.
- Implement ValidationRunner. After all batches: CountWithFilters(bson.M{"company_sso_id":..., "legacy_crm_note_id":{"$exists":true}}) (mirrors contact/search.go:147); compare to CRM source count; write terminal status to Redis.
- Register job. registerJobWithOptions(NotesMigrationJobName, opts, consumer.ProcessNotesMigrationJob, pool) in worker_service.go (mirror :132,138).
- Go green. make test.
- Quality gate. make lint && make build.
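
The note-type filter and the halt check reduce to a small amount of counter arithmetic, sketched below. BatchCounters and its methods are hypothetical names, the allow-set is hard-coded to the stated default (1, 6) rather than read from config, and excluding out-of-scope notes from the processed count is an assumption the plan does not settle.

```go
package main

// allowedNoteTypes is the default crm_note_type_id allow-set from the plan.
var allowedNoteTypes = map[int]bool{1: true, 6: true}

// BatchCounters tracks the running totals used for the halt decision.
type BatchCounters struct {
	Processed  int // in-scope notes handled so far
	Failed     int // in-scope notes that failed
	OutOfScope int // notes excluded by the type filter
}

// Record tallies one note. Out-of-scope notes are counted separately and
// never contribute to the failure rate.
func (c *BatchCounters) Record(noteTypeID int, failed bool) {
	if !allowedNoteTypes[noteTypeID] {
		c.OutOfScope++
		return
	}
	c.Processed++
	if failed {
		c.Failed++
	}
}

// FailureRate is failure_count / notes_processed, or 0 before any note.
func (c *BatchCounters) FailureRate() float64 {
	if c.Processed == 0 {
		return 0
	}
	return float64(c.Failed) / float64(c.Processed)
}

// ShouldHalt reports whether the failure rate is strictly above 1%.
func (c *BatchCounters) ShouldHalt() bool { return c.FailureRate() > 0.01 }
```

With 3 failures in 200 in-scope notes the rate is 3/200 = 1.5%, which is above the threshold, matching test case (c).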
Acceptance criteria
- End-to-end: extract → transform → CreateNotesBatch → Redis progress updated per batch (NOTES-MIG-S01/AC-3).
- Out-of-scope crm_note_type_id → out_of_scope_count++; not counted as a failure (NOTES-MIG-S06-NEG/NEG-2, Decision 11).
- failure_rate > 1% within a batch → Redis status halted, crm_notes_migration_halted logged (NOTES-MIG-S01/ERR-3).
- Zero silent failures: every failed note logged with {legacy_crm_note_id, reason_code, details} (§1 Success Criteria).
- Full re-run → notes_migrated=0, notes_skipped=N, completed_success (NOTES-MIG-S03/AC-3).
- match_pct ≥ 99% → completed_success; < 99% → completed_with_errors + alert (NOTES-MIG-S04/AC-2, ERR-1).
- NotesMigrationJobName registered in worker_service.go; make build produces a working worker binary.
Test strategy
Go integration-style tests in notes_migration_consumer_test.go with CRMNotesExtractorStub (Task 2.3), mocked ContactRepository, mocked HtmlNormalizer, mocked AttachmentProcessor, and a test Mongo for CreateNotesBatch. Assertions cover the full pipeline, halt trigger, out-of-scope filter, and both ValidationRunner terminal states.
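
The terminal-status decision made by ValidationRunner can be isolated as a pure function. The sketch below assumes the two counts are already available; TerminalStatus is a hypothetical name, and treating an empty source with an empty target as a 100% match is an assumption. The threshold is compared with integers so that exactly 99% is not lost to floating-point rounding.

```go
package main

// TerminalStatus derives the job's final status from the CDP and CRM counts.
// sourceKnown is false when the CRM source count could not be obtained, in
// which case validation is skipped and the job ends completed_with_errors.
func TerminalStatus(cdpCount, crmCount int64, sourceKnown bool) (status string, matchPct float64, validationSkipped bool) {
	if !sourceKnown {
		return "completed_with_errors", 0, true // VALIDATION_SKIPPED
	}
	if crmCount == 0 {
		// Assumption: nothing to migrate counts as a full match.
		if cdpCount == 0 {
			return "completed_success", 100, false
		}
		return "completed_with_errors", 0, false
	}
	matchPct = float64(cdpCount) * 100 / float64(crmCount)
	// cdp/crm >= 0.99 is equivalent to cdp*100 >= crm*99 for positive counts.
	if cdpCount*100 >= crmCount*99 {
		return "completed_success", matchPct, false
	}
	return "completed_with_errors", matchPct, false
}
```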
Effort estimate
| Discipline | Days |
|---|---|
| Backend | 2.5 |
| QA | 1.0 |
| Total | 3.5 |
Assumptions: gocraft/work job shape mirrors ActivityLogMigrationConsumer.ProcessUpdateUserIDJob (activity_log_migration_consumer.go:25-50); per-CID in-progress lock mirrors the activity-log Redis status key; batch size 500 (max 1000) from config.
Run to verify
make test && make lint && make build
Depends on
- [Task 1.1] (CreateNotesBatch, NotesMigrationJobName, Redis status), [Task 1.2] (HtmlNormalizer), [Task 2.1] (ContactResolver, OwnerResolver), [Task 2.2] (AttachmentProcessor), [Task 2.3] (CRMNotesExtractor interface + stub).
Task 2.5: [BE] Migration service + handler + routes — trigger, status endpoint, flag guard, full error catalog (NOTES-MIG-S01, NOTES-MIG-S04)
An Ops engineer can trigger a migration job for any CID via S2S (POST /private/notes/migrate) and poll its status (GET /private/notes/migration/status), with all guards: flag check, idempotency, concurrent-job prevention, and the full error response catalog.
Status: ✅ Actionable.
What to build
NotesMigrationService.ValidateAndEnqueue (flag gate → duplicate check → per-CID in-progress lock → JobEnqueuer.EnqueueJob → Redis in_progress) + GetMigrationStatus (Redis read). NotesMigrationHandler with POST /private/notes/migrate and GET /private/notes/migration/status. Register both routes under the existing /private BasicAuth group in rest_router.go.
Implementation Plan
| Action | File | What changes |
|---|---|---|
| extend | internal/app/service/notes_migration_service.go | ValidateAndEnqueue(ctx, req NotesMigrationRequest) (NotesMigrationResponse, error): check flag → 403 FLAG_DISABLED; check Redis for completed_success → 409 ALREADY_MIGRATED; acquire per-CID in-progress lock → 409 JOB_ALREADY_RUNNING; JobEnqueuer.EnqueueJob(NotesMigrationJobName, work.Q{"data": req}) (pattern job_enqueuer.go:65-67); write Redis in_progress; return {job_id} |
| extend | internal/app/service/notes_migration_service.go | GetMigrationStatus(ctx, cid string) (MigrationStatus, error): read Redis key notes_migration:{cid}; absent → {status: "not_started"} |
| create | internal/app/handler/notes_migration_handler.go | NotesMigrationHandler{Migrate(w,r), GetStatus(w,r)} — mirrors activity_log_migration_handler.go:32-91; uses myhttp.NewJSONResponse/ErrBadRequest; error response shape {"error":"CODE","message":"...","details":{}} |
| extend | internal/server/rest_router.go | Register private.Post("/notes/migrate", handler.Migrate) + private.Get("/notes/migration/status", handler.GetStatus) under the /private group guarded by mymiddleware.BasicAuth (:70); add after existing /private routes |
| create | internal/app/handler/notes_migration_handler_test.go | Tests: valid BasicAuth + flag ON → 200 {job_id}; flag OFF → 403 FLAG_DISABLED; completed → 409 ALREADY_MIGRATED; in-progress lock → 409 JOB_ALREADY_RUNNING; CID not found → 404 CID_NOT_FOUND; missing BasicAuth → 401; GET returns progress fields; unknown CID → {status:"not_started"} |
Implementation steps
- Write failing tests (red): create notes_migration_handler_test.go covering (a) valid POST → 200 {job_id, status:"in_progress"}; (b) flag OFF → 403 FLAG_DISABLED; (c) already completed_success in Redis → 409 ALREADY_MIGRATED; (d) in-progress lock held → 409 JOB_ALREADY_RUNNING; (e) missing/invalid BasicAuth → 401/403; (f) GET with known CID → {status, progress_pct, notes_processed, notes_total, failure_rate, match_pct}; (g) GET with unknown CID → {status:"not_started"}. Run make test, confirm red.
- Implement ValidateAndEnqueue: run the guards sequentially: check crm_notes_migration_enabled flag for CID (403) → read Redis for completed_success (409) → try-acquire in-progress lock (409 if held) → JobEnqueuer.EnqueueJob(NotesMigrationJobName, work.Q{"data": req}) → set Redis in_progress → return {job_id: result.JobID}.
- Implement GetMigrationStatus: read Redis key notes_migration:{cid}; absent → MigrationStatus{Status: "not_started"}.
- Implement handler: POST decodes body → calls ValidateAndEnqueue; GET reads ?cid= query param → calls GetMigrationStatus. Use myhttp.NewJSONResponse for success, typed error codes from §3.B for failures.
- Register routes: in rest_router.go, inside the private group block (:69-79): private.Post("/notes/migrate", handler.Migrate) + private.Get("/notes/migration/status", handler.GetStatus).
- Go green: make test.
- Quality gate: make lint && make build.
Acceptance criteria
- POST /private/notes/migrate with valid BasicAuth + flag ON → 200 {job_id, status:"in_progress"} (NOTES-MIG-S01/AC-1).
- Flag OFF → 403 FLAG_DISABLED; no job enqueued (NOTES-MIG-S01/ERR-1).
- CID already completed_success → 409 ALREADY_MIGRATED (NOTES-MIG-S01/ERR-2).
- Per-CID in-progress lock held → 409 JOB_ALREADY_RUNNING (NOTES-MIG-S03/ERR-1).
- Non-BasicAuth call (no Authorization header) → 401/403; IAG JWT not accepted (NOTES-MIG-S01/ERR-4).
- GET /private/notes/migration/status?cid= → {status, progress_pct, notes_processed, notes_total, failure_rate, match_pct} (NOTES-MIG-S01/AC-2).
- Both routes are under the /private BasicAuth group, verified in rest_router.go and by handler test.
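The status read behind the GET criterion can be sketched as a keyed lookup with a "not_started" default. This sketch assumes the Redis value is a JSON document and guesses the field types; KV is a hypothetical stand-in for the Redis client, and only the key pattern and field names come from the plan.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MigrationStatus carries the fields listed in the acceptance criteria.
// The Go types are assumptions; the RFC only names the fields.
type MigrationStatus struct {
	Status         string  `json:"status"`
	ProgressPct    float64 `json:"progress_pct"`
	NotesProcessed int     `json:"notes_processed"`
	NotesTotal     int     `json:"notes_total"`
	FailureRate    float64 `json:"failure_rate"`
	MatchPct       float64 `json:"match_pct"`
}

// KV is a hypothetical in-memory stand-in for Redis string reads.
type KV map[string]string

// statusKey builds the key pattern named in the plan: notes_migration:{cid}.
func statusKey(cid string) string { return "notes_migration:" + cid }

// GetMigrationStatus returns the stored status, or "not_started" when the key is absent.
func GetMigrationStatus(kv KV, cid string) (MigrationStatus, error) {
	raw, ok := kv[statusKey(cid)]
	if !ok {
		return MigrationStatus{Status: "not_started"}, nil
	}
	var st MigrationStatus
	if err := json.Unmarshal([]byte(raw), &st); err != nil {
		return MigrationStatus{}, fmt.Errorf("decode status for %s: %w", cid, err)
	}
	return st, nil
}

func main() {
	kv := KV{statusKey("cid-1"): `{"status":"in_progress","progress_pct":40,"notes_processed":400,"notes_total":1000}`}
	known, _ := GetMigrationStatus(kv, "cid-1")
	unknown, _ := GetMigrationStatus(kv, "cid-2")
	fmt.Println(known.Status, known.NotesProcessed, unknown.Status)
}
```

Note that serializing this struct for an unknown CID would also emit the zero-valued progress fields; whether the real response omits them is left to the implementation.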
Test strategy
Go handler tests with mocked NotesMigrationService assert the full error response catalog (HTTP status + error code strings) and the happy-path response shape. Manual: curl -u user:pass -X POST localhost:.../private/notes/migrate -d '{"cid":"..."}' verifies route registration against a local server.
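The BasicAuth part of this strategy can also be checked without a running server, using net/http/httptest. The sketch below is an assumption-heavy stand-in: basicAuth imitates what mymiddleware.BasicAuth is expected to do, the credentials are placeholders, and the stub handler returns a canned body instead of calling NotesMigrationService.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// basicAuth is a hypothetical stand-in for mymiddleware.BasicAuth.
func basicAuth(user, pass string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		u, p, ok := r.BasicAuth()
		if !ok || u != user || p != pass {
			w.Header().Set("WWW-Authenticate", `Basic realm="private"`)
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// newPrivateMux registers a stub migrate route behind the auth wrapper.
func newPrivateMux() http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/private/notes/migrate", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"job_id":"job-1","status":"in_progress"}`)
	})
	return basicAuth("user", "pass", mux)
}

// call issues a POST against the handler and returns the response status code.
func call(h http.Handler, withAuth bool) int {
	req := httptest.NewRequest(http.MethodPost, "/private/notes/migrate", nil)
	if withAuth {
		req.SetBasicAuth("user", "pass")
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := newPrivateMux()
	fmt.Println(call(h, false), call(h, true))
}
```

The same recorder-based pattern extends to the error catalog cases by swapping the stub handler for one backed by a mocked service.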
Effort estimate
| Discipline | Days |
|---|---|
| Backend | 2.0 |
| QA | 1.0 |
| Total | 3.0 |
Assumptions:
JobEnqueuer.EnqueueJob already exists at job_enqueuer.go:38-67; mymiddleware.BasicAuth already guards /private at rest_router.go:70; error response shape follows existing myhttp conventions in the codebase.
Run to verify
make test && make lint && make build
Depends on
- [Task 1.1] (NotesMigrationJobName, Redis status key + TTL, NotesMigrationRequest/Response payload), [Task 2.4] (consumer registered in worker_service.go).
Task 2.6: [BE] Render-path legacy_owner_label fallback in contact_notes_service.go (NOTES-MIG-S05)
A migrated note whose original author couldn't be mapped to an SSO user still shows a readable author name in the CDP Notes UI — not a blank field.
Status: ✅ Actionable. This is the only read-path change in the entire RFC — a single conditional branch.
What to build
In contact_notes_service.go:131-136, after GetUserNamesBulk resolves owner names, add a fallback: if a note's resolved owner_name is empty and note.LegacyOwnerLabel is non-empty, use LegacyOwnerLabel as the display name. The live-permission logic (contact_notes_handler.go:143-166) is untouched — edit/delete may remain hidden for label-only notes, which is accepted for historical notes.
Implementation Plan
| Action | File | What changes |
|---|---|---|
| extend | internal/app/service/contact_notes/contact_notes_service.go:131-136 | After GetUserNamesBulk, for any note where resolved name is "" and note.LegacyOwnerLabel != "", set display name = note.LegacyOwnerLabel |
| extend | internal/app/service/contact_notes/contact_notes_service_test.go | Tests: note with OwnerID="" + LegacyOwnerLabel="Former Agent" → resolved author is "Former Agent"; note with valid OwnerID → live SSO name (existing behavior unchanged) |
Implementation steps
- Write failing tests (red): add two test cases in contact_notes_service_test.go: (a) note with OwnerID="", LegacyOwnerLabel="Former Agent" → resolved author in response is "Former Agent"; (b) note with valid OwnerID="abc-uuid", LegacyOwnerLabel="" → resolved author is the live SSO name from GetUserNamesBulk (existing path). Run make test, confirm red.
- Implement fallback: in the GetUserNamesBulk resolution block at :131-136, add: if resolvedName == "" && note.LegacyOwnerLabel != "" { resolvedName = note.LegacyOwnerLabel }.
- Go green: make test.
- Quality gate: make lint && make build.
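The fallback branch described in the steps above can be sketched in isolation. The ContactNote fields shown are a trimmed, hypothetical subset, and the names map stands in for whatever GetUserNamesBulk actually returns (assumed here to be owner ID to display name).

```go
package main

import "fmt"

// ContactNote is a trimmed, hypothetical subset of the real struct.
type ContactNote struct {
	ID               string
	OwnerID          string
	LegacyOwnerLabel string
}

// resolveAuthors returns note ID -> display name. names is assumed to be the
// owner ID -> name map produced by the bulk name lookup.
func resolveAuthors(notes []ContactNote, names map[string]string) map[string]string {
	out := make(map[string]string, len(notes))
	for _, n := range notes {
		resolved := names[n.OwnerID] // empty when the owner is unmapped
		if resolved == "" && n.LegacyOwnerLabel != "" {
			resolved = n.LegacyOwnerLabel // fall back to the migrated label
		}
		out[n.ID] = resolved
	}
	return out
}

func main() {
	notes := []ContactNote{
		{ID: "n1", OwnerID: "", LegacyOwnerLabel: "Former Agent"},
		{ID: "n2", OwnerID: "abc-uuid"},
	}
	names := map[string]string{"abc-uuid": "Live SSO Name"}
	authors := resolveAuthors(notes, names)
	fmt.Println(authors["n1"], "|", authors["n2"])
}
```

Because the label is consulted only when the live lookup comes back empty, notes with a valid owner keep their existing behavior.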
Acceptance criteria
- Note with owner_id=null + legacy_owner_label="Former Agent" → renders "Former Agent" as author, not blank (NOTES-MIG-S05/AC-3, Decision 6).
- Note with valid owner_id → live SSO name shown; existing behavior is unchanged.
- contact_notes_handler.go:143-166 (edit/delete permission) is not modified; permission computation stays as-is.
Test strategy
Go unit tests on contact_notes_service.go with mocked GetUserNamesBulk response assert the fallback branch independently from the existing live-name branch.
Effort estimate
| Discipline | Days |
|---|---|
| Backend | 0.5 |
| QA | 0 |
| Total | 0.5 |
Assumptions:
ContactNote.LegacyOwnerLabel added in Task 1.1; GetUserNamesBulk call at :131-136 already iterates notes and resolves names, so this is a one-line fallback after that loop.
Run to verify
make test && make lint && make build
Depends on
- [Task 1.1] (LegacyOwnerLabel field on ContactNote).
Ordering rationale
- Start with Task 1.1 (data model + constants + index): every other task depends on the ContactNote struct extensions, NotesMigrationJobName, and CreateNotesBatch. It has no dependencies and can land on day 1.
- Task 1.2 (HTML Normalizer) is fully independent and can run in parallel with Tasks 2.1, 2.2, and 2.3; fan all four out simultaneously once Task 1.1 lands. Task 2.4 (consumer assembly) consumes all of them and is the integration spine that must wait.
- Tasks 2.1, 2.2, 2.3 are independently parallelizable (resolvers, attachment processor, and CRM client each have clean interfaces); assign to separate developers. All three feed Task 2.4.
- Task 2.4 (consumer assembly) is the critical path: it depends on all Phase 1 tasks and Tasks 2.1–2.3. Prioritize landing predecessors quickly; the per-CID lock in 2.4 and the ValidationRunner are the last pieces before Stage 1 is runnable.
- Task 2.5 (handler + routes) can build in parallel with 2.4: it only needs the service/enqueue scaffold from Task 1.1 (job name, payload, Redis key), not the consumer internals. Merge before Stage 1.
- Task 2.6 (render-path fallback) is the smallest task and can ship any time after Task 1.1. It is independent of the migration pipeline and can land even before Stage 1 to ensure the UI is ready for migrated notes.
- Push externally on OQ-7 (CRM extraction endpoint), the only hard external blocker for running migrations. The stub interface in Task 2.3 keeps the entire pipeline buildable and testable today. Also push on OQ-10 (InfoSec bluemonday approval) to unblock AGREED on Task 1.2, and OQ-9 (CDP storage quota + CRM S3 access confirmation) at Stage 0.
Skipped stories
| Story / Task | Reason |
|---|---|
| NOTES-MIG-S05 (View migrated notes — full story) | No new BE or FE code — migrated notes render via the existing CDP Notes UI and the existing GET /iag/v1/contacts/{id}/notes endpoint unchanged. The only new piece is the owner-label fallback, covered in Task 2.6. S06 "Legacy" banner/tag has no FE infrastructure (CustomerNote has no metadata field — D-9) and is explicitly out of scope. |
| POST /cdp/notes/migrate (PRD literal batch endpoint) | Superseded by the in-process gocraft/work write + /private/notes/migrate trigger (Tasks 2.4 + 2.5) per Decision 1. The /cdp namespace does not exist in rest_router.go. |
| CRM org-scoped extraction endpoint | Owned by the Legacy CRM Squad (OQ-7). Task 2.3 stubs the CRMNotesExtractor interface; no consumer code is blocked. |
| Mobile (all stories) | No mobile work in this RFC — backend migration pipeline only; migrated notes surface on mobile via the existing notes read path unchanged. |
| crm_checkin geolocation | Deliberately dropped (Decision 9, D-10). CDP has no geo field; CRM address is Lockbox-encrypted. Each note with a check-in logs a marker for auditability; no migration needed. |