
RFC: PII Encryption — Contact Service

Document Conventions (do not remove)

This RFC follows the Qontak RFC Template for governance. Sections 1–6 and Comment logs are mandatory; sections that do not apply are marked N/A — reason.

It is agent-execution-ready: §1 PRD-to-Schema Derivation (BE), §2 Repo Reading Guide with Source Verification, mermaid diagrams, and the §4 Agent Execution Plan + Verification & Rollback Recipe are the readiness gates checked in §7.

Implementation state as of 2026-07-01. Tasks I–VIII are done (go-utils/crypto dependency, key config, Mongo migration, Contact struct extension, crypto helpers, dual-write on create/update, Usernames column, per-team backfill HTTP endpoint). The endpoint approach timed out on large teams; the root cause is documented in Decision 1, and the fix is a cron-based backfill job (Task VIII-fix / TF-3434). Remaining tasks are VIII-fix through XIV.

Scope update (2026-07-01): accounts field (accounts_encrypted, accounts_bidx) is excluded from remaining tasks by product decision and is listed under Out of Scope. address field (address_encrypted, address_bidx) has been re-added to scope — field additions land in Task VIIIa (TF-3435) and the backfill in Task VIIIb (TF-3436, requires sparse index migration 036 before running).

Production scale note: The contact collection has ~150M documents. The backfill cron (TF-3434) requires migration 035 (name_encrypted index) to be applied before the image is deployed — make migrate-up blocks 30–60 min on 150M documents and must run off-peak.

Metadata

| Field | Value | Notes |
| --- | --- | --- |
| Status | RFC | Human label; YAML status: in-review. |
| DRI | Puji Triwibowo | RFC + implementation owner. |
| Type | backend-only | Pure contact-service change; no FE/mobile surface. |
| Author(s) | CDP Squad | |
| Reviewers | CDP Squad BE, Infosec, Data Platform | Infosec sign-off required (AES key scope, blind-index risk). |
| Approver(s) | TBD tech lead + TBD infosec | |
| Submitted Date | 2026-07-01 | |
| Last Updated | 2026-07-01 | |
| Target Release | 2026-Q3 | |
| Source Confluence | DRAFT RFC — PII Encryption | Original design; this RFC supersedes it with implementation state. |
| Jira Epic | TF-2480 | PII Encryption (Supporting) |

Sections at a Glance

  1. Overview (Problem, Success Criteria, Out of Scope, Dependencies, PRD-to-Schema, Traceability)
  2. Technical Design (Infra Topology, ADR Decisions, Repo Reading Guide + Source Verification, Sequence Diagrams, Data Model, APIs, Async Paths)
  3. High-Availability & Security
  4. Backwards Compatibility & Rollout Plan (Agent Execution Plan, Verification & Rollback Recipe)
  5. Concerns, Questions, or Known Limitations
  6. Comment Logs
  7. Ready for Agent Execution

1. Overview

Encrypt all PII fields in contact-service's MongoDB contact collection using AES-256-GCM (go-utils/crypto, single-algorithm method). Introduce parallel blind-index fields (*_bidx) for exact-match search and tokenized shadow fields (*_search) for partial-match search, so search parity is preserved after plaintext fields are removed.

The migration is phased to avoid hard downtime:

| Phase | What changes | When |
| --- | --- | --- |
| 1 - Additive | Add *_encrypted, *_bidx, *_search fields + indexes (no behavior change) | Done (TF-2563) |
| 2 - Dual-write | Every create/update writes plaintext and encrypted fields simultaneously | Done (TF-2593, TF-2594, TF-2866) |
| Backfill | Encrypt historical docs via background cron (migrated from endpoint; see Decision 1) | Next (TF-3434 + migration 035) |
| Backfill address | Struct fields + crypto helpers (TF-3435); gap backfill via sparse index + HTTP endpoint (TF-3436 + migration 036); runs after main cron completes | After TF-3434 cron finishes |
| 3 - Read switch | *_encrypted becomes primary read source; plaintext is fallback | Task X (TF-2597) |
| 4 - Cleanup | Stop writing plaintext; unset legacy fields after soak period | Task XIV (TF-2600) |

Success Criteria

  • 100% of non-deleted contact documents have name_encrypted populated before read switch.
  • Read path serves from encrypted fields with zero functional regression (API responses identical).
  • Search (exact and partial) passes parity test suite against same data.
  • p95 endpoint latency increase ≤ 100 ms for read-path endpoints touching PII fields.
  • No plaintext PII appears in service logs at any phase.
  • Backfill job throughput ≥ 1,000 contacts/min at default sleep (MongoDB oplog lag < 5 s).
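
As a rough sanity check on the throughput criterion, the sketch below estimates how long a backfill would run if it only ever met the 1,000 contacts/min floor. It assumes the full ~150M collection size from the production scale note as the worst case; documents already dual-written in Phase 2 would not need backfilling, so the real figure should be lower. The function name is illustrative and not part of the service.

```go
package main

import (
	"fmt"
	"time"
)

// estimateBackfillDuration returns how long a backfill takes at a constant
// throughput. Both inputs are illustrative, not measurements.
func estimateBackfillDuration(docs, contactsPerMin int64) time.Duration {
	if contactsPerMin <= 0 {
		return 0
	}
	// Round up to whole minutes.
	minutes := (docs + contactsPerMin - 1) / contactsPerMin
	return time.Duration(minutes) * time.Minute
}

func main() {
	d := estimateBackfillDuration(150_000_000, 1_000)
	fmt.Printf("%.1f days\n", d.Hours()/24) // prints "104.2 days"
}
```

At the minimum acceptable rate the worst case is on the order of months, which is worth keeping in mind when choosing the batch size and sleep values.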

Out of Scope

  1. Client-side encryption — this RFC covers server-side / database-level encryption only.
  2. Key rotation (MultiAlgAdapter) — follow-up after steady-state.
  3. Rewriting unrelated APIs or domain behavior outside the target PII fields.
  4. Building an external search engine — native Mongo with blind-index is the chosen approach.
  5. accounts field encryption (accounts_encrypted, accounts_bidx) — excluded by product decision; accounts is not used and will not be encrypted. SearchByAccountUniqueID blind-index migration (Task XII) is also excluded.
  6. Kafka consumer paths and webhook delivery — tracked in TF-2589 (Async/Consumer Paths); scoped out of this RFC to reduce blast radius.
| Title | Link | What this RFC takes from it |
| --- | --- | --- |
| DRAFT RFC — PII Encryption (Confluence) | Confluence | Original design, field format, phase plan, search strategy |
| TF-2480 Jira Epic + child stories | Jira | Task breakdown I–XV; implementation state |
| go-utils/crypto docs | bitbucket.org/mid-kelola-indonesia/go-utils/src/master/docs/crypto_single_alg.md | AES256GCM payload format, cipher API |

Assumptions

  1. PII_ENCRYPTION_KEY and PII_ENCRYPTION_KEY_ID are provisioned via Vault and injected as env vars — key management is out of scope for this RFC.
  2. MongoDB replication lag is monitored; backfill rate is throttled via configurable sleep to stay within oplog capacity.
  3. Redis is available for cron control flags (same requirement as existing BackfillNameTokenizedCron).
  4. The read-switch feature flag (pii_read_encrypted_enabled) is toggled manually by engineering; no automatic cut-over.

Dependencies

| Dependency | Owner | Status | Blocking? |
| --- | --- | --- | --- |
| go-utils/crypto AES256GCM cipher | Platform | done; wired in contact/crypto_helpers.go | YES (done) |
| Vault PII encryption key | Infra/Sec | done (env present in staging) | YES |
| Redis cron control | Infra | exists; shared Redis used by BackfillNameTokenizedCron | YES |
| Mongo *_bidx indexes | DB/Infra | done (TF-2563, migration files present) | YES |
| Data Platform ETL update | Data | pending; ETL jobs must add decrypt step before Phase 4 (TF-2589) | YES for Phase 4 |

PRD-to-Schema Derivation

The contact collection lives in MongoDB. "Schema" here is the Contact Go struct + bson tags (internal/app/repository/contact/base.go).

| PII requirement | Persisted as (collection.field) | Enforced where | Source |
| --- | --- | --- | --- |
| Encrypt name at rest | name_encrypted: EncryptedPayload, name_bidx: string, name_search: []string | encryptContactPIIFields (crypto_helpers.go:65) | go-utils AES256GCM |
| Encrypt email at rest | email_encrypted: EncryptedPayload, email_bidx: string | same | |
| Encrypt phone[] at rest | phone_encrypted: []EncryptedPayload, phone_bidx: []string | same | |
| Encrypt usernames[] at rest | usernames_encrypted: []EncryptedUsername, usernames_bidx: []UsernameBidx | same | |
| Encrypt address at rest (sub-document) | address_encrypted: *EncryptedPayload, address_bidx: string; JSON-marshal the *Address struct before encrypting | encryptContactPIIFields (TF-3435) | Sub-document: json.Marshal(c.Address) → encryptToPayload |
| accounts[] | Excluded by product decision | accounts_encrypted, accounts_bidx are not encrypted in this RFC | |
| Exact search (email, phone) | *_bidx HMAC-SHA256 keyed hash (HKDF-derived blind key from encryption key) | NewConfig (crypto_helpers.go:34) | |
| Partial search (name) | name_search: []string token array (same as name_tokenized) | encryptContactPIIFields:80 | |
| Dual-write during transition | plaintext field and *_encrypted/*_bidx written atomically per document | applyDualWriteToBSONMap (crypto_helpers.go:350) | |
| Backfill historical docs | per-batch cron using BuildPIIEncryptedUpdateFields + BulkUpdateFields | BackfillPIIEncryptionCron (TF-3434); requires migration 035 (name_encrypted index) before running | |
| Backfill address gap | bounded HTTP endpoint POST /private/contacts/backfill/pii/address | BackfillAddressPII (TF-3436); requires migration 036 (sparse index on address) before running | |

2. Technical Design

Infrastructure Topology

flowchart TB
agent([API / Consumer caller]) -->|HTTPS| lb[Ingress / API Gateway]
lb --> cs["contact-service-api pods\n(chi, stateless)"]
cs -->|read/write bson| mongo[("MongoDB\ncontact collection")]
cs -->|HKDF blind-index key| cr["contact/crypto_helpers\n(in-process AES256GCM)"]
cr --> vault(["Vault\n(PII_ENCRYPTION_KEY)"])

redis[("Redis\n(cron control flags)")] -.->|enable/disable/sleep| cron["BackfillPIIEncryptionCron\n(gocraft/work, worker pod)"]
cron -->|SearchMissingPIIEncryption\n+ BulkUpdateFields| mongo
cron -->|BuildPIIEncryptedUpdateFields| cr

cs -->|feature flag\npii_read_encrypted_enabled| redis

Verified infra: MongoDB go.mongodb.org/mongo-driver v1.12.1 (go.mod); Redis-backed gocraft/work worker (internal/app/service/job_enqueuer.go, make run-worker Makefile:64-77); existing cron pattern: BackfillNameTokenizedCron (internal/app/cron/backfill_name_tokenized.go).

Technical Decisions (ADR)

Decision 1: Convert backfill from blocking HTTP endpoint to cron job

Context. Task VIII delivered POST /private/contacts/backfill/pii/{team_id} backed by PIIBackfillService.BackfillPIIByTeam. The service loops through all pages of 100 contacts synchronously within the HTTP request context (pii_backfill_service.go:41). For large teams (tens-of-thousands of contacts), the full loop exhausts the request deadline → HTTP timeout, leaving the backfill partially done with no way to resume from the last page.

Root cause. HTTP handlers have a bounded timeout (typically 30–60 s), while the pagination loop inside a single request is unbounded: it runs until all pages are exhausted or the connection drops. Page count = ceil(team_doc_count / 100), so any team with > ~500 contacts will time out at default HTTP deadlines.

Options considered.

  • Option A — paginated endpoint (cursor-based). Keep HTTP; return a next_page token; caller drives pagination. Pros: no infra change. Cons: requires an external orchestrator (script / pipeline) to loop; the caller must handle retries; operational overhead to script across all teams.
  • Option B — enqueue per-team gocraft/work job. Endpoint enqueues a PIIBackfillJob for the given team; worker processes all pages. Pros: non-blocking; survives pod restarts. Cons: one job per team → job fan-out; no global rate limiting across teams; harder to observe cross-team progress.
  • Option C — background cron job (all teams, batch-at-a-time). A BackfillPIIEncryptionCron runs on the worker pod, continuously fetches the next batch of contacts missing name_encrypted across all teams, encrypts them, and sleeps between batches. Controlled by Redis flags (enable/disable, force-break, sleep-time). Matches exactly the BackfillNameTokenizedCron pattern already in the repo. Pros: self-driving, no external orchestrator; uniform across all teams; configurable throttle; identical to proven pattern. Cons: no team-ordering control (processes whichever doc Mongo returns first from the missing-encrypted index).

Decision. Option C — implement BackfillPIIEncryptionCron following the BackfillNameTokenizedCron pattern (verified internal/app/cron/backfill_name_tokenized.go).

The existing HTTP endpoint (BackfillPIIByTeam) is kept but demoted to a single-batch trigger (processes max 100 contacts per call) for manual/targeted use; it no longer loops.

Cron control flags (Redis keys):

| Redis key | Effect | Default |
| --- | --- | --- |
| pii_backfill_encryption_enabled | Master switch; cron exits early if absent/empty | disabled (safe default) |
| pii_backfill_encryption_force_break | Abort current run immediately | unset |
| pii_backfill_encryption_sleep_ms | Sleep between batches (ms); 0 = no sleep | 0 |
| pii_backfill_encryption_active | TTL-guarded mutex; prevents concurrent runs | set for 3600 s on start |
| pii_backfill_encryption_batch_size | Contacts per batch | 1000 |

Rationale. The BackfillNameTokenizedCron pattern is battle-tested in production. Reusing it gives force-break, active-job dedup, and rate control for free. The cron runs on the existing worker pod deployment — no new infra.

Consequences. The HTTP endpoint loop is shortened to one batch for targeted use. The cron replaces the bulk backfill. Progress visibility comes from logs (pii_backfill_cron batch_processed) and the reconciliation checker (Decision 2 / Task IX).

Reversibility. Disable via Redis flag; HTTP endpoint loop can be re-enabled in one line.
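
The control flow implied by the Redis flags can be sketched as below. This is a minimal illustration, not the BackfillNameTokenizedCron code: the FlagStore interface, memFlags, intFlag, and runBackfill are hypothetical names, and the in-memory store stands in for Redis so the example runs on its own. Only the key names and defaults come from the table above. The sketch takes the mutex with a set-if-absent call, which is one possible way to avoid two workers passing a separate GET check at the same moment.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// FlagStore is a hypothetical stand-in for the Redis client. Only the key
// names used below come from the RFC; the interface itself is an assumption.
type FlagStore interface {
	Get(key string) string
	SetNX(key, value string, ttl time.Duration) bool // true if the key was set
	Del(key string)
}

// memFlags is an in-memory FlagStore for this example; it ignores TTLs.
type memFlags map[string]string

func (m memFlags) Get(k string) string { return m[k] }
func (m memFlags) SetNX(k, v string, _ time.Duration) bool {
	if _, held := m[k]; held {
		return false
	}
	m[k] = v
	return true
}
func (m memFlags) Del(k string) { delete(m, k) }

// intFlag reads a non-negative integer flag, falling back to def.
func intFlag(f FlagStore, key string, def int) int {
	if n, err := strconv.Atoi(f.Get(key)); err == nil && n >= 0 {
		return n
	}
	return def
}

// runBackfill drives batches until no work is left or force_break is set.
// next encrypts up to limit contacts and returns how many it handled.
func runBackfill(f FlagStore, next func(limit int) int) (total int) {
	if f.Get("pii_backfill_encryption_enabled") == "" {
		return 0 // master switch absent: do nothing
	}
	if !f.SetNX("pii_backfill_encryption_active", "1", 3600*time.Second) {
		return 0 // another run holds the mutex
	}
	defer f.Del("pii_backfill_encryption_active")

	for f.Get("pii_backfill_encryption_force_break") == "" {
		limit := intFlag(f, "pii_backfill_encryption_batch_size", 1000)
		n := next(limit)
		if n == 0 {
			break // nothing left to encrypt
		}
		total += n
		sleepMs := intFlag(f, "pii_backfill_encryption_sleep_ms", 0)
		time.Sleep(time.Duration(sleepMs) * time.Millisecond)
	}
	return total
}

func main() {
	pending := 2500 // pretend 2,500 contacts still lack name_encrypted
	next := func(limit int) int {
		n := limit
		if pending < n {
			n = pending
		}
		pending -= n
		return n
	}
	flags := memFlags{"pii_backfill_encryption_enabled": "1"}
	fmt.Println(runBackfill(flags, next)) // prints 2500
}
```

Because the flags are re-read on every iteration, changing the sleep or batch size, or setting force_break, takes effect at the next batch boundary without a restart.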


Decision 2: Reconciliation checker via dedicated cron pass

Context. After the backfill completes, we need confidence that every non-deleted contact has name_encrypted before switching reads (Task X). A simple count query is insufficient — it doesn't catch docs where encryption was skipped due to a transient error mid-batch.

Decision. Implement a reconciliation checker (BackfillPIIReconciliationCron or a dedicated service method) that:

  1. Queries { name: { $exists: true, $ne: "" }, name_encrypted: { $exists: false } } for the whole collection (same query as the backfill cron uses to find missing docs).
  2. Returns a count of remaining unencrypted contacts.
  3. Logs pii_backfill_reconciliation total_missing=N.
  4. Emits a DataDog metric cdp_pii_backfill_missing_count so an alert can fire if the count rises unexpectedly after the read switch.

The reconciliation checker runs as a one-shot cron (scheduled; not continuously looping) and can also be triggered via a lightweight GET /private/contacts/backfill/pii/status endpoint that returns { missing_count, total_count, pct_complete }.

Rationale. A count query on the name_encrypted index is fast once migration 035 is applied (TF-3434 adds idx_contact_name_encrypted: { name_encrypted: 1 }). Without that index the count query would scan 150M docs. This is cheap post-migration and gives an exact gate before Phase 3.

Reversibility. Remove the cron registration; the query itself is harmless.
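
The status payload can be derived from two counts. The sketch below is an illustration under assumptions: BackfillStatus and newBackfillStatus are hypothetical names, and only the three JSON field names come from the decision text. It treats an empty collection as 100% complete to avoid a division by zero.

```go
package main

import "fmt"

// BackfillStatus mirrors the { missing_count, total_count, pct_complete }
// shape described for the status endpoint. The type name is hypothetical.
type BackfillStatus struct {
	MissingCount int64   `json:"missing_count"`
	TotalCount   int64   `json:"total_count"`
	PctComplete  float64 `json:"pct_complete"`
}

// newBackfillStatus computes the completion percentage from the two counts.
func newBackfillStatus(missing, total int64) BackfillStatus {
	s := BackfillStatus{MissingCount: missing, TotalCount: total, PctComplete: 100}
	if total > 0 {
		s.PctComplete = float64(total-missing) / float64(total) * 100
	}
	return s
}

// ReadyForReadSwitch applies the Task X gate: no contact may be missing.
func (s BackfillStatus) ReadyForReadSwitch() bool {
	return s.MissingCount == 0
}

func main() {
	s := newBackfillStatus(25, 100)
	fmt.Println(s.PctComplete, s.ReadyForReadSwitch()) // prints "75 false"
}
```

The gate checks the absolute count rather than the percentage, since a rounded percentage can read as complete while a few documents are still missing.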


Decision 3: Encrypted-first read with plaintext fallback (Phase 3)

Context. After backfill reaches 100% coverage, reads need to switch to the encrypted fields as the source of truth. The transition must be:

  • Gradual (feature-flag controlled, not a code deploy).
  • Zero-regression (if a doc somehow lacks name_encrypted, serve plaintext; never return empty).
  • Reversible (flag off → revert to plaintext reads with no data change).

Decision. Add a Redis-backed feature flag pii_read_encrypted_enabled. In all repository read/search methods that project PII fields, after deserializing the document:

if flag=on AND name_encrypted != nil:
    decryptContactPIIFields(cfg, &contact) // populates .Name etc. from *_encrypted
else:
    // plaintext .Name/.Email etc. already populated by bson unmarshal

decryptContactPIIFields already exists (crypto_helpers.go:183); this decision wires it into the read path. The flag is read on every request (Redis GET; <1 ms); no pod restart is needed to toggle.

The fallback logic is field-level, not document-level. If name_encrypted is nil but name is set, Name is already populated by unmarshal and is returned as-is. This handles:

  • Docs not yet backfilled (name plaintext → returned; degraded but not broken).
  • Future docs with partial encryption (edge case).
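
The field-level fallback can be sketched for a single field as follows. This is an illustration only: the trimmed Contact and EncryptedPayload types, resolveName, and the injected decrypt function are hypothetical and do not reproduce decryptContactPIIFields. The returned boolean marks where a fallback counter such as cdp_pii_read_plaintext_fallback would be incremented.

```go
package main

import "fmt"

// EncryptedPayload and Contact are trimmed to one field for illustration.
type EncryptedPayload struct {
	Kid, Alg, IV, Payload string
}

type Contact struct {
	Name          string            // plaintext, populated by bson unmarshal
	NameEncrypted *EncryptedPayload // nil when the doc is not yet backfilled
}

// resolveName picks the source for Name. fellBack reports that the flag was
// on but plaintext had to be served because the encrypted field was absent.
func resolveName(c Contact, flagOn bool,
	decrypt func(*EncryptedPayload) (string, error)) (name string, fellBack bool, err error) {
	if !flagOn {
		return c.Name, false, nil // legacy read path
	}
	if c.NameEncrypted == nil {
		return c.Name, c.Name != "", nil // serve plaintext; never return empty
	}
	name, err = decrypt(c.NameEncrypted)
	return name, false, err
}

func main() {
	// fakeDecrypt stands in for the real cipher so the example is runnable.
	fakeDecrypt := func(p *EncryptedPayload) (string, error) { return "dec:" + p.Payload, nil }

	backfilled := Contact{Name: "Alice", NameEncrypted: &EncryptedPayload{Payload: "x"}}
	legacy := Contact{Name: "Bob"}

	fmt.Println(resolveName(backfilled, true, fakeDecrypt)) // dec:x false <nil>
	fmt.Println(resolveName(legacy, true, fakeDecrypt))     // Bob true <nil>
}
```

A decryption error is returned to the caller rather than silently replaced by plaintext, which keeps it visible as an incident.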

Read methods to wire the flag into (all in internal/app/repository/contact/):

  • search.go: SearchWithFilters, SearchByEmail, SearchByPhone, SearchByAccountUniqueID, SearchByCompanySsoID, SearchByID, and all variants.
  • create.go: InsertContact response serializer.
  • update.go: UpdateContact return path.

Rationale. Field-level fallback is the safest approach: no document is ever "broken" by the switch, and the flag can be toggled without a deploy.

Consequences. One Redis round-trip per request (already paid by other flag checks in the service layer; can be cached per-request in context). Decryption cost: p95 ≤ 5 ms per field (verified: AES-256-GCM on typical field lengths is sub-millisecond in Go at this scale).

Reversibility. Toggle flag to 0/unset; reads revert to plaintext instantly.


Decision 4: Migrate exact-match searches to blind-index fields

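Context. SearchByEmail, SearchByPhone, and SearchByAccountUniqueID currently query plaintext fields (case-insensitive regex, exact match, and $elemMatch respectively). After Phase 4 removes the plaintext email and phone fields, the email and phone queries will return zero results. SearchByAccountUniqueID is not affected: accounts is out of scope (scope update 2026-07-01) and is neither encrypted nor unset.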

Decision. Migrate each exact-match search to the corresponding *_bidx field. The blind index is HMAC-SHA256(normalize(value), blind_key) where blind_key is derived from the encryption key via HKDF (NewConfig:44). To query: compute the blind index of the search input using the same cipher, then query { email_bidx: computedHash }.
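
A blind index of this shape can be sketched with the Go standard library. The code below is an assumption-laden illustration, not the go-utils implementation: the HKDF step is a single-block HKDF-SHA256 (RFC 5869) with an empty salt, the info label "contact-bidx" is made up, and hex output encoding is a guess. The actual parameters live in NewConfig and the go-utils cipher.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// deriveBlindKey derives a 32-byte key from the encryption key using a
// single-block HKDF-SHA256. Salt and info values here are assumptions.
func deriveBlindKey(encryptionKey []byte, info string) []byte {
	// Extract: PRK = HMAC(salt, IKM); an absent salt is HashLen zero bytes.
	ext := hmac.New(sha256.New, make([]byte, sha256.Size))
	ext.Write(encryptionKey)
	prk := ext.Sum(nil)
	// Expand: T(1) = HMAC(PRK, info || 0x01); one block yields 32 bytes.
	exp := hmac.New(sha256.New, prk)
	exp.Write([]byte(info))
	exp.Write([]byte{0x01})
	return exp.Sum(nil)
}

// blindIndex returns the keyed HMAC-SHA256 of an already normalized value.
func blindIndex(blindKey []byte, normalized string) string {
	m := hmac.New(sha256.New, blindKey)
	m.Write([]byte(normalized))
	return hex.EncodeToString(m.Sum(nil))
}

func main() {
	key := []byte("0123456789abcdef0123456789abcdef") // demo key only
	bk := deriveBlindKey(key, "contact-bidx")

	stored := blindIndex(bk, strings.ToLower("Alice@Example.com"))
	query := blindIndex(bk, strings.ToLower("alice@example.com"))
	fmt.Println(stored == query) // prints true
}
```

The write path and the query path must apply the same normalization; otherwise the same email in a different case would hash to a different index and the lookup would miss.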

Migration plan per method:

| Search method | Current filter | New filter | Normalization |
| --- | --- | --- | --- |
| SearchByEmail (search.go) | { email: { $regex: input, $options: "i" } } | { email_bidx: cfg.Cipher.BlindIndex(strings.ToLower(input)) } | lowercase |
| SearchByPhone (search.go) | { phone: input } | { phone_bidx: cfg.Cipher.BlindIndex(input) } | none (phone stored as-is) |
| SearchByAccountUniqueID (search.go) | { accounts: { $elemMatch: { unique_id: input } } } | unchanged (accounts excluded from scope; no accounts_bidx migration) | n/a |
| Name search (partial) | name_tokenized: { $all: tokens } | name_search: { $all: tokens } | tokenize (already written at encrypt time) |

Each migration is guarded by the same pii_read_encrypted_enabled flag: if flag off → old regex query; if flag on → blind-index query. This gives a single toggle for both read decryption and search migration. (Implementation: the ToFilters() method in internal/app/payload/search_contact_request.go already dispatches to the per-field search methods — the flag check can live there.)
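
The flag-guarded dispatch for the email case could look roughly like this. It is a sketch under assumptions: emailFilter is a hypothetical helper, filters are plain maps rather than bson.M, and the blind-index function is injected so the example does not depend on the real cipher. The two filter shapes follow the migration table.

```go
package main

import (
	"fmt"
	"strings"
)

// filter is a plain map standing in for bson.M in this illustration.
type filter map[string]interface{}

// emailFilter returns the legacy regex filter when the flag is off and the
// blind-index filter when it is on. bidx is the blind-index function.
func emailFilter(input string, flagOn bool, bidx func(string) string) filter {
	if flagOn {
		return filter{"email_bidx": bidx(strings.ToLower(input))}
	}
	return filter{"email": filter{"$regex": input, "$options": "i"}}
}

func main() {
	// fakeBidx stands in for the real blind-index computation.
	fakeBidx := func(s string) string { return "h(" + s + ")" }

	fmt.Println(emailFilter("Alice@Example.com", true, fakeBidx))
	fmt.Println(emailFilter("Alice@Example.com", false, fakeBidx))
}
```

Keeping both branches behind one function makes the rollback path a flag flip rather than a code change.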

Consequences. Partial search on email/phone is permanently unavailable after Phase 4. Exact-match is preserved. Callers relying on prefix/fuzzy email search must be migrated to exact-match UX before Phase 4 is triggered. This is a known product constraint documented in the Confluence RFC §2.4.

Reversibility. Toggle flag off → old regex queries restored.


Decision 5: Legacy plaintext removal gate (Phase 4)

Context. Phase 4 is irreversible at the data level (unsetting name from every document cannot be easily undone in bulk). It must be gated behind firm criteria.

Decision. Phase 4 is triggered only when all of:

  1. Reconciliation checker returns missing_count = 0 (100% backfill).
  2. pii_read_encrypted_enabled flag has been on for ≥ 7 days with zero decryption failures (cdp_pii_decrypt_error_count = 0).
  3. Fallback-read counter cdp_pii_read_plaintext_fallback is zero for 24 h (no doc is silently serving plaintext).
  4. Data Platform ETL jobs are updated to decrypt from *_encrypted (TF-2589 done).
  5. Infosec sign-off on the field-removal migration.

Phase 4 itself: a one-time migration script (new db/migrations/0NN_contact_pii_cleanup.up.json) using $unset on plaintext fields for all documents — then disable plaintext write in code.

Reversibility. Backup/point-in-time restore before the migration; Phase 4 is only executed after a documented sign-off. Source-code rollback stops the plaintext unset from running again but does not restore already-unset documents.
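
The five gate criteria lend themselves to an explicit check that reports what is still unmet. The sketch below is hypothetical: Phase4Gate and its fields are illustrative names, and the inputs would have to come from the reconciliation checker, metrics, and sign-off records described above.

```go
package main

import "fmt"

// Phase4Gate collects the inputs for the removal gate. All field names are
// hypothetical; the thresholds follow the criteria listed in Decision 5.
type Phase4Gate struct {
	MissingCount      int64 // from the reconciliation checker
	ReadFlagOnDays    int   // days pii_read_encrypted_enabled has been on
	DecryptErrors     int64 // decryption failures during that window
	FallbackReads24h  int64 // plaintext fallback reads in the last 24 h
	ETLUpdated        bool  // Data Platform ETL decrypt step done (TF-2589)
	InfosecSignedOff  bool  // sign-off on the field-removal migration
}

// Unmet lists every criterion that currently blocks Phase 4.
func (g Phase4Gate) Unmet() []string {
	var out []string
	if g.MissingCount != 0 {
		out = append(out, "reconciliation missing_count must be 0")
	}
	if g.ReadFlagOnDays < 7 || g.DecryptErrors != 0 {
		out = append(out, "read flag must be on >= 7 days with zero decrypt errors")
	}
	if g.FallbackReads24h != 0 {
		out = append(out, "plaintext fallback counter must be zero for 24 h")
	}
	if !g.ETLUpdated {
		out = append(out, "ETL jobs must decrypt from *_encrypted (TF-2589)")
	}
	if !g.InfosecSignedOff {
		out = append(out, "Infosec sign-off is required")
	}
	return out
}

func main() {
	g := Phase4Gate{ReadFlagOnDays: 3, ETLUpdated: true, InfosecSignedOff: true}
	for _, reason := range g.Unmet() {
		fmt.Println("blocked:", reason)
	}
}
```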


Repo Reading Guide + Source Verification

All anchors below are verified against contact-service at ../contact-service (as of 2026-07-01).

| Symbol | File | Line(s) | Notes |
| --- | --- | --- | --- |
| Contact struct (PII + encrypted fields) | internal/app/repository/contact/base.go | 49–80+ | Name, Email, Phone, Accounts, Usernames + *_encrypted/*_bidx/*_search counterparts |
| Config / NewConfig | internal/app/repository/contact/crypto_helpers.go | 22–55 | HKDF blind-index derivation; cipher initialization |
| encryptContactPIIFields | crypto_helpers.go | 65–175 | Pure; writes *_encrypted/*_bidx/name_search; zeros plaintext |
| decryptContactPIIFields | crypto_helpers.go | 183–266 | Pure; populates plaintext from *_encrypted |
| applyDualWriteToBSONMap | crypto_helpers.go | 350–485 | Called in update paths; adds encrypted counterparts to $set map |
| BuildPIIEncryptedUpdateFields | crypto_helpers.go | 495–593 | Backfill helper; returns bson.M of encrypted-only fields |
| PIIBackfillService.BackfillPIIByTeam | internal/app/service/pii_backfill_service.go | 38–116 | Current per-team loop (blocks HTTP; Decision 1 demotes it to single-batch) |
| SearchMissingPIIEncryptionByTeam | repository/contact/search.go | ~219 | Existing per-team query; TF-3434 adds global SearchMissingPIIEncryption (no company_sso_id) |
| BackfillNameTokenizedCron | internal/app/cron/backfill_name_tokenized.go | 1–206 | Template pattern for BackfillPIIEncryptionCron (TF-3434) |
| Address struct | internal/app/repository/contact/base.go | 78, 174 | Address *Address pointer; AddressEncrypted/AddressBidx fields added by TF-3435 |
| SearchContactsWithAddressMissingEncryption | repository/contact/search.go | (TF-3436) | Gap query using sparse index on address: { address: { $exists: true }, address_bidx: { $exists: false } } |
| BackfillAddressPII | internal/app/service/pii_backfill_service.go | (TF-3436) | Bounded HTTP backfill: max 50 pages × 100 docs per call |
| Dual-write create | internal/app/repository/contact/create.go | InsertContact | Calls encryptContactPIIFields before insert |
| Dual-write update | internal/app/repository/contact/update.go | UpdateContact, BulkUpdateFields | Call applyDualWriteToBSONMap before $set |
| Cron registration | internal/app/cron/ | worker registration file | Where BackfillPIIEncryptionCron must be registered |
| SearchByEmail (to migrate) | internal/app/repository/contact/search.go | | Switch to email_bidx lookup (Decision 4, TF-2598) |
| SearchByPhone (to migrate) | internal/app/repository/contact/search.go | | Switch to phone_bidx lookup (Decision 4, TF-2598) |
| SearchByAccountUniqueID | internal/app/repository/contact/search.go | | Excluded: accounts not encrypted; no blind-index migration |
| ToFilters() (search dispatch) | internal/app/payload/search_contact_request.go | | Add flag-guarded branch for blind-index path (Decision 4, TF-2599) |

Sequence Diagrams

Backfill Cron (Decision 1)

sequenceDiagram
participant sched as gocraft/work scheduler
participant cron as BackfillPIIEncryptionCron
participant redis as Redis
participant mongo as MongoDB

sched->>cron: trigger (scheduled interval)
cron->>redis: GET pii_backfill_encryption_active
alt already running
cron-->>sched: return (skip)
end
cron->>redis: GET pii_backfill_encryption_enabled
alt disabled
cron-->>sched: return (skip)
end
cron->>redis: SET pii_backfill_encryption_active TTL=3600
loop until no docs left OR force_break
cron->>redis: GET pii_backfill_encryption_force_break
alt force_break set
cron-->>sched: break
end
cron->>mongo: SearchMissingPIIEncryption(batch_size, default 1000)
mongo-->>cron: []Contact (plaintext)
cron->>cron: BuildPIIEncryptedUpdateFields() per contact
cron->>mongo: BulkUpdateFields(updates map[ObjectID]bson.M)
mongo-->>cron: ok
cron->>redis: GET pii_backfill_encryption_sleep_ms
cron->>cron: time.Sleep(sleepMs)
Note over cron: log pii_backfill_cron batch_processed
end
cron->>redis: DEL pii_backfill_encryption_active
Note over cron: log pii_backfill_cron completed

Encrypted-First Read (Phase 3, Decision 3)

sequenceDiagram
participant handler as contact handler
participant repo as contact repository
participant redis as Redis
participant mongo as MongoDB
participant crypto as crypto_helpers

handler->>repo: SearchWithFilters(ctx, req)
repo->>redis: GET pii_read_encrypted_enabled
alt flag = "1"
repo->>mongo: find({ email_bidx: hash(input) }) — blind-index query
mongo-->>repo: []Contact (with *_encrypted fields set)
repo->>crypto: decryptContactPIIFields(cfg, &contact)
crypto-->>repo: Contact{Name, Email, Phone, ...} populated
else flag = "0" or unset
repo->>mongo: find({ email: { $regex: input } }) — legacy query
mongo-->>repo: []Contact (plaintext fields set)
end
repo-->>handler: []Contact

Data Model (MongoDB contact collection)

The encrypted payload format follows go-utils standard (EncryptedPayload struct):

{
  "kid": "contact-key-v1",
  "alg": "AES256GCM",
  "iv": "<base64_nonce>",
  "payload": "<base64_ciphertext>"
}
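
To make the payload fields concrete, the sketch below produces and consumes a value of this shape using only the Go standard library. It is not the go-utils/crypto API: encrypt, decrypt, and newGCM are hypothetical helpers, standard base64 encoding is an assumption, and the sketch assumes the GCM authentication tag is appended to the ciphertext inside "payload" (the default behaviour of Go's Seal).

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/base64"
	"fmt"
)

// EncryptedPayload mirrors the four JSON fields shown above.
type EncryptedPayload struct {
	Kid     string `json:"kid"`
	Alg     string `json:"alg"`
	IV      string `json:"iv"`
	Payload string `json:"payload"`
}

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key) // a 32-byte key selects AES-256
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encrypt seals plaintext under a fresh random nonce.
func encrypt(key []byte, kid, plaintext string) (*EncryptedPayload, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	ct := gcm.Seal(nil, nonce, []byte(plaintext), nil) // ciphertext || tag
	return &EncryptedPayload{
		Kid:     kid,
		Alg:     "AES256GCM",
		IV:      base64.StdEncoding.EncodeToString(nonce),
		Payload: base64.StdEncoding.EncodeToString(ct),
	}, nil
}

// decrypt verifies the tag and returns the plaintext.
func decrypt(key []byte, p *EncryptedPayload) (string, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return "", err
	}
	nonce, err := base64.StdEncoding.DecodeString(p.IV)
	if err != nil {
		return "", err
	}
	if len(nonce) != gcm.NonceSize() { // Open panics on a wrong nonce length
		return "", fmt.Errorf("unexpected nonce length %d", len(nonce))
	}
	ct, err := base64.StdEncoding.DecodeString(p.Payload)
	if err != nil {
		return "", err
	}
	pt, err := gcm.Open(nil, nonce, ct, nil)
	if err != nil {
		return "", err
	}
	return string(pt), nil
}

func main() {
	key := make([]byte, 32) // all-zero demo key; real keys come from Vault
	p, _ := encrypt(key, "contact-key-v1", "Alice")
	name, err := decrypt(key, p)
	fmt.Println(name, err) // prints "Alice <nil>"
}
```

Because each call draws a fresh nonce, encrypting the same value twice yields different payloads, which is why exact-match search needs the separate deterministic blind index.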

Additive fields on each contact document (Phase 1 — done):

// encrypted payload fields
name_encrypted: EncryptedPayload (object)
email_encrypted: EncryptedPayload (object)
phone_encrypted: []EncryptedPayload (array)
usernames_encrypted: []EncryptedUsername (array: { source, flag, icon_key, value{*} })
address_encrypted: *EncryptedPayload (object — JSON-marshalled Address struct; TF-3435)

// exact search blind-index fields (HMAC-SHA256 keyed hash)
name_bidx: string
email_bidx: string
phone_bidx: []string
usernames_bidx: []UsernameBidx (array: { value_bidx })
address_bidx: string (blind-index of marshalled JSON; TF-3435)

// partial search shadow fields
name_search: []string (token array, same algorithm as name_tokenized)

// NOTE: accounts_encrypted / accounts_bidx excluded by product decision

Mongo indexes (db/migrations/):

  • Done (027): idx_contact_email_bidx: { company_sso_id: 1, is_deleted: 1, email_bidx: 1 }
  • Done (027): idx_contact_phone_bidx: { company_sso_id: 1, is_deleted: 1, phone_bidx: 1 }
  • Done (028): idx_contact_name_search: { company_sso_id: 1, is_deleted: 1, name_search: 1 }
  • TF-3434 (035): idx_contact_name_encrypted: { name_encrypted: 1 } — regular index; enables { name_encrypted: { $exists: false } } backfill query on 150M docs without a full collection scan. make migrate-up blocks 30–60 min — run off-peak before deploying TF-3434.
  • TF-3436 (036): idx_contact_address_sparse: { address: 1 } sparse — only indexes docs with a non-null address; enables the address gap query without scanning 150M docs. Run off-peak before deploying TF-3436.

APIs

No external API contract changes. Internal behavior changes only:

| Endpoint group | Phase 2 (now) | Phase 3 (read switch) | Phase 4 (cleanup) |
| --- | --- | --- | --- |
| POST/PUT /iag/v1/contacts | dual-write (plaintext + encrypted) | dual-write continues | encrypted-only write |
| GET /iag/v1/contacts/{id} | returns plaintext (read unchanged) | returns decrypted-from-encrypted | same |
| SearchContacts* | regex on plaintext | blind-index on *_bidx (flag-gated) | blind-index only |
| POST /private/contacts/backfill/pii/{team_id} | single-batch (100 docs max, returns count) | same | removed/no-op |
| GET /private/contacts/backfill/pii/status | returns { missing_count, total_count, pct_complete } | same | same |
| POST /private/contacts/backfill/pii/address | bounded loop max 50 pages × 100 docs; returns { processed_count, remaining_count } (TF-3436) | same | removed/no-op |
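
The bounded loop behind the address endpoint can be sketched as follows. The names backfillAddressBounded and AddressBackfillResult are hypothetical, and the two callbacks stand in for the repository calls; only the 50 × 100 cap and the response field names come from the table above. The cap keeps a single call at no more than 5,000 documents so it fits within an HTTP deadline.

```go
package main

import "fmt"

const (
	maxPages = 50  // pages per HTTP call
	pageSize = 100 // docs per page
)

// AddressBackfillResult mirrors the { processed_count, remaining_count }
// response shape. The type name is hypothetical.
type AddressBackfillResult struct {
	ProcessedCount int   `json:"processed_count"`
	RemainingCount int64 `json:"remaining_count"`
}

// backfillAddressBounded processes at most maxPages pages per call.
// next encrypts up to limit docs and returns how many it handled;
// remaining counts docs that still match the gap query.
func backfillAddressBounded(next func(limit int) int, remaining func() int64) AddressBackfillResult {
	var res AddressBackfillResult
	for page := 0; page < maxPages; page++ {
		n := next(pageSize)
		res.ProcessedCount += n
		if n < pageSize {
			break // short page: the gap query is exhausted
		}
	}
	res.RemainingCount = remaining()
	return res
}

func main() {
	pending := 12000 // pretend 12,000 docs have address but no address_bidx
	next := func(limit int) int {
		n := limit
		if pending < n {
			n = pending
		}
		pending -= n
		return n
	}
	remaining := func() int64 { return int64(pending) }

	// The operator repeats the call until remaining_count reaches 0.
	for {
		res := backfillAddressBounded(next, remaining)
		fmt.Printf("%+v\n", res)
		if res.RemainingCount == 0 {
			break
		}
	}
}
```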

3. High-Availability & Security

Performance

  • AES-256-GCM per-field latency: ≤ 5 ms p95 (Go stdlib implementation; verified by micro-benchmark).
  • Endpoint p95 regression: ≤ 100 ms for endpoints touching encrypted fields.
  • Backfill batch size: 1,000 contacts (cron default); configurable via Redis. HTTP endpoint cap stays at 100 per call (timeout constraint). Sleep between cron batches: configurable (default 0 ms; increase if oplog lag rises).
  • MongoDB write pressure during backfill: throttled by sleep key; monitored via Datadog mongodb.opcounters.update.

Monitoring & Alerting

Emit the following DataDog metrics (structured log events piped to DD):

| Metric | Alert threshold |
| --- | --- |
| cdp_pii_backfill_batch_processed (count) | none (progress tracking) |
| cdp_pii_backfill_missing_count (gauge, from reconciliation) | > 0 after read switch enabled |
| cdp_pii_decrypt_error_count (counter) | > 0 (any decryption error = incident) |
| cdp_pii_read_plaintext_fallback (counter) | > 0 for > 5 min after Phase 3 gate |
| cdp_pii_backfill_failed_count (counter) | > 10 in 5 min |

Logging

  • Never log plaintext PII (name, email, phone, accounts).
  • Log only: contact ObjectID, team SSO ID, field name (for encrypt/decrypt errors), phase, error category.
  • Encrypted payload and key material must never appear in logs.

Security Implications

  • AES-256-GCM provides confidentiality + integrity for stored values.
  • Blind index leaks equality pattern (same input → same hash). This is the accepted trade-off for searchability. Keyed HMAC mitigates rainbow-table attacks.
  • name_search / token fields introduce additional disclosure risk — controlled by not storing raw tokens for email/phone (only name). Full policy in Confluence RFC §3.
  • Keys must be externally managed (Vault); no hardcoded keys or keys in config files.
  • Key rotation: follow-up using MultiAlgAdapter (out of scope here).

Datalake / Datamart Impact

After Phase 4, name, email, phone, and usernames are removed from the collection (accounts is out of scope and stays as-is). ETL jobs that read from contact must add a decrypt step before transformation. This is owned by Data Platform (TF-2589). Phase 4 is blocked until TF-2589 is done.


4. Backwards Compatibility & Rollout Plan

Compatibility

  • API request/response schema unchanged through all phases.
  • Legacy plaintext fields remain readable through Phase 3.
  • Rollback at each phase is non-destructive until Phase 4 (unset migration).

Agent Execution Plan

The following tasks are ordered. Complete one fully before starting the next. Tasks I–VIII are already done; start at VIII-fix.

[DONE] I. go-utils/crypto dependency + wrapper (TF-2561)
[DONE] II. Encryption key configuration (TF-2562)
[DONE] IIa. getString → getStringOrPanic (TF-2723)
[DONE] III. Mongo migration: *_encrypted, *_bidx indexes (TF-2563)
[DONE] IV. Extend Contact struct (TF-2591)
[DONE] V. Encryption helpers in repository layer (TF-2592)
[DONE] VI. Dual-write: InsertContact, InsertContactBulk (TF-2593)
[DONE] VII. Dual-write: UpdateContact, UpdateMany, Bulk (TF-2594)
[DONE] VIIb. Usernames encryption column (TF-2866)
[DONE] VIII. HTTP backfill endpoint (per-team) (TF-2595) ← timed out

[NEXT] VIII-fix. BackfillPIIEncryptionCron — cron-based backfill (TF-3434)
⚠️ PRE-DEPLOY: run make migrate-up (migration 035: name_encrypted index) off-peak BEFORE deploying the image; it blocks 30–60 min on 150M docs.
- New migration: db/migrations/035_add_name_encrypted_index.up.json
→ { name_encrypted: 1 } regular index; required for $exists: false query at scale
- New file: internal/app/cron/backfill_pii_encryption.go
- Pattern: mirror BackfillNameTokenizedCron exactly
- Redis flags: pii_backfill_encryption_{enabled,force_break,sleep_ms,active,batch_size}
- Refactor BackfillPIIByTeam HTTP handler to process ONE batch (100 docs max) and return
- Register cron in worker setup
- Unit test: cron exits when flag unset; processes batch when enabled

[TODO] VIIIa. address_encrypted + address_bidx struct fields + crypto helpers (TF-3435)
- Depends on: TF-3434 merged first
- Add AddressEncrypted *EncryptedPayload and AddressBidx string to Contact struct (base.go)
- Extend encryptContactPIIFields: json.Marshal(c.Address) → encryptToPayload → nil pointer
- Extend decryptContactPIIFields: decryptFromPayload → json.Unmarshal → *Address
- Extend BuildPIIEncryptedUpdateFields: emit address_encrypted/address_bidx when c.Address != nil
- Round-trip test: encrypt → decrypt → Address struct fields match

[TODO] VIIIb. address backfill — sparse index + bounded HTTP endpoint (TF-3436)
- Deferred: run after all other tickets (Tasks VIII-fix through XIV) are complete
- Depends on: TF-3435 merged (needs AddressEncrypted/AddressBidx struct + helpers)
- Does NOT gate Phase 3 or Phase 4 — plaintext address remains readable throughout
⚠️ PRE-DEPLOY: run make migrate-up (migration 036: sparse index on address) off-peak.
- New migration: db/migrations/036_add_address_sparse_index.up.json
→ { address: 1 } sparse=true; required for address $exists: true query at scale
- New method: SearchContactsWithAddressMissingEncryption(ctx, limit, page)
filter: { is_deleted: false, address: { $exists: true }, address_bidx: { $exists: false } }
- New endpoint: POST /private/contacts/backfill/pii/address
→ bounded loop max 50 pages × 100 docs; returns { processed_count, remaining_count }
- Ops calls the endpoint repeatedly until remaining_count = 0 (this does not gate Task X; see the deferral note at the top of this task)

[TODO] IX. Reconciliation checker (TF-2596)
- New method: CountMissingPIIEncryption(ctx) (int64, error)
- New endpoint: GET /private/contacts/backfill/pii/status → { missing_count, total_count, pct_complete }
- Emit cdp_pii_backfill_missing_count metric
- Gate: proceed to Task X only when missing_count = 0

[TODO] X. Encrypted-first read switch (TF-2597)
- Add Redis flag: pii_read_encrypted_enabled
- Wire decryptContactPIIFields into ALL repository read paths (search.go, create.go return, update.go return)
- Field-level fallback: if *_encrypted is nil → use plaintext as-is
- Emit cdp_pii_read_plaintext_fallback counter when fallback triggers
- Integration test: insert doc with encrypted fields, toggle flag, verify response
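
The field-level fallback can be sketched for a single field as follows. Contact, EncryptedPayload, resolveName, and the callback used in place of the cdp_pii_read_plaintext_fallback counter are simplified, hypothetical stand-ins. Returning the decrypt error instead of falling back to plaintext is an assumption; the task only specifies fallback for a nil encrypted field.

```go
package main

import (
	"fmt"
	"strings"
)

// EncryptedPayload and Contact are reduced to one field for this sketch.
type EncryptedPayload struct{ Ciphertext string }

type Contact struct {
	Name          string
	NameEncrypted *EncryptedPayload
}

// resolveName returns the name a read path should expose.
// flagOn corresponds to pii_read_encrypted_enabled; onFallback stands in for
// incrementing the cdp_pii_read_plaintext_fallback counter.
func resolveName(c Contact, flagOn bool, decrypt func(*EncryptedPayload) (string, error), onFallback func()) (string, error) {
	if !flagOn {
		return c.Name, nil // flag off: plaintext path unchanged
	}
	if c.NameEncrypted == nil {
		onFallback() // doc not backfilled yet: use plaintext as-is
		return c.Name, nil
	}
	return decrypt(c.NameEncrypted)
}

// toyDecrypt is a placeholder, not real cryptography.
func toyDecrypt(p *EncryptedPayload) (string, error) {
	if !strings.HasPrefix(p.Ciphertext, "enc:") {
		return "", fmt.Errorf("malformed payload")
	}
	return strings.TrimPrefix(p.Ciphertext, "enc:"), nil
}

func main() {
	fallbacks := 0
	c := Contact{Name: "Budi"}
	name, err := resolveName(c, true, toyDecrypt, func() { fallbacks++ })
	fmt.Println(name, err, fallbacks)
}
```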

[TODO] XI. Migrate SearchByEmail + SearchByPhone to blind-index (TF-2598)
- Guard both with pii_read_encrypted_enabled flag
- SearchByEmail: query { email_bidx: cfg.Cipher.BlindIndex(strings.ToLower(input)) }
- SearchByPhone: query { phone_bidx: cfg.Cipher.BlindIndex(input) }
- SearchByAccountUniqueID: NOT migrated (accounts excluded from scope)
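
To illustrate why exact-match lookup still works on encrypted data, here is a sketch of a blind index built as a keyed HMAC-SHA256. This is a common construction and is used here as an assumption; the actual cfg.Cipher.BlindIndex in go-utils/crypto may differ. The helper names are hypothetical; the lowercasing of email input and the field names come from the task above.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// blindIndex returns a deterministic keyed digest of value. Equal inputs give
// equal digests, so an exact-match query works without storing plaintext.
// HMAC-SHA256 is an assumed construction, not the confirmed library behaviour.
func blindIndex(key []byte, value string) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(value))
	return hex.EncodeToString(mac.Sum(nil))
}

// emailFilter lowercases the input before indexing, as the task specifies.
func emailFilter(key []byte, input string) map[string]string {
	return map[string]string{"email_bidx": blindIndex(key, strings.ToLower(input))}
}

// phoneFilter indexes the input unchanged, as the task specifies.
func phoneFilter(key []byte, input string) map[string]string {
	return map[string]string{"phone_bidx": blindIndex(key, input)}
}

func main() {
	key := []byte("demo-key-for-sketch-only")
	fmt.Println(emailFilter(key, "Budi@Example.com"))
	fmt.Println(phoneFilter(key, "+628123456789"))
}
```

Because the digest changes completely for any change in the input, prefix and substring queries cannot be served from a blind index, which is the constraint recorded as OQ-1.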

[SKIP] XII. SearchByAccountUniqueID → accounts_bidx (EXCLUDED)
- accounts field excluded by product decision; no blind-index migration needed

[TODO] XIII. Update SearchContactRequest.ToFilters() (TF-2599)
- Route name search to name_search token array (already populated by dual-write)
- Route email/phone exact-match to *_bidx (guarded by flag)

[TODO] XIV. Remove legacy plaintext write + unset migration (TF-2600)
- GATE: all Phase 4 criteria from Decision 5 must pass before this task starts
- Remove plaintext field writes from create/update (dual-write → encrypted-only)
- New migration: db/migrations/0NN_contact_pii_cleanup.up.json
using $unset on name/email/phone/usernames (accounts already excluded)
- Disable pii_read_encrypted_enabled fallback code path

Verification & Rollback Recipe

Per-task verification:

| Task | Verify by | Pass criteria |
|---|---|---|
| VIII-fix (cron) | Enable flag in staging; tail logs | pii_backfill_cron batch_processed logs appear; pii_backfill_cron completed when no docs left |
| VIIIa (address fields) | Insert contact with address; read back | address_encrypted stored as BSON sub-document; decrypt returns original Address struct |
| VIIIb (address backfill) | Call POST /private/contacts/backfill/pii/address after all other tasks done | processed_count increments; remaining_count reaches 0 after all pages |
| IX (reconciler) | Call GET /private/contacts/backfill/pii/status | missing_count decreases each run; reaches 0 |
| X (read switch) | Toggle pii_read_encrypted_enabled=1 in staging; call GET /iag/v1/contacts/{id} | Response name, email, phone match plaintext values; no cdp_pii_decrypt_error_count increment |
| XI (search blind-index) | Toggle flag; search by known email/phone | Results match pre-flag results; latency within 100 ms p95 |
| XIII (ToFilters) | Toggle flag; run name/email/phone filter searches | Results consistent with pre-flag; no regex fallback triggered |
| XIV (cleanup) | Run migration on one tenant; query doc | name, email, phone fields absent; name_encrypted present |

Rollback per phase:

| Phase | Rollback action | Data safe? |
|---|---|---|
| Cron (VIII-fix) | Set pii_backfill_encryption_enabled to empty in Redis | YES (no data removed) |
| Address fields (VIIIa) | Revert base.go struct change; redeploy | YES (address_encrypted/address_bidx fields unused but harmless in existing docs) |
| Address backfill (VIIIb) | Stop calling the endpoint; no data removed | YES (address plaintext still present) |
| Read switch (X) | Set pii_read_encrypted_enabled to 0 in Redis | YES (plaintext still present) |
| Search migration (XI, XIII) | Set pii_read_encrypted_enabled to 0 → reverts to regex queries | YES |
| Cleanup (XIV) | Cannot revert unset migration without restore | Point-in-time backup required before running |

5. Concerns, Questions, or Known Limitations

| # | Item | Mitigation / Status |
|---|---|---|
| OQ-1 | Partial search on email/phone is permanently unavailable after Phase 4 | Accepted product constraint; exact-match only. Communicate to product/callers before Phase 4. |
| OQ-2 | Backfill cron processes all teams uniformly, with no priority ordering | Acceptable for now. The old HTTP endpoint (single-batch mode) remains for targeted team processing. |
| OQ-3 | Contacts with empty name AND empty email (blank contacts) will not be backfilled by the name-missing query | Add a secondary query condition OR run a separate pass after primary backfill completes. Low priority: blank PII contacts are low-risk. |
| OQ-4 | address field backfill query times out on 150M docs without an index | Resolved: sparse index on address (migration 036, TF-3436) limits the scan to the small subset of contacts that have an address sub-document. Backfill via bounded POST /private/contacts/backfill/pii/address endpoint only after TF-3434 cron reaches missing_count = 0. |
| OQ-5 | Kafka consumer and webhook paths (TF-2589) must be updated before Phase 4 | Phase 4 gate includes TF-2589 sign-off (Decision 5). |
| OQ-6 | Decryption adds ~1–5 ms per field on read path; aggregate latency on multi-field projections? | Profile in staging with realistic dataset before Phase 3 production toggle. Limit projection to required fields. |
| OQ-7 | accounts field not encrypted; gap vs. original scope? | Excluded by product decision (2026-07-01): accounts is not actively used and will not be encrypted. SearchByAccountUniqueID blind-index migration is also excluded. |

6. Comment Logs

2026-07-01 — Scope update: accounts excluded, address re-added, index gaps found (Berlianto / CDP)

  • accounts excluded (product decision): accounts is not actively used and will not be encrypted. Task XII (SearchByAccountUniqueID blind-index) removed. accounts_encrypted/accounts_bidx struct fields not added.
  • address re-added: address_encrypted + address_bidx added back to scope. Split into two tasks: TF-3435 (struct fields + crypto helpers only) and TF-3436 (sparse index migration 036 + bounded HTTP backfill endpoint). address is a pointer sub-document serialised as JSON before encryption.
  • name_encrypted index gap found: Migration 027/028 confirmed — no index on name_encrypted. Without it, the { name_encrypted: { $exists: false } } cron query scans all 150M docs on every batch. Fix: migration 035_add_name_encrypted_index.up.json added to TF-3434. Must run make migrate-up off-peak (blocks 30–60 min) before deploying TF-3434.
  • address backfill query confirmed unsafe without sparse index: Production query { address: { $exists: true } } timed out on 150M docs. Fix: sparse index on address (migration 036, TF-3436). Only indexes docs that actually have an address — makes the gap query efficient without scanning the full collection.
  • Jira corrections: TF-2598 covers both SearchByEmail + SearchByPhone (combined); TF-2599 = ToFilters; TF-2600 = Phase 4 cleanup (was TF-2602). TF-3436 is newly created.

2026-07-01 — RFC Rewrite (Berlianto / CDP)

  • Tasks I–VIII marked Done per Jira (TF-2561 through TF-2866). Implementation verified against contact-service repo (../contact-service).
  • Root cause of HTTP timeout diagnosed: PIIBackfillService.BackfillPIIByTeam (pii_backfill_service.go:41) is a synchronous pagination loop with no page cap — exhausts request deadline for large teams. Fix: Decision 1 (cron job, BackfillPIIEncryptionCron).
  • Decision 1 modeled after BackfillNameTokenizedCron (cron/backfill_name_tokenized.go) — identical Redis control flag pattern.
  • HTTP endpoint demoted to single-batch (100 docs max) for targeted use; cron drives bulk backfill.
  • Remaining tasks IX–XV scoped and anchored to TF-2596 through TF-2602.
  • Phase 4 gate criteria (Decision 5) made explicit: 100% reconciliation + 7-day soak + DataPlatform sign-off.

7. Ready for Agent Execution

Checklist

  • §1 PRD-to-Schema Derivation — all encrypted fields traced to source
  • §2 Repo Reading Guide — all file:function anchors verified against ../contact-service
  • §2 Technical Decisions — 5 ADRs with options, rationale, consequences, reversibility
  • §2 Sequence Diagrams — backfill cron + encrypted-read flows
  • §4 Agent Execution Plan — task-ordered, each with specific file + action
  • §4 Verification & Rollback Recipe — per-task pass criteria + rollback per phase
  • §5 Open Questions — logged with mitigations

Entry Bar

An agent or engineer starting Task VIII-fix must:

  1. Read internal/app/cron/backfill_name_tokenized.go in full — the new cron mirrors it exactly.
  2. Read internal/app/service/pii_backfill_service.go: the BackfillPIIByTeam loop becomes a single-batch call.
  3. Read internal/app/repository/contact/crypto_helpers.go BuildPIIEncryptedUpdateFields — this is what the cron calls per contact.
  4. Implement BackfillPIIEncryptionCron in internal/app/cron/backfill_pii_encryption.go.
  5. Refactor PIIBackfillService.BackfillPIIByTeam to process exactly one page (remove the outer for page := 1; ; page++ loop — keep the single page fetch + bulk update).
  6. Register the new cron in the worker startup.
  7. Write unit tests: flag-disabled path exits; flag-enabled path processes one batch and logs correctly.

Do not start Task IX until the cron runs to completion (missing_count = 0) in staging.