RFC: PII Encryption — Contact Service
Document Conventions (do not remove)
This RFC follows the Qontak RFC Template for governance. Sections 1–6 and Comment logs are mandatory; sections that do not apply are marked N/A with a reason. It is agent-execution-ready: §1 PRD-to-Schema Derivation (BE), §2 Repo Reading Guide with Source Verification, mermaid diagrams, and the §4 Agent Execution Plan + Verification & Rollback Recipe are the readiness gates checked in §7.
Implementation state as of 2026-07-01. Tasks I–VIII are done (go-utils/crypto dependency, key config, Mongo migration, Contact struct extension, crypto helpers, dual-write on create/update, Usernames column, per-team backfill HTTP endpoint). The endpoint approach timed out on large teams — the root cause is documented in Decision 1 and the fix is a cron-based backfill job (Task VIII-fix / TF-3434). Remaining tasks are VIII-fix through XV.
Scope update (2026-07-01):
- accounts field (accounts_encrypted, accounts_bidx) is excluded from remaining tasks by product decision and is listed under Out of Scope.
- address field (address_encrypted, address_bidx) has been re-added to scope: field additions land in Task VIIIa (TF-3435) and the backfill in Task VIIIb (TF-3436, requires sparse index migration 036 before running).

Production scale note: The contact collection has ~150M documents. The backfill cron (TF-3434) requires migration 035 (name_encrypted index) to be applied before the image is deployed; make migrate-up blocks 30–60 min on 150M documents and must run off-peak.
Metadata
| Field | Value | Notes |
|---|---|---|
| Status | RFC | Human label; YAML status: in-review. |
| DRI | Puji Triwibowo | RFC + implementation owner. |
| Type | backend-only | Pure contact-service change; no FE/mobile surface. |
| Author(s) | CDP Squad | |
| Reviewers | CDP Squad BE, Infosec, Data Platform | Infosec sign-off required (AES key scope, blind-index risk). |
| Approver(s) | TBD tech lead + TBD infosec | |
| Submitted Date | 2026-07-01 | |
| Last Updated | 2026-07-01 | |
| Target Release | 2026-Q3 | |
| Source Confluence | DRAFT RFC — PII Encryption | Original design; this RFC supersedes it with implementation state. |
| Jira Epic | TF-2480 | PII Encryption (Supporting) |
Sections at a Glance
- Overview (Problem, Success Criteria, Out of Scope, Dependencies, PRD-to-Schema, Traceability)
- Technical Design (Infra Topology, ADR Decisions, Repo Reading Guide + Source Verification, Sequence Diagrams, Data Model, APIs, Async Paths)
- High-Availability & Security
- Backwards Compatibility & Rollout Plan (Agent Execution Plan, Verification & Rollback Recipe)
- Concerns, Questions, or Known Limitations
- Comment Logs
- Ready for Agent Execution
1. Overview
Encrypt all PII fields in contact-service's MongoDB contact collection using AES-256-GCM
(go-utils/crypto, single-algorithm method). Introduce parallel blind-index fields
(*_bidx) for exact-match search and tokenized shadow fields (*_search) for partial-match
search, so search parity is preserved after plaintext fields are removed.
The migration is phased to avoid hard downtime:
| Phase | What changes | When |
|---|---|---|
| 1 — Additive | Add *_encrypted, *_bidx, *_search fields + indexes (no behavior change) | Done (TF-2563) |
| 2 — Dual-write | Every create/update writes plaintext and encrypted fields simultaneously | Done (TF-2593, TF-2594, TF-2866) |
| Backfill | Encrypt historical docs via background cron (migrated from endpoint — Decision 1) | Next (TF-3434 + migration 035) |
| Backfill address | Struct fields + crypto helpers (TF-3435); gap backfill via sparse index + HTTP endpoint (TF-3436 + migration 036) — runs after main cron completes | After TF-3434 cron finishes |
| 3 — Read switch | *_encrypted becomes primary read source; plaintext is fallback | Task X (TF-2597) |
| 4 — Cleanup | Stop writing plaintext; unset legacy fields after soak period | Task XV (TF-2600) |
Success Criteria
- 100% of non-deleted contact documents have name_encrypted populated before read switch.
- Read path serves from encrypted fields with zero functional regression (API responses identical).
- Search (exact and partial) passes parity test suite against same data.
- p95 endpoint latency increase ≤ 100 ms for read-path endpoints touching PII fields.
- No plaintext PII appears in service logs at any phase.
- Backfill job throughput ≥ 1,000 contacts/min at default sleep (MongoDB oplog lag < 5 s).
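The throughput floor above translates directly into a minimum calendar duration for the backfill. The sketch below is plain arithmetic on the numbers in this RFC (~150M documents, ≥ 1,000 contacts/min); the function name `estimateBackfill` and the higher example rates are illustrative assumptions, and the estimate assumes every document still needs backfilling.

```go
package main

import (
	"fmt"
	"time"
)

// estimateBackfill returns how long a backfill takes at a steady throughput.
// docs and perMinute are plain counts; perMinute must be > 0.
func estimateBackfill(docs, perMinute int64) time.Duration {
	minutes := (docs + perMinute - 1) / perMinute // round up to whole minutes
	return time.Duration(minutes) * time.Minute
}

func main() {
	const docs = 150_000_000 // approximate collection size stated in this RFC
	// 1,000/min is the success-criteria floor; the other rates are hypothetical.
	for _, rate := range []int64{1_000, 10_000, 50_000} {
		d := estimateBackfill(docs, rate)
		fmt.Printf("%6d contacts/min -> %.1f days\n", rate, d.Hours()/24)
	}
}
```

At the 1,000 contacts/min floor the estimate is 2,500 hours (roughly 104 days), so the sustained rate actually achieved by the cron is worth measuring early against the 2026-Q3 target.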
Out of Scope
- Client-side encryption — this RFC covers server-side / database-level encryption only.
- Key rotation (MultiAlgAdapter): follow-up after steady-state.
- Rewriting unrelated APIs or domain behavior outside the target PII fields.
- Building an external search engine — native Mongo with blind-index is the chosen approach.
- accounts field encryption (accounts_encrypted, accounts_bidx): excluded by product decision; accounts is not used and will not be encrypted. The SearchByAccountUniqueID blind-index migration (Task XII) is also excluded.
- Kafka consumer paths and webhook delivery: tracked in TF-2589 (Async/Consumer Paths); scoped out of this RFC to reduce blast radius.
Related Documents
| Title | Link | What this RFC takes from it |
|---|---|---|
| DRAFT RFC — PII Encryption (Confluence) | Confluence | Original design, field format, phase plan, search strategy |
| TF-2480 Jira Epic + child stories | Jira | Task breakdown I–XV; implementation state |
| go-utils/crypto docs | bitbucket.org/mid-kelola-indonesia/go-utils/src/master/docs/crypto_single_alg.md | AES256GCM payload format, cipher API |
Assumptions
- PII_ENCRYPTION_KEY and PII_ENCRYPTION_KEY_ID are provisioned via Vault and injected as env vars; key management is out of scope for this RFC.
- MongoDB replication lag is monitored; backfill rate is throttled via configurable sleep to stay within oplog capacity.
- Redis is available for cron control flags (same requirement as existing BackfillNameTokenizedCron).
- The read-switch feature flag (pii_read_encrypted_enabled) is toggled manually by engineering; no automatic cut-over.
Dependencies
| Dependency | Owner | Status | Blocking? |
|---|---|---|---|
| go-utils/crypto AES256GCM cipher | Platform | done (wired in contact/crypto_helpers.go) | YES (done) |
| Vault PII encryption key | Infra/Sec | done (env present in staging) | YES |
| Redis cron control | Infra | exists (shared Redis used by BackfillNameTokenizedCron) | YES |
| Mongo *_bidx indexes | DB/Infra | done (TF-2563, migration files present) | YES |
| Data Platform ETL update | Data | pending — ETL jobs must add decrypt step before Phase 4 (TF-2589) | YES for Phase 4 |
PRD-to-Schema Derivation
The contact collection lives in MongoDB. "Schema" here is the Contact Go struct + bson tags (internal/app/repository/contact/base.go).
| PII requirement | Persisted as (collection.field) | Enforced where | Source |
|---|---|---|---|
| Encrypt name at rest | name_encrypted: EncryptedPayload, name_bidx: string, name_search: []string | encryptContactPIIFields (crypto_helpers.go:65) | go-utils AES256GCM |
| Encrypt email at rest | email_encrypted: EncryptedPayload, email_bidx: string | same | |
| Encrypt phone[] at rest | phone_encrypted: []EncryptedPayload, phone_bidx: []string | same | |
| Encrypt usernames[] at rest | usernames_encrypted: []EncryptedUsername, usernames_bidx: []UsernameBidx | same | |
| Encrypt address at rest (sub-document) | address_encrypted: *EncryptedPayload, address_bidx: string (JSON-marshal the *Address struct before encrypting) | encryptContactPIIFields (TF-3435) | Sub-document: json.Marshal(c.Address) → encryptToPayload |
| accounts[] | Excluded by product decision; accounts_encrypted, accounts_bidx are not encrypted in this RFC | n/a | n/a |
| Exact search (email, phone) | *_bidx HMAC-SHA256 keyed hash (HKDF-derived blind key from encryption key) | NewConfig (crypto_helpers.go:34) | |
| Partial search (name) | name_search: []string token array (same as name_tokenized) | encryptContactPIIFields:80 | |
| Dual-write during transition | plaintext field and *_encrypted/*_bidx written atomically per document | applyDualWriteToBSONMap (crypto_helpers.go:350) | |
| Backfill historical docs | per-batch cron using BuildPIIEncryptedUpdateFields + BulkUpdateFields | BackfillPIIEncryptionCron (TF-3434); requires migration 035 (name_encrypted index) before running | |
| Backfill address gap | bounded HTTP endpoint POST /private/contacts/backfill/pii/address | BackfillAddressPII (TF-3436); requires migration 036 (sparse index on address) before running | |
2. Technical Design
Infrastructure Topology
flowchart TB
agent([API / Consumer caller]) -->|HTTPS| lb[Ingress / API Gateway]
lb --> cs["contact-service-api pods\n(chi, stateless)"]
cs -->|read/write bson| mongo[("MongoDB\ncontact collection")]
cs -->|HKDF blind-index key| cr["contact/crypto_helpers\n(in-process AES256GCM)"]
cr --> vault(["Vault\n(PII_ENCRYPTION_KEY)"])
redis[("Redis\n(cron control flags)")] -.->|enable/disable/sleep| cron["BackfillPIIEncryptionCron\n(gocraft/work, worker pod)"]
cron -->|SearchMissingPIIEncryptionByTeam\n+ BulkUpdateFields| mongo
cron -->|BuildPIIEncryptedUpdateFields| cr
cs -->|feature flag\npii_read_encrypted_enabled| redis
Verified infra: MongoDB go.mongodb.org/mongo-driver v1.12.1 (go.mod); Redis-backed gocraft/work worker (internal/app/service/job_enqueuer.go, make run-worker Makefile:64-77); existing cron pattern: BackfillNameTokenizedCron (internal/app/cron/backfill_name_tokenized.go).
Technical Decisions (ADR)
Decision 1: Convert backfill from blocking HTTP endpoint to cron job
Context. Task VIII delivered POST /private/contacts/backfill/pii/{team_id} backed by
PIIBackfillService.BackfillPIIByTeam. The service loops through all pages of 100 contacts
synchronously within the HTTP request context (pii_backfill_service.go:41). For large teams
(tens-of-thousands of contacts), the full loop exhausts the request deadline → HTTP timeout,
leaving the backfill partially done with no way to resume from the last page.
Root cause. HTTP handlers have a bounded timeout (typically 30–60 s); the pagination loop
inside a single request is unbounded — it runs until all pages are exhausted or the connection
drops. Page count = ceil(team_doc_count / 100), so any team with > ~500 contacts will time out
at default HTTP deadlines.
Options considered.
- Option A: paginated endpoint (cursor-based). Keep HTTP; return a next_page token; caller drives pagination. Pros: no infra change. Cons: requires an external orchestrator (script / pipeline) to loop; the caller must handle retries; operational overhead to script across all teams.
- Option B: enqueue per-team gocraft/work job. Endpoint enqueues a PIIBackfillJob for the given team; worker processes all pages. Pros: non-blocking; survives pod restarts. Cons: one job per team → job fan-out; no global rate limiting across teams; harder to observe cross-team progress.
- Option C: background cron job (all teams, batch-at-a-time). A BackfillPIIEncryptionCron runs on the worker pod, continuously fetches the next batch of contacts missing name_encrypted across all teams, encrypts them, and sleeps between batches. Controlled by Redis flags (enable/disable, force-break, sleep-time). Matches exactly the BackfillNameTokenizedCron pattern already in the repo. Pros: self-driving, no external orchestrator; uniform across all teams; configurable throttle; identical to proven pattern. Cons: no team-ordering control (processes whichever doc Mongo returns first from the missing-encrypted index).
Decision. Option C — implement BackfillPIIEncryptionCron following the
BackfillNameTokenizedCron pattern (verified internal/app/cron/backfill_name_tokenized.go).
The existing HTTP endpoint (BackfillPIIByTeam) is kept but demoted to a single-batch
trigger (processes max 100 contacts per call) for manual/targeted use; it no longer loops.
Cron control flags (Redis keys):
| Redis key | Effect | Default |
|---|---|---|
| pii_backfill_encryption_enabled | Master switch; cron exits early if absent/empty | disabled (safe default) |
| pii_backfill_encryption_force_break | Abort current run immediately | unset |
| pii_backfill_encryption_sleep_ms | Sleep between batches (ms); 0 = no sleep | 0 |
| pii_backfill_encryption_active | TTL-guarded mutex; prevents concurrent runs | set for 3600 s on start |
| pii_backfill_encryption_batch_size | Contacts per batch; default 1,000 | 1000 |
Rationale. The BackfillNameTokenizedCron pattern is battle-tested in production. Reusing it
gives force-break, active-job dedup, and rate control for free. The cron runs on the existing
worker pod deployment — no new infra.
Consequences. The HTTP endpoint loop is shortened to one batch for targeted use. The cron
replaces the bulk backfill. Progress visibility comes from logs (pii_backfill_cron batch_processed)
and the reconciliation checker (Decision 2 / Task IX).
Reversibility. Disable via Redis flag; HTTP endpoint loop can be re-enabled in one line.
Decision 2: Reconciliation checker via dedicated cron pass
Context. After the backfill completes, we need confidence that every non-deleted contact has
name_encrypted before switching reads (Task X). A simple count query is insufficient — it
doesn't catch docs where encryption was skipped due to a transient error mid-batch.
Decision. Implement a reconciliation checker (BackfillPIIReconciliationCron or a dedicated
service method) that:
- Queries { name: { $exists: true, $ne: "" }, name_encrypted: { $exists: false } } for the whole collection (same query as the backfill cron uses to find missing docs).
- Returns a count of remaining unencrypted contacts.
- Logs pii_backfill_reconciliation total_missing=N.
- Emits a Datadog metric cdp_pii_backfill_missing_count so an alert can fire if the count rises unexpectedly after the read switch.
The reconciliation checker runs as a one-shot cron (scheduled; not continuously looping) and can
also be triggered via a lightweight GET /private/contacts/backfill/pii/status endpoint that
returns { missing_count, total_count, pct_complete }.
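The status response shape can be sketched as a small Go type. This is a minimal illustration, not the service's handler: the type name `BackfillStatus`, the constructor `newBackfillStatus`, and the choice to report an empty collection as 100% complete are assumptions; only the three JSON field names come from this RFC.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BackfillStatus mirrors the response fields named in this RFC.
type BackfillStatus struct {
	MissingCount int64   `json:"missing_count"`
	TotalCount   int64   `json:"total_count"`
	PctComplete  float64 `json:"pct_complete"`
}

// newBackfillStatus derives pct_complete from the two counts.
// Assumption: an empty collection is reported as 100% complete.
func newBackfillStatus(missing, total int64) BackfillStatus {
	pct := 100.0
	if total > 0 {
		pct = float64(total-missing) / float64(total) * 100
	}
	return BackfillStatus{MissingCount: missing, TotalCount: total, PctComplete: pct}
}

func main() {
	// Hypothetical counts, for illustration only.
	out, err := json.Marshal(newBackfillStatus(1_500_000, 150_000_000))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Because pct_complete is a float, a handful of missing documents out of 150M can display as nearly 100; the gate for Task X should therefore compare missing_count to zero rather than rely on the percentage.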
Rationale. A count query on the name_encrypted index is fast once migration 035 is applied
(TF-3434 adds idx_contact_name_encrypted: { name_encrypted: 1 }). Without that index the count
query would scan 150M docs. This is cheap post-migration and gives an exact gate before Phase 3.
Reversibility. Remove the cron registration; the query itself is harmless.
Decision 3: Encrypted-first read with plaintext fallback (Phase 3)
Context. After backfill reaches 100% coverage, reads need to switch to the encrypted fields as the source of truth. The transition must be:
- Gradual (feature-flag controlled, not a code deploy).
- Zero-regression (if a doc somehow lacks name_encrypted, serve plaintext; never return empty).
- Reversible (flag off → revert to plaintext reads with no data change).
Decision. Add a Redis-backed feature flag pii_read_encrypted_enabled. In all repository
read/search methods that project PII fields, after deserializing the document:
if flag=on AND name_encrypted != nil:
decryptContactPIIFields(cfg, &contact) // populates .Name etc. from *_encrypted
else:
// plaintext .Name/.Email etc. already populated by bson unmarshal
decryptContactPIIFields already exists (crypto_helpers.go:183); this decision wires it into
the read path. The flag is read on every request (Redis GET; <1 ms); no pod restart is needed
to toggle.
The fallback logic is field-level, not document-level. If name_encrypted is nil but name
is set, Name is already populated by unmarshal and is returned as-is. This handles:
- Docs not yet backfilled (name plaintext → returned; degraded but not broken).
- Future docs with partial encryption (edge case).
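The field-level fallback described above can be sketched as follows. This is a simplified illustration under stated assumptions: `Contact` and `EncryptedPayload` are trimmed stand-ins for the real types, `decryptFn` is a hypothetical callback in place of the real cipher, `resolvePII` is an illustrative name (the real entry point is decryptContactPIIFields), and returning the error while leaving plaintext untouched on a decryption failure is an assumed policy.

```go
package main

import "fmt"

// EncryptedPayload and Contact are trimmed stand-ins for the real types.
type EncryptedPayload struct{ Payload string }

type Contact struct {
	Name           string
	Email          string
	NameEncrypted  *EncryptedPayload
	EmailEncrypted *EncryptedPayload
}

// decryptFn stands in for the real cipher call (hypothetical signature).
type decryptFn func(*EncryptedPayload) (string, error)

// resolvePII applies encrypted-first reads per field. It returns how many
// fields fell back to plaintext, which could feed a counter such as
// cdp_pii_read_plaintext_fallback.
func resolvePII(c *Contact, flagOn bool, decrypt decryptFn) (fallbacks int, err error) {
	if !flagOn {
		return 0, nil // legacy path: plaintext from bson unmarshal is used as-is
	}
	fields := []struct {
		plain *string
		enc   *EncryptedPayload
	}{
		{&c.Name, c.NameEncrypted},
		{&c.Email, c.EmailEncrypted},
	}
	for _, f := range fields {
		if f.enc == nil {
			if *f.plain != "" {
				fallbacks++ // not yet backfilled: keep the plaintext value
			}
			continue
		}
		v, derr := decrypt(f.enc)
		if derr != nil {
			return fallbacks, derr // assumption: surface the error, keep plaintext
		}
		*f.plain = v
	}
	return fallbacks, nil
}

func main() {
	fakeDecrypt := func(p *EncryptedPayload) (string, error) {
		return "decrypted:" + p.Payload, nil
	}
	// Name is encrypted; email has not been backfilled yet.
	c := Contact{Name: "legacy name", Email: "legacy@example.com",
		NameEncrypted: &EncryptedPayload{Payload: "abc"}}
	n, err := resolvePII(&c, true, fakeDecrypt)
	fmt.Println(c.Name, c.Email, n, err)
}
```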
Read methods to wire the flag into (all in internal/app/repository/contact/):
- search.go: SearchWithFilters, SearchByEmail, SearchByPhone, SearchByAccountUniqueID, SearchByCompanySsoID, SearchByID, and all variants.
- create.go: InsertContact response serializer.
- update.go: UpdateContact return path.
Rationale. Field-level fallback is the safest approach: no document is ever "broken" by the switch, and the flag can be toggled without a deploy.
Consequences. One Redis round-trip per request (already paid by other flag checks in the service layer; can be cached per-request in context). Decryption cost: p95 ≤ 5 ms per field (verified: AES-256-GCM on typical field lengths is sub-millisecond in Go at this scale).
Reversibility. Toggle flag to 0/unset; reads revert to plaintext instantly.
Decision 4: Migrate exact-match searches to blind-index fields
Context. SearchByEmail, SearchByPhone, and SearchByAccountUniqueID currently query
plaintext fields using case-insensitive regex / $elemMatch. After Phase 4 removes plaintext
fields, these queries will return zero results.
Decision. Migrate each exact-match search to the corresponding *_bidx field. The blind
index is HMAC-SHA256(normalize(value), blind_key) where blind_key is derived from the
encryption key via HKDF (NewConfig:44). To query: compute the blind index of the search input
using the same cipher, then query { email_bidx: computedHash }.
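As a rough illustration of the blind-index computation, the sketch below uses only the Go standard library. It is not the go-utils/crypto API: `blindIndex` and `emailBlindIndex` are hypothetical names, the hex output encoding is an assumption, and the blind key is taken as an input rather than derived via HKDF as NewConfig does.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// blindIndex returns the hex-encoded HMAC-SHA256 of value under blindKey.
// Assumption: the real cipher may encode its output differently.
func blindIndex(blindKey []byte, value string) string {
	mac := hmac.New(sha256.New, blindKey)
	mac.Write([]byte(value)) // hash.Hash.Write never returns an error
	return hex.EncodeToString(mac.Sum(nil))
}

// emailBlindIndex applies the lowercase normalization before hashing, so
// lookups match regardless of the caller's casing.
func emailBlindIndex(blindKey []byte, email string) string {
	return blindIndex(blindKey, strings.ToLower(email))
}

func main() {
	// Demo key only; the real blind key is derived from the encryption key.
	key := []byte("demo-blind-key-not-for-production")
	stored := emailBlindIndex(key, "Jane.Doe@Example.com") // written at encrypt time
	query := emailBlindIndex(key, "jane.doe@example.com")  // computed at search time
	fmt.Println("match:", stored == query)
}
```

The same normalization must run on both the write path and the query path; any drift between the two silently turns matches into misses.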
Migration plan per method:
| Search method | Current filter | New filter | Normalization |
|---|---|---|---|
| SearchByEmail (search.go) | { email: { $regex: input, $options: "i" } } | { email_bidx: cfg.Cipher.BlindIndex(strings.ToLower(input)) } | lowercase |
| SearchByPhone (search.go) | { phone: input } | { phone_bidx: cfg.Cipher.BlindIndex(input) } | none (phone stored as-is) |
| SearchByAccountUniqueID (search.go) | { accounts: { $elemMatch: { unique_id: input } } } | Not migrated: accounts is excluded from scope (see Out of Scope). Original plan: { accounts_bidx: { $elemMatch: { channel: ch, unique_id_bidx: cfg.Cipher.BlindIndex(input) } } } | none |
| Name search (partial) | name_tokenized: { $all: tokens } | name_search: { $all: tokens } | tokenize (already written at encrypt time) |
Each migration is guarded by the same pii_read_encrypted_enabled flag: if flag off → old
regex query; if flag on → blind-index query. This gives a single toggle for both read decryption
and search migration. (Implementation: the ToFilters() method in
internal/app/payload/search_contact_request.go already dispatches to the per-field search
methods — the flag check can live there.)
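A flag-guarded dispatch for the email search might look like the sketch below. The function `emailFilter` and the `bidx` callback are hypothetical; the two filter shapes are the ones listed in the migration table, expressed as plain maps rather than driver-specific bson types.

```go
package main

import (
	"fmt"
	"strings"
)

// emailFilter returns the Mongo filter document for an email search.
// bidx is a hypothetical callback that computes the blind index; it is
// only called when the flag is on.
func emailFilter(flagOn bool, input string, bidx func(string) string) map[string]any {
	if flagOn {
		// Blind-index path: exact match on the keyed hash of the normalized input.
		return map[string]any{"email_bidx": bidx(strings.ToLower(input))}
	}
	// Legacy path: case-insensitive regex on the plaintext field.
	return map[string]any{
		"email": map[string]any{"$regex": input, "$options": "i"},
	}
}

func main() {
	fakeBidx := func(s string) string { return "hash(" + s + ")" }
	fmt.Println(emailFilter(true, "Jane@Example.com", fakeBidx))
	fmt.Println(emailFilter(false, "Jane@Example.com", fakeBidx))
}
```

Keeping both branches in one function makes the single-toggle behaviour easy to test: the same input must produce either the legacy filter or the blind-index filter, never a mix.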
Consequences. Partial search on email and phone is permanently unavailable after Phase 4; only
exact match is preserved. account.unique_id is unaffected because accounts stays in plaintext
(see Out of Scope). Callers relying on prefix/fuzzy email search must be migrated to exact-match
UX before Phase 4 is triggered. This is a known product constraint documented in the
Confluence RFC §2.4.
Reversibility. Toggle flag off → old regex queries restored.
Decision 5: Legacy plaintext removal gate (Phase 4)
Context. Phase 4 is irreversible at the data level (unsetting name from every document
cannot be easily undone in bulk). It must be gated behind firm criteria.
Decision. Phase 4 is triggered only when all of:
- Reconciliation checker returns missing_count = 0 (100% backfill).
- pii_read_encrypted_enabled flag has been on for ≥ 7 days with zero decryption failures (cdp_pii_decrypt_failure_rate = 0).
- Fallback-read counter cdp_pii_read_plaintext_fallback is zero for 24 h (no doc is silently serving plaintext).
- Data Platform ETL jobs are updated to decrypt from *_encrypted (TF-2589 done).
- Infosec sign-off on the field-removal migration.
Phase 4 itself: a one-time migration script (new db/migrations/0NN_contact_pii_cleanup.up.json)
using $unset on plaintext fields for all documents — then disable plaintext write in code.
Reversibility. Backup/point-in-time restore before the migration; Phase 4 is only executed after a documented sign-off. Source-code rollback stops the plaintext unset from running again but does not restore already-unset documents.
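The Phase 4 gate in Decision 5 can be expressed as a single check that lists whatever is still unmet. This is an illustrative sketch: `Phase4Inputs`, its field names, and `phase4Blockers` are hypothetical, and how each input is collected (metrics query, manual sign-off record) is left open.

```go
package main

import "fmt"

// Phase4Inputs collects the signals named in Decision 5.
// All field names are illustrative.
type Phase4Inputs struct {
	MissingCount     int64 // from the reconciliation checker
	ReadFlagOnDays   int   // days pii_read_encrypted_enabled has been on
	DecryptFailures  int64 // decryption failures during that window
	FallbackReads24h int64 // cdp_pii_read_plaintext_fallback over the last 24 h
	ETLUpdated       bool  // Data Platform ETL decrypts from *_encrypted
	InfosecSignedOff bool  // sign-off on the field-removal migration
}

// phase4Blockers returns the unmet criteria; an empty result means the gate is open.
func phase4Blockers(in Phase4Inputs) []string {
	var blockers []string
	if in.MissingCount != 0 {
		blockers = append(blockers, "backfill incomplete (missing_count > 0)")
	}
	if in.ReadFlagOnDays < 7 {
		blockers = append(blockers, "read flag on for fewer than 7 days")
	}
	if in.DecryptFailures != 0 {
		blockers = append(blockers, "decryption failures observed")
	}
	if in.FallbackReads24h != 0 {
		blockers = append(blockers, "plaintext fallback reads in last 24 h")
	}
	if !in.ETLUpdated {
		blockers = append(blockers, "ETL jobs not yet updated")
	}
	if !in.InfosecSignedOff {
		blockers = append(blockers, "Infosec sign-off missing")
	}
	return blockers
}

func main() {
	// Hypothetical state: backfill done, flag on for only 3 days.
	in := Phase4Inputs{ReadFlagOnDays: 3, ETLUpdated: true, InfosecSignedOff: true}
	fmt.Println(phase4Blockers(in))
}
```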
Repo Reading Guide + Source Verification
All anchors below are verified against contact-service at ../contact-service (as of 2026-07-01).
| Symbol | File | Line(s) | Notes |
|---|---|---|---|
| Contact struct (PII + encrypted fields) | internal/app/repository/contact/base.go | 49–80+ | Name, Email, Phone, Accounts, Usernames + *_encrypted/*_bidx/*_search counterparts |
| Config / NewConfig | internal/app/repository/contact/crypto_helpers.go | 22–55 | HKDF blind-index derivation; cipher initialization |
| encryptContactPIIFields | crypto_helpers.go | 65–175 | Pure; writes *_encrypted/*_bidx/name_search; zeros plaintext |
| decryptContactPIIFields | crypto_helpers.go | 183–266 | Pure; populates plaintext from *_encrypted |
| applyDualWriteToBSONMap | crypto_helpers.go | 350–485 | Called in update paths; adds encrypted counterparts to $set map |
| BuildPIIEncryptedUpdateFields | crypto_helpers.go | 495–593 | Backfill helper; returns bson.M of encrypted-only fields |
| PIIBackfillService.BackfillPIIByTeam | internal/app/service/pii_backfill_service.go | 38–116 | Current per-team loop (blocks HTTP; Decision 1 demotes to single-batch) |
| SearchMissingPIIEncryptionByTeam | repository/contact/search.go | ~219 | Existing per-team query; TF-3434 adds global SearchMissingPIIEncryption (no company_sso_id) |
| BackfillNameTokenizedCron | internal/app/cron/backfill_name_tokenized.go | 1–206 | Template pattern for BackfillPIIEncryptionCron (TF-3434) |
| Address struct | internal/app/repository/contact/base.go | 78, 174 | Address *Address pointer; AddressEncrypted/AddressBidx fields added by TF-3435 |
| SearchContactsWithAddressMissingEncryption | repository/contact/search.go | (TF-3436) | Gap query using sparse index on address: { address: { $exists: true }, address_bidx: { $exists: false } } |
| BackfillAddressPII | internal/app/service/pii_backfill_service.go | (TF-3436) | Bounded HTTP backfill: max 50 pages × 100 docs per call |
| Dual-write create | internal/app/repository/contact/create.go | InsertContact | Calls encryptContactPIIFields before insert |
| Dual-write update | internal/app/repository/contact/update.go | UpdateContact, BulkUpdateFields | Call applyDualWriteToBSONMap before $set |
| Cron registration | internal/app/cron/ | worker registration file | Where BackfillPIIEncryptionCron must be registered |
| SearchByEmail (to migrate) | internal/app/repository/contact/search.go | n/a | Switch to email_bidx lookup (Decision 4, TF-2598) |
| SearchByPhone (to migrate) | internal/app/repository/contact/search.go | n/a | Switch to phone_bidx lookup (Decision 4, TF-2598) |
| SearchByAccountUniqueID | internal/app/repository/contact/search.go | n/a | Excluded: accounts not encrypted; no blind-index migration |
| ToFilters() (search dispatch) | internal/app/payload/search_contact_request.go | n/a | Add flag-guarded branch for blind-index path (Decision 4, TF-2599) |
Sequence Diagrams
Backfill Cron (Decision 1)
sequenceDiagram
participant sched as gocraft/work scheduler
participant cron as BackfillPIIEncryptionCron
participant redis as Redis
participant mongo as MongoDB
sched->>cron: trigger (scheduled interval)
cron->>redis: GET pii_backfill_encryption_active
alt already running
cron-->>sched: return (skip)
end
cron->>redis: GET pii_backfill_encryption_enabled
alt disabled
cron-->>sched: return (skip)
end
cron->>redis: SET pii_backfill_encryption_active TTL=3600
loop until no docs left OR force_break
cron->>redis: GET pii_backfill_encryption_force_break
alt force_break set
cron-->>sched: break
end
cron->>mongo: SearchMissingPIIEncryption(batch_size from Redis, default 1000)
mongo-->>cron: []Contact (plaintext)
cron->>cron: BuildPIIEncryptedUpdateFields() per contact
cron->>mongo: BulkUpdateFields(updates map[ObjectID]bson.M)
mongo-->>cron: ok
cron->>redis: GET pii_backfill_encryption_sleep_ms
cron->>cron: time.Sleep(sleepMs)
Note over cron: log pii_backfill_cron batch_processed
end
cron->>redis: DEL pii_backfill_encryption_active
Note over cron: log pii_backfill_cron completed
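The control flow in the diagram can be sketched in Go with in-memory fakes in place of Redis and MongoDB. This is a simplified sketch, not the BackfillNameTokenizedCron code: the `Flags` and `Store` interfaces, `runBackfill`, and the fakes are hypothetical. One deliberate difference from the diagram: the mutex is taken with an atomic set-if-absent (`SetNX`) instead of a separate GET followed by SET, which is an assumption about how concurrent workers would be kept from both passing the check.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// Redis keys from the cron control table.
const (
	keyEnabled    = "pii_backfill_encryption_enabled"
	keyForceBreak = "pii_backfill_encryption_force_break"
	keySleepMs    = "pii_backfill_encryption_sleep_ms"
	keyActive     = "pii_backfill_encryption_active"
	keyBatchSize  = "pii_backfill_encryption_batch_size"
)

// Flags and Store are hypothetical stand-ins for Redis and the contact repository.
type Flags interface {
	Get(key string) string                         // "" when the key is unset
	SetNX(key, val string, ttl time.Duration) bool // true only if the key was newly set
	Del(key string)
}

type Store interface {
	FetchMissing(limit int) []string // IDs of contacts lacking name_encrypted
	Encrypt(ids []string)            // stands in for encrypt + bulk update
}

// intFlag parses a numeric flag; unset, invalid, or too-small values yield def.
func intFlag(f Flags, key string, def, lowest int) int {
	n, err := strconv.Atoi(f.Get(key))
	if err != nil || n < lowest {
		return def
	}
	return n
}

// runBackfill performs one cron run and returns how many contacts it processed.
func runBackfill(f Flags, s Store) int {
	if f.Get(keyEnabled) == "" {
		return 0 // master switch is off
	}
	if !f.SetNX(keyActive, "1", 3600*time.Second) {
		return 0 // another run holds the mutex
	}
	defer f.Del(keyActive)

	processed := 0
	for f.Get(keyForceBreak) == "" { // re-read every batch so operators can abort
		ids := s.FetchMissing(intFlag(f, keyBatchSize, 1000, 1))
		if len(ids) == 0 {
			break // nothing left to encrypt
		}
		s.Encrypt(ids)
		processed += len(ids)
		time.Sleep(time.Duration(intFlag(f, keySleepMs, 0, 0)) * time.Millisecond)
	}
	return processed
}

// In-memory fakes so the sketch runs without Redis or MongoDB (TTL is ignored).
type memFlags map[string]string

func (m memFlags) Get(k string) string { return m[k] }
func (m memFlags) SetNX(k, v string, _ time.Duration) bool {
	if _, held := m[k]; held {
		return false
	}
	m[k] = v
	return true
}
func (m memFlags) Del(k string) { delete(m, k) }

type memStore struct{ missing []string }

func (s *memStore) FetchMissing(limit int) []string {
	if limit > len(s.missing) {
		limit = len(s.missing)
	}
	return s.missing[:limit]
}
func (s *memStore) Encrypt(ids []string) { s.missing = s.missing[len(ids):] }

func main() {
	flags := memFlags{keyEnabled: "1", keyBatchSize: "2"}
	store := &memStore{missing: []string{"c1", "c2", "c3", "c4", "c5"}}
	fmt.Println("processed:", runBackfill(flags, store))
}
```

Two properties of the 3600 s TTL are worth keeping in mind: a crashed run releases the mutex only when the TTL expires, and a run that lasts longer than the TTL no longer holds the mutex unless it is refreshed.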
Encrypted-First Read (Phase 3, Decision 3)
sequenceDiagram
participant handler as contact handler
participant repo as contact repository
participant redis as Redis
participant mongo as MongoDB
participant crypto as crypto_helpers
handler->>repo: SearchWithFilters(ctx, req)
repo->>redis: GET pii_read_encrypted_enabled
alt flag = "1"
repo->>mongo: find({ email_bidx: hash(input) }) — blind-index query
mongo-->>repo: []Contact (with *_encrypted fields set)
repo->>crypto: decryptContactPIIFields(cfg, &contact)
crypto-->>repo: Contact{Name, Email, Phone, ...} populated
else flag = "0" or unset
repo->>mongo: find({ email: { $regex: input } }) — legacy query
mongo-->>repo: []Contact (plaintext fields set)
end
repo-->>handler: []Contact
Data Model (MongoDB contact collection)
The encrypted payload format follows go-utils standard (EncryptedPayload struct):
{
"kid": "contact-key-v1",
"alg": "AES256GCM",
"iv": "<base64_nonce>",
"payload": "<base64_ciphertext>"
}
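To make the payload fields concrete, the following sketch produces and consumes the same four-field shape using only the Go standard library. It is not the go-utils/crypto implementation: the function names are hypothetical, and it assumes the GCM authentication tag is appended to the ciphertext inside `payload` and that both `iv` and `payload` use standard base64.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/base64"
	"errors"
	"fmt"
)

// EncryptedPayload mirrors the four fields of the stored format.
type EncryptedPayload struct {
	Kid     string `json:"kid"`
	Alg     string `json:"alg"`
	IV      string `json:"iv"`
	Payload string `json:"payload"`
}

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key) // a 32-byte key selects AES-256
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encrypt seals plaintext under a fresh random nonce.
func encrypt(key []byte, kid, plaintext string) (*EncryptedPayload, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	ct := gcm.Seal(nil, nonce, []byte(plaintext), nil) // ciphertext with tag appended
	return &EncryptedPayload{
		Kid:     kid,
		Alg:     "AES256GCM",
		IV:      base64.StdEncoding.EncodeToString(nonce),
		Payload: base64.StdEncoding.EncodeToString(ct),
	}, nil
}

// decrypt verifies the tag and returns the plaintext.
func decrypt(key []byte, p *EncryptedPayload) (string, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return "", err
	}
	nonce, err := base64.StdEncoding.DecodeString(p.IV)
	if err != nil {
		return "", err
	}
	if len(nonce) != gcm.NonceSize() {
		return "", errors.New("unexpected nonce length") // Open would panic otherwise
	}
	ct, err := base64.StdEncoding.DecodeString(p.Payload)
	if err != nil {
		return "", err
	}
	pt, err := gcm.Open(nil, nonce, ct, nil)
	if err != nil {
		return "", err
	}
	return string(pt), nil
}

func main() {
	key := make([]byte, 32) // demo key; the real key comes from Vault
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	p, err := encrypt(key, "contact-key-v1", "Jane Doe")
	if err != nil {
		panic(err)
	}
	name, err := decrypt(key, p)
	fmt.Println(p.Alg, name, err)
}
```

Because each call draws a fresh nonce, encrypting the same value twice yields different payloads, which is why equality search needs the separate *_bidx fields.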
Additive fields on each contact document (Phase 1 — done):
// encrypted payload fields
name_encrypted: EncryptedPayload (object)
email_encrypted: EncryptedPayload (object)
phone_encrypted: []EncryptedPayload (array)
usernames_encrypted: []EncryptedUsername (array: { source, flag, icon_key, value{*} })
address_encrypted: *EncryptedPayload (object — JSON-marshalled Address struct; TF-3435)
// exact search blind-index fields (HMAC-SHA256 keyed hash)
name_bidx: string
email_bidx: string
phone_bidx: []string
usernames_bidx: []UsernameBidx (array: { value_bidx })
address_bidx: string (blind-index of marshalled JSON; TF-3435)
// partial search shadow fields
name_search: []string (token array, same algorithm as name_tokenized)
// NOTE: accounts_encrypted / accounts_bidx excluded by product decision
Mongo indexes (db/migrations/):
- Done (027): idx_contact_email_bidx: { company_sso_id: 1, is_deleted: 1, email_bidx: 1 }
- Done (027): idx_contact_phone_bidx: { company_sso_id: 1, is_deleted: 1, phone_bidx: 1 }
- Done (028): idx_contact_name_search: { company_sso_id: 1, is_deleted: 1, name_search: 1 }
- TF-3434 (035): idx_contact_name_encrypted: { name_encrypted: 1 } (regular index). Enables the { name_encrypted: { $exists: false } } backfill query on 150M docs without a full collection scan. make migrate-up blocks 30–60 min; run off-peak before deploying TF-3434.
- TF-3436 (036): idx_contact_address_sparse: { address: 1 }, sparse. Only indexes docs in which the address field is present (a MongoDB sparse index also includes docs where the field is explicitly null); enables the address gap query without scanning 150M docs. Run off-peak before deploying TF-3436.
APIs
No external API contract changes. Internal behavior changes only:
| Endpoint group | Phase 2 (now) | Phase 3 (read switch) | Phase 4 (cleanup) |
|---|---|---|---|
| POST/PUT /iag/v1/contacts | dual-write (plaintext + encrypted) | dual-write continues | encrypted-only write |
| GET /iag/v1/contacts/{id} | returns plaintext (read unchanged) | returns decrypted-from-encrypted | same |
| SearchContacts* | regex on plaintext | blind-index on *_bidx (flag-gated) | blind-index only |
| POST /private/contacts/backfill/pii/{team_id} | single-batch (100 docs max, returns count) | same | removed/no-op |
| GET /private/contacts/backfill/pii/status | returns { missing_count, total_count, pct_complete } | same | same |
| POST /private/contacts/backfill/pii/address | bounded loop max 50 pages × 100 docs; returns { processed_count, remaining_count } (TF-3436) | same | removed/no-op |
3. High-Availability & Security
Performance
- AES-256-GCM per-field latency: ≤ 5 ms p95 (Go stdlib implementation; verified by micro-benchmark).
- Endpoint p95 regression: ≤ 100 ms for endpoints touching encrypted fields.
- Backfill batch size: 1,000 contacts (cron default); configurable via Redis. HTTP endpoint cap stays at 100 per call (timeout constraint). Sleep between cron batches: configurable (default 0 ms; increase if oplog lag rises).
- MongoDB write pressure during backfill: throttled by the pii_backfill_encryption_sleep_ms key; monitored via Datadog mongodb.opcounters.update.
Monitoring & Alerting
Emit the following Datadog metrics (structured log events piped to DD):
| Metric | Alert threshold |
|---|---|
| cdp_pii_backfill_batch_processed (count) | none (progress tracking) |
| cdp_pii_backfill_missing_count (gauge, from reconciliation) | > 0 after read switch enabled |
| cdp_pii_decrypt_error_count (counter) | > 0 (any decryption error = incident) |
| cdp_pii_read_plaintext_fallback (counter) | > 0 for > 5 min after Phase 3 gate |
| cdp_pii_backfill_failed_count (counter) | > 10 in 5 min |
Logging
- Never log plaintext PII (name, email, phone, accounts).
- Log only: contact ObjectID, team SSO ID, field name (for encrypt/decrypt errors), phase, error category.
- Encrypted payload and key material must never appear in logs.
Security Implications
- AES-256-GCM provides confidentiality + integrity for stored values.
- Blind index leaks equality pattern (same input → same hash). This is the accepted trade-off for searchability. Keyed HMAC mitigates rainbow-table attacks.
- name_search / token fields introduce additional disclosure risk, controlled by not storing raw tokens for email/phone (only name). Full policy in Confluence RFC §3.
- Keys must be externally managed (Vault); no hardcoded keys or keys in config files.
- Key rotation: follow-up using MultiAlgAdapter (out of scope here).
Datalake / Datamart Impact
After Phase 4, the plaintext name, email, and phone fields are removed from the collection
(accounts stays in plaintext; see Out of Scope). ETL jobs that read from contact must add a
decrypt step before transformation. This is owned by Data Platform (TF-2589). Phase 4 is
blocked until TF-2589 is done.
4. Backwards Compatibility & Rollout Plan
Compatibility
- API request/response schema unchanged through all phases.
- Legacy plaintext fields remain readable through Phase 3.
- Rollback at each phase is non-destructive until Phase 4 (unset migration).
Agent Execution Plan
The following tasks are ordered. Complete one fully before starting the next. Tasks I–VIII are already done; start at VIII-fix.
[DONE] I. go-utils/crypto dependency + wrapper (TF-2561)
[DONE] II. Encryption key configuration (TF-2562)
[DONE] IIa. getString → getStringOrPanic (TF-2723)
[DONE] III. Mongo migration: *_encrypted, *_bidx indexes (TF-2563)
[DONE] IV. Extend Contact struct (TF-2591)
[DONE] V. Encryption helpers in repository layer (TF-2592)
[DONE] VI. Dual-write: InsertContact, InsertContactBulk (TF-2593)
[DONE] VII. Dual-write: UpdateContact, UpdateMany, Bulk (TF-2594)
[DONE] VIIb. Usernames encryption column (TF-2866)
[DONE] VIII. HTTP backfill endpoint (per-team) (TF-2595) ← timed out
[NEXT] VIII-fix. BackfillPIIEncryptionCron — cron-based backfill (TF-3434)
⚠️ PRE-DEPLOY: run make migrate-up (migration 035: name_encrypted index)
off-peak BEFORE deploying image — blocks 30–60 min on 150M docs.
- New migration: db/migrations/035_add_name_encrypted_index.up.json
→ { name_encrypted: 1 } regular index; required for $exists: false query at scale
- New file: internal/app/cron/backfill_pii_encryption.go
- Pattern: mirror BackfillNameTokenizedCron exactly
- Redis flags: pii_backfill_encryption_{enabled,force_break,sleep_ms,active,batch_size}
- Refactor BackfillPIIByTeam HTTP handler to process ONE batch (100 docs max) and return
- Register cron in worker setup
- Unit test: cron exits when flag unset; processes batch when enabled
[TODO] VIIIa. address_encrypted + address_bidx struct fields + crypto helpers (TF-3435)
- Depends on: TF-3434 merged first
- Add AddressEncrypted *EncryptedPayload and AddressBidx string to Contact struct (base.go)
- Extend encryptContactPIIFields: json.Marshal(c.Address) → encryptToPayload → nil pointer
- Extend decryptContactPIIFields: decryptFromPayload → json.Unmarshal → *Address
- Extend BuildPIIEncryptedUpdateFields: emit address_encrypted/address_bidx when c.Address != nil
- Round-trip test: encrypt → decrypt → Address struct fields match
[TODO] VIIIb. address backfill — sparse index + bounded HTTP endpoint (TF-3436)
- Deferred: run after all other tickets (Tasks VIII-fix through XIV) are complete
- Depends on: TF-3435 merged (needs AddressEncrypted/AddressBidx struct + helpers)
- Does NOT gate Phase 3 or Phase 4 — plaintext address remains readable throughout
⚠️ PRE-DEPLOY: run make migrate-up (migration 036: sparse index on address) off-peak.
- New migration: db/migrations/036_add_address_sparse_index.up.json
→ { address: 1 } sparse=true; required for address $exists: true query at scale
- New method: SearchContactsWithAddressMissingEncryption(ctx, limit, page)
filter: { is_deleted: false, address: { $exists: true }, address_bidx: { $exists: false } }
- New endpoint: POST /private/contacts/backfill/pii/address
→ bounded loop max 50 pages × 100 docs; returns { processed_count, remaining_count }
- Ops calls the endpoint repeatedly until remaining_count = 0; per the deferral note above, this does not gate Task X
[TODO] IX. Reconciliation checker (TF-2596)
- New method: CountMissingPIIEncryption(ctx) (int64, error)
- New endpoint: GET /private/contacts/backfill/pii/status → { missing_count, total_count, pct_complete }
- Emit cdp_pii_backfill_missing_count metric
- Gate: proceed to Task X only when missing_count = 0
[TODO] X. Encrypted-first read switch (TF-2597)
- Add Redis flag: pii_read_encrypted_enabled
- Wire decryptContactPIIFields into ALL repository read paths (search.go, create.go return, update.go return)
- Field-level fallback: if *_encrypted is nil → use plaintext as-is
- Emit cdp_pii_read_plaintext_fallback counter when fallback triggers
- Integration test: insert doc with encrypted fields, toggle flag, verify response
[TODO] XI. Migrate SearchByEmail + SearchByPhone to blind-index (TF-2598)
- Guard both with pii_read_encrypted_enabled flag
- SearchByEmail: query { email_bidx: cfg.Cipher.BlindIndex(strings.ToLower(input)) }
- SearchByPhone: query { phone_bidx: cfg.Cipher.BlindIndex(input) }
- SearchByAccountUniqueID: NOT migrated (accounts excluded from scope)
[SKIP] XII. SearchByAccountUniqueID → accounts_bidx (EXCLUDED)
- accounts field excluded by product decision; no blind-index migration needed
[TODO] XIII. Update SearchContactRequest.ToFilters() (TF-2599)
- Route name search to name_search token array (already populated by dual-write)
- Route email/phone exact-match to *_bidx (guarded by flag)
[TODO] XIV. Remove legacy plaintext write + unset migration (TF-2600)
- GATE: all Phase 4 criteria from Decision 5 must pass before this task starts
- Remove plaintext field writes from create/update (dual-write → encrypted-only)
- New migration: db/migrations/0NN_contact_pii_cleanup.up.json
using $unset on name/email/phone/usernames (accounts already excluded)
- Disable pii_read_encrypted_enabled fallback code path
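
The address round-trip in Task VIIIa (marshal to JSON, encrypt, decrypt, unmarshal) can be sketched as follows. This is a minimal illustration, not the service code: the `Address` and `EncryptedPayload` shapes are hypothetical, and stdlib AES-GCM stands in for whatever go-utils/crypto does inside encryptToPayload / decryptFromPayload.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/json"
	"errors"
	"fmt"
)

// Address and EncryptedPayload are hypothetical stand-ins for the types in base.go.
type Address struct {
	Street string `json:"street"`
	City   string `json:"city"`
}

type EncryptedPayload struct {
	Nonce      []byte
	Ciphertext []byte
}

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key) // key must be 16, 24, or 32 bytes
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encryptAddress serialises the address as JSON and seals it with AES-GCM.
// A nil address yields a nil payload so the caller can skip the field.
func encryptAddress(key []byte, a *Address) (*EncryptedPayload, error) {
	if a == nil {
		return nil, nil
	}
	plain, err := json.Marshal(a)
	if err != nil {
		return nil, err
	}
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return &EncryptedPayload{Nonce: nonce, Ciphertext: gcm.Seal(nil, nonce, plain, nil)}, nil
}

// decryptAddress reverses encryptAddress; a nil payload yields a nil address.
func decryptAddress(key []byte, p *EncryptedPayload) (*Address, error) {
	if p == nil {
		return nil, nil
	}
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(p.Nonce) != gcm.NonceSize() {
		return nil, errors.New("invalid nonce size")
	}
	plain, err := gcm.Open(nil, p.Nonce, p.Ciphertext, nil)
	if err != nil {
		return nil, err
	}
	var a Address
	if err := json.Unmarshal(plain, &a); err != nil {
		return nil, err
	}
	return &a, nil
}

func main() {
	key := make([]byte, 32) // all-zero demo key; never use outside a sketch
	in := &Address{Street: "Jl. Sudirman 1", City: "Jakarta"}
	enc, err := encryptAddress(key, in)
	if err != nil {
		panic(err)
	}
	out, err := decryptAddress(key, enc)
	if err != nil {
		panic(err)
	}
	fmt.Println(*out == *in) // round-trip check
}
```

The nil guards on both sides are what keep contacts without an address from gaining an empty address_encrypted value.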
Verification & Rollback Recipe
Per-task verification:
| Task | Verify by | Pass criteria |
|---|---|---|
| VIII-fix (cron) | Enable flag in staging; tail logs | pii_backfill_cron batch_processed logs appear; pii_backfill_cron completed when no docs left |
| VIIIa (address fields) | Insert contact with address; read back | address_encrypted stored as BSON sub-document; decrypt returns original Address struct |
| VIIIb (address backfill) | Call POST /private/contacts/backfill/pii/address after all other tasks done | processed_count increments; remaining_count reaches 0 after all pages |
| IX (reconciler) | Call GET /private/contacts/backfill/pii/status | missing_count decreases each run; reaches 0 |
| X (read switch) | Toggle pii_read_encrypted_enabled=1 in staging; call GET /iag/v1/contacts/{id} | Response name, email, phone match plaintext values; no cdp_pii_decrypt_error_count increment |
| XI (search blind-index) | Toggle flag; search by known email/phone | Results match pre-flag results; latency within 100 ms p95 |
| XIII (ToFilters) | Toggle flag; run name/email/phone filter searches | Results consistent with pre-flag; no regex fallback triggered |
| XIV (cleanup) | Run migration on one tenant; query doc | name, email, phone fields absent; name_encrypted present |
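
The pass criteria for VIIIb (processed_count increments, remaining_count reaches 0) follow from the bounded loop of at most 50 pages × 100 docs per call. A rough sketch of that loop, assuming a hypothetical in-memory `store` in place of the repository and assuming that encrypted docs drop out of the filter so each iteration can re-query the head of the result set:

```go
package main

import "fmt"

const (
	maxPages = 50  // page cap per HTTP call
	pageSize = 100 // docs per page
)

// BackfillResult mirrors the { processed_count, remaining_count } response.
type BackfillResult struct {
	ProcessedCount int `json:"processed_count"`
	RemainingCount int `json:"remaining_count"`
}

// store is a hypothetical stand-in for the repository; pending is the number
// of contacts that have an address but no address_bidx yet.
type store struct{ pending int }

// fetchAndEncrypt handles up to limit pending docs and reports how many it did.
func (s *store) fetchAndEncrypt(limit int) int {
	n := limit
	if s.pending < n {
		n = s.pending
	}
	s.pending -= n
	return n
}

// runBoundedBackfill processes at most maxPages pages and then returns,
// so a single request can never run unbounded.
func runBoundedBackfill(s *store) BackfillResult {
	processed := 0
	for page := 0; page < maxPages; page++ {
		n := s.fetchAndEncrypt(pageSize)
		processed += n
		if n < pageSize {
			break // short page: nothing left to process
		}
	}
	return BackfillResult{ProcessedCount: processed, RemainingCount: s.pending}
}

func main() {
	s := &store{pending: 12345}
	for call := 1; ; call++ {
		res := runBoundedBackfill(s)
		fmt.Printf("call %d: processed=%d remaining=%d\n", call, res.ProcessedCount, res.RemainingCount)
		if res.RemainingCount == 0 {
			break
		}
	}
}
```

With 12,345 pending docs the caller needs three calls: two full calls of 5,000 docs each and a final short one.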
Rollback per phase:
| Phase | Rollback action | Data safe? |
|---|---|---|
| Cron (VIII-fix) | Set pii_backfill_encryption_enabled to empty in Redis | YES — no data removed |
| Address fields (VIIIa) | Revert base.go struct change; redeploy | YES — address_encrypted/address_bidx fields unused but harmless in existing docs |
| Address backfill (VIIIb) | Stop calling the endpoint; no data removed | YES — address plaintext still present |
| Read switch (X) | Set pii_read_encrypted_enabled to 0 in Redis | YES — plaintext still present |
| Search migration (XI, XIII) | Set pii_read_encrypted_enabled to 0 → reverts to regex queries | YES |
| Cleanup (XIV) | Cannot revert the $unset migration without a restore | NO, unless a point-in-time backup is taken before running |
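
The read-switch rollback is safe because the flag only selects which copy of a field is returned. A minimal sketch of that decision with the field-level fallback from Task X; `piiReader`, `resolveField`, and the placeholder decrypt function are hypothetical names for illustration, and the counter field stands in for the cdp_pii_read_plaintext_fallback metric:

```go
package main

import (
	"errors"
	"fmt"
)

// piiReader is a hypothetical helper, not the actual repository type.
type piiReader struct {
	encryptedReadEnabled bool // mirrors the pii_read_encrypted_enabled flag
	plaintextFallbacks   int  // stands in for cdp_pii_read_plaintext_fallback
	decrypt              func(payload []byte) (string, error)
}

// resolveField picks the value to return for one PII field.
func (r *piiReader) resolveField(plaintext string, encrypted []byte) (string, error) {
	if !r.encryptedReadEnabled {
		return plaintext, nil // flag off: legacy read path, also the rollback state
	}
	if encrypted == nil {
		r.plaintextFallbacks++ // doc not backfilled yet: use plaintext as-is
		return plaintext, nil
	}
	return r.decrypt(encrypted)
}

func main() {
	r := &piiReader{
		encryptedReadEnabled: true,
		decrypt: func(p []byte) (string, error) {
			if len(p) == 0 {
				return "", errors.New("empty payload")
			}
			return string(p), nil // placeholder for real decryption
		},
	}
	name, _ := r.resolveField("Budi", []byte("Budi"))
	email, _ := r.resolveField("budi@example.com", nil)
	fmt.Println(name, email, r.plaintextFallbacks)
}
```

Setting the flag back to off returns every read to the plaintext branch without touching stored data, which is why phases X, XI, and XIII are marked data-safe.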
5. Concerns, Questions, or Known Limitations
| # | Item | Mitigation / Status |
|---|---|---|
| OQ-1 | Partial search on email/phone is permanently unavailable after Phase 4 | Accepted product constraint; exact-match only. Communicate to product/callers before Phase 4. |
| OQ-2 | Backfill cron processes all teams uniformly — no priority ordering | Acceptable for now. The old HTTP endpoint (single-batch mode) remains for targeted team processing. |
| OQ-3 | Contacts with empty name AND empty email (blank contacts) will not be backfilled by the name-missing query | Add a secondary query condition OR run a separate pass after primary backfill completes. Low priority — blank PII contacts are low-risk. |
| OQ-4 | address field backfill query times out on 150M docs without an index | Resolved: sparse index on address (migration 036, TF-3436) limits the scan to the small subset of contacts that have an address sub-document. Backfill via bounded POST /private/contacts/backfill/pii/address endpoint only after TF-3434 cron reaches missing_count = 0. |
| OQ-5 | Kafka consumer and webhook paths (TF-2589) must be updated before Phase 4 | Phase 4 gate includes TF-2589 sign-off (Decision 5). |
| OQ-6 | Decryption adds ~1–5 ms per field on read path — aggregate latency on multi-field projections? | Profile in staging with realistic dataset before Phase 3 production toggle. Limit projection to required fields. |
| OQ-7 | accounts field not encrypted — gap vs. original scope? | Excluded by product decision (2026-07-01): accounts is not actively used and will not be encrypted. SearchByAccountUniqueID blind-index migration is also excluded. |
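
OQ-1 follows directly from how a blind index works: the stored value is a keyed hash of the whole normalised input, so only exact matches can be looked up. The sketch below assumes an HMAC-SHA256 construction; what cfg.Cipher.BlindIndex actually computes is not specified in this RFC, and `blindIndex` / `emailBlindIndex` are hypothetical names.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// blindIndex returns a deterministic keyed hash of value (assumed HMAC-SHA256).
func blindIndex(key []byte, value string) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(value))
	return hex.EncodeToString(mac.Sum(nil))
}

// emailBlindIndex lowercases the email first, as the SearchByEmail task does.
func emailBlindIndex(key []byte, email string) string {
	return blindIndex(key, strings.ToLower(email))
}

func main() {
	key := []byte("demo-key-not-for-production")
	stored := emailBlindIndex(key, "Alice@Example.com")

	// Same address in different case hashes to the same index: exact match works.
	fmt.Println(stored == emailBlindIndex(key, "alice@example.COM"))

	// A prefix hashes to an unrelated value: partial search cannot work.
	fmt.Println(stored == emailBlindIndex(key, "alice@"))
}
```

The first comparison prints true and the second false, which is the exact-match-only constraint callers need to be told about before Phase 4.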
6. Comment Logs
2026-07-01 — Scope update: accounts excluded, address re-added, index gaps found (Berlianto / CDP)
- accounts excluded: product decision; accounts is not actively used and will not be encrypted. Task XII (SearchByAccountUniqueID blind-index) removed. accounts_encrypted / accounts_bidx struct fields not added.
- address re-added: address_encrypted + address_bidx added back to scope. Split into two tasks: TF-3435 (struct fields + crypto helpers only) and TF-3436 (sparse index migration 036 + bounded HTTP backfill endpoint). address is a pointer sub-document serialised as JSON before encryption.
- name_encrypted index gap found: migrations 027/028 confirm there is no index on name_encrypted. Without it, the { name_encrypted: { $exists: false } } cron query scans all 150M docs on every batch. Fix: migration 035_add_name_encrypted_index.up.json added to TF-3434. Must run make migrate-up off-peak (blocks 30–60 min) before deploying TF-3434.
- address backfill query confirmed unsafe without sparse index: production query { address: { $exists: true } } timed out on 150M docs. Fix: sparse index on address (migration 036, TF-3436). It indexes only docs that actually have an address, which makes the gap query efficient without scanning the full collection.
- Jira corrections: TF-2598 covers both SearchByEmail + SearchByPhone (combined); TF-2599 = ToFilters; TF-2600 = Phase 4 cleanup (was TF-2602). TF-3436 created new.
2026-07-01 — RFC Rewrite (Berlianto / CDP)
- Tasks I–VIII marked Done per Jira (TF-2561 through TF-2866). Implementation verified against the contact-service repo (../contact-service).
- Root cause of HTTP timeout diagnosed: PIIBackfillService.BackfillPIIByTeam (pii_backfill_service.go:41) is a synchronous pagination loop with no page cap, so it exhausts the request deadline for large teams. Fix: Decision 1 (cron job, BackfillPIIEncryptionCron).
- Decision 1 modeled after BackfillNameTokenizedCron (cron/backfill_name_tokenized.go), with an identical Redis control flag pattern.
- HTTP endpoint demoted to single-batch (100 docs max) for targeted use; cron drives bulk backfill.
- Remaining tasks IX–XV scoped and anchored to TF-2596 through TF-2602.
- Phase 4 gate criteria (Decision 5) made explicit: 100% reconciliation + 7-day soak + DataPlatform sign-off.
7. Ready for Agent Execution
Checklist
- §1 PRD-to-Schema Derivation: all encrypted fields traced to source
- §2 Repo Reading Guide: all file:function anchors verified against ../contact-service
- §2 Technical Decisions: 5 ADRs with options, rationale, consequences, reversibility
- §2 Sequence Diagrams: backfill cron + encrypted-read flows
- §4 Agent Execution Plan: task-ordered, each with specific file + action
- §4 Verification & Rollback Recipe: per-task pass criteria + rollback per phase
- §5 Open Questions: logged with mitigations
Entry Bar
An agent or engineer starting Task VIII-fix must:
- Read internal/app/cron/backfill_name_tokenized.go in full; the new cron mirrors it exactly.
- Read internal/app/service/pii_backfill_service.go: the BackfillPIIByTeam loop becomes a single-batch call.
- Read BuildPIIEncryptedUpdateFields in internal/app/repository/contact/crypto_helpers.go; this is what the cron calls per contact.
- Implement BackfillPIIEncryptionCron in internal/app/cron/backfill_pii_encryption.go.
- Refactor PIIBackfillService.BackfillPIIByTeam to process exactly one page (remove the outer for page := 1; ; page++ loop and keep the single page fetch + bulk update).
- Register the new cron in the worker startup.
- Write unit tests: flag-disabled path exits; flag-enabled path processes one batch and logs correctly.
Do not start Task IX until the cron runs to completion (missing_count = 0) in staging.
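
As orientation for the steps above, the control flow of a flag-driven backfill cron might look like the sketch below. It is not a copy of BackfillNameTokenizedCron: the flag semantics are inferred from the flag names only, `flagStore` is a hypothetical in-memory stand-in for Redis, and `processBatch` stands in for the fetch + BuildPIIEncryptedUpdateFields + bulk update step.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

const flagPrefix = "pii_backfill_encryption_"

// flagStore is a hypothetical in-memory stand-in for the Redis flags.
type flagStore map[string]string

// intOr reads a positive integer flag, falling back to def when unset or invalid.
func (f flagStore) intOr(name string, def int) int {
	n, err := strconv.Atoi(f[flagPrefix+name])
	if err != nil || n <= 0 {
		return def
	}
	return n
}

// runBackfill processes batches until no docs are left, the force_break flag
// is set, or the job is disabled. It returns the number of docs processed.
func runBackfill(flags flagStore, processBatch func(limit int) int) int {
	if flags[flagPrefix+"enabled"] == "" {
		return 0 // flag unset: exit immediately
	}
	if flags[flagPrefix+"active"] != "" {
		return 0 // assumed meaning: another run is already in progress
	}
	flags[flagPrefix+"active"] = "1"
	defer delete(flags, flagPrefix+"active")

	total := 0
	for {
		if flags[flagPrefix+"force_break"] != "" {
			break // operator requested a stop between batches
		}
		n := processBatch(flags.intOr("batch_size", 100))
		total += n
		if n == 0 {
			break // nothing left to backfill
		}
		// Throttle between batches; re-read each loop so ops can tune it live.
		time.Sleep(time.Duration(flags.intOr("sleep_ms", 0)) * time.Millisecond)
	}
	return total
}

func main() {
	pending := 250
	batch := func(limit int) int {
		n := limit
		if pending < n {
			n = pending
		}
		pending -= n
		return n
	}
	flags := flagStore{flagPrefix + "enabled": "1", flagPrefix + "sleep_ms": "1"}
	fmt.Println("processed:", runBackfill(flags, batch))
}
```

Re-reading the flags on every iteration is the property that matters for operations: force_break and sleep_ms take effect between batches without a redeploy.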