Task Breakdown: PII Encryption — Contact Service

Source RFC: rfc-pii-encryption.md

Scope exclusions (confirmed): accounts field (accounts_encrypted, accounts_bidx) is excluded from all remaining tasks. address was re-added to scope (see Task 1b / TF-3435). Tasks I–VIII are already Done. This breakdown covers only remaining work.

Mode: Vertical — one task per logical chunk, each independently deployable.


Effort Summary

| Task | Jira | BE days | QA days | Total |
| --- | --- | --- | --- | --- |
| Task 1: BackfillPIIEncryptionCron + HTTP single-batch fix | TF-3434 | 2 | 0.5 | 2.5 |
| Task 1b: address_encrypted + address_bidx struct fields + crypto helpers | TF-3435 | 0.5 | 0.25 | 0.75 |
| Task 1c: address backfill (sparse index + bounded HTTP endpoint) | TF-3436 | 1 | 0.25 | 1.25 |
| Task 2: Reconciliation checker + status endpoint | TF-2596 | 1 | 0.25 | 1.25 |
| Task 3: Encrypted-first read switch | TF-2597 | 2 | 0.5 | 2.5 |
| Task 4: SearchByEmail + SearchByPhone blind-index | TF-2598 | 1 | 0.25 | 1.25 |
| Task 5: ToFilters() encrypted search paths | TF-2599 | 1 | 0.25 | 1.25 |
| Task 6: Phase 4 cleanup (legacy plaintext removal) | TF-2600 | 1.5 | 1 | 2.5 |
| Grand total | | 10 | 3.25 | 13.25 |

Confidence: medium. Cron, repo, and service patterns are battle-tested in this codebase — estimates are tight. Task 1b field type is confirmed (Address *Address pointer sub-document — see Step 1 for full details). Task 3 uncertainty remains: the number of SearchWith* call sites that need the decrypt path wired in is large. Task 6 QA is heavier because the field-unset migration is irreversible.


Task 1: [BE] BackfillPIIEncryptionCron + single-batch HTTP refactor (VIII-fix)

The backfill runs as a self-driving cron job on the worker pod, processing missing-encrypted contacts in batches across all teams, instead of blocking a single HTTP request until it times out.

Status: ✅ Actionable

Design reference: n/a — BE only

What to build

Implement BackfillPIIEncryptionCron (mirroring BackfillNameTokenizedCron) that continuously processes contacts missing name_encrypted across all teams. The cron defaults to 1,000 contacts per batch — no timeout pressure unlike the HTTP endpoint. Simultaneously, strip the pagination loop from PIIBackfillService.BackfillPIIByTeam so the HTTP endpoint processes exactly one page (100 contacts) per call and returns immediately.
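
The control flow described above can be sketched with in-memory stand-ins. This is a minimal sketch under stated assumptions, not the project's implementation: the `flagStore` and `batchSource` interfaces, `runBackfill`, and the fake types are hypothetical names introduced for illustration; only the Redis key names are taken from this breakdown. The sleep between batches is omitted.

```go
package main

import "fmt"

// flagStore is a hypothetical stand-in for the Redis-backed cache repo.
type flagStore interface {
	Get(key string) string
	Set(key, val string)
	Del(key string)
}

// batchSource is a hypothetical stand-in for the contact repo.
type batchSource interface {
	// NextMissing returns up to limit IDs that still lack name_encrypted.
	NextMissing(limit int) []int
	// MarkEncrypted simulates a bulk update for the given IDs.
	MarkEncrypted(ids []int)
}

const (
	activeKey     = "pii_backfill_encryption_active"
	enabledKey    = "pii_backfill_encryption_enabled"
	forceBreakKey = "pii_backfill_encryption_force_break"
)

// runBackfill processes batches until none remain or force-break is set.
// It returns the number of contacts updated.
func runBackfill(flags flagStore, src batchSource, batchSize int) int {
	if flags.Get(activeKey) != "" || flags.Get(enabledKey) == "" {
		return 0 // another run is active, or the backfill is disabled
	}
	flags.Set(activeKey, "TRUE")
	defer flags.Del(activeKey) // clear the active marker on every exit path

	processed := 0
	for flags.Get(forceBreakKey) == "" {
		ids := src.NextMissing(batchSize)
		if len(ids) == 0 {
			break // nothing left to backfill
		}
		src.MarkEncrypted(ids)
		processed += len(ids)
	}
	return processed
}

// mapFlags and memSource are in-memory fakes used only by this sketch.
type mapFlags map[string]string

func (m mapFlags) Get(k string) string { return m[k] }
func (m mapFlags) Set(k, v string)     { m[k] = v }
func (m mapFlags) Del(k string)        { delete(m, k) }

type memSource struct{ missing []int }

func (s *memSource) NextMissing(limit int) []int {
	if limit > len(s.missing) {
		limit = len(s.missing)
	}
	return append([]int(nil), s.missing[:limit]...)
}

func (s *memSource) MarkEncrypted(ids []int) { s.missing = s.missing[len(ids):] }

func main() {
	flags := mapFlags{enabledKey: "1"}
	src := &memSource{missing: []int{1, 2, 3, 4, 5}}
	fmt.Println(runBackfill(flags, src, 2)) // three batches: 2 + 2 + 1
}
```

Because each batch is re-queried from the front of the "still missing" set, the loop needs no page counter: updated contacts drop out of the filter on their own. The flip side is that a contact which never receives name_encrypted would be returned again on every iteration, so a real implementation likely needs a guard for batches that produce no updates.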

Implementation Plan

| Action | File | What changes |
| --- | --- | --- |
| create | db/migrations/035_add_name_encrypted_index.up.json | Add index { name_encrypted: 1 } on contact collection |
| create | db/migrations/035_add_name_encrypted_index.down.json | Drop index idx_contact_name_encrypted |
| extend | internal/app/repository/contact/search.go | Add SearchMissingPIIEncryption(ctx, limit, page): same filter as SearchMissingPIIEncryptionByTeam but without company_sso_id scope |
| extend | internal/app/repository/contact/base.go | Add SearchMissingPIIEncryption signature to ContactInterface (line ~400) |
| extend | internal/app/repository/contact/mocks/ContactInterface.go | Re-run mockery or hand-add the mock method for SearchMissingPIIEncryption |
| create | internal/app/cron/backfill_pii_encryption.go | New BackfillPIIEncryptionCron; mirrors backfill_name_tokenized.go exactly |
| create | internal/app/cron/backfill_pii_encryption_test.go | Unit tests: flag-disabled exits, force-break stops loop, single batch processed, BulkUpdateFields called |
| edit | internal/app/service/pii_backfill_service.go | Remove outer for page := 1; ; page++ loop; process exactly one page (100 docs) per HTTP call; cron uses 1,000 via BACKFILL_PII_ENCRYPTION_BATCH_SIZE |
| edit | internal/app/service/pii_backfill_service_test.go | Update tests to reflect single-page semantics |
| extend | internal/worker/worker_service.go | Add BackfillPIIEncryptionJobName, BackfillPIIEncryptionDuration constants; add BackfillPIIEncryptionCron to CronList struct; register in registerCronJob |
| extend | cmd/initializer.go | Wire cron.NewBackfillPIIEncryptionCron(cacheRepo, contactRepo, cfg) and include in the server.Handler return |

Implementation steps

  1. Create the migration files. The cron query { name_encrypted: { $exists: false } } needs an index to avoid a full collection scan on every batch across 150M documents. No index on name_encrypted exists today — db/migrations/ checked, only email_bidx, phone_bidx, and name_search indexes were added in migrations 027–028.

    Create db/migrations/035_add_name_encrypted_index.up.json:

    [
        {
            "createIndexes": "contact",
            "indexes": [
                {
                    "key": { "name_encrypted": 1 },
                    "name": "idx_contact_name_encrypted",
                    "background": true
                }
            ]
        }
    ]

    Create db/migrations/035_add_name_encrypted_index.down.json:

    [
        {
            "dropIndexes": "contact",
            "index": "idx_contact_name_encrypted"
        }
    ]

    Deployment note: make migrate-up blocks until the index is built. On 150M documents expect 30–60 minutes. Run during off-peak hours before deploying the image, not after.

    Deployment sequence: make migrate-up → deploy image → set pii_backfill_encryption_enabled in Redis.

  2. Explore the pattern. Open internal/app/cron/backfill_name_tokenized.go and read it in full. Note the five Redis keys, the checkActiveJob → backfillEnabled → loop → forceBreak → getCustomSleepTime flow, and the BakfillNameTokenized(*work.Job) error method signature. The new cron follows this exactly.

  3. Add the global repo method. Open internal/app/repository/contact/search.go. After SearchMissingPIIEncryptionByTeam (line 219), add:

    // SearchMissingPIIEncryption returns contacts across all teams that have a non-blank
    // name but are missing name_encrypted. Used by the backfill cron to process all teams.
    func (r *ContactRepo) SearchMissingPIIEncryption(ctx context.Context, limit int, page int) (data []Contact, err error) {
        if limit <= 0 { limit = 1000 }
        if page <= 0 { page = 1 }
        filter := bson.M{
            "is_deleted":     false,
            "name":           bson.M{"$nin": bson.A{nil, ""}},
            "name_encrypted": bson.M{"$exists": false},
        }
        results, err := r.mongo.Where(ctx, Contact{}.TableName(), filter, limit, page, repository.SortBy{})
        // ... parse loop identical to SearchMissingPIIEncryptionByTeam
    }

    Add the signature to ContactInterface in base.go (after SearchMissingPIIEncryptionByTeam, line ~400).

  4. Create the cron file. Create internal/app/cron/backfill_pii_encryption.go. Use the same package (package cron). Define:

    const (
        ACTIVE_JOB_PII_ENCRYPTION_KEY           = "pii_backfill_encryption_active"
        ACTIVE_JOB_PII_ENCRYPTION_EXPIRY        = 3600
        BACKFILL_PII_ENCRYPTION_ENABLED_KEY     = "pii_backfill_encryption_enabled"
        BACKFILL_PII_ENCRYPTION_FORCE_BREAK_KEY = "pii_backfill_encryption_force_break"
        BACKFILL_PII_ENCRYPTION_SLEEP_TIME_KEY  = "pii_backfill_encryption_sleep_ms"
        BACKFILL_PII_ENCRYPTION_BATCH_SIZE      = 1000
    )

    type BackfillPIIEncryptionCron struct {
        cacheRepo   repository.ICacheRepo
        contactRepo contact.ContactInterface
        cfg         contact.Config
    }

    The main method BackfillPIIEncryption(job *work.Job) error follows the identical structure to BakfillNameTokenized: check active job → check enabled → set active → loop until empty or force-break → fetch batch via SearchMissingPIIEncryption → BuildPIIEncryptedUpdateFields per contact → BulkUpdateFields → sleep → clear active on exit.

  5. Refactor the HTTP service. Open internal/app/service/pii_backfill_service.go. Remove the outer for page := 1; ; page++ loop. The method body becomes: one call to SearchMissingPIIEncryptionByTeam(ctx, teamID, piiBackfillPageSize, 1) → build update fields → BulkUpdateFields → return result. Single-page, no pagination.

  6. Register the cron. In internal/worker/worker_service.go:

    • Add BackfillPIIEncryptionJobName = "backfill_pii_encryption" and BackfillPIIEncryptionDuration = "0/30 * * * * ?" to the constants block.
    • Add BackfillPIIEncryptionCron cron.BackfillPIIEncryptionCron to CronList.
    • In registerCronJob: pool.PeriodicallyEnqueue(BackfillPIIEncryptionDuration, BackfillPIIEncryptionJobName) + registerJobWithOptions(BackfillPIIEncryptionJobName, options, piiEncCron.BackfillPIIEncryption, pool).
  7. Wire in initializer. In cmd/initializer.go, add after backfillNameTokenizedCron (~line 258):

    backfillPIIEncryptionCron := cron.NewBackfillPIIEncryptionCron(cacheRepo, contactRepo, buildContactRepoCfg(env.Config.PIIEncryption))

    Add it to the server.Handler return struct field BackfillPIIEncryptionCron.

  8. Write tests (cron). In internal/app/cron/backfill_pii_encryption_test.go (same structure as backfill_name_tokenized_test.go):

    • TestBackfillPIIEncryptionCron_ActiveJobExists — cache returns "TRUE" → method returns nil without calling repo.
    • TestBackfillPIIEncryptionCron_Disabled — enabled key empty → returns nil without calling repo.
    • TestBackfillPIIEncryptionCron_ForceBreak — force_break set → exits loop immediately after first check.
    • TestBackfillPIIEncryptionCron_ProcessesBatch — enabled, one batch of 2 contacts returned, BulkUpdateFields called with 2 entries, second call returns empty → loop exits.
  9. Write tests (service). Update internal/app/service/pii_backfill_service_test.go to reflect single-page semantics: remove any multi-page test cases; add TestPIIBackfillService_SinglePageOnly that asserts SearchMissingPIIEncryptionByTeam is called exactly once with page=1.

  10. Run tests. go test ./internal/app/cron/... ./internal/app/service/... -v — all pass.

Acceptance criteria

  • POST /private/contacts/backfill/pii/{team_id} processes at most 100 contacts per call and returns in < 5 s for any team size.
  • The cron job (backfill_pii_encryption) starts automatically on the worker pod when pii_backfill_encryption_enabled is set in Redis.
  • Setting pii_backfill_encryption_force_break stops the cron mid-run within one batch cycle.
  • Logs emit pii_backfill_cron batch_processed per batch and pii_backfill_cron completed when no contacts remain.
  • Concurrent cron triggers are deduplicated via pii_backfill_encryption_active Redis key (TTL 3600 s).
  • The cron skips contacts already encrypted (name_encrypted exists) — idempotent re-runs are safe.

Test strategy

Cron tests use mocks.ContactInterface and repoMock.ICacheRepo (same mocks as backfill_name_tokenized_test.go). Key assertions: SearchMissingPIIEncryption called only when flag is set; BulkUpdateFields receives the correct map keyed by ObjectID; no panic when BuildPIIEncryptedUpdateFields returns empty map (cipher nil).

Effort estimate

| Discipline | Days |
| --- | --- |
| Backend | 2 |
| QA | 0.5 |
| Total | 2.5 |

Assumptions: cron mirrors BackfillNameTokenizedCron exactly — no novel patterns; BuildPIIEncryptedUpdateFields already handles name/email/phone/usernames (accounts excluded).

Run to verify

go test ./internal/app/cron/... ./internal/app/service/... -v -run PII

Depends on

None — self-contained. Starts immediately.


Task 1b: [BE] address_encrypted + address_bidx — struct fields + crypto helpers (TF-3435)

Purely additive: add the struct fields and extend the three crypto helpers so new contacts get address_encrypted/address_bidx automatically. No backfill logic here — that is Task 1c.

Status: ✅ Actionable — start after Task 1 is merged so BuildPIIEncryptedUpdateFields can be extended cleanly.

Design reference: n/a — BE only

What to build

Add AddressEncrypted *EncryptedPayload and AddressBidx string to Contact struct. Extend encryptContactPIIFields, decryptContactPIIFields, and BuildPIIEncryptedUpdateFields in crypto_helpers.go.

Implementation Plan

| Action | File | What changes |
| --- | --- | --- |
| extend | internal/app/repository/contact/base.go | Add AddressEncrypted *EncryptedPayload and AddressBidx string to Contact struct after UsernamesBidx (~line 104) |
| extend | internal/app/repository/contact/crypto_helpers.go | encryptContactPIIFields: add address block; decryptContactPIIFields: add address block; BuildPIIEncryptedUpdateFields: emit address_encrypted/address_bidx when c.Address != nil |

Implementation steps

  1. address is a Go pointer sub-document — confirmed. From base.go line 78:

    Address *Address `json:"address,omitempty" bson:"address,omitempty"`

    Non-empty check: c.Address != nil. Zero after encrypting: c.Address = nil. AddressEncrypted and AddressBidx do not yet exist on Contact — add them after UsernamesBidx (~line 104):

    AddressEncrypted *EncryptedPayload `json:"address_encrypted,omitempty" bson:"address_encrypted,omitempty"`
    AddressBidx string `json:"address_bidx,omitempty" bson:"address_bidx,omitempty"`
  2. Extend encryptContactPIIFields (~line 65), after the Usernames block:

    if c.Address != nil {
        addrJSON, err := json.Marshal(c.Address)
        if err != nil { return fmt.Errorf("encrypt address marshal: %w", err) }
        addrStr := string(addrJSON)
        ep, err := encryptToPayload(cfg.Cipher, addrStr)
        if err != nil { return fmt.Errorf("encrypt address: %w", err) }
        c.AddressEncrypted = ep
        c.AddressBidx = cfg.Cipher.BlindIndex(addrStr)
        c.Address = nil
    }
  3. Extend decryptContactPIIFields (~line 183), after the Usernames block:

    if c.AddressEncrypted != nil {
        plain, err := decryptFromPayload(cfg.Cipher, c.AddressEncrypted)
        if err != nil { return fmt.Errorf("decrypt address: %w", err) }
        var addr Address
        if err := json.Unmarshal([]byte(plain), &addr); err != nil {
            return fmt.Errorf("unmarshal address: %w", err)
        }
        c.Address = &addr
    }
  4. Extend BuildPIIEncryptedUpdateFields (~line 495), after the Usernames block:

    if c.Address != nil {
        addrJSON, err := json.Marshal(c.Address)
        if err != nil { return nil, fmt.Errorf("BuildPIIEncryptedUpdateFields marshal address: %w", err) }
        addrStr := string(addrJSON)
        ep, err := encryptToPayload(cfg.Cipher, addrStr)
        if err != nil { return nil, fmt.Errorf("BuildPIIEncryptedUpdateFields encrypt address: %w", err) }
        fields["address_encrypted"] = ep
        fields["address_bidx"] = cfg.Cipher.BlindIndex(addrStr)
    }
  5. Write tests. TestEncryptContactPIIFields_Address, TestDecryptContactPIIFields_Address (round-trip: c.Address.ProvinceID == 13, c.Address.CityName == "Kabupaten Tabalong"), TestBuildPIIEncryptedUpdateFields_Address.

  6. Run. go test ./internal/app/repository/contact/... -v -run Address
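
The marshal → encrypt → blind-index → decrypt → unmarshal round trip in steps 2 and 3 can be exercised in isolation. The helpers encryptToPayload, decryptFromPayload, and cfg.Cipher.BlindIndex are internal to the codebase and not shown in this breakdown, so the sketch below substitutes AES-256-GCM and HMAC-SHA256 from the Go standard library as assumed stand-ins. `sealAddress`, `openAddress`, `newGCM`, the JSON tags, and the two-field `Address` are hypothetical; only ProvinceID and CityName come from the test description above.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"errors"
	"fmt"
)

// Address models only the two fields named in the tests; JSON tags are assumed.
type Address struct {
	ProvinceID int    `json:"province_id"`
	CityName   string `json:"city_name"`
}

// newGCM builds an AES-GCM AEAD from a 16, 24, or 32 byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealAddress marshals addr to JSON, encrypts it (random nonce prepended), and
// returns the ciphertext plus an HMAC-SHA256 blind index over the same JSON.
func sealAddress(encKey, idxKey []byte, addr *Address) ([]byte, string, error) {
	plain, err := json.Marshal(addr)
	if err != nil {
		return nil, "", fmt.Errorf("marshal address: %w", err)
	}
	gcm, err := newGCM(encKey)
	if err != nil {
		return nil, "", err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, "", err
	}
	mac := hmac.New(sha256.New, idxKey)
	mac.Write(plain)
	return gcm.Seal(nonce, nonce, plain, nil), hex.EncodeToString(mac.Sum(nil)), nil
}

// openAddress reverses sealAddress and unmarshals the JSON into an Address.
func openAddress(encKey, ct []byte) (*Address, error) {
	gcm, err := newGCM(encKey)
	if err != nil {
		return nil, err
	}
	if len(ct) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	plain, err := gcm.Open(nil, ct[:gcm.NonceSize()], ct[gcm.NonceSize():], nil)
	if err != nil {
		return nil, fmt.Errorf("decrypt address: %w", err)
	}
	var addr Address
	if err := json.Unmarshal(plain, &addr); err != nil {
		return nil, fmt.Errorf("unmarshal address: %w", err)
	}
	return &addr, nil
}

func main() {
	encKey := make([]byte, 32) // all-zero demo key; never use outside a sketch
	idxKey := []byte("demo-blind-index-key")
	in := &Address{ProvinceID: 13, CityName: "Kabupaten Tabalong"}
	ct, bidx, err := sealAddress(encKey, idxKey, in)
	if err != nil {
		panic(err)
	}
	out, err := openAddress(encKey, ct)
	if err != nil {
		panic(err)
	}
	fmt.Println(*out == *in, len(bidx)) // true 64
}
```

Two properties are worth asserting in the real tests as well: the ciphertext differs between calls because of the random nonce, while the blind index is deterministic for the same input. Since the index is computed over the marshalled JSON, two addresses match only when every serialized field is identical.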

Acceptance criteria

  • AddressEncrypted *EncryptedPayload and AddressBidx string exist on Contact struct.
  • New contacts created after this merges get address_encrypted/address_bidx in the same write as name_encrypted.
  • Round-trip: decrypt(encrypt(address)) reconstructs the full Address struct field-by-field.
  • No change to any existing field — purely additive.

Effort estimate

| Discipline | Days |
| --- | --- |
| Backend | 0.5 |
| QA | 0.25 |
| Total | 0.75 |

Depends on

  • Task 1 (TF-3434) — BuildPIIEncryptedUpdateFields must be merged before extending it here.

Must complete before

  • Task 1c (TF-3436) — backfill reads address_bidx absence as the gap signal; struct field must exist first.

Task 1c: [BE] address backfill — sparse index + bounded HTTP endpoint (TF-3436)

Cleans up the gap population: contacts already processed by the main cron (have name_encrypted) but still missing address_bidx. Runs after the main cron finishes. Uses a sparse index on address to avoid a full collection scan on 150M documents.

Status: ⏳ Deferred — run after all other tickets (Tasks 1, 1b, 2, 3, 4, 5, 6) are complete.

Design reference: n/a — BE only

What to build

  1. Migration: sparse index on address (required before the gap query can run without timeout).
  2. SearchContactsWithAddressMissingEncryption repo method.
  3. POST /private/contacts/backfill/pii/address bounded HTTP endpoint (max 50 pages × 100 = 5,000 docs per call), returning { processed_count, remaining_count }.

Why a sparse index

Without it, { address: { $exists: true } } on 150M documents times out — confirmed in production. A sparse index only stores entries for documents where address exists. Since very few contacts have an address sub-document, this index is small and the query goes directly to the small population.

Implementation Plan

| Action | File | What changes |
| --- | --- | --- |
| create | db/migrations/036_add_address_sparse_index.up.json | Sparse index { address: 1 } on contact |
| create | db/migrations/036_add_address_sparse_index.down.json | Drop index |
| extend | internal/app/repository/contact/search.go | Add SearchContactsWithAddressMissingEncryption(ctx, limit, page) ([]Contact, error) |
| extend | internal/app/repository/contact/base.go | Add to ContactInterface |
| extend | internal/app/repository/contact/mocks/ContactInterface.go | Re-run mockery or hand-add mock |
| extend | internal/app/service/pii_backfill_service.go | Add BackfillAddressPII(ctx) (*AddressPIIBackfillResponse, error) |
| extend | internal/app/payload/pii_backfill_payload.go | Add AddressPIIBackfillResponse { ProcessedCount, RemainingCount } |
| extend | internal/app/handler/pii_backfill_handler.go | Add BackfillAddressPII handler |
| extend | router | Register POST /private/contacts/backfill/pii/address |

Implementation steps

  1. Create migration files.

    db/migrations/036_add_address_sparse_index.up.json:

    [
        {
            "createIndexes": "contact",
            "indexes": [
                {
                    "key": { "address": 1 },
                    "name": "idx_contact_address_sparse",
                    "sparse": true,
                    "background": true
                }
            ]
        }
    ]

    db/migrations/036_add_address_sparse_index.down.json:

    [{ "dropIndexes": "contact", "index": "idx_contact_address_sparse" }]

    Deployment note: make migrate-up blocks until the index is built. A sparse index on a rarely-present field builds much faster than a full index. Run off-peak before deploying the image.

    Sequence: make migrate-up → deploy image → ops calls endpoint until remaining_count = 0.

  2. Add repo method. In search.go (address: $exists: true now uses the sparse index):

    func (r *ContactRepo) SearchContactsWithAddressMissingEncryption(ctx context.Context, limit int, page int) ([]Contact, error) {
        if limit <= 0 { limit = 100 }
        if page <= 0 { page = 1 }
        filter := bson.M{
            "is_deleted":   false,
            "address":      bson.M{"$exists": true}, // served by the sparse index
            "address_bidx": bson.M{"$exists": false},
        }
        // ... same parse loop as SearchMissingPIIEncryptionByTeam
    }

    Add to ContactInterface in base.go.

  3. Add service method. Bounded loop:

    const addressBackfillMaxPages = 50

    processed := 0
    // Note: updated contacts gain address_bidx and leave the filter, so advancing
    // the page number skips some contacts within one call; later calls pick them up.
    for page := 1; page <= addressBackfillMaxPages; page++ {
        contacts, err := s.contactRepo.SearchContactsWithAddressMissingEncryption(ctx, 100, page)
        if err != nil { return nil, err }
        if len(contacts) == 0 { break }
        updateMap := make(map[primitive.ObjectID]bson.M, len(contacts))
        for _, c := range contacts {
            fields, err := BuildPIIEncryptedUpdateFields(s.cfg, c)
            if err != nil { continue } // skip this contact; it stays in the gap
            if len(fields) > 0 { updateMap[c.ID] = fields }
        }
        if len(updateMap) > 0 {
            if err := s.contactRepo.BulkUpdateFields(ctx, updateMap); err != nil { return nil, err }
        }
        processed += len(updateMap) // count only contacts actually updated
        if len(contacts) < 100 { break }
    }
    // CountWithFilters for remaining, return AddressPIIBackfillResponse
  4. Add handler + register route. POST /private/contacts/backfill/pii/address, same pattern as BackfillPIIByTeam.

  5. Write tests. TestBackfillAddressPII_NoGap (0/0), TestBackfillAddressPII_SmallGap (map contains address_encrypted/address_bidx), TestBackfillAddressPII_MaxPagesCap (stops at 50 pages).

  6. Run. go test ./internal/app/... -v -run BackfillAddress
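
One subtlety of the bounded loop is worth simulating before writing the service tests: every successful update removes documents from the filter's result set, so a page number that advances over a shrinking set skips documents within a single call. The sketch below reproduces that with an in-memory slice. `store`, `drain`, `newStore`, and the small page size are hypothetical stand-ins chosen for illustration, not the repository API.

```go
package main

import "fmt"

// store simulates the documents that still match the "missing address_bidx" filter.
type store struct{ missing []int }

// page returns the 1-indexed page of the current result set (skip/limit semantics).
func (s *store) page(limit, page int) []int {
	start := (page - 1) * limit
	if start >= len(s.missing) {
		return nil
	}
	end := start + limit
	if end > len(s.missing) {
		end = len(s.missing)
	}
	return append([]int(nil), s.missing[start:end]...)
}

// update removes the given IDs from the result set, as a successful bulk update would.
func (s *store) update(ids []int) {
	done := make(map[int]bool, len(ids))
	for _, id := range ids {
		done[id] = true
	}
	kept := s.missing[:0]
	for _, id := range s.missing {
		if !done[id] {
			kept = append(kept, id)
		}
	}
	s.missing = kept
}

// drain runs at most maxPages iterations. When advance is true the page number
// increments each iteration; otherwise page 1 is re-read every time.
func drain(s *store, limit, maxPages int, advance bool) (processed int) {
	for i := 1; i <= maxPages; i++ {
		p := 1
		if advance {
			p = i
		}
		ids := s.page(limit, p)
		if len(ids) == 0 {
			break
		}
		s.update(ids)
		processed += len(ids)
	}
	return processed
}

// newStore creates a store with n documents still missing address_bidx.
func newStore(n int) *store {
	s := &store{}
	for i := 0; i < n; i++ {
		s.missing = append(s.missing, i)
	}
	return s
}

func main() {
	a, b := newStore(10), newStore(10)
	fmt.Println(drain(a, 2, 50, true), len(a.missing))  // 6 4
	fmt.Println(drain(b, 2, 50, false), len(b.missing)) // 10 0
}
```

With an advancing page, roughly half of the gap is processed per call; re-reading page 1 drains it in one pass. Re-reading page 1 has its own hazard, though: a contact that repeatedly fails to encrypt stays at the front of the set, so that variant would need to stop when a batch yields no updates. Either variant meets the acceptance criteria as long as ops keep calling the endpoint until remaining_count reaches 0.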

Acceptance criteria

  • POST /private/contacts/backfill/pii/address processes gap in batches of 100, max 50 pages per call.
  • After running until remaining_count = 0: every contact with address has address_bidx.
  • Endpoint is idempotent — remaining_count = 0 calls return { processed_count: 0, remaining_count: 0 }.
  • db/migrations/036_add_address_sparse_index.up.json present in the PR.

Effort estimate

| Discipline | Days |
| --- | --- |
| Backend | 1 |
| QA | 0.25 |
| Total | 1.25 |

Run to verify

go test ./internal/app/repository/contact/... -v -run Address
go test ./internal/app/service/... -v -run BackfillAddress
curl -X POST http://localhost:8080/private/contacts/backfill/pii/address

Depends on

  • Task 1b (TF-3435) — BuildPIIEncryptedUpdateFields must include address fields before this can write them.

Must complete before

  • Nothing — runs after all other tickets are done; does not gate Phase 3 or Phase 4.

Task 2: [BE] Reconciliation checker + status endpoint (IX)

Engineering can query how many contacts still need backfilling and confirm the gate for Phase 3 (read switch) is met.

Status: ✅ Actionable

Design reference: n/a — BE only (internal ops endpoint)

What to build

Add a CountMissingPIIEncryption repository method, a GetPIIBackfillStatus service method, and a GET /private/contacts/backfill/pii/status endpoint that returns { missing_count, total_count, pct_complete }. Emit a DataDog metric cdp_pii_backfill_missing_count on each call.

Implementation Plan

| Action | File | What changes |
| --- | --- | --- |
| extend | internal/app/repository/contact/search.go | Add CountMissingPIIEncryption(ctx) (int64, error) using existing CountWithFilters pattern |
| extend | internal/app/repository/contact/base.go | Add CountMissingPIIEncryption to ContactInterface |
| extend | internal/app/repository/contact/mocks/ContactInterface.go | Re-run mockery or hand-add mock method |
| extend | internal/app/service/pii_backfill_service.go | Add GetPIIBackfillStatus(ctx) (*PIIBackfillStatusResponse, error) to IPIIBackfillService interface + impl |
| extend | internal/app/payload/pii_backfill_payload.go | Add PIIBackfillStatusResponse { MissingCount, TotalCount, PctComplete } |
| extend | internal/app/handler/pii_backfill_handler.go | Add GetPIIBackfillStatus handler method |
| extend | internal/server/rest.go (or wherever private routes are wired) | Register GET /private/contacts/backfill/pii/status |
| extend | internal/app/service/pii_backfill_service_test.go | Test: zero missing → 100%, non-zero missing → correct pct |
| extend | internal/app/handler/pii_backfill_handler_test.go | Test: 200 response shape |

Implementation steps

  1. Explore. Open internal/app/repository/contact/search.go line 147 (CountWithFilters) and internal/app/handler/pii_backfill_handler.go (line 40). Note the handler uses myhttp.NewJSONResponse and returns (myhttp.ResponseBody, error). Follow the same pattern.

  2. Add CountMissingPIIEncryption. In search.go, after CountWithFilters (line ~155):

    func (r *ContactRepo) CountMissingPIIEncryption(ctx context.Context) (int64, error) {
        return r.CountWithFilters(ctx, bson.M{
            "is_deleted":     false,
            "name":           bson.M{"$nin": bson.A{nil, ""}},
            "name_encrypted": bson.M{"$exists": false},
        })
    }

    Add to ContactInterface in base.go.

  3. Add service method. In pii_backfill_service.go, add to IPIIBackfillService:

    GetPIIBackfillStatus(ctx context.Context) (*payload.PIIBackfillStatusResponse, error)

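    Implementation calls CountMissingPIIEncryption (missing) and CountWithFilters(ctx, { is_deleted: false, name: { $nin: [null, ""] } }) (total), using the same name predicate as CountMissingPIIEncryption so that both counts cover the same population. It computes pct_complete = float64(total - missing) / float64(total) * 100, returns 100 when total is 0 (0/0 yields NaN, which cannot be JSON-encoded), and emits datadog.StatsDClient().Gauge("cdp_pii_backfill_missing_count", float64(missing), ...).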

  4. Add payload struct. In internal/app/payload/pii_backfill_payload.go:

    type PIIBackfillStatusResponse struct {
        MissingCount int64   `json:"missing_count"`
        TotalCount   int64   `json:"total_count"`
        PctComplete  float64 `json:"pct_complete"`
    }
  5. Add handler method. In pii_backfill_handler.go:

    func (h *PIIBackfillHandler) GetPIIBackfillStatus(w http.ResponseWriter, r *http.Request) (myhttp.ResponseBody, error) {
        result, err := h.piiBackfillService.GetPIIBackfillStatus(r.Context())
        if err != nil { return myhttp.ResponseBody{}, myhttp.ErrInternal() }
        return myhttp.NewJSONResponse(result, nil), nil
    }
  6. Register route. Find where POST /private/contacts/backfill/pii/{team_id} is registered (likely internal/server/rest.go or similar router file). Add GET /private/contacts/backfill/pii/status alongside it using the same auth middleware.

  7. Write tests. In pii_backfill_service_test.go: mock CountMissingPIIEncryption → 50, CountWithFilters → 1000; assert PctComplete == 95.0. In pii_backfill_handler_test.go: assert 200 + response body shape.

  8. Run tests. go test ./internal/app/... -v -run PIIBackfill

Acceptance criteria

  • GET /private/contacts/backfill/pii/status returns { missing_count: N, total_count: M, pct_complete: X } in < 500 ms.
  • missing_count reaches 0 when backfill cron has processed all contacts (gate for Task 3).
  • DataDog metric cdp_pii_backfill_missing_count is emitted on each call.
  • pct_complete is 100.0 when missing_count = 0.

Test strategy

Service test mocks both count calls. Handler test mocks the service interface. Key assertion: integer division is avoided in pct_complete (use float64).
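
The percentage calculation is small enough to isolate in a pure function, which makes the integer-division assertion trivial to test. This is a minimal sketch: `pctComplete` is a hypothetical helper name, and reporting 100 for an empty population is an assumption that matches the "100.0 when missing_count = 0" criterion.

```go
package main

import "fmt"

// pctComplete returns the share of contacts already backfilled, in percent.
// An empty population is reported as 100 to avoid 0/0 = NaN, which
// encoding/json refuses to serialize.
func pctComplete(missing, total int64) float64 {
	if total <= 0 {
		return 100
	}
	// The two counts come from separate queries, so clamp against drift.
	if missing < 0 {
		missing = 0
	}
	if missing > total {
		missing = total
	}
	// Multiply before dividing, in float64, so no integer truncation occurs.
	return float64(total-missing) * 100 / float64(total)
}

func main() {
	var missing, total int64 = 50, 1000
	fmt.Println(pctComplete(missing, total))     // 95
	fmt.Println(pctComplete(0, 0))               // 100
	fmt.Println((total - missing) / total * 100) // 0: the integer-division trap
}
```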

Effort estimate

| Discipline | Days |
| --- | --- |
| Backend | 1 |
| QA | 0.25 |
| Total | 1.25 |

Assumptions: CountWithFilters already exists in the interface and is usable as-is; route registration follows an identical pattern to the existing backfill POST route.

Run to verify

go test ./internal/app/... -v -run PIIBackfill
# then manually:
curl -X GET http://localhost:8080/private/contacts/backfill/pii/status

Depends on

  • Task 1 (cron must reach missing_count = 0 in staging before Task 3 starts)

Task 3: [BE] Encrypted-first read switch (X)

When the pii_read_encrypted_enabled flag is toggled on, all contact reads serve decrypted values from *_encrypted fields instead of plaintext — zero API contract change, fully flag-reversible.

Status: ✅ Actionable — start implementation in parallel with cron running in staging; do not toggle the flag in production until reconciliation shows missing_count = 0.

Design reference: n/a — BE only

What to build

Wire decryptContactPIIFields(cfg, &contact) into every repository read path. Add a Redis flag check pii_read_encrypted_enabled (reusing the existing cacheRepo pattern). When the flag is on and a contact has name_encrypted set, populate Name/Email/Phone/Usernames from the encrypted fields; if a field's encrypted counterpart is nil, the bson-unmarshalled plaintext value stays unchanged (field-level fallback). Emit cdp_pii_read_plaintext_fallback counter when the fallback triggers.

Implementation Plan

| Action | File | What changes |
| --- | --- | --- |
| create | internal/app/repository/contact/decrypt_helper.go | New helper maybeDecrypt(ctx, cfg, cacheRepo, contact *Contact) error: reads Redis flag once per call, calls decryptContactPIIFields if flag on + name_encrypted != nil |
| extend | internal/app/repository/contact/search.go | Call maybeDecrypt at the end of SearchByID, SearchByEmail, SearchByPhone, SearchByCompanySsoID, SearchWithFilters, SearchByBSUID read paths |
| extend | internal/app/repository/contact/base.go | Add cacheRepo repository.ICacheRepo field to ContactRepo struct; update NewContactRepo signature accordingly |
| extend | cmd/initializer.go | Pass cacheRepo to NewContactRepo |
| extend | internal/app/repository/contact/decrypt_helper.go | In maybeDecrypt: if NameEncrypted == nil after flag=on, emit DataDog counter cdp_pii_read_plaintext_fallback |
| create | internal/app/repository/contact/decrypt_helper_test.go | Tests: flag off → no decrypt; flag on + encrypted set → Name populated; flag on + encrypted nil → fallback counter incremented |

Implementation steps

  1. Explore. Open internal/app/repository/contact/crypto_helpers.go line 183 (decryptContactPIIFields) and internal/app/cron/backfill_name_tokenized.go lines 126–138 (checkActiveJob / Redis GET pattern). The new flag check reuses the same cacheRepo.Get(ctx, "pii_read_encrypted_enabled") pattern.

  2. Create decrypt_helper.go. New file in internal/app/repository/contact/:

    package contact

    const PIIReadEncryptedEnabledKey = "pii_read_encrypted_enabled"

    func maybeDecrypt(ctx context.Context, cfg Config, cacheRepo repository.ICacheRepo, c *Contact) error {
        if cfg.Cipher == nil {
            return nil
        }
        val, err := cacheRepo.Get(ctx, PIIReadEncryptedEnabledKey)
        if err != nil || val == "" {
            return nil // flag off → serve plaintext
        }
        if c.NameEncrypted == nil {
            // doc not yet backfilled: emit fallback metric
            _ = datadog.StatsDClient().Incr("cdp_pii_read_plaintext_fallback", []string{})
            return nil
        }
        return decryptContactPIIFields(cfg, c)
    }
  3. Add cacheRepo to ContactRepo. In base.go, extend the struct and NewContactRepo:

    type ContactRepo struct {
        mongo     repository.IDbRepo
        cfg       Config
        cacheRepo repository.ICacheRepo // new
    }

    func NewContactRepo(mongo repository.IDbRepo, cfg Config, cacheRepo repository.ICacheRepo) ContactInterface {
        return &ContactRepo{mongo: mongo, cfg: cfg, cacheRepo: cacheRepo}
    }

    Update all callers of NewContactRepo in cmd/initializer.go.

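  4. Wire into read paths. In search.go, after every parseData call that produces a Contact (or slice of Contact), call maybeDecrypt(ctx, r.cfg, r.cacheRepo, &datum) and handle its returned error. Methods to update: SearchByID, SearchByEmail, SearchByPhone, SearchByBSUID, SearchByCompanySsoID (loop), SearchWithFilters (loop). In the two loop-based methods, pass a pointer to the slice element itself (index into the slice) rather than to a range-loop copy; otherwise the decrypted values are written to the copy and the returned slice is left unchanged. Do not wire into SearchMissingPIIEncryption or SearchMissingPIIEncryptionByTeam: those are backfill-only queries that need plaintext.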

  5. Write tests. In decrypt_helper_test.go:

    • TestMaybeDecrypt_FlagOff — cacheRepo returns "" → contact fields unchanged.
    • TestMaybeDecrypt_FlagOn_Encrypted — cacheRepo returns "1", contact has NameEncrypted set → Name populated from decryption.
    • TestMaybeDecrypt_FlagOn_NotBackfilled — flag on, NameEncrypted == nil → fallback counter incremented, no error.
  6. Run tests. go test ./internal/app/repository/contact/... -v
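
Step 4 passes &datum to maybeDecrypt inside loops. In Go, whether that call changes the slice returned to the caller depends on where datum comes from: a range variable is a copy of the element, so mutations through its address never reach the slice. The sketch below shows both forms. `Contact` is a trimmed stand-in, and `decrypt` is a hypothetical placeholder for maybeDecrypt that upper-cases the value so the effect is visible.

```go
package main

import (
	"fmt"
	"strings"
)

// Contact is a trimmed stand-in for the repository struct.
type Contact struct {
	Name          string
	NameEncrypted string // stand-in for *EncryptedPayload
}

// decrypt is a hypothetical placeholder for maybeDecrypt; it "decrypts" by
// upper-casing the stored value.
func decrypt(c *Contact) error {
	if c.NameEncrypted != "" {
		c.Name = strings.ToUpper(c.NameEncrypted)
	}
	return nil
}

// decryptAllByValue mutates only the loop variable's copy; cs is unchanged.
func decryptAllByValue(cs []Contact) error {
	for _, c := range cs {
		if err := decrypt(&c); err != nil {
			return err
		}
	}
	return nil
}

// decryptAllByIndex mutates the slice elements in place.
func decryptAllByIndex(cs []Contact) error {
	for i := range cs {
		if err := decrypt(&cs[i]); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	a := []Contact{{NameEncrypted: "ana"}}
	b := []Contact{{NameEncrypted: "ana"}}
	_ = decryptAllByValue(a)
	_ = decryptAllByIndex(b)
	fmt.Printf("%q %q\n", a[0].Name, b[0].Name) // "" "ANA"
}
```

A unit test for SearchWithFilters that asserts on the returned slice, not just on the absence of an error, would catch the by-value form.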

Acceptance criteria

  • With pii_read_encrypted_enabled = "" (unset): GET /iag/v1/contacts/{id} response is identical to today.
  • With pii_read_encrypted_enabled = "1": response name, email, phone, usernames match the original plaintext values (round-trip correctness).
  • A contact whose name_encrypted is nil (not yet backfilled) does not error — returns plaintext and increments cdp_pii_read_plaintext_fallback.
  • Setting pii_read_encrypted_enabled = "" reverts all reads to plaintext within one Redis TTL — no pod restart needed.
  • cdp_pii_decrypt_error_count is zero under normal operation.

Test strategy

decrypt_helper_test.go uses repoMock.ICacheRepo for the Redis flag and a hand-crafted Contact with pre-populated *_encrypted fields. Key assertion: after maybeDecrypt, c.Name equals the known plaintext used to build the encrypted payload.

Effort estimate

| Discipline | Days |
| --- | --- |
| Backend | 2 |
| QA | 0.5 |
| Total | 2.5 |

Assumptions: decryptContactPIIFields in crypto_helpers.go already handles name/email/phone/usernames correctly; wiring is mechanical across ~6 methods. The extra 0.5 day accounts for updating NewContactRepo callers throughout the initializer.

Run to verify

go test ./internal/app/repository/contact/... -v -run Decrypt

Depends on

  • Task 2 — do not toggle the flag in production until reconciliation shows missing_count = 0 for 2 consecutive runs.

Task 4: [BE] Blind-index search — SearchByEmail + SearchByPhone (XI, XII)

Exact-match searches for email and phone use the deterministic blind-index fields (email_bidx, phone_bidx) when the encrypted-read flag is on, so they remain functional after plaintext fields are removed in Phase 4.

Status: ✅ Actionable — can be built while Task 3 is in staging.

Design reference: n/a — BE only

What to build

Gate SearchByEmail and SearchByPhone behind the same pii_read_encrypted_enabled flag. When on: compute cfg.Cipher.BlindIndex(normalized_input) and query email_bidx / phone_bidx instead of the plaintext fields. When off: existing query unchanged. Both methods already exist in internal/app/repository/contact/search.go lines 41 and 55.

Implementation Plan

| Action | File | What changes |
| --- | --- | --- |
| extend | internal/app/repository/contact/search.go | SearchByEmail: add flag check; when on, query { email_bidx: cfg.Cipher.BlindIndex(strings.ToLower(email)) } |
| extend | internal/app/repository/contact/search.go | SearchByPhone: add flag check; when on, query { phone_bidx: cfg.Cipher.BlindIndex(phone) } |
| extend | internal/app/repository/contact/search_test.go (or create if absent) | Tests: flag off → old query; flag on → bidx query; flag on, cipher nil → old query (safe fallback) |

Implementation steps

  1. Explore. Open search.go lines 41–66. SearchByEmail currently queries { email: email } after strings.ToLower. SearchByPhone queries { phone: phone }. Both use r.mongo.FindBy.

  2. Extend SearchByEmail (line 41):

    func (r *ContactRepo) SearchByEmail(ctx context.Context, email string, company_sso_id string) (data Contact, err error) {
        // Normalize once; both the plaintext and the blind-index lookup use the lowercased value.
        email = strings.ToLower(email)

        var filter bson.M
        if r.cfg.Cipher != nil {
            // The cache error is ignored on purpose: an empty value means "flag off",
            // so the plaintext query below remains the default.
            val, _ := r.cacheRepo.Get(ctx, PIIReadEncryptedEnabledKey)
            if val != "" {
                filter = bson.M{"company_sso_id": company_sso_id, "email_bidx": r.cfg.Cipher.BlindIndex(email), "is_deleted": false}
            }
        }
        if filter == nil {
            // Cipher nil or flag unset: keep the existing plaintext query.
            filter = bson.M{"company_sso_id": company_sso_id, "email": email, "is_deleted": false}
        }

        result, err := r.mongo.FindBy(ctx, Contact{}.TableName(), filter)
        // ... existing parseData, return
    }
  3. Extend SearchByPhone (line 55) — same pattern with phone_bidx and no lowercase normalization (phone is stored as-is).

  4. Write tests. Using mocks.ContactInterface or directly testing ContactRepo with a mock Mongo:

    • TestSearchByEmail_FlagOff — flag unset → filter contains email not email_bidx.
    • TestSearchByEmail_FlagOn — flag set → filter contains email_bidx.
    • TestSearchByPhone_FlagOn — filter contains phone_bidx.
    • TestSearchByEmail_CipherNil — cipher is nil → always uses plaintext filter regardless of flag.
  5. Run. go test ./internal/app/repository/contact/... -v -run SearchBy

Acceptance criteria

  • SearchByEmail("test@example.com", teamID) with flag on returns the same contact as with flag off (result parity verified in staging against known data).
  • SearchByPhone("+628111", teamID) with flag on returns the same contact as with flag off.
  • When cipher is nil (encryption disabled), both methods always use the plaintext query — no panic.
  • No change to response shape or API contract.

Test strategy

Tests use repoMock.ICacheRepo for the flag and mocks.ContactInterface for the repo (or a minimal integration test with a real mongo Docker container). Key assertion: the bson.M filter passed to FindBy contains the correct field name (email_bidx vs email).
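
The branching that these tests pin down can be sketched in isolation. The snippet below is an illustration only: it uses map[string]any in place of bson.M so it compiles with the standard library alone, and blindIndexer, fakeCipher, and emailFilter are hypothetical names, not identifiers from the codebase.

```go
package main

import (
	"fmt"
	"strings"
)

// blindIndexer is a hypothetical minimal view of the cipher; only the
// BlindIndex method named in this task is assumed.
type blindIndexer interface {
	BlindIndex(s string) string
}

// fakeCipher is a test double with a predictable output.
type fakeCipher struct{}

func (fakeCipher) BlindIndex(s string) string { return "bidx(" + s + ")" }

// emailFilter mirrors the branching in SearchByEmail: use email_bidx only
// when a cipher is configured AND the flag value is non-empty; otherwise
// fall back to the plaintext field.
func emailFilter(c blindIndexer, flagVal, companyID, email string) map[string]any {
	email = strings.ToLower(email)
	f := map[string]any{"company_sso_id": companyID, "is_deleted": false}
	if c != nil && flagVal != "" {
		f["email_bidx"] = c.BlindIndex(email)
	} else {
		f["email"] = email
	}
	return f
}

func main() {
	fmt.Println("flag on:   ", emailFilter(fakeCipher{}, "1", "team-1", "Test@Example.com"))
	fmt.Println("flag off:  ", emailFilter(fakeCipher{}, "", "team-1", "Test@Example.com"))
	fmt.Println("cipher nil:", emailFilter(nil, "1", "team-1", "Test@Example.com"))
}
```

Asserting on which key is present (email_bidx versus email) rather than on the returned contact keeps the test independent of the data in Mongo.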

Effort estimate

| Discipline | Days |
| --- | --- |
| Backend | 1 |
| QA | 0.25 |
| Total | 1.25 |

Assumptions: blind-index lookup returns only exact matches — no partial email/phone search after Phase 4 (accepted product constraint). PIIReadEncryptedEnabledKey constant is defined in Task 3's decrypt_helper.go and reused here.

Run to verify

go test ./internal/app/repository/contact/... -v -run SearchByEmail
go test ./internal/app/repository/contact/... -v -run SearchByPhone

Depends on

  • Task 3 (needs PIIReadEncryptedEnabledKey constant and cacheRepo on ContactRepo)

Task 5: [BE] Update SearchContactRequest.ToFilters() for encrypted search paths (XIV)

The general contact search endpoint correctly routes name, email, and phone filter inputs to their blind-index or token-array counterparts when the encrypted-read flag is enabled, maintaining search parity through Phase 3.

Status: ✅ Actionable

Design reference: n/a — BE only

What to build

Extend ToFilters() in internal/app/payload/search_contact_request.go (line 119) to:

  1. When req.Name is set: route to name_search token array instead of name_tokenized (same token algorithm, different field — name_search is populated by dual-write; name_tokenized stays as the pre-encryption field).
  2. When req.Email is set and the flag is on: compute email_bidx and query it; otherwise keep existing plaintext query.
  3. When req.Phone[] is set and the flag is on: compute phone_bidx per entry and query; otherwise keep existing query.

ToFilters needs access to cfg.Cipher and the Redis flag — inject these via a new struct or constructor parameter on SearchContactRequest.

Implementation Plan

| Action | File | What changes |
| --- | --- | --- |
| extend | internal/app/payload/search_contact_request.go | Add Cipher pkgcrypto.ICipher and EncryptedReadEnabled bool fields to SearchContactRequest; populate in the handler before calling ToFilters() |
| extend | internal/app/payload/search_contact_request.go | In ToFilters(): name block → name_search token array; email block → email_bidx when flag+cipher set; phone block → phone_bidx when flag+cipher set |
| extend | internal/app/handler/contact_handler.go (and any other handler that constructs SearchContactRequest) | Populate Cipher + EncryptedReadEnabled from cfg + Redis before calling ToFilters() |
| extend | internal/app/payload/search_contact_request_test.go (create if absent) | Tests: name token routing; email/phone bidx routing with flag on; fallback to plaintext with flag off |

Implementation steps

  1. Explore. Open search_contact_request.go lines 119–248. Note the Name block (lines 167–178) uses util.TokenizeString and name_tokenized. The Email and Phone fields are currently passed through bson marshal/unmarshal directly (lines 120–129) — they become plain field queries. Find where SearchContactRequest is constructed in contact_handler.go.

  2. Add fields to SearchContactRequest.

    // Populated by the handler before ToFilters; not serialized.
    Cipher pkgcrypto.ICipher `json:"-" bson:"-"`
    EncryptedReadEnabled bool `json:"-" bson:"-"`
  3. Update ToFilters() — name block (replace name_tokenized with name_search):

    if req.Name != "" {
        tokens := util.TokenizeString(req.Name)
        delete(filters, "name")
        tokenField := "name_search" // was name_tokenized; name_search is populated by dual-write
        var conditions []bson.M
        for _, token := range tokens {
            conditions = append(conditions, bson.M{tokenField: bson.M{"$elemMatch": bson.M{"$regex": "^" + token}}})
        }
        AppendConditions(&filters, "$and", conditions)
    }
  4. Update ToFilters() — email block (after the bson marshal/unmarshal, add override):

    if req.Email != "" && req.EncryptedReadEnabled && req.Cipher != nil {
        delete(filters, "email")
        filters["email_bidx"] = req.Cipher.BlindIndex(strings.ToLower(req.Email))
    }
  5. Update ToFilters() — phone block (primitive.A case in the switch, after populating $or conditions):

    if len(req.Phone) > 0 && req.EncryptedReadEnabled && req.Cipher != nil {
        // Replace the existing $or phone conditions with bidx lookups.
        // Caution: delete removes every $or clause on filters, so confirm that
        // phone is the only $or contributor at this point in ToFilters().
        delete(filters, "$or") // remove the phone $or just added
        var bidxConditions []bson.M
        for _, p := range req.Phone {
            bidxConditions = append(bidxConditions, bson.M{"phone_bidx": req.Cipher.BlindIndex(p)})
        }
        AppendConditions(&filters, "$or", bidxConditions)
    }
  6. Wire in handler. In contact_handler.go (wherever req.ToFilters() is called), before calling it:

    req.Cipher = h.contactRepoCfg.Cipher
    val, _ := h.cacheRepo.Get(ctx, contact.PIIReadEncryptedEnabledKey)
    req.EncryptedReadEnabled = val != ""
  7. Write tests. In search_contact_request_test.go:

    • Name always routes to name_search (not name_tokenized).
    • Email with flag off → email field in filter.
    • Email with flag on + cipher → email_bidx field in filter.
    • Phone with flag on + cipher → phone_bidx in filter.
  8. Run. go test ./internal/app/payload/... -v

Acceptance criteria

  • SearchContacts with name=john queries name_search (not name_tokenized).
  • SearchContacts with email=test@example.com and flag on returns the same results as plaintext search against the same dataset (staging smoke test).
  • SearchContacts with phone[]=+628111 and flag on returns the correct contact.
  • With flag off, all three search fields behave identically to the pre-task behavior.

Test strategy

Unit tests construct a SearchContactRequest with controlled Cipher/EncryptedReadEnabled, call ToFilters(), and assert the bson.M filter contains the expected keys. No real MongoDB needed.
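
As an illustration of that test shape, the sketch below builds the phone conditions from a trimmed-down request and checks which key each condition carries. It is not the real ToFilters(): searchRequest, phoneConditions, and fakeCipher are hypothetical, map[string]any stands in for bson.M, and the plaintext condition shape ({ phone: value }) is an assumption about the existing query.

```go
package main

import "fmt"

// blindIndexer is a hypothetical minimal view of pkgcrypto.ICipher.
type blindIndexer interface {
	BlindIndex(s string) string
}

// fakeCipher is a test double with a predictable output.
type fakeCipher struct{}

func (fakeCipher) BlindIndex(s string) string { return "bidx(" + s + ")" }

// searchRequest is a trimmed-down stand-in for SearchContactRequest.
type searchRequest struct {
	Phone                []string
	Cipher               blindIndexer
	EncryptedReadEnabled bool
}

// phoneConditions returns one condition per phone entry. Each entry is
// routed to phone_bidx only when both the flag and the cipher are set.
func (r searchRequest) phoneConditions() []map[string]any {
	useBidx := r.EncryptedReadEnabled && r.Cipher != nil
	conds := make([]map[string]any, 0, len(r.Phone))
	for _, p := range r.Phone {
		if useBidx {
			conds = append(conds, map[string]any{"phone_bidx": r.Cipher.BlindIndex(p)})
		} else {
			conds = append(conds, map[string]any{"phone": p}) // assumed plaintext shape
		}
	}
	return conds
}

func main() {
	req := searchRequest{Phone: []string{"+628111", "+628222"}, Cipher: fakeCipher{}}
	fmt.Println("flag off:", req.phoneConditions())
	req.EncryptedReadEnabled = true
	fmt.Println("flag on: ", req.phoneConditions())
}
```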

Effort estimate

| Discipline | Days |
| --- | --- |
| Backend | 1 |
| QA | 0.25 |
| Total | 1.25 |

Assumptions: SearchContactRequest handler wiring is straightforward; pkgcrypto.ICipher is already importable in the payload package without import cycles (verify — if cycle exists, pass a BlindIndex func(string) string instead).
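
If the import cycle does appear, the function-typed fallback could look roughly like the sketch below. It is a minimal illustration, not the planned struct: emailSearch and applyEmail are hypothetical names, and map[string]any stands in for bson.M.

```go
package main

import (
	"fmt"
	"strings"
)

// emailSearch is a hypothetical slice of SearchContactRequest that carries a
// plain function instead of pkgcrypto.ICipher, so the payload package does
// not have to import the crypto package.
type emailSearch struct {
	Email                string
	BlindIndex           func(string) string // nil when encryption is disabled
	EncryptedReadEnabled bool
}

// applyEmail swaps the plaintext email filter for email_bidx when the flag
// is on and a BlindIndex function was injected; otherwise it leaves the
// filters untouched.
func (r emailSearch) applyEmail(filters map[string]any) {
	if r.Email == "" || !r.EncryptedReadEnabled || r.BlindIndex == nil {
		return
	}
	delete(filters, "email")
	filters["email_bidx"] = r.BlindIndex(strings.ToLower(r.Email))
}

func main() {
	filters := map[string]any{"email": "Test@Example.com"}
	req := emailSearch{
		Email:                "Test@Example.com",
		BlindIndex:           func(s string) string { return "bidx:" + s },
		EncryptedReadEnabled: true,
	}
	req.applyEmail(filters)
	fmt.Println(filters)
}
```

With this shape the handler can pass the cipher's BlindIndex method as a method value, so only the handler package depends on the crypto package.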

Run to verify

go test ./internal/app/payload/... -v -run ToFilters

Depends on

  • Task 3 (needs PIIReadEncryptedEnabledKey constant)
  • Task 4 (same flag semantics; implement after confirming blind-index query correctness)

Task 6: [BE] Phase 4 cleanup — remove legacy plaintext write + unset migration (XV)

All plaintext PII is removed from the contact collection; the service writes only encrypted fields; the plaintext fallback read path is deleted. Irreversible at the data level.

Status: ⚠️ Gated — do not start until all Phase 4 gate criteria pass (see Acceptance criteria below).

Design reference: n/a — BE only

What to build

  1. Stop writing plaintext PII fields (name, email, phone, usernames) in create/update paths — write encrypted fields only.
  2. Add a one-time MongoDB $unset migration for all existing plaintext fields.
  3. Remove the plaintext fallback branch from maybeDecrypt (Task 3).

Implementation Plan

| Action | File | What changes |
| --- | --- | --- |
| extend | internal/app/repository/contact/crypto_helpers.go | In encryptContactPIIFields: remove the c.Name = "" / c.Email = "" / c.Phone = nil / c.Usernames = nil zeroing lines; the zeroing is now permanent (plaintext not stored), and plaintext fields in the $set map are never sent |
| extend | internal/app/repository/contact/create.go | After encryptContactPIIFields, explicitly zero the plaintext fields on the Contact before insert so bson omitempty drops them |
| extend | internal/app/repository/contact/update.go | In applyDualWriteToBSONMap: after writing encrypted counterparts, delete the plaintext key from fields (delete(fields, "name") etc.) |
| extend | internal/app/repository/contact/decrypt_helper.go | Remove the name_encrypted == nil fallback branch; if flag on but no name_encrypted, return an error (log + skip; doc should not exist post-migration) |
| create | db/migrations/0NN_contact_pii_cleanup.up.json | MongoDB $unset migration for name, email, phone, usernames on all non-deleted contacts with name_encrypted present |
| create | db/migrations/0NN_contact_pii_cleanup.down.json | Down migration: no-op (data cannot be restored from down; point-in-time restore is the rollback) |
| extend | internal/app/repository/contact/create_test.go | Assert that inserted doc does NOT contain name/email/phone/usernames plaintext fields |
| extend | internal/app/repository/contact/update_test.go | Assert that $set map does NOT contain plaintext PII fields |

Implementation steps

  1. Confirm gate. Before any code change, verify all of the following:

    • GET /private/contacts/backfill/pii/status returns missing_count = 0 for 2 consecutive runs.
    • pii_read_encrypted_enabled has been on for ≥ 7 days in production with zero cdp_pii_decrypt_error_count.
    • cdp_pii_read_plaintext_fallback counter is zero for 24 h.
    • Data Platform confirms TF-2589 (Kafka consumer + ETL paths) is complete.
    • Infosec sign-off documented in a delivery/decisions/ ADR.
  2. Write the $unset migration. In db/migrations/0NN_contact_pii_cleanup.up.json:

    [
      {
        "updateMany": {
          "filter": { "name_encrypted": { "$exists": true }, "is_deleted": false },
          "update": { "$unset": { "name": "", "email": "", "phone": "", "usernames": "" } }
        },
        "collection": "contact"
      }
    ]

    Test this on a staging MongoDB snapshot first. Verify document count before/after; spot-check 5 random docs to confirm name_encrypted present and name absent.

  3. Stop writing plaintext in create. Open internal/app/repository/contact/create.go. After encryptContactPIIFields is called on the contact struct, add explicit nil-outs:

    data.Name = ""
    data.Email = ""
    data.Phone = nil
    data.Usernames = nil

    The omitempty bson tag will then omit them from the insert document.

  4. Stop writing plaintext in update. Open internal/app/repository/contact/update.go / applyDualWriteToBSONMap in crypto_helpers.go. After encryptContactPIIFields writes the encrypted fields into the map, add delete(fields, "name"), delete(fields, "email"), delete(fields, "phone"), delete(fields, "usernames") to remove plaintext from the $set.

  5. Remove fallback. In decrypt_helper.go, remove the name_encrypted == nil block. The method should return an error (or a structured log + no-op) if the flag is on but the document somehow lacks name_encrypted — these docs should not exist post-migration.

  6. Write tests. Integration-style test on create.go: create a contact with Name="John", call InsertContact, assert the inserted bson document (captured via mock or test-DB) does NOT contain the name key. Similarly for applyDualWriteToBSONMap: assert fields does not contain "name" after the function returns.

  7. Run tests. go test ./internal/app/repository/contact/... -v

  8. Run migration. Apply 0NN_contact_pii_cleanup.up.json in staging first; confirm no contacts have both name and name_encrypted set. Then apply in production during a low-traffic window.
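
The update-path change in step 4 amounts to dropping four keys from the $set map after the encrypted counterparts are written. A minimal sketch follows, assuming the map is a plain map[string]any (the real code uses a BSON map); stripPlaintextPII and plaintextPIIKeys are hypothetical names.

```go
package main

import "fmt"

// plaintextPIIKeys lists the plaintext fields in scope for this task.
var plaintextPIIKeys = []string{"name", "email", "phone", "usernames"}

// stripPlaintextPII deletes plaintext PII keys from an update map in place
// and returns the keys it actually removed, which is handy for logging.
func stripPlaintextPII(fields map[string]any) []string {
	var removed []string
	for _, k := range plaintextPIIKeys {
		if _, present := fields[k]; present {
			delete(fields, k)
			removed = append(removed, k)
		}
	}
	return removed
}

func main() {
	fields := map[string]any{
		"name":           "John",
		"name_encrypted": "ciphertext",
		"email":          "john@example.com",
		"updated_at":     "2024-01-01",
	}
	fmt.Println("removed:", stripPlaintextPII(fields))
	fmt.Println("remaining:", fields)
}
```

Deleting the key, rather than setting it to an empty value, matters here: a key left in the $set map with an empty value would still write the field to the document.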

Acceptance criteria

Gate criteria (must pass before coding starts):

  • Reconciliation: missing_count = 0 for 2 consecutive status endpoint checks.
  • pii_read_encrypted_enabled has been active in production for ≥ 7 days.
  • cdp_pii_read_plaintext_fallback = 0 for 24 h.
  • TF-2589 (Kafka/consumer paths) signed off by Data Platform.
  • Infosec sign-off recorded in cdp/pii-encryption/delivery/decisions/0001-phase4-plaintext-removal.md.

Implementation criteria:

  • POST /iag/v1/contacts (create): inserted document contains name_encrypted and does NOT contain name.
  • PUT /iag/v1/contacts/{id} (update): $set payload contains name_encrypted and does NOT contain name.
  • GET /iag/v1/contacts/{id}: response name, email, phone are correctly decrypted and match the original values.
  • MongoDB $unset migration completes with zero errors; spot-check confirms name absent on 10 random docs with name_encrypted set.
  • No plaintext PII appears in service logs after migration.

Test strategy

Create/update tests use a test-double or real test-DB docker container to inspect the actual bson document written to Mongo. The key assertion is field absence (the name key must not be present in the BSON document), not just value correctness.
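
The difference between "absent" and "present but empty" is easy to miss in an assertion. A minimal sketch of a presence-based check follows, assuming the captured document has been decoded into a plain map; checkNoPlaintext is a hypothetical helper, not part of the codebase.

```go
package main

import (
	"fmt"
	"strings"
)

// checkNoPlaintext returns an error naming every plaintext PII key that is
// present in doc. Presence is what matters: a key with an empty value still
// counts as a failure.
func checkNoPlaintext(doc map[string]any) error {
	var leaked []string
	for _, k := range []string{"name", "email", "phone", "usernames"} {
		if _, present := doc[k]; present {
			leaked = append(leaked, k)
		}
	}
	if len(leaked) > 0 {
		return fmt.Errorf("plaintext PII keys present: %s", strings.Join(leaked, ", "))
	}
	return nil
}

func main() {
	clean := map[string]any{"name_encrypted": "ciphertext"}
	emptyButPresent := map[string]any{"name_encrypted": "ciphertext", "name": ""}

	fmt.Println("clean:", checkNoPlaintext(clean))
	fmt.Println("empty but present:", checkNoPlaintext(emptyButPresent))
}
```

A value-based assertion such as doc["name"] == "" would pass for both documents above, which is why the check looks at key presence.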

Effort estimate

| Discipline | Days |
| --- | --- |
| Backend | 1.5 |
| QA | 1 |
| Total | 2.5 |

QA is heavier here (1 day) because the migration is irreversible: QA must validate staging end-to-end before production, including a full smoke test of create/read/update/search after the unset migration runs.

Run to verify

go test ./internal/app/repository/contact/... -v -run Create
go test ./internal/app/repository/contact/... -v -run Update
# then verify migration on staging:
# mongo contact --eval 'db.contact.findOne({ name_encrypted: { $exists: true } })' → name field absent

Depends on

  • All preceding tasks (1–5)
  • External: TF-2589 (Kafka/consumer path updates by Data Platform)
  • Gate: all criteria listed above

Ordering rationale

  • Task 1 first — fixes the immediate production problem (timeout). Includes the name_encrypted index migration (035) — run make migrate-up off-peak before deploying.
  • Task 1b immediately after Task 1 — extends BuildPIIEncryptedUpdateFields with address fields; purely additive, no backfill logic. Can ship while the main cron is still running in production.
  • Task 2 in parallel with 1b — the reconciliation status endpoint is independent; build while the cron runs in staging.
  • Task 3 only after Task 2 confirms clean: missing_count = 0 (main reconciliation) is required before toggling pii_read_encrypted_enabled in production.
  • Task 1c last, after all other tickets done — deferred; does not gate Phase 3 or Phase 4. Requires the sparse index migration (036) applied off-peak before running. The bounded HTTP endpoint cleans up the address gap independently.
  • Task 3 and 4 can overlap — both depend on ContactRepo.cacheRepo but not on each other. Start Task 4 as soon as the cacheRepo field is merged from Task 3.
  • Task 5 after Task 4: ToFilters() reuses PIIReadEncryptedEnabledKey and cipher; build after confirming blind-index queries return correct results in Task 4.
  • Task 6 last and gated — irreversible; do not start until all gate criteria pass. Push on Data Platform to complete TF-2589 early — it is the external dependency most likely to delay Phase 4.

Skipped stories

| Story / Task | Reason |
| --- | --- |
| XIII: SearchByAccountUniqueID → accounts_bidx | Excluded by product decision: accounts field is not used and will not be encrypted. |
| Kafka consumer + webhook paths (TF-2589) | Owned by Data Platform; out of RFC scope. Required as a Phase 4 gate: must complete before Task 6 starts. |
| Key rotation (MultiAlgAdapter) | Out of scope: follow-up after steady-state confirmed. |