[PRD] CEBE | Core Infrastructure & CDP Customer Event — Phase 1 (TECH)

HEADER BLOCK

| Field | Value |
| --- | --- |
| PM | Zhelia Alifa |
| PRD Version | 1.4 |
| Status | READY |
| PRD Type | TECH |
| Epic | TF-3302 |
| Squad | CDP Squad (lead) + BI/Data Squad (co-own) |
| RFC Link | REQUIRED before BUILD — joint Eng + Data RFC (Q3 scope) |
| Figma Master | N/A — no UI changes |
| Anchor | Yes — Customer Event-Based Engine (CEBE) — ANCHOR |
| Phase | Phase 1 of 3 (Q3 2026) |
| Labels | epic:cdp \| module:platform \| feature:cebe-core-infra |
| Last Updated | 2026-06-26 |

READY gate: Epic cannot move to In Progress without PRD Link + RFC Link. RFC is critical — it details the implementation approach the PM is not prescribing.


CONDITIONAL BLOCK: TECH CONTEXT

| Field | Detail |
| --- | --- |
| Problem (technical) | Qontak has no shared, event-driven customer data layer. Each module writes its own DB, so there is no standardized way for any squad to publish customer events, no single Central DB, and no reusable metrics/segments. |
| Expected outcome | A live event-driven foundation by end of Q3 2026: (1) a standardized, reusable event schema any squad can push to (keyed on qontak_customer_id via ContactResolver — "push like Mixpanel"); (2) Central DB schema; (3) event adapter layer (contact-service); (4) retriever/metrics layer; (5) a documented query approach for downstream features. CDP customer-data-change + segment events are the first source. Propagation < 5s (p99), zero event loss. |
| Scope — PM-owned | • Standardized reusable event schema published & adopted (CDP + Marketing)<br>• Central DB schema delivered<br>• Retriever layer produces reusable metrics (e.g. % reply rate, % ads conversion)<br>• Query approach documented for segmentation + marketing automation<br>• <5s (p99) propagation SLA; zero event loss; idempotency on qontak_customer_id<br>• CDP emits customer.created/updated + segment.entered/exited as first source<br>• Socialize to all squads by end of Q3 |
| Scope — Eng-owned (PM does not prescribe — joint Eng + Data RFC details these) | • Storage model choice (event-level log vs aggregated vs hybrid)<br>• Streaming infra choice (Kafka / PubSub / existing pipeline)<br>• Adapter implementation, partitioning, indexing strategy<br>• Retriever computation strategy (batch vs streaming vs on-demand) |
| User-facing UI changes | None. This is a backend/data foundation. Any UI change is out of scope. |

1. One-liner + Problem

One-liner: Build CEBE's event-driven foundation — reusable event schema, Central DB, adapter, retriever, and query layer — with CDP customer-data events as the first source.

Problem: There is no shared event layer, so no squad can publish customer events in a standard way and no Central DB exists as a single source of truth. Every intelligence feature (segmentation, marketing automation, AI memory) re-solves cross-module data access from scratch, which is slow, inconsistent, and unscalable. Without this Q3 foundation, the entire CEBE initiative and all downstream H2 2026 features are blocked.


2. What Happens If We Don't Build This

  • Blocks the whole initiative. Phase 2 (Q4) and Phase 3 (Q1'27) cannot start — no foundation to push events onto.
  • No reusable metrics or segments. Marketing and segmentation stay manual; reply-rate and ads-conversion can't be computed centrally.
  • Per-squad re-implementation cost compounds every quarter the shared layer is absent.

3. Target Users + Persona Context

| Persona | Role | Impact | Current State |
| --- | --- | --- | --- |
| Indirect — All product squads + Business Admin / Marketing Owner | Squads that will publish customer events; admins/marketers who consume segments & metrics | Squads get a standard, self-serve way to push events; consumers get fresh, unified data | No standard event contract; siloed per-module DBs; stale, manually reconciled data |

4. Non-Goals

  1. Does not build the segmentation UI, marketing automation UI, AI memory, or health score — these are downstream consumers in later work.
  2. Does not connect Communication, Ticket, Loyalty, Deal, Commerce, or Booking events — those are Phase 2 / Phase 3.
  3. Does not change any module's existing user-facing behavior.
  4. Does not define a customer-facing API; scope is the internal event contract + Central DB + retriever/query layer.
  5. Does not own Marketing (Broadcast + Ads) event emission — that is the Broadcast squad's SUPPORT PRD; this PRD provides the schema + adapter they push to.

5. Scope Changes

Engineering surfaces this PRD touches (controlled vocab). Kept in sync with the scope_changes frontmatter above.

  • Backend — contact-service: new Event Adapter Layer (ingestion consumer/endpoint), ContactResolver identity resolution keyed on qontak_customer_id, idempotency + duplicate-identity dedup, PII masking at the adapter; CDP emits customer.created / customer.updated / segment.entered / segment.exited as the first source.
  • Data — Central DB schema (event-level log + customer-level aggregation), retriever/metrics materialization (e.g. % reply rate, % ads conversion), documented query approach for segmentation + marketing automation, 12-month historical backfill, joint BI/Data feasibility assessment.
  • Infra — streaming/queue capacity + async buffer to hold the <5s p99 propagation SLA with zero loss; monitoring & alerting on propagation latency and loss reconciliation.
  • Docs — standardized event-schema spec + event-push guide socialized to all squads at end of Q3.
  • Frontend / Mobile / Design — None (no UI surface in this phase).

6. Constraints

| Constraint | Detail |
| --- | --- |
| Platform | Backend / data platform (contact-service + Central DB). No UI surface. |
| Performance target | Event propagation < 5s p99 (source → Central DB). Aggregated customer query < 1–2s for UI consumers. |
| Uptime / SLA | Zero data loss across module event streams. |
| Backward compat | Event schema is additive/versioned — schema changes must not break existing publishers. |
| Plan / tier | N/A — internal platform foundation; no plan/tier gating on this initiative. |
| Data integrity | Idempotency via qontak_customer_id; dedup of duplicate identities (phone/email); handle late-arriving events. |
| Security | PII masked at the adapter layer per role-based access policy; team-level scope. |
| Feature flag | cebe_event_ingestion \| default: OFF (per-source enablement) — CDP + Marketing first. |
| Backfill | On first integration, backfill last 12 months of historical events. |
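
The backward-compat constraint (additive, versioned schema) can be illustrated with a small compatibility check. This is only a sketch under assumptions: the schema representation (a dict of field name to `type`/`required`) and the function name `breaking_changes` are hypothetical, since the real envelope and any schema registry are left to the anchor document and the RFC.

```python
from typing import Dict, List

# Hypothetical schema shape: field name -> {"type": str, "required": bool}
Schema = Dict[str, dict]


def breaking_changes(old: Schema, new: Schema) -> List[str]:
    """List the changes in `new` that would break publishers written against `old`."""
    problems: List[str] = []
    for name, spec in old.items():
        if name not in new:
            problems.append(f"removed field: {name}")
            continue
        if new[name]["type"] != spec["type"]:
            problems.append(f"type changed: {name}")
        if new[name].get("required", False) and not spec.get("required", False):
            problems.append(f"optional field became required: {name}")
    for name, spec in new.items():
        # Additive changes are fine only if existing publishers can omit the field.
        if name not in old and spec.get("required", False):
            problems.append(f"new required field: {name}")
    return problems
```

An empty result means the new version is purely additive; any entry would call for a new major schema version rather than an in-place change.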

6.7 Data Lifecycle

| Artifact Type | Retention Period | Cleanup Trigger | User-Visible Effect |
| --- | --- | --- | --- |
| Raw event log (append-only) | Per Data governance (TBD in RFC — default 24 months) | Partition expiry job by time | None |
| Backfill processing job artifacts | 7 days | Cleanup job on backfill completion | None |
| Aggregated/materialized metrics | Live (refreshed) | Recompute on new events | None |

7. Rollout

| Aspect | Detail |
| --- | --- |
| Feature flag | cebe_event_ingestion \| default: OFF (per source: CDP, then Marketing) |
| Rollout | Stage 1 → Staging: schema + adapter + Central DB validated with synthetic events<br>Stage 2 → Canary: CDP events from N pilot accounts; verify <5s + zero loss<br>Stage 3 → CDP events all accounts + Marketing (Broadcast+Ads) events on<br>GA → Retriever metrics + query approach published; socialize to all squads |
| Backward compat | Old per-module DBs remain untouched; CEBE runs alongside (read path additive). |
| Migration | Backfill last 12 months for CDP on first connect (see RFC for backfill + rollback). |
| Rollback plan | Flag OFF stops ingestion per source within minutes; no destructive change to source modules. |
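
The per-source flag and the canary stage (N pilot accounts) could be modeled roughly as below. This is a sketch, not the flag service's real interface: the class names, the `pilot_accounts` allowlist, and the source keys such as `"cdp"` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class SourceFlag:
    enabled: bool = False                      # default OFF, per the rollout table
    pilot_accounts: Optional[Set[str]] = None  # None means "all accounts" (assumed convention)


@dataclass
class CebeEventIngestionFlag:
    """Hypothetical per-source view of the cebe_event_ingestion flag."""
    sources: Dict[str, SourceFlag] = field(default_factory=dict)

    def allows(self, source: str, account_id: str) -> bool:
        flag = self.sources.get(source, SourceFlag())
        if not flag.enabled:
            return False  # rollback path: flag OFF stops ingestion for this source
        return flag.pilot_accounts is None or account_id in flag.pilot_accounts
```

Under this model, Stage 2 sets `pilot_accounts` for CDP, Stage 3 clears it and enables the Marketing source, and rollback is a single `enabled = False` on the affected source.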

7.4 Migration Transition Window

| Aspect | Detail |
| --- | --- |
| Old behavior | Consumers read per-module DBs directly (existing behavior unchanged). |
| New behavior | Consumers read unified data from CEBE Central DB / retriever layer. |
| Coexistence period | Through Q3–Q4 while modules onboard; both paths valid. |
| End state | Once a module is on CEBE, downstream consumers migrate to CEBE reads for that domain. |

8. Observability

Key Events

| Event Name | Trigger | Properties |
| --- | --- | --- |
| cebe_event_ingested | Event written to Central DB | source_module, event_name, qontak_customer_id, latency_ms, timestamp |
| cebe_event_rejected | Event fails validation/dedup | source_module, reason, qontak_customer_id |
| cebe_propagation_slo_breach | Propagation > 5s | source_module, latency_ms |
| cebe_backfill_progress | Backfill batch processed | source_module, records_processed, percent_complete |
| cebe_retriever_metric_computed | Metric recomputed | metric_name, compute_time_ms |
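
As an illustration of how `cebe_backfill_progress` might be emitted once per batch, here is a minimal sketch. The function name, the `write_batch` and `emit` callables, and the `event_name` key are assumptions; batching strategy and throttling to protect the live <5s SLA are left to the RFC and are not shown.

```python
from typing import Callable, Sequence


def run_backfill(
    source_module: str,
    records: Sequence[dict],
    batch_size: int,
    write_batch: Callable[[Sequence[dict]], None],
    emit: Callable[[dict], None],
) -> int:
    """Write records in fixed-size batches and emit one progress event per batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    total = len(records)
    processed = 0
    for start in range(0, total, batch_size):
        batch = records[start:start + batch_size]
        write_batch(batch)
        processed += len(batch)
        emit({
            "event_name": "cebe_backfill_progress",
            "source_module": source_module,
            "records_processed": processed,
            "percent_complete": round(100 * processed / total, 1),
        })
    return processed
```

The returned count can be compared against the source record count as a simple zero-loss check for the backfill itself.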

Dashboard & Alerts

| Aspect | Detail |
| --- | --- |
| Dashboard owner | BI/Data Squad (initiative) + CDP Squad (adapter/service) |
| Baseline period | Capture ingestion latency baseline during staging/canary before GA. |
| Alerts | • Propagation latency p99 > 5s for 5 min → PagerDuty on-call (CDP)<br>• Event rejection rate > 1% in 10 min → PagerDuty on-call (CDP)<br>• Zero-loss check mismatch (source count vs CEBE count) → Engineering escalation same day |
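
For readers less familiar with percentile SLOs, the p99 latency condition can be sketched as below. It assumes a nearest-rank percentile over the `latency_ms` values collected in one evaluation window; the function names are hypothetical, and the "for 5 min" windowing would live in the monitoring stack rather than in this snippet.

```python
from typing import Sequence


def p99(latencies_ms: Sequence[float]) -> float:
    """Nearest-rank 99th percentile of the given latency samples."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    # ceil(0.99 * n) computed with integers to avoid float rounding surprises
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def slo_breached(latencies_ms: Sequence[float], threshold_ms: float = 5000) -> bool:
    """True when p99 propagation latency exceeds the threshold (5s by default)."""
    return p99(latencies_ms) > threshold_ms
```

With 100 samples, a single slow event does not breach the SLO, but two do, which is why the alert keys on p99 rather than on the maximum.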

8.5 Post-Launch Monitoring Cadence

| Aspect | Detail |
| --- | --- |
| Review cadence | Weekly for first 4 weeks post-GA, then monthly |
| Owner | CDP PM + BI/Data lead |
| Review scope | Propagation latency, rejection rate, zero-loss reconciliation, retriever freshness |
| Trigger thresholds | • p99 latency > 5s for 2 consecutive days → investigation<br>• Loss reconciliation mismatch > 0 → same-day engineering escalation |
| Rollback consideration | If ingestion causes source-module degradation, flag OFF for affected source and investigate. |
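
The zero-loss reconciliation (source count vs CEBE count) might look like the following sketch. It assumes both sides can report event counts per `source_module` for the same time window; the function name and input shape are hypothetical.

```python
from typing import Dict


def reconcile_counts(source_counts: Dict[str, int], cebe_counts: Dict[str, int]) -> Dict[str, int]:
    """Return {source_module: source_count - cebe_count} for every module that mismatches.

    A positive value means events are missing in CEBE (potential loss);
    a negative value means CEBE holds more events than the source reported.
    """
    mismatches: Dict[str, int] = {}
    for module in source_counts.keys() | cebe_counts.keys():
        diff = source_counts.get(module, 0) - cebe_counts.get(module, 0)
        if diff != 0:
            mismatches[module] = diff
    return mismatches
```

Any non-empty result corresponds to the "mismatch > 0" trigger above and would warrant same-day escalation.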

9. Success Metrics

| Category | Metric | Definition | Baseline | Target |
| --- | --- | --- | --- | --- |
| Performance | Event propagation latency (p99) | Time from source event emission to availability in Central DB | N/A — new | < 5s p99 by end of Q3 2026 |
| Reliability & Quality | Zero event loss | Source event count vs CEBE-ingested count reconciliation | N/A | 0 lost events — ongoing |
| Reliability & Quality | First-source coverage | CDP customer + Marketing (Broadcast+Ads) events live on CEBE | 0 | Both sources live by end of Q3 2026 |
| Enablement | Reusable metrics & query approach available | Reply rate, ads conversion, use-case segments queryable from CEBE | N/A | Published + documented for segmentation/MA by end of Q3 2026 |
| Enablement | Squad adoption readiness | Standardized event-push schema + guide socialized to all squads | 0 | All squads onboarded to push their domain events starting Q4 2026 |

10. Launch Plan & Stage Gates

| Stage | Audience | Duration | Success Gate to Advance | Owner |
| --- | --- | --- | --- | --- |
| Internal Staging | Staging + synthetic events | ~2 weeks | Schema + adapter + DB validated; <5s on synthetic load | PM + Eng (CDP) + BI/Data |
| Canary | CDP events, N pilot accounts | ~2 weeks | <5s p99, zero loss, idempotency verified | Eng Lead (CDP) |
| Staged Rollout | CDP all accounts + Marketing source | ~2–3 weeks | SLA sustained; retriever metrics correct vs source | Eng Lead + PM |
| GA | Query approach published; all squads socialized | Ongoing | Gates sustained 2 weeks; squads can push events in Q4 | PM + BI/Data |

11. Dependencies

| Dependency | Owning Team | Deliverable Needed | Blocking? |
| --- | --- | --- | --- |
| Joint Eng + Data RFC approval | Eng (CDP) + BI/Data | Approved storage model, streaming infra, query approach | YES |
| Streaming/queue infra capacity | Infra/Platform | Confirmed capacity for <5s, zero-loss buffer | YES |
| qontak_customer_id / ContactResolver | CDP Squad | Stable identity-resolution key available to all publishers | YES |
| Marketing event emission | Broadcast Squad (SUPPORT PRD) | Broadcast + Ads events pushed to the schema this PRD defines | NO (parallel) |
| Data governance / PII policy | Security / Data Gov | Sign-off on retention + masking | YES |

12. Key Decisions + Alternatives Rejected

12a — Decisions Made

| Date | Decision | Rationale |
| --- | --- | --- |
| 2026-06-26 | Standardized reusable event schema, push model keyed on qontak_customer_id | Squads self-serve like Mixpanel; avoids per-squad integrations |
| 2026-06-26 | CDP customer-data + Marketing are the first two sources in Q3 | Aligns with re-prioritized Primary use-case |
| 2026-06-26 | Joint RFC & assessment on both Eng and Data sides before build | Storage/query design is data-heavy; needs both perspectives |
| 2026-06-26 | Retriever layer materializes reusable metrics in the DB | Downstream features query metrics directly; avoids recompute per feature |

12b — Alternatives Rejected

| Alternative | Why Rejected | Date |
| --- | --- | --- |
| Point-to-point integrations per consumer | Doesn't scale; inconsistent; re-paid per squad | 2026-06-26 |
| Polling module APIs | Cannot meet <5s; couples CEBE to module APIs | 2026-06-26 |
| Compute all metrics on-demand (no retriever materialization) | Slow for UI (<1–2s) and repeated cost across features | 2026-06-26 |

13. Open Questions

| # | Type | Question | Owner | Deadline |
| --- | --- | --- | --- | --- |
| 1 | Open Question | Storage model: event-level log vs aggregated vs hybrid? | BI/Data + Eng (CDP) | 2026-07-15 |
| 2 | Open Question | Streaming infra (Kafka/PubSub) needed, or current pipeline sufficient for <5s + zero loss? | Eng (CDP) + Infra | 2026-07-15 |
| 3 | Open Question | Query approach: pre-aggregated vs on-the-fly — speed vs storage cost trade-off? | BI/Data + Eng (CDP) | 2026-08-15 |
| 4 | Assumption | Backfill of last 12 months for CDP is feasible within Q3 capacity | Eng (CDP) | 2026-07-31 |
| 5 | Risk | High write volume breaches <5s. Mitigation: pre-size DB; async queue buffer | Eng (Infra) | 2026-08-31 |
| 6 | Risk | Duplicate identities / late events corrupt profiles. Mitigation: idempotency + dedup at adapter | Eng (CDP) | 2026-08-31 |

14. API & Event Behavior

Behavioral contracts in plain language — HTTP methods, schemas, and topic design are resolved in the RFC. The standard event envelope and the full per-module event list live in the anchor: Module Event Catalog — not duplicated here.

| # | Behavior | Entity Affected | Triggered By | Expected Behavior | Failure Behavior |
| --- | --- | --- | --- | --- | --- |
| 1 | Standardized event ingestion (push) | CEBE Central DB (new event records) | Any squad publishing a standardized customer event | • Adapter resolves identity via ContactResolver (qontak_customer_id)<br>• Validates schema; dedups via idempotency key; writes to Central DB<br>• Available downstream in < 5s p99 | • Invalid schema → rejected + cebe_event_rejected logged<br>• Duplicate idempotency key → no-op (idempotent)<br>• Downstream/DB unavailable → buffered in async queue, retried (no loss) |
| 2 | CDP customer-data-change emission (first source) | CEBE customer records / segment membership | customer.created/updated, segment.entered/exited in CDP (event list) | • CDP changes propagate to CEBE in <5s; profile + segment state updated<br>• Backfill loads last 12 months on first connect | • Emission failure → retried; reconciliation job catches gaps (zero loss)<br>• Conflicting/duplicate identity → dedup on qontak_customer_id |
| 3 | Retriever metrics + query approach | Materialized metrics / queryable views in Central DB | New events ingested (recompute) + downstream queries | • Retriever computes reusable metrics (e.g. % reply rate, % ads conversion)<br>• Documented query approach for segmentation + marketing automation<br>• Aggregated customer query < 1–2s | • Stale metric beyond freshness SLA → flagged; recompute triggered<br>• Query timeout → returns last materialized value + freshness timestamp |
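
The failure behavior for behavior 3 (query timeout returns the last materialized value plus a freshness timestamp, with staleness flagged) can be sketched as follows. Everything named here is hypothetical: the `Retriever` class, the `MetricValue` shape, and the `freshness_sla_s` parameter, since the query interface and freshness SLA are still to be settled in the RFC.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class MetricValue:
    value: float
    computed_at: float   # epoch seconds of the last successful materialization
    stale: bool = False  # True when older than the freshness SLA


class Retriever:
    def __init__(self, freshness_sla_s: float):
        self.freshness_sla_s = freshness_sla_s
        self._materialized: Dict[str, MetricValue] = {}

    def materialize(self, name: str, value: float, now: float) -> None:
        self._materialized[name] = MetricValue(value, now)

    def query(self, name: str, recompute: Callable[[], float], now: float) -> MetricValue:
        try:
            value = recompute()
        except TimeoutError:
            # Fall back to the last materialized value; raises KeyError if none exists yet.
            last = self._materialized[name]
            is_stale = (now - last.computed_at) > self.freshness_sla_s
            return MetricValue(last.value, last.computed_at, is_stale)
        self.materialize(name, value, now)
        return self._materialized[name]
```

Returning `computed_at` alongside the value lets a consumer decide whether a slightly old reply rate is acceptable instead of failing the request outright.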

Claude to resolve during RFC: transport (topic/endpoint), schema registry, retry/DLQ (B1); CDC vs explicit emit, backfill batching (B2); materialization strategy, query interface, freshness SLA (B3).


15. System Flow + User Stories + ACs

15.1 System Flow

Flow: Standardized customer event ingestion into CEBE · Type: Integration Flow

  1. A squad (CDP first) emits a standardized customer event with qontak_customer_id.
  2. Event Adapter (contact-service) receives it; resolves identity via ContactResolver.
  3. Adapter validates schema + checks idempotency key (dedup).
  4. Valid event written to Central DB; PII fields masked per policy. (<5s p99)
  5. Retriever layer recomputes affected reusable metrics (reply rate, ads conversion, etc.).
  6. Downstream features query the retriever / Central DB via the documented query approach.
  7. Failure: invalid schema → rejected + logged; duplicate → idempotent no-op; DB unavailable → buffered in async queue + retried; reconciliation job ensures zero loss.
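
Steps 2 to 4 and the failure branch in step 7 can be sketched with an in-memory stand-in for the adapter. This is an illustration under assumptions, not the contact-service implementation: the required field names (including `idempotency_key`), the class name, and the list used as an async buffer are all hypothetical, and ContactResolver lookup and PII masking are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed minimal envelope fields; the real envelope lives in the anchor document.
REQUIRED_FIELDS = ("event_name", "qontak_customer_id", "idempotency_key", "source_module")


@dataclass
class InMemoryAdapter:
    db_available: bool = True
    central_db: Dict[str, dict] = field(default_factory=dict)  # idempotency_key -> event
    retry_queue: List[dict] = field(default_factory=list)      # stand-in for the async buffer
    rejected: List[dict] = field(default_factory=list)         # stand-in for cebe_event_rejected

    def ingest(self, event: dict) -> str:
        missing = [name for name in REQUIRED_FIELDS if not event.get(name)]
        if missing:
            self.rejected.append({"reason": f"missing fields: {missing}", "event": event})
            return "rejected"
        key = event["idempotency_key"]
        if key in self.central_db:
            return "duplicate"  # idempotent no-op
        if not self.db_available:
            self.retry_queue.append(event)  # buffered, not dropped
            return "buffered"
        self.central_db[key] = event
        return "ingested"

    def drain(self) -> int:
        """Retry buffered events once the DB is back; returns how many were written."""
        if not self.db_available:
            return 0
        pending, self.retry_queue = self.retry_queue, []
        return sum(self.ingest(event) == "ingested" for event in pending)
```

Because retries go back through `ingest`, a buffered event that was also replayed by the publisher still results in a single record, which is the property AC-2 asks for.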

Architecture & ingestion diagram: see the anchor — 5. Architecture.

15.2 User Stories

User Story: [CEBE-CORE-S01-REG] — Source modules keep working while CEBE ingests in parallel

As an existing CDP/Broadcast user, I want my module to behave exactly as before while CEBE captures events in the background, so that the foundation work is transparent and introduces no regression.

Importance: Must Have
Mockup: — (no UI)

Technical Notes

Data Fields:
• Standard event envelope — see anchor → Module Event Catalog (not duplicated)
• CDP event list — see anchor → 6. CDP Module
• Flag: cebe_event_ingestion (per §6, default OFF)

Before-After Behavior: Before — modules read/write their own DBs; no event push exists. After — same module behavior for users; CEBE ingestion runs in parallel with no destructive source change.

Acceptance Criteria

— Happy Path —
• AC-1: Given cebe_event_ingestion is ON for CDP (per §6 Feature flag), when an admin creates/updates a customer, then the CDP module behaves exactly as before AND a customer.created/customer.updated event reaches CEBE in <5s.
• AC-2: Given the same idempotency key is delivered twice (retry/replay), when the adapter processes both, then only one record exists in Central DB.
• AC-3 (volume/boundary): Given a first-connect 12-month backfill of a high-volume source, when the backfill job runs, then it processes in batches without breaching the <5s SLA for live events, AND cebe_backfill_progress reports percent_complete to 100% with zero loss.

— Error / Unhappy Path —
• ERR-1: Given the Central DB is temporarily unavailable, when events are emitted, then events are buffered and retried with zero loss, AND the source module is not blocked or degraded.
• ERR-2: Given cebe_event_ingestion is OFF for a source, when that source emits changes, then no events are ingested for that source AND the source module behaves normally.

— Permission Model —
• CAN: system emission (no user-facing permission change)
• CANNOT: end users trigger ingestion directly
• PII: masked at adapter per role-based policy

— UI States —
• N/A — backend/data initiative; any UI change is a regression, not expected behavior.

Dependencies: Joint Eng + Data RFC; ContactResolver (qontak_customer_id).


PRD CHANGELOG

| Version | Date | By | Section | Type | Summary |
| --- | --- | --- | --- | --- | --- |
| 1.0 | 2026-06-26 | Claude | All | CREATED | Phase 1 (Q3 2026) lead TECH PRD: schema, Central DB, adapter, retriever, query approach; CDP customer-data + segment events as first source. |
| 1.1 | 2026-06-26 | Claude | S1, S4, S14, S15 | MODIFIED | Tightened one-liner; N/A plan/tier; duplicate-identity failure mode; volume/boundary backfill AC. |
| 1.2 | 2026-06-26 | Claude | Header | MODIFIED | Linked Epic TF-3302; Status DRAFT → READY. |
| 1.3 | 2026-06-26 | Claude | All | MODIFIED | Newest write-prd skill: added Scope Changes; 5-column story table; linked event lists to anchor. |
| 1.4 | 2026-06-26 | Claude | CB, S6, S7, S8, S9, S14, S15 | MODIFIED | Converted fenced code-block sections to tables/lists per skill. Committed to documents repo as .md. |