[PRD] CEBE | Core Infrastructure & CDP Customer Event — Phase 1 (TECH)
HEADER BLOCK
| Field | Value |
|---|---|
| PM | Zhelia Alifa |
| PRD Version | 1.4 |
| Status | READY |
| PRD Type | TECH |
| Epic | TF-3302 |
| Squad | CDP Squad (lead) + BI/Data Squad (co-own) |
| RFC Link | REQUIRED before BUILD — joint Eng + Data RFC (Q3 scope) |
| Figma Master | N/A — no UI changes |
| Anchor | Yes — Customer Event-Based Engine (CEBE) — ANCHOR |
| Phase | Phase 1 of 3 (Q3 2026) |
| Labels | epic:cdp, module:platform, feature:cebe-core-infra |
| Last Updated | 2026-06-26 |
READY gate: Epic cannot move to In Progress without PRD Link + RFC Link. RFC is critical — it details the implementation approach the PM is not prescribing.
CONDITIONAL BLOCK: TECH CONTEXT
| Field | Detail |
|---|---|
| Problem (technical) | Qontak has no shared, event-driven customer data layer. Each module writes its own DB, so there is no standardized way for any squad to publish customer events, no single Central DB, and no reusable metrics/segments. |
| Expected outcome | A live event-driven foundation by end of Q3 2026: (1) a standardized, reusable event schema any squad can push to (keyed on qontak_customer_id via ContactResolver — "push like Mixpanel"); (2) Central DB schema; (3) event adapter layer (contact-service); (4) retriever/metrics layer; (5) a documented query approach for downstream features. CDP customer-data-change + segment events are the first source. Propagation < 5s (p99), zero event loss. |
| Scope — PM-owned | • Standardized reusable event schema published & adopted (CDP + Marketing) • Central DB schema delivered • Retriever layer produces reusable metrics (e.g. % reply rate, % ads conversion) • Query approach documented for segmentation + marketing automation • <5s (p99) propagation SLA; zero event loss; idempotency on qontak_customer_id • CDP emits customer.created/updated + segment.entered/exited as first source • Socialize to all squads by end of Q3 |
| Scope — Eng-owned (PM does not prescribe — joint Eng + Data RFC details these) | • Storage model choice (event-level log vs aggregated vs hybrid) • Streaming infra choice (Kafka / PubSub / existing pipeline) • Adapter implementation, partitioning, indexing strategy • Retriever computation strategy (batch vs streaming vs on-demand) |
| User-facing UI changes | None. This is a backend/data foundation. Any UI change is out of scope. |
1. One-liner + Problem
One-liner: Build CEBE's event-driven foundation — reusable event schema, Central DB, adapter, retriever, and query layer — with CDP customer-data events as the first source.
Problem: There is no shared event layer, so no squad can publish customer events in a standard way and no Central DB exists as a single source of truth. Every intelligence feature (segmentation, marketing automation, AI memory) re-solves cross-module data access from scratch, which is slow, inconsistent, and unscalable. Without this Q3 foundation, the entire CEBE initiative and all downstream H2 2026 features are blocked.
2. What Happens If We Don't Build This
- Blocks the whole initiative. Phase 2 (Q4) and Phase 3 (Q1'27) cannot start — no foundation to push events onto.
- No reusable metrics or segments. Marketing and segmentation stay manual; reply-rate and ads-conversion can't be computed centrally.
- Per-squad re-implementation cost compounds every quarter the shared layer is absent.
3. Target Users + Persona Context
| Persona | Role | Impact | Current State |
|---|---|---|---|
| Indirect — All product squads + Business Admin / Marketing Owner | Squads that will publish customer events; admins/marketers who consume segments & metrics | Squads get a standard, self-serve way to push events; consumers get fresh, unified data | No standard event contract; siloed per-module DBs; stale, manually reconciled data |
4. Non-Goals
- Does not build the segmentation UI, marketing automation UI, AI memory, or health score — these are downstream consumers in later work.
- Does not connect Communication, Ticket, Loyalty, Deal, Commerce, or Booking events — those are Phase 2 / Phase 3.
- Does not change any module's existing user-facing behavior.
- Does not define a customer-facing API; scope is the internal event contract + Central DB + retriever/query layer.
- Does not own Marketing (Broadcast + Ads) event emission — that is the Broadcast squad's SUPPORT PRD; this PRD provides the schema + adapter they push to.
5. Scope Changes
Engineering surfaces this PRD touches (controlled vocab). Kept in sync with the scope_changes frontmatter above.
- Backend (contact-service): new Event Adapter Layer (ingestion consumer/endpoint), ContactResolver identity resolution keyed on qontak_customer_id, idempotency + duplicate-identity dedup, PII masking at the adapter; CDP emits customer.created / customer.updated / segment.entered / segment.exited as the first source.
- Data: Central DB schema (event-level log + customer-level aggregation), retriever/metrics materialization (e.g. % reply rate, % ads conversion), documented query approach for segmentation + marketing automation, 12-month historical backfill, joint BI/Data feasibility assessment.
- Infra: streaming/queue capacity + async buffer to hold the <5s p99 propagation SLA with zero loss; monitoring & alerting on propagation latency and loss reconciliation.
- Docs: standardized event-schema spec + event-push guide socialized to all squads at end of Q3.
- Frontend / Mobile / Design: None (no UI surface in this phase).
6. Constraints
| Constraint | Detail |
|---|---|
| Platform | Backend / data platform (contact-service + Central DB). No UI surface. |
| Performance target | Event propagation < 5s p99 (source → Central DB). Aggregated customer query < 1–2s for UI consumers. |
| Uptime / SLA | Zero data loss across module event streams. |
| Backward compat | Event schema is additive/versioned — schema changes must not break existing publishers. |
| Plan / tier | N/A — internal platform foundation; no plan/tier gating on this initiative. |
| Data integrity | Idempotency via qontak_customer_id; dedup of duplicate identities (phone/email); handle late-arriving events. |
| Security | PII masked at the adapter layer per role-based access policy; team-level scope. |
| Feature flag | cebe_event_ingestion; default: OFF (per-source enablement), CDP + Marketing first. |
| Backfill | On first integration, backfill last 12 months of historical events. |
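
The "additive/versioned" backward-compat constraint can be illustrated with a small compatibility check. This is a minimal sketch only: the `FieldSpec` representation, the function name, and the example field names other than `qontak_customer_id` are hypothetical, and the actual schema registry and compatibility rules are left to the RFC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical description of one field in an event schema version."""
    type_name: str
    required: bool


def is_additive_change(old: dict[str, FieldSpec], new: dict[str, FieldSpec]) -> bool:
    """Return True if `new` keeps every old field and only adds optional ones."""
    for name, old_spec in old.items():
        new_spec = new.get(name)
        if new_spec is None:
            return False  # a removed field breaks existing consumers
        if new_spec.type_name != old_spec.type_name:
            return False  # a type change breaks existing publishers
        if new_spec.required and not old_spec.required:
            return False  # optional -> required breaks publishers that omit it
    added = new.keys() - old.keys()
    # New fields must be optional so existing publishers stay valid.
    return all(not new[name].required for name in added)


# Illustrative schema versions (field names other than qontak_customer_id are assumed).
V1 = {
    "qontak_customer_id": FieldSpec("string", required=True),
    "event_name": FieldSpec("string", required=True),
}
V2_ADDITIVE = {**V1, "channel": FieldSpec("string", required=False)}
V2_BREAKING = {**V1, "channel": FieldSpec("string", required=True)}
V2_REMOVED = {"qontak_customer_id": FieldSpec("string", required=True)}
```

A check of this kind could run in CI against every proposed schema version, so a breaking change is caught before any publisher sees it.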
6.7 Data Lifecycle
| Artifact Type | Retention Period | Cleanup Trigger | User-Visible Effect |
|---|---|---|---|
| Raw event log (append-only) | Per Data governance (TBD in RFC — default 24 months) | Partition expiry job by time | None |
| Backfill processing job artifacts | 7 days | Cleanup job on backfill completion | None |
| Aggregated/materialized metrics | Live (refreshed) | Recompute on new events | None |
7. Rollout
| Aspect | Detail |
|---|---|
| Feature flag | cebe_event_ingestion; default: OFF (per source: CDP, then Marketing) |
| Rollout | • Stage 1 → Staging: schema + adapter + Central DB validated with synthetic events • Stage 2 → Canary: CDP events from N pilot accounts; verify <5s + zero loss • Stage 3 → CDP events for all accounts + Marketing (Broadcast+Ads) events on • GA → Retriever metrics + query approach published; socialize to all squads |
| Backward compat | Old per-module DBs remain untouched; CEBE runs alongside (read path additive). |
| Migration | Backfill last 12 months for CDP on first connect (see RFC for backfill + rollback). |
| Rollback plan | Flag OFF stops ingestion per source within minutes; no destructive change to source modules. |
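
The per-source flag behavior in the table above (default OFF, enabled source by source, rollback by switching a source off) can be sketched as follows. The `IngestionFlag` class and the source names `"cdp"` and `"marketing"` are illustrative assumptions; the real flag service and its API are not specified in this PRD.

```python
from dataclasses import dataclass, field

FLAG_NAME = "cebe_event_ingestion"


@dataclass
class IngestionFlag:
    """Hypothetical in-memory stand-in for the per-source feature flag."""
    enabled_sources: set[str] = field(default_factory=set)  # empty set = default OFF

    def is_enabled(self, source_module: str) -> bool:
        return source_module in self.enabled_sources

    def enable(self, source_module: str) -> None:
        self.enabled_sources.add(source_module)

    def disable(self, source_module: str) -> None:
        # Rollback path: stops ingestion for one source without touching others.
        self.enabled_sources.discard(source_module)


flag = IngestionFlag()
flag.enable("cdp")  # canary / staged rollout turns CDP on first
```

Because the flag is checked per source, turning CDP off would leave a later Marketing enablement untouched, which matches the per-source rollback described in the table.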
7.4 Migration Transition Window
| Aspect | Detail |
|---|---|
| Old behavior | Consumers read per-module DBs directly (existing behavior unchanged). |
| New behavior | Consumers read unified data from CEBE Central DB / retriever layer. |
| Coexistence period | Through Q3–Q4 while modules onboard; both paths valid. |
| End state | Once a module is on CEBE, downstream consumers migrate to CEBE reads for that domain. |
8. Observability
Key Events
| Event Name | Trigger | Properties |
|---|---|---|
| cebe_event_ingested | Event written to Central DB | source_module, event_name, qontak_customer_id, latency_ms, timestamp |
| cebe_event_rejected | Event fails validation/dedup | source_module, reason, qontak_customer_id |
| cebe_propagation_slo_breach | Propagation > 5s | source_module, latency_ms |
| cebe_backfill_progress | Backfill batch processed | source_module, records_processed, percent_complete |
| cebe_retriever_metric_computed | Metric recomputed | metric_name, compute_time_ms |
Dashboard & Alerts
| Aspect | Detail |
|---|---|
| Dashboard owner | BI/Data Squad (initiative) + CDP Squad (adapter/service) |
| Baseline period | Capture ingestion latency baseline during staging/canary before GA. |
| Alerts | • Propagation latency p99 > 5s for 5 min → PagerDuty on-call (CDP) • Event rejection rate > 1% in 10 min → PagerDuty on-call (CDP) • Zero-loss check mismatch (source count vs CEBE count) → Engineering escalation same day |
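
The two paging thresholds above can be evaluated over one window of `latency_ms` samples and ingest/reject counts. This is a sketch under stated assumptions: the function names and alert labels are hypothetical, the p99 uses the nearest-rank method (the PRD does not prescribe a percentile method), and the "for 5 min" / "in 10 min" windowing is assumed to be handled by the caller or the monitoring tool.

```python
P99_THRESHOLD_MS = 5000.0        # propagation SLA from this PRD: < 5s p99
REJECTION_RATE_THRESHOLD = 0.01  # alert when rejection rate exceeds 1%


def p99_ms(latencies_ms: list[float]) -> float:
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * n)
    return ordered[rank - 1]


def rejection_rate(ingested: int, rejected: int) -> float:
    total = ingested + rejected
    return rejected / total if total else 0.0


def evaluate_alerts(latencies_ms: list[float], ingested: int, rejected: int) -> list[str]:
    """Return the (hypothetical) alert labels that fire for one window."""
    alerts = []
    if latencies_ms and p99_ms(latencies_ms) > P99_THRESHOLD_MS:
        alerts.append("propagation_p99_breach")
    if rejection_rate(ingested, rejected) > REJECTION_RATE_THRESHOLD:
        alerts.append("rejection_rate_breach")
    return alerts
```

With 100 samples, two slow events are enough to push the nearest-rank p99 over the threshold, so the baseline captured during staging/canary matters when tuning window sizes.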
8.5 Post-Launch Monitoring Cadence
| Aspect | Detail |
|---|---|
| Review cadence | Weekly for first 4 weeks post-GA, then monthly |
| Owner | CDP PM + BI/Data lead |
| Review scope | Propagation latency, rejection rate, zero-loss reconciliation, retriever freshness |
| Trigger thresholds | • p99 latency > 5s for 2 consecutive days → investigation • Loss reconciliation mismatch > 0 → same-day engineering escalation |
| Rollback consideration | If ingestion causes source-module degradation, flag OFF for affected source and investigate. |
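
The zero-loss reconciliation referenced above compares source-side counts with CEBE-side counts. A minimal sketch, assuming both sides can report counts per source module for the same time window and that counts are taken over distinct idempotency keys (so idempotent replays do not register as extra events); the function name and inputs are hypothetical.

```python
def reconcile(source_counts: dict[str, int], cebe_counts: dict[str, int]) -> dict[str, int]:
    """Return {source_module: emitted - ingested} for every source that mismatches.

    A positive value means events are missing in CEBE; a negative value means
    CEBE holds more events than the source reports.
    """
    mismatches = {}
    for source in sorted(source_counts.keys() | cebe_counts.keys()):
        diff = source_counts.get(source, 0) - cebe_counts.get(source, 0)
        if diff != 0:
            mismatches[source] = diff
    return mismatches


example = reconcile(
    {"cdp": 1000, "marketing": 500},
    {"cdp": 1000, "marketing": 497},
)
```

Any non-empty result corresponds to the "loss reconciliation mismatch > 0" trigger and would go to same-day engineering escalation.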
9. Success Metrics
| Category | Metric | Definition | Baseline | Target |
|---|---|---|---|---|
| Performance | ⭐ Event propagation latency (p99) | Time from source event emission to availability in Central DB | N/A — new | < 5s p99 by end of Q3 2026 |
| Reliability & Quality | Zero event loss | Source event count vs CEBE-ingested count reconciliation | N/A | 0 lost events — ongoing |
| Reliability & Quality | First-source coverage | CDP customer + Marketing (Broadcast+Ads) events live on CEBE | 0 | Both sources live by end of Q3 2026 |
| Enablement | Reusable metrics & query approach available | Reply rate, ads conversion, use-case segments queryable from CEBE | N/A | Published + documented for segmentation/MA by end of Q3 2026 |
| Enablement | Squad adoption readiness | Standardized event-push schema + guide socialized to all squads | 0 | All squads onboarded to push their domain events starting Q4 2026 |
10. Launch Plan & Stage Gates
| Stage | Audience | Duration | Success Gate to Advance | Owner |
|---|---|---|---|---|
| Internal Staging | Staging + synthetic events | ~2 weeks | Schema + adapter + DB validated; <5s on synthetic load | PM + Eng (CDP) + BI/Data |
| Canary | CDP events, N pilot accounts | ~2 weeks | <5s p99, zero loss, idempotency verified | Eng Lead (CDP) |
| Staged Rollout | CDP all accounts + Marketing source | ~2–3 weeks | SLA sustained; retriever metrics correct vs source | Eng Lead + PM |
| GA | Query approach published; all squads socialized | Ongoing | Gates sustained 2 weeks; squads can push events in Q4 | PM + BI/Data |
11. Dependencies
| Dependency | Owning Team | Deliverable Needed | Blocking? |
|---|---|---|---|
| Joint Eng + Data RFC approval | Eng (CDP) + BI/Data | Approved storage model, streaming infra, query approach | YES |
| Streaming/queue infra capacity | Infra/Platform | Confirmed capacity for <5s, zero-loss buffer | YES |
| qontak_customer_id / ContactResolver | CDP Squad | Stable identity-resolution key available to all publishers | YES |
| Marketing event emission | Broadcast Squad (SUPPORT PRD) | Broadcast + Ads events pushed to the schema this PRD defines | NO (parallel) |
| Data governance / PII policy | Security / Data Gov | Sign-off on retention + masking | YES |
12. Key Decisions + Alternatives Rejected
12a — Decisions Made
| Date | Decision | Rationale |
|---|---|---|
| 2026-06-26 | Standardized reusable event schema, push model keyed on qontak_customer_id | Squads self-serve like Mixpanel; avoids per-squad integrations |
| 2026-06-26 | CDP customer-data + Marketing are the first two sources in Q3 | Aligns with re-prioritized Primary use-case |
| 2026-06-26 | Joint RFC & assessment on both Eng and Data sides before build | Storage/query design is data-heavy; needs both perspectives |
| 2026-06-26 | Retriever layer materializes reusable metrics in the DB | Downstream features query metrics directly; avoids recompute per feature |
12b — Alternatives Rejected
| Alternative | Why Rejected | Date |
|---|---|---|
| Point-to-point integrations per consumer | Doesn't scale; inconsistent; re-paid per squad | 2026-06-26 |
| Polling module APIs | Cannot meet <5s; couples CEBE to module APIs | 2026-06-26 |
| Compute all metrics on-demand (no retriever materialization) | Slow for UI (<1–2s) and repeated cost across features | 2026-06-26 |
13. Open Questions
| # | Type | Question | Owner | Deadline |
|---|---|---|---|---|
| 1 | Open Question | Storage model: event-level log vs aggregated vs hybrid? | BI/Data + Eng (CDP) | 2026-07-15 |
| 2 | Open Question | Streaming infra (Kafka/PubSub) needed, or current pipeline sufficient for <5s + zero loss? | Eng (CDP) + Infra | 2026-07-15 |
| 3 | Open Question | Query approach: pre-aggregated vs on-the-fly — speed vs storage cost trade-off? | BI/Data + Eng (CDP) | 2026-08-15 |
| 4 | Assumption | Backfill of last 12 months for CDP is feasible within Q3 capacity | Eng (CDP) | 2026-07-31 |
| 5 | Risk | High write volume breaches <5s. Mitigation: pre-size DB; async queue buffer | Eng (Infra) | 2026-08-31 |
| 6 | Risk | Duplicate identities / late events corrupt profiles. Mitigation: idempotency + dedup at adapter | Eng (CDP) | 2026-08-31 |
14. API & Event Behavior
Behavioral contracts in plain language — HTTP methods, schemas, and topic design are resolved in the RFC. The standard event envelope and the full per-module event list live in the anchor: Module Event Catalog — not duplicated here.
| # | Behavior | Entity Affected | Triggered By | Expected Behavior | Failure Behavior |
|---|---|---|---|---|---|
| 1 | Standardized event ingestion (push) | CEBE Central DB (new event records) | Any squad publishing a standardized customer event | • Adapter resolves identity via ContactResolver (qontak_customer_id) • Validates schema; dedups via idempotency key; writes to Central DB • Available downstream in < 5s p99 | • Invalid schema → rejected + cebe_event_rejected logged • Duplicate idempotency key → no-op (idempotent) • Downstream/DB unavailable → buffered in async queue, retried (no loss) |
| 2 | CDP customer-data-change emission (first source) | CEBE customer records / segment membership | customer.created/updated, segment.entered/exited in CDP (event list) | • CDP changes propagate to CEBE in <5s; profile + segment state updated • Backfill loads last 12 months on first connect | • Emission failure → retried; reconciliation job catches gaps (zero loss) • Conflicting/duplicate identity → dedup on qontak_customer_id |
| 3 | Retriever metrics + query approach | Materialized metrics / queryable views in Central DB | New events ingested (recompute) + downstream queries | • Retriever computes reusable metrics (e.g. % reply rate, % ads conversion) • Documented query approach for segmentation + marketing automation • Aggregated customer query < 1–2s | • Stale metric beyond freshness SLA → flagged; recompute triggered • Query timeout → returns last materialized value + freshness timestamp |
Claude to resolve during RFC: transport (topic/endpoint), schema registry, retry/DLQ (Behavior 1); CDC vs explicit emit, backfill batching (Behavior 2); materialization strategy, query interface, freshness SLA (Behavior 3).
15. System Flow + User Stories + ACs
15.1 System Flow
Flow: Standardized customer event ingestion into CEBE · Type: Integration Flow
- A squad (CDP first) emits a standardized customer event with qontak_customer_id.
- Event Adapter (contact-service) receives it; resolves identity via ContactResolver.
- Adapter validates schema + checks idempotency key (dedup).
- Valid event written to Central DB; PII fields masked per policy. (<5s p99)
- Retriever layer recomputes affected reusable metrics (reply rate, ads conversion, etc.).
- Downstream features query the retriever / Central DB via the documented query approach.
- Failure: invalid schema → rejected + logged; duplicate → idempotent no-op; DB unavailable → buffered in async queue + retried; reconciliation job ensures zero loss.
Architecture & ingestion diagram: see the anchor — 5. Architecture.
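
The ingestion flow above can be walked through with a small in-memory sketch. It is not the adapter design: the `EventAdapter` class, the envelope field names other than `qontak_customer_id`, the `aliases` map standing in for ContactResolver, the `PII_FIELDS` list, and the `"***"` mask are all assumptions for illustration, and transport, storage, and retry mechanics are left to the RFC.

```python
from dataclasses import dataclass, field

# Hypothetical envelope fields; the real envelope lives in the Module Event Catalog.
REQUIRED_FIELDS = {"event_name", "source_module", "qontak_customer_id", "idempotency_key", "payload"}
PII_FIELDS = {"phone", "email"}  # assumed; the real list comes from the PII policy


@dataclass
class EventAdapter:
    aliases: dict[str, str] = field(default_factory=dict)      # stand-in for ContactResolver
    db_available: bool = True
    central_db: dict[str, dict] = field(default_factory=dict)  # keyed by idempotency key
    buffer: list[dict] = field(default_factory=list)           # stand-in for the async queue
    rejected: list[tuple[dict, str]] = field(default_factory=list)

    def ingest(self, event: dict) -> str:
        # Resolve identity: map duplicate identities onto one canonical customer id.
        raw_id = event.get("qontak_customer_id")
        customer_id = self.aliases.get(raw_id, raw_id)
        # Validate the envelope; invalid events are rejected and logged.
        missing = REQUIRED_FIELDS - event.keys()
        if missing:
            self.rejected.append((event, "missing: " + ", ".join(sorted(missing))))
            return "rejected"
        # Idempotency: a repeated key is a no-op.
        key = event["idempotency_key"]
        if key in self.central_db:
            return "duplicate"
        # Central DB unavailable: buffer for retry instead of dropping the event.
        if not self.db_available:
            self.buffer.append(event)
            return "buffered"
        masked = {k: ("***" if k in PII_FIELDS else v) for k, v in event["payload"].items()}
        self.central_db[key] = {
            "qontak_customer_id": customer_id,
            "event_name": event["event_name"],
            "source_module": event["source_module"],
            "payload": masked,
        }
        return "ingested"

    def drain_buffer(self) -> int:
        """Retry buffered events; returns how many were ingested."""
        pending, self.buffer = self.buffer, []
        return sum(self.ingest(event) == "ingested" for event in pending)
```

Keying the store on the idempotency key makes retries and replays safe by construction, which is what lets the buffer be drained without any extra duplicate handling.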
15.2 User Stories
| User Story | Importance | Mockup | Technical Notes | Acceptance Criteria |
|---|---|---|---|---|
| [CEBE-CORE-S01-REG] Source modules keep working while CEBE ingests in parallel. As an existing CDP/Broadcast user, I want my module to behave exactly as before while CEBE captures events in the background, so that the foundation work is transparent and introduces no regression. | Must Have | N/A (no UI) | Data Fields: • Standard event envelope: see anchor → Module Event Catalog (not duplicated) • CDP event list: see anchor → 6. CDP Module • Flag: cebe_event_ingestion (per §6, default OFF) Before-After Behavior: • Before: modules read/write their own DBs; no event push exists. • After: same module behavior for users; CEBE ingestion runs in parallel with no destructive source change. | Happy Path: • AC-1: Given cebe_event_ingestion is ON for CDP (per §6 Feature flag), when an admin creates/updates a customer, then the CDP module behaves exactly as before AND a customer.created/customer.updated event reaches CEBE in <5s. • AC-2: Given the same idempotency key is delivered twice (retry/replay), when the adapter processes both, then only one record exists in Central DB. • AC-3 (volume/boundary): Given a first-connect 12-month backfill of a high-volume source, when the backfill job runs, then it processes in batches without breaching the <5s SLA for live events, AND cebe_backfill_progress reports percent_complete to 100% with zero loss. Error / Unhappy Path: • ERR-1: Given the Central DB is temporarily unavailable, when events are emitted, then events are buffered and retried with zero loss, AND the source module is not blocked or degraded. • ERR-2: Given cebe_event_ingestion is OFF for a source, when that source emits changes, then no events are ingested for that source AND the source module behaves normally. Permission Model: • CAN: system emission (no user-facing permission change) • CANNOT: end users trigger ingestion directly • PII: masked at adapter per role-based policy. UI States: • N/A (backend/data initiative; any UI change is a regression, not expected behavior). |
Dependencies: Joint Eng + Data RFC; ContactResolver (qontak_customer_id).
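
AC-3 asks for a batched backfill that reports `cebe_backfill_progress` up to 100%. The generator below is a minimal sketch of that reporting loop; the function name, `batch_size`, and the `write_batch` callback are hypothetical, and throttling so that live events stay within the <5s SLA is not modeled here.

```python
from typing import Callable, Iterator


def backfill_in_batches(
    source_module: str,
    records: list[dict],
    batch_size: int,
    write_batch: Callable[[list[dict]], None],
) -> Iterator[dict]:
    """Write historical records batch by batch, yielding one progress event per batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    total = len(records)
    processed = 0
    for start in range(0, total, batch_size):
        batch = records[start:start + batch_size]
        write_batch(batch)  # hypothetical sink, e.g. the adapter's ingest path
        processed += len(batch)
        yield {
            "event": "cebe_backfill_progress",
            "source_module": source_module,
            "records_processed": processed,
            "percent_complete": round(100 * processed / total, 1),
        }


written: list[dict] = []
history = [{"idempotency_key": f"hist-{i}"} for i in range(250)]
progress = list(backfill_in_batches("cdp", history, 100, written.extend))
```

Emitting progress per batch gives the zero-loss check a natural checkpoint: `records_processed` at 100% should equal the source-side count for the backfill window.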
PRD CHANGELOG
| Version | Date | By | Section | Type | Summary |
|---|---|---|---|---|---|
| 1.0 | 2026-06-26 | Claude | All | CREATED | Phase 1 (Q3 2026) lead TECH PRD: schema, Central DB, adapter, retriever, query approach; CDP customer-data + segment events as first source. |
| 1.1 | 2026-06-26 | Claude | S1, S4, S14, S15 | MODIFIED | Tightened one-liner; N/A plan/tier; duplicate-identity failure mode; volume/boundary backfill AC. |
| 1.2 | 2026-06-26 | Claude | Header | MODIFIED | Linked Epic TF-3302; Status DRAFT → READY. |
| 1.3 | 2026-06-26 | Claude | All | MODIFIED | Newest write-prd skill: added Scope Changes; 5-column story table; linked event lists to anchor. |
| 1.4 | 2026-06-26 | Claude | CB, S6, S7, S8, S9, S14, S15 | MODIFIED | Converted fenced code-block sections to tables/lists per skill. Committed to documents repo as .md. |