Case · Web3M
Data Platform · 2023

A customer-data platform that respects on-chain identity.

Web3M operates a monetization layer for Web3 companies — protocols that need to understand who their users are without ever holding personally identifying data.

fig. 01 — web3m · data platform · 2023
01
The brief

Context, constraint, and the part that mattered.

The existing analytics stack was a patchwork of off-the-shelf SaaS and ad-hoc scripts. It could neither answer a new question in under a week nor keep up with growth in event volume. The brief was to design and build a customer-data platform from scratch, native to Web3 identity primitives (wallets, ENS, on-chain reputation) and operable by a small in-house data team. We spent three weeks on architecture before writing production code: the data model, the ingestion contract, the query surface, and the trade-offs that would govern every later decision were all written down and agreed before the first commit. The result is a platform that ingests roughly three billion events per month, answers product-team questions in seconds rather than days, and is operated by two in-house engineers with no support contract from us.

02
Approach

What we did, in the order we did it.

  1. An ingestion contract, not an SDK

    Rather than ship a bespoke SDK to every protocol, we defined a single ingestion contract — a stable schema that any source can produce into. Each protocol's adapter is a small, well-isolated module that translates its native event shape into the contract. New protocols are onboarded in days rather than quarters, and the platform itself never has to change to accept them. This is the single decision that has paid the most dividends since launch.

  2. A query surface tuned for the team that asks questions

    The query layer is built on ClickHouse with a thin GraphQL surface in front of it. Product analysts use the GraphQL surface; data engineers drop into raw SQL when they need to. Both surfaces share the same semantic layer, so a metric defined once is consistent across both. There is no separate 'business-intelligence layer' to drift out of sync — a category of bug we have spent enough careers chasing to refuse to introduce again.

  3. Identity without PII

    Every event is keyed by a wallet address, optionally enriched with on-chain reputation signals, and never joined to off-chain personally identifying data. The platform has no concept of an email address. Compliance review treated this as a feature rather than a constraint, and it has materially shortened the platform's path through customer security reviews.

  4. Operated by two engineers

    The platform was designed from day one to be operated by a small in-house team. Every component has a documented runbook, a documented failure mode, and a documented rollback. The on-call rotation is two people deep and has not paged outside business hours since the third month after launch. We treat that statistic as the most honest measure of the engineering quality of the system.
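
To make step 01 concrete, here is a minimal sketch of an ingestion contract with one adapter. It is an illustration under assumptions, not the production schema: the `ContractEvent` fields, the `adapt_dex_swap` adapter, the `example-dex` source name, and the raw payload keys are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict


# Hypothetical contract: every source must produce exactly this shape.
@dataclass(frozen=True)
class ContractEvent:
    wallet: str            # lowercase 0x-prefixed address, the identity key
    protocol: str          # name of the source protocol
    event_type: str        # e.g. "swap" or "stake"
    occurred_at: datetime  # timezone-aware UTC timestamp
    properties: Dict[str, Any] = field(default_factory=dict)


# An adapter is a plain function from a native event to the contract.
Adapter = Callable[[Dict[str, Any]], ContractEvent]


def adapt_dex_swap(raw: Dict[str, Any]) -> ContractEvent:
    """Translate an invented DEX swap payload into the contract shape."""
    return ContractEvent(
        wallet=raw["trader"].lower(),
        protocol="example-dex",
        event_type="swap",
        occurred_at=datetime.fromtimestamp(raw["block_time"], tz=timezone.utc),
        properties={"token_in": raw["tokenIn"], "token_out": raw["tokenOut"]},
    )


# Onboarding a new protocol means registering one more adapter here.
ADAPTERS: Dict[str, Adapter] = {"example-dex": adapt_dex_swap}


def ingest(source: str, raw: Dict[str, Any]) -> ContractEvent:
    # Downstream code only ever sees ContractEvent, never a native shape.
    return ADAPTERS[source](raw)
```

In this sketch the platform side depends only on `ContractEvent`, so adding a source touches the adapter registry and nothing else.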
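
The shared semantic layer from step 02 can be sketched the same way: a metric is defined once, and both the GraphQL surface and raw SQL users get the query text from that single definition. The `events` table, the `active_wallets` metric, the dimension names, and the `graphql_resolver_sql` helper are assumptions made for illustration; `uniqExact` is a standard ClickHouse aggregate function.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Metric:
    name: str
    expression: str  # ClickHouse aggregate expression
    table: str


# Hypothetical registry: the single place a metric is defined.
METRICS: Dict[str, Metric] = {
    "active_wallets": Metric("active_wallets", "uniqExact(wallet)", "events"),
}

# Only known dimensions may be interpolated into the query text.
ALLOWED_DIMENSIONS = {"protocol", "event_type"}


def metric_sql(name: str, group_by: Optional[str] = None) -> str:
    """Render the SQL for a registered metric, optionally grouped."""
    metric = METRICS[name]
    if group_by is None:
        return f"SELECT {metric.expression} AS {metric.name} FROM {metric.table}"
    if group_by not in ALLOWED_DIMENSIONS:
        raise ValueError(f"unknown dimension: {group_by}")
    return (
        f"SELECT {group_by}, {metric.expression} AS {metric.name} "
        f"FROM {metric.table} GROUP BY {group_by}"
    )


def graphql_resolver_sql(field_name: str, args: Dict[str, str]) -> str:
    # Stand-in for a GraphQL resolver: it reuses metric_sql rather than
    # carrying its own copy of the metric definition.
    return metric_sql(field_name, args.get("groupBy"))
```

Because the resolver delegates to `metric_sql`, the two surfaces cannot disagree about what `active_wallets` means in this sketch.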
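
Step 03 lends itself to a guard at the ingestion boundary. The sketch below assumes EVM-style addresses (0x followed by 40 hex characters) and uses a hypothetical deny-list of property names plus a deliberately loose email pattern; none of these details come from the production system.

```python
import re
from typing import Any, Dict

# Assumption: EVM-style wallet addresses, 20 bytes hex-encoded.
WALLET_RE = re.compile(r"0x[0-9a-fA-F]{40}")
# Loose pattern, meant to catch obvious email-shaped strings only.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
# Hypothetical deny-list of property names that suggest off-chain PII.
FORBIDDEN_KEYS = {"email", "name", "phone", "ip_address"}


def check_event(wallet: str, properties: Dict[str, Any]) -> None:
    """Raise ValueError if the event is not keyed by a wallet or carries PII."""
    if not WALLET_RE.fullmatch(wallet):
        raise ValueError("event key must be a 0x-prefixed 20-byte address")
    for key, value in properties.items():
        if key.lower() in FORBIDDEN_KEYS:
            raise ValueError(f"property {key!r} is not allowed")
        if isinstance(value, str) and EMAIL_RE.search(value):
            raise ValueError(f"property {key!r} looks like an email address")
```

A check like this rejects events at the door, so nothing downstream needs a concept of an email address.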

04
Outcome

The numbers we agreed to ship against.

M01 · 3.1B events per month · ingested at steady state
M02 · <200ms P95 query latency · across 14 data sources
M03 · 11 Web3 protocols · ingested via a single adapter contract

We went from a question-takes-a-week to a question-takes-a-minute. The platform is boring in exactly the right way.
Daniel Kogan · Head of Data · Web3M
