Netflix Data Mesh
Composable Data Processing
jcunningham@netflix.com
Netflix Streaming & Keystone
Cross-platform Eventing
Netflix Streaming & Keystone
Keystone Platform
More than 150M Global Members
Trillions of Messages / Petabytes a Day
A high level view of Netflix’s Studio Structure
Airport: Netflix
Air Traffic Control: Studio
Airplanes: Productions
Credit: Christopher Goss, Netflix
Studio Productions
Parent Studio
Studio A
Studio B
Studio C
Production Company D
Production Company E
Production Company F
(many airports, many airplanes per airport)
Netflix
(one large airport, huge # of airplanes)
A Disconnected Studio
Data Mesh: Composable
Data Processing
Data Transport Problems
Significant duplication of effort across pipelines and teams.
Delays in bringing new pipelines online, and increasing maintenance overhead from existing pipelines.
Uneven implementation of best practices.
Need for lower-latency data transportation and warehousing for operational reporting.
Correctness issues related to distributed systems error recovery.
Data Mesh: Composable
Data Processing
Flink Processing
RDS
Cassandra
Airtable
Logging Data
…
RDS
Cassandra
S3 Data Warehouse
Elasticsearch
…
Extract Transform Load
Data Mesh: Composable
Data Processing
Stream 1
Stream 2
Stream 3
Stream 4
Catalog
EVCache
ES
S3
Service
RDS
Cassandra
Stream Processor
SourceConnector
SourceConnector
Sources Sinks
SinkConnector
SinkConnector
SinkConnector
Out
In
(Avro)
Stream 1
Stream 2
Stream 1
Stream Processor
Stream Processor
Streams
Sinks
Data Mesh: Composable
Data Processing
Data Mesh Platform
Data Mesh Pipelines
Data Mesh Sources
Data Mesh Flink Processors
GraphQL Configuration
Iceberg Sink Configuration
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - Justin Cunningham
Data Mesh: Composable
Data Processing
Source Database
DB CDC Source Connector
DB Change Stream
CDC Flink Auditor
GraphQL Flink Processor
Enriched Stream
Iceberg Sink Flink Processor
Iceberg S3 Data
GraphQL Flink Auditor
Batch Iceberg Auditor
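The stages on this slide can be read as a linear chain: a CDC source emits change events, a processor enriches them, and a sink writes the result. That reading is an assumption about the diagram. The sketch below shows how such stages might compose in plain Python; every name in it (`cdc_source`, `enrich`, `compose`, `list_sink`, the `movie_id` and `title` fields) is hypothetical and stands in for the real connectors and Flink jobs, which the slides do not show.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]
Processor = Callable[[Iterator[Record]], Iterator[Record]]


def cdc_source(changes: Iterable[Record]) -> Iterator[Record]:
    """Hypothetical stand-in for a DB CDC source connector: one event per row change."""
    for change in changes:
        yield dict(change)


def enrich(lookup: Dict[int, str]) -> Processor:
    """Hypothetical enrichment processor; `lookup` replaces a real remote query."""
    def run(stream: Iterator[Record]) -> Iterator[Record]:
        for event in stream:
            yield {**event, "title": lookup.get(event["movie_id"], "unknown")}
    return run


def compose(*processors: Processor) -> Processor:
    """Chain processors so the output stream of one feeds the next."""
    def run(stream: Iterator[Record]) -> Iterator[Record]:
        for processor in processors:
            stream = processor(stream)
        return stream
    return run


def list_sink(stream: Iterator[Record]) -> List[Record]:
    """Hypothetical stand-in for a sink connector: drains the stream into a list."""
    return list(stream)


changes = [{"op": "insert", "movie_id": 1}, {"op": "update", "movie_id": 2}]
pipeline = compose(enrich({1: "Title A"}))
table = list_sink(pipeline(cdc_source(changes)))
```

Because each stage only consumes and produces a stream of records, a stage can be swapped or a new one appended without touching the others, which is the composability the talk title refers to.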
Data Mesh Schema Evolution
Data Mesh: Composable
Data Processing
Overall Schema Evolution Approach
Apache Avro schema format
Stream processors are deployed with fixed input and output schemas
Schema changes are managed by redeploying with new fixed input and output schemas
Processors can opt in to automatic schema upgrades
Most schema changes don’t require a topic change
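One plausible reading of the last point is that a compatible schema change can keep flowing on the same topic. The sketch below is a deliberately simplified version of Avro's schema resolution rule for records: a reader can decode a writer's data if every reader field either exists in the writer schema with the same type or carries a default. It ignores Avro's type promotions and aliases, and the `can_read` helper and the example field lists are hypothetical, not part of the Data Mesh platform.

```python
from typing import Dict, List

Field = Dict[str, object]  # e.g. {"name": "city", "type": "string", "default": ""}


def can_read(reader_fields: List[Field], writer_fields: List[Field]) -> bool:
    """Simplified check: can data written with writer_fields be read as reader_fields?"""
    writer = {f["name"]: f for f in writer_fields}
    for field in reader_fields:
        written = writer.get(field["name"])
        if written is None:
            # Field is missing from the written data: the reader needs a default.
            if "default" not in field:
                return False
        elif written["type"] != field["type"]:
            # Simplification: any type change is treated as incompatible.
            return False
    return True


v1 = [
    {"name": "id", "type": "long"},
    {"name": "first", "type": "string"},
    {"name": "last", "type": "string"},
]
with_default = v1 + [{"name": "city", "type": "string", "default": ""}]
without_default = v1 + [{"name": "city", "type": "string"}]

print(can_read(with_default, v1))     # new reader, old data
print(can_read(v1, with_default))     # old reader, new data (extra field ignored)
print(can_read(without_default, v1))  # new required field, old data
```

Under this check, adding `city` with a default is readable in both directions, while adding it without a default is not readable against old data, which is the kind of change that would need more than a redeploy.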
Data Mesh: Composable
Data Processing
Data Mesh Controller
DB CDC Source Connector
GraphQL Flink Processor
Iceberg Sink Flink Processor
Iceberg S3 Data
Data Mesh Batch & Stream Convergence
Data Mesh: Composable
Data Processing
Physical Data Mesh Storage
id: name
1: id
2: first
3: last
Physical S3 Storage
id
1
2
3
Iceberg Data
id: name
1: id
2: first
3: last
Logical Iceberg
Avro Data Mesh Topic Avro Iceberg Sink
Data Mesh: Composable
Data Processing
Physical Data Mesh Storage
id: name
1: id
2: first
3: last
4: city
Physical S3 Storage
id
1
2
3
4
Iceberg Data
id: name
1: id
2: first
3: last
Logical Iceberg
Avro Data Mesh Topic Avro Iceberg Sink
Data Mesh: Composable
Data Processing
id: name
1: id
2: first
3: last
Physical Data Mesh Storage
id: name
1: id
2: first
3: last
4: city
Physical S3 Storage
id
1
2
3
4
Iceberg Data
id: name
1: id
2: first
3: last
4: city
Logical Iceberg
Avro Data Mesh Topic Avro Iceberg Sink
Data Mesh: Composable
Data Processing
Physical Data Mesh Storage
id: name
1: id
2: first_name
3: last_name
4: city
Physical S3 Storage
id
1
2
3
4
Iceberg Data
id: name
1: id
2: first
3: last
4: city
Logical Iceberg
Avro Data Mesh Topic Avro Iceberg Sink
id: name
1: id
2: first_name
3: last_name
4: city
Data Mesh: Composable
Data Processing
Physical Data Mesh Storage
id: name
1: id
2: first_name
4: city
5: last
Physical S3 Storage
id
1
2
3
4
5
Iceberg Data
id: name
1: id
2: first_name
4: city
5: last
id: name
1: id
2: first_name
3: last_name
4: city
id: name
1: id
2: first
3: last
4: city
Logical Iceberg
Avro Data Mesh Topic Avro Iceberg Sink
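The sequence of slides above walks through adding `city`, renaming `first` and `last`, then dropping field 3 and adding a new `last` as field 5. The sketch below illustrates why tracking columns by numeric field ID makes those steps safe: stored values are keyed by ID, and a logical schema is only a map from ID to name. The sample row and the `read` helper are hypothetical; the four schemas are copied from the slides.

```python
from typing import Dict, List, Optional

# Physical rows are keyed by field ID, never by column name.
rows: List[Dict[int, Optional[str]]] = [
    {1: "42", 2: "Ada", 3: "Lovelace", 4: "London"},
]


def read(rows: List[Dict[int, Optional[str]]], schema: Dict[int, str]) -> List[Dict[str, Optional[str]]]:
    """Project rows through a logical schema that maps field ID to column name."""
    return [{name: row.get(fid) for fid, name in schema.items()} for row in rows]


v1 = {1: "id", 2: "first", 3: "last"}
v2 = {1: "id", 2: "first", 3: "last", 4: "city"}            # add a column
v3 = {1: "id", 2: "first_name", 3: "last_name", 4: "city"}  # rename two columns
v4 = {1: "id", 2: "first_name", 4: "city", 5: "last"}       # drop 3, add a new "last"

for schema in (v1, v2, v3, v4):
    print(read(rows, schema))
```

A rename changes only the name attached to an ID, so no stored data is rewritten. The new `last` column has ID 5, so it reads as empty for old rows instead of resurfacing the values once stored under ID 3.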
Questions?
jcunningham@netflix.com