Real-time Analytics with Cassandra, Spark, and Shark

Tuesday, June 18, 2013

Who is this guy?
• Staff Engineer, Compute and Data Services, Ooyala
• Building multiple web-scale real-time systems on top of C*, Kafka, Storm, etc.
• Scala/Akka guy
• Very excited by open source big data projects; sharing some today
• @evanfchan

Agenda
• Ooyala and Cassandra
• What problem are we trying to solve?
• Spark and Shark
• Our Spark/Cassandra Architecture
• Demo

Cassandra at Ooyala
Who is Ooyala, and how we use Cassandra

CONFIDENTIAL—DO NOT DISTRIBUTE
OOYALA
Powering personalized video experiences across all screens.

COMPANY OVERVIEW
• Founded in 2007
• Commercially launched in 2009
• 230+ employees in Silicon Valley, LA, NYC, London, Paris, Tokyo, Sydney & Guadalajara
• Global footprint: 200M unique users, 110+ countries, and more than 6,000 websites
• Over 1 billion videos played per month and 2 billion analytic events per day
• 25% of U.S. online viewers watch video powered by Ooyala

TRUSTED VIDEO PARTNER
(Logo slide: strategic partners and customers)

We are a large Cassandra user
• 11 clusters ranging in size from 3 to 36 nodes
• Total of 28TB of data managed over ~85 nodes
• Over 2 billion C* column writes per day
• Powers all of our analytics infrastructure
• A much, much bigger cluster is coming soon

What problem are we trying to solve?
Lots of data, complex queries, answered really quickly... but how?

From mountains of useless data...

To nuggets of truth...
• Quickly
• Painlessly
• At scale?

Today: Precomputed aggregates
• Video metrics computed along several high-cardinality dimensions
• Very fast lookups, but inflexible and hard to change
• Most computed aggregates are never read
• What if we need more dynamic queries?
– Top content for mobile users in France
– Engagement curves for users who watched recommendations
– Data mining, trends, machine learning
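
Why full precomputation gets wasteful can be sketched with a little counting: over d dimensions there are 2^d possible group-bys, and the finest one can hold up to the product of the dimension cardinalities. This is a minimal illustration; the class name and the cardinalities in the usage note are hypothetical, not Ooyala's real dimensions.

```java
import java.util.List;

// Hypothetical illustration: how many group-bys and cells a full
// precomputation over several dimensions can materialize.
final class CubeSize {
    // Every subset of d dimensions is a distinct group-by: 2^d of them.
    static long groupByCount(int dimensions) {
        if (dimensions < 0 || dimensions > 62) {
            throw new IllegalArgumentException("dimensions must be in [0, 62]");
        }
        return 1L << dimensions;
    }

    // Upper bound on cells in the finest group-by: the product of the
    // cardinalities (only combinations that actually occur get stored).
    static long maxCells(List<Long> cardinalities) {
        long cells = 1;
        for (long c : cardinalities) {
            cells = Math.multiplyExact(cells, c); // throws on overflow
        }
        return cells;
    }
}
```

With made-up cardinalities of 200 countries, 10 device types, 100,000 videos and 500 providers, groupByCount(4) is 16 and maxCells is 100,000,000,000, far more cells than any set of dashboards will read.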

The static-dynamic continuum
100% Precomputation
• Super fast lookups
• Inflexible, wasteful
• Best for the 80% most common queries
100% Dynamic
• Always compute results from raw data
• Flexible but slow

Where we want to be
Partly dynamic
• Pre-aggregate the most common queries
• Flexible, fast dynamic queries
• Easily generate many materialized views
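
A minimal sketch of the "partly dynamic" idea, assuming an in-memory list of raw events: answer from a materialized view when it exists, otherwise compute it once from raw data and keep it as a new view. HybridQueryEngine and countBy are hypothetical names, not part of Spark or Shark.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: precomputed views first, dynamic computation as fallback.
final class HybridQueryEngine<E> {
    private final List<E> rawEvents;
    private final Map<String, Map<String, Long>> views = new HashMap<>();
    private int rawScans = 0; // how many times raw events had to be scanned

    HybridQueryEngine(List<E> rawEvents) {
        this.rawEvents = rawEvents;
    }

    // Count events per group key; viewName identifies the materialized view.
    Map<String, Long> countBy(String viewName, Function<E, String> groupKey) {
        Map<String, Long> view = views.get(viewName);
        if (view == null) {
            rawScans++; // dynamic path: compute from raw data
            view = rawEvents.stream()
                    .collect(Collectors.groupingBy(groupKey, Collectors.counting()));
            views.put(viewName, view); // materialize it for later queries
        }
        return view;
    }

    int rawScans() {
        return rawScans;
    }
}
```

Views are keyed by an explicit name because Java lambdas have no useful equality; pre-aggregating the most common queries then just means calling countBy for them ahead of time.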

Industry Trends
• Fast execution frameworks
– Impala
• In-memory databases
– VoltDB, Druid
• Streaming and real-time
• Higher-level, productive data frameworks
– Cascading, Hive, Pig

Why Spark and Shark?
“Lightning-fast in-memory cluster computing”

Introduction to Spark
• In-memory distributed computing framework
• Created by UC Berkeley AMP Lab in 2010
• Targeted problems that MapReduce (MR) is bad at:
– Iterative algorithms (machine learning)
– Interactive data mining
• More general-purpose than Hadoop MR
• Active contributions from ~15 companies

Hadoop MapReduce vs. Spark (diagram)
• Hadoop MapReduce: HDFS → Map → Reduce → Map → Reduce
• Spark: Data Source → map() → join() with Source 2 → cache() → transform
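
The point of cache() in the diagram is that an intermediate dataset is computed once, kept in memory, and reused by several later transforms, whereas chained MapReduce jobs hand data to each other through HDFS. The single-JVM sketch below only imitates that reuse; CachedDataset is a hypothetical class, not Spark's API, and real RDD caching is lazy and partitioned across executors.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical single-JVM stand-in for the idea behind cache():
// read the expensive source once, then serve later transforms from memory.
final class CachedDataset<T> {
    private final Supplier<List<T>> source;
    private List<T> cached;      // null until first use
    private int sourceReads = 0; // how often the expensive source was read

    CachedDataset(Supplier<List<T>> source) {
        this.source = source;
    }

    List<T> get() {
        if (cached == null) {
            sourceReads++;
            cached = List.copyOf(source.get()); // immutable in-memory copy
        }
        return cached;
    }

    int sourceReads() {
        return sourceReads;
    }
}
```

Two different transforms (say, a filtered sum and a max) over get() trigger only one read of the source.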

Throughput: Memory is king
Chart (6-node C*/DSE 1.1.9 cluster, Spark 0.7.0): throughput of C* with a cold cache, C* with a warm cache, and a Spark RDD

Developers love it
• “I wrote my first aggregation job in 30 minutes”
• High-level “distributed collections” API
• No Hadoop cruft
• Full power of Scala, Java, Python
• Interactive REPL shell
• EASY testing!!
• Low latency: quick development cycles

Spark word count example
Hadoop MapReduce version (Java):

package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // Emit (word, 1) for every whitespace-separated token in the line
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        // Sum the 1s emitted for each word
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        Job job = Job.getInstance(conf, "wordcount");
        job.setJarByClass(WordCount.class); // lets Hadoop ship this jar to the cluster

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Spark word count example
Spark version (Scala):

val file = spark.textFile("hdfs://...")  // spark is the SparkContext

file.flatMap(line => line.split(" "))
    .map(word => (word, 1))
    .reduceByKey(_ + _)
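
For comparison, the same flatMap / map / reduceByKey dataflow can be sketched on a local collection with java.util.stream. This is a single-JVM analogue, not Spark; LocalWordCount is a hypothetical name, and it drops the empty strings that split(" ") produces for repeated spaces.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Local, non-distributed analogue of the Spark word count.
final class LocalWordCount {
    static Map<String, Integer> count(List<String> lines) {
        return lines.stream()
                // flatMap(line => line.split(" "))
                .flatMap(line -> Arrays.stream(line.split(" ")))
                // split(" ") yields "" between repeated spaces; drop those
                .filter(word -> !word.isEmpty())
                // map(word => (word, 1)).reduceByKey(_ + _)
                .collect(Collectors.toMap(word -> word, word -> 1, Integer::sum));
    }
}
```

The merge function passed to toMap plays the role of the _ + _ reducer: whenever two (word, 1) entries collide, their counts are added.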

The Spark Ecosystem
• Spark
• Tachyon - in-memory caching DFS
• Bagel - Pregel on Spark
• Shark - Hive on Spark
• Spark Streaming - discretized stream processing

Shark - Hive on Spark
• 100% HiveQL compatible
• 10-100x faster than Hive, answers in seconds
• Reuse UDFs, SerDes, StorageHandlers
• Can use DSE / CassandraFS for Metastore
• Easy Scala/Java integration via Spark - easier than writing UDFs

Our new analytics architecture
How we integrate Cassandra and Spark/Shark

From raw events to fast queries
• Raw events → Ingestion → C* event store
• Spark jobs read the event store and build materialized views (View 1, View 2, View 3)
• Spark runs predefined queries and Shark runs ad-hoc HiveQL against the views

Our Spark/Shark/Cassandra Stack
• Node1, Node2, Node3: each runs Cassandra, our InputFormat and SerDe, a Spark Worker, and Shark
• Spark Master
• Job Server
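
Co-locating a Spark Worker with each Cassandra node pays off only if splits are assigned to the node that owns the data. A minimal sketch of that assignment: sort row keys by token, then group them by the owning node on the token ring. The token function here is String.hashCode, a stand-in assumption; a real InputFormat must use the cluster's configured partitioner, and LocalitySplits is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of locality-aware split computation.
final class LocalitySplits {
    // Stand-in token function (assumption); not Cassandra's partitioner.
    static long token(String rowKey) {
        return rowKey.hashCode();
    }

    // ring maps each node's token to its name; a node owns the range
    // (previous node's token, its own token], wrapping around the ring.
    static String owner(TreeMap<Long, String> ring, long token) {
        Map.Entry<Long, String> e = ring.ceilingEntry(token);
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // One split per owning node; keys inside each split are in token order.
    static Map<String, List<String>> splitsByNode(TreeMap<Long, String> ring, List<String> rowKeys) {
        List<String> sorted = new ArrayList<>(rowKeys);
        sorted.sort(Comparator.comparingLong(LocalitySplits::token));
        Map<String, List<String>> splits = new TreeMap<>();
        for (String key : sorted) {
            splits.computeIfAbsent(owner(ring, token(key)), node -> new ArrayList<>()).add(key);
        }
        return splits;
    }
}
```

Keeping each split in token order also means a worker reads its local node's data in the order Cassandra stores it.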

Event Store Cassandra schema
• Event CF: one wide row per row key, one column per event
– Row key: 2013-04-05T00:00Z#id1
– Columns t0, t1, t2, t3, t4 hold {event0: a0}, {event1: a1}, {event2: a2}, {event3: a3}, {event4: a4}
• EventAttr CF: attribute index for the same row key
– Row key: 2013-04-05T00:00Z#id1
– Columns: ipaddr:10.20.30.40:t1, videoId:45678:t1, providerId:500:t0
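
The key formats in the schema can be sketched as two small helpers. The row key is a time bucket plus "#" plus an ID, and an EventAttr column is attribute:value:eventColumn. The slide only shows a bucket of T00:00Z, so hourly bucketing here is an assumption, and EventKeys is a hypothetical name.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical helpers that build keys in the format shown on the schema slide.
final class EventKeys {
    private static final DateTimeFormatter BUCKET =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm'Z'").withZone(ZoneOffset.UTC);

    // Row key: time bucket (truncated to the hour, an assumption) + "#" + id.
    static String rowKey(Instant eventTime, String id) {
        return BUCKET.format(eventTime.truncatedTo(ChronoUnit.HOURS)) + "#" + id;
    }

    // EventAttr column name: attribute:value:eventColumn, e.g. videoId:45678:t1.
    static String attrColumn(String attribute, String value, String eventColumn) {
        return attribute + ":" + value + ":" + eventColumn;
    }
}
```

For example, rowKey(Instant.parse("2013-04-05T00:37:12Z"), "id1") produces 2013-04-05T00:00Z#id1, the row key on the slide.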

Unpacking raw events
Event CF rows (columns t0, t1):
• 2013-04-05T00:00Z#id1: t0 = {video: 10, type: 5}, t1 = {video: 11, type: 1}
• 2013-04-05T00:00Z#id2: t0 = {video: 20, type: 5}, t1 = {video: 25, type: 9}
Unpacked into one row per event:
UserID  Video  Type
id1     10     5
id1     11     1
id2     20     5
id2     25     9
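
The unpacking step is a flatMap from each wide row to one flat row per event column. The plain-Java sketch below assumes a simple input model: row key mapped to events in column order, each event a field map. EventUnpacker is a hypothetical name; in Spark the same logic would live inside a flatMap over the rows the InputFormat returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: flatten wide Event CF rows into (UserID, Video, Type) rows.
final class EventUnpacker {
    static final class Row {
        final String userId;
        final int video;
        final int type;

        Row(String userId, int video, int type) {
            this.userId = userId;
            this.video = video;
            this.type = type;
        }

        @Override
        public String toString() {
            return userId + " " + video + " " + type;
        }
    }

    // Row keys look like "<time bucket>#<userId>"; each column holds one event.
    static List<Row> unpack(Map<String, List<Map<String, Integer>>> wideRows) {
        List<Row> rows = new ArrayList<>();
        for (Map.Entry<String, List<Map<String, Integer>>> entry : wideRows.entrySet()) {
            String rowKey = entry.getKey();
            String userId = rowKey.substring(rowKey.indexOf('#') + 1);
            for (Map<String, Integer> event : entry.getValue()) {
                rows.add(new Row(userId, event.get("video"), event.get("type")));
            }
        }
        return rows;
    }
}
```

Feeding it the two rows above, in insertion order, yields the four rows of the unpacked table.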

Tips for InputFormat Development
• Know which target platforms you are developing for
– Which API to write against? New (org.apache.hadoop.mapreduce)? Old (org.apache.hadoop.mapred)? Both?
• Be prepared to spend time tuning your split computation
– Low-latency jobs require fast splits
• Consider sorting row keys by token for data locality
• Implement predicate pushdown for Hive SerDes
– Use your indexes to reduce the size of the dataset
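
Predicate pushdown with the EventAttr index can be sketched as a column slice: for a predicate such as videoId = 45678, read only the sorted index columns that start with videoId:45678: and turn them into pointers to Event CF columns, instead of deserializing every event. The in-memory sorted sets stand in for Cassandra column slices (an assumption), and AttrIndexPushdown is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;

// Hypothetical sketch of index-based predicate pushdown.
final class AttrIndexPushdown {
    // Pointer to one event: Event CF row key plus column (e.g. "t1").
    static final class EventRef {
        final String rowKey;
        final String column;

        EventRef(String rowKey, String column) {
            this.rowKey = rowKey;
            this.column = column;
        }

        @Override
        public String toString() {
            return rowKey + "/" + column;
        }
    }

    // attrIndex: row key -> sorted index column names like "videoId:45678:t1".
    static List<EventRef> matching(Map<String, SortedSet<String>> attrIndex,
                                   String attribute, String value) {
        String prefix = attribute + ":" + value + ":";
        List<EventRef> refs = new ArrayList<>();
        for (Map.Entry<String, SortedSet<String>> row : attrIndex.entrySet()) {
            // Columns are sorted, so matches form one contiguous slice from prefix.
            for (String column : row.getValue().tailSet(prefix)) {
                if (!column.startsWith(prefix)) {
                    break;
                }
                refs.add(new EventRef(row.getKey(), column.substring(prefix.length())));
            }
        }
        return refs;
    }
}
```

Including the trailing colon in the prefix keeps videoId:45678 from also matching a value like 456789.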

Example: OLAP processing
• C* events, e.g. 2013-04-05T00:00Z#id1: t0 = {video: 10, type: 5} and 2013-04-05T00:00Z#id2: t0 = {video: 20, type: 5}
• Spark jobs turn the events into OLAP aggregates, held as cached materialized views
• Queries union the aggregates, then run against the result:
– Query 1: Plays by provider
– Query 2: Top content for mobile
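
The OLAP flow can be sketched in plain Java. Each partition is aggregated into cells keyed by (provider, video, device), the per-partition aggregates are unioned by adding counts, and both queries run against the much smaller unioned result. The event model (String[]{provider, video, device}) and the name OlapSketch are assumptions for illustration. In the real system the aggregates are cached Spark datasets.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch: aggregate per partition, union, then query the cube.
final class OlapSketch {
    // Count plays per (provider, video, device) cell for one partition.
    static Map<List<String>, Long> aggregate(List<String[]> events) {
        return events.stream().collect(Collectors.groupingBy(
                e -> List.of(e[0], e[1], e[2]), Collectors.counting()));
    }

    // Union of per-partition aggregates: add up counts for identical cells.
    static Map<List<String>, Long> union(List<Map<List<String>, Long>> parts) {
        Map<List<String>, Long> merged = new HashMap<>();
        for (Map<List<String>, Long> part : parts) {
            part.forEach((cell, plays) -> merged.merge(cell, plays, Long::sum));
        }
        return merged;
    }

    // Query 1: plays by provider.
    static Map<String, Long> playsByProvider(Map<List<String>, Long> cube) {
        Map<String, Long> out = new TreeMap<>();
        cube.forEach((cell, plays) -> out.merge(cell.get(0), plays, Long::sum));
        return out;
    }

    // Query 2: top-n videos by plays on mobile (ties broken by video id).
    static List<String> topMobileContent(Map<List<String>, Long> cube, int n) {
        Map<String, Long> mobile = new HashMap<>();
        cube.forEach((cell, plays) -> {
            if (cell.get(2).equals("mobile")) {
                mobile.merge(cell.get(1), plays, Long::sum);
            }
        });
        List<String> videos = new ArrayList<>(mobile.keySet());
        videos.sort(Comparator.comparing((String v) -> mobile.get(v)).reversed()
                .thenComparing(Comparator.naturalOrder()));
        return videos.subList(0, Math.min(n, videos.size()));
    }
}
```

Because the cells are pre-summed, each query scans aggregates rather than raw events, which is what makes the query step fast.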

Performance numbers (Spark 0.7.0)
• Spark: C* -> OLAP aggregates, cold cache, 1.4 million events: 130 seconds
• C* -> OLAP aggregates, warmed cache: 20-30 seconds
• OLAP aggregate query via Spark (56k records): 60 ms
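
Turned into rates, and assuming the warmed-cache run covers the same 1.4 million events (the slide does not say so explicitly), the aggregation numbers work out roughly as follows:

```latex
\begin{aligned}
\text{cold: } & \frac{1.4\times10^{6}\ \text{events}}{130\ \text{s}} \approx 1.08\times10^{4}\ \text{events/s} \\
\text{warm: } & \frac{1.4\times10^{6}\ \text{events}}{30\ \text{s}} \approx 4.7\times10^{4}
  \quad\text{to}\quad \frac{1.4\times10^{6}\ \text{events}}{20\ \text{s}} = 7.0\times10^{4}\ \text{events/s} \\
\text{speedup: } & \frac{130}{30} \approx 4.3\times \quad\text{to}\quad \frac{130}{20} = 6.5\times
\end{aligned}
```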

OLAP Workflow
• The REST Job Server submits an Aggregation Job to the Spark executors
• The Aggregation Job reads from Cassandra and builds an aggregate Dataset cached in the executors
• A Query Job runs its Query against the cached Dataset
• The Result is returned through the REST Job Server
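
What makes this workflow work is a long-lived context that outlives individual jobs, so a query job can reuse the dataset an earlier aggregation job built. The sketch below models that with a map of named datasets. JobContext, runAggregationJob and runQueryJob are hypothetical names, not the job server's actual API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch: named datasets shared across jobs in one long-lived context.
final class JobContext {
    private final Map<String, Object> namedDatasets = new ConcurrentHashMap<>();

    // Aggregation job: build the dataset once and register it under a name.
    <T> T runAggregationJob(String name, Supplier<T> build) {
        @SuppressWarnings("unchecked")
        T dataset = (T) namedDatasets.computeIfAbsent(name, n -> build.get());
        return dataset;
    }

    // Query job: run a query against a previously registered dataset.
    <T, R> R runQueryJob(String name, Function<T, R> query) {
        Object dataset = namedDatasets.get(name);
        if (dataset == null) {
            throw new IllegalStateException("no dataset named " + name);
        }
        @SuppressWarnings("unchecked")
        T typed = (T) dataset;
        return query.apply(typed);
    }
}
```

Because the expensive Cassandra read happens only in the aggregation job, each later query pays only for scanning the cached aggregate.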

Fault Tolerance
• Cached dataset lives in the Java heap only - what if the process dies?
• Spark lineage - automatic recomputation from source, but this is expensive!
• Can also replicate the cached dataset to survive single-node failures
• Persist materialized views back to C*, then load them into the cache; the recovery path is now much faster
• Persistence also enables multiple processes to hold the cached dataset
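
The recovery order on this slide can be sketched as a three-level lookup: heap cache first, then views persisted to C*, and only then recomputation from raw data. Here the view store is modeled as a plain Map standing in for a Cassandra table (an assumption), and ViewCache is a hypothetical, single-threaded name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: heap cache -> persisted views -> recompute from raw data.
final class ViewCache<V> {
    private final Map<String, V> heapCache = new HashMap<>(); // lost if the process dies
    private final Map<String, V> viewStore;                   // durable copy, shareable
    private final Function<String, V> recomputeFromRaw;       // the expensive path
    private int recomputes = 0;

    ViewCache(Map<String, V> viewStore, Function<String, V> recomputeFromRaw) {
        this.viewStore = viewStore;
        this.recomputeFromRaw = recomputeFromRaw;
    }

    V get(String viewName) {
        V view = heapCache.get(viewName);
        if (view != null) {
            return view; // fast path: already in memory
        }
        view = viewStore.get(viewName); // cheaper recovery: reload persisted view
        if (view == null) {
            recomputes++;
            view = recomputeFromRaw.apply(viewName); // slowest: rebuild from raw events
            viewStore.put(viewName, view);           // persist so the next recovery is fast
        }
        heapCache.put(viewName, view);
        return view;
    }

    // Simulates the process dying: the heap cache is gone, the store survives.
    void simulateCrash() {
        heapCache.clear();
    }

    int recomputes() {
        return recomputes;
    }
}
```

A second ViewCache built on the same store loads the persisted view without recomputing, which mirrors the last bullet about multiple processes holding the cached dataset.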

Demo time

Shark Demo
• Local Shark node, 1 core, MacBook Pro
• How to create a table from C* using our InputFormat
• Creating a cached Shark table
• Running fast queries

Creating a Shark Table from InputFormat

Creating a cached table

Querying cached table

THANK YOU
• @evanfchan
• ev@ooyala.com
• WE ARE HIRING!!

Spark: Under the hood
Diagram: a Driver sends Map and Reduce tasks to executors, which work on the Dataset partitions they hold; one executor process per node
