@andypiper
Chris Aniszczyk
Head of Open Source
@cra
Apache Mesos at Twitter
#TXLF 2014
Hi, I’m @cra & run the @TwitterOSS office!
Twitter is Built on Open Source…
Agenda
• Introduction
• How does Mesos work?

• Mesos Ecosystem

• Conclusion

• Q&A
Twitter Scale…
• 255M+ active users
• 500M+ Tweets per day
• 77% of users are outside the US
• 100TB+ compressed data per day
(2006 to 2014)
Growth challenges… sad times… remember the fail whale?
Ups and Downs… remember World Cup 2010?
http://gigaom.com/2010/06/11/is-the-world-cup-bringing-down-twitter/
Easy solution!? Let's add machines… but…
• Can get expensive… even with
commodity hardware…

• Hard to fully utilize machines
(e.g., 72 GB RAM and 24 CPUs)

• Hard to deal with failures…

• What else could we do…?
Evaluate industry…
• Google was ahead of the game in managing
warehouse-scale computing:
http://research.google.com/pubs/pub35290.html

• Google hit a lot of these problems before many other
companies and came up with interesting solutions:
http://youtube.com/watch?v=0ZFMlO98Jkc
Evaluate research at universities…
• Universities (wooooo PhDs) were doing research in
this area, so we decided to partner and hire researchers:
https://amplab.cs.berkeley.edu/tag/mesos/

• “Return of the Borg: How Twitter Rebuilt Google’s
Secret Weapon”: http://www.wired.com/2013/03/google-borg-twitter-mesos
Enter Apache Mesos
• We took university research and spun it into an open
source project at the Apache Software Foundation:
https://blog.twitter.com/2012/incubating-apache-mesos

• https://twitter.com/ApacheMesos/statuses/360039441500340224
What exactly is Mesos?
• Mesos is an open source project with a healthy
independent community: http://mesos.apache.org	

• Mesos is a distributed system to build and run
distributed systems	

• Mesos provides fine-grained resource sharing and
isolation	

• Mesos enables high-availability and fault-tolerance
for your cluster
This is your typical data center
[Figure: a 3×3 grid of nodes numbered 1 to 9]
This is your typical data center with statically partitioned apps
[Figure: the same 3×3 grid of nodes numbered 1 to 9]
Not sharing wastes resources
[Figure: three utilization charts, each with an axis running from 0% to 33%]
Resource sharing increases throughput and utilization
[Figure: the same three charts (0% to 33%) next to one combined chart with an axis running from 0% to 100%]
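The contrast between the two slides above can be made concrete with a little arithmetic. The sketch below is only an illustration: the 9-node cluster mirrors the grid on the earlier slide, but the demand numbers are made up and are not from the talk.

```python
# Toy comparison of static partitioning vs. sharing on a 9-node cluster.
# The demand numbers are hypothetical; they are not measurements from Twitter.

NODES = 9
PARTITION = NODES // 3  # each of three apps owns 3 nodes when statically partitioned

# Nodes each of the three apps could use at three points in time (made up).
demand = [
    (6, 1, 1),
    (1, 5, 2),
    (2, 2, 5),
]


def static_used(step):
    # With static partitions an app can never use more than its own 3 nodes.
    return sum(min(d, PARTITION) for d in step)


def shared_used(step):
    # With sharing any app can use any free node, up to the cluster size.
    return min(sum(step), NODES)


static_util = sum(static_used(s) for s in demand) / (NODES * len(demand))
shared_util = sum(shared_used(s) for s in demand) / (NODES * len(demand))

print(f"static partitioning: {static_util:.0%}")
print(f"shared cluster:      {shared_util:.0%}")
```

With these made-up numbers the busy app is capped by its partition while its neighbours sit idle, which is the waste the slide is pointing at.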
Running at the container level improves performance…
[Figure: bar chart of time to provision (seconds), with axis ticks at 1, 100 and 10000, for bare metal, VM and container]
Inspired by Tomas Barton’s Mesos talk at InstallFest in Prague
[Diagram: two Mesos slaves, three Mesos masters, a ZooKeeper quorum and two framework schedulers (Hadoop and Marathon). One slave runs a Hadoop task-tracker (Task #1, Task #2) and a Mesos executor (./ruby XYZ); the other runs two Docker executors (java -jar XYZ.jar and ./xyz).]
*Thank you to Niklas Nielsen and Adam Bordelon for the following diagrams explaining Mesos
https://www.youtube.com/watch?v=EI0ROkf0vks
Mesos consists of master/slave nodes
Applications are known as frameworks in Mesos; they interact with the master
Multiple masters can be in place for HA; they coordinate leader election with ZK
The master schedules tasks to run on slaves’ available resources; slaves use executors to coordinate execution of tasks
Tasks are the unit of execution
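As a rough picture of "tasks run on slaves' available resources", here is a minimal first-fit placement sketch. It is not the Mesos allocator: in Mesos itself the master offers resources to framework schedulers, which decide what to launch. The function name, slave names and resource numbers are all hypothetical (the 24 CPUs and 72 GB simply echo the machine size mentioned earlier).

```python
# Toy first-fit placement of tasks onto slaves' free resources.
# Hypothetical names and numbers; this is not how the Mesos allocator is written.


def place(tasks, slaves):
    """Return {task_name: slave_name}, with None for tasks that do not fit."""
    # Copy the per-slave resources so the caller's data is left untouched.
    free = {name: dict(res) for name, res in slaves.items()}
    placement = {}
    for task_name, need in tasks.items():
        placement[task_name] = None
        for slave_name, avail in free.items():
            if all(avail.get(k, 0) >= v for k, v in need.items()):
                # Reserve the resources on the first slave that fits.
                for k, v in need.items():
                    avail[k] -= v
                placement[task_name] = slave_name
                break
    return placement


slaves = {
    "slave-1": {"cpus": 24, "mem_gb": 72},
    "slave-2": {"cpus": 24, "mem_gb": 72},
}
tasks = {
    "task-1": {"cpus": 16, "mem_gb": 32},
    "task-2": {"cpus": 16, "mem_gb": 32},
    "task-3": {"cpus": 8, "mem_gb": 40},
}

print(place(tasks, slaves))
```

Here task-3 lands in the space task-1 left over on slave-1, so two machines carry three tasks.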
Mesos provides fine-grained resource isolation (via cgroups)
[Diagram: a compute node running the Mesos slave process; a Hadoop task-tracker (Task #1, Task #2) and a Mesos executor (ruby XYZ) each sit inside a cgroups container]
Slaves isolate executors and tasks via containers (dotted line)
[Diagram: the same compute node; the Hadoop task-tracker's cgroups container now holds Task #1, Task #2 and Task #3]
Containers can GROW AND SHRINK as tasks run and complete
Mesos provides componentized resource isolation
[Diagram: the Mesos slave process uses the Containerizer API to reach the Mesos Containerizer, which is made up of a launcher, a CGroups CPU isolator and a CGroups memory isolator; container foo holds executor bar and task baz]
When a slave starts, you can specify a “containerizer” to launch the container and a set of isolators to enforce resource constraints (CPU/memory)
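The idea of a containerizer assembled from pluggable isolators can be sketched in a few lines. Everything below is hypothetical: the class and method names are not the Mesos interfaces, and the mapping of one CPU to 1024 cgroup shares is an assumption made for the example. Only the two cgroup control names (cpu.shares, memory.limit_in_bytes) are standard cgroup v1 settings.

```python
# Sketch of a containerizer composed of pluggable isolators.
# Hypothetical classes for illustration; not the Mesos containerizer API.


class CpuIsolator:
    def limits(self, resources):
        # Assumption for this sketch: one CPU maps to 1024 cgroup CPU shares.
        return {"cpu.shares": int(resources.get("cpus", 0) * 1024)}


class MemoryIsolator:
    def limits(self, resources):
        # Convert megabytes to the byte limit a memory cgroup expects.
        return {"memory.limit_in_bytes": int(resources.get("mem_mb", 0)) * 1024 * 1024}


class Containerizer:
    def __init__(self, isolators):
        self.isolators = isolators

    def launch(self, container_id, resources):
        # Each isolator contributes the settings it is responsible for.
        settings = {}
        for isolator in self.isolators:
            settings.update(isolator.limits(resources))
        return {"id": container_id, "limits": settings}


containerizer = Containerizer([CpuIsolator(), MemoryIsolator()])
container = containerizer.launch("foo", {"cpus": 2, "mem_mb": 512})
print(container)
```

Adding another kind of isolation in this sketch means adding another class to the list, which is the point of the componentized design.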
Mesos can track and allocate more resource types, allowing you to manage resources like IP addresses, ports, disk space and even GPUs!
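Mesos describes a node's resources as text of the form name:value;name:[lo-hi]. The parser below is a simplified sketch that handles only scalars and ranges; the function name is hypothetical, and the "gpus" entry is just an example of a custom resource name, not something Mesos defines for you.

```python
# Simplified parser for a Mesos-style resource string.
# Handles scalars ("cpus:24") and ranges ("ports:[31000-32000]") only.


def parse_resources(text):
    """Parse 'name:value;name:[lo-hi,lo-hi]' into a dict."""
    resources = {}
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition(":")
        name, value = name.strip(), value.strip()
        if value.startswith("[") and value.endswith("]"):
            # A range resource such as ports: keep (low, high) pairs.
            ranges = []
            for part in value[1:-1].split(","):
                lo, _, hi = part.partition("-")
                ranges.append((int(lo), int(hi)))
            resources[name] = ranges
        else:
            # A scalar resource such as cpus or mem.
            resources[name] = float(value)
    return resources


parsed = parse_resources("cpus:24;mem:73728;disk:409600;ports:[31000-32000];gpus:2")
print(parsed)
```

Because a resource is just a name plus a scalar or range, tracking something new does not need a new data model.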
Mesos provides pluggable resource isolation (e.g., Docker)
[Diagram: the Mesos slave process calls an external containerizer program through the External Containerizer API; container foo runs MySQL on Ubuntu 13.10 and container bar runs Ruby on CentOS 6.4]
github.com/mesosphere/deimos
Everything fails all the time
Werner Vogels (Amazon CTO)
Mesos has no single point of failure (the master keeps
monitoring tasks and waits for a node to reconnect; the
master will then update the framework with any tasks that
were completed while it was gone)
Tasks keep running!
Framework
Masters
Master nodes can fail over (the ZK quorum will elect a new leader)
Tasks keep running!
Framework
Masters
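How many ZooKeeper nodes can be lost before a new leader can no longer be elected follows from majority-quorum arithmetic. A small sketch (the function names are hypothetical):

```python
# Majority-quorum arithmetic for an ensemble of n nodes.


def quorum_size(ensemble):
    # A majority: more than half of the nodes.
    return ensemble // 2 + 1


def tolerated_failures(ensemble):
    # Nodes that can be lost while a majority is still reachable.
    return ensemble - quorum_size(ensemble)


for n in (1, 3, 5):
    print(f"{n} nodes: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

This is why such ensembles usually have an odd number of nodes: a fourth node raises the quorum to 3 without letting you lose any more nodes than with three.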
Slave processes can fail over (a restarted slave loads checkpointed
state to learn which PIDs to reconnect to for each
task and re-registers with the master)
Tasks keep running!
Compute Node
Mesos Slave Process
Mesos Executor Mesos Executor
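The checkpoint-and-recover idea can be sketched with a small state file. This is an illustration only: the JSON layout, file name and field names are hypothetical and are not the format Mesos actually writes to disk.

```python
# Toy checkpoint/recover cycle for a restarted slave process.
# The JSON layout and names are hypothetical, not Mesos's on-disk format.
import json
import os
import tempfile


def checkpoint(path, state):
    # Write to a temporary file first, then swap it in atomically,
    # so a crash mid-write never leaves a half-written checkpoint.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def recover(path):
    # A slave with no checkpoint starts with no known tasks.
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "slave-state.json")
    checkpoint(path, {
        "task-1": {"executor_pid": 4242},
        "task-2": {"executor_pid": 4243},
    })
    # ... the slave process dies and is restarted here ...
    recovered = recover(path)
    for task, info in sorted(recovered.items()):
        print(f"reconnect to executor pid {info['executor_pid']} for {task}")
```

The executors are separate processes, so the tasks keep running while the slave is down; the checkpoint only tells the new slave process where to find them.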
The Mesos ecosystem is growing (frameworks everywhere)
http://mesos.apache.org/documentation/latest/mesos-frameworks/
Chronos: Distributed cron with dependencies
https://github.com/airbnb/chronos
Marathon: init.d for your data center
https://github.com/mesosphere/marathon
Aurora: Advanced scheduler used by Twitter in production
http://aurora.incubator.apache.org
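To give a feel for what "init.d for your data center" means for Marathon, listed above, here is a sketch of an app definition built as JSON. The field names (id, cmd, cpus, mem, instances) follow Marathon's app definition as I understand it; treat them as an assumption and check the Marathon documentation for your version. The app itself is made up, and nothing is sent to a server.

```python
# Build a Marathon-style app definition as JSON (nothing is submitted).
# Field names are an assumption based on Marathon's app definition format;
# the app id and command are hypothetical.
import json

app = {
    "id": "hello-mesos",
    "cmd": "sleep 3600",
    "cpus": 0.25,      # fraction of a CPU per instance
    "mem": 64,         # megabytes per instance
    "instances": 3,    # how many copies should be kept running
}

payload = json.dumps(app, indent=2, sort_keys=True)
print(payload)
```

You declare how many instances you want and what each one needs; where they run is left to the scheduler.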
You can also build your own framework…
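A framework is essentially a scheduler that reacts to resource offers from the master. The sketch below is a stand-in written in plain Python, not the real Mesos scheduler bindings: the class, method and field names are all hypothetical.

```python
# Toy framework scheduler that reacts to resource offers.
# A stand-in for illustration; all names are hypothetical, not the Mesos API.


class SleepFramework:
    def __init__(self, tasks_wanted, cpus_per_task=1.0, mem_per_task=128):
        self.pending = tasks_wanted
        self.cpus = cpus_per_task
        self.mem = mem_per_task
        self.launched = []

    def resource_offers(self, offers):
        """Use offers to launch pending tasks; return ids of declined offers."""
        declined = []
        for offer in offers:
            cpus, mem = offer["cpus"], offer["mem"]
            used = False
            # Pack as many pending tasks as fit into this offer.
            while self.pending and cpus >= self.cpus and mem >= self.mem:
                self.launched.append((f"task-{len(self.launched)}", offer["slave"]))
                cpus -= self.cpus
                mem -= self.mem
                self.pending -= 1
                used = True
            if not used:
                declined.append(offer["id"])
        return declined


framework = SleepFramework(tasks_wanted=3)
declined = framework.resource_offers([
    {"id": "o1", "slave": "slave-1", "cpus": 2.0, "mem": 1024},
    {"id": "o2", "slave": "slave-2", "cpus": 0.5, "mem": 1024},
    {"id": "o3", "slave": "slave-3", "cpus": 4.0, "mem": 1024},
])

print(framework.launched)
print(declined)
```

The framework decides which offers to use and which to decline, so the placement policy lives in your code rather than in the master.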
#PoweredByMesos (public)
http://mesos.apache.org/documentation/latest/powered-by-mesos/
Apache Mesos at Twitter (Texas LinuxFest 2014)
Mesos allows services to scale
Engineers think about resources, not machines
[Diagram labels, grouped by layer]
• Storage: MySQL, Tweet store, Flock, User Store
• Cache: Memcached, Redis
• Logic: Tweet Service, User Service, Timeline Service, SocialGraph Service, DM Service
• Presentation: API, Web, Search, Feature X, Feature Y
• Presentation: TFE (netty), Reverse Proxy
• HTTP, Thrift, Thrift
• Aurora, Mesos, Monorail
Mesos enables multi-tenant clusters
Small teams can move fast
AWS-based infrastructure beyond just Hadoop
[Diagram labels]
• Marathon, Mesos, Chronos
• Batch/Streaming: Hadoop, Spark, Kafka
• Query/Analysis: Cascading, Presto, Hive, Shark, Pig
• Services: Rails, Redis, Cassandra, KairosDB, RDS
• Hadoop A, Hadoop B
Conclusion
• Mesos is a distributed system to build and run
distributed systems (think datacenter OS)	

• Mesos enables resource sharing, high-availability
and fault-tolerance for your data centers	

• Mesos is an open source project with a healthy
independent community: http://mesos.apache.org	

• So please check it out, use it or contribute back if
you can to make it better!
https://elastic.mesosphere.io
http://mesos.apache.org
Open Source Support from the Mesos Community
http://mesos.apache.org/community/user-groups/
Learn more via Mesos User Groups
http://mesosphere.io/learn
Commercial Support from Mesosphere
http://mesoscon.org
First #MesosCon to coincide with LinuxCon 2014!
Thank you for listening!
Chris Aniszczyk (@cra)
zx@twitter.com

http://opensource.twitter.com

http://mesos.apache.org

email: {user,dev}@mesos.apache.org
Also thanks to Niklas Nielsen and Adam Bordelon for their slides explaining Mesos from ApacheCon 2014
https://www.youtube.com/watch?v=EI0ROkf0vks
Resources
http://mesos.apache.org
http://mesosphere.io/learn/
http://wired.com/wiredenterprise/2013/03/google-borg-twitter-mesos
http://mesosphere.io/2013/09/26/docker-on-mesos/
http://typesafe.com/blog/play-framework-grid-deployment-with-mesos
http://research.google.com/pubs/pub35290.html
http://nerds.airbnb.com/hadoop-on-mesos/
https://blog.twitter.com/2013/mesos-graduates-from-apache-incubation
http://www.ebaytechblog.com/2014/04/04/delivering-ebays-ci-solution-with-apache-mesos-part-i/
https://www.youtube.com/watch?v=EI0ROkf0vks