HBase, Hadoop World NYC. Ryan Rawson (StumbleUpon.com, su.pr) and Jonathan Gray (Streamy.com)
A presentation in 2 parts
Part 1
About Me: Ryan Rawson, Senior Software Developer @ StumbleUpon; HBase committer and core contributor
StumbleUpon uses HBase in production, behind features of our su.pr service. More on that later.
Adventures with MySQL: Scaling MySQL is hard, and Oracle is expensive (and hard). Machine cost goes up faster than speed. You turn off all relational features to scale, and turn off secondary (!) indexes too! (!!)
MySQL problems, continued: Tables can become a problem at sizes as low as 500 GB, and it is hard to read data quickly at these sizes. The future doesn't look so bright as we contemplate 10x sizes. The MySQL master becomes a problem...
Limitations of masters: What if your write rate is greater than a single machine can handle? All slaves must have the same write capacity as the master (you can't cheap out on slaves). The master is a single point of failure with no easy failover. You can (sort of) solve this with sharding...
Sharding
Sharding problems: It requires either a hashing function or a mapping table to determine the shard. Data access code becomes complex. And what if shard sizes become too large...
Resharding!
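To make the bookkeeping concrete, here is a minimal Java sketch of hash-based shard lookup (all names are hypothetical, not from the talk). It also shows why resharding hurts: the shard a key maps to depends on the shard count, so changing the cluster size remaps nearly every key.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of hash-based sharding; not code from the talk.
public class ShardRouter {
    private final List<String> shardHosts;

    public ShardRouter(List<String> shardHosts) {
        this.shardHosts = shardHosts;
    }

    // Every data-access call must route through something like this first.
    public String shardFor(String userId) {
        // Mask off the sign bit so the bucket index is never negative.
        int bucket = (userId.hashCode() & 0x7fffffff) % shardHosts.size();
        return shardHosts.get(bucket);
    }

    public static void main(String[] args) {
        ShardRouter four = new ShardRouter(Arrays.asList("db1", "db2", "db3", "db4"));
        ShardRouter five = new ShardRouter(Arrays.asList("db1", "db2", "db3", "db4", "db5"));
        // Adding one shard usually moves a given user's data to a different host.
        System.out.println(four.shardFor("user42") + " -> " + five.shardFor("user42"));
    }
}
```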
What about schema changes? Schema changes and migrations: MySQL is not your friend here, and it only gets harder with more data.
HBase to the rescue: clustered, commodity(ish) hardware; mostly schema-less; dynamic distribution; spreads writes out over the cluster.
What is HBase? HBase is an open-source distributed database inspired by Google's Bigtable. It is part of the Hadoop ecosystem, layers on HDFS for storage, and has native connections to MapReduce.
HBase storage model: a column-oriented database. Column names are arbitrary data, and each row can have a large, variable number of columns. Rows are stored in sorted order. Supports random reads and writes.
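A minimal sketch of what that model looks like from the 0.20-era Java client (the table, family, and qualifiers below are invented for the example): column qualifiers are plain bytes under a fixed family, and single rows are randomly written and read back by key.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class StorageModelExample {
    public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "urls");  // hypothetical table

        // Qualifiers are just bytes; different rows can carry different columns.
        Put put = new Put(Bytes.toBytes("su.pr/abc123"));
        put.add(Bytes.toBytes("stats"), Bytes.toBytes("clicks"), Bytes.toBytes(42L));
        put.add(Bytes.toBytes("stats"), Bytes.toBytes("referrer:twitter.com"), Bytes.toBytes(7L));
        table.put(put);
        table.flushCommits();  // push the buffered write to the server

        // Random read of a single row by key.
        Result result = table.get(new Get(Bytes.toBytes("su.pr/abc123")));
        long clicks = Bytes.toLong(result.getValue(Bytes.toBytes("stats"), Bytes.toBytes("clicks")));
        System.out.println("clicks = " + clicks);
    }
}
```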
Tables: A table is split into roughly equal-sized "regions". Each region is a contiguous range of keys, from [start, end). Regions split as they grow, dynamically adjusting to your data set.
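Because rows are sorted and each region covers a contiguous key range, a range scan only touches the regions that overlap the range. A minimal sketch with the 0.20-era client (table and keys invented):

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class RangeScanExample {
    public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "urls");  // hypothetical

        // [startRow, stopRow) mirrors the [start, end) region boundaries.
        Scan scan = new Scan(Bytes.toBytes("su.pr/a"), Bytes.toBytes("su.pr/b"));
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result row : scanner) {
                System.out.println(Bytes.toString(row.getRow()));
            }
        } finally {
            scanner.close();  // always release the server-side scanner
        }
    }
}
```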
Server architecture: similar to HDFS. Master = Namenode (ish); RegionServer = Datanode (ish). These are often run alongside each other!
Server architecture, continued: but not quite the same, because HBase stores its state in HDFS. HDFS provides robust data storage across machines, insulating against failure. The Master and RegionServers are fairly stateless and machine-independent.
Region assignment: each region from every table is assigned to a RegionServer. The Master is responsible for assignment and for noticing if (when!) RegionServers go down.
Master duties: when machines fail, move regions from affected machines to others. When regions split, move regions to balance the cluster. It could also move regions in response to load. You can run multiple backup masters.
What the Master does NOT do: it does not handle any write requests (it is not a DB master!), and it does not handle location-finding requests. It is not involved in the read/write path! It generally does very little most of the time.
Distributed coordination: to manage master election and server availability, we use ZooKeeper. Set up as a cluster, it provides distributed coordination primitives and is an excellent tool for building cluster management systems.
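For a flavor of those primitives, here is a minimal sketch of the pattern this enables (the paths and host names are hypothetical): a server registers an ephemeral znode, and ZooKeeper removes it automatically when the server's session dies, so interested parties find out about failures.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class LivenessExample {
    public static void main(String[] args) throws Exception {
        // Session with a 5 second timeout and a no-op default watcher.
        ZooKeeper zk = new ZooKeeper("zkhost:2181", 5000, new Watcher() {
            public void process(WatchedEvent event) { /* ignore events */ }
        });

        // Assumes the parent znode /rs already exists. Ephemeral znodes are
        // deleted automatically when this session dies, which is how a
        // master notices a dead server.
        zk.create("/rs/server1.example.com", new byte[0],
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        Thread.sleep(Long.MAX_VALUE);  // hold the session open
    }
}
```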
Scaling HBase: add more machines to scale. The base model (Bigtable) scales past 1000 TB, and there is no inherent reason why HBase couldn't.
What to store in HBase? Maybe not your raw log data...
... but the results of processing it with Hadoop! By storing the refined version in HBase, you can keep up with huge data demands and serve it to your website.
HBase & Hadoop: HBase provides a real-time, structured storage layer that integrates with your existing Hadoop clusters. It provides "out of the box" hookups to MapReduce and uses the same loved (or hated) management model as Hadoop.
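A minimal sketch of one of those hookups, using the 0.20-era TableMapReduceUtil (the table name and mapper are invented for the example): the job reads its input rows directly out of an HBase table, one map task per region.

```java
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class CountRows {
    // Emits one (rowkey, 1) pair per row; a reducer could sum these.
    static class RowMapper extends TableMapper<Text, LongWritable> {
        public void map(ImmutableBytesWritable key, Result value, Context context)
                throws IOException, InterruptedException {
            context.write(new Text(key.get()), new LongWritable(1));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new HBaseConfiguration(), "count-rows");
        job.setJarByClass(CountRows.class);
        // Wire the "urls" table in as the job's input source.
        TableMapReduceUtil.initTableMapperJob("urls", new Scan(),
                RowMapper.class, Text.class, LongWritable.class, job);
        job.setOutputFormatClass(NullOutputFormat.class);  // map-only sketch
        job.setNumReduceTasks(0);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```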
HBase @ StumbleUpon
StumbleUpon & HBase: started investigating the field in Jan '09 and looked at the 3 top (at the time) choices: Cassandra, Hypertable, and HBase. Cassandra didn't work for us and we didn't like the data model; Hypertable was fast, but we had concerns about community and project viability (no major users beyond Zvents); HBase was local and had a good community.
StumbleUpon & HBase: we picked HBase for its community and its features (MapReduce, Cascading, etc.), and are now highly involved and invested.
su.pr marketing: "Su.pr is the only URL shortener that also helps your content get discovered! Every Su.pr URL exposes your content to StumbleUpon's nearly 8 million users!"
su.pr tech features: real-time stats are done directly in HBase; in-depth stats use Cascading and MapReduce, with the results put back into HBase.
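The real-time side maps naturally onto HBase's atomic counters. A minimal sketch with the 0.20-era client (row and column names invented): each event is a single server-side incrementColumnValue call, and the return value is the live count.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class RealtimeStats {
    public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "urls");  // hypothetical

        // Atomic server-side increment: no read-modify-write race, so every
        // click handler can bump the counter directly.
        long clicks = table.incrementColumnValue(
                Bytes.toBytes("su.pr/abc123"),   // row key
                Bytes.toBytes("stats"),          // column family
                Bytes.toBytes("clicks"),         // qualifier
                1L);                             // delta
        System.out.println("clicks so far: " + clicks);
    }
}
```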
su.pr web access: PHP code accesses HBase through the Thrift gateway. There is no additional caching other than what HBase provides.
Large data storage: over 9 billion rows and 1300 GB in HBase. We can MapReduce over a 700 GB table in ~20 minutes, which is about 6 million rows/sec, and it scales to 2x that speed on 2x the hardware.
Micro read benchmarks: single reads take 1-10 ms depending on disk seeks and caching; scans can return hundreds of rows in dozens of ms.
Serial read speeds: [benchmark charts for a small table and a bigger table; printlns were removed from the benchmark code]
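For reference, the shape of such a read micro-benchmark is simple. A hedged sketch against the 0.20-era client (table and keys invented), with no printing inside the timed loop, per the note above:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadBench {
    public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "urls");  // hypothetical
        int n = 1000;
        long start = System.currentTimeMillis();
        for (int i = 0; i < n; i++) {
            // Single-row reads; keep printlns out of the timed loop.
            table.get(new Get(Bytes.toBytes("su.pr/key" + (i % 100))));
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(n + " gets in " + elapsed + " ms ("
                + (double) elapsed / n + " ms/get)");
    }
}
```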
Deployment considerations: ZooKeeper must do disk I/O to complete its operations, so consider hosting it on dedicated machines. The Namenode and the HBase Master can co-exist.
What to put on your nodes: a RegionServer requires 2-4 cores and 3 GB+ of RAM. You can't run HDFS, HBase, maps, and reduces on a 2-core system. On my 8-core systems I run a datanode, a regionserver, 2 maps, and 2 reduces.
Garbage collection: GC tuning becomes important. Quick tip: use CMS and -Xmx4000m. We're interested in G1 (if it ever stops crashing).
Batch and interactive: these workloads may not be compatible. Latency goes up under heavy batch load, so you may need two clusters to keep the website responsive.
Part 2
HBase @ Streamy: History of Data; RDBMS Issues; HBase to the Rescue; Streamy Today and Tomorrow; Future of HBase.
About Me: Co-founder and CTO of Streamy.com; HBase committer; migrated Streamy from an RDBMS to HBase and Hadoop in June 2008.
History of Data, the Prototype: Streamy 1.0 was built on PostgreSQL with all of the bells and whistles, powered by a single low-spec node (8 core / 8 GB / 2 TB / $4k). Functionally powerful, woefully slow.
History of Data, the Alpha: Streamy 1.5 was built on optimized PostgreSQL. We removed the bells and whistles and added partitioning, powered by a high-powered master node (16 core / 64 GB / 15x146 GB 15k RPM / $40k). Less powerful, still slow... and insanely expensive.
History of Data, the Beta: Streamy 2.0 was built entirely on HBase, with custom caches, query engines, and an API, powered by 10 low-spec nodes (4 core / 4 GB / 1 TB, $10k for the entire cluster). Less functional, but fast, scalable, and cheap.
RDBMS issues: poor disk usage patterns; a black-box query engine; write speed that degrades with table size; unnecessary transaction/MVCC overhead; and it's expensive.
The read problem: "view the 30 newest unread stories from my blogs." Not RDBMS-friendly; there is no early-out. A PL/Python heap-merge hack helped: we knew what to do, but the DB didn't listen.
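The PL/Python code isn't shown in the talk, but the idea is a classic k-way heap merge with an early-out. Here is the same pattern sketched in Java (the data shapes are invented for illustration): pull from per-feed streams that are already sorted newest-first and stop as soon as the limit is reached.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class HeapMerge {
    // Merge per-feed story timestamps (each iterator already sorted
    // newest-first) and stop after `limit` items: the early-out a generic
    // ORDER BY ... LIMIT plan misses when it sorts everything first.
    static List<Long> newest(List<Iterator<Long>> feeds, int limit) {
        // Heap entries are {timestamp, feedIndex}, newest timestamp on top.
        PriorityQueue<long[]> heap = new PriorityQueue<long[]>(
                Math.max(1, feeds.size()),
                new Comparator<long[]>() {
                    public int compare(long[] a, long[] b) {
                        return -Long.valueOf(a[0]).compareTo(Long.valueOf(b[0]));
                    }
                });
        for (int i = 0; i < feeds.size(); i++) {
            if (feeds.get(i).hasNext()) {
                heap.add(new long[] {feeds.get(i).next(), i});
            }
        }
        List<Long> out = new ArrayList<Long>();
        while (!heap.isEmpty() && out.size() < limit) {  // early-out here
            long[] top = heap.poll();
            out.add(top[0]);
            Iterator<Long> feed = feeds.get((int) top[1]);
            if (feed.hasNext()) {
                heap.add(new long[] {feed.next(), (int) top[1]});
            }
        }
        return out;
    }
}
```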
The write problem: a rapidly growing items table as the crawl index grew from 1k to 100k feeds, plus indexes, static content, and dynamic statistics. The available solutions are imperfect.
RDBMS conclusions: enormous functionality and flexibility, but you throw it out the door at scale. A stripped-down RDBMS is still not attractive: it turned the entire team into DBAs and gets in the way of domain-specific optimizations.
What we wanted: transparent partitioning, transparent distribution, fast random writes, good data locality, and fast random reads.
What we got: transparent partitioning (Regions), transparent distribution (RegionServers), fast random writes (MemStore), good data locality (Column Families), and fast random reads (HBase 0.20).
What else we got: transparent replication (HDFS), high availability (no SPOF), MapReduce (Input/OutputFormats), versioning (Column Versions), and fast sequential reads (Scanners).
HBase @ Streamy Today
HBase @ Streamy today: all data is stored in HBase, with additional caching of hot data; query and indexing engines; MapReduce crawling and analytics; ZooKeeper/Katta/Lucene.
HBase @ Streamy tomorrow: a thumbnail media server; slave replication for backup/DR; more Cascading; better Katta integration; realtime MapReduce.
HBase on a budget: HBase works on cheap nodes, but you need a cluster (5+ nodes). A $10k cluster has 10X the capacity of a $40k node. You can run multiple instances on a single cluster. 24/7 clusters + bandwidth != EC2.
Lessons learned: a layer of abstraction helps tremendously (an internal Streamy data API, storage of serialized types). Schema design is about reads, not writes. What's good for HBase is good for Streamy.
What's next for HBase: inter-cluster / inter-DC replication (slave and multi-master); a Master rewrite using more ZooKeeper; batch operations and an HDFS uploader; and no more data loss, which needs HDFS appends.
HBase information: Home page: http://hbase.org; Wiki: http://wiki.apache.org/hadoop/Hbase; Twitter: http://twitter.com/hbase; Freenode IRC: #hbase; Mailing list: [email_address]