Analysis of Data Placement Strategy 
based on Computing Power of Nodes on 
Heterogeneous Hadoop Clusters 
Sanket Reddy Chintapalli 
Advisor - Dr. Xiao Qin
Presentation Overview 
● Synopsis 
● MapReduce Programming Model Overview 
● HDFS Overview 
● Motivation 
● Design 
● Software Description 
● Hardware Description 
● Results 
● Conclusion
Synopsis 
● Data placement strategy 
● Heterogeneous Clusters 
● Computing Power 
● Calculating Computing Ratio 
● WordCount and Grep
MapReduce Model 
● Hadoop 1.0 and Hadoop 2.0 
● Master - Slave Model 
● JobTracker and TaskTracker (Hadoop 1.0) 
● YARN (Hadoop 2.0) 
● Resource Manager (YARN) 
● Application Manager (YARN) 
● Node Manager (YARN) 
● MapReduce Flow
MapReduce Model
MapReduce Model - 1.0
MapReduce Model - YARN - 2.0
MapReduce Model - Flow
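The map, shuffle, and reduce stages of the flow can be illustrated with WordCount, one of the two benchmark applications in this study. The block below is a minimal single-process sketch in plain Python, not Hadoop code; the function names are made up for illustration, and a real job would run the map and reduce tasks in parallel across the cluster.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every whitespace-separated token."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the list of counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

def word_count(lines):
    """Run the three stages in sequence over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle_phase(pairs))

if __name__ == "__main__":
    print(word_count(["hadoop stores data", "Hadoop processes data"]))
```

Grep follows the same shape: the map stage emits only the lines (or matches) that satisfy a pattern, and the reduce stage aggregates them.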
HDFS 
● Namenode 
● Datanode 
● Replication 
● Federated Namenodes
HDFS Architecture
HDFS Federated Namenodes
HDFS Federated Namenodes 
● Scalability 
● Performance 
● Isolation: an overloaded NameNode does not slow down the others
Motivation
Software Description 
● Hadoop 2.3.0 
● Maven 
● Eclipse 
● Protocol Buffers
Hardware Description
Design 
Run WordCount and Grep Applications on 
individual nodes
Design 
Calculate Computing Power of Individual Nodes for 
a specific application
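The slides do not spell out the formula for computing power, so the following is only a sketch under a stated assumption: each node's computing ratio is its measured response time for the application divided by the fastest node's response time (so the fastest node has ratio 1), and a node's share of the data is inversely proportional to its ratio. The node names and timings in the example are hypothetical, not measurements from this study.

```python
def computing_ratios(response_times):
    """Normalize per-node response times so the fastest node has ratio 1.0.

    ASSUMPTION: ratio = node time / fastest time. The slides do not give
    the exact formula used in the study.
    """
    fastest = min(response_times.values())
    return {node: t / fastest for node, t in response_times.items()}

def data_shares(ratios):
    """Fraction of the input each node should hold: proportional to 1/ratio."""
    weights = {node: 1.0 / r for node, r in ratios.items()}
    total = sum(weights.values())
    return {node: w / total for node, w in weights.items()}

if __name__ == "__main__":
    # Hypothetical response times in seconds for one application.
    times = {"nodeA": 10.0, "nodeB": 20.0, "nodeC": 40.0}
    ratios = computing_ratios(times)
    for node, share in data_shares(ratios).items():
        print(f"{node}: ratio={ratios[node]:.1f} share={share:.3f}")
```

Because WordCount and Grep stress the hardware differently, the ratios would be computed separately for each application.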
Design 
● Evaluate the default Hadoop distribution by running Grep and 
WordCount together on all nodes 
● Run the CRBalancer to balance the nodes 
● Finally, re-run the applications to measure the effect 
of the data placement strategy.
Design - Algorithm 
CRBalancer Strategy
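The algorithm slide is a figure that did not survive in this transcript. As a rough illustration only, and not the author's CRBalancer, the sketch below shows one way a ratio-based balancer could plan block moves: compute each node's target block count from its data share, then greedily move blocks from nodes above target to nodes below it. The function name, the greedy pairing, and the example shares are all assumptions.

```python
def plan_moves(block_counts, shares):
    """Plan block moves so each node ends up near its proportional share.

    block_counts: {node: blocks currently stored}
    shares:       {node: fraction of all blocks the node should hold}
    Returns (targets, moves), where moves is a list of (source, dest, n).
    HYPOTHETICAL sketch; the real CRBalancer may differ.
    """
    total = sum(block_counts.values())
    # Round each target down, then hand leftover blocks to the largest shares.
    targets = {node: int(total * shares[node]) for node in block_counts}
    leftover = total - sum(targets.values())
    by_share = sorted(block_counts, key=lambda n: shares[n], reverse=True)
    for node in by_share[:leftover]:
        targets[node] += 1

    surplus = {n: block_counts[n] - targets[n]
               for n in block_counts if block_counts[n] > targets[n]}
    deficit = {n: targets[n] - block_counts[n]
               for n in block_counts if block_counts[n] < targets[n]}

    moves = []
    for src, extra in surplus.items():
        for dst in list(deficit):
            if extra == 0:
                break
            n = min(extra, deficit[dst])
            moves.append((src, dst, n))
            extra -= n
            deficit[dst] -= n
            if deficit[dst] == 0:
                del deficit[dst]
    return targets, moves

if __name__ == "__main__":
    counts = {"nodeA": 40, "nodeB": 40, "nodeC": 40}   # evenly spread blocks
    shares = {"nodeA": 0.5, "nodeB": 0.25, "nodeC": 0.25}
    targets, moves = plan_moves(counts, shares)
    print(targets)
    print(moves)
```

A production balancer would also have to respect replica placement and network bandwidth limits, which this sketch ignores.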
Implementation 
● CRBalancer 
● CRBalancingPolicy 
● CRNamenodeConnector
Results - WordCount
Results - Grep
Questions?

HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous Hadoop Clusters