Data Warehouse Systems in the Cloud: new requirements and new challenges
Rim Moussa

LaTICE Lab. - University of Tunis
ESTI - University of Carthage
rim.moussa@esti.rnu.tn
10th Intl. Conference on Computer Systems and Applications (AICCSA), Fez, Kingdom of Morocco
30th May 2013
Keynote @ Intl. Conference on Computing, Networking and Communications, Hammamet, Tunisia

30th May 2013
DWS in the Cloud, AICCSA'13, Fez

Context
[Figure: "Cloud Rationale" set against "Benchmarking Data Warehouse Systems", marked "NO"]

Outline
1. Cloud Computing
2. Data Warehouse Systems
3. Overview of DWS Benchmarks
4. New Requirements for DWS in the Cloud
5. Related Work
6. Conclusion
7. Research Perspectives

Cloud Computing

● NIST Definition
  – Cloud computing is a pay-per-use model for enabling available, convenient, on-demand network access to a shared pool of configurable computing resources (e.g. networks, servers, storage, applications, services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.
● Opportunities
  – Performance
    ● Faster data analysis through the use of up-to-date hardware infrastructure made available by Cloud Service Providers
  – More Economical
    ● Organizations no longer need to spend capital upfront on hardware and software purchases; services are provided on a pay-per-use basis

Cloud Computing
--Market share
● Market Share
  – Forrester Research expects the global cloud computing market to reach $241 billion in 2020.
  – Gartner: the public cloud services market is forecast to grow 18.5% in 2013 to total $131 billion worldwide, up from $111 billion in 2012.
  – Gartner: the public cloud services market in the Middle East and North Africa (MENA) is expected to grow by 24.5% in 2013.
  – Gartner: the public cloud services market in India is forecast to grow 36% in 2013 to total $443 million, up from $326 million in 2012.

Data Warehouse Systems
--Typical System Architecture

Data Warehouse Systems
--Technologies
● Traditional Relational DBMSs & OLAP Servers
  – Mature
  – Do not scale linearly
● NoSQL solutions
  – Adopted by Google, Facebook, Amazon, ...
  – Dynamic horizontal scaling (scale-out)
    ● Nodes are added without bringing the cluster down
  – Shared-nothing architecture
    ● Independent computing and storage nodes interconnected via a high-speed network
● MapReduce distributed programming framework

Data Warehouse Systems
--Challenges with big data management

Data Warehouse Systems
--Common Optimizations: Hardware Storage Tech.
● DRAM: in-memory data processing (very expensive)
● SSD (Solid State Drive): a non-volatile type of memory
  ● An SSD has no mechanical arm to read and write data

                      SSD                HDD
Cost/GB               $1/GB              $0.075/GB
Typical size          512 GB             Up to 2 TB
Failure rate (MTBF)   2 million hours    1.5 million hours
Read/Write speed      200-500 MBps       120 MBps
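
As a rough worked example built only on the SSD/HDD figures quoted on this slide, the sketch below estimates the time for one full sequential scan and the raw storage cost on each medium. The 1 TB dataset, the function names, and the use of decimal units (1 GB = 1000 MB) are illustrative assumptions, not part of the talk.

```python
# Figures quoted on the slide; the 1 TB dataset below is a made-up example.
MEDIA = {
    "SSD": {"usd_per_gb": 1.0, "read_mb_per_s": (200, 500)},
    "HDD": {"usd_per_gb": 0.075, "read_mb_per_s": (120, 120)},
}


def scan_seconds(dataset_gb, mb_per_s):
    """Time to read the whole dataset sequentially (1 GB = 1000 MB)."""
    return dataset_gb * 1000 / mb_per_s


def storage_cost_usd(dataset_gb, usd_per_gb):
    """Raw media cost for holding the dataset once."""
    return dataset_gb * usd_per_gb


dataset_gb = 1000  # hypothetical 1 TB warehouse
for name, spec in MEDIA.items():
    slow, fast = spec["read_mb_per_s"]
    print(name,
          storage_cost_usd(dataset_gb, spec["usd_per_gb"]), "USD;",
          round(scan_seconds(dataset_gb, fast)), "to",
          round(scan_seconds(dataset_gb, slow)), "s per full scan")
```

Under these assumptions the SSD scans the data several times faster but costs more than ten times as much per gigabyte, which is the trade-off the table summarizes.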

Data Warehouse Systems
--Common Optimizations: Columnar Storage Principle
● Row-oriented storage
  – Read pages containing all columns
  [Figure: table with columns Date, Customer, Product, Price, Quantity, stored row by row]
● Column-oriented storage
  – Read only columns needed for query processing
  [Figure: the same columns Date, Customer, Product, Price, Quantity, each stored separately]

Data Warehouse Systems
--Common Optimizations: Columnar Storage Benefits
● Enables better data compression, since values within a single column are often redundant
● Eliminates unnecessary I/O by retrieving only the relevant columns
● Vectorwise is in the TPC-H Top Ten Performance Results (14-Jun-2013)
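
The two benefits of columnar storage (less I/O, better compression) can be illustrated with a toy model. This is a minimal sketch, not how any particular engine works: the sample rows, the fixed 8 bytes per value, and all function names are assumptions made for illustration, and run-length encoding stands in for whatever compression a real column store applies.

```python
# Toy fact table; the rows and the 8-bytes-per-value assumption are illustrative.
ROWS = [
    ("2013-05-30", "C1", "P1", 10.0, 2),
    ("2013-05-30", "C2", "P1", 10.0, 1),
    ("2013-05-30", "C1", "P2", 25.0, 4),
    ("2013-05-31", "C3", "P1", 10.0, 3),
]
COLUMNS = ("date", "customer", "product", "price", "quantity")
VALUE_BYTES = 8  # assumed fixed width per stored value


def to_columns(rows):
    """Pivot row tuples into one list per column (column-oriented layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def bytes_read_row_store(rows):
    """A row store reads every column of every row, whatever the query needs."""
    return len(rows) * len(COLUMNS) * VALUE_BYTES


def bytes_read_column_store(columns, needed):
    """A column store reads only the columns the query references."""
    return sum(len(columns[name]) for name in needed) * VALUE_BYTES


def run_length_encode(values):
    """Collapse consecutive repeats into [value, count] pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs


columns = to_columns(ROWS)
# SELECT SUM(price * quantity): only two of the five columns are needed.
revenue = sum(p * q for p, q in zip(columns["price"], columns["quantity"]))
```

For this query the row layout touches all five columns while the column layout touches two, and the sorted `date` column collapses to two runs.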

Data Warehouse Systems
--Common Optimizations: Derived Data
● Derived Data:
  – Derived attributes
  – Indexes
  – Aggregate tables
● Pros:
  – High performance
● Cons:
  – Maintenance: refresh is expensive
  – Storage cost
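
The trade-off between query speed and refresh cost can be sketched with an aggregate table. The fact rows, the per-product total, and the function names below are hypothetical; the point is only that the aggregate answers a query without scanning the facts, but must be refreshed on every load and takes extra storage.

```python
from collections import defaultdict

# Hypothetical fact rows: (date, product, amount).
FACTS = [
    ("2013-05-30", "P1", 20.0),
    ("2013-05-30", "P2", 100.0),
    ("2013-05-31", "P1", 30.0),
]


def build_aggregate(facts):
    """Precompute the total amount per product (the derived data)."""
    totals = defaultdict(float)
    for _date, product, amount in facts:
        totals[product] += amount
    return dict(totals)


def refresh_aggregate(aggregate, new_facts):
    """Incremental refresh: the maintenance cost paid on every data load."""
    for _date, product, amount in new_facts:
        aggregate[product] = aggregate.get(product, 0.0) + amount
    return aggregate


def total_from_facts(facts, product):
    """Answer the same question without derived data: scan every fact row."""
    return sum(amount for _date, prod, amount in facts if prod == product)


aggregate = build_aggregate(FACTS)
```

A lookup such as `aggregate["P1"]` returns the same value as `total_from_facts(FACTS, "P1")` while reading one entry instead of every row.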

Data Warehouse Systems
--DWS Benchmarks
● APB-1 OLAP Benchmark --obsolete
  – Released by the OLAP Council (www.olapcouncil.org) in 1998
  – A simple star schema data model
● TPC DSS Benchmarks
  – Released by the Transaction Processing Performance Council (www.tpc.org)
  – Examine large volumes of data (from 10GB to 100TB)
  – Complex relational data model
  – TPC-H
    ● Workload composed of 22 ad-hoc complex SQL statements
    ● The most prominent DSS benchmark
  – TPC-DS - successor of TPC-H
    ● Workload composed of 99 SQL business questions
    ● Same metrics as TPC-H

Data Warehouse Systems
--TPC-H Benchmark Metrics (same for TPC-DS)
● Query-per-hour Performance Metric
  – For a given scale factor (warehouse data volume)
  – Concurrent users
● Price-Performance Metric
  – Ratio of the priced system (cost of ownership: hardware, software, maintenance, and everything else needed to run the TPC-H workload) to the query performance metric
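
The two metrics can be sketched numerically. The formulas below follow the TPC-H specification as it is commonly summarized (a power test based on the geometric mean of the timing intervals, a throughput test over concurrent streams, and their geometric mean as the composite); they should be checked against the specification before being relied on. The function names and all sample timings and prices are illustrative assumptions.

```python
import math


def power_at_size(interval_secs, scale_factor):
    """Power test: 3600 * SF divided by the geometric mean of the timing intervals."""
    geo_mean = math.exp(sum(math.log(t) for t in interval_secs) / len(interval_secs))
    return 3600.0 * scale_factor / geo_mean


def throughput_at_size(streams, queries_per_stream, elapsed_secs, scale_factor):
    """Throughput test: queries completed per hour by concurrent streams, scaled by SF."""
    return streams * queries_per_stream * 3600.0 / elapsed_secs * scale_factor


def qph_at_size(power, throughput):
    """Composite query-per-hour metric: geometric mean of power and throughput."""
    return math.sqrt(power * throughput)


def price_performance(priced_system_usd, qph):
    """Price-performance metric: priced system cost per unit of Qph@Size."""
    return priced_system_usd / qph


# Hypothetical run at SF=10: 22 queries + 2 refresh functions, 10 s each.
power = power_at_size([10.0] * 24, scale_factor=10)
throughput = throughput_at_size(streams=2, queries_per_stream=22,
                                elapsed_secs=1584.0, scale_factor=10)
qph = qph_at_size(power, throughput)
usd_per_qph = price_performance(189736.66, qph)  # made-up system price
```

Because the price term is a 3-year cost of ownership, the same function cannot be fed a pay-per-use bill without first deciding what period and what usage the bill covers.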

Data Warehouse Systems
--TPC-H mismatches Cloud Rationale
● TPC-H does not represent BI suites
  – Analytics services (Multi-dimensional eXpressions Language, Mining Structures)
  – Integration services
  – Reporting services
● TPC-H Workload Processing Metric
  – Qph@Size defines the number of queries processed per hour
  – The workload is assumed static, which is not realistic!
  – The benchmark should assess the scalability of the SUT (System Under Test) under variable and evolving workload and data volumes

Data Warehouse Systems
--TPC-H mismatches Cloud Rationale (ctnd. 1)
● TPC-H Cost-Performance Metric
  – $/Qph@Size, where the cost covers all hardware, software and HR required for running the workload (over 3 years)
  – The cost model in the cloud is different, and does not relate to the cost of ownership
● TPC-H does not report a Cost-Effectiveness Metric
● TPC-H implementation vs. CAP theorem
  – CAP theorem: a distributed system cannot fulfill all three of Consistency (same view of data), Availability (query response) and Partition Tolerance (cope with network partitions, e.g. caused by hardware crashes) at once.
  – Since DWS are deployed onto shared-nothing architectures, benchmarks should be either CA-, CP- or AP-compliant.

New Requirements & New Metrics

High Performance Requirement
--Data Transfer IN/OUT CSP
● Data Transfer Characteristics
  – Huge data volumes are transferred IN to and OUT of the Cloud Service Provider (CSP)
  – Resulting in a network-bound DWS
  – Usually, the cost model adopted by CSPs is:
    ● Data upload IN to the CSP is free of charge
    ● Data download OUT of the CSP is priced
● Data Transfer Metrics in the Cloud
  – Time and cost for data upload
  – Time and cost for data download
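
The four data transfer metrics reduce to simple arithmetic once a bandwidth and a price list are fixed. The sketch below assumes a constant effective bandwidth, decimal units (1 GB = 8000 megabits), free upload, and a made-up download price of $0.10/GB; none of these values come from the talk or from a real CSP price list.

```python
def transfer_seconds(volume_gb, bandwidth_mbit_s):
    """Elapsed time to move volume_gb over a link (1 GB = 8000 megabits)."""
    return volume_gb * 8000 / bandwidth_mbit_s


def transfer_cost_usd(volume_gb, usd_per_gb):
    """Transfer bill for a volume at a flat per-GB price."""
    return volume_gb * usd_per_gb


def data_transfer_metrics(upload_gb, download_gb, bandwidth_mbit_s,
                          upload_usd_per_gb=0.0, download_usd_per_gb=0.10):
    """Time and cost for upload and download; prices are hypothetical defaults."""
    return {
        "upload_s": transfer_seconds(upload_gb, bandwidth_mbit_s),
        "upload_usd": transfer_cost_usd(upload_gb, upload_usd_per_gb),
        "download_s": transfer_seconds(download_gb, bandwidth_mbit_s),
        "download_usd": transfer_cost_usd(download_gb, download_usd_per_gb),
    }


# Load a 100 GB warehouse, then pull 10 GB of query results, over 100 Mbit/s.
metrics = data_transfer_metrics(upload_gb=100, download_gb=10, bandwidth_mbit_s=100)
```

With these numbers the upload is free but takes more than two hours, which is why the slide treats the DWS as network-bound.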

High Performance Requirement (ctnd. 1)
--Workload Processing
● Workload Processing Characteristics
  – Both I/O-bound and CPU-bound business questions
  – Intra-query processing combined with virtual partitioning or physical processing
● Performance across Cluster Size
  – For each business question, there is an optimum response time at a particular cluster size; performance degrades on either side of this optimum (smaller or larger clusters)
  – Shown for both SQL and NoSQL technologies

High Performance Requirement (ctnd. 2)
--Workload Processing
● TPC-H benchmarking of Apache Hadoop/Pig Latin on GRID5000 - Bordeaux site [Moussa, ICCIT'12] (SF=10)

High Performance Requirement (ctnd. 3)
--Workload Processing
● Workload Processing Metrics
  – Elapsed times for running business questions
  – Slope: performance - cost
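
One possible reading of these two metrics is sketched below: elapsed times are recorded per cluster size, the optimum size is the one with the lowest elapsed time, and the performance-cost slope is taken as seconds saved per extra dollar between consecutive cluster sizes. The elapsed times, the flat per-node price, and this interpretation of "slope" are all assumptions made for illustration; they are not measurements from the talk.

```python
# Hypothetical elapsed times (seconds) of one business question per cluster size.
ELAPSED = {2: 400.0, 4: 220.0, 8: 150.0, 16: 180.0}
NODE_USD_PER_HOUR = 0.5  # assumed flat per-node price


def run_cost_usd(nodes, elapsed_s, usd_per_node_hour=NODE_USD_PER_HOUR):
    """Cost of one run: every node is billed for the elapsed time."""
    return nodes * usd_per_node_hour * elapsed_s / 3600


def optimum_cluster_size(elapsed_by_nodes):
    """Cluster size with the lowest elapsed time."""
    return min(elapsed_by_nodes, key=elapsed_by_nodes.get)


def performance_cost_slopes(elapsed_by_nodes):
    """Seconds saved per extra dollar between consecutive cluster sizes."""
    sizes = sorted(elapsed_by_nodes)
    slopes = {}
    for small, large in zip(sizes, sizes[1:]):
        saved_s = elapsed_by_nodes[small] - elapsed_by_nodes[large]
        extra_usd = (run_cost_usd(large, elapsed_by_nodes[large])
                     - run_cost_usd(small, elapsed_by_nodes[small]))
        slopes[(small, large)] = saved_s / extra_usd if extra_usd else float("inf")
    return slopes


best = optimum_cluster_size(ELAPSED)
slopes = performance_cost_slopes(ELAPSED)
```

In this made-up series the slope shrinks as nodes are added and turns negative past the optimum, where more money buys a slower answer.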

Scalability Requirement
● Definition
  – Scalability is the ability of a system to increase total throughput under an increased load when hardware resources are added.
● Scalability Metric
  – Query Performance Metric under
    ● Ever-increasing workload
    ● Different query frequencies
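
A scalability measurement along these lines could compare throughput before and after adding nodes, and weight query times by how often each query runs. The sketch below is one way to express that, not a metric defined in the talk: the efficiency ratio, the weighted mean, the query labels, and all numbers are illustrative assumptions.

```python
def throughput_qph(queries_completed, elapsed_s):
    """Queries completed per hour."""
    return queries_completed * 3600 / elapsed_s


def scaleup_efficiency(base, scaled):
    """Each argument is (nodes, queries_completed, elapsed_s).

    Returns throughput gain divided by resource gain; 1.0 means throughput
    grew in proportion to the nodes that were added.
    """
    base_nodes, base_queries, base_s = base
    nodes, queries, elapsed_s = scaled
    throughput_gain = throughput_qph(queries, elapsed_s) / throughput_qph(base_queries, base_s)
    resource_gain = nodes / base_nodes
    return throughput_gain / resource_gain


def weighted_mix_seconds(elapsed_by_query, frequency_by_query):
    """Mean time per query for a mix in which queries run with different frequencies."""
    total = sum(frequency_by_query.values())
    return sum(elapsed_by_query[q] * f for q, f in frequency_by_query.items()) / total


# Doubling from 4 to 8 nodes while the load grows from 100 to 180 queries per hour.
efficiency = scaleup_efficiency((4, 100, 3600), (8, 180, 3600))
mix_seconds = weighted_mix_seconds({"Q1": 10.0, "Q6": 2.0}, {"Q1": 1, "Q6": 3})
```

An efficiency below 1.0, as here, indicates sub-linear scaling for that step.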

Elasticity Requirement
● Definition
  – Elasticity adjusts the system capacity at runtime by adding and removing resources without service interruption in order to handle workload variation.
● Elasticity Metric
  – Capacity to add/remove resources: (0|1)
  – Scaling latency: elapsed time to scale down and scale up
  – Impact on SUT performance during scale-up and scale-down
  – Scale-up cost (+$)
  – Scale-down gain (-$)
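
The elasticity metrics can be derived from a log of one resize operation. The sketch below assumes such a log provides request and ready timestamps, throughput before and during the resize, and a flat per-node hourly price; the function names and sample values are hypothetical.

```python
def scaling_latency_s(requested_at_s, ready_at_s):
    """Elapsed time between the resize request and the new capacity being usable."""
    return ready_at_s - requested_at_s


def performance_impact(qph_steady, qph_during):
    """Fraction of steady-state throughput lost while the resize is in progress."""
    return 1 - qph_during / qph_steady


def hourly_cost_delta_usd(nodes_before, nodes_after, usd_per_node_hour):
    """Positive value: scale-up cost (+$); negative value: scale-down gain (-$)."""
    return (nodes_after - nodes_before) * usd_per_node_hour


def can_resize(nodes_before, nodes_after):
    """Capacity to add/remove resources, reported as 0 or 1."""
    return int(nodes_after != nodes_before)


# Hypothetical scale-up from 4 to 8 nodes, then scale-down back to 4.
up_latency = scaling_latency_s(requested_at_s=0.0, ready_at_s=90.0)
up_impact = performance_impact(qph_steady=1000.0, qph_during=800.0)
up_cost = hourly_cost_delta_usd(4, 8, usd_per_node_hour=0.5)
down_gain = hourly_cost_delta_usd(8, 4, usd_per_node_hour=0.5)
```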

High Availability Requirement
--Redundancy Strategies
● Redundancy Strategies
  – Replication (a.k.a. mirroring)
  – Erasure-resilient codes
● Redundancy Strategies vs. Workload Type
  – Replication suits OLTP workloads
  – Erasure-resilient codes suit OLAP workloads
● Comparison [Litwin et al., ACM TODS'05]
  – Computation cost
  – Data storage cost
  – Communication cost

High Availability Requirement
--Strategies Comparison (ctnd. 1)

High Availability Requirement
--Metrics for the Cloud (ctnd. 2)
● High Availability Metrics
  – $@k: cost of different targeted levels of availability (1-available, ..., k-available, where k is the number of failures the system can tolerate)
  – Cost of recovery, expressed as
    ● Time to get the system back
    ● Decreased system productivity caused by the hardware failure ($), from the customer's perspective
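
The storage part of $@k can be compared across the two redundancy strategies. The sketch below assumes that replication reaches k-availability with k extra copies, and that an erasure-resilient code adds k parity fragments to each group of m data fragments; it covers storage only and ignores computation and communication costs. The group size, data volume, and price are made-up values.

```python
def replication_storage_factor(k):
    """k-availability by replication: the original plus k extra copies."""
    return k + 1


def erasure_storage_factor(m, k):
    """k-availability with k parity fragments per group of m data fragments."""
    return (m + k) / m


def storage_cost_at_k(data_tb, usd_per_tb_month, storage_factor):
    """Storage share of $@k: monthly bill for the data plus its redundancy."""
    return data_tb * storage_factor * usd_per_tb_month


# Hypothetical 10 TB warehouse at $100 per TB-month, groups of m=4 fragments.
for k in (1, 2, 3):
    replicated = storage_cost_at_k(10, 100, replication_storage_factor(k))
    encoded = storage_cost_at_k(10, 100, erasure_storage_factor(4, k))
    print(f"k={k}: replication ${replicated:.0f}/month, erasure code ${encoded:.0f}/month")
```

The gap widens as k grows, which is consistent with the slide's point that erasure-resilient codes suit large, read-mostly OLAP data.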

Cost Management Requirement
● CSP pricing model
  – Different cloud service price models (IaaS, PaaS, SaaS)
  – e.g.
    ● CPU cost for IaaS: instance-based (Amazon, MS Azure) or CPU-cycles-based (Cloud Sites, Google App Engine)
    ● Query processing by Google BigQuery is priced on retrieved bytes (columnar storage)
● Cost-Performance Ratio
● Cost-Effectiveness Ratio
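
To make the difference between pricing models concrete, the sketch below bills the same workload in two ways: per instance-hour and per byte retrieved. All prices are invented for illustration and do not reflect any provider's actual price list; the function names are likewise assumptions.

```python
def instance_based_cost_usd(instances, hours, usd_per_instance_hour):
    """Instance-based pricing: pay for provisioned instances, busy or idle."""
    return instances * hours * usd_per_instance_hour


def bytes_based_cost_usd(bytes_retrieved, usd_per_tb):
    """Bytes-based pricing: pay for the data a query reads (1 TB = 1e12 bytes)."""
    return bytes_retrieved / 1e12 * usd_per_tb


def cost_performance(cost_usd, queries_per_hour):
    """Cost-performance ratio: dollars per unit of hourly query throughput."""
    return cost_usd / queries_per_hour


# Hypothetical workload: 4 instances for 2 hours, or 2 TB retrieved by queries.
iaas_cost = instance_based_cost_usd(instances=4, hours=2, usd_per_instance_hour=0.5)
scan_cost = bytes_based_cost_usd(bytes_retrieved=2e12, usd_per_tb=5.0)
ratio = cost_performance(iaas_cost, queries_per_hour=200)
```

Under bytes-based pricing, a columnar layout lowers the bill directly, since a query is charged only for the columns it reads.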

Related Work
● Benchmarking in the cloud
  – [Gray, MS'08]: TeraSort benchmark for data sort evaluations
  – [Cooper et al., SoCC'10]: Yahoo Cloud Serving Benchmark (YCSB) for evaluating the performance of "key-value" and "cloud" serving stores
  – [Sobel et al., ICCSA'08]: CloudStone benchmark for Web 2.0 applications
  – [Bennet et al., KDD'10]: MalStone benchmark for data mining in the cloud
  – [Ang et al., USENIX'10]: CloudCMP project for CSP comparison
  – [Binnig et al., DBTest'09], [Kossmann et al., SIGMOD'10]: Benchmarking OLTP systems in the cloud

Related Work (ctnd. 1)
● NoSQL and SQL Technologies Assessment in the cloud
  – [Pavlo et al., SIGMOD'09]
  – [Floratou et al., TPC-TC'11]
● More Specific Issues
  – [Forrester, 2011]: Storage on-premises vs. in the cloud
  – [Nguyen et al., EDBT Workshops'12]: Materialized Views Selection
  – [Moussa, IJWA'12]: OLAP Scenarios in the Cloud and OLAP Workload Taxonomy

Conclusion & Future Work
● Keynote scope
  – Overview of DWS
  – Insight into new requirements and new metrics to be considered for benchmarking DWS in the cloud [Moussa, AICCSA'13]
● Research Perspectives
  – Assessment of OLAP systems in the cloud
    ● Amazon RDS
    ● Google BigQuery
    ● MS Azure
    ● ...

Research Perspectives
--New OLTP Systems
● Classical Workload Taxonomy
  – OLTP: transactions, ACID properties
  – OLAP: complex queries, star-joins, grouping, aggregations...
● New OLTP Workload features:
  – Big Data
  – OLTP
  – Real-time analytics
● Examples of systems: Google Spanner, Clustrix, NuoDB and TransLattice

Thank You for Your Attention
Q&A

Rim Moussa
Data Warehouse Systems in the Cloud
N2C'2013, Hammamet
15th June 2013

Data Warehouse Systems
--TPC-H Benchmark Relational DB Schema

Data Warehouse Systems
--TPC-H Benchmark Metrics

Benchmarking data warehouse systems in the cloud: new requirements & new metrics
