BDOOP meetup: Benchmarking Hadoop Workloads

Benchmarking Hadoop Workloads

Designing a new Hadoop cluster or optimizing a running one is becoming an art! The large number of hardware and software configuration options, such as on-premise vs. cloud clusters, SATA arrays vs. SSDs, the choice of distribution, and the tuning of more than 100 Hadoop parameters, can greatly affect execution times, scalability, and of course the Total Cost of Ownership (TCO) of data processing! Benchmarking allows us to learn how an application behaves and scales under different configurations and loads, and how resources are utilized, in order to make better decisions and optimizations.

This talk will present HiBench (https://github.com/intel-hadoop/HiBench), a benchmark suite developed by Intel that includes several ready-to-use benchmarks in different categories. We will see the different resource requirements of each workload, such as I/O, memory, or CPU, and how Hadoop configurations can affect the total running time and TCO of the cluster. Results and insights on how to improve performance and reduce the TCO of clusters will be discussed.

Agenda:

19:00 - Arrive at Itnig and meet other members
19:15 - Talk: Benchmarking Hadoop Workloads (by Nico Poggi)
19:45 - Q&A and discussion of topics
20:00 - Networking and beers