A Case Study about Virtualized Hadoop Performance on VMware: Benchmarking
Executive Summary: The performance of three Hadoop applications is reported for several virtual configurations on VMware vSphere 5 and compared to native configurations. A well-balanced seven-node AMAX ClusterMax system was used to show that the average performance difference between native and the simplest virtualized configurations is only 4%. Further, the flexibility enabled by virtualization to create multiple Hadoop nodes per host can be used to achieve performance significantly better than native.
Introduction: The amount of data stored worldwide has exploded, increasing by a factor of nine in the last five years. Individual companies often have petabytes or more of data, and buried in it is business information that is critical to continued growth and success. However, the quantity of data is often far too large to store and analyze in traditional relational database systems, or the data are in unstructured forms unsuitable for structured schemas, or the hardware needed for conventional analysis is simply too costly. Even when an RDBMS is suitable for the actual analysis, the sheer volume of raw data can create problems for data preparation tasks such as data integration and ETL.