Smarter Hadoop Sort Means Faster MapReduce
As much as 80% of all ETL processing time is spent sorting data. Joins, aggregations, rankings, database loads, and more all depend on sorting. Hadoop is no exception: every MapReduce job sorts data in both the Map and the Reduce steps. Unfortunately, the native Hadoop sort has limited performance, which can force organizations to spend precious IT resources tuning jobs or adding nodes to reach the desired performance.
Thanks to Syncsort’s recently committed contribution to the open source community (MAPREDUCE-2454), sort is now a pluggable component of Hadoop. This means you can now run DMX-h, the fastest, most efficient sort tool, natively within the MapReduce framework to seamlessly optimize Map-Sort and Reduce-Merge operations.
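To make the Map-Sort and Reduce-Merge steps concrete, here is a minimal, framework-free sketch in plain Python. It is only a conceptual illustration, not Hadoop or DMX-h code: the names `map_sort`, `reduce_merge`, and the `sort_fn` parameter are hypothetical, chosen to show where a pluggable sorter would slot in.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def map_sort(records, map_fn, sort_fn=sorted):
    """Map step: emit (key, value) pairs, then sort them by key.

    sort_fn is a hypothetical stand-in for the pluggable sort; it must
    accept the same arguments as the built-in sorted(iterable, key=...).
    """
    pairs = [kv for record in records for kv in map_fn(record)]
    return sort_fn(pairs, key=itemgetter(0))


def reduce_merge(sorted_runs, reduce_fn):
    """Reduce step: merge the pre-sorted map outputs, then reduce per key."""
    merged = heapq.merge(*sorted_runs, key=itemgetter(0))
    return [
        (key, reduce_fn(key, [value for _, value in group]))
        for key, group in groupby(merged, key=itemgetter(0))
    ]


def word_count_map(line):
    # Emit one (word, 1) pair per word in the input line.
    return [(word, 1) for word in line.split()]


def word_count_reduce(key, values):
    # Sum the counts collected for a single word.
    return sum(values)


# Two input splits, each handled by its own "mapper".
splits = [["b a", "c a"], ["a b", "d"]]
runs = [map_sort(split, word_count_map) for split in splits]
result = reduce_merge(runs, word_count_reduce)
print(result)
```

Because the reducer only merges runs that are already sorted, the cost of the whole pipeline is dominated by the sort inside each map task, which is why swapping in a faster sort at that point can pay off across the entire job.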