DMX-h: A Smarter Approach to Hadoop ETL & Hadoop Sort

Unleash Hadoop’s potential with a smarter approach to sorting, integrating and processing Big Data

As organizations ramp up their Hadoop implementations, they often face challenges that can undermine Hadoop’s potential. DMX-h offers a unique approach to Hadoop Sort and Hadoop ETL that lowers the barriers to wider adoption and helps organizations unleash the full potential of Hadoop. With Syncsort, you can eliminate the need for custom code, get smarter connectivity to all your data, and improve Hadoop’s processing efficiency.

Using Amazon EMR?
Check out Ironcluster™ - Hadoop ETL for Amazon EMR.


DMX-h Sort Edition: A Smarter Sort for a Big Data Platform

Maximize the return on your Hadoop investment by increasing the scalability and efficiency of every node in the cluster.

For over 40 years, Syncsort has been the undisputed leader in high-performance sort for mainframe and open systems. Now you can benefit from the same technology in Hadoop. There is no need to change existing code: simply plug in DMX-h Hadoop Sort to seamlessly accelerate MapReduce operations in Hadoop deployments.


Smarter Hadoop Sort Means Faster MapReduce

As much as 80% of all ETL processing time is spent sorting data. Joins, aggregations, rankings, database loads, and more all depend on sorting. Hadoop is no exception. In fact, all MapReduce jobs involve sorting in both the Map and the Reduce steps. Unfortunately, the native Hadoop sort has limited performance capabilities, which can force organizations to spend precious IT resources tuning jobs or adding nodes to achieve the desired performance.
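To see where sorting sits in a MapReduce job, consider the minimal single-process sketch below. It is plain Python, not Hadoop or DMX-h code, and the function names `map_sort` and `reduce_merge` are illustrative assumptions. Each simulated map task sorts its own output by key, and the reduce side merges those sorted runs before aggregating.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def map_sort(lines):
    """Map side: emit (word, 1) pairs, then sort them by key.

    This mirrors the per-task sort applied to map output (illustrative only).
    """
    pairs = [(word, 1) for line in lines for word in line.split()]
    return sorted(pairs, key=itemgetter(0))


def reduce_merge(sorted_runs):
    """Reduce side: merge already-sorted runs, then sum the counts per key."""
    merged = heapq.merge(*sorted_runs, key=itemgetter(0))
    return [
        (key, sum(count for _, count in group))
        for key, group in groupby(merged, key=itemgetter(0))
    ]


# Two input splits, each handled by its own simulated map task.
splits = [["b a", "c a"], ["a c", "d"]]
runs = [map_sort(split) for split in splits]
result = reduce_merge(runs)
print(result)
```

Both steps are dominated by ordering work: a full sort on the map side and a multi-way merge on the reduce side. That is why the speed of the sort has such a direct effect on overall job time.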
 
Thanks to Syncsort’s recently committed contribution to the open source community (MAPREDUCE-2454), sort is now a pluggable component of Hadoop. This means you can now run DMX-h, the fastest, most efficient sort tool, natively within the MapReduce framework to seamlessly optimize Map-Sort and Reduce-Merge operations.


New and Expanded Use Cases

DMX-h Sort Edition enables more sophisticated manipulation of data, making Hadoop a more robust environment for the enterprise.
 
With DMX-h Sort Edition, organizations can more easily implement and optimize new use cases typical of enterprise ETL implementations, including:

  • Hash aggregations. Optimized hash-based aggregations can provide a significant performance benefit for applications such as log analysis and queries on large data volumes.

  • Optimized full joins for change data capture (CDC). Critical data warehouse processes such as CDC require a full join.

  • Run jobs with a subset of data. Many applications, including data sampling, need to process only a subset of the data, e.g. "first N matches" or "limit N" queries.

  • No-sort option. Skip the sort altogether when it is unnecessary or redundant, so no resources are wasted.
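The first three use cases above can be illustrated with a short sketch. This is plain Python, not DMX-h or Hadoop code. The function names and sample data are hypothetical, and the logic shows only the general techniques (hash-based aggregation, a full join used for change data capture, and early termination after N matches), not how DMX-h implements them.

```python
from collections import defaultdict
from itertools import islice


def hash_aggregate(rows):
    """Sum values per key using a hash table; the input is never sorted."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


def change_data_capture(old, new):
    """Full join of two keyed snapshots, classifying every changed key.

    A full join is needed because a key may exist on only one side:
    only in `new` is an insert, only in `old` is a delete.
    """
    changes = {}
    for key in old.keys() | new.keys():
        if key not in old:
            changes[key] = "insert"
        elif key not in new:
            changes[key] = "delete"
        elif old[key] != new[key]:
            changes[key] = "update"
    return changes


def first_n(rows, predicate, n):
    """Return the first n matching rows, reading no further than needed."""
    return list(islice((row for row in rows if predicate(row)), n))


# Hypothetical sample data for illustration.
hits = hash_aggregate([("GET", 1), ("POST", 1), ("GET", 1)])
changes = change_data_capture(
    old={1: "alice", 2: "bob", 3: "carol"},
    new={1: "alice", 2: "bobby", 4: "dave"},
)
sample = first_n(range(100), lambda x: x % 7 == 0, 3)
print(hits, changes, sample)
```

In this sketch the hash aggregation makes a single pass without ordering its input, and `first_n` stops consuming rows once it has enough matches. Both avoid the cost of a full sort when the result does not need one.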


Smarter Scalability

A smarter Hadoop sort also means smarter scalability. DMX-h Sort Edition can help organizations process more data in less time without constantly adding nodes to the cluster.

DMX-h Sort Edition optimizes performance, CPU utilization, and memory utilization for all sort-intensive computations by dynamically adapting to hardware architectures, operating systems, and data characteristics. The result is maximum vertical scalability that fully exploits the processing power of each node.
