DMX-h: A Smarter Approach to Hadoop ETL & Hadoop Sort

Unleash Hadoop’s potential with a smarter approach to sorting, integrating and processing Big Data

As organizations ramp up their Hadoop implementations, they often face challenges that can undermine its potential. DMX-h offers a unique approach to Hadoop Sort and Hadoop ETL that lowers the barriers to wider adoption, helping organizations unleash the full potential of Hadoop. With Syncsort, organizations can do Hadoop sort and Hadoop ETL in a smarter way: eliminate the need for custom code, get smarter connectivity to all your data, and improve Hadoop’s processing efficiency.

Using Amazon EMR?
Check out Ironcluster™ - Hadoop ETL for Amazon EMR.

DMX-h Hadoop ETL Edition: Everything you need to turn Hadoop into a Smarter ETL Tool

Deploy Hadoop ETL without the need to learn or acquire complex MapReduce skills. Design any data flow, get smarter connectivity to all sources and targets, and accelerate performance.

Getting data in and out of Hadoop, developing ETL jobs, and optimizing MapReduce jobs typically require mastering disparate tools, as well as writing and maintaining hundreds of lines of code. DMX-h Hadoop ETL Edition provides a unique and smarter approach: one ETL tool to connect and develop. No coding, no tuning; just Smarter Hadoop ETL.


Graphical Hadoop ETL Development – Hadoop within Reach!

Hadoop is not an ETL tool on its own, so deploying ETL in Hadoop requires organizations to acquire a completely new set of advanced programming skills that are expensive and difficult to find. DMX-h Hadoop ETL Edition makes it easy, even for non-data-scientists, to build and deploy ETL jobs in Hadoop.
With DMX-h Hadoop ETL Edition, users can start developing Hadoop ETL jobs within hours, and become fully productive within days, using a drag-and-drop interface and the same ETL skills they already have. No need to learn complex MapReduce, Pig or Hive.
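To make the contrast concrete, even a trivial word count written by hand in Hadoop Streaming style requires mapper and reducer code plus an understanding of the shuffle/sort phase between them. A minimal, illustrative sketch (not DMX-h code; this is the kind of boilerplate a graphical ETL tool replaces):

```python
"""Hand-coded word count in Hadoop Streaming style.
In a real job, Hadoop pipes text lines through these functions
via stdin/stdout; here they are plain generators for illustration."""
from itertools import groupby

def mapper(lines):
    # Map phase: emit one tab-separated (key, 1) pair per word.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    # Reduce phase: Hadoop sorts mapper output by key between phases,
    # so consecutive lines share a key; sum the counts per key.
    pairs = (line.split("\t") for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"

# Simulate the shuffle/sort Hadoop performs between map and reduce:
counts = list(reducer(sorted(mapper(["to be or not to be"]))))
```

Multiply this pattern across joins, lookups, and multi-step flows and the maintenance burden grows quickly, which is the gap the drag-and-drop interface is meant to close.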


Zero-code Engine – Fear no more!

Code generation is the only feature we didn’t include in DMX-h Hadoop ETL Edition. DMX-h is the only ETL tool to offer a truly zero-code approach to Hadoop ETL. Other tools, such as Informatica or Talend, generate code that can later become a nightmare to maintain and tune.
DMX-h is not a code generator. Instead, Hadoop automatically invokes the highly efficient DMX-h runtime engine, which executes on all nodes as an integral part of the Hadoop framework. This means there is no code generation, just faster MapReduce jobs.
A zero-code engine means you will never have to worry about understanding, maintaining, and tuning thousands of lines of code you didn’t even write in the first place!


Use Case Accelerators – Get More Value, Faster!

Hadoop can be a great platform for common tasks such as processing click-stream data and web logs, change data capture, lookups, hash joins, and more. DMX-h Hadoop ETL Edition includes a growing library of Use Case Accelerators that give developers a head start in deploying these and other use cases in Hadoop.
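One of the use cases named above, the hash join, illustrates what such an accelerator packages up: build an in-memory hash table on the smaller input, then stream the larger input and probe it. A generic sketch (the table and field names are invented for the example, not taken from DMX-h):

```python
from collections import defaultdict

def hash_join(small, large, small_key, large_key):
    """Classic hash join: build on the small side, probe with the large side."""
    # Build phase: index the smaller table by its join key.
    index = defaultdict(list)
    for row in small:
        index[row[small_key]].append(row)
    # Probe phase: stream the larger table and emit merged matches.
    for row in large:
        for match in index.get(row[large_key], []):
            yield {**match, **row}

# Hypothetical sample data for illustration:
customers = [{"cust_id": 1, "name": "Acme"}, {"cust_id": 2, "name": "Globex"}]
orders = [{"order_id": 10, "cust_id": 1}, {"order_id": 11, "cust_id": 2}]
joined = list(hash_join(customers, orders, "cust_id", "cust_id"))
```

In a MapReduce setting the same idea appears as a map-side (broadcast) join, where the small table is distributed to every mapper.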
Users can also leverage these accelerators as building blocks to deploy even more sophisticated data flows.


Smarter Connectivity – Faster Loads & Extracts without hand coding!

Getting data in and out of Hadoop can undermine your ability to extract value from your investment. Simply loading data into Hadoop requires programmers to manually write custom scripts to parse, transform, and then load data into HDFS.
DMX-h Hadoop ETL Edition includes a comprehensive set of high-performance connectors for every major RDBMS and appliance, as well as XML, flat files, and legacy systems. DMX-h writes data directly to HDFS using native Hadoop interfaces, and can partition the data and parallelize the loading process to load multiple streams simultaneously into HDFS; all without writing a single line of code.
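For context, the hand-scripted path the connectors eliminate typically looks something like this: parse and normalize records locally, then push the result to HDFS with the standard `hadoop fs -put` command. A sketch under assumed inputs (the field layout and paths are invented; only the transform step runs without a cluster):

```python
import csv
import subprocess

def normalize(raw_lines):
    """Parse raw CSV and emit tab-delimited records ready for HDFS --
    the parse/transform step done by hand in a custom load script."""
    for record in csv.reader(raw_lines):
        # Illustrative transform: trim whitespace, uppercase the code field.
        yield "\t".join([record[0].strip(), record[1].strip().upper()])

def load_to_hdfs(local_path, hdfs_path):
    # Standard Hadoop CLI upload; requires a configured Hadoop client.
    subprocess.run(["hadoop", "fs", "-put", "-f", local_path, hdfs_path],
                   check=True)

# Transform step only (the HDFS upload needs a running cluster):
rows = list(normalize(["123, abc", "456, def"]))
```

Every source format and every target layout means another such script to write, test, and maintain, which is the cost the prebuilt connectors are meant to remove.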


Mainframe Connectivity – Big Iron is Big Data Too!

If you have a mainframe, you know Big Data is not entirely new. For over 40 years, Syncsort has provided the world’s highest-performing and most efficient solutions for managing “big data” on IBM mainframes. Today, DMX-h Hadoop ETL Edition offers unique capabilities to collect, process, and analyze mainframe data with Hadoop, opening up a wealth of opportunities by delivering deeper analytics at lower cost.
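One concrete piece of that mainframe work is character-set conversion: mainframe files are typically EBCDIC-encoded and must be translated before most Hadoop tools can read them. A minimal sketch using Python's built-in EBCDIC codec (code page cp037 is assumed here; real mainframe feeds add fixed-width copybook layouts and packed-decimal fields on top of this):

```python
import codecs

def ebcdic_to_text(record: bytes, codepage: str = "cp037") -> str:
    """Decode one EBCDIC-encoded record into a Python string."""
    return codecs.decode(record, codepage)

# The word "HELLO" encoded in EBCDIC code page 037:
ebcdic_record = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])
text = ebcdic_to_text(ebcdic_record)
```

Handling this translation (plus record layouts) inside the ETL tool, rather than in per-file conversion scripts, is what makes mainframe data directly usable in Hadoop.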


Enhanced Metadata – More Reusability, Less Risk!

DMX-h Hadoop ETL Edition provides built-in metadata capabilities enabling greater transparency into impact analysis, data lineage, and execution flow.
DMX-h ETL Edition has no dependencies on third-party systems such as relational databases. All metadata is file-based, which means it’s easier to manage, share, and reuse. Distributed teams can easily develop MapReduce ETL jobs on Windows and then deploy them to Hadoop.


Smarter Scalability – Faster Performance per Node!

Thanks to its Smart ETL Optimizer, DMX-h Hadoop ETL Edition can help organizations process more data in less time without the need to constantly add more nodes to the cluster.

The Smart ETL Optimizer dynamically optimizes performance as well as CPU and memory utilization of MapReduce ETL jobs by adapting to hardware architectures, operating system and data characteristics at run-time. The result is maximum vertical scalability that fully exploits the processing characteristics of each node without the need for manual tuning.

From the Blog

March 18, 2013
5 Pitfalls to Avoid with Hadoop

A couple of weeks ago, during the Strata Conference in Santa Clara, Syncsort announced a significant milestone in our quest to make Hadoop a more mature environment for the enterprise...