Syncsort Open Sources Technology to Make Mainframe Big Data Available to Apache Spark
Apache Spark Mainframe Connector Now Available on Databricks’ Spark Packages Community Site
Woodcliff Lake, NJ – September 01, 2015
Syncsort, a global leader in Big Data software, today announced a milestone open source contribution of an IBM z Systems mainframe connector for Apache Spark. The contribution will enable organizations to easily access and get new insights from their critical mainframe data with Apache Spark’s advanced analytics capabilities and Spark SQL. The connector is now available on Spark Packages.
Spark Packages, launched by Apache Spark pioneer Databricks (the organization founded by the team that created and continues to drive Apache Spark), makes it easy for users to find, discuss, rate and install packages that enhance Spark’s capabilities.
“One of the key elements of Spark’s continued success is its integration with important data sources,” said Matei Zaharia, creator of Apache Spark and co-founder & CTO of Databricks. “We are excited that Syncsort has made this valuable contribution to the Apache Spark community, making mainframe data easily available for use within Spark.”
Spark has emerged as one of the most active big data open source projects, initially as a lightning-fast, memory-optimized processing engine for machine learning and now as a single compute platform for all types of workloads, including real-time data processing, interactive queries, social graph analysis, and much more. Given its success, there is a growing need to securely access data from a diverse set of sources, including mainframes, and to transform the data into a format that is easily understandable by Spark.
“Syncsort’s open source contribution makes it easy to get real-time insights from mainframe data using Apache Spark’s advanced analytics capabilities and Spark SQL interactive queries,” said Tendü Yoğurtçu, general manager of Syncsort’s Big Data business. “We believe that Apache Spark will play a critical role in a wide variety of next-generation use cases, including streaming ETL and the Internet of Things. We will continue to contribute to Spark and related Big Data projects to enable a uniform user experience for batch and real-time workloads across all data sources.”
Syncsort’s mainframe connector for Spark is similar to the Apache Sqoop mainframe connector that Syncsort released as open source last year. Customers simply specify the location of multiple datasets and the associated COBOL copybook metadata, and the Spark mainframe connector automatically transfers the datasets in parallel via a secure connection into Spark’s DataFrame objects. Users can then manipulate this DataFrame object and join it with their other data sources for further analysis. Syncsort’s mainframe connector conforms to Spark’s Data Sources API specification, and because of Spark’s ability to operate on data in memory, the connector will allow queries to access mainframe data without offloading the data first. Mainframe record formats including fixed, variable, sequential and VSAM files are all supported. The connector also handles compressed data transfer, minimizing network bandwidth and optimizing overall elapsed time.
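To illustrate what copybook-driven access to a fixed-length mainframe dataset involves, the following is a minimal Python sketch. It is not the connector's code or API. The record layout, field names, and helper functions are hypothetical. It assumes every field is a character (PIC X) field encoded in EBCDIC code page 037. A real copybook may also describe packed decimal and binary fields, which this sketch does not handle.

```python
from typing import Dict, List, Tuple

# Hypothetical layout standing in for a parsed COBOL copybook such as:
#   01 CUSTOMER-REC.
#      05 CUST-ID    PIC X(6).
#      05 CUST-NAME  PIC X(10).
#      05 CUST-STATE PIC X(2).
# Each entry is (field name, length in bytes), in record order.
LAYOUT: List[Tuple[str, int]] = [
    ("cust_id", 6),
    ("cust_name", 10),
    ("cust_state", 2),
]

# Fixed-length records: every record occupies the same number of bytes.
RECORD_LENGTH = sum(length for _, length in LAYOUT)


def decode_record(raw: bytes) -> Dict[str, str]:
    """Slice one fixed-length record by the layout and decode each field."""
    if len(raw) != RECORD_LENGTH:
        raise ValueError("record has unexpected length")
    row: Dict[str, str] = {}
    offset = 0
    for name, length in LAYOUT:
        field = raw[offset:offset + length]
        # cp037 is an EBCDIC code page shipped with Python's standard codecs.
        row[name] = field.decode("cp037").rstrip()
        offset += length
    return row


def decode_dataset(data: bytes) -> List[Dict[str, str]]:
    """Split a byte stream into fixed-length records and decode each one."""
    if len(data) % RECORD_LENGTH != 0:
        raise ValueError("dataset is not a whole number of records")
    return [
        decode_record(data[start:start + RECORD_LENGTH])
        for start in range(0, len(data), RECORD_LENGTH)
    ]


# Sample data built by encoding text to EBCDIC, standing in for a transferred dataset.
sample = (
    "000042" + "ALICE".ljust(10) + "NJ"
    + "000043" + "BOB".ljust(10) + "NY"
).encode("cp037")

rows = decode_dataset(sample)
for row in rows:
    print(row)
```

Rows decoded this way have named columns, which is the shape a DataFrame needs before it can be queried or joined with other sources.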
The Spark Mainframe Connector is available for download on Spark Packages.
Syncsort provides enterprise software that allows organizations to collect, integrate, sort and distribute more data in less time, with fewer resources and lower costs. Syncsort software provides specialized solutions spanning “Big Iron to Big Data,” including next gen analytical platforms such as Hadoop, cloud, and Splunk. For more than 40 years customers have turned to Syncsort’s software and expertise to dramatically improve performance of their data processing environments, while reducing hardware and labor costs. Experience Syncsort at http://www.syncsort.com