Syncsort Integrate products let you unlock valuable data from legacy systems such as the mainframeLEARN MORE
Data Integration Defined
Data integration is a common industry term referring to the requirement to combine data from multiple separate business systems into a single unified view, often called a single view of the truth. This unified view is typically stored in a central data repository known as a data warehouse.
For example, customer data integration involves the extraction of information about each individual customer from disparate business systems such as sales, accounts, and marketing, which is then combined into a single view of the customer to be used for customer service, reporting and analysis.
Data integration occurs when a variety of data sources are blended into a single database, offering users of that database efficient access to the information they need. Collecting significant amounts of data might not be much of a challenge in the modern world, but properly integrating that data remains difficult in some circumstances.
How Data Integration Benefits Businesses
There is sometimes a disconnect between the management of data and the practical application of what that data can do for an organization. It is the role of data integration to bridge that gap, permitting the data to be far more useful than it was previously.
For instance, it is common for data to exist in silos – that is, in separate databases which are each focused on a specific type of customer, product, location, etc. Individually, these silos of data may not be particularly useful, but they can be quite powerful when integrated. Of course, that integration needs to happen in an efficient and logical manner for it to be beneficial. The challenge here is significant, and the job is typically best handled by an experienced team and a data integration tool, such as Syncsort's DMX and DMX-h.
Data Integration at Scale
Smaller organizations frequently fail to have their data properly integrated. In the early stages, it’s just a matter of not having the need – the organization does not collect enough data to warrant investment in data integration. As the business scales up, however, the needs change and suddenly data integration is of the highest importance for continued growth. Working on integration a bit too early as opposed to a bit too late is going to yield better results for the company moving forward.
ETL and ELT
There are a number of different data integration approaches which can be used to populate a data warehouse, two of the most common are:
- ETL – Extract, Transform and Load. Data is extracted from multiple source systems, transformed into a usable format, and then loaded into a data warehouse where users can query and report on the unified view of the data.
- ELT – Extract, Load and Transform. Data is extracted from source systems, loaded into a database staging area, and then transformed to a usable format within the database. This approach may be necessary to meet check-pointing, auditing, and security requirements.
The trade-off with ELT is that storage space and CPU usage within the database can be relatively expensive. Using ETL to perform the transformation step outside of the database reduces the storage and CPU load on the data warehouse, making ETL a more cost-effective approach for many applications.
Data Integration Criteria
Three major data integration criteria to consider when building a data warehouse are:
- Freshness – Since data integration processes are executed periodically, data freshness refers to the delay between when a change occurs on a source system to when the change appears in the data warehouse.
- Granularity – Data from source systems is often summarized or aggregated during the data integration process. Data granularity refers to the level of detail that will be stored in the data warehouse.
- Matching – The business rules that define how data from different systems should be matched can be quite complex. For a simple example – is “Bill Smith” the same individual as “William Smith”?
These criteria will affect the performance and storage requirements for the data warehouse, and hence the cost of the hardware and software required for the project.