Data Synchronization

Data synchronization is the process of establishing consistency among systems and then continuously updating them to maintain that consistency. The word 'continuous' should be stressed here: data synchronization should not be considered a one-time task. It is a process that needs to be planned, owned, managed, scheduled and controlled.

Motivation

Let us present two scenarios in which data synchronization is crucial for an enterprise.

  • In any enterprise, there are often ten or more systems sharing the same data: customer data, product data and employee data are used by customer support systems, billing and invoicing systems, etc. In order to make the company's manufacturing process auditable, each activity needs to be properly logged. For example, if a company is in the car manufacturing business, for each car it needs to log which parts the car was assembled from, including each part's serial number, lot number and supplier; the id of the employee who mounted the part; and, in the end, to whom the car was sold and the car's service history, including the service station and possibly even the technicians and spare parts. Each of the company's production systems, however, holds only a partial piece of this information: the pieces that it actually needs for operation. For example, the system logging the car assembly uses the information about employees; similarly, it needs access to the information about suppliers and parts in stock. Even though several applications/systems use the same data, the data is captured by only one application. The data then needs to be synchronized to the other systems.
  • With the advent of the internet and increasing international business, many companies choose to distribute their systems geographically to reduce network latency and cost and to increase reliability (e.g. by reducing the risk of a natural disaster affecting every location). The systems in all locations, however, need to have the same data even though the data is modified in several locations in parallel. The data therefore needs to be synchronized across all locations.

Process

Planning

Requirements for the data synchronization should be gathered in the planning phase. They need to cover the data content, data formats, initial load and frequency of the updates. Non-functional requirements such as performance, timing and security should be covered as well.

Ownership

Although the idea of data synchronization may come from the company's IT organization, an owner or champion from the business side is necessary to provide continuity for the initiative. It is the business that will benefit from the data synchronization initiative in the end.

Scheduling

The scheduling and frequency of updates is one of the items that need to be investigated during the initial planning phase. Often the requirements change over time and the update schedule needs to be revised. Obviously, the granularity of the schedule on which the updates/synchronization are performed cannot be finer than what the source system is able to provide. However, the scheduling also needs to take performance aspects into account (see the section Challenges below).

Monitoring

The synchronization process should be monitored to evaluate whether the update schedule and frequency meet the company's needs.

From a technical point of view, synchronization may be implemented at any of the following levels:

  • System/Application level
  • File level (may include even version control)
  • Record level

Challenges

Data Formats Complexity

As the enterprise grows and evolves, new systems from different vendors are implemented. The data formats for employees, products, suppliers and customers vary among vendors and industries. As a result, it is not enough to build a simple interface between the two applications (source and target); the data also needs to be transformed while it is passed to the target application. The data formats range from proprietary formats through plain text to XML. Some applications provide an API to push the data directly. ETL tools can be helpful here.
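The transformation step described above can be sketched as follows. This is a minimal illustration only, assuming a hypothetical pipe-delimited plain-text export on the source side and an XML payload on the target side; the field names, `FIELD_MAP` and all function names are invented for the example and do not come from any particular product.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from source field names to target field names.
FIELD_MAP = {"cust_no": "customer_id", "nm": "name", "cntry": "country"}


def parse_source_line(line):
    """Parse one pipe-delimited line of the (assumed) source export."""
    cust_no, nm, cntry = line.strip().split("|")
    return {"cust_no": cust_no, "nm": nm, "cntry": cntry}


def transform(record):
    """Rename fields and normalize values for the target system."""
    target = {FIELD_MAP[key]: value.strip() for key, value in record.items()}
    target["country"] = target["country"].upper()
    return target


def to_xml(record):
    """Serialize a target record as an XML element string."""
    element = ET.Element("customer")
    for key, value in record.items():
        ET.SubElement(element, key).text = value
    return ET.tostring(element, encoding="unicode")


source_line = "1042| Acme Motors |de"
target_record = transform(parse_source_line(source_line))
xml_payload = to_xml(target_record)
print(xml_payload)
```

Keeping the field mapping in a single table makes it easier to adjust when a source or target format changes, which is much of what ETL tools manage at a larger scale.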

Real-timeliness

The requirement today is that systems operate in real time. Customers want to see the status of their order in an e-shop, the status of a parcel delivery (real-time parcel tracking), the current balance on their account, etc. Enterprises need their systems to be updated in real time as well to enable a smooth manufacturing process, e.g. ordering material when the enterprise is running out of stock, or synchronizing customer orders with the manufacturing process. There are many real-life examples in which real-time operation is becoming either an advantage or a necessity in order to be successful and competitive.

The main challenge with real-time data synchronization is working with systems that do not provide any API for identifying changes. In such cases, performance may be the limiting factor.
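One common workaround when a source system offers no change-identification API is to compare a full snapshot of the data against fingerprints kept from the previous run. The sketch below is one possible approach under that assumption, not a prescribed method; the record layout and the names `fingerprint` and `detect_changes` are hypothetical. Note that every run has to read the entire source, which illustrates why performance becomes the limiting factor.

```python
import hashlib
import json


def fingerprint(record):
    """Return a stable SHA-256 hash of a record's content."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def detect_changes(previous, current):
    """Compare the current snapshot with fingerprints from the last run.

    previous: dict mapping record key -> fingerprint stored by the last run
    current:  dict mapping record key -> full record read from the source
    Returns (inserted, updated, deleted, new_fingerprints).
    """
    new_fingerprints = {key: fingerprint(rec) for key, rec in current.items()}
    inserted = sorted(k for k in new_fingerprints if k not in previous)
    deleted = sorted(k for k in previous if k not in new_fingerprints)
    updated = sorted(
        k for k in new_fingerprints
        if k in previous and new_fingerprints[k] != previous[k]
    )
    return inserted, updated, deleted, new_fingerprints


# Hypothetical snapshots taken on two consecutive runs.
last_run = {
    "A": {"name": "Alice", "balance": 10},
    "B": {"name": "Bob", "balance": 20},
    "C": {"name": "Carol", "balance": 30},
}
this_run = {
    "A": {"name": "Alice", "balance": 10},  # unchanged
    "B": {"name": "Bob", "balance": 25},    # updated
    "D": {"name": "Dave", "balance": 5},    # inserted ("C" was deleted)
}

stored = {key: fingerprint(rec) for key, rec in last_run.items()}
inserted, updated, deleted, stored = detect_changes(stored, this_run)
print(inserted, updated, deleted)
```

Only the inserted, updated and deleted keys need to be propagated to the target, and the returned fingerprints are stored for the next comparison.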

Security

Different systems may have different policies for enforcing data security and access levels. Even if security is maintained correctly in the source system that captures the data, the security and information access privileges must be enforced on the target systems as well to prevent any potential misuse of the information. This is particularly an issue when handling personal information or any confidential information covered by a non-disclosure agreement (NDA). The data transfer itself, as well as any intermediate results of the transfer, must be encrypted.

Data Quality

Maintaining data in one place and sharing it with other applications is a best practice for managing and improving data quality. It prevents the inconsistencies that arise when the same data is updated in more than one system.

Performance

The data synchronization process basically consists of five phases:

  1. Data extraction from the source/master system
  2. Data transfer (from the source to where the transformation runs)
  3. Data transformation
  4. Data transfer (of the transformed data to the target)
  5. Data load to the target system

With large data volumes, each of these steps may impact performance. Therefore, the synchronization needs to be carefully planned to avoid any negative impact, e.g. during peak processing hours.
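The five phases can be sketched as a small pipeline. This is an illustrative outline only: the in-memory list and dict stand in for real source and target systems, `transfer` is a placeholder for a network hop, and processing in batches is an assumption made here to show one way of bounding the load on each step.

```python
def extract(source, batch_size):
    """Phase 1: read records from the source in fixed-size batches."""
    for start in range(0, len(source), batch_size):
        yield source[start:start + batch_size]


def transfer(batch):
    """Phases 2 and 4: placeholder for moving a batch between systems."""
    return [dict(record) for record in batch]  # copy, as a network hop would


def transform(batch):
    """Phase 3: convert records to the (hypothetical) target format."""
    return [{"id": r["id"], "name": r["name"].title()} for r in batch]


def load(target, batch):
    """Phase 5: insert or update each record in the target, keyed by id."""
    for record in batch:
        target[record["id"]] = record


def synchronize(source, target, batch_size=2):
    """Run all five phases batch by batch; return the number of batches."""
    batches = 0
    for batch in extract(source, batch_size):
        staged = transfer(batch)          # source -> transformation stage
        converted = transform(staged)
        delivered = transfer(converted)   # transformation stage -> target
        load(target, delivered)
        batches += 1
    return batches


source_system = [
    {"id": 1, "name": "alice smith"},
    {"id": 2, "name": "bob jones"},
    {"id": 3, "name": "carol white"},
]
target_system = {}
batch_count = synchronize(source_system, target_system, batch_size=2)
print(batch_count, target_system)
```

A smaller batch size spreads the work over more, lighter steps, which is one lever for keeping the synchronization out of the way of peak processing hours.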

Maintenance

Like any other process, the synchronization process needs to be monitored to ensure that it is running as scheduled and that it properly handles any errors that occur during synchronization, such as rejected records or malformed data.
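Handling rejected records might look like the sketch below. It assumes a hypothetical rule that every record must carry an `id` and a `name`; the validation rule, logger name and function names are illustrative only. Malformed records are logged and collected for follow-up rather than stopping the whole run.

```python
import logging

logger = logging.getLogger("sync")

# Hypothetical rule: every record must carry these non-empty fields.
REQUIRED_FIELDS = ("id", "name")


def validate(record):
    """Return a rejection reason for a malformed record, else None."""
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            return f"missing field: {field}"
    return None


def load_with_rejects(records, target):
    """Load valid records into the target and collect the rejected ones."""
    rejected = []
    for record in records:
        reason = validate(record)
        if reason is not None:
            logger.warning("rejected record %r: %s", record, reason)
            rejected.append((record, reason))
            continue
        target[record["id"]] = record
    return rejected


incoming = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": ""},   # malformed: empty name
    {"name": "Carol"},       # malformed: no id
]
target_system = {}
rejected = load_with_rejects(incoming, target_system)
print(len(target_system), "loaded,", len(rejected), "rejected")
```

The count of rejected records per run is a simple figure to track when monitoring whether the synchronization is healthy.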