What Is Data Harmonization?
Users of analytical and reporting apps often work with the end product without realizing how complex the process behind data dissemination is.
Yet data harmonization, though effectively invisible to the user, is one of the crucial stages of the data management process.
Data harmonization integrates disparate indicators and joins them so that they are compatible with each other, while data relevant only to a limited group of users (the so-called metadata) is extracted and set aside. The need for data harmonization in any analytical solution is natural: it stems from the human drive to put information in order and to categorize data for easier access. In addition, different systems are expected to interact with each other using a common language, and it is through harmonization and standardization that this integration is achieved.
When developing solutions connected with data management systems, we often encounter diverse interpretations of the same indicator. These differences arise from how the original data is calculated and collected, where it comes from, and how it is structured.
In our example we have two tables taken from popular world statistical data sources on agriculture: Eurostat and FAOStat. Both tables show grape production in Italy and France from 2009 to 2018. As you can see, the same information is represented differently: FAOStat reports values in tons, whereas Eurostat reports them in thousands of tons. Before the two can be used together in the same data set, they need to be harmonized.
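The unit mismatch described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the way any particular system does it: the `Observation` record, the unit labels "t" and "1000 t", and the `harmonize` function are hypothetical names, and the figures are placeholders rather than real Eurostat or FAOStat values.

```python
from dataclasses import dataclass

# Multipliers that convert each source unit into the common unit (tons).
# The unit labels are assumptions made for this sketch.
UNIT_TO_TONS = {
    "t": 1.0,          # FAOStat-style records: already in tons
    "1000 t": 1000.0,  # Eurostat-style records: thousands of tons
}


@dataclass(frozen=True)
class Observation:
    """One data point as delivered by a source (hypothetical structure)."""
    source: str
    country: str
    year: int
    value: float
    unit: str


def harmonize(obs: Observation) -> Observation:
    """Return a copy of the observation with its value expressed in tons."""
    if obs.unit not in UNIT_TO_TONS:
        raise ValueError(f"unknown unit: {obs.unit!r}")
    factor = UNIT_TO_TONS[obs.unit]
    return Observation(obs.source, obs.country, obs.year, obs.value * factor, "t")


# Placeholder figures for illustration only, not real statistics.
raw = [
    Observation("FAOStat", "Italy", 2018, 1_000_000.0, "t"),
    Observation("Eurostat", "Italy", 2018, 1_000.0, "1000 t"),
]

harmonized = [harmonize(obs) for obs in raw]
```

After this step both records carry the same unit, so they can be compared or merged in one data set. Rejecting unknown units instead of passing them through is a deliberate choice in this sketch: a silently unconverted value would be off by a factor of a thousand.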