Data integrity and quality findings on Croatian open data

Dec 3, 2022

The volume of data held by the public sector is constantly increasing. The data ranges from sensitive personal information used to deliver personalized services (such as health and social care) to non-personal information (such as environmental data). This data is useful to the organization that collects and holds it, but it can become even more useful when others are allowed to reuse it.

Open data is information that anyone can freely use, reuse, and share. It is published on the internet in an electronic format that supports ready reuse, under an open license that permits that reuse.

There’s no question that the open data movement has helped to make data more accessible. But what happens when you attempt to use open data in a project? You often run into serious issues with data integrity and quality.

Data integrity is a foundation of data analytics. Business intelligence only works when you can be confident that your insights are accurate, so it is essential that your data stays accurate and reliable at every stage of its life cycle.

Data integrity starts with gathering high-quality data from trusted sources in a timely fashion, integrating it into one place, and then analyzing it with confidence. Organizations can enrich the data with further attributes, like location intelligence, to make better decisions based on all the information available.

The concepts of data quality and data integrity are often discussed in the open data domain. Data quality refers to the degree to which data conforms to a given standard or set of rules. In practice, that means ensuring that your data is accurate, complete, consistent, timely, valid, and unique. It is an assessment of how well your information meets your needs and expectations.
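Some of these dimensions can be measured directly. The following is a minimal sketch, not a definitive implementation, of how completeness, uniqueness, and validity might be scored for a small table of records. The sample rows, the field names, and the five-digit postal code rule are all assumptions made for illustration; a real dataset would need rules agreed with its publisher.

```python
import re
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[str]]


def completeness(rows: List[Row], field: str) -> float:
    """Share of rows in which `field` is present and non-empty."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if (row.get(field) or "").strip())
    return filled / len(rows)


def uniqueness(rows: List[Row], key: str) -> float:
    """Share of non-empty `key` values that are distinct."""
    values = [row[key] for row in rows if row.get(key)]
    if not values:
        return 0.0
    return len(set(values)) / len(values)


def validity(rows: List[Row], field: str, rule: Callable[[str], bool]) -> float:
    """Share of non-empty `field` values that satisfy `rule`."""
    values = [row[field] for row in rows if row.get(field)]
    if not values:
        return 0.0
    return sum(1 for value in values if rule(value)) / len(values)


def is_postal_code(value: str) -> bool:
    """Assumed rule for this sketch: exactly five digits."""
    return re.fullmatch(r"\d{5}", value) is not None


# Hypothetical sample records, invented for this example.
SAMPLE: List[Row] = [
    {"id": "1", "name": "Zagreb", "postal_code": "10000"},
    {"id": "2", "name": "Split", "postal_code": "21000"},
    {"id": "2", "name": "Rijeka", "postal_code": "51O00"},  # duplicate id, letter O
    {"id": "4", "name": "", "postal_code": "31000"},        # missing name
]

if __name__ == "__main__":
    print("completeness(name):", completeness(SAMPLE, "name"))
    print("uniqueness(id):", uniqueness(SAMPLE, "id"))
    print("validity(postal_code):", validity(SAMPLE, "postal_code", is_postal_code))
```

Scores like these do not say whether a dataset is fit for a particular purpose, but they give publishers and reusers a shared, repeatable starting point for that discussion.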

The issue is that no single entity controls what gets published as open data. That makes it hard for people to check whether the data is accurate or whether it has been changed in any way.
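One common way to detect whether a file has changed is to compare a cryptographic checksum of the downloaded copy with a digest the publisher has made available. The sketch below assumes that such a SHA-256 digest exists and was obtained through a trusted channel, which is not something open data portals necessarily provide; the function names are hypothetical. A matching checksum only shows that the file is unchanged, not that its contents are accurate.

```python
import hashlib
import hmac


def sha256_of_bytes(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(data: bytes, published_hex: str) -> bool:
    """Compare the digest of `data` with a digest supplied by the publisher.

    `published_hex` is an assumption of this sketch: it stands for a
    checksum the publisher has shared alongside the dataset.
    """
    expected = published_hex.strip().lower()
    return hmac.compare_digest(sha256_of_bytes(data), expected)


if __name__ == "__main__":
    original = b"id,name\n1,Zagreb\n"      # hypothetical dataset contents
    published = sha256_of_bytes(original)  # what the publisher would share
    altered = b"id,name\n1,Zagreb \n"      # one extra space
    print(matches_published_digest(original, published))
    print(matches_published_digest(altered, published))
```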

The good news is that we can learn from how open-source software projects deal with this issue. One of the most important lessons we can learn from these projects is how they test their code before releasing it. These projects also perform extensive validation on the quality of their products before releasing them for public consumption.

And we should carry this idea further by ensuring that all organizations that publish open data adhere to rigorous standards when publishing their datasets. This will help them build a reputation as a reliable source and build trust in their products and services.

Besides meeting rigorous quality standards, we also need to address the gap between the publication side and the (re)user side of open datasets. Because publishing organizations have very little insight into the use cases their data could enable, it is difficult for them to envision where the data will be used and to improve it for those purposes. This applies to use cases in commercial and private companies as well as to other government agencies that use the same datasets.
