Data Validation Strategies

The level of trust in, and dependence on, health catchment area geospatial databases depends on how valid the information they contain is. Greater trust, in a milieu of open data policies, promotes the reliability and widespread use of these databases. In the methodologies reviewed, NGOs rely on government and designated government agencies to validate data collected for programme work (for example, against a mandated or master list of health facilities). However, where government capacity to validate such data is limited (such as when master lists or maps do not exist for comparison), the data collected by these NGOs and humanitarian organizations becomes the standard; it is then used to build a more comprehensive, mandated database or serves as a benchmark for validating subsequent lists and maps. Effective validation of health facility locations, village boundaries and health catchment areas depends on community health workers and volunteers living in the communities being mapped. Including them in the initial stages of compiling such lists or mapping village boundaries creates a sense of ownership of the data and promotes its maintenance over time.

“Maps should be used to help people rather than just create reports.” - Key Informant Interview

A review of data validation strategies used by organizations working to populate health catchment area databases revealed various schemas that guide this process. Some of these are highlighted below:

Indicators of Data Quality (Source: OSM)20

  1. Geographic coverage (relative completeness of data)

  2. Attribute completeness (to assess fit for use)

  3. Temporal accuracy (assess age)

  4. Data errors (ensure quality)

  5. Logical consistency
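
As an illustration only, three of the indicators above (attribute completeness, temporal accuracy and logical consistency) could be checked on a single facility record roughly as follows. This is a minimal sketch, not OSM's method: the record layout, the list of required attributes and all function names are assumptions made for the example.

```python
from datetime import date

# Hypothetical set of attributes a facility record is expected to carry.
REQUIRED_ATTRIBUTES = ("name", "facility_type", "latitude", "longitude", "operator")


def attribute_completeness(record, required=REQUIRED_ATTRIBUTES):
    """Return the fraction of required attributes that are present and non-empty."""
    filled = sum(1 for key in required if record.get(key) not in (None, ""))
    return filled / len(required)


def record_age_days(record, today):
    """Temporal accuracy: number of days since the record was last edited."""
    return (today - record["last_edited"]).days


def has_valid_coordinates(record):
    """Logical consistency: coordinates must fall within valid ranges."""
    lat, lon = record.get("latitude"), record.get("longitude")
    if lat is None or lon is None:
        return False
    return -90 <= lat <= 90 and -180 <= lon <= 180


# Illustrative record; the values are made up for the example.
facility = {
    "name": "Example Health Post",
    "facility_type": "clinic",
    "latitude": 8.48,
    "longitude": -13.23,
    "operator": "",  # left blank, so it counts as incomplete
    "last_edited": date(2020, 1, 1),
}

print(attribute_completeness(facility))
print(record_age_days(facility, today=date(2020, 1, 31)))
print(has_valid_coordinates(facility))
```

Geographic coverage and data errors are properties of the whole dataset rather than of one record, so they are left out of this per-record sketch.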

Location Validation Index (Source: Healthsites.io)21

  1. User status (new user or trusted user)

  2. Attribute completion (percentage of complete attributes)

  3. Temporal nature of each update (time since last update)

  4. Attrition (confidence in data decays if it goes unvalidated with time)
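
To make the idea of a decaying confidence score concrete, the four criteria above could be combined along the following lines. This is a hypothetical sketch and not the formula used by Healthsites.io: the weights, the one-year half-life and the function name `validation_score` are all assumptions chosen for illustration.

```python
def validation_score(trusted_user, attribute_completion, days_since_update,
                     half_life_days=365.0, trusted_weight=1.0, new_user_weight=0.5):
    """Combine the four criteria into a single score between 0 and 1.

    All weights and the half-life are assumed values, not published parameters.
    """
    if not 0.0 <= attribute_completion <= 1.0:
        raise ValueError("attribute_completion must be a fraction between 0 and 1")
    if days_since_update < 0:
        raise ValueError("days_since_update cannot be negative")

    # User status: edits by trusted users carry more weight than those by new users.
    user_factor = trusted_weight if trusted_user else new_user_weight

    # Attrition: confidence halves every `half_life_days` without revalidation.
    decay = 0.5 ** (days_since_update / half_life_days)

    return user_factor * attribute_completion * decay


# A complete record just validated by a trusted user keeps full confidence.
print(validation_score(True, 1.0, 0))

# The same record, 80% complete and untouched for a year, scores lower.
print(validation_score(True, 0.8, 365))
```

An exponential decay is only one way to model attrition; a linear decay or a hard expiry date would express the same principle that unvalidated data loses credibility over time.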
