How to Assess the Quality of Aggregated Routine Health Facility Data

(From left to right) Amadou Diallo, Cheikh Faye, Ibrahima Gaye, Martine Njitchoung, Abdoulaye Maïga, Aluisio Barros, and Jean Christian Youmba review data during the workshop in Nairobi.

Resources to assess the quality of routine health facility data related to reproductive, maternal, newborn, child and adolescent health (RMNCAH) are now available, based on materials developed for a workshop hosted by Countdown and the Global Financing Facility (GFF) in Nairobi, Kenya, in June 2022.

Data from individual health facilities are a potentially rich source of information for policies and programs and for tracking country progress toward goals, and the wide uptake of the DHIS2 software platform has made these data more accessible than ever. However, concerns and questions about data quality have frequently limited their full use. The World Health Organization has developed resources related to health facility data use, and Countdown has expanded on this work with a set of resources to assess data quality.


This data quality process includes:

  • Compiling the data
  • Assessing and adjusting for incomplete reporting
  • Identifying extreme outliers and missing values
  • Checking internal consistency
  • Summarizing the results in a data quality scorecard

The analysis included data elements related to pregnancy, delivery, family planning, child health, deaths that occurred in facilities, and population size, as listed below.




Compiling Data

When compiling the data, a few key things to check are whether:

  • The number of districts is consistent across worksheets.
  • Spellings are consistent across worksheets. For example, if spellings include “Nairobi”, “Nairobi ” (with a trailing space), “Nairoby”, “NAIROBI”, “nairobi”, and “Nai-robi”, Stata will interpret them as six different values instead of one.
  • Invisible extra spaces are present in string variables.
  • Date (month and year) formats are consistent across worksheets.
  • Duplicate records are present.
  • Data are missing, in the sense that some facilities did not send data.
  • Zeroes have been inserted for missing values, or vice versa.
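The workshop materials use Stata do-files for these checks; the short Python sketch below illustrates the spelling-consistency step with the hypothetical district-name variants from the example above. The helper name and variant list are invented for illustration.

```python
def normalize_district(name: str) -> str:
    """Collapse case, stray spaces, and hyphens so that spelling
    variants of the same district map to a single value."""
    cleaned = " ".join(name.split())   # remove leading, trailing, and doubled spaces
    cleaned = cleaned.replace("-", "") # drop stray hyphens ("Nai-robi")
    return cleaned.lower()             # make the comparison case-insensitive

# The six variants from the example above (one carries a trailing space)
variants = ["Nairobi", "Nairobi ", "Nairoby", "NAIROBI", "nairobi", "Nai-robi"]
unique_districts = {normalize_district(v) for v in variants}
# The case/space/hyphen variants collapse to "nairobi", but a genuine
# misspelling such as "Nairoby" survives and still needs manual correction.
```

Automated normalization handles mechanical variants; true misspellings are best resolved against an official district list with expert review.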

Assess and adjust for incomplete reporting

The Stata code checks the reporting rates by service type (family planning, antenatal care [ANC], vaccination, outpatient visits [OPD], and inpatient stays [IPD]) and year at the district level. A reporting rate below 90% is considered low.
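The actual check is implemented in the workshop's Stata do-files; as a sketch of the logic, the Python fragment below flags districts whose reporting rate falls below the 90% threshold. The district names, report counts, and the assumption of one monthly report per year are all hypothetical.

```python
EXPECTED_REPORTS = 12  # assumption: one monthly report expected per year

def reporting_rate(received: int, expected: int) -> float:
    """Reporting rate as a percentage of expected reports received."""
    return 100 * received / expected

# Hypothetical monthly ANC reports received per district in one year
anc_reports = {"District A": 12, "District B": 10, "District C": 8}

# Districts below the 90% threshold are flagged for adjustment
low_reporting = {district: reporting_rate(n, EXPECTED_REPORTS)
                 for district, n in anc_reports.items()
                 if reporting_rate(n, EXPECTED_REPORTS) < 90}
```

In this sketch, District A (100%) passes, while Districts B (83%) and C (67%) are flagged as low reporting.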

It is important to adjust for incomplete reporting, since it can have a major effect on the levels and trends of coverage and other statistics derived from health facility data. If completeness is not considered, the implicit assumption is that the non-reporting facilities provided no services, which is usually incorrect.

For this analysis, incompleteness can occur at two levels:

  • At the facility level, because some facilities did not report any data; and
  • At the service level, because reporting facilities did not report a given service, or reported fewer services than expected.

Adjustment must account for both dimensions of incompleteness and must make an assumption about the level of care provided at health facilities that did not report data, compared to those that did. The adjustment for incomplete reporting is expressed as follows:

  N_adjusted = N_reported + N_reported × (1/c − 1) × k

where N = number of service outputs, c = reporting completeness, and k = adjustment factor.

  • k = 0       no services in non-reporting facilities
  • k = 0.25    some services, but much lower than in reporting facilities
  • k = 0.5     half the rate of reporting facilities
  • k = 0.75    nearly as much as reporting facilities
  • k = 1.0     same rate of services as reporting facilities

The choice of the adjustment factor (k) requires consultation with experts with knowledge of the country’s health system.
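As a worked illustration of the formula above (the workshop code itself is in Stata), the Python sketch below applies the adjustment to a hypothetical district that reported 8,000 deliveries with 80% completeness, under three choices of k.

```python
def adjust_for_completeness(n_reported: float, c: float, k: float) -> float:
    """N_adjusted = N_reported + N_reported * (1/c - 1) * k,
    where c is reporting completeness and k the adjustment factor."""
    return n_reported + n_reported * (1 / c - 1) * k

# Hypothetical example: 8,000 deliveries reported, 80% completeness
adjust_for_completeness(8000, 0.8, 0)    # k = 0    -> 8000.0 (no adjustment)
adjust_for_completeness(8000, 0.8, 0.5)  # k = 0.5  -> 9000.0
adjust_for_completeness(8000, 0.8, 1.0)  # k = 1    -> 10000.0
```

Note how the choice of k shifts the estimate by up to 25% in this example, which is why it should be set in consultation with people who know the country's health system.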

Extreme outliers and missing values

Large variations in the number of reported services provided may indicate a data quality problem, especially for interventions known to be provided at high coverage levels based on population-based surveys. However, fluctuations in the number of reported services can occur for valid reasons, such as population growth, changes in programmatic activities, or emergencies such as the COVID-19 pandemic. Any adjustments should therefore be made in consultation with experts who are aware of the local context.

For annual data, an extreme outlier is defined as any value that lies more than 5 standard deviations above or below the median of the preceding 3 years, where the standard deviation is estimated from the median absolute deviation (MAD). The bounds for identifying outliers are:

A value Xi is flagged as an extreme outlier if Xi < Lower Bound or Xi > Upper Bound, where:

Lower Bound = Median − 1.4826 × 5 × MAD

Upper Bound = Median + 1.4826 × 5 × MAD

where Xi is the value of the observation for a particular time period (year) and the MAD is defined as the median absolute deviation (MAD = median(|Xi – X~|), where X~ is the median of the three preceding years).

The example below shows assessment of extreme outliers at the district level, by quarter.



Extreme outliers can be corrected by imputing a value based on the median value of the calendar year. A similar imputation can be conducted for missing values.
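The outlier rule above can be sketched in a few lines of Python (the workshop's implementation is in Stata); the annual delivery counts used here are hypothetical. The constant 1.4826 scales the MAD to a standard-deviation equivalent.

```python
from statistics import median

MAD_SCALE = 1.4826  # converts a MAD into a standard-deviation equivalent
THRESHOLD = 5       # number of scaled deviations that defines "extreme"

def outlier_bounds(preceding: list[float]) -> tuple[float, float]:
    """Lower and upper bounds from the median and MAD of the
    preceding three years of data."""
    med = median(preceding)
    mad = median(abs(v - med) for v in preceding)
    half_width = MAD_SCALE * THRESHOLD * mad
    return med - half_width, med + half_width

def is_extreme_outlier(x: float, preceding: list[float]) -> bool:
    lower, upper = outlier_bounds(preceding)
    return x < lower or x > upper

preceding_years = [1000, 1040, 980]          # hypothetical annual counts
is_extreme_outlier(1100, preceding_years)    # False: within the bounds
is_extreme_outlier(3000, preceding_years)    # True: flagged for review

# A flagged (or missing) value can then be imputed with the median of
# the corresponding calendar year's reported values, as described above.
```

Flagging is only the first step; whether a flagged value is corrected, and how, should be decided with people who know the local context.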

Internal consistency

Internal consistency of services is checked by comparing:

  • The number of reported first antenatal care visits (ANC1) and the number of first doses of pentavalent vaccine (Penta1) and
  • The number of Penta1 and of third dose of pentavalent vaccine (Penta3)

One method is to calculate a ratio between these two numbers. Ratios would be expected to range from 1.0 to 1.5; values outside this range require further examination of the data.

A second method is to calculate the absolute difference between the expected and the reported ratios of the two indicators. The expected ratio is calculated based on household survey data, such as the Demographic and Health Surveys [LINK], if recent data with an appropriate level of disaggregation are available. This metric is interpreted as:

  • < 5 suggests good quality
  • 5–14.9 suggests moderate quality
  • ≥ 15 suggests poor quality

Using a scatterplot to compare ANC1 to Penta1 and Penta1 to Penta3 is also helpful for understanding potential data quality problems.
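The two consistency methods above can be sketched as follows (again in Python rather than the workshop's Stata, with hypothetical district totals; the sketch assumes the absolute difference is expressed in percentage points, consistent with the 5 and 15 thresholds).

```python
def consistency_ratio(numerator: int, denominator: int) -> float:
    """Method 1: ratio of two related indicators, e.g. ANC1/Penta1,
    expected to fall between 1.0 and 1.5."""
    return numerator / denominator

def quality_from_difference(reported_ratio: float, expected_ratio: float) -> str:
    """Method 2: classify quality from the absolute difference (in
    percentage points) between reported and survey-based ratios."""
    diff = abs(reported_ratio - expected_ratio) * 100
    if diff < 5:
        return "good"
    if diff < 15:
        return "moderate"
    return "poor"

# Hypothetical district totals
anc1, penta1 = 10_500, 10_000
ratio = consistency_ratio(anc1, penta1)          # 1.05, within 1.0-1.5
quality_from_difference(ratio, expected_ratio=1.02)   # 3-point gap -> "good"
```

The same two functions apply unchanged to the Penta1/Penta3 comparison.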


An in-depth data check is recommended to confirm any inconsistencies, which should then be corrected with a clear audit trail.

Data quality scorecard

Finally, it is helpful to summarize the data quality assessments into a scorecard, as shown below.
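A minimal sketch of such a scorecard, assuming one pass/fail entry per check described in the sections above (the metric names and results here are hypothetical, not the workshop's actual scorecard layout):

```python
# Hypothetical scorecard for one district-year; each entry records
# whether a check from the preceding sections passed.
scorecard = {
    "ANC reporting rate >= 90%": True,
    "No extreme outliers (5 x MAD rule)": False,
    "ANC1/Penta1 ratio within 1.0-1.5": True,
    "Penta1/Penta3 ratio within 1.0-1.5": True,
}

passed = sum(scorecard.values())
summary = f"{passed}/{len(scorecard)} checks passed"
```

A one-line summary per district makes it easy to compare data quality across districts and to prioritize follow-up.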


PowerPoint presentation from Nairobi workshop:

Stata do-files:

More resources are available on the Health Facility DAC page.