Creating Customized Indicators for surveyPrev

Qianyu Dong and Zehang Richard Li

2024-05-30

In this vignette, we provide an overview of the list of DHS indicators currently implemented in surveyPrev and the process to add new indicators or create customized indicators.

First we load the surveyPrev package, and any packages used in the customized indicator processing function later. In our example, dplyr and labelled are used.

library(surveyPrev)
library(dplyr)
library(labelled)
library(kableExtra)

In order to use getDHSdata() to download the relevant DHS data directly from the DHS website, we need to

  1. register with DHS to gain data access, and
  2. set up a DHS account details in R using rdhs package, i.e.,
rdhs::set_rdhs_config(email = "your_email", project = "your_registered_DHS_project_title")

1 Built-in indicators

Currently, the surveyPrev package supports 22 indicators listed in Table 1. The list of indicators and their IDs can be found in the surveyPrevIndicators dataset.

data(surveyPrevIndicators)
head(surveyPrevIndicators)
Table 1: List of built-in indicators in the surveyPrev package.
ID alternative.ID Description Topic
AN_ANEM_W_ANY womananemia Percentage of women aged 15-49 classified as having any anemia Nutrition
AN_NUTS_W_THN Underweight (Prevalence of underweight (BMI < 18.5) women of reproductive age) Nutrition
CH_DIAT_C_ORT Diarrhea treatment (Children under five with diarrhea treated with either ORS or RHF) Child Health
CH_VACC_C_BAS Children age 12-23 months with all 8 basic vaccinations Child Health
CH_VACC_C_DP1 Percentage of children 12-23 months who had received DPT 1 vaccination Child Health
CH_VACC_C_DP3 DPT3 Percentage of children 12-23 months who had received DPT 3 vaccination Child Health
CH_VACC_C_MSL Percentage of children 12-23 months who had received MCV 1 (Measles containing vaccine) Child Health
CH_VACC_C_NON Children 12-23 months with no vaccinations Child Health
CM_ECMR_C_NNR nmr Probability of dying in the first month of life in the five or ten years preceding the survey Infant and Child Mortality
CN_ANMC_C_ANY Children under five with any anemia Nutrition
CN_BRFS_C_EXB Children exclusively breastfed (Prevalence of exclusive breastfeeding of children under six months of age) Nutrition
CN_NUTS_C_HA2 stunting Stunting rate (Prevalence of stunted (HAZ < -2) children under five (0-59 months)) Nutrition
CN_NUTS_C_WH2 wasting Wasting rate (Prevalence of wasted (HAZ < -2) children under five (0-59 months)) Nutrition
FP_CUSA_W_MOD Modern contraceptive prevalence rate (Percentage of women currently using any modern method of contraception) Family Planning
FP_NADA_W_UNT unmet_family Percentage of women with an unmet need for family planning Family Planning
HA_HIVP_B_HIV HIV prevalence HIV prevalence
ML_NETP_H_IT2 Households with at least one insecticide-treated mosquito net (ITN) for every two persons who stayed in the household the previous night Malaria
RH_ANCN_W_N4P ancvisit4+ Antenatal visits for pregnancy: 4+ visits Maternal Healthcare
RH_DELA_C_SKP Assistance during delivery from a skilled provide Maternal Healthcare
WS_SRCE_P_BAS Population using a basic water source Water, Sanitation and Hygiene
WS_TLET_H_IMP sanitation Percentage of households using an improved sanitation facility Water, Sanitation and Hygiene
WS_TLET_P_BAS Population with access to a basic sanitation service Water, Sanitation and Hygiene

These indicators above can be directly processed within surveyPrev using getDHSdata() and getDHSindicator() functions. Both the ID and alternative ID in the surveyPrevIndicators dataset can be used to retrieve the indicator. getDHSindicator() processes the raw survey data into a data.frame, where the column titled value is the indicator of interest. It also contains cluster ID, household ID, survey weight and strata information. This data format allows a svydesign object to be defined in the survey package. For example,

indicator <- "ancvisit4+"
year <- 2018
country <- "Zambia"
dhsData1 <- getDHSdata(country = country, indicator = indicator, year = year)
data1 <- getDHSindicator(dhsData1, indicator = indicator)
head(data1)
##   cluster householdID    v024  weight strata value
## 1       1           1 eastern 1892890  rural     1
## 2       1           2 eastern 1892890  rural     1
## 3       1           3 eastern 1892890  rural     1
## 4       1           4 eastern 1892890  rural     0
## 5       1           7 eastern 1892890  rural     1
## 6       1           9 eastern 1892890  rural    NA

If the DHS download using the API fails, you may also manually download the file from the DHS website and read into R. The getDHSdata() function returns a message specifying which file is used (e.g., Individual Record file for the ANC visit example).

2 New indicators

Details on how standard DHS indicators are defined can be found in the Guide to DHS Statistics:https://dhsprogram.com/data/Guide-to-DHS-Statistics/ by searching for an indicator. Codes for creating most standard DHS indicators can be found on DHS GitHub site: https://github.com/DHSProgram/DHS-Indicators-R. The indicators are organized by chapters in the Guide to DHS Statistics.

To use surveyPrev to create a new indicator not already built into the package, we need to specify

  1. which DHS data file to download,
  2. a set of rules to create the indicator from DHS standard recode; or a customized function to process the indicator from the raw HDS survey data.

We will discuss and give examples for both ways to create the new indicator below.

2.1 DHS dataset types

The table below lists the different types of DHS data and their naming conventions in surveyPrev. You can find more details in this website

Table 2: DHS data types
Name Recode
MRdata Men’s Recode
PRdata Household Member Recode
KRdata Children’s Recode
BRdata Births Recode
CRdata Couples’ Recode
HRdata Household Recode
IRdata Individual Recode

2.2 Option 1: Specifying data processing rules

For many standard indicators, the specification can be summarized into the following three steps:

  1. Filter the data for relevant individuals based on certain selection criterion.
  2. Specify a yes response, i.e., outcome value = 1, if certain conditions hold.
  3. Specify a no response, i.e., outcome value = 0, if certain conditions hold.

If an indicator can be specified in this way, with the conditions depending on variables in the DHS data, they can be specified using the filter, yesCondition, and noCondition arguments in the getDHSindicator() function.

As an example, we create a dataset below that consists of neonatal deaths during the last 5 years prior to survey. The dataset we need is the births recode. After obtaining the raw data using getDHSdata() function, we specify the following conditions:

  1. Filter the dataset with v008 - b3 < 120. v008 and b3 are column names in the DHS birth recode. v008 is the data of interview, and b3 is the date of birth. Both dates are in Century Month Code (CMC) format, so that the difference is in the unit of months. This filter creates a dataset that only consist of births 120 months prior to survey.
  2. Specify a yes response, i.e., a neonatal death, by b7 == 0. b7 is the column specifying age of deaths. Age of deaths equal to 0 means the death happened during the first month of birth.
  3. Specify a no response, i.e., not a neonatal death, by b7 > 0 | is.na(b7). That is, deaths after the first month or live children are considered not a neonatal death.

Putting everything together, the dataset can be created with the following codes.

Recode <- "Births Recode"
dhsData <- getDHSdata(country = "Zambia", indicator = NULL, 
                        Recode = Recode, year = "2018")
data <- getDHSindicator(Rdata = dhsData,
                        indicator = NULL, 
                        filter = "v008 - b3 < 120",
                        yesCondition = "b7 == 0", 
                        noCondition = "b7 > 0 | is.na(b7)")

Notice that all the conditions are specified by a string of expressions that R can use to filter and data with. For the filter argument, more than one filters can be applied. For example, the following two calls to getDHSindicator() are equivalent and add an additional filter for births by mothers over 30 years old at time of birth.

data_over30_a <- getDHSindicator(Rdata = dhsData,
                        indicator = NULL, 
                        filter = c("v008 - b3 < 120", "b3 - v011 > 30 * 12"),
                        yesCondition = "b7 == 0", 
                        noCondition = "b7 > 0 | is.na(b7)")
data_over30_b <- getDHSindicator(Rdata = dhsData,
                        indicator = NULL, 
                        filter = "v008 - b3 < 120 & b3 - v011 > 30 * 12",
                        yesCondition = "b7 == 0", 
                        noCondition = "b7 > 0 | is.na(b7)")

2.3 Option 2: Specifying function to process the indicator

When the indicator specification needs more steps than the first option can handle, users can import the full data processing functions by following the steps below.

Let’s take Current use of any modern method of contraception (all women) as an example. For users familiar with the standard indicators defined by the DHS Data Indicator API, the indicator ID is “FP_CUSA_W_MOD”. We will go through the steps to create the customized function below.

Step 1: Search indicator ID or key words in the Guide to DHS Statistics, and then identify which chapter it is in.

For our example, we can either search “FP_CUSA_W_MOD”, or “contraceptive”, and it is in chapter 7: family planning.

Screenshot of Step 1(a): Searching for indicator ID or key words.

Figure 1: Screenshot of Step 1(a): Searching for indicator ID or key words.

Screenshot of Step 1(b): Identifying which chapter the indicator is from.

Figure 2: Screenshot of Step 1(b): Identifying which chapter the indicator is from.

Step 2: We can download IndicatorList.xlsx from DHS GitHub site, and search keyword again in the the corresponding chapter to find out

  1. which DHS data recode is used to create this indicator
  2. which file contains the code to create this indicator in the DHS GitHub repository.
  3. what the corresponding variable name is in the DHS GitHub repository.

For our example, since we are looking up for the indicator of all women currently using any modern method of contraception, we identify the cell “currently use any modern method”, and find out that the code to process this indicator is in the FP_USE.do file and we need IRdata (Individual Recode). We also identify the variable name used in the R codes on the Github repository is “fp_cruse_mod”. We will all three pieces of information in the next step to find the codes processing the indicator.

Screenshot of Step 2: Finding file name and recode name.

Figure 3: Screenshot of Step 2: Finding file name and recode name.

Step 3: In the Github repository, we find the folder for Chapter 7, the file FP_USE.R, and search “fp_cruse_mod” to find the following chunk of code script.

Screenshot of Step 3: Finding code.

Figure 4: Screenshot of Step 3: Finding code.

We extract the following chunk of codes for this indicator, which takes the IR data, and perform a few steps of data cleaning.

# Currently use modern method
IRdata <- IRdata %>%
    mutate(fp_cruse_mod = ifelse(v313 == 3, 1, 0)) %>%
    set_value_labels(fp_cruse_mod = c(yes = 1, no = 0)) %>%
    set_variable_labels(fp_cruse_mod = "Currently used any modern method")

We can use the code chunk to define a new function to be used in the getDHSindicator function. The self-defined function should:

  • Use Recode file as input and return the same data.frame.
  • Change the name of your variable into "value" in the end.

The example below creates a fp_cruse_mod function for “Current use of any modern method of contraception (all women)”, which can be recognized by the getDHSindicator function later.

fp_cruse_mod <- function(RData) {
    IRdata <- RData %>%
        mutate(fp_cruse_mod = ifelse(v313 == 3, 1, 0)) %>%
        set_value_labels(fp_cruse_mod = c(yes = 1, no = 0)) %>%
        set_variable_labels(fp_cruse_mod = "Currently used any modern method")
    colnames(IRdata)[colnames(IRdata) == "fp_cruse_mod"] <- "value"
    return(IRdata)
}

Finally, after we create this function fp_cruse_mod, We can create the indicator by

  1. Use getDHSdata function to downloading relevant DHS datasets using the identified DHS data type in Step 3. In this example we specify Recode = "Individual Recode" and indicator = NULL, The recode be one of the recode lists in table 1. to download the dataset. We can also set Recode = NULL, in which case all available DHS data types will be downloaded.
  2. Use the function FUN = fp_cruse_mod in the call of getDHSindicator to process the indicator according to the customized function.

Altogether, the following codes creates the dataset of the processed indicator.

year <- 2018
country <- "Zambia"
Recode <- "Individual Recode"
dhsData <- getDHSdata(country = country, indicator = NULL, Recode = Recode, year = year)
data <- getDHSindicator(dhsData, indicator = NULL, FUN = fp_cruse_mod)
head(data)
##   cluster householdID    v024  weight strata value
## 1       1           1 eastern 1892890  rural     0
## 2       1           2 eastern 1892890  rural     0
## 3       1           3 eastern 1892890  rural     0
## 4       1           4 eastern 1892890  rural     1
## 5       1           7 eastern 1892890  rural     1
## 6       1           9 eastern 1892890  rural     1

3 Multiple dataset

Some indicators, such as HIV prevalence, require additional data files. Take, HIV prevalence among general population, as an example. This is a built-in indicator with the ID “HA_HIVP_B_HIV”. It needs three Recode files: Individual, Men’s and HIV Test Results, so that the output of getDHSdata() and the input for getDHSindicator() is a list of three data files.

HIVdhsData <- getDHSdata(country = country, indicator = NULL, Recode = c("Individual Recode",
    "Men's Recode", "HIV Test Results Recode"), year = year)