Categorical methods

Why does Ki utilize categorical methods?

Categorical analysis is often the best fit for outcome variables with nominal or ordinal properties, and it can describe associations relevant to healthy birth, growth, and development. In addition, Ki can combine outcome variables from different collaborators by mapping them to common categories, which aligns data that were collected differently (continuous vs. categorical) across data sources.


TABLE 1. Example contingency table of stunting status at birth and maternal height category

Distributions[2]

Ordinal Logistic Regression[3]

Ordinal logistic regression uses independent variables (predictors) to predict the odds that the outcome falls in a given response category. It applies when the dependent variable has ordered categories.

This model assumes proportional odds across the categories of the response variable. In other words, the effect of a predictor is the same across the different categories: for a given change in the predictor, the odds of passing from one category to the next are the same regardless of the starting category. The test for proportionality is discussed further and displayed in the HBGDki example below; the assumption can be relaxed if it does not hold.
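As a sketch of what this assumption means in symbols, the proportional-odds model can be written in one common notation. The symbols are assumptions introduced here, not taken from the source: Y is the outcome with J ordered categories, x is the predictor vector, the alpha_j are category-specific intercepts (cutpoints), and beta is the shared coefficient vector.

```latex
\[
\operatorname{logit} P(Y \le j \mid \mathbf{x})
  = \log \frac{P(Y \le j \mid \mathbf{x})}{P(Y > j \mid \mathbf{x})}
  = \alpha_j - \boldsymbol{\beta}^{\top}\mathbf{x},
  \qquad j = 1, \dots, J - 1 .
\]
```

Each cutpoint j has its own intercept, but the coefficients are shared across cutpoints, which is the proportionality assumption. The sign in front of the coefficient term varies between software packages, so the direction of an estimated effect should be checked against the documentation of the tool in use.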

Advantages of categorical methods

The parameters are easily interpretable (probabilities or odds of outcome).
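To illustrate this interpretability, the following minimal sketch turns the parameters of a proportional-odds model into category probabilities and cumulative odds. It assumes the parameterization logit P(Y <= j | x) = cutpoint_j - beta * x with a single predictor. The function names and all numeric values are hypothetical and are not taken from any HBGDki analysis.

```python
import math


def logistic(z):
    """Inverse logit: map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(cutpoints, beta, x):
    """Category probabilities under a proportional-odds model.

    Assumes logit P(Y <= j | x) = cutpoints[j] - beta * x, with the
    cutpoints in increasing order and a single predictor x.
    """
    cumulative = [logistic(c - beta * x) for c in cutpoints] + [1.0]
    probs = []
    previous = 0.0
    for cum in cumulative:
        # Each category probability is a difference of cumulative ones.
        probs.append(cum - previous)
        previous = cum
    return probs


def cumulative_odds(cutpoints, beta, x):
    """Odds of being at or below each cutpoint: P(Y <= j) / P(Y > j)."""
    return [math.exp(c - beta * x) for c in cutpoints]


# Hypothetical values, chosen only for illustration.
cutpoints = [-1.5, 0.5]  # two cutpoints give three ordered categories
beta = 0.4

probs = category_probabilities(cutpoints, beta, x=1.0)

# Odds ratio for a one-unit increase in x, computed at each cutpoint.
odds_ratios = [
    after / before
    for before, after in zip(
        cumulative_odds(cutpoints, beta, 0.0),
        cumulative_odds(cutpoints, beta, 1.0),
    )
]

print(probs)
print(odds_ratios)  # identical at every cutpoint under this model
```

Under these assumptions the odds ratio for a one-unit change in the predictor equals exp(-beta) at every cutpoint, so a single number summarizes the effect of that predictor.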

Disadvantages of categorical methods



Ordered categorical model for LAZ

As an example, an HBGDki model regresses a categorical outcome variable for LAZ (stunted (LAZ < -2), at risk for stunting (LAZ between -2 and -1), and not stunted (LAZ ≥ -1)) on continuous and categorical predictors (including age, mother’s height, presence of enteric pathogens in stool, % energy from protein, enrollment LAZ, and other important variables).[4] Figure 1 illustrates the categorization of LAZ.

The LAZ categories are created according to the cutoffs, and the percentage of infants in each category is then calculated at each month of age.

For example (Figure 1), at age 0 months, 37 of the 230 infants (shown in gray) had LAZ below -2 (green points). This corresponds to the 16% shown in the lower section of Figure 1. The probability of being stunted (LAZ < -2) increases over time.
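The categorization and percentage calculation can be sketched as follows. This is a minimal illustration, not the code used for the HBGDki analysis. The function names are hypothetical, an LAZ of exactly -2 is assumed to fall in the at-risk category, and only the 37-of-230 count comes from the text; the split of the remaining infants is invented.

```python
from collections import Counter

CATEGORIES = ("stunted", "at risk", "not stunted")


def laz_category(laz):
    """Assign an LAZ value to one of the three ordered categories."""
    if laz < -2:
        return "stunted"
    if laz < -1:
        return "at risk"
    return "not stunted"


def category_percentages(laz_values):
    """Percentage of infants in each LAZ category at one age."""
    counts = Counter(laz_category(v) for v in laz_values)
    total = len(laz_values)
    return {name: 100.0 * counts[name] / total for name in CATEGORIES}


# Hypothetical LAZ values for the 0-month age group: 37 of 230 infants are
# placed below -2 to match the counts quoted in the text. The split of the
# remaining 193 infants is invented for illustration.
laz_at_birth = [-2.5] * 37 + [-1.5] * 80 + [0.0] * 113

percentages = category_percentages(laz_at_birth)
print(round(percentages["stunted"]))  # 37 / 230 rounds to 16 (percent)
```

Repeating the calculation for each month of age gives the percentage curves of the kind described for the lower section of Figure 1.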

Ordinal regression analysis is utilized because of the natural order of the constructed LAZ categories.

A piecewise linear spline for age, with breakpoints at 6-month intervals, was necessary to describe the nonlinear relationship between age and the probability of each LAZ category.
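One common way to build such a spline is a truncated-linear basis, sketched below. This is an assumption about the construction, not a description of the published model. The function names, the 0 to 24 month age range, and the knot positions at 6, 12, and 18 months are all hypothetical.

```python
def linear_spline_basis(age_months, knots):
    """Return the piecewise-linear spline terms for one age.

    The first term is age itself. Each later term is max(0, age - knot),
    which lets the slope change at that knot while keeping the fitted
    curve continuous.
    """
    return [age_months] + [max(0.0, age_months - k) for k in knots]


def piecewise_linear(age_months, intercept, slopes, knots):
    """Evaluate intercept + sum(slope * basis term) at one age."""
    basis = linear_spline_basis(age_months, knots)
    return intercept + sum(s * b for s, b in zip(slopes, basis))


# Knots every 6 months; the 0-24 month range is an assumption.
knots = [6.0, 12.0, 18.0]

print(linear_spline_basis(9.0, knots))  # [9.0, 3.0, 0.0, 0.0]
```

In a regression, these basis terms would enter the linear predictor in place of a single age term, so the age effect is linear within each 6-month segment but can change slope at each breakpoint.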

Figure 2 illustrates the data over time and how to assess goodness of fit. It shows the model fit with age (x-axis) as a predictor of the probability of each LAZ category (y-axis). The overlap between the gray 95% confidence intervals and the observed values (circular points) demonstrates that the model fits well.

Figure 3 assesses the proportional odds assumption through the overlap between the odds (solid squares) and the two LAZ categories (triangles). The proportionality did not differ enough to influence the effect estimates, as demonstrated by the “substantial overlap” in the confidence intervals. This is illustrated using the 5 most important predictors.


FIGURE 1. Categorization of LAZ outcome variable.[4]

FIGURE 2. Goodness of fit example from the MAL-ED study.[4] The median 95% CI helps visualize the fit of the model.

FIGURE 3. Proportionality of odds assumption example from the MAL-ED study.[4]



Resource Links


  1. Weiss N. Introductory statistics. 9th ed. Boston: Pearson Addison-Wesley; 2012.
  2. Quigley D. Module 7.1: The Binomial, Chi-squared and Fisher’s Exact tests. 2016; http:// biostatistics/module_07.1.html. Accessed Nov 2, 2017.
  3. Norusis M. IBM SPSS Statistics 19 Advanced Statistical Procedures Companion. Pearson; 2012.
  4. MAL-ED Network Investigators. Childhood stunting in relation to the pre- and postnatal environment during the first 2 years of life: The MAL-ED longitudinal birth cohort study. PLOS Medicine. 2017;14(10):e1002408.