Data Science Resources

Tools & Models

Ki uses state-of-the-art tools, including modeling and visualization methods, to understand and analyze data in the Ki knowledge base.

By curating and aggregating data sets into the Ki knowledge base, we can work with collaborators to use these tools to ask bigger and broader questions about healthy birth, growth, and neurocognitive development.

A Ki model catalog is being created to organize Ki models that are at various stages of development and testing. Here are some of the tools and models that we have developed to explore the growing knowledge base. Check back in late 2018 for updates.


Ki tools are interactive applications that are designed to explore data and advance learning to promote healthy birth, growth, and development. The information explored with Ki tools includes existing knowledge (Seminal Events Timeline), isolated data sets (Trelliscope), and integrated data sets (Full Random Effects Model Explorer; Study Explorer).

Methodology Catalog

Ki uses a range of methodologies because analyzing different kinds of data in different ways reveals more insights. Some of these methodologies (e.g., functional principal component analysis and machine learning) are based entirely on observed patterns within data. Others (e.g., mathematical models of biological systems) are driven by statistical assumptions about the data based on researchers’ biological understanding of the process being modeled. Many methods lie somewhere in the middle of this spectrum of assumptions.

Machine Learning Algorithms
Functional Principal Component Analysis Models
Categorical Methods
Linear Models
Multistate Markov Models
Non-linear Mixed Effects Models
Network Meta-analysis Framework
Structural Equation Model Framework
Mathematical Models of Biological Systems

Empirical models of longitudinal growth outcomes

Empirical models help us understand study data and identify key trends by fitting model curves to the measured data. HBGDki empirical models include the Full Random Effects Model (FREM), which describes growth patterns in height-for-age (HAZ) and weight-for-age (WAZ) z-scores, and the Development score (D-score), which models observations about cognitive development.
ACTIVE Full random effects model (FREM) 0-15 years
ACTIVE Joint model for length, weight, and head circumference 0-2 years
ACTIVE Ordered categorical model for longitudinal measures of HAZ 0-2 years
ACTIVE Multistate Markov model to describe longitudinal changes in LAZ categories 0-2 years
ACTIVE Longitudinal growth measures and associations with brain development 0-1 year
ACTIVE SuperLearning to define and predict composite outcomes 0-2 years (anthropometry); 11 years (test...
ACTIVE SuperLearning of child growth trajectories Study specific, currently uses all ages...
ACTIVE Pooled logistic regression to describe characteristics associated with wasting and recovery 0-24 months
ACTIVE Machine learning models for child growth trajectories 0-5000 days
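
The z-scores named in this section (HAZ, WAZ, LAZ) express a child's measurement relative to a reference population of the same age and sex. The following is a minimal sketch, assuming the LMS (Box-Cox) form that many growth references use. The function name and the reference parameters in the usage note are hypothetical and are not taken from any published table or from a Ki model.

```python
import math


def lms_zscore(value, L, M, S):
    """Z-score of a measurement under the LMS (Box-Cox) method.

    value: observed measurement (e.g. length in cm), must be positive
    L: Box-Cox power, M: reference median, S: coefficient of variation.
    All three reference parameters depend on age and sex in practice.
    """
    if value <= 0 or M <= 0 or S <= 0:
        raise ValueError("value, M, and S must be positive")
    if abs(L) < 1e-12:
        # Limiting case of the Box-Cox transform as L approaches 0
        return math.log(value / M) / S
    return ((value / M) ** L - 1.0) / (L * S)


# Hypothetical reference parameters, for illustration only
z = lms_zscore(72.0, L=1.0, M=75.0, S=0.04)
```

With L = 1 the formula reduces to (value - M) / (M * S), so the illustrative call above places the child about one standard deviation below the reference median. The sketch omits any refinements that a published growth standard may apply to extreme values.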

Mechanistic Models

Mechanistic models describe underlying biological mechanisms that are relevant to growth and development outcomes. HBGDki mechanistic models use data from published studies to quantitatively characterize the interactions of nutrients (quantity and quality), gut function, maternal-fetal interactions, infectious and noninfectious microbes, and environmental enteropathy pathways that affect birth, growth, and neurodevelopmental outcomes.
ACTIVE Gut & Growth Mechanistic Model Birth through 2 years (can go...
ACTIVE Mother-fetus model Conception through birth
ACTIVE Body-brain/Infant-Child model 0-5 years old

Causal Models

Causal models describe cause and effect and can establish how an intervention or combination of interventions may affect physical growth or neurocognitive development. HBGDki causal models are created with methods such as network meta-analysis, which estimates the relative efficacy of interventions that have not been compared directly in a clinical trial, and computer reading of large volumes of published research to learn about connections between different causal relations.
ACTIVE Structural equations model for height-for-age z-score (HAZ) 0-2 years
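
The idea behind network meta-analysis, comparing interventions that were never tested head to head, can be illustrated with its simplest building block: an adjusted indirect comparison through a common comparator. This is a sketch under stated assumptions, not the HBGDki framework. It assumes two independent trials (A vs. B and C vs. B) that report effects on the same scale, and the function name and input values are hypothetical.

```python
import math


def indirect_comparison(d_ab, se_ab, d_cb, se_cb):
    """Indirect estimate of A vs. C from A vs. B and C vs. B.

    d_ab, d_cb: effect estimates relative to the shared comparator B
    se_ab, se_cb: their standard errors (trials assumed independent)
    Returns (estimate, standard error, approximate 95% interval).
    """
    d_ac = d_ab - d_cb
    # Variances add because the two trials are assumed independent
    se_ac = math.sqrt(se_ab ** 2 + se_cb ** 2)
    ci = (d_ac - 1.96 * se_ac, d_ac + 1.96 * se_ac)
    return d_ac, se_ac, ci


# Hypothetical inputs: mean difference in HAZ versus a shared control
estimate, se, ci = indirect_comparison(0.30, 0.03, 0.10, 0.04)
```

The indirect estimate is always less precise than either direct estimate, because the uncertainty of both trials is combined. A full network meta-analysis generalizes this step to many interventions and pools direct and indirect evidence.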

Population Models

Population models help us understand how the burden of disease varies between different populations or over time. HBGDki population models evaluate heterogeneity between populations to determine the most important risk factors for a population and the interventions that may be most effective in it. They also categorize countries on the basis of risk factors for disease instead of geography or environment.
ACTIVE Population-level models of determinants of child growth 2-5 years