Longitudinal Analysis of Diabetic Foot Disease in Kiambu County: A PhD-Level Mixed-Methods Study

Project type

Longitudinal Analysis

Date

2025

Role

Biostatistician

This project documents the data science workflow built to support a PhD-level clinical study of Diabetic Foot Disease (DFD) at intervention and control sites in Kiambu County, Kenya. Implemented in R, it combines baseline and endline assessments and applies regression, Difference-in-Differences, and mixed models to evaluate the effect of an educational intervention on patient outcomes.

Project Goals:
Evaluate DFD prevalence and predictors at baseline

Assess the impact of a targeted education intervention

Quantify the incidence of new DFD cases over time

Support evidence-based public health decision-making through reproducible analytics

Data Structure:
Four datasets: baseline and endline, one pair each for the intervention and control groups

Variables include: blood pressure, HbA1c, eGFR, comorbidities, demographic & behavioral factors

Analytical Workflow and Techniques:
1. Data Cleaning and Preparation
Imported all datasets into R through a single routine and standardized their structure

Labeled categorical variables for clarity (e.g., sex, education, residence)

Derived key metrics such as binary DFD status, CKD stages, and comorbidity indicators
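As an illustration of the derivation step, here is a minimal sketch of how a CKD stage and a comorbidity indicator could be computed from raw fields. The study itself was run in R; Python is used here only to show the logic. The cut-offs are the standard KDIGO GFR categories, which is an assumption, since the page does not state which staging scheme was applied. The field names (`egfr`, `comorbidities`) and the patient records are hypothetical.

```python
def ckd_stage(egfr):
    """Map an eGFR value (mL/min/1.73 m^2) to a KDIGO GFR category.

    Assumption: standard KDIGO cut-offs; the study's own scheme is not stated.
    """
    if egfr >= 90:
        return "G1"
    if egfr >= 60:
        return "G2"
    if egfr >= 45:
        return "G3a"
    if egfr >= 30:
        return "G3b"
    if egfr >= 15:
        return "G4"
    return "G5"


# Hypothetical records, invented purely for illustration.
patients = [
    {"id": "P001", "egfr": 95.0, "comorbidities": []},
    {"id": "P002", "egfr": 52.3, "comorbidities": ["hypertension"]},
    {"id": "P003", "egfr": 12.0, "comorbidities": ["hypertension", "neuropathy"]},
]

for p in patients:
    p["ckd_stage"] = ckd_stage(p["egfr"])
    # Binary indicator: 1 if at least one comorbidity is recorded, else 0.
    p["any_comorbidity"] = int(len(p["comorbidities"]) > 0)
```

Keeping each derived variable in its own small function makes the coding rules easy to audit and to re-apply unchanged to the endline data.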

2. Data Integration
Merged datasets on the unique patient ID

Constructed time and group indicators for repeated measures analysis

Ensured matched records for before–after comparison
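The integration steps above can be sketched roughly as follows. This is an illustrative Python outline rather than the study's R code. The function name `stack_matched`, the field names, and the HbA1c values are all hypothetical. It keeps only patients present at both time points and adds `group` (1 = intervention, 0 = control) and `time` (0 = baseline, 1 = endline) indicators in long format.

```python
def stack_matched(baseline, endline, group):
    """Return long-format rows for patients observed at both time points."""
    base_by_id = {row["patient_id"]: row for row in baseline}
    end_by_id = {row["patient_id"]: row for row in endline}
    # Only IDs found in both datasets allow a before-after comparison.
    matched_ids = sorted(base_by_id.keys() & end_by_id.keys())
    rows = []
    for pid in matched_ids:
        for time, source in ((0, base_by_id), (1, end_by_id)):
            rows.append({**source[pid], "group": group, "time": time})
    return rows


# Hypothetical toy data; I02 has no endline record and is dropped.
intervention_base = [{"patient_id": "I01", "hba1c": 8.9},
                     {"patient_id": "I02", "hba1c": 9.4}]
intervention_end = [{"patient_id": "I01", "hba1c": 7.8}]
control_base = [{"patient_id": "C01", "hba1c": 9.1}]
control_end = [{"patient_id": "C01", "hba1c": 9.0}]

long_data = (stack_matched(intervention_base, intervention_end, group=1)
             + stack_matched(control_base, control_end, group=0))
```

The long format (one row per patient per time point) is the layout that repeated-measures and mixed models typically expect.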

3. Exploratory Data Analysis (EDA)
Summary statistics by group and time point

Visualizations: boxplots, density plots, cross-tabulations

Initial identification of risk trends

4. Research Question-Specific Modeling
RQ1: DFD Prevalence at Baseline
→ Cross-sectional analysis + chi-square tests

RQ2: Predictors of DFD
→ Logistic regression (adjusted ORs, interaction terms, multicollinearity checks)

RQ3: Effectiveness of Education Intervention
→ Difference-in-Differences (DID) model
→ Paired t-tests / Wilcoxon signed-rank tests
→ Linear mixed models for clustering effects
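To make the Difference-in-Differences idea concrete, here is a minimal sketch of the unadjusted estimate computed from four group means. It is written in Python for illustration, whereas the study used R. The HbA1c values are invented and are not study results. With only group and time as predictors, this quantity equals the group-by-time interaction coefficient in a linear model of the form outcome ~ group + time + group:time.

```python
from statistics import mean

# Hypothetical HbA1c values (%), invented for illustration only.
intervention_baseline = [9.0, 8.0]
intervention_endline = [8.0, 7.0]
control_baseline = [9.0, 8.0]
control_endline = [8.5, 8.0]

# Change over time within each arm.
change_intervention = mean(intervention_endline) - mean(intervention_baseline)
change_control = mean(control_endline) - mean(control_baseline)

# DID: the extra change in the intervention arm beyond the control arm.
did_estimate = change_intervention - change_control
print(did_estimate)  # -0.75
```

Here the intervention arm falls by 1.0 point and the control arm by 0.25, so the estimate attributes a 0.75-point reduction to the intervention, under the usual parallel-trends assumption.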

RQ4: Incidence of New DFD
→ Poisson / log-binomial regression for relative risk
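For RQ4, the quantity these regression models target can be illustrated with an unadjusted relative risk and a log-scale (Katz) 95% confidence interval. This is a Python sketch with invented counts, not the study's R models or results. A Poisson or log-binomial model with group as its only predictor would give the same point estimate, and the regression form is what allows adjustment for covariates.

```python
import math


def relative_risk(cases_exposed, n_exposed, cases_control, n_control, z=1.96):
    """Unadjusted relative risk with a log-scale (Katz) confidence interval."""
    risk_exposed = cases_exposed / n_exposed
    risk_control = cases_control / n_control
    rr = risk_exposed / risk_control
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(1 / cases_exposed - 1 / n_exposed
                          + 1 / cases_control - 1 / n_control)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: new DFD cases among patients DFD-free at baseline.
rr, lower, upper = relative_risk(cases_exposed=10, n_exposed=200,
                                 cases_control=20, n_control=200)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

In this toy example the incidence in the intervention arm is half that of the control arm, but the interval crosses 1, so the data alone would not rule out no effect.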

5. Advanced Extensions (PhD-Level Depth)
Latent Class Analysis for symptom clustering

Propensity Score Matching for baseline adjustment

Multilevel modeling (site-level effects)

6. Reporting and Documentation
High-quality tables with gtsummary, visualizations via ggplot2

Full reproducibility with R Markdown / Quarto

Recommendation to publish in peer-reviewed journals

Why R Over SPSS?
Supports complex longitudinal and multilevel models

Allows automation, reproducibility, and customization

Better suited for publication-ready graphics and robust workflows

Skills and Value Demonstrated:
Designed and executed a full analytical pipeline for clinical data

Applied causal inference and advanced regression techniques

Leveraged open-source tools for reproducible public health research

Delivered a scalable template for future intervention evaluations
