Longitudinal Analysis of Diabetic Foot Disease in Kiambu County: A PhD-Level Mixed-Methods Study
This project documents the data science workflow built in R to support a PhD-level clinical study of Diabetic Foot Disease (DFD) across intervention and control sites in Kiambu County, Kenya. The workflow integrates baseline and endline assessments and applies regression, difference-in-differences, and mixed models to evaluate the impact of an educational intervention on patient outcomes.
Project Goals:
Evaluate DFD prevalence and predictors at baseline
Assess the impact of a targeted education intervention
Quantify the incidence of new DFD cases over time
Support evidence-based public health decision-making through reproducible analytics
Data Structure:
Four datasets: baseline and endline for each of the intervention and control groups
Variables include blood pressure, HbA1c, eGFR, comorbidities, and demographic and behavioral factors
Analytical Workflow and Techniques:
1. Data Cleaning and Preparation
Imported all four datasets in R and standardized their column names and variable types
Labeled categorical variables for clarity (e.g., sex, education, residence)
Derived key metrics like binary DFD status, CKD stages, and comorbidity indicators
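The cleaning steps above can be sketched in base R as follows. This is a minimal illustration, not the study's actual code: the column names (sex, egfr, foot_ulcer, neuropathy), the 1/2 sex coding, and the rule used to define DFD are all assumptions made for the example. The CKD stages use the standard KDIGO eGFR cut-points.

```r
# Hypothetical raw records; column names and codes are assumptions
raw <- data.frame(
  patient_id = 1:6,
  sex        = c(1, 2, 2, 1, 2, 1),
  egfr       = c(95, 72, 50, 38, 22, 10),
  foot_ulcer = c(0, 1, 0, 0, 1, 0),
  neuropathy = c(0, 0, 1, 0, 1, 0)
)

clean <- within(raw, {
  # Label numeric codes (1 = Male, 2 = Female is an assumed coding)
  sex <- factor(sex, levels = c(1, 2), labels = c("Male", "Female"))

  # Binary DFD status; this definition is illustrative only
  dfd <- as.integer(foot_ulcer == 1 | neuropathy == 1)

  # CKD stage from eGFR (mL/min/1.73 m^2), intervals closed on the left
  ckd_stage <- cut(
    egfr,
    breaks = c(-Inf, 15, 30, 45, 60, 90, Inf),
    labels = c("G5", "G4", "G3b", "G3a", "G2", "G1"),
    right  = FALSE
  )
})

print(clean[, c("patient_id", "sex", "egfr", "ckd_stage", "dfd")])
```

Setting right = FALSE makes each interval include its lower bound, so an eGFR of exactly 60 falls in G2 rather than G3a.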
2. Data Integration
Merged data on unique patient ID
Constructed time and group indicators for repeated measures analysis
Ensured matched records for before–after comparison
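A rough sketch of the integration step, assuming each of the four datasets shares the same columns. The helper make_wave, the patient IDs, and the HbA1c values are invented for illustration. The data are stacked into long format with group and time indicators, restricted to patients seen at both timepoints, and then reshaped to one row per patient for before-after comparison.

```r
# Hypothetical helper that builds one of the four datasets
make_wave <- function(ids, group, time, hba1c) {
  data.frame(patient_id = ids, group = group, time = time, hba1c = hba1c)
}

int_base <- make_wave(c("I01", "I02", "I03"), "Intervention", "Baseline", c(9.1, 8.4, 10.2))
int_end  <- make_wave(c("I01", "I02"),        "Intervention", "Endline",  c(7.8, 7.9))
ctl_base <- make_wave(c("C01", "C02"),        "Control",      "Baseline", c(8.9, 9.5))
ctl_end  <- make_wave(c("C01", "C02"),        "Control",      "Endline",  c(8.8, 9.6))

# Long format: one row per patient per timepoint
long <- rbind(int_base, int_end, ctl_base, ctl_end)
long$group <- factor(long$group, levels = c("Control", "Intervention"))
long$time  <- factor(long$time,  levels = c("Baseline", "Endline"))

# Keep only patients observed at both timepoints
n_waves     <- table(long$patient_id)
matched_ids <- names(n_waves)[n_waves == 2]
matched     <- long[long$patient_id %in% matched_ids, ]

# Wide format: one row per patient, merged on the unique patient ID
baseline <- matched[matched$time == "Baseline", c("patient_id", "group", "hba1c")]
endline  <- matched[matched$time == "Endline",  c("patient_id", "hba1c")]
wide <- merge(baseline, endline, by = "patient_id", suffixes = c("_base", "_end"))

print(wide)
```

Patient I03 has no endline record and is dropped from the matched set. Whether to exclude such patients or handle them another way is a design decision for the study itself.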
3. Exploratory Data Analysis (EDA)
Summary statistics by group and timepoint
Visualizations: boxplots, density plots, cross-tabulations
Initial identification of risk trends
4. Research Question-Specific Modeling
RQ1: DFD Prevalence at Baseline
→ Cross-sectional analysis + chi-square tests
RQ2: Predictors of DFD
→ Logistic regression (adjusted ORs, interaction terms, multicollinearity checks)
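As a hedged illustration of the RQ2 approach, the sketch below fits a logistic regression to simulated data and reports adjusted odds ratios with Wald confidence intervals. The predictors, effect sizes, and sample size are invented. The collinearity check computes a variance inflation factor by hand for one predictor rather than relying on an add-on package.

```r
set.seed(42)  # simulated data, for illustration only
n <- 400
d <- data.frame(
  hba1c = rnorm(n, mean = 8.5, sd = 1.5),
  age   = rnorm(n, mean = 58, sd = 10),
  sex   = factor(sample(c("Female", "Male"), n, replace = TRUE))
)

# Simulate DFD status with an assumed log-odds increase of 0.5 per HbA1c unit
lin_pred <- -6 + 0.5 * d$hba1c + 0.02 * d$age
d$dfd <- rbinom(n, size = 1, prob = plogis(lin_pred))

fit <- glm(dfd ~ hba1c + age + sex, data = d, family = binomial)

# Adjusted odds ratios with Wald 95% confidence intervals
or_table <- exp(cbind(OR = coef(fit), confint.default(fit)))
print(round(or_table, 2))

# Variance inflation factor for HbA1c: 1 / (1 - R^2) from regressing it
# on the remaining predictors; values near 1 suggest little collinearity
r2_hba1c  <- summary(lm(hba1c ~ age + sex, data = d))$r.squared
vif_hba1c <- 1 / (1 - r2_hba1c)
print(vif_hba1c)
```

An interaction could be added with a formula term such as hba1c:sex. Whether it belongs in the model depends on the study's hypotheses.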
RQ3: Effectiveness of Education Intervention
→ Difference-in-Differences (DID) model
→ Paired t-tests or Wilcoxon signed-rank tests
→ Linear mixed models for clustering effects
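The difference-in-differences idea for RQ3 can be sketched with an ordinary linear model in which the group-by-time interaction is the DID estimate. The HbA1c values below are made up so that the cell means are easy to verify by hand; the outcome and the 0/1 coding are assumptions for the example.

```r
did_data <- data.frame(
  group = rep(c(0, 0, 1, 1), each = 3),  # 0 = control, 1 = intervention
  time  = rep(c(0, 1, 0, 1), each = 3),  # 0 = baseline, 1 = endline
  hba1c = c(9.0, 9.2, 8.8,   # control, baseline (mean 9.0)
            8.9, 9.1, 8.7,   # control, endline (mean 8.9)
            9.1, 9.3, 8.9,   # intervention, baseline (mean 9.1)
            8.0, 8.2, 7.8)   # intervention, endline (mean 8.0)
)

# The interaction coefficient equals
# (intervention change) - (control change) = (8.0 - 9.1) - (8.9 - 9.0)
fit <- lm(hba1c ~ group * time, data = did_data)
did_estimate <- coef(fit)[["group:time"]]
print(did_estimate)
```

This plain lm treats every row as independent. With repeated measures on the same patients, the standard errors would need to account for within-patient correlation, for example through a random intercept per patient in a mixed model such as lme4::lmer.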
RQ4: Incidence of New DFD
→ Poisson/Log-binomial regression for relative risk
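For RQ4, a Poisson model with a log link applied to a binary outcome gives relative risks directly as exponentiated coefficients. The sketch below uses an invented cohort of 100 DFD-free patients per arm; the counts are hypothetical and chosen so the result can be checked against the raw proportions.

```r
# Hypothetical cohort free of DFD at baseline: 100 patients per arm
cohort <- data.frame(
  group = factor(rep(c("Control", "Intervention"), each = 100),
                 levels = c("Control", "Intervention")),
  new_dfd = c(rep(c(1, 0), times = c(20, 80)),  # 20 of 100 new cases, control
              rep(c(1, 0), times = c(10, 90)))  # 10 of 100 new cases, intervention
)

# Poisson regression with log link; exp(coefficient) is the relative risk
fit <- glm(new_dfd ~ group, data = cohort, family = poisson(link = "log"))
rr  <- exp(coef(fit))[["groupIntervention"]]

# Matches the ratio of cumulative incidences: 0.10 / 0.20
print(rr)
```

The model-based standard errors from a Poisson fit to 0/1 data tend to be too wide, so this approach is usually paired with a robust (sandwich) variance estimator. A log-binomial model, glm with family = binomial(link = "log"), targets the same quantity but can fail to converge without starting values.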
5. Advanced Extensions (PhD-Level Depth)
Latent Class Analysis for symptom clustering
Propensity Score Matching for baseline adjustment
Multilevel modeling (site-level effects)
6. Reporting and Documentation
High-quality tables with gtsummary, visualizations via ggplot2
Full reproducibility with R Markdown / Quarto
Recommendation to publish in peer-reviewed journals
Why R Over SPSS?
Supports complex longitudinal and multilevel models
Allows automation, reproducibility, and customization
Better suited for publication-ready graphics and robust workflows
Skills and Value Demonstrated:
Designed and executed a full analytical pipeline for clinical data
Applied causal inference and advanced regression techniques
Leveraged open-source tools for reproducible public health research
Delivered a scalable template for future intervention evaluations

