Human Validation
Related to Figure 6.
This section describes independent validation of the CAAH 10-gene signature in three datasets (two human, one mouse), which together span different platforms and disease contexts.
| Dataset | Platform | Comparison | Key result |
|---|---|---|---|
| ERCB (GSE37460/GSE37455) | Affymetrix microarray | Hypertensive nephropathy vs. healthy donors | CAAH signature elevated in glomerular (padj=0.035, AUC=0.789) and tubulointerstitial (padj=0.032, AUC=0.731) compartments |
| KPMP (Lake et al. 2023) | snRNA-seq (CellxGene Census) | CKD vs. normal kidney | CAAH signature elevated in CKD (p=0.0005, n=19 CKD / 31 normal) |
| SISKA (Kloetzer et al. 2025) | Mouse snRNA-seq atlas | Healthy < CKD < CKD+ACEi | Monotonic gradient consistent with RAAS-specific transcriptional biology |
Analysis paradigm
All single-cell/snRNA-seq datasets follow a two-step approach: Python extracts data and builds pseudobulk CSVs; R handles all scoring, statistics, and figure generation.
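The Python half of this paradigm can be illustrated with a small, dependency-free sketch. It assumes that pseudobulking means summing raw counts over all nuclei from one donor and then converting to log2(CPM + 1); the source states only that "pseudobulk log-CPM CSVs per donor" are produced, so the pseudocount, the log base, the function names, and the toy donors and genes below are all assumptions made for illustration, not the pipeline's actual code.

```python
import csv
import io
import math
from collections import defaultdict


def pseudobulk_log_cpm(cells, pseudocount=1.0):
    """Sum raw counts per donor, then convert to log2(CPM + pseudocount).

    `cells` is an iterable of (donor_id, {gene: raw_count}) pairs, one per nucleus.
    Returns {donor_id: {gene: log_cpm}}.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for donor, counts in cells:
        for gene, count in counts.items():
            totals[donor][gene] += count

    # Use the union of genes so every donor gets the same rows.
    genes = sorted({g for per_donor in totals.values() for g in per_donor})
    result = {}
    for donor, per_donor in totals.items():
        library_size = sum(per_donor.values())
        if library_size == 0:
            continue  # a donor without any counts cannot be normalized
        result[donor] = {
            g: math.log2(per_donor.get(g, 0.0) / library_size * 1e6 + pseudocount)
            for g in genes
        }
    return result


def to_csv(log_cpm):
    """Serialize as a genes x donors CSV string (one column per donor)."""
    donors = sorted(log_cpm)
    genes = sorted(next(iter(log_cpm.values()))) if log_cpm else []
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["gene"] + donors)
    for g in genes:
        writer.writerow([g] + [f"{log_cpm[d][g]:.4f}" for d in donors])
    return buf.getvalue()


# Toy input (hypothetical): three nuclei from two donors.
toy_cells = [
    ("donor_A", {"GENE1": 3, "GENE2": 1}),
    ("donor_A", {"GENE1": 1, "GENE3": 5}),
    ("donor_B", {"GENE2": 10}),
]
toy_log_cpm = pseudobulk_log_cpm(toy_cells)
toy_csv = to_csv(toy_log_cpm)
```

Writing one genes x donors table per dataset keeps the handoff to R simple, since the scoring and statistics then operate on a plain matrix with one column per donor.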
Initialize R
packages <- c(
  # Core data / plotting
  "svglite", "data.table", "dplyr", "ggplot2", "ggpubr", "ggrepel",
  "tibble", "tidyr", "patchwork", "scales",
  # Scoring
  "UCell", "msigdbr",
  # Differential expression and gene set enrichment
  "limma", "fgsea",
  # ROC analysis
  "pROC",
  # Human gene annotation
  "org.Hs.eg.db", "AnnotationDbi",
  # Heatmaps
  "pheatmap",
  # Parallelization
  "BiocParallel"
)
# Attach every package quietly
invisible(lapply(packages, function(x) {
  suppressMessages(suppressPackageStartupMessages(library(x, character.only = TRUE)))
}))
# Fix the random seed for reproducibility
set.seed(99)
Set output directories used throughout the human validation figures and tables:
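# Paths keep a trailing slash
FIG_DIR <- "results/figures/human_validation/"
TAB_DIR <- "results/tables/"
# paste0() avoids the doubled slash that file.path() yields when TAB_DIR already ends in "/"
KMP_DIR <- paste0(TAB_DIR, "kpmp/")
SIS_DIR <- paste0(TAB_DIR, "siska/")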
Steps
- ERCB microarray analysis — R: parse GSE37460/GSE37455 series matrix files, limma normalization, UCell signature scoring, limma DE, fgsea enrichment, ROC analysis, Figures 01–06
- KPMP data extraction — Python: CellxGene Census API query, pseudobulk log-CPM CSVs per donor
- KPMP scoring + figures — R: UCell scoring, Wilcoxon + ROC stats, individual gene analysis, Figures 07–10
- SISKA atlas extraction — Python: subset Mouse.h5ad to three-group comparison, export pseudobulk and cell-type CSVs
- SISKA ACEi analysis — R: UCell scoring, Kruskal-Wallis + pairwise Wilcoxon, boxplot and cell-type heatmap figures, Figures 11–13
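Several steps above pair a Wilcoxon test with a ROC analysis. The two are closely linked: the AUC equals the Mann-Whitney U statistic divided by the product of the group sizes, AUC = U / (n_cases × n_controls). The pipeline computes these in R; the Python sketch below is only an illustration of that identity, with hypothetical function names and made-up per-donor scores.

```python
def midranks(values):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def auc_from_rank_sum(case_scores, control_scores):
    """AUC for 'cases score higher than controls', via the rank-sum U statistic."""
    n1, n0 = len(case_scores), len(control_scores)
    if n1 == 0 or n0 == 0:
        raise ValueError("both groups need at least one score")
    ranks = midranks(list(case_scores) + list(control_scores))
    rank_sum_cases = sum(ranks[:n1])
    u = rank_sum_cases - n1 * (n1 + 1) / 2
    return u / (n1 * n0)


# Hypothetical signature scores, one value per donor.
toy_cases = [0.62, 0.71, 0.55, 0.80]
toy_controls = [0.40, 0.58, 0.35]
toy_auc = auc_from_rank_sum(toy_cases, toy_controls)
```

Here 11 of the 12 case-control pairs are ordered with the case above the control, so the toy AUC is 11/12. Reading the AUC this way, as the probability that a randomly chosen case outscores a randomly chosen control, makes it easy to interpret alongside the Wilcoxon p-value.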