Human Validation

Related to Figure 6.

This section describes independent validation of the CAAH 10-gene signature in three datasets (two human, one mouse), each from a different platform and disease context.

| Dataset | Platform | Comparison | Key result |
| --- | --- | --- | --- |
| ERCB (GSE37460/GSE37455) | Affymetrix microarray | Hypertensive nephropathy vs. healthy donors | CAAH signature elevated in glomerular (padj=0.035, AUC=0.789) and tubulointerstitial (padj=0.032, AUC=0.731) compartments |
| KPMP (Lake et al. 2023) | snRNA-seq (CellxGene Census) | CKD vs. normal kidney | CAAH signature elevated in CKD (p=0.0005; n=19 CKD, 31 normal) |
| SISKA (Kloetzer et al. 2025) | Mouse snRNA-seq atlas | Healthy < CKD < CKD+ACEi | Monotonic gradient consistent with RAAS-specific transcriptional biology |
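
As a reading aid for the AUC values in the table: an AUC is the probability that a randomly chosen case sample has a higher signature score than a randomly chosen control, with ties counted as one half. The Python sketch below illustrates that definition only. The pipeline itself computes ROC statistics in R with pROC, and the scores and the function name `roc_auc` are hypothetical.

```python
def roc_auc(case_scores, control_scores):
    """AUC as P(case score > control score), counting ties as 0.5."""
    if not case_scores or not control_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    # Normalise by the number of case/control pairs
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical per-sample signature scores, for illustration only
cases = [0.62, 0.55, 0.71, 0.48]
controls = [0.41, 0.52, 0.39, 0.50]
auc = roc_auc(cases, controls)
print(round(auc, 3))
```

An AUC of 0.5 means the score does not separate the groups; 1.0 means every case outscores every control.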

Analysis paradigm

Both snRNA-seq datasets (KPMP and SISKA) follow a two-step approach: Python extracts the data and builds pseudobulk CSVs; R handles all scoring, statistics, and figure generation.
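
The pseudobulk step can be sketched as follows: raw counts are summed over all cells from the same donor, then converted to log counts per million. This is a minimal illustration, not the pipeline's extraction code. The function name, the toy donors and genes, the prior count of 1, and the log base 2 are all assumptions, since the exact normalization settings are not stated here.

```python
import math
from collections import defaultdict


def pseudobulk_log_cpm(cells, prior_count=1.0):
    """Sum raw counts per donor, then return {donor: {gene: log2(CPM + prior_count)}}.

    cells: iterable of (donor_id, {gene: raw_count}) pairs, one pair per cell.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for donor, counts in cells:
        for gene, count in counts.items():
            totals[donor][gene] += count

    # Use the union of genes so every donor gets the same columns
    genes = sorted({gene for summed in totals.values() for gene in summed})

    result = {}
    for donor, summed in totals.items():
        library_size = sum(summed.values())
        if library_size == 0:
            continue  # skip donors with no counts rather than divide by zero
        result[donor] = {
            gene: math.log2(summed.get(gene, 0.0) / library_size * 1e6 + prior_count)
            for gene in genes
        }
    return result


# Hypothetical toy input: two cells from donor D1, one cell from donor D2
toy_cells = [
    ("D1", {"GENE_A": 3, "GENE_B": 1}),
    ("D1", {"GENE_A": 1, "GENE_B": 5}),
    ("D2", {"GENE_B": 10}),
]
pb = pseudobulk_log_cpm(toy_cells)
print(sorted(pb))
```

Each donor then becomes one row of a CSV, which is the form the R scoring steps consume.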

Initialize R

packages <- c(
  # Core data / plotting
  "svglite", "data.table", "dplyr", "ggplot2", "ggpubr", "ggrepel",
  "tibble", "tidyr", "patchwork", "scales",

  # Scoring
  "UCell", "msigdbr",

  # Differential expression and gene set enrichment
  "limma", "fgsea",

  # ROC analysis
  "pROC",

  # Human gene annotation
  "org.Hs.eg.db", "AnnotationDbi",

  # Heatmaps
  "pheatmap",

  # Parallelization
  "BiocParallel"
)

invisible(lapply(packages, function(x) {
  suppressMessages(suppressPackageStartupMessages(library(x, character.only = TRUE)))
}))

set.seed(99)

Set output directories used throughout the human validation figures and tables:

FIG_DIR <- "results/figures/human_validation/"
TAB_DIR <- "results/tables/"
# TAB_DIR already ends in "/", so paste0() avoids a doubled separator
KMP_DIR <- paste0(TAB_DIR, "kpmp/")
SIS_DIR <- paste0(TAB_DIR, "siska/")

Steps

  1. ERCB microarray analysis — R: parse GSE37460/GSE37455 series matrix files, limma normalization, UCell signature scoring, limma DE, fgsea enrichment, ROC analysis, Figures 01–06
  2. KPMP data extraction — Python: CellxGene Census API query, pseudobulk log-CPM CSVs per donor
  3. KPMP scoring + figures — R: UCell scoring, Wilcoxon + ROC stats, individual gene analysis, Figures 07–10
  4. SISKA atlas extraction — Python: subset Mouse.h5ad to three-group comparison, export pseudobulk and cell-type CSVs
  5. SISKA ACEi analysis — R: UCell scoring, Kruskal-Wallis + pairwise Wilcoxon, boxplot and cell-type heatmap figures, Figures 11–13
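
The signature scoring used in steps 1, 3, and 5 is rank-based. The sketch below shows the general idea in Python, in the spirit of a Mann-Whitney U score: rank genes within one sample, cap ranks beyond a cutoff, and rescale so that a signature sitting at the top of the ranking scores 1. It is not the UCell implementation. The function name, the tie-breaking rule, the `max_rank` value, and the toy gene names are assumptions made for illustration.

```python
def rank_signature_score(expression, signature, max_rank=1500):
    """Rank-based score for one sample.

    expression: {gene: expression value} for a single sample.
    signature:  list of gene names; genes absent from the sample are ignored.
    """
    # Rank 1 = highest expression; ties are broken by gene name (an assumption)
    ordered = sorted(expression, key=lambda gene: (-expression[gene], gene))
    rank = {gene: position + 1 for position, gene in enumerate(ordered)}

    present = [gene for gene in signature if gene in rank]
    n = len(present)
    if n == 0:
        raise ValueError("no signature genes found in the expression profile")

    # Ranks beyond the cutoff are all set to max_rank + 1
    capped = [min(rank[gene], max_rank + 1) for gene in present]

    # U statistic: rank sum minus its smallest possible value
    u_stat = sum(capped) - n * (n + 1) / 2
    return 1.0 - u_stat / (n * max_rank)


# Hypothetical six-gene sample; max_rank is set to 4 to suit the toy size
sample = {"G1": 9.0, "G2": 7.0, "G3": 5.0, "G4": 3.0, "G5": 1.0, "G6": 0.0}
high = rank_signature_score(sample, ["G1", "G2"], max_rank=4)
low = rank_signature_score(sample, ["G5", "G6"], max_rank=4)
print(high, low)
```

Because the score depends only on within-sample ranks, it can be compared across platforms with different expression scales, which is relevant when the same signature is scored on microarray and snRNA-seq pseudobulk data.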