SISKA Atlas Extraction
Related to Figure 6.
SISKA mouse kidney single-nucleus atlas (Kloetzer et al., Nature Genetics 2025). The m_humphreys_DKD sub-cohort contains three groups: healthy controls, untreated CKD, and CKD + ACEi. Download Mouse.h5ad from Zenodo (record 15007208) and place it at data/siska/Mouse.h5ad. All scoring is deferred to R; this script only saves the pseudobulk, cell-type, and per-cell metadata CSVs.
Gene signatures (mouse)
CAAH_MOUSE = ["Clu", "Lrp2", "Lamp2", "Col4a2", "Spink1",
              "Wfdc2", "Pax8", "Lyz2", "S100a9", "Cdh13"]
HYPOXIA_MOUSE = ["Hif1a", "Epas1", "Epo", "Epor", "Vegfa",
                 "Aldoa", "Ldha", "Pgk1", "Slc2a1", "Eno1",
                 "Angptl4", "Hilpda", "Bnip3"]
GLYCOLYSIS_MOUSE = ["Hk2", "Pfkm", "Pkm", "Ldha", "Hk1",
                    "Aldoa", "Gapdh", "Pgk1", "Pgam1", "Eno1",
                    "Tpi1", "Gpi", "Ldhb"]
ECM_MOUSE = ["Col1a1", "Col1a2", "Col3a1", "Col4a1", "Col4a2",
             "Col12a1", "Fn1", "Timp1", "Timp2", "Mmp2",
             "Acta2", "Vim", "Tgfb1", "Postn", "Thbs1"]
# Union of all signatures; sorted so the CSV column order is reproducible across runs.
ALL_GENES = sorted(set(CAAH_MOUSE + HYPOXIA_MOUSE + GLYCOLYSIS_MOUSE + ECM_MOUSE))
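Because the four signatures are merged through a set, genes that appear in more than one list are stored only once in ALL_GENES. The following is a small illustrative sketch (the `SIGNATURES` dictionary and the `shared_genes` helper are names introduced here, not part of the original script) that lists which genes are shared between signatures, which is worth knowing before the scores are compared in R.

```python
from itertools import combinations

# Signature lists copied from the section above; the dict wrapper is an
# illustrative convenience, not part of the original script.
SIGNATURES = {
    "CAAH": ["Clu", "Lrp2", "Lamp2", "Col4a2", "Spink1",
             "Wfdc2", "Pax8", "Lyz2", "S100a9", "Cdh13"],
    "HYPOXIA": ["Hif1a", "Epas1", "Epo", "Epor", "Vegfa",
                "Aldoa", "Ldha", "Pgk1", "Slc2a1", "Eno1",
                "Angptl4", "Hilpda", "Bnip3"],
    "GLYCOLYSIS": ["Hk2", "Pfkm", "Pkm", "Ldha", "Hk1",
                   "Aldoa", "Gapdh", "Pgk1", "Pgam1", "Eno1",
                   "Tpi1", "Gpi", "Ldhb"],
    "ECM": ["Col1a1", "Col1a2", "Col3a1", "Col4a1", "Col4a2",
            "Col12a1", "Fn1", "Timp1", "Timp2", "Mmp2",
            "Acta2", "Vim", "Tgfb1", "Postn", "Thbs1"],
}


def shared_genes(signatures):
    """Return {(name_a, name_b): sorted shared genes} for overlapping pairs."""
    out = {}
    for (a, genes_a), (b, genes_b) in combinations(signatures.items(), 2):
        common = sorted(set(genes_a) & set(genes_b))
        if common:
            out[(a, b)] = common
    return out


overlaps = shared_genes(SIGNATURES)
n_unique = len(set().union(*SIGNATURES.values()))

for pair, genes in overlaps.items():
    print(pair, genes)
print("unique genes:", n_unique)
```

With the lists as given, the hypoxia and glycolysis signatures share Aldoa, Eno1, Ldha and Pgk1, and Col4a2 appears in both the CAAH and ECM signatures, so the 51 list entries collapse to 46 unique genes.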
Load the atlas and join treatment metadata
The h5ad obs table contains orig_ident (sample ID) and proj (sub-cohort), but the treatment labels live in a separate metadata CSV. The two are joined on orig_ident, so every cell inherits the labels of its sample.
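import numpy as np
import pandas as pd
import scanpy as sc
import scipy.sparse as sp

# H5AD, META_CSV and OUT_DIR are path constants assumed to be defined earlier.
adata = sc.read_h5ad(H5AD)
meta = pd.read_csv(META_CSV)

# Map each sample-level label onto cells via the shared sample ID.
meta_by_sample = meta.set_index("orig_ident")
for col in ["treated", "condition_harmonized", "disease"]:
    adata.obs[col] = adata.obs["orig_ident"].map(meta_by_sample[col])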
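A map-based join leaves NaN for any cell whose sample ID is missing from the metadata CSV, and it fails if the CSV lists a sample twice. The sketch below shows one way to check for both cases on toy data; the `map_sample_column` helper and the sample IDs are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Toy stand-ins for adata.obs and the metadata CSV (made-up sample IDs).
obs = pd.DataFrame({"orig_ident": ["s1", "s1", "s2", "s3"]})
meta = pd.DataFrame({
    "orig_ident": ["s1", "s2"],
    "treated": ["ACEi", "Control_healthy"],
})


def map_sample_column(obs, meta, col, key="orig_ident"):
    """Map a sample-level metadata column onto cells, refusing duplicate keys."""
    if meta[key].duplicated().any():
        raise ValueError(f"duplicate {key} values in metadata")
    lookup = meta.set_index(key)[col]
    return obs[key].astype(str).map(lookup)


treated = map_sample_column(obs, meta, "treated")

# Samples present in obs but absent from the metadata end up as NaN.
unmatched = sorted(obs.loc[treated.isna(), "orig_ident"].unique())
print("unmatched samples:", unmatched)
```

In this pipeline unmatched cells are dropped later by the isin filter on treated, so a check like this mainly guards against losing a whole sample silently.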
Subset to the three-group comparison
keep = (
    (adata.obs["proj"] == "m_humphreys_DKD") &
    (adata.obs["treated"].isin(["Control_healthy", "Control_diseased", "ACEi"]))
)
adata_sub = adata[keep].copy()

GROUP_MAP = {
    "Control_healthy": "Healthy",
    "Control_diseased": "CKD",
    "ACEi": "CKD + ACEi",
}
adata_sub.obs["Group"] = adata_sub.obs["treated"].map(GROUP_MAP)
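After relabelling, it can be useful to confirm that every cell received a Group and to see how many samples and cells each group contributes. The following is a minimal sketch on a toy obs table with made-up sample IDs; only GROUP_MAP is taken from the script.

```python
import pandas as pd

GROUP_MAP = {
    "Control_healthy": "Healthy",
    "Control_diseased": "CKD",
    "ACEi": "CKD + ACEi",
}

# Toy per-cell table standing in for adata_sub.obs (made-up sample IDs).
obs = pd.DataFrame({
    "orig_ident": ["s1", "s1", "s2", "s3", "s3", "s4"],
    "treated": ["Control_healthy", "Control_healthy", "Control_diseased",
                "ACEi", "ACEi", "ACEi"],
})
obs["Group"] = obs["treated"].map(GROUP_MAP)

# Any label missing from GROUP_MAP would show up as NaN here.
unmapped = int(obs["Group"].isna().sum())

# Samples and cells contributed by each group.
summary = obs.groupby("Group").agg(
    n_samples=("orig_ident", "nunique"),
    n_cells=("orig_ident", "count"),
)
print(summary)
```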
Save per-cell metadata
The cell-level metadata CSV is used in R for cell-type breakdown figures. Cell type labels come from annotation_final_level1.
cell_meta_cols = ["orig_ident", "Group", "treated",
                  "annotation_final_level1", "annotation_final_level1B",
                  "nCount_RNA", "nFeature_RNA"]
cell_meta_cols = [c for c in cell_meta_cols if c in adata_sub.obs.columns]
adata_sub.obs[cell_meta_cols].reset_index(drop=True).to_csv(
    OUT_DIR / "siska_mouse_cell_meta.csv", index=False
)
Pseudobulk construction
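Counts are summed per sample (orig_ident), restricted to the signature genes to keep the CSV manageable. The summed counts are then scaled to counts per million and log1p-transformed. Note that the scaling denominator is each sample's total over the signature genes, not its full library size.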
gene_names = list(adata_sub.var_names)
found = [g for g in ALL_GENES if g in gene_names]  # signature genes present in the atlas
sig_idx = [gene_names.index(g) for g in found]

# Subset to the signature genes before densifying, so the full matrix is never held densely.
X_sig = adata_sub.X[:, sig_idx]
X_sig = X_sig.toarray() if sp.issparse(X_sig) else np.asarray(X_sig)

obs = adata_sub.obs.reset_index(drop=True)
expr_df = pd.DataFrame(X_sig, columns=found)
expr_df["orig_ident"] = obs["orig_ident"].astype(str).values
expr_df["Group"] = obs["Group"].astype(str).values

# Sum counts per sample.
pb = expr_df.groupby(["orig_ident", "Group"], observed=True).sum().reset_index()
pb_meta = pb[["orig_ident", "Group"]].copy()
pb_expr = pb.drop(columns=["orig_ident", "Group"])

# Scale to counts per million and log1p-transform. row_sums is the total over
# the signature genes only, not the full library size.
row_sums = pb_expr.sum(axis=1)
pb_norm = np.log1p(pb_expr.div(row_sums, axis=0) * 1e6)

pd.concat([pb_meta.reset_index(drop=True),
           pb_norm.reset_index(drop=True)], axis=1).to_csv(
    OUT_DIR / "siska_mouse_pseudobulk_expr.csv", index=False
)
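The normalization above divides each sample by its total over the signature genes. As a toy illustration of what that choice means, the sketch below contrasts it with a full-library denominator. The counts and the gene names GeneA, GeneB and GeneC are made up, with GeneC standing in for all non-signature genes; this is not a claim about which denominator the downstream R analysis expects.

```python
import numpy as np
import pandas as pd

# Toy counts: 4 cells x 3 genes, two cells per sample.
cells = pd.DataFrame({
    "GeneA": [1, 3, 0, 2],
    "GeneB": [1, 1, 4, 0],
    "GeneC": [8, 6, 6, 8],
})
cells["orig_ident"] = ["s1", "s1", "s2", "s2"]

# Pseudobulk: sum counts per sample.
pb_all = cells.groupby("orig_ident").sum()
signature = ["GeneA", "GeneB"]
pb_sig = pb_all[signature]

lib_size = pb_all.sum(axis=1)  # total over every gene
sig_size = pb_sig.sum(axis=1)  # total over signature genes only

# Same numerator, two different denominators.
cpm_lib = pb_sig.div(lib_size, axis=0) * 1e6
cpm_sig = pb_sig.div(sig_size, axis=0) * 1e6

log_cpm_lib = np.log1p(cpm_lib)
log_cpm_sig = np.log1p(cpm_sig)

print(cpm_lib)
print(cpm_sig)
```

With the signature-only denominator, each sample's values sum to one million across the signature genes, so they describe the composition within the signature set rather than expression relative to the whole transcriptome.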
Cell-type × sample mean expression
Used in R for the cell-type heatmap. Mean raw expression (not log-CPM) is computed per cell type per sample; the R script applies its own normalization for the heatmap.
expr_df_ct = pd.DataFrame(X_sig, columns=found)
expr_df_ct["orig_ident"] = obs["orig_ident"].astype(str).values
expr_df_ct["Group"] = obs["Group"].astype(str).values
expr_df_ct["cell_type"] = obs["annotation_final_level1"].astype(str).values

ct_pb = expr_df_ct.groupby(
    ["orig_ident", "Group", "cell_type"], observed=True
).mean().reset_index()
ct_pb.to_csv(OUT_DIR / "siska_mouse_celltype_expr.csv", index=False)

expr_df_ct.groupby(
    ["orig_ident", "Group", "cell_type"], observed=True
).size().reset_index(name="n_cells").to_csv(
    OUT_DIR / "siska_mouse_celltype_counts.csv", index=False
)
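A mean computed from only a handful of cells is noisy, which is one plausible reason to save the counts next to the means. The sketch below, on toy data, joins the two tables and keeps only cell type × sample combinations above a cell-count threshold. The `MIN_CELLS` value, the sample IDs and the cell-type labels are all hypothetical, and the original workflow may handle this differently in R.

```python
import pandas as pd

keys = ["orig_ident", "Group", "cell_type"]

# Toy stand-ins for the two CSVs written above (made-up values).
ct_expr = pd.DataFrame({
    "orig_ident": ["s1", "s1", "s2"],
    "Group": ["Healthy", "Healthy", "CKD"],
    "cell_type": ["PT", "Podo", "PT"],
    "Clu": [0.5, 2.0, 1.5],
})
ct_counts = pd.DataFrame({
    "orig_ident": ["s1", "s1", "s2"],
    "Group": ["Healthy", "Healthy", "CKD"],
    "cell_type": ["PT", "Podo", "PT"],
    "n_cells": [120, 3, 80],
})

# Each key combination should appear exactly once in both tables.
merged = ct_expr.merge(ct_counts, on=keys, validate="one_to_one")

MIN_CELLS = 10  # hypothetical threshold, chosen only for illustration
reliable = merged[merged["n_cells"] >= MIN_CELLS].reset_index(drop=True)
print(reliable)
```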
Outputs
| File | Description |
|---|---|
| siska_mouse_pseudobulk_expr.csv | log-CPM per sample × gene |
| siska_mouse_cell_meta.csv | per-cell metadata (cell type, group, QC) |
| siska_mouse_celltype_expr.csv | mean expression per cell type × sample |
| siska_mouse_celltype_counts.csv | cell counts per cell type × sample |