2023 Mar 20;55(4):194–212. doi: 10.1152/physiolgenomics.00144.2022

Table 2.

Computational workflow and dimensionality for analyzed data streams

| Transcript-Level Differential Expression Analysis | Skeletal Muscle: Long RNA | Skeletal Muscle: Small RNA | Skeletal Muscle: circRNA | Serum EVs: Long RNA | Serum EVs: Small RNA | Serum EVs: circRNA |
| --- | --- | --- | --- | --- | --- | --- |
| Raw counts (transcripts × samples) | 94,946 × 120 | 85,542 × 120 | 39,577 × 120 | 94,946 × 148 | 85,542 × 154 | 197,605 × 148 |
| Filtering steps: | | | | | | |
| 1. to remove participants missing pre-exercise sample | 94,946 × 120 | 85,542 × 120 | 39,577 × 120 | 94,946 × 140 | 85,542 × 154 | 197,605 × 140 |
| 2. by biotype (based on length at transcript maturity) | 86,636 × 120 | 5,398 × 120 | 39,577 × 120 | 86,636 × 140 | 5,398 × 154 | 197,605 × 140 |
| 3. for expression (cpm >0) in ≥25% of samples | 32,295 × 120 | 2,673 × 120 | 1,812 × 120 | 62,772 × 140 | 1,313 × 154 | 1,120 × 140 |
| Batch effect identification (adjusted with ComBat_seq if needed): Adjusted | | | | | | |
| Outlier (±6 MAD from median) identification | 32,295 × 120 | 2,673 × 120 | 1,812 × 120 | 62,772 × 133 | 1,313 × 146 | 1,120 × 140 |
| Filtering by expression (long RNA only) | 23,329 × 121 | | | 57,025 × 133 | | |
| limma differential expression analysis with voom correction | | | | | | |
| Exploratory Transcriptomic Network Discovery | | | | | | |
| FDR < 0.2 at any time point | 19,444 | 1,728 | 148 | 55,325 | 772 | 310 |

| Exploratory Transcriptomic Network Discovery (per tissue) | Skeletal Muscle | Serum EVs |
| --- | --- | --- |
| All biotype counts collapsed within tissue & filtered for shared samples | 21,320 × 120 | 56,407 × 128 |
| PLIER singular value decomposition: n LVs generated (n showing main effect of exercise) | 40 (23) | 32 (4) |

See text for complete workflow, including steps that do not affect data dimensionality. cpm, counts per million; EVs, extracellular vesicles; FDR, false discovery rate; LVs, latent variables; MAD, mean absolute deviation; PLIER, Pathway-Level information extractor.
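
Two of the steps in Table 2, the expression filter (cpm >0 in ≥25% of samples) and the outlier screen (±6 MAD from the median), can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the function names are hypothetical, MAD follows the footnote's definition (mean absolute deviation, taken here about the median), and the per-sample summary value that is screened for outliers is an assumption, since the table does not say which statistic was used. The last lines check that the per-tissue collapsed transcript counts equal the sum of the FDR-filtered counts across biotypes, using only numbers from the table.

```python
from statistics import median


def counts_per_million(counts):
    """Scale a transcripts x samples matrix (list of rows) to counts per million."""
    n_samples = len(counts[0])
    library_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [
        # A sample with zero total counts gets cpm 0.0 to avoid dividing by zero.
        [1e6 * row[j] / library_sizes[j] if library_sizes[j] else 0.0
         for j in range(n_samples)]
        for row in counts
    ]


def filter_by_expression(counts, min_fraction=0.25):
    """Return row indices of transcripts with cpm > 0 in >= min_fraction of samples."""
    cpm = counts_per_million(counts)
    n_samples = len(counts[0])
    keep = []
    for i, row in enumerate(cpm):
        n_expressed = sum(1 for value in row if value > 0)
        if n_expressed >= min_fraction * n_samples:
            keep.append(i)
    return keep


def flag_outliers(values, n_mad=6.0):
    """Return indices of values lying more than n_mad MADs from the median.

    MAD is computed as the mean absolute deviation about the median
    (assumption based on the table footnote).
    """
    center = median(values)
    mad = sum(abs(v - center) for v in values) / len(values)
    if mad == 0:
        return []  # no spread at all, so nothing can be flagged
    return [i for i, v in enumerate(values) if abs(v - center) > n_mad * mad]


# Consistency check on Table 2: FDR-filtered counts per biotype
# (long RNA, small RNA, circRNA) versus the collapsed per-tissue counts.
fdr_filtered = {
    "Skeletal Muscle": (19444, 1728, 148),
    "Serum EVs": (55325, 772, 310),
}
collapsed = {"Skeletal Muscle": 21320, "Serum EVs": 56407}
collapsed_matches = {
    tissue: sum(fdr_filtered[tissue]) == collapsed[tissue]
    for tissue in collapsed
}
```

For example, `filter_by_expression` could be applied to each biotype's count matrix before modeling, and `flag_outliers` to one summary value per sample (a hypothetical choice); `collapsed_matches` is true for both tissues, so the collapsed feature counts in the table are the sums of the three FDR-filtered biotype counts.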