eLife. 2021 Nov 16;10:e72129. doi: 10.7554/eLife.72129

Standardizing workflows in imaging transcriptomics with the abagen toolbox

Ross D Markello 1, Aurina Arnatkeviciute 2, Jean-Baptiste Poline 1, Ben D Fulcher 3, Alex Fornito 2, Bratislav Misic 1
Editors: Saad Jbabdi4, Tamar R Makin5
PMCID: PMC8660024  PMID: 34783653

Abstract

Gene expression fundamentally shapes the structural and functional architecture of the human brain. Open-access transcriptomic datasets like the Allen Human Brain Atlas provide an unprecedented ability to examine these mechanisms in vivo; however, a lack of standardization across research groups has given rise to myriad processing pipelines for using these data. Here, we develop the abagen toolbox, an open-access software package for working with transcriptomic data, and use it to examine how methodological variability influences the outcomes of research using the Allen Human Brain Atlas. Applying three prototypical analyses to the outputs of 750,000 unique processing pipelines, we find that choice of pipeline has a large impact on research findings, with parameters commonly varied in the literature influencing correlations between derived gene expression and other imaging phenotypes by as much as ρ ≥ 1.0. Our results further reveal an ordering of parameter importance, with processing steps that influence gene normalization yielding the greatest impact on downstream statistical inferences and conclusions. The presented work and the development of the abagen toolbox lay the foundation for more standardized and systematic research in imaging transcriptomics, and will help to advance future understanding of the influence of gene expression in the human brain.

Research organism: Human

Introduction

Technologies like magnetic resonance imaging (MRI) provide unique insights into macroscopic brain structure and function in vivo. Modern research increasingly emphasizes how microscale attributes, such as gene expression, influence these imaging-derived phenotypes (Fornito et al., 2019; Arnatkeviciute et al., 2019; Arnatkevičiūtė et al., 2021). Gene expression is particularly useful as it is a fundamental molecular phenotype that can be plausibly linked to the function of biological pathways (Whitaker et al., 2016; Seidlitz et al., 2018), protein synthesis (Zheng et al., 2019), receptor distributions (Beliveau et al., 2017; Nørgaard et al., 2021; Shine et al., 2019; Deco et al., 2020; Preller et al., 2018), and cell types (Hansen et al., 2021; Anderson et al., 2020b; Anderson et al., 2018; Seidlitz et al., 2020; Gao et al., 2020). However, researchers looking to bridge these macro- and microscopic phenotypes must overcome multiple challenges. Although there are numerous technical and analytic considerations, one foundational issue is that acquiring high-quality transcriptomic data from the human brain is both costly and highly invasive: it requires budgets far greater than those of most neuroimaging studies, as well as access to tissue from post-mortem donors or cranial surgical patients, which is highly restricted. As such, researchers must often rely on freely available repositories of gene expression data.

There exist multiple open-access repositories for gene expression in the human brain, including BrainSpan (Miller et al., 2014; Kang et al., 2011) and PsychENCODE (Gandal et al., 2018; Li et al., 2018; Wang et al., 2018; among others: Sousa et al., 2017; Darmanis et al., 2015; Lake et al., 2016); however, these datasets generally provide relatively sparse anatomical coverage, limiting the types of analyses that can be performed. Thus, researchers who aim to compare transcriptomic expression with whole-brain imaging-derived phenotypes have primarily relied on the Allen Human Brain Atlas (AHBA; Hawrylycz et al., 2012; Hawrylycz et al., 2015). Initially released in 2010, the AHBA remains the most spatially comprehensive dataset of its kind. Derived from bulk microarray analysis of tissue samples obtained from six donors, the AHBA provides expression data for more than 20,000 genes across 3702 tissue samples in MRI-derived stereotactic space. With its superior spatial coverage, the AHBA has significantly contributed to the emergence of the field of imaging transcriptomics (Fornito et al., 2019), enabling dozens of studies over the past decade examining relationships between gene expression and an array of macroscale imaging attributes, including cortical thickness (Shin et al., 2018), myelination (Burt et al., 2018), developmental brain maturation (Whitaker et al., 2016; Kirsch and Chechik, 2016), structural brain networks (Seidlitz et al., 2018; Romero-Garcia et al., 2018; Arnatkevičiūtė et al., 2020), functional brain networks (Richiardi et al., 2015; Krienen et al., 2016; Vértes et al., 2016), and human cognition (Fox et al., 2014; Hansen et al., 2021).
The AHBA has also highlighted the importance of whole-brain gene expression in neurological and psychiatric diseases, where it has become increasingly clear that transcriptional pathways play a critical role in shaping the broader dynamics of disease progression and emergent symptomatology (Zheng et al., 2019; Shafiei et al., 2021; Henderson et al., 2019; Vogel et al., 2020; Rittman et al., 2016; Anderson et al., 2020a; Romme et al., 2017; McColgan et al., 2018; Morgan et al., 2019).

Since its release, several software toolboxes have been developed to help researchers use transcriptional data from the AHBA (French and Paus, 2015; Gorgolewski et al., 2015; Rittman et al., 2017; Rizzo et al., 2016); however, these tools often focus primarily on integrating the AHBA with neuroimaging data, offering limited, if any, functionality for modifying how the data are processed prior to analysis. Indeed, a recent comprehensive review revealed that many research groups have instead opted to develop their own processing pipelines for the AHBA (Arnatkeviciute et al., 2019). Unfortunately, as there are no field-accepted standards for processing imaging transcriptomic data, these pipelines vary substantially across groups.

The extent to which such processing variability affects analytic outcomes from the AHBA remains unknown. Indeed, over the past decade neuroimaging research has shown that methodological variability can have broad influences on analyses using structural MRI (Bhagwat et al., 2021; Kharabian Masouleh et al., 2020), diffusion MRI (Oldham et al., 2020; Maier-Hein et al., 2017; Schilling et al., 2019), task fMRI (Carp, 2012; Botvinik-Nezer et al., 2020), and resting-state fMRI (Parkes et al., 2018; Ciric et al., 2017). Although researchers are beginning to grapple with the consequences of this variability, the lack of baseline gene expression datasets against which to compare new results impedes the development of standardized practices. In these situations, some researchers have proposed performing ‘multiverse’ analyses (Steegen et al., 2016; Dragicevic et al., 2019), wherein all reasonable combinations of processing choices are analyzed and the full range of analytic results is reported. Although such analyses can be computationally intensive, they offer a path to understanding how processing choices impact statistical inferences and conclusions, and provide a mechanism by which to help researchers converge on an optimal pipeline.

Here, we comprehensively investigate how different processing choices influence the results of analyses using the AHBA. First, we develop an open-source Python toolbox, abagen, that collates all possible processing parameters into a set of turn-key workflows, optimized for flexibility and ease-of-use. We then use the toolbox to process the AHBA through approximately 750,000 unique pipelines. Across three prototypical imaging transcriptomic analyses, we examine whether and how these different processing options modify derived statistical estimates and quantify the relative importance of each option. Next, we replicate a curated set of processing pipelines from the literature to assess how previously reported findings compare to the full range of potential outcomes observed across all examined pipelines. Finally, we end with a set of recommendations, integrated directly into the developed abagen toolbox, to promote standardized use of the AHBA in future work.

Results

We introduce the abagen toolbox, an open-access software package designed to streamline processing and preparation of the AHBA for integration with neuroimaging data (Markello et al., 2021c, available at https://github.com/rmarkello/abagen; Markello, 2021b copy archived at swh:1:rev:2aeab5bd0f147fa76b488645e148a1c18095378d). Supporting several workflows, abagen offers functionality for an array of analyses and has already been used in several peer-reviewed publications and preprints (Shafiei et al., 2020; Hansen et al., 2021; Shafiei et al., 2021; Brown et al., 2021; Park et al., 2021; Valk et al., 2021; Zhao et al., 2020; Benkarim et al., 2020; Ding et al., 2021; Park et al., 2020; Lariviere et al., 2020; Martins et al., 2021). The primary workflow, used to generate regional gene expression matrices, integrates 17 distinct processing steps that have previously been employed by research groups throughout the published literature (Table 1). We refer to each unique set of processing choices and parameters as a ‘pipeline’. The following results use abagen to investigate how variable application of these processing steps can impact analyses of AHBA data.

Table 1. Abagen pipeline options.

Overview of 17 options to be considered when processing the AHBA data. The Choices column indicates the number of parameters explored in the current report (numerator) and the total number of parameters possible for the given option (denominator). A denominator of n indicates a hypothetically near-infinite parameter space. The Description column gives a brief overview of the processing choice; for more detail refer to the relevant section in Materials and methods: Gene expression pipelines.

| Option | Choices | Description |
| --- | --- | --- |
| Volumetric or surface atlas | 2/2 | Whether to use a volumetric or surface representation of the atlas |
| Individualized or group atlas | 1/2 | Whether to use individualized donor-specific atlases or a group-level atlas |
| Use non-linear MNI coordinates | 2/2 | Whether to use updated MNI coordinates provided by the alleninf package |
| Mirror samples across L/R hemisphere | 3/4 | Whether to mirror (i.e., duplicate) samples across the hemisphere boundary |
| Update probe-to-gene annotations | 2/2 | Whether to update probe annotations |
| Intensity-based filtering threshold | 3/n | Threshold for intensity-based filtering of probes |
| Inter-areal similarity threshold | 1/n | Threshold for removing samples with low inter-areal correspondence |
| Probe selection method | 6/8 | Method by which to select which probe(s) should represent a given gene |
| Donor-specific probe selection | 3/3 | How specified probe selection should integrate data from different donors |
| Missing data method | 2/3 | How to handle brain regions that are not assigned expression data |
| Sample-to-region matching tolerance | 3/n | Distance tolerance for matching tissue samples to atlas brain regions |
| Sample normalization method | 3/10 | Method for normalizing tissue samples (across genes) |
| Gene normalization method | 3/10 | Method for normalizing genes (across tissue samples) |
| Normalize only matched samples | 2/2 | Whether to perform gene normalization for all versus matched samples |
| Normalizing discrete structures | 2/2 | Whether to perform gene normalization within structural classes |
| Sample-to-region combination method | 2/2 | Whether to aggregate tissue samples in regions within or across donors |
| Sample-to-region combination metric | 2/2 | Metric for aggregating tissue samples into atlas brain regions |

Processing choices influence transcriptomic analyses

To understand how choices made during the processing of AHBA data impact downstream analyses, we enumerated 17 decision points (i.e. processing steps or options) that have been modified and used in the literature (Table 1). From these 17 steps we implemented 746,496 distinct processing pipelines, where each pipeline parcellated microarray expression from the AHBA with the Desikan-Killiany atlas (Desikan et al., 2006) to generate a unique brain region-by-gene expression matrix.
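As a consistency check on the reported count, the sketch below multiplies the numerators in Table 1. Taken at face value they give 1,119,744 combinations; collapsing the donor-specific probe selection option for the three probe selection methods that make it redundant (see Results: Variability in parameter importance) yields exactly 746,496. This reconciliation is our reading of Table 1, not a description of how the pipelines were enumerated in abagen, and the option keys and method indices below are hypothetical labels.

```python
from itertools import product
from math import prod

# Numerators from Table 1 for every option except the two probe-selection
# options, which interact (keys are hypothetical labels).
OTHER_CHOICES = {
    "volumetric_or_surface": 2,
    "individualized_or_group": 1,
    "nonlinear_mni": 2,
    "lr_mirror": 3,
    "reannotate_probes": 2,
    "intensity_threshold": 3,
    "similarity_threshold": 1,
    "missing_data": 2,
    "tolerance": 3,
    "sample_norm": 3,
    "gene_norm": 3,
    "norm_matched": 2,
    "norm_structures": 2,
    "region_agg": 2,
    "agg_metric": 2,
}
N_PROBE_METHODS = 6
N_DONOR_OPTIONS = 3
# Hypothetical indices of the three probe-selection methods for which the
# donor-specific option has no effect; which three does not change the count.
REDUNDANT_METHODS = {3, 4, 5}


def naive_count() -> int:
    """Treat all 17 options as fully crossed."""
    return prod(OTHER_CHOICES.values()) * N_PROBE_METHODS * N_DONOR_OPTIONS


def effective_count() -> int:
    """Collapse donor options that are redundant for some probe methods."""
    probe_pairs = {
        (method, None if method in REDUNDANT_METHODS else donor)
        for method, donor in product(range(N_PROBE_METHODS), range(N_DONOR_OPTIONS))
    }
    return prod(OTHER_CHOICES.values()) * len(probe_pairs)


print(naive_count())      # 1119744
print(effective_count())  # 746496
```

Of the 18 crossed (probe method, donor option) pairs, only 3 × 3 + 3 × 1 = 12 are distinct, which scales 1,119,744 by 12/18.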

Analyses of expression data from the AHBA can be grouped into one of three broad classes (Fornito et al., 2019): correlated gene expression analyses, gene co-expression analyses, and regional gene expression analyses. Correlated gene expression analyses examine the correlation between brain regions across genes, yielding a symmetric region × region matrix (similar to a functional connectivity matrix). Gene co-expression analyses, on the other hand, examine the correlation between genes across brain regions, yielding a symmetric gene × gene matrix. Finally, regional gene expression analyses examine the expression patterns of specific genes or gene sets in relation to other imaging-derived phenotypes.

To examine how differences in processing choices may impact both the expression matrices generated from the different pipelines and the derived statistical estimates, we ran one analysis from each of these classes on the matrices generated by each processing pipeline. Notably, these analyses are either direct reproductions or variations of analyses that have been previously published (Arnatkeviciute et al., 2019; Oldham et al., 2008; Hawrylycz et al., 2012; Burt et al., 2018). Although there is no ground truth for any of these analyses, findings from previous work offer some context for interpreting the observed results (e.g. data from other species and other modalities; Lau et al., 2021). Nonetheless, we primarily focus on highlighting the potential variability resulting from different processing pipelines.

Correlated gene expression (CGE)

First, we separately correlated the rows of each expression matrix to generate symmetric region × region ‘correlated gene expression’ matrices, indicating the similarity of gene expression profiles between different brain regions (Figure 1a). Previous work in other species has reliably observed that transcriptional similarity in the brain decays with increasing separation distance (Fulcher et al., 2019; Lau et al., 2021). This distance-dependent relationship is an expected feature due to the functional specialization of brain regions, and is consistent with other imaging-derived phenotypes in humans (Roberts et al., 2016; Goulas et al., 2019; Betzel and Bassett, 2018; Mišić et al., 2014; Shafiei et al., 2020; Horvát et al., 2016). We assessed this relationship by extracting the upper triangle of each correlated gene expression matrix and correlating it with the upper triangle of a regional distance matrix, derived by computing the average Euclidean distance between brain region centroids in the Desikan-Killiany atlas (Figure 1a, left panel). Although previous work has highlighted that this relationship is exponential (Arnatkeviciute et al., 2019), we computed the Spearman correlation, as both statistics should exhibit similar variability across pipelines and the latter is less computationally expensive.
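A minimal sketch of the CGE-distance statistic, assuming a region × gene expression array and one 3D centroid per region; the function names are hypothetical and the synthetic data merely stand in for a processed AHBA matrix. Spearman's ρ is computed here as the Pearson correlation of ordinal ranks, which does not average tied ranks (scipy.stats.spearmanr does).

```python
import numpy as np


def rank(x):
    # Ordinal ranks 0..n-1; ties are broken by position rather than averaged.
    return np.argsort(np.argsort(x)).astype(float)


def spearman(x, y):
    # Spearman's rho is the Pearson correlation between the ranks.
    return np.corrcoef(rank(x), rank(y))[0, 1]


def cge_distance_rho(expression, centroids):
    """Spearman correlation between CGE and inter-regional distance.

    expression: (regions x genes) array; centroids: (regions x 3) coordinates.
    """
    cge = np.corrcoef(expression)  # rows are regions -> region x region CGE
    diff = centroids[:, None, :] - centroids[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # Euclidean centroid distances
    iu = np.triu_indices(len(expression), k=1)  # upper triangle, no diagonal
    return spearman(cge[iu], dist[iu])


# Toy data (not AHBA): each gene is a smooth spatial "bump", so nearby regions
# share expression profiles and CGE should fall off with distance.
rng = np.random.default_rng(0)
centroids = rng.uniform(0, 100, size=(68, 3))
gene_centres = rng.uniform(0, 100, size=(500, 3))
sq = ((centroids[:, None, :] - gene_centres[None, :, :]) ** 2).sum(axis=-1)
expression = np.exp(-sq / (2 * 20.0 ** 2)) + 0.01 * rng.normal(size=(68, 500))
print(cge_distance_rho(expression, centroids))  # expected to be negative
```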

Figure 1. Processing choices influence transcriptomic analyses.


(a) Examples of the three analyses used to assess differences in gene expression matrices generated by transcriptomic pipelines. First row: a depiction of the region-by-gene expression matrix generated from one of the 746,496 tested processing pipelines. Second row, left: we compute the correlation between rows of each matrix to generate a symmetric region × region CGE matrix. We then compute the correlation between the upper triangle of this CGE matrix and the upper triangle of a regional distance matrix to examine the degree to which CGE decays with increasing distance between regions (Arnatkeviciute et al., 2019). Second row, middle: we compute the Euclidean distance between columns of each matrix to generate a gene × gene GCE matrix. We use previously defined functional gene communities (Oldham et al., 2008) to compute a silhouette score for this GCE matrix to investigate whether genes within a module have more similar patterns of spatial expression than genes between modules. Second row, right: the first principal component is extracted from the RGE matrix. We compute the correlation between this principal component and the whole-brain T1w/T2w ratio (Burt et al., 2018) to understand how closely these maps covary across the brain. (b) The full statistical distributions from each of the three analyses for all 746,496 pipelines. Left panel: Spearman correlation values, ρ, from the CGE analyses. Middle panel: silhouette scores from the GCE analyses. Right panel: Spearman correlation coefficients, ρ, from the RGE analyses. CGE: correlated gene expression; GCE: gene co-expression; RGE: regional gene expression.

Gene co-expression (GCE)

For the second type of analysis we separately correlated the columns of each expression matrix to generate gene × gene ‘co-expression’ (GCE) matrices, indicating the similarity in spatial expression patterns between all pairs of genes (Figure 1a). A significant body of research has shown that genes tend to form functional communities, exhibiting synchronized expression patterns across space and time (Oldham et al., 2008), such that gene co-expression patterns tend to be more similar within than between such communities. Here, we obtained a set of gene community assignments derived for the brain from a previously studied human transcriptomic dataset (Oldham et al., 2008). We used these community assignments to calculate a silhouette score (Rousseeuw, 1987) for the gene co-expression matrices generated by each pipeline, measuring how well these communities represented the derived co-expression patterns (Figure 1a, middle panel).
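The silhouette score can be sketched in the same way. Which distance enters the score is an assumption here: the sketch converts the GCE correlation matrix into a correlation distance (1 − r), but any precomputed gene × gene distance matrix can be passed to the hypothetical `silhouette` helper. Community labels in the demo are synthetic, not the Oldham et al. (2008) modules.

```python
import numpy as np


def silhouette(dist, labels):
    """Mean silhouette score (Rousseeuw, 1987) from a precomputed distance matrix.

    Requires at least two distinct labels.
    """
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    scores = np.zeros(len(labels))
    for i, own in enumerate(labels):
        same = labels == own
        same[i] = False  # exclude the gene itself
        if not same.any():
            continue  # singleton community: score stays 0, as in scikit-learn
        a = dist[i, same].mean()  # mean distance within own community
        b = min(dist[i, labels == c].mean() for c in clusters if c != own)
        scores[i] = (b - a) / max(a, b)
    return scores.mean()


def gce_silhouette(expression, communities):
    """expression: (regions x genes); communities: one label per gene."""
    gce = np.corrcoef(expression.T)  # gene x gene co-expression
    return silhouette(1.0 - gce, communities)  # assumption: correlation distance


# Toy data: two synthetic gene communities, each sharing one spatial pattern.
rng = np.random.default_rng(1)
patterns = rng.normal(size=(2, 68))
expression = np.hstack([
    patterns[k][:, None] + 0.3 * rng.normal(size=(68, 50)) for k in range(2)
])
communities = [0] * 50 + [1] * 50
print(gce_silhouette(expression, communities))  # close to 1 for this toy example
```

Scores near 1 mean genes sit much closer to their own community than to any other; negative scores, as seen for many AHBA pipelines, mean the assigned communities fit the co-expression structure poorly.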

Regional gene expression (RGE)

For the third type of transcriptomic analysis, we focused on regional correlations between gene expression measures and an MRI-derived phenotype. Our regional expression measure was defined by computing the first principal component of the region-by-gene expression matrix, representing the axis of maximum spatial variation of gene expression in the brain observed under a given AHBA processing pipeline. As gene expression fundamentally shapes the structure and function of the human brain, this principal component may plausibly exhibit spatial variability similar to that of other imaging-derived measures. Recent work has highlighted that the T1w/T2w ratio is a robust phenotype that exhibits patterns of regional variation consistent with other microstructural and functional properties (Gao et al., 2020; Burt et al., 2018; Demirtaş et al., 2019; Fulcher et al., 2019). We therefore correlated the first principal component of gene expression with the whole-brain T1w/T2w ratio (Figure 1a, right panel), measuring the extent to which these values covary across the cortex.
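The RGE statistic can be sketched as follows, assuming PC1 is obtained by singular value decomposition of the gene-centered region × gene matrix (whether genes are additionally scaled is an assumption left open here). Because the sign of a principal component is arbitrary, so is the sign of ρ, and comparisons across pipelines would need a consistent sign convention. `t1t2` is a hypothetical array of regional T1w/T2w values.

```python
import numpy as np


def first_pc_scores(expression):
    """Region scores on PC1 of a (regions x genes) matrix, via SVD."""
    centered = expression - expression.mean(axis=0)  # center each gene
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]  # sign is arbitrary


def rank(x):
    # Ordinal ranks; ties are not averaged.
    return np.argsort(np.argsort(x)).astype(float)


def rge_rho(expression, t1t2):
    """Spearman correlation between gene PC1 and a regional T1w/T2w map."""
    pc1 = first_pc_scores(expression)
    return np.corrcoef(rank(pc1), rank(t1t2))[0, 1]


# Toy data: a single spatial gradient drives both expression and "T1w/T2w".
rng = np.random.default_rng(2)
gradient = np.linspace(0, 1, 68)
expression = np.outer(gradient, rng.normal(size=300)) + 0.05 * rng.normal(size=(68, 300))
t1t2 = gradient + 0.05 * rng.normal(size=68)
print(abs(rge_rho(expression, t1t2)))  # close to 1 for this toy example
```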

Pipeline distributions

Results from these three analyses reveal that choice of processing pipeline dramatically influences derived statistical estimates (i.e. the CGE-distance correlation, the gene co-expression silhouette score, and the spatial correlations between gene PC1 and whole-brain T1w/T2w ratio; Figure 1b). We observe that all three of the generated distributions of statistical estimates across the 746,496 pipelines have wide ranges (correlated gene expression: [-0.51, -0.13]; gene co-expression: [-0.78, -0.18]; regional gene expression: [0.00, 0.90]) and are either bimodal (Figure 1b, left/middle panels) or heavily skewed (Figure 1b, right panel).

Since there is no ground truth for these analyses, we cannot quantitatively assess whether some pipelines are more or less accurate than others. However, there is strong qualitative evidence to suggest that correlated gene expression should be lower between brain regions that are farther apart (Arnatkeviciute et al., 2019; Krienen et al., 2016; Richiardi et al., 2015; Fulcher et al., 2019; Lau et al., 2021). It is notable, then, that the distribution of distance-dependent estimates is so strongly bimodal (splitting at ρ ≈ -0.4), suggesting two very different perspectives on the size of this effect (Figure 1a and b, left panels). As increasingly detailed single-cell transcriptional data become available (e.g. Yao et al., 2021), we may be able to use these estimates to determine accuracy; for now, we simply note that even for this estimate, which has strong biological priors, we see considerable variability.

Similar variability can be observed for the other two analyses. While all the pipelines demonstrate relatively poor fit of gene communities to the derived gene co-expression matrices (refer to Materials and methods: Analytic approaches for information on why this is not unexpected), we observe that a portion of the pipelines yield far worse correspondence (Figure 1a and b, middle panels). Moreover, while the correlations between gene PC1 and whole-brain T1w/T2w ratio are largely consistent across pipelines, there is a small group of pipelines that yield correlations deviating by ρ ≥ 1.0. Notably, the parameter choices for these pipelines are not pathological (that is, their use could be justified), and, as we discuss later (see Results: Variability in parameter importance), modifying just one parameter setting can yield changes in effect sizes within this range.

Collectively, we find that for all three analyses there is substantial variability in the statistical estimates generated by different processing pipelines, and that this variability is large enough to meaningfully change the inferences and conclusions that can be drawn.

Variability in parameter importance

Next, we quantified the relative importance of different processing steps and parameters on our three derived statistical estimates. While researchers must ultimately make choices for each of the steps individually when processing AHBA data, we wanted to investigate whether unique choices have distinct influences. Moreover, which parameters are most important may differ based on the type of analysis performed.

We investigated parameter importance by calculating a distribution of difference scores for each parameter, measuring the extent to which changing that parameter, while holding all other parameters constant, influences the derived statistical metrics from each of the three analyses. For example, for a processing parameter with two choices, this procedure yielded a distribution of N/2 difference scores per analysis, where N is the total number of pipelines (i.e. 746,496/2 = 373,248). We averaged these distributions separately for each analysis to generate a single summary ‘impact score’ for each processing step, which we then rank-ordered independently for each analysis.
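The impact-score procedure can be illustrated on a toy pipeline grid. Two details are assumptions rather than statements of the published method: differences are taken as absolute values before averaging, and for parameters with more than two choices every pair of choices is compared. For a two-choice parameter the sketch produces exactly N/2 differences, matching the count above. Parameter names, choices, and the toy statistic are hypothetical.

```python
from itertools import combinations, product

import numpy as np


def impact_scores(grid, estimate):
    """Mean absolute change in a statistic when one parameter is switched.

    grid: dict mapping parameter name -> list of choices.
    estimate: dict mapping a pipeline (tuple of choices, in grid order) -> statistic.
    Returns parameters ordered from most to least impactful.
    """
    names = list(grid)
    scores = {}
    for p, name in enumerate(names):
        others = [grid[n] for n in names if n != name]
        diffs = []
        for rest in product(*others):  # every setting of the other parameters
            for a, b in combinations(grid[name], 2):  # every pair of choices
                pipe_a = rest[:p] + (a,) + rest[p:]
                pipe_b = rest[:p] + (b,) + rest[p:]
                diffs.append(abs(estimate[pipe_a] - estimate[pipe_b]))
        scores[name] = float(np.mean(diffs))
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical three-parameter grid and toy statistic.
grid = {
    "gene_norm": ["none", "norm_a", "norm_b"],
    "tolerance": [0, 1, 2],
    "agg_metric": ["mean", "median"],
}


def toy_statistic(gene_norm, tolerance, agg_metric):
    return ((0.5 if gene_norm == "none" else 0.0)
            + 0.02 * tolerance
            + (0.01 if agg_metric == "median" else 0.0))


estimate = {pipe: toy_statistic(*pipe) for pipe in product(*grid.values())}
print(impact_scores(grid, estimate))
```

Here `gene_norm` ranks first because switching to or from the hypothetical "none" choice shifts the toy statistic by 0.5, while switching between the two normalization choices changes nothing, a small-scale analogue of the observation that choices within one step need not have equivalent effects.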

We find considerable agreement in which parameters are the most impactful across analyses (Figure 2a): the most influential processing steps often involve procedures that influence the gene normalization process in some way (e.g. gene normalization method, normalizing only matched samples; Figure 2b). On the other hand, among the least impactful parameters are choices concerning donor-specific probe selection and handling of missing data. It is worth noting that three of the six probe selection methods tested in the current manuscript (i.e. max intensity, correlation intensity, correlation variance, differential stability, RNAseq correlation, and averaging) render the choice of donor-specific probe selection redundant. In other words, these three methods are mutually exclusive with choice of donor-specific probe selection, potentially confounding our ability to measure the real influence of this parameter. We also highlight that choice of atlas may influence the impact of missing data handling: since the Desikan-Killiany atlas is a relatively low-resolution atlas (68 nodes), expression matrices generated from the tested pipelines are missing data for at most two brain regions. Handling of missing data may therefore be more important when higher-resolution parcellations are employed. That is, while some parameters do not appear to affect our results in aggregate, there are potentially specific research questions where these parameters could play an important role.

Figure 2. Parameter choice differentially impacts statistical estimates.


(a) Rank of the relative importance for each parameter (y-axis) across all three analyses (x-axis). Warmer colors indicate parameters that have a greater influence on statistical estimates. (b) Statistical distributions from the three analyses, shown as kernel density plots, separated by choice of gene normalization method (the most impactful parameter as shown in panel a). (c) Density plots of the statistical estimates for all 746,496 pipelines shown along the first two principal components, derived from the 746,496 (pipeline) x 3 (statistical estimates) matrix, representing how different the statistical estimates from each of the three analyses are relative to other pipelines. Left panel: pipelines are colored based on choice of gene normalization method, where each color represents 1/3 of the pipelines. Here, the pipelines in which no normalization was applied (purple) are distinguished from those in which some form of normalization was applied (blue and brown). Right panel: pipelines are colored based on whether gene normalization was performed within (True, red) or across (False, purple) structural classes (i.e. cortex, subcortex/brainstem, cerebellum; see Materials and methods: Gene expression pipelines for more information).

To investigate those parameters that did play an influential role in the current analyses, we visualized their impact by examining the statistical distributions from each analysis separated by the different parameter choices (shown in Figure 2b for gene normalization method). Dividing the distributions in this way highlights how strongly parameter choice can influence the outcomes of the analyses: for example, when no gene normalization is employed the resulting estimates are dramatically shifted from those generated by pipelines that employed some form of normalization (Figure 2b; no normalization: purple distribution). Indeed, the bimodality and skew observed in the full statistical distributions for the analyses (Figure 1b) are almost entirely explained by this single parameter choice.

To investigate more qualitative differences in how parameter choice influences the processing pipelines we performed a principal component analysis (PCA) on the matrix of statistical estimates from the three analyses (i.e. the 746,496×3 pipeline-by-analysis matrix). We extracted the first two principal components from the statistical estimate matrix (variance explained: PC1 = 70%, PC2 = 26%) and examined how pipeline scores were distributed along these axes (Figure 2c). Delineating the distribution of pipelines based on parameter choice underscores how these options impact the separability of resulting statistical estimates. Reinforcing results presented above, we find that the choice of gene normalization method distinguishes the one-third of pipelines with no normalization (purple) from the remaining two-thirds that applied some form of normalization (blue and brown; Figure 2c, left). It is clear from the distribution of pipelines, however, that other processing choices interact with this parameter. For example, plotting the pipelines by whether the gene normalization was performed separately on samples within each structural class (i.e. cerebral cortex, subcortex, cerebellum) rather than across all tissue samples further delineates the pipelines that applied gene normalization into two distinct clusters (Figure 2c, right).
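A sketch of the PCA on the pipeline × statistic matrix follows. Standardizing each column before the decomposition is an assumption (the three statistics live on different scales), and the synthetic `estimates` array only stands in for the 746,496 × 3 matrix.

```python
import numpy as np


def pca_scores(estimates, n_components=2, standardize=True):
    """PCA of a (pipelines x statistics) matrix via SVD.

    Returns component scores and the fraction of variance each component explains.
    """
    X = estimates - estimates.mean(axis=0)
    if standardize:
        X = X / X.std(axis=0)  # assumption: put statistics on a common scale
    u, s, _ = np.linalg.svd(X, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return u[:, :n_components] * s[:n_components], explained[:n_components]


# Synthetic stand-in for the pipeline x 3 matrix of statistical estimates.
rng = np.random.default_rng(3)
latent = rng.normal(size=1000)
estimates = np.column_stack([latent, -latent, 0.5 * latent]) + 0.1 * rng.normal(size=(1000, 3))
scores, explained = pca_scores(estimates)
print(scores.shape, explained)
```

Coloring the rows of `scores` by a parameter's choice (e.g. gene normalization method) gives the kind of display shown in Figure 2c.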

These results reveal how different processing steps are grouped in terms of their importance to analyses of the AHBA, with some groups demonstrating greater potential impact. Broadly, parameters modifying normalization are the most important, followed by parameters influencing how tissue samples are matched to brain regions, and finally parameters impacting probe selection. Moreover, we find that choices within each processing step do not all have an equivalent impact on derived estimates (i.e. performing no gene normalization has a much greater influence than choosing between the two other forms of normalization tested).

Reproducing published analyses

The previous subsections demonstrate variability across the complete range of reasonable processing pipelines; however, many of these pipelines have not yet been used in practice. To investigate whether the subset of pipelines that have already been implemented in the published literature displays similar variability, we used abagen to reproduce the processing procedures from nine peer-reviewed articles that (1) are highly cited within the field, (2) highlight a wide range of processing options, and (3) describe their processing pipelines in enough detail to be reproduced (Hawrylycz et al., 2015; French and Paus, 2015; Whitaker et al., 2016; Krienen et al., 2016; Anderson et al., 2018; Burt et al., 2018; Romero-Garcia et al., 2018; Anderson et al., 2020b; Liu et al., 2020). We then explored how much the gene expression values and statistical outcomes generated by these published pipelines differ. To ensure comparability, we standardized the choice of brain parcellation across pipelines, using the Desikan-Killiany atlas in all instances. The pipelines were used to generate nine region-by-gene expression matrices, which were then subjected to the same three analyses described previously.

In reproducing the pipelines we note important differences in processing parameter selection (Figure 3a), and find that this variability results in slight discrepancies between the gene expression values generated by the pipelines. For example, looking at the distribution of cortical somatostatin (SST), a gene discussed heavily in Anderson et al., 2020b, where it was used as a proxy for somatostatin interneuron density (Fulcher, 2019), we observe some variation between pipelines (Figure 3b and c). Although we find moderate consistency in the statistical estimates generated by the pipelines, there are important differences (ranges: correlated gene expression [-0.49, -0.28], gene co-expression [-0.70, -0.24], regional gene expression [0.34, 0.88]; Figure 3d). One outlier is the single pipeline that did not appear to implement any form of gene normalization (French and Paus, 2015), supporting earlier results demonstrating the importance of this processing step for downstream expression estimates. This is potentially notable as the processed expression data from this pipeline were made openly available and have been used in analyses by other researchers (e.g. Sepulcre et al., 2018; Beliveau et al., 2017).
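Agreement between the SST maps produced by different pipelines (Figure 3c) amounts to a pipeline × pipeline Pearson correlation matrix. A minimal sketch, assuming one row per pipeline and one column per cortical region; the function name is hypothetical.

```python
import numpy as np


def map_agreement(maps):
    """maps: (pipelines x regions) array holding one gene's map per pipeline.

    Returns the pipeline x pipeline Pearson correlation matrix and the
    smallest and mean off-diagonal correlation.
    """
    r = np.corrcoef(maps)
    off_diag = r[~np.eye(len(maps), dtype=bool)]
    return r, off_diag.min(), off_diag.mean()


# Toy maps: the second and third pipelines only rescale and shift the first.
base = np.arange(68, dtype=float)
r, lowest, mean = map_agreement(np.vstack([base, 3 * base + 2, 0.5 * base - 1]))
print(lowest, mean)
```

Because Pearson correlation ignores positive rescaling and shifts, pipelines can agree perfectly on a map's spatial pattern while producing different value ranges, as in Figure 3b.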

Figure 3. Reproducing published pipelines.


(a) Parameter choices used in the reproduction of published pipelines. Processing steps with categorical choices (e.g., gene normalization) were converted to numerical choices for display purposes only. These choices reflect the range of choices enumerated in Table 1. (b) Relative expression values of cortical somatostatin (SST) generated by each of the reproduced pipelines. Value ranges vary based on pipeline processing options. (c) The Pearson correlation between the cortical somatostatin (SST) maps generated by the nine pipelines shown in panel (b). (d) Statistical estimates from the three analyses described in Materials and methods: Analytic approaches applied to expression data from each of the published pipelines.

Given that imaging transcriptomics is still relatively new and there has been limited work addressing best practices in the field (Arnatkeviciute et al., 2019), these results stress the importance of standardization in use of the AHBA among research groups. Although variation in processing can ostensibly lead to similar inferences in specific analyses, even minor differences in processing choices consistently yield measurable discrepancies in derived expression data. Without proper standardization, these differences will compound and become more problematic as the field continues to grow.

Standardized processing and reporting with the abagen toolbox

Across all of our analyses we find that choice of processing steps and parameters can have a strong influence on the statistical outcomes of research with the AHBA. Here, we briefly highlight features that we have integrated into the abagen toolbox to facilitate standardization in future research.

The abagen toolbox supports two use-case driven workflows: (1) a workflow that accepts an atlas and returns a parcellated, preprocessed regional gene expression matrix (Figure 4a); and (2) a workflow that accepts a mask and returns preprocessed expression data for all tissue samples within the mask (Figure 4b). Workflows can be called via a single line of code from either the command line or a Python terminal, and take approximately one minute to run with default settings using the Desikan-Killiany atlas. The main output of abagen is a single brain region (or tissue sample) × gene expression matrix. Changing the parameters may alter the dimensions of this matrix (e.g. different atlases will yield different numbers of regions or samples, and different processing choices may retain different numbers of genes) or the values it contains, but not its overall structure. The outputs of these workflows can be used to examine the three prototypical research questions enabled by the AHBA: correlated gene expression, gene co-expression, and regional expression of genes of interest more broadly (Fornito et al., 2019). Beyond its primary workflows, abagen has additional functionality for post-processing the AHBA data (e.g. removing distance-dependent effects from expression data, calculating differential stability estimates; Hawrylycz et al., 2015), and for accessing data from the companion Allen Mouse Brain Atlas (e.g. providing interfaces for querying the Allen Mouse API; https://mouse.brain-map.org/; Lein et al., 2007).
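To make the distinction between the first two prototypical analyses concrete, the sketch below uses a small, hypothetical region × gene matrix as a stand-in for real abagen output and relies only on the Python standard library. Correlated gene expression correlates rows (regions) across genes, giving a region-by-region matrix; gene co-expression correlates columns (genes) across regions, giving a gene-by-gene matrix. The helper names are illustrative and are not part of abagen.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlated_gene_expression(expr):
    """Region-by-region matrix: correlate each pair of rows (regions) across genes."""
    return [[pearson(r1, r2) for r2 in expr] for r1 in expr]


def gene_coexpression(expr):
    """Gene-by-gene matrix: correlate each pair of columns (genes) across regions."""
    genes = list(zip(*expr))  # transpose so each tuple holds one gene's regional profile
    return [[pearson(g1, g2) for g2 in genes] for g1 in genes]


# Hypothetical 3-region x 4-gene expression matrix
expr = [
    [0.1, 0.4, 0.9, 0.3],
    [0.2, 0.5, 0.8, 0.1],
    [0.9, 0.3, 0.2, 0.7],
]
ce = correlated_gene_expression(expr)  # 3 x 3
gc = gene_coexpression(expr)           # 4 x 4
```

The third analysis, regional expression of genes of interest, amounts to reading out one or more columns of the same matrix and relating them to an imaging phenotype.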

Figure 4. Workflows and features in the abagen toolbox.


(a) The primary workflow of abagen, used in the reported analyses, accepts a brain atlas and returns a parcellated brain-region-by-gene expression matrix. (b) An alternative abagen workflow accepts a regional mask and returns a processed tissue-sample-by-gene expression matrix, for all tissue samples from the six AHBA donors that fall within the boundaries of the mask. (c) Examples of selected features from the abagen workflows and additional toolbox functionality. Top left: examples of some commonly-used atlases that can be employed with the parcellation workflow shown in panel (a). Bottom left: abagen can accept either standard atlases (i.e. in MNI space) or atlases defined in the space of the six individual donors from the AHBA. Top right: an additional workflow available in abagen can be used to generate densely-interpolated expression maps from AHBA data using a k-nearest neighbors interpolation algorithm. Bottom right: using high-resolution atlases in the parcellation workflow (panel a) may result in some parcels being assigned no expression data; abagen supports two methods for assigning values to such regions.

Although these workflows support the entire range of processing options that we assessed in the current manuscript (Figure 4c), we have set the default options for all steps based on best practice recommendations developed in Arnatkeviciute et al., 2019 and further informed by the results presented above (see Supplementary file 1 for a full list). We believe the default settings in abagen will provide a reasonable starting point for researchers beginning to work with the AHBA; however, as we have continually noted, the appropriate choices for some parameters will vary based on research question. As such, to make it easier for researchers to report exactly what parameters they use, we have integrated an automated reporting mechanism into the abagen workflows (Figure 5). The generated reports provide manuscript-ready step-by-step documentation describing all the processing done to the AHBA data in the workflow, and are licensed CC0 (https://creativecommons.org/share-your-work/public-domain/cc0/) so that they can be freely used without restriction.

Figure 5. Annotated example abagen report.


Example of an automatically generated methods section report from the abagen toolbox. Processing steps are shown on the left and the relevant methods text—which is updated when these steps are modified—is shown in the same font color on the right. Reports also include a formatted reference section and relevant equations; these are not shown here for conciseness. Note that some processing steps (e.g. normalizing within structures, missing data handling) are omitted here because they are not run by default (see Supplementary file 1).

Creation of the toolbox has followed best practices in software development, including version control, continuous integration testing, and modular code design. To encourage further use by new research groups we provide comprehensive documentation on installing and working with the abagen toolbox online (https://abagen.readthedocs.io/).

Discussion

In the present report, we introduced the abagen toolbox, an open-source Python library for processing transcriptomic data. Using abagen, we conducted a comprehensive analysis examining whether and how different processing options modify statistical estimates derived from analyses using the AHBA. We investigated how processing pipelines used in the literature compare to those we tested, and provide recommendations for improving standardization and reporting of analyses using the AHBA, highlighting how the abagen toolbox can facilitate future developments in this space.

Testing nearly 750,000 unique processing pipelines, we find that choice of processing parameters can strongly influence statistical estimates derived from analyses of the AHBA, and that these choices interact with the type of analysis performed (Figure 1). We observe significant variability with regard to which parameters are most influential, finding that procedures modifying gene expression normalization have a far greater impact on downstream analyses than other processing steps (Figure 2). Looking to the literature, we reproduce nine pipelines from published articles and find that, despite notable inconsistencies in their processing choices, there is moderate consistency in their produced statistical estimates (Figure 3). We demonstrate, however, that these summary estimates may obscure meaningful differences in gene expression values derived by the pipelines, cautioning researchers to be aware of how analytic choices may impact their findings.

Altogether, the present report provides a comprehensive assessment of how processing variability can impact analyses in the field of imaging transcriptomics. Our results demonstrate how researcher choices (or ‘researcher degrees of freedom’; Simmons et al., 2011) can play a meaningful role in analyses of the AHBA. However, these findings are not necessarily limited to the AHBA. Indeed, increasing reliance on open-access datasets has begun to reveal unique challenges associated with data reuse (Thompson et al., 2020). Improved standardization and reporting among research groups using (and re-using) openly available datasets may help to mitigate some of these challenges. We believe that functionality in the abagen toolbox can support future researchers in overcoming these pitfalls and improve reproducibility in processing and analyzing AHBA data.

Our results also show that not all processing choices are equal: that is, we find a hierarchy of processing parameters, wherein procedures modifying gene normalization have the greatest impact on analyses, followed by steps more broadly influencing the matching of tissue samples to brain regions and finally by parameters that determine probe selection. Furthermore, we find that within processing steps certain parameter choices may lead to more reasonable statistical estimates. In particular, applying some form of gene normalization tends to improve the behavior of processed expression data when compared to instances in which no normalization is applied (Figure 1), but there appear to be limited differences in the type of normalization used. Although we only considered cortical tissue samples in the current analyses, we expect that including non-cortical samples would further reinforce these results (Arnatkeviciute et al., 2019): known differences in microarray expression values between cortex and subcortical structures will likely accentuate the impact of different normalization procedures across pipelines. Critically, these findings largely agree with previous recommendations developed by Arnatkeviciute et al., 2019, and we have chosen default parameter choices for abagen workflows accordingly.

Note that some processing steps should be performed in a specific sequence, while others could potentially be interchanged. For example, intensity-based filtering of probes must always be performed before probe selection: reversing the order of these operations would, in the majority of cases, be problematic because noisy probes could then be selected and carried through to analysis. However, the order of other steps (e.g. sample versus gene normalization) could arguably be reversed with no ostensible detriment. This procedural ambiguity is a salient example of the need to standardize workflows.

More broadly, this work builds on increasing efforts to examine the importance of methodological choices and analytical flexibility in human neuroimaging research (Bhagwat et al., 2021; Kharabian Masouleh et al., 2020; Oldham et al., 2020; Maier-Hein et al., 2017; Schilling et al., 2019; Carp, 2012; Botvinik-Nezer et al., 2020; Parkes et al., 2018; Ciric et al., 2017). Thankfully, emerging technical solutions have begun to tackle these issues via the development of tools that aim to abstract away sources of variation (e.g. fMRIPrep, Esteban et al., 2019; QSIPrep, Cieslak et al., 2020). While results from the present study reinforce the importance of methodological choices in research, abagen draws significant inspiration from these software packages in providing a set of tools designed to overcome such concerns when working with the AHBA.

While the AHBA dataset remains the only one of its kind, the abagen toolbox is designed to be used more broadly as similar datasets become available. That is, the preprocessing functions in abagen can be applied to other microarray expression datasets assuming, for example, availability of stereotactic coordinates. As new imaging transcriptomic datasets are developed and become more widely used, abagen functionality for creating standardized processing pipelines will only become more important. By developing the toolbox openly on GitHub (https://github.com/rmarkello/abagen), it is our hope that abagen can serve as a foundational, community tool for use in imaging transcriptomics research.

One consideration for future work on this topic is that the pipelines tested cover only a portion of the potential variability possible when processing AHBA data (Table 1). For example, a growing body of research has begun to examine how choice of brain parcellation may impact imaging analyses (e.g. Craddock et al., 2012; Thirion et al., 2014; Messé, 2020; Markello and Misic, 2021). While we only assessed processing pipelines using the Desikan-Killiany atlas, many other atlases have been used with the AHBA and it remains unclear how this variation may impact research findings. We also did not investigate whether donor-specific parcellations may impact analyses, a processing choice used in several published research findings (Anderson et al., 2020b; Romero-Garcia et al., 2018; Burt et al., 2018). Although there is significant evidence suggesting inter-individual variability in brain region definition (e.g. Gordon et al., 2017; Kong et al., 2019; Dickie et al., 2018), the process of generating individualized brain parcellations is fraught with methodological choices and requires careful data processing. Given the quality of the MRI data provided alongside the transcriptomic data in the AHBA—including important differences in scanning protocol and procedures between donors—creating donor-specific parcellations may be a large source of variability between pipelines.

Another limitation of the presented results is that we are unable to make categorical statements about which processing options are ‘best’ for the AHBA. First, there is no ground truth against which to assess the optimal set of processing parameters. One potential solution could be to examine the robustness of pipelines using a leave-one-donor-out strategy (e.g. Arnatkeviciute et al., 2019; Vogel et al., 2020), wherein analyses are repeated six times, omitting one donor each time, to ensure that no single donor unduly influences analytic estimates. This approach is likely to become more useful as data from more individuals become available, but even at present it may be a worthwhile way to assess whether chosen processing parameters are appropriate. Moreover, the optimal set of processing parameters may vary based on research question. For instance, in most applications gene normalization is appropriate, as it ensures that downstream analyses are not driven by a small subset of highly expressed genes. However, in other applications it may be desirable to retain the variance contributed by genes to accurately reflect their relative expression levels. For example, many genes in the AHBA are not brain-specific, so normalization will amplify their expression patterns, potentially obscuring more relevant expression information. This can be avoided by sub-selecting genes in a hypothesis-driven manner and skipping the normalization step altogether.
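The leave-one-donor-out idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure used in the cited studies: the donor labels, regional maps, and phenotype are hypothetical, and the group map is taken as a simple element-wise mean of the retained donors' maps.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_map(maps):
    """Element-wise average of several regional maps (an assumed aggregation)."""
    return [sum(vals) / len(vals) for vals in zip(*maps)]


def leave_one_donor_out(donor_maps, phenotype):
    """Correlate the donor-averaged map with a phenotype, omitting one donor at a time."""
    estimates = {}
    for left_out in donor_maps:
        kept = [m for donor, m in donor_maps.items() if donor != left_out]
        estimates[left_out] = pearson(mean_map(kept), phenotype)
    return estimates


# Hypothetical maps of one gene over four regions for six donors;
# donor6 is deliberately inconsistent with the other five.
donor_maps = {f"donor{i}": [1.0, 2.0, 3.0, 4.0] for i in range(1, 6)}
donor_maps["donor6"] = [20.0, 12.0, 6.0, 3.0]
phenotype = [1.0, 2.0, 3.0, 4.0]  # hypothetical imaging-derived phenotype

estimates = leave_one_donor_out(donor_maps, phenotype)
```

In this toy case the estimate jumps only when donor6 is omitted, which is the kind of single-donor influence the strategy is meant to expose.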

Nonetheless, we offer two alternative solutions for researchers who want to continue using the AHBA data. First, similar to the current report, researchers can conduct a comprehensive analysis with the AHBA, running multiple processing pipelines and showing the entire distribution of generated statistical estimates; however, this process can be computationally prohibitive and may impair researchers’ abilities to interpret their findings (Steegen et al., 2016). Second, and less costly, the imaging transcriptomic research community could converge on a set of data-driven processing pipelines for the AHBA that can be used across research groups. We believe the abagen toolbox, with its comprehensive workflows, well-informed default parameter choices, and detailed documentation, can facilitate this process. While we acknowledge that some research groups may have strong reasons for wanting to use specific (i.e. non-default) processing choices, in these instances we urge clear and detailed reporting of the methods used, such as via the automated reporting functionality from the abagen toolbox.

Altogether, the current report highlights the problem of processing variability in analyses using the AHBA, impacting many research studies in the burgeoning field of imaging transcriptomics. We demonstrate how different processing options can influence statistical estimates of analyses relating data from the AHBA to imaging-derived phenotypes, and present the abagen toolbox as a promising potential solution to this issue.

Materials and methods

Code and data availability

All code used for data processing, analysis, and figure generation is available on GitHub (https://github.com/netneurolab/markello_transcriptome; Markello, 2021a copy archived at swh:1:rev:3abbc85596a5baacd93e5e9e56c906c9dbb080f3) and directly relies on the following open-source Python packages: IPython (Perez and Granger, 2007), Jupyter (Kluyver et al., 2016), Matplotlib (Hunter, 2007), NiBabel (Brett et al., 2019), NumPy (Oliphant, 2006; van der Walt et al., 2011; Harris et al., 2020), Pandas (McKinney, 2010), PySurfer (Waskom et al., 2020), Scikit-learn (Pedregosa et al., 2011), SciPy (Virtanen et al., 2020), and Seaborn (Waskom et al., 2018).

Data

Allen human brain atlas

The Allen Human Brain Atlas (AHBA) is an open-access online resource containing whole-brain microarray gene expression data obtained from post-mortem tissue samples of six adult human donors (https://human.brain-map.org; Allen Institute for Brain Science, 2013; Hawrylycz et al., 2012). Expression data for over 20,000 genes were sampled from 3702 distinct tissue samples across the six donors (one female, ages 24–57), providing the most spatially comprehensive assay of gene expression in the human brain. Normalized microarray expression data were downloaded for all six donors; RNAseq data were downloaded for the two donors with relevant data.

Human connectome project

Group-averaged T1w/T2w (a proxy for intracortical myelin) data were downloaded from the S1200 release of the Human Connectome Project (HCP; Van Essen et al., 2013) and used without further processing.

Brain parcellations

All analyses were performed with the Desikan-Killiany atlas (DK; 68 cortical nodes), an anatomical parcellation generated by delineating regions based on gyral boundaries (Desikan et al., 2006). To explore the impact of volumetric- versus surface-based parcellations we used a version of the DK atlas in (1) volumetric MNI152, and (2) surface fsaverage5 space; both versions are provided directly with the abagen toolbox. To facilitate comparison between volumetric- and surface-based parcellations, samples from the cerebellum, subcortex and brainstem were omitted.

The abagen toolbox

Source code for abagen is available on GitHub (https://github.com/rmarkello/abagen) and is provided under the three-clause BSD license (https://opensource.org/licenses/BSD-3-Clause). We have integrated abagen with Zenodo, which generates unique digital object identifiers (DOIs) for each new release of the toolbox (e.g. https://doi.org/10.5281/zenodo.3451463). Researchers can install abagen as a Python package via the PyPI repository (https://pypi.org/project/abagen/), and can access comprehensive online documentation via ReadTheDocs (https://abagen.readthedocs.io/).

Gene expression pipelines

Most neuroimaging analyses using the AHBA must first convert the ‘raw’ data into a pre-processed brain region-by-gene expression matrix. To investigate the extent to which different processing procedures might impact downstream analyses, we used abagen to modify 17 distinct processing steps in the generation of region-by-gene matrices from the original AHBA data. Each unique set of these 17 processing choices and parameters constitutes a pipeline, yielding 746,496 unique pipelines. Here, we describe in detail the 17 processing steps and respective methods for each option that we examined in our analyses (refer to Table 1 for a summary overview of these choices or refer to the abagen documentation for implementation details; https://abagen.readthedocs.io).
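Conceptually, each pipeline is one element of the Cartesian product of the options available at every step. The sketch below enumerates such a grid with `itertools.product` for a hypothetical subset of four steps only; the dictionary keys and option labels are illustrative and are not necessarily abagen's keyword arguments.

```python
import itertools

# Hypothetical subset of the processing steps described in this section;
# keys and values are illustrative labels, not abagen's exact parameters.
OPTIONS = {
    "atlas_space": ["volumetric", "surface"],
    "lr_mirror": [None, "left-to-right", "bilateral"],
    "intensity_filter": [0.0, 0.25, 0.5],
    "gene_norm": [None, "zscore", "scaled_robust_sigmoid"],
}


def enumerate_pipelines(options):
    """Yield one dict per unique combination of processing choices."""
    keys = list(options)
    for combo in itertools.product(*(options[k] for k in keys)):
        yield dict(zip(keys, combo))


pipelines = list(enumerate_pipelines(OPTIONS))
print(len(pipelines))  # 2 * 3 * 3 * 3 = 54
```

Each resulting dictionary could then be handed to a processing function, which is how a multiverse-style analysis over many pipelines is typically organized.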

Volumetric or surface atlas

Aggregation of tissue samples from the AHBA into discrete brain regions requires researchers to supply an atlas (or parcellation). There are many brain atlases available for use; however, they typically exist in one of two forms: defined (1) in 3D ‘volumetric’ space, or (2) in ‘surface’ space on a 2D representation of the cortical sheet. Many atlases can exist in both of these formats and so beyond the choice of parcellation, researchers must select which representation to use when processing AHBA samples. Choice of atlas may impact how many and which samples are matched to brain regions. In the current manuscript, we examined a volume- and surface-based representation of the Desikan-Killiany atlas (see Materials and methods: Data; Desikan et al., 2006). Note that both versions of the atlas used in the reported analyses are included with the abagen software distribution.

Individualized or group-level atlas

There is growing recognition that brain parcellations derived at the group level tend to obscure individual differences in anatomy or function (e.g. Gordon et al., 2017; Kong et al., 2019; Dickie et al., 2018). Researchers working with the AHBA have thus begun to generate donor-specific parcellations, using individualized atlases to match tissue samples to brain regions. The individualization process can vary dramatically depending on whether researchers are using volumetric or surface atlases and whether they are operating in ‘native’ or standard (i.e. group) space. Because of the immense variability inherent to the individualization process itself, we opted not to explore this parameter in the current manuscript.

Use non-linear MNI coordinates

With its initial release the AHBA provided stereotactic coordinates for each tissue sample in MNI space (Fonov et al., 2009; Fonov et al., 2011; Collins et al., 1999); however, two of the six donor brains were scanned in cranio and coordinates were derived using affine registrations to the MNI template, while the remaining four were scanned ex vivo and a non-linear registration was used to generate coordinates. More recently, Gorgolewski et al., 2014 used ANTS (Avants et al., 2011) to perform a standardized, manually corrected non-linear diffeomorphic registration of all the donor brains to MNI space. Analyses collating tissue samples into distinct brain regions often rely on MNI coordinates to match samples to regions, and researchers must choose whether to use the original coordinates provided with the AHBA or the newer, non-linearly generated coordinates. In the current manuscript, we assessed the impact of using (1) the original MNI coordinates and (2) the updated coordinates from Gorgolewski et al., 2014.

Mirror samples across left-right hemisphere

Only the first two donors included in the AHBA had tissue samples taken from the right hemisphere. Preliminary analyses of these data revealed minimal lateralization of microarray expression, and so samples were collected exclusively from the left hemisphere for the following four donors (Hawrylycz et al., 2012; Hawrylycz et al., 2015). This irregular sampling resulted in limited spatial coverage of expression in the right hemisphere; to resolve this, some researchers have opted to mirror existing tissue samples across the left-right hemisphere boundary (Romero-Garcia et al., 2018). Researchers must decide whether to perform sample mirroring, and, if so, whether they should mirror unilaterally (i.e. only right-to-left or left-to-right) or bilaterally (i.e. both right-to-left and left-to-right). In the current manuscript, we assessed (1) no mirroring, (2) left-to-right mirroring, and (3) bilateral mirroring. The option for mirroring right-to-left was omitted as this is only useful when analyses selectively consider the left hemisphere, not the whole brain.
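A minimal sketch of sample mirroring follows, assuming MNI-style coordinates in which negative x lies in the left hemisphere and positive x in the right, so that mirroring simply negates x. The sample dictionaries and the mode names (`"leftright"`, `"rightleft"`, `"bilateral"`) are hypothetical and are not claimed to match abagen's options.

```python
def mirror_samples(samples, mode=None):
    """Return the original samples plus copies mirrored across the midline (x = 0).

    Assumes x < 0 is the left hemisphere and x > 0 the right; samples lying
    exactly on the midline are never mirrored. Mode names are hypothetical:
    None (no mirroring), "leftright" (copy left samples to the right),
    "rightleft" (copy right samples to the left), or "bilateral" (both).
    """
    if mode not in (None, "leftright", "rightleft", "bilateral"):
        raise ValueError(f"unknown mirroring mode: {mode!r}")
    mirrored = []
    for sample in samples:
        x, y, z = sample["xyz"]
        copy_left = x < 0 and mode in ("leftright", "bilateral")
        copy_right = x > 0 and mode in ("rightleft", "bilateral")
        if copy_left or copy_right:
            # Same expression values, hemisphere-flipped coordinates
            mirrored.append({**sample, "xyz": (-x, y, z), "mirrored": True})
    return samples + mirrored


# Hypothetical tissue samples: two left-hemisphere, one right-hemisphere
samples = [
    {"id": "s1", "xyz": (-30.0, 10.0, 5.0), "value": 0.7},
    {"id": "s2", "xyz": (-12.0, -40.0, 20.0), "value": 0.2},
    {"id": "s3", "xyz": (25.0, 0.0, -8.0), "value": 0.5},
]
left_to_right = mirror_samples(samples, "leftright")  # 3 originals + 2 copies
```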

Update probe-to-gene annotations

The 60-base-pair probes used to assess microarray expression in the AHBA were annotated with their corresponding gene (or lack thereof) when the data were publicly released. However, as the human reference genome is updated these annotations become increasingly out-of-date. Thus, when researchers choose to use the AHBA data they must decide whether to use the original gene annotations or more recently-generated annotations. In the current manuscript, we assessed using both the original annotations and those generated by Arnatkeviciute et al., 2019.

Intensity-based filtering threshold

Data from the AHBA are provided with information indicating whether the expression of each microarray probe exceeds the expression levels of background signal. Using this information, researchers can choose to perform an intensity-based filtering procedure wherein probes are only considered if their expression levels are greater than background across a specified percentage of tissue samples. In the current manuscript, we considered three degrees of intensity-based filtering: (1) no filtering (all probes used), (2) 25 % filtering (probes used if they exceeded background for more than 25 % of all samples), and (3) median filtering (probes used if they exceeded background for more than 50 % of all samples).
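The filtering rule can be sketched as follows, assuming the above-background information has already been loaded as one list of 0/1 flags per probe (one flag per tissue sample). The input layout is an assumption of this sketch; the thresholds mirror the three options described above.

```python
def intensity_filter(above_background, threshold):
    """Return indices of probes exceeding background in more than `threshold` of samples.

    above_background: one list of 0/1 flags per probe (one flag per tissue sample).
    threshold: fraction in [0, 1]; a threshold of 0 keeps every probe (no filtering).
    """
    keep = []
    for probe_idx, flags in enumerate(above_background):
        fraction = sum(flags) / len(flags)
        if threshold == 0 or fraction > threshold:
            keep.append(probe_idx)
    return keep


flags = [
    [1, 1, 1, 1],  # above background in every sample
    [1, 0, 0, 0],  # 25% of samples
    [1, 1, 0, 0],  # 50% of samples
    [0, 0, 0, 0],  # never above background
]
for thr in (0.0, 0.25, 0.5):
    print(thr, intensity_filter(flags, thr))
# 0.0 [0, 1, 2, 3]
# 0.25 [0, 2]
# 0.5 [0]
```

Because the criterion is "more than" the threshold, a probe at exactly 25 % or 50 % is dropped at that threshold.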

Inter-areal similarity threshold

The expression values of some tissue samples in the AHBA differ markedly from those of all other samples in the dataset. While this could be driven by real spatial variability in expression values throughout the brain, it is also possible that this variability is artifactual. Researchers can opt to assess the inter-areal similarity of tissue samples, identifying those that differ from the rest by more than a given threshold, and remove them from consideration. To our knowledge, this processing step has only been implemented in a single research study (Burt et al., 2018), and as such we do not consider it in the current manuscript.

Probe selection method

The probes used to measure microarray expression levels in the AHBA are often redundant; that is, there are frequently several probes indexing the same gene. Thus, at some point researchers must transition from measuring probe expression levels to measuring gene expression levels. Effectively, this means selecting from or condensing the redundant probes for each gene. There have been at least eight methods proposed in the literature for this process, including selecting a single probe with the (1) max intensity across samples, (2) max variance across samples, (3) highest loading on the first principal component across samples, (4) highest correlation to other probes (or max intensity across samples when only two probes exist), (5) highest correlation to other probes (or max variance across samples when only two probes exist), (6) highest differential stability across donors, (7) highest fidelity to simultaneously-acquired RNAseq data, or (8) simply averaging all probes indexing the same gene. In the current manuscript we only consider six of the most commonly-applied methods (i.e. 1, 4, 5, 6, 7, and 8); the other methods (i.e. 2 and 3) have only been reported in a single research study each (Negi and Guda, 2017 and Parkes et al., 2017, respectively) and as such we do not consider them.
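Three of these methods (1, 4, and 8) are simple enough to sketch for a single gene. In this minimal illustration, "intensity" is assumed to mean mean expression across samples, and the probe identifiers and values are hypothetical; it is not abagen's implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean(values):
    return sum(values) / len(values)


def select_max_intensity(probes):
    """Method 1: probe with the highest mean expression across samples (assumed definition)."""
    return max(probes, key=lambda p: mean(probes[p]))


def select_corr_intensity(probes):
    """Method 4: probe with the highest average correlation to the gene's other probes,
    falling back to max intensity when only two probes exist."""
    if len(probes) == 1:
        return next(iter(probes))
    if len(probes) == 2:
        return select_max_intensity(probes)

    def mean_r(p):
        return mean([pearson(probes[p], probes[q]) for q in probes if q != p])

    return max(probes, key=mean_r)


def average_probes(probes):
    """Method 8: collapse all probes into one profile by averaging them sample-wise."""
    return [mean(vals) for vals in zip(*probes.values())]


# Hypothetical probes indexing one gene (expression across four samples)
probes = {
    "probe_a": [5.0, 6.0, 7.0, 8.0],
    "probe_b": [1.0, 2.0, 3.0, 5.0],
    "probe_c": [20.0, 1.0, 20.0, 1.0],
}
```

Here max intensity picks `probe_c`, whereas the correlation-based rule picks `probe_a` because `probe_c` disagrees with the other two probes, illustrating how method choice alone can change the retained gene profile.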

Donor-specific probe selection

Probe selection (described above) often requires applying some selection criterion to gene expression levels across tissue samples. For these methods, the specified criterion can be measured across donors (i.e. aggregating tissue samples across donors) or independently for each donor. The latter case, performing probe selection independently for each donor, allows for two additional options: (1) using whichever probe is chosen for each donor, even if it differs from the other donors, or (2) using the most commonly selected probe for all donors. In the current manuscript, we considered all three of these options: (1) aggregating samples across donors, (2) performing probe selection independently for each donor, and (3) using the most commonly selected probe across donors.
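The third option, using the most commonly selected probe across donors, reduces to a vote. The sketch below assumes per-donor selections have already been made; the donor and probe identifiers are hypothetical, and ties are broken by first appearance (the documented ordering of `Counter.most_common`), which is an arbitrary choice made here.

```python
from collections import Counter


def most_common_probe(selected_by_donor):
    """Return the probe selected for the largest number of donors.

    Ties are broken by first appearance, following Counter.most_common;
    this tie-breaking rule is an assumption of the sketch.
    """
    return Counter(selected_by_donor.values()).most_common(1)[0][0]


# Hypothetical outcome of running probe selection separately for each donor
selected_by_donor = {
    "donor1": "probe_a", "donor2": "probe_b", "donor3": "probe_a",
    "donor4": "probe_a", "donor5": "probe_c", "donor6": "probe_b",
}
consensus = most_common_probe(selected_by_donor)

# Option (3): every donor then uses the consensus probe for this gene
per_donor_choice = {donor: consensus for donor in selected_by_donor}
```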

Missing data method

Due to the irregular spatial sampling of data in the AHBA some brain regions may not be assigned any corresponding microarray expression data. Researchers can opt to simply omit these regions from subsequent analyses; however, in some cases, this is not desirable as the spatial distribution of the missing samples may not be random and discarding them may bias resulting estimates. Two options for handling missing data have been proposed in the literature: filling missing regions with expression data from nearby regions (i.e. nearest-neighbors interpolation; Whitaker et al., 2016), or interpolating data in missing regions based on nearby samples (i.e. linear interpolation; Burt et al., 2018). In the current manuscript, we tested two options: (1) omit brain regions with missing data entirely from subsequent analyses, and (2) fill missing data with expression values using nearest-neighbors interpolation. Linear interpolation has been sparingly used in the published literature (e.g. Burt et al., 2018; Romero-Garcia et al., 2018) and carries an increase in computational cost (approximately an order of magnitude higher than nearest-neighbors interpolation); as such, we do not consider it in the current manuscript.
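A simplified nearest-neighbors fill can be written as follows. This sketch is a stand-in rather than abagen's implementation: it assumes each region is summarized by a centroid, and copies the expression profile of the region with data whose centroid is closest. Region labels, centroids, and values are hypothetical.

```python
import math


def fill_missing_nearest(centroids, expression):
    """Fill regions lacking expression with the profile of the nearest region that has data.

    centroids: dict region -> (x, y, z) centroid in mm (assumed representation).
    expression: dict region -> list of gene values; regions without samples are absent.
    Returns a new dict; the input `expression` is left unchanged.
    """
    filled = dict(expression)
    for region, centroid in centroids.items():
        if region in expression:
            continue
        nearest = min(expression, key=lambda r: math.dist(centroid, centroids[r]))
        filled[region] = list(expression[nearest])
    return filled


centroids = {"r1": (0.0, 0.0, 0.0), "r2": (10.0, 0.0, 0.0), "r3": (9.0, 1.0, 0.0)}
expression = {"r1": [1.0, 2.0], "r2": [3.0, 4.0]}  # r3 received no tissue samples
filled = fill_missing_nearest(centroids, expression)
```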

Sample-to-region matching tolerance

Volumetric atlases

While most tissue samples from the AHBA will fall directly within the brain regions delineated by most parcellations, some samples may fall outside the boundaries of these regions. Researchers can nonetheless choose to permit assigning these nearby samples to a given region, but will often set a distance threshold beyond which samples cannot be assigned. In the current manuscript, we considered three distance tolerances: 0 mm (i.e. samples must fall exactly within a region), 1 mm, and 2 mm.
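The distance tolerance can be illustrated with a brute-force sketch. As a simplification of true voxel membership, each region is represented here as a list of hypothetical voxel centres in mm, so a tolerance of 0 only accepts samples lying exactly on a voxel centre of some region.

```python
import math


def match_sample(sample_xyz, region_voxels, tolerance):
    """Assign a sample to the region owning its nearest voxel, if within `tolerance` mm.

    region_voxels: dict region -> list of (x, y, z) voxel centres (simplified representation).
    Returns the region label, or None when no voxel lies within the tolerance.
    """
    best_region, best_dist = None, math.inf
    for region, voxels in region_voxels.items():
        for voxel in voxels:
            dist = math.dist(sample_xyz, voxel)
            if dist < best_dist:
                best_region, best_dist = region, dist
    return best_region if best_dist <= tolerance else None


region_voxels = {
    "region_A": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "region_B": [(5.0, 0.0, 0.0)],
}
for tol in (0, 1, 2):
    print(tol, match_sample((2.5, 0.0, 0.0), region_voxels, tol))
# 0 None
# 1 None
# 2 region_A
```

A sample 1.5 mm from the nearest voxel of `region_A` is only retained once the tolerance reaches 2 mm.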

Surface atlases

Because tissue samples from the AHBA are defined in volumetric space, matching them to parcels defined on a surface-based atlas requires different considerations than with volumetric atlases. Notably, all samples will have non-zero distances from surface vertices; therefore, when matching to surface atlases distance thresholds are generally considered in terms of standard deviations (Burt et al., 2018; Anderson et al., 2020b). In this way, all samples are matched to the surface, and those whose distance from the surface exceeds the mean distance by more than the specified number of standard deviations are excluded. In the current manuscript we tested three standard deviation distance tolerances: 0 s.d. (i.e. all samples farther than the average distance are excluded), 1 s.d., and 2 s.d.
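The standard-deviation rule can be sketched as below, assuming each sample's distance to its nearest surface vertex has already been computed. Using the population standard deviation (`statistics.pstdev`) is an assumption of this sketch, and the distances are hypothetical.

```python
import statistics


def surface_tolerance(distances, n_sd):
    """Keep samples whose surface distance is at most mean + n_sd * SD.

    distances: distance of each tissue sample to its nearest surface vertex (mm).
    Uses the population standard deviation, an assumption of this sketch.
    """
    mu = statistics.mean(distances)
    sd = statistics.pstdev(distances)
    cutoff = mu + n_sd * sd
    return [i for i, d in enumerate(distances) if d <= cutoff]


distances = [1.0, 2.0, 3.0, 4.0, 10.0]  # hypothetical sample-to-surface distances (mm)
kept = {n_sd: surface_tolerance(distances, n_sd) for n_sd in (0, 1, 2)}
```

With these values the mean is 4 mm and the SD about 3.16 mm, so the outlying 10 mm sample is retained only at 2 s.d.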

Sample normalization method

Prior to aggregating microarray expression data across donors, researchers can optionally normalize the microarray expression data for each tissue sample across all represented genes (i.e. perform row-wise normalization). This procedure can account for between-sample differences in gene expression potentially driven by measurement errors. A number of techniques have been proposed to normalize expression values; in the current manuscript, we considered three normalization methods: (1) no normalization, (2) a z-score transform, and (3) a scaled robust sigmoid transform (Fulcher et al., 2013).
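For reference, the scaled robust sigmoid of Fulcher et al., 2013 is commonly written as a two-stage transform, where $\langle x \rangle$ denotes the median and $\mathrm{IQR}_x$ the interquartile range of the values being normalized (a row for sample normalization, a column for gene normalization). Dividing the IQR by 1.35 approximates a standard deviation for normally distributed data, and the second stage rescales the sigmoid output to the unit interval.

```latex
x_{\mathrm{norm}} = \frac{1}{1 + \exp\!\left( -\dfrac{x - \langle x \rangle}{\mathrm{IQR}_x / 1.35} \right)}
\qquad
x_{\mathrm{scaled}} = \frac{x_{\mathrm{norm}} - \min(x_{\mathrm{norm}})}{\max(x_{\mathrm{norm}}) - \min(x_{\mathrm{norm}})}
```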

Gene normalization method

Prior to aggregating microarray expression data across donors, researchers can optionally normalize the microarray expression data for each represented gene across tissue samples (i.e. perform column-wise normalization). This procedure can account for inter-individual (donor-specific) differences in gene expression data, which remain present in the AHBA despite batch corrections performed by the Allen Institute prior to releasing the data. In the current manuscript, we considered three normalization methods: (1) no normalization, (2) a z-score transform, and (3) a scaled robust sigmoid transform (Fulcher et al., 2013).
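The difference between sample (row-wise) and gene (column-wise) normalization is only the axis along which the same transform is applied. The sketch below implements a z-score and the scaled robust sigmoid with the standard library and applies them to a hypothetical samples × genes matrix. The quartile convention (`statistics.quantiles` with its default "exclusive" method) and the use of the sample standard deviation are assumptions; other conventions yield slightly different values.

```python
import math
import statistics


def zscore(values):
    """Z-score using the sample standard deviation (an assumption of this sketch)."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]


def scaled_robust_sigmoid(values):
    """Robust sigmoid (median, IQR / 1.35) followed by min-max rescaling to [0, 1].

    Assumes a non-zero IQR and at least two distinct values.
    """
    med = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    scale = (q3 - q1) / 1.35
    sig = [1.0 / (1.0 + math.exp(-(v - med) / scale)) for v in values]
    lo, hi = min(sig), max(sig)
    return [(s - lo) / (hi - lo) for s in sig]


def normalize(matrix, func, axis):
    """Apply `func` to a samples-by-genes matrix given as a list of rows.

    axis=1: each sample (row) across genes, i.e. sample normalization.
    axis=0: each gene (column) across samples, i.e. gene normalization.
    """
    if axis == 1:
        return [func(list(row)) for row in matrix]
    columns = [func(list(col)) for col in zip(*matrix)]
    return [list(row) for row in zip(*columns)]


# Hypothetical 4-sample x 3-gene expression matrix
expr = [
    [1.0, 10.0, 100.0],
    [2.0, 20.0, 300.0],
    [3.0, 40.0, 200.0],
    [4.0, 30.0, 400.0],
]
gene_normed = normalize(expr, scaled_robust_sigmoid, axis=0)
sample_normed = normalize(expr, scaled_robust_sigmoid, axis=1)
gene_z = normalize(expr, zscore, axis=0)
```

After gene normalization every column spans [0, 1], so a gene's absolute expression level no longer dominates; this is the behavior the Discussion notes may be undesirable when relative expression levels between genes matter.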

Normalizing only matched samples

Due to choices in other processing steps (e.g. Volume- or surface-based atlas, Sample-to-region matching tolerance) some tissue samples from the AHBA may not be assigned to any region in a given brain atlas. During gene normalization, where expression from each gene is normalized across tissue samples, researchers must decide whether to use (1) only those tissue samples matched to brain regions, or (2) the entire corpus of tissue samples, irrespective of whether they will be included in the final, processed regional expression matrix. In the current manuscript we consider both of these options.

Normalizing discrete structures

There is known variation in gene expression values between tissue samples taken from distinct structural classes (i.e. samples taken from neocortex may have different expression values than those from the brainstem). When performing gene normalization researchers can opt to normalize (1) across all samples irrespective of the structure from which they derive or (2) independently for samples taken from different brain structures. Although the brain atlas used in the current manuscript represents only cortical parcels, this processing choice can interact with Normalizing only matched samples to impact resulting expression values and we therefore test both options.

Note that in the abagen toolbox, structural classes are operationalized as: (1) cortex, (2) subcortex and brainstem, (3) cerebellum, and (4) white matter. Subcortex and brainstem are treated as one class because the neuroanatomical delineation between these regions is widely contested, and expression values in these regions tend to be more similar to one another than to those in other regions (i.e. data-driven clustering of samples tends to assign subcortical and brainstem samples together).
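The interaction between this option and Normalizing only matched samples can be reproduced on toy data for a cortical-only atlas. In the example below, unmatched cortical and cerebellar samples exist but never enter the final matrix. The function, the string class labels and the convention that label 0 marks an unmatched sample are all hypothetical and do not reflect abagen's internals.

```python
import numpy as np

def normalize_gene(values, labels, structures, matched_only, per_structure):
    """Z-score one gene, optionally within each structural class and
    optionally using only matched samples as the reference."""
    values = np.asarray(values, dtype=float)
    matched = np.asarray(labels) != 0
    structures = np.asarray(structures)
    out = np.full(values.shape, np.nan)
    groups = np.unique(structures) if per_structure else [None]
    for group in groups:
        members = np.ones(values.shape, dtype=bool) if group is None else structures == group
        ref = members & matched if matched_only else members
        if not ref.any():
            continue                  # no reference samples in this class
        ref_vals = values[ref]
        out[members] = (values[members] - ref_vals.mean()) / ref_vals.std()
    return out[matched]               # only matched samples are retained

values = [1.0, 2.0, 3.0, 4.0, 2.5, 10.0, 12.0]
labels = [1, 1, 2, 2, 0, 0, 0]        # 0 = not matched to a cortical parcel
structures = ["cortex"] * 5 + ["cerebellum"] * 2

results = {
    (m, p): normalize_gene(values, labels, structures, m, p)
    for m in (True, False) for p in (True, False)
}
```

When only matched samples are used, the per-structure flag makes no difference here because every matched sample is cortical. When all samples are used, the flag matters: pooling lets the unmatched cerebellar samples shift the cortical values.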

Sample-to-region combination method

Once tissue samples have been assigned to brain regions, they need to be combined to generate a single expression profile for each region. However, because sampling differs between donors, some donors may have more tissue samples assigned to a given brain region than others. Researchers must therefore decide whether to aggregate samples (1) within each brain region independently for each donor and then across donors, or (2) simultaneously across all donors. In the latter case, donors with more samples matched to a region will contribute more to that region's expression profile (Arnatkeviciute et al., 2019). In the current manuscript, we tested both of these options.
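A worked example makes the difference between the two options concrete. It assumes one region, one gene and two donors with unequal sample counts, and uses the mean throughout. The function name and data are hypothetical.

```python
import numpy as np

def combine_region(values, donors, method="donor"):
    """Aggregate one region's samples for one gene using the mean.

    method="donor":  average within each donor, then across donors
                     (each donor contributes equally).
    method="pooled": average all samples at once
                     (donors with more samples contribute more).
    """
    values = np.asarray(values, dtype=float)
    donors = np.asarray(donors)
    if method == "donor":
        per_donor = [values[donors == d].mean() for d in np.unique(donors)]
        return float(np.mean(per_donor))
    if method == "pooled":
        return float(values.mean())
    raise ValueError(f"unknown method: {method!r}")

# Donor A contributes four samples to the region, donor B only one.
values = [1.0, 1.0, 1.0, 1.0, 6.0]
donors = ["A", "A", "A", "A", "B"]

by_donor = combine_region(values, donors, method="donor")   # mean(1.0, 6.0) = 3.5
pooled = combine_region(values, donors, method="pooled")    # 10.0 / 5 = 2.0
```

In the pooled estimate, donor A's four samples outweigh donor B's single sample. This is the weighting effect described above.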

Sample-to-region combination metric

When aggregating tissue samples into brain regions, researchers must decide which aggregation metric to use. Although any statistical estimate could be considered, in practice a measure of central tendency, such as the mean expression value across tissue samples, is most appropriate. In the current manuscript, we tested aggregation with both the (1) mean and (2) median.
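The two metrics diverge when a region contains an outlying sample, as the short sketch below shows. Regions are integer labels, and the values are invented for illustration.

```python
import numpy as np

def aggregate(values, regions, metric):
    """Summarize samples into one value per region with `metric`."""
    values = np.asarray(values, dtype=float)
    regions = np.asarray(regions)
    return {int(r): float(metric(values[regions == r])) for r in np.unique(regions)}

# Region 1 contains one outlying sample (8.0); region 2 does not.
values = [2.0, 2.1, 1.9, 2.0, 8.0, 5.0, 5.2, 4.8]
regions = [1, 1, 1, 1, 1, 2, 2, 2]

means = aggregate(values, regions, np.mean)      # region 1 ≈ 3.2, region 2 ≈ 5.0
medians = aggregate(values, regions, np.median)  # region 1 = 2.0, region 2 = 5.0
```

The median is more robust to outlying samples. In region 2, which has no outlier, the two metrics agree.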

Analytic approaches

Prototypical analyses relying on parcellated microarray expression data from the AHBA fall into three broad categories (Fornito et al., 2019):

  1. Correlated gene expression: Examining the correlation between distinct brain regions across genes (i.e. using the region-by-region correlation matrix);

  2. Gene co-expression: Examining the correlation between gene expression profiles across brain regions (i.e. using the gene-by-gene correlation matrix); or,

  3. Regional gene expression: Examining the expression profile of one (or more) genes across brain regions (i.e. using selected columns of the region-by-gene expression matrix).
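Given a parcellated expression matrix with regions as rows and genes as columns, the three categories correspond to the NumPy operations sketched below. The random data, matrix sizes and selected column are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n_regions, n_genes = 34, 200
expression = rng.normal(size=(n_regions, n_genes))   # region-by-gene matrix

# 1. Correlated gene expression: correlate regions across genes
#    (np.corrcoef treats rows as variables by default).
region_corr = np.corrcoef(expression)                # (n_regions, n_regions)

# 2. Gene co-expression: correlate genes across regions.
gene_corr = np.corrcoef(expression.T)                # (n_genes, n_genes)

# 3. Regional gene expression: select one (or more) gene columns.
gene_profile = expression[:, 0]                      # (n_regions,)
```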

In order to examine the interaction between processing options and analytic method, we performed one analysis from each of these three categories, described below, for every output of the 746,496 processing pipelines.

Correlated gene expression

Researchers have reliably found a relationship between correlated gene expression in the brain and the distance between brain regions: brain regions that are farther away from one another tend to have less similar gene expression profiles (Richiardi et al., 2015; Richiardi et al., 2017; Krienen et al., 2016; Vértes et al., 2016; Arnatkeviciute et al., 2019). To examine the impact of processing choices on this relationship, we computed the Spearman correlation between the upper triangle of the regional distance matrix (Euclidean distance between brain regions) and the upper triangle of each correlated gene expression matrix (Figure 1a, left). Brain regions for which no gene expression data were available (which depended on pipeline options) were excluded from the correlation. Note that this relationship is likely exponential (Arnatkeviciute et al., 2019). We nonetheless calculated the Spearman coefficient because it is more computationally tractable than fitting an exponential model. Because it is rank-based, it captures any monotonic relationship, so it should exhibit similar variability across pipelines.
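A minimal NumPy sketch of this analysis follows. It assumes region centroid coordinates and a region-by-gene matrix in which regions without data are rows of NaN. Spearman's ρ is computed as the Pearson correlation of average ranks, so tied distances are handled. The function names are hypothetical. The synthetic data are constructed so that expression similarity decays with distance.

```python
import numpy as np

def rank_average(x):
    """Ranks starting at 1; tied values receive their average rank."""
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    upper = np.cumsum(counts)                   # highest rank in each tie group
    return (upper - (counts - 1) / 2.0)[inverse.ravel()]

def spearman(x, y):
    return float(np.corrcoef(rank_average(x), rank_average(y))[0, 1])

def distance_cge_spearman(coords, expression):
    keep = ~np.isnan(expression).any(axis=1)    # drop regions lacking data
    coords, expression = coords[keep], expression[keep]
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))    # Euclidean distance matrix
    cge = np.corrcoef(expression)               # region-by-region correlation
    iu = np.triu_indices(len(coords), k=1)      # upper triangle, no diagonal
    return spearman(dist[iu], cge[iu])

rng = np.random.default_rng(42)
n_regions, n_genes = 20, 500
x = np.arange(n_regions, dtype=float)
coords = np.column_stack([x, np.zeros(n_regions), np.zeros(n_regions)])
phases = rng.uniform(0, 2 * np.pi, size=n_genes)
# Spatially smooth synthetic "genes": nearby regions share similar profiles.
expression = np.sin(0.15 * x[:, None] + phases[None, :])
expression += rng.normal(scale=0.1, size=expression.shape)
expression[3] = np.nan                          # region with no matched samples

rho = distance_cge_spearman(coords, expression)
```

A strongly negative ρ indicates that correlated gene expression falls off with distance.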

Gene co-expression

Researchers have previously shown that gene expression in the brain tends to organize into functionally defined communities or modules (Oldham et al., 2008; Hawrylycz et al., 2012). We examined the extent to which functional gene modules derived from a separate transcriptomic dataset (Oldham et al., 2008) mapped onto the gene co-expression matrices generated by the different processing pipelines. For each gene-by-gene matrix, we calculated the silhouette score (Rousseeuw, 1987) of the gene modules on a modified version of the gene co-expression matrix (using the Euclidean distance between genes instead of gene correlations; Figure 1a, middle) via:

$$s = \frac{1}{N} \sum_{i=1}^{N} \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$

where a(i) is the average distance of data point i to all other data points in the same cluster, b(i) is the average distance of data point i to all data points in the nearest neighboring cluster (the other cluster for which this average is smallest), and N is the total number of data points. The final silhouette score s ranges from –1 to +1, where positive values indicate assortative and negative values indicate disassortative clusters.
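The silhouette formula above can be implemented directly in NumPy. Here genes are the points being clustered, and each gene's expression across regions serves as its feature vector, mirroring the Euclidean-distance version of the co-expression matrix. The module labels and data are synthetic. By Rousseeuw's convention, points in singleton clusters receive s(i) = 0. scikit-learn's `sklearn.metrics.silhouette_score` computes the same mean silhouette if that library is preferred.

```python
import numpy as np

def silhouette(points, labels):
    """Mean silhouette width with Euclidean distance (needs >= 2 clusters)."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    clusters = np.unique(labels)
    s = np.zeros(len(points))
    for i in range(len(points)):
        same = labels == labels[i]
        same[i] = False                      # exclude the point itself
        if not same.any():
            continue                         # singleton cluster: s(i) = 0
        a = dist[i, same].mean()             # a(i): own cluster
        b = min(dist[i, labels == c].mean()  # b(i): nearest other cluster
                for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return float(s.mean())

rng = np.random.default_rng(7)
n_regions = 30
profile_a, profile_b = rng.normal(size=(2, n_regions))
genes = np.vstack([                          # 50 genes x 30 regions
    profile_a + 0.3 * rng.normal(size=(25, n_regions)),
    profile_b + 0.3 * rng.normal(size=(25, n_regions)),
])
modules = np.repeat([0, 1], 25)

score = silhouette(genes, modules)
shuffled = silhouette(genes, rng.permutation(modules))
```

Modules that match the data's structure give a clearly positive score. Shuffling the module labels drives the score toward zero.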

Note that the original gene modules were defined using weighted gene co-expression network analysis (WGCNA), which generally requires additional processing steps on the gene co-expression matrix. Since we used the raw gene co-expression matrix in the current analysis, we expected lower silhouette scores than those reported in the original study in which the gene modules were defined. However, the variance in scores between pipelines should not be substantially affected by this choice.

Regional gene expression

Researchers recently highlighted how the first principal component of gene expression in the brain closely mirrors the spatial variation observed in MRI-derived T1w/T2w measurements, which are typically used as a proxy for myelination (Burt et al., 2018). We examined whether this relationship was present across the outputs of the different pipelines by measuring the Spearman correlation between the T1w/T2w ratio and the first principal component of the regional gene expression matrix (Figure 1a, right). Regional gene expression matrices were mean-centered prior to extraction of the principal component.
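The sketch below assumes a region-by-gene matrix and a parcellated T1w/T2w vector with the same region order, both synthetic here. The first principal component is obtained from the singular value decomposition of the column-centered matrix. Spearman's ρ is computed from double-argsort ranks, which assumes no ties (reasonable for continuous data). Because the sign of a principal component is arbitrary, comparisons across pipelines need either a sign convention or |ρ|. All names are hypothetical.

```python
import numpy as np

def first_pc_scores(expression):
    """Region scores on the first principal component of a column-centered
    region-by-gene matrix (the sign of the component is arbitrary)."""
    centered = expression - expression.mean(axis=0)    # center each gene
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]

def spearman_no_ties(x, y):
    """Spearman's rho as the Pearson correlation of ranks; assumes no ties."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return float(np.corrcoef(rx, ry)[0, 1])

rng = np.random.default_rng(5)
n_regions, n_genes = 40, 300
t1w_t2w = rng.normal(size=n_regions)                    # synthetic T1w/T2w map
loadings = rng.normal(size=n_genes)
expression = np.outer(t1w_t2w, loadings) + 0.5 * rng.normal(size=(n_regions, n_genes))

rho = spearman_no_ties(t1w_t2w, first_pc_scores(expression))
```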

Assessing pipeline impact

To examine the impact of each processing option on the resulting analyses, we calculated a difference score. This score measured the extent to which changing each option, while holding all other options constant, influenced the derived metrics (i.e. correlation, silhouette score). When there were only two choices for a given option, the impact was calculated as the absolute difference between the metrics obtained with the two choices. When there were more than two choices and the choices were ordinal (e.g. sample-to-region matching tolerance), the impact was calculated as the average absolute difference between adjacent choices. When there were more than two choices and the choices were categorical (e.g. probe selection method), the impact was calculated as the average absolute difference between all pairs of choices. These calculations yielded a distribution of ‘impact’ estimates (i.e. change scores) for each processing option. We represented the final impact score for each processing option as the mean of this distribution, computed independently for each of the three analyses. Impact estimates were rank-ordered (where the most impactful parameter was given a rank of one, the second most impactful a rank of two, and so on) to enable direct comparison across the different statistical estimates derived from the three analyses.
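The impact calculation can be sketched over a toy grid of pipelines as follows. The option names, choices and the additive `toy_metric` are invented for illustration and stand in for the statistic produced by each real pipeline. The pairing rules follow the text: binary options use one absolute difference, ordinal options use adjacent choices, and categorical options use all pairs.

```python
import itertools
from statistics import mean

# Hypothetical toy grid: (option name, choices, kind of choice).
OPTIONS = [
    ("gene_norm", ["none", "zscore", "srs"], "categorical"),
    ("tolerance", [0, 1, 2], "ordinal"),
    ("region_agg", ["donor", "pooled"], "binary"),
]

def toy_metric(gene_norm, tolerance, region_agg):
    """Invented, additive stand-in for a statistic computed from one pipeline."""
    return ({"none": 0.0, "zscore": 0.5, "srs": 0.2}[gene_norm]
            + 0.1 * tolerance
            + {"donor": 0.0, "pooled": 0.05}[region_agg])

def impact_scores(options, results):
    """Average change score per option, holding all other options constant,
    plus ranks (1 = most impactful)."""
    impacts = {}
    for k, (name, choices, kind) in enumerate(options):
        other_choices = [c for j, (_, c, _) in enumerate(options) if j != k]
        changes = []
        for fixed in itertools.product(*other_choices):
            vals = []
            for choice in choices:
                combo = list(fixed)
                combo.insert(k, choice)            # vary only option k
                vals.append(results[tuple(combo)])
            if kind == "ordinal":                  # adjacent choices only
                pairs = zip(vals[:-1], vals[1:])
            else:                                  # binary or categorical: all pairs
                pairs = itertools.combinations(vals, 2)
            changes.append(mean(abs(a - b) for a, b in pairs))
        impacts[name] = mean(changes)
    ordered = sorted(impacts, key=impacts.get, reverse=True)
    ranks = {name: rank for rank, name in enumerate(ordered, start=1)}
    return impacts, ranks

all_choices = [choices for _, choices, _ in OPTIONS]
results = {combo: toy_metric(*combo) for combo in itertools.product(*all_choices)}
impacts, ranks = impact_scores(OPTIONS, results)
```

The toy metric is additive, so each option's impact is the same whatever the other options are set to. In real pipelines, options interact, and the averaging over all settings of the other options matters.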

Pipeline dimensionality reduction

To investigate qualitative differences between the processing pipelines, we performed a principal component analysis (PCA) on the matrix of estimates from the three statistical analyses (i.e. the 746,496 × 3 matrix). We mean-centered the columns of the matrix and extracted the first two principal components, examining how pipeline scores were distributed along these two components in relation to different processing options. These principal components capture the dimensions of maximal variation in the statistical estimates. Two pipelines that lie closer together in the reduced-dimension space therefore yielded more similar statistical estimates than two pipelines that lie farther apart.
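The sketch below computes this PCA through an eigendecomposition of the 3 × 3 covariance matrix of the mean-centered estimates, which is equivalent to the SVD route. The random estimates and their column scalings are stand-ins for the real pipelines × 3 matrix, and the function name is hypothetical.

```python
import numpy as np

def pipeline_pcs(estimates, n_components=2):
    """Project mean-centered pipeline estimates onto the leading PCs and
    return the scores with the fraction of variance each PC explains."""
    x = np.asarray(estimates, dtype=float)
    centered = x - x.mean(axis=0)                # mean-center each column
    cov = np.cov(centered, rowvar=False)         # 3 x 3 covariance matrix
    evals, evecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:n_components]
    scores = centered @ evecs[:, order]
    explained = evals[order] / evals.sum()
    return scores, explained

rng = np.random.default_rng(3)
# Stand-in for (distance rho, silhouette score, PC1 rho) from 1000 pipelines.
estimates = rng.normal(size=(1000, 3)) * [0.3, 0.05, 0.2]
scores, explained = pipeline_pcs(estimates)
# Distance in PC space: smaller values mean more similar estimates.
d01 = float(np.linalg.norm(scores[0] - scores[1]))
```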

Reproducing pipelines from the literature

Although all the processing options explored in the current manuscript are reasonable choices that researchers could make when preparing the AHBA for analysis, not all of them have been used in the published literature. To examine how pipelines used in the literature compared to those that we assessed, we selected nine articles that relied on data from the AHBA to support a primary research finding and reproduced their processing pipelines in abagen (Hawrylycz et al., 2015; French and Paus, 2015; Whitaker et al., 2016; Krienen et al., 2016; Anderson et al., 2018; Burt et al., 2018; Romero-Garcia et al., 2018; Anderson et al., 2020b; Liu et al., 2020). Because these articles used a variety of parcellations, we standardized this parameter to ensure comparability across pipelines, using the Desikan-Killiany atlas in all instances. One parameter that we did not assess in the pipelines explored in the current manuscript was frequently varied in the published pipelines: whether to use individualized, donor-specific parcellations or a group-level atlas. Thus, when reproducing pipelines that called for individualized volumetric atlases, we relied on the donor-specific Desikan-Killiany parcellations provided by Arnatkeviciute et al., 2019. When reproducing pipelines with individualized surface atlases, we relied on the donor-specific Desikan-Killiany parcellations provided by Romero-Garcia et al., 2018.

Not all of the original manuscripts detailed the processing choices for each of the 17 steps in the abagen workflow. When a specific parameter choice was not reported, we either (1) used the default setting if the parameter was required (e.g. using the mean for the ‘sample-to-region combination metric’, since all pipelines must combine samples into regions), or (2) omitted the processing step entirely if it was optional (e.g. not performing any gene normalization).
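The fallback rule for unreported parameters can be expressed as a small lookup. In the sketch below, only the two rules given in the text are encoded: the mean is the default combination metric, and gene normalization is skipped when unreported. The parameter keys and function are hypothetical. They do not reproduce abagen's API or its full set of 17 defaults, which are listed in Supplementary file 1.

```python
# Hypothetical specification: step -> (required?, default when unreported).
STEP_SPEC = {
    "sample_to_region_metric": (True, "mean"),
    "gene_normalization": (False, None),   # optional: omitted if unreported
}

def fill_unreported(reported, spec=STEP_SPEC):
    """Complete a literature pipeline: required steps fall back to their
    default, and unreported optional steps are omitted (None)."""
    filled = {}
    for step, (required, default) in spec.items():
        if step in reported:
            filled[step] = reported[step]
        elif required:
            filled[step] = default
        else:
            filled[step] = None            # step skipped entirely
    return filled

pipeline = fill_unreported({"gene_normalization": "zscore"})
```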

Acknowledgements

We thank Vincent Bazinet, Elizabeth DuPre, Justine Hansen, Golia Shafiei, Laura Suárez, and Bertha Vázquez-Rodríguez for their comments and suggestions. This research was undertaken thanks in part to funding from the Canada First Research Excellence Fund, awarded to McGill University for the Healthy Brains for Healthy Lives initiative. This work was supported in part by funding provided by Brain Canada, in partnership with Health Canada, for the Canadian Open Neuroscience Platform initiative. RDM acknowledges support from the Fonds de recherche du Québec – Nature et technologies and the Canadian Open Neuroscience Platform. BM acknowledges support from the Natural Sciences and Engineering Research Council of Canada (NSERC Discovery Grant RGPIN #017-04265) and from the Canada Research Chairs Program. AF was supported by the Sylvia and Charles Viertel Foundation and the National Health and Medical Research Council (ID: 3274306). J-BP was partially funded by the National Institutes of Health (NIH) grants NIH-NIBIB P41 EB019936 (ReproNim), NIH-NIMH R01 MH083320 (CANDIShare), and NIH RF1 MH120021 (NIDM); by the National Institute of Mental Health of the NIH under Award Number R01MH096906 (Neurosynth); and by the Natural Sciences and Engineering Research Council of Canada (NSERC).

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Ross D Markello, Email: ross.markello@mail.mcgill.ca.

Bratislav Misic, Email: bratislav.misic@mcgill.ca.

Saad Jbabdi, University of Oxford, United Kingdom.

Tamar R Makin, University College London, United Kingdom.

Funding Information

This paper was supported by the following grants:

  • Natural Sciences and Engineering Research Council of Canada 017-04265 to Bratislav Misic.

  • National Health and Medical Research Council 3274306 to Alex Fornito.

  • National Institutes of Health NIH-NIBIB P41 EB019936 to Jean-Baptiste Poline.

Additional information

Competing interests

No competing interests declared.

Author contributions

Conceptualization, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review and editing.

Conceptualization, Data curation, Software, Writing – review and editing.

Conceptualization, Writing – review and editing.

Conceptualization, Software, Writing – review and editing.

Conceptualization, Writing – review and editing.

Conceptualization, Project administration, Supervision, Visualization, Writing – original draft, Writing – review and editing.

Additional files

Transparent reporting form
Supplementary file 1. Default abagen pipeline options.

The default settings for the 17 processing steps considered when processing the AHBA data with abagen. An entry of ‘—’ indicates that this is a required, user-supplied parameter. A blank entry indicates that the processing step is not implemented by default. Refer to Table 1 and Methods: Gene expression pipelines for further details.

elife-72129-supp1.pdf (55.6KB, pdf)

Data availability

All datasets used in this study are publicly available. Detailed information about the datasets and how to access them is provided in the manuscript.

References

  1. Allen Institute for Brain Science. Allen Institute Publications for Brain Science; 2013. https://help.brain-map.org/display/humanbrain/Documentation
  2. Anderson KM, Krienen FM, Choi EY, Reinen JM, Yeo BTT, Holmes AJ. Gene expression links functional networks across cortex and striatum. Nature Communications. 2018;9:1428. doi: 10.1038/s41467-018-03811-x.
  3. Anderson KM, Collins MA, Chin R, Ge T, Rosenberg MD, Holmes AJ. Transcriptional and imaging-genetic association of cortical interneurons, brain function, and schizophrenia risk. Nature Communications. 2020a;11:2889. doi: 10.1038/s41467-020-16710-x.
  4. Anderson KM, Collins MA, Kong R, Fang K, Li J, He T, Chekroud AM, Yeo BTT, Holmes AJ. Convergent molecular, cellular, and cortical neuroimaging signatures of major depressive disorder. PNAS. 2020b;117:25138–25149. doi: 10.1073/pnas.2008004117.
  5. Arnatkeviciute A, Fulcher BD, Fornito A. A practical guide to linking brain-wide gene expression and neuroimaging data. NeuroImage. 2019;189:353–367. doi: 10.1016/j.neuroimage.2019.01.011.
  6. Arnatkevičiūtė A, Fulcher B, Oldham S, Tiego J, Paquola C, Gerring Z, Aquino K, Hawi Z, Johnson B, Ball G, Klein M, Deco G, Franke B, Bellgrove M, Fornito A. Genetic Influences on Hub Connectivity of the Human Connectome. bioRxiv. 2020. doi: 10.1101/2020.06.21.163915.
  7. Arnatkevičiūtė A, Fulcher B, Bellgrove M, Fornito A. Where the Genome Meets the Connectome: Understanding How Genes Shape Human Brain Connectivity. PsyArXiv. 2021. doi: 10.31234/osf.io/hqgz7.
  8. Avants BB, Tustison NJ, Song G, Cook PA, Klein A, Gee JC. A reproducible evaluation of ANTs similarity metric performance in brain image registration. NeuroImage. 2011;54:2033–2044. doi: 10.1016/j.neuroimage.2010.09.025.
  9. Beliveau V, Ganz M, Feng L, Ozenne B, Højgaard L, Fisher PM, Svarer C, Greve DN, Knudsen GM. A High-Resolution In Vivo Atlas of the Human Brain’s Serotonin System. The Journal of Neuroscience. 2017;37:120–128. doi: 10.1523/JNEUROSCI.2830-16.2016.
  10. Benkarim O, Paquola C, Park B-y, Hong SJ, Royer J, de Wael R, Larivière S, Valk S, Bzdok D, Mottron L. Functional Idiosyncrasy Has a Shared Topography with Group-Level Connectivity Alterations in Autism. bioRxiv. 2020. doi: 10.1101/2020.12.18.423291.
  11. Betzel RF, Bassett DS. Specificity and robustness of long-distance connections in weighted, interareal connectomes. PNAS. 2018;115:E4880–E4889. doi: 10.1073/pnas.1720186115.
  12. Bhagwat N, Barry A, Dickie EW, Brown ST, Devenyi GA, Hatano K, DuPre E, Dagher A, Chakravarty M, Greenwood CMT, Misic B, Kennedy DN, Poline J-B. Understanding the impact of preprocessing pipelines on neuroimaging cortical surface analyses. GigaScience. 2021;10:giaa155. doi: 10.1093/gigascience/giaa155.
  13. Botvinik-Nezer R, Holzmeister F, Camerer CF, Dreber A, Huber J, Johannesson M, Kirchler M, Iwanir R, Mumford JA, Adcock RA, Avesani P, Baczkowski BM, Bajracharya A, Bakst L, Ball S, Barilari M, Bault N, Beaton D, Beitner J, Benoit RG, Berkers RMWJ, Bhanji JP, Biswal BB, Bobadilla-Suarez S, Bortolini T, Bottenhorn KL, Bowring A, Braem S, Brooks HR, Brudner EG, Calderon CB, Camilleri JA, Castrellon JJ, Cecchetti L, Cieslik EC, Cole ZJ, Collignon O, Cox RW, Cunningham WA, Czoschke S, Dadi K, Davis CP, Luca AD, Delgado MR, Demetriou L, Dennison JB, Di X, Dickie EW, Dobryakova E, Donnat CL, Dukart J, Duncan NW, Durnez J, Eed A, Eickhoff SB, Erhart A, Fontanesi L, Fricke GM, Fu S, Galván A, Gau R, Genon S, Glatard T, Glerean E, Goeman JJ, Golowin SAE, González-García C, Gorgolewski KJ, Grady CL, Green MA, Guassi Moreira JF, Guest O, Hakimi S, Hamilton JP, Hancock R, Handjaras G, Harry BB, Hawco C, Herholz P, Herman G, Heunis S, Hoffstaedter F, Hogeveen J, Holmes S, Hu C-P, Huettel SA, Hughes ME, Iacovella V, Iordan AD, Isager PM, Isik AI, Jahn A, Johnson MR, Johnstone T, Joseph MJE, Juliano AC, Kable JW, Kassinopoulos M, Koba C, Kong X-Z, Koscik TR, Kucukboyaci NE, Kuhl BA, Kupek S, Laird AR, Lamm C, Langner R, Lauharatanahirun N, Lee H, Lee S, Leemans A, Leo A, Lesage E, Li F, Li MYC, Lim PC, Lintz EN, Liphardt SW, Losecaat Vermeer AB, Love BC, Mack ML, Malpica N, Marins T, Maumet C, McDonald K, McGuire JT, Melero H, Méndez Leal AS, Meyer B, Meyer KN, Mihai G, Mitsis GD, Moll J, Nielson DM, Nilsonne G, Notter MP, Olivetti E, Onicas AI, Papale P, Patil KR, Peelle JE, Pérez A, Pischedda D, Poline J-B, Prystauka Y, Ray S, Reuter-Lorenz PA, Reynolds RC, Ricciardi E, Rieck JR, Rodriguez-Thompson AM, Romyn A, Salo T, Samanez-Larkin GR, Sanz-Morales E, Schlichting ML, Schultz DH, Shen Q, Sheridan MA, Silvers JA, Skagerlund K, Smith A, Smith DV, Sokol-Hessner P, Steinkamp SR, Tashjian SM, Thirion B, Thorp JN, Tinghög G, Tisdall L, Tompson SH, Toro-Serey C, Torre Tresols JJ, Tozzi L, Truong V, Turella L, van ’t Veer AE, Verguts T, Vettel JM, Vijayarajah S, Vo K, Wall MB, Weeda WD, Weis S, White DJ, Wisniewski D, Xifra-Porxas A, Yearling EA, Yoon S, Yuan R, Yuen KSL, Zhang L, Zhang X, Zosky JE, Nichols TE, Poldrack RA, Schonberg T. Variability in the analysis of a single neuroimaging dataset by many teams. Nature. 2020;582:84–88. doi: 10.1038/s41586-020-2314-9.
  14. Brett M, Markiewicz CJ, Hanke M, Côté MA, Cipollini B, McCarthy P, Cheng CP, Halchenko YO, Cottaar M, Ghosh S. Nipy/Nibabel. Zenodo; 2019.
  15. Brown JA, Lee AJ, Pasquini L, Seeley WW. A Dynamic Gradient Architecture Generates Brain Activity States. bioRxiv. 2021. doi: 10.1101/2020.08.12.248112.
  16. Burt JB, Demirtaş M, Eckner WJ, Navejar NM, Ji JL, Martin WJ, Bernacchia A, Anticevic A, Murray JD. Hierarchy of transcriptomic specialization across human cortex captured by structural neuroimaging topography. Nature Neuroscience. 2018;21:1251–1259. doi: 10.1038/s41593-018-0195-0.
  17. Carp J. On the plurality of (methodological) worlds: estimating the analytic flexibility of FMRI experiments. Frontiers in Neuroscience. 2012;6:149. doi: 10.3389/fnins.2012.00149.
  18. Cieslak M, Cook PA, He X, Yeh FC, Dhollander T, Adebimpe A, Aguirre GK, Bassett DS, Betzel RF, Bourque J. QSIPrep: An Integrative Platform for Preprocessing and Reconstructing Diffusion MRI. bioRxiv. 2020. doi: 10.1101/2020.09.04.282269.
  19. Ciric R, Wolf DH, Power JD, Roalf DR, Baum GL, Ruparel K, Shinohara RT, Elliott MA, Eickhoff SB, Davatzikos C, Gur RC, Gur RE, Bassett DS, Satterthwaite TD. Benchmarking of participant-level confound regression strategies for the control of motion artifact in studies of functional connectivity. NeuroImage. 2017;154:174–187. doi: 10.1016/j.neuroimage.2017.03.020.
  20. Collins DL, Zijdenbos AP, Baaré WF, Evans AC. ANIMAL+INSECT: improved cortical structure segmentation. DBLP; 1999.
  21. Craddock RC, James GA, Holtzheimer PE, Hu XP, Mayberg HS. A whole brain fMRI atlas generated via spatially constrained spectral clustering. Human Brain Mapping. 2012;33:1914–1928. doi: 10.1002/hbm.21333.
  22. Darmanis S, Sloan SA, Zhang Y, Enge M, Caneda C, Shuer LM, Hayden Gephart MG, Barres BA, Quake SR. A survey of human brain transcriptome diversity at the single cell level. PNAS. 2015;112:7285–7290. doi: 10.1073/pnas.1507125112.
  23. Deco G, Aquino KM, Arnatkeviciute A, Oldham S, Sabaroedin K, Rogasch NC, Kringelbach ML, Fornito A. Dynamical Consequences of Regional Heterogeneity in the Brains Transcriptional Landscape. bioRxiv. 2020. doi: 10.1101/2020.10.28.359943.
  24. Demirtaş M, Burt JB, Helmer M, Ji JL, Adkinson BD, Glasser MF, Van Essen DC, Sotiropoulos SN, Anticevic A, Murray JD. Hierarchical Heterogeneity across Human Cortex Shapes Large-Scale Neural Dynamics. Neuron. 2019;101:1181–1194. doi: 10.1016/j.neuron.2019.01.017.
  25. Desikan RS, Ségonne F, Fischl B, Quinn BT, Dickerson BC, Blacker D, Buckner RL, Dale AM, Maguire RP, Hyman BT, Albert MS, Killiany RJ. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage. 2006;31:968–980. doi: 10.1016/j.neuroimage.2006.01.021.
  26. Dickie EW, Ameis SH, Shahab S, Calarco N, Smith DE, Miranda D, Viviano JD, Voineskos AN. Personalized Intrinsic Network Topography Mapping and Functional Connectivity Deficits in Autism Spectrum Disorder. Biological Psychiatry. 2018;84:278–286. doi: 10.1016/j.biopsych.2018.02.1174.
  27. Ding Y, Zhao K, Che T, Du K, Sun H, Liu S, Zheng Y, Li S, Liu B, Liu Y, Alzheimer’s Disease Neuroimaging Initiative. Quantitative Radiomic Features as New Biomarkers for Alzheimer’s Disease: An Amyloid PET Study. Cerebral Cortex. 2021;31:3950–3961. doi: 10.1093/cercor/bhab061.
  28. Dragicevic P, Jansen Y, Sarma A, Kay M, Chevalier F. Increasing the Transparency of Research Papers with Explorable Multiverse Analyses. The 2019 CHI Conference; 2019.
  29. Esteban O, Markiewicz CJ, Blair RW, Moodie CA, Isik AI, Erramuzpe A, Kent JD, Goncalves M, DuPre E, Snyder M, Oya H, Ghosh SS, Wright J, Durnez J, Poldrack RA, Gorgolewski KJ. fMRIPrep: a robust preprocessing pipeline for functional MRI. Nature Methods. 2019;16:111–116. doi: 10.1038/s41592-018-0235-4.
  30. Fonov VS, Evans AC, McKinstry RC, Almli C, Collins D. Unbiased nonlinear average age-appropriate brain templates from birth to adulthood. NeuroImage. 2009;47:S102. doi: 10.1016/S1053-8119(09)70884-5.
  31. Fonov V, Evans AC, Botteron K, Almli CR, McKinstry RC, Collins DL, Group BDC. Unbiased average age-appropriate atlases for pediatric studies. NeuroImage. 2011;54:313–327. doi: 10.1016/j.neuroimage.2010.07.033.
  32. Fornito A, Arnatkevičiūtė A, Fulcher BD. Bridging the Gap between Connectome and Transcriptome. Trends in Cognitive Sciences. 2019;23:34–50. doi: 10.1016/j.tics.2018.10.005.
  33. Fox AS, Chang LJ, Gorgolewski KJ, Yarkoni T. Bridging Psychology and Genetics Using Large-Scale Spatial Analysis of Neuroimaging and Neurogenetic Data. bioRxiv. 2014. doi: 10.1101/012310.
  34. French L, Paus T. A FreeSurfer view of the cortical transcriptome generated from the Allen Human Brain Atlas. Frontiers in Neuroscience. 2015;9:323. doi: 10.3389/fnins.2015.00323.
  35. Fulcher BD, Little MA, Jones NS. Highly comparative time-series analysis: the empirical structure of time series and their methods. Journal of the Royal Society, Interface. 2013;10:20130048. doi: 10.1098/rsif.2013.0048.
  36. Fulcher BD. Discovering Conserved Properties of Brain Organization Through Multimodal Integration and Interspecies Comparison. Journal of Experimental Neuroscience. 2019;13:1179069519862047. doi: 10.1177/1179069519862047.
  37. Fulcher BD, Murray JD, Zerbi V, Wang XJ. Multimodal gradients across mouse cortex. PNAS. 2019;116:4689–4695. doi: 10.1073/pnas.1814144116.
  38. Gandal MJ, Zhang P, Hadjimichael E, Walker RL, Chen C, Liu S, Won H, van Bakel H, Varghese M, Wang Y, Shieh AW, Haney J, Parhami S, Belmont J, Kim M, Moran Losada P, Khan Z, Mleczko J, Xia Y, Dai R, Wang D, Yang YT, Xu M, Fish K, Hof PR, Warrell J, Fitzgerald D, White K, Jaffe AE, PsychENCODE Consortium, Peters MA, Gerstein M, Liu C, Iakoucheva LM, Pinto D, Geschwind DH. Transcriptome-wide isoform-level dysregulation in ASD, schizophrenia, and bipolar disorder. Science. 2018;362:eaat8127. doi: 10.1126/science.aat8127.
  39. Gao R, van den Brink RL, Pfeffer T, Voytek B. Neuronal timescales are functionally dynamic and shaped by cortical microarchitecture. eLife. 2020;9:e61277. doi: 10.7554/eLife.61277.
  40. Gordon EM, Laumann TO, Gilmore AW, Newbold DJ, Greene DJ, Berg JJ, Ortega M, Hoyt-Drazen C, Gratton C, Sun H, Hampton JM, Coalson RS, Nguyen AL, McDermott KB, Shimony JS, Snyder AZ, Schlaggar BL, Petersen SE, Nelson SM, Dosenbach NUF. Precision Functional Mapping of Individual Human Brains. Neuron. 2017;95:791–807. doi: 10.1016/j.neuron.2017.07.011.
  41. Gorgolewski KJ, Fox AS, Chang L, Schäfer A, Arélin K, Burmann I, Sacher J, Margulies DS. Tight fitting genes: finding relations between statistical maps and gene expression patterns. F1000Research. 2014;5:1. doi: 10.7490/F1000RESEARCH.1097120.1.
  42. Gorgolewski KJ, Varoquaux G, Rivera G, Schwarz Y, Ghosh SS, Maumet C, Sochat VV, Nichols TE, Poldrack RA, Poline J-B, Yarkoni T, Margulies DS. NeuroVault.org: a web-based repository for collecting and sharing unthresholded statistical maps of the human brain. Frontiers in Neuroinformatics. 2015;9:8. doi: 10.3389/fninf.2015.00008.
  43. Goulas A, Betzel RF, Hilgetag CC. Spatiotemporal ontogeny of brain wiring. Science Advances. 2019;5:eaav9694. doi: 10.1126/sciadv.aav9694.
  44. Hansen JY, Markello RD, Vogel JW, Seidlitz J, Bzdok D, Misic B. Mapping gene transcription and neurocognition across human neocortex. Nature Human Behaviour. 2021;5:1240–1250. doi: 10.1038/s41562-021-01082-z.
  45. Harris CR, Millman KJ, van der Walt SJ, Gommers R, Virtanen P, Cournapeau D, Wieser E, Taylor J, Berg S, Smith NJ, Kern R, Picus M, Hoyer S, van Kerkwijk MH, Brett M, Haldane A, Del Río JF, Wiebe M, Peterson P, Gérard-Marchant P, Sheppard K, Reddy T, Weckesser W, Abbasi H, Gohlke C, Oliphant TE. Array programming with NumPy. Nature. 2020;585:357–362. doi: 10.1038/s41586-020-2649-2.
  46. Hawrylycz MJ, Lein ES, Guillozet-Bongaarts AL, Shen EH, Ng L, Miller JA, van de Lagemaat LN, Smith KA, Ebbert A, Riley ZL, Abajian C, Beckmann CF, Bernard A, Bertagnolli D, Boe AF, Cartagena PM, Chakravarty MM, Chapin M, Chong J, Dalley RA, David Daly B, Dang C, Datta S, Dee N, Dolbeare TA, Faber V, Feng D, Fowler DR, Goldy J, Gregor BW, Haradon Z, Haynor DR, Hohmann JG, Horvath S, Howard RE, Jeromin A, Jochim JM, Kinnunen M, Lau C, Lazarz ET, Lee C, Lemon TA, Li L, Li Y, Morris JA, Overly CC, Parker PD, Parry SE, Reding M, Royall JJ, Schulkin J, Sequeira PA, Slaughterbeck CR, Smith SC, Sodt AJ, Sunkin SM, Swanson BE, Vawter MP, Williams D, Wohnoutka P, Zielke HR, Geschwind DH, Hof PR, Smith SM, Koch C, Grant SGN, Jones AR. An anatomically comprehensive atlas of the adult human brain transcriptome. Nature. 2012;489:391–399. doi: 10.1038/nature11405.
  47. Hawrylycz M, Miller JA, Menon V, Feng D, Dolbeare T, Guillozet-Bongaarts AL, Jegga AG, Aronow BJ, Lee C-K, Bernard A, Glasser MF, Dierker DL, Menche J, Szafer A, Collman F, Grange P, Berman KA, Mihalas S, Yao Z, Stewart L, Barabási A-L, Schulkin J, Phillips J, Ng L, Dang C, Haynor DR, Jones A, Van Essen DC, Koch C, Lein E. Canonical genetic signatures of the adult human brain. Nature Neuroscience. 2015;18:1832–1844. doi: 10.1038/nn.4171.
  48. Henderson MX, Cornblath EJ, Darwich A, Zhang B, Brown H, Gathagan RJ, Sandler RM, Bassett DS, Trojanowski JQ, Lee VMY. Spread of α-synuclein pathology through the brain connectome is modulated by selective vulnerability and predicted by network analysis. Nature Neuroscience. 2019;22:1248–1257. doi: 10.1038/s41593-019-0457-5.
  49. Horvát S, Gămănuț R, Ercsey-Ravasz M, Magrou L, Gămănuț B, Van Essen DC, Burkhalter A, Knoblauch K, Toroczkai Z, Kennedy H. Spatial Embedding and Wiring Cost Constrain the Functional Layout of the Cortical Network of Rodents and Primates. PLOS Biology. 2016;14:e1002512. doi: 10.1371/journal.pbio.1002512.
  50. Hunter JD. Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering. 2007;9:90–95. doi: 10.1109/MCSE.2007.55.
  51. Kang HJ, Kawasawa YI, Cheng F, Zhu Y, Xu X, Li M, Sousa AMM, Pletikos M, Meyer KA, Sedmak G, Guennel T, Shin Y, Johnson MB, Krsnik Z, Mayer S, Fertuzinhos S, Umlauf S, Lisgo SN, Vortmeyer A, Weinberger DR, Mane S, Hyde TM, Huttner A, Reimers M, Kleinman JE, Sestan N. Spatio-temporal transcriptome of the human brain. Nature. 2011;478:483–489. doi: 10.1038/nature10523.
  52. Kharabian Masouleh S, Eickhoff SB, Zeighami Y, Lewis LB, Dahnke R, Gaser C, Chouinard-Decorte F, Lepage C, Scholtens LH, Hoffstaedter F, Glahn DC, Blangero J, Evans AC, Genon S, Valk SL. Influence of Processing Pipeline on Cortical Thickness Measurement. Cerebral Cortex. 2020;30:5014–5027. doi: 10.1093/cercor/bhaa097.
  53. Kirsch L, Chechik G. On Expression Patterns and Developmental Origin of Human Brain Regions. PLOS Computational Biology. 2016;12:e1005064. doi: 10.1371/journal.pcbi.1005064.
  54. Kluyver T, Ragan-Kelley B, Pérez F, Granger BE, Bussonnier M, Frederic J, Kelley K, Hamrick JB, Grout J, Corlay S. Jupyter Notebooks – a publishing format for reproducible computational workflows. In: Loizides F, Schmidt B, editors. Positioning and Power in Academic Publishing: Players, Agents and Agendas. IOS Press; 2016. pp. 1–164.
  55. Kong R, Li J, Orban C, Sabuncu MR, Liu H, Schaefer A, Sun N, Zuo X-N, Holmes AJ, Eickhoff SB, Yeo BTT. Spatial Topography of Individual-Specific Cortical Networks Predicts Human Cognition, Personality, and Emotion. Cerebral Cortex. 2019;29:2533–2551. doi: 10.1093/cercor/bhy123.
  56. Krienen FM, Yeo BTT, Ge T, Buckner RL, Sherwood CC. Transcriptional profiles of supragranular-enriched genes associate with corticocortical network architecture in the human brain. PNAS. 2016;113:E469–E478. doi: 10.1073/pnas.1510903113.
  57. Lake BB, Ai R, Kaeser GE, Salathia NS, Yung YC, Liu R, Wildberg A, Gao D, Fung H-L, Chen S, Vijayaraghavan R, Wong J, Chen A, Sheng X, Kaper F, Shen R, Ronaghi M, Fan J-B, Wang W, Chun J, Zhang K. Neuronal subtypes and diversity revealed by single-nucleus RNA sequencing of the human brain. Science. 2016;352:1586–1590. doi: 10.1126/science.aaf1204.
  58. Lariviere S, Paquola C, Park B-y, Royer J, Wang Y, Benkarim O, de Wael R, Valk SL, Thomopoulos S, Kirschner M. The ENIGMA Toolbox: Cross-Disorder Integration and Multiscale Neural Contextualization of Multisite Neuroimaging Datasets. bioRxiv. 2020 doi: 10.1101/2020.12.21.423838. [DOI] [PMC free article] [PubMed]
  59. Lau HYG, Fornito A, Fulcher BD. Scaling of gene transcriptional gradients with brain size across mouse development. NeuroImage. 2021;224:117395. doi: 10.1016/j.neuroimage.2020.117395. [DOI] [PubMed] [Google Scholar]
  60. Lein ES, Hawrylycz MJ, Ao N, Ayres M, Bensinger A, Bernard A, Boe AF, Boguski MS, Brockway KS, Byrnes EJ, Chen L, Chen L, Chen T-M, Chin MC, Chong J, Crook BE, Czaplinska A, Dang CN, Datta S, Dee NR, Desaki AL, Desta T, Diep E, Dolbeare TA, Donelan MJ, Dong H-W, Dougherty JG, Duncan BJ, Ebbert AJ, Eichele G, Estin LK, Faber C, Facer BA, Fields R, Fischer SR, Fliss TP, Frensley C, Gates SN, Glattfelder KJ, Halverson KR, Hart MR, Hohmann JG, Howell MP, Jeung DP, Johnson RA, Karr PT, Kawal R, Kidney JM, Knapik RH, Kuan CL, Lake JH, Laramee AR, Larsen KD, Lau C, Lemon TA, Liang AJ, Liu Y, Luong LT, Michaels J, Morgan JJ, Morgan RJ, Mortrud MT, Mosqueda NF, Ng LL, Ng R, Orta GJ, Overly CC, Pak TH, Parry SE, Pathak SD, Pearson OC, Puchalski RB, Riley ZL, Rockett HR, Rowland SA, Royall JJ, Ruiz MJ, Sarno NR, Schaffnit K, Shapovalova NV, Sivisay T, Slaughterbeck CR, Smith SC, Smith KA, Smith BI, Sodt AJ, Stewart NN, Stumpf K-R, Sunkin SM, Sutram M, Tam A, Teemer CD, Thaller C, Thompson CL, Varnam LR, Visel A, Whitlock RM, Wohnoutka PE, Wolkey CK, Wong VY, Wood M, Yaylaoglu MB, Young RC, Youngstrom BL, Yuan XF, Zhang B, Zwingman TA, Jones AR. Genome-wide atlas of gene expression in the adult mouse brain. Nature. 2007;445:168–176. doi: 10.1038/nature05453. [DOI] [PubMed] [Google Scholar]
  61. Li M, Santpere G, Imamura Kawasawa Y, Evgrafov OV, Gulden FO, Pochareddy S, Sunkin SM, Li Z, Shin Y, Zhu Y, Sousa AMM, Werling DM, Kitchen RR, Kang HJ, Pletikos M, Choi J, Muchnik S, Xu X, Wang D, Lorente-Galdos B, Liu S, Giusti-Rodríguez P, Won H, de Leeuw CA, Pardiñas AF, Hu M, Jin F, Li Y, Owen MJ, O’Donovan MC, Walters JTR, Posthuma D, Reimers MA, Levitt P, Weinberger DR, Hyde TM, Kleinman JE, Geschwind DH, Hawrylycz MJ, State MW, Sanders SJ, Sullivan PF, Gerstein MB, Lein ES, Knowles JA, Sestan N, BrainSpan Consortium. PsychENCODE Consortium. PsychENCODE Developmental Subgroup. Integrative functional genomic analysis of human brain development and neuropsychiatric risks. Science. 2018;362:eaat7615. doi: 10.1126/science.aat7615. [DOI] [PMC free article] [PubMed] [Google Scholar]
  62. Liu J, Xia M, Wang X, Liao X, He Y. The spatial organization of the chronnectome associates with cortical hierarchy and transcriptional profiles in the human brain. NeuroImage. 2020;222:117296. doi: 10.1016/j.neuroimage.2020.117296. [DOI] [PubMed] [Google Scholar]
  63. Maier-Hein KH, Neher PF, Houde JC, Côté MA, Garyfallidis E, Zhong J, Chamberland M, Yeh FC, Lin YC, Ji Q, Reddick WE, Glass JO, Chen DQ, Feng Y, Gao C, Wu Y, Ma J, He R, Li Q, Westin CF, Deslauriers-Gauthier S, González JOO, Paquette M, St-Jean S, Girard G, Rheault F, Sidhu J, Tax CMW, Guo F, Mesri HY, Dávid S, Froeling M, Heemskerk AM, Leemans A, Boré A, Pinsard B, Bedetti C, Desrosiers M, Brambati S, Doyon J, Sarica A, Vasta R, Cerasa A, Quattrone A, Yeatman J, Khan AR, Hodges W, Alexander S, Romascano D, Barakovic M, Auría A, Esteban O, Lemkaddem A, Thiran JP, Cetingul HE, Odry BL, Mailhe B, Nadar MS, Pizzagalli F, Prasad G, Villalon-Reina JE, Galvis J, Thompson PM, Requejo FDS, Laguna PL, Lacerda LM, Barrett R, Dell’Acqua F, Catani M, Petit L, Caruyer E, Daducci A, Dyrby TB, Holland-Letz T, Hilgetag CC, Stieltjes B, Descoteaux M. The challenge of mapping the human connectome based on diffusion tractography. Nature Communications. 2017;8:1349. doi: 10.1038/s41467-017-01285-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  64. Markello RD, Misic B. Comparing spatial null models for brain maps. NeuroImage. 2021;236:118052. doi: 10.1016/j.neuroimage.2021.118052. [DOI] [PubMed] [Google Scholar]
  65. Markello R. markello_transcriptome. swh:1:rev:3abbc85596a5baacd93e5e9e56c906c9dbb080f3. Software Heritage. 2021a. https://archive.softwareheritage.org/swh:1:dir:ed4b1a9e5eb2449f1d9f5bb65c51477aa8c350dc;origin=https://github.com/netneurolab/markello_transcriptome;visit=swh:1:snp:4f5eeca5d011970f437459b46fbf885ac1554644;anchor=swh:1:rev:3abbc85596a5baacd93e5e9e56c906c9dbb080f3
  66. Markello R. abagen. swh:1:rev:2aeab5bd0f147fa76b488645e148a1c18095378d. Software Heritage. 2021b. https://archive.softwareheritage.org/swh:1:dir:24ed1ac6001e876742bf4c8317902313926be07c;origin=https://github.com/rmarkello/abagen;visit=swh:1:snp:7d534f07cc7c0a549243db17dc6de7d2ede98383;anchor=swh:1:rev:2aeab5bd0f147fa76b488645e148a1c18095378d
  67. Markello R, Shafiei G, Zheng YQ, Mišić B. 2021c. Rmarkello/Abagen. Zenodo. [DOI]
  68. Martins D, Dipasquale O, Veronese M, Turkheimer FE, Loggia M, McMahon S, Williams SC. Transcriptional and Cellular Signatures of Cortical Morphometric Similarity Remodelling in Chronic Pain. bioRxiv. 2021 doi: 10.1101/2021.03.24.436777. [DOI] [PMC free article] [PubMed]
  69. McColgan P, Gregory S, Seunarine KK, Razi A, Papoutsi M, Johnson E, Durr A, Roos RAC, Leavitt BR, Holmans P, Scahill RI, Clark CA, Rees G, Tabrizi SJ, Track-On HD Investigators. Brain Regions Showing White Matter Loss in Huntington’s Disease Are Enriched for Synaptic and Metabolic Genes. Biological Psychiatry. 2018;83:456–465. doi: 10.1016/j.biopsych.2017.10.019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  70. McKinney W. Data Structures for Statistical Computing in Python. Proceedings of the 9th Python in Science Conference; 2010. [DOI] [Google Scholar]
  71. Messé A. Parcellation influence on the connectivity-based structure-function relationship in the human brain. Human Brain Mapping. 2020;41:1167–1180. doi: 10.1002/hbm.24866. [DOI] [PMC free article] [PubMed] [Google Scholar]
  72. Miller JA, Ding S-L, Sunkin SM, Smith KA, Ng L, Szafer A, Ebbert A, Riley ZL, Royall JJ, Aiona K, Arnold JM, Bennet C, Bertagnolli D, Brouner K, Butler S, Caldejon S, Carey A, Cuhaciyan C, Dalley RA, Dee N, Dolbeare TA, Facer BAC, Feng D, Fliss TP, Gee G, Goldy J, Gourley L, Gregor BW, Gu G, Howard RE, Jochim JM, Kuan CL, Lau C, Lee C-K, Lee F, Lemon TA, Lesnar P, McMurray B, Mastan N, Mosqueda N, Naluai-Cecchini T, Ngo N-K, Nyhus J, Oldre A, Olson E, Parente J, Parker PD, Parry SE, Stevens A, Pletikos M, Reding M, Roll K, Sandman D, Sarreal M, Shapouri S, Shapovalova NV, Shen EH, Sjoquist N, Slaughterbeck CR, Smith M, Sodt AJ, Williams D, Zöllei L, Fischl B, Gerstein MB, Geschwind DH, Glass IA, Hawrylycz MJ, Hevner RF, Huang H, Jones AR, Knowles JA, Levitt P, Phillips JW, Sestan N, Wohnoutka P, Dang C, Bernard A, Hohmann JG, Lein ES. Transcriptional landscape of the prenatal human brain. Nature. 2014;508:199–206. doi: 10.1038/nature13185. [DOI] [PMC free article] [PubMed] [Google Scholar]
  73. Mišić B, Fatima Z, Askren MK, Buschkuehl M, Churchill N, Cimprich B, Deldin PJ, Jaeggi S, Jung M, Korostil M, Kross E, Krpan KM, Peltier S, Reuter-Lorenz PA, Strother SC, Jonides J, McIntosh AR, Berman MG. The functional connectivity landscape of the human brain. PLOS ONE. 2014;9:e111007. doi: 10.1371/journal.pone.0111007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  74. Morgan SE, Seidlitz J, Whitaker KJ, Romero-Garcia R, Clifton NE, Scarpazza C, van Amelsvoort T, Marcelis M, van Os J, Donohoe G, Mothersill D, Corvin A, Pocklington A, Raznahan A, McGuire P, Vértes PE, Bullmore ET. Cortical patterning of abnormal morphometric similarity in psychosis is associated with brain expression of schizophrenia-related genes. PNAS. 2019;116:9604–9609. doi: 10.1073/pnas.1820754116. [DOI] [PMC free article] [PubMed] [Google Scholar]
  75. Negi SK, Guda C. Global gene expression profiling of healthy human brain and its application in studying neurological disorders. Scientific Reports. 2017;7:897. doi: 10.1038/s41598-017-00952-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  76. Nørgaard M, Beliveau V, Ganz M, Svarer C, Pinborg LH, Keller SH, Jensen PS, Greve DN, Knudsen GM. A high-resolution in vivo atlas of the human brain’s benzodiazepine binding site of GABAA receptors. NeuroImage. 2021;232:117878. doi: 10.1016/j.neuroimage.2021.117878. [DOI] [PMC free article] [PubMed] [Google Scholar]
  77. Oldham MC, Konopka G, Iwamoto K, Langfelder P, Kato T, Horvath S, Geschwind DH. Functional organization of the transcriptome in human brain. Nature Neuroscience. 2008;11:1271–1282. doi: 10.1038/nn.2207. [DOI] [PMC free article] [PubMed] [Google Scholar]
  78. Oldham S, Arnatkevičiūtė A, Smith RE, Tiego J, Bellgrove MA, Fornito A. The efficacy of different preprocessing steps in reducing motion-related confounds in diffusion MRI connectomics. NeuroImage. 2020;222:117252. doi: 10.1016/j.neuroimage.2020.117252. [DOI] [PubMed] [Google Scholar]
  79. Oliphant TE. A Guide to NumPy. Trelgol Publishing USA; 2006. [Google Scholar]
  80. Park B-y, Park H, Morys F, Kim M, Byeon K, Lee H, Kim SH, Valk S, Dagher A, Bernhardt B. Body Mass Variations Relate to Fractionated Functional Brain Hierarchies. bioRxiv. 2020 doi: 10.1101/2020.08.07.241794. [DOI] [PMC free article] [PubMed]
  81. Park BY, Bethlehem RA, Paquola C, Larivière S, Rodríguez-Cruces R, Vos de Wael R, Neuroscience in Psychiatry Network (NSPN) Consortium. Bullmore ET, Bernhardt BC. An expanding manifold in transmodal regions characterizes adolescent reconfiguration of structural connectome organization. eLife. 2021;10:e64694. doi: 10.7554/eLife.64694. [DOI] [PMC free article] [PubMed] [Google Scholar]
  82. Parkes L, Fulcher BD, Yücel M, Fornito A. Transcriptional signatures of connectomic subregions of the human striatum. Genes, Brain, and Behavior. 2017;16:647–663. doi: 10.1111/gbb.12386. [DOI] [PubMed] [Google Scholar]
  83. Parkes L, Fulcher B, Yücel M, Fornito A. An evaluation of the efficacy, reliability, and sensitivity of motion correction strategies for resting-state functional MRI. NeuroImage. 2018;171:415–436. doi: 10.1016/j.neuroimage.2017.12.073. [DOI] [PubMed] [Google Scholar]
  84. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research. 2011;12:2825–2830. [Google Scholar]
  85. Perez F, Granger BE. IPython: A System for Interactive Scientific Computing. Computing in Science & Engineering. 2007;9:21–29. doi: 10.1109/MCSE.2007.53. [DOI] [Google Scholar]
  86. Preller KH, Burt JB, Ji JL, Schleifer CH, Adkinson BD, Stämpfli P, Seifritz E, Repovs G, Krystal JH, Murray JD, Vollenweider FX, Anticevic A. Changes in global and thalamic brain connectivity in LSD-induced altered states of consciousness are attributable to the 5-HT2A receptor. eLife. 2018;7:e35082. doi: 10.7554/eLife.35082. [DOI] [PMC free article] [PubMed] [Google Scholar]
  87. Richiardi J, Altmann A, Milazzo AC, Chang C, Chakravarty MM, Banaschewski T, Barker GJ, Bokde ALW, Bromberg U, Büchel C, Conrod P, Fauth-Bühler M, Flor H, Frouin V, Gallinat J, Garavan H, Gowland P, Heinz A, Lemaître H, Mann KF, Martinot JL, Nees F, Paus T, Pausova Z, Rietschel M, Robbins TW, Smolka MN, Spanagel R, Ströhle A, Schumann G, Hawrylycz M, Poline JB, Greicius MD, IMAGEN consortium. Correlated gene expression supports synchronous activity in brain networks. Science. 2015;348:1241–1244. doi: 10.1126/science.1255905. [DOI] [PMC free article] [PubMed] [Google Scholar]
  88. Richiardi J, Altmann A, Greicius M. Distance Is Not Everything in Imaging Genomics of Functional Networks: Reply to a Commentary on Correlated Gene Expression Supports Synchronous Activity in Brain Networks. bioRxiv. 2017 doi: 10.1101/132746. [DOI]
  89. Rittman T, Rubinov M, Vértes PE, Patel AX, Ginestet CE, Ghosh BCP, Barker RA, Spillantini MG, Bullmore ET, Rowe JB. Regional expression of the MAPT gene is associated with loss of hubs in brain networks and cognitive impairment in Parkinson disease and progressive supranuclear palsy. Neurobiology of Aging. 2016;48:153–160. doi: 10.1016/j.neurobiolaging.2016.09.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  90. Rittman T, Rittman M, Azevedo T. Maybrain software package. RittmanResearch. 2017 https://github.com/RittmanResearch/maybrain
  91. Rizzo G, Veronese M, Expert P, Turkheimer FE, Bertoldo A. MENGA: A New Comprehensive Tool for the Integration of Neuroimaging Data and the Allen Human Brain Transcriptome Atlas. PLOS ONE. 2016;11:e0148744. doi: 10.1371/journal.pone.0148744. [DOI] [PMC free article] [PubMed] [Google Scholar]
  92. Roberts JA, Perry A, Lord AR, Roberts G, Mitchell PB, Smith RE, Calamante F, Breakspear M. The contribution of geometry to the human connectome. NeuroImage. 2016;124:379–393. doi: 10.1016/j.neuroimage.2015.09.009. [DOI] [PubMed] [Google Scholar]
  93. Romero-Garcia R, Whitaker KJ, Váša F, Seidlitz J, Shinn M, Fonagy P, Dolan RJ, Jones PB, Goodyer IM, NSPN Consortium. Bullmore ET, Vértes PE. Structural covariance networks are coupled to expression of genes enriched in supragranular layers of the human cortex. NeuroImage. 2018;171:256–267. doi: 10.1016/j.neuroimage.2017.12.060. [DOI] [PMC free article] [PubMed] [Google Scholar]
  94. Romme IAC, de Reus MA, Ophoff RA, Kahn RS, van den Heuvel MP. Connectome Disconnectivity and Cortical Gene Expression in Patients With Schizophrenia. Biological Psychiatry. 2017;81:495–502. doi: 10.1016/j.biopsych.2016.07.012. [DOI] [PubMed] [Google Scholar]
  95. Rousseeuw PJ. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics. 1987;20:53–65. doi: 10.1016/0377-0427(87)90125-7. [DOI] [Google Scholar]
  96. Schilling KG, Nath V, Hansen C, Parvathaneni P, Blaber J, Gao Y, Neher P, Aydogan DB, Shi Y, Ocampo-Pineda M, Schiavi S, Daducci A, Girard G, Barakovic M, Rafael-Patino J, Romascano D, Rensonnet G, Pizzolato M, Bates A, Fischi E, Thiran J-P, Canales-Rodríguez EJ, Huang C, Zhu H, Zhong L, Cabeen R, Toga AW, Rheault F, Theaud G, Houde J-C, Sidhu J, Chamberland M, Westin C-F, Dyrby TB, Verma R, Rathi Y, Irfanoglu MO, Thomas C, Pierpaoli C, Descoteaux M, Anderson AW, Landman BA. Limits to anatomical accuracy of diffusion tractography using modern approaches. NeuroImage. 2019;185:1–11. doi: 10.1016/j.neuroimage.2018.10.029. [DOI] [PMC free article] [PubMed] [Google Scholar]
  97. Seidlitz J, Váša F, Shinn M, Romero-Garcia R, Whitaker KJ, Vértes PE, Wagstyl K, Kirkpatrick Reardon P, Clasen L, Liu S, Messinger A, Leopold DA, Fonagy P, Dolan RJ, Jones PB, Goodyer IM, NSPN Consortium. Raznahan A, Bullmore ET. Morphometric Similarity Networks Detect Microscale Cortical Organization and Predict Inter-Individual Cognitive Variation. Neuron. 2018;97:231–247. doi: 10.1016/j.neuron.2017.11.039. [DOI] [PMC free article] [PubMed] [Google Scholar]
  98. Seidlitz J, Nadig A, Liu S, Bethlehem RAI, Vértes PE, Morgan SE, Váša F, Romero-Garcia R, Lalonde FM, Clasen LS, Blumenthal JD, Paquola C, Bernhardt B, Wagstyl K, Polioudakis D, de la Torre-Ubieta L, Geschwind DH, Han JC, Lee NR, Murphy DG, Bullmore ET, Raznahan A. Author Correction: Transcriptomic and cellular decoding of regional brain vulnerability to neurogenetic disorders. Nature Communications. 2020;11:5936. doi: 10.1038/s41467-020-19362-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  99. Sepulcre J, Grothe MJ, d’Oleire Uquillas F, Ortiz-Terán L, Diez I, Yang H-S, Jacobs HIL, Hanseeuw BJ, Li Q, El-Fakhri G, Sperling RA, Johnson KA. Neurogenetic contributions to amyloid beta and tau spreading in the human cortex. Nature Medicine. 2018;24:1910–1918. doi: 10.1038/s41591-018-0206-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  100. Shafiei G, Markello RD, Vos de Wael R, Bernhardt BC, Fulcher BD, Misic B. Topographic gradients of intrinsic dynamics across neocortex. eLife. 2020;9:e62116. doi: 10.7554/eLife.62116. [DOI] [PMC free article] [PubMed] [Google Scholar]
  101. Shafiei G, Bazinet V, Dadar M. Global Network Structure and Local Transcriptomic Vulnerability Shape Atrophy in Sporadic and Genetic Behavioral Variant Frontotemporal Dementia. bioRxiv. 2021 doi: 10.1101/2021.08.24.457538. [DOI]
  102. Shin J, French L, Xu T, Leonard G, Perron M, Pike GB, Richer L, Veillette S, Pausova Z, Paus T. Cell-Specific Gene-Expression Profiles and Cortical Thickness in the Human Brain. Cerebral Cortex. 2018;28:3267–3277. doi: 10.1093/cercor/bhx197. [DOI] [PubMed] [Google Scholar]
  103. Shine JM, Breakspear M, Bell PT, Ehgoetz Martens KA, Shine R, Koyejo O, Sporns O, Poldrack RA. Human cognition involves the dynamic integration of neural activity and neuromodulatory systems. Nature Neuroscience. 2019;22:289–296. doi: 10.1038/s41593-018-0312-0. [DOI] [PubMed] [Google Scholar]
  104. Simmons JP, Nelson LD, Simonsohn U. False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science. 2011;22:1359–1366. doi: 10.1177/0956797611417632. [DOI] [PubMed] [Google Scholar]
  105. Sousa AMM, Zhu Y, Raghanti MA, Kitchen RR, Onorati M, Tebbenkamp ATN, Stutz B, Meyer KA, Li M, Kawasawa YI, Liu F, Perez RG, Mele M, Carvalho T, Skarica M, Gulden FO, Pletikos M, Shibata A, Stephenson AR, Edler MK, Ely JJ, Elsworth JD, Horvath TL, Hof PR, Hyde TM, Kleinman JE, Weinberger DR, Reimers M, Lifton RP, Mane SM, Noonan JP, State MW, Lein ES, Knowles JA, Marques-Bonet T, Sherwood CC, Gerstein MB, Sestan N. Molecular and cellular reorganization of neural circuits in the human lineage. Science. 2017;358:1027–1032. doi: 10.1126/science.aan3456. [DOI] [PMC free article] [PubMed] [Google Scholar]
  106. Steegen S, Tuerlinckx F, Gelman A, Vanpaemel W. Increasing Transparency Through a Multiverse Analysis. Perspectives on Psychological Science. 2016;11:702–712. doi: 10.1177/1745691616658637. [DOI] [PubMed] [Google Scholar]
  107. Thirion B, Varoquaux G, Dohmatob E, Poline JB. Which fMRI clustering gives good brain parcellations? Frontiers in Neuroscience. 2014;8:167. doi: 10.3389/fnins.2014.00167. [DOI] [PMC free article] [PubMed] [Google Scholar]
  108. Thompson WH, Wright J, Bissett PG, Poldrack RA. Dataset decay and the problem of sequential analyses on open datasets. eLife. 2020;9:e53498. doi: 10.7554/eLife.53498. [DOI] [PMC free article] [PubMed] [Google Scholar]
  109. Valk SL, Kanske P, Park B-y, Hong SJ, Boeckler-Raettig A, Trautwein FM, Bernhardt BC, Singer T. Functional Network Plasticity of the Human Social Brain. bioRxiv. 2021 doi: 10.1101/2020.11.11.377895. [DOI]
  110. van der Walt S, Colbert SC, Varoquaux G. The NumPy Array: A Structure for Efficient Numerical Computation. Computing in Science & Engineering. 2011;13:22–30. doi: 10.1109/MCSE.2011.37. [DOI] [Google Scholar]
  111. Van Essen DC, Smith SM, Barch DM, Behrens TEJ, Yacoub E, Ugurbil K, WU-Minn HCP Consortium. The WU-Minn Human Connectome Project: an overview. NeuroImage. 2013;80:62–79. doi: 10.1016/j.neuroimage.2013.05.041. [DOI] [PMC free article] [PubMed] [Google Scholar]
  112. Vértes PE, Rittman T, Whitaker KJ, Romero-Garcia R, Váša F, Kitzbichler MG, Wagstyl K, Fonagy P, Dolan RJ, Jones PB, Goodyer IM, NSPN Consortium. Bullmore ET. Gene transcription profiles associated with inter-modular hubs and connection distance in human functional magnetic resonance imaging networks. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences. 2016;371:20150362. doi: 10.1098/rstb.2015.0362. [DOI] [PMC free article] [PubMed] [Google Scholar]
  113. Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, Burovski E, Peterson P, Weckesser W, Bright J, van der Walt SJ, Brett M, Wilson J, Millman KJ, Mayorov N, Nelson ARJ, Jones E, Kern R, Larson E, Carey CJ, Polat İ, Feng Y, Moore EW, VanderPlas J, Laxalde D, Perktold J, Cimrman R, Henriksen I, Quintero EA, Harris CR, Archibald AM, Ribeiro AH, Pedregosa F, van Mulbregt P, SciPy 1.0 Contributors. Vijaykumar A, Bardelli AP, Rothberg A, Hilboll A, Kloeckner A, Scopatz A, Lee A, Rokem A, Woods CN, Fulton C, Masson C, Häggström C, Fitzgerald C, Nicholson DA, Hagen DR, Pasechnik DV, Olivetti E, Martin E, Wieser E, Silva F, Lenders F, Wilhelm F, Young G, Price GA, Ingold GL, Allen GE, Lee GR, Audren H, Probst I, Dietrich JP, Silterra J, Webber JT, Slavič J, Nothman J, Buchner J, Kulick J, Schönberger JL, de Miranda Cardoso JV, Reimer J, Harrington J, Rodríguez JLC, Nunez-Iglesias J, Kuczynski J, Tritz K, Thoma M, Newville M, Kümmerer M, Bolingbroke M, Tartre M, Pak M, Smith NJ, Nowaczyk N, Shebanov N, Pavlyk O, Brodtkorb PA, Lee P, McGibbon RT, Feldbauer R, Lewis S, Tygier S, Sievert S, Vigna S, Peterson S, More S, Pudlik T, Oshima T, Pingel TJ, Robitaille TP, Spura T, Jones TR, Cera T, Leslie T, Zito T, Krauss T, Upadhyay U, Halchenko YO, Vázquez-Baeza Y. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods. 2020;17:261–272. doi: 10.1038/s41592-019-0686-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  114. Vogel JW, Iturria-Medina Y, Strandberg OT, Smith R, Levitis E, Evans AC, Hansson O, Alzheimer’s Disease Neuroimaging Initiative. Swedish BioFinder Study. Spread of pathological tau proteins through communicating neurons in human Alzheimer’s disease. Nature Communications. 2020;11:2612. doi: 10.1038/s41467-020-15701-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  115. Wang D, Liu S, Warrell J, Won H, Shi X, Navarro FCP, Clarke D, Gu M, Emani P, Yang YT, Xu M, Gandal MJ, Lou S, Zhang J, Park JJ, Yan C, Rhie SK, Manakongtreecheep K, Zhou H, Nathan A, Peters M, Mattei E, Fitzgerald D, Brunetti T, Moore J, Jiang Y, Girdhar K, Hoffman GE, Kalayci S, Gümüş ZH, Crawford GE, PsychENCODE Consortium. Roussos P, Akbarian S, Jaffe AE, White KP, Weng Z, Sestan N, Geschwind DH, Knowles JA, Gerstein MB. Comprehensive functional genomic resource and integrative model for the human brain. Science. 2018;362:eaat8464. doi: 10.1126/science.aat8464. [DOI] [PMC free article] [PubMed] [Google Scholar]
  116. Waskom M, Botvinnik O, O’Kane D, Hobson P, Ostblom J, Lukauskas S, Gemperline DC, Augspurger T, Halchenko Y, Cole JB. 2018. Mwaskom/Seaborn. Zenodo. [DOI]
  117. Waskom M, Larson E, Brodbeck C, Gramfort A, Burns S, Luessi M, Weidemann CT, Bitzer S, Markiewicz C, LaPlante R, Halchenko Y, Engemann DA, van Vliet M, Ghosh S, Klein N, Piantoni G, Brett M, Gwilliams L, King JR, Liu D. 2020. nipy/pysurfer. Zenodo. [DOI]
  118. Whitaker KJ, Vértes PE, Romero-Garcia R, Váša F, Moutoussis M, Prabhu G, Weiskopf N, Callaghan MF, Wagstyl K, Rittman T, Tait R, Ooi C, Suckling J, Inkster B, Fonagy P, Dolan RJ, Jones PB, Goodyer IM, NSPN Consortium. Bullmore ET. Adolescence is associated with genomically patterned consolidation of the hubs of the human brain connectome. PNAS. 2016;113:9105–9110. doi: 10.1073/pnas.1601745113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  119. Yao Z, van Velthoven CTJ, Nguyen TN, Goldy J, Sedeno-Cortes AE, Baftizadeh F, Bertagnolli D, Casper T, Chiang M, Crichton K, Ding S-L, Fong O, Garren E, Glandon A, Gouwens NW, Gray J, Graybuck LT, Hawrylycz MJ, Hirschstein D, Kroll M, Lathia K, Lee C, Levi B, McMillen D, Mok S, Pham T, Ren Q, Rimorin C, Shapovalova N, Sulc J, Sunkin SM, Tieu M, Torkelson A, Tung H, Ward K, Dee N, Smith KA, Tasic B, Zeng H. A taxonomy of transcriptomic cell types across the isocortex and hippocampal formation. Cell. 2021;184:3222–3241. doi: 10.1016/j.cell.2021.04.021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  120. Zhao K, Zheng Q, Che T, Martin D, Li Q, Ding Y, Zheng Y, Liu Y, Li S. Regional Radiomics Similarity Networks (R2SN) in the Human Brain: Reproducibility, Small-World and Biological Basis. bioRxiv. 2020 doi: 10.1101/2020.12.09.418509. [DOI] [PMC free article] [PubMed]
  121. Zheng Y-Q, Zhang Y, Yau Y, Zeighami Y, Larcher K, Misic B, Dagher A, Kennedy H. Local vulnerability and global connectivity jointly shape neurodegenerative disease propagation. PLOS Biology. 2019;17:e3000495. doi: 10.1371/journal.pbio.3000495. [DOI] [PMC free article] [PubMed] [Google Scholar]

Editor's evaluation


This paper will be of interest to scientists studying the large-scale transcriptomic organization of the human brain, and in particular those who have used or plan to use the Allen Human Brain Atlas dataset. The study is well-motivated and novel. The most striking finding is the magnitude of variability that is introduced by different data processing decisions. The open-source software described in this study is comprehensive, well documented, and is an important contribution to the field.

Decision letter

Editor: Saad Jbabdi
Reviewed by: Saad Jbabdi, Joshua Burt, Michael J Hawrylycz

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Standardizing workflows in imaging transcriptomics with the abagen toolbox" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, including Saad Jbabdi as Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Tamar Makin as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Joshua Burt (Reviewer #2); Michael J Hawrylycz (Reviewer #3).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

As you will see, all reviewers are enthusiastic about the paper and the accompanying toolbox. The main things the reviewers are asking for are:

1) Further comments on how the tool can be used to better converge on more strongly interpretable results.

2) More detailed description of the toolbox output.

Reviewer #1 (Recommendations for the authors):

This is a first for me but I don't have anything that I'd like the authors to change.

Reviewer #2 (Recommendations for the authors):

Are region-region distances geodesic or Euclidean?

P13 – typo: "asses" (you meant assess).

Figure 3 – Panel C might benefit from a schematic. I personally found it somewhat difficult to understand exactly what was being shown.

The number of pipelines tested is written at least two different ways (746,946 and 746,496).

Reviewer #3 (Recommendations for the authors):

If length becomes an issue, one might reduce some of the description of the many challenges in the field and in data analysis that are better known.

eLife. 2021 Nov 16;10:e72129. doi: 10.7554/eLife.72129.sa2

Author response


Reviewer #2 (Recommendations for the authors):

Are region-region distances geodesic or Euclidean?

In our analysis of the distance-dependent relationship of gene expression ("correlated gene expression" and "CGE" in the manuscript) we used the Euclidean distance between parcel centroids. For volumetric atlases, parcel centroids were calculated as the center-of-mass of the voxels within each parcel; for surface-based atlases, parcel centroids were calculated as the coordinate of the vertex with the minimum Euclidean distance to the center-of-mass of the vertices within each parcel. We have clarified this in the revised manuscript (“Results” section, “Correlated gene expression” subsection):

“We assessed this relationship by extracting the upper triangle of the correlated gene expression matrices and correlating them with the upper triangle of a regional distance matrix, derived by computing the average Euclidean distance between brain region centroids in the Desikan-Killiany atlas.”
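The centroid and distance computation described in the response above can be sketched in a few lines of NumPy/SciPy. This is a minimal illustration under stated assumptions, not abagen's implementation: the function names are hypothetical, the volumetric atlas is assumed to be an integer label array (0 = background) with a 4×4 voxel-to-world affine, the surface atlas is assumed to be per-vertex labels plus vertex coordinates, and the use of Pearson correlation in the final step is an assumption because the quoted text does not name the correlation type.

```python
import numpy as np
from scipy.spatial.distance import cdist


def volumetric_centroids(labels, affine):
    """Hypothetical helper: centroid of each parcel in world coordinates.

    labels : 3D integer array, 0 = background (assumed convention)
    affine : 4x4 voxel-to-world matrix
    """
    parcel_ids = np.unique(labels)
    parcel_ids = parcel_ids[parcel_ids != 0]
    centroids = []
    for pid in parcel_ids:
        ijk = np.argwhere(labels == pid)        # voxel indices belonging to this parcel
        com = ijk.mean(axis=0)                  # center of mass in voxel space
        xyz = affine @ np.append(com, 1.0)      # homogeneous voxel -> world mapping
        centroids.append(xyz[:3])
    return parcel_ids, np.asarray(centroids)


def surface_centroids(vertex_labels, vertex_coords):
    """Hypothetical helper: parcel centroid snapped to the nearest parcel vertex.

    vertex_labels : 1D integer array, 0 = unlabelled (assumed convention)
    vertex_coords : (n_vertices, 3) array of vertex positions
    """
    parcel_ids = np.unique(vertex_labels)
    parcel_ids = parcel_ids[parcel_ids != 0]
    centroids = []
    for pid in parcel_ids:
        coords = vertex_coords[vertex_labels == pid]
        com = coords.mean(axis=0)                                   # center of mass of the parcel's vertices
        nearest = np.argmin(np.linalg.norm(coords - com, axis=1))   # closest vertex to that center
        centroids.append(coords[nearest])
    return parcel_ids, np.asarray(centroids)


def distance_cge_correlation(centroids, cge):
    """Correlate upper-triangle Euclidean distances with upper-triangle CGE values."""
    dist = cdist(centroids, centroids)          # cdist uses the Euclidean metric by default
    iu = np.triu_indices_from(dist, k=1)        # upper triangle, excluding the diagonal
    return np.corrcoef(dist[iu], cge[iu])[0, 1] # Pearson r (assumed correlation type)
```

As a toy check, a 4×4×4 volume split into two parcels along the first axis with a 2 mm isotropic affine (`np.diag([2.0, 2.0, 2.0, 1.0])`) yields centroids at (1, 3, 3) and (5, 3, 3), 4 mm apart. Because the affine is linear, averaging voxel indices before mapping to world space gives the same centroid as averaging world coordinates. In the surface case, this sketch searches only the parcel's own vertices, so each centroid is guaranteed to lie on its parcel.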

P13 – typo: "asses" (you meant assess).

Fixed!

Figure 3 – Panel C might benefit from a schematic. I personally found it somewhat difficult to understand exactly what was being shown.

We have attempted to clarify Figure 3c by updating the figure labels and caption to be more explicit that the correlations shown are between the maps depicted in Figure 3b.

“(c) The Pearson correlation between the cortical somatostatin (SST) maps generated by the nine pipelines shown in panel (b).”

The number of pipelines tested is written at least two different ways (746,946 and 746,496).

We have updated the number of pipelines throughout the manuscript to the correct value of 746,496.

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Data Citations

    1. Brett M, Markiewicz CJ, Hanke M, Côté MA, Cipollini B, McCarthy P, Cheng CP, Halchenko YO, Cottaar M, Ghosh S. 2019. Nipy/Nibabel. Zenodo. [DOI]
    2. Markello R, Shafiei G, Zheng YQ, Mišić B. 2021c. Rmarkello/Abagen. Zenodo. [DOI]
    3. Waskom M, Botvinnik O, O’Kane D, Hobson P, Ostblom J, Lukauskas S, Gemperline DC, Augspurger T, Halchenko Y, Cole JB. 2018. Mwaskom/Seaborn. Zenodo. [DOI]
    4. Waskom M, Larson E, Brodbeck C, Gramfort A, Burns S, Luessi M, Weidemann CT, Bitzer S, Markiewicz C, LaPlante R, Halchenko Y, Engemann DA, van Vliet M, Ghosh S, Klein N, Piantoni G, Brett M, Gwilliams L, King JR, Liu D. 2020. nipy/pysurfer. Zenodo. [DOI]

    Supplementary Materials

    Transparent reporting form
    Supplementary file 1. Default abagen pipeline options.

    The default settings for the 17 processing steps considered when processing the AHBA data with abagen. An entry of ‘—’ indicates that this is a required, user-supplied parameter. A blank entry indicates that the processing step is not implemented by default. Refer to Table 1 and Methods: Gene expression pipelines for further details.

    elife-72129-supp1.pdf (55.6KB, pdf)

    Data Availability Statement

    All code used for data processing, analysis, and figure generation is available on GitHub (https://github.com/netneurolab/markello_transcriptome; Markello, 2021a; copy archived at swh:1:rev:3abbc85596a5baacd93e5e9e56c906c9dbb080f3) and directly relies on the following open-source Python packages: IPython (Perez and Granger, 2007), Jupyter (Kluyver et al., 2016), Matplotlib (Hunter, 2007), NiBabel (Brett et al., 2019), NumPy (Oliphant, 2006; van der Walt et al., 2011; Harris et al., 2020), Pandas (McKinney, 2010), PySurfer (Waskom et al., 2020), Scikit-learn (Pedregosa et al., 2011), SciPy (Virtanen et al., 2020), and Seaborn (Waskom et al., 2018).

    All datasets used in this study are publicly available. Detailed information about the datasets and how to access them is provided in the manuscript.

