The single-cell eQTLGen consortium

MGP van der Wijst; DH de Vries; HE Groot; G Trynka; CC Hon; MJ Bonder; O Stegle; MC Nawijn; Y Idaghdour; P van der Harst; CJ Ye; J Powell; FJ Theis; A Mahfouz; M Heinig; L Franke

doi:10.7554/eLife.52155

. 2020 Mar 9;9:e52155. doi: 10.7554/eLife.52155

The single-cell eQTLGen consortium

MGP van der Wijst ^1,^†,^✉, DH de Vries ^1,^†, HE Groot ², G Trynka ^3,⁴, CC Hon ⁵, MJ Bonder ^6,⁷, O Stegle ^6,⁷, MC Nawijn ⁸, Y Idaghdour ⁹, P van der Harst ², CJ Ye ¹⁰, J Powell ¹¹, FJ Theis ^12,¹³, A Mahfouz ^14,¹⁵, M Heinig ^12,¹⁶, L Franke ¹

Editors: Helena Pérez Valle¹⁷, Peter Rodgers¹⁸

¹Department of Genetics, Oncode Institute, University of Groningen, University Medical Center Groningen, Groningen, Netherlands

²Department of Cardiology, University of Groningen, University Medical Center Groningen, Groningen, Netherlands

³Wellcome Sanger Institute, Hinxton, United Kingdom

⁴Open Targets, Hinxton, United Kingdom

⁵RIKEN Center for Integrative Medical Sciences, Yokahama, Japan

⁶Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany

⁷Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany

⁸Department of Pathology and Medical Biology, GRIAC Research Institute, University of Groningen, University Medical Center Groningen, Groningen, Netherlands

⁹Program in Biology, Public Health Research Center, New York University Abu Dhabi, Abu Dhabi, United Arab Emirates

¹⁰Institute for Human Genetics, Bakar Computational Health Sciences Institute, Bakar ImmunoX Initiative, Department of Medicine, Department of Bioengineering and Therapeutic Sciences, Department of Epidemiology and Biostatistics, Chan Zuckerberg Biohub, University of California San Francisco, San Francisco, United States

¹¹Garvan-Weizmann Centre for Cellular Genomics, Garvan Institute, UNSW Cellular Genomics Futures Institute, University of New South Wales, Sydney, Australia

¹²Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany

¹³Department of Mathematics, Technical University of Munich, Garching bei München, Germany

¹⁴Leiden Computational Biology Center, Leiden University Medical Center, Leiden, Netherlands

¹⁵Delft Bioinformatics Lab, Delft University of Technology, Delft, Netherlands

¹⁶Department of Informatics, Technical University of Munich, Garching bei München, Germany

¹⁷eLife, United Kingdom

¹⁸eLife, United Kingdom

^†

These authors contributed equally to this work.

^✉

Corresponding author.

Roles

Helena Pérez Valle: Reviewing Editor

Peter Rodgers: Senior Editor

PMCID: PMC7077978 PMID: 32149610

Abstract

In recent years, functional genomics approaches combining genetic information with bulk RNA-sequencing data have identified the downstream expression effects of disease-associated genetic risk factors through so-called expression quantitative trait locus (eQTL) analysis. Single-cell RNA-sequencing creates enormous opportunities for mapping eQTLs across different cell types and in dynamic processes, many of which are obscured when using bulk methods. Rapid increase in throughput and reduction in cost per cell now allow this technology to be applied to large-scale population genetics studies. To fully leverage these emerging data resources, we have founded the single-cell eQTLGen consortium (sc-eQTLGen), aimed at pinpointing the cellular contexts in which disease-causing genetic variants affect gene expression. Here, we outline the goals, approach and potential utility of the sc-eQTLGen consortium. We also provide a set of study design considerations for future single-cell eQTL studies.

Research organism: Human

Interindividual variation needs to be studied at the single-cell level

Genetic variants, most commonly single nucleotide polymorphisms (SNPs), can contribute to disease in a plethora of ways. In monogenic diseases, one single variant is sufficient to result in a disease phenotype. In complex diseases, tens to hundreds of variants each independently contribute to disease risk and an accumulation of risk alleles – often in combination with specific environmental exposures – is required to develop the disease phenotype. The overwhelming evidence showing enrichment of disease-associated variants in regulatory regions suggests that regulation of gene expression is likely a dominant mediator for disease risk. Expression quantitative trait loci (eQTL) analysis is commonly used for linking disease risk-SNPs to downstream expression effects on local (cis) or distal (trans) genes. Large-scale eQTL efforts such as GTEx (GTEx Consortium, 2017), PsychENCODE (Wang et al., 2018), ImmVar (Ye et al., 2014), BLUEPRINT (Chen et al., 2016), CAGE (Lloyd-Jones et al., 2017), and eQTLGen (Võsa, 2018) have proven highly valuable to identify downstream transcriptional consequences. All these efforts together lead to ever growing sample sizes that now allow us to start identifying both cis- and trans-eQTLs.

An important next step is to precisely define the cellular contexts in which disease risk-SNPs affect gene expression levels. This will help to better understand the molecular and cellular mechanisms by which disease risk is conferred and to inform therapeutic strategies. This needs to be done as recent analyses have shown that many eQTL effects are tissue (GTEx Consortium, 2017; Fu et al., 2012) and cell type-specific (Brown et al., 2013; Fairfax et al., 2012). Additionally, many eQTLs are conditional, and only revealed after specific stimuli that, for example, change the activation or differentiation of specific cell types (Ye et al., 2014; Cuomo et al., 2020). Beyond the ability to annotate individual disease associations, cell type-specific eQTLs have been shown to be strongly enriched for heritability across complex traits (Hormozdiari et al., 2018). Sorting (Fairfax et al., 2012; Ishigaki et al., 2017) and computational deconvolution (Westra et al., 2015; Venet et al., 2001) of cell types from bulk samples have been used to uncover context-specificity of eQTLs. However, these methods are biased towards known cell types defined by a limited set of marker genes (Zhernakova et al., 2017), are of limited use for less abundant cell types, and do not capture any heterogeneity within a sorted population. In contrast, single-cell RNA-sequencing (scRNA-seq) enables the simultaneous and unbiased estimation of cellular composition and cell type-specific gene expression (van der Wijst et al., 2018a), and is particularly well positioned to investigate rare cell types (Villani et al., 2017). As opposed to using bulk data, single-cell data allows us to also link genetics to phenomena such as cell-to-cell expression variability (Cuomo et al., 2020), cell type heterogeneity (Donovan et al., 2019), and gene regulatory network differences (van der Wijst et al., 2018a). As such, single-cell analyses in a population-based setting will likely become mainstream in the next few years. However, we envision that most scientific value will be obtained by unifying these efforts. Additionally, to utilize the aforementioned developments in the single-cell field most efficiently and effectively, a coordinated effort from multiple research groups is urgently needed.

Here we introduce the single-cell eQTLGen consortium (sc-eQTLGen), a large-scale, international collaborative effort that has been set up to identify the upstream interactors and downstream consequences of disease-related genetic variants in individual immune cell types (https://eqtlgen.org/single-cell.html, Figure 1). In this consortium we will attain a sufficiently large sample size to have the statistical power to unbiasedly identify cell type-specific effects on both cis and trans genes. Moreover, we aim to reconstruct context-specific gene regulatory networks (GRNs) by combining single-cell and bulk RNA-seq datasets for increased resolution. We expect the results of sc-eQTLGen to have an impact in a number of areas including the prioritization of disease-risk genes, the prediction of drug efficacy and the reconstruction of personalized GRNs.

Figure 1. — The sc-eQTLGen consortium combines an individual’s genetic information with single-cell RNA expression (scRNA-seq) data of peripheral blood mononuclear cells (PBMCs) in order to identify effects of genetic variation on downstream gene expression levels (eQTLs) and to enable reconstruction of personalized gene regulatory networks. Right panel is adapted from van der Wijst et al. (2018b).

Integration of sc-eQTLGen within the scientific landscape

Large numbers of single cell expression profiles from many individuals are required to reach our goals. The accessibility and clinical relevance of peripheral blood mononuclear cells (PBMCs) have made them the most studied cell types in current population-based scRNA-seq datasets. Therefore, to have such datasets from the same tissue type readily available, we have chosen to focus on PBMCs. It also allows for continuation of the knowledge acquired through the eQTLGen consortium, which performed the largest eQTL meta-analysis to date using whole blood bulk gene expression data of over 30,000 individuals to reveal the influence of genetics on gene expression (Võsa, 2018). The sc-eQTLGen consortium now allows us to take the next step by systematically assessing the cell types and contexts in which the eQTL effects manifest. Beyond resolving the influence of genetics on individual genes, the consortium will also take advantage of the unique features of scRNA-seq data to learn the directionality of GRNs and uncover how genetics is affecting co-expression relationships (van der Wijst et al., 2018a). We expect that the infrastructure and best practices developed within sc-eQTLGen can serve as a basis for studying population genetics at the single-cell level in solid tissues in the future.

Other large-scale efforts such as the Human Cell Atlas (HCA) (Regev et al., 2017) or Lifetime FET flagship consortium (https://lifetime-fetflagship.eu) mainly focus on mapping all cells of the human body or a disease context in a limited number of individuals. The sc-eQTLGen consortium will complement those efforts by putting a unique focus on deciphering the impact of genetic variation on gene expression and its regulation. Different to experimental designs that aim to generate an extensive map on a low number of individuals, we require larger numbers of individuals, whereas the number of cells per individual can be lower. This will enable the accurate capture of both the genetic variation and the cell type heterogeneity. By building on the data and harmonized cell type annotations generated within the HCA, our results will be easily transferable to other datasets as well. We will share best practices of the HCA consortium with regard to data acquisition, analysis and reporting. We also share standards for open science and the infrastructure and legal frameworks for data sharing while accounting for the privacy issues specific to genetic, health record and demographic information.

Single cell eQTL analysis: the new era of population genetics

The practice of identifying eQTLs is shifting from bulk to single-cell analyses. Considering only its ability to identify eQTLs, scRNA-seq data has a lower statistical power compared to bulk RNA-seq data on the same number of donors (Cuomo et al., 2020; Sarkar et al., 2019), likely due to increased sparsity of the single-cell data. Nevertheless, there are several clear benefits of single-cell over bulk expression data for QTL analysis. First, scRNA-seq data enables the simultaneous estimation of the composition and expression profiles of discrete cell populations including cell types and their activation states (van der Wijst et al., 2018a; Figure 2). Second, scRNA-seq data provides a flexible, unbiased approach that has increased resolution to define cell states along continuous dynamic processes in which the eQTL effects manifest themselves (Cuomo et al., 2020). Third and fourth, single-cell data allows estimating the variability in gene expression across individual cells (Brennecke et al., 2013; Eling et al., 2018), which could be used to improve mean estimations for eQTL analysis. At the same time, the single-cell nature now also enables us to look at the effect of genetic variation on transcriptomic traits other than average gene expression level, such as dispersion QTLs that alter the variance independently of the mean expression (Sarkar et al., 2019) or cell type proportion QTLs (Kang et al., 2018), providing a new angle on how genetic variation may impact disease pathogenesis. Fifth, the large number of observations per individual (i.e. cells) enable the generation of personalized co-expression networks, which vastly reduces the number of individuals required to identify SNPs altering co-expression relationships (i.e. co-expression QTLs [van der Wijst et al., 2018a]). Finally, and paradoxically, is the potential benefit of lower experimental costs compared to bulk experiments on sorted cells: such experiments require a library to be generated for each sorted population, whereas a single scRNA-seq library of just one sample contains all this information and can easily be multiplexed across multiple individuals (Kang et al., 2018).

Figure 2. — Using scRNA-seq data for eQTL mapping offers a number of advantages over bulk RNA-seq based approaches, of which the flexibility in analysis strategy is a major one. (A) From single cell data, individual cell types can be identified and we can map eQTLs for each of these. (B) Alternatively, lineages based on either knowledge of cell developmental lineages or through pseudo-time based approaches can be constructed. By positioning cells across a trajectory dynamic changes in the allelic effects on gene expression levels as a function of trajectory position can be integrated. (C) Finally, as the discoveries of new cell subtypes are made or cell type definitions are being refined, the analysis can be revisiting by re-classifying cells and determining how the genetic effects on gene expression vary on these new annotations.

So far, only a limited number of papers have performed eQTL analysis using scRNA-seq data (Cuomo et al., 2020; van der Wijst et al., 2018a; Sarkar et al., 2019; Kang et al., 2018). In the earliest single-cell eQTL studies, bulk-based eQTL analysis approaches, such as Spearman rank correlation (Heap et al., 2009; Stranger et al., 2007) and linear regression (Michaelson et al., 2009; Stranger et al., 2005), were applied to the average expression level of all cells from a particular cell type per individual. However, the underlying assumptions of these bulk-based approaches may not be applicable to scRNA-seq data. Therefore, these bulk-based methods will lose statistical power when applied to scRNA-seq data, because of the inflation of zero values (i.e. sparsity). More recently, single-cell-specific eQTL methods have been developed that, for example, take into account zero-inflated gene expression (Sarkar et al., 2019; Hu and Zhang, 2018) or take advantage of pseudotime (i.e. statistically inferred time from snapshot data) to increase the resolution by which response-/differentiation-associated eQTLs (dynamic eQTLs, i.e. eQTLs that dynamically change along pseudotime) can be identified (Cuomo et al., 2020). Instead of averaging gene expression levels across all cells from a particular cell type, some of these approaches look at the fraction of zero expression and the non-zero expression separately for each gene (Hu and Zhang, 2018). Other approaches take dynamic pseudotime-defined instead of statically-defined cell types into consideration for the eQTL analysis (Cuomo et al., 2020). This latter approach was shown to uncover hundreds of new eQTL variants during iPSC differentiation that had not been detected when static differentiation time points would have been used (Cuomo et al., 2020). In line with this, we expect that some of these methodological advances, as opposed to bulk-based approaches, will further improve the power and resolution of single-cell eQTL analysis. However, there are two initial challenges that need to be carefully addressed for single-cell eQTL mapping: firstly, the normalization of data to remove technical variation in sequencing depth per cell, while avoiding the removal of biological variation; and secondly, the identification or classification of a cell into a cell type or state.

During library preparation and sequencing, technical and stochastic factors will lead to variation in cell-to-cell sequencing depth. However, simply normalizing to equal sequencing depth per cell will remove important biological variation – for example a CD4⁺ T cell is expected to have lower RNA contents than a plasma B cell. Therefore, we need to employ normalization strategies that can account for traditional batch effects, such as sample run or sequencing lane, while retaining biological differences (Bacher et al., 2017; Hafemeister and Satija, 2019).

Once normalized, each cell needs to be accurately annotated into a cell type and/or cell state to maximize the statistical power to detect cell type-specific eQTLs. We encourage the use of individual cell classification approaches, rather than cluster-labeling methods. Clustering approaches are powerful ways of identifying a subpopulation of cells that share similar expression levels. However, while most cells placed in a specific cluster will likely be the same cell type, clusters can also contain alternative cell types. Labeling all cells in a cluster based on a high percentage of the expression of a canonical marker(s) will therefore lead to the incorrect classification of some cells (Alquicira-Hernández et al., 2018). To acquire a reliable classification model, large scRNA-seq datasets from various contexts are required. Such datasets have been collected within large-scale efforts such as our consortium and the HCA. We expect these will help to develop a gold standard classification model that can classify each cell independently. This will ensure a higher accuracy in cell labeling and thus will maximize power to detect cell type-specific effects.

After solving these challenges, eQTLs can be mapped by either averaging the normalized expression levels on a per gene, per cell type, per individual basis. Alternatively, each cell from an individual can be taken as a repeated measure which can then be used to fit a statistical model to all cells, while including a random effect of the individual.

Instead of using observational studies, eQTLs could also be identified through experimental approaches that use single cells as individual units of experimentation (Gasperini et al., 2019). Sample multiplexing (see Box 1) can be combined with experimental perturbation to more efficiently characterize the genetic architecture of gene expression. For example, synthetic genetic perturbations with CRISPR/Cas9 may allow precise control of the expression levels of target gene regulators enabling the validation of detected trans-eQTLs and the establishment of upper and lower bounds of trans effects. Encoding environmental and genetic perturbations across large population cohorts also enables new designs for studying genetic interactions, both gene-by-environment and gene-by-gene (epistasis). Historically, characterizing these effects in human cells has been plagued by the lack of power and the susceptibility to technical confounding of bulk experiments. Recent work that knocked out ~150 regulators in primary human T cells of nine donors illustrates a proof of concept of how single-cell sequencing across individuals can be combined with experimental perturbations to detect these genetic interactions (Gate, 2019).

Box 1.

Guidelines for creating a population-based single-cell cohort.

Even though a single-cell eQTL dataset has less discovery power than an equal-sized bulk RNA-seq eQTL dataset (6.9 fold difference based on the lowest correlation that led to the identification of a significant eQTL from single-cell [van der Wijst et al., 2018a] vs bulk RNA-seq data [Zhernakova et al., 2017]), it does provide insights that cannot easily be extracted from bulk data. For example, single-cell data allows for the unbiased detection of cell type- and context-dependent eQTLs and has more power to detect co-expression QTLs (van der Wijst et al., 2018a). This makes population-based single-cell datasets a valuable addition to bulk-based datasets for studying the effects of genetic variation on gene expression and its regulation (van der Wijst et al., 2018a; Kang et al., 2018). In comparison to ‘standard’ single-cell datasets, generating such population-based single-cell datasets require some additional aspects to be taken into account.

First of all, the genetic information that is available for each of the individuals in such cohorts can be used to demultiplex pools of multiple individuals within the same sample. This approach allows to properly randomize experiments, while also significantly reducing cost and confounding effects (Kang et al., 2018; Bycroft et al., 2018). This genetic information can either be efficiently generated using genotype arrays (Marchini and Howie, 2010) in combination with imputation-based approaches (Xu, 2019), or extracted from the scRNA-seq data itself (Bycroft et al., 2018; McCarthy et al., 2016). Within the consortium all reads will be aligned to the GRCh38/hg38 reference genome and genotypes will be imputed using the Haplotype Reference Consortium reference panel (Gravel et al., 2011). The basic principle behind genetic multiplexing is that enough transcripts harboring SNPs are expressed and detected in each single cell such that cells can be accurately assigned to the donor of origin. Furthermore, as the number of multiplexed individuals increases, the probability that a droplet harbors multiple cells from different individuals increases, thus allowing the detection of multiplets using genetic information. This enables the overloading of cells into standard droplet-based workflows and overall reduction of cost per cell up to about 10-fold (https://satijalab.org/ costpercell). As the cost of sequencing and the background multiplet rate reduce, the benefits of multiplexing increase. We anticipate that future workflows will allow for even higher throughput.

Secondly, accounting for ethnicity variation and population stratification will be required when single-cell data of diverse populations are being analyzed. It is known that a different genetic architecture exists between different populations. Nevertheless, practical considerations have limited the majority of eQTL studies to cohorts of European origin. As an undesirable consequence of this bias in population representation, certain variants may not have been detected before (Carlson et al., 2013) or the effect sizes and associated polygenic risk scores based on the European population may not be translatable to other populations (Martin et al., 2017; Sirugo et al., 2019). Therefore, inclusion of datasets from different ethnic populations will help reduce long-standing disparities in genetic studies and has many analytical advantages (Wojcik et al., 2019; Hsiao et al., 2010). For example, the increased genotype frequency diversity will enhance the range over which gene expression varies, and thereby, will further increase statistical power. To implement multi-population sc-eQTL analysis, several challenges have to be addressed. Handling data from populations with different levels of population genetic properties such as LD structure, relatedness and multiple genetic origins that result in the presence of genetic covariance remains important and requires appropriate adjustments to avoid spurious signal and to manage the bias in estimating genetic cis- and trans-effects (Zhou and Stephens, 2012; Mandric, 2019). This is particularly important when differences in cohort-specific genetic characteristics are enhanced such as when family-based and unrelated cohorts or cohorts of different ancestries are analyzed. Failing to account for these effects affects the accuracy of mapping and results in false positives.

Finally, studying genetic variation at the single-cell level adds some extra requirements for the number of cells per individual and the number of individuals to be included in the study. The number of cells per individual will mainly define for which cell types in a heterogeneous sample such as PBMCs eQTL and co-eQTL analyses can be performed. In contrast, the number of individuals will mainly define the number of genetic variants for which effects on gene expression can be confidently assessed. A recent analysis showed that, with a fixed budget, the optimal power for detecting cell type-specific eQTLs is obtained when the number of reads is spread across many individuals¹¹⁹. Even though a lower sequencing depth per cell results in a lower accuracy of estimating cell type-specific gene expression levels, many more individuals and cells per individual can be included for the same budget. As a result, the optimal experimental design with a fixed budget provides up to three times more power than a design based on the recommended sequencing depth of 50,000 reads per cell (for 10X Genomics scRNA-seq). In contrast, for co-eQTL analysis there is a different trade-off between sequencing depth, number of individuals and number of reads per cell; while for eQTL analysis gene expression levels among cells of the same cell type can be averaged, for co-eQTL analysis you cannot as this would prohibit you from calculating a gene-gene correlation per individual. Therefore, for co-eQTLs the sequencing depth will be a major limiting factor that determines the number of genes for which you can confidently calculate gene-gene correlations. Altogether, depending on the goal of your study, the optimal balance between sequencing depth and number of individuals and cells per individual will be different. By the end of 2020, the sc-eQTLGen consortium will provide standardized pipelines and guidelines for single-cell population genetics studies.

Another promising avenue that has become available in recent years to gain increased insights in the link between genetics and disease, is through the use of spatial transcriptomics technologies, including MERFISH, seqFISH+, Slide-seq and 10x Visium (Burgess, 2019; Maynard et al., 2020). While for PBMCs this approach may not be applicable, in solid tissues and organs this extra layer of spatial information is extremely valuable. For example, it can help to disentangle trans-eQTL interactions that are modulated through cell-cell communication (e.g. a SNP affects ligand expression in one cell type, and thereby affects downstream receptor signaling in a second cell type). Despite not having this spatial information available in PBMCs, other approaches that consider receptor-ligand expression pairs do provide insights in potential cell-cell interactions. These approaches have been successfully applied before to uncover how the ligand expression in one cell type can affect the frequency (Smillie et al., 2019) or the downstream signaling (Arneson et al., 2018) of another cell type expressing the corresponding receptor.

Single cell GRN reconstruction: taking eQTLs one step further

In the case of complex diseases, it is not the disruption of a single gene that causes the disease phenotype. In fact, hundreds of variants can contribute to the disease and converge into just a few key disrupted regulatory pathways (Westra et al., 2013; Fagny et al., 2017). Therefore, for a better disease understanding and to take eQTLs one step further, one has to look beyond the disruption of individual genes and determine how the interaction of genes changes based on cell type (Battle et al., 2014; GTEx Consortium, 2017; Westra et al., 2015; Knowles et al., 2017), environment (Favé et al., 2018; van der Wijst et al., 2018b) and an individual’s genetic makeup (Zhernakova et al., 2017; van der Wijst et al., 2018a). The sc-eQTLGen consortium will do so by reconstructing personalized, cell type-specific GRNs (La Manno et al., 2018; Figure 3). The unique features of scRNA-seq data, among which the inference of pseudotime and RNA velocity (i.e. the ratio between spliced and unspliced mRNA that allows prediction of the future state of a cell) (Qiu, 2018), enable learning the directionality of network connections (Fiers et al., 2018). We expect that such personalized GRNs will help explain for example differences in interindividual drug responses, and thereby, will aid in precision medicine in the future.

Figure 3. — Individual and cell-type specific scRNA-seq data will be used to construct personalized gene regulatory networks. Some single cell datasets allow for the inference of trajectories, for instance in response to a stimulus. These can be used as input to dynamic models to infer causal (directed) interactions. Steady state datasets, characterized by cell type clusters can be analyzed with models that exploit co-expression, prior networks or cell type-specific reference scATAC-seq datasets in combination with sequence motifs to infer directed transcription factor-target relations. Topological comparison between personalized networks of groups of individuals can reveal coordinated differences, for instance the change of connectivity in densely connected modules, change of connectivity of hub genes or changes of module membership of individual genes.

Reconstruction of GRNs from single cell data reviewed in Raj et al. (2006) is complicated by the sparsity of the data as a consequence of the stochasticity underlying gene expression (Chen and Mar, 2018) and dropouts, i.e. genes that are not detected in some cells as a consequence of technical limitations (Jackson et al., 2019). This sparsity leads to lower correlation estimates that obscure the identification of true edges in the GRNs. Several solutions have been developed to overcome this problem, including the implementation of prior information (Aibar et al., 2017; Andrews and Hemberg, 2018), gene expression imputation (Aibar et al., 2017; Iacono et al., 2019) and usage of alternative measurements of correlation (Skinnider et al., 2019; Budden et al., 2014).

Firstly, prior information encoded in the DNA sequence can be used to overcome these complications (Angermueller et al., 2017; Angelini and Costa, 2014). Such priors on regulatory interactions can be derived from, for example, ChIP-seq data (Miraldi et al., 2019), ATAC-seq data (Qin et al., 2014), spatial information (Burgess, 2019; Maynard et al., 2020) or from perturbation experiments (Gate, 2019; Aibar et al., 2017). Implementation of such priors was shown to improve bulk GRN reconstruction (Qin et al., 2014; Ghanbari et al., 2015; Azizi et al., 2018), and similarly, it is expected to also improve GRNs reconstructed from single-cell data (Aibar et al., 2017; Andrews and Hemberg, 2018). However, caution is warranted when using this information, as their effect on GRN reconstruction depends on the quality of these data priors (Siahpirani and Roy, 2017; Simpson, 1951) and priors derived from bulk data may not hold true at the single-cell level (Buenrostro et al., 2015). Recent technological advances enable studying chromatin accessibility (Lareau et al., 2019; Hayashi et al., 2018) and expression of enhancers RNAs (Kouno et al., 2019; Lin et al., 2016) at the single cell level, which will make it possible to implement single-cell derived priors in GRN reconstruction in the future, though these quantifications come with their own limitations and challenges.

Secondly, gene expression imputation may be used to restore the underlying correlation structure. However, current gene expression imputation methods become more unreliable as the dropout rates increase (Iacono et al., 2019; Skinnider et al., 2019). After gene expression imputation, more network edges are identified, but with a higher chance of detecting false positives (Aibar et al., 2017; Iacono et al., 2019). Nevertheless, by combining prior information with imputation, GRN reconstruction can be improved both in the bulk (Qin et al., 2014) and single cell setting (Aibar et al., 2017). For example, one can replace transcription factor expression with inferred transcription factor activities based on the collective expression patterns of their target genes or take advantage of cross-omics relationships (Stoeckius et al., 2017).

Finally, alternative correlation measures are being explored to overcome the complications associated with data sparsity, including measures of proportionality (Budden et al., 2014) and by calculating the correlations on measures other than the normalized expression counts (Skinnider et al., 2019). For example, Z-scores of the gene expression distributions of highly similar cells have been used to calculate the co-expression relationships. This approach could reveal the true correlation structure that was otherwise hidden by technical artifacts (Skinnider et al., 2019). In addition to these computational tools, technological advances, such as single-cell multi-omics approaches (Cao et al., 2018; Moignard et al., 2015) and improved experimental protocols, are expected to alleviate these complications. Moreover, being able to assess multiple layers of information within the same cell, e.g. chromatin accessibility, DNA methylation, gene and protein expression, opens unique opportunities for developing new methodology for GRN reconstruction and validation. Altogether, this will further improve the accuracy of GRNs reconstructed using single-cell data in the future.

The incorporation of dynamic information extracted from time series or pseudotime (Ocone et al., 2015; Pratapa et al., 2019) is another promising avenue to further improve single-cell GRN reconstruction. However, not all datasets are equally well suited to identify temporal trajectories. For example, PBMCs are usually in steady state, and only after stimulation such trajectories would appear.

Summarized, the ideal GRN reconstruction tool can efficiently manage large amounts of single-cell data, incorporate prior information, model non-linear relationships and take dynamic information into account. Early benchmark studies, performed for a limited number of methods on rather small datasets (Jackson et al., 2019) or on simulated data (Lukowski et al., 2017) show that current tools usually only work well in specific situations. As such, there is a clear need for the development of all-round tools that work well in every situation.

sc-eQTLGen: a federated single-cell eQTL meta-analysis consortium

Combining data of numerous groups increases the resolution and power by which downstream analyses, such as eQTL identification and personalized GRN reconstruction, can be performed. Ideally, all scRNA-seq datasets should be jointly analyzed at one centralized location. This is particularly helpful to align each group’s approaches for preprocessing, quality control (QC) and cell type classification. However, it also eases for instance benchmarking different statistical and computational methods. While this concept of ‘bringing the data to the algorithm’ is preferred from an analytical perspective, it is usually very difficult to do so when handling privacy-sensitive scRNA-seq and genotype data from human individuals (Lloyd-Jones et al., 2017; Yengo et al., 2018).

To overcome this, a federated approach could be used instead, which has the aim of ‘bringing the algorithm to the data’: each participating cohort will run the analyses themselves (adhering to predefined criteria for preprocessing and QC), and will only share summary statistics that are not privacy-sensitive. Finally, one site takes responsibility for performing the overall meta-analysis using these provided summary statistics. For genome-wide association studies this is a common strategy (Xue et al., 2018; Luecken and Theis, 2019), and for eQTL studies this procedure has been shown to be effective as well (Võsa, 2018; Westra et al., 2013). In the following sections we will expand on all steps that have to be taken and what considerations should be made when conducting such a federated approach for single-cell population genetics studies (Figure 4).

Figure 4. — sc-eQTLGen aims to identify the downstream consequences and upstream interactors of gene expression regulation. To increase the resolution and power of this analysis, datasets of multiple cohorts need to be combined while taking privacy issues into account. This will be done using a federated approach in which we will first harmonize all preprocessing and quality control (QC) steps across cohorts. Subsequently, shared gene expression matrices will be normalized and cell types will be classified based on a trained reference dataset (e.g. Immune Cell Atlas (ICA)). Any cells that cannot be classified using this trained classifier, representing new cell types or previously unknown cell states, can then be manually annotated based on marker genes, and then be used to further train the classifier. Each cohort will then separately perform a *cis-* and *trans*-eQTL and co-expression QTL analysis using their genotype and expression matrix, while using appropriate statistical models to account for effects such as gender, population structure and family-relatedness that can alter the genotype-expression relationship in a cohort-specific manner. The summary statistics will be shared and analyzed in one centralized place. Finally, these results will be used for reconstruction of personalized and context-specific gene regulatory networks. Bottom panel is reproduced from Võsa (2018).

Preprocessing, quality control

The first challenge of federated analyses is the need to have a standardized protocol on how each group should perform their analyses. While such a protocol helps to ensure reproducibility of the data analysis, it requires that all methods and tools used have been rigorously tested before. For scRNA-seq data such protocols are still under development, while in other fields such as that of genome-wide association studies, standardized protocols have been available for years.

Several initiatives are now being undertaken to define best practices in the scRNA-seq field (Tian et al., 2019). For example, Tian et al. have compared 3913 combinations of different scRNA-seq data analysis pipelines to define best practices in the field (Price et al., 2006). Such initiatives could provide the basis for defining the optimal preprocessing, QC and cell type classification steps for our consortium. Additionally, in population-based scRNA-seq studies special attention is required to account for ethnic variation and population stratification (see Box 1; Zeng and Gibson, 2019; Zhou and Stephens, 2012). In the event of presence of relatedness in a given cohort, a genetic relatedness matrix will be included in a mixed model to account for the effect, such as in Zhou and Stephens (2012); Abdelaal et al. (2019). Adjustments of cohort-level genetic differences will be made in the framework of meta-analysis using summary statistics of the individual cohorts. Once all protocols are established, we can harmonize the preprocessing steps across all groups in the consortium, such as the genome build to use, alignment tool and sample demultiplexing strategy. Due to the cohort-specific characteristics of each dataset, the QC steps cannot be harmonized to the same extent as the preprocessing. Nevertheless, the parameters used for QC can be coordinated across all groups, such as the cutoffs for number of detected genes per cell and mitochondrial fraction. Both the preprocessing and the QC do not require exchanges of data and can be performed independently.

Cell type classification

To facilitate the eQTL meta-analysis, we need to ensure that the cell type annotations are consistent across the different cohorts. To ensure reproducibility of annotations across the different cohorts, we will employ a classification scheme to identify canonical cell types in each cohort separately. Performing cell type labeling using classification models does not only increase the reproducibility, but also constitutes a privacy-safe way of annotating cell types that does not require the sharing of raw or processed gene expression data.

Reference datasets with labeled cells, such as those available from the Immune Cell Atlas (http://immunecellatlas.net/) will be used to train a classifier for automatic cell type classification in each cohort. Our recent comparisons of single cell classification methods showed that simple linear models can yield good results (Köhler et al., 2019; Wolf et al., 2019). Despite the wide availability of reference datasets, we expect that some cohorts will contain novel unknown cell types or states that cannot be identified using the trained classifier. For this, we will use a classification scheme with a rejection option that can flag unknown cells whenever the confidence in cell type assignment is low (Köhler et al., 2019). The rejected cells can then be manually annotated based on marker gene expression.

To capitalize on the large number of cells and individuals to be profiled in each cohort, an unsupervised clustering approach will be used to analyze the count matrix of each cohort, in parallel to the supervised approach described earlier. This unsupervised approach will serve two purposes: (1) it will help annotate unassigned cells by the classifier, and (2) it will allow refining the resolution at which cells are annotated. Varying levels of granularity of the clustering may reveal cell types, as well as particular cell states or subtypes. This level of granularity required to separate particular cell states is not known a priori. Therefore, novel unbiased approaches such as partition based graph abstraction (Baran, 2018) or metacells, i.e. disjoint, homogenous and highly compact groups of cells that each exhibit only sampling variance (Saelens et al., 2019), provide a framework to reconcile discrete states at different levels of granularity with continuous cell states. These novel annotations can feed back into an iterative online learning approach of supervised classification models, where we could refine cell type prediction models on the available datasets. Once new datasets become available within the consortium these can be annotated based on current models and updated labels can be used in the next round of training. An important consideration here is to preserve the hierarchy of cell annotations, so that if new annotations are added to the classifier, they are subclasses of existing classes. In this way, any downstream analysis based on older annotations remains valid at the older level of granularity. This would yield a coherent approach of labeling over time as the dataset grows. For inference of continuous cell states, we require data integration across multiple centers, as this would ensure the usage of a similar pseudotime scale between individuals. Currently, ordering cells along pseudotime is challenging and best practices are being evaluated (Price et al., 2006; Roshchupkin et al., 2016).

Ultimately, integrating all expression data in a privacy-preserving manner, i.e. as gene expression matrices, will produce a dataset with unprecedented numbers of cells. Such a dataset allows discovery of novel rare cell types or states using clustering approaches as described above. This valuable dataset will then be shared with the community through platforms like the HCA data portal.

eQTL and co-expression QTL analysis

After cell type assignment, annotated gene expression matrices can be returned to each of the cohorts. Each cohort will then map genome-wide cell type-specific cis- and trans-eQTLs by combining the cell type-specific gene expression matrices with the privacy-sensitive bulk-assessed genotype information using appropriate statistical models. The resulting summary-statistics can then be safely shared without privacy-issues.

One challenge with federated eQTL analyses is that the amount of summary statistics that need to be shared is substantial. For instance, when assuming there are 10 cohorts and for each of these cohorts cells have been assigned to 10 major cell types, a genome-wide trans-eQTL analysis (testing the effect of 10,000,000 common SNPs on 20,000 protein coding genes for each of the 10 cell types), where only the correlation for a SNP-gene combination is stored as a 64 bit double value, would require each cohort to exchange 10,000,0000 × 20,000×10 x eight bytes = 146 terabytes of data. To overcome this problem, several frameworks have recently been proposed that take advantage of the fact that these summary statistics matrices reflect the product of a normalized genotype matrix and a normalized gene expression matrix. For instance, the HASE framework (Silvester et al., 2018) recodes genotype and phenotype (i.e. gene expression) data, along with a covariate matrix, in such a way that privacy is ensured and only those matrices, making up only a few gigabytes of data, need to be exchanged.

While protocols exist that explain how eQTL data needs to be processed, harmonized and QCed to perform a federated eQTL analysis (e.g. eQTLGen used the eQTLMappingPipeline [Võsa, 2018]), not all steps can be completed immediately: for instance, to identify effects of polygenic risk scores on gene expression levels (ePRS), gene expression data first needs to be corrected for cis-eQTL effects (Võsa, 2018). Therefore, the full cis-eQTL meta-analysis has to precede calculations of ePRSs. Such iterations take considerable time and are also inconvenient, since it requires a lot of coordination with each of the participating cohorts. For sc-eQTLGen we will first conduct a federated, cell type-specific cis- and trans-eQTL analysis. After this is completed, we will proceed with a co-expression QTL (co-eQTL) analysis. This analysis will be limited to a predefined set of genes or SNPs, such as the SNP-gene combinations extracted from the identified cis- and trans-eQTLs or the SNPs located within open chromatin regions that show high interindividual variability, as otherwise trillions of statistical tests have to be conducted (e.g. in van der Wijst et al., 2018a: 7975 variable genes * 7975 variable genes * 4,027,501 SNPs (MAF ≥0.1) = 256,151,580,788,125 tests). Finally, all these results will be combined to reconstruct personalized, cell type-specific GRNs. This multi-step approach will require us to go back and forth between the different cohorts at least twice. Therefore, easy-to-use analysis scripts that can be run efficiently on different high-performance cluster infrastructures are essential to limit the amount of hands-on time.

Gene regulatory network reconstruction

Finally, the scRNA-seq data will be used to reconstruct GRNs. Two strategies will be explored in the context of sc-eQTLGen. The first approach makes use of the large number of bulk RNA-seq datasets for specific cell types that are available in public RNA-seq repositories (Leinonen et al., 2011; Langfelder and Horvath, 2008). Using this publicly available bulk RNA-seq data, reference co-expression networks will be constructed using cell type-specific data. Subsequently, scRNA-seq data will be used to implement directionality and specify the connections in the network that are affected by specific contexts (La Manno et al., 2018). The second approach will directly use scRNA-seq data to build cell type-specific GRNs, thereby enabling to immediately take the context-specificity into account. However, the number of genes that can confidently be taken into account by this second approach may be lower due to the sparsity of scRNA-seq data. For both strategies, we will make use of prior information (e.g. ATAC-seq data [Qin et al., 2014], TF binding information), dynamic information (e.g. information extracted from time series data (Ocone et al., 2015), pseudotime (Pratapa et al., 2019) in combination with RNA velocity (Qiu, 2018; Fiers et al., 2018) and experimental validation (e.g. perturbation experiments [Gate, 2019; Aibar et al., 2017]) to go from a co-expression to a gene regulatory network. Before implementation, the additional benefit of using such information, extracted from either bulk or single-cell data (Aibar et al., 2017; Andrews and Hemberg, 2018), and using gene expression imputation (Aibar et al., 2017; Iacono et al., 2019) will be assessed. We expect that the optimal strategy will depend on the amount of available bulk data and prior information that is available for a particular cell type. We will extract this prior information from existing large-scale efforts, such as ENCODE (Wang et al., 2018) and BLUEPRINT (Chen et al., 2016). Additionally, we will make use of single-cell information beyond gene expression levels that is or will be collected within subsets of cohorts within the consortium, including information on chromatin accessibility (Lareau et al., 2019; Hayashi et al., 2018) and expression of enhancers RNAs (Kouno et al., 2019; Lin et al., 2016).

Additionally, recent advances have made it possible to measure multi-omics data from the very same cell (Moignard et al., 2015; Sverchkov and Craven, 2017). However, current approaches are very time- and cost-consuming, and therefore limited to only a few hundred cells. As such, currently, this type of single-cell multi-omics data is of limited use for reconstructing personalized GRNs. Nevertheless, as single-cell multi-omics approaches mature, this combined information of gene expression and additional data layers has the potential to improve GRN inference beyond correlating separate omics layers and allows for direct measurements instead.

Once reconstructed, these GRNs can be used to determine how for example, genetic differences or disease status change the architecture of the network. These networks consist of nodes, representing genes, that are connected through edges, representing the relationship between genes. The context-specific changes in the network can be identified on different levels, such as on the level of individual edges or nodes, topological properties of individual nodes, such as their connectivity (degree) or module membership (GTEx Consortium, 2015), subnetwork properties, such as the existence and size of modules, or global topological properties, such as degree distribution (Figure 3). Comparing topological features such as node degree to genotypes may identify polymorphisms altering the function of master regulators (highly connected ‘hub’ genes). Interestingly, implementation of network information was shown to be complementary to the identification of eQTLs; using this network information, novel SNPs were identified that could not be identified through single- or multi-tissue eQTL analyses of GTEx (Clark et al., 2018). This clearly shows the complementarity of both eQTL and network-based analyses for understanding the impact of genetic variation.

Ultimately, CRISPR perturbations will be coupled to scRNA-seq to validate or improve reconstructed GRNs. To optimize the number of perturbations required for extracting the most useful information from such experiments, an iterative approach will be taken that feeds back the experimental data to the GRN. This approach will make use of active machine learning to select those perturbations that are required to further improve the model (Ud-Dean and Gunawan, 2016; Aguet, 2019). These well validated, personalized and context-specific GRNs will provide us with a better understanding of disease and can be the starting point of applying this knowledge for precision medicine in the future.

Future clinical implications

The goal of the sc-eQTLGen consortium is to identify how genetic and environmental factors interact to affect gene expression in the context of both health and disease. With ever increasing sample sizes, eQTLs have now been detected for almost every gene (GTEx Consortium, 2017; Võsa, 2018). It is likely this will become even more pronounced through our initiative in which we will study many different cell types and contexts, and pose the question to what extent extensive eQTL maps will help to better understand disease. For cis-eQTLs this will not be straightforward: although it is known that disease-associated SNPs are enriched for showing cis-eQTL effects, this enrichment is quite modest (Zhernakova et al., 2017; Liu et al., 2019). It is therefore not sufficient to simply look up and catalogue which disease-associated SNPs show which cell type- and context-specific eQTL effects, since this can lead to incorrect inferences on the likely causal gene(s) per locus (Giambartolomei et al., 2014). To partly overcome this, several colocalization and Mendelian randomization approaches have been published that help to better infer likely causal genes (Gusev et al., 2016; Porcu et al., 2019; Mancuso et al., 2019; Kemp, 2019). Once these methods are able to account for multiple cell type- and context-dependent, causal regulatory variants per locus, we expect increased statistical power to prioritize causal genes. Additionally, we envision that such methods in conjunction with our cell type- and context-specific eQTL maps will help to determine which genetic variants have pleiotropic effects, affecting the expression levels of several genes in multiple cell types and conditions.

Nevertheless, we expect that most statistical power to pinpoint causal genes will be gained through the other goals of the sc-eQTLGen consortium: the reconstruction of cell type-specific gene regulatory networks (expected by the end of 2022), the mapping of cell type-specific trans-eQTLs and co-expression QTLs (expected by the end of 2021). These efforts will enable us to ascertain how the prioritized cis-eQTL genes (expected by the end of 2021) work together. Moreover, it permits us to study the effect of all disease-associated SNPs of a particular disease in a gene network structure, which helps prioritizing the key disrupted genes and pathways in that disease. In line with recent findings of the eQTLGen consortium that applied eQTL analysis in 31,684 bulk samples, we expect that the (majority of) causal genes within disease-associated loci will converge onto only a few key pathways per disease.

One strategy to identify those key driver genes is to consider all associated variants for a specific disease jointly, and ascertain whether most of these variants show (small) downstream effects on an overlapping set of downstream genes. We recently showed proof-of-concept in eQTLGen that this holds for independent systemic lupus erythematosus (SLE)-associated SNPs: many of these variants show downstream trans-eQTL effects on genes involved in the type I interferon pathway (Võsa, 2018), indicating an important role for this pathway in SLE development. Recently, success has been reported of a type I interferon-targeted therapy in SLE patients (Astle et al., 2016), highlighting the value of using trans-eQTLs for identifying key genes and pathways that are amenable for pharmaceutical intervention. We expect that our single-cell eQTL initiative will aid such analyses substantially: by performing large-scale eQTL mapping in specific cell types that are in a specific cellular state or are exposed to a particular stimulus, we will be able to more accurately determine where and when these downstream effects manifest. Moreover, single-cell studies will also help to overcome the problem associated with cell type composition differences across individuals in bulk-based eQTL studies: many variants exist that affect the proportion of specific cell types that for instance circulate in blood (Vieira Braga et al., 2019). If this is not fully accounted for, trans-eQTLs will be identified in genes that are specifically expressed in such cell types in bulk analyses. Single-cell studies allow us to distinguish between effects of genetic variants on cell type composition and effects on intracellular gene expression levels. Therefore, we expect scRNA-seq data will be vital to gain insight into the downstream consequences of disease-associated genetic variants, and to identify the key pathways and genes that drive disease.

Altogether, we expect these approaches will provide us with the information required to reveal new targets for disease prevention and treatment (Figure 5). For example, a novel subset of tissue-resident memory T cells has recently been identified in the setting of asthma using scRNA-seq (Schork, 2015). This study also showed that mostly T helper 2 cells are dominating the cell-cell interactions in the asthmatic airway wall, whereas in healthy controls mostly epithelial and mesenchymal cell types are communicating with each other. Integration of the gene expression of this asthma-associated cell type with asthma-associated genetic risk variants would further increase our understanding of the disease and such knowledge would greatly accelerate the development of personalized/precision treatments in the future. It is this information about how genes interact differently between individuals as a function of their genetic predisposition that will be obtained through the results of our consortium (Figure 5). One of the major benefits of such personalized treatments is in prescribing the correct drug based on the individual (mechanism that underlies) susceptibility to disease. Currently only between 4% and 25% of the people respond to commonly prescribed drugs (Mukherjee and Topol, 2002), showing the need to better predict drug responsiveness and thereby avoid unnecessary exposure to side-effects.

Figure 5. — (A) In the coming three years the sc-eQTLGen consortium aims to deliver the following: i) standardized pipelines and guidelines for single-cell population genetics studies (2020); ii) cell type classification models for PBMCs (2020); iii) summary statistics of *cis*- and *trans*-eQTLs, co-expression QTLs, cell count QTLs and variance QTLs (2021); iv) reconstruction of personalized, cell type-specific gene regulatory networks (2022). (B) These efforts of the consortium will lead to the (1) identification of disease-associated cell types and (2) key disease-driving genes, which together will aid (3) the implementation of personalized medicine and the development of new therapeutics that take all this information into account (cell type- and genotype-specific treatments). Panel B2 is adapted from van der Wijst et al. (2018b).

This high interindividual variability in drug response is a consequence of genetic and environmental exposure differences between individuals, which can result in differences in drug metabolism, absorption and excretion (pharmacodynamics) (Johnson et al., 2012). For example, a variant in the CYP2C19 gene changes the response to the anti-blood clotting drug clopidogrel. The CYP2C19 gene encodes for an enzyme in the bioactivation of the drug. CYP2C19 poor metabolizers were shown to exhibit higher cardiovascular event rates after acute coronary syndrome, or percutaneous coronary intervention, as compared to patients with normal CYP2C19 function (Tigchelaar et al., 2015).

While previous efforts have mainly focused on pharmacodynamic variation, recent single-cell analyses have revealed that gene-gene interactions can also be changed by genetic (van der Wijst et al., 2018a) and environmental variation (Cuomo et al., 2020). For example, two closely related SNPs (linkage disequilibrium R² = 0.92) affected both gene-gene interactions (RPS26 and RPL21) (van der Wijst et al., 2018a) and gene-environment interactions (RPS26 and the respiratory status of the cell) (Cuomo et al., 2020). This shows that gene regulatory network changes may underlie part of the interindividual variation in drug responsiveness. However, such effects have never been studied in detail before and the extent to which such interactions affect drug responsiveness are unknown. The sc-eQTLGen consortium is able to study both how gene-gene interactions and gene-environment interactions are affected by genetic variation, giving insight into where and when they occur. Importantly, the applied methodologies will be easily transferable to single-cell data that is collected in other cell types and disease context through other large-scale efforts (Regev et al., 2017) (https://lifetime-fetflagship.eu). Moreover, several partners within our consortium have generated scRNA-seq data in cohorts with extensive information on individuals’ health records and drug usage (e.g. the Lifelines Deep cohort [Heaton, 2019] and the OneK1K cohort). With such information, we will be able to validate the link between changes in the gene regulatory network and the drug responsiveness of an individual. This allows us to determine the predictive value of gene networks for determining responsiveness of specific drugs and the applicability of such networks in precision medicine.

As such, the sc-eQTLGen consortium will not only increase our basic knowledge about the contribution of genetics in gene expression and its regulation, but will also be a valuable resource for drug target identification and validation. To increase the impact of this work, all code, guidelines and summary statistics (including all non-significant results) will become freely available to the community through the sc-eQTLGen website (https://eqtlgen.org/single-cell.html). For any additional information, please visit the contact page (https://eqtlgen.org/contact.html).

Biographies

MGP van der Wijst is in the Department of Genetics, Oncode Institute, University of Groningen, University Medical Center Groningen, Groningen, Netherlands

DH de Vries is in the Department of Genetics, Oncode Institute, University of Groningen, University Medical Center Groningen, Groningen, Netherlands.

HE Groot is in the Department of Cardiology, University of Groningen, University Medical Center Groningen, Groningen, Netherlands

G Trynka is at the Wellcome Sanger Institute and Open Targets, Wellcome Genome Campus, Hinxton, United Kingdom

CC Hon is at the RIKEN Center for Integrative Medical Sciences, Yokohama, Japan

MJ Bonder is in the Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, and the Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany

O Stegle is in the Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, and the Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany

MC Nawijn is in the Department of Pathology and Medical Biology, GRIAC Research Institute, University of Groningen, University Medical Center Groningen, Groningen, Netherlands

Y Idaghdour is in the Program in Biology, Public Health Research Center, New York University Abu Dhabi, Abu Dhabi, United Arab Emirates

P van der Harst is in the Department of Cardiology, University of Groningen, University Medical Center Groningen, Groningen, Netherlands

CJ Ye is in the Institute for Human Genetics, the Bakar Computational Health Sciences Institute, the Bakar ImmunoX Initiative, the Department of Medicine, the Department of Bioengineering and Therapeutic Sciences, the Department of Epidemiology and Biostatistics, and the Chan Zuckerberg Biohub, University of California San Francisco, San Francisco, CA, United States

J Powell is in the Garvan-Weizmann Centre for Cellular Genomics, Garvan Institute, UNSW Cellular Genomics Futures Institute, University of New South Wales, Sydney, Australia

FJ Theis in in the Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, and the Department of Mathematics, Technical University of Munich, Garching bei München, Germany

A Mahfouz is in the Leiden Computational Biology Center, Leiden University Medical Center, Leiden, and the Delft Bioinformatics Lab, Delft University of Technology, Delft, Netherlands

M Heinig in in the Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, and the Department of Informatics, Technical University of Munich, Garching bei München, Germany

L Franke is in the Department of Genetics, Oncode Institute, University of Groningen, University Medical Center Groningen, Groningen, Netherlands

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

MGP van der Wijst, Email: m.g.p.van.der.wijst@umcg.nl.

Helena Pérez Valle, eLife, United Kingdom.

Peter Rodgers, eLife, United Kingdom.

Funding Information

This paper was supported by the following grants:

Nederlandse Organisatie voor Wetenschappelijk Onderzoek NWO-Veni 192.029 to MGP van der Wijst.
Nederlandse Organisatie voor Wetenschappelijk Onderzoek ZonMW-VIDI 917.14.374 to L Franke.
European Research Council ERC Starting grant Immrisk 637640 to L Franke.
Oncode Institute to L Franke.
National Health and Medical Research Council Investigator grant 1175781 to J Powell.

Additional information

Competing interests

No competing interests declared.

Author contributions

Conceptualization, Visualization, Project administration, Writing - original draft.

Conceptualization, Visualization, Writing - original draft.

Visualization, Writing - original draft.

Visualization.

Writing - original draft.

Writing - review and editing.

Writing - original draft, Writing - review and editing.

Writing - review and editing.

Conceptualization, Writing - original draft.

Conceptualization, Visualization, Writing - original draft.

Conceptualization.

Conceptualization, Visualization, Writing - original draft.

Conceptualization, Writing - review and editing.

Data availability

Not applicable.

References

Abdelaal T, Michielsen L, Cats D, Hoogduin D, Mei H, Reinders MJT, Mahfouz A. A comparison of automatic cell identification methods for single-cell RNA sequencing data. Genome Biology. 2019;20:194. doi: 10.1186/s13059-019-1795-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
Aguet F. The GTEx consortium atlas of genetic regulatory effects across human tissues. bioRxiv. 2019 doi: 10.1101/787903. [DOI] [PMC free article] [PubMed]
Aibar S, González-Blas CB, Moerman T, Huynh-Thu VA, Imrichova H, Hulselmans G, Rambow F, Marine JC, Geurts P, Aerts J, van den Oord J, Atak ZK, Wouters J, Aerts S. SCENIC: single-cell regulatory network inference and clustering. Nature Methods. 2017;14:1083–1086. doi: 10.1038/nmeth.4463. [DOI] [PMC free article] [PubMed] [Google Scholar]
Alquicira-Hernández J, Sathe A, Ji HP, Nguyen Q, Powell JE. scPred: cell type prediction at single-cell resolution. bioRxiv. 2018 doi: 10.1101/369538. [DOI]
Andrews TS, Hemberg M. False signals induced by single-cell imputation. F1000Research. 2018;7:1740. doi: 10.12688/f1000research.16613.1. [DOI] [PMC free article] [PubMed] [Google Scholar]
Angelini C, Costa V. Understanding gene regulatory mechanisms by integrating ChIP-seq and RNA-seq data: statistical solutions to biological problems. Frontiers in Cell and Developmental Biology. 2014;2:51. doi: 10.3389/fcell.2014.00051. [DOI] [PMC free article] [PubMed] [Google Scholar]
Angermueller C, Lee HJ, Reik W, Stegle O. DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning. Genome Biology. 2017;18:67. doi: 10.1186/s13059-017-1189-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
Arneson D, Zhang G, Ying Z, Zhuang Y, Byun HR, Ahn IS, Gomez-Pinilla F, Yang X. Single cell molecular alterations reveal target cells and pathways of concussive brain injury. Nature Communications. 2018;9:3894. doi: 10.1038/s41467-018-06222-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
Astle WJ, Elding H, Jiang T, Allen D, Ruklisa D, Mann AL, Mead D, Bouman H, Riveros-Mckay F, Kostadima MA, Lambourne JJ, Sivapalaratnam S, Downes K, Kundu K, Bomba L, Berentsen K, Bradley JR, Daugherty LC, Delaneau O, Freson K, Garner SF, Grassi L, Guerrero J, Haimel M, Janssen-Megens EM, Kaan A, Kamat M, Kim B, Mandoli A, Marchini J, Martens JHA, Meacham S, Megy K, O'Connell J, Petersen R, Sharifi N, Sheard SM, Staley JR, Tuna S, van der Ent M, Walter K, Wang SY, Wheeler E, Wilder SP, Iotchkova V, Moore C, Sambrook J, Stunnenberg HG, Di Angelantonio E, Kaptoge S, Kuijpers TW, Carrillo-de-Santa-Pau E, Juan D, Rico D, Valencia A, Chen L, Ge B, Vasquez L, Kwan T, Garrido-Martín D, Watt S, Yang Y, Guigo R, Beck S, Paul DS, Pastinen T, Bujold D, Bourque G, Frontini M, Danesh J, Roberts DJ, Ouwehand WH, Butterworth AS, Soranzo N. The allelic landscape of human blood cell trait variation and links to common complex disease. Cell. 2016;167:1415–1429. doi: 10.1016/j.cell.2016.10.042. [DOI] [PMC free article] [PubMed] [Google Scholar]
Azizi E, Carr AJ, Plitas G, Cornish AE, Konopacki C, Prabhakaran S, Nainys J, Wu K, Kiseliovas V, Setty M, Choi K, Fromme RM, Dao P, McKenney PT, Wasti RC, Kadaveru K, Mazutis L, Rudensky AY, Pe'er D. Single-Cell map of diverse immune phenotypes in the breast tumor microenvironment. Cell. 2018;174:1293–1308. doi: 10.1016/j.cell.2018.05.060. [DOI] [PMC free article] [PubMed] [Google Scholar]
Bacher R, Chu LF, Leng N, Gasch AP, Thomson JA, Stewart RM, Newton M, Kendziorski C. SCnorm: robust normalization of single-cell RNA-seq data. Nature Methods. 2017;14:584–586. doi: 10.1038/nmeth.4263. [DOI] [PMC free article] [PubMed] [Google Scholar]
Baran Y. MetaCell: analysis of single cell RNA-seq data using k-NN graph partitions. bioRxiv. 2018 doi: 10.1101/437665. [DOI] [PMC free article] [PubMed]
Battle A, Mostafavi S, Zhu X, Potash JB, Weissman MM, McCormick C, Haudenschild CD, Beckman KB, Shi J, Mei R, Urban AE, Montgomery SB, Levinson DF, Koller D. Characterizing the genetic basis of transcriptome diversity through RNA-sequencing of 922 individuals. Genome Research. 2014;24:14–24. doi: 10.1101/gr.155192.113. [DOI] [PMC free article] [PubMed] [Google Scholar]
Brennecke P, Anders S, Kim JK, Kołodziejczyk AA, Zhang X, Proserpio V, Baying B, Benes V, Teichmann SA, Marioni JC, Heisler MG. Accounting for technical noise in single-cell RNA-seq experiments. Nature Methods. 2013;10:1093–1095. doi: 10.1038/nmeth.2645. [DOI] [PubMed] [Google Scholar]
Brown CD, Mangravite LM, Engelhardt BE. Integrative modeling of eQTLs and cis-regulatory elements suggests mechanisms underlying cell type specificity of eQTLs. PLOS Genetics. 2013;9:e1003649. doi: 10.1371/journal.pgen.1003649. [DOI] [PMC free article] [PubMed] [Google Scholar]
Budden DM, Hurley DG, Cursons J, Markham JF, Davis MJ, Crampin EJ. Predicting expression: the complementary power of histone modification and transcription factor binding data. Epigenetics & Chromatin. 2014;7:36. doi: 10.1186/1756-8935-7-36. [DOI] [PMC free article] [PubMed] [Google Scholar]
Buenrostro JD, Wu B, Litzenburger UM, Ruff D, Gonzales ML, Snyder MP, Chang HY, Greenleaf WJ. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature. 2015;523:486–490. doi: 10.1038/nature14590. [DOI] [PMC free article] [PubMed] [Google Scholar]
Burgess DJ. Spatial transcriptomics coming of age. Nature Reviews Genetics. 2019;20:317. doi: 10.1038/s41576-019-0129-z. [DOI] [PubMed] [Google Scholar]
Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, Motyer A, Vukcevic D, Delaneau O, O'Connell J, Cortes A, Welsh S, Young A, Effingham M, McVean G, Leslie S, Allen N, Donnelly P, Marchini J. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–209. doi: 10.1038/s41586-018-0579-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
Cao J, Cusanovich DA, Ramani V, Aghamirzaie D, Pliner HA, Hill AJ, Daza RM, McFaline-Figueroa JL, Packer JS, Christiansen L, Steemers FJ, Adey AC, Trapnell C, Shendure J. Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science. 2018;361:1380–1385. doi: 10.1126/science.aau0730. [DOI] [PMC free article] [PubMed] [Google Scholar]
Carlson CS, Matise TC, North KE, Haiman CA, Fesinmeyer MD, Buyske S, Schumacher FR, Peters U, Franceschini N, Ritchie MD, Duggan DJ, Spencer KL, Dumitrescu L, Eaton CB, Thomas F, Young A, Carty C, Heiss G, Le Marchand L, Crawford DC, Hindorff LA, Kooperberg CL, PAGE Consortium Generalization and dilution of association results from european GWAS in populations of non-European ancestry: the PAGE study. PLOS Biology. 2013;11:e1001661. doi: 10.1371/journal.pbio.1001661. [DOI] [PMC free article] [PubMed] [Google Scholar]
Chen L, Ge B, Casale FP, Vasquez L, Kwan T, Garrido-Martín D, Watt S, Yan Y, Kundu K, Ecker S, Datta A, Richardson D, Burden F, Mead D, Mann AL, Fernandez JM, Rowlston S, Wilder SP, Farrow S, Shao X, Lambourne JJ, Redensek A, Albers CA, Amstislavskiy V, Ashford S, Berentsen K, Bomba L, Bourque G, Bujold D, Busche S, Caron M, Chen SH, Cheung W, Delaneau O, Dermitzakis ET, Elding H, Colgiu I, Bagger FO, Flicek P, Habibi E, Iotchkova V, Janssen-Megens E, Kim B, Lehrach H, Lowy E, Mandoli A, Matarese F, Maurano MT, Morris JA, Pancaldi V, Pourfarzad F, Rehnstrom K, Rendon A, Risch T, Sharifi N, Simon MM, Sultan M, Valencia A, Walter K, Wang SY, Frontini M, Antonarakis SE, Clarke L, Yaspo ML, Beck S, Guigo R, Rico D, Martens JHA, Ouwehand WH, Kuijpers TW, Paul DS, Stunnenberg HG, Stegle O, Downes K, Pastinen T, Soranzo N. Genetic drivers of epigenetic and transcriptional variation in human immune cells. Cell. 2016;167:1398–1414. doi: 10.1016/j.cell.2016.10.026. [DOI] [PMC free article] [PubMed] [Google Scholar]
Chen S, Mar JC. Evaluating methods of inferring gene regulatory networks highlights their lack of performance for single cell gene expression data. BMC Bioinformatics. 2018;19:232. doi: 10.1186/s12859-018-2217-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
Clark SJ, Argelaguet R, Kapourani CA, Stubbs TM, Lee HJ, Alda-Catalinas C, Krueger F, Sanguinetti G, Kelsey G, Marioni JC, Stegle O, Reik W. scNMT-seq enables joint profiling of chromatin accessibility DNA methylation and transcription in single cells. Nature Communications. 2018;9:781. doi: 10.1038/s41467-018-03149-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
Cuomo ASE, Seaton DD, McCarthy DJ, Martinez I, Bonder MJ, Garcia-Bernardo J, Amatya S, Madrigal P, Isaacson A, Buettner F, Knights A, Natarajan KN, Vallier L, Marioni JC, Chhatriwala M, Stegle O, HipSci Consortium Single-cell RNA-sequencing of differentiating iPS cells reveals dynamic genetic effects on gene expression. Nature Communications. 2020;11:810. doi: 10.1038/s41467-020-14457-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
Donovan MKR, D’Antonio-Chronowska A, D’Antonio M, Frazer KA. Cellular deconvolution of GTEx tissues powers eQTL studies to discover thousands of novel disease and cell-type associated regulatory variants. bioRxiv. 2019 doi: 10.1101/671040. [DOI]
Eling N, Richard AC, Richardson S, Marioni JC, Vallejos CA. Correcting the Mean-Variance dependency for differential variability testing using Single-Cell RNA sequencing data. Cell Systems. 2018;7:284–294. doi: 10.1016/j.cels.2018.06.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
Fagny M, Paulson JN, Kuijjer ML, Sonawane AR, Chen CY, Lopes-Ramos CM, Glass K, Quackenbush J, Platig J. Exploring regulation in tissues with eQTL networks. PNAS. 2017;114:E7841–E7850. doi: 10.1073/pnas.1707375114. [DOI] [PMC free article] [PubMed] [Google Scholar]
Fairfax BP, Makino S, Radhakrishnan J, Plant K, Leslie S, Dilthey A, Ellis P, Langford C, Vannberg FO, Knight JC. Genetics of gene expression in primary immune cells identifies cell type–specific master regulators and roles of HLA alleles. Nature Genetics. 2012;44:502–510. doi: 10.1038/ng.2205. [DOI] [PMC free article] [PubMed] [Google Scholar]
Favé MJ, Lamaze FC, Soave D, Hodgkinson A, Gauvin H, Bruat V, Grenier JC, Gbeha E, Skead K, Smargiassi A, Johnson M, Idaghdour Y, Awadalla P. Gene-by-environment interactions in urban populations modulate risk phenotypes. Nature Communications. 2018;9:827. doi: 10.1038/s41467-018-03202-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
Fiers M, Minnoye L, Aibar S, Bravo González-Blas C, Kalender Atak Z, Aerts S. Mapping gene regulatory networks from single-cell omics data. Briefings in Functional Genomics. 2018;17:246–254. doi: 10.1093/bfgp/elx046. [DOI] [PMC free article] [PubMed] [Google Scholar]
Fu J, Wolfs MG, Deelen P, Westra HJ, Fehrmann RS, Te Meerman GJ, Buurman WA, Rensen SS, Groen HJ, Weersma RK, van den Berg LH, Veldink J, Ophoff RA, Snieder H, van Heel D, Jansen RC, Hofker MH, Wijmenga C, Franke L. Unraveling the regulatory mechanisms underlying tissue-dependent genetic variation of gene expression. PLOS Genetics. 2012;8:e1002431. doi: 10.1371/journal.pgen.1002431. [DOI] [PMC free article] [PubMed] [Google Scholar]
Gasperini M, Hill AJ, McFaline-Figueroa JL, Martin B, Kim S, Zhang MD, Jackson D, Leith A, Schreiber J, Noble WS, Trapnell C, Ahituv N, Shendure J. A genome-wide framework for mapping gene regulation via cellular genetic screens. Cell. 2019;176:377–390. doi: 10.1016/j.cell.2018.11.029. [DOI] [PMC free article] [PubMed] [Google Scholar]
Gate RE. Mapping gene regulatory networks of primary CD4+ T cells using single-cell genomics and genome engineering. bioRxiv. 2019 doi: 10.1101/678060. [DOI]
Ghanbari M, Lasserre J, Vingron M. Reconstruction of gene networks using prior knowledge. BMC Systems Biology. 2015;9:84. doi: 10.1186/s12918-015-0233-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
Giambartolomei C, Vukcevic D, Schadt EE, Franke L, Hingorani AD, Wallace C, Plagnol V. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLOS Genetics. 2014;10:e1004383. doi: 10.1371/journal.pgen.1004383. [DOI] [PMC free article] [PubMed] [Google Scholar]
Gravel S, Henn BM, Gutenkunst RN, Indap AR, Marth GT, Clark AG, Yu F, Gibbs RA, Bustamante CD, 1000 Genomes Project Demographic history and rare allele sharing among human populations. PNAS. 2011;108:11983–11988. doi: 10.1073/pnas.1019276108. [DOI] [PMC free article] [PubMed] [Google Scholar]
GTEx Consortium The Genotype-Tissue expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science. 2015;348:648–660. doi: 10.1126/science.1262110. [DOI] [PMC free article] [PubMed] [Google Scholar]
GTEx Consortium Genetic effects on gene expression across human tissues. Nature. 2017;550:204–213. doi: 10.1038/nature24277. [DOI] [PMC free article] [PubMed] [Google Scholar]
Gusev A, Ko A, Shi H, Bhatia G, Chung W, Penninx BW, Jansen R, de Geus EJ, Boomsma DI, Wright FA, Sullivan PF, Nikkola E, Alvarez M, Civelek M, Lusis AJ, Lehtimäki T, Raitoharju E, Kähönen M, Seppälä I, Raitakari OT, Kuusisto J, Laakso M, Price AL, Pajukanta P, Pasaniuc B. Integrative approaches for large-scale transcriptome-wide association studies. Nature Genetics. 2016;48:245–252. doi: 10.1038/ng.3506. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hafemeister C, Satija R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. bioRxiv. 2019 doi: 10.1101/576827. [DOI] [PMC free article] [PubMed]
Hayashi T, Ozaki H, Sasagawa Y, Umeda M, Danno H, Nikaido I. Single-cell full-length total RNA sequencing uncovers dynamics of recursive splicing and enhancer RNAs. Nature Communications. 2018;9:619. doi: 10.1038/s41467-018-02866-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
Heap GA, Trynka G, Jansen RC, Bruinenberg M, Swertz MA, Dinesen LC, Hunt KA, Wijmenga C, Vanheel DA, Franke L. Complex nature of SNP genotype effects on gene expression in primary human leucocytes. BMC Medical Genomics. 2009;2:1. doi: 10.1186/1755-8794-2-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
Heaton H. Souporcell: robust clustering of single cell RNAseq by genotype and ambient RNA inference without reference genotypes. bioRxiv. 2019 doi: 10.1101/699637. [DOI] [PMC free article] [PubMed]
Hormozdiari F, Gazal S, van de Geijn B, Finucane HK, Ju CJ, Loh PR, Schoech A, Reshef Y, Liu X, O'Connor L, Gusev A, Eskin E, Price AL. Leveraging molecular quantitative trait loci to understand the genetic architecture of diseases and complex traits. Nature Genetics. 2018;50:1041–1047. doi: 10.1038/s41588-018-0148-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hsiao CL, Lian I, Hsieh AR, Fann CS. Modeling expression quantitative trait loci in data combining ethnic populations. BMC Bioinformatics. 2010;11:111. doi: 10.1186/1471-2105-11-111. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hu Y, Zhang X. SCeQTL: an R package for identifying eQTL from single-cell parallel sequencing data. bioRxiv. 2018 doi: 10.1101/499863. [DOI] [PMC free article] [PubMed]
Iacono G, Massoni-Badosa R, Heyn H. Single-cell transcriptomics unveils gene regulatory network plasticity. Genome Biology. 2019;20:110. doi: 10.1186/s13059-019-1713-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ishigaki K, Kochi Y, Suzuki A, Tsuchida Y, Tsuchiya H, Sumitomo S, Yamaguchi K, Nagafuchi Y, Nakachi S, Kato R, Sakurai K, Shoda H, Ikari K, Taniguchi A, Yamanaka H, Miya F, Tsunoda T, Okada Y, Momozawa Y, Kamatani Y, Yamada R, Kubo M, Fujio K, Yamamoto K. Polygenic burdens on cell-specific pathways underlie the risk of rheumatoid arthritis. Nature Genetics. 2017;49:1120–1125. doi: 10.1038/ng.3885. [DOI] [PubMed] [Google Scholar]
Jackson CA, Castro DM, Saldi G, Bonneau R, Gresham D. Gene regulatory network reconstruction using single-cell RNA sequencing of barcoded genotypes in diverse environments. bioRxiv. 2019 doi: 10.7554/eLife.51254. [DOI] [PMC free article] [PubMed]
Johnson MD, Plantinga TS, van de Vosse E, Velez Edwards DR, Smith PB, Alexander BD, Yang JC, Kremer D, Laird GM, Oosting M, Joosten LAB, van der Meer JWM, van Dissel JT, Walsh TJ, Perfect JR, Kullberg B-J, Scott WK, Netea MG. Cytokine gene polymorphisms and the outcome of invasive candidiasis: a prospective cohort study. Clinical Infectious Diseases. 2012;54:502–510. doi: 10.1093/cid/cir827. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kang HM, Subramaniam M, Targ S, Nguyen M, Maliskova L, McCarthy E, Wan E, Wong S, Byrnes L, Lanata CM, Gate RE, Mostafavi S, Marson A, Zaitlen N, Criswell LA, Ye CJ. Multiplexed droplet single-cell RNA-sequencing using natural genetic variation. Nature Biotechnology. 2018;36:89–94. doi: 10.1038/nbt.4042. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kemp A. Anifrolumab phase III trial meets primary endpoint in systemic lupus erythematosus. [August 29, 2019];2019 https://www.astrazeneca.com/media-centre/press-releases/2019/anifrolumab-phase-iii-trial-meets-primary-endpoint-in-systemic-lupus-erythematosus-29082019.html
Knowles DA, Davis JR, Edgington H, Raj A, Favé MJ, Zhu X, Potash JB, Weissman MM, Shi J, Levinson DF, Awadalla P, Mostafavi S, Montgomery SB, Battle A. Allele-specific expression reveals interactions between genetic variation and environment. Nature Methods. 2017;14:699–702. doi: 10.1038/nmeth.4298. [DOI] [PMC free article] [PubMed] [Google Scholar]
Köhler ND, Büttner M, Theis FJ. Deep learning does not outperform classical machine learning for cell-type annotation. bioRxiv. 2019 doi: 10.1101/653907. [DOI]
Kouno T, Moody J, Kwon AT, Shibayama Y, Kato S, Huang Y, Böttcher M, Motakis E, Mendez M, Severin J, Luginbühl J, Abugessaisa I, Hasegawa A, Takizawa S, Arakawa T, Furuno M, Ramalingam N, West J, Suzuki H, Kasukawa T, Lassmann T, Hon CC, Arner E, Carninci P, Plessy C, Shin JW. C1 CAGE detects transcription start sites and enhancer activity at single-cell resolution. Nature Communications. 2019;10:360. doi: 10.1038/s41467-018-08126-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
La Manno G, Soldatov R, Zeisel A, Braun E, Hochgerner H, Petukhov V, Lidschreiber K, Kastriti ME, Lönnerberg P, Furlan A, Fan J, Borm LE, Liu Z, van Bruggen D, Guo J, He X, Barker R, Sundström E, Castelo-Branco G, Cramer P, Adameyko I, Linnarsson S, Kharchenko PV. RNA velocity of single cells. Nature. 2018;560:494–498. doi: 10.1038/s41586-018-0414-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:559. doi: 10.1186/1471-2105-9-559. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lareau CA, Duarte FM, Chew JG, Kartha VK, Burkett ZD, Kohlway AS, Pokholok D, Aryee MJ, Steemers FJ, Lebofsky R, Buenrostro JD. Droplet-based combinatorial indexing for massive-scale single-cell chromatin accessibility. Nature Biotechnology. 2019;37:916–924. doi: 10.1038/s41587-019-0147-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
Leinonen R, Sugawara H, Shumway M, International Nucleotide Sequence Database Collaboration The sequence read archive. Nucleic Acids Research. 2011;39:19–21. doi: 10.1093/nar/gkx1097. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lin D, Zhang J, Li J, Xu C, Deng HW, Wang YP. An integrative imputation method based on multi-omics datasets. BMC Bioinformatics. 2016;17:247. doi: 10.1186/s12859-016-1122-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
Liu B, Gloudemans MJ, Rao AS, Ingelsson E, Montgomery SB. Abundant associations with gene expression complicate GWAS follow-up. Nature Genetics. 2019;51:768–769. doi: 10.1038/s41588-019-0404-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lloyd-Jones LR, Holloway A, McRae A, Yang J, Small K, Zhao J, Zeng B, Bakshi A, Metspalu A, Dermitzakis M, Gibson G, Spector T, Montgomery G, Esko T, Visscher PM, Powell JE. The genetic Architecture of gene expression in peripheral blood. American Journal of Human Genetics. 2017;100:228–237. doi: 10.1016/j.ajhg.2016.12.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
Luecken MD, Theis FJ. Current best practices in single-cell RNA-seq analysis: a tutorial. Molecular Systems Biology. 2019;15:e8746. doi: 10.15252/msb.20188746. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lukowski SW, Lloyd-Jones LR, Holloway A, Kirsten H, Hemani G, Yang J, Small K, Zhao J, Metspalu A, Dermitzakis ET, Gibson G, Spector TD, Thiery J, Scholz M, Montgomery GW, Esko T, Visscher PM, Powell JE. Genetic correlations reveal the shared genetic architecture of transcription in human peripheral blood. Nature Communications. 2017;8:483. doi: 10.1038/s41467-017-00473-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
Mancuso N, Freund MK, Johnson R, Shi H, Kichaev G, Gusev A, Pasaniuc B. Probabilistic fine-mapping of transcriptome-wide association studies. Nature Genetics. 2019;51:675–682. doi: 10.1038/s41588-019-0367-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
Mandric I. Optimal design of single-cell RNA sequencing experiments for cell-type-specific eQTL analysis. bioRxiv. 2019 doi: 10.1101/766972. [DOI] [PMC free article] [PubMed]
Marchini J, Howie B. Genotype imputation for genome-wide association studies. Nature Reviews Genetics. 2010;11:499–511. doi: 10.1038/nrg2796. [DOI] [PubMed] [Google Scholar]
Martin AR, Gignoux CR, Walters RK, Wojcik GL, Neale BM, Gravel S, Daly MJ, Bustamante CD, Kenny EE. Human demographic history impacts genetic risk prediction across diverse populations. American Journal of Human Genetics. 2017;100:635–649. doi: 10.1016/j.ajhg.2017.03.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
Maynard KR, Jaffe AE, Martinowich K. Spatial transcriptomics: putting genome-wide expression on the map. Neuropsychopharmacology. 2020;45:232–233. doi: 10.1038/s41386-019-0484-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
McCarthy S, Das S, Kretzschmar W, Delaneau O, Wood AR, Teumer A, Kang HM, Fuchsberger C, Danecek P, Sharp K, Luo Y, Sidore C, Kwong A, Timpson N, Koskinen S, Vrieze S, Scott LJ, Zhang H, Mahajan A, Veldink J, Peters U, Pato C, van Duijn CM, Gillies CE, Gandin I, Mezzavilla M, Gilly A, Cocca M, Traglia M, Angius A, Barrett JC, Boomsma D, Branham K, Breen G, Brummett CM, Busonero F, Campbell H, Chan A, Chen S, Chew E, Collins FS, Corbin LJ, Smith GD, Dedoussis G, Dorr M, Farmaki AE, Ferrucci L, Forer L, Fraser RM, Gabriel S, Levy S, Groop L, Harrison T, Hattersley A, Holmen OL, Hveem K, Kretzler M, Lee JC, McGue M, Meitinger T, Melzer D, Min JL, Mohlke KL, Vincent JB, Nauck M, Nickerson D, Palotie A, Pato M, Pirastu N, McInnis M, Richards JB, Sala C, Salomaa V, Schlessinger D, Schoenherr S, Slagboom PE, Small K, Spector T, Stambolian D, Tuke M, Tuomilehto J, Van den Berg LH, Van Rheenen W, Volker U, Wijmenga C, Toniolo D, Zeggini E, Gasparini P, Sampson MG, Wilson JF, Frayling T, de Bakker PI, Swertz MA, McCarroll S, Kooperberg C, Dekker A, Altshuler D, Willer C, Iacono W, Ripatti S, Soranzo N, Walter K, Swaroop A, Cucca F, Anderson CA, Myers RM, Boehnke M, McCarthy MI, Durbin R, Haplotype Reference Consortium A reference panel of 64,976 haplotypes for genotype imputation. Nature Genetics. 2016;48:1279–1283. doi: 10.1038/ng.3643. [DOI] [PMC free article] [PubMed] [Google Scholar]
Michaelson JJ, Loguercio S, Beyer A. Detection and interpretation of expression quantitative trait loci (eQTL) Methods. 2009;48:265–276. doi: 10.1016/j.ymeth.2009.03.004. [DOI] [PubMed] [Google Scholar]
Miraldi ER, Pokrovskii M, Watters A, Castro DM, De Veaux N, Hall JA, Lee JY, Ciofani M, Madar A, Carriero N, Littman DR, Bonneau R. Leveraging chromatin accessibility for transcriptional regulatory network inference in T helper 17 cells. Genome Research. 2019;29:449–463. doi: 10.1101/gr.238253.118. [DOI] [PMC free article] [PubMed] [Google Scholar]
Moignard V, Woodhouse S, Haghverdi L, Lilly AJ, Tanaka Y, Wilkinson AC, Buettner F, Macaulay IC, Jawaid W, Diamanti E, Nishikawa SI, Piterman N, Kouskoff V, Theis FJ, Fisher J, Göttgens B. Decoding the regulatory network of early blood development from single-cell gene expression measurements. Nature Biotechnology. 2015;33:269–276. doi: 10.1038/nbt.3154. [DOI] [PMC free article] [PubMed] [Google Scholar]
Mukherjee D, Topol EJ. Pharmacogenomics in cardiovascular diseases. Progress in Cardiovascular Diseases. 2002;44:479–498. doi: 10.1053/pcad.2002.123467. [DOI] [PubMed] [Google Scholar]
Ocone A, Haghverdi L, Mueller NS, Theis FJ. Reconstructing gene regulatory dynamics from high-dimensional single-cell snapshot data. Bioinformatics. 2015;31:i89–i96. doi: 10.1093/bioinformatics/btv257. [DOI] [PMC free article] [PubMed] [Google Scholar]
Porcu E, Rüeger S, Lepik K, Santoni FA, Reymond A, Kutalik Z, eQTLGen Consortium, BIOS Consortium Mendelian randomization integrating GWAS and eQTL data reveals genetic determinants of complex and clinical traits. Nature Communications. 2019;10:3300. doi: 10.1038/s41467-019-10936-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
Pratapa A, Jalihal AP, Law JN, Bharadwaj A, Murali TM. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. bioRxiv. 2019 doi: 10.1101/642926. [DOI] [PMC free article] [PubMed]
Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics. 2006;38:904–909. doi: 10.1038/ng1847. [DOI] [PubMed] [Google Scholar]
Qin J, Hu Y, Xu F, Yalamanchili HK, Wang J. Inferring gene regulatory networks by integrating ChIP-seq/chip and transcriptome data via LASSO-type regularization methods. Methods. 2014;67:294–303. doi: 10.1016/j.ymeth.2014.03.006. [DOI] [PubMed] [Google Scholar]
Qiu X. Towards inferring causal gene regulatory networks from single cell expression measurements. bioRxiv. 2018 doi: 10.1101/426981. [DOI] [PMC free article] [PubMed]
Raj A, Peskin CS, Tranchina D, Vargas DY, Tyagi S. Stochastic mRNA synthesis in mammalian cells. PLOS Biology. 2006;4:e309. doi: 10.1371/journal.pbio.0040309. [DOI] [PMC free article] [PubMed] [Google Scholar]
Regev A, Teichmann SA, Lander ES, Amit I, Benoist C, Birney E, Bodenmiller B, Campbell P, Carninci P, Clatworthy M, Clevers H, Deplancke B, Dunham I, Eberwine J, Eils R, Enard W, Farmer A, Fugger L, Göttgens B, Hacohen N, Haniffa M, Hemberg M, Kim S, Klenerman P, Kriegstein A, Lein E, Linnarsson S, Lundberg E, Lundeberg J, Majumder P, Marioni JC, Merad M, Mhlanga M, Nawijn M, Netea M, Nolan G, Pe'er D, Phillipakis A, Ponting CP, Quake S, Reik W, Rozenblatt-Rosen O, Sanes J, Satija R, Schumacher TN, Shalek A, Shapiro E, Sharma P, Shin JW, Stegle O, Stratton M, Stubbington MJT, Theis FJ, Uhlen M, van Oudenaarden A, Wagner A, Watt F, Weissman J, Wold B, Xavier R, Yosef N, Human Cell Atlas Meeting Participants The Human Cell Atlas. eLife. 2017;6:e27041. doi: 10.7554/eLife.27041. [DOI] [PMC free article] [PubMed] [Google Scholar]
Roshchupkin GV, Adams HH, Vernooij MW, Hofman A, Van Duijn CM, Ikram MA, Niessen WJ. HASE: framework for efficient high-dimensional association analyses. Scientific Reports. 2016;6:36076. doi: 10.1038/srep36076. [DOI] [PMC free article] [PubMed] [Google Scholar]
Saelens W, Cannoodt R, Todorov H, Saeys Y. A comparison of single-cell trajectory inference methods. Nature Biotechnology. 2019;37:547–554. doi: 10.1038/s41587-019-0071-9. [DOI] [PubMed] [Google Scholar]
Sarkar AK, Tung PY, Blischak JD, Burnett JE, Li YI, Stephens M, Gilad Y. Discovery and characterization of variance QTLs in human induced pluripotent stem cells. PLOS Genetics. 2019;15:e1008045. doi: 10.1371/journal.pgen.1008045. [DOI] [PMC free article] [PubMed] [Google Scholar]
Schork NJ. Personalized medicine: time for one-person trials. Nature. 2015;520:609–611. doi: 10.1038/520609a. [DOI] [PubMed] [Google Scholar]
Siahpirani AF, Roy S. A prior-based integrative framework for functional transcriptional regulatory network inference. Nucleic Acids Research. 2017;45:2221. doi: 10.1093/nar/gkw1160. [DOI] [PMC free article] [PubMed] [Google Scholar]
Silvester N, Alako B, Amid C, Cerdeño-Tarrága A, Clarke L, Cleland I, Harrison PW, Jayathilaka S, Kay S, Keane T, Leinonen R, Liu X, Martínez-Villacorta J, Menchi M, Reddy K, Pakseresht N, Rajan J, Rossello M, Smirnov D, Toribio AL, Vaughan D, Zalunin V, Cochrane G. The European Nucleotide Archive in 2017. Nucleic Acids Research. 2018;46:D36–D40. doi: 10.1093/nar/gkx1125. [DOI] [PMC free article] [PubMed] [Google Scholar]
Simpson EH. The interpretation of interaction in contingency tables. Journal of the Royal Statistical Society: Series B. 1951;13:238–241. doi: 10.1111/j.2517-6161.1951.tb00088.x. [DOI] [Google Scholar]
Sirugo G, Williams SM, Tishkoff SA. The missing diversity in human genetic studies. Cell. 2019;177:26–31. doi: 10.1016/j.cell.2019.02.048. [DOI] [PMC free article] [PubMed] [Google Scholar]
Skinnider MA, Squair JW, Foster LJ. Evaluating measures of association for single-cell transcriptomics. Nature Methods. 2019;16:381–386. doi: 10.1038/s41592-019-0372-4. [DOI] [PubMed] [Google Scholar]
Smillie CS, Biton M, Ordovas-Montanes J, Sullivan KM, Burgin G, Graham DB, Herbst RH, Rogel N, Slyper M, Waldman J, Sud M, Andrews E, Velonias G, Haber AL, Jagadeesh K, Vickovic S, Yao J, Stevens C, Dionne D, Nguyen LT, Villani AC, Hofree M, Creasey EA, Huang H, Rozenblatt-Rosen O, Garber JJ, Khalili H, Desch AN, Daly MJ, Ananthakrishnan AN, Shalek AK, Xavier RJ, Regev A. Intra- and inter-cellular rewiring of the human colon during ulcerative colitis. Cell. 2019;178:714–730. doi: 10.1016/j.cell.2019.06.029. [DOI] [PMC free article] [PubMed] [Google Scholar]
Stoeckius M, Hafemeister C, Stephenson W, Houck-Loomis B, Chattopadhyay PK, Swerdlow H, Satija R, Smibert P. Simultaneous epitope and transcriptome measurement in single cells. Nature Methods. 2017;14:865–868. doi: 10.1038/nmeth.4380. [DOI] [PMC free article] [PubMed] [Google Scholar]
Stranger BE, Forrest MS, Clark AG, Minichiello MJ, Deutsch S, Lyle R, Hunt S, Kahl B, Antonarakis SE, Tavaré S, Deloukas P, Dermitzakis ET. Genome-wide associations of gene expression variation in humans. PLOS Genetics. 2005;1:e78. doi: 10.1371/journal.pgen.0010078. [DOI] [PMC free article] [PubMed] [Google Scholar]
Stranger BE, Nica AC, Forrest MS, Dimas A, Bird CP, Beazley C, Ingle CE, Dunning M, Flicek P, Koller D, Montgomery S, Tavaré S, Deloukas P, Dermitzakis ET. Population genomics of human gene expression. Nature Genetics. 2007;39:1217–1224. doi: 10.1038/ng2142. [DOI] [PMC free article] [PubMed] [Google Scholar]
Sverchkov Y, Craven M. A review of active learning approaches to experimental design for uncovering biological networks. PLOS Computational Biology. 2017;13:e1005466. doi: 10.1371/journal.pcbi.1005466. [DOI] [PMC free article] [PubMed] [Google Scholar]
Tian L, Dong X, Freytag S, Lê Cao KA, Su S, JalalAbadi A, Amann-Zalcenstein D, Weber TS, Seidi A, Jabbari JS, Naik SH, Ritchie ME. Benchmarking single cell RNA-sequencing analysis pipelines using mixture control experiments. Nature Methods. 2019;16:479–487. doi: 10.1038/s41592-019-0425-8. [DOI] [PubMed] [Google Scholar]
Tigchelaar EF, Zhernakova A, Dekens JA, Hermes G, Baranska A, Mujagic Z, Swertz MA, Muñoz AM, Deelen P, Cénit MC, Franke L, Scholtens S, Stolk RP, Wijmenga C, Feskens EJ. Cohort profile: Lifelines DEEP, a prospective, general population cohort study in the northern Netherlands: study design and baseline characteristics. BMJ Open. 2015;5:e006772. doi: 10.1136/bmjopen-2014-006772. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ud-Dean SM, Gunawan R. Optimal design of gene knockout experiments for gene regulatory network inference. Bioinformatics. 2016;32:875–883. doi: 10.1093/bioinformatics/btv672. [DOI] [PMC free article] [PubMed] [Google Scholar]
van der Wijst MGP, Brugge H, de Vries DH, Deelen P, Swertz MA, Franke L, LifeLines Cohort Study, BIOS Consortium Single-cell RNA sequencing identifies celltype-specific cis-eQTLs and co-expression QTLs. Nature Genetics. 2018a;50:493–497. doi: 10.1038/s41588-018-0089-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
van der Wijst MGP, de Vries DH, Brugge H, Westra HJ, Franke L. An integrative approach for building personalized gene regulatory networks for precision medicine. Genome Medicine. 2018b;10:96. doi: 10.1186/s13073-018-0608-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
Venet D, Pecasse F, Maenhaut C, Bersini H. Separation of samples into their constituents using gene expression data. Bioinformatics. 2001;17 Suppl 1:S279–S287. doi: 10.1093/bioinformatics/17.suppl_1.S279. [DOI] [PubMed] [Google Scholar]
Vieira Braga FA, Kar G, Berg M, Carpaij OA, Polanski K, Simon LM, Brouwer S, Gomes T, Hesse L, Jiang J, Fasouli ES, Efremova M, Vento-Tormo R, Talavera-López C, Jonker MR, Affleck K, Palit S, Strzelecka PM, Firth HV, Mahbubani KT, Cvejic A, Meyer KB, Saeb-Parsy K, Luinge M, Brandsma CA, Timens W, Angelidis I, Strunz M, Koppelman GH, van Oosterhout AJ, Schiller HB, Theis FJ, van den Berge M, Nawijn MC, Teichmann SA. A cellular census of human lungs identifies novel cell states in health and in asthma. Nature Medicine. 2019;25:1153–1163. doi: 10.1038/s41591-019-0468-5. [DOI] [PubMed] [Google Scholar]
Villani AC, Satija R, Reynolds G, Sarkizova S, Shekhar K, Fletcher J, Griesbeck M, Butler A, Zheng S, Lazo S, Jardine L, Dixon D, Stephenson E, Nilsson E, Grundberg I, McDonald D, Filby A, Li W, De Jager PL, Rozenblatt-Rosen O, Lane AA, Haniffa M, Regev A, Hacohen N. Single-cell RNA-seq reveals new types of human blood dendritic cells, monocytes, and progenitors. Science. 2017;356:eaah4573. doi: 10.1126/science.aah4573. [DOI] [PMC free article] [PubMed] [Google Scholar]
Võsa U. Unraveling the polygenic architecture of complex traits using blood eQTL metaanalysis. bioRxiv. 2018 doi: 10.1101/447367. [DOI]
Wang D, Liu S, Warrell J, Won H, Shi X, Navarro FCP, Clarke D, Gu M, Emani P, Yang YT, Xu M, Gandal MJ, Lou S, Zhang J, Park JJ, Yan C, Rhie SK, Manakongtreecheep K, Zhou H, Nathan A, Peters M, Mattei E, Fitzgerald D, Brunetti T, Moore J, Jiang Y, Girdhar K, Hoffman GE, Kalayci S, Gümüş ZH, Crawford GE, Roussos P, Akbarian S, Jaffe AE, White KP, Weng Z, Sestan N, Geschwind DH, Knowles JA, Gerstein MB, PsychENCODE Consortium Comprehensive functional genomic resource and integrative model for the human brain. Science. 2018;362:eaat8464. doi: 10.1126/science.aat8464. [DOI] [PMC free article] [PubMed] [Google Scholar]
Westra HJ, Peters MJ, Esko T, Yaghootkar H, Schurmann C, Kettunen J, Christiansen MW, Fairfax BP, Schramm K, Powell JE, Zhernakova A, Zhernakova DV, Veldink JH, Van den Berg LH, Karjalainen J, Withoff S, Uitterlinden AG, Hofman A, Rivadeneira F, Hoen PAC', Reinmaa E, Fischer K, Nelis M, Milani L, Melzer D, Ferrucci L, Singleton AB, Hernandez DG, Nalls MA, Homuth G, Nauck M, Radke D, Völker U, Perola M, Salomaa V, Brody J, Suchy-Dicey A, Gharib SA, Enquobahrie DA, Lumley T, Montgomery GW, Makino S, Prokisch H, Herder C, Roden M, Grallert H, Meitinger T, Strauch K, Li Y, Jansen RC, Visscher PM, Knight JC, Psaty BM, Ripatti S, Teumer A, Frayling TM, Metspalu A, van Meurs JBJ, Franke L. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nature Genetics. 2013;45:1238–1243. doi: 10.1038/ng.2756. [DOI] [PMC free article] [PubMed] [Google Scholar]
Westra HJ, Arends D, Esko T, Peters MJ, Schurmann C, Schramm K, Kettunen J, Yaghootkar H, Fairfax BP, Andiappan AK, Li Y, Fu J, Karjalainen J, Platteel M, Visschedijk M, Weersma RK, Kasela S, Milani L, Tserel L, Peterson P, Reinmaa E, Hofman A, Uitterlinden AG, Rivadeneira F, Homuth G, Petersmann A, Lorbeer R, Prokisch H, Meitinger T, Herder C, Roden M, Grallert H, Ripatti S, Perola M, Wood AR, Melzer D, Ferrucci L, Singleton AB, Hernandez DG, Knight JC, Melchiotti R, Lee B, Poidinger M, Zolezzi F, Larbi A, Wang deY, van den Berg LH, Veldink JH, Rotzschke O, Makino S, Salomaa V, Strauch K, Völker U, van Meurs JB, Metspalu A, Wijmenga C, Jansen RC, Franke L. Cell specific eQTL analysis without sorting cells. PLOS Genetics. 2015;11:e1005223. doi: 10.1371/journal.pgen.1005223. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wojcik GL, Graff M, Nishimura KK, Tao R, Haessler J, Gignoux CR, Highland HM, Patel YM, Sorokin EP, Avery CL, Belbin GM, Bien SA, Cheng I, Cullina S, Hodonsky CJ, Hu Y, Huckins LM, Jeff J, Justice AE, Kocarnik JM, Lim U, Lin BM, Lu Y, Nelson SC, Park SL, Poisner H, Preuss MH, Richard MA, Schurmann C, Setiawan VW, Sockell A, Vahi K, Verbanck M, Vishnu A, Walker RW, Young KL, Zubair N, Acuña-Alonso V, Ambite JL, Barnes KC, Boerwinkle E, Bottinger EP, Bustamante CD, Caberto C, Canizales-Quinteros S, Conomos MP, Deelman E, Do R, Doheny K, Fernández-Rhodes L, Fornage M, Hailu B, Heiss G, Henn BM, Hindorff LA, Jackson RD, Laurie CA, Laurie CC, Li Y, Lin DY, Moreno-Estrada A, Nadkarni G, Norman PJ, Pooler LC, Reiner AP, Romm J, Sabatti C, Sandoval K, Sheng X, Stahl EA, Stram DO, Thornton TA, Wassel CL, Wilkens LR, Winkler CA, Yoneyama S, Buyske S, Haiman CA, Kooperberg C, Le Marchand L, Loos RJF, Matise TC, North KE, Peters U, Kenny EE, Carlson CS. Genetic analyses of diverse populations improves discovery for complex traits. Nature. 2019;570:514–518. doi: 10.1038/s41586-019-1310-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wolf FA, Hamey FK, Plass M, Solana J, Dahlin JS, Göttgens B, Rajewsky N, Simon L, Theis FJ. PAGA: graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells. Genome Biology. 2019;20:59. doi: 10.1186/s13059-019-1663-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
Xu J. Genotype-free demultiplexing of pooled single-cell RNA-seq. bioRxiv. 2019 doi: 10.1101/570614. [DOI] [PMC free article] [PubMed]
Xue A, Wu Y, Zhu Z, Zhang F, Kemper KE, Zheng Z, Yengo L, Lloyd-Jones LR, Sidorenko J, Wu Y, McRae AF, Visscher PM, Zeng J, Yang J, eQTLGen Consortium Genome-wide association analyses identify 143 risk variants and putative regulatory mechanisms for type 2 diabetes. Nature Communications. 2018;9:2941. doi: 10.1038/s41467-018-04951-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ye CJ, Feng T, Kwon HK, Raj T, Wilson MT, Asinovski N, McCabe C, Lee MH, Frohlich I, Paik HI, Zaitlen N, Hacohen N, Stranger B, De Jager P, Mathis D, Regev A, Benoist C. Intersection of population variation and autoimmunity genetics in human T cell activation. Science. 2014;345:1254665. doi: 10.1126/science.1254665. [DOI] [PMC free article] [PubMed] [Google Scholar]
Yengo L, Sidorenko J, Kemper KE, Zheng Z, Wood AR, Weedon MN, Frayling TM, Hirschhorn J, Yang J, Visscher PM, GIANT Consortium Meta-analysis of genome-wide association studies for height and body mass index in ∼700000 individuals of european ancestry. Human Molecular Genetics. 2018;27:3641–3649. doi: 10.1093/hmg/ddy271. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zeng B, Gibson G. PolyQTL: Bayesian multiple eQTL detection with control for population structure and sample relatedness. Bioinformatics. 2019;35:1061–1063. doi: 10.1093/bioinformatics/bty728. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhernakova DV, Deelen P, Vermaat M, van Iterson M, van Galen M, Arindrarto W, van 't Hof P, Mei H, van Dijk F, Westra HJ, Bonder MJ, van Rooij J, Verkerk M, Jhamai PM, Moed M, Kielbasa SM, Bot J, Nooren I, Pool R, van Dongen J, Hottenga JJ, Stehouwer CD, van der Kallen CJ, Schalkwijk CG, Zhernakova A, Li Y, Tigchelaar EF, de Klein N, Beekman M, Deelen J, van Heemst D, van den Berg LH, Hofman A, Uitterlinden AG, van Greevenbroek MM, Veldink JH, Boomsma DI, van Duijn CM, Wijmenga C, Slagboom PE, Swertz MA, Isaacs A, van Meurs JB, Jansen R, Heijmans BT, 't Hoen PA, Franke L. Identification of context-dependent expression quantitative trait loci in whole blood. Nature Genetics. 2017;49:139–145. doi: 10.1038/ng.3737. [DOI] [PubMed] [Google Scholar]
Zhou X, Stephens M. Genome-wide efficient mixed-model analysis for association studies. Nature Genetics. 2012;44:821–824. doi: 10.1038/ng.2310. [DOI] [PMC free article] [PubMed] [Google Scholar]

eLife. doi: 10.7554/eLife.52155.sa1

Decision letter

Editor: Helena Pérez Valle¹

Reviewed by: Stephen B Montgomery², Maud Fagny³

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Thank you for submitting your article "The single-cell eQTLGen Consortium" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by two editors from the eLife Features Team (Helena Pérez Valle and Peter Rodgers). The following individuals involved in review of your submission have agreed to reveal their identity: Stephen B. Montgomery (Reviewer #1); Maud Fagny (Reviewer #2).

The reviewers have discussed the reviews with one another and the editors have drafted this decision to help you prepare a revised submission.

The manuscript introduces the single-cell eQTLGen Consortium, which aims to assess the effect of genetic variants on gene expression at the cell-type level, and describes the goals of the consortium and its plans for data analysis. The authors propose the development of standardized guidelines and pipelines to perform eQTL analyses, as well as a roadmap to perform analyses while preserving anonymity. They highlight the expected results of the Consortium, including a better understanding of the molecular bases of complex diseases and of the cell type involved, and potential clinical applications.

The manuscript is clearly written and interesting, and it outlines a sound an applicable protocol to analyse many cohorts while preserving data privacy. However, it would benefit from addressing a number of issues in greater detail - see below.

Essential revisions:

1. The Consortium does not provide detail of the study design considerations for data producers in any specifics. I.e. how data should be processed, how many cells/individual, how many reads. Minimum number of individuals. Inclusion/exclusion criteria. Further, what type of genotyping will be required for individuals. For example, the authors mention that cells from different individuals can be mixed together and "multiplexed" to reduce cost and avoid confounding, but it would be very useful for the authors to show the mapping power increase obtained from multiplexing. Analyses like this may help researchers decide on their preferred collection design which would allow a better harmonization of data generated from outside the consortium.

2. The analyses proposed are exciting but the specifics of how they will be run are vague. It would be helpful to catalogue existing tools and identify where new tools are needed, highlighting where the code/algorithms will eventually be found.

3. Authors mention gene regulatory networks, when they really plan to study gene co-expression network. "Regulatory" suggests a causal relationship between 2 nodes, while co-expression only relies on correlations. While similar changes in expression levels among cells might suggest a co-regulation, no inference can be made about a regulatory relationship between genes in absence of complementary information such as TF bindings. Some approaches are able to build regulatory networks from expression data, with the addition of prior information (see Sonawane et al., Network Medicine in the age of biomedical big data. 2019. Frontiers in Genetics. doi: 10.3389/fgene.2019.00294).

4. scRNA-seq data has lower power for eQTL mapping than bulk RNA-seq when matched for sample size. It would be informative for the readers and community to get a better sense of the number of eQTLs that we would expect to map based on individual sample size, number of cells captured by experiment, cell-type proportion in PBMC, etc...

5. Many data are mentioned (genomic data, scRNA-seq, scATAC-seq, sc-protein level...), but it is not always clear which ones will be generated, which ones may be generated, and which ones are already existing datasets. Maybe a figure would help?

6. There is limited mention of potential ASE-based or splicing analyses.

7. There is limited mention of how multi-omics from single cell data may improve GRN or other analyses. There are multiple studies that have now obtained different data modalities from the same cells.

8. I would expect some discussion of spatial transcriptomics and its potential.

9. How does the consortium and its work relate to/differ from the following project?

https://chanzuckerberg.com/science/programs-resources/humancellatlas/seednetworks/human-immune-variation-across-genetic-backgrounds-gender-and-time/

10. Please outline the deliverables proposed for the consortium (including a timeline for when they will be available).

11. Aspects of future data sharing and accessibility are essential to address.

12. Please explain how new individuals can become members of the consortium.

13. Please explain the consortium will be funded.

eLife. 2020 Mar 9;9:e52155. doi: 10.7554/eLife.52155.sa2

Author response

[We repeat the reviewers’ points here in bold text, and include our replies point by point, as well as a description of the changes made, in plain text].

Reply to reviewer comments:

We would like to thank the reviewers and editors for their highly constructive comments and suggestions, which have enabled us to improve our manuscript substantially. We believe that we have now adequately addressed each reviewer’s comments. In short, we have added additional information about data sharing and how to become part of this effort. Moreover, we have further improved the figures and resolved some confusions. Finally, we have added an overview of the deliverables and the associated timelines in Figure 5. This will provide the reader with insights into when to expect standardized pipelines and guidelines for designing single-cell population genetics studies, and when to expect our first results.

Essential revisions:

1. The Consortium does not provide detail of the study design considerations for data producers in any specifics. I.e. how data should be processed, how many cells/individual, how many reads. Minimum number of individuals. Inclusion/exclusion criteria.

We completely agree that any detail of study design considerations would greatly benefit groups that are interested in conducting single-cell population genetics studies and joining the consortium in the future. Unfortunately, we do not have all the answers yet.

In Box 1 we do already mention some more general guidelines related to sample multiplexing:

“First of all, the genetic information that is available for each of the individuals in such cohorts can be used to demultiplex pools of multiple individuals within the same sample.This approach allows to properly randomize experiments, while also significantly reducing cost and confounding effects^{23, 107}. This genetic information can either be efficiently generated using genotype arrays¹⁰⁸ in combination with imputation-based approaches¹⁰⁹, or extracted from the scRNA-seq data itself^{107, 110}. Within the consortium all reads will be aligned to the GRCh38/hg38 reference genome and genotypes will be imputed using the Haplotype Reference Consortium reference panel¹¹¹. The basic principle behind genetic multiplexing is that enough transcripts harboring SNPs are expressed and detected in each single cell such that cells can be accurately assigned to the donor of origin. Furthermore, as the number of multiplexed individuals increases, the probability that a droplet harbors multiple cells from different individuals increases, thus allowing the detection of multiplets using genetic information. This enables the overloading of cells into standard droplet-based workflows and overall reduction of cost per cell up to about 10-fold (https://satijalab.org/costpercell). As the cost of sequencing and the background multiplet rate reduce, the benefits of multiplexing increase. We anticipate that future workflows will allow for even higher throughput.”

… and the number of cells and individuals required for identifying eQTLs, and how to most cost-efficiently design such studies for these purposes:

“Finally, studying genetic variation at the single-cell level adds some extra requirements for the number of cells per individual and the number of individuals to be included in the study. The number of cells per individual will mainly define for which cell types in a heterogeneous sample such as PBMCs eQTL and co-eQTL analyses can be performed. In contrast, the number of individuals will mainly define the number of genetic variants for which effects on gene expression can be confidently assessed. A recent analysis showed that, with a fixed budget, the optimal power for detecting cell type-specific eQTLs is obtained when the number of reads is spread across many individuals¹¹⁸. Even though a lower sequencing depth per cell results in a lower accuracy of estimating cell type-specific gene expression levels, many more individuals and cells per individual can be included for the same budget. As a result, the optimal experimental design with a fixed budget provides up to three times more power than a design based on the recommended sequencing depth of 50,000 reads per cell (for 10X Genomics scRNA-seq). In contrast, for co-eQTL analysis there is a different trade-off between sequencing depth, number of individuals and number of reads per cell; while for eQTL analysis gene expression levels among cells of the same cell type can be averaged, for co-eQTL analysis you cannot as this would prohibit you from calculating a gene-gene correlation per individual. Therefore, for co-eQTLs the sequencing depth will be a major limiting factor that determines the number of genes for which you can confidently calculate gene-gene correlations. Altogether, depending on the goal of your study, the optimal balance between sequencing depth and number of individuals and cells per individual will be different.”

However, the consortium aims to have more detailed guidelines ready by the summer of 2020. Currently, the members of the consortium are working on defining the minimum requirements, setting guidelines and defining standardized protocols for single-cell population genetics studies in three separate working groups (1. QC and preprocessing, 2. Cell type classification, 3. eQTL meta-analysis). As soon as this information is ready, guidelines will be shared through our sc-eQTLGen website (https://eqtlgen.org/single-cell.html). Moreover, as a proof of concept, the consortium will summarize all these efforts in a first pilot study in which 3-5 sc-eQTL datasets will be meta-analyzed. This publication is expected to be ready by the end of 2020.

We now specify the expected timeline of our consortium in the “Future clinical implications” section:

“Nevertheless, we expect that most statistical power to pinpoint causal genes will be gained through the other goals of the sc-eQTLGen consortium: the reconstruction of cell type-specific gene regulatory networks (expected by the end of 2022), the mapping of cell type-specific trans-eQTLs and co-expression QTLs (expected by the end of 2021). These efforts will enable us to ascertain how the prioritized cis-eQTL genes (expected by the end of 2021) work together.”

In Box 1:

“By the end of 2020, the sc-eQTLGen consortium will provide standardized pipelines and guidelines for single-cell population genetics studies.”

And in Figure 5 and the corresponding figure legend:

“Figure 5. Deliverables of the single-cell eQTLGen consortium in relation to their future clinical implications. A) In the coming 3 years the sc-eQTLGen consortium aims to deliver 1. Standardized pipelines and guidelines for single-cell population genetics studies (2020); 2. Cell type classification models for PBMCs (2020); 3. Summary statistics of cis- and trans-eQTLs, co-expression QTLs, cell count QTLs and variance QTLs (2021); 4. Reconstruction of personalized, cell type-specific gene regulatory networks (2022). B) These efforts of the consortium will lead to the (1) identification of disease-associated cell types and (2) key disease-driving genes, which together will aid (3) the implementation of personalized medicine and the development of new therapeutics that take all this information into account (cell type- and genotype-specific treatments).”

Further, what type of genotyping will be required for individuals.

The minimum requirements for genotype information are that the genotypes should be assessed through genotype arrays (e.g. Illumina Global Screening Array or ThermoFisher Axiom chip), whole exome or whole genome sequencing. Irrespective of the platform used, we will impute genotypes using the Haplotype Reference Consortium (HRC) imputation reference panel. This will ensure consistency in genotype assignments, preventing potential ambiguity for C/G and A/T SNPs. We realize HRC is confined to SNVs, and plan to look into structural variants at a later stage. Depending on the proportion of cohorts with whole-genome sequencing data in a year from now, we will decide on appropriate strategies for calling structural variants.

We now mention this in Box 1:

“This genetic information can either be efficiently generated using genotype arrays¹⁰⁸ in combination with imputation-based approaches¹⁰⁹, or extracted from the scRNA-seq data itself^{107, 110}. Within the consortium all reads will be aligned to the GRCh38/hg38 reference genome and genotypes will be imputed using the Haplotype Reference Consortium reference panel¹¹¹.”

For example, the authors mention that cells from different individuals can be mixed together and "multiplexed" to reduce cost and avoid confounding, but it would be very useful for the authors to show the mapping power increase obtained from multiplexing. Analyses like this may help researchers decide on their preferred collection design which would allow a better harmonization of data generated from outside the consortium.

We apologize for the confusion, but multiplexing samples is not related to the mapping power. Multiplexing allows you to identify doublets from different individuals, thereby allowing a higher number of cells to be loaded (i.e. ‘overloading’) and making the approach more cost-efficient. Moreover, it allows experimental set-ups that combine multiple conditions/individuals together, thereby allowing us to distinguish between experimental and biological effects.

This has previously already been discussed in Box 1:

2. The analyses proposed are exciting but the specifics of how they will be run are vague. It would be helpful to catalogue existing tools and identify where new tools are needed, highlighting where the code/algorithms will eventually be found.

The exact details of how all analyses will be run are currently being defined in three separate working groups (1. QC and preprocessing, 2. Cell type classification, 3. eQTL meta-analysis). Each of the working groups are currently benchmarking current approaches for their applicability for large-scale single-cell population studies, and wherever necessary current tools will be adapted for our purpose our new tools will be developed. We expect to have more detailed guidelines ready by the summer of 2020. As soon as this information is ready, guidelines and code will be shared through our sc-eQTLGen website (https://eqtlgen.org/single-cell.html) and through Github. Moreover, as a proof of concept, the consortium will summarize all these efforts in a first pilot study in which 3-5 sc-eQTL datasets will be meta-analyzed. This publication is expected to be ready by the end of 2020.

3. Authors mention gene regulatory networks, when they really plan to study gene co-expression network. "Regulatory" suggests a causal relationship between 2 nodes, while co-expression only relies on correlations. While similar changes in expression levels among cells might suggest a co-regulation, no inference can be made about a regulatory relationship between genes in absence of complementary information such as TF bindings. Some approaches are able to build regulatory networks from expression data, with the addition of prior information (see Sonawane et al., Network Medicine in the age of biomedical big data. 2019. Frontiers in Genetics. doi: 10.3389/fgene.2019.00294).

We apologize for the confusion, but the consortium does want to go beyond reconstruction of gene co-expression networks. In a previously published perspective paper [Van der Wijst et al., 2018 – Genome Medicine], we have explained in detail some of the strategies we would like to take to go from co-expression to gene regulatory networks.

In the “sc-eQTLGen: a federated single-cell eQTL meta-analysis consortium” section, we have now more specifically mentioned that the strategies discussed above, including the use of prior information (e.g. ATAC-seq data, TF binding information), dynamic information (e.g. information extracted from time series data, pseudotime in combination with RNA velocity) and experimental validation (e.g. CRISPR coupled to scRNA-seq), will be applied by the consortium to go from co-expression to gene regulatory networks:

“Finally, the scRNA-seq data will be used to reconstruct GRNs. Two strategies will be explored in the context of sc-eQTLGen. The first approach makes use of the large number of bulk RNA-seq datasets for specific cell types that are available in public RNA-seq repositories^{88, 89}. Using this publicly available bulk RNA-seq data, reference co-expression networks will be constructed using cell type-specific data. Subsequently, scRNA-seq data will be used to implement directionality and specify the connections in the network that are affected by specific contexts⁴⁴. The second approach will directly use scRNA-seq data to build cell type-specific GRNs, thereby enabling to immediately take the context-specificity into account. However, the number of genes that can confidently be taken into account by this second approach may be lower due to the sparsity of scRNA-seq data. For both strategies, we will make use of prior information (e.g. ATAC-seq data⁵⁸, TF binding information), dynamic information (e.g. information extracted from time series data⁷¹, pseudotime⁷² in combination with RNA velocity^{45, 46}) and experimental validation (e.g. perturbation experiments^{33, 50}) to go from a co-expression to a gene regulatory network.”

4. scRNA-seq data has lower power for eQTL mapping than bulk RNA-seq when matched for sample size. It would be informative for the readers and community to get a better sense of the number of eQTLs that we would expect to map based on individual sample size, number of cells captured by experiment, cell-type proportion in PBMC, etc.

Indeed, with a similar sample size, single-cell data may not necessarily allow you to detect more eQTLs, but it does provide insights that cannot be easily extracted from bulk data (e.g. unbiased detection of cell type-/context-dependent eQTLs, more power to detect co-expression QTLs [Van der Wijst et al., 2018 – Nature Genetics] and being able to extract dynamic information (e.g. pseudotime in combination with RNA velocity) that allows reconstruction of gene regulatory networks [Van der Wijst et al., 2018 – Genome Medicine]). It is these aspects that make single-cell data unique and a valuable approach to pursue to gain additional disease understanding in the future.

Previously, we have performed a small comparison between single-cell and bulk RNA-seq data. For this purpose, we compared the number of cis-eQTLs identified in PBMC scRNA-seq data of 45 individuals [Van der Wijst et al., 2018 – Nature Genetics] and whole blood bulk RNA-seq data of 45 individuals (using a random subset of the 2,116 samples that have been reported by Zhernakova et al., Nature Genetics 2017). We subsequently determined what P-Value threshold corresponded to a gene-level FDR of 0.05 (the null distribution was determined using permutations, using the same procedure and software as employed by Zhernakova et al, Nature Genetics 2017). Given this P-Value threshold (P = 8.24x10^-4) and a sample-size of 45, the corresponding minimal absolute correlation was 0.48 for a cis-eQTL to be significant. We then determined what the number of unique genes are with a cis-eQTL effect correlation of at least 0.48, as determined in entire bulk RNA-seq cohort of 2,116 samples, to bring down sampling variation. This yielded a set of 2,611 unique genes with a significant cis-eQTL effect, indicating that when using the same number of samples, bulk RNA-seq eQTL analysis will yield 6.9x more genes (2,611) with a significant cis-eQTL effect, than cis-eQTL analysis using scRNA-seq data (379 unique genes with a significant cis-eQTL effect).

We now discuss the above in Box 1:

“Even though a single-cell eQTL dataset has less discovery power than an equal-sized bulk RNA-seq eQTL dataset (6.9 fold difference based on the lowest correlation that led to the identification of a significant eQTL from single-cell¹⁶ vs bulk RNA-seq data¹⁵), it does provide insights that cannot easily be extracted from bulk data. For example, single-cell data allows for the unbiased detection of cell type- and context-dependent eQTLs and has more power to detect co-expression QTLs¹⁶. This makes population-based single-cell datasets a valuable addition to bulk-based datasets for studying the effects of genetic variation on gene expression and its regulation^{16, 23}.”

However, please note that the analysis above is based on a relatively small number of individuals. Therefore, the exact result will be less accurate. In 2020, we will assess the value of single-cell versus bulk RNA-seq data in much more detail in our first proof of concept paper. In this paper we will perform eQTL, cell count QTL, and variance QTL analyses in 3-5 scRNA-seq datasets (~3-4M cells, ~1,500 individuals) and compare the performance to similar size bulk-RNA seq data.

We now inform the reader about these upcoming analyses while presenting the timeline of our consortium (“Future clinical implications” section and Figure 5):

5. Many data are mentioned (genomic data, scRNA-seq, scATAC-seq, sc-protein level...), but it is not always clear which ones will be generated, which ones may be generated, and which ones are already existing datasets. Maybe a figure would help?

To be included, every dataset within the consortium will consist of at least scRNA-seq and genotype data. Additional data layers, including CITE-seq and scATAC-seq data, are or will become available for subsets of some of the cohorts. Please also note that the consortium will not necessarily limit itself to the data that is being generated within the consortium. These additional data layers will be useful sources to, for example, further improve our cell type classification or aid in the reconstruction of regulatory networks. However, as large amounts of data are currently still being generated, exact numbers are constantly being updated, and so we cannot be any more specific.

We now further discuss this in the subsection “Gene regulatory network reconstruction”:

“We will extract this prior information from previously published datasets and large-scale efforts, such as ENCODE² and BLUEPRINT⁴. Additionally, we will make use of single-cell information beyond gene expression levels that is or will be collected within subsets of cohorts within the consortium, including information on chromatin accessibility^{64, 65} and expression of enhancers RNAs^{66, 67}.”

6. There is limited mention of potential ASE-based or splicing analyses.

ASE-based or splicing analyses are currently not the focus of this consortium. We have several reasons for not doing so at the moment. First, all currently contributing groups have generated either 3’- or 5’-end scRNA-seq data. This type of scRNA-seq data is not very suitable for doing ASE-based or splicing analyses. On the contrary, full-length approaches have been used for this purpose, but are not yet sufficiently cost-efficient so that they can be applied on a very large scale (many cells and many individuals). Nevertheless, technological developments (e.g. Smart-seq3 [M Hagemann-Jensen - ‎2019 bioRxiv]) and ever decreasing sequencing cost may open this avenue to be pursued in the future.

In Figure 5 we now clearly present the deliverables of our consortium, and with that make clear that ASE-based or splicing analyses are not the focus of this consortium (see Figure 5 legend):

7. There is limited mention of how multi-omics from single cell data may improve GRN or other analyses. There are multiple studies that have now obtained different data modalities from the same cells.

Whilst single-cell multi-omics data will be useful for the consortium in the future, we believe that in its current state it is too time- and cost-consuming to generate sufficient data for personalized GRN reconstruction. However, to acknowledge its potential, we have added the following to the “Gene regulatory network reconstruction” subsection:

‘Additionally, recent advances have made it possible to measure multi-omics data from the very same cell^{70, 92}. However, current approaches are very time- and cost-consuming, and therefore limited to only a few hundred cells. As such, currently, this type of single-cell multi-omics data is of limited use for reconstructing personalized GRNs. Nevertheless, as single-cell multi-omics approaches mature, this combined information of gene expression and additional data layers has the potential to improve GRN inference beyond correlating separate omics layers and allows for direct measurements instead.’

8. I would expect some discussion of spatial transcriptomics and its potential.

Initially, the sc-eQTLGen consortium will focus on PBMCs. Specifically for PBMCs spatial transcriptomics data may not be so applicable, as cells are freely floating around in the blood, so spatial organization is more dynamic and cannot be easily captured with current techniques. However, we do acknowledge that for solid tissues and organs this is a highly promising approach that can help us uncover aspects that remain hidden with standard scRNA-seq. Spatial information can, for example, help to disentangle trans-eQTL interactions that are modulated through cell-cell communication (e.g. a SNP affects ligand expression in one cell type, and thereby affects downstream receptor signaling in a second cell type [independent of genetics but ligand-downstream receptor signaling: Arneson et al., 2018 - Nature communications; independent of genetics but ligand expression affecting the cell frequency of receptor cells: Smillie et al., 2019 - Cell]). Additionally, this spatial information can be used as prior information to construct more accurate network models that take into account the physical distance between cells and thereby there likelihood of interaction.

We now discuss these aspects in the “Single-cell eQTL analysis: the new era of population genetics” section:

“Another promising avenue that has become available in recent years to gain increased insights in the link between genetics and disease, is through the use of spatial transcriptomics technologies, including MERFISH, seqFISH+, Slide-seq and 10x Visium^{34, 35}. While for PBMCs this approach may not be applicable, in solid tissues and organs this extra layer of spatial information is extremely valuable. For example, it can help to disentangle trans-eQTL interactions that are modulated through cell-cell communication (e.g. a SNP affects ligand expression in one cell type, and thereby affects downstream receptor signaling in a second cell type). Despite not having this spatial information available in PBMCs, other approaches that consider receptor-ligand expression pairs do provide insights in potential cell-cell interactions. These approaches have been successfully applied before to uncover how the ligand expression in one cell type can affect the frequency³⁶ or the downstream signaling³⁷ of another cell type expressing the corresponding receptor.”

And in the “Single-cell GRN reconstruction: taking eQTLs one step further” section:”

“Such priors on regulatory interactions can be derived from, for example, ChIP-seq data⁵⁷, ATAC-seq data⁵⁸, spatial information^{34, 35} or from perturbation experiments^{33, 50}.”

9. How does the consortium and its work relate to/differ from the following project?

The sc-eQTLGen consortium does currently not yet have its own funding. Therefore, individual groups generate their own funding through other sources such as the Chan Zuckerberg Initiative (CZI). Jimmie Ye is one of the PIs that is involved in both this CZI project and the sc-eQTLGen consortium. This CZI project covers just parts of the goals of the sc-eQTLGen consortium (identification of context- and cell type-specific eQTLs, but not reconstruction of personalized, context-specific gene regulatory networks) and is limited to a small number of groups. To truly capture all immune phenotypic variation (genetic and environmental), and being able to study trans-effects in a genome-wide fashion, a larger collaborative effort (such as sc-eQTLGen) is required. So both projects are related and will benefit from each other, but they will have their own focus points.

10. Please outline the deliverables proposed for the consortium (including a timeline for when they will be available).

The main deliverables for the sc-eQTLGen consortium (focused on single-cell PBMC data initially):

1. Standardized pipelines and guidelines for single-cell population genetics studies (Q2-Q3 2020).

2. Automatic cell type classification model for PBMCs (Q2-Q3 2020).

3. Proof of concept eQTL paper (3-5 cohorts – 3-4M cells - ~1,500 individuals): Summary statistics of cell type- and context-specific cis-eQTLs, cell count QTLs, variance QTLs (Q4 2020).

4. Main single-cell eQTL paper (across all cohorts): Summary statistics of genome-wide cis- and trans-eQTL, co-expression QTL, cell count QTL and variance QTL analyses (Q4 2021)

5. Proof of concept gene regulatory network paper (1 cell type, multiple environmental conditions): Reconstruction of personalized, cell type-specific gene regulatory networks (Q1 2021).

6. Main single-cell gene regulatory network paper (across all cohorts and in all cell types): Reconstruction of personalized, cell type-specific gene regulatory networks, including experimental validation (e.g. using CRISPR perturbation scRNA-seq coupled screens) (Q4 2022).

We now specify the expected timeline of our consortium in the “Future clinical implications” section and in Figure 5:

And in the figure legend of Figure 5:

11. Aspects of future data sharing and accessibility are essential to address.

Similarly as we have previously done within the eQTLGen consortium (http://www.eqtlgen.org/), summary statistics (including non-significant findings) will be made available through the sc-eQTLGen website (https://eqtlgen.org/single-cell.html).

This information is now mentioned in the “Future clinical implications” section:

“To increase the impact of this work, all code, guidelines and summary statistics will become freely available to the community through the sc-eQTLGen website (https://eqtlgen.org/single-cell.html). For any additional information, please visit the contact page (https://eqtlgen.org/contact.html).”

12. Please explain how new individuals can become members of the consortium.

Individuals or groups who would like to have additional information or are interested in joining the consortium, can visit the contact page of the sc-eQTLGen website (https://eqtlgen.org/contact.html).

We now mention this in the “Future clinical implications” section:

‘For any additional information, please visit the contact page (https://eqtlgen.org/contact.html).’

13. Please explain the consortium will be funded.

Currently, each group is separately funded through their own funding. Based on data generated in the pilot study that will be conducted in 2020, the consortium aims to apply for consortium-wide funding through initiatives like the Chan Zuckerberg Initiative, national (e.g. NIH) or transnational (e.g. H2020) funding.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

Not applicable.

[bib1] Abdelaal T, Michielsen L, Cats D, Hoogduin D, Mei H, Reinders MJT, Mahfouz A. A comparison of automatic cell identification methods for single-cell RNA sequencing data. Genome Biology. 2019;20:194. doi: 10.1186/s13059-019-1795-z. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib2] Aguet F. The GTEx consortium atlas of genetic regulatory effects across human tissues. bioRxiv. 2019 doi: 10.1101/787903. [DOI] [PMC free article] [PubMed]

[bib3] Aibar S, González-Blas CB, Moerman T, Huynh-Thu VA, Imrichova H, Hulselmans G, Rambow F, Marine JC, Geurts P, Aerts J, van den Oord J, Atak ZK, Wouters J, Aerts S. SCENIC: single-cell regulatory network inference and clustering. Nature Methods. 2017;14:1083–1086. doi: 10.1038/nmeth.4463. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib4] Alquicira-Hernández J, Sathe A, Ji HP, Nguyen Q, Powell JE. scPred: cell type prediction at single-cell resolution. bioRxiv. 2018 doi: 10.1101/369538. [DOI]

[bib5] Andrews TS, Hemberg M. False signals induced by single-cell imputation. F1000Research. 2018;7:1740. doi: 10.12688/f1000research.16613.1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib6] Angelini C, Costa V. Understanding gene regulatory mechanisms by integrating ChIP-seq and RNA-seq data: statistical solutions to biological problems. Frontiers in Cell and Developmental Biology. 2014;2:51. doi: 10.3389/fcell.2014.00051. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib7] Angermueller C, Lee HJ, Reik W, Stegle O. DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning. Genome Biology. 2017;18:67. doi: 10.1186/s13059-017-1189-z. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib8] Arneson D, Zhang G, Ying Z, Zhuang Y, Byun HR, Ahn IS, Gomez-Pinilla F, Yang X. Single cell molecular alterations reveal target cells and pathways of concussive brain injury. Nature Communications. 2018;9:3894. doi: 10.1038/s41467-018-06222-0. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib9] Astle WJ, Elding H, Jiang T, Allen D, Ruklisa D, Mann AL, Mead D, Bouman H, Riveros-Mckay F, Kostadima MA, Lambourne JJ, Sivapalaratnam S, Downes K, Kundu K, Bomba L, Berentsen K, Bradley JR, Daugherty LC, Delaneau O, Freson K, Garner SF, Grassi L, Guerrero J, Haimel M, Janssen-Megens EM, Kaan A, Kamat M, Kim B, Mandoli A, Marchini J, Martens JHA, Meacham S, Megy K, O'Connell J, Petersen R, Sharifi N, Sheard SM, Staley JR, Tuna S, van der Ent M, Walter K, Wang SY, Wheeler E, Wilder SP, Iotchkova V, Moore C, Sambrook J, Stunnenberg HG, Di Angelantonio E, Kaptoge S, Kuijpers TW, Carrillo-de-Santa-Pau E, Juan D, Rico D, Valencia A, Chen L, Ge B, Vasquez L, Kwan T, Garrido-Martín D, Watt S, Yang Y, Guigo R, Beck S, Paul DS, Pastinen T, Bujold D, Bourque G, Frontini M, Danesh J, Roberts DJ, Ouwehand WH, Butterworth AS, Soranzo N. The allelic landscape of human blood cell trait variation and links to common complex disease. Cell. 2016;167:1415–1429. doi: 10.1016/j.cell.2016.10.042. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib10] Azizi E, Carr AJ, Plitas G, Cornish AE, Konopacki C, Prabhakaran S, Nainys J, Wu K, Kiseliovas V, Setty M, Choi K, Fromme RM, Dao P, McKenney PT, Wasti RC, Kadaveru K, Mazutis L, Rudensky AY, Pe'er D. Single-Cell map of diverse immune phenotypes in the breast tumor microenvironment. Cell. 2018;174:1293–1308. doi: 10.1016/j.cell.2018.05.060. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib11] Bacher R, Chu LF, Leng N, Gasch AP, Thomson JA, Stewart RM, Newton M, Kendziorski C. SCnorm: robust normalization of single-cell RNA-seq data. Nature Methods. 2017;14:584–586. doi: 10.1038/nmeth.4263. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib12] Baran Y. MetaCell: analysis of single cell RNA-seq data using k-NN graph partitions. bioRxiv. 2018 doi: 10.1101/437665. [DOI] [PMC free article] [PubMed]

[bib13] Battle A, Mostafavi S, Zhu X, Potash JB, Weissman MM, McCormick C, Haudenschild CD, Beckman KB, Shi J, Mei R, Urban AE, Montgomery SB, Levinson DF, Koller D. Characterizing the genetic basis of transcriptome diversity through RNA-sequencing of 922 individuals. Genome Research. 2014;24:14–24. doi: 10.1101/gr.155192.113. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib14] Brennecke P, Anders S, Kim JK, Kołodziejczyk AA, Zhang X, Proserpio V, Baying B, Benes V, Teichmann SA, Marioni JC, Heisler MG. Accounting for technical noise in single-cell RNA-seq experiments. Nature Methods. 2013;10:1093–1095. doi: 10.1038/nmeth.2645. [DOI] [PubMed] [Google Scholar]

[bib15] Brown CD, Mangravite LM, Engelhardt BE. Integrative modeling of eQTLs and cis-regulatory elements suggests mechanisms underlying cell type specificity of eQTLs. PLOS Genetics. 2013;9:e1003649. doi: 10.1371/journal.pgen.1003649. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib16] Budden DM, Hurley DG, Cursons J, Markham JF, Davis MJ, Crampin EJ. Predicting expression: the complementary power of histone modification and transcription factor binding data. Epigenetics & Chromatin. 2014;7:36. doi: 10.1186/1756-8935-7-36. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib17] Buenrostro JD, Wu B, Litzenburger UM, Ruff D, Gonzales ML, Snyder MP, Chang HY, Greenleaf WJ. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature. 2015;523:486–490. doi: 10.1038/nature14590. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib18] Burgess DJ. Spatial transcriptomics coming of age. Nature Reviews Genetics. 2019;20:317. doi: 10.1038/s41576-019-0129-z. [DOI] [PubMed] [Google Scholar]

[bib19] Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, Motyer A, Vukcevic D, Delaneau O, O'Connell J, Cortes A, Welsh S, Young A, Effingham M, McVean G, Leslie S, Allen N, Donnelly P, Marchini J. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–209. doi: 10.1038/s41586-018-0579-z. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib20] Cao J, Cusanovich DA, Ramani V, Aghamirzaie D, Pliner HA, Hill AJ, Daza RM, McFaline-Figueroa JL, Packer JS, Christiansen L, Steemers FJ, Adey AC, Trapnell C, Shendure J. Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science. 2018;361:1380–1385. doi: 10.1126/science.aau0730. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib21] Carlson CS, Matise TC, North KE, Haiman CA, Fesinmeyer MD, Buyske S, Schumacher FR, Peters U, Franceschini N, Ritchie MD, Duggan DJ, Spencer KL, Dumitrescu L, Eaton CB, Thomas F, Young A, Carty C, Heiss G, Le Marchand L, Crawford DC, Hindorff LA, Kooperberg CL, PAGE Consortium Generalization and dilution of association results from european GWAS in populations of non-European ancestry: the PAGE study. PLOS Biology. 2013;11:e1001661. doi: 10.1371/journal.pbio.1001661. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib22] Chen L, Ge B, Casale FP, Vasquez L, Kwan T, Garrido-Martín D, Watt S, Yan Y, Kundu K, Ecker S, Datta A, Richardson D, Burden F, Mead D, Mann AL, Fernandez JM, Rowlston S, Wilder SP, Farrow S, Shao X, Lambourne JJ, Redensek A, Albers CA, Amstislavskiy V, Ashford S, Berentsen K, Bomba L, Bourque G, Bujold D, Busche S, Caron M, Chen SH, Cheung W, Delaneau O, Dermitzakis ET, Elding H, Colgiu I, Bagger FO, Flicek P, Habibi E, Iotchkova V, Janssen-Megens E, Kim B, Lehrach H, Lowy E, Mandoli A, Matarese F, Maurano MT, Morris JA, Pancaldi V, Pourfarzad F, Rehnstrom K, Rendon A, Risch T, Sharifi N, Simon MM, Sultan M, Valencia A, Walter K, Wang SY, Frontini M, Antonarakis SE, Clarke L, Yaspo ML, Beck S, Guigo R, Rico D, Martens JHA, Ouwehand WH, Kuijpers TW, Paul DS, Stunnenberg HG, Stegle O, Downes K, Pastinen T, Soranzo N. Genetic drivers of epigenetic and transcriptional variation in human immune cells. Cell. 2016;167:1398–1414. doi: 10.1016/j.cell.2016.10.026. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib23] Chen S, Mar JC. Evaluating methods of inferring gene regulatory networks highlights their lack of performance for single cell gene expression data. BMC Bioinformatics. 2018;19:232. doi: 10.1186/s12859-018-2217-z. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib24] Clark SJ, Argelaguet R, Kapourani CA, Stubbs TM, Lee HJ, Alda-Catalinas C, Krueger F, Sanguinetti G, Kelsey G, Marioni JC, Stegle O, Reik W. scNMT-seq enables joint profiling of chromatin accessibility DNA methylation and transcription in single cells. Nature Communications. 2018;9:781. doi: 10.1038/s41467-018-03149-4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib25] Cuomo ASE, Seaton DD, McCarthy DJ, Martinez I, Bonder MJ, Garcia-Bernardo J, Amatya S, Madrigal P, Isaacson A, Buettner F, Knights A, Natarajan KN, Vallier L, Marioni JC, Chhatriwala M, Stegle O, HipSci Consortium Single-cell RNA-sequencing of differentiating iPS cells reveals dynamic genetic effects on gene expression. Nature Communications. 2020;11:810. doi: 10.1038/s41467-020-14457-z. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib26] Donovan MKR, D’Antonio-Chronowska A, D’Antonio M, Frazer KA. Cellular deconvolution of GTEx tissues powers eQTL studies to discover thousands of novel disease and cell-type associated regulatory variants. bioRxiv. 2019 doi: 10.1101/671040. [DOI]

[bib27] Eling N, Richard AC, Richardson S, Marioni JC, Vallejos CA. Correcting the Mean-Variance dependency for differential variability testing using Single-Cell RNA sequencing data. Cell Systems. 2018;7:284–294. doi: 10.1016/j.cels.2018.06.011. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib28] Fagny M, Paulson JN, Kuijjer ML, Sonawane AR, Chen CY, Lopes-Ramos CM, Glass K, Quackenbush J, Platig J. Exploring regulation in tissues with eQTL networks. PNAS. 2017;114:E7841–E7850. doi: 10.1073/pnas.1707375114. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib29] Fairfax BP, Makino S, Radhakrishnan J, Plant K, Leslie S, Dilthey A, Ellis P, Langford C, Vannberg FO, Knight JC. Genetics of gene expression in primary immune cells identifies cell type–specific master regulators and roles of HLA alleles. Nature Genetics. 2012;44:502–510. doi: 10.1038/ng.2205. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib30] Favé MJ, Lamaze FC, Soave D, Hodgkinson A, Gauvin H, Bruat V, Grenier JC, Gbeha E, Skead K, Smargiassi A, Johnson M, Idaghdour Y, Awadalla P. Gene-by-environment interactions in urban populations modulate risk phenotypes. Nature Communications. 2018;9:827. doi: 10.1038/s41467-018-03202-2. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib31] Fiers M, Minnoye L, Aibar S, Bravo González-Blas C, Kalender Atak Z, Aerts S. Mapping gene regulatory networks from single-cell omics data. Briefings in Functional Genomics. 2018;17:246–254. doi: 10.1093/bfgp/elx046. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib32] Fu J, Wolfs MG, Deelen P, Westra HJ, Fehrmann RS, Te Meerman GJ, Buurman WA, Rensen SS, Groen HJ, Weersma RK, van den Berg LH, Veldink J, Ophoff RA, Snieder H, van Heel D, Jansen RC, Hofker MH, Wijmenga C, Franke L. Unraveling the regulatory mechanisms underlying tissue-dependent genetic variation of gene expression. PLOS Genetics. 2012;8:e1002431. doi: 10.1371/journal.pgen.1002431. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib33] Gasperini M, Hill AJ, McFaline-Figueroa JL, Martin B, Kim S, Zhang MD, Jackson D, Leith A, Schreiber J, Noble WS, Trapnell C, Ahituv N, Shendure J. A genome-wide framework for mapping gene regulation via cellular genetic screens. Cell. 2019;176:377–390. doi: 10.1016/j.cell.2018.11.029. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib34] Gate RE. Mapping gene regulatory networks of primary CD4+ T cells using single-cell genomics and genome engineering. bioRxiv. 2019 doi: 10.1101/678060. [DOI]

[bib35] Ghanbari M, Lasserre J, Vingron M. Reconstruction of gene networks using prior knowledge. BMC Systems Biology. 2015;9:84. doi: 10.1186/s12918-015-0233-4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib36] Giambartolomei C, Vukcevic D, Schadt EE, Franke L, Hingorani AD, Wallace C, Plagnol V. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLOS Genetics. 2014;10:e1004383. doi: 10.1371/journal.pgen.1004383. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib37] Gravel S, Henn BM, Gutenkunst RN, Indap AR, Marth GT, Clark AG, Yu F, Gibbs RA, Bustamante CD, 1000 Genomes Project Demographic history and rare allele sharing among human populations. PNAS. 2011;108:11983–11988. doi: 10.1073/pnas.1019276108. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib38] GTEx Consortium The Genotype-Tissue expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science. 2015;348:648–660. doi: 10.1126/science.1262110. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib39] GTEx Consortium Genetic effects on gene expression across human tissues. Nature. 2017;550:204–213. doi: 10.1038/nature24277. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib40] Gusev A, Ko A, Shi H, Bhatia G, Chung W, Penninx BW, Jansen R, de Geus EJ, Boomsma DI, Wright FA, Sullivan PF, Nikkola E, Alvarez M, Civelek M, Lusis AJ, Lehtimäki T, Raitoharju E, Kähönen M, Seppälä I, Raitakari OT, Kuusisto J, Laakso M, Price AL, Pajukanta P, Pasaniuc B. Integrative approaches for large-scale transcriptome-wide association studies. Nature Genetics. 2016;48:245–252. doi: 10.1038/ng.3506. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib41] Hafemeister C, Satija R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. bioRxiv. 2019 doi: 10.1101/576827. [DOI] [PMC free article] [PubMed]

[bib42] Hayashi T, Ozaki H, Sasagawa Y, Umeda M, Danno H, Nikaido I. Single-cell full-length total RNA sequencing uncovers dynamics of recursive splicing and enhancer RNAs. Nature Communications. 2018;9:619. doi: 10.1038/s41467-018-02866-0. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib43] Heap GA, Trynka G, Jansen RC, Bruinenberg M, Swertz MA, Dinesen LC, Hunt KA, Wijmenga C, Vanheel DA, Franke L. Complex nature of SNP genotype effects on gene expression in primary human leucocytes. BMC Medical Genomics. 2009;2:1. doi: 10.1186/1755-8794-2-1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib44] Heaton H. Souporcell: robust clustering of single cell RNAseq by genotype and ambient RNA inference without reference genotypes. bioRxiv. 2019 doi: 10.1101/699637. [DOI] [PMC free article] [PubMed]

[bib45] Hormozdiari F, Gazal S, van de Geijn B, Finucane HK, Ju CJ, Loh PR, Schoech A, Reshef Y, Liu X, O'Connor L, Gusev A, Eskin E, Price AL. Leveraging molecular quantitative trait loci to understand the genetic architecture of diseases and complex traits. Nature Genetics. 2018;50:1041–1047. doi: 10.1038/s41588-018-0148-2. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib46] Hsiao CL, Lian I, Hsieh AR, Fann CS. Modeling expression quantitative trait loci in data combining ethnic populations. BMC Bioinformatics. 2010;11:111. doi: 10.1186/1471-2105-11-111. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib47] Hu Y, Zhang X. SCeQTL: an R package for identifying eQTL from single-cell parallel sequencing data. bioRxiv. 2018 doi: 10.1101/499863. [DOI] [PMC free article] [PubMed]

[bib48] Iacono G, Massoni-Badosa R, Heyn H. Single-cell transcriptomics unveils gene regulatory network plasticity. Genome Biology. 2019;20:110. doi: 10.1186/s13059-019-1713-4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib49] Ishigaki K, Kochi Y, Suzuki A, Tsuchida Y, Tsuchiya H, Sumitomo S, Yamaguchi K, Nagafuchi Y, Nakachi S, Kato R, Sakurai K, Shoda H, Ikari K, Taniguchi A, Yamanaka H, Miya F, Tsunoda T, Okada Y, Momozawa Y, Kamatani Y, Yamada R, Kubo M, Fujio K, Yamamoto K. Polygenic burdens on cell-specific pathways underlie the risk of rheumatoid arthritis. Nature Genetics. 2017;49:1120–1125. doi: 10.1038/ng.3885. [DOI] [PubMed] [Google Scholar]

[bib50] Jackson CA, Castro DM, Saldi G, Bonneau R, Gresham D. Gene regulatory network reconstruction using single-cell RNA sequencing of barcoded genotypes in diverse environments. bioRxiv. 2019 doi: 10.7554/eLife.51254. [DOI] [PMC free article] [PubMed]

[bib51] Johnson MD, Plantinga TS, van de Vosse E, Velez Edwards DR, Smith PB, Alexander BD, Yang JC, Kremer D, Laird GM, Oosting M, Joosten LAB, van der Meer JWM, van Dissel JT, Walsh TJ, Perfect JR, Kullberg B-J, Scott WK, Netea MG. Cytokine gene polymorphisms and the outcome of invasive candidiasis: a prospective cohort study. Clinical Infectious Diseases. 2012;54:502–510. doi: 10.1093/cid/cir827. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib52] Kang HM, Subramaniam M, Targ S, Nguyen M, Maliskova L, McCarthy E, Wan E, Wong S, Byrnes L, Lanata CM, Gate RE, Mostafavi S, Marson A, Zaitlen N, Criswell LA, Ye CJ. Multiplexed droplet single-cell RNA-sequencing using natural genetic variation. Nature Biotechnology. 2018;36:89–94. doi: 10.1038/nbt.4042. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib53] Kemp A. Anifrolumab phase III trial meets primary endpoint in systemic lupus erythematosus. [August 29, 2019];2019 https://www.astrazeneca.com/media-centre/press-releases/2019/anifrolumab-phase-iii-trial-meets-primary-endpoint-in-systemic-lupus-erythematosus-29082019.html

[bib54] Knowles DA, Davis JR, Edgington H, Raj A, Favé MJ, Zhu X, Potash JB, Weissman MM, Shi J, Levinson DF, Awadalla P, Mostafavi S, Montgomery SB, Battle A. Allele-specific expression reveals interactions between genetic variation and environment. Nature Methods. 2017;14:699–702. doi: 10.1038/nmeth.4298. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib55] Köhler ND, Büttner M, Theis FJ. Deep learning does not outperform classical machine learning for cell-type annotation. bioRxiv. 2019 doi: 10.1101/653907. [DOI]

[bib56] Kouno T, Moody J, Kwon AT, Shibayama Y, Kato S, Huang Y, Böttcher M, Motakis E, Mendez M, Severin J, Luginbühl J, Abugessaisa I, Hasegawa A, Takizawa S, Arakawa T, Furuno M, Ramalingam N, West J, Suzuki H, Kasukawa T, Lassmann T, Hon CC, Arner E, Carninci P, Plessy C, Shin JW. C1 CAGE detects transcription start sites and enhancer activity at single-cell resolution. Nature Communications. 2019;10:360. doi: 10.1038/s41467-018-08126-5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib57] La Manno G, Soldatov R, Zeisel A, Braun E, Hochgerner H, Petukhov V, Lidschreiber K, Kastriti ME, Lönnerberg P, Furlan A, Fan J, Borm LE, Liu Z, van Bruggen D, Guo J, He X, Barker R, Sundström E, Castelo-Branco G, Cramer P, Adameyko I, Linnarsson S, Kharchenko PV. RNA velocity of single cells. Nature. 2018;560:494–498. doi: 10.1038/s41586-018-0414-6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib58] Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:559. doi: 10.1186/1471-2105-9-559. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib59] Lareau CA, Duarte FM, Chew JG, Kartha VK, Burkett ZD, Kohlway AS, Pokholok D, Aryee MJ, Steemers FJ, Lebofsky R, Buenrostro JD. Droplet-based combinatorial indexing for massive-scale single-cell chromatin accessibility. Nature Biotechnology. 2019;37:916–924. doi: 10.1038/s41587-019-0147-6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib60] Leinonen R, Sugawara H, Shumway M, International Nucleotide Sequence Database Collaboration The sequence read archive. Nucleic Acids Research. 2011;39:19–21. doi: 10.1093/nar/gkx1097. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib61] Lin D, Zhang J, Li J, Xu C, Deng HW, Wang YP. An integrative imputation method based on multi-omics datasets. BMC Bioinformatics. 2016;17:247. doi: 10.1186/s12859-016-1122-6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib62] Liu B, Gloudemans MJ, Rao AS, Ingelsson E, Montgomery SB. Abundant associations with gene expression complicate GWAS follow-up. Nature Genetics. 2019;51:768–769. doi: 10.1038/s41588-019-0404-0. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib63] Lloyd-Jones LR, Holloway A, McRae A, Yang J, Small K, Zhao J, Zeng B, Bakshi A, Metspalu A, Dermitzakis M, Gibson G, Spector T, Montgomery G, Esko T, Visscher PM, Powell JE. The genetic Architecture of gene expression in peripheral blood. American Journal of Human Genetics. 2017;100:228–237. doi: 10.1016/j.ajhg.2016.12.008. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib64] Luecken MD, Theis FJ. Current best practices in single-cell RNA-seq analysis: a tutorial. Molecular Systems Biology. 2019;15:e8746. doi: 10.15252/msb.20188746. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib65] Lukowski SW, Lloyd-Jones LR, Holloway A, Kirsten H, Hemani G, Yang J, Small K, Zhao J, Metspalu A, Dermitzakis ET, Gibson G, Spector TD, Thiery J, Scholz M, Montgomery GW, Esko T, Visscher PM, Powell JE. Genetic correlations reveal the shared genetic architecture of transcription in human peripheral blood. Nature Communications. 2017;8:483. doi: 10.1038/s41467-017-00473-z. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib66] Mancuso N, Freund MK, Johnson R, Shi H, Kichaev G, Gusev A, Pasaniuc B. Probabilistic fine-mapping of transcriptome-wide association studies. Nature Genetics. 2019;51:675–682. doi: 10.1038/s41588-019-0367-1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib67] Mandric I. Optimal design of single-cell RNA sequencing experiments for cell-type-specific eQTL analysis. bioRxiv. 2019 doi: 10.1101/766972. [DOI] [PMC free article] [PubMed]

[bib68] Marchini J, Howie B. Genotype imputation for genome-wide association studies. Nature Reviews Genetics. 2010;11:499–511. doi: 10.1038/nrg2796. [DOI] [PubMed] [Google Scholar]

[bib69] Martin AR, Gignoux CR, Walters RK, Wojcik GL, Neale BM, Gravel S, Daly MJ, Bustamante CD, Kenny EE. Human demographic history impacts genetic risk prediction across diverse populations. American Journal of Human Genetics. 2017;100:635–649. doi: 10.1016/j.ajhg.2017.03.004. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib70] Maynard KR, Jaffe AE, Martinowich K. Spatial transcriptomics: putting genome-wide expression on the map. Neuropsychopharmacology. 2020;45:232–233. doi: 10.1038/s41386-019-0484-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib71] McCarthy S, Das S, Kretzschmar W, Delaneau O, Wood AR, Teumer A, Kang HM, Fuchsberger C, Danecek P, Sharp K, Luo Y, Sidore C, Kwong A, Timpson N, Koskinen S, Vrieze S, Scott LJ, Zhang H, Mahajan A, Veldink J, Peters U, Pato C, van Duijn CM, Gillies CE, Gandin I, Mezzavilla M, Gilly A, Cocca M, Traglia M, Angius A, Barrett JC, Boomsma D, Branham K, Breen G, Brummett CM, Busonero F, Campbell H, Chan A, Chen S, Chew E, Collins FS, Corbin LJ, Smith GD, Dedoussis G, Dorr M, Farmaki AE, Ferrucci L, Forer L, Fraser RM, Gabriel S, Levy S, Groop L, Harrison T, Hattersley A, Holmen OL, Hveem K, Kretzler M, Lee JC, McGue M, Meitinger T, Melzer D, Min JL, Mohlke KL, Vincent JB, Nauck M, Nickerson D, Palotie A, Pato M, Pirastu N, McInnis M, Richards JB, Sala C, Salomaa V, Schlessinger D, Schoenherr S, Slagboom PE, Small K, Spector T, Stambolian D, Tuke M, Tuomilehto J, Van den Berg LH, Van Rheenen W, Volker U, Wijmenga C, Toniolo D, Zeggini E, Gasparini P, Sampson MG, Wilson JF, Frayling T, de Bakker PI, Swertz MA, McCarroll S, Kooperberg C, Dekker A, Altshuler D, Willer C, Iacono W, Ripatti S, Soranzo N, Walter K, Swaroop A, Cucca F, Anderson CA, Myers RM, Boehnke M, McCarthy MI, Durbin R, Haplotype Reference Consortium A reference panel of 64,976 haplotypes for genotype imputation. Nature Genetics. 2016;48:1279–1283. doi: 10.1038/ng.3643. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib72] Michaelson JJ, Loguercio S, Beyer A. Detection and interpretation of expression quantitative trait loci (eQTL) Methods. 2009;48:265–276. doi: 10.1016/j.ymeth.2009.03.004. [DOI] [PubMed] [Google Scholar]

[bib73] Miraldi ER, Pokrovskii M, Watters A, Castro DM, De Veaux N, Hall JA, Lee JY, Ciofani M, Madar A, Carriero N, Littman DR, Bonneau R. Leveraging chromatin accessibility for transcriptional regulatory network inference in T helper 17 cells. Genome Research. 2019;29:449–463. doi: 10.1101/gr.238253.118. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib74] Moignard V, Woodhouse S, Haghverdi L, Lilly AJ, Tanaka Y, Wilkinson AC, Buettner F, Macaulay IC, Jawaid W, Diamanti E, Nishikawa SI, Piterman N, Kouskoff V, Theis FJ, Fisher J, Göttgens B. Decoding the regulatory network of early blood development from single-cell gene expression measurements. Nature Biotechnology. 2015;33:269–276. doi: 10.1038/nbt.3154. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib75] Mukherjee D, Topol EJ. Pharmacogenomics in cardiovascular diseases. Progress in Cardiovascular Diseases. 2002;44:479–498. doi: 10.1053/pcad.2002.123467. [DOI] [PubMed] [Google Scholar]

[bib76] Ocone A, Haghverdi L, Mueller NS, Theis FJ. Reconstructing gene regulatory dynamics from high-dimensional single-cell snapshot data. Bioinformatics. 2015;31:i89–i96. doi: 10.1093/bioinformatics/btv257. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib77] Porcu E, Rüeger S, Lepik K, Santoni FA, Reymond A, Kutalik Z, eQTLGen Consortium, BIOS Consortium Mendelian randomization integrating GWAS and eQTL data reveals genetic determinants of complex and clinical traits. Nature Communications. 2019;10:3300. doi: 10.1038/s41467-019-10936-0. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib78] Pratapa A, Jalihal AP, Law JN, Bharadwaj A, Murali TM. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. bioRxiv. 2019 doi: 10.1101/642926. [DOI] [PMC free article] [PubMed]

[bib79] Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics. 2006;38:904–909. doi: 10.1038/ng1847. [DOI] [PubMed] [Google Scholar]

[bib80] Qin J, Hu Y, Xu F, Yalamanchili HK, Wang J. Inferring gene regulatory networks by integrating ChIP-seq/chip and transcriptome data via LASSO-type regularization methods. Methods. 2014;67:294–303. doi: 10.1016/j.ymeth.2014.03.006. [DOI] [PubMed] [Google Scholar]

[bib81] Qiu X. Towards inferring causal gene regulatory networks from single cell expression measurements. bioRxiv. 2018 doi: 10.1101/426981. [DOI] [PMC free article] [PubMed]

[bib82] Raj A, Peskin CS, Tranchina D, Vargas DY, Tyagi S. Stochastic mRNA synthesis in mammalian cells. PLOS Biology. 2006;4:e309. doi: 10.1371/journal.pbio.0040309. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib83] Regev A, Teichmann SA, Lander ES, Amit I, Benoist C, Birney E, Bodenmiller B, Campbell P, Carninci P, Clatworthy M, Clevers H, Deplancke B, Dunham I, Eberwine J, Eils R, Enard W, Farmer A, Fugger L, Göttgens B, Hacohen N, Haniffa M, Hemberg M, Kim S, Klenerman P, Kriegstein A, Lein E, Linnarsson S, Lundberg E, Lundeberg J, Majumder P, Marioni JC, Merad M, Mhlanga M, Nawijn M, Netea M, Nolan G, Pe'er D, Phillipakis A, Ponting CP, Quake S, Reik W, Rozenblatt-Rosen O, Sanes J, Satija R, Schumacher TN, Shalek A, Shapiro E, Sharma P, Shin JW, Stegle O, Stratton M, Stubbington MJT, Theis FJ, Uhlen M, van Oudenaarden A, Wagner A, Watt F, Weissman J, Wold B, Xavier R, Yosef N, Human Cell Atlas Meeting Participants The Human Cell Atlas. eLife. 2017;6:e27041. doi: 10.7554/eLife.27041. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib84] Roshchupkin GV, Adams HH, Vernooij MW, Hofman A, Van Duijn CM, Ikram MA, Niessen WJ. HASE: framework for efficient high-dimensional association analyses. Scientific Reports. 2016;6:36076. doi: 10.1038/srep36076. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib85] Saelens W, Cannoodt R, Todorov H, Saeys Y. A comparison of single-cell trajectory inference methods. Nature Biotechnology. 2019;37:547–554. doi: 10.1038/s41587-019-0071-9. [DOI] [PubMed] [Google Scholar]

[bib86] Sarkar AK, Tung PY, Blischak JD, Burnett JE, Li YI, Stephens M, Gilad Y. Discovery and characterization of variance QTLs in human induced pluripotent stem cells. PLOS Genetics. 2019;15:e1008045. doi: 10.1371/journal.pgen.1008045. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib87] Schork NJ. Personalized medicine: time for one-person trials. Nature. 2015;520:609–611. doi: 10.1038/520609a. [DOI] [PubMed] [Google Scholar]

[bib88] Siahpirani AF, Roy S. A prior-based integrative framework for functional transcriptional regulatory network inference. Nucleic Acids Research. 2017;45:2221. doi: 10.1093/nar/gkw1160. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib89] Silvester N, Alako B, Amid C, Cerdeño-Tarrága A, Clarke L, Cleland I, Harrison PW, Jayathilaka S, Kay S, Keane T, Leinonen R, Liu X, Martínez-Villacorta J, Menchi M, Reddy K, Pakseresht N, Rajan J, Rossello M, Smirnov D, Toribio AL, Vaughan D, Zalunin V, Cochrane G. The European Nucleotide Archive in 2017. Nucleic Acids Research. 2018;46:D36–D40. doi: 10.1093/nar/gkx1125. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib90] Simpson EH. The interpretation of interaction in contingency tables. Journal of the Royal Statistical Society: Series B. 1951;13:238–241. doi: 10.1111/j.2517-6161.1951.tb00088.x. [DOI] [Google Scholar]

[bib91] Sirugo G, Williams SM, Tishkoff SA. The missing diversity in human genetic studies. Cell. 2019;177:26–31. doi: 10.1016/j.cell.2019.02.048. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib92] Skinnider MA, Squair JW, Foster LJ. Evaluating measures of association for single-cell transcriptomics. Nature Methods. 2019;16:381–386. doi: 10.1038/s41592-019-0372-4. [DOI] [PubMed] [Google Scholar]

[bib93] Smillie CS, Biton M, Ordovas-Montanes J, Sullivan KM, Burgin G, Graham DB, Herbst RH, Rogel N, Slyper M, Waldman J, Sud M, Andrews E, Velonias G, Haber AL, Jagadeesh K, Vickovic S, Yao J, Stevens C, Dionne D, Nguyen LT, Villani AC, Hofree M, Creasey EA, Huang H, Rozenblatt-Rosen O, Garber JJ, Khalili H, Desch AN, Daly MJ, Ananthakrishnan AN, Shalek AK, Xavier RJ, Regev A. Intra- and inter-cellular rewiring of the human colon during ulcerative colitis. Cell. 2019;178:714–730. doi: 10.1016/j.cell.2019.06.029. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib94] Stoeckius M, Hafemeister C, Stephenson W, Houck-Loomis B, Chattopadhyay PK, Swerdlow H, Satija R, Smibert P. Simultaneous epitope and transcriptome measurement in single cells. Nature Methods. 2017;14:865–868. doi: 10.1038/nmeth.4380. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib95] Stranger BE, Forrest MS, Clark AG, Minichiello MJ, Deutsch S, Lyle R, Hunt S, Kahl B, Antonarakis SE, Tavaré S, Deloukas P, Dermitzakis ET. Genome-wide associations of gene expression variation in humans. PLOS Genetics. 2005;1:e78. doi: 10.1371/journal.pgen.0010078. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib96] Stranger BE, Nica AC, Forrest MS, Dimas A, Bird CP, Beazley C, Ingle CE, Dunning M, Flicek P, Koller D, Montgomery S, Tavaré S, Deloukas P, Dermitzakis ET. Population genomics of human gene expression. Nature Genetics. 2007;39:1217–1224. doi: 10.1038/ng2142. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib97] Sverchkov Y, Craven M. A review of active learning approaches to experimental design for uncovering biological networks. PLOS Computational Biology. 2017;13:e1005466. doi: 10.1371/journal.pcbi.1005466. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib98] Tian L, Dong X, Freytag S, Lê Cao KA, Su S, JalalAbadi A, Amann-Zalcenstein D, Weber TS, Seidi A, Jabbari JS, Naik SH, Ritchie ME. Benchmarking single cell RNA-sequencing analysis pipelines using mixture control experiments. Nature Methods. 2019;16:479–487. doi: 10.1038/s41592-019-0425-8. [DOI] [PubMed] [Google Scholar]

[bib99] Tigchelaar EF, Zhernakova A, Dekens JA, Hermes G, Baranska A, Mujagic Z, Swertz MA, Muñoz AM, Deelen P, Cénit MC, Franke L, Scholtens S, Stolk RP, Wijmenga C, Feskens EJ. Cohort profile: Lifelines DEEP, a prospective, general population cohort study in the northern Netherlands: study design and baseline characteristics. BMJ Open. 2015;5:e006772. doi: 10.1136/bmjopen-2014-006772. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib100] Ud-Dean SM, Gunawan R. Optimal design of gene knockout experiments for gene regulatory network inference. Bioinformatics. 2016;32:875–883. doi: 10.1093/bioinformatics/btv672. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib101] van der Wijst MGP, Brugge H, de Vries DH, Deelen P, Swertz MA, Franke L, LifeLines Cohort Study, BIOS Consortium Single-cell RNA sequencing identifies celltype-specific cis-eQTLs and co-expression QTLs. Nature Genetics. 2018a;50:493–497. doi: 10.1038/s41588-018-0089-9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib102] van der Wijst MGP, de Vries DH, Brugge H, Westra HJ, Franke L. An integrative approach for building personalized gene regulatory networks for precision medicine. Genome Medicine. 2018b;10:96. doi: 10.1186/s13073-018-0608-4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib103] Venet D, Pecasse F, Maenhaut C, Bersini H. Separation of samples into their constituents using gene expression data. Bioinformatics. 2001;17 Suppl 1:S279–S287. doi: 10.1093/bioinformatics/17.suppl_1.S279. [DOI] [PubMed] [Google Scholar]

[bib104] Vieira Braga FA, Kar G, Berg M, Carpaij OA, Polanski K, Simon LM, Brouwer S, Gomes T, Hesse L, Jiang J, Fasouli ES, Efremova M, Vento-Tormo R, Talavera-López C, Jonker MR, Affleck K, Palit S, Strzelecka PM, Firth HV, Mahbubani KT, Cvejic A, Meyer KB, Saeb-Parsy K, Luinge M, Brandsma CA, Timens W, Angelidis I, Strunz M, Koppelman GH, van Oosterhout AJ, Schiller HB, Theis FJ, van den Berge M, Nawijn MC, Teichmann SA. A cellular census of human lungs identifies novel cell states in health and in asthma. Nature Medicine. 2019;25:1153–1163. doi: 10.1038/s41591-019-0468-5. [DOI] [PubMed] [Google Scholar]

[bib105] Villani AC, Satija R, Reynolds G, Sarkizova S, Shekhar K, Fletcher J, Griesbeck M, Butler A, Zheng S, Lazo S, Jardine L, Dixon D, Stephenson E, Nilsson E, Grundberg I, McDonald D, Filby A, Li W, De Jager PL, Rozenblatt-Rosen O, Lane AA, Haniffa M, Regev A, Hacohen N. Single-cell RNA-seq reveals new types of human blood dendritic cells, monocytes, and progenitors. Science. 2017;356:eaah4573. doi: 10.1126/science.aah4573. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib106] Võsa U. Unraveling the polygenic architecture of complex traits using blood eQTL metaanalysis. bioRxiv. 2018 doi: 10.1101/447367. [DOI]

[bib107] Wang D, Liu S, Warrell J, Won H, Shi X, Navarro FCP, Clarke D, Gu M, Emani P, Yang YT, Xu M, Gandal MJ, Lou S, Zhang J, Park JJ, Yan C, Rhie SK, Manakongtreecheep K, Zhou H, Nathan A, Peters M, Mattei E, Fitzgerald D, Brunetti T, Moore J, Jiang Y, Girdhar K, Hoffman GE, Kalayci S, Gümüş ZH, Crawford GE, Roussos P, Akbarian S, Jaffe AE, White KP, Weng Z, Sestan N, Geschwind DH, Knowles JA, Gerstein MB, PsychENCODE Consortium Comprehensive functional genomic resource and integrative model for the human brain. Science. 2018;362:eaat8464. doi: 10.1126/science.aat8464. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib108] Westra HJ, Peters MJ, Esko T, Yaghootkar H, Schurmann C, Kettunen J, Christiansen MW, Fairfax BP, Schramm K, Powell JE, Zhernakova A, Zhernakova DV, Veldink JH, Van den Berg LH, Karjalainen J, Withoff S, Uitterlinden AG, Hofman A, Rivadeneira F, Hoen PAC', Reinmaa E, Fischer K, Nelis M, Milani L, Melzer D, Ferrucci L, Singleton AB, Hernandez DG, Nalls MA, Homuth G, Nauck M, Radke D, Völker U, Perola M, Salomaa V, Brody J, Suchy-Dicey A, Gharib SA, Enquobahrie DA, Lumley T, Montgomery GW, Makino S, Prokisch H, Herder C, Roden M, Grallert H, Meitinger T, Strauch K, Li Y, Jansen RC, Visscher PM, Knight JC, Psaty BM, Ripatti S, Teumer A, Frayling TM, Metspalu A, van Meurs JBJ, Franke L. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nature Genetics. 2013;45:1238–1243. doi: 10.1038/ng.2756. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib109] Westra HJ, Arends D, Esko T, Peters MJ, Schurmann C, Schramm K, Kettunen J, Yaghootkar H, Fairfax BP, Andiappan AK, Li Y, Fu J, Karjalainen J, Platteel M, Visschedijk M, Weersma RK, Kasela S, Milani L, Tserel L, Peterson P, Reinmaa E, Hofman A, Uitterlinden AG, Rivadeneira F, Homuth G, Petersmann A, Lorbeer R, Prokisch H, Meitinger T, Herder C, Roden M, Grallert H, Ripatti S, Perola M, Wood AR, Melzer D, Ferrucci L, Singleton AB, Hernandez DG, Knight JC, Melchiotti R, Lee B, Poidinger M, Zolezzi F, Larbi A, Wang deY, van den Berg LH, Veldink JH, Rotzschke O, Makino S, Salomaa V, Strauch K, Völker U, van Meurs JB, Metspalu A, Wijmenga C, Jansen RC, Franke L. Cell specific eQTL analysis without sorting cells. PLOS Genetics. 2015;11:e1005223. doi: 10.1371/journal.pgen.1005223. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib110] Wojcik GL, Graff M, Nishimura KK, Tao R, Haessler J, Gignoux CR, Highland HM, Patel YM, Sorokin EP, Avery CL, Belbin GM, Bien SA, Cheng I, Cullina S, Hodonsky CJ, Hu Y, Huckins LM, Jeff J, Justice AE, Kocarnik JM, Lim U, Lin BM, Lu Y, Nelson SC, Park SL, Poisner H, Preuss MH, Richard MA, Schurmann C, Setiawan VW, Sockell A, Vahi K, Verbanck M, Vishnu A, Walker RW, Young KL, Zubair N, Acuña-Alonso V, Ambite JL, Barnes KC, Boerwinkle E, Bottinger EP, Bustamante CD, Caberto C, Canizales-Quinteros S, Conomos MP, Deelman E, Do R, Doheny K, Fernández-Rhodes L, Fornage M, Hailu B, Heiss G, Henn BM, Hindorff LA, Jackson RD, Laurie CA, Laurie CC, Li Y, Lin DY, Moreno-Estrada A, Nadkarni G, Norman PJ, Pooler LC, Reiner AP, Romm J, Sabatti C, Sandoval K, Sheng X, Stahl EA, Stram DO, Thornton TA, Wassel CL, Wilkens LR, Winkler CA, Yoneyama S, Buyske S, Haiman CA, Kooperberg C, Le Marchand L, Loos RJF, Matise TC, North KE, Peters U, Kenny EE, Carlson CS. Genetic analyses of diverse populations improves discovery for complex traits. Nature. 2019;570:514–518. doi: 10.1038/s41586-019-1310-4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib111] Wolf FA, Hamey FK, Plass M, Solana J, Dahlin JS, Göttgens B, Rajewsky N, Simon L, Theis FJ. PAGA: graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells. Genome Biology. 2019;20:59. doi: 10.1186/s13059-019-1663-x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib112] Xu J. Genotype-free demultiplexing of pooled single-cell RNA-seq. bioRxiv. 2019 doi: 10.1101/570614. [DOI] [PMC free article] [PubMed]

[bib113] Xue A, Wu Y, Zhu Z, Zhang F, Kemper KE, Zheng Z, Yengo L, Lloyd-Jones LR, Sidorenko J, Wu Y, McRae AF, Visscher PM, Zeng J, Yang J, eQTLGen Consortium Genome-wide association analyses identify 143 risk variants and putative regulatory mechanisms for type 2 diabetes. Nature Communications. 2018;9:2941. doi: 10.1038/s41467-018-04951-w. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib114] Ye CJ, Feng T, Kwon HK, Raj T, Wilson MT, Asinovski N, McCabe C, Lee MH, Frohlich I, Paik HI, Zaitlen N, Hacohen N, Stranger B, De Jager P, Mathis D, Regev A, Benoist C. Intersection of population variation and autoimmunity genetics in human T cell activation. Science. 2014;345:1254665. doi: 10.1126/science.1254665. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib115] Yengo L, Sidorenko J, Kemper KE, Zheng Z, Wood AR, Weedon MN, Frayling TM, Hirschhorn J, Yang J, Visscher PM, GIANT Consortium Meta-analysis of genome-wide association studies for height and body mass index in ∼700000 individuals of european ancestry. Human Molecular Genetics. 2018;27:3641–3649. doi: 10.1093/hmg/ddy271. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib116] Zeng B, Gibson G. PolyQTL: Bayesian multiple eQTL detection with control for population structure and sample relatedness. Bioinformatics. 2019;35:1061–1063. doi: 10.1093/bioinformatics/bty728. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib117] Zhernakova DV, Deelen P, Vermaat M, van Iterson M, van Galen M, Arindrarto W, van 't Hof P, Mei H, van Dijk F, Westra HJ, Bonder MJ, van Rooij J, Verkerk M, Jhamai PM, Moed M, Kielbasa SM, Bot J, Nooren I, Pool R, van Dongen J, Hottenga JJ, Stehouwer CD, van der Kallen CJ, Schalkwijk CG, Zhernakova A, Li Y, Tigchelaar EF, de Klein N, Beekman M, Deelen J, van Heemst D, van den Berg LH, Hofman A, Uitterlinden AG, van Greevenbroek MM, Veldink JH, Boomsma DI, van Duijn CM, Wijmenga C, Slagboom PE, Swertz MA, Isaacs A, van Meurs JB, Jansen R, Heijmans BT, 't Hoen PA, Franke L. Identification of context-dependent expression quantitative trait loci in whole blood. Nature Genetics. 2017;49:139–145. doi: 10.1038/ng.3737. [DOI] [PubMed] [Google Scholar]

[bib118] Zhou X, Stephens M. Genome-wide efficient mixed-model analysis for association studies. Nature Genetics. 2012;44:821–824. doi: 10.1038/ng.2310. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

The single-cell eQTLGen consortium

MGP van der Wijst

DH de Vries

HE Groot

G Trynka

CC Hon

MJ Bonder

O Stegle

MC Nawijn

Y Idaghdour

P van der Harst

CJ Ye

J Powell

FJ Theis

A Mahfouz

M Heinig

L Franke

Roles

Abstract

Interindividual variation needs to be studied at the single-cell level

Figure 1. Set-up of the single-cell eQTLGen (sc-eQTLGen) consortium.

Integration of sc-eQTLGen within the scientific landscape

Single cell eQTL analysis: the new era of population genetics

Figure 2. scRNA-seq data offers increased flexibility in the eQTL analysis strategy over bulk RNA-seq data.

Box 1.

Single cell GRN reconstruction: taking eQTLs one step further

Figure 3. Reconstruction of personalized gene regulatory networks.

sc-eQTLGen: a federated single-cell eQTL meta-analysis consortium

Figure 4. Overview of the sc-eQTLGen proposed federated approach.

Preprocessing, quality control

Cell type classification

eQTL and co-expression QTL analysis

Gene regulatory network reconstruction

Future clinical implications

Figure 5. Deliverables of the single-cell eQTLGen consortium in relation to their future clinical implications.

Biographies

Funding Statement

Contributor Information

Funding Information

Additional information

Competing interests

Author contributions

Data availability

References

Decision letter

Roles

Author response

Associated Data

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases