Gene expression analysis of whole blood, peripheral blood mononuclear cells, and lymphoblastoid cell lines from the Framingham Heart Study

Roby Joehanes; Andrew D Johnson; Jennifer J Barb; Nalini Raghavachari; Poching Liu; Kimberly A Woodhouse; Christopher J O'Donnell; Peter J Munson; Daniel Levy

doi:10.1152/physiolgenomics.00130.2011

2011 Nov 1;44(1):59–75. doi: 10.1152/physiolgenomics.00130.2011

PMCID: PMC3289123 PMID: 22045913

Abstract

Despite a growing number of reports of gene expression analysis from blood-derived RNA sources, there have been few systematic comparisons of various RNA sources in transcriptomic analysis or for biomarker discovery in the context of cardiovascular disease (CVD). As a pilot study of the Systems Approach to Biomarker Research (SABRe) in CVD Initiative, this investigation used Affymetrix Exon arrays to characterize gene expression of three blood-derived RNA sources: lymphoblastoid cell lines (LCL), whole blood using PAXgene tubes (PAX), and peripheral blood mononuclear cells (PBMC). Their performance was compared in relation to identifying transcript associations with sex and CVD risk factors, such as age, high-density lipoprotein, and smoking status, and the differential blood cell count. We also identified a set of exons that vary substantially between participants, but consistently in each RNA source. Such exons are thus stable phenotypes of the participant and may potentially become useful fingerprinting biomarkers. In agreement with previous studies, we found that each of the RNA sources is distinct. Unlike PAX and PBMC, LCL gene expression showed little association with the differential blood count. LCL, however, was able to detect two genes related to smoking status. PAX and PBMC identified Y-chromosome probe sets similarly and slightly better than LCL.

Keywords: microarray, system biology, biomarker discovery, fingerprinting genes, data normalization, X-linked expression, cardiovascular disease

Owing to accessibility, practicality, and minimal invasiveness, blood-derived RNA sources, such as lymphoblastoid cell lines (LCL), whole blood cells (PAXgene tubes; PAX), and peripheral blood mononuclear cells (PBMC), have been widely used in gene expression studies for biomarker identification (30, 32) and pathway profiling (10). These RNA sources have been useful for identifying biomarker signatures of lupus (9), cancer (75), and bacterial infection (14). This makes blood-derived RNA valuable even when studying diseases involving remote target tissues (66).

Each of these blood-derived RNA sources is known to have inherent characteristics that will result in a unique gene expression profile (25). PAX samples, derived from whole blood, capture RNA profiles of all cell types in whole blood, including erythrocytes, granulocytes (neutrophils, eosinophils, basophils), lymphocytes, monocytes, and platelets. PBMC samples, derived from a Ficoll-filtered lymphocyte and monocyte subset, are largely devoid of granulocytes, platelets, and reticulocytes. LCL samples, derived from lymphoblastoid cell lines [i.e., B cells infected and immortalized by Epstein-Barr virus (EBV), stored frozen and regrown several years after sample collection], represent RNA from a single cell type. In addition, gene expression differences may also arise from varying RNA isolation protocols and sample handling (6, 25, 50, 80).

Despite a growing number of reports of gene expression analysis from these RNA sources, there have been few systematic comparisons of their suitability for biomarker discovery, especially in the context of cardiovascular disease (CVD). Previous studies (50, 67) have examined gene signature differences among these RNA sources. One study examined the expression profile differences among the sources with respect to age and sex (80) in a spotted-array platform. However, none of these studies has a balanced experimental design that can eliminate certain statistical biases in the analysis.

Therefore, the primary goal of this study, which was undertaken as a feasibility study for the Systems Approach to Biomarker Research (SABRe) in CVD Initiative, was to characterize three blood-derived RNA sources, LCL, PAX, and PBMC, for quantity and quality of RNA, and expression properties using an exon-array platform in a balanced experimental design. The performance of these sources was assessed with regard to identification of differential expression of Y-chromosome probe sets with sex, which is a major CVD risk factor (35). Beyond sex associations, associations of expression with other risk factors, such as age, smoking status, and high-density lipoprotein (HDL) cholesterol level, were also explored. In addition, complete blood counts (CBC) (19) obtained at the time of blood collection allowed tests of association of expression with blood cell proportions.

The balanced experimental design permitted a secondary goal of identifying genes whose expression levels are stable across RNA sources within individuals yet highly variable across the population. Such markers may be useful in fingerprinting the samples for forensic identification or in resolving sample mix-ups, which is a common problem in gene expression studies. Last, we also identified genes that are consistently expressed across multiple RNA sources and across individuals, making them suitable for use as calibration markers.

MATERIALS AND METHODS

Study Samples

The first cohort of the Framingham Heart Study (FHS) included 5,209 men and women between 30 and 60 yr of age who enrolled in 1948 and have undergone biennial examinations (24, 54, 55). In 1971, 5,124 children (and spouses of children) of the original cohort were recruited to the Framingham Offspring Study (27). In 2002, 4,095 participants were included in the third generation cohort (71). Blood samples were obtained from 50 consecutive participants from the third generation cohort who attended their second examination cycle clinic visit in January 2009. Immortalized cell lines for these same participants were prepared from samples taken during their initial clinic visit, ∼5 yr earlier. To investigate possible sample storage effects, we obtained 24 whole blood samples from the offspring cohort, which were sampled in 2005–2006 and stored for 3–4 yr at −80°C prior to RNA preparation. Protocols for participant examinations and collection of genetic materials, including immortalized cell lines, were approved by the Boston University Medical Center Institutional Review Board.

Individual Trait Data

Current smoking status (defined as regularly smoking one or more cigarettes per day during the past year), systolic and diastolic blood pressure (seated, measured twice in the left arm by a physician), total and HDL cholesterol levels, fasting blood glucose level, and body mass index (BMI, weight in kg divided by height in m²) were obtained at the clinic visit. Hypertension was defined as a systolic blood pressure of at least 140 mmHg or a diastolic blood pressure of at least 90 mmHg or current use of antihypertensive medication. Diabetes was defined as fasting blood glucose of at least 126 mg/dl or current use of insulin or an oral hypoglycemic medication. CBCs were obtained on samples collected from the third generation at the second examination clinic visit using a Beckman Coulter Counter (Beckman Coulter, Brea, CA).
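The trait definitions above can be restated as simple predicates. The following is a minimal sketch only; the function and parameter names are hypothetical and do not come from the study's own software.

```python
def has_hypertension(systolic_mmhg, diastolic_mmhg, on_antihypertensive):
    """Systolic >= 140 mmHg, or diastolic >= 90 mmHg, or current antihypertensive use."""
    return bool(systolic_mmhg >= 140 or diastolic_mmhg >= 90 or on_antihypertensive)


def has_diabetes(fasting_glucose_mg_dl, on_glucose_lowering_drug):
    """Fasting glucose >= 126 mg/dl, or current insulin / oral hypoglycemic use."""
    return bool(fasting_glucose_mg_dl >= 126 or on_glucose_lowering_drug)


def body_mass_index(weight_kg, height_m):
    """BMI: weight in kg divided by the square of height in m."""
    return weight_kg / height_m ** 2
```

Note that the thresholds are inclusive ("at least"), so a diastolic pressure of exactly 90 mmHg counts as hypertension.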

RNA Isolation and Target Labeling

The three RNA sources collected on each of the 50 consecutive individuals included PAX and PBMC (obtained at the second clinic examination of the third generation cohort), and LCL, obtained at the first clinic examination ∼5 yr earlier.

PAXgene samples.

Blood specimens (2.5 ml) collected in PAXgene tubes from each participant were incubated at room temperature for 4 h for RNA stabilization and then stored at −80°C. RNA was extracted from whole blood using the PAXgene Blood RNA System Kit following the manufacturer's guidelines (62). In brief, samples were removed from −80°C and incubated at room temperature for 2 h to ensure complete lysis. Following lysis, the tubes were centrifuged for 10 min at 5,000 g, the supernatant was discarded, and 500 μl of RNase-free water was added to the pellet. The tube was vortexed thoroughly to resuspend the pellet, centrifuged for 10 min at 5,000 g, and the entire supernatant was discarded. The pellet was resuspended in 360 μl of buffer BR1 by vortexing and RNA was further purified with on-column DNase digestion. Quality of the purified RNA was verified on an Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA); RNA concentrations were determined using a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE). We amplified 50 ng of total RNA using NuGEN's WT-Ovation Pico RNA Amplification System and labeled it with FL-Ovation cDNA Biotin Module V2 (NuGEN, San Carlos, CA) according to the protocol provided by the supplier.

PBMC samples.

Venous blood (8 ml) from each participant was collected into Vacutainer cell preparation tubes containing sodium citrate and Ficoll (Becton Dickinson, Franklin Lakes, NJ). Purified PBMC suspensions were resuspended in RLT buffer (700–1,000 μl per 10⁷ cells), passed through Qiashredder columns (Qiagen, Valencia, CA), and then stored at −80°C. Total RNA (50 ng) was amplified and labeled using the Affymetrix Whole-Transcript (WT) Sense Target Labeling Protocol without rRNA reduction. Complementary DNA (cDNA) was regenerated through a random-primed reverse transcription using a dNTP mix containing dUTP. The RNA was hydrolyzed with RNase H, and the cDNA was purified. The cDNA was then fragmented by incubation with a mixture of UDG and APE1 with restriction endonucleases and end-labeled via a terminal transferase reaction incorporating a biotinylated dideoxynucleotide.

LCL samples.

Total RNA was extracted from pelleted lymphoblastoid cells of each participant using the Qiagen RNeasy Plus extraction kit according to the manufacturer's protocol (61). The process included a column-based elimination of genomic DNA. Total RNA (50 ng) was amplified and labeled using the Affymetrix Whole-Transcript (WT) Sense Target Labeling Protocol without rRNA reduction. cDNA generation, RNA hydrolysis, fragmentation, and labeling were carried out with the same protocol as described above for PBMC samples.

Microarray Hybridization

We added 5.5 μg of the fragmented, biotinylated cDNA prepared from each of the whole blood, PBMC, and cell line samples to a hybridization cocktail, loaded it on an Affymetrix Human Exon 1.0 ST GeneChip, which contains ∼1.4 million probe sets in total, and hybridized it for 16 h at 45°C and 60 rpm (3). Following hybridization, the array was washed and stained according to the manufacturer's protocol. The stained array was scanned at 532 nm using an Affymetrix GeneChip Scanner 3000, generating CEL files for each array. Aside from 10 samples (4 LCL, 3 PAX, 3 PBMC) with insufficient RNA, all samples were chipped in two batches.

Expression Data Analysis

We applied the robust multichip average (RMA) method (34) to normalize expression values for the remaining 140 samples using the Affymetrix Power Tool (APT) (4) version 1.12.0. We used the following metrics (2) to determine the quality of the hybridized samples: all_probeset_mean, all_probeset_rle_mean, pm_mean, and pos_vs_neg_auc. Two LCL and five PBMC samples failed on these metrics, and were excluded, along with the other samples from the same individuals. After inspecting Y-chromosome probe sets for agreement across the samples from each individual, we additionally removed two PAX and two PBMC samples due to apparent mislabeling. This left 35 individuals with satisfactory results for all three sample types, giving 105 samples. We repeated probe set-level RMA normalization on these samples, retaining only core-level, RefSeq-annotated probe sets, giving 287,329 probe sets in all, representing 18,282 distinct genes. We also performed transcript cluster-level normalization at the “core level,” giving 17,330 RefSeq-annotated genes. The gene counts and annotations are based on Affymetrix NetAffx release 31 (1).

Quality control.

Having discarded samples from participants 21, 23, and 35 due to insufficient RNA, we normalized the CEL files of the remaining 140 chipable samples with the RMA method using the APT software in three runs, one per RNA source. The quality control parameters of these samples are shown in Supplementary Table S13. Since participant 45 did not yield sufficient RNA in its LCL sample, its PAX and PBMC samples were discarded. PBMC samples of participants 2 and 8 and PAX samples of participants 43 and 44 were found to be mislabeled by inspection of Y-chromosome probe sets. These four samples were discarded. The LCL sample of participant 18 was identical to that of participant 17, and the LCL sample of participant 42 was identical to that of participant 43. Only the LCL sample from participant 18 could be restored. We selected only participants whose samples all had an all-probe-set RLE mean (all_probeset_rle_mean) of at most 0.75. This step removed samples from participants 37, 38, 40, 41, 46, and 47. We renormalized the 105 CEL files together from the remaining 35 participants.

Postnormalization methods.

To address an apparent systematic bias in gene expression values between PAX results and either PBMC or LCL (Fig. 1A), likely arising from the differences in labeling protocol, we further normalized the data with the S10 postnormalization procedure (52), a variance-stabilizing and quantile-normalizing transform. In addition to S10, we also considered quantile postnormalization (QPN), which is a quantile-normalization transform. We chose the transform that minimized variance across participants.

Using QPN, we computed the mean value of each probe set per RNA source, yielding three sets of mean values. We chose PBMC as the reference distribution because its mean values correlate well with those of LCL and PAX. Such selection is aimed to minimize drastic quantile correction. After the mean value of each gene for LCL and PAX was quantile-normalized against that of PBMC, its individual expression values were shifted by the difference between the original and normalized mean values.
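The QPN steps described above can be sketched as follows. This is an illustrative reading of the text, not the authors' implementation: the function names are hypothetical, both matrices are assumed to hold the same probe sets in the same row order (log₂ scale), and ties in the mean values are broken by row order.

```python
def quantile_match(values, reference):
    """Replace each value by the reference value that has the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ref_sorted = sorted(reference)
    matched = [0.0] * len(values)
    for rank, idx in enumerate(order):
        matched[idx] = ref_sorted[rank]
    return matched


def qpn(source, reference):
    """Quantile postnormalization of one RNA source against a reference source.

    source, reference: lists of rows (one row per probe set), each row
    holding the expression values of that probe set across samples.
    """
    src_means = [sum(row) / len(row) for row in source]
    ref_means = [sum(row) / len(row) for row in reference]
    # Quantile-normalize the per-probe-set means against the reference means.
    new_means = quantile_match(src_means, ref_means)
    # Shift every individual value by (normalized mean - original mean).
    return [
        [x + (new - old) for x in row]
        for row, old, new in zip(source, src_means, new_means)
    ]
```

In the study's setting, `reference` would be the PBMC matrix and `source` the LCL or PAX matrix. Because each probe set is shifted by a constant, differences between participants within a source are left unchanged.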

For S10, we computed the anti-log of the RMA expression values, calculated the normal quantiles, then computed mean and standard deviations across samples, and then fit a spline to the standard deviation as a function of the mean. A variance-stabilizing transform function is computed from this smooth function, and then applied to the data. Finally, the log base 2 was computed on the normalized data.

After postnormalization, the QPN-transformed mean densities were identical to those of PBMC, while the S10-transformed densities were diagonally aligned (Fig. 1B). Using two-way ANOVA with RNA source and participant as fixed factors, we chose S10 because it minimized variance across participants while normalizing the quantiles.

Statistical Methods

All statistical analyses were performed both at the exon/probe-set level and at the gene/transcript cluster level using R (63) version 2.11.1 or JMP 9 (SAS, Cary, NC). The MSCL Analyst's Toolbox (freely available at http://abs.cit.nih.gov/MSCLtoolbox/) was used for initial exploratory analyses and feature discovery. We discarded 92,157 probe sets where the intensity of 52 or fewer (≤50%) of the 105 samples was significantly above background [i.e., with detected-above-background (DABG) P values ≤ 0.05], leaving 195,172 for subsequent analysis. DABG filtering was applied to exon-level data and not to gene-level data. In all cases, we calculated the false discovery rates (FDR) with Benjamini and Hochberg's method (11).
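The two filtering steps in this paragraph, the DABG detection filter and the Benjamini-Hochberg adjustment, can be sketched in a few lines. This is a minimal illustration using only the standard library; the function names and default arguments are hypothetical, and the study itself used R or JMP rather than this code.

```python
def dabg_keep(dabg_pvalues, alpha=0.05, min_fraction=0.5):
    """Keep a probe set only if more than min_fraction of its samples
    are detected above background (DABG P <= alpha)."""
    detected = sum(1 for p in dabg_pvalues if p <= alpha)
    return detected > min_fraction * len(dabg_pvalues)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

With 105 samples, `dabg_keep` discards a probe set detected in 52 or fewer samples and retains one detected in 53 or more, matching the rule stated above.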

To determine the separation of expression patterns across cell types, we performed principal component analysis (PCA) on all 105 arrays on DABG-filtered exon-level data. The PCA was performed using the “prcomp” function (76) of R on centered, but unscaled data.

To determine differentially expressed genes between each pair of RNA sources, we used a two-way ANOVA with fixed factors for sample type (n = 3) and participant (n = 35). We counted probe sets and transcript clusters where mean expression differences among RNA sources were declared significant based on the sample type F-statistic (FDR ≤ 0.05). Comparison of expression between pairs of RNA sources used a post hoc t-test statistic with the same FDR threshold. To identify genes that were uniquely overexpressed in each RNA source compared with the other two sources, we computed the minimum fold-change for each paired comparison and required this to be greater than eightfold.
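For a single probe set, the two-way ANOVA and the minimum fold-change criterion can be sketched as below. The sketch assumes one observation per participant and RNA source (so the residual has (35 − 1) × (3 − 1) degrees of freedom in the study's design) and assumes that fold changes are computed from differences of mean log₂ expression; both the function names and that fold-change convention are assumptions made for illustration. Converting the F statistics to P values requires an F distribution, which is omitted here.

```python
def two_way_anova_f(table):
    """table[i][j]: log2 expression of participant i in RNA source j.

    Returns (F for RNA source, F for participant, per-source means).
    """
    n_part, n_src = len(table), len(table[0])
    grand = sum(map(sum, table)) / (n_part * n_src)
    part_means = [sum(row) / n_src for row in table]
    src_means = [sum(row[j] for row in table) / n_part for j in range(n_src)]
    ss_src = n_part * sum((m - grand) ** 2 for m in src_means)
    ss_part = n_src * sum((m - grand) ** 2 for m in part_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_resid = ss_total - ss_src - ss_part
    df_src, df_part = n_src - 1, n_part - 1
    ms_resid = ss_resid / (df_src * df_part)  # undefined if the fit is exact
    f_src = (ss_src / df_src) / ms_resid
    f_part = (ss_part / df_part) / ms_resid
    return f_src, f_part, src_means


def min_fold_change(src_means, target):
    """Smallest fold change of the target source over each other source."""
    others = [m for j, m in enumerate(src_means) if j != target]
    return min(2 ** (src_means[target] - m) for m in others)
```

A gene would then count as uniquely overexpressed in a source when `min_fold_change` for that source exceeds 8.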

Stable fingerprinting genes/exons are those with expression levels strongly related to participant, irrespective of RNA source. Using the same two-way ANOVA, we selected such genes/exons with a significant participant effect at FDR ≤ 0.05. A subset of the most significant exons having participant-effect standard deviations of at least twofold change were clustered using Ward's method (53) on their expression level, after subtracting the sample-type effect. Conversely, housekeeping or calibration genes were selected as those with the smallest variation across sample type or across participant. We selected such genes with 1) P value for sample type >0.2, 2) P value for participant >0.2, 3) standard deviation of the within-participant effects less than twofold change, and 4) mean expression level greater than background threshold (4 RMA units).
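The four calibration-gene criteria can be expressed as a single filter. The sketch below is illustrative only: the `GeneStats` container and its field names are hypothetical, and it assumes that "less than twofold change" corresponds to a standard deviation below 1 on the log₂ (RMA) scale.

```python
from dataclasses import dataclass


@dataclass
class GeneStats:
    p_sample_type: float   # ANOVA P value for the RNA-source effect
    p_participant: float   # ANOVA P value for the participant effect
    sd_log2: float         # SD of within-participant effects, log2 units
    mean_log2: float       # mean expression, RMA (log2) units


def is_calibration_candidate(g, p_min=0.2, max_sd_log2=1.0, background=4.0):
    """Apply criteria 1-4: no source effect, no participant effect,
    low variability, and expression above the background threshold."""
    return (
        g.p_sample_type > p_min
        and g.p_participant > p_min
        and g.sd_log2 < max_sd_log2
        and g.mean_log2 > background
    )
```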

Transcript profile associations with age, sex, and selected CVD risk factors.

To discover genes that are expressed differentially in men vs. women, we performed a two-sample t-test with unequal variance assumption for each RNA source. We required the exons and genes to pass the FDR ≤ 0.05 threshold.
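The unequal-variance (Welch) t-test used for the sex comparison can be sketched as follows, computing the test statistic and the Welch-Satterthwaite degrees of freedom for one probe set. The function name is hypothetical; obtaining a P value additionally requires the t distribution, which is not in the Python standard library and is left out of this sketch.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and approximate degrees of freedom
    for two independent samples with possibly unequal variances."""
    vx = variance(x) / len(x)   # squared standard error of mean(x)
    vy = variance(y) / len(y)   # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df
```

Here `x` and `y` would hold the expression values of one probe set in men and in women, respectively, within a single RNA source.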

For trait-based biomarker discovery, we regressed the RMA expression of each gene for each RNA source against the trait, adjusting for age and sex, using the linear mixed-effects model implemented in the R package “lmer” (59). Owing to small sample size, we relaxed the multiple-testing penalty by setting a P value cutoff of 0.05 and selecting only genes with a significant association in more than one RNA source. For confirmatory testing of previously identified biomarkers using our data, we regressed the RMA expression of each exon against the trait, adjusting for age and sex using P value ≤ 0.05.

Differential blood count analysis.

Because blood is a complex tissue made up of varying proportions of several cell types, each with a distinct expression profile, expression of some genes would be expected to vary proportionally to these components. To find such genes, we used a multiple regression model with factors for the absolute count per μl of each measured component: red blood cells, platelets, neutrophils, lymphocytes, monocytes, eosinophils, and basophils. We collected the significance levels (P values) for each factor and performed FDR adjustment as above. We also relaxed the FDR cutoff to 0.2, since few results were obtained at lower levels. The test was repeated for each RNA source.

Gene ontology analysis.

We performed gene ontology (GO) (7) enrichment analysis of the differentially expressed genes between each pair of RNA sources based on exon-level or gene-level data using GOrilla (26). This method determines whether the number of differentially expressed genes having a particular GO assignment is significantly higher than would be expected by chance, given the total number of genes, the total number having that assignment, and the number of differentially expressed genes, overall. We removed unannotated genes and required the remaining genes to pass 1) an FDR ≤ 0.05 threshold and 2) at least a fourfold difference in expression. We ran GOrilla using genes in the Affymetrix NetAffx core-level annotation version 31 for the Human Exon 1.0 ST GeneChip as the background set of genes.
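The enrichment question described above (is the overlap larger than expected by chance, given the four counts?) corresponds to an upper-tail hypergeometric probability. The sketch below shows that calculation in general form; it is not GOrilla's own code, and the function and argument names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_annotated, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_background genes, of which n_annotated carry the GO term."""
    upper = min(n_annotated, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

In the study's setting, `n_background` would be the number of genes in the core-level annotation, `n_selected` the number of differentially expressed genes passing both thresholds, and `n_overlap` the number of those carrying the GO term in question.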

RESULTS

RNA Source Comparison

The clinical characteristics of the study sample are provided in Table 1. PCA revealed striking differences in expression patterns of the three RNA sources (Fig. 2). The first two principal components, attributable to RNA source differences, accounted for 70.88% of overall variance in expression. In the PCA plot, the 24 older PAX samples coincide with the newer PAX samples. Therefore, the differences among the three RNA sources are much larger than any possible sample storage or aging effects.

Table 1.

Characteristics of the study participants

Characteristic	Value
Age, yr	51 ± 7.3 (27–59)
Sex	21 males/14 females
Body mass index, kg/m²	30.2 ± 5.8 (20.5–42.0)
Systolic blood pressure, mmHg	120.5 ± 13.5 (96–153)
Diastolic blood pressure, mmHg	76.9 ± 7.5 (59–91)
Total cholesterol, mg/dl	202.7 ± 39.1 (145–323)
High density lipoprotein, mg/dl	56.6 ± 18.1 (27–112)
Fasting blood glucose, mg/dl	96.6 ± 7.6 (81–108)
Smoking status	6 males/2 females
Hypertension status	4 males/3 females
Lipid medication use	5 males/4 females
White blood cell count (× 10³/μl)	6.22 ± 1.48 (3.5–9.7)
Red blood cell count (× 10⁶/μl)	4.53 ± 0.47 (3.72–5.47)
Hemoglobin, g/dl	14.06 ± 1.46 (11.3–16.8)
Hematocrit, %	40.90 ± 4.25 (32.6–49.0)
Mean corpuscular volume, fl	90.37 ± 3.86 (83.4–100.2)
Mean corpuscular hemoglobin, pg	31.07 ± 1.94 (27.3–37.3)
Mean corpuscular hemoglobin concentration, g/dl	34.37 ± 1.19 (32.7–39.3)
Red blood cell distribution width, %	12.46 ± 0.60 (11.4–14.1)
Platelet count (× 10³/μl)	232.85 ± 63.32 (48–379)
Mean platelet volume, fl	8.65 ± 0.90 (6.8–10.9)
Neutrophil, %	57.4 ± 8.17 (35.3–72.2)
Lymphocyte, %	30.39 ± 7.48 (19.2–50.8)
Monocyte, %	8.07 ± 2.32 (5.2–14.9)
Eosinophil, %	3.23 ± 1.70 (1.2–8.6)
Basophil, %	0.87 ± 0.36 (0.3–1.7)
Neutrophil count (× 10³/μl)	3.63 ± 1.22 (1.3–6.9)
Lymphocyte count (× 10³/μl)	1.83 ± 0.42 (1.1–3.2)
Monocyte count (× 10³/μl)	0.50 ± 0.16 (0.2–0.8)
Eosinophil count (× 10³/μl)	0.21 ± 0.13 (0.1–0.7)
Basophil count (× 10³/μl)	0.05 ± 0.05 (0.0–0.1)

Values are means ± SD (minimum–maximum); n = 35.

Fig. 2. Principal component plot of the 3 RNA sources at the exon level. The RNA source difference explains 70.88% of total variation. The stored samples coincide with the newer ones, indicating a lack of storage effect. PAX06 indicates PAXgene samples assayed in 2005–2006, and PAX09 indicates those assayed in 2009.

About 90% of probe sets (176,641 of the 195,172 expressed above background) were found to differ across RNA sources (FDR ≤ 0.05). Even when an exceedingly low FDR cutoff (≤1×10⁻⁸) was set, more than half the exon probe sets (105,709) differed significantly across RNA sources (Table 2). A similar percentage showed expression differences at the gene level. Most of these expression differences were seen in the PAX vs. LCL and PAX vs. PBMC comparisons. Genes that are uniquely overexpressed by ≥8-fold in each RNA source compared with the other two sources were ranked by level of overexpression, and the topmost are presented in Tables 3–5. The corresponding tables based on exon-level analysis are given in Supplementary Tables S1–S3.

Table 2.

Number of probe sets and transcripts with expression differences among RNA sources

	Exon Level	Gene Level
Differed among 3 RNA sources	105,709 (14,811)^*	10,253
PAX vs. LCL	97,219 (14,728)	9,323
LCL vs. PBMC	51,922 (7,427)	5,708
PBMC vs. PAX	86,829 (14,480)	8,169
Expressed ≥ 8-fold over other 2 RNA sources
PAX	4,859 (2,512)	119
LCL	2,436 (495)	188
PBMC	883 (341)	19
Expressed ≥ 4-fold over other 2 RNA sources
PAX	12,970 (5,339)	336
LCL	6,312 (1,173)	426
PBMC	2,859 (831)	113

Results based on S10 postnormalization, application of detected-above-background filtering and significance at false discovery rate (FDR) ≤ 10⁻⁸.

^* Values in parentheses are the number of genes that include the detected probe sets.

PAX, PAXgene tubes; LCL, lymphoblastoid cell lines; PBMC, peripheral blood mononuclear cells.

Table 3.

Genes overexpressed in PAX

Rank^a	Transcript Cluster ID	Gene Symbol	Chr.	Description	Mean PAX^b	PAX/LCL^c	PAX/PBMC^c	Min. FC^d
1	2787958	GYPB	4	glycophorin B (MNS blood group)^*(13)	10.5	319	218	218
2	2907173	HCRP1	6	hepatocellular carcinoma-related HCRP1	11.1	95	102	95
3	2648677	MME	3	membrane metallo-endopeptidase	10.2	97	65	65
4	3453732	TUBA1B	12	tubulin, alpha 1b	9.1	57	174	57
5	4009849	ALAS2	X	aminolevulinate, delta-, synthase 2^*(23)	12.4	118	56	56
6	3037100	RSPH10B	7	radial spoke head 10 homolog B (Chlamydomonas)	7.9	60	56	56
7	3996598	NCRNA00204	X	nonprotein coding RNA 204	8.2	136	48	48
8	2765935	GAFA3	4	FGF-2 activity-associated protein 3	9.6	48	47	47
9	3906007	PRO0628	20	uncharacterized protein PRO0628-like	7.7	46	49	46
10	4010152	LOC442454	X	ubiquinol-cytochrome c reductase binding protein pseudogene	7.5	59	45	45
11	3679643	C16orf72	16	chromosome 16 open reading frame 72	9.7	46	42	42
12	3617458	GOLGA8A	15	golgin A8 family, member A	8.5	41	47	41
13	3489673	KCNRG	13	potassium channel regulator	10.8	63	41	41
14	3399623	THYN1	11	thymocyte nuclear protein 1	12.9	53	39	39
15	3421118	RAP1B	12	RAP1B, member of RAS oncogene family	10.5	195	37	37
16	2375338	OCR1	1	ovarian cancer-related protein 1	9.1	67	36	36
17	2325877	RHD	1	Rh blood group, D antigen^*(8)	6.1	45	36	36
18	4037708	MIR1974	M	microRNA 1974	13.9	48	35	35
19	3886765	PI3	20	peptidase inhibitor 3, skin-derived	10.9	35	36	35
20	3416483	HNRNPA1	12	heterogeneous nuclear ribonucleoprotein A1	10.8	33	47	33
21	3090006	SLC25A37	8	solute carrier family 25, member 37	13.9	41	31	31
22	3498476	LOC100132099	13	FRSS1829	11.0	72	31	31
23	2701294	TMEM14E	3	transmembrane protein 14E	9.7	41	29	29
24	3823304	CYP4F3	19	cytochrome P450, family 4, subfamily F, polypeptide 3	7.8	32	29	29
25	3360401	HBB	11	hemoglobin, beta^*(31)	15.1	1651	27	27
26	2923270	PLN	6	phospholamban	8.0	26	25	25
27	2527580	CXCR2	2	chemokine (C-X-C motif) receptor 2^†(44)	12.0	131	24	24
28	3830484	FFAR2	19	free fatty acid receptor 2	9.1	33	23	23
29	3918696	SON	21	SON DNA binding protein	13.8	21	26	21
30	3920850	KCNJ15	21	potassium inwardly-rectifying channel, subfamily J, member 15	11.0	66	19	19
Genes known to be expressed in erythrocytes or neutrophils
36	2496907	IL1R2	2	interleukin 1 receptor, type II^†(15)	10.2	26	17	17
47	3657253	AHSP	16	alpha hemoglobin stabilizing protein^*	9.0	26	16	16
52	3907190	SLPI	20	secretory leukocyte peptidase inhibitor^†(37)	7.5	25	14	14
58	3475782	GPR109A	12	G protein-coupled receptor 109A^†(43)	10.3	18	13	13
76	3759006	SLC4A1	17	solute carrier family 4, anion exchanger, member 1 (erythrocyte membrane protein band 3, Diego blood group) ^*(74)	10.1	16	11	11
85	3621029	EPB42	15	erythrocyte membrane protein band 4.2^*(69)	8.9	13	10	10
98	3360417	HBD	11	hemoglobin, delta^*(31)	7.4	18	10	10
Genes known to be expressed in erythrocytes or neutrophils but with fold change < 8-fold
160	3217077	HEMGN	9	hemogen^*(46)	8.8	33	6	6
182	3533435	PNN	14	pinin, desmosome associated protein	11.0	6	9	6
267	2731381	CXCL1	4	chemokine (C-X-C motif) ligand 1 (melanoma growth stimulating activity, alpha) ^†(77)	7.7	6	5	5
306	3388751	MMP8	11	matrix metallopeptidase 8 (neutrophil collagenase) ^†(42)	4.4	6	4	4
612	2327677	EPB41	1	erythrocyte membrane protein band 4.1 (elliptocytosis 1, RH-linked) ^*(22)	11.9	11	3	3
840	3089102	EPB49	8	erythrocyte membrane protein band 4.9 (dematin) ^*(65)	10.1	13	2	2

^a Ranked according to minimum fold change (FC).

^b RMA units, log₂ scale.

^c FC ratio of gene expression between PAX and LCL or between PAX and PBMC.

^d Minimum FC.

^* Expressed in erythrocytes (literature reference given within parentheses).

^† Expressed in neutrophils (literature reference).

Table 4.

Genes overexpressed in LCL

Rank^a	Transcript Cluster ID	Gene Symbol	Chr.	Description	Mean LCL^b	LCL/PAX^c	LCL/PBMC^c	Min. FC^d
1	3248289	CDK1	10	cyclin-dependent kinase 1^¶	8.4	111	68	68
2	3595979	CCNB2	15	cyclin B2^¶	9.5	100	65	65
3	3662687	CCL22	16	chemokine (C-C motif) ligand 22	13.7	63	93	63
4	2333136	CDC20	1	cell division cycle 20 homolog (S. cerevisiae)^¶	11.5	207	58	58
5	3129149	PBK	8	PDZ binding kinase^¶	7.8	77	58	58
6	3041816	DFNA5	7	deafness, autosomal dominant 5	9.4	59	54	54
7	2742935	HSPA4L	4	heat shock 70 kDa protein 4-like	8.3	69	53	53
8	3565663	DLGAP5	14	discs, large (Drosophila) homolog-associated protein 5^¶	9.3	68	49	49
9	3756193	TOP2A	17	topoisomerase (DNA) II alpha 170 kDa^¶	10.0	75	49	49
10	2946225	HIST1H2BB	6	histone cluster 1, H2bb	7.3	115	47	47
11	3629103	KIAA0101	15	KIAA0101	9.5	118	46	46
12	3258168	KIF11	10	kinesin family member 11^¶	8.4	74	46	46
13	3589697	BUB1B	15	budding uninhibited by benzimidazoles 1 homolog beta (yeast) ^¶	9.3	55	45	45
14	3331903	FAM111B	11	family with sequence similarity 111, member B	10.0	72	43	43
15	3443206	AICDA	12	activation-induced cytidine deaminase	9.9	49	42	42
16	2378937	DTL	1	denticleless homolog (Drosophila)	10.9	55	41	41
17	3040518	MACC1	7	metastasis associated in colon cancer 1	9.4	42	40	40
18	3788049	SKA1	18	spindle and kinetochore associated complex subunit 1^¶	6.9	54	39	39
19	2585933	SPC25	2	SPC25, NDC80 kinetochore complex component, homolog (S. cerevisiae)^¶	6.9	48	38	38
20	3881443	TPX2	20	TPX2, microtubule-associated, homolog (Xenopus laevis)^¶	10.5	46	38	38
21	2914777	TTK	6	TTK protein kinase^¶	7.6	57	38	38
22	2838656	HMMR	5	hyaluronan-mediated motility receptor (RHAMM)	7.5	40	38	38
23	2830638	KIF20A	5	kinesin family member 20A^¶	9.4	49	37	37
24	3160658	SLC1A1	9	solute carrier family 1 (neuronal/epithelial high affinity glutamate transporter, system Xag), member 1	8.5	37	38	37
25	2417528	DEPDC1	1	DEP domain containing 1	7.5	55	36	36
26	3258444	CEP55	10	centrosomal protein 55 kDa^¶	8.6	58	36	36
27	3260586	SCD	10	stearoyl-CoA desaturase (delta-9-desaturase)	11.3	44	36	36
28	3354799	CHEK1	11	CHK1 checkpoint homolog (S. pombe)^¶	8.9	64	35	35
29	3648391	TNFRSF17	16	tumor necrosis factor receptor superfamily, member 17	9.3	33	33	33
30	2720251	NCAPG	4	nonSMC condensin I complex, subunit G^¶	8.9	37	33	33
31	3720896	CDC6	17	cell division cycle 6 homolog (S. cerevisiae)^¶	8.3	31	34	31
Known EBV-inducible genes
32	3817380	EBI3	19	Epstein-Barr virus induced 3^*(45)	11.0	46	30	30
112	2377283	CR2	1	complement component (3d/Epstein Barr virus) receptor 2 ^*(12)	9.0	27	12	12
117	3848492	FCER2	19	Fc fragment of IgE, low affinity II, receptor for (CD23) ^*(12)	12.3	20	12	12
Known EBV-inducible genes, significant but with fold change <8.0
302	3332403	MS4A1	11	membrane-spanning 4-domains, subfamily A, member 1^*(12)	12.2	14	6	6
308	2438892	FCRL5	1	Fc receptor-like 5^*(51)	8.0	5	6	5
319	3063685	MCM7	7	minichromosome maintenance complex component 7^*(38)	11.1	6	5	5
422	2402459	STMN1	1	stathmin 1^*(20, 38)	10.5	4	6	4
438	3677752	TRAP1	16	TNF receptor-associated protein 1^*(38)	10.7	7	4	4
359	2440327	SLAMF1	1	signaling lymphocytic activation molecule family member 1^*	9.5	5	5	5
561	2901913	TUBB	6	tubulin, beta^*(38)	13.5	4	3	3
770	3259253	ENTPD1	10	ectonucleoside triphosphate diphosphohydrolase 1^*(12)	11.8	3	4	3
778	2317317	TP73	1	tumor protein p73^*(18)	8.7	4	3	3
912	2320683	TNFRSF8	1	tumor necrosis factor receptor superfamily, member 8^*(12)	9.5	2	2	2
977	3820443	ICAM1	19	intercellular adhesion molecule 1^*(68)	10.4	7	2	2
1064	2592268	STAT1	2	signal transducer and activator of transcription 1, 91 kDa ^*(49)	12.3	2	2	2
1216	2526759	ATIC	2	5-aminoimidazole-4-carboxamide ribonucleotide formyltransferase/IMP cyclohydrolase^*(38)	11.1	2	3	2
1323	2877508	HSPA9	5	heat shock 70 kDa protein 9 (mortalin)^*(38)	11.0	2	2	2
2389	3743906	TP53	17	tumor protein p53^*(41)	11.4	7	2	2

^a Ranked according to minimum FC.

^b RMA units, log₂ scale.

^c FC ratio of gene expression between LCL and PAX or between LCL and PBMC.

^d Minimum FC.

^* Epstein-Barr virus-inducible genes (literature reference given within parentheses).

^¶ Cell-cycle related genes by Gene Ontology (GO).

Table 5.

Genes overexpressed in PBMC

Rank^a	Transcript Cluster ID	Gene Symbol	Chr.	Description	Mean PBMC^b	PBMC/LCL^c	PBMC/PAX^c	Min. FC^d
1	3012978	GNG11	7	guanine nucleotide binding protein (G protein), gamma 11	9.9	35	41	35
2	2701081	P2RY12	3	purinergic receptor P2Y, G-protein coupled, 12^¶	7.7	21	18	18
3	3589458	THBS1	15	thrombospondin 1^¶	10.8	48	16	16
4	2761837	FGFBP2	4	fibroblast growth factor binding protein 2	9.5	33	15	15
5	3535780	PTGER2	14	prostaglandin E receptor 2 (subtype EP2), 53 kDa	10.1	17	14	14
6	3724545	ITGB3	17	integrin, beta 3 (platelet glycoprotein IIIa, antigen CD61)^¶	11.7	14	12	12
7	3891342	TUBB1	20	tubulin, beta 1	12.7	246	12	12
8	2773972	CXCL11	4	chemokine (C-X-C motif) ligand 11	5.6	10	11	10
9	3841506	LAIR2	19	leukocyte-associated immunoglobulin-like receptor 2	7.3	29	10	10
10	3866831	CABP5	19	calcium binding protein 5	5.4	14	9	9
11	2443417	SELP	1	selectin P (granule membrane protein 140 kDa, antigen CD62)^¶	9.2	29	9	9
12	2987544	LFNG	7	LFNG O-fucosylpeptide 3-beta-N-acetylglucosaminyltransferase	10.1	9	11	9
13	3729052	YPEL2	17	yippee-like 2 (Drosophila)	8.2	10	9	9
14	3417842	LRP1	12	low density lipoprotein receptor-related protein 1	10.6	13	8	8
15	3904508	SLA2	20	Src-like-adaptor 2	10.4	19	8	8
16	2783596	PDE5A	4	phosphodiesterase 5A, cGMP-specific^¶	7.8	28	8	8
17	3579114	BCL11B	14	B-cell CLL/lymphoma 11B (zinc finger protein)	10.0	8	10	8
18	2902609	C6orf25	6	chromosome 6 open reading frame 25	12.3	30	8	8
19	3188111	PTGS1	9	prostaglandin-endoperoxide synthase 1 (prostaglandin G/H synthase and cyclooxygenase)	11.4	16	8	8

^a Ranked according to minimum FC.

^b RMA units, log₂ scale.

^c FC ratio of gene expression between PBMC and LCL or between PBMC and PAX.

^d Minimum FC.

^¶ Platelet-related genes by GO.

Seven red blood cell-related genes were overexpressed in PAX compared with the other two RNA sources (Table 3), including the top-ranked GYPB (glycophorin B) gene. Hemoglobin and hemoglobin-related genes, including HBB, HBD, ALAS2, RHD, and AHSP, are seen in high-ranking positions. Many known erythrocyte-related genes, such as HEMGN, EPB42, EPB49, and SLC4A1, were also significantly higher in PAX but were seen above the eightfold cutoff only in the exon-level analysis. These observations clearly result from the fact that PAX samples are derived from whole blood, which comprises predominantly erythrocytes and reticulocytes as well as white blood cells. Genes associated with neutrophils, such as SLPI, were also overexpressed in PAX compared with LCL or PBMC (Table 3). Most of these genes were also moderately expressed in PBMC but markedly lower in LCL. The top GO (7) categories (see Table 6) for genes most highly expressed in PAX are related to RNA splicing and processing.
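The minimum fold change (Min. FC) used to rank genes in the overexpression tables can be illustrated with a minimal sketch. It assumes that each fold change is obtained from the difference of mean log₂ RMA values between the source of interest and one other source; the function name and the example values are hypothetical and are not taken from the study data.

```python
def min_fold_change(mean_log2, source):
    """Return (fold change vs. each other source, minimum fold change).

    mean_log2: dict mapping RNA source name -> mean expression (log2 RMA units).
    source:    the RNA source being tested for overexpression.
    """
    # A difference of d on the log2 scale corresponds to a 2**d fold change.
    fold_changes = {
        other: 2.0 ** (mean_log2[source] - value)
        for other, value in mean_log2.items()
        if other != source
    }
    return fold_changes, min(fold_changes.values())


# Hypothetical log2 means for a single gene (illustration only).
example = {"PBMC": 10.0, "LCL": 5.0, "PAX": 7.0}
fold_changes, min_fc = min_fold_change(example, "PBMC")
print(fold_changes, min_fc)
```

Ranking on the smaller of the two ratios is conservative: a gene scores highly only if it is overexpressed relative to both of the other RNA sources.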

Table 6.

GO categories for overexpressed genes in each RNA source

GO ID	GO Category	−log₁₀(P)^a	Significant Genes, n^b	Genes in Category, n
PAX
GO:0008380	RNA splicing	6	17	259
GO:0006397	mRNA processing	5	19	335
GO:0000375	RNA splicing, via transesterification reactions	5	13	166
GO:0000377	RNA splicing, via transesterification reactions with bulged adenosine as nucleophile	5	12	160
GO:0000398	nuclear mRNA splicing, via spliceosome	5	12	160
GO:0016071	mRNA metabolic process	4	21	462
GO:0006396	RNA processing	3	22	560
LCL
GO:0022402	cell cycle process	76	130	723
GO:0022403	cell cycle phase	69	103	462
GO:0007049	cell cycle	69	119	665
GO:0000278	mitotic cell cycle	47	69	293
GO:0051301	cell division	40	62	278
GO:0010564	regulation of cell cycle process	37	62	319
GO:0000087	M phase of mitotic cell cycle	35	37	90
GO:0000279	M phase	35	37	92
GO:0000236	mitotic prometaphase	29	32	82
GO:0006996	organelle organization	29	110	1356
GO:0071156	regulation of cell cycle arrest	29	45	208
GO:0051726	regulation of cell cycle	28	70	585
GO:0000075	cell cycle checkpoint	27	42	192
GO:0000280	nuclear division	26	40	175
GO:0007067	mitosis	26	40	175
GO:0048285	organelle fission	26	41	186
PBMC
GO:0030168	platelet activation	13	17	224
GO:0001775	cell activation	12	22	455
GO:0002576	platelet degranulation	10	10	76
GO:0050817	coagulation	10	19	434
GO:0007596	blood coagulation	10	19	434
GO:0007599	hemostasis	10	19	438
GO:0050878	regulation of body fluid levels	9	19	499
GO:0002376	immune system process	8	25	922
GO:0006955	immune response	8	19	545
GO:0050896	response to stimulus	7	64	5133
GO:0006887	exocytosis	7	10	156
GO:0007165	signal transduction	7	44	2942
GO:0006952	defense response	6	17	620
GO:0046903	secretion	5	13	391
GO:0051716	cellular response to stimulus	5	46	3515

^a Determined by GOrilla (26).

^b Overexpressed by at least 4-fold compared with other 2 RNA sources and met the FDR ≤0.05 threshold on gene-level analysis.
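To give a sense of how counts such as "Significant Genes, n" and "Genes in Category, n" translate into an enrichment P value, the following sketch computes a plain hypergeometric upper-tail probability. This is an illustrative assumption, not necessarily the exact statistic GOrilla reports. The background size and the number of selected genes are not stated in this section, so the values used in the example call are hypothetical.

```python
from math import comb, log10


def hypergeom_tail(n_background, n_category, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn from n_background,
    of which n_category belong to the GO category."""
    upper = min(n_category, n_selected)
    total = comb(n_background, n_selected)
    hits = sum(
        comb(n_category, i) * comb(n_background - n_category, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return hits / total


# Category size (259) and overlap (17) follow the "RNA splicing" row of Table 6;
# the background (17000) and selected-gene count (400) are hypothetical.
p = hypergeom_tail(17000, 259, 400, 17)
print(-log10(p))
```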

The top 32 genes specific to LCL (Table 4) are rich in cell cycle-related genes. For example, CDK1 (cyclin-dependent kinase 1) and CCNB2 (cyclin B2) are 68- and 65-fold overexpressed in LCL compared with the other two RNA sources. Top GO categories for genes most highly expressed in LCL (see Table 6) are related to cell cycle and mitosis, which are indicative of a cell line undergoing rapid cell division. Several genes known to be induced by EBV were also overexpressed in LCL. Of these, EBI3 (Epstein-Barr induced 3) was the most highly differentially expressed. Fifteen other known EBV-induced genes show significant overexpression of two- to sixfold. In general, exon-level analysis was more sensitive than gene-level analysis in identifying such genes (Supplementary Table S2). Comparison of LCL with PAX expression appeared to be generally more sensitive to EBV-induced differences than the comparison with PBMC (e.g., CR2, Table 4).

PBMC overexpression (Table 5) was seen in many genes known to be platelet specific or involved in coagulation. For example, P2RY12, THBS1, ITGB3, and PTGS1 were abundantly expressed in PBMC vs. PAX and PBMC vs. LCL. Evidently, the inclusion of platelets within the PBMC fraction is sufficient to allow detection of these genes (64). The top GO categories (Table 6) for genes most highly expressed in PBMC are related to immune response, platelet activation, and blood coagulation, reflecting the primary presence of lymphocytes and monocytes, and some platelets, in this sample type.

Analysis of Differential Blood Count Data

We were able to identify the associations of numerous genes with individual blood elements in the differential blood count (Table 7). In PAX samples, most of the genes with positive associations were associated with neutrophil or lymphocyte counts, while in PBMC, the genes were generally associated with lymphocyte (36, 70, 73) and monocyte counts, as would be expected based on the cell-type composition of these sources. Some of the neutrophil-associated genes, such as SLPI and IL1R2, are also reported in Table 3. As before, the exon-level analysis often detected more genes than did the gene-level analysis. GO analysis of these genes showed overrepresentation in the categories of immune system regulation, lymphocyte, leukocyte, and T-cell activation (Supplementary Tables S4–S6). In contrast, no genes were associated with the differential blood count in RNA from LCL. This may be due to the single cell type represented in LCL or because the LCL samples were derived from whole blood obtained 5 yr prior to the differential blood counts whereas PAX and PBMC were drawn at the same time as the blood counts were performed.

Table 7.

Number of genes with positive association^* with differential blood count, for each RNA source at the gene and exon level

	Gene Level			Exon Level^a		
	LCL	PAX	PBMC	LCL	PAX	PBMC
Red blood cell count	0	0	3	0	0	15 (14)
Neutrophil count	0	354	0	0	1,178 (475)	0
Lymphocyte count	0	636	131	0	354 (259)	458 (241)
Monocyte count	0	2	154	0	1 (1)	321 (200)
Eosinophil count	0	2	0	0	0	0
Basophil count	0	0	0	0	0	0
Platelet count	0	0	0	0	0	1 (1)

^* Significant at FDR ≤0.2 level, counting only genes with positive association.

^a Number of exons (number of genes).
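The FDR thresholds used throughout (e.g., FDR ≤0.2 for Table 7) can be illustrated with a short sketch of the Benjamini-Hochberg adjustment. This section does not state which FDR procedure was applied, so Benjamini-Hochberg is an assumption made here for illustration, and the P values in the example are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical P values for four probe sets.
pvals = [0.01, 0.04, 0.03, 0.20]
qvals = benjamini_hochberg(pvals)
significant = [q <= 0.2 for q in qvals]
print(qvals, significant)
```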

Identification of Genes Associated With Major CVD Risk Factors

Sex.

Not surprisingly, a search for biomarkers of sex in our study yielded many Y-chromosome genes (Table 8). All three sample types identified 128 exons within nine distinct genes residing on the Y-chromosome at the FDR ≤ 0.05 level. An additional 28 exons on 11 Y-chromosome genes were detected in one or more of the RNA sources, with PAX able to detect 16 exons, PBMC 15 exons, and LCL 9 exons at this FDR level. Only 14 exons of two X-linked genes [KDM5C and KDM6A, lysine (K)-specific demethylase 5C and 6A] were differentially expressed in women vs. men in all three RNA sources. However, 142 exons in 22 genes (Table 8) showed differential expression in at least one source. Several of these genes are obvious homologs to their Y-linked counterparts (DDX3X, EIF1AX, NLGN4X, PRKX, RPS4X, ZFX). Interestingly, in LCL samples more X-linked overexpression in women was detected, with 99 additional exons beyond those detected in all three sources, compared with 28 for PAX and 25 for PBMC. The key gene responsible for X inactivation (XIST), which is ordinarily highly overexpressed in women, is less overexpressed in women in LCL samples (Supplementary Table S7) than in PAX or PBMC (female to male fold-change of 25 in LCL, 140 in PAX, and 52 in PBMC). Furthermore, we observed that XIST expression in LCL in women is significantly correlated with 339 of 648 X-linked genes (FDR ≤ 0.2 genome wide). For the majority of these genes (206), the correlation with XIST is negative, further supporting the idea that XIST-mediated X inactivation is substantially and variably disrupted by EBV infection/transformation and/or culture conditions of the LCL samples.

Table 8.

X- and Y-linked exons detected as expression biomarkers of sex using 3 RNA sources

			Number of Exons Detected by		Number of Additional Exons Detected with:
Gene Symbol	Description	Total Exons	Any of 3	All 3	LCL	PAX	PBMC
Y-chromosome
CYorf15B	chromosome Y open reading frame 15B	18	13	9	0	4	2
DDX3Y	DEAD (Asp-Glu-Ala-Asp) box polypeptide 3, Y-linked	22	20	16	4	0	3
EIF1AY	eukaryotic translation initiation factor 1A, Y-linked	9	8	7	1	0	1
NLGN4Y	neuroligin 4, Y-linked	6	1	0	1	0	0
PRKY	protein kinase, Y-linked	16	7	4	1	2	1
PRY	PTPN13-like, Y-linked	22	1	0	0	1	0
RPS4Y1	ribosomal protein S4, Y-linked 1	12	12	12	0	0	0
RPS4Y2	ribosomal protein S4, Y-linked 2	7	4	2	0	1	1
TMSB4Y	thymosin beta 4, Y-linked	6	2	0	0	1	2
USP9Y	Ubiquitin-specific peptidase 9, Y-linked	53	40	38	1	2	0
UTY	ubiquitously transcribed tetratricopeptide repeat gene, Y-linked	52	42	35	0	5	4
ZFY	zinc finger protein, Y-linked	10	6	5	1	0	1
Total		233	156	128	9	16	15
X-chromosome
DDX3X	DEAD (Asp-Glu-Ala-Asp) box polypeptide 3, X-linked	28	6	0	3	2	2
EIF1AX	eukaryotic translation initiation factor 1A, X-linked	15	6	0	5	2	0
EIF2S3	eukaryotic translation initiation factor 2, subunit 3 gamma, 52 kDa	15	4	0	4	1	0
HDHD1A	haloacid dehalogenase-like hydrolase domain containing 1A	7	4	0	4	1	0
KDM5C	lysine (K)-specific demethylase 5C	37	27	5	20	2	5
KDM6A	lysine (K)-specific demethylase 6A	42	29	9	17	2	6
NLGN4X	neuroligin 4, X-linked	27	2	0	2	0	0
PLCXD1	phosphatidylinositol-specific phospholipase C, X domain containing 1	26	1	0	0	0	1
PNPLA4	patatin-like phospholipase domain containing 4	8	5	0	5	0	0
PRKX	protein kinase, X-linked	18	6	0	2	3	3
RPS4X	ribosomal protein S4, X-linked	11	8	0	8	1	0
SEPT6	septin 6	16	2	0	0	0	2
SMC1A	structural maintenance of chromosomes 1A	32	12	0	12	2	0
STS	steroid sulfatase (microsomal), isozyme S	24	8	0	8	0	0
TCEANC	transcription elongation factor A (SII) N-terminal and central domain containing	8	1	0	0	1	0
TIMM8A	translocase of inner mitochondrial membrane 8 homolog A (yeast)	4	1	0	0	1	0
TXLNG	taxilin gamma	14	4	0	0	2	2
VSIG4	V-set and immunoglobulin domain containing 4	14	3	0	0	3	0
XPNPEP2	X-prolyl aminopeptidase (aminopeptidase P) 2, membrane-bound	28	2	0	0	0	2
ZFX	zinc finger protein, X-linked	13	6	0	6	4	0
ZRSR2	zinc finger (CCCH type), RNA-binding motif and serine/arginine rich 2	20	4	0	2	1	2
ZXDB	zinc finger, X-linked, duplicated B	12	1	0	1	0	0
Total		419	142	14	99	28	25

There were 90 autosomal exons (74 genes) associated with sex at FDR ≤ 0.05; none was significant in more than one RNA source (Table 9). PBMC identified exons from 52 genes, while LCL and PAX identified 19 genes each.

Table 9.

Autosomal genes associated with sex

Transcript cluster ID	Gene Symbol	Chr.	Description	Effect LCL^*	Effect PAX^*	Effect PBMC^*
3631397	UACA	15	uveal autoantigen with coiled-coil domains and ankyrin repeats	0.43	0.29	0.14
2880361	JAKMIP2	5	janus kinase and microtubule interacting protein 2	0.25	0.60	0.69
3712675	RAI1	17	retinoic acid induced 1	−0.22	−0.10	0.13
3373946	TIMM10	11	translocase of inner mitochondrial membrane 10 homolog (yeast)	0.17	0.15	0.36
3725602	ABI3	17	ABI family, member 3	−0.16	0.03	0.40
2439101	FCRL1	1	Fc receptor-like 1	−0.12	−0.58	−0.81
2893109	LOC100129033	6	QIQN5815	−0.10	−0.58	0.07
3857811	C19orf12	19	chromosome 19 open reading frame 12	0.08	0.20	0.30
3223687	PHF19	9	PHD finger protein 19	0.04	0.03	0.31
3264621	TCF7L2	10	transcription factor 7-like 2 (T-cell specific, HMG-box)	−0.04	0.22	0.61
3417184	SUOX	12	sulfite oxidase	−0.04	0.06	0.35
3543935	COQ6	14	coenzyme Q6 homolog, monooxygenase (S. cerevisiae)	−0.04	−0.26	0.06
2607055	PASK	2	PAS domain containing serine/threonine kinase	0.04	−0.18	−0.21
3870990	GP6	19	glycoprotein VI (platelet)	0.04	−0.36	−0.24
3534866	MGAT2	14	mannosyl (alpha-1,6-)-glycoprotein beta-1,2-N-acetylglucosaminyltransferase	0.02	−0.04	0.26
3940992	ASPHD2	22	aspartate beta-hydroxylase domain containing 2	0.01	0.15	0.40

^* In log₂ RMA units. Positive effects indicate higher expression in males. Partial list; genes are significant in at least 1 RNA source, FDR ≤0.2.

Smoking.

Several probe sets in the CHRNA3 (cholinergic receptor, nicotinic, alpha 3) gene were downregulated in smokers in all three RNA sources in our study, though not significantly (P < 0.10). PAX and PBMC samples showed a stronger tendency toward downregulation (P = 0.06) on probe set ID 3634334. Variants of CHRNA3 have been associated with smoking behavior and susceptibility to lung cancer (5). Genetic variants in ALDH2 (aldehyde dehydrogenase 2) have been studied extensively in relation to smoking and lung cancer risk (57). In our study, LCL samples detected 1.36- to 2.84-fold lower expression of this gene in smokers, with 14 of 16 probe sets having P values < 0.05. Average expression of all 16 exons differed significantly in LCL samples (P = 0.004), was borderline for PBMC (P = 0.054), but did not differ for PAX (P = 0.247).

Age.

The relatively narrow age range of the participants hindered biomarker detection for age. Nevertheless, five genes were associated with age (P < 0.05) for each of the three RNA sources (Supplementary Table S8). One of them, TP53, has been associated with senescence (28). The magnitude of expression differences was small with only three out of five genes having the same directional difference in all three RNA sources.

HDL cholesterol levels.

Since the small sample size hindered discovery of gene expression signatures of HDL cholesterol, we sought to confirm previously observed associations with HDL. Four such genes were seen to be associated in PAX at P < 0.05, four were associated in PBMC and none in LCL (Supplementary Table S9). Two genes, FADS1 (fatty acid desaturase 1) and LDLR (low-density lipoprotein receptor), were associated with HDL levels in both PAX and PBMC, with small but consistent inverse associations of higher expression with lower HDL. These genes are known to influence circulating lipid levels and risk of coronary artery disease (78). The remaining CVD risk factors listed in Table 1, including BMI, total cholesterol, and blood pressure, were analyzed but did not reveal any significant association with gene expression.

Robust and Consistent Markers

“Fingerprinting” genes.

We identified a number of exons that strongly distinguished individual participants, irrespective of RNA source. These fingerprinting exons have robust expression levels (i.e., their relative expression is independent of RNA source) and may allow for identification of individuals within a large study sample.

We selected 423 such exons drawn from 247 distinct genes having statistical significance (Table 10, Supplementary Table S10). Among the top results were several histocompatibility antigen genes (HLA-DRB1, HLA-DRB5, HLA-DPB1, HLA-B, HLA-DQA2, HLA-DQB2). HLA genes code for antigenic surface proteins used by the immune system to recognize “self” and thus are highly specific to an individual's ancestry. These genes have been suggested as biomarkers for autoimmune diseases (56, 60). These 423 selected exons were able to cluster the three samples from each participant perfectly. Indeed, a subset of only 38 autosomal exons exhibiting the largest F-ratio for participant effects together with five exons on the X- and Y-chromosomes were sufficient to cluster the participants perfectly (Fig. 3). Of note, these fingerprinting markers include one exon of the β-actin gene (ACTB), commonly used as a calibration standard or housekeeping gene. This ACTB exon exhibits a strongly bimodal expression pattern (Fig. 4), possibly due to the influence of an underlying or associated SNP. A similar bimodal pattern is also seen in other probe sets (Fig. 5), such as exons of genes GSTM1, HLA-DRB1, and OAS1. OAS1, which encodes a protein vital to immune response to viral infection, is associated with multiple diseases (40) and contains common functional variation that strongly affects exon inclusion (58). In the case of GSTM1, the bimodal pattern is evident in eight consecutive probe sets covering seven distinct exons, suggesting a true pattern of bimodal expression or extensive splice variation, rather than the direct influence of a single SNP. GSTM1 is an important drug and xenobiotic metabolizing enzyme that is known to exhibit common copy number variation that likely contributes to the observed bimodal pattern of expression (33). The complete list of fingerprinting exons is given in Supplementary Table S10.
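The F-ratio for participant effects mentioned above can be sketched for a single probe set as follows. The sketch assumes a balanced, additive two-way layout (every participant measured once in every RNA source, at least two participants and two sources) in which the RNA-source effect is removed by centering within source; the exact model fitted in the study may differ, and the function name and toy values are hypothetical.

```python
from statistics import mean


def participant_f_ratio(values):
    """F-ratio for the participant effect of one probe set.

    values: dict participant -> dict RNA source -> log2 expression.
    Assumes every participant has a value for every source.
    """
    participants = list(values)
    sources = sorted(next(iter(values.values())))
    # Remove the RNA-source effect by centering within each source.
    source_mean = {s: mean(values[p][s] for p in participants) for s in sources}
    centered = {
        p: [values[p][s] - source_mean[s] for s in sources] for p in participants
    }
    participant_mean = {p: mean(centered[p]) for p in participants}

    n_p, n_s = len(participants), len(sources)
    # After centering, the grand mean is zero, so no further subtraction is needed.
    ss_participant = n_s * sum(m ** 2 for m in participant_mean.values())
    ss_residual = sum(
        (x - participant_mean[p]) ** 2 for p in participants for x in centered[p]
    )
    df_participant = n_p - 1
    df_residual = (n_p - 1) * (n_s - 1)
    if ss_residual == 0:
        return float("inf")
    return (ss_participant / df_participant) / (ss_residual / df_residual)


# Toy data: two participants, two sources (illustration only).
toy = {"A": {"LCL": 1.0, "PAX": 3.0}, "B": {"LCL": 5.0, "PAX": 6.0}}
print(participant_f_ratio(toy))
```

A large ratio means that between-participant differences dominate the variation left after the source effect is removed, which is the property that makes an exon useful for fingerprinting.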

Table 10.

Partial list of exons with FDR ≤0.05 for the participant effect and SD across participants of at least 2-fold for this effect

Probe Set ID^a	Gene Symbol	Chr.	Description	F(subject)^b	Det. Exons^c	Total Exons, n^d
4030178	DDX3Y	Y	DEAD (Asp-Glu-Ala-Asp) box polypeptide 3, Y-linked	85.2	18	22
4035087	UTY	Y	ubiquitously transcribed tetratricopeptide repeat gene, Y-linked	74.4	33	52
4030146	USP9Y	Y	ubiquitin-specific peptidase 9, Y-linked	73.5	34	53
4028553	RPS4Y1	Y	ribosomal protein S4, Y-linked 1	46.4	12	12
3764386	SUPT4H1	17	suppressor of Ty 4 homolog 1 (S. cerevisiae)	46.0	1	11
4048279	HLA-DRB1	6	major histocompatibility complex, class II, DR beta 1	45.9	4	8
2350995	GSTM1	1	glutathione S-transferase mu 1	44.4	3	14
3717652	ZNF207	17	zinc finger protein 207	43.7	1	24
3505812	PARP4	13	poly (ADP-ribose) polymerase family, member 4	41.3	1	43
4031141	EIF1AY	Y	eukaryotic translation initiation factor 1A, Y-linked	41.2	7	9
2825746	HSD17B4	5	hydroxysteroid (17-beta) dehydrogenase 4	38.9	1	26
4028588	ZFY	Y	zinc finger protein, Y-linked	35.4	5	10
3988474	DOCK11	X	dedicator of cytokinesis 11	34.7	1	57
3036926	ACTB	7	actin, beta	34.7	1	11
3432446	OAS1	12	2′,5′-oligoadenylate synthetase 1, 40/46 kDa	31.5	1	15
2367199	BAT2L2	1	HLA-B associated transcript 2-like 2	31.3	1	48
3304629	NT5C2	10	5′-nucleotidase, cytosolic II	28.6	1	22
2984580	SFT2D1	6	SFT2 domain containing 1	28.4	1	9
3462877	NAP1L1	12	nucleosome assembly protein 1-like 1	27.9	1	22
4028462	CD99	Y	CD99 molecule	25.5	1	27
4048249	HLA-DRB5	6	major histocompatibility complex, class II, DR beta 5	25.2	5	11
2727952	EXOC1	4	exocyst complex component 1	24.7	1	25
3831276	ZNF146	19	zinc finger protein 146	24.5	1	8
4025365	IDS	X	iduronate 2-sulfatase	23.7	1	20
2469139	TAF1B	2	TATA box binding protein (TBP)-associated factor, RNA polymerase I, B, 63 kDa	23.5	1	17
3067144	COG5	7	component of oligomeric golgi complex 5	23.4	1	33
2366603	C1orf112	1	SCY1-like 3 (S. cerevisiae)	23.1	1	36
2903428	HLA-DPB1	6	major histocompatibility complex, class II, DP beta 1	23.1	1	8
2367974	RABGAP1L	1	RAB GTPase-activating protein 1-like	21.6	1	47
2418460	CRYZ	1	crystallin, zeta (quinone reductase)	21.3	1	15
3105938	CPNE3	8	copine III	20.2	1	22
2989124	ZDHHC4	7	zinc finger, DHHC-type containing 4	19.4	1	13
2603075	SP110	2	SP110 nuclear body protein	19.4	1	24
3975522	KDM6A	X	lysine (K)-specific demethylase 6A	19.1	1	42
2821406	ERAP2	5	endoplasmic reticulum aminopeptidase 2	18.8	22	27
3004680	ZNF138	7	zinc finger protein 138	18.7	1	15
3395427	HSPA8	11	heat shock 70 kDa protein 8	18.5	1	17
3238248	MLLT10	10	myeloid/lymphoid or mixed-lineage leukemia (trithorax homolog, Drosophila); translocated to, 10	17.7	1	38
4015713	BTK	X	Bruton agammaglobulinemia tyrosine kinase	17.5	1	24
2518349	ITGA4	2	integrin, alpha 4 (antigen CD49D, alpha 4 subunit of VLA-4 receptor)	17.4	1	40
3707765	MIS12	17	MIS12, MIND kinetochore complex component, homolog (S. pombe)	16.5	1	5
2816364	IQGAP2	5	IQ motif containing GTPase-activating protein 2	16.5	1	45
3517836	KLF12	13	Kruppel-like factor 12	16.1	1	23
2542747	LAPTM4A	2	lysosomal protein transmembrane 4 alpha	15.8	1	10
2948952	HLA-B	6	major histocompatibility complex, class I, B	15.6	1	12
2351023	GSTM5	1	glutathione S-transferase mu 5	15.5	1	11
3056088	BAZ1B	7	bromodomain adjacent to zinc finger domain, 1B	15.3	1	39
3385778	CTSC	11	cathepsin C	15.2	1	15
3932139	PSMG1	21	proteasome (prosome, macropain) assembly chaperone 1	14.9	1	11
2961826	PHIP	6	pleckstrin homology domain-interacting protein	14.9	1	48
3879393	PLK1S1	20	polo-like kinase 1 substrate 1	14.4	1	19
2723770	TBC1D1	4	TBC1 (tre-2/USP6, BUB2, cdc16) domain family, member 1	14.4	1	32
3576822	TRIP11	14	thyroid hormone receptor interactor 11	14.3	1	30
2903265	HLA-DQA2	6	major histocompatibility complex, class II, DQ alpha 2	14.3	1	5
3996335	RPL10	X	ribosomal protein L10	14.2	1	14
3485880	EXOSC8	13	exosome component 8	14.2	1	12
3584495	SNRPN	15	small nuclear ribonucleoprotein polypeptide N	14.0	1	23
4031106	CYorf15B	Y	chromosome Y open reading frame 15B	13.6	10	18
3425122	C12orf29	12	chromosome 12 open reading frame 29	13.5	1	11
2530554	MFF	2	mitochondrial fission factor	13.2	1	14
3243742	BMS1	10	BMS1 homolog, ribosome assembly protein (yeast)	13.2	1	31
3169339	ALDH1B1	9	aldehyde dehydrogenase 1 family, member B1	13.1	1	10
2739191	CCDC109B	4	coiled-coil domain containing 109B	13.0	1	12
2571102	ANAPC1	2	anaphase-promoting complex subunit 1	12.8	1	70
3046682	TRGV5	7	TCR gamma alternate reading frame protein	12.7	1	1
2446619	STX6	1	syntaxin 6	12.7	1	11
4031175	RPS4Y2	Y	ribosomal protein S4, Y-linked 2	12.7	2	7
3458101	NACA	12	nascent polypeptide-associated complex alpha subunit	12.0	1	10
2350940	GSTM4	1	glutathione S-transferase mu 4	11.9	1	12
2676049	WDR82	3	WD repeat domain 82	11.6	1	12
3907879	ELMO2	20	engulfment and cell motility 2	11.4	1	27
3759912	LRRC37A4	17	leucine-rich repeat containing 37, member A4 (pseudogene)	11.3	3	18
3315556	PSMD13	11	proteasome (prosome, macropain) 26S subunit, nonATPase, 13	11.2	1	16
2369585	SOAT1	1	sterol O-acyltransferase 1	11.2	1	21
2492088	KDM3A	2	lysine (K)-specific demethylase 3A	10.8	1	35
3003206	CCT6A	7	chaperonin-containing TCP1, subunit 6A (zeta 1)	10.7	1	23
2821249	CAST	5	calpastatin	10.7	1	42
3641887	LINS1	15	lines homolog 1 (Drosophila)	10.3	1	16
3971880	EIF2S3	X	eukaryotic translation initiation factor 2, subunit 3 gamma, 52 kDa	10.3	1	15
3850437	KRI1	19	KRI1 homolog (S. cerevisiae)	10.1	1	24
3908171	ZMYND8	20	zinc finger, MYND-type containing 8	10.0	1	33
4029193	PRKY	Y	protein kinase, Y-linked	9.7	3	16
2501343	LOC654433	2	hypothetical LOC654433	9.6	1	8
3487448	DNAJC15	13	DnaJ (Hsp40) homolog, subfamily C, member 15	9.4	1	8
3462702	KRR1	12	KRR1, small subunit (SSU) processome component, homolog (yeast)	9.2	1	17
3742635	C17orf87	17	chromosome 17 open reading frame 87	9.0	1	6
3140703	STAU2	8	staufen, RNA binding protein, homolog 2 (Drosophila)	9.0	1	29

^a ID of the probe set of the gene with the highest F statistic.

^b F statistic of the top probe set of the gene.

^c Number of exons of the gene with ≥2 F-statistics.

^d Total number of exons of the gene in the annotation. The probe sets are sorted by the F-score of the participant.

Fig. 3. — Heat map of expression values of the top “fingerprinting” probe sets. These probe sets have the largest F-values for participant effects (Table 10, Supplementary Table S10) and are clustered with Ward's hierarchical clustering. The first 5 are Y-linked. Colors indicate expression values after subtracting the mean within RNA source, with red having high values. The participants are clustered perfectly in groups of 3, 1 for each RNA source indicating that perfect self-identification is possible from expression data, even across different RNA sources. Participant number indicated at bottom.

Fig. 4. — Expression profile of gene β-actin (*ACTB*) on chromosome 7p22. *ACTB* has 15 known exons and 14 known RefSeq transcripts. According to the Alternative Splicing Database (72), this gene is known to have 10 splice variants. A: gene level profile; B: 1 of 11 RefSeq-core probe sets, 3036926. Since the bimodal distribution is seen at the gene level, it is not likely to be solely the result of splice variation.

Fig. 5. — Expression profile of some fingerprinting probe sets. A: *EXOC1* probe set 2727952, B: *OAS1* probe set 3432446, C: *GSTM1* probe set 2350993, and D: *HLA-DRB1* probe set 4048279. These probe sets show strong, participant-specific variation in expression consistently in all 3 sample types and may reflect genetically determined variation in expression levels (e.g., functional SNPs, CNVs, imprinting, LOH) or variation in mRNA sequence in each participant (e.g., SNP) compared with the Affymetrix probe sequence.

Stable “calibration” genes.

Conversely, we also searched for genes expressed above background (>4.0 in log₂ RMA scale) and that had nonsignificant expression changes (<2.0-fold change, P value >0.2) across RNA sources and across participants. These genes would be valuable for batch corrections, meta-analysis across RNA sources or platforms, and for calibrating expression levels of transcripts of other genes (17). We found 139 genes meeting these criteria (Supplementary Table S11). Most are well-known and well-annotated protein coding genes. Many are known to be expressed in whole blood. Some of the most stable genes were CLCN6, TEAD3, ART5, COX6A2, SIRT5, ACTL6B, GPR50, GPR32, and RAB8B. Although these may not commonly be used as housekeeping genes, they are likely to be quite stable as calibration standards in future analyses using this platform. At the exon level, we found 1,544 exons representing 1,355 genes that passed similar selection criteria. Of these exons, 25 (representing 22 distinct genes) were common to the set selected at the gene level, including CLCN6, CSNK1G3, FAM48A, and RAB8B.
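The selection criteria for calibration genes can be expressed as a simple filter. The sketch below assumes that a mean expression level, a fold change, and a P value for each of the two effects (RNA source and participant) have already been computed for every gene; how those summaries were derived in the study is not restated here, and the function and argument names are hypothetical.

```python
def is_calibration_candidate(
    mean_log2,
    fold_source,
    p_source,
    fold_participant,
    p_participant,
    background=4.0,
    max_fold=2.0,
    min_p=0.2,
):
    """Apply the thresholds quoted in the text to one gene.

    mean_log2:        mean expression in log2 RMA units.
    fold_source:      fold change across RNA sources (>= 1).
    fold_participant: fold change across participants (>= 1).
    p_source, p_participant: P values for the two effects.
    """
    above_background = mean_log2 > background
    stable_across_sources = fold_source < max_fold and p_source > min_p
    stable_across_participants = fold_participant < max_fold and p_participant > min_p
    return above_background and stable_across_sources and stable_across_participants


# Hypothetical gene summaries (illustration only).
print(is_calibration_candidate(8.0, 1.3, 0.5, 1.5, 0.4))
print(is_calibration_candidate(3.5, 1.3, 0.5, 1.5, 0.4))
```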

DISCUSSION

Each of the three RNA sources bears distinct characteristics, as evidenced by the clear separation in the first two principal components (Fig. 2) and the finding that most genes were differentially expressed among the different sources (Table 2). Since most genes are expressed differentially across the RNA sources, their associations with each of the traits we studied are also different, warranting careful selection of the RNA source in a gene expression experiment. For the gene expression signature of sex, all three RNA sources yielded a large common subset of Y-chromosome genes strongly linked to sex. LCL samples were able to detect expression differences in X-chromosome genes between men and women, but this may be due to reversal of X-chromosome inactivation during EBV infection, cell immortalization, and culture. PBMC were better able to detect sex-linked autosomal genes than the other two RNA sources, although apparently none of the detected genes were also detected in prior studies (39), suggesting that our observation may be unique to our sample.

As cultured cells, LCL samples are less likely than PAX or PBMC samples to reflect in vivo expression changes. For example, LCL did not detect association between lymphocyte-related genes and lymphocyte differential counts. These findings, together with the perturbation of expression attributable to the EBV transformation process itself, suggest that LCL may be of limited value in identifying expression signatures of many health-related traits. Prior work has shown limitations in the use of expression signatures in LCL due to their ex vivo status (16, 21). However, the ability of LCL to detect downregulation of ALDH2 in smokers suggests that epigenetic influences conditioned by the environment may still be encoded in LCL expression profiles.

It is important to note that a proportion of the differences observed between PAXgene and the other two sample types may be due to differences in preparation kits. As noted in materials and methods, PAX require a distinct preparation kit from that used for PBMC and LCL. However, by focusing on the minimal difference observed between each type vs. the other two (see Table 3), we attempted to report differences most likely attributable to underlying biological differences rather than simply due to technical sources. For example, the comparison of LCL with PBMC (which use the same preparation kit) shows very large differences for genes involved in cell-cycle pathways, as might be expected in transformed LCL cells.

Our study has several important advantages over prior studies. A balanced study design with three blood-derived RNA sources from each of 35 participants allows investigation of biomarkers and source-invariant genes to be undertaken more thoroughly. Indeed, few population-based expression studies include replicate samples in as many participants as are included here. This study includes multiple samples from the same individual, separated in time by as much as 5 yr. Expression patterns that persist across these samples are more likely to represent true stable phenotypes of the individual, than are those based on single, one-time measurements. Genes and exons showing variation in expression across the population, yet remaining consistent within the individual over years are likely to be enriched in useful expression biomarkers of risk factors or disease, compared with other genes. Furthermore, such genes and exons may be more likely to be associated with genetic factors (such as expression single nucleotide polymorphisms), than are genes having greater within-individual variation.

We showed that some of the genes or exons showing variation in expression across our study sample can be used to distinguish individuals, suggesting that microarray expression data alone provide a personally identifiable fingerprint. In our study, only a tiny fraction of all exons distinguished individuals perfectly. This finding may prompt consideration of the identifiability of individuals within public microarray databases and whether safeguards are needed to protect their privacy. Conversely, we also provided results on stable and robust markers that may help researchers to calibrate their gene expression results. Calibration has been one of the major issues in gene expression analysis. We showed that conventional calibration genes, such as ACTB, may not be reliable.

We believe fingerprinting genes are useful in two contexts. First, in quality control of high-throughput assays, the identity of samples is sometimes questioned. Estimates of sample mix-ups often range up to 18% (79). If left unaddressed, this can introduce errors in the analysis and may lead to weakened or incorrect conclusions (47). Indeed, some mix-ups were detected in the current study by aligning predicted sex based on Y-chromosome expression with that recorded in the database for the subject. When multiple samples from the same individuals are assayed, analysis of fingerprinting gene expression levels can be used to further identify mislabeled samples by clustering of such genes. The second context would be in searching for eQTLs (expression quantitative trait loci). A quantitative trait should be tightly coupled to the genome and recognizable regardless of when or in what tissue the gene expression level is measured. The set of fingerprinting genes is here shown to be stable within an individual (in the small number of tissues tested) and over time (since the LCL cells were derived from an earlier blood draw, compared with the PAX and PBMC samples), and these genes are thus good candidate quantitative traits. In searching for eQTLs, i.e., loci in the genome associated with quantitative traits, the fingerprinting genes should be an excellent place to start. It has previously been noted that some genes are expressed in a bimodal fashion in the population (e.g., ACTB) and that a disproportionately large number of such genes have associations to disease (48). Many of our fingerprinting genes appear to express bimodally. Thus, it is reasonable to hypothesize that our fingerprinting genes might also contain a large fraction of genes (e.g., the HLA genes) related to disease or disease propensities.
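The sex-based check for sample mix-ups described above can be sketched as follows. The sketch assumes that sex is predicted by comparing the mean log₂ expression of Y-linked probe sets against a fixed threshold; the threshold value, function names, and sample records are hypothetical, and the study's actual prediction rule is not specified in this section.

```python
from statistics import mean


def predict_sex(y_probe_log2, threshold=6.0):
    """Call 'M' when mean Y-linked expression exceeds a (hypothetical) threshold."""
    return "M" if mean(y_probe_log2) > threshold else "F"


def flag_mismatches(samples, threshold=6.0):
    """Return IDs of samples whose predicted sex disagrees with the record.

    samples: dict sample_id -> (recorded_sex, list of log2 expression values
             for Y-linked probe sets such as those of RPS4Y1 or DDX3Y).
    """
    return [
        sample_id
        for sample_id, (recorded_sex, values) in samples.items()
        if predict_sex(values, threshold) != recorded_sex
    ]


# Hypothetical records: s3 is recorded as female but shows Y-linked expression.
records = {
    "s1": ("M", [9.0, 10.0]),
    "s2": ("F", [3.0, 4.0]),
    "s3": ("F", [9.5, 10.5]),
}
print(flag_mismatches(records))
```

A flagged sample is only a candidate mix-up; confirming it would require the clustering of fingerprinting exons across the samples attributed to the same participant, as the text notes.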

Our study considered only blood-derived RNA sources, because blood is one source likely to be widely available in a large population-based study. Although a desired tissue, such as brain in stroke patients, may be inaccessible, one can sometimes use blood as a surrogate, provided the relevant transcripts are similarly expressed in blood and brain. In certain situations (e.g., angioplasty, heart transplant, or coronary artery bypass graft surgery), it may be possible to obtain paired blood and heart tissue samples, from which the relevant transcripts expressed similarly in both can be determined. Accumulating such information will ultimately make blood-derived expression data in population-based studies more valuable.

A larger sample size would have improved our power for biomarker discovery. The relatively narrow age range in this study likely prevented detection of extensive associations with age. In addition, analysis of many complex traits influenced by multiple genes, each having modest effects (29), will require larger sample sizes. A larger sample size (or combining the results of many studies) would have the additional benefit of further characterizing the measurement platform. The Affymetrix Exon array has ∼1.4 million probe sets, of which only about one-fourth were analyzed here. These probe sets were used because they correspond to well-annotated transcripts and have good performance characteristics. Many of the remaining probe sets have unknown performance characteristics or correspond to unannotated regions of the genome or to weakly annotated genes. Pooling experience from the growing number of published results on this platform will allow us to focus more sharply on the better-performing probe sets, while the general improvement of genome annotation will make other probe sets more useful in the future.

Although our pilot study was small and not intended for biomarker discovery, we were able to confirm associations of expression with lipid levels in two previously implicated genes, FADS1 and LDLR. While the observed effects were small, the magnitude, direction, and significance were consistent in PAX and PBMC samples, but not in LCL. This again suggests that LCL samples are less appropriate for detecting expression signatures of health-related traits. The ability of even a small study to confirm associations with these well-established lipid-controlling genes lends optimism that more associations would be detected in a larger study using either PAX or PBMC. Based on the results of this pilot, the larger, population-based SABRe in CVD Initiative will use PAX as its RNA source and the Affymetrix Exon array as its platform. Completion of data collection is anticipated in late 2011.

GRANTS

The National Heart, Lung, and Blood Institute's (NHLBI's) FHS is supported by National Institutes of Health Grant N01-HC-25195. The SABRe CVD Initiative is funded by the Division of Intramural Research, NHLBI, Bethesda, MD.

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

DISCLOSURES

No conflicts of interest, financial or otherwise, are declared by the author(s).

AUTHOR CONTRIBUTIONS

D. L. and P. J. M. designed, directed, and supervised the experiment. D. L. was responsible for funding of the project. R. J. and P. J. M. drafted the manuscript. P. J. M., D. L., A. D. J., and C. J. O. revised and edited the manuscript. R. J. and P. J. M. performed the statistical analysis. J. J. B. performed S10 normalization of the data. N. R., P. L., and K. A. W. collected the data. All authors have read and approved the final version of the manuscript.

Supplementary Material

Supplemental Material

suppmat.pdf (1,000.5 KB, pdf)

Footnotes

The online version of this article contains supplemental material.

REFERENCES

1. Affymetrix. Transcript assignment for NetAffx(TM) Annotations [online]. http://www.affymetrix.com/support/technical/byproduct.affx?product=huexon-st, 2006.
2. Affymetrix. Quality Assessment of Exon and Gene Arrays, 2007.
3. Affymetrix. GeneChip Whole Transcript (WT) Sense Target Labeling Assay Manual [online]. http://www.affymetrix.com/support/downloads/manuals/wt_sensetarget_label_manual.pdf.
4. Affymetrix. Affymetrix Power Tools [online]. http://www.affymetrix.com/partners_programs/programs/developer/tools/powertools.affx.
5. Amos CI, Wu X, Broderick P, Gorlov IP, Gu J, Eisen T, Dong Q, Zhang Q, Gu X, Vijayakrishnan J, Sullivan K, Matakidou A, Wang Y, Mills G, Doheny K, Tsai YY, Chen WV, Shete S, Spitz MR, Houlston RS. Genome-wide association scan of tag SNPs identifies a susceptibility locus for lung cancer at 15q25.1. Nat Genet 40: 616–622, 2008.
6. Asare AL, Kolchinsky SA, Gao Z, Wang R, Raddassi K, Bourcier K, Seyfert-Margolis V. Differential gene expression profiles are dependent upon method of peripheral blood collection and RNA isolation. BMC Genomics 9: 474, 2008.
7. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25: 25–29, 2000.
8. Avent ND, Reid ME. The Rh blood group system: a review. Blood 95: 375–387, 2000.
9. Baechler EC, Batliwalla FM, Karypis G, Gaffney PM, Ortmann WA, Espe KJ, Shark KB, Grande WJ, Hughes KM, Kapur V, Gregersen PK, Behrens TW. Interferon-inducible gene expression signature in peripheral blood cells of patients with severe lupus. Proc Natl Acad Sci USA 100: 2610–2615, 2003.
10. Barr TL, Conley Y, Ding J, Dillman A, Warach S, Singleton A, Matarin M. Genomic biomarkers and cellular pathways of ischemic stroke by RNA gene expression profiling. Neurology 75: 1009–1014, 2010.
11. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Statist Soc Ser B 57: 289–300, 1995.
12. Birkenbach M, Josefsen K, Yalamanchili R, Lenoir G, Kieff E. Epstein-Barr virus-induced genes: first lymphocyte-specific G protein-coupled peptide receptors. J Virol 67: 2209–2220, 1993.
13. Blumenfeld OO, Huang CH. Molecular genetics of the glycophorin gene family, the antigens for MNSs blood groups: multiple gene rearrangements and modulation of splice site usage result in extensive diversification. Hum Mutat 6: 199–209, 1995.
14. Boldrick JC, Alizadeh AA, Diehn M, Dudoit S, Liu CL, Belcher CE, Botstein D, Staudt LM, Brown PO, Relman DA. Stereotyped and specific gene expression programs in human innate immune responses to bacteria. Proc Natl Acad Sci USA 99: 972–977, 2002.
15. Bourke E, Cassetti A, Villa A, Fadlon E, Colotta F, Mantovani A. IL-1 beta scavenging by the type II IL-1 decoy receptor in human neutrophils. J Immunol 170: 5999–6005, 2003.
16. Cain CE, Blekhman R, Marioni JC, Gilad Y. Gene expression differences among primates are associated with changes in a histone epigenetic modification. Genetics 187: 1225–1234, 2011.
17. Caradec J, Sirab N, Keumeugni C, Moutereau S, Chimingqi M, Matar C, Revaud D, Bah M, Manivet P, Conti M, Loric S. “Desperate house genes”: the dramatic example of hypoxia. Br J Cancer 102: 1037–1043, 2010.
18. Casabonne D, Reina O, Benavente Y, Becker N, Maynadié M, Foretová L, Cocco P, González-Neira A, Nieters A, Boffetta P, Middeldorp JM, de Sanjose S. Single nucleotide polymorphisms of matrix metalloproteinase 9 (MMP9) and tumor protein 73 (TP73) interact with Epstein-Barr virus in chronic lymphocytic leukemia: results from the European case-control study EpiLymph. Haematologica 96: 323–327, 2011.
19. CDC. Understanding Your Complete Blood Count [online]. http://www.cc.nih.gov/ccc/patient_education/pepubs/cbc97.pdf, 2008.
20. Chen PW, Lin SJ, Tsai SC, Lin JH, Chen MR, Wang JT, Lee CP, Tsai CH. Regulation of microtubule dynamics through phosphorylation on stathmin by Epstein-Barr virus kinase BGLF4. J Biol Chem 285: 10053–10063, 2010.
21. Choy E, Yelensky R, Bonakdar S, Plenge RM, Saxena R, De Jager PL, Shaw SY, Wolfish CS, Slavik JM, Cotsapas C, Rivas M, Dermitzakis ET, Cahir-McFarland E, Kieff E, Hafler D, Daly MJ, Altshuler D. Genetic analysis of human traits in vitro: drug response and gene expression in lymphoblastoid cell lines. PLoS Genet 4: e1000287, 2008.
22. Conboy JG. Structure, function, and molecular genetics of erythroid membrane skeletal protein 4.1 in normal and abnormal red blood cells. Semin Hematol 30: 58–73, 1993.
23. Cox TC, Sadlon TJ, Schwarz QP, Matthews CS, Wise PD, Cox LL, Bottomley SS, May BK. The major splice variant of human 5-aminolevulinate synthase-2 contributes significantly to erythroid heme biosynthesis. Int J Biochem Cell Biol 36: 281–295, 2004.
24. Dawber TR, Meadors GF, Moore FE Jr. Epidemiological approaches to heart disease: the Framingham Study. Am J Public Health Nations Health 41: 279–281, 1951.
25. Debey S, Schoenbeck U, Hellmich M, Gathof BS, Pillai R, Zander T, Schultze JL. Comparison of different isolation techniques prior gene expression profiling of blood derived cells: impact on physiological responses, on overall expression and the role of different cell types. Pharmacogenomics J 4: 193–207, 2004.
26. Eden E, Navon R, Steinfeld I, Lipson D, Yakhini Z. GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics 10: 48, 2009.
27. Feinleib M, Kannel WB, Garrison RJ, McNamara PM, Castelli WP. The Framingham Offspring Study. Prev Med 4: 518–525, 1975.
28. Fujita K, Horikawa I, Mondal AM, Jenkins LMM, Appella E, Vojtesek B, Bourdon JC, Lane DP, Harris CC. Positive feedback between p53 and TRF2 during telomere-damage signalling and cellular senescence. Nat Cell Biol 12: 1205–1212, 2010.
29. Govindaraju DR, Larson MG, Yin X, Benjamin EJ, Rao MB, Vasan RS. Association between SNP heterozygosity and quantitative traits in the Framingham Heart Study. Ann Hum Genet 73: 465–473, 2009.
30. Grünblatt E, Bartl J, Zehetmayer S, Ringel TM, Bauer P, Riederer P, Jacob CP. Gene expression as peripheral biomarkers for sporadic Alzheimer's disease. J Alzheimers Dis 16: 627–634, 2009.
31. Higgs DR, Vickers MA, Wilkie AO, Pretorius IM, Jarman AP, Weatherall DJ. A review of the molecular genetics of the human alpha-globin gene cluster. Blood 73: 1081–1104, 1989.
32. Hindle AK, Edwards C, McCaffrey T, Fu SW, Brody F. Reactivation of adiponectin expression in obese patients after bariatric surgery. Surg Endosc 24: 1367–1373, 2010.
33. Huang RS, Chen P, Wisel S, Duan S, Zhang W, Cook EH, Das S, Cox NJ, Dolan ME. Population-specific GSTM1 copy number variation. Hum Mol Genet 18: 366–372, 2009.
34. Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4: 249–264, 2003.
35. Isensee J, Witt H, Pregla R, Hetzer R, Regitz-Zagrosek V, Noppinger PR. Sexually dimorphic gene expression in the heart of mice and men. J Mol Med 86: 61–74, 2008.
36. Iyoda T, Ushida M, Kimura Y, Minamino K, Hayuka A, Yokohata S, Ehara H, Inaba K. Invariant NKT cell anergy is induced by a strong TCR-mediated signal plus co-stimulation. Int Immunol 22: 905–913, 2010.
37. Jacobsen LC, Sørensen OE, Cowland JB, Borregaard N, Theilgaard-Mönch K. The secretory leukocyte protease inhibitor (SLPI) and the secondary granule protein lactoferrin are synthesized in myelocytes, colocalize in subcellular fractions of neutrophils, and are coreleased by activated neutrophils. J Leukoc Biol 83: 1155–1164, 2008.
38. Jeon JP, Kim JW, Park B, Nam HY, Shim SM, Lee MH, Han BG. Identification of tumor necrosis factor signaling-related proteins during Epstein-Barr virus-induced B cell transformation. Acta Virol 52: 151–159, 2008.
39. Jison ML, Munson PJ, Barb JJ, Suffredini AF, Talwar S, Logun C, Raghavachari N, Beigel JH, Shelhamer JH, Danner RL, Gladwin MT. Blood mononuclear cell gene expression profiles characterize the oxidant, hemolytic, and inflammatory stress of sickle cell disease. Blood 104: 270–280, 2004.
40. Johnson AD, O'Donnell CJ. An open access database of genome-wide association results. BMC Med Genet 10: 6, 2009.
41. Kashuba E, Yurchenko M, Yenamandra SP, Snopok B, Szekely L, Bercovich B, Ciechanover A, Klein G. Epstein-Barr virus-encoded EBNA-5 forms trimolecular protein complexes with MDM2 and p53 and inhibits the transactivating function of p53. Int J Cancer 128: 817–825, 2011.
42. Khanna-Gupta A, Zibello T, Idone V, Sun H, Lekstrom-Himes J, Berliner N. Human neutrophil collagenase expression is C/EBP-dependent during myeloid development. Exp Hematol 33: 42–52, 2005.
43. Kostylina G, Simon D, Fey MF, Yousefi S, Simon HU. Neutrophil apoptosis mediated by nicotinic acid receptors (GPR109A). Cell Death Differ 15: 134–142, 2008.
44. Lane HC, Anand AR, Ganju RK. Cbl and Akt regulate CXCL8-induced and CXCR1- and CXCR2-mediated chemotaxis. Int Immunol 18: 1315–1325, 2006.
45. Larousserie F, Bardel E, Coulomb L'Herminé A, Canioni D, Brousse N, Kastelein RA, Devergne O. Variable expression of Epstein-Barr virus-induced gene 3 during normal B-cell differentiation and among B-cell lymphomas. J Pathol 209: 360–368, 2006.
46. Li CY, Zhan YQ, Xu CW, Xu WX, Wang SY, Lv J, Zhou Y, Yue PB, Chen B, Yang XM. EDAG regulates the proliferation and differentiation of hematopoietic cells and resists cell apoptosis through the activation of nuclear factor-kappa B. Cell Death Differ 11: 1299–1308, 2004.
47. Malossini A, Blanzieri E, Ng RT. Assessment of SVM reliability of microarrays data analysis. 14th Dutch-Belgian Conference of Machine Learning. WP05–03, 2005.
48. Mason CC, Hanson RL, Ossowski V, Bian L, Baier LJ, Krakoff J, Bogardus C. Bimodal distribution of RNA expression levels in human skeletal muscle tissue. BMC Genomics 12: 98, 2011.
49. McLaren JE, Zuo J, Grimstead J, Poghosyan Z, Bell AI, Rowe M, Brennan P. STAT1 contributes to the maintenance of the latency III viral programme observed in Epstein-Barr virus-transformed B cells and their recognition by CD8+ T cells. J Gen Virol 90: 2239–2250, 2009.
50. Min JL, Barrett A, Watts T, Pettersson FH, Lockstone HE, Lindgren CM, Taylor JM, Allen M, Zondervan KT, McCarthy MI. Variability of gene expression profiles in human blood and lymphoblastoid cell lines. BMC Genomics 11: 96, 2010.
51. Mohan J, Dement-Brown J, Maier S, Ise T, Kempkes B, Tolnay M. Epstein-Barr virus nuclear antigen 2 induces FcRH5 expression through CBF1. Blood 107: 4433–4439, 2006.
52. Munson PJ. A consistency test for determining the significance of gene expression changes on replicate samples and two convenient variance-stabilizing transformations [online]. GeneLogic Workshop on Low Level Analysis of Affymetrix GeneChip Data. http://stat-www.berkeley.edu/users/terry/zarray/Affy/GL_Workshop/genelogic2001.html.
53. Murtagh F. Multidimensional Clustering Algorithms. Würzburg: Physica-Verlag, 1985.
54. O'Donnell CJ, Elosua R. Cardiovascular risk factors. Insights from Framingham Heart Study. Rev Esp Cardiol 61: 299–310, 2008.
55. Oppenheimer GM. Becoming the Framingham Study 1947–1950. Am J Public Health 95: 602–610, 2005.
56. Palikhe NS, Kim JH, Park HS. Biomarkers predicting isocyanate-induced asthma. Allergy Asthma Immunol Res 3: 21–26, 2011.
57. Park JY, Matsuo K, Suzuki T, Ito H, Hosono S, Kawase T, Watanabe M, Oze I, Hida T, Yatabe Y, Mitsudomi T, Takezaki T, Tajima K, Tanaka H. Impact of smoking on lung cancer risk is stronger in those with the homozygous aldehyde dehydrogenase 2 null allele in a Japanese population. Carcinogenesis 31: 660–665, 2010.
58. Pickrell JK, Marioni JC, Pai AA, Degner JF, Engelhardt BE, Nkadori E, Veyrieras JB, Stephens M, Gilad Y, Pritchard JK. Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature 464: 768–772, 2010.
59. Pinheiro J, Bates D. Mixed-effects Models in S and S-PLUS. New York: Springer, 2009.
60. Provan S, Angel K, Semb AG, Atar D, Kvien TK. NT-proBNP predicts mortality in patients with rheumatoid arthritis: results from 10-year follow-up of the EURIDISS study. Ann Rheum Dis 69: 1946–1950, 2010.
61. QIAGEN. RNeasy Plus Handbook [online]. http://www.qiagen.com/literature/render.aspx?id=103686.
62. QIAGEN. PAXgene Blood RNA Kit Handbook Version 2 [online]. http://www.qiagen.com/literature/render.aspx?id=104458.
63. R Development Core Team. R: A Language and Environment for Statistical Computing [online]. http://www.R-project.org.
64. Raghavachari N, Xu X, Harris A, Villagra J, Logun C, Barb J, Solomon MA, Suffredini AF, Danner RL, Kato G, Munson PJ, Morris SM Jr, Gladwin MT. Amplified expression profiling of platelet transcriptome reveals changes in arginine metabolic pathways in patients with sickle cell disease. Circulation 115: 1551–1562, 2007.
65. Rana AP, Ruff P, Maalouf GJ, Speicher DW, Chishti AH. Cloning of human erythroid dematin reveals another member of the villin family. Proc Natl Acad Sci USA 90: 6651–6655, 1993.
66. Rockett JC, Burczynski ME, Fornace AJ, Herrmann PC, Krawetz SA, Dix DJ. Surrogate tissue analysis: monitoring toxicant exposure and health status of inaccessible tissues through the analysis of accessible tissues and cells. Toxicol Appl Pharmacol 194: 189–199, 2004.
67. Rollins B, Martin MV, Morgan L, Vawter MP. Analysis of whole genome biomarker expression in blood and brain. Am J Med Genet B Neuropsychiatr Genet 153B: 919–936, 2010.
68. Rowe M, Lear AL, Croom-Carter D, Davies AH, Rickinson AB. Three pathways of Epstein-Barr virus gene activation from EBNA1-positive latency in B lymphocytes. J Virol 66: 122–131, 1992.
69. Rybicki AC, Musto S, Schwartz RS. Identification of a band-3 binding site near the N-terminus of erythrocyte membrane protein 4.2. Biochem J 309: 677–681, 1995.
70. Schaniel C, Rolink AG, Melchers F. Attractions and migrations of lymphoid cells in the organization of humoral immune responses. Adv Immunol 78: 111–168, 2001.
71. Splansky GL, Corey D, Yang Q, Atwood LD, Cupples LA, Benjamin EJ, D'Agostino RB, Fox CS, Larson MG, Murabito JM, O'Donnell CJ, Vasan RS, Wolf PA, Levy D. The third generation cohort of the National Heart, Lung, and Blood Institute's Framingham Heart Study: design, recruitment, and initial examination. Am J Epidemiol 165: 1328–1335, 2007.
72. Stamm S, Riethoven JJ, Le Texier V, Gopalakrishnan C, Kumanduri V, Tang Y, Barbosa-Morais NL, Thanaraj TA. ASD: a bioinformatics resource on alternative splicing. Nucleic Acids Res 34: D46–D55, 2006.
73. Stanietsky N, Mandelboim O. Paired NK cell receptors controlling NK cytotoxicity. FEBS Lett 584: 4895–4900, 2010.
74. Tanner MJ. Molecular and cellular biology of the erythrocyte anion exchanger (AE1). Semin Hematol 30: 34–57, 1993.
75. Twine NC, Stover JA, Marshall B, Dukart G, Hidalgo M, Stadler W, Logan T, Dutcher J, Hudes G, Dorner AJ, Slonim DK, Trepicchio WL, Burczynski ME. Disease-associated expression profiles in peripheral blood mononuclear cells from patients with advanced renal cell carcinoma. Cancer Res 63: 6069–6075, 2003.
76. Venables WN, Ripley BD. Modern Applied Statistics With S (4th ed.). New York: Springer, 2002.
77. Wallace AE, Sales KJ, Catalano RD, Anderson RA, Williams ARW, Wilson MR, Schwarze J, Wang H, Rossi AG, Jabbour HN. Prostaglandin F2alpha-F-prostanoid receptor signaling promotes neutrophil chemotaxis via chemokine (C-X-C motif) ligand 1 in endometrial adenocarcinoma. Cancer Res 69: 5726–5733, 2009.
78. Waterworth DM, Ricketts SL, Song K, Chen L, Zhao JH, Ripatti S, Aulchenko YS, Zhang W, Yuan X, Lim N, Luan J, Ashford S, Wheeler E, Young EH, Hadley D, Thompson JR, Braund PS, Johnson T, Struchalin M, Surakka I, Luben R, Khaw KT, Rodwell SA, Loos RJF, Boekholdt SM, Inouye M, Deloukas P, Elliott P, Schlessinger D, Sanna S, Scuteri A, Jackson A, Mohlke KL, Tuomilehto J, Roberts R, Stewart A, Kesäniemi YA, Mahley RW, Grundy SM, McArdle W, Cardon L, Waeber G, Vollenweider P, Chambers JC, Boehnke M, Abecasis GR, Salomaa V, Järvelin MR, Ruokonen A, Barroso I, Epstein SE, Hakonarson HH, Rader DJ, Reilly MP, Witteman JCM, Hall AS, Samani NJ, Strachan DP, Barter P, van Duijn CM, Kooner JS, Peltonen L, Wareham NJ, McPherson R, Mooser V, Sandhu MS. Genetic variants influencing circulating lipid levels and risk of coronary artery disease. Arterioscler Thromb Vasc Biol 30: 2264–2276, 2010.
79. West M, Blanchette C, Dressman H, Huang E, Ishida S, Spang R, Zuzan H, Olson JA Jr, Marks JR, Nevins JR. Predicting the clinical status of human breast cancer by using gene expression profiles. Proc Natl Acad Sci USA 98: 11462–11467, 2001.
80. Whitney AR, Diehn M, Popper SJ, Alizadeh AA, Boldrick JC, Relman DA, Brown PO. Individuality and variation in gene expression patterns in human blood. Proc Natl Acad Sci USA 100: 1896–1901, 2003.


[B42] 42.Khanna-Gupta A, Zibello T, Idone V, Sun H, Lekstrom-Himes J, Berliner N. Human neutrophil collagenase expression is C/EBP-dependent during myeloid development. Exp Hematol 33: 42–52, 2005. [DOI] [PubMed] [Google Scholar]

[B43] 43.Kostylina G, Simon D, Fey MF, Yousefi S, Simon HU. Neutrophil apoptosis mediated by nicotinic acid receptors (GPR109A). Cell Death Differ 15: 134–142, 2008. [DOI] [PubMed] [Google Scholar]

[B44] 44.Lane HC, Anand AR, Ganju RK. Cbl and Akt regulate CXCL8-induced and CXCR1- and CXCR2-mediated chemotaxis. Int Immunol 18: 1315–1325, 2006. [DOI] [PubMed] [Google Scholar]

[B45] 45.Larousserie F, Bardel E, Coulomb L'Herminé A, Canioni D, Brousse N, Kastelein RA, Devergne O. Variable expression of Epstein-Barr virus-induced gene 3 during normal B-cell differentiation and among B-cell lymphomas. J Pathol 209: 360–368, 2006. [DOI] [PubMed] [Google Scholar]

[B46] 46.Li CY, Zhan YQ, Xu CW, Xu WX, Wang SY, Lv J, Zhou Y, Yue PB, Chen B, Yang XM. EDAG regulates the proliferation and differentiation of hematopoietic cells and resists cell apoptosis through the activation of nuclear factor-kappa B. Cell Death Differ 11: 1299–1308, 2004. [DOI] [PubMed] [Google Scholar]

[B47] 47.Malossini A, Blanzieri E, Ng RT. Assessment of SVM reliability of microarrays data analysis. 14th Dutch-Belgian Conference of Machine Learning. WP05–03, 2005. [Google Scholar]

[B48] 48.Mason CC, Hanson RL, Ossowski V, Bian L, Baier LJ, Krakoff J, Bogardus C. Bimodal distribution of RNA expression levels in human skeletal muscle tissue. BMC Genomics 12: 98, 2011. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B49] 49.McLaren JE, Zuo J, Grimstead J, Poghosyan Z, Bell AI, Rowe M, Brennan P. STAT1 contributes to the maintenance of the latency III viral programme observed in Epstein-Barr virus-transformed B cells and their recognition by CD8+ T cells. J Gen Virol 90: 2239–2250, 2009. [DOI] [PubMed] [Google Scholar]

[B50] 50.Min JL, Barrett A, Watts T, Pettersson FH, Lockstone HE, Lindgren CM, Taylor JM, Allen M, Zondervan KT, McCarthy MI. Variability of gene expression profiles in human blood and lymphoblastoid cell lines. BMC Genomics 11: 96, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B51] 51.Mohan J, Dement-Brown J, Maier S, Ise T, Kempkes B, Tolnay M. Epstein-Barr virus nuclear antigen 2 induces FcRH5 expression through CBF1. Blood 107: 4433–4439, 2006. [DOI] [PubMed] [Google Scholar]

[B52] 52.Munson PJ. A consistency test for determining the significance of gene expression changes on replicate samples and two convenient variance-stabilizing transformations [online]. GeneLogic Workshop on Low Level Analysis of Affymetrix GeneChip Data. http://stat-www.berkeley.edu/users/terry/zarray/Affy/GL_Workshop/genelogic2001.html.

[B53] 53.Murtagh F. Multidimensional Clustering Algorithms. Würzburg: Physica-Verlag, 1985. [Google Scholar]

[B54] 54.O'Donnell CJ, Elosua R. Cardiovascular risk factors. Insights from Framingham Heart Study. Rev Esp Cardiol 61: 299–310, 2008. [PubMed] [Google Scholar]

[B55] 55.Oppenheimer GM. Becoming the Framingham Study 1947–1950. Am J Public Health 95: 602–610, 2005. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B56] 56.Palikhe NS, Kim JH, Park HS. Biomarkers predicting isocyanate-induced asthma. Allergy Asthma Immunol Res 3: 21–26, 2011. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B57] 57.Park JY, Matsuo K, Suzuki T, Ito H, Hosono S, Kawase T, Watanabe M, Oze I, Hida T, Yatabe Y, Mitsudomi T, Takezaki T, Tajima K, Tanaka H. Impact of smoking on lung cancer risk is stronger in those with the homozygous aldehyde dehydrogenase 2 null allele in a Japanese population. Carcinogenesis 31: 660–665, 2010. [DOI] [PubMed] [Google Scholar]

[B58] 58.Pickrell JK, Marioni JC, Pai AA, Degner JF, Engelhardt BE, Nkadori E, Veyrieras JB, Stephens M, Gilad Y, Pritchard JK. Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature 464: 768–772, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B59] 59.Pinheiro J, Bates D. Mixed-effects Models in S and S-PLUS. New York: Springer, 2009. [Google Scholar]

[B60] 60.Provan S, Angel K, Semb AG, Atar D, Kvien TK. NT-proBNP predicts mortality in patients with rheumatoid arthritis: results from 10-year follow-up of the EURIDISS study. Ann Rheum Dis 69: 1946–1950, 2010. [DOI] [PubMed] [Google Scholar]

[B61] 61.QIAGEN. RNeasy Plus Handbook [online]. http://www.qiagen.com/literature/render.aspx?id=103686.

[B62] 62.QIAGEN. PAXgene Blood RNA Kit Handbook Version 2 [online]. http://www.qiagen.com/literature/render.aspx?id=104458.

[B63] 63.R Development Core Team. R: A Language and Environment for Statistical Computing [online]. http://www.R-project.org.

[B64] 64.Raghavachari N, Xu X, Harris A, Villagra J, Logun C, Barb J, Solomon MA, Suffredini AF, Danner RL, Kato G, Munson PJ, Morris SM, Jr, Gladwin MT. Amplified expression profiling of platelet transcriptome reveals changes in arginine metabolic pathways in patients with sickle cell disease. Circulation 115: 1551–1562, 2007. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B65] 65.Rana AP, Ruff P, Maalouf GJ, Speicher DW, Chishti AH. Cloning of human erythroid dematin reveals another member of the villin family. Proc Natl Acad Sci USA 90: 6651–6655, 1993. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B66] 66.Rockett JC, Burczynski ME, Fornace AJ, Herrmann PC, Krawetz SA, Dix DJ. Surrogate tissue analysis: monitoring toxicant exposure and health status of inaccessible tissues through the analysis of accessible tissues and cells. Toxicol Appl Pharmacol 194: 189–199, 2004. [DOI] [PubMed] [Google Scholar]

[B67] 67.Rollins B, Martin MV, Morgan L, Vawter MP. Analysis of whole genome biomarker expression in blood and brain. Am J Med Genet B Neuropsychiatr Genet 153B: 919–936, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B68] 68.Rowe M, Lear AL, Croom-Carter D, Davies AH, Rickinson AB. Three pathways of Epstein-Barr virus gene activation from EBNA1-positive latency in B lymphocytes. J Virol 66: 122–131, 1992. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B69] 69.Rybicki AC, Musto S, Schwartz RS. Identification of a band-3 binding site near the N-terminus of erythrocyte membrane protein 4.2. Biochem J 309: 677–681, 1995. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B70] 70.Schaniel C, Rolink AG, Melchers F. Attractions and migrations of lymphoid cells in the organization of humoral immune responses. Adv Immunol 78: 111–168, 2001. [DOI] [PubMed] [Google Scholar]

[B71] 71.Splansky GL, Corey D, Yang Q, Atwood LD, Cupples LA, Benjamin EJ, D'Agostino RB, Fox CS, Larson MG, Murabito JM, O'Donnell CJ, Vasan RS, Wolf PA, Levy D. The third generation cohort of the National Heart, Lung, and Blood Institute's Framingham Heart Study: design, recruitment, and initial examination. Am J Epidemiol 165: 1328–1335, 2007. [DOI] [PubMed] [Google Scholar]

[B72] 72.Stamm S, Riethoven JJ, Le Texier V, Gopalakrishnan C, Kumanduri V, Tang Y, Barbosa-Morais NL, Thanaraj TA. ASD: a bioinformatics resource on alternative splicing. Nucleic Acids Res 34: D46–D55, 2006. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B73] 73.Stanietsky N, Mandelboim O. Paired NK cell receptors controlling NK cytotoxicity. FEBS Lett 584: 4895–4900, 2010. [DOI] [PubMed] [Google Scholar]

[B74] 74.Tanner MJ. Molecular and cellular biology of the erythrocyte anion exchanger (AE1). Semin Hematol 30: 34–57, 1993. [PubMed] [Google Scholar]

[B75] 75.Twine NC, Stover JA, Marshall B, Dukart G, Hidalgo M, Stadler W, Logan T, Dutcher J, Hudes G, Dorner AJ, Slonim DK, Trepicchio WL, Burczynski ME. Disease-associated expression profiles in peripheral blood mononuclear cells from patients with advanced renal cell carcinoma. Cancer Res 63: 6069–6075, 2003. [PubMed] [Google Scholar]

[B76] 76.Venables WN, Ripley BD. Modern Applied Statistics With S (4th ed.). New York: Springer, 2002. [Google Scholar]

[B77] 77.Wallace AE, Sales KJ, Catalano RD, Anderson RA, Williams ARW, Wilson MR, Schwarze J, Wang H, Rossi AG, Jabbour HN. Prostaglandin F2alpha-F-prostanoid receptor signaling promotes neutrophil chemotaxis via chemokine (C-X-C motif) ligand 1 in endometrial adenocarcinoma. Cancer Res 69: 5726–5733, 2009. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B78] 78.Waterworth DM, Ricketts SL, Song K, Chen L, Zhao JH, Ripatti S, Aulchenko YS, Zhang W, Yuan X, Lim N, Luan J, Ashford S, Wheeler E, Young EH, Hadley D, Thompson JR, Braund PS, Johnson T, Struchalin M, Surakka I, Luben R, Khaw KT, Rodwell SA, Loos RJF, Boekholdt SM, Inouye M, Deloukas P, Elliott P, Schlessinger D, Sanna S, Scuteri A, Jackson A, Mohlke KL, Tuomilehto J, Roberts R, Stewart A, Kesäniemi YA, Mahley RW, Grundy SM, McArdle W, Cardon L, Waeber G, Vollenweider P, Chambers JC, Boehnke M, Abecasis GR, Salomaa V, Järvelin MR, Ruokonen A, Barroso I, Epstein SE, Hakonarson HH, Rader DJ, Reilly MP, Witteman JCM, Hall AS, Samani NJ, Strachan DP, Barter P, van Duijn CM, Kooner JS, Peltonen L, Wareham NJ, McPherson R, Mooser V, Sandhu MS. Genetic variants influencing circulating lipid levels and risk of coronary artery disease. Arterioscler Thromb Vasc Biol 30: 2264–2276, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B79] 79.West M, Blanchette C, Dressman H, Huang E, Ishida S, Spang R, Zuzan H, Olson JA, Jr, Marks JR, Nevins JR. Predicting the clinical status of human breast cancer by using gene expression profiles. Proc Natl Acad Sci USA 98: 11462–11467, 2001. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B80] 80.Whitney AR, Diehn M, Popper SJ, Alizadeh AA, Boldrick JC, Relman DA, Brown PO. Individuality and variation in gene expression patterns in human blood. Proc Natl Acad Sci USA 100: 1896–1901, 2003. [DOI] [PMC free article] [PubMed] [Google Scholar]
