Author manuscript; available in PMC: 2013 Dec 7.
Published in final edited form as: J Proteome Res. 2012 Oct 30;11(12):5592–5601. doi: 10.1021/pr300796m

Improving International Research with Clinical Specimens: 5 Achievable Objectives

Joshua LaBaer 1
PMCID: PMC3640360  NIHMSID: NIHMS418770  PMID: 22998582

Abstract

Our increased interest in translational research has created a large demand for blood, tissue and other clinical samples, which find use in a broad variety of research including genomics, proteomics, and metabolomics. Hundreds of millions of dollars have been invested internationally on the collection, storage and distribution of samples. Nevertheless, many researchers complain in frustration about their inability to obtain relevant and/or useful samples for their research. Lack of access to samples, poor condition of samples, and unavailability of appropriate control samples have slowed our progress in the study of diseases and biomarkers. In this editorial, I focus on five major challenges that thwart clinical sample use for translational research and propose near term objectives to address them. They include: (1) defining our biobanking needs; (2) increasing the use of and access to standard operating procedures; (3) mapping inter-observer differences for use in normalizing diagnoses; (4) identifying natural internal protein controls; and (5) redefining the clinical sample paradigm by building partnerships with the public. In each case, I believe that we have the tools at hand required to achieve the objective within 5 years. Potential paths to achieve these objectives are explored. However we solve these problems, the future of proteomics depends on access to high quality clinical samples, collected under standardized conditions, accurately annotated and shared under conditions that promote the research we need to do.

Keywords: Specimen, biobank, clinical, translational, public partnership, standard operating procedure, sample processing, tissue, blood, standardization, healthcare, disease, reference standard, informed consent, donors, ethics, governance, proteomics, cohort, case/control, database, calibration, diversity, inclusion


Biomedical research is shifting its focus from the study of artificial cultured cell lines towards the direct study of clinical specimens. A host of artifacts in established cell lines, including severe aneuploidy, genetic and phenotypic drift and misidentified tissue sources, limit the conclusions that can be drawn about the natural behaviors of cells and have motivated scientists to focus more on clinically derived specimens1. The demand for blood, tissue and other clinical samples is larger than ever. These samples are used in genomics, proteomics, metabolomics, diagnostics, basic science and countless other forms of research. Investigation into the importance of specimen collection and improving quality control has been ongoing for more than 30 years.2, 3 In recent years, huge advances in the technologies that enable the collection, storage and distribution of the samples have engendered the new science of biobanking (Marko-Varga et al; ASAP JPR reference: DOI: 10.1021/pr300185k). Concomitant with this, funding organizations have spent hundreds of millions of dollars on the collection and storage of samples. Nevertheless, many researchers complain in frustration about their inability to obtain relevant and/or useful samples for their research4. Often, they must search to find collaborators with samples that can be used for their experiments, only to be disheartened that the appropriate controls are not available or the condition of the samples is not satisfactory for their studies. During the early stages of The Cancer Genome Atlas (TCGA), an enthusiastic community response offering thousands of clinical samples for genomic sequencing of tumor samples gave way to a staggering dropout rate of 99% for a broad variety of reasons 5. In many cases, the lack of access to appropriate samples limits our progress in the study of diseases and biomarkers.

We continue to face challenges that span a wide range of aspects of clinical translational research, from the need to better standardize how we collect, process and store clinical samples to social considerations regarding how biobanks are governed and how we involve the public, the source of these samples 6. Ethical issues have recently drawn increased attention, because the demand to use existing samples, collected for a different purpose, in new studies theoretically challenges the notion of anonymity7, 8, 9.

The challenges that surround the use of clinical samples affect the proteomics community more than others. Nearly everything we do requires testing clinical samples. Proteins are notoriously labile, creating the highest demand for sample quality. It follows then that we, as a community, should take a central role in addressing these challenges in order to advance this field and to ensure that the data and materials we collect will be both accurate and relevant.

In this editorial, I focus on five major challenges that thwart clinical sample use for translational research and propose near term objectives to address them. In each case, I believe that we have the tools at hand required to achieve the objective. What we now need most are organized efforts to tackle them. Many might characterize these objectives as unexciting, yet they address critical bottlenecks to the advancement of our field.

Challenge #1: Assessing our biobanking needs

There is no doubt that we spend considerable resources on creating and maintaining biobanks. Not long ago, the UK committed $105 M for the creation of a biobank at the University of Manchester 5 while Japan invested $218 M at the University of Tokyo 10. Other countries, including Iceland, Canada, Estonia, Spain and Finland have also made significant investments10, 11. In the USA, funding for biobanking spreads across many programs and projects, often funded by individual institutes of the National Institutes of Health (NIH), which support a wide variety of sample collection activities. As just one example, the National Cancer Institute (NCI) provides nearly $9 M per year to biobanks supporting its clinical trials network alone and additional funds for sample collection in other programs such as the Specialized Programs of Research Excellence (SPORE), the Early Detection Research Network (EDRN), and the Cancer Centers, in addition to all of their specific research projects. Other institutes at NIH, funding agencies, industry, and institutions such as universities, hospitals and state governments have all invested in this important activity. In spite of these large international investments, many researchers feel challenged finding the samples they need for their projects. This raises the question: are we spending our biobank investments in the right places?

Objective #1: Assess the current availability of clinical sample resources and outline the clinical specimen needs for the proteomics community for the next 5 – 10 years

Although this is the most ambitious of the objectives listed here, I place it first because of its logical position and its importance to overall success. By its nature it is a broad-reaching objective that must cover varied topics, including the types of biobanks, the types of specimens, the technical requirements for the specimens, and the annotation attached to the specimens. Table 1 provides a starting point for some of the topics that should be addressed. Each of these topics is complex and will require specialized input from experts.

Table 1.

Topics for developing a 5-10 year biospecimen plan for proteomics

Assessing Current Status | Recommendations for the next 5–10 years

What are the relative numbers for different biobank types?
  1. Individual

  2. Specific project

  3. Grant program

  4. Public repository

What would be the ideal balance?
  1. Individual

  2. Specific project

  3. Grant program

  4. Public repository

Assessing Current Status:
How well utilized are each of these different types?
What characteristics contribute to the successful biobanks?
What characteristics limit use of others?

Recommendations for the next 5–10 years:
Are there other types of biobanks that should be recommended?
Can we develop a rapid, on-demand approach to biobanks?
From past experience, what should we avoid?
What should we include?

Assessing Current Status:
What is the current proportion of cohort vs. case/control collections?
What is the relative investment in cohort vs. case/control collection?
Which cohorts have been used efficiently?
Which case/control collections have been used efficiently?
Which characteristics have led to successful use of case/control collections?
What time horizons have been necessary for successful longitudinal studies?
What are the burdens and advantages of different sampling frequencies in longitudinal studies?

Recommendations for the next 5–10 years:
What is the ideal proportion of cohort vs. case/control collections?
What is the ideal relative investment that should be made?
For longitudinal studies, what are the overall time horizon and sampling frequency needs?
Are there important cohorts, currently not collected, that should be collected?
Which diseases should be represented in the case/control collections?
How can we better ensure that researchers know where to look to find specimens?

Assessing Current Status:
How do researchers access samples in the current biobanks?
Are there defined processes in place?
Are they efficient?
Which processes work well, which poorly?

Recommendations for the next 5–10 years:
What are the best ways to govern future biobanks?
What are the best ways to manage access to samples?
How do we balance broad access to samples with the need to preserve precious resources?

Assess the samples currently available
  1. Which samples are available?

    1. Blood

    2. Serum

    3. Tissue – which ones?

    4. Other

  2. How have they been processed?

    1. Nucleic acid

    2. Protein

    3. Staining for microscopy

    4. Frozen

    5. Other

  3. Were they collected and processed using SOPs?

  4. Are their SOPs documented?

  5. What types of control samples are available?

  6. What types of storage are used?

Defining the sample needs
  1. Which types of samples are needed?

    1. Blood

    2. Serum

    3. Tissue – which ones?

    4. Other

  2. What kind of processing is needed?

    1. Nucleic acid

    2. Protein

    3. Staining for microscopy

    4. Frozen

    5. Other

  3. Which processing SOPs are recommended?

  4. Which control samples are needed?

  5. Ideal storage conditions?


Defining the annotation needs for future samples
  1. Clinical annotation – What is ideal? What is practical?

  2. Processing history

  3. Informed consent and usage guidelines

  4. Ability to capture follow up data

  5. Other

Defining the annotation needs for future samples
  1. Clinical annotation – What is ideal? What is practical?

  2. Processing history

  3. Informed consent and usage guidelines

  4. Ability to capture follow up data

  5. Other


Assessing Current Status:
Which biobanks work with commercial entities?
Which aspects have been successful and unsuccessful?

Recommendations for the next 5–10 years:
How should future biobanks work with commercial entities?
How should this be balanced?

As indicated in Table 1, we should proceed in two stages. The first will be to assess the resources that already exist and determine what has worked well and what has not. The second will focus on forecasting the needs for clinical specimens and making recommendations for the future. Although Eiseman and Haga prepared a comprehensive evaluation of existing tissue-based resources in 1999 12, it is clearly time to re-address this question. As we consider existing biobanks, we must consider who does the sample collecting. Typically, these include: individual researchers, specific research projects, major grant programs and organized public repositories. As indicated in Table 2, these collection types reflect a spectrum of sample quality and general usefulness that correlate with cost. Typically, the best quality samples are collected by the formal public repositories, which often include years of detailed planning in designing their standard operating procedures (SOPs) for sample collection and processing, as well as statistical considerations. Not surprisingly, these formal repositories are more likely to employ recommended standards from organizations devoted to advancing biobanking science, like the International Society for Biological and Environmental Repositories (ISBER; and others noted below). Yet even these repositories, often particularly expensive to create, cannot do everything for everyone. For example, they may have a specific focus, such as blood samples, which would not help the proteomicist interested in studying tissue.

Table 2.

Characteristics of Different Biobank Collection Types

Collection type: Individual
Description: Clinicians with access to specimens set aside “unused” samples for future research not yet specified
Cost: Low
Appropriate for general use: Low. Often not processed under standardized conditions. Often not well annotated. Not publicized, and therefore difficult to access.

Collection type: Specific projects
Description: Specific research project with defined needs (and often a budget) for samples
Cost: Moderate
Appropriate for general use: Moderate to low. Likely collected under conditions appropriate for the defined project. Amount of sample may be limited. Collections not usually publicized, so difficult to access.

Collection type: Grant programs
Description: Programs like SPOREs, Cancer Centers, Clinical trials networks and Centers of Excellence collect samples for both defined projects and some anticipated future projects not yet specified
Cost: Moderate to high
Appropriate for general use: Moderate. For those collections that include support for undefined future projects, these samples may be useful. The collections are publicized within their grant programs, but variably outside that circle. Access procedures vary.

Collection type: Formal public repository
Description: Repository created to collect samples for numerous projects, including future projects not yet specified. Such repositories often have a specific theme (e.g., disease genetics).
Cost: High
Appropriate for general use: Moderate to High. Samples collected with best practices and under standardized conditions, well-defined storage conditions, well annotated, well publicized with defined access procedures, sample types may limit application to specific research type (e.g., blood only); may include restrictions on access (e.g., limited to researchers in a specific country).

Perhaps the most telling indicator of a successful repository is whether researchers use it. A common perception is that countless samples exist and are building accretions of ice while researchers cannot gain access to them. We need to understand if this is true. If so, where are the roadblocks? Is it that researchers simply do not know where to look for samples? Are there problems with procedures to obtain the samples? Is the annotation of existing samples insufficient? Or are there other problems? Particularly helpful here will be to examine the characteristics that make useful repositories successful.

As we look forward to outlining future needs, we will need to predict the likely direction of proteomics. Increased interest in multiple reaction monitoring, in targeting specific proteins, in quantifying protein levels and in examining a broad variety of post-translational modifications is likely to affect the kinds of specimens we need and how they should be processed.

We must also consider our future biological and biomedical interests. Early disease detection, patient stratification, companion diagnostics and mapping protein pathways based on genomic data will all influence when and how we collect samples and from whom. A critical question here is the balance needed between cohort studies and case/control studies. In cohort studies, samples and clinical data are collected from a defined population, e.g., registered nurses, residents of Framingham, MA, etc. Information and samples are collected prospectively and often longitudinally over many years from numerous individuals who are apparently healthy when they enroll in the study. As a key advantage, this type of study offers the ability to observe the change in candidate biomarkers during the transition from health to disease. Moreover, prospective sample collection significantly reduces risks of bias and potentially allows detection of altered biomarkers prior to the onset of illness. However, cohort studies are costly to set up and to maintain, in large part because so many individuals must be followed. In its first 10 years, the Nurses’ Health Study collected 1799 cases of breast cancer, the most common women’s malignancy; however, to accomplish this, they enrolled more than 120,000 women 13. Cohort studies also carry the risk of certain biases, such as lead time bias, in which increased vigilance detects disease earlier than usual giving the false appearance of both higher risk and better outcomes, and selection bias, in which individuals who “self-select” to participate in the study may have a different risk aggregate than the general population.
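The scale problem described above can be made concrete with a little arithmetic. The sketch below is illustrative only: it treats "more than 120,000" as exactly 120,000, the function names `case_yield` and `enrollment_needed` are hypothetical, and the 0.1% ten-year risk for a rarer disease is an invented figure, not one from the study.

```python
import math

def case_yield(cases: int, enrolled: int) -> float:
    """Fraction of enrolled participants who became cases."""
    return cases / enrolled

def enrollment_needed(target_cases: int, cumulative_risk: float) -> int:
    """Participants needed to expect `target_cases`, assuming every
    participant carries the same cumulative risk over the study period."""
    if not 0 < cumulative_risk <= 1:
        raise ValueError("cumulative_risk must be in (0, 1]")
    return math.ceil(target_cases / cumulative_risk)

# Figures quoted in the text; 120,000 is a lower bound on enrollment.
nhs_yield = case_yield(1799, 120_000)
print(f"Breast cancer yield over 10 years: {nhs_yield:.2%}")

# Hypothetical rarer disease (assumed 0.1% ten-year risk), 500 cases wanted.
needed = enrollment_needed(500, 0.001)
print(f"Enrollment needed for 500 cases: {needed:,}")
```

Even for the most common malignancy in women, only about 1.5% of participants became cases in a decade, which is why cohort collections for rarer diseases quickly become impractical.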

By contrast, case/control studies are much less expensive and more likely to collect adequate numbers of specimens for rare diseases. Here, investigators collect samples and data from individuals known to have the disease of interest. These patients identify themselves when they appear at the relevant clinic, reducing constraints on the investigators due to low disease incidence. However, in order to make appropriate comparisons, investigators must also examine specimens from control individuals. A particular challenge for case/control studies is determining which controls are appropriate and how many to collect (e.g., Healthy individuals? Individuals with related diseases? Individuals with similar demographics?). Moreover, the physical collection of control samples must often occur under different circumstances than those from the cases, e.g., the cases may be collected at a specialty clinic where healthy patients are rarely seen, introducing a potential source of sample bias 14. Some of the advantages and disadvantages of these collection methods are highlighted in Table 3. Clearly, both types of collections will be needed and the key will be to strike the best balance.

Table 3.

Advantages and Disadvantages of Cohort vs. Case/Control Biobanks

Cohort: Advantages
  1. Longitudinal samples arrive prospectively in the same manner as do samples in actual clinical settings - cases and non-cases are collected identically.

  2. The opportunity to evaluate samples prior to clinical presentation, which is invaluable for early detection biomarkers

  3. Longitudinal samples enable the patient to be his/her own control

  4. Enables the monitoring of changes over the course of an illness

Case/Control: Advantages
  1. Much more cost effective way to collect samples

  2. Ensures enough case samples to study rare diseases

  3. Better control of the disease-related factors in the studied collection - e.g., can ensure a broad representation of disease subtypes in the collection

Cohort: Disadvantages
  1. Costly to maintain and to collect data and samples

  2. Requires extraordinary commitment to funding over long periods of time

  3. Requires very large populations without epidemiological biases in order to get adequate sampling of rare diseases

  4. Subject to outcome biases - e.g., lead time bias, selection bias, etc.

Case/Control: Disadvantages
  1. There may be biases in sample processing and collection because it is difficult to collect case and control samples identically in retrospective samples

  2. Difficult to match controls to cases - i.e., to select the appropriate controls - healthy, related diseases, demographics, etc.

  3. May be difficult for individuals to recall exposures and risk factors - source of bias

  4. Forces pre-selection of the disease and disease factors

  5. Does not allow the calculation of disease incidence

There is no doubt that developing a sufficient plan will be a major undertaking. Yet, the current ad hoc approach to building biobanks is cumbersome and problematic. It can take a year or more to obtain the preliminary approvals and funding for even a modest study and then much longer to recruit volunteers and collect specimens. An innovative approach under consideration is rapid, on-demand assembly of biobanks. By this approach, an infrastructure is put in place that can be rapidly deployed when a need arises. Sample collection only begins when there is a defined need to avoid wasted resources. Whatever approaches we use, planning now for the future will ensure that samples will be there when we need them.

Challenge #2: Standardize sample processing

The opportunity for biomolecules, especially proteins, to undergo aberrant changes in a clinical sample begins even before the sample is physically removed from the donor and continues through all subsequent handling and processing steps, including storage. Proteins are notoriously labile, presenting a particular challenge in developing methods to provide proteomics-appropriate samples. This is especially true for post-translational modifications of proteins, which can change dramatically simply by clamping the blood supply to the tissue.15, 16 To reduce the biases caused by these potential artifacts, sample processing protocols have been carefully developed. The best of these are formalized into well-documented and clearly written SOPs. A number of programs and consortia have begun to collect and distribute information about best practices for sample processing. These include: the Office of Biorepositories and Biospecimen Research (OBBR) from the NCI17, the Organization for Economic Cooperation and Development (OECD) 18, ISBER19, the Marble Arch International Working Group on Biobanking for Biomedical Research (MAIWGBBR)20, the International Agency for Research on Cancer (IARC)21, the European Organization for the Research and Treatment of Cancer (EORTC)22, the Biobanking and Biomolecular Resources Research Infrastructure (BBMRI) sponsored by the European Commission23 and others.

Clearly, there is no shortage of guidelines to follow. In addition to these major biobanking consortia, countless individual programs, centers and labs have developed protocols and SOPs for their sample collection. It should be noted, however, that many of the currently available SOPs are based on “best practices.” An important evolution of the field will see more SOPs developed based on comparative tests that measure actual molecular outcomes for specific assays. Yet, despite this wealth of experience many samples are routinely collected and stored using outdated methods, which are known to be prone to artifact5. Moreover, too many groups fail to use formal SOPs and it is often difficult to track down which specific processing methods were used.

Objective #2: Create a centralized and shared database of SOPs used for all sample isolation, collection, processing and storage

With so many available methods, achieving universal agreement on standards for sample collection and processing seems beyond our reach. However, the current jumble is not sustainable. As a first step towards standardizing sample handling, we should create a common and shared database that stores and registers all SOPs used in sample collections. I recommend that we avoid the temptation to review the SOPs as a requisite to inclusion. Instead, all submitted SOPs that include the minimum required information and meet the needed formatting standards should be accepted and registered. While there will be value in reviewing the content of the SOPs in the database, our first priority should be to get everything in one place. Setting up a review process initially will take too long, involve too much controversy and will reduce participation.

Once registered, an SOP would be assigned an ID number that could be used in clinical protocol development, institutional review board (IRB) submissions, and manuscripts. The SOP ID numbers would also attach to relevant samples in a biobank. Each sample would have several such ID numbers corresponding to the various SOPs used during its life cycle (e.g., tissue collection, tissue storage, protein extraction, etc.).
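The registration scheme described above could be prototyped along the following lines. This is a minimal in-memory sketch under stated assumptions, not a proposal for the actual database: the class names `SOPRegistry` and `Sample`, the `SOP-000001` ID format and the list of required fields are all hypothetical, since the minimum information for a submitted SOP has yet to be decided.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical minimum fields; the real list is still to be agreed upon.
REQUIRED_FIELDS = ("title", "step", "author", "procedure")

@dataclass
class SOPRegistry:
    """In-memory stand-in for a shared SOP database."""
    _sops: dict = field(default_factory=dict)
    _next_id: count = field(default_factory=lambda: count(1))

    def register(self, sop: dict) -> str:
        """Accept any SOP that has the minimum fields; content is not reviewed."""
        missing = [name for name in REQUIRED_FIELDS if not sop.get(name)]
        if missing:
            raise ValueError(f"missing required fields: {missing}")
        sop_id = f"SOP-{next(self._next_id):06d}"
        self._sops[sop_id] = dict(sop)
        return sop_id

    def lookup(self, sop_id: str) -> dict:
        return self._sops[sop_id]

@dataclass
class Sample:
    """A biobank sample annotated with the SOP IDs used during its life cycle."""
    sample_id: str
    sop_ids: dict = field(default_factory=dict)  # step name -> SOP ID

    def attach(self, registry: SOPRegistry, sop_id: str) -> None:
        step = registry.lookup(sop_id)["step"]  # KeyError if the ID is unregistered
        self.sop_ids[step] = sop_id

registry = SOPRegistry()
collection_id = registry.register({
    "title": "Tissue collection (example)",
    "step": "tissue collection",
    "author": "Hypothetical Lab",
    "procedure": "Placeholder text of the procedure.",
})
sample = Sample("SAMPLE-0001")
sample.attach(registry, collection_id)
print(sample.sop_ids)
```

Registration here checks only for completeness, mirroring the recommendation to accept every SOP that meets the minimum requirements rather than reviewing content first.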

These SOPs need not be restricted to descriptions for how to physically manage the samples. They can also relate to experimental design and analysis. Proper experimental design, appropriate epidemiological considerations, careful power analyses, and well considered data analysis contribute to ensuring the success of studies with clinical samples. All of these represent areas where publicly available SOPs would improve clinical and translational research.24

It would be neither practical nor desirable to insist that everyone use particular SOPs. What works for one type of experiment may not for another. But, it is entirely reasonable to expect that all sample collection and storage should be done under a defined SOP, and that sample collectors should be required to submit and register their SOPs at a common database. In fact, the College of American Pathologists has already taken a step in this direction. The first item on the Biorepository Checklist for their Accreditation Program is to have a procedure manual with up to date SOPs25. There are a number of benefits that would accrue from this approach:

  1. There would be a common location to find all biobank and clinical sample-related SOPs.

  2. Reviewers and auditors of biobanks will have a clear resource to confirm that SOPs were developed.

  3. Publication and document preparation will be simplified because authors could simply point to a specific SOP ID number in lieu of having to write out the SOP in detail.

  4. Forcing sample collectors to provide a registered SOP ID number will help ensure that they use SOPs.

  5. This will provide a resource for researchers planning to collect samples to find the best SOPs.

  6. As multiple SOPs for the same tasks accumulate, we can examine the commonalities and differences shedding light on those steps likely to be most critical.

  7. This can be regarded as a first step towards comparative studies that develop SOPs based on molecular-based data.

Of course, there are many details to resolve in trying to implement such a database; there will be both development and maintenance costs. We will need to decide on the minimum information required and the appropriate format for submitted SOPs. However, there is already a wealth of material on this subject and perhaps one of the biobanking organizations above could be persuaded to sponsor and host such a database. Finally, discussions should be held with granting agencies, accreditation agencies, IRBs and journal editors to gain their support and willingness to apply pressure on sample collectors to include SOP IDs with their submissions.

Challenge #3: Disambiguating sample annotation

Studies with clinical samples rely on aligning experimental results with clinical diagnoses. Typically, the diagnosis is based on the official tissue sample examination by one or more pathologists. If all the samples in a study are reviewed by the same pathologist(s), it is straightforward to compare samples to other samples. But for many large studies where there are numerous pathologists, inter-observer variability introduces biases regarding the diagnoses attached to the samples. This is a particularly formidable problem for biobanks, where samples often come from multiple sources and are likely to be read by pathologists at different institutions.

Pathologists frequently use different terminology to describe the same clinical phenotype, often based on their local institutional conventions. In principle, this could be improved by obtaining broad agreement on a fixed terminology, often referred to as a controlled vocabulary by database developers. However, enforcing controlled vocabularies in the context of clinical medicine rarely succeeds in practice. Moreover, even in circumstances where there are well-defined and agreed upon criteria and terminology, there may be disagreements about the actual diagnosis. In just one recent example, 13 pathologists, from a mixture of population-based sites and clinical-based sites, reviewed the same 35 breast cancer cases using a common detailed data form to aid in making each diagnosis. The agreement among the pathologists on the histological type of invasive breast cancer ranged from 35 – 99% for different types, with a wide category specific κ range of 0.3 – 1.0 26. There are many contributors to these differences including training location, historical experience, disease expertise, and pre-existing expectations. Achieving consensus for certain histological types was clearly more difficult than others. Variability in sample labels will significantly limit the performance of biomarkers and our ability to execute meta-informatics analyses. The challenge we face in this imperfect world is: how do we manage these differences?
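For readers less familiar with the κ statistic quoted above, the sketch below shows how chance-corrected agreement is computed. It is a simplification: it implements Cohen's κ for two raters, whereas the cited study reports category-specific values across 13 pathologists, and the diagnostic calls are made up for illustration.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of cases given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each rater assigned labels independently
    # at their own observed frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)

# Made-up calls on ten cases, for illustration only.
path_a = ["ductal"] * 6 + ["lobular"] * 4
path_b = ["ductal"] * 5 + ["lobular"] * 5
print(round(cohens_kappa(path_a, path_b), 2))
```

In this toy example the two readers agree on 9 of 10 cases, but because half of that agreement would be expected by chance, κ is 0.8 rather than 0.9.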

Objective #3: Develop software tools to calibrate pathological analysis

Efforts will continue to drive towards the use of controlled vocabularies and detailed systematic approaches to making diagnoses. These will help, but adoption will be slow and sporadic. Additional solutions may come from technology. Advancements in pattern recognition and automated imaging are showing impressive progress. Computers will eventually assist pathologists in making diagnoses and can be programmed to substitute controlled vocabularies. Their abilities to incorporate many forms of data align well with the current drive towards precision medicine 23. However, these tools are still in their early phases, and clinical trials will be needed to demonstrate their applicability and accuracy before they can be deployed.

The discipline of color management for offset printing inspires another solution. That profession faces a similar problem because devices which display color information from image files (e.g., monitors and printers) often interpret the same digital color information differently, which can lead to differences between the printed image and the image on the monitor. To solve this problem, color managers use calibration devices. A calibrator sends a series of specific digital colors to an output device, e.g., a monitor, and then an optical camera placed on the monitor measures how it displays that information. This information is sent to a file that acts as a color map which translates colors displayed on that monitor back to the information in the color file. In other words, assess how the device behaves and use informatics to make adjustments. The creation of color mapping files for all devices enables them to talk to one another about the same colors.

In an analogous fashion, instead of trying to change the behaviors of the pathologists, we could simply record their behaviors and create “calibration” files for each. Each pathologist would be asked to evaluate a common set of relevant specimens, which would include a mixture of straightforward and ambiguous cases (Figure 1). The goal is not to determine a “correct” answer; it is to learn how each pathologist makes calls relative to the average. Their readings would be recorded and added to a database that includes a digital profile on each pathologist. Software and informatics could then apply these profiles as needed when analyzing and interpreting large studies that compare experimental findings with diagnoses.

Figure 1. Creating a reference review profile to calibrate diagnoses from pathologists.

Top panel. All pathologists enrolled to read study specimens will review the same set of digital images of a reference set, selected to represent a broad range of relevant histopathologies, including ambiguous cases. A reference review profile that indicates how each pathologist scored each reference specimen will then be stored in a database. Bottom panel. When new specimens arrive they are assigned to be read by individual pathologists. The resulting diagnoses will be evaluated in the context of the pathologist’s reference profile and both the original diagnosis and an adjusted diagnosis will be reported. The adjusted diagnosis may help in interpreting diagnoses by highlighting samples where pathologists may disagree. Obviously, a very similar approach could be taken with radiologists and other professionals who render opinions on data to make diagnoses.

As an oversimplified example, suppose that for a particular type of prostate histopathology Pathologist C routinely assigns Gleason scores that are 3 points higher than the average of other pathologists. The system could detect this and provide an alert when Pathologist C makes diagnoses on that particular histopathology on an experimental specimen. In this setting, the system would report both the original diagnosis from Pathologist C and an adjusted diagnosis that compensates relative to the average call with that histological form.
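The oversimplified example above can be sketched in a few lines of Python. This is only an illustration under strong assumptions: each pathologist's tendency is modeled as a single additive offset per histology, computed as the mean difference between that pathologist's reference scores and the mean of the other readers. The function names, the `pattern_x` label, the alert threshold and the scores are all hypothetical, and real profiles would likely require more sophisticated alignment algorithms.

```python
from collections import defaultdict
from statistics import mean


def build_profiles(reference_reads):
    """Build a per-pathologist offset table from reads of the reference set.

    reference_reads: iterable of (pathologist, histology, specimen_id, score).
    Returns {(pathologist, histology): mean deviation from the other readers}.
    """
    reads = list(reference_reads)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _, _, specimen, score in reads:
        totals[specimen] += score
        counts[specimen] += 1

    deviations = defaultdict(list)
    for pathologist, histology, specimen, score in reads:
        n = counts[specimen]
        if n < 2:
            continue  # no other reader to compare against
        # Mean score of everyone else who read this reference specimen.
        others_mean = (totals[specimen] - score) / (n - 1)
        deviations[(pathologist, histology)].append(score - others_mean)
    return {key: mean(values) for key, values in deviations.items()}


def adjusted_diagnosis(profiles, pathologist, histology, score, alert_threshold=1.0):
    """Report the original call, an offset-corrected call, and an alert flag."""
    # Assumption: a pathologist with no stored profile gets no adjustment.
    offset = profiles.get((pathologist, histology), 0.0)
    return {
        "original": score,
        "adjusted": score - offset,
        "alert": abs(offset) >= alert_threshold,
    }


# Hypothetical reference reads: A and B score 6, C scores 9 on both specimens.
reference_reads = [
    ("A", "pattern_x", "ref1", 6), ("B", "pattern_x", "ref1", 6), ("C", "pattern_x", "ref1", 9),
    ("A", "pattern_x", "ref2", 6), ("B", "pattern_x", "ref2", 6), ("C", "pattern_x", "ref2", 9),
]
profiles = build_profiles(reference_reads)
report = adjusted_diagnosis(profiles, "C", "pattern_x", 9)
```

Here `report` carries both the original and the adjusted call, mirroring the idea that neither replaces the other; the alert flag simply marks diagnoses from a reader whose stored offset for that histology is large.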

The calibration specimens could be delivered to all contributing pathologists digitally. Improved scanning technologies and digital microscopy have led to excellent concordance between the results of optical and virtual microscopy 27, 28. Of course, pathologists are not machines; human nature is less predictable and we will need to account for this. Careful consideration will be necessary to select the most appropriate reference specimens and new algorithms will be needed to align the various profiles, but the foundations for this approach exist in other fields and should be adaptable here. Finally, it is worth noting that this approach might also apply to other areas that involve expert interpretation, such as radiology.

Challenge #4: Standardizing comparisons among clinical samples

Marko-Varga et al. (ASAP JPR reference: DOI: 10.1021/pr300185k) outline significant needs for standardization in several aspects of examining clinical samples. Of particular importance are assessing sample quality and making quantitative comparisons from sample to sample and laboratory to laboratory. Notably missing in proteomics is the availability of known and characterized internal reference proteins that could assist standardization. In the study of gene expression, a number of genes, such as GAPDH, have been identified whose expression remains reasonably constant from sample to sample and under varying conditions. Although these so-called housekeeping genes are not perfect, investigators can include their levels as comparators in a study to indicate general sample integrity, to demonstrate that equivalent quantities of sample were used and to enable comparisons from one sample to the next. This is routine in transcriptomics; why not proteomics? Are there housekeeping proteins that investigators could use as reference points for the levels of other proteins?

Objective #4: Identify and characterize internal reference proteins for proteomics

One suggested approach entails spiking in a known quantity of an exogenous reference peptide or protein (Marko-Varga et al., ASAP JPR reference: DOI: 10.1021/pr300185k). This useful approach solves some problems but not others. If one added a labile marker to a sample, one could track sample stability starting from the moment of marker addition. Conversely, a known amount of a stable marker could be added to allow quantitative comparisons. But this would presuppose that the samples had already been adjusted to be quantitatively comparable by some other means before adding the standard. Spiked-in quantitative standards can be especially helpful when added in the same concentration range as a desired target protein.

On the other hand, endogenous markers have the advantage of tracking sample stability throughout the entire sample life span, including critical steps such as cell or tissue lysis, protein extraction and processing. Used as quantitative markers, endogenous housekeeping proteins would not require prior methods to determine comparable levels. Unlike exogenous markers, they could be used in imaging studies, such as immunohistochemical staining of tissue microarrays. Moreover, the use of endogenous proteins would avoid the need to produce and qualify a reagent that would need to be distributed to many laboratories so that they could compare their results to others.

There are several types of endogenous markers that we should identify and characterize:

1. Quantitative reference proteins. The most basic of reference standards, these would enable quantitative comparisons between samples both within and between individual labs. The ideal proteins here would be stable housekeeping proteins whose levels remain generally unchanged under various cellular and tissue conditions. In the perfect world, the levels would be consistent across cell types as well, though this may be too difficult to achieve. At minimum, the relative abundances in different cell types should be understood.

2. Sample integrity reference proteins. These markers are primarily useful in experiments that focus on biomolecules sensitive to degradation. They can help determine the degree to which samples have undergone proteolysis or other degradative steps, such as changes in important post translational modifications after blood vessel clamping or during sample processing. Therefore, these proteins (or peptides) must themselves be sensitive to the same types of degradation of concern for the protein(s) of interest. Given the daunting variety of potential degradative steps (e.g., cleavage by proteases with various specificities, dephosphorylation, deacetylation, deglycosylation, oxidation, etc.), a panel of reference standards that addresses some or all of these will be needed.

3. Cell type-specific reference proteins. The contribution of different cell types in a clinical sample can affect the relative abundances of specific proteins. Different biopsies may have different ratios of stroma to tumor to fatty tissue, for example. The identification of reference proteins that are unique to certain cell types, such as stromal cells or epithelial cells, could inform the determination of the relative contribution of different cell types in a sample. Understanding this contribution would improve the biological interpretation of changes of protein abundance.

In order to be generally useful and to spare precious sample, the reference proteins should be straightforward to measure by common methods such as antibody detection, immunofluorescence or mass spectrometry.
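To make the role of a quantitative reference protein concrete, the following Python sketch shows how a target protein signal might be expressed relative to an endogenous reference before two samples are compared. It is an illustration only: the function names and signal values are hypothetical, and it assumes that a suitable reference protein with a stable level exists, which has yet to be established.

```python
def normalize_to_reference(target_signal, reference_signal):
    """Express a target protein signal relative to a reference protein signal."""
    if reference_signal <= 0:
        raise ValueError("reference signal must be positive to normalize against")
    return target_signal / reference_signal


def fold_change(sample_a, sample_b):
    """Fold change of the normalized target level in sample_b relative to sample_a.

    Each sample is a dict with hypothetical keys "target" and "reference"
    holding measured signals (e.g., from antibody detection or mass spectrometry).
    """
    ratio_a = normalize_to_reference(sample_a["target"], sample_a["reference"])
    ratio_b = normalize_to_reference(sample_b["target"], sample_b["reference"])
    return ratio_b / ratio_a


# Hypothetical signals: sample B has a lower raw target signal than sample A,
# but its reference signal shows that less material was analyzed.
sample_a = {"target": 200.0, "reference": 100.0}
sample_b = {"target": 150.0, "reference": 50.0}
change = fold_change(sample_a, sample_b)
```

In this made-up case the raw target signal is lower in sample B, yet the normalized level is higher, which is the kind of correction a trusted reference protein would make possible across samples and laboratories.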

The challenge, of course, is that no such endogenous standard proteins have been identified. Significant effort will be needed to examine existing data and to perform new experiments to find the best candidates for such standards. Ultimately we should seek consensus about the best standards and then agree to include the appropriate standards in all relevant proteomics reporting. The potential payoff is large. Such standards would not only enable comparisons of individual experiments, but would also provide reference points for metaproteomics studies, unleashing an interpretive power that will revolutionize our field. I suggest that it is time for our community to convene and make plans to search for these reference proteins. Should we consider a competition to find them?

Challenge #5: Rethinking public involvement in biobanks

Social challenges continue to represent a significant concern regarding the use of clinical samples. With the recent explosion of new techniques to examine specimens, some of which could theoretically enable identification of the sample donor, many potential donors have expressed concerns about preserving their privacy 7. Understandably, this has engendered tensions between sample donors and scientists regarding informed consent and the freedom to use samples for new research directions 25. In addition, some sample donors want an opportunity to see the results of the studies that are performed on their specimens. Many clinical and translational studies are limited because of lack of diversity in the clinical specimens used for the study 29. Greater efforts are needed to ensure broad representation of samples from all sources, including marginalized and minority populations. Moreover, like scientists, sample donors have expressed frustration hearing about stalled research for lack of sample access when they know that so many samples have been collected and stored that never get shared. They want to find more effective mechanisms to connect with scientists researching diseases of interest to them. Nevertheless, despite these various concerns about privacy and sample use, the majority of the public supports clinical studies and would be willing to participate in them. Indeed, most would be willing to share their data with researchers and many would be willing to provide access to their past medical records 7. A better approach is needed. For this last challenge, we should develop structures that treat the collection and use of clinical samples as a partnership between scientists and the public 6.

Objective #5: Develop new governance models for biobanks

The current practice for collecting clinical specimens does not lend itself to public-researcher partnerships. Typically, a researcher recognizes the need for samples in her research, writes a grant application and approaches an IRB with a protocol. After eventually obtaining approval for this protocol, she then recruits sample donors (patients and controls), obtains informed consent and begins sample collection (Figure 2). There are several potential flaws with this approach. First, it fosters the view of sample donors primarily as sample and information resources rather than partners in the research process. Second, after devoting significant effort both bureaucratically and technically to collect the samples, researchers understandably view the resulting collection as belonging to them. Certainly, in common practice, the researchers who organize the collection have control of who gets access to the samples and how much they get. For these reasons, this arrangement significantly reduces the likelihood that these samples will ever get shared with other investigators. Finally, this model does not lend itself to developing a mechanism for providing information or feedback to the sample donors regarding the results of the research 30.

Figure 2. Exploring new governance paradigms for biobanks.

Top panel. This researcher driven paradigm is predominant today. The researcher plays the central role by walking the process through various steps including obtaining funding, getting IRB approval and arranging to collect the specimens. The roles of the public here are often limited to providing input through lay membership on the IRB and providing specimens. The biobank is then governed by the researcher who drove the process. Bottom panel. A new paradigm for biobanking governance begins with a partnership between the researcher and the public, often the patient advocate community. They seek funding and IRB approval together, with both advocating on behalf of the research. The public arm of the partnership is well suited to help recruit participation. The biobank is then governed by the partnership. In this paradigm the public participates in all aspects of the process.

In recent years the development of a strong patient advocate community has enabled a new paradigm to emerge. Instead of a process driven almost entirely by the researcher that only involves sample donors at the end, the new paradigm begins with a partnership between the researcher and committed members of the public (often the patient advocate community) and they build the sample collection together (Figure 2). Obviously, such partnerships ensure that views of the public get representation in the review process, not only as a representative on the IRB but also as advocates for the research itself. Once they receive the necessary approvals, the public arm of the partnership can help the researcher find the sample donors needed for the study. Because they have worked together, the sample collection belongs to the partnership and includes a process by which others can also obtain access to the samples, avoiding sequestration of the samples. Moreover, the tighter link between the sample collection and the donors who provided them facilitates mechanisms to provide feedback and information about research results to the donors. The partnership also provides a mechanism to request permission in the future from the donor community to use the samples for unanticipated experiments and new methods. In a similar fashion, it can help researchers obtain important clinical follow up information on patients and controls. After all, both sides of the partnership strive for the same goal: discoveries that lead to a reduction of morbidity and mortality from disease.

Perhaps the best argument for the partnership paradigm is that it affords the opportunity to accomplish things that would be notably difficult to achieve otherwise. In 2003, Connie Rufenbarger, a patient advocate and Director of Project Development for The Catherine Peachey Fund, hosted a meeting at the Indiana University Cancer Center where she learned that breast cancer scientists desperately needed healthy breast tissue. Breast tissue collected through biopsies or resections typically comes from women with documented or suspected breast cancer. The occasion to collect such tissue from healthy age-matched women does not arise in clinical medicine. Yet, understanding breast cancer necessitates comparing abnormal tissue to normal tissue, i.e., from women with no hint of abnormality. Were a researcher alone to propose a protocol to collect breast biopsies from healthy women, she could meet significant resistance from the evaluating IRB and find difficulty accruing volunteers. However, by partnering with Dr. Anna Maria Storniolo at Indiana University (IU) they created the Susan G. Komen for the Cure Tissue Bank at the IU Simon Cancer Center. They got the necessary approvals and funding, and mobilized women who are passionate about supporting medical research to stop breast cancer. This tissue bank now has more than 2800 samples of healthy breast tissue.

That these women were willing to assume the risk and consequences of a surgical procedure solely to advance our understanding of disease illustrates the will, courage and commitment of their community towards this cause. They are not alone. Similar motivation can be found in numerous advocate communities for countless diseases. Many are vocally committed to advancing research by improving sample collections. Partnering with them to create biobanks will help educate the public about the importance of biomedical research and enable bold new directions that will increase the success of these programs.

Summary

The five challenges outlined in this editorial address some of the most significant roadblocks limiting the success of biobanks and their roles in proteomics. They include: (1) defining our biobanking needs; (2) increasing the use of and access to SOPs; (3) mapping inter-observer differences for use in normalizing diagnoses; (4) identifying natural internal protein controls; and (5) redefining the clinical sample paradigm by building partnerships with the public. This list of challenges is not comprehensive. Rather, these objectives were chosen because we can act on all of them now; we have the tools we need. I believe that each of these challenges is resolvable in less than 5 years. In some cases, I have suggested solutions, but we should all keep open minds. There are better ideas out there. Whichever approaches we take will require resources and organized efforts. Perhaps we can turn towards our professional organizations like HUPO, ISBER and others to start on these challenges. However we go about solving these problems, the future of proteomics depends on access to high quality clinical samples, collected under standardized conditions, accurately annotated and shared under conditions that promote the research we need to do. We must all step up and contribute to this effort so that it gets done right as we move forward.

Acknowledgements

The author would like to thank Catherine Cormier, Garrick Wallstrom, Karen Anderson, Mitch Magee, and two anonymous reviewers for their suggestions and critical reading, and Kathleen Stinchfield for her expert help in preparing this manuscript. The author receives support from the NIH/NCI Early Detection Research Network 5U01CA117374, NIH/NIGMS Protein Structure Initiative U01GM098912, NIH/NCI Physical Science Oncology Center H47328, and the Breast Cancer Research Foundation.

References

  • 1.Burdall SE, Hanby AM, Lansdown MR, Speirs V. Breast cancer cell lines: friend or foe? Breast cancer research : BCR. 2003;5(2):89–95. doi: 10.1186/bcr577.
  • 2.Glenn GC, Hathaway TK. Effects of specimen evaporation on quality control. American journal of clinical pathology. 1976;66(4):645–652. doi: 10.1093/ajcp/66.4.645.
  • 3.Calam RR. REVIEWING IMPORTANCE OF SPECIMEN COLLECTION. Journal of the American Medical Technologists. 1977;39(6):297–302.
  • 4.Vaught J. [accessed Aug 3, 2012];Research & Policy Initiatives in NCI's Office of Biorepositories & Biospecimen Research Translational Research Interest Group. http://sigs.nih.gov/trig/Documents/TRIG_091210_Vaught.pdf.
  • 5.Blow N. Biobanking: freezer burn. Nat Meth. 2009;6(2):173–178.
  • 6.Saha K, Hurlbut JB. Research ethics: Treat donors as partners in biobank research. Nature. 2011;478(7369):312–313. doi: 10.1038/478312a.
  • 7.Kaufman DJ, Murphy-Bollinger J, Scott J, Hudson KL. Public opinion about the importance of privacy in biobank research. American journal of human genetics. 2009;85(5):643–654. doi: 10.1016/j.ajhg.2009.10.002.
  • 8.Forsberg JS, Hansson MG, Eriksson S. Changing perspectives in biobank research: from individual rights to concerns about public health regarding the return of results. Eur J Hum Genet. 2009;17(12):1544–1549. doi: 10.1038/ejhg.2009.87.
  • 9.Hansson MG, Dillner J, Bartram CR, Carlson JA, Helgesson G. Should donors be allowed to give broad consent to future biobank research? The lancet oncology. 2006;7(3):266–269. doi: 10.1016/S1470-2045(06)70618-0.
  • 10.Cambon-Thomsen A. The social and ethical issues of post-genomic human biobanks. Nat Rev Genet. 2004;5(11):866–873. doi: 10.1038/nrg1473.
  • 11.Garcia-Merino I, de Las Cuevas N, Jimenez JL, Gallego J, Gomez C, Prieto C, Serramia MJ, Lorente R, Munoz-Fernandez MA. The Spanish HIV BioBank: a model of cooperative HIV research. Retrovirology. 2009;6:27. doi: 10.1186/1742-4690-6-27.
  • 12.Eiseman E, Haga SB. Handbook of Human Tissue Sources: A National Resource of Human Tissue Samples. [accessed Aug 6, 2012]; http://www.rand.org/pubs/monograph_reports/MR954.
  • 13.Colditz GA, Manson JE, Hankinson SE. The Nurses' Health Study-20-year contribution to the understanding of health among women. Journal of women's health / the official publication of the Society for the Advancement of Women's Health Research. 1997;6(1):49–62. doi: 10.1089/jwh.1997.6.49.
  • 14.Villanueva J, Philip J, Chaparro CA, Li Y, Toledo-Crow R, DeNoyer L, Fleisher M, Robbins RJ, Tempst P. Correcting Common Errors in Identifying Cancer-Specific Serum Peptide Signatures†. Journal of proteome research. 2005;4(4):1060–1072. doi: 10.1021/pr050034b.
  • 15.Berglund L, Bjoerling E, Oksvold P, Fagerberg L, Asplund A, Szigyarto CAK, Persson A, Ottosson J, Wernerus H, Nilsson P, Lundberg E, Sivertsson A, Navani S, Wester K, Kampf C, Hober S, Ponten F, Uhlen M. A Genecentric Human Protein Atlas for Expression Profiles Based on Antibodies. Molecular & Cellular Proteomics. 2008;7(10):2019–2027. doi: 10.1074/mcp.R800013-MCP200.
  • 16.Espina V, Mueller C. Reduction of Preanalytical Variability in Specimen Procurement for Molecular Profiling. In: Espina V, Liotta LA, editors. Molecular Profiling. Vol. 823. 2012. pp. 49–57.
  • 17.Vaught J, Rogers J, Myers K, Lim MD, Lockhart N, Moore H, Sawyer S, Furman JL, Compton C. An NCI perspective on creating sustainable biospecimen resources. Journal of the National Cancer Institute. Monographs. 2011;2011(42):1–7. doi: 10.1093/jncimonographs/lgr006.
  • 18. [accessed Aug 2, 2012];OECD Best Practice Guidelines for Biological Resource Centers – General Best Practice Guidelines for all BRCs. http://www.oecd.org/dataoecd/7/13/38777417.pdf.
  • 19.Betsou F, Lehmann S, Ashton G, Barnes M, Benson EE, Coppola D, DeSouza Y, Eliason J, Glazer B, Guadagni F, Harding K, Horsfall DJ, Kleeberger C, Nanni U, Prasad A, Shea K, Skubitz A, Somiari S, Gunter E; International Society for Biological and Environmental Repositories (ISBER) Working Group on Biospecimen Science. Standard Preanalytical Coding for Biospecimens: Defining the Sample PREanalytical Code. Cancer Epidemiology Biomarkers & Prevention. 2010;19(4):1004–1011. doi: 10.1158/1055-9965.EPI-09-1268.
  • 20.Riegman PH, Morente MM, Betsou F, de Blasio P, Geary P. Biobanking for better healthcare. Mol Oncol. 2008;2(3):213–222. doi: 10.1016/j.molonc.2008.07.004.
  • 21. [accessed Aug 6, 2012];International Agency for Research on Cancer (IARC) http://ibb.iarc.fr/
  • 22.European Organisation for the Research and Treatment of Cancer (EORTC) [accessed Jul 31, 2012];Biobanking: Human Biological Materials Collection, Storage and Use (POL020) http://www.eortc.org/newsletter-archive/eortcbiobanking-human-biological-materials-collection-storage-and-use-pol020.
  • 23.Micheel CM, Nass SJ, Omenn GS, editors. Evolution of Translational Omics: Lessons Learned and the Path Forward. The National Academies Press; 2012.
  • 24.Ransohoff DF, Gourlay ML. Sources of Bias in Specimens for Research About Molecular Markers for Cancer. Journal of Clinical Oncology. 2010;28(4):698–704. doi: 10.1200/JCO.2009.25.6065.
  • 25.Hayden EC. Informed consent: a broken contract. Nature. 2012;486(7403):312–314. doi: 10.1038/486312a.
  • 26.Longacre TA, Ennis M, Quenneville LA, Bane AL, Bleiweiss IJ, Carter BA, Catelano E, Hendrickson MR, Hibshoosh H, Layfield LJ, Memeo L, Wu H, O'Malley FP. Interobserver agreement and reproducibility in classification of invasive breast carcinoma: an NCI breast cancer family registry study. Mod Pathol. 2006;19(2):195–207. doi: 10.1038/modpathol.3800496.
  • 27.Furness P. A randomized controlled trial of the diagnostic accuracy of internet-based telepathology compared with conventional microscopy. Histopathology. 2007;50(2):266–273. doi: 10.1111/j.1365-2559.2006.02581.x.
  • 28.Molnar B, Berczi L, Diczhazy C, Tagscherer A, Varga SV, Szende B, Tulassay Z. Digital slide and virtual microscopy based routine and telepathology evaluation of routine gastrointestinal biopsy specimens. Journal of Clinical Pathology. 2003;56(6):433–438. doi: 10.1136/jcp.56.6.433.
  • 29.Haga SB. Impact of limited population diversity of genome-wide association studies. Genetics in Medicine. 2010;12(2):81–84. doi: 10.1097/GIM.0b013e3181ca2bbf.
  • 30.Maschke KJ. [accessed Aug 6, 2012];Biobanks: DNA and Research. http://www.thehastingscenter.org/uploadedFiles/Publications/Briefing_Book/biobanks%20dna%20and%20research.pdf.
