Bioinformatics. 2021 Nov 15;38(4):1157–1158. doi: 10.1093/bioinformatics/btab779

tcplfit2: an R-language general purpose concentration–response modeling package

Thomas Sheffield, Jason Brown, Sarah Davidson, Katie Paul Friedman, Richard Judson
Editor: Anthony Mathelier
PMCID: PMC10202035  PMID: 34791027

Abstract

Summary

Many applications of chemical screening are performed in concentration–response or dose–response mode, and it is necessary to extract appropriate parameters, including whether the chemical/assay pair is active and, if so, at what concentrations activity is seen. Typically, multiple mathematical models or curve shapes are tested against the data to assess the best fit. Several commercial programs and open-source libraries are used for this purpose. A widely used system for managing high-throughput screening (HTS) concentration–response data is tcpl (ToxCast Pipeline). The current implementation of tcpl has the concentration–response modeling code tightly integrated with the data management and databasing aspects of HTS data processing. Tcplfit2 is a stand-alone version of the curve-fitting and hitcalling core of tcpl that has been extended to include a large number of standard curve classes and to use benchmark dose modeling. This package will be useful for HTS concentration–response data such as high-throughput whole genome transcriptomics.

Availability and implementation

tcplfit2 is written in R and is available from CRAN.

1 Introduction

A common technique in drug discovery or chemical safety assessment is to run chemicals in concentration–response mode against one or more assays. The key outputs from the concentration–response testing are (i) is the chemical active in the assay? (ii) if it is active, at what concentration is significant activity seen, relative to background? and (iii) what is the shape of the concentration–response curve? There are several commercial applications for performing this type of analysis [SigmaPlot (Systat, 2020), GraphPad Prism (GraphPad, 2021)] as well as open-source packages, described below.

Our group has developed a widely used R-language application called tcpl (ToxCast Pipeline; Filer et al., 2017), which manages large collections of high-throughput screening (HTS) data from raw data management through concentration–response modeling and graphical presentation of results. The tcplfit2 package addresses two shortcomings in the original implementation of tcpl. The first is that the collection of curve-fitting methods in tcpl was limited to a constant model, a Hill model (sigmoidal) and a gain-loss model (a rising Hill model followed by a falling Hill model). We have now added several exponential and power law models that are commonly used in concentration–response modeling, including all models used in the BMDExpress program for modeling concentration–response transcriptomics data (Phillips et al., 2019; Yang et al., 2007). Additionally, benchmark dose (BMD) modeling capabilities have been added. The BMD is the dose or concentration where the model curve crosses the benchmark response (BMR) level, which is defined as some multiple (often 1.349) of the standard deviation of the background noise level. The second shortcoming is that the modeling code in tcpl could not be used on its own: tcplfit2 is a stand-alone package that provides this comprehensive functionality independently of the complex data management capabilities and database integration tasks carried out by tcpl. One major new use of tcplfit2 is to model concentration–response in transcriptomics at the gene and pathway levels (Harrill et al., 2021).

2 Implementation

Tcplfit2 is an R package downloadable from CRAN (https://cran.r-project.org/package=tcplfit2). All functions are documented in the package manual, and a vignette is provided. There are two main functions that a user will need to call for each chemical-assay dataset. A dataset consists of a vector of concentrations and a vector of responses of the same length, which together constitute a single concentration–response series, with replicates. The main functions are tcplfit2_core and tcplhit2_core. The first performs the concentration–response modeling, fitting the data to each of the specified models and selecting the model with the minimum AIC value (Akaike Information Criterion; Akaike, 1998) as the best model. The second determines whether the data show significant activity. The primary inputs to tcplfit2_core are the concentration and response vectors, a cutoff value (which determines the level of response required to call the curve a hit), the list of curve methods and optional chemical and assay identifiers. The output is a list indicating the winning model and the fitting parameters for all models. This list is then input to tcplhit2_core, whose output is a 1-row data frame with elements that include the hitcall, the fitting parameters for the winning model and the original identifiers (chemical, assay, etc.). The models that can be used include constant (response = 0), Hill, gain-loss, first- and second-order polynomials, a power function and 2-, 3-, 4- and 5-parameter exponential models. The functional forms for the models are defined in the manual. One novel aspect of the hit calling method is that it returns a value in the range of 0–1, providing a continuous estimate that the chemical–endpoint pair is active, rather than a binary value.
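
The model-selection step described above (fit every candidate model, then keep the one with the minimum AIC) can be sketched in a few lines. The following is an illustrative Python sketch, not the package's R code: the function names, the log-likelihood values and the parameter counts are all hypothetical, and the fitting itself is assumed to have been done elsewhere.

```python
from typing import Dict, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def select_winner(fits: Dict[str, Tuple[float, int]]) -> Tuple[str, Dict[str, float]]:
    """Return the name of the minimum-AIC model and the AIC of every model.

    `fits` maps a model name to (maximized log-likelihood, number of
    fitted parameters).
    """
    aics = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    winner = min(aics, key=aics.get)
    return winner, aics


# Hypothetical fit results for one concentration-response series.
example_fits = {
    "cnst": (-30.0, 1),
    "hill": (-12.5, 4),
    "poly1": (-20.0, 2),
}
winner, aics = select_winner(example_fits)
print(winner, aics)
```

Because AIC penalizes each additional parameter by 2, a more flexible model wins only when its gain in log-likelihood outweighs its extra parameters.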

Additional inputs to tcplhit2_core are the original identifiers and a value called onesd, which is one standard deviation of the background noise level. The BMR is defined as onesd × 1.349, which is the amount required to shift the mean response of the background distribution such that the treated distribution contains a 10% increase over the assumed background rate of response (Thomas et al., 2007).
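
As a worked illustration of the BMR and BMD definitions, the sketch below computes BMR = 1.349 × onesd and inverts a Hill curve to find the concentration at which it crosses the BMR. It is a Python sketch under stated assumptions, not the package's implementation: the Hill parameterization tp / (1 + (ga / conc)^p) is a common form assumed here (the package manual defines the forms actually used), and the parameter values and function names are hypothetical.

```python
from typing import Optional

BMR_SCALE = 1.349  # multiplier on onesd given in the text


def benchmark_response(onesd: float, scale: float = BMR_SCALE) -> float:
    """BMR = scale * onesd, where onesd is one SD of the background noise."""
    return scale * onesd


def hill(conc: float, tp: float, ga: float, p: float) -> float:
    """Assumed Hill form: tp / (1 + (ga / conc)^p), valid for conc > 0."""
    return tp / (1.0 + (ga / conc) ** p)


def hill_bmd(bmr: float, tp: float, ga: float, p: float) -> Optional[float]:
    """Concentration at which the assumed Hill curve crosses the BMR.

    The BMR is applied in the direction of the curve (the sign of tp).
    Returns None when the curve never reaches the BMR.
    """
    if bmr <= 0.0 or tp == 0.0 or bmr >= abs(tp):
        return None
    ratio = abs(tp) / bmr - 1.0
    return ga / ratio ** (1.0 / p)


bmr = benchmark_response(onesd=0.5)          # 1.349 * 0.5
bmd = hill_bmd(bmr, tp=2.0, ga=10.0, p=1.0)  # hypothetical fitted parameters
print(bmr, bmd)
```

For models without a closed-form inverse, the same crossing point would have to be found numerically, for example by root finding on f(conc) − BMR.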

Two further functions are provided to help the user develop a custom workflow. The first is concRespCore, which takes as input all of the parameters necessary for both the curve-fitting and hitcalling methods and returns a simple summary. The second auxiliary function is concRespPlot, which takes the output from concRespCore and makes a plot showing the original data and the fitted curve, annotated with identifiers and output parameters such as the BMD [and its confidence intervals, calculated using a maximum likelihood method, identical to what is used in BMDExpress (Yang et al., 2007)] and the continuous hitcall. The package vignette demonstrates how to use concRespCore and concRespPlot. It is expected that users will write their own custom version of the plotting function.

3 Statistical considerations

The underlying statistical model assumes that the activity is zero at low concentrations, so that the background noise distribution will be zero-centered and the noise will follow a t-distribution with 4 degrees of freedom (Filer et al., 2017). A ‘true’ response can be either negative or positive, but always starting from a baseline of zero. Nonlinear maximum likelihood estimation is used to determine the parameters for each model. One can add new models or functional forms to the package by creating a new function named fit(model name) with inputs being the concentration and response vectors and any model-specific constraints. There are two inputs already mentioned (cutoff and onesd) that are estimates of the background or noise level, used to determine whether a curve is a hit or not. The first (cutoff) is defined by the user based on expert judgment about how large a response will be required to call a hit. Our preferred approach is to calculate the 95% confidence interval around baseline data. There are two ways to run an assay that will affect how the background distribution is analyzed. (i) The raw assay results range from 0 to a maximum value, for both samples treated with chemical and control samples (e.g. DMSO). A measurement of the background distribution, typically the standard deviation, can then be calculated from the control. (ii) Control well normalization is performed, for instance looking at fold-changes in a transcriptomics assay. The preferred approach to calculate the background in this case is to assume that most chemicals will be inactive at the lowest one or two concentrations across the assay, and then calculate distributional parameters from these samples.
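
To make the error model concrete, the sketch below evaluates the log-likelihood of a concentration–response series when residuals follow a scaled t-distribution with 4 degrees of freedom around a model curve. This is an illustrative Python sketch, not the package's code: the scale parameter, the example data and the fixed Hill parameters are hypothetical, and the optimization step that would maximize this quantity over the model parameters is omitted.

```python
import math
from typing import Callable, Sequence

DF = 4  # degrees of freedom stated in the text


def log_t_density(z: float, df: int = DF) -> float:
    """Log density of a standard Student t-distribution at z."""
    return (
        math.lgamma((df + 1) / 2.0)
        - math.lgamma(df / 2.0)
        - 0.5 * math.log(df * math.pi)
        - (df + 1) / 2.0 * math.log1p(z * z / df)
    )


def log_likelihood(
    conc: Sequence[float],
    resp: Sequence[float],
    model: Callable[[float], float],
    scale: float,
    df: int = DF,
) -> float:
    """Sum of log densities of the residuals, scaled by `scale` (> 0)."""
    total = 0.0
    for c, y in zip(conc, resp):
        z = (y - model(c)) / scale
        total += log_t_density(z, df) - math.log(scale)
    return total


# Hypothetical data and candidate models (zero baseline, as assumed in the text).
conc = [0.1, 1.0, 10.0, 100.0]
resp = [0.0, 0.2, 1.0, 1.9]
ll_cnst = log_likelihood(conc, resp, lambda c: 0.0, scale=0.2)
ll_hill = log_likelihood(conc, resp, lambda c: 2.0 / (1.0 + 10.0 / c), scale=0.2)
print(ll_cnst, ll_hill)
```

The heavy tails of the t-distribution mean that a single outlying well lowers the likelihood far less than it would under a normal error model, which makes the fits more robust to occasional aberrant responses.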

The tcplhit2_core continuous hitcall is calculated as the product of three probabilities: (i) that at least one median response is greater than the cutoff; (ii) that the top of the fitted curve is above the cutoff; and (iii) that the winning AIC value is less than that of the constant model. The first probability is computed by using the error parameter from the model fit and the t-distribution to calculate the probability of at least one response exceeding the cutoff (the error model around the data uses a 3-parameter t-distribution). The second is computed by using the likelihood ratio to obtain the one-sided probability of the cutoff being exceeded. The third is set to the Akaike weight relative to the constant model:

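$$\frac{e^{-\frac{1}{2}\mathrm{AIC}_{\mathrm{winning}}}}{e^{-\frac{1}{2}\mathrm{AIC}_{\mathrm{winning}}} + e^{-\frac{1}{2}\mathrm{AIC}_{\mathrm{cnst}}}}. \qquad (1)$$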
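
Equation (1) and the product form of the continuous hitcall can be sketched as follows. This is an illustrative Python sketch with hypothetical function names, not the package's code. The Akaike weight is evaluated in the algebraically equivalent form 1 / (1 + exp(−(AIC_cnst − AIC_winning) / 2)) so that large AIC values do not underflow, and the first two probabilities are taken as given inputs rather than computed.

```python
import math


def akaike_weight(aic_winning: float, aic_cnst: float) -> float:
    """Akaike weight of the winning model relative to the constant model.

    Equivalent to exp(-AIC_w / 2) / (exp(-AIC_w / 2) + exp(-AIC_c / 2)),
    rewritten in terms of the AIC difference for numerical stability.
    """
    delta = aic_cnst - aic_winning
    if delta >= 0.0:
        return 1.0 / (1.0 + math.exp(-0.5 * delta))
    e = math.exp(0.5 * delta)
    return e / (1.0 + e)


def continuous_hitcall(p_response: float, p_top: float, p_aic: float) -> float:
    """Product of the three probabilities described in the text."""
    return p_response * p_top * p_aic


# Hypothetical AIC values and probabilities.
w_equal = akaike_weight(10.0, 10.0)   # equal support gives a weight of 0.5
w_strong = akaike_weight(33.0, 62.0)  # winning model strongly preferred
hitcall = continuous_hitcall(0.9, 0.8, w_equal)
print(w_equal, w_strong, hitcall)
```

Because the hitcall is a product, any one weak component (for example, a winning model that is barely better than the constant model) pulls the overall value toward zero.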

4 Comparison with other R-language open-source concentration–response packages

The package drc is similar to tcplfit2, with an overlapping set of models, and is able to handle more complex types of data than are seen with HTS (Ritz et al., 2015). Mixtox is designed to model concentration–response in mixtures of chemicals (Zhu, 2017).

Disclaimer

The views expressed in this article are those of the authors and do not necessarily reflect the views or policies of the US EPA.

Funding

This work was supported by the US EPA.

Conflict of Interest: none declared.

Contributor Information

Thomas Sheffield, Oak Ridge Institute for Science and Education, Oak Ridge, TN, USA.

Jason Brown, US Environmental Protection Agency, RTP, NC, USA.

Sarah Davidson, US Environmental Protection Agency, RTP, NC, USA.

Katie Paul Friedman, US Environmental Protection Agency, RTP, NC, USA.

Richard Judson, US Environmental Protection Agency, RTP, NC, USA.

References

  1. Akaike H. (1998) Information Theory and an Extension of the Maximum Likelihood Principle. Springer, New York, NY.
  2. Filer D.L. et al. (2017) tcpl: the ToxCast pipeline for high-throughput screening data. Bioinformatics, 33, 618–620.
  3. GraphPad (2021) Prism. https://www.graphpad.com/ (1 October 2021, date last accessed).
  4. Harrill J.A. et al. (2021) High-throughput transcriptomics platform for screening environmental chemicals. Toxicol. Sci., 181, 68–89.
  5. Phillips J.R. et al. (2019) BMDExpress 2: enhanced transcriptomic dose-response analysis workflow. Bioinformatics, 35, 1780–1782.
  6. Ritz C. et al. (2015) Dose-response analysis using R. PLoS One, 10, e0146021.
  7. Systat (2020) SigmaPlot. https://systatsoftware.com/products/sigmaplot/ (1 October 2021, date last accessed).
  8. Thomas R.S. et al. (2007) A method to integrate benchmark dose estimates with genomic data to assess the functional effects of chemical exposure. Toxicol. Sci., 98, 240–248.
  9. Yang L. et al. (2007) BMDExpress: a software tool for the benchmark dose analyses of genomic data. BMC Genomics, 8, 387.
  10. Zhu X. (2017) Package mixtox. https://cran.r-project.org/web/packages/mixtox/mixtox.pdf (1 October 2021, date last accessed).

Articles from Bioinformatics are provided here courtesy of Oxford University Press
