PLoS One. 2022 May 9;17(5):e0259327. doi: 10.1371/journal.pone.0259327

EZTraits: A programmable tool to evaluate multi-site deterministic traits

Matt Carland 1,#, Haley Pedersen 1,#, Madhuchanda Bose 1, Biljana Novković 1, Charles Manson 1, Shany Lahan 1, Alex Pavlenko 1, Puya G Yazdi 1, Manfred G Grabherr 1,*
Editor: Ahmed Mancy Mosa 2
PMCID: PMC9084532  PMID: 35533190

Abstract

The vast majority of human traits, including many disease phenotypes, are affected by alleles at numerous genomic loci. With a continually increasing set of variants with published clinical disease or biomarker associations, an easy-to-use tool for non-programmers to rapidly screen VCF files for risk alleles is needed. We have developed EZTraits as a tool to quickly evaluate genotype data against a set of rules defined by the user. These rules can be defined directly in the scripting language Lua, for genotype calls using variant ID (RS number) or chromosomal position. Alternatively, EZTraits can parse simple and intuitive text including concepts like ’any’ or ’all’. Thus, EZTraits is designed to support rapid genetic analysis and hypothesis-testing by researchers, regardless of programming experience or technical background. The software is implemented in C++ and compiles and runs on Linux and MacOS. The source code is available under the MIT license from https://github.com/selfdecode/rd-eztraits.

Introduction

Although many common health disorders are highly polygenic, requiring the calculation of a complex aggregate genetic risk score, there is a subset of traits and disorders for which a few variants with disproportionately large effect sizes account for a significant portion of phenotypic variance. These mono- or oligogenic traits are therefore amenable to simpler analytical approaches, which do not rely on statistical association. Often, these traits can be easily identified with simple analyses and determination of the presence/absence of associated variants.

One illustrative example is the APOE gene, in which a two-SNP haplotype may modulate an individual’s risk of late-onset Alzheimer’s disease by approximately 15x [1]. Another example is the ability to digest lactose into adulthood, which can be fully predicted on the basis of just six SNPs in the MCM6 gene, among which a single heterozygous- or homozygous-derived genotype implies lactose tolerance [2]. Similarly, dietary tolerance to fructose can be predicted by the presence of a few different combinations of homozygous mutations in the ALDOB gene [3].

Furthermore, small numbers of variants may also be useful for characterizing individual variability within specific biological pathways. One example is the COMT gene, in which various four-SNP haplotypes have been associated with significant differences in the biological activity of the gene’s product enzyme [4, 5]. Even in the absence of a direct link to a clinical phenotype, such genetic markers may serve as a useful “jumping-off” point for further investigations into the etiological structure of clinically relevant phenotypes.

As whole-genome sequencing becomes more routine, many of these traits can be interrogated directly from genomic data. However, a typical sequencing project can produce millions of variants, and parsing through variant files often requires specialized programming knowledge. The difficulty of attracting and training enough researchers with the requisite programming and computing skills is well known [6]. In response, there has been a move towards intuitive GUIs as well as script-based tools free from expensive licenses. These solutions make computational analysis of the growing wealth of genomic data accessible to more researchers. Indeed, the shortfall in computing skills for data analysis has been recorded in many surveys. Moreover, the situation is not improving; in fact, the skills gap is set to widen [7], which makes the development of tools to narrow this gap especially important. At present, we are not aware of any open-source, user-friendly programs that are tailored to the analysis of simple, deterministic traits from genomic data.

Here, we present EZTraits, a tool specifically designed for non-programmers. EZTraits is intended to assist with searching VCF (Variant Call Format [8]) files for the presence of mono- or oligogenic traits and returning their trait associations to the user, based either on our library of variant-trait associations or new, user-added conditions and associations. Thus, EZTraits allows genomic researchers to analyze a wide variety of phenotypes of clinical and scientific interest quickly and easily, regardless of their level of programming ability.

Methods

EZTraits evaluates variant combinations by internally building and interpreting Lua scripts. The Lua programming language [9] was designed with ease of use in mind and has been widely adopted by non-programmers for computer game modding and writing plug-ins, making it a natural choice for researchers with or without a coding background.

There are two ways for users to build analyses with EZTraits: (a) by writing or modifying a Lua “snippet,” which contains pre-made variables for supplying key genotype and phenotype information; or (b) by writing a plaintext rule set that provides genotype and phenotype information by using simple concepts such as ‘all’ and ‘any’, which allows for a more intuitive and compact representation. This conversion feature allows users to easily write in rsID-trait associations to use with EZTraits without any “coding” at all.

Using scripts

Users can write Lua snippets directly by providing the appropriate genotype and phenotype information. For genotypes, SNPs can be referenced using either their rsID or chromosomal position (following the syntax ‘chr1:6658743’). Phenotype information is entered by modifying two return variables: the floating-point variable ‘risk’; and the string ‘comment’—both of which can be manipulated directly in the Lua script snippet. These two variables allow the user to flexibly provide either quantitative or qualitative phenotype data (or both), depending on the trait being analyzed.

For example, the snippet:

if rs568149713 == "A/G" and rs557514207 == "G/G" then
   comment = "highrisk"
   risk = 0.8
end

if chr1:16949 == "A/C" or rs553090414 == "C/C" then
   comment = "mediumrisk"
   risk = 0.5
end

is completed into a valid Lua function by adding variables that correspond to the RS identifiers or chromosomal positions. These variables are automatically initialized from a VCF or TSV file, and together with a small amount of bracketing code, the complete function is:

function evaluate ()
   comment = "none"
   risk = 0
   rs568149713 = "A/G"
   rs557514207 = "G/G"
   chr1_16949 = "A/C"
   rs553090414 = "C/C"

   if rs568149713 == "A/G" and rs557514207 == "G/G" then
      comment = "highrisk"
      risk = 0.8
   end

   if chr1_16949 == "A/C" or rs553090414 == "C/C" then
      comment = "mediumrisk"
      risk = 0.5
   end

   return risk, comment
end

This function is then called directly from C++ by EZTraits, and the results are presented to the user.
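The completion step described above can be pictured with a short Python sketch. It is an illustration only, not the EZTraits C++ implementation: the names `wrap_snippet`, `lua_name`, and the `genotypes` dictionary are hypothetical. The only behaviors taken from the text are the default values of `comment` and `risk`, the initialization of one variable per variant, and the renaming of `chr1:16949` to `chr1_16949`.

```python
import re

# Chromosomal positions written as chr1:16949 (syntax taken from the text).
POSITION = re.compile(r"\b(chr\w+):(\d+)\b")


def lua_name(text: str) -> str:
    """Rewrite chr1:16949 as chr1_16949, since ':' is not valid in a Lua name.

    Caveat of this sketch: the substitution is purely textual, so it would
    also touch a position written inside a string literal.
    """
    return POSITION.sub(r"\1_\2", text)


def wrap_snippet(snippet: str, genotypes: dict[str, str]) -> str:
    """Return the text of a complete Lua function built around a user snippet.

    genotypes maps each variant name used in the snippet (rsID or
    chr:position) to its call, e.g. {"chr1:16949": "A/C"}.
    """
    lines = ["function evaluate ()", '   comment = "none"', "   risk = 0"]
    # One initialized variable per variant, as in the listing above.
    for name, call in genotypes.items():
        lines.append(f'   {lua_name(name)} = "{call}"')
    # The user's snippet follows, indented and with positions renamed.
    for line in snippet.strip().splitlines():
        if line.strip():
            lines.append("   " + lua_name(line.rstrip()))
    lines += ["   return risk, comment", "end"]
    return "\n".join(lines)
```

Calling `wrap_snippet` with the two-rule snippet above and the four genotype calls yields text equivalent to the `evaluate` function shown, which an embedded Lua interpreter could then run.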

Structured text entry

In addition, EZTraits can automatically convert text files into Lua by applying simple, intuitive concepts such as ‘any’ and ‘all’: either any one of the listed conditions satisfies a trait, or all of them in combination do. This text rule set is then converted into fully functional Lua code by the tool Txt2Lua.

For example, the rules to define fructose tolerance/intolerance using three common causal SNPs can be written as:

Any
rs1800546 ‘GG’
rs76917243 ‘TT’
rs78340951 ‘CC’
== “Fructose Intolerant”

Any:
rs1800546 ‘C/G’
rs76917243 ‘G/T’
rs78340951 C/G
== “Variant Carrier”

else “Tolerant to Fructose”

EZTraits accepts and interprets the keywords ‘All’, ‘Any’, and ‘else’, optionally followed by a colon. Acceptable genotype call formats include ‘CG’ and C/G (with optional single quotation marks), where the latter convention has to be used for sites that contain indels, e.g., ‘T/TGAT’.

The above text thus translates into the Lua snippet:

if rs1800546 == "G/G" then
   comment = "FructoseIntolerant"
   return risk, comment
end

if rs76917243 == "T/T" then
   comment = "FructoseIntolerant"
   return risk, comment
end

if rs78340951 == "C/C" then
   comment = "FructoseIntolerant"
   return risk, comment
end

if rs1800546 == "C/G" then
   comment = "VariantCarrier"
   return risk, comment
end

if rs76917243 == "G/T" then
   comment = "VariantCarrier"
   return risk, comment
end

if rs78340951 == "C/G" then
   comment = "VariantCarrier"
   return risk, comment
end

comment = "ToleranttoFructose"
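The translation above can be approximated in a few lines of Python. This is a hedged sketch of the idea, not the Txt2Lua source: `normalize_genotype` and `text_to_lua` are hypothetical names, and the handling of ‘All’ (joining the conditions with `and` in a single `if`) is an assumption, since only the ‘Any’ case is shown in the text. Labels have their spaces removed solely to mirror the listing above. Only rsID variants are handled.

```python
# Straight and typographic quotation marks that may surround a token.
QUOTES = "'\"\u2018\u2019\u201c\u201d"


def normalize_genotype(token: str) -> str:
    """Turn 'GG' or C/G into the slash form used in the Lua comparisons."""
    call = token.strip().strip(QUOTES)
    if "/" in call:
        return call                      # already A/B, required for indels
    if len(call) == 2:
        return f"{call[0]}/{call[1]}"    # 'GG' -> G/G
    raise ValueError(f"ambiguous genotype {token!r}; use the A/B form")


def text_to_lua(text: str) -> str:
    """Convert an Any/All/else rule set into Lua text (illustrative only)."""
    out, mode, conditions = [], None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        keyword = line.rstrip(":").lower()
        if keyword in ("any", "all"):
            mode, conditions = keyword, []
        elif line.startswith("=="):
            if mode is None:
                raise ValueError("label found before an Any/All block")
            label = line[2:].strip().strip(QUOTES).replace(" ", "")
            # Any: one if-block per condition. All: one combined if-block
            # (assumed behavior, not shown in the paper).
            tests = conditions if mode == "any" else [" and ".join(conditions)]
            for test in tests:
                out += [f"if {test} then",
                        f'   comment = "{label}"',
                        "   return risk, comment",
                        "end"]
            mode = None
        elif line.lower().startswith("else"):
            label = line[4:].lstrip(":").strip().strip(QUOTES).replace(" ", "")
            out.append(f'comment = "{label}"')
        else:
            variant, genotype = line.split(None, 1)
            conditions.append(f'{variant} == "{normalize_genotype(genotype)}"')
    return "\n".join(out)
```

Emitting one `if` block per ‘Any’ condition, each with its own early `return`, matches the shape of the listing above; a single `if` with `or` would be logically equivalent.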

Results

EZTraits is a command-line tool that compiles and runs on Linux and Mac operating systems. Inputs are VCF or space/tab-delimited TSV files. The Lua interpreter, version 5.4.2, is embedded so that EZTraits has no external dependencies. EZTraits has minimal requirements in terms of RAM, using less than 5KB on average. It takes about 2.4 minutes to parse a whole-genome VCF file from a single individual from the 1000 Genomes Project [10], containing ~78 million SNPs.
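To make the link between a VCF record and a genotype call such as "A/G" concrete, the following Python sketch decodes one VCF data line using the standard column layout (CHROM, POS, ID, REF, ALT, ..., FORMAT, samples). It is not taken from the EZTraits source. The function name is hypothetical, and two choices are assumptions made for illustration: alleles are sorted alphabetically so that a call reads "A/G" rather than "G/A", and a "chr" prefix is added to the chromosome name when it is missing.

```python
def genotype_from_vcf_line(line: str, sample_index: int = 0):
    """Return (variant ID, positional name, call) for one VCF data line.

    The call is None when the genotype is missing (e.g. "./.").
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref, alt = fields[:5]
    alleles = [ref] + alt.split(",")          # index 0 = REF, 1.. = ALT alleles
    format_keys = fields[8].split(":")        # e.g. GT:DP
    sample = fields[9 + sample_index].split(":")
    gt = sample[format_keys.index("GT")]

    if not chrom.startswith("chr"):
        chrom = "chr" + chrom                 # assumed naming convention
    position = f"{chrom}:{pos}"

    # "/" marks unphased and "|" phased calls; phase is discarded here.
    indices = gt.replace("|", "/").split("/")
    if "." in indices:
        return var_id, position, None
    # Alphabetical ordering of alleles is an assumption of this sketch.
    call = "/".join(sorted(alleles[int(i)] for i in indices))
    return var_id, position, call
```

For a line with REF `A`, ALT `C`, and genotype `0|1`, the sketch returns the call `A/C`, which is the form compared against in the Lua rules.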

Usage

EZTraits has two input parameters: (a) the VCF or TSV file; and (b) the Lua snippet. The commands for processing a VCF file and a TSV file, respectively, are:

  • ./EZTraits -i data/sample.vcf -lua scripts/test.lua

  • ./EZTraitsCSV -i data/sample.csv -lua scripts/test.lua

The output is written to the console. To convert structured text, run e.g.:

  • ./Txt2Lua -i scripts/fructose.txt > fruct_test.lua
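Because each invocation handles one input file, screening several individuals means repeating the command. The Python sketch below shows one way this could be scripted. It uses only the `-i` and `-lua` flags given above. The helper names, the default binary path, and the `*.vcf` file pattern are assumptions, and the console output is returned unparsed because its format is not specified here.

```python
import subprocess
from pathlib import Path


def build_command(vcf: Path, lua_script: Path, binary: str = "./EZTraits") -> list[str]:
    """Assemble the argument list for one EZTraits run."""
    return [binary, "-i", str(vcf), "-lua", str(lua_script)]


def run_batch(vcf_dir: Path, lua_script: Path, binary: str = "./EZTraits") -> dict[str, str]:
    """Run the same Lua rule file over every *.vcf in a directory.

    Returns a mapping from file name to the raw console output.
    """
    results = {}
    for vcf in sorted(vcf_dir.glob("*.vcf")):
        done = subprocess.run(
            build_command(vcf, lua_script, binary),
            capture_output=True, text=True, check=True,
        )
        results[vcf.name] = done.stdout
    return results
```

For example, `run_batch(Path("data"), Path("scripts/test.lua"))` would collect one output string per VCF file found in `data/`.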

Discussion

We developed EZTraits to enable researchers, regardless of programming experience, to easily screen genomic data for user-defined combinations of variants that underlie certain phenotypes or disease risks. Although we have not conducted large-scale studies, in-house experiments show that even biologists with no training in computer programming can use EZTraits’ structured text entry feature to set up systems that query genomes for specific traits or hypotheses. Our ultimate goal is to make genomic investigations for specific traits as easy as a Google search. We consider elements of natural language processing an essential part of this endeavor.

While there are many scripts and tools publicly available, we are not aware of any programs to screen VCFs for predefined variant-trait associations that are accessible to non-programmers. Therefore, to examine these deterministic variants, researchers or medical professionals must either perform targeted genetic testing or, when genomic data is available, use VCF processing tools that require a higher level of bioinformatics experience, like bcftools [11]. Additionally, interpretation of each genotype call with respect to a given trait is still necessary. This can prove impractical if many individuals or traits are being studied and a researcher lacks programming experience. EZTraits provides a simple solution to streamline such workflows.

The most comparable VCF screening tools are designed to facilitate variant discovery or identify causative mutations in patients with Mendelian disorders (e.g., VCF-Miner [12], BrowseVCF [13], and MendelMD [14]). These tools integrate annotations from functional databases (e.g., ANNOVAR, SnpEff, Variant Effect Predictor) to try to find mutations that could explain observed phenotypes. Conversely, EZTraits is designed to rapidly screen VCFs for known deterministic variants and support hypothesis-testing of suspected variant-trait associations.

Unlike the tools listed above, which are dependent on existing databases, EZTraits is flexible and customizable with no external dependencies. Therefore, it is also appropriate for use in non-human and non-model species. For example, within agriculture, genomics has facilitated targeted selection through the identification of causative loci for desirable traits in economically important species [15]. Without genetic insight, selection for profitable phenotypes (e.g., muscular hyperplasia from myostatin mutations in cattle and other livestock) can also lead to the widespread propagation of deleterious mono- or oligogenic disorders if a sire is an unknown carrier of disease-associated variants [16]. The ability to screen sires for carrier status is thus vital for herd health. However, existing tools may not provide the necessary annotations or species-specific resources to do so. EZTraits can identify carrier status or provide phenotype predictions in any species because the library of variant-trait associations is user-defined. Additionally, the library can easily be expanded as new associations are identified, with no limit on the number of variants or traits.

Importantly, while EZTraits can screen VCFs to help predict phenotypes for simple mono- or oligogenic traits, it does not predict phenotypes or disease risk for complex, polygenic traits, wherein the trait is influenced by a large number of variants each with small effect sizes [17]. More specifically, EZTraits is not a statistical tool and does not support the calculation of GWAS-derived polygenic risk scores. One major limitation at present is that EZTraits cannot make use of phasing information, even though for certain oligogenic traits gametic phase (e.g., whether variants are cis or trans on homologous chromosomes) can affect phenotypic expression [18]. We plan to integrate phase-awareness in future releases, as well as multi-sample VCF processing for even greater efficiency. By design, EZTraits’ structured language input feature is inclusive and makes it usable for all researchers, including scientists who do not have a background in programming, computer science, or bioinformatics. While “structured language” is just a small step towards a new paradigm of human-machine interaction, genomics is a perfect sandbox for experiments, because very specific questions can be well formulated. In the future, we envision that this approach will be expanded to a more “natural language” interface with a “natural dialog” component, ensuring that all the relevant information and hypotheses are well defined for the machine before starting experiments.
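The phasing limitation can be illustrated with a small Python sketch based on the VCF genotype notation, in which `0|1` and `1|0` are phased calls and `0/1` is unphased. This is not part of EZTraits, and the function name is hypothetical. The sketch also assumes that both phased calls belong to the same phase set, which a real implementation would have to verify.

```python
def phase_relationship(gt_a: str, gt_b: str) -> str:
    """Classify two heterozygous sites from their VCF GT strings.

    Returns "cis" when both alternate alleles sit on the same haplotype,
    "trans" when they sit on opposite haplotypes, "unknown" when either
    call is unphased, and "not applicable" when a site is not a simple
    0/1 heterozygote.
    """
    if "|" not in gt_a or "|" not in gt_b:
        return "unknown"               # unphased: 0/1 carries no haplotype order
    a, b = gt_a.split("|"), gt_b.split("|")
    if sorted(a) != ["0", "1"] or sorted(b) != ["0", "1"]:
        return "not applicable"
    # Same haplotype order at both sites means the ALT alleles travel together.
    return "cis" if a == b else "trans"
```

Two individuals with the unphased calls `0/1` and `0/1` look identical to a rule that compares genotype strings, yet one may carry the variants in cis (`0|1`, `0|1`) and the other in trans (`0|1`, `1|0`).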

Data Availability

The EZTraits source code is available under the MIT license from https://github.com/selfdecode/rd-eztraits.

Funding Statement

The author(s) received no specific funding for this work.

References

  • 1. Husain MA, Laurent B, Plourde M. APOE and Alzheimer’s Disease: From Lipid Transport to Physiopathology and Therapeutics. Front Neurosci. 2021;15: 85. doi: 10.3389/fnins.2021.630502
  • 2. Anguita-Ruiz A, Aguilera CM, Gil Á. Genetics of lactose intolerance: An updated review and online interactive world maps of phenotype and genotype frequencies. Nutrients. 2020;12: 1–20. doi: 10.3390/nu12092689
  • 3. Coffee EM, Yerkes L, Ewen EP, Zee T, Tolan DR. Increased prevalence of mutant null alleles that cause hereditary fructose intolerance in the American population. J Inherit Metab Dis. 2010;33: 33–42. doi: 10.1007/s10545-009-9008-7
  • 4. Nackley AG, Shabalina SA, Tchivileva IE, Satterfield K, Korchynskyi O, Makarov SS, et al. Human catechol-O-methyltransferase haplotypes modulate protein expression by altering mRNA secondary structure. Science. 2006;314: 1930–1933. doi: 10.1126/science.1131262
  • 5. Nackley AG, Shabalina SA, Lambert JE, Conrad MS, Gibson DG, Spiridonov AN, et al. Low Enzymatic Activity Haplotypes of the Human Catechol-O-Methyltransferase Gene: Enrichment for Marker SNPs. PLoS One. 2009;4: e5237. doi: 10.1371/journal.pone.0005237
  • 6. Smith DR. Bringing bioinformatics to the scientific masses. EMBO Rep. 2018;19: e46262. doi: 10.15252/embr.201846262
  • 7. Attwood TK, Blackford S, Brazas MD, Davies A, Schneider MV. A global perspective on evolving bioinformatics and data science training needs. Brief Bioinform. 2019;20: 398–404. doi: 10.1093/bib/bbx100
  • 8. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27: 2156–2158. doi: 10.1093/bioinformatics/btr330
  • 9. Ierusalimschy R, de Figueiredo LH, Filho WC. Lua—an extensible extension language. Softw Pract Exp. 1996;26: 635–652.
  • 10. Byrska-Bishop M, Evani US, Zhao X, Basile AO, Abel HJ, Regier AA, et al. High coverage whole genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. bioRxiv. 2021; 2021.02.06.430068. doi: 10.1101/2021.02.06.430068
  • 11. Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, et al. Twelve years of SAMtools and BCFtools. Gigascience. 2021;10: 1–4. doi: 10.1093/gigascience/giab008
  • 12. Hart SN, Duffy P, Quest DJ, Hossain A, Meiners MA, Kocher JP. VCF-Miner: GUI-based application for mining variants and annotations stored in VCF files. Brief Bioinform. 2016;17: 346–351. doi: 10.1093/bib/bbv051
  • 13. Salatino S, Ramraj V. BrowseVCF: a web-based application and workflow to quickly prioritize disease-causative variants in VCF files. Brief Bioinform. 2017;18: 774–779. doi: 10.1093/bib/bbw054
  • 14. Cardenas RGCCL, Linhares ND, Ferreira RL, Pena SDJ. Mendel, MD: A user-friendly open-source web tool for analyzing WES and WGS in the diagnosis of patients with Mendelian disorders. PLoS Comput Biol. 2017;13: e1005520. doi: 10.1371/journal.pcbi.1005520
  • 15. Georges M, Charlier C, Hayes B. Harnessing genomic information for livestock improvement. Nature Reviews Genetics. Nature Publishing Group; 2019. pp. 135–156. doi: 10.1038/s41576-018-0082-2
  • 16. Ciepłoch A, Rutkowska K, Oprządek J, Poławska E. Genetic disorders in beef cattle: a review. Genes and Genomics. Genetics Society of Korea; 2017. pp. 461–471. doi: 10.1007/s13258-017-0525-8
  • 17. Yong SY, Raben TG, Lello L, Hsu SDH. Genetic architecture of complex traits and disease risk predictors. Sci Rep. 2020;10: 1–14. doi: 10.1038/s41598-019-56847-4
  • 18. Tewhey R, Bansal V, Torkamani A, Topol EJ, Schork NJ. The importance of phase information for human genomics. Nat Rev Genet. 2011;12: 215–223. doi: 10.1038/nrg2950

Decision Letter 0

Ahmed Mancy Mosa

13 Dec 2021

PONE-D-21-32657

EZTraits: a Programmable Tool to Evaluate Multi-site Deterministic Traits

PLOS ONE

Dear Dr. Grabherr,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please consider all comments of all reviewers including reviewer 2

Please submit your revised manuscript by Jan 27 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Ahmed Mancy Mosa, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match.

When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.

3. Thank you for stating the following in the Competing Interests section:

“All authors are either employed by and/or hold stock or stock options in SelfDecode. In addition, PGY has equity in Systomic Health LLC and Ethobiotics LLC. There are no other relevant activities or financial relationships which have influenced this work.”

Please confirm that this does not alter your adherence to all PLOS ONE policies on sharing data and materials, by including the following statement: “This does not alter our adherence to PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests). If there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared.

Please include your updated Competing Interests statement in your cover letter; we will change the online submission form on your behalf.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: N/A

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The manuscript "EZTraits: a Programmable Tool to Evaluate Multi-site Deterministic Traits" by Matt Carland et al. describes a simple tool for non-programmers to quickly screen VCF files for risk alleles. This paper makes an important contribution to the field because, at the moment, only EZTraits evaluates variant combinations by internally building and interpreting Lua scripts, which is especially useful for non-programmers. While the paper is interesting and presents novel results, I believe a few changes are required before publication:

-Authors should describe the benefits and drawbacks of using EZtraits, as well as its limitations in comparison to other similar software.

Reviewer #2: Carland et al describes their software (EZTraits) to evaluate complex traits using genotype data from microarrays. This is a pure software description.

The authors should evaluate their software in different data sets and show its performance in for those data sets using appropriate plots and statistics. They should also compare the performance of their software with other available softwares and discuss its pitfalls and strengths in the discussion section.

Reviewer #3: This manuscript is short and sweet. A bit short at times, but I chose to accept it anyway based on the purpose it presents. The time and age is totally here to democratize genetics and make the analysis tools more readably available to an upcoming generation of researchers, many with limited coding abilities. Even in a Life Science “advanced” country like Sweden we are severely lacking clinical geneticists and genetic counselors as a profession. We are as a nation producing excellent Bioinformaticians, but the education system is lengthy and not updated to current needs, so tools like EZTraits presents a very attractive solution to this problem, which would empower our biomedical community with tools to address the ever-growing mountains of data we are producing. Genomics is here to stay with unprecedented ability in lower costs and producing medical data, that needs to be interpreted in a streamlined fashion.

General:

The text parts are a bit weak. They could spend a bit more time elaborating on the introduction section and there is no conclusion section at all.

Authors provide source code which is great, and it looks good.

Specific comments

1) In the abstract row 5, they should avoid giving the example of microarrays. NGS and other methods are starting to dominate this industry, so why limit scope and applications use to microarray, which might make some potential reader less excited about the content.

2) In the introduction section, they give some nice examples of interesting SNPs and the complications of polygenic risk scores. However they do not give us readers a bigger overview of how many variants there really is in a VCF file, say from a WGS sequencing run. They could make the whole manuscript a bit more interesting by including a small section discussing this diversity between various individuals and why this tool they are presenting provides a very attractive solution to analyze it.

3) The introduction or (possibly the results/conclusion section) does not provide comparison to currently available tools and where is stands against them. Would love to get some pros and cons.

4) Polygenic risk score (PRS) is a risky task. They don’t specify challenges of this tool for users.

5) Also since their mission is to address the non-programmers market segment, they could also include a section in the introduction that addresses this issue. E.g. provide figures on how many researchers would be potential users today, but cannot since they lack programming skills, and how many backlogged bioinformaticians there are. Since the paper is coming from a company, they should easily be able to pull up this time of marketing material, which is important even here with a scientific audience to pinpoint the reason for the tool. Simply phrased, highlight what the contribution of this tool is for scientific community?

6) The Results section is well-written and no comments but could include parts from comment 4.

7) I would have loved to see a conclusion section detailing the future of this and highlighting the tools capabilities a bit more.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 May 9;17(5):e0259327. doi: 10.1371/journal.pone.0259327.r002

Author response to Decision Letter 0


3 Feb 2022

We thank the reviewers for their thoughtful comments and valuable suggestions on how to improve the manuscript. Below, you will find our detailed responses point by point.

Reviewer #1: The manuscript "EZTraits: a Programmable Tool to Evaluate Multi-site Deterministic Traits" by Matt Carland et al. describes a simple tool for non-programmers to quickly screen VCF files for risk alleles. This paper makes an important contribution to the field because, at the moment, only EZTraits evaluates variant combinations by internally building and interpreting Lua scripts, which is especially useful for non-programmers. While the paper is interesting and presents novel results, I believe a few changes are required before publication:

-Authors should describe the benefits and drawbacks of using EZTraits, as well as its limitations in comparison to other similar software.

Response: Thank you. We added a Discussion section detailing the scenarios in which EZTraits is useful and those in which it is not.

Reviewer #2: Carland et al. describe their software (EZTraits) to evaluate complex traits using genotype data from microarrays. This is a pure software description.

The authors should evaluate their software on different data sets and show its performance on those data sets using appropriate plots and statistics. They should also compare the performance of their software with other available software and discuss its pitfalls and strengths in the discussion section.

Response: Thank you. We clarified the text to better define the exact problem that EZTraits solves. We also added a Discussion section in which we point out the differences from other tools, which address different problems.

Reviewer #3: This manuscript is short and sweet. A bit short at times, but I chose to accept it anyway based on the purpose it presents. The time has come to democratize genetics and make analysis tools more readily available to an upcoming generation of researchers, many with limited coding abilities. Even in a life-science “advanced” country like Sweden, we severely lack clinical geneticists and genetic counselors as a profession. As a nation we produce excellent bioinformaticians, but the education system is lengthy and not updated to current needs, so tools like EZTraits present a very attractive solution to this problem, one that would empower our biomedical community to address the ever-growing mountains of data we are producing. Genomics is here to stay, with an unprecedented ability to produce medical data at lower cost, and those data need to be interpreted in a streamlined fashion.

Response: thank you for your kind words, and we agree wholeheartedly!

General:

The text parts are a bit weak. The authors could spend a bit more time elaborating on the introduction section, and there is no conclusion section at all.

Response: agreed, we clarified the introduction and added a discussion section (see below).

Authors provide source code which is great, and it looks good.

Specific comments

1) In the abstract, row 5, they should avoid giving the example of microarrays. NGS and other methods are starting to dominate this industry, so why limit the scope and applications to microarrays, which might make some potential readers less excited about the content?

Response: thank you for catching that, we fixed it!

2) In the introduction section, they give some nice examples of interesting SNPs and the complications of polygenic risk scores. However, they do not give readers a bigger overview of how many variants there really are in a VCF file, say from a WGS sequencing run. They could make the whole manuscript a bit more interesting by including a small section discussing this diversity between individuals and why the tool they are presenting provides a very attractive solution for analyzing it.

Response: good point, we added some numbers to the results section to highlight that even for millions of SNPs, the tool only takes a few minutes.

3) The introduction (or possibly the results/conclusion section) does not provide a comparison to currently available tools and where it stands against them. Would love to get some pros and cons.

Response: agreed, we now discuss the scope of EZTraits versus the scope of other tools in detail in the newly added discussion section.

4) Polygenic risk score (PRS) is a risky task. They don’t specify the challenges of this tool for users.

Response: Thank you for pointing out that the phrasing was misleading; we edited the text. PRS is indeed risky business, because results from one cohort or population are not guaranteed to hold in another. Our tool, on the other hand, targets phenotypes or diseases for which the mechanisms are known, or at least better understood, although they can be polygenic or based on haplotypes rather than individual SNPs.

5) Also, since their mission is to address the non-programmer market segment, they could include a section in the introduction that addresses this issue, e.g. provide figures on how many researchers would be potential users today but cannot be because they lack programming skills, and how many backlogged bioinformaticians there are. Since the paper is coming from a company, they should easily be able to pull up this type of marketing material, which is important even here, with a scientific audience, to pinpoint the reason for the tool. Simply phrased: highlight what the contribution of this tool is for the scientific community.

Response: While we appreciate your suggestion, we strongly prefer not to use any data from our marketing department, since we feel that whatever numbers they give us are far too optimistic.

6) The Results section is well written and I have no comments, but it could include parts from comment 4.

7) I would have loved to see a conclusion section detailing the future of this and highlighting the tool’s capabilities a bit more.

Response: this is a great suggestion, thank you! We added a Discussion section that explains the tool’s capabilities and what specific problem it solves.

Attachment

Submitted filename: Detailed_response.docx

Decision Letter 1

Ahmed Mancy Mosa

28 Feb 2022

PONE-D-21-32657R1

EZTraits: a Programmable Tool to Evaluate Multi-site Deterministic Traits

PLOS ONE

Dear Dr. Grabherr,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please carefully consider all comments.

Please submit your revised manuscript by Apr 14 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Ahmed Mancy Mosa, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

Reviewer #3: N/A

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: (No Response)

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The manuscript titled "EZTraits: a Programmable Tool to Evaluate Multi-site Deterministic Traits" has some merit and falls within the scope of PLOS ONE. However, the authors have yet to address the issues raised in the previous submission. The authors did not provide clear, well-organized discussion and conclusion sections. The discussion should NOT be a part of the introduction section, and to make the discussion section engaging, the authors should include the following information:

-The major findings of your study

-How these findings relate to what others have done

-Limitations of your findings

-An explanation for any surprising, unexpected, or inconclusive results

-Suggestions for further research

Reviewer #3: Thanks for addressing the review comments in a very smart, efficient, and correct manner. The manuscript has matured a lot and gives more background, scope, and utilisation of the important tool that has been described and presented here. The extended introduction and new discussion section add huge value to the manuscript, and it was both educational and enjoyable to read. Looking forward to downloading the PDF and reading it after publication.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #3: No

PLoS One. 2022 May 9;17(5):e0259327. doi: 10.1371/journal.pone.0259327.r004

Author response to Decision Letter 1


14 Apr 2022

Reviewer comments and point-by-point responses

We again thank all the reviewers for their thoughtful comments and valuable suggestions. We have addressed all the remaining issues; please see the detailed responses below.

Reviewer #1: The manuscript titled "EZTraits: a Programmable Tool to Evaluate Multi-site Deterministic Traits" has some merit and falls within the scope of PLOS ONE. However, the authors have yet to address the issues raised in the previous submission. The authors did not provide clear, well-organized discussion and conclusion sections. The discussion should NOT be a part of the introduction section, and to make the discussion section engaging, the authors should include the following information:

-The major findings of your study

Response: We now clearly highlight this in the Discussion. We realize that we did not conduct large-scale usability studies, yet the overwhelmingly positive response from geneticists and biologists in our organization, most of whom did not even know that they were writing code that a computer executed verbatim, leads us to expect that this approach to human-machine interfaces will benefit research in many areas and will ultimately bridge the gap between (even vehement) non-programmers and computers. Our ultimate goal is to make phenotype and trait queries, along with hypothesis testing and verification, as simple as a Google search.

-How these findings relate to what others have done

Response: We added some text to the Discussion section. In short, there are two common responses: (a) throw experimental evidence at the problem, such as PCR; or (b) hire a bioinformatics student, postdoc, etc. to mine this information from already existing data. The problem with (a) is that these experiments will fail to capture combinatorial effects. The problem with (b) is, frankly: what do you do with the bioinformatician once she or he has run bcftools, done some more scripting, and completed the analysis? This is the way many researchers in biology and medicine think, so these bioinformaticians never get hired.

I do have to point out that Sweden, Switzerland, and some institutions in the US have found ingenious solutions to this problem, yet here we are, offering an alternative that will by no means replace these efforts. We just lower the bar for non-programmers.

-Limitations of your findings

Response: We added a sentence about the limitations to the Discussion. The biggest shortcoming of EZTraits is that it is not aware of phasing, so haplotype-dependent traits currently fall outside the scope of the program. However, this was a conscious decision by the developers of EZTraits: we deferred that feature until we are certain that users will not inadvertently enter incorrect statements or code.

-An explanation for any surprising, unexpected, or inconclusive results

Response: yeah, we were surprised to see biologists and medical researchers writing computer code – without them even knowing it. That is difficult to quantify though, so we will refer to future research, in that...

-Suggestions for further research

Response: We added some text at the end of the Discussion. Our ultimate goal is that even computer-phobic researchers will have direct access to complex analyses. That might sound a bit utopian, yet it is achievable. We start with structured language, which is a step up from programming language, then inch our way toward more natural language, which implies having a real dialog with the machine, and will end up with something as easy to use as a smartphone. We have to take one step at a time, and EZTraits is a trial balloon on this journey.

Attachment

Submitted filename: Point-by-point.docx

Decision Letter 2

Ahmed Mancy Mosa

25 Apr 2022

EZTraits: a Programmable Tool to Evaluate Multi-site Deterministic Traits

PONE-D-21-32657R2

Dear Dr. Grabherr,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Ahmed Mancy Mosa, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

**********

6. Review Comments to the Author

Reviewer #1: The manuscript was a pleasure to read because the authors provided sufficient information about their study's major findings and limitations. It deserves to be published in PLOS ONE because it will aid researchers in performing rapid genetic analysis and hypothesis testing, regardless of programming experience or technical background.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Acceptance letter

Ahmed Mancy Mosa

27 Apr 2022

PONE-D-21-32657R2

EZTraits: a Programmable Tool to Evaluate Multi-site Deterministic Traits

Dear Dr. Grabherr:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Ahmed Mancy Mosa

Academic Editor

PLOS ONE

Associated Data

    Data Availability Statement

    The EZTraits source code is available under the MIT license from https://github.com/selfdecode/rd-eztraits.


    Articles from PLoS ONE are provided here courtesy of PLOS
