Comprehensive and User-Analytics-Friendly Cancer Patient Database for Physicians and Researchers

Ali Firooz; Avery T Funkhouser; Julie C Martin; W Jeffery Edenfield; Homayoun Valafar; Anna V Blenda

This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2023 Feb 1:arXiv:2302.01337v1. [Version 1]

Comprehensive and User-Analytics-Friendly Cancer Patient Database for Physicians and Researchers

Ali Firooz ¹, Avery T Funkhouser ², Julie C Martin ³, W Jeffery Edenfield ⁴, Homayoun Valafar ⁵, Anna V Blenda ⁶

PMCID: PMC9915752 PMID: 36776819

Abstract

Nuanced cancer patient care is needed, as the development and clinical course of cancer is multifactorial with influences from the general health status of the patient, germline and neoplastic mutations, co-morbidities, and environment. To effectively tailor an individualized treatment to each patient, such multifactorial data must be presented to providers in an easy-to-access and easy-to-analyze fashion. To address the need, a relational database has been developed integrating status of cancer-critical gene mutations, serum galectin profiles, serum and tumor glycomic profiles, with clinical, demographic, and lifestyle data points of individual cancer patients. The database, as a backend, provides physicians and researchers with a single, easily accessible repository of cancer profiling data to aid-in and enhance individualized treatment.

Our interactive database allows care providers to amalgamate cohorts from these groups to find correlations between different data types with the possibility of finding “molecular signatures” based upon a combination of genetic mutations, galectin serum levels, glycan compositions, and patient clinical data and lifestyle choices. Our project provides a framework for an integrated, interactive, and growing database to analyze molecular and clinical patterns across cancer stages and subtypes and provides opportunities for increased diagnostic and prognostic power.

Keywords: Cancer, Database, MySQL, Cancer-Critical Gene, Mutation, Galectin, Glycan, Artificial Intelligence, Machine Learning

I. Introduction

A. Cancer Stats and Why New Strategies are Needed

Cancer is a significant burden and is the second leading cause of death after heart disease in the United States. In 2022 alone, the number of new cancer cases is projected to be close to 2 million with 600,000 deaths [1]. It is predicted that by the year 2030, the national costs of cancer care will balloon to a quarter of a trillion dollars [2]. Strategies to enable enhanced secondary prevention of cancer – additional ways to measure and detect cancer at earlier stages – are profoundly needed, as cancer prognoses are worse the later the cancer is found [3].

B. Different Approaches to Cancer Screening

Glycomics has recently shown distinct potential in the area of cancer detection. Glycomics is the systematic study of all glycan (i.e., sugar) structures of a certain cell type or tissue. It is a promising strategy aimed at differentiating and diagnosing cancer [4]. Additionally, alterations in glycosylation have been correlated with tumor initiation, progression, and metastasis [5]. The tumor-associated glycans or glycoproteins are secreted or membrane-shed in the blood and can ultimately be used as markers for tumor presence [6].

In addition to using glycans as markers of cancer, proteins that interact with glycan epitopes can also serve as possible cancer signals. Galectins are a subset of the lectin family of proteins and share a high affinity β-galactoside binding domain which allows them to interact with other proteins by binding to glycosylation sites [7]. Expression of several galectins is known to be dysregulated in tumor cells and to have a variety of contributory and inhibitory roles on tumor cells and cancer environment [8–22]. As some of these galectins are significantly dysregulated in cancer, they have value as markers of cancer detection and progression [23].

Using galectin and glycomic profiling of cancer patients would be similar to other profiling methods already used in clinical practice. For instance, genomic and epigenomic profiling has shown that specific DNA methylation which have led to better clinical outcomes for glioblastoma multiforme [24]. Metabolomic profiling, based on tumor cells’ complex metabolic requirements, has also been used as a hallmark for cancer, such as upregulation of glucose and glutamine metabolism and convergence of many metabolic pathways on glucose and glutamine to provide a more nutritious environment for tumor growth [25].

We predict galectin and glycomic profiles can be similarly used as other benchmarks for cancer progression, and potentially diagnosis and treatment. Furthermore, since genetic profile and a patient’s individual lifestyle factors also play a role in oncogenesis, we predict a “stage signature” or “molecular subtype signature” can be generated based upon a combination of data such as glycan signatures, galectin serum levels, genetic mutations, clinical data, lifestyle choices, etc.

C. Background Importance of Using Multifactorial Data

A model created by Zuo et. al. integrated gene expression, gene mutation, and chemical structure, an approach that showed to greatly enhance predictive performance of cancer drug responses [26]. Similarly, a model by Xie et al. demonstrated how multifactorial input data can be used to demonstrate the mechanisms of resistance to immunotherapy [27]. These studies demonstrate the utility of combining unique data types to discover correlations that can be implemented for care. Precision medicine has recently been at the forefront of new medical approaches, and it has been shown that attempting new discoveries as such must take advantage of new technologies and computational methodologies [28, 29]. Furthermore, it has been recommended that health professionals should lend a greater emphasis on combining biological, social, and environmental metrics to develop precision interventions to patients and the population [30].

D. Why Combining Data in a Convenient Manner is Critical

“Big data” in healthcare refers to the exponentially increasing amount of health data that is collected and stored. The advent of this much information being at the access of providers’ fingertips demonstrates a goldmine of potential that can be used for medical discovery [31]. However, one caveat to using large amounts of data and new technological approaches is that presentation in easy-to-use manner for physicians is of utmost importance, otherwise only marginal improvements have been demonstrated [32–35]. West et al. further suggest that there is significant potential for discovery using the advent of increasing healthcare data so long as the data are effectively managed, presented, and visualized [36].

E. Aim

The goal of our project was the enhancement of the cancer patient clinical database with molecular data, for better diagnosis and clinical trials/treatment, and additional revenue. Our interactive database allows providers to amalgamate cohorts from their specific patient population to find correlations between different data types with the possibility of, for example, describing a “stage signature” based upon a combination of glycan signatures, galectin serum levels, genetic mutations, and lifestyle choices. Our project demonstrates, to the best of our knowledge, the first time such an interactive and patient-centered database has been made. We believe this utility, such as the ability to analyze patterns (signatures) across stages in our database, will provide an opportunity for increased diagnostic and prognostic power and perhaps change the outlook of what is possible with current patient data at point-of-care.

II. Materials & Methods

A. Data Categories

1). Reported Data

a). Patient Data –

Demographic, lifestyle, and disease-specific data were collected from the Prisma Health Cancer Institute Biorepository (PHCIB). The reported information included age, race, sex, smoking status, as well as tumor information such as TNM staging, grade, histology, and biopsy site.

b). Samples –

The samples obtained from the PHCIB included blood serum from cancer patients and healthy control group, as well as malignant tissue and surrounding benign tissue from surgical resections performed on cancer patients.

2). Produced Data

a). Galectins –

The serum concentration of galectins −1, −3, −7, −8 and −9 were determined as previously described [23].

b). Glycans –

Mass spectra were acquired by Bruker ultrafleXtreme MALDI TOF/TOF (Bruker, Billerica, MA). Data were collected from a mass range of 500 to 5,000 Daltons with positive mode. Following acquisition, masses were automatically identified with detection limits for peaks at S/N>3. This allowed the derivation of serum and tissue glycan composition levels of healthy controls and patients with cancer. Glycomic profiles were then created for each patient.

c). Gene Mutations –

Genetic mutation data for 74 patients were provided by the PHCIB. The mutation status of 50 cancer-critical genes was derived by the Ion Ampliseq Cancer Hotspot Panel v2 as previously described [37]. The Ampliseq panel covers 50 oncogene and tumor suppressor genes, amplifying 207 amplicons covering approximately 2,800 COSMIC mutations. These 50 genes were then sorted into eleven groups of different cellular pathways according to the gene product role.

B. Database

1). Storage

The patient data were then incorporated into a Relational Database Management System (RDBMS) using Structured Query Language (MySQL) server version 8.0.31 on a Linux machine running Ubuntu 20.04.

The most widely used database system worldwide is RDBMS, as it provides a dependable method of storing and retrieving large amounts of data while offering a combination of system performance and ease of implementation. In addition, RDBMS bases the structure of its data on the Atomicity, Consistency, Isolation, and Durability (ACID) model to ensure and maintain the security, accuracy, integrity, and consistency of the data [38]. This implementation makes the database scalable for future deployment on a more resourceful server such as IFESTOS (ifestos.cse.sc.edu) or Amazon Web Services (AWS).

The primary database relation is the basic patient information identified by anonymous IDs implemented as the primary key. However, due to the complication of the data being stored in the database, several relations were in need for a more robust primary key to be uniquely identifiable. Therefore, a combination of attributes acting as the primary key were chosen. Furthermore, relations such as “CancerInfo”, holding patient cancer information, are all connected to the primary relation “PersonInfo” by utilizing foreign keys that reference the primary key attribute. Additionally, several triggers were incorporated in the database to validate the data quality.

2). Interface

The initial interface to the database was facilitated through an intuitive web application that serves as a database administration tool, “phpMyAdmin”, version 4.9.5 residing on top of “Nginx” server version 1.18.0.

To further ease the process of importing new data and minimize human mistakes, a Python script was developed. Even though MySQL offers an option to import Excel files directly into the database, doing so will pose several limitations on the user. Using our script will eliminate these limitations and ensures that the correct data are being imported no matter the structure of the Excel file provided. The importing Python script utilizes several libraries including “Pandas”, “NumPy”, and “re” (i.e., regular expression) to read the Excel files, parse its data into relevant relations, validate the type of the data that is being imported, and eventually create a backup of the database before adding the data into the database.

3). Data Visualization

A Python script was developed to connect to the database and perform data analysis and visualizations. The database was queried by the script with the help of “PyMySQL” library and the result were then further manipulated and plotted using “Pandas”, “NumPy” and “Matplotlib”.

III. Results

The designed database holds 8 relations and 164 attributes in total, for each patient. It currently stores data for 500 unidentifiable patients and will continue to grow as new patient’s data become available. The designed Entity Relationship (ER) diagram is seen in Fig. 1, which resulted in the Relational Database shown in Fig. 2 and Fig. 3.

Fig. 1. — Entity Relationship (ER) diagram

Fig. 2. — The database schema showing primary and foreign keys

Fig. 3. — The database schema showing primary and foreign keys for Genes and nGlycanComposition

A. Data Statistics

The dispersion of patient’s birth year over sex is shown in Fig. 4, and it is noteworthy that the patient age distribution included as early as year 1920 and included year 2020 (2 years of age as of the writing of this paper). with the most data collected for females born in 1948. The patient’s cancer type is presented in Fig. 5, and as most of the patients are females, additional data including males is being imported into the database. The patients’ smoking history is presented in Fig. 6 which is dominated by females especially for the “Never smoked” category. In the following Fig. 7–9, cancer data distribution is shown against cancer stage and smoking history.

Fig. 5. — Cancer types categorized based on sex.

Fig. 6. — Smoking history percentages categorized by sex

Fig. 7. — Cancer type distribution by stage

Fig. 9. — Smoking history data distribution by cancer stage

B. Data Visualization

Unison of MySQL and Python enabled the organization of all patients’ information, their relative cancer information; the generation of glycomic, galectin, genetic, and lifestyle profiles.

Our interactive database allows providers to amalgamate cohorts from different groups to find correlations between different data types with the possibility of, for example, describing a “stage signature” based upon a combination of glycan signatures, galectin serum levels, genetic mutations, and lifestyle choices to create among our patient population.

Fig. 10–12 illustrate the database’s use in searching for specific molecular analysis information. Fig. 10 demonstrates evaluation of patient’s glycomic data by cancer stage. Fig. 11 shows mutation frequencies of a gene of interest while Fig. 12 displays serum levels of a protein that is under significant investigation in the oncological sphere.

Fig. 12. — Patient galectin profiling data. Example of results of database search by serum concentration of galectin-9 and breast cancer stage (I-IV)

Fig. 11. — Example of *TP53* gene mutation frequency

IV. Discussion and Conclusions

This work has clearly demonstrated the need for a centralized point of data access that combines different sources of data. Therefore, several databases were merged into a single access point to help ease the accessibility and use of the data. Furthermore, a 2000 line python script was developed to check, extract, and verify the integrity of data before importing to the database. Our interactive database allows providers to amalgamate cohorts from these groups to find correlations between different data types with the possibility of, for example, describing a “stage signature” based upon a combination of glycan signatures, galectin serum levels, genetic mutations, and lifestyle choices to create among our patient population.

In addition, the interface to the database was facilitated through an intuitive web application that serves as a database administration tool. We have also successfully integrated various types of molecular data into an interactive cancer patient database.

Such databases will enable new discoveries through the incorporation of data analytics techniques. We believe the utility, such as the ability to analyze patterns (signatures) across stages in our database, will provide an opportunity for increased diagnostic and prognostic power. In addition, these molecular signatures may have practical application as a non-invasive diagnostic tool for tumor stage refinement. The integrated data in the form of a database will usher the use of artificial intelligence (AI) and Machine Learning in patient diagnostics and optimal treatment [39–42]. AI and ML techniques have been incorporated widely in the variety of domains from hardware to software and yielded promising results specially to improve healthcare. They have been used to automate hardware design [43], medical image segmentation [44], identification of the optimal treatment [45, 46], and maximizing the impact of drug dosage administration [47]. When combined with smart and wearable devices [40–42], a full picture of a person’s health can be obtained to improve the quality of life and improvement of patient outcome.

Fig. 8. — Cancer type distribution by smoking history

Acknowledgment

Research reported in this publication was supported by a grant from the National Institutes of Health (5R01LM011648 and P20 RR-01646100). The authors would also like to thank Basil Chaballout for contributing to the project.

Contributor Information

Ali Firooz, College of Engineering and Computing, University of South Carolina, Columbia, SC, USA.

Avery T. Funkhouser, School of Medicine Greenville, University of South Carolina, Greenville, SC, USA

Julie C. Martin, Prisma Health Cancer Institute, Greenville, SC, USA

W. Jeffery Edenfield, Prisma Health Cancer Institute, Greenville, SC, USA.

Homayoun Valafar, College of Engineering and Computing, University of South Carolina, Columbia, SC, USA.

Anna V. Blenda, School of Medicine Greenville, University of South Carolina, Prisma Health Cancer Institute, Greenville, SC, USA

References

[1].Siegel R. L., Miller K. D., Fuchs H. E., and Jemal A., “Cancer statistics, 2022.,” CA: a cancer journal for clinicians, vol. 72, no. 1, pp. 7–33, Jan. 2022, doi: 10.3322/caac.21708. [DOI] [PubMed] [Google Scholar]
[2].Mariotto A. B., Enewold L., Zhao J., Zeruto C. A., and Robin Yabroff K., “Medical Care Costs Associated with Cancer Survivorship in the United States,” Cancer epidemiology, biomarkers & prevention : a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology, vol. 29, no. 7, pp. 1304–1312, Jul. 2020, doi: 10.1158/1055-9965.EPI-19-1534. [DOI] [PMC free article] [PubMed] [Google Scholar]
[3].Milosevic M., Jankovic D., Milenkovic A., and Stojanov D., “Early diagnosis and detection of breast cancer.,” Technology and health care : official journal of the European Society for Engineering and Medicine, vol. 26, no. 4, pp. 729–759, 2018, doi: 10.3233/THC-181277. [DOI] [PubMed] [Google Scholar]
[4].Drake R. R., “Glycosylation and cancer: moving glycomics to the forefront,” Advances in cancer research, vol. 126, pp. 1–10, 2015, doi: 10.1016/BS.ACR.2014.12.002. [DOI] [PubMed] [Google Scholar]
[5].Vajaria B. N. and Patel P. S., “Glycosylation: a hallmark of cancer?,” Glycoconjugate journal, vol. 34, no. 2, pp. 147–156, Apr. 2017, doi: 10.1007/s10719-016-9755-2. [DOI] [PubMed] [Google Scholar]
[6].Silsirivanit A., “Glycosylation markers in cancer.,” Advances in clinical chemistry, vol. 89, pp. 189–213, Jan. 2019, doi: 10.1016/bs.acc.2018.12.005. [DOI] [PubMed] [Google Scholar]
[7].Barondes S. H., Cooper D. N. W., Gitt M. A., and Leffler H., “Galectins. Structure and function of a large family of animal lectins.,” The Journal of biological chemistry, vol. 269, no. 33, pp. 20807–10, Aug. 1994, doi: 10.1016/s0021-9258(17)31891-4. [DOI] [PubMed] [Google Scholar]
[8].Liu F.-T. and Rabinovich G. A., “Galectins as modulators of tumour progression.,” Nature reviews. Cancer, vol. 5, no. 1, pp. 29–41, Jan. 2005, doi: 10.1038/nrc1527. [DOI] [PubMed] [Google Scholar]
[9].Ebrahim A. H. et al. , “Galectins in cancer: Carcinogenesis, diagnosis and therapy,” Annals of Translational Medicine, vol. 2, no. 9, Sep. 2014, doi: 10.3978/j.issn.2305-5839.2014.09.12. [DOI] [PMC free article] [PubMed] [Google Scholar]
[10].Girotti M. R., Salatino M., Dalotto-Moreno T., and Rabinovich G. A., “Sweetening the hallmarks of cancer: Galectins as multifunctional mediators of tumor progression,” Journal of Experimental Medicine, vol. 217, no. 2, Feb. 2020, doi: 10.1084/jem.20182041. [DOI] [PMC free article] [PubMed] [Google Scholar]
[11].Markowska A. I., Liu F. T., and Panjwani N., “Galectin-3 is an important mediator of VEGF- and bFGF-mediated angiogenic response,” Journal of Experimental Medicine, vol. 207, no. 9, pp. 1981–1993, Aug. 2010, doi: 10.1084/jem.20090121. [DOI] [PMC free article] [PubMed] [Google Scholar]
[12].Chen C., Duckworth C. A., Zhao Q., Pritchard D. M., Rhodes J. M., and Yu L. G., “Increased circulation of galectin-3 in cancer induces secretion of metastasis-promoting cytokines from blood vascular endothelium,” Clinical Cancer Research, vol. 19, no. 7, pp. 1693–1704, Apr. 2013, doi: 10.1158/1078-0432.CCR-12-2940. [DOI] [PMC free article] [PubMed] [Google Scholar]
[13].Campion C. G., Labrie M., Lavoie G., and St-Pierre Y., “Expression of Galectin-7 Is Induced in Breast Cancer Cells by Mutant p53,” PLoS ONE, vol. 8, no. 8, p. 72468, Aug. 2013, doi: 10.1371/journal.pone.0072468. [DOI] [PMC free article] [PubMed] [Google Scholar]
[14].Kim S. J., Hwang J. A., Ro J. Y., Lee Y. S., and Chun K. H., “Galectin-7 is epigenetically-regulated tumor suppressor in gastric cancer,” Oncotarget, vol. 4, no. 9, pp. 1461–1471, Aug. 2013, doi: 10.18632/oncotarget.1219. [DOI] [PMC free article] [PubMed] [Google Scholar]
[15].Barrow H., Rhodes J. M., and Yu L.-G., “The role of galectins in colorectal cancer progression.,” International journal of cancer, vol. 129, no. 1, pp. 1–8, Jul. 2011, doi: 10.1002/ijc.25945. [DOI] [PubMed] [Google Scholar]
[16].Chen C., Duckworth C. A., Fu B., Pritchard D. M., Rhodes J. M., and Yu L. G., “Circulating galectins-2,−4 and-8 in cancer patients make important contributions to the increased circulation of several cytokines and chemokines that promote angiogenesis and metastasis,” British Journal of Cancer, vol. 110, no. 3, pp. 741–752, Feb. 2014, doi: 10.1038/bjc.2013.793. [DOI] [PMC free article] [PubMed] [Google Scholar]
[17].Hirashima M. et al. , “Galectin-9 suppresses tumor metastasis by blocking adhesion to endothelium and extracellular matrices,” Glycobiology, vol. 18, no. 9, pp. 735–744, Sep. 2008, doi: 10.1093/glycob/cwn062. [DOI] [PubMed] [Google Scholar]
[18].Irie A. et al. , “Galectin-9 as a prognostic factor with antimetastatic potential in breast cancer,” Clinical Cancer Research, vol. 11, no. 8, pp. 2962–2968, Apr. 2005, doi: 10.1158/1078-0432.CCR-04-0861. [DOI] [PubMed] [Google Scholar]
[19].Barrow H. et al. , “Serum galectin-2, −4, and −8 are greatly increased in colon and breast cancer patients and promote cancer cell adhesion to blood vascular endothelium,” Clinical Cancer Research, vol. 17, no. 22, pp. 7035–7046, Nov. 2011, doi: 10.1158/1078-0432.CCR-11-1462. [DOI] [PubMed] [Google Scholar]
[20].Topcu T. O. et al. , “The clinical importance of serum galectin-3 levels in breast cancer patients with and without metastasis,” Journal of Cancer Research and Therapeutics, vol. 14, no. 10, pp. S583–S586, 2018, doi: 10.4103/0973-1482.176425. [DOI] [PubMed] [Google Scholar]
[21].Saussez S. et al. , “The determination of the levels of circulating galectin-1 and −3 in HNSCC patients could be used to monitor tumor progression and/or responses to therapy,” Oral Oncology, vol. 44, no. 1, pp. 86–93, Jan. 2008, doi: 10.1016/j.oraloncology.2006.12.014. [DOI] [PubMed] [Google Scholar]
[22].Watanabe M. et al. , “Clinical significance of circulating galectins as colorectal cancer markers.,” Oncology reports, vol. 25, no. 5, pp. 1217–26, May 2011, doi: 10.3892/or.2011.1198. [DOI] [PubMed] [Google Scholar]
[23].Blair B. B. et al. , “Increased Circulating Levels of Galectin Proteins in Patients with Breast, Colon, and Lung Cancer,” Cancers, vol. 13, no. 19, p. 4819, Sep. 2021, doi: 10.3390/cancers13194819. [DOI] [PMC free article] [PubMed] [Google Scholar]
[24].Tsai H.-C. and Baylin S. B., “Cancer epigenetics: linking basic biology to clinical medicine.,” Cell research, vol. 21, no. 3, pp. 502–17, Mar. 2011, doi: 10.1038/cr.2011.24. [DOI] [PMC free article] [PubMed] [Google Scholar]
[25].Boroughs L. K. and Deberardinis R. J., “Metabolic pathways promoting cancer cell survival and growth,” Nature cell biology, vol. 17, no. 4, pp. 351–359, Apr. 2015, doi: 10.1038/NCB3124. [DOI] [PMC free article] [PubMed] [Google Scholar]
[26].Zuo Z., Wang P., Chen X., Tian L., Ge H., and Qian D., “SWnet: a deep learning model for drug response prediction from cancer genomic signatures and compound chemical structures,” BMC bioinformatics, vol. 22, no. 1, pp. 1–16, 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
[27].Xie F. et al. , “Multifactorial Deep Learning Reveals Pan-Cancer Genomic Tumor Clusters with Distinct Immunogenomic Landscape and Response to Immunotherapy,” Clinical Cancer Research, vol. 26, no. 12, pp. 2908–2920, Jun. 2020, doi: 10.1158/1078-0432.CCR-19-1744. [DOI] [PMC free article] [PubMed] [Google Scholar]
[28].Carracedo-Reboredo P. et al. , “A review on machine learning approaches and trends in drug discovery,” Computational and Structural Biotechnology Journal, vol. 19, pp. 4538–4558, 2021, doi: 10.1016/j.csbj.2021.08.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
[29].Collins F. S. and Varmus H., “A new initiative on precision medicine,” New England journal of medicine, vol. 372, no. 9, pp. 793–795, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
[30].Khoury M. J. and Galea S., “Will Precision Medicine Improve Population Health?,” JAMA, vol. 316, no. 13, p. 1357, Oct. 2016, doi: 10.1001/jama.2016.12260. [DOI] [PMC free article] [PubMed] [Google Scholar]
[31].Murdoch T. B. and Detsky A. S., “The Inevitable Application of Big Data to Health Care,” JAMA, vol. 309, no. 13, p. 1351, Apr. 2013, doi: 10.1001/jama.2013.393. [DOI] [PubMed] [Google Scholar]
[32].Caban J. J. and Gotz D., “Visual analytics in healthcare – opportunities and research challenges,” Journal of the American Medical Informatics Association, vol. 22, no. 2, pp. 260–262, Mar. 2015, doi: 10.1093/jamia/ocv006. [DOI] [PMC free article] [PubMed] [Google Scholar]
[33].Chaballout B. H., Shaw R. J., and Reuter-Rice K., “The SMART healthcare solution,” Advances in Precision Medicine, vol. 2, no. 1, 2017. [Google Scholar]
[34].Heisey-Grove D., Danehy L.-N., Consolazio M., Lynch K., and Mostashari F., “A National Study of Challenges to Electronic Health Record Adoption and Meaningful Use,” Medical Care, vol. 52, no. 2, pp. 144–148, Feb. 2014, doi: 10.1097/MLR.0000000000000038. [DOI] [PubMed] [Google Scholar]
[35].Lau F., Price M., Boyd J., Partridge C., Bell H., and Raworth R., “Impact of electronic medical record on physician practice in office settings: a systematic review,” BMC Med Inform Decis Mak, vol. 12, no. 1, p. 10, Dec. 2012, doi: 10.1186/1472-6947-12-10. [DOI] [PMC free article] [PubMed] [Google Scholar]
[36].West V. L., Borland D., and Hammond W. E., “Innovative information visualization of electronic health record data: a systematic review,” Journal of the American Medical Informatics Association, vol. 22, no. 2, pp. 330–339, Mar. 2015, doi: 10.1136/amiajnl-2014-002955. [DOI] [PMC free article] [PubMed] [Google Scholar]
[37].Funkhouser A. T. et al. , “KIT Mutations Correlate with Higher Galectin Levels and Brain Metastasis in Breast and Non-Small Cell Lung Cancer,” Cancers, vol. 14, no. 11, p. 2781, Jun. 2022, doi: 10.3390/cancers14112781. [DOI] [PMC free article] [PubMed] [Google Scholar]
[38].Li D., Ota K., Zhong Y., Dong M., Tang Y., and Qiu J., “Towards High-Efficient Transaction Commitment in a Virtualized and Sustainable RDBMS,” IEEE Trans. Sustain. Comput., vol. 6, no. 3, pp. 507–521, Jul. 2021, doi: 10.1109/TSUSC.2019.2890841. [DOI] [Google Scholar]
[39].Odhiambo C. O., Cole C. A., Torkjazi A., and Valafar H., “State Transition Modeling of the Smoking Behavior Using LSTM Recurrent Neural Networks,” in 2019. International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, Dec. 2019, pp. 898–904. doi: 10.1109/CSCI49370.2019.00171. [DOI] [Google Scholar]
[40].Odhiambo C., Wright P., Corbett C., and Valafar H., “MedSensor: Medication Adherence Monitoring Using Neural Networks on Smartwatch Accelerometer Sensor Data,” 2021, doi: 10.48550/ARXIV.2105.08907. [DOI] [Google Scholar]
[41].Odhiambo C. O., Saha S., Martin C. K., and Valafar H., “Human Activity Recognition on Time Series Accelerometer Sensor Data using LSTM Recurrent Neural Networks,” 2022, doi: 10.48550/ARXIV.2206.07654. [DOI] [Google Scholar]
[42].Cole C. A., Powers S., Tomko R. L., Froeliger B., and Valafar H., “Quantification of Smoking Characteristics Using Smartwatch Technology: Pilot Feasibility Study of New Technology,” JMIR Form Res, vol. 5, no. 2, p. e20464, Feb. 2021, doi: 10.2196/20464. [DOI] [PMC free article] [PubMed] [Google Scholar]
[43].Bagheri Rajeoni A., “Analog Circuit Sizing Using Machine Learning Based Transistorcircuit Model,” PhD Thesis, 2021. [Online]. Available: https://www.proquest.com/dissertations-theses/analog-circuit-sizing-using-machine-learning/docview/2543477165/se-2
[44].Zhao L., Odigwe B., Lessner S., Clair D., Mussa F., and Valafar H., “Automated Analysis of Femoral Artery Calcification Using Machine Learning Techniques,” in 2019 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, Dec. 2019, pp. 584–589. doi: 10.1109/CSCI49370.2019.00110. [DOI] [Google Scholar]
[45].Odigwe B. E., Spinale F. G., and Valafar H., “Application of Machine Learning in Early Recommendation of Cardiac Resynchronization Therapy,” 2021, doi: 10.48550/ARXIV.2109.06139. [DOI] [Google Scholar]
[46].Odigwe B. E., Rajeoni A. B., Odigwe C. I., Spinale F. G., and Valafar H., “Application of machine learning for patient response prediction to cardiac resynchronization therapy,” in Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, Northbrook Illinois, Aug. 2022, pp. 1–4. doi: 10.1145/3535508.3545513. [DOI] [Google Scholar]
[47].Odigwe B. E., Eyitayo J. S., Odigwe C. I., and Valafar H., “Modelling of Sickle Cell Anemia Patients Response to Hydroxyurea using Artificial Neural Networks,” 2019, doi: 10.48550/ARXIV.1911.10978. [DOI] [Google Scholar]

[R1] [1].Siegel R. L., Miller K. D., Fuchs H. E., and Jemal A., “Cancer statistics, 2022.,” CA: a cancer journal for clinicians, vol. 72, no. 1, pp. 7–33, Jan. 2022, doi: 10.3322/caac.21708. [DOI] [PubMed] [Google Scholar]

[R2] [2].Mariotto A. B., Enewold L., Zhao J., Zeruto C. A., and Robin Yabroff K., “Medical Care Costs Associated with Cancer Survivorship in the United States,” Cancer epidemiology, biomarkers & prevention : a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology, vol. 29, no. 7, pp. 1304–1312, Jul. 2020, doi: 10.1158/1055-9965.EPI-19-1534. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R3] [3].Milosevic M., Jankovic D., Milenkovic A., and Stojanov D., “Early diagnosis and detection of breast cancer.,” Technology and health care : official journal of the European Society for Engineering and Medicine, vol. 26, no. 4, pp. 729–759, 2018, doi: 10.3233/THC-181277. [DOI] [PubMed] [Google Scholar]

[R4] [4].Drake R. R., “Glycosylation and cancer: moving glycomics to the forefront,” Advances in cancer research, vol. 126, pp. 1–10, 2015, doi: 10.1016/BS.ACR.2014.12.002. [DOI] [PubMed] [Google Scholar]

[R5] [5].Vajaria B. N. and Patel P. S., “Glycosylation: a hallmark of cancer?,” Glycoconjugate journal, vol. 34, no. 2, pp. 147–156, Apr. 2017, doi: 10.1007/s10719-016-9755-2. [DOI] [PubMed] [Google Scholar]

[R6] [6].Silsirivanit A., “Glycosylation markers in cancer.,” Advances in clinical chemistry, vol. 89, pp. 189–213, Jan. 2019, doi: 10.1016/bs.acc.2018.12.005. [DOI] [PubMed] [Google Scholar]

[R7] [7].Barondes S. H., Cooper D. N. W., Gitt M. A., and Leffler H., “Galectins. Structure and function of a large family of animal lectins.,” The Journal of biological chemistry, vol. 269, no. 33, pp. 20807–10, Aug. 1994, doi: 10.1016/s0021-9258(17)31891-4. [DOI] [PubMed] [Google Scholar]

[R8] [8].Liu F.-T. and Rabinovich G. A., “Galectins as modulators of tumour progression.,” Nature reviews. Cancer, vol. 5, no. 1, pp. 29–41, Jan. 2005, doi: 10.1038/nrc1527. [DOI] [PubMed] [Google Scholar]

[R9] [9].Ebrahim A. H. et al. , “Galectins in cancer: Carcinogenesis, diagnosis and therapy,” Annals of Translational Medicine, vol. 2, no. 9, Sep. 2014, doi: 10.3978/j.issn.2305-5839.2014.09.12. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R10] [10].Girotti M. R., Salatino M., Dalotto-Moreno T., and Rabinovich G. A., “Sweetening the hallmarks of cancer: Galectins as multifunctional mediators of tumor progression,” Journal of Experimental Medicine, vol. 217, no. 2, Feb. 2020, doi: 10.1084/jem.20182041. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R11] [11].Markowska A. I., Liu F. T., and Panjwani N., “Galectin-3 is an important mediator of VEGF- and bFGF-mediated angiogenic response,” Journal of Experimental Medicine, vol. 207, no. 9, pp. 1981–1993, Aug. 2010, doi: 10.1084/jem.20090121. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R12] [12].Chen C., Duckworth C. A., Zhao Q., Pritchard D. M., Rhodes J. M., and Yu L. G., “Increased circulation of galectin-3 in cancer induces secretion of metastasis-promoting cytokines from blood vascular endothelium,” Clinical Cancer Research, vol. 19, no. 7, pp. 1693–1704, Apr. 2013, doi: 10.1158/1078-0432.CCR-12-2940. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R13] [13].Campion C. G., Labrie M., Lavoie G., and St-Pierre Y., “Expression of Galectin-7 Is Induced in Breast Cancer Cells by Mutant p53,” PLoS ONE, vol. 8, no. 8, p. 72468, Aug. 2013, doi: 10.1371/journal.pone.0072468. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R14] [14].Kim S. J., Hwang J. A., Ro J. Y., Lee Y. S., and Chun K. H., “Galectin-7 is epigenetically-regulated tumor suppressor in gastric cancer,” Oncotarget, vol. 4, no. 9, pp. 1461–1471, Aug. 2013, doi: 10.18632/oncotarget.1219. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R15] [15].Barrow H., Rhodes J. M., and Yu L.-G., “The role of galectins in colorectal cancer progression.,” International journal of cancer, vol. 129, no. 1, pp. 1–8, Jul. 2011, doi: 10.1002/ijc.25945. [DOI] [PubMed] [Google Scholar]

[R16] [16].Chen C., Duckworth C. A., Fu B., Pritchard D. M., Rhodes J. M., and Yu L. G., “Circulating galectins-2,−4 and-8 in cancer patients make important contributions to the increased circulation of several cytokines and chemokines that promote angiogenesis and metastasis,” British Journal of Cancer, vol. 110, no. 3, pp. 741–752, Feb. 2014, doi: 10.1038/bjc.2013.793. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R17] [17].Hirashima M. et al. , “Galectin-9 suppresses tumor metastasis by blocking adhesion to endothelium and extracellular matrices,” Glycobiology, vol. 18, no. 9, pp. 735–744, Sep. 2008, doi: 10.1093/glycob/cwn062. [DOI] [PubMed] [Google Scholar]

[R18] [18].Irie A. et al. , “Galectin-9 as a prognostic factor with antimetastatic potential in breast cancer,” Clinical Cancer Research, vol. 11, no. 8, pp. 2962–2968, Apr. 2005, doi: 10.1158/1078-0432.CCR-04-0861. [DOI] [PubMed] [Google Scholar]

[R19] [19].Barrow H. et al. , “Serum galectin-2, −4, and −8 are greatly increased in colon and breast cancer patients and promote cancer cell adhesion to blood vascular endothelium,” Clinical Cancer Research, vol. 17, no. 22, pp. 7035–7046, Nov. 2011, doi: 10.1158/1078-0432.CCR-11-1462. [DOI] [PubMed] [Google Scholar]

[R20] [20].Topcu T. O. et al. , “The clinical importance of serum galectin-3 levels in breast cancer patients with and without metastasis,” Journal of Cancer Research and Therapeutics, vol. 14, no. 10, pp. S583–S586, 2018, doi: 10.4103/0973-1482.176425. [DOI] [PubMed] [Google Scholar]

[R21] [21].Saussez S. et al. , “The determination of the levels of circulating galectin-1 and −3 in HNSCC patients could be used to monitor tumor progression and/or responses to therapy,” Oral Oncology, vol. 44, no. 1, pp. 86–93, Jan. 2008, doi: 10.1016/j.oraloncology.2006.12.014. [DOI] [PubMed] [Google Scholar]

[R22] [22].Watanabe M. et al. , “Clinical significance of circulating galectins as colorectal cancer markers.,” Oncology reports, vol. 25, no. 5, pp. 1217–26, May 2011, doi: 10.3892/or.2011.1198. [DOI] [PubMed] [Google Scholar]

[R23] [23].Blair B. B. et al. , “Increased Circulating Levels of Galectin Proteins in Patients with Breast, Colon, and Lung Cancer,” Cancers, vol. 13, no. 19, p. 4819, Sep. 2021, doi: 10.3390/cancers13194819. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R24] [24].Tsai H.-C. and Baylin S. B., “Cancer epigenetics: linking basic biology to clinical medicine.,” Cell research, vol. 21, no. 3, pp. 502–17, Mar. 2011, doi: 10.1038/cr.2011.24. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R25] [25].Boroughs L. K. and Deberardinis R. J., “Metabolic pathways promoting cancer cell survival and growth,” Nature cell biology, vol. 17, no. 4, pp. 351–359, Apr. 2015, doi: 10.1038/NCB3124. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R26] [26].Zuo Z., Wang P., Chen X., Tian L., Ge H., and Qian D., “SWnet: a deep learning model for drug response prediction from cancer genomic signatures and compound chemical structures,” BMC bioinformatics, vol. 22, no. 1, pp. 1–16, 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R27] [27].Xie F. et al. , “Multifactorial Deep Learning Reveals Pan-Cancer Genomic Tumor Clusters with Distinct Immunogenomic Landscape and Response to Immunotherapy,” Clinical Cancer Research, vol. 26, no. 12, pp. 2908–2920, Jun. 2020, doi: 10.1158/1078-0432.CCR-19-1744. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R28] [28].Carracedo-Reboredo P. et al. , “A review on machine learning approaches and trends in drug discovery,” Computational and Structural Biotechnology Journal, vol. 19, pp. 4538–4558, 2021, doi: 10.1016/j.csbj.2021.08.011. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R29] [29].Collins F. S. and Varmus H., “A new initiative on precision medicine,” New England journal of medicine, vol. 372, no. 9, pp. 793–795, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R30] [30].Khoury M. J. and Galea S., “Will Precision Medicine Improve Population Health?,” JAMA, vol. 316, no. 13, p. 1357, Oct. 2016, doi: 10.1001/jama.2016.12260. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R31] [31].Murdoch T. B. and Detsky A. S., “The Inevitable Application of Big Data to Health Care,” JAMA, vol. 309, no. 13, p. 1351, Apr. 2013, doi: 10.1001/jama.2013.393. [DOI] [PubMed] [Google Scholar]

[R32] [32].Caban J. J. and Gotz D., “Visual analytics in healthcare – opportunities and research challenges,” Journal of the American Medical Informatics Association, vol. 22, no. 2, pp. 260–262, Mar. 2015, doi: 10.1093/jamia/ocv006. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R33] [33].Chaballout B. H., Shaw R. J., and Reuter-Rice K., “The SMART healthcare solution,” Advances in Precision Medicine, vol. 2, no. 1, 2017. [Google Scholar]

[R34] [34].Heisey-Grove D., Danehy L.-N., Consolazio M., Lynch K., and Mostashari F., “A National Study of Challenges to Electronic Health Record Adoption and Meaningful Use,” Medical Care, vol. 52, no. 2, pp. 144–148, Feb. 2014, doi: 10.1097/MLR.0000000000000038. [DOI] [PubMed] [Google Scholar]

[R35] [35].Lau F., Price M., Boyd J., Partridge C., Bell H., and Raworth R., “Impact of electronic medical record on physician practice in office settings: a systematic review,” BMC Med Inform Decis Mak, vol. 12, no. 1, p. 10, Dec. 2012, doi: 10.1186/1472-6947-12-10. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R36] [36].West V. L., Borland D., and Hammond W. E., “Innovative information visualization of electronic health record data: a systematic review,” Journal of the American Medical Informatics Association, vol. 22, no. 2, pp. 330–339, Mar. 2015, doi: 10.1136/amiajnl-2014-002955. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R37] [37].Funkhouser A. T. et al. , “KIT Mutations Correlate with Higher Galectin Levels and Brain Metastasis in Breast and Non-Small Cell Lung Cancer,” Cancers, vol. 14, no. 11, p. 2781, Jun. 2022, doi: 10.3390/cancers14112781. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R38] [38].Li D., Ota K., Zhong Y., Dong M., Tang Y., and Qiu J., “Towards High-Efficient Transaction Commitment in a Virtualized and Sustainable RDBMS,” IEEE Trans. Sustain. Comput., vol. 6, no. 3, pp. 507–521, Jul. 2021, doi: 10.1109/TSUSC.2019.2890841. [DOI] [Google Scholar]

[R39] [39].Odhiambo C. O., Cole C. A., Torkjazi A., and Valafar H., “State Transition Modeling of the Smoking Behavior Using LSTM Recurrent Neural Networks,” in 2019. International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, Dec. 2019, pp. 898–904. doi: 10.1109/CSCI49370.2019.00171. [DOI] [Google Scholar]

[R40] [40].Odhiambo C., Wright P., Corbett C., and Valafar H., “MedSensor: Medication Adherence Monitoring Using Neural Networks on Smartwatch Accelerometer Sensor Data,” 2021, doi: 10.48550/ARXIV.2105.08907. [DOI] [Google Scholar]

[R41] [41].Odhiambo C. O., Saha S., Martin C. K., and Valafar H., “Human Activity Recognition on Time Series Accelerometer Sensor Data using LSTM Recurrent Neural Networks,” 2022, doi: 10.48550/ARXIV.2206.07654. [DOI] [Google Scholar]

[R42] [42].Cole C. A., Powers S., Tomko R. L., Froeliger B., and Valafar H., “Quantification of Smoking Characteristics Using Smartwatch Technology: Pilot Feasibility Study of New Technology,” JMIR Form Res, vol. 5, no. 2, p. e20464, Feb. 2021, doi: 10.2196/20464. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R43] [43].Bagheri Rajeoni A., “Analog Circuit Sizing Using Machine Learning Based Transistorcircuit Model,” PhD Thesis, 2021. [Online]. Available: https://www.proquest.com/dissertations-theses/analog-circuit-sizing-using-machine-learning/docview/2543477165/se-2

[R44] [44].Zhao L., Odigwe B., Lessner S., Clair D., Mussa F., and Valafar H., “Automated Analysis of Femoral Artery Calcification Using Machine Learning Techniques,” in 2019 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, Dec. 2019, pp. 584–589. doi: 10.1109/CSCI49370.2019.00110. [DOI] [Google Scholar]

[R45] [45].Odigwe B. E., Spinale F. G., and Valafar H., “Application of Machine Learning in Early Recommendation of Cardiac Resynchronization Therapy,” 2021, doi: 10.48550/ARXIV.2109.06139. [DOI] [Google Scholar]

[R46] [46].Odigwe B. E., Rajeoni A. B., Odigwe C. I., Spinale F. G., and Valafar H., “Application of machine learning for patient response prediction to cardiac resynchronization therapy,” in Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, Northbrook Illinois, Aug. 2022, pp. 1–4. doi: 10.1145/3535508.3545513. [DOI] [Google Scholar]

[R47] [47].Odigwe B. E., Eyitayo J. S., Odigwe C. I., and Valafar H., “Modelling of Sickle Cell Anemia Patients Response to Hydroxyurea using Artificial Neural Networks,” 2019, doi: 10.48550/ARXIV.1911.10978. [DOI] [Google Scholar]

PERMALINK

This is a preprint.

Comprehensive and User-Analytics-Friendly Cancer Patient Database for Physicians and Researchers

Ali Firooz

Avery T Funkhouser

Julie C Martin

W Jeffery Edenfield

Homayoun Valafar

Anna V Blenda

Abstract

I. Introduction

A. Cancer Stats and Why New Strategies are Needed

B. Different Approaches to Cancer Screening

C. Background Importance of Using Multifactorial Data

D. Why Combining Data in a Convenient Manner is Critical

E. Aim

II. Materials & Methods

A. Data Categories

1). Reported Data

a). Patient Data –

b). Samples –

2). Produced Data

a). Galectins –

b). Glycans –

c). Gene Mutations –

B. Database

1). Storage

2). Interface

3). Data Visualization

III. Results

Fig. 1.

Fig. 2.

Fig. 3.

A. Data Statistics

Fig. 4.

Fig. 5.

Fig. 6.

Fig. 7.

Fig. 9.

B. Data Visualization

Fig. 10.

Fig. 12.

Fig. 11.

IV. Discussion and Conclusions

Fig. 8.

Acknowledgment

Contributor Information

References

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases