1Department of Biochemistry & Molecular Biology, Jahangirnagar University, Dhaka-1342, Bangladesh
2Professor, Department of Biochemistry & Molecular Biology, Jahangirnagar University, Dhaka-1342, Bangladesh
Bioinformatics is a relatively new area of biology that deals with the processing and interpretation of biological data using computational, analytical, and algebraic methods. The broad term "bioinformatics" describes the application of digital technology to the study of biological processes through the collection of high-dimensional data from several sources. Bioinformatics research is mostly done in silico and usually entails synthesizing new knowledge from existing data; its main focus is the design and testing of the software tools needed to assess the data. Early detection improves a cancer patient's prognosis; nonetheless, an early diagnosis can be challenging to confirm. DNA microarrays and proteomics research have been used for extensive gene expression studies. To help researchers quickly learn and understand the functions of these tools, this review explores the use and features of web servers including KM plotter, GEPIA2, TCGA, PrognoScan, UALCAN, UCSC Xena, HPA, GENT2, cBioPortal, GDC portal, TIMER, and other multi-omics statistical tools.
This review explains what bioinformatics is, how it is used in cancer detection and therapy, and how to use various bioinformatics tools and databases. Cancer is one of the leading causes of death worldwide and is extremely challenging to diagnose in its early stages. The primary topic of this review is the use of biomarkers at the genetic level in various cancer types, which gives clinical practitioners a clear basis for improved treatment and prognosis. Bioinformatics is the study of biological data acquisition and interpretation through the use of computational and analytical tools [1]. During the "genomics" and "omics" eras, the rapid growth of internet communication and knowledge made it overwhelming for laboratory researchers to examine data from experiments conducted during the preceding ten years. The increasing demand for in-depth biological research means that traditional gene-by-gene approaches are insufficient to capture the full complexity of biology. The massive volumes of data produced by emerging technologies such as genome sequencing and microarray chips demand data management and integration across multiple platforms. Information analysis and reporting are then required to turn these data into biological knowledge relevant to therapeutic outcomes [2]. The design and testing of the software tools required to evaluate the information are the core of bioinformatics research, which is conducted largely in silico and typically involves the synthesis of new knowledge from available data [1].
Background:
Owing in part to poor diagnosis, cancer is a complex disease that affects a large number of people and is a prominent cause of mortality. Prognosis and therapy may be correlated with treatment response and tolerance, duration, location, cellular expansion, source, and pathophysiological understanding. Cancer bioinformatics is a vital component of clinical medicine and an essential instrument and method in the study of cancer [3]. The human body expresses a large number of genes, proteins, and RNAs that are regulated in space and time and work as a complicated network. This complexity renders the traditional gene-by-gene approach unreliable and unable to provide a complete picture of cellular activity. With microarray technology, gene expression across the entire genome can now be assessed in a single experiment [4]. Personalized treatment, prognosis, and diagnosis may all benefit from multi-omics approaches. Sufficient bioinformatics capabilities are needed to meet the demands of precision oncology by organizing, integrating, and interpreting large amounts of complex data. Because of stricter policy requirements, we consider the unique requirements of bioinformatics approaches and tools that emerge in the context of clinical cancer, including the need for rapid, highly reproducible, and stable procedures. We describe the steps involved in creating a molecular tumor board, from the first analysis of raw genomic profile data to automated report preparation. We also discuss the specific bioinformatics support needed for this procedure [5]. According to the widely accepted central dogma of genetics, DNA is transcribed into RNA and RNA is translated into protein to mediate biological activity. However, the Human Genome Project showed, surprisingly, that just 1.5% of the human genome contains protein-coding genes [6-8].
Previously, genome assemblers were only capable of completing the assembly of small bacterial genomes, but progress in data quality and quantity, together with more sophisticated design methods and processing resources, has made it possible to assemble more complex eukaryotic genomes [9]. Projects that aim to explain cancer from a global perspective are giving researchers greater access to data that can be integrated and analyzed in fresh ways. The ultimate objective of cancer bioinformatics is the development of new treatment and diagnostic methods. Many tools have been introduced and developed to address problems ranging from tumor heterogeneity to the analysis of gene mutations [10].
Bioinformatics in cancer diagnosis:
Early diagnosis of cancer improves prognosis, but it is difficult to confirm a diagnosis at a very early stage because of the lack of robust statistical models that take clinical triage and variations in aggressiveness into consideration. Mounting evidence suggests that delays in identifying cancer matter, although the effect on survival or fatality is difficult to quantify [11,12]. The substances present in tobacco are DNA-toxic agents that may have a significant impact on the development and spread of certain cancers [13,14]. Cancer bioinformatics plays a major part in identifying and validating biomarkers that are specific to early detection and clinical phenotype, in measuring and monitoring disease prognosis and treatment outcome, and in predicting improvements in patients' quality of life. Studies have incorporated knowledge of protein annotations, interactions, and signaling pathways, together with complex biomarkers and novel classes of biomarkers based on protein-protein interactions. One novel approach is to use effective complex biomarkers to track and assess changes in network biomarkers at various stages and intervals throughout the onset of disease. Clinical informatics data, such as clinical manifestations, patient complaints, treatment histories, biochemical analyses, imaging, and pathology, are expected to be linked to these biomarkers [15]. Advances in DNA microarrays and proteomics for large-scale gene expression research have increased the significance of bioinformatics tools. In today's research, wet-lab experimentation and bioinformatics analysis go side by side [16]. Molecular profiling of tumor biopsies is becoming more crucial to both cancer research and cancer treatment. Bioinformatics makes improvements in diagnosis, prognosis, and individualized treatment possible.
Relevant bioinformatics approaches for managing, integrating, and analyzing large, networked information are required to fulfill the promise of precision oncology. Because of the strict monitoring environment and the demand for quick, highly reproducible, and reliable methods, bioinformatics techniques and software developed specifically for the oncology context are particularly needed. The plan of the molecular tumor board and the particular bioinformatics assistance it needs, from the initial analysis of raw molecular profiling data to the automated generation of the report, must be outlined. Numerous clinical studies and genomic tumor boards at specialist cancer centers and medical centers across the globe have employed similar approaches to various extents. Previous and current initiatives ought to be examined with a view to integrating tumor boards with other pan-omics patient data, and to assessing the capability of clinical methodologies to convert molecular discoveries into advice on appropriate treatments. The method used to investigate the genetic basis of cancer is being revolutionized. Instead of concentrating on specific genes, researchers are now investigating large regions of the expressed genome. The quantity of data saved in the patient record and the amount of molecular data produced by testing facilities are growing at an incredible rate. To obtain a fresh understanding of the genetics of cancer, it is essential to find innovative approaches for combining these data. As a result, bioinformatics, the fusion of biology, information science, and computation, continues to become an essential part of cancer research [17].
Bioinformatics databases and tools
It is important to collect useful data related to the study before applying any technique. This process of data collection is called data mining [18]. Oncomine is a software system used for data mining: a database and data-mining platform through which freely available cancer microarray data are meticulously curated, analyzed, and made accessible [19]. Bioinformatics tools include the Database for Annotation, Visualization and Integrated Discovery (DAVID), Gene Ontology (GO), the Surveillance, Epidemiology, and End Results Program (SEER), and Gene Expression Profiling Interactive Analysis (GEPIA) [20]. Commonly used, freely accessible databases for cancer research are described below.
Databases are classified as follows: databases harboring gene/microRNA expression profiles, databases for copy number variations (CNVs), DNA mutation detection databases, epigenetic profile databases, databases with integrative analyses, and databases with other data types [21]. Differential expression of TCGA (The Cancer Genome Atlas) samples for different cancers can be compared with their normal counterparts through the ONCOMINE database (https://www.ONCOMINE.org/resource/login.html) [22-24]. Expression-level analyses are normally conducted with the following threshold parameters: p-value, 1E-4; fold change, 2; gene ranking, 10%.
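The three thresholds above can also be applied to an exported results table. The following is a minimal Python sketch, not ONCOMINE's own procedure: the record fields (`gene`, `p_value`, `fold_change`), the placeholder gene names, and the values are hypothetical, gene rank is assumed to mean position when sorted by p-value, and only over-expression is considered.

```python
import math

# Hypothetical differential-expression records (illustrative values only).
records = [
    {"gene": "GENE_A", "p_value": 2.0e-6, "fold_change": 3.1},
    {"gene": "GENE_B", "p_value": 5.0e-5, "fold_change": 1.4},
    {"gene": "GENE_C", "p_value": 3.0e-3, "fold_change": 4.0},
    {"gene": "GENE_D", "p_value": 8.0e-5, "fold_change": 2.5},
]


def filter_hits(records, p_max=1e-4, fc_min=2.0, top_fraction=0.10):
    """Return genes that pass the p-value, fold-change, and gene-rank cut-offs."""
    # Rank genes from most to least significant.
    ranked = sorted(records, key=lambda r: r["p_value"])
    # Keep only the top fraction of ranked genes (at least one gene).
    n_top = max(1, math.ceil(len(ranked) * top_fraction))
    return [
        r["gene"]
        for r in ranked[:n_top]
        if r["p_value"] <= p_max and r["fold_change"] >= fc_min
    ]


print(filter_hits(records))                    # default 10% rank cut-off
print(filter_hits(records, top_fraction=1.0))  # rank cut-off disabled
```

Applying the rank cut-off before the other two filters is a design choice of this sketch; because all three conditions must hold, the order does not change which genes are returned.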
Gene Expression Profiling Interactive Analysis (GEPIA) is a widely used server for differential gene expression analysis, survival analysis, correlation analysis, similar gene detection, etc. Expression profiles of genes across different cancers and their matched normal tissues can be examined through the GEPIA2 website (http://gepia2.cancer-pku.cn/) [25].
GENT2 (http://gent2.appex.kr/gent2/) is a platform for exploring gene expression patterns across normal and tumor tissues. It is designed to support gene profile analysis, subtype analysis, and meta-survival analysis [26].
Genotype-Tissue Expression (GTEx) (https://gtexportal.org/home/) is a platform for isoform expression, exon expression, and junction expression analysis of cancer genes. GTEx data and TCGA data are generally compared using the ANOVA differential method with default threshold settings through the GEPIA2 website [27].
Protein expression in different cancer tissues and their normal counterparts can be retrieved as immunohistochemistry images from the Human Protein Atlas database (https://www.proteinatlas.org/) [28, 29].
The University of Alabama at Birmingham CANcer Data Analysis Portal (UALCAN) is a comprehensive, user-friendly, and interactive web resource for analyzing cancer OMICS data. Sample gene expression and promoter methylation analyses are conducted with the UALCAN website (http://ualcan.path.uab.edu/index.html) by comparing TPM (transcripts per million) and beta values, respectively [28, 29]. UALCAN uses the TCGA database to analyze mRNA transcript counts in cancer tissues against their normal counterparts and across cancer stages, ethnicities, and genders.
The number and location of mutations in the peptide sequence can be detected using cBioPortal (https://www.cbioportal.org/) [30, 31]. The frequency of alterations (mutation, amplification, deep deletion, and multiple alterations) can also be investigated using the cBioPortal website.
The PrognoScan database helps to relate gene expression to patient prognosis in cancer. PrognoScan is used to examine the effect of gene expression on overall survival (OS), relapse-free survival (RFS), and disease-free survival (DFS) through multivariate and univariate analyses [32].
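For readers unfamiliar with the two quantities compared in UALCAN, the sketch below computes them from their standard definitions. It is an illustration only, not UALCAN's pipeline: the function names and inputs are hypothetical, and the offset of 100 in the beta value is the common Illumina array convention, assumed here rather than taken from the source.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM).

    counts     -- reads assigned to each gene
    lengths_bp -- gene (transcript) lengths in base pairs
    """
    # Reads per kilobase: normalise each count by gene length.
    rates = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    # Scale so that the values of one sample sum to one million.
    return [r / total * 1e6 for r in rates]


def beta_value(methylated, unmethylated, offset=100.0):
    """Methylation beta value: ranges from 0 (unmethylated) towards 1 (methylated)."""
    return methylated / (methylated + unmethylated + offset)


# Three hypothetical genes with equal read density receive equal TPM.
print(tpm([10, 20, 30], [1000, 2000, 3000]))
# A probe with strong methylated signal and no unmethylated signal.
print(beta_value(900.0, 0.0))
```

Because TPM values sum to one million within a sample, they are comparable between samples in a way that raw counts are not, which is why tumor and normal groups can be compared on this scale.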
The Kaplan-Meier plotter (KM plotter) (https://kmplot.com/analysis/) can assess the correlation between the expression of all genes (mRNA, miRNA, protein, and DNA) and survival in more than 35,000 samples from 21 tumor types. The statistical tools applied include Cox proportional hazards regression and computation of the false discovery rate. With 18,000 analyses per day, the KM plotter is a worldwide reference for the discovery and validation of survival biomarkers. It is mainly used for survival analysis of cancer patients in association with gene expression [32].
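To make the statistics behind such survival plots concrete, here is a minimal Python sketch of the Kaplan-Meier product-limit estimator together with a Benjamini-Hochberg adjustment. Both are generic textbook procedures; the source does not state which false discovery rate method the KM plotter uses, so Benjamini-Hochberg is an assumption, and Cox regression is not shown. All input values are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with an observed event.

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Patients with an event or censored at t leave the risk set.
        n_at_risk -= leaving
    return curve


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to enforce monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Four hypothetical patients; the second is censored at time 2.
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

In practice the curve is computed separately for the high- and low-expression groups and the two curves are compared, which is the step where the choice of expression cut-off matters.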
The Cancer Genome Atlas (TCGA), a landmark cancer genomics program, molecularly characterized over 20,000 primary cancer and matched normal samples spanning 33 cancer types. This joint effort between NCI and the National Human Genome Research Institute began in 2006, bringing together researchers from diverse disciplines and multiple institutions. Several cancer databases, such as UALCAN and the GDC portal, use TCGA as a data source [33].
Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Enrichr (https://maayanlab.cloud/Enrichr/) currently contains a large collection of diverse gene set libraries available for analysis and download. In total, Enrichr currently contains 73582803 annotated gene sets from 227 gene set libraries. New features have been added to Enrichr, including the ability to submit fuzzy sets and upload BED files, an improved application programming interface, and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries [34].
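The idea behind enrichment analysis can be illustrated with a one-sided hypergeometric (over-representation) test. This is a generic sketch: the source does not state which statistic Enrichr reports, and the function name, parameter names, and numbers below are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_in_set, n_query, n_overlap):
    """Probability of seeing at least n_overlap annotated genes by chance.

    n_background -- number of genes in the background (e.g. the genome)
    n_in_set     -- genes in the background that belong to the gene set
    n_query      -- number of genes in the submitted list
    n_overlap    -- submitted genes that belong to the gene set
    """
    upper = min(n_in_set, n_query)
    total = comb(n_background, n_query)
    # Sum the hypergeometric probabilities for overlaps >= n_overlap.
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy example: 10 background genes, 4 in the pathway,
# 3 genes submitted, 2 of which are in the pathway.
print(enrichment_p_value(10, 4, 3, 2))
```

When many gene sets are tested at once, as in a library-wide query, the resulting p-values are usually corrected for multiple testing before pathways are ranked.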
UCSC Xena (https://xenabrowser.net/) allows users to explore functional genomic data sets for correlations between genomic and/or phenotypic variables. The UCSC Xena platform (http://xena.ucsc.edu/) allows biologists and bioinformaticians to securely analyze and visualize their private functional genomics data in the context of public genomic and clinical data sets. The Xena platform consists of a set of federated data hubs and the Xena browser, which integrates across hubs, providing one location to analyze and visualize all data. The expanding public Xena Data Hubs currently host 1400+ data sets from more than 35 cancer types, as well as Pan-Cancer data sets. These data hubs serve seminal cancer genomics and functional genomics data sets to the scientific community, including the latest TCGA, TARGET, ICGC, and GTEx data sets. The server helps to analyze data types including somatic and germline SNPs, INDELs, large structural variants, CNV, gene-, transcript-, exon-, protein-, and miRNA-expression, DNA methylation, phenotypes, clinical data, subtype classifications, and genomic biomarkers [35].
The Cancer Cell Line Encyclopedia (CCLE) (https://sites.broadinstitute.org/ccle/) provides public access to genomic data, analysis, and visualization for over 1,000 cancer cell lines. It contains cell line data, mutation data, gene data, proteomics data, RNA expression data, epigenetic modification data, and metabolomics data [36].
recount3 (https://rna.recount.bio/) is an online resource consisting of RNA-seq gene, exon, and exon-exon junction counts as well as coverage bigWig files for 8,679 and 10,088 different studies for human and mouse respectively. By taking care of several pre-processing steps and combining many datasets into one easily-accessible website, this server makes finding and analyzing RNA-seq data considerably more straightforward [37].
ArrayExpress (https://www.ebi.ac.uk/biostudies/arrayexpress) is a functional genomics data collection that stores data from high-throughput functional genomics experiments, and provides data for reuse to the research community. In line with community guidelines, a study typically contains metadata such as detailed sample annotations, protocols, processed data and raw data. Raw sequence reads from high-throughput sequencing studies are brokered to the European Nucleotide Archive (ENA), and links are provided to download the sequence reads from ENA [38].
The Genomic Data Commons (GDC) Data Portal (https://portal.gdc.cancer.gov/) provides harmonized datasets. It is a repository and computational platform for cancer researchers who need to understand cancer, its clinical progression, and response to therapy. It offers high-quality datasets spanning 44,637 cases from cancer genomic studies such as TCGA, the Human Cancer Models Initiative (HCMI), Foundation Medicine Inc. (FMI), and the Clinical Proteomic Tumor Analysis Consortium (CPTAC) [39]. It currently holds 83 projects, 69 primary sites, 44,637 cases, 1,019,317 files, 22,534 genes, and 2,934,380 mutations.
TIMER (http://timer.comp-genomics.org/timer/) is a comprehensive resource for the systematic analysis of immune infiltrates across diverse cancer types. This version of the web server provides immune infiltrate abundances estimated by multiple immune deconvolution methods and allows users to generate high-quality figures dynamically to explore tumor immunological, clinical, and genomic features comprehensively. Instead of using just one algorithm, TIMER2.0 (http://timer.cistrome.org/) provides a more robust estimation of immune infiltration levels for The Cancer Genome Atlas (TCGA) or user-provided tumor profiles using six state-of-the-art algorithms. TIMER2.0 provides four modules for investigating the associations between immune infiltrates and genetic or clinical features, and four modules for exploring cancer-related associations in the TCGA cohorts. Each module can generate a functional heatmap table, enabling the user to easily identify significant associations in multiple cancer types simultaneously. Overall, the TIMER2.0 web server provides comprehensive analysis and visualization functions for tumor-infiltrating immune cells [40]. The databases and servers described in this section are listed in Table 1.
Table 1. List of the databases and servers used in cancer bioinformatics
Bioinformatics Approach and steps in cancer detection and prognosis
The servers above are generally used for cancer research through a computational approach. The steps involved in data collection, data analysis, and data interpretation are described below:
The first and most important step in biomarker identification is to analyze gene expression and compare it between normal and tumor cells. Oncomine, GEPIA, and GENT2 are normally used for this. In this case, during statistical analysis, the p value considered is <0 […]
More than 30 cancer types have been documented to date. Gene expression patterns differ between them in most cases, but sometimes show similarities. For example, the expression pattern of TAP1 differs across cancers. Using the ONCOMINE database to observe gene expression and fold changes, four cancer types were considered: breast, lung, liver, and ovarian cancers. Expression was upregulated in breast, liver, ovarian, and lung cancers. The GEPIA2 tool was used to examine TAP1 expression, and expression levels in LIHC and OV were significantly higher than in normal tissues. Expression in primary tumors and normal tissue was compared using TCGA data in the UALCAN tool. Significant overexpression of TAP1 was seen in primary tumors compared with normal tissue in BRCA, LIHC, and LUAD [41].
The UALCAN online database is used to analyze gene expression in relation to clinical characteristics. For example, the expression of the TAP1 gene in normal tissue was compared with that in tissues from patients with different clinical outcomes for breast, liver, lung, and ovarian cancers. Overexpression of the TAP1 gene in cancer patients compared with normal tissue was greatest in stage 2 for LUAD (p = 3.37e-09) and LIHC (p = 2.97e-08); for BRCA, stage 2 (p = 1.62e-12) and stage 4 (p = 1.08e-03) were among the highest. Stage 3 (p < 1e […]
The level of target gene promoter methylation across different clinical characteristics can be analyzed using the UALCAN online database. In the case of the TAP1 gene, CpG probes were used to identify a correlation between expression levels and promoter methylation. The promoter methylation of the TAP1 gene in normal tissue was compared with that in tissues from patients with different clinical outcomes for breast, liver, lung, and ovarian cancer. Promoter methylation was significantly higher in tumor samples than in normal tissues for BRCA (p = 4.00e-03), LIHC (p = 2.36e-02), and LUAD (p = 5.99e-03). Compared with normal tissue, patients with stage 1 (p = 3.25e-02) and stage 2 (p = 4.17e-04) BRCA showed a significant increase in beta value; patients with LIHC had the highest promoter methylation in stage 1 (p = 2.32e-02); and patients with stage 1 and 3 LUAD had a significant rise in promoter methylation. Compared with normal tissue, Asian and African American patients had increased promoter methylation in BRCA, and Caucasian patients had increased methylation for LIHC and LUAD. In breast, liver, and lung cancer, TAP1 promoter methylation was significantly unchanged for females compared to normal tissue. However, the analysis suggests no specific relation between TAP1 DNA methylation and clinico-pathological subtypes [42].
Genetic alterations of a gene in different cancers can be studied using the cBioPortal database. For example, the database was queried for genetic mutations of TAP1 in 7710 specimens from 12 cases of BRCA, LIHC, LUAD and OV cancers. Of the total queried samples, 2% showed alterations in the gene set or pathways, with a somatic mutation frequency of 0.3%. Across the multiple sample studies, a total of 21 mutations, including 9 duplications, were reported for the TAP1 gene region. The query covered amino acids 1 to 808 across the TAP1 pro-peptide and TAP1 domain. Of those mutations, 19 were missense and 2 were truncating. Breast adenocarcinoma and lung cancer had the highest numbers of mutations, and the mutations lay in a hotspot at R547C/H, where two mutations were identified. Missense mutations were discovered in 8 breast adenocarcinoma specimens, and 3 lung cancer specimens carried missense mutations. For ovarian cancer datasets, the alteration frequency was the highest (>6%) among the four cancer types. The expression of TAP1 mRNA (RNA Seq V2) among the 12 cases of cancers was then generated using cBioPortal. Breast cancer, with 8 cases (7 missense and 1 truncating), had the highest level of mutation in the mRNA expression data, followed by liver cancer with 4 affected cases [42].
Different categories of cancer need to be considered for the prognostic analysis of gene mRNA expression, and the data are summarized using the prognostic databases with a Cox p-value significance threshold of (p <0 […] n = 161 […]
Lastly, the genes that are positively correlated with the target need to be identified. For example, TAP1 has co-expressed genes in BRCA, LIHC, LUAD, and OV cancer that were detected using the R2 genomics analysis and visualization platform. The correlated genes were entered into Venn Draw to produce a Venn diagram giving the common correlated genes among BRCA (6058 genes), LIHC (3773 genes), LUAD (4012 genes), and OV cancer (2901 genes). The positively correlated common genes were then extracted for an ontology analysis. Enrichr was used to understand which signaling pathways were influenced by TAP1 and its positively co-expressed genes in BRCA, LIHC, LUAD, and OV. In the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, the 10 most strongly associated pathways for TAP1 and its positively correlated genes were primarily the cytokine-cytokine receptor pathway, chemokine signaling pathway, hematopoietic cell lineage, primary immunodeficiency, rheumatoid arthritis, cell adhesion molecules, Chagas disease, toll-like receptor signaling pathway, Salmonella infection, and human immunodeficiency virus-1 infection. The cytokine-cytokine receptor pathway and the chemokine signaling pathway were the most significantly influenced [42].
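The Venn diagram step amounts to intersecting the per-cancer lists of correlated genes. A minimal Python sketch follows, assuming each list has already been exported from a co-expression tool; the dictionary layout and the placeholder gene names are hypothetical and do not reproduce the gene lists of the cited study.

```python
# Hypothetical positively correlated gene lists, one per cancer type.
correlated = {
    "BRCA": ["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    "LIHC": ["GENE_B", "GENE_C", "GENE_E"],
    "LUAD": ["GENE_B", "GENE_C", "GENE_D"],
    "OV": ["GENE_C", "GENE_B", "GENE_F"],
}


def common_genes(gene_lists):
    """Return the genes present in every list, sorted alphabetically."""
    sets = [set(genes) for genes in gene_lists.values()]
    return sorted(set.intersection(*sets))


shared = common_genes(correlated)
print(shared)  # genes correlated with the target in all four cancers
```

The resulting shared list is what would then be submitted to an enrichment tool for ontology and pathway analysis.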
DISCUSSION:
At present, the diagnosis and treatment of non-small cell lung cancer (NSCLC) are still far from satisfactory, and the number of cases is rising year by year. It is necessary to investigate the pathogenesis and biomarkers of NSCLC to provide effective treatment. Great progress has been made in understanding the mechanisms of initiation and development of NSCLC. Many experiments have been done, including in vitro tumor cell lines, animal tumor models, and patient tumor models; however, NSCLC demands more comprehensive analysis because lung cancer progression is a multi-stage, multi-cause process. Fortunately, with the development of human genome sequencing, high-throughput data and associated tumor databases have been developed and have become more readily available. The integration of data from multiple datasets by bioinformatics analyses has become a vital source of data for studies of lung cancer [43].
Using serial analysis of gene expression (SAGE) together with cell-type-specific cell surface markers and magnetic beads for rapid cell isolation, Allinen et al. characterized the full transcriptome of every cell type making up healthy breast tissue as well as in situ and metastatic breast carcinomas. Their findings imply that all cell types undergo changes, but genetic alterations were identified only in cancer epithelial cells [44].
After lung cancer, liver cancer accounts for the majority of cancer-related fatalities. Using a mouse hepatoblast model and RNAi, Sawey et al. carried out a forward genetic screen guided by human hepatocellular cancer amplification data. They discovered that overexpression conferred selective sensitivity to FGF19 inhibition. Since CCND1 and FGF19 are equally significant driver genes of the 11q13.3 amplicon in liver cancer, 11q13.3 amplification may serve as a useful biomarker for individuals who are expected to respond favorably to anti-FGF19 treatment [45].
Using gene expression-based classification to separate medulloblastomas from other histologically comparable brain tumors, Pomeroy et al. were able to predict the treatment response of medulloblastomas [46]. Moreover, the molecular profile demonstrated that medulloblastomas and primitive neuroectodermal tumors (PNETs), two forms of brain cancer frequently regarded as a single entity, are biologically distinct from one another. The medulloblastoma gene expression pattern revealed unanticipated participation of the sonic hedgehog signaling pathway and suggested cerebellar granule cells as the cell of origin. Bredel et al. also applied genomic network knowledge and gene transcription profiling to investigate critical functions in gliomagenesis and the biology of human gliomas [47]. Several cancer investigators have used microarray technology to study multiple myeloma. The morphological uniformity of multiple myeloma was verified by Claudio et al. [48]. As an alternative strategy, and to better understand how gene groups may relate to prognosis, Locati et al. used self-organizing map techniques to organize publicly accessible HPV+ cancer data and derived gene signatures related to three different subgroups of the disease [49]. A 10-gene model for predicting time to relapse and prognosis in epithelial ovarian cancer was developed by Lu et al., who used a support vector machine learning algorithm to examine data from the Cancer Cell Line Encyclopedia. This model was validated on two different datasets [50].
Oral cancer is one of the most commonly seen neoplasms in the head and neck region and has a poor prognosis; among oral cancers, oral squamous cell carcinoma (OSCC) is the most common [51,52]. Kumar et al. compared the top-ranked genes with the genes corresponding to strongly enriched GO keywords relevant to oral cancer. A total of 39 prospective oral cancer target genes were identified. Initial analysis of research and experimental data revealed 29 genes to be associated with OSCC. Following a thorough pathway analysis, they proposed a function for the chosen candidate genes in invasion and metastasis in OSCC. Using immunohistochemistry (IHC), they further verified their hypotheses and discovered that, in the OSCC specimens, FLNA was elevated, whereas ARRB1 and HTT were downregulated [52]. Nakashima et al. found that the miR-1290 expression level in the plasma of OSCC patients is lower than in healthy individuals. However, circulating miR-1290 status has been proposed as a promising biomarker for evaluating both overall survival and clinical response to chemoradiotherapy in patients with OSCC [53]. Differentially expressed genes, hub proteins, and pathways demonstrated a strong correlation with OSCC development. Thorough investigation using bioinformatics is necessary for understanding the underlying process of OSCC progression. Important genes and pathways may serve as therapeutic targets in OSCC [54].
Among gynecological cancers, ovarian cancer is the most prevalent and the main cause of female death globally. The pathophysiology and underlying causes of disease development remain unexplained despite substantial investigation. Different non-coding RNAs (ncRNAs) have been recognized as key regulators in the development of ovarian cancer. Beg et al. highlighted the significance of several ncRNAs that hold strong promise as a therapeutic strategy for the treatment of ovarian cancer [55].
One study screened differentially expressed genes (DEGs) from the transcriptome sequencing results of acute myeloid leukemia (AML) samples. The GO enrichment analysis showed that the DEGs of AML were significantly enriched in functional terms such as positive regulation of cell proliferation, proteolysis, and immune response. These terms are closely related to the occurrence and function of tumors. Positive regulation of cell proliferation is a key process in malignant tumor proliferation, and proteolysis and inhibition of the immune response are involved in tumor proliferation and invasion. The KEGG pathway enrichment analysis indicated that AML DEGs were significantly enriched in the p53 signaling pathway, the TNF signaling pathway, the HIF-1 signaling pathway, and other signaling pathways. These are common pathways in the progression of malignant tumors and are involved in regulating gastric cancer, bladder cancer, lung cancer, liver cancer, leukemia, and other malignant tumors. The results of the GO and KEGG pathway enrichment analyses showed that the DEGs of AML screened in that study were representative and may be key pathogenic genes of AML, participating in the regulation of tumor cell proliferation and invasion as well as in the occurrence and progression of AML [56].
Prostate cancer (PCa) is one of the most pervasive carcinomas in men and a large health burden worldwide [57]. The lethality of PCa is due to the lack of treatment options that can produce a lasting response at the genetic and cellular biological levels. About 15% of patients with PCa are diagnosed with high-risk disease [58]. Therefore, using univariate Cox and iterative lasso Cox regression analyses, a 3-gene (KCNK3, AK5, and ARHGEF38) risk signature model for PCa was constructed. ROC curves further supported the accuracy of the risk model. KCNK3 has been reported to influence physiological processes ranging from vascular tone to metabolic diet through inflammation [59]. KCNK3 was also correlated with prolonged survival after surgery in colorectal cancer [60]. AK5 was reported as a new prognostic marker that promotes autophagy and proliferation in human gastric cancer [61]. Interestingly, ARHGEF38 was shown to be an oncogene and may be a novel biomarker for predicting invasive PCa [61, 62].
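A gene-signature risk model of this kind is typically applied by computing, for each patient, a weighted sum of the signature genes' expression values and then splitting patients at a cut-off. The sketch below assumes that general scheme; the coefficients, expression values, and patient labels are invented for illustration and are not the fitted values of the cited model, and the median split is an assumed cut-off rule.

```python
from statistics import median

# Hypothetical Cox coefficients for the three signature genes
# (made-up numbers; the source does not report the fitted values).
coefficients = {"KCNK3": -0.5, "AK5": -0.25, "ARHGEF38": 0.75}

# Hypothetical normalised expression values per patient.
expression = {
    "P1": {"KCNK3": 2.0, "AK5": 4.0, "ARHGEF38": 1.0},
    "P2": {"KCNK3": 1.0, "AK5": 0.0, "ARHGEF38": 4.0},
    "P3": {"KCNK3": 0.0, "AK5": 2.0, "ARHGEF38": 2.0},
    "P4": {"KCNK3": 4.0, "AK5": 4.0, "ARHGEF38": 0.0},
}


def risk_scores(expression, coefficients):
    """Risk score = sum over signature genes of coefficient * expression."""
    return {
        patient: sum(coefficients[g] * values[g] for g in coefficients)
        for patient, values in expression.items()
    }


def split_by_median(scores):
    """Label each patient 'high' or 'low' risk relative to the median score."""
    cut = median(scores.values())
    return {p: ("high" if s > cut else "low") for p, s in scores.items()}


scores = risk_scores(expression, coefficients)
groups = split_by_median(scores)
print(scores)
print(groups)
```

The two resulting groups are then usually compared with survival curves and ROC analysis to judge how well the signature separates outcomes.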
CONCLUSION:
Although these tools provide great convenience for prognostic biomarker development, several key aspects of them remain unclear. Differences in the datasets collected and in the cut-off points used may lead to significantly different results. It is therefore essential to identify the datasets and data sources behind each web server; apart from the shared TCGA data, the other data sources differ significantly. This may be one of the reasons why the analysis results of different tools are not completely consistent. In the future, efforts should be made in data optimization, and prognostic tools should be improved so that they can predict multi-gene markers, select optimal cut-offs, use hierarchical clustering, and consider complex multi-omics interaction networks. In addition, more molecular subtypes and clinical information, including tumor tissue images and treatment data, should be collected and mined to identify more meaningful prognostic markers through more detailed subtype analysis.
REFERENCES
Md Solayman Hossain, Farha Matin Juliana, Bioinformatics In Early Detection And Prognosis Of Cancer, Int. J. of Pharm. Sci., 2024, Vol 2, Issue 7, 144-161. https://doi.org/10.5281/zenodo.12610166