Identifying Undiagnosable Cancer with Machine Learning Technology

A new model that links tumor cells to developmental pathways may help identify cancers of unknown primary.

Identifying a patient's precise type of cancer and its main site—the organ or area of the body where the disease first appears—is the first step in deciding on the best course of treatment.

By closely examining the gene expression patterns related to early cell development and differentiation, a new deep-learning method developed by scientists at MIT and Massachusetts General Hospital (MGH) may help categorize cancers of unknown primary.

Machine learning is well suited to the problem of distinguishing gene expression changes among the various types of cancer of unknown origin. Cancer cells look and behave very differently from normal cells, in part because of major changes in how their genes are expressed. Thanks to advances in single-cell profiling and efforts to catalog cell expression patterns in cell atlases, there is a wealth of data that holds clues to how and where particular tumors began, even though its sheer volume can be overwhelming to human eyes.

Building a machine learning model that captures the differences between cancerous and healthy cells, as well as among cancer types, requires striking a balance, though. A model that is overly complex and accounts for too many features of cancer gene expression may appear to learn the training data perfectly but struggle with new data. Conversely, a model simplified by reducing the number of features may miss the kinds of information that would enable accurate classification of cancer types.
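The trade-off described above can be illustrated with a small synthetic sketch. Nothing here comes from the study: the data are random numbers, the two "informative" features and 200 "noise" features are assumptions made for illustration, and a one-nearest-neighbor classifier stands in for an overly flexible model.

```python
import random

def make_samples(n, n_noise, rng):
    """Synthetic samples: 2 informative features followed by n_noise pure-noise features."""
    samples = []
    for i in range(n):
        label = i % 2
        center = 2.0 if label else -2.0
        informative = [rng.gauss(center, 1.0) for _ in range(2)]
        noise = [rng.gauss(0.0, 3.0) for _ in range(n_noise)]
        samples.append((informative + noise, label))
    return samples

def predict_1nn(train, x, n_features):
    """Return the label of the nearest training sample, using only the first n_features."""
    def squared_distance(a):
        return sum((a[j] - x[j]) ** 2 for j in range(n_features))
    return min(train, key=lambda s: squared_distance(s[0]))[1]

def accuracy(train, data, n_features):
    """Fraction of samples in data whose label is predicted correctly."""
    hits = sum(predict_1nn(train, x, n_features) == y for x, y in data)
    return hits / len(data)

rng = random.Random(0)
train = make_samples(40, 200, rng)   # small training set, many features
test = make_samples(200, 200, rng)   # unseen data

for name, k in [("all 202 features", 202), ("2 informative features", 2)]:
    print(name, "| train:", accuracy(train, train, k), "| test:", accuracy(train, test, k))
```

Both settings reproduce the training labels perfectly, since each training sample is its own nearest neighbor. On unseen data, the version that uses every feature is swamped by noise, while the version restricted to the informative features holds up. Choosing which features to keep is the hard part, and that is the question the researchers address with developmental pathways.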

To strike a balance between reducing the number of features and still capturing the most relevant information, the team focused the model on signs of altered developmental pathways in cancer cells. As an embryo develops, undifferentiated cells specialize into various organs, and numerous pathways control how cells divide, grow, change shape, and migrate. As a tumor grows, cancer cells lose many of the specialized traits of a mature cell. They also start to resemble embryonic cells in some respects as they gain the ability to proliferate, transform, and spread to new tissues. Many of the gene expression programs that drive embryogenesis are known to be reactivated or dysregulated in cancer cells.

To find correlations between tumor and embryonic cells, the researchers compared two large cell atlases: The Cancer Genome Atlas (TCGA), which contains gene expression data for 33 tumor types, and the Mouse Organogenesis Cell Atlas (MOCA), which profiles 56 separate trajectories of embryonic cells as they develop and differentiate.

The researchers broke down the gene expression of TCGA tumor samples into individual components, each corresponding to a specific point in a developmental trajectory, and assigned each component a numerical value. They then built a machine-learning model, called the Developmental Multilayer Perceptron (D-MLP), that scores a tumor on its developmental components and then predicts its origin.
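The two-step idea of scoring a sample on developmental components and then classifying it with a multilayer perceptron can be sketched in a few lines. This is a toy illustration, not the authors' D-MLP: the dot-product scoring rule, the layer sizes, the function names, and the random untrained weights are all assumptions, and the printed probabilities carry no biological meaning.

```python
import math
import random

def component_scores(expression, components):
    """Score a sample against each component (dot product; an assumed scoring rule)."""
    return [sum(e * w for e, w in zip(expression, comp)) for comp in components]

def dense(inputs, weights, biases):
    """One fully connected layer: weights has one row per output unit."""
    return [sum(i * w for i, w in zip(inputs, row)) + b for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(values):
    """Turn raw outputs into probabilities that sum to 1."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def mlp_predict(scores, params):
    """Forward pass of a one-hidden-layer perceptron over the component scores."""
    hidden = relu(dense(scores, params["w1"], params["b1"]))
    return softmax(dense(hidden, params["w2"], params["b2"]))

def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Toy sizes, chosen only to keep the example small (hypothetical values).
n_genes, n_components, n_hidden, n_origins = 6, 4, 5, 3
rng = random.Random(0)

components = random_matrix(n_components, n_genes, rng)  # placeholder gene weights
params = {
    "w1": random_matrix(n_hidden, n_components, rng),   # untrained placeholder weights
    "b1": [0.0] * n_hidden,
    "w2": random_matrix(n_origins, n_hidden, rng),
    "b2": [0.0] * n_origins,
}

sample = [rng.random() for _ in range(n_genes)]         # fake expression profile
scores = component_scores(sample, components)
probs = mlp_predict(scores, params)
print("component scores:", scores)
print("origin probabilities:", probs)
```

In a real setting the weights would be learned from labeled tumor samples, and the component definitions would come from the developmental atlas rather than from a random number generator.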

After training, the D-MLP was applied to 52 new samples of particularly challenging cancers of unknown primary that could not be diagnosed with available tools. These were the most difficult cases seen at MGH over a four-year period beginning in 2017. Notably, the model sorted the tumors into four categories and produced predictions and other information that could help guide the diagnosis and treatment of these patients.

The study's comprehensive comparisons between tumor and embryonic cells revealed promising, and at times surprising, insights into the gene expression profiles of specific tumor types. For instance, in the early stages of embryonic development, a rudimentary gut tube forms, with the foregut giving rise to the lungs and other nearby organs, and the mid- and hindgut forming much of the digestive tract. According to the study, lung-derived tumor cells showed strong similarities not only to the foregut, as would be expected, but also to the developmental trajectories of the mid- and hindgut. These findings suggest that differences in developmental programs may one day be exploited in the same way that genetic mutations are commonly used to design personalized or targeted cancer treatments.

Although the study offers a strong method for classifying cancers, it has several drawbacks. In their future work, the researchers hope to improve the model's prognostic ability by including additional types of data, particularly details from radiology, microscopy, and other tumor imaging techniques.
