By Ma. Angelika Dinglasan, Marianne Jaraplasan, and Cedric Katigbak
Inside room D405 of the University of the Philippines Los Baños’ Institute of Mathematical Science and Physics (UPLB-IMSP) building, scientists sit across a long table in front of their working computers.
Dr. Ranzivelle Marianne Roxas-Villanueva heads the institute’s physics division and holds a doctorate in Physics. Across from her is Princess Silva, an Applied Mathematics graduate who analyzes and prepares reports on genomic data. Desiree Villanueva and Gabriel Manzanilla, both Applied Physics graduates, handle ultrasound and clinical data, respectively. Finally, Nicole Astrologo, also an Applied Physics graduate, is in charge of multimodal data, which includes all the data analyzed in the project – genomic, imaging, and clinical.
They work in a place called DARELab (Data Analytics Research Laboratory), a research laboratory that investigates biological health, environmental, and agricultural systems using applied physics, complexity science, and machine learning approaches. Currently, the lab, which was established in January 2021, is focused on developing an artificial intelligence (AI)-driven system for the early diagnosis of liver cancer in chronic hepatitis B patients, better known as the CANDLE study (Early CANcer Detection in the LivEr of Filipinos with Chronic Hepatitis B using AI-Driven Integration of Clinical and Genomic Biomarkers). This is the second part of the undertaking, with the first (patient recruitment and data gathering) initiated by the Department of Science and Technology (DOST) via the Philippine Council for Health Research and Development (PCHRD) and researchers from the University of the Philippines Manila.
The second deadliest cancer
A 2021 epidemiology study reports that liver cancer is one of the most common types of cancer and the fourth biggest cause of cancer death globally. In the Philippines, as of 2020, liver cancer is the fourth most common cancer and the second leading cause of cancer death, with more than 10,000 new cases that same year, as stated in the Philippine Journal of Internal Medicine.
Liver cancer is usually diagnosed very late in its course, as symptoms become apparent only when the disease has already reached an advanced stage. According to the National Nutrition Council, most liver cancer patients survive only a year after diagnosis, and the five-year survival rate is below 5% without treatment and less than 35% with treatment.
This late detection can be seen in the case of Ligaya Reaño, a 78-year-old resident of Barangay Sto. Domingo, Bay, Laguna, who was diagnosed with liver cancer in 2021.
“Kasi nung 2021, nag-positive siya sa COVID-19 [at] nagpa-confine siya. Lumabas na fatty liver siya, eh parang lagi s’yang nanlalambot kaya nagpa-check up ulit doon sa ibang doctor, sa Calamba. Nung ma-detect, positive agad sya sa [Stage 3 liver cancer]. Syempre malungkot ako. Naiiyak nga ako,” said Juanito Reaño, Ligaya’s husband, while narrating their experiences two years ago.
(In 2021, she tested positive for COVID-19 and was confined. Her test results showed that she had a fatty liver, and she always seemed to feel weak, so she had another check-up with a different doctor in Calamba. By the time the cancer was detected, it was already at Stage 3. Of course, I am sad. I am on the verge of tears.)
Reaño said that they were surprised, as there were no symptoms seen or felt besides weakness and fatigue. Unfortunately, since Ligaya’s cancer was detected at an old age, she was no longer eligible for surgery. She now relies solely on herbal medicine as treatment.
The Department of Health reported that an efficient method for early detection of liver cancer has yet to be released and made available for use by Filipinos. Blood tests, imaging tests (ultrasound, CT, and MRI scans), or a liver biopsy are the only means of diagnosis available.
In a liver biopsy, the doctor inserts a small needle through the skin and into the liver to get a tissue sample, which is then examined under a microscope for cancer cells. However, a liver biopsy may increase the patient’s risk of bleeding, bruising, and infection. This calls for less invasive methods that do not compromise the urgency of detection, and one possible method involves using AI.
AI in Philippine healthcare
AI or machine learning models can be used to monitor patient symptoms, predict a patient’s likelihood of having a disease, and alert medical staff when a particular risk increases. However, AI is not intended to replace medical practitioners.
“‘Yung AI or ‘yung framework na ginagamit natin for healthcare or disease detection [ay nangangailangan] pa rin ng higher being which are the doctors, nurses, or anyone from the field. AI is just an instrument or like a bridge kung pa’no natin mapapa-automate ‘yung early detection, pero ‘yung end tunnel natin sa pagda-diagnose is still the medical practitioners,” said Sophia Lanuzo, an AI Engineer from InterVenn Biosciences.
(The AI or the framework that we use for healthcare or disease detection still needs a higher authority: the doctors, nurses, or anyone from the field. AI is just an instrument, like a bridge that helps us automate early detection, but the final say in diagnosis still rests with medical practitioners.)
In the Philippines, relatively few medical practitioners use AI or machine learning to help diagnose or predict disease susceptibility. With this, researchers in the country continue to develop models and examine them against various medical cases.
Dr. Roxas-Villanueva said that Filipino researchers have access to technologies that enable them to create models and help them catch up with global advances in machine learning as a cancer diagnosis tool. Their methods are up to date, and their equipment is capable of running the new algorithms used in research.
Moreover, Dr. Roxas-Villanueva affirmed that the Philippine government, specifically the DOST, is supportive of the project, as digital frontiers in AI research are one of the government’s top priorities.
In September 2022, the Department of Trade and Industry inaugurated the Center for AI Research (CAIR), which will serve as an avenue for researchers and data scientists to collaborate in AI research and development. One of the target clusters is the health and life sciences cluster. Sadly, according to a 2023 report, the AI research facility remains unfunded in the 2024 national budget.
What DARELab is doing differently
The attempt to study better ways for earlier and more convenient disease detection is not new.
Various genome-wide association studies (GWAS) used statistical methods to aid in identifying significant single nucleotide polymorphisms (SNPs). SNPs (pronounced “snips”) are genetic variations that, although not the cause of disorders, can be associated with a disease. Scientists who explore early detection methods, like the DARELab team, look for SNPs that affect a person’s genetic tendency to develop a disease.
However, Dr. Roxas-Villanueva and her team said that GWAS are not always accurate because of several limitations. These include the inability to detect SNPs with modest effects and the failure to account for SNP-SNP interactions, both of which are vital in disease development and progression.
“Sa GWAS kasi, tine-take mo ‘yung SNPs individually, pero may possibility kasi ‘yung disease is due to interaction of multiple SNPs. Kapag clustering approach, mas makikita mo ‘yung interaction mo between the SNPs,” said Dr. Roxas-Villanueva.
(In GWAS, SNPs are tested individually. However, there is a possibility that the disease is due to the interaction of multiple SNPs. When the clustering approach is used, you will be able to test interactions between SNPs.)
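The point about interactions can be illustrated with a small Python sketch. It is not part of the CANDLE study: the cohort is synthetic, the two SNPs are hypothetical, and disease status is deliberately set so that it depends only on the combination of the two SNPs. Under those assumptions, testing each SNP on its own shows no difference in case rate, while looking at the pair together reveals the pattern.

```python
from itertools import product

# Synthetic, hypothetical cohort: two SNPs coded 0/1 (carrier or not).
# Disease status is set to the XOR of the two SNPs, a purely interactive effect.
cohort = [(a, b, a ^ b) for a, b in product([0, 1], repeat=2) for _ in range(25)]

def case_rate(rows):
    """Fraction of rows whose last field (disease status) is 1."""
    return sum(r[-1] for r in rows) / len(rows)

def single_snp_effect(rows, idx):
    """Difference in case rate between carriers and non-carriers of one SNP."""
    carriers = [r for r in rows if r[idx] == 1]
    others = [r for r in rows if r[idx] == 0]
    return case_rate(carriers) - case_rate(others)

def pair_case_rates(rows):
    """Case rate for every joint genotype of the two SNPs."""
    return {
        (a, b): case_rate([r for r in rows if r[0] == a and r[1] == b])
        for a, b in product([0, 1], repeat=2)
    }

effect_a = single_snp_effect(cohort, 0)   # one-at-a-time view of SNP A
effect_b = single_snp_effect(cohort, 1)   # one-at-a-time view of SNP B
joint = pair_case_rates(cohort)           # both SNPs considered together

print("SNP A alone:", effect_a)
print("SNP B alone:", effect_b)
print("Joint genotypes:", joint)
```

In this toy setup, carriers and non-carriers of either SNP have the same case rate, so a one-SNP-at-a-time test sees nothing, yet the joint genotype fully determines disease status. Real genomic data are far noisier, but this is the kind of signal that approaches considering SNPs together are meant to catch.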
With this, DARELab took the detection of disease-associated SNPs to the next level by integrating GWAS with two machine learning approaches: cluster analysis and random forest.
Here’s what happens
The researchers produced an architecture of the proposed framework of the study to better explain the three stages of how GWAS and machine learning are combined to identify significant SNPs and SNP sets. The key processes in the framework that yield the results of the genomic track are the following:
STAGE 1. Random Forest. Random forest (RF), a machine learning algorithm, will assign a score to each SNP based on how predictive it is of the disease.
STAGE 2. Clustering. Highly similar SNPs will then be joined into groups or clusters using a specific similarity measure.
STAGE 3. Association. Clusters will undergo an association test, and those that are under the set threshold value will be labeled as an SNP set that is associated with the disease.
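The three stages above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not DARELab's implementation: the data are synthetic, the "forest" is a simplified one made of one-split trees, the similarity measure (share of people with identical genotypes), the above-average score cut, the carrier-based chi-square test, and the Bonferroni-style threshold are all choices made here for illustration, since the article does not specify them.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic, hypothetical data: 200 people x 6 SNPs, genotypes coded 0/1/2.
N_PEOPLE, N_SNPS = 200, 6
genotypes = [[rng.choice([0, 1, 2]) for _ in range(N_SNPS)] for _ in range(N_PEOPLE)]
for row in genotypes:
    row[1] = row[0]                                       # SNP 1 mirrors SNP 0
disease = [1 if row[0] > 0 else 0 for row in genotypes]   # only SNP 0 (and its mirror) matter

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def stump_gain(sample, snp, cut):
    """Impurity decrease from splitting the sample on genotype[snp] <= cut."""
    left = [disease[i] for i in sample if genotypes[i][snp] <= cut]
    right = [disease[i] for i in sample if genotypes[i][snp] > cut]
    n = len(sample)
    parent = gini([disease[i] for i in sample])
    return parent - len(left) / n * gini(left) - len(right) / n * gini(right)

def forest_scores(n_trees=200, snps_per_tree=2):
    """Stage 1: a tiny forest of one-split trees; score = share of total impurity decrease."""
    scores = [0.0] * N_SNPS
    for _ in range(n_trees):
        sample = [rng.randrange(N_PEOPLE) for _ in range(N_PEOPLE)]  # bootstrap sample
        candidates = rng.sample(range(N_SNPS), snps_per_tree)         # random SNP subset
        gain, snp = max((stump_gain(sample, s, c), s) for s in candidates for c in (0, 1))
        scores[snp] += gain
    total = sum(scores)
    return [s / total for s in scores]

def agreement(a, b):
    """Assumed similarity measure: fraction of people with identical genotypes at both SNPs."""
    return sum(row[a] == row[b] for row in genotypes) / N_PEOPLE

def cluster(snps, min_similarity=0.9):
    """Stage 2: greedily add each SNP to the first cluster whose members it closely matches."""
    clusters = []
    for s in snps:
        for group in clusters:
            if all(agreement(s, t) >= min_similarity for t in group):
                group.append(s)
                break
        else:
            clusters.append([s])
    return clusters

def association_p(group):
    """Stage 3: chi-square test (1 degree of freedom) of cluster carrier status vs. disease."""
    table = [[0, 0], [0, 0]]
    for row, d in zip(genotypes, disease):
        carrier = int(any(row[s] > 0 for s in group))
        table[carrier][d] += 1
    (a, b), (c, e) = table
    denom = (a + b) * (c + e) * (a + c) * (b + e)
    if denom == 0:
        return 1.0
    chi2 = N_PEOPLE * (a * e - b * c) ** 2 / denom
    return math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function for 1 d.o.f.

scores = forest_scores()
kept = [s for s in range(N_SNPS) if scores[s] > 1 / N_SNPS]  # assumed cut: above-average score
clusters = cluster(kept)
alpha = 0.05 / max(len(clusters), 1)                          # assumed Bonferroni-style threshold
significant = [g for g in clusters if association_p(g) < alpha]

print("Scores:", [round(s, 3) for s in scores])
print("Clusters:", clusters)
print("Disease-associated SNP sets:", significant)
```

In this toy data, SNP 0 and its copy, SNP 1, should receive nearly all of the score, end up in the same cluster, and pass the association test together as one SNP set. A production pipeline would use a full random forest library, a similarity measure suited to genomic data, and corrections for confounders.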
The results for the genomic track would then be integrated with the results of other tracks, imaging and clinical, for a more holistic and accurate disease prediction.
However, like any other research endeavor, the CANDLE study faces several limitations. One is the small sample size that the models are trained on, a consequence of the nature of liver cancer cases (late detection and early death). This may affect the performance of the models, as they may not have enough information to learn from.
Moreover, poor data quality poses challenges for the imaging track, as the models may find it difficult to detect the disease in low-quality ultrasound images.
Another limitation is the paucity or lack of studies backing up some of the methodologies used for detection, as is the case with the genomic track.
“Dahil bagong method, walang masyadong magbaback-up na studies. For example, sa cluster analysis, kakaunti lang ‘yung mga related literature na gumamit nito sa pag-detect ng SNPs na related sa isang trait or sa isang disease. ‘Yung paggamit ng RF, medyo matagal na siya na study, pero yung integration na ginawa namin na RF combined with cluster analysis, bago siya,” said Dr. Roxas-Villanueva.
(Because this is a new method, there are no studies yet that back up the framework. For example, there is only a small body of literature that uses cluster analysis in detecting SNPs that are associated with a trait or a disease. RF has been used in studies for a long time already, but the integration that we have done in combining it with cluster analysis is new.)
The need for local data
One of the challenges of AI research in the Philippines is the availability of data, given the costs and long periods of collecting it. For the CANDLE project, it took a long time for the clinical and genomic data of Filipino samples (internal data) to be derived. While waiting for data completion, an external dataset (from a Korean cohort) was used.
“Since trained ‘yung models natin sa Korean population, pwedeng mag-introduce ng bias ‘pag itetest natin siya on another population. So need nating mag-retrain for the Filipino population, para ma-eliminate yung bias sa dataset and ‘di siya mag-result sa model na may mababang accuracy or may mataas na error in generalizability,” said Villanueva, explaining that the model is meant to have good generalizability, or the ability to perform well on new and unseen datasets.
(Since our models were trained on the Korean population, bias can be introduced when we test them on another population. We need to retrain for the Filipino population to eliminate the bias in the dataset, so that we do not end up with a model that has low accuracy or a high generalization error.)
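The retraining argument can be shown with a deliberately simple Python sketch. Everything in it is hypothetical: the biomarker readings are made up, the two cohorts are labeled only as "source" and "target", and the model is a single cut-off value rather than anything used in the CANDLE study. It only illustrates why a model fitted on one population can lose accuracy on another whose baseline values differ.

```python
def accuracy(cut, values, labels):
    """Share of samples where 'value above cut' matches the true label."""
    hits = sum((v > cut) == bool(y) for v, y in zip(values, labels))
    return hits / len(values)

def fit_threshold(values, labels):
    """Pick the cut-off (midpoint between sorted neighbours) with the highest training accuracy."""
    points = sorted(set(values))
    cuts = [(lo + hi) / 2 for lo, hi in zip(points, points[1:])]
    return max(cuts, key=lambda c: accuracy(c, values, labels))

# Hypothetical biomarker readings; both cohorts are synthetic.
source_values = [3, 4, 5, 7, 8, 9]
source_labels = [0, 0, 0, 1, 1, 1]          # 1 = has the disease
target_values = [6.5, 7, 7.5, 10, 11, 12]   # same pattern, shifted baseline
target_labels = [0, 0, 0, 1, 1, 1]

# Train on the source cohort, then apply the model unchanged to the target cohort.
source_cut = fit_threshold(source_values, source_labels)
acc_transferred = accuracy(source_cut, target_values, target_labels)

# Retrain on the target cohort itself.
target_cut = fit_threshold(target_values, target_labels)
acc_retrained = accuracy(target_cut, target_values, target_labels)

print("Cut-off learned on source cohort:", source_cut)
print("Accuracy on target cohort without retraining:", acc_transferred)
print("Cut-off after retraining on target cohort:", target_cut)
print("Accuracy on target cohort after retraining:", acc_retrained)
```

Here the cut-off learned from the source cohort flags every person in the target cohort as a case, because even healthy readings in the target cohort sit above it. Retraining moves the cut-off to fit the new baseline. The retrained accuracy is measured on the same data used for fitting, which is acceptable for a toy example, whereas a real study would evaluate on held-out samples.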
As of September 2023, data collection for all tracks is complete and all samples are in.
Breakthroughs and what’s to come
With the project in its third and final year, Dr. Roxas-Villanueva reported that DARELab has already acquired a computer server, which serves as a repository for its clinical and genomic data and the sonograms.
The team has also created several machine learning and deep learning models for clinical, genomic, ultrasound, and multimodal data. One of the final products of the project, the Philippine Liver Cancer Data Repository (PLCDR), an AI-driven web computational platform, is about 60% finished. Data, particularly clinical data, can already be uploaded to and downloaded from the web app.
For the remainder of 2023, the team’s goal is to integrate the AI models developed in each track into the PLCDR.
From the laboratory to the people
AI technology in the early detection of liver cancer, and even AI technology itself, is relatively new to Filipinos. The CANDLE researchers recognize as valid the skepticism that people may have toward AI in healthcare and their potential preference for traditional methods of disease detection. They see awareness-raising as a solution to this possible lack of receptiveness.
“Kung i-integrate mo ‘tong AI sa clinical settings, dapat aware yung public ng kung pa’no ba gumagana itong AI na ito,” said Silva.
(If you will integrate AI into clinical settings, the public should be aware of how this AI technology works.)
Silva added that, when integrating AI into the clinical setting, various sectors, such as the developers, doctors, the patients from whom samples are collected, the government, and the media, should be involved to increase public trust in emerging technologies such as AI.
If this project becomes successful, months from now, we may be seeing Filipino doctors utilizing the CANDLE project’s application in the detection of liver cancer, a significant milestone that can help in early intervention and in decreasing its death rate.—MF