Publication

An Incremental Learning Method for Preserving World Coffee Aromas by Using an Electronic Nose and Accumulated Specialty Coffee Datasets December 18, 2023 Abstract: Specialty coffee beans have a unique aroma and flavor. Coffee aromas around the world are affected by several factors, including growing area, climate, postharvest processing (such as dry and wet methods), and roasting treatment. These factors contribute significantly to the development of coffee-bean aromas. Since humans have a limited ability to recognize the aroma of coffee, a reliable system is needed to characterize the world’s coffee aromas. Therefore, in this article, we propose an incremental learning method for digitizing the complexity of coffee aromas using an electronic nose (E-nose) system. We also developed a method to create coffee-aroma fingerprints that represent the aromatic features of different coffees. In our experiments, the incremental learning model achieved high accuracy in recognizing various world specialty coffee aromas. The approach leverages an E-nose system and coffee-aroma datasets to preserve specialty coffee aromas around the world. The ultimate goal of this method is to build a scalable database of various coffee aromas while improving the recognition accuracy of the system.
The development of thermal error compensation on CNC machine tools by combining ridge parameter selection and backward elimination procedure December 16, 2023 Abstract: The total processing error of CNC machine tools essentially comprises geometric errors and thermal errors. Therefore, reducing the influence of thermal errors is necessary. In this study, 13 temperature sensors were used to measure temperature variations of heat sources on a machine. These sensors work in conjunction with a non-contact optical measurement system that measures the positioning offset error of a rotating shaft. The study sets the α value as the maximum allowed ridge parameter, a method named Critical–α, which allows an appropriate ridge parameter for a ridge regression model to be selected quickly. This selection is integrated into a backward elimination procedure to build the ridge regression thermal error compensation model. The study considered three methods for selecting temperature variable combinations. The first method uses all sensors, the second selects the combination with the minimum mean-square error, and the third considers the effect of diminishing returns. The ridge regression method that considers the diminishing returns effect is called the “R–DR model.” Applied to the CNC machine used in this study, the R–DR model reduced the maximum peak-to-peak error on the Y-axis from 54.41 to 13.94 µm using only three temperature sensors, and on the Z-axis from 73.59 to 10.12 µm using four temperature sensors. The R–DR model therefore has two advantages: high precision (post-compensation peak-to-peak thermal error of less than 14 µm) and fewer temperature sensors, which gives the thermal error compensation modeling method high engineering applicability and accuracy.
Unlocking the Secrets Behind Advanced Artificial Intelligence Language Models in De-Identifying Chinese-English Mixed Clinical Text December 08, 2023 Abstract: Background: The widespread use of electronic health records in clinical and biomedical fields makes the removal of protected health information (PHI) essential to maintain privacy. However, a significant portion of information is recorded in unstructured textual form, which makes de-identification challenging. In multilingual countries, medical records may be written in a mixture of more than one language, referred to as code-mixing (CM). Most current clinical natural language processing techniques are designed for monolingual texts, and there is a need to address the de-identification of CM texts. Objective: The aim of this study was to investigate the effectiveness and underlying mechanism of fine-tuned pre-trained language models (PLMs) in identifying PHIs in CM context. We also aimed to evaluate the potential of prompting large language models (LLMs) to recognize PHIs in a zero-shot manner. Methods: We compiled the first clinical CM de-identification dataset consisting of texts written in Chinese and English. We explored the effectiveness of fine-tuning PLMs in recognizing PHIs in CM content, focusing on whether PLMs exploit naming regularity and mention coverage to achieve superior performance, by probing the developed models’ outputs to examine their decision-making process. Furthermore, we investigated the potential of prompt-based in-context learning of LLMs in recognizing PHIs in CM text. Results: The developed methods were evaluated on a CM de-identification corpus of 1,700 discharge summaries. We observed that different PHI types tended to occur in different types of language-mixed sentences, and that PLMs could effectively recognize PHIs by exploiting the learned name regularity. However, the models may exhibit suboptimal results when regularity is weak or mentions contain unknown words that the representations cannot generate well. We also found that the availability of CM training instances is essential for the model’s performance. Furthermore, the LLM-based de-identification method is a feasible and appealing approach that can be controlled and enhanced through natural language prompts. Conclusions: The study contributes to understanding the underlying mechanism of PLMs in addressing the de-identification process in CM context and highlights the significance of incorporating CM training instances into the model training phase. To support the advancement of research, we have made a manipulated subset of the resynthesized dataset available for research purposes. Based on the compiled dataset, we find that the LLM-based de-identification method is a feasible approach, but carefully crafted prompts are essential to avoid unwanted output. However, the use of such methods in the hospital setting requires careful consideration of data security and privacy concerns. While this study has advanced the understanding of de-identifying clinical text in CM languages, several limitations should be acknowledged. First, our research is primarily based on a Chinese-English CM corpus from Taiwan, which possesses unique writing conventions and style. Consequently, the fine-tuned models may not generalize seamlessly to other multilingual contexts. Second, we recognize that imbalanced training sets and challenges associated with machine translation have influenced our model’s performance. Additionally, the sensitivity of the LLM-based framework to prompt crafting, the dynamic nature of model versions, and the choice of repetitions in prompts are factors that affect the reported performance for specific PHI categories. Further research could explore the augmentation of PLMs and LLMs with external knowledge to improve their strength in recognizing rare PHIs.
Automatic Extraction of Medication Mentions from Tweets—Overview of the BioCreative VII Shared Task 3 Competition February 03, 2023
Augmenting DSM-5 diagnostic criteria with self-attention-based BiLSTM models for psychiatric diagnosis January 11, 2023
Robust fault recognition and correction scheme for induction motors using an effective IoT with deep learning approach December 27, 2022
Vickers Hardness Value Test via Multi-Task Learning Convolutional Neural Networks and Image Augmentation December 21, 2022
Principle-Based Approach for the De-Identification of Code-Mixed Electronic Health Records February 01, 2022
Cancer Registry Coding via Hybrid Neural Symbolic Systems in the Cross-Hospital Setting October 11, 2021
Cohort selection for construction of a clinical natural language processing corpus July 01, 2021
Deep Learning-Based Natural Language Processing for Screening Psychiatric Patients January 15, 2021
Family History Information Extraction With Neural Attention and an Enhanced Relation-Side Scheme: Algorithm Development and Validation December 01, 2020
Cohort selection for clinical trials using multiple instance learning July 01, 2020
Using text mining to extract depressive symptoms and to validate the diagnosis of major depressive disorder from electronic health records January 01, 2020
Adverse drug event and medication extraction in electronic health records via a cascading architecture with different sequence labeling models and word embeddings January 01, 2020
Family member information extraction via neural sequence labeling models with different tag schemes December 27, 2019
Classifying adverse drug reactions from imbalanced twitter data September 01, 2019
Statistical principle-based approach for recognizing and normalizing microRNAs described in scientific literature February 27, 2019
Overview of the BioCreative VI Precision Medicine Track: mining protein interactions and mutations for precision medicine January 28, 2019
Assessing the severity of positive valence symptoms in initial psychiatric evaluation records: Should we use convolutional neural networks? October 16, 2018
SPRENO: a BioC module for identifying organism terms in figure captions June 03, 2018
Biomarker identification of hepatocellular carcinoma using a methodical literature mining strategy December 08, 2017
Exploring Associations of Clinical and Social Parameters with Violent Behaviors among Psychiatric Patients August 16, 2017
NTTMUNSW BioC modules for recognizing and normalizing species and gene/protein mentions July 27, 2016
MET network in PubMed: a text-mined network visualization and curation system May 30, 2016
Feature Engineering for Recognizing Adverse Drug Reactions from Twitter Posts May 25, 2016
A Context-Aware Approach for Progression Tracking of Medical Concepts in Electronic Medical Records September 30, 2015
Text Mining for Translational Bioinformatics July 22, 2015
Recognition and Evaluation of Clinical Section Headings in Clinical Documents Using Token-Based Formulation with Conditional Random Fields May 11, 2015
Enhancing of chemical compound and drug name recognition using representative tag scheme and fine-grained tokenization January 19, 2015
LiverCancerMarkerRIF: a liver cancer biomarker interactive curation system combining text mining and expert annotations August 27, 2014
Joint Learning of Entity Linking Constraints Using a Markov-Logic Network March 01, 2014
Collective Instance-Level Gene Normalization on the IGN Corpus November 25, 2013
TEMPTING system: a hybrid method of rule and machine learning for temporal relation extraction in patient discharge summaries September 20, 2013
T-HOD: a literature-based candidate gene database for hypertension, obesity and diabetes February 12, 2013
Coreference resolution of medical concepts in discharge summaries by exploiting contextual information May 03, 2012
Integration of gene normalization stages and co-reference resolution using a Markov logic network September 15, 2011