- An Incremental Learning Method for Preserving World Coffee Aromas by Using an Electronic Nose and Accumulated Specialty Coffee Datasets
  Abstract: Specialty coffee beans have a unique aroma and flavor. The aromas of coffee in the world are affected by several issues, including growing area, climate, postharvest processing (such as dry and wet methods), roasting treatment, etc. These issues significantly contribute to the development of coffee-bean aromas. Since humans have a limited ability to recognize the aroma of coffee, we need a reliable system to resolve the method of characterizing the world’s coffee aroma. Therefore, in this article, we proposed an incremental learning method for digitizing the complexity of coffee aromas using an electronic nose (E-nose) system. We also developed a method to create coffee-aroma fingerprints to represent their aromatic features among different coffees. In our experiments, the incremental learning model achieved high accuracy, proving the authenticity of recognizing various world specialty coffee aromas. The approach leverages an E-nose system and coffee-aroma datasets to preserve specialty coffee aromas around the world. In addition, the ultimate goal of this method is to build a scalable database of various coffee aromas while improving the accuracy of system recognition.
- The development of thermal error compensation on CNC machine tools by combining ridge parameter selection and backward elimination procedure
  Abstract: The total processing error of CNC machine tools essentially comprises geometric errors and thermal errors. Therefore, reducing the influence of thermal errors is necessary. In this study, 13 temperature sensors were utilized to measure temperature variations of heat sources on a machine. These sensors work in conjunction with a non-contact optical measurement system to measure the positioning offset error of a rotating shaft. This study set α value to be the maximum allowed ridge parameter and named the method Critical–α, allowing an appropriate ridge parameter for use with a ridge regression model to be quickly selected, and integrated into a backward elimination procedure to achieve ridge regression thermal error compensation modeling. The study considered three methods for selecting temperature variable combinations. The first method requires the use of all sensors, the second method selects the combination with the minimum mean-square error, and the third method considers the effect of diminishing returns. The ridge regression method, which considers the diminishing returns effect, is known as the “R–DR model.” The R–DR model is applied to the CNC machine used in this study to reduce the maximum peak-to-peak error on the Y-axis from 54.41 to 13.94 µm using only three temperature sensors, and on the Z-axis from 73.59 to 10.12 µm using four temperature sensors. Therefore, the R–DR model has two advantages: high precision (post-compensation peak-to-peak thermal error of less than 14 µm) and fewer temperature sensors, thereby allowing the thermal error compensation modeling method to demonstrate high engineering applicability and accuracy.
- Unlocking the Secrets Behind Advanced Artificial Intelligence Language Models in De-Identifying Chinese-English Mixed Clinical Text
  Abstract:
  Background: The widespread use of electronic health records in clinical and biomedical fields makes the removal of protected health information (PHI) essential to maintain privacy. However, a significant portion of information is recorded in unstructured textual form, which is challenging to de-identify. In multilingual countries, medical records could be written in a mixture of more than one language, referred to as code-mixing (CM). Most current clinical natural language processing techniques are designed for monolingual texts, and there is a need to address the de-identification of CM texts.
  Objective: The aim of this study was to investigate the effectiveness and underlying mechanism of fine-tuned pre-trained language models (PLMs) in identifying PHIs in CM context. Additionally, we aimed to evaluate the potential of prompting large language models (LLMs) in recognizing PHIs in a zero-shot manner.
  Methods: We compiled the first clinical CM de-identification dataset consisting of texts written in Chinese and English. We explored the effectiveness of fine-tuning PLMs in recognizing PHIs in CM content, focusing on whether PLMs exploit naming regularity and mention coverage to achieve superior performance by probing the developed models’ outputs to examine their decision-making process. Furthermore, we investigated the potential of prompt-based in-context learning of LLMs in recognizing PHIs in CM text.
  Results: The developed methods were evaluated on a CM de-identification corpus of 1,700 discharge summaries. We observed that different PHI types had their preference in their occurrences within the different types of language-mixed sentences, and PLMs could effectively recognize PHIs by exploiting the learned name regularity. However, the models may exhibit suboptimal results when regularity is weak or mentions contain unknown words that the representations cannot generate well. We also found that the availability of CM training instances is essential for the model’s performance. Furthermore, the LLM-based de-identification method is a feasible and appealing approach that can be controlled and enhanced through natural language prompts.
  Conclusions: The study contributes to understanding the underlying mechanism of PLMs in addressing the de-identification process in CM context and highlights the significance of incorporating CM training instances into the model training phase. To support the advancement of research, we have made a manipulated subset of the resynthesized dataset available for research purposes. Based on the compiled dataset, we find that the LLM-based de-identification method is a feasible approach, but carefully crafted prompts are essential to avoid unwanted output. However, the use of such methods in the hospital setting requires careful consideration of data security and privacy concerns. While this study has advanced the understanding of de-identifying clinical text in CM languages, several limitations should be acknowledged. First, our research is primarily based on a Chinese-English CM corpus from Taiwan, which possesses unique writing conventions and style. Consequently, the fine-tuned models may not generalize seamlessly to other multilingual contexts. Second, we recognize that imbalanced training sets and challenges associated with machine translation have influenced our model’s performance. Additionally, the sensitivity of the LLM-based framework to prompt crafting, the dynamic nature of model versions, and the choice of repetitions in prompts are factors that affect the reported performance for specific PHI categories. Further research could explore the augmentation of PLMs and LLMs with external knowledge to improve their strength in recognizing rare PHIs.
- Automatic Extraction of Medication Mentions from Tweets—Overview of the BioCreative VII Shared Task 3 Competition
- Augmenting DSM-5 diagnostic criteria with self-attention-based BiLSTM models for psychiatric diagnosis
- Robust fault recognition and correction scheme for induction motors using an effective IoT with deep learning approach
- Vickers Hardness Value Test via Multi-Task Learning Convolutional Neural Networks and Image Augmentation
- Principle-Based Approach for the De-Identification of Code-Mixed Electronic Health Records
- Cancer Registry Coding via Hybrid Neural Symbolic Systems in the Cross-Hospital Setting
- Cohort selection for construction of a clinical natural language processing corpus
- Deep Learning-Based Natural Language Processing for Screening Psychiatric Patients
- Family History Information Extraction With Neural Attention and an Enhanced Relation-Side Scheme: Algorithm Development and Validation
- Cohort selection for clinical trials using multiple instance learning
- Using text mining to extract depressive symptoms and to validate the diagnosis of major depressive disorder from electronic health records
- Adverse drug event and medication extraction in electronic health records via a cascading architecture with different sequence labeling models and word embeddings
- Family member information extraction via neural sequence labeling models with different tag schemes
- Classifying adverse drug reactions from imbalanced twitter data
- Statistical principle-based approach for recognizing and normalizing microRNAs described in scientific literature
- Overview of the BioCreative VI Precision Medicine Track: mining protein interactions and mutations for precision medicine
- Assessing the severity of positive valence symptoms in initial psychiatric evaluation records: Should we use convolutional neural networks?
- SPRENO: a BioC module for identifying organism terms in figure captions
- Biomarker identification of hepatocellular carcinoma using a methodical literature mining strategy
- Exploring Associations of Clinical and Social Parameters with Violent Behaviors among Psychiatric Patients
- NTTMUNSW BioC modules for recognizing and normalizing species and gene/protein mentions
- MET network in PubMed: a text-mined network visualization and curation system
- Feature Engineering for Recognizing Adverse Drug Reactions from Twitter Posts
- A Context-Aware Approach for Progression Tracking of Medical Concepts in Electronic Medical Records
- Text Mining for Translational Bioinformatics
- Recognition and Evaluation of Clinical Section Headings in Clinical Documents Using Token-Based Formulation with Conditional Random Fields
- Enhancing of chemical compound and drug name recognition using representative tag scheme and fine-grained tokenization
- LiverCancerMarkerRIF: a liver cancer biomarker interactive curation system combining text mining and expert annotations
- Joint Learning of Entity Linking Constraints Using a Markov-Logic Network
- Collective Instance-Level Gene Normalization on the IGN Corpus
- TEMPTING system: a hybrid method of rule and machine learning for temporal relation extraction in patient discharge summaries
- T-HOD: a literature-based candidate gene database for hypertension, obesity and diabetes
- Coreference resolution of medical concepts in discharge summaries by exploiting contextual information
- Integration of gene normalization stages and co-reference resolution using a Markov logic network