author: Jurgen Kleverwal
title: Supervised Text Classification of Medical Triage Reports
company: Topicus
committee: Dolf Trieschnigg (EWI-CS-HMI),
Pim van den Broek,
Bram Kievit (Topicus),
Michiel Hakvoort (Topicus)
end: April 2015


Topicus Zorg has developed a system to help triage officers at the emergency department perform a triage. This system uses keyword scanning for text classification: an entered description of medical symptoms is categorized into one or more presenting complaints. This way of classification has its limitations. Only keywords are recognized, so some information is ignored. Also, sometimes more than two presenting complaints are used as categories for one text, although one presenting complaint is almost always sufficient.
In this thesis the characteristics of medical texts are discussed. Ten characteristics of medical texts were found, but only three of them were highly represented in the data collection used. These three characteristics are telegraphic style (no complete sentences), shorthand text (abbreviations, acronyms and local dialectal shorthand phrases) and negation (negated words or phrases, like 'no pain on chest'). Some commonly used supervised text classification methods are also reviewed: k Nearest Neighbors, Support Vector Machines and Naive Bayes. One text classification method is chosen (k Nearest Neighbors, kNN) and five parameters are defined for modification of this method. These parameters focus on query construction, number of nearest neighbors, scoring and ranking. Some implementations of these parameters are chosen to be tested. The current triage system of Topicus Zorg is then compared to the implementation of kNN and the parameters using an F-measure. A similar score is obtained for both systems: the triage system and the implementation of kNN using the parameters.
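
As an illustration of the general approach summarized above, the following is a minimal sketch of kNN text classification with an F-measure, not the implementation evaluated in the thesis. The training texts, the labels, the whitespace tokenizer, the cosine similarity over word counts and the similarity-weighted voting are all assumptions made for this example; the thesis's own choices for query construction, scoring and ranking may differ, and negation is not handled here.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, enough for telegraphic text.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(query, training, k=3):
    # Rank all training texts by similarity to the query, keep the k nearest,
    # and let each neighbor vote for its label, weighted by its similarity.
    q = Counter(tokenize(query))
    scored = sorted(
        ((cosine(q, Counter(tokenize(text))), label) for text, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter()
    for score, label in scored[:k]:
        votes[label] += score
    return votes.most_common(1)[0][0] if votes else None


def f_measure(predicted, actual, beta=1.0):
    # F-measure over sets of labels; beta=1 weighs precision and recall equally.
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical training data: (triage text, presenting complaint).
training = [
    ("pain on chest radiating to left arm", "chest pain"),
    ("pressure on chest short of breath", "chest pain"),
    ("fell from bike wound on knee", "wound"),
    ("cut in finger bleeding wound", "wound"),
    ("headache and nausea since morning", "headache"),
]

label = knn_classify("chest pain and short of breath", training, k=3)
score = f_measure({"chest pain", "wound"}, {"chest pain"})
```

In this sketch, assigning two presenting complaints where one is correct halves the precision, which is the kind of over-assignment the F-measure penalizes.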
