Title: Supervised Text Classification of Medical Triage Reports
Dolf Trieschnigg (EWI-CS-HMI)
Pim van den Broek,
Bram Kievit (Topicus),
Michiel Hakvoort (Topicus)
Topicus Zorg has developed a system to help triage officers at the emergency department perform a triage. This system uses keyword scanning for text classification: an entered description of medical symptoms is categorized into one or more presenting complaints. This way of classification has its limitations. Only keywords are recognized, so some information is ignored. Also, sometimes more than two presenting complaints are assigned to a single text, although one presenting complaint is almost always sufficient.
In this thesis the characteristics of medical texts are discussed. Ten characteristics of medical texts were found, but only three of them were highly represented in the data collection used. These three characteristics are telegraphic style (no complete sentences), shorthand text (abbreviations, acronyms and local dialectal shorthand phrases) and negation (negated words or phrases, like 'no pain on chest'). Some commonly used supervised text classification methods are also reviewed: k Nearest Neighbors, Support Vector Machines and Naive Bayes. One text classification method is chosen (k Nearest Neighbors, kNN) and five parameters are defined for modification of this method. These parameters focus on query construction, number of nearest neighbors, scoring and ranking. Some implementations of these parameters are chosen to be tested. The current triage system of Topicus Zorg is then compared to the implementation of kNN with these parameters using an F-measure. Both systems obtain a similar score.
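As a rough illustration of the kind of pipeline described above, the following minimal Python sketch ranks presenting complaints with kNN and scores a prediction with an F-measure. It is not the implementation evaluated in the thesis: the whitespace tokenizer, the cosine similarity, the similarity-sum scoring, the function names and the tiny training set are all assumptions made for the example, and the sketch ignores negation and shorthand entirely.

```python
import math
from collections import Counter, defaultdict

# Made-up training examples (text, presenting complaints); not thesis data.
TRAINING = [
    ("pain on chest short of breath", ["chest pain"]),
    ("chest pain radiating to left arm", ["chest pain"]),
    ("fell off bike wound on knee", ["wounds"]),
    ("headache and nausea since morning", ["headache"]),
]

def tokenize(text):
    """Query construction (assumed): lowercase and split on whitespace."""
    return text.lower().split()

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_rank(query, training, k=3):
    """Return (label, score) pairs ranked by summed neighbor similarity."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), labels)
              for text, labels in training]
    # Keep the k most similar training texts.
    neighbors = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    totals = defaultdict(float)
    for sim, labels in neighbors:
        if sim > 0:  # ignore neighbors that share no terms with the query
            for label in labels:
                totals[label] += sim
    # Highest score first; ties broken alphabetically by label.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))

def f_measure(predicted, actual, beta=1.0):
    """F-measure of a predicted label set against the actual label set."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

ranking = knn_rank("pain chest", TRAINING, k=2)
top_label = ranking[0][0]
```

In this sketch the number of neighbors `k`, the tokenizer and the scoring rule are the points where parameters such as those named in the abstract could be varied. Cutting the ranked list off after the first label corresponds to assigning a single presenting complaint per text.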