The migration to electronic medical records, used by healthcare providers, hospitals, and medical insurers, continues. However, the switch from paper records is leading to an accumulation of data, much of it in free-text form that algorithms searching for patterns and knowledge cannot easily process.
A study in the International Journal of Business Process Integration and Management has looked at using basic text-mining methods to convert this free text, which might be as unsophisticated as the jottings of a doctor or nurse, into something more organised. This kind of processing could make decisions in medicine faster and more consistent as well as potentially opening up new avenues for medical research and epidemiology.
The research focused on the specific medical condition of lower back pain and the reports associated with it. Lower back pain affects many people and is a major reason they miss days at work or file for disability. Experts can evaluate symptoms, consider what medical scans show, make a diagnosis, and offer a prognosis. Administrators, however, have to read through reports manually to determine fees and payments. A system to convert free text to structured text would be a boon, allowing dates and diagnoses to be searched, checked, and analysed much more easily.
The team used pattern-matching rules built on regular expressions, which allow software to detect specific phrases or formats in text. These rules could then be used to extract clinical and administrative details. This rule-based text mining was combined with machine learning algorithms that can learn from past data and make predictions about new cases.
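As a rough illustration of how rule-based extraction of this kind can work, the sketch below applies three regular expressions to a made-up report. The patterns, field names, and sample text are all hypothetical; the study's actual rules are not given here, and real clinical reports would need far more robust patterns.

```python
import re

# Hypothetical rules for illustration only; not the patterns used in the study.
DATE_RULE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")
DIAGNOSIS_RULE = re.compile(r"\bdiagnosis\s*:\s*([^.;\n]+)", re.IGNORECASE)
REST_RULE = re.compile(r"\brest(?:\s+for)?\s+(\d+)\s+days?\b", re.IGNORECASE)


def extract_fields(report):
    """Turn one free-text report into a dict of structured fields.

    A field is None when its rule finds no match in the text.
    """
    date = DATE_RULE.search(report)
    diagnosis = DIAGNOSIS_RULE.search(report)
    rest = REST_RULE.search(report)
    return {
        # The date is kept as written, since day/month order is not known.
        "exam_date": date.group(0) if date else None,
        "diagnosis": diagnosis.group(1).strip() if diagnosis else None,
        "rest_days": int(rest.group(1)) if rest else None,
    }


# Invented sample text standing in for a clinician's free-text note.
SAMPLE_REPORT = (
    "Seen on 12/03/2024. Diagnosis: chronic lower back pain; "
    "advised rest for 10 days."
)

fields = extract_fields(SAMPLE_REPORT)
print(fields)
```

Because each rule is an explicit pattern, it is easy to see why a given value was extracted, which is the kind of interpretability the rule-based approach is meant to preserve.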
The researchers tested their system on 255 anonymised reports. Medical specialists validated the extracted information, confirming a precision rate of 98 per cent. The structured information was then used to train three established predictive models: AdaBoost, which combines multiple simple models to improve accuracy; Random Forest, which aggregates the results of many decision trees; and Support Vector Machines, which identify boundaries between categories in complex datasets.
In tests, AdaBoost achieved perfect accuracy in predicting when rest should be prescribed. Random Forest reached 91 per cent accuracy and 93 per cent recall, a measure of how many relevant cases are correctly identified, in return-to-work assessments. The Support Vector Machine recorded a 98 per cent recall rate in classifying disability cases.
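For readers unfamiliar with these measures, the sketch below shows how accuracy, precision, and recall are computed from a set of binary predictions. The labels are invented toy data, not results from the study, and the function names are illustrative.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 or 0)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Share of all cases that were classified correctly."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def precision(tp, fp):
    """Share of predicted positives that were truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of truly positive cases that the model identified."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy data: 1 = positive case (for example, "return to work"), 0 = negative.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)
acc = accuracy(tp, fp, fn, tn)
prec = precision(tp, fp)
rec = recall(tp, fn)
print(acc, prec, rec)
```

Accuracy and recall can diverge when the classes are unbalanced, which is why reports like this one often quote both figures.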
Beyond performance metrics, the researchers argue that the approach reduces processing time and limits transcription errors. Because the extraction rules are explicit, the system remains interpretable. This is important, as decisions still need to be explained to patients and others regardless of how structured or unstructured the data is.
Zwawi, R., Elhadjamor, E.A., Ghannouchi, S.A. and Ghannouchi, S-E. (2025) ‘Optimising text mining applications for enhanced medical decision making’, Int. J. Business Process Integration and Management, Vol. 12, No. 4, pp.295–306.