February 1, 2025
A friend of mine leads an internal data team for a hospital group.
They recently received the updated 2025 ICD-10 codes. ICD-10 is a coding system of ~74,000 codes that healthcare providers and insurance companies use for billing, tracking public health trends, and medical recordkeeping.
Basically, they got handed a plain text file with 74,260 lines and were asked to annotate the data with various additional bits of information. For example, they want to record whether each ICD code represents an infection:
A200 Bubonic plague
is an infection
Y9366 Activity, soccer
is not an infection
Annotating more than 74,000 lines is tedious and time-consuming, so he hit up the group chat asking how to script an LLM to do it.
This seemed like a fun toy problem to try out some stuff I personally hadn’t done yet, so I decided to see how well I could use LLMs as a binary classifier for “infection/not infection”.
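To make the shape of the task concrete, here is a minimal Python sketch of the annotation loop. It assumes each line is a code followed by whitespace and a description, as in the two examples above. The names `parse_icd_line`, `annotate`, and `placeholder_classifier` are hypothetical, and the keyword check is only a stand-in for wherever the LLM call would go, not an actual classifier.

```python
from typing import Callable, Iterable, Iterator, Tuple


def parse_icd_line(line: str) -> Tuple[str, str]:
    """Split a line like 'A200 Bubonic plague' into (code, description)."""
    parts = line.strip().split(None, 1)  # split on the first run of whitespace
    code = parts[0]
    description = parts[1].strip() if len(parts) > 1 else ""
    return code, description


def annotate(
    lines: Iterable[str],
    is_infection: Callable[[str, str], bool],
) -> Iterator[Tuple[str, str, bool]]:
    """Yield (code, description, infection_flag) for each non-empty line."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        code, description = parse_icd_line(line)
        yield code, description, is_infection(code, description)


def placeholder_classifier(code: str, description: str) -> bool:
    # Hypothetical stand-in: a real version would ask an LLM here.
    return "plague" in description.lower()


sample = ["A200 Bubonic plague", "Y9366 Activity, soccer"]
rows = list(annotate(sample, placeholder_classifier))
for code, description, infection in rows:
    label = "is an infection" if infection else "is not an infection"
    print(f"{code} {description}: {label}")
```

Passing the classifier in as a function keeps the file handling separate from the model call, so different prompts or models could be swapped in without touching the parsing.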