Hi, while doing the first experiments on a single note, I found a wrong annotation: "mg" as milligrams, but coded as Magnesium (271285000). I checked the whole training set, and the same mistake appears in about 25 annotations (some prescriptions with the dose expressed in milligrams).
Well, as a fellow contestant, my suggestion is: embrace the error! So long as it doesn’t interfere with the logic of your solution: if that’s what the annotations say (and what they’re scoring against), then that’s what your program should say too.
I’m having some issues with newlines in the middle of character spans, myself, but no dataset is ever perfect.
In principle I would agree; however, it’s 25 out of a much larger number of milligram instances, randomly coded as Magnesium. The contexts are different, too: blood exams vs. prescriptions. If I code all the milligrams as Magnesium, my score will be lower than deserved; if I code none of them as Magnesium (the right thing), it will be lower too. Still, it’s a small fraction of the annotations, so the impact will be small.
Regarding newlines: if you treat them as word separators, together with spaces and punctuation, they’re not an issue.
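To make the suggestion concrete, here’s a minimal tokenizer sketch (my own illustration, not code from the competition kit) that splits on any whitespace, including newlines, plus common punctuation, so a newline mid-span behaves exactly like a space:

```python
import re

def tokenize(text):
    # Split on runs of whitespace (spaces, tabs, newlines) or
    # punctuation; \s already matches "\n", so a newline inside
    # a span is treated the same as a space.
    return [t for t in re.split(r"[\s.,;:!?()\[\]]+", text) if t]

print(tokenize("500 mg\ntwice daily."))  # ['500', 'mg', 'twice', 'daily']
```

The punctuation set in the character class is just an example; you’d adapt it to whatever your span-matching logic considers a separator.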
@vdellamea My issue is that I haven’t gotten far enough in my solution yet to know how newlines will affect the intersection-over-union score; though if it’s only a newline in place of a space, then you’re right, it’s pretty much irrelevant.
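For anyone following along, here’s how I understand span IoU on character offsets (this is my assumption about the metric, not something from the official scorer). Since a newline occupies one character just like a space, swapping one for the other leaves all offsets, and therefore the IoU, unchanged:

```python
def span_iou(a, b):
    # a and b are (start, end) character offsets, end-exclusive.
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0

# Predicted span shifted 5 chars against a 10-char gold span:
print(span_iou((0, 10), (5, 15)))  # 0.333...
# Exact match, regardless of whether char 3 was ' ' or '\n':
print(span_iou((0, 10), (0, 10)))  # 1.0
```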
After a little reflection, I think I see the issue you’re talking about; I misunderstood at first. I guess we can only hope that our solutions will find a pattern in the classification of “mg” spans in the data, despite us not seeing one at all. Stranger things have happened.