Release of the top 3 scoring solutions

mmiron · March 6, 2024, 3:36am

Hi folks,

Congratulations to those of you who managed to get over 0.4 – nicely done. I’m quite interested in what the winners did: will the source code be released at some point? If not, I’d be happy to describe how I got my 0.30 (a much simpler method than you might think), if those who beat my score would be willing to do the same.

Also, I missed something: the public leaderboard was already scored based on private data that wasn’t (and will never be) publicly available. What’s the motivation for using an entirely separate test set?

YonatanBilu · March 6, 2024, 8:54am

In essence our solution was very simple: build a dictionary from (section, mention) to concept id, then annotate a document by going over it section by section, finding mentions for which the pair (section, mention) is in the dictionary, and annotating each with the corresponding concept.
Most of the work went into building this dictionary, though we also did a bit of regular expression expansion in the matching and some post-processing to improve the results.
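For readers who want something concrete, here is a minimal sketch of that kind of (section, mention) lookup. It is not the winning code: the function name `annotate`, the toy dictionary, and the placeholder concept ids are all assumptions made for illustration, and the regex expansion and post-processing mentioned above are left out.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical types: a document is a list of (section name, section text) pairs,
# and the dictionary maps (section, mention) to a concept id.
Annotation = Tuple[str, int, int, str]  # (section, start, end, concept id)


def annotate(
    sections: List[Tuple[str, str]],
    dictionary: Dict[Tuple[str, str], str],
) -> List[Annotation]:
    """Annotate each section with the dictionary entries registered for that section."""
    annotations: List[Annotation] = []
    for section, text in sections:
        lowered = text.lower()  # assumes ASCII text, so offsets are unchanged
        for (dict_section, mention), concept_id in dictionary.items():
            if dict_section != section:
                continue
            # Whole-word match; offsets are relative to the section text.
            pattern = r"\b" + re.escape(mention) + r"\b"
            for match in re.finditer(pattern, lowered):
                annotations.append((section, match.start(), match.end(), concept_id))
    return sorted(annotations)


# Placeholder concept id, not a real SNOMED CT identifier.
toy_dictionary = {("history", "chest pain"): "CONCEPT_A"}
toy_document = [
    ("history", "Patient has chest pain."),
    ("medications", "No chest pain meds."),
]
result = annotate(toy_document, toy_dictionary)
```

In this toy run only the "history" section is annotated, because the same mention in "medications" has no dictionary entry for that section.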

bproduct · March 6, 2024, 9:26am

Congrats on 1st! How did you build that dictionary? How did you address the zero-shot problem? Our solution is a two-stage NLP approach + a static dictionary. The 1st stage is 4-class segmentation (0, BodyS, Finding, Procedure), the second stage is linking with deep embedders. The static dictionary is extracted from the train data with a bit of processing. We will probably write up a somewhat more detailed solution in the future.
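A rough sketch of what the second (linking) stage of such a pipeline could look like, assuming spans have already been segmented. The deep embedder is replaced here by a character-trigram bag purely as a stand-in so the example runs without a model; `embed`, `cosine`, `link`, and the placeholder concept ids are hypothetical names, not anything from the actual submission.

```python
import math
from collections import Counter
from typing import Dict


def embed(text: str) -> Counter:
    """Stand-in for a deep embedder: a bag of character trigrams."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link(span: str, candidates: Dict[str, str]) -> str:
    """Return the id of the candidate whose description is most similar to the span."""
    span_vec = embed(span)
    return max(candidates, key=lambda cid: cosine(span_vec, embed(candidates[cid])))


# Placeholder ids and descriptions, for illustration only.
candidates = {"CONCEPT_A": "chest pain", "CONCEPT_B": "knee replacement"}
linked = link("chest pains", candidates)
```

With a real embedder, only `embed` would change; the nearest-neighbour step stays the same.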

YonatanBilu · March 6, 2024, 10:16am

We used the train data and SNOMED itself to build the dictionary, and expanded it using some permutations and linguistic rules. Our segmentation was derived directly from the sections in the clinical notes, so it has a higher resolution than 4 classes.
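The post does not say which permutations or linguistic rules were used, so the following is only one guess at what such an expansion could look like: word-order permutations of short mentions plus a naive plural rule. `expand_dictionary` and `max_tokens` are hypothetical names, and the concept id is a placeholder.

```python
from itertools import permutations
from typing import Dict, Tuple

Key = Tuple[str, str]  # (section, mention)


def expand_dictionary(dictionary: Dict[Key, str], max_tokens: int = 3) -> Dict[Key, str]:
    """Add simple variants of each mention without overwriting existing entries."""
    expanded = dict(dictionary)
    for (section, mention), concept_id in dictionary.items():
        tokens = mention.split()
        # Assumed rule 1: all word orders of short multi-word mentions.
        if 1 < len(tokens) <= max_tokens:
            for perm in permutations(tokens):
                expanded.setdefault((section, " ".join(perm)), concept_id)
        # Assumed rule 2: a naive plural formed by appending "s".
        if not mention.endswith("s"):
            expanded.setdefault((section, mention + "s"), concept_id)
    return expanded


seed = {("history", "chest pain"): "CONCEPT_A"}  # placeholder concept id
expanded = expand_dictionary(seed)
```

Using `setdefault` means an entry that already exists (for example from the train data) is never replaced by a generated variant.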

vdellamea · March 6, 2024, 10:51am

Congrats from the 4th to the first 3 but also to all those that participated.
We began with a dictionary-based solution (its best private score was 0.32), mostly to have a baseline, but then switched to entity recognition + classification, both with DL models. However, it seems that it was not the best approach (at least for the challenge - we were actually quite happy with the quality of generated annotations, somewhat complete).
I wonder if the organizers / sponsors will prepare a paper on results, like in other challenges. There is not much on SNOMED-CT and it will be very useful.

chrisk-dd · March 6, 2024, 5:18pm

Hey @mmiron-

“will the source code be released at some point?”

To be eligible for prizes, winners of the competition must open-source their submissions. You’ll eventually be able to find code from the winning solutions in our repository of competition-winning code.

“Also, I missed something: the public leaderboard was already scored based on private data that wasn’t (and will never be) publicly available. What’s the motivation for using an entirely separate test set?”

We use a public / private test set split to discourage over-fitting to the test set.

chrisk-dd · March 6, 2024, 5:20pm

Thanks @vdellamea for your interest in the results, I’ll pass it along to the competition sponsors!

anja · March 7, 2024, 1:39pm

Withholding evaluation data is a best practice if you don’t want participants to use that data during development, which would be cheating.

informatician · March 18, 2024, 6:14pm

“You’ll eventually be able to find code from the winning solutions…”
When exactly? It is not there today.

vdellamea · March 19, 2024, 8:33am

Istinetz · April 19, 2024, 12:04pm

Hi everyone,

Is there any update on releasing the winners’ solutions? It would be pretty nice to see what people did.

KimTang · April 22, 2024, 7:12am

Hi, the winners were announced over here: https://www.snomed.org/news/snomed-international-announces-entity-linking-challenge-winners

And the solutions are highlighted in this blogpost and also shared on Github:

GitHub: drivendataorg/snomed-ct-entity-linking (Winners of the SNOMED CT Entity Linking Challenge)

Istinetz · April 22, 2024, 5:42pm

Awesome, thank you, Kim!
