Question about scoring metric

flamethrower · May 9, 2022, 10:50am

This line of code is taken from the score_per_query method in the scoring script, score_submission.py:

predicted_n_pos = merged["actual"].groupby(QUERY_ID_COL).sum().astype("int64").rename()

From my understanding of adjusted mean average precision, I think the adjustment is meant to be number predicted correct / number actual correct. Hence, this is probably what the script meant for this line:

predicted_n_pos = merged.groupby(QUERY_ID_COL)["actual"].sum().astype("int64").rename()

Please clarify.

Thanks.

jayqi · May 10, 2022, 2:31pm

Hi @flamethrower.

Yes, the adjustment to scikit-learn’s classification mean average precision into information retrieval mean average precision is indeed a factor of number predicted correct / number actual correct.
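As a rough illustration of that factor, here is a minimal sketch for a single query. It does not call scikit-learn and is not the scoring script's code: `average_precision_over_retrieved` and `actual_n_pos` are hypothetical names, the relevance list is made up, and it assumes the predictions are already sorted by score with no ties.

```python
def average_precision_over_retrieved(relevance):
    """Classification-style AP: mean of precision@k over the hits in the list."""
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    # Normalized by the number of hits among the retrieved items only.
    return precision_sum / hits if hits else 0.0


# Hypothetical query: 3 predictions (sorted by score), 2 of them correct,
# while the ground truth contains 4 correct matches in total.
relevance = [1, 0, 1]
actual_n_pos = 4                  # assumed number of actual correct matches
predicted_n_pos = sum(relevance)  # number of predictions that are correct

classification_ap = average_precision_over_retrieved(relevance)

# Rescale so the normalizer becomes the number of actual correct matches.
adjusted_ap = classification_ap * predicted_n_pos / actual_n_pos
```

Under these assumptions the rescaling only swaps the denominator: the sum of precision values at each hit is divided by the number of actual correct matches instead of the number of correct predictions.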

The original line of code and the line of code you are proposing should be equivalent for the scoring script. I’m not totally sure what you think is incorrect. Is it related to QUERY_ID_COL being available on the merged["actual"] series to group by? In our script, we load all of the dataframes such that QUERY_ID_COL is set as an index and not a regular column. (See here.) That means it’s still available for the groupby operation. If you don’t have it set as an index, then the groupby will error. See the example below.

import pandas as pd
df = pd.DataFrame(
    {
        "query_id": ["A", "A", "A", "B", "B", "B", "B"],
        "database_image_id": ["01", "02", "03", "01", "02", "03", "04"],
        "score": [1.0, 0.9, 0.8, 1.0, 0.9, 0.8, 0.7],
        "actual": [1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0],
    }
)
df = df.set_index("query_id")
df
#>          database_image_id  score  actual
#> query_id
#> A                       01    1.0     1.0
#> A                       02    0.9     0.0
#> A                       03    0.8     1.0
#> B                       01    1.0     0.0
#> B                       02    0.9     1.0
#> B                       03    0.8     1.0
#> B                       04    0.7     1.0

df["actual"]
#> query_id
#> A    1.0
#> A    0.0
#> A    1.0
#> B    0.0
#> B    1.0
#> B    1.0
#> B    1.0
#> Name: actual, dtype: float64

df["actual"].groupby("query_id").sum().astype("int64").rename()
#> query_id
#> A    2
#> B    3
#> dtype: int64

df.groupby("query_id")["actual"].sum().astype("int64").rename()
#> query_id
#> A    2
#> B    3
#> dtype: int64

Created at 2022-05-10 10:29:55 EDT by reprexlite v0.4.3
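For contrast, the failing case mentioned above can be sketched as follows. This is a minimal illustration, not part of the scoring script: `df_col` is a hypothetical name for the same toy data with `query_id` left as a regular column, and the `try`/`except` is only there so the snippet runs to completion.

```python
import pandas as pd

# Same toy data as the example above, but query_id stays a regular column
# (no set_index call), so the index is a default RangeIndex.
df_col = pd.DataFrame(
    {
        "query_id": ["A", "A", "A", "B", "B", "B", "B"],
        "actual": [1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0],
    }
)

# Grouping the bare Series by name fails: once the column is selected,
# "query_id" is no longer an index level the Series can look up.
try:
    df_col["actual"].groupby("query_id").sum()
    series_groupby_failed = False
except KeyError:
    series_groupby_failed = True

# Grouping the DataFrame first works, because the column is still visible.
by_frame = df_col.groupby("query_id")["actual"].sum().astype("int64")

# Passing the key values themselves (rather than a name) also works.
by_key_series = df_col["actual"].groupby(df_col["query_id"]).sum().astype("int64")
```

So the two spellings only diverge when the grouping key is a column rather than an index; with the index set, as in the scoring script, both produce the same per-query counts.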

flamethrower · May 10, 2022, 2:44pm

Thank you so much for the detailed response.

My bad, I can see they are actually equivalent.
