From my understanding of adjusted mean average precision, I think it is meant to be number predicted correct / number actual correct. Hence, this is probably what the script meant for this line:
Yes, the adjustment that turns scikit-learn’s classification mean average precision into information retrieval mean average precision is indeed a factor of number predicted correct / number actual correct.
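To make that factor concrete, here is a minimal pure-Python sketch. It is not the scoring script's code: `classification_ap` and `retrieval_ap` are hypothetical helper names, and the sketch assumes untied scores and binary labels already sorted by descending score. The classification-style value averages precision over the hits that were actually returned, and the adjustment rescales it so the denominator becomes the number of actual correct items.

```python
def classification_ap(labels_in_rank_order):
    """Average precision over the returned results only.

    Mean of precision@k taken at each rank k where a hit occurs.
    (Hypothetical helper; assumes untied scores and 0/1 labels.)
    """
    hits = 0
    precisions = []
    for rank, label in enumerate(labels_in_rank_order, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0


def retrieval_ap(labels_in_rank_order, n_actual_correct):
    """Rescale by (number predicted correct) / (number actual correct)."""
    if n_actual_correct == 0:
        return 0.0
    n_predicted_correct = sum(labels_in_rank_order)
    factor = n_predicted_correct / n_actual_correct
    return classification_ap(labels_in_rank_order) * factor


# Three results returned, two of them correct, but four correct items exist.
returned = [1, 0, 1]
unadjusted = classification_ap(returned)          # (1/1 + 2/3) / 2
adjusted = retrieval_ap(returned, n_actual_correct=4)  # (1/1 + 2/3) / 4
```

With the adjustment, correct items that were never returned count against the score, which is the information retrieval behavior.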
The original line of code and the line of code you are proposing should be equivalent for the scoring script. I’m not totally sure what you think is incorrect. Is it related to QUERY_ID_COL being available on the merged["actual"] series to group by? In our script, we load all of the dataframes such that QUERY_ID_COL is set as an index and not a regular column. (See here.) That means it’s still available for the groupby operation, because a Series can be grouped by the name of one of its index levels. If you don’t have it set as an index, then the groupby will error. The example below shows both forms giving the same result when query_id is the index.
import pandas as pd

df = pd.DataFrame(
    {
        "query_id": ["A", "A", "A", "B", "B", "B", "B"],
        "database_image_id": ["01", "02", "03", "01", "02", "03", "04"],
        "score": [1.0, 0.9, 0.8, 1.0, 0.9, 0.8, 0.7],
        "actual": [1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0],
    }
)
df = df.set_index("query_id")
df
#>          database_image_id  score  actual
#> query_id
#> A                       01    1.0     1.0
#> A                       02    0.9     0.0
#> A                       03    0.8     1.0
#> B                       01    1.0     0.0
#> B                       02    0.9     1.0
#> B                       03    0.8     1.0
#> B                       04    0.7     1.0

df["actual"]
#> query_id
#> A    1.0
#> A    0.0
#> A    1.0
#> B    0.0
#> B    1.0
#> B    1.0
#> B    1.0
#> Name: actual, dtype: float64

df["actual"].groupby("query_id").sum().astype("int64").rename()
#> query_id
#> A    2
#> B    3
#> dtype: int64

df.groupby("query_id")["actual"].sum().astype("int64").rename()
#> query_id
#> A    2
#> B    3
#> dtype: int64
Created at 2022-05-10 10:29:55 EDT by reprexlite v0.4.3
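For contrast, here is a small sketch of the failure case mentioned above, where query_id is left as a regular column. The toy data and variable names are made up for illustration, and I'm assuming the error surfaces as a KeyError, which is what I'd expect from recent pandas versions.

```python
import pandas as pd

# Hypothetical toy data: query_id is a regular column, not the index.
df = pd.DataFrame(
    {
        "query_id": ["A", "A", "A", "B", "B", "B", "B"],
        "actual": [1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0],
    }
)

# Grouping a Series by name only works if that name is an index level.
try:
    df["actual"].groupby("query_id").sum()
    series_groupby_failed = False
except KeyError:
    series_groupby_failed = True

# Grouping the DataFrame first accepts a column name, so this form works.
counts_from_column = df.groupby("query_id")["actual"].sum().astype("int64")

# Setting the index first makes the Series-level form work as well.
indexed = df.set_index("query_id")
counts_from_index = indexed["actual"].groupby("query_id").sum().astype("int64")
```

So if your copy of the data keeps query_id as a column, the DataFrame-level groupby is the form that works in both situations.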