Do the IDs have to be in the same order as in the submission example?
I'm getting the error message “IDs for submission are not correct.”
Yes, the submission you make needs to match the submission example except for the predicted values.
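To see why a file triggers the "IDs for submission are not correct" error, it can help to compare its ID column with the one in the submission example. The snippet below is only a rough sketch: the helper name `compare_ids`, the column name `id`, and the toy IDs are made up for illustration, and the checks the site actually runs may differ.

```python
import pandas as pd


def compare_ids(mine: pd.DataFrame, expected: pd.DataFrame, id_col: str = "id") -> dict:
    """Summarize how the ID column of `mine` differs from `expected`."""
    mine_ids = mine[id_col]
    expected_ids = expected[id_col]
    return {
        # IDs the format expects but the submission lacks
        "missing": sorted(set(expected_ids) - set(mine_ids)),
        # IDs in the submission that the format does not contain
        "unexpected": sorted(set(mine_ids) - set(expected_ids)),
        # IDs that appear more than once in the submission
        "duplicated": sorted(mine_ids[mine_ids.duplicated()].unique()),
        # True only if both files list the same IDs in the same order
        "same_order": mine_ids.tolist() == expected_ids.tolist(),
    }


# Toy data (made-up IDs): same set of IDs, different order.
expected = pd.DataFrame({"id": [30, 10, 20], "status_group": ["placeholder"] * 3})
mine = pd.DataFrame({"id": [10, 20, 30], "status_group": ["label_a", "label_b", "label_c"]})

report = compare_ids(mine, expected)
print(report)
```

If `missing`, `unexpected`, and `duplicated` are all empty but `same_order` is false, the only remaining problem is the row order.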
Is there an easy way to do that? I have my predictions calculated and have spent all day today trying to get them into the same order, with no luck.
Never mind, I think I have a solution.
I spoke too soon. Is it possible for you to fix it so that you can submit results and the order of the id column does not matter? Sorting that many records to be exactly the same is proving to be a big pain point for me.
The easiest way to reorder/sort is based on the IDs/index in the submission file. Here are some examples in R and Python of loading the data, shuffling the test dataset (so that it is not in the same order as the submission file), and then reordering that shuffled data based on the order the submission file expects.
Hope it helps!
First in R:
# load data
test_data <- read.csv("data/processed/test_values.csv", row.names=1)
submission <- read.csv("data/processed/SubmissionFormat.csv", row.names=1)

# reorder the test data so it is shuffled
shuffled_data_frame <- test_data[sample(nrow(test_data)),]

# confirm that the shuffled order does not match the original
# FALSE
all(shuffled_data_frame == test_data)

# use the submission index to reorder the test dataframe
reordered_data_frame <- shuffled_data_frame[row.names(submission),]

# check the reordered matches the original
# TRUE
all(reordered_data_frame == test_data)
And now in Python with pandas:
import pandas as pd
import numpy as np

test_data = pd.read_csv("data/processed/test_values.csv", index_col=0)
submission_format = pd.read_csv("data/processed/SubmissionFormat.csv", index_col=0)

# shuffle the test data so it no longer matches the submission order
shuffled_data = test_data.loc[np.random.permutation(test_data.index)].copy()

# False
print((shuffled_data.values == test_data.values).all())

# use the submission index to reorder the shuffled data
reordered_data = shuffled_data.loc[submission_format.index].copy()

# True
print((reordered_data.values == test_data.values).all())
How do you manage to use the Y variable in the analysis if it is not included in the test set?
The code above uses SubmissionFormat.csv, which is the same shape and has the same order of rows/columns as the actual Y variable, just not the same values.
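The same idea applies to the predictions themselves: once they are indexed by ID, the index of the submission format can be used to put them in the expected order. Here is a minimal sketch using small in-memory stand-ins instead of the real files; the IDs, the `label_*` values, and the output file name are all made up for illustration.

```python
import pandas as pd

# Toy stand-in for SubmissionFormat.csv loaded with index_col=0 (made-up IDs).
submission_format = pd.DataFrame(
    {"status_group": ["placeholder"] * 3},
    index=pd.Index([30, 10, 20], name="id"),
)

# Hypothetical predictions, indexed by the same IDs but in a different order.
predictions = pd.DataFrame(
    {"status_group": ["label_a", "label_b", "label_c"]},
    index=pd.Index([10, 20, 30], name="id"),
)

# Reorder the predictions to follow the submission format's index.
submission = predictions.reindex(submission_format.index)

# reindex() fills IDs that have no prediction with NaN, so check for that.
if submission["status_group"].isna().any():
    raise ValueError("some IDs in the submission format have no prediction")

print(submission)
# submission.to_csv("my_submission.csv")  # hypothetical output file name
```

Using `reindex` rather than `.loc` here is a design choice: an ID without a prediction becomes a NaN row that can be checked explicitly, instead of raising a `KeyError` partway through.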
Is the submission case sensitive? I am new to this and the score column next to my submission list is just an icon of spinning dots. I did not receive any errors or warnings when submitting the code but I feel like I must have done something wrong. Thanks in advance for any advice.
Yep, the submission is case sensitive. The labels you submit must exactly match the ones in the problem description, which are
functional needs repair, and
There’s a known issue where if your score is exactly equal to zero, you see the spinning dots. If you got a score of exactly zero, your labels probably don’t match the correct format!
Hope that helps!
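A quick check before uploading can catch label mismatches such as wrong capitalization or stray whitespace. This is only a sketch: the helper `unexpected_labels` is hypothetical, and the `allowed` set below is an assumption that should be verified against the problem description rather than taken from this snippet.

```python
import pandas as pd


def unexpected_labels(labels: pd.Series, allowed: set) -> list:
    """Return the distinct values in `labels` that do not exactly match `allowed`."""
    # Exact comparison: differences in case or whitespace count as mismatches.
    return [value for value in labels.unique() if value not in allowed]


# ASSUMPTION: verify these against the labels in the problem description.
allowed = {"functional", "non functional", "functional needs repair"}

# Hypothetical predictions with a capitalized label and a trailing space.
predicted = pd.Series(
    ["Functional", "functional needs repair", "non functional", "functional "]
)

bad = unexpected_labels(predicted, allowed)
print(bad)
```

An empty list means every predicted label matches one of the allowed strings exactly.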
I have a quick question:
The rules (general ones) say that submission limits are on a per competition basis; given this, what is the limit of submissions for this competition?
Hey @zamborg, you can find that on the submissions page:
For example, under "Subs. Today" I see that I have made 0 of 3 possible submissions.
Regarding the error message “IDs for submission are not correct.”
The suggestion to reorder/sort the data as in the submission file didn’t work right away for me (in R).
I was able to tackle this problem by using join() (from the plyr package), which is similar to merge() but it retains the order of the first data frame.
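Join the ‘id’ variable with your solution (which contains id + status_group):
library(data.table)
library(plyr)

# read the submission format; only its id column (and its order) is used
submission <- data.table(read.csv("SubmissionFormat.csv"))
# join() keeps the row order of the first data frame
FileToSubmit <- join(submission[, .(id)], solution, by = "id")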
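For Python users, a roughly equivalent approach is a left merge in pandas, which keeps the row order of the left-hand frame. This is a sketch on toy data, not a translation of the exact code above; the IDs, the `label_*` values, and the output file name are made up for illustration.

```python
import pandas as pd

# Toy stand-in for SubmissionFormat.csv (made-up IDs).
submission_format = pd.DataFrame(
    {"id": [30, 10, 20], "status_group": ["placeholder"] * 3}
)

# Hypothetical solution: id + status_group, in a different order.
solution = pd.DataFrame(
    {"id": [10, 20, 30], "status_group": ["label_a", "label_b", "label_c"]}
)

# A left merge keeps the key order of the left frame (the submission format).
# validate="one_to_one" raises an error if an id is duplicated on either side.
file_to_submit = submission_format[["id"]].merge(
    solution, on="id", how="left", validate="one_to_one"
)

print(file_to_submit)
# file_to_submit.to_csv("my_submission.csv", index=False)  # hypothetical file name
```

Any ID missing from `solution` ends up with NaN in `status_group`, so it is worth checking for missing values before writing the file.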