Validation error: array lengths

Hi!
I am running into an issue when running my test-submission script.
I am getting this error:

main.DataValidationError: Arrays lengths for query do not match. video_ids: 8295; timestamps: 682608; features: 341304.

This obviously comes from the /opt/validation function:

def validate_lengths(dataset: str, features_npz):
    n_video_ids = len(features_npz["video_ids"])
    n_timestamps = len(features_npz["timestamps"])
    n_features = len(features_npz["features"])
    if not (n_video_ids == n_timestamps == n_features):
        raise DataValidationError(
            f"Arrays lengths for {dataset} do not match. "
            f"video_ids: {n_video_ids}; "
            f"timestamps: {n_timestamps}; "
            f"features: {n_features}. "
        )

However, when reading the Code Submission Format page, I understood that the timestamps array is described like this:

timestamps is a 1D or 2D array of timestamps indicating the start and end times in seconds that the descriptor describes.

The code in my main.py looks like this:

def generate_query_descriptors(query_video_ids) -> np.ndarray:
    # Initialize return values
    video_ids = []
    timestamps = []
    descriptors = []

    # Generate descriptors for each video

    for i in tqdm.tqdm(range(query_video_ids.shape[0])):
        try:
            video_id = query_video_ids[i]
            video_file = f'{QRY_VIDEOS_DIRECTORY}/{video_id}.mp4'  
            start_timestamps, end_timestamps, qry_descriptor = extract_descriptor(video_file)
            descriptors.append(qry_descriptor)

            timestamps.append(np.hstack([start_timestamps, end_timestamps]))
            video_ids.append(video_id)
        except Exception as e:
            print(query_video_ids[i], e)

    descriptors = np.concatenate(descriptors).astype(np.float32)
    timestamps = np.concatenate(timestamps).astype(np.float32)

    return video_ids, descriptors, timestamps

Where the start and end timestamps come from:

    start_timestamps = np.array(tuple(start_timestamps.values()), dtype=np.float32)
    end_timestamps = np.array(tuple(end_timestamps.values()), dtype=np.float32)

Any hints on what I might be doing wrong?

Thanks!

Hey @ndujar-

My guess is that np.concatenate is not doing what you want. You might try logging the shape of your arrays to ensure that you get the dimensionality you expect.
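
One thing that stands out in the error message: the timestamps count (682608) is exactly twice the features count (341304), and the video_ids count (8295) looks like one entry per video rather than one per descriptor. Here is a minimal sketch of how that could happen, assuming start_timestamps and end_timestamps are 1D arrays with one entry per descriptor. The toy values and the video id below are made up for illustration, not taken from your data.

```python
import numpy as np

# Hypothetical stand-ins for one video with 3 descriptors of dimension 4.
start_timestamps = np.array([0.0, 1.0, 2.0], dtype=np.float32)
end_timestamps = np.array([1.0, 2.0, 3.0], dtype=np.float32)
qry_descriptor = np.zeros((3, 4), dtype=np.float32)
video_id = "Q100001"  # made-up id

# np.hstack on 1D arrays joins them end to end, so the length doubles.
flat = np.hstack([start_timestamps, end_timestamps])

# np.stack along axis 1 keeps one (start, end) row per descriptor.
paired = np.stack([start_timestamps, end_timestamps], axis=1)

# Repeat the video id once per descriptor so its length matches the features.
ids = np.full(len(qry_descriptor), video_id)

# Log the shapes to compare them at a glance.
print("hstack:", flat.shape)
print("stack:", paired.shape)
print("ids:", ids.shape)
print("features:", qry_descriptor.shape)
```

If that matches what you see, appending the stacked (n, 2) array and the repeated ids inside your loop, and then calling np.concatenate on video_ids as well, should make all three lengths agree.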