Plotted the correlation between missing values across the consumedxxxx features for surveys 1-6.
There are some clear patterns within and between the surveys, though I’m not sure what insights can be gleaned from this, if any. Thought I’d share in case anyone found it interesting or wanted to look into it further.
Excuse the lack of axis labels, I couldn’t figure out an easy way to put readable ones in, but each row/col is just consumedxxxx, starting from consumed100 and ending at consumed5000.
Each cell represents the correlation between the nulls in the two features. For example, in survey 3 we can see that if consumed100 is null for a response, then all of consumed100-900 will be null (in this survey there is only 1 row where these features are null, so this isn’t all that interesting in and of itself). I don’t think this exercise is all that useful for this problem, though in theory it could let us infer some things about the structure of the questionnaire.
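For anyone who wants to reproduce something like this, here is a minimal sketch of one way to compute the matrix with pandas. It is not necessarily how the plots above were made: the `null_correlation` helper, the `survey_id` column name, and the toy data are all made up for illustration.

```python
import numpy as np
import pandas as pd


def null_correlation(df, prefix="consumed"):
    """Correlation matrix of missing-value indicators for columns starting with `prefix`.

    Hypothetical helper, assuming the features share a common name prefix.
    """
    cols = [c for c in df.columns if c.startswith(prefix)]
    # 1 where the value is missing, 0 otherwise
    indicators = df[cols].isna().astype(int)
    # Pearson correlation between the indicator columns; a column that is
    # never (or always) null has zero variance and comes out as NaN
    return indicators.corr()


# Toy data, invented purely to show the mechanics
toy = pd.DataFrame({
    "survey_id": [1, 1, 1, 1],
    "consumed100": [1.0, np.nan, 2.0, np.nan],
    "consumed200": [0.5, np.nan, 1.5, np.nan],
    "consumed300": [np.nan, 3.0, np.nan, 4.0],
})

corr = null_correlation(toy)

# One matrix per survey, assuming a (hypothetical) `survey_id` column
per_survey = {sid: null_correlation(group) for sid, group in toy.groupby("survey_id")}
```

In the toy data consumed100 and consumed200 are always null together, so their cell is 1, while consumed300 is null exactly when the other two are not, so its cells are -1. Features with no nulls at all in a survey would show up as NaN rows/cols.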
