
Household country B Data has so many bugs in R


#1

Hey guys, is anyone having trouble running household train B?
I have tried running it and there are so many bugs. I have tried cleaning it, but I couldn't find code that removes only the columns with missing values (NAs). And if I try removing missing values by row, I am left with zero rows. Did anyone experience this problem? Let me know how you handled it. I am also looking for someone to partner with, or anyone interested in teamwork. Thanks!


#2

Hi @fnyakundi

The NAs are mostly in numeric variables in country B. Depending on the algorithm you use, you can replace them with a single value, for example 10000 (not taken by any numeric variable); a tree ensemble will certainly create a branch for it. Do not use zero, as it already appears in two variables, if I remember correctly.
Sorry to tell you, but there is no bug in the data … you have to process the NAs and the wrong data.
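
A minimal sketch of that replacement in base R, run on a small made-up data frame rather than the real country B file. The column names and the `sentinel` variable are assumptions for illustration only; check that the value you pick really is unused in your own data.

```r
# Toy stand-in for the household data (column names are hypothetical)
df <- data.frame(
  num_a = c(1.5, NA, 3.0),
  num_b = c(0, 2, NA),
  cat_c = factor(c("x", "y", "x"))
)

sentinel <- 10000  # assumed to be absent from every numeric column

# Logical flag per column: TRUE for numeric columns
num_cols <- sapply(df, is.numeric)

# Stop early if the sentinel already occurs as a real value
stopifnot(!any(df[num_cols] == sentinel, na.rm = TRUE))

# Replace NAs in the numeric columns only; factors are left untouched
df[num_cols] <- lapply(df[num_cols], function(x) {
  x[is.na(x)] <- sentinel
  x
})
```

This suits tree-based models, which can split the sentinel off into its own branch. For linear models a large constant like this would distort the fit, so another imputation would probably be needed there.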
Best regards
Alain


#3

To find which columns have at least one NA, you could use sapply(data, function(x) any(is.na(x))), which returns a named logical vector with one entry per column.
I prefer the purrr way (after library(purrr)): map_lgl(data, ~ any(is.na(.x)))
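
As a rough illustration of how that logical vector can be used, here is a base R sketch on a made-up data frame. The names `df`, `has_na` and `df_complete` are placeholders, not anything from the competition files.

```r
# Toy data frame: columns a and c contain NAs, column b does not
df <- data.frame(
  a = c(1, NA, 3),
  b = c("u", "v", "w"),
  c = c(NA, NA, 1)
)

# TRUE for every column with at least one NA
has_na <- sapply(df, function(x) any(is.na(x)))

# Names of the affected columns, and how many NAs each column holds
na_columns <- names(df)[has_na]
na_counts <- colSums(is.na(df))

# Keep only the columns without NAs; drop = FALSE keeps a data frame
# even when a single column is left
df_complete <- df[, !has_na, drop = FALSE]
```

Looking at `na_counts` before dropping anything helps to see whether a column is mostly missing or only has a few gaps that could be filled instead.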


#4

For now I just deleted those nine columns using: B_hhold_train.keeps <- B_hhold_train[, colSums(is.na(B_hhold_train)) == 0] as per: https://stackoverflow.com/questions/12454487/remove-columns-from-dataframe-where-some-of-values-are-na

But then, depending on your model, you might also want to remove the four factor variables with only one level.
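
One possible way to drop such single-level factors in base R, sketched on a made-up data frame (the column names and `df_reduced` are hypothetical):

```r
# Toy data frame: f_const is a factor with a single level
df <- data.frame(
  f_const = factor(c("a", "a", "a")),
  f_var   = factor(c("a", "b", "a")),
  n       = c(1, 2, 3)
)

# TRUE for factor columns with fewer than two levels actually in use;
# droplevels() discards declared levels that never occur in the data
single_level <- sapply(df, function(x) {
  is.factor(x) && nlevels(droplevels(x)) < 2
})

# Keep everything else
df_reduced <- df[, !single_level, drop = FALSE]
```

A constant factor carries no information, and some modelling functions (for example those that build contrasts) raise an error when they meet one, so removing these columns up front avoids that.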