
Household country B Data has so many bugs in R


Hey guys, is anyone having trouble running household train B?
I have tried running it and there are so many bugs. I have tried cleaning it, but I couldn’t find code that removes only the columns with missing values (NAs). And if I remove the rows with missing values instead, I end up with zero rows. Did anyone experience this problem? Let me know how you handled it. I am also looking for someone to partner with, or anyone interested in teamwork. Thanks!


Hi @fnyakundi

The NAs are mostly in the numeric variables for country B. Depending on the algorithm you use, you can replace them with a single value, for example 10000 (a value not taken by any numeric variable); a tree ensemble will almost certainly create a branch for it. Do not use zero, as it already appears in two variables, if I remember correctly.
Sorry to tell you, but there is no bug in the data … you have to process the NAs and the wrong data yourself.
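Here is a rough sketch of that replacement in R. It assumes a made-up toy data frame `df` in place of the real country B household data, and uses 10000 as the sentinel only because it was the example above; check that it really is unused in your own columns.

```r
# Toy stand-in for the real training data (column names are made up).
df <- data.frame(
  a = c(1.5, NA, 3.0),
  b = c(NA, 2L, 0L),
  c = factor(c("x", "y", NA))
)

sentinel <- 10000  # assumption: not taken by any numeric column

# Names of the numeric (including integer) columns.
num_cols <- names(df)[sapply(df, is.numeric)]

# Stop early if the sentinel already occurs somewhere in the numeric data.
stopifnot(!any(sapply(df[num_cols], function(x) any(x == sentinel, na.rm = TRUE))))

# Replace NAs in numeric columns only; factor columns are left untouched.
df[num_cols] <- lapply(df[num_cols], function(x) replace(x, is.na(x), sentinel))
```

This suits tree-based models, which can split the sentinel off into its own branch; for linear models a sentinel like this would distort the fit.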
Best regards


To obtain columns that have at least one NA you could use sapply(data, function(x) any(is.na(x))).
I prefer the purrr way: map_lgl(data, ~any(is.na(.x))).


For now I just deleted those nine columns using: B_hhold_train.keeps <- B_hhold_train[ , colSums(is.na(B_hhold_train)) == 0]

But then depending on your model you might also want to remove the four factor variables with only one level.
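A small sketch of how the single-level factors could be found and dropped, using a made-up toy data frame `df` (the column names are hypothetical, not the real ones from the B household data):

```r
# Toy stand-in for the real training data (column names are made up).
df <- data.frame(
  f1 = factor(c("a", "a", "a")),  # only one level
  f2 = factor(c("a", "b", "a")),
  n1 = c(1, 2, 3)
)

# TRUE for factor columns with fewer than two levels actually present.
single_level <- sapply(df, function(x) is.factor(x) && nlevels(droplevels(x)) < 2)

# Keep everything else; drop = FALSE keeps a data frame even if one column remains.
df_keeps <- df[ , !single_level, drop = FALSE]

# Names of the columns that were dropped.
names(df)[single_level]
```

The droplevels() call is there because a factor can carry unused levels after NA columns or rows have been removed, so counting declared levels alone could miss some constant columns.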