Fast way to create 'IsDayOff' column

Hi all, I am working in Python 3.6 and am looking for a fast way to create an ‘IsDayOff’ column. I have created a “DayofWeek” column based on the date, and have merged the main dataframe with the metadata. Currently, I use a for loop to iterate over each row, check the “DayofWeek”, then check the matching “[day]IsDayOff” column to deduce whether that row falls on a day off. This is incredibly slow, especially since the dataset is so large and I only have 8 GB of RAM. Has anyone found a fast way to do this?

I found the solution: work on NumPy arrays rather than iterating over the dataframe row by row, because array operations are much faster.
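For anyone landing here later, this is a minimal sketch of what a NumPy-based version could look like. It is not necessarily the exact code used above. It assumes that “DayofWeek” holds integers 0 to 6 with Monday as 0 (what pandas' `Series.dt.dayofweek` produces) and that the metadata columns are named `MondayIsDayOff` through `SundayIsDayOff`; the function name `is_day_off` is made up for illustration.

```python
import numpy as np
import pandas as pd

# Assumed column naming: "<Day>IsDayOff", ordered so that index 0 is Monday.
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]
FLAG_COLS = [day + "IsDayOff" for day in DAYS]


def is_day_off(df):
    """Return a boolean array: True where the row's weekday is a day off."""
    # (n_rows, 7) boolean matrix of the per-day flags.
    flags = df[FLAG_COLS].to_numpy(dtype=bool)
    # Integer weekday per row, used as a column index into `flags`.
    day_idx = df["DayofWeek"].to_numpy(dtype=np.intp)
    # Fancy indexing picks, for each row, the flag of that row's weekday.
    return flags[np.arange(len(df)), day_idx]


# Tiny illustrative dataframe (made-up values).
example = pd.DataFrame({"DayofWeek": [0, 4, 5, 6]})
for col in FLAG_COLS:
    example[col] = False
example["FridayIsDayOff"] = [False, True, True, False]
example["SundayIsDayOff"] = True

example["IsDayOff"] = is_day_off(example)
```

The whole column is computed in one indexing operation, with no Python-level loop over rows. On a memory-constrained machine, note that `df[FLAG_COLS].to_numpy()` makes a temporary copy of the seven flag columns.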

Hi rchesak, thanks for sharing your solution.
I know you found your solution a long time ago, but another way to speed up this check is to use the fact that only Friday, Saturday and Sunday are ever days off at any of the sites.

Thanks for sharing your solution, mbzn. The only reason I would hesitate to use that method is that future data may not follow the same pattern, so the method relies on manually checking the observed pattern. Automated data pre-processing is preferable in production.

I understand your point! But you could still make these initial checks inside your function before doing any comparison, and I think it would still scale to any dataset.

You are right, and I am curious how one might implement this in code. I am new to optimizing for efficiency and speed, and I wonder whether checking your booleans first would be faster.
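One way the pre-check discussed above might be written is sketched below. It detects from the data itself which weekdays are ever a day off, so nothing is hard-coded. It assumes “DayofWeek” holds integers 0 to 6 with Monday as 0 and that the flag columns are named `MondayIsDayOff` through `SundayIsDayOff`; the function name `is_day_off_with_precheck` is hypothetical.

```python
import numpy as np
import pandas as pd

# Assumed ordering: index 0 is Monday, matching pandas' dt.dayofweek.
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def is_day_off_with_precheck(df):
    """Boolean array of days off, skipping weekdays that are never off."""
    day_idx = df["DayofWeek"].to_numpy()
    result = np.zeros(len(df), dtype=bool)
    for i, day in enumerate(DAYS):
        flag = df[day + "IsDayOff"].to_numpy(dtype=bool)
        # Initial check: if no row ever has this day off, skip the comparison.
        if not flag.any():
            continue
        # Row is a day off if it falls on this weekday and the flag is set.
        result |= (day_idx == i) & flag
    return result


# Tiny illustrative dataframe (made-up values).
example = pd.DataFrame({"DayofWeek": [0, 4, 5, 6]})
for day in DAYS:
    example[day + "IsDayOff"] = False
example["FridayIsDayOff"] = [False, True, True, False]
example["SundayIsDayOff"] = True

example["IsDayOff"] = is_day_off_with_precheck(example)
```

The loop runs at most seven times regardless of dataset size, and each iteration is a vectorized array operation. Whether the pre-check actually saves time is not obvious, since `flag.any()` itself has to scan the column, so it is worth timing on the real data (for example with `timeit`) before relying on it.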