Is this a classification or a regression problem? In the performance metric section, it’s written we have to submit the probability of having a heart disease which makes it not a classification problem (binary)

You could use either a classifier or a regression approach as long as the predict function can return probabilities. Both RandomForestClassifier and LogisticRegression from the sklearn can handle that. In my experience, both give very similar results with respect to accuracy, sensitivity, specificity, and log-loss scores. One approach would be to use a classifier to ascertain which features are important enough and then use logistic regression.

Regression aims to predict a continuous output value. For example, say that you are trying to predict the revenue of a certain brand as a function of many input parameters. A regression model would literally be a function which can output potentially any revenue number based on certain inputs. It could even output revenue numbers which never appeared anywhere in your training set.

Classification aims to predict which class (a discrete integer or categorical label) the input corresponds to. e.g. let us say that you had divided the sales into Low and High sales, and you were trying to build a model which could predict Low or High sales (binary/two-class classication). The inputs might even be the same as before, but the output would be different. In the case of classification, your model would output either “Low” or “High,” and in theory every input would generate only one of these two responses.

The above description is is true for any machine learning method; but, there is a gray area: There are algorithms which predict probability, which is a continuous value, between 0 and 1. By the above definition, you can consider them regression algorithms (think of logistic regression). At the same time, this probability refers to classes, so they can be used for classification (just set a threshold for probability: everything with probability < 0.5 goes into one class, and with > 0.5 into the other). How you “classify” these algorithms is a philosophical question, of little practical importance.