This dataset presumes a solution (Heart Disease)

jrcharlton · May 21, 2019, 1:08pm

This isn’t really a very good study, because it presumes that heart disease has a cause other than an autocorrelated one. For instance, if one were to take a specific set of generations and record whether heart disease was present (1) or not present (0) in each, that would be much more likely to provide a prior indicator of whether someone is likely to have heart disease. It would furthermore allow a data scientist to establish that there is a prior pattern that is much more indicative of the cause than any of the independent variables.

Accordingly, this isn’t likely to result in any meaningful solution because heart disease is something that comes with a prior probability. Simply look at the autocorrelation of heart disease across organisms with much faster procreation rates (like insects).

People like using y = f(x) assumptions (cause and effect) because it’s convenient to make that assumption, but in terms of predicting heart disease, it could simply be that y[n+1] = f(y[n]). That’s much more likely than heart disease being solved by 90 data points among 12 preselected independent variables.
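To make the autocorrelation argument concrete, here is a minimal sketch on purely synthetic data. It assumes a toy model in which each generation repeats the previous generation's status with some probability; the function names, the `p_stay` parameter, and the sequence length are all hypothetical and not taken from the competition dataset.

```python
import random


def simulate_lineage(n_generations, p_stay, seed=0):
    """Simulate a 0/1 disease indicator across generations.

    Hypothetical model: each generation keeps the previous generation's
    status with probability p_stay, otherwise it flips.
    """
    rng = random.Random(seed)
    status = [rng.randint(0, 1)]
    for _ in range(n_generations - 1):
        prev = status[-1]
        status.append(prev if rng.random() < p_stay else 1 - prev)
    return status


def lag1_autocorrelation(y):
    """Sample lag-1 autocorrelation of a numeric sequence."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y)
    if var == 0:
        return 0.0  # constant sequence: autocorrelation is undefined, report 0
    cov = sum((y[i] - mean) * (y[i + 1] - mean) for i in range(n - 1))
    return cov / var


# Strong dependence on the previous generation vs. no dependence at all.
persistent = simulate_lineage(5000, p_stay=0.9)
independent = simulate_lineage(5000, p_stay=0.5)

r_persistent = lag1_autocorrelation(persistent)
r_independent = lag1_autocorrelation(independent)

print("lag-1 autocorrelation, p_stay=0.9:", round(r_persistent, 3))
print("lag-1 autocorrelation, p_stay=0.5:", round(r_independent, 3))
```

When the status depends strongly on the previous generation, the lag-1 autocorrelation is clearly positive, and the previous outcome alone is a useful predictor. With `p_stay=0.5` it is close to zero and the previous outcome carries no information. Whether real heart disease data behaves like the first case is exactly the open question in this thread.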

Gillesvdw · May 23, 2019, 7:00am

While the data collection could perhaps have been done better (it’s from 1988), it is clear that there is predictive signal within the dataset, given the performance achieved by ML algorithms (be it through autocorrelation or confounding factors).

Optimus5 · July 18, 2019, 3:26pm

Hi, thank you for your responses. How do you know the data is from 1988? Does anyone know how to get hold of the original data? (With 75+ variables)

bgilroy26 · September 23, 2019, 5:19pm

The DrivenData description of the problem notes that they are using the UC Irvine Heart Disease dataset:

https://archive.ics.uci.edu/ml/datasets/Heart+Disease
