DiabetesPredict — Dataset

About the Dataset

PIMA Indians Diabetes Dataset

Overview

The PIMA Indians Diabetes Dataset originates from the National Institute of Diabetes and Digestive and Kidney Diseases. It was collected to predict, from a fixed set of diagnostic measurements, whether a patient has diabetes. All patients in the dataset are women of Pima Indian heritage aged 21 or older.

The dataset contains 768 records with 8 input features and a binary target variable (Outcome): 0 indicates non-diabetic, 1 indicates diabetic. It is widely used as a benchmark for classification algorithms in healthcare ML research.

Class	Count	Share
Non-Diabetic	500	65.1%
Diabetic	268	34.9%
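
The class shares above follow directly from the two counts. The sketch below reproduces that arithmetic and also derives the accuracy of a classifier that always predicts the majority class, which is a useful floor when reading benchmark results on an imbalanced dataset like this one. The variable names are illustrative and not part of the dataset.

```python
# Class counts as stated in the dataset summary.
counts = {"non_diabetic": 500, "diabetic": 268}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# A classifier that always predicts the majority class reaches this accuracy,
# so any useful model has to beat it.
majority_baseline = max(shares.values())

for label, share in shares.items():
    print(f"{label}: {counts[label]} ({share:.1%})")
print(f"majority-class baseline accuracy: {majority_baseline:.1%}")
```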

Feature Descriptions

#	Feature Name	Data Type	Range	Description
1	`Pregnancies`	Integer	0 – 17	Number of times pregnant
2	`Glucose`	Integer	0 – 199	Plasma glucose concentration (2-hour oral glucose tolerance test, mg/dL)
3	`BloodPressure`	Integer	0 – 122	Diastolic blood pressure (mm Hg)
4	`SkinThickness`	Integer	0 – 99	Triceps skin fold thickness (mm)
5	`Insulin`	Integer	0 – 846	2-hour serum insulin (µU/mL)
6	`BMI`	Float	0 – 67.1	Body mass index (kg/m²)
7	`DiabetesPedigreeFunction`	Float	0.078 – 2.42	Diabetes pedigree function (score estimating hereditary diabetes risk from family history)
8	`Age`	Integer	21 – 81	Age in years
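
Several ranges in the table start at 0 for measurements that cannot be zero in a living patient (`Glucose`, `BloodPressure`, `SkinThickness`, `Insulin`, `BMI`). A common reading, treated here as an assumption rather than something this page states, is that those zeros stand in for missing readings. The sketch below parses rows according to the table's data types and maps such zeros to `None`. It assumes a CSV whose header uses the feature names above plus `Outcome`. The helper names and the two sample rows are made up for illustration and are not taken from the real dataset.

```python
import csv
import io

# Column names and parsers follow the feature table; Outcome is the target.
SCHEMA = {
    "Pregnancies": int,
    "Glucose": int,
    "BloodPressure": int,
    "SkinThickness": int,
    "Insulin": int,
    "BMI": float,
    "DiabetesPedigreeFunction": float,
    "Age": int,
    "Outcome": int,
}

# Assumption: a zero in these columns is a placeholder for a missing reading.
# Pregnancies is excluded because zero is a legitimate value there.
ZERO_MEANS_MISSING = ("Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI")


def load_records(handle):
    """Parse CSV rows into typed dicts, mapping placeholder zeros to None."""
    records = []
    for row in csv.DictReader(handle):
        record = {name: parse(row[name]) for name, parse in SCHEMA.items()}
        for name in ZERO_MEANS_MISSING:
            if record[name] == 0:
                record[name] = None
        records.append(record)
    return records


def missing_counts(records):
    """Count None values in each column that may hold placeholder zeros."""
    return {
        name: sum(1 for r in records if r[name] is None)
        for name in ZERO_MEANS_MISSING
    }


# Made-up rows for illustration only; not rows from the real dataset.
SAMPLE_CSV = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
2,120,70,30,0,31.5,0.45,33,0
0,0,68,0,0,0,0.20,25,1
"""

records = load_records(io.StringIO(SAMPLE_CSV))
print(missing_counts(records))
```

To run this against a real file, pass an open file handle to `load_records` instead of the in-memory sample. Whether to drop or impute the resulting `None` values is a modelling decision this sketch leaves open.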

Correlation Heatmap
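
A correlation heatmap of this kind is typically a colour-coded view of the pairwise Pearson correlation matrix over the columns. The page does not say how its heatmap was produced, so the following is only a minimal sketch of that computation using the standard library. The function names are illustrative, and the toy values are made up and are not rows from the real dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {name_a: {name_b: r}} for every pair of named columns."""
    names = list(columns)
    return {
        a: {b: pearson(columns[a], columns[b]) for b in names}
        for a in names
    }


# Made-up values for illustration only; not rows from the real dataset.
toy = {
    "Glucose": [90, 110, 130, 150],
    "BMI": [22.0, 27.0, 30.0, 35.0],
    "Outcome": [0, 0, 1, 1],
}

matrix = correlation_matrix(toy)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

Each cell of the heatmap corresponds to one entry of `matrix`. The diagonal is 1 by construction, and the matrix is symmetric, so only one triangle carries information.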

Data Source

National Institute of Diabetes and Digestive and Kidney Diseases
Originally published in: Smith, J.W., Everhart, J.E., Dickson, W.C., Knowler, W.C., & Johannes, R.S. (1988). "Using the ADAP learning algorithm to forecast the onset of diabetes mellitus." Proceedings of the Symposium on Computer Applications and Medical Care, pp. 261–265.