For this case study, we have taken data from Kaggle.com:
Health Insurance Fraud Detection Data
The data set consists of the following four tables:
(1) Provider Data: It contains the ID of each healthcare provider and whether that provider is flagged as potential fraud. It has 5,410 rows and two columns, Provider ID and Potential Fraud (No or Yes). Both columns are categorical.
(2) Beneficiary Data: It contains 138,556 rows and 25 columns. It holds KYC data (such as Date of Birth, Gender and Race), health conditions (indicators for various diseases such as renal disease, heart disease and kidney disease), and State and County codes.
(3) Inpatient Data: It contains data on patients who were admitted to a hospital (healthcare provider) and filed a claim. It contains 40,474 rows and 30 columns.
(4) Outpatient Data: It contains data on patients who were not admitted to a hospital but filed a claim. It contains 517,737 rows and 27 columns.
For further details, please click on the link