Dataset:
For this case study, we took the Prudential Life Insurance data from Kaggle.com:
Dataset.
The dataset consists of four parts:
(1) Provider Data: It contains the ID of each healthcare provider and whether the provider is flagged as Potential Fraud. It has 5,410 rows and two columns, Provider ID and Potential Fraud (No or Yes). Both columns are categorical.
(2) Beneficiary Data: It contains 138,556 rows and 25 columns. It holds KYC data (Date of Birth, Gender, Race etc.), Health Condition (various diseases such as Renal disease, Heart, Kidney etc.), and State and Country Code.
(3) Inpatient Data: It contains data on patients who were admitted to a hospital (the healthcare provider) and filed a claim. It contains 40,474 rows and 30 columns.
(4) Outpatient Data: It contains data on patients who were not admitted to a hospital and filed a claim. It contains 517,737 rows and 27 columns.
We mainly used the Inpatient Data and Provider Data to detect fraudulent providers.
For each combination of beneficiary and provider, we computed the number of times the beneficiary was reimbursed a claim amount and the total claim amount reimbursed. We then added the Potential Fraud label from the Provider dataset. The resulting dataset contains 36,616 rows. A screenshot of the dataset is given below:
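The aggregation described above can be sketched in pandas. This is only an illustrative sketch, not the exact procedure we ran: the column names `BeneID`, `Provider`, `InscClaimAmtReimbursed` and `PotentialFraud`, the helper name `build_claim_summary`, and the three toy rows are all assumptions made for the example.

```python
import pandas as pd


def build_claim_summary(inpatient: pd.DataFrame, provider: pd.DataFrame) -> pd.DataFrame:
    """Aggregate claims per (beneficiary, provider) pair and attach the fraud label.

    Column names are assumed for illustration; adjust them to the real files.
    """
    summary = (
        inpatient
        .groupby(["BeneID", "Provider"], as_index=False)
        .agg(
            # Number of reimbursed claims for this beneficiary at this provider
            ClaimCount=("InscClaimAmtReimbursed", "count"),
            # Total claim amount reimbursed for the pair
            TotalReimbursed=("InscClaimAmtReimbursed", "sum"),
        )
    )
    # Add the Potential Fraud label from the Provider data
    return summary.merge(
        provider[["Provider", "PotentialFraud"]], on="Provider", how="left"
    )


# Toy data (made up) standing in for the Inpatient and Provider files
inpatient = pd.DataFrame({
    "BeneID": ["B1", "B1", "B2"],
    "Provider": ["P1", "P1", "P2"],
    "InscClaimAmtReimbursed": [1000, 500, 2000],
})
provider = pd.DataFrame({
    "Provider": ["P1", "P2"],
    "PotentialFraud": ["Yes", "No"],
})

summary = build_claim_summary(inpatient, provider)
print(summary)
```

Grouping on the beneficiary and provider together yields one row per pair, which is why the resulting table can have fewer rows than the Inpatient Data it is built from.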
Result 1: Top 1,000 Fraudulent Providers
We selected the dataset in "Discover" and asked it to find the top 1,000 fraudulent providers. A screenshot of "Discover" is given below:
When we click the "Result" button, we see the following result (screenshot of Result):
Of the top 1,000 rows, 725 belong to fraudulent providers. Thus, we achieved an accuracy of 72.5% (725 out of 1,000) within the top 1,000 rows. Please note that we achieved this without using any data from the Beneficiary dataset, where various details of each beneficiary are provided. A screenshot of the result is given below:
Result 2: Top 100 Fraudulent Providers
We selected the dataset in "Discover" and asked it to find the top 100 fraudulent providers. A screenshot of "Discover" is given below:
When we click the "Result" button, we see the following result (screenshot of Result):
Of the top 100 rows, 83 belong to fraudulent providers. Thus, we achieved an accuracy of 83% (83 out of 100) within the top 100 rows. This was also achieved without using any data from the Beneficiary dataset, where various details of each beneficiary are provided. A screenshot of the result is given below:
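The accuracy figures in both results are the share of fraudulent rows among the top k rows returned, often called precision at k. The sketch below shows that arithmetic. The function name `precision_at_k` is hypothetical, and the label lists are synthetic placeholders built only to match the counts reported above (725 of 1,000 and 83 of 100); they are not the actual ranked output.

```python
def precision_at_k(ranked_labels, k, positive="Yes"):
    """Return the fraction of the first k ranked rows carrying the positive label."""
    top = ranked_labels[:k]
    if not top:
        raise ValueError("need at least one ranked row")
    hits = sum(1 for label in top if label == positive)
    return hits / len(top)


# Synthetic label lists that only reproduce the reported counts
top_1000 = ["Yes"] * 725 + ["No"] * 275
top_100 = ["Yes"] * 83 + ["No"] * 17

p1000 = precision_at_k(top_1000, 1000)
p100 = precision_at_k(top_100, 100)
print(f"Top 1,000: {p1000:.1%}")
print(f"Top 100: {p100:.1%}")
```

Because the measure only looks at the returned rows, it does not say how many fraudulent providers were missed outside the top k.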
In the near future, we plan to add some data from the Beneficiary dataset to improve the result, and we will post an update here.