Most Advanced AI

Uncover the Hidden Gems of Your Business with Our Data Solution

Use AI/ML to stay ahead of your competitors and enhance your revenue and profitability.



Case Study - Health Insurance Fraud Detection

Dataset:

For this case study, we used the Prudential Life Insurance data from Kaggle.com: Dataset.

The dataset consists of the following data:
(1) Provider Data: It contains the ID of each healthcare provider and whether the provider is a Potential Fraud. It has 5,410 rows and two columns, Provider ID and Potential Fraud (No or Yes). Both are categorical.
(2) Beneficiary Data: It contains 138,556 rows and 25 columns. It covers KYC data (Date of Birth, Gender, Race, etc.), Health Condition (various diseases such as Renal disease, Heart, Kidney, etc.), and State and Country Code.
(3) Inpatient Data: It contains data on patients who were admitted to the hospital (healthcare provider) and filed a claim. It contains 40,474 rows and 30 columns.
(4) Outpatient Data: It contains data on patients who were not admitted to the hospital and filed a claim. It contains 517,737 rows and 27 columns.

We mainly used the Inpatient Data and Provider Data to detect fraudulent Providers. For each Beneficiary and Provider pair, we computed the number of times the beneficiary was reimbursed a claim amount and the total claim amount reimbursed. We then added the Potential Fraud column from the Provider dataset. The resulting dataset contains 36,616 rows. Below is a screenshot of the dataset created:

Medical Insurance Summarized Data
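
The summarization step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline actually used for the case study: the field names (beneficiary_id, provider_id, amount_reimbursed), the function name summarize_claims, and the toy records are all hypothetical, and every claim row is counted as one reimbursement.

```python
from collections import defaultdict


def summarize_claims(inpatient_claims, provider_labels):
    """Aggregate claims per (beneficiary, provider) pair and attach the fraud label.

    inpatient_claims: iterable of dicts with the hypothetical keys
        'beneficiary_id', 'provider_id' and 'amount_reimbursed'.
    provider_labels: dict mapping provider_id to 'Yes' or 'No'.
    """
    claim_counts = defaultdict(int)
    claim_totals = defaultdict(float)
    for claim in inpatient_claims:
        key = (claim["beneficiary_id"], claim["provider_id"])
        claim_counts[key] += 1  # assumption: each claim row is one reimbursement
        claim_totals[key] += claim["amount_reimbursed"]

    summary = []
    for key in sorted(claim_counts):
        beneficiary_id, provider_id = key
        summary.append({
            "beneficiary_id": beneficiary_id,
            "provider_id": provider_id,
            "times_reimbursed": claim_counts[key],
            "total_reimbursed": claim_totals[key],
            # Label joined from the Provider data; None if the provider is unknown.
            "potential_fraud": provider_labels.get(provider_id),
        })
    return summary


# Toy records for illustration only; these are not rows from the Kaggle data.
claims = [
    {"beneficiary_id": "B1", "provider_id": "P1", "amount_reimbursed": 1000.0},
    {"beneficiary_id": "B1", "provider_id": "P1", "amount_reimbursed": 2500.0},
    {"beneficiary_id": "B2", "provider_id": "P1", "amount_reimbursed": 400.0},
    {"beneficiary_id": "B2", "provider_id": "P2", "amount_reimbursed": 700.0},
]
labels = {"P1": "Yes", "P2": "No"}

summary = summarize_claims(claims, labels)
for row in summary:
    print(row)
```

Each output row corresponds to one Beneficiary and Provider pair, which is the granularity of the 36,616-row dataset described above.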


Result 1: Top 1,000 Fraudulent Providers


We selected the dataset in "Discover" and asked it to find the top 1,000 fraudulent providers. Below is a screenshot of "Discover":

Medical Insurance Discover

When we click the "Result" button, we see the following result (screenshot of Result):

Medical Insurance Result

We find that, of the top 1,000 rows, 725 belong to Fraudulent Providers. Thus, we achieved an accuracy of 72.5% (725 / 1,000) on the top 1,000 rows. Please note that we achieved this without using any data from the Beneficiary dataset, which provides various details about each beneficiary. Below is a screenshot of the result:

Medical Insurance Result With Count
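
The accuracy figure quoted above is the share of fraud-labelled rows among the top k returned rows. A minimal sketch of that arithmetic follows. The function name fraud_share_in_top_k is hypothetical, and the list of labels is a placeholder built only to match the reported counts (725 "Yes" in 1,000 rows); it is not the actual ranked output.

```python
def fraud_share_in_top_k(ranked_labels, k):
    """Return the fraction of the first k labels that equal 'Yes'.

    ranked_labels: list of 'Yes' / 'No' labels, ordered from most to
        least suspicious (the ordering is assumed to come from the tool).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_rows = ranked_labels[:k]
    if not top_rows:
        raise ValueError("ranked_labels is empty")
    hits = sum(1 for label in top_rows if label == "Yes")
    return hits / len(top_rows)


# Placeholder labels matching the reported counts, not real output rows.
ranked = ["Yes"] * 725 + ["No"] * 275

share = fraud_share_in_top_k(ranked, 1000)
print(f"Fraud share in top 1,000 rows: {share:.1%}")
```

This measure only describes the rows that were returned. It says nothing about fraudulent providers that fall outside the top k.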


Result 2: Top 100 Fraudulent Providers


We selected the dataset in "Discover" and asked it to find the top 100 fraudulent providers. Below is a screenshot of "Discover":

Medical Insurance Discover

When we click the "Result" button, we see the following result (screenshot of Result):

Medical Insurance Result

We find that, of the top 100 rows, 83 belong to Fraudulent Providers. Thus, we achieved an accuracy of 83% (83 / 100) on the top 100 rows. This was also achieved without using any data from the Beneficiary dataset, which provides various details about each beneficiary. Below is a screenshot of the result:

Medical Insurance Result With Count

In the near future, we shall add some data from the Beneficiary dataset to improve the result and post an update here.