Most Advanced AI

Uncover the Hidden Gems of Your Business with Our Data Solution

Use AI/ML to stay ahead of your competitors and enhance your revenue and profitability.



Other Case Studies

Life Insurance Risk Assessment of Insurer

For this case study, we have taken Prudential Life Insurance data from Kaggle.com: Life Insurance Risk Assessment data

The data set consists of 59,381 life insurance applicants (number of rows) with 128 attributes (number of columns) which describe the characteristics of life insurance applicants.

The data set comprises categorical, continuous and discrete variables, all of which are anonymized.

The target is "Response" (the last column), which represents 8 levels of risk.

For further details, please click on the link Continue to Read


Health Insurance Fraud Detection

For this case study, we have taken data from Kaggle.com: Health Insurance Fraud Detection Data

The data set consists of the following data:
(1) Provider Data: It contains the ID of each healthcare provider and whether the provider is a Potential Fraud. It has 5,410 rows and two columns: Provider ID and Potential Fraud (No or Yes). Both are categorical.
(2) Beneficiary Data: It contains 138,556 rows and 25 columns. It contains KYC data (like Date of Birth, Gender, Race etc.), Health Condition (various diseases like Renal disease, Heart, Kidney etc.), State and Country Code.
(3) Inpatient Data: It contains data of those patients who were admitted to the hospital (healthcare provider) and filed a claim. It contains 40,474 rows and 30 columns.
(4) Outpatient Data: It contains data of those patients who were not admitted to the hospital and filed a claim. It contains 517,737 rows and 27 columns.

For further details, please click on the link Continue to Read


Liver Patients Detection

For this case study, we have taken data from UCI Machine Learning Repository: Liver Patients Data

The dataset consists of 583 rows and 11 attributes (including Target). However, 13 of the rows were duplicates. After removing them, we were left with 570 rows. Target 1 means the person is a liver patient and 2 means the person is not a liver patient.
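The duplicate-removal step can be sketched as below. This is only a minimal illustration using the Python standard library, not a description of how "Discover" works internally. The function name and the sample rows are hypothetical and are not taken from the actual data.

```python
def drop_duplicate_rows(rows):
    """Remove exact duplicate rows, keeping the first occurrence of each."""
    # A dict preserves insertion order, so the first copy of each row survives.
    return list(dict.fromkeys(tuple(row) for row in rows))


# Made-up sample rows (age, gender, target); NOT real records from the dataset.
sample_rows = [
    (65, "Female", 1),
    (62, "Male", 1),
    (65, "Female", 1),  # exact duplicate of the first row
    (17, "Male", 2),
]

unique_rows = drop_duplicate_rows(sample_rows)
print(len(sample_rows), "->", len(unique_rows))  # prints: 4 -> 3
```

Applied to the full file, the same idea would take the 583 rows down to the 570 rows mentioned above.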

Without any feature engineering, "Discover" achieved an accuracy of 95%. For further details, please click on the link Continue to Read


Breast Cancer Patients Detection

For this case study, we have taken data from UCI Machine Learning Repository: Breast Cancer Data

The dataset consists of 699 rows and 11 attributes (including Class). Class 2 means Benign Tumor (Not Cancer Patient) and 4 means Malignant Tumor (Breast Cancer patient).

Without any feature engineering, "Discover" achieved an accuracy of 97%. For further details, please click on the link Continue to Read


Countries Data Analysis

For this case study, we have taken data from Kaggle.com: Countries Data

The dataset consists of 227 rows (number of countries) and 20 attributes. However, we have used the following attributes:
  1. Country
  2. Region
  3. Population
  4. Area (sq. mi.)
  5. GDP ($ per capita)
  6. Literacy (%)
Further, we have retained only those rows which had non-null values for all the above attributes. Thus, we were left with 209 rows.
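The column selection and null filtering described above can be sketched as follows. This is an illustrative sketch only: the function name is hypothetical, the sample records are made up, and treating both None and blank strings as null is an assumption about how missing values appear in the file.

```python
COLUMNS = [
    "Country",
    "Region",
    "Population",
    "Area (sq. mi.)",
    "GDP ($ per capita)",
    "Literacy (%)",
]


def keep_complete_rows(records, columns=COLUMNS):
    """Keep only the chosen columns, and only rows where all of them are non-null."""
    kept = []
    for record in records:
        projected = {name: record.get(name) for name in columns}
        # Assumption: a value is "null" if it is None or a blank string.
        if all(v is not None and str(v).strip() != "" for v in projected.values()):
            kept.append(projected)
    return kept


# Made-up records for illustration; these are not rows from the real dataset.
records = [
    {"Country": "Examplia", "Region": "ASIA", "Population": "1000",
     "Area (sq. mi.)": "50", "GDP ($ per capita)": "700",
     "Literacy (%)": "36.0", "Birthrate": "46.6"},
    {"Country": "Samplestan", "Region": "EUROPE", "Population": "2000",
     "Area (sq. mi.)": "80", "GDP ($ per capita)": "4500",
     "Literacy (%)": ""},    # literacy missing, so the row is dropped
    {"Country": "Testland", "Region": "OCEANIA", "Population": "300",
     "Area (sq. mi.)": "10", "GDP ($ per capita)": None,
     "Literacy (%)": "97.0"},  # GDP missing, so the row is dropped
]

complete = keep_complete_rows(records)
print(len(complete))  # prints: 1
```

With a real CSV file, the records could come from `csv.DictReader`, which yields one dictionary per row keyed by the header names.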

For further details, please click on the link  Continue to Read


Rainfall Data Analysis

For this case study, we have taken data from www.indiawaterportal.org: Rainfall Data

As the data were provided in Excel format, we have converted them to CSV. The conversion process is very simple: just open the data in Excel and save it as CSV.

The dataset consists of 55,309 rows and 16 attributes. The data contains monthly rainfall from 1901 to 2002 for various states and districts. The columns are State, District, Year, Jan, Feb, ..., Nov, Dec and Vlookup. We have analysed yearly and quarterly rainfall data using "Discover".
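As a rough sketch, yearly and quarterly totals can be derived from the twelve monthly columns of each row as shown below. The function name and sample values are hypothetical. The month column names between Feb and Nov are assumed to follow the same three-letter pattern, and quarters are assumed to be calendar quarters (Q1 = Jan to Mar); the page does not state how "Discover" defines them.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def rainfall_totals(row):
    """Return the yearly total and four quarterly totals for one row."""
    monthly = [float(row[m]) for m in MONTHS]
    # Assumption: calendar quarters, i.e. three consecutive months starting in Jan.
    totals = {f"Q{i + 1}": sum(monthly[3 * i:3 * i + 3]) for i in range(4)}
    totals["Yearly"] = sum(monthly)
    return totals


# Made-up row for illustration: rainfall of 1.0, 2.0, ..., 12.0 for Jan..Dec.
sample_row = {"State": "Example State", "District": "Example District", "Year": "1901"}
sample_row.update({month: str(i + 1.0) for i, month in enumerate(MONTHS)})

totals = rainfall_totals(sample_row)
print(totals)
```

Each row in the file is one district-year, so applying the function row by row gives the quarterly and yearly series for every district.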

For further details, please click on the link  Continue to Read


Epileptic Seizure Recognition

For this case study, we have taken data from UCI Machine Learning Repository: Epileptic Seizure Recognition Data

The dataset consists of 11,500 rows and 180 columns. The original data contains recordings of brain activity for 500 persons. Each recording lasts 23.6 seconds and is sampled into 4,097 data points, where each data point is the value of the EEG recording at a different point in time. Each 4,097-point recording has been divided and shuffled into 23 chunks, with each chunk containing 178 data points covering 1 second. This gives 23 x 500 = 11,500 rows, each holding 178 data points as columns. The first column is the ID and the last column is the label y.

The last column, named 'y', takes a value from 1 to 5. The meaning of each value is given below:
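The chunking arithmetic can be illustrated with the sketch below. It is not the dataset authors' actual procedure: the function name is hypothetical, the shuffling step is omitted, and dropping the 3 leftover samples (23 x 178 = 4,094 out of 4,097) is an assumption made only to keep the example simple.

```python
def split_into_chunks(signal, n_chunks=23, chunk_len=178):
    """Split one recording into n_chunks consecutive chunks of chunk_len points."""
    needed = n_chunks * chunk_len
    if len(signal) < needed:
        raise ValueError(f"need at least {needed} points, got {len(signal)}")
    # Assumption: samples beyond n_chunks * chunk_len are simply dropped.
    return [signal[i * chunk_len:(i + 1) * chunk_len] for i in range(n_chunks)]


# Stand-in for one person's recording: 4097 dummy values, not real EEG data.
recording = list(range(4097))
chunks = split_into_chunks(recording)

print(len(chunks), len(chunks[0]))  # prints: 23 178
print(len(chunks) * 500)            # prints: 11500
```

Adding the ID column and the label y to the 178 data points gives the 180 columns of the final table.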

1 - Recording of seizure activity

2 - Recording of the EEG from the area where the tumor was located

3 - Recording of the EEG from the healthy brain area, after identifying where the tumor region was in the brain

4 - Eyes closed: the patient had their eyes closed while the EEG signal was being recorded

5 - Eyes open: the patient had their eyes open while the EEG signal was being recorded

For further details, please click on the link  Continue to Read


Chronic Kidney Disease

For this case study, we have taken data from UCI Machine Learning Repository: Chronic Kidney Disease Data

The dataset consists of 400 rows and 25 columns. The details of columns are given below:
(1) age (years)
(2) blood pressure
(3) specific gravity
(4) albumin
(5) sugar
(6) red blood cells
(7) pus cell
(8) pus cell clumps
(9) bacteria
(10) blood glucose random
(11) blood urea
(12) serum creatinine
(13) sodium
(14) potassium
(15) hemoglobin
(16) packed cell volume
(17) white blood cell count
(18) red blood cell count
(19) hypertension
(20) diabetes mellitus
(21) coronary artery disease
(22) appetite
(23) pedal edema
(24) anemia
(25) class

The last column, named 'class', takes the value "ckd" or "notckd". The value "ckd" means the person is suffering from Chronic Kidney Disease; "notckd" means the person is not.

For further details, please click on the link  Continue to Read