Most Advanced AI

Uncover the Hidden Gems of Your Business with Our Data Solution

Use AI/ML to stay ahead of your competitors and enhance your revenue and profitability.



Case Study - Countries Data Analysis

Dataset:

For this case study, we have taken data from Kaggle.com: Countries Data

The dataset consists of 227 rows and 20 attributes. The attributes are listed below (those marked * are the ones used in this case study):
  1. Country *
  2. Region *
  3. Population *
  4. Area (sq. mi.) *
  5. Pop. Density (per sq. mi.)
  6. Coastline (coast/area ratio)
  7. Net migration
  8. Infant mortality (per 1000 births)
  9. GDP ($ per capita) *
  10. Literacy (%) *
  11. Phones (per 1000)
  12. Arable (%)
  13. Crops (%)
  14. Other (%)
  15. Climate
  16. Birthrate
  17. Deathrate
  18. Agriculture
  19. Industry
  20. Service
Of these, we have used only the following attributes:
  1. Country
  2. Region
  3. Population
  4. Area (sq. mi.)
  5. GDP ($ per capita)
  6. Literacy (%)
Further, we have retained only those rows that have non-null values for all of the above attributes. This leaves us with 209 rows.
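The column selection and null filtering described above can be sketched with pandas. This is only an illustration under assumptions: the function name `prepare` and the toy frame are hypothetical, the column names are taken from the attribute list above, and the page does not say which tool was actually used for this step.

```python
import pandas as pd

# Columns used in the case study, named as in the attribute list above.
USED_COLUMNS = [
    "Country",
    "Region",
    "Population",
    "Area (sq. mi.)",
    "GDP ($ per capita)",
    "Literacy (%)",
]


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the six attributes used and drop rows with a null in any of them."""
    subset = df[USED_COLUMNS]
    return subset.dropna(how="any").reset_index(drop=True)


# Tiny made-up frame (not the Kaggle data) to show the behaviour.
toy = pd.DataFrame(
    {
        "Country": ["A", "B", "C"],
        "Region": ["R1", "R2", "R1"],
        "Population": [1000, 2000, 3000],
        "Area (sq. mi.)": [10, 20, 30],
        "GDP ($ per capita)": [500.0, None, 700.0],
        "Literacy (%)": [90.0, 80.0, None],
        "Climate": [1, 2, 3],  # an unused attribute, dropped by prepare()
    }
)
cleaned = prepare(toy)
print(cleaned)
```

With the real file, one would pass the result of `pd.read_csv(...)` on the downloaded CSV to `prepare`. According to the figures above, this should reduce 227 rows to 209.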


Result: Countries Data Result


We selected the dataset in "Discover" and asked it to find the top 1000 countries (as there are only 209, it listed all 209 countries). Please note that we have not done any feature engineering. A screenshot of "Discover" is given below:

[Screenshot: Countries Data Discover]

When we click on the "Result" button, we see the following result (screenshot given below):

[Screenshot: Countries Data Result]

Analysis of Result


Let us analyse the result given by Discover. The top two countries are China and India, and the key column for both is "Population". Note that these two countries each have a population above 1 billion, whereas the third most populous country, the United States, has a population of 298 million according to our dataset.

The next four countries are Russia, the United States, Canada and Brazil. Their key column is Area (sq. mi.). The key column is the column with the highest impact; the anomaly itself is computed from Population, Area, GDP and Literacy together. Russia has the largest area. Canada has a slightly larger area than the United States but is placed lower, because its population is much smaller than that of the United States.

The next country is Luxembourg, and its key column is GDP ($ per capita). It has a GDP per capita of $55,100, the highest among all countries in the dataset.

We have now analysed countries with high Population, Area and GDP. Next, let us take the case of Literacy. The first country listed for low literacy is Niger, with a literacy rate of 17.6%. The next country listed for low literacy is Pakistan, with 45.7%, which is not among the lowest in the data. It was probably listed because it also has a comparatively high GDP and a large population. Please also note that Discover placed countries near the top for high Population, high Area and high GDP, but for low Literacy, and it did so on its own (without any additional input or feature engineering).
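The idea of an anomaly computed from all four columns, with one key column per row, can be illustrated with a simple z-score sketch. This is not Discover's method, which is not described here. The scoring rule, the function name `key_columns`, the shortened column names and the toy rows are all assumptions made for illustration. Because the sketch uses the absolute z-score, it flags unusually high and unusually low values alike without being told which direction matters.

```python
from statistics import mean, pstdev

# Numeric columns the anomaly is said to be computed from (names shortened).
COLUMNS = ["Population", "Area", "GDP", "Literacy"]


def key_columns(rows):
    """Return (name, score, key_column) per row, most anomalous first.

    Assumed scoring (not Discover's documented method): z-score each column,
    use the sum of squared z-scores as the anomaly score, and call the column
    with the largest |z| the key column.
    """
    stats = {}
    for col in COLUMNS:
        values = [r[col] for r in rows]
        stats[col] = (mean(values), pstdev(values))

    ranked = []
    for r in rows:
        z = {}
        for col in COLUMNS:
            mu, sigma = stats[col]
            z[col] = (r[col] - mu) / sigma if sigma > 0 else 0.0
        score = sum(v * v for v in z.values())
        key = max(COLUMNS, key=lambda c: abs(z[c]))
        ranked.append((r["Name"], score, key))
    return sorted(ranked, key=lambda t: t[1], reverse=True)


# Made-up rows (not the Kaggle data): "A" has an unusually large population,
# "B" has an unusually low literacy rate.
toy = [
    {"Name": "A", "Population": 1000, "Area": 50, "GDP": 10, "Literacy": 90},
    {"Name": "B", "Population": 20, "Area": 51, "GDP": 11, "Literacy": 20},
    {"Name": "C", "Population": 25, "Area": 52, "GDP": 12, "Literacy": 91},
    {"Name": "D", "Population": 30, "Area": 48, "GDP": 9, "Literacy": 89},
    {"Name": "E", "Population": 22, "Area": 51, "GDP": 10, "Literacy": 92},
]
ranked = key_columns(toy)
for name, score, key in ranked:
    print(f"{name}: score={score:.2f}, key column={key}")
```

In this toy data, row "A" gets Population as its key column and row "B" gets Literacy, even though one value is extreme on the high side and the other on the low side. On real data with heavily skewed columns such as Population and Area, a log transform or a rank-based score may be a more sensible choice than plain z-scores.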