Applying Exploratory Data Analysis on the Iris Dataset

The Iris dataset is one of the easiest and most straightforward datasets to use.

The Iris dataset can be represented as a matrix in which each row is one flower and each column is one measured feature.

However, more often than not, a dataset also contains labels or output values.

Dataset D is mathematically expressed as

$D = \{(x_i, y_i)\}_{i=1}^{n}$

Furthermore, most labeled datasets contain well-defined class labels. In the case of the Iris dataset, the class labels are the flower names.

In the case of the Iris dataset, each x_i is a vector of real-valued measurements, whereas y_i is a value from the set of flower names.
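To make the notation concrete, here is a minimal Python sketch of the same structure. It shows only three rows (the first row of each species in the standard Iris data); the full dataset has 150 rows and 4 features. The names X, y and D are ours, chosen to mirror the notation above, and the column order is assumed to be sepal_length, sepal_width, petal_length, petal_width.

```python
# Feature matrix X: one row per flower, one column per feature.
# Assumed column order: sepal_length, sepal_width, petal_length, petal_width (cm).
X = [
    [5.1, 3.5, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5],
]

# Label vector y: one class label (flower name) per row of X.
y = ["setosa", "versicolor", "virginica"]

# Dataset D as a list of (x_i, y_i) pairs, matching the notation above.
D = list(zip(X, y))

for x_i, y_i in D:
    print(x_i, "->", y_i)
```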

Pair plots show how well the different flowers can be told apart.

In this analysis, we use this dataset to introduce the readers to exploratory data analysis.

Plotting the parameters of the different species helps in finding relations that are useful for distinguishing between these flowers.

Some of the observations we found were:

Using sepal_length and sepal_width features, we can distinguish Setosa flowers from others.
Separating Versicolor from Virginica is much harder as they have considerable overlap.
petal_length and petal_width are the most useful features to identify various flower types.
While Setosa can be easily identified (linearly separable), Virginica and Versicolor have some overlap (almost linearly separable).
We can find “lines” and “if-else” conditions to build a simple model to classify the flower types.
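As an illustration of the last point, such an "if-else" model might look like the sketch below. The function name and both thresholds (2.5 cm for petal length, 1.75 cm for petal width) are our own illustrative choices read off the general shape of the pair plot, not values taken from the original notebook. The three sample flowers are the first row of each species in the standard Iris data.

```python
def classify_iris(petal_length, petal_width):
    """Classify a flower from its petal measurements (in cm).

    The thresholds are illustrative assumptions, not fitted values.
    """
    # Setosa has much shorter petals than the other two species.
    if petal_length < 2.5:
        return "setosa"
    # Versicolor and Virginica overlap, so this rule will misclassify
    # some flowers that lie near the boundary.
    if petal_width < 1.75:
        return "versicolor"
    return "virginica"


# (petal_length, petal_width) and the true species for three sample flowers.
samples = [
    ((1.4, 0.2), "setosa"),
    ((4.7, 1.4), "versicolor"),
    ((6.0, 2.5), "virginica"),
]

for (petal_length, petal_width), expected in samples:
    predicted = classify_iris(petal_length, petal_width)
    print(f"predicted={predicted}, expected={expected}")
```

The first condition corresponds to the "line" that cleanly separates Setosa, while the second is only an approximate boundary, which matches the overlap between Versicolor and Virginica noted above.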
