The Iris dataset is one of the easiest and most straightforward datasets to use.

The Iris dataset can be represented as a matrix in which each row is a flower and each column is a measured feature:

            Petal length   Petal width   Sepal length   Sepal width
Flower-1
Flower-2
:
Flower-n

However, more often than not, a dataset also contains labels or output values.

Dataset D is mathematically expressed as

$$ D = \{(x_i, y_i)\}_{i=1}^{n} $$

Furthermore, most labeled datasets contain class labels. In the case of the Iris dataset, the class labels are the flower names.

In the case of the Iris dataset, each $x_i$ is a vector of four real numbers (the measurements of one flower), whereas $y_i$ is a value from a set of flower names.

            Petal length   Petal width   Sepal length   Sepal width   Flower type
Flower-1                                                              Virginica
Flower-2                                                              Setosa
:
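The labeled representation above can be mirrored in code as a list of (features, label) pairs. This is only a minimal sketch: the variable names are made up for illustration, and the three rows are, to the best of my knowledge, the first Setosa, Versicolor, and Virginica records of the classic dataset, so treat them as illustrative values rather than a full copy of the data.

```python
from typing import List, Tuple

# Column order follows the table: petal length, petal width,
# sepal length, sepal width (all in cm).
FeatureVector = Tuple[float, float, float, float]

# Illustrative rows only (assumed to match the classic dataset).
features: List[FeatureVector] = [
    (1.4, 0.2, 5.1, 3.5),
    (4.7, 1.4, 7.0, 3.2),
    (6.0, 2.5, 6.3, 3.3),
]
labels: List[str] = ["setosa", "versicolor", "virginica"]

# D = {(x_i, y_i)} for i = 1..n
dataset: List[Tuple[FeatureVector, str]] = list(zip(features, labels))

n = len(dataset)               # number of flowers in this tiny sample
n_features = len(features[0])  # four measurements per flower

for x_i, y_i in dataset:
    print(x_i, "->", y_i)
```

Each `x_i` is a tuple of four real numbers and each `y_i` is drawn from the set of flower names, which matches the definition of D.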

Pair plots show that the features can be used to differentiate between the flower types.

In this analysis, we use this dataset to introduce readers to exploratory data analysis.

Plotting the features of the different species against one another helps in finding relationships that distinguish these flowers.

Some of the observations we made were:

  1. Using sepal_length and sepal_width features, we can distinguish Setosa flowers from others.
  2. Separating Versicolor from Virginica is much harder as they have considerable overlap.
  3. petal_length and petal_width are the most useful features for identifying the various flower types.
  4. While Setosa can be easily identified (linearly separable), Virginica and Versicolor have some overlap (almost linearly separable).
    We can find “lines” and “if-else” conditions to build a simple model to classify the flower types.
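The "if-else" idea from the last observation can be sketched as a tiny rule-based classifier. The function name and both thresholds (2.5 cm for petal length, 1.75 cm for petal width) are assumptions chosen for illustration, roughly where such lines tend to fall on a petal pair plot; they are not values taken from this analysis.

```python
def classify_flower(petal_length: float, petal_width: float) -> str:
    """Classify an Iris flower with two hand-picked thresholds (in cm).

    Both cut-offs are illustrative assumptions, not fitted values.
    """
    # Setosa petals are much shorter than those of the other species,
    # so a single cut on petal length separates them.
    if petal_length < 2.5:
        return "setosa"
    # Versicolor and Virginica overlap, so this second cut is only
    # approximate and will misclassify some flowers near the boundary.
    if petal_width < 1.75:
        return "versicolor"
    return "virginica"


print(classify_flower(1.4, 0.2))
print(classify_flower(4.7, 1.4))
print(classify_flower(6.0, 2.5))
```

The first rule reflects the linear separability of Setosa, while the second is a best-effort line through the region where Versicolor and Virginica overlap.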