January 22, 2019
By Databrio Admin

Visualization Of Data In Python Part 1


“Every second of every day, our senses bring in way too much data than we can possibly process in our brains.” – Peter Diamandis, Chairman/CEO, X-Prize Foundation.

However far-fetched and exaggerated the quote may seem, it is nothing but the truth. The amount of data that surrounds us is immense. This data provides us with information, which in turn provides us with insights. But while collecting a lot of data may seem very useful, the sheer quantity of data is a disadvantage in itself. It is not humanly possible to reach a satisfactory conclusion simply by looking through the raw values. This is where Data Visualization comes in handy. This blog covers how data can be visualized using a very common language, Python.

Why Python?

Python, being an object-oriented, open source, flexible, and easy-to-use programming language, is widely used by Data Analysts and Data Scientists. Its wide variety of libraries also makes it easier to analyze data.

How to visualize data using Python?

Python has two widely used libraries dedicated to Data Visualization. They are:

Matplotlib: Matplotlib is a Python-based plotting library. It mainly provides 2D visualizations of data while also supporting limited 3D visualizations.
Seaborn: Seaborn is built on top of Matplotlib. It provides features such as numerous color palettes, themes, tools to visualize statistical data and time series, and many more.

How to visualize data using Python libraries?

Histogram

Histograms are used to visualize the distribution of continuous data. By visualizing data using a histogram, we can approximate statistical values like the mean, median, mode of data, and the distribution of the variable.

Creating a Histogram in Python:

The dataset used here is the iris data, loaded through seaborn's load_dataset function. The same dataset also ships with the scikit-learn package.

import matplotlib.pyplot as plt
import seaborn as sns

df = sns.load_dataset("iris")

# Creating the histogram plot selecting the required continuous variable from the dataset
df.sepal_length.hist()

plt.title("Sepal Length Distribution")  # setting plot title
plt.xlabel("Sepal Length")  # setting x-axis label
plt.ylabel("Count of iris plants")  # setting y-axis label
plt.show()
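
To make it clearer what the bars of a histogram represent, the following is a minimal sketch that computes the bin counts directly with NumPy instead of drawing them. The measurements are hypothetical values made up for illustration (they are not taken from the iris data), and the choice of 5 bins is an assumption.

```python
import numpy as np

# Hypothetical sepal-length-like measurements in cm, made up for illustration.
values = np.array([4.9, 5.0, 5.1, 5.4, 5.8, 5.8, 6.1, 6.3, 6.7, 7.2])

# Split the value range into 5 equal-width bins and count the values in each bin.
# Each count corresponds to the height of one bar in a histogram.
counts, bin_edges = np.histogram(values, bins=5)

for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:.2f} to {right:.2f}: {count}")

# Exact summary statistics, for comparison with what can be read off the plot.
mean_value = values.mean()
median_value = np.median(values)
print(f"mean={mean_value:.2f}, median={median_value:.2f}")
```

The tallest bars mark where most values lie, which is why the mean, median and mode can be estimated roughly from the shape of the plot.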

Box Plot

Box Plots show the spread of data from the minimum to the maximum value, with the box marking the quartiles. Box Plots are used extensively to detect outliers in a given dataset. Any data point outside the upper and lower limits is usually considered an outlier.

Upper Limit: Q3 + 1.5 * (Q3 – Q1), and,

Lower Limit: Q1 – 1.5 * (Q3 – Q1),

where Q3 and Q1 are the third and first quartiles respectively, and (Q3 – Q1) is the Inter-Quartile Range. Box plots can also be drawn side by side to compare two continuous variables, or to compare one continuous variable across the levels of a categorical variable.
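
The limits above can be computed by hand as a check on what a box plot flags. The following is a minimal sketch, assuming pandas and its default (linear interpolation) quantile method. The function name iqr_limits and the sample values are hypothetical and chosen only for illustration.

```python
import pandas as pd


def iqr_limits(series):
    """Return (lower_limit, upper_limit) using the 1.5 * IQR rule."""
    q1 = series.quantile(0.25)  # first quartile
    q3 = series.quantile(0.75)  # third quartile
    iqr = q3 - q1               # Inter-Quartile Range
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Hypothetical sample with one obviously extreme value.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

lower, upper = iqr_limits(values)

# Keep only the points that fall outside the two limits.
outliers = values[(values < lower) | (values > upper)]
print(lower, upper)
print(outliers.tolist())
```

Different tools may compute quartiles with slightly different interpolation rules, so the limits can differ a little between libraries on small samples.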

Creating a Box Plot in Python:

The iris dataset is used here as well.

import matplotlib.pyplot as plt
import seaborn as sns

df = sns.load_dataset("iris")

plt.boxplot(df["sepal_length"])  # Creating a boxplot using matplotlib
plt.show()

sns.boxplot(x=df["sepal_length"])  # Creating a boxplot using seaborn
plt.show()

Fig: Box Plot of sepal length using matplotlib

Fig: Box Plot of sepal length using seaborn

Bar Charts and Stacked Bar Charts

Bar Charts are mostly used to compare a value across the categories of a categorical variable. Stacked Bar Charts are usually used to compare multiple metrics across different categories, with the metrics stacked on top of each other in a single bar per category.

Creating Bar Chart and Stacked Bar Chart in Python:

Here again, the iris dataset has been used.

import matplotlib.pyplot as plt
import seaborn as sns

df = sns.load_dataset("iris")

# selecting "species" as the independent variable and plotting the mean dimensions of the different species of iris flowers
df.groupby("species").mean().plot(kind="bar")

# creating stacked bar chart
df.groupby("species").mean().plot(kind="bar", stacked=True, color=["red", "blue", "green", "yellow"], grid=False)
plt.show()

Fig: Bar Chart comparing mean dimensions of different iris flower species

Fig: Stacked Bar Chart comparing mean dimensions of different iris flower species
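
To see the numbers behind these two charts, the following is a minimal sketch of the same groupby and mean step on a tiny hand-made table. The species labels and measurements are hypothetical and only mimic the layout of the iris data, with two columns instead of four.

```python
import pandas as pd

# Hypothetical measurements, made up for illustration.
df = pd.DataFrame({
    "species": ["a", "a", "b", "b"],
    "sepal_length": [5.0, 5.2, 6.0, 6.4],
    "petal_length": [1.4, 1.6, 4.0, 4.4],
})

# One row per species, one column per measurement.
# In the bar chart, each cell of this table becomes one bar.
means = df.groupby("species").mean()
print(means)

# In the stacked bar chart, the bars of one species are stacked,
# so the total height of each stack is the row sum.
stack_heights = means.sum(axis=1)
print(stack_heights)
```

Calling means.plot(kind="bar") or means.plot(kind="bar", stacked=True) on this table would draw the corresponding charts.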

Line Chart

Line Charts are usually plotted to study time series data. They are mostly used to detect trends over time.

Creating a Line Chart in Python:

Here the flights dataset from the seaborn package has been used. It records the number of airline passengers for every month from 1949 to 1960.

import matplotlib.pyplot as plt
import seaborn as sns

df1 = sns.load_dataset("flights")

# This creates a trend line of the number of passengers taking flights,
# month by month, over the 12-year period (1949 to 1960) covered by the dataset
df1.plot(x="month", y="passengers", kind="line")
plt.ylabel("#passengers")
plt.xlabel("months")
plt.show()

Fig: Line Chart showing the trend in the number of flight passengers over the months
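
When the same month names repeat across several years, it can help to combine year and month into a single date so that the x-axis runs in true time order. The following is a minimal sketch of that idea. The table is hypothetical: it mimics the year, month and passengers columns of the flights data, assumes English three-letter month abbreviations, and uses made-up passenger counts.

```python
import pandas as pd

# Hypothetical data laid out like the flights dataset; the counts are made up.
flights_like = pd.DataFrame({
    "year": [1949, 1949, 1949, 1950, 1950, 1950],
    "month": ["Jan", "Feb", "Mar", "Jan", "Feb", "Mar"],
    "passengers": [100, 110, 125, 115, 120, 140],
})

# Build strings such as "1949-Jan" and parse them into dates
# (each date is the first day of the month).
dates = pd.to_datetime(
    flights_like["year"].astype(str) + "-" + flights_like["month"],
    format="%Y-%b",
)

# Index the passenger counts by date to get a proper time series.
series = flights_like.set_index(dates)["passengers"]
print(series)
```

Calling series.plot(kind="line") on the result would then draw the passenger counts against a continuous date axis rather than against repeated month labels.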

Data Brio Academy offering Python Programming in Kolkata

Are you interested in learning more about data visualization using Python? Join us! Our Python programming course in Kolkata will equip you with practical skills as well as in-depth knowledge to make you industry-ready.

At Data Brio Academy, you’ll learn about Python’s extensive collection of libraries and how to use them for analytics. You’ll also learn about techniques related to data management, data visualization, and Python usage in ML algorithms.

Industry-level training and guidance from world-class mentors will prepare you for career growth.
