Data Visualization
INTRODUCTION:
When data is shown as pictures, it becomes easier for the user to understand. Representing data in the form of pictures or graphs is called “data visualization”.
It reveals patterns, trends, correlations, etc. in data and thereby helps decision makers understand what the data means when making business decisions.
Several data visualization libraries are available in Python, such as Matplotlib, Seaborn, and Folium.
Matplotlib is a Python library that provides many interfaces and functions for presenting data as 2D graphics. It is a high-quality plotting library for Python.
The Matplotlib library offers several submodules; Pyplot is one of them.
Pyplot is a collection of functions within the Matplotlib library that lets the user construct 2D plots easily. It supports three kinds of operations:
- Drawing: plots can be drawn from the data passed to specific functions.
- Customization: plots can be customized as required through the arguments of the functions. Properties such as color, style (dashed, dotted) and width can be set, and labels, a title and a legend can be added.
- Saving: after drawing and customization, plots can be saved for future use.
Installing and importing Matplotlib:
Run the following in the command prompt:
python -m pip install -U pip
python -m pip install -U matplotlib
or
pip install matplotlib
To use Pyplot for data visualization, we have to first import it in our python environment.
import matplotlib.pyplot as plt
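A quick way to confirm that the installation and import worked is to print the installed version. This is only a small sketch; the version number printed depends on what is installed on your machine.

```python
import matplotlib
import matplotlib.pyplot as plt

# Print the installed version to confirm that the import works.
print("Matplotlib version:", matplotlib.__version__)

# plt is the conventional short alias for the pyplot submodule.
print("Imported module:", plt.__name__)
```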
Types of plot using matplotlib
- LINE PLOT
- BAR GRAPH
- HISTOGRAM
- PIE CHART
- FREQUENCY POLYGON
- BOX PLOT
- SCATTER PLOT
Line Plot:
A line plot (or line chart) is a graph that shows the frequency of data along a number line.
It is drawn as a series of data points connected by straight line segments. Line plots are generally used to display trends over time.
A line plot or line graph can be created using the plot() function available in the pyplot library. Besides plotting the line itself, we can explicitly define the grid, the x-axis and y-axis scales and labels, the title, and other display options.
1. Drawing a simple line
2. Setting the x-axis and y-axis labels and adding a title
3. Changing the line color, line width and line style
4. Changing the marker type, size and color
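The four steps above can be sketched in one short program. The data (years and sales figures) is hypothetical and chosen only for illustration; the colors and styles are arbitrary choices.

```python
import matplotlib.pyplot as plt

# Hypothetical sample data: sales (in thousands) over five years.
years = [2016, 2017, 2018, 2019, 2020]
sales = [12, 18, 15, 24, 30]

plt.figure()

# Steps 1, 3 and 4: draw the line and set its color, width, style and markers.
lines = plt.plot(
    years, sales,
    color="green", linewidth=2, linestyle="--",
    marker="o", markersize=8, markerfacecolor="red",
)

# Step 2: axis labels and title.
plt.xlabel("Year")
plt.ylabel("Sales (thousands)")
plt.title("Yearly sales")
plt.grid(True)
```

Calling plt.show() afterwards displays the figure in a window.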
Bar Graph
A bar graph is used to represent data in the form of vertical or horizontal bars. It is useful for comparing quantities.
To plot multiple data ranges in one bar chart:
- Decide the number of X points; we can use the arange() or linspace() function to generate the points based on the length of the sequence of values.
- Decide the thickness of each bar and adjust the X points on the x-axis accordingly.
- Give a different color to each data range.
- Keep the width the same for all data ranges being plotted.
- Call bar() for each data range.
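The steps above can be sketched as follows. The subjects and marks are made-up values, and the bar width of 0.35 is just one reasonable choice for two data ranges.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: marks of two classes in four subjects.
subjects = ["Maths", "Science", "English", "Hindi"]
class_a = [78, 85, 69, 90]
class_b = [82, 74, 88, 65]

x = np.arange(len(subjects))   # one X point per subject: 0, 1, 2, 3
width = 0.35                   # same bar thickness for both data ranges

plt.figure()
# Call bar() once per data range, shifting each range so the bars sit side by side.
bars_a = plt.bar(x - width / 2, class_a, width, color="steelblue", label="Class A")
bars_b = plt.bar(x + width / 2, class_b, width, color="orange", label="Class B")

plt.xticks(x, subjects)        # put the subject names at the centre of each group
plt.xlabel("Subject")
plt.ylabel("Marks")
plt.title("Marks by subject")
plt.legend()
```

Calling plt.show() afterwards displays the chart.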
Pie Chart
A pie chart shows a circle that is divided into sectors, where each sector represents a proportion of the whole.
Pie charts show proportions and percentages between categories by dividing a circle into proportional segments. The arc length of each segment represents the proportion of its category, while the full circle represents the total of all the data, equal to 100%.
import matplotlib.pyplot as plt

# Data to plot
labels = 'Candidate1', 'Candidate2', 'Candidate3', 'Candidate4'
votes = [315, 130, 245, 210]
sizes = votes
colors = ['gold', 'yellowgreen', 'lightcoral', 'lightskyblue']
explode = (0.1, 0, 0, 0)  # explode 1st slice

# Plot
plt.pie(sizes, explode=explode, labels=labels, colors=colors,
        autopct='%1.1f%%', shadow=True, startangle=140)
plt.axis('equal')  # equal aspect ratio so the pie is drawn as a circle
plt.show()
OUTPUT:
The pie chart drawn using matplotlib.pyplot can be customized in several ways:
- Sometimes we want to emphasize one or more slices by showing them slightly pulled out. This feature is called explode. If we want to pull out the 2nd and 3rd slices of 5 slices by 0.2 and 0.3 units respectively, explode will be [0, 0.2, 0.3, 0, 0]. Each value gives the fraction of the radius by which the slice is offset from the centre: 0 leaves the slice in place, and values typically range from 0.1 to 1.
- The startangle parameter rotates the start of the pie chart by the specified number of degrees. The rotation is counterclockwise, measured from the x-axis.
- A shadow effect can be added using the shadow parameter of the pie() function. Passing shadow=True draws a shadow beneath the pie chart, which can improve its look. The default value is False, in which case no shadow is drawn.
- The wedges of the pie chart can be further customized using the wedgeprops parameter. A Python dictionary of name-value pairs describing wedge properties such as edgecolor and linewidth can be passed as the wedgeprops argument.
- By setting the frame argument to True, the axes frame is drawn around the pie chart.
- The autopct parameter of the pie() function controls how the percentages are displayed in the wedges. Either a format string or a function can be specified.
- For example, autopct='%.1f%%' specifies how to display the percentages on the slices. Here %.1f means that the percentage value is displayed with 1 digit after the decimal point. The two % symbols that follow produce a single literal % sign.
- e.g., '%.1f' displays values such as 25.0, 35.2 and so on, while '%.2f%%' displays values such as 50.25%, 75.50% and so on.
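The customizations listed above can be combined in one call. The following is a minimal sketch with hypothetical data; show_percent is an illustrative helper name, used to demonstrate that autopct also accepts a function.

```python
import matplotlib.pyplot as plt

# Hypothetical data: five categories whose values sum to 100.
labels = ["A", "B", "C", "D", "E"]
sizes = [10, 20, 30, 25, 15]
explode = [0, 0.2, 0.3, 0, 0]   # pull out the 2nd and 3rd slices


def show_percent(pct):
    """Format the slice percentage that pie() passes in, e.g. 25.0 -> '25.0%'."""
    return f"{pct:.1f}%"


plt.figure()
wedges, texts, autotexts = plt.pie(
    sizes,
    labels=labels,
    explode=explode,
    startangle=90,          # start 90 degrees counterclockwise from the x-axis
    autopct=show_percent,   # a function instead of a format string
    wedgeprops={"edgecolor": "black", "linewidth": 1},
)
plt.axis("equal")           # keep the pie circular
```

Calling plt.show() afterwards displays the chart.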
Histogram
A histogram shows the distribution of values. It is similar to a bar graph, but it shows values grouped into bins or intervals.
A histogram provides a visual interpretation of numerical data by showing the number of data points that fall within each specified range of values (“bins”). It looks like a vertical bar graph without gaps between the bars.
For example, we can collect the age of each employee in an office and show it as a histogram to see how many employees fall in the range 0-10 years, 10-20 years and so on.
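A minimal sketch of such an age histogram is shown here. The list of ages is hypothetical, and the bin edges are chosen to match the 10-year ranges mentioned in the text.

```python
import matplotlib.pyplot as plt

# Hypothetical ages of the employees in an office.
ages = [22, 25, 27, 31, 34, 35, 38, 41, 42, 45, 47, 52, 55, 58, 63]
bins = list(range(0, 71, 10))   # bin edges: 0-10, 10-20, ..., 60-70

plt.figure()
# hist() returns the count in each bin, the bin edges and the drawn bars.
counts, edges, patches = plt.hist(ages, bins=bins, color="skyblue", edgecolor="black")
plt.xlabel("Age (years)")
plt.ylabel("Number of employees")
plt.title("Age distribution of employees")
```

Here counts holds the number of employees in each 10-year range; plt.show() displays the chart.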
Frequency Polygons
A frequency polygon is a way of understanding the shape of a distribution. It is drawn by connecting the top-centre points of the histogram bins with straight lines. It serves the same purpose as a histogram but is especially useful for comparing sets of data.
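Pyplot has no dedicated frequency polygon function, so one common approach (an assumption here, not the only way) is to compute the bin counts with NumPy's histogram() and join the bin midpoints with plot(). The ages are hypothetical sample data.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical ages of the employees in an office.
ages = [22, 25, 27, 31, 34, 35, 38, 41, 42, 45, 47, 52, 55, 58, 63]
bins = np.arange(0, 71, 10)                  # bin edges: 0, 10, ..., 70

counts, edges = np.histogram(ages, bins=bins)
midpoints = (edges[:-1] + edges[1:]) / 2     # centre of each bin

plt.figure()
# Draw the histogram lightly in the background for reference.
plt.hist(ages, bins=bins, color="lightgray", edgecolor="black")
# Join the top-centre point of each bin to form the frequency polygon.
plt.plot(midpoints, counts, color="blue", marker="o")
plt.xlabel("Age (years)")
plt.ylabel("Number of employees")
plt.title("Frequency polygon of employee ages")
```

To compare two data sets, plot one such line per data set on the same axes.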
Box Plot
A box plot is a graphical representation of the five-number summary of a given data set. It includes:
1. Maximum
2. Minimum
3. 1st Quartile
4. 2nd Quartile (Median)
5. 3rd Quartile
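The five-number summary and the corresponding box plot can be sketched as follows. The marks are hypothetical, and the quartiles are computed with NumPy's default (linear interpolation) method, which is one of several conventions for quartiles.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical marks of eleven students.
marks = [35, 42, 48, 50, 55, 60, 62, 70, 78, 85, 90]

# Five-number summary: minimum, 1st quartile, median, 3rd quartile, maximum.
minimum, q1, median, q3, maximum = np.percentile(marks, [0, 25, 50, 75, 100])
print("Min:", minimum, "Q1:", q1, "Median:", median, "Q3:", q3, "Max:", maximum)

plt.figure()
# boxplot() draws the box from Q1 to Q3 with a line at the median.
result = plt.boxplot(marks)
plt.ylabel("Marks")
plt.title("Box plot of marks")
```

Calling plt.show() afterwards displays the plot.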
Scatter Chart
A scatter plot is a type of plot that shows the data as a collection of points in the form of dots, and shows the relationship between two variables, one plotted along the x-axis and the other along the y-axis.
Syntax: plt.scatter(x, y, color=..., marker=...)
Marker is a symbol (style) used to represent a data point. The following is a list of valid marker styles:
Marker | Description
's' | Square marker
'o' | Circle marker
'd' | Thin diamond marker
'x' | Cross (x) marker
'+' | Plus marker
'^' | Triangle up
'v' | Triangle down
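A minimal scatter plot sketch using two of the marker styles listed above. The data (hours studied against marks for two sections) is hypothetical.

```python
import matplotlib.pyplot as plt

# Hypothetical data: hours studied and marks obtained in two sections.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
marks_a = [35, 42, 50, 55, 63, 70, 78, 85]
marks_b = [30, 38, 41, 52, 58, 61, 72, 80]

plt.figure()
# One scatter() call per data set, each with its own color and marker.
pts_a = plt.scatter(hours, marks_a, color="red", marker="o", label="Section A")
pts_b = plt.scatter(hours, marks_b, color="blue", marker="^", label="Section B")

plt.xlabel("Hours studied")
plt.ylabel("Marks")
plt.title("Hours studied vs. marks")
plt.legend()
```

Calling plt.show() afterwards displays the plot.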
How to save plot
To reuse a plot later, we have to save it. The savefig() method is used to save a plot. Plots can be saved in file formats such as PDF, SVG, PNG and JPG.
plt.savefig('line_plot.pdf')
plt.savefig('line_plot.svg')
plt.savefig('line_plot.png')
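Parameters for saving plots, e.g. (in recent Matplotlib versions the JPEG options are passed through pil_kwargs):
plt.savefig('line_plot.jpg', dpi=300, pil_kwargs={'quality': 80, 'optimize': True, 'progressive': True})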
Which Export Format to Use?
Exporting to vector-based SVG or PDF files is generally preferred over bitmap-based PNG or JPG files, as they are richer formats that stay sharp at any zoom level and usually provide higher-quality plots along with smaller file sizes.
Example:
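The following sketch saves one plot in three formats. The plotted values are hypothetical, and the files are written to a temporary folder (an assumption made here so that the example does not clutter the working directory); in practice you would pass your own file path.

```python
import os
import tempfile
import matplotlib.pyplot as plt

plt.figure()
plt.plot([1, 2, 3, 4], [10, 20, 15, 30], marker="o")
plt.title("Sample line plot")

# Temporary output folder used only for this sketch.
out_dir = tempfile.mkdtemp()
saved_files = []
for ext in ("pdf", "svg", "png"):
    path = os.path.join(out_dir, "line_plot." + ext)
    plt.savefig(path, dpi=300)   # the format is inferred from the file extension
    saved_files.append(path)

print(saved_files)
```

Call savefig() before plt.show(), because the figure may be cleared once the display window is closed.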
****************************************************
Assignment
2. Name the function used to draw a horizontal bar graph in Python.
3. What is the use of the legend() function in a bar graph?
4. What is the use of the xlim() function in a bar graph?
5. What is the use of the xticks() function in a bar graph?