Wednesday, April 22, 2020

Pandas-II


 
Data handling using Pandas – II

Descriptive Statistics


Descriptive statistics are used to describe and summarize large amounts of data in ways that are meaningful and useful. They are the "must knows" for any set of data, giving us a general idea of the trends in our data, including:

• The mean, mode, median and range.

• Variance, standard deviation and quartiles.

• Sum, count, maximum and minimum.

Descriptive statistics are useful because they allow us to take decisions.
For example,
let's say we have data on the incomes of one million people. No one is going to read a million pieces of data, and even if they did, they would not be able to extract any useful information from it. On the other hand, if we summarize it, it becomes useful: an average wage or a median income is much easier to understand than reams of data.


Steps to Get the descriptive statistics

Step 1: Collect the data, either from a data file or from the user.

Step 2: Create a DataFrame from the collected data (a pandas object).

Step 3: Get the descriptive statistics from the pandas DataFrame as per requirement, e.g. mean, mode, max, sum etc.

Note: The DataFrame object is best suited for descriptive statistics, as it can hold a large amount of data and provides the relevant functions.

The pandas DataFrame object provides methods to calculate max, min, count, sum, mean, median, mode, quantile, standard deviation and variance.



Mean



Mean is the average of all the numbers. The steps required to calculate a mean are listed below (a small sketch follows the list):

  • sum up all the values of a target variable in the data-set.
  • divide the sum by the number of values.
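As a quick illustration of those two steps, here is a minimal sketch (the values are arbitrary examples, not taken from the dataset used later):

import pandas as pd

values = pd.Series([10, 20, 30, 40])
total = values.sum()              # step 1: sum of all the values -> 100
mean = total / values.count()     # step 2: divide by the number of values -> 25.0
print(mean, values.mean())        # pandas' built-in mean() gives the same result
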
Median

Median is the middle value of a sorted list of numbers. The steps required to get the median from a list of numbers are listed below (a small sketch follows the list):



  • sort the numbers from smallest to highest.
  • if the list has an odd number of values, the value in the middle position is the median.
  • if the list has an even number of values, the average of the two values in the middle will be the median.
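A small sketch of both cases, using arbitrary example numbers:

import pandas as pd

odd = pd.Series([7, 1, 5])        # sorted: 1 5 7 -> the middle value is 5
even = pd.Series([7, 1, 5, 3])    # sorted: 1 3 5 7 -> average of 3 and 5
print(odd.median())               # 5.0
print(even.median())              # 4.0
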
Mode

To find the mode, or modal value, it is best to put the numbers in order and then count how many of each number there are. The number that appears most often is the mode.
e.g. {19, 8, 29, 35, 19, 28, 15}.
Arrange them in order: {8, 15, 19, 19, 28, 29, 35}.
19 appears twice and all the rest appear only once, so 19 is the mode.
Having two modes is called "bi-modal".
Having more than two modes is called "multi-modal".
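The example above can be checked with pandas; a second Series is added only to show that mode() returns every value that ties for the highest count (a bi-modal case). A minimal sketch:

import pandas as pd

s = pd.Series([19, 8, 29, 35, 19, 28, 15])
print(s.mode())           # 19 appears twice, so 19 is the (single) mode

bimodal = pd.Series([2, 2, 5, 5, 9])
print(bimodal.mode())     # both 2 and 5 appear twice -> two modes (bi-modal)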

Example:

import pandas as pd
import numpy as np

# Step 1 & 2: collect the data and create a DataFrame from a dictionary of Series
d={'Name':pd.Series(['Disha','Rohit','Manas','Krish','Vishnu']),
   'Age':pd.Series([17,18,19,18,19]),
   'Score':pd.Series([9,8,9,8,9])}
df=pd.DataFrame(d)
print("Data Frame is :")
print (df)

# Step 3: get the descriptive statistics
print("No. of records in above DF are:")
print(df.count())
print("Sum of Score",df[['Score']].sum())
print("Minimum Age",df[['Age']].min())
print("Maximum Age",df[['Age']].max())
print("Mean Score",df[['Score']].mean())
print("Mode Score",df[['Score']].mode())
print("Median Score",df[['Score']].median())


OUTPUT:

Data Frame is :
    Name   Age   Score
0   Disha   17      9
1   Rohit   18      8
2   Manas   19      9
3   Krish   18      8
4  Vishnu   19      9

No. of records in above DF are:
Name     5
Age      5
Score    5
dtype: int64

Sum of Score 
Score    43
dtype: int64

Minimum Age 
Age    17
dtype: int64

Maximum Age 
Age    19
dtype: int64

Mean Score 
Score    8.6
dtype: float64

Mode Score    
     Score
0      9

Median Score 
Score    9.0
dtype: float64


Use of the describe() function
# the describe() method returns count, mean, standard deviation, min, max and percentile values

Example-

#program for data aggregation /descriptive statistics using describe method

from pandas import DataFrame
Cars={'Brand':['Maruti','Ford','Tata','Toyota','Audi'],   #step-1
      'Price':[22000,27000,25000,29000,35000],
      'Year':[2014,2015,2016,2017,2018] }
df=DataFrame(Cars, columns=['Brand','Price','Year'])       #step-2
ds=df['Price'].describe().astype(int)                      #step-3
print (df)
print(ds)

OUTPUT:

   Brand  Price  Year
0  Maruti  22000  2014
1    Ford  27000  2015
2    Tata  25000  2016
3  Toyota  29000  2017
4    Audi  35000  2018
count        5
mean     27600
std       4878
min      22000
25%      25000
50%      27000
75%      29000
max      35000

Name: Price, dtype: int32


Note: The describe() function is used to generate descriptive statistics.
For numeric data, the result's index will include count, mean, std, min and max, as well as the lower, 50 and upper percentiles. By default the lower percentile is 25 and the upper percentile is 75. The 50 percentile is the same as the median.
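If other cut points are needed, describe() also accepts a percentiles argument. A brief sketch (the 0.10 and 0.90 percentiles and the Price values are arbitrary choices for illustration):

import pandas as pd

prices = pd.Series([22000, 27000, 25000, 29000, 35000], name='Price')
# ask for the 10th and 90th percentiles instead of the default 25th/75th;
# the 50th percentile (the median) is always included
print(prices.describe(percentiles=[0.10, 0.90]))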



How to Calculate Percentiles in Descriptive Statistics

Example:

import pandas as pd
df1 = pd.DataFrame({'One':[4,6,8,10]},columns=['One'])
print(df1.describe())

OUTPUT:

       One 
count 4.000000 
mean  7.000000 
std   2.581989 
min   4.000000 
25%   5.500000 
50%   7.000000 
75%   8.500000 
max  10.000000

The mathematical formula used above to calculate a percentile is:
min + (max - min) * percentile

(This simple formula matches pandas here because the values are evenly spaced; in general, pandas interpolates linearly between the two nearest values of the sorted data.)

In the above example, as per the formula, we get for 25%:
4 + (10 - 4) * 0.25 = 4 + 6 * 0.25 = 4 + 1.5 = 5.5

For 50%: the list has an even number of values, so the median is the average of the two middle values of (4, 6, 8, 10), i.e. (6 + 8) / 2 = 7.0
OR
Using the above formula: 4 + (10 - 4) * 0.50 = 4 + 6 * 0.50 = 4 + 3 = 7.0

For 75%: 4 + (10 - 4) * 0.75 = 4 + 6 * 0.75 = 4 + 4.5 = 8.5
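The hand calculations above can be cross-checked with pandas' quantile() method, which interpolates linearly and therefore gives the same values for this data. A small sketch:

import pandas as pd

s = pd.Series([4, 6, 8, 10])
print(s.quantile(0.25))    # 5.5
print(s.quantile(0.50))    # 7.0
print(s.quantile(0.75))    # 8.5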
*********************************************************************************

QUARTILES


The word "quartile" comes from "quantile", which in turn comes from "quantity". A quantile is a point at which a sample is divided into equal-sized subgroups (that is why it is sometimes called a "fractile"). It can also refer to dividing a probability distribution into areas of equal probability.
The median is a kind of quantile: it sits at the centre of the distribution so that exactly half of the data lies below it and half lies above it. Because the median cuts a distribution into two equal parts, it is sometimes called the 2-quantile.

Let us understand quantiles with an example:

The 0.35 quantile states that 35% of the observations in the dataset are below a given value, and that the remaining 65% of the observations are above it.


Method to find Quartiles

Suppose we have the numbers 1, 3, 4, 7, 8, 8, 9.

Step 1: Arrange the data in ascending order (already in ascending order).

Step 2: Count the total number of observations (n = 7).

Step 3: Find the first quartile Q1 (25%), also known as the 25th percentile (0.25).

Step 4: Calculate Q1 = round(0.25*(n+1)) = round(0.25*(7+1)) = round(0.25*8) = 2,
        i.e. the 2nd observation, which is 3.

Step 5: Calculate the second quartile Q2 (50%), the 0.50 or 50th percentile:
        round(0.50*(7+1)) = 4, i.e. the 4th observation, which is 7.

Step 6: Calculate the third quartile Q3 (75%), the 0.75 or 75th percentile:
        round(0.75*(7+1)) = 6, i.e. the 6th observation, which is 8.


Python Program:

import pandas as pd
s=pd.Series([1,3,4,7,8,8,9])
# pandas interpolates between values, so the exact 0.25 quantile is 3.5;
# astype(int) truncates it to 3, which matches the hand method above
q=s.quantile([0.25,0.50,0.75]).astype(int)
print(q)

Output:

0.25    3
0.50    7
0.75    8
dtype: int32

=================================================
Variance

var() – The variance function in pandas is used to calculate the variance of a given set of numbers: the variance of a data frame, of a column, or of rows. Let us see an example of each.
In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its mean. Informally, it measures how far a set of numbers is spread out from its average value.

Example:
We have a dataset "Score" = [90, 40, 50, 90, 65]. Let us see how to calculate its (sample) variance:

Step 1: Read the data set: [90, 40, 50, 90, 65]; the count n = 5.

Step 2: Square each number in the data set: [8100, 1600, 2500, 8100, 4225].

Step 3: Find the sum of the numbers in the data set: 90 + 40 + 50 + 90 + 65 = 335.

Step 4: Square that sum: 335 * 335 = 112225.

Step 5: Find the sum of the squares from Step 2 (m): 8100 + 1600 + 2500 + 8100 + 4225 = 24525.

Step 6: Divide the squared sum from Step 4 by the count (call the result k): 112225 / 5 = 22445.

Step 7: m - k = 24525 - 22445 = 2080.

Step 8: Variance = (m - k) / (n - 1) = 2080 / (5 - 1) = 520.
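The arithmetic above can be verified with a few lines of plain Python; a minimal sketch using the standard library's statistics module, which computes the same sample variance:

import statistics

score = [90, 40, 50, 90, 65]
n = len(score)                          # count = 5
m = sum(x * x for x in score)           # sum of squares = 24525
k = sum(score) ** 2 / n                 # squared sum / count = 112225 / 5 = 22445
print((m - k) / (n - 1))                # (24525 - 22445) / 4 = 520.0
print(statistics.variance(score))       # 520 -- the sample variance, same result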


Python Program to find out Variance of a Data-Set


import pandas as pd
import numpy as np
d={'Name':pd.Series(['Disha','Rohit','Manas','Krish','Vishnu']),
   'Score':pd.Series([90,40,50,90,65])}
df=pd.DataFrame(d)
print("Data Frame is :")
print (df)
print ("Variance of Score")
print (df.var(numeric_only=True))    # numeric_only=True skips the non-numeric 'Name' column


OUTPUT:


Data Frame is :
     Name  Score
0   Disha     90
1   Rohit     40
2   Manas     50
3   Krish     90
4  Vishnu     65
Variance of Score
Score    520.0
dtype: float64

We can also use the variance function in the following ways:

print(df.loc[:,"Score"].var())              # variance of a specific column
print(df.var(axis=0, numeric_only=True))    # column-wise variance (numeric columns only)
print(df.var(axis=1, numeric_only=True))    # row-wise variance (numeric columns only)




Standard Deviation


Standard deviation measures the amount of variation or dispersion in a set of values. A low standard deviation means the values tend to be close to the mean of the set, while a high standard deviation means the values are spread out over a wider range.


Standard deviation is one of the most important concepts in finance. Finance and banking are largely about measuring risk, and standard deviation is a common measure of risk; it is used by portfolio managers to measure and track risk.

Steps to calculate the standard deviation:

  • Work out the Mean(the simple average of the numbers)
  • Then for each number:subtract the Mean and square the result
  • Then work out the sum of those squared differences and divide by (N-1)
  • Take the square root of that and we are done!


Example : 

To find the standard deviation of the dataset "Score" = [90, 40, 50, 90, 65] (n = 5):

Step 1: Work out the mean: (90 + 40 + 50 + 90 + 65) / 5 = 335 / 5 = 67

Step 2: For each number, subtract the mean and square the result:
       (90 - 67) =  23,   23 * 23  = 529
       (40 - 67) = -27,  -27 * -27 = 729
       (50 - 67) = -17,  -17 * -17 = 289
       (90 - 67) =  23,   23 * 23  = 529
       (65 - 67) =  -2,   -2 * -2  = 4
Step 3: Work out the sum of those squared differences and divide by (n - 1):
   (529 + 729 + 289 + 529 + 4) = 2080, and 2080 / (n - 1) = 2080 / 4 = 520 (this value is also known as the variance)

Step 4: Take the square root of the value found at Step 3 (the variance):

          The square root of 520 is about 22.80, i.e. the standard deviation.

Or

Note: We can also apply the steps for finding the variance and, after getting the variance, take its square root to obtain the standard deviation.
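A self-contained sketch of that shortcut, using the same Score values as in the example:

import math
import pandas as pd

score = pd.Series([90, 40, 50, 90, 65])
print(score.var())               # 520.0 -- sample variance (ddof=1)
print(math.sqrt(score.var()))    # square root of the variance...
print(score.std())               # ...equals the standard deviation (about 22.8035)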


              Python Program to find out Standard Deviation of a Data-Set


import pandas as pd
import numpy as np
d={'Name':pd.Series(['Disha','Rohit','Manas','Krish','Vishnu']),
   'Score':pd.Series([90,40,50,90,65])}
df=pd.DataFrame(d)
print("Data Frame is :")
print (df)
print ("Variance of Score")
print (df.var(numeric_only=True))    # numeric_only=True skips the non-numeric 'Name' column
print ("Standard Deviation of Score")
print (df.std(numeric_only=True))

OUTPUT:

Data Frame is :
     Name  Score
0   Disha     90
1   Rohit     40
2   Manas     50
3   Krish     90
4  Vishnu     65

Variance of Score
Score    520.0
dtype: float64

Standard Deviation of Score
Score    22.803509
dtype: float64



                                   DataFrame operations



Aggregation

Aggregation is the process of turning the values of a dataset (or a subset of it) into one single value. In other words, data aggregation is a multi-valued function: it requires multiple values and returns a single value as the result. There are a number of possible aggregations, such as count, sum, min, max, median, quantile etc. These (count, sum etc.) are the descriptive statistics and related operations on a DataFrame. Let us make this clear: if we have a DataFrame like

Data Frame is :
     Name  Score
0   Disha     90
1   Rohit     40
2   Manas     50
3   Krish     90
4  Vishnu     65


…then a simple aggregation would be to calculate the sum of Score, which is 90 + 40 + 50 + 90 + 65 = 335. A different aggregation would be to count the number of Name values, which is 5.


Program to show use of aggregate functions in Python DF

import pandas as pd
import numpy as np
d={'Name':pd.Series(['Disha','Rohit','Manas','Krish','Vishnu']),
   'Score':pd.Series([90,40,50,90,65])}
df=pd.DataFrame(d)
print("Data Frame is :")
print (df)
print ('---Basic Aggregate functions min(),max(),sum(),Avg()---')
print('Minimum Score is:',df['Score'].min())
print('Maximum  Score is:',df['Score'].max())
print('Sum of Score is:',df['Score'].sum())
print('Average of Score is:',df['Score'].mean())

OUTPUT:

Data Frame is :
     Name  Score
0   Disha     90
1   Rohit     40
2   Manas     50
3   Krish     90
4  Vishnu     65
---Basic Aggregate functions min(),max(),sum(),Avg()---
Minimum Score is: 40
Maximum  Score is: 90
Sum of Score is: 335
Average of Score is: 67.0

Group by

A group by operation involves one or more of the following steps on the data frame:

  • Splitting the data frame
  • Applying a function (usually an aggregate function)
  • Combining the result.
# PROGRAM TO GROUP THE DATA CITY-WISE AND FIND THE MAXIMUM AND MINIMUM TEMPERATURE FOR EACH CITY


import pandas as pd
data={
      'Date': ['1-1-2020','1-1-2020','1-2-2020','1-2-2020','1-3-2020','1-3-2020'],
      'City': ['Jaipur','Jaipur','Jodhpur','Jodhpur','Jaisalmer','Jaisalmer'],
      'Temp': [28,30,22,24,32,34],
      'Humidity': [60,55,80,70,90,85]
     }
df=pd.DataFrame(data)
print (df)
print('\n result after group operation')
print (df.groupby('City').max())
print (df.groupby('City').min())

OUTPUT:

     Date       City     Temp  Humidity
0  1-1-2020     Jaipur    28        60
1  1-1-2020     Jaipur    30        55
2  1-2-2020    Jodhpur    22        80
3  1-2-2020    Jodhpur    24        70
4  1-3-2020  Jaisalmer    32        90
5  1-3-2020  Jaisalmer    34        85

 result after group operation

               Date  Temp  Humidity
City                               
Jaipur     1-1-2020    30        60
Jaisalmer  1-3-2020    34        90
Jodhpur    1-2-2020    24        80
              
               Date  Temp  Humidity
City                               
Jaipur     1-1-2020    28        55
Jaisalmer  1-3-2020    32        85
Jodhpur    1-2-2020    22        70
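Several aggregates can also be computed per group in a single call with the agg() method of the groupby object; a brief sketch reusing the same weather data:

import pandas as pd

data={
      'Date': ['1-1-2020','1-1-2020','1-2-2020','1-2-2020','1-3-2020','1-3-2020'],
      'City': ['Jaipur','Jaipur','Jodhpur','Jodhpur','Jaisalmer','Jaisalmer'],
      'Temp': [28,30,22,24,32,34],
      'Humidity': [60,55,80,70,90,85]
     }
df=pd.DataFrame(data)
# minimum, maximum and mean temperature for every city in one call
print(df.groupby('City')['Temp'].agg(['min','max','mean']))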


SORTING


Sorting means arranging the contents in ascending or descending order. There are two kinds of sorting available in pandas (DataFrame):
1. By value (column)
2. By index
  • By value - Sorting a dataframe on the values of one or more columns is supported by the sort_values() method. We will cover three aspects of sorting the values of a dataframe:
  • Sort a pandas dataframe in ascending and descending order
  • Sort a pandas dataframe by a single column
  • Sort a pandas dataframe by multiple columns.
Sort the python pandas Dataframe by single column –Ascending order (Default)

import pandas as pd
import numpy as np
d={'Name':pd.Series(['Disha','Rohit','Manas','Krish','Vishnu']),
   'Age':pd.Series([16,17,15,14,19]),
   'Score':pd.Series([87,89,67,55,47])}
df=pd.DataFrame(d)
print("Data Frame before sorting is :")
print (df)
df=df.sort_values(by='Score')
print("Data Frame after sorting")
print(df)

OUTPUT


Data Frame before sorting is :
     Name  Age  Score
0   Disha   16     87
1   Rohit   17     89
2   Manas   15     67
3   Krish   14     55
4  Vishnu   19     47
Data Frame after sorting
     Name  Age  Score
4  Vishnu   19     47
3   Krish   14     55
2   Manas   15     67
0   Disha   16     87
1   Rohit   17     89


# In the above example a dictionary object is used to create the dataframe. The rows of df are sorted by the sort_values() method. As an argument we pass only 'Score' for the by parameter; by default the sort is in ascending order.

Sort the python pandas Data frame by single column –Descending order

import pandas as pd
import numpy as np
d={'Name':pd.Series(['Disha','Rohit','Manas','Krish','Vishnu']),
   'Age':pd.Series([16,17,15,14,19]),
   'Score':pd.Series([87,89,67,55,47])}
df=pd.DataFrame(d)
print("Data Frame before sorting is :")
print (df)
df=df.sort_values(by='Score',ascending=False)
print("Data Frame after sorting")

print(df)

OUTPUT

Data Frame before sorting is :
     Name  Age  Score
0   Disha   16     87
1   Rohit   17     89
2   Manas   15     67
3   Krish   14     55
4  Vishnu   19     47
Data Frame after sorting
     Name  Age  Score
1   Rohit   17     89
0   Disha   16     87
2   Manas   15     67
3   Krish   14     55
4  Vishnu   19     47


# In the above example a dictionary object is used to create the data frame. The rows of df are sorted by the sort_values() method. We pass ascending=False, which sorts the data in descending order of Score.


Sort the pandas Data frame by Multiple Columns

import pandas as pd
import numpy as np
d={'Name':pd.Series(['Disha','Rohit','Manas','Krish','Vishnu']),
   'Age':pd.Series([16,17,15,14,19]),
   'Score':pd.Series([87,89,67,55,47])}
df=pd.DataFrame(d)
print("Data Frame before sorting is :")
print (df)
df=df.sort_values(by=['Age','Score'],ascending=[True,False])
print("Data Frame after sorting")

print(df)


OUTPUT:

Data Frame before sorting is :
     Name  Age  Score
0   Disha   16     87
1   Rohit   17     89
2   Manas   15     67
3   Krish   14     55
4  Vishnu   19     47
Data Frame after sorting
     Name  Age  Score
3   Krish   14     55
2   Manas   15     67
0   Disha   16     87
1   Rohit   17     89
4  Vishnu   19     47


2. By index 

Sorting a data frame on its index is supported by the sort_index() method.

import pandas as pd
import numpy as np
d={'Name':pd.Series(['Disha','Rohit','Manas','Krish','Vishnu']),
   'Age':pd.Series([16,17,15,14,19]),
   'Score':pd.Series([87,89,67,55,47])}
df=pd.DataFrame(d)
df=df.reindex([1,4,3,2,0])
print("Data Frame before sorting is :")
print (df)
df1=df.sort_index()
print("Data Frame after sorting IN ASCENDING")

print(df1)

OUTPUT:

Data Frame before sorting is :
     Name  Age  Score
1   Rohit   17     89
4  Vishnu   19     47
3   Krish   14     55
2   Manas   15     67
0   Disha   16     87
Data Frame after sorting IN ASCENDING
     Name  Age  Score
0   Disha   16     87
1   Rohit   17     89
2   Manas   15     67
3   Krish   14     55
4  Vishnu   19     47


Sorting pandas data frame by index in descending order

import pandas as pd
import numpy as np
d={'Name':pd.Series(['Disha','Rohit','Manas','Krish','Vishnu']),
   'Age':pd.Series([16,17,15,14,19]),
   'Score':pd.Series([87,89,67,55,47])}
df=pd.DataFrame(d)
df=df.reindex([1,4,3,2,0])
print("Data Frame before sorting is :")
print (df)
df1=df.sort_index(ascending=False)
print("Data Frame after sorting descending order")

print(df1)

OUTPUT

Data Frame before sorting is :
     Name  Age  Score
1   Rohit   17     89
4  Vishnu   19     47
3   Krish   14     55
2   Manas   15     67
0   Disha   16     87
Data Frame after sorting descending order
     Name  Age  Score
4  Vishnu   19     47
3   Krish   14     55
2   Manas   15     67
1   Rohit   17     89
0   Disha   16     87



RENAMING INDEX

rename() method is used to rename the indexes (row labels) in a dataframe.

Syntax: df.rename(index=<dict of old:new labels>, inplace=<True/False, optional>)

Example: 

import pandas as pd
L=[{'First_Name': 'Manas', 'last_Name':'Purohit'},
   {'First_Name': 'Rohit', 'last_Name':'Soni'},
   {'First_Name': 'Disha', 'last_Name':'Bucha'},
   {'First_Name': 'Vishnu', 'last_Name':'Nair'},
   {'First_Name': 'Shubham', 'last_Name':'Parihar'}
   ]
df1=pd.DataFrame(L)
print (df1)
df=df1.rename(index={0:'Row1',1:'Row2',2:'Row3',3:'Row4',4:'Row5'})
print("Data Frame after renaming the indexes")

print(df)


OUTPUT:

      First_Name last_Name
0      Manas   Purohit
1      Rohit      Soni
2      Disha     Bucha
3     Vishnu      Nair
4    Shubham   Parihar

Data Frame after renaming the indexes

     First_Name last_Name
Row1      Manas   Purohit
Row2      Rohit      Soni
Row3      Disha     Bucha
Row4     Vishnu      Nair
Row5    Shubham   Parihar



DELETING  INDEX

The drop() method, chained after reset_index(), is used to delete rows (indexes) from a dataframe: reset_index() first turns the existing index into an ordinary column (and creates a fresh default index), and drop() then removes the rows with the given index labels.

Syntax: df.reset_index().drop(index=<list of labels>, inplace=<True/False, optional>)


Example: 

import pandas as pd
L=[{'First_Name': 'Manas', 'last_Name':'Purohit'},
   {'First_Name': 'Rohit', 'last_Name':'Soni'},
   {'First_Name': 'Disha', 'last_Name':'Bucha'},
   {'First_Name': 'Vishnu', 'last_Name':'Nair'},
   {'First_Name': 'Shubham', 'last_Name':'Parihar'}
   ]
df1=pd.DataFrame(L)
print (df1)
df=df1.reset_index().drop(index=[2,3])
print("Data Frame after deleting the indexes")

print(df)

OUTPUT:
      First_Name last_Name
0      Manas   Purohit
1      Rohit      Soni
2      Disha     Bucha
3     Vishnu      Nair
4    Shubham   Parihar

Data Frame after deleting the indexes

   index First_Name last_Name
0      0      Manas   Purohit
1      1      Rohit      Soni
4      4    Shubham   Parihar


                         



Pivoting - Data-frame


Pivoting: Pivoting is one of the important aspects of data analysis. It is used to summarize a large amount of data and lets us access important records from a large dataset.

Python Pandas provide two functions for pivoting:
1. pivot()
2. pivot_table()


pivot() allows us to transform or reshape a data frame based on column values, according to our perspective. It takes three arguments: index, columns and values.

Example: 
Program:

import pandas as pd
data={
        'Year': ['2018','2019','2018','2019','2018','2019'],
        'Team': ['MI','MI','RCB','RCB','CSK','CSK'],
        'Runs': ['2500','2650','2200','2400','2300','2700']
     }
df=pd.DataFrame(data)
print (df)
pv=pd.pivot(df,index='Year',columns='Team',values='Runs')

print(pv)

OUTPUT:

   Year Team  Runs
0  2018   MI  2500
1  2019   MI  2650
2  2018  RCB  2200
3  2019  RCB  2400
4  2018  CSK  2300
5  2019  CSK  2700

Team   CSK    MI   RCB
Year                  
2018  2300  2500  2200
2019  2700  2650  2400



pivot_table(): We know that the pivot() method takes at least two column names as parameters: index and columns. What happens if we have multiple rows with the same values for these columns? pivot() cannot handle duplicate entries and raises an error.

The pivot_table() method solves this problem. It works like pivot(), but it aggregates the values from rows with duplicate entries for the specified columns (i.e. it applies an aggregate function chosen by us).

By default, pivot_table() applies mean() to aggregate the values from rows with duplicate entries for the specified columns.

For Example:-

Program:

import pandas as pd
data={
      'Date': ['1-1-2019','1-1-2019','1-2-2019','1-2-2019','1-3-2019','1-3-2019'],
      'City': ['Delhi','Delhi','Mumbai','Mumbai','Chennai','Chennai'],
      'Temp': [28,30,22,24,32,34],
      'Humidity': [60,55,80,70,90,85]
     }
df=pd.DataFrame(data)
print (df)
pv=pd.pivot_table(df,index='City',values='Temp')

print (pv)

OUTPUT:

       Date     City  Temp  Humidity
0  1-1-2019    Delhi    28        60
1  1-1-2019    Delhi    30        55
2  1-2-2019   Mumbai    22        80
3  1-2-2019   Mumbai    24        70
4  1-3-2019  Chennai    32        90
5  1-3-2019  Chennai    34        85

         Temp
City         
Chennai    33
Delhi      29
Mumbai     23


Note: In the above program, pivot_table() applies the mean() function by default.



PROGRAM TO FIND OUT CITYWISE MAXIMUM TEMPERATURE:

import pandas as pd
data={
      'Date': ['1-1-2019','1-1-2019','1-2-2019','1-2-2019','1-3-2019','1-3-2019'],
      'City': ['Delhi','Delhi','Mumbai','Mumbai','Chennai','Chennai'],
      'Temp': [28,30,22,24,32,34],
      'Humidity': [60,55,80,70,90,85]
     }
df=pd.DataFrame(data)
print (df)
pv=pd.pivot_table(df,index='City',values='Temp',aggfunc='max')
print (pv)


Output:

       Date     City  Temp  Humidity
0  1-1-2019    Delhi    28        60
1  1-1-2019    Delhi    30        55
2  1-2-2019   Mumbai    22        80
3  1-2-2019   Mumbai    24        70
4  1-3-2019  Chennai    32        90
5  1-3-2019  Chennai    34        85

         Temp
City         
Chennai    34
Delhi      30
Mumbai     24



PROGRAM TO PIVOT THE DATA FRAME WITH DATE AS THE INDEX AND CITY AS THE COLUMNS

import pandas as pd
data={
       'Date': ['1-1-2019','1-1-2019','1-2-2019','1-2-2019','1-3-2019','1-3-2019'],
       'City':['Delhi','Delhi','Mumbai','Mumbai','Chennai','Chennai'],
       'Temp': [28,30,22,24,32,34],
       'Humidity': [60,55,80,70,90,85]
       }
df=pd.DataFrame(data)
print (df)
pv=pd.pivot_table(df,index='Date',columns='City')
print (pv)


OUTPUT


       Date     City  Temp  Humidity
0  1-1-2019    Delhi    28        60
1  1-1-2019    Delhi    30        55
2  1-2-2019   Mumbai    22        80
3  1-2-2019   Mumbai    24        70
4  1-3-2019  Chennai    32        90
5  1-3-2019  Chennai    34        85

               Humidity                 Temp             
City      Chennai Delhi Mumbai Chennai Delhi Mumbai
Date                                               
1-1-2019      NaN  57.5    NaN     NaN  29.0    NaN
1-2-2019      NaN   NaN   75.0     NaN   NaN   23.0
1-3-2019     87.5   NaN    NaN    33.0   NaN    NaN


HANDLING MISSING VALUES

Dropping and filling

Sometimes the data that we receive from various sources may not be perfect, which means some data may be missing. For example, in the following program the student name is missing in one row and the score is missing in another row:

import pandas as pd
import numpy as np
d={'Name':pd.Series(['Disha','Rohit','Manas',np.nan,'Vishnu']),
   'Age':pd.Series([16,17,15,14,19]),
   'Score':pd.Series([87,89,67,55,np.nan])}
df=pd.DataFrame(d)

print(df)

OUTPUT:

     Name  Age  Score
0   Disha   16   87.0
1   Rohit   17   89.0
2   Manas   15   67.0
3     NaN   14   55.0
4  Vishnu   19    NaN


When we convert the data into a dataframe, the missing data is represented by NaN (Not a Number). NaN is the default marker for missing values.
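Before filling or dropping anything, it is often useful to count how many values are missing in each column; a small self-contained sketch using the same data as above:

import pandas as pd
import numpy as np

d={'Name':pd.Series(['Disha','Rohit','Manas',np.nan,'Vishnu']),
   'Age':pd.Series([16,17,15,14,19]),
   'Score':pd.Series([87,89,67,55,np.nan])}
df=pd.DataFrame(d)
# isnull() marks each missing cell as True; sum() counts them column-wise
print(df.isnull().sum())    # Name 1, Age 0, Score 1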


We can use the fillna() method to replace NaN or NA values in the above dataframe with a specified value.

# to fill the NaN values with 0

import pandas as pd
import numpy as np
d={'Name':pd.Series(['Disha','Rohit','Manas',np.nan,'Vishnu']),
   'Age':pd.Series([16,17,15,14,19]),
   'Score':pd.Series([87,89,67,55,np.nan])}
df=pd.DataFrame(d)
df=df.fillna(0)

print(df)

OUTPUT:

     Name  Age  Score
0   Disha   16   87.0
1   Rohit   17   89.0
2   Manas   15   67.0
3       0   14   55.0
4  Vishnu   19    0.0


But this is not very useful, as it fills every type of column with 0. We can fill each column with a different value by passing the column name and the value to be used to fill that column.

For example, to fill the missing Name with 'XXXX' and the missing Score with '**', we supply these values as a dictionary to the fillna() method.

import pandas as pd
import numpy as np
d={'Name':pd.Series(['Disha','Rohit','Manas',np.nan,'Vishnu']),
   'Age':pd.Series([16,17,15,14,19]),
   'Score':pd.Series([87,89,67,55,np.nan])}
df=pd.DataFrame(d)
df=df.fillna({'Name':'XXXX','Score':'**'})

print(df)

OUTPUT:

     Name  Age Score
0   Disha   16    87
1   Rohit   17    89
2   Manas   15    67
3    XXXX   14    55
4  Vishnu   19    **


If we do not want any missing data and want to remove the rows having NA or NaN values, we can use the dropna() method.

example:
import pandas as pd
import numpy as np
d={'Name':pd.Series(['Disha','Rohit','Manas',np.nan,'Vishnu']),
   'Age':pd.Series([16,17,15,14,19]),
   'Score':pd.Series([87,89,67,55,np.nan])}
df=pd.DataFrame(d)
df=df.dropna()

print(df)


Output:

    Name  Age  Score
0  Disha   16   87.0
1  Rohit   17   89.0
2  Manas   15   67.0

*****************************************************

Assignments

Descriptive Statistics 

Quartiles

Standard Deviation

aggregation and sorting

Pivoting

Handling missing values

