CBSE- Informatics Practices (IP) : Pandas-1

Data Handling using Pandas -1

Python Library – Pandas

It is a most famous Python package for data science, which offers powerful and flexible data structures that make data analysis and manipulation easy.

Pandas makes data importing and data analyzing much easier. Pandas builds on packages like NumPy and matplotlib to give us a single & convenient place for data analysis and visualization work.

Pandas or Python Pandas is a library of Python which is used for data analysis.

•The term Pandas is derived from “Panel data system” , which is an ecometric term for multidimentioal, structured data set ecometrics.

•Now a days, Pandas has become a popular option for Data Analysis.

•Pandas provides various tools for data analysis in simpler form.

•Pandas is an Open Source, BSD library built for Python Programming language. BSD stands for Berkeley Software Distributions and BSD licensed code is freely-distributable with copyright and liability disclaimer

•Pandas offers high performance, easy to use data structure and data analysis tools.

•The main author of Pandas is Wes McKinney.

Pandas – Installation/Environment Setup

Pandas installation can be done in Standard Python distribution,using following steps:

There must be service pack installed on our computer if we are using windows.

If it is not installed then we will not be able to install pandas in existing Standard Python(which is already installed).So install it first(google it).

We can check it through properties option of my computer icon
Now move to script folder of python distribution in command prompt (through cmd command of windows).Execute following commands in command prompt serially.
>pip install numpy
>pip install six
>pip install pandas

Wait after each command for installation

Now we will be able to use pandas in standard python distribution.

Type import pandas as pd in python (IDLE) shell.

If it executed without error(it means pandas is installed on your system)Data Structures in Pandas

Two important data structures of pandas are–Series, DataFrame

1. Series

Series is like a one-dimensional array like structure with homogeneous data. For example, the following series is a collection of integers.

7 4 1 5 8

Basic feature of series are:-

Homogeneous data

Size Immutable

Values of Data Mutable

2. DataFrame
DataFrame is like two dimensional Array with heterogeneous data.
ex-

1 2 3
4 5 6

Pandas Series

It is like one-dimensional array capable of holding data of any type (integer, string, float, python objects, etc.). Series can be created using constructor.

Syntax :- pandas.Series( data, index, dtype, copy) Creation of Series is also possible from – ndarray, dictionary, scalar value.

Series can be created using
1.Array
2.Dict
3.Scalar value or constant

Pandas Series

1. Create an Empty Series:-
e.g.

import pandas as pseries

s = pseries.Series()

print(s)

Output:

Series([], dtype: float64

2. Create a Series from ndarray

Without index

e.g.

import pandas as pd1

import numpy as np1

data = np1.array(['a','b','c','d'])

s = pd1.Series(data) print(s)

Output
0 a
1 b
2 c
3 d

dtype: object

Note : default index is starting from 0

With index position

e.g.

import pandas as p1

import numpy as np1

data = np1.array(['a','b','c','d'])

s = p1.Series(data,index=[100,101,102,103])

print(s)
Output

100 a
101 b
102 c
103 d

dtype: object

Note : index is starting from 100

3. Create a Series from dictionary

Example -1 (without index)

import pandas as pd1

import numpy as np1

data = {'a' : 0., 'b' : 1., 'c' : 2.}

s = pd1.Series(data)

print(s)

Output:

a 0.0
b 1.0
c 2.0

dtype: float64

Example 2 (with index)

import pandas as pd1

import numpy as np1

data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd1.Series(data,index=['b','c','d','a'])

print(s)

Output

b 1.0
c 2.0
d NaN
a 0.0

dtype: float64

4. Create a Series from Scalar

e.g

import pandas as pd1

import numpy as np1

s = pd1.Series(5, index=[0, 1, 2, 3]) print(s)
Output

0 5
1 5
2 5
3 5

dtype: int64 Note :- here 5 is repeated for 4 times (as per no of index)

5. Mathemetical Operation with Series

e.g.

import pandas as pd1

s = pd1.Series([1,2,3])
t = pd1.Series([1,2,4])

u=s+t #addition operation

print (u)

u=s*t # multiplication operation

print (u)

Output # (addition)
0 2
1 4
2 7
dtype: int64

Output # (Multiplication)

0 1
1 4
2 12
dtype: int64

Head function

e.g

import pandas as pd1
s = pd1.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

print (s.head(3))

Output

a 1
b. 2
c. 3

dtype: int64
#Return first 3 elements

Tail function

e.g

import pandas as pd1
s = pd1.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

print (s.tail(3))

Output

c 3
d. 4
e. 5

dtype: int64
#Return last 3 elements

Accessing Data from Series with indexing and slicing

e.g.
import pandas as pd1
s = pd1.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
print (s[0]) # for 0 index position
print (s[:3]) #for first 3 index values
print (s[-3:]) # slicing for last 3 index values

Output
1
a.1
b.2
c.3
dtype: int64
c.3
d.4

e.5
dtype: int64

Retrieve Data Using Label as (Index)

e.g.
import pandas as pd1
s = pd1.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
print (s[['c','d']])

Output
c 3
d 4

dtype: int64

Retrieve Data from selection

There are three methods for data selection:

loc gets rows (or columns) with particular labels from the index.
iloc gets rows (or columns) at particular positions in the index (so it only takes integers).
ix usually tries to behave like loc but falls back to behaving like iloc if a label is not present in the index. ix is deprecated and the use of loc and iloc is encouraged instead

Retrieve Data from selection

e.g.
import pandas as pd
>>> s = pd.Series(np.nan, index=[49,48,47,46,45, 1, 2, 3, 4, 5])
>>> s.iloc[:3] # slice the first three rows

output:

49 NaN
48 NaN
47 NaN

>>> s.loc[:3] # slice up to and including label 3

output:

49 NaN
48 NaN
47 NaN
46 NaN
45 NaN
1 NaN
2 NaN
3 NaN

>>> s.ix[:3] # the integer is in the index so s.ix[:3] works like loc

output:

49 NaN
48 NaN
47 NaN
46 NaN
45 NaN
1 NaN
2 NaN
3 NaN

Pandas DataFrame

It is a two-dimensional data structure, just like any table (with rows &columns).

Basic Features of DataFrame

Columns may be of different types
Size can be changed(Mutable)
Labeled axes (rows / columns)
Arithmetic operations on rows and columns

Create DataFrame

It can be created with followings

Lists
dict
Series
Numpy ndarrays
Another DataFrame

Create an Empty DataFrame

import pandas as pd1

df1 = pd1.DataFrame()

print(df1)

Output

Empty Data Frame

Columns: [ ]

Index: [ ]

Create a DataFrame from Lists

example-1

import pandas as pd1

data1 = [1,2,3,4,5]

df1 = pd1.DataFrame(data1)

print (df1)

Output

0 1

1 2

2 3

3 4

4 5

example-2 

import pandas as pd1

data1 = [['Disha',10],['Rohit',12],['Vishnu',13]]

df1 = pd1.DataFrame(data1,columns=['Name','Age'])

print (df1)

Output

  Name   Age 

1 Disha  10

2 Rohit  12

3 Vishnu 13

Create a DataFrame from Dict of ndarrays / Lists

e.g.1 

import pandas as pd1 

data1 = {'Name':['Disha','Manas'],'Age':[9,10]} 

df1 = pd1.DataFrame(data1) 

print (df1) 

Output

Name Age

1 Disha 9

2 Manas 10

Create a DataFrame from List of Dicts

import pandas as pd1
data1 = [{'x': 1, 'y': 2},{'x': 5, 'y': 4, 'z': 5}]
df1 = pd1.DataFrame(data1)
print (df1)

Output

x y z
0 1 2 NaN
1 5 4 5.0

Create a DataFrame from Dict of Series

e.g.1
import pandas as pd1
d1 = {'one' : pd1.Series([1, 2, 3], index=['a', 'b', 'c']), 'two' : pd1.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df1 = pd1.DataFrame(d1) print (df1)

Output

one two
a 1.0 1
b 2.0 2
c 3.0 3
d NaN 4

Pandas DataFrame Column addition

import pandas as pd df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
print (df)
c = [7,8,9]
df[‘C'] = c # adding a new column named as 'C'
print (df)

Output: (Before adding column C)

A B
0 1 4
1 2 5

2 3 6

Output: (After adding column C)

A B C
0 1 4 7
1 2 5 8

2 3 6 9

Column Deletion

we can delete column using three functions:-

1- del function is used with [ ] bracket without dot(.) operator.

2- pop function is used with ( ) bracket with dot(.) operator.

3- drop function is used with () bracket with axes.

Note: example associated with previous created dataframe df.

del df['C'] # Deleting the last column using DEL function
print (df)

Output:

A B
0 1 4
1 2 5

2 3 6

df.pop('B') #Deleting another column using POP function

Output:
A
0 1
1 2

2 3

import pandas as pd
df = pd.DataFrame({"A":[1, 2, 3],"B":[4, 5, 6]})
print (df)

Output:
A B
0 1 4
1 2 5

2 3 6

df.drop('A',axis=1)

Output:
B
0 4
1 5

2 6

Note#Deleting another column using drop function axis=1 means to delete data column-wise if we use axis=0 means to delete data row-wise with given index no.

Rename columns

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
>>> df.rename(columns={"A": "a", "B": "c"})

Output:
a c
0 1 4
1 2 5
2 3 6

Row Selection, Addition, and Deletion

#Selection by Label

import pandas as pd1 d1 = {'one' : pd1.Series([1, 2, 3],
index=['a', 'b', 'c']), 'two' : pd1.Series([1, 2, 3, 4],
index=['a', 'b', 'c', 'd'])}
df1 = pd1.DataFrame(d1)
print (df1.loc['b'])

Output
one 2.0
two 2.0

Name: b, dtype: float64

#Selection by integer location

import pandas as pd1
d1 = {'one':pd1.Series([1,2,3],index=['a','b','c']),
'two':pd1.Series([1,2,3,4],index=['a','b','c','d'])}
df1 = pd1.DataFrame(d1)
print (df1.iloc[2])

Output
one 3.0
two 3.0
Name: c, dtype: float64

Slice Rows : Multiple rows can be selected using ":" operator.
print (df1[2:4])

Addition of Rows

import pandas as pd1
df1 = pd1.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd1.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])
df1 = df1.append(df2)
print (df1)

Output

a b
0 1 2
1 3 4
0 5 6
1 7 8

Deletion of Rows

# Drop rows with label 0

df1 = df1.drop(0)
print (df1)

#output after deletion of first row of both colomn (0 label)

a b
1 3 4

1 7 8

Iteration on Rows and Columns

If we want to access record or data from a data frame row wise or column wise then iteration is used. Pandas provide 2 functions to perform iterations-

1.iterrows ()

2.iteritems ()

iterrows()

It is used to access the data row wise.

Example-
import pandas as pd
L=[{'First_Name': 'Birbal', 'last_Name':'Jat'},
{'First_Name': 'Vihaan', 'last_Name':'Chodhary'}]
df1=pd.DataFrame(L)
print (df1)
for (row_index,row_value) in df1.iterrows():
print('\n Row index is ::',row_index)
print('Row value is ::')
print(row_value)

OUTPUT:

First_Name last_Name
0 Birbal Jat
1 Vihaan Chodhary

Row index is :: 0
Row value is ::
First_Name Birbal
last_Name Jat

Name: 0, dtype: object

Row index is :: 1
Row value is ::
First_Name Vihaan
last_Name Chodhary

Name: 1, dtype: object

iteritems()

It is used to access the data column wise.
Example-

import pandas as pd
L=[{'First_Name': 'Birbal', 'last_Name':'Jat'},
{'First_Name': 'Vihaan', 'last_Name':'Chodhary'}]
df1=pd.DataFrame(L)
print (df1)
for (col_name,col_value) in df1.iteritems():
print('\n')
print('\n column Name is ::',col_name)
print('Column Value are ::')

print(col_value)

Output:
First_Name last_Name
0 Birbal Jat
1 Vihaan Chodhary

column Name is :: First_Name
Column Value are ::
0 Birbal
1 Vihaan
Name: First_Name, dtype: object

column Name is :: last_Name
Column Value are ::
0 Jat
1 Chodhary

Name: last_Name, dtype: object

Head & Tail

head() returns the first n rows (observe the index values). The default number of elements to display is 5, but you may pass a custom number. tail() returns the last n rows .

import pandas as pd
import numpy as np
d={'Name':pd.Series(['Disha','Rohit','Manas','Krish','Vishnu']),
'Age':pd.Series([17,18,19,18,19]),'Score':pd.Series([9,8,9,8,9])}
df=pd.DataFrame(d)
print("Data Frame is:")
print (df)
print("The first two rows of the DataFrame is (use of head function )")
print (df.head(2))
print("The last two rows of the DataFrame is (use of tail function )")

print (df.tail(2))
print("Result of showing 2nd to 4th row ")

print (df[2:5])

OUTPUT:
Data Frame is:
Name Age Score
0 Disha 17 9
1 Rohit 18 8
2 Manas 19 9
3 Krish 18 8
4 Vishnu 19 9

The first two rows of the DataFrame is (use of head function )

Name Age Score
0 Disha 17 9
1 Rohit 18 8

The last two rows of the DataFrame is (use of tail function )

Name Age Score
3 Krish 18 8

4 Vishnu 19 9

Result of showing 2nd to 4th row

Name Age Score
2 Manas 19 9
3 Krish 18 8

4 Vishnu 19 9

Boolean Indexing in Data Frame

In order to access a dataframe with a boolean index, we have to create a dataframe in which index of dataframe contains a boolean value that is “True” or “False”.

import pandas as pd

import numpy as np

d={'Name':['Disha','Rohit','Manas','Krish','Vishnu'],

'Age':[17,18,19,18,19],'Score':[9,8,9,8,9]}

df=pd.DataFrame(d,index=[True,False,True,False,True])

print("Data Frame is:")

print (df)

print("Accessing a DataFrame using .loc[] with index True ")

print (df.loc[True])

print("Accessing a DataFrame using .loc[] with index False ")

print (df.loc[False])

OUTPUT:

Data Frame is:

Name Age Score
True Disha 17 9
False Rohit 18 8
True Manas 19 9
False Krish 18 8
True Vishnu 19 9

Accessing a DataFrame using .loc[] with index True

Name Age Score
True Disha 17 9
True Manas 19 9
True Vishnu 19 9

Accessing a DataFrame using .loc[] with index False

Name Age Score
False Rohit 18 8
False Krish 18 8

Concat operation in data frame

Pandas provides various facilities for easily combining together Series, DataFrame.

pd.concat(objs, axis=0, join='inner', join_axes=None,ignore_index=False

·objs − This is a sequence or mapping of Series, DataFrame, or Panel objects.

·axis − {0, 1, ...}, default 0. This is the axis to concatenate along.

·join − {‘inner’, ‘outer’}, . How to handle indexes on other axis(es). Outer for union and inner for intersection.
By default, the Pandas merge operation acts with an “inner” merge. An inner merge, (or inner join) keeps only the common values in both the left and right dataframes for the result.

·ignore_index − boolean, default False. If True, do not use the index values on the concatenation axis. The resulting axis will be labeled 0, ..., n - 1.

·join_axes − This is the list of Index objects. Specific indexes to use for the other (n-1) axes instead of performing inner/outer set logic.

The Concat() performs concatenation operations along an axis

example 1:

import pandas as pd
d1={'RollNo':['101','102','103','104'],
'Name':['Disha','Manas','Rohit','Sarthak'],
'Score':['95','92','93','91']}
d2={'RollNo':['105','106','107','108'],
'Name':['Krishan','Gautam','Shubham','Vishnu'],
'Score':['93','92','91','95']}
df1=pd.DataFrame(d1)
df2=pd.DataFrame(d2)
df3=pd.concat([df1,df2])
print (df3)
OUTPUT:

RollNo Name Score
0 101 Disha 95
1 102 Manas 92
2 103 Rohit 93
3 104 Sarthak 91
0 105 Krishan 93
1 106 Gautam 92
2 107 Shubham 91
3 108 Vishnu 95

If you want the row labels to adjust automatically according to the join, you will have to set the argument ignore_index as True while calling the concat() function:

Example-2

import pandas as pd

d1={'RollNo':['101','102','103','104'],

'Name':['Disha','Manas','Rohit','Sarthak'],

'Score':['95','92','93','91']}

d2={'RollNo':['105','106','107','108'],

'Name':['Krishan','Gautam','Shubham','Vishnu'],

'Score':['93','92','91','95']}

df1=pd.DataFrame(d1)

df2=pd.DataFrame(d2)

df3=pd.concat([df1,df2],ignore_index=True)

print (df3)

OUTPUT:
RollNo Name Score
0 101 Disha 95
1 102 Manas 92
2 103 Rohit 93
3 104 Sarthak 91
4 105 Krishan 93
5 106 Gautam 92
6 107 Shubham 91
7 108 Vishnu 95

pandas also provides you with an option to label
the DataFrames, after the concatenation, with
a key so that you may know which data came
from which DataFrame.

EXAMPLE-3

import pandas as pd

d1={'RollNo':['101','102','103','104'],

'Name':['Disha','Manas','Rohit','Sarthak'],

'Score':['95','92','93','91']}

d2={'RollNo':['105','106','107','108'],

'Name':['Krishan','Gautam','Shubham','Vishnu'],

'Score':['93','92','91','95']}

df1=pd.DataFrame(d1)

df2=pd.DataFrame(d2)

merge={'Data1':df1,'Data2':df2}

df3=pd.concat(merge)

print (df3)

OUTPUT:

RollNo Name Score

Data1 0 101 Disha 95

1 102 Manas 92

2 103 Rohit 93

3 104 Sarthak 91

Data2 0 105 Krishan 93

1 106 Gautam 92

2 107 Shubham 91

3 108 Vishnu 95

To concatenate DataFrames along column, you can specify

the axis parameter as 1.

EXAMPLE-4

import pandas as pd

d1={'RollNo':['101','102','103','104'],

'Name':['Disha','Manas','Rohit','Sarthak'],

'Score':['95','92','93','91']}

d2={'RollNo':['105','106','107','108'],

'Name':['Krishan','Gautam','Shubham','Vishnu'],

'Score':['93','92','91','95']}

df1=pd.DataFrame(d1)

df2=pd.DataFrame(d2)

df3=pd.concat([df1,df2],axis=1)

print (df3)

OUTPUT:

RollNo Name Score RollNo Name Score

0 101 Disha 95 105 Krishan 93

1 102 Manas 92 106 Gautam 92

2 103 Rohit 93 107 Shubham 91

3 104 Sarthak 91 108 Vishnu 95

Merge operation in data frame

Two Data Frames might hold different kinds of information about the same entity and linked by some common feature/column. To join these Data Frames, pandas provides multiple functions like merge(), join() etc.

Example-1

import pandas as pd

d1={'RollNo':['101','102','103','104'],

'Name':['Disha','Manas','Rohit','Sarthak'],

'Score':['95','92','93','91']}

d2={'RollNo':['105','106','107','108'],

'Name':['Krishan','Gautam','Shubham','Vishnu'],

'Score':['93','92','91','95']}

df1=pd.DataFrame(d1)

df2=pd.DataFrame(d2)

df3=pd.merge(df1,df2,on='Score')

print (df3)

OUTPUT:

RollNo_x Name_x Score RollNo_y Name_y

0 101 Disha 95 108 Vishnu

1 102 Manas 92 106 Gautam

2 103 Rohit 93 105 Krishan

3 104 Sarthak 91 107 Shubham

Join operation in data frame

It is used to merge data frames based on some common column/key

1.Full Outer Join:-	The full outer join combines the results of both
the left and the right outer joins. The joined data frame will contain all
records from both the data frames and fill in NaNs for missing
matches on either side. You can perform a full outer join by specifying
the how argument as outer in merge() function.

import pandas as pd

d1={'RollNo':['101','102','103','104'],

'Name':['Disha','Manas','Rohit','Sarthak'],

'Score':['95','92','93','91']}

d2={'RollNo':['105','106','107','108'],

'Name':['Krishan','Gautam','Shubham','Vishnu'],

'Score':['93','92','91','95']}

df1=pd.DataFrame(d1)

df2=pd.DataFrame(d2)

df3=pd.merge(df1,df2,on='RollNo',how='outer')

print (df3)

Output:

RollNo Name_x Score_x Name_y Score_y

0 101 Disha 95 NaN NaN

1 102 Manas 92 NaN NaN

2 103 Rohit 93 NaN NaN

3 104 Sarthak 91 NaN NaN

4 105 NaN NaN Krishan 93

5 106 NaN NaN Gautam 92

6 107 NaN NaN Shubham 91

7 108 NaN NaN Vishnu 95

2.Inner Join: The inner join produce only those records that match in both the data frame. You have to pass inner in how argument inside merge() function

OUTPUT;

RollNo_x Name_x Score RollNo_y Name_y
0 101 Disha 95 108 Vishnu
1 102 Manas 92 106 Gautam
2 103 Rohit 93 105 Krishan
3 104 Sarthak 91 107 Shubham

3. Right Join :-	The right join produce a complete set of records
from data frame B(Right side Data Frame) with the matching records
(where available) in data frame A( Left side data frame). If there is no
match right side will contain null. You have to pass right in how
argument inside merge() function.

OUTPUT:

RollNo Name_x Score_x Name_y Score_y

0 105 NaN NaN Krishan 93

1 106 NaN NaN Gautam 92

2 107 NaN NaN Shubham 91

3 108 NaN NaN Vishnu 95

4. Left Join :-

The right join produce a complete set of records from data frame A(Left side Data Frame) with the matching records (where available) in data frame B ( Right side data frame). If there is no match

left side will contain null. You have to pass left in how argument inside merge() function.

import pandas as pd

d1={'RollNo':['101','102','103','104'],

'Name':['Disha','Manas','Rohit','Sarthak'],

'Score':['95','92','93','91']}

d2={'RollNo':['105','106','107','108'],

'Name':['Krishan','Gautam','Shubham','Vishnu'],

'Score':['93','92','91','95']}

df1=pd.DataFrame(d1)

df2=pd.DataFrame(d2)

df3=pd.merge(df1,df2,on='RollNo',how='left')

print (df3)

OUTPUT:

RollNo Name_x Score_x Name_y Score_y

0 101 Disha 95 NaN NaN

1 102 Manas 92 NaN NaN

2 103 Rohit 93 NaN NaN

3 104 Sarthak 91 NaN NaN

5. Joining on Index :- Sometimes you have to perform the join on the indexes or the row labels. For that you have to specify right_index ( for the indexes of the right data frame )and left_index( for the indexes of left data frame) as True.

import pandas as pd

d1={'RollNo':['101','102','103','104'],

'Name':['Disha','Manas','Rohit','Sarthak'],

'Score':['95','92','93','91']}

d2={'RollNo':['105','106','107','108'],

'Name':['Krishan','Gautam','Shubham','Vishnu'],

'Score':['93','92','91','95']}

df1=pd.DataFrame(d1)

df2=pd.DataFrame(d2)

df3=pd.merge(df1,df2,right_index=True, left_index=True)

print (df3)

OUTPUT:

RollNo_x Name_x Score_x RollNo_y Name_y Score_y

0 101 Disha 95 105 Krishan 93

1      102    Manas      92      106   Gautam      92

2      103    Rohit      93      107  Shubham      91

3      104  Sarthak      91      108   Vishnu      95

Importing/Exporting Data between CSV files and Data Frames.

CSV File: Introduction

A CSV is a comma separated values file, which allows data to be saved in a tabular format. CSV is a simple file such as a spreadsheet or database. Files in the csv format can be imported and exported from programs that store data in tables, such as Microsoft excel or Open Office.

CSV files data fields are most often separated, or delimited by a comma. Here the data in each row are delimited by comma and individual rows are separated by newline.

To create a csv file, first choose your favorite text editor such as- Notepad and open a new file. Then enter the text data you want the file to contain, separating each value with a comma and each row with a new line. Save the file with the extension.csv. You can open the file using MS Excel or another spread sheet program. It will create the table of similar data.

Advantages of CSV format-

A simple, compact and ubiquitous format for data storage.

A common format for data interchange.

It can be opened in popular spredsheet packages like MS-EXCEL etc.

Nearly all spredsheets and databases support import/export to csv format.

Reading from a CSV file to DataFrame

Syntax

import pandas as pd

df=pd.read_csv() <file-path>

Open an MS-Excel file and enter some data in it and save <file_name>.csv (with .csv extension). give correct path in Python Program:

Assume the file path as e:\student.csv then following type of file will be opened

Now we will create a DataFrame like this:

import pandas as pd

df=pd.read_csv("e:\student.csv")

print (df)

OUTPUT:

RollNo Name Class Marks

0 101 Disha 12-A 95

1 102 Manas 12-A 94

2 103 Rohit 12-A 94

3 104 Gautam 12-B 95

4 105 Krishna 12-B 94

USE OF HEADER as None

import pandas as pd

df=pd.read_csv("e:\student.csv", header=None)

print (df)

OUTPUT:

0 1 2 3

0 RollNo Name Class Marks

1 101 Disha 12-A 95

2 102 Manas 12-A 94

3 103 Rohit 12-A 94

4 104 Gautam 12-B 95

5 105 Krishna 12-B 94

Reading selected lines from CSV file Use of nrows ()

import pandas as pd

df=pd.read_csv("e:\student.csv",nrows=3)

print (df)

OUTPUT:

RollNo Name Class Marks

0 101 Disha 12-A 95

1 102 Manas 12-A 94

2 103 Rohit 12-A 94

Exporting data from data frame to CSV File

To export a data frame into a csv file first of all, we create a dataframe say df1 and use dataframe.to_csv ( E:\student1.csv ’ ) method to export data -frame df1 into csv file student1.csv.

import pandas as pd

d={'Name':['Disha','Rohit','Manas','Krish','Vishnu'],

'Age':[17,18,19,18,19],

'Score':[9,8,9,8,9]}

df1=pd.DataFrame(d)

df1.to_csv("e:\student1.csv")

now a MS-Excel file will be created at given path i.e. in E drive of computer system.

OUTPUT: