Saturday, April 11, 2020

Pandas-1



Data Handling  using  Pandas -1


Python Library Pandas
It is a most famous Python package for data science, which offers  powerful and flexible data structures that make data analysis and  manipulation easy.
Pandas makes data importing and data analyzing  much easier. Pandas builds on packages like NumPy and matplotlib  to give us a single & convenient place for data analysis and  visualization work.

Pandas or Python Pandas is a library of Python which is used for data analysis.

•The term Pandas is derived from “Panel data system” , which is an ecometric term for multidimentioal, structured data set ecometrics.
•Now a days, Pandas has become a popular option for Data Analysis.
•Pandas provides various tools for data analysis in simpler form.
•Pandas is an Open Source, BSD library built for Python Programming language. BSD stands for Berkeley Software Distributions and BSD licensed code is freely-distributable with copyright and liability disclaimer
•Pandas offers high performance, easy to use data structure and data analysis tools.
•The main author of Pandas is Wes McKinney.


 Pandas Installation/Environment Setup


Pandas installation can be done in Standard Python distribution,using following steps:

There must be service pack installed on our computer if we are using windows.

If it is not installed then we will not be able to install pandas in existing Standard Python(which is already installed).So install it first(google it).


We can check it through properties option of my computer icon
Now move to script folder of python distribution in command prompt (through cmd command of windows).Execute following commands in command prompt serially.

>pip install numpy
>pip install six
>pip install pandas


Wait after each command for installation

Now we will be able to use pandas in standard python distribution.

Type import pandas as pd in python (IDLE) shell.

If it executed without error(it means pandas is installed on your system)Data Structures in Pandas



Two important data structures of pandas are–Series, DataFrame

1. Series

Series is like a one-dimensional array like structure with homogeneous data. For example, the following series is a collection of integers.

7 4 1 5 8

Basic feature of series are:-



  • Homogeneous data
  • Size Immutable
  • Values of Data Mutable


2. DataFrame
 DataFrame is like two dimensional Array with heterogeneous data.
ex-
1 2 3
4 5 6


Pandas Series

It is like one-dimensional array capable of holding data of any type (integer, string, float, python objects, etc.). Series can be created using constructor.

Syntax :- pandas.Series( data, index, dtype, copy) Creation of Series is also possible from – ndarray, dictionary, scalar value.

Series can be created using
1.Array
2.Dict
3.Scalar value or constant



Pandas Series 

1. Create an Empty Series:- 

e.g.

import pandas as pseries 
s = pseries.Series() 
print(s) 


Output:

 Series([], dtype: float64

2. Create a Series from ndarray

Without index

e.g.

import pandas as pd1 
import numpy as np1

data = np1.array(['a','b','c','d'])

s = pd1.Series(data) print(s)


Output
0   a
1   b
2   c
3   d

dtype: object

Note : default index is starting from 0



With index position

e.g.

import pandas as p1 
import numpy as np1

data = np1.array(['a','b','c','d'])

s = p1.Series(data,index=[100,101,102,103])

print(s)

Output

100  a

101  b
102  c
103  d 
 dtype: object

Note : index is starting from 100

3. Create a Series from dictionary



Example -1 (without index)

import pandas as pd1 
import numpy as np1

data = {'a' : 0., 'b' : 1., 'c' : 2.}

s = pd1.Series(data)


print(s)

Output:

a    0.0
b    1.0
c    2.0

dtype: float64



Example 2 (with index) 


 import pandas as pd1 
 import numpy as np1

data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd1.Series(data,index=['b','c','d','a'])

print(s)


Output 

b    1.0
c    2.0
d    NaN
a    0.0

dtype: float64



4. Create a Series from Scalar 
e.g

import pandas as pd1
import numpy as np1

s = pd1.Series(5, index=[0, 1, 2, 3]) print(s)

Output 

0    5
1    5
2    5
3    5

dtype: int64 Note :- here 5 is repeated for 4 times (as per no of index)

5. Mathemetical Operation with Series



e.g.

import pandas as pd1 
s = pd1.Series([1,2,3])
t = pd1.Series([1,2,4])

u=s+t              #addition operation 
print (u) 
u=s*t             # multiplication operation
print (u)


Output # (addition)
0    2
1    4
2    7
dtype: int64


Output # (Multiplication)

0    1
1    4
2    12

dtype: int64



Head function 
 e.g

import pandas as pd1
s = pd1.Series([1,2,3,4,5],index = ['a','b','c','d','e']) 
print (s.head(3))

Output

a    1
b.   2
c.   3

dtype: int64
#Return first 3 elements



Tail function 
 e.g

import pandas as pd1
s = pd1.Series([1,2,3,4,5],index = ['a','b','c','d','e']) 
print (s.tail(3))

Output 
c    3
d.   4
e.   5

dtype: int64
#Return last 3 elements

Accessing Data from Series with indexing and slicing

e.g.
import pandas as pd1 
s = pd1.Series([1,2,3,4,5],index = ['a','b','c','d','e']) 
print (s[0])  # for 0 index position 
print (s[:3]) #for first 3 index values 
print (s[-3:]) # slicing for last 3 index values 

Output 
1
a.1
b.2
c.3
dtype: int64 
c.3
d.4

e.5 
dtype: int64


Retrieve Data Using Label as (Index) 
e.g.
import pandas as pd1
s = pd1.Series([1,2,3,4,5],index = ['a','b','c','d','e']) 
print (s[['c','d']])

Output 
3
d 4

dtype: int64



Retrieve Data from selection 

There are three methods for data selection:
  • loc gets rows (or columns) with particular labels from the index.
  • iloc gets rows (or columns) at particular positions in the index (so it only takes integers).
  • ix usually tries to behave like loc but falls back to behaving like iloc if a label is not present in the index. ix is deprecated and the use of loc and iloc is encouraged instead

Retrieve Data from selection 

e.g.
import pandas as pd
>>> s = pd.Series(np.nan, index=[49,48,47,46,45, 1, 2, 3, 4, 5]) 
>>> s.iloc[:3] # slice the first three rows


output:

49 NaN 
48 NaN 
47 NaN 

>>> s.loc[:3] # slice up to and including label 3 


output:

49 NaN 
48 NaN 
47 NaN 
46 NaN 
45 NaN
1  NaN
2  NaN
3  NaN

>>> s.ix[:3] # the integer is in the index so s.ix[:3] works like  loc 

output: 

49 NaN 
48 NaN 
47 NaN 
46 NaN 
45 NaN
1  NaN
2  NaN
3  NaN


Pandas DataFrame  

 It is a two-dimensional data structure, just like any table (with rows &columns).

Basic Features of DataFrame
  • Columns may be of different types
  • Size can be changed(Mutable)
  • Labeled axes (rows / columns)
  • Arithmetic operations on rows and columns

Create DataFrame 

It can be created with followings
  • Lists
  • dict
  • Series
  • Numpy ndarrays
  • Another DataFrame 

Create an Empty DataFrame 

import pandas as pd1 
df1 = pd1.DataFrame() 
print(df1) 

Output

Empty Data Frame 
Columns: [ ] 
Index: [ ]

Create a DataFrame from Lists

example-1

import pandas as pd1 
data1 = [1,2,3,4,5] 
df1 = pd1.DataFrame(data1) 
print (df1)



Output


0 1
1 2
2 3
3 4
4 5


example-2 

import pandas as pd1
data1 = [['Disha',10],['Rohit',12],['Vishnu',13]]
df1 = pd1.DataFrame(data1,columns=['Name','Age'])
print (df1)

Output

  Name   Age 
1 Disha  10
2 Rohit  12
3 Vishnu 13

Create a DataFrame from Dict of ndarrays / Lists 

e.g.1 
import pandas as pd1 
data1 = {'Name':['Disha','Manas'],'Age':[9,10]} 
df1 = pd1.DataFrame(data1) 
print (df1) 

Output
Name Age
1 Disha 9
2 Manas 10

Create a DataFrame from List of Dicts 


import pandas as pd1 
data1 = [{'x': 1, 'y': 2},{'x': 5, 'y': 4, 'z': 5}] 
df1 = pd1.DataFrame(data1) 
print (df1) 


Output 

  x    y   z
0 1    2   NaN
1 5    4   5.0


Create a DataFrame from Dict of Series

e.g.1 
import pandas as pd1 
d1 = {'one' : pd1.Series([1, 2, 3], index=['a', 'b', 'c']), 'two' :         pd1.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])} 
df1 = pd1.DataFrame(d1) print (df1) 

Output

  one two 
a 1.0 1 
b 2.0 2 
c 3.0 3 
d NaN 4


Pandas DataFrame Column addition 

import pandas as pd df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}) 
print (df)
c = [7,8,9] 
df[‘C'] = c # adding a new column named as 'C'
print (df)

Output: (Before adding column C)

   A  B
0  1  4
1  2  5

2  3  6

Output: (After adding column C)

   A  B  C
0  1  4  7
1  2  5  8

2  3  6  9



Column Deletion 
we can delete column using three functions:-
1- del function is used with [ ] bracket without dot(.) operator.
2- pop function is used with ( ) bracket with dot(.) operator.
3- drop function is used with () bracket with axes.

Note: example associated with previous created dataframe df.

del df['C'] # Deleting the last column using DEL function 
print (df)

Output:

   A  B
0  1  4
1  2  5

2  3  6



df.pop('B') #Deleting another column using POP function 
             
Output:
   A
0  1
1  2

2  3

import pandas as pd
df = pd.DataFrame({"A":[1, 2, 3],"B":[4, 5, 6]})
print (df)

Output: 
   A  B
0  1  4
1  2  5

2  3  6

df.drop('A',axis=1) 

Output:
   B
0  4
1  5

2  6

Note#Deleting another column using drop function axis=1 means to delete data column-wise if we use axis=0 means to delete data row-wise with given index no.



Rename columns 

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}) 
>>> df.rename(columns={"A": "a", "B": "c"}) 

Output:
  a c 
0 1 
2 5 
2 3 6


Row Selection, Addition, and Deletion 

#Selection by Label 

import pandas as pd1 d1 = {'one' : pd1.Series([1, 2, 3], 
index=['a', 'b', 'c']), 'two' : pd1.Series([1, 2, 3, 4], 
index=['a', 'b', 'c', 'd'])} 
df1 = pd1.DataFrame(d1) 
print (df1.loc['b']) 

Output 
one 2.0 
two 2.0 

Name: b, dtype: float64


#Selection by integer location

import pandas as pd1 
d1 = {'one':pd1.Series([1,2,3],index=['a','b','c']), 
      'two':pd1.Series([1,2,3,4],index=['a','b','c','d'])} 
df1 = pd1.DataFrame(d1) 
print (df1.iloc[2]) 


Output 
one 3.0 
two 3.0 
Name: c, dtype: float64 

Slice Rows : Multiple rows can be selected using ":" operator. 
print (df1[2:4])


Addition of Rows 

import pandas as pd1 
df1 = pd1.DataFrame([[1, 2], [3, 4]], columns = ['a','b']) 
df2 = pd1.DataFrame([[5, 6], [7, 8]], columns = ['a','b']) 
df1 = df1.append(df2) 
print (df1) 

Output

   a  b
0  1  2
1  3  4
0  5  6
1  7  8


Deletion of Rows 

# Drop rows with label 0 

df1 = df1.drop(0)
print (df1)

#output after deletion of first row of both colomn (0 label)

   a  b
1  3  4

1  7  8


Iteration on Rows and Columns
If we want to access record or data from a data frame row wise or column wise then iteration is used. Pandas provide 2 functions to perform iterations-

1.iterrows ()
2.iteritems ()

iterrows()


It is used to access the data row wise.

Example-

import pandas as pd
L=[{'First_Name': 'Birbal', 'last_Name':'Jat'},
   {'First_Name': 'Vihaan', 'last_Name':'Chodhary'}]
df1=pd.DataFrame(L)
print (df1)
for (row_index,row_value) in df1.iterrows():
    print('\n Row index is ::',row_index)
    print('Row value is ::')
    print(row_value)

OUTPUT:

              First_Name last_Name
0     Birbal        Jat
1     Vihaan      Chodhary

Row index is :: 0
Row value is ::
First_Name       Birbal
last_Name        Jat

Name: 0, dtype: object

Row index is :: 1
Row value is ::
First_Name       Vihaan
last_Name        Chodhary

Name: 1, dtype: object


iteritems()


It is used to access the data column wise. 
Example-


import pandas as pd
L=[{'First_Name': 'Birbal', 'last_Name':'Jat'},
   {'First_Name': 'Vihaan', 'last_Name':'Chodhary'}]
df1=pd.DataFrame(L)
print (df1)
for (col_name,col_value) in df1.iteritems():
    print('\n')
    print('\n column Name is ::',col_name)
    print('Column Value are ::')

    print(col_value)

Output:
First_Name last_Name
0     Birbal       Jat
1     Vihaan  Chodhary

column Name is :: First_Name
Column Value are ::
0    Birbal
1    Vihaan
Name: First_Name, dtype: object


 column Name is :: last_Name
Column Value are ::
0         Jat
1    Chodhary

Name: last_Name, dtype: object


Head & Tail 

head() returns the first n rows (observe the index values). The default number of elements to display is 5, but you may pass a custom number. tail() returns the last n rows .


import pandas as pd
import numpy as np
d={'Name':pd.Series(['Disha','Rohit','Manas','Krish','Vishnu']),
   'Age':pd.Series([17,18,19,18,19]),'Score':pd.Series([9,8,9,8,9])}
df=pd.DataFrame(d)
print("Data Frame is:")
print (df)
print("The first two rows of the DataFrame is (use of head function )")
print (df.head(2))
print("The last two rows of the DataFrame is (use of  tail function )")

print (df.tail(2))
print("Result of showing 2nd to 4th row ")

print (df[2:5])

OUTPUT:
Data Frame is:
     Name  Age  Score
0   Disha   17      9
1   Rohit   18      8
2   Manas   19      9
3   Krish   18      8
4  Vishnu   19      9

The first two rows of the DataFrame is (use of head function )

    Name  Age  Score
0  Disha   17      9
1  Rohit   18      8

The last two rows of the DataFrame is (use of tail function )

     Name  Age  Score
3   Krish   18      8

4  Vishnu   19      9

Result of showing 2nd to 4th row 

     Name  Age  Score
2   Manas   19      9
3   Krish   18      8

4  Vishnu   19      9



Boolean Indexing in Data Frame
In order to access a dataframe with a boolean index, we have to create a dataframe in which index of dataframe contains a boolean value that is “True” or “False”.

import pandas as pd
import numpy as np
d={'Name':['Disha','Rohit','Manas','Krish','Vishnu'],
   'Age':[17,18,19,18,19],'Score':[9,8,9,8,9]}
df=pd.DataFrame(d,index=[True,False,True,False,True])
print("Data Frame is:")
print (df)
print("Accessing a DataFrame using .loc[] with index True ")
print (df.loc[True])

print("Accessing a DataFrame using .loc[] with index False ")
print (df.loc[False])


OUTPUT:
Data Frame is:

         Name  Age  Score
True    Disha   17      9
False   Rohit   18      8
True    Manas   19      9
False   Krish   18      8
True   Vishnu   19      9

Accessing a DataFrame using .loc[] with index True

        Name  Age  Score
True   Disha   17      9
True   Manas   19      9
True  Vishnu   19      9

Accessing a DataFrame using .loc[] with index False

        Name  Age  Score
False  Rohit   18      8
False  Krish   18      8




Concat operation in data frame

Pandas provides various facilities for easily combining together Series,  DataFrame.

pd.concat(objs, axis=0, join='inner', join_axes=None,ignore_index=False



·objs This is a sequence or mapping of Series, DataFrame, or Panel objects.
·axis {0, 1, ...}, default 0. This is the axis to concatenate along.
·join {‘inner’, ‘outer’}, . How to handle indexes on other axis(es). Outer for union and inner for intersection.
   By default, the Pandas merge operation acts with an “inner” merge. An inner merge, (or inner join) keeps only the common values in both the left and right dataframes for the result.
·ignore_index boolean, default False. If True, do not use the  index values on the concatenation axis. The resulting axis will be  labeled 0, ..., n - 1.
·join_axes This is the list of Index objects. Specific indexes to  use for the other (n-1) axes instead of performing inner/outer  set logic.

The Concat() performs concatenation operations along an axis
 example 1:


import pandas as pd
d1={'RollNo':['101','102','103','104'],
    'Name':['Disha','Manas','Rohit','Sarthak'],
    'Score':['95','92','93','91']}
d2={'RollNo':['105','106','107','108'],
    'Name':['Krishan','Gautam','Shubham','Vishnu'],
    'Score':['93','92','91','95']}
df1=pd.DataFrame(d1)
df2=pd.DataFrame(d2)
df3=pd.concat([df1,df2])
print (df3)

OUTPUT: 


   RollNo    Name  Score
0    101    Disha    95
1    102    Manas    92
2    103    Rohit    93
3    104  Sarthak    91
0    105  Krishan    93
1    106   Gautam    92
2    107  Shubham    91
3    108   Vishnu    95


If you want the row labels to adjust automatically  according to the join, you will have to set the argument ignore_index as True while calling  the concat() function:

Example-2

import pandas as pd
d1={'RollNo':['101','102','103','104'],
    'Name':['Disha','Manas','Rohit','Sarthak'],
    'Score':['95','92','93','91']}
d2={'RollNo':['105','106','107','108'],
    'Name':['Krishan','Gautam','Shubham','Vishnu'],
    'Score':['93','92','91','95']}
df1=pd.DataFrame(d1)
df2=pd.DataFrame(d2)
df3=pd.concat([df1,df2],ignore_index=True)
print (df3)


OUTPUT: 
    RollNo  Name   Score
0    101    Disha    95
1    102    Manas    92
2    103    Rohit    93
3    104  Sarthak    91
4    105  Krishan    93
5    106   Gautam    92
6    107  Shubham    91
7    108   Vishnu    95


pandas also provides you with an option to label
the DataFrames, after the concatenation, with
a key so that you may know which data came
from which DataFrame.

EXAMPLE-3

import pandas as pd
d1={'RollNo':['101','102','103','104'],
    'Name':['Disha','Manas','Rohit','Sarthak'],
    'Score':['95','92','93','91']}
d2={'RollNo':['105','106','107','108'],
    'Name':['Krishan','Gautam','Shubham','Vishnu'],
    'Score':['93','92','91','95']}
df1=pd.DataFrame(d1)
df2=pd.DataFrame(d2)
merge={'Data1':df1,'Data2':df2}
df3=pd.concat(merge)
print (df3)




OUTPUT:

          RollNo  Name    Score
Data1 0    101    Disha    95
      1    102    Manas    92
      2    103    Rohit    93
      3    104  Sarthak    91
Data2 0    105  Krishan    93
      1    106   Gautam    92
      2    107  Shubham    91
      3    108   Vishnu    95


To  concatenate  DataFrames  along column, you can specify
the axis parameter as 1.

EXAMPLE-4 

import pandas as pd
d1={'RollNo':['101','102','103','104'],
    'Name':['Disha','Manas','Rohit','Sarthak'],
    'Score':['95','92','93','91']}
d2={'RollNo':['105','106','107','108'],
    'Name':['Krishan','Gautam','Shubham','Vishnu'],
    'Score':['93','92','91','95']}
df1=pd.DataFrame(d1)
df2=pd.DataFrame(d2)
df3=pd.concat([df1,df2],axis=1)
print (df3)

OUTPUT:

    RollNo  Name    Score RollNo  Name   Score
0    101    Disha    95    105  Krishan    93
1    102    Manas    92    106   Gautam    92
2    103    Rohit    93    107  Shubham    91
3    104  Sarthak    91    108   Vishnu    95

Merge operation in data frame


Two Data Frames might hold different kinds of information about the same entity and linked by some common feature/column. To join these Data Frames, pandas provides multiple functions like merge(), join() etc.

Example-1

import pandas as pd
d1={'RollNo':['101','102','103','104'],
    'Name':['Disha','Manas','Rohit','Sarthak'],
    'Score':['95','92','93','91']}
d2={'RollNo':['105','106','107','108'],
    'Name':['Krishan','Gautam','Shubham','Vishnu'],
    'Score':['93','92','91','95']}
df1=pd.DataFrame(d1)
df2=pd.DataFrame(d2)
df3=pd.merge(df1,df2,on='Score')
print (df3)

OUTPUT:


 RollNo_x   Name_x   Score RollNo_y   Name_y

0      101    Disha    95      108   Vishnu

1      102    Manas    92      106   Gautam
2      103    Rohit    93      105  Krishan
3      104  Sarthak    91      107  Shubham



Join operation in data frame


It is used to merge data frames based on some common column/key

1.Full Outer Join:-
The full outer join combines the results of both
the left and the right outer joins. The joined data frame will contain all
records from both the data frames and fill in NaNs for missing
matches on either side. You can perform a full outer join by specifying
the how argument as outer in merge() function.


import pandas as pd

d1={'RollNo':['101','102','103','104'],

    'Name':['Disha','Manas','Rohit','Sarthak'],
    'Score':['95','92','93','91']}
d2={'RollNo':['105','106','107','108'],
    'Name':['Krishan','Gautam','Shubham','Vishnu'],
    'Score':['93','92','91','95']}
df1=pd.DataFrame(d1)
df2=pd.DataFrame(d2)
df3=pd.merge(df1,df2,on='RollNo',how='outer')
print (df3)

Output:
RollNo   Name_x Score_x   Name_y Score_y
0    101    Disha      95      NaN     NaN
1    102    Manas      92      NaN     NaN
2    103    Rohit      93      NaN     NaN
3    104  Sarthak      91      NaN     NaN
4    105      NaN     NaN  Krishan      93
5    106      NaN     NaN   Gautam      92
6    107      NaN     NaN  Shubham      91
7    108      NaN     NaN   Vishnu      95


2.Inner Join: The inner join produce only those records that match in both the data frame. You have to pass inner in how argument inside merge() function


import pandas as pd
d1={'RollNo':['101','102','103','104'],
    'Name':['Disha','Manas','Rohit','Sarthak'],
    'Score':['95','92','93','91']}
d2={'RollNo':['105','106','107','108'],
    'Name':['Krishan','Gautam','Shubham','Vishnu'],
    'Score':['93','92','91','95']}
df1=pd.DataFrame(d1)
df2=pd.DataFrame(d2)
df3=pd.merge(df1,df2,on='Score',how='inner')
print (df3)


OUTPUT;
  RollNo_x Name_x  Score  RollNo_y   Name_y
0 101    Disha      95      108     Vishnu
1 102    Manas      92      106     Gautam
2 103    Rohit      93      105     Krishan
3 104    Sarthak    91      107     Shubham


3. Right Join :-
The  right join produce a complete set of records
from data frame B(Right side Data Frame) with the matching records
(where available) in data frame A( Left side data frame). If there is no
match right side will contain null. You have to pass right in how
argument inside merge() function.


import pandas as pd
d1={'RollNo':['101','102','103','104'],
   'Name':['Disha','Manas','Rohit','Sarthak'],
   'Score':['95','92','93','91']}
d2={'RollNo':['105','106','107','108'],
   'Name':['Krishan','Gautam','Shubham','Vishnu'],
   'Score':['93','92','91','95']}
df1=pd.DataFrame(d1)
df2=pd.DataFrame(d2)
df3=pd.merge(df1,df2,on='RollNo',how='right')
print (df3)

OUTPUT:

     RollNo Name_x Score_x   Name_y Score_y

0    105    NaN     NaN    Krishan      93

1    106    NaN     NaN    Gautam       92
2    107    NaN     NaN    Shubham      91
3    108    NaN     NaN    Vishnu       95



4. Left Join :-
The right join produce a complete set of records from data frame A(Left side Data Frame) with the matching records (where available) in data frame B ( Right side data frame). If there is no match 
left side will contain null. You have to pass left in how argument inside merge() function.

import pandas as pd
d1={'RollNo':['101','102','103','104'],
    'Name':['Disha','Manas','Rohit','Sarthak'],
    'Score':['95','92','93','91']}
d2={'RollNo':['105','106','107','108'],
    'Name':['Krishan','Gautam','Shubham','Vishnu'],
    'Score':['93','92','91','95']}
df1=pd.DataFrame(d1)
df2=pd.DataFrame(d2)
df3=pd.merge(df1,df2,on='RollNo',how='left')
print (df3)

OUTPUT:

    RollNo   Name_x Score_x Name_y Score_y

0    101    Disha      95    NaN     NaN

1    102    Manas      92    NaN     NaN
2    103    Rohit      93    NaN     NaN
3    104  Sarthak      91    NaN     NaN

5. Joining on Index :- Sometimes you have to perform the join on the indexes or the row labels. For that you have to specify right_index ( for the indexes of the right data frame )and left_index( for the indexes of left data frame) as True.

import pandas as pd
d1={'RollNo':['101','102','103','104'],
    'Name':['Disha','Manas','Rohit','Sarthak'],
    'Score':['95','92','93','91']}
d2={'RollNo':['105','106','107','108'],
    'Name':['Krishan','Gautam','Shubham','Vishnu'],
    'Score':['93','92','91','95']}
df1=pd.DataFrame(d1)
df2=pd.DataFrame(d2)
df3=pd.merge(df1,df2,right_index=True, left_index=True)
print (df3)

OUTPUT:


 RollNo_x   Name_x Score_x RollNo_y   Name_y Score_y

0      101    Disha      95      105  Krishan      93

1      102    Manas      92      106   Gautam      92
2      103    Rohit      93      107  Shubham      91
3      104  Sarthak      91      108   Vishnu      95




Importing/Exporting Data between CSV files and Data Frames.


CSV File: Introduction

A CSV is a comma separated values file, which allows data to  be saved in a tabular format. CSV is a simple file such as a  spreadsheet or database. Files in the csv format can be  imported and exported from programs that store data in  tables, such as Microsoft excel or Open Office.
CSV files data fields are most often  separated, or delimited by a comma. Here the data in each  row are delimited by comma and individual rows are separated  by newline.
To create a csv file, first choose your  favorite text editor such as- Notepad and open a new file.  Then enter the text data you want the file to contain,  separating each value with a comma and each row with a new  line. Save the file with the extension.csv. You can open the  file using MS Excel or another spread sheet program. It will  create the table of similar data.


Advantages of CSV format-

  • A simple, compact and ubiquitous format for data storage. 
  • A common format for data interchange. 
  • It can be opened in popular spredsheet packages like MS-EXCEL etc.
  • Nearly all spredsheets and databases support import/export to csv format.

Reading from a CSV file to DataFrame


Syntax
import pandas as pd
df=pd.read_csv() <file-path>

Open an MS-Excel file and enter some data in it and save <file_name>.csv (with .csv extension). give correct path in Python Program:
Assume the file path as e:\student.csv then following type of file will be opened


Now we will create a DataFrame like this:

import pandas as pd
df=pd.read_csv("e:\student.csv")
print (df)


OUTPUT:

    RollNo     Name    Class  Marks
0     101      Disha     12-A     95
1     102      Manas   12-A     94
2     103      Rohit     12-A     94
3     104     Gautam  12-B     95
4     105     Krishna  12-B     94


USE OF HEADER as None


import pandas as pd
df=pd.read_csv("e:\student.csv", header=None)
print (df)

OUTPUT:

       0        1      2      3
0    RollNo  Name    Class  Marks
1     101    Disha   12-A     95
2     102    Manas   12-A     94
3     103    Rohit   12-A     94
4     104   Gautam   12-B     95
5     105  Krishna   12-B     94


 Reading selected lines from CSV file  Use of nrows ()


import pandas as pd
df=pd.read_csv("e:\student.csv",nrows=3)
print (df)


OUTPUT:

     RollNo   Name Class  Marks
0     101  Disha  12-A     95
1     102  Manas  12-A     94
2     103  Rohit  12-A     94



Exporting data from data frame to CSV File

To export a data frame into a csv file first of all, we create a dataframe say df1 and use dataframe.to_csv ( E:\student1.csv ’ ) method to export data -frame df1 into csv file student1.csv.

import pandas as pd
d={'Name':['Disha','Rohit','Manas','Krish','Vishnu'],
   'Age':[17,18,19,18,19],
   'Score':[9,8,9,8,9]}
df1=pd.DataFrame(d)
df1.to_csv("e:\student1.csv")

now a MS-Excel file will be created at given path i.e. in E drive of computer system.

OUTPUT:


We can change comma separator with other character using "sep"


import pandas as pd
d={'Name':['Disha','Rohit','Manas','Krish','Vishnu'],
   'Age':[17,18,19,18,19],
   'Score':[9,8,9,8,9]}
df1=pd.DataFrame(d)
df1.to_csv("e:\student1.csv",sep='#')

TO see the output open "student1.csv" file with Notepad 

OUTPUT


#Name#Age#Score
0#Disha#17#9
1#Rohit#18#8
2#Manas#19#9
3#Krish#18#8
4#Vishnu#19#9


No comments:

Post a Comment

Note: Only a member of this blog may post a comment.