Saturday, May 4, 2019

XII-IP: Re-indexing and Altering Labels in Pandas

Reindexing

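Reindexing in Pandas changes the row or column labels of a DataFrame. To reindex means to conform the data to a given set of labels: existing rows (or columns) are rearranged to match the new labels, and any label that is not present in the original is filled with NaN.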
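import pandas as pd
# display() is provided by Jupyter/IPython; use print() in a plain Python script
df = pd.DataFrame([[10,5,50,8],[20,10,60,16],[30,15,70,24],[40,20,80,32],[50,25,90,40]],
                  index=['A','B','C','D','E'], columns=['COL1','COL2','COL3','COL4'])
display(df)
# Rearrange the rows into the given label order
df1 = df.reindex(['C','B','D','E','A'])
display(df1)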

output:

  COL1 COL2 COL3 COL4
A 10 5 50 8
B 20 10 60 16
C 30 15 70 24
D 40 20 80 32
E 50 25 90 40
  COL1 COL2 COL3 COL4
C 30 15 70 24
B 20 10 60 16
D 40 20 80 32
E 50 25 90 40
A 10 5 50 8
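
As a small sketch of what the reordering above does, the following block rebuilds the same DataFrame and checks two things: that reindex returns a new DataFrame and leaves the original untouched, and that when every requested label already exists, the result matches label-based selection with df.loc. The variable names new_order and same_as_loc are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[10, 5, 50, 8], [20, 10, 60, 16], [30, 15, 70, 24],
     [40, 20, 80, 32], [50, 25, 90, 40]],
    index=['A', 'B', 'C', 'D', 'E'],
    columns=['COL1', 'COL2', 'COL3', 'COL4'],
)

new_order = ['C', 'B', 'D', 'E', 'A']
df1 = df.reindex(new_order)

# reindex returns a new DataFrame; the original keeps its row order
print(list(df.index))
print(list(df1.index))

# When every label exists, label-based selection gives the same rows
same_as_loc = df1.equals(df.loc[new_order])
print(same_as_loc)
```

The difference between the two shows up only when a label is missing: reindex adds a NaN row for it, while df.loc raises a KeyError.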

display(df)
df2=df.reindex(['C','B','Z','E','A'])
display(df2)

  COL1 COL2 COL3 COL4
A 10 5 50 8
B 20 10 60 16
C 30 15 70 24
D 40 20 80 32
E 50 25 90 40
  COL1 COL2 COL3 COL4
C 30.0 15.0 70.0 24.0
B 20.0 10.0 60.0 16.0
Z NaN NaN NaN NaN
E 50.0 25.0 90.0 40.0
A 10.0 5.0 50.0 8.0
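
The output above can also be inspected in code. The sketch below rebuilds the same DataFrame, lists the requested labels that were not in the original index, finds the rows that consist only of NaN, and prints the column dtypes to show the switch to float. The names missing and all_nan_rows are illustrative and not part of pandas.

```python
import pandas as pd

df = pd.DataFrame(
    [[10, 5, 50, 8], [20, 10, 60, 16], [30, 15, 70, 24],
     [40, 20, 80, 32], [50, 25, 90, 40]],
    index=['A', 'B', 'C', 'D', 'E'],
    columns=['COL1', 'COL2', 'COL3', 'COL4'],
)

df2 = df.reindex(['C', 'B', 'Z', 'E', 'A'])

# Labels that were requested but do not exist in the original index
missing = [label for label in df2.index if label not in df.index]
print(missing)

# Rows in which every column is NaN
all_nan_rows = df2.index[df2.isna().all(axis=1)].tolist()
print(all_nan_rows)

# Integer columns become float so that they can hold NaN
print(df2.dtypes)
```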

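Because the label Z is not in the original index, its row above is filled with NaN, and the integer columns are converted to float so that they can hold the NaNs. We can supply a different value for such rows by passing it to the fill_value argument. In the following example the missing row is filled with 0 instead of NaN, so the columns stay integers.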

display(df)
df2=df.reindex(['C','B','Z','E','A'], fill_value=0)
display(df2)

  COL1 COL2 COL3 COL4
A 10 5 50 8
B 20 10 60 16
C 30 15 70 24
D 40 20 80 32
E 50 25 90 40
  COL1 COL2 COL3 COL4
C 30 15 70 24
B 20 10 60 16
Z 0 0 0 0
E 50 25 90 40
A 10 5 50 8
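We can use the method parameter with the values bfill, ffill or nearest to fill missing rows from existing rows of the original DataFrame. This requires the original index to be sorted (monotonically increasing or decreasing). In the following example, ffill fills the Z row with the values from the E row, because E is the last label in the original index that sorts before Z; the position of Z in the new index does not matter.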
df2=df.reindex(['C','B','Z','E','A'], method='ffill')
display(df2)
  COL1 COL2 COL3 COL4
C 30 15 70 24
B 20 10 60 16
Z 50 25 90 40
E 50 25 90 40
A 10 5 50 8
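
To compare the three method values side by side, here is a minimal sketch on a separate, made-up Series with a sorted numeric index (the values and labels are assumptions chosen for illustration, not data from this post). A numeric index is used so that nearest has a well-defined distance between labels.

```python
import pandas as pd

# Hypothetical data: a Series with a sorted numeric index
s = pd.Series([1, 2, 3], index=[10, 20, 30])
new_labels = [10, 12, 27, 35]

# ffill: take the value of the last original label <= the new label
ffilled = s.reindex(new_labels, method='ffill')

# bfill: take the value of the first original label >= the new label;
# 35 has no such label, so it stays NaN
bfilled = s.reindex(new_labels, method='bfill')

# nearest: take the value of the closest original label
nearest = s.reindex(new_labels, method='nearest')

print(ffilled.tolist())
print(bfilled.tolist())
print(nearest.tolist())
```

With ffill, the label 12 takes the value stored at 10, while with bfill it takes the value stored at 20. The lookup always runs against the original index, not against neighbouring rows of the result.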

Altering Labels in DataFrames

We can also reindex the columns by setting the columns parameter, as shown in the following example. Since COL5 is not a column of the original DataFrame, it is filled with NaN in the result. COL4 is left out of the list, so it is dropped.

display(df)
df2=df.reindex(columns=['COL1','COL2','COL3','COL5'])
display(df2)

  COL1 COL2 COL3 COL4
A 10 5 50 8
B 20 10 60 16
C 30 15 70 24
D 40 20 80 32
E 50 25 90 40
  COL1 COL2 COL3 COL5
A 10 5 50 NaN
B 20 10 60 NaN
C 30 15 70 NaN
D 40 20 80 NaN
E 50 25 90 NaN

The same can be achieved by passing the labels as the first argument and setting the axis parameter to 'columns', as shown below.

df2=df.reindex(['COL1','COL2','COL3','COL5'], axis='columns')
display(df2)

  COL1 COL2 COL3 COL5
A 10 5 50 NaN
B 20 10 60 NaN
C 30 15 70 NaN
D 40 20 80 NaN
E 50 25 90 NaN
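
The following sketch checks that the two spellings give the same result, and shows that fill_value works for new columns just as it does for new rows. The names cols, by_keyword, by_axis and filled are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[10, 5, 50, 8], [20, 10, 60, 16], [30, 15, 70, 24],
     [40, 20, 80, 32], [50, 25, 90, 40]],
    index=['A', 'B', 'C', 'D', 'E'],
    columns=['COL1', 'COL2', 'COL3', 'COL4'],
)

cols = ['COL1', 'COL2', 'COL3', 'COL5']

# Two equivalent ways to reindex the columns
by_keyword = df.reindex(columns=cols)
by_axis = df.reindex(cols, axis='columns')
print(by_keyword.equals(by_axis))

# fill_value replaces the NaNs in the newly created column
filled = df.reindex(columns=cols, fill_value=0)
print(filled['COL5'].tolist())
```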

We can use the rename method to rename the columns of a DataFrame. We pass a dictionary that maps old names to new names to the columns parameter of rename. Unlike reindex, rename changes only the labels and leaves the data as it is.

df = df.rename(columns={'COL1':'W','COL2':'X','COL3':'Y','COL4':'Z'})
df
  W X Y Z
A 10 5 50 8
B 20 10 60 16
C 30 15 70 24
D 40 20 80 32
E 50 25 90 40
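
rename accepts a few more forms than the dictionary for columns shown above. The sketch below, which rebuilds the original DataFrame with its COL1 to COL4 labels, renames a row label through the index parameter, passes a function instead of a dictionary, and shows that a key that matches no existing label is ignored by default. The label 'NOPE' is a made-up name used only to demonstrate that last point.

```python
import pandas as pd

df = pd.DataFrame(
    [[10, 5, 50, 8], [20, 10, 60, 16], [30, 15, 70, 24],
     [40, 20, 80, 32], [50, 25, 90, 40]],
    index=['A', 'B', 'C', 'D', 'E'],
    columns=['COL1', 'COL2', 'COL3', 'COL4'],
)

# Rename a row label with the index parameter
rows_renamed = df.rename(index={'A': 'a'})
print(list(rows_renamed.index))

# A function is applied to every label
lowered = df.rename(columns=str.lower)
print(list(lowered.columns))

# 'NOPE' is not a column, so that mapping entry is ignored by default
partly_renamed = df.rename(columns={'COL1': 'W', 'NOPE': 'Q'})
print(list(partly_renamed.columns))
```

In all three calls the data values are unchanged and df itself is not modified, because rename returns a new DataFrame unless its result is assigned back.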
