PANDAS LAB EXP-2

1.import pandas as pd

ds=pd.Series([1,2,3,4])

print(ds)

output:

0 1

1 2

2 3

3 4

dtype: int64

2.import pandas as pd

ds=pd.Series([1,2,3,4])

print(ds.tolist())

print(type(ds.tolist()))

output:

[1, 2, 3, 4]

3.import pandas as pd

ds1=pd.Series([1,3,5,7])

ds2=pd.Series([2,4,6,8])

add=ds1+ds2

sub=ds1=ds2

mul=ds1*ds2

div=ds1/ds2

print(add)

print(sub)

print(mul)

print(div)

output:

0 3

1 7

2 11

3 15

dtype: int64

0 2

1 4

2 6

3 8

dtype: int64

0 4

1 16

2 36

3 64

dtype: int64

0 1.0

1 1.0

2 1.0

3 1.0

dtype: float64

4.import numpy as np

a=np.array([10,20,30,40,50])

print(a)

import pandas as pd

ds=pd.Series(a)

print(ds)

output:

[10 20 30 40 50]

0 10

1 20

2 30

3 40

4 50

dtype: int32

5.import pandas as pd

ds=pd.Series(i for i in range(50))

print(ds.head(-10))

output:

0 0

1 1

2 2

3 3

4 4

5 5

6 6

7 7

8 8

9 9

10 10

11 11

12 12

13 13

14 14

15 15

16 16

17 17

18 18

19 19

20 20

21 21

22 22

23 23

24 24

25 25

26 26

27 27

28 28

29 29

30 30

31 31

32 32

33 33

34 34

35 35

36 36

37 37

38 38

39 39

dtype: int64

6.import pandas as pd

ds=pd.Series(i for i in range(50))

print(ds.head(-29).tail(-14))

output:

14 14

15 15

16 16

17 17

18 18

19 19

20 20

dtype: int64

7.import numpy as np

import pandas as pd

exam_data = {

'name': ['Anastasia','Dima','Katherine','James','Emily','Michael','Mathnew','Laura','Kevin','Jonas'],

'score':[12.5,9,16.5,np.nan,9,20,14.5,np.nan,8,19],

'attempts':[1,3,2,3,2,3,1,1,2,1],

'qulify':['yes','no','yes','no','no','yes','yes','no','no','yes']}

lables=['a','b','c','d','e','f','g','h','i','j']

df=pd.DataFrame(exam_data,index=lables)

print(df)

output:

name score attempts qulify

a Anastasia 12.5 1 yes

b Dima 9.0 3 no

c Katherine 16.5 2 yes

d James NaN 3 no

e Emily 9.0 2 no

f Michael 20.0 3 yes

g Mathnew 14.5 1 yes

h Laura NaN 1 no

i Kevin 8.0 2 no

j Jonas 19.0 1 yes

8.import numpy as np

import pandas as pd

exam_data = {

'name': ['Anastasia','Dima','Katherine','James','Emily','Michael','Mathnew','Laura','Kevin','Jonas'],

'score':[12.5,9,16.5,np.nan,9,20,14.5,np.nan,8,19],

'attempts':[1,3,2,3,2,3,1,1,2,1],

'qulify':['yes','no','yes','no','no','yes','yes','no','no','yes']}

lables=['a','b','c','d','e','f','g','h','i','j']

df['name']=df['name'].replace('Jonas','Suresh')

print(df)

output:

name score attempts qulify

a Anastasia 12.5 1 yes

b Dima 9.0 3 no

c Katherine 16.5 2 yes

d James NaN 3 no

e Emily 9.0 2 no

f Michael 20.0 3 yes

g Mathnew 14.5 1 yes

h Laura NaN 1 no

i Kevin 8.0 2 no

j Suresh 19.0 1 yes

9.import numpy as np

import pandas as pd

exam_data = {

'name': ['Anastasia','Dima','Katherine','James','Emily','Michael','Mathnew','Laura','Kevin','Jonas'],

'score':[12.5,9,16.5,np.nan,9,20,14.5,np.nan,8,19],

'attempts':[1,3,2,3,2,3,1,1,2,1],

'qulify':['yes','no','yes','no','no','yes','yes','no','no','yes']}

lables=['a','b','c','d','e','f','g','h','i','j']

df['colour']=['red','blue','red','blue','red','blue','red','blue','red','blue']

print(df)

output:

name score attempts qulify colour

a Anastasia 12.5 1 yes red

b Dima 9.0 3 no blue

c Katherine 16.5 2 yes red

d James NaN 3 no blue

e Emily 9.0 2 no red

f Michael 20.0 3 yes blue

g Mathnew 14.5 1 yes red

h Laura NaN 1 no blue

i Kevin 8.0 2 no red

j Suresh 19.0 1 yes blue

10.import numpy as np

import pandas as pd

exam_data = {

'name': ['Anastasia','Dima','Katherine','James','Emily','Michael','Mathnew','Laura','Kevin','Jonas'],

'score':[12.5,9,16.5,np.nan,9,20,14.5,np.nan,8,19],

'attempts':[1,3,2,3,2,3,1,1,2,1],

'qulify':['yes','no','yes','no','no','yes','yes','no','no','yes']}

lables=['a','b','c','d','e','f','g','h','i','j']

df=pd.DataFrame(exam_data,index=lables)

df=(list(df.columns.values))

print(df)

output:

['name', 'score', 'attempts', 'qulify']

C) Pandas Index:

1) Write a Pandas program to display the default index and set a column as an Index in a givendataframe.

import pandas as pd

df= pd.DataFrame({'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'], 'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19], 'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1], 'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes'], 'labels' : ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'] })

print("Default Index:")

print(df.head(10))

print("\nt_id as new Index:")

df1 = df.set_index('label')

print(df1)

print("\nReset the index:")

df2 = df1.reset_index(inplace=False)

print(df2)

O/P

Default Index:

name score attempts qualify labels

0 Anastasia 12.5 1 yes a

1 Dima 9.0 3 no b

2 Katherine 16.5 2 yes c

3 James NaN 3 no d

4 Emily 9.0 2 no e

5 Michael 20.0 3 yes f

6 Matthew 14.5 1 yes g

7 Laura NaN 1 no h

8 Kevin 8.0 2 no i

9 Jonas 19.0 1 yes j

label as new Index:

name score attempts qualify

labels

a Anastasia 12.5 1 yes

b Dima 9.0 3 no

c Katherine 16.5 2 yes

d James NaN 3 no

e Emily 9.0 2 no

f Michael 20.0 3 yes

g Matthew 14.5 1 yes

h Laura NaN 1 no

i Kevin 8.0 2 no

j Jonas 19.0 1 yes

Reset the index:

labels name score attempts qualify

0 a Anastasia 12.5 1 yes

1 b Dima 9.0 3 no

2 c Katherine 16.5 2 yes

3 d James NaN 3 no

4 e Emily 9.0 2 no

5 f Michael 20.0 3 yes

6 g Matthew 14.5 1 yes

7 h Laura NaN 1 no

8 i Kevin 8.0 2 no

9 j Jonas 19.0 1 yes

2) Write a Pandas program to create an index labels by using 64-bit integers, usingfloating-point numbers in a given dataframe.

import pandas as pd

print(df)

print("\nView the Attempts:")

print(df.attempts)

print("\nView the Score:")

print(df.score)

O/P

name  score  attempts qualify labels

0  Anastasia   12.5         1     yes      a

1       Dima    9.0         3      no      b

2  Katherine   16.5         2     yes      c

3      James    NaN         3      no      d

4      Emily    9.0         2      no      e

5    Michael   20.0         3     yes      f

6    Matthew   14.5         1     yes      g

7      Laura    NaN         1      no      h

8      Kevin    8.0         2      no      i

9      Jonas   19.0         1     yes      j

View the Attempts:

0    1

1    3

2    2

3    3

4    2

5    3

6    1

7    1

8    2

9    1

Name: attempts, dtype: int64

View the Score:

0    12.5

1     9.0

2    16.5

3     NaN

4     9.0

5    20.0

6    14.5

7     NaN

8     8.0

9    19.0

Name: score, dtype: float64

D) Pandas String and Regular Expressions:

1) Write a Pandas program to convert all the string values to upper, lower cases in agiven pandas series. Also find the length of the string values.

import pandas as pd

import numpy as np

s=pd.Series(['hima','Parrot','np.num'])

print("Original Series")

print(s)

print("\nConvert all string values of the said Series to upper case:")

print(s.str.upper())

print("\nConvert all string values of the said Series to lower case:")

print(s.str.lower())

print("\nLength of the string values of the said Series:")

print(s.str.len())

O/P

Original Series

0      hima

1    Parrot

2    np.num

dtype: object

Convert all string values of the said Series to upper case:

0      HIMA

1    PARROT

2    NP.NUM

dtype: object

Convert all string values of the said Series to lower case:

0      hima

1    parrot

2    np.num

dtype: object

Length of the string values of the said Series:

0    4

1    6

2    6

dtype: int64

2) Write a Pandas program to remove whitespaces, left sided whitespaces and right sided whitespaces of the string values of a given pandas series.

import pandas as pd

c = pd.Index([' Yellow', 'Black ', ' Orange ', 'White', ' Pink '])

print("Original series:")

print(c)

print("\nRemove whitespace")

print(c.str.strip())

print("\nRemove left sided whitespace")

print(c.str.lstrip())

print("\nRemove Right sided whitespace")

print(c.str.rstrip())

O/P

Original series:

Index([' Yellow', 'Black ', ' Orange ', 'White', ' Pink '], dtype='object')

Remove whitespace

Index(['Yellow', 'Black', 'Orange', 'White', 'Pink'], dtype='object')

Remove left sided whitespace

Index(['Yellow', 'Black ', 'Orange ', 'White', 'Pink '], dtype='object')

Remove Right sided whitespace

Index([' Yellow', 'Black', ' Orange', 'White', ' Pink'], dtype='object')

3) Write a Pandas program to count of occurrence of a specified substring in a DataFrame column.

import pandas as pd

df=pd.DataFrame({'labels' : ['a', 'b', 'c', 'd', 'e', 'f' , 'g', 'h', 'i', 'j'], 'score': ['A12', 'A9', 'A16', 'A22', 'A9', 'A20', 'A14', 'A32', 'A8', 'A129' ], 'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas']})

print("Original DataFrame:")

print(df)

print("\nCount occurrence of 2 in score column:")

df['count'] = list(map(lambda x: x.count("2"), df['score']))

print(df)

O/P

Original DataFrame:

  labels score       name

0      a   A12  Anastasia

1      b    A9       Dima

2      c   A16  Katherine

3      d   A22      James

4      e    A9      Emily

5      f   A20    Michael

6      g   A14    Matthew

7      h   A32      Laura

8      i    A8      Kevin

9      j  A129      Jonas

Count occurrence of 2 in score column:

  labels score       name  count

0      a   A12  Anastasia      1

1      b    A9       Dima      0

2      c   A16  Katherine      0

3      d   A22      James      2

4      e    A9      Emily      0

5      f   A20    Michael      1

6      g   A14    Matthew      0

7      h   A32      Laura      1

8      i    A8      Kevin      0

9      j  A129      Jonas      1

4) Write a Pandas program to swap the cases of a specified character column in a givenDataFrame.

import pandas as pd

df=pd.DataFrame({'labels' : ['a', 'b', 'c', 'd', 'e', 'f' , 'g', 'h', 'i', 'j'], 'score': ['A12', 'A9', 'A16', 'A22', 'A9', 'A20', 'A14', 'A32', 'A8', 'A129' ], 'name': ['Anastasia', 'DIMA', 'Katherine', 'JAMES', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas']})

print("Original DataFrame:")

print(df)

print("\nSwapp cases in name:")

df['swapped_name'] = list(map(lambda x: x.swapcase(), df['name']))

print(df)

O/P

Original DataFrame:

  labels score       name

0      a   A12  Anastasia

1      b    A9       DIMA

2      c   A16  Katherine

3      d   A22      JAMES

4      e    A9      Emily

5      f   A20    Michael

6      g   A14    Matthew

7      h   A32      Laura

8      i    A8      Kevin

9      j  A129      Jonas

Swapp cases in name:

  labels score       name swapped_name

0      a   A12  AnastasiaaNASTASIA

1      b    A9       DIMA         dima

2      c   A16  KatherinekATHERINE

3      d   A22      JAMES        james

4      e    A9      Emily        eMILY

5      f   A20    Michael      mICHAEL

6      g   A14    Matthew      mATTHEW

7      h   A32      Laura        lAURA

8      i    A8      Kevin        kEVIN

9      j  A129      Jonas        jONAS

E) Pandas Joining and merging DataFrame:

1) Write a Pandas program to join the two given dataframesalong rows and assign all data.

import pandas as pd

stu_data1=pd.DataFrame({'sid':['s1', 's2', 's3', 's4', 's5'], 'name': ['Sunny', 'Munny', 'Buddy', 'Akshu', 'Nani'], 'Marks': [55, 80, 95, 82, 85]})

stu_data2=pd.DataFrame({'sid':['s6', 's7', 's8', 's9', 's10'], 'name': ['Sunny', 'Munny', 'Buddy', 'Akshu', 'Nani'], 'Marks': [65, 70, 98, 72, 65]})

print("Original DataFrames:")

print(stu_data1)

print("-------------------------------------")

print(stu_data2)

print("\nJoin the said two dataframes along rows:")

result_data = pd.concat([stu_data1, stu_data2])

print(result_data)

O/P

Original DataFrames:

sidname  Marks

0  s1  Sunny     55

1  s2  Munny     80

2  s3  Buddy     95

3  s4  Akshu     82

4  s5   Nani     85

-------------------------------------

sidname  Marks

0   s6  Sunny     65

1   s7  Munny     70

2   s8  Buddy     98

3   s9  Akshu     72

4  s10   Nani     65

Join the said two dataframes along rows:

sidname  Marks

0   s1  Sunny     55

1   s2  Munny     80

2   s3  Buddy     95

3   s4  Akshu     82

4   s5   Nani     85

0   s6  Sunny     65

1   s7  Munny     70

2   s8  Buddy     98

3   s9  Akshu     72

4  s10   Nani     65

2) Write a Pandas program to append a list of dictioneries or series to a existing DataFrame and display the combined data.

import pandas as pd

stu_data1=pd.DataFrame({'sid':['s1', 's2', 's3', 's4', 's5'], 'name': ['Sunny', 'Munny', 'Buddy', 'Akshu', 'Nani'], 'Marks': [55, 80, 95, 82, 85]})

s6 = pd.Series(['S6', 'Yogi', 96], index=['sid', 'name', 'marks'])

dicts = [{'sid': 'S6', 'name': 'Yogi', 'Marks': 96},{'sid': 'S7', 'name': 'Abhi', 'Marks': 88}]

print("Original DataFrames:")

print(stu_data1)

print("\nDictionary:")

print(s6)

combined_data= stu_data1.append(dicts, ignore_index=True, sort=False)

print("\nCombined Data:")

print(combined_data)

O/P

Original DataFrames:

sidname  Marks

0  s1  Sunny     55

1  s2  Munny     80

2  s3  Buddy     95

3  s4  Akshu     82

4  s5   Nani     85

Dictionary:

sid        S6

name     Yogi

marks      96

dtype: object

Combined Data:

sidname  Marks

0  s1  Sunny     55

1  s2  Munny     80

2  s3  Buddy     95

3  s4  Akshu     82

4  s5   Nani     85

5  S6   Yogi     96

6  S7   Abhi     88

3) Write a Pandas program to join the two dataframes with matching records from bothsides where available.

import pandas as pd

stu_data1=pd.DataFrame({'sid':['s1', 's2', 's3', 's4', 's5'], 'name': ['Sunny', 'Munny', 'Buddy', 'Akshu', 'Nani'], 'Marks': [55, 80, 95, 82, 85]})

stu_data2=pd.DataFrame({'sid':['s6', 's7', 's8', 's9', 's10'], 'name': ['Sunny', 'Munny', 'Buddy', 'Akshu', 'Nani'], 'Marks': [65, 70, 98, 72, 65]})

print("Original DataFrames:")

print(stu_data1)

print(stu_data2)

merged_data = pd.merge(stu_data1, stu_data2, on='sid', how='outer')

print("Merged data (outer join):")

print(merged_data)

O/P

Original DataFrames:

sidname Marks

0 s1 Sunny 55

1 s2 Munny 80

2 s3 Buddy 95

3 s4 Akshu 82

4 s5 Nani 85

sidname Marks

0 s6 Sunny 65

1 s7 Munny 70

2 s8 Buddy 98

3 s9 Akshu 72

4 s10 Nani 65

Merged data (outer join):

sidname_xMarks_xname_yMarks_y

0 s1 Sunny 55.0 NaNNaN

1 s2 Munny 80.0 NaNNaN

2 s3 Buddy 95.0 NaNNaN

3 s4 Akshu 82.0 NaNNaN

4 s5 Nani 85.0 NaNNaN

5 s6 NaNNaN Sunny 65.0

6 s7 NaNNaNMunny 70.0

7 s8 NaNNaN Buddy 98.0

8 s9 NaNNaNAkshu 72.0

9 s10 NaNNaN Nani 65.0

F) Pandas Time Series:

1) Write a Pandas program to create

a) Datetime object for Jan 15 2012.

b) Specific date and time of 9:20 pm.

c) Local date and time.

d) A date without time.

e) Current date.

f) Time from a datetime.

g) Current local time.

import datetime

from datetime import datetime

print("Datetime object for Jan 11 2012:")

print(datetime(2012, 1, 11))

print("\nSpecific date and time of 9:20 pm")

print(datetime(2011, 1, 11, 21, 20))

print("\nLocal date and time:")

print(datetime.now())

print("\nA date without time: ")

print(datetime.date(datetime(2012, 5, 22)))

print("\nCurrent date:")

print(datetime.now().date())

print("\nTime from a datetime:")

print(datetime.time(datetime(2012, 12, 15, 18, 12)))

print("\nCurrent local time:")

print(datetime.now().time())

O/P

Datetime object for May 20 1985:

1985-05-20 00:00:00

Specific date and time of 9:20 pm

2011-01-11 21:20:00

Local date and time:

2022-03-18 14:12:11.440313

A date without time:

2012-05-22

Current date:

2022-03-18

Time from a datetime:

18:12:00

Current local time:

14:12:11.440313

2) Write a Pandas program to create a date from a given year, month, day and anotherdate from a given string formats.

from datetime import datetime

date1 = datetime(year=2017, month=09, day=10)

print("Date from a given year, month, day:")

print(date1)

from dateutil import parser

date2 = parser.parse("8th of January, 2010")

print("\nDate from a given string formats:")

print(date2)

O/P

Date from a given year, month, day:

2017-09-10 00:00:00

Date from a given string formats:

2010-01-08 00:00:00

3) Write a Pandas program to create a time-series with two index labels and randomvalues. Also print the type of the index.

import pandas as pd

import numpy as np

import datetime

from datetime import datetime, date

dates = [datetime(2011, 9, 1), datetime(2011, 9, 2)]

print("Time-series with two index labels:")

time_series = pd.Series(np.random.randn(2), dates)

print(time_series)

print("\nType of the index:")

print(type(time_series.index))

O/P

Time-series with two index labels:

2011-09-01 0.408125

2011-09-02 0.884415

dtype: float64

Type of the index:

G) Pandas Grouping Aggregate:

Consider dataset:

school class name date_Of_Birth age height weight address

S1 s001 V Alberto Franco 15/05/2002 12 173 35 street1

S2 s002 V Gino Mcneill17/05/2002 12 192 32street2

S3 s003 VI Ryan Parkes 16/02/1999 13 186 33street3

S4 s001 VI Eesha Hinton 25/09/1998 13 167 30 street1

S5 s002 V Gino Mcneill11/05/2002 14 151 31street2

S6 s004 VI David Parkes 15/09/1997 12 159 32street4

1) Write a Pandas program to split the following dataframe into groups based on school

code. Also check the type of GroupBy object.

import pandas as pd

pd.set_option('display.max_rows', None)

#pd.set_option('display.max_columns', None)

stu_data = pd.DataFrame({'school_code': ['s001','s002','s003','s001','s002','s004'],'class': ['V', 'V', 'VI', 'VI', 'V', 'VI'], 'name': ['Alberto Franco','GinoMcneill','Ryan Parkes', 'Eesha Hinton', 'Gino Mcneill', 'David Parkes'], 'date_Of_Birth ': ['15/05/2002','17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],'age': [12, 12, 13, 13, 14, 12], 'height': [173, 192, 186, 167, 151, 159], 'weight': [35, 32, 33, 30, 31, 32], 'address': ['street1', 'street2', 'street3', 'street1', 'street2', 'street4']}, index=['S1', 'S2', 'S3', 'S4', 'S5', 'S6'])

print("Original DataFrame:")

print(stu_data)

print('\nSplit the said data on school_code wise:')

result = stu_data.groupby(['school_code'])

for name,group in result:

print("\nGroup:")

print(name)

print(group0

print("\nType of the object:")

print(type(result))

O/P

Original DataFrame:

school_code class name date_Of_Birthage height weight \

S1 s001 V Alberto Franco 15/05/2002 12 173 35

S2 s002 V Gino Mcneill 17/05/2002 12 192 32

S3 s003 VI Ryan Parkes 16/02/1999 13 186 33

S4 s001 VI Eesha Hinton 25/09/1998 13 167 30

S5 s002 V Gino Mcneill 11/05/2002 14 151 31

S6 s004 VI David Parkes 15/09/1997 12 159 32

address

S1 street1

S2 street2

S3 street3

S4 street1

S5 street2

S6 street4

Split the said data on school_code wise:

Group:

s001

school_code class name date_Of_Birthage height weight \

S1 s001 V Alberto Franco 15/05/2002 12 173 35

S4 s001 VI Eesha Hinton 25/09/1998 13 167 30

address

S1 street1

S4 street1

Group:

s002

school_code class name date_Of_Birthage height weight \

S2 s002 V GinoMcneill 17/05/2002 12 192 32

S5 s002 V GinoMcneill 11/05/2002 14 151 31

address

S2 street2

S5 street2

Group:

s003

school_code class name date_Of_Birthage height weight address

S3 s003 VI Ryan Parkes 16/02/1999 13 186 33 street3

Group:

s004

school_code class name date_Of_Birthage height weight \

S6 s004 VI David Parkes 15/09/1997 12 159 32

address

S6 street4

Type of the object:

pandas.core.groupby.generic.DataFrameGroupBy

2) Write a Pandas program to split the following dataframe by school code and getmean, min, and max value of age for each school.

import pandas as pd

pd.set_option('display.max_rows', None)

#pd.set_option('display.max_columns', None)

print("Original DataFrame:")

print(stu_data)

print('\nMean, min, and max value of age for each value of the school:')

grouped_single = stu_data.groupby('school_code').agg({'age': ['mean', 'min', 'max']})

print(grouped_single)

O/P

Original DataFrame:

school_code class name date_Of_Birthage height weight \

S1 s001 V Alberto Franco 15/05/2002 12 173 35

S2 s002 V Gino Mcneill 17/05/2002 12 192 32

S3 s003 VI Ryan Parkes 16/02/1999 13 186 33

S4 s001 VI Eesha Hinton 25/09/1998 13 167 30

S5 s002 V Gino Mcneill 11/05/2002 14 151 31

S6 s004 VI David Parkes 15/09/1997 12 159 32

address

S1 street1

S2 street2

S3 street3

S4 street1

S5 street2

S6 street4

Mean, min, and max value of age for each value of the school:

age

mean min max

school_code

s001 12.5 12 13

s002 13.0 12 14

s003 13.0 13 13

s004 12.0 12 12

H) Pandas Styling:

1) Create a dataframe of ten rows, four columns with random values. Write a Pandasprogram to highlight the negative numbers red and positive numbers black.

import pandas as pd

import numpy as np

np.random.seed(24)

df = pd.DataFrame({'A': np.linspace(1, 10, 10)})

df = pd.concat([df, pd.DataFrame(np.random.randn(10, 4), columns=list('BCDE'))],

axis=1)

print("Original array:")

print(df)

def color_negative_red(val):

color = 'red' if val< 0 else 'black'

return 'color: %s' % color

print("\nNegative numbers red and positive numbers black:")

df.style.applymap(color_negative_red)

O/P

Original array:

A B C D E

0 1.0 1.329212 -0.770033 -0.316280 -0.990810

1 2.0 -1.070816 -1.438713 0.564417 0.295722

2 3.0 -1.626404 0.219565 0.678805 1.889273

3 4.0 0.961538 0.104011 -0.481165 0.850229

4 5.0 1.453425 1.057737 0.165562 0.515018

5 6.0 -1.336936 0.562861 1.392855 -0.063328

6 7.0 0.121668 1.207603 -0.002040 1.627796

7 8.0 0.354493 1.037528 -0.385684 0.519818

8 9.0 1.686583 -1.325963 1.428984 -2.089354

9 10.0 -0.129820 0.631523 -0.586538 0.290720

Negative numbers red and positive numbers black:

Out[10]:

	A	B	C	D	E
0	1.000000	1.329212	-0.770033	-0.316280	-0.990810
1	2.000000	-1.070816	-1.438713	0.564417	0.295722
2	3.000000	-1.626404	0.219565	0.678805	1.889273
3	4.000000	0.961538	0.104011	-0.481165	0.850229
4	5.000000	1.453425	1.057737	0.165562	0.515018
5	6.000000	-1.336936	0.562861	1.392855	-0.063328
6	7.000000	0.121668	1.207603	-0.002040	1.627796
7	8.000000	0.354493	1.037528	-0.385684	0.519818
8	9.000000	1.686583	-1.325963	1.428984	-2.089354
9	10.000000	-0.129820	0.631523	-0.586538	0.290720

2) Create a dataframe of ten rows, four columns with random values. Write a Pandasprogram to highlight the maximum value in each column.

import pandas as pd

import numpy as np

np.random.seed(24)

df = pd.DataFrame({'A': np.linspace(1, 10, 10)})

df = pd.concat([df, pd.DataFrame(np.random.randn(10, 4), columns=list('BCDE'))],axis=1)

df.iloc[0, 2] = np.nan

df.iloc[3, 3] = np.nan

df.iloc[4, 1] = np.nan

df.iloc[9, 4] = np.nan

print("Original array:")

print(df)

def highlight_max(s):

'''

highlight the maximum in a Series green.

'''

is_max = s == s.max()

return ['background-color: green' if v else '' for v in is_max]

print("\nHighlight the maximum value in each column:")

df.style.apply(highlight_max,subset=pd.IndexSlice[:, ['B', 'C', 'D', 'E']])

O/P

Original array:

A B C D E

0 1.0 1.329212NaN -0.316280 -0.990810

1 2.0 -1.070816 -1.438713 0.564417 0.295722

2 3.0 -1.626404 0.219565 0.678805 1.889273

3 4.0 0.961538 0.104011 NaN 0.850229

4 5.0 NaN 1.057737 0.165562 0.515018

5 6.0 -1.336936 0.562861 1.392855 -0.063328

6 7.0 0.121668 1.207603 -0.002040 1.627796

7 8.0 0.354493 1.037528 -0.385684 0.519818

8 9.0 1.686583 -1.325963 1.428984 -2.089354

9 10.0 -0.129820 0.631523 -0.586538 NaN

Highlight the maximum value in each column:

Out[11]:

	A	B	C	D	E
0	1.000000	1.329212	nan	-0.316280	-0.990810
1	2.000000	-1.070816	-1.438713	0.564417	0.295722
2	3.000000	-1.626404	0.219565	0.678805	1.889273
3	4.000000	0.961538	0.104011	nan	0.850229
4	5.000000	nan	1.057737	0.165562	0.515018
5	6.000000	-1.336936	0.562861	1.392855	-0.063328
6	7.000000	0.121668	1.207603	-0.002040	1.627796
7	8.000000	0.354493	1.037528	-0.385684	0.519818
8	9.000000	1.686583	-1.325963	1.428984	-2.089354
9	10.000000	-0.129820	0.631523	-0.586538	nan

3) Create a dataframe of ten rows, four columns with random values. Write a Pandasprogram to highlight dataframe's specific columns.

import pandas as pd

import numpy as np

np.random.seed(24)

df = pd.DataFrame({'A': np.linspace(1, 10, 10)})

df = pd.concat([df, pd.DataFrame(np.random.randn(10, 4), columns=list('BCDE'))],axis=1)

df.iloc[0, 2] = np.nan

df.iloc[3, 3] = np.nan

df.iloc[4, 1] = np.nan

df.iloc[9, 4] = np.nan

print("Original array:")

print(df)

def highlight_cols(s):

color = 'grey'

return 'background-color: %s' % color

print("\nHighlight specific columns:")

df.style.applymap(highlight_cols, subset=pd.IndexSlice[:, ['B', 'C']])

O/P

Original array:

A B C D E

0 1.0 1.329212NaN -0.316280 -0.990810

1 2.0 -1.070816 -1.438713 0.564417 0.295722

2 3.0 -1.626404 0.219565 0.678805 1.889273

3 4.0 0.961538 0.104011 NaN 0.850229

4 5.0 NaN 1.057737 0.165562 0.515018

5 6.0 -1.336936 0.562861 1.392855 -0.063328

6 7.0 0.121668 1.207603 -0.002040 1.627796

7 8.0 0.354493 1.037528 -0.385684 0.519818

8 9.0 1.686583 -1.325963 1.428984 -2.089354

9 10.0 -0.129820 0.631523 -0.586538 NaN

Highlight specific columns:

Out[12]:

	A	B	C	D	E
0	1.000000	1.329212	nan	-0.316280	-0.990810
1	2.000000	-1.070816	-1.438713	0.564417	0.295722
2	3.000000	-1.626404	0.219565	0.678805	1.889273
3	4.000000	0.961538	0.104011	nan	0.850229
4	5.000000	nan	1.057737	0.165562	0.515018
5	6.000000	-1.336936	0.562861	1.392855	-0.063328
6	7.000000	0.121668	1.207603	-0.002040	1.627796
7	8.000000	0.354493	1.037528	-0.385684	0.519818
8	9.000000	1.686583	-1.325963	1.428984	-2.089354
9	10.000000	-0.129820	0.631523	-0.586538	nan

I) Excel:

1) Write a Pandas program to import excel data into a Pandas dataframe.

import pandas as pd

df = pd.read_excel('D:\Student.xlsx')

print(df.head)

O/P

<bound method NDFrame.head of Unnamed: 0 Unnamed: 1 Unnamed: 2 Unnamed: 3

0 NaNsidsname address

1 NaN 1 ABC STREET1

2 NaN 2 BCD STREET2

3 NaN 3 CDE STREET3

4 NaN 4 DEF STREET4

5 NaN 5 EFG STREET5>

2) Write a Pandas program to find the sum, mean, max, min value of a column of file.

PANDAS LAB EXP-2

Contact Form