Wednesday, October 31, 2018

Unique values and their counts for each column in a pandas DataFrame

I built this test to explore issues in a dataset. Now I want to extend it to show the counts of the most and least frequent values in each column (at most 10 values, ideally with an adjustable limit).

import numpy as np
import pandas as pd

lvl1 = ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', np.nan]
lvl2 = ['foo', 'foo', 'bar', 'bar', 'bar', 'foo', 'foo', 'foo', 'bar', 'bar']
lvl3 = [1, 1, 1, 2, 2, 3, 3, 4, 5, 6]
df = pd.DataFrame({'L1': lvl1, 'L2': lvl2, 'L3': lvl3})


df.apply(lambda x: [ 100*(1-x.count()/len(x.index)),x.dtype,x.unique()],result_type='expand').T.rename(index=str, columns={0: "Nullity %", 1: "Type",2:"Unique Values"})

This gives

    Nullity %  Type    Unique Values
L1  10         object  [A, B, nan]
L2  0          object  [foo, bar]
L3  0          int     [1, 2, 3, 4, 5, 6]

How can I expand it to:

    Nullity %  Type    UniqueValue1  UniqueValue2  UniqueValue3  ...  UniqueValue-3  UniqueValue-2  UniqueValue-1
L1  10         object  A:5           B:4           nan:1
L2  0          object  foo:5         bar:5
L3  0          int     1:3           2:2           3:2           ...  4:1            5:1            6:1
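
One possible approach, as a minimal sketch rather than a definitive answer: count values per column with `Series.value_counts(dropna=False)`, then lay out the `n` most frequent values followed by up to `n` of the remaining least frequent ones. The helper name `summarize_columns` and its parameter `n` are hypothetical names introduced here for illustration. The order of values with equal counts is not guaranteed.

```python
import numpy as np
import pandas as pd


def summarize_columns(df, n=10):
    """Per-column nullity, dtype, and the n most / n least frequent values.

    Hypothetical helper: `n` caps how many values appear on each side.
    """
    rows = []
    for col in df.columns:
        s = df[col]
        # dropna=False counts NaN as a value; the result is sorted by count, descending
        counts = s.value_counts(dropna=False)
        labels = [f"{value}:{count}" for value, count in counts.items()]

        top = labels[:n]
        # Take the tail only from values not already shown in `top`
        bottom = labels[n:][-n:]

        row = {"Nullity %": 100 * s.isna().mean(), "Type": s.dtype}
        for i, label in enumerate(top, start=1):
            row[f"UniqueValue{i}"] = label
        for i, label in enumerate(bottom):
            # UniqueValue-1 is the least frequent value
            row[f"UniqueValue-{len(bottom) - i}"] = label
        rows.append(row)
    # Cells that a column does not fill are left as NaN
    return pd.DataFrame(rows, index=df.columns)


df = pd.DataFrame({
    "L1": ["A", "A", "A", "A", "A", "B", "B", "B", "B", np.nan],
    "L2": ["foo", "foo", "bar", "bar", "bar", "foo", "foo", "foo", "bar", "bar"],
    "L3": [1, 1, 1, 2, 2, 3, 3, 4, 5, 6],
})

summary = summarize_columns(df, n=3)
print(summary)
```

With `n=3`, the `L3` row fills both the `UniqueValue1..3` and `UniqueValue-3..-1` columns, while `L1` and `L2` have too few distinct values to reach the tail columns.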
