I built this test to explore issues in a dataset. Now I want to extend it so that it also counts the most and least frequent values in each column (at most 10 values, though it would be nice if that limit were adjustable).
import numpy as np
import pandas as pd

lvl1 = ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', np.nan]
lvl2 = ['foo', 'foo', 'bar', 'bar', 'bar', 'foo', 'foo', 'foo', 'bar', 'bar']
lvl3 = [1, 1, 1, 2, 2, 3, 3, 4, 5, 6]
df = pd.DataFrame({'L1': lvl1, 'L2': lvl2, 'L3': lvl3})
# Per column: percentage of missing values, dtype, and the distinct values
df.apply(lambda x: [100 * (1 - x.count() / len(x.index)), x.dtype, x.unique()], result_type='expand').T.rename(index=str, columns={0: "Nullity %", 1: "Type", 2: "Unique Values"})
This gives
    Nullity %  Type    Unique Values
L1  10         object  [A, B, nan]
L2  0          object  [foo, bar]
L3  0          int     [1, 2, 3, 4, 5, 6]
How can I expand it to:
    Nullity %  Type    UniqueValue1  UniqueValue2  UniqueValue3  ...  UniqueValue-3  UniqueValue-2  UniqueValue-1
L1  10         object  A:5           B:4           nan:1
L2  0          object  foo:5         bar:5
L3  0          int     1:3           2:2           3:2           ...  4:1            5:1            6:1
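One possible way to get a table of that shape is to loop over the columns and use `value_counts(dropna=False)`, which returns the counts sorted from most to least frequent and keeps NaN as a value. The sketch below is only an illustration under a few assumptions: the helper name `summarize` and its parameter `n` are made up here, the columns are named `UniqueValue1..n` and `UniqueValue-n..-1` as in the desired output, and a value already shown among the `n` most frequent is not repeated among the least frequent ones. The order of values with equal counts is not something I would rely on.

```python
import numpy as np
import pandas as pd


def summarize(df, n=10):
    """Hypothetical helper: nullity, dtype, and the n most / n least frequent values per column."""
    top_cols = [f"UniqueValue{i}" for i in range(1, n + 1)]
    bottom_cols = [f"UniqueValue-{i}" for i in range(n, 0, -1)]
    rows = []
    for name in df.columns:
        s = df[name]
        # Counts sorted from most to least frequent; dropna=False keeps NaN as a value
        counts = s.value_counts(dropna=False)
        labels = [f"{value}:{count}" for value, count in counts.items()]
        top = labels[:n]
        # Take the rare values only from what is left, so nothing is listed twice
        bottom = labels[n:][-n:]
        row = {"Nullity %": 100 * s.isna().mean(), "Type": s.dtype}
        row.update(dict.fromkeys(top_cols + bottom_cols, ""))
        row.update(zip(top_cols, top))
        # Right-align the rare values so that UniqueValue-1 holds the rarest one
        row.update(zip(bottom_cols[len(bottom_cols) - len(bottom):], bottom))
        rows.append(row)
    columns = ["Nullity %", "Type"] + top_cols + bottom_cols
    return pd.DataFrame(rows, index=df.columns, columns=columns)


df = pd.DataFrame({
    "L1": ["A", "A", "A", "A", "A", "B", "B", "B", "B", np.nan],
    "L2": ["foo", "foo", "bar", "bar", "bar", "foo", "foo", "foo", "bar", "bar"],
    "L3": [1, 1, 1, 2, 2, 3, 3, 4, 5, 6],
})
summary = summarize(df, n=3)
print(summary)
```

Calling `summarize(df)` with the default `n=10` would give up to 10 columns on each side; cells with no value are left as empty strings.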