Monday, April 9, 2018

pandas DataFrame comparison succeeds, then fails when the code is run again

I need some help, please. I am testing the equality of two DataFrames, df1 and df2, using the following code in a Jupyter notebook (I have not included everything, just the relevant parts):

1)

from pandas.testing import assert_frame_equal
import pandas as pd

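def transform(df, my_dict, operator):
    # Each key is the column to write; each value is expected to be a list
    # of source column names. Note that df is modified in place.
    for key, value in my_dict.items():
        if operator == 'sum':
            # Row-wise sum of the source columns.
            df[key] = df[value].sum(axis=1)
        elif operator == 'count':
            # Row-wise count of source columns with a value greater than 0.
            df[key] = (df[value] > 0).sum(axis=1)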

2)

df1 = pd.read_sas("rawdata.sas7bdat")
transform(df1, my_dict, operator='count')
transform(df1, my_dict, operator='sum')

df2 = pd.read_sas("testdata.sas7bdat")

3)

assert_frame_equal(df1[sorted(df1.columns)], df2[sorted(df2.columns)], check_dtype=False)

When I restart my Jupyter kernel and run steps 1 to 3, the assertion passes: the two DataFrames are reported as equal. But if I run the code again without restarting the kernel, the comparison fails (some columns are reported as different, by a tiny percentage).
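A minimal sketch for narrowing down which columns change between the first and second run. It assumes both frames have the same number of rows in the same order. The helper name `differing_columns`, the tolerances, and the sample frames are hypothetical and not part of the notebook above. The idea would be to keep a copy of df1 after the first run (for example `first_run = df1.copy()`) and compare it with df1 after the second run.

```python
import numpy as np
import pandas as pd


def differing_columns(left, right, rtol=1e-5, atol=1e-8):
    """Return the shared column names whose values are not all (nearly) equal.

    Hypothetical helper: assumes both frames have the same row count and order.
    """
    shared = sorted(set(left.columns) & set(right.columns))
    changed = []
    for col in shared:
        # Ignore the index so that only the values are compared.
        a = left[col].reset_index(drop=True)
        b = right[col].reset_index(drop=True)
        both_numeric = (np.issubdtype(a.dtype, np.number)
                        and np.issubdtype(b.dtype, np.number))
        if both_numeric:
            # Tolerant comparison; NaN in the same position counts as equal.
            same = np.isclose(a.to_numpy(), b.to_numpy(),
                              rtol=rtol, atol=atol, equal_nan=True).all()
        else:
            # Exact comparison for strings, bytes and other non-numeric data.
            same = a.equals(b)
        if not same:
            changed.append(col)
    return changed


# Made-up data standing in for df1 after the first and second run.
first_run = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0], "total": [4.0, 6.0]})
second_run = first_run.copy()
second_run["total"] = [4.0, 6.5]

print(differing_columns(first_run, second_run))  # ['total']
```

If the columns that differ are the ones written by `transform`, that would suggest looking at whether state left in the kernel (such as `my_dict` or an already transformed df1) is different on the second run.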

I am a bit confused... Thanks in advance!
