I need some help, please. I am testing the equality of two DataFrames, df1 and df2, using the following code in a Jupyter notebook (I have included only the relevant parts, not the whole thing):
1)
from pandas.testing import assert_frame_equal
import pandas as pd
def transform(df, my_dict, operator):
    # Adds one column per key, computed row-wise from the columns listed in value.
    for key, value in my_dict.items():
        if operator == 'sum':
            df[key] = df[value].sum(axis=1)
        elif operator == 'count':
            df[key] = (df[value] > 0).sum(axis=1)
2)
df1=pd.read_sas("rawdata.sas7bdat")
transform(df1,my_dict,operator='count')
transform(df1,my_dict,operator='sum')
df2=pd.read_sas("testdata.sas7bdat")
3)
assert_frame_equal(df1[sorted(df1.columns)], df2[sorted(df2.columns)],check_dtype=False)
When I restart my Jupyter kernel and run steps 1 to 3, the assertion passes and the two DataFrames are reported as equal. But if I run the code again (without restarting the kernel), the comparison fails: some columns differ by a tiny percentage.
I am a bit confused... Thanks in advance!
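To narrow down which columns drift between runs, a small diagnostic like the one below might help. It is only a sketch: `report_column_differences` is a hypothetical helper name (not part of pandas), the tolerances are just NumPy's `allclose` defaults, and it assumes both frames have the same number of rows in the same order and that the numeric columns contain no nullable-integer missing values.

```python
import numpy as np
import pandas as pd


def report_column_differences(left, right, rtol=1e-5, atol=1e-8):
    """Map each shared numeric column that differs to its largest absolute difference."""
    # Only look at numeric columns that exist in both frames.
    shared = left.select_dtypes(include="number").columns.intersection(
        right.select_dtypes(include="number").columns
    )
    differences = {}
    for column in sorted(shared):
        a = left[column].to_numpy(dtype=float)
        b = right[column].to_numpy(dtype=float)
        # equal_nan=True treats NaN in the same position as a match.
        if not np.allclose(a, b, rtol=rtol, atol=atol, equal_nan=True):
            differences[column] = float(np.nanmax(np.abs(a - b)))
    return differences


# Tiny made-up frames, only to show the call; they are not the SAS data.
demo_left = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
demo_right = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.5]})
print(report_column_differences(demo_left, demo_right))
```

Calling `report_column_differences(df1, df2)` after the first run and again after the second run would show whether the same columns are affected each time, and by how much.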