Before pre-processing and training a model on some data, I want to check that each feature (each column) of a dataframe is of the correct data type. I.e., if a dataframe has columns `col1`, `col2`, `col3`, they should have types `int`, `float`, `string` respectively, as I have defined them (`col1` can't be of type `string`; the order matters).
What is the best way to do this if
- The columns have various types - int, float, timestamp, string
- There are too many columns (>500) to manually write out / label each column data type
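Since manually listing 500+ column types is impractical, one possible workaround (a sketch, not something from the original post; the `reference` dataframe here is a hypothetical stand-in for data known to have the correct schema) is to snapshot the dtypes once from a known-good dataframe and reuse that mapping:

```python
import pandas as pd

# Hypothetical reference dataframe known to have the correct schema.
reference = pd.DataFrame({"col1": [1], "col2": [0.5], "col3": ["a"]})

# Snapshot dtypes as strings; this dict can be saved (e.g. as JSON) and
# reused to validate future dataframes without hand-writing 500 entries.
correct_types = {col: str(dtype) for col, dtype in reference.dtypes.items()}
# e.g. {'col1': 'int64', 'col2': 'float64', 'col3': 'object'} under default
# pandas inference (plain strings are stored as object dtype).
```

Keying by column name rather than position also sidesteps the ordering concern below.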
Something like

```python
types = df.dtypes  # returns a pandas Series
if (types != correct_types).any():
    raise TypeError("Some of the columns do not have the correct type")
```

where `correct_types` holds the known data types of each column; these would need to be in the same order as `types` to ensure each column type is correctly matched. It would also be good to know which column is throwing the error (so maybe a for loop over the columns is more appropriate?).
Is there a way to achieve this, and if so, what is the best approach? Maybe I am looking at the issue the wrong way; more generally, how do I ensure that the columns of `df` are of the correct data type as I have defined them?
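One possible answer to the question above (a sketch under assumed column names and dtypes, not a canonical solution): store the expected dtypes in a dict keyed by column name, so the check does not depend on column order, then compare per column and report every mismatch by name.

```python
import pandas as pd

# Hypothetical expected schema; in practice this dict would cover all ~500
# columns (e.g. built from a reference dataframe as sketched earlier).
expected_types = {
    "col1": "int64",
    "col2": "float64",
    "col3": "object",  # pandas stores plain strings as object dtype by default
}

def check_dtypes(df: pd.DataFrame, expected: dict) -> None:
    """Raise TypeError naming every column whose dtype differs from expected."""
    mismatches = [
        f"{col}: expected {want}, got {df[col].dtype}"
        for col, want in expected.items()
        if str(df[col].dtype) != want
    ]
    if mismatches:
        raise TypeError("Columns with wrong dtype: " + "; ".join(mismatches))

df = pd.DataFrame({"col1": [1, 2], "col2": [0.5, 1.5], "col3": ["a", "b"]})
check_dtypes(df, expected_types)  # passes silently

bad = df.assign(col1=df["col1"].astype(float))
# check_dtypes(bad, expected_types)  # would raise TypeError naming col1
```

Collecting all mismatches before raising (rather than failing on the first) makes the error message far more useful when many of the 500+ columns are wrong at once.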