I'm a total amateur/hobbyist developer trying to learn more about testing the software I write. I understand the core concept of testing, but as my functions get more complicated, it feels like a rabbit hole of variations, outcomes, conditions, etc. For example...
The function below reads every .txt file in a directory into a single Pandas DataFrame. A few column adjustments are made before the data is passed to a different function that ultimately imports it into our database.
I've already written a test for the convert_date_string function. But what about the function as a whole: how do I write a test for it? As I see it, much of the Pandas library is already tested, so checking that its core functionality works with my setup seems like a waste. But maybe it isn't. Or maybe this is really a refactoring question, and I should break the function into smaller parts?
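The refactoring idea mentioned above could look roughly like this: pull the column adjustments out into a pure function that takes a DataFrame and returns a new one, so it can be tested with a tiny hand-built frame and no files or database at all. This is a minimal sketch under assumptions: `clean_import_data` and its `date_converter` parameter are hypothetical names, and `fake_convert` is a made-up stand-in for the real `convert_date_string`, whose behaviour isn't shown here. Saved in a file such as `test_import.py`, pytest would discover and run the `test_` function.

```python
import pandas as pd


def clean_import_data(data, import_id, date_converter):
    """Pure transformation: DataFrame in, new DataFrame out (no files, no database)."""
    data = data.copy()  # don't mutate the caller's frame
    data.columns = [col.lower() for col in data.columns]
    # Same missing-value step as the original (NaN -> None where the dtype allows)
    data = data.where(pd.notnull(data), None)
    data["import_id"] = import_id
    data["date"] = data["date"].apply(date_converter)
    return data


def fake_convert(value):
    """Hypothetical stand-in for convert_date_string: '2020-01-02' -> '20200102'."""
    return value.replace("-", "")


def test_clean_import_data():
    # Tiny hand-built input: mixed-case headers and one missing value
    raw = pd.DataFrame({"Date": ["2020-01-02", "2020-01-03"],
                        "Store": ["A", None]})
    cleaned = clean_import_data(raw, import_id=7, date_converter=fake_convert)

    assert list(cleaned.columns) == ["date", "store", "import_id"]
    assert cleaned["date"].tolist() == ["20200102", "20200103"]
    assert (cleaned["import_id"] == 7).all()
    assert pd.isna(cleaned.loc[1, "store"])
    assert list(raw.columns) == ["Date", "Store"]  # input left untouched
```

Passing the converter in as a parameter keeps this test independent of how dates are actually parsed, which the existing convert_date_string test already covers.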
Anyway, here is the code... any insight would be appreciated!
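import glob

import pandas as pd

import config  # assumed project-local settings module providing IMPORT_DIRECTORY
# convert_date_string and insert_data_into_database are defined elsewhere in the project


def process_file(import_id=None):
    # Collect every .txt file in the import directory
    all_files = glob.glob(config.IMPORT_DIRECTORY + "*.txt")
    if not all_files:
        return []
    # Read each '~'-separated file; malformed lines are skipped with a warning.
    # on_bad_lines='warn' replaces warn_bad_lines/error_bad_lines (removed in pandas 2.0).
    import_data = (pd.read_csv(f, sep='~', encoding='latin-1',
                               on_bad_lines='warn', low_memory=False)
                   for f in all_files)
    data = pd.concat(import_data, ignore_index=True, sort=False)
    # Lower-case the column names and replace NaN with None where the dtype allows
    data.columns = [col.lower() for col in data.columns]
    data = data.where(pd.notnull(data), None)
    data['import_id'] = import_id
    data['date'] = data['date'].apply(convert_date_string)
    insert_data_into_database(data=data, table='sales')
    return all_files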
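To test the function as a whole without touching a real database, one option is an integration-style test: write a couple of small sample files to a temporary directory, hand the function a `unittest.mock.Mock` in place of the database call, and assert on what the mock received. The sketch below assumes a hypothetical refactor, `process_files`, that takes the directory, the insert function and the date converter as parameters instead of reading `config` and calling `insert_data_into_database` directly. It also sorts the file list so the row order is predictable, and it uses `on_bad_lines='warn'`, which needs pandas 1.3 or later. `fake_convert` is again a made-up stand-in for `convert_date_string`.

```python
import glob
import os
import tempfile
from unittest import mock

import pandas as pd


def process_files(import_dir, insert_fn, date_converter, import_id=None):
    """Hypothetical refactor of process_file with its dependencies passed in."""
    all_files = sorted(glob.glob(os.path.join(import_dir, "*.txt")))
    if not all_files:
        return []
    frames = (pd.read_csv(f, sep="~", encoding="latin-1",
                          on_bad_lines="warn", low_memory=False)
              for f in all_files)
    data = pd.concat(frames, ignore_index=True, sort=False)
    data.columns = [col.lower() for col in data.columns]
    data = data.where(pd.notnull(data), None)
    data["import_id"] = import_id
    data["date"] = data["date"].apply(date_converter)
    insert_fn(data=data, table="sales")
    return all_files


def fake_convert(value):
    """Hypothetical stand-in for convert_date_string."""
    return value.replace("-", "")


def test_process_files_end_to_end():
    with tempfile.TemporaryDirectory() as tmp:
        # Two small '~'-separated sample files, one row each
        samples = {"a.txt": "Date~Qty\n2020-01-02~3\n",
                   "b.txt": "Date~Qty\n2020-01-03~5\n"}
        for name, text in samples.items():
            with open(os.path.join(tmp, name), "w", encoding="latin-1") as fh:
                fh.write(text)

        fake_insert = mock.Mock()  # stands in for insert_data_into_database
        returned = process_files(tmp, fake_insert, fake_convert, import_id=42)

    assert [os.path.basename(p) for p in returned] == ["a.txt", "b.txt"]
    fake_insert.assert_called_once()
    kwargs = fake_insert.call_args.kwargs
    assert kwargs["table"] == "sales"
    inserted = kwargs["data"]
    assert list(inserted.columns) == ["date", "qty", "import_id"]
    assert inserted["date"].tolist() == ["20200102", "20200103"]
    assert (inserted["import_id"] == 42).all()


def test_process_files_empty_directory():
    with tempfile.TemporaryDirectory() as tmp:
        fake_insert = mock.Mock()
        assert process_files(tmp, fake_insert, fake_convert) == []
    fake_insert.assert_not_called()  # no files, so no database call
```

This still runs the real pandas parsing, but the assertions only check your own decisions: the lower-cased column names, the `import_id`, the converted dates and the target table. The empty-directory test covers the early `return []` branch.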