Monday, 11 January 2016

More elegant way to test a MapReduce streaming job?

I have a MapReduce streaming job written in Python. Before I put it on EMR, I'd like to test it locally.

Currently the only way I know to test is to run the command:

cat input_file | python mapper.py | sort -k 1,1 | python reducer.py > output_file

But the pipe is a little scary to me, because if any stage in it breaks I wouldn't know (other than by checking the exit code of the whole command).

Is there a more elegant/Pythonic way to run the MapReduce job and check that it ran successfully (so I can catch a specific exception and handle it)?
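To make concrete what I mean, here is a rough sketch of the kind of thing I am imagining: drive each stage from Python with `subprocess`, so that a failing mapper or reducer raises an exception instead of silently passing through the pipe. The names `StageError`, `run_stage` and `run_local` are made up for illustration, and the in-memory sort by the text before the first tab is only my assumption about how keys are delimited; it is a stand-in for `sort -k 1,1`, not a faithful copy of what Hadoop does.

```python
import subprocess
import sys


class StageError(RuntimeError):
    """Hypothetical exception: a pipeline stage exited with a non-zero status."""


def run_stage(name, cmd, data):
    # Run one stage, feeding `data` (bytes) on stdin and capturing stdout/stderr.
    proc = subprocess.run(
        cmd, input=data, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    if proc.returncode != 0:
        stderr_text = proc.stderr.decode(errors="replace")
        raise StageError(
            "{} exited with status {}: {}".format(name, proc.returncode, stderr_text)
        )
    return proc.stdout


def run_local(mapper_cmd, reducer_cmd, input_bytes):
    # Map, then emulate the shuffle by sorting lines on their key, then reduce.
    mapped = run_stage("mapper", mapper_cmd, input_bytes)
    lines = mapped.splitlines(keepends=True)
    # Assumption: the key is everything before the first tab character.
    lines.sort(key=lambda line: line.split(b"\t", 1)[0])
    return run_stage("reducer", reducer_cmd, b"".join(lines))
```

With something like this I could read `input_file` as bytes, call `run_local([sys.executable, "mapper.py"], [sys.executable, "reducer.py"], data)` inside a `try`/`except StageError` block, and write the returned bytes to `output_file`. It holds the whole dataset in memory, so it would only suit small test inputs.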

Thank you
