Thursday, March 30, 2017

Unit testing a Pig script that streams to a Python script

I have a Pig script that I am testing with org.apache.pig.pigunit.PigTest. The script streams data through two external scripts (a Python script and an awk script). This works in my functional tests and when I run it manually, but it fails in my unit test.

These are the streaming command definitions:

DEFINE AWK `$AWK_SCRIPT` ship('$AWK_SCRIPT');
DEFINE PY `$PYTHON_BIN $PYTHON_SCRIPT` ship('$PYTHON_SCRIPT');

This gives me the error:

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias bid_recommendation_model_1. Backend error : java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING

If I remove the ship clause (which I can have my unit tests do), I instead get "Unable to open iterator" with no further explanation.

Essentially, all the Pig script does is load the data, stream it through those two scripts, and store the union of the outputs, e.g.:

py_output = STREAM input THROUGH PY as (...);
awk_output = STREAM input THROUGH AWK as (...);
result = UNION py_output, awk_output;
STORE result INTO ...;
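
For context, here is a minimal sketch of the kind of Python script that could sit behind PY. It is not the actual script from this question: the names stream and transform and the per-record logic are hypothetical, and it assumes Pig's default streaming format of one tab-delimited record per line. Keeping the logic in a function that takes file-like objects makes it possible to check the script on its own, outside PigUnit, which may help tell a script failure apart from a PigUnit setup failure.

```python
import sys


def transform(fields):
    """Hypothetical per-record logic: upper-case the first field, keep the rest."""
    return [fields[0].upper()] + fields[1:]


def stream(infile, outfile):
    """Read tab-delimited records line by line and write transformed records."""
    for line in infile:
        fields = line.rstrip("\n").split("\t")
        outfile.write("\t".join(transform(fields)) + "\n")


# When shipped to Pig, the script would end with a call such as:
#     stream(sys.stdin, sys.stdout)
```

Running it by hand on a few sample lines, before involving Pig at all, is one way to confirm that the script itself behaves as expected.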
