Sunday, March 29, 2020

How to set PYTHONHASHSEED environment variable in PyCharm for testing Word2Vec model?

I need to write a fully reproducible Word2Vec test, so I need to set PYTHONHASHSEED to a fixed value. This is my current setup:

# conftest.py
import pytest

@pytest.fixture(autouse=True)
def env_setup(monkeypatch):
    monkeypatch.setenv("PYTHONHASHSEED", "123")

# test_w2v.py
import os

import numpy as np
from gensim.models import Word2Vec

def test_w2v():
    assert os.getenv("PYTHONHASHSEED") == "123"
    expected_words_embeddings = np.array(...)
    w2v = Word2Vec(my_tokenized_sentences, workers=1, seed=42, hashfxn=hash)
    # Outer loop (sentences) must come first in a nested comprehension.
    words_embeddings = np.array([w2v.wv.get_vector(word) for sentence in my_tokenized_sentences for word in sentence])
    np.testing.assert_array_equal(expected_words_embeddings, words_embeddings)

Here is the curious thing.

If I run the test from the terminal with PYTHONHASHSEED=123 python3 -m pytest test_w2v.py, the test passes without any issues. However, if I run it from PyCharm (using pytest, set up via Edit Configurations -> Templates -> Python tests -> pytest), it fails. Most interestingly, it doesn't fail at assert os.getenv("PYTHONHASHSEED") == "123", but at np.testing.assert_array_equal(expected_words_embeddings, words_embeddings).

Why could this be the case, and is there a way to fix this issue?
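
One likely explanation, offered as a sketch rather than a confirmed diagnosis: CPython reads PYTHONHASHSEED once at interpreter startup, so changing the variable inside an already running process (which is what monkeypatch.setenv does) updates os.environ but not the seed used by hash(). That would explain why the os.getenv assertion passes while the embeddings still differ, since hashfxn=hash depends on the seed. The helper names below (hash_unchanged_after_setenv, str_hash_in_child) are hypothetical and exist only for this illustration.

```python
import os
import subprocess
import sys


def hash_unchanged_after_setenv():
    """Set PYTHONHASHSEED in the running process and check whether hash() changes."""
    old = os.environ.get("PYTHONHASHSEED")
    before = hash("word2vec")
    try:
        os.environ["PYTHONHASHSEED"] = "123"  # same effect as monkeypatch.setenv
        after = hash("word2vec")
    finally:
        # Restore the environment so the demo has no side effects.
        if old is None:
            os.environ.pop("PYTHONHASHSEED", None)
        else:
            os.environ["PYTHONHASHSEED"] = old
    return before == after


def str_hash_in_child(seed):
    """Return hash('word2vec') from a fresh interpreter started with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('word2vec'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# The seed is fixed at startup, so setting the variable later changes nothing.
print(hash_unchanged_after_setenv())  # True

# Two fresh interpreters started with the same seed agree on the hash.
print(str_hash_in_child("123") == str_hash_in_child("123"))  # True
```

If this is indeed the cause, the variable has to be present before the test interpreter starts. In PyCharm that would mean adding PYTHONHASHSEED=123 to the Environment variables field of the pytest run configuration (or its template), which mirrors what the terminal command does.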
