Thursday, June 28, 2018

Mocking requests for testing a Scrapy spider

My objective is to test a spider written with Scrapy (Python). I tried using contracts, but they are quite limited: I cannot test things like pagination or whether certain attributes are extracted correctly.

def parse(self, response):
    """ This function parses a sample response. Some contracts are mingled
    with this docstring.

    @url http://someurl.com
    @returns items 1 16
    @returns requests 0 0
    @scrapes Title Author Year Price
    """

So the second idea is to mock all the requests that the spider makes in one run, and use them in the testing phase to check against expected results. However, I am unsure how I can mock every request the spider makes. I looked into various libraries; one of them is Betamax, but it only supports HTTP requests made with Python's requests client (as mentioned here). There is another library, vcrpy, but it too supports only a limited set of clients.

Are you using Requests? If you’re not using Requests, Betamax is not for you. You should checkout VCRpy. Are you using Sessions or are you using the functional API (e.g., requests.get)?

The last option is to manually record all the requests and somehow store them, but that is not really feasible at the scale at which the spider makes requests.
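
To make the record-and-replay idea concrete, this is roughly how I imagine feeding a stored page back to the spider in a test. The fake_response_from_file helper, the cached_pages directory and MySpider are just illustrative names, not part of any library:

import os

from scrapy.http import HtmlResponse, Request


def fake_response_from_file(file_name, url="http://someurl.com"):
    """Build a Scrapy response from a locally saved HTML file so that
    parse() can be exercised without touching the network."""
    with open(file_name, "rb") as f:
        body = f.read()
    request = Request(url=url)
    return HtmlResponse(url=url, request=request, body=body, encoding="utf-8")


# Usage in a test, assuming MySpider and the saved page exist:
# response = fake_response_from_file(os.path.join("cached_pages", "page1.html"))
# items = list(MySpider().parse(response))
# assert len(items) == 16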

Does scrapy.Request use some underlying Python client that could be used to mock those requests? Or is there another way I can mock all the HTTP requests made by the spider in one run and use them to test the spider's expected behavior?
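
For illustration, the kind of hook I am looking for would behave like a Scrapy downloader middleware that returns a canned response instead of downloading. The MockDownloaderMiddleware class, the recorded_pages directory and the myproject module path below are hypothetical; only the process_request contract (returning a Response skips the real download) is Scrapy's own:

# settings.py (hypothetical project), enabling the middleware:
# DOWNLOADER_MIDDLEWARES = {"myproject.middlewares.MockDownloaderMiddleware": 543}

import hashlib
import os

from scrapy.http import HtmlResponse


class MockDownloaderMiddleware:
    """Serve previously recorded pages from disk instead of downloading them."""

    CACHE_DIR = "recorded_pages"  # hypothetical directory of saved responses

    def process_request(self, request, spider):
        # Returning a Response from process_request makes Scrapy skip the
        # real download for this request.
        key = hashlib.sha1(request.url.encode("utf-8")).hexdigest()
        path = os.path.join(self.CACHE_DIR, key + ".html")
        if os.path.exists(path):
            with open(path, "rb") as f:
                return HtmlResponse(url=request.url, request=request,
                                    body=f.read(), encoding="utf-8")
        return None  # fall back to the normal download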
