My objective is to test a spider written with Scrapy (Python). I tried using contracts, but they are really limited in the sense that I cannot test things like pagination or whether some attributes are extracted correctly.
def parse(self, response):
""" This function parses a sample response. Some contracts are mingled
with this docstring.
@url http://someurl.com
@returns items 1 16
@returns requests 0 0
@scrapes Title Author Year Price
"""
So my second idea is to mock all the requests that the spider makes in one run and use that recording in the testing phase to check against expected results. However, I am unsure how I can mock every request the spider makes. I looked into various libraries; one of them is Betamax, but it only supports HTTP requests made through Python's requests library (as mentioned here). There is another library, VCR.py, but it also supports only a limited set of clients:
"Are you using Requests? If you're not using Requests, Betamax is not for you. You should checkout VCRpy. Are you using Sessions or are you using the functional API (e.g., requests.get)?"
The last option is to manually record all the requests and somehow store them, but that's not really feasible at the scale at which the spider makes requests.
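That said, for a handful of pages this can be done by hand: save the HTML once, wrap it in a fake response object, and feed it straight to parse() in a regular unit test. A minimal sketch of what I have in mind (the spider class, import path, fixture file, and the pagination assertion are all hypothetical):

import unittest
from scrapy.http import HtmlResponse, Request
from myproject.spiders.sample import MySpider  # hypothetical import path

def fake_response(file_path, url):
    # Wrap a locally saved HTML page in an HtmlResponse so parse() can run offline.
    with open(file_path, "rb") as f:
        body = f.read()
    return HtmlResponse(url=url, request=Request(url=url), body=body, encoding="utf-8")

class ParseTest(unittest.TestCase):
    def test_parse_extracts_items_and_follows_pagination(self):
        spider = MySpider()
        response = fake_response("tests/fixtures/sample.html", "http://someurl.com")
        results = list(spider.parse(response))
        # Items and follow-up Requests come out of the same generator,
        # so pagination can be asserted on directly.
        items = [r for r in results if not isinstance(r, Request)]
        requests = [r for r in results if isinstance(r, Request)]
        self.assertEqual(len(items), 16)
        self.assertTrue(any("page=2" in r.url for r in requests))

if __name__ == "__main__":
    unittest.main()

This works for a single stored page, but it does not cover a full crawl, which is exactly the part I don't know how to record automatically.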
Does scrapy.Request use some underlying Python HTTP client that could be used to mock those requests? Or is there any other way I can mock all the HTTP requests made by the spider in one run and use that to test the spider for the expected behavior?
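One thing I did notice is that Scrapy's downloader is built on Twisted rather than on requests/urllib3, which is presumably why Betamax and VCR.py cannot intercept its traffic. The closest built-in alternative I have found is Scrapy's own HTTP cache, which stores every response on the first run and replays it from disk on later runs. A minimal sketch of the relevant settings (the setting names come from the Scrapy docs; the values are my assumptions):

# settings.py -- record responses on the first crawl, replay them in later runs
HTTPCACHE_ENABLED = True
HTTPCACHE_EXPIRATION_SECS = 0   # 0 = cached responses never expire
HTTPCACHE_DIR = "httpcache"     # stored under the project's .scrapy directory
HTTPCACHE_STORAGE = "scrapy.extensions.httpcache.FilesystemCacheStorage"

I am not sure this counts as proper mocking, though, since it still needs one real crawl to populate the cache.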