Deprecations¶
This document outlines the Scrapy deprecation policy, how to handle deprecation warnings, and lists when various pieces of Scrapy have been removed or altered in a backward incompatible way, following their deprecation.
Deprecation policy¶
Scrapy features may be deprecated in any version of Scrapy.
After a Scrapy feature is deprecated in a non-bugfix release (see release number in Versioning and API Stability), that feature may be removed in any later Scrapy release.
For example, a feature of 1.0.0 deprecated in 1.1.0 may stop working in
1.2.0 or in any later version.
Deprecation warnings¶
When you use a deprecated feature, Scrapy issues a Python warning (see
warnings).
Scrapy deprecation warnings use the following warning category:
class scrapy.exceptions.ScrapyDeprecationWarning
    Warning category for deprecated Scrapy features.
    Unlike DeprecationWarning, warnings in this category are shown by default.
Filtering out deprecation warnings¶
Filtering out only Scrapy warnings is not easy due to a Python issue.
If you do not mind filtering out all warnings, not only Scrapy deprecation
warnings, apply the ignore warning filter
with -W or PYTHONWARNINGS. For example:
$ export PYTHONWARNINGS=ignore
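If you control the script that starts your crawl, a narrower filter can be installed from Python code with the standard warnings module. The following is a minimal sketch, not an official recipe: StandInDeprecationWarning is a hypothetical stand-in so that the example runs without Scrapy installed. In a real project you would presumably pass scrapy.exceptions.ScrapyDeprecationWarning as the category instead.

```python
import warnings


class StandInDeprecationWarning(Warning):
    """Hypothetical stand-in for scrapy.exceptions.ScrapyDeprecationWarning."""


def call_deprecated_feature():
    # Emits a warning in the stand-in category, as a deprecated feature would.
    warnings.warn("this feature is deprecated", StandInDeprecationWarning, stacklevel=2)


def run_quietly():
    """Return the categories of the warnings that get past the filter."""
    with warnings.catch_warnings(record=True) as caught:
        # Show every warning by default inside this block...
        warnings.simplefilter("always")
        # ...except the one category we want to silence.
        warnings.filterwarnings("ignore", category=StandInDeprecationWarning)
        call_deprecated_feature()
        warnings.warn("unrelated warning", UserWarning)
    return [w.category for w in caught]
```

Only the unrelated UserWarning is recorded, because the category filter is added last and therefore checked first.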
Upcoming changes¶
The changes below will be required in a future version of Scrapy. We encourage you to apply any change that is applicable to your version of Scrapy.
Applicable since 1.4.0¶
- Spider.make_requests_from_url is removed, use Spider.start_requests instead.
Applicable since 1.3.0¶
- ChunkedTransferMiddleware is removed, chunked transfers are supported by default.
Applicable since 1.1.0¶
- scrapy.utils.python.isbinarytext is removed. Use scrapy.utils.python.binary_is_text instead, but mind that it returns the inverse value (isbinarytext() == not binary_is_text()).
- In scrapy.utils.datatypes, the MultiValueDictKeyError exception and classes MultiValueDict and SiteNode are removed.
- The previously bundled scrapy.xlib.pydispatch library is replaced by pydispatcher.
Applicable since 1.0.0¶
- The following classes are removed in favor of LinkExtractor:
  - scrapy.linkextractors.htmlparser.HtmlParserLinkExtractor
  - scrapy.contrib.linkextractors.sgml.BaseSgmlLinkExtractor
  - scrapy.contrib.linkextractors.sgml.SgmlLinkExtractor
- scrapy.crawler.Crawler.spiders is removed, use CrawlerRunner.spider_loader or instantiate SpiderLoader with your settings.
1.7.0¶
- 429 is part of the RETRY_HTTP_CODES setting by default.
- Crawler, CrawlerRunner.crawl and CrawlerRunner.create_crawler do not accept a Spider subclass instance, use a Spider subclass.
- Custom scheduler priority queues (see SCHEDULER_PRIORITY_QUEUE) must handle Request objects instead of arbitrary Python data structures.
- The scrapy.log module is replaced by Python’s logging module. See Logging.
- The SPIDER_MANAGER_CLASS setting is renamed to SPIDER_LOADER_CLASS.
- In scrapy.utils.python, the str_to_unicode and unicode_to_str functions are replaced by to_unicode and to_bytes, respectively.
- scrapy.spiders.spiders is removed, instantiate SpiderLoader with your settings.
- The scrapy.telnet module is moved to scrapy.extensions.telnet.
- The scrapy.conf module is removed, use Crawler.settings.
- In scrapy.core.downloader.handlers, http.HttpDownloadHandler is removed, use http10.HTTP10DownloadHandler.
- In scrapy.loader.ItemLoader, _get_values is removed, use _get_xpathvalues.
- In scrapy.loader, XPathItemLoader is removed, use ItemLoader.
- In scrapy.pipelines.files.FilesPipeline, file_key is removed, use file_path.
- In scrapy.pipelines.images.ImagesPipeline:
  - file_key is removed, use file_path
  - image_key is removed, use file_path
  - thumb_key is removed, use thumb_path
- In both scrapy.selector and scrapy.selector.lxmlsel, HtmlXPathSelector, XmlXPathSelector, XPathSelector, and XPathSelectorList are removed, use Selector.
- In scrapy.selector.csstranslator:
  - ScrapyGenericTranslator is removed, use parsel.csstranslator.GenericTranslator
  - ScrapyHTMLTranslator is removed, use parsel.csstranslator.HTMLTranslator
  - ScrapyXPathExpr is removed, use parsel.csstranslator.XPathExpr
- In Selector:
  - _root, both the constructor argument and the object property, are removed; use root
  - extract_unquoted is removed, use getall
  - select is removed, use xpath
- In SelectorList:
  - extract_unquoted is removed, use getall
  - select is removed, use xpath
  - x is removed, use xpath
- scrapy.spiders.BaseSpider is removed, use Spider
- In Spider and subclasses:
  - DOWNLOAD_DELAY is removed, use download_delay
  - set_crawler is removed, use from_crawler()
- scrapy.utils.response.body_or_str is removed
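For the scrapy.log replacement listed above, code that used to log through Scrapy can usually switch to a standard library logger. The sketch below is only an illustration: the logger name and the log_fetch helper are hypothetical, and inside a spider you would more likely use the logger that the Spider class already provides.

```python
import logging

# Hypothetical logger name; any dotted name that identifies your module works.
logger = logging.getLogger("myproject.spiders.example")


def log_fetch(url, status):
    """Log a fetched URL, using WARNING for HTTP error statuses (>= 400)."""
    level = logging.WARNING if status >= 400 else logging.INFO
    # Lazy %-style formatting: the message is only built if the record is emitted.
    logger.log(level, "Fetched %s (status %d)", url, status)
    return level
```

Returning the level is only there to make the helper easy to check; the logging call itself is the part that replaces the old scrapy.log usage.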
1.6.0¶
- The following modules are removed:
  - scrapy.command
  - scrapy.contrib (with all submodules)
  - scrapy.contrib_exp (with all submodules)
  - scrapy.dupefilter
  - scrapy.linkextractor
  - scrapy.project
  - scrapy.spider
  - scrapy.spidermanager
  - scrapy.squeue
  - scrapy.stats
  - scrapy.statscol
  - scrapy.utils.decorator
  See Module Relocations for more information.
- The scrapy.interfaces.ISpiderManager interface is removed, use scrapy.interfaces.ISpiderLoader instead.
- The scrapy.settings.CrawlerSettings class is removed, use scrapy.settings.Settings instead.
- The scrapy.settings.Settings.overrides property is removed, use Settings.set(name, value, priority='cmdline') instead (see Settings.set).
- The scrapy.settings.Settings.defaults property is removed, use Settings.set(name, value, priority='default') instead (see Settings.set).
- Scrapy requires parsel ≥ 1.5. Custom Selector subclasses may be affected by backward incompatible changes in parsel 1.5.
- A non-zero exit code is returned from Scrapy commands when an error happens on spider initialization.
1.5.2¶
- Scrapy’s telnet console requires username and password. See Telnet Console for more details.
1.5.0¶
- Python 3.3 is not supported anymore.
- The default Scrapy user agent string uses an HTTPS link to scrapy.org. Override USER_AGENT if you relied on the old value.
- The logging of settings overridden by custom_settings changes from [scrapy.utils.log] to [scrapy.crawler].
- LinkExtractor ignores the m4v extension by default. Use the deny_extensions parameter of the LinkExtractor constructor to override this behavior.
- The 522 and 524 status codes are added to RETRY_HTTP_CODES.
1.4.0¶
- LinkExtractor does not canonicalize URLs by default. Pass canonicalize=True to the LinkExtractor constructor to override this behavior.
- The MemoryUsage extension is enabled by default.
- The EDITOR environment variable takes precedence over the EDITOR setting.
1.3.0¶
- HttpErrorMiddleware logs errors with INFO level instead of DEBUG.
- By default, logger names now use a long-form path, e.g. [scrapy.extensions.logstats], instead of the shorter “top-level” variant of prior releases (e.g. [scrapy]). You can switch back to short logger names by setting LOG_SHORT_NAMES to True.
- ChunkedTransferMiddleware is removed from DOWNLOADER_MIDDLEWARES, chunked transfers are supported by default.
1.2.0¶
- DefaultHeadersMiddleware runs before UserAgentMiddleware in DOWNLOADER_MIDDLEWARES by default.
- The HTTP cache extension and plugins that use the .scrapy data directory now work outside projects.
- The Selector constructor does not allow passing both response and text arguments.
- The scrapy.utils.url.canonicalize_url function has been moved to w3lib.url.canonicalize_url.
1.1.0¶
- Response status code 400 is not retried by default. If you need the old behavior, add 400 to RETRY_HTTP_CODES.
- When uploading files or images to S3, the default ACL policy is now “private” instead of “public”. You can use FILES_STORE_S3_ACL to change it.
- LinkExtractor ignores the pps extension by default. Use the deny_extensions parameter of the LinkExtractor constructor to override this behavior.
- In the output of the scrapy.utils.url.canonicalize_url function, non-ASCII query arguments are now encoded using the corresponding encoding, instead of forcing UTF-8. This could change the output of link extractors and invalidate some cache entries from older Scrapy versions.
- Responses with application/x-json as Content-Type are parsed as TextResponse objects.
- The scrapy.optional_features set is removed.
- The global command-line option --lsprof is removed.
- scrapy shell supports URLs without scheme.
  For example, if you use scrapy shell example.com, http://example.com is fetched in the shell. To fetch a local file called example.com instead, you must either use explicit relative syntax (./example.com) or an absolute path.
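The two settings-related items in this list could be reverted with a project settings fragment along these lines. This is a sketch under stated assumptions: the other status codes in the list are assumed to match the defaults of your Scrapy version, so check them against your release before copying, and "public-read" is a standard S3 canned ACL chosen here as the likely equivalent of the old “public” behavior.

```python
# settings.py fragment (illustrative only)

# Retry HTTP 400 responses again. The codes other than 400 are an
# ASSUMPTION about the defaults of your Scrapy version; verify them.
RETRY_HTTP_CODES = [500, 502, 503, 504, 408, 400]

# Make files and images uploaded to S3 publicly readable again.
# "public-read" is a standard S3 canned ACL name.
FILES_STORE_S3_ACL = "public-read"
```

Overriding the ACL makes every uploaded file world-readable, so it is worth confirming that this is really the behavior you need.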
1.0.0¶
- The scrapy.webservice module is removed, use scrapy-jsonrpc instead.
- FeedExporter subclasses must accept a settings first argument.
- The spider_closed signal does not receive a spider_stats argument.
- The CONCURRENT_REQUESTS_PER_SPIDER setting is removed, use CONCURRENT_REQUESTS_PER_DOMAIN instead.
- The CONCURRENT_SPIDERS setting is removed, use the max_proc setting of scrapyd instead.
- The scrapy.utils.python.FixedSGMLParser class is removed as part of the deprecation of the BaseSgmlLinkExtractor and SgmlLinkExtractor classes of the scrapy.contrib.linkextractors.sgml module.
- The default value of the SPIDER_MANAGER_CLASS setting becomes scrapy.spiderloader.SpiderLoader.
- The spiders Telnet variable is removed.
- The spidermanager argument of the spidercls_for_request() function is renamed to spider_loader.
- The scrapy.contrib.djangoitem module is removed, use scrapy-djangoitem instead.
- The scrapy deploy command is removed in favor of the scrapyd-deploy command from scrapyd-client.