Deprecations

This document outlines the Scrapy deprecation policy, explains how to handle deprecation warnings, and lists when various pieces of Scrapy have been removed or altered in a backward-incompatible way, following their deprecation.

Deprecation policy

Scrapy features may be deprecated in any version of Scrapy.

After a Scrapy feature is deprecated in a non-bugfix release (see release number in Versioning and API Stability), that feature may be removed in any later Scrapy release.

For example, a feature of 1.0.0 deprecated in 1.1.0 may stop working in 1.2.0 or in any later version.

Deprecation warnings

When you use a deprecated feature, Scrapy issues a Python warning (see warnings).

Scrapy deprecation warnings use the following warning category:

class scrapy.exceptions.ScrapyDeprecationWarning

Warning category for deprecated Scrapy features.

Unlike DeprecationWarning, warnings in this category are shown by default.

Filtering out deprecation warnings

Filtering out only Scrapy warnings is not easy due to a Python issue.

If you do not mind filtering out all warnings, not only Scrapy deprecation warnings, apply the ignore warning filter with -W or PYTHONWARNINGS. For example:

$ export PYTHONWARNINGS=ignore
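The same effect can be achieved programmatically with Python's standard warnings module. A minimal sketch; note that, like PYTHONWARNINGS=ignore, this silences all warnings, not only Scrapy deprecation warnings:

```python
import warnings

# Ignore every warning for the rest of the process, equivalent to
# running Python with -W ignore or with PYTHONWARNINGS=ignore set.
# This hides all warnings, not only Scrapy deprecation warnings.
warnings.filterwarnings("ignore")
```

Place this as early as possible (e.g. at the top of your entry-point script) so the filter is installed before any deprecated feature is touched.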

Upcoming changes

The changes below will be required in a future version of Scrapy. We encourage you to apply any change that is applicable to your version of Scrapy.

Applicable since 1.4.0

Applicable since 1.3.0

  • ChunkedTransferMiddleware is removed; chunked transfers are supported by default.

Applicable since 1.1.0

  • scrapy.utils.python.isbinarytext is removed. Use scrapy.utils.python.binary_is_text instead, but mind that it returns the inverse value (isbinarytext() == not binary_is_text()).
  • In scrapy.utils.datatypes, the MultiValueDictKeyError exception and classes MultiValueDict and SiteNode are removed.
  • The previously bundled scrapy.xlib.pydispatch library is replaced by pydispatcher.
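To illustrate the isbinarytext() → binary_is_text() migration above, here is a simplified stand-in (not Scrapy's actual implementation, which inspects byte content more carefully); the point is only the inverted return value:

```python
def binary_is_text(data: bytes) -> bool:
    # Simplified stand-in for scrapy.utils.python.binary_is_text:
    # treat the data as text if it contains no control bytes other
    # than tab, newline, and carriage return.
    return not any(
        byte < 0x20 and byte not in (0x09, 0x0A, 0x0D) for byte in data
    )

def isbinarytext(data: bytes) -> bool:
    # The removed helper returned the inverse value, so migrating code
    # replaces isbinarytext(x) with `not binary_is_text(x)`.
    return not binary_is_text(data)
```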

Applicable since 1.0.0

  • The following classes are removed in favor of LinkExtractor:

    scrapy.linkextractors.htmlparser.HtmlParserLinkExtractor
    scrapy.contrib.linkextractors.sgml.BaseSgmlLinkExtractor
    scrapy.contrib.linkextractors.sgml.SgmlLinkExtractor
    
  • The scrapy.crawler.Crawler.spiders attribute is removed; use CrawlerRunner.spider_loader or instantiate SpiderLoader with your settings instead.

1.7.0

  • 429 is part of the RETRY_HTTP_CODES setting by default.
  • Crawler, CrawlerRunner.crawl and CrawlerRunner.create_crawler no longer accept a Spider subclass instance; use a Spider subclass instead.
  • Custom scheduler priority queues (see SCHEDULER_PRIORITY_QUEUE) must handle Request objects instead of arbitrary Python data structures.
  • The scrapy.log module is replaced by Python’s logging module. See Logging.
  • The SPIDER_MANAGER_CLASS setting is renamed to SPIDER_LOADER_CLASS.
  • In scrapy.utils.python, the str_to_unicode and unicode_to_str functions are replaced by to_unicode and to_bytes, respectively.
  • scrapy.spiders.spiders is removed, instantiate SpiderLoader with your settings.
  • The scrapy.telnet module is moved to scrapy.extensions.telnet.
  • The scrapy.conf module is removed, use Crawler.settings.
  • In scrapy.core.downloader.handlers, http.HttpDownloadHandler is removed, use http10.HTTP10DownloadHandler.
  • In scrapy.loader.ItemLoader, _get_values is removed, use _get_xpathvalues.
  • In scrapy.loader, XPathItemLoader is removed, use ItemLoader.
  • In scrapy.pipelines.files.FilesPipeline, file_key is removed, use file_path.
  • In scrapy.pipelines.images.ImagesPipeline:
    • file_key is removed, use file_path
    • image_key is removed, use file_path
    • thumb_key is removed, use thumb_path
  • In both scrapy.selector and scrapy.selector.lxmlsel, HtmlXPathSelector, XmlXPathSelector, XPathSelector, and XPathSelectorList are removed, use Selector.
  • In scrapy.selector.csstranslator:
    • ScrapyGenericTranslator is removed, use parsel.csstranslator.GenericTranslator_
    • ScrapyHTMLTranslator is removed, use parsel.csstranslator.HTMLTranslator_
    • ScrapyXPathExpr is removed, use parsel.csstranslator.XPathExpr_
  • In Selector:
    • _root (both the constructor argument and the object property) is removed; use root
    • extract_unquoted is removed, use getall
    • select is removed, use xpath
  • In SelectorList:
    • extract_unquoted is removed, use getall
    • select is removed, use xpath
    • x is removed, use xpath
  • scrapy.spiders.BaseSpider is removed, use Spider
  • scrapy.utils.response.body_or_str is removed
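The str_to_unicode → to_unicode and unicode_to_str → to_bytes renames above are behavior-preserving. A simplified sketch of what the replacement helpers do (not Scrapy's exact code):

```python
def to_unicode(text, encoding="utf-8", errors="strict"):
    # Return str: decode bytes, pass str through unchanged.
    if isinstance(text, bytes):
        return text.decode(encoding, errors)
    if not isinstance(text, str):
        raise TypeError(f"to_unicode expects str or bytes, got {type(text).__name__}")
    return text

def to_bytes(text, encoding="utf-8", errors="strict"):
    # Return bytes: encode str, pass bytes through unchanged.
    if isinstance(text, str):
        return text.encode(encoding, errors)
    if not isinstance(text, bytes):
        raise TypeError(f"to_bytes expects str or bytes, got {type(text).__name__}")
    return text
```

In most call sites the migration is a pure rename: str_to_unicode(x) becomes to_unicode(x) and unicode_to_str(x) becomes to_bytes(x).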

1.6.0

  • The following modules are removed:

    • scrapy.command
    • scrapy.contrib (with all submodules)
    • scrapy.contrib_exp (with all submodules)
    • scrapy.dupefilter
    • scrapy.linkextractor
    • scrapy.project
    • scrapy.spider
    • scrapy.spidermanager
    • scrapy.squeue
    • scrapy.stats
    • scrapy.statscol
    • scrapy.utils.decorator

    See Module Relocations for more information.

  • The scrapy.interfaces.ISpiderManager interface is removed, use scrapy.interfaces.ISpiderLoader instead.

  • The scrapy.settings.CrawlerSettings class is removed, use scrapy.settings.Settings instead.

  • The scrapy.settings.Settings.overrides property is removed, use Settings.set(name, value, priority='cmdline') instead (see Settings.set).

  • The scrapy.settings.Settings.defaults property is removed, use Settings.set(name, value, priority='default') instead (see Settings.set).

  • Scrapy requires parsel ≥ 1.5. Custom Selector subclasses may be affected by backward incompatible changes in parsel 1.5.

  • A non-zero exit code is returned from Scrapy commands when an error happens on spider initialization.
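The removal of .overrides and .defaults in favor of Settings.set(name, value, priority=...) reflects priority-based settings resolution. A toy illustration of the idea, not Scrapy's actual Settings class (the numeric priority values here are assumptions for the sketch):

```python
# Toy priority table, in the spirit of Scrapy's settings priorities
# (values here are illustrative, not Scrapy's actual constants).
PRIORITIES = {"default": 0, "project": 20, "cmdline": 40}

class MiniSettings:
    """Toy settings store: a later set() wins only when its priority
    is greater than or equal to the priority already stored."""

    def __init__(self):
        self._data = {}  # name -> (value, numeric priority)

    def set(self, name, value, priority="project"):
        prio = PRIORITIES[priority]
        stored = self._data.get(name)
        if stored is None or prio >= stored[1]:
            self._data[name] = (value, prio)

    def get(self, name, default=None):
        stored = self._data.get(name)
        return stored[0] if stored is not None else default
```

This is why the replacement APIs need an explicit priority argument: a value set at priority='cmdline' survives later writes at priority='default', which is the behavior the removed .overrides and .defaults properties approximated.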

1.5.2

  • Scrapy’s telnet console now requires a username and password. See Telnet Console for more details.

1.5.0

  • Python 3.3 is not supported anymore.
  • The default Scrapy user agent string uses an HTTPS link to scrapy.org. Override USER_AGENT if you relied on the old value.
  • The logging of settings overridden by custom_settings changes from [scrapy.utils.log] to [scrapy.crawler].
  • LinkExtractor ignores the m4v extension by default. Use the deny_extensions parameter of the LinkExtractor constructor to override this behavior.
  • The 522 and 524 status codes are added to RETRY_HTTP_CODES.

1.4.0

  • LinkExtractor does not canonicalize URLs by default. Pass canonicalize=True to the LinkExtractor constructor to override this behavior.
  • The MemoryUsage extension is enabled by default.
  • The EDITOR environment variable takes precedence over the EDITOR setting.

1.3.0

  • HttpErrorMiddleware logs errors with INFO level instead of DEBUG.
  • By default, logger names now use a long-form path, e.g. [scrapy.extensions.logstats], instead of the shorter “top-level” variant of prior releases (e.g. [scrapy]). You can switch back to short logger names by setting LOG_SHORT_NAMES to True.
  • ChunkedTransferMiddleware is removed from DOWNLOADER_MIDDLEWARES; chunked transfers are supported by default.
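For example, to restore the short logger names of earlier releases, set LOG_SHORT_NAMES in your project's settings module (file name below is the conventional one, adjust to your project):

```python
# settings.py
# Restore the pre-1.3.0 short logger names, e.g. [scrapy] instead of
# [scrapy.extensions.logstats].
LOG_SHORT_NAMES = True
```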

1.2.0

1.1.0

  • Response status code 400 is not retried by default. If you need the old behavior, add 400 to RETRY_HTTP_CODES.

  • When uploading files or images to S3, the default ACL policy is now “private” instead of “public”. You can use FILES_STORE_S3_ACL to change it.

  • LinkExtractor ignores the pps extension by default. Use the deny_extensions parameter of the LinkExtractor constructor to override this behavior.

  • In the output of the scrapy.utils.url.canonicalize_url function, non-ASCII query arguments are now encoded using the corresponding encoding, instead of forcing UTF-8. This could change the output of link extractors and invalidate some cache entries from older Scrapy versions.

  • Responses with application/x-json as Content-Type are parsed as TextResponse objects.

  • The scrapy.optional_features set is removed.

  • The global command-line option --lsprof is removed.

  • scrapy shell supports URLs without scheme.

    For example, if you use scrapy shell example.com, http://example.com is fetched in the shell. To fetch a local file called example.com instead, you must either use explicit relative syntax (./example.com) or an absolute path.
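If a project depends on the old retry behavior for 400 responses, the first bullet above suggests adding 400 back to RETRY_HTTP_CODES in the project's settings module. The other values below are assumed to match Scrapy's defaults at the time of writing; double-check them against your installed version:

```python
# settings.py
# Re-add 400 to the retryable status codes. The remaining values are
# assumed to be Scrapy's defaults; verify against your version's
# default settings before copying this list.
RETRY_HTTP_CODES = [500, 502, 503, 504, 522, 524, 408, 429, 400]
```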

1.0.0

  • The scrapy.webservice module is removed, use scrapy-jsonrpc instead.
  • FeedExporter subclasses must accept a settings first argument.
  • The spider_closed signal does not receive a spider_stats argument.
  • The CONCURRENT_REQUESTS_PER_SPIDER setting is removed, use CONCURRENT_REQUESTS_PER_DOMAIN instead.
  • The CONCURRENT_SPIDERS setting is removed, use the max_proc setting of scrapyd instead.
  • The scrapy.utils.python.FixedSGMLParser class is removed as part of the deprecation of the BaseSgmlLinkExtractor and SgmlLinkExtractor classes of the scrapy.contrib.linkextractors.sgml module.
  • The default value of the SPIDER_MANAGER_CLASS setting becomes scrapy.spiderloader.SpiderLoader.
  • The spiders Telnet variable is removed.
  • The spidermanager argument of the spidercls_for_request() function is renamed to spider_loader.
  • The scrapy.contrib.djangoitem module is removed, use scrapy-djangoitem instead.
  • The scrapy deploy command is removed in favor of the scrapyd-deploy command from scrapyd-client.