Deprecations

This document outlines the Scrapy deprecation policy, explains how to handle deprecation warnings, and lists when various pieces of Scrapy have been removed or altered in a backward-incompatible way, following their deprecation.

Deprecation policy

Scrapy features may be deprecated in any version of Scrapy.

After a Scrapy feature is deprecated in a non-bugfix release (see release number in Versioning and API Stability), that feature may be removed in any later Scrapy release.

For example, a feature of 1.0.0 deprecated in 1.1.0 may stop working in 1.2.0 or in any later version.

Deprecation warnings

When you use a deprecated feature, Scrapy issues a Python warning (see warnings).

Scrapy deprecation warnings use the following warning category:

class scrapy.exceptions.ScrapyDeprecationWarning

Warning category for deprecated Scrapy features.

Unlike DeprecationWarning, warnings in this category are shown by default.

Filtering out deprecation warnings

Filtering out only Scrapy warnings is not easy due to a Python issue.

If you do not mind filtering out all warnings, not only Scrapy deprecation warnings, apply the ignore warning filter with -W or PYTHONWARNINGS. For example:

$ export PYTHONWARNINGS=ignore
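The same effect can be achieved programmatically with Python's standard warnings module. A minimal sketch; note that, like PYTHONWARNINGS=ignore, this silences all warnings, not only Scrapy deprecation warnings:

```python
import warnings

# Ignore every warning for the rest of the process, equivalent to
# running Python with -W ignore or with PYTHONWARNINGS=ignore set.
# This hides all warnings, not only Scrapy deprecation warnings.
warnings.filterwarnings("ignore")
```

Place this as early as possible (e.g. at the top of your entry-point script) so the filter is installed before any deprecated feature is touched.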

Upcoming changes

The changes below will be required in a future version of Scrapy. We encourage you to apply any change that is applicable to your version of Scrapy.

Applicable since 1.4.0

Applicable since 1.3.0

  • ChunkedTransferMiddleware is removed; chunked transfers are supported by default.

Applicable since 1.1.0

  • scrapy.utils.python.isbinarytext is removed. Use scrapy.utils.python.binary_is_text instead, but mind that it returns the inverse value (isbinarytext() == not binary_is_text()).
  • In scrapy.utils.datatypes, the MultiValueDictKeyError exception and classes MultiValueDict and SiteNode are removed.
  • The previously bundled scrapy.xlib.pydispatch library is replaced by pydispatcher.
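To illustrate the isbinarytext() → binary_is_text() migration above, here is a simplified stand-in (not Scrapy's actual implementation, which inspects byte content more carefully); the point is only the inverted return value:

```python
def binary_is_text(data: bytes) -> bool:
    # Simplified stand-in for scrapy.utils.python.binary_is_text:
    # treat the data as text if it contains no control bytes other
    # than tab, newline, and carriage return.
    return not any(
        byte < 0x20 and byte not in (0x09, 0x0A, 0x0D) for byte in data
    )

def isbinarytext(data: bytes) -> bool:
    # The removed helper returned the inverse value, so migrating code
    # replaces isbinarytext(x) with `not binary_is_text(x)`.
    return not binary_is_text(data)
```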

Applicable since 1.0.0

  • The following classes are removed in favor of LinkExtractor:

    scrapy.linkextractors.htmlparser.HtmlParserLinkExtractor
    scrapy.contrib.linkextractors.sgml.BaseSgmlLinkExtractor
    scrapy.contrib.linkextractors.sgml.SgmlLinkExtractor
    
  • The scrapy.crawler.Crawler.spiders attribute is removed; use CrawlerRunner.spider_loader or instantiate SpiderLoader with your settings instead.

1.7.0

  • 429 is part of the RETRY_HTTP_CODES setting by default.
  • Crawler, CrawlerRunner.crawl and CrawlerRunner.create_crawler no longer accept a Spider subclass instance; use a Spider subclass instead.
  • Custom scheduler priority queues (see SCHEDULER_PRIORITY_QUEUE) must handle Request objects instead of arbitrary Python data structures.
  • The scrapy.log module is replaced by Python’s logging module. See Logging.
  • The SPIDER_MANAGER_CLASS setting is renamed to SPIDER_LOADER_CLASS.
  • In scrapy.utils.python, the str_to_unicode and unicode_to_str functions are replaced by to_unicode and to_bytes, respectively.
  • scrapy.spiders.spiders is removed, instantiate SpiderLoader with your settings.
  • The scrapy.telnet module is moved to scrapy.extensions.telnet.
  • The scrapy.conf module is removed, use Crawler.settings.
  • In scrapy.core.downloader.handlers, http.HttpDownloadHandler is removed, use http10.HTTP10DownloadHandler.
  • In scrapy.loader.ItemLoader, _get_values is removed, use _get_xpathvalues.
  • In scrapy.loader, XPathItemLoader is removed, use ItemLoader.
  • In scrapy.pipelines.files.FilesPipeline, file_key is removed, use file_path.
  • In scrapy.pipelines.images.ImagesPipeline:
    • file_key is removed, use file_path
    • image_key is removed, use file_path
    • thumb_key is removed, use thumb_path
  • In both scrapy.selector and scrapy.selector.lxmlsel, HtmlXPathSelector, XmlXPathSelector, XPathSelector, and XPathSelectorList are removed, use Selector.
  • In scrapy.selector.csstranslator:
    • ScrapyGenericTranslator is removed, use parsel.csstranslator.GenericTranslator_
    • ScrapyHTMLTranslator is removed, use parsel.csstranslator.HTMLTranslator_
    • ScrapyXPathExpr is removed, use parsel.csstranslator.XPathExpr_
  • In Selector:
    • _root (both the constructor argument and the object property) is removed; use root
    • extract_unquoted is removed, use getall
    • select is removed, use xpath
  • In SelectorList:
    • extract_unquoted is removed, use getall
    • select is removed, use xpath
    • x is removed, use xpath
  • scrapy.spiders.BaseSpider is removed, use Spider
  • scrapy.utils.response.body_or_str is removed
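The str_to_unicode → to_unicode and unicode_to_str → to_bytes renames above are behavior-preserving. A simplified sketch of what the replacement helpers do (not Scrapy's exact code):

```python
def to_unicode(text, encoding="utf-8", errors="strict"):
    # Return str: decode bytes, pass str through unchanged.
    if isinstance(text, bytes):
        return text.decode(encoding, errors)
    if not isinstance(text, str):
        raise TypeError(f"to_unicode expects str or bytes, got {type(text).__name__}")
    return text

def to_bytes(text, encoding="utf-8", errors="strict"):
    # Return bytes: encode str, pass bytes through unchanged.
    if isinstance(text, str):
        return text.encode(encoding, errors)
    if not isinstance(text, bytes):
        raise TypeError(f"to_bytes expects str or bytes, got {type(text).__name__}")
    return text
```

In most call sites the migration is a pure rename: str_to_unicode(x) becomes to_unicode(x) and unicode_to_str(x) becomes to_bytes(x).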

1.6.0

  • The following modules are removed:

    • scrapy.command
    • scrapy.contrib (with all submodules)
    • scrapy.contrib_exp (with all submodules)
    • scrapy.dupefilter
    • scrapy.linkextractor
    • scrapy.project
    • scrapy.spider
    • scrapy.spidermanager
    • scrapy.squeue
    • scrapy.stats
    • scrapy.statscol
    • scrapy.utils.decorator

    See Module Relocations for more information.

  • The scrapy.interfaces.ISpiderManager interface is removed, use scrapy.interfaces.ISpiderLoader instead.

  • The scrapy.settings.CrawlerSettings class is removed, use scrapy.settings.Settings instead.

  • The scrapy.settings.Settings.overrides property is removed, use Settings.set(name, value, priority='cmdline') instead (see Settings.set).

  • The scrapy.settings.Settings.defaults property is removed, use Settings.set(name, value, priority='default') instead (see Settings.set).

  • Scrapy requires parsel ≥ 1.5. Custom Selector subclasses may be affected by backward incompatible changes in parsel 1.5.

  • A non-zero exit code is returned from Scrapy commands when an error happens on spider initialization.
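The removal of .overrides and .defaults in favor of Settings.set(name, value, priority=...) reflects priority-based settings resolution. A toy illustration of the idea, not Scrapy's actual Settings class (the numeric priority values here are assumptions for the sketch):

```python
# Toy priority table, in the spirit of Scrapy's settings priorities
# (values here are illustrative, not Scrapy's actual constants).
PRIORITIES = {"default": 0, "project": 20, "cmdline": 40}

class MiniSettings:
    """Toy settings store: a later set() wins only when its priority
    is greater than or equal to the priority already stored."""

    def __init__(self):
        self._data = {}  # name -> (value, numeric priority)

    def set(self, name, value, priority="project"):
        prio = PRIORITIES[priority]
        stored = self._data.get(name)
        if stored is None or prio >= stored[1]:
            self._data[name] = (value, prio)

    def get(self, name, default=None):
        stored = self._data.get(name)
        return stored[0] if stored is not None else default
```

This is why the replacement APIs need an explicit priority argument: a value set at priority='cmdline' survives later writes at priority='default', which is the behavior the removed .overrides and .defaults properties approximated.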

1.5.2

  • Scrapy’s telnet console now requires a username and password. See Telnet Console for more details.

1.5.0

  • Python 3.3 is not supported anymore.
  • The default Scrapy user agent string uses an HTTPS link to scrapy.org. Override USER_AGENT if you relied on the old value.
  • The logging of settings overridden by custom_settings changes from [scrapy.utils.log] to [scrapy.crawler].
  • LinkExtractor ignores the m4v extension by default. Use the deny_extensions parameter of the LinkExtractor constructor to override this behavior.
  • The 522 and 524 status codes are added to RETRY_HTTP_CODES.

1.4.0

  • LinkExtractor does not canonicalize URLs by default. Pass canonicalize=True to the LinkExtractor constructor to override this behavior.
  • The MemoryUsage extension is enabled by default.
  • The EDITOR environment variable takes precedence over the EDITOR setting.

1.3.0

  • HttpErrorMiddleware logs errors with INFO level instead of DEBUG.
  • By default, logger names now use a long-form path, e.g. [scrapy.extensions.logstats], instead of the shorter “top-level” variant of prior releases (e.g. [scrapy]). You can switch back to short logger names by setting LOG_SHORT_NAMES to True.
  • ChunkedTransferMiddleware is removed from DOWNLOADER_MIDDLEWARES; chunked transfers are supported by default.
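For example, to restore the short logger names of earlier releases, set LOG_SHORT_NAMES in your project's settings module (file name below is the conventional one, adjust to your project):

```python
# settings.py
# Restore the pre-1.3.0 short logger names, e.g. [scrapy] instead of
# [scrapy.extensions.logstats].
LOG_SHORT_NAMES = True
```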

1.2.0

1.1.0

  • Response status code 400 is not retried by default. If you need the old behavior, add 400 to RETRY_HTTP_CODES.

  • When uploading files or images to S3, the default ACL policy is now “private” instead of “public”. You can use FILES_STORE_S3_ACL to change it.

  • LinkExtractor ignores the pps extension by default. Use the deny_extensions parameter of the LinkExtractor constructor to override this behavior.

  • In the output of the scrapy.utils.url.canonicalize_url function, non-ASCII query arguments are now encoded using the corresponding encoding, instead of forcing UTF-8. This could change the output of link extractors and invalidate some cache entries from older Scrapy versions.

  • Responses with application/x-json as Content-Type are parsed as TextResponse objects.

  • The scrapy.optional_features set is removed.

  • The global command-line option --lsprof is removed.

  • scrapy shell supports URLs without scheme.

    For example, if you use scrapy shell example.com, http://example.com is fetched in the shell. To fetch a local file called example.com instead, you must either use explicit relative syntax (./example.com) or an absolute path.
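If a project depends on the old retry behavior for 400 responses, the first bullet above suggests adding 400 back to RETRY_HTTP_CODES in the project's settings module. The other values below are assumed to match Scrapy's defaults at the time of writing; double-check them against your installed version:

```python
# settings.py
# Re-add 400 to the retryable status codes. The remaining values are
# assumed to be Scrapy's defaults; verify against your version's
# default settings before copying this list.
RETRY_HTTP_CODES = [500, 502, 503, 504, 522, 524, 408, 429, 400]
```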

1.0.0

  • The scrapy.webservice module is removed, use scrapy-jsonrpc instead.
  • FeedExporter subclasses must accept a settings first argument.
  • The spider_closed signal does not receive a spider_stats argument.
  • The CONCURRENT_REQUESTS_PER_SPIDER setting is removed, use CONCURRENT_REQUESTS_PER_DOMAIN instead.
  • The CONCURRENT_SPIDERS setting is removed, use the max_proc setting of scrapyd instead.
  • The scrapy.utils.python.FixedSGMLParser class is removed as part of the deprecation of the BaseSgmlLinkExtractor and SgmlLinkExtractor classes of the scrapy.contrib.linkextractors.sgml module.
  • The default value of the SPIDER_MANAGER_CLASS setting becomes scrapy.spiderloader.SpiderLoader.
  • The spiders Telnet variable is removed.
  • The spidermanager argument of the spidercls_for_request() function is renamed to spider_loader.
  • The scrapy.contrib.djangoitem module is removed, use scrapy-djangoitem instead.
  • The scrapy deploy command is removed in favor of the scrapyd-deploy command from scrapyd-client.