Deprecations¶
This document outlines the Scrapy deprecation policy, how to handle deprecation warnings, and lists when various pieces of Scrapy have been removed or altered in a backward incompatible way, following their deprecation.
Deprecation policy¶
Scrapy features may be deprecated in any version of Scrapy.
After a Scrapy feature is deprecated in a non-bugfix release (see release number in Versioning and API Stability), that feature may be removed in any later Scrapy release.
For example, a feature of 1.0.0 that is deprecated in 1.1.0 may stop working in 1.2.0 or in any later version.
Deprecation warnings¶
When you use a deprecated feature, Scrapy issues a Python warning (see warnings).
Scrapy deprecation warnings use the following warning category:
class scrapy.exceptions.ScrapyDeprecationWarning
Warning category for deprecated Scrapy features. Unlike DeprecationWarning, warnings in this category are shown by default.
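Because these warnings are shown by default, they are also easy to turn into hard failures, for example in a test suite. A minimal sketch using the standard warnings module:

    import warnings
    from scrapy.exceptions import ScrapyDeprecationWarning

    # Fail loudly whenever deprecated Scrapy APIs are used,
    # e.g. at the top of a test run or CI entry point.
    warnings.simplefilter("error", ScrapyDeprecationWarning)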
Filtering out deprecation warnings¶
Filtering out only Scrapy warnings is not easy due to a Python issue.
If you do not mind filtering out all warnings, not only Scrapy deprecation warnings, apply the ignore warning filter with -W or PYTHONWARNINGS. For example:
$ export PYTHONWARNINGS=ignore
Upcoming changes¶
The changes below will be required in a future version of Scrapy. We encourage you to apply any change that is applicable to your version of Scrapy.
Applicable since 1.4.0¶
- Spider.make_requests_from_url is removed, use Spider.start_requests instead.
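A minimal sketch of the replacement, assuming a spider that previously relied on make_requests_from_url (the spider name and URLs are placeholders):

    import scrapy

    class ExampleSpider(scrapy.Spider):
        name = "example"  # hypothetical spider
        start_urls = ["https://example.com/"]

        # Instead of overriding make_requests_from_url, build the initial
        # requests directly in start_requests.
        def start_requests(self):
            for url in self.start_urls:
                yield scrapy.Request(url, callback=self.parse, dont_filter=True)

        def parse(self, response):
            pass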
Applicable since 1.3.0¶
- ChunkedTransferMiddleware is removed; chunked transfers are supported by default.
Applicable since 1.1.0¶
- scrapy.utils.python.isbinarytext is removed. Use scrapy.utils.python.binary_is_text instead, but mind that it returns the inverse value (isbinarytext() == not binary_is_text()); see the sketch after this list.
- In scrapy.utils.datatypes, the MultiValueDictKeyError exception and the MultiValueDict and SiteNode classes are removed.
- The previously bundled scrapy.xlib.pydispatch library is replaced by pydispatcher.
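A minimal sketch of the inverted check, assuming data holds raw response body bytes (the payload is a placeholder):

    from scrapy.utils.python import binary_is_text

    data = b"<html><body>hello</body></html>"  # placeholder payload

    # Old code:  if isbinarytext(data): ...
    # The new helper answers the opposite question, so negate it.
    if not binary_is_text(data):
        print("looks like binary content")
    else:
        print("looks like text")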
Applicable since 1.0.0¶
- The following classes are removed in favor of LinkExtractor (see the sketch after this list):
  - scrapy.linkextractors.htmlparser.HtmlParserLinkExtractor
  - scrapy.contrib.linkextractors.sgml.BaseSgmlLinkExtractor
  - scrapy.contrib.linkextractors.sgml.SgmlLinkExtractor
- The scrapy.crawler.Crawler.spiders attribute is removed; use CrawlerRunner.spider_loader or instantiate SpiderLoader with your settings.
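A minimal sketch of replacing SgmlLinkExtractor with LinkExtractor in a crawl spider rule (the spider name, URLs and allow pattern are placeholders):

    from scrapy.linkextractors import LinkExtractor
    from scrapy.spiders import CrawlSpider, Rule

    class ExampleCrawlSpider(CrawlSpider):
        name = "example-crawl"  # hypothetical spider
        start_urls = ["https://example.com/"]

        # Previously: Rule(SgmlLinkExtractor(allow=r"/items/"), callback="parse_item")
        rules = (
            Rule(LinkExtractor(allow=r"/items/"), callback="parse_item"),
        )

        def parse_item(self, response):
            pass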
1.7.0¶
- 429 is part of the RETRY_HTTP_CODES setting by default.
- Crawler, CrawlerRunner.crawl and CrawlerRunner.create_crawler no longer accept a Spider subclass instance; pass a Spider subclass instead (see the sketch after this list).
- Custom scheduler priority queues (see SCHEDULER_PRIORITY_QUEUE) must handle Request objects instead of arbitrary Python data structures.
- The scrapy.log module is replaced by Python’s logging module. See Logging.
- The SPIDER_MANAGER_CLASS setting is renamed to SPIDER_LOADER_CLASS.
- In scrapy.utils.python, the str_to_unicode and unicode_to_str functions are replaced by to_unicode and to_bytes, respectively.
- scrapy.spiders.spiders is removed; instantiate SpiderLoader with your settings.
- The scrapy.telnet module is moved to scrapy.extensions.telnet.
- The scrapy.conf module is removed; use Crawler.settings.
- In scrapy.core.downloader.handlers, http.HttpDownloadHandler is removed; use http10.HTTP10DownloadHandler.
- In scrapy.loader.ItemLoader, _get_values is removed; use _get_xpathvalues.
- In scrapy.loader, XPathItemLoader is removed; use ItemLoader.
- In scrapy.pipelines.files.FilesPipeline, file_key is removed; use file_path.
- In scrapy.pipelines.images.ImagesPipeline:
  - file_key is removed; use file_path.
  - image_key is removed; use file_path.
  - thumb_key is removed; use thumb_path.
- In both scrapy.selector and scrapy.selector.lxmlsel, HtmlXPathSelector, XmlXPathSelector, XPathSelector, and XPathSelectorList are removed; use Selector.
- In scrapy.selector.csstranslator:
  - ScrapyGenericTranslator is removed; use parsel.csstranslator.GenericTranslator.
  - ScrapyHTMLTranslator is removed; use parsel.csstranslator.HTMLTranslator.
  - ScrapyXPathExpr is removed; use parsel.csstranslator.XPathExpr.
- In Selector:
  - _root, both the constructor argument and the object property, is removed; use root.
  - extract_unquoted is removed; use getall.
  - select is removed; use xpath.
- In SelectorList:
  - extract_unquoted is removed; use getall.
  - select is removed; use xpath.
  - x is removed; use xpath.
- scrapy.spiders.BaseSpider is removed; use Spider.
- In Spider and subclasses:
  - DOWNLOAD_DELAY is removed; use download_delay.
  - set_crawler is removed; use from_crawler().
- scrapy.utils.response.body_or_str is removed.
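A minimal sketch of running a crawl with a spider class rather than an instance, assuming a project with a hypothetical ExampleSpider:

    from twisted.internet import reactor
    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.project import get_project_settings

    from myproject.spiders.example import ExampleSpider  # hypothetical import path

    runner = CrawlerRunner(get_project_settings())

    # Pass the Spider subclass itself; ExampleSpider() instances are rejected.
    d = runner.crawl(ExampleSpider)
    d.addBoth(lambda _: reactor.stop())
    reactor.run()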
1.6.0¶
The following modules are removed:
- scrapy.command
- scrapy.contrib (with all submodules)
- scrapy.contrib_exp (with all submodules)
- scrapy.dupefilter
- scrapy.linkextractor
- scrapy.project
- scrapy.spider
- scrapy.spidermanager
- scrapy.squeue
- scrapy.stats
- scrapy.statscol
- scrapy.utils.decorator
See Module Relocations for more information.
- The scrapy.interfaces.ISpiderManager interface is removed; use scrapy.interfaces.ISpiderLoader instead.
- The scrapy.settings.CrawlerSettings class is removed; use scrapy.settings.Settings instead.
- The scrapy.settings.Settings.overrides property is removed; use Settings.set(name, value, priority='cmdline') instead (see Settings.set and the sketch after this list).
- The scrapy.settings.Settings.defaults property is removed; use Settings.set(name, value, priority='default') instead (see Settings.set).
- Scrapy requires parsel ≥ 1.5. Custom Selector subclasses may be affected by backward incompatible changes in parsel 1.5.
- A non-zero exit code is returned from Scrapy commands when an error happens on spider initialization.
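A minimal sketch of the Settings.set replacement, assuming you hold a scrapy.settings.Settings instance (the setting name and values are placeholders):

    from scrapy.settings import Settings

    settings = Settings()

    # Previously: settings.overrides["DOWNLOAD_DELAY"] = 2.0
    settings.set("DOWNLOAD_DELAY", 2.0, priority="cmdline")

    # Previously: settings.defaults["DOWNLOAD_DELAY"] = 0.5
    settings.set("DOWNLOAD_DELAY", 0.5, priority="default")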
1.5.2¶
- Scrapy’s telnet console requires username and password. See Telnet Console for more details.
1.5.0¶
- Python 3.3 is not supported anymore.
- The default Scrapy user agent string uses an HTTPS link to scrapy.org. Override USER_AGENT if you relied on the old value.
- The logging of settings overridden by custom_settings changes from [scrapy.utils.log] to [scrapy.crawler].
- LinkExtractor ignores the m4v extension by default. Use the deny_extensions parameter of the LinkExtractor constructor to override this behavior (see the sketch after this list).
- The 522 and 524 status codes are added to RETRY_HTTP_CODES.
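A minimal sketch of re-enabling extraction of .m4v links by overriding deny_extensions, starting from the default ignore list:

    from scrapy.linkextractors import IGNORED_EXTENSIONS, LinkExtractor

    # Start from the default ignore list and drop m4v from it.
    deny = [ext for ext in IGNORED_EXTENSIONS if ext != "m4v"]
    link_extractor = LinkExtractor(deny_extensions=deny)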
1.4.0¶
- LinkExtractor does not canonicalize URLs by default. Pass canonicalize=True to the LinkExtractor constructor to override this behavior (see the sketch after this list).
- The MemoryUsage extension is enabled by default.
- The EDITOR environment variable takes precedence over the EDITOR setting.
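A minimal sketch of restoring URL canonicalization (the allow pattern is a placeholder):

    from scrapy.linkextractors import LinkExtractor

    # Extracted links are canonicalized again, as in Scrapy versions before 1.4.0.
    link_extractor = LinkExtractor(allow=r"/items/", canonicalize=True)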
1.3.0¶
- HttpErrorMiddleware logs errors with INFO level instead of DEBUG.
- By default, logger names now use a long-form path, e.g. [scrapy.extensions.logstats], instead of the shorter “top-level” variant of prior releases (e.g. [scrapy]). You can switch back to short logger names by setting LOG_SHORT_NAMES to True.
- ChunkedTransferMiddleware is removed from DOWNLOADER_MIDDLEWARES; chunked transfers are supported by default.
1.2.0¶
- DefaultHeadersMiddleware runs before UserAgentMiddleware in DOWNLOADER_MIDDLEWARES by default.
- The HTTP cache extension and plugins that use the .scrapy data directory now work outside projects.
- The Selector constructor does not allow passing both response and text arguments.
- The scrapy.utils.url.canonicalize_url function has been moved to w3lib.url.canonicalize_url (see the sketch after this list).
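A minimal sketch of updating the import:

    # Previously: from scrapy.utils.url import canonicalize_url
    from w3lib.url import canonicalize_url

    print(canonicalize_url("http://example.com/path?b=2&a=1"))
    # -> http://example.com/path?a=1&b=2 (query arguments are sorted, among other normalizations)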
1.1.0¶
- Response status code 400 is not retried by default. If you need the old behavior, add 400 to RETRY_HTTP_CODES (see the sketch after this list).
- When uploading files or images to S3, the default ACL policy is now “private” instead of “public”. You can use FILES_STORE_S3_ACL to change it.
- LinkExtractor ignores the pps extension by default. Use the deny_extensions parameter of the LinkExtractor constructor to override this behavior.
- In the output of the scrapy.utils.url.canonicalize_url function, non-ASCII query arguments are now encoded using the corresponding encoding, instead of forcing UTF-8. This could change the output of link extractors and invalidate some cache entries from older Scrapy versions.
- Responses with application/x-json as Content-Type are parsed as TextResponse objects.
- The scrapy.optional_features set is removed.
- The global command-line option --lsprof is removed.
- scrapy shell supports URLs without scheme. For example, if you use scrapy shell example.com, http://example.com is fetched in the shell. To fetch a local file called example.com instead, you must either use explicit relative syntax (./example.com) or an absolute path.
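A minimal settings.py sketch for re-adding 400 to the retried status codes; extend the default list of your Scrapy version (see its default_settings) rather than copying this one verbatim:

    # settings.py
    # Default retry codes plus 400, to restore the pre-1.1.0 behavior.
    RETRY_HTTP_CODES = [500, 502, 503, 504, 522, 524, 408, 429, 400]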
1.0.0¶
- The scrapy.webservice module is removed; use scrapy-jsonrpc instead.
- FeedExporter subclasses must accept a settings first argument.
- The spider_closed signal does not receive a spider_stats argument (see the sketch after this list).
- The CONCURRENT_REQUESTS_PER_SPIDER setting is removed; use CONCURRENT_REQUESTS_PER_DOMAIN instead.
- The CONCURRENT_SPIDERS setting is removed; use the max_proc setting of scrapyd instead.
- The scrapy.utils.python.FixedSGMLParser class is removed as part of the deprecation of the BaseSgmlLinkExtractor and SgmlLinkExtractor classes of the scrapy.contrib.linkextractors.sgml module.
- The default value of the SPIDER_MANAGER_CLASS setting becomes scrapy.spiderloader.SpiderLoader.
- The spiders Telnet variable is removed.
- The spidermanager argument of the spidercls_for_request() function is renamed to spider_loader.
- The scrapy.contrib.djangoitem module is removed; use scrapy-djangoitem instead.
- The scrapy deploy command is removed in favor of the scrapyd-deploy command from scrapyd-client.
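A minimal sketch of a spider_closed handler that no longer expects a spider_stats argument and reads stats from the crawler instead (the spider and handler names are placeholders):

    from scrapy import Spider, signals

    class ExampleSpider(Spider):
        name = "example"  # hypothetical spider

        @classmethod
        def from_crawler(cls, crawler, *args, **kwargs):
            spider = super().from_crawler(crawler, *args, **kwargs)
            crawler.signals.connect(spider.on_spider_closed, signal=signals.spider_closed)
            return spider

        def on_spider_closed(self, spider, reason):
            # spider_stats is gone; use the stats collector on the crawler instead.
            self.logger.info("Spider closed (%s): %s", reason, self.crawler.stats.get_stats())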