Download files from CloudFlare CDN using PyCurl to solve permissions problems

When you use Python to download a file cached by the CloudFlare CDN, you must send a plausible “User-Agent” header and also support SSLv3 connections. Otherwise you get a file with the name you requested, but with the HTML below inside:

Error 1010 Ray ID: 29xxxxxxxxxed • 2016-05-08 19:20:51 UTC

Access denied

What happened?

The owner of this website (website.domain.org) has banned your access based on your browser’s signature (xxxxxxxxxxxxxxxx-ua47).

CloudFlare Ray ID: 2xxxxxxxxxxxxed • Your IP: 1xx.20x.xx.xxx • Performance & security by CloudFlare

I tested implementations using urllib, urllib2, urllib3 and requests (which uses urllib3). With all of them I ran into problems: either too many packages had to be installed or SSLv3 was not well supported.

The solution was to use PyCurl. If you use Tornado and consume any API with its asynchronous HTTP client, you probably already have it installed.

Check the example below:

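[code language="python"]
import pycurl

# Open the target file in binary mode so PyCurl can write the raw bytes to it.
with open('testimage.jpg', 'wb') as f:
    c = pycurl.Curl()
    # Send a browser-like User-Agent so the request is not rejected.
    c.setopt(c.USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; it; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 (.NET CLR 3.5.30729)')
    c.setopt(c.URL, 'https://website-in-cloudflare-cdn.domain.extension/imagex.jpg')
    # Stream the response body straight into the open file.
    c.setopt(c.WRITEDATA, f)
    c.perform()
    c.close()
[/code]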
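Since a blocked request still leaves you with a file that has the expected name, it may be worth checking what was actually written. The sketch below is only an illustration: the helper names `looks_like_cloudflare_block` and `check_download` are hypothetical, and the two marker strings are simply copied from the error page quoted earlier, so treating them as a stable signature is an assumption.

```python
# Marker strings copied from the error page shown in this post.
# Assumption: CloudFlare keeps using this wording; it may change.
BLOCK_MARKERS = (b"Error 1010", b"Access denied")


def looks_like_cloudflare_block(data: bytes) -> bool:
    """Return True if the payload contains every block-page marker."""
    return all(marker in data for marker in BLOCK_MARKERS)


def check_download(path: str, max_bytes: int = 65536) -> bool:
    """Return True if the file at `path` does NOT look like a block page.

    Only the first `max_bytes` bytes are inspected, on the assumption that
    an HTML error page is small and the markers appear early in it.
    """
    with open(path, "rb") as f:
        return not looks_like_cloudflare_block(f.read(max_bytes))
```

You could call `check_download('testimage.jpg')` right after the download and retry or log a warning when it returns `False`. A real image could in theory contain both strings by coincidence, so this is a heuristic, not a guarantee.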

After this I found that there is a setting for this check, if you are the site owner. The website admin can turn this feature off by doing the following:
Settings->CloudFlare Settings->Browser Integrity Check->Toggle Off.
