Web Scraping 3: Error Handling On Downloads

Our download function currently does nothing to retry failed downloads. In this part we will add some code so that the function attempts the download up to 3 times before giving up.

We can use the status code of the response to decide whether a retry is worthwhile. For example, if the page is not found (404) there is no point requesting it again. Make sure you add import time to the top of your file with the other imports.

import time      # these two imports belong at the top of the file
import requests

def download(url, header = None, proxy = None, timeout = 5, tries = 3):
    """ Downloads a page using requests and returns the page content, or False on failure """
    if tries == 0:
        # No attempts left
        return False
    page = requests.get(url, headers = header, proxies = proxy, timeout = timeout)
    if 300 <= page.status_code < 400:
        # Status code in 300 <= code < 400
        # Redirect response (retry straight away, keeping the same settings)
        return download(url, header = header, proxy = proxy, timeout = timeout, tries = tries - 1)
    elif 400 <= page.status_code < 500:
        # Status code in 400 <= code < 500
        # Client error (something wrong with our request, so do not retry)
        return False
    elif 500 <= page.status_code < 600:
        # Status code in 500 <= code < 600
        # Server error (retry after 1 second, keeping the same settings)
        time.sleep(1)
        return download(url, header = header, proxy = proxy, timeout = timeout, tries = tries - 1)
    elif 200 <= page.status_code < 300:
        # Success
        return page.content
    else:
        # Anything else is unexpected, so treat it as a failure
        return False

The code above checks the status code of the response. On a redirect (3xx) it retries straight away, and on a server error (5xx) it waits a second before retrying. Each retry uses up one of the remaining tries, and the function returns False once they run out. A client error (4xx) means there is something wrong with our request, so there is no point retrying the download.
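
The same retry rules can also be written as a loop instead of recursion. The sketch below is only an illustration, not part of the script built in this series: download_with_retries and its fetch argument are hypothetical names, where fetch is assumed to be any callable that takes a URL and returns a (status_code, content) pair. Unlike the function above, this version also treats an exception raised by fetch (a timeout, for example) as a failed attempt, and it waits before every retry rather than only after server errors.

```python
import time


def download_with_retries(url, fetch, tries=3, delay=1):
    """Call fetch(url) up to `tries` times and return the content, or False.

    `fetch` is a hypothetical callable returning (status_code, content).
    It may raise an exception for network-level failures such as timeouts.
    """
    for attempt in range(tries):
        try:
            status, content = fetch(url)
        except Exception:
            # Network-level failure: count it as a failed attempt and retry
            status, content = None, None

        if status is not None:
            if 200 <= status < 300:
                # Success
                return content
            if not (300 <= status < 400 or 500 <= status < 600):
                # Client error or unexpected code: retrying will not help
                return False

        # Redirect, server error or exception: pause before the next attempt
        if attempt < tries - 1:
            time.sleep(delay)

    # Every attempt failed
    return False
```

To use this with requests, fetch could be a small wrapper that calls requests.get(url, timeout = 5) and returns (page.status_code, page.content). In that case it would be tidier to catch requests.exceptions.RequestException rather than the broad Exception used here.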

Now that we have a basic download script, we can test it. Continue to part 4 to find out how.
