Scraping Websites Using Python
Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites. Web scraping software may directly access the World Wide Web using the Hypertext Transfer Protocol or a web browser. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis. In this blog I will discuss three libraries designed specifically for web scraping: first Beautiful Soup 4, then Selenium, and finally Scrapy.
Beautiful Soup
pip install beautifulsoup4
Beautiful Soup is a Python library for pulling data out of HTML and XML files, and it is realistically the easiest to learn and use among these three options. Beautiful Soup comes with its downsides, though: it has some dependencies, such as the requests library to make requests to the website and an external parser to extract data, for example an XML or HTML parser. These dependencies can make it quite a bit more difficult to transfer code between projects. Let's take a look at Beautiful Soup in use. For this example I will use a .py file from my last blog post about Programming a Twitter Bot, which happened to use this library:
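A rough sketch of what that looks like (the URL and tags below are placeholders, not the actual targets from the bot):

import requests
from bs4 import BeautifulSoup

# Fetch the page we want to scrape (placeholder URL)
response = requests.get("https://example.com")

# Parse the raw HTML with Python's built-in html.parser
soup = BeautifulSoup(response.text, "html.parser")

# Extract data: here, the page title and the destination of every link
print(soup.title.string)
for link in soup.find_all("a"):
    print(link.get("href"))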
As shown above, just a couple of lines of code are needed to extract data with Beautiful Soup, but we are still required to import requests to access the URL that we want to extract data from, and html.parser to actually parse the content. Let's take a look at some alternatives to this.
Selenium
pip install selenium
Selenium requires a driver to interface with the chosen browser:
Chrome:
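For Chrome, that driver is ChromeDriver. With it installed and discoverable on your PATH, a minimal sketch of Selenium in use might look like this (the URL is a placeholder):

from selenium import webdriver

# Assumes ChromeDriver is installed and available on the PATH
driver = webdriver.Chrome()

# Load the page in a real browser session (placeholder URL)
driver.get("https://example.com")

# page_source contains the fully rendered HTML, including
# content generated by JavaScript after the page loaded
print(driver.page_source)

# Always close the browser when finished
driver.quit()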