📘 Lesson 30 · Advanced

Project: Python Web Scraper

Build a web scraper with requests and BeautifulSoup to extract and save data.

Project Overview

Web scraping extracts data from web pages automatically. Two libraries do the heavy lifting: requests downloads the HTML and BeautifulSoup parses it into a navigable tree. This is one of Python's most practical skills — used for data collection, price monitoring, research automation, and more.

terminal
pip install requests beautifulsoup4
▶ Output
Successfully installed requests-2.31.0 beautifulsoup4-4.12.0
⚠️
Check a site's robots.txt and Terms of Service before scraping. Only scrape sites that permit it. Add delays with time.sleep() to avoid hammering servers.
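
The robots.txt check can be automated with the standard library's urllib.robotparser. A minimal offline sketch — the rules string below is a made-up example, just to show how path matching works:

```python
from urllib import robotparser

# A hypothetical robots.txt body for illustration only
rules = """\
User-agent: *
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Paths not matched by a Disallow rule are allowed
print(rp.can_fetch("*", "https://example.com/catalogue/"))   # True
print(rp.can_fetch("*", "https://example.com/private/data")) # False
```

Against a live site you would call rp.set_url("https://example.com/robots.txt") and rp.read() instead of parsing a string.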

Fetching a Web Page

requests.get(url) sends an HTTP GET request. Status code 200 means success. The HTML content is in response.text. Always check the status code before parsing.

fetch.py
import requests
r = requests.get("https://books.toscrape.com")
print("Status:", r.status_code)
print("Preview:", r.text[:120])
▶ Output
Status: 200
Preview: <!DOCTYPE html>
...
    <title>All products | Books to Scrape</title>

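For anything beyond a quick experiment, it helps to add a timeout and convert HTTP error codes into exceptions with raise_for_status(). A sketch of that pattern — the fetch helper name and User-Agent string are illustrative, not part of the requests API:

```python
import requests

def fetch(url: str) -> str:
    # timeout stops a hung connection from blocking forever;
    # the User-Agent is an arbitrary label identifying your script
    r = requests.get(url, timeout=10,
                     headers={"User-Agent": "learning-scraper/0.1"})
    r.raise_for_status()  # raises HTTPError for 4xx/5xx responses
    return r.text

try:
    html = fetch("https://books.toscrape.com/no-such-page")
except requests.exceptions.HTTPError as exc:
    print("Request failed:", exc)
```

This way a 404 or 500 stops the scraper loudly instead of silently handing broken HTML to the parser.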
Parsing HTML with BeautifulSoup

Pass the HTML string to BeautifulSoup. Use find_all(tag, class_=...) to find all matching elements. Access text with .text and attributes with bracket notation.

scrape.py
import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    requests.get("https://books.toscrape.com").text, "html.parser"
)
books = soup.find_all("article", class_="product_pod")
print(f"Found {len(books)} books")

for b in books[:3]:
    title = b.h3.a["title"]
    price = b.find("p", class_="price_color").text
    print(f"  {title[:38]:<40} {price}")
▶ Output
Found 20 books
  A Light in the Attic                    £51.77
  Tipping the Velvet                      £53.74
  Soumission                              £50.10
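find_all() is not the only way to query the tree — BeautifulSoup's select() accepts CSS selectors, which can be terser for nested lookups. An offline sketch using a hand-written snippet that mimics the page structure above:

```python
from bs4 import BeautifulSoup

# A trimmed, hand-written snippet mirroring the real page's markup
html = """
<article class="product_pod">
  <h3><a title="A Light in the Attic">A Light in ...</a></h3>
  <p class="price_color">£51.77</p>
</article>
"""
soup = BeautifulSoup(html, "html.parser")

# CSS selectors: tag.class for classes, spaces for descendants
for a in soup.select("article.product_pod h3 a"):
    print(a["title"])   # A Light in the Attic
for p in soup.select("p.price_color"):
    print(p.text)       # £51.77
```

Use whichever reads better: find_all() for simple tag/class lookups, select() when the path through the tree matters.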

Saving to CSV

The standard library's csv module writes the scraped rows to a file. Open the file with newline="" so the writer doesn't insert blank lines between rows on Windows.

save.py
import requests, csv
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get("https://books.toscrape.com").text, "html.parser")
books = soup.find_all("article", class_="product_pod")

with open("books.csv", "w", newline="", encoding="utf-8") as f:
    w = csv.writer(f)
    w.writerow(["Title", "Price"])
    for b in books:
        w.writerow([b.h3.a["title"], b.find("p", class_="price_color").text])

print(f"Saved {len(books)} books to books.csv")
▶ Output
Saved 20 books to books.csv
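As a next step, the same scraper can walk multiple pages. The catalogue/page-N.html pattern below matches how this demo site paginates, but verify it before relying on it; parse_titles is a helper name invented for this sketch:

```python
import time
import requests
from bs4 import BeautifulSoup

BASE = "https://books.toscrape.com/catalogue/page-{}.html"

def parse_titles(html: str) -> list[str]:
    """Extract book titles from one page of HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["title"] for a in soup.select("article.product_pod h3 a")]

all_titles = []
for page in range(1, 4):           # first three pages
    r = requests.get(BASE.format(page), timeout=10)
    if r.status_code != 200:       # the site 404s past the last page
        break
    all_titles.extend(parse_titles(r.text))
    time.sleep(1)                  # pause between requests to be polite

print(f"Collected {len(all_titles)} titles")
```

Separating parsing (parse_titles) from fetching also makes the scraper easy to test against saved HTML, with no network at all.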
🎉
Congratulations — you've completed Python A-Z! From variables and print() all the way to decorators, generators, and real projects. Now build something of your own — pick a problem you care about and solve it with Python.

🧠 Quick Check

Which library parses HTML in Python web scraping?

html5lib
lxml
BeautifulSoup
HTMLParser

Tags

web scraper · beautifulsoup · requests · HTML parsing · CSV · advanced project