📘 Lesson 30 · Advanced
Project: Python Web Scraper
Build a web scraper with requests and BeautifulSoup to extract and save data.
Project Overview
Web scraping extracts data from web pages automatically. Two libraries do the heavy lifting: requests downloads the HTML and BeautifulSoup parses it into a navigable tree. This is one of Python's most practical skills — used for data collection, price monitoring, research automation, and more.
terminal
pip install requests beautifulsoup4
▶ Output
Successfully installed requests-2.31.0 beautifulsoup4-4.12.0
Check a site's robots.txt and Terms of Service before scraping. Only scrape sites that permit it. Add delays with time.sleep() to avoid hammering servers.
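The robots.txt check can be automated with the standard library's urllib.robotparser, and a fixed pause between requests keeps the crawl polite. A minimal sketch — the robots.txt content here is a hard-coded sample so the example runs offline (in practice you would point the parser at the live file with set_url() and read()), and the one-second delay is an illustrative choice:

```python
import time
from urllib import robotparser

# Sample robots.txt rules, hard-coded so this runs offline.
# For a real site: rp.set_url("https://example.com/robots.txt"); rp.read()
SAMPLE_ROBOTS = """
User-agent: *
Disallow: /private/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(SAMPLE_ROBOTS)

urls = [
    "https://example.com/catalogue/",
    "https://example.com/private/data",
]
for url in urls:
    if rp.can_fetch("*", url):
        print("allowed:", url)
        time.sleep(1)  # pause between requests so we don't hammer the server
    else:
        print("blocked:", url)
```

can_fetch() takes a user-agent string and a URL and applies the Disallow rules, so the loop skips /private/data while allowing the catalogue page.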
Fetching a Web Page
requests.get(url) sends an HTTP GET request. A status code of 200 means success, and the HTML content is in response.text. Always check the status code before parsing.
fetch.py
import requests

r = requests.get("https://books.toscrape.com")
print("Status:", r.status_code)
print("Preview:", r.text[:120])
▶ Output
Status: 200
Preview: ... All products | Books to Scrape ...
Parsing HTML with BeautifulSoup
Pass the HTML string to BeautifulSoup. Use find_all(tag, class_=...) to find all matching elements. Access text with .text and attributes with bracket notation.
scrape.py
import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    requests.get("https://books.toscrape.com").text,
    "html.parser",
)
books = soup.find_all("article", class_="product_pod")
print(f"Found {len(books)} books")

for b in books[:3]:
    title = b.h3.a["title"]
    price = b.find("p", class_="price_color").text
    print(f"  {title[:38]:<40} {price}")
▶ Output
Found 20 books
  A Light in the Attic                     £51.77
  Tipping the Velvet                       £53.74
  Soumission                               £50.10
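Note that the prices come back as strings like "£51.77". To sort or total them you first need to strip the currency symbol and convert to float. A small sketch, using the sample prices from the output above:

```python
# Scraped prices arrive as strings with a currency symbol.
prices = ["£51.77", "£53.74", "£50.10"]

# Strip the leading "£" and convert each to a float for arithmetic.
values = [float(p.lstrip("£")) for p in prices]

print("Cheapest:", min(values))         # 50.1
print("Total:", round(sum(values), 2))  # 155.61
```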
Saving to CSV
The standard library's csv module writes the scraped rows to a file: open the file with newline="", write a header row, then one row per book.
save.py
import csv

import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get("https://books.toscrape.com").text, "html.parser")
books = soup.find_all("article", class_="product_pod")

with open("books.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["Title", "Price"])
    for b in books:
        w.writerow([b.h3.a["title"], b.find("p", class_="price_color").text])

print(f"Saved {len(books)} books to books.csv")
▶ Output
Saved 20 books to books.csv
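To confirm the file was written correctly, you can read it back with csv.DictReader, which maps each row to a dict keyed by the header. A sketch using a couple of hard-coded rows so it runs without scraping first:

```python
import csv

# Write a small books.csv like the scraper does (hard-coded rows for the demo).
with open("books.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["Title", "Price"])
    w.writerow(["A Light in the Attic", "£51.77"])
    w.writerow(["Tipping the Velvet", "£53.74"])

# Read it back: DictReader uses the header row as the dict keys.
with open("books.csv", newline="") as f:
    rows = list(csv.DictReader(f))

print(len(rows), "rows")  # 2 rows
print(rows[0]["Title"])   # A Light in the Attic
```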
Congratulations — you've completed Python A-Z! From variables and print() all the way to decorators, generators, and real projects. Now build something of your own — pick a problem you care about and solve it with Python.
🧠 Quick Check
Which library parses HTML in Python web scraping?