Web Scraping Made Easy with Zene
Extracting data from websites is a common task for data analysts and researchers. Zene can automate this process, writing scripts to fetch, parse, and structure web content.
In this example, we ask Zene to scrape quotes from a public website.
The Task
python
task = """
Write a Python script `scrape_quotes.py` to scrape quotes from 'http://quotes.toscrape.com'.
Requirements:
1. Use `requests` to fetch the page content.
2. Use `BeautifulSoup` (bs4) to parse the HTML.
3. Extract the quote text, the author name, and the tags for each quote on the first page.
4. Store the data as a list of dictionaries.
5. Save the result to a JSON file named `quotes.json` with indentation for readability.
6. Print the number of quotes scraped.
"""Execution Process
1. Planning
The Planner recognizes this as a scraping task:
- Analyze: Visit the website (simulated via requests) to inspect the structure.
- Scrape: Write
scrape_quotes.pyusingrequestsandbs4. - Parse: Extract
span.text,small.author, anddiv.tags. - Save: Write to
quotes.json. - Run: Execute the script and verify the output file.
2. Execution & Reflection
- Step 1 (Analyze): The Executor writes the scraping logic. It correctly identifies the CSS selectors for quotes (
.quote), text (.text), author (.author), and tags (.tag). - Step 2 (Run): It executes the script.
- Step 3 (Verify): The Reflector checks if
quotes.jsonexists and contains valid JSON data.
The Result
scrape_quotes.py:
python
import requests
from bs4 import BeautifulSoup
import json
url = 'http://quotes.toscrape.com'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
quotes = []
for quote in soup.select('.quote'):
text = quote.select_one('.text').get_text(strip=True)
author = quote.select_one('.author').get_text(strip=True)
tags = [tag.get_text(strip=True) for tag in quote.select('.tag')]
quotes.append({
'text': text,
'author': author,
'tags': tags
})
with open('quotes.json', 'w') as f:
json.dump(quotes, f, indent=4)
print(f"Scraped {len(quotes)} quotes.")quotes.json (excerpt):
json
[
{
"text": "The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.",
"author": "Albert Einstein",
"tags": [
"change",
"deep-thoughts",
"thinking",
"world"
]
},
...
]Key Takeaway
Zene can automate web interactions, parsing HTML and structuring unstructured data into clean JSON for further analysis.
