Learn to ethically scrape financial data from the web using Python. Build a stock price monitor with BeautifulSoup while respecting websites and following best practices.
🕒 9 min read
Hey there! 👋 Remember when we built that budget tracker and wished we could automatically pull stock prices or exchange rates? Well, today we’re making that happen!
I’ll never forget the day I realized I could access real-world financial data with just a few lines of Python. I was building a personal investment tracker, and manually entering stock prices was driving me crazy.
But wait - before we dive in, let me share something important. Web scraping is like picking fruit from a tree:
Today, I’ll show you how to scrape financial data ethically and responsibly - the right way!
website.com/robots.txt# Good practice example
import time
import requests
from bs4 import BeautifulSoup
# 1. Identify yourself
headers = {
'User-Agent': 'MyFinanceBot/1.0 (Contact: myemail@example.com)'
}
# 2. Respect robots.txt by checking first
# 3. Add delays between requests
time.sleep(2) # Be nice to the server!
Setting Up Your Scraping Environment Install Required Packages:
pip install requests beautifulsoup4 pandas
Basic Setup:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time
import json
from datetime import datetime
# Your scraper configuration
config = {
"timeout": 10,
"delay": 2, # Seconds between requests
"max_retries": 3,
"user_agent": "MyFinanceBot/1.0"
}
Project 1: Simple Stock Price Scraper Let’s start with a real example - scraping stock prices from Yahoo Finance:
import requests
from bs4 import BeautifulSoup
import time
class StockPriceScraper:
def __init__(self):
self.headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
}
self.base_url = "https://finance.yahoo.com/quote/"
def get_stock_price(self, symbol):
"""Fetch current stock price for a given symbol"""
try:
url = f"{self.base_url}{symbol}"
response = requests.get(url, headers=self.headers, timeout=10)
if response.status_code != 200:
print(f"❌ Failed to fetch {symbol}: Status {response.status_code}")
return None
soup = BeautifulSoup(response.content, 'html.parser')
# Find price element (Yahoo Finance structure)
price_element = soup.find('fin-streamer', {'data-field': 'regularMarketPrice'})
if price_element:
price = float(price_element.text.replace(',', ''))
return price
else:
print(f"❌ Could not find price for {symbol}")
return None
except requests.exceptions.Timeout:
print(f"⏰ Timeout fetching {symbol}")
return None
except Exception as e:
print(f"❌ Error fetching {symbol}: {e}")
return None
def get_multiple_prices(self, symbols):
"""Get prices for multiple stocks with rate limiting"""
prices = {}
for symbol in symbols:
print(f"📊 Fetching {symbol}...")
price = self.get_stock_price(symbol)
if price:
prices[symbol] = price
time.sleep(1) # Be nice to the server
return prices
# Test it out
scraper = StockPriceScraper()
# Get prices for popular stocks
stocks = ['AAPL', 'GOOGL', 'MSFT', 'AMZN', 'TSLA']
prices = scraper.get_multiple_prices(stocks)
print("\n📈 Current Stock Prices:")
print("-" * 30)
for symbol, price in prices.items():
print(f"{symbol}: ${price:.2f}")
Project 2: Currency Exchange Rate Monitor Let’s build a currency converter that gets live rates:
class CurrencyScraper:
def __init__(self):
self.headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
}
self.base_url = "https://www.x-rates.com/calculator/"
def get_exchange_rate(self, from_currency, to_currency):
"""Get live exchange rate between two currencies"""
try:
# Build URL
url = f"{self.base_url}?from={from_currency}&to={to_currency}"
response = requests.get(url, headers=self.headers, timeout=10)
if response.status_code != 200:
print(f"❌ Failed to fetch exchange rate")
return None
soup = BeautifulSoup(response.content, 'html.parser')
# Find exchange rate
rate_element = soup.find('span', class_='ccOutputRslt')
if rate_element:
# Extract the numeric rate
rate_text = rate_element.text.strip()
# Remove currency code and convert to float
rate = float(rate_text.split(' ')[0])
return rate
else:
print(f"❌ Could not find exchange rate")
return None
except Exception as e:
print(f"❌ Error fetching exchange rate: {e}")
return None
def get_rates_for_base(self, base_currency, target_currencies):
"""Get exchange rates for base currency against multiple targets"""
rates = {}
print(f"\n💱 Fetching exchange rates for {base_currency}...")
print("-" * 30)
for currency in target_currencies:
print(f" {base_currency} → {currency}...")
rate = self.get_exchange_rate(base_currency, currency)
if rate:
rates[currency] = rate
time.sleep(1) # Be polite
return rates
# Test it
currency_scraper = CurrencyScraper()
rates = currency_scraper.get_rates_for_base('USD', ['EUR', 'GBP', 'JPY', 'CAD', 'AUD'])
print("\n💱 Exchange Rates (1 USD):")
print("-" * 30)
for currency, rate in rates.items():
print(f"1 USD = {rate:.4f} {currency}")
Project 3: Crypto Price Alert System Now let’s build something more advanced - a cryptocurrency price monitor:
class CryptoPriceMonitor:
def __init__(self):
self.headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
}
self.coingecko_url = "https://api.coingecko.com/api/v3/simple/price"
# CoinGecko has a public API - use this instead of scraping!
# It's much more reliable and ethical
def get_crypto_prices_api(self, coins, currency='usd'):
"""Get crypto prices using CoinGecko API (recommended)"""
try:
params = {
'ids': ','.join(coins),
'vs_currencies': currency
}
response = requests.get(
self.coingecko_url,
params=params,
headers=self.headers,
timeout=10
)
if response.status_code == 200:
data = response.json()
prices = {}
for coin in coins:
if coin in data and currency in data[coin]:
prices[coin] = data[coin][currency]
return prices
else:
print(f"❌ API Error: {response.status_code}")
return None
except Exception as e:
print(f"❌ Error fetching crypto prices: {e}")
return None
def check_price_alerts(self, prices, alerts):
"""Check if any prices triggered alerts"""
triggered = []
for coin, price in prices.items():
if coin in alerts:
target = alerts[coin]
if price >= target['above']:
triggered.append({
'coin': coin,
'price': price,
'alert_type': 'above',
'target': target['above']
})
elif price <= target['below']:
triggered.append({
'coin': coin,
'price': price,
'alert_type': 'below',
'target': target['below']
})
return triggered
# Test it
monitor = CryptoPriceMonitor()
# Set price alerts
alerts = {
'bitcoin': {'above': 50000, 'below': 40000},
'ethereum': {'above': 3000, 'below': 2000}
}
# Get current prices
coins = ['bitcoin', 'ethereum']
prices = monitor.get_crypto_prices_api(coins)
if prices:
print("\n💰 Current Crypto Prices:")
print("-" * 30)
for coin, price in prices.items():
print(f"{coin.title()}: ${price:,.2f}")
# Check alerts
triggered = monitor.check_price_alerts(prices, alerts)
if triggered:
print("\n🚨 PRICE ALERTS TRIGGERED!")
for alert in triggered:
if alert['alert_type'] == 'above':
print(f"🔺 {alert['coin'].title()} is ABOVE ${alert['target']:,} (currently ${alert['price']:,.2f})")
else:
print(f"🔻 {alert['coin'].title()} is BELOW ${alert['target']:,} (currently ${alert['price']:,.2f})")
Project 4: Complete Financial Data Dashboard Let’s combine everything into a useful dashboard:
import json
import csv
from datetime import datetime
class FinancialDashboard:
def __init__(self):
self.data_dir = "financial_data"
self.stock_scraper = StockPriceScraper()
self.crypto_monitor = CryptoPriceMonitor()
# Create data directory if it doesn't exist
import os
os.makedirs(self.data_dir, exist_ok=True)
def track_portfolio(self, holdings):
"""Track a portfolio of stocks and crypto"""
print("\n📊 PORTFOLIO TRACKER")
print("=" * 50)
stocks = [h['symbol'] for h in holdings if h['type'] == 'stock']
cryptos = [h['symbol'] for h in holdings if h['type'] == 'crypto']
total_value = 0
portfolio_data = []
# Get stock prices
if stocks:
stock_prices = self.stock_scraper.get_multiple_prices(stocks)
for holding in holdings:
if holding['type'] == 'stock':
symbol = holding['symbol']
if symbol in stock_prices:
current_price = stock_prices[symbol]
value = current_price * holding['shares']
total_value += value
portfolio_data.append({
'asset': symbol,
'type': 'Stock',
'shares': holding['shares'],
'current_price': current_price,
'value': value,
'purchase_price': holding.get('purchase_price', 'N/A')
})
# Get crypto prices
if cryptos:
crypto_prices = self.crypto_monitor.get_crypto_prices_api(cryptos)
for holding in holdings:
if holding['type'] == 'crypto':
symbol = holding['symbol']
if symbol in crypto_prices:
current_price = crypto_prices[symbol]
value = current_price * holding['shares']
total_value += value
portfolio_data.append({
'asset': symbol,
'type': 'Crypto',
'shares': holding['shares'],
'current_price': current_price,
'value': value,
'purchase_price': holding.get('purchase_price', 'N/A')
})
# Display results
print("\n📈 Your Holdings:")
print("-" * 50)
for item in portfolio_data:
print(f"{item['asset']} ({item['type']})")
print(f" Shares: {item['shares']}")
print(f" Price: ${item['current_price']:,.2f}")
print(f" Value: ${item['value']:,.2f}")
print()
print("=" * 50)
print(f"💰 TOTAL PORTFOLIO VALUE: ${total_value:,.2f}")
# Save data
self.save_data(portfolio_data, total_value)
def save_data(self, data, total_value):
"""Save portfolio data to file"""
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
# Save as CSV
csv_file = f"{self.data_dir}/portfolio_{timestamp}.csv"
with open(csv_file, 'w', newline='') as file:
writer = csv.DictWriter(file, fieldnames=['asset', 'type', 'shares', 'current_price', 'value'])
writer.writeheader()
writer.writerows(data)
# Save summary
json_file = f"{self.data_dir}/summary_{timestamp}.json"
with open(json_file, 'w') as file:
json.dump({
'timestamp': timestamp,
'total_value': total_value,
'holdings': data
}, file, indent=2)
print(f"\n💾 Data saved to:")
print(f" CSV: {csv_file}")
print(f" JSON: {json_file}")
# Example usage
def demo_dashboard():
dashboard = FinancialDashboard()
# Sample portfolio holdings
holdings = [
{'type': 'stock', 'symbol': 'AAPL', 'shares': 10, 'purchase_price': 150.00},
{'type': 'stock', 'symbol': 'GOOGL', 'shares': 5, 'purchase_price': 2500.00},
{'type': 'crypto', 'symbol': 'bitcoin', 'shares': 0.5},
{'type': 'crypto', 'symbol': 'ethereum', 'shares': 2}
]
dashboard.track_portfolio(holdings)
# Uncomment to run
# demo_dashboard()
Ethical Scraping Best Practices Complete Scraper Template:
class EthicalScraper:
def __init__(self):
self.session = requests.Session()
self.session.headers.update({
'User-Agent': 'MyDataScraper/1.0 (Contact: your-email@example.com)'
})
self.delay = 2 # Seconds between requests
self.last_request_time = 0
def polite_request(self, url):
"""Make a request with rate limiting"""
# Respect rate limits
time_since_last = time.time() - self.last_request_time
if time_since_last < self.delay:
time.sleep(self.delay - time_since_last)
try:
response = self.session.get(url, timeout=10)
self.last_request_time = time.time()
# Check if we're being rate-limited
if response.status_code == 429:
print("⏰ Rate limited. Waiting longer...")
time.sleep(self.delay * 2)
return self.polite_request(url) # Retry
return response
except requests.exceptions.RequestException as e:
print(f"❌ Request failed: {e}")
return None
def check_robots_txt(self, domain):
"""Check robots.txt for permissions"""
try:
url = f"https://{domain}/robots.txt"
response = self.session.get(url, timeout=5)
if response.status_code == 200:
print(f"✅ robots.txt found for {domain}")
# Parse robots.txt content
# Look for your user-agent or 'Disallow: /'
return True
else:
print(f"⚠️ Could not fetch robots.txt for {domain}")
return True # Assume allowed if not found
except Exception as e:
print(f"⚠️ Error checking robots.txt: {e}")
return True # Assume allowed on error
def scrape_with_care(self, url):
"""Main scraping method with all safety checks"""
# Step 1: Check robots.txt
domain = url.split('/')[2] # Extract domain
if not self.check_robots_txt(domain):
print(f"❌ Disallowed by robots.txt")
return None
# Step 2: Make polite request
response = self.polite_request(url)
if response and response.status_code == 200:
return response.text
else:
return None
Handling Anti-Scraping Measures Common Issues and Solutions:
# Add delays between requests
import random
time.sleep(random.uniform(1, 3)) # Random delay
# Use retry logic
def request_with_retry(url, max_retries=3):
for attempt in range(max_retries):
try:
response = requests.get(url, timeout=10)
if response.status_code == 200:
return response
except Exception as e:
print(f"Attempt {attempt + 1} failed: {e}")
time.sleep(2 ** attempt) # Exponential backoff
return None
# Rotate user agents
user_agents = [
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'
]
def get_random_headers():
return {
'User-Agent': random.choice(user_agents),
'Accept': 'text/html,application/xhtml+xml,application/xml',
'Accept-Language': 'en-US,en;q=0.9',
}
Legal and Ethical Considerations ✅ Do: Scrape publicly available data
Respect robots.txt
Add delays between requests
Identify yourself with user-agent
Use APIs when available
Only collect what you need
❌ Don’t: Scrape behind paywalls
Bypass login systems
Scrape copyrighted content
Sell scraped data without permission
Overload servers with requests
Scrape personal information
What’s Next? Python for Finance! In our next FinTech post, we’ll explore “Financial Data Analysis with Python” where we’ll:
Analyze stock market trends
Calculate returns and volatility
Create financial visualizations
Build a simple trading strategy
Understand risk metrics
Your Web Scraping Mission This week, try these exercises:
Modify the stock price scraper to track 5 of your favorite stocks
Build a currency exchange rate monitor for your travel plans
Create a simple price alert system for products you want to buy
Share in comments: What financial data would you like to track automatically?
Wrapping Up Today you learned how to ethically collect financial data from the web! You can now:
✅ Scrape stock prices responsibly
✅ Track currency exchange rates
✅ Monitor cryptocurrency prices
✅ Build a portfolio dashboard
✅ Respect rate limits and robots.txt
✅ Handle anti-scraping measures
Remember: With great power comes great responsibility. Use these skills ethically and always respect the websites you scrape!
Happy ethical scraping! 🚀💰
What financial data are you most excited to track? Have questions about scraping techniques? Share your thoughts and projects in the comments!
Comments & Discussion
Join the conversation using your GitHub account. Comments are powered by Utterances.