Web Scraping - Reading Stock Data from Yahoo Finance

In the simple steps below, we will read historical price data for crude oil (symbol CL=F) from Yahoo Finance.

Preparing the environment:

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

Next we form the URL from the crude oil symbol, request the page, and store its content as text in the crude_data object:

In [2]:
headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'}
stockSymbol = 'CL=F'
url = 'https://finance.yahoo.com/quote/'+ stockSymbol + '/history?p=' + stockSymbol
crude_data = requests.get(url, headers=headers, timeout=5).text
#print(crude_data)
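The request above does not check whether Yahoo Finance actually returned the page. As a minimal sketch, the same steps could be wrapped in two small helpers that stop on an HTTP error. The names `build_history_url` and `fetch_history_html` are hypothetical and not part of the original notebook; the URL pattern and header are simply the ones used above.

```python
# Same browser-like header as in the cell above
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
}


def build_history_url(symbol):
    """Return the history-page URL for a symbol (hypothetical helper)."""
    return "https://finance.yahoo.com/quote/" + symbol + "/history?p=" + symbol


def fetch_history_html(symbol, timeout=5):
    """Download the history page as text, raising on HTTP errors (hypothetical helper)."""
    # Imported here so that build_history_url can be used without requests installed
    import requests

    response = requests.get(build_history_url(symbol), headers=HEADERS, timeout=timeout)
    # Raise requests.HTTPError on a 4xx/5xx status instead of parsing an error page
    response.raise_for_status()
    return response.text


print(build_history_url("CL=F"))
```

With these helpers, `crude_data = fetch_history_html(stockSymbol)` would stand in for the single `requests.get(...)` line, at the cost of an exception when the site rejects the request.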

Now we have to parse the text as HTML using BeautifulSoup:

In [3]:
soup = BeautifulSoup(crude_data, 'html5lib')
#soup
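To see what a parsed soup object offers without making a network request, here is a minimal sketch on a hand-written table. The HTML string is a made-up stand-in for the real page, not Yahoo Finance markup, and the sketch uses Python's built-in `html.parser` rather than `html5lib` so that no extra parser package is needed.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the downloaded page (an assumption for illustration)
sample_html = """
<table>
  <thead><tr><th>Date</th><th>Open</th></tr></thead>
  <tbody>
    <tr><td>Apr 18, 2022</td><td>107.03</td></tr>
    <tr><td>Apr 14, 2022</td><td>104.20</td></tr>
  </tbody>
</table>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# find() returns the first matching tag; find_all() returns every match below it
body_rows = sample_soup.find("tbody").find_all("tr")

# Collect the text of each <td> cell, one list per table row
cells = [[td.text for td in row.find_all("td")] for row in body_rows]
print(cells)
```

The header row is not picked up because it sits in `<thead>`, outside the `<tbody>` that `find` selects.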

Now we will turn the HTML table into a pandas dataframe:

In [4]:
columns = ["Date", "Open", "High", "Low", "Close", "Volume", "Adj Close"]
rows = []
# First we isolate the body of the table, which contains all the information
# Then we loop through each row and find all the column values for that row
for row in soup.find("tbody").find_all("tr"):
    col = row.find_all("td")
    # Skip rows that do not hold all seven cells
    if len(col) < 7:
        continue
    rows.append({
        "Date": col[0].text,
        "Open": col[1].text,
        "High": col[2].text,
        "Low": col[3].text,
        "Close": col[4].text,
        "Adj Close": col[5].text,
        "Volume": col[6].text,
    })
# Finally we build the dataframe from the collected rows in one step,
# because DataFrame.append was removed in pandas 2.0
crude_data = pd.DataFrame(rows, columns=columns)

Now we will print the dataframe:

In [5]:
# Print the first 5 rows
crude_data.head()
Out[5]:
           Date    Open    High     Low   Close   Volume  Adj Close
0  Apr 18, 2022  107.03  109.81  106.00  107.59   49,567     107.59
1  Apr 14, 2022  104.20  107.64  102.12  106.95  312,502     106.95
2  Apr 13, 2022  100.91  104.47   99.87  104.25  312,502     104.25
3  Apr 12, 2022   95.17  101.35   94.84  100.60  329,037     100.60
4  Apr 11, 2022   98.40   98.52   92.93   94.29  315,873      94.29
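
Every value scraped this way is still a string, so the columns need converting before any calculation or plot. Here is a minimal sketch, assuming the column names and text formats shown in the output above. The two sample rows are copied from that output rather than fetched live, and `sample` is a hypothetical stand-in for `crude_data`.

```python
import pandas as pd

# Two rows copied from the output above, stored as text the way the scraper leaves them
sample = pd.DataFrame(
    {
        "Date": ["Apr 18, 2022", "Apr 14, 2022"],
        "Open": ["107.03", "104.20"],
        "High": ["109.81", "107.64"],
        "Low": ["106.00", "102.12"],
        "Close": ["107.59", "106.95"],
        "Volume": ["49,567", "312,502"],
        "Adj Close": ["107.59", "106.95"],
    }
)

# Parse dates written like "Apr 18, 2022"
sample["Date"] = pd.to_datetime(sample["Date"], format="%b %d, %Y")

# Strip thousands separators, then convert the text to numbers;
# errors="coerce" turns anything non-numeric into NaN instead of raising
numeric_cols = ["Open", "High", "Low", "Close", "Volume", "Adj Close"]
for name in numeric_cols:
    cleaned = sample[name].str.replace(",", "", regex=False)
    sample[name] = pd.to_numeric(cleaned, errors="coerce")

print(sample.dtypes)
```

After this step the dataframe can be sorted by `Date` or used in arithmetic, which the raw text columns do not support.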