Web scraping - Reading stock data from Yahoo Finance

In the simple steps below, we will read price data for crude oil futures (symbol CL=F) from Yahoo Finance

Preparing the environment:

import pandas as pd
import requests
from bs4 import BeautifulSoup

Form the URL from the crude oil symbol, request the page, and store the response text in the crude_data object:

headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'}
stockSymbol = 'CL=F'
url = 'https://finance.yahoo.com/quote/'+ stockSymbol + '/history?p=' + stockSymbol
crude_data = requests.get(url, headers=headers, timeout=5).text
#print(crude_data)
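
The request above assumes the page comes back successfully. As a minimal sketch of a more defensive variant, the same steps can be wrapped in small functions that check the HTTP status before returning the text. The names build_history_url and fetch_history_html are hypothetical helpers introduced here for illustration, and the User-Agent value is shortened; the full string from the cell above works as well.

```python
import requests

BASE_URL = 'https://finance.yahoo.com/quote/'
# Shortened User-Agent for illustration only (assumption, not the original header)
HEADERS = {'User-Agent': 'Mozilla/5.0'}


def build_history_url(symbol):
    """Return the history page URL for a symbol, built the same way as above."""
    return BASE_URL + symbol + '/history?p=' + symbol


def fetch_history_html(symbol, timeout=5):
    """Download the history page and return its HTML text (hypothetical helper)."""
    response = requests.get(build_history_url(symbol), headers=HEADERS, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return response.text
```

Calling fetch_history_html('CL=F') would then return the same text that the cell above stores in crude_data, or raise an exception if the server rejects the request.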

Now we parse the HTML text with BeautifulSoup:

soup = BeautifulSoup(crude_data, 'html5lib')
#soup
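
To see what the parsed object gives us before working on the real page, here is a small sketch on a made-up table. The HTML string and its values are invented purely for illustration, and it uses Python's built-in 'html.parser' rather than 'html5lib' so that it runs without extra packages; the navigation calls (find and find_all) are the same ones used on the Yahoo Finance page.

```python
from bs4 import BeautifulSoup

# Made-up two-row table, only to illustrate the structure (not real data)
sample_html = """
<table>
  <thead><tr><th>Date</th><th>Close</th></tr></thead>
  <tbody>
    <tr><td>Day 1</td><td>10.50</td></tr>
    <tr><td>Day 2</td><td>11.25</td></tr>
  </tbody>
</table>
"""

# 'html.parser' ships with Python; the notebook itself uses 'html5lib'
sample_soup = BeautifulSoup(sample_html, 'html.parser')

# tbody holds the data rows; each tr is a row and each td is a cell
sample_rows = [
    [cell.text for cell in tr.find_all('td')]
    for tr in sample_soup.find('tbody').find_all('tr')
]
print(sample_rows)  # [['Day 1', '10.50'], ['Day 2', '11.25']]
```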

Now we will turn the HTML table into a pandas DataFrame:

# Collect one dict per table row, then build the DataFrame in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = []
# First we isolate the body of the table, which contains all the information
# Then we loop through each row and find all the column values for each row
for row in soup.find("tbody").find_all('tr'):
    col = row.find_all("td")
    # Skip rows that do not contain all seven cells
    if len(col) < 7:
        continue
    rows.append({
        "Date": col[0].text,
        "Open": col[1].text,
        "High": col[2].text,
        "Low": col[3].text,
        "Close": col[4].text,
        "Adj Close": col[5].text,
        "Volume": col[6].text,
    })
# Finally we build the table from the collected rows, including the "Adj Close" column
crude_data = pd.DataFrame(rows, columns=["Date", "Open", "High", "Low", "Close", "Volume", "Adj Close"])

Now we will display the DataFrame:

# Display the first 5 rows
crude_data.head()

	Date	Open	High	Low	Close	Volume	Adj Close
0	Apr 18, 2022	107.03	109.81	106.00	107.59	49,567	107.59
1	Apr 14, 2022	104.20	107.64	102.12	106.95	312,502	106.95
2	Apr 13, 2022	100.91	104.47	99.87	104.25	312,502	104.25
3	Apr 12, 2022	95.17	101.35	94.84	100.60	329,037	100.60
4	Apr 11, 2022	98.40	98.52	92.93	94.29	315,873	94.29
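
Every value in this table is still a string, because it was read from the page text. A minimal sketch of converting the columns to proper types follows, using two rows copied from the output above. The name sample is introduced here for illustration, and the date format string assumes the "Apr 18, 2022" style shown in the table.

```python
import pandas as pd

# Two rows copied from the output above, stored as strings like the scraped values
sample = pd.DataFrame({
    "Date": ["Apr 18, 2022", "Apr 14, 2022"],
    "Open": ["107.03", "104.20"],
    "Volume": ["49,567", "312,502"],
})

# Parse the date strings with an explicit format (assumed from the table above)
sample["Date"] = pd.to_datetime(sample["Date"], format="%b %d, %Y")
# Prices are plain decimals, so they convert directly
sample["Open"] = pd.to_numeric(sample["Open"])
# Volume uses thousands separators, so strip the commas before converting
sample["Volume"] = pd.to_numeric(sample["Volume"].str.replace(",", "", regex=False))

print(sample.dtypes)
```

The same three conversions could be applied to the matching columns of crude_data, after which the prices can be sorted, plotted, or aggregated as numbers.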