Using pandas to read a table from an HTML page.
Today I wanted to write a bit of simple code to try out a hypothesis I had about stock prices. I found historical data at multpl.com. At first I thought I'd have to write my own code using Python's HTMLParser. As much as I like to write code, I decided to save myself some time and find something that already does this. To my surprise, pandas, already has a function that reads data from HTML tables. Great. Let's see how it works.
First, I installed pandas, lxml and a bunch of other requirements in a virtual environment.
To read the tables from the webpage, I used the following lines of code:
tables = pandas.read_html('http://www.multpl.com/s-p-500-historical-prices/table/by-month', header=0)
assert len(tables) == 1
table = tables[0]
The header
parameter, says that we should use the first row in a table as the dataframe's column names. Spefically for this table, one of the columns was getting a somewhat wrong name, but it was easy to fix it with this line of code:
table.columns = ['Date', 'Price']
There we go, we have read historical stock prices into a pandas dataframe. To iterate and print them, we can do it like this:
for (idx, row) in table.iterrows():
date = row['Date']
price = row['Price']
print date, price