Using pandas to read a table from an HTML page.

Today I wanted to write a bit of simple code to try out a hypothesis I had about stock prices. I found historical data at At first I thought I'd have to write my own code using Python's HTMLParser. As much as I like to write code, I decided to save myself some time and find something that already does this. To my surprise, pandas, already has a function that reads data from HTML tables. Great. Let's see how it works.

First, I installed pandas, lxml and a bunch of other requirements in a virtual environment.

To read the tables from the webpage, I used the following lines of code:

  tables = pandas.read_html('', header=0)
  assert len(tables) == 1
  table = tables[0]

The header parameter, says that we should use the first row in a table as the dataframe's column names. Spefically for this table, one of the columns was getting a somewhat wrong name, but it was easy to fix it with this line of code:

table.columns = ['Date', 'Price']

There we go, we have read historical stock prices into a pandas dataframe. To iterate and print them, we can do it like this:

for (idx, row) in table.iterrows():
   date = row['Date']
   price = row['Price']
   print date, price