How to use Python to extract table data from websites?
-
I want to extract the data from the tables on the BC immigration website:
https://www.welcomebc.ca/Immigrate-to-B-C/B-C-Provincial-Nominee-Program/Invitations-to-Apply
How many different methods can we use? Which one is best?
-
One way is to use the pandas package. Its read_html() function reads every HTML table on a page into a list of DataFrames, which is simple and convenient. Are there any other methods?
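from io import StringIO

import pandas as pd
import requests

# Download the page that holds the invitation tables
url = 'https://www.welcomebc.ca/Immigrate-to-B-C/B-C-Provincial-Nominee-Program/Invitations-to-Apply'
html = requests.get(url).text

# read_html() returns a list of DataFrames, one per <table> it can parse;
# wrapping the markup in StringIO avoids the literal-HTML deprecation warning in newer pandas
tables = pd.read_html(StringIO(html))
tables

# Take the second table in the list and save it to Excel
df = tables[1]
df
df.to_excel('/kaggle/working/skilled.xlsx', sheet_name='skilled', index=False)

Also, I want to extract the table names from the web content: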
Table 1: Skills Immigration and Express Entry BC
Table 2: Entrepreneur Immigration
How can I grab them and assign each one to the corresponding list element?
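One possible approach, sketched here with only the standard library's html.parser, is to walk the page once and remember the last heading or paragraph text seen before each table. This is a rough sketch, not a tested solution for the real page: it assumes each table name sits in a heading or paragraph directly before its table, which has not been checked against the actual markup. SAMPLE_HTML, its cell values, and the TableTitleParser class are made up for illustration, and nested tables are not handled.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for the real page; the cell values are placeholders.
SAMPLE_HTML = """
<h3>Table 1: Skills Immigration and Express Entry BC</h3>
<table>
  <tr><th>Date</th><th>Invitations</th></tr>
  <tr><td>date-1</td><td>n-1</td></tr>
</table>
<h3>Table 2: Entrepreneur Immigration</h3>
<table>
  <tr><th>Date</th><th>Invitations</th></tr>
  <tr><td>date-2</td><td>n-2</td></tr>
</table>
"""


class TableTitleParser(HTMLParser):
    """Collect each table's rows plus the nearest text block that precedes it."""

    # Assumption: the table name lives in one of these tags, just before the table.
    TITLE_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p"}

    def __init__(self):
        super().__init__()
        self.tables = []         # one dict per table: {"title": str, "rows": list}
        self._last_title = ""    # most recent heading/paragraph text outside a table
        self._buffer = None      # text chunks of the title or cell being read
        self._row = None         # cells of the row being read
        self._in_table = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append({"title": self._last_title, "rows": []})
            self._in_table = True
        elif tag == "tr" and self._in_table:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._buffer = []
        elif tag in self.TITLE_TAGS and not self._in_table:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "table":
            self._in_table = False
        elif tag == "tr" and self._row is not None:
            self.tables[-1]["rows"].append(self._row)
            self._row = None
        elif tag in ("td", "th") and self._row is not None and self._buffer is not None:
            self._row.append("".join(self._buffer).strip())
            self._buffer = None
        elif tag in self.TITLE_TAGS and not self._in_table and self._buffer is not None:
            text = "".join(self._buffer).strip()
            if text:
                self._last_title = text
            self._buffer = None


parser = TableTitleParser()
parser.feed(SAMPLE_HTML)
titles = [table["title"] for table in parser.tables]

for table in parser.tables:
    print(table["title"], table["rows"])
```

If the titles come out in the same order and number as the DataFrames from read_html(), something like dict(zip(titles, tables)) would pair each name with its table. That match is worth checking by hand, since read_html() may not return exactly the tables this parser sees.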