
  • How to create multiple data frames from a list?

     Justin updated 4 years ago 5 Members · 13 Posts
  • Datura

    Member
    October 26, 2020 at 12:45 am

    When I use a web crawler to extract immigration ranking data from a government website, the extracted tables are stored in a list of 20 DataFrame elements.

    !pip install requests
    !pip install bs4
    !pip install lxml
    import pandas as pd  # needed for pd.read_html below
    import requests
    from bs4 import BeautifulSoup

    url='https://www.canada.ca/en/immigration-refugees-citizenship/services/immigrate-canada/express-entry/eligibility/criteria-comprehensive-ranking-system/grid.html'
    html = requests.get(url).content
    raw = pd.read_html(html)
    type(raw) #raw is a list
    len(raw) # raw has 20 elements

    Then, we want to create 20 data frames for the 20 list elements. We can do it manually as below:

    A_Core = raw[0]
    B_Spouse=raw[1]
    C_Transfer_Education=raw[2]
    C_Transfer_Foreign=raw[3]
    C_Transfer_Certificate=raw[4]
    D_Additional=raw[5]
    Age=raw[6]

    This repetitive work is really annoying because we have so many data frames to create. Can we use a loop to create the 20 data frames automatically?

    Who can develop the code? Please share with us, thanks!

    • This discussion was modified 4 years ago by  Datura.
    • This discussion was modified 4 years ago by  Justin.
  • yiyiyi

    Member
    October 26, 2020 at 12:02 pm

    Hello, very good question. In this case, though, we don’t need to convert these 20 elements into data frames because they are already data frames. You could use isinstance(raw[X], pd.DataFrame) to check, and you will see that they are data frames already.
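A quick way to run that check over the whole list can be sketched as below. The two small frames are invented stand-ins for the scraped tables, used so the snippet runs without a network call; with the real data, `raw` would be the list returned by `pd.read_html`.

```python
import pandas as pd

# Stand-in for the list returned by pd.read_html(); the contents are invented.
raw = [
    pd.DataFrame({"Age": [18, 19], "Points": [90, 95]}),
    pd.DataFrame({"Level": ["Secondary", "Bachelor"], "Points": [28, 112]}),
]

# Check every element at once instead of one index at a time.
all_frames = all(isinstance(table, pd.DataFrame) for table in raw)
print(all_frames)
```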

    • Datura

      Member
      October 26, 2020 at 12:08 pm

      No. All these tables are totally different, with different columns and structures. I need to separate them and then use each one differently.

  • yiyiyi

    Member
    October 26, 2020 at 12:08 pm

    And to answer your loop question: if you’d like to read in these data frames using a loop, this is definitely feasible. We could simply do something like this: for i in range(len(raw)): [list of names you defined before][i] = raw[i]

    • Datura

      Member
      October 26, 2020 at 12:33 pm

      I tried this method. It ran without errors, but the data frames df1-df3 are all still empty. The loop does not overwrite the pre-defined empty ones.

      df1=pd.DataFrame()
      df2=pd.DataFrame()
      df3=pd.DataFrame()

      dflist=[df1,df2,df3]
      for i in range(len(dflist)):
          print("loop round: ", i)
          temp=raw[i]
          print(temp)
          dflist[i] = temp

      df1

      So, what’s wrong?

      • This reply was modified 4 years ago by  Datura.
      • yiyiyi

        Member
        October 26, 2020 at 8:01 pm

        Because assigning to dflist[i] only replaces that slot in the list; it does not change what the name df1 refers to. What I mean is:

        we need to bind the name itself, not overwrite a list element. Inside the loop, you could try:

        locals()["df" + str(i + 1)] = raw[i]

        This will create 20 data frames named df1 … df20, like the ones you defined.
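The difference between replacing a list slot and rebinding a name can be seen in a minimal sketch. Plain strings stand in for the data frames here (an assumption made so the snippet needs no scraping); the binding rules are the same for any object. It uses `globals()`, which at the top level of a script or notebook is the same namespace that `locals()` returns.

```python
# Plain strings stand in for DataFrames; name binding works the same way.
raw = ["table 0", "table 1", "table 2"]

df1, df2, df3 = "empty", "empty", "empty"
dflist = [df1, df2, df3]

# This replaces the objects stored in the list slots...
for i in range(len(dflist)):
    dflist[i] = raw[i]

# ...but the names df1..df3 still refer to the original objects.
unchanged = (df1, df2, df3)

# Writing into the module namespace rebinds the names themselves.
for i in range(len(raw)):
    globals()["df" + str(i + 1)] = raw[i]

rebound = (df1, df2, df3)
print(unchanged, rebound)
```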

        • This reply was modified 4 years ago by  Justin.
        • Justin

          Administrator
          October 26, 2020 at 8:53 pm

          Yes, I think this is probably the right way, let me try. Can you talk about the built-in functions locals() and globals() in Python? You can open a new discussion. Thanks!

        • Datura

          Member
          October 26, 2020 at 10:10 pm

          Wow, it works, awesome! You are a genius. Thank you! We need to use the locals()/globals() symbol tables, which are similar to the macro symbol tables in SAS.

          df1=pd.DataFrame({})
          df2=pd.DataFrame({})
          df3=pd.DataFrame({})
          df4=pd.DataFrame({})
          df5=pd.DataFrame({})
          df6=pd.DataFrame({})

          dflist=[df1,df2,df3, df4, df5, df6]

          for i in range(len(dflist)):
              print("loop round: ", i)
              locals()['df'+str(i+1)] = raw[i]

          df6
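One caveat worth keeping in mind: the `locals()` assignment works here because the code runs at the top level of a notebook, where `locals()` and `globals()` return the same namespace. Inside a function, writing to `locals()` does not create a usable variable. A small sketch of that behaviour, with strings standing in for data frames and `frame_1` as a made-up name:

```python
def assign_via_locals(tables):
    # Inside a function, writing to locals() does not create a real local variable.
    for i, table in enumerate(tables):
        locals()["frame_" + str(i + 1)] = table
    try:
        # frame_1 is compiled as a global lookup, and no such global exists.
        return frame_1
    except NameError:
        return None

# Strings stand in for DataFrames.
tables = ["table 0", "table 1"]
from_locals = assign_via_locals(tables)
print(from_locals)
```

So if this loop is ever moved into a function, the generated names will not be available there.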

          • This reply was modified 4 years ago by  Datura.
  • Patrick

    Member
    October 26, 2020 at 4:07 pm

    Content removed.

    • This reply was modified 4 years ago by  Patrick. Reason: irrelevant
    • Datura

      Member
      October 26, 2020 at 5:35 pm

      ??? Is this the method to extract table data without using the read_html() function?

  • Anonymous

    Deleted User
    October 26, 2020 at 4:49 pm

    If you need the data frames created, raw is already a list of data frames. You can refer to them as raw[0], raw[1], and so on. If you want df1, df2, … instead of the raw[1], raw[2] style, you have to do it manually.
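If the manual route is taken, tuple unpacking at least shortens it to one statement. This is a sketch with an invented three-element stand-in list; the variable names `core`, `spouse`, and `education` are only illustrative.

```python
import pandas as pd

# Stand-in list; in the thread this would come from pd.read_html().
raw = [pd.DataFrame({"x": [i]}) for i in range(3)]

# Unpack the first three tables into named variables in a single statement.
core, spouse, education = raw[:3]
print(education)
```

The number of names on the left must match the length of the slice, otherwise Python raises a ValueError.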

    • Datura

      Member
      October 26, 2020 at 5:32 pm

      No automatic method? OK, it’s too bad, really inconvenient to use raw[1], raw[2]……

      • This reply was modified 4 years ago by  Datura.
  • Justin

    Administrator
    October 26, 2020 at 11:15 pm

    Thanks to all for the contributions and discussion, very useful and helpful.

    Based on Yi’s contribution, we can go one step further. If the names of the new data frames don’t follow a common pattern, we can use the method below to create all of them.

    ### create a list of dataframe names.
    dflist=['Core', 'Spouse', 'Education', 'Work', 'Certificate', 'Additional']

    ### looping over each element
    for i in range(len(dflist)):
        print("loop round: ", i)
        globals()[dflist[i]] = raw[i]

    print(Work)

    It will create the 6 data frames: Core, Spouse, … Additional. Building on the earlier ideas, this is a more generic way to do it.
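As an alternative to writing into `globals()`, the same name list can be paired with the tables in an ordinary dictionary. This is a different technique from the one in the post, sketched under the assumption that looking tables up by key (`frames["Work"]`) is acceptable; the six one-row frames are invented stand-ins for the scraped tables.

```python
import pandas as pd

# Stand-in for the scraped list; six one-row frames with invented contents.
raw = [pd.DataFrame({"table_index": [i]}) for i in range(6)]

names = ["Core", "Spouse", "Education", "Work", "Certificate", "Additional"]

# zip() pairs each name with the table at the same position;
# it stops at the shorter sequence, so extra tables are ignored.
frames = dict(zip(names, raw))

print(frames["Work"])
```

A dictionary keeps the tables out of the global namespace and behaves the same inside functions, at the cost of typing `frames["Work"]` instead of `Work`.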
