
  • Cut out files using cygwin in UNIX

  • Datura

    Member
    December 7, 2020 at 4:56 pm

    In real work, we often need to process huge files. For example, we may have a 10GB text file with hundreds of columns. It is often impractical to open a file that size in text editors such as UltraEdit or Notepad++. However, we still need to check the data structure.

    For this purpose, we can use the UNIX “head” command.

    head -2 filename.txt > test_1234_heading.txt

    It will extract the first 2 lines (records, not columns) and write them to a new text file. If you want 100 records, just change the option to “head -100” (or, in the more portable form, “head -n 100”).
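    To try this safely before touching a 10GB file, here is a minimal sketch on a tiny sample. The file names sample.txt and sample_heading.txt and the data in them are made up for illustration; only “head” itself comes from the post.

```shell
# Build a tiny pipe-delimited sample file (hypothetical name and data).
printf '%s\n' 'id|name|city' '1|Ann|Oslo' '2|Bob|Rome' '3|Cy|Lima' > sample.txt

# Keep only the first 2 lines: the header plus one record.
head -n 2 sample.txt > sample_heading.txt

# Show what was written and how many lines it has.
cat sample_heading.txt
wc -l < sample_heading.txt
```

    Because “head” stops reading after the requested number of lines, the same command stays fast on a very large file.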

    More often, we do not need all the columns for our analysis, only a few of them. Keeping just those columns makes the file much smaller and easier to process. In this case, we can use the UNIX “cut” command to extract them.

    For example, suppose we only need the first 9 columns for our work. The command below extracts them.

    cut -d'|' -f-9 filename.txt > test_1234_cut.txt
    ### -d'|': the file is pipe-delimited, so '|' is the field delimiter.
    ### -f-9: keep fields 1 through 9, counting from the left.
    ### > test_1234_cut.txt: write the result to a new text file.
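
    As a rough sketch of the same idea on a small scale, the commands below count the fields in the header line and then cut out a range and a hand-picked selection. The file names (sample_wide.txt, sample_cut.txt, sample_pick.txt) and the data are assumptions for illustration, and the “awk” field count is an extra check that is not part of the original post.

```shell
# Build a small pipe-delimited sample with 4 columns (hypothetical data).
printf '%s\n' 'id|name|city|zip' '1|Ann|Oslo|0150' '2|Bob|Rome|00100' > sample_wide.txt

# Count the fields in the header line to see how wide the file is.
head -n 1 sample_wide.txt | awk -F'|' '{ print NF }'

# Keep fields 1 through 2 (same form as -f-9, with a smaller range).
cut -d'|' -f-2 sample_wide.txt > sample_cut.txt

# Keep a non-contiguous selection: fields 1 and 3.
cut -d'|' -f1,3 sample_wide.txt > sample_pick.txt

# Show both results.
cat sample_cut.txt sample_pick.txt
```

    Checking the field count first helps confirm that the delimiter is right before cutting the full file.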

    Isn’t that cool? It makes our daily work easier. Please read more in the references below:

    http://2min2code.com/articles/cut_command_intro

    http://2min2code.com/articles/cut_command_intro/pipe_as_delimiter

    Please feel free to ask if you have any questions.
