Cut out files using cygwin in UNIX
In real work, we often need to process huge files. For example, we may have a huge text file with hundreds of columns that is 10 GB in size. It is impossible to open it in any text editor such as UltraEdit or Notepad++, but we still need to check the data structure.
For this purpose, we can use the UNIX “head” command shown below.
head -2 filename.txt > test_1234_heading.txt
It will extract the first 2 lines (records) and export them to a new text file. If you want 100 records, just change the syntax to “head -100” (or the more portable “head -n 100”).
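If we are not sure how many columns the file actually has, we can also count them from that heading file. Below is a quick sketch, assuming the first line is the header and the file is pipe-delimited like in the “cut” example further down (awk is just one convenient way to count the fields):
head -1 filename.txt | awk -F'|' '{print NF}'
### -F'|': tells awk that the fields are separated by '|'.
### NF: awk's built-in count of fields on the line, i.e. the number of columns.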
More often than not, we do not need all the columns for our analysis, only a few of them. If we keep only those columns, the file becomes much smaller and easier to process. In this case, we can use the UNIX “cut” command to cut the columns out of the file.
For example, suppose we only need the first 9 columns for our work. The command below will cut them out for us.
cut -d'|' -f-9 filename.txt > test_1234_cut.txt
### -d: it indicates that the file is pipe-delimited. '|' is the delimiter.
### -f-9: cut out the first 9 columns, counted from the left.
### > test_1234_cut.txt: write the cut-out columns to a new text file.
Isn’t it cool? It makes our daily life easier. Please read more from the references below (and see one more quick tip after them):
http://2min2code.com/articles/cut_command_intro
http://2min2code.com/articles/cut_command_intro/pipe_as_delimiter
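By the way, the two commands also work well together. If the columns we need are not the first ones, we can list them by position, and we can pipe “head” into “cut” to preview only those columns without writing a big intermediate file. The column positions below (1, 3 and 7) are just examples; adjust them for your own file:
head -100 filename.txt | cut -d'|' -f1,3,7 > test_1234_preview.txt
### -f1,3,7: keep only columns 1, 3 and 7 (example positions, not from a real file).
### The pipe “|” here sends the first 100 records from “head” straight into “cut”.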
Please feel free to ask if you have any questions.