
How To Use Advanced Unix Tools for Working with Textual Data

2023-12-11 | By Maker.io Staff

Understanding advanced Unix tools and mechanisms such as pipes, pagination, head, and tail can make you much more efficient when working with text files and processing data on the command line. This article offers a comprehensive guide to help you become proficient with these commands.


Understanding the Basics of the Unix Pipe and Redirection Operations

Most command-line programs write their output to the operating system's standard output stream (stdout). This output usually appears in the terminal that started the program, and it may include calculation results, status messages, or even binary data such as the pixel values of an image. In addition, many command-line tools can read text from their standard input (stdin) and perform operations that manipulate the data.

The Unix pipe operator (|) feeds the standard output of one program into the standard input of the next one, without having to store the intermediate data in a file:

program_1 | program_2 | … | program_n

The vertical bar, called the pipe symbol, lets you chain multiple programs: the operating system passes the output of each program to the next one in the chain. As a more concrete example, the following pipeline lists the first three entries in the current folder, sorted by name in ascending order:

ls | sort | head -n 3

Running the pipeline prints the three matching names in the terminal:

Figure: Users can achieve impressive results by chaining commands in a Unix terminal.
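To try the ls | sort | head -n 3 pipeline on predictable input, the following sketch first creates a scratch directory containing five made-up file names (alpha.txt through echo.txt are hypothetical) and then runs the same chain against it. It assumes a POSIX shell and the mktemp utility found on Linux and macOS.

```shell
#!/bin/sh
# Build a throwaway directory so the listing contains only known names.
demo_dir=$(mktemp -d)

# Hypothetical file names, created out of alphabetical order on purpose.
for name in echo charlie alpha delta bravo; do
  touch "$demo_dir/$name.txt"
done

# ls prints one name per line when its output goes to a pipe,
# sort orders those lines, and head keeps only the first three.
first_three=$(ls "$demo_dir" | sort | head -n 3)
printf '%s\n' "$first_three"

# Clean up the scratch directory.
rm -rf "$demo_dir"
```

Running the script prints alpha.txt, bravo.txt, and charlie.txt, one per line. Listing the directory by path instead of changing into it keeps the example from affecting your current working directory.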

Further, the redirect operation (>) sends a program's standard output to a file. Note that this operator overwrites the file's contents if the file already exists, and creates the file if it does not:

command_1 > /path_to/file.txt
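The overwrite behavior is easy to see with two redirects in a row. This sketch writes to a temporary file created by mktemp, which stands in for /path_to/file.txt, and uses echo as a stand-in for command_1:

```shell
#!/bin/sh
# Temporary file standing in for /path_to/file.txt.
out_file=$(mktemp)

echo "first run" > "$out_file"    # > truncates the file, then writes
echo "second run" > "$out_file"   # truncates again, so "first run" is gone

# Only the output of the most recent redirect survives.
contents=$(cat "$out_file")
printf '%s\n' "$contents"

rm -f "$out_file"
```

Because the second > empties the file before writing, the script prints only "second run".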

Alternatively, the append operation (>>) achieves a similar effect. However, it retains the existing contents in the target file and appends the new data to the end of the file:

command_1 >> /path_to/file.txt
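Swapping the second > for >> shows the difference. As before, the temporary file and the echo commands are placeholders for a real target file and command:

```shell
#!/bin/sh
# Temporary file standing in for /path_to/file.txt.
out_file=$(mktemp)

echo "first run" > "$out_file"    # start with a single line
echo "second run" >> "$out_file"  # >> adds a line after the existing one

# Both lines are still in the file.
contents=$(cat "$out_file")
printf '%s\n' "$contents"

rm -f "$out_file"
```

This time the file contains "first run" followed by "second run".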

Finally, you can combine the pipe with the redirect or append operations to process a command's output and write the result to a file, which is especially useful when automating long-running and periodic tasks:

command_1 | command_2 >> logfile.txt
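As a sketch of such a periodic task, the loop below stands in for three scheduled runs; in practice, a scheduler such as cron might trigger each run, which is an assumption beyond this article. Every run pipes ls into wc -l to count the entries in a directory and appends the count to a log file, so earlier results are kept. The watched directory, the file names, and the use of wc are illustrative choices.

```shell
#!/bin/sh
watched_dir=$(mktemp -d)   # hypothetical directory being monitored
log_file=$(mktemp)         # stands in for logfile.txt

for run in 1 2 3; do
  touch "$watched_dir/file_$run.txt"         # simulate one new file per run
  ls "$watched_dir" | wc -l >> "$log_file"   # pipe, then append the count
done

cat "$log_file"

# Some wc versions (e.g. on macOS) pad the count with spaces; tr strips them.
log_lines=$(cat "$log_file" | tr -d ' ')

rm -rf "$watched_dir" "$log_file"
```

After the loop, the log holds the counts 1, 2, and 3 on separate lines. With > instead of >>, only the last count would remain.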
 

Inspecting the Beginning and End of Files

One of the previous examples already uses the head command to display the first three results of the sorted file list. As that example shows, head outputs only the first few lines of a file or of another command's output. By default, head prints the first ten lines. However, you can supply the optional -n parameter to state how many lines the tool should show:

head -n 5 ./document.txt

Figure: How the -n parameter affects the head command's behavior.
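The following sketch shows both the default and the -n form of head on predictable input. It assumes seq is available (it ships with GNU coreutils and macOS) and uses it to generate a temporary 20-line file that stands in for ./document.txt:

```shell
#!/bin/sh
doc=$(mktemp)            # stands in for ./document.txt
seq 1 20 > "$doc"        # 20 numbered lines: 1, 2, ..., 20

first_five=$(head -n 5 "$doc")   # -n 5: lines 1 to 5
default_head=$(head "$doc")      # no -n: the first ten lines

printf '%s\n' "$first_five"
rm -f "$doc"
```

The script prints the numbers 1 through 5, while default_head holds 1 through 10, matching head's default of ten lines.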

As demonstrated before, head also works with the pipe operator. Similarly, tail displays the last few lines (again ten by default) of a document or stream, which makes it valuable for quickly scanning information, such as the most recent events in a log file:

tail ./my-log-file.txt
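A generated file illustrates tail the same way, including its default of ten lines and the -n option it shares with head. The temporary file stands in for ./my-log-file.txt, and seq is assumed to be available:

```shell
#!/bin/sh
log=$(mktemp)            # stands in for ./my-log-file.txt
seq 1 20 > "$log"        # 20 numbered lines: 1, 2, ..., 20

last_ten=$(tail "$log")          # default: the last ten lines (11-20)
last_three=$(tail -n 3 "$log")   # -n works just like it does for head

printf '%s\n' "$last_three"
rm -f "$log"
```

The script prints 18, 19, and 20, while last_ten holds lines 11 through 20.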

However, unlike head, tail can also follow a file as it grows: with the -f option (for example, tail -f ./my-log-file.txt), it keeps running and prints new lines as they are written to the file until you stop it with Ctrl+C. Like head, tail also works with pipes, which makes it useful for viewing only the last few lines of a command's output, for example:

sudo apt update | tail -n 5
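Since head and tail both read from a pipe, they can also be chained to cut a range out of the middle of some output. In this sketch, seq generates 20 numbered lines as a stand-in for a real command's output; head keeps lines 1 to 12, and tail then keeps the last three of those:

```shell
#!/bin/sh
# seq 1 20 prints the numbers 1 to 20, one per line (stand-in output).
# head -n 12 passes on lines 1-12; tail -n 3 keeps the last three of them.
middle=$(seq 1 20 | head -n 12 | tail -n 3)
printf '%s\n' "$middle"
```

The script prints 10, 11, and 12. In general, head -n END | tail -n COUNT prints the COUNT lines that end at line END, provided the input has at least END lines.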

Using Less to Scroll Through Long Files

Sometimes, viewing only the first or last few lines of a file does not suffice, and you need to scan through the whole file. However, most terminal windows keep only a limited number of lines in their scrollback buffer, so you can use the less command instead. It displays long files as virtual pages that fit the terminal window's height, keeps track of the currently viewed page, and lets you jump back and forth between pages to read the document more conveniently:

less long-document.txt

Once the file is open, press the spacebar to jump one page ahead, press b to go back one page, and use the up and down arrow keys to scroll one line at a time. Finally, press q to exit less and return to the terminal prompt.

As with head and tail, you can also combine less with other commands using the pipe operator to make long output easier to navigate:

apt list --installed | less

Summary

Understanding the Unix pipe, redirect, and append operations is essential for working with textual data efficiently. Pipe allows sending the output of one command to another program for further processing, and you can utilize the operator to chain multiple commands. The redirect operation sends a command's output to a file, overwrites the contents of existing files, and creates new ones if necessary. In contrast, the append operation keeps the original contents and adds the new text to the end of the file.

The head and tail tools are helpful for showing only the first or last few lines of a file. When combined with the pipe operator, head and tail can drop the end or the beginning of a program's output, which is practical when you are only interested in part of that output.

Finally, less allows scrolling through long text documents more conveniently by organizing the contents as separate pages, allowing you to skip back and forth quickly.
