Searching For Files

Chapter 17

As we have explored our Linux system, one thing has become clear: a typical Linux system contains a lot of files! This raises the question, "How do we find them?" We already know that the Linux file system follows well-established conventions passed down through generations of Unix-like systems, but the sheer number of files can still make finding a particular one a real challenge.

In this chapter, we will look at two tools used to find files on a system:

locate – Find files by name
find – Search for files in a directory hierarchy

Additionally, we'll explore a command frequently employed alongside file-search commands to manage the generated list of files:

xargs – Build and execute command lines from standard input

Moreover, we'll present a couple of commands to aid us in our investigations:

touch – Change file times
stat – Display file or file system status

`locate` – Find Files The Easy Way

The locate tool swiftly searches through pathnames in a database and displays all names that match a specified substring. For instance, if we aim to locate programs starting with "zip", presuming these programs are within directories ending with bin/, we can use locate in this manner to find our files:

[me@linuxmachine ~]$ locate bin/zip

The locate tool scans its database of pathnames and displays any that include the string "bin/zip":

/usr/bin/zip
/usr/bin/zipcloak
/usr/bin/zipgrep
/usr/bin/zipinfo
/usr/bin/zipnote
/usr/bin/zipsplit

When the search criteria become more complex, you can combine locate with other tools like grep to create more intricate and engaging searches:

[me@linuxmachine ~]$ locate zip | grep bin
/bin/bunzip2
/bin/bzip2
/bin/bzip2recover
/bin/gunzip
/bin/gzip
/usr/bin/funzip
/usr/bin/gpg-zip
/usr/bin/preunzip
/usr/bin/prezip
/usr/bin/prezip-bin
/usr/bin/unzip
/usr/bin/unzipsfx
/usr/bin/zip
/usr/bin/zipcloak
/usr/bin/zipgrep
/usr/bin/zipinfo
/usr/bin/zipnote
/usr/bin/zipsplit

The locate program has been around for many years, and several variants are in common use. The two most often found in modern Linux distributions are slocate and mlocate, though they are usually accessed through a symbolic link named locate. The different versions have overlapping option sets, and some include regular expression matching (covered in a later chapter) and wildcard support. Check the locate manual page to determine which version is installed and what it can do.

Where Does The locate Database Come From?

You may notice that, on some distributions, locate fails to work just after the system is installed but works fine if you try again the following day. This is because the locate database is created by another program named updatedb. Usually, updatedb is run periodically as a cron job, that is, a task performed at regular intervals by the cron daemon. Most systems equipped with locate run updatedb once a day. Since the database is not updated continuously, very recent files do not show up in locate searches. To overcome this, you can update the database manually by becoming the superuser and running updatedb at the prompt.

`find` – Find Files The Hard Way

Although the locate program locates a file solely by its name, the find program explores a specified directory (along with its subdirectories) for files using various attributes. We'll focus extensively on find because of its numerous compelling features that will reappear frequently when we delve into programming concepts in subsequent chapters.

At its most basic, find is provided with one or more directory names to search. For instance, to generate a list of files in our home directory:

[me@linuxmachine ~]$ find ~

For most active user accounts, this command will generate an extensive list. As the list is directed to standard output, we can channel it into other programs. Let's employ wc to tally the number of files:

[me@linuxmachine ~]$ find ~ | wc -l
47068

Impressive work! The brilliance of find lies in its ability to pinpoint files that meet precise criteria. It achieves this through the (somewhat peculiar) utilization of options, tests, and actions. Let's begin by exploring the tests.

Tests

Suppose we aim to gather a list of directories from our search. To achieve this, we can include the following test:

[me@linuxmachine ~]$ find ~ -type d | wc -l
1695

By incorporating the test -type d, we narrowed down the search to directories. Conversely, we could have restricted the search to regular files using this test:

[me@linuxmachine ~]$ find ~ -type f | wc -l
38737
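The same two tests can be tried on a small tree where the expected counts are easy to check by hand. This is only a sketch: the sandbox directory and the file names in it are made up for illustration.

```shell
# Build a small throwaway tree: 2 subdirectories and 3 regular files.
sandbox=$(mktemp -d)
mkdir "$sandbox/docs" "$sandbox/src"
touch "$sandbox/docs/a.txt" "$sandbox/docs/b.txt" "$sandbox/src/main.c"

# -type d matches directories; the starting directory itself is included.
dir_count=$(find "$sandbox" -type d | wc -l)

# -type f matches regular files only.
file_count=$(find "$sandbox" -type f | wc -l)

echo "directories: $dir_count, regular files: $file_count"
rm -rf "$sandbox"
```

Note that the directory count is one higher than the number of subdirectories we created, because find also tests the starting directory.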

Here are the common file type tests supported by find:

| File Type | Description |
| --- | --- |
| b | Block special device file |
| c | Character special device file |
| d | Directory |
| f | Regular file |

Operators

Despite the array of tests offered by find, there might still be a need for a more precise way to define the logical connections between these tests. For instance, consider the scenario where we aim to ascertain whether all files and subdirectories in a directory possess secure permissions. In this case, we'd search for files with permissions other than 0600 and directories with permissions other than 0700. Luckily, find offers a solution by enabling the combination of tests through logical operators, allowing for the creation of more intricate logical relationships. To execute the aforementioned test, we could employ this method:

[me@linuxmachine ~]$ find ~ \( -type f -not -perm 0600 \) -or \( -type d -not -perm 0700 \)

Whoa, that might seem a bit puzzling at first glance. But once you familiarize yourself with these operators, they're not as complex as they appear. Here's the rundown:

| Operator | Description |
| --- | --- |
| -and | Match if the tests on both sides of the operator are true. It can be abbreviated to -a. Note that when no operator is present, -and is implied by default. |
| -or | Match if a test on either side of the operator is true. It can be abbreviated to -o. |
| -not | Match if the test following the operator is false. It can be abbreviated with an exclamation point (!). |
| ( ) | Groups tests and operators together to form larger expressions and to control the precedence of the logical evaluations. By default, find evaluates from left to right. It is often necessary to override this default order to obtain the desired result. Even when not strictly needed, the grouping characters can make a command easier to read. Because parentheses have special meaning to the shell, they must be quoted on the command line so that they are passed as arguments to find. Usually, the backslash character is used to escape them. |

Armed with this list of operators, let's break down our find command. At the highest level, our tests are organized into two groups, divided by an -or operator:

( expression 1 ) -or ( expression 2 )

This structure makes sense because we're seeking files with a specific set of permissions and directories with a different set. If we're looking for both files and directories, why use -or instead of -and? Well, as find traverses through files and directories, each one is assessed to check if it matches the specified tests. We want to determine if it's either a file with inadequate permissions or a directory with inadequate permissions—it can't simultaneously be both. So, if we expand the grouped expressions, it looks like this:

( file with inadequate perms ) -or ( directory with inadequate perms )

Now, how do we define "bad permissions"? In reality, we test for "not good permissions" since we're aware of what qualifies as "good permissions." For files, we define good permissions as 0600 and for directories as 0700. The expression that checks files for "not good" permissions is:

-type f -and -not -perm 0600

And for directories:

-type d -and -not -perm 0700

As mentioned in the operator table above, the -and operator is implied by default and can be safely omitted. So, when we put everything together, we arrive at our final command:

find ~ ( -type f -not -perm 0600 ) -or ( -type d -not -perm 0700 )

However, since parentheses hold special meaning for the shell, we must escape them to prevent misinterpretation. Preceding each parenthesis with a backslash character accomplishes this task.
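To see the escaped form at work, the sketch below runs the test against a throwaway tree in which exactly one directory and one file have "bad" permissions. The sandbox and the names private, shared, secret, and notes are hypothetical.

```shell
sandbox=$(mktemp -d)
mkdir "$sandbox/private" "$sandbox/shared"
touch "$sandbox/private/secret" "$sandbox/shared/notes"

# Give one directory and one file the "good" permissions, the others not.
chmod 0700 "$sandbox" "$sandbox/private"
chmod 0600 "$sandbox/private/secret"
chmod 0755 "$sandbox/shared"
chmod 0644 "$sandbox/shared/notes"

# Escaped parentheses reach find as literal ( and ) arguments.
bad=$(find "$sandbox" \( -type f -not -perm 0600 \) -or \( -type d -not -perm 0700 \) | sort)

# Only the shared directory and its notes file should be reported.
echo "$bad"
rm -rf "$sandbox"
```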

Understanding logical operators also involves grasping how two expressions separated by a logical operator function:

expr1 -operator expr2

In all cases, expr1 is performed. The operator, however, determines whether expr2 is performed. Here's how it works:

| Result of expr1 | Operator | expr2 is... |
| --- | --- | --- |
| True | -and | Always performed |
| False | -and | Never performed |
| True | -or | Never performed |
| False | -or | Always performed |

Why does this happen? It is done to improve performance. Take -and, for example. The expression expr1 -and expr2 cannot be true if expr1 is false, so there is no point in performing expr2. Likewise, in the expression expr1 -or expr2, if expr1 is true, there is no need to perform expr2, because we already know that expr1 -or expr2 is true.

So, it's a speed optimization. But why is this significant? Because we can leverage this behavior to regulate how actions are executed, as we'll soon discover.

Predefined Actions

Time to roll up our sleeves! While having a list of results from our find command is handy, the real goal is to take action on these items. Luckily, find enables actions to be executed based on the search outcomes. There's a collection of preset actions and various methods to implement user-defined actions. Let's begin by exploring a few of the predefined actions:

| Action | Description |
| --- | --- |
| -delete | Delete the currently matching file. |
| -ls | Perform the equivalent of ls -dils on the matching file. Output is sent to standard output. |
| -print | Output the full pathname of the matching file to standard output. This is the default action if no other action is specified. |
| -quit | Quit once a match has been made. |

Just like the tests, there are numerous other actions available. Refer to the find manual page for complete details.

In our very first example, we did this:

find ~

which generated a list encompassing every file and subdirectory within our home directory. It generated a list because the -print action is implied if no other action is specified. Hence, our command could also be formulated as:

find ~ -print

find offers the capability to remove files based on specific criteria. For instance, to delete files with the file extension .BAK (commonly used for backup files), we can employ this command:

find ~ -type f -name '*.BAK' -delete

In this instance, the command searches through every file in the user's home directory and its subdirectories for filenames ending in .BAK. Upon discovery, these files are removed.

Warning

It's crucial to exercise extreme caution when employing the -delete action. Always conduct a test by substituting the -delete action with -print to verify the search results before proceeding.
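A cautious workflow along these lines might look like the following sketch. It works in a throwaway directory, and the file names are made up for illustration.

```shell
sandbox=$(mktemp -d)
touch "$sandbox/report.txt" "$sandbox/report.BAK" "$sandbox/old.BAK"

# Dry run: the same tests, but with -print in place of -delete.
preview=$(find "$sandbox" -type f -name '*.BAK' -print | sort)
echo "$preview"

# The preview lists only the two backup files, so swap -print for -delete.
find "$sandbox" -type f -name '*.BAK' -delete

# Count what survived; only report.txt should remain.
remaining=$(find "$sandbox" -type f | wc -l)
echo "files left: $remaining"
rm -rf "$sandbox"
```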

Before proceeding further, let's revisit how logical operators influence actions. Take a glance at the following command:

find ~ -type f -name '*.BAK' -print

As we have seen, this command looks for every regular file (-type f) whose name ends with .BAK (-name '*.BAK') and prints the pathname of each matching file to standard output (-print). However, the reason the command behaves this way lies in the logical relationships between the tests and the action. Remember that, by default, there is an implied -and relationship between each test and action. To make the logical relationships easier to see, we could also express the command this way:

find ~ -type f -and -name '*.BAK' -and -print

Now that our command is fully articulated, let’s explore how the logical operators impact its execution:

| Test/Action | Is Performed Only If... |
| --- | --- |
| -print | -type f and -name '*.BAK' are true |
| -name '*.BAK' | -type f is true |
| -type f | Is always performed, since it is the first test/action in an -and relationship. |

Given that the logical relationship between tests and actions dictates their execution, the order of these tests and actions holds significance. For instance, if we were to rearrange the order so that the -print action precedes the tests, the command's behavior would notably change:

find ~ -print -and -type f -and -name '*.BAK'

This modified command will print every file (as the -print action always evaluates to true) and then proceed to test for file type and the specified file extension.
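The difference is easy to measure on a small tree. In this sketch (the sandbox and file names are hypothetical), the same tests and action are given in both orders and the number of printed lines is counted.

```shell
sandbox=$(mktemp -d)
touch "$sandbox/keep.txt" "$sandbox/old.BAK"

# Tests first: -print is reached only for regular files named *.BAK.
tests_first=$(find "$sandbox" -type f -name '*.BAK' -print | wc -l)

# Action first: -print runs for every item, including the starting
# directory, and the tests that follow no longer affect the output.
print_first=$(find "$sandbox" -print -type f -name '*.BAK' | wc -l)

echo "tests first: $tests_first, -print first: $print_first"
rm -rf "$sandbox"
```

With tests first, only old.BAK is printed. With -print first, all three items (the directory and both files) are printed.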

User-Defined Actions

Apart from the preset actions, we have the option to execute custom commands using the -exec action. Its usage is as follows:

-exec command {} ;

In this structure, command represents the command's name, {} symbolically represents the current pathname, and the semicolon is a necessary delimiter denoting the command's end.

Here's an example demonstrating how -exec can replicate the functionality of the earlier discussed -delete action:

-exec rm '{}' ';'

Once more, as the brace and semicolon characters hold significance for the shell, they need to be quoted or escaped.

It is also possible to execute a user-defined action interactively. By using the -ok action in place of -exec, the user is prompted before each specified command is executed:

find ~ -type f -name 'foo*' -ok ls -l '{}' ';'
< ls ... /home/me/bin/foo > ? y
-rwxr-xr-x 1 me me 224 2007-10-29 18:44 /home/me/bin/foo
< ls ... /home/me/foo.txt > ? y
-rw-r--r-- 1 me me 0 2008-09-19 12:53 /home/me/foo.txt

In this instance, we seek files beginning with the string "foo" and trigger the command ls -l each time a match is located. Employing the -ok action prompts the user before the execution of the ls command.

Improving Efficiency

When the -exec action is employed, it initiates a new instance of the specified command for each matching file found. However, there are instances where consolidating all search results into a single instance of the command is preferable. For instance, instead of executing commands individually like this:

ls -l file1
ls -l file2

we might opt for this approach:

ls -l file1 file2

ensuring the command is executed only once, not multiple times. There are two methods to achieve this: the traditional method utilizing the external command xargs and an alternative method using a new feature within find itself. Let's delve into the alternative method first.

By substituting the trailing semicolon character with a plus sign, we activate find's capability to merge the search results into an argument list, executing the desired command just once. Returning to our example:

find ~ -type f -name 'foo*' -exec ls -l '{}' ';'
-rwxr-xr-x 1 me me 224 2007-10-29 18:44 /home/me/bin/foo
-rw-r--r-- 1 me me 0 2008-09-19 12:53 /home/me/foo.txt

will execute ls each time a matching file is found. By changing the command to:

find ~ -type f -name 'foo*' -exec ls -l '{}' +
-rwxr-xr-x 1 me me 224 2007-10-29 18:44 /home/me/bin/foo
-rw-r--r-- 1 me me 0 2008-09-19 12:53 /home/me/foo.txt

we get the same results, but the system only has to execute the ls command once.
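One way to observe the difference is to count how often the command is started. The sketch below is an illustration only: it replaces ls with a tiny sh -c script (held in the hypothetical variable report) that prints one line per invocation, and it runs against a made-up sandbox with three matching files.

```shell
sandbox=$(mktemp -d)
touch "$sandbox/foo1" "$sandbox/foo2" "$sandbox/foo3"

# Prints one line each time it is started; $# is the number of pathnames
# it received as arguments.
report='echo "invoked with $# pathname(s)"'

# With ';' the script is started once per matching file.
per_file=$(find "$sandbox" -type f -name 'foo*' -exec sh -c "$report" sh '{}' ';' | wc -l)

# With '+' the matches are collected into one argument list.
batched=$(find "$sandbox" -type f -name 'foo*' -exec sh -c "$report" sh '{}' + | wc -l)

echo "runs with ';': $per_file, runs with '+': $batched"
rm -rf "$sandbox"
```

The extra sh after the script fills in $0, so that the pathnames supplied by find land in the positional parameters.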

`xargs`

The xargs command serves an intriguing purpose. It takes input from standard input and transforms it into an argument list for a designated command. In our scenario, its usage would resemble this:

find ~ -type f -name 'foo*' -print | xargs ls -l
-rwxr-xr-x 1 me me 224 2007-10-29 18:44 /home/me/bin/foo
-rw-r--r-- 1 me me 0 2008-09-19 12:53 /home/me/foo.txt

Here, we witness the find command's output being directed into xargs, which subsequently assembles an argument list for the ls command before executing it.

Note

Although the number of arguments that can be placed on a command line is quite large, it is not unlimited. It is possible to create commands that are too long for the shell to accept. When a command line exceeds the maximum length supported by the system, xargs executes the specified command with the maximum number of arguments possible and then repeats the process until standard input is exhausted. To see the maximum size of the command line, run xargs with the --show-limits option.
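The batching behavior can be imitated on a small scale. This sketch uses the xargs -n option, which is not otherwise covered in this chapter, to impose an artificially low limit of two arguments per command line. The real system limit is far larger, but the repetition works the same way.

```shell
# Five input items, at most two arguments per command line:
# xargs has to run echo three times to use up its input.
batches=$(printf '%s\n' a b c d e | xargs -n 2 echo)
echo "$batches"
```

Each line of output comes from a separate run of echo.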

Dealing With Funny Filenames

Unix-like systems permit spaces (and even newlines!) within filenames. However, this can pose challenges for programs like xargs, which compile argument lists for other programs. An embedded space is construed as a delimiter, causing the resulting command to interpret each space-separated word as an individual argument. To tackle this issue, find and xargs offer the optional utilization of a null character as an argument separator. In ASCII, a null character is represented by the number zero (in contrast to, for instance, the space character, represented by the number 32 in ASCII). The find command offers the -print0 action, generating null-separated output. Simultaneously, the xargs command includes the --null option, accommodating null-separated input. Consider this example:

find ~ -iname '*.jpg' -print0 | xargs --null ls -l

Employing this technique ensures correct handling of all files, including those with embedded spaces in their names.
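To make the problem visible, the sketch below passes one filename containing a space through both kinds of pipeline and counts how many arguments the final command receives. The sandbox and the name "my photo.jpg" are made up, the sketch assumes the temporary directory path itself contains no spaces, and -n 1 is used only to put each argument on its own line.

```shell
sandbox=$(mktemp -d)
touch "$sandbox/my photo.jpg"

# Newline-separated output: xargs splits the single name at the space.
split_count=$(find "$sandbox" -iname '*.jpg' -print | xargs -n 1 echo | wc -l)

# Null-separated output: the name arrives as one intact argument.
null_count=$(find "$sandbox" -iname '*.jpg' -print0 | xargs --null -n 1 echo | wc -l)

echo "arguments without -print0: $split_count, with -print0: $null_count"
rm -rf "$sandbox"
```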

A Return To The Playground

Let's apply find to a (nearly) practical scenario. We'll set up an environment to experiment with what we've learned.

To start, we'll construct a playground abundant with numerous subdirectories and files:

[me@linuxmachine ~]$ mkdir -p playground/dir-{001..100}
[me@linuxmachine ~]$ touch playground/dir-{001..100}/file-{A..Z}

Marvel at the power of the command line! With these two lines, we created a playground directory containing 100 subdirectories, each holding 26 empty files. Try that with a GUI!

The method we used combines a familiar command (mkdir), an exotic shell expansion (braces), and a new command, touch. By combining mkdir with the -p option (which causes mkdir to create the parent directories of the specified paths) and brace expansion, we were able to create 100 subdirectories.

The touch command is usually used to set or update the access and modification times of files (the change time is updated as a side effect). However, if a filename argument names a file that does not exist, touch creates an empty file.

In our playground, we created 100 instances of a file named file-A. Let's find them:

[me@linuxmachine ~]$ find playground -type f -name 'file-A'

Unlike ls, find does not yield results in a sorted order. Its sequence is dictated by the storage device's layout. To validate the existence of all 100 instances of the file, we can execute the following:

[me@linuxmachine ~]$ find playground -type f -name 'file-A' | wc -l
100

Now, let's explore locating files according to their modification times. This proves useful when establishing backups or organizing files chronologically. Initially, we'll generate a reference file for comparing modification times:

[me@linuxmachine ~]$ touch playground/timestamp

This command creates an empty file named timestamp and sets its modification time to the current time. We can verify this with another handy command, stat, which is a kind of souped-up version of ls. The stat command reveals everything the system knows about a file and its attributes:

[me@linuxmachine ~]$ stat playground/timestamp
 File: `playground/timestamp'
 Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: 803h/2051d Inode: 14265061 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1001/ me) Gid: ( 1001/ me)
Access: 2008-10-08 15:15:39.000000000 -0400
Modify: 2008-10-08 15:15:39.000000000 -0400
Change: 2008-10-08 15:15:39.000000000 -0400

When we touch the file once more and subsequently inspect it using stat, we'll observe that the file's times have been revised:

[me@linuxmachine ~]$ touch playground/timestamp
[me@linuxmachine ~]$ stat playground/timestamp
 File: `playground/timestamp'
 Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: 803h/2051d Inode: 14265061 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1001/ me) Gid: ( 1001/ me)
Access: 2008-10-08 15:23:33.000000000 -0400
Modify: 2008-10-08 15:23:33.000000000 -0400
Change: 2008-10-08 15:23:33.000000000 -0400

Next, let’s use find to update some of our playground files:

[me@linuxmachine ~]$ find playground -type f -name 'file-B' -exec touch '{}' ';'

This command updates all files in the playground labeled as file-B. Following this, we'll utilize find to pinpoint the altered files by comparing them against the reference file timestamp:

[me@linuxmachine ~]$ find playground -type f -newer playground/timestamp

The outcomes encompass all 100 occurrences of file-B. As we executed a touch on all files within the playground named file-B subsequent to updating timestamp, they are now considered "newer" than timestamp and hence identifiable using the -newer test.
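The -newer comparison can also be checked in a way that does not depend on how quickly the commands are typed. This sketch uses touch -t, which sets an explicit time rather than the current one, in a made-up sandbox. The files untouched and updated are hypothetical.

```shell
sandbox=$(mktemp -d)

# touch -t sets an explicit time (here 2020-01-01 00:00) instead of "now".
touch -t 202001010000 "$sandbox/timestamp" "$sandbox/untouched"

# This file gets the current time, so it is newer than the reference.
touch "$sandbox/updated"

# A file with the same modification time as the reference is not "newer".
newer=$(find "$sandbox" -type f -newer "$sandbox/timestamp")
echo "$newer"
rm -rf "$sandbox"
```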

Now, let's revisit the earlier assessment for bad permissions and apply it to the playground:

[me@linuxmachine ~]$ find playground \( -type f -not -perm 0600 \) -or \( -type d -not -perm 0700 \)

This command lists all 100 directories and 2,600 files in the playground (as well as timestamp and playground itself, for a total of 2,702 items), because none of them meets our definition of "good permissions." With our knowledge of operators and actions, we can add actions to this command to apply new permissions to the files and directories in our playground:

[me@linuxmachine ~]$ find playground \
    \( -type f -not -perm 0600 -exec chmod 0600 '{}' ';' \) -or \
    \( -type d -not -perm 0700 -exec chmod 0700 '{}' ';' \)

In everyday practice, it might be more convenient to execute two commands—one for the directories and another for the files—rather than employing this comprehensive compound command. However, it's beneficial to acknowledge that this method is possible. The key takeaway here is comprehending how operators and actions harmonize to execute practical tasks.
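A sketch of the two-command alternative, run against a small stand-in for the playground (the sandbox and its contents are made up), might look like this. It uses the + form of -exec so that chmod is started as few times as possible.

```shell
sandbox=$(mktemp -d)
mkdir "$sandbox/dir-001" "$sandbox/dir-002"
touch "$sandbox/dir-001/file-A" "$sandbox/dir-002/file-A"
chmod 0755 "$sandbox/dir-001" "$sandbox/dir-002"
chmod 0644 "$sandbox/dir-001/file-A" "$sandbox/dir-002/file-A"

# One pass for regular files, one pass for directories.
find "$sandbox" -type f -not -perm 0600 -exec chmod 0600 '{}' +
find "$sandbox" -type d -not -perm 0700 -exec chmod 0700 '{}' +

# Re-run the original check; nothing should be left to report.
leftover=$(find "$sandbox" \( -type f -not -perm 0600 \) -or \( -type d -not -perm 0700 \) | wc -l)
echo "items with bad permissions: $leftover"
rm -rf "$sandbox"
```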

Options

Lastly, there are the options. These options serve to manage the scope of a find search. They can be incorporated alongside other tests and actions when crafting find expressions. Below is a compilation of the frequently utilized ones:

| Option | Description |
| --- | --- |
| -depth | Direct find to process a directory's files before the directory itself. This option is automatically applied when the -delete action is specified. |
| -maxdepth levels | Set the maximum number of levels that find will descend into a directory tree when performing tests and actions. |
| -mindepth levels | Set the minimum number of levels that find will descend into a directory tree before applying tests and actions. |
| -mount | Direct find not to traverse directories that are mounted on other file systems. |
| -noleaf | Direct find not to optimize its search based on the assumption that it is searching a Unix-like file system. This is needed when scanning DOS/Windows file systems and CD-ROMs. |
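The two depth options are easiest to understand on a tree with one file at each level. In this sketch (the sandbox and the names top, mid, and deep are hypothetical), the starting directory counts as depth 0 and its immediate contents as depth 1.

```shell
sandbox=$(mktemp -d)
mkdir -p "$sandbox/a/b"
touch "$sandbox/top" "$sandbox/a/mid" "$sandbox/a/b/deep"

# -maxdepth 1: look no further than the starting directory's own contents,
# so only "top" is found.
shallow=$(find "$sandbox" -maxdepth 1 -type f | wc -l)

# -mindepth 2: skip depths 0 and 1, so "mid" and "deep" are found.
deeper=$(find "$sandbox" -mindepth 2 -type f | wc -l)

echo "maxdepth 1: $shallow, mindepth 2: $deeper"
rm -rf "$sandbox"
```

The options are placed before the tests here, since some versions of find print a warning when options follow tests.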

Summary

It's evident that locate offers simplicity compared to the complexity of find. Each has its own advantages. Take the opportunity to delve into the diverse capabilities of find. Regular use can significantly enhance your comprehension of Linux file system operations.
