Mastering Wildcards & Globbing in Osquery: Unleashing Powerful Query Capabilities

Blog Author
Milan Shah

Filepath globbing (filename patterns with wildcards) in osquery has regularly been a source of confusion, frustration, and lost time. You can certainly explore the wildcard system on your own, but we hope the notes below shed light on how globbing in osquery actually works and save you some grief.

When we say you can use wildcards and globbing in osquery, we mean that osquery calls the system function glob() to expand a filename/path that contains wildcards. Unfortunately, this leads to the expectation that filenames/paths work the same as they would in a Linux shell, since both rely on the same system function. The reality is distant from this expectation (for good reasons), which leads to some head-scratching situations.

 

Here are the general actions (simplified to remove special cases) that osquery takes with filenames/paths:

  1. Unilaterally replace all % characters in the incoming string with *
  2. If it’s a relative path, prefix the path with osquery’s current directory
  3. Canonicalize the file path (i.e., remove and resolve all ./ and ../)
  4. Pass the string to glob(), and let it expand the filename/path to a set of fully qualified file names/paths
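
The four steps above can be sketched in Python. This is only an illustrative model, not osquery's code: the names normalize_pattern and expand are hypothetical, POSIX-style paths are assumed, and Python's glob module stands in for the system glob() function.

```python
import glob
import os
import posixpath


def normalize_pattern(pattern, cwd=None):
    """Model steps 1-3: return the string that would be handed to glob()."""
    # Step 1: every % becomes *, unconditionally (no escaping is possible).
    pattern = pattern.replace("%", "*")
    # Step 2: a relative path is prefixed with the current directory.
    if not posixpath.isabs(pattern):
        pattern = posixpath.join(cwd or os.getcwd(), pattern)
    # Step 3: resolve ./ and ../ components. normpath() also drops a
    # trailing slash, so it is restored here; the logs later in the post
    # show the trailing slash surviving until the glob() call.
    keep_slash = pattern.endswith("/") and pattern != "/"
    normalized = posixpath.normpath(pattern)
    return normalized + "/" if keep_slash else normalized


def expand(pattern, cwd=None):
    """Model step 4: expand the normalized pattern into concrete paths."""
    return sorted(glob.glob(normalize_pattern(pattern, cwd)))


print(normalize_pattern("/tmp/2files2deep/./2deep/../2deep/%"))
print(normalize_pattern("logs/%.txt", cwd="/var"))
```

Keeping the normalization separate from the expansion makes it easy to check what string reaches the glob step without touching the filesystem.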

So far, this looks like it should produce the same results as in a shell.

 

What’s Not to LIKE?

The first important nuance stems from using file globs as values in SQL LIKE or equals (=) comparisons. This effectively adds a layer of comparison that has no counterpart in most Linux shell commands. It is why you must use the % wildcard instead of *, even though internally the % is simply replaced with * before being passed to glob().

 

So,

select filename,path from file where path like '/tmp/2files2deep/2deep/%';

works as follows: (Note: The W0514 output was generated from a special build of osquery to illustrate what was internally being passed to the glob() function, and what was being returned.)

 

osquery> select filename,path from file where path like '/tmp/2files2deep/2deep/%';
W0514 18:31:35.289675 10578 filesystem.cpp:258] Globbing /tmp/2files2deep/2deep/*
W0514 18:31:35.289749 10578 file.cpp:42] Generating file info for: "/tmp/2files2deep/2deep/file1"
W0514 18:31:35.289762 10578 file.cpp:58] Adding row for file1
W0514 18:31:35.289803 10578 file.cpp:42] Generating file info for: "/tmp/2files2deep/2deep/file2"
W0514 18:31:35.289813 10578 file.cpp:58] Adding row for file2

+----------+------------------------------+
| filename | path                         |
+----------+------------------------------+
| file1    | /tmp/2files2deep/2deep/file1 |
| file2    | /tmp/2files2deep/2deep/file2 |
+----------+------------------------------+

 

But,

select filename,path from file where path like '/tmp/2files2deep/2deep/*';

works like this:

osquery> select filename,path from file where path like '/tmp/2files2deep/2deep/*';
W0514 18:34:01.920370 10578 filesystem.cpp:258] Globbing /tmp/2files2deep/2deep/*
W0514 18:34:01.920431 10578 file.cpp:42] Generating file info for: "/tmp/2files2deep/2deep/file1"
W0514 18:34:01.920444 10578 file.cpp:58] Adding row for file1
W0514 18:34:01.920485 10578 file.cpp:42] Generating file info for: "/tmp/2files2deep/2deep/file2"
W0514 18:34:01.920495 10578 file.cpp:58] Adding row for file2

 

The latter returns no results, even though internally glob() generated the same file names. The reason is that SQLite (the SQL engine used by osquery) does not interpret * as a wildcard in a LIKE comparison. It treats it as a literal character, so none of the generated paths match the pattern and the rows are filtered out before anything is returned.

So, it is important to use % as the wildcard.
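
The LIKE behavior can be checked in isolation with Python's built-in sqlite3 module. This is a minimal sketch of the comparison step only; the helper name like is hypothetical, and nothing here runs osquery itself.

```python
import sqlite3


def like(value, pattern):
    """Return True if SQLite evaluates `value LIKE pattern` as true."""
    conn = sqlite3.connect(":memory:")
    try:
        row = conn.execute("SELECT ? LIKE ?", (value, pattern)).fetchone()
        return bool(row[0])
    finally:
        conn.close()


# The two paths that glob() produced in the transcripts above.
paths = ["/tmp/2files2deep/2deep/file1", "/tmp/2files2deep/2deep/file2"]

with_percent = [p for p in paths if like(p, "/tmp/2files2deep/2deep/%")]
with_star = [p for p in paths if like(p, "/tmp/2files2deep/2deep/*")]

print(with_percent)  # both paths: % is a LIKE wildcard
print(with_star)     # []: * is just a literal character to LIKE
```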

 

ESCAPE From Victory?

Since osquery replaces all % characters in input filenames/paths without any constraint, it is not possible to escape the % character. Filenames that contain a literal % will not work as expected, and there is no generally reliable workaround.

 

Osquery also does not accept the _ character as a single-character wildcard, so there is no way to match exactly one character.

 

Passing the “.” or “?” character directly, or doing more advanced globbing with character ranges inside [] or patterns inside {}, will all work internally when passed to glob(), but the resulting rows are then filtered out by SQLite's pattern matching.

osquery> select filename,path from file where path like '/tmp/2files2deep/2deep/file?';

W0514 18:41:52.184568 10578 filesystem.cpp:258] Globbing /tmp/2files2deep/2deep/file?
W0514 18:41:52.184628 10578 file.cpp:42] Generating file info for: "/tmp/2files2deep/2deep/file1"
W0514 18:41:52.184639 10578 file.cpp:58] Adding row for file1
W0514 18:41:52.184680 10578 file.cpp:42] Generating file info for: "/tmp/2files2deep/2deep/file2"
W0514 18:41:52.184690 10578 file.cpp:58] Adding row for file2

osquery> select filename,path from file where path like '/tmp/2files2deep/2deep/file[1,2]';
W0514 18:42:23.973093 10578 filesystem.cpp:258] Globbing /tmp/2files2deep/2deep/file[1,2]
W0514 18:42:23.973166 10578 file.cpp:42] Generating file info for: "/tmp/2files2deep/2deep/file1"
W0514 18:42:23.973179 10578 file.cpp:58] Adding row for file1
W0514 18:42:23.973209 10578 file.cpp:42] Generating file info for: "/tmp/2files2deep/2deep/file2"
W0514 18:42:23.973230 10578 file.cpp:58] Adding row for file2

osquery> select filename,path from file where path like '/tmp/2files2deep/2deep/file{1,2}';
W0514 18:44:08.148453 10578 filesystem.cpp:258] Globbing /tmp/2files2deep/2deep/file{1,2}
W0514 18:44:08.148495 10578 file.cpp:42] Generating file info for: "/tmp/2files2deep/2deep/file1"
W0514 18:44:08.148507 10578 file.cpp:58] Adding row for file1
W0514 18:44:08.148546 10578 file.cpp:42] Generating file info for: "/tmp/2files2deep/2deep/file2"
W0514 18:44:08.148556 10578 file.cpp:58] Adding row for file2

So, the first principle of wildcards and globbing in osquery is to account for the effect of SQLite's pattern matching on the final results. Effectively, this means that osquery's globbing is limited to the use of % as a “match any string” wildcard. The fancier stuff won't work.
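
The two-stage behavior can be modeled as "expand, then filter". In the sketch below, fnmatch stands in for glob() expansion over a fixed list of paths, and SQLite's LIKE is the second stage. The function names are hypothetical, and fnmatch is only an approximation: it has no {} support and lets * match across /, so the model is limited to the ? and [] cases.

```python
import fnmatch
import sqlite3

FILES = ["/tmp/2files2deep/2deep/file1", "/tmp/2files2deep/2deep/file2"]


def like(value, pattern):
    """Return True if SQLite evaluates `value LIKE pattern` as true."""
    conn = sqlite3.connect(":memory:")
    try:
        return bool(conn.execute("SELECT ? LIKE ?", (value, pattern)).fetchone()[0])
    finally:
        conn.close()


def simulate_query(pattern, existing_paths):
    """Return (paths the glob stage yields, paths that survive LIKE)."""
    glob_pattern = pattern.replace("%", "*")  # what the glob stage sees
    expanded = [p for p in existing_paths
                if fnmatch.fnmatchcase(p, glob_pattern)]
    returned = [p for p in expanded if like(p, pattern)]
    return expanded, returned


for pat in ("/tmp/2files2deep/2deep/%",
            "/tmp/2files2deep/2deep/file?",
            "/tmp/2files2deep/2deep/file[1,2]"):
    expanded, returned = simulate_query(pat, FILES)
    print(pat, "expanded:", len(expanded), "returned:", len(returned))
```

Only the % pattern returns rows; the ? and [] patterns expand to both files and then lose them in the LIKE stage.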

 

The Trailing Slash

A more insidious consequence of SQLite's pattern matching being layered on top of globbing is the effect of the trailing slash. Consider:

osquery> select filename,path from file where path like '/tmp/2files2deep/2deep/';
W0514 19:03:24.414203 10578 filesystem.cpp:258] Globbing /tmp/2files2deep/2deep/
W0514 19:03:24.414275 10578 file.cpp:42] Generating file info for: "/tmp/2files2deep/2deep/"
W0514 19:03:24.414291 10578 file.cpp:58] Adding row for .

+----------+-------------------------+
| filename | path                    |
+----------+-------------------------+
| .        | /tmp/2files2deep/2deep/ |
+----------+-------------------------+

VS

osquery> select filename,path from file where path like '/tmp/2files2deep/2deep';
W0514 19:03:27.956544 10578 filesystem.cpp:258] Globbing /tmp/2files2deep/2deep
W0514 19:03:27.956581 10578 file.cpp:42] Generating file info for: "/tmp/2files2deep/2deep/"
W0514 19:03:27.956593 10578 file.cpp:58] Adding row for .

Note the trailing slash in the first select and the missing slash in the second. SQLite's pattern matching knows nothing about the semantics of file paths. Even though glob() handled /tmp/2files2deep/2deep/ in the same way as /tmp/2files2deep/2deep, the returned path was /tmp/2files2deep/2deep/ in both cases. At the SQLite level, that returned path matched only the first LIKE string and not the second, which is why the second query returns no rows.

 

To compound the confusion, consider using the equals (=) operator instead of LIKE. The same queries return different results! Do you see why? This time, it's because of how glob works when presented with a name with no wildcards, with and without a trailing slash.

osquery> select filename,path from file where path = '/tmp/2files2deep/2deep';
W0514 19:10:40.054903 10578 file.cpp:42] Generating file info for: "/tmp/2files2deep/2deep"
W0514 19:10:40.054930 10578 file.cpp:58] Adding row for 2deep

+----------+------------------------+
| filename | path                   |
+----------+------------------------+
| 2deep    | /tmp/2files2deep/2deep |
+----------+------------------------+

osquery> select filename,path from file where path = '/tmp/2files2deep/2deep/';
W0514 19:11:11.355253 10578 file.cpp:42] Generating file info for: "/tmp/2files2deep/2deep/"
W0514 19:11:11.355281 10578 file.cpp:58] Adding row for .

+----------+-------------------------+
| filename | path                    |
+----------+-------------------------+
| .        | /tmp/2files2deep/2deep/ |
+----------+-------------------------+

 

File Monitoring

So far, so good? Let's mess with your mind some more. Filepaths are also used to specify the paths to be monitored by osquery's file monitoring functionality. In this case, of course, there is no additional SQLite LIKE or equals (=) pattern matching superimposed on top of glob. The string is passed to glob, and the resulting expansions are monitored. As a result, many of the cases that don't work as part of normal SQL queries actually work with file monitoring.

In particular, “?”, characters specified in “[]”, and strings specified inside “{}”, work as they normally do in filename/path patterns.

 

The main limitation is that % still cannot be escaped, because osquery unilaterally replaces all instances of % with * as soon as it receives a filename/path.

 

Recursive Matching

The final confusion stems from using %% (equivalently, **) to specify a recursive traversal through directories. Since many shells support ** (sometimes after a global option is set; in bash, for example, run shopt -s globstar), the natural expectation is that it not only works like it does in shells, but that it must be handled by the glob() system function itself. Neither one is true. The glob() system function does not support any semantics associated with **, and instead evaluates it simply as *. More importantly, osquery's implementation of ** is very limited compared to that in a typical shell.

 

When osquery sees a filename/path with ** in it, it simply calls glob in a loop, each time adding “/**” to the end of the input path. It does this until glob returns an empty result or the recursion limit is reached (the default depth is 64 and is not user modifiable).

 

The end result is that putting ** at the end of the path usually works as expected: specifying a file monitor on /opt/google/** puts a monitor on every file in every directory under /opt/google. However, /etc/**/*.conf does not put a file monitor on every .conf file that exists somewhere under the /etc directory.

 

And of course, to give one a false sense of confidence, /etc/**/*.conf will actually list all the .conf files that occur exactly one level deep from /etc, but not two or more levels deep.
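
The difference between a trailing ** and a ** in the middle of a path can be modeled in a few lines of Python. This is a simplified sketch of the loop described above, not osquery's implementation: the function names are hypothetical, Python's glob stands in for the system glob(), and each ** is written as a plain * because that is how the text says glob() evaluates it.

```python
import glob
import os
import tempfile

MAX_DEPTH = 64  # recursion limit quoted in the text


def expand_trailing_recursive(directory, max_depth=MAX_DEPTH):
    """Model a trailing **: glob one level deeper on every iteration."""
    found = []
    pattern = directory
    for _ in range(max_depth):
        pattern = os.path.join(pattern, "*")  # ** behaves like a single *
        matches = sorted(glob.glob(pattern))
        if not matches:  # stop as soon as a level yields nothing
            break
        found.extend(matches)
    return found


def expand_inner_recursive(pattern):
    """Model ** in the middle of a path: it degrades to one-level *."""
    return sorted(glob.glob(pattern.replace("**", "*")))


with tempfile.TemporaryDirectory() as root:
    # Build .conf files at depth 0, 1, and 2 below the root.
    for rel in ("top.conf", "one/mid.conf", "one/two/deep.conf"):
        full = os.path.join(root, rel)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()

    trailing = [os.path.relpath(p, root)
                for p in expand_trailing_recursive(root)]
    inner = [os.path.relpath(p, root)
             for p in expand_inner_recursive(os.path.join(root, "**", "*.conf"))]

print(trailing)  # every file and directory at every depth
print(inner)     # only the .conf file exactly one level down
```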

 
