Intro to Osquery: Frequently Asked Questions for Beginners

Written by Amber Picotte | 7/12/18 10:38 PM


There is a growing and passionate community around osquery, actively sharing information and perspective, answering questions, exposing challenges, and dispelling misconceptions. Even so, learning the basics as you're getting started requires a lot of piecing together bits of wisdom (i.e., Googling, reading, and networking).

The intention of this post is to (a) curate some of the great content from the community, (b) organize it to cover common questions for beginners, and (c) incorporate some of what we've learned over the past three years through the Uptycs journey. If you find it helpful, let us know on Twitter and we'll create a more advanced FAQ next time around.

What Is Osquery?

Osquery is a universal endpoint agent that was developed by Facebook in 2014. It is an active and growing open source project on GitHub, with 230 contributors and more than 90 releases to date.

According to the official osquery docs, osquery (os = operating system) is an operating system instrumentation framework that exposes an operating system as a high-performance relational database. Using SQL, you can write a single query to explore data on any supported operating system.

This is a unique approach in the security landscape, creating one agent for many operating systems, leveraging a standard query language instead of creating a proprietary one, and collecting rich data sets that have broad applications. Osquery represents a fundamental rethinking of the fragmented, siloed approach plaguing the security industry today.

With that said, osquery is just an agent—“an instrumentation framework” for data collection. Security teams looking to put osquery into production and leverage the data for security protocols will need to consider:

How you’ll configure, deploy, and manage the agent
How you’ll manage query packs (more on these below) and schedules as the community adds more
Where you’ll store osquery data (and how much it will cost)
How you’ll analyze the data—i.e., what problems are you looking to solve? What questions do you need to ask?
How you’ll handle suspicious activity that requires further investigation or remediation
Whether you need any integrations with existing tooling
How you’ll troubleshoot production issues and develop any custom functionality you may need

This often leads to a build vs buy analysis. See the section below, "What are some pros and cons of osquery?" for additional considerations.

What Is Osquery Used For?

Because it lets an organization craft system queries as SQL statements, osquery is a simple tool to adopt for security engineers who already know SQL. It is commonly used to troubleshoot performance and operational issues, and it is valued for the flexibility to serve a variety of other use cases.

Is Osquery an EDR?

Osquery lets you query machines both to preempt threats and to find them, so it can serve as an audit system, a compliance tool, and an EDR (endpoint detection and response) tool.

What Is Linux Osquery?

“Linux osquery” simply means osquery running on Linux. Alongside Windows and OS X (macOS), Linux is one of the operating systems for which osquery serves as an operating system instrumentation framework, making low-level operating system analytics and monitoring both performant and intuitive. As on every supported platform, osquery presents the operating system as a high-performance relational database.

What Operating Systems Does Osquery Support?

At the time of writing, osquery supports OS X (macOS), Linux, FreeBSD, and Windows. Osquery can also monitor and extract data from Docker containers. One of the most powerful features of osquery is its ability to collect and normalize relational data independent of operating system.

Because of the subtleties that exist between platforms, with other agent-based solutions users are often forced to write (and maintain) scripts to extract related information—an approach that quickly becomes a barrier to scale. Osquery solves this by exposing operating system information as normalized SQL tables. In other words, users now have the ability to ask the same questions, to get the same type of answers, regardless of operating system.

Does Osquery Use SQL?

Using SQL tables to represent abstract concepts such as running processes, loaded kernel modules, open network connections, browser plugins, hardware events, and file hashes, osquery allows teams to write SQL-based queries to explore data across all operating systems and infrastructure.

What Tables Does Osquery Support?

As of the publish date, osquery version 3.2.6 supports 207 tables. You can view all of the supported tables and schema information, sortable by release version, at osquery.io/schema.

What Type Of Data/Information Can I Get From Osquery?

Osquery data is far richer than standard log files. Here are just a few of the data elements osquery collects:

running processes
user logins
loaded kernel modules
open network connections
browser plugins
hardware events
file hashes
sockets
mounts
ports
storage volumes
packages

And a great deal more.

How Much Data Is Osquery Collecting on Average Per Endpoint Each Day? Where Is This Data Stored?

We’ve observed osquery generating an average of 110MB of data per endpoint, per day. Of course, your mileage may vary depending on the monitored asset’s function and what data is being collected.
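To turn the per-endpoint average into a storage requirement, multiply it by fleet size and retention period. The sketch below does only that arithmetic; it assumes the 110MB figure above, decimal units (1 GB = 1,000 MB), no compression or indexing overhead, and a hypothetical 1,000-endpoint fleet.

```python
# Back-of-the-envelope osquery storage estimate.
# Assumptions (illustrative, not measured): the 110 MB/endpoint/day average
# quoted above, decimal units (1 GB = 1,000 MB), and no compression or
# indexing overhead in whatever store receives the data.

AVG_MB_PER_ENDPOINT_PER_DAY = 110  # observed average cited in this post


def estimate_storage_gb(endpoints: int, retention_days: int,
                        mb_per_endpoint_per_day: float = AVG_MB_PER_ENDPOINT_PER_DAY) -> float:
    """Return the raw data volume, in GB, retained across a fleet."""
    if endpoints < 0 or retention_days < 0:
        raise ValueError("endpoints and retention_days must be non-negative")
    total_mb = endpoints * retention_days * mb_per_endpoint_per_day
    return total_mb / 1000


if __name__ == "__main__":
    # Hypothetical fleet: 1,000 endpoints.
    print(f"{estimate_storage_gb(1000, 1):,.0f} GB per day")
    print(f"{estimate_storage_gb(1000, 30):,.0f} GB for 30 days")
```

For that hypothetical fleet, this prints 110 GB per day and 3,300 GB for 30 days of retention, before any overhead your storage backend adds.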

Osquery supports both an interactive query console (osqueryi) and a distributed host monitoring daemon (osqueryd) that can be used to schedule queries on a recurring basis. From a data storage perspective, the amount of data collected, combined with the desired retention period, will ultimately dictate your requirements. For example, osquery data can be:

Not stored anywhere, i.e., queried only in real time via osqueryi, the osquery interactive query console/shell
Stored in a SIEM (Splunk, ELK stack, etc.)
Aggregated in a security analytics platform, like Uptycs, for integrated threat intel, correlations, anomaly detection, and more.
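To make osqueryd scheduling concrete, here is a minimal sketch that builds a daemon configuration in Python and prints it as JSON. The "options" and "schedule" sections and the per-query "query"/"interval" fields follow osquery's JSON config format; the query names, intervals, and option values are illustrative assumptions. On Linux, the daemon commonly reads this file from /etc/osquery/osquery.conf.

```python
import json

# Minimal osqueryd configuration sketch. Query names, intervals, and option
# values are illustrative assumptions, not recommended settings.
config = {
    "options": {
        "host_identifier": "hostname",  # label results with the host's name
        "logger_plugin": "filesystem",  # write results to local log files
    },
    "schedule": {
        # Each entry asks osqueryd to run `query` about every `interval` seconds.
        "system_uptime": {"query": "SELECT * FROM uptime;", "interval": 3600},
        "listening_ports": {
            "query": "SELECT pid, port, protocol, address FROM listening_ports;",
            "interval": 300,
        },
    },
}


def render_config(cfg: dict) -> str:
    """Check each scheduled query, then return the config as pretty-printed JSON."""
    for name, entry in cfg.get("schedule", {}).items():
        interval = entry.get("interval")
        if not entry.get("query") or not isinstance(interval, int) or interval <= 0:
            raise ValueError(f"scheduled query {name!r} needs SQL and a positive integer interval")
    return json.dumps(cfg, indent=2)


if __name__ == "__main__":
    print(render_config(config))
```

Generating the file from code like this makes it easy to review schedule changes and catch a missing query or interval before the config reaches endpoints.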

What Is a Query Pack?

A query pack is a group, or collection, of queries designed to accomplish a particular function or task. Query packs are typically configured with a specific run schedule to help avoid any potential impact on the host machine. For example, there are query packs for incident response, vulnerability management, known OS X malware, and more (you can find the full list of query packs in the osquery project).

It’s important to note that while query packs help you collect organized sets of data, simply running a query pack for compliance, for example, does not mean you are compliant; it means you’re scheduled to collect the data required to answer questions about your compliance standing. You still need to review that data and understand what it means for your particular compliance standards and goals.

Query packs are also written to work for the majority of environments, so the end user must confirm they are useful, decide whether any pruning is required, and weigh the performance impact. Commercial offerings (like Uptycs) provide this query pack optimization and offer the analytical insight required to take action based on what a query pack yields (e.g., X machines are failing compliance check Y because of Z).
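As a minimal sketch of what a pack looks like, the Python below builds a tiny hypothetical pack in osquery's pack format (a "queries" map whose entries carry a query, an interval in seconds, and a description, plus an optional top-level "platform") and estimates how often each query would run per day. That run count is only a rough first signal when weighing performance impact or deciding what to prune. The pack contents are assumptions for illustration; a real pack file is typically referenced from the main config's "packs" section.

```python
import json

# Hypothetical mini query pack in osquery's pack format. Query names,
# intervals, and descriptions are illustrative assumptions.
PACK = {
    "platform": "linux",
    "queries": {
        "processes_without_binary": {
            "query": "SELECT pid, name, path FROM processes WHERE on_disk = 0;",
            "interval": 3600,
            "description": "Running processes whose executable is no longer on disk.",
        },
        "logged_in_users": {
            "query": "SELECT * FROM logged_in_users;",
            "interval": 600,
            "description": "Sessions currently logged in to the host.",
        },
    },
}

SECONDS_PER_DAY = 86_400


def runs_per_day(pack: dict) -> dict:
    """Approximate daily executions per query (actual timing can vary slightly)."""
    return {name: SECONDS_PER_DAY // q["interval"] for name, q in pack["queries"].items()}


if __name__ == "__main__":
    print(json.dumps(PACK, indent=2))
    for name, runs in sorted(runs_per_day(PACK).items()):
        print(f"{name}: ~{runs} runs/day")
```

Here the hourly query runs about 24 times a day and the 10-minute query about 144 times, which is why short intervals on heavy tables deserve the most scrutiny.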

What Are Some Pros & Cons Of Osquery?

In addition to the benefits of an open source, universal approach, Trail of Bits reports that teams like osquery because it:

Is simpler to use
Is more customizable
Exposes teams to endpoint data they never had access to before

Some of osquery's cons, or areas for improvement, include requests for more extensive documentation (especially on Windows), commercially available support, and continued expansion and parity for operating systems outside of macOS and Linux. In addition, even resource-rich security teams that have deployed a fully open source/DIY solution around osquery have learned that:

Cost of data storage can be high—For example, Elastic is 3x the cost of the bytes transmitted from osquery. On average, 110MB are transmitted from each endpoint per day.
Translating the incremental data is hard—Making sense of the information for vulnerability management, threat investigation, compliance and audits, etc. is quite complicated and requires more heavy lifting than initially anticipated.
Optimizing queries and query packs is critical—Building your own query packs can have an impact on computing resources. A query will often pull far more system data than anticipated and this can cause systems to crash, yet testing and optimization before deployment is taxing, and in some cases, not possible.
Third-party data is still needed—For threat or intrusion detection, integrated third-party data sets are still necessary.
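The query-optimization point can be checked empirically. Osquery reports execution statistics for the daemon's scheduled queries in the osquery_schedule table; the sketch below ranks queries by average CPU time per execution from rows shaped like that table's output. The sample rows are hypothetical, and column availability and units can differ between osquery versions, so verify them against the schema for your release.

```python
# Hypothetical rows shaped like osquery's osquery_schedule table, e.g. from:
#   SELECT name, executions, user_time, system_time FROM osquery_schedule;
# Values are strings because osqueryi's JSON output may return them that way.
SAMPLE_ROWS = [
    {"name": "file_hashes", "executions": "24", "user_time": "96000", "system_time": "24000"},
    {"name": "system_uptime", "executions": "24", "user_time": "48", "system_time": "24"},
    {"name": "listening_ports", "executions": "288", "user_time": "5760", "system_time": "2880"},
    {"name": "new_query", "executions": "0", "user_time": "0", "system_time": "0"},
]


def rank_by_cpu(rows: list[dict]) -> list[tuple[str, float]]:
    """Return (query name, average CPU time per execution), most expensive first.

    CPU time is user_time + system_time, in whatever unit the osquery build
    reports. Queries that have not run yet are skipped to avoid dividing by zero.
    """
    ranked = []
    for row in rows:
        runs = int(row["executions"])
        if runs == 0:
            continue
        cpu = int(row["user_time"]) + int(row["system_time"])
        ranked.append((row["name"], cpu / runs))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, avg_cpu in rank_by_cpu(SAMPLE_ROWS):
        print(f"{name}: {avg_cpu:.1f} per run")
```

With these sample rows, file_hashes ranks first by a wide margin, which is the kind of query worth testing, rescheduling, or pruning first.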

For a more technical perspective on some of the current challenges of osquery, watch this video from QueryCon 2018 where Teddy Reed, Facebook Security Engineering Manager, shares what keeps him up at night with osquery.

How Do I Get Started And Install Osquery?

There are a few options for getting started with osquery. Before you begin, you’ll need to consider:

How you’ll configure, deploy, and manage the agent
How you’ll manage query packs and schedules as the community adds more
Where you’ll store osquery data (and how much it will cost)
How you’ll analyze the data—i.e., what problems are you looking to solve?
How you’ll handle suspicious activity that requires further investigation or remediation
Whether you need any integrations with existing tooling
How you’ll troubleshoot production issues and develop any custom functionality you may need

Deploying The Osquery Agent

Download osquery from osquery.io, where you’ll find packages for macOS, Windows, and Linux (including RPM and Debian packages). You may need to customize your configuration; for example, this post from Joshua Brower at Defensive Depth walks through custom MSI configs.

We’ve also created a few handy videos that walk through various osquery installations. You can see them on YouTube.

Everything Else

Build it: You can build an end-to-end osquery solution yourself, pairing other open source and commercially available products with custom developed functionality if time, cost, and resources aren’t limiting constraints. This post from Chris Long of Palantir provides a comprehensive playbook of how they constructed a solution for rapid incident response.

Buy it: You can sign up for a free trial of Uptycs—this will enable you to deploy the necessary versions of the osquery agent. You’ll also have access to the Uptycs security analytics platform, which collects, aggregates, and analyzes osquery data for fleet visibility, intrusion detection, vulnerability management, incident investigation, and audit & compliance.

How Do I Ask Questions Or Extract Information From Osquery?

Using SQL (Structured Query Language) and an understanding of the osquery tables where the data you need is stored, you can construct queries that return nearly any piece of information you want about a single endpoint where osquery is running. Here is an overview of how to construct some of the most common SQL queries for osquery. If you want to query many (or all) of your endpoints at once, you’ll need additional instrumentation and a data store for aggregation across your infrastructure.
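For a single endpoint, one way to script this is to call osqueryi, which accepts a query as an argument and can print results as JSON with --json. The sketch below assumes osqueryi is installed and on PATH; the os_version query is only an example, and the runner parameter is a hypothetical hook that exists so the function can be tested without osquery.

```python
import json
import shutil
import subprocess


def run_osqueryi(sql: str, runner=subprocess.run) -> list[dict]:
    """Run one query through the local osqueryi shell and return rows as dicts.

    Assumes osqueryi is on PATH. Column values may come back as strings, so
    convert types explicitly before doing arithmetic on them.
    """
    proc = runner(
        ["osqueryi", "--json", sql],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if osqueryi exits non-zero
    )
    return json.loads(proc.stdout)


if __name__ == "__main__":
    if shutil.which("osqueryi") is None:
        print("osqueryi not found on PATH")
    else:
        for row in run_osqueryi("SELECT name, version FROM os_version;"):
            print(row)
```

This covers one host only; querying a whole fleet still needs the additional instrumentation and aggregation store described above.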

What Are Some Basic SQL Commands I Need To Know?

Many engineers and developers have used SQL before but still find a refresher helpful. We also hear that the way SQL is applied and optimized in osquery is slightly different from how they used it before. For example, joining tables, using COUNT and LIMIT, and filtering results are especially common given the volume of fleet-wide data you’ll be working with in osquery.

Two of the most common and basic queries are select * from uptime; (how long the host has been up) and select * from users limit 5; (up to five local user accounts). Check out this SQL introduction for osquery to learn common queries and how to join tables, filter results, and more.
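Because osquery evaluates queries with SQLite (its tables are implemented as SQLite virtual tables), Python's built-in sqlite3 module can illustrate the LIMIT, filtering, COUNT, and JOIN patterns above without an agent installed. The tables below are tiny, hypothetical stand-ins that reuse a few real osquery column names from users, processes, and listening_ports; in osquery, the agent fills these tables with live data.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """In-memory mock of three osquery tables, filled with hypothetical rows."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE users (uid INTEGER, username TEXT, shell TEXT);
        CREATE TABLE processes (pid INTEGER, name TEXT, path TEXT);
        -- protocol uses IP protocol numbers (6 = TCP)
        CREATE TABLE listening_ports (pid INTEGER, port INTEGER, protocol INTEGER, address TEXT);
        INSERT INTO users VALUES (0, 'root', '/bin/bash'),
                                 (1, 'daemon', '/usr/sbin/nologin'),
                                 (1000, 'alice', '/bin/zsh');
        INSERT INTO processes VALUES (1, 'systemd', '/usr/lib/systemd/systemd'),
                                     (812, 'sshd', '/usr/sbin/sshd'),
                                     (977, 'nginx', '/usr/sbin/nginx');
        INSERT INTO listening_ports VALUES (812, 22, 6, '0.0.0.0'),
                                           (977, 80, 6, '0.0.0.0'),
                                           (977, 443, 6, '0.0.0.0');
    """)
    return db


QUERIES = {
    # LIMIT: cap the number of rows returned
    "limit": "SELECT username FROM users ORDER BY uid LIMIT 2;",
    # WHERE: keep only accounts with an interactive shell
    "filter": "SELECT username FROM users WHERE shell NOT LIKE '%nologin' ORDER BY uid;",
    # COUNT + GROUP BY: summarize instead of returning every row
    "count": "SELECT pid, COUNT(*) AS ports FROM listening_ports GROUP BY pid ORDER BY pid;",
    # JOIN: which process owns each listening port
    "join": """SELECT DISTINCT p.name, l.port
               FROM listening_ports AS l JOIN processes AS p USING (pid)
               ORDER BY l.port;""",
}

if __name__ == "__main__":
    db = build_demo_db()
    for label, sql in QUERIES.items():
        print(label, db.execute(sql).fetchall())
```

The join, for example, returns ('sshd', 22), ('nginx', 80), and ('nginx', 443): each listening port paired with the process that owns it. The same SQL text, without the mock tables, can be typed directly into osqueryi.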
