There is a growing and passionate community around osquery, actively sharing information and perspective, answering questions, exposing challenges, and dispelling misconceptions. Even so, learning the basics when you're getting started means piecing together bits of wisdom from many places (i.e., Googling, reading, and networking).
This post aims to (a) curate some of the great content from the community, (b) organize it around common beginner questions, and (c) incorporate some of what we've learned over the past three years on the Uptycs journey. If you find it helpful, let us know on Twitter and we'll put together a more advanced FAQ next time around.
Osquery is a universal endpoint agent developed by Facebook and released as open source in 2014. It is an active and growing open source project on GitHub, with 230 contributors and more than 90 releases to date.
According to the official osquery docs, osquery (os = operating system) is an operating system instrumentation framework that exposes an operating system as a high-performance relational database. Using SQL, you can write a single query that explores the same data on any host, regardless of its operating system.
This is a unique approach in the security landscape, creating one agent for many operating systems, leveraging a standard query language instead of creating a proprietary one, and collecting rich data sets that have broad applications. Osquery represents a fundamental rethinking of the fragmented, siloed approach plaguing the security industry today.
That said, osquery is just an agent: “an instrumentation framework” for data collection. Security teams looking to put osquery into production and use its data in their security processes will need to consider:
This often leads to a build-vs-buy analysis. See the section below, "What are some pros and cons of osquery?", for additional considerations.
Because it lets an organization craft system queries as SQL statements, osquery is an approachable tool for security engineers who are already familiar with SQL. Although it is primarily used to troubleshoot performance and operational issues, osquery is a flexible tool valued across a wide variety of use cases.
Osquery lets you query machines both to preempt threats and to find them, serving as an audit system, a compliance tool, and an EDR (endpoint detection and response) tool.
Alongside Windows and OS X (macOS), Linux is one of the operating systems osquery instruments, making low-level operating system analytics and monitoring both performant and intuitive. On Linux, as elsewhere, osquery exposes the operating system as a high-performance relational database.
Currently, osquery supports OS X (macOS), Linux, FreeBSD, and Windows. Osquery can also monitor and extract data from Docker containers. One of the most powerful features of osquery is its ability to collect and normalize relational data independent of operating system.
Because of subtle differences between platforms, users of other agent-based solutions are often forced to write (and maintain) scripts to extract related information, an approach that quickly becomes a barrier to scale. Osquery solves this by exposing operating system information as normalized SQL tables. In other words, users can now ask the same questions and get the same type of answers, regardless of operating system.
Using SQL tables to represent abstract concepts such as running processes, loaded kernel modules, open network connections, browser plugins, hardware events, and file hashes, osquery allows teams to write SQL-based queries to explore data across all operating systems and infrastructure.
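Here is a minimal sketch that makes the "same question, same answer" idea concrete. It does not call osquery itself. Instead, it loads a few invented rows into an in-memory SQLite table named after osquery's processes table. Only the pid, name, path, and uid columns are modeled. It then runs one unchanged query against a made-up macOS host and a made-up Linux host. Osquery's SQL engine is built on SQLite, so the query text reads the same way. The host data and the query are assumptions for illustration.

```python
import sqlite3

# One query, written once. Table and column names mirror osquery's
# `processes` table, but only a subset of its columns is modeled here.
QUERY = "SELECT name, path FROM processes WHERE uid = 0 ORDER BY name"


def run_query(rows):
    """Load hypothetical process rows into an in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE processes (pid INTEGER, name TEXT, path TEXT, uid INTEGER)"
    )
    conn.executemany("INSERT INTO processes VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


# Invented sample rows: (pid, name, path, uid).
macos_rows = [
    (1, "launchd", "/sbin/launchd", 0),
    (412, "Safari", "/Applications/Safari.app/Contents/MacOS/Safari", 501),
]
linux_rows = [
    (1, "systemd", "/usr/lib/systemd/systemd", 0),
    (873, "sshd", "/usr/sbin/sshd", 0),
    (2044, "bash", "/usr/bin/bash", 1000),
]

if __name__ == "__main__":
    # Same QUERY, different hosts: only the data changes.
    print(run_query(macos_rows))  # [('launchd', '/sbin/launchd')]
    print(run_query(linux_rows))  # [('sshd', '/usr/sbin/sshd'), ('systemd', '/usr/lib/systemd/systemd')]
```

Only the rows differ between hosts. The question asked of them stays the same.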
As of the publish date, osquery version 3.2.6 supports 207 tables. You can view all of the supported tables and their schema information, sortable by release version, at osquery.io/schema.
Osquery collects data far richer than standard log files. Here are just a few of the data elements it collects:
And a great deal more.
We’ve observed osquery generating an average of 110MB of data per endpoint, per day. Of course, your mileage may vary depending on each monitored asset’s function and what data is being collected.
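To turn that average into a rough storage budget, multiply it by fleet size and retention period. The sketch below takes the 110MB figure above as its default. It also assumes decimal units (1 GB = 1,000 MB) and no compression. The 1,000-endpoint, 30-day fleet is hypothetical.

```python
def estimate_storage_gb(endpoints, retention_days, mb_per_endpoint_per_day=110):
    """Rough raw storage in GB (1 GB = 1,000 MB), ignoring compression and indexing."""
    total_mb = endpoints * retention_days * mb_per_endpoint_per_day
    return total_mb / 1000


if __name__ == "__main__":
    # Hypothetical fleet: 1,000 endpoints keeping 30 days of data.
    print(estimate_storage_gb(1_000, 30))  # 3300.0 (about 3.3 TB)
```

For that example, the raw total comes to about 3.3 TB. Compression lowers the real number, while indexing and replication in your data store raise it.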
Osquery supports both an interactive query console (osqueryi) and a distributed host monitoring daemon (osqueryd) that can be used to schedule queries on a recurring basis. From a data storage perspective, the amount of data collected, combined with the desired retention period, will ultimately dictate your requirements. For example, osquery data can be:
A query pack is a group, or collection, of queries designed to accomplish a particular function or task. Query packs are typically configured with a specific run schedule to help avoid any potential impact on the host machine. For example, there are query packs for incident response, vulnerability management, known OS X malware, and more (you can find a full list of query packs here).
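Here is a rough illustration of how osqueryd is told what to run and how often. This Python sketch emits a minimal configuration with one scheduled query and one pack defined inline. The top-level schedule and packs keys follow osquery's documented configuration format. So do the per-query query, interval (in seconds), description, and platform fields. The query names, the pack name, and the intervals are hypothetical. Check the osquery configuration docs for your version before relying on any field.

```python
import json

# Minimal osqueryd configuration sketch. The "schedule" and "packs" keys and
# the per-query fields follow osquery's config format; every name and
# interval below is a hypothetical example.
config = {
    "schedule": {
        "system_uptime": {
            "query": "SELECT total_seconds FROM uptime;",
            "interval": 3600,  # seconds: roughly once an hour
        }
    },
    "packs": {
        "example_ir_pack": {  # hypothetical pack, defined inline
            "queries": {
                "listening_ports": {
                    "query": "SELECT pid, port, protocol, address FROM listening_ports;",
                    "interval": 600,
                    "description": "Sockets listening for connections.",
                },
                "logged_in_users": {
                    "query": "SELECT user, tty, host, time FROM logged_in_users;",
                    "interval": 900,
                    "platform": "posix",  # run on macOS/Linux/FreeBSD, skip Windows
                },
            }
        }
    },
}

if __name__ == "__main__":
    # Write this output to the config file your osqueryd instance reads.
    print(json.dumps(config, indent=2))
```

Packs can also be kept in separate files and referenced by path. Defining one inline just keeps the sketch self-contained.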
It’s important to note that while query packs will help you collect organized sets of data, simply running a compliance query pack, for example, does not mean you are compliant. It means you’re scheduled to collect the data required to answer questions about your compliance standing. You still need to review that data and understand what it means for your particular compliance standards and goals.
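The sketch below shows the gap between collecting data and judging compliance by applying a policy to query results after the fact. The rows are invented and shaped like osquery's disk_encryption table. Only the name and encrypted columns are modeled, with values kept as strings. The host names and the "every volume must be encrypted" rule are hypothetical stand-ins for your own standard.

```python
# Hypothetical per-host results shaped like rows from osquery's
# disk_encryption table (only `name` and `encrypted` are modeled).
results = {
    "laptop-01": [{"name": "/dev/disk1s1", "encrypted": "1"}],
    "laptop-02": [{"name": "/dev/disk1s1", "encrypted": "0"}],
    "laptop-03": [],  # no rows returned: missing data, not proof of compliance
}


def evaluate(host_rows):
    """Hypothetical policy: every reported volume must be encrypted."""
    if not host_rows:
        return "unknown"
    return "pass" if all(r["encrypted"] == "1" for r in host_rows) else "fail"


def failing_hosts(all_results):
    """Hosts that did not clearly pass, sorted by name."""
    return sorted(h for h, rows in all_results.items() if evaluate(rows) != "pass")


if __name__ == "__main__":
    for host, rows in sorted(results.items()):
        print(host, evaluate(rows))
    print(failing_hosts(results))  # ['laptop-02', 'laptop-03']
```

Note that laptop-03 is reported as unknown rather than passing. An empty result is a gap in the data, not evidence of compliance.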
Query packs are also designed to work for most environments, so the end user still needs to confirm their usefulness, determine whether any pruning is required, and weigh the performance impact. Commercial offerings (like Uptycs) provide this query pack optimization and offer the analytical insight required to take action based on what the query pack yields (i.e., X machines are failing compliance check Y because of Z).
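Pruning a pack and weighing its performance cost can start with simple bookkeeping. This hypothetical sketch drops queries that don't apply to a target platform. It then counts how many scheduled executions a pack implies per host per day, which is 86,400 seconds divided by each interval, summed. The pack contents are invented. The platform filter is deliberately simplified: it does not expand values such as posix or comma-separated lists. Osqueryd's real timing also involves splay, so treat the totals as approximations.

```python
SECONDS_PER_DAY = 86_400

# Hypothetical pack, trimmed to the fields this sketch uses.
pack = {
    "queries": {
        "usb_devices": {"query": "SELECT * FROM usb_devices;", "interval": 300, "platform": "darwin"},
        "kernel_modules": {"query": "SELECT name, size FROM kernel_modules;", "interval": 3600, "platform": "linux"},
        "listening_ports": {"query": "SELECT pid, port FROM listening_ports;", "interval": 60},
    }
}


def prune_for_platform(pack, platform):
    """Keep queries with no platform restriction or an exact match (simplified)."""
    kept = {
        name: q
        for name, q in pack["queries"].items()
        if q.get("platform", "all") in ("all", "any", platform)
    }
    return {**pack, "queries": kept}


def daily_runs(pack):
    """Approximate scheduled executions per host per day, summed across queries."""
    return sum(SECONDS_PER_DAY // q["interval"] for q in pack["queries"].values())


if __name__ == "__main__":
    linux_pack = prune_for_platform(pack, "linux")
    print(sorted(linux_pack["queries"]))  # ['kernel_modules', 'listening_ports']
    print(daily_runs(pack), daily_runs(linux_pack))  # 1752 1464
```

Here, the 60-second listening_ports query dominates the total. That is the kind of finding that might prompt you to lengthen its interval.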
In addition to the benefits of an open source universal approach, Trail of Bits shares that teams like osquery because it’s:
Some of osquery's cons, or areas for improvement, include requests for more extensive documentation (especially on Windows), commercially available support, and continued expansion and parity for operating systems outside of macOS and Linux. In addition, even resource-rich security teams that have deployed a fully open source/DIY solution around osquery have learned that:
For a more technical perspective on some of the current challenges of osquery, watch this video from QueryCon 2018 where Teddy Reed, Facebook Security Engineering Manager, shares what keeps him up at night with osquery.
There are a few options for getting started with osquery. Before you begin, you’ll need to consider:
Download osquery from osquery.io—you’ll find macOS, Linux, RPM, Debian, and Windows versions (you may need to customize your configuration). For example, here is a post from Joshua Brower at Defensive Depth that walks through custom MSI configs.
We’ve also created a few handy videos that walk through various osquery installations. You can see them on YouTube.
Build it: You can build an end-to-end osquery solution yourself, pairing other open source and commercially available products with custom developed functionality if time, cost, and resources aren’t limiting constraints. This post from Chris Long of Palantir provides a comprehensive playbook of how they constructed a solution for rapid incident response.
Buy it: You can sign up for a free trial of Uptycs—this will enable you to deploy the necessary versions of the osquery agent. You’ll also have access to the Uptycs security analytics platform, which collects, aggregates, and analyzes osquery data for fleet visibility, intrusion detection, vulnerability management, incident investigation, and audit & compliance.
Using SQL (Structured Query Language) and an understanding of the osquery tables where the data you need is stored, you can construct queries that return nearly any piece of information you want about a single endpoint where osquery is running. Here is an overview of how to construct some of the most common SQL queries for osquery. If you want to query many (or all) of your endpoints at once, you’ll need additional instrumentation and a data store for aggregation across your infrastructure.
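Once results from many hosts land in a central store, fleet-wide questions become simple aggregations. In the sketch below, a Python dict stands in for that store. The host names and rows are hypothetical results of SELECT name FROM processes;. Counting how many hosts run each process is one simple way to surface outliers that appear on only one machine.

```python
from collections import Counter

# Hypothetical results of "SELECT name FROM processes;" from three hosts.
# In practice these would arrive via osqueryd's logs in a central data store.
fleet_results = {
    "web-01": [{"name": "nginx"}, {"name": "sshd"}],
    "web-02": [{"name": "nginx"}, {"name": "sshd"}, {"name": "nc"}],
    "db-01": [{"name": "postgres"}, {"name": "sshd"}, {"name": "sshd"}],
}


def hosts_per_process(results):
    """Count how many hosts report each process name (once per host)."""
    counts = Counter()
    for rows in results.values():
        counts.update({row["name"] for row in rows})
    return counts


def rare_processes(results, max_hosts=1):
    """Process names seen on at most `max_hosts` hosts, sorted alphabetically."""
    counts = hosts_per_process(results)
    return sorted(name for name, n in counts.items() if n <= max_hosts)


if __name__ == "__main__":
    print(hosts_per_process(fleet_results)["sshd"])  # 3
    print(rare_processes(fleet_results))  # ['nc', 'postgres']
```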
Many engineers and developers have used SQL before but still find a refresher helpful. We also hear that the way SQL is applied and optimized for osquery is slightly different from how they have used it before. For example, joining tables, using COUNT and LIMIT, and filtering results are especially common, given the volume of fleet-wide data you’ll be working with in osquery.
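Because osquery's SQL engine is SQLite, you can practice joins, COUNT, filters, and LIMIT with Python's built-in sqlite3 module. The tables below are small stand-ins for osquery's users and processes tables, using column subsets and invented rows. In osqueryi, the same SELECT would run against live data instead.

```python
import sqlite3

# Invented stand-in rows: users are (uid, username, shell),
# processes are (pid, name, uid).
USERS = [
    (0, "root", "/bin/bash"),
    (1000, "alice", "/bin/zsh"),
    (1001, "bob", "/bin/bash"),
    (999, "svc", "/usr/sbin/nologin"),
]
PROCESSES = [
    (1, "systemd", 0), (2, "sshd", 0), (3, "cron", 0),
    (10, "zsh", 1000), (11, "vim", 1000),
    (20, "bash", 1001),
    (30, "agent", 999),
]

# JOIN users to processes on uid, filter out nologin (service) accounts,
# COUNT processes per user, and LIMIT the output to the top two.
QUERY = """
SELECT u.username, COUNT(p.pid) AS process_count
FROM users AS u
JOIN processes AS p ON p.uid = u.uid
WHERE u.shell NOT LIKE '%nologin'
GROUP BY u.username
ORDER BY process_count DESC
LIMIT 2
"""


def top_users():
    """Run QUERY against in-memory stand-ins for osquery's users/processes tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (uid INTEGER, username TEXT, shell TEXT)")
    conn.execute("CREATE TABLE processes (pid INTEGER, name TEXT, uid INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", USERS)
    conn.executemany("INSERT INTO processes VALUES (?, ?, ?)", PROCESSES)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(top_users())  # [('root', 3), ('alice', 2)]
```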
Two of the most common and basic queries are select * from uptime; and select * from users limit 5;. Check out this SQL introduction for osquery to learn common queries and how to join tables, filter results, and more.
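If you want to script those two queries rather than type them into the console, one option is to call osqueryi with its --json flag, which prints results as a JSON array. This sketch assumes osqueryi is installed and on your PATH. The helper names are hypothetical.

```python
import json
import shutil
import subprocess


def build_command(sql):
    """Hypothetical helper: osqueryi command line that prints results as JSON."""
    return ["osqueryi", "--json", sql]


def parse_rows(json_text):
    """osqueryi --json prints a JSON array of objects, one per result row."""
    return json.loads(json_text)


def run_osqueryi(sql):
    """Run one query through a local osqueryi and return its rows as dicts."""
    completed = subprocess.run(
        build_command(sql), check=True, capture_output=True, text=True
    )
    return parse_rows(completed.stdout)


if __name__ == "__main__":
    if shutil.which("osqueryi") is None:
        print("osqueryi is not on PATH; install osquery first.")
    else:
        for sql in ("SELECT * FROM uptime;", "SELECT * FROM users LIMIT 5;"):
            for row in run_osqueryi(sql):
                print(row)
```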
Learn more about osquery: