Intro to Osquery: Frequently Asked Questions for Beginners

Posted by Amber Picotte on 7/17/18 6:52 AM

There is a growing and passionate community around osquery, actively sharing information and perspective, answering questions, exposing challenges, and dispelling misconceptions. Even so, learning the basics when you're getting started requires piecing together a lot of bits of wisdom (i.e., Googling + reading + networking). The intention of this post is to a) curate some of the great content from the community, b) organize it to cover common questions for beginners, and c) incorporate some of what we've learned over the past three years through the Uptycs journey. If you like it and find it helpful, leave a comment below or let us know on Twitter, and we'll create a more advanced FAQ next time around.


What is osquery?

Osquery is a universal endpoint agent that was developed by Facebook in 2014. It is an active and growing open source project on GitHub, with 230 contributors and over 90 releases to date.

According to the official osquery docs, osquery (os=operating system) is an operating system instrumentation framework that exposes an operating system as a high-performance relational database. Using SQL, you can write a single query to explore any given data, regardless of operating system.

This is a unique approach in the security landscape, creating one agent for many operating systems, leveraging a standard query language instead of creating a proprietary one, and collecting rich data sets which have broad applications. Osquery represents a fundamental rethinking of the fragmented, siloed approach plaguing the security industry today.  

With that said, osquery is just an agent - “an instrumentation framework” for data collection. Security teams looking to put osquery into production and leverage the data for security protocols will need to consider:

  1. How you’ll configure, deploy, and manage the agent
  2. How you’ll manage query packs (more on these below) and schedules as the community adds more
  3. Where you’ll store osquery data (and how much it will cost)
  4. How you’ll analyze the data - i.e., what problems are you looking to solve? What questions do you need to ask?
  5. How you’ll handle suspicious activity that requires further investigation or remediation
  6. Whether you need any integrations with existing tooling
  7. How you’ll troubleshoot production issues and develop any custom functionality you may need

This often leads to a build vs buy analysis. See the section below, "What are some pros and cons of osquery?", for additional considerations.

What operating systems does it support?

Currently, osquery supports OS X (macOS), Linux, FreeBSD, and Windows. Osquery can also monitor and extract data from Docker containers. One of the most powerful features of osquery is its ability to collect and normalize relational data independent of operating system. With other agent-based solutions, the subtle differences between platforms often force users to write (and maintain) scripts to extract related information, an approach that quickly becomes a barrier to scale. Osquery solves this by exposing operating system information as normalized SQL tables. In other words, users can ask the same questions and get the same type of answers, regardless of operating system.

What tables does osquery support?

As of the publish date, osquery version 3.2.6 supports 207 tables. You can view all of the supported tables and their schema information, sortable by release version, at osquery.io/schema.

What type of data/information can I get from osquery?

Osquery data is far richer than standard log files. Here are just a few of the data elements osquery collects:

  • running processes,
  • user logins,
  • loaded kernel modules,
  • open network connections,
  • browser plugins,
  • hardware events,
  • file hashes,
  • sockets,
  • mounts,
  • ports,
  • storage volumes,
  • packages,
  • and a great deal more

How much data is osquery collecting on average per endpoint each day? Where is this data stored?

We’ve observed osquery generating an average of 110 MB of data per endpoint, per day. Of course, your mileage may vary depending on the monitored asset's function and what data is being collected.

Osquery supports both an interactive query console (osqueryi) and a distributed host monitoring daemon (osqueryd) that can be used to schedule queries on a recurring basis. From a data storage perspective, the amount of data collected, combined with the desired retention period, will ultimately dictate your requirements. For example, osquery data can be:

1) not stored anywhere, i.e., queried only in real time via osqueryi, the osquery interactive query console/shell

2) stored in a SIEM (Splunk, ELK stack, etc.)

3) aggregated in a security analytics platform, like Uptycs, for integrated threat intel, correlations, anomaly detection, and more
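
To get a feel for how collection volume and retention period combine into a storage requirement, here is a minimal back-of-the-envelope sketch in Python. The 110 MB per endpoint per day figure is the average quoted above; the function name, the fleet size, and the retention period are assumptions made up for illustration, and the result is raw data volume only (no indexing or replication overhead).

```python
# Rough storage estimate. The 110 MB/endpoint/day average comes from this post;
# the fleet size and retention period used below are made-up example inputs.
MB_PER_ENDPOINT_PER_DAY = 110


def estimate_storage_gb(endpoints: int, retention_days: int,
                        mb_per_endpoint_per_day: float = MB_PER_ENDPOINT_PER_DAY) -> float:
    """Return the raw data volume, in decimal GB, held once retention is full."""
    total_mb = endpoints * retention_days * mb_per_endpoint_per_day
    return total_mb / 1000  # 1 GB = 1000 MB


if __name__ == "__main__":
    # Hypothetical fleet: 1,000 endpoints with 30 days of retention.
    print(f"{estimate_storage_gb(1000, 30):,.0f} GB")  # prints "3,300 GB"
```

Swap in your own measured daily volume once you have it, since the average will shift with the queries you schedule.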

What is a query pack?

A query pack is a group, or collection, of queries designed to accomplish a particular function or task. Query packs are typically configured with a specific run schedule to help avoid any potential impact to the host machine. For example, there are query packs for incident response, vulnerability management, known OS X malware, and more (you can find a full list here). It’s important to note that while query packs will help you collect organized sets of data, simply running a query pack for compliance, for example, does not mean you are compliant. It means you’re scheduled to collect the data required to answer questions about your compliance standing. You still need to review that data and understand what it means for your particular compliance standards and goals. Query packs are also written to work for the majority of environments, so the end user must confirm they are useful, determine whether any pruning is required, and weigh the performance impact. Commercial offerings (like Uptycs) provide this query pack optimization along with the analytical insight required to take action on the information the query pack yields (i.e., X machines are failing compliance check Y because of Z).
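
As a rough illustration of what a pack looks like, the sketch below builds one as a Python dictionary and prints it as JSON. The layout (a "queries" map whose entries carry a query, an interval in seconds, and optional platform and description fields) reflects our understanding of the osquery pack format, so check it against the official docs before relying on it. The pack contents, the chosen intervals, and the runs_per_day helper are hypothetical and not taken from any published pack.

```python
import json

# A hypothetical pack: the query names, SQL, and intervals are illustrative
# and are not copied from any published osquery pack.
example_pack = {
    "queries": {
        "listening_ports": {
            "query": "SELECT pid, port, protocol, address FROM listening_ports;",
            "interval": 3600,  # seconds between runs
            "description": "Ports with a process listening on them",
        },
        "kernel_modules": {
            "query": "SELECT name, size, status FROM kernel_modules;",
            "interval": 86400,  # once a day
            "platform": "linux",
            "description": "Loaded Linux kernel modules",
        },
    }
}


def runs_per_day(pack: dict) -> dict:
    """Map each query name to how many times it runs in 24 hours."""
    return {name: 86400 // q["interval"] for name, q in pack["queries"].items()}


if __name__ == "__main__":
    print(json.dumps(example_pack, indent=2))
    print(runs_per_day(example_pack))
```

Looking at runs per day for each query is one simple way to start weighing a pack's schedule against its likely impact on the host.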

What are some pros and cons of osquery?

In addition to the benefits of an open source universal approach, Trail of Bits shares that teams like osquery because it:

  • is simpler to use,
  • is more customizable, and
  • exposes them to new endpoint data to which they never before had access.


Some of the cons, or areas for improvement, include requests for more extensive documentation (especially on Windows), commercially available support, and continued expansion and parity for operating systems outside of macOS and Linux. In addition, even resource-rich security teams who’ve deployed a fully open source/DIY solution around osquery have learned that:

1) Cost of data storage can be high - for example, Elastic is 3x the cost of the bytes transmitted from osquery. On average, 110 MB is transmitted from each endpoint per day.

2) Translating the incremental data is hard - making sense of the information for vulnerability management, threat investigation, compliance and audits, etc. is quite complicated and requires more heavy lifting than initially anticipated.

3) Optimizing queries and query packs is critical - building your own query packs can have an impact on computing resources. Often, a query will pull far more system data than anticipated, and this can cause systems to crash. Yet testing and optimization before deployment is taxing and, in some cases, not possible.

4) Third party data is still needed - For threat or intrusion detection, integrated third-party data sets are still needed. 
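
To make item 3 above a little more concrete, here is a small sketch comparing how much data a broad query returns versus a narrowed one. It runs against an in-memory SQLite table that stands in for osquery's processes table; the rows are fabricated test data, only a few columns are mocked, and the result_size helper is hypothetical. It illustrates result volume only and does not measure the CPU or I/O cost a query has on a real host.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row

# Mock stand-in for osquery's processes table (only a few of its columns).
conn.execute(
    "CREATE TABLE processes (pid INTEGER, name TEXT, path TEXT, cmdline TEXT, uid INTEGER)"
)
# 100 fabricated rows; every third one is owned by uid 0.
rows = [
    (i, f"proc{i}", f"/usr/bin/proc{i}", f"/usr/bin/proc{i} --flag " + "x" * 50, i % 3)
    for i in range(1, 101)
]
conn.executemany("INSERT INTO processes VALUES (?, ?, ?, ?, ?)", rows)


def result_size(sql: str) -> int:
    """Return the size in bytes of the query result serialized as JSON."""
    data = [dict(r) for r in conn.execute(sql)]
    return len(json.dumps(data).encode("utf-8"))


# Every column of every row versus two columns of a filtered subset.
broad = result_size("SELECT * FROM processes;")
narrow = result_size("SELECT pid, name FROM processes WHERE uid = 0;")
print(f"broad: {broad} bytes, narrow: {narrow} bytes")
```

Selecting only the columns you need and filtering early is a common first step when pruning a pack, though the real cost of a query depends on how the underlying table gathers its data.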

For a more technical perspective on some of the current challenges, watch this video from QueryCon 18 where Teddy Reed, Facebook Security Engineering Manager, shares what keeps him up at night with osquery.

How do I get started and install osquery?

There are a few options for getting started with osquery. Before you begin, you’ll need to consider:

  • How you’ll configure, deploy, and manage the agent
  • How you’ll manage query packs and schedules as the community adds more
  • Where you’ll store osquery data (and how much it will cost)
  • How you’ll analyze the data - i.e., what problems are you looking to solve?
  • How you’ll handle suspicious activity that requires further investigation or remediation
  • Whether you need any integrations with existing tooling
  • How you’ll troubleshoot production issues and develop any custom functionality you may need

 

Deploying the Agent:

Download from osquery.io - you’ll find macOS, Linux, RPM, Debian, and Windows versions (you may find you need to customize your configuration). For example, here is a post from Joshua Brower at Defensive Depth that walks through Custom MSI Configs. We’ve created a few handy videos that walk through various installations over on YouTube.

Everything Else:

Build It: You can build an end-to-end solution yourself, pairing other open source and commercially available products with some custom developed functionality if time, cost, and resources aren’t limiting constraints. This post from Chris Long of Palantir provides a comprehensive playbook of how they constructed a solution for rapid incident response.

Buy It: You can sign up for a Free Trial of Uptycs - this will enable you to deploy the necessary versions of the osquery agent. You’ll also have access to the Uptycs security analytics platform which collects, aggregates, and analyzes osquery data for fleet visibility, intrusion detection, vulnerability management, incident investigation, and audit & compliance.


How do I ask questions or extract information from osquery?

Using SQL (Structured Query Language), and an understanding of the osquery tables where the data you require is stored, you can construct commands that return nearly any piece of information you desire about a single endpoint where osquery is running. Here is an overview of how to construct some of the most common SQL queries for osquery. If you want to query many (or all) of your endpoints at once, you’ll need additional instrumentation and a data store for aggregation across your infrastructure.

What are some basic SQL commands I need to know?

Many engineers and developers have used SQL before, but still find a refresher helpful. We also hear that the way SQL is applied and optimized for osquery is slightly different from how they have used SQL before. For example, joining tables, using count or limit functions, and filtering results are popular given the volume of fleet-wide data you’ll be working with in osquery.

Two of the most common and basic queries are:

  select * from uptime;
  select * from users limit 5;

Check out this post on SQL introduction for osquery to learn more common queries and how to join tables, filter results, and more.
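
If you don't have osquery installed yet, you can still practice the SQL patterns mentioned above (limit, filtering, count, and joins). The sketch below uses Python's built-in sqlite3 module with small mock tables named after osquery's users, processes, and listening_ports tables. The rows are fabricated and only a handful of columns are included, so treat it as a practice sandbox and not as real osquery output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Mock stand-ins for osquery tables, with a small subset of their columns
# and fabricated rows.
conn.executescript("""
CREATE TABLE users (uid INTEGER, username TEXT, shell TEXT);
CREATE TABLE processes (pid INTEGER, name TEXT, uid INTEGER);
CREATE TABLE listening_ports (pid INTEGER, port INTEGER, address TEXT);
INSERT INTO users VALUES (0, 'root', '/bin/sh'), (501, 'alice', '/bin/zsh'), (502, 'bob', '/bin/bash');
INSERT INTO processes VALUES (1, 'launchd', 0), (210, 'sshd', 0), (344, 'nginx', 501), (512, 'python', 502);
INSERT INTO listening_ports VALUES (210, 22, '0.0.0.0'), (344, 80, '0.0.0.0'), (344, 443, '0.0.0.0');
""")

# LIMIT: cap the number of rows returned.
first_two_users = conn.execute(
    "SELECT username FROM users ORDER BY uid LIMIT 2;"
).fetchall()

# WHERE: filter to processes owned by uid 0.
root_procs = conn.execute(
    "SELECT name FROM processes WHERE uid = 0 ORDER BY pid;"
).fetchall()

# COUNT: how many listening ports are there?
port_count = conn.execute("SELECT COUNT(*) FROM listening_ports;").fetchone()[0]

# JOIN: which process is listening on which port?
listeners = conn.execute("""
SELECT p.name, lp.port
FROM listening_ports AS lp
JOIN processes AS p ON p.pid = lp.pid
ORDER BY lp.port;
""").fetchall()

print(first_two_users)  # [('root',), ('alice',)]
print(root_procs)       # [('launchd',), ('sshd',)]
print(port_count)       # 3
print(listeners)        # [('sshd', 22), ('nginx', 80), ('nginx', 443)]
```

The same SQL statements can be pasted into osqueryi, where the real tables have more columns and reflect the live state of the machine.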

Have another question we can help answer? Know of a great resource we should include? Let us know in the comments below.
