Osquery in action: Where and when to apply "threat intel"
So, what does threat intelligence mean? Ask a variety of people, and they will give you a variety of responses -- IOCs, IOAs, File Hashes, Signatures, Bad IPs, Bad Domains, C2 servers . . . Most of what people consider “Threat Intel” are lists of artifacts shared information security companies, government agencies, or other entities trying to protect customers, citizens, or organizations from various threats on the internet.
Categories of threat artifacts
The artifacts tend to fall into several different categories. I “grew up” with the original use of the term "Indicators of Compromise" coined at Mandiant and worked a lot with the OpenIOC community, so that's mainly the way I think of Threat Intelligence. In these cases, IOCs were collections of indicators that could be logically grouped and evaluated, and if the grouping (when compared against things in your organization) evaluated to "true," you had something which was worthy of further investigation. Groupings of terms that have to be evaluated together are what I call "Complex" or “Classic” indicators, because that's what an IOC was before the term became hugely overloaded in the security marketing space. However, you could have an IOC with only one entry in it -- it just wasn't usually a very good IOC.
Simple artifacts in isolation (the "one entry") are what I call "Atomic" Indicators -- these are often things that are easy to match, but carry little context -- and while some of them can be very specific and never "wrong" (like an MD5 file hash), they can be "brittle" (which means that they are easy for an attacker to change). These are things like IP addresses, file hashes, other file signatures, malware and botnet C2 domains -- and often come in very large lists, as this approach is trying to catalog "all the bad things we have ever seen in this very narrow context." These lists are huge, hard to maintain, and nearly impossible to send all of to an endpoint with any sort of real efficiency.
Without having a staff of intelligence analysts (a position that is hard to staff and retain), maintaining a large body of IOCs is not an easy task. Most IOCs are for threats that are “already known,” and unless you have analysts cataloging and vetting IOCs, you are blindly placing your trust in the organization publishing them, which may be republishing from somewhere else, and so on. It is not common, but there are documented cases where what should be “reputable sources” have published parts of internet infrastructure that are seen everywhere (such as Google’s name servers) because they were included in activities that malware conducted, and that was not caught in the creation of the initial report.
In most cases, you have a better resource for protecting your network than IOC feeds -- your own staff! You know your environment, you know what belongs versus things that are suspicious, you know what things you see regularly, and what things you don’t. Quickly spotting an anomaly on your network and knowing it should not be there, even if you don’t know what it is, is often way more valuable than an out of context IOC hit. However, this doesn’t work if you don’t have a way to look for those anomalies at scale.
Threat intelligence at scale
This is where we move to talking about osquery. By itself, osquery is a really neat project that allows you to virtualize an endpoint as if it were a SQL database of information, instead of having to run and remember hundreds of different system utilities. You can ask questions with queries, and schedule questions with query packs. However, what you really need is a way to deploy and manage osquery at scale. This involves not only osquery, with it’s ability to ask questions and get answers, but some type of endpoint manager to communicate with the endpoints. You’ll also need a data bus to transfer data to processing and storage, and a data store (or many) to keep data from osquery long term for further questions as well as current and historical analysis. A properly configured and managed osquery system is a great way to establish the baselines of what should be on your network, so you can look for those anomalies we talked about.
So, let’s put this all together. With a proper osquery deployment, you have endpoints running osquery, an endpoint manager, and a data bus and data storage. For the sake of this exercise, with Threat Intelligence, you have “Complex Indicators” (a set of terms that logically evaluate to true if there is something bad or suspicious on a system), “Atomic Indicators” (single terms that evaluate to true if there is something bad or suspicious), and baselines and anomalies that you find on your own systems, regardless of IOCs.
Query Packs in osquery allow you to ask questions. Those questions can be as simple a set of SQL as
SELECT * FROM processes
This would give you back a list of processes, and asking this question regularly (and many more) is how osquery establishes a baseline and historical data for your organization’s endpoints.
However, the SQL could also look something like this:
SELECT * FROM file
WHERE path = ‘/Applications/Firefox.app/Contents/Resources/script’
This is basically the same as a Complex IOC -- if malware has written one of the files above, there is likely a problem with that system, and you should investigate. If there is nothing there, the statement will return nothing, and you don’t have to worry about this particular threat.
You don’t want to push a lot of these to an endpoint, but sometimes you have to take this approach with osquery. There are some tables that require specific paths or keys, especially requests involving files (because a query that tried to retrieve the entire filesystem would be very problematic).
So, now we can run Complex IOCs as query packs on endpoints. Now on to figuring out how we do the Atomic Indicators -- lists of “bad” IP addresses can number in the thousands, and “bad domains” can number in the tens or hundreds of thousands, and if you could aggregate all of them, there would probably be even more. How in the world can you check for these? There’s no way to pass that to an endpoint and maintain it efficiently.
However, with a proper osquery “at scale” deployment, we don’t have to. When data is being written up to the cloud (or your server cluster), you have the ability to do stream processing while it is being passed on the data bus to the storage mechanism. One example of something like this is an AWS Lambda function, but you don’t have to use AWS or any specific provider. You can examine data as it passes by on the bus, and create alerts based off of matching conditions (like, say, an Atomic IOC -- it’s very easy to compare an IP address or domain name to a matching field in the data stream at high speed). “In the cloud” (or your datacenter) is a great place to put large lists of indicators -- you have a lot more processing power, storage, RAM, etc, and you also keep your indicators to yourself -- you are not exposing them by pushing them down to the endpoint. And by processing on the stream, you can alert in almost real time to things that require rapid reactions.
We haven’t even begun to get to the cool part. If you are properly storing data from osquery endpoints and you aggregate that all together in a data storage system, so that you can also query over time or historically, there is an amazing amount of things you can do with that data. You can create baselines for your enterprise by looking at the data you have collected, but also see outliers at the same time -- or new ones when they occur. If you do have an anomaly, or get an IOC hit, you can start cross-referencing data on that host, and then start looking backward in time to other events or anomalies that might be related on that host or any of your other hosts. But unlike a SIEM, at any point in your investigation, you can go back and use osquery to check an endpoint in real time to find out its current status as well.
So, we’ve looked at how there are several types of Threat Intel, and how they can be harvested from many different sources. If you don’t have a dedicated staff to manage IOCs, your most important “Intel” likely comes from within -- you want to make sure you catch anomalies and outliers on your network that are going to be unique to your environment as well as looking for IOCs of things that others have seen before. If you do this with the right style of osquery deployment, you have several injection points for this Threat Intel, the flexibility to easily write your own in SQL and deploy them quickly, and the ability to do real time and historical investigation with the same system.
I hope this has been useful -- if you want, you can see the video version of this that I recently gave at the MacDevOps Vancouver event below. And, if you have any questions about how to efficiently deploy and benefit from osquery at scale with Uptycs, don’t hesitate to drop us a line or book a demo.
Related osquery resources: