People familiar with osquery know that it’s a powerful visibility tool. What’s even better is that osquery extensions enable you to broaden its functionality to solve complex problems.
With the recent spate of vulnerabilities in Java software libraries, Uptycs developed an open source osquery extension. It adds a table in the schema to gather Java software information, which is extremely helpful when trying to answer questions about your Java-based assets.
When a new vulnerability is announced, your questions need to be answered quickly: ‘Which hosts does it affect?’ ‘Which packages are we running?’ ‘Does this vulnerable file exist in any of our packages?’
At osquery@scale 2022 Uma Reddy, chief product officer, and Anadi Sharma, principal software engineer on the Uptycs osquery team, demonstrate how Uptycs responded to some recent vulnerabilities in libraries such as Log4j and the Spring Framework. They used osquery extensions to sleuth out the vulnerabilities in their own environment and share the extended functionality with the community.
Uma: My name is Uma Reddy. I'm responsible for product and engineering at Uptycs. And I have Anadi Sharma with me, who is an Uptycs software architect. He does a lot of work on the osquery side. We're going to talk about how osquery extensions can be used to solve really complex problems.
"Osquery can help you solve really complex problems."
With osquery you run a query, get data, collect it on a periodic basis, and you can do a real-time query. It's a visibility tool that maybe provides some insights. But what people don't realize is that using osquery can help you solve really complex problems—which even the security vendors you have here today really cannot accurately solve, or at least not without false positives.
So we'll give you an example of a vulnerability all of us are familiar with, which happened in December of 2021. And we’ll walk you through how osquery extensions came to the rescue and how they pinpointed exactly where you needed to pay attention—not looking through your entire file system for every place where that particular package exists, but only those that are really vulnerable. Anadi, do you want to talk about extensions a little bit?
Anadi: Thanks, Uma. Osquery extensions leverage the fact that osquery is a very extensible and configurable framework. And with extensions, you can add additional functionality. As Uma said, you can extend osquery to solve those problems that are really complex that might not always be a part of the platform.
And then there are niche problems. You want to see which classes are in a Java package, or which packages are loaded by Java processes. Those are some things that might not be a part of the actual osquery solution, and they don't make sense there either.
Extensions come to the rescue because you can very quickly deploy something that asks only the specific questions you're having and plug them into the osquery framework. Solve specific problems. In a parallel universe, you might want to know which .NET assemblies are loaded, or which JAR files are running. One advantage of extensions is that they integrate seamlessly with existing osquery queries. You can run the time query, plug it into your extension, and do amazing things with it.
Similarly, extensions are good because you can write them not only in C/C++, but also in Python and Go. So you don't have to learn C/C++ to write extensions. And you can experiment with something that you want to do with osquery without breaking it. If something crashes, only the extension process crashes. So you can do some development with the help of extensions and see what that experiment is going to look like when it becomes a part of osquery.
How Does an Extension Work?
Anadi: An extension basically runs as a separate process, which means if you break the system, you break the extension process—you don't break the worker process.
Execution – Started and Watched by Watcher
Anadi: You can take all the memory you want, but Watcher comes in and restarts your extension if it takes more memory or more CPU than the Watcher limits permit. But it also takes advantage of everything good the worker has to offer. That happens via a remote procedure call between the worker and extension processes, so you get the best of both worlds. You get your extensions to do what you want them to do. Of course, there is a registration step, and then there is an RPC involved between the worker and the extension process.
Anadi: To compile extensions, you take open source osquery, which gives you the extension SDK. That SDK has header files and other components that become a part of your extension build process. When you compile your extension with the SDK components provided by open source, the Thrift IPC stubs get linked into the binary, so you don't really have to do a lot of work. You just have to write the business logic and the registration function.
The registered classes implement the business logic, and everything else is taken care of behind the scenes. There's a lot of boilerplate code that you can look at to write an extension, for example one that finds out what the time might be in Australia right now. So we really like how extensions can be used to solve difficult problems. Uma is now going to talk about how we did that with the Log4J vulnerability.
"We looked at the osquery and looked at what the Log4J problem was about."
Uma: All right. I don't know how many of you remember the 10th of December, 2021. That was the day Log4J came out. At 10 a.m., people were pulled out of meetings and told, "Hey, this is a problem." We got calls from our customers saying, "How can you help us right away?" Everybody was on deck, asking, "How do you solve this problem?" And so we looked at osquery and looked at what the Log4J problem was about.
Uma: So you look for a JAR file that has a specific version of the log4j-core JAR file. If it was 2.14 or less, you were vulnerable. And immediately, we looked at the production data, and could really see it happening because in our own production cloud our NGINX logs were constantly outputting those strings that people are using. Everybody was doing it. We were using NGINX. We were safe. There was no Log4J, so we were protected.
“Typical vulnerability management software looks at package versions and says, ‘Okay, this package is greater than this package or less than this version, so you're vulnerable.’ It was not that easy here.”
But a lot of our customers using Apache and others were vulnerable. Now, typical vulnerability management software looks at package versions and says, "Okay, this package is greater than this version or less than this version, so you're vulnerable." It was not that easy here. If you just use traditional scanners, you scan your disk and you say, "The Log4J JAR is all over the place. It is found in many, many folders. So now, how do I prioritize? Where do I fix? And how do I get started?" And it's already being exploited today by threat actors.
Smart Indicators with Osquery
“We can do something clever with osquery, because with osquery, you can do very smart things. We call this smart indicators. Osquery can give you deep visibility into anything happening in the operating system.”
Uma: The first thing was you could shut your network off for some time, as long as your business would allow it so that nobody else could come in, and then take your time to solve the problem. But then we said, "Okay, here we can do something clever with osquery," because with it you can do very smart things. We call this smart indicators. Osquery can give you deep visibility into anything happening in the operating system.
So now I can write an extension, go in there and look at the files and the running processes, find the Java processes, see which of them have a JAR file open, see if any file actually contains the log4j-core JAR file, look at the metadata there, and ask: what is the version of the JAR file? Is it 2.14 or less? And then you can go even further and ask: does that JAR really have the JndiLookup class?
One of the first things folks started doing was go and pull that JndiLookup class out. And in the majority of the cases, it was of no use. It was just sitting there in the class. Nobody was using network communications to use that class anyway. So they pulled it out, repackaged the JAR, redeployed the container, and they were safe. But they were still running 2.14 version of the Log4J JAR.
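The check for a stripped-out class can be sketched in a few lines of Python. This is only an illustration of the idea, not the Uptycs extension itself: the function names and the in-memory demo JAR are hypothetical, and the class path is the location where log4j-core normally ships JndiLookup.

```python
import io
import zipfile

# Usual location of the class inside a log4j-core JAR.
JNDI_LOOKUP_CLASS = "org/apache/logging/log4j/core/lookup/JndiLookup.class"


def has_jndi_lookup(jar_file) -> bool:
    """Return True if the archive still contains the JndiLookup class.

    jar_file may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(jar_file) as jar:
        return JNDI_LOOKUP_CLASS in jar.namelist()


def make_demo_jar(include_jndi: bool) -> io.BytesIO:
    """Build a tiny in-memory stand-in for a log4j-core JAR (demo data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        jar.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
        if include_jndi:
            jar.writestr(JNDI_LOOKUP_CLASS, b"")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(has_jndi_lookup(make_demo_jar(include_jndi=True)))
    print(has_jndi_lookup(make_demo_jar(include_jndi=False)))
```

A JAR that has been repackaged without the class reports False here even though its version number is unchanged, which is why a version comparison alone would still flag it.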
Differentiating Java Archive Files and Structures
Uma: Now, when our customers pulled out the traditional report, the problem with traditional scanners was this: if you had 10,000 hosts, you went in as fast as you could, one by one or in parallel with as many threads as you could run, and pulled the data out. It took days to gather the data, and then you mostly got garbage: JAR files that developers had left lying around, which no running program was using.
JAR (Java ARchive)
Uma: So we'll talk a little bit about the JAR file. There's a JAR format, a WAR format, and an EAR format, so there are multiple formats of Java archive files, and you have to look at each of these. A traditional scanner could never do this. A JAR is an archive format where you archive the different classes together, along with the metadata to go with them, and create a JAR that is used in the Java program. So that is the JAR, and on to the next one.
WAR (Web Application Resource)
Uma: We also have WAR files. A WAR is essentially a collection of various JARs put together into another file, called a WAR file. So it's not sufficient to look at a JAR file; you have to look at WAR files too.
Uma: And then you have shaded JARs, where you pull out all the classes you need, package them all together into one large file archive, and now that's your shaded JAR.
Uma: And then there are EAR files. So it's not just JAR: you have to look at all four of these file formats, and look inside to see if you're vulnerable.
Uma: And the EAR is a collection of WAR files.
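Because these formats nest (an EAR holds WARs, a WAR holds JARs), any inspection has to recurse into archives found inside archives. The following Python sketch shows one way to do that with the standard library. The function names, the `!/` path separator, and the depth limit are illustrative choices, not details taken from the talk.

```python
import io
import zipfile

ARCHIVE_SUFFIXES = (".jar", ".war", ".ear")


def walk_archives(data: bytes, name: str, max_depth: int = 5):
    """Yield (nested_path, entry_names) for an archive and each archive inside it."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = archive.namelist()
        yield name, names
        if max_depth <= 0:
            return  # stop descending; guards against pathological nesting
        for entry in names:
            if entry.lower().endswith(ARCHIVE_SUFFIXES):
                inner = archive.read(entry)
                yield from walk_archives(inner, f"{name}!/{entry}", max_depth - 1)


def build_zip(entries: dict) -> bytes:
    """Build an in-memory ZIP from {entry_name: content} (demo data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        for entry_name, content in entries.items():
            archive.writestr(entry_name, content)
    return buf.getvalue()


if __name__ == "__main__":
    jar = build_zip({"org/example/App.class": b""})
    war = build_zip({"WEB-INF/lib/app.jar": jar})
    ear = build_zip({"web.war": war})
    for path, _ in walk_archives(ear, "demo.ear"):
        print(path)
```

A shaded JAR needs no recursion, since its classes are copied directly into one archive, so the same entry listing covers it at the top level.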
Structure of a JAR File
Uma: We have Java developers in-house, because we use Java internally. So the developers said, "Here is what it looks like. Osquery developer, look at this. This is what you would find in a JAR file. You would look at the POM file, and what you see there is the version number, like 2.17.2, and the bundle name, Apache Log4j Core. So look for that."
Embedded JARs (WARs)
Uma: You need to go in the JAR file, look at that metadata, look at that information, and return that in a table in osquery saying that when you say, "Select star from this Java Packages table, and give the WAR file or the JAR file as input," it's going to list all of the bundles there and their versions. That was one query we developed. So once the osquery developer saw that this was a format, "Sure, I will develop a query to do that." And so we developed a query to go pull that out. And all we needed to know was how a JAR file or a WAR file is structured, which our Java developers helped in writing. This is some additional detail on the metadata.
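As a rough illustration of turning JAR metadata into a table row, the Python sketch below reads the bundle name and version from `META-INF/MANIFEST.MF`. It is an assumption-laden stand-in: the talk mentions the POM file, the real extension may read different metadata, and the row keys (`path`, `name`, `version`) are hypothetical column names rather than the actual Java Packages schema.

```python
import io
import zipfile

MANIFEST_PATH = "META-INF/MANIFEST.MF"


def parse_manifest(text: str) -> dict:
    """Parse the main section of a JAR manifest into a dict of headers."""
    headers = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the main section
        if line.startswith(" ") and key:
            headers[key] += line[1:]  # continuation of the previous header
        elif ":" in line:
            key, _, value = line.partition(":")
            headers[key] = value.strip()
    return headers


def java_package_row(jar_file, path: str) -> dict:
    """Return one table-like row describing the bundle in a JAR."""
    with zipfile.ZipFile(jar_file) as jar:
        if MANIFEST_PATH not in jar.namelist():
            return {"path": path, "name": "", "version": ""}
        manifest = jar.read(MANIFEST_PATH).decode("utf-8", "replace")
    headers = parse_manifest(manifest)
    return {
        "path": path,
        "name": headers.get("Bundle-Name", ""),
        "version": headers.get("Bundle-Version", ""),
    }


def make_demo_jar() -> io.BytesIO:
    """In-memory JAR with a minimal manifest (demo data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        jar.writestr(
            MANIFEST_PATH,
            "Manifest-Version: 1.0\r\n"
            "Bundle-Name: Apache Log4j Core\r\n"
            "Bundle-Version: 2.17.2\r\n\r\n",
        )
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(java_package_row(make_demo_jar(), "/opt/app/lib/log4j-core.jar"))
```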
Uma: So we went ahead and wrote an extension to look inside JAR files.
“Out of the box, there was nothing in the osquery to go look inside a JAR file. So what we did was develop an extension to determine the version and look inside and see if a JndiLookup class is present in that particular JAR file.”
Uma: So we saw that, out of the box, there was nothing in the osquery to go look inside a JAR file. So what we did was develop an extension to determine the version and look inside and see if a JndiLookup class is present in that particular JAR file. And that's all we had to do. And then deploy that extension to our collection of servers.
Uma: So here is what that extension did. First it enumerated all of the running Java processes, and then it found which of these processes had JAR, WAR, or EAR files open. These are all very simple, trivial queries in osquery. So, select star from process open files, where the executable name is Java and the path ends in WAR, EAR, or JAR. Once that happens, you have the JAR file name. You take that and feed it into the new table we wrote, the Java Packages table: "Here is a JAR file. Mr. Java Packages, tell me all the contents of that." That was the new query we wrote in osquery.
It returned a list of all the Java, you know, packages in that JAR file. So once you have that, you saw your version. Then you could run another query to go and take the subset of those JARs and say, "Does this JAR file have the JndiLookup class?" And then it came back and said, "Yes, it does, or does not." So you have a complete in-depth answer, give you a prioritized list of all of the JARs and WARs you have to take care of right away, which host has WAR open? Shut off the process, or go in and fix that particular JAR or WAR. Do that first, and then when you have time, you go in, scan your entire disk and get rid of all of the developer-left JAR and WAR files in your file systems.
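The shape of that join can be tried out with Python's built-in `sqlite3` module and mock tables. `processes` and `process_open_files` are real osquery tables, reduced here to the columns used. `java_packages` and its columns (`name`, `version`, `has_jndi_lookup`) are hypothetical stand-ins for the extension table, so treat this as a sketch of the approach rather than the query Uptycs shipped.

```python
import sqlite3

# Join running Java processes to the archives they hold open, then to the
# (hypothetical) java_packages table keyed by archive path.
QUERY = """
SELECT p.pid, pof.path, jp.name, jp.version
FROM processes AS p
JOIN process_open_files AS pof ON pof.pid = p.pid
JOIN java_packages AS jp ON jp.path = pof.path
WHERE p.name = 'java'
  AND (pof.path LIKE '%.jar' OR pof.path LIKE '%.war' OR pof.path LIKE '%.ear')
  AND jp.name = 'Apache Log4j Core'
  AND jp.has_jndi_lookup = 1
ORDER BY p.pid
"""


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with mock rows (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE processes (pid INTEGER, name TEXT);
        CREATE TABLE process_open_files (pid INTEGER, path TEXT);
        CREATE TABLE java_packages (
            path TEXT, name TEXT, version TEXT, has_jndi_lookup INTEGER
        );
        """
    )
    conn.executemany(
        "INSERT INTO processes VALUES (?, ?)",
        [(100, "java"), (200, "java"), (300, "nginx")],
    )
    conn.executemany(
        "INSERT INTO process_open_files VALUES (?, ?)",
        [(100, "/opt/app/app.war"), (200, "/opt/svc/svc.jar"), (300, "/var/log/access.log")],
    )
    conn.executemany(
        "INSERT INTO java_packages VALUES (?, ?, ?, ?)",
        [
            ("/opt/app/app.war", "Apache Log4j Core", "2.14.1", 1),
            ("/opt/svc/svc.jar", "Apache Log4j Core", "2.14.1", 0),  # class removed
            ("/home/dev/old.jar", "Apache Log4j Core", "2.10.0", 1),  # not open anywhere
        ],
    )
    return conn


def find_exposed(conn: sqlite3.Connection) -> list:
    """Return (pid, path, name, version) for processes holding an exposed archive."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in find_exposed(make_demo_db()):
        print(row)
```

Only the process that actually has an archive containing the class open is returned. The patched JAR and the leftover file on disk drop out, which is the prioritization the talk describes. The version filter is left out of the SQL because comparing dotted version strings as text is unreliable.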
Finding Vulnerabilities Across Your Fleet
Uma: So, we finally wrote one query which found all the running processes, which have WAR and JAR open? And which of those JARs and WARs have a JndiLookup class inside them? So now, with that one query, we could accurately point out where to go and fix. Now, osquery also helps in scaling this out to tens of thousands of endpoints, or even millions of endpoints. All you do is take this query, put it in a query pack, and then use your fleet management software to push this query to all of the endpoints.
“The beauty is, in 15 minutes, you could do 10 endpoints, you could do 1,000, you could do 1 million.”
Typically, when you have fleet management software, the endpoint checks in periodically, once every few minutes, you give it the configuration, and then it runs the query pack on the endpoint. You collect all that data in the cloud in the fleet manager. Then you feed it into your SIEM or wherever, and you immediately have a report. So the beauty is, in 15 minutes, you could do 10 endpoints, you could do 1,000, you could do 1 million.
It comes down to the capacity of your SIEM to ingest all that data and then process it. Now, there are commercial solutions out there that took days to do the same thing. If you have 10,000 or 100,000 endpoints, they go after them in parallel, scan the entire disk, and get a lot of garbage back that you really don't have to pay attention to immediately, and they didn't have a way to prioritize. So some of our customers used this way of detection to prioritize. And this is fully supported in open-source osquery. All you have to do is write an extension, have a C++ developer, and have someone who understands Java to help you do this. Then you need to understand query packs, distribute them, and report on the results, as simple as that. And this whole thing was such a moving target, 2.14, 2.15, 2.16, 2.17, then 2.17.1, 2.17.2, and all you had to do was go and change your query.
Now, on December 14th, it was 2...I think it's 15, right? And then, so you just change it to 15 or less. Then to 16 or less. And, so it was a very quick way to change. And you, again, run it in 15 minutes, you run the query through a distributed query pack, and you got the answer back. So you're up to date within a few hours of somebody announcing a new way of how 2.16 is vulnerable. You are able to detect that.
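The "15 or less, then 16 or less" adjustment amounts to comparing dotted version numbers component by component against a threshold that keeps moving. Here is a small Python sketch of that comparison. The function names are hypothetical, and the thresholds only mirror the numbers mentioned in the talk; they are not an authoritative list of affected releases.

```python
import re


def parse_version(text: str) -> tuple:
    """Turn the leading dotted-numeric part of a version string into a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no numeric version in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def at_or_below(version: str, threshold: str) -> bool:
    """True if version is <= threshold, compared at the threshold's precision.

    With threshold "2.14", every 2.14.x release counts as "2.14 or less".
    """
    limit = parse_version(threshold)
    found = parse_version(version)
    return found[: len(limit)] <= limit


if __name__ == "__main__":
    for threshold in ("2.14", "2.15", "2.16"):
        flagged = [v for v in ("2.14.1", "2.15.0", "2.16.0", "2.17.2")
                   if at_or_below(v, threshold)]
        print(threshold, flagged)
```

Comparing integer tuples avoids the trap of plain string comparison, where "2.9" would sort after "2.14".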
So the perception is that osquery is a tool for visibility, or for getting a little bit of metrics and data. People don't understand that it can stack up against the big vendors out there in vulnerability management or threat detection. You build a query pack, you collect the responses, and you report on it. All this could be done in 30 minutes across 10,000 or 100,000 hosts. The number doesn't matter.
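A query pack is a JSON document that maps query names to a SQL string and a run interval in seconds. The Python sketch below generates one. The pack layout (`queries`, `query`, `interval`, `description`) follows the osquery pack format, while the query name, the 900-second interval, and the `java_packages` table with its `has_jndi_lookup` column are assumptions made for illustration.

```python
import json

# Hypothetical detection query; java_packages stands in for the extension table.
QUERY = (
    "SELECT p.pid, pof.path, jp.name, jp.version "
    "FROM processes p "
    "JOIN process_open_files pof ON pof.pid = p.pid "
    "JOIN java_packages jp ON jp.path = pof.path "
    "WHERE p.name = 'java' AND jp.has_jndi_lookup = 1;"
)


def build_pack(query: str, interval_seconds: int = 900) -> dict:
    """Return a pack with one scheduled query (900 seconds is 15 minutes)."""
    return {
        "queries": {
            "log4j_exposed_processes": {
                "query": query,
                "interval": interval_seconds,
                "description": "Java processes holding an archive with JndiLookup open",
            }
        }
    }


if __name__ == "__main__":
    # The rendered JSON is what fleet management software would distribute.
    print(json.dumps(build_pack(QUERY), indent=2))
```

When the affected version range changes, only the query string in the pack needs to be edited and redistributed. The extension binary on the endpoints stays the same.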
Spring4Shell and Spring Cloud
Uma: And come January, February, I don't know exactly when, there were more vulnerabilities. Java is such a mess; every day you're finding new things. But by now we had built the tool. With Spring4Shell and Spring Cloud, it was just a matter of hours for us to first understand what they were about and then work out how to detect them. So, once you build that ability to look in the Java Packages table, you have coverage for the Java vulnerabilities that are going to come in the future.
There are some variations. One of these required you to find the version of Java running. That's very interesting. It's not found in the command line. You need to find the version of Java, whether it's 8 or...I don't remember. So there were some additions we had to make, and the extension could accommodate that. But basically you have a good base for Java vulnerabilities going forward if you took this extension approach and built it yourself.
And it's quick to do. You don't have to worry about the endpoint performance being impacted because you're doing this in a hurry. It's a process which runs along the side. And, you know, if there is a problem, you know, that particular process crashes or...nothing to do. Your primary osquery is not impacted in any significant way.
Uma: So what we have done is take this code, which Anadi built, and put the extension code out in our Uptycs lab repository. You can go and click on that, download the extension code, use it, and give us feedback on how it works. The point we're trying to make here is that the model of osquery and extensions is where the power of osquery comes from. In extensions, you can write any code you want. So if you're a software developer, you can go and find anything in the operating system, return it through a query pack to your SIEM, and then do your analysis.
Question: What would you say is the roundtrip time between when you decided, ‘Maybe we can do this in osquery,’ to having a working prototype? And then what was the time between having the prototype and being able to communicate that to your customers? That is, how easy was it to roll out this extension?
Uma: Ryan, our VP of engineering, had an intern work on the Java packages table the previous summer. That's what the intern said, "Whoa—we already have the code in there." All depends on the skill of the programmer and their knowledge. Ryan tells me the intern took two to four weeks to write it.
If I had done it, it would have taken two days. But in the summer of '21, we predicted the problem, so it was a fun project for an intern.
Question: Between C++, Python, and Go SDKs for osquery, with which have you or your customers found the most success for deploying extensions or something else?
Uma: C/C++ is the most. The other thing is, if you talk about Uptycs osquery, we sometimes avoid extensions because an extension is one more process, and there is some pushback from customers saying, "Why do you want another process?" The code is written the very same way; it's how you compile it. You bind it in or you leave it out, and osquery allows you to do either. You can put it into the same binary, or you can leave it out and talk over RPC. We use the approach of binding it into one binary, but that's our solution. For general purposes, you could just write your own extension and run it as its own process.
Question: Uma, taking the Log4J example, as you mentioned, it was very, very fast-moving, and versions, and some other aspects of the vulnerability itself had changed. How easy was it to not update your extension, but to kind of deploy this extension at scale to your customer environments? Is it like the osquery agent itself needs to be updated all the time or is it like a micro, in this case, potentially a version is an artifact, is a property that could be updated in a localized manner and so on?
Uma: So, we had to update the binary once. Osquery had to be updated once. Some customers distribute it using Puppet or their own ways of doing it. And there are also autoupdate capabilities where you can automatically update from our cloud. But after that, the change in version was a query change. So you change your query pack. That's all you deployed again. The fundamental nature of the problem did not change so much in a month that it needed a new binary. It was queries.
Question: When you compile it into, say, one executable, does it still have the protections you showed us in the beginning where it's a separate process that...you know, how do you deal with that situation? It sounds like if it's one executable, and it does bad things, you're in trouble?
Uma: Yes. Correct. We're very cognizant about that. The fortunate thing is we have a large team of quality engineers, and automation tests of performance, and all of that which can run overnight across eight different operating systems, and tell us what the impact is. So that's how we were doing it. But if you didn't have those resources, extension is a good approach where you can feel a little bit safer that your core osquery is not impacted.