Detecting malicious packages in repositories like PyPI: Using osquery for complete software inventory
Many systems make installing third-party software incredibly convenient, from packaging systems and well-loved Linux distribution tools like Debian's APT to app stores and per-language repositories. Users are also often allowed to install browser extensions or plugins, which come from their own “store” and are just another type of software. For these reasons, and with containers adding yet another layer, maintaining a software inventory that lets you identify dangerous packages has become harder to do, but more critical to accomplish.
Research shows, time and time again, that threat actors will go the distance and either push malicious packages with names similar to legitimate ones, or attempt to make existing legitimate packages malicious by attacking their source code repositories or by buying them from legitimate maintainers. Supply chain attacks are difficult for even the best-prepared organizations to defend against, and that’s on top of all the vulnerabilities that occur naturally during legitimate software development cycles.
In recent news, ReversingLabs discovered many such packages in the Python repository, PyPI. Packages such as libpeshnx, which itself was being installed 82 times a month, were discovered with backdoors.
About a year ago, npm suffered the event-stream incident, where an attacker impersonated the maintainer of the package to inject code in event-stream which would then detect running cryptocurrency wallets and steal the private keys, offloading them to an external server.
All repositories are amazing targets for attackers. The software they host is usually trusted, and one command away from being installed.
This timeline video represents a very small sample of events that occurred in the last few years with popular software repositories:
(note: video has no sound)
Full software inventory with osquery
Everyone knows it is important to stay on top of what packages are installed where, but having a strategy for achieving that isn’t always easy.
Software inventory is not only the second of the CIS Critical Controls, it is also a part of the Identify function of the NIST Cybersecurity Framework and many others. It is only logical, since we allow all kinds of software to be executed in privileged locations, that knowing what we have is the first step to securing the environment.
By looking at the osquery schema, we can easily identify tables that will be useful for this task, which itself is broken down into three main pieces.
- Inventory of "standard" software packages on all OSes
- Inventory of software installed with third-party package managers like pip (PyPI) or npm
- Tracking browser extensions and plugins
|Table|Description|
|---|---|
|apps|macOS applications installed|
|deb_packages|Debian packages installed on Linux distributions like Debian or Ubuntu|
|portage_packages|Gentoo portage packages|
|programs|Applications installed on Windows, typically those found in Add/Remove Programs|
|rpm_packages|RedHat / CentOS RPM packages|
These tables all serve the same main purpose, which is to find out what software is installed using the standard package manager or package format of an operating system. While many tools can inventory Windows and macOS this way, osquery shines by letting you perform this software inventory the same way on Ubuntu, RedHat, CentOS or even Gentoo and FreeBSD!
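As a rough sketch of what "the same way" can look like in practice, the Python snippet below maps each platform to an osquery SQL statement that returns comparable name and version columns. The column names (for example `bundle_short_version` in `apps` and `package` in `portage_packages`) reflect our reading of the osquery schema and should be verified against the schema for your osquery version. `INVENTORY_QUERIES` and `query_for` are hypothetical names used only for illustration.

```python
# Hypothetical mapping from a platform label to an osquery SQL statement.
# Column names are assumptions based on the osquery schema; verify them
# against the schema published for your osquery version.
INVENTORY_QUERIES = {
    "macos": "SELECT name, bundle_short_version AS version FROM apps;",
    "debian": "SELECT name, version FROM deb_packages;",
    "redhat": "SELECT name, version FROM rpm_packages;",
    "gentoo": "SELECT package AS name, version FROM portage_packages;",
    "windows": "SELECT name, version, publisher FROM programs;",
}


def query_for(platform: str) -> str:
    """Return the inventory query for a platform label (case-insensitive)."""
    try:
        return INVENTORY_QUERIES[platform.lower()]
    except KeyError:
        raise ValueError(f"no inventory query defined for {platform!r}") from None
```

Each statement can be tried interactively, for example with `osqueryi --json "SELECT name, version FROM deb_packages;"`, before being scheduled across a fleet.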
Make sure you gather package names and versions in a centralized system such as Uptycs, along with the publisher and any other identifying information that could be useful. In many cases, a single publisher’s infrastructure could be hijacked to compromise multiple software packages, so you need a way to look for packages by name, version, and publisher. If your centralized osquery environment allows for it, store all of these tables regularly. Because packages are installed relatively infrequently, good use of differential queries will preserve storage. At Uptycs, we take full advantage of the structured nature of osquery results to store all of this data in a way that is easy to query later, allowing point-in-time queries as well as real-time queries.
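To illustrate why differential collection keeps storage small, here is a minimal Python sketch that compares two inventory snapshots and keeps only the rows that were added or removed. It mirrors the idea behind osquery's differential results, not osquery's actual implementation. The `Package` record, the `diff_inventory` helper, and the sample data in the usage note are all hypothetical.

```python
from typing import Iterable, NamedTuple


class Package(NamedTuple):
    """Hypothetical inventory record: the three fields worth searching on."""
    name: str
    version: str
    publisher: str


def diff_inventory(previous: Iterable[Package], current: Iterable[Package]) -> dict:
    """Return only the rows that changed between two snapshots."""
    prev, curr = set(previous), set(current)
    return {
        "added": sorted(curr - prev),    # present now, absent before
        "removed": sorted(prev - curr),  # present before, absent now
    }
```

With this approach a version upgrade shows up as one removed row (the old version) and one added row (the new version), while unchanged packages produce no output at all.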
To take things to another level, since some package managers support multiple repositories, there are tables like apt_sources and yum_sources to help you ensure that the repositories configured are those allowed and expected in your environment.
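A repository check of this kind might be sketched as follows, assuming the rows come from a query against `apt_sources` and that the table exposes a `base_uri` column (our understanding of the schema; please verify). The allowlist contents and the `unexpected_sources` helper are hypothetical and would be replaced with whatever your environment actually permits.

```python
# Query to collect configured APT repositories (column names are assumptions).
APT_SOURCES_QUERY = "SELECT name, base_uri, release FROM apt_sources;"

# Hypothetical allowlist of repository base URIs approved for this environment.
ALLOWED_BASE_URIS = {
    "http://deb.debian.org/debian",
    "http://security.debian.org/debian-security",
}


def unexpected_sources(rows, allowed=ALLOWED_BASE_URIS):
    """Return rows whose base_uri is not on the allowlist.

    Trailing slashes are ignored so that 'http://x/debian/' and
    'http://x/debian' are treated as the same repository.
    """
    allowed_norm = {uri.rstrip("/") for uri in allowed}
    return [
        row for row in rows
        if row.get("base_uri", "").rstrip("/") not in allowed_norm
    ]
```

The same pattern should carry over to `yum_sources`, using whichever column holds the repository URL in that table.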
|Table|Description|
|---|---|
|chocolatey_packages|Packages installed by the popular Windows package management tool Chocolatey|
|homebrew_packages|Packages installed by the popular macOS package management tool Homebrew|
|npm_packages|Packages installed by the Node.js package management tool npm|
|python_packages|Python packages such as those installed via the pip command line from the PyPI repository|
With the help of these tables, you can supplement your software inventory far beyond what is located in the main system software inventory.
Each of these tables contains important information like names and version numbers, as well as authors. Be sure to gather them regularly even if you do not believe a package manager is in use. There is no harm in gathering data from chocolatey_packages if most of your users do not use it, as the response will be empty. By doing that, you will then have inventory data for the few systems where someone did install it, even though it may not be officially supported or even allowed.
Here's an example of the information you'll want to gather for Python packages:
|Column|Description|
|---|---|
|name|Package display name|
|author|Optional package author|
|license|License under which package is launched|
|path|Path at which this module resides|
|directory|Directory where Python modules are located|
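Once these rows are collected, they can be checked against known-bad names and against lookalikes of popular packages. The sketch below is one possible approach under stated assumptions: `KNOWN_BAD` contains only `libpeshnx`, the package named earlier in this article; `POPULAR` is a hypothetical stand-in for a list of packages you actually depend on; the 0.85 similarity cutoff is an arbitrary starting point. The `version` column used in the query is part of `python_packages` as far as we know.

```python
import difflib
import re

PYTHON_PACKAGES_QUERY = (
    "SELECT name, version, author, license, path FROM python_packages;"
)

KNOWN_BAD = {"libpeshnx"}                   # named earlier in this article
POPULAR = ["requests", "urllib3", "numpy"]  # hypothetical reference list


def normalize(name: str) -> str:
    """Normalize a project name the way PyPI does (PEP 503)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def flag_known_bad(rows, known_bad=KNOWN_BAD):
    """Return rows whose normalized name is on the known-bad list."""
    bad = {normalize(n) for n in known_bad}
    return [row for row in rows if normalize(row["name"]) in bad]


def lookalikes(rows, popular=POPULAR, cutoff=0.85):
    """Return (installed_name, popular_name) pairs that look suspiciously close."""
    hits = []
    for row in rows:
        name = normalize(row["name"])
        if name in popular:
            continue  # an exact match is the real package, not a lookalike
        close = difflib.get_close_matches(name, popular, n=1, cutoff=cutoff)
        if close:
            hits.append((row["name"], close[0]))
    return hits
```

A lookalike hit is only a lead for investigation, since plenty of legitimate packages have similar names.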
Browsers have virtually become “Operating Systems” (in some cases, literally). Because most browsers allow the installation of software, usually in the form of extensions, tracking those extensions on end-user systems is critical to security hygiene.
These extensions are often exposed to all browsing sessions and have access to the Internet, a great vantage point from which to steal data, session cookies and other credentials and exfiltrate them to the Internet.
Reports of problematic extensions are common, as shown in the timeline video above. An extension can be vulnerable, outright malicious, or legitimate but hijacked for malicious purposes.
|Table|Description|
|---|---|
|browser_plugins|Provides details for “legacy” browser plugins for users on macOS|
|firefox_addons|Inventory of Firefox browser extensions, webapps and addons on macOS and Linux|
|chrome_extensions|Inventory of Chrome extensions on all platforms|
|safari_extensions|Safari extension details on macOS|
|ie_extensions|Internet Explorer extension details on Windows|
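Browser extension tables are generally scoped per user, so a common osquery pattern is to join them against the `users` table to cover every account on a machine. The sketch below shows that join for `chrome_extensions` and a simple blocklist check on the results. The column names reflect our understanding of the schema, and the blocklisted identifier is a made-up placeholder, not a real extension.

```python
# Join against users so extensions for every local account are returned.
# Column names are assumptions; verify against your osquery schema.
CHROME_EXTENSIONS_QUERY = (
    "SELECT u.username, e.name, e.identifier, e.version "
    "FROM users u CROSS JOIN chrome_extensions e USING (uid);"
)

# Hypothetical placeholder ID; replace with identifiers from your own intel.
BLOCKED_EXTENSION_IDS = {"a" * 32}


def blocked_extensions(rows, blocked=BLOCKED_EXTENSION_IDS):
    """Return (username, extension name, identifier) for blocklisted extensions."""
    return [
        (row["username"], row["name"], row["identifier"])
        for row in rows
        if row["identifier"] in blocked
    ]
```

Matching on the extension identifier rather than its display name is a deliberate choice here, since display names are easy to imitate.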
Once you have achieved this level of software inventory hygiene, you’ll be ready to take it a step further by tracking Docker containers, AWS EC2 virtual machines and images, as well as tracking the execution of every single process.
Osquery makes all of these inventory practices... well, practical. Observing and recording the data in the tables outlined above gives you a way to build an inventory of installed packages, browser plugins, and extensions across browsers and OSes.