Detecting Malicious Packages in Repositories Like PyPI With Osquery

Blog Author
Guillaume Ross

Many systems make installing third-party software incredibly convenient, from package managers and well-loved Linux distribution tools like Debian's APT to app stores and per-language repositories. Users are also often allowed to install browser extensions or plugins, which come from their own "store" and are just another type of software. Add containers to the mix, and maintaining a software inventory that lets you identify dangerous packages has become harder to do, but more critical to accomplish.

Research shows, time and time again, that threat actors will go the distance: they either push malicious packages with names similar to legitimate ones, or try to make existing legitimate packages malicious by attacking their source code repositories or by buying them from their legitimate maintainers. Supply chain attacks are difficult to defend against even for the best-prepared organizations, and that's on top of all the vulnerabilities that occur naturally during legitimate software development cycles.
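Lookalike names, the first tactic above, can be hunted for by comparing each installed package name with a list of popular packages and flagging names that are close but not identical. The Python sketch below uses Levenshtein edit distance for that comparison; the POPULAR list and the MAX_DISTANCE threshold of 2 are illustrative assumptions, not a vetted detection rule, and the installed names could come from osquery's python_packages or npm_packages tables.

```python
"""Flag installed package names that look like typosquats of popular ones.

Assumptions (illustrative only): POPULAR is a hand-picked sample list and
MAX_DISTANCE = 2 is an arbitrary threshold; tune both for your environment.
"""

POPULAR = ["requests", "urllib3", "numpy", "django", "setuptools"]
MAX_DISTANCE = 2


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a single rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete a character from a
                curr[j - 1] + 1,           # insert a character into a
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def lookalikes(installed: list[str]) -> list[tuple[str, str]]:
    """Return (installed_name, popular_name) pairs that are close but not equal."""
    hits = []
    for name in installed:
        lowered = name.lower()
        for popular in POPULAR:
            distance = edit_distance(lowered, popular)
            if 0 < distance <= MAX_DISTANCE:  # 0 means it IS the popular package
                hits.append((name, popular))
    return hits
```

For example, lookalikes(["requests", "reqeusts"]) returns [("reqeusts", "requests")]. Very short names produce many near matches, so in practice you would likely pair this with an allowlist of known-good packages.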

In recent news, ReversingLabs discovered many such packages in the Python Package Index (PyPI). Packages such as libpeshnx, which alone was being installed 82 times a month, were found to contain backdoors.

About a year ago, npm suffered the event-stream incident, in which an attacker took over maintenance of the package and injected code into event-stream that targeted cryptocurrency wallets, stealing their private keys and sending them to an external server.

All repositories are amazing targets for attackers. The software they host is usually trusted, and one command away from being installed.

Like anything else, these repositories can also suffer from vulnerabilities, as shown by the remote code execution vulnerabilities found in RubyGems and Homebrew over the last two years.

This timeline video represents a very small sample of events that occurred over the last few years involving popular software repositories:

(note: video has no sound)

Full software inventory with osquery

Everyone knows it is important to stay on top of what packages are installed where, but having a strategy for achieving that isn’t always easy.

Software inventory is not only the second of the CIS Critical Security Controls; it is also part of the Identify function of the NIST Cybersecurity Framework, among many other frameworks. Since we allow all kinds of software to run in privileged locations, knowing what we have is the logical first step to securing the environment.

By looking at the osquery schema, we can easily identify tables that will be useful for this task, which itself is broken down into three main pieces.

  1. Inventory of "standard" software packages on all OSes
  2. Inventory of software installed with third-party package managers like pip (from PyPI) or npm
  3. Tracking browser extensions and plugins

Tables to Cover "Standard" Packages on All OSes
Table name          Description
apps                macOS applications installed
deb_packages        Debian packages installed on Linux distributions like Debian or Ubuntu
pkg_packages        FreeBSD packages
portage_packages    Gentoo Portage packages
programs            Applications installed on Windows, typically those listed in Add/Remove Programs
rpm_packages        Red Hat / CentOS RPM packages

These tables all serve the same main purpose: finding out what software was installed through an operating system's standard package manager or package format. Inventory tools for Windows and Mac are common, but osquery shines by letting you perform this inventory the same way on Ubuntu, Red Hat, CentOS, or even Gentoo and FreeBSD!
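To store results from all of these tables side by side, it helps to map each one to a common shape. The following Python sketch pairs each table with a query that aliases its columns to name, version and publisher, then tags parsed rows with their source table. The column choices (maintainer in deb_packages, vendor in rpm_packages, bundle_short_version in apps, package in portage_packages) reflect our reading of the osquery schema and are assumptions to verify against the schema for your osquery version; INVENTORY_QUERIES and normalize are hypothetical names of our own.

```python
"""Per-OS osquery queries normalized to a common (name, version, publisher) shape.

Column names follow the osquery schema as we understand it; verify them
against the schema for your osquery version before relying on this sketch.
"""

INVENTORY_QUERIES = {
    "apps": "SELECT name, bundle_short_version AS version, '' AS publisher FROM apps;",
    "deb_packages": "SELECT name, version, maintainer AS publisher FROM deb_packages;",
    "pkg_packages": "SELECT name, version, '' AS publisher FROM pkg_packages;",
    "portage_packages": "SELECT package AS name, version, '' AS publisher FROM portage_packages;",
    "programs": "SELECT name, version, publisher FROM programs;",
    "rpm_packages": "SELECT name, version, vendor AS publisher FROM rpm_packages;",
}


def normalize(table: str, rows: list[dict]) -> list[dict]:
    """Tag each result row with its source table so one store can hold every OS."""
    if table not in INVENTORY_QUERIES:
        raise ValueError(f"no inventory query defined for {table!r}")
    return [
        {
            "source": table,
            "name": row.get("name", ""),
            "version": row.get("version", ""),
            "publisher": row.get("publisher", ""),  # empty when the table has none
        }
        for row in rows
    ]
```

Rows would typically come from running each query with osqueryi --json and parsing the output with json.loads; tables that do not exist on a given OS can simply be skipped.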

Make sure you gather package names and versions in a centralized system such as Uptycs, along with the publisher and any other identifying information that could be useful. A single publisher's infrastructure can be hijacked to compromise multiple software packages, so you need a way to search for packages by name, version, and publisher. If your centralized osquery environment allows it, collect all of these tables regularly. Since packages are not installed very frequently, differential queries, which only report rows added or removed since the previous run, will save storage. At Uptycs, we take full advantage of the structured nature of osquery results to store all of this data in a way that is easy to query later, allowing point-in-time queries as well as real-time queries.
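The storage and search points can be illustrated with a small Python sketch that compares two inventory snapshots and keeps only what changed, mirroring the added/removed rows osquery reports for differential scheduled queries, plus a helper to look packages up by name, version and publisher. The function names and the four-field row shape are hypothetical and are not part of osquery or Uptycs.

```python
"""Differential view of two inventory snapshots, plus a simple lookup helper.

Sketch only: each snapshot is a list of normalized rows with the keys
source, name, version and publisher.
"""

Row = tuple[str, str, str, str]  # (source, name, version, publisher)


def to_set(rows: list[dict]) -> set[Row]:
    """Turn row dicts into hashable tuples so snapshots can be compared as sets."""
    return {(r["source"], r["name"], r["version"], r["publisher"]) for r in rows}


def diff_snapshots(previous: list[dict], current: list[dict]) -> dict[str, list[Row]]:
    """Report only rows that appeared or disappeared between two runs."""
    before, after = to_set(previous), to_set(current)
    return {
        "added": sorted(after - before),
        "removed": sorted(before - after),
    }


def find(rows: list[dict], *, name=None, version=None, publisher=None) -> list[dict]:
    """Look up packages by any combination of name, version and publisher."""
    return [
        r for r in rows
        if (name is None or r["name"] == name)
        and (version is None or r["version"] == version)
        and (publisher is None or r["publisher"] == publisher)
    ]
```

An upgrade shows up as one removed row (old version) and one added row (new version), while an unchanged inventory produces nothing to store.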

To take things to another level, since some package managers support multiple repositories, there are tables like apt_sources and yum_sources to help you ensure that the repositories configured are those allowed and expected in your environment.
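A minimal Python sketch of that repository check, assuming rows from osquery's apt_sources table (repository URL in base_uri) and yum_sources table (URL in baseurl); ALLOWED_HOSTS is a hypothetical allowlist. Anything whose host is not on the list, including entries with no URL at all, is reported for review rather than silently passed.

```python
"""Flag package repositories whose host is not on an approved list.

Assumptions: rows come from osquery's apt_sources (URL in base_uri) or
yum_sources (URL in baseurl) tables; ALLOWED_HOSTS is hypothetical.
"""
from urllib.parse import urlparse

ALLOWED_HOSTS = {"archive.ubuntu.com", "security.ubuntu.com", "mirror.example.internal"}


def repo_url(row: dict) -> str:
    """apt_sources exposes base_uri; yum_sources exposes baseurl."""
    return row.get("base_uri") or row.get("baseurl") or ""


def unexpected_repos(rows: list[dict]) -> list[tuple[str, str]]:
    """Return (repository name, url) for every source outside the allowlist."""
    flagged = []
    for row in rows:
        url = repo_url(row)
        host = urlparse(url).hostname  # None when there is no URL or host part
        if host not in ALLOWED_HOSTS:
            flagged.append((row.get("name", ""), url))
    return flagged
```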

Tables to Track Packages from Third-Party Package Management Tools
Table name            Description
chocolatey_packages   Packages installed by Chocolatey, a popular package management tool for Windows
homebrew_packages     Packages installed by Homebrew, a popular package management tool for macOS
npm_packages          Packages installed by npm, the Node.js package management tool
python_packages       Python packages, such as those installed from the PyPI repository with pip

With the help of these tables, you can supplement your software inventory far beyond what is located in the main system software inventory.

Each of these tables contains important information like names and version numbers, and most also include authors. Be sure to collect them regularly, even if you do not believe a package manager is in use. There is no harm in querying chocolatey_packages when most of your users do not use Chocolatey: on those systems, the result will simply be empty. In return, you will have inventory data for the few systems where someone did install it, even though it may not be officially supported or even allowed.
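The same idea works across a fleet: collect every third-party table everywhere, then report only where a tool your policy does not cover actually returned rows. In this Python sketch, the results structure (host, then table, then rows) and the SUPPORTED policy set are hypothetical; an empty row list is treated as "not in use".

```python
"""Spot hosts where a package manager you do not officially support is in use.

Sketch under assumptions: results maps host -> table -> rows, as you might
assemble from scheduled osquery results; SUPPORTED is a hypothetical policy.
"""

THIRD_PARTY_TABLES = ["chocolatey_packages", "homebrew_packages", "npm_packages", "python_packages"]
SUPPORTED = {"python_packages", "npm_packages"}


def unsupported_usage(results: dict[str, dict[str, list[dict]]]) -> dict[str, list[str]]:
    """Return host -> unsupported tables that returned at least one row."""
    report = {}
    for host, tables in results.items():
        found = [
            table for table in THIRD_PARTY_TABLES
            # tables.get() is None or [] when the tool is absent: not in use
            if table not in SUPPORTED and tables.get(table)
        ]
        if found:
            report[host] = found
    return report
```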

Here's an example of the information you'll want to gather for Python packages:

Python Packages Table
Field       Description
name        Package display name
version     Package-supplied version
summary     Package-supplied summary
author      Optional package author
license     License under which the package is released
path        Path at which this module resides
directory   Directory where Python modules are located
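With those fields collected, checking for a known-bad package such as libpeshnx is a simple match. On a single endpoint, a query like SELECT name, version, path FROM python_packages WHERE lower(name) = 'libpeshnx'; would do; centrally, a Python sketch might look like the following. KNOWN_BAD is a hypothetical feed, and names are compared after PEP 503 normalization so that case and separator variations (for example LibPeshnx) still match.

```python
"""Match python_packages rows against a list of known-bad packages.

Sketch under assumptions: rows are dicts with the name, version and path
fields from osquery's python_packages table; KNOWN_BAD is a hypothetical
feed where a version of None means "any version".
"""
import re

KNOWN_BAD = {"libpeshnx": None}


def normalize_name(name: str) -> str:
    """PEP 503 normalization: runs of '-', '_' and '.' become '-', lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def flag_known_bad(rows: list[dict]) -> list[dict]:
    """Return the rows whose normalized name (and version, if given) is known bad."""
    bad = {normalize_name(n): v for n, v in KNOWN_BAD.items()}
    hits = []
    for row in rows:
        key = normalize_name(row["name"])
        if key in bad and bad[key] in (None, row.get("version")):
            hits.append(row)
    return hits
```

Keeping the path in each hit matters: the same package can be installed in several virtual environments on one host, and each location needs cleaning up.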


Browsers have virtually become "operating systems" (in some cases, literally). Because most browsers allow the installation of software, usually in the form of extensions, tracking those extensions on end-user systems is critical to security hygiene.

These extensions often have access to every browsing session as well as to the Internet, a great vantage point from which to steal data, session cookies, and other credentials and exfiltrate them.

Reports of problematic extensions are common, as shown in the timeline video above. Extensions can be vulnerable, outright malicious, or legitimate extensions that get hijacked for malicious purposes.

Tables to Track Browser Extensions and Plugins
Table name           Description
browser_plugins      Details for "legacy" browser plugins for users on macOS
firefox_addons       Inventory of Firefox extensions, webapps and add-ons on macOS and Linux
chrome_extensions    Inventory of Chrome extensions on all platforms
safari_extensions    Safari extension details on macOS
ie_extensions        Internet Explorer extension details on Windows
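Extension inventories become actionable once they are compared with policy. A minimal Python sketch, assuming rows from chrome_extensions or firefox_addons (both expose an identifier column in the osquery schema); BLOCKED_IDS and ALLOWED_IDS are hypothetical placeholder lists, not real extension IDs. Splitting results three ways keeps unknown extensions visible instead of silently passing them.

```python
"""Classify browser extensions by ID against block and allow lists.

Sketch under assumptions: rows come from osquery's chrome_extensions or
firefox_addons tables; the IDs below are placeholders, not real extensions.
"""

BLOCKED_IDS = {"a" * 32}                          # hypothetical known-bad extension ID
ALLOWED_IDS = {"b" * 32, "addon@example.org"}     # hypothetical approved extensions


def classify(rows: list[dict]) -> dict[str, list[str]]:
    """Split installed extensions into blocked, unapproved and approved IDs."""
    result = {"blocked": [], "unapproved": [], "approved": []}
    for row in rows:
        ext_id = row.get("identifier", "")
        if ext_id in BLOCKED_IDS:          # block list wins over allow list
            result["blocked"].append(ext_id)
        elif ext_id in ALLOWED_IDS:
            result["approved"].append(ext_id)
        else:                              # unknown: surface it for review
            result["unapproved"].append(ext_id)
    return result
```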

Once you have achieved this level of software inventory hygiene, you’ll be ready to take it a step further by tracking Docker containers, AWS EC2 virtual machines and images, as well as tracking the execution of every single process.

Osquery makes all of these inventory practices... well, practical. Observing and recording the data in the tables outlined above gives you a way to build inventories of installed packages, browser plugins, and extensions across browsers and OSes.