Building a Zero Trust Network (and where osquery fits) - GitLab’s Real Life Roadmap Recap
While a simple “zero trust” Google search will return a variety of educational resources on the topic, what I valued most about the GitLab team’s story was how pragmatically they break down the steps they took (and are still taking) along their Zero Trust journey. It’s also one of the few case studies showcasing a 100% cloud-native organization working to implement the Zero Trust approach BEFORE a major security breach. Below, I’ll recap the core concepts of Kathy & Philippe’s talk, but you can also catch the full conversation in this YouTube video.
First, let's start with a quick explanation of what Zero Trust is (or, as Google calls its implementation, BeyondCorp).
What is Zero Trust?
Cloudflare defines Zero Trust as:
“...an IT security model that requires strict identity verification for every person and device trying to access resources on a private network, regardless of whether they are sitting within or outside of the network perimeter.”
Kathy shares some additional perspective: traditional network security is heavily perimeter-based. Hard on the outside, soft on the inside. When an attacker does inevitably gain access, they can move laterally, gain privileged access, and cause a lot of headaches. Not ideal. Zero Trust means that the device is authenticated and authorized, the user is authenticated and authorized, and decisions are risk-based and dynamic. In other words, rules are enforced so that each access request takes into account the context, the device used, the application and data requested, and the employee’s role in the organization. For example, HR data might be available to HR, but an HR employee would be able to access salary information only when using a corporate, managed system. The same goes for system accounts accessing APIs and databases.
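To make the HR example concrete, here is a minimal sketch of a context-aware access decision in Python. It is not GitLab's implementation: the `AccessRequest` fields, the `POLICY` table and the resource names are all hypothetical, chosen only to mirror the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    """Context attached to a single access request (hypothetical fields)."""
    user_authenticated: bool
    device_authenticated: bool
    device_managed: bool   # is this a corporate, managed system?
    role: str              # e.g. "hr", "engineering"
    resource: str          # e.g. "hr/directory", "hr/salary"


# Hypothetical policy table: resource -> (roles allowed, managed device required)
POLICY = {
    "hr/directory": ({"hr"}, False),
    "hr/salary": ({"hr"}, True),
}


def is_allowed(req: AccessRequest) -> bool:
    """Evaluate one request; anything not explicitly permitted is denied."""
    # Both the user and the device must be authenticated.
    if not (req.user_authenticated and req.device_authenticated):
        return False
    rule = POLICY.get(req.resource)
    if rule is None:
        return False  # default deny for unknown resources
    allowed_roles, needs_managed_device = rule
    if req.role not in allowed_roles:
        return False
    if needs_managed_device and not req.device_managed:
        return False
    return True
```

With this table, an authenticated HR employee on an unmanaged laptop can reach `hr/directory` but not `hr/salary`. The decision is made per request, so the same person gets a different answer when the device context changes.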
Kathy goes on to describe that Zero Trust is:
- Not a product, but a process (that involves multiple products, configurations, procedures, and people)
- Not a new idea
- Not built all at once
In fact, even with corporate backing, Kathy shared that the road to building a Zero Trust Network can easily take 9-12 months. Older companies with a large IT footprint may take much longer, though any incremental progress also reduces risk accordingly and is worth it on its own. Considering this investment in time and resources, it’s helpful to understand why GitLab (and others) see value in building a Zero Trust Network. Kathy cited GitLab’s reasons as:
- Lateral movement is much harder
- Stolen credentials are less valuable
- Known vulnerabilities that are easy to exploit will be rarer
- Non-targeted attacks have less value
- GitLab has a 100% remote workforce (so people are connecting remotely from around the world)
For another in-depth perspective on what Zero Trust is, check out this comprehensive review of a Zero Trust Security Model provided by Akamai Technologies.
How Did GitLab Approach the Process?
This is where the pragmatic and approachable perspective comes into play. Kathy shares that, while the timeline and process for building a Zero Trust Network is long, it can also be broken up into buckets representing independent streams of work that can be built in parallel. Taking this bucketed approach to the build out allows speed and flexibility. We’ll get to GitLab’s defined buckets in a moment, but first, take a look at the foundational policies and systems that were put in place to support their Zero Trust build out:
- Data Classification Policy: Identifies and governs what data is being stored/processed and what level of priority or sensitivity it is
- GCP Security Guidelines: The org follows Google’s best practices
- Internal Acceptable Use Policy
- HRIS System: A database of who works here, what role they are in & what access they have
- Homogeneous endpoints: macOS and Linux only
With that context and foundation, Kathy and team then identified their three main buckets to help “wrap their heads around where to get started.” At GitLab, those buckets became:
- Customer Data: anything that will process and store customer data and is centrally managed
- Endpoints: user/employee laptops and devices, individually managed
- Backend Infrastructure & 3rd Party SaaS: anything that does not process/store customer repo data (Slack, Zoom, Salesforce)
The buckets were then expanded into a roadmap to identify the critical components of each work path as shown in the diagram below.
Processes, Policies and Technologies
Tackling the work across these three segments also required solving three major problems. Each is outlined below, along with the processes, policies and technologies involved:
Problem #1: Managing User Identity Access
CSO.com defines Identity and access management (IAM) as “defining and managing the roles and access privileges of individual network users and the circumstances in which users are granted (or denied) those privileges... The core objective of IAM systems is one digital identity per individual.” For GitLab, it was critical to build an IAM system that provided visibility into their macOS user endpoints and production Linux servers and could handle fast-paced onboarding and offboarding of staff. (Kathy shares that GitLab is on track to grow from ~200 to 1,000 employees over a two-year period!) GitLab’s IAM system is required to:
- Verify Endpoint Integrity: Because they are a Linux and macOS shop, they looked deeply at osquery and explored Uptycs and Kolide for deployment and management.
- Verify Access Level Aligns with Role: This required a constantly updated org chart database or HRIS (Human Resource Information System)
- Onboarding/offboarding of Cloud Services: Centralized SSO > Okta, Duo, Google Cloud Identity & Google Cloud Identity Secure LDAP
- Minimize Credential Theft: U2F devices > Google’s Titan Security Keys
- Enforce Data Classification Policy: DLP solution > G Suite Enterprise
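As an illustration of the "Verify Endpoint Integrity" item, the sketch below shows one kind of check osquery makes possible: flagging volumes that are not encrypted. It is an assumption-laden example, not what GitLab, Uptycs or Kolide actually run. It assumes osquery's `disk_encryption` table reports `encrypted` as the string "1" for encrypted volumes, and that `osqueryi --json` is available on the endpoint; the helper names are hypothetical.

```python
import json
import subprocess
from typing import Dict, List

# Query against osquery's disk_encryption table (available on macOS and Linux).
DISK_QUERY = "SELECT name, encrypted FROM disk_encryption;"


def run_osquery(query: str) -> List[Dict[str, str]]:
    """Run a query through the local osqueryi shell and parse its JSON output."""
    result = subprocess.run(
        ["osqueryi", "--json", query],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def unencrypted_volumes(rows: List[Dict[str, str]]) -> List[str]:
    """Return the names of volumes whose `encrypted` column is not "1"."""
    return [row["name"] for row in rows if row.get("encrypted") != "1"]


def endpoint_passes(rows: List[Dict[str, str]]) -> bool:
    """Pass only if at least one volume was reported and none are unencrypted."""
    return bool(rows) and not unencrypted_volumes(rows)
```

On a real endpoint you would call `endpoint_passes(run_osquery(DISK_QUERY))`. A production policy would likely be narrower (for example, checking only the boot volume), since the table can also list partitions that are never encrypted.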
Problem #2: Securing GitLab Applications
Philippe helped to build a system that educated and empowered their developers to own the security of their applications and code, acknowledging that a secure network doesn’t mean much if they’re shipping insecure applications. His goal was to relieve the security team of the burden by “shifting security left” and making it a seamless part of the developers’ process. Much of what they’ve built below is also a part of their product vision and offering in GitLab Defend.
Here’s the process and tools their engineers and developers use to “Trust What’s Running In Production Environments”:
- Secure GitLab: Scan every commit for security issues
  - SAST (Static Application Security Testing) > testing the source code
  - Dependency scanning
  - Container scanning + binauthz
  - DAST (Dynamic Application Security Testing) > testing the running application
  - Coming in 2019 > IAST and fuzzing
- Trust What’s Deployed: Remove humans from the process
  - Ensure only trusted containers can be deployed
  - Sign & annotate images during CI phase (define an Attestor Policy)
  - Binary Authorization (Grafeas & Kritis)
- Dynamically Manage Keys
  - Google Key Management Service: Only specific users can request keys
  - Secrets divided up based on Chef role
  - JSON files stored and encrypted on GCS
  - Access restricted by environment
  - Keys are auto-rotated every 90 days
- Audit Google Cloud Identity/StackDriver logs
- Proxy combined with WAF
- Audit git actions
- Proactively Identify Compromised Accounts: Using machine learning and data analysis to identify compromise
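The "only trusted containers can be deployed" idea can be sketched as a two-step flow: CI attests to an image digest, and a deploy-time gate refuses any image without a valid attestation. The Python below only illustrates that control flow. Real Binary Authorization relies on asymmetric signatures checked at admission time, whereas this sketch swaps in an HMAC from the standard library as a stand-in; the function names and the key are hypothetical.

```python
import hashlib
import hmac
from typing import Optional


def sign_digest(image_digest: str, attestor_key: bytes) -> str:
    """CI step: produce an attestation for an image digest.

    Stand-in only: an HMAC replaces the asymmetric signature a real
    attestor would create.
    """
    return hmac.new(attestor_key, image_digest.encode(), hashlib.sha256).hexdigest()


def may_deploy(image_digest: str, attestation: Optional[str], attestor_key: bytes) -> bool:
    """Deploy gate: allow an image only if its attestation verifies."""
    if attestation is None:
        return False  # unsigned images are never deployed
    expected = sign_digest(image_digest, attestor_key)
    # Constant-time comparison avoids leaking how much of the value matched.
    return hmac.compare_digest(expected, attestation)
```

Because the attestation is bound to the image digest, an attestation issued for one image cannot be reused for a different one, and no human approval step is needed at deploy time.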
For a refresher, here’s a helpful article describing the differences between SAST, DAST and IAST.
Problem #3: Securing GitLab Infrastructure
GitLab.com has over 2 million users that trust their sensitive data to GitLab. It’s not only a customer environment, but also where GitLab itself manages its product repos. Ensuring the security of the GitLab.com infrastructure is imperative. Using Google’s security best practices (and the Google Cloud Security Command Center) as a guidepost, here’s what their current process entails:
- Vulnerability Management: How do we ensure systems are patched in a timely manner? > Tenable.io
- Asset Management & Ownership: Who owns what assets? > Uptycs/osquery
- Mitigate Abuse Activities: Stopping DDoS, etc. > Fastly, Cloudflare
- Blocking Lateral Movement: Using Google Virtual Private Clouds (VPC) enables compartmentalized access
- Cloud Policy Automation: Forseti Security helps enforce our cloud policies
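The "Blocking Lateral Movement" item comes down to default-deny rules between compartments. As a rough sketch, and not a representation of GitLab's actual VPC or firewall configuration, the Python below models an allowlist of permitted flows; the segment names and ports are hypothetical.

```python
from typing import FrozenSet, Tuple

# Hypothetical allowlist of (source segment, destination segment, port).
# Any flow not listed here is denied.
ALLOWED_FLOWS: FrozenSet[Tuple[str, str, int]] = frozenset({
    ("web", "api", 443),
    ("api", "db", 5432),
})


def flow_allowed(src: str, dst: str, port: int) -> bool:
    """Default deny: a flow is permitted only if it is explicitly listed."""
    return (src, dst, port) in ALLOWED_FLOWS
```

Here a compromised `web` host can still reach `api` on port 443 but has no direct path to `db`, which is the compartmentalization effect the list describes.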
Taking It All In
It’s easy to appreciate how much work has gone into GitLab’s Zero Trust journey and the diligence that was required to plan, evaluate, implement and refine the various components. While Kathy and Philippe have outlined several granular actions/requirements, don’t lose sight of these higher-level takeaways when considering your own Zero Trust Network journey:
- Seek Internal Buy-In/Support
- Gear up for a long-term project
- Don’t underestimate foundational policies & procedures
- Break the work into segments that can be built independently
- Educate internally
- Do this BEFORE a major breach
Check out the last few minutes of Kathy & Philippe’s presentation to hear about their lessons learned and advice for others considering a Zero Trust Network.