Updated on April 30, 2018 by UpGuard
GitHub is a popular online code repository used by over 26 million people across the world for personal and enterprise uses. GitHub offers a way for people to collaborate on a distributed code base with powerful versioning, merging, and branching features. GitHub has become a common way to outsource the logistics of managing a code base repository so that teams can focus on the coding itself.
But as GitHub has become a de facto standard, even among software companies, it has also become a vector for data breaches: the code stored on GitHub ranges from simple student tests to proprietary corporate software worth millions of dollars. Like any server, network device, database, or other digital surface, GitHub suffers from misconfiguration.
In June of this year, a third-party development firm left “development notes, raw source, internal reports on web banking code development plans, and records of telephone calls with outsourcing partners” on a public GitHub repository. This information belonged to “six big Canadian banks, two well-known American financial organizations, a multinational Japanese bank, and a multibillion dollar financial software company.” A few years ago, Uber put its database key in a public repo.
Although GitHub itself is designed to be secure, the work performed to maintain it varies from organization to organization. A single repository set to public could leak entire blueprints onto the internet. Poor user management leaves open doors to valuable data. The way GitHub is used becomes a type of business risk, one easily overlooked in a sea of servers and network gear.
GitHub and UpGuard
UpGuard can add GitHub organizations and repositories as nodes, just like a server, router, or website. Through GitHub’s API, configurations are automatically assessed and recorded, creating an audit trail for repository access control and configuration over time. With policies, UpGuard can detect when an organization or repo does not comply with your requirements. This automatic and continuous validation means that enterprises can use GitHub for their projects with far less risk of unauthorized access.
There are two ways UpGuard can look at GitHub: by organization, and by repository. These two types of nodes allow you to control important top level configurations as well as granular detail per repository.
The Organization Node
The Repository Node
What We Analyze
What are we looking at when we scan these GitHub details?
Visibility and Inventory
With your GitHub organization and repositories in UpGuard, you have total visibility into your GitHub footprint, including the configurations and permissions that keep your data private. In addition to point-in-time visibility, UpGuard also tracks GitHub configurations over time, so that repos and orgs can be compared for differences in any given timeframe.
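To make the idea of comparing configurations over time concrete, here is a minimal sketch in Python. It is not UpGuard's implementation; it only assumes that each scan produces a flat dictionary of settings, and the function name `diff_snapshots` and the sample values are hypothetical. The keys used in the example (`private`, `default_branch`, `has_wiki`) are fields that GitHub's REST API returns for a repository.

```python
# Hypothetical sketch: compare two point-in-time snapshots of a repository's
# settings and report every key whose value changed between them.

MISSING = "<absent>"  # placeholder for a key that exists in only one snapshot


def diff_snapshots(old, new):
    """Return {key: (old_value, new_value)} for every setting that differs."""
    changes = {}
    for key in sorted(set(old) | set(new)):
        before = old.get(key, MISSING)
        after = new.get(key, MISSING)
        if before != after:
            changes[key] = (before, after)
    return changes


# Illustrative data only: two scans of the same repository.
last_week = {"private": True, "default_branch": "master", "has_wiki": False}
today = {"private": False, "default_branch": "master", "has_wiki": False}

for setting, (before, after) in diff_snapshots(last_week, today).items():
    print(f"{setting}: {before} -> {after}")
```

In this example the only reported change is the `private` flag flipping from true to false, which is exactly the kind of drift worth catching early.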
Managing an inventory of your GitHub instances can become difficult as projects, collaborators and repositories increase. UpGuard presents all of your GitHub assets so old repositories and collaborators aren’t overlooked during normal maintenance. But UpGuard’s real strength is the ability to automate control over settings that are important to you.
When you know exactly what you have, you can measure it against what you want. UpGuard does this with policies, allowing you to set expectations for every single configuration item, so that deviations from them can be found and remediated in as short a cycle as possible.
We’ll look at some specific use cases for how UpGuard’s policies can prevent critical misconfigurations from turning into data breaches or unauthorized use of GitHub. With UpGuard in place, organizations can take advantage of GitHub’s value without exposing themselves to its risks.
Maintaining GitHub Permissions
Since UpGuard monitors organization members, teams, and outside collaborators, policies can be established to control access and to prevent misconfigured permissions or user settings from going unchecked. For example:
The goal of these policies is twofold: drastically reduce the administrative overhead necessary to manage permissions across large GitHub environments, and protect the data inside those environments from exposure and misuse.
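One permission check of this kind could be sketched as follows. This is an illustration under stated assumptions, not UpGuard's policy syntax: it assumes you already have the list of outside collaborators in the shape GitHub's REST API returns it (a list of user objects, each with a `login` field), and the function name `unapproved_collaborators` and the allowlist are hypothetical.

```python
# Hypothetical sketch: flag outside collaborators who are not on an
# approved list. GitHub logins are case-insensitive, so compare in lowercase.


def unapproved_collaborators(collaborators, approved_logins):
    """Return the sorted logins of collaborators missing from the allowlist."""
    approved = {login.lower() for login in approved_logins}
    return sorted(
        user["login"]
        for user in collaborators
        if user["login"].lower() not in approved
    )


# Illustrative data only.
current = [{"login": "alice"}, {"login": "Contractor-Bob"}, {"login": "mallory"}]
allowlist = ["alice", "contractor-bob"]

print(unapproved_collaborators(current, allowlist))
```

Running a check like this on every scan turns a manual access review into a repeatable test, and any login it reports is a candidate for removal or explicit approval.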
Keeping Repositories Private
There’s one particular configuration for every GitHub repository that absolutely must be enforced if sensitive or proprietary data is being stored there. The “private” setting on a repository is what restricts access from the world at large. When “private” is set to false, the repository and its contents are visible to the internet.
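The check itself is simple enough to sketch in a few lines of Python. The example below assumes repository objects shaped like those returned by GitHub's REST API, where each repository carries a boolean `private` field and a `full_name`; the function name `public_repositories` and the sample data are hypothetical, and fetching the list from the API is left out.

```python
# Hypothetical sketch: list every repository that is not marked private.
# A repository with no "private" field is treated as public, so that
# incomplete data is flagged rather than silently passed.


def public_repositories(repos):
    """Return the full names of repositories visible to the internet."""
    return [repo["full_name"] for repo in repos if not repo.get("private", False)]


# Illustrative data only.
repos = [
    {"full_name": "example-org/website", "private": False},
    {"full_name": "example-org/billing", "private": True},
]

exposed = public_repositories(repos)
if exposed:
    print("Public repositories found:", ", ".join(exposed))
```

A policy would typically expect this list to be empty, or to match a short, deliberately maintained set of open source projects.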
The key here is that cloud resources like GitHub repos and Amazon S3 storage buckets can be made internet-accessible. To utilize these resources for corporate purposes means the risk of internet-exposure must be explicitly mitigated to protect the privacy of everyone involved.
Cloud resources offer enormous value. But it’s a mistake to assume that you can just start pushing sensitive information to them without considering the risk of data breach through that vector. The same things that make GitHub powerful (distributed use, accessibility from anywhere, collaborative support) also create the risks it poses. That doesn’t mean you have to throw the baby out with the bathwater to maintain privacy. You just have to implement controls so that when an error does happen, and it always does, you have a system in place to catch it before it becomes a much bigger problem.
UpGuard supports GitHub as a node because every aspect of an organization’s digital footprint contributes to its overall risk. Servers and network devices are important, and UpGuard supports them in full, but just as important are the non-traditional spaces being used to store and process sensitive information. These spaces tend to be overlooked during security hardening and process control, and become blind spots for cyber risk, one mistake away from massive data exposure.
Misconfigurations are an internal problem that emanates from within the IT infrastructure of any enterprise; no hacker is necessary for massive damage to occur to digital systems and stored data. And the problem is pervasive, with Gartner estimating that anywhere from 70% to 99% of data breaches result not from external, concerted attacks, but from internal misconfiguration of the affected IT systems.