The UpGuard Data Breach Research team can now disclose that it has discovered, reported, and secured a storage server with exposed data belonging to the Oklahoma Department of Securities, preventing any future malicious exploitation of this data. While file size and file count are imprecise tools for gauging the significance of an exposure, they at least provide familiar yardsticks for a sense of scale, and in this case, the publicly accessible data totalled three terabytes and millions of files. The contents of those files ran the gamut from personal information to system credentials to internal documentation and communications intended for the Oklahoma Securities Commission.
The number and reach of the exposed administrative and staff credentials represent a significant risk to the integrity of the Oklahoma Department of Securities’ network.
It is uncertain exactly how long this data store was configured for public access, but Shodan, a search engine for internet-facing IP addresses, first registered it being publicly accessible on November 30th, 2018. UpGuard analysts identified the server's potential for sensitive content on December 7 and notified Oklahoma on December 8. Public access was removed that day, preventing any further downloads by the means used by the UpGuard analysts.
By the best available measures of the files’ contents and metadata, the data was generated over decades, with the oldest data originating in 1986 and the most recent modified in 2016. The data was exposed via an unsecured rsync service at an IP address registered to the Oklahoma Office of Management and Enterprise Services, allowing any user from any IP address to download all the files stored on the server.
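To illustrate the kind of misconfiguration described above, here is a minimal sketch that flags rsync daemon modules readable by anyone. It assumes the standard rsyncd.conf parameters `auth users` and `hosts allow`. The sample configuration, module names, and paths are hypothetical and are not taken from the exposed server.

```python
def audit_rsyncd_conf(text):
    """Return names of modules that set neither 'auth users' nor 'hosts allow'."""
    global_params = {}
    modules = {}
    current = global_params  # parameters before the first [module] are global
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = modules.setdefault(line[1:-1].strip(), {})
            continue
        if "=" in line:
            key, value = line.split("=", 1)
            # Simplifying assumption: compare names with whitespace removed
            # and lowercased, so "auth users" becomes "authusers".
            current["".join(key.lower().split())] = value.strip()

    open_modules = []
    for name, params in modules.items():
        merged = {**global_params, **params}  # module values override globals
        if not merged.get("authusers") and not merged.get("hostsallow"):
            open_modules.append(name)
    return open_modules


# Hypothetical configuration for illustration only.
SAMPLE = """
uid = nobody

[archive]
path = /srv/archive
read only = yes

[reports]
path = /srv/reports
auth users = auditor
secrets file = /etc/rsyncd.secrets
"""

print(audit_rsyncd_conf(SAMPLE))  # ['archive']
```

The check is deliberately coarse. It does not evaluate whether a `hosts allow` value is itself too permissive, and it says nothing about firewall rules in front of the daemon.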
The Oklahoma Securities Commission is part of the state’s Department of Securities. Like the federal Securities and Exchange Commission, it ensures that individuals and corporate entities trading securities are certified to do so and follow the regulations that protect citizens from fraud. The website for the Securities Commission has an UpGuard Cyber Risk score of 171 out of 950, indicating severe risk of breach. Among the issues lowering the website’s score is the use of the web server IIS 6.0, which reached end of life in July 2015, meaning no updates to address any newly discovered vulnerabilities have been released in the last three and a half years. Of all the sites on the ok.gov domain, securities.ok.gov has the worst risk score.
In each report from the Data Breach Research team, we have to make decisions about how to present the findings to best convey their significance. In some cases, that structure is inherent in the data storage itself, as file directories or database schemas contain the organizational logic designed for the meaning of the data. In other cases, the files do not have a strong organizing logic or are heterogeneous over many different directories. For example, when there are directories for each of a business’s customers, the contents of those documents can vary widely.
In this case, the scale of the data makes it impractical to perform any kind of exhaustive documentation of the exposed information. To achieve the research team’s goal of showing how cyber risk results from misconfigurations and digital supply chains, this report will approach the dataset from two cross-cutting angles: the types of digital artifacts and the types of data stored in them.
One classification method is to sort the files by file type, with the file extension providing a straightforward method for identifying file type. This method doesn’t tell us much about the significance of the data (it’s quite possible to have database dumps with no sensitive data and jpgs with protected personally identifiable information, or PII), but reviewing some of these file types helps serve the research team’s goal of highlighting the risk related to different artifact types. Awareness of the types of risk attached to different artifact types can help inform processes and procedures for handling those files to reduce cyber risk associated with their storage.
Example of file types, file count, and total byte size for the files in the "archive" directory. The total file count is much larger in part because of the many files contained in compressed formats like the five Virtual Machine Disk files second from the top of the list.
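The tally by file type described above can be approximated with a short script. This is a minimal sketch and not the research team's actual tooling. The function names and the demo files are hypothetical, and the extension is taken at face value as an indicator of file type.

```python
import os
import tempfile
from collections import defaultdict


def tally_by_extension(root):
    """Map lowercase file extension -> [file count, total bytes] under root."""
    totals = defaultdict(lambda: [0, 0])
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # do not follow or count symbolic links
            ext = os.path.splitext(name)[1].lower() or "(none)"
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            totals[ext][0] += 1
            totals[ext][1] += size
    return dict(totals)


def print_report(totals):
    """Print one row per extension, largest total size first."""
    rows = sorted(totals.items(), key=lambda kv: kv[1][1], reverse=True)
    for ext, (count, size) in rows:
        print(f"{ext:<10} {count:>10} {size:>15}")


# Demo on a throwaway directory with made-up files.
with tempfile.TemporaryDirectory() as demo_root:
    samples = [("mail.pst", b"x" * 10), ("disk.VMDK", b"y" * 20), ("notes", b"z")]
    for name, payload in samples:
        with open(os.path.join(demo_root, name), "wb") as fh:
            fh.write(payload)
    demo_totals = tally_by_extension(demo_root)
    print_report(demo_totals)
```

A walk like this counts a disk image or compressed archive as a single file, which is one reason the true file count in a store like this is much larger than a top-level tally suggests.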
Personal Storage Table (.pst) Archives
Storing backups of email mailboxes is a common practice required by data retention policies. The contents of those backups rarely include concentrated sensitive data, like in a user database, but over the course of thousands of emails people invariably reveal information intended to be private. Plaintext passwords, images of identification cards, tax documents, and internal strategic deliberations (like those in the Facebook emails released to the public by the DCMS committee) are all commonly found in .pst files. In the case of the OK Securities Commission exposure, email backups from 1999 to 2016 were present, with the largest and most recent reaching 16GB in size.
Virtual Machine Disk Images
Sometimes the entire state of a machine needs to be stored as part of processes like employee offboarding, disaster recovery, or inventory cycling. When restored, virtual machine files can include all kinds of data. Files related to the business can include system credentials, personal information, and financial documents. Employees can also be personally exposed; people very commonly store some personal files on their work computers, and browser caches can include credentials for their personal accounts and services. The OK collection contained virtual machine backups of systems used within the Department of Securities.
While file types govern how we interact with data in digital formats, the contents of the files are what is actually sensitive. In the course of our research we have developed a data taxonomy based on the types of entities affected by breaches. In this case we found examples of many of the types of data that might be leaked in a breach.
The rsync server contained multiple accounting, administration, and investigatory directory trees along with a few virtual machine backup drive files containing personal information. Much of the exposed information was for individuals involved in the exchange of financial securities, sometimes operating under larger organizations, and sometimes acting as individuals. The documents varied in the number of individuals and the types of information describing them.
One Microsoft Access database contained information on approximately ten thousand brokers, including their social security numbers.
A CSV with the partial name “IdentifyingInformation.csv” contained the date of birth, state of birth, country of birth, gender, height, weight, hair color, and eye color for over a hundred thousand brokers.
A database related to viators, a financial vehicle through which terminally ill patients can sell their life insurance benefits, contained information related to people with AIDS including patient names and T cell counts.
A database containing names and Social Security numbers.
Exposed system credentials can carry the highest risk for large-scale abuse. Not only can credentials be used to gather PII, but in offering access to systems themselves they may be used to modify files (for example, to further distribute malware) or to gather information that is intentionally obscured in its storage format. Passwords should be stored in a hashed or encrypted format, but access to the systems where users input those passwords could allow attackers to intercept them in plaintext. While exposed system credentials do not immediately impinge on individuals’ privacy in the same way that exposed personal information does, they carry systemic risk that may result in secondary breaches.
VNC credentials for remote access to OK Department of Securities workstations.
A BlueExpress database of credentials for third parties submitting securities filings.
Spreadsheet of IT services with the usernames and passwords for accounts with Thawte, Symantec Protection Suite, Tivoli, and others.
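As a contrast to credentials kept in plaintext, the recommendation above that passwords be stored in a hashed format can be sketched with Python's standard library. This is an illustrative sketch only. The function names are hypothetical, and the iteration count is an assumed work factor rather than a value taken from this report.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for your own hardware


def hash_password(password, salt=None):
    """Return (salt, digest) using PBKDF2-HMAC-SHA256 with a per-password salt."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each stored password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))                   # False
```

Only the salt and digest are stored, so a leaked copy of the store does not directly reveal the passwords. As the text notes, this does not protect against interception on the systems where passwords are entered.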
Like personally identifiable information (PII), business documents can reveal more than intended about the interior of a corporate organization. Just as personal information can increase the risk of individuals being defrauded or deceived, business information can provide insight that attackers might use to fool employees by demonstrating familiarity with knowledge that only authorized persons would have. The Oklahoma rsync server contained an abundance of business information.
Training documents for personnel working on the Securities Commission.
Commissioners’ email histories.
Supporting files for Department of Securities investigations.
Spreadsheets documenting the timeline for investigations by the FBI and people they interviewed.
A message in one of the mailbox backups containing sensitive information
Businesses and organizations naturally accumulate stores of data, both because of the value of that data and to comply with retention policies. Creating backups is a good practice to increase resilience in the face of attacks like ransomware (think WannaCry). Backups are also necessary for migrations to ensure data can be recovered as businesses adopt newer and more secure technologies. But as this case highlights, the final crucial step is to maintain control over every copy of those data stores.
The good news is that, while the contents of the server extended over years, the known period of exposure was quite short. Thanks to the Data Breach Research team's techniques for quickly identifying risks, the exposure was identified only one week after it showed up in Shodan's catalogue of global IP addresses. Shortening the window of exposure reduces the likelihood of other parties accessing the data and enables its owners to take responsive measures before the data is used maliciously.