Veritone AI: How Two Open Servers Exposed US Government Data

UpGuard Team
Published Apr 30, 2024

Scope

  • Company: Veritone, Inc. (VERI on NASDAQ)
  • Company HQ: Irvine, CA
  • Industry: Technology, AI
  • Data Exposed: ~550GB in 1.664 billion documents
  • Data Types: Audio, video, and image media, police body camera footage, FOIA requests and related documents, plain text employee credentials, system logs with authorization tokens, employee and client PII, AI training data
  • Impact: Veritone, Inc. employees, US Department of Homeland Security, US Department of Veterans Affairs, US Federal Reserve, US police forces
  • Exposure Vector: Elasticsearch Database
  • Provider: Microsoft Azure Government Cloud
  • Azure Data Region: US Gov Virginia
  • Veritone, Inc. FedRAMP Authorized: 03/14/2019

The White House recently announced plans to regulate governmental use of artificial intelligence, with restrictions to take effect on December 1st, 2024. Despite regulations only being discussed now, “the US government has been using AI in some form for years, but it's becoming more difficult to know how — and why.” On March 23rd, UpGuard discovered that one major provider of governmental AI technology, Veritone Inc., exposed approximately 550GB of internal and client data on two separate unprotected Elasticsearch servers. The exposed data included Veritone employee data and credentials, internal system logs, AI training data, and client data from US government organizations such as the Department of Homeland Security and Veterans Affairs. Among the exposed governmental data were documents, videos, and images related to Freedom of Information Act (FOIA) requests and police body camera videos. As of March 30th, UpGuard confirmed that Veritone had secured the exposed data, which is no longer publicly accessible.

Veritone provides artificial intelligence-based services across several industries aside from government, including law, energy, and entertainment. A significant portion of the services Veritone provides for government and police agencies involves automatically redacting sensitive information from documents, analyzing facial recognition data (referred to as identifying suspects), and processing audio and video surveillance data to find insights, keywords, and types of images. Veritone recently launched its aiWARE software on the Microsoft Azure Government cloud, meeting the compliance requirements to allow even more government agencies to use their technology.

Timeline

On March 23rd, 2024, UpGuard research analysts discovered the first of two open Elasticsearch servers hosted on the Microsoft Azure Government Cloud. This server hosted approximately 162GB of data across 464 million documents. The next day, March 24th, the second server was located. It held 390GB across over 1.2 billion documents. These servers did not require or ask for any credentials but rather provided anonymous access to anyone on the internet. According to DNS, these servers belonged to the veritone.com domain. A sample analysis of the data, containing internal employee details and system logs, corroborated the ownership.

UpGuard contacted Veritone on the day of the second discovery, March 24th, informing them of the data exposure. Veritone responded to this notification on March 26th, referring UpGuard to a third-party bug bounty program run by inspectiv.com. An UpGuard researcher contacted Inspectiv and informed them of the data exposure. Inspectiv then contacted Veritone to confirm the exposure. Veritone secured the Elastic servers on March 30th, and the data is no longer publicly accessible.

Breach Vector

Elasticsearch is a widely used search engine that many sectors rely on to query and manage large datasets quickly. Because the two Elasticsearch servers were configured not to require authentication, Veritone’s data was exposed to the open internet for as long as that configuration remained in place.

Elasticsearch supports mandatory authentication, but it must be configured to enforce it. Misconfiguring this one setting can render all other protections and data security measures moot.

Elastic posted about this on their blog over four years ago. Despite that, Elastic servers continue to be exposed, such as this StoreHub data leak that exposed over a million records or this exposure involving the personal data of nearly every person in Brazil.
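The authentication gap described above can be checked from the outside. The following is a minimal sketch, not a description of how UpGuard performed its research: the function names `classify_response` and `probe` are hypothetical, and it assumes that a cluster without authentication answers a credential-free request to its root URL with a JSON document containing `cluster_name`, while a secured cluster answers with HTTP 401 or 403. It should only be pointed at servers you own or are authorized to test.

```python
import json
import urllib.error
import urllib.request


def classify_response(status: int, body: str) -> str:
    """Classify the reply to a credential-free GET on the cluster root URL."""
    if status in (401, 403):
        # The server demanded credentials before answering.
        return "auth-required"
    if status == 200:
        try:
            info = json.loads(body)
        except json.JSONDecodeError:
            return "unknown"
        # Assumption: an open cluster returns cluster metadata at the root.
        if isinstance(info, dict) and "cluster_name" in info:
            return "open"
    return "unknown"


def probe(base_url: str, timeout: float = 5.0) -> str:
    """Send one unauthenticated request and classify the result."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return classify_response(resp.status, body)
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is still informative.
        return classify_response(err.code, "")
    except OSError:
        # Covers connection failures and timeouts.
        return "unreachable"
```

A result of "open" means the server returned cluster metadata to an anonymous caller, which is the condition this report describes.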

Cloud information (provider, region, and service) showing one of the systems in the “usgovvirginia” region

Microsoft Azure offers its government cloud option for “US government agencies or their partners interested in cloud services that meet government security and compliance requirements.” Azure Government operates in three regions: Arizona, Texas, and Virginia. The exposed Elasticsearch servers belonged to the Virginia region. Clients of Azure Government can acquire different levels of security and functionality depending on their needs, but the protections configured in these instances of the government cloud did not prevent the exposure of the Elasticsearch data.

Contents

Internal Data

Screenshot of internal employee data

The exposed dataset contained sensitive information about Veritone resources and users, such as Azure spending details, employee full names, usernames, and email addresses. Internal credentials also appear in the exposed logs, such as application tokens and, in some cases, plain text passwords.

The unauthorized use of these credentials would grant a threat actor whatever level of access the exposed accounts held, possibly exposing additional sensitive data to a malicious third party.
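One common way to limit this kind of damage is to mask credential-like values before log lines are indexed. The sketch below is illustrative only and does not reflect Veritone's logging pipeline: the `redact` function and the list of field names are assumptions, and the pattern handles only simple `key=value` and `"key": "value"` shapes.

```python
import re

# Hypothetical list of field names treated as sensitive.
_SENSITIVE = re.compile(
    r'("?(?:password|passwd|secret|token)"?\s*[:=]\s*)("[^"]*"|\S+)',
    re.IGNORECASE,
)


def redact(line: str) -> str:
    """Replace the value of any sensitive-looking field with a marker."""
    return _SENSITIVE.sub(lambda m: m.group(1) + "[REDACTED]", line)
```

For example, `redact("user=alice password=hunter2 ok")` leaves the username in place and replaces only the password value. Pattern-based masking is a safety net rather than a guarantee, since credentials logged in other shapes would pass through unchanged.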

Screenshot of plain text password discovered in the database

Beyond operational data, AI training datasets hosted on the exposed servers included metadata such as score, source, and timestamp. These materials train AI software such as Veritone’s aiWARE to handle their clients' production data.

Screenshot of administrative user details and token
Screenshot of data being used to train Veritone’s “aiWARE”

Client Data

More importantly, the misconfigured Elastic servers hosted Veritone client data, including data belonging to the US government. System logs contained government personnel details such as organization names, usernames, email addresses, full names, and even IP addresses and system details pulled from the client browser or application. Affected agencies included the Department of Veterans Affairs, the Department of Homeland Security, and the Federal Reserve.

Screenshot of email and details for the DHS Inspector General office
Usernames and access tokens, along with Veritone ID and server information

Information requests to the Department of Veterans Affairs showed requesters’ identities and links to relevant audio and video media. This media appeared to be publicly accessible as well, but due to the sensitive nature of the files, UpGuard did not attempt to access them.

Screenshot of details for VA employees and links to video footage.

Freedom of Information Act (FOIA) requests revealed the requesters’ identities, operational timelines, names, descriptions of relevant media, and links to the media itself.

Screenshot of FOIA media URLs and VA user details
Further FOIA request data, including client IPs and usernames from Veterans Affairs

Veritone’s client data also included many references to police body camera media, including full links and descriptions. Some body camera footage may be public, but it is unclear whether all the linked videos were intended for public release. That would likely depend on whether Veritone’s AI software is given access to police videos that have not been released to the public.

Screenshot of body cam links and details

With over a billion and a half documents between the two exposed systems, each type of exposure has many instances. UpGuard gathered the details listed in this report from a small sampling of the available data.

Ramifications

What we have become accustomed to calling “artificial intelligence” relies on combining pieces of an enormous dataset with a complex algorithm and detailed data tagging. In Veritone’s case, once the model is trained, it must then access a client’s enormous production dataset in order to provide its insight. Because AI technologies often require massive databases full of whatever information they are analyzing, both the likelihood and the impact of a data exposure increase rapidly.

Veritone promotes itself as being the “first multi-cloud AI platform provider approved for use across the entire US Department of Justice.” They have been granted authority to support the Microsoft Azure Government Cloud. Veritone states that “earning this authorization required the company to undergo a stringent security audit.” Misconfigurations can slip through audits focused on security because they are an operational problem. Security can’t prevent a misconfigured system from exposing data, as the means of accessing exposed records is identical to legitimate access.

Operational tasks such as spinning up an Elastic server should have controls in place to ensure that the server is not publicly accessible. These controls could include automated checking of the actual Elastic configuration, limiting connections to only authorized IP addresses, or putting the Elastic server on an internal network and requiring a VPN connection to reach it.
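Two of the controls above, checking the actual configuration and limiting connections to authorized addresses, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a complete audit: `audit_settings` and `is_allowed` are hypothetical helpers, the settings are assumed to have already been read from `elasticsearch.yml` into a flat dictionary, and because the default for `xpack.security.enabled` differs between Elasticsearch versions, anything other than an explicit true value is flagged for review.

```python
import ipaddress


def audit_settings(settings: dict) -> list:
    """Return findings for a flat dict of elasticsearch.yml settings."""
    findings = []
    # Defaults vary by version, so require the setting to be explicit.
    if settings.get("xpack.security.enabled") is not True:
        findings.append("xpack.security.enabled is not explicitly true")
    # Flag bind addresses that listen on every or any public interface.
    host = str(settings.get("network.host", ""))
    if host in ("0.0.0.0", "::", "_global_"):
        findings.append(f"network.host binds to {host}")
    return findings


def is_allowed(client_ip: str, allowed_cidrs: list) -> bool:
    """Check a client address against an allowlist of CIDR ranges."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)
```

Running a check like `audit_settings` as part of a deployment pipeline turns a silent misconfiguration into a failed build. The allowlist itself would normally be enforced at the firewall or network security group rather than in application code.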

After reporting their 2023 financials, Veritone assured investors that “management has taken significant steps to realign the organization and reduce costs.” Veritone’s restructuring is “expected to result in annualized savings of over 15% in operating expenses.” This cut in operating costs comes as a reaction to “a significant decrease in revenue and an increased net loss” in 2023.

When public and government services rely on the private sector to perform their duties, they entangle themselves in the contemporary business paradigm of constant and infinite financial growth that drives big companies and their investors. An extreme example of this is Boeing’s current struggle with failing products, failures that result in a loss of human life. Even Boeing’s leaders “are tepidly admitting that this shareholders-first, cut-costs, workers-be-damned strategy was flawed.” 

That same ideology, when applied to datasets that are used to determine the identities of criminal suspects, handle domestic surveillance data, and automatically redact sensitive governmental documents, could produce similarly devastating results.

Conclusion

This is not the first AI-related data breach. In 2023, Microsoft AI researchers accidentally exposed 38 terabytes of sensitive data. OpenAI’s ChatGPT had a data breach where some users were able to access the details of other users. Furthermore, over 100,000 sets of ChatGPT credentials were offered for sale on the dark web.

Centralized data stores that rely on third-party platforms become opportune targets not just for malicious activities but for mistakes that leave data exposed. The integration of AI into some of our most sensitive and controversial governmental and police practices raises the stakes for these exposures as the information becomes more valuable and potentially dangerous.

The increased risk of these technologies should bring a responsibility to protect the individuals whose data is being collected and stored, often without their knowledge or consent. Companies can’t reinvent every wheel, so they must rely on third-party solutions in their workflows. Understanding these solutions and ensuring that they are delivered securely should hold equal importance to the functionality they provide.
