Student Applications: How an Education Software Company Exposed Millions of Files

UpGuard can now report that a public Google Cloud Storage bucket containing approximately 1.5 terabytes of data used to administer funding programs for college students has been secured. The bucket belonged to SmarterSelect, a company that provides software for managing the application process for scholarships, grants, and awards. The more than 2.8 million files included documents like transcripts, resumes, personal essays, tax returns, and invoices for approximately 1.2 million applications to funding programs.

Discovery

On September 8, an UpGuard analyst detected the bucket hosted in Google Cloud Storage and began analysing the contents to determine the owner and affected entities. On September 15, UpGuard sent a breach notification email to the address listed in SmarterSelect’s privacy policy. On September 27, another notification was submitted through the support function on SmarterSelect’s site. On September 30, SmarterSelect replied. By October 5, public access to the bucket had been removed.

Significance

The contents of the bucket were organized into nine top-level directories. The directories with logos gave an indication of the number of organizations involved in this data exposure. The directories for “provider_logos” and “fund_logos” contained logos for a few hundred entities each, while “scholarship_logo” had over 15,000 logos. Two other directories contained the files with significant personal data: “file_attachment_file” and “exports,” which contained original files submitted as part of applications and exported summaries of applications, respectively.

The “exports” folder contained approximately 23,000 CSVs and 8,000 ZIP files. Inside the ZIP files were about 150,000 PDFs with dates ranging from November 5th, 2020 to September 29th, 2021. The PDF files were mostly “printed” copies of submitted applications and evaluations. The CSV files were categorized as “user,” “apps,” and “evals,” and held data about user accounts, application contents, and the evaluations of those applications by reviewers. Some of the data personally identified the evaluators, including their names, email addresses, organizations, and evaluation comments and results, but most of the information pertained to the applicants. The structure of the CSVs varied with the data requested by each application process, so a wide variety of data points was available in some files but not in others. Across all the CSVs there were 1.98 million unique email addresses.
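Because the CSV layouts differed from one application process to the next, a count of unique email addresses cannot rely on a fixed column name. The following is a minimal sketch of one way such a count could be made, not a description of the method UpGuard used: it scans every cell for anything that looks like an email address and deduplicates case-insensitively. The function name, the simplified pattern, and the sample rows are all hypothetical.

```python
import csv
import io
import re

# Deliberately simple pattern (an assumption for illustration); it is not a
# full validator and will miss or over-match some unusual addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def unique_emails(csv_texts):
    """Collect distinct email addresses from CSVs whose columns differ."""
    seen = set()
    for text in csv_texts:
        for row in csv.reader(io.StringIO(text)):
            # Scan every cell, since the email column is not known in advance.
            for cell in row:
                for match in EMAIL_RE.findall(cell):
                    seen.add(match.lower())  # treat case variants as one address
    return seen


# Hypothetical sample data standing in for two differently structured exports.
user_csv = "name,email\nAda,ada@example.org\nBen,BEN@example.org\n"
evals_csv = "evaluator,contact,score\nCy,cy@example.com,4\nAda,ada@example.org,5\n"

emails = unique_emails([user_csv, evals_csv])
print(len(emails))
```

Scanning cells rather than named columns trades precision for coverage: it tolerates varying headers but would also pick up addresses that appear inside free-text answers.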

For applicants, these files contained contact information like name, email address, and phone number, as well as details probing into their lives and backgrounds, like their parents’ education and income, the students’ performance at school, and personal experiences like living in a foster home or abusive situations.

In addition to the structured data, some files also contained the text of longer documents that had been submitted and reviewed. These included intensely revealing statements like letters of recommendation and personal essays detailing poverty, physical and sexual abuse, domestic violence, and other personal information.

The “file_attachment_file” directory had 2.79 million files, the vast majority of the total collection. These files were organized into 1.2 million subfolders, each of which contained the original files submitted by an applicant for a given funding opportunity, often PDF and DOCX files of applicants’ transcripts, letters of recommendation, and other academic documentation. Student photos were included when required as part of the application. Additionally, some applications included documents related to the applicant’s financial status, like FAFSA forms, which included the last four digits of the person’s social security number, and personal and parental tax returns, which included full social security numbers.

Manual review of a sample of documents indicated they were largely the kinds of documents commonly used to apply for scholarships: transcripts, personal statements, letters of recommendation, and other documentation of university status. Searching across the names of files in “file_attachment_file” gave a conservative indication of the number of files of each type. (Many more documents were simply titled with the applicant’s name and could not be classified based on file name.) Other documents answered the specific requirements of particular programs and included information like proof of COVID-19 vaccinations and descriptions of hardships.
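The file-name search described above can be sketched as a case-insensitive substring count. This is an illustrative sketch only: the function name, the search terms, and the sample file names are hypothetical, since the terms actually used are not listed in the text.

```python
from collections import Counter


def count_by_term(file_names, terms):
    """Count how many file names contain each (lowercase) search term."""
    counts = Counter({term: 0 for term in terms})
    for name in file_names:
        lowered = name.lower()
        for term in terms:
            if term in lowered:
                counts[term] += 1
    return counts


# Hypothetical file names; "jane_doe.pdf" matches no term and stays uncounted,
# which is why such a count is a conservative lower bound.
names = [
    "Transcript_Fall2020.pdf",
    "jane_doe.pdf",
    "Resume-final.docx",
    "official transcript.PDF",
    "essay_and_resume.pdf",
]

counts = count_by_term(names, ["transcript", "resume", "essay"])
print(dict(counts))
```

A file whose name contains more than one term is counted once per term, so the per-term totals can add up to more than the number of classified files.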

[Figure: Number of files corresponding to each search term]

[Figure: Invoice for five million dollar construction project]

[Figure: Application detailing personal hardships]

Conclusion

While Amazon S3 buckets are better known for their history of data exposures due to cloud misconfiguration, Google Cloud Storage has fundamentally the same configuration options. Like S3, Google Cloud Storage includes a UI element in the console that indicates when buckets or files are public to help users avoid data exposures, but misconfigurations still happen. In this case, the bucket and its contents were configured to be publicly accessible.
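One common way a Google Cloud Storage bucket becomes public is an IAM binding that grants a role to the special members `allUsers` or `allAuthenticatedUsers`. The text does not say which mechanism applied here, so the following is only a sketch of how such bindings could be flagged in an IAM policy that has already been fetched and parsed into a dictionary. The function name and the sample policy are hypothetical, and the check does not cover legacy per-object ACLs.

```python
# Special IAM member identifiers that mean "anyone" in Google Cloud.
PUBLIC_MEMBERS = {"allUsers", "allAuthenticatedUsers"}


def public_bindings(policy):
    """Return (role, member) pairs in an IAM policy that grant public access."""
    found = []
    for binding in policy.get("bindings", []):
        for member in binding.get("members", []):
            if member in PUBLIC_MEMBERS:
                found.append((binding.get("role"), member))
    return found


# Hypothetical policy in the JSON shape used for bucket IAM policies.
sample_policy = {
    "bindings": [
        {"role": "roles/storage.objectViewer", "members": ["allUsers"]},
        {"role": "roles/storage.admin", "members": ["user:admin@example.com"]},
    ]
}

exposed = public_bindings(sample_policy)
print(exposed)
```

An empty result means no public IAM binding was found in that policy. It does not prove the bucket is private, because object-level ACLs are evaluated separately.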

The contents of the bucket also serve as a reminder of the risks of collecting and retaining sensitive data, particularly for populations like college students. The process of applying to, attending, and securing funding for university education requires young people to provide detailed information about themselves to a complex institutional supply chain. Even well-intentioned programs aiming to assist students who have been disadvantaged by circumstances beyond their control (in fact, especially those programs that seek to help those most in need) require a detailed accounting of the facts of one’s life.

Where all that data ultimately goes, how it is secured, and whether it is ever destroyed are not under the control of the applicants. For the companies, non-profits, and universities that make up that digital supply chain, destroying that information once it is no longer needed may provide a more foolproof path to privacy than retaining and protecting it. As data exposures like this one continue to occur, and ransomware groups target schools across the world, minimizing cybersecurity risk continues to be an important part of an overall security strategy.

UpGuard Team

November 22, 2021
