
In a concerning revelation, several leading artificial intelligence companies — including Perplexity, Anthropic, Mistral, Cohere, and Midjourney — may have inadvertently exposed sensitive information on GitHub, according to a new report by cloud security firm Wiz. The findings suggest that around 65% of the world’s top AI companies have potential exposure risks involving proprietary model data, training processes, and API credentials.

AI Firms’ Data Reportedly Exposed on GitHub

Wiz’s researchers claim that many AI developers, while collaborating on GitHub, unknowingly publish sensitive data such as:

- API keys and access tokens

- Dataset details and metadata

- Training scripts and configuration files

- Model weights and biases

- Credentials linked to Hugging Face, Google Cloud, and ElevenLabs

Such information, if accessed by malicious actors, could provide insights into the architecture and training pipelines of proprietary AI systems — potentially leading to intellectual property theft, security breaches, or misuse of APIs.
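To illustrate how this happens, here is a minimal sketch (not taken from the Wiz report) of the risky pattern versus the safer one in a training script. It assumes the huggingface_hub client library is installed; the token placeholder and organisation name are hypothetical.

```python
import os
from huggingface_hub import HfApi  # third-party client, assumed available

# Risky pattern: a token pasted directly into a script is captured by every
# commit, fork, and gist that includes the file.
# api = HfApi(token="hf_XXXXXXXXXXXXXXXXXXXXXXXX")  # hypothetical placeholder

# Safer pattern: keep the secret out of the repository and read it from the
# environment (or a secrets manager) at runtime.
token = os.environ.get("HF_TOKEN")
if token is None:
    raise RuntimeError("HF_TOKEN is not set; refusing to run without a credential")

api = HfApi(token=token)
# e.g. api.list_models(author="my-org")  # hypothetical organisation name
```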


According to Wiz, this pattern was observed across numerous companies listed in Forbes’ AI 50, which includes high-profile names such as Anthropic, Perplexity, Cohere, Mistral, Suno, and World Labs.

How the Data Was Found

To identify potential exposure, Wiz’s team conducted a multi-layered analysis based on depth, coverage, and perimeter:

- Identifying employee accounts:

The researchers mapped employee identities by scanning LinkedIn for company followers, analysing GitHub metadata referencing the company, and cross-referencing contributors on platforms like Hugging Face.

- Scanning code repositories:

Once the accounts were identified, the team examined commit histories, deleted forks, workflow logs, and GitHub gists—even from private or previously deleted projects—to look for potential exposure.

- Depth search:

By tracing the full commit history of repositories, the team uncovered hidden data in code updates, branches, and test scripts that had been left unintentionally exposed (a minimal sketch of this kind of history scan appears below).

Shockingly, Wiz reported that some exposure occurred even when companies did not have any public repositories, suggesting that individual developers’ personal accounts or forks could still leak sensitive details.
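To make the depth-search idea concrete, the following sketch walks a repository's full commit history with plain git and flags added lines that match a few well-known token prefixes. This is an illustration under stated assumptions, not Wiz's actual tooling; the patterns and the repository path are placeholders.

```python
import re
import subprocess

# Token-like patterns: hf_ (Hugging Face) and AIza (Google API keys) are
# well-known prefixes; real scanners use far larger rule sets.
PATTERNS = [
    re.compile(r"hf_[A-Za-z0-9]{20,}"),
    re.compile(r"AIza[0-9A-Za-z\-_]{20,}"),
]

def scan_history(repo_path: str) -> list[str]:
    """Return diff lines from the full commit history that look like secrets."""
    # --all covers every branch; -p emits the patch text for each commit.
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--all", "-p"],
        capture_output=True, text=True, check=True,
    ).stdout
    hits = []
    for line in log.splitlines():
        # Only added lines matter for exposure; skip '+++' file headers.
        if line.startswith("+") and not line.startswith("+++"):
            if any(p.search(line) for p in PATTERNS):
                hits.append(line)
    return hits

if __name__ == "__main__":
    for hit in scan_history("."):  # path to a local clone; '.' is illustrative
        print(hit)
```

A secret removed in a later commit still shows up in this kind of scan, which is why deleting a leaked key from the current code is not enough; the credential also has to be revoked.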

The Scale and Severity of Exposure

The report emphasised that this was not a case of a single large breach but rather a systemic issue across the AI industry.

Some examples of leaked information included:

- Model weights and bias files, which could reveal proprietary training methods.

- Google Cloud API credentials, granting potential access to internal databases or compute environments.

- Hugging Face and ElevenLabs tokens, which might allow external users to interact with private model endpoints or datasets.

Researchers noted that AI companies carry greater risk because of the complex, collaborative development ecosystems they operate in. With hundreds of engineers, researchers, and third-party contributors working on platforms like GitHub, the chance of accidental exposure rises sharply.

Call for Stronger Security Practices

Wiz’s report underlined the urgent need for AI companies to adopt advanced scanning and alerting tools capable of detecting sensitive data exposure in real time.

The firm recommended:

- Automated code scanning tools to detect API keys and credentials before commits are made (a minimal example follows this list).

- Strict internal access controls to prevent sensitive model data from being shared publicly.

- Enhanced developer training on data hygiene and secure coding practices.

- Continuous monitoring systems to track data exposure across both public and private repositories.
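As a rough illustration of the first recommendation, a pre-commit style check can inspect staged changes before they leave a developer's machine. This is a toy sketch using the same illustrative patterns as above; in practice teams would use a dedicated secret scanner wired into a hook framework or CI.

```python
#!/usr/bin/env python3
"""Toy pre-commit check: block a commit if staged changes contain token-like
strings. A dedicated secret scanner is more robust; this only sketches the idea."""
import re
import subprocess
import sys

PATTERNS = [
    re.compile(r"hf_[A-Za-z0-9]{20,}"),      # Hugging Face style token prefix
    re.compile(r"AIza[0-9A-Za-z\-_]{20,}"),  # Google API key style prefix
]

def main() -> int:
    # Diff of what is staged for commit, i.e. what would actually be published.
    staged = subprocess.run(
        ["git", "diff", "--cached"],
        capture_output=True, text=True, check=True,
    ).stdout
    leaks = [
        line for line in staged.splitlines()
        if line.startswith("+") and not line.startswith("+++")
        and any(p.search(line) for p in PATTERNS)
    ]
    if leaks:
        print("Possible credential in staged changes; commit blocked:")
        for line in leaks:
            print("  " + line)
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())  # could be saved as .git/hooks/pre-commit (illustrative placement)
```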

Researchers noted that while GitHub remains integral to collaborative software development, it also creates a risk of inadvertent data leakage, particularly in rapidly evolving sectors like AI.


Broader Implications for the AI Industry

AI companies are under increased scrutiny over how they handle data privacy, training materials, and model transparency. Accidental leaks could compromise proprietary technologies and expose personal or proprietary information used in model training.

Wiz did not attribute specific exposures to individual companies, but its findings underline a continuing challenge for AI developers: balancing open collaboration with data protection.

As competition within AI increases, experts warn that even minor leaks of proprietary code or model data could provide rivals or malicious actors with valuable insight into an organization's inner workings.

Conclusion

As AI firms expand their development teams and open-source contributions, this report serves as a reminder that even minor lapses in data security can have lasting repercussions, especially given the high value placed on AI models and datasets as intellectual capital.
