DeepSeek’s data may have been publicly exposed, a cybersecurity research firm has claimed. According to the firm's report, a publicly accessible ClickHouse database belonging to DeepSeek was discovered that allowed full control over database operations. The exposure is also said to include a large volume of sensitive information, including chat history, secret keys, log timestamps, and backend details. It is unclear whether the firm reported the matter to the Chinese AI firm, or whether the exposed database has been taken down.
DeepSeek’s Dataset Might Have Suffered a Breach
In a blog post, cybersecurity firm Wiz Research revealed that it found a completely open, unauthenticated database containing highly sensitive information about the DeepSeek platform. The exposed information is said to pose a potential risk to both the AI firm and its end users.
The cybersecurity firm said it set out to assess DeepSeek’s external security and identify potential vulnerabilities, given the rising popularity of the AI platform. The researchers started by mapping Internet-facing subdomains but did not find anything that suggested a high-risk exposure.
However, after applying additional techniques, the researchers detected two open ports (8123 and 9000) on multiple public hosts. These are the default ports for ClickHouse's HTTP interface and native TCP interface, respectively. Wiz Research claimed that the ports led them to a publicly exposed ClickHouse database that could be accessed without any authentication.
Notably, ClickHouse is an open-source, columnar database management system originally developed by Yandex. It is designed for fast analytical queries on large datasets and is widely used for real-time data processing, log storage, and big data analytics.
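For administrators who want to check whether their own ClickHouse deployment has the same problem, the following is a minimal Python sketch, not the method Wiz Research used. It sends a harmless `SELECT 1` to the HTTP interface on port 8123 and reports whether the server answered without credentials. The function names, the example host, and the choice to treat HTTP 401 and 403 as "authentication required" are assumptions made for illustration. It should only be run against hosts you own or are authorised to test.

```python
import urllib.error
import urllib.request
from urllib.parse import urlencode


def build_probe_url(host: str, port: int = 8123, query: str = "SELECT 1") -> str:
    """Build a URL for ClickHouse's HTTP interface carrying a harmless query."""
    return f"http://{host}:{port}/?{urlencode({'query': query})}"


def classify_response(status: int, body: str) -> str:
    """Interpret the reply to an unauthenticated SELECT 1 (labels are illustrative)."""
    if status == 200 and body.strip() == "1":
        return "open"            # the query ran without any credentials
    if status in (401, 403):     # assumption: these codes mean auth is enforced
        return "auth-required"
    return "unknown"


def probe(host: str, port: int = 8123, timeout: float = 3.0) -> str:
    """Send the probe to a host you control and classify the outcome."""
    url = build_probe_url(host, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return classify_response(resp.status, body)
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status.
        return classify_response(err.code, "")
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, or host not resolvable.
        return "unreachable"
```

Calling `probe("db.example.internal")` (a hypothetical hostname) would return `"open"` only if the server executed the query without credentials, which is the condition the report describes.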
A log stream table in the database is claimed to contain more than one million log entries, with timestamps dating back to January 6. The entries reportedly include references to multiple internal DeepSeek application programming interface (API) endpoints, as well as chat history, API keys, backend details, and operational metadata, all in plaintext.
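One common way to limit the damage from a leak of this kind is to redact secrets before log lines are stored. The sketch below illustrates the idea in Python. The key format it matches (`sk-` followed by at least 16 alphanumeric characters) is a hypothetical pattern chosen for the example, not DeepSeek's actual key format, and the function name is likewise illustrative.

```python
import re

# Hypothetical key format for illustration: "sk-" plus 16 or more alphanumerics.
KEY_PATTERN = re.compile(r"\bsk-[A-Za-z0-9]{16,}\b")


def redact(line: str) -> str:
    """Replace anything that looks like an API key with a fixed placeholder."""
    return KEY_PATTERN.sub("[REDACTED]", line)


if __name__ == "__main__":
    sample = "POST /v1/chat key=sk-abcdefghijklmnop1234 status=200"
    print(redact(sample))
```

A filter like this would sit in the logging pipeline, so that even if the log store were later exposed, the entries would not contain usable credentials.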
The researchers claimed that with this level of access, a bad actor could potentially exfiltrate passwords, local files, and proprietary information directly from the server. At the time of writing, there was no update on whether the exposure had been contained or whether the database had been taken offline.