The New York Times source code leaked by a 4chan user

A user on the online forum 4chan has leaked 270GB of data belonging to The New York Times, including the source code for the newspaper’s digital operations.

The user who posted the data claimed that The New York Times has over 5,000 source code repositories, fewer than 30 of which are encrypted. The leaked data is said to contain approximately 3,600,000 files in total.

Initially, those attempting to download the data via torrent were reportedly stuck at 85% and unable to complete the download.

A few hours later, the person responsible for the leak posted a second torrent link that offered the files separately rather than as a single archive. The new torrent contains three folders: nytimes, nytm, and TheAthletic. The TheAthletic folder holds two files, iOS.tar and android.tar; we verified that they contain the source code of The Athletic’s mobile apps.

The new torrent also made it possible to list every repository in the archive, which contains over 6,200 repositories in total.

...
repos/nytimes/nyt-brand-site
repos/nytimes/nyt-cancel
repos/nytimes/nyt-cancel-bff
repos/nytimes/nyt-cancel-bff-old
repos/nytimes/nyt-cancel-e2e
repos/nytimes/nyt-cheese
repos/nytimes/nyt-chrome-env-switcher
repos/nytimes/nyt-chrono
repos/nytimes/nyt-classified-mac-app-signing
...
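
A listing like the one above can be derived from the file paths in a torrent or archive manifest. The following is a minimal sketch, assuming every file sits under a `repos/<organization>/<repository>/` prefix as in the excerpt; the function names and the file names in the sample are hypothetical and used only for illustration.

```python
from collections import Counter


def list_repositories(paths, root="repos"):
    """Return the sorted, unique 'root/org/repo' prefixes found in file paths.

    Assumes the layout root/<organization>/<repository>/...; any entry that
    sits directly under root/<organization>/ is treated as a repository.
    """
    repos = set()
    for path in paths:
        parts = path.strip().strip("/").split("/")
        if len(parts) >= 3 and parts[0] == root:
            repos.add("/".join(parts[:3]))
    return sorted(repos)


def count_by_org(repos):
    """Count repositories per organization folder (the second path component)."""
    return Counter(repo.split("/")[1] for repo in repos)


# Repository names come from the excerpt above; the file names are made up.
sample_paths = [
    "repos/nytimes/nyt-cancel/README.md",
    "repos/nytimes/nyt-cancel/src/index.js",
    "repos/nytimes/nyt-cheese/package.json",
    "repos/nytimes/nyt-chrono/main.go",
]

repos = list_repositories(sample_paths)
for repo in repos:
    print(repo)
print(dict(count_by_org(repos)))
```

Collecting the prefixes in a set means a repository is counted once no matter how many files it contains, which is what makes a total such as "over 6,200 repositories" straightforward to compute from millions of file paths.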

Here are some other findings we can confirm:

The leak includes the original source code of the game Wordle, which the Times acquired in 2022.
The leak includes a WordPress database of 1,500 NY Times Education site users. The database contains first and last names, email addresses, and hashed passwords.
Several folders contain internal communications from Slack channels.
Many authentication secrets are exposed, including authentication URLs with their passwords, secret keys, and API tokens. The majority are well protected, but plenty of them need immediate attention. We have also seen private user keys used for authentication.
The repositories reveal many details about the internal architecture from a software development point of view.
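
Hard-coded secrets of the kind described above are what defenders typically look for when auditing their own repositories. The sketch below is only an illustration of that idea, not the method used to review this leak: it checks text line by line against three well-known credential formats (AWS access key IDs, GitHub personal access tokens, and PEM private key headers). The pattern set and the function name are assumptions, and dedicated secret scanners cover far more cases.

```python
import re

# Illustrative patterns only; dedicated secret scanners ship far larger rule sets.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key_block": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def scan_text(text):
    """Return (line_number, pattern_name) for every line that matches a pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


# AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
sample = "\n".join([
    "region = us-east-1",
    "aws_key = AKIAIOSFODNN7EXAMPLE",
    "-----BEGIN RSA PRIVATE KEY-----",
])

for lineno, name in scan_text(sample):
    print(f"line {lineno}: possible {name}")
```

A match is only a candidate: each finding still has to be checked to see whether the credential is live, and any that are should be rotated.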

So far, it is difficult to say whether the NY Times will need to reset the passwords of everyone with an account on its main site.

It’s worth pointing out that this leak appears to involve data from The New York Times’s IT/infrastructure/website organization rather than the news organization composed of reporters. In media companies, these two entities are largely separate. The IT/infrastructure team handles the technical aspects of the website and digital operations, while the news organization manages reporting and editorial content.

Typically, these organizations use different internal GitHub instances, with the news organization likely having much stricter access controls due to the sensitivity of their code, data, and assets. This separation helps ensure that the most critical and sensitive information remains secure, even if other parts of the company experience a breach.

A little over two days after the leak was first announced, the Times issued a statement that confirmed the incident and said when it occurred:

In January 2024, a credential for a third-party cloud-based code platform was accidentally exposed. We identified the issue quickly and took necessary actions. There is no evidence of unauthorized access to Times-owned systems or any impact on our operations. We continuously monitor for any unusual activity to ensure security.
The New York Times

The third-party platform in question is GitHub. Notably, the statement also says there is no evidence of unauthorized access to Times-owned systems. That assessment may well change as threat actors take the time to work out how the exposed credentials and other systems can be abused.

As we saw with the Twitch source code leak in 2021, the full consequences of leaks this large take time to emerge.

Posted by api