PyTorch infrastructure was compromised

Details were recently released about an attack on the infrastructure used to develop the PyTorch machine learning framework. According to the technical details disclosed, the attacker managed to extract access keys that made it possible to upload arbitrary data to the project's GitHub repository and AWS storage, replace code in the main branch of the repository, and add a backdoor through dependencies.

This incident poses significant risks, since tampered PyTorch releases could be used to attack large companies such as Google, Meta, Boeing and Lockheed Martin, which use PyTorch in their projects.

Four months ago, Adnan Khan and I exploited a critical CI/CD vulnerability in PyTorch, one of the world's leading machine learning platforms. Used by titans like Google, Meta, Boeing, and Lockheed Martin, PyTorch is a major target for hackers and nation-states alike.

Fortunately, we took advantage of this vulnerability before the bad guys did.

This is how we did it.

The attack comes down to the ability to run code on the continuous integration servers that rebuild the project and run jobs to test new changes submitted to the repository. The issue affects projects that use external self-hosted runners with GitHub Actions. Unlike GitHub-hosted runners, self-hosted runners do not run on GitHub's infrastructure, but on servers or virtual machines maintained by the project's developers.
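As a rough illustration of how a project might check its own exposure, the sketch below scans the text of a workflow file for jobs that request a self-hosted runner and for a pull request trigger. The workflow text and the function names are hypothetical, and the check is only a heuristic: self-hosted runners can also be selected through custom labels or runner groups, which this sketch would not detect.

```python
import re

# Hypothetical workflow text, used only for illustration.
EXAMPLE_WORKFLOW = """\
name: build
on: [pull_request]
jobs:
  test:
    runs-on: [self-hosted, linux]
    steps:
      - run: make test
"""

# Matches a "runs-on:" line and captures everything after the colon.
RUNS_ON = re.compile(r"^[ \t]*runs-on:[ \t]*(.+)$", re.MULTILINE)


def self_hosted_jobs(workflow_text: str) -> list[str]:
    """Return the raw runs-on values that mention the 'self-hosted' label."""
    return [
        match.group(1).strip()
        for match in RUNS_ON.finditer(workflow_text)
        if "self-hosted" in match.group(1)
    ]


def triggered_by_pull_request(workflow_text: str) -> bool:
    """Heuristic: does the workflow mention a pull_request trigger?"""
    return re.search(r"\bpull_request(_target)?\b", workflow_text) is not None


if __name__ == "__main__":
    print(self_hosted_jobs(EXAMPLE_WORKFLOW))
    print(triggered_by_pull_request(EXAMPLE_WORKFLOW))
```

A workflow that matches both checks is a candidate for closer review, since it may run code from pull requests on machines the project maintains itself.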

Running build jobs on your own servers makes it possible for a malicious job to scan a company's internal network, search the local file system for encryption keys and access tokens, and inspect environment variables holding parameters for access to external storage or cloud services. Through these runners, the attacker was able to execute build jobs on the project's servers and search them for keys and access tokens in exactly this way.
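To get a sense of what a build job can see, a maintainer could audit the environment of their own runner. The following is a minimal sketch under the assumption that credential-like variables can be recognized by name; the pattern list and the function name are illustrative, not exhaustive, and only variable names are reported, never values.

```python
import os
import re

# Illustrative name patterns that often indicate credentials (an assumption).
SENSITIVE_NAME = re.compile(
    r"(TOKEN|SECRET|PASSWORD|ACCESS_KEY|PRIVATE_KEY)", re.IGNORECASE
)


def sensitive_env_names(env=None):
    """Return the sorted names of variables that look like credentials."""
    env = os.environ if env is None else env
    return sorted(name for name in env if SENSITIVE_NAME.search(name))


if __name__ == "__main__":
    # Prints the names only; values are deliberately not shown.
    for name in sensitive_env_names():
        print(name)
```

Any name this prints on a runner that builds untrusted pull requests is a secret that code from those pull requests could read.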

In PyTorch and other projects that use self-hosted runners, build jobs for an outside developer's changes can normally be run only after those changes have been reviewed. However, the attacker managed to bypass this check by first submitting a minor change and, once it was accepted, automatically obtaining "contributor" status, which allowed code to be run in any GitHub Actions runner environment associated with the repository or its parent organization. During the attack, GitHub access tokens and AWS keys were intercepted, allowing the attacker to compromise the infrastructure.

The "contributor" requirement turned out to be easy to bypass: it is enough to first submit a minor change and wait for it to be accepted into the code base, after which the author automatically receives the status of an active participant whose pull requests can be tested in the CI infrastructure without separate approval. To achieve this status, the researchers submitted minor cosmetic changes that fixed typos in the documentation. To gain access to the repository and to the storage holding PyTorch releases, code executed on the self-hosted runner intercepted the GitHub token used by the build processes to access the repository (the GITHUB_TOKEN allowed write access), as well as the AWS keys used to save build results.
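The bypass described above can be sketched as a small decision model. This is a simplified illustration, not GitHub's implementation: the enum members are hypothetical names loosely based on the approval options GitHub offers for pull requests from forks, and the two boolean inputs are assumptions made for the example.

```python
from enum import Enum


class ApprovalPolicy(Enum):
    """Hypothetical names for fork pull request approval settings."""
    FIRST_TIME_NEW_TO_GITHUB = 1
    FIRST_TIME_CONTRIBUTORS = 2
    ALL_OUTSIDE_COLLABORATORS = 3


def needs_approval(policy: ApprovalPolicy,
                   has_merged_contribution: bool,
                   account_is_new: bool) -> bool:
    """Return True if a maintainer must approve the CI run for a fork PR."""
    if policy is ApprovalPolicy.ALL_OUTSIDE_COLLABORATORS:
        # Every pull request from outside the project waits for approval.
        return True
    if policy is ApprovalPolicy.FIRST_TIME_CONTRIBUTORS:
        # One merged change, such as a typo fix, removes the gate.
        return not has_merged_contribution
    # Weakest setting: only new accounts with no merged change are gated.
    return account_is_new and not has_merged_contribution


if __name__ == "__main__":
    policy = ApprovalPolicy.FIRST_TIME_CONTRIBUTORS
    print(needs_approval(policy, has_merged_contribution=False, account_is_new=False))
    print(needs_approval(policy, has_merged_contribution=True, account_is_new=False))
```

In this model, only the strictest setting keeps the approval step in place after an attacker has had a trivial change merged.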

This issue is not specific to PyTorch: it also affects other large projects that use the default configuration for self-hosted runners in GitHub Actions.

In addition, the researchers mention that similar attacks are possible against cryptocurrency and blockchain projects, Microsoft DeepSpeed, TensorFlow and other projects, with potentially serious consequences. They have submitted more than 20 reports to bug bounty programs, seeking rewards worth several hundred thousand dollars.

Finally, if you are interested in knowing more about it, you can check the details in the following link.

