GitHub will create a TAR image of each active public repository and maintain it in an Arctic Vault


GitHub wants to make sure that some of the world's knowledge, which today lives on hard drives and SSDs (whose theoretical life of 30 years assumes strictly controlled heat and humidity), is stored safely. It wants to help address this problem, along with others such as disasters that could cause content to be lost.

That is why it launched the "Arctic Code Vault" project, whose idea is to save the contents of repositories on a storage medium with a longer shelf life. Piql, a Norwegian company specializing in very long-term data storage, is responsible for supplying the film and encoding the data on it. The film technology is based on silver halides and polyester.

Since servers and flash drives are not robust enough for this purpose, the data is encoded on what look like old-school movie reels. Each reel weighs a few pounds and is stored in a white plastic container the size of a pizza box. It is basically microfilm.

According to ISO measurements, this material has a useful life of 500 years. Simulated aging tests indicate that Piql film will last twice as long.

GitHub plans to store the reels in a decommissioned coal mine in the Svalbard archipelago. The archive is closer to the North Pole than to the Arctic Circle.

The city itself, one of the northernmost on the planet, is home to a global cold-storage facility. Archivists believe that the cold and nearly constant conditions will help preserve the contents.

On February 2, 2020, GitHub will create a TAR image of each active public repository and keep it in the Arctic Code Vault. The snapshot will include the contents of the default branch of each repository, excluding any binary files larger than 100 kilobytes. For higher data density and integrity, most of the data will be stored as QR codes. A human-readable index and guide will detail the location of each repository and explain how to retrieve the data.
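To make the snapshot rule more concrete, here is a minimal Python sketch of the idea: pack a checked-out default branch into a TAR file while leaving out binary files above the size limit. This is not GitHub's actual tooling. The function names, the NUL-byte test for "binary" and the reading of "100 kilobytes" as 100 * 1024 bytes are all assumptions made for illustration.

```python
import os
import tarfile

# Assumption: "100 kilobytes" is read here as 100 * 1024 bytes.
MAX_BINARY_BYTES = 100 * 1024


def looks_binary(path, probe_size=8192):
    """Heuristic (an assumption): a file is binary if its first bytes contain NUL."""
    with open(path, "rb") as handle:
        return b"\0" in handle.read(probe_size)


def create_snapshot(checkout_dir, archive_path):
    """Write a TAR file of checkout_dir and return the relative paths left out.

    checkout_dir is assumed to be a working copy of the default branch.
    Only regular files are archived; the .git directory is not included.
    """
    skipped = []
    with tarfile.open(archive_path, "w") as archive:
        for folder, dirnames, filenames in os.walk(checkout_dir):
            # Prune .git in place so os.walk does not descend into it.
            dirnames[:] = sorted(d for d in dirnames if d != ".git")
            for name in sorted(filenames):
                full_path = os.path.join(folder, name)
                if os.path.islink(full_path) or not os.path.isfile(full_path):
                    continue
                relative = os.path.relpath(full_path, checkout_dir)
                too_big = os.path.getsize(full_path) > MAX_BINARY_BYTES
                if too_big and looks_binary(full_path):
                    skipped.append(relative)
                    continue
                archive.add(full_path, arcname=relative)
    return skipped
```

Calling `create_snapshot("my-repo", "my-repo.tar")` (both paths hypothetical) would write the archive and return the list of files that were left out. Large text files are kept, since the rule in the article only excludes binaries.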

 

The platform then plans to multiply the lifetime of the backup by 10. To that end, GitHub has entered into a partnership with Microsoft Research, aiming at storage that lasts up to 10,000 years. To achieve this, research teams intend to write the contents onto quartz glass platters using femtosecond lasers.

The Arctic Code Vault is part of an archive program launched by GitHub with a number of partners, including the Internet Archive, Microsoft Research, and the Long Now Foundation. The strategy boils down to archiving content across multiple organizations, following the LOCKSS recommendation: "lots of copies keep stuff safe".

The backup strategy is organized in tiers that are updated at different frequencies. At the GitHub level, for example, data is replicated in real time to multiple data centers around the world. Other tiers will be updated monthly or annually. Finally, the tier that contains the Arctic Code Vault will be updated every five years at least.

"Our main mission is to preserve free software for future generations. We also intend the GitHub Archive Program to be a testament to the importance of the open source community. We hope that, today and in the future, it will raise awareness of the global Open Source movement, contribute to greater adoption of Open Source and Open Data policies around the world, and encourage long-term thinking," writes GitHub.

If you want to know more about the Arctic Code Vault project, you can consult the following link.



  1.   anonymous said

    What a bad smell this has... You can call me paranoid, but the first thing I thought was this:
    If I wanted to change something under everyone's nose, how would I do it?
    I would make a backup to another medium, then fake a failure and delete or ruin the original, then recover from the backup what I want and how I want, telling everyone that this is the original copy.
    Maybe my imagination is very creative, but for a moment... think about it.