Mystery group creates publicly accessible online Sapa archive

A group of civic-minded hackers have come together to launch SapaFiles, a publicly available, searchable archive of articles belonging to the recently shuttered South African Press Association (Sapa), dating back to 1998.

The group decided to make the archives available after it was announced that the 77-year-old news wire service would be closing its doors on 1 April.

“We took the extraordinary step of saving as much as we could and making it public, because we believe that the Sapa archive represents an invaluable South African historical asset, that should not be controlled by one private company,” the group said in an email sent to Memeburn. “Controlling such an asset could allow the owner to literally rewrite South Africa’s history, or simply bury it”.

When Sapa sent out a letter to its subscribers requesting that news sites across the country delete Sapa content from their archives, the group felt compelled to speed up its efforts.

The letter was sent out because the service’s archives had been bought by Sekunjalo, which also made the company the owner of all the intellectual property contained within the archive. Sapa eventually backed off that demand in the face of widespread criticism, but the implications were nonetheless alarming.

Aside from the potential for information to be buried, online news sites stood to lose out on advertising revenue made from archived content. Even a site like the Mail & Guardian, which produces large amounts of original content, uses as many as 75 Sapa stories a month. When Memeburn broke news of the deletion demands, Mail & Guardian editor-in-chief Chris Roper said that this was “not a lot for a 24/7 news site”.

The group behind SapaFiles says that it managed to grab around 1.9-million articles before Sapa shut down, including most of the content produced between 1998 and 2006, as well as most of 2015. It adds that it hopes to eventually complete the archive, which it says “holds tremendous value for researchers, historians, and academics, who do not have access to it”.

“We also believe that it could be a valuable resource to journalists for researching the history of a story, and for investigative journalists looking for links between companies and individuals,” the group adds.

For reasons of anonymity, the archive is currently only available on Tor, the highly secure anonymous communication network.

The group says it hopes to “remedy this in time”, but is “extremely cautious” about doing so right now.

The archive can, however, be reached from the open web through the Tor2Web gateway.
