Getting TMA securely archived for posterity: EDS/anyone
Evergreen Dazed
1881 posts

Re: Getting TMA securely archived for posterity: EDS/anyone
Mar 27, 2016, 12:41
Aha! From that there wiki:


Methods of collection

Remote harvesting
The most common web archiving technique uses web crawlers to automate the process of collecting web pages. Web crawlers typically access web pages in the same manner that users with a browser see the Web, and therefore provide a comparatively simple method of remote harvesting web content. Examples of web crawlers used for web archiving include:

Heritrix
HTTrack
Wget

There exist various free services which may be used to archive web resources "on-demand", using web crawling techniques. These services include the Wayback Machine and WebCite.
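
To give a feel for what a crawler does under the hood, here is a minimal sketch in Python. It is not how Heritrix, HTTrack or Wget are actually implemented; it just follows links breadth-first within one host and keeps each page's HTML. The names `crawl`, `LinkExtractor` and the `fetch` callable are made up for illustration, and `fetch` is assumed to return the page's HTML as a string, or None if the page could not be retrieved.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag found in one page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl restricted to the start URL's host.

    `fetch` is a hypothetical callable: url -> HTML string, or None on failure.
    Returns a dict mapping each retrieved URL to its HTML.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    archive = {}
    while queue and len(archive) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page could not be retrieved; skip it
        archive[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment
            target = urljoin(url, href).split("#")[0]
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return archive
```

For a real site, `fetch` could wrap something like `urllib.request.urlopen`, with a polite delay between requests; passing it in as a parameter keeps the crawl logic separate from the networking.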

Database archiving
Database archiving refers to methods for archiving the underlying content of database-driven websites. It typically requires the extraction of the database content into a standard schema, often using XML. Once stored in that standard format, the archived content of multiple databases can then be made available using a single access system. This approach is exemplified by the DeepArc and Xinq tools developed by the Bibliothèque nationale de France and the National Library of Australia respectively. DeepArc enables the structure of a relational database to be mapped to an XML schema, and the content exported into an XML document. Xinq then allows that content to be delivered online. Although the original layout and behavior of the website cannot be preserved exactly, Xinq does allow the basic querying and retrieval functionality to be replicated.
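
The general idea of exporting database content to XML can be sketched in a few lines of Python. This is not DeepArc or Xinq, and the XML layout here (`table`, `row` and `field` elements) is an assumption chosen for illustration rather than any standard schema. It uses SQLite only because that ships with Python; the function name `export_table_to_xml` is hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_table_to_xml(conn, table):
    """Dump every row of `table` as <row> elements under a <table> root.

    The element and attribute names are illustrative, not a standard schema.
    """
    # The table name is interpolated into the SQL, so it must be trusted input
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    root = ET.Element("table", name=table)
    for row in cursor:
        row_el = ET.SubElement(root, "row")
        for column, value in zip(columns, row):
            field = ET.SubElement(row_el, "field", name=column)
            # NULL becomes an empty element; everything else is stringified
            field.text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")
```

Calling it once per table gives a set of XML documents that no longer depend on the original database software, which is the point of this approach.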

Transactional archiving
Transactional archiving is an event-driven approach, which collects the actual transactions which take place between a web server and a web browser. It is primarily used as a means of preserving evidence of the content which was actually viewed on a particular website, on a given date. This may be particularly important for organizations which need to comply with legal or regulatory requirements for disclosing and retaining information.

A transactional archiving system typically operates by intercepting every HTTP request to, and response from, the web server, filtering each response to eliminate duplicate content, and permanently storing the responses as bitstreams.
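
The store-and-deduplicate step described above might look roughly like this in Python. It is only a sketch: the interception of HTTP traffic is left out, and the class name `TransactionArchive` and its methods are hypothetical. Responses are assumed to arrive as bytes, and duplicates are detected by SHA-256 digest, which is one reasonable choice rather than what any particular product does.

```python
import hashlib
from datetime import datetime, timezone


class TransactionArchive:
    """Stores each distinct response body once and logs every transaction."""

    def __init__(self):
        self.bitstreams = {}  # digest -> response body (bytes)
        self.log = []         # (UTC timestamp, url, digest), one per response

    def record(self, url, body):
        """Log one response; returns True if its content was not seen before."""
        digest = hashlib.sha256(body).hexdigest()
        is_new = digest not in self.bitstreams
        if is_new:
            self.bitstreams[digest] = body  # keep only one copy of each body
        timestamp = datetime.now(timezone.utc).isoformat()
        self.log.append((timestamp, url, digest))
        return is_new

    def viewed(self, url):
        """Everything served for `url`, in order, as (timestamp, body) pairs."""
        return [(ts, self.bitstreams[d]) for ts, u, d in self.log if u == url]
```

Keeping the log separate from the stored bodies means identical responses cost one log entry each but only one copy of the content, while `viewed` can still show what was served on a given date.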