Killing the single biggest research irritation

Sometimes I strongly suspect that the First Law of Research is: “The probability of the server being down is directly proportional to the usefulness of the article.” One of the most frustrating experiences in research is being unable to access a good article, either because of technical problems or because it has been moved or deleted. However, there are some easy ways to keep one’s sanity when this happens.

Solution #1: Google Cache

Google maintains a backup of the way a webpage looked the last time it crawled over it with its little pointy digital claws. A little link labeled “Cached” will appear next to the page’s entry in the Google search results (circled in red in the image below).

[Image: Google search result for Wikipedia, with the “Cached” link circled in red]

Click this link to view Google’s saved version. This is (mostly) stored on Google’s servers, so it doesn’t matter if you can’t access the “real” version – as long as Google is available, you should be able to get it. Some elements, such as images, may be unavailable, however. For a more complete copy, try the next solution.

Solution #2: Archive.org

Archive.org maintains, among other things, what basically amounts to a running backup of the internet dating back to about 1996. If a webpage has been around for at least a few months, it’s almost certainly available on Archive.org. They make their backups all freely available through the WayBack Machine. The debate applications are obvious – if you can’t access the article, you can look at the version archived in the WayBack Machine.

I do this so often that I’ve created a special JavaScript “WayBack It” button (a bookmarklet) on my toolbar. Just create a new bookmark, and set the location/URL to:

javascript:void(location.href='https://web.archive.org/web/*/'+location.href);

Just click it whenever you get an error. It will automatically load up the WayBack Machine archive for that page. Then click on the date of the version you want to view.
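The idea behind the button, building a WayBack Machine address from a page's own URL, can also be sketched as ordinary JavaScript functions. This is only an illustration: the function names are hypothetical, and the one thing taken from Archive.org is the address pattern, where `*` opens the list of all archived versions and a 14-digit date (YYYYMMDDhhmmss) asks for the version closest to that moment.

```javascript
// Hypothetical helper names; only the web.archive.org address pattern is Archive.org's.
const WAYBACK_BASE = 'https://web.archive.org/web/';

// Address of the page listing every archived version (what the bookmarklet opens).
function waybackCalendarUrl(pageUrl) {
  return WAYBACK_BASE + '*/' + pageUrl;
}

// Format a Date as the 14-digit UTC timestamp used in WayBack addresses.
function waybackTimestamp(date) {
  const pad = (n, width = 2) => String(n).padStart(width, '0');
  return (
    pad(date.getUTCFullYear(), 4) +
    pad(date.getUTCMonth() + 1) + // getUTCMonth() is zero-based
    pad(date.getUTCDate()) +
    pad(date.getUTCHours()) +
    pad(date.getUTCMinutes()) +
    pad(date.getUTCSeconds())
  );
}

// Address of the archived version closest to a given date.
function waybackSnapshotUrl(pageUrl, date) {
  return WAYBACK_BASE + waybackTimestamp(date) + '/' + pageUrl;
}

// Example with a made-up article address.
const article = 'http://example.com/article.html';
console.log(waybackCalendarUrl(article));
console.log(waybackSnapshotUrl(article, new Date(Date.UTC(2009, 5, 15))));
```

The second form is handy when you already know roughly when the article was published and want to skip the calendar page.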

Archive.org has its caveats, of course. Pages do take a while to become available, so very recent articles probably won’t be up yet. (Google Cache is a better option for recent articles.) Additionally, some sites specifically prevent Archive.org from mirroring them. But, for the vast majority of articles, it’s a very useful tool.

TIP: Another very useful thing you can use Archive.org for is to find out the approximate date an article first appeared online. If you can’t find out the date any other way, look it up on Archive.org and look at the date of the first version they have archived. This isn’t very accurate, since it only tells you the article existed by that date and it may have been online for some time before Archive.org first saved it, but it’s certainly better than not knowing at all.
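If you would rather not click through the calendar, the date lookup can be roughly automated. The sketch below assumes Archive.org's availability endpoint (`https://archive.org/wayback/available`), which returns the archived version closest to a requested timestamp; asking for the start of 1996 should therefore return the oldest version it can find, though that is an assumption about how the service picks its match. The function names are hypothetical, and `fetch` requires a browser or a recent Node.js.

```javascript
// Hypothetical helper names. The endpoint is Archive.org's availability API.
function availabilityApiUrl(pageUrl, timestamp) {
  const params = new URLSearchParams({ url: pageUrl, timestamp });
  return 'https://archive.org/wayback/available?' + params.toString();
}

// Turn a 14-digit WayBack timestamp (YYYYMMDDhhmmss, UTC) into a Date.
function parseWaybackTimestamp(ts) {
  const m = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$/.exec(ts);
  if (!m) throw new Error('Unexpected timestamp: ' + ts);
  const [year, month, day, hour, minute, second] = m.slice(1).map(Number);
  return new Date(Date.UTC(year, month - 1, day, hour, minute, second));
}

// Read the capture date out of an availability response; null if nothing is archived.
function captureDateFromResponse(json) {
  const closest = json && json.archived_snapshots && json.archived_snapshots.closest;
  return closest && closest.timestamp ? parseWaybackTimestamp(closest.timestamp) : null;
}

// Ask for the version closest to the start of 1996, i.e. roughly the oldest one.
async function earliestCaptureDate(pageUrl) {
  const response = await fetch(availabilityApiUrl(pageUrl, '19960101'));
  if (!response.ok) throw new Error('HTTP ' + response.status);
  return captureDateFromResponse(await response.json());
}

// Usage (performs a network request, so it is left commented out):
// earliestCaptureDate('http://example.com/article.html').then((d) => console.log(d));
```

Treat the result the same way as the manual lookup: it is the date Archive.org first saved the page, not necessarily the date the article was published.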
