Sunday, April 19, 2009

Cyber-Archaeology of the future

I am working on a television show about science, and in preparing the segments we want to film I have to do a fair amount of research to gather information for the hosts so that they can simplify it for the audience.  The show is more about cool visuals than real science, but there is a drive to keep everything we do grounded in science of some redeeming value.

In the process of looking up random bits of scientific information (mostly through Google searches), a picture forms in my mind of the internet as a whole and the value of a search engine to sift through ever-increasing numbers of pages to find what I'm looking for.  I see pages like this one, buried in a government website, that stand alone, relatively unconnected to the rest of the internet (well...I guess not anymore now that I'm linking to it).  It's not an old website; it just looks like one.  But what about all those old angelfire pages (here's a fabulous example)?  What will become of them in the future as the internet grows and changes?  More importantly, what obscure tidbits of information might be contained in them that future netizens will want to find?

I envision areas of study dedicated to delving through old websites and mining them for information.  The Indiana Jones of the future will be a master googler rather than a bullwhip-wielding grave robber.  All kinds of random information might be hidden in rarely visited sites that are too small to bother deleting.  They will be copied onto newer, more massive servers in bulk information transfers, along with all their early 21st-century secrets, and will collect cyber-dust until one day someone stumbles across them.

And then there will be that rare gem that someone knows is out there, but doesn't know how to find.  In the year 2080, how will someone find the credits for that Cadillac commercial with Kate Walsh that came out in 2008?  What was on the cover of USA Today 75 years ago?  The 'official' sources of information for these kinds of questions may actually be unavailable - if the companies in question went bankrupt and official websites were taken offline - but what if there is a random blog post about the subject hidden away somewhere, still accessible online because it was never important enough to delete?

Google and Wikipedia are doing a great job of organizing the internet, but there's still a lot of mess out there - and maybe that's good for future cyber-archaeologists.