Article: 182729 of alt.sysadmin.recovery
From: logan@cs.utexas.edu (Logan Shaw)
Newsgroups: alt.sysadmin.recovery
Subject: Re: $*&!ing Deja/Google/Assmonkeys
Date: 15 Feb 2001 08:40:57 -0600
Organization: CS Dept, University of Texas at Austin
Lines: 34
Sender: logan@boomer.cs.utexas.edu
Approved: by a random group of volunteers
Message-ID: <96gppp$jgp$1@boomer.cs.utexas.edu>
References: <xkfofw7lsro.fsf@valdemar.cos.agilent.com> <slrn98hfv4.2dog.kamikaze@kuoi.asui.uidaho.edu> <slrn98hl8p.7fs.rich@bofh.concordia.ca> <96gmvs$h7a$1@amoeba.cugc.org>
NNTP-Posting-Host: boomer.cs.utexas.edu
X-Trace: news.cs.utexas.edu 982248059 10259 128.83.144.39 (15 Feb 2001 14:40:59 GMT)
X-Complaints-To: usenet@cs.utexas.edu
NNTP-Posting-Date: Thu, 15 Feb 2001 14:40:59 +0000 (UTC)
Path: news.meer.net!nntp1.ba.best.com!news2.best.com!news.maxwell.syr.edu!cpk-news-hub1.bbnplanet.com!news.gtei.net!newsfeed.cs.utexas.edu!news.cs.utexas.edu!not-for-mail
Xref: news.meer.net alt.sysadmin.recovery:182729

In article <96gmvs$h7a$1@amoeba.cugc.org>, Matt McLeod <matt@cugc.org> wrote:
>On a slightly different tack, how about a large group of people
>each slowly snarfing a subset of the archive and storing it offline?

I was sort of thinking the same thing.  Now that 80 GB hard drives
are cheap and widely available, you really don't need but a small
number of volunteers to store all of Usenet.  And as fast as Usenet
is growing, hard drives seem to be keeping up these days.

Heck, getting the archive from Google might not even be that
hard.  They might or might not be willing, but maybe with the
right licensing and agreements, it wouldn't be too tough to
convince them you're not going to threaten the business reasons
they have for having it.  And if they agree, they might even
let you walk in with a stack of DLT tapes to grab a copy.

The problem would be making the archive accessible.  You can store
all that data, but how are you going to index it in a totally
distributed way?  Sure, you can have a central directory server of
who has what groups and what dates, but what happens when you want
to search all articles from all groups for the past N years for some
keywords?  You have to spread that request all over the place.
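
To make the two cases concrete, here is a rough sketch in Python.  Every
name in it (Volunteer, Directory, who_has, search_all) is made up for
illustration, and the in-memory lists stand in for whatever storage and
network calls a real setup would use.  It only shows the shape of the
problem: a lookup by group and date touches a few volunteers, while a
keyword search has to be sent to all of them.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Volunteer:
    """One volunteer's offline slice of the archive (hypothetical)."""
    name: str
    # (newsgroup, posting date, article text) tuples held by this volunteer
    articles: list = field(default_factory=list)

    def search(self, keywords):
        """Return locally held articles that contain every keyword."""
        wanted = [k.lower() for k in keywords]
        return [a for a in self.articles
                if all(k in a[2].lower() for k in wanted)]


class Directory:
    """Central directory of who has what groups and what dates."""

    def __init__(self):
        self.holdings = []  # (group, first_day, last_day, volunteer)

    def register(self, group, first_day, last_day, volunteer):
        self.holdings.append((group, first_day, last_day, volunteer))

    def who_has(self, group, day):
        """Cheap case: one group and one date map to a few volunteers."""
        return [v for g, lo, hi, v in self.holdings
                if g == group and lo <= day <= hi]

    def search_all(self, keywords):
        """Expensive case: the request is spread over every volunteer."""
        # Deduplicate volunteers that registered more than one holding.
        volunteers = {id(v): v for _, _, _, v in self.holdings}.values()
        hits = []
        for v in volunteers:
            hits.extend(v.search(keywords))
        return sorted(hits, key=lambda a: a[1])  # oldest first


# Toy data, invented for the example.
alice = Volunteer("alice", [
    ("alt.sysadmin.recovery", date(2001, 2, 15), "80 GB hard drives are cheap"),
])
bob = Volunteer("bob", [
    ("comp.lang.c", date(1999, 6, 1), "cheap tricks with hard pointers"),
])

directory = Directory()
directory.register("alt.sysadmin.recovery", date(2001, 1, 1), date(2001, 12, 31), alice)
directory.register("comp.lang.c", date(1999, 1, 1), date(1999, 12, 31), bob)

one_group = directory.who_has("alt.sysadmin.recovery", date(2001, 2, 15))
everything = directory.search_all(["hard", "cheap"])
```

The cost of search_all grows with the number of volunteers, and it is only
as fast and as complete as the slowest or most offline of them, which is
the accessibility problem in a nutshell.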

Actually, the difficulty of the searching process might even be a
selling point for getting Google to give a copy to some non-profit
trust that would use it mainly for archiving purposes.

If I were not a poor college student, I might even be willing to buy 50
or 100 GB of space to devote to it.

  - Logan
-- 
my  your   his  her   our   their   *its*
I'm you're he's she's we're they're *it's*


