DIALOG(R) File 47: Gale Group Magazine DB(TM) (c) 2003 The Gale Group. All rights reserved.
        06461335     Supplier Number: 96338417
        The evolution of internet research: shifting allegiances.

        Novak, David
        Online, 27, 1, 18(5)
        Jan-Feb , 2003
        ISSN: 0146-5422

         It has taken me several years to grasp just how vast a gulf lies between
        searching and researching the Internet. Searching the Internet is computer
        science. We apply our understanding of search technologies and search
        engines. Researching the Internet is library science. We act upon our
        understanding of how information is arranged.
 

         These two approaches are very distinct.
              I shall attempt to trace a gradual evolution in how we find
        information using the Internet. I believe we have been moving from Internet
        searching to Internet research--from computer science to library science.
        If I am right, this portends perhaps the single most dramatic change to
        library science in decades: a renaissance of library science and
        librarianship.
              CONSTANT CHANGE
              The two or three most effective ways to search the Internet change
        every year or two. It comes as a bit of a shock to realize, but even the
        very short history of the Internet has seen a wide range of tools and
        techniques come and go. Today, there appears to be a consensus that Google
        is the primary tool for finding Internet information. And yet this same
        conviction attached to Yahoo! just 2 years ago. What has happened?
              In the very early days, before the Web arrived, I remember pleading
        with my Internet service provider to mirror a copy of the many guidebooks
        that made up the Internet Clearinghouse Project. You may know this project
        by its later incarnation: the Argus Clearinghouse. In its heyday it was
        internationally famous. One of its typical text guidebooks, Not Just
        Cows, described in detail all of the better Internet resources and active
        mailing lists for agriculture. When I first encountered this archive, it
        was racing past 130 guidebooks.
              Archie complemented the guidebooks as a database of all the publicly
        accessible files found on FTP sites. Strictly speaking, Archie was not a
        complete database, but it was thought to index well over 95 percent of all
        FTP material. This coverage was so complete that it started a tradition:
        the publisher was responsible for informing a nearby Archie server when a
        new FTP site was launched.
              How far we have come! Today, most of the guidebooks have grown up or
        disintegrated. Argus has not been updated for several years and is being
        folded into the Internet Public Library (IPL) directory. Argus founder Lou
        Rosenfeld formed his own consulting company (www.lourosenfeld.com) and
        gives seminars in conjunction with the Nielsen Norman Group. Argus' direct
        competitor, AlphaSearch, is gone, too. Even Archie gave way to
        Shareware.com, which was then purchased by CNET and then lost all pretense
        at completeness. But much more was lost. The idea that a single person
        could organize all the resources on a given topic was one casualty. So was
        the idea of a search engine that indexed all Internet resources, as Archie
        did for FTP. The Internet simply outgrew these ideas. In the early days,
        both were possible and brilliantly executed.
              With the arrival of Gophers, Veronica stepped in and became a third
        vital approach to finding Internet information. Veronica was a
        quasi-definitive list of all Gopher categories. It never attained the
        completeness that Archie had for FTP resources and its fame slipped rapidly
        away once it became apparent that the Web was going to be far more
        interesting than Gopherspace.
              THE WEB ARRIVES
              The early search engines, with names like the World Wide Web Worm and
        WebCrawler, changed this environment significantly. These search engines
        initially indexed most of the Web, with coverage certainly over 50 percent,
        before slipping to 30 percent as the Web grew. These tools were as famous
        as Google and Yahoo! are today. Everyone used them. And when the Web was
        young, these tools sparkled.
              Unfortunately, the search algorithms used by early search engines
        were of the kind used by commercial databases of the day. A search for
        "Internet Research" returned a list of Web pages ranked by frequency and
        title. Web pages with "Internet Research" in their titles would lead the
        list, followed by pages with the words "Internet research" occurring
        several times in the text. This gave rise to the uninspired marketing maxim
        that you must place your primary keywords in the title and three or four
        times in the first paragraph.
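To make that ranking scheme concrete, here is a minimal Python sketch. It assumes a hypothetical page record (`Page`) and a simple two-part score: pages with the phrase in the title come first, and within each group pages are ordered by how often the phrase appears in the body. The sample pages and the exact scoring are illustrative assumptions, not a reconstruction of any particular engine.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """A hypothetical indexed page: address, title, and body text."""
    url: str
    title: str
    body: str


def early_rank(pages: list[Page], phrase: str) -> list[Page]:
    """Rank pages roughly as described: title match first, then frequency."""
    needle = phrase.lower()

    def score(page: Page) -> tuple[int, int]:
        in_title = 1 if needle in page.title.lower() else 0
        frequency = page.body.lower().count(needle)
        return (in_title, frequency)

    # Drop pages that never mention the phrase, then sort by (title, frequency).
    hits = [page for page in pages if score(page) != (0, 0)]
    return sorted(hits, key=score, reverse=True)


if __name__ == "__main__":
    sample = [
        Page("a.example/1", "Notes", "internet research, internet research, internet research"),
        Page("b.example/2", "Internet Research Guide", "a short guide"),
        Page("c.example/3", "Misc", "one mention of internet research"),
        Page("d.example/4", "Cooking", "nothing relevant here"),
    ]
    for page in early_rank(sample, "Internet Research"):
        print(page.url)
```

Run as shown, the page with the phrase in its title leads, followed by the page that repeats it three times, which is exactly the weakness the keyword-stuffing maxim exploited.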
              These early search engines also invited and even expected publishers
        to inform them of new Web pages. The search engines would dutifully send
        out their spiders, sometimes immediately. For some reason, though, I don't
        remember much use of field searching in these early days. Perhaps the early
        search engines did not permit Title and URL searching, or perhaps we didn't
        know we needed these tools.
              Complementing these early search engines were two simple techniques
        that set Internet surfing in motion. Initially, we would search for a
        hotlinks page. A search for "Accounting Hotlinks" would likely unearth a
        page created by someone who had just finished a scan of accounting
        resources. If it was a month or two old, it served as a very fine starting
        point for your efforts to do the same.
              About a year later, as "hotlinks" stopped being the word du jour, we
        would visit the "further links" section of an interesting Web site.
        Publishers were increasingly, and kindly, creating these lists, pointing
        out and linking to comparable sites. This may have been where the habit of
        surfing
        arose--you could hop on and gradually move from one Web site, to its
        further links page, to the next Web site, to its further links
        page--surfing to the information that peers recognized as useful.
              THE AGE OF THE DIRECTORY
              The World Wide Web Virtual Library, soon followed by Yahoo!, began to
        succeed as the guidebooks began to falter. Yahoo! required much less effort
        to update, and so rapidly delivered a far more extensive list of
        resources, though sadly it listed few of the cherished mailing lists.
              Yahoo! really made its move at a time when the early search engines
        were struggling to make the transition to popularity ranking. There were
        too many resources out there. The basic search algorithms that had
        delivered such brilliant results only a year earlier were now increasingly
        exasperating. They didn't work any more. The best information was often
        buried deep within a mass of other information.
              Essentially, as the Web grew and search engine databases struggled
        unsuccessfully to keep pace, the search engine results deteriorated. It did
        not help that these early search engines defaulted to OR, so that even a
        simple search for three blind mice would deliver millions of results.
        Adding the + symbol before each word, which made an explicit request for a
        Boolean AND search, initially tamed this mess, but the trouble was more
        fundamental. Revitalizing these search engines required a major rethink of
        how information was ranked.
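The difference between an OR default and explicit + terms can be shown on a tiny, hypothetical index. In this sketch the matching rules are simplified assumptions: a query with no + terms matches any document containing at least one of its words, while + terms must all be present. Unmarked words alongside + terms are simply ignored here, although real engines of the period used them for ranking.

```python
def search(index: dict[str, str], query: str) -> list[str]:
    """Return ids of documents matching a query under a simplified OR/+AND model."""
    terms = query.lower().split()
    # "+word" marks a word that must be present (an explicit Boolean AND).
    required = {t[1:] for t in terms if t.startswith("+") and len(t) > 1}
    optional = {t for t in terms if not t.startswith("+")}
    results = []
    for doc_id, text in index.items():
        words = set(text.lower().split())
        if required:
            matched = required <= words       # all + words present
        else:
            matched = bool(optional & words)  # OR default: any word will do
        if matched:
            results.append(doc_id)
    return results


# A hypothetical five-document index.
DOCS = {
    "d1": "three blind mice see how they run",
    "d2": "three little pigs",
    "d3": "a blind spot in the mirror",
    "d4": "field mice in winter",
    "d5": "a history of cheese",
}

if __name__ == "__main__":
    print(search(DOCS, "three blind mice"))     # OR default
    print(search(DOCS, "+three +blind +mice"))  # explicit AND
```

Even on five documents the OR query returns four hits while the + query returns one; scaled up to a Web-sized index, the OR default produces the millions of results described above.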
              In this chaotic transition, Yahoo! reigned supreme. Suddenly you
        could not move fast enough to see what Yahoo! had to offer. The age of the
        directory also heralded a raging business model that, through massive
        promotion, made Yahoo! synonymous with Internet research for a time.
              LATE ERA SEARCH ENGINES
              The growth of the Internet continued. When Google introduced ranking
        technologies, it changed everything. Here was a way to float the more
        popular and, coincidentally, the more recognized, resources to the top of
        the long search engine lists. With the default changed to AND, the search
        engines began to work again as an effective research tool. Then the
        databases searched by search engines swelled in size.
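Popularity ranking can be sketched in the same spirit. Google's actual ranking was considerably more elaborate; this hypothetical stand-in simply counts how many distinct pages link to each result and floats the most-linked results to the top of a list of pages that already match the query. The link graph is invented for illustration.

```python
from collections import Counter


def popularity_rank(links: dict[str, list[str]], matching: list[str]) -> list[str]:
    """Reorder matching pages by how many distinct pages link to them (most first)."""
    # Count each linking page once per target, however often it repeats the link.
    inbound = Counter(
        target for targets in links.values() for target in set(targets)
    )
    # Counter returns 0 for pages nobody links to; ties keep their original order.
    return sorted(matching, key=lambda page: inbound[page], reverse=True)


# A hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "a": ["c", "b"],
    "b": ["c"],
    "d": ["c", "b", "b"],
    "e": ["a"],
}

if __name__ == "__main__":
    print(popularity_rank(LINKS, ["a", "b", "c"]))
```

Here page c, with three distinct linking pages, rises to the top even though it was listed last, which is the "float the more popular resources" effect in miniature.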
              There were fundamental shifts taking place. With these new
        algorithms, the search engines no longer required the assistance of
        publishers to index the best information. Initially, the engines began
        asking for e-mail addresses, often bathing a publisher in spam as the price
        of indexing, and then some stopped accepting submissions altogether. At the
        same time, as databases grew, the potential payoff for a publisher shrank.
        Most new publishers would see only an occasional visitor sent their way as
        a result of informing the search engines of new pages.
              When Google crested 1 billion records, the limitations of Yahoo! were
        becoming increasingly apparent. No directory could ever index the complete
        volume of the Internet effectively, it was said, forgetting that only a few
        years earlier Archie had effectively indexed all FTP resources. What had
        happened, of course, was rapid Internet growth, which diluted earlier
        achievements until they seemed inadequate. It did not help that at this
        time Yahoo! began to charge a consideration fee for publishers wishing to
        be indexed.
              BOOLEAN, FIELDS, AND MORE
              Another change happened. The search engines allowed for field
        searching, and those in the know began to make much greater use of
        additional techniques to further refine their searching. A title search
        could be most helpful in certain circumstances. AlltheWeb permitted a
        title search using title.normal:words. This was later changed to match
        AltaVista's simpler title:words, though Google persisted for a long time
        in not inviting users to use its title search capability.
              Almost by accident, many researchers began extending a skill I refer
        to as URL interpretation. From an early understanding that .gov means
        government and .au, Australia, researchers could intuit additional
        information from the Web address. On a good day, I can tell the format,
        date, publisher, and type of author from the URL. Guessing these elements
        helps me anticipate the type and quality of information on the site.
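Part of URL interpretation can be mechanized. The sketch below reads a few of the clues mentioned above: publisher type and country from the domain, format from the file extension, a possible date from the path, and a tilde as a hint of a personal page (the "type of author"). The hint tables, heuristics, and example addresses are hypothetical and far cruder than a skilled researcher's judgment.

```python
import posixpath
import re
from urllib.parse import urlparse

# Hypothetical hint tables; a researcher carries far more of these in their head.
DOMAIN_HINTS = {
    "gov": "government",
    "edu": "educational institution",
    "com": "commercial",
    "org": "organization",
    "au": "Australia",
    "uk": "United Kingdom",
}
FORMAT_HINTS = {
    ".pdf": "PDF document",
    ".ppt": "slide presentation",
    ".htm": "Web page",
    ".html": "Web page",
}


def interpret_url(url: str) -> list[str]:
    """Return rough guesses about a page, read from its address alone."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    path = parsed.path
    hints = []
    # Publisher type and country: check the last two labels of the host name.
    for label in host.split(".")[-2:]:
        if label in DOMAIN_HINTS:
            hints.append(DOMAIN_HINTS[label])
    # Format: the file extension of the final path segment.
    ext = posixpath.splitext(path)[1].lower()
    if ext in FORMAT_HINTS:
        hints.append(FORMAT_HINTS[ext])
    # Date: a four-digit year sitting in its own path segment.
    year = re.search(r"/((?:19|20)\d{2})/", path)
    if year:
        hints.append("possibly dated " + year.group(1))
    # Author type: a "~user" segment often marks a personal home page.
    if "/~" in path:
        hints.append("personal page")
    return hints


if __name__ == "__main__":
    print(interpret_url("http://www.agency.gov.au/pubs/2002/report.pdf"))
    print(interpret_url("http://www.example.edu/~jsmith/notes.htm"))
```

The first address yields government, Australia, PDF document, and a possible 2002 date; the second suggests a personal Web page at an educational institution. Each guess is only a starting hypothesis to confirm on the site itself.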
              Region also came into play. A simple url:.au would limit results to
        Australia. Even more effectively, Bryan Strome's SearchEngineColossus.com
        would (and still does) lead you quickly to a regional search engine, such
        as an Australian-only search engine. Predictions swept the Web that the
        next great step forward would be in regional Webspace and in
        topic-specific search engines. Both, I am mindful, still play only minor
        roles in Internet research.
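Field restrictions such as title: and url:.au can be imitated over a small local list of records. This is a hypothetical sketch, not any engine's query parser: title: requires the word in the page title, url: is assumed to match the end of the host name (so url:.au keeps Australian hosts), and any other word must appear in the page text. The records are invented.

```python
from urllib.parse import urlparse


def field_search(records: list[dict[str, str]], query: str) -> list[str]:
    """Return URLs of records satisfying every condition in a field query."""
    results = []
    for record in records:
        ok = True
        for condition in query.lower().split():
            field, sep, value = condition.partition(":")
            if sep and field == "title":
                ok = value in record["title"].lower()
            elif sep and field == "url":
                # Assumption: url: matches the end of the host name.
                host = (urlparse(record["url"]).hostname or "").lower()
                ok = host.endswith(value)
            else:
                # Any other word must appear somewhere in the page text.
                ok = condition in record["text"].lower()
            if not ok:
                break
        if ok:
            results.append(record["url"])
    return results


# Hypothetical records standing in for a search engine's index.
RECORDS = [
    {"url": "http://www.agency.gov.au/farm.htm", "title": "Farm Statistics", "text": "wheat and wool statistics"},
    {"url": "http://www.example.com/farm.htm", "title": "Farm Statistics", "text": "wheat statistics"},
    {"url": "http://www.example.com.au/news.htm", "title": "Rural News", "text": "wheat prices"},
]

if __name__ == "__main__":
    print(field_search(RECORDS, "title:farm url:.au"))
    print(field_search(RECORDS, "wheat url:.au"))
```

Combining a title restriction with a regional one leaves a single Australian page, while a plain word plus url:.au keeps both Australian hosts and drops the .com page.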
              BACK TO CHAOS
              As the Internet grows further, search engines have begun to run into
        trouble again. Google stands at just about 3 billion records now, but the
        Web races ahead at a much faster pace. There are complex reasons for this
        pace--not least that the number of people capable of Internet publishing
        grows at an exponential rate. I've explained my views at
        www.SpireProject.com/art10.htm and www.SpireProject.com/art13.htm. This
        growth is real and seriously disrupts popularity ranking. Estimating an
        absolute size of the Web is perilous, but if you accept an estimate of 15
        billion Web pages, only 14 percent of the Web is indexed. Next year, as
        this figure surely dips below 7 percent, ranking technology will take on a
        whole new meaning.
              Where once ranking would float the best information to our attention,
        by next year it will retreat to become similar to Yahoo! with its emphasis
        on site, time, and money. Google is not losing its battle but is definitely
        losing the technological war on organizing chaos. However, this war is
        being fought more successfully on other fronts.
              CHANGES IN APPROACH
              There is more to this evolution than a change in tools. This is
        really a story about a change in approach. In the early days, we expected
        almost all FTP resources to be indexed by Archie. With the early search
        engines, we expected most important Web pages to be represented. Tomorrow,
        we will expect most important Web sites to be represented. Yes, we will
        leap from Web pages to Web sites.
              There is another message here. Over time, we discover better ways to
        find information.
              For a simple illustration, consider how we judge the quality of
        Web-based information. In the early days, there were murmurs about
        assessing quality based on .gov versus .com domains, or on simply assuming
        the worst. Even today, some online advice suggests an assessment based on
        the presence of a copyright notice and date. Is the author identified on
        the article? Are the links working? Is the spelling correct?
              Thankfully, we've progressed. We now look to context, format, and
        source. Who wrote it--and if we have a name, what else have they written
        (found with a simple search)? Make an assessment of the author and
        publisher based on other items they have published. (Hack the URL or query
        Google with a URL field search to find information logically located
        nearby.) Look for evidence of peer review by considering the format in
        which the information was prepared. Perhaps consider Web site popularity
        (found with a link field search). We can still consider spelling.
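Hacking the URL usually means trimming an address back one directory at a time to see what the publisher has placed nearby. A minimal sketch using only the standard library; the example address is hypothetical, and real sites do not always serve a page at every level.

```python
from urllib.parse import urlsplit, urlunsplit


def hack_url(url: str) -> list[str]:
    """Return parent addresses of a URL, trimming one path segment at a time."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    candidates = []
    # Walk from the immediate parent directory up to the site root.
    for depth in range(len(segments) - 1, -1, -1):
        trimmed = "/".join(segments[:depth])
        path = f"/{trimmed}/" if trimmed else "/"
        # Query strings and fragments are dropped: they belong to the original page.
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return candidates


if __name__ == "__main__":
    for candidate in hack_url("http://www.example.org/reports/2002/annual.htm"):
        print(candidate)
```

Each candidate can then be visited directly, or its host used in a URL field search, to judge the publisher by what sits around the original item.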
              MORE AND MORE LIBRARY SCIENCE
              Let's have a research example. One of my frequent tasks as a
        traveling public speaker is to find suitable auditoriums. This is not
        simple. Bluntly querying Google for a list of auditoriums in Dallas will
        only give me a list of those with Web sites, primarily those with some
        popularity. What I really want is a list of auditoriums. It turns out two
        organizations create such lists. The local convention and visitors bureau
        often has a list of meeting room venues that include auditoriums. The state
        agency involved in disability legislation also may have a definitive list
        of auditoriums and their respective handicap access status.
              I learned this through a bit of feedback research. After I stumbled
        upon two such lists in other cities, I began to actively seek such lists
        with a purpose. The key, however, is to realize that Google rarely indexes
        these lists. But knowing they exist, I'll first set out to find the local
        convention and visitors bureau (with the help of Google or a list of
        convention centers) and then move through its Web site towards the list of
        meeting facilities. I may also consult a directory of museum Web
        sites--since they occasionally have auditoriums.
              What has happened? Simple. Searching failed me. Without library
        science--knowledge of source, anticipating information, feedback
        research--I would have to admit defeat and choose a hotel.
              Internet research continues to mature. About a year ago, I had a
        delightful afternoon with lecturer Theresa Anderson at the University of
        Technology Sydney (UTS). She was completing her thesis on the criteria
        experienced researchers use to select information. With the help of
        multiple video cameras and computer memory, she has traced how skilled
        commercial-zone searchers interact with the information world dynamically,
        predicting what was out there, selecting and guiding their attention based
        on clues.
              As we watched while I executed a difficult Internet search, we saw
        the same techniques at play. I was intimately aware of what I thought was
        out there and of what I was finding, and I was constantly comparing the two. There was
        an internal dialogue selecting, reformulating, seeking a certain type of
        information, and being frustrated when I didn't find it. At the
        experiential level, Internet research techniques merge with commercial and
        information research techniques.
              WHAT THIS MEANS
              We have witnessed a voyage away from an era in which the Internet was
        controlled and deeply understood from a computer science perspective.
        Internet research was initially about technically searching the Internet.
        It extended from search engines, to Boolean logic, to popularity
        ranking--all elements of computer science. Because most early adopters were
        computer techies, Internet research adopted this computer tech mantle.
              This is changing, and the change is accelerating.
              Over time, the Internet has grown. It has gradually morphed from a
        shallow pool, into a deep lake, into an ocean where the depths are largely
        unknown and not directly searchable. We simply can no longer see much of
        the information from a single vantage point.
              The Internet transformed into the very beast found in the older
        information world--very much requiring library science and a research
        heritage distilled from years of working with incompletely indexed
        information with multiple and overlapping layers of organization.
              The Internet did not become congested or chaotic; it is clearly
        neither. The Internet began to grow up, add weight, and resemble its
        information birth parent.
              Evidence of this lies in the amusement we now hold for early search
        techniques. Why don't we still search for hotlink pages? Why can't a single
        person write a guidebook organizing all the resources in agriculture? Do we
        really need to use quotes with search engines?
              GUIDEBOOKS: PRECURSOR TO MODERN SEARCHING
              The one ill-fitting piece of this jigsaw is the early guidebooks I
        long held so dear. They remind me that even in the early, pre-Web era,
        library science was there, evident, and making an impact. But that impact
        was initially minor compared to the results of computer science and
        visibility of commercially viable search engines.
              As the Internet has grown up, dwarfing our simplistic search tools
        and techniques, we have put in their place more and more library science to
        deliver us from confusion. This trend will continue. In fact, it will
        continue until the very nature of Internet research shifts monumentally
        from computer science to library science.
              The relative gifts of computer science will be eclipsed by an
        understanding that Internet research is more about finding information than
        about searching--and finding information is intimately library science.
              SHIFTING ALLEGIANCES
              Yes, the whole concept of Internet research will detach itself from
        computing science and become a discipline of library science. It will
        shift allegiance. The move is inevitable and I personally think it will
        take about 3 years.
              What else could transpire? Could computer science absorb library
        science? Not likely. In the vast Internet, resembling in so many ways the
        reality of information research, computing science is relegated to a role
        in organizing discrete baskets of information--not the task of guiding
        research itself. The computing aspect of searching will become a subtopic
        of Internet research.
              As an aside, Internet cataloging actually runs the opposite risk, that
        of being absorbed into computer science. The relative gifts of thesaurus and
        classification schemes can be eclipsed by the more visible gifts of
        computer science--but that is another story.
              How will we find information on an Internet with 50 billion records,
        in which the largest index is but 3 or 4 billion records in size? The
        answer is with intellect, with skill, and primarily with the arsenal
        provided by library science. We will have a multi-tiered approach, where
        individuals with more skill will dig deeper and be more effective. We have
        been moving in this direction for a decade.
              The totality and inevitability of this move is the inspiring event.
        Slowly, Internet searching will come to be seen as an element of Internet
        research. Internet research will assume the undisputed mantle of library
        science.
              WHERE TO GO FROM HERE
              The digitizing of our lives never altered the need for
        assistance--just the type of assistance the community required. The new
        forms of assistance will relate to digital information. In viewing the
        library community in its widest context, that of assisting and facilitating
        access to information, we see that the library community belongs here. This
        is your home. I see three effects:
              1) There is no urgency in selling the message that the Internet needs a
        librarian. There is no need to sell your role to the community: There is
        only the need to be there when the community learns it needs you.
              2) Priorities within the library community are changing. There are
        ways to prepare for these changes through training and legislation. I
        personally want to see libraries involved in teaching Internet research to
        the community. Soon the community will come to you seeking advice on how to
        undertake a challenging bit of Internet research. Will you be ready?
              3) This should inspire the library community. Its destiny is assured.
        Librarians will be as important as they've always been.
              History will describe the early Internet as an aberration, the one
        time when the Internet did not resemble the whole information sphere, in
        all its complexity, organization, and beauty. History will remember these
        last few years as the one time when Internet research was not part of
        library science.
              David Novak (david@spireproject.com) is a public speaker and founder
        of the Spire Project.
              Comments? E-mail letters to the editor to marydee@xmission.com.