Darwin on the Web:
The Evolution of Search Tools

 

Computers in Libraries, May 1999, v19 n5, 22 - 28.

    Darwin on the Web: The Evolution of Search Tools
    Dale J Vidmar

Abstract:
Vidmar notes some similarities between the evolution of man and the evolution of search tools, and postulates that both are
becoming more advanced. A number of search tools, including Yahoo! and AltaVista, are discussed.
 

The Primordial Web

Searching for information on the Internet is generally an exercise in survival of the fittest. Trying to understand how search tools
work, which tool to use for which purposes, and which tool does what best offers little joy for even the most dedicated of
cybersearchers. With so many factors to consider-simple modes vs. advanced modes, natural language vs. Boolean logic,
metasearching, portals, customization, relevancy ranking, backlinks, folders, directories, and more-answering these questions is
not simple. When it comes to search tools, certainty lies only in change. Mastering the art of cybersearching is an exhaustive
undertaking, perhaps less rewarding because of the knowledge that today's expert can quickly become tomorrow's
Neanderthal.

The potential to create worldwide networks is both the major strength and weakness of the Internet. There are millions of Web
pages-text and graphical-posted by anyone and everyone. But the Web still lacks organization. Finding "good" information is
difficult at best. Search tools have emerged to help navigate a course through this primordial soup of information. They carry the
hope for a better future despite the difficulties of the present. To try to make the best of today's search tools, I believe it helps
to divide the features and options into three major categories. This may help illustrate their relative importance as well as their
strengths.

  1. Naturally Selected and Adapted-These are established features that advanced searchers have come to expect in search tools.
  2. In Transition-These are current features that still need to prove their worth in the long run.
  3. Advanced Evolution-These are features that contribute to creating the future of cybersearching.
Understanding and knowing how to use these features are the keys to growing from a simple cybersurfer into a more valuable
online researcher.
 

Tools that Are Naturally Selected and Adapted

"I think we may conclude that habit, use, and disuse, have, in some cases, played a considerable part in the modification..."
-Charles Darwin from Origin of Species
Chapter 5: Laws of Variation

What search tools do best is collect data about what people are searching and how they execute their queries. The collected
data is analyzed so the tools can be altered or developed accordingly. Simple interfaces and natural language searching are
examples of search tools responding to the needs of the typical searcher. The features that are listed under the heading of
Naturally Selected and Adapted offer more options and better control but require advanced skills. More than anything, they are
guided by the desire for more precision that leads to specific information.

Subjects, Categories, Channels, and Folders

Yahoo! is best known for its structured hierarchy to direct a search to more specific information. What distinguishes a search
engine from a search directory is becoming less perceptible. Most search and metasearch tools, such as AltaVista, Northern
Light, Lycos, The Internet Sleuth, MetaCrawler, etc., have subject directories. Subject directories in these search tools differ
dramatically from the more extensive subject directory sites-Digital Librarian (http://www.servtech.com/*mvail), INFOMINE
(http://lib-www.ucr.edu), Academic Info (http://www.academicinfo.net), and others that are maintained by individuals who
select resources linking to "good information." Subject directories, categories, channels, and folders allow directed searching
through selected materials that have been indexed.

Boolean Searching: AND, OR, NOT, NEAR

The move from implied OR searching to implied AND searching and phrase detection has increased precision and relevancy.
Tools allowing Boolean searching have been the standard for librarian-type searching but are overlooked by the typical searcher
who's opting for simple search modes. The best adaptations of Boolean with simplicity are drop-down menus and radio buttons
to specify AND, OR, and NOT searches.
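
To make the three basic operators concrete, here is a minimal sketch in Python. It assumes a tiny made-up collection of pages and a simple word index. The page names, the text, and the helper functions are all invented for illustration, and no real search tool is claimed to work exactly this way. NEAR is left out because it needs word positions, which this toy index does not store.

```python
# Toy collection; the page names and text are made up for illustration.
PAGES = {
    "page1": "cleveland indians baseball schedule",
    "page2": "cleveland orchestra concert schedule",
    "page3": "indians tribe baseball history",
}

def build_index(pages):
    """Map each word to the set of page names that contain it."""
    index = {}
    for name, text in pages.items():
        for word in text.split():
            index.setdefault(word, set()).add(name)
    return index

INDEX = build_index(PAGES)

def lookup(term):
    """Return the set of pages containing the term (empty set if none)."""
    return INDEX.get(term.lower(), set())

def boolean_and(a, b):
    """Pages containing both terms: set intersection."""
    return lookup(a) & lookup(b)

def boolean_or(a, b):
    """Pages containing either term: set union."""
    return lookup(a) | lookup(b)

def boolean_not(a, b):
    """Pages containing a but not b: set difference."""
    return lookup(a) - lookup(b)
```

In this toy collection, boolean_and("cleveland", "baseball") narrows the set to page1 alone, while boolean_or("cleveland", "tribe") widens it to all three pages. That is the precision trade-off behind the move from implied OR to implied AND.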

Search Basics: Excludes and Includes

Pluses, minuses, quotations, and parentheses are basic to complex search strategies, but they are getting less useful as
search tools move in the direction of doing more for the searcher. The question asked by most sophisticated searchers is how
far these tools will go in "dumbing down" for the sake of the inexperienced searcher or lowest common denominator.

The plus [+] symbol requires/includes terms: [+pumpkin +recipe]. This is especially useful when a search term is a stop word. In
Google, "web" is a stop word. A search for "web design" does not produce relevant documents like "+web design" does.

The minus [-] symbol excludes terms: [clinton -lewinski].

"Quotations" specify terms in a phrase search: ["lesson plans"].

And you can combine features: [+"lesson plans" +"social studies" -history].

Parenthetical expression orders the search operation but will generally eliminate some of the clustering operations in simple
search modes: [Cleveland and (Indians or Tribe)].

Most of the search tools and metasearch tools allow these options. However, the use of these advanced search options will
often produce less-than-satisfactory results. Starting simple is often the best strategy.
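
The plus, minus, and quotation operators described in this section can be sketched roughly as follows. This is an illustration under simplifying assumptions, not how any particular tool parses a query: the function names are hypothetical, matching is plain substring matching on lowercased text, and parentheses and stop words are ignored.

```python
import re

def parse_query(query):
    """Split a query into required, excluded, and optional terms.

    A quoted phrase is kept together as a single term.
    """
    required, excluded, optional = [], [], []
    # Each token is an optional +/- sign followed by a quoted phrase or a word.
    for sign, phrase, word in re.findall(r'([+-]?)(?:"([^"]*)"|(\S+))', query):
        term = (phrase or word).lower()
        if sign == "+":
            required.append(term)
        elif sign == "-":
            excluded.append(term)
        else:
            optional.append(term)
    return required, excluded, optional

def matches(text, query):
    """Return True if the text satisfies the query's includes and excludes."""
    required, excluded, optional = parse_query(query)
    text = text.lower()
    if any(term not in text for term in required):
        return False
    if any(term in text for term in excluded):
        return False
    # With no required terms, at least one optional term must appear.
    return bool(required) or any(term in text for term in optional)
```

For example, matches("Social studies lesson plans for grade 5", '+"lesson plans" +"social studies" -history') is True, while the same query rejects a page that mentions history.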

Truncation

AltaVista, Northern Light, HotBot, and Snap use the asterisk symbol to allow the cybersearcher to use a truncated form of a
term. Northern Light automatically generates the plural variation of a search term. The need for truncation can often be
overcome by choosing more unique terms. The term "adolescence" can be used instead of teen* with better results.
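
A short sketch shows why a truncated term can pull in unwanted matches. It assumes a small made-up vocabulary and treats the asterisk as a simple "starts with" test. The function name and word list are hypothetical, and real tools may apply extra rules, such as a minimum stem length.

```python
def expand_truncation(pattern, vocabulary):
    """Return the vocabulary words matched by a pattern.

    A trailing asterisk matches any word that starts with the stem;
    without an asterisk only an exact match counts.
    """
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())

# Invented word list for the example.
VOCABULARY = ["teen", "teens", "teenager", "teeny", "adolescence", "ten"]
```

Here expand_truncation("teen*", VOCABULARY) returns ['teen', 'teenager', 'teens', 'teeny'], including the unrelated word "teeny", whereas the exact term "adolescence" matches only itself.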

Field Searching

Most search tools have adopted a variety of field searching options such as title, URL, image, and more. What may be more
notable is that Excite, Magellan, WebCrawler, and Google do not allow this option. HotBot excels in field searching by offering
user-friendly drop-down menu and radio button options to indicate field selections.
 

Search Tools In Transition

" ... as new forms are produced ... many old forms must become extinct."
-Charles Darwin from Origin of Species
Chapter 4: Natural Selection

The acceleration of change presents a challenge to even the most dedicated searcher. In the flux and flow of innovation, the
following features still have something to prove-that they can indeed help make sense of the chaos.

Metasearching

Metasearch tools such as SavvySearch, The Internet Sleuth, ProFusion, MetaCrawler, and others search multiple tools
simultaneously. They have the advantage of searching a greater portion of Web pages than individual search tools by their very
nature and as such are an excellent way to begin a search. Customization or choice of tools to include in a search as well as
options for organizing retrieval sets make metasearchers a logical choice for broadening the scope of a search. The
disadvantage of metasearching is that the searcher has little control over how a search is executed. Some tools may ignore
certain operations such as Boolean expressions, quotations, capitalization, etc., leading to results that are less accurate than
you'd get by searching with a specific tool. [Editor's Note: To read more about metasearch tools, turn to this issue's Internet
Librarian feature, which begins on page 50.]
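
The basic idea of metasearching can be sketched as follows. The two "engines" are stand-in functions with invented results, and the merging rule (interleave the ranked lists and drop duplicates) is only one plausible choice. The metasearch tools named above do not publish their methods here, so nothing below should be read as their actual design.

```python
from itertools import zip_longest

def engine_a(query):
    # Stand-in for one search tool; the URLs are invented for the example.
    return ["http://example.com/a", "http://example.com/b", "http://example.com/c"]

def engine_b(query):
    # Stand-in for a second search tool with a different retrieval set.
    return ["http://example.com/b", "http://example.com/d"]

def metasearch(query, engines):
    """Send the same query to every engine and interleave the ranked lists,
    dropping URLs that have already been seen."""
    result_lists = [engine(query) for engine in engines]
    merged, seen = [], set()
    # zip_longest groups the first hits, then the second hits, and so on,
    # padding the shorter lists with None.
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged
```

Note that the query string is passed to every engine unchanged. If one engine ignores quotations or Boolean operators, the sketch has no way to know, which is the loss of control described above.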

Relevancy Ranking

Relevancy ranking is at the heart of search tool evolution and is the most guarded proprietary feature in the industry. Because
each tool has its own algorithm and functions somewhat differently than every other tool, the relevancy percentiles are
tool-specific anomalies. How the percentiles are fashioned is proprietary information, so there is no way to know what the
percentile really signifies. A truly relevant document is likely to be given 98 percent ranking in one tool and 55 percent ranking
in another tool. Much like a weather forecast calling for a 10 percent chance of rain, no matter what happens, the predicted
likelihood of relevancy is always correct.

While relevancy ranking may be the scourge of cybersearching because of differing retrieval sets, it is also a blessing for the
very same reason. Since no single tool is the "right" or "best" one, a savvy cybersearcher knows to use as many tools as
possible or as many tools as necessary to satisfy the search. The very competitive nature of these commercial tools ensures that
evolution will continue and natural selection will filter out the best and most efficient tools for the future.

Automatic Phrase Detection

This feature is used by AltaVista to match words within a database of common phrases. However,
more often than not, the search tool cannot second-guess the searcher accurately. Allowing a searcher to specify a phrase is a
more logical option.

Sorting Features

A retrieval set sorted by relevancy ranking tends to be effective, but sorting the retrieval set by date without combined
relevancy produces results that are less than useful unless the retrieval set is small.

Hybridization of Tools

Northern Light combines the Web with a Special Collections database that accesses over 4 million articles from about 5,400
journals, magazines, encyclopedias, and newswires. Northern Light offers individual, corporate, library, and organization
accounts. Northern Light has also recently been selected by the National Technical Information Service (NTIS) to develop a
search site to access U.S. federal government information. It remains to be seen what the outcome of commingling Web
information with more traditional magazine and journal information will be, but rest assured that this is only the beginning.
 

Advanced Evolution

"These individual differences are highly important for us, as they afford materials for natural selection to accumulate..."
-Charles Darwin from Origin of Species
Chapter 2: Variation Under Nature

One of the major reasons for the growth spurts of the Internet is the development of new search tools and innovations within them. This
evolution translates into better access and navigation of cyberspace. The following features are designs that are opening new
windows to the future of the Web.

Relevancy Ranking Based on Backlinks

Google has pioneered perhaps one of the better features for determining relevancy-backlinks. In addition to using algorithms
like other search tools, Google analyzes hyperlinks or sites linking to a particular site, thus relying on the collective expertise of
other sites to determine the relative worth of a site. If a lot of hyperlinks-particularly within primary sites-point to a site, then
Google reasons that the site must be of some value. The recent addition of quotations to specify phrase searching has made
Google a leader in finding pertinent information. [Editor's Note: For more information on Google, see Peter Jacso's Internet
Insights column in the April issue of Information Today, page 30.]
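
The backlink idea can be illustrated with a deliberately simple count of incoming links. This is not Google's algorithm, which is proprietary and weighs links in ways not described here. The link graph, site names, and function names below are all hypothetical.

```python
from collections import Counter

# Hypothetical link graph: each site maps to the sites it links to.
LINKS = {
    "siteA": ["siteC", "siteD"],
    "siteB": ["siteC"],
    "siteC": ["siteD"],
    "siteD": ["siteC"],
}

def backlink_counts(links):
    """Count how many distinct sites link to each target."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):   # count each linking site once
            if target != source:      # ignore self-links
                counts[target] += 1
    return counts

def rank_by_backlinks(candidates, links):
    """Order candidate sites by backlink count, highest first.

    Ties are broken alphabetically so the output is predictable.
    """
    counts = backlink_counts(links)
    return sorted(candidates, key=lambda site: (-counts[site], site))
```

In this graph siteC has three backlinks and siteD has two, so rank_by_backlinks(["siteA", "siteB", "siteC", "siteD"], LINKS) puts siteC first. A fuller treatment would also weigh who is doing the linking, as the mention of primary sites suggests.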

Natural Language Searching

Ask Jeeves is a metasearch tool that has also created a database of almost 7 million answers to questions that have been
commonly asked by searchers. The natural language feature allows the searcher to ask a question like, "What causes ocean
waves?" If the question appears in their "KnowledgeBase" of common questions, then Ask Jeeves provides a "good" Web site
that answers the question. If the question cannot be found, possible options are listed with drop-down selections. In addition,
Ask Jeeves metasearches on six other search tools. AltaVista uses the Ask Jeeves technology to offer its own version of Ask
AltaVista. While AltaVista is useful, it is not nearly as robust nor as accurate as Ask Jeeves. It is difficult to get any easier than
natural language searching, especially in combination with good answers from the database of common questions.

Clustering/Cataloging Information

Search tools are not only providing subject directories and categories to direct a search, they are integrating cataloging-type
options that link to the equivalent See Also or Related Terms. Yahoo! has evolved into a hierarchical classification system to
organize retrieval sets. Northern Light's Custom Folders feature refines results as an integrated element of a search. AltaVista
provides a list of Related Searches. Infoseek links to Directory topics. Excite provides a list of words to add to refine a search
and a Web Site Directory using concept-based indexing to help direct the search. Lycos directs searches to a category listing
called First and Fast. All these options generally improve search results.

Image Searching

AltaVista has an option called AV PhotoFinder (http://image.altavista.com/cgi-bin/avncgi) that allows both filtered and
unfiltered image searching. Lycos Image Gallery (http://www.lycos.com/picturethis) finds images, illustrations, sound files, and
video. Yahoo! Image Surfer (http://ipix.yahoo.com) searches via categories for images. The customization feature of
SavvySearch allows the searcher to search AV PhotoFinder, Lycos Image Gallery, Yahoo! Image Surfer, and Scour.net
(http://www.Scour.net) simultaneously. Excite also allows searching of audio files and is partnered with RealPlayer.

Customization and Portals

SavvySearch's Customization (http://www.savvysearch.com/custom) provides a selection of tools
that you can opt to use for a metasearch. After the tools are selected, a searcher can specify a name and choose to start a
search from the custom menu. SavvySearch's Customization is more versatile for searching than the portal options offered by
major search tools such as MyYahoo!, MyExcite, MyAltaVista, etc. As market share continues to drive search tools, there will
be more concentration on providing portal search tool sites. Whether portals will evolve into anything more than a useful option
within the individual search tools remains a question. At present, portals are at their best when used in conjunction with other
tools. Again, no one tool is everything.

Partnerships

RealNames (http://www.realnames.com) is a subscription database used by AltaVista that leads searchers to top-level
corporate, organizational, individual, or other sites. An AltaVista search for "census" produces links to the U.S. Census Bureau,
U.S. Census Bureau Population Topics, etc., whereas the regular retrieval set begins with a link to a commercial site about the U.S.
Federal Census or a link to a U.S. Census Bureau page that is further down the hierarchy about County Population Estimates.

Direct Hit (http://www.directhit.com) is a tool used by HotBot that creates a list of relevant sites based on popularity or traffic
patterns. By examining the retrieval set sites visited by searchers, Direct Hit creates a top 10 list of the most popular sites. The
first item in the HotBot search results is a link to Direct Hit's Top Ten sites. Direct Hit is partnering with tools such as Lycos,
LookSmart, and potentially others in the future.
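
Popularity-based ranking of this kind can be sketched as a simple tally of which sites searchers visit from a retrieval set. The click log format and function name are assumptions made for the example. Direct Hit's actual method is not described beyond its reliance on popularity and traffic patterns.

```python
from collections import Counter

def top_sites(click_log, query, n=10):
    """Return the n most-visited sites for a query.

    click_log is a list of (query, site) pairs, one pair for each time a
    searcher followed a link from that query's retrieval set.
    """
    clicks = Counter(site for q, site in click_log if q == query)
    return [site for site, _ in clicks.most_common(n)]

# Invented click log for illustration.
CLICK_LOG = [
    ("census", "census.gov"),
    ("census", "census.gov"),
    ("census", "example.com"),
    ("weather", "example.org"),
]
```

With this log, top_sites(CLICK_LOG, "census") returns ['census.gov', 'example.com']. The more often searchers choose a site, the higher it climbs on the list.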

Multilingual Searching and Translations

The advanced evolutions of search tools will require more sophisticated techniques to overcome language barriers in order to
create a truly "World" Wide Web. Yahoo! has separate iterations such as Yahoo! en Espanol (http://espanol.yahoo.com) and
Yahoo! Chinese (http://chinese.yahoo.com) to navigate sites in languages other than English. For a list of countries, go to
WorldYahoo!s (http://howto.yahoo.com/ask/world.html). SavvySearch provides a metasearch interface in 23 languages.
AltaVista (http://babelfish.altavista.digital.com/cgi-bin/translate?) provides a basic translation from English to French, German,
Spanish, Portuguese, and Italian by inputting either text or the address (URL) of a Web page. AltaVista will also translate
French, German, Spanish, Portuguese, and Italian to English. HotBot allows a search to be limited to one of nine languages via
a drop-down menu. Beaucoup (http://www.beaucoup.com/engbig.html) provides a subject directory listing of geographically
specific search tools for finding information from specific countries outside the United States. (Be aware that this feature may
require extra fonts necessary for Russian, Hebrew, Nihongo, or other languages using non-Roman characters.)

The Shift to Simplicity

Simplicity may be the most underrated feature in the evolutionary process of search tools. Natural language searching,
clustering, guidance, and See Also types of features retrieve good results with minimal expertise. The data collected by search
tools indicates that the average searcher rarely uses advanced techniques, choosing instead to input one or more terms without
Boolean, truncation, proximity, includes, or excludes. Consequently, search tools have taken the path of developing simple
approaches.

Library and information specialists who are versed in research databases and online information services like DIALOG and Lexis-Nexis may desire a variety of sophisticated options and commands needed for retrieving precise information.
However, the Web environment is vastly different than online or traditional research databases both in content and in user
profile. Online, librarian-type searching requires professional expertise-something the average cybersearcher does not
necessarily possess. While simplicity may seem to be the "dumbing down" of search tools and the scourge of the Internet, the
evolution of "smarter" search tools is actually a practical attempt to meet the needs of the general public. Professional and
expert searchers would be well advised to stay flexible and to adapt their strategies to the changing environment.
 

The More They're Used, The More They Develop

The more adept searchers become in using these tools, the better the tools will become. For all their inconsistencies, differing
results, and other constraints and problems, search tools are all that cybersearchers have to make sense of the Internet. Despite
the commercial attraction to create one all-encompassing search tool that is a portal for everyone, tools will continue to arise,
evolve, and change in a variety of ways in order to meet the challenge of finding information. The individual differences that
occur as a part of the evolution will shape the ever-changing environment of the Web.

If there were only one single search tool, or worse, several search tools that produced the same results for every search, the
consequences would actually be more detrimental to the Internet. Different tools producing varying results may be essential to
the further growth of the World Wide Web. Perhaps the greatest hope for the future cybrarian lies in the competitive spirit that
inspires multiple approaches in the development of search tools. Vive la difference!

Addresses for the main Search Tools mentioned in this Article

Dale J. Vidmar is an assistant professor and the electronic resources and instruction librarian at the Southern Oregon University Library in Ashland, Oregon. He has an M.L.S. from Kent State University. His e-mail address is vidmar@sou.edu.

 

If you have questions about the information on this page or its content,
please send comments to Dale Vidmar.
Copyright 1999, Southern Oregon University Library.