Computers in Libraries, May 1999, v19 n5, 22 - 28.
Darwin on the Web: The Evolution of Search Tools
Dale J Vidmar
Vidmar notes some similarities between the evolution of man and the evolution of search tools, and postulates that both are
becoming more advanced. A number of search tools, including Yahoo! and AltaVista, are discussed.
The potential to create worldwide networks is both the major strength
and weakness of the Internet. There are millions of Web
pages, text and graphical, posted by anyone and everyone. But the Web still lacks organization. Finding "good" information is
difficult at best. Search tools have emerged to help navigate a course through this primordial soup of information. They carry the
hope for a better future despite the difficulties of the present. To try to make the best of today's search tools, I believe it helps
to divide the features and options into three major categories. This may help illustrate their relative importance as well as their
What search tools do best is collect data about what people are searching
and how they execute their queries. The collected
data is analyzed so the tools can be altered or developed accordingly. Simple interfaces and natural language searching are
examples of search tools responding to the needs of the typical searcher. The features that are listed under the heading of
Naturally Selected and Adapted offer more options and better control but require advanced skills. More than anything, they are
guided by the desire for more precision that leads to specific information.
Subjects, Categories, Channels, and Folders
Yahoo! is best known for its structured hierarchy to direct a search
to more specific information. What distinguishes a search
engine from a search directory is becoming less perceptible. Most search and metasearch tools, such as AltaVista, Northern
Light, Lycos, The Internet Sleuth, MetaCrawler, etc., have subject directories. Subject directories in these search tools differ
dramatically from the more extensive subject directory sites, such as Digital Librarian (http://www.servtech.com/*mvail), INFOMINE
(http://lib-www.ucr.edu), Academic Info (http://www.academicinfo.net), and others, that are maintained by individuals who
select resources linking to "good information." Subject directories, categories, channels, and folders allow directed searching
through selected materials that have been indexed.
Boolean Searching: AND, OR, NOT, NEAR
The move from implied OR searching to implied AND searching and phrase
detection has increased precision and relevancy.
Tools allowing Boolean searching have been the standard for librarian-type searching but are overlooked by the typical searcher
who's opting for simple search modes. The best adaptations of Boolean searching with simplicity are drop-down menus and radio buttons
to specify AND, OR, and NOT searches.
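The difference between implied OR and implied AND can be seen in a minimal sketch. The three "pages" below are invented for illustration, and whole-word matching is a simplifying assumption; no search tool named in this article is claimed to work this way.

```python
# Hypothetical mini-collection; page names and text are invented for illustration.
PAGES = {
    "page1": "pumpkin pie recipe with cinnamon",
    "page2": "pumpkin carving tips",
    "page3": "apple pie recipe",
}


def search(query, mode="AND"):
    """Return the sorted names of pages matching the query.

    mode="AND" requires every term (implied AND);
    mode="OR" accepts any single term (implied OR).
    """
    terms = query.lower().split()
    combine = all if mode == "AND" else any
    return sorted(
        name
        for name, text in PAGES.items()
        if combine(term in text.split() for term in terms)
    )


print(search("pumpkin recipe", mode="OR"))   # ['page1', 'page2', 'page3']
print(search("pumpkin recipe", mode="AND"))  # ['page1']
```

Implied OR returns every page that mentions either word, while implied AND keeps only the page that mentions both, which is the gain in precision described above.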
Excludes and Includes
Pluses, minuses, quotations, and parentheses are basic to complex search strategies, but they are getting less useful as
search tools move in the direction of doing more for the searcher. The question asked by most sophisticated searchers is how
far these tools will go in "dumbing down" for the sake of the inexperienced searcher or lowest common denominator.
The plus [+] symbol requires/includes terms: [+pumpkin +recipe]. This is especially useful when a search term is a stop word. In
Google, "web" is a stop word. A search for "web design" does not produce relevant documents like "+web design" does.
The minus [-] symbol excludes terms: [clinton -lewinski].
"Quotations" specify terms in a phrase search: ["lesson plans"].
And you can combine features: [+"lesson plans" +"social studies" -history].
Parenthetical expression orders the search operation but will generally
eliminate some of the clustering operations in simple
search modes: [Cleveland and (Indians or Tribe)].
Most of the search tools and metasearch tools allow these options. However,
the use of these advanced search options will
often produce less-than-satisfactory results. Starting simple is often the best strategy.
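The plus, minus, and quotation conventions described above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, matching is plain substring matching, parentheses are not handled, and a query containing an unbalanced quotation mark or apostrophe will raise an error. Real search tools apply their own, undisclosed rules.

```python
import shlex


def parse_query(query):
    """Split a query into (required, excluded, optional) terms.

    shlex.split keeps a quoted phrase together, so +"lesson plans"
    becomes the single token '+lesson plans'.
    """
    required, excluded, optional = [], [], []
    for token in shlex.split(query):
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:].lower())
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:].lower())
        else:
            optional.append(token.lower())
    return required, excluded, optional


def matches(text, query):
    """Return True if text satisfies the includes and excludes in query."""
    text = text.lower()
    required, excluded, optional = parse_query(query)
    if any(term in text for term in excluded):
        return False
    if not all(term in text for term in required):
        return False
    # With no required terms, at least one optional term must appear.
    return bool(required) or not optional or any(term in text for term in optional)


query = '+"lesson plans" +"social studies" -history'
print(parse_query(query))
print(matches("Social studies lesson plans for grade 5", query))   # True
print(matches("History lesson plans for social studies", query))   # False
```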
AltaVista, Northern Light, HotBot, and Snap use the asterisk symbol
to allow the cybersearcher to use a truncated form of a
term. Northern Light automatically generates the plural variation of a search term. The need for truncation can often be
overcome by choosing more unique terms. The term "adolescence" can be used instead of teen* with better results.
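A short sketch of trailing-asterisk truncation shows why a more unique term can work better. The function name and word list are hypothetical, and only a trailing asterisk is supported.

```python
def truncation_match(pattern, word):
    """Return True if word matches pattern, where a trailing * means 'any ending'."""
    pattern, word = pattern.lower(), word.lower()
    if pattern.endswith("*"):
        return word.startswith(pattern[:-1])
    return word == pattern


# Invented word list for illustration.
words = ["teen", "teens", "teenager", "teeny", "adolescence"]
print([w for w in words if truncation_match("teen*", w)])
# ['teen', 'teens', 'teenager', 'teeny']
```

The truncated form picks up "teeny" along with the intended variants, which is the kind of noise a more specific term avoids.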
Most search tools have adopted a variety of field searching options
such as title, URL, image, and more. What may be more
notable is that Excite, Magellan, WebCrawler, and Google do not allow this option. HotBot excels in field searching by offering
user-friendly drop-down menu and radio button options to indicate field selections.
The acceleration of change presents a challenge to even the most dedicated
searcher. In the flux and flow of innovation, the
following features still have something to prove: that they can indeed help make sense of the chaos.
Metasearch tools such as SavvySearch, The Internet Sleuth, ProFusion,
MetaCrawler, and others search multiple tools
simultaneously. They have the advantage of searching a greater portion of Web pages than individual search tools by their very
nature and as such are an excellent way to begin a search. Customization or choice of tools to include in a search as well as
options for organizing retrieval sets make metasearchers a logical choice for broadening the scope of a search. The
disadvantage of metasearching is that the searcher has little control over how a search is executed. Some tools may ignore
certain operations such as Boolean expressions, quotations, capitalization, etc., leading to results that are less accurate than
you'd get by searching with a specific tool. [Editor's Note: To read more about metasearch tools, turn to this issue's Internet
Librarian feature, which begins on page 50.]
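One way a metasearch tool might combine results is to interleave each tool's ranked list and drop duplicates. The sketch below is an assumption for illustration, not the method of any metasearch tool named here; the result lists are invented.

```python
from itertools import zip_longest


def merge_results(result_lists):
    """Interleave ranked result lists round-robin, keeping the first copy of each URL."""
    merged, seen = [], set()
    # zip_longest yields the 1st result of every tool, then the 2nd, and so on,
    # padding shorter lists with None.
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Hypothetical ranked results from three tools.
tool_a = ["a", "b", "c"]
tool_b = ["b", "d"]
tool_c = ["a", "e", "f", "g"]
print(merge_results([tool_a, tool_b, tool_c]))
# ['a', 'b', 'd', 'e', 'c', 'f', 'g']
```

The merge works only from each tool's ordering, which reflects the lack of control noted above: the metasearcher sees what each tool returns, not how the tool executed the search.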
Relevancy ranking is at the heart of search tool evolution and is the
most guarded proprietary feature in the industry. Because
each tool has its own algorithm and functions somewhat differently than every other tool, the relevancy percentiles are
tool-specific anomalies. How the percentiles are fashioned is proprietary information, so there is no way to know what the
percentile really signifies. A truly relevant document is likely to be given 98 percent ranking in one tool and 55 percent ranking
in another tool. Much like a weather forecast calling for a 10 percent chance of rain, no matter what happens, the predicted
likelihood of relevancy is always correct.
While relevancy ranking may be the scourge of cybersearching because
of differing retrieval sets, it is also a blessing for the
very same reason. Since no single tool is the "right" or "best" one, a savvy cybersearcher knows to use as many tools as
possible or as many tools as necessary to satisfy the search. The very competitive nature of these commercial tools ensures that
evolution will continue and natural selection will filter out the best and most efficient tools for the future.
Automatic Phrase Detection
This feature is used by AltaVista to match words within a database of common phrases. However,
more often than not, the search tool cannot second-guess the searcher accurately. Allowing a searcher to specify a phrase is a
more logical option.
A retrieval set sorted by relevancy ranking tends to be effective, but
sorting the retrieval set by date without combined
relevancy produces results that are less than useful unless the retrieval set is small.
Hybridization of Tools
Northern Light combines the Web with a Special Collections database that
accesses over 4 million articles from about 5,400
journals, magazines, encyclopedias, and newswires. Northern Light offers individual, corporate, library, and organization
accounts. Northern Light has also recently been selected by the National Technical Information Service (NTIS) to develop a
search site to access U.S. federal government information. It remains to be seen what the outcome of commingling Web
information with more traditional magazine and journal information will be, but rest assured that this is only the beginning.
One of the major reasons for the growth spurts of the Internet is
new development or innovation in search tools. This
evolution translates into better access and navigation of cyberspace. The following features are designs that are opening new
windows to the future of the Web.
Relevancy Ranking Based on Backlinks
Google has pioneered perhaps one of the better features for determining
relevancy: backlinks. In addition to using algorithms
like other search tools, Google analyzes hyperlinks or sites linking to a particular site, thus relying on the collective expertise of
other sites to determine the relative worth of a site. If a lot of hyperlinks, particularly within primary sites, point to a site, then
Google reasons that the site must be of some value. The recent addition of quotations to specify phrase searching has made
Google a leader in finding pertinent information. [Editor's Note: For more information on Google, see Peter Jacso's Internet
Insights column in the April issue of Information Today, page 30.]
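The backlink idea can be illustrated by simply counting how many distinct sites link to each site. This is a deliberately crude sketch under stated assumptions: the link data is invented, and Google's actual ranking is proprietary and more involved than a raw count.

```python
from collections import Counter


def backlink_counts(links):
    """Count inbound links per site.

    links maps each site to the list of sites it links to.
    Repeated links from one site count once, and self-links are ignored.
    """
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts


# Hypothetical link data: keys link to the sites in their lists.
links = {
    "a": ["c"],
    "b": ["c", "d"],
    "c": ["c"],
    "d": ["c"],
}
print(backlink_counts(links).most_common())
# [('c', 3), ('d', 1)]
```

Site "c" ranks first because three other sites point to it, which is the "collective expertise" the paragraph above describes.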
Natural Language Searching
Ask Jeeves is a metasearch tool that has also created a database of
almost 7 million answers to questions that have been
commonly asked by searchers. The natural language feature allows the searcher to ask a question like, "What causes ocean
waves?" If the question appears in their "KnowledgeBase" of common questions, then Ask Jeeves provides a "good" Web site
that answers the question. If the question cannot be found, possible options are listed with drop-down selections. In addition,
Ask Jeeves metasearches on six other search tools. AltaVista uses the Ask Jeeves technology to offer its own version, Ask
AltaVista. While Ask AltaVista is useful, it is not nearly as robust or as accurate as Ask Jeeves. It is difficult to get any easier than
natural language searching, especially in combination with good answers from the database of common questions.
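The lookup described above, an exact match against a database of common questions with a fallback list of possible options, might be sketched as follows. The questions, the example.org addresses, and the use of fuzzy string matching are all assumptions for illustration; how Ask Jeeves actually matches questions is not described in this article.

```python
import difflib
import string

# Hypothetical database of common questions; the URLs are placeholders.
KNOWLEDGE_BASE = {
    "what causes ocean waves": "http://example.org/ocean-waves",
    "why is the sky blue": "http://example.org/sky",
}


def normalize(question):
    """Lowercase the question, strip punctuation, and collapse whitespace."""
    cleaned = question.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.split())


def answer(question):
    """Return ('answer', url) on an exact match, else ('suggestions', similar questions)."""
    key = normalize(question)
    if key in KNOWLEDGE_BASE:
        return ("answer", KNOWLEDGE_BASE[key])
    suggestions = difflib.get_close_matches(key, list(KNOWLEDGE_BASE), n=3, cutoff=0.6)
    return ("suggestions", suggestions)


print(answer("What causes ocean waves?"))
print(answer("What makes ocean waves?"))
```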
Clustering/Cataloging Information
Search tools are not only providing subject directories and categories
to direct a search; they are also integrating cataloging-type
options that link to the equivalent See Also or Related Terms. Yahoo! has evolved into a hierarchical classification system to
organize retrieval sets. Northern Light's Custom Folders feature refines results as an integrated element of a search. AltaVista
provides a list of Related Searches. Infoseek links to Directory topics. Excite provides a list of words to add to refine a search
and a Web Site Directory using concept-based indexing to help direct the search. Lycos directs searches to a category listing
called First and Fast. All these options generally improve search results.
AltaVista has an option called AV PhotoFinder (http://image.altavista.com/cgi-bin/avncgi) that allows both filtered and
unfiltered image searching. Lycos Image Gallery (http://www.lycos.com/picturethis) finds images, illustrations, sound files, and
video. Yahoo! Image Surfer (http://ipix.yahoo.com) searches via categories for images. The customization feature of
SavvySearch allows the searcher to search AV PhotoFinder, Lycos Image Gallery, Yahoo! Image Surfer, and Scour.net
(http://www.Scour.net) simultaneously. Excite also allows searching of audio files and is partnered with RealPlayer.
Customization and Portals
SavvySearch's Customization (http://www.savvysearch.com/custom) provides a selection of tools
that you can opt to use for a metasearch. After the tools are selected, a searcher can specify a name and choose to start a
search from the custom menu. SavvySearch's Customization is more versatile for searching than the portal options offered by
major search tools such as MyYahoo!, MyExcite, MyAltaVista, etc. As market share continues to drive search tools, there will
be more concentration on providing portal search tool sites. Whether portals will evolve into anything more than a useful option
within the individual search tools remains a question. At present, portals are at their best when used in conjunction with other
tools. Again, no one tool is everything.
Partnerships
RealNames (http://www.realnames.com) is a subscription database used
by AltaVista that leads searchers to top-level
corporate, organizational, individual, or other sites. An AltaVista search for "census" produces RealNames links to the U.S. Census Bureau,
U.S. Census Bureau Population Topics, etc., whereas the regular retrieval set begins with a link to a commercial site about the U.S.
Federal Census or a link to a U.S. Census Bureau page that is further down the hierarchy about County Population Estimates.
Direct Hit (http://www.directhit.com) is a tool used by HotBot that
creates a list of relevant sites based on popularity or traffic
patterns. By examining the retrieval set sites visited by searchers, Direct Hit creates a top 10 list of the most popular sites. The
first listing in the HotBot search results is a link to Direct Hit's Top Ten sites. Direct Hit is partnering with tools such as Lycos,
LookSmart, and potentially others in the future.
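Popularity ranking of this kind can be sketched by counting which results searchers visit for a given query. The click log and function name below are hypothetical; Direct Hit's actual measurements of traffic patterns are not described in this article.

```python
from collections import Counter


def top_sites(click_log, query, n=10):
    """Return up to n sites most often visited by searchers who ran query."""
    counts = Counter(url for logged_query, url in click_log if logged_query == query)
    return [url for url, _ in counts.most_common(n)]


# Hypothetical click log of (query, visited site) pairs.
click_log = [
    ("census", "site-a"),
    ("census", "site-b"),
    ("census", "site-a"),
    ("weather", "site-c"),
]
print(top_sites(click_log, "census"))
# ['site-a', 'site-b']
```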
Multilingual Searching and Translations
The advanced evolutions of search tools will require more sophisticated
techniques to overcome language barriers in order to
create a truly "World" Wide Web. Yahoo! has separate iterations such as Yahoo! en Espanol (http://espanol.yahoo.com) and
Yahoo! Chinese (http://chinese.yahoo.com) to navigate sites in languages other than English. For a list of countries, go to
WorldYahoo!s (http://howto.yahoo.com/ask/world.html). SavvySearch provides a metasearch interface in 23 languages.
AltaVista (http://babelfish.altavista.digital.com/cgi-bin/translate?) provides a basic translation from English to French, German,
Spanish, Portuguese, and Italian by inputting either text or the address (URL) of a Web page. AltaVista will also translate
French, German, Spanish, Portuguese, and Italian to English. HotBot allows a search to be limited to one of nine languages via
a drop-down menu. Beaucoup (http://www.beaucoup.com/engbig.html) provides a subject directory listing of geographically
specific search tools for finding information from specific countries outside the United States. (Be aware that this feature may
require extra fonts necessary for Russian, Hebrew, Nihongo, or other languages using non-Roman characters.)
The Shift to Simplicity
Simplicity may be the most underrated feature in the evolutionary process
of search tools. Natural language searching,
clustering, guidance, and See Also types of features retrieve good results with minimal expertise. The data collected by search
tools indicates that the average searcher rarely uses advanced techniques, choosing instead to input one or more terms without
Boolean, truncation, proximity, includes, or excludes. Consequently, search tools have taken the path of developing simple
Library and information specialists who are versed in research databases
and online information services like DIALOG and Lexis-Nexis may desire a variety of sophisticated options and commands
needed for retrieving precise information.
However, the Web environment is vastly different than online or traditional research databases both in content and in user
profile. Online, librarian-type searching requires professional expertise, something the average cybersearcher does not
necessarily possess. While simplicity may seem to be the "dumbing down" of search tools and the scourge of the Internet, the
evolution of "smarter" search tools is actually a practical attempt to meet the needs of the general public. Professional and
expert searchers would be well advised to stay flexible and to adapt their strategies to the changing environment.
If there were only a single search tool, or worse, several search tools
that produced the same results for every search, the
consequences would actually be more detrimental to the Internet. Different tools producing varying results may be essential to
the further growth of the World Wide Web. Perhaps the greatest hope for the future cybrarian lies in the competitive spirit that
inspires multiple approaches in the development of search tools. Vive la difference!
Addresses for the main Search Tools mentioned in this Article