Computers in Libraries, May 1999, v19 n5, 22 - 28.
Darwin on the Web: The Evolution of Search Tools
Dale J Vidmar
Abstract:
Vidmar notes some similarities between the evolution of man and the
evolution of search tools, and postulates that both are
becoming more advanced. A number of search tools, including Yahoo!
and AltaVista, are discussed.
The potential to create worldwide networks is both the major strength and the major weakness of the Internet. There are millions of Web pages, text and graphical, posted by anyone and everyone, but the Web still lacks organization. Finding "good" information is difficult at best. Search tools have emerged to help navigate a course through this primordial soup of information. They carry the hope for a better future despite the difficulties of the present. To make the best of today's search tools, I believe it helps to divide their features and options into three major categories. This may help illustrate their relative importance as well as their strengths.
What search tools do best is collect data about what people search for and how they execute their queries. The collected data is analyzed so the tools can be altered or developed accordingly. Simple interfaces and natural language searching are examples of search tools responding to the needs of the typical searcher. The features listed under the heading Naturally Selected and Adapted offer more options and better control but require advanced skills. More than anything, they are driven by the desire for greater precision in retrieving specific information.
Subjects, Categories, Channels, and Folders
Yahoo! is best known for its structured hierarchy, which directs a search toward more specific information. The distinction between a search engine and a search directory is becoming less perceptible. Most search and metasearch tools, such as AltaVista, Northern Light, Lycos, The Internet Sleuth, MetaCrawler, etc., have subject directories. The subject directories in these search tools differ dramatically from the more extensive subject directory sites, such as Digital Librarian (http://www.servtech.com/~mvail), INFOMINE (http://lib-www.ucr.edu), Academic Info (http://www.academicinfo.net), and others, which are maintained by individuals who select resources linking to "good information." Subject directories, categories, channels, and folders allow directed searching through selected materials that have been indexed.
Boolean Searching: AND, OR, NOT, NEAR
The move from implied OR searching to implied AND searching and phrase detection has increased precision and relevancy. Boolean searching has been the standard for librarian-type searching, but it is overlooked by the typical searcher, who opts for simple search modes. The best adaptations of Boolean searching to simple interfaces are drop-down menus and radio buttons that specify AND, OR, and NOT searches.
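The Boolean operators, and the shift from implied OR to implied AND, can be illustrated with a small sketch. It assumes a toy inverted index over three hypothetical pages; real search tools kept their internals private, so this shows only the set logic, not how any particular tool works.

```python
from functools import reduce

# Three hypothetical pages standing in for a search tool's database.
PAGES = {
    1: "pumpkin pie recipe",
    2: "pumpkin carving ideas",
    3: "apple pie recipe",
}

def build_index(pages):
    """Inverted index: each lowercase word -> set of IDs of pages containing it."""
    index = {}
    for page_id, text in pages.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(page_id)
    return index

INDEX = build_index(PAGES)

def postings(term):
    """IDs of pages containing the term (empty set if none do)."""
    return INDEX.get(term.lower(), set())

def search_and(*terms):
    """term1 AND term2 ...: pages containing every term."""
    return reduce(set.intersection, (postings(t) for t in terms))

def search_or(*terms):
    """term1 OR term2 ...: pages containing at least one term."""
    return reduce(set.union, (postings(t) for t in terms))

def search_not(term, excluded):
    """term NOT excluded: pages containing term but not excluded."""
    return postings(term) - postings(excluded)

def simple_search(query, implied="AND"):
    """Simple search mode: the operator between bare words is implied."""
    terms = query.split()
    return search_and(*terms) if implied == "AND" else search_or(*terms)

print(simple_search("pumpkin recipe", implied="OR"))
print(simple_search("pumpkin recipe", implied="AND"))
print(search_not("pie", "apple"))
```

Under implied OR, [pumpkin recipe] also pulls in the carving page and the apple pie page; under implied AND only the pumpkin pie recipe remains, which is the gain in precision the shift to implied AND was meant to deliver.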
Search Basics: Excludes and Includes
Pluses, minuses, quotation marks, and parentheses are basic to complex search strategies, but they are becoming less useful as search tools move in the direction of doing more for the searcher. The question most sophisticated searchers ask is how far these tools will go in "dumbing down" for the sake of the inexperienced searcher or the lowest common denominator.
The plus [+] symbol requires/includes terms: [+pumpkin +recipe]. This is especially useful when a search term is a stop word. In Google, "web" is a stop word, so a search for [web design] does not produce relevant documents the way [+web design] does.
The minus [-] symbol excludes terms: [clinton -lewinsky].
Quotation marks specify terms in a phrase search: ["lesson plans"].
And you can combine features: [+"lesson plans" +"social studies" -history].
Parenthetical expressions set the order of search operations but will generally eliminate some of the clustering operations in simple search modes: [Cleveland and (Indians or Tribe)].
Most search tools and metasearch tools allow these options. However, these advanced search options often produce less-than-satisfactory results. Starting simple is often the best strategy.
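The sketch below shows one way the plus, minus, and quotation-mark conventions above might be interpreted. It is an illustration under stated assumptions, not any tool's actual parser: unprefixed words are treated as required (implied AND), a hypothetical stop list containing "web" mimics the Google behavior mentioned above, and the pages are made up.

```python
import re

# Hypothetical stop list; the article notes that Google treated "web" as one.
STOP_WORDS = {"a", "an", "of", "the", "web"}

# Optional +/- prefix, then either a "quoted phrase" or a single bare word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]+)"|(\S+))')

def parse_query(query):
    """Return (required, excluded) lists of word tuples.

    A one-word tuple is a term; a longer tuple is a phrase. Bare words are
    required unless they are stop words, which are silently dropped; a
    leading + keeps a stop word in the search.
    """
    required, excluded = [], []
    for sign, phrase, word in TOKEN.findall(query.lower()):
        words = tuple((phrase or word).split())
        if not words:
            continue
        if sign == "-":
            excluded.append(words)
        elif sign == "+" or len(words) > 1 or words[0] not in STOP_WORDS:
            required.append(words)
    return required, excluded

def contains(page_words, words):
    """True if the words occur consecutively in the page."""
    n = len(words)
    return any(tuple(page_words[i:i + n]) == words
               for i in range(len(page_words) - n + 1))

def matches(page, query):
    """Page must hold every required term/phrase and no excluded one."""
    page_words = re.findall(r"[a-z0-9]+", page.lower())
    required, excluded = parse_query(query)
    return (all(contains(page_words, w) for w in required)
            and not any(contains(page_words, w) for w in excluded))

# Made-up pages for illustration.
PAGES = [
    "Lesson plans for social studies teachers",
    "Social studies lesson plans: world history",
    "Web design tips",
    "Interior design ideas",
]

for q in ['+"lesson plans" +"social studies" -history', "web design", "+web design"]:
    print(q, "->", [p for p in PAGES if matches(p, q)])
```

Against these made-up pages, [web design] matches both design pages because "web" is dropped as a stop word, while [+web design] keeps only the Web design page.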
Truncation
AltaVista, Northern Light, HotBot, and Snap use the asterisk symbol to let the cybersearcher search on a truncated form of a term. Northern Light automatically generates the plural variation of a search term. The need for truncation can often be overcome by choosing more distinctive terms: the term "adolescence" can be used instead of [teen*] with better results.
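Truncation amounts to expanding a stem into every indexed word that begins with it and ORing the results. The sketch below assumes a small hypothetical vocabulary; it also shows why a distinctive term can beat truncation, since [teen*] sweeps in unrelated words such as "teeny". The plural helper is a crude stand-in for automatic plural generation, not Northern Light's actual method.

```python
# Hypothetical index vocabulary.
VOCABULARY = {"adolescence", "tee", "teen", "teenager", "teenagers", "teens", "teeny"}

def expand_truncated(term, vocabulary):
    """Expand 'teen*' into every indexed word beginning with 'teen'.

    A term without a trailing asterisk is returned unchanged.
    """
    if not term.endswith("*"):
        return {term.lower()}
    stem = term[:-1].lower()
    return {word for word in vocabulary if word.startswith(stem)}

def with_plural(term):
    """Crude automatic plural: adds 's' and ignores irregular forms."""
    return {term, term + "s"}

print(sorted(expand_truncated("teen*", VOCABULARY)))
print(expand_truncated("adolescence", VOCABULARY))
print(sorted(with_plural("recipe")))
```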
Field Searching
Most search tools have adopted a variety of field searching options, such as title, URL, image, and more. What may be more notable is that Excite, Magellan, WebCrawler, and Google do not offer this option. HotBot excels in field searching by offering user-friendly drop-down menus and radio buttons for selecting fields.
The acceleration of change presents a challenge to even the most dedicated searcher. In the flux and flow of innovation, the following features still have something to prove: that they can indeed help make sense of the chaos.
Metasearching
Metasearch tools such as SavvySearch, The Internet Sleuth, ProFusion, MetaCrawler, and others search multiple tools simultaneously. By their very nature, they cover a greater portion of the Web than individual search tools, and as such they are an excellent way to begin a search. The ability to choose which tools to include in a search, along with options for organizing retrieval sets, makes metasearchers a logical choice for broadening the scope of a search. The disadvantage of metasearching is that the searcher has little control over how a search is executed. Some tools may ignore certain operations, such as Boolean expressions, quotation marks, capitalization, etc., leading to results that are less accurate than you'd get by searching with a specific tool. [Editor's Note: To read more about metasearch tools, turn to this issue's Internet Librarian feature, which begins on page 50.]
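A metasearch tool's basic job, sending one query to several tools at once and merging the ranked lists, can be sketched as follows. The engines here are hypothetical stubs returning fixed URLs, and round-robin interleaving with duplicate removal is just one simple merging strategy; how SavvySearch, MetaCrawler, and the others actually combine results is not described here.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import zip_longest

# Hypothetical stand-ins for real search tools: each returns ranked URLs.
def engine_a(query):
    return ["http://a.example/1", "http://shared.example/", "http://a.example/2"]

def engine_b(query):
    return ["http://shared.example/", "http://b.example/1"]

def metasearch(query, engines):
    """Query every engine concurrently, then merge the ranked lists
    round-robin (each engine's 1st hit, then each 2nd hit, ...),
    keeping only the first occurrence of each URL."""
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

print(metasearch("lesson plans", [engine_a, engine_b]))
```

The URL returned by both stubs appears only once, at the earliest rank either engine gave it.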
Relevancy Ranking
Relevancy ranking is at the heart of search tool evolution and is the most closely guarded proprietary feature in the industry. Because each tool has its own algorithm and functions somewhat differently from every other tool, relevancy percentages are tool-specific quirks. How the percentages are calculated is proprietary information, so there is no way to know what a percentage really signifies. A truly relevant document may be ranked at 98 percent in one tool and at 55 percent in another. Much like a weather forecast calling for a 10 percent chance of rain, no matter what happens, the predicted likelihood of relevancy is always correct.
While relevancy ranking may be the scourge of cybersearching because it yields differing retrieval sets, it is also a blessing for the very same reason. Since no single tool is the "right" or "best" one, a savvy cybersearcher knows to use as many tools as possible, or as many as necessary, to satisfy the search. The highly competitive nature of these commercial tools ensures that evolution will continue and that natural selection will favor the best and most efficient tools for the future.
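To see why a percentage from one tool says little about the same page in another, consider two made-up scoring rules. Neither is any real tool's formula (those are proprietary); they only show how the same page can earn 100 percent under one rule and 50 percent under another.

```python
# Two made-up pages for the query "ocean waves".
PAGES = [
    "ocean waves are caused by wind",
    "waves waves waves surf report waves",
]
TERMS = ["ocean", "waves"]

def tool_a_scores(pages, terms):
    """Hypothetical tool A: percentage of query terms present in each page."""
    scores = []
    for page in pages:
        words = set(page.lower().split())
        scores.append(round(100 * sum(t in words for t in terms) / len(terms)))
    return scores

def tool_b_scores(pages, terms):
    """Hypothetical tool B: raw term counts, scaled so the top page is 100."""
    raw = [sum(page.lower().split().count(t) for t in terms) for page in pages]
    best = max(raw) or 1  # avoid dividing by zero when nothing matches
    return [round(100 * r / best) for r in raw]

print("Tool A:", tool_a_scores(PAGES, TERMS))
print("Tool B:", tool_b_scores(PAGES, TERMS))
```

The page that actually explains what causes ocean waves scores 100 percent under tool A but only 50 percent under tool B, which rewards sheer repetition.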
Automatic Phrase Detection
This feature is used by AltaVista to match words against a database of common phrases. However, more often than not, the search tool cannot second-guess the searcher accurately. Allowing the searcher to specify a phrase is a more logical option.
Sorting Features
A retrieval set sorted by relevancy ranking tends to be effective, but sorting a retrieval set by date alone, without also taking relevancy into account, produces results of little use unless the retrieval set is small.
Hybridization of Tools
Northern Light combines the Web with a Special Collections database that provides access to over 4 million articles from about 5,400 journals, magazines, encyclopedias, and newswires. Northern Light offers individual, corporate, library, and organization accounts. Northern Light has also recently been selected by the National Technical Information Service (NTIS) to develop a search site for accessing U.S. federal government information. It remains to be seen what the outcome of commingling Web information with more traditional magazine and journal information will be, but rest assured that this is only the beginning.
One of the major reasons for the Internet's growth spurts is new development and innovation in search tools. This evolution translates into better access to, and navigation of, cyberspace. The following features are designs that are opening new windows to the future of the Web.
Relevancy Ranking Based on Backlinks
Google has pioneered what may be one of the better features for determining relevancy: backlinks. In addition to using algorithms like other search tools, Google analyzes hyperlinks, or sites linking to a particular site, thus relying on the collective expertise of other sites to determine the relative worth of a site. If a lot of hyperlinks, particularly within primary sites, point to a site, then Google reasons that the site must be of some value. The recent addition of quotation marks to specify phrase searching has made Google a leader in finding pertinent information. [Editor's Note: For more information on Google, see Peter Jacso's Internet Insights column in the April issue of Information Today, page 30.]
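The backlink idea can be sketched two ways: simply counting the pages that link to each page, and a simplified version of PageRank, the link-analysis method described in the published research of Google's founders, in which a link from a highly ranked page counts for more. The link graph is made up, and as noted above Google also relies on other algorithms, so this is not Google's actual ranking.

```python
# A made-up link graph: each page maps to the pages it links to.
LINKS = {
    "portal": ["museum", "library"],
    "library": ["museum"],
    "blog": ["museum", "shop"],
    "museum": ["library"],
    "shop": [],
}

def backlink_counts(links):
    """Number of distinct other pages linking to each page."""
    counts = {page: 0 for page in links}
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                counts[target] = counts.get(target, 0) + 1
    return counts

def link_rank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    Each page shares its current rank equally among the pages it links to;
    pages with no outgoing links share theirs with every page. The
    (1 - damping) term models a reader jumping to a random page.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1 - damping) / n for page in pages}
        for source in pages:
            targets = links.get(source, [])
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                for page in pages:
                    new_rank[page] += damping * rank[source] / n
        rank = new_rank
    return rank

print(backlink_counts(LINKS))
ranks = link_rank(LINKS)
print(sorted(ranks, key=ranks.get, reverse=True))
```

In this toy graph "museum" has the most backlinks and also the highest rank; "library" comes next, partly because the well-ranked "museum" links to it.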
Natural Language Searching
Ask Jeeves is a metasearch tool that has also created a database of almost 7 million answers to questions commonly asked by searchers. The natural language feature allows the searcher to ask a question like, "What causes ocean waves?" If the question appears in its "KnowledgeBase" of common questions, Ask Jeeves provides a "good" Web site that answers the question. If the question cannot be found, possible options are listed with drop-down selections. In addition, Ask Jeeves metasearches six other search tools. AltaVista uses the Ask Jeeves technology to offer its own version, Ask AltaVista. While Ask AltaVista is useful, it is not nearly as robust or as accurate as Ask Jeeves. It is difficult to get any easier than natural language searching, especially in combination with good answers from the database of common questions.
Clustering/Cataloging Information
Search tools are not only providing subject directories and categories to direct a search; they are also integrating cataloging-type options that link to the equivalent of See Also or Related Terms references. Yahoo! has evolved into a hierarchical classification system that organizes retrieval sets. Northern Light's Custom Folders feature refines results as an integrated element of a search. AltaVista provides a list of Related Searches. Infoseek links to Directory topics. Excite provides a list of words to add to refine a search, as well as a Web Site Directory that uses concept-based indexing to help direct the search. Lycos directs searches to a category listing called First and Fast. All these options generally improve search results.
Image Searching
AltaVista has an option called AV PhotoFinder (http://image.altavista.com/cgi-bin/avncgi) that allows both filtered and unfiltered image searching. Lycos Image Gallery (http://www.lycos.com/picturethis) finds images, illustrations, sound files, and video. Yahoo! Image Surfer (http://ipix.yahoo.com) searches for images via categories. The customization feature of SavvySearch allows the searcher to search AV PhotoFinder, Lycos Image Gallery, Yahoo! Image Surfer, and Scour.net (http://www.Scour.net) simultaneously. Excite also allows searching of audio files and is partnered with RealPlayer.
Customization and Portals
SavvySearch's Customization (http://www.savvysearch.com/custom) provides a selection of tools that you can opt to use for a metasearch. After the tools are selected, a searcher can specify a name and choose to start a search from the custom menu. SavvySearch's Customization is more versatile for searching than the portal options offered by major search tools such as MyYahoo!, MyExcite, MyAltaVista, etc. As market share continues to drive search tools, there will be more emphasis on providing portal sites. Whether portals will evolve into anything more than a useful option within the individual search tools remains a question. At present, portals are at their best when used in conjunction with other tools. Again, no one tool is everything.
Partnerships
RealNames (http://www.realnames.com) is a subscription database used by AltaVista that leads searchers to top-level corporate, organizational, individual, or other sites. An AltaVista search for "census" produces RealNames links to the U.S. Census Bureau, U.S. Census Bureau Population Topics, etc., whereas the regular retrieval set begins with a link to a commercial site about the U.S. Federal Census or a link to a U.S. Census Bureau page on County Population Estimates that sits further down the hierarchy.
Direct Hit (http://www.directhit.com) is a tool used by HotBot that creates a list of relevant sites based on popularity or traffic patterns. By tracking which sites in a retrieval set searchers actually visit, Direct Hit creates a top 10 list of the most popular sites. The first item in HotBot's search results is a link to Direct Hit's Top Ten sites. Direct Hit is partnering with tools such as Lycos and LookSmart, and potentially others in the future.
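Direct Hit's idea of ranking by what searchers actually choose can be sketched by counting clicks per site for a given query. The click log format and its entries are hypothetical; Direct Hit's real data and weighting are not described in this article.

```python
from collections import Counter

# Hypothetical click log: (query, URL the searcher chose from the results).
CLICK_LOG = [
    ("census", "http://census-bureau.example/"),
    ("census", "http://commercial.example/census"),
    ("census", "http://census-bureau.example/"),
    ("recipes", "http://food.example/"),
    ("census", "http://census-bureau.example/population"),
    ("census", "http://census-bureau.example/population"),
    ("census", "http://census-bureau.example/"),
]

def top_sites(click_log, query, n=10):
    """Most-clicked URLs for a query, most popular first (a top-n list)."""
    counts = Counter(url for q, url in click_log if q == query)
    return [url for url, _clicks in counts.most_common(n)]

print(top_sites(CLICK_LOG, "census"))
```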
Multilingual Searching and Translations
The advanced evolution of search tools will require more sophisticated techniques to overcome language barriers in order to create a truly "World" Wide Web. Yahoo! has separate versions such as Yahoo! en Español (http://espanol.yahoo.com) and Yahoo! Chinese (http://chinese.yahoo.com) for navigating sites in languages other than English. For a list of countries, go to World Yahoo!s (http://howto.yahoo.com/ask/world.html). SavvySearch provides a metasearch interface in 23 languages.
AltaVista (http://babelfish.altavista.digital.com/cgi-bin/translate?) provides basic translation from English to French, German, Spanish, Portuguese, and Italian when you input either text or the address (URL) of a Web page. AltaVista will also translate French, German, Spanish, Portuguese, and Italian into English. HotBot allows a search to be limited to one of nine languages via a drop-down menu. Beaucoup (http://www.beaucoup.com/engbig.html) provides a subject directory listing of geographically specific search tools for finding information from specific countries outside the United States. (Be aware that this feature may require extra fonts for Russian, Hebrew, Japanese (Nihongo), or other languages that use non-Roman characters.)
The Shift to Simplicity
Simplicity may be the most underrated feature in the evolutionary process of search tools. Natural language searching, clustering, guidance, and See Also-type features retrieve good results with minimal expertise. The data collected by search tools indicates that the average searcher rarely uses advanced techniques, choosing instead to input one or more terms without Boolean operators, truncation, proximity, includes, or excludes. Consequently, search tools have taken the path of developing simple approaches.
Library and information specialists who are versed in research databases and online information services like DIALOG and Lexis-Nexis may want the variety of sophisticated options and commands needed for retrieving precise information. However, the Web environment is vastly different from online or traditional research databases, both in content and in user profile. Librarian-type online searching requires professional expertise, something the average cybersearcher does not necessarily possess. While simplicity may seem to be the "dumbing down" of search tools and the scourge of the Internet, the evolution of "smarter" search tools is actually a practical attempt to meet the needs of the general public. Professional and expert searchers would be well advised to stay flexible and to adapt their strategies to the changing environment.
If there were only one search tool, or worse, several search tools that produced the same results for every search, the consequences would actually be more detrimental to the Internet. Different tools producing varying results may be essential to the further growth of the World Wide Web. Perhaps the greatest hope for the future cybrarian lies in the competitive spirit that inspires multiple approaches to the development of search tools. Vive la différence!
Addresses for the main Search Tools mentioned in this Article