|
(Compiled from SearchDay, Mar. 28, 2002)
In the lighthearted spirit
of the popular books for "idiots" and "dummies," here's a
look at seven common blunders that are virtually guaranteed to deliver useless,
nonsensical, or completely worthless search results. Some of these gaffes might
surprise you. But once you recognize them, it's easy to banish these
little gremlins forever from your Web search tool kit.
1. Stop Words
2. Boolean Operators
3. Common Words
4. Heteronyms
5. Capitalization
6. Proximity
7. Wrong Search Engine
1. Stop Words
Some search engines
simply ignore certain words. They are never used to find a matching document,
despite what amounts to a direct command when you type them into a search form.
These are called "stop words"
because the search engine doesn't "stop" when the words are found in
the index (if they are even indexed at all). Why not? Because stop words are
either too common to generate meaningful results, or are parts of speech like
adverbs, conjunctions, prepositions, or forms of "be" that mean
nothing unless they're part of a phrase with more "important" nouns
and verbs.
If you use a stop word in a query you
may get wildly irrelevant results. For example, the phrase "searching the
web" contains two stop words: "the" and "web." Though
it's not a particularly common word, web is used so frequently on the Internet
that it's virtually worthless as a finding aid.
Stripping out the stop words,
"searching the web" becomes "searching," which will
naturally lead to results describing everything from criminal manhunts to quests
for enlightenment—and if you're lucky, maybe even something about searching
the web.
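To see why this happens, here is a minimal sketch of stop-word stripping in Python. The stop list below is a made-up sample for illustration, not the list any real search engine uses, and the function name is hypothetical.

```python
# Hypothetical stop list for illustration only; each real engine keeps its own.
STOP_WORDS = {"the", "web", "a", "an", "of", "and", "or", "to", "in"}


def strip_stop_words(query):
    """Return the query terms that survive stop-word removal."""
    terms = query.lower().split()
    return [term for term in terms if term not in STOP_WORDS]


remaining = strip_stop_words("searching the web")
print(remaining)
```

With this sample list, only "searching" is left to match against the index, which is why the results wander so far from the original intent.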
How can you identify stop words? Google
tells you when it's ignoring a stop word, at the very top of a results page. You
can force Google to include a stop word in a query by putting a plus sign in
front of it. AlltheWeb takes a different approach -- it often automatically
rewrites your query to include a stop word as part of a quoted phrase with other
query terms. Check out the link below to the 300 most common words in English,
many of which are stop words.
300 Most Common Words in English
http://www.zingman.com/commonWords.html
Many of the words shown in this list are treated as stop words.
2. Boolean Operators
Boolean operators, like "and,"
"or," and "not," can help narrow search results—when used
properly. The problem is that Boolean operators, because of their apparent
simplicity, appear to be easy to use. Maybe, and/or not really.
According to Ran Hock, author of The
Extreme Searcher's Guide to Web Search Engines, search engines implement Boolean
features in different ways. For example, while some accept a simple
"not," others require "and not" for the same effect.
Additionally, some engines require that Boolean operators be capitalized, while
others do not (or and do not?).
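Whatever syntax a particular engine expects, the underlying logic is plain set arithmetic. The sketch below assumes a toy index that maps each term to the set of documents containing it. The terms and document numbers are invented for illustration, and real engines differ in how they parse the operators.

```python
# Toy inverted index: term -> set of document ids (made-up data).
INDEX = {
    "bond": {1, 2, 3},
    "james": {3, 4},
    "chemical": {2},
}


def docs(term):
    """Return the set of documents containing the term (empty if unknown)."""
    return INDEX.get(term, set())


both = docs("bond") & docs("james")     # bond AND james
either = docs("bond") | docs("james")   # bond OR james
without = docs("bond") - docs("james")  # bond AND NOT james

print(both, either, without)
```

AND shrinks the result set, OR grows it, and NOT removes documents from it. Mixing them up is how a query meant to narrow a search ends up broadening it instead.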
If you really want to use Boolean
operators, learn how to use them. Two outstanding tutorials on using these
features are listed below.
Search Engine Math
http://www.searchenginewatch.com/facts/math.html
Most search engines implement a simple form of Boolean logic that's relatively
easy to master. This tutorial by Search Engine Watch's Danny Sullivan shows how
to use this search engine math.
Boolean Searching on the Internet
http://library.albany.edu/internet/boolean.html
An outstanding, detailed, and comprehensive overview of Boolean searching on the
Internet, from the University at Albany Libraries.
3. "Vulgar" (common) Words
Vulgar comes from the Latin vulgus,
meaning common. Like some educated sophisticates, search engines have a problem
with common words. It's not that they're being snotty or pretentious. It's that
some words are so common that they appear in literally millions of documents,
making them virtually useless as a finding aid.
Take weather, for example. There are
thousands of sites providing weather information, from local forecasts to
elaborate treatises on meteorology. Tighten your query by using focusing words
to narrow the scope of your search. Rather than merely searching for
"weather," construct a query like "Cicely Alaska annual
snowfall," or something equally specific.
4. Heteronyms
Be careful when a word has multiple
meanings. Think of the word "bond" as an example. If you just use the
single word "bond" as a query, the search engine has to figure out if
you're looking for information about financial bonds, chemical bonds, or even
James Bond. Make it easier for the engine to help you. Ask yourself the question
before the search engine does for you, and phrase your query accordingly.
Search engines are also easily confused
by heteronyms, words that are spelled identically but have different meanings
when pronounced differently. For example, "lead," pronounced LEED,
means to guide. Pronounced as LED, though, the word refers to the metal element.
When you can, use concrete synonyms instead of heteronyms.
The Heteronym Home Page
http://www-personal.umich.edu/~cellis/heteronym.html
Using synonyms in search queries can be an effective way of narrowing the focus
of your search -- unless they're heteronyms. For more examples, see this page.
5. Capitalization
Yet another problem for the searcher is
whether to use capital letters in a query. Some engines are case sensitive,
while others are not. As a rule of thumb, it's a good idea to always use lower
case letters when you search. This will typically return results that contain
both upper and lower case letters.
If you use uppercase letters in a query
to a case sensitive engine, results will only include documents that also use
upper case letters. This is usually a good thing for proper nouns like names or
places, which use initial upper case letters anyway. But it might cause you to
miss other documents where case-sensitivity is less important.
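The rule described above can be sketched in a few lines of Python. This is an assumed model of how a case sensitive engine might behave, not the documented behavior of any particular service, and the function name is hypothetical.

```python
def matches(query_term, doc_word):
    """Assumed case rule: an all-lowercase query term matches any casing,
    while a term containing uppercase letters must match exactly."""
    if query_term.islower():
        return query_term == doc_word.lower()
    return query_term == doc_word


words = ["Turkey", "turkey", "TURKEY"]
lower_hits = [w for w in words if matches("turkey", w)]
upper_hits = [w for w in words if matches("Turkey", w)]
print(lower_hits)
print(upper_hits)
```

Under this model the lowercase query finds all three spellings, while the capitalized query finds only the exact match, which is helpful for the country and unhelpful for the bird.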
6. Proximity
Most search engines do a good job at
matching simple phrases, like "Afghan refugees," or "space
shuttle missions." The distance between one word and another in a document
is referred to as proximity. Some search engines will give a positive result if
your query words appear anywhere on a page, whether or not they are near each
other, or are used together in a phrase.
If you're searching for something where
your keywords must be near each other to get good results, your only option is
to use AltaVista's advanced search and the NEAR operator in your query. This
finds documents containing both specified words or phrases within 10 words of
each other.
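A rough idea of what a proximity test involves, sketched in Python. The function below is a hypothetical illustration, not AltaVista's implementation, and it assumes "within 10 words" means the two word positions differ by at most 10.

```python
import re


def near(text, first, second, max_distance=10):
    """Return True if both words occur within max_distance positions
    of each other (assumed reading of a NEAR operator)."""
    words = re.findall(r"\w+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(
        abs(i - j) <= max_distance
        for i in first_positions
        for j in second_positions
    )


close = near("Afghan families became refugees", "Afghan", "refugees")
far = near("afghan " + "filler " * 20 + "refugees", "Afghan", "refugees")
print(close, far)
```

A page that mentions both words but separates them with twenty unrelated ones fails the test, even though it would satisfy an engine that only checks whether both words appear somewhere on the page.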
7. Using the Wrong Search Engine
And now for the number one most common
searching mistake:
If you're determined to find what you're
looking for on the Web, be sure you're using the right tools for the job. Search
engines vary widely in scope, function, and quality. You'll waste a lot of time
if you don't choose the best search engine for each specific searching task.
Should you use a crawler-based search
engine, or a human compiled web directory? How about a specialized search site,
a database, or an invisible web resource? By analyzing your needs and comparing
them with the strengths and weaknesses of each search service *before* you
search, you'll likely get better results.
This may sound like a chicken and egg
problem -- how do you know which search tool is best without trying them out
first? Experience helps. There's also a terrific table created by Debbie
Abilock that's a virtual cheat sheet for a wide variety of information needs.
If you're relatively new to searching
and get stuck, don't be hard on yourself. One of the most ridiculous
misconceptions I've ever heard is that "you can find anything on the
Internet." This is about as true as saying that there are diamonds in every
coal mine. And though it may sound like heresy coming from someone who
lives and breathes web search, sometimes your best bet for finding information
is to log off and take a trip to your local library.
Libraries have tons of resources that
aren't available on the Web. And librarians are trained experts who are usually
more than willing to help you find what you're looking for. When you're getting
nowhere on the Web, take advantage of these (usually very nice) "human
search engines."
Updated 04/09/03; KD
|