Narrowing your search
One of the basic problems with searching for information
in a very large information space such as the World
Wide Web is that there are a very large number of
pages that match most terms you might try. So when
the total number of pages your search engine knows
about is in the billions, it is not uncommon for
a search engine to return over 100,000 matches for
a search.
There's one big problem with this: Who wants to
go through 100,000 web pages? And what makes you
think that the first 20 pages are the ones most likely
to be what you want out of the 100,000 pages? It
is true that some search engines order their results
so that the pages they calculate to be the most relevant
are first - at least, after the ads - but there's
a strong possibility that what the search engine
thinks is relevant is not what you wanted at all!
So how do we limit the number of matches to something
more reasonable? Boolean logic is a very powerful
tool in this situation. Basic Boolean logic has three
operators that help us: AND, OR, and NOT.
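One way to picture the three operators is as set operations on the pages that contain each term. The sketch below uses Python sets; the page numbers and the two term sets are made up for illustration and do not come from any real search engine.

```python
# Hypothetical page IDs that mention each term (invented for illustration).
pages_with_hotel = {1, 2, 3, 4, 5, 6}
pages_with_atlanta = {4, 5, 6, 7, 8}

# AND: pages that mention both terms (set intersection).
hotel_and_atlanta = pages_with_hotel & pages_with_atlanta

# OR: pages that mention at least one of the terms (set union).
hotel_or_atlanta = pages_with_hotel | pages_with_atlanta

# NOT: pages that mention the first term but not the second (set difference).
hotel_not_atlanta = pages_with_hotel - pages_with_atlanta

print(sorted(hotel_and_atlanta))  # [4, 5, 6]
print(sorted(hotel_or_atlanta))   # [1, 2, 3, 4, 5, 6, 7, 8]
print(sorted(hotel_not_atlanta))  # [1, 2, 3]
```

AND gives the smallest of the three results here and OR the largest, which is why AND and NOT are the operators to reach for when a search returns too much.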
Suppose you want to find a hotel room for a planned
trip. You could search the web for "hotel" and
find hotels on every continent - not exactly what
you need! But how can you limit your search to hotels
in Atlanta, Georgia? By using Boolean logic to say
that we only want pages that mention "hotel" AND "Atlanta", we
get a much smaller set of pages.
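To make the narrowing concrete, here is a minimal sketch of how an AND search could work over a handful of pages. It is a toy model, not how any particular search engine is implemented: the page texts, the `build_index` helper, and the `search_all` helper are all hypothetical names invented for this example.

```python
from collections import defaultdict

# A tiny, made-up collection of pages (ID -> text), for illustration only.
PAGES = {
    1: "Budget hotel near the airport in Paris",
    2: "Atlanta hotel with free parking",
    3: "Things to do in Atlanta this weekend",
    4: "Luxury hotel in downtown Atlanta",
    5: "Hotel reviews for Sydney",
}

def build_index(pages):
    """Map each lowercase word to the set of page IDs that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

def search_all(index, *terms):
    """Return the IDs of pages that contain every term (Boolean AND)."""
    if not terms:
        return set()
    matches = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*matches)

index = build_index(PAGES)
print(sorted(search_all(index, "hotel")))             # [1, 2, 4, 5]
print(sorted(search_all(index, "hotel", "Atlanta")))  # [2, 4]
```

Searching for "hotel" alone matches four of the five pages, while adding "Atlanta" with AND cuts the result to two. Each extra term passed to `search_all` can only keep the result the same size or shrink it.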
So limiting the search significantly reduces the
number of pages we need to look through to find what
we want. But suppose that this set is still too large.
How could we limit it further? Many times your best
clue comes from looking at the pages that are not
what you want. For example, if your search turns
up hotels in cities named Atlanta, but not Atlanta,
Georgia, you would want to eliminate all the hotels
that are not in Georgia. How could you use the Boolean
AND to eliminate those from the set returned by the
search engine?
One possibility would be to search for "hotel" AND "Atlanta" AND "Georgia", which
would give us only the pages that mention all three terms.
Notice that in common English speech and writing, we use "and" to
include larger areas and more things. For example, when a student
says, "I like movies with Tom Selleck and Halle Berry," we
correctly interpret that to mean a larger number of movies
than either of those stars has made individually. But in Boolean
search logic, using "AND" actually reduces the matches
found. So in this sense, the student would only be including
movies that have both Tom Selleck and Halle Berry - a pretty
small number of movies. This is because using "AND" in
searching for items means that all matching items must have
all the terms joined by "AND," instead of having
any one of the terms. Since this is different from our habitual
English usage, this is often a source of errors.
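The difference between the everyday "and" and the Boolean "AND" can be sketched with Python's built-in `any()` and `all()`. The catalog below is invented: the titles and the names "Star A", "Star B", and "Star C" are placeholders, not real cast lists.

```python
# Made-up catalog: each title maps to the set of stars in it.
# "Star A", "Star B", and "Star C" are placeholders, not real cast lists.
MOVIES = {
    "Movie 1": {"Star A"},
    "Movie 2": {"Star A", "Star C"},
    "Movie 3": {"Star B"},
    "Movie 4": {"Star A", "Star B"},
}

wanted = ["Star A", "Star B"]

# Everyday "and": a movie counts if it has any one of the stars.
everyday_and = [t for t, cast in MOVIES.items() if any(s in cast for s in wanted)]

# Boolean AND: a movie counts only if it has all of the stars.
boolean_and = [t for t, cast in MOVIES.items() if all(s in cast for s in wanted)]

print(everyday_and)  # ['Movie 1', 'Movie 2', 'Movie 3', 'Movie 4']
print(boolean_and)   # ['Movie 4']
```

The everyday reading behaves like Boolean OR and returns every movie with either star, while the Boolean AND returns only the one movie that has both.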
It is likely that the red area representing
the result of our search so far still contains
many items that are not what we want, even
though they do contain all three terms.
Considerations:
- Can you think of some items that would contain all the terms, but still
not be what we want?
- Can you think of a way to modify our
search that would exclude those?
- On the other hand, can you think of some
places or things you might want to look
for that are not in the hits we've found
so far?