Feature
Improving Information Retrieval with Human Indexing
Special to Intranet Design
By Kevin Broccoli, Broccoli Information Management
As company intranets grow in
content, it becomes increasingly difficult to find the exact information
that an intranet user may be looking for. Companies have traditionally
used search engines to locate information on their intranets. However,
many are finding that search engines (even the newer, so-called "intelligent"
ones) are just not enough.
For example, perhaps you are looking up information on a particular
subject. You type the word into the search engine interface and click
"GO!" Within seconds you have a list of retrieved documents.
But there are 87 of them! And there is little indication as to which
document might be the one that you need. You have a choice of clicking
on some of the entries in the hope that the needed information will
be within the first few documents, or spending literally hours combing
through each one of them.
Why is it so difficult to find what you need?
Intelligent - Not!
One reason is the manner in which search engines operate. Generally,
a search engine looks for every occurrence of the word that was
typed into the search interface. It then lists each and every
document containing that word. However, in some of those documents
the topic may be mentioned only in passing, with no information of
real value.
Also, you may be searching for more specific information regarding
the topic, but not be sure how to narrow your search. Or perhaps the
documents use certain words or phrases within the text, and although
you are typing in synonymous terms, they are not the exact terms
needed. Or perhaps a word is simply misspelled.
A search engine, like other computer automata, can't allow for such
errors.
For example, if you work in an insurance company, you may be looking
for information regarding "theft." Some of the documents use
this precise word, so the search engine grabs those pages. But it does
not retrieve any of the pages using the term "robbery" or
"thievery." You may not even understand why the search engine
retrieved certain documents. In many instances, only the title of the
document is listed, which doesn't tell you much.
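To make the problem concrete, here is a minimal sketch of literal word matching in Python. It is not how any particular search product works; the three documents, their titles, and the function name literal_search are invented for illustration.

```python
import re

# Hypothetical mini-intranet: titles and text are made up for this example.
DOCUMENTS = {
    "Claims Handbook": "How to file a claim after a theft at an insured property.",
    "Office Newsletter": "The picnic was fun. Reminder: report any theft of office supplies.",
    "Policy Definitions": "Robbery means taking property by force or threat.",
}

def literal_search(query, documents):
    """Return the titles of documents that contain the exact query word."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    return [title for title, text in documents.items() if pattern.search(text)]

hits = literal_search("theft", DOCUMENTS)
print(hits)  # ['Claims Handbook', 'Office Newsletter']
```

The newsletter is retrieved because it mentions the word in passing, while the definitions page is missed because it says "robbery" instead. And all the user sees is a list of titles.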
One way of improving the relevance of search results is to look for
keywords that can be inserted as "metadata" within the pages
of the intranet. This is one of the promises of Extensible Markup Language
(XML), which (among other things) lets authors tag pages precisely so
users can more easily find them.
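As a rough sketch of the idea, the Python below reads keywords out of a tagged page and matches a query against them rather than against the body text. The page, meta, and keyword element names are assumptions made up for this example, not part of any XML standard or product.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged page; the element names are invented for illustration.
PAGE = """<page>
  <meta>
    <keyword>theft</keyword>
    <keyword>robbery</keyword>
  </meta>
  <body>Coverage for property taken by force or threat.</body>
</page>"""

def page_keywords(xml_text):
    """Collect the author-supplied keywords from a page, lowercased."""
    root = ET.fromstring(xml_text)
    return {el.text.strip().lower() for el in root.findall("./meta/keyword") if el.text}

def matches(query, xml_text):
    """True if the query is one of the page's tagged keywords."""
    return query.lower() in page_keywords(xml_text)

print(matches("theft", PAGE))
```

Here the page is found under "theft" even though that word never appears in its body, but only because the author took the trouble to tag it.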
But metadata is no panacea. For one thing, the user may still be unsure
how to narrow a search, resulting in an overabundance of irrelevant
hits. Moreover, word processing tools like Microsoft Word have long
given authors the ability to add metadata to documents. Yet how many
times have you filled in those Summary Info fields? Any information
retrieval scheme that relies on people to categorize their ideas will
at best be limited, and at worst may interfere with the creation of
intellectual capital.
Indexes and Outlines
What, then, is the solution to the above-mentioned problems? Simply put,
an index with main headings and subcategories. Users can narrow their
search immediately by choosing from the available subcategories.
The interface of an index is also quite familiar to everyone, having been
employed in the back of reference and trade books practically since the
beginning of print.
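One way to picture such an index is as a small lookup table, sketched below in Python. The headings, subcategories, and document titles are hypothetical, and the "see" references that point synonyms to a main heading are an assumption borrowed from back-of-the-book indexes rather than something prescribed here.

```python
# Hypothetical hand-built index: main heading -> subcategory -> documents.
INDEX = {
    "theft": {
        "claims, filing": ["Claims Handbook"],
        "coverage limits": ["Policy Definitions"],
    },
}

# Assumed "see" references that send synonyms to the main heading.
SEE = {"robbery": "theft", "thievery": "theft"}

def lookup(term, index=INDEX, see=SEE):
    """Resolve a term to its main heading and return that heading's subcategories."""
    heading = term.lower()
    heading = see.get(heading, heading)
    return heading, index.get(heading, {})

heading, subcategories = lookup("robbery")
for sub, docs in sorted(subcategories.items()):
    print(f"{heading}: {sub} -> {', '.join(docs)}")
```

A user who types "robbery" is led to the "theft" heading and can pick the subcategory that fits, instead of wading through every page that happens to contain the word.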
Even in online help for software programs, there is always a table
of contents and an index. Software producers would never consider having
ONLY a search engine available to find information within their help
files. Instead, they know that users need to find relevant information
right away, and that they will need better navigation cues than full-text
searching can provide.