No, it has nothing to do with dead animals. Taxonomy, the work of describing
and categorizing content, can take the greatest share of development time.
See Beyond Spin for the story of portal development at Silicon Graphics,
one of the earliest adopters of the technology. Every institution has a shared
vocabulary that reflects its corporate culture and its view of the world, with
terms for its internal units, partners, customers, and competitors.
Finding and codifying all the terms, acronyms, and abbreviations can be
quite a chore. Start by collecting any glossaries, telephone directories,
organization charts, and annual reports, and use them to piece together a
picture of the business environment. There may also be standard industry
nomenclature or traditional subject indexes covering the field.
You may have to negotiate usage. Your advertising department thinks
"advertising" means its own product; marketing thinks "advertising" is an
industry it may target. Separate terms such as "Corporate Advertising" and
"Advertising Industry" may solve the problem. Good software can distinguish
between the two concepts by examining the other words in the document.
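As a rough illustration of how context can separate the two senses, here is a minimal sketch that counts cue words around an ambiguous term. The cue lists, the function names, and the tie rule are hypothetical, invented for this example; commercial products use their own, usually more sophisticated, linguistic or statistical methods.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical cue words for each sense of "advertising". A real taxonomy
# would draw these from the organization's own glossaries and documents.
SENSE_CUES = {
    "Corporate Advertising": {"brand", "campaign", "slogan", "budget", "creative"},
    "Advertising Industry": {"agency", "agencies", "industry", "sector", "prospects"},
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def disambiguate(text: str, sense_cues: dict[str, set[str]] = SENSE_CUES) -> str | None:
    """Pick the sense whose cue words appear most often in the text.

    Returns None when no cue word appears or the top two senses tie,
    so an editor can review the document by hand.
    """
    counts = Counter(tokenize(text))
    scores = {sense: sum(counts[word] for word in cues) for sense, cues in sense_cues.items()}
    ranked = sorted(scores.values(), reverse=True)
    top = ranked[0] if ranked else 0
    second = ranked[1] if len(ranked) > 1 else 0
    if top == 0 or top == second:
        return None  # not enough evidence either way
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(disambiguate("Our new brand campaign will double the advertising budget."))
    print(disambiguate("Agencies in the advertising industry are prospects for our software."))
```

The first call picks "Corporate Advertising" and the second "Advertising Industry". Returning None on a tie is a deliberate choice: an uncertain document goes to a person rather than into the wrong category.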
Is your software categorizing documents or concepts? Both can be useful,
but it is important to know the granularity of classification. If it
summarizes each document under a few categories, you can miss a lot; if it
indexes every concept, a search on a common term can overwhelm you with results.
How does the software work? "A proprietary algorithm" is not an answer.
If the vendor can't tell you more, you shouldn't be doing business with them.
Does it really handle your documents? If the vendor won't crawl
a sample collection and let you thoroughly examine the results, don't
buy it.
Support is critical. Taxonomy is a process; you don't do it once and
forget it. There will always be new products and new people.
Test, test, test. You have to check the output continually. Classification
is far too complicated for a program to hum along unattended, continually
performing at the level you require. You have to be the power user of your
site. Ignore something this important and it will fail.
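One simple way to check the output is to have an editor hand-label a sample of documents and compare the software's categories against those judgments. The sketch below computes precision (the share of the software's assignments that were right) and recall (the share of correct assignments it found) for each category, then flags weak ones. The document IDs, categories, and the 0.8 threshold are hypothetical values chosen for illustration.

```python
from __future__ import annotations


def precision_recall(assigned: dict[str, set[str]], reviewed: dict[str, set[str]],
                     category: str) -> tuple[float, float]:
    """Score one category against an editor's judgments.

    assigned: document id -> categories the software chose
    reviewed: document id -> categories the editor says are correct
    Only documents in `reviewed` (the hand-checked sample) are scored.
    """
    tp = fp = fn = 0
    for doc_id, correct in reviewed.items():
        predicted = category in assigned.get(doc_id, set())
        actual = category in correct
        if predicted and actual:
            tp += 1  # software and editor agree
        elif predicted:
            fp += 1  # software assigned a category the editor rejected
        elif actual:
            fn += 1  # software missed a category the editor expected
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def flag_weak_categories(assigned: dict[str, set[str]], reviewed: dict[str, set[str]],
                         threshold: float = 0.8) -> list[str]:
    """List categories whose precision or recall falls below the threshold."""
    categories: set[str] = set()
    for doc_id, correct in reviewed.items():
        categories |= correct
        categories |= assigned.get(doc_id, set())
    weak = []
    for category in sorted(categories):
        precision, recall = precision_recall(assigned, reviewed, category)
        if precision < threshold or recall < threshold:
            weak.append(category)
    return weak


# Hypothetical sample: what the software assigned vs. what an editor approved.
ASSIGNED = {
    "doc1": {"Corporate Advertising"},
    "doc2": {"Corporate Advertising"},
    "doc3": {"Advertising Industry"},
    "doc4": set(),
}
REVIEWED = {
    "doc1": {"Corporate Advertising"},
    "doc2": {"Advertising Industry"},
    "doc3": {"Advertising Industry"},
    "doc4": {"Corporate Advertising"},
}

if __name__ == "__main__":
    for name in flag_weak_categories(ASSIGNED, REVIEWED):
        print("needs attention:", name, precision_recall(ASSIGNED, REVIEWED, name))
```

Rerunning a check like this on a fresh sample after each crawl or taxonomy change shows whether new products and terms are being classified as well as the old ones.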
Be a customer, not a victim. Some vendors price taxonomy as a service
rather than as a major purchase with hazy upgrade costs. They have to keep
you satisfied as a customer; you are not an investor who can't afford to
change programs.
Conclusions
No effort is wasted in portal development. The only mistake is waiting
for the total or "perfect" solution. Flexibility, portability, open architecture,
and commitment to emerging standards are the most important considerations.
TREC: Text Retrieval Conference, sponsored by the National Institute of
Standards and Technology (NIST) and the Defense Advanced Research Projects
Agency (DARPA).