Working with Organizational Dark Data
Paul Chin
(post paulchinonline.com)
7/7/2005
How easy is it to manage your organization's content when you don't know where a lot of this content is? You have a sense — theoretically, at least — that there are vast, undiscovered caches of information scattered throughout the organization. What you don't know is how much of it there is, where it is, and who has it.
While companies laud the size, efficiency, and design of their intranets, we have to wonder how much of their content actually lives there. Visible intranet content accounts for only a small percentage of an organization's total knowledge assets. There's often a large unseen (and in some cases, unknown) portion of corporate content that never reaches the general user community. This is what's known as dark data.
What is Dark Data?
The name might sound menacing, like some black cloud that seeps into your office and ties your shoelaces to the leg of a desk. But dark data has much more value than its moniker suggests. It borrows its name from the cosmological theory of dark matter, defined as "non-luminous matter not yet directly detected by astronomers that is hypothesized to exist because the visible matter in the universe is insufficient to account for various observed gravitational effects."
In plain English: You can't see it directly but you know something is out there because it's affecting the movement of other things. Quite vague, but that's the nature of dark matter.
Intranets gained their fame in the 1990s, but before that (and to this day for those without a formal content management system) much of an organization's core content was stored in all sorts of different media:
- word processor documents
- spreadsheets
- PDF files
- structured databases
- proprietary applications
- hard copy documents stored in filing cabinets
- employees' e-mail messages
- discussion groups and blogs
- inside employees' heads
All of this information was managed (and I use the term "managed" very loosely) in a relatively informal manner. There was no single, centralized repository for it. Employees had either to ask the "informationally privileged" for assistance or to dig through large corporate file servers to retrieve what they were looking for. And because this information was so spread out, they would have to repeat this process several times, at several locations, with several people, before completing their task.
When intranets emerged as a corporate content management tool, developers and content owners attempted to port and consolidate all of this dispersed content into a centralized environment that could be easily navigated by even the casual user. But how successful was this exercise? How much of this content truly made it onto the corporate intranet?
Like dark matter, no one truly knows. You can't port what you can't see. But this content does exist because — although you can't see it directly in your intranet — you can see its effects in many corporate efforts. Content most users have never seen is referenced during presentations, in conversations, and in e-mails. It's out there; it just hasn't been harnessed by the intranet.
The Effects of Dark Data on Organizations and Users
Dark data comes in all shapes and sizes, and some of it is more useful than the rest. But to fully understand the concept of dark data you need to be aware of its two main classes:
- Undiscovered: Content that's been lost within an organization. No one knows where it is, or if it even exists.
- Concealed: Content that's intentionally kept hidden by its owners for personal use or gain. Concealed dark data is often accessible to only one person or a small group of users.
The exact amount of dark data within an organization will vary depending on the company and how long it has been operating. Organizations that have been in business for decades — before the advent of digital content management — will likely have more dark data than those that have only been in business for the last several years.
But the term dark data describes not only the hidden nature of this content within an organization; it also describes the state of users who rely on the intranet as their central information source. If they don't know that this "invisible" content exists, they themselves are, in a sense, in the dark as well.
When organizations try to run a comprehensive intranet without making an effort to locate as much dark data as possible, they run the risk of duplicating both effort and content.
When dark data isn't available and integrated into the intranet, you might end up spending time and effort re-doing something that has already been done. The information that you're trying so hard to collect and process could very well already exist in a spreadsheet or database in another department. This leads not only to data duplication but also to content inconsistencies.
Since the creator of an original piece of dark data and the creator of the content redux are usually different people, the two versions, although they reference the same subject matter, may differ slightly. If the original dark data is ever discovered, intranet owners will be left scratching their heads, wondering which version is more accurate. And to make matters worse, if the dark data's originator is no longer with the company, there's no way to compare notes.
Page 2: Processing Dark Data...