Intranet Journal
The online resource for intranet professionals


Working with Organizational Dark Data

Paul Chin

7/7/2005

How easy is it to manage your organization's content when you don't know where a lot of this content is? You have a sense — theoretically, at least — that there are vast, undiscovered caches of information scattered throughout the organization. What you don't know is how much of it there is, where it is, and who has it.

While companies laud the size, efficiency, and design of their intranet, we have to wonder how much of their content is really contained on the intranet. The fact of the matter is, visible intranet content only accounts for a small percentage of an organization's total knowledge assets. There's often a large unseen — and in some cases, unknown — portion of corporate content that never reaches the general user community. This is what's known as dark data.

What is Dark Data?

The name might sound menacing, like some black cloud that seeps into your office and ties your shoelaces to the leg of a desk. But dark data has much more value than its moniker affords. It borrows its name from the cosmological theory of dark matter, defined as "non-luminous matter not yet directly detected by astronomers that is hypothesized to exist because the visible matter in the universe is insufficient to account for various observed gravitational effects."

In plain English: You can't see it directly but you know something is out there because it's affecting the movement of other things. Quite vague, but that's the nature of dark matter.

Intranets gained their fame in the 1990s, but before that (and to this day for those without a formal content management system) much of an organization's core content was stored in all sorts of different media.

All of this information was managed (and I use the term "managed" very loosely) in a relatively informal manner. There was no single, centralized repository for all of this information. Employees had to either ask the "informationally privileged" for assistance or dig through large corporate file servers to retrieve what they were looking for. And because this information was so spread out, they would have to repeat this process several times at several locations with several people before completing their task.

When intranets emerged as a corporate content management tool, developers and content owners attempted to port and consolidate all of this dispersed content into a centralized environment that could be easily navigated by even the casual user. But how successful was this exercise? How much of this content truly made it onto the corporate intranet?

Like dark matter, no one truly knows. You can't port what you can't see. But this content does exist because — although you can't see it directly in your intranet — you can see its effects in many corporate efforts. Content most users have never seen is referenced during presentations, in conversations, and in e-mails. It's out there; it just hasn't been harnessed by the intranet.

The Effects of Dark Data on Organizations and Users

Dark data comes in all shapes and sizes, and some is more useful than others. But to fully understand the concept of dark data you need to be aware of its two main classes:

  1. Undiscovered: Content that's been lost within an organization. No one knows where it is, or if it even exists.

  2. Concealed: Content that's intentionally kept hidden by its owners for personal use or gain. Concealed dark data is often accessible by one person or a small group of users.

The exact amount of dark data within an organization will vary depending on the company and how long it has been operating. Organizations that have been in business for decades — before the advent of digital content management — will likely have more dark data than those that have only been in business for the last several years.

But the term dark data describes not only the hidden nature of this content within an organization; it also describes the state of users who rely on the intranet as their central information source. In other words, if they don't know this "invisible" content exists, they themselves are, in a sense, in the dark as well.

When organizations try to run a comprehensive intranet without making an effort to locate as much dark data as possible, they run the risk of duplicating both effort and content.

Without the availability and integration of dark data into an intranet, you might end up spending time and effort re-doing something that has already been done. The information that you're trying so hard to collect and process could very well already exist in a spreadsheet or database in another department. This leads not only to data duplication but also content inconsistencies.

Since the creator of an original piece of dark data and the creator of its redone counterpart are usually different people, the two versions, although they cover the same subject matter, may have slight variances. If the original dark data is ever discovered, intranet owners will be left scratching their heads wondering which is the more accurate. To make matters worse, if the dark data's originator is no longer with the company, there's no way to compare notes.
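When a rediscovered original and its redone counterpart are both available as text, a rough similarity score can help flag pairs that deserve a side-by-side review. What follows is only a minimal sketch in Python, assuming the content has already been extracted to plain text. The function names and the 0.8 threshold are illustrative assumptions, not a recommended setting.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Return a ratio between 0.0 and 1.0 describing how alike two texts are."""
    return SequenceMatcher(None, a, b).ratio()


def likely_duplicates(documents, threshold=0.8):
    """Compare every pair of documents and return the pairs that look alike.

    documents maps a name to its text. The threshold is an arbitrary
    starting point (an assumption) that would need tuning on real content.
    """
    pairs = []
    for (name_a, text_a), (name_b, text_b) in combinations(documents.items(), 2):
        score = similarity(text_a, text_b)
        if score >= threshold:
            pairs.append((name_a, name_b, round(score, 2)))
    return pairs
```

A score like this only points to candidates. Deciding which version is the more accurate still falls to the content owners.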

Processing Dark Data

While dark data is an invaluable addition to an intranet, it doesn't come without its challenges. Since dark data exists outside the corporate intranet, and often outside the knowledge of intranet owners, finding and porting this content to an intranet can be quite time-consuming.

There are two major issues associated with the consolidation of dark data into an intranet: discovery and integration.

Discovery
Discovery is the process of finding and aggregating all applicable dark data to be included in an organization's intranet. But this process isn't an exact science. And it can be further complicated by content owners who, for various reasons, aren't willing to share their pool of information. In these cases dark data isn't hidden by obscurity, but rather deliberately concealed.

Although organizations with large amounts of heterogeneous content types can install a search appliance such as Google's Mini (for small- to medium-sized businesses) or Search Appliance (for large businesses), it does nothing to address dark data that exists in the form of hardcopies or user knowledge that has no corporeal home.

In order to maximize your chances of uncovering useful dark data, you need to run an information audit (the issue of running information audits will be explored in greater detail in an upcoming article). But don't try to create a single group to perform this audit. Instead, have individual intranet content owners conduct the audit from within their department or workgroup. Those who know their content best should be the ones responsible for the audits.
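For the portion of an audit that covers shared file servers, a content owner could start with a simple inventory of what is actually sitting in a departmental folder. The following is a minimal sketch in Python, not a complete audit tool. The function names and the one-year "stale" cutoff are assumptions made for illustration, and it says nothing about hardcopies or knowledge that lives only in people's heads.

```python
import os
import time
from collections import Counter


def inventory_files(root):
    """Walk a folder tree and return one record per file found."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            records.append({
                "path": path,
                "extension": os.path.splitext(name)[1].lower(),
                "size_bytes": info.st_size,
                "modified": info.st_mtime,
            })
    return records


def count_by_extension(records):
    """Tally files by type, e.g. how many spreadsheets turned up."""
    return Counter(record["extension"] for record in records)


def stale_files(records, older_than_days=365, now=None):
    """List files untouched for longer than the cutoff (an assumed 365 days)."""
    now = time.time() if now is None else now
    cutoff = now - older_than_days * 86400
    return [r["path"] for r in records if r["modified"] < cutoff]
```

Running `inventory_files` against a department's share and reviewing the stale list gives each owner a starting point for deciding what is worth bringing onto the intranet.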

But the true key to dark data discovery lies in users' perception of the intranet. When users begin to see it as a productive, long-term business tool and not a flavor of the day, they will be more likely to share knowledge and bring some of this dark data to light. Without this cooperation your chances of uncovering dark data will be greatly diminished.

Integration
Once dark data has been discovered, your biggest decision will be how far to take the integration process. Will all of your heterogeneous content types be left in their native formats and simply linked to from your intranet, or will this content be converted to one standard Web-based format? There's a case to be made for both.

Leaving all dark data in its native format is certainly the quickest method, but you might be left with content inconsistencies in the long run. Intranets provide the entire user community with read-only content, much like Internet content. It's the content owners' responsibility to manage and update their intranet content. If dark data were to be left in its native format, say an Excel spreadsheet, users would most likely have to download the file from the intranet for local viewing. When this happens, there's a danger that the original file will be modified and re-circulated into the organization's information stream. The file could be changed and e-mailed from user to user, each making their own set of modifications, until there are dozens of copies. And when everyone is done with their copy, there will be dramatic differences between those files and the original "production" version sitting on the intranet.
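One way to spot copies that have drifted from the production version is to compare content hashes. The sketch below, in Python, is a minimal illustration under the assumption that the production file and the circulating copies can all be read from disk. The function names are hypothetical.

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def modified_copies(production_path, copy_paths):
    """Return the copies whose bytes differ from the production file."""
    reference = file_digest(production_path)
    return [path for path in copy_paths if file_digest(path) != reference]
```

A hash comparison reports any byte-level difference, which may include changes a reader would never notice. It shows that a copy has diverged, not what was changed or whether the change matters.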

Converting dark data to a standard intranet format is ideal but can become very effort-intensive, especially if the format is not consistent with intranet content standards. Dark data contained within databases, applications, or hardcopies can prove to be particularly difficult to convert. Dark data integration can include manually reformatting content, integrating applications with your intranet, and digitizing hardcopy documents via OCR (optical character recognition).

Closing Thoughts

The value of dark data isn't in question, but the trick is in finding it. It can be hidden in small corners of the company or long forgotten in some dusty old file server. And for all the value it can provide to the entire user community, dark data remains accessible only to a privileged minority, if even that.

Unfortunately, there's no quick fix or silver bullet. The actions required to find and integrate dark data into an intranet depend on the amount and complexity of this content. If you have a sense that you're in possession of large quantities of usable dark data, focus your attention on finding this content rather than re-inventing the wheel.

But however you decide to approach this process of discovery and integration, the most important consideration when dealing with dark data is not to allow it to turn into a black hole, never to be seen again.


Intranet Journal
Part of the EarthWeb Network

Managing Editor
Intranet Journal

Tom Dunlap
