Digital Information: Gaps in Knowledge Understanding and Access

January 30, 2013

Stephen Arnold took a conversation he and a few of our colleagues had and wrote more about it in his Beyond Search blog. “Thoughts about Commercial Databases: 2013” is worth a review and I’ve added a few of my own thoughts here for your consideration.

Arnold sums up the conclusion of our discussion nicely: the digital future of information companies is gloomy. His post outlines a few familiar names in the world of libraries.

  • Ebsco Electronic Publishing (everything but the kitchen sink coverage)
  • Elsevier (scientific and technical with Fast Search in its background)
  • ProQuest (everything but the kitchen sink coverage plus Dialog)
  • Thomson Reuters (multiple disciplines, including financial real time info)
  • Wolters Kluwer (mostly legal and medical and a truckload of individual brands)

During our discussion the question was posed: how can database companies grow? The short answer is that there are no obvious growth patterns beyond acquiring other information publishers. That point caused one in the group to say that eventually the beasts will begin eating themselves out of hunger when there is no fresh meat. Amusing and yet frightening.

Arnold quotes “Why Acquisitions Fail: The Five Main Factors” by Pearson Education to explain the key reasons acquisitions result in problems rather than soaring success.

The fact that library budgets continue to shrink, open access continues to grow, and large database companies fail to adjust their business models for these realities causes deep concern for the researcher in me. As Arnold states:

The business model for these firms has been built on selling “must have” information to markets who need the information to do their job. The reason for the stress on this group of companies is that the traditional customers are strapped for cash or have lower cost alternatives.

Other concerns abound as well. As libraries continue to limit access to physical collections thanks to the value of library real estate, strain is placed on the serendipity of browsing researchers. Digital research presents its own challenges. It often leaves one feeling as though one has retrieved a good match, but is it really the best, and is it complete? When you add in that many of today’s students, even those training to be librarians, do not successfully distinguish between source and provider in the electronic age, the concerns for access, understanding and knowledge abound.

While Arnold concentrates on the outlook for commercial databases and even suggests that an acquisition by Google to monetize the content with ads could be a shift in the future of information publishing, there are other concerns to ponder. Curated content has a future, but what that future holds in terms of commercial versus open access has yet to be thought out in light of what Arnold suggests as the trend for commercial databases in 2013.

Those who think that public search companies are keeping the archive of all digital information are in for a rude awakening. Librarians and information professionals need to get beyond teaching people how to search. As professionals, we have a duty to understand the business pressures of our information suppliers, free or fee, and what those pressures do to the availability of yesterday’s information in today’s reality of right now access.

Information professionals must think about and prepare for the inevitability of lost information. The Wayback Machine may be expanding its database, but it is not archiving the complete history of companies that are no longer in business. Think about the number of start-ups that are no longer around. Who were the corporate officers? What was their credit history? The gaps in corporate information mean that there will be gaps in ongoing competitive intelligence.

This is a simple issue on the surface with unfolding complexities that warrant thought, planning and action. Just as the burning of the Timbuktu library means a loss of valuable information, so too do cost pressures and the lack of access and exposure to digital data.

Innovation on the commercial side seems nearly impossible. Curation and access on the public search side are limited by providers’ need to drive profit under their own business models. Open access is being challenged to the point where an advocate such as Aaron Swartz ends his own life. The Library of Congress is archiving Twitter when it might better serve the longevity of knowledge and information by archiving the “free” information on the World Wide Web.

Of course, the practical part of me, the part that understands that daily life grinds on no matter what, recognizes that this is a good intellectual argument. In the long run will this have a significant impact on daily life? Probably not. But when I think about the history of knowledge and culture, it gets my mind whirling. Businesses will do what businesses do, libraries will do what libraries do, and maybe, just maybe, the digital gaps won’t cause overwhelming repetition of mistakes.

Either way, it is fun to think and share and get input from intelligent colleagues.

Constance Ard, January 30, 2013



Big Data Meets Predictive Coding: Economic Impact To Be Determined

December 10, 2012

Litigation, especially for high profile companies, means big money. In the age of big data, litigation and eDiscovery mean big money too. So this Corporate Counsel article, “Change Is Coming: The Evolution of E-Discovery Economics,” caught my attention.

If you have ever done a Google keyword search, even a custom search, you realize that keyword search is not efficient at returning narrow, accurate, relevant results. In a deluge of data, when time and costs matter, keyword searching is probably not the most efficient method of reviewing and producing documents in response to a discovery request. However, the article does make the valid point that until recently keyword search was the best we had available.

Now we enter the age of predictive coding. This technology opens an entirely new view of the massive landscape of structured and unstructured data.

Predictive coding is software that is trained by a user to predict which documents in a document set will be responsive and which will be non-responsive. Predictive coding goes by many names, including computer-assisted review and technology-assisted review.

Predictive coding aims to reduce the number of documents reviewed by ranking the documents according to a calculated level of responsiveness. Instead of looking at every email written by a custodian over a three-year time period, predictive coding uses a number of factors including keywords, writing style, subject matter of the writing, and even punctuation style to determine the chain of documents that are most relevant to the matter. These underlying programmable algorithms vary between software brands.
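To make the ranking idea concrete, here is a minimal sketch in Python. It is my own illustration, not the method of Recommind or any other vendor: it assumes a simple word-count model with add-one smoothing, and the function names and sample documents are hypothetical. A reviewer codes a few documents as responsive or non-responsive, and the remaining documents are ranked by a calculated score.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(labeled_docs):
    """Count word frequencies for responsive and non-responsive examples.

    labeled_docs is a list of (text, is_responsive) pairs coded by a reviewer.
    """
    counts = {True: Counter(), False: Counter()}
    for text, is_responsive in labeled_docs:
        counts[is_responsive].update(tokenize(text))
    return counts


def responsiveness_score(text, counts):
    """Sum the smoothed log-odds of each known word; higher means more likely responsive."""
    vocab = set(counts[True]) | set(counts[False])
    total_pos = sum(counts[True].values()) + len(vocab)
    total_neg = sum(counts[False].values()) + len(vocab)
    score = 0.0
    for word in tokenize(text):
        if word not in vocab:
            continue  # words never seen in training carry no evidence either way
        p_pos = (counts[True][word] + 1) / total_pos
        p_neg = (counts[False][word] + 1) / total_neg
        score += math.log(p_pos / p_neg)
    return score


def rank(unreviewed, counts):
    """Order unreviewed documents from most to least likely responsive."""
    return sorted(unreviewed, key=lambda t: responsiveness_score(t, counts), reverse=True)


# Hypothetical training set coded by a reviewer.
training = [
    ("pricing agreement with the distributor contract terms", True),
    ("revised contract pricing schedule attached", True),
    ("lunch on friday anyone", False),
    ("office holiday party schedule", False),
]
counts = train(training)

unreviewed = ["holiday lunch menu", "distributor contract pricing update"]
ranked = rank(unreviewed, counts)
for doc in ranked:
    print(round(responsiveness_score(doc, counts), 2), doc)
```

Commercial tools weigh far more signals than word counts, as the article notes, but the workflow is the same: train on human-coded examples, score the rest, and review from the top of the ranked list.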

The discussion in the Corporate Counsel article is lengthy and worth reading carefully. Predictive coding is hot in the technology solutions landscape and Recommind was one of the first in the market space. However, as we all know, the world of law is a bit slow to embrace the newest technology. Someone else needs to test the water and find out if it is too hot, too cold, or maybe it is the Baby Bear solution to eDiscovery and offers the just right option. 

I think the massive amount of information that is being created, retained and therefore open for discovery needs more than predictive technology. I think active information management policies and procedures will be a key to the effective cost control of the evolving land of litigation.

Constance Ard, December 10, 2012


Lawyers Hold Ultimate Responsibility for eDiscovery Processes: Are You Prepared?

December 7, 2012

Sometimes risk management as related to eDiscovery takes on new levels of quality control. As I read a recent Metropolitan Corporate Counsel article, “Risk Management and Quality Control of E-Discovery Vendors,” the complexity of eDiscovery was driven home.

Understanding the capabilities, quality and costs related to using eDiscovery vendors is an important responsibility, because the attorney has ultimate responsibility.

This article offers some very good reminders about what should be asked, inspected and known about your vendors. It is more than a processing capability. Is the data truly secure? Is your vendor using other providers?

As the article points out:

The consequences are steep for failing to be fully engaged with e-discovery vendors. In J-M Mfg. Co., Inc. v. McDermott Will & Emery, No. BC 462 832 (Cal. App. Dep’t Super. Ct. L.A. Cnty. filed June 2, 2011), the defendant faced a legal malpractice suit when it allegedly did not carefully review the work of contract attorneys at an e-discovery vendor, resulting in the production of almost 4,000 privileged documents to the federal government in a whistleblower suit.

The need to be involved and to understand the capabilities of your vendors is important. Understanding in-house processes is just as important. The dimensions of where information is stored, how it is accessed and the management involved are critical components of knowing what, when, how and who related to eDiscovery processes.

Constance Ard, December 7, 2012


Information Management: The Lost Leg of EDRM

December 6, 2012

How big is eDiscovery? Really big! In a CMS Wire article, “Overview of e-Discovery Across the Microsoft Office Platform #spc12,” we see that Microsoft is pushing into the technology wolf pack that is circling the corporate eDiscovery wagon.

At a recent SharePoint Conference, author Mike Ferrera sat in on a session by Quentin Christensen, Program Manager at Microsoft, and learned just how Microsoft was addressing the eDiscovery landscape.

Quentin began the session by talking about the nuances of e-Discovery, specifically Identify and Preserve, Search and Process, Review and Produce, and how Microsoft plans to address these. The main driver behind Microsoft’s e-Discovery push is in-place hold, query and export, which will give the legal team the ability to create case sites very quickly without having to call IT.

Like many technology vendors that offer eDiscovery solutions, Microsoft is paying less attention to the front end of the EDRM, especially the information management portion.

A company that seeks to get the best bang for its information technology and information management investments will do well to create a data map and put in place governance measures that make the preserve, search and process work more cost-effective to manage.

Constance Ard, December 6, 2012


TextRadar Targets High Quality SharePoint Data

September 10, 2012

As I get more ingrained in the world of search technologies for enterprises, I am learning more and more about how to maximize investments and get good user results.

So when I ran across this Beyond Search post, “From SharePoint Semantics to Text Radar,” and read the introductory paragraph, I put it into my file of “read more closely” items.

SharePoint has been a go-to search solution for Microsoft users. The limits were recognized, but the availability and access were attractive. Then along came social content and the need to drive business decisions with access to content, and the inadequacies became more apparent and a little less acceptable.

The offering of a high quality “abstract” service that really dives into the solutions available to make SharePoint perform better is worth noting.

As the article states:

Organizations wanting to go “beyond SharePoint” have no easy way to keep track of the interrelated developments in these niches, said Stephen E Arnold, founder of Augmentext. “Our team of librarians and researchers will be looking for important articles and relating those articles’ key points to other companies, issues, and management factors. Technology is no longer enough to deal with the “beyond SharePoint” shift.”

The fact that Arnold and his team use librarians and researchers to track the world of search and analytics gives credence to the importance of high quality vetted information. If you are drowning in big data when what you really need is a way to pull out intelligent decision drivers, then TextRadar may be a source for you.

Constance Ard, September 10, 2012

Author’s Note: I am associated with Augmentext and ArnoldIT.


Competitive Business Edge Can Be Drawn from a Full View of Enterprise Search Options

August 22, 2012

I am happy to share the news that IDC has published another vendor profile in its open source search series. Last week IDC released “Polyspot: Unified Information Access Vendor Offers Flexibility and Performance.”

According to the Beyond Search write up, “IDC Publishes Vendor Profile”:

The new IDC open source search reports represent an important milestone in coverage of this disruptive sector of information retrieval.

It is exciting to be a part of this work. This is the second report in the series; the first, a profile of LucidWorks, was released earlier in August.

The research I conducted during the writing of these reports reiterates that traditional enterprise search is shifting. The services provided by open source vendors are robust, flexible and professional.

If you are seeking a full view of the enterprise search market that encompasses both commercial and open source options, the Arnold IT team stands ready. According to a statement by Stephen E. Arnold:

IDC has licensed ArnoldIT’s exclusive research about open source search and content processing. In addition to the profiles created for IDC, ArnoldIT offers an open source sector analysis which compares the functionality of open source search technology with that of proprietary search vendors.

The knowledge gained through the research done for these reports, coupled with Mr. Arnold’s ongoing tracking of the search market, offers a robust view of how to maximize the use of internal data for enterprise knowledge and a competitive business edge.

Constance Ard, August 22, 2012

Note: Constance Ard is a member of the Arnold IT team and a contributor to the IDC reports.



Open Search News Service Captures Evolving Industry News

April 20, 2012

I’m a little late to the news, but I’m excited to share that there is a resource collecting the latest information about open source search companies. This PRWeb news release, “Open Source Search News Service Debuts,” alerts us to a new service that collects and indexes valuable research for the open source industry.

In looking through the new service, Open Search News, it is obvious that the service is collecting information that hits the big themes of the developing open search industry. Stories discuss a variety of companies who are providing solutions that can be implemented within enterprises and that use cloud technologies to optimize search technology investments.

The stories also seem to highlight the features and functions of the system that really hit upon what we need to efficiently and effectively search big data. Multiple languages, scalability and even information about the developer community that builds these open source solutions are discussed.

This looks to be a useful tool to track the best of what’s happening in the industry. According to the press release:

The open search news information service provides news and analysis about the dynamic market for open source search. Vendors from IBM to Lucid Imagination are tapping into open source search systems, adding features and functions, and providing a lower cost, higher operating efficiency, findability solution.

Having information about features, functions and selection considerations alongside vendor data in one place makes it easy to research the available options.

Constance Ard, April 20, 2012

Note: Author is associated with Arnold IT


Choosing the Right eDiscovery Software Requires a Knowledgeable and Dedicated Team

April 10, 2012

“Reed Smith Brings Relativity In-House for e-Discovery Document Review” reveals a law firm’s forward-looking investment in eDiscovery.

Reed Smith has always been a leader in legal services, especially as related to information management and use practices. Thus their development in 2011 of the Records and eDiscovery Team (RED) is not a surprise. 

Being methodical and getting ahead of the curve in legal services seems to be something Reed Smith has always been good at. Thus their use of Relativity in-house just might be a clue to other firms who wish to jump on the fast-moving train of eDiscovery practice development.

The justification for their in-house software choice is shared in the article:

“In searching for the right e-discovery software, we found that Relativity maximizes efficiency during document review, ultimately reducing the cost of e-discovery for our clients,” said David Cohen, partner at Reed Smith and head of the RED Team. “We also wanted to bring in-house the same seamless experience that we’ve had with Relativity as a hosted solution, while continuing to customize the tool to our workflows. It’s a great addition for providing a high-quality level of service.”

Relativity may not be the best choice for all firms, and there are myriad choices available. Working with an eDiscovery team that understands the technology, the applications and the use of eDiscovery software is a necessity when seeking the right solution. It’s unclear what the mix of attorney, information professional (aka librarian) and IT team members was for the testing and final choice at Reed Smith. Answer Maven suggests that representation from each of these three areas is essential to making the best choice.

Constance Ard, April 10, 2012

