A Song, A Coke and Scanning the World’s Knowledge

I despise duplication of effort.  Therefore, when it came time to begin the planning for the digitization phase of a current project I did a bit of research.

Apparently, Google is not the only one that has been out there digitizing library holdings.  Granted, Project Gutenberg and the Internet Archive seem to be concentrating on public domain titles (pre-1922).  Still, it kind of makes one wonder: why duplicate the effort?

I’d say let’s give the world a coke and a scanner and see if we can’t make our cultural collection, a new Alexandria, available through collaboration.  Or is that already happening?  It seems chaotic at best to the uninitiated.

So as I try to make sense of this in my own world I thought I’d let you grab your own coke, sing your own song of harmony and I’ll give you a list of current digitization projects that I have discovered.

Author’s Note: I do not purport this list to be a complete and comprehensive list of ongoing projects.  These are projects I have stumbled upon while seeking information for another purpose.

Scanning without a coke.

The Big Ones

  • Project Gutenberg
  • Internet Archive
  • Google
  • Microsoft’s Live project has been discontinued, but the as-is items are still out there for your use and discovery.

Smaller Library Consortia projects with a scope of books and more.

At this point I feel like I’m barely scraping the surface of what is going on in the digitization of libraries.

What is apparent is that digitization is happening and will continue to happen.  Google is willing to make a private investment that will benefit the public good.  Libraries struggle with the funding and human resources necessary to complete a digitization project.  I think I might be ready to let Google provide the coke and scanning robot while we sit back and enjoy the song.

It would be nice to avoid duplication of effort in the digitization projects, but that is as impossible as gathering every publication ever printed into a single physical space.  There are benefits to consortia efforts, and even to special libraries with very niche collections that take on their own projects.

It would be nice, though, to take comfort in the knowledge that one of the big guys is gonna do all the really important stuff.  Of course, then you get into the definition of what is important, and that’s way too librarian for me to debate here.

I’ll settle for some best practices that help all involved in these projects make the best of their resources, both fiscal and human, so that the greater good of access is served well.  Perhaps by the end of my own work I’ll even be able to define some of those best practices.

Constance Ard February 17, 2010

2 Responses to A Song, A Coke and Scanning the World’s Knowledge

  1. satansparakeet says:

    Digitization is a big, ugly beast with many heads. The weirdest part about it is that the process changes so much depending on the types of books you’re digitizing and the sort of product you expect to come out at the end.

    Speaking as someone who has worked on the ground floor of a book digitization company, I can say that the quality that a company like Google produces while attempting to digitize as much as possible, as quickly as possible simply can’t be very good. Is it good enough? Probably, in most cases, for most members of the public, it is. Is it going to be a little sketchy to use for the purposes of any intensive academic studies of the contents of historic publications? You better believe it will be!

    This is a discussion that has been going on in the library literature since the beginning of the idea of digitizing historic collections, so I won’t go much deeper into that discussion in this brief comment. The point I want to make, though, is that not all digitization efforts are equivalent, so you need to be careful throwing around the idea of duplication of effort. Google Books is doing something quite different from what the Kentuckiana Digital Library is doing. Even if Google decided to start scanning the Daily Racing Form, the results would look and feel quite distinct.

    The underlying purpose has a large role in determining the product that is produced by a digitization effort. Scanning books is scanning books is scanning books to the casual observer, but the results can be quite different. I’m struggling to think of good illustrative examples, which is one of the reasons why the difference is not always obvious, but having seen 20th century novels scanned where every use of “modern” becomes “modem” in the searchable text, I can tell you there is a difference. A similar example that I have heard from those who digitize newspapers is the decision about how to distinguish between advertisements and articles in the searchable text. For an operation like Google Books those are problems and questions that simply never come up. For scholars and researchers this kind of stuff can matter a lot, while for most people it is not a problem.

    • answermaven says:

      Alex,

      Thank you for your very thorough and thoughtful comment. Certainly the efforts at all levels of digitization are distinguished by the audience for whom the material is aimed and the underlying purpose of the effort. The human and fiscal resources available for every project also add another layer into this very complex undertaking. And that just scratches the surface of a beginning overview of digitization efforts.

      As I continue to scratch I’ll indulge and share more thoughts and I look forward to seeing other comments and insights in this space.
