Google opens up 1 million public domain books

Posted on Aug 26, 2009

It's fairly common knowledge that Google has been scanning (actually photographing) entire books, converting the images to text with OCR, and storing that text in a searchable database for the Google Books project. About a year ago, Google claimed it had already scanned over 7 million titles at a cost of over $5 million. (Note: Google has taken steps to make it difficult or impossible to download or print portions of copyrighted material.) Today they announced that users can download a million titles that are in the public domain, that is, books for which no copyright is in force.

They will be offering these titles in EPUB and PDF formats. I'm not familiar with EPUB, but it appears to be an open format that lets the text "flow" to fit whatever screen size you have. As a longtime user of Plucker on the Palm for ebook reading, I'm hoping it's at least as cool as that open source project. This is a very exciting day! The only downside is that the texts are just OCR'd scans and haven't been cleaned up at all. Hopefully folks like those who participate in Project Gutenberg will lend a hand to help get them fixed up.