
Archive for the ‘news’ Category

Ebook news roundup

Friday, February 5th, 2010

Apple’s new iPad will include an ebook store, iBooks, with reader software pre-installed. The device is larger than an iPod but smaller than a standard laptop, and can handle web pages, color graphics and video. Apple has done distribution deals with the world’s largest book publishers, and a number of academic publishers have contracted with Scrollmotion to create interactive textbooks for the new device. Interestingly, unlike Amazon, Apple will be using the industry-standard EPUB format for its files, which at least theoretically would make them more easily shareable with other devices. No word, however, on whether Apple will add copy protection to the files, which seems depressingly likely.

As expected, the Apple announcement completely overshadows the raft of new ebook devices announced at the Consumer Electronics Show in Las Vegas last month. Photo gallery from CNET.

The most recent deadline for comments on the Google Books settlement has passed. Almost everyone who objected to the first settlement deal objected to the revised deal on the same grounds, including the Department of Justice. Critics say the deal still gives Google unprecedented control of orphan works (those out of print and of uncertain ownership) and uses a legal settlement to significantly change copyright law. Many authors and their representatives also feel the deal should not assume authors consent simply because they have not objected. The next hearing is scheduled for February 18th.

Cleopatra's Pylon and Google's Greek

Friday, December 18th, 2009

(Image courtesy of Wikimedia Commons)

According to the Associated Press, Egyptian archaeologists have excavated a huge pylon thought to be from the entrance of the Temple of Isis in Alexandria. Actually, dredged would probably be a better word, as much of the ancient city is now beneath the harbor of the modern city. The temple was part of the palace complex of Cleopatra.
More from Tisch on underwater archaeology.

Google Transliteration
Google Labs has a tool, Google Transliteration, that makes it easier to write in non-Roman alphabets online, including in several of Google's own services. You type what a word sounds like in English, and it is automatically transliterated into Unicode Greek (alas, modern only), Sanskrit, Arabic, Hindi, Persian, Russian, or about a dozen other languages. You can also install a version for Windows on your machine. Early experiments are πάντα καλά.
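For the curious, here is a rough idea of what "type what it sounds like" involves. This is a toy sketch of my own, not how Google's tool works internally: a hypothetical table maps Latin letter sequences to modern Greek letters, longer matches such as "th" and "ps" win over single letters, and a word-final σ becomes ς. It ignores accents entirely, which is one reason a real tool needs much more than a lookup table.

```python
import re

# Hypothetical Latin-to-Greek table for illustration only; it is not
# Google Transliteration's actual mapping. Two-letter keys are tried first.
GREEK_MAP = {
    "th": "θ", "ch": "χ", "ps": "ψ", "ks": "ξ",
    "a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ",
    "h": "η", "i": "ι", "k": "κ", "l": "λ", "m": "μ", "n": "ν",
    "x": "ξ", "o": "ο", "p": "π", "r": "ρ", "s": "σ", "t": "τ",
    "y": "υ", "f": "φ", "w": "ω",
}


def transliterate(text: str) -> str:
    """Greedy longest-match transliteration of lowercase Latin text."""
    keys = sorted(GREEK_MAP, key=len, reverse=True)  # longest keys first
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(GREEK_MAP[key])
                i += len(key)
                break
        else:
            # Anything not in the table (spaces, punctuation) passes through.
            out.append(text[i])
            i += 1
    # Greek writes sigma as ς at the end of a word.
    return re.sub(r"σ\b", "ς", "".join(out))
```

So `transliterate("panta kala")` gives "παντα καλα": close to the real thing, but without the accents of πάντα καλά.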

Cormac McCarthy's typewriter and Jane Austen's Twitter account

Monday, December 7th, 2009

Cormac McCarthy’s typewriter was sold at auction last week for $254,500, according to the New York Times. A friend bought McCarthy a new one, the same style of Olivetti, for less than $20.

McCarthy’s works at Tisch.

Sarah Milstein at O’Reilly Radar comments on how often mail was delivered in 18th-century London: sometimes as many as six times a day. From a certain perspective, that makes Jane Austen’s letters more like email or even Twitter posts in immediacy and level of detail. Or at least it allows one to make the comparison.

Revised Google Books settlement

Monday, November 16th, 2009

Google and publishers submitted a revised settlement agreement to the court on Friday that addresses some of the concerns raised about the previous agreement. The best coverage, as usual, is on Danny Sullivan’s Search Engine Land. The Open Book Alliance, which includes Amazon and Microsoft as well as some library associations, opposed the previous settlement. It also opposes this one, calling it “sleight of hand,” a quote that has been recopied so many times in news stories this weekend that I almost didn’t include the link here out of sheer disgust. I recommend a look at Google’s summary of new developments.

The major news for scholarly purposes is that this version of the agreement includes only American, British, Canadian, and Australian publishers. This sidesteps problems raised by the EU, Germany, and other countries with different copyright arrangements. It also makes the resulting collection of books less useful for scholarly purposes.

Much of what I want has been published in the English-speaking world, but much of it has not, and the ability to stumble across something from (say) the Bulgarian Academy of Sciences is part of what makes a comprehensive digital collection valuable to me. Dan Clancy, engineering director of the project for Google, estimates that at least 50% of any given university library collection would be excluded.

On the other hand, I should arguably be happy that the new deal feels much less like an existential threat to libraries. I’m not, precisely, because I think it would replicate the situation we currently have, where the freely available tools appear comprehensive to users but actually aren’t. I frequently talk to students using Google Scholar who don’t realize that we already subscribe to much of what publishers there offer to sell them. Complexity is job security for me, but I’d much rather have services that are self-explanatory.

Free ebooks from University of Chicago Press

Monday, November 9th, 2009

The University of Chicago Press has begun a monthly ebook giveaway. This is interesting to me for a couple of reasons. First, it’s a major university press experimenting with both a new format and with using free ebooks as publicity. Oxford University Press and others have found that making it easy to search and browse an ebook increases sales; Cory Doctorow, a sci-fi novelist, actually posts freely distributable copies of his books online and finds it increases sales (Ebooks: Neither E Nor Books).

This month’s giveaway is Censorinus’ Birthday Book. I’m also happy to promote the work of Holt Parker, one of my professors at the University of Cincinnati, whose choice of projects has always been first-rate and slightly offbeat. The download is quick and painless, and a nice introduction to Adobe’s Digital Editions software, the first ebook reader software I ever saw that was pretty enough to make me want to spend time in it.

Tufts has the print version of this ebook if you’re inclined to compare and contrast. We also have Parker’s translation of the works of Olympia Morata, The Complete Writings of an Italian Heretic. More of Parker’s work can be found on WorldCat.

(via Ancient World Bloggers Group and an editor friend of mine)

Literary Twitter

Thursday, October 15th, 2009

One common response to Twitter is puzzlement. It’s in the news all the time now, influencing revolutions, political coverage, and celebrity news. And despite having had it explained to me several times, it made no sense to me as a concept until I realized that it’s not really a service or a social network; it’s a new form of publication that you can do almost anything with. With that in mind, here are two literary uses of Twitter: one recent, one I’ve been following for a while.

Collaborative Publishing

Neil Gaiman and the BBC are collaboratively writing an audiobook using Twitter (via Found History). Gaiman wrote the first line on October 13th, and Twitter users are writing the rest, one line at a time. When they get to about a thousand tweets, it will become a script for a BBC audio recording. Description from BBC Audiobooks America.
Here’s the thing: it’s actually not bad. It’s a fairy tale/sci fi/fantasy story about a princess with a missing heart, and it’s also more or less in Gaiman’s style. If you follow the Twitter stream, the story runs in reverse chronological order, i.e., with the most recent thing on top. There are also periodic links to summaries of the story so far.
(Covered in Publishers Weekly and Library Journal)

Dark Epigrams

For about a year the New York Review of Books has been posting a series of very short pieces Félix Fénéon wrote for Le Matin in 1906, drawn from its published edition, Novels in Three Lines. I found out about this through one of my favorite book blogs, if:book (run by the Institute for the Future of the Book). Their coverage here. Fénéon’s style falls somewhere between epigram, Zen painting, and News of the Weird. Here’s the post that got me reading:
“In a café on Rue Fontaine, Vautour, Lenoir, and Atanis exchanged a few bullets regarding their wives, who were not present.” (novelsin3lines)

Tisch on Twitter

You can follow TischLibrary on Twitter. The library Twitter account is something we just started: for now it’s mostly news, plus the occasional answer to a Frequently Asked Question. It’s currently moderated by the excellent Alex May with help from the Tisch Web Services team, of which yours truly is a member.

Google Books: Department of Justice and ReCaptcha

Monday, September 21st, 2009

Department of Justice Brief Filed on Google Books Settlement

Not surprisingly, there has been a lot of activity around the Google Books settlement as the filing deadlines approach. The US Department of Justice filed a brief (PDF) (link via Search Engine Land) with the court on Friday. DOJ objects to the settlement as proposed on several grounds, while recognizing the cultural significance of what Google is working on. The objections are: 1) the scope and value of the settlement are also matters of public interest, and may not be appropriately settled by a private lawsuit; really, Congress should be drafting legislation to do this; 2) the structure of the proposed Book Rights Registry may make it impossible for anyone else to compete in the new market; 3) not all interested parties may have been adequately notified of the settlement and of possible changes to their rights; 4) it would be more appropriate for Google to have rights-holders opt in to the agreement rather than the current arrangement, which assumes consent. More detailed summaries at the New York Times and Search Engine Land.

Significantly, DOJ is not rejecting the deal outright, and is apparently working with the parties to modify the agreement. A hearing is scheduled for October 7th.

Improving OCR

As an example of how quickly things can change, while I was composing my book-length post on problems with optical character recognition, Google was buying a company that has been working on that problem in a really innovative way. Captchas are the odd squiggly words websites make you type in order to screen out spam bots. ReCaptcha takes advantage of this common mechanism to proofread troublesome documents. Instead of one word, users are offered two. The first word is known to be correct; the second is a word from an online archive of texts that was flagged as questionable. If you get the first one right, your reading of the second one is likely to be right, too, and that data can be fed back to improve both the source text and the OCR software. The technology has already been used to improve the New York Times historical archive. Google announcement (via CNET). How ReCaptcha works.
I suspect it will be a while before Gothic-type German or Ancient Greek is prioritized. But since the ReCaptcha technology is designed to be installed on many different websites, there’s no reason that specialized web communities like H-Net or Voice of the Shuttle, for example, couldn’t install it and let their expert users contribute their expertise.
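To make the two-word trick concrete, here is a minimal sketch of the bookkeeping it implies. This is my own illustration, not ReCaptcha's code: the class, its method names, and the rule that a reading is accepted once a set number of users agree (the hypothetical `AGREEMENT_THRESHOLD`) are all assumptions. The one idea taken from the description above is that an answer about the unknown word only counts if the same person got the known word right.

```python
from collections import Counter, defaultdict

# Hypothetical: how many agreeing human readings we want before
# trusting a transcription of a questionable word.
AGREEMENT_THRESHOLD = 3


class WordVerifier:
    def __init__(self, threshold: int = AGREEMENT_THRESHOLD):
        self.threshold = threshold
        self.votes = defaultdict(Counter)  # unknown word id -> reading counts
        self.resolved = {}                 # unknown word id -> accepted reading

    def submit(self, control_answer: str, control_truth: str,
               unknown_id: str, unknown_answer: str) -> bool:
        """Record one user's answers; return True if the known word was right."""
        if control_answer.strip().lower() != control_truth.lower():
            # Failed the known word, so the other reading is not trusted either.
            return False
        reading = unknown_answer.strip().lower()
        self.votes[unknown_id][reading] += 1
        if (unknown_id not in self.resolved
                and self.votes[unknown_id][reading] >= self.threshold):
            self.resolved[unknown_id] = reading
        return True
```

A community like the ones above would, in this sketch, simply be a source of `submit` calls from people who can read Fraktur or Greek.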

The Book, Terms of Service

Thursday, September 10th, 2009

A nice thought experiment in what the licensing terms for a book would be, if it were spelled out in the terms used for computer software, music, and movies. (via


I. Privacy
What takes place in the exchange between your brain and the contents of The Book is your exclusive private concern. The Book will never download the contents of your brain, either whole or in part.

Google Books Practicalities (Part II)

Tuesday, September 8th, 2009

(Image from a Greek/Latin New Testament on Google Books)

Professionally speaking, I have mixed feelings about the Google Books project, for reasons I will try to explain below. It has the potential to completely change how research in the Humanities is done. I’m not given to hyperbole; I’m simply saying that a full-text searchable database of book-length texts the size of a research library (much less two dozen of them) has never existed, and there are all sorts of possible unintended consequences. Most of them are good for scholarship. The proposed settlement was a surprise (I had been placidly assuming that the lawsuits would go on forever), and on the day it was announced last fall I quite literally spent about forty-five minutes staring at my computer screen trying to figure out what the ramifications were. This series of posts contains some of my thoughts, and I’ll share more as I read more.

What Google Books Does Well

Full-text search. It provides searchable access to a vast quantity of published literature. Library catalogs are built to facilitate browsing. Books are described in general terms, and once you’re in the right spot on the shelf you’re surrounded by related materials. This is exactly what you need for some projects. But for other projects, ones where you’re looking for an obscure fact or name, or (and this is a library school classic) trying to identify the original version of a particular quote or phrase, it doesn’t work as well. Going that extra step requires you to use each book’s index and table of contents. It’s entirely possible to flip through an entire shelf of books in a few minutes if all you’re looking for is references to a person or idea…but standard indexing never covers every word or concept mentioned. A computerized index (ideally, at least; see below) does. A computer can search thousands or millions of items in the time it takes you to open one.
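For readers wondering what a "computerized index" of every word looks like, here is a minimal sketch of the standard idea, an inverted index: a table from each word to every page where it appears. It is an illustration under simplifying assumptions, not Google's system; the function names are my own, and a real index would also handle word positions, ranking, and spelling variants. Note that it can only find the words the OCR actually got right.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into runs of word characters."""
    return re.findall(r"\w+", text.lower())


def build_index(pages: dict[tuple[str, int], str]) -> dict[str, set]:
    """Map every word to the set of (title, page_number) pairs containing it."""
    index = defaultdict(set)
    for (title, page_no), text in pages.items():
        for word in tokenize(text):
            index[word].add((title, page_no))
    return index


def search(index: dict[str, set], query: str) -> set:
    """Return the pages that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

Given two hypothetical pages, one mentioning "Censorinus" and "birthdays" and the other "Censorinus" and "the calendar", a query for "censorinus calendar" returns only the second page, without anyone having opened either book.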

Scale. There are ebook collections which can be searched in bulk, but most built for scholarly purposes are fairly small. Even Early English Books Online, a massive collection, includes only about 125,000 items. Google is scanning the entire contents of dozens of research libraries worldwide: books, journals, everything. Current estimates are that ten million items have been scanned so far.

Scope. Research has been growing more and more interdisciplinary for years. In contrast, most ebook collections, like the 1700 titles in ACLS Humanities, are small and fairly narrowly scoped, while Google’s collection covers every subject those research libraries collect.

What Google Books Does Poorly

Again, I’ll refer you to the article by Geoff Nunberg I mentioned in my previous post. He eloquently makes the case for metadata, which is to say that it really does matter how well you can describe the book and relate it to others.

Typefaces and Languages
The search index is generated based on Optical Character Recognition (OCR) run on the PDF images of the texts produced by scanners. OCR works best on clean texts with modern typefaces. Old books fare less well. Books in foreign languages (especially non-Roman alphabets) fare less well. Old books in foreign languages are very, very hard to do. Here are a couple of extreme examples. Google gets credit for providing access to the raw text of the search index, which most traditional library vendors do not.

The process is similar in concept to what JSTOR and ACLS do, but because JSTOR (about 1500 journals) and ACLS (about 1700 books) cover vastly smaller collections of texts, their descriptive and cataloging data tends to be of much higher quality. From conversations with JSTOR reps at conferences, I know they are aware of the problem I am about to describe, but none of the database aggregators and vendors handles this kind of material particularly well. The only solution on a large scale is Project Gutenberg’s collaborative proofreading model.

From Google Books:
A Greek-Latin parallel text New Testament (1821)
what the page looks like
what the search engine sees
Comment: I love several things about this, while being really sympathetic to how hard it is to get a machine to make any sense of what’s going on here. The first is that, as is not uncommon, the introduction to this Greek-Latin parallel New Testament is in Latin. As is half the text. Yet the language that seems to have been used for OCR is Greek. So all the Latin gets processed into Greek gibberish (as is most of the Greek), and the division of the page is not noted by the computer interpreter at all.
Liddell and Scott Greek Dictionary (1848)
what the page looks like
what the search engine sees

Lewis and Short’s Elementary Latin Dictionary
what the page looks like
what the search engine sees

Comment: This is almost usable, though not completely reliable. Latin seems to work better, but italics and all the other tricks typesetters use to make a huge body of text legible to humans only confuse a computer.

Immanuel Kant’s Gesammelte Schriften, volume 5
what the page looks like
what the search engine sees

Comment: This is, to borrow Nunberg’s phrase, a complete train wreck. For practical purposes it’s not searchable…and it’s the canonical version of Kant’s complete works.

Things That Reassure Me I Will Continue To Have A Job

1) Optical Character Recognition cannot do everything yet. In particular, until it can reliably read a Latin text from the 19th century or earlier, much less anything in Greek or in German Gothic type, it is not yet a replacement for the tools traditionally in use.

2) Google does not know that volume five of Kant’s Collected Works is related to anything else in the eight-volume edition Tufts has. So if I wanted the Critique of Pure Reason, volume 3, Google assumes I will be able to find it by searching for it. Except see #1.

3) The settlement only applies to out-of-print books. Current editions will still only be searchable on whatever terms Google agrees to with publishers. The library will still be your most convenient free source.

And yet, like Nunberg, I am optimistic. There are explicit terms in the agreement which allow Google to make the data available for research and study. Once the huge collection of data exists it will be possible to make it better.
Questions or concerns about Google Books or the future of libraries? Ask in the comments.

Google Books Settlement (Part I)

Friday, September 4th, 2009

Today (September 4th, 2009) is the last day for authors to opt out of the proposed class action settlement between Google and publishers concerning the Google Books project. Since the settlement was announced last fall there has been a blizzard of press coverage and arguments for and against the settlement. Part I is an introduction to the project, the players, and what’s under discussion.

The Story So Far
Starting in 2004, Google, in cooperation with several libraries worldwide, including Stanford, the University of Michigan, Oxford, Harvard, and the New York Public Library, began the largest digitization project in history. The goal was to comprehensively scan and make available online the collections of these major research institutions. Google claimed that “fair use” allowed it to make copies of the millions of works involved as long as it displayed no more than brief excerpts online. Publishers disagreed, claiming that the act of making the copies was itself copyright infringement on a massive scale, and sued Google in 2005. While the case was pending, a number of other libraries joined the project, including the Bibliothèque Nationale, the Bavarian State Library, and the National Library of Catalonia. A proposed settlement was announced last fall, with a time limit for objections set by the court overseeing the case.

The Settlement
The settlement would set up a Book Rights Registry to manage royalties from the sale of digital books and advertising. Current publications and pre-1923 publications would be handled the same way they are now. A major problem with past digitization efforts has been determining who owns the copyright to large numbers of “orphan works”, those whose copyright holders cannot easily (or at all) be identified. The settlement would hold royalties in trust where copyright is unclear, and therefore provide an incentive for publishers to claim their orphan works.
Objections to the settlement largely center on monopoly power, pricing, and privacy. Robert Darnton of Harvard started the discussion of serious objections to the settlement in an article in the New York Review of Books in February. While the settlement is not an exclusive one, the path Google has taken is not one open to many others. Anyone wanting to replicate the Google Books project would have to begin scanning books, get sued by every author and publisher on the planet, and come to a settlement. Google could also abuse its monopoly position by raising prices to whatever level it wanted. Objections by the Internet Archive and Amazon say the way to do this is to change the law to make it possible, not use a court case to completely change how copyright law works to the benefit of one company. The FTC and the American Library Association have expressed concerns about the privacy implications of such a large digital library.

A second set of concerns, not related to the legalities, are concerns about the quality of the data produced by Google’s massive and rapid scanning (almost 10 million books in five years). A recent article in the Chronicle of Higher Education by Geoff Nunberg of UC-Berkeley points out a range of problems which need to be addressed to make the Google library useful for scholarship.

Additional overviews
Tome Raider (brief). By the Economist. Good coverage of the European angle, which is important but not being discussed in US publications as much, because the settlement would apply to US publishers only.
Google’s Moon Shot (lengthy). By Jeffrey Toobin, in the New Yorker.