Google wants to index all knowledge, and it thought that scanning a few
tens of millions of books might be a good addition to the compendium of
billions of Web pages, PDFs, and Word documents it already offers. The
only trouble? Most of the books it wanted to scan are still under
copyright protection. This caused the Association of American
Publishers (AAP), the Authors Guild, and other organizations to gnash
their teeth - and file lawsuits.
Last week, Google and a host of these complainants agreed to a settlement that a court must still approve
[1]. Google will contribute piles of cash - $125 million - to settle
outstanding issues and fund a new copyright clearinghouse that will
enable authors and publishers to receive funds for online viewing of
works.
The settlement also clears the way for far greater access to orphaned works:
books (and other material) that remain protected by copyright but are
out of print or out of production, whose rights holders are nowhere to
be found, and which are largely unavailable even through lending
libraries.
Unlike the outcome of many lawsuits about copyright and access, this
settlement could be a big win for authors, publishers, readers, and
libraries. Could such a thing be possible?
(Full disclosure: I am a member of the Authors Guild. Although I did
not support the particular form of the Authors Guild lawsuit, neither
did I cancel my membership as a result of the legal action.)
[Editors' disclosure: With our Take Control hats on, we've worked
with Google Book Search for years, and it pains me to say that the
experience has been nothing but frustrating, with literally months of
delay between uploading a fully searchable PDF - no need to scan
anything - and having it posted. Plus, although Google's support people
responded quickly to our queries, they were universally useless at
addressing any complaints, such as posting delays or the existence of
guaranteed broken links to Amazon.com for our titles, given the fact
that Amazon doesn't resell our ebooks. I certainly hope that the
settlement will mean increased exposure for Google Book Search and our
content, and additional sales. -Adam]
The Backstory -- After a couple of years of prep work, Google announced in 2004 its Google Print program [2], later renamed Google Book Search, as well as its Library Project, the controversial part.
Google started partnering with major publishers first, followed by
smaller houses - a total of 20,000 so far - to make their books
available in some form online.
Google's bigger objective was to partner with major academic
libraries around the world, scan books using high-speed techniques it
had invented, and use optical character recognition (OCR) technology to
turn the scans into searchable text.
Google Book Search made it possible for anyone to search the
contents of any scanned book and, depending on the copyright status of
the book and other factors, view or even download some or all pages.
(Microsoft started two similar programs which avoided many copyright
issues, but the company shut those projects down [3] in May 2008.)
This behavior rankled many, because Google claimed the right to scan
copyright-protected books on the grounds that the company wasn't per se
distributing the books, even though it had full digital copies. Google
maintained - in a rough approximation - that because it was working
under contract with libraries that owned physical copies of the books,
making archival digital copies was perfectly legitimate, as was
turning the copyrighted works into text and images that weren't
revealed in whole on the Web.
The various parties aligned against Google disagreed, and filed suit in 2005.
The Variety of Works under Discussion --
Part of what publishers and the Authors Guild found problematic, and
part of how the settlement was designed, centers on separating works
into three categories: public domain, in copyright/out of print, and
in copyright/in print.
- Public domain works are no longer covered by copyright, and may
be used in essentially any form and any fashion. Many publishers,
notably Dover [4], reprint
public-domain works in various forms and compendiums. Copyright holders
can also release all rights on works they control, placing a creation
in the public domain. Google Book Search makes the full text available,
including for download.
- Books that are in copyright, but out of print, are often called orphan works
when the owner of the rights can't be found or there's no clear owner,
as when a writer dies without an estate; there are also plenty of books
that writers and publishers can't find a way to get back into print or
wouldn't consider bringing back into print due to low sales or other
complexities. This broad category covers books that are no longer
stocked or available from the commercial book trade, although sometimes
individual authors buy remainders from a publisher - the last in-stock
copies that the publisher intended to turn into pulp - and sell them
through hard effort. The copyright for out-of-print books may be owned
by a living person or his or her estate, by a trust, by a publisher, or
by a company; or it may be entirely unclear who (if anyone) owns the
copyright, which is likely the case for many works created before the
1980s. Out-of-print works make writers cry, because their hard-wrung
prose - fiction or non-fiction or reference - is unavailable, even if
the market desires it, because the economics of print publishing have
until recently put their children in the gutter. Google Book Search
makes the full text searchable, with snippets of context presented.
- Active books are in copyright and in print. Books that are actively
sold by publishers through booksellers or directly, even if they're 30,
40, or 70 years old, fit in this category. Publishers often refer to
their frontlist, books that are relatively new and actively promoted,
and their backlist, titles still in stock and available, and which may
even sell well, but which aren't promoted. The same searching and
snippet results are allowed as with out-of-print titles. (By the way,
Amazon's special-order books program, launched at the same time as the
bookseller's overall store in 1995, was the first simple way to obtain
in-print books that weren't routinely stocked by either bookstores or
book distributors. Prior to Amazon, special-order books required time
and effort on the part of a bookseller, and were often regarded as a
giant pain to fulfill.)
These three categories raise the question: what's covered under copyright, anyway? I'm glad you asked.
Copyright's Increasing Longevity --
Copyright law in the United States has been tweaked quite a bit since
the right was granted in the Constitution, and because of this, there's
quite a bit of complexity involved. The U.S. Copyright Office has a brief explanation [5], as well as a more extended discussion of terms [6].
Let me try to boil the discussion down for published works copyrighted in the United States:
- Everything copyrighted - registered with the Copyright Office - before 1922 is in the public domain.
- Nearly everything registered as under copyright starting in 1922
was under copyright initially for a term of 28 years, which could be
renewed on the 28th anniversary through the Copyright Office for
another 28 years.
- Works registered starting 01-Jan-50 are grandfathered through a
variety of rules to extend their copyright with no renewal being
required. There are a lot of niceties involved, but this is the general
rule.
- Any work copyrighted from 01-Jan-78 on is under copyright
protection the moment it's created for the author's life plus 70 years,
or for 95 years from publication for works owned by a company -
so-called "work for hire," in which a work was created by a statutorily
defined employee of a firm or institution, or for which copyright has
been transferred by the individual or people involved to a company. No
registration is required, but it ensures both proof of ownership
and the maximum statutory damages (treble!) for successful proof
of violation. (Before the Sonny Bono Copyright Term Extension Act of 1998
[7], the duration was 50 years following death or 75 years for works
for hire. The act was also pejoratively known as the Mickey Mouse
Protection Act, because Mickey's appearance in Steamboat Willie would
have entered the public domain in 2000.)
A lot more explanation, which I'll avoid here, is necessary for
rules surrounding other countries' copyright regulations prior to
general international agreement in the 1970s about copyright terms, and
rules in the United States for anonymous, pseudonymous, and unpublished
works.
If you read this carefully, you'll notice a gap. If a work was
registered starting in 1922 and before 1950, it would wind up in the
public domain if a renewal notice were not filed. It's unclear how many
hundreds of thousands or millions of works may have fallen into that
gap.
But you can see that there's a giant divide. Before 1922,
essentially everything is in the public domain. After 1922, nothing
that anyone paid attention to is.
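The rules above can be sketched as a small lookup. This is only a rough sketch of my simplified summary, not legal advice: the function names `status_as_of_2008` and `post_1977_term_end` are made up for illustration, the creation year stands in for the publication year of works for hire, and I'm assuming that renewed works from the 1922-1949 gap were still protected in 2008, which the summary implies but doesn't spell out.

```python
from typing import Optional


def status_as_of_2008(registered_year: int, renewed: bool = False) -> str:
    """Classify a U.S. work using the simplified rules summarized above."""
    if registered_year < 1922:
        return "public domain"
    if registered_year < 1950:
        # The gap: protection lapsed after 28 years unless renewed.
        # Assumption: renewed works were still protected in 2008.
        return "in copyright" if renewed else "public domain"
    # 1950 onward: extended without renewal, or protected on creation.
    return "in copyright"


def post_1977_term_end(created_year: int,
                       author_death_year: Optional[int] = None,
                       work_for_hire: bool = False) -> Optional[int]:
    """Last year of protection for a work from 1978 on.

    Returns None when the author is still living, since the term
    can't be computed yet.
    """
    if created_year < 1978:
        raise ValueError("this rule covers only works from 1978 on")
    if work_for_hire:
        # 95 years; created_year stands in for the publication year.
        return created_year + 95
    if author_death_year is None:
        return None
    return author_death_year + 70  # author's life plus 70 years


if __name__ == "__main__":
    print(status_as_of_2008(1930, renewed=False))
    print(post_1977_term_end(1980, author_death_year=2000))
```

The `renewed` flag is the whole story of the gap: two books registered in the same year can land on opposite sides of the divide depending on a single piece of paperwork.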
Fair Use -- Copyright law contains a
giant set of exemptions that are supposed to balance the U.S.
Constitution's language against the public good. Article 1, Section 8,
states that Congress shall have power "To promote the Progress of
Science and useful Arts, by securing for limited Times to Authors and
Inventors the exclusive Right to their respective Writings and
Discoveries."
Many arguments have been made about what limited times means - Stanford law professor Larry Lessig argued the Sonny Bono Act all the way to the Supreme Court
[8] - but the idea that copyright is intended not solely for the
benefit of "authors and inventors" but for society as a whole should be
undisputed. (If you've followed the actions of the movie and recording
industries, and legislative efforts to support their actions, you might
believe that copyright is all about ownership, not public good [9].)
In that spirit, Congress defined exceptions to copyright, including
fair use, which have further been refined by practice and the courts.
There's a quadripartite test when a claimed fair use is examined: the
commercial nature or lack thereof; the kind of work involved; the
quantity of work used in relation to the original; the effect on the
market of the original work. The test doesn't require every element to
be met; rather, each part is weighed against the whole. (You can read about this in more depth [10] at the Copyright Office.)
Google has argued that its efforts at scanning copyrighted books and
making them available for search with only snippets of results meet the
smell test: that Google was making no specific commercial return on its
book search (in fact, it was investing tens of millions into its
library-scanning efforts), that the works were intended for public
distribution, that snippets were infinitesimal parts of books, and that
the search giant was stimulating demand for the books it provided
results against. Google provided links to purchase the books, and could
thus track sales, too.
The Authors Guild, among others, stated that the mere act of
creating electronic editions that were stored and distributed required
permission from copyright holders, let alone displaying the results.
With a little programming work, an interested party could extract
passages or entire books, too.
Without being a lawyer specializing in this area, I believe it was
and remains impossible to determine whether Google or its one-time
opponents would have prevailed. A ruling would clearly have created a
new sub-area of law, either affirming, denying, or making far more
complicated the notion that creating and owning copies of
copyrighted works is a de facto violation of the law.
But these one-time opponents are now at least somewhat supportive of
Google's efforts. What changed? Quite a lot, and in ways that all
parties, and we readers, stand to benefit.
Out-of-Print Books and Book Rights Registry
-- The settlement opens the way to allowing vastly improved
availability of in-copyright books by separating out-of-print and
in-print books into their respective categories, and collecting fees
for all snippet displays, page reading, and page printing.
Publishers, authors, and other copyright holders will be able to
opt out of having out-of-print books included; by default, all
out-of-print books will be available. For in-print books, those who
own the rights must opt in. This allows all of Google's existing
partners to continue what they're doing, and publishers to experiment
by adding specific titles or simply adding their entire catalogs.
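The defaults described above boil down to a tiny decision rule. Here is a rough sketch of how I read them; the `Category` enum and the `is_included` function are my own invention for illustration, not anything defined in the settlement, and the public-domain branch simply reflects that Google already makes those works fully available.

```python
from enum import Enum


class Category(Enum):
    PUBLIC_DOMAIN = "public domain"
    OUT_OF_PRINT = "in copyright/out of print"
    IN_PRINT = "in copyright/in print"


def is_included(category: Category,
                opted_in: bool = False,
                opted_out: bool = False) -> bool:
    """Would a book be available under the defaults described above?"""
    if category is Category.PUBLIC_DOMAIN:
        # No rights holder to ask; Google offers the full text.
        return True
    if category is Category.OUT_OF_PRINT:
        # Included by default unless the rights holder opts out.
        return not opted_out
    # In-print books appear only when the rights holder opts in.
    return opted_in


if __name__ == "__main__":
    for category in Category:
        print(category.value, is_included(category))
```

The asymmetry is the point: doing nothing leaves an out-of-print book in the program and an in-print book out of it.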
If I read the settlement right, publishers who do not opt in to
allow in-print titles to be included by Google will simply have their
works removed if available or not added in the future. (A complete set
of links to resources can be found at the Authors Guild site [11].)
Where this agreement goes far beyond Google's current program,
making it a win for Google, is that Google will now be able to provide
not just snippet results, but entire pages or books (for viewing and
printing).
Google would collect the fees and pass them on to the Book Rights
Registry, which will be run by a board of authors and publishers, and
be founded with $34.5 million of a $125 million settlement that Google
has agreed to pay - without admitting that any of Google's claims are
invalid.
Authors and publishers win by suddenly having a mechanism to
disseminate electronic editions while collecting fees for per-snippet
display, per-page viewing, and per-page printing. Google has agreed to
a 63-37 split in favor of the copyright holder.
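To make the split concrete, here is a minimal sketch of the arithmetic. The fee amounts are hypothetical, since I don't know what Google will actually charge, and the rounding rule (the copyright holder's share is rounded down to the cent, with the remainder going to Google) is purely my assumption; the `split_fee` name is invented for this example.

```python
from typing import Tuple

HOLDER_PERCENT = 63  # copyright holder's share of the 63-37 split


def split_fee(fee_cents: int) -> Tuple[int, int]:
    """Split a fee in cents into (copyright holder, Google) shares."""
    if fee_cents < 0:
        raise ValueError("fee must not be negative")
    # Integer math avoids floating-point rounding surprises.
    holder = fee_cents * HOLDER_PERCENT // 100
    google = fee_cents - holder
    return holder, google


if __name__ == "__main__":
    # A hypothetical $10.00 purchase, expressed in cents.
    holder, google = split_fee(1000)
    print(f"holder: ${holder / 100:.2f}, Google: ${google / 100:.2f}")
```

Working in whole cents means the two shares always add back up to the original fee, whatever rounding rule ends up applying.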
The public wins because the settlement calls for a free subscription
license for "designated" computers at all U.S. public and academic
libraries - a miserly 1 per public library building, or 1 per 4,000 or
1 per 10,000 students, depending on the institution type. Google has
also agreed to pay all printing royalty fees at these qualifying
locations for 5 years or until the fees reach $3 million, whichever
comes first.
Other institutions can pay for overarching printing and reading
licenses, and public libraries can upgrade to fuller licenses, too.
Without knowing what these more extensive subscriptions cost, it's hard
to know whether public libraries will be able to afford them. Wade
Roush of Xconomy, from whose writing I learned about the limits on free
library access, is down on the whole deal
[12], partly due to the scale of free access and partly due to the
default pricing that Google will set on out-of-print, in-copyright
books.
Anyone who researches a topic should benefit from the availability
of out-of-print works, as they comprise many millions of titles that
are rarely available in wide circulation. Ten libraries around the
world might have a particular book you need, but that doesn't mean you
can gain access to it.
Google has also agreed to pay legal fees, and at least $45 million
to copyright holders whose works were scanned before a certain date
connected to the lawsuit.
Now, of course, not all publishers or copyright holders are
represented by the parties involved, and some may choose to sue
separately in the future. The court might also require the parties to
appear in court, although courts prefer settlements.
The only fly in the ointment is that copyright holders of
out-of-print but in-copyright works are being de facto opted in to
having their works available by virtue of this settlement, even if
they're not party to it. That should fly, because most of these
creators or owners can get no value out of their works at present, and
few people complain about receiving additional compensation. Further,
the creation of a clearinghouse gives a kind of imprimatur, allowing a
party that represents authors and publishers to make sure out-of-print
works see life again.
There was the notable case in the music world of James Carter, a
former convict whose voice was recorded on a chain gang in 1959 by
pioneering folk music collector Alan Lomax. In 2002, the song he sang,
"Po' Lazarus," was used in the opening of the movie "O Brother, Where
Art Thou?" The soundtrack sold 4 million copies.
Carter, who left prison in 1967 and had led a quiet life since, was
tracked down after months of research by the Lomax archives, and presented with a $20,000 check [13]; he had received $100,000 by the time of his death in 2003.
Avoiding Collision with the Future -- I'm
a writer. I make my living by sitting down and typing, as I am now. The
notion of Google appropriating my words without my permission or
acknowledgment always bothered me, even though I also accepted that
there was a fine chance that the company was operating within the legal
constraints of copyright law.
I similarly was troubled by the Authors Guild partnering with what
is often its natural enemy, the AAP, in trying to prevent Google from
related activities, some of which seemed to benefit me and authors, and
others of which did not. (For instance, the AAP at times has suggested
that public libraries should pay fees to publishers when they lend
works. While this is the case in EU nations
[14], authors generally don't believe that publishers would pass along
these fees to authors; that's separate from the seemingly un-American
idea that public libraries pay royalties!)
This reconciliation doesn't solve all issues, but it makes it much
more likely that independent authors and publishers survive and even
thrive by providing a broader marketplace, while also providing greater
availability of human knowledge. While the ease of access to publicly
promulgated information, like Web pages, has increased, trends seemed
to suggest that books would go down the path that movies are still
taking and music is slowly escaping from: being available only in
highly restricted ways that interfere with technological progress.
With this new agreement in place, it's possible that you could
publish a book, distribute it entirely through Google Book Search, and
earn some money - maybe even a lot of money if the book goes viral -
and bypass publishers entirely. That was the promise of the Internet
music, blog, and podcast revolutions, too. While it hasn't come true
for everyone, it's certain that many more voices are being heard by
many more people around the world. And that's a good thing.
[Note: This article was edited to clarify the difference between
in-copyright, out-of-print works which are orphaned - no copyright
owner is known or can be found - and those that are not.]
[1]: http://books.google.com/booksrightsholders/
[2]: http://books.google.com/googlebooks/newsviews/history.html
[3]: http://www.libraryjournal.com/article/CA6564275.html
[4]: http://store.doverpublications.com/by-subject-literature-dover-thrift-editions.html
[5]: http://www.copyright.gov/help/faq/faq-duration.html#duration
[6]: http://www.copyright.gov/circs/circ15a.html#works
[7]: http://en.wikipedia.org/wiki/Sonny_Bono_Copyright_Term_Extension_Act
[8]: http://www.lessig.org/blog/eldredcc/
[9]: http://wiki.lessig.org/index.php/Against_perpetual_copyright
[10]: http://www.copyright.gov/fls/fl102.html
[11]: http://authorsguild.org/advocacy/articles/settlement-resources.html
[12]: http://www.xconomy.com/boston/2008/10/31/in-google-book-search-settlement-readers-lose/
[13]: http://articles.latimes.com/2003/dec/08/local/me-carter8
[14]: http://en.wikipedia.org/wiki/Directive_92/100/EEC