EPUB – the non-proprietary format for the distribution and interchange of digital publications using web standards – has matured significantly in the past few years. Originating in 1999 as the Open E-Book standard (OEB), it was officially renamed EPUB with the release of EPUB 2.0 by the International Digital Publishing Forum (IDPF)1 in 2007 – virtually simultaneously with Amazon's release of the first Kindle, which ironically (and, to many publishers and readers, unfortunately) was not based on EPUB. For the balance of that decade, EPUB was fine-tuned but was basically stable – EPUB 2.0.1, released in 2010, improved but did not significantly expand what EPUB could do – and the e-book landscape grew rapidly: Google digitized millions of books as EPUBs, Amazon sold millions of Kindles, academic libraries rapidly increased their holdings of e-books (although still mainly as PDFs or even in the old OEB format), and publishers began to realize that e-books were becoming a big opportunity (if, then, still a small part of their sales). By 2010, US publishers were selling US$500 million worth of e-books (a number that doubled to a billion dollars less than a year later).
But these EPUB-based e-books were mainly fiction and non-fiction trade books or straightforward academic monographs and reference works. The EPUB 2.0.1 standard was simply not robust enough – and the E Ink readers like the Kindle, Nook and Kobo devices of that day were not sophisticated enough – to handle books with complicated layouts and the need for features like interactivity and multimedia demanded by textbooks, cookbooks, children's books, magazines and a host of other types of publications.
That all changed in 2010 with the introduction of Apple's iPad. While iBooks, a featured benefit of the iPad, was based on EPUB, the books it could handle were still limited. The device, and the myriad competitors it spawned, were not. Suddenly the publishers that had sat on the sidelines had devices that could provide the features and functionality that EPUB 2 and the E Ink readers had been lacking. The catch: exploiting those new features typically meant creating apps, which required proprietary and expensive programming that had to be repeated for each platform.
The birth of EPUB 3 …
Fortunately, the IDPF had seen this coming. A formal, broad-based working group of publishers, retailers, developers and others was convened and charged with an ambitious mandate: to create an update to the EPUB specification that would accommodate improved typography, complex layouts, rich media and scripting and that would be global in scope. Significantly, EPUB 3 was also mandated to be based on web standards and to accommodate accessibility in a meaningful fashion.
… and its difficult adolescence
While publishers were eager to embrace EPUB 3.0, they immediately encountered a problem: only some aspects of the new standard worked on any reading systems, and almost no aspects of it worked on all reading systems (except those that were already part of EPUB 2, since EPUB 3 is backwards compatible with EPUB 2). Apple's iBooks and Google Books (later, Google Play) were early supporters and adopters, but a publisher could not be sure a publication would render consistently in both of them or that advanced features would work the same way, if at all. Most problematic: the dominant retailer, Amazon, still required its proprietary Mobi format. The prospect of making ‘one file that works everywhere’ seemed to be more dream than reality.
The reality is more encouraging than most people realize. Today, there are many EPUB 3-based reading systems. Apple iBooks and Google Play have become increasingly EPUB 3 compliant, as have dedicated reading devices like those from Nook and Kobo. Many specialized platforms are based on EPUB 3, like O'Reilly Safari, Benetech's accessible Bookshare, and CourseSmart and VitalSource, the two leading textbook platforms. Tools like Adobe's InDesign and free, open-source software like Sigil make it easy to create EPUBs, and EPUB 3 is the basis for browser-based e-book reading systems like Readium, a free plug-in for Chrome. Standards like the DAISY4 standard for accessibility and the IDEAlliance PSV (PRISM Source Vocabulary) standard for magazines are designed to align with EPUB 3. The momentum is clear: although it takes a while for technologies to adapt to a new standard, there is a clear consensus among both publishers and system developers that EPUB 3 is here to stay.
An ecosystem in an ecosystem
What we have got now is an EPUB ecosystem: a complex network of devices and platforms and systems – some open and free, some proprietary and commercial – as well as publishers and publications of all types that use, or are gearing up to use, EPUB 3. They don't all have the same interests. Not everybody needs video or interactivity; some need rich media but don't need rich layouts. Companies have a right to engineer their systems for competitive advantage. What the Korean market demands is not the same as what Canadians need. Textbooks aren't the same as trade books, and books aren't the same as magazines. EPUB 3 is designed to accommodate all of this. It does that mainly not by being complicated, but by being simple, flexible and standards-based. It conforms to modern web standards, and it lets publishers do most things (but not everything) they can do on the web or in proprietary apps. This rich ecosystem is what is sometimes referred to as ‘the Open Web Platform’.
EPUB 3 plays an important role by providing a specification for how to do this in a consistent, reliable way. It can't be too rigid, or it will be unworkable for publishers to use; it can't be too loose, or it will be impossible for reading systems to implement. EPUB 3 needs to strike a balance that lets publishers create files that can be trusted to work across reading systems and platforms while accommodating all kinds of publications – and letting companies compete by differentiating. Managing an ecosystem requires judgment and restraint. Dumping poison in a river kills the fish; you can ban the poison, but you can't take away the rivers or outlaw the fish. The EPUB ecosystem exists in the context of the wider publishing ecosystem. It doesn't make print obsolete; it doesn't take the place of PDF; it has to exist alongside Amazon, which dominates the retail and e-reader space. Most book publishers realize now that they need to create at least three versions of their files, given the reality of the publishing ecosystem today: PDF, Mobi (for Amazon) and EPUB. But it's the EPUB that has all the power and flexibility. It does so much more than PDF or Mobi, and it works in so many more places (even for creating that Mobi file!). It's what readers are increasingly coming to prefer – and to demand.
EPUB 3 does this by providing a specification for how to organize, structure and manage rich web-based content so that it can be packaged into a single-file format and distributed reliably to a wide variety of reading systems, online and offline. It is much more than a standard for content documents; in fact, it specifies very little about how to mark up documents, relying on the inherent semantics in HTML5 and supplementing that with a ‘structural semantics vocabulary’3. The ‘package file’ in an EPUB 3 is an XML document whose root <package> element contains a <metadata> section that accommodates much richer and more complex metadata than EPUB 2 did (while requiring only one thing more than EPUB 2.0.1 metadata required, a timestamp recording when the EPUB was last modified). The <package> also contains a <manifest>, a ‘packing list’ that documents all the content documents, images, videos, audio files, scripts, stylesheets, fonts, etc. that together make up the EPUB, and a <spine> that defines a default reading order for the EPUB. It is even possible to embed an ONIX or MARC record in an EPUB 3. While EPUBs are inherently reflowable, it is also possible to create an EPUB 3 that uses a ‘fixed layout’; and the new ‘media overlays’ specification enables synchronization of the EPUB's text with recorded audio. The authoritative reference to all of the features of EPUB 3 is on the IDPF website4, and the most useful current reference to the creation of EPUBs is O'Reilly's EPUB 3 Best Practices.5
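To make the relationship between <metadata>, <manifest> and <spine> concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a reference implementation: the sample package document, its file names, identifier and timestamp are all invented, and the helper names `reading_order` and `modified_timestamp` are hypothetical. It shows how a reading system might resolve the default reading order by looking up each spine entry in the manifest, and how it might read the modification timestamp from the metadata.

```python
import xml.etree.ElementTree as ET

# Namespaces used by EPUB package documents.
NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A hypothetical, minimal package document (all values are invented).
PACKAGE_XML = """<package xmlns="http://www.idpf.org/2007/opf"
         version="3.0" unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Sample Publication</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2013-01-01T00:00:00Z</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>
    <item id="css" href="style.css" media-type="text/css"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>
"""


def reading_order(package_xml: str) -> list:
    """Return the hrefs of the content documents in default reading order."""
    root = ET.fromstring(package_xml)
    # The manifest is the 'packing list': map each item id to its href.
    manifest = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", NS)
    }
    # The spine refers to manifest items by id, in reading order.
    return [
        manifest[ref.get("idref")]
        for ref in root.findall("opf:spine/opf:itemref", NS)
    ]


def modified_timestamp(package_xml: str):
    """Return the dcterms:modified value from the metadata, or None if absent."""
    root = ET.fromstring(package_xml)
    for meta in root.findall("opf:metadata/opf:meta", NS):
        if meta.get("property") == "dcterms:modified":
            return meta.text
    return None


if __name__ == "__main__":
    print(reading_order(PACKAGE_XML))
    print(modified_timestamp(PACKAGE_XML))
```

Note that the stylesheet and the navigation document appear in the manifest but not in the spine: the manifest lists everything in the package, while the spine lists only what is read in sequence.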
The evolution is ongoing
Another misconception about a standard like EPUB is that it should never change. While stability is important, stagnation is deadly. EPUB 3 was designed to evolve. It was written as a modular standard that would preserve the basic specification, EPUB 3.0, while enabling the addition of new functionality and features that continue to expand its usefulness. For example, working groups are currently developing specifications for indexes in EPUBs and for dictionaries and glossaries, as well as for ‘advanced hybrid layouts’ that will accommodate both fixed and reflowable content in a single EPUB. The EPUB 3.0.1 Working Group is currently refining the standard – mostly through minor fixes and documentation enhancements – with the goal of making EPUB 3 an ISO standard.
The IDPF provides many resources to help both publishers and reading system developers implement the EPUB 3 standard effectively. EPUBCheck6 enables publishers to make sure their EPUBs are correctly structured, and the EPUB Samples Project7 provides a wealth of model EPUB files. In development is a Compliance Test Suite: an authoritative set of files that will test reading system conformance to each EPUB 3 feature. Another organization, the Book Industry Study Group (BISG), publishes a frequently updated EPUB 3 Grid8 that tracks which features of EPUB 3 are implemented in each reading system.
All of this indicates that the EPUB 3 ecosystem is healthy and thriving. We are closer than ever to being able to create that long-dreamed-of single file that works across the many systems and devices that publishers and readers – and libraries! – want to use.