EPUB Zero

A Collection of Interesting Ideas

This version:
http://dauwhe.github.io/epub-zero/
Editors:
(Wiley)
(Hachette Livre)

Abstract

A simple ebook format, as close to the Open Web Platform as possible.

Introduction

Note: This specification is only a thought experiment, inspired by the EPUB+WEB White Paper (by the W3C Digital Publishing Interest Group and the IDPF) and by concerns about the complexity of EPUB3. Please join the Digital Publishing Interest Group and IDPF to help bring about a better future for ebooks.

A book, in the digital world, is a reading mode, not a file format. The Open Web Platform provides us with most of what we need for the content, style, and behavior of a publication. EPUB Zero aims to make publications just another part of the web.

EPUB3 is complex. It requires numerous custom XML vocabularies and massive repetition of content. Developing a reading system for EPUB3 is a herculean task, as evidenced by the slow progress of Readium. Our goal is to create a simpler publication format that is easier to author and easy to read in a browser with as little additional technology as possible.

Note: EPUB Zero is not just for books, but for any packaged web content. We will use the term publication to describe books, magazines, journals, manuals, reference documents, corporate documents, articles, etc.

Note: Any discussion of the definitions of packaged, web, content, book, or document will not be tolerated :)

1. Design goals and rationale

EPUB Zero satisfies the following design goals:

  1. Simplicity

  2. “Webbiness”

  3. Built from HTML, CSS, and JSON as much as possible

  4. Uses existing, forward-looking standards whenever possible

  5. Don’t repeat yourself

  6. Built for the convenience of the author and reader rather than the implementor

  7. Aims to use the browser as reading system

Note: The ultimate test of this specification is whether we can build a reading system on top of an ordinary browser with only a bit of JS.

2. Overview

EPUB Zero is yet another ebook format, which isn’t just “based on” HTML like EPUB, but is HTML. Where possible, we use HTML solutions to achieve book-like functionality. It appears to be possible to do this using mostly-existing web technology:

  1. Online/offline reading via Service Workers

  2. Pagination via prollyfills

  3. Installable via application manifest

  4. Access to navigation document via sidebar link relation

But each of these features is problematic. As this document evolves, we hope to propose alternatives.

[Figure: Using Opera 12.16 as an EPUB Zero reading system.]

[Figure: Folder structure of an EPUB Zero publication.]

2.1. Comparison of EPUB 3 and EPUB Zero

| Feature | EPUB 3 | EPUB+WEB | EPUB Zero |
| Package Document | manifest, reading order, metadata in XML vocabulary | simplified JSON package file | unnecessary? |
| Reading Order | spine element in package file | JSON spine | link rel=next in HTML |
| Navigation Document | required nav, optional ncx | required nav | required nav |
| Linking | EPUB CFI | EPUB CFI? | TK |
| Publication Metadata | custom XML format inside package file | JSON package + external file in any vocabulary | external file in any vocabulary |
| Content Documents | XHTML5, SVG | HTML5, SVG | HTML5, SVG |
| Fixed Layout | viewport in HTML + configuration in package | ? | just don’t even |
| Style | subset of CSS | any CSS | any CSS |
| Multimedia | html audio and video, mp3 and mp4 are core media types | ? | Anything supported by the browser |
| Fonts | OTF and WOFF | ? | Anything supported by the browser |
| Scripting | optional, recommend container constrained | ? | required for offline reading |
| Text-to-speech | Media overlays/SMIL | TK | TK |
| Container | OCF | ? | ? |
| Manifest | OCF manifest | JSON manifest | nav + web app manifest? |
| Offline reading | dependent on reading system | dependent on reading system | Service Workers |

3. EPUB Zero documents

3.1. Content documents

An EPUB Zero content document is an HTML5 document.

Can SVG or raster images be a content document? Does [HTML5] define required media types? Issue: Is the question whether SVG can be embedded inline in HTML (yes, http://www.w3.org/html/wg/drafts/html/master/semantics.html#svg-0) or whether SVG can be a first-class content document?

3.2. Style

There are no restrictions on the use of CSS.

3.2.1. Pagination

Pagination is essential for an optimum long-form reading experience. Several approaches to pagination may be possible:

  1. Native support exists in Opera 12.16, via overflow: paged.

  2. Polyfills exist, based on either multicol or regions.

  3. Project Houdini may expose primitives making prollyfills easier.

Note: Reading long-form content in paginated form often offers a better experience for readers. We encourage document authors to support pagination via CSS, polyfills, prollyfills, reading systems, and/or political action.
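As a rough illustration of the multicol approach, the sketch below computes the page count and scroll offsets for content that flows into CSS columns one page wide. The helper names, the 40px gap, and the use of a main element as the container are assumptions made for this example; they are not taken from any existing polyfill.

```javascript
// Hypothetical helpers for a multicol-based paginator (names are illustrative).
// Assumes content flows into columns that are each `pageWidth` px wide and
// separated by `gap` px, so scrollWidth = n * pageWidth + (n - 1) * gap.

function pageCount(scrollWidth, pageWidth, gap) {
  if (pageWidth <= 0) throw new RangeError("pageWidth must be positive");
  // Round rather than divide exactly, to absorb sub-pixel layout differences.
  return Math.max(1, Math.round((scrollWidth + gap) / (pageWidth + gap)));
}

function pageOffset(pageIndex, pageWidth, gap) {
  // Horizontal scroll position at which the given page starts.
  return pageIndex * (pageWidth + gap);
}

// Browser-only wiring; skipped when no DOM is available.
if (typeof document !== "undefined") {
  const book = document.querySelector("main") || document.body;
  const gap = 40; // assumed gap between pages, in px
  const width = book.clientWidth;
  book.style.height = "100vh";
  book.style.overflow = "hidden";
  book.style.columnWidth = width + "px";
  book.style.columnGap = gap + "px";

  let page = 0;
  document.addEventListener("keydown", (event) => {
    const last = pageCount(book.scrollWidth, width, gap) - 1;
    if (event.key === "ArrowRight") page = Math.min(page + 1, last);
    else if (event.key === "ArrowLeft") page = Math.max(page - 1, 0);
    else return;
    book.scrollLeft = pageOffset(page, width, gap);
  });
}
```

With a 760px page and a 40px gap, content whose scrollWidth is 2360px occupies three pages, and the third page starts at offset 1600.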

3.2.2. Page Transitions

A book may consist of several HTML files. A user must be able to move from ch1.html to ch2.html as easily as moving from page 1 to page 2, and with the same action.

Note: The CSSWG discussed this in a thread starting at https://lists.w3.org/Archives/Public/www-style/2014Jan/0093.html

HTML5 link relations support describing previous and next files.

Opera and Firefox have UI for link rel=prev|next, but Safari, Chrome, and IE do not.

Using link relations introduces a burden on authoring that does not currently exist in EPUB3.
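In browsers without UI for these link relations, a small script could stand in. The following is a minimal sketch, assuming each content document declares link rel=next and link rel=prev in its head; findRelLink is a hypothetical helper name, and the arrow-key bindings are only one possible choice of user action.

```javascript
// Sketch: follow <link rel="next"> / <link rel="prev"> with the arrow keys.
// `findRelLink` is a hypothetical helper; it accepts any array of objects
// with `rel` and `href` properties (for example, HTMLLinkElement instances).

function findRelLink(links, relation) {
  const match = links.find((link) =>
    String(link.rel).toLowerCase().split(/\s+/).includes(relation)
  );
  return match ? match.href : null;
}

// Browser-only wiring; skipped when no DOM is available.
if (typeof document !== "undefined") {
  document.addEventListener("keydown", (event) => {
    const relation =
      event.key === "ArrowRight" ? "next" :
      event.key === "ArrowLeft" ? "prev" : null;
    if (!relation) return;
    const links = Array.from(document.querySelectorAll("link[rel]"));
    const target = findRelLink(links, relation);
    if (target) window.location.assign(target);
  });
}
```

A reading system that also paginates would presumably follow the link only when the reader is already on the last (or first) page of the current file, so that the same action moves between pages and between files.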

3.3. Interactivity

How should security be handled for downloaded publications? How is this handled with service workers?

3.4. Media Overlays

Browsers (as far as we know) do not support SMIL.

Note: See https://github.com/timesheets/timesheets.js for a JS implementation of SMIL

Are there polyfills that are “good enough?” Is there a better approach for synchronizing [HTML5] with multimedia?

3.5. Global Language Support

3.6. Navigation

Most reading systems provide a link to the navigation document as part of the reading system user interface.

Note: In Opera and Firefox, opening a link with rel=sidebar can open a navigation document in the "secondary browsing context", aka sidebar. This does not work in Safari, Chrome, or IE.

3.7. Fixed Layout

Fixed layout (FXL) is often a bad idea.

3.8. Accessibility

Note: Compliance with WCAG 2.0 and integration of ARIA 1.1 and the Digital Publishing module of ARIA will aid in creating accessible content.

How do we make EPUB Zero documents “born accessible?” EPUB 3.0 requires a nav document. EDUPUB requires the section element, and proper use of [HTML5] heading elements.

What’s the state of text-to-speech support in browsers?

4. Packaging

5. Installing EPUB Zero publications

EPUB Zero uses the web manifest specification (https://w3c.github.io/manifest/) to facilitate installation as a web app on user devices.

{
  "name": "Moby-Dick",
  "short_name": "Moby-Dick",
  "icons": [{
    "src": "icons/moby-dick-icon.webp",
    "sizes": "64x64",
    "type": "image/webp"
  }],
  "start_url": "title-page.html",
  "display": "minimal-ui"
}

The display property is interesting. Adding an additional value, "display": "book", might be a good way of indicating to the browser that it should display the content with a UI optimized for long-form content.

There seems to be some pushback against the web manifest specification.

How might a web manifest fulfill the function of an EPUB manifest?
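To make the questions above more concrete, here is a sketch of how a reading system might interpret such a manifest. The function name interpretManifest, the bookish flag, and the fallback to index.html are assumptions made for this example. The "book" value is the hypothetical one floated in this section, and treating unrecognized display values as "browser" reflects our understanding of the manifest specification's default rather than a confirmed requirement.

```javascript
// Sketch of how a reading system might interpret a web app manifest.
// "book" is the hypothetical display value discussed in this section; it is
// not part of the web manifest specification.

const KNOWN_DISPLAY_MODES = ["fullscreen", "standalone", "minimal-ui", "browser"];

function interpretManifest(manifest, manifestUrl) {
  const display = manifest.display;
  return {
    title: manifest.name || manifest.short_name || "Untitled",
    // start_url is resolved relative to the URL of the manifest itself.
    // Falling back to index.html is an assumption of this sketch.
    startUrl: new URL(manifest.start_url || "index.html", manifestUrl).href,
    // A reading system could treat "book" as a request for a paginated UI.
    bookish: display === "book",
    // Unrecognized values are treated as "browser" in this sketch.
    display: KNOWN_DISPLAY_MODES.includes(display) ? display : "browser"
  };
}

const interpreted = interpretManifest(
  { name: "Moby-Dick", start_url: "title-page.html", display: "book" },
  "https://example.com/moby-dick/manifest.json"
);
console.log(interpreted.startUrl, interpreted.display, interpreted.bookish);
```

A browser that does not know the hypothetical value would still open the publication, which suggests a new display value could be introduced without breaking existing user agents.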

6. Offline reading

Books must be readable offline and online.

Is offline reading really the same question as packaging/archiving?

Browsers currently offer ways of accessing web content while offline:

6.1. AppCache

Cache manifests allow offline access to web content: https://html.spec.whatwg.org/multipage/browsers.html#offline

AppCache will likely be removed from browsers in favor of Service Workers.

6.2. Service Workers

Service Workers are the preferred way of implementing offline viewing for web content.

Note: The document at start_url should register the service worker.
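A minimal cache-first service worker might look like the sketch below. The cache name and the file list are hypothetical; a real publication would list its own resources, perhaps derived from the navigation document. The file name sw.js is likewise an assumption.

```javascript
// sw.js: a minimal cache-first service worker sketch.
// CACHE_NAME and PUBLICATION_FILES are hypothetical values for illustration.

const CACHE_NAME = "epub-zero-moby-dick-v1";
const PUBLICATION_FILES = ["title-page.html", "ch1.html", "ch2.html", "style.css"];

// Resolve relative file names against the publication's base URL.
function urlsToCache(files, baseUrl) {
  return files.map((file) => new URL(file, baseUrl).href);
}

// Only wire up events when running inside a real service worker.
if (typeof ServiceWorkerGlobalScope !== "undefined" &&
    self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener("install", (event) => {
    // Download every file of the publication before the worker activates.
    event.waitUntil(
      caches.open(CACHE_NAME).then((cache) =>
        cache.addAll(urlsToCache(PUBLICATION_FILES, self.registration.scope))
      )
    );
  });

  self.addEventListener("fetch", (event) => {
    // Serve from the cache first, falling back to the network.
    event.respondWith(
      caches.match(event.request).then((cached) => cached || fetch(event.request))
    );
  });
}
```

The document at start_url would register this script, for instance with navigator.serviceWorker.register("sw.js"), after checking that the browser supports service workers.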

7. Publication structure and packaging

The EPUB+WEB White Paper discusses offline access together with packaging. Are these in fact the same issue?

An EPUB Zero publication is a collection of files, which should be collected inside a folder or directory. The top level of the directory should contain the package file. No other restrictions exist on the directory structure.

Compressing or otherwise packaging this directory may be required for many reasons, including:

  1. reduction of file size

  2. creating a single “blob” that can be easily transmitted

  3. allowing for a digital signature

  4. allowing for digital rights management

  5. allowing for streaming of the publication

In simple cases, using ZIP may be sufficient. The W3C TAG draft Packaging on the Web (http://w3ctag.github.io/packaging-on-the-web/) may prove useful as well.

How about using the presence of package.json as the trigger that distinguishes a publication from an ordinary bundle of web content? Properties in manifest.json can act as hints to the UA/browsing context that the thing is “bookish”.

8. Metadata

Publication-level metadata can be stored in the publication folder. We recommend the use of JSON-LD as a metadata format, but different communities may use other formats such as RDF, Turtle, or ONIX.

{
  "@context": "http://schema.org",
  "@type": "Book",
  "accessibilityAPI": "ARIA",
  "accessibilityControl": [
    "fullKeyboardControl",
    "fullMouseControl"
  ],
  "accessibilityFeature": [
    "largePrint/CSSEnabled",
    "highContrast/CSSEnabled",
    "resizeText/CSSEnabled"
  ],
  "accessibilityHazard": [
    "noFlashing",
    "noMotionSimulation",
    "noSound"
  ],
  "aggregateRating": {
    "@type": "AggregateRating",
    "reviewCount": "0"
  },
  "bookFormat": "EBook/e0",
  "copyrightHolder": {
    "@type": "Organization",
    "name": "Harper & Row"
  },
  "author": "Herman Melville",
  "datePublished": "1851-10-19",
  "image": "moby-dick-book-cover.jpg",
  "offers": {
    "@type": "Offer",
    "availability": "https://example.com/BuyMe?isbn=9780316123456",
    "price": "6.99",
    "priceCurrency": "USD"
  },
  "copyrightYear": "1851",
  "description": "Project Gutenberg edition of Moby-Dick",
  "genre": "Literary Fiction",
  "inLanguage": "en-US",
  "isFamilyFriendly": "true",
  "isbn": "9780000000000",
  "name": "Moby-Dick",
  "numberOfPages": "777",
  "publisher": {
    "@type": "Organization",
    "name": "Harper & Row"
  }
}
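A reading system or bookshelf application could consume such a record with very little code. The sketch below is one way to do that, assuming a schema.org Book record like the one above; summarizeBook and nameOf are hypothetical helper names, and the embedded record is a shortened copy of the example.

```javascript
// Sketch: summarize a schema.org Book record such as the one above.
// `summarizeBook` and `nameOf` are hypothetical helpers, not library APIs.

function nameOf(value) {
  // schema.org values may be plain strings or nested objects with a name.
  if (value && typeof value === "object") return value.name || "";
  return value || "";
}

function summarizeBook(metadata) {
  if (metadata["@type"] !== "Book") {
    throw new TypeError("expected a schema.org Book record");
  }
  return {
    title: nameOf(metadata.name),
    author: nameOf(metadata.author),
    publisher: nameOf(metadata.publisher),
    language: metadata.inLanguage || ""
  };
}

const summary = summarizeBook(JSON.parse(`{
  "@context": "http://schema.org",
  "@type": "Book",
  "name": "Moby-Dick",
  "author": "Herman Melville",
  "inLanguage": "en-US",
  "publisher": { "@type": "Organization", "name": "Harper & Row" }
}`));

// Prints: Moby-Dick by Herman Melville (Harper & Row)
console.log(`${summary.title} by ${summary.author} (${summary.publisher})`);
```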


Acknowledgments

EPUB Zero was originally inspired by a series of posts on EPUB3 by Daniel Glazman.

Many thanks to Hadrien Gardeur for the structure of the JSON Package file.

Reading system behaviour

  1. A user agent sends an HTTP request:

    GET /urn:isbn:9780316123456
    Host: www.hachettebookgroup.com
    Accept: ????

What if there’s a zip (or any compressed file) or .e0 there?

  2. If manifest.json exists, read manifest.json and open start_url.

  3. If manifest.json does not exist:

    a. if index.html exists, open that;
    b. if index.html doesn’t exist, open the first HTML file;
    c. if there are no HTML files, open the first supported format;
    d. otherwise, report the error "there is not a book here".
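The resolution steps above can be sketched as a single function. Here, files is a list of paths in the publication folder and readManifest is a hypothetical callback that returns the parsed manifest.json. The list of "supported formats" and the decision to fall through when the manifest has no start_url are assumptions of this sketch.

```javascript
// Sketch of the entry-point resolution steps described above.
// SUPPORTED_EXTENSIONS is an assumed list of non-HTML formats a reading
// system might open; the source does not define one.

const SUPPORTED_EXTENSIONS = [".svg", ".png", ".jpg", ".txt"];

function resolveEntryPoint(files, readManifest) {
  // If manifest.json exists, read it and open start_url.
  if (files.includes("manifest.json")) {
    const manifest = readManifest();
    if (manifest && manifest.start_url) return manifest.start_url;
  }
  // a. index.html, if present.
  if (files.includes("index.html")) return "index.html";
  // b. Otherwise the first HTML file, in listing order.
  const firstHtml = files.find((file) => /\.x?html?$/i.test(file));
  if (firstHtml) return firstHtml;
  // c. Otherwise the first file in a supported format.
  const firstSupported = files.find((file) =>
    SUPPORTED_EXTENSIONS.some((ext) => file.toLowerCase().endsWith(ext))
  );
  if (firstSupported) return firstSupported;
  // d. Nothing that can be opened.
  throw new Error("there is not a book here");
}
```

What "first" means for a folder of files is left open by the steps above; this sketch simply uses the order in which the files are listed.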

References and Further Reading

Functional requirements for books, and possible solutions

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

References

Normative References

[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119

Informative References

[HTML5]
Ian Hickson; et al. HTML5. 28 October 2014. REC. URL: http://www.w3.org/TR/html5/
