Scholarly Publishing in a Connected World

Tzviya Siegman, Wiley

WWW2016 - Montreal

Research is sharing knowledge

  • Historically, publishing research meant publishing a flat, finished product
  • Articles have been read by our peers before and after publication
  • The published article is rather immutable

Even retractions and errata are discrete entitities

Retracted article from The Lancet with the title RETRACTED: Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children

Machines are readers too

  • We are beginning to recognize the value of machines reading our research as well
  • Using tools of the open web platform can help researchers and even speed up research during times of crisis
Google blog post from March 3, 2016 with the title
Providing support to combat Zika in Brazil and beyond
Finding solutions to the rapid spread of the Zika virus can be aided by those with experience analyzing large sets of data. Source: Google Blog

Publishing may be a little behind

Oil portrait of 18th century family standing outside near windmills
1771 Portrait of The Wiley Family by William Williams, Smithsonian American Art Museum

Smithsonian Linked Open Data

The Smithsonian Linked Open Data page for the Wiley painting

History Lesson

  • Digitization of scholarly publishing is about the same age as the Web
  • In 1995, libraries didn't want to build additional wings to house stacks of journals
  • JSTOR was created to solve a real estate problem
  • Earliest online journals were HTML
  • The appeal and efficiency of desktop publishing was too good to pass up


  • Right now, we have a lot of faith in the “page”
  • Citations are usually page-based
  • Even when given the choice of HTML, many researchers choose PDF

Benefit from the crossroads

  • Reap the benefits of pages, or CSS "fragmentainers"
  • Don't be contained by the frozen page of desktop publishing
  • Publications can be living documents, with typography, supported by the full stack of web technology

Connected Publishing

Serious electronic literature (for scholarship, detailed controversy and detailed collaboration) must support bidirectional and profuse links, which cannot be embedded; and must offer facilities for easily tracking re-use on a principled basis among versions and quotations.
Theodor H. Nelson, "Xanalogical Structure, Needed Now More than Ever: Parallel Documents, Deep Links to Content, Deep Versioning and Deep Re-Use"

Hyper* Research

  • Hypertext gives us a lot of the same information as print and a protocol for sharing it
  • Hypermedia gives us an interactive layer
  • Hyperdata gives us networked, living information

How Are We Doing?

  • We are still looking for the best way to convey "deeply intertwingled" information
  • Traditional links rot. Is there a better way to convey relationships?
  • Can information from one resource change or affect something in another resource?
  • Is it possible to add information to a resource without touching it?

Hyper* check

  • Hierachical information is pretty comfortable
  • Associative information has come a long way
    • Convey metadata and relational information without tweaking the content
    • Convey data in a manner that does not require hopping in and out of systems

An Associative Index?

Read more about the extensible TEI and XML code base.

Search results of the Tufts Perseus index showing all results for medusa across all assets
The Perseus Hopper Digital Library is completely online. Users can not only read passages from texts, but use a suite of search tools to find what they are looking for, in any of the languages the Hopper supports.

Meaningful Metadata

Not too long ago (and often still) author information and titles were just meaningless blobs of text

Linked Data to the rescue!

linked data diagram showing author information for a journal article
Linked Data diagram for authors and affiliations of Brain and Behavior An update on immunopathogenesis, diagnosis, and treatment of multiple sclerosis, DOI: 10.1002/brb3.362, courtesy of Standard Analytics

Networked Publications

Our articles are talking to each other

                <li property="schema:author">
                    <span typeof="schema:Person" resource="">
                    <meta property="schema:givenName" content="Ada">
                    <meta property="schema:additionalName" content="Augusta">
                    <meta property="schema:familyName" content="Lovelace">
                    <span property="schema:name">Augusta Ada Lovelace</span>

Theory vs. Practicality

  • RDFa and friends give us the tools to link anything
  • A quick look through LOV will show hundreds of open vocabularies, with varying degrees of use
  • If I don't find what I need, I can write a new vocabulary or create a namespace and some SKOS equivalents

What is the point?

  • Do these endless, unique vocabularies actually assist in linking content?
  • Would it help to standardize on a small subset of vocabularies?

Use Case: Curation of Content

landing page of hub on wiley online library, showing search options by topic

Publishing Data

  • In the past, published data was, by definition, historical
  • Tools like IPython, Jupyter, and GitHub allow researchers to publish and never stop!
  • Web Annotations, GitHub Pull Requests allow anyone to comment
  • Living publications allow for living data

Use Case: Derivatives Analytics with Python (Wiley Finance)

Wiley Finance is a series of books aimed at finance and investment professionals. Often authors make tools such as spreadsheets available to the readers.

  • Derivatives Analytics with Python breaks the historical mold of firewalling everything behind a book's results, tables, and figures
  • Puts all Python code in an open Github Repository
  • The author provides a Jupyter Notebook to host the code on the Quant Platform for a standardized execution environment
Screen capture of GitHub Repository Pyhton code for Derivatives Analytics with Python

Lessons from the Past

  • In 2012 Wiley launched a site called Functional Chemistry
  • Instead of publishing raster images of compounds that originated in ChemDraw, we extracted InChIKeys from the ChemDraw files
  • This enabled our machines to identifying existing and novel compounds
  • Users able to view the compound labels, images, and schemes as well as the spectra associated with compounds
Wiley Functional Chemistry site highlighting the relationship between schemes, labels, and text.
Functional Chemsitry offered users the option of navigation by scheme or label before the Chemical Abstract Service touched the files.
Wiley Functional Chemistry site showing specific reagents highlited
Wiley tagged all the compounds in text, but at the time, the purpose was highlighting.

Data Lives

  • Live data, annotations allow authors and users to constantly update a publication
  • If an author updates the data in GitHub, the graphic representation of it will change in the article
  • What, then is the "publication of record"?
  • Can this be managed by versioning?

Use Case: Cochrane Library Systematic Reviews

Cochrane Reviews are systematic reviews of primary research in human health care and health policy, and are internationally recognised as the highest standard in evidence-based health care. They may either investigate the effects of interventions for prevention, treatment, and rehabilitation, or alternatively may assess the accuracy of a diagnostic test for a given condition in a specific patient group and setting. A unique feature of Cochrane Reviews is that they are living documents in that they are updated with new evidence that emerges. They were conceived as electronic publications from the outset, and designed to take advantage of features unique to electronic publishing.

There is still work to do

  • We have come a long way in exploiting the value of data on the web and data about the web
  • Many questions remain
  • Interoperability is still a goal!

Thank you

Tzviya Siegman