Why is it important for Elementa articles to be machine-intelligible?

Stuart Shieber, James O. Welch, Jr. and Virginia B. Welch Professor of Computer Science, Harvard University.

“Text-mining articles within the context of a body of articles can expedite new discoveries”

Elementa is open access and open data. It is available in a number of machine-intelligible as well as human-intelligible formats. So that readers can access articles on a variety of hand-held devices, content is available in EPUB3, PDF, HTML, and Mobipocket. To be machine-intelligible, it is also available in XML and JSON.

We spoke to Stuart Shieber, James O. Welch, Jr. and Virginia B. Welch Professor of Computer Science at Harvard, who explained the importance of machine intelligibility in the context of text mining.

There are three primary reasons why machine intelligibility matters. Firstly, it allows researchers to text-mine within their own subject areas, in ways specific to those areas. Disciplines that have had data available for text mining have progressively found new ways to make use of it. Text-mining articles within the context of a body of articles can expedite new discoveries. For example, searching for genes that are mentioned together in a body of articles can help detect otherwise undiscovered relationships.
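The gene example above can be sketched in a few lines of Python. This is a minimal illustration only, not a description of any tool used by Elementa or by Professor Shieber: the function name `gene_cooccurrence`, the sample sentences, and the short gene list are all assumptions made up for the example. Real text mining would work from the full XML or JSON article text and a curated gene vocabulary.

```python
from collections import Counter
from itertools import combinations
import re


def gene_cooccurrence(articles, gene_names):
    """Count how many articles mention each pair of genes together."""
    # One whole-word, case-sensitive pattern per gene symbol.
    patterns = {g: re.compile(r"\b" + re.escape(g) + r"\b") for g in gene_names}
    pair_counts = Counter()
    for text in articles:
        # Genes found in this article, sorted so each pair has one canonical order.
        mentioned = sorted(g for g, p in patterns.items() if p.search(text))
        for pair in combinations(mentioned, 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical toy corpus; the sentences are invented for illustration.
articles = [
    "BRCA1 and TP53 are both altered in this cohort.",
    "We observed TP53 loss together with BRCA1 mutation.",
    "EGFR signalling was studied in isolation.",
]
counts = gene_cooccurrence(articles, ["BRCA1", "TP53", "EGFR"])
print(counts.most_common(1))  # [(('BRCA1', 'TP53'), 2)]
```

Pairs that recur across many articles are only candidates for a relationship; a researcher would still need to check them against the literature.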

Secondly, open, machine-intelligible content enables researchers to develop new methods of language processing and computer reading. When computer scientists and text miners have a corpus of data to work with, they can make new discoveries and devise new analytic methods. Although large amounts of data have been made available in some cases, raw data provided through XML facilitates new methods of text mining far more efficiently and flexibly.

Thirdly, text mining can inform public policy. For instance, open, machine-readable data gives funding bodies the potential to identify the research that their funding has generated.

To facilitate data deposit, Elementa has established a partnership with Dryad, an international repository of data underlying peer-reviewed articles in the basic and applied sciences.