This document describes the application programming interface (API) of Mr. DLib, which is an acronym for machine-readable digital library. The first section of the document gives an overview of Mr. DLib’s RESTful API. The second section describes the interface and its parameters. The third section lists the available resources and the information they offer. The fourth section summarizes the development status of individual API components and gives an outlook on planned features.

1 Web Services offered by Mr. DLib

This section introduces the web services offered by Mr. DLib via a RESTful API. The REST architecture closely resembles the design of the HTTP standard. This similarity of basic concepts in REST and HTTP simplifies the usage of the API for web developers.

REST is resource-focused. The main resources in Mr. DLib are documents (journal and conference articles, conference proceedings, books, and theses), persons (authors, editors, and conference chairs), conferences, journals, and organizations (universities and publishers). Wherever possible, Mr. DLib links these resources to each other, e.g. authors to articles and articles to conference proceedings.

Retrieving data from Mr. DLib requires only a simple HTTP GET request on a URI using an HTTP client (e.g. a web browser). Analogously, adding, modifying and deleting resources in Mr. DLib will be possible via the POST, PUT and DELETE methods defined in the HTTP standard (not yet part of the public API). Likewise, the Mr. DLib API uses status codes adopted from the HTTP standard. For instance, if a resource does not exist, Mr. DLib returns a 404 NOT FOUND error; if a request is successful, the API returns a 200 OK status.
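As a minimal sketch of this status-code convention, the following Python function performs a GET request with the standard library and maps 200 OK to the response body and 404 NOT FOUND to `None`. The function name `fetch_resource` is hypothetical, and because this document does not reproduce Mr. DLib's base URL, the caller passes a complete resource URL.

```python
import urllib.error
import urllib.request
from typing import Optional


def fetch_resource(url: str, timeout: float = 10.0) -> Optional[bytes]:
    """GET a resource URL (hypothetical helper).

    Returns the body on success, None on 404 NOT FOUND,
    and re-raises any other HTTP error.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # urlopen returns normally only for 2xx responses (redirects are followed)
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            # The requested resource does not exist
            return None
        raise
```

Returning `None` for 404 keeps "resource does not exist" separate from genuine failures such as server errors, which still surface as exceptions.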

2 API Overview

Mr. DLib is accessible via the HTTP protocol. The base URL is:

The complete URL structure is:

All arguments in parentheses, i.e. all except resource_type, are optional. The individual parameters are:
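The following sketch assembles a request URL from these arguments. Because the exact URL template is not reproduced in this document, it assumes that the optional resource_sel and resource_element follow resource_type as slash-separated path segments and that the parameters form the query string; the helper name `build_url` and the base URL `https://mrdlib.example.org` are hypothetical placeholders.

```python
from typing import Optional, Union
from urllib.parse import quote, urlencode

# Hypothetical placeholder; substitute Mr. DLib's actual base URL.
BASE_URL = "https://mrdlib.example.org"


def build_url(base: str,
              resource_type: str,
              resource_sel: Optional[Union[int, str]] = None,
              resource_element: Optional[str] = None,
              **parameters: Union[int, str]) -> str:
    """Assemble a request URL (assumed layout: base/type/sel/element?params)."""
    # resource_type is the only mandatory argument
    segments = [resource_type]
    for optional in (resource_sel, resource_element):
        if optional is not None:
            segments.append(str(optional))
    # Percent-encode each segment so that e.g. names stay a single segment
    path = "/".join(quote(segment, safe="") for segment in segments)
    url = f"{base.rstrip('/')}/{path}"
    if parameters:
        # e.g. format, number, page; urlencode escapes the values
        url += "?" + urlencode(parameters)
    return url
```

Under these assumptions, `build_url(BASE_URL, "documents", 1234, format="json")` yields `https://mrdlib.example.org/documents/1234?format=json`.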

resource_type: This argument specifies the type of the requested resource. The resource types available in Mr. DLib are detailed in section 3. For instance, requesting the URL returns a list of all documents in Mr. DLib. The output includes basic metadata for each document and a URI pointing to the specific document resource, as shown in Figure 1.

Figure 1: List of documents in XML format
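Responses are XML by default, as in Figure 1, or JSON when format is set to 'json'. A client can decode either with Python's standard library, as in this minimal sketch. The function name `parse_response` is hypothetical, and since the element names of Mr. DLib's XML output are not listed in this document, the sketch returns the parsed tree rather than extracting specific fields.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any


def parse_response(body: bytes, fmt: str = "xml") -> Any:
    """Decode a response body according to the 'format' parameter (hypothetical helper)."""
    if fmt == "xml":
        # Root Element of the document; field names depend on Mr. DLib's schema
        return ET.fromstring(body)
    if fmt == "json":
        # json.loads accepts UTF-8 encoded bytes directly
        return json.loads(body)
    raise ValueError(f"unsupported format: {fmt!r}")
```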

resource_sel (Resource Selector): Resource selectors limit the returned result set to specific resources. Depending on the resource type, different selectors can be used; these are explained in section 3. Each record in Mr. DLib has a unique numeric identifier (ID), which is a valid selector for all resource types. Details of a specific resource can be retrieved by requesting a URL of the form: . For instance, requesting the URL returns all data for the document having the ID 1234, as shown in Figure 2.

Figure 2: Metadata for a document (excerpt)

resource_element (Resource Element): The argument can be used to limit the output to certain elements of the requested resource. The web service returns different data elements depending on the resource type and the availability of the specific data element for a certain entity. For instance, a journal article consists of elements including title, authors, keywords, publishing date, page numbers, and URIs to full-texts.

parameters: The following parameters control the behavior of Mr. DLib:

  • format: specifies the data format of the returned results. Mr. DLib supports the following formats:
    • ‘xml’ (default): XML
    • ‘json’: JSON
  • filter (unstable API): a special parameter that limits the original query to a subset of the data contained in Mr. DLib. For instance, issuing the request would display all documents in Mr. DLib’s database originating from
  • dlo (date low) and dhi (date high): filter documents by publication date. For instance, issuing the request returns documents published in January and February 2012.
  • number (default=1000): adjusts the number of results per page
  • page (default=1): sets the page number of the result list to be returned
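Clients that need more results than fit on one page can step through the page parameter while keeping number fixed. The sketch below assumes that a page holding fewer than number records is the last one, which this document does not state; `iterate_results` and the caller-supplied `fetch_page` callback are hypothetical names.

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def iterate_results(fetch_page: Callable[[int, int], List[T]],
                    number: int = 1000) -> Iterator[T]:
    """Yield the records of all result pages in order.

    fetch_page(page, number) is a caller-supplied (hypothetical) function that
    requests one page using the 'page' and 'number' parameters and returns
    the parsed records of that page.
    """
    if number < 1:
        raise ValueError("number must be positive")
    page = 1  # the documented default first page
    while True:
        records = fetch_page(page, number)
        yield from records
        # Assumption: a page with fewer than `number` records is the last one
        if len(records) < number:
            return
        page += 1
```

Keeping the HTTP request inside `fetch_page` lets the same loop work with either output format and makes it easy to test without network access.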

3 Specification of Resource Types

This section describes the resource types available in Mr. DLib, the respective data elements they consist of and the selectors available for retrieving individual resources.

3.1 authors

The authors resource type is used to store author information in Mr. DLib.

3.1.1 Resource Selectors
  • none: Lists all authors
  • {id}: Returns only the author resource with the specified numeric ID
  • {name}: Returns the authors whose last name starts with the specified string
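Since {id} and {name} occupy the same selector position, a client only has to place either the numeric ID or the last-name prefix there. A minimal sketch, assuming the layout base/authors/selector and the hypothetical base URL `https://mrdlib.example.org`; percent-encoding keeps names containing spaces or non-ASCII characters such as “Müller” intact. The helper name `author_url` is hypothetical.

```python
from typing import Union
from urllib.parse import quote

# Hypothetical placeholder; substitute Mr. DLib's actual base URL.
BASE_URL = "https://mrdlib.example.org"


def author_url(selector: Union[int, str], base: str = BASE_URL) -> str:
    """Build an authors URL for a numeric ID or a last-name prefix (assumed layout)."""
    # quote() encodes non-ASCII characters as UTF-8 percent escapes
    return f"{base.rstrip('/')}/authors/{quote(str(selector), safe='')}"
```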
3.1.2 Resource Elements


3.2 documents

The documents resource type is used to store document metadata in Mr. DLib.

3.2.1 Resource Selectors
  • none (POST): Adds a document given as an XML or PDF file to Mr. DLib and returns the newly created resource (access restricted)
  • none (GET): Lists all documents
  • {id}: Returns the document resource having the given ID
  • {q}: Returns the documents matching a Lucene search query. Lucene’s query format is described in the Lucene documentation. The following fields are available in Mr. DLib:
    • id: ID
    • type: article type
    • title: article title
    • abstract: article abstract
    • author: document authors
    • affiliation: affiliations of the document authors
    • keyword: keyword
    • publishedin: the venue (e.g. the journal or conference proceedings)
    • pubdate: publication date
    • location: place of publication
    • publisher: the publisher
    • edition: journal edition
    • number: journal number
    • volume: journal volume
    • pages: pages of the document in the journal
    • series: series of the journal
    • doi: DOI
    • issn: ISSN
    • isbn: ISBN
    • lang: language
    • oid: ID at other sources. Format: <source>/<source_id>
    • reflist: reference list of the article


    • Example: querying the author field for “John” returns all documents that have an author whose name contains “John”
    • Example: additionally requiring “study” in the abstract field returns all documents that have an author whose name contains “John” and whose abstract contains “study”
    • totalamount: Returns the number of documents in Mr. DLib
    • latestpublicationdate: Returns the latest publication date of all documents
    • title: Returns document title(s)
    • abstract: Returns document abstract(s)
    • authors: Returns author(s)
    • fulltexts: Returns document fulltext(s)
    • xrefs: Returns document cross-reference(s)
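The field:value syntax used by the {q} selector can also be produced programmatically. In this minimal sketch with hypothetical helper names, each value is wrapped in double quotes as a phrase, a choice made for this sketch so that characters such as ':' or '/' (which occur, for example, in oid values) are not interpreted as Lucene syntax; the finished query is then percent-encoded so that it can be placed in a URL. Where exactly the query goes in a Mr. DLib URL is not reproduced in this document.

```python
from urllib.parse import quote


def lucene_clause(field: str, value: str) -> str:
    """Return a field:"value" clause, e.g. author:"John"."""
    # Inside a quoted phrase only backslash and double quote need escaping
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def lucene_and(*clauses: str) -> str:
    """Combine clauses so that all of them must match."""
    return " AND ".join(clauses)


def encode_query(query: str) -> str:
    """Percent-encode a query, including ':' and spaces, for use in a URL."""
    return quote(query, safe="")
```

For example, `encode_query(lucene_and(lucene_clause("author", "John"), lucene_clause("abstract", "study")))` yields `author%3A%22John%22%20AND%20abstract%3A%22study%22`. Note that phrase quoting disables wildcards such as `Jo*`; a client that needs them would have to escape Lucene's special characters individually instead.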
3.2.2 Resource Elements


3.3 fulltext

The fulltext resource type is used to store the full text of documents in Mr. DLib.

3.3.1 Resource Selectors
  • {id}: Returns the document fulltext having the given ID
3.3.2 Resource Elements


3.4 organizations

The organizations resource type is used to store information about organizations such as universities and publishers in Mr. DLib.

3.4.1 Resource Selectors
  • {id}: Returns the organization having the given ID
3.4.2 Resource Elements


3.5 tools

The tools resource type provides data processing utilities that do not necessarily depend on the data contained in Mr. DLib.

3.5.1 Resource Selectors
3.5.2 Resource Elements


3.6 xref

The xref resource contains information about cross-references from individual resources to other services. For example, a document may contain a cross-reference to a third-party service that also offers data on that document.

3.6.1 Resource Selectors
  • {id}: Returns the xref resource having the given ID
3.6.2 Resource Elements


4 Current Development Status and Future Plans

Mr. DLib is currently in beta: it is functional and already used by third parties such as Docear. We consider the available resources stable. However, authentication methods, URIs, filters, query parameters, and other details may change in future versions.

We are working on improving the interface by providing more flexible methods for selecting elements and filtering results. In addition, we plan to provide data import methods that include document de-duplication, author name disambiguation and access restrictions. Another planned extension is the integration of a recommender system for related articles.