Huskey is an associate professor and the chair of the Department of Classics and Letters at OU. He is also the director of the Digital Latin Library.
Witt is an associate professor in the Philosophy Department at Loyola University Maryland.
The DLL Review Registry Service aims to decouple the imprimatur from its association with any particular presentation and allow the indicators of quality to travel with the data and be communicated to end users in a plurality of visualizations. It likewise provides a method for verifying that a set of data is the published, peer-reviewed version.
People have been publishing digital versions of Latin texts — in the sense of making them available to the public — for at least as long as there has been an internet. Until recently, however, digital Latin texts did not include any of the features of critical editions, for a variety of reasons. Copyright restrictions have prevented the digitization of anything but the main text of existing editions. The technical challenges of representing a critical apparatus in a digital format are considerable. Perhaps the most significant barrier has been the problem of defining what a digital critical edition is. Then there is the issue of publication itself. Scholars are reluctant to try a new form of publication if they are not certain that their peers will equate it with existing forms.
For all of these reasons, the publication of digital critical editions of Latin
texts has been the primary objective of the DLL project from its beginning. It
is why representatives of three major learned societies, the Society for Classical Studies
(SCS), the Medieval Academy of
America (MAA), and the Renaissance Society of
America (RSA) have always been on the project’s advisory board, since
these three groups share a common interest in promoting the publication of
digital critical editions of Latin texts. Publication has also been the main
factor in the project’s funding, which has so far come solely from the Scholarly
Communications program of the Andrew W.
Mellon Foundation.
But what will the DLL publish? The answer is the data of an edition, not any particular presentation of it. This is not to say that the presentation of an edition's data does not matter, or that encoded data is not itself a type of presentation: the encoded data should be human-legible and reasonably clear, and an edition, broadly understood, is any means of making explicit an interpretation of a text.
(Creating an electronic edition is not a one-person operation; it
requires skills rarely if ever found in any one person. Scholarly
editors are first and foremost textual critics. They are also
bibliographers and they know how to conduct literary and historical
research. But they are usually not also librarians, typesetters,
printers, publishers, book designers, programmers, web-masters, or
systems analysts.)
The DLL itself puts this decoupling into practice by providing its own
applications for using LDLT texts. For example, its web-based DLL Viewer
is an ongoing scholarly project of Hugh Cayless that presents an LDLT edition’s
data in a user-friendly, feature-rich online interface based on the JavaScript
library CETEIcean.
Our aim is to provide a means of independently verifying that a text presented in
an interface has undergone scholarly peer review. In this way, textual editors
receive credit for their work when it is used in other projects, and interface
designers and others can certify that their data come from the authorized,
published version of a text. The DLL Review Registry Service fills that need by
providing what we are calling a traveling imprimatur.
The DLL Review Registry Service is one piece of a larger ecosystem that is trying to change the way we think about publication and quality control. In traditional models, publication and the imprimatur of quality control have been tightly coupled. The publisher is responsible not only for printing and distributing a text, but also for providing an imprimatur that indicates that the text in question has reached a certain level of quality. A consequence of this approach is that the imprimatur of a text is bound to a particular presentation of the underlying data. This coupling means that if one wants to access or use the reviewed text, one must do so exclusively within the confines of the particular presentation offered by the publisher. This requirement severely limits our ability to reuse the underlying data for new and unanticipated purposes, and it likewise disincentivizes scholarly interest in creating reusable data of high quality.
One of the most exciting possibilities afforded to us by the digital medium is the ability to use and re-use a text for a plurality of purposes and within a plurality of presentations. With a single source document we can create books, websites, databases, and networks tailored to specific research questions.
Unfortunately, the dominant publication practices have not kept pace with these changes. While it is now technologically possible to separate the underlying data of a text from its presentational form, the practice of coupling an imprimatur to its publication practically demands that the reviewed text can only be used in a single presentational form, severely limiting the way a peer-reviewed text can be reused.
The DLL and its partners aim to offer a new kind of peer review that separates the task of review and quality control from the task of publication and distribution.
In our model, each of the partner organizations undertakes the tasks of reviewing the underlying data of a given text rather than any particular presentational form. The DLL Review Registry Service exists to help record these reviews and make them accessible for re-use throughout the web in any publication platform and any client viewer.
On this approach, reviews are tied to the reviewed text using a cryptographic
hash.
The DLL Review Registry Service is built around these unique fingerprints. Every review certificate (described below) indicates the hash of the file that has been approved. It then associates this hash with a rubric or set of criteria used by the society to make their evaluation.
Meanwhile, any publication application, whether print or web, can use the fingerprint of the file to discover any existing reviews. Using the DLL Review Registry Service API, a client application can compute the hash of any file it has. It can then send that hash to the DLL endpoint and discover any review of that precise file. For any request, the DLL endpoint will respond with a machine actionable set of metadata about the review for this file.
Included in this response is a link to a society's review rubric.
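The fingerprint in question is an ordinary SHA-256 digest of the file's bytes, which a client can compute with standard library tools alone. A minimal sketch in Python (the function name is ours; the route composition follows the API description later in the article):

```python
import hashlib

def sha256_fingerprint(path: str) -> str:
    """Compute the SHA-256 hex digest of a file, reading in chunks so
    that large editions do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# A client would then send the digest to the documented lookup route, e.g.
# https://dll-review-registry.digitallatin.org/api/v1/reviews/<digest>
```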
In this regard, the DLL Review Registry Service operates as one of possibly many indexing or discovery services. This service is therefore centralizing only in the sense that it keeps track of review certificates in a single registry. It thereby provides a convenient API for clients to discover these certificates. However, because the authority of the review certificates does not come from its association with a given domain name or API endpoint, but from the certificate's signature, anyone can create a similar indexing service, allowing clients access to alternative discovery APIs via other web domains.
In sum: Instead of confining the imprimatur to a particular book or particular
website, the DLL Review Registry Service assigns an imprimatur to the underlying
data. The service allows that review to be instantly accessed regardless of
where or how that text is being published. In this way, the imprimatur travels with the text wherever it goes, which is why we refer to it as the traveling imprimatur.
What follows is a technical overview of the 1.0.0 DLL Review Registry Service. Given the fast pace of web development and our heavy use and modification of emerging technologies, the following description should not be viewed as documentation of the production system; for up-to-date documentation, the production system itself should be consulted (https://dll-review-registry.digitallatin.org). Instead, we offer here a description of the basic architecture of the review system with sufficient technical detail to make that pattern clear and intelligible. While the specific details of how that pattern is executed will undoubtedly change over time, it is the larger pattern and workflow that we aim to communicate here.
There are three main interactions that the DLL Review Registry Service anticipates: the creation and submission of reviews, the discovery and retrieval of reviews, and the verification of review certificates.
In order to ensure the long-term survival of these reviews and to avoid the pitfalls of a service with a single point of failure, we have attempted to enact all three aspects in the most decentralized way possible, and it will be the goal of further development of the system to continue moving in this direction as new technologies emerge and stabilize.
As mentioned above, this means that while for convenience the service will offer a public-facing site and a centralized index of reviews, the issued review certificates themselves are not dependent on the index for their existence or retrievability. Moreover, reviews will be created and published in such a way that the authenticity of their content is not dependent on the origin from which they are retrieved. That is, one need not infer the legitimacy of the review based on the domain name or service from which it was retrieved. Instead the certificates will carry within themselves sufficient information for new aggregation and new index services to be constructed from verified content. Thus, it will be possible for multiple indices of these reviews to exist at the same time. More importantly, this approach allows subsequent indices to easily replace the existing index service should that become necessary.
When a society is ready to submit a review for an approved edition, it can submit that review in one of three ways: via a webform; via a POST request to the DLL Review Registry API; or by constructing, signing, and publishing the review itself and then submitting a minimal POST to register the existence of the review. The last option may lack the convenience of the first two, but it exists to emphasize that the creation and verification of these reviews is based on a set of rules and protocols, not on the existence of a particular website. Registration with the DLL is then simply a way of letting particular aggregators know about the existence of a review created according to the community standards and protocols described here. The aggregator then functions as a convenient discovery endpoint whereby third-party clients can discover the existence of reviews of interest.
In its most basic form, a review is a JSON document that takes as its
template the OpenBadges
specification.
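As an illustration, a minimal certificate of this kind might be sketched as follows. Only the recipient, badge, and verification fields reflect the article's description; the remaining field names and all values are hypothetical placeholders, not the service's actual schema:

```python
import json

# Illustrative review certificate modeled on the OpenBadges assertion
# format. All values are placeholders.
certificate = {
    "@context": "https://w3id.org/openbadges/v2",
    "type": "Assertion",
    "recipient": {
        # Instead of a name or email, the recipient identifies the IPFS
        # hash of the reviewed document(s).
        "type": "ipfs-hash",
        "identity": "Qm...placeholder-document-hash",
    },
    # Reference to the badge/rubric corresponding to the approval level.
    "badge": "https://example.org/badges/society-approval-level",
    "verification": {
        "type": "signed",
        # Public key of the issuing society, so the GnuPG signature can
        # be checked regardless of where the certificate was fetched.
        "creator": "https://example.org/keys/society-public-key",
    },
    "issuedOn": "2020-01-01T00:00:00Z",
}

print(json.dumps(certificate, indent=2))
```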
To do this we make use of another emerging technology called IPFS, the InterPlanetary File System. With the
help of IPFS we are not only able to generate a unique fingerprint (a SHA256
hash) of the file being reviewed, but via the IPFS network we are actually
able to retrieve this file by its hash or content rather than its
location.
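The relationship between a plain SHA-256 hash and an IPFS-style identifier can be sketched as follows. This illustrates only the multihash-plus-base58 encoding: real IPFS hashes are computed over a block/DAG representation of the file, not its raw bytes, so the value below will not match an actual IPFS hash of the same file:

```python
import hashlib

B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58_encode(data: bytes) -> str:
    """Minimal base58 encoder (Bitcoin/IPFS alphabet)."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, r = divmod(n, 58)
        out = B58[r] + out
    # Preserve leading zero bytes as '1' characters.
    return "1" * (len(data) - len(data.lstrip(b"\x00"))) + out

def cidv0_like(payload: bytes) -> str:
    """Illustrative CIDv0-style identifier: base58(0x12 0x20 + sha256).
    Real IPFS hashes a DAG/UnixFS encoding of the file divided into
    blocks, not the raw bytes, so this is a teaching sketch only."""
    digest = hashlib.sha256(payload).digest()
    return base58_encode(b"\x12\x20" + digest)

# The 0x12 0x20 multihash prefix (sha2-256, 32 bytes) is why CIDv0
# identifiers always begin with "Qm".
```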
Our modified OpenBadges certificate uses the recipient field to identify the IPFS hash of the document or documents being reviewed, rather than a name, email, or URL. In addition, the certificate itself should include the public key of the agency or institution that signed the document (see the section on verification below). This public key can itself be hashed and made addressable via the IPFS network, as well as being stored and addressed via the DLL Review Registry site.
A basic certificate can be seen here:
The construction of this certificate is the main task of creating a review. The resulting document itself can be hashed and pinned to the IPFS network. But in order for this certificate to be discoverable, the hash itself needs to be given to a registry or index, such as the DLL Review Registry Service.
This could be done manually or through an API service. However, the more likely case is that an issuer will prefer to use an automated system for hashing the various files, publishing them to the IPFS network, as well as digitally signing the resulting certificate.
The DLL Review Registry Service provides this service through a webform, seen in Figure 1.
Submission through the webform takes five basic parameters: the Review Text URL, the Review Society, the Review Approval Level, a Review Summary, and the Review Submitter.
The Review Text URL is a URL from which the document or documents being reviewed can be retrieved. If the document has already been hashed and pinned to the IPFS network, then an IPFS gateway address can be used, but any other URL will work as well. This information is not embedded in the certificate, but is used to create the recipient hash embedded in the issued certificate.
The Review Society field is the place for authorized users to select the society issuing the certificate. A society listed here means that the DLL Review Registry Service contains and protects a copy of the society’s private key and is therefore authorized to issue certificates signed with this key (again, see below). The selection of a society will likewise determine which public key gets inserted into the review certificate.
The Review Approval Level is a field for reviewers or institutions to indicate the rubric used for the review. The possibility of indicating approval codes creates a way for reviews to escape the binary of an all or nothing review. A society, if it wishes, can create different kinds of reviews corresponding to different rubrics, allowing them to acknowledge quality work that meets different criteria. The code indicated here will determine which badge is issued in the resulting certificate.
An example of such a rubric might look like the rubric seen below in Figure 2.
A Review Summary is a field for leaving any pertinent details about the particular review. It could be inserted into the certificate itself, but at present it is used merely as a place for a notice about reviews issued through the DLL Review Registry Service.
The Review Submitter field is, again, a field used only by the DLL Review Registry Service to keep track of the individual who created the review through the service and on behalf of a particular society. At present it is not used in the creation of a certificate.
Upon submission of the above review information, the DLL Review Registry Service performs several actions.
First, using the Review Text URL(s), the service retrieves the designated
file(s) and pins the file to an IPFS node. At the same time it records the
IPFS hash and the plain SHA-256 hash. (It should be noted that an IPFS hash is also a SHA-256 hash, plus a prefix, encoded in base58, representing the file divided into a series of blocks.)
Second, the service constructs the review certificate using the newly minted recipient hash, the review approval code information, and the public-key of the reviewing society.
Third, the service pins the newly generated certificate to the IPFS node, generating an IPFS hash for the certificate. This hash is stored in the registry index alongside the hash of the reviewed documents.
Fourth, the service digitally signs the generated certificate using GnuPG and the institution’s private key. This process is described below. The resulting signatures are themselves hashed and pinned to the IPFS node. These hashes are finally stored in the index alongside the hash of the document and unsigned certificate.
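The four steps can be sketched as a single function. This is a simplified model, not the service's implementation: IPFS pinning is omitted, all fingerprints are plain SHA-256 hashes, and the signing step is abstracted into a callable that would, in production, wrap GnuPG:

```python
import hashlib
import json

def issue_review(document_bytes: bytes, badge: str, sign) -> dict:
    """Simplified model of the four issuance steps. `sign` clear-signs
    bytes; in production it would wrap GnuPG, e.g. via
    subprocess.run(["gpg", "--clear-sign"], input=..., ...)."""
    # Step 1: retrieve and fingerprint the reviewed document.
    doc_hash = hashlib.sha256(document_bytes).hexdigest()

    # Step 2: construct the certificate around the recipient hash and
    # the approval code (public key omitted in this sketch).
    cert_bytes = json.dumps(
        {"recipient": doc_hash, "badge": badge}, sort_keys=True
    ).encode()

    # Step 3: fingerprint the certificate for the registry index.
    cert_hash = hashlib.sha256(cert_bytes).hexdigest()

    # Step 4: sign the certificate and index the signature's hash too.
    sig_hash = hashlib.sha256(sign(cert_bytes)).hexdigest()

    return {"document": doc_hash, "certificate": cert_hash,
            "signature": sig_hash}
```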
While content addressability offers assurance that the document received is the document requested, a means of verifying that a given society really did create this review certificate is still needed. After all, the public key of any reviewing institution is public and therefore anyone could embed this public key into their own certificate.
The simplest way of providing this verification is through hosted
verification. Namely, one can have confidence that a review was issued by a
given society because that review was requested from the DLL Review Registry
Service’s domain. However, in a distributed system where certificates can
travel and are requested not by their location but by their content, we open
the possibility for a certificate to be retrieved from any IPFS node or via
HTTP through any IPFS gateway. Thus, we need a mechanism of verification
independent of the document’s point of origin. GnuPG digital signatures were
designed for precisely this purpose.
Using the private key of a reviewing society, we can generate a
signature
for a given certificate. This signature is a hash
uniquely generated from the private key of the issuer and the data being
signed. Importantly, the process can never be reversed. The private key can
never be generated from the public key, the signature, the data, or any
combination thereof. However, the fact that this signature was generated
from the society’s private key can be verified against the combination of
the data to be verified and the society’s public key.
In other words, a document (in this case, the certificate) can be verified as
being issued by a society with a specific public key by putting the data and
the signature together and checking it against the society’s public key. The
check will confirm that this signature could only have been made by the
combination of this society’s private key and the submitted certificate.
Change one bit in the certificate (in this case, one would most likely be
tempted to change the hash of the recipient
document) or one bit in
the public key and the signature will no longer pass verification.
At present the DLL Review Registry Service creates two kinds of signatures, allowing two methods of verification.
The first method is a detached signature: a signature stored in a separate file, generated from the institution's private key and the issued certificate. The DLL Review Registry Service again pins this resulting signature to the IPFS network and stores its hash as part of the indexed record.
Using a detached signature and previously imported public key, the
certificate can then be verified as follows: $ gpg --verify
[path/to/signature] [path/to/certificate]
The drawback to using a detached signature is that the signature is detached from the certificate and creates another file that needs a central indexing service to associate with the issued certificate.
In order to reduce this reliance on a central indexing service, the DLL
Review Registry Service also produces a clear-signed
version of the
original certificate. This means a new version of the certificate is
produced that includes both the original content of the certificate as well
as a signature. With a clear signed certificate, the verification process
can both verify the certificate and return the original un-signed
certificate. The certificate can be validated with the following command: $ gpg --verify [path/to/clear/signed/certificate]. The original unsigned certificate can be generated from the signed certificate with $ gpg -d -o original.txt [path/to/clear/signed/certificate]. The resulting document original.txt can be further verified against unsigned-certificate.txt by comparing the hashes of each file or by performing a Unix diff command against both files, e.g. $ diff original.txt unsigned-certificate.txt.
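The same comparison can be performed programmatically; a small sketch (the function name is ours):

```python
import hashlib

def same_content(path_a, path_b):
    """True when two files are byte-for-byte identical, i.e. when their
    SHA-256 digests match -- the programmatic equivalent of comparing
    hashes or running diff on the two certificates."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        return hashlib.sha256(a.read()).digest() == hashlib.sha256(b.read()).digest()
```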
From a verified certificate one can have complete confidence that the recipient document in the certificate is the precise document reviewed by the society. This is the case because the certificate does not point to a location but to the precise hash of the reviewed document. Addressing the document by its hash rather than its location via the IPFS network provides assurance that one is retrieving the very same document that was used to make the certificate and verified via its signature. Change one bit in the hash in the request and a different document will be retrieved. Change one bit in the hash in the certificate and the certificate will no longer be verified.
These verifications can be done entirely separately from the DLL Review Registry Service website, but the service also offers a web form and API through which certificates can be verified, as seen in Figure 3.
Finally, third party services may want to be able to use the DLL Review Registry Service to discover verified review certificates for documents of interest and to be able to communicate the review status of those documents and verify that status back to end users.
Naturally, a human user can use the DLL Review Registry Service to browse reviews, view html renderings of reviews, verify certificates, and read the rubrics used by each society. This, however, is not the primary place an end user is expected to encounter this information. Rather, the primary expected pattern is that a client application will access this information using the DLL Review Registry Service API and offer that information back to the end user within its own interface. (See examples below.)
The API for data retrieval in the 1.0.0 version consists of three main routes: (1) the api/v1/review/:id route; (2) the api/v1/reviews/:hash or ?url=document_url route; and (3) the api/v1/verify?url=clearsigned_url route.
The api/v1/review/:id
route (1) allows a client that knows
beforehand the indexed ID of a DLL Review Registry Service review to look up
the review report. This is an ID that is unique to the centralized index
service. The report in turn returns the information stored in the index
including the document hash, certificate hash, detached signature hash, and
the clear-signed certificate hash.
For a route like the following:
https://dll-review-registry.digitallatin.org/api/v1/review/f4b8dc2f-4d41-478d-877b-843b46e21283
a sample expected output would be:
From a decentralized perspective, the clear-signed certificate hash is the only piece of information needed. Every other piece of information can be extracted from it.
The clear-signed certificate contains the reference to the public key which can be used to verify the signed certificate and to generate the original certificate. The certificate itself contains the review report as well as the reference to the hash of the reviewed documents.
In most cases, however, it is not expected that a client will know the DLL Review Registry Service specific review ID. More likely, a client will be serving a document and it will want to be able to inform its user quickly as to whether or not this precise document has a review. This brings us to the second provided API route.
The api/v1/reviews/:hash
or ?url=document_url
route
(2) offers this possibility.
In this second, more common case, a client can supply a pre-computed sha-256 hash and receive a list of all reviews for any document with the same fingerprint or hash. Likewise, the client can supply, if it knows it, the IPFS hash and the API will return all reviews for documents having that hash. The IPFS hash and sha-256 hash will always correlate, meaning that a sha-256 hash will always return the same list as its corresponding IPFS hash and vice versa.
Finally, if a client does not want to pre-compute the hash of a document, it can simply supply the URL of the document it is serving using the ?url parameter. In this case, the DLL Review Registry Service will use the supplied URL to retrieve this file, compute its hash, and return a list of reviews that correspond to the text.
This is perhaps the most likely scenario for a consuming client, as it presumes no prior knowledge on the part of the client except for the location of the text it wants to check. By providing the DLL registry with the location of this text, it can easily check whether or not the precise version of this text has received a review and then receive pertinent metadata about that review.
For the following request
https://dll-review-registry.digitallatin.org/api/v1/reviews/QmdgZhAFTepfXmUsxEGaVampJubZ7XMk4pC42uzHgBEgvy
a sample expected response would appear as follows:
In the above response one can see that two separate reviews exist for a file with identical content.
The API also allows easy filtering of different kinds of reviews. If a user
or application only wants to see a review by the MAA (or another specific
society) and exclude all other reviews, it can simply add a parameter to the
request like so:
http://reviews.digitallatin.org/api/v1/reviews/45304964c8bf9fb63737fa54e701b765baea0d950ff396c8fc686dd9bfda0416?society=MAA
This request will return an array of only those reviews that come from the
MAA.
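A client-side sketch of composing these route-(2) lookups, with the base URL taken from the article; the helper names and the response handling are our own assumptions rather than a documented client library:

```python
import json
import urllib.parse
import urllib.request

# Base URL as given in the article.
BASE = "https://dll-review-registry.digitallatin.org/api/v1"

def reviews_url(fingerprint=None, document_url=None, society=None):
    """Compose a route-(2) lookup from either a pre-computed SHA-256 or
    IPFS hash, or from a document URL, optionally filtered by society."""
    params = {}
    if fingerprint:
        url = f"{BASE}/reviews/{fingerprint}"
    else:
        url = f"{BASE}/reviews"
        params["url"] = document_url
    if society:
        params["society"] = society
    return url + ("?" + urllib.parse.urlencode(params) if params else "")

def fetch_reviews(lookup_url):
    """Retrieve and decode the review metadata (a network call; the
    response shape is sketched from the article, not a stable contract)."""
    with urllib.request.urlopen(lookup_url) as resp:
        return json.loads(resp.read())
```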
Finally, the api/v1/verify?url=clearsigned_url route (3) allows clients to verify a clear-signed certificate.
For the route api/v1/verify?url=clearsigned_url, a client can
provide the URL of a clear-signed certificate and the API will return a very
simple response as follows:
The expected pattern is that a client could offer its users access to the
review certificate for a given document by first sending the URL of the
document to the Review Registry Service. In the response, the client would
find the hash of the unsigned and clear-signed signature. The client can
then offer these certificates to end users. If end users want to verify the
authenticity of the certificate, clients can use the retrieved url of the
clear-signed certificate and the DLL Review Registry
/api/v1/verify
route to offer end users a verification.
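The final verification call can be composed the same way (a sketch; the helper name is ours):

```python
import urllib.parse

BASE = "https://dll-review-registry.digitallatin.org/api/v1"

def verify_url(clearsigned_url):
    """Compose the route-(3) request that asks the registry to verify a
    clear-signed certificate located at `clearsigned_url`."""
    return BASE + "/verify?" + urllib.parse.urlencode({"url": clearsigned_url})
```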
A high-level illustration of anticipated interactions between the DLL Review Registry and Client systems can be visualized below in Figure 4.
Below are three brief examples of third party clients using the DLL Review Registry Service.
Figure 5 shows the LombardPress project's LombardPress-Web (Lbp-Web) application displaying a critical edition. Lbp-Web does not contain any text on its server, but uses the Scholastic Commentaries and Texts Archive to locate data sources based on various query parameters; it then makes a further request for that data, which can be distributed anywhere on the web, and finally displays that data to the end user. In this case, Lbp-Web has discovered a data source, an XML file for the edition in question.
Figure 6 provides a second example of basically the same process. What is important here is the emphasis on the way the same imprimatur can travel with the data despite the highly varied use of the data in question. In the Ad Fontes application — an application designed to allow users to explore individual quotations throughout the Scholastic corpus — when a user requests to see the context paragraph of a given quotation, the application knows that this paragraph has been pulled from a larger data source (a larger XML file). At the same time that the application is showing the end user the context paragraph, it can also send an asynchronous request to the DLL Review Registry Service to check if there have been any reviews for the data source from which this context paragraph has been extracted. If so, it can display the review badge and data source hash alongside the context paragraph, as seen in the bottom right corner of Figure 6.
Finally, Figures 7
and 8 illustrate the way the traveling
imprimatur can be implemented in an even wider array of applications to
provide users confidence when merging content in a Linked Data world. Figure 7 illustrates how Linked Data Notifications
(LDN) have been used to make announcements about related but distributed
content. In this case the Scholastic Commentaries and Texts Archive has
published a transcription via the IIIF Open Annotation Model about a
manuscript owned by the Bayerische Staatsbibliothek and made accessible via a
IIIF Manifest. Through the use of this Linked Data, LDN, and the Mirador-LDN-Plugin, a user can discover and import this related content directly within the manuscript viewer.
Figure 8 offers a second example of this use of the traveling imprimatur. This time the additional content for the British Library Manuscript has already been imported. The visual imprimatur and accompanying IPFS hash for the data source from which this transcription was extracted are provided for the end user. The end user can click on the imprimatur to learn more about the review and the rubric used to make the review or can use the IPFS hash to access the raw data source directly.
It is important to acknowledge that web technologies are evolving at a rapid
pace. This includes the web technologies described above. New possibilities
are emerging every day. Accordingly, it needs to be reiterated that the
central import of the present article is to describe the actualization of an
idea and the general architecture of that achievement rather than the particular details of its implementation. Above we have tried to show how we
have built a system that moves the idea of the Traveling Imprimatur
to a reality in production. Beyond providing a production service, this
actualization is also meant to show that social rather than technical
obstacles are causing delays in decoupling the distinct and separable
functions of publication and quality control.
Today, the need to demonstrate that this idea can be actualized is urgent
because of the persistent temptation to view the construction of a digital
edition as something equivalent to or on par with a printed book, where the
edition itself is bound to and identified with a particular visual
presentation of that data.