Anna Bellotto holds an MA degree in Italian Philology from the University of Padova (Italy) and an MA degree in Digital Humanities from King’s College London (UK). In 2018 she joined the team of
Semantic Web technologies provide the ability to more effectively connect and
integrate structured data by disclosing their intended meaning and therefore
making explicit their description, context and provenance. Thanks to their
nature, Semantic Web technologies have produced insights into the challenges
associated with standardizing metadata for manuscripts. Scholars depend on
highly specific catalogue records in order to understand a manuscript and raise
research questions which take into account either its physicality or its nature
of evidence for all aspects of life in the medieval period.
The use of knowledge representation in the field of medieval manuscript descriptions is still limited, however. Against this state of the art, this paper attempts to add some evidence: it analyses the impact of a top-level ontology designed for modelling cultural objects, namely
How can the
European medieval manuscripts represent one of the most significant treasures of
human culture and society, revealing rich information about the past that is
invaluable to historical research. History, art history, literature and
philology, codicology and palaeography, all rely on the analysis of manuscripts,
and scholars in these disciplines engage with these objects in unique ways.
However, whether the main focus is the handwritten document in its
physicality or the textual content of the manuscript, ‘the first level of enquiry always is (or should
be) the document, the physical support that lies in front of the
scholar’s eyes’
Today, XML-encoded analytic descriptions of the physical and intellectual nature
of manuscripts work as digital surrogates of these artifacts while enabling the
knowledge contained in these descriptions to be interrelated and thus
potentially compared by users. Indeed, as stressed by Stinson a means for sorting, classifying,
and comparing collections of manuscripts
without metadata, there is no access and no meaning
requires
creating explicit machine-readable data that allow automated correlation
or collocation of related resources
This paper
Medieval manuscripts can be described within general databases, such as a library’s collection database, or specific manuscript databases. In the former context, records about a collection of manuscripts might be encoded in bibliographic technical and structural standards, as – for example –
The semantic and syntactical nature of XML allows granular and extensible records to be encoded, which makes it well suited to data about medieval manuscripts. Moreover, the
different levels/depth of manuscript descriptions
Barbero and Trasselli
Research on Semantic Web technologies has produced insights into the challenges associated with standardizing metadata for manuscripts. In particular, the application of ontologies in the domain of codicological and paleographical data has been evaluated as a clever approach towards better communication within the community. A clarification about what ontology means in this context, and its relation with the Semantic Web and Linked Data, constitutes a preliminary step in understanding which benefits ontologies can offer to the complex framework outlined in Section 2.
The word ontology
in a computational sense is derived from a long
established tradition in the philosophical field, namely the concept -
introduced by Aristotle - of a
particular system of categories accounting for a certain vision of the
world
formal, explicit
specification of a shared conceptualisation
classes
and concrete
examples of these concepts within a domain are called instances
;
relationships are named properties
ontology
, the model needs to reflect ‘a certain rate of consensus about the knowledge in
that domain’
Thanks to their nature, ontologies have played a fundamental role in the development of the Semantic Web. The concept of the Semantic Web, or
Linked Data refers to this technical set.
The role of the ontologies of sharing a common knowledge representation and
rendering domain assumptions
explicit
support communication processes
on contextualization of objects
In the same way, the terminological aspect which challenges the medieval
manuscripts area has found in the Semantic Web techniques a possible resolution.
Whether considering the perspective of multiple national languages or
recognising that an established palaeographical vocabulary is still missing,
scholars seem to agree that an ontology could represent a valuable answer
because it could be able to align different vocabularies in a conceptual map
Although many points of analysis could have been explored in the research area of this paper, Kummer’s proposal of testing the suitability of
Taking into account the more complex and various European settings previously described, it could be properly argued that the focus on a single schema and encoding method challenges the validity of the outcomes of an ontology implementation. It is thus important to stress some considerations. On the one hand, scholars’ judgements on the appropriateness of
Commonly, the granularity and depth of descriptions are influenced by the decisions made by each project’s cataloguers: the general guiding purpose affects the meticulousness of the cataloguing activity. These two projects offer two examples: whereas
For the aspects aforementioned, the differences of these two special projects, while being founded on a common ground — i.e.
‘imprecision, vagueness, lacunae’
Firstly, it is a top-level ontology, meaning that it delineates general
classes and properties, such as events, places and actors, which are independent of a particular problem
or domain
Secondly, at the foundation of
The implementation of
The implementation of an ontology should always start from defining
its domain and
scope
what knowledge do
you want to represent
Once the decision about how general the ontology is going to be has been made, the desired information has to be mapped to the
involving participation of people and things
As a consequence, a conceptual re-arrangement of all the details contained in the manuscript descriptions was considered a pivotal step in order to more easily map the
Regarding the practical mapping,
defines the model on a purely conceptual level
sequence[s] of semantically associated [CIDOC CRM] classes and properties, representing a specific concept
Finally, as previously delineated,
preserve the original semantics and/or to uniquely identify the metadata information
characterize and classify instances of CRM classes
[m]inimality modelling principle of CIDOC CRM
Since 2004, a unit within the
new elements for marking-up real-world information
is common to most TEI documents
events are not tagged
However, a manuscript description is a particular type of
This reasoning needs further consideration, though. Events need
to be placed in specific time frames and linked to identified actors in
order to model factual information. If this data is not encoded in tags such as
<date> and <name>, the individual histories of handwritten primary sources
cannot be retrieved. This was the main drawback affecting the mapping
process. In
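The dependence of event modelling on explicit tagging can be illustrated with a minimal sketch. It assumes two hypothetical TEI-like history statements, one with `<date>` and `<name>` elements and one in plain prose; the sample records, the names in them and the helper `extract_event_anchors` are invented for illustration and do not come from the catalogue data discussed in this paper.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical history statement in which date and actor are tagged.
TAGGED = """<history xmlns="http://www.tei-c.org/ns/1.0">
  <provenance>Purchased in <date when="1425">1425</date> by
  <name type="person">Giovanni</name>.</provenance>
</history>"""

# The same (hypothetical) statement written as untagged prose.
UNTAGGED = """<history xmlns="http://www.tei-c.org/ns/1.0">
  <provenance>Purchased in 1425 by Giovanni.</provenance>
</history>"""


def extract_event_anchors(xml_text):
    """Return the dates and names that are explicitly tagged in a record."""
    root = ET.fromstring(xml_text)
    dates = [
        # Prefer the normalised @when value, fall back to the element text.
        d.get("when") or (d.text or "").strip()
        for d in root.iterfind(".//tei:date", TEI_NS)
    ]
    names = [
        (n.text or "").strip()
        for n in root.iterfind(".//tei:name", TEI_NS)
    ]
    return {"dates": dates, "names": names}


tagged_anchors = extract_event_anchors(TAGGED)
untagged_anchors = extract_event_anchors(UNTAGGED)
```

Only the tagged record yields a time frame and an actor that could be attached to an event; the untagged record carries the same information for a human reader, but nothing that can be mapped automatically.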
In reference to some of the aspects against which the choice of
An additional level of evaluation pertains to the contradictory views.
exhibited in or presupposed
Lastly, the use of the class E55 Type for representing
the concepts of a codicological thesaurus and the use of E41 Appellation for naming instances of classes by convention, tradition, or
agreement
E55 Type) from – for
example – the conventional name littera textualis
(E41 Appellation) (see Figure 5).
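The distinction between a concept and its conventional name can be sketched as a handful of triples. The example below is only an illustration under stated assumptions: the `ex:` identifiers are hypothetical, `rdfs:label` is assumed as the carrier of the name string, and the typed item is left unclassified rather than reproducing the mapping shown in Figure 5. The CIDOC CRM properties used are P2 has type and P1 is identified by.

```python
# CIDOC CRM properties and helper predicates, written as prefixed names.
HAS_TYPE = "crm:P2_has_type"
IS_IDENTIFIED_BY = "crm:P1_is_identified_by"
RDF_TYPE = "rdf:type"
LABEL = "rdfs:label"  # assumption: the name string is stored as a label

# Hypothetical identifiers (ex:...) chosen for illustration only.
TRIPLES = [
    ("ex:script_type_1", RDF_TYPE, "crm:E55_Type"),
    ("ex:appellation_1", RDF_TYPE, "crm:E41_Appellation"),
    ("ex:appellation_1", LABEL, "littera textualis"),
    ("ex:script_type_1", IS_IDENTIFIED_BY, "ex:appellation_1"),
    ("ex:described_item_1", HAS_TYPE, "ex:script_type_1"),
]


def objects(subject, predicate):
    """Return every object of the triples matching subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def conventional_names(item):
    """Follow item -> P2 has type -> P1 is identified by -> label."""
    names = []
    for concept in objects(item, HAS_TYPE):
        for appellation in objects(concept, IS_IDENTIFIED_BY):
            names.extend(objects(appellation, LABEL))
    return names


names = conventional_names("ex:described_item_1")
```

Because the concept and its name are separate nodes, a second appellation (for instance in another national language) could be attached to the same E55 Type without duplicating the concept itself.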
The analysis presented in this paper was not performed on a comprehensive range
of medieval manuscript descriptions encoded in different metadata schemas,
nor did it involve the implementation of an ontology in all its required
stages. Had it done so, the results would have carried more authority,
while also highlighting a greater level of complexity that the
Just considering
precisely know the semantic definitions of their schemas
‘must be involved in encoding the meaning of their own information’
semantic enrichment
encoding knowledge units directly into texts
this kind of framework for linking disparate resources is lacking for medieval and Renaissance studies
Despite the above-mentioned considerations, the challenges in terms of
infrastructure as well as human and economic resources should not outweigh the
potential benefits that such an implementation could bring. The introductory
paragraphs of this article have underlined how manuscripts form a crucial evidence base for the
humanities, and research into their histories has important benefits for a
wide range of disciplines
CIDOC CRM – would
enhance the quality, in terms of both discovery and analysis, of a great variety
of large-scale qualitative investigations on which researchers could focus.
Semantic Web technologies provide the ability to more effectively connect and
integrate structured data by disclosing their intended meaning and therefore
making explicit their description, context and provenance. While allowing
integration and interoperability across heterogeneous resources, one great
benefit of the Semantic Web is that the local meaning of each of these resources
is never lost and the source systems do not require large changes:
semantics can be embedded
(rather than described separately) within exactly the same
structure
The case study presented in this paper focused on one specific instance of medieval manuscript data lacking explicit semantics: the paleographical-codicological descriptions belonging to two different collections within the Italian catalogue
Taking into account the limitations of this case study and its challenges, this initial theoretical attempt to apply Semantic Web technologies to the integration of a heterogeneous framework, such as that of medieval manuscript descriptions, can be positively evaluated. However, many problems still need to be addressed before a real-world implementation can be considered. Despite the current and potential questions that remain to be investigated, tools are now available to achieve the promising outcome illustrated in the first paragraph of this conclusion. What remains is for Digital Humanities research and the cultural community to invest interest, skills and collaborative work.
Sincere thanks to Dr. Kristen Schuster for her great encouragement and enthusiastic support in writing this article.