Linked Open Data in Dialogue (2023)

Join us for our event “Linked Open Data in Dialogue” on Wednesday 15 November 2023 in the main building of the University of Bern. 
Room: 501/Kuppelraum, Hochschulstrasse 4, 3012 Bern. 

This edition continues the event series “...im Dialog”, jointly organized by Digital Humanities and the University Library of the University of Bern. It is also part of the LOD4HSS project, which develops and enriches the Geovistory platform.  

Geovistory (www.geovistory.org/) is a virtual research environment for the Humanities and Social Sciences, explicitly tailored to the needs of highly interlinked research projects. 

The program will include an introductory talk on Linked Open Data, a hands-on Wikidata workshop, and a choice between an introductory and an advanced workshop on Geovistory. Projects working with Linked Open Data and Geovistory will present their experiences with the platform and with Linked Open Data in general. 

The morning session will start at 9 a.m., and the day will end around 5 p.m. with an apéro. Participation is possible on-site (in Bern) or via livestream (Zoom). The event will be held in English. 

Sign up using this link.

The flyer is available here: 2023_Poster_LODinDialogue_ger.pdf.
 

Program

09:00  Welcome (Ursula Loosli, Stephen Hart & Tobias Hodel)

Morning: Geovistory in Action

09:15 – 10:00  Linked Open Data for Research: Towards a Paradigm Shift in the Humanities and Social Sciences (Francesco Beretta)

10:00 – 10:15  Coffee break

10:15 – 11:25  Geovistory Workshops: two parallel sessions, beginners and advanced

11:30 – 12:30
The MURELCO Seminar – a Teaching Tool (Matthieu Gillabert & Stephen Hart)
The Processetti Inquiries (16th-17th Centuries) – a Research Tool (David Knecht)
Switzerland and Beyond – a Community Project (Tobias Hodel)
Tagebücher Anna Maria Preiswerk-Iselin – a Scholarly Edition (Morgane Pica)
OntoME – The Ontology Management Environment (Vincent Alamercery)

Afternoon: Linked Open Data in Action

14:00 – 15:15
Archival Descriptions as Linked Open Data: Why, How, Whereto (Oliver Schihin)
Linked Open Data in Numismatics – nomisma.org & Co (Rahel C. Ackermann)
The LOD Pilot Project «Berner Ortsgeschichten» (René Frei & Thomas Hayoz)

15:15 – 15:45  Coffee break

15:45 – 17:00  Wikidata Hands-on: Introduction, Analysis and Visualization (Benedikt Hitz-Gamper)

17:00 – 17:15  Final words (Tobias Hodel & Gero Schreier)

From 17:15  Apéro
 

Contact & organizers

We look forward to your participation! For questions, we can be reached at imdialog@ub.unibe.ch

The organizers

Prof. Dr. Tobias Hodel, M.A. Stephen Hart, Digital Humanities

M.A. Ursula Loosli, Dr. Gero Schreier, Open Science 

Abstracts

Linked Open Data for Research: Towards a Paradigm Shift in the Humanities and Social Sciences (Francesco Beretta)

While large language models and AI-based image and sound production technologies are currently attracting a great deal of public attention, from the perspective of Humanities and Social Sciences (HSS) research, we should not forget the importance of knowledge graphs and linked open data (LOD), which are revolutionising the way information is analysed and shared at scale. Google's Knowledge Graph, announced in 2012, has been defined by the company as a "giant virtual encyclopaedia of facts", and indeed it is huge: “By March 2023, it had grown to 800 billion facts on 8 billion entities” (Wikipedia). 

Given this context, will HSS researchers be able to share the information they produce on a daily basis by leveraging the same semantic technologies, and to interconnect their data with the metadata of libraries, archives and museums in order to create a giant graph of high-quality information about historical objects, thus enabling renewed knowledge of past societies and a better understanding of present-day issues? 

In this talk, I will first address how to define information in the context of HSS research, and under what conditions and for what purposes sharing vast amounts of information can promote disciplinary innovation. I will then discuss a semantic methodology that allows the creation of an open and refinable common conceptual model, one that carefully documents the meaning of the shared information and facilitates its reuse for new research. Next, I will present some projects that have adopted this approach and are providing increasing amounts of semantic LOD, and outline the conditions for the success of this endeavour as they stand today. In conclusion, I will discuss how the Semantic Web approach can bring about a paradigm shift in HSS research: by enabling new research questions, given the vast amount of information about past and present societies that distributed, semantified LOD will make available, and by integrating AI into researchers' digital toolkit. 

The MURELCO Seminar – a Teaching Tool (Matthieu Gillabert & Stephen Hart)

Linked Open Data technologies, increasingly present in the Humanities, are changing the way historians manage their research data. The Geovistory ecosystem helps researchers produce and reuse FAIR data, allowing them not only to build new information collectively but also to answer complex new historical questions.

Geovistory can also be a powerful teaching tool that brings students to this understanding through theory and practice. The seminar set up by Professor Matthieu Gillabert together with Diletta Guidi and Sara Petrella confronts students with ethnographic collections from the Fribourg museums and aims to better understand the biases in the way these collections were acquired and documented. In this context, working with Geovistory shows students how Linked Open Data and the SDHSS ontology help to document complex information and diverging opinions. In addition, Geovistory allows students to publish FAIR data that can later be reused by other students in future seminars, thus building an incremental and collaborative teaching project.

The Processetti Inquiries (16th-17th Centuries) – a Research Tool (David Knecht)

The ANR-funded research project Processetti, “Marriage and Mobility in Venice, 16th-18th centuries”, led by Professor Chauvard, has a twofold objective: on the one hand, to study the structures and routes of migration to Venice by exploiting the information contained in the premarital enquiries (processetti) to which widowers and foreigners were subjected in order to establish their status; on the other hand, to study how this control procedure was implemented across Italy, in comparison with the Orthodox world.

To build the ANR Processetti database, produce and curate semantic data, manage digital reproductions and semantically annotate transcriptions, the project used Geovistory, an open-source virtual research environment explicitly designed for the digital humanities and conceived to support collaborative work in historical knowledge production. The project webpage builds on the Geovistory infrastructure and allows users to browse the project’s database.

Switzerland and Beyond – a Community Project (Tobias Hodel)

Geovistory community projects seek to consolidate data collections in order to illustrate the potential of Linked Open Data in a particular domain. They aim to showcase the extensive historical data that has already been collected and can be repurposed. For instance, the "Switzerland and beyond" project within Geovistory gathers diverse digital data on the history of present-day Switzerland, including data from the Historical Dictionary of Switzerland, scholarly editions and other relevant sources. 

Tagebücher Anna Maria Preiswerk-Iselin – a Scholarly Edition (Morgane Pica)

Linked Open Data, combined with FAIR practices, is an excellent way to produce transparent research data. Since historical data are often derived from textual sources, combining Linked Open Data with XML/TEI encoding allows even greater transparency: the source and the database can then be linked and published, each according to its own standards.

The aim of the Tagebücher Anna Maria Preiswerk-Iselin project (Digitale Edition der Tagebücher der Anna Maria Preiswerk-Iselin, verfasst zwischen 1795 und 1839; in English, a digital edition of the diaries of Anna Maria Preiswerk-Iselin, written between 1795 and 1839) was to study the intellectual, emotional and literary life of a 19th-century Basel woman on the basis of her diaries. To this end, a plain-text transcription was produced and linked to a database that identifies and defines the concepts, people and bibliographical references found in the text. This work, carried out in the Geovistory research environment within the conceptual framework of CIDOC CRM, was then used to compile an XML/TEI text database, deriving minimal semantic encoding from the graphic layout of the transcription, and to publish the transcription with TEI-Publisher. The published edition lets readers see the text alongside the entities identified in it and locate all occurrences of an entity throughout the corpus.

This project is one functional example of how a FAIR database may converse with a semantic encoding of its historical source, and how Geovistory may give an opportunity to link a digital transcription to the Semantic Web.
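The step of deriving minimal TEI encoding from the layout of a plain-text transcription can be illustrated with a short Python sketch. It assumes, as a convention of our own rather than the project's documented one, that a blank line marks a new paragraph and a single line break marks a line break in the source. The helper name `transcription_to_tei_body` is hypothetical, and a real edition would still need a TEI header and the links to the database entities before loading it into TEI-Publisher.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def tei(tag: str) -> str:
    """Return a tag name qualified with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def transcription_to_tei_body(text: str) -> ET.Element:
    """Hypothetical helper: map the layout of a plain-text transcription to TEI.

    Assumed convention: blank lines separate paragraphs (<p>),
    single newlines are line breaks in the source (<lb/>).
    """
    body = ET.Element(tei("body"))
    for block in text.strip().split("\n\n"):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue  # skip blocks that contain only whitespace
        p = ET.SubElement(body, tei("p"))
        p.text = lines[0]
        for line in lines[1:]:
            # <lb/> is an empty milestone element; the next line becomes its tail text
            lb = ET.SubElement(p, tei("lb"))
            lb.tail = line
    return body


if __name__ == "__main__":
    ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace
    sample = "First line of an entry\nsecond line\n\nA new paragraph"
    print(ET.tostring(transcription_to_tei_body(sample), encoding="unicode"))
```

Keeping layout-derived markup this minimal leaves the semantic layer (people, concepts, references) in the database, where it can be linked rather than duplicated in the XML.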

OntoME – The Ontology Management Environment (Vincent Alamercery)

Developed by the Digital History research team at the LARHRA lab (CNRS – universities of Lyon and Grenoble) as part of the Geovistory ecosystem, OntoME (Ontology Management Environment, https://ontome.net) is a software-as-a-service (SaaS) application that lets research projects manage their ontologies collaboratively and openly.

All too often, research projects in the historical sciences and humanities still follow closed or poorly designed models that prevent their data from being interoperable with authority files and similar databases, or from being reused for new research. This is the case even though, in the context of open science, funding agencies (first and foremost the ERC) recommend or require that funded projects publish FAIR data. 

These observations led us to design an application that enables a community or project to build a domain-specific model for its own research, integrated with the CIDOC CRM and SDHSS conceptual framework, to produce high quality linked open data. 

Using the Processetti project as an example, we will show how OntoME can be used to discover and understand the semantics of existing ontologies, to design a community-driven model that ensures interoperability and, above all, to create application profiles that describe a specific domain or project.

Archival Descriptions as Linked Open Data: Why, How, Whereto (Oliver Schihin)

Archival descriptions have been open for users to read and for machines to index on the internet since the advent of online catalogues. Linked data technology promises to extend and enhance this openness by creating better interoperability, improving machine readability and enabling new forms of visualization and discovery. The publication of the first versions of the Records in Context Ontology (RiC-O) has boosted this work. In addition, the Open Government Data (OGD) movement has helped in finding partners and resources within the administration, bringing together agencies (and their datasets) through common procedures, rulesets and technology stacks.

The “liberation” of archival information held in database tables is not straightforward. Unlike library data, archival data is not produced for sharing and relies on many local definitions. Exchange standards such as EAD XML are open to interpretation and cover only parts of a system’s content. At the same time, multilevel description and hierarchical linking following ISAD(G) are essential, because records can only be understood in context. Increasing the accessibility of archival information requires that: 1) data collection be as simple as possible, in a flat, complete and non-standardized format; 2) data be transformed with standardized mappings (RML or R2RML) that are adaptable and understandable by non-IT personnel; 3) data be published with suitable URIs and a specific ontology (such as RiC-O) via a public SPARQL endpoint.
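To make requirements 1 to 3 concrete, here is a minimal Python sketch that turns flat, tabular description records into RiC-O statements serialized as N-Triples. It is a plain-Python stand-in for what an RML or R2RML mapping would declare, not RML itself. The CSV column names (`id`, `parent_id`, `level`, `ref_code`, `title`), the URI base `https://example.org/archive/` and the rule that aggregation levels become `rico:RecordSet` are all assumptions for illustration; the RiC-O terms used (`rico:Record`, `rico:RecordSet`, `rico:title`, `rico:identifier`, `rico:isOrWasIncludedIn`) should be checked against the RiC-O version in use.

```python
import csv
import io
from collections.abc import Iterator

RICO = "https://www.ica.org/standards/RiC/ontology#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "https://example.org/archive/"  # hypothetical URI base; real URIs must be stable
SET_LEVELS = {"fonds", "series", "file"}  # assumed: aggregations become rico:RecordSet


def iri(value: str) -> str:
    return f"<{value}>"


def literal(text: str) -> str:
    """Escape a string as an N-Triples plain literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def row_to_triples(row: dict[str, str]) -> Iterator[str]:
    """Map one flat description record to RiC-O statements (hypothetical mapping)."""
    subject = iri(f"{BASE}record/{row['id']}")
    rico_class = "RecordSet" if row["level"] in SET_LEVELS else "Record"
    yield f"{subject} {iri(RDF_TYPE)} {iri(RICO + rico_class)} ."
    yield f"{subject} {iri(RICO + 'identifier')} {literal(row['ref_code'])} ."
    yield f"{subject} {iri(RICO + 'title')} {literal(row['title'])} ."
    if row["parent_id"]:
        # ISAD(G)-style multilevel description: link each unit to the unit above it
        parent = iri(f"{BASE}record/{row['parent_id']}")
        yield f"{subject} {iri(RICO + 'isOrWasIncludedIn')} {parent} ."


def csv_to_ntriples(csv_text: str) -> str:
    reader = csv.DictReader(io.StringIO(csv_text))
    return "\n".join(t for row in reader for t in row_to_triples(row)) + "\n"


# Placeholder records, not real archival descriptions.
SAMPLE = """id,parent_id,level,ref_code,title
1,,fonds,A 1,Example fonds
2,1,item,A 1.1,Example letter
"""

if __name__ == "__main__":
    print(csv_to_ntriples(SAMPLE), end="")
```

Because the input stays flat and the mapping logic is isolated in one function, the same rules could later be moved into a declarative RML file without changing how data is collected.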

Further development lies ahead: On an organizational level, we have to guarantee the stability of services and URIs. On a data level, we need to map more data and create more context for our records and collections through linking to other data sets, be it international, community-specific or local, and to hook into larger knowledge graphs. Moreover, the momentum gained should be used to contribute to the development of new standards and rule sets for archival description.

Linked Open Data in Numismatics – nomisma.org & Co (Rahel C. Ackermann)

Since the Renaissance, coins have been described, sorted, classified, discussed, and sorted again. As standardised mass products, they were ideal material for recording in early databases. The numismatic concepts, covering all periods and agreed upon by experts over more than 500 years, were transferred into Linked Open Data from around 2010 onwards, with the aim of making scattered datasets jointly accessible for research. What began with a few visionary individuals is now an international network of numismatists, archaeologists and computer scientists who jointly create the concepts for various fields of numismatics.

The LOD Pilot Project «Berner Ortsgeschichten» (René Frei & Thomas Hayoz)

The LOD pilot project ‘Berner Ortsgeschichten’ is a collaboration between Digital Scholarship and the ZHB (Historical Collections Center), both part of the University Library Bern. While Digital Scholarship was seeking suitable data to acquire some initial experience with a linked data project, ZHB aimed to make the data of the Bibliography of Bernese History (BBG) more accessible. However, as the dataset of this bibliography includes more than 30,000 records, a decision was made to begin by focusing on the nearly 600 Bernese local histories dedicated to specific villages and towns (German: Ortsgeschichten). 

Since these texts narrate the history of one specific place, not only should they be enriched with external links by means of the places’ GND identifiers (GND-IDs), obtained from the library catalogue (Alma), but they should also be georeferenced and made accessible on a map. 

The first step involved selecting around 40 Bernese local histories from the nineteenth century, which are fully digitised and available on DigiBern, enriching the corresponding metadata, which was extracted from the catalogue, and georeferencing it on a map. 
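Putting the georeferenced histories on a map requires exporting them in a format that map tools can import. As one possibility, the Python sketch below writes a KML document, a format that Google My Maps and many GIS tools accept, with one placemark per local history. The record fields (`title`, `link`, `lat`, `lon`) and the function name `places_to_kml` are hypothetical; note that KML, like GeoJSON, expects longitude before latitude.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def kml(tag: str) -> str:
    """Return a tag name qualified with the KML namespace."""
    return f"{{{KML_NS}}}{tag}"


def places_to_kml(places: list[dict]) -> str:
    """Hypothetical export: one KML Placemark per georeferenced local history."""
    ET.register_namespace("", KML_NS)  # write KML as the default namespace
    root = ET.Element(kml("kml"))
    document = ET.SubElement(root, kml("Document"))
    for place in places:
        placemark = ET.SubElement(document, kml("Placemark"))
        ET.SubElement(placemark, kml("name")).text = place["title"]
        ET.SubElement(placemark, kml("description")).text = place["link"]
        point = ET.SubElement(placemark, kml("Point"))
        # KML coordinate order is longitude,latitude[,altitude]
        ET.SubElement(point, kml("coordinates")).text = f"{place['lon']},{place['lat']}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder record with approximate coordinates of Bern
    sample = [{"title": "Example local history", "link": "https://example.org/record/1",
               "lat": 46.95, "lon": 7.45}]
    print(places_to_kml(sample))
```

The resulting string can be saved with `encoding="utf-8"` and imported as a map layer.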

Many obstacles had to be overcome. First, GND-IDs cannot easily be retrieved from Alma: the complete metadata had to be extracted via the SRU API, and the GND-IDs then had to be extracted from it with Python. Second, the GND-IDs do not point directly to the desired linked data; using muenzfunde.ch, it was possible to obtain many of the desired links to Wikidata (Q-IDs), the Historical Dictionary of Switzerland (HLS-DHS) and ortsnamen.ch. Finally, the IT department had no time to display the data on a map, so Google My Maps was chosen as a workaround, despite its very limited functionality. Nevertheless, the more than 8,400 views of the map ‘Berner Ortsgeschichten aus dem 19. Jahrhundert’ (Bernese local histories from the 19th century) on DigiBern indicate significant interest in this kind of resource. 
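The GND extraction step might look roughly like the following Python sketch. It assumes that the SRU response contains MARC 21 records in MARCXML and that GND identifiers appear in subfield $0 with the prefix `(DE-588)`, the MARC organization code of the GND. Which fields to scan (here 651, geographic subject headings) is an assumption that depends on local cataloguing practice. The SRU request itself is not shown, because the endpoint and query parameters are institution-specific; the function name and the sample record are hypothetical.

```python
import xml.etree.ElementTree as ET

MARC = {"marc": "http://www.loc.gov/MARC21/slim"}
GND_PREFIX = "(DE-588)"  # MARC organization code of the GND, as used in $0


def extract_gnd_ids(marcxml: str, tags: tuple[str, ...] = ("651",)) -> list[str]:
    """Hypothetical helper: collect GND-IDs from $0 of the given datafields."""
    root = ET.fromstring(marcxml)
    found: list[str] = []
    for field in root.iterfind(".//marc:datafield", MARC):
        if field.get("tag") not in tags:
            continue
        for subfield in field.iterfind("marc:subfield[@code='0']", MARC):
            value = (subfield.text or "").strip()
            if value.startswith(GND_PREFIX):
                gnd_id = value[len(GND_PREFIX):]
                if gnd_id not in found:  # keep first-seen order, drop duplicates
                    found.append(gnd_id)
    return found


# Placeholder record: the identifier below is illustrative, not a real GND entry.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Example local history</subfield>
  </datafield>
  <datafield tag="651" ind1=" " ind2="7">
    <subfield code="a">Exampletown</subfield>
    <subfield code="0">(DE-588)1234567-8</subfield>
    <subfield code="2">gnd</subfield>
  </datafield>
</record>"""

if __name__ == "__main__":
    print(extract_gnd_ids(SAMPLE))
```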

The Python script was then adapted for the larger dataset of more than 500 Bernese local histories from 1975 onwards that are dedicated to specific villages and towns. Furthermore, a separate data table was created as a concordance, replacing the table from muenzfunde.ch that had been used for the 40 nineteenth-century histories.  
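A concordance table of this kind can be applied with a simple lookup keyed on the GND-ID. The Python sketch below assumes a CSV file with the hypothetical columns `gnd_id`, `qid`, `hls_url` and `ortsnamen_url`, and it reports records without a match so that the table can be extended; the column names, function names and sample values are placeholders, not the project's actual table.

```python
import csv
import io


def load_concordance(csv_text: str) -> dict[str, dict[str, str]]:
    """Index the concordance rows by GND-ID (assumed column names)."""
    return {row["gnd_id"]: row for row in csv.DictReader(io.StringIO(csv_text))}


def enrich(records: list[dict[str, str]], concordance: dict[str, dict[str, str]]):
    """Add the Q-ID and external links to each record; collect records without a match."""
    enriched, unmatched = [], []
    for record in records:
        match = concordance.get(record["gnd_id"])
        if match is None:
            unmatched.append(record)  # candidates for extending the concordance
            continue
        enriched.append({
            **record,
            "qid": match["qid"],
            "hls_url": match["hls_url"],
            "ortsnamen_url": match["ortsnamen_url"],
        })
    return enriched, unmatched


# Placeholder values only; these are not real identifier mappings.
CONCORDANCE_CSV = """gnd_id,qid,hls_url,ortsnamen_url
1234567-8,Q123456789,https://example.org/hls/1,https://example.org/ortsnamen/1
"""
```

Keeping the concordance as a separate table means it can be maintained and extended independently of the catalogue export.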

Thanks to the Q-IDs that were added during this process, links to the coats of arms of villages/towns and to the official websites of municipalities, as well as other desired links, can now be reliably extracted from Wikidata with a few lines of Python, alongside the geographic coordinates. The IT department managed to create a proof of concept for the map, which is accessible via the university network (Citrix). Automatic updates to the data from the catalogue are planned, but unlikely to be carried out this year. 
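Reading these properties from a Wikidata item takes only a few lines of Python. The sketch below works on an item's JSON as served by `https://www.wikidata.org/wiki/Special:EntityData/<Q-ID>.json` and picks out the coat of arms image (P94), the official website (P856) and the coordinate location (P625). The network request is left out so the example runs offline; the helper names and the sample entity are hypothetical, and the rank handling (prefer a preferred statement, skip deprecated ones) is a simplification.

```python
from urllib.parse import quote


def best_value(entity: dict, prop: str):
    """Return the value of the best-ranked statement for a property, or None."""
    statements = [
        s for s in entity.get("claims", {}).get(prop, [])
        if s.get("rank") != "deprecated" and s["mainsnak"].get("snaktype") == "value"
    ]
    if not statements:
        return None
    preferred = [s for s in statements if s.get("rank") == "preferred"]
    return (preferred or statements)[0]["mainsnak"]["datavalue"]["value"]


def place_links(entity: dict) -> dict:
    """Hypothetical helper: coat of arms, official website and coordinates of a place."""
    arms = best_value(entity, "P94")      # coat of arms image (Commons file name)
    website = best_value(entity, "P856")  # official website (URL string)
    coords = best_value(entity, "P625")   # globe coordinate (latitude/longitude)
    return {
        "qid": entity.get("id"),
        "coat_of_arms": (
            "https://commons.wikimedia.org/wiki/Special:FilePath/" + quote(arms.replace(" ", "_"))
            if arms else None
        ),
        "official_website": website,
        "lat": coords["latitude"] if coords else None,
        "lon": coords["longitude"] if coords else None,
    }


# Placeholder entity with the minimal structure used above; not a real item.
SAMPLE_ENTITY = {
    "id": "Q123456789",
    "claims": {
        "P94": [{"rank": "normal", "mainsnak": {"snaktype": "value",
                 "datavalue": {"value": "Example Wappen.svg"}}}],
        "P856": [{"rank": "normal", "mainsnak": {"snaktype": "value",
                  "datavalue": {"value": "https://www.example.ch"}}}],
        "P625": [{"rank": "normal", "mainsnak": {"snaktype": "value",
                  "datavalue": {"value": {"latitude": 46.95, "longitude": 7.45}}}}],
    },
}
```

With network access, the entity is typically found under `data["entities"][qid]` in the Special:EntityData response.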

Other planned steps include, first, the continuous expansion of the Bernese local table (BEOT) – which contains the Q-IDs, links to HLS-DHS and ortsnamen.ch, and the GND-IDs – and, second, the direct enrichment of the GND records with the Q-IDs. Additionally, there are plans to supplement and correct the location data in Wikidata and to make the project’s data available on GitHub.