Database progress – and a new model
By Sarah Middle, Duncan Hay, and Alex Butterworth
In a previous blog post, we described both the Arches software that we are using to manage and publish our project database, and our initial remodelling of the data with reference to the events-based CIDOC Conceptual Reference Model (CIDOC CRM). In the extended course of doing so, we ultimately realised that CIDOC CRM alone did not provide the level of nuance required to distinguish between the complex events concerning the lives, work and relationships of our makers that are represented in the SIMON database. To compound this problem, we also encountered a seemingly insuperable practical obstacle to introducing additional ontologies to Arches, and were forced to conclude that applying multiple ontologies to describe the same entity (resource model) within Arches was not possible.
This was a serious setback. Undeterred in our commitment to the Arches platform, for the reasons described before, we realised that the most effective way to address this issue, while still taking advantage of the many usability benefits offered by Arches, was to create our own ontology, with classes and properties aligned to existing ontologies (such as, but not limited to, CIDOC CRM) wherever suitable ones were available. Using the resource models that we had created in our initial Arches instance, as well as the remaining database fields from SIMON and new information laboriously extracted from its ‘Misc. Info’ fields and elsewhere, we produced the Scientific Instrument Makers and Events Ontology, SIMEOn. Once finalised, SIMEOn will be published openly online, alongside our project database, but for now it is undergoing a gradual process of refinement as we proceed with its implementation in Arches.
Our initial impressions of using SIMEOn in Arches were extremely positive: all the conceptual work had already been done, and Arches’ validation features (which ensure that the correct classes and properties are associated with each other), previously the stumbling block when using multiple external ontologies, now made the process of creating resource models quite efficient. At the same time, these features proved extremely useful for identifying gaps and issues with the ontology, which could then be resolved. This did not come without its own drawbacks, though, as the validation process also required us to delete the entire Arches database every time we wished to upload a new version of the ontology. Annoying as this was, we recognised that it was crucial to ensure data integrity and to avoid potential future incompatibilities within the model.
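To illustrate the kind of check that such validation performs, here is a minimal sketch in Python of testing a link between two classes against a property's declared domain and range. It is not Arches' own implementation, and every class and property name in it is hypothetical rather than taken from SIMEOn.

```python
from dataclasses import dataclass

# Hypothetical class hierarchy (child -> parent). These names are invented
# for illustration and are not taken from SIMEOn or CIDOC CRM.
SUBCLASS_OF = {
    "Apprenticeship": "Activity",
    "Activity": "Event",
    "Maker": "Actor",
}


@dataclass(frozen=True)
class Property:
    name: str
    domain_cls: str  # class allowed as the subject of the property
    range_cls: str   # class allowed as the object of the property


# Hypothetical property definitions.
PROPERTIES = {
    "carried_out_by": Property("carried_out_by", "Activity", "Actor"),
}


def is_a(cls, ancestor):
    """Return True if cls equals ancestor or inherits from it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def validate_edge(subject_cls, prop_name, object_cls):
    """Return a list of problems for one subject-property-object link."""
    prop = PROPERTIES.get(prop_name)
    if prop is None:
        return [f"unknown property: {prop_name}"]
    problems = []
    if not is_a(subject_cls, prop.domain_cls):
        problems.append(
            f"{subject_cls} is outside the domain of {prop_name} ({prop.domain_cls})"
        )
    if not is_a(object_cls, prop.range_cls):
        problems.append(
            f"{object_cls} is outside the range of {prop_name} ({prop.range_cls})"
        )
    return problems
```

In this sketch, `validate_edge("Apprenticeship", "carried_out_by", "Maker")` returns an empty list, whereas swapping the subject and object reports both a domain and a range problem. A gap in the ontology would surface in the same way when a resource model is being built.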
To mitigate the need to repeat this process too often, we produced a schema, or plan, listing our resource models and the pieces of information (nodes) attached to them, in a simple text document. This exercise enabled us to identify issues with the ontology without having to repeatedly upload new versions to Arches, and provided a point of reference to compare with the actual data. In the course of this process, we realised that by working, increasingly, from an abstraction of the data, some assumptions and misunderstandings had crept into the model: while the model remains, in principle, the ideal way of representing the data, considerable work has been required, and is currently being undertaken, to restructure all the data into the necessary format.
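As a rough sketch of how a plain-text schema of this kind could be checked against an ontology outside Arches, the Python below parses an indented listing of resource models and nodes and reports any class that the ontology does not define. The text layout, the model and node names, and the class list are all assumptions made for illustration. The project's actual schema document is not reproduced here.

```python
# Hypothetical schema text: an unindented line names a resource model,
# and each indented "node: Class" line beneath it describes one node.
SCHEMA_TEXT = """\
Person
    name: Appellation
    birth: Birth
Organisation
    name: Appellation
    founded: Foundng
"""

# Hypothetical set of classes defined by the ontology.
ONTOLOGY_CLASSES = {"Person", "Organisation", "Appellation", "Birth", "Founding"}


def parse_schema(text):
    """Parse the schema text into {model: {node_name: class_name}}."""
    models = {}
    current = None
    for line_no, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            continue  # skip blank lines
        if not raw[0].isspace():
            current = raw.strip()
            models[current] = {}
            continue
        if current is None:
            raise ValueError(f"line {line_no}: node appears before any model")
        node, sep, cls = raw.strip().partition(":")
        if not sep:
            raise ValueError(f"line {line_no}: expected 'node: Class'")
        models[current][node.strip()] = cls.strip()
    return models


def find_unknown_classes(models, known_classes):
    """Return (model, node, class) triples whose class is not in the ontology."""
    issues = []
    for model, nodes in models.items():
        if model not in known_classes:
            issues.append((model, None, model))
        for node, cls in nodes.items():
            if cls not in known_classes:
                issues.append((model, node, cls))
    return issues
```

Run on the sample text, `find_unknown_classes(parse_schema(SCHEMA_TEXT), ONTOLOGY_CLASSES)` flags the deliberately misspelt `Foundng` class on the `founded` node. This is the sort of mismatch that would otherwise only come to light after uploading a new version of the ontology.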
Alongside the above data modelling, we have also been working to finalise controlled lists of the different ‘types’ that appear in the SIMON data, developed during the very extensive work of data extraction that has been carried out over the last eighteen months (which will be discussed in a future blog post). The key resource developed by the project in this area is the Scientific Apparatus Ontology, created by Sarah Middle on the foundation of a taxonomy produced by Morgan Bell and Steven Skuse from the Whipple Museum in Cambridge, which will also be discussed in a future blog post. Among the other data resources produced during this work are occupation types, material types and organisation types, which will be converted to thesauri, also in Simple Knowledge Organisation System (SKOS) format. These thesauri will then appear as dropdown lists when a user is adding new data to the database, which will ensure that terminology is used consistently, as well as allowing users to search the database by type.
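The sketch below suggests how a small SKOS-style thesaurus might feed both a dropdown list and a search by type. It models only two SKOS properties, `skos:prefLabel` and `skos:broader`, as plain Python fields. The material terms and identifiers are invented examples, and the code makes no claim about how Arches itself loads or displays thesauri.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    concept_id: str
    pref_label: str                # mirrors skos:prefLabel
    broader: Optional[str] = None  # mirrors skos:broader (id of the parent concept)


# Hypothetical material types, for illustration only.
CONCEPTS = [
    Concept("mat-metal", "metal"),
    Concept("mat-brass", "brass", broader="mat-metal"),
    Concept("mat-silver", "silver", broader="mat-metal"),
    Concept("mat-glass", "glass"),
]


def dropdown_options(concepts):
    """Return (id, label) pairs, alphabetical within each level and
    indented by two spaces per level of depth in the hierarchy."""
    children = {}
    for concept in concepts:
        children.setdefault(concept.broader, []).append(concept)

    options = []

    def walk(parent_id, depth):
        siblings = sorted(children.get(parent_id, []), key=lambda c: c.pref_label)
        for concept in siblings:
            options.append((concept.concept_id, "  " * depth + concept.pref_label))
            walk(concept.concept_id, depth + 1)

    walk(None, 0)
    return options


def with_narrower(concept_id, concepts):
    """Return concept_id plus the ids of every concept beneath it, so that a
    search for a broad type also matches records tagged with narrower types."""
    found = {concept_id}
    changed = True
    while changed:
        changed = False
        for concept in concepts:
            if concept.broader in found and concept.concept_id not in found:
                found.add(concept.concept_id)
                changed = True
    return found
```

Because users choose from the list produced by `dropdown_options` rather than typing free text, every record carries one of the agreed identifiers. `with_narrower("mat-metal", CONCEPTS)` then lets a search for "metal" also return records tagged "brass" or "silver".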
Now that we have taken the time required to address the issues we encountered with data modelling and data management, work on the database is progressing well, and we look forward to installing an initial version of the Arches database on Royal Museums Greenwich servers in June, with testing to follow. The first published version will follow in the coming months.