EAC-CPF revision half-way into stage 2

Kerstin Arnold is the Technical Coordinator of Archives Portal Europe

After successfully releasing a minor revision of the Encoded Archival Context – Corporate Bodies, Persons, and Families (EAC-CPF) in December 2018, the EAC-CPF team of the Technical Subcommittee on Encoded Archival Standards (TS-EAS) is now half-way into the second stage of the standard’s major revision. While all current issues are discussed on GitHub and during the monthly team meetings, a few general topics emerged from these conversations that seemed to qualify for more in-depth analysis.

One day dedicated to EAC-CPF

TS-EAS therefore decided to dedicate one day to selected topics from the EAC-CPF revision when it met in the context of the Annual Meeting of the Society of American Archivists (SAA) in Austin in August 2019. The topics on the table were “Dates”, “Names”, “Identifiers” and the new addition of “Assertion Description”. Some of the updates detailed below are still in flux and will require further conversations during the next few months.

Dates

EAC-CPF uses a variety of elements to encode date information, but it offers only limited means to express uncertainty about dates or to classify part(s) of a given date range as unknown. The EAC-CPF team has been looking at the implementation of such uncertainty in other standards, including Encoded Archival Description (EAD 2002 and EAD3), the Extended Date/Time Format (EDTF) Specification, the Text Encoding Initiative (TEI) and the Metadata Object Description Schema (MODS). Based on this comparison, the suggested changes include:

  • Adding the attribute @certainty (from EAD3) to the elements <date>, <fromDate> and <toDate>;
  • Recommending the use of values inspired by EDTF such as “uncertain”, “approximate” and “uncertain and approximate” with the newly added attribute @certainty;
  • Introducing a new attribute @status for the elements <date>, <fromDate> and <toDate> to indicate their status as being “unknown” or “open” (e.g. for persons who are still alive).
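
As a rough illustration of how these suggestions might look in practice, the following Python sketch builds a date range for a person who is still alive. It is only a sketch under assumptions: the attributes @certainty and @status are proposals that are still in flux, the helper name make_date_range is hypothetical, the EAC-CPF namespace is omitted for brevity, and the year used is an invented placeholder.

```python
import xml.etree.ElementTree as ET


def make_date_range(from_value, to_value=None, certainty=None):
    """Build a <dateRange>; a missing end date is flagged with status="open".

    @certainty and @status are proposed attributes, not part of the
    currently published schema.
    """
    date_range = ET.Element("dateRange")
    from_date = ET.SubElement(date_range, "fromDate", standardDate=from_value)
    from_date.text = from_value
    if certainty is not None:
        # Values inspired by EDTF, e.g. "uncertain" or "approximate"
        from_date.set("certainty", certainty)
    to_date = ET.SubElement(date_range, "toDate")
    if to_value is None:
        # No end date yet, e.g. for a person who is still alive
        to_date.set("status", "open")
    else:
        to_date.set("standardDate", to_value)
        to_date.text = to_value
    return date_range


# Placeholder data: approximate year of birth, open end date
example = make_date_range("1950", certainty="approximate")
print(ET.tostring(example, encoding="unicode"))
```

Keeping the uncertainty and the open or unknown status in separate attributes means that a date can be both approximate and part of an open range without overloading a single value.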

Names

While in EAC-CPF it is relatively straightforward to use the element <nameEntry> plus <part> to encode names and their constituent parts, there remain questions around the appropriate use of its other sub-elements: <authorizedForm>, <alternativeForm> and <preferredForm>. Meant to indicate the rule or convention, based on which a specific form of name can be identified as “authorized”, “alternative” or “preferred”, these elements – indirectly – also provide information about the status of the name given in their parent <nameEntry>. The EAC-CPF team has discussed options to disentangle the current situation, e.g. by:

  • Recommending more strongly that rules and conventions are encoded via the element <conventionDeclaration> in the <control> section of an EAC-CPF instance;
  • Adding the attribute @rules (from EAD3) to <nameEntry> to briefly note the applied rule, plus adding an IDREF type attribute to <nameEntry> to enable pointing to the corresponding <conventionDeclaration> for further details;
  • Introducing a new attribute @status for the element <nameEntry> to indicate the status of the name as being “authorized” or “alternative”;
  • Investigating the possibility of turning <preferredForm> into an attribute as well.

Alongside the expected changes for <nameEntry>, the EAC-CPF team is also considering a name change from <nameEntryParallel> to a more general <nameEntrySet>. The use of the attribute @localType would then be recommended to indicate whether the names grouped within <nameEntrySet> are “parallel” forms, as per the specific use case in the US American context, “former” forms of the name, or “translation”-s of the name.
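
The name-related ideas can be sketched in a few lines of Python. This is a minimal sketch under assumptions rather than the revised schema: @status on <nameEntry> and the element <nameEntrySet> are proposals under discussion, the helper names are hypothetical, the namespace is omitted, and both names are invented placeholders.

```python
import xml.etree.ElementTree as ET


def make_name_entry(parts, status):
    """Build a <nameEntry> whose status sits in a (proposed) @status attribute."""
    entry = ET.Element("nameEntry", status=status)
    for text in parts:
        ET.SubElement(entry, "part").text = text
    return entry


def names_with_status(name_set, status):
    """Return the joined <part> texts of all entries that have the given status."""
    return [
        " ".join(part.text for part in entry.findall("part"))
        for entry in name_set.findall("nameEntry")
        if entry.get("status") == status
    ]


# Proposed <nameEntrySet>, with @localType describing how the names relate
name_set = ET.Element("nameEntrySet", localType="translation")
name_set.append(make_name_entry(["Example Archive"], "authorized"))
name_set.append(make_name_entry(["Beispielarchiv"], "alternative"))

print(names_with_status(name_set, "authorized"))
```

With the status expressed as an attribute, selecting the authorized form no longer depends on inspecting which sub-element happens to be present.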

Identifiers

Having discussed the various ways to identify an EAC-CPF instance, its versions, its parts, the entity (or identity) it describes, as well as related resources and related entities, the EAC-CPF team has decided to focus especially on providing more specific descriptions and more appropriate examples to clarify which ID element or attribute to use for which use case. As a starting point, three types of identifiers have been defined, one of which can furthermore be divided into two sub-groups:

  • Database primary keys, used to uniquely identify each record within a given context; e.g. elements <recordId> and <otherRecordId> holding current and maybe previously used identifiers of the EAC-CPF instance;
  • Identifiers used to distinguish and determine entities;
    • Informational identifiers, e.g. the alphanumeric string representing the name of an entity as given in <nameEntry>, which establishes a meaningful connection with the entity it represents;
    • Non-informational identifiers, e.g. the primarily, but not exclusively numeric string of a globally unique and persistent identifier as given in <entityId>, which does not have a meaningful connection with the entity it represents;
  • Identifiers used to create unique locations within an EAC-CPF instance; i.e. the attribute @xml:id providing identification for a specific element within the EAC-CPF XML.

With regard to identifiers of EAC-CPF instances that have been merged or translated into the current one, the EAC-CPF team has decided to promote the use of <source> rather than <otherRecordId>. Furthermore, <entityId> will be renamed more appropriately to <identityId>.

Assertion Description

In addition to the three topics on existing elements above, the EAC-CPF team also discussed a new feature request, which deals with enabling users to encode the source of specific information as part of an EAC-CPF instance. This becomes relevant especially when looking at potentially contradicting sources e.g. for the name of an entity or the date or place of birth of a person. Discussions are still ongoing with regard to this topic, but the intent is:

  • To introduce a new element called <evidence> or similar as sub-element to most descriptive elements within EAC-CPF;
  • To include a sub-element <foundData> with this new element to encode a brief description of the evidence data found in the (new) source;
  • To work with attributes to point to the exact element that includes the assertion and to refer to potentially contradicting assertions within the same EAC-CPF instance;
  • To enable connections between the new element <evidence> and the elements <source> and <maintenanceEvent> in the <control> section to encode information about the source in general as well as about the agent making the assertion and the date of the assertion.
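
To make the pointing mechanism more tangible, here is a hedged Python sketch of how such an assertion might be linked to the element it concerns and to its source. Nothing in it is settled: the element name <evidence> is itself provisional, the attribute names target and sourceReference are purely hypothetical, the record structure is heavily simplified, the namespace is omitted, and the data are invented placeholders.

```python
import xml.etree.ElementTree as ET

# ElementTree addresses xml:id through the reserved XML namespace
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

record = ET.Element("eac-cpf")

# <control> section with one source that can be pointed to
control = ET.SubElement(record, "control")
sources = ET.SubElement(control, "sources")
ET.SubElement(sources, "source", {XML_ID: "src1"})

# Simplified descriptive part with one asserted date
exist_dates = ET.SubElement(record, "existDates")
birth = ET.SubElement(exist_dates, "date", {XML_ID: "birth1"})
birth.text = "1850"

# Hypothetical <evidence> element pointing to the assertion and its source
evidence = ET.SubElement(
    exist_dates, "evidence", {"target": "birth1", "sourceReference": "src1"}
)
ET.SubElement(evidence, "foundData").text = "Year of birth given as 1850"


def resolve(root, xml_id):
    """Return the first element in the tree carrying the given xml:id, or None."""
    for element in root.iter():
        if element.get(XML_ID) == xml_id:
            return element
    return None


asserted = resolve(record, evidence.get("target"))
origin = resolve(record, evidence.get("sourceReference"))
print(asserted.tag, asserted.text, origin.tag)
```

A reference to a contradicting assertion within the same instance could follow the same pattern, with a further pointer attribute resolved via @xml:id.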

Next steps

The EAC-CPF team will tackle pending questions with regard to these topics as well as others, which still require further consideration, in the context of its monthly meetings between December 2019 and March 2020, culminating in a three-day meeting from 9 to 12 March 2020 in Berlin, Germany. We invite you to follow and participate in our conversations on GitHub at any time.

Describing institutions with archival holdings

Kerstin Arnold is the Technical Coordinator of Archives Portal Europe

Archives are history in action. They hold and preserve records of public bodies on all administrative levels, of persons and families, of businesses, of universities and other research and educational organisations, of churches, parties, unions, etc. More importantly, though, archives make these records available to all of us and they ensure that these records are and remain accessible. Accordingly, archives have described the records they hold from the very beginning in order to make them searchable and findable. They have additionally described the entities that created, used and maintained these records over time in order to show their relationships to the records as well as to each other, thereby unfolding the social networks of the past. But what about the archives themselves? How do I know where to go if I want to find the records on a specific event in history or when I am researching my own ancestry?

A common approach

In the era of the Internet, many archives may provide their contact details, describe their services, tell users about their institutional history or present their holdings via their own websites. A standardised approach to describing institutions with archival holdings is nevertheless of particular interest in the context of aggregating services such as Archives Portal Europe, which bring together information from and about institutions from all over the world.

It will therefore not come as a surprise that the idea of a common approach, in the form of a standardised XML format to describe institutions with archival holdings, was born in the context of the Censo-Guía de los Archivos de España e Iberoamérica. When describing archival fonds and collections with their components as well as the creators of archival materials, Censo-Guía already followed the guidance provided by the International Council on Archives (ICA) and had implemented the corresponding communication standards of Encoded Archival Description (EAD) and Encoded Archival Context (EAC-CPF) for data exchange and processing.

Evolution of a standard

What Censo-Guía was missing, however, was a standardised way to describe the more than 50,000 archival institutions from Spain and Latin America gathered in their portal, so that users could easily navigate and find the most important information, regardless of an institution’s origin. Picking up on their use of EAD and EAC-CPF, Censo-Guía’s response to this gap was the creation of the Encoded Archival Guide (EAG 0.2) in 2002.

Successfully implemented in their portal, EAG 0.2 then formed the basis for the Directory of Archives Portal Europe when it was first made available online nearly a decade later, in 2011. In the meantime, in 2008, ICA had published the first edition of the International Standard for Describing Institutions with Archival Holdings (ISDIAH). The APEx project (Archives Portal Europe network of excellence), responsible for the further development of Archives Portal Europe between 2012 and 2015, built on EAG 0.2 in combination with the additional guidance provided by ISDIAH to establish the current version, EAG 2012.

EAG 2012 is currently used within Censo-Guía, Archives Portal Europe and other national and international aggregation projects from the archives, cultural heritage and historical research domains, and it is maintained by the Archives Portal Europe Foundation in close collaboration with the Technical Subcommittee on Encoded Archival Standards (TS-EAS) at the Society of American Archivists.

A new version on the horizon

With EAG 2012 now in its seventh year, the Working Group on Standards (WGoS) within the Archives Portal Europe Foundation has decided that the standard is due for a major revision. As we start this process, we want to hear from the community about elements and features that they have been missing so far, but we also want to take into account some developments from the last few years, such as:

  • the most recent drafts of the Records in Contexts Conceptual Model (RiC-CM) and the Records in Contexts Ontology (RiC-O) developed by the Expert Group on Archival Description (EGAD) at the ICA,
  • the extension of schema.org with regard to an improved representation of digital and physical archives and their content (see https://www.w3.org/community/architypes/),
  • the ongoing major revision of EAC-CPF, from which EAG 2012 has taken some inspiration with regard to its structure, elements and attributes.

All information on the current EAG 2012 can be found on – and from – our GitHub repository, which also holds a list of all issues that have been registered so far as potential candidates for changes in a new version of EAG.

We encourage you to send feedback and comments via GitHub, which serves as a central place to collect, review and follow up on potential future changes. Alternatively, you can contact the WGoS via email (standards@archivesportaleurope.net) and we will then add your feedback and comments to GitHub accordingly.

This call for comments is open until Friday, 27 March 2020.

Next steps

WGoS will then review and collate all feedback, creating a first draft for a new version of EAG during the second half of 2020 with additional feedback rounds throughout 2021. The aim is to then publish a new version of EAG alongside the revised version of EAC-CPF, which is currently scheduled towards the second half of 2021.

DARIAH Code Sprint

Berlin (Germany), 24-26 September 2019

Registration is now open for the next DARIAH Code Sprint, the annual coding event organised by the pan-European Digital Research Infrastructure for the Arts and Humanities (DARIAH). The event brings together developers and digital humanists from all over the world to discuss and create tools for research and usage. This year the event will be organised around the DESIR project, an offspring project of DARIAH tasked with developing sustainability approaches for the research infrastructure in technological and organisational matters.

This year’s Code Sprint will focus on bibliographical metadata from three angles:

– Extraction of bibliographical data and citations from PDFs using GROBID
– Data import and processing using BibSonomy
– Visualisation of time-dependent graphs of relations

The connecting thread is to work with the same bibliographical data throughout the process and to improve the interoperability between the tools.

The event is open to anyone interested, inside and outside the DARIAH community. The event will be held in English. Visit the event’s website for more information.

The documentation of the first DARIAH Code Sprint can be found here

Masterclass in “authorial” libraries

The National Library, Rome (Italy), 29-31 October 2019

A masterclass dedicated to the management of “authorial” library collections will take place at the National Library in Rome from 29 to 31 October. This specialisation course is aimed at students and professionals working as archivists or librarians.

Private (and personal) authorial libraries are characterised by specific elements such as dedications, annotations and external material (such as postcards or newspaper articles) added to the volumes. Personal collections provide an overview of the literary taste and interests of their owner, his/her network of intellectual relations and the cultural context in which s/he operated. The course aims to teach how to set up, manage and promote these libraries, and how to relate them to personal archives, of which they are a constituent part.

Topics on the programme include: the establishment of the idea of authorial libraries between the 19th and 20th century; how to preserve them intact; acquisitions and donations on the part of the heirs; management; the antiquarian market; relations with the archives; digitisation and digital projects.

The masterclass fee is 150 EUR; registrations are accepted until 10 October. The course will be held in Italian. More information (in Italian) here

The new historical library of the Italian Institute for Africa and the Middle East

The National Library in Rome has opened a new library dedicated to African and Oriental Studies, thanks to the collaboration with the Italian Institute for Africa and the Middle East (ISIAO). The library holds the institute’s collection, which comprises more than 200,000 volumes, 2,500 magazines, thousands of manuscripts and printouts, maps, photographs, etc.

The size of the library makes it one of the most relevant in Europe for studies on Africa and the Middle East. More information (in Italian) is available on the website of the National Library.
