New Forms of Textuality and Metadata

Producing and Analysing Digital Objects in Sinology
Thursday
9:00 am – 10:45 am
Room 2

Organised by Martina Siebert
Martina Siebert, “Digital Perspectives on pulu 譜錄—Reading pulu from a Distance”
Hou Ieong (Brent) Ho, “New Forms of Metadata and Non-Consumptive Computational Services with CrossAsia-ITR”
Hilde De Weerdt, “What Are and What Do We Do with Meso or Macro-Scale Historical Datasets?”
Shih-Pei Chen, “What One has to Know about a Locality: Analysing Knowledge Organisations of 4,000 Chinese Local Gazetteers”

Since the late 1990s, the availability of electronic full texts of historical Chinese sources has grown steadily. Two conditions have mainly driven this growth: first, the enormous size (and sometimes rarity) of the text corpus that academics need to investigate in their research; and second, the availability of a cheap labour force for typing (together with, of course, the growing financial power of Chinese universities to buy these e-products). More recently, interest in full-text databases has been supplemented by curiosity about what more might be done by means of Digital Humanities (DH) methods and tools. Whereas many academics seem to regard DH as merely a more elaborate version of full-text search that answers questions faster or on a larger scale, this panel steps outside this purely instrumentalist view to explore the digital objects produced by DH tools and methods in their own right. We conjecture that the various forms of statistics, network analysis, text enhancement, etc., together with their interactive visualisations, produce alternative, non-linear, or meta versions of a text. From a library’s viewpoint, these offer additional means of orientation within a corpus; from an academic perspective, they allow an aggregation and analysis of data that supplements the characterisation of the resources used and of the research problematic. The panel brings together a group of scholars with expertise in history, digital humanities and librarianship to approach the various aspects of this notion.

Martina Siebert, “Digital Perspectives on Pulu 譜錄—Reading Pulu from a Distance”

My dissertation explored the types of knowledge and modes of presentation that characterise the genre of pulu, and the changing classification that pulu texts experienced within Chinese bibliographical schemes. Almost twenty years ago, this work still relied on hand-copying texts in Chinese libraries and on an Access database to store and analyse data on the content and the various classificatory allocations of the identified titles. In my later research, I have benefited to a large extent from the growing availability and searchability of electronic texts. In this talk, I want to zoom out and look at pulu from a distance, showing how they present themselves against the backdrop of different types of digital meta-objects. The bibliographical class of pulu will serve as a test case for service developments in the context of the CrossAsia Integrated Text Repository (ITR). Concrete examples are a statistical text similarity analysis using principal component analysis (PCA) of the canonically structured corpus Xuxiu Siku quanshu, and a bigram explorer that allows an analysis of how the terms pu (treatise) and tu (illustration) correlate with other terms/bigrams in this corpus. I will also look at language models, i.e. models of semantic units based on their probability distribution and on the most probable environment of these units. This leads to a comparison of models calculated on the basis of ‘four classes’ corpora with those calculated on the basis of individual time or genre segments.

Hou Ieong (Brent) Ho, “New Forms of Metadata and Non-Consumptive Computational Services with CrossAsia-ITR”

CrossAsia is a service of the Berlin State Library that provides, among other things, access to licensed digital resources on Asia for scholars affiliated with a German academic institution. In the past five years, CrossAsia has been developing an infrastructure called the CrossAsia Integrated Text Repository (ITR). It aims not only at preserving the growing amount of stored digital texts but also at providing the basis for developments and experiments with non-consumptive computational services that do not violate the copyright restrictions of the full texts. The talk will introduce these additional approaches to the collection, which supplement the ones offered by the original databases or traditional cataloguing techniques, and show how they open the road to further digital meta-objects accessible to all. The ITR Fulltext Search enables scholars all over the world to get search results with relevant snippets from over 50 million indexed pages from 325,000 titles, mainly in Chinese, English and Japanese. In addition, the ITR Explorer enables users to investigate and visualise the statistical relations of keywords in the ITR collection across a number of titles and over time. Besides these pre-defined tools, we have also processed the licensed texts into n-grams (sequences of one to three consecutive Chinese characters and their frequencies within a title) and released them together with their metadata as open data for interested scholars to download and use in their own analyses. These data can be used with various digital humanities methods and may thus produce further open data sets and digital meta-objects.
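To make the n-gram description concrete, the following is a minimal Python sketch of counting one- to three-character n-grams within a single title. It is an illustration only, not the ITR pipeline: the function name `char_ngrams`, the restriction to runs of CJK Unified Ideographs, and the rule that n-grams never span punctuation are all assumptions made here, and the sample string is invented.

```python
import re
from collections import Counter

# Assumption: only runs of CJK Unified Ideographs (U+4E00 to U+9FFF) are
# counted, and n-grams never cross punctuation or whitespace. The actual
# CrossAsia processing may treat these cases differently.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def char_ngrams(text, max_n=3):
    """Count all character n-grams of length 1..max_n in one title's text."""
    counts = Counter()
    for run in CJK_RUN.findall(text):
        for n in range(1, max_n + 1):
            # Slide a window of width n over the run of characters.
            for i in range(len(run) - n + 1):
                counts[run[i:i + n]] += 1
    return counts


# Invented sample text standing in for the full text of one title.
sample = "菊譜，梅譜。"
counts = char_ngrams(sample)
```

Because such a frequency table contains no running text, it can be shared without exposing the licensed full text, which is the sense in which a service of this kind is non-consumptive.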

Hilde De Weerdt, “What Are and What Do We Do with Meso or Macro-Scale Historical Datasets?”

In an attempt to compare how contemporaries viewed relationships amongst the dozens and, in one case, hundreds of people who were put on political blacklists, a small group of colleagues and I extracted relational datasets of the co-occurrence of their names in tens of thousands of documents written between the late eleventh and the early thirteenth centuries. We performed a variety of network and probabilistic analyses on these datasets, which produced further datasets, spreadsheets, and interactive graphs. We also produced sample datasets to compare the behaviour of those on the lists with that of their contemporaries with similar backgrounds. In this presentation, we will not only present some of the conclusions of this work but also focus on the question of how these digital research outputs (and similar ones, such as spatial analyses of other Chinese historical datasets) compare to analogue historical source materials, and why and how they could be leveraged to discover and read primary sources at micro-scales in new ways.
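A relational dataset of name co-occurrence can be pictured as a weighted edge list. The sketch below is a simplified illustration under stated assumptions, not the method used in the project: it treats two names as co-occurring when both appear as substrings of the same document, and the function name `cooccurrence_edges`, the placeholder names, and the sample documents are all invented.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(documents, names):
    """Count, for each pair of names, the documents mentioning both.

    Assumption: a name counts as present if it occurs as a plain substring.
    Real historical texts would also need handling of alternative names,
    courtesy names, and ambiguous matches.
    """
    edges = Counter()
    for doc in documents:
        # Sort so that each pair is always stored in the same order.
        present = sorted(name for name in names if name in doc)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


# Invented placeholder names and documents, for illustration only.
names = ["張甲", "李乙", "王丙"]
documents = ["張甲與李乙同遊", "李乙致王丙書", "張甲、李乙、王丙俱在"]
edges = cooccurrence_edges(documents, names)
```

The resulting counts can serve as edge weights for network analysis, with each name as a node.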

Shih-Pei Chen, “What One has to Know about a Locality: Analysing Knowledge Organisations of 4,000 Chinese Local Gazetteers”

Since at least the 12th century, local gentry and officials recorded local knowledge in local gazetteers (difangzhi). Over more than 800 years of development, the genre maintained a surprisingly consistent structure of roughly 20 to 100 sections that reappeared in many gazetteers across this long period of time and the vast space of China. While there were indeed top-down guidelines issued by the central or provincial governments for editors to follow when compiling local gazetteers, it is not easy to grasp what the thousands of individual editors actually decided to keep, to add, and to leave out. In this presentation, we will report on our experiment of analysing the section headings of a set of 4,000 local gazetteers from the Song Dynasty to the Republican period. By employing computational techniques to look at the actual section headings used in each gazetteer and at how they are similar or different, this bottom-up approach helps historians to see global patterns in the knowledge organisation of difangzhi as a genre. It also helps us to understand the negotiation process between the centre and the local compilers about what categories of knowledge should be recorded. Using simple statistics and a visualisation tool, we analysed the 12,000 (normalised) distinct section headings from the 4,000 gazetteers. Our preliminary results show that there are temporal as well as geographical patterns in the section headings used, which require closer examination together with historians.
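As a rough illustration of what "simple statistics" on section headings could look like, the Python sketch below counts how many gazetteers use each heading and compares two gazetteers by the overlap of their heading sets. The choice of the Jaccard index, the function names, and the sample data are assumptions made here for illustration; the abstract does not specify the measures actually used.

```python
from collections import Counter


def heading_frequencies(gazetteers):
    """Count in how many gazetteers each (normalised) heading appears."""
    freq = Counter()
    for headings in gazetteers.values():
        # Use a set so a heading repeated within one gazetteer counts once.
        freq.update(set(headings))
    return freq


def jaccard(a, b):
    """Jaccard index: shared headings divided by all distinct headings."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Invented toy data: three hypothetical gazetteers and their headings.
gazetteers = {
    "gazetteer_1": ["山川", "風俗", "物產"],
    "gazetteer_2": ["山川", "風俗", "學校"],
    "gazetteer_3": ["山川", "藝文"],
}
freq = heading_frequencies(gazetteers)
similarity_1_2 = jaccard(gazetteers["gazetteer_1"], gazetteers["gazetteer_2"])
similarity_1_3 = jaccard(gazetteers["gazetteer_1"], gazetteers["gazetteer_3"])
```

Grouping such frequencies or similarities by period or by province would be one way to look for the temporal and geographical patterns mentioned above.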
