1. Introduction

For many years the importance of facsimile reproductions has been growing in textual scholarship. In German philology, the “Frankfurter Hölderlin-Ausgabe” (FHA) (Sattler), realised by Dietrich E. Sattler from 1975 to 2008, was the first edition to integrate facsimile reproductions systematically. While the basic concept had already been demonstrated in 1959 by the facsimile edition of Hölderlin’s “Friedensfeier”, which offered a diplomatic transcription of the preliminary stages (“Vorstufen”) (Binder & Kelletat), Sattler substantially widened this methodology for the FHA. The FHA made the whole editing process transparent and verifiable for the reader by openly presenting both the documentation (facsimiles and their diplomatic transcription) and the process of text constitution (phase analysis and resulting texts). By conceiving an edition as a process rather than as a presentation of the editor’s completed work (Martens 52), the FHA overcame the conventional separation between the edited text and its variants kept in the critical apparatus, a separation present in Friedrich Beissner’s “Grosse Stuttgarter Ausgabe”, to which the FHA objected (Louth 898).

Consequently, the constituted texts can be traced back to the documents (represented by facsimiles) they are based on. This implies a different relation between documents and texts than in conventional editions. Instead of separating texts from documents and overcoming the latter, text can be conceived as “a function of the document” (Gabler 199). In German philology, this paradigm change began with the FHA and was followed by other print edition projects such as the “Franz Kafka-Ausgabe” (FKA) (Reuss & Staengle) or the “Kritische Robert Walser-Ausgabe” (KWA) (Groddeck & Reibnitz). In fact, the FHA relies on the manuscripts as images and transforms them into text(s). Because this transformation is highly interpretive (and hence always uncertain), the reader is put in a position to observe the process of text constitution critically. In the world of digital editing, the systematic integration of facsimiles is a natural step, since digitisation projects usually create digital representations of analogue, material documents by reproducing their visual appearance. In general, digitisation is bound to the use of digital photography.1

In this article, we first present a method for transcribing digital facsimiles2 currently being developed as a part of SALSAH (System for Annotation and Linkage of Sources in Arts and Humanities), a generic Virtual Research Environment (VRE) for humanities scholars used by several research projects (for more information on SALSAH, see Schweizer). The transcription method will be applied by the “Anton Webern Gesamtausgabe”, which edits the musical works of Anton Webern (print edition). Parts of the available supplementary material, such as letters and postcards, will be edited in a separate series referred to as the “Webern Studies” (print edition). At the moment, our transcription efforts concentrate on the supplementary (textual) material of the “Webern Studies”, which will also be made available online for research. In future, we will also attempt to extend our transcription method to sheet music, to be used for the preparation of the “Anton Webern Gesamtausgabe”. In section 3, we discuss transcriptions as an integral part of SALSAH’s annotation and linkage functionality. Like any other digital object in SALSAH, transcriptions (or parts of them) can be related to other resources and critically commented on. The resulting network-like structure can be browsed dynamically by the users.

2. Transcribing Digital Facsimiles in SALSAH

Within SALSAH, we are developing a tool implementing a topographical transcription method for digital facsimiles: the user is able to create transcriptions directly in his or her research environment. In fact, transcriptions can be seen as a kind of annotation. As in any other process of annotation in SALSAH, the user interacts with the graphical user interface (GUI) running in his or her web-browser.3 The GUI is realised exclusively on the basis of HTML/CSS and JavaScript.

The first step in the process of transcription is the definition of a geometrically described region4 on the facsimile. This step takes the visual coherence of textual information (e.g., text blocks, side notes) as the primary reference (hence the term topographical). Subsequently, each region is associated with a transcription representing its textual information combined with visual attributes (text formatting). This method reflects the preference for a documentary approach in modern editing theory.

The regions are defined directly on the digital facsimile using a drawing application implemented with the HTML5 canvas element and JavaScript. On the right, these regions are automatically represented by elements that allow for direct text editing. That way, transcriptions can be created corresponding to the topographical relations on the digital facsimile resulting in a diplomatic transcription.

Figure 1: The screenshot shows a postcard (Österreichische Nationalbibliothek, Sign. Autogr. 431 1-200, April 24th 1937) sent to Josef Humplik and Hildegard Jone by Anton Webern with the corresponding transcription in SALSAH.


Figure 1 shows the facsimile of a postcard that is part of the edition of Anton Webern’s correspondence (supplementary material). For the transcription, four regions have been created following the spatial composition of textual information on the document (the sender and the receiver are not part of the example since they can be represented easily as ordinary metadata or by using SALSAH's region-of-interest-functionality5). Each of these has been associated with a line-by-line transcription using SALSAH’s built-in text editor: the underline is represented as well as the rotations.

In this approach, the individual regions are typically defined and transcribed independently of each other. No assumptions are made about the sequential relation between the transcribed regions: there is no overall continuous text, and the relation between the regions remains purely spatial. In contrast, the transcribed text within a single region is a clearly defined sequence of characters, arranged line by line, and thus forms a conventional text. Combining the region-based transcriptions into a sequence requires interpretation, since it makes explicit how the individual transcriptions relate to each other. We will integrate functionality into the transcription tool to constitute an overall text on the basis of a diplomatic transcription (e.g. stating that regions 1, 2 and 3 form a continuous text whereas region 4 is a standalone side note). This explicit interpretation (a continuous sequence of characters) no longer contains spatial information and could then be exported as TEI/XML or as a text file (e.g. in rich text format).
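The planned text constitution can be pictured as an explicit ordering of otherwise independent regions. The following is a minimal sketch in Python, not SALSAH's implementation: the function name `constitute_text`, the dictionary layout and the placeholder region texts are assumptions made purely for illustration.

```python
def constitute_text(regions, sequences, separator="\n"):
    """Join region transcriptions in an editor-defined order.

    regions:   dict mapping a region number to its transcribed text
    sequences: list of lists; each inner list names the regions that
               the editor interprets as one continuous text
    """
    texts = []
    for sequence in sequences:
        texts.append(separator.join(regions[number] for number in sequence))
    return texts


# Placeholder texts stand in for the four regions of Figure 1.
regions = {1: "text of region 1", 2: "text of region 2",
           3: "text of region 3", 4: "text of region 4"}

# Interpretation: regions 1, 2 and 3 form a continuous text,
# region 4 is a standalone side note.
continuous, side_note = constitute_text(regions, [[1, 2, 3], [4]])
```

The ordering is supplied from outside the regions, so the diplomatic transcription itself stays free of any sequential assumption.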

Figure 2: The screenshot shows the back of the postcard shown in Figure 1 (Österreichische Nationalbibliothek, Sign. Autogr. 431 1-200, April 24th 1937).


Figure 2 shows the back of the postcard shown in Figure 1. In this case, there are interlinear insertions. They can be transcribed without making their relation to the lines above or below explicit, that is, without marking them as insertions at a specific point or positioning them in the character stream of the main region. Again, their definition remains purely spatial.

Figure 3: PDF-export of the diplomatic transcription shown in Figure 1.


SALSAH offers the possibility of exporting a diplomatic transcription as a PDF. Since PDF is based on PostScript (a programming language for page description), it is well suited to representing a diplomatic transcription. The PDF is created on the server from the information the user has created in the SALSAH GUI (see Example 1). The creation of a PDF is quite similar to the drawing functionality used on the client side inside the canvas element (two-dimensional drawing context): the drawing model of PostScript and that of the canvas are closely related.

Each region associated with a transcription including attributes constitutes a single ‘transcription area' stored in SALSAH’s database. Every ‘transcription area' is represented by a reference to the resource (represented by a facsimile) it belongs to, the geometrical definition of the region on the facsimile, and the transcription text along with its attributes. Example 1 shows the serialisation of the ‘transcription area' of region 1 (see numbering on figure) shown in Figure 1:

Example 1: Serialisation of a ‘transcription area':

a) Geometry:


b) Character Stream:

Aber nun wird’s ja hoffentlichim Herbst wirklich werden. —Wann sehn wir uns endlichwieder?!!!Ich hatte viel Plage die letzte Zeitohne eigentlich so recht zu meinerArbeit kommen zu können!Und wie steht es um Eure?!!!Bitte, gebt uns bald Nachricht!!!Ich begebe mich heute mittagsauf eine Tour [über Sonntag]ins Ötscher-Gebiet. [Per Motorrad.]Ich muß mich ein bißchen auslaufen.Heraußen blüht es schon ganzgroßartig. Unser Garten wirdglaube ich sehr schön heuer. — Hoffentlich könnt Ihr nun dochbald zu uns kommen Das wäre

c) Properties:


d) Additional Infos:


a) Representation of the rectangular figure on the digital facsimile. The upper left and the lower right corner (x and y values) are stored relatively (range from 0 to 1) to the facsimile's width and height in order to allow for zooming. In case of several facsimiles of the same document, we could use the shared canvas model (http://www.shared-canvas.org/datamodel/spec/) allowing for the mapping of the ‘transcription area' onto multiple images (using a virtual canvas for the coordinates).

b) The text of a ‘transcription area’ is represented as a plain UTF-8 character stream (neither the line breaks nor the attributes are included). UTF-8 covers a wide range of characters, including characters with diacritics and other multibyte characters. SALSAH's text editor offers a configurable character table from which the user can choose characters simply by clicking (without having to know the keyboard combination).

c) For each property (represented by a name such as ‘linebreak’ or ‘underline’), an arbitrary number of ranges (starting and ending positions) can be defined. The numbers refer to positions between characters: 0 is the position before the first character, and the highest possible value, the position after the last character, equals the length of the character stream.

d) For the whole region, additional information can be specified, such as the rotation (+/- 90°), a relative offset (this allows for movement relative to the geometrical position, as the spacing of printed characters is not the same as that of handwritten letters (Sprünglin 34)) and the font size (relative to the height of the facsimile to allow for different resolutions).
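The four parts a) to d) can be pictured as one record per ‘transcription area’. The following Python sketch is an illustration under stated assumptions, not SALSAH's actual data model: the class and field names, the default values and the sample coordinates are all made up, and only the general shape (relative geometry, plain character stream, named ranges, region-wide settings) follows the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Range = Tuple[int, int]


@dataclass
class TranscriptionArea:
    resource_id: str                              # hypothetical id of the facsimile resource
    geometry: Tuple[float, float, float, float]   # (a) x1, y1, x2, y2, each relative (0..1)
    text: str                                     # (b) plain character stream
    properties: Dict[str, List[Range]] = field(default_factory=dict)  # (c) named ranges
    rotation: int = 0                             # (d) degrees: -90, 0 or +90
    offset: Tuple[float, float] = (0.0, 0.0)      # (d) relative shift of the rendered text
    font_size: float = 0.02                       # (d) relative to the facsimile height

    def pixel_box(self, width: int, height: int) -> Tuple[int, int, int, int]:
        """Scale the relative corners to an image of the given pixel size."""
        x1, y1, x2, y2 = self.geometry
        return (round(x1 * width), round(y1 * height),
                round(x2 * width), round(y2 * height))

    def invalid_ranges(self) -> List[Tuple[str, Range]]:
        """Return every range that violates 0 <= start <= end <= len(text)."""
        limit = len(self.text)
        return [(name, (start, end))
                for name, ranges in self.properties.items()
                for start, end in ranges
                if not 0 <= start <= end <= limit]


area = TranscriptionArea(
    resource_id="postcard-recto",                 # made-up value
    geometry=(0.1, 0.2, 0.5, 0.6),                # made-up coordinates
    text="Bitte, gebt uns bald Nachricht!!!",
    properties={"underline": [(0, 30)]},
)
```

Because the corners are stored relative to the facsimile's width and height, `area.pixel_box(1000, 800)` and `area.pixel_box(2000, 1600)` describe the same region at two zoom levels without any change to the stored record.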

For the serialisation of the transcription text and its attributes, we have chosen an offset-based approach.5 Unlike in XML-based approaches, the attributes are stored separately from the text (the mere character stream).6 As shown in Example 1, the text’s attributes are represented by names associated with an array of objects indicating the beginning and ending positions within the character stream.7 For example, the attribute ‘underline’ affects the characters from position 215 to position 245 (“Bitte, gebt uns bald Nachricht”). The essential advantage of this offset-based approach is that it allows for various perspectives on the same text without having to deal with the problem of interference. Since the assignment of an attribute does not change the text itself (the positions of its characters), it does not interfere with other assignments. In XML-based approaches, this interference is known as overlap. It arises because the XML specification requires a document to be well-formed, that is, all its elements have to be properly nested, while the attributes one would like to encode might not fit into the hierarchical structure of a tree (Schmidt 344).
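A short Python sketch may make the offset-based idea concrete. `STREAM` holds the first 248 characters of the character stream printed in Example 1; with that stream, the range 215 to 245 selects the words “Bitte, gebt uns bald Nachricht”. The second layer, `highlight`, and the helper names `slices` and `segments` are made up for illustration; they show that two overlapping ranges can coexist without touching the text or each other.

```python
# First 248 characters of the character stream from Example 1 (no line breaks).
STREAM = (
    "Aber nun wird’s ja hoffentlichim Herbst wirklich werden. —Wann "
    "sehn wir uns endlichwieder?!!!Ich hatte viel Plage die letzte Zeit"
    "ohne eigentlich so recht zu meinerArbeit kommen zu können!"
    "Und wie steht es um Eure?!!!Bitte, gebt uns bald Nachricht!!!"
)

PROPERTIES = {
    "underline": [(215, 245)],   # range given in the text
    "highlight": [(207, 222)],   # hypothetical second layer, overlaps the underline
}


def slices(text, ranges):
    """Return the substring covered by each (start, end) range."""
    return [text[start:end] for start, end in ranges]


def segments(text, properties):
    """Split text at every range boundary and list the properties active in each piece."""
    cuts = {0, len(text)}
    for ranges in properties.values():
        for start, end in ranges:
            cuts.update((start, end))
    ordered = sorted(cuts)
    result = []
    for start, end in zip(ordered, ordered[1:]):
        active = sorted(name for name, ranges in properties.items()
                        if any(s <= start and end <= e for s, e in ranges))
        result.append((text[start:end], active))
    return result
```

Calling `segments(STREAM, PROPERTIES)` yields one piece where both layers are active (“Bitte, ”) flanked by pieces where only one applies. In an XML encoding the same situation would require splitting one of the two elements to keep the tree well-formed.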

The emphasis on visual appearance inherent in our approach contradicts a basic assumption of descriptive markup such as XML: that a document can be represented irrespective of its visual representation (cf. Pierazzo & Stokes 399). This assumption can be explained by looking at the origin of SGML (of which XML is a subset). SGML was developed in the 1970s at IBM to compose and represent documents such as manuals and technical documentation independently of different devices and platforms. SGML’s purpose was to describe a document’s structure in terms of ‘logical’ units like chapters, paragraphs etc., which could then be processed in an arbitrary way (cf. Schmidt 339). In fact, the purpose of descriptive markup was to overcome the need for procedural markup in the process of document composition. While the latter defined the procedures to be carried out (e.g. formatting instructions for printing), the former offered the possibility of an abstract and universal description of a document (Goldfarb 1981, Goldfarb 1990).

Descriptive markup requires an explicit formulation of a document's structure, whereas in modern editing theory a documentary8 approach has become common. The basic contradiction is that a documentary approach tries to describe ‘material’ characteristics (such as spatial relations or layout information) which are regarded as secondary or procedural in markup theory (cf. Pierazzo & Stokes 400). SGML/XML presupposes an unambiguous ‘logical’ structure, but such a structure can only be built on the basis of interpretation and is thus always uncertain (cf. Sprünglin 2008). It has become increasingly accepted in editing theory that:

“it is documents that we have, and documents only. In all transmission and all editing, texts are (and, if properly recognised, always have been) constructs from documents. For to edit texts critically means precisely this: to construct them.” (Gabler 199).

The serialisation shown in Example 1 is used as the primary structure for storing a ‘transcription area’ in the database. But since SALSAH is completely web-based, we have to present the transcription to the user as HTML. We have therefore developed a JavaScript application which converts the HTML representation of a transcription to the primary offset-based data structure and vice versa. The HTML representation is only used for display purposes, so its structure is not semantically relevant: while SGML/XML-based approaches associate the hierarchical structure of a document with meaning, we consider different hierarchical structures as equivalent as long as they result in the same visual representation.
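The two-way conversion can be sketched as follows. This is a Python illustration of the idea, not the JavaScript application used in SALSAH: the function names, the use of `span` elements with class names, and the omission of line breaks are assumptions made to keep the example short.

```python
from html import escape
from html.parser import HTMLParser


def to_html(text, properties):
    """Render a character stream plus named ranges as flat <span> elements."""
    cuts = {0, len(text)}
    for ranges in properties.values():
        for start, end in ranges:
            cuts.update((start, end))
    ordered = sorted(cuts)
    parts = []
    for start, end in zip(ordered, ordered[1:]):
        active = sorted(name for name, ranges in properties.items()
                        if any(s <= start and end <= e for s, e in ranges))
        piece = escape(text[start:end])
        if active:
            piece = '<span class="{}">{}</span>'.format(" ".join(active), piece)
        parts.append(piece)
    return "".join(parts)


def _merge(ranges):
    """Fuse overlapping or adjacent ranges so equivalent markup gives equal results."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


class _OffsetReader(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks, self.length = [], 0
        self.open, self.properties = [], {}   # stack of (class names, start offset)

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self.open.append((classes, self.length))

    def handle_endtag(self, tag):
        classes, start = self.open.pop()
        for name in classes:
            self.properties.setdefault(name, []).append((start, self.length))

    def handle_data(self, data):
        self.chunks.append(data)
        self.length += len(data)


def from_html(markup):
    """Recover the character stream and the named ranges from span-based HTML."""
    reader = _OffsetReader()
    reader.feed(markup)
    reader.close()
    merged = {name: _merge(ranges) for name, ranges in reader.properties.items()}
    return "".join(reader.chunks), merged
```

In this sketch a nested and a flat arrangement of spans that render identically, for instance `<span class="a">x<span class="b">y</span></span>` and `<span class="a">x</span><span class="a b">y</span>`, are read back as the same set of ranges, which mirrors the point that the HTML hierarchy carries no meaning of its own.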

Consequently, we are using HTML as a markup language to describe a visual output. HTML 4 was defined by using SGML, so it is actually an application of SGML as a metalanguage (a language to define languages). With the advent of XHTML, SGML was replaced by XML as the metalanguage applied. Strictly speaking, our approach is an abuse of HTML: it uses HTML elements to generate a certain rendering, although HTML is based upon the ideas of descriptive markup. (In a technical sense, the distinction between content and form is preserved through the use of CSS, but since HTML elements have a visual rather than a structural meaning in our approach, this distinction is blurred.)

However, the current TEI Guidelines (P5) also offer an encoding model for transcriptions that actually contradicts the principles of descriptive markup. Unlike the mere linking of facsimiles to TEI encodings representing textual information (‘parallel transcription’, Burnard & Bauman, section 11.2.1), the ‘embedded transcription’ (Burnard & Bauman, section 11.2.2) describes the “physical disposition of its [the document's] component parts”. The notable innovation is the topographical orientation of the ‘embedded transcription’. While physical aspects such as page breaks have conventionally been represented by milestone elements (empty elements) in TEI encodings in order to avoid the problem of overlap with ‘logical’ elements, the primary hierarchy of this new encoding follows the visual appearance of the document, thus changing the relation between document and text: “An embedded transcription is one in which words and other written traces are encoded as subcomponents of elements representing the physical surfaces carrying them rather than independently of them.” (Burnard & Bauman, section 11.2.2). The primary hierarchy consists of zone elements indicating geometrical areas on the document surface. Each of these zone elements contains line elements with the transcribed text (cf. Pierazzo & Stokes 423f.). We will offer a conversion of ‘transcription areas’ as shown in Example 1 to a TEI ‘embedded transcription’, as this form of TEI encoding does not require an overall text.
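How such a conversion might look can be sketched in Python with the standard library's ElementTree. This is an illustration under several assumptions, not the planned SALSAH export: the geometry values are invented, the line-break positions (30, 58, 83) are inferred from the places where spaces are missing in the printed character stream of Example 1, and the input dictionary layout is hypothetical. Only the TEI element names `sourceDoc`, `surface`, `zone` and `line` and the coordinate attributes `ulx`, `uly`, `lrx`, `lry` are taken from the TEI Guidelines.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def split_lines(text, break_positions):
    """Cut the character stream at the given positions (one piece per line)."""
    lines, start = [], 0
    for position in sorted(break_positions):
        lines.append(text[start:position])
        start = position
    lines.append(text[start:])
    return lines


def to_embedded_transcription(areas):
    """Build a sourceDoc with one zone per area and one line element per line."""
    ET.register_namespace("", TEI_NS)
    source_doc = ET.Element(f"{{{TEI_NS}}}sourceDoc")
    # The surface spans the unit square, so relative coordinates can be reused as they are.
    surface = ET.SubElement(source_doc, f"{{{TEI_NS}}}surface",
                            ulx="0", uly="0", lrx="1", lry="1")
    for area in areas:
        x1, y1, x2, y2 = area["geometry"]
        zone = ET.SubElement(surface, f"{{{TEI_NS}}}zone",
                             ulx=str(x1), uly=str(y1), lrx=str(x2), lry=str(y2))
        for line_text in split_lines(area["text"], area["breaks"]):
            ET.SubElement(zone, f"{{{TEI_NS}}}line").text = line_text
    return ET.tostring(source_doc, encoding="unicode")


area = {
    "geometry": (0.12, 0.08, 0.55, 0.62),   # invented coordinates
    "text": ("Aber nun wird’s ja hoffentlichim Herbst wirklich werden. —"
             "Wann sehn wir uns endlichwieder?!!!"),
    "breaks": [30, 58, 83],                 # assumed line-break positions
}
tei_xml = to_embedded_transcription([area])
```

Each zone is written independently of the others, so no overall reading order has to be decided before exporting.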

3. Annotation and Linkage of Digital Objects

Within SALSAH, 'transcription areas' are objects relating to other objects (such as postcards, letters etc.). Since SALSAH's annotation and linkage functionality is generic it applies to digital objects in general irrespective of their concrete shape. In this section, SALSAH's functionality is explained by reference to the Webern-project. Basically, the same functionality is used in an art-historical context (Incunabula-project)9 and in collaboration with a virtual library of medieval manuscripts (e-codices).10

Regarding the “Webern Studies”, the project basically11 uses two types of objects in SALSAH: events (dated biographical and work-related events) and supplements (letters, postcards, articles, diaries, notes). While the former are immaterial (they have to be constituted), the latter may be connected to a digital representation of the original material (e.g. a digital facsimile). The information contained in supplements documents events. For this reason, supplementary material may be related to events. In SALSAH, this kind of relation between objects is called a resource reference. A resource reference may be based on a supplement object as a whole, on parts (one or multiple words) of its transcription, or on any other annotation. In this way, a description contained in an annotation or transcription may be related to an event such as a biographical one or a concert. Technically, these references from an annotation or transcription to an event etc. can be represented like other attributes: in an offset-based form.

Example 2: Linking a word in a transcription to an object in SALSAH:

Character Stream

[...] Ich hatte viel Plage die letzte Zeit ohne eigentlich so recht zu meiner Arbeit kommen zu können! [...]



In Example 2, the word “Arbeit” (see Figure 1, region number 1) in the transcription text is related to an event object (represented by the pseudo resource-id ‘xy’ here). Since SALSAH offers a generic text editor, resource references can be defined in any textual annotation to any object in SALSAH. The video below shows how a reference to another resource can be established in SALSAH.
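A resource reference of this kind can be sketched in Python as one more named range that additionally carries the target's identifier. The property name `link`, the key `resource_id` and the function name are assumptions made for this illustration; the excerpt is the one printed in Example 2 (without the surrounding ellipses) and ‘xy’ is the pseudo resource-id used there.

```python
def add_resource_reference(text, properties, word, resource_id):
    """Attach a reference to the first occurrence of `word` as an offset range."""
    start = text.index(word)            # raises ValueError if the word is absent
    end = start + len(word)
    properties.setdefault("link", []).append(
        {"start": start, "end": end, "resource_id": resource_id})
    return properties


excerpt = ("Ich hatte viel Plage die letzte Zeit ohne eigentlich so recht "
           "zu meiner Arbeit kommen zu können!")
properties = add_resource_reference(excerpt, {}, "Arbeit", "xy")
reference = properties["link"][0]
```

As with ‘underline’, the character stream itself is left unchanged; the reference only records where in the stream it applies.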

Video: Linking parts of a text to another object in SALSAH


Webern writes about troubles which kept him from working on his composition. According to the Webern biographer Hans Moldenhauer, Webern was working on opus 28 at the time when he wrote the postcard (April 24th 1937) (Moldenhauer 443). The word in question in the transcription refers directly to the object representing the event of the composition process of opus 28. Unlike a conventional hyperlink (the anchor tag in HTML), SALSAH recognises outgoing as well as incoming links to resources. This way, each link can be treated as being bidirectional (e.g., as feedback for the user in the GUI when looking at an object). As a result, relations between objects constitute a network-like structure as shown in Figure 4.
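The difference from a one-way hyperlink can be shown with a small Python sketch that records every link in both directions. The class `LinkIndex` and the resource identifiers are hypothetical and only illustrate the principle; they say nothing about how SALSAH stores its links.

```python
from collections import defaultdict


class LinkIndex:
    """Keep outgoing and incoming links so each link can be followed both ways."""

    def __init__(self):
        self.outgoing = defaultdict(set)   # source id -> ids it points to
        self.incoming = defaultdict(set)   # target id -> ids pointing to it

    def add(self, source, target):
        self.outgoing[source].add(target)
        self.incoming[target].add(source)

    def neighbours(self, resource):
        """All resources connected to `resource`, regardless of link direction."""
        return self.outgoing[resource] | self.incoming[resource]


index = LinkIndex()
index.add("postcard-1937-04-24", "event-opus-28")   # made-up identifiers
index.add("letter-example", "event-opus-28")
```

Looking at the event object, `index.incoming["event-opus-28"]` lists every document that refers to it, which is the information a plain anchor tag does not provide.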

Figure 4: Network structure in SALSAH.


Besides a simple full-text search (as known, for example, from Google), SALSAH offers an extended search mode to browse this network-like structure. This mode allows for a structured search: the user may indicate what kind of resources (e.g. supplements or events) he or she is looking for and also specify their characteristics (e.g. the exact date or a period). Using SALSAH's extended search mode, its network-like structure (generated by the researchers themselves) can be filtered according to the researchers' current interests. This is made possible by SALSAH's semantic implementation of the Resource Description Framework (RDF): each resource (subject) in SALSAH has a resource type with a set of defined properties (predicates). Additionally, the occurrence of these properties can be defined (0-1, 1, 1-n, 0-n).
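The occurrence constraints can be read as lower and upper bounds on how many values a property may have. The Python sketch below illustrates this reading; the resource type `EVENT_TYPE`, its property names and the function `check_occurrences` are invented for the example and do not reproduce SALSAH's actual vocabulary.

```python
# Occurrence codes from the text, as (minimum, maximum); None means unbounded.
OCCURRENCES = {"0-1": (0, 1), "1": (1, 1), "1-n": (1, None), "0-n": (0, None)}

# Hypothetical resource type: property name -> occurrence code.
EVENT_TYPE = {"date": "1", "title": "1", "description": "0-1", "source": "0-n"}


def check_occurrences(resource_type, values):
    """Return a list of problems; an empty list means the resource fits its type."""
    problems = []
    for prop, occurrence in resource_type.items():
        low, high = OCCURRENCES[occurrence]
        count = len(values.get(prop, []))
        if count < low or (high is not None and count > high):
            problems.append(f"{prop}: expected {occurrence}, found {count}")
    for prop in values:
        if prop not in resource_type:
            problems.append(f"{prop}: not defined for this resource type")
    return problems
```

A structured search can rely on such a type definition: if every event is guaranteed to have exactly one date, filtering events by date or period is well defined.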

For optimal usability, we are developing a visualisation tool in SALSAH. With this tool, it is possible to navigate interactively in SALSAH's network structure. For this purpose, the RDF graphs are visualised using the HTML5 canvas element and a JavaScript-based graph visualisation library.12 The graph visualisation makes indirect and thus implicit relations between objects visible and allows for intuitive and interactive navigation in SALSAH's database (see video).

Video: SALSAH's visualisation tool


4. Conclusion

The article presented a tool for the creation of diplomatic transcriptions for digital facsimiles as an integral part of SALSAH, a VRE for the humanities. The transcription tool provides a web-based application with a GUI and an automatically generated offset-based serialisation allowing for exports of diplomatic transcriptions as PDF or TEI-'embedded transcription'. The main characteristic of a diplomatic transcription is that its visual and spatial approach does not presuppose an overall textual structure. Therefore, we have chosen a serialisation which does not rely on the principles of descriptive markup inherent in SGML and XML. There are plans to extend the tool by adding functionality to constitute texts by combining single regions: mere spatial relations of a diplomatic transcription can be made explicit and exported as texts (as TEI/XML or as text files such as the rich text format).

Furthermore, using a VRE allows transcriptions and texts to be treated in a manner that goes beyond the possibilities of printed editions: (transcription) texts can be annotated and linked to other objects in arbitrary ways. As the basis of academic work (the edition) becomes the medium of academic work, research can be carried out in the same medium in which the edition has been established.13

5. References

Binder, Wolfgang and Kelletat, Alfred (ed.). Friedrich Hölderlin: Friedensfeier. Lichtdrucke der Reinschrift und ihrer Vorstufen, Tübingen. 1959.

Burnard, Lou and Syd Bauman. Guidelines for Electronic Text Encoding and Interchange. 2007. 5. http://www.tei-c.org/release/doc/tei-p5-doc/en/html/. (Updated 17/01/2013)

Gabler, Hans Walter. The Primacy of the Document in Editing. Ecdotica 4, 2007. 197-207.

Goldfarb, Charles F. “A Generalized Approach to Document Markup”. Proceedings of the ACM SIGPLAN SIGOA Symposium on Text Manipulation, New York, 1981. 68-73.

Goldfarb, Charles F. The SGML Handbook. Oxford, 1990.

Groddeck, Wolfram and von Reibnitz, Barbara (ed.) Robert Walser. Kritische Ausgabe sämtlicher Drucke und Manuskripte. Basel, 2008ff.

Louth, Charly. “The Frankfurt Edition of Hölderlin's Hymns: A Review Article”. The Modern Language Review 98.4, 2003. 898-907.

Martens, Gunter. Texte ohne Varianten? Überlegungen zur Bedeutung der Frankfurter Hölderlin-Ausgabe in der gegenwärtigen Situation der Editionsphilologie, in: Zeitschrift für deutsche Philologie 101, Sonderheft: Probleme neugermanistischer Edition, 1982. 43-64.

Moldenhauer, Hans. Anton von Webern: Chronik seines Lebens und Werkes. Zürich 1980.

Pierazzo, Elena and Peter A. Stokes. "Putting the Text back into Context: A Codicological Approach to Manuscript Transcription." Kodikologie und Paläographie im Digitalen Zeitalter 2. Ed. Franz Fischer. Norderstedt, 2010. 397–429.

Reuss, Roland and Staengle, Peter (ed.) Franz Kafka. Historisch-kritische Ausgabe sämtlicher Handschriften, Drucke und Typoskripte, Frankfurt am Main, 1995ff.

Sattler, Dietrich E. (ed.) Friedrich Hölderlin. Sämtliche Werke. ‘Frankfurter Ausgabe’, Frankfurt am Main. 1975-2008.

Schmidt, Desmond. “The inadequacy of embedded markup for cultural heritage texts”. Literary and Linguistic Computing, 25.3, 2010. 337-356.

Schweizer, Tobias. "Development of a Topographical Transcription Method". Claire Clivaz et al. (ed.): Lire demain. Des manuscrits à l'ère digitale, Lausanne, 2012. 671-679.

Sprünglin, Matthias. Zu Theorie und Praxis der elektronischen Edition in der Kritischen Robert Walser-Ausgabe, in: TEXT. Kritische Beiträge 12, 2008. 31-38.