2. Global

Encoding Common to Every Document


2.1 Header
2.2 Unique Identifiers
2.3 Basic Document Structure
2.4 Titles and Naming
2.5 References to External Files (Page Breaks and Entity Declarations)

[Note: Above the Header of each document are the XML Declaration, Document Type Declaration, External Entity Declarations, and the open tag of the "root element," TEI, which contains all other elements. Please go to the Annotated Template or to section 2.5 to read more about how to insert these.]

2.1 The Header

Every XML document we create has a "header," a kind of electronic title page that carries essential information about who is responsible for creating and publishing the document and about the source of the text we are marking up. The header is analogous to a book's first few pages, which inform you of the author, publisher, copyright date, terms of publication, etc.

Since much of the information in the header is the same for all of the XML documents we create, we recommend that you use the template to simplify your encoding of it.

Below, you will find descriptions of the main parts of the header, and you can click here to consult an annotated version of the template.

The <teiHeader> has three principal components:

  • <fileDesc> contains a full bibliographic description of an electronic file
  • <profileDesc> provides a detailed description of non-bibliographic aspects of a text, specifically the situation in which it was produced, the participants, and their setting
  • <revisionDesc> summarizes the revision history for a file

These elements are arranged within the <teiHeader> in this order, so the overall structure of <teiHeader> is this:

<teiHeader>
<fileDesc></fileDesc>
<profileDesc></profileDesc>
<revisionDesc></revisionDesc>
</teiHeader>
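
As a rough illustration of the ordering rule above, the following Python sketch checks that the three principal components appear in the expected order inside a <teiHeader> fragment. The function name is hypothetical and not part of our tooling, and the sketch assumes that a component may be absent but may not be repeated or out of order.

```python
import xml.etree.ElementTree as ET

# Order of the principal <teiHeader> components, as given in the guidelines.
EXPECTED_ORDER = ["fileDesc", "profileDesc", "revisionDesc"]

def header_children_in_order(header_xml):
    """Return True if the principal components appear in the expected order.

    Assumption: missing components are tolerated; duplicates and
    out-of-order components are not.
    """
    root = ET.fromstring(header_xml)
    found = [child.tag for child in root if child.tag in EXPECTED_ORDER]
    wanted = [tag for tag in EXPECTED_ORDER if tag in found]
    return found == wanted
```

A header whose <revisionDesc> precedes its <fileDesc> would be reported as out of order.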

File description <fileDesc>

This should contain the following components:

  • Title statement <titleStmt> includes 1) the title given to the electronic work (which here always includes the subtitle provided by us: "a machine readable transcription"); 2) the author; 3) the editors; 4) information about others responsible for aspects of the electronic text; and 5) the name of the sponsors and funders. An example in which the original document bears a title given by Whitman:

<titleStmt>
<title level="m" type="main">Song of Myself</title>
<title level="m" type="sub">a machine readable transcription</title>
<author>Walt Whitman</author>
<editor>Ed Folsom</editor>
<editor>Kenneth M. Price</editor>
<respStmt>
<resp>Transcription and encoding</resp>
<name>The Walt Whitman Archive Staff</name>
</respStmt>
<sponsor>The Institute for Advanced Technology in the Humanities</sponsor>
<sponsor>University of Iowa</sponsor>
<sponsor>University of Nebraska-Lincoln</sponsor>
<funder>The National Endowment for the Humanities</funder>
<funder>The United States Department of Education</funder>
</titleStmt>

An example for a manuscript that lacks an authorial title (to read the guidelines for assigning titles, click here):

<titleStmt>
<title level="m" type="main" rend="bracketed">I see who you are</title>
<title level="m" type="sub">a machine readable transcription</title>
. . .
</titleStmt>
etc.

Note that the title element includes a rend attribute that indicates it has been supplied by us and should therefore be displayed with brackets.

<editionStmt>
<edition>
<date>2005</date>
</edition>
</editionStmt>

<publicationStmt>
<idno>uva.00023</idno>
<distributor>The Walt Whitman Archive</distributor>
<address>
<addrLine>The Institute for Advanced Technology in the Humanities</addrLine>
<addrLine>Alderman Library</addrLine>
<addrLine>University of Virginia</addrLine>
<addrLine>P.O. Box 400115</addrLine>
<addrLine>Charlottesville, VA 22904-4115</addrLine>
<addrLine>[email protected]</addrLine>
</address>
<availability>
Copyright &#169; 2005 by Ed Folsom and Kenneth M. Price, all rights reserved. Items in the Archive may be shared in accordance with the Fair Use provisions of U.S. copyright law. Redistribution or republication on other terms, in any medium, requires express written consent from the editors and advance notification of the publisher, The Institute for Advanced Technology in the Humanities. Permission to reproduce the graphic images in this archive has been granted by the owners of the originals for this publication only.
</availability>
</publicationStmt>

<sourceDesc>
<bibl>
<author>Walt Whitman</author>
<title>Calamus Leaves</title>
<orgName>Yale Collection of American Literature, Beinecke Rare Book and Manuscript Library</orgName>
<note type="project">Transcribed from our own digital image of original manuscript.</note>
</bibl>
</sourceDesc>

Note on <orgName>: The institution that holds the manuscript should be cited as listed in the Preferred Citation table in the References section of the Encoding Guidelines.

Notes on description of source: This information is about the copy text, and the <title> here (as opposed to the one in titleStmt) should be given exactly as it appears in the records of the institutional repository, no matter how imprecise or wrong-headed their conventions may seem. Many times, the most specific title for the material will be that given to the folder used to store it, since few archives assign a title to each individual item; often, therefore, the <title> given in the <sourceDesc> will be a folder label.

At present, we almost always work from our own digital images, but we have also worked from Joel Myerson's facsimile reproductions of Whitman manuscripts (published in Joel Myerson, The Walt Whitman Archive: A Facsimile of the Poet's Manuscripts, New York: Garland, 1993); from the Primary Source Media Whitman CD (Major Authors on CD-ROM: Walt Whitman, Eds. Ed Folsom and Kenneth M. Price, Woodbridge, CT: Primary Source Media, 1997); or from the original manuscripts themselves. Whatever the case, specific information about the image(s) and/or text(s) you rely on should be given in a <note>. If you consult more than one source, list each, separated by semicolons. (Please note that when citing Myerson, the volume number, part number, and page number change from manuscript to manuscript.)

By the way, we say "our own digital image" rather than, say, the Whitman Archive's digital image so as to draw a clear distinction with Myerson's volumes, also called—somewhat confusingly—the Whitman Archive.

Profile description <profileDesc>

The <profileDesc> contains a list of all hands other than Whitman's that the markup declares as being in any way responsible, typically as the value of a "resp" (or "responsibility") attribute in a <note>, <unclear>, or <gap> element.

For example, if you are transcribing a Whitman manuscript that has a note by Fredson Bowers written physically on it, the header must have a <profileDesc> that reads:

<profileDesc>
<handList>
<hand scribe="Fredson Bowers" id="fb"/>
</handList>
</profileDesc>

(For more on this topic and how to encode non-Whitman writing on manuscripts, see section 3.10, "Writing in Others' Hands".)

You also need to include a <handList> in the <profileDesc> if your markup includes any <unclear> or <gap> elements, which require a "resp" attribute. For example, if Andy Jewell is encoding a manuscript with an unclear word and inserts this markup:

<unclear reason="cut away" cert="60%" resp="awj">herbage</unclear>

the document's <teiHeader> will need to include this <profileDesc>:

<profileDesc>
<handList>
<hand scribe="Andrew Jewell" id="awj"/>
</handList>
</profileDesc>
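
One way to catch a missing <hand> entry is to compare every "resp" value used in the markup against the "id" values declared in the <handList>. The Python sketch below is only an illustration: the function name is hypothetical, and it assumes a well-formed document without XML namespaces.

```python
import xml.etree.ElementTree as ET

def undeclared_hands(document_xml):
    """Return the sorted "resp" values that have no matching <hand id="...">.

    Assumption: elements carry no XML namespace, so "hand" can be
    matched by its bare tag name.
    """
    root = ET.fromstring(document_xml)
    declared = {hand.get("id") for hand in root.iter("hand")}
    used = {el.get("resp") for el in root.iter() if el.get("resp") is not None}
    return sorted(used - declared)
```

An empty list means every "resp" value is backed by a <hand> in the header; any value returned needs a new <hand> entry.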

Revision description <revisionDesc>

The <revisionDesc> element summarizes the changes that have been made to the file. Each <change> contains <date>, <respStmt>, <name>, and <item> elements to specify the date, the responsible individual, and the change made. IMPORTANT: TEI allows only one <item> per <change>. If several changes are made at the same time, describe them all within the same <item>, separated by semicolons. If changes are made at different times, add another <change> at the top, so that changes are listed in reverse chronological order (most recent change first). To describe the tasks in our routine workflow, choose from the following terms for the content of <item>:

If the task is something other than these, any descriptive phrase can be used. Example:

<revisionDesc>
<change>
<date>2002-10-30</date>
<respStmt>
<name>Brett Barney</name>
</respStmt>
<item>Converted to camel case</item>
</change>
<change>
<date>2002-09-14</date>
<respStmt>
<name>Kenneth M. Price</name>
</respStmt>
<item>Edited</item>
</change>
<change>
<date>2002-09-07</date>
<respStmt>
<name>Andrew Jewell</name>
</respStmt>
<item>Checked; revised</item>
</change>
<change>
<date>2000-08-22</date>
<respStmt>
<name>Matt Miller</name>
</respStmt>
<item>Transcribed; encoded</item>
</change>
</revisionDesc>
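
The two rules above (one <item> per <change>, most recent change first) can be checked mechanically. The following Python sketch is a hypothetical helper, not part of our workflow; it assumes dates are written as YYYY-MM-DD, as in the example, so that they sort correctly as plain strings.

```python
import xml.etree.ElementTree as ET

def check_revision_desc(revision_xml):
    """Return a list of problems found in a <revisionDesc> fragment."""
    root = ET.fromstring(revision_xml)
    problems = []
    dates = []
    for index, change in enumerate(root.findall("change"), start=1):
        items = change.findall("item")
        if len(items) != 1:
            problems.append(
                f"change {index}: expected 1 <item>, found {len(items)}"
            )
        # Assumption: dates use the YYYY-MM-DD form, so string order
        # matches chronological order.
        dates.append(change.findtext("date", default="").strip())
    if dates != sorted(dates, reverse=True):
        problems.append("changes are not in reverse chronological order")
    return problems
```

An empty list means the fragment follows both rules.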

2.2 Unique Identifiers


Description

Unique identifiers are one-of-a-kind names assigned to each electronic text we create. That is, every poem, collection of poems and work (for an explanation of "work" vs. "document" click here) must have a unique ID.

Creating and assigning IDs

For manuscripts, IDs are made up of a 3-character repository code plus a 5-digit number (assigned in ascending order), with the two fields separated by a dot.
Examples:
loc.00158 (a manuscript at the Library of Congress)
uva.00001 (a manuscript at University of Virginia)

Printed texts are all assigned the 3-letter prefix "ppp."
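
The ID format described above lends itself to a simple pattern check. The Python sketch below is illustrative only: the function name is hypothetical, and restricting the repository code to three lowercase letters is an assumption based on the examples (the rule itself says only "3-character").

```python
import re

# Assumed shape: three lowercase letters, a dot, then exactly five digits.
ID_PATTERN = re.compile(r"([a-z]{3})\.(\d{5})")

def parse_document_id(identifier):
    """Split an ID into (repository_code, number); return None if malformed."""
    match = ID_PATTERN.fullmatch(identifier)
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```

For instance, "loc.00158" splits into the repository code "loc" and the number 158, while "uva.23" is rejected because the number is not five digits long.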

ID database

We use a database to track the unique identifiers and our workflow as we transcribe, encode, and upload manuscripts. This database can be accessed here.

Placement of IDs

The unique identifier appears in two places in the TEI header:

Transcription file names

To name the file when you save it, simply add the file extension ".xml" to the ID. Example:
uva.00023.xml

Image file names

Each page image of a document is also given an ID, created by adding a three-digit suffix to the document ID. For example:
loc.00158.002 (Page 2 of a manuscript)

These page image IDs are inserted as the value of the corresp attribute of the appropriate page break elements (<pb/>), and an entity declaration for each one must be inserted between the square brackets in the document type declaration. Example:

. . .
<!DOCTYPE TEI.2 PUBLIC "-//UVA::IATH//DTD whitman.dtd (Whitman Archive)//EN" "whitman.dtd" [

<!ENTITY uva.00023.001 SYSTEM "uva.00023.001.jpg" NDATA jpeg>
<!ENTITY uva.00023.002 SYSTEM "uva.00023.002.jpg" NDATA jpeg>

]>
. . .
<pb corresp="uva.00023.001" />
. . .
<pb corresp="uva.00023.002" />
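
Because the file name, the page image IDs, the entity declarations, and the page breaks all derive from the document ID, they can be generated rather than typed by hand. The following Python sketch shows one way to do this; the function names are hypothetical, and the ".jpg" extension and "jpeg" notation are simply copied from the example above.

```python
def transcription_file_name(doc_id):
    """File name for the transcription: the ID plus the ".xml" extension."""
    return f"{doc_id}.xml"

def page_image_ids(doc_id, page_count):
    """Page image IDs: the document ID plus a three-digit page suffix."""
    return [f"{doc_id}.{page:03d}" for page in range(1, page_count + 1)]

def entity_declaration(image_id):
    """Entity declaration for one page image (format taken from the example)."""
    return f'<!ENTITY {image_id} SYSTEM "{image_id}.jpg" NDATA jpeg>'

def page_break(image_id):
    """Page break element pointing at one page image."""
    return f'<pb corresp="{image_id}" />'

if __name__ == "__main__":
    # Print the declarations and page breaks for a two-page manuscript.
    for image_id in page_image_ids("uva.00023", 2):
        print(entity_declaration(image_id))
        print(page_break(image_id))
```

The generated lines still need to be placed by hand: the entity declarations go between the square brackets of the document type declaration, and each <pb/> goes at the corresponding page break in the transcription.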

2.3 Basic Document Structure

Within the <text> of each encoded document is a structured description of the content of the item being encoded. This page describes the basic elements of this structural tagging.


Basic Elements for Marking Structure

The following elements are used to describe the structure of Whitman's poetic works:

A sample structure might look like this:

<!-- markup is simplified -->
<div1 type="poem notes">
<lg1 type="poem">
<head type="main-authorial" rend="underline"></head>
<l></l>
<l>
<seg></seg>
<seg></seg>
</l>
</lg1>
<lg1 type="poem">
<head type="main-derived"></head>
<lg2 type="linegroup">
<l>
<seg></seg>
<seg></seg>
<seg></seg>
</l>
<l></l>
</lg2>
<lg2 type="linegroup">
<l></l>
<l></l>
</lg2>
</lg1>
<p></p>
</div1>

Manuscript Genres

To figure out how to tag a particular manuscript, first look closely at its structure, and decide which of the following categories it falls into. A guide on how to deal with each type follows.

VERSE ONLY

PROSE ONLY

Prose should be divided into <p>s. No <div> is required in a prose-only document unless the prose is divided into separate intellectual units. For example, a manuscript requires <div1 type="section"> if it begins with two paragraphs about democracy, then has a clear break (e.g., a sub-heading, a horizontal line, or white space) followed by three paragraphs about the sound of the fishmonger yelling on the street. In such a case, the discrete groups of paragraphs should be marked with <div1>s. Except on title pages, line breaks (<lb/>) are not encoded. Also note that <lg>s are used only to mark up poetry, never prose.

<!-- markup is simplified -->
<text type="manuscript">
<body>
<div1 type="section">
<p></p>
<p></p>
</div1>
<div1 type="section">
. . .
</div1>
</body>
</text>

MIXED GENRE

Many manuscripts contain single intellectual units which are a mixture of poetry and prose. (For an example, see the manuscript "Ashes of Roses," here.) "Mixed genre," for our purposes, does NOT just mean a manuscript leaf with poetry and prose on it (for example, a poetic draft on the recto and prose on the verso). Rather, "mixed genre" signifies writing that is thematically unified, apparently part of a single draft, but made up of a mix of prose and verse, as when Whitman composes an early draft that combines trial poetic lines with prose notes or lists. For a mixed-genre manuscript, use a <div1> with "poem notes" as the value of the "type" attribute, like this:

<!-- markup is simplified -->
<text type="manuscript">
<body>
<div1 type="poem notes">
etc.

TITLE PAGE

Some manuscripts have only titles, with no content to follow those titles, or are pages with several trial titles that Whitman never used (for an example, click here). For these unusual manuscripts, we have a different <div1> type, "title notes."

<!-- markup is simplified -->
<text type="manuscript">
<body>
<div1 type="title notes">
etc.

To read about the unique markup used in Title Page manuscripts, go here.

Headings

A <head> tag, used to mark titles, will be used for both indexing and display. Please go here to read more about this titling procedure. Note that <head> can appear within any structural element; <div#> and <lg#> will be most common.

2.4 Titles and Naming


[If you are interested in reading about the markup used in Title Page manuscripts, go here]

Each poetry manuscript transcription will have three different kinds of titles. These titles may be identical; they may be different.


Naming poetry manuscripts


We have developed a simple set of rules for giving names to Whitman's manuscript poems. Note that this naming is IN ADDITION TO the assignment of a unique identifier. The rules are listed here in the order of priority: