2. Global

Encoding Common to Every Document

2.1 Header
2.2 Unique Identifiers
2.3 Basic Document Structure
2.4 Titles and Naming
2.5 References to External Files (Page Breaks and Entity Declarations)

[Note: Above the Header of each document are the XML Declaration, Document Type Declaration, External Entity Declarations, and the open tag of the "root element," TEI.2, which contains all other elements. Please go to the Annotated Template or to section 2.5 to read more about how to insert these.]
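For orientation, the sketch below shows how these pieces might fit together at the top of a document. It is assembled from examples elsewhere in these guidelines (the document type declaration and entity declaration from section 2.2, the header order from section 2.1, the <pb> form from section 2.5). The ID uva.00023 is illustrative, and the bare XML declaration is an assumption; where the two differ, follow the Annotated Template.

```xml
<?xml version="1.0"?>
<!-- Document type declaration; entity declarations for page images go inside the square brackets -->
<!DOCTYPE TEI.2 PUBLIC "-//UVA::IATH//DTD whitman.dtd (Whitman Archive)//EN" "whitman.dtd" [
<!ENTITY uva.00023.001 SYSTEM "uva.00023.001.jpg" NDATA jpeg>
]>
<!-- Root element: its id is the document's unique identifier -->
<TEI.2 id="uva.00023">
<teiHeader>
<fileDesc>. . .</fileDesc>
<profileDesc>. . .</profileDesc>
<revisionDesc>. . .</revisionDesc>
</teiHeader>
<text type="manuscript">
<body>
<pb corresp="uva.00023.001" id="leaf01r" type="recto"/>
. . .
</body>
</text>
</TEI.2>
```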

2.1 The Header

Every XML document we create has a "header," which carries essential information about who is responsible for creating and publishing the document and about the source of the text we are marking up, and which serves as a kind of electronic title page. The header is analogous to a book's first few pages, which tell you the author, publisher, copyright date, terms of publication, etc.

Since much of the information in the header is the same for all of the XML documents we create, we recommend that you use the template to simplify your encoding of it.

Below are descriptions of the main parts of the header; for fuller notes, consult the Annotated Template.

The <teiHeader> has three principal components:

<fileDesc> contains a full bibliographic description of an electronic file

<profileDesc> provides a detailed description of non-bibliographic aspects of a text, specifically the situation in which it was produced, the participants, and their setting

<revisionDesc> summarizes the revision history for a file

These elements are arranged within the <teiHeader> in this order, so the overall structure of <teiHeader> is this:

<teiHeader>
<fileDesc></fileDesc>
<profileDesc></profileDesc>
<revisionDesc></revisionDesc>
</teiHeader>

File description <fileDesc>

This should contain the following components:

Title statement <titleStmt> includes 1) the title given to the electronic work (which here always includes the subtitle provided by us: "a machine readable transcription"); 2) the author; 3) the editors; 4) information about others responsible for aspects of the electronic text; and 5) the names of the sponsors and funders. An example in which the original document bears a title given by Whitman:

<titleStmt>
<title level="m" type="main">Song of Myself</title>
<title level="m" type="sub">a machine readable transcription</title>
<author>Walt Whitman</author>
<editor>Ed Folsom</editor>
<editor>Kenneth M. Price</editor>
<respStmt>
<resp>Transcription and encoding</resp>
<name>The Walt Whitman Archive Staff</name>
</respStmt>
<sponsor>The Institute for Advanced Technology in the Humanities</sponsor>
<sponsor>University of Iowa</sponsor>
<sponsor>University of Nebraska-Lincoln</sponsor>
<funder>The National Endowment for the Humanities</funder>
<funder>The United States Department of Education</funder>
</titleStmt>

An example for a manuscript that lacks an authorial title (for the guidelines on assigning titles, see section 2.4, "Titles and Naming"):

<titleStmt>
<title level="m" type="main" rend="bracketed">I see who you are</title>
<title level="m" type="sub">a machine readable transcription</title>
. . .
</titleStmt>
etc.

Note that the main title element includes a rend attribute with the value "bracketed," which indicates that the title has been supplied by us and should therefore be displayed in brackets.

Edition statement <editionStmt> gives the current date. Example:

<editionStmt>
<edition>
<date>2005</date>
</edition>
</editionStmt>

Publication statement <publicationStmt>: includes the unique id number <idno>, distributor <distributor>, address <address>, and a statement of rights and availability <availability>. Example:

<publicationStmt>
<idno>uva.00023</idno>
<distributor>The Walt Whitman Archive</distributor>
<address>
<addrLine>The Institute for Advanced Technology in the Humanities</addrLine>
<addrLine>Alderman Library</addrLine>
<addrLine>University of Virginia</addrLine>
<addrLine>P.O. Box 400115</addrLine>
<addrLine>Charlottesville, VA 22904-4115</addrLine>
<addrLine>[email protected]</addrLine>
</address>
<availability>
Copyright © 2005 by Ed Folsom and Kenneth M. Price, all rights reserved. Items in the Archive may be shared in accordance with the Fair Use provisions of U.S. copyright law. Redistribution or republication on other terms, in any medium, requires express written consent from the editors and advance notification of the publisher, The Institute for Advanced Technology in the Humanities. Permission to reproduce the graphic images in this archive has been granted by the owners of the originals for this publication only.
</availability>
</publicationStmt>

Source description <sourceDesc> gives a bibliographic description of the copy text(s) used in the creation of the present electronic text. Example:

<sourceDesc>
<bibl>
<author>Walt Whitman</author>
<title>Calamus Leaves</title>
<orgName>Yale Collection of American Literature, Beinecke Rare Book and Manuscript Library</orgName>
<note type="project">Transcribed from our own digital image of original manuscript.</note>
</bibl>
</sourceDesc>

Note on <orgName>: The institution that holds the manuscript should be cited as listed in the Preferred Citation table in the References section of the Encoding Guidelines.

Notes on description of source: This information is about the copy text, and the <title> here (as opposed to the one in titleStmt) should be given exactly as it appears in the records of the institutional repository, no matter how imprecise or wrong-headed their conventions may seem. Many times, the most specific title for the material will be that given to the folder used to store it, since few archives assign a title to each individual item; often, therefore, the <title> given in the <sourceDesc> will be a folder label.

At present, we almost always work from our own digital images, but we have also worked from Joel Myerson's facsimile reproductions of Whitman manuscripts (published in Joel Myerson, The Walt Whitman Archive: A Facsimile of the Poet's Manuscripts, New York: Garland, 1993); from the Primary Source Media Whitman CD (Major Authors on CD-ROM: Walt Whitman, eds. Ed Folsom and Kenneth M. Price, Woodbridge, CT: Primary Source Media, 1997); or from the original manuscripts themselves. Whatever the case, give specific information about the image(s) and/or text(s) you rely on in a <note>. If you consult more than one source, list each, separated by semicolons. (When citing Myerson, remember that the volume number, part number, and page number change from manuscript to manuscript.)
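For instance, a transcription made from our own image and checked against Myerson might carry a <note> like the one in this sketch. The bracketed title, institution, and volume, part, and page values are placeholders rather than real citations, and the exact wording of the note is only a suggestion.

```xml
<sourceDesc>
<bibl>
<author>Walt Whitman</author>
<title>[title exactly as given by the holding institution]</title>
<orgName>[institution, as listed in the Preferred Citation table]</orgName>
<!-- Each source consulted is listed, separated by semicolons -->
<note type="project">Transcribed from our own digital image of original manuscript; checked against Joel Myerson, The Walt Whitman Archive: A Facsimile of the Poet's Manuscripts, New York: Garland, 1993, vol. [#], part [#], p. [#].</note>
</bibl>
</sourceDesc>
```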

By the way, we say "our own digital image" rather than, say, the Whitman Archive's digital image so as to draw a clear distinction with Myerson's volumes, also called—somewhat confusingly—the Whitman Archive.

Profile description <profileDesc>

The <profileDesc> contains a list of all hands other than Whitman's that the markup declares as being in any way responsible, typically through the value of a "resp" (or "responsibility") attribute on a <note>, <unclear>, or <gap> element. The id assigned to each <hand> is the value that the resp attribute refers to.

For example, if you are transcribing a Whitman manuscript that has a note by Fredson Bowers written physically on it, the header must have a <profileDesc> that reads:

<profileDesc>
<handList>
<hand scribe="Fredson Bowers" id="fb"/>
</handList>
</profileDesc>

(For more on this topic and how to encode non-Whitman writing on manuscripts, see section 3.10, "Writing in Others' Hands".)

You also need to include a <handList> in the <profileDesc> if your markup includes any <unclear> or <gap> elements, which require a "resp" attribute. For example, if Andy Jewell is encoding a manuscript with an unclear word and inserts this markup:

<unclear reason="cut away" cert="60%" resp="awj">herbage</unclear>

the document's <teiHeader> will need to include this <profileDesc>:

<profileDesc>
<handList>
<hand scribe="Andrew Jewell" id="awj"/>
</handList>
</profileDesc>
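When more than one hand is involved, all of them are declared in a single <handList>. The sketch below combines the two examples above for a hypothetical manuscript that bears a note by Fredson Bowers and also contains a word marked unclear by Andrew Jewell; each resp value matches the id of one <hand>.

```xml
<profileDesc>
<handList>
<!-- One <hand> per non-Whitman hand; the id is what resp attributes refer to -->
<hand scribe="Fredson Bowers" id="fb"/>
<hand scribe="Andrew Jewell" id="awj"/>
</handList>
</profileDesc>
. . .
<!-- In the text: resp="awj" points to the hand declared with id="awj" -->
<unclear reason="cut away" cert="60%" resp="awj">herbage</unclear>
```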

Revision description <revisionDesc>:

The <revisionDesc> element summarizes the changes that have been made to the file. Each <change> within it contains <date>, <respStmt> (with <name>), and <item> elements to specify the date, the responsible individuals, and the changes. IMPORTANT: TEI allows only one <item> per <change>. If several changes are performed at the same time, describe them all within the same <item>, separated by semicolons. If changes are performed at different times, add a new <change> at the top, so that changes are listed in reverse chronological order (most recent change first). To describe the tasks in our routine workflow, choose from the following terms for the content of <item>:

Transcribed; encoded
Checked; revised
Edited
Blessed

If the task is something other than these, any descriptive phrase can be used. Example:

<revisionDesc>
<change>
<date>2002-10-30</date>
<respStmt>
<name>Brett Barney</name>
</respStmt>
<item>Converted to camel case</item>
</change>
<change>
<date>2002-09-14</date>
<respStmt>
<name>Kenneth M. Price</name>
</respStmt>
<item>Edited</item>
</change>
<change>
<date>2002-09-07</date>
<respStmt>
<name>Andrew Jewell</name>
</respStmt>
<item>Checked; revised</item>
</change>
<change>
<date>2000-08-22</date>
<respStmt>
<name>Matt Miller</name>
</respStmt>
<item>Transcribed; encoded</item>
</change>
</revisionDesc>

2.2 Unique Identifiers

Description

Unique identifiers are one-of-a-kind names assigned to each electronic text we create. That is, every poem, collection of poems, and work must have a unique ID. (The distinction between "work" and "document" is explained elsewhere in these guidelines.)

Creating and assigning IDs

For manuscripts, IDs are made up of a 3-character repository code plus a 5-digit number (assigned in ascending order), with the two fields separated by a dot.

Examples:
loc.00158 (a manuscript at the Library of Congress)
uva.00001 (a manuscript at University of Virginia)

Printed texts are all assigned the 3-letter prefix "ppp" (e.g., ppp.00001).

ID database

We use a database to track the unique identifiers and our workflow as we transcribe, encode, and upload manuscripts. This database can be accessed here.

Placement of IDs

The unique identifier appears in two places in each document:

As the value of the id attribute on the opening tag of the TEI.2 root element, which immediately follows the document type declaration:

<TEI.2 id="uva.00001">

As the content of the <idno> element in the <publicationStmt>:

<publicationStmt>
<idno>uva.00001</idno>
. . .
</publicationStmt>

Transcription file names

To name the file when you save it, simply add the file extension ".xml" to the ID. Example:

uva.00023.xml

Image file names

Each page image of a document is also given an ID, created by adding a dot and a three-digit suffix to the document ID. For example:

loc.00158.002 (Page 2 of a manuscript)

These page image IDs are inserted as the value of the corresp attribute of the appropriate page break elements (<pb/>), and an entity declaration for each one must be inserted between the square brackets in the document type declaration. Example:

. . .
<!DOCTYPE TEI.2 PUBLIC "-//UVA::IATH//DTD whitman.dtd (Whitman Archive)//EN" "whitman.dtd" [

<!ENTITY uva.00023.001 SYSTEM "uva.00023.001.jpg" NDATA jpeg>
<!ENTITY uva.00023.002 SYSTEM "uva.00023.002.jpg" NDATA jpeg>

]>
. . .
<pb corresp="uva.00023.001" />
. . .
<pb corresp="uva.00023.002" />
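The same conventions apply to printed texts with the "ppp" prefix. As a sketch, here is where the hypothetical ID ppp.00001 (the number used in the <pb> example in section 2.5) would appear in a file saved as ppp.00001.xml. Elided content is shown as ". . .", and the n value assumes the page is numbered 1 in the source.

```xml
<!DOCTYPE TEI.2 PUBLIC "-//UVA::IATH//DTD whitman.dtd (Whitman Archive)//EN" "whitman.dtd" [
<!-- Page image ID: document ID plus a dot and a three-digit suffix -->
<!ENTITY ppp.00001.001 SYSTEM "ppp.00001.001.jpg" NDATA jpeg>
]>
<!-- First place the ID appears: the id attribute of the root element -->
<TEI.2 id="ppp.00001">
<teiHeader>
<fileDesc>
. . .
<publicationStmt>
<!-- Second place: the <idno> in the publication statement -->
<idno>ppp.00001</idno>
. . .
</publicationStmt>
. . .
</fileDesc>
. . .
</teiHeader>
<text>
<body>
<pb corresp="ppp.00001.001" id="leaf01r" type="recto" n="1"/>
. . .
</body>
</text>
</TEI.2>
```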

2.3 Basic Document Structure

Within the <text> of each encoded document is a structured description of the content of the item being encoded. This section describes the basic elements of this structural tagging.

Basic Elements for Marking Structure
Manuscript Genres
Headings

Basic Elements for Marking Structure

The following elements are used to describe the structure of Whitman's poetic works:

Division <div1> through <div7>: Used, with the type attribute, to mark structural units larger than the cluster or poem. Values for the type attribute include "book," "section," "contents," "poem notes," "title notes," and "multiple poems." The largest unit is marked as <div1>, and descending levels of <div> can be nested inside. The use of several of these type values in Whitman manuscripts is explained under "Manuscript Genres" below.
Line Group <lg1> through <lg7>: Function in the same way as <div>, but are used exclusively to mark clusters, poems, and structural sub-units within them (i.e., groups of lines, "sections" or "linegroups," that constitute distinct units within a poem). If the poem has no distinguishable sub-units, no further <lg>s are needed; if it has one or more sub-units, mark each of them with the appropriate <lg>. As with <div>s above, descending levels of <lg> are nested inside <lg1>. For example, for a manuscript of a poem broken into three linegroups, the poem itself would be tagged <lg1 type="poem"> and each linegroup would be tagged <lg2 type="linegroup">. The type attribute is required; values include "cluster," "poem," "section," and "linegroup."
Head <head>: Marks the title. Used on all <div>s and <lg# type="poem">s, even when the source shows no title. For a more thorough discussion, see section 2.4, "Titles and Naming." Requires the type attribute; choose from "main-authorial," "main-derived," and "sub."
Line <l>: Used to mark a poetic line.
Segment <seg>: Used to mark groups of words within a line of poetry that are in the same physical line; when a poetic line runs over two or more physical lines, each physical line is wrapped in its own <seg>. PLEASE NOTE: This tag is omitted when a poetic line runs to only one physical line. Please also read the project's note on indentation and spacing.

A sample structure might look like this:

<div1 type="poem notes">
<lg1 type="poem">
<head type="main-authorial" rend="underline"></head>
<l></l>
<l>
<seg></seg>
<seg></seg>
</l>
</lg1>
<lg1 type="poem">
<head type="main-derived"></head>
<lg2 type="linegroup">
<l>
<seg></seg>
<seg></seg>
<seg></seg>
</l>
<l></l>
</lg2>
<lg2 type="linegroup">
<l></l>
<l></l>
</lg2>
</lg1>
<p></p>
</div1>

Manuscript Genres

To figure out how to tag a particular manuscript, first look closely at its structure, and decide which of the following four categories it falls into. A guide on how to deal with each type follows.

Verse Only (can usually be identified by the use of line segmentation/hanging indentation)
Prose Only (may be divided into paragraphs)
A Mixture of Verse and Prose
Title Page

VERSE ONLY

For a manuscript of one poem composed of a single group of lines, do not use a <div>. Instead, use a structure like the following:

<text type="manuscript">
<body>
<lg1 type="poem">
<l>There is no word . . .</l>
. . .
</lg1>
</body>
</text>

For a single poem clearly divided into smaller chunks:

<text type="manuscript">
<body>
<lg1 type="poem">
<lg2 type="linegroup">
<l>
<seg>[line segment here]</seg>
<seg>[line segment here]</seg>
</l>
. . .
</lg2>
<lg2 type="linegroup">[lines and segments here] </lg2>
</lg1>
</body>
</text>

For a manuscript containing two or more poems:

<text type="manuscript">
<body>
<div1 type="multiple poems">
<lg1 type="poem">[poem here]</lg1>
<lg1 type="poem">[poem here]</lg1>
</div1>
</body>
</text>

PROSE ONLY

Prose should be divided into <p>s. No <div> is required in a prose-only document unless the prose is divided into separate intellectual units. For example, a manuscript requires <div1 type="section"> if it begins with two paragraphs about democracy, then has a clear break (e.g., a sub-heading, a horizontal line, or white space) followed by three paragraphs about the sound of the fishmonger yelling on the street. In such a case, each discrete group of paragraphs should be marked with a <div1>. Except on title pages, line breaks (<lb/>) are not encoded. Also note that <lg>s are used only to mark up poetry, never prose.

<text type="manuscript">
<body>
<div1 type="section">
<p></p>
<p></p>
</div1>
<div1 type="section">
. . .
</div1>
</body>
</text>

MIXED GENRE

Many manuscripts contain single intellectual units which are a mixture of poetry and prose. (For an example, see the manuscript "Ashes of Roses.") "Mixed genre," for our purposes, does NOT just mean a manuscript leaf with poetry and prose on it (for example, a poetic draft on the recto and prose on the verso). Rather, "mixed genre" signifies writing that is thematically unified, apparently part of a single draft, but made up of a mix of prose and verse, as when Whitman composes an early draft that combines trial poetic lines with prose notes or lists. For a mixed-genre manuscript, use a <div1> with "poem notes" as the value of the "type" attribute, like this:

<text type="manuscript">
<body>
<div1 type="poem notes">
etc.

TITLE PAGE

Some manuscripts have only titles, with no content to follow those titles, or are pages with several trial titles that Whitman never used. For these unusual manuscripts, we have a different <div1> type, "title notes."

<text type="manuscript">
<body>
<div1 type="title notes">
etc.

The unique markup used in Title Page manuscripts is described separately.

Headings

A <head> tag, used to mark titles, will be used for both indexing and display. For more on this titling procedure, see section 2.4, "Titles and Naming." Note that <head> can be used on any structure; <div#> and <lg#> will be most common.

2.4 Titles and Naming

[The markup used in Title Page manuscripts is described separately.]

Each poetry manuscript transcription will have three different kinds of titles. These titles may or may not be identical.

Title in the <titleStmt>
This title, which occurs inside the TEI header, names the electronic file you are creating and should therefore be distinguished from the title of the source material. Do this by adding the phrase "a machine readable transcription" as a subtitle, as in the following example.

<titleStmt>
  <title level="m" type="main">Death dogs my steps</title>
  <title level="m" type="sub">a machine readable transcription</title>
</titleStmt>
If you are transcribing and encoding a manuscript that does not have a title written on it, derive a main title from the first line, as described below. Also, include the attribute rend with the value "bracketed," to signify that the title is one we have assigned based on the first line and should therefore be bracketed when displayed. An example:
<titleStmt>
  <title level="m" type="main" rend="bracketed">And to me each minute of the night and day is vital and visible</title>
  <title level="m" type="sub">a machine readable transcription</title>
</titleStmt>
For manuscripts that contain more than one poem, follow the above procedure for each poem, but for the value of level use "a" (which indicates individual items within a larger item). Then wrap all of these individual titles in another <title level="m">. The following example imagines a manuscript with two poems, the first of which Whitman has given a title and the second of which he hasn't.
<titleStmt>
  <title level="m">
    <title level="a" type="main">Title Written on Manuscript</title>
    and
    <title level="a" type="main" rend="bracketed">Title derived from first line</title>
  </title>
  <title level="m" type="sub">a machine readable transcription</title>
</titleStmt>
Title in the <sourceDesc>
This title is the one given to the artifact by the holding institution. The <sourceDesc> is essentially a bibliography of information that should be sufficient for a user to locate the item that is the source of the transcription. If the title is bracketed in the online repository guide, you should bracket it in the <sourceDesc>. In some cases, the "title" in the <sourceDesc> may bear little relation to the poem—for example, it might be the title of the folder which holds the item rather than the title of the item itself (this is typically the case only for the Feinberg collection at the Library of Congress).
Title in the <head>
This is where the stylesheet looks for the titles of poems, etc., for indexing and display. For more detailed rules on how to formulate these titles, see "Naming poetry manuscripts" below. The rules for the use of <head>:
- Each <lg> and <div> can have its own <head> (and thus its own title). <head>s are required for <lg1 type="poem"> and for all <div>s within <text type="manuscript">. <head>s are neither required nor typically necessary on any <lg> other than <lg1 type="poem">, or on <div>s within notebooks (<text type="notebook">).
- The "type" attribute on this element is required to differentiate titles physically present on the manuscript from those assigned by our project and to distinguish between main titles and subtitles. Use one of these three values:
  — main-authorial (written by Whitman on the page)
  — main-derived (assigned by us, derived via the formula for titles; see the next section below)
  — sub (subtitle written by Whitman on the page; "sub" is only used for secondary authorial titles and must be preceded by <head type="main-authorial">)
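As a sketch of how the three values combine, the fragment below shows two poems from a hypothetical manuscript, to be placed inside the <div1 type="multiple poems"> described in section 2.3. The first poem has an authorial title and subtitle; the second has no title on the page. All titles are placeholders, and the <head> required on the enclosing <div1> is not shown.

```xml
<lg1 type="poem">
<!-- Written by Whitman on the page -->
<head type="main-authorial">[Title written on manuscript]</head>
<!-- A secondary authorial title always follows a main-authorial head -->
<head type="sub">[Subtitle written on manuscript]</head>
<l>. . .</l>
</lg1>
<lg1 type="poem">
<!-- Assigned by us from the first line, since the page shows no title -->
<head type="main-derived">[Title derived from first line]</head>
<l>. . .</l>
</lg1>
```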

Naming poetry manuscripts

We have developed a simple set of rules for giving names to Whitman's manuscript poems. Note that this naming is IN ADDITION TO the assignment of a unique identifier. The rules are listed here in the order of priority:

First priority: a name written by Whitman. (<head type="main-authorial">)
- Disregard numbers (roman or arabic) AND punctuation that precede the first word (e.g., "?")
- For authorial titles that Whitman has edited with additions and deletions, the procedure works like this:
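  - In the <titleStmt> title, use the final reading of the title, disregarding deleted passages and including added ones. For instance, for a title in which Whitman struck through "Beyond this" and added "Ah, not this" above the line (encoded in the <head> example below), the <titleStmt> title would read: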
    <titleStmt>
      <title level="m" type="main">Ah, not this granite dead and cold.</title>
      <title level="m" type="sub">a machine readable transcription</title>
    </titleStmt>
  - In the <head>, encode all the additions, deletions, and substitutions as such. For the same example, the <head> would be encoded like this:
    <head type="main-authorial" rend="underline">
      <app>
        <rdg varSeq="1">
          <del type="overstrike">Beyond this </del>
        </rdg>
        <rdg varSeq="2">
          <add type="unmarked" place="supralinear">Ah, not this </add>
        </rdg>
      </app>
    granite dead and cold.
    </head>
Second priority: a name derived from the first line. (<head type="main-derived">)
  - If a manuscript is not titled by Whitman, in the <titleStmt> use the first words not struck through, and go up to (but do not include) the first punctuation mark OR the end of the line OR the end of the segment, WHICHEVER COMES FIRST. For the <head>, follow the same procedure and assign the attribute type="main-derived".
  - For poems with recurrent titles (like "Leaf"), use the title AND, in brackets, the title derived from the first line. So: Leaf [A promise to Indiana]
  - Don't worry if two poems have the same title. Our unique identifier for the document will enable us to locate the correct document for processing through stylesheets.
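To illustrate the derived-title rule, suppose an untitled manuscript whose first line begins "I see who you are," followed by more words (the rest of the line is a hypothetical placeholder). The title stops at the first comma, and the same words are used in both the header and the <head>:

```xml
<!-- In the header: rend="bracketed" because the title is assigned by us -->
<titleStmt>
<title level="m" type="main" rend="bracketed">I see who you are</title>
<title level="m" type="sub">a machine readable transcription</title>
</titleStmt>
. . .
<!-- In the text: the same words, typed as a derived title -->
<lg1 type="poem">
<head type="main-derived">I see who you are</head>
<l>I see who you are, [rest of line]</l>
. . .
</lg1>
```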

2.5 References to External Files: Entity Declarations and Page Breaks

Note: The procedure described here applies to both printed and manuscript materials, even if the manuscript is only one page long.

There are two steps to linking a document to its corresponding image:
  - First, declare the name and location of each image in an "entity declaration" inside the square brackets of the document type declaration, which comes right before the <TEI.2> tag. In the sample below, the identifier following "!ENTITY" is the string that will be inserted within a <pb> tag; the string following "SYSTEM" is the name of the image file that corresponds with this identifier. Together, the two strings tell the processing software how to find the images.
    Sample:
    
    <!ENTITY loc.00001.001 SYSTEM "loc.00001.001.jpg" NDATA jpeg>
    <!ENTITY loc.00001.002 SYSTEM "loc.00001.002.jpg" NDATA jpeg>
    <!ENTITY loc.00001.003 SYSTEM "loc.00001.003.jpg" NDATA jpeg>
    etc.
  - PLEASE NOTE:
    - Entity names (the first string) should NOT include the file extension.
    - Be sure the file name which follows SYSTEM includes the dot before the file extension.
    - Be consistent when giving file extensions—we've chosen to always use ".jpg" rather than ".jpeg" or ".jpe," for example.
    - The last portion of the declaration, "NDATA jpeg," should always be written exactly that way. This does not contradict the rule just above: "jpeg" here names the notation (data format) of the file, not its file extension.
  - Second, include a pb element BEFORE the content of each page, and in the corresp attribute give the name of the entity you've declared at the top of the document:
    
    <text><body>
    <pb corresp="loc.00001.001" id="leaf01r" type="recto"/>
    <lg1><l>etc.
    
    PLEASE NOTE:
    - Do not include the file extension here.
    - The "type" attribute is required on the <pb> element. Choose between "recto" and "verso."
    - IDs must be unique within each document, so for manuscripts that have writing on both the recto and verso and/or that have more than one page, modify the id attribute for subsequent <pb>s. For the recto of leaf one, use id="leaf01r"; for the verso of leaf one, use id="leaf01v"; etc.
    - Include the id attribute even on one-page documents.
    - For printed works or manuscripts that have page numbers, also include an "n" attribute in the <pb>. So: <pb corresp="ppp.00001.001" id="leaf01r" type="recto" n="1"/>
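Putting both steps together, the sketch below covers a hypothetical two-leaf manuscript, loc.00001, with writing on both sides of the first leaf and on the recto of the second. It assumes the three page images are numbered in reading order (001 = leaf 1 recto, 002 = leaf 1 verso, 003 = leaf 2 recto); page content is elided.

```xml
<!DOCTYPE TEI.2 PUBLIC "-//UVA::IATH//DTD whitman.dtd (Whitman Archive)//EN" "whitman.dtd" [
<!-- One declaration per image; entity names omit the file extension -->
<!ENTITY loc.00001.001 SYSTEM "loc.00001.001.jpg" NDATA jpeg>
<!ENTITY loc.00001.002 SYSTEM "loc.00001.002.jpg" NDATA jpeg>
<!ENTITY loc.00001.003 SYSTEM "loc.00001.003.jpg" NDATA jpeg>
]>
<TEI.2 id="loc.00001">
. . .
<text type="manuscript">
<body>
<!-- Leaf 1, recto -->
<pb corresp="loc.00001.001" id="leaf01r" type="recto"/>
. . .
<!-- Leaf 1, verso -->
<pb corresp="loc.00001.002" id="leaf01v" type="verso"/>
. . .
<!-- Leaf 2, recto -->
<pb corresp="loc.00001.003" id="leaf02r" type="recto"/>
. . .
</body>
</text>
</TEI.2>
```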
