Title: "Each Part and Tag of Me is a Miracle": Reflections after Tagging the 1867 Leaves of Grass

Author(s): Brett Barney

Publication information: First published on the Whitman Archive. This paper was originally delivered at the 2001 Joint International Conference of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing.

Whitman Archive ID: anc.00002

In the fall of 2000 Ken Price hired me as a graduate assistant to work on the encoding of documents in the Walt Whitman Hypertext Archive. At the time, I knew enough about HTML to be dangerous and had never heard of SGML, TEI, or any of the other acronyms I've since found myself using on unsuspecting colleagues and loved ones. Ken recommended that in preparation for my work I read the chapter of Guidelines for Electronic Text Encoding and Interchange titled "A Gentle Introduction to SGML," so I did. I was imagining "gentle" to mean gradual and pleasant, as in "a gentle slope," so I pictured myself easing my way into SGML enlightenment, almost imperceptibly; before I knew it I would be able to look back, survey the robust mark-up landscape, and marvel at how far I'd climbed. I've come to believe, however, that "gentle" is used here in the sense it carries in the expression "Break it to them gently." Perhaps the chapter should be subtitled "You Better Sit Down, Kids." This is not a complaint, honestly—we're better off knowing the worst, after all. Though I found the "gentle introduction" daunting and more often blunt than gentle, by working my way through it I did begin to glimpse some of the rich possibilities of SGML encoding—as well as a few of its complexities. My experiences since those first days have only reinforced my initial impressions; as I've worked at encoding Whitman texts, I've experienced both heady moments of insight when I could envision some of the possibilities that rich encoding would offer users and moments of puzzlement and frustration when both the TEI and the texts I was encoding seemed impossibly uncooperative.

Perhaps some of my frustrations (and also insights) are the result of Ken's somewhat fortuitous choice of texts to have me start with: the 1867 edition of Leaves of Grass. A number of reasons led to this choice, I believe. First, and most important, Ken already had in his possession an etext of this edition, having obtained permission from a commercial publisher (Primary Source Media) to use their file, which they had marked up in Borland database format and which had subsequently been converted to TEI-Lite at the University of Virginia's Etext Center. I wouldn't need to build a text from the ground up; instead, my task was to deepen this markup by adding a substantial but manageable layer of additional tagging. A second reason that my work began with the 1867 Leaves of Grass is one that I expect is familiar to many projects: we imagined that of all of the editions printed during Whitman's lifetime, the 1867 would likely present the most complicated (and therefore most productive) challenges. Since Alice Rutkowski, John Unsworth, and others would be developing a project-wide, TEI-compliant DTD, working through these challenges gave us our best chance to do what is surely impossible: to make at the beginning the wisest decisions for a project that will presumably continue for many years to come.

To say that the 1867 edition of Leaves of Grass presents an unusually complicated case is something of an understatement. Relatively little critical attention has been focused on it, perhaps because, as Luke Mancuso conjectures, it has been labeled "the 'workshop' edition, which has unfortunately relegated its critical stature to the level of a second-rate edition that merits little scrutiny." Ken and Ed Folsom have characterized it as "the most carelessly printed and the most chaotic of all the editions." Indeed, you are likely to find the word "chaotic" in almost any critical discussion of this edition.

Many of the same traits that mark the book as chaotic also make it a particular challenge for TEI-compliant encoding. Even choosing a text to "capture" is a thorny issue, since what is usually referred to in shorthand as the 1867 edition is actually at least four notably distinct texts, each variation containing a different assortment of poems. As Folsom and Price have written, this fact is evidence that Whitman "was obviously confused about what form his book should take," not exactly a comforting observation if one considers the emphasis on form that undergirds TEI.

The most common issue of the 1867 edition, however, appears also to be the one that includes all of the poems, and the etext I was given was—whether by design or chance I don't know—based on this variant, so it is the one I have worked on. Moreover, since SGML mark-up holds the promise that this one file can be, for different purposes at different moments, each of these variants, the idea of choosing one as a "copy text" is probably an imagined rather than a real concern. The "completeness" of this variant, though, presents its own difficulties for mark-up. To explain these difficulties, I will first need to describe in a bit more detail the nature of the variant forms. Between the 1860 publication of the third edition of Leaves of Grass and the publication of the fourth edition six-and-a-half years later, Whitman's notions of himself and his country had been deeply affected by his experience of the Civil War. The war prompted Whitman to create many entirely new poems, some of which he chose to publish separately in 1865 as a volume titled Drum Taps. The end of the war and Lincoln's death as that volume went to press inspired more poems, and Whitman hurried these through the press as Sequel to Drum Taps. These two slender volumes he bound together in one cover but with separate tables of contents and pagination. For whatever reason, he also chose to stitch copies of them into the back of the new edition of Leaves of Grass. For good measure, behind all of these he included yet another thin sheaf of poems titled "Songs before Parting," again with its own table of contents and pagination.

One oddity that results from having these various collections of poems attached to the end of the 1867 Leaves of Grass is that the book contains four pages numbered "10," four numbered "11," and so forth, since page numbering starts over in each one. In the electronic text as it came to me, this peculiarity had been handled by attaching an "a" to the values in the n= attributes of the page break tags in "Drum Taps," a "b" to those in "Sequel," and a "c" to those in "Songs before Parting." There is, of course, a degree of common sense in such a practice, though I am now beginning to question that choice. The page numbers are not encoded anywhere else in the file, so processing for display will almost certainly rely upon these values to assign page numbers. It is possible that a stylesheet can be written in such a way that the letters in these values are suppressed, but I have (admittedly vague) suspicions that doing so might be difficult. Whatever the solution, whether in the tagging or the processing, it is desirable, I believe, to avoid creating in the etext a scheme that is not present in the text Whitman had published.
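The two options raised here, suppressing the letters when the text is processed for display or removing them from the tagging itself, can each be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Archive's actual workflow: it assumes the letter follows the digits (e.g. `n="10a"`), that `n` is the first attribute on each `pb` tag and is quoted, and that the file is handled as plain text rather than parsed. The function names `display_page_number` and `strip_suffixes` are hypothetical.

```python
import re

# Hypothetical sample. Assumes the suffix letter follows the digits (n="10a");
# the exact form used in the inherited etext is an assumption here.
SAMPLE = '<pb n="10"> ... <pb n="10a"> ... <pb n="10b"> ... <pb n="10c">'

# Captures the opening of a pb tag's n attribute, the digits, an optional
# a/b/c suffix, and the closing quote. IGNORECASE also accepts SGML-style <PB N="10A">.
PB_N = re.compile(r'(<pb\s+n=")(\d+)([abc]?)(")', re.IGNORECASE)


def display_page_number(n: str) -> str:
    """Processing option: show the number as printed, dropping any trailing a/b/c."""
    return n.rstrip("abcABC")


def strip_suffixes(markup: str) -> str:
    """Tagging option: rewrite each pb n value so it holds only the printed digits."""
    return PB_N.sub(lambda m: m.group(1) + m.group(2) + m.group(4), markup)


if __name__ == "__main__":
    print(display_page_number("11b"))
    print(strip_suffixes(SAMPLE))
```

After the tagging-side rewrite, several `pb` tags share the same `n` value, which is acceptable only if the guidelines (and the project's DTD) do not require those values to be unique.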

In the case of these page numbers, I think the wrong choice was made before we ever got the text, perhaps based on a faulty understanding of TEI guidelines or on different assumptions about the uses of the texts and the tagging. Certainly my own limited knowledge of TEI meant that only very recently was I able to evaluate this treatment of page numbers at all critically. Since, as I read them, the TEI guidelines don't mandate unique values for the page break number attribute, we should be able, if we choose, to revise the mark-up to eliminate the a's, b's, and c's without creating validity issues. The tables of contents in the 1867 Leaves of Grass, however, present a more genuine conflict with TEI guidelines. As I understand them, the guidelines would normally have tables of contents included in the front matter, or occasionally in the back matter. The problem with the 1867 edition of Leaves of Grass is that because each of the four books-within-a-book is immediately preceded by its own table of contents, we have what amounts to four sections of front matter scattered through the volume. The TEI guidelines' default "front matter, body, back matter" structure was obviously not formulated with this specific, chaotic book in mind, but it has apparently influenced the book's tagging in such a way that in the current etext only the first table of contents is tagged as front matter, while all of the others, somewhat counter-intuitively, are included in the body.

Similar difficulties arise in the treatment of publication dates and places. The book's four title pages list three different dates and two different places. Yet only those on the first title page (1867, New York) are included in the publication statement; the other dates and places are simply keyed as content. It seems, then, that one effect of these various encoding choices we've inherited—even though they were most likely based on a desire to play by the TEI rules—has been to make the 1867 seem less peculiar by regularizing its anomalies. And one effect of my adding a layer of tagging has been to call these issues to the attention of other project members, so that we can make well-considered decisions that faithfully represent Whitman's printed text.

The tendency of mark-up to clarify and disambiguate what in the printed text is unclear and ambiguous extends also to the individual poems. The 1867 edition opens with a newly written poem titled "Inscription." Unlike the other poems in the book, it is printed on unnumbered pages and preceded and followed by blank sheets, visual clues that all suggest it should be treated differently from the other poems: specifically, that it should be included in the front matter, which our etext does. Even assuming that the poem is part of the front matter, it remains unclear whether it is intended as an inscription prefatory to the entire volume—all four inner "books"—or only to the first book. Of course, if we can only have one section of front matter, the choice to call this poem front matter labels it, by default, as pertaining to the entire volume. One can, however, make a reasonable case for including "Inscription" in the body of the text rather than the front matter. It is listed alongside all the other poems in the table of contents, an unusual though not unthinkable occurrence for front matter. Further, encoding this poem as front matter in the 1867 edition puts it in a unique position compared with similar instances in all the other editions. In earlier editions Whitman had no such prefatory poems. In later editions he had a number of poems grouped together as "Inscriptions"; these he placed at the beginning but did not set them off in any way from the poems that followed. For these later editions it would be strange and in fact inaccurate to include the "Inscriptions" poems in the front matter.

The groupings of poems, or "clusters" as Whitman called them, are important structural units in the 1867 edition, as they are in other editions. They are therefore tagged as divisions of the type "cluster." Knowing what is a poem and what is a cluster is not always easy, though. As an example, consider two table-of-contents entries: "Thoughts" and "Says." The first of these is listed as a cluster, with individual poem titles indented under it. In this case, the titles are simply numbers (a very common circumstance, especially in this edition of Leaves of Grass), so first lines are substituted. The item "Says," however, is listed in the table as a free-standing poem, although the texts of the poems give no clue to such a distinction. Both are divided into numbered sections, and both are set in exactly the same size of type. Because the table is otherwise suspect (e.g., it omits one poem altogether) and because type sizes vary considerably throughout the volume, neither the table nor the typography offers conclusive evidence. In short, Whitman may have thought of both "Thoughts" and "Says" as clusters of short, number-titled poems, may have thought of both as individual poems containing numbered sections, or may have thought of one as a cluster and of the other as a poem. We might be tempted, as we might elsewhere (say, with a particularly difficult reading in a manuscript), to look at the ways the poems are treated in subsequent editions (though there is danger in putting too much stock in such "evidence"). But since in the next edition "Says" was dropped altogether and "Thoughts" appeared in vastly altered form, we don't get very far in any case.

I may seem to be making much of a minor difficulty. Probably so, but it does illustrate a few important points. First, even if one is completely conversant in TEI, the structures of published materials (to say nothing of manuscripts) are sometimes unclear, even undecidable with any kind of surety. Second, tagging sometimes brings into focus issues that might otherwise not have drawn our attention. I'm not sure that anyone has ever considered carefully whether these are poems or clusters—though someone might have—and it is only because I had to tag them as one or the other that I came to think about the issue. Third, whatever we decide, our tagging will privilege a particular view and impose a more rigid structure than exists in the printed text.

As any encoding project must, we have several times had to confront the issue of knowing where to stop. Are running headers important? What about differences in font size or all-caps vs. "small-caps"? Printer's ornaments? Line wraps? The only one of these we've decided for sure to encode is line wraps, on the grounds that, based on what we know of Whitman's concern for "look and feel," it is potentially useful to be able to isolate each part of a poetic line. With regard to things we've not encoded, we have sometimes comforted ourselves with the fact that users will always have available to them page images from which they can gather information not encoded. But the promise of robust encoding, as I understand it, is that the texts will be malleable in ways that static physical texts never could be, so our choice not to encode certain things tacitly implies that those features aren't really important and therefore don't really need to be searchable. Such an implication is worrisome, especially in working with the writings of Whitman, who once said, "I sometimes find myself more interested in book making than in book writing: the way books are made—that always excites my curiosity: the way books are written—that only attracts me once in a great while." So much was Whitman concerned with the physical appearance of his poems that he told his friend Horace Traubel he would like to "throw a line away" from a poem that seemed too crowded on the page. When Traubel asked, "Don't you love your lines too much for that?" he replied, "No—not enough to let them spoil the page." We on the project have also taken some solace in the knowledge that what we choose not to encode now can be encoded at a later time: if we create rich texts, they will have a useful life for a long time to come and can be the basis for later work that might draw on improved understandings or capabilities.

Of course, some of those improved understandings and capabilities may be mine. I hope that I will be able to learn enough while I'm still working on the Whitman project to make our etext of the 1867 Leaves of Grass more faithful to the printed text even as we make it more useful. Inasmuch as our tagging has been able to mirror faithfully Whitman's own structure, identifying titles, lines, and so on in the etext, I'm confident that the work has been very useful. But the subtle ways in which our tagging to date has brought order to the chaos of that edition are an area of concern, especially since a number of Whitman scholars have seen chaos as a centrally meaningful aspect of the edition, as a mirror of both Whitman's own in-process ideological restructuring and the ideological restructuring of the nation.

Despite my continuing reservations and frustrations with the process of trying to fit TEI, with its love of structure, to the jumble that is the 1867 Leaves of Grass, I see at least one immediate and compelling benefit of that process. As we have been making these various decisions about mark-up for this edition we have at the same time been teasing out a better understanding of Whitman's poetry. Certainly my own understanding has increased as the members of the project have offered opinions and suggestions about tagging quandaries based on their abundant knowledge of Whitman's poetry and writing practices. And as I heard Johanna Drucker say recently, such an increased understanding of the text—not mere efficiency—should be a goal of humanities computing.


