Documenting Language

By “documentation” we most often refer to the creation of high-quality digital recordings of stories, narratives, dialog or elicitation sessions on grammatical topics. We strive to transcribe and annotate all of this material. In every case, we also try to share as much as possible with the public and certainly with the relevant language community itself, which typically plays a major role in directing, if not explicitly initiating, the project.

It is often pointed out by speakers of endangered languages that notions like “documentation” and “preservation” bear an uncomfortable resemblance to pickling and seem quite removed from the everyday struggle to keep languages spoken in the community. From the most severe perspective, the entire notion of documenting endangered languages can be viewed as unethical, like a portrait artist plying their trade in an emergency room. There is nothing wrong with painting portraits, but the emergency room is a place for surgeons, not artists. The hope, of course, is that documentary work will feed into revitalization. In practice, this may turn out to be more rationalization than reality. There is widespread agreement that the single key to perpetuating language is immersion schools for children. The classic products of linguistic research (grammars, dictionaries and archived recordings) are ultimately peripheral to such activities. Languages were spoken before such resources existed and if there are still speakers left, their languages can be passed on orally as they were since time immemorial. So, contrary to popular belief, an endangered language needs preschool teachers to survive far more than it needs linguists. Unfortunately, however, there are very few institutions that offer training in the creation of an immersion program.

Linguists, however, have made major contributions in the realm of revitalization. While few end up on the front lines, e.g. as preschool teachers or agitators for language rights, many have been instrumental in training those who would lead the charge. Furthermore, in those cases where circumstances may not have allowed for revitalization, linguists have been able to create a lasting record of a language which in some cases has formed the basis for revitalization efforts today. Two prominent local examples of this involve Algonquian peoples: the Miami tribe’s Myaamia Center, headed by Daryl Baldwin, and the Wôpanâak Language Reclamation Project, headed by Jessie “little doe” Baird. Both groups are succeeding in bringing back their language child by child several decades after the last active speakers had passed away, and this success has been made possible in part with the help of linguists: Ken Hale in the case of Wôpanâak, who helped mentor Jessie “little doe” Baird, and David Costa in the case of Myaamia, who has devoted much of his career to the interpretation, analysis and reconstruction of the Miami-Illinois language.

The positioning of language documentation as a modern subfield of linguistics (inasmuch as it has been accepted by academic institutions) is largely due to the work of Nikolaus Himmelmann (1998, 2006), who has argued that description should not be mistaken for primary data, i.e., recordings, transcribed texts and annotations. Documentary linguistics emphasizes the link between data and description both for the purposes of preserving the original record and for verifying statements made about the language. Good documentation, it is argued, constitutes a “lasting, multi-purpose record of a language”. Longevity is achieved through archiving recordings in open formats at institutions which can guarantee their safety (e.g. ELAR, DoBeS, AILLA, Paradisec, California Language Archive). By “multi-purpose”, it is intended that documentation be able to satisfy the interests and needs of diverse parties: community members interested in revitalization, linguists interested in description as well as family members and others who are interested in the historical, cultural and sentimental value that field recordings contain.


Recordings may take place at speakers’ homes, at community centers, at ELA’s office, or elsewhere. Most of the speakers we work with are from the New York/New Jersey area, but ELA is also a center for endangered language work more broadly.

Our documentation efforts at ELA are aimed at languages with very little in the way of recorded or written materials, either in the language or about the language. One of our goals is thus to record culturally significant texts that can be shared with the language group or more widely with the public. We focus on supplementing the documentary record for endangered languages (in many cases, ELA linguists are making the first-ever multimedia recordings of a given language) and on responding to the needs and requests of communities. While the highest priority for a language’s survival must be transmission, in the worst-case scenario high-quality annotated recordings can ultimately put a language on “life support”, as we find with the many documentary materials being used for language reclamation across the globe today. Our primary aim, however, is not to pickle language in case of future catastrophe; it is to make these recordings useful for today’s generation of language learners, linguists and others.

The opportunity of working with hundreds of language communities in New York City is tempered by certain challenges. Unlike in traditional “fieldwork”, in several cases we do not have access to more than a single speaker of a language, as they may be the only local representatives of their language community. Our corpus of recordings in these cases is necessarily unbalanced, lacking dialogue and inter-speaker variation. On the other hand, in-depth documentation of a single speaker’s knowledge of his or her language still constitutes a very significant contribution for a language that has never been carefully documented. This raises an interesting question of whether languages can be properly documented and understood outside of their traditional area. We would answer this question with an emphatic “yes” for the following reasons. First, while the connection between land and language cannot and should not be denied, there are many components of a language that are more computational than cultural, for instance, how a language forms relative clauses or the way certain speech sounds pattern within words. Secondly, the false dichotomy of “local languages” versus “world languages” puts unnatural strictures on indigenous languages that nobody would think of imposing on languages such as English, Chinese, Russian or French, all routinely studied outside their homelands. In effect, it reinforces the detrimental idea that certain languages are not fully appropriate outside their villages. Finally, small communities routinely reconstitute their cultures in New York City; there are, in fact, several well-known cases where languages have been preserved only outside the homeland. In sum, while we recognize that languages co-evolve with their surroundings, we also give much credit to the resilience of languages and cultures in diaspora.


Making high-quality audio recordings has become vastly easier than it used to be, when heavy pieces of delicate equipment had to be brought to the field. A good digital recorder, capable of making far clearer recordings for greater lengths of time, can now fit easily in one’s pocket. The digital revolution has also allowed for major advances in storage, transfer, sharing and searching. ELA lends out recorders such as the Sony PCM-M10 to those wanting to make recordings in New York City and around the world. Some examples of work ELA associates have done abroad using this relatively simple handheld equipment can be seen here and here.


ELA researcher Natalia Bermudez teaches a Naso boy how to transcribe his language using a laptop and free software.

Our goals for each project involve creating a searchable corpus of texts. For this, we use standard free software such as Praat and ELAN to create time-aligned annotations and FieldWorks to gloss texts and create lexica. Together with Raphael Finkel of the University of Kentucky, we are currently creating an online interface for searching FieldWorks databases that will allow complex searches of glossed text with links to the lexicon and audio.
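To make the idea of a searchable, time-aligned corpus concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the `Annotation` schema, the `search_glosses` helper and the placeholder corpus are all hypothetical, and none of this reflects the file formats or interfaces of the tools named above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Annotation:
    """One time-aligned segment of a recording (hypothetical schema)."""
    start_ms: int        # where the segment begins in the audio
    end_ms: int          # where the segment ends
    transcription: str   # placeholder text, not real language data
    glosses: List[str]   # one gloss per word; morphemes joined by "-"
    translation: str     # free translation


def search_glosses(corpus: List[Annotation], gloss: str) -> List[Annotation]:
    """Return annotations in which some morpheme gloss equals `gloss`."""
    wanted = gloss.lower()
    return [
        a for a in corpus
        if any(wanted in g.lower().split("-") for g in a.glosses)
    ]


# Invented placeholder corpus, for illustration only.
corpus = [
    Annotation(0, 1500, "word1 word2", ["dog", "run-PST"], "The dog ran."),
    Annotation(1500, 3200, "word3 word4 word5", ["child", "see", "dog"],
               "The child sees the dog."),
    Annotation(3200, 4100, "word6", ["rain"], "It is raining."),
]

for hit in search_glosses(corpus, "dog"):
    # The time offsets are what would let an interface jump to the audio.
    print(f"{hit.start_ms}-{hit.end_ms} ms: {hit.translation}")
```

Keeping the start and end offsets on every annotation is the design choice that matters here: a search over glosses can then lead straight back to the matching stretch of the recording.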

As much as possible, we strive to share our documentation through popular means such as our YouTube channel, which now allows for captions and rolling transcripts. In this way, our documentation has the best chance of reaching its most important audience, the young people of the community to which the language belongs. More recently, we have begun to participate in the Endangered Languages Project and are sharing media through that online venue as well. The project is a free forum aimed at speakers of endangered languages to which anyone can upload video material.

There are several excellent introductions to linguistic fieldwork which detail the use of digital and analog tools in language documentation (see References). A great collection of questionnaires for grammatical exploration can also be found at the Max Planck Institute website.



2012 Timor Workshop: Preserving Knowledge Through Recording and Writing Local Languages

The communities we typically work with are “minorities within minorities”: people who are minorities in their places of origin and whose compatriots form a minority community in New York. The ethno-linguistic identities of these groups in New York are often not even recognized by others from their country. For example, the Amuzgo people of Guerrero and Oaxaca states in Mexico speak one of over 250 indigenous languages that exist in Mexico, but few Mexicans from outside the immediate Amuzgo-speaking area have ever heard of the Amuzgo people or their language.

We are often asked how ELA makes contact with speakers of indigenous and endangered languages. Finding such “doubly minoritized” communities in New York is usually not straightforward and must rely to a large extent on chance; indeed, the element of chance never ceases to surprise us. To take but two examples, not long after researcher Natalia Bermudez returned from her fieldwork in Panama with the Naso and Ngobe, we were contacted by one of only two individuals from the related Bribri tribe living in the United States. Similarly, after an article about our work appeared in the New York Times, we were contacted by the only known speaker in the United States of Tsou, an endangered Austronesian language of Taiwan with roughly 2,000 speakers.

To make contact, we post flyers, advertise through community radio, get plugged into community and neighborhood networks, and talk to people at fairs and other celebrations. In many cases, individuals contact us after having heard about us in the press or after hearing about or attending one of our public events. Most often, our collaborators will introduce us to related communities in the local area and thus our network grows.

ELA collaborates with interested individuals and organizations from across the city on a variety of language projects and has been successful in procuring grants for fieldwork on endangered languages abroad. ELA also has close ties to the Endangered Language Initiative at the CUNY Graduate Center, is engaged in film and poetry projects, and is assisting with programming at NYC museums and cultural centers.


ELA follows best-practice ethical guidelines in its research and conforms to the ethics statement put forth by the Linguistic Society of America. A substantial portion of our work is explicitly collaborative, shaped from the beginning by community or speaker initiative and interest. Language consultants are compensated for their time, and resulting recordings and materials are shared only with the explicit permission of all involved. While most speakers and communities choose to share their language and culture with the general public, whether through the ELA website, YouTube channel or some other means, we remain sensitive and responsive to issues of access, use, and intellectual property. Those recorded also have the right to make their recordings private retroactively. Our responsibilities extend not only to our own documentation but also to helping communities gain access to previous linguistic literature on their languages.


The following is a very abbreviated list of relevant references to books and articles arranged by topic. Directions for further reading can be found in the sources below, as well as in Language Documentation & Conservation, a journal dedicated to the field. For those interested in joining listservs, two relevant ones are Indigenous Languages and Technology and the Endangered Languages List.

[Note: Many of the books listed under "Fieldwork, Language Documentation and Description" also contain chapters or sections on ethics.]

