Endangered Language Alliance


documentation and description work on little-known languages

Most of the world’s approximately 7000 languages have not been documented in any depth, and many have never been recorded or described at all. At ELA, we work with speakers of lesser-known languages to produce high-quality video and audio recordings, which are then transcribed, annotated, and, whenever possible, translated into a language of wider communication.

By documenting the vocabulary of a language, analyzing its grammar, and collecting texts, we create a flexible, multi-purpose record that can be useful for learners, linguists, and speakers themselves. Most of ELA’s documentation work is done in the New York City area, in close collaboration with immigrant communities who have brought their languages and cultures with them. We also provide support to speakers and activists working to document their own languages back home, and to aspiring linguists and students interested in getting involved in language documentation and revitalization.

ELA has recorded hundreds of hours of video and audio in some 50 different languages from all over the world, belonging to many different language families. ELA researchers are engaged in ongoing, in-depth documentation of languages as different as Ikota, Gurung, Wakhi, Purhepecha, and Koda. Special attention is given to culture-specific speech genres and unique literary and verbal art traditions: ELA’s archive includes a Torah portion given in Bukhori, oral histories in Nones, and examples of the abaimahani and arumahani genres of Garifuna music.


Language documentation ensures that future generations will have access to their own linguistic and cultural heritage. It also creates a new presence for the language on the internet, a potentially powerful boost for the prestige and visibility of a language. Only 5 percent of the world’s languages have a real presence online.

For more information on our strategies and techniques for documenting languages, see our How page.

















*For a full description, see Kaufman, Daniel & Raphael Finkel. 2018. Kratylos: A tool for sharing interlinearized and lexical data in diverse formats. Language Documentation & Conservation 12. 124-146.

Supported by a National Science Foundation grant, Kratylos is a new online tool under joint development by ELA and computer scientist Raphael Finkel at the University of Kentucky. Kratylos will enable linguists to share and analyze language data more easily and will offer new ways of collecting data online.

Today, a major gap exists in the electronic ecosystem for fieldworkers and other linguists who use software such as Toolbox and FLEx: there is still no easy method for sharing projects containing a lexicon and interlinear glossed texts so that they can be searched in complex ways and commented on through the Internet.

Kratylos complements, rather than competes with, existing database software. It builds on the FieldWorks Language Explorer (FLEx) software developed by SIL, which has a large, highly active international user community, and replicates FLEx’s powerful search features for online users. With Kratylos, a FLEx project can be transformed into a linked online concordance and dictionary, complete with audio and video media: a shareable record of a language.

Why XML?

Best practices in linguistic documentation demand the use of formats that are maximally interoperable and least likely to become obsolete. As a result, linguistic data in electronic format is increasingly encoded in Extensible Markup Language (XML). XML is a format for encoding documents that can be read easily by both humans and computers. Information must be classified using start-tags and end-tags, so that each part of the document belongs unambiguously to one of the sections or categories defined in the scheme. To take a simple example, what we would write informally as

Step 1: Find A

Step 2: Find B

Step 3: Connect A to B

would be expressed as follows in XML, where each step is enclosed by a start-tag and an end-tag, and the whole set of steps is embedded within a procedure element with its own start- and end-tags:

    <procedure>
        <step number="1">Find A.</step>
        <step number="2">Find B.</step>
        <step number="3">Connect A to B.</step>
    </procedure>

The tags tell us unambiguously where each string begins and ends, and also that it is a step. In the schema employed, steps have the attribute “number”, which identifies each step distinctly. There are serious benefits to storing linguistic information (lexical data and interlinearized texts) in a tagged, well-delineated, hierarchical format. The result is human-readable, unambiguous, and highly interoperable code that can be used for years to come.
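
To illustrate what this means for machine readability, here is a minimal Python sketch, using the standard library’s `xml.etree.ElementTree`, that reads the procedure example back in. The function name `read_steps` is hypothetical and chosen only for illustration; the point is that a program can recover each step and its number from the tags alone, with no guessing about where one step ends and the next begins.

```python
import xml.etree.ElementTree as ET

# The procedure example from the text, including its enclosing <procedure> element.
PROCEDURE_XML = """
<procedure>
    <step number="1">Find A.</step>
    <step number="2">Find B.</step>
    <step number="3">Connect A to B.</step>
</procedure>
"""


def read_steps(xml_text: str) -> list[tuple[int, str]]:
    """Return (number, text) pairs for every <step> element (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    # Start- and end-tags delimit each step, so its text is unambiguous.
    steps = [(int(step.get("number")), step.text.strip()) for step in root.iter("step")]
    # Order by the "number" attribute rather than relying on document order.
    return sorted(steps)


if __name__ == "__main__":
    for number, text in read_steps(PROCEDURE_XML):
        print(f"Step {number}: {text}")
```

Run as a script, it prints each step with its number, for example `Step 1: Find A.`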

While XML is a great way to store linguistic data, there are still no ready-made solutions for displaying and searching such data. Many general-purpose programs exist for viewing XML more easily, but they are not particularly well suited to linguistic analysis. In addition, XML viewers are stand-alone programs that are not designed to facilitate sharing data over the Internet, which is crucial for documentary linguists.

Collaborating and Crowdsourcing With Kratylos

Kratylos will offer a new way for linguists to share their data, whether in XML or other standard formats, in the form of online corpora and dictionaries. This includes transforming XML exports from FLEx into a linked, searchable online corpus (complete with multimedia files) and dictionary.
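
As a rough illustration of what turning interlinear XML into a searchable concordance involves, the Python sketch below indexes glossed words and produces keyword-in-context lines. It assumes a deliberately simplified, hypothetical schema (`<text>`, `<phrase>`, and `<word>` elements with `form` and `gloss` attributes); this is not the FLEx export format and not how Kratylos itself is implemented. The word forms and glosses are invented placeholders, not data from any ELA project, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# HYPOTHETICAL simplified interlinear schema, for illustration only.
# This is NOT the FLEx export format; forms and glosses are invented placeholders.
SAMPLE_TEXT = """
<text title="sample">
    <phrase id="p1">
        <word form="tovi" gloss="eye"/>
        <word form="na" gloss="1SG.POSS"/>
    </phrase>
    <phrase id="p2">
        <word form="na" gloss="1SG.POSS"/>
        <word form="selu" gloss="hand"/>
    </phrase>
</text>
"""


def build_concordance(xml_text: str) -> dict[str, list[tuple[str, str]]]:
    """Map each gloss to the (phrase id, word form) pairs where it occurs."""
    root = ET.fromstring(xml_text.strip())
    index = defaultdict(list)
    for phrase in root.iter("phrase"):
        for word in phrase.iter("word"):
            index[word.get("gloss")].append((phrase.get("id"), word.get("form")))
    return dict(index)


def kwic(xml_text: str, gloss: str) -> list[str]:
    """Keyword-in-context: each phrase containing `gloss`, with the match bracketed."""
    root = ET.fromstring(xml_text.strip())
    lines = []
    for phrase in root.iter("phrase"):
        words = phrase.findall("word")
        if any(w.get("gloss") == gloss for w in words):
            forms = [
                f"[{w.get('form')}]" if w.get("gloss") == gloss else w.get("form")
                for w in words
            ]
            lines.append(f"{phrase.get('id')}: {' '.join(forms)}")
    return lines


if __name__ == "__main__":
    print(build_concordance(SAMPLE_TEXT)["1SG.POSS"])  # [('p1', 'na'), ('p2', 'na')]
    for line in kwic(SAMPLE_TEXT, "1SG.POSS"):
        print(line)
```

A real tool would also need to link each gloss to its dictionary entry and to the associated media files; those parts are beyond this sketch.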

The development of Kratylos began in 2015 and will continue through 2017. The system is being tested with complex real-world language data from four of ELA’s ongoing language documentation projects: Ikota, Koda, Purhepecha, and Wakhi. As data collection and analysis proceed, the FLEx databases for each project are growing in complexity, allowing us to test Kratylos against a wide range of linguistic issues. By making these projects freely available online as easily searchable corpora and dictionaries, we will be able to involve researchers and community members directly in the documentation process.

The fate of the world’s linguistic diversity may well hang on our ability to take advantage of “crowdsourcing” strategies for language documentation in the coming decades. Crowdsourcing initiatives are underway for the collection and transcription of audio; perhaps the most effective example is BOLD (Basic Oral Language Documentation) (Bird 2010), in which audio/video data is collected, re-spoken, and then translated orally in a transcription-free workflow. While BOLD targets participants with little exposure to technology, in areas that may be off the grid, there is a growing but unmet need for similar strategies aimed at technology-savvy contributors. Kratylos will fill that gap by allowing the guided transcription of texts.

Stay Tuned!

Kratylos is currently a working prototype that is still under development, but it can already create online databases from a user’s uploaded FLEx, Praat, or ELAN data and play the associated audio files. Within the next year or two, we hope it will be a valuable public tool, free and easy to use. Email us at info@elalliance.org for more information or if you are interested in uploading your own data to the test site.

Jose Juarez (ALNY)

ALNY will be both the first-ever urban language archive and a unique portrait of New York, capturing the linguistic life of the city. The archive will be a resource for communities, scholars, and the general public, for all New Yorkers and for anyone interested in languages and cities. It will be powerful enough to enable research on little-known languages and straightforward enough for schoolchildren to explore. The front end will be an evolving, fully accessible, state-of-the-art audio-visual archive. On the back end, ALNY will be integrated with Kratylos, an innovative software tool for analyzing and crowdsourcing linguistic data, which is currently being built at ELA with the support of a three-year National Science Foundation grant.

To see some of the material that will be housed in ALNY, visit the individual language pages, where a sample of our recordings can currently be found. ELA’s YouTube channel features over 200 videos in several dozen languages, including songs in Garifuna, oral histories in little-documented Jewish languages of the Caucasus and Central Asia such as Juhuri and Bukhori, and much more. ALNY will enable ELA to make all of its material fully available, discoverable, and searchable.


Perlin, Ross. 2019. A Grammar of Trung. Himalayan Linguistics, 18(2). http://dx.doi.org/10.5070/H918244579

Perlin, Ross and Daniel Kaufman (eds). 2019. Languages of New York City (1st and 2nd edition), map. New York: Endangered Language Alliance.

Perlin, Ross. 2019. “Talk of the Town”, Artforum, October 2019.

Kaufman, Daniel and Ross Perlin. 2018. “Language documentation in diaspora communities” in Kenneth Rehg and Lyle Campbell (eds.), The Oxford Handbook of Endangered Languages, Oxford: Oxford University Press.

Gurung, Nawang, Ross Perlin, Daniel Kaufman, Mark Turin, & Sienna R. Craig. 2018. “Orality and Mobility: Documenting Himalayan Voices in New York City.” Verge: Studies in Global Asias 4(2): 64-80.

Kaufman, Daniel and Raphael Finkel. 2018. “Kratylos: A Tool for Sharing Interlinearized and Lexical Data in Diverse Formats.” Language Documentation & Conservation 12: 124-146. Reprinted in CUNY Academic Works.

Borjian, Habib and Daniel Kaufman. 2016. “Juhuri: from the Caucasus to New York City” in Maryam Borjian and Charles Häberl (eds.), Special Issue: Middle Eastern Languages in Diasporic USA Communities, International Journal of the Sociology of Language 237: 51–74.

Perlin, Ross. 2016. “The Race to Save a Dying Language”, Guardian, August 17.

Borjian, Habib and Ross Perlin. 2015. “Bukhori in New York”, Cahiers de Studia Iranica 57: 15-27.

Perlin, Ross. 2014. “Endangered Speakers”, n+1 20.

Borjian, Habib. 2014. “What Is Judeo-Median—and How Does it Differ from Judeo-Persian?” Journal of Jewish Languages, 2(2): 117-142. doi: https://doi.org/10.1163/22134638-12340026

Blevins, Juliette. 2010. Saving endangered languages in the United States. A Living Legacy: Preserving Intangible Culture. Washington, D.C.: United States Department of State, Bureau of International Information Programs. 6-10.


Kaufman, Daniel. 2017. “Saisiyat Morphology: A Review Article”. Oceanic Linguistics 56(1):278-293.

Kaufman, Daniel. 2013. “A Grammar of Tamambo, the Language of Western Malo, Vanuatu” (review). Oceanic Linguistics 51(3): 286-299.

Kaufman, Daniel. 2012. “Endangered Austronesian and Australian Aboriginal Languages: Essays on language documentation, archiving and revitalization” (review). Oceanic Linguistics 51(2): 589-596.

Kaufman, Daniel. 2007. “Salako or Badameà: Sketch Grammar, Texts and Lexicon of a Kanayatn Dialect in West Borneo” (review). Oceanic Linguistics 46(2): 624-633.


Perlin, Ross. 2020. Counting New York: The City and the Census. New York Public Library. Mar. 4.

Kaufman, Daniel and Ross Perlin. 2019. Memorial Sloan Kettering Cancer Center (Immigrant Health & Cancer Disparities Services).

Kaufman, Daniel. 2019. Language work with Indigenous Immigrants in NYC. CNY Humanities Corridor Workshop: Celebrating Indigenous and Refugee Language Communities in New York State. Cornell University. Sept. 27.

Kaufman, Daniel. 2019. Language revitalization and language access in NYC. Global Language Justice Book Workshop. Mellon-Sawyer Language Justice Project. Columbia University. Aug. 28.

Perlin, Ross and Daniel Kaufman. 2019. Language Access and NYC’s true linguistic diversity. Presentation to the Mayor’s Office of Immigrant Affairs Language Access team. July 10.

Kaufman, Daniel. Linguistic Research with Diaspora Communities. LSA Summer Institute, UC Davis. June 30.

Kaufman, Daniel. 2019. Making Way for Indigenous Languages in the City: The View From New York. Invited Talk. HELISET TTE SKAL Let the Languages Live Conference. Victoria, British Columbia. June 24.

Perlin, Ross. 2019. “Stateless, oral, immigrant cultures in New York”, Workshop on Language in Its Settings, Columbia University, 31 May 2019.

Kaufman, Daniel. 2019. From field data recording to online interlinear glossed text corpus. NYU Fieldwork Discussion Group. May 3.

Kaufman, Daniel, Habib Borjian, Daniel Barry, Ross Perlin, Kathryn Rafailov and Matthew Zaslansky. 2017. “Endangered Iranian Languages”. North American Conference in Iranian Linguistics (NACIL 1), 2017 April 28-30.

Kaufman, Daniel and Raphael Finkel. 2019. Demonstration of Kratylos software. Technological showcase. International Conference on Language Documentation and Conservation. University of Hawai’i at Manoa.

Kaufman, Daniel. 2019. Community based research across borders. Invited panelist for Bringing Latin American Perspectives on Community Based Research to SSILA at the LSA conference. Jan. 2-4.

Kaufman, Daniel. 2018. Examining Austronesian prosody through the lens of hip-hop. Invited talk. CCLS Lecture Series, University of Cologne, Germany.

Kaufman, Daniel. 2018. Ways of engaging with urban linguistic diversity: A critical view from New York. Plenary talk. Big Cities, Small Languages Conference. ZAS, Berlin. Nov. 14-16.

Kaufman, Daniel. 2018. Indigenous languages in NYC: Ideology and conservation. Indigenous Languages: From Endangerment to Revitalization to Resilience. University of Michigan Center for Southeast Asian Studies. Oct. 25.

Kaufman, Daniel, Tony Woodbury and B’alam Mateo. 2018. Roundtable facilitator on collaborative documentation. Sound Systems of Latin America III. University of Mass. at Amherst. Oct. 19-21.

Alvarez, Jackeline & Daniel Kaufman. 2018. A Comparative Analysis of Alcozauca and Cuautipan Mixteco Deictics. First ILLC Conference. Long Island University. (Part of the NSF REU mentorship Program.)

Kaufman, Daniel. 2018. The Austronesians: Family relations and inter-family contact across six millennia. The Greater South China Sea Interaction Zone: A Workshop to Explore Interdisciplinary Interventions into the Study of the Ancient East Eurasian South. Columbia University.

Kaufman, Daniel. 2018. Discussant for Panel 326: Language Choice and Identity in South and Southeast Asia. Association for Asian Studies 2018 Annual Conference, Washington DC.

Kaufman, Daniel. 2010. “Greenberg’s 16th Slayed in the Bronx?”. Harvard GSAS Workshop in Language Universals and Linguistic Fieldwork, 2010 April 13.