
As you know from our previous articles on this blog, The Language Archive (TLA) at the Max Planck Institute for Psycholinguistics is home to a large amount of data on endangered languages. This data was collected by MPI researchers, within the DoBeS programme, and by quite a number of researchers not directly affiliated with the MPI who simply considered it a good place to store their data.

But TLA also stores a lot of non-documentary data. Because the archive is attached to a research institute, it holds a great deal of linguistic research data. It also hosts resources for other researchers who need a safe place to store their data and an easy way to make it available to the outside world. By now this research data is quite varied. It ranges from more traditional experiments using sound recordings or eye-tracking data, through neurobiological data from EEG and MRI, to data from our Language and Genetics department, which stores genome sequences and the complex analyses it carries out on them in our archive.

One of the more traditional external deposits is called LESLLA. The wider LESLLA project is concerned with second language acquisition, specifically Low Educated Second Language and Literacy Acquisition. There are multiple subprojects within LESLLA; the one stored at TLA is called "Stagnation in L2 acquisition: Under the spell of the L1?"

In this particular project, data was collected from 15 women of Turkish and Moroccan origin who came from low-educational backgrounds and were learners of Dutch as a second language. All of them were attending a Dutch course for adults while the project ran. Within the project they carried out various speech tasks, and to track the progress of their Dutch proficiency, the tasks were repeated in three cycles of about six months each.

In one of these tasks, the participants read a picture story about a snowman, then retold it and answered questions about it. Here are some short examples from two of the participants during this task:




The data was collected between 2003 and 2005 in a project funded by the NWO (the Dutch Organisation for Scientific Research), but it had not been made public at the time. All the recordings, along with their transcriptions, had been stored on a number of DVDs, which were not easily accessible, even to fellow researchers. The Data Curation Service of the CLARIN project processed this data to make it available to the research community and the wider public. In 2014 all the data was deposited in the TLA archive, where it can now be used in further research.

The LESLLA data is a good showcase for the wealth of different materials the TLA archive has to offer. While documentary data is often more exciting and flashy, research data can also be very interesting, and not only to researchers. The TLA archive is set up to make it easy to search through this kind of material. A search engine is available that lets you search through all the annotations within a given corpus:



The archive also provides a special viewer that lets you listen to recordings (or watch them, in the case of video) while the corresponding annotations are displayed below them at the same time. This makes it easier to follow what is being said, especially in a language you are unfamiliar with:


We hope that this small window into some of the materials stored at The Language Archive has made you curious, and that some of you will go on to explore the other data stored in our archive. All of us here at TLA wish you happy holidays.

Further information:
* A paper about the curation process can be found here.
* The LESLLA corpus can be viewed directly at TLA.

by Alexander König (TLA)