7 Replies to “Lexical Richness Measures in Spanish Heritage, Native, and L2 Learners (Irene Checa-Garcia, Laura Marqués-Pascual)”

  1. Thank you very much for your presentation, it was very interesting. I was wondering if you could go over how you calculated lexical diversity and sophistication, as you mentioned in the video. ¡Muchas gracias!

    1. Hello Marina, thank you for your question. I could send you a bit more detail by email if you want (write to me at irene.checa@uwyo.edu), but for lexical diversity there are many measures. We used the program CLAN to extract the lemmas and worked with lemmas rather than words (so “student” and “students” would count as the same lemma rather than two different words, for instance). We used only content lemmas (grammatical morphemes tend to show less diversity), and we applied what is called the Uber transformation to avoid correlation with text length and because it has been shown to work better in the written mode. As for sophistication, I wrote a script in R to match every lemma with the frequency assigned to that lemma in the frequency list for Spanish published by Mark Davies. Then the frequencies were normalized with log10 and the scores averaged per subject, following some other methodological studies on this. Happy to share more details or references by email.

  2. Hi Irene and Laura, thanks for your presentation!

    I was wondering whether you controlled for participants’ proficiency in Spanish in some way, as most of the measures you use in the study have been found to correlate with proficiency. Also in relation to this, were the L2 learners and the heritage speakers equally proficient in Spanish?

    Gracias!

    1. All L2 learners were at a similar proficiency level (the end of second-year Spanish), and all heritage speakers were at a similar proficiency level (it was the first time they had ever taken a Spanish class in college). However, due to IRB limitations, we could not measure proficiency independently with further testing. Because of the characteristics of the program where we did the study, the heritage speakers were all more fluent in Spanish than the L2 learners. That said, if we were to do this study again, and given the nature of the essay, I think we would add a question about socioeconomic status to the linguistic background questionnaire, as this may make quite a bit of difference. Hope this answers your question!

  3. Interesting presentation.
    What did you do with the errors or deviations from the norm? Did you count them? And what about other lexical features such as loans, code-switching, and calques? Did you exclude them?
    Thank you.

    1. Great questions, Adrian. We decided, at least for now, not to include “errors”. The reason, particularly with calques, loans, and the like, is that it is hard to determine to what extent some of those are “errors” rather than features of the speakers' linguistic variety. Is “aplicación” meaning ‘solicitud’ an error? In the case of an L2 learner it is a transfer from their L1, but is it just that in the case of the Spanish heritage speakers (SHS)? We did mark words in English as such, simply because once the analysis is automated with CLAN, and frequencies are matched through R to a Spanish frequency list, the English words would have no match among the 20,000 most frequent words in Spanish and would therefore inflate sophistication. It is also doubtful that English words should count toward diversity: they were very rarely used, but they could still inflate the diversity measures. And once again, their use could mean different things across the groups, while the treatment, particularly the automated one, would be the same. That said, there were quite a few calques but very few code-switched English words (mostly grammatical ones, actually, which are not computed for most of the indexes). Of course, a complete picture of lexical development, beyond richness, would include accuracy, and we may add it in the future, provided we can find a way to treat those cases differently across groups and to account for how that would affect the comparison. Thank you for the question, a truly important aspect!
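
For readers who want to try the two measures described in the replies above, here is a minimal sketch in Python. It is not the authors' pipeline (they used CLAN for lemmatization and an R script for frequency matching). It assumes the Uber index is computed as (log N)^2 / (log N - log V) with base-10 logs over content lemmas, that sophistication is the mean log10 frequency over lemma tokens, and that lemmas missing from the frequency list (such as marked English words) are simply skipped. The function names, the toy lemmas, and the toy frequency values are all hypothetical.

```python
import math


def uber_index(lemmas):
    """Uber index, (log N)^2 / (log N - log V), for one text.

    N is the number of lemma tokens and V the number of distinct lemmas.
    Returns None when the index is undefined (empty text, or every
    lemma used exactly once, which makes the denominator zero).
    """
    n_tokens = len(lemmas)
    n_types = len(set(lemmas))
    if n_tokens == 0 or n_types == n_tokens:
        return None
    log_n = math.log10(n_tokens)
    log_v = math.log10(n_types)
    return log_n ** 2 / (log_n - log_v)


def mean_log_frequency(lemmas, freq_list):
    """Mean log10 corpus frequency of the lemmas found in freq_list.

    Assumption of this sketch: lemmas absent from the list are skipped
    rather than counted as maximally rare.
    Returns None if no lemma could be matched.
    """
    logs = [
        math.log10(freq_list[lemma])
        for lemma in lemmas
        if freq_list.get(lemma, 0) > 0
    ]
    if not logs:
        return None
    return sum(logs) / len(logs)


if __name__ == "__main__":
    # Toy data: content lemmas from one essay, and made-up frequencies.
    essay = ["casa", "casa", "perro", "libro", "email"]
    toy_frequencies = {"casa": 1000, "perro": 100, "libro": 500}
    print("Uber index:", uber_index(essay))
    print("Mean log10 frequency:", mean_log_frequency(essay, toy_frequencies))
```

Lower mean log frequency means rarer, and in that sense more sophisticated, vocabulary, which is why unmatched English words need to be handled explicitly instead of being treated as very rare Spanish words.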

Comments are closed.