One of the joys of learning a language is learning new words, in both the target language and your own native language. Having never taken a course in linguistics, I am unfamiliar with many linguistic terms. Today, I discovered a new one: lemma.

Given my ongoing interest in language, dictionaries, and computational linguistics, this term delights me.

I am a complete novice in this area, but I will make a stab at understanding it. Please refer to the experts for a better explanation.

Looking up words in the dictionary has me thinking about HEADWORDS. Conjugating verbs has me thinking about STEMS. As I develop my own internal dictionary for understanding the words I am learning in Korean, I debate how I want to organize these words. In English, you have the words go, goes, going, gone. Go is the HEADWORD, the bolded term in the dictionary that the other words are related to. Similarly, in Korean you have the infinitive form of the verb to go 가다, the stem 가, the past base form 갔, and forms of the word go 가요, did go 갔어요, will go 갈 거예요, questioning 갑니까?, commanding 가십시오, proposing 갑시다, going 감, go if 가면, go and 가고, etc.

I’ve taken issue with the way LingQ measures words because it would count all those different versions of go as different words, where I think of them as different forms of the same word – a word grouping or word family if you like.

A ‘lemma’ is a unit, devised and used by linguists and computational lexicologists to somehow order and arrange the world of wordforms. Ah! That makes so much sense to my computer programmer brain.

The dictionary headword is one form a lemma can take to represent a ‘word’ in all its inflected forms.

A lemma doesn’t really exist

A lemma is that ‘underlying’ form; it doesn’t really exist, except for use in databases and dictionaries. It looks like a real word, but in fact, it’s just a convenient way of expressing something bigger.

This is the moment I fell in love with lemma. I will spare you a discussion of relational databases, but a lemma is the key I could use in a table to relate all the different word forms. Bonus: it is a word, rather than some database-generated number, that provides the unique identifier for the word family. The back of my brain is still chewing on how to create database structures to represent the dictionary/wiki of information I am gathering as I learn. Lemma is a concept that makes me very happy.
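For the curious, here is a rough sketch of what I mean by using the lemma as the key, written with Python's built-in sqlite3 module. The table and column names (lemma, word_form, gloss, note) and the helper forms_of are made up for illustration; this is one possible layout under those assumptions, not a finished design.

```python
import sqlite3

# Hypothetical schema for illustration: the lemma itself is the primary key,
# and every inflected form points back to it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lemma (
        lemma TEXT PRIMARY KEY,   -- the dictionary form, e.g. 가다
        gloss TEXT NOT NULL       -- short English meaning
    );
    CREATE TABLE word_form (
        form  TEXT PRIMARY KEY,   -- an inflected form as it appears in text
        lemma TEXT NOT NULL REFERENCES lemma(lemma),
        note  TEXT                -- what this form expresses
    );
""")

conn.execute("INSERT INTO lemma VALUES (?, ?)", ("가다", "to go"))

# The forms of 가다 listed earlier, each with a rough English note.
forms = [
    ("가요", "go"),
    ("갔어요", "did go"),
    ("갈 거예요", "will go"),
    ("갑니까", "questioning"),
    ("가십시오", "commanding"),
    ("갑시다", "proposing"),
    ("감", "going"),
    ("가면", "go if"),
    ("가고", "go and"),
]
conn.executemany(
    "INSERT INTO word_form VALUES (?, ?, ?)",
    [(form, "가다", note) for form, note in forms],
)


def forms_of(lemma):
    """Return every stored word form that belongs to the given lemma."""
    rows = conn.execute("SELECT form FROM word_form WHERE lemma = ?", (lemma,))
    return [row[0] for row in rows]


# Counting by lemma versus counting by form gives very different totals.
lemma_count = conn.execute("SELECT COUNT(*) FROM lemma").fetchone()[0]
form_count = conn.execute("SELECT COUNT(*) FROM word_form").fetchone()[0]
print(f"lemmas: {lemma_count}, word forms: {form_count}")
```

Counting rows in the lemma table gives the number of word families, while counting rows in word_form gives the number of individual forms, which is the distinction I care about when measuring how many words I know.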

So here is the definition of lemma from Wikipedia:

In morphology and lexicography, a lemma (plural lemmas or lemmata) is the canonical form, dictionary form, or citation form of a set of words (headword).

