I went in search of a way to write down the grammar structures I learned in KDA class. I meandered off into linguistic grammar patterns and learner’s dictionaries. This is just a ramble of what I found along the way.
I need to develop a consistent notation for describing grammar and sentence patterns. My lack of formal linguistic training pains me. I am sure other folks have already figured this out. My first attempt, using puzzle pieces to capture particle grammar, is too simple to handle the new grammar I am learning.
Poking around, I discover COBUILD (the Collins Birmingham University International Language Database) and Pattern Grammar, a model for describing the syntax of individual lexical items. Each word is assigned a set of patterns that describe the typical contexts in which it is used. For example, you can skim cream off milk, or you can skim something you are reading. By analyzing large amounts of text to see how words are actually used, lexicographers can identify these patterns and use them to create learner's dictionaries.
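To get a feel for what "a set of patterns per word" could look like as data, here is a minimal Python sketch. The pattern strings are my own guesses in a COBUILD-like style (V for the verb, n for a noun group, literal words such as "off" written as-is), not entries copied from any COBUILD dictionary, and `PATTERNS` and `patterns_for` are hypothetical names.

```python
# A toy "pattern grammar" lookup table. The pattern strings are my own
# guesses in a COBUILD-like style, NOT copied from a COBUILD dictionary:
#   V = the headword verb, n = a noun group, other words written literally.
PATTERNS: dict[str, list[tuple[str, str]]] = {
    "skim": [
        ("V n off n", "skim the cream off the milk"),
        ("V n", "skim the report before the meeting"),
    ],
}


def patterns_for(word: str) -> list[tuple[str, str]]:
    """Return the (pattern, example) pairs recorded for a word, or an empty list."""
    return PATTERNS.get(word.lower(), [])


# Print each pattern next to an example of it in use.
for pattern, example in patterns_for("skim"):
    print(f"{pattern:<10} {example}")
```

Keeping an example sentence next to each pattern is the part I want to copy for my Korean notes: the pattern alone is too abstract to remember.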
In linguistics, grammar is the set of structural rules governing the composition of clauses, phrases, and words in any given natural language. Part-of-speech tagging (POST) is the process of marking up each word in a text (corpus) as corresponding to a particular part of speech. As I learn Korean, I need both to capture the grammar rules and to identify the parts of speech used in sentences. A collection of Korean sentences that has been parsed and annotated with its grammatical structure is called a parsed corpus (plural: parsed corpora).
Part-of-speech tagging is harder than just having a list of words and their parts of speech, because some words can represent more than one part of speech at different times, and because some parts of speech are complex or unspoken. In computer-based part-of-speech tagging, it is typical to distinguish 50 or more separate parts of speech for English. Some examples are NN (noun), NNS (plural noun), VB (verb), VBD (past-tense verb), DT (determiner), JJ (adjective), and IN (preposition). A tagged string might look like this:
The/DT quick/JJ brown/JJ fox/NN jumped/VBD over/IN the/DT lazy/JJ dog/NN
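To see why a plain word list is not enough, here is a small Python sketch, assuming input in the `word/TAG` format above. It parses tagged strings and collects every tag each word receives. The two extra sentences are my own hand-tagged examples using Penn Treebank-style tags, and `parse_tagged` and `tags_by_word` are hypothetical helper names.

```python
from collections import defaultdict


def parse_tagged(text: str) -> list[tuple[str, str]]:
    """Split a 'word/TAG word/TAG ...' string into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits on the last slash, so the tag is everything after it
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


def tags_by_word(tagged_sentences: list[str]) -> dict[str, set[str]]:
    """Collect every tag each (lowercased) word received across the sentences."""
    seen = defaultdict(set)
    for sentence in tagged_sentences:
        for word, tag in parse_tagged(sentence):
            seen[word.lower()].add(tag)
    return dict(seen)


fox = "The/DT quick/JJ brown/JJ fox/NN jumped/VBD over/IN the/DT lazy/JJ dog/NN"
print(parse_tagged(fox)[:3])  # [('The', 'DT'), ('quick', 'JJ'), ('brown', 'JJ')]

# Two hand-tagged sentences (my own examples) where "run" plays different roles.
corpus = [
    "I/PRP run/VBP every/DT day/NN",
    "The/DT run/NN was/VBD long/JJ",
]
lexicon = tags_by_word(corpus)
ambiguous = sorted(w for w, tags in lexicon.items() if len(tags) > 1)
print(ambiguous)  # ['run']
```

A lookup table built this way can only report that "run" is ambiguous; choosing the right tag requires looking at the surrounding words, which is what real taggers do.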
I pull myself back from delving into computational linguistics, a field I long to explore. Today’s goal is just to find a way of representing the grammar that I am learning in KDA class.
In my search, I came across a paper by Seo-in Shin at Seoul National University.
The paper is about extracting grammatical patterns from parsed corpora of Korean sentences, with the aim of representing Korean sentence structure.
Here is an example sentence:
리나가 민희에게 선물을 주었다.
Rina-SBJ Minhee-I_OBJ present-D_OBJ give-PAST-FINAL
‘Rina gave Minhee a present.’
In Korean, the verb comes at the end of the sentence. Usually, the subject comes first, followed by the indirect object and then the direct object.
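As a toy illustration of this ordering, the Python sketch below strips a few case particles (이/가 for the subject, 을/를 for the direct object, 에게 for the indirect object) and labels the last word as the verb. The particle table and the function names are my own assumptions; a real analyzer would need a proper morphological dictionary, since a noun such as 아이 ("child") would wrongly lose its final 이 here.

```python
# Hypothetical particle table for a toy analyzer; role labels follow the
# gloss above (SBJ, I_OBJ, D_OBJ). Longer particles are listed first so
# that 에게 is tried before any one-syllable particle.
PARTICLES = [
    ("에게", "I_OBJ"),  # dative: "to (someone)"
    ("가", "SBJ"),      # subject marker after a vowel
    ("이", "SBJ"),      # subject marker after a consonant
    ("를", "D_OBJ"),    # object marker after a vowel
    ("을", "D_OBJ"),    # object marker after a consonant
]


def tag_word(word: str) -> tuple[str, str]:
    """Strip a known case particle from a word and return (stem, role)."""
    for particle, role in PARTICLES:
        if word.endswith(particle) and len(word) > len(particle):
            return word[: -len(particle)], role
    return word, "UNKNOWN"


def tag_sentence(sentence: str) -> list[tuple[str, str]]:
    """Tag each space-separated word; the final word is assumed to be the verb."""
    words = sentence.rstrip(".").split()
    tagged = [tag_word(w) for w in words[:-1]]
    tagged.append((words[-1], "VERB"))
    return tagged


tags = tag_sentence("리나가 민희에게 선물을 주었다.")
print(tags)  # [('리나', 'SBJ'), ('민희', 'I_OBJ'), ('선물', 'D_OBJ'), ('주었다', 'VERB')]
print(" ".join(role for _, role in tags))  # SBJ I_OBJ D_OBJ VERB
```

It only splits on spaces, so 주었다 stays whole instead of being broken into 주 + 었 + 다 the way the gloss does.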
I am in search of a Korean learner's dictionary. The 21c Sejong project is a government-funded project to build corpora and produce an electronic dictionary for natural language processing. Its parsed corpus has been under construction since 2002 and now contains 363,226 words (33,437 sentences). The human annotators use tools that help build the corpora, but most of the work is still done manually.
This cries out for a computer program to help automate the process.
A monolingual learner’s dictionary (MLD) is a type of dictionary designed to meet the reference needs of people learning a foreign language.
I am having trouble locating a Korean learner’s dictionary. The one pictured is out of stock at twochoi’s website. I saw a YouTube video showing a Korean learner’s dictionary written entirely in Korean: each entry included the word, its definition and part of speech, its IPA (International Phonetic Alphabet) pronunciation, word frequency, Hanja, example sentences, and related terms. So I know such things exist; I simply haven’t found one yet.
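If I were to record entries like that for myself, a minimal Python sketch might look like this. The `LearnerEntry` class and its field names are my own invention based on the fields listed above; the definition is an English placeholder rather than a real dictionary definition, the IPA is my rough transcription, and the frequency is left empty because I have no real data for it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LearnerEntry:
    """One headword with the fields I saw in the learner's dictionary video."""
    word: str
    part_of_speech: str
    definition: str
    ipa: str = ""
    hanja: str = ""
    frequency: Optional[int] = None  # rank or count; unknown for my entries
    examples: list[str] = field(default_factory=list)
    related: list[str] = field(default_factory=list)

    def summary(self) -> str:
        """Render a compact one-line view, skipping fields that are empty."""
        parts = [self.word]
        if self.ipa:
            parts.append(f"[{self.ipa}]")
        if self.hanja:
            parts.append(self.hanja)
        parts.append(f"({self.part_of_speech}) {self.definition}")
        return " ".join(parts)


gift = LearnerEntry(
    word="선물",
    part_of_speech="명사",         # noun
    definition="gift, present",   # English placeholder, not a real definition
    ipa="sʌnmul",                 # my rough transcription
    hanja="膳物",
    examples=["리나가 민희에게 선물을 주었다."],
)
print(gift.summary())  # 선물 [sʌnmul] 膳物 (명사) gift, present
```

Using `default_factory=list` gives each entry its own example list instead of one shared list.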
Learner’s dictionary entries are far more comprehensive than those in a standard dictionary. For example, see the entry on countable nouns. I make a mental note to check here when I am looking up grammar terms, and I add “Cambridge Advanced Learner’s Dictionary” to my wish list.
The English Language: From Sound to Sense
Is there such a thing as a “language geek”? I must be one. I find this all fascinating.