DHQ: Digital Humanities Quarterly
Changing the Center of Gravity: Transforming Classical Studies Through Cyberinfrastructure
2009
Volume 3 Number 1

What Your Teacher Told You is True: Latin Verbs Have Four Principal Parts

Gregory Stump  <gstump_at_uky_dot_edu >, University of Kentucky

Abstract

We describe two different strategies for generating the morphology of Latin verbs. First, we hand-code default inheritance hierarchies in the KATR formalism, treating inflectional exponents as markings associated with the application of rules by which complex word forms are deduced from simpler roots or stems. The high degree of similarity among verbs of different conjugation classes allows us to formulate general rules; these general rules are, however, sometimes overridden by conjugation-specific rules. This approach allows linguists to gain an appreciation for the structure of verbs, gives teachers a foundation for organizing lessons in morphology, and provides students a technique for generating forms of any verb. Second, we start with a paradigm chart, then automatically remove common parts and redundant morphosyntactic property sets (columns), combine similar conjugations (rows), and generate the KATR theory that produces a complete table of forms for a set of lexemes. This second approach automatically determines principal parts (for Latin, we verify that there are four), groups inflection classes into super-classes, and builds full paradigm charts.

Introduction

Recent research into the nature of morphology has demonstrated the feasibility of two alternative approaches to the definition of a language’s inflectional system. Central to both approaches is the notion of an inflectional paradigm. In general terms, the inflectional paradigm of a lexeme L can be regarded as a set of cells [1], where each cell is the pairing of L with a set of morphosyntactic properties, and each cell has a word form as its realization; for instance, the paradigm of the lexeme walk includes cells such as <WALK, {3rd singular present indicative}> and <WALK, {past}>, whose realizations are the word forms walks and walked.
Given this notion, one approach to the definition of a language’s inflectional system is the realizational approach (see [Matthews 1972], [Zwicky 1985], [Anderson 1992], [Corbett 1993], [Stump 2001]). In this approach, each word form in a lexeme’s paradigm is deduced from the lexical and morphosyntactic properties of the cell that it realizes by means of a system of morphological rules. For instance, the word form walks is deduced from the cell <WALK, {3rd singular present indicative}> by means of the rule of -s suffixation, which applies to the root walk of the lexeme WALK to express the property set {3rd singular present indicative}.
An alternative approach to the definition of a language’s inflectional system is the implicative approach (see [Blevins 2005], [Blevins 2006], [Finkel 2009]). According to this approach, certain word forms in a lexeme’s paradigm serve as the basis for inferring the paradigm’s other forms. In Old English, for instance, the word form hældon “healed (plural)” may be deduced from the word form hælde “healed (3rd singular)” in accordance with a general principle that in the inflection of a weak verbal lexeme L, the realization of <L, {3rd singular past indicative}> and that of <L, {plural past indicative}> stand in the relation Xde ~ Xdon.
Despite their differences, both approaches are capable of generating a language’s inflected forms. We demonstrate this claim for Latin. We first present a realizational analysis of Latin in the KATR language [Finkel 2002]. KATR is based on DATR, a formal language for representing lexical knowledge designed and implemented by Roger Evans and Gerald Gazdar [Evans 1989]. We then present an implicative analysis that uses techniques of abstraction and grouping to derive both a principal-part analysis and a different KATR theory for Latin.
This research is part of a larger effort aimed at elucidating the morphological structure of natural languages. In particular, we are interested in identifying the ways in which default-inheritance relations describe a language’s morphology as well as the theoretical relevance of the traditional notion of principal parts.

Benefits

As we demonstrate below, the realizational approach leads to a Latin KATR theory that provides a clear picture of the morphology of Latin verbs. Different audiences might find different aspects of it attractive.
  • A linguist can peruse the theory to gain an appreciation for the structure of Indo-European verbs in general and Latin verbs in particular, with all exceptional cases clearly marked either by morphophonological diacritics or by rules of sandhi, which are segregated from all the other rules.
  • A teacher of the language can use the theory as a foundation for organizing lessons in morphology.
  • A student of the language can suggest verb roots and use the theory to generate all the appropriate forms, instead of locating the right paradigm in a book and substituting consonants.
The implicative approach that we demonstrate in this paper has several benefits.
  • It automatically determines which forms of a verb could be treated as principal parts. For Latin, we compute that four principal parts suffice.
  • It allows us to group inflection classes into super-classes. For Latin verbs, there are more than four inflection classes if one takes into account variations in such forms as those of the active perfect and passive participle; our grouping method shows that the traditional organization into four conjugations is consistent with super-classes of our more finely detailed set of inflection classes.
  • It generates charts showing the full paradigm of lexemic exemplars; such charts can have pedagogic value.

A Realizational KATR Theory for Latin

The purpose of the KATR theory described here is to generate verb forms for Latin, specifically, the realizations of all combinations of the morphosyntactic properties of voice (active/passive), mood (indicative/subjunctive), aspect (imperfective/perfective), tense (present/past/future), number (singular/plural), and person (1/2/3). The combinations form a total of 144 morphosyntactic property sets (MPSs). However, Latin has no future subjunctive, reducing the total to 120 MPSs.
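The arithmetic behind these totals can be checked mechanically. The following Python sketch is only an illustration and is not part of the KATR theory; it enumerates every combination of the six categories and then discards those that pair the subjunctive with the future. The variable names are assumptions made for this example.

```python
from itertools import product

# Category values as listed in the text; the names are illustrative only.
VOICES = ("active", "passive")
MOODS = ("indicative", "subjunctive")
ASPECTS = ("imperfective", "perfective")
TENSES = ("present", "past", "future")
NUMBERS = ("sg", "pl")
PERSONS = ("1", "2", "3")

# Every combination of one value per category.
all_sets = list(product(VOICES, MOODS, ASPECTS, TENSES, NUMBERS, PERSONS))

# Latin has no future subjunctive, so drop every set combining the two.
latin_sets = [s for s in all_sets if not ("subjunctive" in s and "future" in s)]

print(len(all_sets), len(latin_sets))
```

The 24 discarded sets are the product of the remaining categories: 2 voices, 2 aspects, 2 numbers, and 3 persons.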
Latin verbs consist of a sequence of morphological formatives, arranged in five slots:
  • Root, which realizes the verb’s lexeme, possibly dependent on the aspect. For instance, for the verb laudō, the stem is laud.
  • Tense marker 1, which realizes part of the verb’s tense, possibly dependent on the mood. For instance, for past perfective subjunctive, this marker is issē.
  • Tense marker 2, which realizes another part of the verb’s tense. This marker is usually empty. For the past indicative, though, it is ā, and for the present perfective subjunctive, it is ī.
  • Person/Number, which realizes a verb’s properties of person and number, possibly dependent on other categories. For example, the marker for the first person singular present indicative is ō.
  • Voice, which realizes a verb’s voice. It is empty for the active voice, and is usually r for the passive voice.
To keep our discussion short, we omit the imperative and infinitive forms, although our complete KATR theory includes them without difficulty. For those forms that use a participle (such as the perfective passive), we limit ourselves to a single form, the masculine singular.
There are four frequently encountered conjugations, distinguished by their theme vowel: first (ā: laudāre), second (ē: monēre), third (i: dūcere, capere), fourth (ī: audīre). The third conjugation has two variants; in one (capere), the theme vowel is more pronounced.

The Conjugation-1 Verb laudāre “Praise”

Figure 1. A network of nodes for generating forms of verbs in five conjugations.
A theory in KATR is a network of nodes. The network of nodes constituting our verb morphology theory is partially represented in Figure 1. The organizational principle in this network is hierarchical: The tree structure’s terminal nodes represent individual verbal lexemes, and each of the non-terminal nodes in the tree defines default properties shared by the lexemes that it dominates.
Each of the nodes in a theory houses a set of rules. We represent the verb laudāre “praise” by a node:
Praise: 
  1 <root> == l a u d 
  2 <> == VerbA
The node, named Praise, has two rules, which we number for discussion purposes only. KATR syntax requires that a node be terminated by a single period (full stop), which we omit here. Our convention is to name the node for a lexeme by a capitalized English word (here Praise) representing its meaning.
Rule 1 says that a query asking for the root of this verb should produce a four-atom result containing l, a, u, and d. Rule 2 says that all other queries are to be referred to the VerbA node, which we introduce below.
A query is a list of atoms, such as <root> or <active indicative perfect present 1 sg>, addressed to a node such as Praise. In our theory, the atoms in queries generally represent morphological formatives (such as root, themeVowel), morphosyntactic properties (such as perfect, sg) or surface forms (specific orthographic characters).
A query addressed to a given node is matched against all the rules housed at that node. A rule matches if all the atoms on its left-hand side match the atoms in the query. A rule can match even if its atoms do not exhaust the entire query. In the case of Praise, a query <root perfect> is matched by Rules 1 and 2; a query <themeVowel> is only matched by Rule 2.
Left-hand sides expressed with path notation (<pointed brackets>) only match if their atoms match an initial substring of the query. Left-hand sides expressed with set notation ({braces}) match if their atoms are all expressed, in whatever position, in the query. We usually use set notation for queries based on morphological formatives and morphosyntactic properties, where order is insignificant, but path notation for queries based on surface forms, where order is significant.
When several rules match, KATR picks the best match, that is, the one whose left-hand side “uses up” the most of the query. This choice embodies Pāṇini’s principle, which entails that if two rules are applicable, the more restrictive rule applies, to the exclusion of the more general rule. We sometimes speak of a rule’s Pāṇini precedence, which is the cardinality of its left-hand side. If a node in a KATR theory houses two applicable rules with the same Pāṇini precedence, we consider that theory malformed.
In our case, Rule 2 of Praise only applies when Rule 1 does not apply, because Rule 1 is always a better match if it applies at all. Rule 2 is called a default rule, because it applies by default if no other rule applies. Default rules define a hierarchical relation among some of the nodes in a KATR theory; thus, in the tree structure depicted in Figure 1, node X immediately dominates node Y iff Y houses a default rule that refers queries to X.
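The matching procedure just described can be sketched in a few lines of Python. This is a simplified illustration, not the KATR implementation: right-hand sides are kept as plain text, variables such as $letter are ignored, and the names Rule and best_rule are hypothetical. The SuffixVoice rules used in the last example are the ones given later in this paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    lhs: tuple            # atoms on the left-hand side
    rhs: str              # right-hand side, kept as plain text in this sketch
    as_set: bool = False  # True for {braces}, False for <path> notation

    def matches(self, query):
        if self.as_set:
            # Set notation: every atom must occur somewhere in the query.
            return all(atom in query for atom in self.lhs)
        # Path notation: the atoms must form an initial substring of the query.
        return tuple(query[:len(self.lhs)]) == self.lhs

def best_rule(rules, query):
    """Pick the matching rule whose left-hand side uses up the most atoms."""
    matching = [r for r in rules if r.matches(query)]
    if not matching:
        return None
    top = max(len(r.lhs) for r in matching)
    winners = [r for r in matching if len(r.lhs) == top]
    if len(winners) > 1:
        raise ValueError("malformed theory: two rules share the top precedence")
    return winners[0]

PRAISE = [Rule(("root",), "l a u d"), Rule((), "VerbA")]
SUFFIX_VOICE = [
    Rule(("passive",), "r", as_set=True),
    Rule(("passive", "3"), "u r", as_set=True),
    Rule((), ""),
]

print(best_rule(PRAISE, ["root", "perfect"]).rhs)
print(best_rule(PRAISE, ["themeVowel"]).rhs)
print(best_rule(SUFFIX_VOICE, ["passive", "indicative", "3", "pl"]).rhs)
```

The empty left-hand side of a default rule has cardinality zero, so it wins only when nothing more specific matches.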
KATR generates output based on queries directed to nodes representing individual lexemes. Since these nodes, such as Praise, are not referred to by other nodes, they are called leaves, as opposed to nodes like VerbA, which are called internal nodes. The KATR theory itself indicates the list of queries to be addressed to all leaves. Here is the output that KATR generates for several queries directed to the Praise node.
active,indicative,imperfective,present,sg,1 laudō
active,indicative,imperfective,past,sg,2 laudābās
active,indicative,imperfective,past,sg,3 laudābat
active,indicative,imperfective,future,pl,1 laudābimus
active,indicative,perfective,present,pl,2 laudāvistis
active,indicative,perfective,past,pl,3 laudāverant
active,indicative,perfective,future,sg,1 laudāverō
active,subjunctive,imperfective,present,sg,2 laudēs
active,subjunctive,imperfective,past,sg,3 laudāret
active,subjunctive,imperfective,past,pl,1 laudārēmus
active,subjunctive,perfective,present,pl,2 laudāverītis
passive,indicative,imperfective,present,pl,3 laudantur
passive,indicative,imperfective,past,sg,1 laudābāmer
passive,indicative,imperfective,past,sg,2 laudābāris
passive,indicative,imperfective,future,sg,3 laudābitur
passive,indicative,perfective,present,pl,1 laudātī sumus
passive,indicative,perfective,past,pl,2 laudātī erātis
passive,indicative,perfective,future,pl,3 laudātī erunt
Table 1. 
The rule for Praise illustrates the strategy we term provisioning [Finkel 2007b]: It provides information (here, the consonants of the verb’s root) needed by but not provided by more general nodes (here, VerbA and the nodes to which it, in turn, refers).
We refer to the individual segments of a morphological form by means of particular atoms:
  • themeVowel is the vowel usually found after the root of a verb.
  • The atoms stemImperfective, stemPerfective, and stemParticiple are the verb stems, usually including the theme vowel, that precede suffixes marking tense, number, person, mood, and aspect.

The VerbA Node

We now turn to the VerbA node, to which the Praise node refers.
VerbA: 
         1 <themeVowel> == ā 
         2 <stemImperfective> == "<root>" <themeVowel>
         3 <stemPerfective> == <stemImperfective> v
         4 <stemParticiple> == <stemImperfective> t
         5 <> == Verb:<conj1>
       
As with the Praise node, VerbA defers most queries to its parent, in this case the node called Verb, as Rule 5 indicates.
Most of these rules are for provisioning. Rule 1 answers the themeVowel query. We sometimes address this query to leaf nodes; the Praise node defers it to VerbA, which answers the query.
Rules 2-4 provision the three stems. The quoted path <root> in the right-hand side of Rule 2 directs a new query to the node to which the original query was first addressed, in our case, Praise, which produces the four atoms l a u d. The non-quoted path <themeVowel> directs a new query to the current node, that is, VerbA, where it is resolved by Rule 1. The right-hand side of Rule 2 is therefore equivalent to l a u d ā. Similarly, the right-hand side of Rule 3 is l a u d ā v, and the right-hand side of Rule 4 is l a u d ā t.
Rule 5 is a default rule, directing its query to the Verb node, with the atom conj1 prepended to the query. Therefore, queries addressed to Verb contain not only morphosyntactic markers (such as “present passive”) but also informational markers (“conj1”).
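The difference between quoted and non-quoted paths can be made concrete with a toy evaluator. The Python sketch below is an assumption-laden simplification: it handles only single-atom queries, it encodes the default rule under the key None, and it omits VerbA's own default to Verb. The names THEORY, GLOBAL, LOCAL, and evaluate are hypothetical and do not belong to KATR.

```python
GLOBAL, LOCAL = "global", "local"  # quoted path vs. non-quoted path

THEORY = {
    "Praise": {
        "root": ["l", "a", "u", "d"],
        None: "VerbA",  # default rule: defer every other query to VerbA
    },
    "VerbA": {
        "themeVowel": ["ā"],
        "stemImperfective": [(GLOBAL, "root"), (LOCAL, "themeVowel")],
        "stemPerfective": [(LOCAL, "stemImperfective"), "v"],
        "stemParticiple": [(LOCAL, "stemImperfective"), "t"],
    },
}

def evaluate(node, atom, origin=None):
    """Answer a one-atom query; origin is the node first addressed."""
    origin = origin or node
    rules = THEORY[node]
    if atom not in rules:
        # Follow the default rule (KeyError if the node has none).
        return evaluate(rules[None], atom, origin)
    result = []
    for item in rules[atom]:
        if isinstance(item, tuple):
            scope, query = item
            # A quoted path restarts at the origin; a non-quoted one stays here.
            target = origin if scope == GLOBAL else node
            result.extend(evaluate(target, query, origin))
        else:
            result.append(item)  # a literal atom
    return result

for stem in ("stemImperfective", "stemPerfective", "stemParticiple"):
    print(stem, "=", " ".join(evaluate("Praise", stem)))
```

Because the root is fetched from the node first addressed, the same VerbA rules would serve any other first-conjugation lexeme that supplies its own root.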
By way of contrast, we also present the VerbIO node, which applies to i-stem third-conjugation verbs such as facere and capere.
VerbIO: 
       1 {themeVowel} == i 
       2 {themeVowel past imperfective } == I 
       3 {themeVowel 2 sg present imperfective passive 
       indicative} == I 
       4 {themeVowel imperative} == I 
       5 {themeVowel infinitive} == I 
       6 <stemImperfective> == "<root>" <themeVowel> 
       7 <stemPerfective> == AEE:<"<root>"> 
       8 <stemParticiple> == "<root>" t 
       9 <> == Verb:<conj3io>     
     
Rules 2-5 introduce the strategy we call overriding, answering a query that is usually answered by a more general node in order to provide specific results for this situation. The left-hand sides of these rules use braces instead of angle brackets, indicating that the order of appearance of the atoms is irrelevant for matching the rules. The atom I that appears on the right-hand sides of these rules is a morphophoneme that either disappears or converts to an i or an e, depending on surrounding context, during a postprocessing step.
Rule 7 introduces the lookup strategy, by which particular information is obtained by reference to a special-purpose node. It directs a query such as f a c to the AEE node to convert the a to ē, so the perfective stem of facere becomes fēc. Here is that lookup node:
AEE: 
  1 <$letter#1 a $letter#2> == $letter#1 ē $letter#2      
This node depends on a definition (not shown) that defines what atoms are in the category “letter.” Rule 1 says that any query beginning with a letter, then the atom a, then any letter, should evaluate to the two letters surrounding ē instead.
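As a rough illustration of this lookup, the following Python sketch mimics Rule 1 of AEE on a list of atoms. The contents of LETTERS are an assumption, since the definition of the category is not shown here, and the function name aee is hypothetical.

```python
# Assumed membership of the category "letter"; the actual definition is not shown.
LETTERS = set("abcdefghilmnopqrstuvx") | set("āēīōū")

def aee(query):
    """Mimic AEE Rule 1: <letter a letter> evaluates to letter ē letter."""
    if (len(query) >= 3 and query[0] in LETTERS
            and query[1] == "a" and query[2] in LETTERS):
        return [query[0], "ē", query[2]]
    return None  # no rule housed at this node matches the query

print(" ".join(aee(["f", "a", "c"])))
print(" ".join(aee(["c", "a", "p"])))
```

The two calls correspond to the perfective stems of facere and capere.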

The Verb Node

Queries addressed to Praise are generally deferred to its parent, VerbA, which then defers them further to Verb.
Verb:  
       1 {$conj34 1 sg future/present imperfective 
           indicative/subjunctive} =+= <> 
       2 {future subjunctive} == ! 
       3 {perfective passive} == Sandhi:<"<stemParticiple>" 
           AdjSuffix:<nominative masculine> wordEnd> , 
           ToBe:<imperfective active> 
       4 <> == Sandhi:<StemAspect SuffixTense1 SuffixTense2 
           SuffixPersonVoice wordEnd>   
Rule 1 reflects the future indicative to the present subjunctive in verbs of conjugations 3 and 4 (abbreviated by the value $conj34) in the first singular imperfective. This rule is quite specialized, applying, for instance, to dūcam “I will lead / may I lead.” Without this rule, we would produce dūcēo “I will lead.”
Rule 2 indicates that there is no result for a query involving future subjunctive forms; Latin does not have these forms.
Rules 3 and 4 reflect to the Sandhi node as a postprocessing step after assembling the components of a verb. The rule with the widest applicability, Rule 4, is the default rule. It combines the results of queries directed to the four nodes StemAspect, SuffixTense1, SuffixTense2, and SuffixPersonVoice, along with the marker wordEnd for postprocessing. Rule 3 generates forms for the perfective passive, which involve a participle, an adjectival ending, and a specific form of the verb esse, which has its own node ToBe.

Auxiliary Nodes

The Verb node invokes several auxiliary nodes.
StemAspect: 
         1 {imperfective} =+= "<stemImperfective>" 
         2 {perfective} =+= "<stemPerfective>" 
The stem depends on the aspect; the query results in either the imperfective or the perfective stem. The =+= notation preserves all the elements of the query path, including those otherwise removed by matching the left-hand side of the rules.
SuffixTense1: 
         1 <> == 
         2 {conj1 present imperfective subjunctive} == ē 
         3 {present imperfective subjunctive} == ā 
         4 {perfective} == e r 
         5 {past perfective subjunctive} == i s s ē 
         6 {present perfective indicative} == I 
         7 {3 pl present perfective indicative} == ē r 
         8 {3 pl present perfective subjunctive} == e r 
         9 {past imperfective indicative} == b 
         10 {future imperfective indicative} == b 
         11 {past imperfective indicative conj3io} == i ē b 
         12 {past imperfective indicative conj4} == ē b 
         13 {future imperfective indicative $conj34} == 
         14 {future imperfective indicative conj4} == ē 
         15 {future imperfective indicative conj3io} == ē 
         16 {past imperfective subjunctive} == r ē 
       
The SuffixTense1 node contains most tense information. It ranges from very specific rules, like Rule 14, to fairly general rules, such as Rule 4, which is overridden by more specific Rules 5-8.
SuffixTense2: 
         1 {past indicative} == ā 
         2 {present perfective subjunctive} == ī 
         3 <> == 
       
The second tense suffix is usually empty (Rule 3), but it is occasionally either ā or ī.
SuffixPersonVoice: 
         1 {2 sg} == I SuffixVoice SuffixPerson 
         2 {2 pl passive} == I m i n ī 
         3 <> == SuffixPerson SuffixVoice 
The suffix for person and voice is occasionally quite specific, as in the second person plural passive. In the second person singular, the voice suffix (r for the passive) precedes the person suffix, as in laudāris “you (sg) are praised,” whereas the voice suffix usually follows the person suffix, as in laudāmur “we are praised.”
SuffixVoice: 
         1 {passive} == r 
         2 {passive 3} == u r 
         3 <> == 
       
In general, Rule 3 indicates that there is no suffix for voice. However, there is a suffix for the passive voice, which is generally r (Rule 1) but sometimes ur (Rule 2).
SuffixPerson: 
         1 {1 sg} == m 
         2 {1 sg present imperfective indicative} == ō
         3 {1 sg future indicative} == ō 
         4 {1 sg present perfective indicative} == ī 
         5 <> == SuffixalVowel Desinence
The personal suffix is usually a vowel and a desinence (Rule 5), but the first person singular is exceptional, with a general suffix m (Rule 1) but sometimes ō or ī.
SuffixalVowel: 
         1 {future} == I 
         2 {present imperfective indicative} == I 
         3 {past imperfective subjunctive} == I 
         4 {3 pl +2} == u 
         5 {3 pl future perfective active} == I 
         6 <> == 
       
The vowel has several forms; sometimes it is the morphophoneme I as in laudābit “he will praise,” but sometimes u, as in laudābunt “they will praise.”
Desinence: 
         1 {2 sg} == I s 
         2 {2 sg present perfective indicative} == s t ī 
         3 {3 sg} == t 
         4 {1 pl} == m u s 
         5 {2 pl} == t i s 
         6 {2 pl present perfective indicative} == s <2 pl> 
         7 {3 pl} == n t 
       
The desinence provides the final consonants, typically marking person and number, but occasionally influenced by aspect and tense (Rules 2 and 6).

The Sandhi Node

After we assemble the entire verb, we apply language-specific sandhi rules to account for phonological alterations.
Sandhi: 
         1 <wordEnd> == 
         2 <$letter> == $letter <> 
         3 <s r wordEnd> == <r wordEnd> 
         4 <I> == <i> 
         5 <$unroundedVowel I> == <$unroundedVowel> 
         6 <I r> == <e r> 
         7 <I $longUnroundedVowel> == <$longUnroundedVowel> 
         8 <I e wordEnd> == e % canIe => cane 
         9 <i i> == <i> % cap i i ē bam -> capieebam 
         10 <ā ō> == <ō> 
         11 <ā ē> == <ē> 
         12 <ē ō> == <e ō> 
         13 <ē ā> == <e ā> % moneeām -> moneām 
         14 <ī ō> == <i ō> % audīoo -> audioo 
         15 <ī ē> == <i ē> % audīees -> audiees 
         16 <ī ā> == <i ā> % audīām -> audiām -> 
         17 <ī ū> == <i ū> % audīūnt -> audiūnt 
         18 <ū $vowel> == <u $vowel> % frūctū-um -> 
             frūctu-um; cornūa-> 
         19 <$longVowel $nonSibilantConsonant wordEnd> ==  
             <Shorten:<$longVowel> $nonSibilantConsonant wordEnd> 
         20 <$longVowel $stop#1 $stop#2> == 
             <Shorten:<$longVowel> $stop#1 $stop#2> 
         21 <$longUnroundedVowel u> == 
             <$longUnroundedVowel> % dūceebāunt -> 
             dūceebānt -> .. 
         22 <c s> == <x> 
         23 <g s> == <c s> 
         24 <$consonant r wordEnd> == <$consonant e r 
             wordEnd>
       
Unlike other nodes, Sandhi works strictly from left to right, dealing with a few atoms at a time. The first rule removes the wordEnd marker if that is all that is left. The other rules simplify the beginning of the remaining string of letters and then use angle brackets to direct the modified string back to the Sandhi node.
Rule 2 is quite general: if no more specific rule applies, it takes a single letter from the query as output, and directs the remainder of the query back to Sandhi. Rule 3 converts forms like *dūcimusr to dūcimur. Rules 4-8 deal with the morphophoneme I, typically converting it to i (Rule 4), but sometimes converting it to e or removing it entirely. Rules 9-18 deal with two vowels in a row; typically, the second vowel is retained, and the first either disappears or shortens. Rules 19 and 20 shorten long vowels in certain contexts by appealing to the Shorten node, which we omit here. Finally, Rules 22-24 introduce spelling rules. We include them under the rubric of sandhi.
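The left-to-right behaviour of the Sandhi node can be imitated with a short loop. The Python sketch below covers only Rules 1-4, 22, and 23, and it is an illustration under simplifying assumptions (no variables such as $letter, no Shorten node); the names REWRITES and sandhi are hypothetical.

```python
WORD_END = "wordEnd"

# A few of the Sandhi rules, as (prefix to match, prefix to substitute).
REWRITES = [
    (("s", "r", WORD_END), ("r", WORD_END)),  # Rule 3
    (("I",), ("i",)),                          # Rule 4
    (("c", "s"), ("x",)),                      # Rule 22
    (("g", "s"), ("c", "s")),                  # Rule 23
]

def sandhi(atoms):
    """Consume the query from the left, rewriting or emitting as it goes."""
    atoms = list(atoms)
    output = []
    while atoms and atoms != [WORD_END]:       # Rule 1: stop at a bare wordEnd
        matching = [(lhs, rhs) for lhs, rhs in REWRITES
                    if tuple(atoms[:len(lhs)]) == lhs]
        if matching:
            # The longest left-hand side wins, as elsewhere in the theory.
            lhs, rhs = max(matching, key=lambda rule: len(rule[0]))
            atoms[:len(lhs)] = rhs             # rewrite the prefix, then re-query
        else:
            output.append(atoms.pop(0))        # Rule 2: emit one letter, move on
    return "".join(output)

print(sandhi("d ū c I m u s r wordEnd".split()))
print(sandhi("l ū g s ī wordEnd".split()))
```

In the second call, Rule 23 first turns g s into c s, and the rewritten prefix is then caught by Rule 22, which shows why each rule directs its result back to the same node.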

Strategies for Building KATR Theories

We have been applying KATR to natural-language morphology for several years. In addition to Latin, we have built a complete morphology of Hebrew verbs [Finkel 2007b], large parts of Sanskrit (and other related languages), and smaller studies of Bulgarian, Swahili, Georgian, Lingala, Spanish, Polish, and Turkish. KATR allows us to represent morphological rules for these languages with great elegance.
Writing specifications in KATR is not easy. KATR is capable of representing elegant theories, but arriving at those theories requires considerable effort. Early choices color the structure of the resulting theory, and the author must often discard attempts and rethink how to represent the target morphology. The hardest choice is often whether to model a form by introducing a sandhi rule or a formative rule. An example is the -mur suffix that marks first person plural passive. We choose to model this ending as mus + r and to reduce the result by sandhi. We could have introduced instead a rule in the Desinence node:
4.5 {1 pl passive} == m u
and let the r appear due to the SuffixVoice node. In this case, we prefer not to introduce a special rule in Desinence, partially because it looks so similar to the existing Rule 4, and therefore seems unparsimonious, and partially because we hypothesize that historically there really was some form *-musr that eventually elided the two final consonants.

An Implicative KATR Theory for Latin

The Paradigm Chart

We start this analysis by presenting a paradigm of word forms for Latin verbs. Table 2 displays a subset of the entire paradigm that covers only the present indicative active forms. The roots of lexemes are abstracted away from this paradigm.
CONJ PrIAc1s PrIAc2s PrIAc3s PrIAc1p PrIAc2p PrIAc3p
TEMPLATE 4S1C 1S1Cs 1S1Ct 4S1Cmus 1S1Ctis 4S1Cnt
cIa ō ā ā ā ā ā
cIb ō ā ā ā ā ā
cIc ō ā ā ā ā ā
cIIa ē ē ē ē ē
cIIb ē ē ē ē ē
cIIc ē ē ē ē ē
cIId ē ē ē ē ē
cIIe ē ē ē ē ē
cIIIa ō i i i i i
cIIIb ō i i i i i
cIIIc ō i i i i i
cIIId ō i i i i i
cIIIe i i i i i
cIIIf ō
cIIIs um u u
cIVa ī ī ī ī ī
cIVb ī ī ī ī ī
cIVc ī ī ī ī ī
cIVd ī ī ī ī ī
Table 2. 
Latin paradigm fragment (6 of the 92 columns)
We have expanded the traditional four conjugations into 19 conjugations. They are mostly distinguished by forms not shown here, particularly by the perfect indicative active. For instance, a cIa verb such as iuvāre “help” forms the perfect stem by lengthening the middle vowel, as in iūvī “I helped,” whereas a cIb verb such as laudāre “praise” simply adds āvī to the stem to form the perfect 1 singular. For completeness, we even include conjugation cIIIs for the two exceptional verbs esse “be” and posse “be able.”
The line marked CONJ simply lists the morphosyntactic property sets for our convenience.
The line marked TEMPLATE indicates parts of the chart that are constant within each column. For instance, the second-person singular is marked with 1S1Cs. The 1S part means “place the first stem here.” We choose to make the first stem the present stem. Next, 1C means “place the first entry in the column here.” Our Latin charts have only a single entry per column in each row. When an entry is empty, we mark it with . Finally, s means “place the letter s here.” All Latin verbs follow this strategy for building the present indicative active 2nd person singular forms.
Our chart requires that each verb have five stems, possibly identical: present, perfect, supine, present first person, and present past subjunctive. For iuvāre “help,” these stems are iuv, iūv, iū, iuv, and iuv, respectively. For esse “be,” the stems are es, fu, es, s, and es, respectively.
It often happens that a conjugation regularly refers one stem to another. For instance, class cIa refers the fourth and fifth stems to the first; class cIIIs refers the third to the first. (For consistency, we always refer stems to earlier-numbered stems.) We represent stem referrals as an addendum to our chart, as in Table 3.
REFER cIa 4 , 5 -> 1
REFER cIb 2 - 5 -> 1
REFER cIc 2 - 5 -> 1
REFER cIIa 2 - 5 -> 1
REFER cIIb 2 - 5 -> 1
REFER cIIc 2 - 5 -> 1
REFER cIId 4 , 5 -> 1 ; 3 -> 2
REFER cIIe 4 , 5 -> 1
REFER cIIIa 2 - 5 -> 1
REFER cIIIb 2 - 5 -> 1
REFER cIIIc 2 - 5 -> 1
REFER cIIId 2 - 5 -> 1
REFER cIIIs 3 -> 1
REFER cIIIe 3 - 5 -> 1
REFER cIVa 3 - 5 -> 1
REFER cIIIf 4 - 5 -> 1
REFER cIVb 2 - 5 -> 1
REFER cIVc 2 - 5 -> 1
REFER cIVd 2 - 5 -> 1
Table 3. 
Latin stem referrals
We then represent the lexicon by indicating the conjugation and stems of each verb as another addendum to the chart, as in Table 4.
LEXEME help cIa 1:iuv 2:iūv 3:iū
LEXEME bathe cIa 1:lav 2:lāv 3:lau
LEXEME stand cIa 1:st 2:stet 3:sta
LEXEME give cIa 1:d 2:ded 3:da
LEXEME praise cIb 1:laud
LEXEME rattle cIc 1:crep
LEXEME destroy cIIa 1:dēl
LEXEME mourn cIIb 1:lūg
LEXEME order cIIb 1:iub
LEXEME warn cIIc 1:mon
LEXEME see cIId 1:vid 2:vīd
LEXEME arouse cIIe 1:ci 2:cī 3:ci
LEXEME decide cIIIa 1:dēcern
LEXEME nourish cIIIb 1:al
LEXEME lead cIIIc 1:dūc
LEXEME attach cIIId 1:fīg
LEXEME take cIIIe 1:cap 2:cēp
LEXEME carry cIIIf 1:fer 2:tul 3:lāt
LEXEME be cIIIs 1:es 2:fu 4:s
LEXEME be able cIIIs 1:potes 2:possu 4:poss 5:pos
LEXEME go cIIIs 1:i 2:i 4:e 5:ī
LEXEME come cIVa 1:ven 2:vēn
LEXEME hear cIVb 1:aud
LEXEME leap cIVc 1:sal
LEXEME bind cIVd 1:vinc
Table 4. 
Latin lexicon
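To show how the template line, the stem referrals, and the lexicon fit together, here is a small Python sketch that expands four of the columns from Table 2 for the lexeme praise. It is an illustration only: the data structures and the names stem and expand are assumptions, the data are transcribed from Tables 2-4 for class cIb alone, and sandhi is not applied, so only columns whose forms need no sandhi are included.

```python
import re

# Fragments transcribed from Tables 2-4 (class cIb only).
TEMPLATES = {"PrIAc1s": "4S1C", "PrIAc2s": "1S1Cs",
             "PrIAc1p": "4S1Cmus", "PrIAc2p": "1S1Ctis"}
EXPONENCES = {"cIb": {"PrIAc1s": ["ō"], "PrIAc2s": ["ā"],
                      "PrIAc1p": ["ā"], "PrIAc2p": ["ā"]}}
REFERRALS = {"cIb": {2: 1, 3: 1, 4: 1, 5: 1}}   # REFER cIb 2 - 5 -> 1
LEXICON = {"praise": ("cIb", {1: "laud"})}      # LEXEME praise cIb 1:laud

# A template is a run of "<n>S" (stem n), "<n>C" (entry n), and literal letters.
TOKEN = re.compile(r"(\d+)([SC])|(\D+)")

def stem(lexeme, n):
    """Return stem n, following referrals until a listed stem is reached."""
    conj, stems = LEXICON[lexeme]
    while n not in stems:
        n = REFERRALS[conj][n]
    return stems[n]

def expand(lexeme, mps):
    """Fill in the template of one column for one lexeme (before sandhi)."""
    conj, _ = LEXICON[lexeme]
    entries = EXPONENCES[conj][mps]
    parts = []
    for number, kind, literal in TOKEN.findall(TEMPLATES[mps]):
        if kind == "S":
            parts.append(stem(lexeme, int(number)))
        elif kind == "C":
            parts.append(entries[int(number) - 1])
        else:
            parts.append(literal)
    return "".join(parts)

for mps in TEMPLATES:
    print(mps, expand("praise", mps))
```

The fourth stem requested by the template 4S1C is not listed for praise, so the referral for class cIb redirects it to the first stem, laud.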
The final section of the paradigm chart expresses rules of sandhi, as shown in Table 5.
CLASS finalStop r m t nt
SANDHI s s | => s % esst => est
SANDHI s r => s s % esret => esset
SANDHI s b => r % esbam => eram
SANDHI ā\s*[:finalStop:] | => a $1
SANDHI ī\s*[:finalStop:] | => i $1
SANDHI ē\s*[:finalStop:] | => e $1
SANDHI ō\s*[:finalStop:] | => o $1
SANDHI ū\s*[:finalStop:] | => u $1
SANDHI g s => x % lugsi => luxi
SANDHI b s => s s % iubsī => iussī
SANDHI b t => s s % iubtum => iussum
SANDHI g t => c t % lugtum => luctum
SANDHI c s => x % ducsi => duxi
SANDHI d s => s % vīdsum => vīsum
SANDHI rn t => rt % dēcerntum => dēcertum
Table 5. 
Latin sandhi rules
These rules sometimes truly express sandhi, such as the rules for shortening long vowels before a final stop. Others are merely spelling rules, such as converting cs to x. We introduce a few in order to make our paradigm more regular, such as changing sb to r. In many cases, we indicate the situation that led us to introduce the rule by a comment starting with %.
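A subset of the rules in Table 5 can be approximated with ordinary regular expressions. The Python sketch below is a loose translation and not the software used for this paper: it assumes that | marks the end of the word, that the rules apply once each in the order listed, and that forms are written without spaces. The rules for s s and rn t are left out, and the names RULES and apply_sandhi are hypothetical.

```python
import re

# CLASS finalStop from Table 5, written as a capturing group.
FINAL_STOP = "(nt|r|m|t)"

# Long vowels and their short counterparts.
SHORT = {"ā": "a", "ī": "i", "ē": "e", "ō": "o", "ū": "u"}

RULES = [("sr", "ss"), ("sb", "r")]
# Shorten a long vowel before a word-final stop; \1 restores the stop.
RULES += [(long_v + FINAL_STOP + "$", short_v + r"\1")
          for long_v, short_v in SHORT.items()]
RULES += [("gs", "x"), ("bs", "ss"), ("bt", "ss"),
          ("gt", "ct"), ("cs", "x"), ("ds", "s")]

def apply_sandhi(form):
    """Apply each rule once, in order, to an assembled form."""
    for pattern, replacement in RULES:
        form = re.sub(pattern, replacement, form)
    return form

for raw in ("esret", "esbam", "laudānt", "lūgsī", "iubtum"):
    print(raw, "=>", apply_sandhi(raw))
```

The examples follow the comments in Table 5, with laudānt added to show the vowel-shortening rules at work.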

Deriving the Essence of the Paradigm

Before analyzing the paradigm, we reduce it to its essence. The first reduction removes identical conjugations. This situation does not arise in Latin, but it does in French, where 71 of the 149 conjugations listed in the Larousse Dictionary are redundant after we present them on our paradigm form, and another 11 are identical except for stem-referral pattern.
The next step is to remove redundant columns (morphosyntactic property sets, or MPSs). For example, although the second and third person verb forms are different in the present indicative active, the columns associated with those forms are identical. The differences are all covered by the template and sandhi rules. Of the 92 MPSs in Latin, there are only 14 unique ones.
The last step is to remove essentially identical columns. Two columns are essentially identical if there is a one-to-one and onto mapping between the exponences (cell contents) found in those columns. For example, the future perfect indicative active 2nd singular MPS is essentially the same as another MPS. Our analysis reduces the chart from 92 MPSs to 9 important ones, which we call distillations.
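The test for essentially identical columns amounts to checking for a bijection between the exponences of two columns, row by row. The Python sketch below illustrates the idea on invented data; the columns A, B, and C are hypothetical and are not taken from the Latin chart, and the function names are assumptions.

```python
def essentially_identical(col_a, col_b):
    """True if a one-to-one and onto mapping relates the two columns."""
    forward, backward = {}, {}
    for a, b in zip(col_a, col_b):
        # Each exponence must always pair with the same partner, both ways.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True

def distill(columns):
    """Keep one representative of each group of essentially identical columns."""
    kept = {}
    for name, column in columns.items():
        if not any(essentially_identical(column, other) for other in kept.values()):
            kept[name] = column
    return kept

# Hypothetical columns over four inflection classes (one entry per class).
COLUMNS = {
    "A": ["ō", "ō", "ē", "ī"],
    "B": ["ā", "ā", "ē", "ī"],   # same grouping of classes as A
    "C": ["ā", "ē", "ē", "ī"],   # groups the classes differently
}

print(list(distill(COLUMNS)))
```

Columns A and B partition the four classes in the same way, so knowing one tells us the other; column C draws a different partition and therefore survives as a separate distillation. Identical columns are a special case of this test.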
We call the reduced paradigm chart the essence. The essence does not contain actual exponences; we substitute unique symbols instead. Table 6 presents the essence of the Latin paradigm. An entry like e4_2 means that this distillation is based on the fourth MPS (which happens to be PrIAc1p ) and has the second exponence found in that MPS (which happens to be ē).
cIa e1_1 e2_1 e4_1 e13_1 e25_1 e37_1 e55_1 e58_1 e92_1
cIb e1_1 e2_1 e4_1 e13_1 e25_1 e37_2 e55_1 e58_2 e92_2
cIc e1_1 e2_1 e4_1 e13_1 e25_1 e37_3 e55_1 e58_2 e92_3
cIIa e1_2 e2_2 e4_2 e13_2 e25_2 e37_4 e55_2 e58_3 e92_4
cIIb e1_2 e2_2 e4_2 e13_2 e25_2 e37_5 e55_2 e58_3 e92_1
cIIc e1_2 e2_2 e4_2 e13_2 e25_2 e37_3 e55_2 e58_3 e92_3
cIId e1_2 e2_2 e4_2 e13_2 e25_2 e37_1 e55_2 e58_3 e92_5
cIIe e1_2 e2_2 e4_2 e13_2 e25_2 e37_6 e55_2 e58_3 e92_1
cIIIa e1_1 e2_3 e4_3 e13_2 e25_3 e37_1 e55_3 e58_4 e92_1
cIIIb e1_1 e2_3 e4_3 e13_2 e25_3 e37_3 e55_3 e58_4 e92_1
cIIIc e1_1 e2_3 e4_3 e13_2 e25_3 e37_5 e55_3 e58_4 e92_1
cIIId e1_1 e2_3 e4_3 e13_2 e25_3 e37_5 e55_3 e58_4 e92_5
cIIIe e1_3 e2_3 e4_3 e13_3 e25_4 e37_1 e55_4 e58_5 e92_1
cIIIf e1_1 e2_4 e4_4 e13_2 e25_3 e37_1 e55_5 e58_1 e92_6
cIIIs e1_4 e2_4 e4_5 e13_4 e25_5 e37_1 e55_6 e58_6 e92_0
cIVa e1_3 e2_5 e4_6 e13_3 e25_4 e37_1 e55_4 e58_5 e92_1
cIVb e1_3 e2_5 e4_6 e13_3 e25_4 e37_7 e55_4 e58_5 e92_7
cIVc e1_3 e2_5 e4_6 e13_3 e25_4 e37_3 e55_4 e58_5 e92_1
cIVd e1_3 e2_5 e4_6 e13_3 e25_4 e37_5 e55_4 e58_5 e92_1
Table 6. 
Essence of the Latin paradigm
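The two column reductions described above can be sketched in a few lines of Python. This is a minimal illustration rather than the program used for the paper: the toy chart, the MPS labels m1 to m4, and the function names are hypothetical, and the sketch assumes that two columns are essentially identical exactly when their exponences induce the same partition of the conjugations.

```python
def pattern(column):
    """Replace each exponence by the order (0, 1, 2, ...) of its first appearance."""
    seen = {}
    return tuple(seen.setdefault(x, len(seen)) for x in column)


def distill(chart):
    """Keep the first MPS of every class of identical or essentially identical columns.

    chart maps an MPS label to its column of exponences, one per conjugation,
    with all columns listing the conjugations in the same order.
    """
    kept = {}
    for mps, column in chart.items():
        kept.setdefault(pattern(column), mps)
    return list(kept.values())


def essence(chart):
    """Rewrite the kept columns as symbols e<MPS number>_<exponence number>."""
    numbers = {mps: i for i, mps in enumerate(chart, start=1)}
    return {
        mps: [f"e{numbers[mps]}_{k + 1}" for k in pattern(chart[mps])]
        for mps in distill(chart)
    }


# Hypothetical toy chart: three conjugations (rows) and four MPSs (columns).
toy = {
    "m1": ["a", "e", "i"],
    "m2": ["a", "e", "i"],     # identical to m1
    "m3": ["as", "es", "is"],  # essentially identical to m1
    "m4": ["o", "o", "io"],    # a genuinely different pattern
}
```

On the toy chart, distill(toy) keeps m1 and m4, and essence(toy) labels the surviving columns in the same style as Table 6 (e1_1, e1_2, e1_3 for m1 and e4_1, e4_1, e4_2 for m4).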

Principal Parts

Intuitively, a set of principal parts is a minimal subset of MPSs so that if one knows the exponences of those MPSs for a particular verb, one can deduce the verb's conjugation, from which one can deduce all the other MPSs of the verb. The practical utility of principal parts for language pedagogy has long been recognized. Generations of Latin students have learned that each verb in Latin has four principal parts (present indicative active first person singular, active infinitive, perfect indicative active first person singular, perfect passive participle (neuter nominative singular)). If one knows laudō, laudāre, laudāvī, laudātum, one knows enough to place the verb in conjugation 1, from which one can determine the exponences of all the other MPSs by reference to the paradigm chart.
We can define several kinds of principal-part systems [Finkel 2007a].
  • Static: a set of MPSs that applies for all verbs in the chart. Given the exponences of a verb for those MPSs, one can deduce its conjugation. Static principal parts are equivalent to the traditional understanding.
  • Adaptive: a tree of MPSs. Given the exponence of a verb for the MPS at the root of the tree, one can select an appropriate subtree and recurse. The leaves of the tree are conjugations.
  • Dynamic: a set of {MPS, exponence} pairs for each conjugation. If a verb agrees with a set of pairs, it belongs to the associated conjugation.
We have built a program that takes the essence of a paradigm and computes its principal-part systems. For Latin, we find that there are, in fact, four static principal parts, with ten variations, as shown in Table 7. It is a bit surprising that the infinitive does not figure into any of these variations. However, the paradigm from which we calculate this result places the infinitive MPS almost at the end. It turns out that the exponences for the infinitive are identical to the exponences for the imperfect subjunctive active first person singular. For laudō, for instance, both show ā, one using that exponence to form the infinitive laudāre, and the other to form the subjunctive laudārem. In turn the MPS for the imperfect subjunctive active first person singular is essentially identical to the MPS for the present indicative active second person singular. Therefore, the first variation in Table 7 is the traditional set of principal parts.
1 Present 1 sg, Present 2 sg, Perfect 1 sg, Supine
2 Present 1 sg, Present 1 pl, Perfect 1 sg, Supine
3 Present 2 sg, Imperfect 1 sg, Perfect 1 sg, Supine
4 Present 2 sg, Future perfect 1 sg, Perfect 1 sg, Supine
5 Present 2 sg, Perfect 1 sg, Present subj 1 sg, Supine
6 Present 2 sg, Perfect 1 sg, Present subj 1 pl, Supine
7 Present 1 pl, Imperfect 1 sg, Perfect 1 sg, Supine
8 Present 1 pl, Future perfect 1 sg, Perfect 1 sg, Supine
9 Present 1 pl, Perfect 1 sg, Present subj 1 sg, Supine
10 Present 1 pl, Perfect 1 sg, Present subj 1 pl, Supine
Table 7. 
Latin static principal parts (indicative active unless otherwise marked)
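The search for static principal parts can be reproduced from Table 6 alone. The following Python sketch is an assumed brute-force formulation, not the program described above: it tries ever larger sets of distillations and stops at the first size for which some set gives every conjugation a distinct signature. The function names are illustrative; the data are the rows of Table 6.

```python
from itertools import combinations

ESSENCE = """
cIa e1_1 e2_1 e4_1 e13_1 e25_1 e37_1 e55_1 e58_1 e92_1
cIb e1_1 e2_1 e4_1 e13_1 e25_1 e37_2 e55_1 e58_2 e92_2
cIc e1_1 e2_1 e4_1 e13_1 e25_1 e37_3 e55_1 e58_2 e92_3
cIIa e1_2 e2_2 e4_2 e13_2 e25_2 e37_4 e55_2 e58_3 e92_4
cIIb e1_2 e2_2 e4_2 e13_2 e25_2 e37_5 e55_2 e58_3 e92_1
cIIc e1_2 e2_2 e4_2 e13_2 e25_2 e37_3 e55_2 e58_3 e92_3
cIId e1_2 e2_2 e4_2 e13_2 e25_2 e37_1 e55_2 e58_3 e92_5
cIIe e1_2 e2_2 e4_2 e13_2 e25_2 e37_6 e55_2 e58_3 e92_1
cIIIa e1_1 e2_3 e4_3 e13_2 e25_3 e37_1 e55_3 e58_4 e92_1
cIIIb e1_1 e2_3 e4_3 e13_2 e25_3 e37_3 e55_3 e58_4 e92_1
cIIIc e1_1 e2_3 e4_3 e13_2 e25_3 e37_5 e55_3 e58_4 e92_1
cIIId e1_1 e2_3 e4_3 e13_2 e25_3 e37_5 e55_3 e58_4 e92_5
cIIIe e1_3 e2_3 e4_3 e13_3 e25_4 e37_1 e55_4 e58_5 e92_1
cIIIf e1_1 e2_4 e4_4 e13_2 e25_3 e37_1 e55_5 e58_1 e92_6
cIIIs e1_4 e2_4 e4_5 e13_4 e25_5 e37_1 e55_6 e58_6 e92_0
cIVa e1_3 e2_5 e4_6 e13_3 e25_4 e37_1 e55_4 e58_5 e92_1
cIVb e1_3 e2_5 e4_6 e13_3 e25_4 e37_7 e55_4 e58_5 e92_7
cIVc e1_3 e2_5 e4_6 e13_3 e25_4 e37_3 e55_4 e58_5 e92_1
cIVd e1_3 e2_5 e4_6 e13_3 e25_4 e37_5 e55_4 e58_5 e92_1
"""


def parse(text):
    """Return (labels, rows): distillation labels and conjugation -> tuple of symbols."""
    rows = {}
    for line in text.strip().splitlines():
        name, *cells = line.split()
        rows[name] = tuple(cells)
    labels = [cell.split("_")[0] for cell in next(iter(rows.values()))]
    return labels, rows


def static_principal_parts(labels, rows):
    """Return all smallest sets of distillations that tell every conjugation apart."""
    for size in range(1, len(labels) + 1):
        found = []
        for cols in combinations(range(len(labels)), size):
            signatures = {tuple(row[c] for c in cols) for row in rows.values()}
            if len(signatures) == len(rows):  # no two conjugations collide
                found.append(tuple(labels[c] for c in cols))
        if found:
            return found
    return []


LABELS, ROWS = parse(ESSENCE)
STATIC = static_principal_parts(LABELS, ROWS)
```

On the Table 6 data this search returns ten sets of four distillations, each containing e37 and e92, in line with the ten variations listed in Table 7.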
Our program computes one of the possible trees representing adaptive principal parts, as shown in Figure 2, which also shows a representative verb from each conjugation. Some conjugations can be determined with only two adaptive principal parts. For instance, if the present second person singular form uses ās and the perfect first person singular form is ī, then the conjugation is cIa, as in iuvāre “help.” Others require three adaptive principal parts, such as dūcere “lead.” However, no conjugation needs four principal parts. This analysis shows that the most important distinction, the one at the top of the tree, is based on the present indicative active second person singular form, which we note above is essentially the same as the active infinitive form.
Figure 2. 
Latin adaptive principal parts (all indicative active)
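An adaptive tree of this kind can be derived from Table 6 by exhaustive search. The Python sketch below is one assumed formulation, not the program that produced Figure 2: at each node it tries every distillation that splits the remaining conjugations and keeps the choice that minimizes the longest path to a leaf. The names best_tree and classify are illustrative, and the sketch assumes that no two conjugations have identical rows.

```python
from functools import lru_cache

ESSENCE = """
cIa e1_1 e2_1 e4_1 e13_1 e25_1 e37_1 e55_1 e58_1 e92_1
cIb e1_1 e2_1 e4_1 e13_1 e25_1 e37_2 e55_1 e58_2 e92_2
cIc e1_1 e2_1 e4_1 e13_1 e25_1 e37_3 e55_1 e58_2 e92_3
cIIa e1_2 e2_2 e4_2 e13_2 e25_2 e37_4 e55_2 e58_3 e92_4
cIIb e1_2 e2_2 e4_2 e13_2 e25_2 e37_5 e55_2 e58_3 e92_1
cIIc e1_2 e2_2 e4_2 e13_2 e25_2 e37_3 e55_2 e58_3 e92_3
cIId e1_2 e2_2 e4_2 e13_2 e25_2 e37_1 e55_2 e58_3 e92_5
cIIe e1_2 e2_2 e4_2 e13_2 e25_2 e37_6 e55_2 e58_3 e92_1
cIIIa e1_1 e2_3 e4_3 e13_2 e25_3 e37_1 e55_3 e58_4 e92_1
cIIIb e1_1 e2_3 e4_3 e13_2 e25_3 e37_3 e55_3 e58_4 e92_1
cIIIc e1_1 e2_3 e4_3 e13_2 e25_3 e37_5 e55_3 e58_4 e92_1
cIIId e1_1 e2_3 e4_3 e13_2 e25_3 e37_5 e55_3 e58_4 e92_5
cIIIe e1_3 e2_3 e4_3 e13_3 e25_4 e37_1 e55_4 e58_5 e92_1
cIIIf e1_1 e2_4 e4_4 e13_2 e25_3 e37_1 e55_5 e58_1 e92_6
cIIIs e1_4 e2_4 e4_5 e13_4 e25_5 e37_1 e55_6 e58_6 e92_0
cIVa e1_3 e2_5 e4_6 e13_3 e25_4 e37_1 e55_4 e58_5 e92_1
cIVb e1_3 e2_5 e4_6 e13_3 e25_4 e37_7 e55_4 e58_5 e92_7
cIVc e1_3 e2_5 e4_6 e13_3 e25_4 e37_3 e55_4 e58_5 e92_1
cIVd e1_3 e2_5 e4_6 e13_3 e25_4 e37_5 e55_4 e58_5 e92_1
"""

# conjugation -> {distillation label -> symbol}
ROWS = {}
for line in ESSENCE.strip().splitlines():
    name, *cells = line.split()
    ROWS[name] = {cell.split("_")[0]: cell for cell in cells}
LABELS = list(ROWS["cIa"])


@lru_cache(maxsize=None)
def best_tree(group):
    """Return (depth, tree) for a frozenset of conjugations, minimizing the longest path.

    A tree is either a conjugation name (leaf) or a pair
    (distillation label, {symbol: subtree}).
    """
    if len(group) == 1:
        return 0, next(iter(group))
    best = None
    for label in LABELS:
        parts = {}
        for conj in sorted(group):
            parts.setdefault(ROWS[conj][label], []).append(conj)
        if len(parts) == 1:
            continue  # this distillation does not discriminate within the group
        subtrees = {sym: best_tree(frozenset(members)) for sym, members in parts.items()}
        depth = 1 + max(d for d, _ in subtrees.values())
        if best is None or depth < best[0]:
            best = (depth, (label, {sym: t for sym, (_, t) in subtrees.items()}))
    return best


def classify(tree, exponences):
    """Walk the tree with a verb's symbols (label -> symbol) until a leaf is reached."""
    while not isinstance(tree, str):
        label, branches = tree
        tree = branches[exponences[label]]
    return tree


DEPTH, TREE = best_tree(frozenset(ROWS))
```

On the Table 6 data the longest path in TREE has length 3, so no conjugation needs a fourth adaptive principal part. Ties are broken by column order, so the root chosen here need not be the one shown in Figure 2.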
We also compute dynamic principal parts. Table 8 displays one set for each conjugation; in general, each conjugation has several variations. This analysis shows that many conjugations can be completely determined by a single principal part. For example, if the perfect indicative active 1 sg form of a verb is -āvī, the verb is in conjugation cIb. Only one conjugation, cIIIc, requires three exponences to determine all its forms.
cIa Present 2 sg, Present subjunctive 1 pl
cIb Perfect 1 sg
cIc Present subjunctive 1 pl, Supine
cIIa Perfect 1 sg
cIIb Present 1 sg, Perfect 1 sg
cIIc Present 1 sg, Supine
cIId Present 1 sg, Perfect 1 sg
cIIe Perfect 1 sg
cIIIa Perfect 1 sg, Present subjunctive 1 sg
cIIIb Perfect 1 sg, Present subjunctive 1 sg
cIIIc Perfect 1 sg, Present subjunctive 1 sg, Supine
cIIId Present subjunctive 1 sg, Supine
cIIIe Present 1 sg, Present 2 sg
cIIIf Present 1 pl
cIIIs Present 1 sg
cIVa Present 2 sg, Perfect 1 sg
cIVb Perfect 1 sg
cIVc Present 2 sg, Perfect 1 sg
cIVd Present 2 sg, Perfect 1 sg
Table 8. 
Latin dynamic principal parts (indicative active unless otherwise marked)
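Dynamic principal parts can likewise be searched for one conjugation at a time. The Python sketch below is an assumed brute-force version, not the program used for Table 8: for a target conjugation it looks for a smallest set of distillations whose symbols no other conjugation shares in full. It returns only the first such set in column order, so it may pick a different variation from the one listed in Table 8.

```python
from itertools import combinations

ESSENCE = """
cIa e1_1 e2_1 e4_1 e13_1 e25_1 e37_1 e55_1 e58_1 e92_1
cIb e1_1 e2_1 e4_1 e13_1 e25_1 e37_2 e55_1 e58_2 e92_2
cIc e1_1 e2_1 e4_1 e13_1 e25_1 e37_3 e55_1 e58_2 e92_3
cIIa e1_2 e2_2 e4_2 e13_2 e25_2 e37_4 e55_2 e58_3 e92_4
cIIb e1_2 e2_2 e4_2 e13_2 e25_2 e37_5 e55_2 e58_3 e92_1
cIIc e1_2 e2_2 e4_2 e13_2 e25_2 e37_3 e55_2 e58_3 e92_3
cIId e1_2 e2_2 e4_2 e13_2 e25_2 e37_1 e55_2 e58_3 e92_5
cIIe e1_2 e2_2 e4_2 e13_2 e25_2 e37_6 e55_2 e58_3 e92_1
cIIIa e1_1 e2_3 e4_3 e13_2 e25_3 e37_1 e55_3 e58_4 e92_1
cIIIb e1_1 e2_3 e4_3 e13_2 e25_3 e37_3 e55_3 e58_4 e92_1
cIIIc e1_1 e2_3 e4_3 e13_2 e25_3 e37_5 e55_3 e58_4 e92_1
cIIId e1_1 e2_3 e4_3 e13_2 e25_3 e37_5 e55_3 e58_4 e92_5
cIIIe e1_3 e2_3 e4_3 e13_3 e25_4 e37_1 e55_4 e58_5 e92_1
cIIIf e1_1 e2_4 e4_4 e13_2 e25_3 e37_1 e55_5 e58_1 e92_6
cIIIs e1_4 e2_4 e4_5 e13_4 e25_5 e37_1 e55_6 e58_6 e92_0
cIVa e1_3 e2_5 e4_6 e13_3 e25_4 e37_1 e55_4 e58_5 e92_1
cIVb e1_3 e2_5 e4_6 e13_3 e25_4 e37_7 e55_4 e58_5 e92_7
cIVc e1_3 e2_5 e4_6 e13_3 e25_4 e37_3 e55_4 e58_5 e92_1
cIVd e1_3 e2_5 e4_6 e13_3 e25_4 e37_5 e55_4 e58_5 e92_1
"""

# conjugation -> tuple of symbols, one per distillation
ROWS = {}
for line in ESSENCE.strip().splitlines():
    name, *cells = line.split()
    ROWS[name] = tuple(cells)


def dynamic_principal_parts(rows, target):
    """Return one smallest list of symbols that only the target conjugation exhibits."""
    mine = rows[target]
    others = [row for name, row in rows.items() if name != target]
    for size in range(1, len(mine) + 1):
        for cols in combinations(range(len(mine)), size):
            # Every other conjugation must differ from the target somewhere in cols.
            if all(any(row[c] != mine[c] for c in cols) for row in others):
                return [mine[c] for c in cols]
    return None  # only reached if another conjugation has an identical row


SIZES = {conj: len(dynamic_principal_parts(ROWS, conj)) for conj in ROWS}
```

For cIb the search returns the single symbol e37_2, the perfect indicative active first person singular, and cIIIc is the only conjugation for which it needs three symbols.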

Grouping

Computing the adaptive principal parts produces one way to see the interrelation of the conjugations, as shown in Figure 2. We can compute the interrelation in a more direct way by using an algorithm based on Huffman encoding [Wikipedia 2009]. We define the distance between two conjugations as the number of distillations on which they disagree. We repeatedly find the two conjugations of minimum distance, delete them from the set of conjugations, combine them, and insert the result, a pseudo-conjugation, back into the set of conjugations. A pseudo-conjugation has a compound value for those distillations where the two conjugations disagree. We consider a compound value to be of distance 0 from any superset or subset.
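The merging procedure just described can be sketched in Python over the rows of Table 6. This is an illustrative reading, not the program that produced Figure 3: values are represented as sets of symbols, a merged value is taken to be the union of the two values (an assumption that also covers the case where one value is a subset of the other), and ties between equally close pairs are broken by taking the first pair found.

```python
from itertools import combinations

ESSENCE = """
cIa e1_1 e2_1 e4_1 e13_1 e25_1 e37_1 e55_1 e58_1 e92_1
cIb e1_1 e2_1 e4_1 e13_1 e25_1 e37_2 e55_1 e58_2 e92_2
cIc e1_1 e2_1 e4_1 e13_1 e25_1 e37_3 e55_1 e58_2 e92_3
cIIa e1_2 e2_2 e4_2 e13_2 e25_2 e37_4 e55_2 e58_3 e92_4
cIIb e1_2 e2_2 e4_2 e13_2 e25_2 e37_5 e55_2 e58_3 e92_1
cIIc e1_2 e2_2 e4_2 e13_2 e25_2 e37_3 e55_2 e58_3 e92_3
cIId e1_2 e2_2 e4_2 e13_2 e25_2 e37_1 e55_2 e58_3 e92_5
cIIe e1_2 e2_2 e4_2 e13_2 e25_2 e37_6 e55_2 e58_3 e92_1
cIIIa e1_1 e2_3 e4_3 e13_2 e25_3 e37_1 e55_3 e58_4 e92_1
cIIIb e1_1 e2_3 e4_3 e13_2 e25_3 e37_3 e55_3 e58_4 e92_1
cIIIc e1_1 e2_3 e4_3 e13_2 e25_3 e37_5 e55_3 e58_4 e92_1
cIIId e1_1 e2_3 e4_3 e13_2 e25_3 e37_5 e55_3 e58_4 e92_5
cIIIe e1_3 e2_3 e4_3 e13_3 e25_4 e37_1 e55_4 e58_5 e92_1
cIIIf e1_1 e2_4 e4_4 e13_2 e25_3 e37_1 e55_5 e58_1 e92_6
cIIIs e1_4 e2_4 e4_5 e13_4 e25_5 e37_1 e55_6 e58_6 e92_0
cIVa e1_3 e2_5 e4_6 e13_3 e25_4 e37_1 e55_4 e58_5 e92_1
cIVb e1_3 e2_5 e4_6 e13_3 e25_4 e37_7 e55_4 e58_5 e92_7
cIVc e1_3 e2_5 e4_6 e13_3 e25_4 e37_3 e55_4 e58_5 e92_1
cIVd e1_3 e2_5 e4_6 e13_3 e25_4 e37_5 e55_4 e58_5 e92_1
"""


def parse(text):
    """Map each conjugation to a tuple of singleton sets, one per distillation."""
    table = {}
    for line in text.strip().splitlines():
        name, *cells = line.split()
        table[name] = tuple(frozenset([cell]) for cell in cells)
    return table


def distance(a, b):
    """Count distillations where neither value is a subset of the other."""
    return sum(1 for x, y in zip(a, b) if not (x <= y or y <= x))


def merge(a, b):
    """Build a pseudo-conjugation by uniting the values distillation by distillation."""
    return tuple(x | y for x, y in zip(a, b))


def group(table):
    """Repeatedly merge the closest pair; return the tree and the list of merges."""
    nodes = dict(table)  # tree label (name or nested pair) -> values
    merges = []
    while len(nodes) > 1:
        p, q = min(
            combinations(nodes, 2),
            key=lambda pair: distance(nodes[pair[0]], nodes[pair[1]]),
        )
        d = distance(nodes[p], nodes[q])
        nodes[(p, q)] = merge(nodes.pop(p), nodes.pop(q))
        merges.append((p, q, d))
    (tree,) = nodes
    return tree, merges


def leaves(tree):
    """List the conjugation names found in a nested-pair tree."""
    return [tree] if isinstance(tree, str) else leaves(tree[0]) + leaves(tree[1])


TREE, MERGES = group(parse(ESSENCE))
```

Each entry of MERGES records the two nodes joined and their distance at the time of merging. Because of the tie-breaking rule, this sketch yields only one of the possible taxonomic trees, which need not coincide with any particular published one.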
This algorithm leads to multiple possible analyses, because we may be able to choose among several minimum pairs. Each analysis is a taxonomic tree. Figure 3 shows one tree that our program produces. The entries like e13_2 refer to the essence in Table 6. They show in what way each node in the tree is distinguished from its siblings. For instance, conjugations cIIb and cIIe are distinguished by e37, which is perfect indicative active first person singular.
Figure 3. 
Latin conjugation groups
Figure 3 verifies that the usual conjugation nomenclature is reasonable. All three cI conjugations are close to each other, although cIa is a slight outlier. Conjugations cIIIa−d are very close to each other, but cIIIf (ferre) is significantly farther away. Conjugation cIIIs (esse) is farther still. Strangely, conjugation cIIIe (capere) is grouped with the cIV conjugations; apparently, i-stem third conjugation verbs have more in common with the fourth conjugation than with the third.

Generating a KATR Theory

We can generate a KATR theory directly from the paradigm of Table 2, along with the stem-referral rules of Table 3 and the lexicon of Table 4. We can take advantage of the grouping (Figure 3) to generate a fairly compact KATR theory. Figure 4 shows a fragment of the computed KATR theory.
The Help node introduces the three stems required by conjugation cIa. It refers all other requests to the CONJcIa node, which refers the remaining stems to the first stem. It also provisions the version of distillation e58 to have variant 1. It refers other requests to a chain of grouping nodes, here shown as Join12, Join15, Join17, and Join18, each of which provisions some distillations and hands off other requests to the next node in the chain. Finally, node Join18 refers all morphological queries to EXPAND. This node, of which we show only a small piece, combines the result of referring to nodes MPS1 through MPS92, for each one looking up the appropriate exponence. MPS1, which generates the present indicative active first person singular form, invokes node T02 with a parameter that depends on the value of the e1 distillation. Finally, the T02 node looks up the appropriate stem (in this case, stem 4) and combines it with the given ending.
Figure 4. 
Automatically generated KATR theory (fragment)
Table 9 shows some of the output that KATR generates for this theory. We use such output to verify that we have correctly captured the original paradigm in our chart of Table 2.
iuvō iuvās iuvat iuvāmus iuvātis iuvant
laudō laudās laudat laudāmus laudātis laudant
moneō monēs monet monēmus monētis monent
dūcō dūcis dūcit dūcimus dūcitis dūcunt
sum es est sumus estis sunt
Table 9. 
Output of automatically generated KATR theory (fragment)

Conclusions

This exercise demonstrates that both the realizational and the implicative approaches to defining language morphology lead to effective descriptions, as evidenced by the KATR theories they produce. An advantage of the realizational approach is that it allows us to apply language-specific knowledge and insight to create a default inheritance hierarchy that captures the morphological structure of the language, with slots pertaining to different morphosyntactic properties. However, as we have noted elsewhere [Finkel 2007b], writing KATR specifications requires considerable effort. Early choices color the structure of the resulting theory, and the author must often discard attempts and rethink how to represent the target morphology. We have built KATR theories for verbs in Hebrew, Slovak, Polish, Spanish, and Lingala (a Bantu language of the Congo), as well as for parts of Hungarian, Sanskrit, and Pali.
The implicative approach is much more automatic. One still needs to manually construct the initial paradigm, decide how many stems are needed (for Latin, we use five; for French, we use 15), and abstract as much information as possible into the templates for each MPS. After that, we can use automatic methods that reduce the paradigm chart to its essence, group conjugations, and generate an effective KATR theory. These steps take only a few seconds to complete (on a 1.8GHz Intel Pentium running Linux, only one second). It is a simple (albeit tedious) matter to verify that all the forms the KATR theory generates are accurate. The KATR theory itself is fairly compact, taking advantage of grouping. However, it is about twice the size of the hand-built theory (measured in characters). More important, it doesn't clearly delineate the slots of the exponences. It is therefore somewhat less satisfying, somewhat less informative, than the KATR theory we build manually following the realizational approach. We have applied the implicative approach to French (both as spelled and as pronounced), Hebrew, and Yiddish, as well as some lesser-known languages, such as Comaltepec Chinantec (Oto-Manguean, spoken in Oaxaca, Mexico), Fur (Nilo-Saharan, in Darfur), and Sora (Austro-Asiatic, India).
The implicative approach also has the advantage that it allows us to analyze the principal parts of the language based solely on the exponences in the paradigm chart. We have taken advantage of that ability elsewhere to characterize languages based on properties of their principal parts [Finkel 2007a]. For example, Latin, along with Sanskrit, but in strong contradistinction to Comaltepec Chinantec, has a very orthogonal set of principal parts: Each principal part tends to govern a disjoint set of MPSs.
As Latin developed into the Romance languages, its scheme of conjugations and principal parts evolved. Initial investigation of French, following the same implicative approach as that shown here for Latin, shows that 9 static principal parts are needed to distinguish the 67 distinct conjugations. Ignoring spelling and considering only pronunciation, we have been able to reduce this total to 7 principal parts distinguishing 35 conjugations. This investigation continues.

Acknowledgments

We would like to thank Lei Shen and Suresh Thesayi, who were instrumental in implementing our Java™ version of KATR. Nancy Snoke assisted in implementing our Perl/Prolog version.
This work was partially supported by the National Science Foundation under Grants IIS-0097278 and IIS-0325063 and by the University of Kentucky Center for Computational Science. Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.

Glossary

  • Cell: A position in a full table of word forms, where the row is the inflection class (such as conjugation 1), and the column represents a set of morphosyntactic properties (such as first person singular present indicative active).
  • Desinence: An inflectional ending, usually added to a stem according to its syntactic context. For example, amat “he/she loves” has the desinence -t.
  • Diacritic: A marker of a particular morphophonological property. For example, the fact that a verb is in conjugation 4 is a diacritic.
  • Exponence: The contents of a cell for a given lexeme, such as amat.
  • Morphophoneme: A phonological unit whose phonemic expression depends on its context. For example, in our Latin KATR theory, we use I as the phonological unit in conjugation 3 (i-stem) that is expressed as the phoneme i (as in capiō “I grab”), is expressed as the phoneme e (before r, as in capere “to grab”), or disappears entirely (such as before ī, as in cēpī “I have grabbed”).
  • Node: A set of rules in a KATR theory to which a query is directed. The particular rule to apply depends on the query. Some nodes refer to others, leading to a hierarchical node structure.
  • Sandhi: Rules of euphony or spelling. For example, ēō is pronounced eō, as in videō “I see,” and cs is spelled x, as in dūxī “I have led.”

Notes

[1] See the glossary for a definition of technical terms.

Works Cited

Anderson 1992 
Anderson, Stephen R. Amorphous Morphology. Cambridge: Cambridge University Press, 1992.
Blevins 2005 
Blevins, James P. “Word-Based Declensions in Estonian”. In Geert Booij and Jaap van Marle, eds., Yearbook of Morphology 2005. Dordrecht: Springer, 2005. pp. 1-25.
Blevins 2006 
Blevins, James P. “Word-based Morphology”. Journal of Linguistics 42 (2006), pp. 531-573.
Corbett 1993 
Corbett, Greville G., and Norman Fraser. “Network Morphology: A DATR Account of Russian Nominal Inflection”. Journal of Linguistics 29 (1993), pp. 113-142.
Evans 1989 
Evans, Roger, and Gerald Gazdar. “Inference in DATR”. Presented at ACL 1989. Proceedings of the Fourth Conference on the European Chapter of the Association for Computational Linguistics (1989), pp. 66-71.
Finkel 2002 
Finkel, Raphael, Lei Shen, Gregory Stump, and Suresh Thesayi. KATR: A Set-based Extension of DATR. Technical Report 346-02, University of Kentucky Department of Computer Science, Lexington, KY, 2002. ftp://ftp.cs.uky.edu/cs/techreports/346-02.pdf.
Finkel 2007a 
Finkel, Raphael, and Gregory Stump. “Principal Parts and Morphological Typology”. Morphology 17 (2007), pp. 39-75.
Finkel 2007b 
Finkel, Raphael, and Gregory Stump. “A Default Inheritance Hierarchy for Computing Hebrew Verb Morphology”. Literary and Linguistic Computing 22:2 (2007), pp. 117-136. dx.doi.org/10.1093/llc/fqm004.
Finkel 2009 
Finkel, Raphael, and Gregory Stump. “Principal Parts and Degrees of Paradigmatic Transparency”. In James P. Blevins and Juliette Blevins, eds., Analogy in Grammar: Form and Acquisition. Oxford: Oxford University Press, 2009.
Matthews 1972 
Matthews, P.H. Inflectional Morphology. Cambridge: Cambridge University Press, 1972.
Stump 2001 
Stump, Gregory. Inflectional Morphology: A Theory of Paradigm Structure. Cambridge: Cambridge University Press, 2001.
Wikipedia 2009 
Wikipedia. Huffman Coding. 2009. http://en.wikipedia.org/wiki/Huffman_coding.
Zwicky 1985 
Zwicky, Arnold M. “How to Describe Inflection”. Presented at BLS 1985. Proceedings of the Eleventh Annual Meeting of the Berkeley Linguistics Society (1985), pp. 372-386.