nlp4

Syntax of Various Sentence Forms

Auxiliary Verbs

The main auxiliary verbs in English are be and have, the modal verbs (do, can, could, may, will, might, should, must, need, and dare), and the form of be used in the passive (e.g., being seen). The ordering of the auxiliaries in a verb phrase is fairly simple:

(Modal) (Have) (Be)

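which means that each part is optional, but modals always precede Have and Be, and Have always precedes Be. Be covers both progressive be (is seeing) and passive be (was seen); when both occur, progressive be comes first (is being seen). Here are some examples, using see as the main verb: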

could have been seen

could have seen

can see

have seen

was seen
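As an illustration of the ordering constraint, here is a minimal Python sketch that checks whether a sequence of auxiliaries fills the simplified three-slot schema (Modal) (Have) (Be) in order. The word lists and the function name well_ordered are assumptions made for this sketch. Each slot may be used at most once, exactly as the schema is written, so a sequence with two forms of be (such as is being seen) falls outside it.

```python
# Word classes for the simplified (Modal) (Have) (Be) schema (assumed lists).
# "do" forms are grouped with the modals, as Allen's lexicon does (modal +).
MODALS = {"can", "could", "may", "might", "will", "would", "shall",
          "should", "must", "do", "does", "did"}
HAVE = {"have", "has", "had", "having"}
BE = {"be", "am", "is", "are", "was", "were", "been", "being"}

# Slot positions in the schema: Modal < Have < Be.
SLOT = {}
SLOT.update({w: 0 for w in MODALS})
SLOT.update({w: 1 for w in HAVE})
SLOT.update({w: 2 for w in BE})


def well_ordered(aux_words):
    """Return True if aux_words fill the slots (Modal) (Have) (Be) in order.

    Each slot is optional and may be filled at most once, so the slot
    numbers must be strictly increasing. Unknown words are rejected.
    """
    if any(w not in SLOT for w in aux_words):
        return False
    slots = [SLOT[w] for w in aux_words]
    return all(a < b for a, b in zip(slots, slots[1:]))


for seq in (["could", "have", "been"], ["can"], ["have", "could"]):
    print(" ".join(seq), "->", well_ordered(seq))
```

The check only looks at word order; agreement and the forms each auxiliary demands of the next word are handled by the COMPFORM feature described next.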

To enforce agreement in form among the auxiliaries and their main verb, the COMPFORM attribute is added to their entries in the lexicon and then checked by the grammar rules. Here is an example lexicon entry for the auxiliary verb could:

(could (aux (modal +) (root COULD1) (vform (? v pres past)) (agr ?a) (COMPFORM bare)))

This says that could is a modal; that it is itself either a present or a past tense form; that it agrees with a subject of any person and number (its AGR is the unconstrained variable ?a); and that its complement must be in the base (bare) form, as in could see or could have seen. A grammar rule that incorporates auxiliaries is illustrated below:

VP --> (AUX COMPFORM ?v) (VP VFORM ?v)

which is encoded as follows for the Allen interpreter:

((vp) -2> (head (aux (compform ?v))) (vp (vform ?v)))

This allows a verb phrase to be formed from an auxiliary verb (like could) followed by another verb phrase, provided that the COMPFORM attribute of the auxiliary matches the VFORM of that verb phrase (for example, bare in the verb phrases could see and could have seen).
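To make the matching concrete, the following Python sketch imitates rule -2> under stated assumptions: a constrained variable such as (? v pres past) is modelled as a frozenset of allowed values, an unconstrained AGR (?a) as the set of all agreement values, and the helper names unify_values and apply_aux_vp_rule are hypothetical. It is not the Allen parser's unification code, only an approximation of its effect on these features.

```python
# Unconstrained AGR (?a in Allen's notation) is modelled as the set of all values.
ALL_AGR = frozenset({"1s", "2s", "3s", "1p", "2p", "3p"})


def unify_values(a, b):
    """Combine two feature values; a frozenset stands for a constrained
    variable such as (? v pres past). Returns the shared value(s), or None
    if the two values are incompatible."""
    set_a = a if isinstance(a, frozenset) else frozenset([a])
    set_b = b if isinstance(b, frozenset) else frozenset([b])
    common = set_a & set_b
    if not common:
        return None
    # A single surviving alternative becomes a plain atom again.
    return next(iter(common)) if len(common) == 1 else common


def apply_aux_vp_rule(aux, vp):
    """Sketch of ((vp) -2> (head (aux (compform ?v))) (vp (vform ?v))).

    The aux's COMPFORM must unify with the complement VP's VFORM. Because
    VFORM and AGR are head features of vp, the new VP copies them from
    the head aux. Returns the new VP's features, or None on failure."""
    if unify_values(aux["compform"], vp["vform"]) is None:
        return None
    return {"vform": aux["vform"], "agr": aux["agr"]}


# The lexicon entry for could, transcribed into this representation.
COULD = {"vform": frozenset({"pres", "past"}), "agr": ALL_AGR, "compform": "bare"}

print(apply_aux_vp_rule(COULD, {"vform": "bare"}))     # "could see"
print(apply_aux_vp_rule(COULD, {"vform": "pastprt"}))  # "could seen": fails
```

The resulting VP for could see keeps both tense alternatives, which is why rule -1> (which accepts pres or past) can later use it as a sentence's verb phrase.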

More examples of how auxiliary verbs are coded in the lexicon and used in the grammar appear in Allen, pp. 123-127. Here is a listing of the lexicon for auxiliary verbs, given as the file chapt5 in the jallen/Parser1.1 directory.

(setq *lexicon5-2*
'((can (aux (modal +) (root CAN1) (vform pres) (agr ?a) (COMPFORM bare)) can1)
    (could (aux (modal +) (root COULD1) (vform (? v pres past)) (agr ?a)
           (COMPFORM bare)))
    (do (aux (modal +) (root DO1) (vform pres) (agr (? a 1s 2s 1p 2p 3p))
        (COMPFORM bare)))
    (does (aux (modal +) (root DO1) (vform pres) (agr 3s) (COMPFORM bare)))
    (did (aux (modal +) (root DO1) (vform past) (agr ?a) (COMPFORM bare)))
    (have (aux (vform bare) (root HAVE-AUX) (COMPFORM pastprt)))
    (have (aux (vform pres) (root HAVE-AUX) (agr (? a 1s 2s 1p 2p 3p))
          (COMPFORM pastprt)))
    (has (aux (vform pres) (root HAVE-AUX) (agr 3s) (COMPFORM pastprt)))
    (had (aux (vform past) (root HAVE-AUX) (agr ?a) (COMPFORM pastprt)))
    (having (aux (vform ing) (root HAVE-AUX) (COMPFORM pastprt)))
    (be (aux (root BE-AUX) (VFORM bare) (COMPFORM -)))
    (is (aux (root BE-AUX) (VFORM pres) (COMPFORM -) (AGR 3s)))
    (am (aux (root BE-AUX) (VFORM pres) (COMPFORM -) (AGR 1s)))
    (are (aux (root BE-AUX) (VFORM pres) (COMPFORM -) (AGR (? a 2s 1p 2p 3p))))
    (was (aux (root BE-AUX) (VFORM past) (AGR (? a 1s 3s)) (COMPFORM -)))
    (were (aux (root BE-AUX) (VFORM past) (AGR (? a 2s 1p 2p 3p))
          (COMPFORM -)))
    (been (aux (root BE-AUX) (VFORM pastprt) (COMPFORM -)))
    (being (aux (root BE-AUX) (VFORM ing) (COMPFORM -)))))
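The lexicon entries above can be read as a chain of constraints: each auxiliary's COMPFORM must match the VFORM of the word that follows it, and the first auxiliary must be finite and agree with the subject. The Python sketch below checks such chains for a small subset of *lexicon5-2*. The dictionary layout, the treatment of a missing AGR as unconstrained, and the use of None for the "-" COMPFORM of BE-AUX are assumptions of this sketch; the complements of be are left to grammar rules 3 to 5, so the chain simply stops there.

```python
ALL = frozenset({"1s", "2s", "3s", "1p", "2p", "3p"})
NON_3S = ALL - {"3s"}
FINITE = {"pres", "past"}  # rule -1> requires the sentence's VP to be pres or past

# A small subset of *lexicon5-2*. A word may have several entries (e.g. have).
# Unspecified AGR is treated as unconstrained (ALL); compform None stands for
# the "-" of the BE-AUX entries, whose complements are handled by rules 3-5.
LEXICON = {
    "could": [{"vform": {"pres", "past"}, "agr": ALL, "compform": "bare"}],
    "does":  [{"vform": {"pres"}, "agr": {"3s"}, "compform": "bare"}],
    "have":  [{"vform": {"bare"}, "agr": ALL, "compform": "pastprt"},
              {"vform": {"pres"}, "agr": NON_3S, "compform": "pastprt"}],
    "has":   [{"vform": {"pres"}, "agr": {"3s"}, "compform": "pastprt"}],
    "been":  [{"vform": {"pastprt"}, "agr": ALL, "compform": None}],
    "is":    [{"vform": {"pres"}, "agr": {"3s"}, "compform": None}],
}


def aux_chain_ok(words, subject_agr):
    """Return True if words is a valid auxiliary sequence for a subject
    whose AGR value is subject_agr (e.g. "3s")."""

    def step(i, required):
        if i == len(words):
            return True
        for entry in LEXICON.get(words[i], []):
            if i == 0:
                # The first auxiliary is finite and agrees with the subject.
                if subject_agr not in entry["agr"] or not (entry["vform"] & FINITE):
                    continue
            elif required is None or required not in entry["vform"]:
                # Later auxiliaries must have the VFORM the previous one requires.
                continue
            if step(i + 1, entry["compform"]):
                return True
        return False

    return bool(words) and step(0, None)
```

For example, aux_chain_ok(["have", "been"], "1p") succeeds by choosing the present-tense entry for have, while aux_chain_ok(["has", "been"], "1p") fails on agreement.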

Auxiliaries: a Prolog View

Here is another treatment of auxiliaries, from a Prolog point of view. This will be useful later when we discuss the semantics of sentences using a Prolog style of representation. This discussion comes from Pereira, p 114ff.

In this discussion, a finite verb is a tensed form, and a finite verb phrase is one that can serve as the complete verb phrase of a sentence, like "halts," "halted," "writes a program," "is halting," or "has been halting." A nonfinite verb is the verb's base form, such as "halt." We also distinguish infinitival forms, like "to halt," present participles, like "halting," and past participles, like "halted." Now the lexicon for verbs can be encoded in the following way:

iv(Form) --> [IV], {iv(IV, Form)}.
iv(halts, finite).
iv(halt, nonfinite).
iv(halting, present_participle).
iv(halted, past_participle).

aux(Form) --> [Aux], {aux(Aux, Form)}.
aux(could, finite/nonfinite).
aux(have, nonfinite/past_participle).
aux(has, finite/past_participle).
aux(been, past_participle/present_participle).
aux(be, nonfinite/present_participle).

Now verb phrases can be formed with or without auxiliaries, using the following rules:

vp(Form) --> iv(Form).
vp(Form) --> tv(Form), np.
vp(Form) --> aux(Form/Require), vp(Require).

The above rules cover various kinds of verb phrases. For instance, the auxiliary verb been is a past participle and combines with a present participle, and have is nonfinite and takes a past participle when used as an auxiliary.

Thus, "have been halting" is a verb phrase built by the auxiliary rule with Form = nonfinite (have) and Require = past_participle (been halting). Taking the next step, the verb phrase "been halting" is parsed by the same rule, with Form = past_participle (been) and Require = present_participle (halting). Finally, the verb phrase "halting" is an intransitive verb (using the first rule) with Form = present_participle.
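The recursion just described can be traced with a direct Python transcription of the two relevant Prolog rules. This is a sketch rather than Prolog: the dictionaries copy the iv/2 and aux/2 facts above, the function name parse_vp is hypothetical, and the transitive rule is omitted because tv and np are not defined in this fragment.

```python
# Copies of the iv/2 and aux/2 facts: aux maps a word to (Form, Require).
IV = {"halts": "finite", "halt": "nonfinite",
      "halting": "present_participle", "halted": "past_participle"}
AUX = {"could": ("finite", "nonfinite"),
       "have": ("nonfinite", "past_participle"),
       "has": ("finite", "past_participle"),
       "been": ("past_participle", "present_participle"),
       "be": ("nonfinite", "present_participle")}


def parse_vp(words, form):
    """Return a derivation tree (nested tuples) if words is a vp(form), else None.

    Mirrors vp(Form) --> iv(Form) and vp(Form) --> aux(Form/Require), vp(Require).
    """
    if not words:
        return None
    first, rest = words[0], words[1:]
    # vp(Form) --> iv(Form).
    if not rest and IV.get(first) == form:
        return ("vp", form, ("iv", first))
    # vp(Form) --> aux(Form/Require), vp(Require).
    if first in AUX and AUX[first][0] == form:
        sub = parse_vp(rest, AUX[first][1])
        if sub is not None:
            return ("vp", form, ("aux", first), sub)
    return None


print(parse_vp(["have", "been", "halting"], "nonfinite"))
```

Each level of the returned tree records the Form that the enclosing auxiliary required, which is the same sequence nonfinite, past_participle, present_participle traced in the paragraph above.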

Passives

Most verbs that take an NP complement (i.e., transitive verbs) also allow the passive form. For instance, the verb see, used in the active voice in the sentence

Jack can see the dog.

also appears in the passive in sentences like the following:

The dog was seen.

To account for passive forms, grammars use the notion of a "passive gap", which is simply a placeholder for the object that would normally follow a transitive verb. To explain passive verb phrases like the one above, the grammar is augmented with rules like the following:

VP[+pass] --> AUX[be] VP[pastprt, +main, +passgap]
VP[+passgap, +main] --> V[_np]

That is, a verb phrase that is passive can be formed using the auxiliary be, followed by a verb phrase whose main verb is a past participle and then a passive gap. A verb phrase that is a main verb and has a passive gap can be any transitive verb (i.e., any verb that has the feature _np). These two rules are encoded as follows:

((vp (PASS +)) -5>
(head (aux (root BE-AUX))) (vp (vform pastprt) (MAIN +) (PASSGAP +)))
((vp (PASSGAP +) (MAIN +)) -8>
(head (v (subcat _np))))
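A rough Python rendering of rules 5 and 8 shows how the passive gap works: a main-verb VP with a passive gap is just a transitive verb with nothing after it, and a passive VP is be followed by such a VP in past-participle form. The mini-lexicon is hypothetical (it lists seen as a single word, whereas Allen's lexicon derives it from see plus +EN), and the function names are assumptions of this sketch.

```python
# Hypothetical mini-lexicon: category, root, VFORM and (for verbs) SUBCAT.
LEX = {
    "was":   {"cat": "aux", "root": "BE-AUX", "vform": "past"},
    "is":    {"cat": "aux", "root": "BE-AUX", "vform": "pres"},
    "seen":  {"cat": "v", "root": "SEE1", "vform": "pastprt", "subcat": "_np"},
    "slept": {"cat": "v", "root": "SLEEP1", "vform": "pastprt", "subcat": "_none"},
}


def vp_main_passgap(words):
    """Rule 8: VP[+passgap, +main] --> V[_np].

    The object NP is missing (the passive gap), so the VP is a single
    transitive verb. Returns the VP's VFORM, or None if the rule fails."""
    if len(words) == 1:
        entry = LEX.get(words[0])
        if entry and entry["cat"] == "v" and entry["subcat"] == "_np":
            return entry["vform"]
    return None


def vp_passive(words):
    """Rule 5: VP[+pass] --> AUX[be] VP[pastprt, +main, +passgap].

    Returns the features of the passive VP (VFORM comes from the head aux),
    or None if the words do not form a passive VP."""
    if len(words) < 2:
        return None
    head = LEX.get(words[0])
    if head and head["cat"] == "aux" and head["root"] == "BE-AUX":
        if vp_main_passgap(words[1:]) == "pastprt":
            return {"pass": "+", "vform": head["vform"]}
    return None


print(vp_passive(["was", "seen"]))
print(vp_passive(["was", "slept"]))  # intransitive: no passive gap possible
```

The intransitive slept is rejected because rule 8 only licenses a passive gap after a verb with SUBCAT _np.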

Here is an encoding of the complete grammar shown in Figure 5.3 (page 127) of Allen, including rules like the ones discussed above. This grammar defines several different verb phrase structures, each allowing different combinations of auxiliary verbs and passive forms.

(setq *grammar5-3*
      '((headfeatures
         (s vform agr)
         (vp vform agr)
         (np agr))
        ((s (inv -))
         -1>
         (np (agr ?a)) (head (vp (vform (? v pres past)) (agr ?a))))
        ((vp)
         -2>
         (head (aux (compform ?v))) (vp (vform ?v)))
        ((vp)
         -3>
         (head (aux (root BE-AUX))) (vp (vform ing) (MAIN +)))
        ((vp)
         -4>
         (head (aux (root BE-AUX))) (vp (vform ing) (PASS +)))
        ((vp (PASS +))
         -5>
         (head (aux (root BE-AUX))) (vp (vform pastprt) (MAIN +) (PASSGAP +)))
        ((vp (PASSGAP -) (MAIN +))
         -6>
         (head (v (subcat _none))))
        ((vp (PASSGAP -) (MAIN +))
         -7>
         (head (v (subcat _np))) (np))
        ((vp (PASSGAP +) (MAIN +))
         -8>
         (head (v (subcat _np))))
        ((np)
         -9>
         (art (agr ?a)) (head (n (agr ?a))))
        ((np)
         -10>
         (head (name)))
        ((np)
         -11>
         (head (pro)))))

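Below is the parser's output for the sentence "The dog was seen", in which seen is split into see and the suffix +EN, so the input runs from position 0 to 5. The rule names in this trace (-5-8-1>, -5-8-7>, NP-GAP-INTRO, and so on) appear to come from the gap grammar discussed later in this section rather than from grammar 5.3, and no single S spans positions 0 to 5. Instead, the trace lists the constituents that were found: was is analyzed as a transitive verb followed by an empty NP gap (S180, from 0 to 3), and seen as a past-participle VP followed by a second gap (VP195, from 3 to 5). The intended passive structure, built by rules 5 and 8 of grammar 5.3, is shown as a simplified tree in Figure 5.4 of Allen (page 128).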

REL181:<REL ((GAP -) (1 S180))> from 0 to 3 from rule -R5>
S180:<S ((GAP <NP ((SEM ?SEM177) (AGR ?AGR176))>) (WH -) (INV -)
           (VFORM PAST) (AGR 3S) (1 NP175)
           (2 VP179))> from 0 to 3 from rule -5-8-1>
    NP175:<NP ((GAP -) (WH -) (AGR 3S) (1 DET173)
               (2 CNP174))> from 0 to 2 from rule -5-7-2>
      DET173:<DET ((GAP -) (AGR 3S)
                   (1 ART167))> from 0 to 1 from rule -5-7-5>
        ART167:<ART ((LEX THE) (ROOT THE1)
                     (AGR (? A5 3P 3S)))> from 0 to 1 from rule NIL
      CNP174:<CNP ((GAP -) (AGR 3S)
                   (1 N168))> from 1 to 2 from rule -5-7-3>
        N168:<N ((LEX DOG) (ROOT DOG1)
                 (AGR 3S))> from 1 to 2 from rule NIL
    VP179:<VP ((GAP <NP ((SEM ?SEM177) (AGR ?AGR176))>) (VFORM PAST)
               (AGR 3S) (1 V170)
               (2 GAP178))> from 2 to 3 from rule -5-8-7>
      V170:<V ((LEX WAS) (ROOT BE1) (VFORM PAST) (AGR (? A7 3S 1S))
               (SUBCAT _NP))> from 2 to 3 from rule NIL
      GAP178:<NP ((EMPTY +) (GAP <NP ((SEM ?SEM177) (AGR ?AGR176))>)
                  (SEM ?SEM177)
                  (AGR ?AGR176))> from 3 to 3 from rule NP-GAP-INTRO
V191:<V ((VFORM PASTPRT) (ROOT SEE1) (SUBCAT _NP) (1 V171)
         (2 +EN172))> from 3 to 5 from rule -LEX5>
V171:<V ((LEX SEE) (ROOT SEE1) (VFORM BARE) (SUBCAT _NP)
           (IRREG-PAST +) (EN-PASTPRT +))> from 3 to 4 from rule NIL
+EN172:<+EN ((LEX +EN))> from 4 to 5 from rule NIL
REL196:<REL ((GAP -) (1 VP195))> from 3 to 5 from rule -R6>
VP195:<VP ((GAP <NP ((SEM ?SEM193) (AGR ?AGR192))>) (VFORM PASTPRT)
             (AGR -) (1 V191) (2 GAP194))> from 3 to 5 from rule -5-8-7>
    V191:<V ((VFORM PASTPRT) (ROOT SEE1) (SUBCAT _NP) (1 V171)
             (2 +EN172))> from 3 to 5 from rule -LEX5>
      V171:<V ((LEX SEE) (ROOT SEE1) (VFORM BARE) (SUBCAT _NP)
               (IRREG-PAST +) (EN-PASTPRT +))> from 3 to 4 from rule NIL
      +EN172:<+EN ((LEX +EN))> from 4 to 5 from rule NIL
    GAP194:<NP ((EMPTY +) (GAP <NP ((SEM ?SEM193) (AGR ?AGR192))>)
                (SEM ?SEM193)
                (AGR ?AGR192))> from 5 to 5 from rule NP-GAP-INTRO

Gaps and Movement in Sentences

So far we have looked at simple declarative sentences. Effective grammars also need to handle a variety of other sentential forms in which some part of the simple declarative form has moved to a new position. Here are four types of movement identified by Allen (attributed to Baker, 1989):

wh-movement: move a wh-term to the beginning of a sentence to form a wh-question. E.g., "Which dogs did he see?"
Topicalization: move a constituent to the beginning of a sentence for emphasis. E.g., "That dog he never liked."
Adverb preposing: move an adverb to the beginning of a sentence. E.g., "Tomorrow, he will see the dog."
Extraposition: move certain NP complements to the end of the sentence. E.g., "A book was written about evolution" (from "A book about evolution was written").

Handling Questions and Relative Clauses

The notion of a "gap" is more general than its use in handling passive forms, and it is also very useful for these kinds of sentences. Let's look at how gaps can be used to handle movement in certain kinds of questions, like the following:

Which dogs did he see?

Here, the gap follows the transitive verb "see", and the noun phrase "Which dogs" is called the "filler" for the gap (that is, the phrase that licenses the missing object after "see" and supplies its meaning). Wh-words such as which, who, what, and where are also used to introduce relative clauses, as in

The dogs which he saw returned.

So the coding of words like which in the lexicon must allow these different uses. The feature WH is used for this purpose. Here is an encoding of the word which in the lexicon that distinguishes its use in a question from its use in a relative clause (see Allen Figure 5.6, page 135 for more discussion of these examples).

(which (qdet (WH q) (root WHICH) (agr (? a 3s 3p))))
(which (pro (WH r) (root WHICH) (agr (? a 3s 3p))))

The first encoding says that which can be used to introduce wh-questions, and the second says that it can be used to introduce relative clauses. Some corresponding grammar rules that can be used with the first of these two uses are as follows (the entire grammar is given in the file lisp/jallen/Parser1.1/Grams/chap5):

   ((s) -5-8-3>
     (np (wh q) (gap -) (agr ?a))
     (head (s (inv +) (gap (% np (agr ?a))))))
   ((s (inv +) (wh ?w) (gap ?g)) -5-8-2>
     (head (aux (compform ?s) (agr ?a)))
     (np (wh ?w) (agr ?a) (gap -))
     (vp (vform ?s) (gap ?g)))
   ((np (wh ?w)) -5-7-2>
     (det (wh ?w) (agr ?a)) (head (cnp (agr ?a))))
   ((det (wh ?w)) -5-7-7>
     (head (qdet (wh ?w))))

The first rule says that a sentence can be constructed from a noun phrase of the WH variety (e.g., "which dogs") followed by an inverted sentence (e.g., "did he see") that contains an NP gap agreeing with that noun phrase. The second rule defines an inverted sentence as an auxiliary, a subject NP that agrees with it, and a verb phrase whose VFORM matches the auxiliary's COMPFORM and which carries the gap. The third and fourth rules describe the structure of a WH noun phrase: a determiner of the qdet variety (e.g., which) followed by a common noun phrase, or CNP ("dogs"). Agreement is enforced in the appropriate places, and the gap ends up in its expected location, following the transitive verb "see".
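The interaction of these four rules can be approximated in Python for sentences of the shape "which N AUX PRONOUN VERB". This is a hand-written sketch, not the chart parser: the mini-lexicon and function names are hypothetical, the inverted sentence is fixed at three words, and the gap is represented implicitly by the missing object after the transitive verb, with its AGR taken from the filler.

```python
# Hypothetical mini-lexicon for this sketch.
ALL = {"1s", "2s", "3s", "1p", "2p", "3p"}
QDET = {"which": {"wh": "q", "agr": {"3s", "3p"}}}
NOUNS = {"dog": "3s", "dogs": "3p"}
PRONOUNS = {"he": "3s", "they": "3p"}
AUXES = {"did": {"agr": ALL, "compform": "bare"},
         "does": {"agr": {"3s"}, "compform": "bare"}}
TVERBS = {"see": "bare", "saw": "past"}  # transitive verbs and their VFORM


def wh_np(words):
    """-5-7-2> and -5-7-7>: NP[wh q] --> DET(QDET) CNP, with agreement.
    Returns the NP's AGR, or None."""
    if len(words) == 2 and words[0] in QDET and words[1] in NOUNS:
        agr = NOUNS[words[1]]
        if agr in QDET[words[0]]["agr"]:
            return agr
    return None


def inverted_s_with_gap(words):
    """-5-8-2>: S[+inv] --> AUX NP VP[gap], where the VP is a transitive
    verb whose object NP is the gap. The subject must agree with the aux,
    and the verb's VFORM must match the aux's COMPFORM."""
    if len(words) != 3:
        return False
    aux, subj, verb = words
    if aux not in AUXES or subj not in PRONOUNS or verb not in TVERBS:
        return False
    agrees = PRONOUNS[subj] in AUXES[aux]["agr"]
    return agrees and TVERBS[verb] == AUXES[aux]["compform"]


def wh_question(words):
    """-5-8-3>: S --> NP[wh q, agr a] S[+inv, gap NP[agr a]].
    Returns the AGR shared by the filler and its gap, or None."""
    for split in range(1, len(words)):
        agr = wh_np(words[:split])
        if agr is not None and inverted_s_with_gap(words[split:]):
            return agr
    return None


print(wh_question("which dogs did he see".split()))
```

The returned value is the AGR that the filler passes to its gap, matching (AGR 3P) on GAP215 in the parse below.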

A full parse of the sentence "Which dogs did he see" appears below. It corresponds to the chart parse shown and discussed on page 141 of Allen.

S218:<S ((VFORM PAST) (AGR 3S) (1 NP210)
         (2 S217))> from 0 to 6 from rule -5-8-3>
NP210:<NP ((GAP -) (WH Q) (AGR 3P) (1 DET204)
             (2 CNP209))> from 0 to 3 from rule -5-7-2>
    DET204:<DET ((GAP -) (WH Q) (AGR 3P)
                 (1 QDET198))> from 0 to 1 from rule -5-7-7>
      QDET198:<QDET ((LEX WHICH) (WH Q) (ROOT WHICH)
                     (AGR (? A23 3P 3S)))> from 0 to 1 from rule NIL
    CNP209:<CNP ((GAP -) (AGR 3P)
                 (1 N208))> from 1 to 3 from rule -5-7-3>
      N208:<N ((AGR 3P) (ROOT DOG1) (1 N199)
               (2 +S200))> from 1 to 3 from rule -LEX7>
        N199:<N ((LEX DOG) (ROOT DOG1)
                 (AGR 3S))> from 1 to 2 from rule NIL
        +S200:<+S ((LEX +S))> from 2 to 3 from rule NIL
S217:<S ((GAP <NP ((SEM ?SEM214) (AGR 3P))>) (WH -) (INV +)
           (VFORM PAST) (AGR 3S) (1 AUX201) (2 NP211)
           (3 VP216))> from 3 to 6 from rule -5-8-2>
    AUX201:<AUX ((LEX DID) (MODAL +) (ROOT DO1) (VFORM PAST) (AGR 3S)
                 (COMPFORM BARE))> from 3 to 4 from rule NIL
    NP211:<NP ((GAP -) (WH -) (POSS -) (AGR 3S)
               (1 PRO202))> from 4 to 5 from rule -5-7-1>
      PRO202:<PRO ((LEX HE) (ROOT HE1)
                   (AGR 3S))> from 4 to 5 from rule NIL
    VP216:<VP ((GAP <NP ((SEM ?SEM214) (AGR 3P))>) (VFORM BARE) (AGR -)
               (1 V203) (2 GAP215))> from 5 to 6 from rule -5-8-7>
      V203:<V ((LEX SEE) (ROOT SEE1) (VFORM BARE) (SUBCAT _NP)
               (IRREG-PAST +) (EN-PASTPRT +))> from 5 to 6 from rule NIL
      GAP215:<NP ((EMPTY +) (GAP <NP ((SEM ?SEM214) (AGR 3P))>)
                  (SEM ?SEM214)
                  (AGR 3P))> from 6 to 6 from rule NP-GAP-INTRO

Handling Gaps and Questions in Prolog

Prolog provides similar support for representing questions and other filler-gap constructions. The simplest case is subject-auxiliary inversion, which forms a yes-no question. For example, the sentence "the program could halt" can be turned into the question "could the program halt" by inverting the subject and the auxiliary verb could. This is captured by the following grammar rule (continuing the grammar begun in the previous Prolog discussion).

sinv --> aux(finite/Required), np, vp(Required).

That is, an inverted sentence consists of a finite auxiliary verb, followed by a noun phrase and then a verb phrase in the form (Required) that the auxiliary demands.
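A small Python transcription of this rule, under simplifying assumptions, shows both constraints at work: the auxiliary's Form must be finite, and the verb phrase must have the Required form. The noun phrase is restricted here to a determiner plus a noun and the verb phrase to a single intransitive verb; these restrictions, the mini-lexicon, and the function names are assumptions of the sketch, not part of the Prolog grammar.

```python
# Hypothetical mini-lexicon; AUX maps a word to (Form, Required) as in aux/2.
AUX = {"could": ("finite", "nonfinite"),
       "has": ("finite", "past_participle"),
       "have": ("nonfinite", "past_participle")}
IV = {"halt": "nonfinite", "halts": "finite", "halted": "past_participle"}
DET = {"the", "a"}
NOUN = {"program"}


def np(words):
    """Simplified np: a determiner followed by a noun."""
    return len(words) == 2 and words[0] in DET and words[1] in NOUN


def vp(words, form):
    """Simplified vp(Form): a single intransitive verb of the given form."""
    return len(words) == 1 and IV.get(words[0]) == form


def sinv(words):
    """sinv --> aux(finite/Required), np, vp(Required)."""
    if not words or words[0] not in AUX:
        return False
    form, required = AUX[words[0]]
    if form != "finite":
        return False
    rest = words[1:]
    # Try every split of the remaining words into np + vp.
    return any(np(rest[:k]) and vp(rest[k:], required)
               for k in range(1, len(rest)))


print(sinv("could the program halt".split()))
```

The question "have the program halted" is rejected because have is listed as nonfinite, so it cannot occupy the finite auxiliary position.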

Recall that a gap is part of a phrase missing from its usual location, and a filler is another phrase that stands for the missing one. For instance, in "terry read every book that bertrand wrote", the filler is "that" and the gap occurs after the verb "wrote", which normally takes a noun phrase as an object. In Prolog, a gap is realized by omitting a noun phrase:

np(gap(np)) --> [].

Now a verb phrase that admits a gap can be formed from a transitive verb and a possibly-missing noun phrase:

vp(GapInfo) --> tv, np(GapInfo).
s(GapInfo) --> np(nogap), vp(GapInfo).
rel --> relpron, s(gap(np)).

That is, a relative clause is a relative pronoun followed by a sentence with a gap, as in "that bertrand wrote."
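The way GapInfo is threaded from s down to np can be mimicked in Python by writing each nonterminal as a generator that yields the positions where it could end, which is roughly what Prolog's backtracking over the DCG does. The lexicon, the string encoding of gap(np) and nogap, and the function names are assumptions of this sketch; np(nogap) is restricted to proper names.

```python
# Hypothetical lexicon for this sketch.
NAMES = {"terry", "bertrand"}
TV = {"read", "wrote"}
RELPRON = {"that"}


def np(words, i, gap):
    """np(gap(np)) --> [].  np(nogap) is simplified to a single name.
    Yields every position where the np can end."""
    if gap == "gap(np)":
        yield i  # the gap consumes no words
    elif i < len(words) and words[i] in NAMES:
        yield i + 1


def vp(words, i, gap):
    """vp(GapInfo) --> tv, np(GapInfo)."""
    if i < len(words) and words[i] in TV:
        yield from np(words, i + 1, gap)


def s(words, i, gap):
    """s(GapInfo) --> np(nogap), vp(GapInfo)."""
    for j in np(words, i, "nogap"):
        yield from vp(words, j, gap)


def rel(words, i):
    """rel --> relpron, s(gap(np))."""
    if i < len(words) and words[i] in RELPRON:
        yield from s(words, i + 1, "gap(np)")


def is_rel(sentence):
    """True if the whole sentence is a relative clause."""
    words = sentence.split()
    return any(j == len(words) for j in rel(words, 0))


print(is_rel("that bertrand wrote"))
```

Because the gap information says the object is missing, "that bertrand wrote terry" is rejected: the leftover word terry cannot be consumed.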

Wh-questions can be handled in Prolog using similar strategies. Questions like "who loves mary" and "who does mary love" are handled using the following rules, respectively:

q --> whpron, vp(nogap).
q --> whpron, sinv(gap(np)).
sinv(GapInfo) --> aux, np(nogap), vp(GapInfo).
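The two question rules can be sketched in the same generator style, again as an assumption-laden Python approximation of the DCG rather than Prolog itself. The lexicon and function names are hypothetical, np(nogap) is limited to names, and, like the rules above, the sketch does not check whether the verb is finite or nonfinite.

```python
# Hypothetical lexicon for this sketch.
NAMES = {"mary", "john"}
TV = {"loves", "love"}
WHPRON = {"who"}
AUX = {"does"}


def np(words, i, gap):
    """np(gap(np)) --> [].  np(nogap) is simplified to a single name."""
    if gap == "gap(np)":
        yield i
    elif i < len(words) and words[i] in NAMES:
        yield i + 1


def vp(words, i, gap):
    """vp(GapInfo) --> tv, np(GapInfo)."""
    if i < len(words) and words[i] in TV:
        yield from np(words, i + 1, gap)


def sinv(words, i, gap):
    """sinv(GapInfo) --> aux, np(nogap), vp(GapInfo)."""
    if i < len(words) and words[i] in AUX:
        for j in np(words, i + 1, "nogap"):
            yield from vp(words, j, gap)


def q(words, i):
    if i < len(words) and words[i] in WHPRON:
        yield from vp(words, i + 1, "nogap")      # q --> whpron, vp(nogap).
        yield from sinv(words, i + 1, "gap(np)")  # q --> whpron, sinv(gap(np)).


def is_question(sentence):
    words = sentence.split()
    return any(j == len(words) for j in q(words, 0))


print(is_question("who loves mary"), is_question("who does mary love"))
```

In "who loves mary" the wh-word fills the subject position, so no gap is needed; in "who does mary love" it fills the object position, so the gap is passed through sinv down to the object np.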

How good are NLP systems in practice?

(Alshawi, 5) The CLE (Core Language Engine, 1992) is a comprehensive NLP system for English. Its analysis proceeds in four stages: lexical analysis, morphology, syntactic analysis, and semantic analysis. Additional disambiguation and contextual interpretation are carried out in other phases: sortal filtering, quantifier scoping, reference and ellipsis resolution, and plausibility checking.

CLE performance in 1992: a sample of 1000 sentences was taken at random from the Lancaster-Oslo-Bergen (LOB) corpus of printed British English. Of these, 634 were analyzed successfully by the CLE, that is, parsed and assigned at least one logical form. Of those 634, 67% were estimated to have valid meaning representations, which is roughly 425 sentences, or about 42% of the original sample.

Exercises

Consider the Prolog rules for auxiliary verbs. Show how the verb phrase "could have been halting" is parsed using these rules.
Parse the question "who does mary love" using the Prolog grammars for questions and gaps that are given in this section.
Augment the Prolog grammatical rules with sufficient lexical and syntactic definitions that will allow them to be equivalent to the Lisp-based grammar discussed in this section. Run your grammar with the sentence "which dogs did he see" and similar sentences to check that your augmented grammar is correct.

References

Allen, Chapter 5
Matthews, Chapter 11
Alshawi, Hiyan, The Core Language Engine, SRI International, 1992.