Many of these requirements can be ensured by adding so-called "features" to the lexicon, and then augmenting the grammar rules so that the features (like number and person) play a role in discriminating between correct and incorrect sentences.
For example, "The man sees the fish." is correct, but "The man see the fish." is not.
Alongside number is the person feature of noun and verb phrases. Each of the above sentences is in the third person, since its subject is neither the speaker nor the hearer. When the subject of the sentence is the speaker, it is expressed in the first person, as in "I see the fish." When the subject is the hearer, it is expressed in the second person, as in "You see the fish."
Noun phrase-verb phrase agreement takes person into account as well as number: "I see" and "you see" are correct, while "I sees" is not.
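The following Prolog definite clause grammar (DCG) uses the number feature to enforce agreement:
% Sentence: the subject NP and the VP must agree in number.
s(s(NP,VP)) --> np(NP, Number), vp(VP, Number), terminator.
% Noun phrase: the determiner and noun (or the pronoun) fix the NP's number.
np(np(D,N), Number) --> det(D, Number), n(N, Number).
np(np(P), Number) --> pro(P, Number).
% Verb phrase: number comes from the verb; the object NP may have any number.
vp(vp(IV), Number) --> iv(IV, Number).
vp(vp(TV,NP), Number) --> tv(TV, Number), np(NP, _).
% Lexicon: s = singular, p = plural, _ = either (don't care).
det(det(W), Number) --> [W], {det(W, Number)}.
det(the, _).
det(a, s).
n(n(W), Number) --> [W], {n(W, Number)}.
n(dog, s).
n(fish, _).
n(man, s). n(men, p).
n(saw, s).
pro(pro(W), Number) --> [W], {pro(W, Number)}.
pro(he, s).
iv(iv(W), Number) --> [W], {iv(W, Number)}.
iv(cries, s). iv(cry, p).
tv(tv(W), Number) --> [W], {tv(W, Number)}.
tv(sees, s). tv(see, p).
tv(wants, s). tv(want, p).
tv(was, s). tv(were, p).
% A sentence ends with a period, question mark, or exclamation point.
terminator --> ['.'] ; ['?'] ; ['!'].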
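The DCG can be run directly in Prolog, for example with ?- phrase(s(Tree), [the, man, sees, the, fish, '.']). For comparison, here is a minimal Python sketch that mirrors the same rules and lexicon. It assumes whitespace-separated tokens and uses None in place of the anonymous variable _; the helper names unify, noun_phrase, verb_phrase, and parse are hypothetical. Generators enumerate alternative analyses, roughly the way Prolog backtracks through alternative clauses.

```python
# Number values: "s" (singular), "p" (plural), or None for "don't care",
# which plays the role of the anonymous variable _ in the Prolog lexicon.
DET = {"the": None, "a": "s"}
N = {"dog": "s", "fish": None, "man": "s", "men": "p", "saw": "s"}
PRO = {"he": "s"}
IV = {"cries": "s", "cry": "p"}
TV = {"sees": "s", "see": "p", "wants": "s", "want": "p",
      "was": "s", "were": "p"}
TERMINATORS = {".", "?", "!"}


def unify(a, b):
    """Combine two number values; return (ok, value). None matches anything."""
    if a is None:
        return True, b
    if b is None or a == b:
        return True, a
    return False, None


def noun_phrase(words, i):
    """Yield (tree, number, next_index) for each noun phrase starting at i."""
    if i < len(words) and words[i] in PRO:
        yield ("np", ("pro", words[i])), PRO[words[i]], i + 1
    if i + 1 < len(words) and words[i] in DET and words[i + 1] in N:
        # Determiner and noun must agree, as in det(D, Number), n(N, Number).
        ok, num = unify(DET[words[i]], N[words[i + 1]])
        if ok:
            yield ("np", ("det", words[i]), ("n", words[i + 1])), num, i + 2


def verb_phrase(words, i):
    """Yield (tree, number, next_index) for each verb phrase starting at i."""
    if i < len(words) and words[i] in IV:
        yield ("vp", ("iv", words[i])), IV[words[i]], i + 1
    if i < len(words) and words[i] in TV:
        # The object's number is ignored, like np(NP, _) in the DCG.
        for obj, _obj_num, j in noun_phrase(words, i + 1):
            yield ("vp", ("tv", words[i]), obj), TV[words[i]], j


def parse(sentence):
    """Return all parse trees for a whitespace-tokenized sentence."""
    words = sentence.split()
    trees = []
    for subj, n1, i in noun_phrase(words, 0):
        for pred, n2, j in verb_phrase(words, i):
            ok, _ = unify(n1, n2)  # subject and VP share one Number
            if ok and j == len(words) - 1 and words[j] in TERMINATORS:
                trees.append(("s", subj, pred))
    return trees


print(bool(parse("the man sees the fish .")))  # True
print(bool(parse("the man see the fish .")))   # False
```

Because "the" and "fish" carry None, "the fish cry ." and "the fish cries ." are both accepted, while "a fish cry ." is rejected: "a" forces the NP to be singular.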
This grammar has some interesting characteristics. First, the number feature is prominently carried in rules where it is needed to enforce number agreement among nouns, verbs, determiners, noun phrases, verb phrases, and sentences. Recall the Prolog convention that several occurrences of a variable, like Number, within a single rule must all instantiate to the same value (s or p in this case) whenever that rule is used in a parse.
Second, some words in the lexicon (e.g., "the" and "fish") can be associated with both singular and plural number; this is indicated by an _ (don't care) number entry for those words in the lexicon.
Third, the grammar does not take person into account, as a realistic grammar of English would have to. This extension is left as an exercise.
Different constructions that follow the main verb in a sentence are sometimes defined using "verb subcategorization," as in the following (Allen, p. 88):
We can also add the idea of grouping prepositional phrases into three general classes:
And we can see the following additional verb subcategorizations:
> (load "LOADP")
> (loadChapter4)
(dog (n (root DOG1) (agr 3s)))
(saw (v (root SEE1) (VFORM past) (subcat _np) (agr ?a)))
(NP AGR ?a) --> (ART AGR ?a) (N AGR ?a)
((np) -2> (art (agr ?a)) (head (n (agr ?a))))
(setq *lexicon4-6*
  '((a (art (agr 3s) (root A1)))
    (be (v (root BE1) (vform bare) (subcat (? s _adjp _np))
           (irreg-pres +) (irreg-past +)))
    (cry (v (root CRY1) (vform bare) (subcat _none)))
    (dog (n (root DOG1) (agr 3s)))
    (fish (n (root FISH1) (agr (? a 3s 3p)) (IRREG-PL +)))
    (happy (adj (subcat _vp-inf) (root HAPPY1)))
    (he (pro (root HE1) (AGR 3s)))
    (is (v (root BE1) (VFORM pres) (SUBCAT (? s _adjp _np)) (AGR 3s)))
    (Jack (name (agr 3s) (root JACK1)))
    (man (n (root MAN1) (agr 3s)))
    (men (n (root MAN1) (agr 3p)))
    (saw (n (root SAW1) (agr 3s)))
    (saw (v (root SAW2) (vform bare) (subcat _np)))
    (saw (v (root SEE1) (VFORM past) (subcat _np) (agr ?a)))
    (see (v (root SEE1) (VFORM bare) (subcat _np)
            (irreg-past +) (en-pastprt +)))
    (seed (n (root SEED1) (AGR 3s)))
    (the (art (root THE1) (agr (? a 3s 3p))))
    (to (to (vform inf)))
    (want (v (root WANT1) (VFORM bare)
             (subcat (? s _np _vp-inf _np_vp-inf))))
    (was (v (root BE1) (VFORM past) (AGR (? a 1s 3s))
            (SUBCAT (? s _adjp _np))))
    (were (v (root BE1) (VFORM past) (AGR (? a 2s 1p 2p 3p))
             (SUBCAT (? s _adjp _np))))
    (+s (+S))
    (+ed (+ED))
    (+en (+EN))
    (+ing (+ING))))
Here is a complete listing of the encoded grammar.
(setq *grammar4-7*
  '((headfeatures (s agr) (vp vform agr) (np agr))
    ((s (inv -))
     -1>
     (np (agr ?a)) (head (vp (vform (? v past pres)) (agr ?a))))
    ((np)
     -2>
     (art (agr ?a)) (head (n (agr ?a))))
    ((np)
     -3>
     (head (pro)))
    ((vp)
     -4>
     (head (v (subcat _none))))
    ((vp)
     -5>
     (head (v (subcat _np))) (np))
    ((vp)
     -6>
     (head (v (subcat _vp-inf))) (vp (vform inf)))
    ((vp)
     -7>
     (head (v (subcat _np_vp-inf))) (np) (vp (vform inf)))
    ((vp)
     -8>
     (head (v (subcat _adjp))) (adjp))
    ((vp (vform inf))
     -9>
     (head (to)) (vp (vform bare)))
    ((adjp)
     -10>
     (head (adj)))
    ((adjp)
     -11>
     (head (adj (subcat _vp-inf))) (vp (vform inf)))))
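Rules -4> through -8> select the constituents that follow the head verb according to its subcat feature, so a verb such as want, whose subcat is (? s _np _vp-inf _np_vp-inf), can head any of rules -5>, -6>, and -7>. Below is a minimal Python sketch of that selection, assuming complements are reduced to plain category labels and ignoring all other features. SUBCAT_FRAMES, VERB_SUBCAT, licenses, and the "vp(inf)" label are hypothetical.

```python
# Complements that each subcat value calls for, read off rules -4> to -8>.
# The labels are hypothetical shorthand: "vp(inf)" stands for (vp (vform inf)).
SUBCAT_FRAMES = {
    "_none": [],
    "_np": ["np"],
    "_vp-inf": ["vp(inf)"],
    "_np_vp-inf": ["np", "vp(inf)"],
    "_adjp": ["adjp"],
}

# Subcat values from *lexicon4-6*; a restricted variable (? s ...) becomes a set.
VERB_SUBCAT = {
    "cry": {"_none"},
    "see": {"_np"},
    "want": {"_np", "_vp-inf", "_np_vp-inf"},
    "be": {"_adjp", "_np"},
}


def licenses(verb, complements):
    """True if some subcat value of the verb calls for exactly these complements."""
    return any(SUBCAT_FRAMES[sub] == list(complements)
               for sub in VERB_SUBCAT[verb])


print(licenses("want", ["vp(inf)"]))        # True: "wants to be happy"
print(licenses("want", ["np", "vp(inf)"]))  # True: "wants the man to cry"
print(licenses("cry", ["np"]))              # False: "cry" takes no object
```

In the grammar itself this check happens implicitly: a VP rule can only apply when the head verb's subcat value unifies with the value that rule names.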
Below is a trace of the parse of the sentence "He wants to be happy." shown in Figure 4.9, using this grammar and lexicon. It is obtained by the Lisp function call (BU-parse '(he want +s to be happy)) followed by the function call (show-answers).
S57:<S ((INV -) (AGR 3S) (1 NP50) (2 VP56))> from 0 to 6 from rule -1>
  NP50:<NP ((AGR 3S) (1 PRO44))> from 0 to 1 from rule -3>
    PRO44:<PRO ((LEX HE) (ROOT HE1) (AGR 3S))> from 0 to 1 from rule NIL
  VP56:<VP ((VFORM PRES) (AGR 3S) (1 V52) (2 VP55))> from 1 to 6 from rule -6>
    V52:<V ((AGR 3S) (VFORM PRES) (ROOT WANT1) (SUBCAT _VP-INF) (1 V45) (2 +S46))> from 1 to 3 from rule -LEX1>
      V45:<V ((LEX WANT) (ROOT WANT1) (VFORM BARE) (SUBCAT (? S6 _NP_VP-INF _VP-INF _NP)))> from 1 to 2 from rule NIL
      +S46:<+S ((LEX +S))> from 2 to 3 from rule NIL
    VP55:<VP ((VFORM INF) (AGR -) (1 TO47) (2 VP54))> from 3 to 6 from rule -9>
      TO47:<TO ((LEX TO) (VFORM INF))> from 3 to 4 from rule NIL
      VP54:<VP ((VFORM BARE) (AGR -) (1 V48) (2 ADJP53))> from 4 to 6 from rule -8>
        V48:<V ((LEX BE) (ROOT BE1) (VFORM BARE) (SUBCAT _ADJP) (IRREG-PRES +) (IRREG-PAST +))> from 4 to 5 from rule NIL
        ADJP53:<ADJP ((1 ADJ49))> from 5 to 6 from rule -10>
          ADJ49:<ADJ ((LEX HAPPY) (SUBCAT _VP-INF) (ROOT HAPPY1))> from 5 to 6 from rule NIL
This trace should be compared with the parse tree shown in Figure 4.9 on page 97 of Allen. Different levels of indentation here correspond to different levels of subtree in that figure. It is a bit tedious to unravel, but a close examination of the indentation reveals the structure of the parse tree itself: each node names its immediate constituents as (1 ...) (2 ...), and those constituents appear indented beneath it. Every node carries all of its derived feature values, along with the identifier of the grammar rule that produced it (NIL for lexical entries taken directly from the lexicon).