August 13, 2004

combined grammar update

We fixed a tiny error in our tree file. A node labeled t is in fact supposed to be labeled t2. Now rec works with our new combined grammar. Now there is an example of rec returning the tree with the proper states for "does Carol t2 think that Alice did see the man with a telescope".

CP gets the state '@' meaning that the tree is accepted.

CP @[0](~,~,CP CP[3](~,Cb Cb[0](~,that C[1](Ib Ib[0](~,did I[1](VP VP[0](~,see Vt[0](DP DP[0](~,~,DP DP[0](~,the D[0](man Nc[0](~,~,~),~,~),~)),~,~),VP VP[1](~,VP VP[0](PP PP[0](~,with P[0](DP DP[0](~,~,DP DP[0](~,a D[0](telescope Nc[0](~,~,~),~,~),~)),~,~),~),~,~),~)),~,~),IP IP[0](~,DP DP[1](Ib Ib[0](~,~,~),~,DP DP[0](~,Alice Np[0](~,~,~),~)),~)),~,~),Cb Cb[0](~,does I[1](Ib Ib[0](~,t2 It[1](VP VP[0](~,think Vb[1](Cb Cb[0](~,~,~),~,~),~),~,~),IP IP[0](~,DP DP[1](Ib Ib[0](~,~,~),~,DP DP[0](~,Carol Np[0](~,~,~),~)),~)),~,~),~)),~))

The fixed automaton:

3 0 ~
Alice Np ~
Carol Np ~
Bob Np ~
man Nc ~
see Vt ~
with P ~
telescope Nc ~
a D ~
the D ~
CP CP ~
D D ~
Cb Cb ~
C C ~
IP IP ~
I I ~
DP DP ~
N N ~
VP VP ~
PP PP ~
who DPwh ~
does I ~
like Vt ~
think Vb ~
t2 It ~
t DPt ~
Cb Cb ~
Ib Ib ~
VP VP ~
that C ~
CP CP ~
IP IP ~
Cb* Cb* ~
did I ~
does* I* ~
CP @ CP[3](~,Cb(~,C[1](Ib(~,I[1](VP(~,Vt(DP(~,~,~),~,~),~),~,~),~),~,~),~),~)
Ib Ib IP(~,DP[1](Ib(~,~,~),~,~),~)
DP DP DP(~,D(Nc,~,~),~)
DP DP DP(~,Np,~)
VP VP VP[1](~,VP(PP(~,P(DP(~,~,~),~,~),~),~,~),~)
DP DP DP(~,DP(PP(~,P(DP(~,~,~),~,~),~),~,~),~)
CP @ CP[2](~,DPwh[1](Cb(~,C[1](Ib(~,I[1](VP(~,Vt[1](DPt(~,~,~),~,~),~),~,~),~),~
,~),~),~,~),~)
Cb Cb Cb(~,I[1](Ib(~,It[1](VP(~,Vb[1](Cb(~,~,~),~,~),~),~,~),~),~,~),~)
Ib Ib IP[0](~,DP[1](Ib(~,~,~),~,~),~)
CP @ CP[2](~,DPwh[1](Cb*(~,I*[1](Ib(~,It[1](VP(~,Vt[1](DPt(~,~,~),~,~),~),~,~),~
),~,~),~),~,~),~)
CP @ Cb*[1](~,I*[1](Ib(~,It[1](VP(~,Vt[1](DP(~,~,~),~,~),~),~,~),~),~,~),~)

Posted by lemanal at 04:12 PM

Maketree

We've created a program called maketree that can be accessed at /clients/users/kernco/maketree that provides an interface to construct trees and output them to files. I have given universal read and execute access to it. This will hopefully help us create working tree files quicker. There are two things we haven't fixed yet. The first is that if you try to set the head of a node and give it an illegal value, an assertion will fail and the program will end. The second is that, while you can use showtree to graphically view the progress of your tree, it will end the program afterwards.

Posted by kernco at 04:08 PM

August 12, 2004

stippled trees

We combined the 3d grammar from wrapping of trees with a new grammar Jim gave us. This entailed rethinking a lot of the heads we had already set. They seem to be correct now, but we did not manage to get rec working on our first test tree. The tree's 1d yield is "does Carol t think Alice did see the man with the telescope". Other than that, we managed to get showtree to show parenthood in the classic tree representation sense with dotted lines.

combine.png

Posted by lemanal at 04:47 PM

August 11, 2004

An example grammar

We have a working grammar and two trees that represent the string "Alice did see the man with a telescope". In the parse trees shown below, the first structure implies that Alice used a telescope to see the man ("with a telescope" is an adverbial phrase of "did see'); while in the second parse tree, the structure implies that the man that Alice saw was carrying a telescope ("with a telescope" is an adjective phrase describing "the man").

example1.png

example2.png

Here are the trees shown in 2-dimensional form using our project algorithm.

example3.png

example4.png

The string yield of both trees is "Alice did see the man with a telescope."

Here is our paper with the few revisions we made today. This is most likely the last revision we will make this summer.

Posted by kernco at 02:11 PM

August 09, 2004

The Monkey Wears Rollerskates

We have a new version of the paper:
Representing Multidimensional Trees

Changes:
-Rewrote justification for Corollary to Lemma 2
-Changed reasoning for why the N3(d) formula is correct.
-Split Embed section into more subsections
-Modified proof of Lemma 3
-Added paragraph before last paragraph of Embed section
-Added justification for claim in last paragraph of Embed section
-Reworked the explanation of project.
-Added a section at the end regarding rec.

The excavate section needs work, but since we haven't actually implemented the function, I'm not sure I understand it enough to elaborate or make it any clearer.

Jim, is there a reason for the section "Embed" to still be called "Embed"? It seems like since we refer to the process as factoring throughout the paper, we should just change the name of the function and the section, and get rid of the term "embed" completely. What do you think?

Posted by kernco at 03:42 PM | Comments (1)

To Theory Group Members

For the fall semester, we are supposed to think about how the arbitrary dimensional automaton class will be implemented as a polymorphic data structure. We want the class to be able to adapt its size to the dimensionality, so it grows linearly with the dimension but does not need to be recompiled for every different dimension.

Remember that the automaton is stored as a vector of states and the root label and will return the new state. We want to be able to index into this in constant time. The vector is size equal to the dimensionality. It seems like the vector class itself would be sufficient to make the class polymorphic, however we may want to use a hash structure to eliminate wasted space.

Posted by kernco at 11:27 AM | Comments (1)

Goals

Here are our goals for the rest of the week:
Write up our tree recognizing algorithm in detail
Write up the question on polymorphism of the automaton class
Write up examples using Jim's new grammar
Small modification to paper

And, our long term goals:
Change automaton class to only use 2-branching trees
Change recognizer algorithm to embed into 2-branching and then excavate to n-branching trees
Change 3d CKY to return the parse forest.
Work on n-dimensional CKY algorithm
Continue working on project section of paper

Posted by lemanal at 11:25 AM

August 06, 2004

More Paper Revisions

Paper

We made changes to the Embed section. Lots of little fixes and worked on the proof for the theorem (was a lemma in the last version).

Posted by kernco at 03:13 PM

August 05, 2004

Paper modifications

Paper

Changed:
-Introductory section modified
-Ordering of Embed section changed
-Lemma 3 added
-Excavate moved to its own section
-Small fixes

Posted by kernco at 04:33 PM

Concerns about Embed and Rec

We have been considering how our tree recognizer would work with an embeded automaton. If we try to embed the tree before running the recognizer there will be important information lost about where each label in the input tree comes from in terms of the new unique labels from the automaton. It would be impossible to run rec on a tree that has not yet been embeded because the trees would not match. There may be some way to either build an embeded tree as we run rec and then excavate that or simply run rec without an embeded automaton.

In terms of storing the factored trees we attempted to think of ways to retain the information of how they were factored in the new unique labels. We could not come up wtih a satisfactory solution to this problem and thought that adding a pointer to each tree in the automaton which would point to its origin, the tree that was factored into it would make sense. We think the best way would be to create an inherited class from the tree class for automaton rules.

Posted by lemanal at 04:23 PM

Todo

Here are our goals for the next bit:

fix the introduction to the paper
fix the embed section
rewrite the automaton class hash function to only use 2-branching local trees.
modify embed to incorporate a notion of how a tree was factored in the trees it was factored into
write excavate function to rebuild n-branching structure from 2-branching trees

Posted by lemanal at 11:13 AM

August 04, 2004

prettyPrintnd

We have updated the crazy number of prettyPrint functions we have to include one that works with our current style of trees. This output does not conform to our readTree() function so it may not be used as input for our programs, but it makes it much easier for a human parser. We took precautions to try and make these trees not run off the edge of the screen too quickly. uglyPrint() CP @[0](~,~,CP CP[2](~,who DPwh[1](Cb Cb[0](~,that C[1](Y Y[0](~,does I[1](VP VP[0](~,like Vt[1](t DPt[0](~,~,~),~,~),~),~,~),IP IP[0](~,Carol DP[1](Ib Ib[0](~,~,~),~,~),~)),~,~),Cb Cb[1](~,does I[1](Y Y[0](~,t2 It[1](VP VP[0](~,think Vb[1](Cb Cb[0](~,~,~),~,~),~),~,~),IP IP[0](~,Alice DP[1](Ib Ib[0](~,~,~),~,~),~)),~,~),~)),~,~),~)) prettyPrint()
CP @
 (~,
  ~,
  CP CP
  | (~,
  |  who DPwh
  |  | (Cb Cb
  |  |  | (~,
  |  |  |  that C
  |  |  |  | (Y Y
  |  |  |  |  | (~,
  |  |  |  |  |  does I
  |  |  |  |  |  | (VP VP
  |  |  |  |  |  |  | (~,
  |  |  |  |  |  |  |  like Vt
  |  |  |  |  |  |  |  | (t DPt
  |  |  |  |  |  |  |  |  | (~,
  |  |  |  |  |  |  |  |  |  ~,
  |  |  |  |  |  |  |  |  |  ~),
  |  |  |  |  |  |  |  |  ~,
  |  |  |  |  |  |  |  |  ~),
  |  |  |  |  |  |  |  ~),
  |  |  |  |  |  |  ~,
  |  |  |  |  |  |  ~),
  |  |  |  |  |  IP IP
  |  |  |  |  |  | (~,
  |  |  |  |  |  |  Carol DP
  |  |  |  |  |  |  | (Ib Ib
  |  |  |  |  |  |  |  | (~,
  |  |  |  |  |  |  |  |  ~,
  |  |  |  |  |  |  |  |  ~),
  |  |  |  |  |  |  |  ~,
  |  |  |  |  |  |  |  ~),
  |  |  |  |  |  |  ~)
  |  |  |  |  |  ),
  |  |  |  |  ~,
  |  |  |  |  ~),
  |  |  |  Cb Cb
  |  |  |  | (~,
  |  |  |  |  does I
  |  |  |  |  | (Y Y
  |  |  |  |  |  | (~,
  |  |  |  |  |  |  t2 It
  |  |  |  |  |  |  | (VP VP
  |  |  |  |  |  |  |  | (~,
  |  |  |  |  |  |  |  |  think Vb
  |  |  |  |  |  |  |  |  | (Cb Cb
  |  |  |  |  |  |  |  |  |  | (~,
  |  |  |  |  |  |  |  |  |  |  ~,
  |  |  |  |  |  |  |  |  |  |  ~),
  |  |  |  |  |  |  |  |  |  ~,
  |  |  |  |  |  |  |  |  |  ~),
  |  |  |  |  |  |  |  |  ~),
  |  |  |  |  |  |  |  ~,
  |  |  |  |  |  |  |  ~),
  |  |  |  |  |  |  IP IP
  |  |  |  |  |  |  | (~,
  |  |  |  |  |  |  |  Alice DP
  |  |  |  |  |  |  |  | (Ib Ib
  |  |  |  |  |  |  |  |  | (~,
  |  |  |  |  |  |  |  |  |  ~,
  |  |  |  |  |  |  |  |  |  ~),
  |  |  |  |  |  |  |  |  ~,
  |  |  |  |  |  |  |  |  ~),
  |  |  |  |  |  |  |  ~)
  |  |  |  |  |  |  ),
  |  |  |  |  |  ~,
  |  |  |  |  |  ~),
  |  |  |  |  ~)
  |  |  |  ),
  |  |  ~,
  |  |  ~),
  |  ~)
  )
Posted by lemanal at 04:10 PM

Revisions and other news

Paper revision:
Version 0.9.2

Changes:
-Added example to the end of Headed Trees section
-Added section in Embed about complexity
-Added section after embed code

We are going to try to write a pretty print function for the flat form of trees, and get some screenshots using Ian's showtree of the parse trees we created from the Wrapping of Trees grammars.

Tomorrow we are going to go back to talking about CKY. We should review our inference rules and other previous posts about CKY, and also look at the inference rules in the Shieber, Shabes, Pereira Paper.

Posted by kernco at 02:39 PM

August 03, 2004

illegal links

We have been considering the number of new automaton rules that will be added depending on the number of links which are outside of the branching factor. In comparing full 3-dimensional n-branching local trees to 3d (n-1)-branching local trees we have come to some conclusions. We consider all links that are not in an (n-1)-branching local tree but are in a n-branching local tree illegal. We then considered the number of illegal links in each dimension. in a 3d local tree converting from 3-branching to 2-branching there are 3 illegal 2d links and 7 illegal 1d links. But, as we fix each 2d link we will fix a 1d link so after we fixed the three 2d links there will only be 4 1d links left to fix. The method of decreasing the branching factor of an automaton by embeding local trees into sets of local trees is only meant to fix one link per new rule, but in fact fixes one link in each dimension less than or equal to d. Taking this into account the number of new rules required turns out to be equal to the number of illegal 1d links.Asymptotically the number of rules grows at the same speed as the number of new nodes because the number of 1d links is a constant factor of the number of total links.

Posted by lemanal at 04:51 PM

Corrections

We were not looking at arbitrary branching trees correctly yersterday. The actual formula for the number of nodes in a 3-branching tree in d-dimensions is:
N3(d) = ( N3(d - 1) - 1 ) * N3(d - 1) + 2 = N3(d - 1)^2 - N3(d - 1) + 2

When solving this recurrence, the largest term becomes N3(d)^(2^d), so this is big-omega(2^(2^d)). Instead of being exponential, it is in fact hyper-exponential.

Posted by kernco at 03:41 PM

August 02, 2004

Paper revisions

Version 0.9.1

Changes:
-Rearranged some of the portions in the Embed section.
-Changed term "automaton rule" to "local tree" in the Embed section and indicated in the first paragraph that we are working exclusively with automaton rules.
-Reworded documentation of function headers in Embed code so that both the preconditions and postconditions are implicitly stated, rather than the preconditions being implicit and the postconditions explicit.

Posted by kernco at 05:01 PM

Asymptotic Growth of n-branching Local Trees

Through drawing many comlicated trees and counting nodes, we were eventually able to discover a pattern and come up with a formula for calculating the maximum number of nodes allowed in a d-dimensional n-branching local tree. The formula is recursive.

Notation:
Nodes(d, n) = The maximum number of nodes in a d-dimensional, n-branching local tree.

Base:
Nodes(2, n) = n + 1

Recursion:
Nodes(d, n) = ( (Nodes(d - 1, n) - 1) * (n - 1) ) + 2

The complexity of an n-branching tree as the dimension increases is O(x^(n-1)). We are more interested, however in the complexity of d-dimensional tree as n increases. We do not have proof using the recursive equation, since the recursion is done with n being held constant, but from looking at the chart we constructed, it appears that it is O(x^(d-1)). We will look at this more closely later.

The number of illegal links in a given local tree is equal to Nodes(d, n) - Nodes(d, 2). Nodes(d, 2) has no illegal links, since the notion of an illegal link is a link that makes the branching factor greater than two. In fact, the tree it represents contains the only possible legal links. Since any higher branching tree would also contain these links, we must subtract them from the number. Also, since we know there aren't any more legal links possible, we know that we don't need to subtract anything more.

Since every illegal link is turned into a new automaton rule, the size of the automaton using embedding would increase with the same complexity as the number of nodes of the n-branching tree. We did notice, however, that sometimes a new rule can eliminate more than one illegal link. In the maximally filled n-branching tree, removing an i-dimensional illegal link will remove i illegal links. This is because removing a 2-dimensional illegal link will move the 1-dimensional illegal link that was a part of that local tree into a legal position. With a 3-dimensional illegal link, the same thing happens with a 2- and 1-dimensional link in the local tree. Looking at the tree below, fixing the illegal link labeled 'A' will fix the link labeled 'a' legal as well. Fixing the link 'B' will fix both links labeled 'b' legal. However, we don't believe this decreases the asymptotic complexity. This is another thing we will have to look at more closely.

illegal.png

Here is a chart we made. Some of it was made by drawing the trees by hand. After we used it to come up with the formula, we filled the rest in with the formula.
chart.png

Posted by kernco at 04:45 PM

Modified trees from Wrapping of Trees

We modified our wrap1.aut based on the bridge verbs and subj-aux inversion automaton from the Wrapping of Trees paper. We attempted to make it more general and allow for a greater number of possible parsed sentences as well as adjusting the head for the 3-dimensional root. We came up with a couple new trees for testing which seemed to work.

Our trees now look like:

wrap1.tree
"Who does Alice think that Carol does like"
CP(~,~,CP[2](~,who[1](Cb(~,that[1](Y(~,does[1](VP(~,like[1](t,~,~),~),~,~),IP(~,Carol[1](Ib,~,~),~)),~,~),Cb[1](~,does[1](Y(~,t2[1](VP(~,think[1](Cb,~,~),~),~,~),IP(~,Alice[1](Ib,~,~),~)),~,~),~)),~,~),~))

wrap1.tree2
"does* Bob like Alice"
CP(~,~,Cb*[1](~,does*[1](Y(~,t2[1](VP(~,like[1](Alice(~,~,~),~,~),~),~,~),IP[0](~,Bob[1](Ib(~,~,~),~,~),~)),~,~),~)

wrap1.tree3
"Carol does like Alice"
CP(~,~,Y[0](~,does[1](VP(~,like[1](Alice(~,~,~),~,~),~),~,~),IP[0](~,Carol[1](Ib(~,~,~),~,~),~)))

And, here's the automaton file

wrap1.aut
3 0 ~
who DPwh ~
does I ~
like Vt ~
Alice DP ~
Carol DP ~
Bob DP ~
think Vb ~
t2 It ~
t DPt ~
Cb Cb ~
Ib Ib ~
VP VP ~
that C ~
CP CP ~
IP IP ~
Cb* Cb* ~
does* I* ~
CP @ CP[2](~,DPwh[1](Cb(~,C[1](Y(~,I[1](VP(~,Vt[1](DPt(~,~,~),~,~),~),~,~),~),~,~),~),~,~),~)
Cb Cb Cb[1](~,I[1](Y(~,It[1](VP(~,Vb[1](Cb(~,~,~),~,~),~),~,~),~),~,~),~)
Y Y IP[0](~,DP[1](Ib(~,~,~),~,~),~)
CP @ CP[2](~,DPwh[1](Cb*(~,I*[1](Y(~,It[1](VP(~,Vt[1](DPt(~,~,~),~,~),~),~,~),~),~,~),~),~,~),~)
CP @ Cb*[1](~,I*[1](Y(~,It[1](VP(~,Vt[1](DP(~,~,~),~,~),~),~,~),~),~,~),~)
CP @ Y[0](~,I[1](VP(~,Vt[1](DP(~,~,~),~,~),~),~,~),~)

Posted by lemanal at 04:23 PM