We present a unified and systematic approach to basic principles of Arbology, a new algorithmic discipline focusing on algorithms on trees. Stringology, a highly developed algorithmic discipline in the area of string processing, can use finite automata as its basic model of computation. For various kinds of linear notations of ranked and unranked ordered trees it holds that subtrees of a tree in a linear notation are substrings of the tree in the linear notation. Arbology uses pushdown automata reading such linear notations of trees as its basic model of computation. Basic principles known from stringology are used for the construction of particular arbology algorithms, in which the underlying tree structure is processed with the use of the pushdown store. Arbology results are shown for the basic problems subtree matching and tree indexing for ranked and unranked ordered trees.
This paper introduces and discusses a modification of pushdown automata. This modification is based on two-sided pushdowns into which symbols are pushed from both ends. These pushdowns are defined over free groups, not free monoids, and they can be shortened only by the standard group reduction. We demonstrate that these automata characterize the family of recursively enumerable languages even if the free groups are generated by no more than four symbols.
A new kind of a deterministic pushdown automaton, called a \emph{Tree Compression Automaton}, is presented. The tree compression automaton represents a complete compressed index of a set of trees for subtrees and accepts all subtrees of given trees. The algorithm for constructing our pushdown automaton is incremental. For a single tree with n nodes, the automaton has at most n+1 states, its transition function cardinality is at most 4n and there are 2n+1 pushdown store symbols. If hashing is used for storing automaton's transitions, thus removing a factor of logn, the construction of the automaton takes linear time and space with respect to the length n of the input tree(s). Our pushdown automaton construction can also be used for finding all subtree repeats without augmenting the overall complexity.