All patterns are stored in one tree. Then failure links are created (that have same meaning as in KMP). And then for each node, the first failure link that points to a pattern is recorded as "output link". This is necessary because some pattern may be sub-pattern of a bigger pattern and would not be found otherwise.
Example:
2 patterns: string, in
the tree looks like this:
- Code: Select all
______
| |
s(1) i(2)
| |
t(3) n(4)
|
r(5)
|
i(6)
|
n(7)
|
g(8)
I numbered the nodes in bfs order. This order is necessary to build the failure links.
Failure link of some node is failure link of parent + last read character,
or if this is not a node in the tree, it is the failure link of node pointed to by failure link of parent + last read character ... until either a node in the tree is found or the root is reached.
So in the example, there is a failure link from 6 to 2, from 7 to 4, and from all other nodes to the root.
When we are matching a text, it is sufficient to go through the text once.
For each character in the string, we try to follow an edge in the tree. If there is no such edge, follow a failure link (like in KMP). For each reached node, follow the output link in order not to miss a string to output.
Lets search in the string "strings".
We follow the edges in the tree 0->1, 1->3, 3->5, 5->6, 6->7
now we reach output link to 4 and see that we matched the pattern "in".
then we continue with 7 -> 8. Now we have matched the pattern "string".
and then we have to take failure link, because there is no edge labeled 's'.
I think you will also find some information in the internet if you use google.