## 10679 - I love Strings!!!

Moderator: Board moderators

### 10679 - I love Strings!!!

I was trying to solve this problem with the naive solution (using strstr).
It gives TLE.
I've created a huge test file and the naive solution solves that huge test file very fast.
So... has the judge a lot of test files ?
I've few motivation to implement a better algorithm when I can't see the naive is slow.

By the way, if strstr is well implemented, a better solution is hard to implement considering there are only 100 queries in a single string, isn't it ?
erdos
New poster

Posts: 32
Joined: Sat Jul 06, 2002 9:38 pm
Location: Lisbon

### Re: 10679

All naive solution have complexity O(n*m) in worst case,
where n - lenght of base string, and m - length of substring.
What for this problem n=100000 and m=1000, O(n*m)=O(100 000 000) multiple it by number of queries and by number of test cases you get
VERY BIG complexity.
Use O(n+m) algo. For example, Knuth-Morris-Pratt.

-best,
Pavel
rotoZOOM
Learning poster

Posts: 63
Joined: Mon Dec 22, 2003 5:05 am
Location: Russia, Chelyabinsk

Is there no other ways to do so other than KMP?

cytmike
Learning poster

Posts: 95
Joined: Mon Apr 26, 2004 1:23 pm
Location: Hong Kong and United States

cytmike wrote:Is there no other ways to do so other than KMP?

I suppose you may use hash function.
But you have to select very good hash-function.
rotoZOOM
Learning poster

Posts: 63
Joined: Mon Dec 22, 2003 5:05 am
Location: Russia, Chelyabinsk

This topic is well-researched. I ended up using Turbo Boyer-Moore. I thought it was slightly unforunate that it became a "guess the algorithm" contest, because my first submission was Ukkonen's algorithm with a suffix tree, but even that got TLE.. (during the contest..)

I'm curious as to what kind of algorithm people used..
Larry
Guru

Posts: 647
Joined: Wed Jun 26, 2002 10:12 pm
Location: Hong Kong and New York City

Nice problem for KMP.
"Everything should be made simple, but not always simpler"
anupam
A great helper

Posts: 405
Joined: Wed Aug 28, 2002 6:45 pm

Another nice algorithm for this problem is Aho-Corasick. But the tricky thing is that the input seems to contain in some queries duplicate patterns, so the tree has to store the same pattern more than once.
Guru

Posts: 724
Joined: Wed Dec 19, 2001 2:00 am
Location: Germany

"Everything should be made simple, but not always simpler"
anupam
A great helper

Posts: 405
Joined: Wed Aug 28, 2002 6:45 pm

All patterns are stored in one tree. Then failure links are created (that have same meaning as in KMP). And then for each node, the first failure link that points to a pattern is recorded as "output link". This is necessary because some pattern may be sub-pattern of a bigger pattern and would not be found otherwise.
Example:
2 patterns: string, in
the tree looks like this:
Code: Select all
`______|     |s(1) i(2)|     |t(3) n(4)|r(5)|i(6)|n(7)|g(8)`

I numbered the nodes in bfs order. This order is necessary to build the failure links.
or if this is not a node in the tree, it is the failure link of node pointed to by failure link of parent + last read character ... until either a node in the tree is found or the root is reached.
So in the example, there is a failure link from 6 to 2, from 7 to 4, and from all other nodes to the root.
When we are matching a text, it is sufficient to go through the text once.
For each character in the string, we try to follow an edge in the tree. If there is no such edge, follow a failure link (like in KMP). For each reached node, follow the output link in order not to miss a string to output.
Lets search in the string "strings".
We follow the edges in the tree 0->1, 1->3, 3->5, 5->6, 6->7
now we reach output link to 4 and see that we matched the pattern "in".
then we continue with 7 -> 8. Now we have matched the pattern "string".
and then we have to take failure link, because there is no edge labeled 's'.

I think you will also find some information in the internet if you use google.
Guru

Posts: 724
Joined: Wed Dec 19, 2001 2:00 am
Location: Germany

i tried rabin karp by hashing the strings modulo some prime numbers.
what ever prime number i tried, i got TLE, and now i have the code Ac in 4.5 secs.
could some one tell me how to efficiently chose the prime in rabin karp

bye
abi

abishek
Experienced poster

Posts: 131
Joined: Mon Dec 15, 2003 5:41 am

Uhh..our team solved this during the contest by, guess what, REVERSING the test strings, and then using strstr..the reversing process killed the tricks in the input, according to my teammate.
mido
Learning poster

Posts: 78
Joined: Sun Jun 16, 2002 9:48 pm
Location: Cairo,Egypt

### poor data set

I applied Boyer's Moore and got AC in 0.806 seconds..

.. but after reversing and using strstr took only 0.189 s

I am waiting for a rejudge eventhough I am currently ranked 2 in this problem.

sohel
Guru

Posts: 862
Joined: Thu Jan 30, 2003 5:50 am
Location: University of Texas at San Antonio

### book

If you can, check out 'Algorithms on Trees and Sequences' -
Aho-Corasick is described there in detail.
pminkov
New poster

Posts: 8
Joined: Wed Oct 17, 2001 2:00 am
Location: Bulgaria, Sofia

### Re: poor data set

sohel wrote:I applied Boyer's Moore and got AC in 0.806 seconds..

.. but after reversing and using strstr took only 0.189 s

I am waiting for a rejudge eventhough I am currently ranked 2 in this problem.
Well, I don't think that the "reversing" technique is so bad after all. It is not easy to come up with this idea during contest time! One has to be really smart to think of this.

Anyway, it would be great if I could learn the advanced algorithms you guys mentioned above. It takes time, of course
7th Contest of Newbies
Date: December 31st, 2011 (Saturday)
Time: 12:00 - 16:00 (UTC)
URL: http://uva.onlinejudge.org
Observer
Guru

Posts: 570
Joined: Sat May 10, 2003 4:20 am
Location: Hong Kong

### odd

Something very interesting :

After reversing the strings and using strstr() took .187 seconds...

.. later I thought of a better idea.. ie reverse then apply Boyer Moore.
.. but that gave TLE..

So it seems that for some cases strstr is better than Boyer Moore.

.. can any one tell me the reason why I get TLE with the modified method.

sohel
Guru

Posts: 862
Joined: Thu Jan 30, 2003 5:50 am
Location: University of Texas at San Antonio

Next