distinct substrings

Let's talk about algorithms!

Moderator: Board moderators

distinct substrings

Postby sohel » Mon Jan 09, 2006 3:37 pm

Is there any easy and efficient way of finding the total number of distinct substrings in a given string of length 1000.
User avatar
sohel
Guru
 
Posts: 862
Joined: Thu Jan 30, 2003 5:50 am
Location: University of Texas at San Antonio

Postby misof » Mon Jan 09, 2006 5:28 pm

Ah, bytecode :D

There is both an easy way and an efficient way :)

First, in O(N^2) you can create a trie containing all suffixes, the number of different substrings == the number of its nodes.

The answer can also be computed from a suffix array (that can still quite easily be constructed in O(N log N) ) with longest common prefixes for subsequent elements of the array.

The optimal way is to construct a suffix tree in O(N) and deduce the answer from this tree. (The suffix tree is basically just a compressed version of the trie from the first solution.)
User avatar
misof
A great helper
 
Posts: 430
Joined: Wed Jun 09, 2004 1:31 pm

Postby 7erry » Mon Jun 18, 2007 6:45 pm

Please tell me how to do this in O(n^2)
I think inserting a substring costs O(n). With n^2 substrings we need O(n^3) to complete constructing trie, right?
7erry
New poster
 
Posts: 6
Joined: Wed Feb 21, 2007 7:40 pm

Postby mf » Tue Jun 19, 2007 12:46 am

You should insert only the suffixes.
n suffixes * O(n) insertion time = O(n^2).
mf
Guru
 
Posts: 1244
Joined: Mon Feb 28, 2005 4:51 am
Location: Zürich, Switzerland

Postby 7erry » Tue Jun 19, 2007 10:51 am

This is my program which got WA in spoj, please help me...
http://www.spoj.pl/problems/DISUBSTR/
Code: Select all
const
        lowc=char(ord(32));
        highc=char(ord(127));
        maxs=1000;
type
        trie=^trienode;
        trienode=array[lowc..highc] of trie;
var
        s:array[1..maxs] of char;
        k,i,j,ss:integer;
        a:trie;
        res:longint;
procedure readf;
 var
        c:char;
 begin
   new(a);
   for c:=lowc to highc do a^[c]:=nil;
   read(c);
   while (ord(c)<32) or (ord(c)>127) do read(c);
   ss:=0;
   while (7=7) do
    begin
      inc(ss);
      s[ss]:=c;
      read(c);
      if (ord(c)<32) or (ord(c)>127) then break;
    end;
 end;
procedure check(x,y:integer);
 var
        i:integer;
        p,q:trie;
        c:char;
 begin
   p:=a;
   for i:=x to y do
    if p^[s[x]]=nil then
     begin
       new(q);
       for c:=lowc to highc do q^[c]:=nil;
       p^[s[x]]:=q;
       p:=q;
       inc(res);
     end else p:=p^[s[x]];
 end;
BEGIN
  read(k);
  while k>0 do
   begin
     dec(k);
     readf;
     res:=0;
     for i:=1 to ss do check(i,ss);
     writeln(res);
   end;
END.
7erry
New poster
 
Posts: 6
Joined: Wed Feb 21, 2007 7:40 pm

Postby temper_3243 » Sun Jan 06, 2008 7:01 pm

Can you please tell me which n*(n+1)/2 - sigma of (LCP) gives the answer.

Say the alphabets are unique then there are n*(n+1)/2 but how does subtracting LCP from that give the answer. Can someone take the pains to explain me

For abcd (len1 = n ) , (len2 = n-1 ) , len3 = n-2 ,so sigma = n*n+1/2

Code: Select all
abcd= (n) + (n-1) + (n-2) + ...

abcd
bcd
cd
d



abab = 1(abab) + 2(aba,bab) + (3-1) { ab,ba,ab} + {4 -2} (a,b} =7


ab
abab(2)
b (0)
bab (1)

 
temper_3243
Experienced poster
 
Posts: 105
Joined: Wed May 25, 2005 7:23 am


Return to Algorithms

Who is online

Users browsing this forum: No registered users and 0 guests