## distinct substrings

Moderator: Board moderators

### distinct substrings

Is there any easy and efficient way of finding the total number of distinct substrings in a given string of length 1000.

sohel
Guru

Posts: 862
Joined: Thu Jan 30, 2003 5:50 am
Location: University of Texas at San Antonio

Ah, bytecode

There is both an easy way and an efficient way

First, in O(N^2) you can create a trie containing all suffixes, the number of different substrings == the number of its nodes.

The answer can also be computed from a suffix array (that can still quite easily be constructed in O(N log N) ) with longest common prefixes for subsequent elements of the array.

The optimal way is to construct a suffix tree in O(N) and deduce the answer from this tree. (The suffix tree is basically just a compressed version of the trie from the first solution.)

misof
A great helper

Posts: 430
Joined: Wed Jun 09, 2004 1:31 pm

Please tell me how to do this in O(n^2)
I think inserting a substring costs O(n). With n^2 substrings we need O(n^3) to complete constructing trie, right?
7erry
New poster

Posts: 6
Joined: Wed Feb 21, 2007 7:40 pm

You should insert only the suffixes.
n suffixes * O(n) insertion time = O(n^2).
mf
Guru

Posts: 1244
Joined: Mon Feb 28, 2005 4:51 am
Location: Zürich, Switzerland

http://www.spoj.pl/problems/DISUBSTR/
Code: Select all
`const        lowc=char(ord(32));        highc=char(ord(127));        maxs=1000;type        trie=^trienode;        trienode=array[lowc..highc] of trie;var        s:array[1..maxs] of char;        k,i,j,ss:integer;        a:trie;        res:longint;procedure readf; var        c:char; begin   new(a);   for c:=lowc to highc do a^[c]:=nil;   read(c);   while (ord(c)<32) or (ord(c)>127) do read(c);   ss:=0;   while (7=7) do    begin      inc(ss);      s[ss]:=c;      read(c);      if (ord(c)<32) or (ord(c)>127) then break;    end; end;procedure check(x,y:integer); var        i:integer;        p,q:trie;        c:char; begin   p:=a;   for i:=x to y do    if p^[s[x]]=nil then     begin       new(q);       for c:=lowc to highc do q^[c]:=nil;       p^[s[x]]:=q;       p:=q;       inc(res);     end else p:=p^[s[x]]; end;BEGIN  read(k);  while k>0 do   begin     dec(k);     readf;     res:=0;     for i:=1 to ss do check(i,ss);     writeln(res);   end;END.`
7erry
New poster

Posts: 6
Joined: Wed Feb 21, 2007 7:40 pm

Can you please tell me which n*(n+1)/2 - sigma of (LCP) gives the answer.

Say the alphabets are unique then there are n*(n+1)/2 but how does subtracting LCP from that give the answer. Can someone take the pains to explain me

For abcd (len1 = n ) , (len2 = n-1 ) , len3 = n-2 ,so sigma = n*n+1/2

Code: Select all
`abcd= (n) + (n-1) + (n-2) + ...abcdbcdcddabab = 1(abab) + 2(aba,bab) + (3-1) { ab,ba,ab} + {4 -2} (a,b} =7ab abab(2)b (0)bab (1) `
temper_3243
Experienced poster

Posts: 105
Joined: Wed May 25, 2005 7:23 am