permutations again

Write here if you have problems with your C source code

Moderator: Board moderators

permutations again

Postby WR » Fri Aug 13, 2004 11:47 am

Hi,

I've found lots of posts telling how to generate permutations of a string but none about determining whether a number (with let's say 10 digits) is a permutation of another number.

Right now I convert those numbers to strings, sort them and then strcmp them. It works and is reasonable fast, but is there a better algorithm?
WR
Experienced poster
 
Posts: 145
Joined: Thu Nov 27, 2003 9:46 am

Postby little joey » Fri Aug 13, 2004 12:05 pm

For such short string (10 digits) I think it's as fast as it can get. For longer strings (say >100 digits), you could calculate two histograms and compare them. That method is O(N), while the sorting method is O(N*logN) (N is number of digits in a string).
User avatar
little joey
Guru
 
Posts: 1080
Joined: Thu Dec 19, 2002 7:37 pm

Postby WR » Fri Aug 13, 2004 1:01 pm

Thanks a lot for the quick response!
WR
Experienced poster
 
Posts: 145
Joined: Thu Nov 27, 2003 9:46 am

Postby little joey » Sun Aug 15, 2004 1:25 pm

Hmm. I was wrong in my previous post. The complexity of the sorting method is of course dependant on the complexity of the sorting function. In this case the O(N) bucket sort can be used, so the sorting method can be also made O(N).

I wrote a test program to test the three different methods (sorting using quick sort, sorting using bucket sort and the histogram method), and here are the results:
Code: Select all
     N   QSORT   BSORT   HISTO
 ----- ------- ------- -------
     1      10      10      10
     2      20      10      10
     5      30      10      10
    10      50      20      10
    20     120      20      20
    50     340      50      20
   100     770      80      40
   200    1720     140      80
   500    4890     320     180
  1000   10610     630     360
  2000   20370    1220     720
  5000   62760    3070    1780
 10000  134270    6140    3540

Here the numbers are the times in milliseconds to perform 10000 tests using two random strings of length N (using the very inaccurate function clock()). The number of succesful tests is about equal to the number of failing tests (in about half the cases the second string is a permutation of the first). The numbers support the O(N*logN) complexity for the QSORT method and the O(N) complexity for the BSORT and HISTO methods. The fact that BSORT is slower then HISTO doesn't mean that HISTO is the better method, but only that my HISTO implementation is more efficient than my BSORT method. Careful inspection of both methods reveal that they in fact are two different implementations of the same algorithm.

For those interested in my code, here it is:
[c]#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define TESTS 10000
#define MAXSTRING 10000

const int DIGITS[]={1,2,5,10,20,50,100,200,500,1000,2000,5000,10000};

int charcompare(const void *p1, const void *p2){

return *((char *)p1) - *((char *)p2);
}

int q_ispermutation(char *s1, char *s2){
static char c1[MAXSTRING+1],c2[MAXSTRING+1];

strcpy(c1,s1);
qsort(c1,strlen(c1),sizeof(char),charcompare);
strcpy(c2,s2);
qsort(c2,strlen(c2),sizeof(char),charcompare);
return !strcmp(c1,c2);
}

void bsort(char *s){
int bucket[10];
int i,j;

for(i=0;i<10;i++) bucket[i]=0;
for(j=strlen(s)-1;j>=0;j--) bucket[s[j]-'0']++;
for(i=0;i<10;i++){
for(j=bucket[i];j>0;j--) *(s++)='0'+i;
}
}


int b_ispermutation(char *s1, char *s2){
static char c1[MAXSTRING+1],c2[MAXSTRING+1];

strcpy(c1,s1);
bsort(c1);
strcpy(c2,s2);
bsort(c2);
return !strcmp(c1,c2);
}

int h_ispermutation(char *s1, char *s2){
int histo[10];
int i;

for(i=0;i<10;i++) histo[i]=0;
for(i=strlen(s1)-1;i>=0;i--) histo[s1[i]-'0']++;
for(i=strlen(s2)-1;i>=0;i--) histo[s2[i]-'0']--;
for(i=0;i<10;i++) if(histo[i]) return 0;
return 1;
}

void randomize_string(char *s){
int i,j,len;
char temp;

len=strlen(s);
for(i=0;i<len-1;i++){
j=i+rand()%(len-i);
temp=s[i];
s[i]=s[j];
s[j]=temp;
}
}

int main(){
int caseno,i,j;
static char s1[TESTS][MAXSTRING+1],s2[TESTS][MAXSTRING+1];
clock_t t0,tq,tb,th;

printf(" N QSORT BSORT HISTO\n");
printf(" ----- ------- ------- -------\n");

for(caseno=0;caseno<sizeof(DIGITS)/sizeof(int);caseno++){
for(i=0;i<TESTS;i++){
for(j=0;j<DIGITS[caseno];j++) s1[i][j]='0'+rand()%10;
s1[i][DIGITS[caseno]]='\0';
if(rand()%2){
for(j=0;j<DIGITS[caseno];j++) s2[i][j]='0'+rand()%10;
s2[i][DIGITS[caseno]]='\0';
}
else{
strcpy(s2[i],s1[i]);
randomize_string(s2[i]);
}
}
t0=clock();
for(i=0;i<TESTS;i++) q_ispermutation(s1[i],s2[i]);
tq=clock()-t0;
t0=clock();
for(i=0;i<TESTS;i++) b_ispermutation(s1[i],s2[i]);
tb=clock()-t0;
t0=clock();
for(i=0;i<TESTS;i++) h_ispermutation(s1[i],s2[i]);
th=clock()-t0;
printf("%6d%8d%8d%8d\n",DIGITS[caseno],tq/1000,tb/1000,th/1000);
}

return 0;
}
[/c]
User avatar
little joey
Guru
 
Posts: 1080
Joined: Thu Dec 19, 2002 7:37 pm

Postby abishek » Sun Aug 15, 2004 2:13 pm

i don't understand how in the case of two random strings the number of failure about as equal to the number of success. i'd be suprised if you can get two random strings to be permutations for the alphabet size of N=10000. indeed Bucket sort and histogram are same techniques, however, one could use the histogram technique a bit more efficiently for this specific problem.
bye
abi
User avatar
abishek
Experienced poster
 
Posts: 131
Joined: Mon Dec 15, 2003 5:41 am

Postby little joey » Sun Aug 15, 2004 2:53 pm

abishek wrote:i don't understand how in the case of two random strings the number of failure about as equal to the number of success. i'd be suprised if you can get two random strings to be permutations for the alphabet size of N=10000.

It's the if(rand()%2) in my code. In about half the cases a new random string is created for s2[i], in the other half of the cases (else part) s2[i] is just a random permutation of s1[i].

abishek wrote:indeed Bucket sort and histogram are same techniques, however, one could use the histogram technique a bit more efficiently for this specific problem.

Well, my objective was to demonstrate my statement, not to write the most efficient code. I think the code for b_ispermutation() can be optimized to run about as fast as h_ispermutation(), but that was beside my point.
User avatar
little joey
Guru
 
Posts: 1080
Joined: Thu Dec 19, 2002 7:37 pm


Return to C

Who is online

Users browsing this forum: No registered users and 0 guests

cron