11860-Document Analyzer

All about problems in Volume CXVIII. If there is a thread about your problem, please use it. If not, create one with its number in the subject.

Moderator: Board moderators

11860-Document Analyzer

Postby durjay » Sun Oct 10, 2010 6:40 pm

I think this is a simple adhoc problem.....
but i got TLE in the contest :(
can any one give me any clue.....

thanx for advance.
durjay
New poster
 
Posts: 13
Joined: Tue Oct 06, 2009 5:09 pm
Location: ctg

Re: 11860-Document Analyzer

Postby Angeh » Sun Oct 10, 2010 7:35 pm

use roling ...
O(n)
>>>>>>>>> A2
Beliefs are not facts, believe what you need to believe;)
Angeh
Experienced poster
 
Posts: 108
Joined: Sat Aug 08, 2009 2:53 pm

Re: 11860-Document Analyzer

Postby lord_burgos » Wed Oct 20, 2010 9:19 pm

durjay wrote:I think this is a simple adhoc problem.....
but i got TLE in the contest :(
can any one give me any clue.....

thanx for advance.


For solve this problem, i'm used Suffix Tree to find words, for example 'a abba a c a' = 1 2 1 3 1, and with this I know the max number of different words = 3 in this case, use interval tree, for know the minimum position of each word, and insert it one by one.

in the example i have 3 different words, [ a, abba, c], the first iteration i don't know the position [ INF, INF, INF], but insert one by one,
1) a = [ 0, INF, INF]
2) abba = [0, 1, INF];
3) a = [2, 1, INF];
4) c = [2, 1, 3]; // in this point, we know all position, and, insert the last one in position 3, so the minimum of all words position is 1, then 3- 1 + 1 = 3 = diff
5) a = [4, 1, 3] // the same, last position 4, the minimum of all is 1 then 4 - 1 + 1 = 4 = diff
the solution for this case is 2 4.

there are many words, and the time to find the minimum position is so slow, for that point i use interval tree, but you can use any other algorithm.
User avatar
lord_burgos
New poster
 
Posts: 43
Joined: Mon Oct 13, 2003 4:54 pm
Location: Mexico

Re: 11860-Document Analyzer

Postby Leonid » Sat Oct 23, 2010 12:18 am

I think Suffix Tree is a bit of an overkill for such a simple problem.

Suppose you have an interval of words [a, b], where b > a. There are 2 possibilities: either there are all unique words in the interval or not. If interval contains all unique words, then we can shorten the interval to [a + 1, b]. If there aren't yet all words then we should increase the length of the interval: [a, b + 1]. The solution can be found by starting with interval [0, 0], and moving this interval towards [n, n]. It has complexity O(N * LOG(N)), if map is used to lookup similar words, and can be increased to O(N) by using hashes (disregarding word length). I leave it up to you to prove why it works :)
Leonid
Experienced poster
 
Posts: 148
Joined: Thu Dec 22, 2005 5:50 pm

Re: 11860-Document Analyzer

Postby peratu » Sat Oct 30, 2010 1:56 pm

durjay wrote:I think this is a simple adhoc problem.....
but i got TLE in the contest :(
can any one give me any clue.....

thanx for advance.


You can read the whole document and put it into a string variable. For example, the case one would be stored in a string with this content:

"1. a case is a case, 2. case is not a case~"

Then, write a function that cleans that "dirty" document an puts the clean words into a vector (the clean document). Here is an example in C++:
Code: Select all
void cleanDocument( string dirty_document, vector<string> &clean_document ) {
  string clean_word = "";
 
  for ( unsigned int i = 0; i < dirty_document.size(); ++i )
    if ( dirty_document[i] > 96 && dirty_document[i] < 123 )
      clean_word += dirty_document[i];
    else
      if ( clean_word != "" ) {
        clean_document.push_back( clean_word );
        clean_word = "";
      }
}
peratu
New poster
 
Posts: 7
Joined: Sat Jul 10, 2010 9:22 am

Re: 11860-Document Analyzer

Postby sith » Fri Jun 22, 2012 12:42 pm

Hello

I've got TLE, but I believe that my solution is pretty optimal. I don't have any string comparision, I read all input data . what is wrong?

Code: Select all
import java.io.ByteArrayInputStream;
import java.io.CharArrayReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.TreeSet;

class Main {

    public static final String END = "END";

    public static void main(String[] args) {

        InputStreamReader reader = new InputStreamReader(System.in);

        char[] buf = new char[1024 * 1024 * 9];
        try {
            reader.read(buf);
        }
        catch (IOException e) {
        }

        CharArrayReader bufReader = new CharArrayReader(buf);

        int intch;
        try {
            while ((intch = bufReader.read()) != -1) {

                if (intch == 0) {
                    break;
                }

                Integer count = null;
                try {
                    count = Integer.valueOf(Character.valueOf((char) intch).toString());
                }
                catch (NumberFormatException e) {
                    continue;
                }

                for (int i = 1; i <= count; i++) {
                    int position = 0;

                    Map<String, PriorityQueue<Integer>> words = readTheWords(bufReader, position);


                    if (words == null) {
                        return;
                    }

                    PriorityQueue<Interval> resultQueue = new PriorityQueue<Interval>();
                    while (true) {


                        WordsDataStructure dataStructure = new WordsDataStructure();

                        for (Map.Entry<String, PriorityQueue<Integer>> wordAndPositions : words.entrySet()) {
                            dataStructure.addWord(wordAndPositions.getKey(), wordAndPositions.getValue().peek());
                        }
                        resultQueue.add(new Interval(dataStructure.minPosittion, dataStructure.maxPostition));

                        PriorityQueue<Integer> integers = words.get(dataStructure.maxWord);
                        if (integers.size() > 1) {
                            integers.poll();
                        }
                        else {
                            Interval poll = resultQueue.poll();
                            System.out.println("Document: " + i + ": " + poll.start + " " + poll.end);
                            break;
                        }
                    }
                }
            }
        }
        catch (IOException e) {
        }
    }

    private static Map<String, PriorityQueue<Integer>> readTheWords(final Reader reader, int position)
        throws IOException {
        int intch;


        Map<String, PriorityQueue<Integer>> words = new HashMap<String, PriorityQueue<Integer>>();


        StringBuilder builder = new StringBuilder();
        boolean isEnd = false;
        while (!isEnd) {
            intch = reader.read();
            if (intch == -1) {
                return null;
            }
            char c = (char) intch;
            if (Character.isLetter(c)) {
                builder.append(c);
            }
            else {
                String value = builder.toString();
                isEnd = value.equals(END);
                if (builder.length() > 0 && !isEnd) {
                    PriorityQueue<Integer> priorityQueue = words.get(value);
                    if (priorityQueue == null) {
                        priorityQueue = new PriorityQueue<Integer>(10000, new Comparator<Integer>() {
                            public int compare(final Integer o1, final Integer o2) {
                                return o2.compareTo(o1);
                            }
                        });
                        words.put(value, priorityQueue);
                    }
                    priorityQueue.add(++position);
                }
                builder = new StringBuilder();
            }
        }
        return words;
    }


    static class WordsDataStructure {
        String maxWord;
        int maxPostition = Integer.MIN_VALUE;
        int minPosittion = Integer.MAX_VALUE;


        public void addWord(String word, int position) {
            if (position > maxPostition) {
                maxPostition = position;
                maxWord = word;
            }
            if (position < minPosittion) {
                minPosittion = position;
            }

        }
    }


    static class Interval implements Comparable<Interval> {
        final Integer start;
        final Integer end;


        Interval(final int start, final int end) {
            this.end = end;
            this.start = start;
        }

        public int compareTo(final Interval o) {
            int compareResult = (end - start) - (o.end - o.start);
            if (compareResult != 0) {
                return compareResult;
            }
            return start.compareTo(o.start);
        }
    }
}
sith
Learning poster
 
Posts: 67
Joined: Sat May 19, 2012 7:46 pm

Re: 11860-Document Analyzer

Postby brianfry713 » Tue Jun 26, 2012 12:31 am

Try using BufferedReader and BufferedWriter
brianfry713
Guru
 
Posts: 1765
Joined: Thu Sep 01, 2011 9:09 am
Location: San Jose, CA, USA

Re: 11860-Document Analyzer

Postby sith » Tue Jun 26, 2012 6:09 pm

It didn't help. I still get TLE

Here is updated solution

Code: Select all
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.CharArrayReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.Reader;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.TreeSet;

class Main {

    public static final String END = "END";

    public static void main(String[] args) {

        BufferedReader reader= new BufferedReader(new InputStreamReader(System.in),1024*1024*9);

        BufferedWriter writer = new BufferedWriter(new PrintWriter(System.out));
        int intch;
        try {
            while ((intch = reader.read()) != -1) {

                if (intch == 0) {
                    break;
                }

                Integer count = null;
                try {
                    count = Integer.valueOf(Character.valueOf((char) intch).toString());
                }
                catch (NumberFormatException e) {
                    continue;
                }

                for (int i = 1; i <= count; i++) {
                    int position = 0;

                    Map<String, PriorityQueue<Integer>> words = readTheWords(reader, position);


                    if (words == null) {
                        return;
                    }

                    PriorityQueue<Interval> resultQueue = new PriorityQueue<Interval>();
                    while (true) {


                        WordsDataStructure dataStructure = new WordsDataStructure();

                        for (Map.Entry<String, PriorityQueue<Integer>> wordAndPositions : words.entrySet()) {
                            dataStructure.addWord(wordAndPositions.getKey(), wordAndPositions.getValue().peek());
                        }
                        resultQueue.add(new Interval(dataStructure.minPosittion, dataStructure.maxPostition));

                        PriorityQueue<Integer> integers = words.get(dataStructure.maxWord);
                        if (integers.size() > 1) {
                            integers.poll();
                        }
                        else {
                            Interval poll = resultQueue.poll();

                            writer.write("Document: " + i + ": " + poll.start + " " + poll.end + "\n");
                            break;
                        }
                    }
                }
            }
            writer.flush();
        }
        catch (IOException e) {
        }
    }

    private static Map<String, PriorityQueue<Integer>> readTheWords(final Reader reader, int position)
        throws IOException {
        int intch;


        Map<String, PriorityQueue<Integer>> words = new HashMap<String, PriorityQueue<Integer>>();


        StringBuilder builder = new StringBuilder();
        boolean isEnd = false;
        while (!isEnd) {
            intch = reader.read();
            if (intch == -1) {
                return words;
            }
            char c = (char) intch;
            if (Character.isLetter(c)) {
                builder.append(c);
            }
            else {
                String value = builder.toString();
                isEnd = value.equals(END);
                if (builder.length() > 0 && !isEnd) {
                    PriorityQueue<Integer> priorityQueue = words.get(value);
                    if (priorityQueue == null) {
                        priorityQueue = new PriorityQueue<Integer>(10000, new Comparator<Integer>() {
                            public int compare(final Integer o1, final Integer o2) {
                                return o2.compareTo(o1);
                            }
                        });
                        words.put(value, priorityQueue);
                    }
                    priorityQueue.add(++position);
                }
                builder = new StringBuilder();
            }
        }
        return words;
    }


    static class WordsDataStructure {
        String maxWord;
        int maxPostition = Integer.MIN_VALUE;
        int minPosittion = Integer.MAX_VALUE;


        public void addWord(String word, int position) {
            if (position > maxPostition) {
                maxPostition = position;
                maxWord = word;
            }
            if (position < minPosittion) {
                minPosittion = position;
            }

        }
    }


    static class Interval implements Comparable<Interval> {
        final Integer start;
        final Integer end;


        Interval(final int start, final int end) {
            this.end = end;
            this.start = start;
        }

        public int compareTo(final Interval o) {
            int compareResult = (end - start) - (o.end - o.start);
            if (compareResult != 0) {
                return compareResult;
            }
            return start.compareTo(o.start);
        }
    }
}
sith
Learning poster
 
Posts: 67
Joined: Sat May 19, 2012 7:46 pm

Re: 11860-Document Analyzer

Postby brianfry713 » Wed Jun 27, 2012 12:20 am

Input:
Code: Select all
10
a b c d e
END
a b c d e
END
a b c d e
END
a b c d e
END
a b c d e
END
a b c d e
END
a b c d e
END
a b c d e
END
a b c d e
END
a b c d e
END
100
Correct output:
Code: Select all
Document 1: 1 5
Document 2: 1 5
Document 3: 1 5
Document 4: 1 5
Document 5: 1 5
Document 6: 1 5
Document 7: 1 5
Document 8: 1 5
Document 9: 1 5
Document 10: 1 5
brianfry713
Guru
 
Posts: 1765
Joined: Thu Sep 01, 2011 9:09 am
Location: San Jose, CA, USA

Re: 11860-Document Analyzer

Postby sith » Wed Jun 27, 2012 8:27 pm

Thank you for your case. It helped me to fix one bug.
In your case you have 100 at the and of input - is it typo?

But I've still get TLE :((((((((((((((((



Code: Select all
import java.io.*;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

class Main {

    public static final String END = "END";

    public static void main(String[] args) {

        BufferedReader reader = new BufferedReader(new InputStreamReader(System.in), 1024 * 1024 * 9);

        BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(System.out));
        int intch;
        try {
            while ((intch = reader.read()) != -1) {

                if (intch == 0) {
                    break;
                }

                Integer count = null;
                try {
                    StringBuilder builder = new StringBuilder();

                    char intch1 = (char) intch;
                    while (intch1 != '\n') {
                        builder.append(intch1);
                        intch1 = (char) reader.read();
                    }
                    count = Integer.valueOf(builder.toString().trim());
                } catch (NumberFormatException e) {
                    continue;
                }

                for (int i = 1; i <= count; i++) {
                    int position = 0;

                    Map<String, PriorityQueue<Integer>> words = readTheWords(reader, position);


                    if (words == null) {
                        return;
                    }

                    PriorityQueue<Interval> resultQueue = new PriorityQueue<Interval>();
                    while (true) {


                        WordsDataStructure dataStructure = new WordsDataStructure();

                        for (Map.Entry<String, PriorityQueue<Integer>> wordAndPositions : words.entrySet()) {
                            dataStructure.addWord(wordAndPositions.getKey(), wordAndPositions.getValue().peek());
                        }
                        resultQueue.add(new Interval(dataStructure.minPosittion, dataStructure.maxPostition));

                        PriorityQueue<Integer> integers = words.get(dataStructure.maxWord);
                        if (integers.size() > 1) {
                            integers.poll();
                        } else {
                            Interval poll = resultQueue.poll();

                            writer.write("Document: " + i + ": " + poll.start + " " + poll.end + "\n");
                            break;
                        }
                    }
                }
            }
            writer.flush();
        } catch (IOException e) {
        }
    }

    private static Map<String, PriorityQueue<Integer>> readTheWords(final Reader reader, int position)
            throws IOException {
        int intch;


        Map<String, PriorityQueue<Integer>> words = new HashMap<String, PriorityQueue<Integer>>();


        StringBuilder builder = new StringBuilder();
        boolean isEnd = false;
        while (!isEnd) {
            intch = reader.read();
            if (intch == -1) {
                return words;
            }
            char c = (char) intch;
            if (Character.isLetter(c)) {
                builder.append(c);
            }
            else {
                String value = builder.toString();
                isEnd = value.equals(END);
                if (builder.length() > 0 && !isEnd) {
                    PriorityQueue<Integer> priorityQueue = words.get(value);
                    if (priorityQueue == null) {
                        priorityQueue = new PriorityQueue<Integer>(10000, new Comparator<Integer>() {
                            public int compare(final Integer o1, final Integer o2) {
                                return o2.compareTo(o1);
                            }
                        });
                        words.put(value, priorityQueue);
                    }
                    priorityQueue.add(++position);
                }
                builder = new StringBuilder();
            }
        }
        return words;
    }


    static class WordsDataStructure {
        String maxWord;
        int maxPostition = Integer.MIN_VALUE;
        int minPosittion = Integer.MAX_VALUE;


        public void addWord(String word, int position) {
            if (position > maxPostition) {
                maxPostition = position;
                maxWord = word;
            }
            if (position < minPosittion) {
                minPosittion = position;
            }

        }
    }


    static class Interval implements Comparable<Interval> {
        final Integer start;
        final Integer end;


        Interval(final int start, final int end) {
            this.end = end;
            this.start = start;
        }

        public int compareTo(final Interval o) {
            int compareResult = (end - start) - (o.end - o.start);
            if (compareResult != 0) {
                return compareResult;
            }
            return start.compareTo(o.start);
        }
    }
}
sith
Learning poster
 
Posts: 67
Joined: Sat May 19, 2012 7:46 pm

Re: 11860-Document Analyzer

Postby brianfry713 » Thu Jun 28, 2012 12:03 am

I intentionally put 100 at the end of the input to demonstrate an issue with your code. You should only read the number of documents on the first line and then quit after reading that number of documents. Often the problems on this website will have garbage at the end of the input to see if you are following the problem statement and reading the input correctly.
brianfry713
Guru
 
Posts: 1765
Joined: Thu Sep 01, 2011 9:09 am
Location: San Jose, CA, USA

Re: 11860-Document Analyzer

Postby sith » Thu Jun 28, 2012 10:09 am

Thank you.
I fixed It, but TLE has been happend again :(((


Code: Select all
import java.io.*;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

class Main {

    public static final String END = "END";

    public static void main(String[] args) {

     BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));

        BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(System.out));
        int intch;
        try {
            while ((intch = reader.read()) != -1) {

                if (intch == 0) {
                    break;
                }

                Integer count = null;
                try {
                    StringBuilder builder = new StringBuilder();

                    char intch1 = (char) intch;
                    while (intch1 != '\n') {
                        builder.append(intch1);
                        intch1 = (char) reader.read();
                    }
                    count = Integer.valueOf(builder.toString().trim());
                } catch (NumberFormatException e) {
                    continue;
                }

                for (int i = 1; i <= count; i++) {
                    int position = 0;

                    Map<String, PriorityQueue<Integer>> words = readTheWords(reader, position);


                    if (words == null) {
                        return;
                    }

                    PriorityQueue<Interval> resultQueue = new PriorityQueue<Interval>();
                    while (true) {


                        WordsDataStructure dataStructure = new WordsDataStructure();

                        for (Map.Entry<String, PriorityQueue<Integer>> wordAndPositions : words.entrySet()) {
                            dataStructure.addWord(wordAndPositions.getKey(), wordAndPositions.getValue().peek());
                        }
                        resultQueue.add(new Interval(dataStructure.minPosittion, dataStructure.maxPostition));

                        PriorityQueue<Integer> integers = words.get(dataStructure.maxWord);
                        if (integers.size() > 1) {
                            integers.poll();
                        } else {
                            Interval poll = resultQueue.poll();

                            writer.write("Document: " + i + ": " + poll.start + " " + poll.end + "\n");
                            break;
                        }
                    }
                }
                writer.flush();
                return;
            }
        } catch (IOException e) {
        }
    }

    private static Map<String, PriorityQueue<Integer>> readTheWords(final Reader reader, int position)
            throws IOException {
        int intch;


        Map<String, PriorityQueue<Integer>> words = new HashMap<String, PriorityQueue<Integer>>();


        StringBuilder builder = new StringBuilder();
        boolean isEnd = false;
        while (!isEnd) {
            intch = reader.read();
            if (intch == -1) {
                return words;
            }
            char c = (char) intch;
            if (Character.isLetter(c)) {
                builder.append(c);
            } else {
                String value = builder.toString();
                isEnd = value.equals(END);
                if (builder.length() > 0 && !isEnd) {
                    PriorityQueue<Integer> priorityQueue = words.get(value);
                    if (priorityQueue == null) {
                        priorityQueue = new PriorityQueue<Integer>(10000, new Comparator<Integer>() {
                            public int compare(final Integer o1, final Integer o2) {
                                return o2.compareTo(o1);
                            }
                        });
                        words.put(value, priorityQueue);
                    }
                    priorityQueue.add(++position);
                }
                builder = new StringBuilder();
            }
        }
        return words;
    }


    static class WordsDataStructure {
        String maxWord;
        int maxPostition = Integer.MIN_VALUE;
        int minPosittion = Integer.MAX_VALUE;


        public void addWord(String word, int position) {
            if (position > maxPostition) {
                maxPostition = position;
                maxWord = word;
            }
            if (position < minPosittion) {
                minPosittion = position;
            }

        }
    }


    static class Interval implements Comparable<Interval> {
        final Integer start;
        final Integer end;


        Interval(final int start, final int end) {
            this.end = end;
            this.start = start;
        }

        public int compareTo(final Interval o) {
            int compareResult = (end - start) - (o.end - o.start);
            if (compareResult != 0) {
                return compareResult;
            }
            return start.compareTo(o.start);
        }
    }
}
sith
Learning poster
 
Posts: 67
Joined: Sat May 19, 2012 7:46 pm

Re: 11860-Document Analyzer

Postby brianfry713 » Fri Jun 29, 2012 1:25 am

My C++ code uses a map<string,int> and an array, no priority queue. I use an algorithm similar to the one described by Leonid in this thread and get AC in 0.868 sec.
I rewrote my code in JAVA and got AC in 2.044 sec.
brianfry713
Guru
 
Posts: 1765
Joined: Thu Sep 01, 2011 9:09 am
Location: San Jose, CA, USA

Re: 11860-Document Analyzer

Postby Scarecrow » Sat Sep 08, 2012 12:57 am

please help someone. getting WA

Code: Select all
AC
Do or do not. There is no try.
Scarecrow
Learning poster
 
Posts: 69
Joined: Wed Oct 19, 2011 9:06 pm


Return to Volume CXVIII

Who is online

Users browsing this forum: No registered users and 1 guest