## 11860-Document Analyzer

Moderator: Board moderators

### 11860-Document Analyzer

I think this is a simple adhoc problem.....
but i got TLE in the contest
can any one give me any clue.....

durjay
New poster

Posts: 13
Joined: Tue Oct 06, 2009 5:09 pm
Location: ctg

### Re: 11860-Document Analyzer

use roling ...
O(n)
>>>>>>>>> A2
Beliefs are not facts, believe what you need to believe;)
Angeh
Experienced poster

Posts: 108
Joined: Sat Aug 08, 2009 2:53 pm

### Re: 11860-Document Analyzer

durjay wrote:I think this is a simple adhoc problem.....
but i got TLE in the contest
can any one give me any clue.....

For solve this problem, i'm used Suffix Tree to find words, for example 'a abba a c a' = 1 2 1 3 1, and with this I know the max number of different words = 3 in this case, use interval tree, for know the minimum position of each word, and insert it one by one.

in the example i have 3 different words, [ a, abba, c], the first iteration i don't know the position [ INF, INF, INF], but insert one by one,
1) a = [ 0, INF, INF]
2) abba = [0, 1, INF];
3) a = [2, 1, INF];
4) c = [2, 1, 3]; // in this point, we know all position, and, insert the last one in position 3, so the minimum of all words position is 1, then 3- 1 + 1 = 3 = diff
5) a = [4, 1, 3] // the same, last position 4, the minimum of all is 1 then 4 - 1 + 1 = 4 = diff
the solution for this case is 2 4.

there are many words, and the time to find the minimum position is so slow, for that point i use interval tree, but you can use any other algorithm.

lord_burgos
New poster

Posts: 43
Joined: Mon Oct 13, 2003 4:54 pm
Location: Mexico

### Re: 11860-Document Analyzer

I think Suffix Tree is a bit of an overkill for such a simple problem.

Suppose you have an interval of words [a, b], where b > a. There are 2 possibilities: either there are all unique words in the interval or not. If interval contains all unique words, then we can shorten the interval to [a + 1, b]. If there aren't yet all words then we should increase the length of the interval: [a, b + 1]. The solution can be found by starting with interval [0, 0], and moving this interval towards [n, n]. It has complexity O(N * LOG(N)), if map is used to lookup similar words, and can be increased to O(N) by using hashes (disregarding word length). I leave it up to you to prove why it works
Leonid
Experienced poster

Posts: 148
Joined: Thu Dec 22, 2005 5:50 pm

### Re: 11860-Document Analyzer

durjay wrote:I think this is a simple adhoc problem.....
but i got TLE in the contest
can any one give me any clue.....

You can read the whole document and put it into a string variable. For example, the case one would be stored in a string with this content:

"1. a case is a case, 2. case is not a case~"

Then, write a function that cleans that "dirty" document an puts the clean words into a vector (the clean document). Here is an example in C++:
Code: Select all
`void cleanDocument( string dirty_document, vector<string> &clean_document ) {  string clean_word = "";    for ( unsigned int i = 0; i < dirty_document.size(); ++i )    if ( dirty_document[i] > 96 && dirty_document[i] < 123 )      clean_word += dirty_document[i];    else      if ( clean_word != "" ) {        clean_document.push_back( clean_word );        clean_word = "";      }}`
peratu
New poster

Posts: 7
Joined: Sat Jul 10, 2010 9:22 am

### Re: 11860-Document Analyzer

Hello

I've got TLE, but I believe that my solution is pretty optimal. I don't have any string comparision, I read all input data . what is wrong?

Code: Select all
`import java.io.ByteArrayInputStream;import java.io.CharArrayReader;import java.io.IOException;import java.io.InputStreamReader;import java.io.Reader;import java.util.Comparator;import java.util.HashMap;import java.util.Map;import java.util.PriorityQueue;import java.util.Set;import java.util.TreeSet;class Main {    public static final String END = "END";    public static void main(String[] args) {        InputStreamReader reader = new InputStreamReader(System.in);        char[] buf = new char[1024 * 1024 * 9];        try {            reader.read(buf);        }        catch (IOException e) {        }        CharArrayReader bufReader = new CharArrayReader(buf);        int intch;        try {            while ((intch = bufReader.read()) != -1) {                if (intch == 0) {                    break;                }                Integer count = null;                try {                    count = Integer.valueOf(Character.valueOf((char) intch).toString());                }                catch (NumberFormatException e) {                    continue;                }                for (int i = 1; i <= count; i++) {                    int position = 0;                    Map<String, PriorityQueue<Integer>> words = readTheWords(bufReader, position);                    if (words == null) {                        return;                    }                    PriorityQueue<Interval> resultQueue = new PriorityQueue<Interval>();                    while (true) {                        WordsDataStructure dataStructure = new WordsDataStructure();                        for (Map.Entry<String, PriorityQueue<Integer>> wordAndPositions : words.entrySet()) {                            dataStructure.addWord(wordAndPositions.getKey(), wordAndPositions.getValue().peek());                        }                        resultQueue.add(new Interval(dataStructure.minPosittion, dataStructure.maxPostition));                        PriorityQueue<Integer> integers = words.get(dataStructure.maxWord);                        if (integers.size() > 1) {                            integers.poll();                        }                        else {                            Interval poll = resultQueue.poll();                            System.out.println("Document: " + i + ": " + poll.start + " " + poll.end);                            break;                        }                    }                }            }        }        catch (IOException e) {        }    }    private static Map<String, PriorityQueue<Integer>> readTheWords(final Reader reader, int position)        throws IOException {        int intch;        Map<String, PriorityQueue<Integer>> words = new HashMap<String, PriorityQueue<Integer>>();        StringBuilder builder = new StringBuilder();        boolean isEnd = false;        while (!isEnd) {            intch = reader.read();            if (intch == -1) {                return null;            }            char c = (char) intch;            if (Character.isLetter(c)) {                builder.append(c);            }            else {                String value = builder.toString();                isEnd = value.equals(END);                if (builder.length() > 0 && !isEnd) {                    PriorityQueue<Integer> priorityQueue = words.get(value);                    if (priorityQueue == null) {                        priorityQueue = new PriorityQueue<Integer>(10000, new Comparator<Integer>() {                            public int compare(final Integer o1, final Integer o2) {                                return o2.compareTo(o1);                            }                        });                        words.put(value, priorityQueue);                    }                    priorityQueue.add(++position);                }                builder = new StringBuilder();            }        }        return words;    }    static class WordsDataStructure {        String maxWord;        int maxPostition = Integer.MIN_VALUE;        int minPosittion = Integer.MAX_VALUE;        public void addWord(String word, int position) {            if (position > maxPostition) {                maxPostition = position;                maxWord = word;            }            if (position < minPosittion) {                minPosittion = position;            }        }    }    static class Interval implements Comparable<Interval> {        final Integer start;        final Integer end;        Interval(final int start, final int end) {            this.end = end;            this.start = start;        }        public int compareTo(final Interval o) {            int compareResult = (end - start) - (o.end - o.start);            if (compareResult != 0) {                return compareResult;            }            return start.compareTo(o.start);        }    }}`
sith
Learning poster

Posts: 67
Joined: Sat May 19, 2012 7:46 pm

### Re: 11860-Document Analyzer

brianfry713
Guru

Posts: 1765
Joined: Thu Sep 01, 2011 9:09 am
Location: San Jose, CA, USA

### Re: 11860-Document Analyzer

It didn't help. I still get TLE

Here is updated solution

Code: Select all
`import java.io.BufferedReader;import java.io.BufferedWriter;import java.io.ByteArrayInputStream;import java.io.CharArrayReader;import java.io.IOException;import java.io.InputStreamReader;import java.io.PrintWriter;import java.io.Reader;import java.util.Comparator;import java.util.HashMap;import java.util.Map;import java.util.PriorityQueue;import java.util.Set;import java.util.TreeSet;class Main {    public static final String END = "END";    public static void main(String[] args) {        BufferedReader reader= new BufferedReader(new InputStreamReader(System.in),1024*1024*9);        BufferedWriter writer = new BufferedWriter(new PrintWriter(System.out));        int intch;        try {            while ((intch = reader.read()) != -1) {                if (intch == 0) {                    break;                }                Integer count = null;                try {                    count = Integer.valueOf(Character.valueOf((char) intch).toString());                }                catch (NumberFormatException e) {                    continue;                }                for (int i = 1; i <= count; i++) {                    int position = 0;                    Map<String, PriorityQueue<Integer>> words = readTheWords(reader, position);                    if (words == null) {                        return;                    }                    PriorityQueue<Interval> resultQueue = new PriorityQueue<Interval>();                    while (true) {                        WordsDataStructure dataStructure = new WordsDataStructure();                        for (Map.Entry<String, PriorityQueue<Integer>> wordAndPositions : words.entrySet()) {                            dataStructure.addWord(wordAndPositions.getKey(), wordAndPositions.getValue().peek());                        }                        resultQueue.add(new Interval(dataStructure.minPosittion, dataStructure.maxPostition));                        PriorityQueue<Integer> integers = words.get(dataStructure.maxWord);                        if (integers.size() > 1) {                            integers.poll();                        }                        else {                            Interval poll = resultQueue.poll();                            writer.write("Document: " + i + ": " + poll.start + " " + poll.end + "\n");                            break;                        }                    }                }            }            writer.flush();        }        catch (IOException e) {        }    }    private static Map<String, PriorityQueue<Integer>> readTheWords(final Reader reader, int position)        throws IOException {        int intch;        Map<String, PriorityQueue<Integer>> words = new HashMap<String, PriorityQueue<Integer>>();        StringBuilder builder = new StringBuilder();        boolean isEnd = false;        while (!isEnd) {            intch = reader.read();            if (intch == -1) {                return words;            }            char c = (char) intch;            if (Character.isLetter(c)) {                builder.append(c);            }            else {                String value = builder.toString();                isEnd = value.equals(END);                if (builder.length() > 0 && !isEnd) {                    PriorityQueue<Integer> priorityQueue = words.get(value);                    if (priorityQueue == null) {                        priorityQueue = new PriorityQueue<Integer>(10000, new Comparator<Integer>() {                            public int compare(final Integer o1, final Integer o2) {                                return o2.compareTo(o1);                            }                        });                        words.put(value, priorityQueue);                    }                    priorityQueue.add(++position);                }                builder = new StringBuilder();            }        }        return words;    }    static class WordsDataStructure {        String maxWord;        int maxPostition = Integer.MIN_VALUE;        int minPosittion = Integer.MAX_VALUE;        public void addWord(String word, int position) {            if (position > maxPostition) {                maxPostition = position;                maxWord = word;            }            if (position < minPosittion) {                minPosittion = position;            }        }    }    static class Interval implements Comparable<Interval> {        final Integer start;        final Integer end;        Interval(final int start, final int end) {            this.end = end;            this.start = start;        }        public int compareTo(final Interval o) {            int compareResult = (end - start) - (o.end - o.start);            if (compareResult != 0) {                return compareResult;            }            return start.compareTo(o.start);        }    }}`
sith
Learning poster

Posts: 67
Joined: Sat May 19, 2012 7:46 pm

### Re: 11860-Document Analyzer

Input:
Code: Select all
`10a b c d eENDa b c d eENDa b c d eENDa b c d eENDa b c d eENDa b c d eENDa b c d eENDa b c d eENDa b c d eENDa b c d eEND100`
Correct output:
Code: Select all
`Document 1: 1 5Document 2: 1 5Document 3: 1 5Document 4: 1 5Document 5: 1 5Document 6: 1 5Document 7: 1 5Document 8: 1 5Document 9: 1 5Document 10: 1 5`
brianfry713
Guru

Posts: 1765
Joined: Thu Sep 01, 2011 9:09 am
Location: San Jose, CA, USA

### Re: 11860-Document Analyzer

Thank you for your case. It helped me to fix one bug.
In your case you have 100 at the and of input - is it typo?

But I've still get TLE (((((((((((((((

Code: Select all
`import java.io.*;import java.util.Comparator;import java.util.HashMap;import java.util.Map;import java.util.PriorityQueue;class Main {    public static final String END = "END";    public static void main(String[] args) {        BufferedReader reader = new BufferedReader(new InputStreamReader(System.in), 1024 * 1024 * 9);        BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(System.out));        int intch;        try {            while ((intch = reader.read()) != -1) {                if (intch == 0) {                    break;                }                Integer count = null;                try {                    StringBuilder builder = new StringBuilder();                    char intch1 = (char) intch;                    while (intch1 != '\n') {                        builder.append(intch1);                        intch1 = (char) reader.read();                    }                    count = Integer.valueOf(builder.toString().trim());                } catch (NumberFormatException e) {                    continue;                }                for (int i = 1; i <= count; i++) {                    int position = 0;                    Map<String, PriorityQueue<Integer>> words = readTheWords(reader, position);                    if (words == null) {                        return;                    }                    PriorityQueue<Interval> resultQueue = new PriorityQueue<Interval>();                    while (true) {                        WordsDataStructure dataStructure = new WordsDataStructure();                        for (Map.Entry<String, PriorityQueue<Integer>> wordAndPositions : words.entrySet()) {                            dataStructure.addWord(wordAndPositions.getKey(), wordAndPositions.getValue().peek());                        }                        resultQueue.add(new Interval(dataStructure.minPosittion, dataStructure.maxPostition));                        PriorityQueue<Integer> integers = words.get(dataStructure.maxWord);                        if (integers.size() > 1) {                            integers.poll();                        } else {                            Interval poll = resultQueue.poll();                            writer.write("Document: " + i + ": " + poll.start + " " + poll.end + "\n");                            break;                        }                    }                }            }            writer.flush();        } catch (IOException e) {        }    }    private static Map<String, PriorityQueue<Integer>> readTheWords(final Reader reader, int position)            throws IOException {        int intch;        Map<String, PriorityQueue<Integer>> words = new HashMap<String, PriorityQueue<Integer>>();        StringBuilder builder = new StringBuilder();        boolean isEnd = false;        while (!isEnd) {            intch = reader.read();            if (intch == -1) {                return words;            }            char c = (char) intch;            if (Character.isLetter(c)) {                builder.append(c);            }            else {                String value = builder.toString();                isEnd = value.equals(END);                if (builder.length() > 0 && !isEnd) {                    PriorityQueue<Integer> priorityQueue = words.get(value);                    if (priorityQueue == null) {                        priorityQueue = new PriorityQueue<Integer>(10000, new Comparator<Integer>() {                            public int compare(final Integer o1, final Integer o2) {                                return o2.compareTo(o1);                            }                        });                        words.put(value, priorityQueue);                    }                    priorityQueue.add(++position);                }                builder = new StringBuilder();            }        }        return words;    }    static class WordsDataStructure {        String maxWord;        int maxPostition = Integer.MIN_VALUE;        int minPosittion = Integer.MAX_VALUE;        public void addWord(String word, int position) {            if (position > maxPostition) {                maxPostition = position;                maxWord = word;            }            if (position < minPosittion) {                minPosittion = position;            }        }    }    static class Interval implements Comparable<Interval> {        final Integer start;        final Integer end;        Interval(final int start, final int end) {            this.end = end;            this.start = start;        }        public int compareTo(final Interval o) {            int compareResult = (end - start) - (o.end - o.start);            if (compareResult != 0) {                return compareResult;            }            return start.compareTo(o.start);        }    }}`
sith
Learning poster

Posts: 67
Joined: Sat May 19, 2012 7:46 pm

### Re: 11860-Document Analyzer

I intentionally put 100 at the end of the input to demonstrate an issue with your code. You should only read the number of documents on the first line and then quit after reading that number of documents. Often the problems on this website will have garbage at the end of the input to see if you are following the problem statement and reading the input correctly.
brianfry713
Guru

Posts: 1765
Joined: Thu Sep 01, 2011 9:09 am
Location: San Jose, CA, USA

### Re: 11860-Document Analyzer

Thank you.
I fixed It, but TLE has been happend again ((

Code: Select all
`import java.io.*;import java.util.Comparator;import java.util.HashMap;import java.util.Map;import java.util.PriorityQueue;class Main {    public static final String END = "END";    public static void main(String[] args) {     BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));        BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(System.out));        int intch;        try {            while ((intch = reader.read()) != -1) {                if (intch == 0) {                    break;                }                Integer count = null;                try {                    StringBuilder builder = new StringBuilder();                    char intch1 = (char) intch;                    while (intch1 != '\n') {                        builder.append(intch1);                        intch1 = (char) reader.read();                    }                    count = Integer.valueOf(builder.toString().trim());                } catch (NumberFormatException e) {                    continue;                }                for (int i = 1; i <= count; i++) {                    int position = 0;                    Map<String, PriorityQueue<Integer>> words = readTheWords(reader, position);                    if (words == null) {                        return;                    }                    PriorityQueue<Interval> resultQueue = new PriorityQueue<Interval>();                    while (true) {                        WordsDataStructure dataStructure = new WordsDataStructure();                        for (Map.Entry<String, PriorityQueue<Integer>> wordAndPositions : words.entrySet()) {                            dataStructure.addWord(wordAndPositions.getKey(), wordAndPositions.getValue().peek());                        }                        resultQueue.add(new Interval(dataStructure.minPosittion, dataStructure.maxPostition));                        PriorityQueue<Integer> integers = words.get(dataStructure.maxWord);                        if (integers.size() > 1) {                            integers.poll();                        } else {                            Interval poll = resultQueue.poll();                            writer.write("Document: " + i + ": " + poll.start + " " + poll.end + "\n");                            break;                        }                    }                }                writer.flush();                return;            }        } catch (IOException e) {        }    }    private static Map<String, PriorityQueue<Integer>> readTheWords(final Reader reader, int position)            throws IOException {        int intch;        Map<String, PriorityQueue<Integer>> words = new HashMap<String, PriorityQueue<Integer>>();        StringBuilder builder = new StringBuilder();        boolean isEnd = false;        while (!isEnd) {            intch = reader.read();            if (intch == -1) {                return words;            }            char c = (char) intch;            if (Character.isLetter(c)) {                builder.append(c);            } else {                String value = builder.toString();                isEnd = value.equals(END);                if (builder.length() > 0 && !isEnd) {                    PriorityQueue<Integer> priorityQueue = words.get(value);                    if (priorityQueue == null) {                        priorityQueue = new PriorityQueue<Integer>(10000, new Comparator<Integer>() {                            public int compare(final Integer o1, final Integer o2) {                                return o2.compareTo(o1);                            }                        });                        words.put(value, priorityQueue);                    }                    priorityQueue.add(++position);                }                builder = new StringBuilder();            }        }        return words;    }    static class WordsDataStructure {        String maxWord;        int maxPostition = Integer.MIN_VALUE;        int minPosittion = Integer.MAX_VALUE;        public void addWord(String word, int position) {            if (position > maxPostition) {                maxPostition = position;                maxWord = word;            }            if (position < minPosittion) {                minPosittion = position;            }        }    }    static class Interval implements Comparable<Interval> {        final Integer start;        final Integer end;        Interval(final int start, final int end) {            this.end = end;            this.start = start;        }        public int compareTo(final Interval o) {            int compareResult = (end - start) - (o.end - o.start);            if (compareResult != 0) {                return compareResult;            }            return start.compareTo(o.start);        }    }}`
sith
Learning poster

Posts: 67
Joined: Sat May 19, 2012 7:46 pm

### Re: 11860-Document Analyzer

My C++ code uses a map<string,int> and an array, no priority queue. I use an algorithm similar to the one described by Leonid in this thread and get AC in 0.868 sec.
I rewrote my code in JAVA and got AC in 2.044 sec.
brianfry713
Guru

Posts: 1765
Joined: Thu Sep 01, 2011 9:09 am
Location: San Jose, CA, USA

### Re: 11860-Document Analyzer

Code: Select all
`AC`
Do or do not. There is no try.
Scarecrow
Learning poster

Posts: 69
Joined: Wed Oct 19, 2011 9:06 pm