String encoding challenge solution

Contents

Description

Given a non-empty string, encode the string such that its encoded length is the shortest.

The encoding rule is: k[encoded_string], where the encoded_string inside the square brackets is being repeated exactly k times.

Note:

k will be a positive integer and encoded string will not be empty or have extra space.
You may assume that the input string contains only lowercase English letters. The string’s length is at most 160.
If an encoding process does not make the string shorter, then do not encode it. If there are several solutions, return any of them is fine.

Examples

Example 1:

Input: "aaa"
Output: "aaa"
Explanation: There is no way to encode it such that it is shorter than the input string, so we do not encode it.

Input: "aaa"

Output: "aaa"

Explanation: There is no way to encode it such that it is shorter than the input string, so we do not encode it.

Example 2:

Input: "aaaaa"
Output: "5[a]"
Explanation: "5[a]" is shorter than "aaaaa" by 1 character.

Input: "aaaaa"

Output: "5[a]"

Explanation: "5[a]" is shorter than "aaaaa" by 1 character.

Example 3:

Input: "aaaaaaaaaa"
Output: "10[a]"
Explanation: "a9[a]" or "9[a]a" are also valid solutions, both of them have the same length = 5, which is the same as "10[a]".

Input: "aaaaaaaaaa"

Output: "10[a]"

Explanation: "a9[a]" or "9[a]a" are also valid solutions, both of them have the same length = 5, which is the same as "10[a]".

Example 4:

Input: "aabcaabcd"
Output: "2[aabc]d"
Explanation: "aabc" occurs twice, so one answer can be "2[aabc]d".

Input: "aabcaabcd"

Output: "2[aabc]d"

Explanation: "aabc" occurs twice, so one answer can be "2[aabc]d".

Example 5:

Input: "abbbabbbcabbbabbbc"
Output: "2[2[abbb]c]"
Explanation: "abbbabbbc" occurs twice, but "abbbabbbc" can also be encoded to "2[abbb]c", so one answer can be "2[2[abbb]c]".

Input: "abbbabbbcabbbabbbc"

Output: "2[2[abbb]c]"

Explanation: "abbbabbbc" occurs twice, but "abbbabbbc" can also be encoded to "2[abbb]c", so one answer can be "2[2[abbb]c]".

Solution

Before I provide my solution, first I want to reference below the existing brute force solution that you can easily find online:

http://massivealgorithms.blogspot.com/2016/12/leetcode-471-encode-string-with.html

The solution above returns always the best possible encoding (by giving the shortest output possible), but it is very slow and has high time and space complexity, respectively O(N^4) and O(N^2).

I made a benchmark with an input string of 1000 characters by running 100 times each solution. My solution took in average 160ms, while the brute force one 3m 19s 316ms.

The logical steps of my solution are as follow:

While iterating over the input string, I store in a LinkedHashMap all the start and end indexes of parts which are repeated and need to get encoded in the output string.
As long as one character is repeated, I count its repetitions and then add it to the replaceMap.
At a certain index i, the longest string that can repeat itself has length N – i, where N is the length of the input string. Knowing this, we can start to check if any pattern is repeating itself from the index i – (N – i). If the previous value is negative, we start the check from index 0 until i.
If we find a repeating pattern, we check in the already found repetitions in replaceMap, if the previous repetitions were the same as the existing one. If so, we update the replaceMap to leave only one repetition with the lowest start index, and proper repetition count.
I check if encoding the found repetition(s) will actually reduce the string size or not. If yes, then this repetition is checked against all the existing other ones and only the longest one is left.
The previous repetitions which use fewer characters are removed, and we can break the loop. We can break the loop because if we find a repetition starting at index X, there is no need to check the repetitions starting at index Y, when X < Y. The length of the repetition starting at X will be greater and provide us better encoding.
Form the encoded output string by making us of the formed replaceMap.

This solution can of course be further optimized and cleaned up, but for now it offers an alternative point of view on this challenge.

The code and some explanatory comments are given below:

package com.kristijorgji;

import java.util.LinkedHashMap;

class Compress {
    String compress(String S)
    {
        final int N = S.length();
        LinkedHashMap<Integer, RepetitionTuple> replaceMap = new LinkedHashMap<>();

        for (int i = 1; i < N; i++) {
            //handle the case in which adjacent characters are equal, ex aaaaa...
            int same = 0;
            while (i < N && S.charAt(i) == S.charAt(i - 1)) { same++; i++; } //2 character for opening and closing brackets, plus the characters for the digit and the char itself if (same > 0 && getDigitsNumber(same) + 2 + 1 < same + 1) {
                replaceMap.put(i - 1 - same, new RepetitionTuple(i - 1 - same, i - same, 1, same));
            }

            /*
                handle the case of group repetitions one after each other, ex abcabc...
                search all possible matching from left side until end of string
             */
            int matchCheckStartIndex = N - i;
            int ci = Math.max(0, i - matchCheckStartIndex);

            for (;ci < i; ci++) {
                String value = S.substring(ci, i);

                int timesRepeated = 0;
                int j = i;
                int k = 0;
                while (k < value.length() && S.charAt(j) == value.charAt(k)) { j++; k++; if (k % value.length() == 0) { timesRepeated++; } } int last = ci - k; while (timesRepeated > 0 && replaceMap.containsKey(last) && replaceMap.get(last).length == k) {
                    timesRepeated++;
                    replaceMap.remove(last);
                    last -= k;
                }

                //add to repetition map only if it minimizes the chunk size
                if (k > 0 && ci + k == i && k + 3 < value.length() * (timesRepeated + 1)) {
                    /*
                        check if a previous replacement overlaps with the current one
                        and if so, decide to keep the one which minimizes more the string
                     */
                    boolean doesLongerReplacementExist = false;
                    for (int d = 0; d < ci; d++) { if (replaceMap.containsKey(d)) { RepetitionTuple rt = replaceMap.get(d); int endIndex = rt.length * (rt.times + 1) + rt.start; if (endIndex >= ci) {
                                if (rt.length * (rt.times + 1) > k * (timesRepeated + 1)) {
                                    doesLongerReplacementExist = true;
                                } else if (rt.length * (rt.times + 1) < k * (timesRepeated + 1)) {
                                    replaceMap.remove(d);
                                }
                            }
                        }
                    }

                    if (!doesLongerReplacementExist) {
                        replaceMap.put(
                                last + k,
                                new RepetitionTuple(
                                        last + k,
                                        last + 2 * k,
                                        k,
                                        timesRepeated
                                )
                        );
                        /*
                            We found a compression which uses more characters, the ones with less can be removed now
                         */
                        for (int d = last + k + 1; d < i; d++) { replaceMap.remove(d); } break; } } } } StringBuilder sb = buildCompressedString(S, replaceMap); return sb.length() > N ? S : sb.toString();
    }

    private int getDigitsNumber(int number)
    {
        int c = 1;
        while (number / 10 != 0) {
            c++;
            number /= 10;
        }
        return c;
    }

    /**
     * O(N) complexity
     */
    private StringBuilder buildCompressedString(String S, LinkedHashMap<Integer, RepetitionTuple> replaceMap)
    {
        int i = 0;
        int N = S.length();
        StringBuilder sb = new StringBuilder();
        StringBuilder capture = new StringBuilder();

        if (replaceMap.size() == 0) {
            return sb.append(S);
        }

        while (i < N) {
            while  (i < N && !replaceMap.containsKey(i)) {
                capture.append(S.charAt(i));
                i++;
            }

            sb.append(compress(capture.toString()));
            capture = new StringBuilder();

            if (replaceMap.containsKey(i)) {
                RepetitionTuple repetitionTuple = replaceMap.get(i);
                sb.append(repetitionTuple.times + 1)
                        .append("[")
                        .append(compress(S.substring(repetitionTuple.start, repetitionTuple.end)))
                        .append("]");
                i += (repetitionTuple.times + 1) * repetitionTuple.length;
                continue;
            }

            i++;
        }

        return  sb;
    }

    class RepetitionTuple {
        int start;
        int end;
        int length;
        private int times;

        RepetitionTuple(int start, int end, int length, int times) {
            this.start = start;
            this.end = end;
            this.length = length;
            this.times = times;
        }
    }
}

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

package com.kristijorgji;

import java.util.LinkedHashMap;

class Compress {

String compress(String S)

{

final int N = S.length();

LinkedHashMap<Integer, RepetitionTuple> replaceMap = new LinkedHashMap<>();

for (int i = 1; i < N; i++) {

//handle the case in which adjacent characters are equal, ex aaaaa...

int same = 0;

while (i < N && S.charAt(i) == S.charAt(i - 1)) { same++; i++; } //2 character for opening and closing brackets, plus the characters for the digit and the char itself if (same > 0 && getDigitsNumber(same) + 2 + 1 < same + 1) {

replaceMap.put(i - 1 - same, new RepetitionTuple(i - 1 - same, i - same, 1, same));

}

handle the case of group repetitions one after each other, ex abcabc...

search all possible matching from left side until end of string

int matchCheckStartIndex = N - i;

int ci = Math.max(0, i - matchCheckStartIndex);

for (;ci < i; ci++) {

String value = S.substring(ci, i);

int timesRepeated = 0;

int j = i;

int k = 0;

while (k < value.length() && S.charAt(j) == value.charAt(k)) { j++; k++; if (k % value.length() == 0) { timesRepeated++; } } int last = ci - k; while (timesRepeated > 0 && replaceMap.containsKey(last) && replaceMap.get(last).length == k) {

timesRepeated++;

replaceMap.remove(last);

last -= k;

}

//add to repetition map only if it minimizes the chunk size

if (k > 0 && ci + k == i && k + 3 < value.length() * (timesRepeated + 1)) {

check if a previous replacement overlaps with the current one

and if so, decide to keep the one which minimizes more the string

boolean doesLongerReplacementExist = false;

for (int d = 0; d < ci; d++) { if (replaceMap.containsKey(d)) { RepetitionTuple rt = replaceMap.get(d); int endIndex = rt.length * (rt.times + 1) + rt.start; if (endIndex >= ci) {

if (rt.length * (rt.times + 1) > k * (timesRepeated + 1)) {

doesLongerReplacementExist = true;

} else if (rt.length * (rt.times + 1) < k * (timesRepeated + 1)) {

replaceMap.remove(d);

}

if (!doesLongerReplacementExist) {

replaceMap.put(

last + k,

new RepetitionTuple(

last + k,

last + 2 * k,

timesRepeated

)

);

We found a compression which uses more characters, the ones with less can be removed now

for (int d = last + k + 1; d < i; d++) { replaceMap.remove(d); } break; } } } } StringBuilder sb = buildCompressedString(S, replaceMap); return sb.length() > N ? S : sb.toString();

}

private int getDigitsNumber(int number)

{

int c = 1;

while (number / 10 != 0) {

c++;

number /= 10;

}

return c;

}

/**

* O(N) complexity

private StringBuilder buildCompressedString(String S, LinkedHashMap<Integer, RepetitionTuple> replaceMap)

{

int i = 0;

int N = S.length();

StringBuilder sb = new StringBuilder();

StringBuilder capture = new StringBuilder();

if (replaceMap.size() == 0) {

return sb.append(S);

}

while (i < N) {

while (i < N && !replaceMap.containsKey(i)) {

capture.append(S.charAt(i));

i++;

}

sb.append(compress(capture.toString()));

capture = new StringBuilder();

if (replaceMap.containsKey(i)) {

RepetitionTuple repetitionTuple = replaceMap.get(i);

sb.append(repetitionTuple.times + 1)

.append("[")

.append(compress(S.substring(repetitionTuple.start, repetitionTuple.end)))

.append("]");

i += (repetitionTuple.times + 1) * repetitionTuple.length;

continue;

}

i++;

}

return sb;

}

class RepetitionTuple {

int start;

int end;

int length;

private int times;

RepetitionTuple(int start, int end, int length, int times) {

this.start = start;

this.end = end;

this.length = length;

this.times = times;

}

Categories:Code challenges

Tags:code challenges java

Kristi Jorgji

Description

Examples

Solution

Related Posts

String decompression challenge solution