String encoding challenge solution
Contents
Description
k[encoded_string]
, where the encoded_string inside the square brackets is being repeated exactly k times.- k will be a positive integer and encoded string will not be empty or have extra space.
- You may assume that the input string contains only lowercase English letters. The string’s length is at most 160.
- If an encoding process does not make the string shorter, then do not encode it. If there are several solutions, return any of them is fine.
Examples
1 2 3 |
Input: "aaa" Output: "aaa" Explanation: There is no way to encode it such that it is shorter than the input string, so we do not encode it. |
1 2 3 |
Input: "aaaaa" Output: "5[a]" Explanation: "5[a]" is shorter than "aaaaa" by 1 character. |
1 2 3 |
Input: "aaaaaaaaaa" Output: "10[a]" Explanation: "a9[a]" or "9[a]a" are also valid solutions, both of them have the same length = 5, which is the same as "10[a]". |
1 2 3 |
Input: "aabcaabcd" Output: "2[aabc]d" Explanation: "aabc" occurs twice, so one answer can be "2[aabc]d". |
1 2 3 |
Input: "abbbabbbcabbbabbbc" Output: "2[2[abbb]c]" Explanation: "abbbabbbc" occurs twice, but "abbbabbbc" can also be encoded to "2[abbb]c", so one answer can be "2[2[abbb]c]". |
Solution
Before I provide my solution, first I want to reference below the existing brute force solution that you can easily find online:
http://massivealgorithms.blogspot.com/2016/12/leetcode-471-encode-string-with.html
The solution above returns always the best possible encoding (by giving the shortest output possible), but it is very slow and has high time and space complexity, respectively O(N^4) and O(N^2).
I made a benchmark with an input string of 1000 characters by running 100 times each solution. My solution took in average 160ms, while the brute force one 3m 19s 316ms.
The logical steps of my solution are as follow:
- While iterating over the input string, I store in a
LinkedHashMap
all the start and end indexes of parts which are repeated and need to get encoded in the output string. - As long as one character is repeated, I count its repetitions and then add it to the
replaceMap
. - At a certain index i, the longest string that can repeat itself has length N – i, where N is the length of the input string. Knowing this, we can start to check if any pattern is repeating itself from the index i – (N – i). If the previous value is negative, we start the check from index 0 until i.
- If we find a repeating pattern, we check in the already found repetitions in replaceMap, if the previous repetitions were the same as the existing one. If so, we update the replaceMap to leave only one repetition with the lowest start index, and proper repetition count.
- I check if encoding the found repetition(s) will actually reduce the string size or not. If yes, then this repetition is checked against all the existing other ones and only the longest one is left.
- The previous repetitions which use fewer characters are removed, and we can break the loop. We can break the loop because if we find a repetition starting at index X, there is no need to check the repetitions starting at index Y, when X < Y. The length of the repetition starting at X will be greater and provide us better encoding.
- Form the encoded output string by making us of the formed replaceMap.
This solution can of course be further optimized and cleaned up, but for now it offers an alternative point of view on this challenge.
The code and some explanatory comments are given below:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 |
package com.kristijorgji; import java.util.LinkedHashMap; class Compress { String compress(String S) { final int N = S.length(); LinkedHashMap<Integer, RepetitionTuple> replaceMap = new LinkedHashMap<>(); for (int i = 1; i < N; i++) { //handle the case in which adjacent characters are equal, ex aaaaa... int same = 0; while (i < N && S.charAt(i) == S.charAt(i - 1)) { same++; i++; } //2 character for opening and closing brackets, plus the characters for the digit and the char itself if (same > 0 && getDigitsNumber(same) + 2 + 1 < same + 1) { replaceMap.put(i - 1 - same, new RepetitionTuple(i - 1 - same, i - same, 1, same)); } /* handle the case of group repetitions one after each other, ex abcabc... search all possible matching from left side until end of string */ int matchCheckStartIndex = N - i; int ci = Math.max(0, i - matchCheckStartIndex); for (;ci < i; ci++) { String value = S.substring(ci, i); int timesRepeated = 0; int j = i; int k = 0; while (k < value.length() && S.charAt(j) == value.charAt(k)) { j++; k++; if (k % value.length() == 0) { timesRepeated++; } } int last = ci - k; while (timesRepeated > 0 && replaceMap.containsKey(last) && replaceMap.get(last).length == k) { timesRepeated++; replaceMap.remove(last); last -= k; } //add to repetition map only if it minimizes the chunk size if (k > 0 && ci + k == i && k + 3 < value.length() * (timesRepeated + 1)) { /* check if a previous replacement overlaps with the current one and if so, decide to keep the one which minimizes more the string */ boolean doesLongerReplacementExist = false; for (int d = 0; d < ci; d++) { if (replaceMap.containsKey(d)) { RepetitionTuple rt = replaceMap.get(d); int endIndex = rt.length * (rt.times + 1) + rt.start; if (endIndex >= ci) { if (rt.length * (rt.times + 1) > k * (timesRepeated + 1)) { doesLongerReplacementExist = true; } else if (rt.length * (rt.times + 1) < k * (timesRepeated + 1)) { replaceMap.remove(d); } } } } if (!doesLongerReplacementExist) { replaceMap.put( last + k, new RepetitionTuple( last + k, last + 2 * k, k, timesRepeated ) ); /* We found a compression which uses more characters, the ones with less can be removed now */ for (int d = last + k + 1; d < i; d++) { replaceMap.remove(d); } break; } } } } StringBuilder sb = buildCompressedString(S, replaceMap); return sb.length() > N ? S : sb.toString(); } private int getDigitsNumber(int number) { int c = 1; while (number / 10 != 0) { c++; number /= 10; } return c; } /** * O(N) complexity */ private StringBuilder buildCompressedString(String S, LinkedHashMap<Integer, RepetitionTuple> replaceMap) { int i = 0; int N = S.length(); StringBuilder sb = new StringBuilder(); StringBuilder capture = new StringBuilder(); if (replaceMap.size() == 0) { return sb.append(S); } while (i < N) { while (i < N && !replaceMap.containsKey(i)) { capture.append(S.charAt(i)); i++; } sb.append(compress(capture.toString())); capture = new StringBuilder(); if (replaceMap.containsKey(i)) { RepetitionTuple repetitionTuple = replaceMap.get(i); sb.append(repetitionTuple.times + 1) .append("[") .append(compress(S.substring(repetitionTuple.start, repetitionTuple.end))) .append("]"); i += (repetitionTuple.times + 1) * repetitionTuple.length; continue; } i++; } return sb; } class RepetitionTuple { int start; int end; int length; private int times; RepetitionTuple(int start, int end, int length, int times) { this.start = start; this.end = end; this.length = length; this.times = times; } } } |