String encoding challenge solution

Contents

Description

Given a non-empty string, encode the string such that its encoded length is the shortest.
The encoding rule is: k[encoded_string], where the encoded_string inside the square brackets is being repeated exactly k times.
Note:
  1. k will be a positive integer and encoded string will not be empty or have extra space.
  2. You may assume that the input string contains only lowercase English letters. The string’s length is at most 160.
  3. If an encoding process does not make the string shorter, then do not encode it. If there are several solutions, return any of them is fine.

Examples

Example 1:

Example 2:

Example 3:

Example 4:

Example 5:

Solution

Before I provide my solution, first I want to reference below the existing brute force solution that you can easily find online:

http://massivealgorithms.blogspot.com/2016/12/leetcode-471-encode-string-with.html

The solution above returns always the best possible encoding (by giving the shortest output possible), but it is very slow and has high time and space complexity, respectively O(N^4) and O(N^2).

I made a benchmark with an input string of 1000 characters by running 100 times each solution. My solution took in average 160ms, while the brute force one 3m 19s 316ms.

The logical steps of my solution are as follow:

  1. While iterating over the input string, I store in a  LinkedHashMap all the start and end indexes of parts which are repeated and need to get encoded in the output string.
  2. As long as one character is repeated, I count its repetitions and then add it to the replaceMap.
  3. At a certain index i, the longest string that can repeat itself has length N – i, where N is the length of the input string. Knowing this, we can start to check if any pattern is repeating itself from the index i – (N – i). If the previous value is negative, we start the check from index 0 until i.
  4. If we find a repeating pattern, we check in the already found repetitions in replaceMap, if the previous repetitions were the same as the existing one. If so, we update the replaceMap to leave only one repetition with the lowest start index, and proper repetition count.
  5. I check if encoding the found repetition(s) will actually reduce the string size or not. If yes, then this repetition is checked against all the existing other ones and only the longest one is left.
  6. The previous repetitions which use fewer characters are removed, and we can break the loop. We can break the loop because if we find a repetition starting at index X, there is no need to check the repetitions starting at index Y, when X < Y. The length of the repetition starting at X will be greater and provide us better encoding.
  7. Form the encoded output string by making us of the formed replaceMap.

This solution can of course be further optimized and cleaned up, but for now it offers an alternative point of view on this challenge.

The code and some explanatory comments are given below: