String decompression challenge solution
I am sharing my solution in Java to one of the former Google interview questions, which is posted by Google itself here:
https://techdevguide.withgoogle.com/resources/compress-decompression/
I am copying the description below just in case the original link gets removed, and without it the solution will not have any meaning.
Contents
The Challenge
In this exercise, you’re going to decompress a compressed string.
Your input is a compressed string of the format number[string]
and the decompressed output form should be the string
written number
times. For example:
The input
3[abc]4[ab]c
Would be output as
abcabcabcababababc
Other rules
Number can have more than one digit. For example, 10[a]
is allowed, and just means aaaaaaaaaa
One repetition can occur inside another. For example, 2[3[a]b]
decompresses into aaabaaab
Characters allowed as input include digits, small English letters and brackets [ ]
.
Digits are only to represent amount of repetitions.
Letters are just letters.
Brackets are only part of syntax of writing repeated substring.
Input is always valid, so no need to check its validity.
Solution
My solution uses the approach which I find the most intuitive for this case, the recursive one. The recursive solution can also be converted to iterative by using a stack instead of recursive function calls.
We know that we need to iterate over the input string, and when we notice the encoding pattern (digits followed by opening square bracket), we have to repeat the same logic for the capture group by calling again the same function.
We do not want the called function to write to the output sb
buffer directly, because the returned value will need to get repeated and the number of repetitions is known only by the parent call, which captured the digits. For this reason we pass an empty current
buffer as output buffer.
On the other hand, as long as we are not capturing digits, we add to the current characters to the current
buffer. This buffer will be flushed to the output buffer once we encounter one opening square bracket [
.
As any recursive solution, we need to identify the base case; otherwise we will have infinite calls. The base case is that case which will not call itself anymore.
In this challenge the base case is when a string does not contain another encoded nested string, for example abcd
is a base case. It does not have any encoded string (starting with number, followed by [, then the string inside, and finally the closing bracket ]).
For all the other cases, once we notice that an encoded string is starting (started capturing digits), when we find the opening bracket (in this case this will be always after the number as per the problem description), we call the recursive function to do the same for the rest of the string. We do this until the closing bracket is found, or until another encoded string is found. If the latter happens then we call the recursive function itself to handle it.
Another important point here is that we need to skip the parts already processed by the recursive calls, and we do this by returning the index of the last processed character as the return value of the function.
The upper function in the call stack sets its i
variable value to the returned one, and then proceeds from there. The groups captured by the recursive call, are then repeated by the parent call and a and added to the output buffer sb
.
The full solution in Java is given below:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 |
package com.kristijorgji; /** * @see <a href="https://techdevguide.withgoogle.com/resources/compress-decompression/">Compression and Decompression</a> */ public class Decompress { String decompress(String input) { StringBuilder sb = new StringBuilder(); decompress(input, 0, sb); return sb.toString(); } private int decompress(String input, int start, StringBuilder sb) { StringBuilder times = new StringBuilder(); StringBuilder current = new StringBuilder(); int i = start; while (i < input.length()) { while (Character.isDigit(input.charAt(i))) { times.append(input.charAt(i)); i++; } if (input.charAt(i) == '[' && times.length() > 0) { if (current.length() > 0) { sb.append(current); current = new StringBuilder(); } i = decompress(input, i + 1, current); int repeatTimes = Integer.parseInt(times.toString()); for (int j = 0; j < repeatTimes; j++) { sb.append(current); } current = new StringBuilder(); times = new StringBuilder(); } else if (input.charAt(i) == ']') { return i + 1; } else { sb.append(input.charAt(i)); i++; } } return i; } } |
In case of questions, feel free to comment and I will be glad to answer.