I have spent too much time learning Java's many libraries to actually hate it. I know Perl well enough that I am reluctant to learn other scripting languages like Python. Similarly, if I can write CGI scripts in Perl, why learn PHP or Ruby (on Rails or otherwise)?
I had a strange out-of-memory error on the heap (java.lang.OutOfMemoryError), and reviewing the Java code turned up nothing obvious that hinted at a memory leak. It turned out that my regular expression matching was causing the memory problem, and that in turn was caused by String.substring.
If you look at the Java source code, in java/lang/String.java, you will notice some comments on the
public String(String string)
implementation. Why would anyone want to create a String from another String? Here is the reason. When you take a substring of a string in Java, it doesn’t actually create a separate array to store that substring. Instead, the substring shares the character array of the original string and uses an offset and a length to track its own portion. This kind of implementation is possible in Java because strings are immutable. The catch is that as long as the substring is reachable, the whole original array stays in memory too.
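To make the sharing concrete, here is a minimal sketch of that layout. SharedString and its methods are hypothetical names made up for illustration, not the real java.lang.String code. It models the behaviour of older JDKs; as far as I know the sharing was removed in JDK 7 update 6, where substring started copying its characters.

```java
import java.util.Arrays;

// Simplified model of a string that shares its backing array with substrings.
// SharedString is a hypothetical class, not the actual JDK implementation.
final class SharedString {
    private final char[] value;  // backing array, possibly shared with other instances
    private final int offset;    // index in value where this string starts
    private final int count;     // number of characters in this string

    SharedString(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Like the old substring: no copy, the result points into the same array.
    SharedString substring(int begin, int end) {
        return new SharedString(value, offset + begin, end - begin);
    }

    // Like new String(String): copies only the characters actually in use.
    SharedString compactCopy() {
        char[] copy = Arrays.copyOfRange(value, offset, offset + count);
        return new SharedString(copy, 0, count);
    }

    // Size of the array this instance keeps reachable.
    int retainedChars() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}
```

In this model, a two-character substring of a 30-character page still retains all 30 characters, while its compactCopy() retains only two.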
In my use case, I was fetching a bunch of large HTML pages, doing some pattern matching, extracting some tokens and keeping them in an array. The tokens themselves are small, which is why my initial code review assumed I was consuming very little memory. But because each token was a substring of an entire HTML page, every token kept its whole page in memory, and the memory consumption turned out to be very high. The tokens were actually the return values of java.util.regex.Matcher.group(1). So, instead of directly adding the return value to the array, I created a new string
String token = new String(matcher.group(1));
and then added it. This solved the memory problem.
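Put together, the extraction loop looks roughly like the sketch below. The TokenExtractor class, the extract method and the title pattern are stand-ins I made up for illustration; my real patterns were different. The only part that matters is the new String(...) around matcher.group(1).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that collects small tokens out of a large HTML page.
final class TokenExtractor {
    // Illustrative pattern only: captures the text inside each <title> element.
    private static final Pattern TITLE = Pattern.compile("<title>(.*?)</title>");

    static List<String> extract(String html) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TITLE.matcher(html);
        while (matcher.find()) {
            // Defensive copy: on JDKs where group() returns a substring that
            // shares the page's array, this keeps only the token's characters.
            tokens.add(new String(matcher.group(1)));
        }
        return tokens;
    }
}
```

Calling TokenExtractor.extract(page) on each fetched page returns tokens that no longer hold a reference to the page, so the page can be garbage collected once it goes out of scope.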
Granted, you usually don’t need to think about memory management when using Java, since the garbage collector takes care of things for you. But now and then you run into this type of issue, which requires a little more digging (not the social type).