Apparently there is a limit to the size of an initialisation string in javac. Can anyone help me in identifying what the maximum limit is please?
Thank you
edit:
We are building an initialisation string which will look something like this "{1,2,3,4,5,6,7,8......}" but ideally with 10,000 numbers. When we do this for 1,000 numbers it works; with 10,000 it throws an error saying "code too large for try statement".
To produce this we are using a StringBuilder and looping over an array, appending the values. Apparently it is a limitation in javac. We have been told that we could rebuild the array in the method we are invoking if we pass it in small chunks. This however is not possible because we don't have control over the user method we are invoking.
I would like to post code but can't because this is a project for University. I am not looking for code solutions just some help in understanding what the actual problem here is.
It's the for loop that is the offender:
Object o = new Object()
{
    public String toString()
    {
        StringBuilder s = new StringBuilder();
        int length = MainInterfaceProcessor.this.valuesFromData.length;
        Object[] arrayToProcess = MainInterfaceProcessor.this.valuesFromData;
        if(length == 0)
        {
            //TODO: throw exception
        }
        else if(length == 1)
        {
            s.append("{" + Integer.toString((Integer)arrayToProcess[0])+"}");
        }
        else
        {
            s.append("{" + Integer.toString((Integer)arrayToProcess[0])+","); //opening statement
            for(int i = 1; i < length; i++)
            {
                if(i == (length - 1))
                {
                    //last element in the array so don't add comma at the end
                    s.append(getArrayItemAsString(arrayToProcess, i)+"}");
                    break;
                }
                //append each array value at position i, followed
                //by a comma to separate the values
                s.append(getArrayItemAsString(arrayToProcess, i)+ ",");
            }
        }
        return s.toString();
    }
};
try
{
    Object result = method.invoke(obj, new Object[] { o });
}
The length of a String literal (i.e. "...") is limited by the class file format's CONSTANT_Utf8_info structure, which is referred to by the CONSTANT_String_info structure.
CONSTANT_Utf8_info {
    u1 tag;
    u2 length;
    u1 bytes[length];
}
The limiting factor here is the length attribute, which is only 2 bytes large, i.e. has a maximum value of 65535.
This number corresponds to the number of bytes in a modified UTF-8 representation of the string (this is actually almost CESU-8, but the 0 character is also represented in a two-byte form).
So a pure ASCII string literal can have up to 65535 characters, while a string consisting of characters in the range U+0800 ... U+FFFF can have only one third of that (21845), since each of them takes 3 bytes. Characters encoded as surrogate pairs in UTF-16 (i.e. U+10000 to U+10FFFF) take up 6 bytes each (real UTF-8 would take 4 here).
(The same limit is there for identifiers, i.e. class, method and variable names, and type descriptors for these, since they use the same structure.)
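To make the byte counting above concrete, here is a minimal sketch that computes how many bytes a string would occupy in modified UTF-8 and checks it against the 65535 limit. The class and method names (ModifiedUtf8, encodedLength, fitsInClassFile) are made up for this illustration; they are not part of javac or the JDK.

```java
final class ModifiedUtf8 {
    // Maximum value of the u2 length field in CONSTANT_Utf8_info.
    static final int MAX_BYTES = 65535;

    /** Number of bytes the string occupies in modified UTF-8. */
    static long encodedLength(String s) {
        long bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                bytes += 1;          // plain ASCII, except NUL
            } else if (c <= 0x07FF) {
                bytes += 2;          // U+0080..U+07FF, and NUL (U+0000)
            } else {
                bytes += 3;          // U+0800..U+FFFF, including each surrogate half
            }
        }
        return bytes;
    }

    /** True if the string could be stored in a single CONSTANT_Utf8_info. */
    static boolean fitsInClassFile(String s) {
        return encodedLength(s) <= MAX_BYTES;
    }

    public static void main(String[] args) {
        String smiley = new String(Character.toChars(0x1F600)); // one surrogate pair
        System.out.println(encodedLength("abc"));
        System.out.println(encodedLength("\u20ac"));
        System.out.println(encodedLength(smiley));
    }
}
```

Because the loop walks over UTF-16 chars, a surrogate pair is counted as two 3-byte units, which is where the 6 bytes come from.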
The Java Language Specification does not mention any limit for string literals:
A string literal consists of zero or more characters enclosed in double quotes.
So in principle a compiler could split a longer string literal into more than one CONSTANT_String_info structure and reconstruct it at runtime by concatenation (and .intern()-ing the result). I have no idea if any compiler actually does this.
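As an illustration of that idea only (not of what any real compiler does), the following sketch splits a long string into chunks that each fit into one CONSTANT_Utf8_info, then rebuilds and interns it at runtime. LiteralSplitter and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class LiteralSplitter {
    static final int MAX_BYTES = 65535;

    // Modified UTF-8 size of a single UTF-16 char.
    static int encodedSize(char c) {
        if (c >= 0x0001 && c <= 0x007F) return 1;
        if (c <= 0x07FF) return 2;
        return 3;
    }

    /** Splits s into chunks whose modified UTF-8 form is at most MAX_BYTES each. */
    static List<String> split(String s) {
        List<String> chunks = new ArrayList<String>();
        int start = 0;
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            int size = encodedSize(s.charAt(i));
            if (bytes + size > MAX_BYTES) {
                chunks.add(s.substring(start, i)); // close the current chunk
                start = i;
                bytes = 0;
            }
            bytes += size;
        }
        chunks.add(s.substring(start)); // last (possibly empty) chunk
        return chunks;
    }

    /** What the generated code would do at runtime: concatenate and intern. */
    static String rejoin(List<String> chunks) {
        StringBuilder sb = new StringBuilder();
        for (String chunk : chunks) {
            sb.append(chunk);
        }
        return sb.toString().intern();
    }
}
```

Interning the result keeps the usual guarantee that equal string literals refer to the same String object.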
As the edit to the question shows, the problem is related not to string literals but to array initializers.
When passing an object to BMethod.invoke (and similarly to BConstructor.newInstance), it can be a BObject (i.e. a wrapper around an existing object, in which case the wrapped object is passed), a String (which is passed as is), or anything else. In the last case, the object is converted to a string (by toString()), and this string is then interpreted as a Java expression.
To do this, BlueJ will wrap this expression in a class/method and compile this method. In the method, the array initializer is simply converted to a long list of array assignments ... and this finally makes the method longer than the maximum bytecode size of a Java method:
The value of the code_length item must be less than 65536.
This is why it breaks for longer arrays.
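A rough back-of-the-envelope estimate shows why 1,000 elements fit and 10,000 do not. The sketch below assumes the usual translation of each int element into the instructions dup, push index, push value, iastore; the class InitializerSizeEstimate is hypothetical and the result is an approximation, not the exact size javac or BlueJ would produce.

```java
final class InitializerSizeEstimate {
    /** Approximate size in bytes of the instruction that pushes an int constant. */
    static int pushSize(int v) {
        if (v >= -1 && v <= 5) return 1;                             // iconst_m1 .. iconst_5
        if (v >= Byte.MIN_VALUE && v <= Byte.MAX_VALUE) return 2;    // bipush
        if (v >= Short.MIN_VALUE && v <= Short.MAX_VALUE) return 3;  // sipush
        return 3;                                                    // ldc_w (ldc would be 2)
    }

    /** Approximate bytecode size of "new int[] { values... }". */
    static long estimate(int[] values) {
        long bytes = pushSize(values.length) + 2; // push length, then newarray int
        for (int i = 0; i < values.length; i++) {
            bytes += 1;                   // dup
            bytes += pushSize(i);         // array index
            bytes += pushSize(values[i]); // element value
            bytes += 1;                   // iastore
        }
        return bytes;
    }

    /** Helper producing the array {1, 2, ..., n} used in the question. */
    static int[] oneTo(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i + 1;
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println("1,000 elements:  " + estimate(oneTo(1000)) + " bytes");
        System.out.println("10,000 elements: " + estimate(oneTo(10000)) + " bytes");
    }
}
```

With most indices and values needing a 3-byte sipush, each element costs about 8 bytes, so 10,000 elements come to roughly 80,000 bytes, above the 65535 limit, while 1,000 elements stay far below it.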
So, to pass larger arrays, we have to find some other way to pass them to BMethod.invoke. The BlueJ extension API has no way to create or access arrays wrapped in a BObject.
One idea we found in chat is this:
Create a new class inside the project (or in a new project, if they can interoperate), something like this:
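import java.util.ArrayList;

public class IntArrayBuilder {
    // must be initialized, otherwise addElement throws a NullPointerException
    private ArrayList<Integer> list = new ArrayList<Integer>();

    public void addElement(int el) {
        list.add(el);
    }

    public int[] makeArray() {
        int[] array = new int[list.size()];
        for(int i = 0; i < array.length; i++) {
            array[i] = list.get(i);
        }
        return array;
    }
}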
(This is for the case of creating an int[]; if you need other types of array too, it can also be made more generic. Also, it could be made more efficient by using an internal int[] as storage, enlarging it occasionally as it grows, with makeArray doing a final arraycopy. This is a sketch, so it is the simplest implementation.)
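The more efficient variant mentioned in the note could look roughly like this. It is only a sketch: the name GrowingIntArrayBuilder, the initial capacity of 16 and the doubling strategy are choices made for this example.

```java
import java.util.Arrays;

final class GrowingIntArrayBuilder {
    private int[] storage = new int[16]; // internal buffer, grows on demand
    private int size = 0;                // number of elements actually stored

    public void addElement(int el) {
        if (size == storage.length) {
            // buffer is full: double its capacity, keeping the existing elements
            storage = Arrays.copyOf(storage, storage.length * 2);
        }
        storage[size++] = el;
    }

    public int[] makeArray() {
        // copy only the used part of the buffer into a right-sized array
        int[] result = new int[size];
        System.arraycopy(storage, 0, result, 0, size);
        return result;
    }
}
```

This avoids boxing every element into an Integer. To use it from the extension in place of IntArrayBuilder, it would presumably have to be declared public in the BlueJ project, like the class above.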
From our extension, create an object of this class, and add elements to this object by calling its addElement method:
// pkg is assumed to be the BPackage that contains IntArrayBuilder
// ("package" is a reserved word and cannot be used as a variable name).
BObject arrayToBArray(int[] a) {
    BClass builderClass = pkg.getBClass("IntArrayBuilder");
    BObject builder = builderClass.getConstructor(new Class<?>[0]).newInstance(new Object[0]);
    BMethod addMethod = builderClass.getMethod("addElement", new Class<?>[]{int.class});
    for(int e : a) {
        addMethod.invoke(builder, new Object[]{ e });
    }
    // look up makeArray (not addElement) to get the finished array
    BMethod makeMethod = builderClass.getMethod("makeArray", new Class<?>[0]);
    BObject bArray = (BObject)makeMethod.invoke(builder, new Object[0]);
    return bArray;
}
(For efficiency, the BClass/BMethod objects could actually be retrieved once and cached instead of once for each array conversion.)
If you generate the array's contents by some algorithm, you can do this generation here instead of first creating another wrapping object.
In our extension, call the method we actually want to call with the long array, passing our wrapped array:
Object result = method.invoke(obj, new Object[] { bArray });