Parsing incomplete Java source code

Question

For a particular problem, I need to parse a Java source code fragment that may be incomplete. For example, the code may refer to variables that are not defined in the fragment.

In that case, I would still like to parse the incomplete Java code, transform it into a convenient, inspectable representation, and be able to generate source code back from that abstract representation.

What is the right tool for this? In this post I found suggestions to use ANTLR, JavaCC, or the Eclipse JDT. However, I did not find anything about dealing with incomplete Java source code fragments, hence this question (the linked question is also more than two years old, so I am wondering whether anything new has appeared since).

As an example, the code could be something like the following expression:

"myMethod(aVarName)"

In that case, I would like to be able to somehow detect that the variable aVarName is referenced in the code.

Solution

Uhm... This question does not have anything even vaguely like a simple answer. Any of the parser technologies above will let you do what you want, provided you write the right grammar and set the parser up for error recovery: falling back when a rule fails and passing over tokens it does not recognize.

The least amount of work to get you where you're going is either to use ANTLR, which has resumable parsing and comes with a reasonably complete Java 7 grammar, or to see what you can pull out of the Eclipse JDT (which the Eclipse IDE uses for its error and intention annotations and syntax highlighting).
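For the "myMethod(aVarName)" example from the question, a minimal sketch of the idea could look like the code below. Note that it uses neither ANTLR nor the JDT but the JDK's own Compiler Tree API (com.sun.source), chosen only because it needs no extra dependency; it assumes the code runs on a full JDK. The names FragmentNames, Wrapper and fragment are made up for illustration. Since the task stops after parsing, no symbols are resolved, so an undefined variable is not an error at this stage.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.IdentifierTree;
import com.sun.source.tree.MethodInvocationTree;
import com.sun.source.tree.VariableTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;

import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.net.URI;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

public class FragmentNames {

    /** Returns the simple names an expression fragment refers to (syntax only). */
    static Set<String> referencedNames(String expression) throws IOException {
        // Assumption: the fragment is an expression. javac parses whole
        // compilation units, so embed it as a field initializer in a
        // hypothetical wrapper class.
        String source = "class Wrapper { Object fragment = " + expression + "; }";
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Wrapper.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };

        // Returns null on a JRE without the compiler module; a JDK is assumed.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, diagnostic -> { }, null, null,
                Collections.singletonList(file));

        Set<String> names = new LinkedHashSet<>();
        // parse() builds syntax trees only; names are never looked up.
        for (CompilationUnitTree unit : task.parse()) {
            new TreeScanner<Void, Void>() {
                @Override
                public Void visitVariable(VariableTree node, Void p) {
                    // Skip the wrapper field's declared type; scan the fragment only.
                    return scan(node.getInitializer(), p);
                }

                @Override
                public Void visitMethodInvocation(MethodInvocationTree node, Void p) {
                    // An unqualified method name is an identifier too; skip it.
                    if (!(node.getMethodSelect() instanceof IdentifierTree)) {
                        scan(node.getMethodSelect(), p);
                    }
                    return scan(node.getArguments(), p);
                }

                @Override
                public Void visitIdentifier(IdentifierTree node, Void p) {
                    names.add(node.getName().toString());
                    return null;
                }
            }.scan(unit, null);
        }
        return names;
    }
}
```

Calling FragmentNames.referencedNames("myMethod(aVarName)") should give a set containing aVarName. Keep in mind that a purely syntactic pass like this cannot tell a variable from a class name (for instance Math in Math.max(a, b)); that distinction needs symbol information the fragment does not provide.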

Note that none of this stuff is easy: you're writing thousands of lines of code, not just importing a class and telling it to go.

Past a certain degree of incorrectness or incompleteness, all of these strategies will fail, simply because no computer (or person, for that matter) can discern what you mean unless you say it at least approximately correctly.
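One way to notice when a fragment has crossed that line is to ask the parser how many syntax errors it hit. The sketch below does this with the JDK's Compiler Tree API (again a stand-in for ANTLR or the JDT, used only because it ships with the JDK); FragmentSyntaxCheck, Wrapper and fragment are hypothetical names, and the fragment is assumed to be an expression.

```java
import com.sun.source.util.JavacTask;

import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.net.URI;
import java.util.Collections;

public class FragmentSyntaxCheck {

    /** Counts the errors javac reports while only parsing an expression fragment. */
    static long syntaxErrorCount(String expression) throws IOException {
        // Hypothetical wrapper so the fragment forms a parsable compilation unit.
        String source = "class Wrapper { Object fragment = " + expression + "; }";
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Wrapper.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };

        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // needs a JDK
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, diagnostics,
                Collections.singletonList("-proc:none"), // no annotation processing
                null, Collections.singletonList(file));

        task.parse(); // syntax only: undefined names are not reported here

        return diagnostics.getDiagnostics().stream()
                .filter(d -> d.getKind() == Diagnostic.Kind.ERROR)
                .count();
    }
}
```

A fragment such as "myMethod(aVarName)" parses without errors even though nothing in it is defined, while one with a missing closing parenthesis is reported as a syntax error. How much of the tree is still usable after such an error depends on the parser's recovery strategy.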