- Create a platform to enable language designers and the community to validate extensions to Java.
- Encourage and facilitate empirical research of features in Java.
A Query Language for Language Designers
Java Source Code Query Languages
In this section, we give an overview of the seven query languages that we evaluate in this paper: Java Tools Language, Browse-By-Query, SOUL, JQuery, .QL, Jackpot and PMD. We selected these languages because they cover a variety of design choices and each provides an actual query language. For example, we did not select FindBugs, as it only lets programmers query source code by creating new classes based on a Java framework. We also only selected source code query languages that included a guide or a working implementation.
Java Tools Language
The Java Tools Language (JTL) is a logic-paradigm query language for selecting Java elements in a code base. The current implementation is based on an analysis of Java bytecode classes. The JTL syntax is inspired by Query-by-Example ideas in order to increase user productivity. For example, one could find all methods taking three int parameters and returning a subclass of Java's Date class using the following query:
public static D method(int, int, int), D extends* /java.util.Date;
In addition, JTL features variable binding and data flow queries.
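To make the meaning of the JTL query above concrete, the selection it describes can be approximated in plain Java with reflection. This is only a sketch, not JTL and not how JTL is implemented: the class `DateFactoryQuery`, the nested `Sample` and `Stamp` classes and the `matches` method are hypothetical names, and it assumes that `extends*` also accepts `Date` itself.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Date;
import java.util.List;

// Hypothetical sketch: approximates "public static D method(int, int, int), D extends* Date".
class DateFactoryQuery {

    // A Date subclass used only for illustration.
    static class Stamp extends Date {
        Stamp() { super(0L); }
    }

    // Sample declarations to query; only the first two should be selected.
    static class Sample {
        public static Stamp subclassDate(int y, int m, int d) { return new Stamp(); }
        public static Date plainDate(int y, int m, int d) { return new Date(0L); }
        public static Date twoArgs(int a, int b) { return new Date(0L); }
        public Date notStatic(int a, int b, int c) { return new Date(0L); }
        public static String wrongReturn(int a, int b, int c) { return ""; }
    }

    // Returns the sorted names of the methods declared in `type` that satisfy the query.
    static List<String> matches(Class<?> type) {
        Class<?>[] threeInts = {int.class, int.class, int.class};
        List<String> names = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            int mods = m.getModifiers();
            boolean publicStatic = Modifier.isPublic(mods) && Modifier.isStatic(mods);
            boolean takesThreeInts = Arrays.equals(m.getParameterTypes(), threeInts);
            // Assumption: the return type may be Date or any subclass of it.
            boolean returnsDate = Date.class.isAssignableFrom(m.getReturnType());
            if (publicStatic && takesThreeInts && returnsDate) {
                names.add(m.getName());
            }
        }
        Collections.sort(names); // getDeclaredMethods() has no guaranteed order
        return names;
    }

    public static void main(String[] args) {
        System.out.println(matches(Sample.class)); // prints [plainDate, subclassDate]
    }
}
```

The reflective version needs roughly twenty lines for what the JTL query states in one, which illustrates the productivity argument behind the Query-by-Example syntax.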
Browse-By-Query (BBQ) reads Java bytecode files and creates a database representing classes, method calls, fields, field references, string constants and string constant references. This database can then be interrogated through English-like queries. The syntax is motivated by the desire to be intuitive. For example, one could find all the methods that call a method whose name matches start by composing the following query:
methods containing calls to matching "start" methods in all classes
In addition, BBQ provides filtering mechanisms as well as set and relational operators that can be combined to compose more complex queries.
SOUL is a logic-paradigm query language. It contains an extensive predicate library called CAVA that matches queries against AST nodes of a Java program generated by the Eclipse JDT. SOUL facilitates the specification of queries by using example-driven matching of templates and structural unification to match a code excerpt with an AST node. In practice, this means a user can create a logic variable to match an AST node and reuse this variable within the query regardless of the execution path where the variable appears. For example, one could specify a query that finds instances of Scanner that are read after they were closed as follows:
if jtMethodDeclaration(?m){ public static void main(String[] args) { ?scanner := [new java.util.Scanner(?argList);] ?scanner.close(); ?; } }
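As an illustration of the kind of code such a query is meant to find, the following Java sketch closes a Scanner and then reads from it. The class and method names are hypothetical, and whether the SOUL template above matches this exact shape depends on details of the query that are not shown here.

```java
import java.util.Scanner;

// Hypothetical example of the "read after close" pattern and its correct counterpart.
class ReadAfterClose {

    // Buggy order: the scanner is closed before it is read.
    static String closeThenRead(String input) {
        Scanner scanner = new Scanner(input);
        scanner.close();
        try {
            return scanner.nextLine(); // read after close
        } catch (IllegalStateException e) {
            // Scanner rejects any read once it has been closed.
            return "IllegalStateException";
        }
    }

    // Correct order: read first, then close.
    static String readThenClose(String input) {
        Scanner scanner = new Scanner(input);
        String line = scanner.nextLine();
        scanner.close();
        return line;
    }

    public static void main(String[] args) {
        System.out.println(closeThenRead("hello")); // prints IllegalStateException
        System.out.println(readThenClose("hello")); // prints hello
    }
}
```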
JQuery is a logic-paradigm query language built on top of the logic programming language TyRuBa. The implementation of JQuery analyses the AST of a Java program by making calls to the Eclipse JDT. JQuery includes a library of predicates for querying Java elements and the relationships between them. For example, the following query finds all method declarations ?M that have at least one parameter of type Integer:
method(?M, paramType, ?PT), match(?PT, /Integer/).
.QL is an object-oriented query language. It enables programmers to query Java source code by composing queries that look like SQL. The motivation for this design choice is to reduce the barrier to entry for developers learning it. In addition, the authors argue that object-orientation provides the structure necessary for building reusable queries. An implementation called SemmleCode is available, which includes an editor and various optimisations. As an example, the following query finds all classes that declare a method equals but do not declare a method hashCode:
from Class c where c.declaresMethod("equals") and not (c.declaresMethod("hashCode")) and c.fromSource() select c.getPackage(), c
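For readers unfamiliar with .QL, the core condition of the query above can be mimicked in plain Java with reflection. This is a rough sketch under stated assumptions: the class `EqualsWithoutHashCode` and its nested sample classes are hypothetical, methods are matched by name only, and the `c.fromSource()` filter and the package column of the `select` clause are left out.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: classes that declare equals but not hashCode.
class EqualsWithoutHashCode {

    static class OnlyEquals {
        @Override public boolean equals(Object other) { return other instanceof OnlyEquals; }
    }

    static class Both {
        @Override public boolean equals(Object other) { return other instanceof Both; }
        @Override public int hashCode() { return 1; }
    }

    static class Neither { }

    // True if `type` itself (not a superclass) declares a method with this name.
    static boolean declaresMethod(Class<?> type, String name) {
        for (Method m : type.getDeclaredMethods()) {
            if (m.getName().equals(name)) {
                return true;
            }
        }
        return false;
    }

    // Returns the simple names of the candidates satisfying the condition, in input order.
    static List<String> query(Class<?>... candidates) {
        List<String> result = new ArrayList<>();
        for (Class<?> c : candidates) {
            if (declaresMethod(c, "equals") && !declaresMethod(c, "hashCode")) {
                result.add(c.getSimpleName());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(query(OnlyEquals.class, Both.class, Neither.class)); // prints [OnlyEquals]
    }
}
```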
Jackpot is a module for the NetBeans IDE for querying and transforming Java source files. Jackpot lets users query the AST of a Java program by composing rules in the form of Java expressions. In addition, one can specify variables to bind to a matching AST node. For example, the following query will match any code surrounded by a call to readLock() and readUnlock():
$document.readLock(); $statementsUnderLock$; $document.readUnlock();
PMD is a ruleset-based Java source code analyzer that identifies bugs or potential problems including dead code, duplicate code or overcomplicated expressions. PMD has an extensive archive of built-in rules that can be used to identify such problems. One can specify new rules by writing them in Java and making use of the PMD helper classes. Alternatively, one can compose custom rules via an XPath expression that queries the AST of the program to analyze. For example, the following query finds all method declarations that have at least one parameter of type Integer:
//MethodDeclarator/FormalParameters [FormalParameter/Type/ReferenceType/ClassOrInterfaceType [@Image = 'Integer']]
Use Cases
In this section, we describe the use cases examined for the evaluation. We selected use cases that are a source of language design discussions and that make use of a variety of Java features.
Final Array and Anonymous Inner Classes
Java lets programmers create inner classes, which are nested classes not declared static. There are three kinds of inner classes: non-static member classes, local classes and anonymous classes.
Inner classes have a restriction that any local variable, formal parameter, or exception parameter used but not declared in the inner class must be declared final.
However, programmers can circumvent this restriction by declaring a final array with only one element and mutating the element of the array. The following code illustrates this mechanism:
public class OutsideClass { public void methodA() { final String[] s = new String[1]; class InnerClass { public void methodB() { s[0] = "bypass"; // accepted by compiler } } } }
Use Case 1: Find occurrences of an anonymous inner class whose code references a final array variable in the enclosing scope and mutates array elements via that variable.
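Since Use Case 1 targets anonymous inner classes specifically, a small runnable sketch of that variant may help. The class `FinalArrayCounter` and the method `countRuns` are hypothetical names chosen for illustration. Note that since Java 8 the captured variable only needs to be effectively final, but it still cannot be reassigned from the inner class, so the idiom remains relevant.

```java
// Hypothetical example of the final one-element array idiom with an anonymous inner class.
class FinalArrayCounter {

    // Runs a callback `times` times and returns the count kept by the anonymous class.
    static int countRuns(int times) {
        // A plain local `int count` could not be incremented from the anonymous class,
        // so a final one-element array serves as a mutable cell.
        final int[] count = new int[1];

        Runnable increment = new Runnable() {
            @Override
            public void run() {
                count[0]++; // mutates the array element, not the final variable itself
            }
        };

        for (int i = 0; i < times; i++) {
            increment.run();
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(countRuns(3)); // prints 3
    }
}
```

A query for Use Case 1 would need to relate three facts about this code: the declaration of `count` as a final array in the enclosing method, the anonymous class body, and the write to `count[0]` inside that body.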
Generic Constructors
A constructor can have two sets of type arguments. First, a constructor can use the type parameters declared by its generic class. These types are specified after the class name: new Foo<Integer>(). Second, a constructor can declare its own type parameters. These types are specified between the new token and the class name: new <Integer> Foo<Number>(). The code below illustrates a constructor of class Foo that declares its own type parameter S, which is bounded by the class's type parameter T.
class Foo<T extends Number> {
    <S extends T> Foo() {}
}
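To show both sets of type arguments at a call site, here is a minimal runnable sketch. It uses a hypothetical variant `SeededFoo`, whose constructor takes a parameter of type S so that the constructor's type argument can also be inferred; this parameter is an assumption added for illustration and is not part of the Foo declaration above.

```java
// Hypothetical variant of Foo whose generic constructor takes an argument of type S.
class SeededFoo<T extends Number> {
    private final T value;

    // S is the constructor's own type parameter, bounded by the class parameter T.
    <S extends T> SeededFoo(S seed) {
        this.value = seed; // allowed because S is a subtype of T
    }

    T get() {
        return value;
    }
}

class GenericConstructorDemo {
    public static void main(String[] args) {
        // Both sets given explicitly: S = Integer for the constructor, T = Number for the class.
        SeededFoo<Number> explicit = new <Integer>SeededFoo<Number>(42);

        // Class type argument given, constructor type argument inferred from the argument (S = Double).
        SeededFoo<Number> inferred = new SeededFoo<Number>(3.5);

        System.out.println(explicit.get()); // prints 42
        System.out.println(inferred.get()); // prints 3.5
    }
}
```

The explicit form is rarely needed in practice because the compiler usually infers S, which is part of what makes occurrences of this feature interesting to query for.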
Capture Conversion Idiom
Overloaded Methods
Covariant Arrays
| | Final Array & Anonymous Class | Generic Constructors | Capture Conversion Idiom | Overloaded Methods | Covariant Arrays |
| | X | X | X | X | ? |
| JQuery | X | X | X | ? | X |
| .QL | | | | | |
| Jackpot | | | | | |
- doesn't detect local inner classes (local and anonymous); only inner classes in general, without differentiating between kinds: class in all classes
- no access to local variables declared in methods
- no support for generics on declarations
- no support for constructors (considered as method init)
- no AST structural matching (e.g. loops)
- no variable binding/unification
- set operators (union, intersection)
- support for read/write of field references
- variable binding through predicates
- support for read and mutation (write) of fields [writes(?B,?F,?L) means: "Block ?B writes to field ?F at location ?L"]
- no structural matching (e.g. pattern matching on a loop or the body of a method)
- no generics support
Relevant Literature
[1] Brian Goetz. Language designer's notebook: Quantitative language design.
[2] Chris Parnin, Christian Bird, and Emerson Murphy-Hill. 2011. Java generics adoption: how new features are introduced, championed, or ignored. In Proceedings of the 8th Working Conference on Mining Software Repositories (MSR '11).
[3] Ewan Tempero, Craig Anslow, Jens Dietrich, Ted Han, Jing Li, Markus Lumpe, Hayden Melton, and James Noble. 2010. The Qualitas Corpus: A Curated Collection of Java Code for Empirical Studies. In Proceedings of the 2010 Asia Pacific Software Engineering Conference (APSEC '10).
[4] Joseph Gil and Keren Lenz. 2010. The use of overloading in JAVA programs. In Proceedings of the 24th European conference on Object-oriented programming (ECOOP '10).
[5] Raoul-Gabriel Urma and Janina Voigt. 2012. Using the OpenJDK to Investigate Covariance in Java. Java Magazine, May/June 2012.
Related Projects
[a] Refactoring NG.
[b] Tal Cohen, Joseph (Yossi) Gil, and Itay Maman. 2006. JTL: the Java tools language. In Proceedings of the 21st annual ACM SIGPLAN conference on Object-oriented programming systems, languages, and applications (OOPSLA '06).
[c] Browse By Query.