Evaluating Call Graph Construction for JVM-hosted Language Implementations
An increasing number of programming languages compile to the Java Virtual Machine (JVM), and program analysis frameworks such as WALA and SOOT support a broad range of program analysis algorithms by analyzing bytecode. While this approach works well when applied to bytecode produced from Java code, its efficacy when applied to other bytecode has not been studied until now.
We present qualitative and quantitative analysis of the soundness and precision of call graphs constructed from JVM bytecodes produced for Python, Ruby, Clojure, Groovy, Scala, and OCaml applications. We show that, for Python, Ruby, Clojure, and Groovy, the call graphs are unsound due to use of reflection, invokedynamic instructions, and run-time code generation, and imprecise due to how function calls are translated. For Scala and OCaml, all unsoundness comes from rare, complex uses of reflection and proxies, and the translation of first-class features in Scala incurs a significant loss of precision.
Frank Tip is a Principal Engineer in the Frontier Computer Science Lab at Samsung Research America in San Jose, California and an Adjunct Professor at the David R. Cheriton School of Computer Science at the University of Waterloo. Previously, he was a Professor and Cheriton Research Chair in the David R. Cheriton School of Computer Science at the University of Waterloo (2012-2014), and a Research Staff Member and Manager at the Software Technology Department at the IBM T.J. Watson Research Center (1995-2012). He received his PhD in 1995 from the University of Amsterdam.