Hiroshi Yamauchi and I are giving a talk at this year's JavaOne. Part of this talk will be about our experiences writing profilers at Google, and in preparing for it, I realized my previous entry on this subject only told half the story.
The topic of the original post is that there is an undocumented JVMTI call in OpenJDK that allows you to get a stack trace from a running thread, regardless of the state of that thread. In Unix-like systems, you can use a SIGPROF signal to call this function at (semi-)regular intervals, without having to do anything to your code. The followon post, intended to describe why you would use this, described a couple of things that are true:
- That it tells you what is actually running on your CPU, not just what might be scheduled, which is what profilers that take a snapshot of every running thread do. This increases its accuracy dramatically. For example, there are many profilers that report a great deal of your time spent in blocking mechanisms, simply because threads frequently become scheduled as they exit blocking functions. However, those sorts of threads are all "ready to be scheduled" rather than "actually running". As a result, CPU time is charged to those threads inaccurately.
- That it reports time spent in GC and in other VM-specific activities (like JIT compiling). At Google, we find that substantial amounts of our users' CPU time is spent in VM activities. In fact, GC brings the occasional Google service to its knees (not mentioning any names).
Sampling profilers that take thread dumps from individual threads often do so by injecting thread stack trace gathering code into your code. This is suboptimal in many ways:
- It introduces overhead into your code. This is the most obvious way. As soon as you introduce overhead, you are changing the performance characteristics.
- Changing the size of the code changes the optimizing decisions made by the JIT. For example, one of the most important performance improvements made by a just-in-time compiler is when it inlines small methods. It won't inline if the method gets too large. Introducing sampling into the methods changes the size, and therefore affects inlining decisions, changing the performance characteristics of your code.
- Changing the size of the code affects code layout. For relatively obvious reasons of caching and memory alignment, the placement of your code can affect its performance. Bigger code often means worse performance.
(Another interesting point, as made by Todd Mytkowicz and Amer Diwan of the University of Colorado: since JIT behavior really depends on everything in the system, and most profilers are part of the system, the decision about where to put a safe point will end up depending on which profiler you are using. This can make the results of the profilers clash violently: because of their differing behaviors, the safe points end up in different places, and the profilers end up tracking different places. See Mytkowicz and Diwan's recent PLDI paper for details.)
All of this JIT talk basically mirrors the Heisenberg Uncertainty Principle, but for profiling. You can either have exact information, or you can have your code behave as it is supposed to behave, but not both. You need to minimize the effects, so you want a profiler that doesn't interfere with or depend on JIT decisions at all. AsyncGetCallTrace fits this bill - you call it, and it returns a result. You don't call it directly from your code. It doesn't wait for a safe point. It doesn't change your code. The JIT doesn't care.
ETA: I believe that the profiler bundled with Sun Studio uses the AsyncGetCallTrace method call, but I'm not exactly sure how.