One of the code optimization techniques performed by the HotSpot compiler (C2) is Loop Invariant Code Motion. Loop invariants are values that do not change during execution of a loop, and so can be moved out of the loop, speeding up loop execution. For e.g., in the following loop, the expression (x + y) doesn’t change during execution of the loop. Hence, its a loop invariant:
int x,y;
for (i = 0; i < SIZE; i++) {
a[i] = x + y;
}
Compilers can safely move the expression (x + y) out of the loop or in other words, hoist the expression above the loop as shown here:
int x,y;
int temp = x + y;
for (i = 0; i < SIZE; i++) {
a[i] = temp;
}
Now, consider the following example.
public class BusyTask implements Runnable {
private boolean stop;
public void shutdown() {
this.stop = true;
}
public void run() {
while (!stop) {
}
System.out.println("Busy task stopped");
}
}
public class ShutdownDriver {
public static void main(String[] s) {
BusyTask task = new BusyTask();
new Thread(task).start();
try {
Thread.sleep(2000);
} catch (InterruptedException e) {
e.printStackTrace();
}
task.shutdown();
System.out.println("BusyTask should stop now.");
}
}
Let's compile and run. I'll redirect the output to a file called before.log.
vkandy@www:~/workspace/Hoisting$ $DEBUG_JAVA_HOME/bin/javac -d bin src/*.java vkandy@www:~/workspace/Hoisting$ $DEBUG_JAVA_HOME/bin/java -server -XX:CompileCommand=print,*BusyTask.run -cp bin ShutdownDriver > before.log 2>/dev/null
As Jeremy Manson explains, this program is not guaranteed to terminate. An examination of code after compiler is done with optimization, shows why. The following is the JIT'd BusyTask.run() method:
000 N79: # B1 <- BLOCK HEAD IS JUNK Freq: 1
000 INT3
NOP # 3 bytes pad for loops and calls
008 B1: # B7 B2 <- BLOCK HEAD IS JUNK Freq: 1
008 # stack bang
PUSHL EBP
SUB ESP,24 # Create frame
016 MOV EBP,[ECX]
018 MOV [ESP + #0],ECX
01b CALL_LEAF,runtime OSR_migration_end
No JVM State Info
#
020 MOV ECX,[EBP + #4]
023 NullCheck EBP
023
023 B2: # B5 B3 <- B1 Freq: 0.999999
023 CMPu ECX,precise klass BusyTask: 0x0a0116c0:Constant:exact *
029 Jne,us B5 P=0.000001 C=-1.000000
029
02b B3: # B6 B4 <- B2 Freq: 0.999998
02b #checkcastPP of EBP
02b MOVZX8 ECX,[EBP + #8] # ubyte -> int ! Field BusyTask.stop
02f TEST ECX,ECX
031 Jne,s B6 P=0.000000 C=416502.000000
031
033 B4: # B4 <- B3 B4 top-of-loop Freq: 1e-35
033 TSTL #polladdr,EAX ! Safepoint: poll for GC # BusyTask::run @ bci:7 L[0]=EBP
# OopMap{ebp=Oop off=51}
039 JMP,s B4
039
03b B5: # N79 <- B2 Freq: 9.99999e-07
03b MOV ECX,#-75
040 NOP # 3 bytes pad for loops and calls
043 CALL,static wrapper for: uncommon_trap(reason='unreached' action='reinterpret')
# BusyTask::run @ bci:0 L[0]=EBP
# OopMap{ebp=Oop off=72}
048 INT3 ; ShouldNotReachHere
048
04d B6: # N79 <- B3 Freq: 4.99999e-07
04d MOV ECX,#22
052 NOP # 1 bytes pad for loops and calls
053 CALL,static wrapper for: uncommon_trap(reason='unloaded' action='reinterpret' index='22')
# BusyTask::run @ bci:10 L[0]=_
# OopMap{off=88}
058 INT3 ; ShouldNotReachHere
I've hilighted the while loop code. The loop begins at label B4 and after 3 lines we jump back to label B4 - it's actually an infinite loop! Apart from safepoint polling, there's nothing else in the loop. What happened to reading the value of stop as we wrote in BusyTask.run() method?
The answer is at label B3 (above B4). We read the value of stop and check if the value is true or false by performing an AND operation. If the value is false, we jump to label B6 and call System.out.println. We revert back to interpreter, if the call to static method (System.out.println) doesn't succeed. That's what the uncommon trap refers to.
In Java, the code at labels B3 and B4 would be something like this:
int temp = stop; // Label B3: load 'stop' field (stop == false)
if(temp1 & temp1 == 1) { // is it true or false? (temp == false)
while (true) { // Label B4:
// Safepoint polling added
}
}
System.out.println("Busy task stopped!"); // Label B6: revert back to interpreter for execution
The visibility problem
Why did the compiler reorder code this way? Because, this reordered code behaves the same way as the original version in a single-threaded environment and C2 doesn't know that we intend to change the value of stop by means of another thread. In a single-threaded program, the value of stop isn't going to change - it's a loop invariant! Hence, it can be safely moved out of the loop. So the compiler optimizes the loop by removing the need to check the value of stop each time. But by reordering code, C2 has altered the semantics of our program. The resulting program never terminates.
Formally, if an action in one thread is visible to another thread, then the result of that action can be observed by the second thread. In order to guarantee that the results of one action are observable to a second action, the first must happen before the second. Because there is no happens-before relationship between the two threads, BusyTask never sees the update to stop performed by ShutdownDriver. The compiler detects that no writes are performed to stop in BusyTask and hoists the reading of stop out of the loop, transforming it into an infinite loop.
The Fix
The keyword volatile ensures happens-before relationship between the two threads. All actions on volatiles happen in a single order, and each write to a volatile field happens before any read of that volatile that occurs later in that order. Since JSR 133, the Java Memory Model ensures this guarantee for volatile. It also places restrictions on the scope of reorderings in the presence of synchronization, and this is how we guarantee that threads can have a consistent view of variables shared across more than one thread.
So the fix is to declare stop as volatile. The other way is to synchronize all reads and writes on stop (using a common lock).
public class BusyTask implements Runnable {
private volatile boolean stop;
public void shutdown() {
this.stop = true;
}
public void run() {
while (!stop) {
}
System.out.println("Busy task stopped");
}
}
Let's compile and run. I'll redirect the output to a file called after.log.
vkandy@www:~/workspace/Hoisting$ $DEBUG_JAVA_HOME/bin/javac -d bin src/*.java vkandy@www:~/workspace/Hoisting$ $DEBUG_JAVA_HOME/bin/java -server -XX:CompileCommand=print,*BusyTask.run -cp bin ShutdownDriver > after.log 2>/dev/null
Following is BusyTask.run() code after making stop as volatile:
000 N91: # B1 <- BLOCK HEAD IS JUNK Freq: 1
000 INT3
NOP # 3 bytes pad for loops and calls
008 B1: # B7 B2 <- BLOCK HEAD IS JUNK Freq: 1
008 # stack bang
PUSHL EBP
SUB ESP,24 # Create frame
016 MOV EBX,[ECX]
018 MOV [ESP + #0],ECX
01b CALL_LEAF,runtime OSR_migration_end
No JVM State Info
#
020 MOV ECX,[EBX + #4]
023 NullCheck EBX
023
023 B2: # B6 B3 <- B1 Freq: 0.999999
023 CMPu ECX,precise klass BusyTask: 0x082ea6c0:Constant:exact *
029 Jne,us B6 P=0.000001 C=-1.000000
029
02b B3: # B5 B4 <- B2 Freq: 0.999998
02b #checkcastPP of EBX
02b LEA ECX,[EBX + #8]
02e MOVZX8 EAX,[ECX] # ubyte -> int ! Field VolatileBusyTask.stop
031 MEMBAR-acquire ! (empty encoding)
031 TEST EAX,EAX
033 Jne,s B5 P=0.000000 C=384208.000000
NOP # 11 bytes pad for loops and calls
040 B4: # B4 B5 <- B3 B4 Loop: B4-B4 inner Freq: 999998
040 TSTL #polladdr,EAX ! Safepoint: poll for GC # BusyTask::run @ bci:7 L[0]=EBX
# OopMap{ecx=Derived_oop_ebx ebx=Oop off=64}
046 MOVZX8 EDX,[ECX] # ubyte -> int ! Field VolatileBusyTask.stop
049 MEMBAR-acquire ! (empty encoding)
049 TEST EDX,EDX
04b Je,s B4 P=1.000000 C=384208.000000
04b
04d B5: # N91 <- B4 B3 Freq: 0.999998
04d MOV ECX,#22
052 NOP # 1 bytes pad for loops and calls
053 CALL,static wrapper for: uncommon_trap(reason='unloaded' action='reinterpret' index='22')
# BusyTask::run @ bci:10 L[0]=_
# OopMap{off=88}
058 INT3 ; ShouldNotReachHere
Observations
I've hilighted the while loop which begins at label B4. Observe that the field stop is now read (MOVZX8 EDX,[ECX]) and checked (TEST EDX,EDX) in the loop. If the value is false we jump back to label B4. Here's a slightly cleaned up version:
040 B4: ; Loop begins here 040 TSTL #polladdr,EAX ; Safepoint polling 046 MOVZX8 EDX,[ECX] ; Load value of stop to a temp variable 049 TEST EDX,EDX ; check the value of this temp variable 04b Je,s B4 ; If value == true, continue looping
By making stop as volatile, the compiler enforces happens-before relationship between the two threads so the writes performed on stop by ShutdownDriver are observed in Busytask.
One other thing. Take a look at the code starting at label B3 (after a slight cleanup):
02b B3: ;
02b LEA ECX,[EBX + #8] ; Read value of stop
02e MOVZX8 EAX,[ECX] ; Load value of stop to a temp variable
031 TEST EAX,EAX ; check the value of temp variable
033 Jne,s B5 ; if value == false, jump to B5
NOP
040 B4: ; loop begins here
040 TSTL #polladdr,EAX ; Safepoint polling added
046 MOVZX8 EDX,[ECX] ; Load stop to a temp variable
049 TEST EDX,EDX ; if value == true continue looping
04b Je,s B4
04d B5:
04d MOV ECX,#22 ; At this label, execution goes back
052 NOP ; to interpreter.
053 CALL,static ; call to out.println()
# BusyTask::run @ bci:10 L[0]=_ ; Values needed to restore interpreter frame
058 INT3 ; Can't come back here from interpreter
Notice that there is still a test condition before the loop is executed. Also, in the loop (at label B4), the TEST instruction is executed just before the jump back to B4. In Java, this is how this logic would look:
int temp1 = stop; // Label B3: load 'stop' field (stop == false)
if(temp1 & temp1 == 1) { // is it true or false? (temp == false)
do {
// Safepoint polling added
int temp2 = stop; // Label B4:
} while (temp2 & temp2 == 0) // Compare and loop again ...
}
System.out.println("Busy task stopped!"); // Label B5: revert back to interpreter for execution
The compiler transforms the while loop into an if block containing a do...while loop. This optimization technique is called loop inversion.
If you have any comments, suggestions or corrections please feel free to let me know.