Benedikt Meurer JavaScript Engine Hacker and Programming Language Enthusiast.

JavaScript Optimization Patterns (Part 2)

Following up on part one of this series last week, here's another (hopefully interesting) episode about optimization patterns for JavaScript (based on my background working on the V8 engine for more than four years). This week we're going to look into an optimization called Function Context Specialization, which we introduced to V8 with TurboFan (other engines like JavaScriptCore implement similar optimizations). The name is a bit misleading. What it essentially does is allow TurboFan to constant-fold certain values when generating optimized code, and it does that by specializing the generated machine code for a function to its surrounding context (which is V8 speak for the runtime representation of scope).

Consider the following simple code snippet:

const INCREMENT = 1;

function incr(x) { return x + INCREMENT; }

Assume that we run this at <script> level in Chrome (or at top level in the d8 shell); then we see the following bytecode generated for the function incr:

$ out/Release/d8 --print-bytecode ex1.js
...SNIP...
[generating bytecode for function: incr]
Parameter count 2
Frame size 0
   35 E> 0x1859bd52f4fe @    0 : 92                StackCheck
   41 S> 0x1859bd52f4ff @    1 : 13 04             LdaImmutableCurrentContextSlot [4]
   52 E> 0x1859bd52f501 @    3 : 97 00             ThrowReferenceErrorIfHole [0]
   50 E> 0x1859bd52f503 @    5 : 2b 02 03          Add a0, [3]
   63 S> 0x1859bd52f506 @    8 : 96                Return
Constant pool (size = 1)
0x1859bd52f4b1: [FixedArray] in OldSpace
 - map = 0x2f062f402309 <Map(PACKED_HOLEY_ELEMENTS)>
 - length: 1
           0: 0x1859bd52ef11 <String[9]: INCREMENT>
Handler Table (size = 16)

The interesting bit here is the access to the constant INCREMENT on script scope: it is loaded from the surrounding context via the LdaImmutableCurrentContextSlot bytecode, and the loaded value is then immediately checked against what we call the_hole in V8; the_hole is an internal marker that is used to implement the temporal dead zone for lexical scoping (see Variables and scoping in ECMAScript 6 by Axel Rauschmayer for details on this). This is a bit counter-intuitive to many developers that I talk to, as the intuition is that the VM needs to do less work for const than for var, especially inside of local scopes, but the reality is that, at least initially, the VM needs to do even more work because of the additional TDZ (temporal dead zone) check. This is necessary because of the way scoping works. For example, let's look at ex2.js:

console.log(incr(5));

const INCREMENT = 1;

function incr(x) { return x + INCREMENT; }

And run it in the d8 shell:

$ out/Release/d8 ex2.js
ex2.js:5: ReferenceError: INCREMENT is not defined
function incr(x) { return x + INCREMENT; }
                              ^
ReferenceError: INCREMENT is not defined
    at incr (ex2.js:5:31)
    at ex2.js:1:13

What happens here is that the TDZ check fails, because the assignment const INCREMENT = 1 wasn't executed before incr was run. I have to admit that even though I've been working on the VM side of this for quite a while, I still find this behavior highly counter-intuitive, but I also don't consider myself a very good language designer... OK, ranting aside. Looking at the example again, it obviously works if you put the call to incr last (ex3.js):

const INCREMENT = 1;

function incr(x) { return x + INCREMENT; }

console.log(incr(5));

and run that in the d8 shell:

$ out/Release/d8 ex3.js
6

So much for the background on the temporal dead zone.
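The same TDZ behavior can be observed from plain JavaScript without any d8 flags. The following is a minimal sketch; the helper names probeTdz and readIncrement are made up for illustration, and the binding is placed inside a function so that the snippet is self-contained:

```javascript
// Records what happens when INCREMENT is read before and after its
// initialization. probeTdz and readIncrement are hypothetical helper names.
function probeTdz() {
  const observations = [];
  try {
    // readIncrement is hoisted, but INCREMENT is still in its TDZ here.
    observations.push(readIncrement());
  } catch (error) {
    observations.push(error.name);
  }
  const INCREMENT = 1; // from here on the binding is initialized
  function readIncrement() { return INCREMENT; }
  observations.push(readIncrement());
  return observations;
}

console.log(probeTdz()); // logs [ 'ReferenceError', 1 ]
```

The first read fails for the same reason ex2.js fails; the second read succeeds because the initialization has run by then.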

Performance-wise there's one very interesting (and maybe obvious) observation here: once a particular const slot in a context is assigned, it will keep that value and will never contain the_hole again (that's what const guarantees). We use exactly this fact in TurboFan to avoid loading and checking const slot values each time. Consider ex4.js:

const INCREMENT = 1;

function incr(x) { return x + INCREMENT; }

// Warmup
incr(3);
incr(4);
%OptimizeFunctionOnNextCall(incr);
console.log(incr(5));

We can see this in the optimized machine code that is generated by TurboFan:

$ out/Release/d8 --allow-natives-syntax --print-opt-code --code-comments ex4.js
...SNIP...
                  -- B0 start (construct frame) --
0x11e35a6041e0     0  55             push rbp
0x11e35a6041e1     1  4889e5         REX.W movq rbp,rsp
0x11e35a6041e4     4  56             push rsi
0x11e35a6041e5     5  57             push rdi
0x11e35a6041e6     6  493ba5680c0000 REX.W cmpq rsp,[r13+0xc68]
0x11e35a6041ed     d  0f862a000000   jna 0x11e35a60421d  <+0x3d>
                  -- B2 start --
                  -- B3 start (deconstruct frame) --
0x11e35a6041f3    13  488b4510       REX.W movq rax,[rbp+0x10]
0x11e35a6041f7    17  a801           test al,0x1
0x11e35a6041f9    19  0f8535000000   jnz 0x11e35a604234  <+0x54>
0x11e35a6041ff    1f  488bd8         REX.W movq rbx,rax
0x11e35a604202    22  48c1eb20       REX.W shrq rbx, 32
0x11e35a604206    26  83c301         addl rbx,0x1
0x11e35a604209    29  0f802a000000   jo 0x11e35a604239  <+0x59>
0x11e35a60420f    2f  48c1e320       REX.W shlq rbx, 32
0x11e35a604213    33  488bc3         REX.W movq rax,rbx
0x11e35a604216    36  488be5         REX.W movq rsp,rbp
0x11e35a604219    39  5d             pop rbp
0x11e35a60421a    3a  c21000         ret 0x10
...SNIP...

The only really interesting line here is the one at offset 26 with the instruction addl rbx,0x1, where rbx contains the integer value of the parameter x passed to the function (based on the fact that we warmed up incr with integer values for x before), and the 0x1 is the constant-folded value of the INCREMENT constant from the surrounding context. The constant-folding in this case is only valid because TurboFan knows that no one can change the value of INCREMENT anymore once it's no longer the_hole (i.e. outside the TDZ). Actually, it's not TurboFan that figures this out; rather, the Ignition interpreter forwards this information to TurboFan via the dedicated bytecode LdaImmutableCurrentContextSlot that we saw earlier. Specifically, it's the immutable bit in this bytecode that tells TurboFan that the context slot cannot change anymore once it contains a non-hole value. We can see the difference when we try the same example with let:

let INCREMENT = 1;

function incr(x) { return x + INCREMENT; }

// Warmup
incr(3);
incr(4);
%OptimizeFunctionOnNextCall(incr);
console.log(incr(5));

Running this ex5.js code in the d8 shell and inspecting both the bytecode and the optimized machine code looks like this:

$ out/Release/d8 --print-bytecode --allow-natives-syntax --print-opt-code --code-comments ex5.js
...SNIP...
[generating bytecode for function: incr]
Parameter count 2
Frame size 0
   33 E> 0xa9399d2f63e @    0 : 92                StackCheck
   39 S> 0xa9399d2f63f @    1 : 12 04             LdaCurrentContextSlot [4]
   50 E> 0xa9399d2f641 @    3 : 97 00             ThrowReferenceErrorIfHole [0]
   48 E> 0xa9399d2f643 @    5 : 2b 02 03          Add a0, [3]
   61 S> 0xa9399d2f646 @    8 : 96                Return
...SNIP...
                  -- B0 start (construct frame) --
0x25139be041e0     0  55             push rbp
0x25139be041e1     1  4889e5         REX.W movq rbp,rsp
0x25139be041e4     4  56             push rsi
0x25139be041e5     5  57             push rdi
0x25139be041e6     6  4883ec08       REX.W subq rsp,0x8
0x25139be041ea     a  493ba5680c0000 REX.W cmpq rsp,[r13+0xc68]
0x25139be041f1    11  0f864b000000   jna 0x25139be04242  <+0x62>
                  -- B2 start --
                  -- B3 start --
0x25139be041f7    17  48b8d1f4d299930a0000 REX.W movq rax,0xa9399d2f4d1    ;; object: 0xa9399d2f4d1 <FixedArray[5]>
0x25139be04201    21  488b402f       REX.W movq rax,[rax+0x2f]
0x25139be04205    25  493945a8       REX.W cmpq [r13-0x58],rax
0x25139be04209    29  0f844a000000   jz 0x25139be04259  <+0x79>
                  -- B4 start (deconstruct frame) --
0x25139be0420f    2f  488b5d10       REX.W movq rbx,[rbp+0x10]
0x25139be04213    33  f6c301         testb rbx,0x1
0x25139be04216    36  0f8564000000   jnz 0x25139be04280  <+0xa0>
0x25139be0421c    3c  a801           test al,0x1
0x25139be0421e    3e  0f8561000000   jnz 0x25139be04285  <+0xa5>
0x25139be04224    44  48c1e820       REX.W shrq rax, 32
0x25139be04228    48  488bd3         REX.W movq rdx,rbx
0x25139be0422b    4b  48c1ea20       REX.W shrq rdx, 32
0x25139be0422f    4f  03c2           addl rax,rdx
0x25139be04231    51  0f8053000000   jo 0x25139be0428a  <+0xaa>
0x25139be04237    57  48c1e020       REX.W shlq rax, 32
0x25139be0423b    5b  488be5         REX.W movq rsp,rbp
0x25139be0423e    5e  5d             pop rbp
0x25139be0423f    5f  c21000         ret 0x10
...SNIP...

Here we see that Ignition has to use LdaCurrentContextSlot, i.e. it cannot prove that the value of INCREMENT will not change afterwards, because any other script could just modify INCREMENT later. As such, TurboFan cannot constant-fold the value 1, but instead has to generate explicit code to load INCREMENT from the script context and check that it's not the_hole (the code between offsets 17 and 2f in the listing above does that).
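To make the difference between the two code shapes more tangible, here is a toy model in plain JavaScript. It is only a sketch of the idea and not how V8 is implemented: THE_HOLE, loadChecked, makeGenericIncr and makeSpecializedIncr are all invented names, and the context is modelled as a simple object with a slots array.

```javascript
// Stand-in for V8's internal the_hole marker (an assumption for this model).
const THE_HOLE = Symbol('the_hole');

// Models a context slot load followed by the hole check.
function loadChecked(context, index, name) {
  const value = context.slots[index];
  if (value === THE_HOLE) throw new ReferenceError(`${name} is not defined`);
  return value;
}

// Models the unspecialized code: load and check the slot on every call.
function makeGenericIncr(context) {
  return (x) => x + loadChecked(context, 0, 'INCREMENT');
}

// Models the specialized code: load and check once, then reuse the folded
// value. This is only sound if the slot can never change afterwards.
function makeSpecializedIncr(context) {
  const folded = loadChecked(context, 0, 'INCREMENT');
  return (x) => x + folded;
}

const context = { slots: [1] };
const generic = makeGenericIncr(context);
const specialized = makeSpecializedIncr(context);
console.log(generic(5), specialized(5)); // 6 6

// Only a mutable (let-like) slot could be changed like this.
context.slots[0] = 10;
console.log(generic(5), specialized(5)); // 15 6
```

After the slot is overwritten, the folded version returns a stale result, which is why folding is restricted to slots that are known to be immutable.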

So in this sense, const is a performance feature, but only once it reaches the optimizing compiler and only if Function Context Specialization kicks in, which depends on a rather simple condition that might not be obvious: it's only enabled for the first closure of any function in a given native context (which is V8 speak for <iframe>). So what does that mean? In the examples above, there was always only a single closure of incr. But let's consider this simple counter-example ex6.js:

const INCREMENT = 1;

function makeIncr() {
  function incr(x) { return x + INCREMENT; }
  return incr;
}

function test(incr) {
  // Warmup
  incr(3);
  incr(4);
  %OptimizeFunctionOnNextCall(incr);
  console.log(incr(5));
}

test(makeIncr());
test(makeIncr());

It's definitely a bit artificial, but it's important to highlight the key takeaway: There are now multiple closures for the same function incr, generated by makeIncr. Running this in d8 reveals what I just described:

$ out/Release/d8 --print-bytecode --allow-natives-syntax --print-opt-code --code-comments ex6.js
...SNIP...
[generating bytecode for function: incr]
Parameter count 2
Frame size 0
   59 E> 0x34d1b322fb56 @    0 : 92                StackCheck
   65 S> 0x34d1b322fb57 @    1 : 13 04             LdaImmutableCurrentContextSlot [4]
   76 E> 0x34d1b322fb59 @    3 : 97 00             ThrowReferenceErrorIfHole [0]
   74 E> 0x34d1b322fb5b @    5 : 2b 02 03          Add a0, [3]
   87 S> 0x34d1b322fb5e @    8 : 96                Return
...SNIP...
                  -- B0 start (construct frame) --
0x30d8696041e0     0  55             push rbp
0x30d8696041e1     1  4889e5         REX.W movq rbp,rsp
0x30d8696041e4     4  56             push rsi
0x30d8696041e5     5  57             push rdi
0x30d8696041e6     6  493ba5680c0000 REX.W cmpq rsp,[r13+0xc68]
0x30d8696041ed     d  0f862a000000   jna 0x30d86960421d  <+0x3d>
                  -- B2 start --
                  -- B3 start (deconstruct frame) --
0x30d8696041f3    13  488b4510       REX.W movq rax,[rbp+0x10]
0x30d8696041f7    17  a801           test al,0x1
0x30d8696041f9    19  0f8535000000   jnz 0x30d869604234  <+0x54>
0x30d8696041ff    1f  488bd8         REX.W movq rbx,rax
0x30d869604202    22  48c1eb20       REX.W shrq rbx, 32
0x30d869604206    26  83c301         addl rbx,0x1
0x30d869604209    29  0f802a000000   jo 0x30d869604239  <+0x59>
0x30d86960420f    2f  48c1e320       REX.W shlq rbx, 32
0x30d869604213    33  488bc3         REX.W movq rax,rbx
0x30d869604216    36  488be5         REX.W movq rsp,rbp
0x30d869604219    39  5d             pop rbp
0x30d86960421a    3a  c21000         ret 0x10
...SNIP...
                  -- B0 start (construct frame) --
0x30d8696042c0     0  55             push rbp
0x30d8696042c1     1  4889e5         REX.W movq rbp,rsp
0x30d8696042c4     4  56             push rsi
0x30d8696042c5     5  57             push rdi
0x30d8696042c6     6  4883ec08       REX.W subq rsp,0x8
0x30d8696042ca     a  493ba5680c0000 REX.W cmpq rsp,[r13+0xc68]
0x30d8696042d1    11  0f8649000000   jna 0x30d869604320  <+0x60>
                  -- B2 start --
                  -- B3 start --
0x30d8696042d7    17  488b45f8       REX.W movq rax,[rbp-0x8]
0x30d8696042db    1b  488b582f       REX.W movq rbx,[rax+0x2f]
0x30d8696042df    1f  49395da8       REX.W cmpq [r13-0x58],rbx
0x30d8696042e3    23  0f844e000000   jz 0x30d869604337  <+0x77>
                  -- B4 start (deconstruct frame) --
0x30d8696042e9    29  488b5510       REX.W movq rdx,[rbp+0x10]
0x30d8696042ed    2d  f6c201         testb rdx,0x1
0x30d8696042f0    30  0f8568000000   jnz 0x30d86960435e  <+0x9e>
0x30d8696042f6    36  f6c301         testb rbx,0x1
0x30d8696042f9    39  0f8564000000   jnz 0x30d869604363  <+0xa3>
0x30d8696042ff    3f  48c1eb20       REX.W shrq rbx, 32
0x30d869604303    43  488bca         REX.W movq rcx,rdx
0x30d869604306    46  48c1e920       REX.W shrq rcx, 32
0x30d86960430a    4a  03d9           addl rbx,rcx
0x30d86960430c    4c  0f8056000000   jo 0x30d869604368  <+0xa8>
0x30d869604312    52  48c1e320       REX.W shlq rbx, 32
0x30d869604316    56  488bc3         REX.W movq rax,rbx
0x30d869604319    59  488be5         REX.W movq rsp,rbp
0x30d86960431c    5c  5d             pop rbp
0x30d86960431d    5d  c21000         ret 0x10
...SNIP...

Ignition sticks an LdaImmutableCurrentContextSlot bytecode in there, because it's a const context slot, but Function Context Specialization only kicks in for the first closure. The second closure gets new optimized code, which is not specialized. The reason behind this is that if you have more than one closure per function, we would like to share the code between the different closures, as it would be a waste of resources (both time and memory) to generate one code object per closure, especially if you use arrow functions with higher-order builtins, for example:

let b = a.map(x => x + 1);

where you don't want to have the optimizing compiler run every time you execute this line just to generate a specialized code object for x => x + 1. So the rule here is simple:

You only get Function Context Specialization for the first closure of every function in any given <iframe> (native context in V8 speak).

The native context part doesn't apply to Node as there you only have one native context, except when you use the vm module.
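What "more than one closure per function" means can be checked in plain JavaScript. The sketch below reuses makeIncr from ex6.js; the variable names first and second are made up for illustration. It only shows that the closures are distinct objects created from the same function literal, it does not observe the optimization itself:

```javascript
const INCREMENT = 1;

function makeIncr() {
  function incr(x) { return x + INCREMENT; }
  return incr;
}

// Every call to makeIncr creates a fresh closure of the same function.
const first = makeIncr();
const second = makeIncr();

console.log(first === second);    // false
console.log(first(5), second(5)); // 6 6
```

According to the rule above, only the first of these closures would be a candidate for Function Context Specialization.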

Now considering that class is like let, i.e. it's a mutable binding (again for reasons that I don't want to buy), you don't necessarily benefit from Function Context Specialization when using classes. Let's consider ex7.js:

class A {};

function makeA() { return new A; }

makeA();
makeA();
%OptimizeFunctionOnNextCall(makeA);
makeA();

Inspecting again the bytecode and the optimized code for makeA we observe the following:

$ out/Release/d8 --print-bytecode --allow-natives-syntax --print-opt-code --code-comments ex7.js
...SNIP...
[generating bytecode for function: makeA]
Parameter count 1
Frame size 8
   27 E> 0x1fcce9caf75e @    0 : 92                StackCheck
   32 S> 0x1fcce9caf75f @    1 : 12 04             LdaCurrentContextSlot [4]
         0x1fcce9caf761 @    3 : 97 00             ThrowReferenceErrorIfHole [0]
         0x1fcce9caf763 @    5 : 1e fa             Star r0
   39 E> 0x1fcce9caf765 @    7 : 58 fa fa 00 03    Construct r0, r0-r0, [3]
   46 S> 0x1fcce9caf76a @   12 : 96                Return
...SNIP...
                  -- B0 start (construct frame) --
0x19518f5041e0     0  55             push rbp
0x19518f5041e1     1  4889e5         REX.W movq rbp,rsp
0x19518f5041e4     4  56             push rsi
0x19518f5041e5     5  57             push rdi
0x19518f5041e6     6  4883ec08       REX.W subq rsp,0x8
0x19518f5041ea     a  493ba5680c0000 REX.W cmpq rsp,[r13+0xc68]
0x19518f5041f1    11  0f8673000000   jna 0x19518f50426a  <+0x8a>
                  -- B2 start --
                  -- B3 start --
0x19518f5041f7    17  48b821f5cae9cc1f0000 REX.W movq rax,0x1fcce9caf521    ;; object: 0x1fcce9caf521 <FixedArray[5]>
0x19518f504201    21  488b402f       REX.W movq rax,[rax+0x2f]
0x19518f504205    25  493945a8       REX.W cmpq [r13-0x58],rax
0x19518f504209    29  0f8488000000   jz 0x19518f504297  <+0xb7>
                  -- B4 start --
0x19518f50420f    2f  48bb29b9e84758150000 REX.W movq rbx,0x155847e8b929    ;; object: 0x155847e8b929 <JSFunction A (sfi = 0x1fcce9caf169)>
0x19518f504219    39  483bd8         REX.W cmpq rbx,rax
0x19518f50421c    3c  0f859c000000   jnz 0x19518f5042be  <+0xde>
0x19518f504222    42  498b8578e40300 REX.W movq rax,[r13+0x3e478]
0x19518f504229    49  488d5818       REX.W leaq rbx,[rax+0x18]
0x19518f50422d    4d  49399d80e40300 REX.W cmpq [r13+0x3e480],rbx
0x19518f504234    54  0f864a000000   jna 0x19518f504284  <+0xa4>
                  -- B6 start --
                  -- B7 start (deconstruct frame) --
0x19518f50423a    5a  488d5818       REX.W leaq rbx,[rax+0x18]
0x19518f50423e    5e  4883c001       REX.W addq rax,0x1
0x19518f504242    62  49899d78e40300 REX.W movq [r13+0x3e478],rbx
0x19518f504249    69  48bb9105294321300000 REX.W movq rbx,0x302143290591    ;; object: 0x302143290591 <Map(PACKED_HOLEY_ELEMENTS)>
0x19518f504253    73  488958ff       REX.W movq [rax-0x1],rbx
0x19518f504257    77  498b5d70       REX.W movq rbx,[r13+0x70]
0x19518f50425b    7b  48895807       REX.W movq [rax+0x7],rbx
0x19518f50425f    7f  4889580f       REX.W movq [rax+0xf],rbx
0x19518f504263    83  488be5         REX.W movq rsp,rbp
0x19518f504266    86  5d             pop rbp
0x19518f504267    87  c20800         ret 0x8
...SNIP...

What's interesting to see here is that the constructor for A is properly inlined into makeA in the optimized code and we essentially just stamp out instances of A with the best possible code, except for the additional checks that we need to perform because TurboFan doesn't know that A cannot change (in fact A can change at any moment, since it's a mutable binding). So the code from offset 17 up to offset 2f loads the context slot for A and checks that it's not the_hole, and the instructions that follow (offsets 2f to 3c) check that it's actually the JSFunction A that we saw earlier (during warmup). As you can see, TurboFan nevertheless tries hard to generate pretty decent code. But you can help it further by using const here as well (ex8.js):

const A = class A {};

function makeA() { return new A; }

makeA();
makeA();
%OptimizeFunctionOnNextCall(makeA);
makeA();

Now you get the ideal code for makeA because Ignition tells TurboFan that the context slot cannot change (via LdaImmutableCurrentContextSlot):

$ out/Release/d8 --print-bytecode --allow-natives-syntax --print-opt-code --code-comments ex8.js
...SNIP...
[generating bytecode for function: makeA]
Parameter count 1
Frame size 8
   37 E> 0x257007eaf75e @    0 : 92                StackCheck
   42 S> 0x257007eaf75f @    1 : 13 04             LdaImmutableCurrentContextSlot [4]
         0x257007eaf761 @    3 : 97 00             ThrowReferenceErrorIfHole [0]
         0x257007eaf763 @    5 : 1e fa             Star r0
   49 E> 0x257007eaf765 @    7 : 58 fa fa 00 03    Construct r0, r0-r0, [3]
   56 S> 0x257007eaf76a @   12 : 96                Return
...SNIP...
                  -- B0 start (construct frame) --
0x3f0511b841e0     0  55             push rbp
0x3f0511b841e1     1  4889e5         REX.W movq rbp,rsp
0x3f0511b841e4     4  56             push rsi
0x3f0511b841e5     5  57             push rdi
0x3f0511b841e6     6  493ba5680c0000 REX.W cmpq rsp,[r13+0xc68]
0x3f0511b841ed     d  0f8648000000   jna 0x3f0511b8423b  <+0x5b>
                  -- B2 start --
                  -- B3 start --
0x3f0511b841f3    13  498b8578e40300 REX.W movq rax,[r13+0x3e478]
0x3f0511b841fa    1a  488d5818       REX.W leaq rbx,[rax+0x18]
0x3f0511b841fe    1e  49399d80e40300 REX.W cmpq [r13+0x3e480],rbx
0x3f0511b84205    25  0f8647000000   jna 0x3f0511b84252  <+0x72>
                  -- B5 start --
                  -- B6 start (deconstruct frame) --
0x3f0511b8420b    2b  488d5818       REX.W leaq rbx,[rax+0x18]
0x3f0511b8420f    2f  4883c001       REX.W addq rax,0x1
0x3f0511b84213    33  49899d78e40300 REX.W movq [r13+0x3e478],rbx
0x3f0511b8421a    3a  48bb9105d1acaa270000 REX.W movq rbx,0x27aaacd10591    ;; object: 0x27aaacd10591 <Map(PACKED_HOLEY_ELEMENTS)>
0x3f0511b84224    44  488958ff       REX.W movq [rax-0x1],rbx
0x3f0511b84228    48  498b5d70       REX.W movq rbx,[r13+0x70]
0x3f0511b8422c    4c  48895807       REX.W movq [rax+0x7],rbx
0x3f0511b84230    50  4889580f       REX.W movq [rax+0xf],rbx
0x3f0511b84234    54  488be5         REX.W movq rsp,rbp
0x3f0511b84237    57  5d             pop rbp
0x3f0511b84238    58  c20800         ret 0x8
...SNIP...

This is the perfect x64 machine code for makeA: there are no redundant checks left in this code (the two remaining checks are the stack check, which ensures that V8 doesn't overflow the execution stack, and the bump pointer check, which triggers garbage collection when new space is filled up).
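The difference between the two kinds of binding is also visible at the language level. The following sketch uses two made-up helper functions, classBindingIsMutable and constBindingIsImmutable, and keeps the bindings inside functions so that the snippet is self-contained:

```javascript
// A class declaration creates a binding that can be reassigned, like let.
function classBindingIsMutable() {
  class A {}
  const original = A;
  A = class B {}; // allowed: the outer binding A is mutable
  return A !== original;
}

// Binding the class with const makes the reassignment fail at runtime.
function constBindingIsImmutable() {
  const A = class A {};
  try {
    A = class B {};
    return false;
  } catch (error) {
    return error instanceof TypeError;
  }
}

console.log(classBindingIsMutable());   // true
console.log(constBindingIsImmutable()); // true
```

This is the language-level guarantee that the immutable context slot load relies on in the const case.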

So far the only way to get LdaImmutableCurrentContextSlot instead of LdaCurrentContextSlot was by using const. But this was because I was only demonstrating code operating on lexically bound names on script level (or top level in d8). If we go back to the simple let example in ex5.js and run that in Node 9 (or 8.2.0-rc1), we see that INCREMENT gets constant-folded despite using let:

$ node --print-opt-code --code-comments --allow-natives-syntax ex5.js
...SNIP...
                  -- B0 start (construct frame) --
0x2f2f61804f60     0  55             push rbp
0x2f2f61804f61     1  4889e5         REX.W movq rbp,rsp
0x2f2f61804f64     4  56             push rsi
0x2f2f61804f65     5  57             push rdi
0x2f2f61804f66     6  493ba5680c0000 REX.W cmpq rsp,[r13+0xc68]
0x2f2f61804f6d     d  0f862a000000   jna 0x2f2f61804f9d  <+0x3d>
                  -- B2 start --
                  -- B3 start (deconstruct frame) --
0x2f2f61804f73    13  488b4510       REX.W movq rax,[rbp+0x10]
0x2f2f61804f77    17  a801           test al,0x1
0x2f2f61804f79    19  0f8535000000   jnz 0x2f2f61804fb4  <+0x54>
0x2f2f61804f7f    1f  488bd8         REX.W movq rbx,rax
0x2f2f61804f82    22  48c1eb20       REX.W shrq rbx, 32
0x2f2f61804f86    26  83c301         addl rbx,0x1
0x2f2f61804f89    29  0f802a000000   jo 0x2f2f61804fb9  <+0x59>
0x2f2f61804f8f    2f  48c1e320       REX.W shlq rbx, 32
0x2f2f61804f93    33  488bc3         REX.W movq rax,rbx
0x2f2f61804f96    36  488be5         REX.W movq rsp,rbp
0x2f2f61804f99    39  5d             pop rbp
0x2f2f61804f9a    3a  c21000         ret 0x10
                  -- B4 start (no frame) --
                  -- B1 start (deferred) --
                  -- </usr/local/google/home/bmeurer/Projects/v8/ex5.js:3:14> --
0x2f2f61804f9d    3d  48bb40690e0100000000 REX.W movq rbx,0x10e6940
0x2f2f61804fa7    47  33c0           xorl rax,rax
0x2f2f61804fa9    49  488b75f8       REX.W movq rsi,[rbp-0x8]
0x2f2f61804fad    4d  e82ef6e7ff     call 0x2f2f616845e0     ;; code: STUB, CEntryStub, minor: 8
0x2f2f61804fb2    52  ebbf           jmp 0x2f2f61804f73  <+0x13>
0x2f2f61804fb4    54  e847f0cfff     call 0x2f2f61504000     ;; deoptimization bailout 0
0x2f2f61804fb9    59  e84cf0cfff     call 0x2f2f6150400a     ;; deoptimization bailout 1
...SNIP...

And similarly, if we run ex7.js with the class binding for A in Node 9 (or 8.2.0-rc1):

$ node --print-opt-code --code-comments --allow-natives-syntax ex7.js
...SNIP...
                  -- B0 start (construct frame) --
0x2e1f81f84e80     0  55             push rbp
0x2e1f81f84e81     1  4889e5         REX.W movq rbp,rsp
0x2e1f81f84e84     4  56             push rsi
0x2e1f81f84e85     5  57             push rdi
0x2e1f81f84e86     6  493ba5680c0000 REX.W cmpq rsp,[r13+0xc68]
0x2e1f81f84e8d     d  0f8648000000   jna 0x2e1f81f84edb  <+0x5b>
                  -- B2 start --
                  -- B3 start --
0x2e1f81f84e93    13  498b85a8ec0300 REX.W movq rax,[r13+0x3eca8]
0x2e1f81f84e9a    1a  488d5818       REX.W leaq rbx,[rax+0x18]
0x2e1f81f84e9e    1e  49399db0ec0300 REX.W cmpq [r13+0x3ecb0],rbx
0x2e1f81f84ea5    25  0f8647000000   jna 0x2e1f81f84ef2  <+0x72>
                  -- B5 start --
                  -- B6 start (deconstruct frame) --
0x2e1f81f84eab    2b  488d5818       REX.W leaq rbx,[rax+0x18]
0x2e1f81f84eaf    2f  4883c001       REX.W addq rax,0x1
0x2e1f81f84eb3    33  49899da8ec0300 REX.W movq [r13+0x3eca8],rbx
0x2e1f81f84eba    3a  48bb012f6b7ceb110000 REX.W movq rbx,0x11eb7c6b2f01    ;; object: 0x11eb7c6b2f01 <Map(PACKED_HOLEY_ELEMENTS)>
0x2e1f81f84ec4    44  488958ff       REX.W movq [rax-0x1],rbx
0x2e1f81f84ec8    48  498b5d70       REX.W movq rbx,[r13+0x70]
0x2e1f81f84ecc    4c  48895807       REX.W movq [rax+0x7],rbx
0x2e1f81f84ed0    50  4889580f       REX.W movq [rax+0xf],rbx
0x2e1f81f84ed4    54  488be5         REX.W movq rsp,rbp
0x2e1f81f84ed7    57  5d             pop rbp
0x2e1f81f84ed8    58  c20800         ret 0x8
...SNIP...

We see that this is the ideal code. The reason for this is the CommonJS module system used by Node. Every module is implicitly wrapped into a function. So ex7.js in Node corresponds roughly to the following code in Chrome or d8:

(function() {
  class A {};

  function makeA() { return new A; }

  makeA();
  makeA();
  %OptimizeFunctionOnNextCall(makeA);
  makeA();
})();

This is simplified (as I don't want to explain webpack here as well). What's interesting here is that A is local to the anonymous closure, and thus the parser can actually prove that A never changes after the initial definition, because no code outside the closure can see (and touch) the binding A. Therefore Ignition sticks an LdaImmutableCurrentContextSlot in there and TurboFan can generate awesome code for makeA (here ex9.js contains the wrapped code above):

$ out/Release/d8 --print-bytecode --allow-natives-syntax --print-opt-code --code-comments ex9.js
...SNIP...
[generating bytecode for function: makeA]
Parameter count 1
Frame size 8
   45 E> 0x22ac28a2f7e6 @    0 : 92                StackCheck
   50 S> 0x22ac28a2f7e7 @    1 : 13 04             LdaImmutableCurrentContextSlot [4]
         0x22ac28a2f7e9 @    3 : 97 00             ThrowReferenceErrorIfHole [0]
         0x22ac28a2f7eb @    5 : 1e fa             Star r0
   57 E> 0x22ac28a2f7ed @    7 : 58 fa fa 00 03    Construct r0, r0-r0, [3]
   64 S> 0x22ac28a2f7f2 @   12 : 96                Return
...SNIP...
                  -- B0 start (construct frame) --
0x138cd23841e0     0  55             push rbp
0x138cd23841e1     1  4889e5         REX.W movq rbp,rsp
0x138cd23841e4     4  56             push rsi
0x138cd23841e5     5  57             push rdi
0x138cd23841e6     6  493ba5680c0000 REX.W cmpq rsp,[r13+0xc68]
0x138cd23841ed     d  0f8648000000   jna 0x138cd238423b  <+0x5b>
                  -- B2 start --
                  -- B3 start --
0x138cd23841f3    13  498b8578e40300 REX.W movq rax,[r13+0x3e478]
0x138cd23841fa    1a  488d5818       REX.W leaq rbx,[rax+0x18]
0x138cd23841fe    1e  49399d80e40300 REX.W cmpq [r13+0x3e480],rbx
0x138cd2384205    25  0f8647000000   jna 0x138cd2384252  <+0x72>
                  -- B5 start --
                  -- B6 start (deconstruct frame) --
0x138cd238420b    2b  488d5818       REX.W leaq rbx,[rax+0x18]
0x138cd238420f    2f  4883c001       REX.W addq rax,0x1
0x138cd2384213    33  49899d78e40300 REX.W movq [r13+0x3e478],rbx
0x138cd238421a    3a  48bb910501aa382d0000 REX.W movq rbx,0x2d38aa010591    ;; object: 0x2d38aa010591 <Map(PACKED_HOLEY_ELEMENTS)>
0x138cd2384224    44  488958ff       REX.W movq [rax-0x1],rbx
0x138cd2384228    48  498b5d70       REX.W movq rbx,[r13+0x70]
0x138cd238422c    4c  48895807       REX.W movq [rax+0x7],rbx
0x138cd2384230    50  4889580f       REX.W movq [rax+0xf],rbx
0x138cd2384234    54  488be5         REX.W movq rsp,rbp
0x138cd2384237    57  5d             pop rbp
0x138cd2384238    58  c20800         ret 0x8
...SNIP...
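The scoping argument can be illustrated in plain JavaScript as well. This is only a sketch of the visibility rule, not of the optimization; the names wrapped and isA are invented for the example:

```javascript
// Everything inside the wrapper function is invisible to outside code,
// so the only possible writes to A are the ones in this function body.
const wrapped = (function () {
  class A {}

  function makeA() { return new A(); }

  // Expose only what callers need; the binding A itself stays private.
  return { makeA, isA: (value) => value instanceof A };
})();

console.log(wrapped.isA(wrapped.makeA())); // true
```

Since no statement in the wrapper body ever assigns to A, a parser that sees the whole function can conclude that the binding keeps its initial value, which is the property described above.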

So takeaways from this exercise:

  1. Looking at generated x64 machine code can be frightening.
  2. const comes with a cost for the TDZ, but can pay off in optimized code.
  3. A class binding is equivalent to a let binding; use const to get an immutable binding on script scope.
  4. JavaScript VMs try to be smart within function scopes (as used by Node or webpack).