JavaScript Optimization Patterns (Part 2)
Following up on part one of this series last week, here's another (hopefully interesting) episode about optimization patterns for JavaScript, based on my background of working on the V8 engine for more than four years. This week we're going to look into an optimization called Function Context Specialization, which we introduced to V8 with TurboFan (other engines like JavaScriptCore implement similar optimizations). The name is a bit misleading: what it essentially does is allow TurboFan to constant-fold certain values when generating optimized code, and it does so by specializing the generated machine code for a function to its surrounding context (which is V8 speak for the runtime representation of scope).
Consider the following simple code snippet:
const INCREMENT = 1;
function incr(x) {
  return x + INCREMENT;
}
Assume that we run this at <script> level in Chrome (or at top level in the d8 shell); we then see the following bytecode generated for the function incr:
$ out/Release/d8 --print-bytecode ex1.js
...SNIP...
[generating bytecode for function: incr]
Parameter count 2
Frame size 0
35 E> 0x1859bd52f4fe @ 0 : 92 StackCheck
41 S> 0x1859bd52f4ff @ 1 : 13 04 LdaImmutableCurrentContextSlot [4]
52 E> 0x1859bd52f501 @ 3 : 97 00 ThrowReferenceErrorIfHole [0]
50 E> 0x1859bd52f503 @ 5 : 2b 02 03 Add a0, [3]
63 S> 0x1859bd52f506 @ 8 : 96 Return
Constant pool (size = 1)
0x1859bd52f4b1: [FixedArray] in OldSpace
- map = 0x2f062f402309 <Map(PACKED_HOLEY_ELEMENTS)>
- length: 1
0: 0x1859bd52ef11 <String[9]: INCREMENT>
Handler Table (size = 16)
The interesting bit here is the access to the constant INCREMENT on script scope: it is loaded from the surrounding context via the LdaImmutableCurrentContextSlot bytecode, and the loaded value is then immediately checked against what we call the_hole in V8; the_hole is an internal marker that is used to implement the temporal dead zone for lexical scoping (see Variables and scoping in ECMAScript 6 by Axel Rauschmayer for details on this). This is a bit counter-intuitive to many developers that I talk to, as the intuition is that the VM needs to do less work for const than for var, especially inside of local scopes, but the reality is that, at least initially, the VM needs to do even more work because of the additional TDZ (temporal dead zone) check. This check is necessary because of the way scoping works; let's look at ex2.js:
console.log(incr(5));
const INCREMENT = 1;
function incr(x) {
  return x + INCREMENT;
}
And run it in the d8 shell:
$ out/Release/d8 ex2.js
ex2.js:5: ReferenceError: INCREMENT is not defined
function incr(x) { return x + INCREMENT; }
^
ReferenceError: INCREMENT is not defined
at incr (ex2.js:5:31)
at ex2.js:1:13
What happens here is that the TDZ check fails, because the assignment const INCREMENT = 1 wasn't executed before incr was run. I have to admit that even though I've been working on the VM side of this for quite a while, I still find this behavior highly counter-intuitive, but I also don't consider myself a very good language designer... Ok, ranting aside. Looking at the example again, it obviously works if you put the call to incr last
const INCREMENT = 1;
function incr(x) {
  return x + INCREMENT;
}
console.log(incr(5));
and run that in the d8 shell:
$ out/Release/d8 ex3.js
6
So much on the background for the temporal dead zone.
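If you want to poke at the temporal dead zone without crashing the whole script, here is a minimal sketch that catches the error instead. The names readBeforeInit, incrTdz and INCREMENT_TDZ are made up for this illustration; it runs in plain Node or d8 without any flags.

```javascript
// Hypothetical helper: calls incrTdz and reports what happened.
function readBeforeInit() {
  try {
    return incrTdz(5);
  } catch (e) {
    return e instanceof ReferenceError ? "ReferenceError" : "other error";
  }
}

// incrTdz is a hoisted function declaration, so it is already callable
// here, but INCREMENT_TDZ is still inside its temporal dead zone.
const early = readBeforeInit();

const INCREMENT_TDZ = 1;
function incrTdz(x) {
  return x + INCREMENT_TDZ;
}

// The binding is initialized now, so the same call succeeds.
const late = readBeforeInit();

console.log(early); // ReferenceError
console.log(late);  // 6
```

The first call fails for the same reason ex2.js does: the function exists, but the slot it reads has not been initialized yet.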
Performance-wise there's one very interesting (and maybe obvious) observation here: once a particular const slot in a context is assigned, it will keep that value and will never go back to containing the_hole (that's what const guarantees). We use exactly this fact in TurboFan to avoid loading and checking const slot values each time. Consider ex4.js:
const INCREMENT = 1;
function incr(x) { return x + INCREMENT; }
// Warmup
incr(3);
incr(4);
%OptimizeFunctionOnNextCall(incr);
console.log(incr(5));
We can see this in the optimized machine code that is generated by TurboFan:
$ out/Release/d8 --allow-natives-syntax --print-opt-code --code-comments ex4.js
...SNIP...
-- B0 start (construct frame) --
0x11e35a6041e0 0 55 push rbp
0x11e35a6041e1 1 4889e5 REX.W movq rbp,rsp
0x11e35a6041e4 4 56 push rsi
0x11e35a6041e5 5 57 push rdi
0x11e35a6041e6 6 493ba5680c0000 REX.W cmpq rsp,[r13+0xc68]
0x11e35a6041ed d 0f862a000000 jna 0x11e35a60421d <+0x3d>
-- B2 start --
-- B3 start (deconstruct frame) --
0x11e35a6041f3 13 488b4510 REX.W movq rax,[rbp+0x10]
0x11e35a6041f7 17 a801 test al,0x1
0x11e35a6041f9 19 0f8535000000 jnz 0x11e35a604234 <+0x54>
0x11e35a6041ff 1f 488bd8 REX.W movq rbx,rax
0x11e35a604202 22 48c1eb20 REX.W shrq rbx, 32
0x11e35a604206 26 83c301 addl rbx,0x1
0x11e35a604209 29 0f802a000000 jo 0x11e35a604239 <+0x59>
0x11e35a60420f 2f 48c1e320 REX.W shlq rbx, 32
0x11e35a604213 33 488bc3 REX.W movq rax,rbx
0x11e35a604216 36 488be5 REX.W movq rsp,rbp
0x11e35a604219 39 5d pop rbp
0x11e35a60421a 3a c21000 ret 0x10
...SNIP...
The only really interesting line here is the one at offset 26 with the instruction addl rbx,0x1, where rbx contains the integer value of the parameter x passed to the function (based on the fact that we warmed up incr with integer values for x before), and the 0x1 is the constant-folded value of the INCREMENT constant from the surrounding context. The constant-folding in this case is only valid because TurboFan knows that no one can change the value of INCREMENT anymore once it's no longer the_hole (i.e. outside the TDZ). Actually it's not TurboFan that figures this out; the Ignition interpreter forwards this information to TurboFan via the dedicated bytecode LdaImmutableCurrentContextSlot that we saw earlier. Specifically, it's the immutable bit in this bytecode that tells TurboFan that the context slot cannot change anymore once it contains a non-hole value. We can see the difference when we try the same example with let:
let INCREMENT = 1;
function incr(x) { return x + INCREMENT; }
// Warmup
incr(3);
incr(4);
%OptimizeFunctionOnNextCall(incr);
console.log(incr(5));
Running this ex5.js code in the d8 shell and inspecting both the bytecode and the optimized machine code looks like this:
$ out/Release/d8 --print-bytecode --allow-natives-syntax --print-opt-code --code-comments ex5.js
...SNIP...
[generating bytecode for function: incr]
Parameter count 2
Frame size 0
33 E> 0xa9399d2f63e @ 0 : 92 StackCheck
39 S> 0xa9399d2f63f @ 1 : 12 04 LdaCurrentContextSlot [4]
50 E> 0xa9399d2f641 @ 3 : 97 00 ThrowReferenceErrorIfHole [0]
48 E> 0xa9399d2f643 @ 5 : 2b 02 03 Add a0, [3]
61 S> 0xa9399d2f646 @ 8 : 96 Return
...SNIP...
-- B0 start (construct frame) --
0x25139be041e0 0 55 push rbp
0x25139be041e1 1 4889e5 REX.W movq rbp,rsp
0x25139be041e4 4 56 push rsi
0x25139be041e5 5 57 push rdi
0x25139be041e6 6 4883ec08 REX.W subq rsp,0x8
0x25139be041ea a 493ba5680c0000 REX.W cmpq rsp,[r13+0xc68]
0x25139be041f1 11 0f864b000000 jna 0x25139be04242 <+0x62>
-- B2 start --
-- B3 start --
0x25139be041f7 17 48b8d1f4d299930a0000 REX.W movq rax,0xa9399d2f4d1 ;; object: 0xa9399d2f4d1 <FixedArray[5]>
0x25139be04201 21 488b402f REX.W movq rax,[rax+0x2f]
0x25139be04205 25 493945a8 REX.W cmpq [r13-0x58],rax
0x25139be04209 29 0f844a000000 jz 0x25139be04259 <+0x79>
-- B4 start (deconstruct frame) --
0x25139be0420f 2f 488b5d10 REX.W movq rbx,[rbp+0x10]
0x25139be04213 33 f6c301 testb rbx,0x1
0x25139be04216 36 0f8564000000 jnz 0x25139be04280 <+0xa0>
0x25139be0421c 3c a801 test al,0x1
0x25139be0421e 3e 0f8561000000 jnz 0x25139be04285 <+0xa5>
0x25139be04224 44 48c1e820 REX.W shrq rax, 32
0x25139be04228 48 488bd3 REX.W movq rdx,rbx
0x25139be0422b 4b 48c1ea20 REX.W shrq rdx, 32
0x25139be0422f 4f 03c2 addl rax,rdx
0x25139be04231 51 0f8053000000 jo 0x25139be0428a <+0xaa>
0x25139be04237 57 48c1e020 REX.W shlq rax, 32
0x25139be0423b 5b 488be5 REX.W movq rsp,rbp
0x25139be0423e 5e 5d pop rbp
0x25139be0423f 5f c21000 ret 0x10
...SNIP...
Here we see that Ignition has to use LdaCurrentContextSlot, i.e. it cannot prove that the value of INCREMENT never changes afterwards, because any other script could modify INCREMENT later. As such TurboFan cannot constant-fold the value 1, but instead has to generate explicit code to load INCREMENT from the script context and check that it's not the_hole (the code between offset 17 and 2f in the listing above does that).
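To see at the language level why folding a let binding would be wrong, here is a small sketch (STEP and incrLet are hypothetical names, and this says nothing about what any particular V8 version emits): the function has to observe a later write to the binding, so the value cannot simply be baked into the code.

```javascript
// Hypothetical names; plain JavaScript, no V8 flags needed.
let STEP = 1;
function incrLet(x) {
  return x + STEP;
}

const before = incrLet(5);

// Any later code with access to the binding (on script scope, that
// includes every other script on the page) may reassign it.
STEP = 10;

const after = incrLet(5);

console.log(before); // 6
console.log(after);  // 15
```

If the compiler had folded STEP to 1, the second call would wrongly return 6.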
So in this sense, const is a performance feature, but only once it reaches the optimizing compiler and only if Function Context Specialization kicks in, which depends on a rather simple condition that might not be obvious: it's only enabled for the first closure of any function in a given native context (which is V8 speak for <iframe>). So what does that mean? In the examples above, there was always only a single closure of incr. But let's consider this simple counter-example ex6.js:
const INCREMENT = 1;
function makeIncr() {
  function incr(x) { return x + INCREMENT; }
  return incr;
}
function test(incr) {
  // Warmup
  incr(3);
  incr(4);
  %OptimizeFunctionOnNextCall(incr);
  console.log(incr(5));
}
test(makeIncr());
test(makeIncr());
It's definitely a bit artificial, but it highlights the key takeaway: there are now multiple closures for the same function incr, generated by makeIncr. Running this in d8 reveals what I just described:
$ out/Release/d8 --print-bytecode --allow-natives-syntax --print-opt-code --code-comments ex6.js
...SNIP...
[generating bytecode for function: incr]
Parameter count 2
Frame size 0
59 E> 0x34d1b322fb56 @ 0 : 92 StackCheck
65 S> 0x34d1b322fb57 @ 1 : 13 04 LdaImmutableCurrentContextSlot [4]
76 E> 0x34d1b322fb59 @ 3 : 97 00 ThrowReferenceErrorIfHole [0]
74 E> 0x34d1b322fb5b @ 5 : 2b 02 03 Add a0, [3]
87 S> 0x34d1b322fb5e @ 8 : 96 Return
...SNIP...
-- B0 start (construct frame) --
0x30d8696041e0 0 55 push rbp
0x30d8696041e1 1 4889e5 REX.W movq rbp,rsp
0x30d8696041e4 4 56 push rsi
0x30d8696041e5 5 57 push rdi
0x30d8696041e6 6 493ba5680c0000 REX.W cmpq rsp,[r13+0xc68]
0x30d8696041ed d 0f862a000000 jna 0x30d86960421d <+0x3d>
-- B2 start --
-- B3 start (deconstruct frame) --
0x30d8696041f3 13 488b4510 REX.W movq rax,[rbp+0x10]
0x30d8696041f7 17 a801 test al,0x1
0x30d8696041f9 19 0f8535000000 jnz 0x30d869604234 <+0x54>
0x30d8696041ff 1f 488bd8 REX.W movq rbx,rax
0x30d869604202 22 48c1eb20 REX.W shrq rbx, 32
0x30d869604206 26 83c301 addl rbx,0x1
0x30d869604209 29 0f802a000000 jo 0x30d869604239 <+0x59>
0x30d86960420f 2f 48c1e320 REX.W shlq rbx, 32
0x30d869604213 33 488bc3 REX.W movq rax,rbx
0x30d869604216 36 488be5 REX.W movq rsp,rbp
0x30d869604219 39 5d pop rbp
0x30d86960421a 3a c21000 ret 0x10
...SNIP...
-- B0 start (construct frame) --
0x30d8696042c0 0 55 push rbp
0x30d8696042c1 1 4889e5 REX.W movq rbp,rsp
0x30d8696042c4 4 56 push rsi
0x30d8696042c5 5 57 push rdi
0x30d8696042c6 6 4883ec08 REX.W subq rsp,0x8
0x30d8696042ca a 493ba5680c0000 REX.W cmpq rsp,[r13+0xc68]
0x30d8696042d1 11 0f8649000000 jna 0x30d869604320 <+0x60>
-- B2 start --
-- B3 start --
0x30d8696042d7 17 488b45f8 REX.W movq rax,[rbp-0x8]
0x30d8696042db 1b 488b582f REX.W movq rbx,[rax+0x2f]
0x30d8696042df 1f 49395da8 REX.W cmpq [r13-0x58],rbx
0x30d8696042e3 23 0f844e000000 jz 0x30d869604337 <+0x77>
-- B4 start (deconstruct frame) --
0x30d8696042e9 29 488b5510 REX.W movq rdx,[rbp+0x10]
0x30d8696042ed 2d f6c201 testb rdx,0x1
0x30d8696042f0 30 0f8568000000 jnz 0x30d86960435e <+0x9e>
0x30d8696042f6 36 f6c301 testb rbx,0x1
0x30d8696042f9 39 0f8564000000 jnz 0x30d869604363 <+0xa3>
0x30d8696042ff 3f 48c1eb20 REX.W shrq rbx, 32
0x30d869604303 43 488bca REX.W movq rcx,rdx
0x30d869604306 46 48c1e920 REX.W shrq rcx, 32
0x30d86960430a 4a 03d9 addl rbx,rcx
0x30d86960430c 4c 0f8056000000 jo 0x30d869604368 <+0xa8>
0x30d869604312 52 48c1e320 REX.W shlq rbx, 32
0x30d869604316 56 488bc3 REX.W movq rax,rbx
0x30d869604319 59 488be5 REX.W movq rsp,rbp
0x30d86960431c 5c 5d pop rbp
0x30d86960431d 5d c21000 ret 0x10
...SNIP...
Ignition sticks an LdaImmutableCurrentContextSlot bytecode in there, because it's a const context slot, but Function Context Specialization only kicks in for the first closure. The second closure gets new optimized code, which is not specialized. The reason behind this is that if you have more than one closure per function, we would like to share the code between the different closures, as it would be a waste of resources (both time and memory) to generate one code object per closure, especially if you use arrow functions with higher-order builtins, for example
let b = a.map(x => x + 1);
where you don't want to have the optimizing compiler run every time you execute this line just to generate a specialized code object for x => x + 1. So the rule here is simple:
You only get Function Context Specialization for the first closure of every function in any given <iframe> (native context in V8 speak).
The native context part doesn't apply to Node, as there you only have one native context, except when you use the vm module.
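A rough way to picture why specialized code cannot be shared between closures: two closures of the same function literal can sit on top of different contexts, and therefore see different constants. The sketch below uses a made-up makeAdder helper (it is not one of the ex*.js files) and only shows the language-level behavior, not V8's code sharing itself.

```javascript
// Hypothetical helper: every call creates a fresh context holding its
// own INCREMENT, plus a new closure of the same inner function.
function makeAdder(increment) {
  const INCREMENT = increment;
  return function incr(x) {
    return x + INCREMENT;
  };
}

const addOne = makeAdder(1);
const addTen = makeAdder(10);

// Same source function, but distinct closures with distinct contexts.
console.log(addOne === addTen); // false
console.log(addOne(5));         // 6
console.log(addTen(5));         // 15
```

Code that had the 1 from addOne's context folded in would compute the wrong result for addTen, so anything shared between the two has to load INCREMENT from whichever context it is currently running in.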
Now considering that class is like let, i.e. it's a mutable binding (again for reasons that I don't want to buy), you don't necessarily benefit from Function Context Specialization when using classes. Let's consider ex7.js:
class A {};
function makeA() { return new A; }
makeA();
makeA();
%OptimizeFunctionOnNextCall(makeA);
makeA();
Inspecting again the bytecode and the optimized code for makeA we observe the following:
$ out/Release/d8 --print-bytecode --allow-natives-syntax --print-opt-code --code-comments ex7.js
...SNIP...
[generating bytecode for function: makeA]
Parameter count 1
Frame size 8
27 E> 0x1fcce9caf75e @ 0 : 92 StackCheck
32 S> 0x1fcce9caf75f @ 1 : 12 04 LdaCurrentContextSlot [4]
0x1fcce9caf761 @ 3 : 97 00 ThrowReferenceErrorIfHole [0]
0x1fcce9caf763 @ 5 : 1e fa Star r0
39 E> 0x1fcce9caf765 @ 7 : 58 fa fa 00 03 Construct r0, r0-r0, [3]
46 S> 0x1fcce9caf76a @ 12 : 96 Return
...SNIP...
-- B0 start (construct frame) --
0x19518f5041e0 0 55 push rbp
0x19518f5041e1 1 4889e5 REX.W movq rbp,rsp
0x19518f5041e4 4 56 push rsi
0x19518f5041e5 5 57 push rdi
0x19518f5041e6 6 4883ec08 REX.W subq rsp,0x8
0x19518f5041ea a 493ba5680c0000 REX.W cmpq rsp,[r13+0xc68]
0x19518f5041f1 11 0f8673000000 jna 0x19518f50426a <+0x8a>
-- B2 start --
-- B3 start --
0x19518f5041f7 17 48b821f5cae9cc1f0000 REX.W movq rax,0x1fcce9caf521 ;; object: 0x1fcce9caf521 <FixedArray[5]>
0x19518f504201 21 488b402f REX.W movq rax,[rax+0x2f]
0x19518f504205 25 493945a8 REX.W cmpq [r13-0x58],rax
0x19518f504209 29 0f8488000000 jz 0x19518f504297 <+0xb7>
-- B4 start --
0x19518f50420f 2f 48bb29b9e84758150000 REX.W movq rbx,0x155847e8b929 ;; object: 0x155847e8b929 <JSFunction A (sfi = 0x1fcce9caf169)>
0x19518f504219 39 483bd8 REX.W cmpq rbx,rax
0x19518f50421c 3c 0f859c000000 jnz 0x19518f5042be <+0xde>
0x19518f504222 42 498b8578e40300 REX.W movq rax,[r13+0x3e478]
0x19518f504229 49 488d5818 REX.W leaq rbx,[rax+0x18]
0x19518f50422d 4d 49399d80e40300 REX.W cmpq [r13+0x3e480],rbx
0x19518f504234 54 0f864a000000 jna 0x19518f504284 <+0xa4>
-- B6 start --
-- B7 start (deconstruct frame) --
0x19518f50423a 5a 488d5818 REX.W leaq rbx,[rax+0x18]
0x19518f50423e 5e 4883c001 REX.W addq rax,0x1
0x19518f504242 62 49899d78e40300 REX.W movq [r13+0x3e478],rbx
0x19518f504249 69 48bb9105294321300000 REX.W movq rbx,0x302143290591 ;; object: 0x302143290591 <Map(PACKED_HOLEY_ELEMENTS)>
0x19518f504253 73 488958ff REX.W movq [rax-0x1],rbx
0x19518f504257 77 498b5d70 REX.W movq rbx,[r13+0x70]
0x19518f50425b 7b 48895807 REX.W movq [rax+0x7],rbx
0x19518f50425f 7f 4889580f REX.W movq [rax+0xf],rbx
0x19518f504263 83 488be5 REX.W movq rsp,rbp
0x19518f504266 86 5d pop rbp
0x19518f504267 87 c20800 ret 0x8
...SNIP...
What's interesting to see here is that the constructor for A is properly inlined into makeA in the optimized code and we essentially just stamp out instances of A with the best possible code, except for the additional checks that we need to perform because TurboFan doesn't know that A cannot change (in fact A can change at any moment, since it's a mutable binding). So all the code between offset 17 and offset 2f loads the context slot for A and checks that it's not the_hole, and the next two lines check that it's actually the JSFunction A that we saw earlier (during warmup). As you can see, TurboFan nevertheless tries hard to generate pretty decent code. But you can help it further by using const here as well:
const A = class A {};
function makeA() { return new A; }
makeA();
makeA();
%OptimizeFunctionOnNextCall(makeA);
makeA();
Now you get the ideal code for makeA because Ignition tells TurboFan that the context slot cannot change (via LdaImmutableCurrentContextSlot):
$ out/Release/d8 --print-bytecode --allow-natives-syntax --print-opt-code --code-comments ex8.js
...SNIP...
[generating bytecode for function: makeA]
Parameter count 1
Frame size 8
37 E> 0x257007eaf75e @ 0 : 92 StackCheck
42 S> 0x257007eaf75f @ 1 : 13 04 LdaImmutableCurrentContextSlot [4]
0x257007eaf761 @ 3 : 97 00 ThrowReferenceErrorIfHole [0]
0x257007eaf763 @ 5 : 1e fa Star r0
49 E> 0x257007eaf765 @ 7 : 58 fa fa 00 03 Construct r0, r0-r0, [3]
56 S> 0x257007eaf76a @ 12 : 96 Return
...SNIP...
-- B0 start (construct frame) --
0x3f0511b841e0 0 55 push rbp
0x3f0511b841e1 1 4889e5 REX.W movq rbp,rsp
0x3f0511b841e4 4 56 push rsi
0x3f0511b841e5 5 57 push rdi
0x3f0511b841e6 6 493ba5680c0000 REX.W cmpq rsp,[r13+0xc68]
0x3f0511b841ed d 0f8648000000 jna 0x3f0511b8423b <+0x5b>
-- B2 start --
-- B3 start --
0x3f0511b841f3 13 498b8578e40300 REX.W movq rax,[r13+0x3e478]
0x3f0511b841fa 1a 488d5818 REX.W leaq rbx,[rax+0x18]
0x3f0511b841fe 1e 49399d80e40300 REX.W cmpq [r13+0x3e480],rbx
0x3f0511b84205 25 0f8647000000 jna 0x3f0511b84252 <+0x72>
-- B5 start --
-- B6 start (deconstruct frame) --
0x3f0511b8420b 2b 488d5818 REX.W leaq rbx,[rax+0x18]
0x3f0511b8420f 2f 4883c001 REX.W addq rax,0x1
0x3f0511b84213 33 49899d78e40300 REX.W movq [r13+0x3e478],rbx
0x3f0511b8421a 3a 48bb9105d1acaa270000 REX.W movq rbx,0x27aaacd10591 ;; object: 0x27aaacd10591 <Map(PACKED_HOLEY_ELEMENTS)>
0x3f0511b84224 44 488958ff REX.W movq [rax-0x1],rbx
0x3f0511b84228 48 498b5d70 REX.W movq rbx,[r13+0x70]
0x3f0511b8422c 4c 48895807 REX.W movq [rax+0x7],rbx
0x3f0511b84230 50 4889580f REX.W movq [rax+0xf],rbx
0x3f0511b84234 54 488be5 REX.W movq rsp,rbp
0x3f0511b84237 57 5d pop rbp
0x3f0511b84238 58 c20800 ret 0x8
...SNIP...
This is the perfect x64 machine code for makeA; there are no redundant checks left in this code (the two checks in there are the stack check, which ensures that V8 doesn't overflow the execution stack, and the bump pointer check, which triggers garbage collection when new space is filled up).
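The difference between the two class forms is also visible from plain JavaScript. Here is a small sketch with made-up class names (Point, Replacement, Frozen, Other): a class declaration can be rebound later, whereas a const binding holding a class expression cannot.

```javascript
// A class declaration creates a mutable binding, just like let.
class Point {}
const OriginalPoint = Point;
Point = class Replacement {}; // legal, Point now refers to a different class
const rebound = Point !== OriginalPoint;

// A const binding holding a class expression cannot be rebound.
const Frozen = class Frozen {};
let frozenError = null;
try {
  Frozen = class Other {}; // throws at runtime
} catch (e) {
  frozenError = e;
}

console.log(rebound);                          // true
console.log(frozenError instanceof TypeError); // true
```

This is exactly the guarantee that is missing for class A on script scope and present for const A = class A {}.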
So far the only way to get LdaImmutableCurrentContextSlot instead of LdaCurrentContextSlot was by using const. But this was because I was only demonstrating code operating on lexically bound names on script level (or top-level in d8). If we go back to the simple let example in ex5.js and run that in Node 9 (or 8.2.0-rc1), we see that INCREMENT gets constant-folded despite using let:
$ node --print-opt-code --code-comments --allow-natives-syntax ex5.js
...SNIP...
-- B0 start (construct frame) --
0x2f2f61804f60 0 55 push rbp
0x2f2f61804f61 1 4889e5 REX.W movq rbp,rsp
0x2f2f61804f64 4 56 push rsi
0x2f2f61804f65 5 57 push rdi
0x2f2f61804f66 6 493ba5680c0000 REX.W cmpq rsp,[r13+0xc68]
0x2f2f61804f6d d 0f862a000000 jna 0x2f2f61804f9d <+0x3d>
-- B2 start --
-- B3 start (deconstruct frame) --
0x2f2f61804f73 13 488b4510 REX.W movq rax,[rbp+0x10]
0x2f2f61804f77 17 a801 test al,0x1
0x2f2f61804f79 19 0f8535000000 jnz 0x2f2f61804fb4 <+0x54>
0x2f2f61804f7f 1f 488bd8 REX.W movq rbx,rax
0x2f2f61804f82 22 48c1eb20 REX.W shrq rbx, 32
0x2f2f61804f86 26 83c301 addl rbx,0x1
0x2f2f61804f89 29 0f802a000000 jo 0x2f2f61804fb9 <+0x59>
0x2f2f61804f8f 2f 48c1e320 REX.W shlq rbx, 32
0x2f2f61804f93 33 488bc3 REX.W movq rax,rbx
0x2f2f61804f96 36 488be5 REX.W movq rsp,rbp
0x2f2f61804f99 39 5d pop rbp
0x2f2f61804f9a 3a c21000 ret 0x10
-- B4 start (no frame) --
-- B1 start (deferred) --
-- </usr/local/google/home/bmeurer/Projects/v8/ex5.js:3:14> --
0x2f2f61804f9d 3d 48bb40690e0100000000 REX.W movq rbx,0x10e6940
0x2f2f61804fa7 47 33c0 xorl rax,rax
0x2f2f61804fa9 49 488b75f8 REX.W movq rsi,[rbp-0x8]
0x2f2f61804fad 4d e82ef6e7ff call 0x2f2f616845e0 ;; code: STUB, CEntryStub, minor: 8
0x2f2f61804fb2 52 ebbf jmp 0x2f2f61804f73 <+0x13>
0x2f2f61804fb4 54 e847f0cfff call 0x2f2f61504000 ;; deoptimization bailout 0
0x2f2f61804fb9 59 e84cf0cfff call 0x2f2f6150400a ;; deoptimization bailout 1
...SNIP...
And similarly if we run ex7.js with the class binding for A in Node 9 (or 8.2.0-rc1):
$ node --print-opt-code --code-comments --allow-natives-syntax ex7.js
...SNIP...
-- B0 start (construct frame) --
0x2e1f81f84e80 0 55 push rbp
0x2e1f81f84e81 1 4889e5 REX.W movq rbp,rsp
0x2e1f81f84e84 4 56 push rsi
0x2e1f81f84e85 5 57 push rdi
0x2e1f81f84e86 6 493ba5680c0000 REX.W cmpq rsp,[r13+0xc68]
0x2e1f81f84e8d d 0f8648000000 jna 0x2e1f81f84edb <+0x5b>
-- B2 start --
-- B3 start --
0x2e1f81f84e93 13 498b85a8ec0300 REX.W movq rax,[r13+0x3eca8]
0x2e1f81f84e9a 1a 488d5818 REX.W leaq rbx,[rax+0x18]
0x2e1f81f84e9e 1e 49399db0ec0300 REX.W cmpq [r13+0x3ecb0],rbx
0x2e1f81f84ea5 25 0f8647000000 jna 0x2e1f81f84ef2 <+0x72>
-- B5 start --
-- B6 start (deconstruct frame) --
0x2e1f81f84eab 2b 488d5818 REX.W leaq rbx,[rax+0x18]
0x2e1f81f84eaf 2f 4883c001 REX.W addq rax,0x1
0x2e1f81f84eb3 33 49899da8ec0300 REX.W movq [r13+0x3eca8],rbx
0x2e1f81f84eba 3a 48bb012f6b7ceb110000 REX.W movq rbx,0x11eb7c6b2f01 ;; object: 0x11eb7c6b2f01 <Map(PACKED_HOLEY_ELEMENTS)>
0x2e1f81f84ec4 44 488958ff REX.W movq [rax-0x1],rbx
0x2e1f81f84ec8 48 498b5d70 REX.W movq rbx,[r13+0x70]
0x2e1f81f84ecc 4c 48895807 REX.W movq [rax+0x7],rbx
0x2e1f81f84ed0 50 4889580f REX.W movq [rax+0xf],rbx
0x2e1f81f84ed4 54 488be5 REX.W movq rsp,rbp
0x2e1f81f84ed7 57 5d pop rbp
0x2e1f81f84ed8 58 c20800 ret 0x8
...SNIP...
We see that this is the ideal code. The reason for this is the CommonJS module system used by Node: every module is implicitly wrapped into a function. So ex7.js in Node corresponds roughly to the following code in Chrome or d8:
(function() {
  class A {};
  function makeA() { return new A; }
  makeA();
  makeA();
  %OptimizeFunctionOnNextCall(makeA);
  makeA();
})();
This is simplified (as I don't want to explain webpack as well here). What's interesting here is that A is local to the anonymous closure, and thus the parser can actually prove that A never changes after the initial definition, because no code outside the closure can see (and touch) the binding A. Thereby Ignition sticks an LdaImmutableCurrentContextSlot in there and TurboFan can generate awesome code for makeA:
$ out/Release/d8 --print-bytecode --allow-natives-syntax --print-opt-code --code-comments ex9.js
...SNIP...
[generating bytecode for function: makeA]
Parameter count 1
Frame size 8
45 E> 0x22ac28a2f7e6 @ 0 : 92 StackCheck
50 S> 0x22ac28a2f7e7 @ 1 : 13 04 LdaImmutableCurrentContextSlot [4]
0x22ac28a2f7e9 @ 3 : 97 00 ThrowReferenceErrorIfHole [0]
0x22ac28a2f7eb @ 5 : 1e fa Star r0
57 E> 0x22ac28a2f7ed @ 7 : 58 fa fa 00 03 Construct r0, r0-r0, [3]
64 S> 0x22ac28a2f7f2 @ 12 : 96 Return
...SNIP...
-- B0 start (construct frame) --
0x138cd23841e0 0 55 push rbp
0x138cd23841e1 1 4889e5 REX.W movq rbp,rsp
0x138cd23841e4 4 56 push rsi
0x138cd23841e5 5 57 push rdi
0x138cd23841e6 6 493ba5680c0000 REX.W cmpq rsp,[r13+0xc68]
0x138cd23841ed d 0f8648000000 jna 0x138cd238423b <+0x5b>
-- B2 start --
-- B3 start --
0x138cd23841f3 13 498b8578e40300 REX.W movq rax,[r13+0x3e478]
0x138cd23841fa 1a 488d5818 REX.W leaq rbx,[rax+0x18]
0x138cd23841fe 1e 49399d80e40300 REX.W cmpq [r13+0x3e480],rbx
0x138cd2384205 25 0f8647000000 jna 0x138cd2384252 <+0x72>
-- B5 start --
-- B6 start (deconstruct frame) --
0x138cd238420b 2b 488d5818 REX.W leaq rbx,[rax+0x18]
0x138cd238420f 2f 4883c001 REX.W addq rax,0x1
0x138cd2384213 33 49899d78e40300 REX.W movq [r13+0x3e478],rbx
0x138cd238421a 3a 48bb910501aa382d0000 REX.W movq rbx,0x2d38aa010591 ;; object: 0x2d38aa010591 <Map(PACKED_HOLEY_ELEMENTS)>
0x138cd2384224 44 488958ff REX.W movq [rax-0x1],rbx
0x138cd2384228 48 498b5d70 REX.W movq rbx,[r13+0x70]
0x138cd238422c 4c 48895807 REX.W movq [rax+0x7],rbx
0x138cd2384230 50 4889580f REX.W movq [rax+0xf],rbx
0x138cd2384234 54 488be5 REX.W movq rsp,rbp
0x138cd2384237 57 5d pop rbp
0x138cd2384238 58 c20800 ret 0x8
...SNIP...
So takeaways from this exercise:
- Looking at generated x64 machine code can be frightening.
- const comes with a cost for the TDZ, but can pay off in optimized code.
- class binding is equivalent to let binding; use const to get immutable binding on script scope.
- JavaScript VMs try to be smart within function scopes (as used by Node or webpack).