Optimization and Node.js

Part 1

How the design of Node.js makes your code fast

running.png

A Program

Threading can help

Threading has costs

  • Context switching is not free
  • Execution stacks take up memory
  • Difficult to develop with
  • Locking is hard to do right

Coordinating threads

lock1.png

lock2.png

lock3.png

lock4.png

lock5.png

There are other ways

poodle.png

Apache vs. nginx

nginx-apache-reqs-sec.png

nginx-apache-memory.png

The Difference

  • Apache uses threads
  • nginx uses an event loop

An Event Loop Program

  • The old program:
var data = DB.fetch("select ...")
print(data)
... other stuff ...
  • The new program:
DB.fetch("select ...", function (data) {
  print(data)
})
... other stuff ...

We're taking a blocking function call and turning the "wait" into "wait before doing this, but carry on with that in the meantime."
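A minimal runnable sketch of the same idea. `DB` here is a hypothetical stand-in, not a real database driver: its `fetch` uses a timer to simulate a slow query. The `log` array is only there to record the order in which things happen.

```javascript
// Hypothetical stand-in for a database client: the "query" finishes
// after 10 ms and hands its result to the callback.
const DB = {
  fetch(query, callback) {
    setTimeout(() => callback(`rows for "${query}"`), 10)
  },
}

const log = []

DB.fetch('select ...', function (data) {
  // Runs later, once the simulated query has completed.
  log.push(`got ${data}`)
  console.log(log)
})

// Runs immediately: the program does not wait for the query.
log.push('other stuff')
```

Run with `node`, the printed array has 'other stuff' first and the query result second, even though the fetch was started first.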

But Where?

confused.png

Wherever there is I/O.

// http, fs and rl (readline) are Node core modules; redis is a client library.
// options, hash, render and callback are assumed to be defined elsewhere.
http.request(options, function (response) {
  let body = ''
  response.on('data', (chunk) => { body += chunk })
  response.on('end', () => render(`I received ${body}`))
}).end()
redis.get(hash, (err, result) => {
  if (err) return callback(err)
  console.log(`My result: ${result}`)
})
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'})
  res.end('Hello World\r\n')
}).listen(8000)
fs.access('filename', function (err) {
  if (!err) console.log('filename exists!')
})
var i = rl.createInterface({ input: process.stdin, output: process.stdout })
i.question('What do you think of this presentation? ', function (answer) {
  console.log('Thank you for your valuable feedback.')
  i.close()
  process.stdin.destroy()
})

But How?

confused2.png

But How?

How is this different from threading? Or multiprocessing?

But How?

OS-level support for evented I/O, a fundamentally different mechanism:

  • select()
  • kqueue (FreeBSD, macOS)
  • epoll (Linux)
  • I/O Completion Ports (Windows)
  • libevent/libev/libuv (libraries that wrap the above; Node.js uses libuv)
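A rough sketch of what this buys you: one thread can have several waits in flight at once. Timers stand in for real I/O here, and `fakeIo` is a hypothetical helper, so treat this as an illustration of the event loop's behaviour rather than of any particular OS mechanism.

```javascript
const started = Date.now()
const finished = []

// Hypothetical helper: pretend to do I/O that takes `ms` milliseconds.
function fakeIo(name, ms, done) {
  setTimeout(() => done(name), ms)
}

// Start three 100 ms "operations" back to back on a single thread.
const allDone = new Promise((resolve) => {
  for (const name of ['a', 'b', 'c']) {
    fakeIo(name, 100, (n) => {
      finished.push(n)
      if (finished.length === 3) resolve(Date.now() - started)
    })
  }
})

allDone.then((elapsed) => {
  console.log(`3 waits of 100 ms finished in about ${elapsed} ms`)
})
```

The waits overlap, so the total should come out close to 100 ms rather than 300 ms, with no threads or locks involved.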

Part 2: V8

A tale of two compilers

  • generic

JS:

a + b

Assembler:

mov rax, a
mov rbx, b
call AddValues

Are they integers? Floats? Pointers to strings? AddValues will go through the work to figure that out.

  • optimizing
mov rax, a
mov rbx, b
add rax, rbx

If we can assume both operands are integers, the call collapses into a single add instruction: a huge speedup.
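To make the difference concrete, here is a loose model in JavaScript of the two code paths. Both function names are made up for illustration, and the real work happens in machine code inside V8, so read this only as a sketch of the shape of the logic.

```javascript
// Model of the generic path: work out what the operands are on every call.
function addValues(a, b) {
  if (typeof a === 'number' && typeof b === 'number') {
    return a + b // numeric addition
  }
  if (typeof a === 'string' && typeof b === 'string') {
    return a.concat(b) // string concatenation
  }
  return a + b // anything else: fall back to the full coercion rules
}

// Model of the optimized path: one cheap guard, then a plain add.
// If the guard fails, fall back to the generic path.
function addAssumingIntegers(a, b) {
  if (!Number.isInteger(a) || !Number.isInteger(b)) {
    return addValues(a, b)
  }
  return a + b
}

console.log(addAssumingIntegers(2, 3))
console.log(addAssumingIntegers('x', 'y'))
```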

How's that work?

Monomorphism

  • Compiler sees one type
  • Back-patches the compiled code to accommodate that type
  • With a minimal guard condition
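A small sketch of code that should stay monomorphic. `makePoint` and `getX` are hypothetical names; the point is only that every object reaching `getX` is built the same way.

```javascript
// Hypothetical factory: every point is built the same way, so every
// object has the same properties in the same order.
function makePoint(x, y) {
  return { x, y }
}

function getX(point) {
  return point.x // this property access only ever sees one kind of object
}

let total = 0
for (let i = 0; i < 10000; i++) {
  total += getX(makePoint(i, 0))
}
console.log(total)
```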

Inline caching

candidate % this.primes[i] == 0

Generic code, before type feedback (every operation calls an inline-cache stub):

push [ebp+0x8]
mov eax,[ebp+0xc]
mov edx,eax
mov ecx,0x50b155dd
call LoadIC_Initialize           ;; this.primes
push eax
mov eax,[ebp+0xf4]
pop edx
mov ecx,eax
call KeyedLoadIC_Initialize      ;; this.primes[i]
pop edx
call BinaryOpIC_Initialize Mod   ;; candidate % this.primes[i]

Specialized code, after type feedback (guards followed by direct loads):

cmp [edi+0xff],0x4920d181   ;; Is this a Primes object?
jnz 0x2a90a03c
mov eax,[edi+0xf]           ;; Fetch this.primes
test eax,0x1                ;; Is primes a SMI?
jz 0x2a90a050
cmp [eax+0xff],0x4920b001   ;; Is primes hidden class a packed SMI array?
mov ebx,[eax+0x7]
mov esi,[eax+0xb]           ;; Load array length
sar esi,1                   ;; Convert SMI length to int32
cmp ecx,esi                 ;; Check array bounds
jnc 0x2a90a06e
mov esi,[ebx+ecx*4+0x7]     ;; Load element
sar esi,1                   ;; Convert SMI element to int32
test esi,esi                ;; mod (int32)
jz 0x2a90a078
...
cdq
idiv esi

Inline caching

  • Speculative
  • Can be foiled
  • Will get back-patched again
  • To become polymorphic
  • And then megamorphic (the worst)
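A sketch of how a single property access can be pushed from monomorphic towards megamorphic. The objects are invented for illustration, and the exact number of shapes a cache tolerates before it gives up is a V8 implementation detail that is not assumed here.

```javascript
function getX(obj) {
  return obj.x
}

// Five differently built objects. In V8, both the property names and the
// order they are added in contribute to an object's hidden class, so
// these are expected to be five distinct shapes.
const mixed = [
  { x: 1 },
  { x: 2, y: 0 },
  { y: 0, x: 3 },
  { x: 4, z: 0 },
  { x: 5, label: 'e' },
]

let sum = 0
for (let i = 0; i < 10000; i++) {
  sum += getX(mixed[i % mixed.length]) // same access site, many shapes
}
console.log(sum)
```

The result is the same as it would be with uniform objects; only the speed of the `obj.x` lookup is affected.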

On a per-call basis

  • Keep functions relatively short!

Ways to check optimization

node --allow-natives-syntax

function printStatus(fn) {
    switch(%GetOptimizationStatus(fn)) {
        case 1: console.log("Function is optimized"); break
        case 2: console.log("Function is not optimized"); break
        case 3: console.log("Function is always optimized"); break
        case 4: console.log("Function is never optimized"); break
        case 6: console.log("Function is maybe deoptimized"); break
        case 7: console.log("Function is optimized by TurboFan"); break
        default: console.log("Unknown optimization status"); break
    }
}

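printStatus(myFunction)                  // status before forcing optimization
%OptimizeFunctionOnNextCall(myFunction)  // V8 intrinsic, needs --allow-natives-syntax
myFunction()
printStatus(myFunction)                  // status after the optimized call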

Ways to check optimization

node --trace_opt --trace_deopt

Function is not optimized
[compiling method 0x3730d8083cb9 <JS Function myFunction (SharedFunctionInfo 0x3dcabc53d291)> using TurboFan]
[optimizing 0x3730d8083cb9 <JS Function myFunction (SharedFunctionInfo 0x3dcabc53d291)> - took 1.606, 0.000, 0.000 ms]
Function is optimized by TurboFan

Ways to Kill Optimization

Things not optimized (right now):

  • Generator functions
  • Functions that contain a for-of statement
  • Functions that contain a try-catch statement
  • Functions that contain a try-finally statement
  • Functions that contain a compound let assignment
  • Functions that contain a compound const assignment
  • Functions that contain object literals that contain __proto__, or get or set declarations
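One workaround for the try-catch item, assuming a V8 version where the list above applies, is to move the try-catch into a tiny helper so that the hot function itself contains none. `tryCatch` and `parseAll` are hypothetical names used only for this sketch.

```javascript
// Tiny helper whose only job is to contain the try-catch.
function tryCatch(fn, ctx, args) {
  try {
    return { ok: true, value: fn.apply(ctx, args) }
  } catch (err) {
    return { ok: false, error: err }
  }
}

// The hot function contains no try-catch statement of its own.
function parseAll(lines) {
  const results = []
  for (let i = 0; i < lines.length; i++) {
    const r = tryCatch(JSON.parse, JSON, [lines[i]])
    results.push(r.ok ? r.value : null)
  }
  return results
}

console.log(parseAll(['{"a":1}', 'not json', '[2]']))
```

Only the small helper pays the cost; the loop around it remains a candidate for optimization.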

Ways to Kill Optimization

Things likely never optimizable:

  • Functions that contain a debugger statement
  • Functions that call eval() directly
  • Functions that contain a with statement

Also: for speed, keep integers within 31 bits, so they can be stored as SMIs (small integers)!
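A quick way to check whether a value stays inside that range, assuming a signed 31-bit small-integer representation (the actual width depends on the platform and V8 build). `fitsIn31Bits` is a hypothetical helper, not a V8 API.

```javascript
// Signed 31-bit range: -2^30 up to 2^30 - 1.
const SMI_MIN = -(2 ** 30)
const SMI_MAX = 2 ** 30 - 1

function fitsIn31Bits(n) {
  return (
    Number.isInteger(n) &&
    !Object.is(n, -0) && // negative zero is not a small integer
    n >= SMI_MIN &&
    n <= SMI_MAX
  )
}

console.log(fitsIn31Bits(1073741823)) // 2^30 - 1
console.log(fitsIn31Bits(1073741824)) // 2^30
console.log(fitsIn31Bits(0.5))
```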

Ways to Kill Optimization

  • Misuse of the arguments object

Only use:

  • arguments.length
  • arguments[i]
  • One exception: fn.apply(y, arguments)
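A sketch of the three safe patterns. The function names are invented for illustration. The usual way to get into trouble is to let `arguments` escape, for example by passing it to `Array.prototype.slice.call`; copying by index avoids that.

```javascript
// Safe: only arguments.length and arguments[i] are touched.
function sum() {
  let total = 0
  for (let i = 0; i < arguments.length; i++) {
    total += arguments[i]
  }
  return total
}

// Safe way to get a real array: copy element by element.
function toArray() {
  const args = new Array(arguments.length)
  for (let i = 0; i < args.length; i++) {
    args[i] = arguments[i]
  }
  return args
}

// The one exception: forwarding arguments with fn.apply.
function forward() {
  return sum.apply(null, arguments)
}

console.log(sum(1, 2, 3), toArray('a', 'b'), forward(4, 5))
```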

References