Native Compilation in Emacs

by HP

Abstract

The next version of Emacs will include an optimizing compiler from Elisp to native code, thanks to the work by Andrea Corallo et.al. This will facilitate performance increases between 2.3x and 42x¹ and a generally smoother editing experience².

In this talk, I will discuss the design and implementation of this new feature and demonstrate some of its implications for day-to-day use.

Introduction

Current State of the Gnunion

  • ~20% of emacs written in C because of performance

  • Byte compiler & VM used to speed up Elisp performance

  • Last major improvements to byte compiler: ca. 1990

  • Byte code portable in theory

    • Forward compatible between Emacs versions
    • Machine independent
    • But: Portability not often used, compilation often on target system

Motivation for Performance Improvements

  • Demanding modern requirements:

    • IDE features (lsp-mode)
    • Org Mode, Magit, treemacs, etc
  • Competition (VSCode) uses state of the Art JIT compiler (V8)

  • VIM users laugh at Emacs’ input latency

Now: Native Code 🎉

  • Multiple Attempts previously: All failed
  • New work by Andrea Corallo et. al.
  • Will be included in Emacs 28
  • Promises a faster and more extensible Emacs

Demo

Interpreter

(defun silly-loop1 (n)
  "Return the time, in seconds, to run N iterations of a loop."
  (let ((t1 (float-time)))
    (while (> (setq n (1- n)) 0))
    (- (float-time) t1)))
(silly-loop1 50000000)

Byte Code

(defun silly-loop2 (n)
  "Return the time, in seconds, to run N iterations of a loop."
  (let ((t1 (float-time)))
    (while (> (setq n (1- n)) 0))
    (- (float-time) t1)))
(byte-compile 'silly-loop2)
(silly-loop2 50000000)
(with-temp-buffer
  (disassemble 'silly-loop2 (current-buffer))
  (buffer-string))

Native Code

(defun silly-loop3 (n)
  "Return the time, in seconds, to run N iterations of a loop."
  (let ((t1 (float-time)))
    (while (> (setq n (1- n)) 0))
    (- (float-time) t1)))
(native-compile 'silly-loop3)
(silly-loop3 50000000)
(with-temp-buffer
  (disassemble 'silly-loop3 (current-buffer))
  (buffer-string))

Implementation Details

  • Starts with existing byte code representation

  • Implements bespoke IR with data-flow analysis, etc

  • Uses libgccjit for native code generation

LAP Generation

  • IR of the existing byte compiler
  • Stack-Machine based VM (think JVM)
(defun simple1 (n)
  (+ n 1))
(with-temp-buffer
  (disassemble 'simple1 (current-buffer))
  (buffer-string))

LAP to LIMPLE

LIMPLE

  • New IR for native code compiler
  • Named after “GIMPLE”, an intermediate IR of GCC
  • Already in form of basic blocks and control flow graph

Motivation:

  • Later IRs in GCC disregard Elisp’s semantics

  • Thus, some compiler features best implemented with Elisp in mind:

    • Type propagation:

      • GCC can only propagate constants and ranges
    • Purity analysis

    • Reference propagation

    • Unboxing

    • Compiler hints

      • comp-hint-fixnum
      • comp-hint-cons
    • Better warnings and errors

    • GCC optimization constraints: GCC very conservative with optimization

      • E.g. tail-call elimination
(defun simple2 (n)
  (+ n 1))
(let ((native-comp-verbose 2))
  (native-compile 'simple2))

LIMPLE -> SSA

  • Standard transformation into SSA (static single assignment) form

Data flow analysis

Propagation of

  • values
  • types
  • allocation

Call Optimization

  • Byte Code: Opcodes for some C-functions, funcall-trampoline for others

  • Now: All primitive C functions as direct calls

Dead Code Elimination

Tail Recursion Elimination

  • Only recursive tail-calls eliminated
  • Possible other functions in the future

Code Layout

  • Generats libgccjit IR
  • “Inlines” some primitives
  • Finally passes IR to libgccjit to generate native code
  • Compiled elisp functions follow ABI of existing primitive functions (in C)
(defun simple2 (n)
  (+ n 1))
(native-compile 'simple2)
(with-temp-buffer
  (disassemble 'simple2 (current-buffer))
  (buffer-string))

Usage

  • Transparent and asynchronous (depending on native-comp-deferred-compilation)

  • Whenever .elc file without matching .eln file is loaded:

    • Async instance of Emacs spawned for compilation
    • Finally: Natively compiled versions replace byte compiled functions
    • .eln file cached for future loading
  • Ahead-of-time compilation optionally possible

Sources