Native Compilation in Emacs
This meeting will be online. Please join us with the following link: https://conf.dfn.de/webapp/conference/979117107
Abstract
The next version of Emacs will include an optimizing compiler from Elisp to native code, thanks to the work by Andrea Corallo et.al. This will facilitate performance increases between 2.3x and 42x¹ and a generally smoother editing experience².
In this talk, I will discuss the design and implementation of this new feature and demonstrate some of its implications for day-to-day use.
- ¹ according to https://arxiv.org/abs/2004.02504
- ² according to me
Introduction
Current State of the Gnunion
~20% of emacs written in C because of performance
Byte compiler & VM used to speed up Elisp performance
Last major improvements to byte compiler: ca. 1990
Byte code portable in theory
- Forward compatible between Emacs versions
- Machine independent
- But: Portability not often used, compilation often on target system
Motivation for Performance Improvements
Demanding modern requirements:
- IDE features (lsp-mode)
- Org Mode, Magit, treemacs, etc
Competition (VSCode) uses state of the Art JIT compiler (V8)
VIM users laugh at Emacs’ input latency
Now: Native Code 🎉
- Multiple Attempts previously: All failed
- New work by Andrea Corallo et. al.
- Will be included in Emacs 28
- Promises a faster and more extensible Emacs
Demo
Interpreter
(defun silly-loop1 (n)
"Return the time, in seconds, to run N iterations of a loop."
(let ((t1 (float-time)))
(while (> (setq n (1- n)) 0))
(- (float-time) t1)))
(silly-loop1 50000000)
Byte Code
(defun silly-loop2 (n)
"Return the time, in seconds, to run N iterations of a loop."
(let ((t1 (float-time)))
(while (> (setq n (1- n)) 0))
(- (float-time) t1)))
(byte-compile 'silly-loop2)
(silly-loop2 50000000)
(with-temp-buffer
(disassemble 'silly-loop2 (current-buffer))
(buffer-string))
Native Code
(defun silly-loop3 (n)
"Return the time, in seconds, to run N iterations of a loop."
(let ((t1 (float-time)))
(while (> (setq n (1- n)) 0))
(- (float-time) t1)))
(native-compile 'silly-loop3)
(silly-loop3 50000000)
(with-temp-buffer
(disassemble 'silly-loop3 (current-buffer))
(buffer-string))
Implementation Details
Starts with existing byte code representation
Implements bespoke IR with data-flow analysis, etc
Uses libgccjit for native code generation
LAP Generation
- IR of the existing byte compiler
- Stack-Machine based VM (think JVM)
(defun simple1 (n)
(+ n 1))
(with-temp-buffer
(disassemble 'simple1 (current-buffer))
(buffer-string))
LAP to LIMPLE
LIMPLE
- New IR for native code compiler
- Named after “GIMPLE”, an intermediate IR of GCC
- Already in form of basic blocks and control flow graph
Motivation:
Later IRs in GCC disregard Elisp’s semantics
Thus, some compiler features best implemented with Elisp in mind:
Type propagation:
- GCC can only propagate constants and ranges
Purity analysis
Reference propagation
Unboxing
Compiler hints
comp-hint-fixnum
comp-hint-cons
Better warnings and errors
GCC optimization constraints: GCC very conservative with optimization
- E.g. tail-call elimination
(defun simple2 (n)
(+ n 1))
(let ((native-comp-verbose 2))
(native-compile 'simple2))
LIMPLE -> SSA
- Standard transformation into SSA (static single assignment) form
Data flow analysis
Propagation of
- values
- types
- allocation
Call Optimization
Byte Code: Opcodes for some C-functions, funcall-trampoline for others
Now: All primitive C functions as direct calls
Dead Code Elimination
Tail Recursion Elimination
- Only recursive tail-calls eliminated
- Possible other functions in the future
Code Layout
- Generats libgccjit IR
- “Inlines” some primitives
- Finally passes IR to libgccjit to generate native code
- Compiled elisp functions follow ABI of existing primitive functions (in C)
(defun simple2 (n)
(+ n 1))
(native-compile 'simple2)
(with-temp-buffer
(disassemble 'simple2 (current-buffer))
(buffer-string))
Usage
Transparent and asynchronous (depending on
native-comp-deferred-compilation
)Whenever
.elc
file without matching.eln
file is loaded:- Async instance of Emacs spawned for compilation
- Finally: Natively compiled versions replace byte compiled functions
.eln
file cached for future loading
Ahead-of-time compilation optionally possible