Perhaps you’ve watched the recent KotlinConf 2023 Keynote on updates for the K2 compiler. What is the K2 compiler?
Or perhaps you have been waiting for Part 2 of the Crash Course on the Kotlin Compiler. Before we continue, we take a step back for a high-level overview of the different Kotlin compiler frontends and backends and how they differ. A brief introduction to the kinds of data transformations that occur during compilation will be an important primer for the next part.
For quick reference:
1. Frontend: Parsing phase
2. Detour: K1 + K2 Frontends, Backends
3. Frontend: Resolution phase (coming soon)
The basics
Source code is fed to the Kotlin compiler, which turns the human-readable text into machine-executable code for the designated target platform.
If we were to grossly oversimplify, we can think of the compiler as doing two things: compiling and lowering. Compiling changes one data format into another, while lowering generally simplifies or optimizes within the existing data format.
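To make the distinction concrete, here is a minimal sketch using a toy expression language. None of these types exist in the Kotlin compiler; Expr, Instr, lower and compile are hypothetical names chosen for illustration only. Lowering keeps the same format (Expr to Expr) and only simplifies it, while compiling moves to a different format (Expr to a list of instructions).

```kotlin
// Toy expression tree: the "input" data format (hypothetical, not a compiler type).
sealed interface Expr
data class Const(val value: Int) : Expr
data class Var(val name: String) : Expr
data class Add(val left: Expr, val right: Expr) : Expr

// Toy stack-machine instructions: the "output" data format (also hypothetical).
sealed interface Instr
data class Push(val value: Int) : Instr
data class Load(val name: String) : Instr
object Plus : Instr

// Lowering: Expr -> Expr. The format stays the same, the tree gets simpler
// (here: additions of two constants are folded into one constant).
fun lower(expr: Expr): Expr = when (expr) {
    is Add -> {
        val l = lower(expr.left)
        val r = lower(expr.right)
        if (l is Const && r is Const) Const(l.value + r.value) else Add(l, r)
    }
    is Const, is Var -> expr
}

// Compiling: Expr -> List<Instr>. The data changes into a different format.
fun compile(expr: Expr): List<Instr> = when (expr) {
    is Const -> listOf(Push(expr.value))
    is Var -> listOf(Load(expr.name))
    is Add -> compile(expr.left) + compile(expr.right) + Plus
}

fun main() {
    val tree = Add(Var("x"), Add(Const(1), Const(2)))
    val lowered = lower(tree)   // the inner addition is folded into a single constant
    println(lowered)
    println(compile(lowered))   // the lowered tree is then turned into instructions
}
```

The real compiler does far more than constant folding, but the shape is the same: some passes rewrite a representation in place, others translate it into the next representation.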
When Kotlin code is compiled, a set of configurations is chosen to run the Kotlin compiler: which environment to run in (e.g. CLI, Analysis API), which processor options to use, which plugins to hook into the compiler, and which frontend and backend to use.
The Kotlin compiler has 2 frontends, K1 and K2, and 4 backends: JVM, JS, Native, and an experimental WASM.
K1/K2 Frontends
The Kotlin compiler has two frontends: the K1 frontend (denoted in the source code with Fe10-) and the K2 frontend (sometimes called the FIR frontend, and sometimes denoted in the source code with Fir-). Choosing a frontend determines what information is sent to the backend for IR generation and subsequent target generation.
The K1 and K2 frontends share similar early stages. The difference is that the FIR (Frontend Intermediate Representation) frontend introduces an additional data format before handing the transformed code to the backend, which then immediately converts it to IR (Intermediate Representation) for further processing.
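As a rough summary of that difference, the sketch below lists the data formats each frontend passes through before the backend takes over. The Frontend enum and dataFormats function are hypothetical names for illustration; the stage names come from this article, not from the compiler's API.

```kotlin
// Hypothetical model of the two frontends described above.
enum class Frontend { K1, K2 }

// Returns the sequence of data formats, ending with the backend's IR.
fun dataFormats(frontend: Frontend): List<String> = when (frontend) {
    // K1 hands PSI (plus descriptors and BindingContext) straight to the backend.
    Frontend.K1 -> listOf("source code", "tokens", "PSI", "IR")
    // K2 adds one more format, FIR, before the backend produces IR.
    Frontend.K2 -> listOf("source code", "tokens", "PSI", "FIR", "IR")
}

fun main() {
    for (frontend in Frontend.values()) {
        println("$frontend: " + dataFormats(frontend).joinToString(" -> "))
    }
}
```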
K1 Frontend
The K1 frontend (Fe10-) takes human-readable source code, breaks the text down into lexical tokens, creates a PSI tree, and performs resolution to create additional data structures, such as descriptors and BindingContext, for the backend.
K1 Frontend data format changes before it is sent to the backend: source code → tokens → AST/PSI
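The chain source code → tokens → tree can be sketched with a toy lexer and parser. This is only an illustration under strong assumptions: tokens are separated by whitespace, only the shape `val <name> = <int>` is accepted, and Token, Node, tokenize and parseProperty are hypothetical names rather than the compiler's lexer or PSI classes.

```kotlin
// Hypothetical token types for a tiny subset of Kotlin.
enum class TokenType { KEYWORD, IDENTIFIER, EQ, INT_LITERAL }
data class Token(val type: TokenType, val text: String)

// A toy tree node standing in for a PSI element.
data class Node(
    val kind: String,
    val text: String = "",
    val children: List<Node> = emptyList(),
)

// Step 1: source code -> tokens (assumes tokens are whitespace-separated).
fun tokenize(source: String): List<Token> =
    source.trim().split(Regex("\\s+")).map { word ->
        when {
            word == "val" -> Token(TokenType.KEYWORD, word)
            word == "=" -> Token(TokenType.EQ, word)
            word.all { it.isDigit() } -> Token(TokenType.INT_LITERAL, word)
            else -> Token(TokenType.IDENTIFIER, word)
        }
    }

// Step 2: tokens -> tree, for the single shape `val <name> = <int>`.
fun parseProperty(tokens: List<Token>): Node {
    val expected = listOf(
        TokenType.KEYWORD, TokenType.IDENTIFIER, TokenType.EQ, TokenType.INT_LITERAL
    )
    require(tokens.map { it.type } == expected) { "Expected: val <name> = <int>" }
    return Node(
        kind = "PROPERTY",
        children = listOf(
            Node("IDENTIFIER", tokens[1].text),
            Node("INTEGER_CONSTANT", tokens[3].text),
        ),
    )
}

fun main() {
    val tokens = tokenize("val answer = 42")
    println(tokens)
    println(parseProperty(tokens))
}
```

Note that the tree at this point is purely syntactic: nothing yet says that answer has type Int. That is the job of resolution.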
PSI stands for Program Structure Interface, a layer in the IntelliJ Platform that parses files and creates the syntactic and semantic code model used later in compilation.
In the case of the K1 frontend, resolution is performed on the PSI tree to generate descriptors and BindingContext, which are all sent over to the backend to be transformed into IR.
• Descriptors, depending on element type, may hold context, scope, containing information, overrides, companion objects, and so on.
• BindingContext is a big map in which a PSI element is the key to a map of descriptors and other information used later to make inferences about the code.
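The "big map" idea can be sketched as a map keyed by PSI element, where each entry holds a second map from a kind of information to a descriptor. Everything here is a simplified stand-in: ToyPsiElement, ToyDescriptor, Slice and ToyBindingContext are hypothetical names, and the real classes carry far more information.

```kotlin
// Hypothetical stand-ins for a PSI element and a descriptor.
data class ToyPsiElement(val text: String, val offset: Int)
data class ToyDescriptor(val name: String, val type: String, val containingDeclaration: String)

// The kind of information recorded for an element (assumed for this sketch).
enum class Slice { DECLARATION, REFERENCE_TARGET }

// PSI element -> (kind of information -> descriptor).
class ToyBindingContext {
    private val data = HashMap<ToyPsiElement, MutableMap<Slice, ToyDescriptor>>()

    // Called during resolution to store what was learned about an element.
    fun record(slice: Slice, element: ToyPsiElement, descriptor: ToyDescriptor) {
        data.getOrPut(element) { mutableMapOf() }[slice] = descriptor
    }

    // Called later to look the information back up; null if nothing was recorded.
    operator fun get(slice: Slice, element: ToyPsiElement): ToyDescriptor? =
        data[element]?.get(slice)
}

fun main() {
    val context = ToyBindingContext()
    val declaration = ToyPsiElement("val answer = 42", offset = 0)
    val usage = ToyPsiElement("answer", offset = 30)
    val descriptor = ToyDescriptor("answer", "Int", "<file>")

    context.record(Slice.DECLARATION, declaration, descriptor)
    context.record(Slice.REFERENCE_TARGET, usage, descriptor)

    println(context[Slice.REFERENCE_TARGET, usage])
    println(context[Slice.DECLARATION, usage])
}
```

Even in this toy form, every question about the code becomes a hash lookup followed by a hop to a separate descriptor object, which hints at why the approach gets expensive at scale.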
The K1 compiler sends PSI and BindingContext to the backend, which uses Psi2Ir to transform this information into IR for further processing.
However, sending over PSI and BindingContext like this has led to performance issues in the compiler.
As explained by Dmitriy Novozhilov in the Kotlinlang Slack, resolution results were also stored in BindingContext, so the CPU could not cache objects quickly enough. All descriptors were lazy, which caused the compiler to jump between different parts of the code and killed a number of JIT optimizations.