Support for WebAssembly (abbreviated Wasm) is a critical part of the VGG engine. For performance and cross-platform reasons, the VGG engine is written in C++. It can be compiled to WebAssembly so that it runs in browsers. More importantly, it also allows user-generated Wasm files to be plugged into designs.
WebAssembly is an executable binary format for a stack-based virtual machine.
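To make "stack-based virtual machine" concrete, here is a minimal sketch of a toy evaluator. The `Op` enum, the `Instr` struct, and the `run` function are hypothetical names invented for this illustration; they only mimic the shape of Wasm's i32 instructions and are not a real Wasm decoder.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy instruction set, loosely modeled on Wasm's i32 operations.
enum class Op { Const, Add, Mul };

struct Instr {
    Op op;
    std::int32_t imm;  // only meaningful for Op::Const
};

// Runs the instructions on an operand stack and returns the single value
// left on the stack at the end.
std::int32_t run(const std::vector<Instr>& code) {
    std::vector<std::int32_t> stack;
    for (const Instr& in : code) {
        if (in.op == Op::Const) {
            stack.push_back(in.imm);
            continue;
        }
        // Binary operators pop two operands and push one result.
        assert(stack.size() >= 2);
        std::int32_t rhs = stack.back(); stack.pop_back();
        std::int32_t lhs = stack.back(); stack.pop_back();
        stack.push_back(in.op == Op::Add ? lhs + rhs : lhs * rhs);
    }
    assert(stack.size() == 1);
    return stack.back();
}
```

With this sketch, the expression (2 + 3) * 4 is written as the sequence const 2, const 3, add, const 4, mul, and `run` returns 20 for it.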
There has been plenty of previous work, including blog posts, papers, and books, that helps us understand the WebAssembly format. However, little of it focuses on the process of compiling C++ to WebAssembly.
In this post, we share the process of using Emscripten, the de facto compiler toolchain for WebAssembly, to compile C++ code into WebAssembly. Hope you enjoy it!
Background of WebAssembly
Let’s start the story with the JVM [1], the most famous virtual machine in programming history. JVM-based languages, including Java, Clojure, Scala, and Kotlin, compile to bytecode that obeys the JVM specification, so they can run on the JVM without cross-platform issues while getting its features, including GC [2], exception handling, multithreading, and atomic operations.
The WebAssembly format actually borrows a lot from the JVM specification. Some details can be found in the paper Bringing the Web up to Speed with WebAssembly [3]. The authors also implemented a virtual machine to execute the WebAssembly format. As long as a programming language can be compiled into WebAssembly, it can be executed in this WebAssembly virtual machine. This is exactly the role the JVM plays for JVM-based languages.
WebAssembly was designed to speed up the Web; for example, it lets us use Photoshop in the browser. Its reach has since extended well beyond the browser. It could help build portable standards for embedded devices, narrowing the gap between embedded hardware and software. As a consequence, more developers using high-level programming languages would be able to deploy their products on tiny embedded hardware. It is the last piece of the puzzle in making everything intelligent.
Besides, the blockchain community is paying more attention to WebAssembly. For example, Ethereum executes contracts in its own virtual machine, the EVM [4]. These contracts are written in Solidity, a language influenced by C++, Python, and JavaScript. Developers cannot port the contracts to another chain unless it also supports the EVM. What happens if we use WebAssembly as the virtual machine format instead? Developers can write contracts in any programming language that compiles to WebAssembly, and the contract is then portable to any chain that supports a WebAssembly virtual machine. Many newer blockchain projects already treat it as the next generation of virtual machine. We believe it will be the future of DeFi infrastructure.
Rust and C++ are the primary languages supported for WebAssembly generation. WebAssembly's linear memory and function table map naturally onto the C++ memory model and function pointers. Compiling to WebAssembly binaries is supported by Clang, which includes the WebAssembly target of the LLVM [5] framework. Clang is the C++ compiler built from LLVM modules. It translates C++ to LLVM IR [6] and then uses further tools to generate executable binary files for the target platform.
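As an illustration of the linear memory idea, the sketch below models a Wasm memory as one flat byte array in which a C++ pointer is just a 32-bit offset. The `LinearMemory` class and its method names are hypothetical; only the 64 KiB page size and the little-endian byte order come from the WebAssembly specification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kWasmPageSize = 65536;  // Wasm memory grows in 64 KiB pages

// Hypothetical model of a Wasm linear memory: one flat, byte-addressed array.
class LinearMemory {
public:
    explicit LinearMemory(std::size_t pages) : bytes_(pages * kWasmPageSize, 0) {}

    // Stores a 32-bit value at a byte offset in little-endian order.
    void store_i32(std::uint32_t addr, std::uint32_t value) {
        assert(static_cast<std::size_t>(addr) + 4 <= bytes_.size());
        for (int i = 0; i < 4; ++i)
            bytes_[addr + i] = static_cast<std::uint8_t>(value >> (8 * i));
    }

    // Loads the 32-bit little-endian value stored at a byte offset.
    std::uint32_t load_i32(std::uint32_t addr) const {
        assert(static_cast<std::size_t>(addr) + 4 <= bytes_.size());
        std::uint32_t value = 0;
        for (int i = 0; i < 4; ++i)
            value |= static_cast<std::uint32_t>(bytes_[addr + i]) << (8 * i);
        return value;
    }

    // Loads a single byte, which makes the byte order observable.
    std::uint8_t load_u8(std::uint32_t addr) const {
        assert(addr < bytes_.size());
        return bytes_[addr];
    }

private:
    std::vector<std::uint8_t> bytes_;
};
```

Under this model, dereferencing an `int*` compiled for wasm32 corresponds to a `load_i32` at the pointer's numeric value, which is why C and C++ programs fit the format so directly.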
It’s very convenient to set up a C++ WebAssembly building environment with emsdk, which collects all the toolkits needed for compiling WebAssembly.
After installing emsdk, we get all the tools under the directory upstream. We can check the installation with the command:
The clang executable is downloaded directly from official LLVM releases, unmodified. If we want to build LLVM ourselves, we can follow the instructions from
upstream/cache/sysroot is a very important directory. It contains the header and library files for the subsequent compiling and linking.
Compile C++ to WebAssembly with Clang
It is not easy to master compiling C++ programs, because the compiler has thousands of options to control the compiling and linking process. As curious developers, we can debug the LLVM and clang source code to understand how it works. Let’s start with the famous hello world program.
Here we include the header file stdio.h, which is part of the C standard library libc. Emscripten uses the musl library and a libc library customized for the wasi environment, and copies libunwind from LLVM; these are used to support the C++ language features.
To show how the program is compiled, we split the process into four stages: preprocessing, generating LLVM IR, generating the target object file, and linking.
--target=wasm32 defines the wasm32 target.
-E indicates that we only run the preprocessing stage.
-v prints the details of the execution process.
In the end, we will encounter the error
It is about the missing header file stdio.h. C++ declares function interfaces in header files, and in the preprocessing step the compiler searches a list of directories to find them. Here we are using the clang from emsdk directly, so it cannot find the correct header file. We notice that the current header search directories only include:
Then we can add -I to add a header search directory and resolve the problem. As usual, we could add the system header include directory, but the system headers and libraries are not adapted for WebAssembly. Emscripten offers all the basic headers and libraries required to build the wasm file. We can use the --sysroot option to point clang at them.
--sysroot is only required together with the option --target=wasm32. We cannot use it in a normal C++ compilation, otherwise the build fails.
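The header lookup described above can be modeled in a few lines. This is a deliberately simplified sketch: `find_header` is a hypothetical function, the `exists` callback stands in for a real filesystem check, and real clang additionally distinguishes quote, angle-bracket, and system directories.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Returns the first "<dir>/<header>" for which `exists` reports true, or an
// empty string when no search directory contains the header.
std::string find_header(const std::string& header,
                        const std::vector<std::string>& search_dirs,
                        const std::function<bool(const std::string&)>& exists) {
    assert(!header.empty());
    for (const std::string& dir : search_dirs) {
        const std::string candidate = dir + "/" + header;
        if (exists(candidate)) return candidate;  // earlier directories win
    }
    return "";
}
```

In this model, a missing stdio.h corresponds to an empty result, and adding the sysroot include directory to `search_dirs` is what makes the lookup succeed. Because earlier entries win, the order in which directories are added matters.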
LLVM Intermediate Representation (IR)
IR is the most important design decision in LLVM: any language compiled to the IR format can reuse the target platform code generation and assembling stages, along with many optimization libraries. The WebAssembly output is therefore generated from IR, and other LLVM-based programming languages can also be compiled to the WebAssembly format easily.
-I is used to prepend header search directories.
-S tells clang to stop after compilation and emit an assembly file (LLVM assembly when combined with -emit-llvm).
-emit-llvm produces the LLVM intermediate representation.
The .ll file format is the text format of IR.
Besides, to understand how the optimization process works, we can add -O2 and check the output. If we run the command without -emit-llvm, we get a plain assembly text file for the target.
As for Emscripten, we find that it adapts some header files to support WebAssembly. The wasm file is executed in a virtual machine, and the WebAssembly instruction set currently has no direct access to some system devices and kernel interfaces, so Emscripten needs to replace these functions when compiling.
Target Object File
In this stage, clang uses the LLVM IR to generate object files for the target platform. From here on, LLVM handles all the remaining work needed to construct an executable binary file.
The -c option produces the compiled object file for the target.
It produces a hello.o binary file, whose symbols the linker resolves, along with some optimization work, in the linking stage.
Let’s run compiling and linking as separate commands. Clang itself only chains the tools together to build the executable target file. To show the linking process, we only need the tool wasm-ld instead of ld. More details can be found in driver.cpp.
wasm-ld is the wasm linker in the clang tools.
-o hello.wasm indicates that the output is a wasm file.
hello.o is the compiled object file.
-L... prepends a library search directory in the linking stage.
-lcompiler_rt adds the compiler_rt library for the linker to use when resolving symbols.
--no-entry avoids the linker error about the undefined entry symbol _start.
If we look into the wasm file, we will find it has some Emscripten symbols inside.
wasm2wat, one of the tools in the WABT toolkit, transforms a wasm binary file into a human-readable text format file.
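Before reaching for a disassembler, a quick sanity check on the produced file is to look at its first eight bytes. The sketch below assumes the file contents have already been read into a byte vector; `wasm_version` and `has_wasm_header` are hypothetical helper names, while the magic bytes and version number come from the WebAssembly binary format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads the little-endian u32 version field stored in bytes 4..7.
std::uint32_t wasm_version(const std::vector<std::uint8_t>& bytes) {
    assert(bytes.size() >= 8);
    std::uint32_t version = 0;
    for (std::size_t i = 0; i < 4; ++i)
        version |= static_cast<std::uint32_t>(bytes[4 + i]) << (8 * i);
    return version;
}

// True when `bytes` starts with the magic "\0asm" followed by version 1.
bool has_wasm_header(const std::vector<std::uint8_t>& bytes) {
    static const std::uint8_t kMagic[4] = {0x00, 0x61, 0x73, 0x6D};
    if (bytes.size() < 8) return false;
    for (std::size_t i = 0; i < 4; ++i)
        if (bytes[i] != kMagic[i]) return false;
    return wasm_version(bytes) == 1;
}
```

A file that fails this check is not a Wasm module at all, so there is no point in passing it to a text-format converter.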
Compile C++ to WebAssembly with Emscripten
In the last chapter, we compiled C++ programs to WebAssembly with the clang toolchain. Emscripten organizes this driver process for us with a Python tool called emcc.
We dumped the commands that emcc runs, as they show the best practice for compiling WebAssembly with the Emscripten toolchain. By stepping through emcc in the Python debugger, we find that it splits the whole process into three main phases:
The compile command in
The link command in
emcc has done a lot of optimization work in the compiling and linking process.
Binaryen - the post link phase
Binaryen is another optimization tool integrated into Emscripten. It parses the wasm binary file into its own AST (Abstract Syntax Tree) instead of working on the plain wasm stack format. This AST then helps optimize the wasm file further in
Binaryen will not be launched unless the -O3 parameter is passed to the
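To show why a tree is easier to optimize than flat stack code, here is a toy sketch that rebuilds an expression tree from a stack-style instruction list and then folds constant subexpressions. All names (`StackInstr`, `Expr`, `to_tree`, `fold`, `to_sexpr`) are hypothetical, and this is not Binaryen's API or its actual algorithm, only an illustration of the idea.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy instruction: a leaf ("const", "local") or a binary
// operator ("add", "mul").
struct StackInstr {
    std::string name;
    int value;  // constant value or local index; unused for operators
};

struct Expr {
    std::string name;
    int value;
    std::unique_ptr<Expr> lhs, rhs;  // both null for leaves
};

// Rebuilds a tree by simulating the operand stack with nodes instead of values.
std::unique_ptr<Expr> to_tree(const std::vector<StackInstr>& code) {
    std::vector<std::unique_ptr<Expr>> stack;
    for (const StackInstr& in : code) {
        std::unique_ptr<Expr> node(new Expr{in.name, in.value, nullptr, nullptr});
        if (in.name == "add" || in.name == "mul") {
            assert(stack.size() >= 2);
            node->rhs = std::move(stack.back()); stack.pop_back();
            node->lhs = std::move(stack.back()); stack.pop_back();
        }
        stack.push_back(std::move(node));
    }
    assert(stack.size() == 1);
    return std::move(stack.back());
}

// Constant folding: an operator whose operands are both constants becomes
// a single constant node.
void fold(Expr& e) {
    if (!e.lhs) return;  // leaf
    fold(*e.lhs);
    fold(*e.rhs);
    if (e.lhs->name == "const" && e.rhs->name == "const") {
        e.value = e.name == "add" ? e.lhs->value + e.rhs->value
                                  : e.lhs->value * e.rhs->value;
        e.name = "const";
        e.lhs.reset();
        e.rhs.reset();
    }
}

// Renders the tree as an s-expression for inspection.
std::string to_sexpr(const Expr& e) {
    if (!e.lhs) return "(" + e.name + " " + std::to_string(e.value) + ")";
    return "(" + e.name + " " + to_sexpr(*e.lhs) + " " + to_sexpr(*e.rhs) + ")";
}
```

For the sequence local 0, const 2, const 3, mul, add, the tree prints as (add (local 0) (mul (const 2) (const 3))) and, after `fold`, as (add (local 0) (const 6)). On the flat sequence the same rewrite would require tracking which instructions feed which operator; on the tree it is a short recursion.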
WebAssembly API with Emscripten
Module. Apparently, it is easier and more convenient to use the
Let’s create a file named math.c, in which we define an external function, an alloc function used to do memory allocation, and a factorial function to test the operations. Then let’s build the wasm file.
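The original listing is not reproduced here, so the following is only a sketch of what the alloc and factorial functions described above might look like. The signatures are assumptions, the imported external function is left out because its name is not given, and the code is written as C++ with `extern "C"` so that the exported names stay unmangled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// extern "C" keeps the symbol names unmangled, so a host would see them as
// "alloc" and "factorial" rather than as C++-mangled names.
extern "C" {

// Hands out `size` bytes from the module's heap; the caller owns the buffer.
// (Assumed signature, not taken from the original math.c.)
void* alloc(std::size_t size) {
    return std::malloc(size);
}

// Computes n! iteratively. 12! is the largest factorial that fits in a
// 32-bit signed int, so larger inputs are rejected.
int factorial(int n) {
    assert(n >= 0 && n <= 12);
    int result = 1;
    for (int i = 2; i <= n; ++i) result *= i;
    return result;
}

}  // extern "C"
```

An alloc export of this kind is useful because the host side can only pass buffers to the module by writing into its linear memory, and it needs the module to tell it where free space is.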
And in the linking step, we make use of the full command from
Using the option --export-all will make the linker import function
Instead, we can use the emcc command directly to complete the whole building process.
wasm-opt, a tool from Binaryen, is used to optimize the wasm file.
-sERROR_ON_UNDEFINED_SYMBOLS=0 is used to avoid the compiler.js processing bug in
Then we add the following code in a file index.html.
Open index.html in Chrome and we can see the output in the console.
In this article, we reviewed how the emcc script works and extracted the
WebAssembly text format, which helps a lot in the whole debugging process. We also offer some suggestions on developing C/C++ with WebAssembly.
emcc is a helpful tool for compiling C++ to wasm, but it’s complicated to understand both
- Discord: https://discord.gg/89fFapjfgM