What is Clang?
I’ve spent the last few months working with Clang, a frontend for the LLVM compiler project. Clang can parse and analyze any source code in the C language family (C, C++, Objective-C, etc.) and has a wonderful modular design that makes it easy to use.
If you’re looking to do static analysis on some C code, I highly recommend Clang — it’s vastly superior to other static analysis tools (like CIL … yuck!) and has decent documentation. Also, the Clang mailing list is very active and helpful if you ever find yourself stuck on something.
I personally used Clang to perform static analysis on I/O device drivers in the Linux kernel, including camera drivers and DRM drivers for graphics cards. Kernel code (especially drivers) can be very complex and difficult to analyze, but Clang allows you to handle it with ease. Let’s take a look at what you can do with Clang.
How does Clang work?
In most cases, Clang runs the preprocessor (expanding all macros) and parses your source code into an Abstract Syntax Tree (AST). The AST is a lot easier to work with than raw C source, and you never lose the link to the original code: every data structure Clang uses to represent the code (AST, CFG, etc.) can be mapped back to the original source, which is quite useful for many analysis purposes (refactoring, etc.).
If you need to analyze or modify code at the source level, Clang is a better fit than LLVM. Doing analysis with LLVM means you must work on LLVM’s intermediate representation (IR) of the code, which is similar to assembly.
The Clang AST
Almost every compiler and static analysis tool uses an AST to represent the source code. Clang’s AST is very detailed and complex, but you’ll actually enjoy learning about all the different Classes of AST elements. Here’s an introduction to the Clang AST, but the easiest way to learn about it is to just dump the AST for a simple source file to see how the AST is laid out.
In general, a Clang AST is made up of two very flexible Classes: Decl and Stmt, each with many subclasses.
Most Classes in the AST are pretty self-explanatory, like ForStmt, IfStmt, and ReturnStmt. You’ll get the hang of the AST after playing with it for a few minutes. You can usually find the documentation for the Class by googling something like “Clang FunctionDecl.”
How Can I Use Clang?
Clang can be used as a drop-in replacement for gcc and it offers some cool built-in static analysis tools. As a programmer (not just a normal user!), you can access the full power of Clang by using it as a library in one of three ways, depending on how you wish to program.
First, go check out Clang’s own description of each interface. In addition to everything stated on that site, I’ve highlighted some other significant differences between the multiple Clang interfaces below.
Clang Plugin
Your code is the plugin itself and runs as a completely new instance for each source file, so you cannot keep global or other contextual information across source files (though you can still run it on multiple files sequentially). You run a plugin by passing extra command-line arguments to the compiler through your build system (Clang, Make, etc.), much like enabling an optimization in GCC (e.g., “-O1”). You won’t be able to run any custom task before or after a source file is analyzed.
LibTooling (Clang Tool)
Your code is a normal C++ program; it has a normal main() function as the entry point. LibTooling is usually for running analysis on some source code (multiple files, if you want) separately from your normal build process. A new instance of your analysis code (and a new AST) will be created for each new source file (much like a Clang Plugin), but you are able to maintain contextual information across each source file because data items like global variables will persist. Since you have a main() function, you can also run tasks before or after Clang has finished analyzing all of your source files.
LibClang
LibClang is great when you need a stable API. Clang changes periodically, and if you use a Plugin or LibTooling, you might have to update your code to match Clang’s changes (but don’t let that discourage you!). If you need to access Clang’s API from a language other than C++ (like Python), you must use LibClang.
- Note: LibClang does not give you full access to the AST (only high-level access), but the other two options do. Usually, you want full access to the AST.
If you still can’t decide, I would recommend starting with the LibTooling interface. It’s easiest and works in a way that you’d expect. It offers all the flexibility and full AST access of a Plugin without losing global context across source files. Plus, it’s no more difficult to set up LibTooling than a Plugin.
Getting Started with Clang
Now that you know a bit about the basics, let’s get started! For reference, these instructions work on any version of Linux (and probably Mac OS X) but were tested on Ubuntu. You can obtain LLVM and Clang easily by following these steps (taken from Clang’s official installation instructions):
- Download and install (using something like apt-get) all required packages.
(A typical Linux distro already comes with everything you need, except subversion.)
- Change to the directory you want to install LLVM in (e.g., ~/static_analysis/).
I’ll refer to this as the “top directory” from now on.
- Run the following commands in a terminal window:
$ svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
$ cd llvm/tools
$ svn co http://llvm.org/svn/llvm-project/cfe/trunk clang
$ cd clang/tools
$ svn co http://llvm.org/svn/llvm-project/clang-tools-extra/trunk extra
$ cd ../../../.. #go back to top directory
$ cd llvm/projects
$ svn co http://llvm.org/svn/llvm-project/compiler-rt/trunk compiler-rt
$ cd ../.. #go back to top directory
$ cd llvm
$ ./configure
$ make #this takes a few hours
$ sudo make install
It takes quite a while to build LLVM and Clang. Go make yourself a nice dinner. When the build is done, check that Clang was installed correctly:
$ clang --version
You can also test Clang by running it in place of GCC on a classic Hello World example:
$ clang hello.c -o hello
$ ./hello
For this entire tutorial series, I use Clang 3.4 on Ubuntu 13.04, but you can probably get away with many different versions of each.
Now, let’s move on to some actual programming with Clang.