Clang Tips and Tricks

Introduction

I’ve assembled a list of tips and tricks that I came across while working with Clang. Most of them clear up confusing behavior or complicated parts that the Clang documentation does not cover well.

Obtaining the proper SourceLocation

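When using a Rewriter to modify source code, you must provide a SourceLocation that tells the Rewriter where to start its edits. The SourceLocation for a Stmt or Expr can also be useful in a debugging dump. The examples below use getLocStart(), which newer Clang releases rename to getBeginLoc().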

Assume you have a pointer func to the FunctionDecl for foo() below:

int foo(int arg) {
    // body
}
  • func->getLocStart() points to the very beginning of the definition: the “i” in “int” at Line 1, Column 1.
  • func->getLocation() points to the start of the function name: the “f” in “foo” at Line 1, Column 5.

Another example: assume you have a pointer expr to the CallExpr for mac(3) below:

#define mac(x) foo(x)
int main() {
    mac(3);
}
  • expr->getLocStart() points to the start of the function call: the “m” in “mac” at Line 3, Column 2 (the tab is Column 1).
  • We can also get the location inside the macro definition where the expanded text is actually spelled, using a SourceManager reference srcMgr.
    srcMgr.getSpellingLoc(expr->getLocStart()) points to the start of the macro’s replacement text: the “f” in “foo” at Line 1, Column 16.

Then, if we have a CompilerInstance pointer m_CI, we can use a Preprocessor reference to get the location where the macro itself is defined, like so:

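Preprocessor &pre = m_CI->getPreprocessor();
// Name of the innermost macro that expanded to this location ("mac")
StringRef macroName = pre.getImmediateMacroName(expr->getLocStart());
MacroInfo *mac = pre.getMacroInfo(pre.getIdentifierInfo(macroName));
SourceLocation beginning = mac->getDefinitionLoc();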
  • The variable beginning above will point to the “m” in “mac” after “#define”, at Line 1, Column 9.
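
The line and column numbers quoted above can be double-checked without Clang. The following is a minimal sketch that only counts characters in the example text. LineCol, lineColAt, and kMacroExample are hypothetical names invented for this illustration and are not part of Clang; the sketch assumes that every character, including the tab, occupies exactly one column.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a (line, column) pair; this is not a Clang type.
struct LineCol {
    unsigned line;
    unsigned column;
};

// Converts a 0-based character offset into a 1-based line and column.
// Every character, including a tab, advances the column by exactly one.
LineCol lineColAt(const std::string &text, std::size_t offset) {
    assert(offset <= text.size());
    LineCol pos{1, 1};
    for (std::size_t i = 0; i < offset; ++i) {
        if (text[i] == '\n') {
            ++pos.line;
            pos.column = 1;
        } else {
            ++pos.column;
        }
    }
    return pos;
}

// The macro example from above, with the call indented by a single tab.
const std::string kMacroExample =
    "#define mac(x) foo(x)\n"
    "int main() {\n"
    "\tmac(3);\n"
    "}\n";
```

For instance, lineColAt(kMacroExample, kMacroExample.find("mac(3)")) gives the position of the call site, while searching for "foo" and "mac" gives the two positions on the #define line.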

Getting Function Definition from a Function Call

Sometimes it’s useful to access the actual definition of a function when you encounter a call to that function. Consider the following example:

int bar(int x);

int main(void) {
    int result = bar(3);
    return result;
}

int bar(int x) {
    return x + 3;
}

If you’re visiting each statement using something like a RecursiveASTVisitor, then you’ll encounter a CallExpr representing the call to bar(3) on line 4 of the example. If you need to examine the body of the called function, bar(), then you can use CallExpr::getDirectCallee() to get the FunctionDecl for bar() (line 8).

However, getDirectCallee() may return a pointer to the function prototype (line 1 of the example) rather than the actual definition, which is no help if you need the function body. You can check whether the FunctionDecl* returned by getDirectCallee() is the actual function definition using one or more of the following functions from the FunctionDecl class. For the functions below, assume each one is called on a FunctionDecl *func.

  • func->hasBody(): returns true if this function has a body in any of its declarations, not necessarily in func itself.
  • func->hasBody(def): same as above, but also sets def (a const FunctionDecl * passed by reference) to point to the declaration that has the body.
  • func->isThisDeclarationADefinition(): returns true only if func itself is the actual definition of the function.
  • func->doesThisDeclarationHaveABody(): returns true if func itself has a body (note the difference from the first hasBody()). This is typically true whenever isThisDeclarationADefinition() is true.

The second form of hasBody() above gives you a pointer to the actual definition you want, while the other three help you test whether the func pointer you already have is the definition.
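
To make the difference between these queries concrete, here is a small self-contained model of the bar() example. It does not use Clang at all: MockFunctionDecl, resolveDefinition, BarExample, and barExample are hypothetical names invented for this sketch, and the mock only imitates the behavior described above (one function, several declarations, at most one of them with a body).

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// MockFunctionDecl is a hypothetical stand-in, NOT clang::FunctionDecl.
// Each object is one declaration of a function; all declarations of the
// same function share one redeclaration list.
struct MockFunctionDecl {
    std::string name;
    bool bodyHere;  // true only for the declaration that carries the body
    const std::vector<const MockFunctionDecl *> *redecls;

    MockFunctionDecl(std::string n, bool body)
        : name(std::move(n)), bodyHere(body), redecls(nullptr) {}

    // Imitates doesThisDeclarationHaveABody(): looks at this declaration only.
    bool doesThisDeclarationHaveABody() const { return bodyHere; }

    // Imitates hasBody(def): searches every redeclaration and reports the
    // one that carries the body through the out-parameter.
    bool hasBody(const MockFunctionDecl *&def) const {
        assert(redecls != nullptr && "declaration is not linked to a chain");
        for (const MockFunctionDecl *d : *redecls) {
            if (d->bodyHere) {
                def = d;
                return true;
            }
        }
        return false;
    }
};

// Returns the declaration that carries the body, or nullptr if none does.
const MockFunctionDecl *resolveDefinition(const MockFunctionDecl *callee) {
    const MockFunctionDecl *def = nullptr;
    return callee->hasBody(def) ? def : nullptr;
}

// Models the example above: a prototype of bar() followed by its definition.
struct BarExample {
    MockFunctionDecl prototype{"bar", false};
    MockFunctionDecl definition{"bar", true};
    std::vector<const MockFunctionDecl *> chain;

    BarExample() : chain{&prototype, &definition} {
        prototype.redecls = &chain;
        definition.redecls = &chain;
    }
    BarExample(const BarExample &) = delete;
    BarExample &operator=(const BarExample &) = delete;
};

// Shared instance of the example, built on first use.
const BarExample &barExample() {
    static const BarExample example;
    return example;
}
```

In this model, starting from the prototype (as getDirectCallee() might hand it to you), resolveDefinition() lands on the definition, while doesThisDeclarationHaveABody() on the prototype itself stays false. With real Clang code the same pattern would be a const FunctionDecl *def passed to hasBody().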

Using SourceManager

The SourceManager class has a plethora of functions for tying AST nodes back to the original source code. From the documentation, it sounds like a single SourceManager instance persists throughout the entire execution of your program and handles all source files. That was my initial assumption, but it is wrong: a new SourceManager object is created for each source file passed to your Clang program.

In addition to SourceManager, some other objects are re-initialized for each new input source file:

Conclusion

Hopefully these tips help you avoid some potential confusion regarding the inner workings of Clang. Feel free to post a comment requesting coverage of other Clang internals.

Check out the rest of my Clang posts in the blog archive.

5 comments

  1. Hi Kevin, thanks. Your tips and tutorial helped me get started. I have one request, though: a tutorial on how you did static analysis on the Linux kernel and drivers would be helpful. Thanks.

  2. Hello, I have three questions; I don’t know if you have encountered them before.
    1) Do you know how to access macros in a TranslationUnit directly? I use the Lexer to retrieve tokens, but the “#define ….” information is not there. It looks like the Lexer does preprocessing first, but I don’t know how to turn that off. I can only access a macro once I find other source code that uses it; then I can get the spelling location of that macro.
    2) Do you know how to access comments in a TranslationUnit? They are not in the Lexer output either.
    3) I posted the third question on the clang mailing list: http://clang-developers.42468.n3.nabble.com/How-to-avoid-visiting-children-of-macro-while-using-RecursiveASTVisitor-td4036170.html . The return values of the visit methods in Clang’s ASTVisitor have different meanings; do you know how to avoid visiting the children of a particular node?

    1. 1) I haven’t ever needed to iterate through or access macros without first encountering their usage in a code statement. But it seems like you should be able to iterate through a list of all Identifiers in the TranslationUnit and check if it’s a macro, similar to this question on the cfe-dev list.
      2) To access comments in a source file, you’ll want to take a look at the RawCommentList class. Also take a look at this presentation and the CommentHandler page.
      3) AFAIK, you don’t have fine-grained control over what nodes are visited if you use a typical Traverse* function. However, from the RecursiveASTVisitor documentation, it appears that you can use the WalkUpFrom* functions to avoid visiting child nodes, but that is beyond my knowledge of ASTVisitor.

  3. Thank you for sharing these tips and tricks. I really liked the topic about function definitions.
    Currently I am working on the BlockDecl part and I need your help with it.
    For a FunctionDecl we use getDirectCallee(), but how can we get the equivalent for a BlockDecl?
