
Ruxmon.2013-08.-.CodeBro!


  1. 1. Improving static code review using AST-based code analysis Christophe Alladoum @_hugsy_ hugsy
  2. 2. Who am I ? ➔ Christophe Alladoum ➔ IOActive pirate ➔ blah blah blah
  3. 3. What is this about? ➔ I read a LOT of code ◆ mostly for fun (sometimes for work) ● just to know how it works ● occasionally to find bugs ◆ most of the time, C code ● sometimes C++ ● occasionally higher-level stuff: PHP (lol), Java, Python, ...
  4. 4. What is this about? ➔ C code is tricky and non-trivial ● many standards (ANSI C - C89, C99, C11, etc.) ● many bad coding practices ● MANY subtleties in the language ➔ Ergo, many places for flaws ● logic errors ● programming errors ● lack of restrictions in code (buffers, integers)
  5. 5. Existing automated tools ● Many open-source and licensed ($$$) tools use regular expressions to find weak patterns ● Insufficient approach: ○ Example using the latest flawfinder: [flawfinder output not included in this transcript] ○ Basically as clever as running `grep` (which is one of the best vulnerability finders, by the way) Ok, thanks!
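To make the limitation concrete, here is a minimal sketch of a grep-style scanner. It is not flawfinder's actual implementation; the pattern and the C snippet are made up for illustration. The scanner flags every textual occurrence of `strcpy(`, including one in a comment and one in a string literal, neither of which is a call.

```python
import re

# Hypothetical "weak pattern": any textual occurrence of strcpy(
RISKY_CALL = re.compile(r"\bstrcpy\s*\(")

# Made-up C snippet: only line 5 contains a real call to strcpy.
C_SOURCE = '''\
#include <string.h>
/* never use strcpy(dst, src) here */
void copy(char *dst, const char *src) {
    const char *msg = "strcpy(a, b) is unsafe";
    strcpy(dst, src);
}
'''


def grep_like_scan(source):
    """Return the 1-based numbers of the lines whose text matches the pattern."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if RISKY_CALL.search(line)
    ]


hits = grep_like_scan(C_SOURCE)
print(hits)  # lines 2 and 4 are false positives
```

A parser knows that lines 2 and 4 are a comment and a string literal; a regular expression applied line by line does not.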
  6. 6. Existing automated tools ○ And (too) many times, there are “strange” results ○ It is usually a very *bad* idea to just paste output from those tools into a (serious) code review report *PLUS* splint fails to see vulnerable calls
  7. 7. A smarter approach ➔ C-based projects are ultimately made to be compiled and linked ◆ Compilers are the best code reviewers!! ● Code is parsed and transformed into another format ● Code is validated ● Some additional checks for programming errors are even provided by default (type checks, unused variables, invalid format strings, uninitialized values, etc.)
  8. 8. Quick reminder on compilers ● Compiler, noun: a set of programs that transforms source code written in a programming language into another computer language (Wikipedia). ■ Examples: GCC, as, Python (which embeds a bytecode compiler), etc. ● Abstract representation of compiler behavior: [diagram not included in this transcript]
  9. 9. LLVM Specifics ● What makes LLVM so special? ○ LLVM (Low-Level Virtual Machine): a 13-year-old project ○ Many different projects built around this architecture ○ The LLVM structure *truly* isolates each part (lexing/optimizing/generating) ○ Totally plug-and-play ● you can easily write a lexer for generating Python .pyc files ... ● … or you can use the optimizer API to help runtime bug detection (heard of Google's AddressSanitizer module?) … ● … or you can use an existing parser (for instance GCC’s) and bind it to the rest of the LLVM architecture (llvm-gcc) → really cool features! Go hack it!!
  10. 10. LLVM Specifics ● Clang ○ Default C/C++/Obj-C compiler for the LLVM architecture ○ The parser takes .c, .cpp, .m files as input and generates an Intermediate Representation (IR) of the code → this is achieved thanks to an Abstract Syntax Tree (AST) created when “reading” each source file ○ An API is provided to interact with the generated AST → in native C++ → or in higher-level languages, like Python ■ This means that Clang parses the code for us, so why not use this to parse code in a smart way (and ultimately find vulnerabilities)?
  11. 11. Clang Python API ● Relatively easy to use... ○ … but not documented thoroughly enough (only automatically generated documentation) → pydoc works fairly well on it ○ Many blog posts on the topic (but sometimes outdated) ○ Fairly intuitive namespace Basic example: [code and output not included in this transcript]
  12. 12. Demo ● clang-draw-ast.py is a 70-line Python script that will parse a C source file and display (PNG format) the corresponding AST.
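The script itself is not part of this transcript. As a rough idea of what such a tool might do, the sketch below turns a cursor-like tree into Graphviz DOT text. It is an assumption-laden illustration, not the author's clang-draw-ast.py: `ast_to_dot` and `FakeCursor` are hypothetical names, and `FakeCursor` merely imitates the three members of `clang.cindex.Cursor` that the function relies on (`kind.name`, `displayname`, `get_children()`).

```python
from itertools import count
from types import SimpleNamespace


def ast_to_dot(root):
    """Render a cursor-like tree as Graphviz DOT text."""
    ids = count()
    lines = ["digraph ast {"]

    def visit(node):
        node_id = next(ids)
        label = "{} {}".format(node.kind.name, node.displayname).strip()
        label = label.replace('"', r'\"')  # keep the DOT string literal valid
        lines.append('  n{} [label="{}"];'.format(node_id, label))
        for child in node.get_children():
            child_id = visit(child)
            lines.append("  n{} -> n{};".format(node_id, child_id))
        return node_id

    visit(root)
    lines.append("}")
    return "\n".join(lines)


class FakeCursor:
    """Stand-in for clang.cindex.Cursor, for illustration only."""

    def __init__(self, kind, displayname="", children=()):
        self.kind = SimpleNamespace(name=kind)
        self.displayname = displayname
        self._children = list(children)

    def get_children(self):
        return iter(self._children)


tree = FakeCursor("TRANSLATION_UNIT", "demo.c", [
    FakeCursor("FUNCTION_DECL", "main()", [
        FakeCursor("COMPOUND_STMT", "", [FakeCursor("RETURN_STMT")]),
    ]),
])
dot_text = ast_to_dot(tree)
print(dot_text)
```

With the real bindings, the root would be the `cursor` attribute of a parsed translation unit, and the DOT text could be rendered with Graphviz, for example `dot -Tpng ast.dot -o ast.png`.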
  13. 13. (This is the expected result if live demo fails)
  14. 14. Let’s have a look...
  15. 15. The magic inside The indexing engine API is exposed by the `clang.cindex` package. ● Index ○ Top-level object which manages some global library state. ● TranslationUnit ○ High-level object encapsulating the AST for a single translation unit (parsed on the fly) ● SourceRange, SourceLocation, and File ○ Objects representing information about the input source.
  16. 16. Clang internals voodoo The routines in this group provide the ability to create and destroy translation units from files, either by parsing the contents of the files or by reading in a serialized representation of a translation unit. ● Once the indexing engine is created, its parse() function returns a TranslationUnit object ○ The most important object ● A Cursor object is used to iterate through all nodes ○ kind: the kind of the current node ○ displayname: display name for the entity referenced ○ location: the source location (the starting character) ○ get_children(): returns an iterator for accessing the children of this cursor ○ get_arguments(): returns an iterator for accessing the arguments of this cursor
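The objects listed above fit together roughly as follows. This is a minimal sketch, not the script shown in the talk: `walk`, `dump_file` and `fake_cursor` are hypothetical helper names. `dump_file` uses the real `clang.cindex` calls (`Index.create()`, `parse()`, `TranslationUnit.cursor`) and needs the Clang Python bindings plus a matching libclang; the demo at the bottom runs on stand-in objects so that it works without them.

```python
from types import SimpleNamespace


def walk(cursor, depth=0):
    """Yield (depth, kind name, display name, line) for a cursor and its subtree."""
    yield depth, cursor.kind.name, cursor.displayname, cursor.location.line
    for child in cursor.get_children():
        yield from walk(child, depth + 1)


def dump_file(path, clang_args=()):
    """Parse a C file with libclang and print an indented outline of its AST."""
    import clang.cindex  # imported lazily so walk() stays usable without libclang

    index = clang.cindex.Index.create()
    translation_unit = index.parse(path, args=list(clang_args))
    for depth, kind, name, line in walk(translation_unit.cursor):
        print("{}{} {} (line {})".format("  " * depth, kind, name, line))


def fake_cursor(kind, name, line, children=()):
    """Stand-in for clang.cindex.Cursor, for illustration only."""
    return SimpleNamespace(
        kind=SimpleNamespace(name=kind),
        displayname=name,
        location=SimpleNamespace(line=line),
        get_children=lambda: iter(children),
    )


demo = fake_cursor("TRANSLATION_UNIT", "demo.c", 0, [
    fake_cursor("FUNCTION_DECL", "main()", 1, [
        fake_cursor("RETURN_STMT", "", 2),
    ]),
])
outline = list(walk(demo))
for depth, kind, name, line in outline:
    print("{}{} {} (line {})".format("  " * depth, kind, name, line))
```

On a machine with libclang available, `dump_file("some_file.c")` would print the same kind of outline for a real source file (the file name is a placeholder).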
  17. 17. Clang internals voodoo Now we can better understand the previous script. Easy, right?
  18. 18. Pros / Cons Pros ● simple and intuitive Python bindings ● full control over all the code being audited ● parsing and browsing are fast ● can be extended with extra LLVM modules Cons ● the bindings are built on Python ctypes: the same approach might not work as well for other high-level languages (Ruby, Java, etc.) Limitations? ● Under active development: the API keeps improving and the docs are becoming more complete
  19. 19. Introducing CodeBro! ● Built as a proof of concept around this idea ○ Meaning: you can use it, but don’t rely on it ● Underlying idea: create a web-based tool that acts as an interface between the AST and the code reviewer ○ The code reviewer can smartly analyse/navigate through code and optionally add modules to detect basic (or advanced) vulnerabilities
  20. 20. CodeBro! ● 100% Open-Source ○ Beer-Ware License ● 100% Python ● (Hopefully) easily installable (pip) ● Django-based application (compatible with 1.5+) ○ combines many cool Python-based technologies ■ PyDot ■ PyCharm ■ Pygments ■ etc. ○ Keeps things simple ■ 1 project to audit = 1 specific database (default: SQLite)
  21. 21. CodeBro! ● Uses the Clang parsing module to dynamically interact with code ○ Cross-referencing feature similar to IDA Pro ■ only between functions (caller/callee) ○ Call graph generation: visual understanding of code ■ graph generated as SVG → can be navigated in a web browser
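A caller/callee cross-reference can be derived from just two node kinds, `FUNCTION_DECL` and `CALL_EXPR`. The sketch below is one possible way to do it and is not taken from CodeBro's source: all function names are hypothetical, and the stand-in cursors only mimic the `kind.name`, `spelling` and `get_children()` members of `clang.cindex.Cursor`. Indirect calls through function pointers are ignored here.

```python
from types import SimpleNamespace


def build_call_graph(root):
    """Map each declared function name to the set of function names it calls."""
    callees = {}

    def visit(node, current_function):
        kind = node.kind.name
        if kind == "FUNCTION_DECL":
            current_function = node.spelling
            callees.setdefault(current_function, set())
        elif kind == "CALL_EXPR" and current_function is not None:
            callees[current_function].add(node.spelling)
        for child in node.get_children():
            visit(child, current_function)

    visit(root, None)
    return callees


def invert(callees):
    """Turn a caller -> callees map into a callee -> callers map."""
    callers = {}
    for caller, targets in callees.items():
        for target in targets:
            callers.setdefault(target, set()).add(caller)
    return callers


def call_graph_to_dot(callees):
    """Emit the call graph as Graphviz DOT text with a stable edge order."""
    lines = ["digraph calls {"]
    for caller in sorted(callees):
        for callee in sorted(callees[caller]):
            lines.append('  "{}" -> "{}";'.format(caller, callee))
    lines.append("}")
    return "\n".join(lines)


def fake(kind, spelling="", children=()):
    """Stand-in for clang.cindex.Cursor, for illustration only."""
    return SimpleNamespace(
        kind=SimpleNamespace(name=kind),
        spelling=spelling,
        get_children=lambda: iter(children),
    )


unit = fake("TRANSLATION_UNIT", "demo.c", [
    fake("FUNCTION_DECL", "helper"),
    fake("FUNCTION_DECL", "main", [
        fake("COMPOUND_STMT", "", [
            fake("CALL_EXPR", "helper"),
            fake("CALL_EXPR", "printf"),
        ]),
    ]),
])
graph = build_call_graph(unit)
xrefs_to = invert(graph)
print(call_graph_to_dot(graph))
```

The DOT text can be turned into a browsable SVG with Graphviz (`dot -Tsvg`).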
  22. 22. CodeBro! ● “Analysis” module ○ reports all default diagnostics provided by Clang ○ provides a “Plugin” API ■ some modules implemented ■ … some more to come
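Clang exposes its diagnostics on the translation unit, so a reporting step can be fairly small. The sketch below is an illustration under assumptions, not CodeBro's "Analysis" module: `collect_diagnostics` is a hypothetical helper that reads the `diagnostics` attribute of a `clang.cindex.TranslationUnit` and the `severity`, `spelling` and `location` attributes of each diagnostic. The demo uses stand-in objects so that it runs without libclang.

```python
from types import SimpleNamespace

# Severity levels as numbered by libclang (Ignored=0 ... Fatal=4).
SEVERITY_NAMES = {0: "ignored", 1: "note", 2: "warning", 3: "error", 4: "fatal"}


def collect_diagnostics(translation_unit, minimum_severity=2):
    """Return (severity, file, line, message) tuples at or above the threshold."""
    report = []
    for diag in translation_unit.diagnostics:
        if diag.severity < minimum_severity:
            continue
        location = diag.location
        filename = location.file.name if location.file is not None else "<unknown>"
        report.append((
            SEVERITY_NAMES.get(diag.severity, "unknown"),
            filename,
            location.line,
            diag.spelling,
        ))
    return report


def fake_diag(severity, message, filename, line):
    """Stand-in for clang.cindex.Diagnostic, for illustration only."""
    file_obj = SimpleNamespace(name=filename) if filename else None
    return SimpleNamespace(
        severity=severity,
        spelling=message,
        location=SimpleNamespace(file=file_obj, line=line),
    )


fake_unit = SimpleNamespace(diagnostics=[
    fake_diag(1, "previous definition is here", "demo.c", 1),
    fake_diag(2, "unused variable 'x'", "demo.c", 3),
    fake_diag(3, "made-up error without a file", None, 0),
])
report = collect_diagnostics(fake_unit)
for entry in report:
    print("{}: {}:{}: {}".format(*entry))
```

With the real bindings, which warnings appear depends on the arguments passed to `parse()`, for example `args=["-Wall"]`.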
  23. 23. CodeBro! ● Extensible through plugins ○ can use the AST and/or already existing references ○ Examples: ■ detecting dead code ● find all functions never called (i.e. no Xref to them) ■ improving detection of format string flaws ● “count” the number of args for known functions (printf, sprintf, etc.) and parse the arguments ● detect functions that wrap format string functions (based on former calls) ■ (to a limited extent) detect use-after-free
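Two of the plugin ideas above reduce to small pure functions once the AST has been indexed. The sketch below is hypothetical and not CodeBro's plugin code: it assumes the format string and the number of variadic arguments have already been extracted from a `CALL_EXPR` node, and that call edges are available as (caller, callee) pairs. The regular expression covers common printf conversions only (no positional `%1$s` forms), and the dead-code check cannot see calls made through function pointers.

```python
import re

# '%' then flags, width, precision, length modifier, conversion; '%%' is a literal.
_SPEC = re.compile(
    r"%(?:%|[-+ #0]*(\*|\d+)?(?:\.(\*|\d+))?(?:hh|h|ll|l|j|z|t|L)?[diouxXeEfFgGaAcspn])"
)


def expected_argument_count(fmt):
    """Count the variadic arguments a printf-style format string consumes."""
    total = 0
    for match in _SPEC.finditer(fmt):
        if match.group(0) == "%%":
            continue  # literal percent sign, consumes nothing
        total += 1  # the conversion itself
        total += sum(1 for group in match.groups() if group == "*")  # '*' width/precision
    return total


def check_format_call(fmt, variadic_args_supplied):
    """Return a message if the argument count does not match the format, else None."""
    expected = expected_argument_count(fmt)
    if expected != variadic_args_supplied:
        return "format expects {} argument(s), call supplies {}".format(
            expected, variadic_args_supplied
        )
    return None


def never_called(defined_functions, call_edges, entry_points=("main",)):
    """Return the defined functions that no call edge points to."""
    called = {callee for _caller, callee in call_edges}
    return sorted(
        name
        for name in defined_functions
        if name not in called and name not in entry_points
    )


finding = check_format_call("%s logged in from %s\n", 1)
dead = never_called(
    {"main", "helper", "legacy_init"},
    [("main", "helper"), ("main", "printf")],
)
print(finding)
print(dead)
```

A mismatch such as the one reported for `"%s logged in from %s\n"` with a single argument is the kind of finding a line-based pattern match cannot produce, since it requires knowing where the call's argument list begins and ends.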
  24. 24. Demo time (More screenshots if demo still fails)
  25. 25. Code project listing
  26. 26. Code browsing - unparsed then parsed
  27. 27. Call graph generation : SVG generation (href linking) ← Functions listing
  28. 28. Future enhancements ● Still a work in progress ● Fix bugs ● Index all components of source files (instead of just CALL_EXPR and FUNCTION_DECL) ● Improve the search engine ● Add macro parsing ● Integrate more source code input vectors (Git, as soon as there is a decent Python Git bindings package) ● Improve C++ and Objective-C analysis ● Add moar modulez!!
  29. 29. The end QUESTIONS ?
  30. 30. Links : ● https://github.com/hugsy/codebro ● https://twitter.com/_hugsy_ ● http://eli.thegreenplace.net/2011/07/03/parsing-c-in-python-with-clang ● http://llvm.org/devmtg/2010-11/Gregor-libclang.pdf ● https://code.google.com/p/address-sanitizer/wiki/AddressSanitizer
