commits
| 2026-04-02 | Merge branch 'function-pointer' | Sören Tempel |
| 2026-04-02 | quebex: Fully initialize envSyms in mkEnv | Sören Tempel |
| 2026-04-02 | quebex: Move showAddr function to Memory module | Sören Tempel |
| 2026-04-01 | quebex: Start working on a text memory segment | Sören Tempel |
| 2026-04-01 | quebex-cli: Relax constraints on main() return value type | Sören Tempel |
Clone the repository to access all 487 commits.
README
A work-in-progress software analysis framework built around the QBE intermediate language.
Motivation
Existing analysis frameworks are predominantly built around LLVM. Unfortunately, LLVM is a fast-moving target with constant changes and updates to its intermediate representation. Therefore, tooling built on LLVM often requires dated LLVM versions (e.g., KLEE currently recommends LLVM 13 released in 2022). Obtaining these LLVM versions can be cumbersome and often hinders employment of these tools. To overcome these issues, maintainers of analysis tooling need to constantly invest time to catch up with LLVM releases instead of focusing on improving their analysis framework.
In order to reduce the maintenance burden, this project attempts to investigates the utilization of another intermediate language for software analysis: QBE. QBE is a much smaller-scale project than LLVM and thereby offers a higher degree of stability. Further, QBE is simpler than LLVM (e.g., providing fewer operations) and thereby eases the implementation of analysis techniques. Nonetheless, there exist compiler frontends that can emit a representation in the QBE intermediate representation (which can then be analyzed using quebex!). For example, SCC, cproc, or the Hare compiler.
Status
I currently consider this a vertical prototype. A lot of the desired
functionality is already there, but not fully developed and tested.
However, all major features of the QBE specification are
nowadays implemented to some degree. Consequentially, it is possible to
process QBE programs emitted by existing compiler frontends such as the
cproc C11 compiler. In terms of
analysis features, the implementation currently focuses on dynamic
analysis techniques (primarily symbolic
execution). Unfortunately, there is basically no documentation for
the API and the provided command-line frontends (quebex and
quebex-symex) are presently very rudimentary.
Architecture
The foundation of this project is a formal, yet executable,
description of the QBE intermediate language. At the time of writing, it
targets v1.2 of the
QBE specification. The syntax is specified using literate
Haskell and parser
combinators in the quebex-syntax library. The language
semantics are expressed in a modular way by distinguishing abstract and
actual semantics. Abstract semantics of the QBE language are
described in terms of a Simulator monad (i.e., an abstract monad). This monad
must then be instantiated, whereby actual semantics are
specified. Presently, the following instantiations are supported:
- Concrete semantics, provided by
Language.QBE.Simulator.Default.State. - Symbolic
(specifically concolic)
semantics through
Language.QBE.Simulator.Concolic.State.
The former is primarily useful for simulation of programs written in the QBE intermediate language. The latter intended for automated software testing using symbolic execution and—as demonstrated below—can be used to automatically generate test inputs.
The abstract description of the QBE semantics, in terms of the
Simulator monad, and its concrete instantiation are
provided by the quebex library. The symbolic semantics are
implemented by a separate quebex-symex library. Additional
semantics (e.g., for abstract
interpretation) can be implemented by building on top of these
existing libraries.
Further, executable programs are provided by the
quebex-cli component. These programs can be used directly
from a shell, without interacting with the Haskell codebase. Presently
the following executable program components are available:
quebex: A simulator for QBE programs built on top of the concrete semantics.quebex-symex: An automated software testing tool facilitating the symbolic semantics.
These program components operate directly on QBE input programs.
Installation
After cloning the repository, individual components can be installed
using Cabal (e.g.,
cabal install quebex-cli). However, presently the codebase
is mainly tested with selected GHC versions; therefore, installation
using Guix is recommended. For
example, in order to install the quebex-cli component using
Guix:
$ guix time-machine -C .guix/channels.scm -- install -L .guix/modules/ quebex-cli
Afterwards, if Guix is configured correctly, the aforementioned
program components (quebex and quebex-symex)
should be available in your $PATH. The following section
demonstrates usage of these components.
Demonstration
This framework is primarily intended to be used as a library, allowing the implementation of both static and dynamic analysis techniques based on QBE. Presently, it focuses on dynamic analysis, and sufficient documentation of the library interface is still lacking. Nonetheless, it is already capable of executing QBE representations of complex C code (e.g., as emitted by cproc). In order to experiment with the current capabilities, the following subsections demonstrate utilization of the aforementioned program components.
Concrete Execution
Consider the following “Hello, World!” program:
#include <stdio.h>
int main(void) {
puts("Hello, World!");
return 0;
}In order to concretly execute this program using quebex,
we need to obtain an equivalent representation in QBE. To this end, we
can invoke the cproc compiler as
follows:
$ cproc -emit-qbe hello.c
The resulting QBE file can then be executed with the concrete semantics using:
$ quebex hello.qbe
Hello, World!
Note that quebex is only able to invoke the
puts(3) function because it intercepts its execution,
providing a “simulated” version of it. Presently, only a limited amount
of standard library functions are intercepted in this way. As such,
interactions with the file system or more complex output functions
(e.g. printf(3)) are currently not supported.
Symbolic Execution
Symbolic execution is a dynamic software analysis technique that explores reachable program paths based on a symbolic input variable (i.e., an input source). As an example, consider the following C program:
#include <stdio.h>
#include <stddef.h>
// Convert a memory region to an unconstrained symbolic value. Like calloc(3), it can
// account for memory regions which store multiple elements (nelem) of a specific size
// (elsiz). The give name is used to identify the symbolic variable and must be unique.
extern void quebex_make_symbolic(void *ptr, size_t nelem, size_t elsiz, const char *name);
int main(void) {
int a;
quebex_make_symbolic(&a, 1, sizeof(a), "a");
if (a == 42) {
puts("you found the answer");
} else {
puts("not the answer");
}
return 0;
}This program can be compiled using the QBE-based cproc C11 compiler as follows:
$ cproc -emit-qbe example.c
The resulting QBE representation (example.qbe) can be
symbolically executed using quebex-symex:
$ quebex-symex --write-tests tests/ example.qbe
This will yield the following output:
not the answer
you found the answer
---
Amount of paths: 2
This indicates that we found two execution paths through our program
based on the symbolic variable a. Due to the
--write-tests option, quebex-symex will create a
tests/ directory that contains test inputs in the ktest
format, one for each execution path. These files can be inspected
with the ktest-tool from KLEE (which is included in the Guix
development environment described below). For example:
$ ktest-tool tests/test000002.ktest
ktest file : 'tests/test000002.ktest'
args : ['example.qbe']
num objects: 1
object 0: name: 'a1'
object 0: size: 4
object 0: data: b'*\x00\x00\x00'
object 0: hex : 0x2a000000
object 0: int : 42
object 0: uint: 42
object 0: text: *...
This tells us that the second execution path, where the program
prints you found the answer, was triggered with
a1 := 42. From these files, we can—for
example—automatically generate
high-coverage test cases. In the future, it will be possible to
replay selected .ktest files using
quebex-cli.
Design Goals
This project is intentionally written in a simple subset of the Haskell programming language. It should be usable by anyone with a basic Haskell background (e.g., as obtained by reading Learn You a Haskell for Great Good!). Further, the project should require minimal long-term maintenance and should also support older GHC versions. Therefore, it uses the GHC2021 language standard and avoids usage of additional language extensions. Further, whenever possible, dependencies on external libraries that are not bundled by GHC must be avoided.
Development
Code should be formatted using ormolu. Git hooks performing several sanity checks, including ensuring the proper code formatting, are available. These hooks can be enabled using:
$ git config --local core.hooksPath .githooks
Further, a Guix environment for development purposes can be obtained using:
$ guix time-machine -C .guix/channels.scm -- shell -L .guix/modules/ -m .guix/manifest.scm
License
This project uses the REUSE Specification to indicated used software license.