quebex

A software analysis framework built around the QBE intermediate language

git clone https://git.8pit.net/quebex.git

  1<!--
  2SPDX-FileCopyrightText: 2025 Sören Tempel <soeren+git@soeren-tempel.net>
  3
  4SPDX-License-Identifier: GPL-3.0-only
  5-->
  6
  7## README
  8
  9A work-in-progress software analysis framework built around the [QBE] intermediate language.
 10
 11### Motivation
 12
 13Existing analysis frameworks are predominantly built around [LLVM].
 14Unfortunately, LLVM is a fast-moving target with constant changes and updates to its intermediate representation.
 15Therefore, tooling built on LLVM often requires dated LLVM versions (e.g., [KLEE] currently [recommends LLVM 13][KLEE LLVM] released in 2022).
 16Obtaining these LLVM versions can be cumbersome and often hinders employment of these tools.
 17To overcome these issues, maintainers of analysis tooling need to constantly invest time to catch up with LLVM releases instead of focusing on improving their analysis framework.
 18
 19In order to reduce the maintenance burden, this project attempts to investigates the utilization of another intermediate language for software analysis: [QBE].
 20QBE is a much [smaller-scale project][QBE vs LLVM] than LLVM and thereby offers a higher degree of stability.
 21Further, QBE is simpler than LLVM (e.g., providing fewer operations) and thereby eases the implementation of analysis techniques.
 22Nonetheless, there exist compiler frontends that can emit a representation in the QBE intermediate representation (which can then be analyzed using quebex!).
 23For example, [SCC], [cproc], or the [Hare compiler][Hare].
 24
 25### Status
 26
 27I currently consider this a vertical prototype.
 28A lot of the desired functionality is already there, but not fully developed and tested.
 29However, all major features of the [QBE specification][QBE v1.2] are nowadays implemented to some degree.
 30Consequentially, it is possible to process QBE programs emitted by existing compiler frontends such as the [cproc] C11 compiler.
 31In terms of analysis features, the implementation currently focuses on dynamic analysis techniques (primarily [symbolic execution]).
 32Unfortunately, there is basically no documentation for the API and the provided command-line frontends (`quebex` and `quebex-symex`) are presently very rudimentary.
 33
 34### Architecture
 35
 36The foundation of this project is a formal, yet executable, description of the QBE intermediate language.
 37At the time of writing, it targets [v1.2 of the QBE specification][QBE v1.2].
 38The syntax is specified using [literate Haskell][literate programming] and [parser combinators] in the `quebex-syntax` library.
 39The language semantics are expressed in a modular way by distinguishing abstract and actual semantics.
 40*Abstract semantics* of the QBE language are described in terms of a `Simulator` monad (i.e., an [abstract monad]).
 41This monad must then be instantiated, whereby *actual semantics* are specified.
 42Presently, the following instantiations are supported:
 43
 441. Concrete semantics, provided by `Language.QBE.Simulator.Default.State`.
 452. [Symbolic][symbolic execution] (specifically [concolic][concolic testing]) semantics through `Language.QBE.Simulator.Concolic.State`.
 46
 47The former is primarily useful for simulation of programs written in the QBE intermediate language.
 48The latter intended for automated software testing using [symbolic execution] and—as demonstrated below—can be used to automatically generate test inputs.
 49
 50The abstract description of the QBE semantics, in terms of the `Simulator` monad, and its concrete instantiation are provided by the `quebex` library.
 51The symbolic semantics are implemented by a separate `quebex-symex` library.
 52Additional semantics (e.g., for [abstract interpretation]) can be implemented by building on top of these existing libraries.
 53
 54Further, executable programs are provided by the `quebex-cli` component.
 55These programs can be used directly from a shell, without interacting with the Haskell codebase.
 56Presently the following executable program components are available:
 57
 581. `quebex`: A simulator for QBE programs built on top of the concrete semantics.
 592. `quebex-symex`: An automated software testing tool facilitating the symbolic semantics.
 60
 61These program components operate directly on QBE input programs.
 62
 63### Installation
 64
 65After cloning the repository, individual components can be installed using [Cabal] (e.g., `cabal install quebex-cli`).
 66However, presently the codebase is mainly tested with selected GHC versions; therefore, installation using [Guix] is recommended.
 67For example, in order to install the `quebex-cli` component using Guix:
 68
 69```
 70$ guix time-machine -C .guix/channels.scm -- install -L .guix/modules/ quebex-cli
 71```
 72
 73Afterwards, if Guix is configured correctly, the aforementioned program components (`quebex` and `quebex-symex`) should be available in your `$PATH`.
 74The following section demonstrates usage of these components.
 75
 76### Demonstration
 77
 78This framework is primarily *intended to be used as a library*, allowing the implementation of both static and dynamic analysis techniques based on QBE.
 79Presently, it focuses on dynamic analysis, and sufficient documentation of the library interface is still lacking.
 80Nonetheless, it is already capable of executing QBE representations of complex C code (e.g., as emitted by [cproc]).
 81In order to experiment with the current capabilities, the following subsections demonstrate utilization of the aforementioned program components.
 82
 83#### Concrete Execution
 84
 85Consider the following "Hello, World!" program:
 86
 87```C
 88#include <stdio.h>
 89
 90int main(void) {
 91	puts("Hello, World!");
 92	return 0;
 93}
 94```
 95
 96In order to concretly execute this program using `quebex`, we need to obtain an equivalent representation in QBE.
 97To this end, we can invoke the [cproc] compiler as follows:
 98
 99```
100$ cproc -emit-qbe hello.c
101```
102
103The resulting QBE file can then be executed with the concrete semantics using:
104
105```
106$ quebex hello.qbe
107Hello, World!
108```
109
110Note that `quebex` is only able to invoke the `puts(3)` function because it intercepts its execution, providing a "simulated" version of it.
111Presently, only a limited amount of standard library functions are intercepted in this way.
112As such, interactions with the file system or more complex output functions (e.g. `printf(3)`) are currently not supported.
113
114#### Symbolic Execution
115
116[Symbolic execution][symbolic execution] is a dynamic software analysis technique that explores reachable program paths based on a symbolic input variable (i.e., an input source).
117As an example, consider the following C program:
118
119```C
120#include <stdio.h>
121#include <stddef.h>
122
123// Convert a memory region to an unconstrained symbolic value. Like calloc(3), it can
124// account for memory regions which store multiple elements (nelem) of a specific size
125// (elsiz). The give name is used to identify the symbolic variable and must be unique.
126extern void quebex_make_symbolic(void *ptr, size_t nelem, size_t elsiz, const char *name);
127
128int main(void) {
129	int a;
130	quebex_make_symbolic(&a, 1, sizeof(a), "a");
131	if (a == 42) {
132		puts("you found the answer");
133	} else {
134		puts("not the answer");
135	}
136
137	return 0;
138}
139```
140
141This program can be compiled using the QBE-based [cproc] C11 compiler as follows:
142
143```
144$ cproc -emit-qbe example.c
145```
146
147The resulting QBE representation (`example.qbe`) can be symbolically executed using quebex-symex:
148
149```
150$ quebex-symex --write-tests tests/ example.qbe
151```
152
153This will yield the following output:
154
155```
156not the answer
157you found the answer
158
159---
160Amount of paths: 2
161```
162
163This indicates that we found two execution paths through our program based on the symbolic variable `a`.
164Due to the `--write-tests` option, quebex-symex will create a `tests/` directory that contains test inputs in the [ktest format][KLEE ktest], one for each execution path.
165These files can be inspected with the `ktest-tool` from [KLEE] (which is included in the Guix development environment described below).
166For example:
167
168```
169$ ktest-tool tests/test000002.ktest
170ktest file : 'tests/test000002.ktest'
171args       : ['example.qbe']
172num objects: 1
173object 0: name: 'a1'
174object 0: size: 4
175object 0: data: b'*\x00\x00\x00'
176object 0: hex : 0x2a000000
177object 0: int : 42
178object 0: uint: 42
179object 0: text: *...
180```
181
182This tells us that the second execution path, where the program prints `you found the answer`, was triggered with `a1 := 42`.
183From these files, we can—for example—automatically [generate high-coverage test cases][KLEE OSDI].
184In the future, it will be possible to replay selected `.ktest` files using `quebex-cli`.
185
186### Design Goals
187
188This project is intentionally written in a simple subset of the [Haskell] programming language.
189It should be usable by anyone with a basic Haskell background (e.g., as obtained by reading [Learn You a Haskell for Great Good!][learnyouahaskell]).
190Further, the project should require minimal long-term maintenance and should also support older GHC versions.
191Therefore, it uses the [GHC2021] language standard and avoids usage of additional language extensions.
192Further, whenever possible, dependencies on external libraries that are [not bundled by GHC][GHC libraries] must be avoided.
193
194### Development
195
196Code should be formatted using [ormolu][ormolu github].
197Git hooks performing several sanity checks, including ensuring the proper code formatting, are available.
198These hooks can be enabled using:
199
200	$ git config --local core.hooksPath .githooks
201
202Further, a [Guix] environment for development purposes can be obtained using:
203
204	$ guix time-machine -C .guix/channels.scm -- shell -L .guix/modules/ -m .guix/manifest.scm
205
206### License
207
208This project uses the [REUSE Specification] to indicated used software license.
209
210[QBE]: https://c9x.me/compile/
211[QBE vs LLVM]: https://c9x.me/compile/doc/llvm.html
212[QBE v1.2]: https://c9x.me/compile/doc/il-v1.2.html
213[LLVM]: https://llvm.org/
214[KLEE]: https://klee-se.org
215[KLEE LLVM]: https://klee-se.org/releases/docs/v3.1/build-llvm13/
216[KLEE OSDI]: https://www.usenix.org/legacy/events/osdi08/tech/full_papers/cadar/cadar.pdf#page=9
217[KLEE ktest]: https://klee-se.org/releases/docs/v3.1/tutorials/testing-function/#klee-generated-test-cases
218[SCC]: https://www.simple-cc.org/
219[cproc]: https://sr.ht/~mcf/cproc/
220[Hare]: https://harelang.org/
221[Haskell]: https://haskell.org/
222[GHC]: https://www.haskell.org/ghc/
223[GHC2021]: https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/control.html#extension-GHC2021
224[GHC libraries]: https://ghc.gitlab.haskell.org/ghc/doc/libraries/index.html
225[learnyouahaskell]: https://learnyouahaskell.github.io/chapters.html
226[libriscv]: https://github.com/agra-uni-bremen/libriscv
227[ormolu github]: https://github.com/tweag/ormolu
228[REUSE Specification]: https://reuse.software/spec-3.3/
229[Guix]: https://guix.gnu.org
230[symbolic execution]: https://en.wikipedia.org/wiki/Symbolic_execution
231[concolic testing]: https://en.wikipedia.org/wiki/Concolic_testing
232[abstract interpretation]: https://en.wikipedia.org/wiki/Abstract_interpretation
233[literate programming]: https://en.wikipedia.org/wiki/Literate_programming
234[parser combinators]: https://en.wikipedia.org/wiki/Parser_combinator
235[abstract monad]: https://doi.org/10.1145/3607833
236[Cabal]: https://www.haskell.org/cabal/