quebex

A software analysis framework built around the QBE intermediate language

git clone https://git.8pit.net/quebex.git

  1<!--
  2SPDX-FileCopyrightText: 2025 Sören Tempel <soeren+git@soeren-tempel.net>
  3
  4SPDX-License-Identifier: GPL-3.0-only
  5-->
  6
  7## README
  8
  9A work-in-progress software analysis framework built around the [QBE] intermediate language.
 10
 11### Motivation
 12
 13Existing analysis frameworks are predominantly built around [LLVM].
 14Unfortunately, LLVM is a fast-moving target with constant changes and updates to its intermediate representation.
 15Therefore, tooling built on LLVM often requires dated LLVM versions (e.g., [KLEE] currently [recommends LLVM 13][KLEE LLVM] released in 2022).
 16Obtaining these LLVM versions can be cumbersome and often hinders employment of these tools.
 17To overcome these issues, maintainers of analysis tooling need to constantly invest time to catch up with LLVM releases instead of focusing on improving their analysis framework.
 18
 19In order to reduce the maintenance burden, this project attempts to investigates the utilization of another intermediate language for software analysis: [QBE].
 20QBE is a much [smaller-scale project][QBE vs LLVM] than LLVM and thereby offers a higher degree of stability.
 21Further, QBE is simpler than LLVM (e.g., providing fewer operations) and thereby eases the implementation of analysis techniques.
 22Nonetheless, there exist compiler frontends that can emit a representation in the QBE intermediate representation (which can then be analyzed using quebex!).
 23For example, [SCC], [cproc], or the [Hare compiler][Hare].
 24
 25### Status
 26
 27I currently consider this a vertical prototype.
 28A lot of the desired functionality is already there, but not fully developed.
 29Presently, it covers the majority of QBE features needed for medium-complexity QBE programs (e.g., as emitted by [cproc]).
 30Notably, proper support for variadic functions is missing.
 31In terms of analysis features, the implementation presently focuses on dynamic analysis techniques (primarily [symbolic execution]).
 32Unfortunately, there is basically no documentation for the API and the provided command-line frontends (`quebex` and `quebex-symex`) are presently very basic.
 33
 34### Architecture
 35
 36This project provides a formal, yet executable, description of the QBE intermediate language.
 37Currently, it targets [v1.2 of the QBE specification][QBE v1.2].
 38The syntax is specified using [literate Haskell][literate programming] and [parser combinators] in the `quebex-syntax` library.
 39The language semantics are expressed in a modular way by distinguishing abstract and actual semantics.
 40*Abstract semantics* of the QBE language are described in terms of a `Simulator` monad (i.e., an [abstract monad]).
 41This monad must then be instantiated, whereby *actual semantics* are specified.
 42Presently, the following instantiations are supported:
 43
 441. Concrete semantics, provided by `Language.QBE.Simulator.Default.State`.
 45   This is useful for simulation of the QBE intermediate language.
 462. [Symbolic][symbolic execution] (more specifically [concolic][concolic testing]) semantics, provided by `Language.QBE.Simulator.Concolic.State`.
 47   This is intended for automated software testing using [symbolic execution].
 48
 49The abstract description of the QBE semantics, in terms of the `Simulator` monad, and its concrete instantiation are provided by the `quebex` library.
 50The symbolic semantics are implemented by a separate `quebex-symex` library.
 51Additional semantics can be implemented by building on top of these existing Haskell libraries.
 52
 53Further, executable programs are provided by the `quebex-cli` component.
 54Presently the following executable program components are available:
 55
 561. `quebex`: A simulator for QBE programs built on top of the concrete semantics.
 572. `quebex-symex`: An automated software testing tool facilitating the symbolic semantics.
 58
 59These program components can be used directly on QBE input programs.
 60
 61### Installation
 62
 63After cloning the repository, individual components can be installed using `cabal install`.
 64However, presently specific GHC versions are required; therefore, installation using [Guix] is recommended.
 65For example, in order to install the `quebex-cli` component using Guix:
 66
 67```
 68$ guix time-machine -C .guix/channels.scm -- install -L .guix/modules/ quebex-cli
 69```
 70
 71Afterwards, if Guix is configured correctly, the aforementioned program components (`quebex` and `quebex-symex`) should be available in your `$PATH`.
 72The following section demonstrates usage of these components.
 73
 74### Demonstration
 75
 76This framework is primarily *intended to be used as a library*, allowing the implementation of both static and dynamic analysis techniques based on QBE.
 77Presently, it focuses on dynamic analysis, and sufficient documentation of the library interface is still lacking.
 78Nonetheless, it is already capable of executing QBE representations of medium-complexity C code (e.g., as emitted by [cproc]).
 79In order to experiment with the current capabilities, the following subsections demonstrate utilization of the aforementioned program components.
 80
 81#### Concrete Execution
 82
 83Consider the following "Hello, World!" program:
 84
 85```C
 86#include <stdio.h>
 87
 88int main(void) {
 89	puts("Hello, World!");
 90	return 0;
 91}
 92```
 93
 94In order to concretly execute this program using `quebex`, we need to obtain an equivalent representation in QBE.
 95To this end, we can invoke the [cproc] compiler as follows:
 96
 97```
 98$ cproc -emit-qbe hello.c
 99```
100
101The resulting QBE file can then be executed with the concrete semantics using:
102
103```
104$ quebex hello.qbe
105Hello, World!
106```
107
108Note that `quebex` is only able to invoke the `puts(3)` function because it intercepts its execution, providing a "simulated" version of it.
109Presently, only a limited amount of standard library functions are intercepted in this way.
110As such, interactions with the file system or more complex output functions (e.g. `printf(3)`) are currently not supported.
111
112#### Symbolic Execution
113
114[Symbolic execution][symbolic execution] is a dynamic software analysis technique that explores reachable program paths based on a symbolic input variable (i.e., an input source).
115As an example, consider the following C program:
116
117```C
118#include <stdio.h>
119#include <stddef.h>
120
121extern void quebex_symbolic_array(void *ptr, size_t nelem, size_t elsiz, const char *name);
122
123int main(void) {
124	int a;
125	quebex_symbolic_array(&a, 1, sizeof(a), "a");
126	if (a == 42) {
127		puts("you found the answer");
128	} else {
129		puts("not the answer");
130	}
131
132	return 0;
133}
134```
135
136This program can be compiled using the QBE-based [cproc] C compiler as follows:
137
138```
139$ cproc -emit-qbe example.c
140```
141
142The resulting QBE representation (`example.qbe`) can be symbolically executed using quebex-symex:
143
144```
145$ quebex-symex --write-tests tests/ example.qbe
146```
147
148This will yield the following output:
149
150```
151not the answer
152you found the answer
153
154---
155Amount of paths: 2
156```
157
158This indicates that we found two execution paths through our program based on the symbolic variable `a`.
159Due to the `--write-tests` option, quebex-symex will create a `tests/` directory that contains test inputs in the [ktest format][KLEE ktest], one for each execution path.
160These files can be inspected with the `ktest-tool` from [KLEE] (which is included in the Guix development environment described below).
161For example:
162
163```
164$ ktest-tool tests/test000002.ktest
165ktest file : 'tests/test000002.ktest'
166args       : ['example.qbe']
167num objects: 1
168object 0: name: 'a1'
169object 0: size: 4
170object 0: data: b'*\x00\x00\x00'
171object 0: hex : 0x2a000000
172object 0: int : 42
173object 0: uint: 42
174object 0: text: *...
175```
176
177This tells us that the second execution path, where the program prints `you found the answer`, was triggered with `a1 := 42`.
178From these files, we can—for example—automatically [generate high-coverage test cases][KLEE OSDI].
179In the future, it will be possible to replay selected `.ktest` files using `quebex-cli`.
180
181### Design Goals
182
183This project is intentionally written in a simple subset of the [Haskell] programming language.
184It should be usable by anyone with a basic Haskell background (e.g., as obtained by reading [Learn You a Haskell for Great Good!][learnyouahaskell]).
185Further, the project should require minimal long-term maintenance and should also support older GHC versions.
186Therefore, it uses the [GHC2021] language standard and avoids usage of additional language extensions.
187Further, whenever possible, dependencies on external libraries that are [not bundled by GHC][GHC libraries] must be avoided.
188
189### Development
190
191Code should be formatted using [ormolu][ormolu github].
192Git hooks performing several sanity checks, including ensuring the proper code formatting, are available.
193These hooks can be enabled using:
194
195	$ git config --local core.hooksPath .githooks
196
197Further, a [Guix] environment for development purposes can be obtained using:
198
199	$ guix time-machine -C .guix/channels.scm -- shell -L .guix/modules/ -m .guix/manifest.scm
200
201### License
202
203This project uses the [REUSE Specification] to indicated used software license.
204
205[QBE]: https://c9x.me/compile/
206[QBE vs LLVM]: https://c9x.me/compile/doc/llvm.html
207[QBE v1.2]: https://c9x.me/compile/doc/il-v1.2.html
208[LLVM]: https://llvm.org/
209[KLEE]: https://klee-se.org
210[KLEE LLVM]: https://klee-se.org/build/build-llvm13/
211[KLEE OSDI]: https://www.usenix.org/legacy/events/osdi08/tech/full_papers/cadar/cadar.pdf#page=9
212[KLEE ktest]: https://klee-se.org/tutorials/testing-function/#klee-generated-test-cases
213[SCC]: https://www.simple-cc.org/
214[cproc]: https://sr.ht/~mcf/cproc/
215[Hare]: https://harelang.org/
216[Haskell]: https://haskell.org/
217[GHC]: https://www.haskell.org/ghc/
218[GHC2021]: https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/control.html#extension-GHC2021
219[GHC libraries]: https://ghc.gitlab.haskell.org/ghc/doc/libraries/index.html
220[learnyouahaskell]: https://learnyouahaskell.github.io/chapters.html
221[libriscv]: https://github.com/agra-uni-bremen/libriscv
222[ormolu github]: https://github.com/tweag/ormolu
223[REUSE Specification]: https://reuse.software/spec-3.3/
224[Guix]: https://guix.gnu.org
225[symbolic execution]: https://en.wikipedia.org/wiki/Symbolic_execution
226[concolic testing]: https://en.wikipedia.org/wiki/Concolic_testing
227[literate programming]: https://en.wikipedia.org/wiki/Literate_programming
228[parser combinators]: https://en.wikipedia.org/wiki/Parser_combinator
229[abstract monad]: https://doi.org/10.1145/3607833