1<!--2SPDX-FileCopyrightText: 2025 Sören Tempel <soeren+git@soeren-tempel.net>34SPDX-License-Identifier: GPL-3.0-only5-->67## README89A work-in-progress software analysis framework built around the [QBE] intermediate language.1011### Motivation1213Existing analysis frameworks are predominantly built around [LLVM].14Unfortunately, LLVM is a fast-moving target with constant changes and updates to its intermediate representation.15Therefore, tooling built on LLVM often requires dated LLVM versions (e.g., [KLEE] currently [recommends LLVM 13][KLEE LLVM] released in 2022).16Obtaining these LLVM versions can be cumbersome and often hinders employment of these tools.17To overcome these issues, maintainers of analysis tooling need to constantly invest time to catch up with LLVM releases instead of focusing on improving their analysis framework.1819In order to reduce the maintenance burden, this project attempts to investigates the utilization of another intermediate language for software analysis: [QBE].20QBE is a much [smaller-scale project][QBE vs LLVM] than LLVM and thereby offers a higher degree of stability.21Further, QBE is simpler than LLVM (e.g., providing fewer operations) and thereby eases the implementation of analysis techniques.22Nonetheless, there exist compiler frontends that can emit a representation in the QBE intermediate representation (which can then be analyzed using quebex!).23For example, [SCC], [cproc], or the [Hare compiler][Hare].2425### Status2627I currently consider this a vertical prototype.28A lot of the desired functionality is already there, but not fully developed and tested.29However, all major features of the [QBE specification][QBE v1.2] are nowadays implemented to some degree.30Consequentially, it is possible to process QBE programs emitted by existing compiler frontends such as the [cproc] C11 compiler.31In terms of analysis features, the implementation currently focuses on dynamic analysis techniques (primarily [symbolic execution]).32Unfortunately, there is basically no documentation for the API and the provided command-line frontends (`quebex` and `quebex-symex`) are presently very rudimentary.3334### Architecture3536The foundation of this project is a formal, yet executable, description of the QBE intermediate language.37At the time of writing, it targets [v1.2 of the QBE specification][QBE v1.2].38The syntax is specified using [literate Haskell][literate programming] and [parser combinators] in the `quebex-syntax` library.39The language semantics are expressed in a modular way by distinguishing abstract and actual semantics.40*Abstract semantics* of the QBE language are described in terms of a `Simulator` monad (i.e., an [abstract monad]).41This monad must then be instantiated, whereby *actual semantics* are specified.42Presently, the following instantiations are supported:43441. Concrete semantics, provided by `Language.QBE.Simulator.Default.State`.452. [Symbolic][symbolic execution] (specifically [concolic][concolic testing]) semantics through `Language.QBE.Simulator.Concolic.State`.4647The former is primarily useful for simulation of programs written in the QBE intermediate language.48The latter intended for automated software testing using [symbolic execution] and—as demonstrated below—can be used to automatically generate test inputs.4950The abstract description of the QBE semantics, in terms of the `Simulator` monad, and its concrete instantiation are provided by the `quebex` library.51The symbolic semantics are implemented by a separate `quebex-symex` library.52Additional semantics (e.g., for [abstract interpretation]) can be implemented by building on top of these existing libraries.5354Further, executable programs are provided by the `quebex-cli` component.55These programs can be used directly from a shell, without interacting with the Haskell codebase.56Presently the following executable program components are available:57581. `quebex`: A simulator for QBE programs built on top of the concrete semantics.592. `quebex-symex`: An automated software testing tool facilitating the symbolic semantics.6061These program components operate directly on QBE input programs.6263### Installation6465After cloning the repository, individual components can be installed using [Cabal] (e.g., `cabal install quebex-cli`).66However, presently the codebase is mainly tested with selected GHC versions; therefore, installation using [Guix] is recommended.67For example, in order to install the `quebex-cli` component using Guix:6869```70$ guix time-machine -C .guix/channels.scm -- install -L .guix/modules/ quebex-cli71```7273Afterwards, if Guix is configured correctly, the aforementioned program components (`quebex` and `quebex-symex`) should be available in your `$PATH`.74The following section demonstrates usage of these components.7576### Demonstration7778This framework is primarily *intended to be used as a library*, allowing the implementation of both static and dynamic analysis techniques based on QBE.79Presently, it focuses on dynamic analysis, and sufficient documentation of the library interface is still lacking.80Nonetheless, it is already capable of executing QBE representations of complex C code (e.g., as emitted by [cproc]).81In order to experiment with the current capabilities, the following subsections demonstrate utilization of the aforementioned program components.8283#### Concrete Execution8485Consider the following "Hello, World!" program:8687```C88#include <stdio.h>8990int main(void) {91 puts("Hello, World!");92 return 0;93}94```9596In order to concretly execute this program using `quebex`, we need to obtain an equivalent representation in QBE.97To this end, we can invoke the [cproc] compiler as follows:9899```100$ cproc -emit-qbe hello.c101```102103The resulting QBE file can then be executed with the concrete semantics using:104105```106$ quebex hello.qbe107Hello, World!108```109110Note that `quebex` is only able to invoke the `puts(3)` function because it intercepts its execution, providing a "simulated" version of it.111Presently, only a limited amount of standard library functions are intercepted in this way.112As such, interactions with the file system or more complex output functions (e.g. `printf(3)`) are currently not supported.113114#### Symbolic Execution115116[Symbolic execution][symbolic execution] is a dynamic software analysis technique that explores reachable program paths based on a symbolic input variable (i.e., an input source).117As an example, consider the following C program:118119```C120#include <stdio.h>121#include <stddef.h>122123// Convert a memory region to an unconstrained symbolic value. Like calloc(3), it can124// account for memory regions which store multiple elements (nelem) of a specific size125// (elsiz). The give name is used to identify the symbolic variable and must be unique.126extern void quebex_make_symbolic(void *ptr, size_t nelem, size_t elsiz, const char *name);127128int main(void) {129 int a;130 quebex_make_symbolic(&a, 1, sizeof(a), "a");131 if (a == 42) {132 puts("you found the answer");133 } else {134 puts("not the answer");135 }136137 return 0;138}139```140141This program can be compiled using the QBE-based [cproc] C11 compiler as follows:142143```144$ cproc -emit-qbe example.c145```146147The resulting QBE representation (`example.qbe`) can be symbolically executed using quebex-symex:148149```150$ quebex-symex --write-tests tests/ example.qbe151```152153This will yield the following output:154155```156not the answer157you found the answer158159---160Amount of paths: 2161```162163This indicates that we found two execution paths through our program based on the symbolic variable `a`.164Due to the `--write-tests` option, quebex-symex will create a `tests/` directory that contains test inputs in the [ktest format][KLEE ktest], one for each execution path.165These files can be inspected with the `ktest-tool` from [KLEE] (which is included in the Guix development environment described below).166For example:167168```169$ ktest-tool tests/test000002.ktest170ktest file : 'tests/test000002.ktest'171args : ['example.qbe']172num objects: 1173object 0: name: 'a1'174object 0: size: 4175object 0: data: b'*\x00\x00\x00'176object 0: hex : 0x2a000000177object 0: int : 42178object 0: uint: 42179object 0: text: *...180```181182This tells us that the second execution path, where the program prints `you found the answer`, was triggered with `a1 := 42`.183From these files, we can—for example—automatically [generate high-coverage test cases][KLEE OSDI].184In the future, it will be possible to replay selected `.ktest` files using `quebex-cli`.185186### Design Goals187188This project is intentionally written in a simple subset of the [Haskell] programming language.189It should be usable by anyone with a basic Haskell background (e.g., as obtained by reading [Learn You a Haskell for Great Good!][learnyouahaskell]).190Further, the project should require minimal long-term maintenance and should also support older GHC versions.191Therefore, it uses the [GHC2021] language standard and avoids usage of additional language extensions.192Further, whenever possible, dependencies on external libraries that are [not bundled by GHC][GHC libraries] must be avoided.193194### Development195196Code should be formatted using [ormolu][ormolu github].197Git hooks performing several sanity checks, including ensuring the proper code formatting, are available.198These hooks can be enabled using:199200 $ git config --local core.hooksPath .githooks201202Further, a [Guix] environment for development purposes can be obtained using:203204 $ guix time-machine -C .guix/channels.scm -- shell -L .guix/modules/ -m .guix/manifest.scm205206### License207208This project uses the [REUSE Specification] to indicated used software license.209210[QBE]: https://c9x.me/compile/211[QBE vs LLVM]: https://c9x.me/compile/doc/llvm.html212[QBE v1.2]: https://c9x.me/compile/doc/il-v1.2.html213[LLVM]: https://llvm.org/214[KLEE]: https://klee-se.org215[KLEE LLVM]: https://klee-se.org/releases/docs/v3.1/build-llvm13/216[KLEE OSDI]: https://www.usenix.org/legacy/events/osdi08/tech/full_papers/cadar/cadar.pdf#page=9217[KLEE ktest]: https://klee-se.org/releases/docs/v3.1/tutorials/testing-function/#klee-generated-test-cases218[SCC]: https://www.simple-cc.org/219[cproc]: https://sr.ht/~mcf/cproc/220[Hare]: https://harelang.org/221[Haskell]: https://haskell.org/222[GHC]: https://www.haskell.org/ghc/223[GHC2021]: https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/control.html#extension-GHC2021224[GHC libraries]: https://ghc.gitlab.haskell.org/ghc/doc/libraries/index.html225[learnyouahaskell]: https://learnyouahaskell.github.io/chapters.html226[libriscv]: https://github.com/agra-uni-bremen/libriscv227[ormolu github]: https://github.com/tweag/ormolu228[REUSE Specification]: https://reuse.software/spec-3.3/229[Guix]: https://guix.gnu.org230[symbolic execution]: https://en.wikipedia.org/wiki/Symbolic_execution231[concolic testing]: https://en.wikipedia.org/wiki/Concolic_testing232[abstract interpretation]: https://en.wikipedia.org/wiki/Abstract_interpretation233[literate programming]: https://en.wikipedia.org/wiki/Literate_programming234[parser combinators]: https://en.wikipedia.org/wiki/Parser_combinator235[abstract monad]: https://doi.org/10.1145/3607833236[Cabal]: https://www.haskell.org/cabal/