1% SPDX-FileCopyrightText: 2015-2024 Quentin Carbonneaux <quentin@c9x.me>2% SPDX-FileCopyrightText: 2025-2026 Sören Tempel <soeren+git@soeren-tempel.net>3%4% SPDX-License-Identifier: MIT AND GPL-3.0-only56\documentclass{article}7%include polycode.fmt89%subst blankline = "\\[5mm]"1011% See https://github.com/kosmikus/lhs2tex/issues/5812%format <$> = "\mathbin{\langle\$\rangle}"13%format <&> = "\mathbin{\langle\&\rangle}"14%format <|> = "\mathbin{\langle\:\vline\:\rangle}"15%format <?> = "\mathbin{\langle?\rangle}"16%format <*> = "\mathbin{\langle*\rangle}"17%format <* = "\mathbin{\langle*}"18%format *> = "\mathbin{*\rangle}"1920\long\def\ignore#1{}2122\usepackage{hyperref}23\hypersetup{24 colorlinks = true,25}2627\begin{document}2829\title{QBE Intermediate Language\vspace{-2em}}30\date{}31\maketitle32\frenchspacing3334\ignore{35\begin{code}36module Language.QBE.Parser37 ( skipInitComments,38 dataDef,39 typeDef,40 funcDef,41 fileDef42 )43where4445import Data.Char (chr)46import Data.Word (Word64)47import Data.Functor ((<&>))48import Data.List (singleton)49import Data.Map qualified as Map50import qualified Language.QBE.Types as Q51import Language.QBE.Util (bind, decNumber, octNumber, float)52import Text.ParserCombinators.Parsec53 ( Parser,54 alphaNum,55 anyChar,56 between,57 char,58 choice,59 letter,60 many,61 many1,62 manyTill,63 newline,64 noneOf,65 oneOf,66 optional,67 optionMaybe,68 sepBy,69 sepBy1,70 skipMany,71 skipMany1,72 string,73 try,74 (<?>),75 (<|>),76 )77\end{code}78}7980This an executable description of the81\href{https://c9x.me/compile/doc/il-v1.2.html}{QBE intermediate language},82specified through \href{https://hackage.haskell.org/package/parsec}{Parsec}83parser combinators and generated from a literate Haskell file. The description84is derived from the original QBE IL documentation, licensed under MIT.85Presently, this implementation targets version 1.2 of the QBE intermediate86language and aims to be equivalent with the original specification.8788\section{Basic Concepts}8990The intermediate language (IL) is a higher-level language than the91machine's assembly language. It smoothes most of the92irregularities of the underlying hardware and allows an infinite number93of temporaries to be used. This higher abstraction level lets frontend94programmers focus on language design issues.9596\subsection{Input Files}9798The intermediate language is provided to QBE as text. Usually, one file99is generated per each compilation unit from the frontend input language.100An IL file is a sequence of \nameref{sec:definitions} for101data, functions, and types. Once processed by QBE, the resulting file102can be assembled and linked using a standard toolchain (e.g., GNU103binutils).104105\begin{code}106comment :: Parser ()107comment = skipMany blankNL >> comment' >> skipMany blankNL108 where109 comment' = char '#' >> manyTill anyChar newline110\end{code}111112\ignore{113\begin{code}114skipNoCode :: Parser () -> Parser ()115skipNoCode blankP = try (skipMany1 comment <?> "comments") <|> blankP116\end{code}117}118119Here is a complete "Hello World" IL file which defines a function that120prints to the screen. Since the string is not a first class object (only121the pointer is) it is defined outside the function\textquotesingle s122body. Comments start with a \# character and finish with the end of the123line.124125\begin{verbatim}126data $str = { b "hello world", b 0 }127128export function w $main() {129@start130 # Call the puts function with $str as argument.131 %r =w call $puts(l $str)132 ret 0133}134\end{verbatim}135136If you have read the LLVM language reference, you might recognize the137example above. In comparison, QBE makes a much lighter use of types and138the syntax is terser.139140\subsection{Parser Combinators}141142\ignore{143\begin{code}144bracesNL :: Parser a -> Parser a145bracesNL = between (wsNL $ char '{') (wsNL $ char '}')146147quoted :: Parser a -> Parser a148quoted = let q = char '"' in between q q149150sepByTrail1 :: Parser a -> Parser sep -> Parser [a]151sepByTrail1 p sep = do152 x <- p153 xs <- many (try $ sep >> p)154 _ <- optional sep155 return (x:xs)156157sepByTrail :: Parser a -> Parser sep -> Parser [a]158sepByTrail p sep = sepByTrail1 p sep <|> return []159160parenLst :: Parser a -> Parser [a]161parenLst p = between (ws $ char '(') (char ')') inner162 where163 inner = sepBy (ws p) (ws $ char ',')164165unaryInstr :: (Q.Value -> Q.Instr) -> String -> Parser Q.Instr166unaryInstr conc keyword = do167 _ <- ws (string keyword)168 conc <$> ws val169170binaryInstr :: (Q.Value -> Q.Value -> Q.Instr) -> String -> Parser Q.Instr171binaryInstr conc keyword = do172 _ <- ws (string keyword)173 vfst <- ws val <* ws (char ',')174 conc vfst <$> ws val175176-- Can only appear in data and type definitions and hence allows newlines.177alignAny :: Parser Word64178alignAny = (ws1 (string "align")) >> wsNL decNumber179180-- Returns true if it is signed.181signageChar :: Parser Bool182signageChar = (char 's' <|> char 'u') <&> (== 's')183\end{code}184}185186The original QBE specification defines the syntax using a BNF grammar. In187contrast, this document defines it using Parsec parser combinators. As such,188this specification is less formal but more accurate as the parsing code is189actually executable. Consequently, this specification also captures constructs190omitted in the original specification (e.g., \nameref{sec:identifiers}, or191\nameref{sec:strlit}). Nonetheless, the formal language recognized by these192combinators aims to be equivalent to the one of the BNF grammar.193194\subsection{Identifiers}195\label{sec:identifiers}196197% Ident is not documented in the original QBE specification.198% See https://c9x.me/git/qbe.git/tree/parse.c?h=v1.2#n304199200\begin{code}201ident :: Parser String202ident = do203 start <- letter <|> oneOf "._"204 rest <- many (alphaNum <|> oneOf "$._")205 return $ start : rest206\end{code}207208Identifiers for data, types, and functions can start with any ASCII letter or209the special characters \texttt{.} and \texttt{\_}. This initial character can210be followed by a sequence of zero or more alphanumeric characters and the211special characters \texttt{\$}, \texttt{.}, and \texttt{\_}.212213\subsection{Sigils}214215\begin{code}216userDef :: Parser Q.UserIdent217userDef = Q.UserIdent <$> (char ':' >> ident)218219global :: Parser Q.GlobalIdent220global = Q.GlobalIdent <$> (char '$' >> ident)221222local :: Parser Q.LocalIdent223local = Q.LocalIdent <$> (char '%' >> ident)224225label :: Parser Q.BlockIdent226label = Q.BlockIdent <$> (char '@' >> ident)227\end{code}228229The intermediate language makes heavy use of sigils, all user-defined230names are prefixed with a sigil. This is to avoid keyword conflicts, and231also to quickly spot the scope and nature of identifiers.232233\begin{itemize}234 \item \texttt{:} is for user-defined \nameref{sec:aggregate-types}235 \item \texttt{\$} is for globals (represented by a pointer)236 \item \texttt{\%} is for function-scope temporaries237 \item \texttt{@@} is for block labels238\end{itemize}239240\subsection{Spacing}241242\begin{code}243blank :: Parser Char244blank = oneOf "\t " <?> "blank"245246blankNL :: Parser Char247blankNL = oneOf "\n\t " <?> "blank or newline"248\end{code}249250Individual tokens in IL files must be separated by one or more spacing251characters. Both spaces and tabs are recognized as spacing characters.252In data and type definitions, newlines may also be used as spaces to253prevent overly long lines. When exactly one of two consecutive tokens is254a symbol (for example \texttt{,} or \texttt{=} or \texttt{\{}), spacing may be omitted.255256\ignore{257\begin{code}258ws :: Parser a -> Parser a259ws p = p <* skipMany blank260261ws1 :: Parser a -> Parser a262ws1 p = p <* skipMany1 blank263264wsNL :: Parser a -> Parser a265wsNL p = p <* skipNoCode (skipMany blankNL)266267wsNL1 :: Parser a -> Parser a268wsNL1 p = p <* skipNoCode (skipMany1 blankNL)269270-- Only intended to be used to skip comments at the start of a file.271skipInitComments :: Parser ()272skipInitComments = skipNoCode (skipMany blankNL)273\end{code}274}275276\subsection{String Literals}277\label{sec:strlit}278279% The string literal is not documented in the original QBE specification.280% See https://c9x.me/git/qbe.git/tree/parse.c?h=v1.2#n287281282\begin{code}283strLit :: Parser String284strLit = concat <$> quoted (many strChr)285 where286 strChr :: Parser [Char]287 strChr = (singleton <$> noneOf "\"\\") <|> escSeq288289 -- TODO: not documnted in the QBE BNF.290 octEsc :: Parser Char291 octEsc = do292 n <- octNumber293 pure $ chr (fromIntegral n)294295 escSeq :: Parser [Char]296 escSeq = try $ do297 esc <- char '\\'298 (singleton <$> octEsc) <|> (anyChar <&> (\c -> [esc, c]))299\end{code}300301Strings are enclosed by double quotes and are, for example, used to specify a302section name as part of the \nameref{sec:linkage} information. Within a string,303a double quote can be escaped using a \texttt{\textbackslash} character. All304escape sequences, including double quote escaping, are passed through as-is to305the generated assembly file.306307\section{Types}308309\subsection{Simple Types}310311The IL makes minimal use of types. By design, the types used are312restricted to what is necessary for unambiguous compilation to machine313code and C interfacing. Unlike LLVM, QBE is not using types as a means314to safety; they are only here for semantic purposes.315316\begin{code}317baseType :: Parser Q.BaseType318baseType = choice319 [ bind "w" Q.Word320 , bind "l" Q.Long321 , bind "s" Q.Single322 , bind "d" Q.Double ]323\end{code}324325The four base types are \texttt{w} (word), \texttt{l} (long), \texttt{s} (single), and \texttt{d}326(double), they stand respectively for 32-bit and 64-bit integers, and32732-bit and 64-bit floating-point numbers. There are no pointer types328available; pointers are typed by an integer type sufficiently wide to329represent all memory addresses (e.g., \texttt{l} on 64-bit architectures).330Temporaries in the IL can only have a base type.331332\begin{code}333extType :: Parser Q.ExtType334extType = (Q.Base <$> baseType)335 <|> bind "b" Q.Byte336 <|> bind "h" Q.HalfWord337\end{code}338339Extended types contain base types plus \texttt{b} (byte) and \texttt{h} (half word),340respectively for 8-bit and 16-bit integers. They are used in \nameref{sec:aggregate-types}341and \nameref{sec:data} definitions.342343For C interfacing, the IL also provides user-defined aggregate types as344well as signed and unsigned variants of the sub-word extended types.345Read more about these types in the \nameref{sec:aggregate-types}346and \nameref{sec:functions} sections.347348\subsection{Subtyping}349\label{sec:subtyping}350351The IL has a minimal subtyping feature, for integer types only. Any352value of type \texttt{l} can be used in a \texttt{w} context. In that case, only the35332 least significant bits of the word value are used.354355Make note that it is the opposite of the usual subtyping on integers (in356C, we can safely use an \texttt{int} where a \texttt{long} is expected). A long value357cannot be used in word context. The rationale is that a word can be358signed or unsigned, so extending it to a long could be done in two ways,359either by zero-extension, or by sign-extension.360361\subsection{Constants and Vals}362\label{sec:constants-and-vals}363364\begin{code}365dynConst :: Parser Q.DynConst366dynConst =367 (Q.Const <$> constant)368 <|> (Q.Thread <$> global)369 <?> "dynconst"370\end{code}371372Constants come in two kinds: compile-time constants and dynamic373constants. Dynamic constants include compile-time constants and other374symbol variants that are only known at program-load time or execution375time. Consequently, dynamic constants can only occur in function bodies.376377The representation of integers is two's complement.378Floating-point numbers are represented using the single-precision and379double-precision formats of the IEEE 754 standard.380381\begin{code}382constant :: Parser Q.Const383constant =384 (Q.Number <$> decNumber)385 <|> (Q.SFP <$> sfp)386 <|> (Q.DFP <$> dfp)387 <|> (Q.Global <$> global)388 <?> "const"389 where390 sfp = string "s_" >> float391 dfp = string "d_" >> float392\end{code}393394Constants specify a sequence of bits and are untyped. They are always395parsed as 64-bit blobs. Depending on the context surrounding a constant,396only some of its bits are used. For example, in the program below, the397two variables defined have the same value since the first operand of the398subtraction is a word (32-bit) context.399400\begin{verbatim}401%x =w sub -1, 0 %y =w sub 4294967295, 0402\end{verbatim}403404Because specifying floating-point constants by their bits makes the code405less readable, syntactic sugar is provided to express them. Standard406scientific notation is prefixed with \texttt{s\_} and \texttt{d\_} for single and407double precision numbers respectively. Once again, the following example408defines twice the same double-precision constant.409410\begin{verbatim}411%x =d add d_0, d_-1412%y =d add d_0, -4616189618054758400413\end{verbatim}414415Global symbols can also be used directly as constants; they will be416resolved and turned into actual numeric constants by the linker.417418When the \texttt{thread} keyword prefixes a symbol name, the419symbol\textquotesingle s numeric value is resolved at runtime in the420thread-local storage.421422\begin{code}423val :: Parser Q.Value424val =425 (Q.VConst <$> dynConst)426 <|> (Q.VLocal <$> local)427 <?> "val"428\end{code}429430Vals are used as arguments in regular, phi, and jump instructions within431function definitions. They are either constants or function-scope432temporaries.433434\subsection{Linkage}435\label{sec:linkage}436437\begin{code}438linkage :: Parser Q.Linkage439linkage =440 wsNL (bind "export" Q.LExport)441 <|> wsNL (bind "thread" Q.LThread)442 <|> do443 _ <- ws1 $ string "section"444 (try secWithFlags) <|> sec445 where446 sec :: Parser Q.Linkage447 sec = wsNL strLit <&> (`Q.LSection` Nothing)448449 secWithFlags :: Parser Q.Linkage450 secWithFlags = do451 n <- ws1 strLit452 wsNL strLit <&> Q.LSection n . Just453\end{code}454455Function and data definitions (see below) can specify linkage456information to be passed to the assembler and eventually to the linker.457458The \texttt{export} linkage flag marks the defined item as visible outside the459current file\textquotesingle s scope. If absent, the symbol can only be460referred to locally. Functions compiled by QBE and called from C need to461be exported.462463The \texttt{thread} linkage flag can only qualify data definitions. It mandates464that the object defined is stored in thread-local storage. Each time a465runtime thread starts, the supporting platform runtime is in charge of466making a new copy of the object for the fresh thread. Objects in467thread-local storage must be accessed using the \texttt{thread \$IDENT} syntax,468as specified in the \nameref{sec:constants-and-vals} section.469470A \texttt{section} flag can be specified to tell the linker to put the defined471item in a certain section. The use of the section flag is platform472dependent and we refer the user to the documentation of their assembler473and linker for relevant information.474475\begin{verbatim}476section ".init_array" data $.init.f = { l $f }477\end{verbatim}478479The section flag can be used to add function pointers to a global480initialization list, as depicted above. Note that some platforms provide481a BSS section that can be used to minimize the footprint of uniformly482zeroed data. When this section is available, QBE will automatically make483use of it and no section flag is required.484485The section and export linkage flags should each appear at most once in486a definition. If multiple occurrences are present, QBE is free to use487any.488489\subsection{Definitions}490\label{sec:definitions}491492Definitions are the essential components of an IL file. They can define493three types of objects: aggregate types, data, and functions. Aggregate494types are never exported and do not compile to any code. Data and495function definitions have file scope and are mutually recursive (even496across IL files). Their visibility can be controlled using linkage497flags.498499\subsubsection{Aggregate Types}500\label{sec:aggregate-types}501502\begin{code}503typeDef :: Parser Q.TypeDef504typeDef = do505 _ <- wsNL1 (string "type")506 i <- wsNL1 userDef507 _ <- wsNL1 (char '=')508 a <- optionMaybe alignAny509 bracesNL (opaqueType <|> unionType <|> regularType) <&> Q.TypeDef i a510\end{code}511512Aggregate type definitions start with the \texttt{type} keyword. They have file513scope, but types must be defined before being referenced. The inner514structure of a type is expressed by a comma-separated list of fields.515516\begin{code}517subType :: Parser Q.SubType518subType =519 (Q.SExtType <$> extType)520 <|> (Q.SUserDef <$> userDef)521522field :: Parser Q.Field523field = do524 -- TODO: newline is required if there is a number argument525 f <- wsNL subType526 s <- ws $ optionMaybe decNumber527 pure (f, s)528529fields :: Bool -> Parser [Q.Field]530fields allowEmpty =531 (if allowEmpty then sepByTrail else sepByTrail1) field (wsNL $ char ',')532\end{code}533534A field consists of a subtype, either an extended type or a user-defined type,535and an optional number expressing the value of this field. In case many items536of the same type are sequenced (like in a C array), the shorter array syntax537can be used.538539\begin{code}540regularType :: Parser Q.AggType541regularType = Q.ARegular <$> fields True542\end{code}543544Three different kinds of aggregate types are presentl ysupported: regular545types, union types and opaque types. The fields of regular types will be546packed. By default, the alignment of an aggregate type is the maximum alignment547of its members. The alignment can be explicitly specified by the programmer.548549\begin{code}550unionType :: Parser Q.AggType551unionType = Q.AUnion <$> many1 (wsNL unionType')552 where553 unionType' :: Parser [Q.Field]554 unionType' = bracesNL $ fields False555\end{code}556557Union types allow the same chunk of memory to be used with different layouts. They are defined by enclosing multiple regular aggregate type bodies in a pair of curly braces. Size and alignment of union types are set to the maximum size and alignment of each variation or, in the case of alignment, can be explicitly specified.558559\begin{code}560opaqueType :: Parser Q.AggType561opaqueType = Q.AOpaque <$> wsNL decNumber562\end{code}563564Opaque types are used when the inner structure of an aggregate cannot be specified; the alignment for opaque types is mandatory. They are defined simply by enclosing their size between curly braces.565566\subsubsection{Data}567\label{sec:data}568569\begin{code}570dataDef :: Parser Q.DataDef571dataDef = do572 link <- many linkage573 name <- wsNL1 (string "data") >> wsNL global574 _ <- wsNL (char '=')575 alignment <- optionMaybe alignAny576 bracesNL dataObjs <&> Q.DataDef link name alignment577 where578 -- TODO: sepByTrail is not documented in the QBE BNF.579 dataObjs = sepByTrail dataObj (wsNL $ char ',')580\end{code}581582Data definitions express objects that will be emitted in the compiled583file. Their visibility and location in the compiled artifact are584controlled with linkage flags described in the \nameref{sec:linkage}585section.586587They define a global identifier (starting with the sigil \texttt{\$}), that588will contain a pointer to the object specified by the definition.589590\begin{code}591dataObj :: Parser Q.DataObj592dataObj =593 (Q.OZeroFill <$> (wsNL1 (char 'z') >> wsNL decNumber))594 <|> do595 t <- wsNL1 extType596 i <- many1 (wsNL dataItem)597 return $ Q.OItem t i598\end{code}599600Objects are described by a sequence of fields that start with a type601letter. This letter can either be an extended type, or the \texttt{z} letter.602If the letter used is an extended type, the data item following603specifies the bits to be stored in the field.604605\begin{code}606dataItem :: Parser Q.DataItem607dataItem =608 (Q.DString <$> strLit)609 <|> try610 ( do611 i <- ws global612 off <- (ws $ char '+') >> ws decNumber613 return $ Q.DSymOff i off614 )615 <|> (Q.DConst <$> constant)616\end{code}617618Within each object, several items can be defined. When several data items619follow a letter, they initialize multiple fields of the same size.620621\begin{code}622allocSize :: Parser Q.AllocSize623allocSize =624 choice625 [ bind "4" Q.AllocWord,626 bind "8" Q.AllocLong,627 bind "16" Q.AllocLongLong628 ]629\end{code}630631The members of a struct will be packed. This means that padding has to632be emitted by the frontend when necessary. Alignment of the whole data633objects can be manually specified, and when no alignment is provided,634the maximum alignment from the platform is used.635636When the \texttt{z} letter is used the number following indicates the size of637the field; the contents of the field are zero initialized. It can be638used to add padding between fields or zero-initialize big arrays.639640\subsubsection{Functions}641\label{sec:functions}642643\begin{code}644funcDef :: Parser Q.FuncDef645funcDef = do646 link <- many linkage647 _ <- ws1 (string "function")648 retTy <- optionMaybe (ws1 abity)649 name <- ws global650 args <- wsNL params651 body <- between (wsNL1 $ char '{') (wsNL $ char '}') $ many1 block652653 case (Q.insertJumps body) of654 Nothing -> fail $ "invalid fallthrough in " ++ show name655 Just bl -> return $ Q.FuncDef link name retTy args bl656\end{code}657658Function definitions contain the actual code to emit in the compiled659file. They define a global symbol that contains a pointer to the660function code. This pointer can be used in \texttt{call} instructions or stored661in memory.662663\begin{code}664subWordType :: Parser Q.SubWordType665subWordType = choice666 [ try $ bind "sb" Q.SignedByte667 , try $ bind "ub" Q.UnsignedByte668 , bind "sh" Q.SignedHalf669 , bind "uh" Q.UnsignedHalf ]670671abity :: Parser Q.Abity672abity = try (Q.ASubWordType <$> subWordType)673 <|> (Q.ABase <$> baseType)674 <|> (Q.AUserDef <$> userDef)675\end{code}676677The type given right before the function name is the return type of the678function. All return values of this function must have this return type.679If the return type is missing, the function must not return any value.680681\begin{code}682param :: Parser Q.FuncParam683param = (Q.Env <$> (ws1 (string "env") >> local))684 <|> (string "..." >> pure Q.Variadic)685 <|> do686 ty <- ws1 abity687 Q.Regular ty <$> local688689params :: Parser [Q.FuncParam]690params = parenLst param691\end{code}692693The parameter list is a comma separated list of temporary names prefixed694by types. The types are used to correctly implement C compatibility.695When an argument has an aggregate type, a pointer to the aggregate is696passed by thea caller. In the example below, we have to use a load697instruction to get the value of the first (and only) member of the698struct.699700\begin{verbatim}701type :one = { w }702703function w $getone(:one %p) {704@start705 %val =w loadw %p706 ret %val707}708\end{verbatim}709710If a function accepts or returns values that are smaller than a word,711such as \texttt{signed char} or \texttt{unsigned short} in C, one of the sub-word type712must be used. The sub-word types \texttt{sb}, \texttt{ub}, \texttt{sh}, and \texttt{uh} stand,713respectively, for signed and unsigned 8-bit values, and signed and714unsigned 16-bit values. Parameters associated with a sub-word type of715bit width N only have their N least significant bits set and have base716type \texttt{w}. For example, the function717718\begin{verbatim}719function w $addbyte(w %a, sb %b) {720@start721 %bw =w extsb %b722 %val =w add %a, %bw723 ret %val724}725\end{verbatim}726727needs to sign-extend its second argument before the addition. Dually,728return values with sub-word types do not need to be sign or zero729extended.730731If the parameter list ends with \texttt{...}, the function is a variadic732function: it can accept a variable number of arguments. To access the733extra arguments provided by the caller, use the \texttt{vastart} and \texttt{vaarg}734instructions described in the \nameref{sec:variadic} section.735736Optionally, the parameter list can start with an environment parameter737\texttt{env \%e}. This special parameter is a 64-bit integer temporary (i.e.,738of type \texttt{l}). If the function does not use its environment parameter,739callers can safely omit it. This parameter is invisible to a C caller:740for example, the function741742\begin{verbatim}743export function w $add(env %e, w %a, w %b) {744@start745 %c =w add %a, %b746 ret %c747}748\end{verbatim}749750must be given the C prototype \texttt{int add(int, int)}. The intended use of751this feature is to pass the environment pointer of closures while752retaining a very good compatibility with C. The \nameref{sec:call}753section explains how to pass an environment parameter.754755Since global symbols are defined mutually recursive, there is no need756for function declarations: a function can be referenced before its757definition. Similarly, functions from other modules can be used without758previous declaration. All the type information necessary to compile a759call is in the instruction itself.760761The syntax and semantics for the body of functions are described in the762\nameref{sec:control} section.763764\section{Control}765\label{sec:control}766767The IL represents programs as textual transcriptions of control flow768graphs. The control flow is serialized as a sequence of blocks of769straight-line code which are connected using jump instructions.770771\subsection{Blocks}772\label{sec:blocks}773774\begin{code}775block :: Parser Q.Block'776block = do777 l <- wsNL1 label778 p <- many (wsNL1 $ try phiInstr)779 s <- many (wsNL1 statement)780 Q.Block' l p s <$> (optionMaybe $ wsNL1 jumpInstr)781\end{code}782783All blocks have a name that is specified by a label at their beginning.784Then follows a sequence of instructions that have "fall-through" flow.785Finally one jump terminates the block. The jump can either transfer786control to another block of the same function or return; jumps are787described further below.788789The first block in a function must not be the target of any jump in the790program. If a jump to the function start is needed, the frontend must791insert an empty prelude block at the beginning of the function.792793When one block jumps to the next block in the IL file, it is not794necessary to write the jump instruction, it will be automatically added795by the parser. For example the start block in the example below jumps796directly to the loop block.797798\subsection{Jumps}799\label{sec:jumps}800801\begin{code}802jumpInstr :: Parser Q.JumpInstr803jumpInstr = (string "hlt" >> pure Q.Halt)804 -- TODO: Return requires a space if there is an optionMaybe805 <|> Q.Return <$> ((ws $ string "ret") >> optionMaybe val)806 <|> try (Q.Jump <$> ((ws1 $ string "jmp") >> label))807 <|> do808 _ <- ws1 $ string "jnz"809 v <- ws val <* ws (char ',')810 l1 <- ws label <* ws (char ',')811 l2 <- ws label812 return $ Q.Jnz v l1 l2813\end{code}814815A jump instruction ends every block and transfers the control to another816program location. The target of a jump must never be the first block in817a function. The three kinds of jumps available are described in the818following list.819820\begin{enumerate}821 \item \textbf{Unconditional jump.} Jumps to another block of the same function.822 \item \textbf{Conditional jump.} When its word argument is non-zero, it jumps to its first label argument; otherwise it jumps to the other label. The argument must be of word type; because of subtyping a long argument can be passed, but only its least significant 32 bits will be compared to 0.823 \item \textbf{Function return.} Terminates the execution of the current function, optionally returning a value to the caller. The value returned must be of the type given in the function prototype. If the function prototype does not specify a return type, no return value can be used.824 \item \textbf{Program termination.} Terminates the execution of the program with a target-dependent error. This instruction can be used when it is expected that the execution never reaches the end of the block it closes; for example, after having called a function such as \texttt{exit()}.825\end{enumerate}826827\section{Instructions}828\label{sec:instructions}829830\begin{code}831instr :: Parser Q.Instr832instr =833 choice834 [ try $ binaryInstr Q.Add "add",835 try $ binaryInstr Q.Sub "sub",836 try $ binaryInstr Q.Mul "mul",837 try $ binaryInstr Q.Div "div",838 try $ binaryInstr Q.URem "urem",839 try $ binaryInstr Q.Rem "rem",840 try $ binaryInstr Q.UDiv "udiv",841 try $ binaryInstr Q.Or "or",842 try $ binaryInstr Q.Xor "xor",843 try $ binaryInstr Q.And "and",844 try $ binaryInstr Q.Sar "sar",845 try $ binaryInstr Q.Shr "shr",846 try $ binaryInstr Q.Shl "shl",847 try $ unaryInstr Q.Neg "neg",848 try $ unaryInstr Q.Cast "cast",849 try $ unaryInstr Q.Copy "copy",850 try $ unaryInstr Q.VAArg "vaarg",851 try $ loadInstr,852 try $ allocInstr,853 try $ compareInstr,854 try $ extInstr,855 try $ truncInstr,856 try $ fromFloatInstr,857 try $ toFloatInstr858 ]859\end{code}860861Instructions are the smallest piece of code in the IL, they form the body of862\nameref{sec:blocks}. This specification distinguishes instructions and863volatile instructions, the latter do not return a value. For the former, the IL864uses a three-address code, which means that one instruction computes an865operation between two operands and assigns the result to a third one.866867\begin{code}868assign :: Parser Q.Statement869assign = do870 n <- ws local871 t <- ws (char '=') >> ws1 baseType872 Q.Assign n t <$> instr873874volatileInstr :: Parser Q.Statement875volatileInstr =876 Q.Volatile <$>877 (storeInstr <|> blitInstr <|> vastartInstr <|> dbglocInstr)878879-- TODO: Not documented in the QBE BNF.880statement :: Parser Q.Statement881statement = (try callInstr) <|> assign <|> volatileInstr882\end{code}883884An instruction has both a name and a return type, this return type is a base885type that defines the size of the instruction's result. The type of the886arguments can be unambiguously inferred using the instruction name and the887return type. For example, for all arithmetic instructions, the type of the888arguments is the same as the return type. The two additions below are valid if889\texttt{\%y} is a word or a long (because of \nameref{sec:subtyping}).890891\begin{verbatim}892%x =w add 0, %y893%z =w add %x, %x894\end{verbatim}895896Some instructions, like comparisons and memory loads have operand types897that differ from their return types. For instance, two floating points898can be compared to give a word result (0 if the comparison succeeds, 1899if it fails).900901\begin{verbatim}902%c =w cgts %a, %b903\end{verbatim}904905In the example above, both operands have to have single type. This is906made explicit by the instruction suffix.907908\subsection{Arithmetic and Bits}909910\begin{quote}911\begin{itemize}912\item \texttt{add}, \texttt{sub}, \texttt{div}, \texttt{mul}913\item \texttt{neg}914\item \texttt{udiv}, \texttt{rem}, \texttt{urem}915\item \texttt{or}, \texttt{xor}, \texttt{and}916\item \texttt{sar}, \texttt{shr}, \texttt{shl}917\end{itemize}918\end{quote}919920The base arithmetic instructions in the first bullet are available for921all types, integers and floating points.922923When \texttt{div} is used with word or long return type, the arguments are924treated as signed. The unsigned integral division is available as \texttt{udiv}925instruction. When the result of a division is not an integer, it is truncated926towards zero.927928The signed and unsigned remainder operations are available as \texttt{rem} and929\texttt{urem}. The sign of the remainder is the same as the one of the930dividend. Its magnitude is smaller than the divisor one. These two instructions931and \texttt{udiv} are only available with integer arguments and result.932933Bitwise OR, AND, and XOR operations are available for both integer934types. Logical operations of typical programming languages can be935implemented using \nameref{sec:comparisions} and \nameref{sec:jumps}.936937Shift instructions \texttt{sar}, \texttt{shr}, and \texttt{shl}, shift right or938left their first operand by the amount from the second operand. The shifting939amount is taken modulo the size of the result type. Shifting right can either940preserve the sign of the value (using \texttt{sar}), or fill the newly freed941bits with zeroes (using \texttt{shr}). Shifting left always fills the freed942bits with zeroes.943944Remark that an arithmetic shift right (\texttt{sar}) is only equivalent to a945division by a power of two for non-negative numbers. This is because the shift946right "truncates" towards minus infinity, while the division truncates towards947zero.948949\subsection{Memory}950\label{sec:memory}951952The following sections discuss instructions for interacting with values stored in memory.953954\subsubsection{Store instructions}955956\begin{code}957storeInstr :: Parser Q.VolatileInstr958storeInstr = do959 t <- string "store" >> ws1 extType960 v <- ws val961 _ <- ws $ char ','962 ws val <&> Q.Store t v963\end{code}964965Store instructions exist to store a value of any base type and any extended966type. Since halfwords and bytes are not first class in the IL, \texttt{storeh}967and \texttt{storeb} take a word as argument. Only the first 16 or 8 bits of968this word will be stored in memory at the address specified in the second969argument.970971\subsubsection{Load instructions}972973\begin{code}974loadInstr :: Parser Q.Instr975loadInstr = do976 _ <- string "load"977 t <- ws1 $ choice978 [ try $ bind "sw" (Q.LBase Q.Word),979 try $ bind "uw" (Q.LBase Q.Word),980 try $ Q.LSubWord <$> subWordType,981 Q.LBase <$> baseType982 ]983 ws val <&> Q.Load t984\end{code}985986For types smaller than long, two variants of the load instruction are987available: one will sign extend the loaded value, while the other will zero988extend it. Note that all loads smaller than long can load to either a long or a989word.990991The two instructions \texttt{loadsw} and \texttt{loaduw} have the same effect992when they are used to define a word temporary. A \texttt{loadw} instruction is993provided as syntactic sugar for \texttt{loadsw} to make explicit that the994extension mechanism used is irrelevant.995996\subsubsection{Blits}997998\begin{code}999blitInstr :: Parser Q.VolatileInstr1000blitInstr = do1001 v1 <- (ws1 $ string "blit") >> ws val <* (ws $ char ',')1002 v2 <- ws val <* (ws $ char ',')1003 nb <- decNumber1004 return $ Q.Blit v1 v2 nb1005\end{code}10061007The blit instruction copies in-memory data from its first address argument to1008its second address argument. The third argument is the number of bytes to copy.1009The source and destination spans are required to be either non-overlapping, or1010fully overlapping (source address identical to the destination address). The1011byte count argument must be a nonnegative numeric constant; it cannot be a1012temporary.10131014One blit instruction may generate a number of instructions proportional to its1015byte count argument, consequently, it is recommended to keep this argument1016relatively small. If large copies are necessary, it is preferable that1017frontends generate calls to a supporting \texttt{memcpy} function.10181019\subsubsection{Stack Allocation}10201021\begin{code}1022allocInstr :: Parser Q.Instr1023allocInstr = do1024 siz <- (ws $ string "alloc") >> (ws1 allocSize)1025 val <&> Q.Alloc siz1026\end{code}10271028These instructions allocate a chunk of memory on the stack. The number ending1029the instruction name is the alignment required for the allocated slot. QBE will1030make sure that the returned address is a multiple of that alignment value.10311032Stack allocation instructions are used, for example, when compiling the C local1033variables, because their address can be taken. When compiling Fortran,1034temporaries can be used directly instead, because it is illegal to take the1035address of a variable.10361037\subsection{Comparisons}1038\label{sec:comparisions}10391040\begin{code}1041compareInstr :: Parser Q.Instr1042compareInstr = do1043 _ <- char 'c'1044 (try intCompare) <|> floatCompare10451046compareArgs :: Parser (Q.Value, Q.Value)1047compareArgs = do1048 lhs <- ws val <* ws (char ',')1049 rhs <- ws val1050 pure (lhs, rhs)10511052intCompare :: Parser Q.Instr1053intCompare = do1054 op <- compareIntOp1055 ty <- ws1 intArg10561057 (lhs, rhs) <- compareArgs1058 pure $ Q.CompareInt ty op lhs rhs10591060floatCompare :: Parser Q.Instr1061floatCompare = do1062 op <- compareFloatOp1063 ty <- ws1 floatArg10641065 (lhs, rhs) <- compareArgs1066 pure $ Q.CompareFloat ty op lhs rhs1067\end{code}10681069Comparison instructions return an integer value (either a word or a long), and1070compare values of arbitrary types. The returned value is 1 if the two operands1071satisfy the comparison relation, or 0 otherwise. The names of comparisons1072respect a standard naming scheme in three parts:10731074\begin{enumerate}1075 \item All comparisons start with the letter \texttt{c}.1076 \item Then comes a comparison type.1077 \item Finally, the instruction name is terminated with a basic type suffix precising the type of the operands to be compared.1078\end{enumerate}10791080The following instruction are available for integer comparisons:10811082\begin{code}1083compareIntOp :: Parser Q.IntCmpOp1084compareIntOp = choice1085 [ bind "eq" Q.IEq1086 , bind "ne" Q.INe1087 , try $ bind "sle" Q.ISle1088 , try $ bind "slt" Q.ISlt1089 , try $ bind "sge" Q.ISge1090 , try $ bind "sgt" Q.ISgt1091 , try $ bind "ule" Q.IUle1092 , try $ bind "ult" Q.IUlt1093 , try $ bind "uge" Q.IUge1094 , try $ bind "ugt" Q.IUgt ]1095\end{code}10961097For floating point comparisons use one of these instructions:10981099\begin{code}1100compareFloatOp :: Parser Q.FloatCmpOp1101compareFloatOp = choice1102 [ bind "eq" Q.FEq1103 , bind "ne" Q.FNe1104 , try $ bind "le" Q.FLe1105 , bind "lt" Q.FLt1106 , try $ bind "ge" Q.FGe1107 , bind "gt" Q.FGt1108 , bind "o" Q.FOrd1109 , bind "uo" Q.FUnord ]1110\end{code}11111112For example, \texttt{cod} compares two double-precision floating point numbers1113and returns 1 if the two floating points are not NaNs, or 0 otherwise. The1114\texttt{csltw} instruction compares two words representing signed numbers and1115returns 1 when the first argument is smaller than the second one.11161117\subsection{Conversions}11181119Conversion operations change the representation of a value, possibly modifying1120it if the target type cannot hold the value of the source type. Conversions can1121extend the precision of a temporary (e.g., from signed 8-bit to 32-bit), or1122convert a floating point into an integer and vice versa.11231124\begin{code}1125extInstr :: Parser Q.Instr1126extInstr = do1127 _ <- string "ext"1128 ty <- ws1 extArg1129 ws val <&> Q.Ext ty1130 where1131 extArg :: Parser Q.ExtArg1132 extArg = try (Q.ExtSubWord <$> subWordType)1133 <|> try (bind "sw" Q.ExtSignedWord)1134 <|> bind "s" Q.ExtSingle1135 <|> bind "uw" Q.ExtUnsignedWord1136\end{code}11371138Extending the precision of a temporary is done using the \texttt{ext} family of1139instructions. Because QBE types do not specify the signedness (like in LLVM),1140extension instructions exist to sign-extend and zero-extend a value. For1141example, \texttt{extsb} takes a word argument and sign-extends the 81142least-significant bits to a full word or long, depending on the return type.11431144\begin{code}1145truncInstr :: Parser Q.Instr1146truncInstr = do1147 _ <- ws1 $ string "truncd"1148 ws val <&> Q.TruncDouble1149\end{code}11501151The instructions \texttt{exts} (extend single) and \texttt{truncd} (truncate1152double) are provided to change the precision of a floating point value. When1153the double argument of truncd cannot be represented as a single-precision1154floating point, it is truncated towards zero.11551156\begin{code}1157floatArg :: Parser Q.FloatArg1158floatArg = bind "d" Q.FDouble <|> bind "s" Q.FSingle11591160fromFloatInstr :: Parser Q.Instr1161fromFloatInstr = do1162 arg <- floatArg <* string "to"1163 isSigned <- signageChar1164 _ <- ws1 $ char 'i'1165 ws val <&> Q.FloatToInt arg isSigned11661167intArg :: Parser Q.IntArg1168intArg = bind "w" Q.IWord <|> bind "l" Q.ILong11691170toFloatInstr :: Parser Q.Instr1171toFloatInstr = do1172 isSigned <- signageChar1173 arg <- intArg1174 _ <- ws1 $ string "tof"1175 ws val <&> Q.IntToFloat arg isSigned1176\end{code}11771178Converting between signed integers and floating points is done using1179\texttt{stosi} (single to signed integer), \texttt{stoui} (single to unsigned1180integer), \texttt{dtosi} (double to signed integer), \texttt{dtoui} (double to1181unsigned integer), \texttt{swtof} (signed word to float), \texttt{uwtof}1182(unsigned word to float), \texttt{sltof} (signed long to float) and1183\texttt{ultof} (unsigned long to float).11841185\subsection{Cast and Copy}11861187The \texttt{cast} and \texttt{copy} instructions return the bits of their1188argument verbatim. However a cast will change an integer into a floating point1189of the same width and vice versa.11901191Casts can be used to make bitwise operations on the representation of floating1192point numbers. For example the following program will compute the opposite of1193the single-precision floating point number \texttt{\%f} into \texttt{\%rs}.11941195\begin{verbatim}1196%b0 =w cast %f1197%b1 =w xor 2147483648, %b0 # flip the msb1198%rs =s cast %b11199\end{verbatim}12001201\subsection{Call}1202\label{sec:call}12031204\begin{code}1205-- TODO: Code duplication with 'param'.1206callArg :: Parser Q.FuncArg1207callArg = (Q.ArgEnv <$> (ws1 (string "env") >> val))1208 <|> (string "..." >> pure Q.ArgVar)1209 <|> do1210 ty <- ws1 abity1211 Q.ArgReg ty <$> val12121213callArgs :: Parser [Q.FuncArg]1214callArgs = parenLst callArg12151216callInstr :: Parser Q.Statement1217callInstr = do1218 retValue <- optionMaybe $ do1219 i <- ws local <* ws (char '=')1220 a <- ws1 abity1221 return (i, a)1222 toCall <- ws1 (string "call") >> ws val1223 fnArgs <- callArgs1224 return $ Q.Call retValue toCall fnArgs1225\end{code}12261227The call instruction is special in several ways. It is not a three-address1228instruction and requires the type of all its arguments to be given. Also, the1229return type can be either a base type or an aggregate type. These specifics are1230required to compile calls with C compatibility (i.e., to respect the ABI).12311232When an aggregate type is used as argument type or return type, the value1233respectively passed or returned needs to be a pointer to a memory location1234holding the value. This is because aggregate types are not first-class1235citizens of the IL.12361237Sub-word types are used for arguments and return values of width less than a1238word. Details on these types are presented in the \nameref{sec:functions} section.1239Arguments with sub-word types need not be sign or zero extended according to1240their type. Calls with a sub-word return type define a temporary of base type1241\texttt{w} with its most significant bits unspecified.12421243Unless the called function does not return a value, a return temporary must be1244specified, even if it is never used afterwards.12451246An environment parameter can be passed as first argument using the \texttt{env}1247keyword. The passed value must be a 64-bit integer. If the called function does1248not expect an environment parameter, it will be safely discarded. See the1249\nameref{sec:functions} section for more information about environment1250parameters.12511252When the called function is variadic, there must be a \texttt{...} marker1253separating the named and variadic arguments.12541255\subsection{Variadic}1256\label{sec:variadic}12571258\begin{code}1259vastartInstr :: Parser Q.VolatileInstr1260vastartInstr = do1261 _ <- ws1 (string "vastart")1262 Q.VAStart <$> ws val1263\end{code}12641265The \texttt{vastart} and \texttt{vaarg} instructions provide a portable way to1266access the extra parameters of a variadic function.12671268\begin{enumerate}1269 \item \texttt{vastart} -- \texttt{(m)}1270 \item \texttt{vaarg} -- \texttt{T(mmmm)}1271\end{enumerate}12721273The \texttt{vastart} instruction initializes a variable argument list used to1274access the extra parameters of the enclosing variadic function. It is safe to1275call it multiple times.12761277The \texttt{vaarg} instruction fetches the next argument from a variable1278argument list. It is currently limited to fetching arguments that have a base1279type. This instruction is essentially effectful: calling it twice in a row will1280return two consecutive arguments from the argument list.12811282Both instructions take a pointer to a variable argument list as the sole argument.1283The size and alignment of the variable argument lists depends on the target used.12841285\subsection{Phi}12861287\begin{code}1288phiBranch :: Parser (Q.BlockIdent, Q.Value)1289phiBranch = do1290 n <- ws1 label1291 v <- val1292 pure (n, v)12931294phiInstr :: Parser Q.Phi1295phiInstr = do1296 -- TODO: code duplication with 'assign'1297 n <- ws local1298 t <- ws (char '=') >> ws1 baseType12991300 _ <- ws1 (string "phi")1301 -- TODO: combinator for sepBy1302 p <- Map.fromList <$> sepBy1 (ws phiBranch) (ws $ char ',')1303 return $ Q.Phi n t p1304\end{code}13051306First and foremost, phi instructions are NOT necessary when writing a frontend1307to QBE. One solution to avoid having to deal with SSA form is to use stack1308allocated variables for all source program variables and perform assignments1309and lookups using \nameref{sec:memory} operations. This is what LLVM users1310typically do.13111312Another solution is to simply emit code that is not in SSA form! Contrary to1313LLVM, QBE is able to fixup programs not in SSA form without requiring the1314boilerplate of loading and storing in memory. For example, the following1315program will be correctly compiled by QBE.13161317\begin{verbatim}1318@start1319 %x =w copy 1001320 %s =w copy 01321@loop1322 %s =w add %s, %x1323 %x =w sub %x, 11324 jnz %x, @loop, @end1325@end1326 ret %s1327\end{verbatim}13281329Now, if you want to know what phi instructions are and how to use them in QBE,1330you can read the following.13311332Phi instructions are specific to SSA form. In SSA form values can only be1333assigned once, without phi instructions, this requirement is too strong to1334represent many programs. For example consider the following C program.13351336\begin{verbatim}1337int f(int x) {1338 int y;1339 if (x)1340 y = 1;1341 else1342 y = 2;1343 return y;1344}1345\end{verbatim}13461347The variable \texttt{y} is assigned twice, the solution to translate it in SSA1348form is to insert a phi instruction.13491350\begin{verbatim}1351@ifstmt1352 jnz %x, @ift, @iff1353@ift1354 jmp @retstmt1355@iff1356 jmp @retstmt1357@retstmt1358 %y =w phi @ift 1, @iff 21359 ret %y1360\end{verbatim}13611362Phi instructions return one of their arguments depending on where the control1363came from. In the example, \texttt{\%y} is set to 1 if the1364\texttt{\textbackslash{}ift} branch is taken, or it is set to 2 otherwise.13651366An important remark about phi instructions is that QBE assumes that if a1367variable is defined by a phi it respects all the SSA invariants. So it is1368critical to not use phi instructions unless you know exactly what you are1369doing.13701371\subsection{Debug Information}13721373QBE supports the inclusion of debug information. Specifically, it allows1374defining from which source file type, data, and function definitions originated.1375For this purpose, it provides the \texttt{dbgfile} definition, which receives a1376file name (string literal) as its sole argument. Every type, data and function1377definition thereafter are assumed to originate in this file.13781379\begin{code}1380-- TODO: not documnted in the QBE BNF.1381fileDef :: Parser String1382fileDef = do1383 _ <- ws1 $ string "dbgfile"1384 wsNL1 strLit1385\end{code}13861387Further, instructions within a function can be associated with a specific line1388and column number of a previously defined \texttt{dbgfile}. The1389\texttt{dbgfile} is referenced by index using the first argument to1390\texttt{dbgloc}. The second argument represents the line number, the third1391(optional) argument the column number.13921393\begin{code}1394-- TODO: not documnted in the QBE BNF.1395dbglocInstr :: Parser Q.VolatileInstr1396dbglocInstr = do1397 _ <- ws1 $ string "dbgloc"1398 file <- ws decNumber <* ws (char ',')1399 line <- ws decNumber1400 col <- optionMaybe (ws (char ',') >> ws decNumber)1401 return $ Q.DBGLoc file line col1402\end{code}14031404\end{document}