A software analysis framework built around the QBE intermediate language
git clone https://git.8pit.net/quebex.git
1% SPDX-FileCopyrightText: 2015-2024 Quentin Carbonneaux <quentin@c9x.me> 2% SPDX-FileCopyrightText: 2025 Sören Tempel <soeren+git@soeren-tempel.net> 3% 4% SPDX-License-Identifier: MIT AND GPL-3.0-only 5 6\documentclass{article} 7%include polycode.fmt 8 9%subst blankline = "\\[5mm]" 10 11% See https://github.com/kosmikus/lhs2tex/issues/58 12%format <$> = "\mathbin{\langle\$\rangle}" 13%format <&> = "\mathbin{\langle\&\rangle}" 14%format <|> = "\mathbin{\langle\:\vline\:\rangle}" 15%format <?> = "\mathbin{\langle?\rangle}" 16%format <*> = "\mathbin{\langle*\rangle}" 17%format <* = "\mathbin{\langle*}" 18%format *> = "\mathbin{*\rangle}" 19 20\long\def\ignore#1{} 21 22\usepackage{hyperref} 23\hypersetup{ 24 colorlinks = true, 25} 26 27\begin{document} 28 29\title{QBE Intermediate Language\vspace{-2em}} 30\date{} 31\maketitle 32\frenchspacing 33 34\ignore{ 35\begin{code} 36module Language.QBE.Parser (dataDef, typeDef, funcDef) where 37 38import Data.Functor ((<&>)) 39import Data.List (singleton) 40import qualified Language.QBE.Types as Q 41import Language.QBE.Util (bind, decNumber, float) 42import Text.ParserCombinators.Parsec 43 ( Parser, 44 alphaNum, 45 anyChar, 46 between, 47 char, 48 choice, 49 letter, 50 many, 51 many1, 52 manyTill, 53 newline, 54 noneOf, 55 oneOf, 56 optionMaybe, 57 sepBy, 58 sepBy1, 59 skipMany, 60 skipMany1, 61 string, 62 try, 63 (<?>), 64 (<|>), 65 ) 66\end{code} 67} 68 69This an executable description of the 70\href{https://c9x.me/compile/doc/il-v1.2.html}{QBE intermediate language}, 71specified through \href{https://hackage.haskell.org/package/parsec}{Parsec} 72parser combinators and generated from a literate Haskell file. The description 73is derived from the original QBE IL documentation, licensed under MIT. 74Presently, this implementation targets version 1.2 of the QBE intermediate 75language and aims to be equivalent with the original specification. 76 77\section{Basic Concepts} 78 79The intermediate language (IL) is a higher-level language than the 80machine's assembly language. It smoothes most of the 81irregularities of the underlying hardware and allows an infinite number 82of temporaries to be used. This higher abstraction level lets frontend 83programmers focus on language design issues. 84 85\subsection{Input Files} 86 87The intermediate language is provided to QBE as text. Usually, one file 88is generated per each compilation unit from the frontend input language. 89An IL file is a sequence of \nameref{sec:definitions} for 90data, functions, and types. Once processed by QBE, the resulting file 91can be assembled and linked using a standard toolchain (e.g., GNU 92binutils). 93 94\begin{code} 95comment :: Parser () 96comment = skipMany blankNL >> comment' >> skipMany blankNL 97 where 98 comment' = char '#' >> manyTill anyChar newline 99\end{code}100101\ignore{102\begin{code}103skipNoCode :: Parser () -> Parser ()104skipNoCode blankP = try (skipMany1 comment <?> "comments") <|> blankP105\end{code}106}107108Here is a complete "Hello World" IL file which defines a function that109prints to the screen. Since the string is not a first class object (only110the pointer is) it is defined outside the function\textquotesingle s111body. Comments start with a \# character and finish with the end of the112line.113114\begin{verbatim}115data $str = { b "hello world", b 0 }116117export function w $main() {118@start119 # Call the puts function with $str as argument.120 %r =w call $puts(l $str)121 ret 0122}123\end{verbatim}124125If you have read the LLVM language reference, you might recognize the126example above. In comparison, QBE makes a much lighter use of types and127the syntax is terser.128129\subsection{Parser Combinators}130131\ignore{132\begin{code}133bracesNL :: Parser a -> Parser a134bracesNL = between (wsNL $ char '{') (wsNL $ char '}')135136quoted :: Parser a -> Parser a137quoted = let q = char '"' in between q q138139binaryInstr :: (Q.Value -> Q.Value -> Q.Instr) -> String -> Parser Q.Instr140binaryInstr conc keyword = do141 _ <- ws (string keyword)142 vfst <- ws val <* ws (char ',')143 conc vfst <$> ws val144145-- Can only appear in data and type definitions and hence allows newlines.146alignSpec :: Parser Q.AllocSize147alignSpec = (ws1 (string "align")) >> wsNL allocSize148\end{code}149}150151The original QBE specification defines the syntax using a BNF grammar. In152contrast, this document defines it using Parsec parser combinators. As such,153this specification is less formal but more accurate as the parsing code is154actually executable. Consequently, this specification also captures constructs155omitted in the original specification (e.g., \nameref{sec:identifiers},156\nameref{sec:statements}, or \nameref{sec:strlit}). Nonetheless, the formal157language recognized by these combinators aims to be equivalent to the one of158the BNF grammar.159160\subsection{Identifiers}161\label{sec:identifiers}162163% Ident is not documented in the original QBE specification.164% See https://c9x.me/git/qbe.git/tree/parse.c?h=v1.2#n304165166\begin{code}167ident :: Parser String168ident = do169 start <- letter <|> oneOf "._"170 rest <- many (alphaNum <|> oneOf "$._")171 return $ start : rest172\end{code}173174Identifiers for data, types, and functions can start with any ASCII letter or175the special characters \texttt{.} and \texttt{\_}. This initial character can176be followed by a sequence of zero or more alphanumeric characters and the177special characters \texttt{\$}, \texttt{.}, and \texttt{\_}.178179\subsection{Sigils}180181\begin{code}182userDef :: Parser Q.UserIdent183userDef = Q.UserIdent <$> (char ':' >> ident)184185global :: Parser Q.GlobalIdent186global = Q.GlobalIdent <$> (char '$' >> ident)187188local :: Parser Q.LocalIdent189local = Q.LocalIdent <$> (char '%' >> ident)190191label :: Parser Q.BlockIdent192label = Q.BlockIdent <$> (char '@' >> ident)193\end{code}194195The intermediate language makes heavy use of sigils, all user-defined196names are prefixed with a sigil. This is to avoid keyword conflicts, and197also to quickly spot the scope and nature of identifiers.198199\begin{itemize}200 \item \texttt{:} is for user-defined \nameref{sec:aggregate-types}201 \item \texttt{\$} is for globals (represented by a pointer)202 \item \texttt{\%} is for function-scope temporaries203 \item \texttt{@@} is for block labels204\end{itemize}205206\subsection{Spacing}207208\begin{code}209blank :: Parser Char210blank = oneOf "\t " <?> "blank"211212blankNL :: Parser Char213blankNL = oneOf "\n\t " <?> "blank or newline"214\end{code}215216Individual tokens in IL files must be separated by one or more spacing217characters. Both spaces and tabs are recognized as spacing characters.218In data and type definitions, newlines may also be used as spaces to219prevent overly long lines. When exactly one of two consecutive tokens is220a symbol (for example \texttt{,} or \texttt{=} or \texttt{\{}), spacing may be omitted.221222\ignore{223\begin{code}224ws :: Parser a -> Parser a225ws p = p <* skipMany blank226227ws1 :: Parser a -> Parser a228ws1 p = p <* skipMany1 blank229230wsNL :: Parser a -> Parser a231wsNL p = p <* skipNoCode (skipMany blankNL)232233wsNL1 :: Parser a -> Parser a234wsNL1 p = p <* skipNoCode (skipMany1 blankNL)235\end{code}236}237238\subsection{String Literals}239\label{sec:strlit}240241% The string literal is not documented in the original QBE specification.242% See https://c9x.me/git/qbe.git/tree/parse.c?h=v1.2#n287243244\begin{code}245strLit :: Parser String246strLit = concat <$> quoted (many strChr)247 where248 strChr :: Parser [Char]249 strChr = (singleton <$> noneOf "\"\\") <|> escSeq250251 escSeq :: Parser [Char]252 escSeq = try $ do253 esc <- char '\\'254 chr <- anyChar255 return $ [esc, chr]256\end{code}257258Strings are enclosed by double quotes and are, for example, used to specify a259section name as part of the \nameref{sec:linkage} information. Within a string,260a double quote can be escaped using a \texttt{\textbackslash} character. All261escape sequences, including double quote escaping, are passed through as-is to262the generated assembly file.263264\section{Types}265266\subsection{Simple Types}267268The IL makes minimal use of types. By design, the types used are269restricted to what is necessary for unambiguous compilation to machine270code and C interfacing. Unlike LLVM, QBE is not using types as a means271to safety; they are only here for semantic purposes.272273\begin{code}274baseType :: Parser Q.BaseType275baseType = choice276 [ bind "w" Q.Word277 , bind "l" Q.Long278 , bind "s" Q.Single279 , bind "d" Q.Double ]280\end{code}281282The four base types are \texttt{w} (word), \texttt{l} (long), \texttt{s} (single), and \texttt{d}283(double), they stand respectively for 32-bit and 64-bit integers, and28432-bit and 64-bit floating-point numbers. There are no pointer types285available; pointers are typed by an integer type sufficiently wide to286represent all memory addresses (e.g., \texttt{l} on 64-bit architectures).287Temporaries in the IL can only have a base type.288289\begin{code}290extType :: Parser Q.ExtType291extType = (Q.Base <$> baseType)292 <|> bind "b" Q.Byte293 <|> bind "h" Q.HalfWord294\end{code}295296Extended types contain base types plus \texttt{b} (byte) and \texttt{h} (half word),297respectively for 8-bit and 16-bit integers. They are used in \nameref{sec:aggregate-types}298and \nameref{sec:data} definitions.299300For C interfacing, the IL also provides user-defined aggregate types as301well as signed and unsigned variants of the sub-word extended types.302Read more about these types in the \nameref{sec:aggregate-types}303and \nameref{sec:functions} sections.304305\subsection{Subtyping}306307The IL has a minimal subtyping feature, for integer types only. Any308value of type \texttt{l} can be used in a \texttt{w} context. In that case, only the30932 least significant bits of the word value are used.310311Make note that it is the opposite of the usual subtyping on integers (in312C, we can safely use an \texttt{int} where a \texttt{long} is expected). A long value313cannot be used in word context. The rationale is that a word can be314signed or unsigned, so extending it to a long could be done in two ways,315either by zero-extension, or by sign-extension.316317\subsection{Constants and Vals}318\label{sec:constants-and-vals}319320\begin{code}321dynConst :: Parser Q.DynConst322dynConst =323 (Q.Const <$> constant)324 <|> (Q.Thread <$> global)325 <?> "dynconst"326\end{code}327328Constants come in two kinds: compile-time constants and dynamic329constants. Dynamic constants include compile-time constants and other330symbol variants that are only known at program-load time or execution331time. Consequently, dynamic constants can only occur in function bodies.332333The representation of integers is two's complement.334Floating-point numbers are represented using the single-precision and335double-precision formats of the IEEE 754 standard.336337\begin{code}338constant :: Parser Q.Const339constant =340 (Q.Number <$> decNumber)341 <|> (Q.SFP <$> sfp)342 <|> (Q.DFP <$> dfp)343 <|> (Q.Global <$> global)344 <?> "const"345 where346 sfp = string "s_" >> float347 dfp = string "d_" >> float348\end{code}349350Constants specify a sequence of bits and are untyped. They are always351parsed as 64-bit blobs. Depending on the context surrounding a constant,352only some of its bits are used. For example, in the program below, the353two variables defined have the same value since the first operand of the354subtraction is a word (32-bit) context.355356\begin{verbatim}357%x =w sub -1, 0 %y =w sub 4294967295, 0358\end{verbatim}359360Because specifying floating-point constants by their bits makes the code361less readable, syntactic sugar is provided to express them. Standard362scientific notation is prefixed with \texttt{s\_} and \texttt{d\_} for single and363double precision numbers respectively. Once again, the following example364defines twice the same double-precision constant.365366\begin{verbatim}367%x =d add d_0, d_-1368%y =d add d_0, -4616189618054758400369\end{verbatim}370371Global symbols can also be used directly as constants; they will be372resolved and turned into actual numeric constants by the linker.373374When the \texttt{thread} keyword prefixes a symbol name, the375symbol\textquotesingle s numeric value is resolved at runtime in the376thread-local storage.377378\begin{code}379val :: Parser Q.Value380val =381 (Q.VConst <$> dynConst)382 <|> (Q.VLocal <$> local)383 <?> "val"384\end{code}385386Vals are used as arguments in regular, phi, and jump instructions within387function definitions. They are either constants or function-scope388temporaries.389390\subsection{Linkage}391\label{sec:linkage}392393\begin{code}394linkage :: Parser Q.Linkage395linkage =396 wsNL (bind "export" Q.LExport)397 <|> wsNL (bind "thread" Q.LThread)398 <|> do399 _ <- ws1 $ string "section"400 (try secWithFlags) <|> sec401 where402 sec :: Parser Q.Linkage403 sec = wsNL strLit <&> (`Q.LSection` Nothing)404405 secWithFlags :: Parser Q.Linkage406 secWithFlags = do407 n <- ws1 strLit408 wsNL strLit <&> Q.LSection n . Just409\end{code}410411Function and data definitions (see below) can specify linkage412information to be passed to the assembler and eventually to the linker.413414The \texttt{export} linkage flag marks the defined item as visible outside the415current file\textquotesingle s scope. If absent, the symbol can only be416referred to locally. Functions compiled by QBE and called from C need to417be exported.418419The \texttt{thread} linkage flag can only qualify data definitions. It mandates420that the object defined is stored in thread-local storage. Each time a421runtime thread starts, the supporting platform runtime is in charge of422making a new copy of the object for the fresh thread. Objects in423thread-local storage must be accessed using the \texttt{thread \$IDENT} syntax,424as specified in the \nameref{sec:constants-and-vals} section.425426A \texttt{section} flag can be specified to tell the linker to put the defined427item in a certain section. The use of the section flag is platform428dependent and we refer the user to the documentation of their assembler429and linker for relevant information.430431\begin{verbatim}432section ".init_array" data $.init.f = { l $f }433\end{verbatim}434435The section flag can be used to add function pointers to a global436initialization list, as depicted above. Note that some platforms provide437a BSS section that can be used to minimize the footprint of uniformly438zeroed data. When this section is available, QBE will automatically make439use of it and no section flag is required.440441The section and export linkage flags should each appear at most once in442a definition. If multiple occurrences are present, QBE is free to use443any.444445\subsection{Definitions}446\label{sec:definitions}447448Definitions are the essential components of an IL file. They can define449three types of objects: aggregate types, data, and functions. Aggregate450types are never exported and do not compile to any code. Data and451function definitions have file scope and are mutually recursive (even452across IL files). Their visibility can be controlled using linkage453flags.454455\subsubsection{Aggregate Types}456\label{sec:aggregate-types}457458\begin{code}459typeDef :: Parser Q.TypeDef460typeDef = do461 _ <- wsNL1 (string "type")462 i <- wsNL1 userDef463 _ <- wsNL1 (char '=')464 a <- optionMaybe alignSpec465 bracesNL (opaqueType <|> unionType <|> regularType) <&> Q.TypeDef i a466\end{code}467468Aggregate type definitions start with the \texttt{type} keyword. They have file469scope, but types must be defined before being referenced. The inner470structure of a type is expressed by a comma-separated list of fields.471472\begin{code}473subType :: Parser Q.SubType474subType =475 (Q.SExtType <$> extType)476 <|> (Q.SUserDef <$> userDef)477478field :: Parser Q.Field479field = do480 -- TODO: newline is required if there is a number argument481 f <- wsNL subType482 s <- ws $ optionMaybe decNumber483 pure (f, s)484485-- TODO: "For ease of IL generation, a trailing comma is tolerated by the parser".486fields :: Bool -> Parser [Q.Field]487fields allowEmpty =488 (if allowEmpty then sepBy else sepBy1) field (wsNL $ char ',')489\end{code}490491A field consists of a subtype, either an extended type or a user-defined type,492and an optional number expressing the value of this field. In case many items493of the same type are sequenced (like in a C array), the shorter array syntax494can be used.495496\begin{code}497regularType :: Parser Q.AggType498regularType = Q.ARegular <$> fields True499\end{code}500501Three different kinds of aggregate types are presentl ysupported: regular502types, union types and opaque types. The fields of regular types will be503packed. By default, the alignment of an aggregate type is the maximum alignment504of its members. The alignment can be explicitly specified by the programmer.505506\begin{code}507unionType :: Parser Q.AggType508unionType = Q.AUnion <$> many1 (wsNL unionType')509 where510 unionType' :: Parser [Q.Field]511 unionType' = bracesNL $ fields False512\end{code}513514Union types allow the same chunk of memory to be used with different layouts. They are defined by enclosing multiple regular aggregate type bodies in a pair of curly braces. Size and alignment of union types are set to the maximum size and alignment of each variation or, in the case of alignment, can be explicitly specified.515516\begin{code}517opaqueType :: Parser Q.AggType518opaqueType = Q.AOpaque <$> wsNL decNumber519\end{code}520521Opaque types are used when the inner structure of an aggregate cannot be specified; the alignment for opaque types is mandatory. They are defined simply by enclosing their size between curly braces.522523\subsubsection{Data}524\label{sec:data}525526\begin{code}527dataDef :: Parser Q.DataDef528dataDef = do529 link <- many linkage530 name <- wsNL1 (string "data") >> wsNL global531 _ <- wsNL (char '=')532 alignment <- optionMaybe alignSpec533 bracesNL dataObjs <&> Q.DataDef link name alignment534 where535 dataObjs = sepBy dataObj (wsNL $ char ',')536\end{code}537538Data definitions express objects that will be emitted in the compiled539file. Their visibility and location in the compiled artifact are540controlled with linkage flags described in the \nameref{sec:linkage}541section.542543They define a global identifier (starting with the sigil \texttt{\$}), that544will contain a pointer to the object specified by the definition.545546\begin{code}547dataObj :: Parser Q.DataObj548dataObj =549 (Q.OZeroFill <$> (wsNL1 (char 'z') >> wsNL decNumber))550 <|> do551 t <- wsNL1 extType552 i <- many1 (wsNL dataItem)553 return $ Q.OItem t i554\end{code}555556Objects are described by a sequence of fields that start with a type557letter. This letter can either be an extended type, or the \texttt{z} letter.558If the letter used is an extended type, the data item following559specifies the bits to be stored in the field.560561\begin{code}562dataItem :: Parser Q.DataItem563dataItem =564 (Q.DString <$> strLit)565 <|> (Q.DConst <$> constant)566 <|> do567 i <- global568 off <- optionMaybe (char '+' >> decNumber)569 return $ Q.DSymbol i off570\end{code}571572Within each object, several items can be defined. When several data items573follow a letter, they initialize multiple fields of the same size.574575\begin{code}576allocSize :: Parser Q.AllocSize577allocSize =578 choice579 [ bind "4" Q.AlignWord,580 bind "8" Q.AlignLong,581 bind "16" Q.AlignLongLong582 ]583\end{code}584585The members of a struct will be packed. This means that padding has to586be emitted by the frontend when necessary. Alignment of the whole data587objects can be manually specified, and when no alignment is provided,588the maximum alignment from the platform is used.589590When the \texttt{z} letter is used the number following indicates the size of591the field; the contents of the field are zero initialized. It can be592used to add padding between fields or zero-initialize big arrays.593594\subsubsection{Functions}595\label{sec:functions}596597\begin{code}598funcDef :: Parser Q.FuncDef599funcDef = do600 link <- many linkage601 _ <- ws1 (string "function")602 retTy <- optionMaybe (ws1 abity)603 name <- ws global604 args <- wsNL params605 body <- between (wsNL1 $ char '{') (wsNL $ char '}') $ many1 block606 return $ Q.FuncDef link name retTy args body607\end{code}608609Function definitions contain the actual code to emit in the compiled610file. They define a global symbol that contains a pointer to the611function code. This pointer can be used in \texttt{call} instructions or stored612in memory.613614\begin{code}615subWordType :: Parser Q.SubWordType616subWordType = choice617 [ try $ bind "sb" Q.SignedByte618 , try $ bind "ub" Q.UnsignedByte619 , bind "sh" Q.SignedHalf620 , bind "uh" Q.UnsignedHalf ]621622abity :: Parser Q.Abity623abity = try (Q.ASubWordType <$> subWordType)624 <|> (Q.ABase <$> baseType)625 <|> (Q.AUserDef <$> userDef)626\end{code}627628The type given right before the function name is the return type of the629function. All return values of this function must have this return type.630If the return type is missing, the function must not return any value.631632\begin{code}633param :: Parser Q.FuncParam634param = (Q.Env <$> (ws1 (string "env") >> local))635 <|> (string "..." >> pure Q.Variadic)636 <|> do637 ty <- ws1 abity638 Q.Regular ty <$> local639640params :: Parser [Q.FuncParam]641params = between (ws $ char '(') (char ')') params'642 where643 params' = sepBy (ws param) (ws $ char ',')644\end{code}645646The parameter list is a comma separated list of temporary names prefixed647by types. The types are used to correctly implement C compatibility.648When an argument has an aggregate type, a pointer to the aggregate is649passed by thea caller. In the example below, we have to use a load650instruction to get the value of the first (and only) member of the651struct.652653\begin{verbatim}654type :one = { w }655656function w $getone(:one %p) {657@start658 %val =w loadw %p659 ret %val660}661\end{verbatim}662663If a function accepts or returns values that are smaller than a word,664such as \texttt{signed char} or \texttt{unsigned short} in C, one of the sub-word type665must be used. The sub-word types \texttt{sb}, \texttt{ub}, \texttt{sh}, and \texttt{uh} stand,666respectively, for signed and unsigned 8-bit values, and signed and667unsigned 16-bit values. Parameters associated with a sub-word type of668bit width N only have their N least significant bits set and have base669type \texttt{w}. For example, the function670671\begin{verbatim}672function w $addbyte(w %a, sb %b) {673@start674 %bw =w extsb %b675 %val =w add %a, %bw676 ret %val677}678\end{verbatim}679680needs to sign-extend its second argument before the addition. Dually,681return values with sub-word types do not need to be sign or zero682extended.683684If the parameter list ends with \texttt{...}, the function is a variadic685function: it can accept a variable number of arguments. To access the686extra arguments provided by the caller, use the \texttt{vastart} and \texttt{vaarg}687instructions described in the \nameref{sec:variadic} section.688689Optionally, the parameter list can start with an environment parameter690\texttt{env \%e}. This special parameter is a 64-bit integer temporary (i.e.,691of type \texttt{l}). If the function does not use its environment parameter,692callers can safely omit it. This parameter is invisible to a C caller:693for example, the function694695\begin{verbatim}696export function w $add(env %e, w %a, w %b) {697@start698 %c =w add %a, %b699 ret %c700}701\end{verbatim}702703must be given the C prototype \texttt{int add(int, int)}. The intended use of704this feature is to pass the environment pointer of closures while705retaining a very good compatibility with C. The \nameref{sec:call}706section explains how to pass an environment parameter.707708Since global symbols are defined mutually recursive, there is no need709for function declarations: a function can be referenced before its710definition. Similarly, functions from other modules can be used without711previous declaration. All the type information necessary to compile a712call is in the instruction itself.713714The syntax and semantics for the body of functions are described in the715\nameref{sec:control} section.716717\section{Control}718\label{sec:control}719720The IL represents programs as textual transcriptions of control flow721graphs. The control flow is serialized as a sequence of blocks of722straight-line code which are connected using jump instructions.723724\subsection{Blocks}725\label{sec:blocks}726727\begin{code}728block :: Parser Q.Block729block = do730 l <- wsNL1 label731 s <- many (wsNL1 statement)732 Q.Block l s <$> (wsNL1 jumpInstr)733\end{code}734735All blocks have a name that is specified by a label at their beginning.736Then follows a sequence of instructions that have "fall-through" flow.737Finally one jump terminates the block. The jump can either transfer738control to another block of the same function or return; jumps are739described further below.740741The first block in a function must not be the target of any jump in the742program. If a jump to the function start is needed, the frontend must743insert an empty prelude block at the beginning of the function.744745When one block jumps to the next block in the IL file, it is not746necessary to write the jump instruction, it will be automatically added747by the parser. For example the start block in the example below jumps748directly to the loop block.749750\subsection{Jumps}751752\begin{code}753jumpInstr :: Parser Q.JumpInstr754jumpInstr = (string "hlt" >> pure Q.Halt)755 -- TODO: Return requires a space if there is an optionMaybe756 <|> Q.Return <$> ((ws $ string "ret") >> optionMaybe val)757 <|> try (Q.Jump <$> ((ws1 $ string "jmp") >> label))758 <|> do759 _ <- ws1 $ string "jnz"760 v <- ws val <* ws (char ',')761 l1 <- ws label <* ws (char ',')762 l2 <- ws label763 return $ Q.Jnz v l1 l2764\end{code}765766A jump instruction ends every block and transfers the control to another767program location. The target of a jump must never be the first block in768a function. The three kinds of jumps available are described in the769following list.770771\begin{enumerate}772 \item \textbf{Unconditional jump.} Jumps to another block of the same function.773 \item \textbf{Conditional jump.} When its word argument is non-zero, it jumps to its first label argument; otherwise it jumps to the other label. The argument must be of word type; because of subtyping a long argument can be passed, but only its least significant 32 bits will be compared to 0.774 \item \textbf{Function return.} Terminates the execution of the current function, optionally returning a value to the caller. The value returned must be of the type given in the function prototype. If the function prototype does not specify a return type, no return value can be used.775 \item \textbf{Program termination.} Terminates the execution of the program with a target-dependent error. This instruction can be used when it is expected that the execution never reaches the end of the block it closes; for example, after having called a function such as \texttt{exit()}.776\end{enumerate}777778\section{Instructions}779\label{sec:instructions}780781\begin{code}782instr :: Parser Q.Instr783instr =784 choice785 [ try $ binaryInstr Q.Add "add",786 try $ binaryInstr Q.Sub "sub",787 try $ loadInstr,788 try $ allocInstr789 ]790\end{code}791792Instructions are the smallest piece of code in the IL, they form the body of793\nameref{sec:blocks}. This specification distinguishes instructions and794volatile instructions, the latter do not return a value. For the former, the IL795uses a three-address code, which means that one instruction computes an796operation between two operands and assigns the result to a third one.797798\begin{code}799assign :: Parser Q.Statement800assign = do801 n <- ws local802 t <- ws (char '=') >> ws1 baseType803 Q.Assign n t <$> instr804805volatileInstr :: Parser Q.Statement806volatileInstr = Q.Volatile <$> (storeInstr <|> blitInstr)807808-- TODO: Not documented in the QBE BNF.809statement :: Parser Q.Statement810statement = (try callInstr) <|> assign <|> volatileInstr811\end{code}812813An instruction has both a name and a return type, this return type is a base814type that defines the size of the instruction's result. The type of the815arguments can be unambiguously inferred using the instruction name and the816return type. For example, for all arithmetic instructions, the type of the817arguments is the same as the return type. The two additions below are valid if818\texttt{\%y} is a word or a long (because of \nameref{sec:subtyping}).819820\begin{verbatim}821%x =w add 0, %y822%z =w add %x, %x823\end{verbatim}824825Some instructions, like comparisons and memory loads have operand types826that differ from their return types. For instance, two floating points827can be compared to give a word result (0 if the comparison succeeds, 1828if it fails).829830\begin{verbatim}831%c =w cgts %a, %b832\end{verbatim}833834In the example above, both operands have to have single type. This is835made explicit by the instruction suffix.836837\subsection{Arithmetic and Bits}838839To-Do.840841\subsection{Memory}842843\subsubsection{Store instructions}844845\begin{code}846storeInstr :: Parser Q.VolatileInstr847storeInstr = do848 t <- string "store" >> ws1 extType849 v <- ws val850 _ <- ws $ char ','851 ws val <&> Q.Store t v852\end{code}853854Store instructions exist to store a value of any base type and any extended855type. Since halfwords and bytes are not first class in the IL, \texttt{storeh}856and \texttt{storeb} take a word as argument. Only the first 16 or 8 bits of857this word will be stored in memory at the address specified in the second858argument.859860\subsubsection{Load instructions}861862\begin{code}863loadInstr :: Parser Q.Instr864loadInstr = do865 _ <- string "load"866 t <- ws1 $ choice867 [ try $ bind "sw" (Q.LBase Q.Word),868 try $ bind "uw" (Q.LBase Q.Word),869 try $ Q.LSubWord <$> subWordType,870 Q.LBase <$> baseType871 ]872 ws val <&> Q.Load t873\end{code}874875For types smaller than long, two variants of the load instruction are876available: one will sign extend the loaded value, while the other will zero877extend it. Note that all loads smaller than long can load to either a long or a878word.879880The two instructions \texttt{loadsw} and \texttt{loaduw} have the same effect881when they are used to define a word temporary. A \texttt{loadw} instruction is882provided as syntactic sugar for \texttt{loadsw} to make explicit that the883extension mechanism used is irrelevant.884885\subsubsection{Blits}886887\begin{code}888blitInstr :: Parser Q.VolatileInstr889blitInstr = do890 v1 <- (ws1 $ string "blit") >> ws val <* (ws $ char ',')891 v2 <- ws val <* (ws $ char ',')892 nb <- decNumber893 return $ Q.Blit v1 v2 nb894\end{code}895896The blit instruction copies in-memory data from its first address argument to897its second address argument. The third argument is the number of bytes to copy.898The source and destination spans are required to be either non-overlapping, or899fully overlapping (source address identical to the destination address). The900byte count argument must be a nonnegative numeric constant; it cannot be a901temporary.902903One blit instruction may generate a number of instructions proportional to its904byte count argument, consequently, it is recommended to keep this argument905relatively small. If large copies are necessary, it is preferable that906frontends generate calls to a supporting \texttt{memcpy} function.907908\subsubsection{Stack Allocation}909910\begin{code}911allocInstr :: Parser Q.Instr912allocInstr = do913 siz <- (ws $ string "alloc") >> (ws1 allocSize)914 decNumber <&> Q.Alloc siz915\end{code}916917These instructions allocate a chunk of memory on the stack. The number ending918the instruction name is the alignment required for the allocated slot. QBE will919make sure that the returned address is a multiple of that alignment value.920921Stack allocation instructions are used, for example, when compiling the C local922variables, because their address can be taken. When compiling Fortran,923temporaries can be used directly instead, because it is illegal to take the924address of a variable.925926\subsection{Comparisons}927928To-Do.929930\subsection{Conversions}931932To-Do.933934\subsection{Cast and Copy}935936To-Do.937938\subsection{Call}939\label{sec:call}940941\begin{code}942callInstr :: Parser Q.Statement943callInstr = do944 retValue <- optionMaybe $ do945 i <- ws local <* ws (char '=')946 a <- ws1 abity947 return (i, a)948 toCall <- ws1 (string "call") >> ws val949 fnArgs <- params950 return $ Q.Call retValue toCall fnArgs951\end{code}952953To-Do.954955\subsection{Variadic}956\label{sec:variadic}957958To-Do.959960\subsection{Phi}961962To-Do.963964\end{document}