1% SPDX-FileCopyrightText: 2015-2024 Quentin Carbonneaux <quentin@c9x.me>2% SPDX-FileCopyrightText: 2025 Sören Tempel <soeren+git@soeren-tempel.net>3%4% SPDX-License-Identifier: MIT AND GPL-3.0-only56\documentclass{article}7%include polycode.fmt89%subst blankline = "\\[5mm]"1011% See https://github.com/kosmikus/lhs2tex/issues/5812%format <$> = "\mathbin{\langle\$\rangle}"13%format <&> = "\mathbin{\langle\&\rangle}"14%format <|> = "\mathbin{\langle\:\vline\:\rangle}"15%format <?> = "\mathbin{\langle?\rangle}"16%format <*> = "\mathbin{\langle*\rangle}"17%format <* = "\mathbin{\langle*}"18%format *> = "\mathbin{*\rangle}"1920\long\def\ignore#1{}2122\usepackage{hyperref}23\hypersetup{24 colorlinks = true,25}2627\begin{document}2829\title{QBE Intermediate Language\vspace{-2em}}30\date{}31\maketitle32\frenchspacing3334\ignore{35\begin{code}36module Language.QBE.Parser37 ( skipInitComments,38 dataDef,39 typeDef,40 funcDef,41 )42where4344import Data.Char (chr)45import Data.Word (Word64)46import Data.Functor ((<&>))47import Data.List (singleton)48import Data.Map qualified as Map49import qualified Language.QBE.Types as Q50import Language.QBE.Util (bind, decNumber, octNumber, float)51import Text.ParserCombinators.Parsec52 ( Parser,53 alphaNum,54 anyChar,55 between,56 char,57 choice,58 letter,59 many,60 many1,61 manyTill,62 newline,63 noneOf,64 oneOf,65 optional,66 optionMaybe,67 sepBy,68 sepBy1,69 skipMany,70 skipMany1,71 string,72 try,73 (<?>),74 (<|>),75 )76\end{code}77}7879This an executable description of the80\href{https://c9x.me/compile/doc/il-v1.2.html}{QBE intermediate language},81specified through \href{https://hackage.haskell.org/package/parsec}{Parsec}82parser combinators and generated from a literate Haskell file. The description83is derived from the original QBE IL documentation, licensed under MIT.84Presently, this implementation targets version 1.2 of the QBE intermediate85language and aims to be equivalent with the original specification.8687\section{Basic Concepts}8889The intermediate language (IL) is a higher-level language than the90machine's assembly language. It smoothes most of the91irregularities of the underlying hardware and allows an infinite number92of temporaries to be used. This higher abstraction level lets frontend93programmers focus on language design issues.9495\subsection{Input Files}9697The intermediate language is provided to QBE as text. Usually, one file98is generated per each compilation unit from the frontend input language.99An IL file is a sequence of \nameref{sec:definitions} for100data, functions, and types. Once processed by QBE, the resulting file101can be assembled and linked using a standard toolchain (e.g., GNU102binutils).103104\begin{code}105comment :: Parser ()106comment = skipMany blankNL >> comment' >> skipMany blankNL107 where108 comment' = char '#' >> manyTill anyChar newline109\end{code}110111\ignore{112\begin{code}113skipNoCode :: Parser () -> Parser ()114skipNoCode blankP = try (skipMany1 comment <?> "comments") <|> blankP115\end{code}116}117118Here is a complete "Hello World" IL file which defines a function that119prints to the screen. Since the string is not a first class object (only120the pointer is) it is defined outside the function\textquotesingle s121body. Comments start with a \# character and finish with the end of the122line.123124\begin{verbatim}125data $str = { b "hello world", b 0 }126127export function w $main() {128@start129 # Call the puts function with $str as argument.130 %r =w call $puts(l $str)131 ret 0132}133\end{verbatim}134135If you have read the LLVM language reference, you might recognize the136example above. In comparison, QBE makes a much lighter use of types and137the syntax is terser.138139\subsection{Parser Combinators}140141\ignore{142\begin{code}143bracesNL :: Parser a -> Parser a144bracesNL = between (wsNL $ char '{') (wsNL $ char '}')145146quoted :: Parser a -> Parser a147quoted = let q = char '"' in between q q148149sepByTrail1 :: Parser a -> Parser sep -> Parser [a]150sepByTrail1 p sep = do151 x <- p152 xs <- many (try $ sep >> p)153 _ <- optional sep154 return (x:xs)155156sepByTrail :: Parser a -> Parser sep -> Parser [a]157sepByTrail p sep = sepByTrail1 p sep <|> return []158159parenLst :: Parser a -> Parser [a]160parenLst p = between (ws $ char '(') (char ')') inner161 where162 inner = sepBy (ws p) (ws $ char ',')163164unaryInstr :: (Q.Value -> Q.Instr) -> String -> Parser Q.Instr165unaryInstr conc keyword = do166 _ <- ws (string keyword)167 conc <$> ws val168169binaryInstr :: (Q.Value -> Q.Value -> Q.Instr) -> String -> Parser Q.Instr170binaryInstr conc keyword = do171 _ <- ws (string keyword)172 vfst <- ws val <* ws (char ',')173 conc vfst <$> ws val174175-- Can only appear in data and type definitions and hence allows newlines.176alignAny :: Parser Word64177alignAny = (ws1 (string "align")) >> wsNL decNumber178179-- Returns true if it is signed.180signageChar :: Parser Bool181signageChar = (char 's' <|> char 'u') <&> (== 's')182\end{code}183}184185The original QBE specification defines the syntax using a BNF grammar. In186contrast, this document defines it using Parsec parser combinators. As such,187this specification is less formal but more accurate as the parsing code is188actually executable. Consequently, this specification also captures constructs189omitted in the original specification (e.g., \nameref{sec:identifiers}, or190\nameref{sec:strlit}). Nonetheless, the formal language recognized by these191combinators aims to be equivalent to the one of the BNF grammar.192193\subsection{Identifiers}194\label{sec:identifiers}195196% Ident is not documented in the original QBE specification.197% See https://c9x.me/git/qbe.git/tree/parse.c?h=v1.2#n304198199\begin{code}200ident :: Parser String201ident = do202 start <- letter <|> oneOf "._"203 rest <- many (alphaNum <|> oneOf "$._")204 return $ start : rest205\end{code}206207Identifiers for data, types, and functions can start with any ASCII letter or208the special characters \texttt{.} and \texttt{\_}. This initial character can209be followed by a sequence of zero or more alphanumeric characters and the210special characters \texttt{\$}, \texttt{.}, and \texttt{\_}.211212\subsection{Sigils}213214\begin{code}215userDef :: Parser Q.UserIdent216userDef = Q.UserIdent <$> (char ':' >> ident)217218global :: Parser Q.GlobalIdent219global = Q.GlobalIdent <$> (char '$' >> ident)220221local :: Parser Q.LocalIdent222local = Q.LocalIdent <$> (char '%' >> ident)223224label :: Parser Q.BlockIdent225label = Q.BlockIdent <$> (char '@' >> ident)226\end{code}227228The intermediate language makes heavy use of sigils, all user-defined229names are prefixed with a sigil. This is to avoid keyword conflicts, and230also to quickly spot the scope and nature of identifiers.231232\begin{itemize}233 \item \texttt{:} is for user-defined \nameref{sec:aggregate-types}234 \item \texttt{\$} is for globals (represented by a pointer)235 \item \texttt{\%} is for function-scope temporaries236 \item \texttt{@@} is for block labels237\end{itemize}238239\subsection{Spacing}240241\begin{code}242blank :: Parser Char243blank = oneOf "\t " <?> "blank"244245blankNL :: Parser Char246blankNL = oneOf "\n\t " <?> "blank or newline"247\end{code}248249Individual tokens in IL files must be separated by one or more spacing250characters. Both spaces and tabs are recognized as spacing characters.251In data and type definitions, newlines may also be used as spaces to252prevent overly long lines. When exactly one of two consecutive tokens is253a symbol (for example \texttt{,} or \texttt{=} or \texttt{\{}), spacing may be omitted.254255\ignore{256\begin{code}257ws :: Parser a -> Parser a258ws p = p <* skipMany blank259260ws1 :: Parser a -> Parser a261ws1 p = p <* skipMany1 blank262263wsNL :: Parser a -> Parser a264wsNL p = p <* skipNoCode (skipMany blankNL)265266wsNL1 :: Parser a -> Parser a267wsNL1 p = p <* skipNoCode (skipMany1 blankNL)268269-- Only intended to be used to skip comments at the start of a file.270skipInitComments :: Parser ()271skipInitComments = skipNoCode (skipMany blankNL)272\end{code}273}274275\subsection{String Literals}276\label{sec:strlit}277278% The string literal is not documented in the original QBE specification.279% See https://c9x.me/git/qbe.git/tree/parse.c?h=v1.2#n287280281\begin{code}282strLit :: Parser String283strLit = concat <$> quoted (many strChr)284 where285 strChr :: Parser [Char]286 strChr = (singleton <$> noneOf "\"\\") <|> escSeq287288 -- TODO: not documnted in the QBE BNF.289 octEsc :: Parser Char290 octEsc = do291 n <- octNumber292 pure $ chr (fromIntegral n)293294 escSeq :: Parser [Char]295 escSeq = try $ do296 esc <- char '\\'297 (singleton <$> octEsc) <|> (anyChar <&> (\c -> [esc, c]))298\end{code}299300Strings are enclosed by double quotes and are, for example, used to specify a301section name as part of the \nameref{sec:linkage} information. Within a string,302a double quote can be escaped using a \texttt{\textbackslash} character. All303escape sequences, including double quote escaping, are passed through as-is to304the generated assembly file.305306\section{Types}307308\subsection{Simple Types}309310The IL makes minimal use of types. By design, the types used are311restricted to what is necessary for unambiguous compilation to machine312code and C interfacing. Unlike LLVM, QBE is not using types as a means313to safety; they are only here for semantic purposes.314315\begin{code}316baseType :: Parser Q.BaseType317baseType = choice318 [ bind "w" Q.Word319 , bind "l" Q.Long320 , bind "s" Q.Single321 , bind "d" Q.Double ]322\end{code}323324The four base types are \texttt{w} (word), \texttt{l} (long), \texttt{s} (single), and \texttt{d}325(double), they stand respectively for 32-bit and 64-bit integers, and32632-bit and 64-bit floating-point numbers. There are no pointer types327available; pointers are typed by an integer type sufficiently wide to328represent all memory addresses (e.g., \texttt{l} on 64-bit architectures).329Temporaries in the IL can only have a base type.330331\begin{code}332extType :: Parser Q.ExtType333extType = (Q.Base <$> baseType)334 <|> bind "b" Q.Byte335 <|> bind "h" Q.HalfWord336\end{code}337338Extended types contain base types plus \texttt{b} (byte) and \texttt{h} (half word),339respectively for 8-bit and 16-bit integers. They are used in \nameref{sec:aggregate-types}340and \nameref{sec:data} definitions.341342For C interfacing, the IL also provides user-defined aggregate types as343well as signed and unsigned variants of the sub-word extended types.344Read more about these types in the \nameref{sec:aggregate-types}345and \nameref{sec:functions} sections.346347\subsection{Subtyping}348\label{sec:subtyping}349350The IL has a minimal subtyping feature, for integer types only. Any351value of type \texttt{l} can be used in a \texttt{w} context. In that case, only the35232 least significant bits of the word value are used.353354Make note that it is the opposite of the usual subtyping on integers (in355C, we can safely use an \texttt{int} where a \texttt{long} is expected). A long value356cannot be used in word context. The rationale is that a word can be357signed or unsigned, so extending it to a long could be done in two ways,358either by zero-extension, or by sign-extension.359360\subsection{Constants and Vals}361\label{sec:constants-and-vals}362363\begin{code}364dynConst :: Parser Q.DynConst365dynConst =366 (Q.Const <$> constant)367 <|> (Q.Thread <$> global)368 <?> "dynconst"369\end{code}370371Constants come in two kinds: compile-time constants and dynamic372constants. Dynamic constants include compile-time constants and other373symbol variants that are only known at program-load time or execution374time. Consequently, dynamic constants can only occur in function bodies.375376The representation of integers is two's complement.377Floating-point numbers are represented using the single-precision and378double-precision formats of the IEEE 754 standard.379380\begin{code}381constant :: Parser Q.Const382constant =383 (Q.Number <$> decNumber)384 <|> (Q.SFP <$> sfp)385 <|> (Q.DFP <$> dfp)386 <|> (Q.Global <$> global)387 <?> "const"388 where389 sfp = string "s_" >> float390 dfp = string "d_" >> float391\end{code}392393Constants specify a sequence of bits and are untyped. They are always394parsed as 64-bit blobs. Depending on the context surrounding a constant,395only some of its bits are used. For example, in the program below, the396two variables defined have the same value since the first operand of the397subtraction is a word (32-bit) context.398399\begin{verbatim}400%x =w sub -1, 0 %y =w sub 4294967295, 0401\end{verbatim}402403Because specifying floating-point constants by their bits makes the code404less readable, syntactic sugar is provided to express them. Standard405scientific notation is prefixed with \texttt{s\_} and \texttt{d\_} for single and406double precision numbers respectively. Once again, the following example407defines twice the same double-precision constant.408409\begin{verbatim}410%x =d add d_0, d_-1411%y =d add d_0, -4616189618054758400412\end{verbatim}413414Global symbols can also be used directly as constants; they will be415resolved and turned into actual numeric constants by the linker.416417When the \texttt{thread} keyword prefixes a symbol name, the418symbol\textquotesingle s numeric value is resolved at runtime in the419thread-local storage.420421\begin{code}422val :: Parser Q.Value423val =424 (Q.VConst <$> dynConst)425 <|> (Q.VLocal <$> local)426 <?> "val"427\end{code}428429Vals are used as arguments in regular, phi, and jump instructions within430function definitions. They are either constants or function-scope431temporaries.432433\subsection{Linkage}434\label{sec:linkage}435436\begin{code}437linkage :: Parser Q.Linkage438linkage =439 wsNL (bind "export" Q.LExport)440 <|> wsNL (bind "thread" Q.LThread)441 <|> do442 _ <- ws1 $ string "section"443 (try secWithFlags) <|> sec444 where445 sec :: Parser Q.Linkage446 sec = wsNL strLit <&> (`Q.LSection` Nothing)447448 secWithFlags :: Parser Q.Linkage449 secWithFlags = do450 n <- ws1 strLit451 wsNL strLit <&> Q.LSection n . Just452\end{code}453454Function and data definitions (see below) can specify linkage455information to be passed to the assembler and eventually to the linker.456457The \texttt{export} linkage flag marks the defined item as visible outside the458current file\textquotesingle s scope. If absent, the symbol can only be459referred to locally. Functions compiled by QBE and called from C need to460be exported.461462The \texttt{thread} linkage flag can only qualify data definitions. It mandates463that the object defined is stored in thread-local storage. Each time a464runtime thread starts, the supporting platform runtime is in charge of465making a new copy of the object for the fresh thread. Objects in466thread-local storage must be accessed using the \texttt{thread \$IDENT} syntax,467as specified in the \nameref{sec:constants-and-vals} section.468469A \texttt{section} flag can be specified to tell the linker to put the defined470item in a certain section. The use of the section flag is platform471dependent and we refer the user to the documentation of their assembler472and linker for relevant information.473474\begin{verbatim}475section ".init_array" data $.init.f = { l $f }476\end{verbatim}477478The section flag can be used to add function pointers to a global479initialization list, as depicted above. Note that some platforms provide480a BSS section that can be used to minimize the footprint of uniformly481zeroed data. When this section is available, QBE will automatically make482use of it and no section flag is required.483484The section and export linkage flags should each appear at most once in485a definition. If multiple occurrences are present, QBE is free to use486any.487488\subsection{Definitions}489\label{sec:definitions}490491Definitions are the essential components of an IL file. They can define492three types of objects: aggregate types, data, and functions. Aggregate493types are never exported and do not compile to any code. Data and494function definitions have file scope and are mutually recursive (even495across IL files). Their visibility can be controlled using linkage496flags.497498\subsubsection{Aggregate Types}499\label{sec:aggregate-types}500501\begin{code}502typeDef :: Parser Q.TypeDef503typeDef = do504 _ <- wsNL1 (string "type")505 i <- wsNL1 userDef506 _ <- wsNL1 (char '=')507 a <- optionMaybe alignAny508 bracesNL (opaqueType <|> unionType <|> regularType) <&> Q.TypeDef i a509\end{code}510511Aggregate type definitions start with the \texttt{type} keyword. They have file512scope, but types must be defined before being referenced. The inner513structure of a type is expressed by a comma-separated list of fields.514515\begin{code}516subType :: Parser Q.SubType517subType =518 (Q.SExtType <$> extType)519 <|> (Q.SUserDef <$> userDef)520521field :: Parser Q.Field522field = do523 -- TODO: newline is required if there is a number argument524 f <- wsNL subType525 s <- ws $ optionMaybe decNumber526 pure (f, s)527528fields :: Bool -> Parser [Q.Field]529fields allowEmpty =530 (if allowEmpty then sepByTrail else sepByTrail1) field (wsNL $ char ',')531\end{code}532533A field consists of a subtype, either an extended type or a user-defined type,534and an optional number expressing the value of this field. In case many items535of the same type are sequenced (like in a C array), the shorter array syntax536can be used.537538\begin{code}539regularType :: Parser Q.AggType540regularType = Q.ARegular <$> fields True541\end{code}542543Three different kinds of aggregate types are presentl ysupported: regular544types, union types and opaque types. The fields of regular types will be545packed. By default, the alignment of an aggregate type is the maximum alignment546of its members. The alignment can be explicitly specified by the programmer.547548\begin{code}549unionType :: Parser Q.AggType550unionType = Q.AUnion <$> many1 (wsNL unionType')551 where552 unionType' :: Parser [Q.Field]553 unionType' = bracesNL $ fields False554\end{code}555556Union types allow the same chunk of memory to be used with different layouts. They are defined by enclosing multiple regular aggregate type bodies in a pair of curly braces. Size and alignment of union types are set to the maximum size and alignment of each variation or, in the case of alignment, can be explicitly specified.557558\begin{code}559opaqueType :: Parser Q.AggType560opaqueType = Q.AOpaque <$> wsNL decNumber561\end{code}562563Opaque types are used when the inner structure of an aggregate cannot be specified; the alignment for opaque types is mandatory. They are defined simply by enclosing their size between curly braces.564565\subsubsection{Data}566\label{sec:data}567568\begin{code}569dataDef :: Parser Q.DataDef570dataDef = do571 link <- many linkage572 name <- wsNL1 (string "data") >> wsNL global573 _ <- wsNL (char '=')574 alignment <- optionMaybe alignAny575 bracesNL dataObjs <&> Q.DataDef link name alignment576 where577 -- TODO: sepByTrail is not documented in the QBE BNF.578 dataObjs = sepByTrail dataObj (wsNL $ char ',')579\end{code}580581Data definitions express objects that will be emitted in the compiled582file. Their visibility and location in the compiled artifact are583controlled with linkage flags described in the \nameref{sec:linkage}584section.585586They define a global identifier (starting with the sigil \texttt{\$}), that587will contain a pointer to the object specified by the definition.588589\begin{code}590dataObj :: Parser Q.DataObj591dataObj =592 (Q.OZeroFill <$> (wsNL1 (char 'z') >> wsNL decNumber))593 <|> do594 t <- wsNL1 extType595 i <- many1 (wsNL dataItem)596 return $ Q.OItem t i597\end{code}598599Objects are described by a sequence of fields that start with a type600letter. This letter can either be an extended type, or the \texttt{z} letter.601If the letter used is an extended type, the data item following602specifies the bits to be stored in the field.603604\begin{code}605dataItem :: Parser Q.DataItem606dataItem =607 (Q.DString <$> strLit)608 <|> try609 ( do610 i <- ws global611 off <- (ws $ char '+') >> ws decNumber612 return $ Q.DSymOff i off613 )614 <|> (Q.DConst <$> constant)615\end{code}616617Within each object, several items can be defined. When several data items618follow a letter, they initialize multiple fields of the same size.619620\begin{code}621allocSize :: Parser Q.AllocSize622allocSize =623 choice624 [ bind "4" Q.AllocWord,625 bind "8" Q.AllocLong,626 bind "16" Q.AllocLongLong627 ]628\end{code}629630The members of a struct will be packed. This means that padding has to631be emitted by the frontend when necessary. Alignment of the whole data632objects can be manually specified, and when no alignment is provided,633the maximum alignment from the platform is used.634635When the \texttt{z} letter is used the number following indicates the size of636the field; the contents of the field are zero initialized. It can be637used to add padding between fields or zero-initialize big arrays.638639\subsubsection{Functions}640\label{sec:functions}641642\begin{code}643funcDef :: Parser Q.FuncDef644funcDef = do645 link <- many linkage646 _ <- ws1 (string "function")647 retTy <- optionMaybe (ws1 abity)648 name <- ws global649 args <- wsNL params650 body <- between (wsNL1 $ char '{') (wsNL $ char '}') $ many1 block651652 case (Q.insertJumps body) of653 Nothing -> fail $ "invalid fallthrough in " ++ show name654 Just bl -> return $ Q.FuncDef link name retTy args bl655\end{code}656657Function definitions contain the actual code to emit in the compiled658file. They define a global symbol that contains a pointer to the659function code. This pointer can be used in \texttt{call} instructions or stored660in memory.661662\begin{code}663subWordType :: Parser Q.SubWordType664subWordType = choice665 [ try $ bind "sb" Q.SignedByte666 , try $ bind "ub" Q.UnsignedByte667 , bind "sh" Q.SignedHalf668 , bind "uh" Q.UnsignedHalf ]669670abity :: Parser Q.Abity671abity = try (Q.ASubWordType <$> subWordType)672 <|> (Q.ABase <$> baseType)673 <|> (Q.AUserDef <$> userDef)674\end{code}675676The type given right before the function name is the return type of the677function. All return values of this function must have this return type.678If the return type is missing, the function must not return any value.679680\begin{code}681param :: Parser Q.FuncParam682param = (Q.Env <$> (ws1 (string "env") >> local))683 <|> (string "..." >> pure Q.Variadic)684 <|> do685 ty <- ws1 abity686 Q.Regular ty <$> local687688params :: Parser [Q.FuncParam]689params = parenLst param690\end{code}691692The parameter list is a comma separated list of temporary names prefixed693by types. The types are used to correctly implement C compatibility.694When an argument has an aggregate type, a pointer to the aggregate is695passed by thea caller. In the example below, we have to use a load696instruction to get the value of the first (and only) member of the697struct.698699\begin{verbatim}700type :one = { w }701702function w $getone(:one %p) {703@start704 %val =w loadw %p705 ret %val706}707\end{verbatim}708709If a function accepts or returns values that are smaller than a word,710such as \texttt{signed char} or \texttt{unsigned short} in C, one of the sub-word type711must be used. The sub-word types \texttt{sb}, \texttt{ub}, \texttt{sh}, and \texttt{uh} stand,712respectively, for signed and unsigned 8-bit values, and signed and713unsigned 16-bit values. Parameters associated with a sub-word type of714bit width N only have their N least significant bits set and have base715type \texttt{w}. For example, the function716717\begin{verbatim}718function w $addbyte(w %a, sb %b) {719@start720 %bw =w extsb %b721 %val =w add %a, %bw722 ret %val723}724\end{verbatim}725726needs to sign-extend its second argument before the addition. Dually,727return values with sub-word types do not need to be sign or zero728extended.729730If the parameter list ends with \texttt{...}, the function is a variadic731function: it can accept a variable number of arguments. To access the732extra arguments provided by the caller, use the \texttt{vastart} and \texttt{vaarg}733instructions described in the \nameref{sec:variadic} section.734735Optionally, the parameter list can start with an environment parameter736\texttt{env \%e}. This special parameter is a 64-bit integer temporary (i.e.,737of type \texttt{l}). If the function does not use its environment parameter,738callers can safely omit it. This parameter is invisible to a C caller:739for example, the function740741\begin{verbatim}742export function w $add(env %e, w %a, w %b) {743@start744 %c =w add %a, %b745 ret %c746}747\end{verbatim}748749must be given the C prototype \texttt{int add(int, int)}. The intended use of750this feature is to pass the environment pointer of closures while751retaining a very good compatibility with C. The \nameref{sec:call}752section explains how to pass an environment parameter.753754Since global symbols are defined mutually recursive, there is no need755for function declarations: a function can be referenced before its756definition. Similarly, functions from other modules can be used without757previous declaration. All the type information necessary to compile a758call is in the instruction itself.759760The syntax and semantics for the body of functions are described in the761\nameref{sec:control} section.762763\section{Control}764\label{sec:control}765766The IL represents programs as textual transcriptions of control flow767graphs. The control flow is serialized as a sequence of blocks of768straight-line code which are connected using jump instructions.769770\subsection{Blocks}771\label{sec:blocks}772773\begin{code}774block :: Parser Q.Block'775block = do776 l <- wsNL1 label777 p <- many (wsNL1 $ try phiInstr)778 s <- many (wsNL1 statement)779 Q.Block' l p s <$> (optionMaybe $ wsNL1 jumpInstr)780\end{code}781782All blocks have a name that is specified by a label at their beginning.783Then follows a sequence of instructions that have "fall-through" flow.784Finally one jump terminates the block. The jump can either transfer785control to another block of the same function or return; jumps are786described further below.787788The first block in a function must not be the target of any jump in the789program. If a jump to the function start is needed, the frontend must790insert an empty prelude block at the beginning of the function.791792When one block jumps to the next block in the IL file, it is not793necessary to write the jump instruction, it will be automatically added794by the parser. For example the start block in the example below jumps795directly to the loop block.796797\subsection{Jumps}798\label{sec:jumps}799800\begin{code}801jumpInstr :: Parser Q.JumpInstr802jumpInstr = (string "hlt" >> pure Q.Halt)803 -- TODO: Return requires a space if there is an optionMaybe804 <|> Q.Return <$> ((ws $ string "ret") >> optionMaybe val)805 <|> try (Q.Jump <$> ((ws1 $ string "jmp") >> label))806 <|> do807 _ <- ws1 $ string "jnz"808 v <- ws val <* ws (char ',')809 l1 <- ws label <* ws (char ',')810 l2 <- ws label811 return $ Q.Jnz v l1 l2812\end{code}813814A jump instruction ends every block and transfers the control to another815program location. The target of a jump must never be the first block in816a function. The three kinds of jumps available are described in the817following list.818819\begin{enumerate}820 \item \textbf{Unconditional jump.} Jumps to another block of the same function.821 \item \textbf{Conditional jump.} When its word argument is non-zero, it jumps to its first label argument; otherwise it jumps to the other label. The argument must be of word type; because of subtyping a long argument can be passed, but only its least significant 32 bits will be compared to 0.822 \item \textbf{Function return.} Terminates the execution of the current function, optionally returning a value to the caller. The value returned must be of the type given in the function prototype. If the function prototype does not specify a return type, no return value can be used.823 \item \textbf{Program termination.} Terminates the execution of the program with a target-dependent error. This instruction can be used when it is expected that the execution never reaches the end of the block it closes; for example, after having called a function such as \texttt{exit()}.824\end{enumerate}825826\section{Instructions}827\label{sec:instructions}828829\begin{code}830instr :: Parser Q.Instr831instr =832 choice833 [ try $ binaryInstr Q.Add "add",834 try $ binaryInstr Q.Sub "sub",835 try $ binaryInstr Q.Mul "mul",836 try $ binaryInstr Q.Div "div",837 try $ binaryInstr Q.URem "urem",838 try $ binaryInstr Q.Rem "rem",839 try $ binaryInstr Q.UDiv "udiv",840 try $ binaryInstr Q.Or "or",841 try $ binaryInstr Q.Xor "xor",842 try $ binaryInstr Q.And "and",843 try $ binaryInstr Q.Sar "sar",844 try $ binaryInstr Q.Shr "shr",845 try $ binaryInstr Q.Shl "shl",846 try $ unaryInstr Q.Neg "neg",847 try $ unaryInstr Q.Cast "cast",848 try $ unaryInstr Q.Copy "copy",849 try $ loadInstr,850 try $ allocInstr,851 try $ compareInstr,852 try $ extInstr,853 try $ truncInstr,854 try $ fromFloatInstr,855 try $ toFloatInstr856 ]857\end{code}858859Instructions are the smallest piece of code in the IL, they form the body of860\nameref{sec:blocks}. This specification distinguishes instructions and861volatile instructions, the latter do not return a value. For the former, the IL862uses a three-address code, which means that one instruction computes an863operation between two operands and assigns the result to a third one.864865\begin{code}866assign :: Parser Q.Statement867assign = do868 n <- ws local869 t <- ws (char '=') >> ws1 baseType870 Q.Assign n t <$> instr871872volatileInstr :: Parser Q.Statement873volatileInstr = Q.Volatile <$> (storeInstr <|> blitInstr)874875-- TODO: Not documented in the QBE BNF.876statement :: Parser Q.Statement877statement = (try callInstr) <|> assign <|> volatileInstr878\end{code}879880An instruction has both a name and a return type, this return type is a base881type that defines the size of the instruction's result. The type of the882arguments can be unambiguously inferred using the instruction name and the883return type. For example, for all arithmetic instructions, the type of the884arguments is the same as the return type. The two additions below are valid if885\texttt{\%y} is a word or a long (because of \nameref{sec:subtyping}).886887\begin{verbatim}888%x =w add 0, %y889%z =w add %x, %x890\end{verbatim}891892Some instructions, like comparisons and memory loads have operand types893that differ from their return types. For instance, two floating points894can be compared to give a word result (0 if the comparison succeeds, 1895if it fails).896897\begin{verbatim}898%c =w cgts %a, %b899\end{verbatim}900901In the example above, both operands have to have single type. This is902made explicit by the instruction suffix.903904\subsection{Arithmetic and Bits}905906\begin{quote}907\begin{itemize}908\item \texttt{add}, \texttt{sub}, \texttt{div}, \texttt{mul}909\item \texttt{neg}910\item \texttt{udiv}, \texttt{rem}, \texttt{urem}911\item \texttt{or}, \texttt{xor}, \texttt{and}912\item \texttt{sar}, \texttt{shr}, \texttt{shl}913\end{itemize}914\end{quote}915916The base arithmetic instructions in the first bullet are available for917all types, integers and floating points.918919When \texttt{div} is used with word or long return type, the arguments are920treated as signed. The unsigned integral division is available as \texttt{udiv}921instruction. When the result of a division is not an integer, it is truncated922towards zero.923924The signed and unsigned remainder operations are available as \texttt{rem} and925\texttt{urem}. The sign of the remainder is the same as the one of the926dividend. Its magnitude is smaller than the divisor one. These two instructions927and \texttt{udiv} are only available with integer arguments and result.928929Bitwise OR, AND, and XOR operations are available for both integer930types. Logical operations of typical programming languages can be931implemented using \nameref{sec:comparisions} and \nameref{sec:jumps}.932933Shift instructions \texttt{sar}, \texttt{shr}, and \texttt{shl}, shift right or934left their first operand by the amount from the second operand. The shifting935amount is taken modulo the size of the result type. Shifting right can either936preserve the sign of the value (using \texttt{sar}), or fill the newly freed937bits with zeroes (using \texttt{shr}). Shifting left always fills the freed938bits with zeroes.939940Remark that an arithmetic shift right (\texttt{sar}) is only equivalent to a941division by a power of two for non-negative numbers. This is because the shift942right "truncates" towards minus infinity, while the division truncates towards943zero.944945\subsection{Memory}946\label{sec:memory}947948The following sections discuss instructions for interacting with values stored in memory.949950\subsubsection{Store instructions}951952\begin{code}953storeInstr :: Parser Q.VolatileInstr954storeInstr = do955 t <- string "store" >> ws1 extType956 v <- ws val957 _ <- ws $ char ','958 ws val <&> Q.Store t v959\end{code}960961Store instructions exist to store a value of any base type and any extended962type. Since halfwords and bytes are not first class in the IL, \texttt{storeh}963and \texttt{storeb} take a word as argument. Only the first 16 or 8 bits of964this word will be stored in memory at the address specified in the second965argument.966967\subsubsection{Load instructions}968969\begin{code}970loadInstr :: Parser Q.Instr971loadInstr = do972 _ <- string "load"973 t <- ws1 $ choice974 [ try $ bind "sw" (Q.LBase Q.Word),975 try $ bind "uw" (Q.LBase Q.Word),976 try $ Q.LSubWord <$> subWordType,977 Q.LBase <$> baseType978 ]979 ws val <&> Q.Load t980\end{code}981982For types smaller than long, two variants of the load instruction are983available: one will sign extend the loaded value, while the other will zero984extend it. Note that all loads smaller than long can load to either a long or a985word.986987The two instructions \texttt{loadsw} and \texttt{loaduw} have the same effect988when they are used to define a word temporary. A \texttt{loadw} instruction is989provided as syntactic sugar for \texttt{loadsw} to make explicit that the990extension mechanism used is irrelevant.991992\subsubsection{Blits}993994\begin{code}995blitInstr :: Parser Q.VolatileInstr996blitInstr = do997 v1 <- (ws1 $ string "blit") >> ws val <* (ws $ char ',')998 v2 <- ws val <* (ws $ char ',')999 nb <- decNumber1000 return $ Q.Blit v1 v2 nb1001\end{code}10021003The blit instruction copies in-memory data from its first address argument to1004its second address argument. The third argument is the number of bytes to copy.1005The source and destination spans are required to be either non-overlapping, or1006fully overlapping (source address identical to the destination address). The1007byte count argument must be a nonnegative numeric constant; it cannot be a1008temporary.10091010One blit instruction may generate a number of instructions proportional to its1011byte count argument, consequently, it is recommended to keep this argument1012relatively small. If large copies are necessary, it is preferable that1013frontends generate calls to a supporting \texttt{memcpy} function.10141015\subsubsection{Stack Allocation}10161017\begin{code}1018allocInstr :: Parser Q.Instr1019allocInstr = do1020 siz <- (ws $ string "alloc") >> (ws1 allocSize)1021 val <&> Q.Alloc siz1022\end{code}10231024These instructions allocate a chunk of memory on the stack. The number ending1025the instruction name is the alignment required for the allocated slot. QBE will1026make sure that the returned address is a multiple of that alignment value.10271028Stack allocation instructions are used, for example, when compiling the C local1029variables, because their address can be taken. When compiling Fortran,1030temporaries can be used directly instead, because it is illegal to take the1031address of a variable.10321033\subsection{Comparisons}1034\label{sec:comparisions}10351036\begin{code}1037compareInstr :: Parser Q.Instr1038compareInstr = do1039 _ <- char 'c'1040 (try intCompare) <|> floatCompare10411042compareArgs :: Parser (Q.Value, Q.Value)1043compareArgs = do1044 lhs <- ws val <* ws (char ',')1045 rhs <- ws val1046 pure (lhs, rhs)10471048intCompare :: Parser Q.Instr1049intCompare = do1050 op <- compareIntOp1051 ty <- ws1 intArg10521053 (lhs, rhs) <- compareArgs1054 pure $ Q.CompareInt ty op lhs rhs10551056floatCompare :: Parser Q.Instr1057floatCompare = do1058 op <- compareFloatOp1059 ty <- ws1 floatArg10601061 (lhs, rhs) <- compareArgs1062 pure $ Q.CompareFloat ty op lhs rhs1063\end{code}10641065Comparison instructions return an integer value (either a word or a long), and1066compare values of arbitrary types. The returned value is 1 if the two operands1067satisfy the comparison relation, or 0 otherwise. The names of comparisons1068respect a standard naming scheme in three parts:10691070\begin{enumerate}1071 \item All comparisons start with the letter \texttt{c}.1072 \item Then comes a comparison type.1073 \item Finally, the instruction name is terminated with a basic type suffix precising the type of the operands to be compared.1074\end{enumerate}10751076The following instruction are available for integer comparisons:10771078\begin{code}1079compareIntOp :: Parser Q.IntCmpOp1080compareIntOp = choice1081 [ bind "eq" Q.IEq1082 , bind "ne" Q.INe1083 , try $ bind "sle" Q.ISle1084 , try $ bind "slt" Q.ISlt1085 , try $ bind "sge" Q.ISge1086 , try $ bind "sgt" Q.ISgt1087 , try $ bind "ule" Q.IUle1088 , try $ bind "ult" Q.IUlt1089 , try $ bind "uge" Q.IUge1090 , try $ bind "ugt" Q.IUgt ]1091\end{code}10921093For floating point comparisons use one of these instructions:10941095\begin{code}1096compareFloatOp :: Parser Q.FloatCmpOp1097compareFloatOp = choice1098 [ bind "eq" Q.FEq1099 , bind "ne" Q.FNe1100 , try $ bind "le" Q.FLe1101 , bind "lt" Q.FLt1102 , try $ bind "ge" Q.FGe1103 , bind "gt" Q.FGt1104 , bind "o" Q.FOrd1105 , bind "uo" Q.FUnord ]1106\end{code}11071108For example, \texttt{cod} compares two double-precision floating point numbers1109and returns 1 if the two floating points are not NaNs, or 0 otherwise. The1110\texttt{csltw} instruction compares two words representing signed numbers and1111returns 1 when the first argument is smaller than the second one.11121113\subsection{Conversions}11141115Conversion operations change the representation of a value, possibly modifying1116it if the target type cannot hold the value of the source type. Conversions can1117extend the precision of a temporary (e.g., from signed 8-bit to 32-bit), or1118convert a floating point into an integer and vice versa.11191120\begin{code}1121extInstr :: Parser Q.Instr1122extInstr = do1123 _ <- string "ext"1124 ty <- ws1 extArg1125 ws val <&> Q.Ext ty1126 where1127 extArg :: Parser Q.ExtArg1128 extArg = try (Q.ExtSubWord <$> subWordType)1129 <|> try (bind "sw" Q.ExtSignedWord)1130 <|> bind "s" Q.ExtSingle1131 <|> bind "uw" Q.ExtUnsignedWord1132\end{code}11331134Extending the precision of a temporary is done using the \texttt{ext} family of1135instructions. Because QBE types do not specify the signedness (like in LLVM),1136extension instructions exist to sign-extend and zero-extend a value. For1137example, \texttt{extsb} takes a word argument and sign-extends the 81138least-significant bits to a full word or long, depending on the return type.11391140\begin{code}1141truncInstr :: Parser Q.Instr1142truncInstr = do1143 _ <- ws1 $ string "truncd"1144 ws val <&> Q.TruncDouble1145\end{code}11461147The instructions \texttt{exts} (extend single) and \texttt{truncd} (truncate1148double) are provided to change the precision of a floating point value. When1149the double argument of truncd cannot be represented as a single-precision1150floating point, it is truncated towards zero.11511152\begin{code}1153floatArg :: Parser Q.FloatArg1154floatArg = bind "d" Q.FDouble <|> bind "s" Q.FSingle11551156fromFloatInstr :: Parser Q.Instr1157fromFloatInstr = do1158 arg <- floatArg <* string "to"1159 isSigned <- signageChar1160 _ <- ws1 $ char 'i'1161 ws val <&> Q.FloatToInt arg isSigned11621163intArg :: Parser Q.IntArg1164intArg = bind "w" Q.IWord <|> bind "l" Q.ILong11651166toFloatInstr :: Parser Q.Instr1167toFloatInstr = do1168 isSigned <- signageChar1169 arg <- intArg1170 _ <- ws1 $ string "tof"1171 ws val <&> Q.IntToFloat arg isSigned1172\end{code}11731174Converting between signed integers and floating points is done using1175\texttt{stosi} (single to signed integer), \texttt{stoui} (single to unsigned1176integer), \texttt{dtosi} (double to signed integer), \texttt{dtoui} (double to1177unsigned integer), \texttt{swtof} (signed word to float), \texttt{uwtof}1178(unsigned word to float), \texttt{sltof} (signed long to float) and1179\texttt{ultof} (unsigned long to float).11801181\subsection{Cast and Copy}11821183The \texttt{cast} and \texttt{copy} instructions return the bits of their1184argument verbatim. However a cast will change an integer into a floating point1185of the same width and vice versa.11861187Casts can be used to make bitwise operations on the representation of floating1188point numbers. For example the following program will compute the opposite of1189the single-precision floating point number \texttt{\%f} into \texttt{\%rs}.11901191\begin{verbatim}1192%b0 =w cast %f1193%b1 =w xor 2147483648, %b0 # flip the msb1194%rs =s cast %b11195\end{verbatim}11961197\subsection{Call}1198\label{sec:call}11991200\begin{code}1201-- TODO: Code duplication with 'param'.1202callArg :: Parser Q.FuncArg1203callArg = (Q.ArgEnv <$> (ws1 (string "env") >> val))1204 <|> (string "..." >> pure Q.ArgVar)1205 <|> do1206 ty <- ws1 abity1207 Q.ArgReg ty <$> val12081209callArgs :: Parser [Q.FuncArg]1210callArgs = parenLst callArg12111212callInstr :: Parser Q.Statement1213callInstr = do1214 retValue <- optionMaybe $ do1215 i <- ws local <* ws (char '=')1216 a <- ws1 abity1217 return (i, a)1218 toCall <- ws1 (string "call") >> ws val1219 fnArgs <- callArgs1220 return $ Q.Call retValue toCall fnArgs1221\end{code}12221223The call instruction is special in several ways. It is not a three-address1224instruction and requires the type of all its arguments to be given. Also, the1225return type can be either a base type or an aggregate type. These specifics are1226required to compile calls with C compatibility (i.e., to respect the ABI).12271228When an aggregate type is used as argument type or return type, the value1229respectively passed or returned needs to be a pointer to a memory location1230holding the value. This is because aggregate types are not first-class1231citizens of the IL.12321233Sub-word types are used for arguments and return values of width less than a1234word. Details on these types are presented in the \nameref{sec:functions} section.1235Arguments with sub-word types need not be sign or zero extended according to1236their type. Calls with a sub-word return type define a temporary of base type1237\texttt{w} with its most significant bits unspecified.12381239Unless the called function does not return a value, a return temporary must be1240specified, even if it is never used afterwards.12411242An environment parameter can be passed as first argument using the \texttt{env}1243keyword. The passed value must be a 64-bit integer. If the called function does1244not expect an environment parameter, it will be safely discarded. See the1245\nameref{sec:functions} section for more information about environment1246parameters.12471248When the called function is variadic, there must be a \texttt{...} marker1249separating the named and variadic arguments.12501251\subsection{Variadic}1252\label{sec:variadic}12531254To-Do.12551256\subsection{Phi}12571258\begin{code}1259phiBranch :: Parser (Q.BlockIdent, Q.Value)1260phiBranch = do1261 n <- ws1 label1262 v <- val1263 pure (n, v)12641265phiInstr :: Parser Q.Phi1266phiInstr = do1267 -- TODO: code duplication with 'assign'1268 n <- ws local1269 t <- ws (char '=') >> ws1 baseType12701271 _ <- ws1 (string "phi")1272 -- TODO: combinator for sepBy1273 p <- Map.fromList <$> sepBy1 (ws phiBranch) (ws $ char ',')1274 return $ Q.Phi n t p1275\end{code}12761277First and foremost, phi instructions are NOT necessary when writing a frontend1278to QBE. One solution to avoid having to deal with SSA form is to use stack1279allocated variables for all source program variables and perform assignments1280and lookups using \nameref{sec:memory} operations. This is what LLVM users1281typically do.12821283Another solution is to simply emit code that is not in SSA form! Contrary to1284LLVM, QBE is able to fixup programs not in SSA form without requiring the1285boilerplate of loading and storing in memory. For example, the following1286program will be correctly compiled by QBE.12871288\begin{verbatim}1289@start1290 %x =w copy 1001291 %s =w copy 01292@loop1293 %s =w add %s, %x1294 %x =w sub %x, 11295 jnz %x, @loop, @end1296@end1297 ret %s1298\end{verbatim}12991300Now, if you want to know what phi instructions are and how to use them in QBE,1301you can read the following.13021303Phi instructions are specific to SSA form. In SSA form values can only be1304assigned once, without phi instructions, this requirement is too strong to1305represent many programs. For example consider the following C program.13061307\begin{verbatim}1308int f(int x) {1309 int y;1310 if (x)1311 y = 1;1312 else1313 y = 2;1314 return y;1315}1316\end{verbatim}13171318The variable \texttt{y} is assigned twice, the solution to translate it in SSA1319form is to insert a phi instruction.13201321\begin{verbatim}1322@ifstmt1323 jnz %x, @ift, @iff1324@ift1325 jmp @retstmt1326@iff1327 jmp @retstmt1328@retstmt1329 %y =w phi @ift 1, @iff 21330 ret %y1331\end{verbatim}13321333Phi instructions return one of their arguments depending on where the control1334came from. In the example, \texttt{\%y} is set to 1 if the1335\texttt{\textbackslash{}ift} branch is taken, or it is set to 2 otherwise.13361337An important remark about phi instructions is that QBE assumes that if a1338variable is defined by a phi it respects all the SSA invariants. So it is1339critical to not use phi instructions unless you know exactly what you are1340doing.1341\end{document}