quebex

A software analysis framework built around the QBE intermediate language

git clone https://git.8pit.net/quebex.git

   1% SPDX-FileCopyrightText: 2015-2024 Quentin Carbonneaux <quentin@c9x.me>
   2% SPDX-FileCopyrightText: 2025-2026 Sören Tempel <soeren+git@soeren-tempel.net>
   3%
   4% SPDX-License-Identifier: MIT AND GPL-3.0-only
   5
   6\documentclass{article}
   7%include polycode.fmt
   8
   9%subst blankline = "\\[5mm]"
  10
  11% See https://github.com/kosmikus/lhs2tex/issues/58
  12%format <$> = "\mathbin{\langle\$\rangle}"
  13%format <&> = "\mathbin{\langle\&\rangle}"
  14%format <|> = "\mathbin{\langle\:\vline\:\rangle}"
  15%format <?> = "\mathbin{\langle?\rangle}"
  16%format <*> = "\mathbin{\langle*\rangle}"
  17%format <*  = "\mathbin{\langle*}"
  18%format *>  = "\mathbin{*\rangle}"
  19
  20\long\def\ignore#1{}
  21
  22\usepackage{hyperref}
  23\hypersetup{
  24	colorlinks = true,
  25}
  26
  27\begin{document}
  28
  29\title{QBE Intermediate Language\vspace{-2em}}
  30\date{}
  31\maketitle
  32\frenchspacing
  33
  34\ignore{
  35\begin{code}
  36module Language.QBE.Parser
  37  ( skipInitComments,
  38    dataDef,
  39    typeDef,
  40    funcDef,
  41    fileDef
  42  )
  43where
  44
  45import Data.Char (chr)
  46import Data.Word (Word64)
  47import Data.Functor ((<&>))
  48import Data.List (singleton)
  49import Data.Map qualified as Map
  50import qualified Language.QBE.Types as Q
  51import Language.QBE.Util (bind, decNumber, octNumber, float)
  52import Text.ParserCombinators.Parsec
  53  ( Parser,
  54    alphaNum,
  55    anyChar,
  56    between,
  57    char,
  58    choice,
  59    letter,
  60    many,
  61    many1,
  62    manyTill,
  63    newline,
  64    noneOf,
  65    oneOf,
  66    optional,
  67    optionMaybe,
  68    sepBy,
  69    sepBy1,
  70    skipMany,
  71    skipMany1,
  72    string,
  73    try,
  74    (<?>),
  75    (<|>),
  76  )
  77\end{code}
  78}
  79
  80This an executable description of the
  81\href{https://c9x.me/compile/doc/il-v1.2.html}{QBE intermediate language},
  82specified through \href{https://hackage.haskell.org/package/parsec}{Parsec}
  83parser combinators and generated from a literate Haskell file. The description
  84is derived from the original QBE IL documentation, licensed under MIT.
  85Presently, this implementation targets version 1.2 of the QBE intermediate
  86language and aims to be equivalent with the original specification.
  87
  88\section{Basic Concepts}
  89
  90The intermediate language (IL) is a higher-level language than the
  91machine's assembly language. It smoothes most of the
  92irregularities of the underlying hardware and allows an infinite number
  93of temporaries to be used. This higher abstraction level lets frontend
  94programmers focus on language design issues.
  95
  96\subsection{Input Files}
  97
  98The intermediate language is provided to QBE as text. Usually, one file
  99is generated per each compilation unit from the frontend input language.
 100An IL file is a sequence of \nameref{sec:definitions} for
 101data, functions, and types. Once processed by QBE, the resulting file
 102can be assembled and linked using a standard toolchain (e.g., GNU
 103binutils).
 104
 105\begin{code}
 106comment :: Parser ()
 107comment = skipMany blankNL >> comment' >> skipMany blankNL
 108  where
 109    comment' = char '#' >> manyTill anyChar newline
 110\end{code}
 111
 112\ignore{
 113\begin{code}
 114skipNoCode :: Parser () -> Parser ()
 115skipNoCode blankP = try (skipMany1 comment <?> "comments") <|> blankP
 116\end{code}
 117}
 118
 119Here is a complete "Hello World" IL file which defines a function that
 120prints to the screen. Since the string is not a first class object (only
 121the pointer is) it is defined outside the function\textquotesingle s
 122body. Comments start with a \# character and finish with the end of the
 123line.
 124
 125\begin{verbatim}
 126data $str = { b "hello world", b 0 }
 127
 128export function w $main() {
 129@start
 130        # Call the puts function with $str as argument.
 131        %r =w call $puts(l $str)
 132        ret 0
 133}
 134\end{verbatim}
 135
 136If you have read the LLVM language reference, you might recognize the
 137example above. In comparison, QBE makes a much lighter use of types and
 138the syntax is terser.
 139
 140\subsection{Parser Combinators}
 141
 142\ignore{
 143\begin{code}
 144bracesNL :: Parser a -> Parser a
 145bracesNL = between (wsNL $ char '{') (wsNL $ char '}')
 146
 147quoted :: Parser a -> Parser a
 148quoted = let q = char '"' in between q q
 149
 150sepByTrail1 :: Parser a -> Parser sep -> Parser [a]
 151sepByTrail1 p sep = do
 152  x <- p
 153  xs <- many (try $ sep >> p)
 154  _ <- optional sep
 155  return (x:xs)
 156
 157sepByTrail :: Parser a -> Parser sep -> Parser [a]
 158sepByTrail p sep = sepByTrail1 p sep <|> return []
 159
 160parenLst :: Parser a -> Parser [a]
 161parenLst p = between (ws $ char '(') (char ')') inner
 162  where
 163    inner = sepBy (ws p) (ws $ char ',')
 164
 165unaryInstr :: (Q.Value -> Q.Instr) -> String -> Parser Q.Instr
 166unaryInstr conc keyword = do
 167  _ <- ws (string keyword)
 168  conc <$> ws val
 169
 170binaryInstr :: (Q.Value -> Q.Value -> Q.Instr) -> String -> Parser Q.Instr
 171binaryInstr conc keyword = do
 172  _ <- ws (string keyword)
 173  vfst <- ws val <* ws (char ',')
 174  conc vfst <$> ws val
 175
 176-- Can only appear in data and type definitions and hence allows newlines.
 177alignAny :: Parser Word64
 178alignAny = (ws1 (string "align")) >> wsNL decNumber
 179
 180-- Returns true if it is signed.
 181signageChar :: Parser Bool
 182signageChar = (char 's' <|> char 'u') <&> (== 's')
 183\end{code}
 184}
 185
 186The original QBE specification defines the syntax using a BNF grammar. In
 187contrast, this document defines it using Parsec parser combinators. As such,
 188this specification is less formal but more accurate as the parsing code is
 189actually executable. Consequently, this specification also captures constructs
 190omitted in the original specification (e.g., \nameref{sec:identifiers}, or
 191\nameref{sec:strlit}). Nonetheless, the formal language recognized by these
 192combinators aims to be equivalent to the one of the BNF grammar.
 193
 194\subsection{Identifiers}
 195\label{sec:identifiers}
 196
 197% Ident is not documented in the original QBE specification.
 198% See https://c9x.me/git/qbe.git/tree/parse.c?h=v1.2#n304
 199
 200\begin{code}
 201ident :: Parser String
 202ident = do
 203  start <- letter <|> oneOf "._"
 204  rest <- many (alphaNum <|> oneOf "$._")
 205  return $ start : rest
 206\end{code}
 207
 208Identifiers for data, types, and functions can start with any ASCII letter or
 209the special characters \texttt{.} and \texttt{\_}. This initial character can
 210be followed by a sequence of zero or more alphanumeric characters and the
 211special characters \texttt{\$}, \texttt{.}, and \texttt{\_}.
 212
 213\subsection{Sigils}
 214
 215\begin{code}
 216userDef :: Parser Q.UserIdent
 217userDef = Q.UserIdent <$> (char ':' >> ident)
 218
 219global :: Parser Q.GlobalIdent
 220global = Q.GlobalIdent <$> (char '$' >> ident)
 221
 222local :: Parser Q.LocalIdent
 223local = Q.LocalIdent <$> (char '%' >> ident)
 224
 225label :: Parser Q.BlockIdent
 226label = Q.BlockIdent <$> (char '@' >> ident)
 227\end{code}
 228
 229The intermediate language makes heavy use of sigils, all user-defined
 230names are prefixed with a sigil. This is to avoid keyword conflicts, and
 231also to quickly spot the scope and nature of identifiers.
 232
 233\begin{itemize}
 234  \item \texttt{:} is for user-defined \nameref{sec:aggregate-types}
 235  \item \texttt{\$} is for globals (represented by a pointer)
 236  \item \texttt{\%} is for function-scope temporaries
 237  \item \texttt{@@} is for block labels
 238\end{itemize}
 239
 240\subsection{Spacing}
 241
 242\begin{code}
 243blank :: Parser Char
 244blank = oneOf "\t " <?> "blank"
 245
 246blankNL :: Parser Char
 247blankNL = oneOf "\n\t " <?> "blank or newline"
 248\end{code}
 249
 250Individual tokens in IL files must be separated by one or more spacing
 251characters. Both spaces and tabs are recognized as spacing characters.
 252In data and type definitions, newlines may also be used as spaces to
 253prevent overly long lines. When exactly one of two consecutive tokens is
 254a symbol (for example \texttt{,} or \texttt{=} or \texttt{\{}), spacing may be omitted.
 255
 256\ignore{
 257\begin{code}
 258ws :: Parser a -> Parser a
 259ws p = p <* skipMany blank
 260
 261ws1 :: Parser a -> Parser a
 262ws1 p = p <* skipMany1 blank
 263
 264wsNL :: Parser a -> Parser a
 265wsNL p = p <* skipNoCode (skipMany blankNL)
 266
 267wsNL1 :: Parser a -> Parser a
 268wsNL1 p = p <* skipNoCode (skipMany1 blankNL)
 269
 270-- Only intended to be used to skip comments at the start of a file.
 271skipInitComments :: Parser ()
 272skipInitComments = skipNoCode (skipMany blankNL)
 273\end{code}
 274}
 275
 276\subsection{String Literals}
 277\label{sec:strlit}
 278
 279% The string literal is not documented in the original QBE specification.
 280% See https://c9x.me/git/qbe.git/tree/parse.c?h=v1.2#n287
 281
 282\begin{code}
 283strLit :: Parser String
 284strLit = concat <$> quoted (many strChr)
 285  where
 286    strChr :: Parser [Char]
 287    strChr = (singleton <$> noneOf "\"\\") <|> escSeq
 288
 289    -- TODO: not documnted in the QBE BNF.
 290    octEsc :: Parser Char
 291    octEsc = do
 292      n <- octNumber
 293      pure $ chr (fromIntegral n)
 294
 295    escSeq :: Parser [Char]
 296    escSeq = try $ do
 297      esc <- char '\\'
 298      (singleton <$> octEsc) <|> (anyChar <&> (\c -> [esc, c]))
 299\end{code}
 300
 301Strings are enclosed by double quotes and are, for example, used to specify a
 302section name as part of the \nameref{sec:linkage} information. Within a string,
 303a double quote can be escaped using a \texttt{\textbackslash} character. All
 304escape sequences, including double quote escaping, are passed through as-is to
 305the generated assembly file.
 306
 307\section{Types}
 308
 309\subsection{Simple Types}
 310
 311The IL makes minimal use of types. By design, the types used are
 312restricted to what is necessary for unambiguous compilation to machine
 313code and C interfacing. Unlike LLVM, QBE is not using types as a means
 314to safety; they are only here for semantic purposes.
 315
 316\begin{code}
 317baseType :: Parser Q.BaseType
 318baseType = choice
 319  [ bind "w" Q.Word
 320  , bind "l" Q.Long
 321  , bind "s" Q.Single
 322  , bind "d" Q.Double ]
 323\end{code}
 324
 325The four base types are \texttt{w} (word), \texttt{l} (long), \texttt{s} (single), and \texttt{d}
 326(double), they stand respectively for 32-bit and 64-bit integers, and
 32732-bit and 64-bit floating-point numbers. There are no pointer types
 328available; pointers are typed by an integer type sufficiently wide to
 329represent all memory addresses (e.g., \texttt{l} on 64-bit architectures).
 330Temporaries in the IL can only have a base type.
 331
 332\begin{code}
 333extType :: Parser Q.ExtType
 334extType = (Q.Base <$> baseType)
 335       <|> bind "b" Q.Byte
 336       <|> bind "h" Q.HalfWord
 337\end{code}
 338
 339Extended types contain base types plus \texttt{b} (byte) and \texttt{h} (half word),
 340respectively for 8-bit and 16-bit integers. They are used in \nameref{sec:aggregate-types}
 341and \nameref{sec:data} definitions.
 342
 343For C interfacing, the IL also provides user-defined aggregate types as
 344well as signed and unsigned variants of the sub-word extended types.
 345Read more about these types in the \nameref{sec:aggregate-types}
 346and \nameref{sec:functions} sections.
 347
 348\subsection{Subtyping}
 349\label{sec:subtyping}
 350
 351The IL has a minimal subtyping feature, for integer types only. Any
 352value of type \texttt{l} can be used in a \texttt{w} context. In that case, only the
 35332 least significant bits of the word value are used.
 354
 355Make note that it is the opposite of the usual subtyping on integers (in
 356C, we can safely use an \texttt{int} where a \texttt{long} is expected). A long value
 357cannot be used in word context. The rationale is that a word can be
 358signed or unsigned, so extending it to a long could be done in two ways,
 359either by zero-extension, or by sign-extension.
 360
 361\subsection{Constants and Vals}
 362\label{sec:constants-and-vals}
 363
 364\begin{code}
 365dynConst :: Parser Q.DynConst
 366dynConst =
 367  (Q.Const <$> constant)
 368    <|> (Q.Thread <$> (key "thread" >> global))
 369    <|> (Q.Extern <$> try (key "extern" >> global))
 370    <|> (Q.ExternThread <$> (key "extern" >> key "thread" >> global))
 371    <?> "dynconst"
 372  where
 373    key s = ws1 $ string s
 374\end{code}
 375
 376Constants come in two kinds: compile-time constants and dynamic
 377constants. Dynamic constants include compile-time constants and other
 378symbol variants that are only known at program-load time or execution
 379time. Consequently, dynamic constants can only occur in function bodies.
 380
 381When the \texttt{extern} keyword prefixes a symbol name, the symbol is
 382accessed indirectly through a table edited by the dynamic linker (e.g.,
 383GOT/PLT). This enables PIE/PIC code generation. When \texttt{extern} is
 384combined with \texttt{thread}, the symbol is accessed using the
 385initial-exec TLS model, suitable for thread-local variables defined in
 386shared objects available at startup time (i.e., not loaded through
 387dlopen).
 388
 389The representation of integers is two's complement.
 390Floating-point numbers are represented using the single-precision and
 391double-precision formats of the IEEE 754 standard.
 392
 393\begin{code}
 394constant :: Parser Q.Const
 395constant =
 396  (Q.Number <$> decNumber)
 397    <|> (Q.SFP <$> sfp)
 398    <|> (Q.DFP <$> dfp)
 399    <|> (Q.Global <$> global)
 400    <?> "const"
 401  where
 402    sfp = string "s_" >> float
 403    dfp = string "d_" >> float
 404\end{code}
 405
 406Constants specify a sequence of bits and are untyped. They are always
 407parsed as 64-bit blobs. Depending on the context surrounding a constant,
 408only some of its bits are used. For example, in the program below, the
 409two variables defined have the same value since the first operand of the
 410subtraction is a word (32-bit) context.
 411
 412\begin{verbatim}
 413%x =w sub -1, 0 %y =w sub 4294967295, 0
 414\end{verbatim}
 415
 416Because specifying floating-point constants by their bits makes the code
 417less readable, syntactic sugar is provided to express them. Standard
 418scientific notation is prefixed with \texttt{s\_} and \texttt{d\_} for single and
 419double precision numbers respectively. Once again, the following example
 420defines twice the same double-precision constant.
 421
 422\begin{verbatim}
 423%x =d add d_0, d_-1
 424%y =d add d_0, -4616189618054758400
 425\end{verbatim}
 426
 427Global symbols can also be used directly as constants; they will be
 428resolved and turned into actual numeric constants by the linker.
 429
 430When the \texttt{thread} keyword prefixes a symbol name, the
 431symbol\textquotesingle s numeric value is resolved at runtime in the
 432thread-local storage.
 433
 434\begin{code}
 435val :: Parser Q.Value
 436val =
 437  (Q.VConst <$> dynConst)
 438    <|> (Q.VLocal <$> local)
 439    <?> "val"
 440\end{code}
 441
 442Vals are used as arguments in regular, phi, and jump instructions within
 443function definitions. They are either constants or function-scope
 444temporaries.
 445
 446\subsection{Linkage}
 447\label{sec:linkage}
 448
 449\begin{code}
 450linkage :: Parser Q.Linkage
 451linkage =
 452  wsNL (bind "export" Q.LExport)
 453    <|> wsNL (bind "thread" Q.LThread)
 454    <|> do
 455      _ <- ws1 $ string "section"
 456      (try secWithFlags) <|> sec
 457  where
 458    sec :: Parser Q.Linkage
 459    sec = wsNL strLit <&> (`Q.LSection` Nothing)
 460
 461    secWithFlags :: Parser Q.Linkage
 462    secWithFlags = do
 463      n <- ws1 strLit
 464      wsNL strLit <&> Q.LSection n . Just
 465\end{code}
 466
 467Function and data definitions (see below) can specify linkage
 468information to be passed to the assembler and eventually to the linker.
 469
 470The \texttt{export} linkage flag marks the defined item as visible outside the
 471current file\textquotesingle s scope. If absent, the symbol can only be
 472referred to locally. Functions compiled by QBE and called from C need to
 473be exported.
 474
 475The \texttt{thread} linkage flag can only qualify data definitions. It mandates
 476that the object defined is stored in thread-local storage. Each time a
 477runtime thread starts, the supporting platform runtime is in charge of
 478making a new copy of the object for the fresh thread. Objects in
 479thread-local storage must be accessed using the \texttt{thread \$IDENT} syntax,
 480as specified in the \nameref{sec:constants-and-vals} section.
 481
 482A \texttt{section} flag can be specified to tell the linker to put the defined
 483item in a certain section. The use of the section flag is platform
 484dependent and we refer the user to the documentation of their assembler
 485and linker for relevant information.
 486
 487\begin{verbatim}
 488section ".init_array" data $.init.f = { l $f }
 489\end{verbatim}
 490
 491The section flag can be used to add function pointers to a global
 492initialization list, as depicted above. Note that some platforms provide
 493a BSS section that can be used to minimize the footprint of uniformly
 494zeroed data. When this section is available, QBE will automatically make
 495use of it and no section flag is required.
 496
 497The section and export linkage flags should each appear at most once in
 498a definition. If multiple occurrences are present, QBE is free to use
 499any.
 500
 501\subsection{Definitions}
 502\label{sec:definitions}
 503
 504Definitions are the essential components of an IL file. They can define
 505three types of objects: aggregate types, data, and functions. Aggregate
 506types are never exported and do not compile to any code. Data and
 507function definitions have file scope and are mutually recursive (even
 508across IL files). Their visibility can be controlled using linkage
 509flags.
 510
 511\subsubsection{Aggregate Types}
 512\label{sec:aggregate-types}
 513
 514\begin{code}
 515typeDef :: Parser Q.TypeDef
 516typeDef = do
 517  _ <- wsNL1 (string "type")
 518  i <- wsNL1 userDef
 519  _ <- wsNL1 (char '=')
 520  a <- optionMaybe alignAny
 521  bracesNL (opaqueType <|> unionType <|> regularType) <&> Q.TypeDef i a
 522\end{code}
 523
 524Aggregate type definitions start with the \texttt{type} keyword. They have file
 525scope, but types must be defined before being referenced. The inner
 526structure of a type is expressed by a comma-separated list of fields.
 527
 528\begin{code}
 529subType :: Parser Q.SubType
 530subType =
 531  (Q.SExtType <$> extType)
 532    <|> (Q.SUserDef <$> userDef)
 533
 534field :: Parser Q.Field
 535field = do
 536  -- TODO: newline is required if there is a number argument
 537  f <- wsNL subType
 538  s <- ws $ optionMaybe decNumber
 539  pure (f, s)
 540
 541fields :: Bool -> Parser [Q.Field]
 542fields allowEmpty =
 543  (if allowEmpty then sepByTrail else sepByTrail1) field (wsNL $ char ',')
 544\end{code}
 545
 546A field consists of a subtype, either an extended type or a user-defined type,
 547and an optional number expressing the value of this field. In case many items
 548of the same type are sequenced (like in a C array), the shorter array syntax
 549can be used.
 550
 551\begin{code}
 552regularType :: Parser Q.AggType
 553regularType = Q.ARegular <$> fields True
 554\end{code}
 555
 556Three different kinds of aggregate types are presentl ysupported: regular
 557types, union types and opaque types. The fields of regular types will be
 558packed. By default, the alignment of an aggregate type is the maximum alignment
 559of its members. The alignment can be explicitly specified by the programmer.
 560
 561\begin{code}
 562unionType :: Parser Q.AggType
 563unionType = Q.AUnion <$> many1 (wsNL unionType')
 564  where
 565    unionType' :: Parser [Q.Field]
 566    unionType' = bracesNL $ fields False
 567\end{code}
 568
 569Union types allow the same chunk of memory to be used with different layouts. They are defined by enclosing multiple regular aggregate type bodies in a pair of curly braces. Size and alignment of union types are set to the maximum size and alignment of each variation or, in the case of alignment, can be explicitly specified.
 570
 571\begin{code}
 572opaqueType :: Parser Q.AggType
 573opaqueType = Q.AOpaque <$> wsNL decNumber
 574\end{code}
 575
 576Opaque types are used when the inner structure of an aggregate cannot be specified; the alignment for opaque types is mandatory. They are defined simply by enclosing their size between curly braces.
 577
 578\subsubsection{Data}
 579\label{sec:data}
 580
 581\begin{code}
 582dataDef :: Parser Q.DataDef
 583dataDef = do
 584  link <- many linkage
 585  name <- wsNL1 (string "data") >> wsNL global
 586  _ <- wsNL (char '=')
 587  alignment <- optionMaybe alignAny
 588  bracesNL dataObjs <&> Q.DataDef link name alignment
 589 where
 590    -- TODO: sepByTrail is not documented in the QBE BNF.
 591    dataObjs = sepByTrail dataObj (wsNL $ char ',')
 592\end{code}
 593
 594Data definitions express objects that will be emitted in the compiled
 595file. Their visibility and location in the compiled artifact are
 596controlled with linkage flags described in the \nameref{sec:linkage}
 597section.
 598
 599They define a global identifier (starting with the sigil \texttt{\$}), that
 600will contain a pointer to the object specified by the definition.
 601
 602\begin{code}
 603dataObj :: Parser Q.DataObj
 604dataObj =
 605  (Q.OZeroFill <$> (wsNL1 (char 'z') >> wsNL decNumber))
 606    <|> do
 607      t <- wsNL1 extType
 608      i <- many1 (wsNL dataItem)
 609      return $ Q.OItem t i
 610\end{code}
 611
 612Objects are described by a sequence of fields that start with a type
 613letter. This letter can either be an extended type, or the \texttt{z} letter.
 614If the letter used is an extended type, the data item following
 615specifies the bits to be stored in the field.
 616
 617\begin{code}
 618dataItem :: Parser Q.DataItem
 619dataItem =
 620  (Q.DString <$> strLit)
 621    <|> try
 622      ( do
 623          i <- ws global
 624          off <- (ws $ char '+') >> ws decNumber
 625          return $ Q.DSymOff i off
 626      )
 627    <|> (Q.DConst <$> constant)
 628\end{code}
 629
 630Within each object, several items can be defined. When several data items
 631follow a letter, they initialize multiple fields of the same size.
 632
 633\begin{code}
 634allocSize :: Parser Q.AllocSize
 635allocSize =
 636  choice
 637    [ bind "4" Q.AllocWord,
 638      bind "8" Q.AllocLong,
 639      bind "16" Q.AllocLongLong
 640    ]
 641\end{code}
 642
 643The members of a struct will be packed. This means that padding has to
 644be emitted by the frontend when necessary. Alignment of the whole data
 645objects can be manually specified, and when no alignment is provided,
 646the maximum alignment from the platform is used.
 647
 648When the \texttt{z} letter is used the number following indicates the size of
 649the field; the contents of the field are zero initialized. It can be
 650used to add padding between fields or zero-initialize big arrays.
 651
 652\subsubsection{Functions}
 653\label{sec:functions}
 654
 655\begin{code}
 656funcDef :: Parser Q.FuncDef
 657funcDef = do
 658  link <- many linkage
 659  _ <- ws1 (string "function")
 660  retTy <- optionMaybe (ws1 abity)
 661  name <- ws global
 662  args <- wsNL params
 663  body <- between (wsNL1 $ char '{') (wsNL $ char '}') $ many1 block
 664
 665  case (Q.insertJumps body) of
 666    Nothing -> fail $ "invalid fallthrough in " ++ show name
 667    Just bl -> return $ Q.FuncDef link name retTy args bl
 668\end{code}
 669
 670Function definitions contain the actual code to emit in the compiled
 671file. They define a global symbol that contains a pointer to the
 672function code. This pointer can be used in \texttt{call} instructions or stored
 673in memory.
 674
 675\begin{code}
 676subWordType :: Parser Q.SubWordType
 677subWordType = choice
 678  [ try $ bind "sb" Q.SignedByte
 679  , try $ bind "ub" Q.UnsignedByte
 680  , bind "sh" Q.SignedHalf
 681  , bind "uh" Q.UnsignedHalf ]
 682
 683abity :: Parser Q.Abity
 684abity = try (Q.ASubWordType <$> subWordType)
 685    <|> (Q.ABase <$> baseType)
 686    <|> (Q.AUserDef <$> userDef)
 687\end{code}
 688
 689The type given right before the function name is the return type of the
 690function. All return values of this function must have this return type.
 691If the return type is missing, the function must not return any value.
 692
 693\begin{code}
 694param :: Parser Q.FuncParam
 695param = (Q.Env <$> (ws1 (string "env") >> local))
 696    <|> (string "..." >> pure Q.Variadic)
 697    <|> do
 698          ty <- ws1 abity
 699          Q.Regular ty <$> local
 700
 701params :: Parser [Q.FuncParam]
 702params = parenLst param
 703\end{code}
 704
 705The parameter list is a comma separated list of temporary names prefixed
 706by types. The types are used to correctly implement C compatibility.
 707When an argument has an aggregate type, a pointer to the aggregate is
 708passed by thea caller. In the example below, we have to use a load
 709instruction to get the value of the first (and only) member of the
 710struct.
 711
 712\begin{verbatim}
 713type :one = { w }
 714
 715function w $getone(:one %p) {
 716@start
 717        %val =w loadw %p
 718        ret %val
 719}
 720\end{verbatim}
 721
 722If a function accepts or returns values that are smaller than a word,
 723such as \texttt{signed char} or \texttt{unsigned short} in C, one of the sub-word type
 724must be used. The sub-word types \texttt{sb}, \texttt{ub}, \texttt{sh}, and \texttt{uh} stand,
 725respectively, for signed and unsigned 8-bit values, and signed and
 726unsigned 16-bit values. Parameters associated with a sub-word type of
 727bit width N only have their N least significant bits set and have base
 728type \texttt{w}. For example, the function
 729
 730\begin{verbatim}
 731function w $addbyte(w %a, sb %b) {
 732@start
 733        %bw =w extsb %b
 734        %val =w add %a, %bw
 735        ret %val
 736}
 737\end{verbatim}
 738
 739needs to sign-extend its second argument before the addition. Dually,
 740return values with sub-word types do not need to be sign or zero
 741extended.
 742
 743If the parameter list ends with \texttt{...}, the function is a variadic
 744function: it can accept a variable number of arguments. To access the
 745extra arguments provided by the caller, use the \texttt{vastart} and \texttt{vaarg}
 746instructions described in the \nameref{sec:variadic} section.
 747
 748Optionally, the parameter list can start with an environment parameter
 749\texttt{env \%e}. This special parameter is a 64-bit integer temporary (i.e.,
 750of type \texttt{l}). If the function does not use its environment parameter,
 751callers can safely omit it. This parameter is invisible to a C caller:
 752for example, the function
 753
 754\begin{verbatim}
 755export function w $add(env %e, w %a, w %b) {
 756@start
 757        %c =w add %a, %b
 758        ret %c
 759}
 760\end{verbatim}
 761
 762must be given the C prototype \texttt{int add(int, int)}. The intended use of
 763this feature is to pass the environment pointer of closures while
 764retaining a very good compatibility with C. The \nameref{sec:call}
 765section explains how to pass an environment parameter.
 766
 767Since global symbols are defined mutually recursive, there is no need
 768for function declarations: a function can be referenced before its
 769definition. Similarly, functions from other modules can be used without
 770previous declaration. All the type information necessary to compile a
 771call is in the instruction itself.
 772
 773The syntax and semantics for the body of functions are described in the
 774\nameref{sec:control} section.
 775
 776\section{Control}
 777\label{sec:control}
 778
 779The IL represents programs as textual transcriptions of control flow
 780graphs. The control flow is serialized as a sequence of blocks of
 781straight-line code which are connected using jump instructions.
 782
 783\subsection{Blocks}
 784\label{sec:blocks}
 785
 786\begin{code}
 787block :: Parser Q.Block'
 788block = do
 789  l <- wsNL1 label
 790  p <- many (wsNL1 $ try phiInstr)
 791  s <- many (wsNL1 statement)
 792  Q.Block' l p s <$> (optionMaybe $ wsNL1 jumpInstr)
 793\end{code}
 794
 795All blocks have a name that is specified by a label at their beginning.
 796Then follows a sequence of instructions that have "fall-through" flow.
 797Finally one jump terminates the block. The jump can either transfer
 798control to another block of the same function or return; jumps are
 799described further below.
 800
 801The first block in a function must not be the target of any jump in the
 802program. If a jump to the function start is needed, the frontend must
 803insert an empty prelude block at the beginning of the function.
 804
 805When one block jumps to the next block in the IL file, it is not
 806necessary to write the jump instruction, it will be automatically added
 807by the parser. For example the start block in the example below jumps
 808directly to the loop block.
 809
 810\subsection{Jumps}
 811\label{sec:jumps}
 812
 813\begin{code}
 814jumpInstr :: Parser Q.JumpInstr
 815jumpInstr = (string "hlt" >> pure Q.Halt)
 816        -- TODO: Return requires a space if there is an optionMaybe
 817        <|> Q.Return <$> ((ws $ string "ret") >> optionMaybe val)
 818        <|> try (Q.Jump <$> ((ws1 $ string "jmp") >> label))
 819        <|> do
 820          _ <- ws1 $ string "jnz"
 821          v <- ws val <* ws (char ',')
 822          l1 <- ws label <* ws (char ',')
 823          l2 <- ws label
 824          return $ Q.Jnz v l1 l2
 825\end{code}
 826
 827A jump instruction ends every block and transfers the control to another
 828program location. The target of a jump must never be the first block in
 829a function. The three kinds of jumps available are described in the
 830following list.
 831
 832\begin{enumerate}
 833  \item \textbf{Unconditional jump.} Jumps to another block of the same function.
 834  \item \textbf{Conditional jump.} When its word argument is non-zero, it jumps to its first label argument; otherwise it jumps to the other label. The argument must be of word type; because of subtyping a long argument can be passed, but only its least significant 32 bits will be compared to 0.
 835  \item \textbf{Function return.} Terminates the execution of the current function, optionally returning a value to the caller. The value returned must be of the type given in the function prototype. If the function prototype does not specify a return type, no return value can be used.
 836  \item \textbf{Program termination.} Terminates the execution of the program with a target-dependent error. This instruction can be used when it is expected that the execution never reaches the end of the block it closes; for example, after having called a function such as \texttt{exit()}.
 837\end{enumerate}
 838
 839\section{Instructions}
 840\label{sec:instructions}
 841
 842\begin{code}
 843instr :: Parser Q.Instr
 844instr =
 845  choice
 846    [ try $ binaryInstr Q.Add "add",
 847      try $ binaryInstr Q.Sub "sub",
 848      try $ binaryInstr Q.Mul "mul",
 849      try $ binaryInstr Q.Div "div",
 850      try $ binaryInstr Q.URem "urem",
 851      try $ binaryInstr Q.Rem "rem",
 852      try $ binaryInstr Q.UDiv "udiv",
 853      try $ binaryInstr Q.Or "or",
 854      try $ binaryInstr Q.Xor "xor",
 855      try $ binaryInstr Q.And "and",
 856      try $ binaryInstr Q.Sar "sar",
 857      try $ binaryInstr Q.Shr "shr",
 858      try $ binaryInstr Q.Shl "shl",
 859      try $ unaryInstr Q.Neg "neg",
 860      try $ unaryInstr Q.Cast "cast",
 861      try $ unaryInstr Q.Copy "copy",
 862      try $ unaryInstr Q.VAArg "vaarg",
 863      try $ loadInstr,
 864      try $ allocInstr,
 865      try $ compareInstr,
 866      try $ extInstr,
 867      try $ truncInstr,
 868      try $ fromFloatInstr,
 869      try $ toFloatInstr
 870    ]
 871\end{code}
 872
 873Instructions are the smallest piece of code in the IL, they form the body of
 874\nameref{sec:blocks}. This specification distinguishes instructions and
 875volatile instructions, the latter do not return a value. For the former, the IL
 876uses a three-address code, which means that one instruction computes an
 877operation between two operands and assigns the result to a third one.
 878
 879\begin{code}
 880assign :: Parser Q.Statement
 881assign = do
 882  n <- ws local
 883  t <- ws (char '=') >> ws1 baseType
 884  Q.Assign n t <$> instr
 885
 886volatileInstr :: Parser Q.Statement
 887volatileInstr =
 888  Q.Volatile <$>
 889    (storeInstr <|> blitInstr <|> vastartInstr <|> dbglocInstr)
 890
 891-- TODO: Not documented in the QBE BNF.
 892statement :: Parser Q.Statement
 893statement = (try callInstr) <|> assign <|> volatileInstr
 894\end{code}
 895
 896An instruction has both a name and a return type, this return type is a base
 897type that defines the size of the instruction's result. The type of the
 898arguments can be unambiguously inferred using the instruction name and the
 899return type. For example, for all arithmetic instructions, the type of the
 900arguments is the same as the return type. The two additions below are valid if
 901\texttt{\%y} is a word or a long (because of \nameref{sec:subtyping}).
 902
 903\begin{verbatim}
 904%x =w add 0, %y
 905%z =w add %x, %x
 906\end{verbatim}
 907
 908Some instructions, like comparisons and memory loads have operand types
 909that differ from their return types. For instance, two floating points
 910can be compared to give a word result (0 if the comparison succeeds, 1
 911if it fails).
 912
 913\begin{verbatim}
 914%c =w cgts %a, %b
 915\end{verbatim}
 916
 917In the example above, both operands have to have single type. This is
 918made explicit by the instruction suffix.
 919
 920\subsection{Arithmetic and Bits}
 921
 922\begin{quote}
 923\begin{itemize}
 924\item \texttt{add}, \texttt{sub}, \texttt{div}, \texttt{mul}
 925\item \texttt{neg}
 926\item \texttt{udiv}, \texttt{rem}, \texttt{urem}
 927\item \texttt{or}, \texttt{xor}, \texttt{and}
 928\item \texttt{sar}, \texttt{shr}, \texttt{shl}
 929\end{itemize}
 930\end{quote}
 931
 932The base arithmetic instructions in the first bullet are available for
 933all types, integers and floating points.
 934
 935When \texttt{div} is used with word or long return type, the arguments are
 936treated as signed. The unsigned integral division is available as \texttt{udiv}
 937instruction. When the result of a division is not an integer, it is truncated
 938towards zero.
 939
 940The signed and unsigned remainder operations are available as \texttt{rem} and
 941\texttt{urem}. The sign of the remainder is the same as the one of the
 942dividend. Its magnitude is smaller than the divisor one. These two instructions
 943and \texttt{udiv} are only available with integer arguments and result.
 944
 945Bitwise OR, AND, and XOR operations are available for both integer
 946types. Logical operations of typical programming languages can be
 947implemented using \nameref{sec:comparisions} and \nameref{sec:jumps}.
 948
 949Shift instructions \texttt{sar}, \texttt{shr}, and \texttt{shl}, shift right or
 950left their first operand by the amount from the second operand. The shifting
 951amount is taken modulo the size of the result type. Shifting right can either
 952preserve the sign of the value (using \texttt{sar}), or fill the newly freed
 953bits with zeroes (using \texttt{shr}). Shifting left always fills the freed
 954bits with zeroes.
 955
 956Remark that an arithmetic shift right (\texttt{sar}) is only equivalent to a
 957division by a power of two for non-negative numbers. This is because the shift
 958right "truncates" towards minus infinity, while the division truncates towards
 959zero.
 960
 961\subsection{Memory}
 962\label{sec:memory}
 963
 964The following sections discuss instructions for interacting with values stored in memory.
 965
 966\subsubsection{Store instructions}
 967
 968\begin{code}
 969storeInstr :: Parser Q.VolatileInstr
 970storeInstr = do
 971  t <- string "store" >> ws1 extType
 972  v <- ws val
 973  _ <- ws $ char ','
 974  ws val <&> Q.Store t v
 975\end{code}
 976
 977Store instructions exist to store a value of any base type and any extended
 978type. Since halfwords and bytes are not first class in the IL, \texttt{storeh}
 979and \texttt{storeb} take a word as argument. Only the first 16 or 8 bits of
 980this word will be stored in memory at the address specified in the second
 981argument.
 982
 983\subsubsection{Load instructions}
 984
 985\begin{code}
 986loadInstr :: Parser Q.Instr
 987loadInstr = do
 988  _ <- string "load"
 989  t <- ws1 $ choice
 990    [ try $ bind "sw" (Q.LBase Q.Word),
 991      try $ bind "uw" (Q.LBase Q.Word),
 992      try $ Q.LSubWord <$> subWordType,
 993      Q.LBase <$> baseType
 994    ]
 995  ws val <&> Q.Load t
 996\end{code}
 997
 998For types smaller than long, two variants of the load instruction are
 999available: one will sign extend the loaded value, while the other will zero
1000extend it. Note that all loads smaller than long can load to either a long or a
1001word.
1002
1003The two instructions \texttt{loadsw} and \texttt{loaduw} have the same effect
1004when they are used to define a word temporary. A \texttt{loadw} instruction is
1005provided as syntactic sugar for \texttt{loadsw} to make explicit that the
1006extension mechanism used is irrelevant.
1007
1008\subsubsection{Blits}
1009
1010\begin{code}
1011blitInstr :: Parser Q.VolatileInstr
1012blitInstr = do
1013  v1 <- (ws1 $ string "blit") >> ws val <* (ws $ char ',')
1014  v2 <- ws val <* (ws $ char ',')
1015  nb <- decNumber
1016  return $ Q.Blit v1 v2 nb
1017\end{code}
1018
1019The blit instruction copies in-memory data from its first address argument to
1020its second address argument. The third argument is the number of bytes to copy.
1021The source and destination spans are required to be either non-overlapping, or
1022fully overlapping (source address identical to the destination address). The
1023byte count argument must be a nonnegative numeric constant; it cannot be a
1024temporary.
1025
1026One blit instruction may generate a number of instructions proportional to its
1027byte count argument, consequently, it is recommended to keep this argument
1028relatively small. If large copies are necessary, it is preferable that
1029frontends generate calls to a supporting \texttt{memcpy} function.
1030
1031\subsubsection{Stack Allocation}
1032
1033\begin{code}
1034allocInstr :: Parser Q.Instr
1035allocInstr = do
1036  siz <- (ws $ string "alloc") >> (ws1 allocSize)
1037  val <&> Q.Alloc siz
1038\end{code}
1039
1040These instructions allocate a chunk of memory on the stack. The number ending
1041the instruction name is the alignment required for the allocated slot. QBE will
1042make sure that the returned address is a multiple of that alignment value.
1043
1044Stack allocation instructions are used, for example, when compiling the C local
1045variables, because their address can be taken. When compiling Fortran,
1046temporaries can be used directly instead, because it is illegal to take the
1047address of a variable.
1048
1049\subsection{Comparisons}
1050\label{sec:comparisions}
1051
1052\begin{code}
1053compareInstr :: Parser Q.Instr
1054compareInstr = do
1055  _ <- char 'c'
1056  (try intCompare) <|> floatCompare
1057
1058compareArgs :: Parser (Q.Value, Q.Value)
1059compareArgs = do
1060  lhs <- ws val <* ws (char ',')
1061  rhs <- ws val
1062  pure (lhs, rhs)
1063
1064intCompare :: Parser Q.Instr
1065intCompare = do
1066  op <- compareIntOp
1067  ty <- ws1 intArg
1068
1069  (lhs, rhs) <- compareArgs
1070  pure $ Q.CompareInt ty op lhs rhs
1071
1072floatCompare :: Parser Q.Instr
1073floatCompare = do
1074  op <- compareFloatOp
1075  ty <- ws1 floatArg
1076
1077  (lhs, rhs) <- compareArgs
1078  pure $ Q.CompareFloat ty op lhs rhs
1079\end{code}
1080
1081Comparison instructions return an integer value (either a word or a long), and
1082compare values of arbitrary types. The returned value is 1 if the two operands
1083satisfy the comparison relation, or 0 otherwise. The names of comparisons
1084respect a standard naming scheme in three parts:
1085
1086\begin{enumerate}
1087  \item All comparisons start with the letter \texttt{c}.
1088  \item Then comes a comparison type.
1089  \item Finally, the instruction name is terminated with a basic type suffix precising the type of the operands to be compared.
1090\end{enumerate}
1091
1092The following instruction are available for integer comparisons:
1093
1094\begin{code}
1095compareIntOp :: Parser Q.IntCmpOp
1096compareIntOp = choice
1097  [ bind "eq" Q.IEq
1098  , bind "ne" Q.INe
1099  , try $ bind "sle" Q.ISle
1100  , try $ bind "slt" Q.ISlt
1101  , try $ bind "sge" Q.ISge
1102  , try $ bind "sgt" Q.ISgt
1103  , try $ bind "ule" Q.IUle
1104  , try $ bind "ult" Q.IUlt
1105  , try $ bind "uge" Q.IUge
1106  , try $ bind "ugt" Q.IUgt ]
1107\end{code}
1108
1109For floating point comparisons use one of these instructions:
1110
1111\begin{code}
1112compareFloatOp :: Parser Q.FloatCmpOp
1113compareFloatOp = choice
1114  [ bind "eq" Q.FEq
1115  , bind "ne" Q.FNe
1116  , try $ bind "le" Q.FLe
1117  , bind "lt" Q.FLt
1118  , try $ bind "ge" Q.FGe
1119  , bind "gt" Q.FGt
1120  , bind "o" Q.FOrd
1121  , bind "uo" Q.FUnord ]
1122\end{code}
1123
1124For example, \texttt{cod} compares two double-precision floating point numbers
1125and returns 1 if the two floating points are not NaNs, or 0 otherwise. The
1126\texttt{csltw} instruction compares two words representing signed numbers and
1127returns 1 when the first argument is smaller than the second one.
1128
1129\subsection{Conversions}
1130
1131Conversion operations change the representation of a value, possibly modifying
1132it if the target type cannot hold the value of the source type. Conversions can
1133extend the precision of a temporary (e.g., from signed 8-bit to 32-bit), or
1134convert a floating point into an integer and vice versa.
1135
1136\begin{code}
1137extInstr :: Parser Q.Instr
1138extInstr = do
1139  _ <- string "ext"
1140  ty <- ws1 extArg
1141  ws val <&> Q.Ext ty
1142 where
1143  extArg :: Parser Q.ExtArg
1144  extArg = try (Q.ExtSubWord <$> subWordType)
1145    <|> try (bind "sw" Q.ExtSignedWord)
1146    <|> bind "s" Q.ExtSingle
1147    <|> bind "uw" Q.ExtUnsignedWord
1148\end{code}
1149
1150Extending the precision of a temporary is done using the \texttt{ext} family of
1151instructions. Because QBE types do not specify the signedness (like in LLVM),
1152extension instructions exist to sign-extend and zero-extend a value. For
1153example, \texttt{extsb} takes a word argument and sign-extends the 8
1154least-significant bits to a full word or long, depending on the return type.
1155
1156\begin{code}
1157truncInstr :: Parser Q.Instr
1158truncInstr = do
1159  _ <- ws1 $ string "truncd"
1160  ws val <&> Q.TruncDouble
1161\end{code}
1162
1163The instructions \texttt{exts} (extend single) and \texttt{truncd} (truncate
1164double) are provided to change the precision of a floating point value. When
1165the double argument of truncd cannot be represented as a single-precision
1166floating point, it is truncated towards zero.
1167
1168\begin{code}
1169floatArg :: Parser Q.FloatArg
1170floatArg = bind "d" Q.FDouble <|> bind "s" Q.FSingle
1171
1172fromFloatInstr :: Parser Q.Instr
1173fromFloatInstr = do
1174  arg <- floatArg <* string "to"
1175  isSigned <- signageChar
1176  _ <- ws1 $ char 'i'
1177  ws val <&> Q.FloatToInt arg isSigned
1178
1179intArg :: Parser Q.IntArg
1180intArg = bind "w" Q.IWord <|> bind "l" Q.ILong
1181
1182toFloatInstr :: Parser Q.Instr
1183toFloatInstr = do
1184  isSigned <- signageChar
1185  arg <- intArg
1186  _ <- ws1 $ string "tof"
1187  ws val <&> Q.IntToFloat arg isSigned
1188\end{code}
1189
1190Converting between signed integers and floating points is done using
1191\texttt{stosi} (single to signed integer), \texttt{stoui} (single to unsigned
1192integer), \texttt{dtosi} (double to signed integer), \texttt{dtoui} (double to
1193unsigned integer), \texttt{swtof} (signed word to float), \texttt{uwtof}
1194(unsigned word to float), \texttt{sltof} (signed long to float) and
1195\texttt{ultof} (unsigned long to float).
1196
1197\subsection{Cast and Copy}
1198
1199The \texttt{cast} and \texttt{copy} instructions return the bits of their
1200argument verbatim. However a cast will change an integer into a floating point
1201of the same width and vice versa.
1202
1203Casts can be used to make bitwise operations on the representation of floating
1204point numbers. For example the following program will compute the opposite of
1205the single-precision floating point number \texttt{\%f} into \texttt{\%rs}.
1206
1207\begin{verbatim}
1208%b0 =w cast %f
1209%b1 =w xor 2147483648, %b0  # flip the msb
1210%rs =s cast %b1
1211\end{verbatim}
1212
1213\subsection{Call}
1214\label{sec:call}
1215
1216\begin{code}
1217-- TODO: Code duplication with 'param'.
1218callArg :: Parser Q.FuncArg
1219callArg = (Q.ArgEnv <$> (ws1 (string "env") >> val))
1220    <|> (string "..." >> pure Q.ArgVar)
1221    <|> do
1222          ty <- ws1 abity
1223          Q.ArgReg ty <$> val
1224
1225callArgs :: Parser [Q.FuncArg]
1226callArgs = parenLst callArg
1227
1228callInstr :: Parser Q.Statement
1229callInstr = do
1230  retValue <- optionMaybe $ do
1231    i <- ws local <* ws (char '=')
1232    a <- ws1 abity
1233    return (i, a)
1234  toCall <- ws1 (string "call") >> ws val
1235  fnArgs <- callArgs
1236  return $ Q.Call retValue toCall fnArgs
1237\end{code}
1238
1239The call instruction is special in several ways. It is not a three-address
1240instruction and requires the type of all its arguments to be given. Also, the
1241return type can be either a base type or an aggregate type. These specifics are
1242required to compile calls with C compatibility (i.e., to respect the ABI).
1243
1244When an aggregate type is used as argument type or return type, the value
1245respectively passed or returned needs to be a pointer to a memory location
1246holding the value. This is because aggregate types are not first-class
1247citizens of the IL.
1248
1249Sub-word types are used for arguments and return values of width less than a
1250word. Details on these types are presented in the \nameref{sec:functions} section.
1251Arguments with sub-word types need not be sign or zero extended according to
1252their type. Calls with a sub-word return type define a temporary of base type
1253\texttt{w} with its most significant bits unspecified.
1254
1255Unless the called function does not return a value, a return temporary must be
1256specified, even if it is never used afterwards.
1257
1258An environment parameter can be passed as first argument using the \texttt{env}
1259keyword. The passed value must be a 64-bit integer. If the called function does
1260not expect an environment parameter, it will be safely discarded. See the
1261\nameref{sec:functions} section for more information about environment
1262parameters.
1263
1264When the called function is variadic, there must be a \texttt{...} marker
1265separating the named and variadic arguments.
1266
1267\subsection{Variadic}
1268\label{sec:variadic}
1269
1270\begin{code}
1271vastartInstr :: Parser Q.VolatileInstr
1272vastartInstr = do
1273  _ <- ws1 (string "vastart")
1274  Q.VAStart <$> ws val
1275\end{code}
1276
1277The \texttt{vastart} and \texttt{vaarg} instructions provide a portable way to
1278access the extra parameters of a variadic function.
1279
1280\begin{enumerate}
1281  \item \texttt{vastart} -- \texttt{(m)}
1282  \item \texttt{vaarg} -- \texttt{T(mmmm)}
1283\end{enumerate}
1284
1285The \texttt{vastart} instruction initializes a variable argument list used to
1286access the extra parameters of the enclosing variadic function. It is safe to
1287call it multiple times.
1288
1289The \texttt{vaarg} instruction fetches the next argument from a variable
1290argument list. It is currently limited to fetching arguments that have a base
1291type. This instruction is essentially effectful: calling it twice in a row will
1292return two consecutive arguments from the argument list.
1293
1294Both instructions take a pointer to a variable argument list as the sole argument.
1295The size and alignment of the variable argument lists depends on the target used.
1296
1297\subsection{Phi}
1298
1299\begin{code}
1300phiBranch :: Parser (Q.BlockIdent, Q.Value)
1301phiBranch = do
1302  n <- ws1 label
1303  v <- val
1304  pure (n, v)
1305
1306phiInstr :: Parser Q.Phi
1307phiInstr = do
1308  -- TODO: code duplication with 'assign'
1309  n <- ws local
1310  t <- ws (char '=') >> ws1 baseType
1311
1312  _ <- ws1 (string "phi")
1313  -- TODO: combinator for sepBy
1314  p <- Map.fromList <$> sepBy1 (ws phiBranch) (ws $ char ',')
1315  return $ Q.Phi n t p
1316\end{code}
1317
1318First and foremost, phi instructions are NOT necessary when writing a frontend
1319to QBE. One solution to avoid having to deal with SSA form is to use stack
1320allocated variables for all source program variables and perform assignments
1321and lookups using \nameref{sec:memory} operations. This is what LLVM users
1322typically do.
1323
1324Another solution is to simply emit code that is not in SSA form! Contrary to
1325LLVM, QBE is able to fixup programs not in SSA form without requiring the
1326boilerplate of loading and storing in memory. For example, the following
1327program will be correctly compiled by QBE.
1328
1329\begin{verbatim}
1330@start
1331    %x =w copy 100
1332    %s =w copy 0
1333@loop
1334    %s =w add %s, %x
1335    %x =w sub %x, 1
1336    jnz %x, @loop, @end
1337@end
1338    ret %s
1339\end{verbatim}
1340
1341Now, if you want to know what phi instructions are and how to use them in QBE,
1342you can read the following.
1343
1344Phi instructions are specific to SSA form. In SSA form values can only be
1345assigned once, without phi instructions, this requirement is too strong to
1346represent many programs. For example consider the following C program.
1347
1348\begin{verbatim}
1349int f(int x) {
1350    int y;
1351    if (x)
1352        y = 1;
1353    else
1354        y = 2;
1355    return y;
1356}
1357\end{verbatim}
1358
1359The variable \texttt{y} is assigned twice, the solution to translate it in SSA
1360form is to insert a phi instruction.
1361
1362\begin{verbatim}
1363@ifstmt
1364    jnz %x, @ift, @iff
1365@ift
1366    jmp @retstmt
1367@iff
1368    jmp @retstmt
1369@retstmt
1370    %y =w phi @ift 1, @iff 2
1371    ret %y
1372\end{verbatim}
1373
1374Phi instructions return one of their arguments depending on where the control
1375came from. In the example, \texttt{\%y} is set to 1 if the
1376\texttt{\textbackslash{}ift} branch is taken, or it is set to 2 otherwise.
1377
1378An important remark about phi instructions is that QBE assumes that if a
1379variable is defined by a phi it respects all the SSA invariants. So it is
1380critical to not use phi instructions unless you know exactly what you are
1381doing.
1382
1383\subsection{Debug Information}
1384
1385QBE supports the inclusion of debug information. Specifically, it allows
1386defining from which source file type, data, and function definitions originated.
1387For this purpose, it provides the \texttt{dbgfile} definition, which receives a
1388file name (string literal) as its sole argument. Every type, data and function
1389definition thereafter are assumed to originate in this file.
1390
1391\begin{code}
1392-- TODO: not documnted in the QBE BNF.
1393fileDef :: Parser String
1394fileDef = do
1395  _ <- ws1 $ string "dbgfile"
1396  wsNL1 strLit
1397\end{code}
1398
1399Further, instructions within a function can be associated with a specific line
1400and column number of a previously defined \texttt{dbgfile}. The
1401\texttt{dbgfile} is referenced by index using the first argument to
1402\texttt{dbgloc}. The second argument represents the line number, the third
1403(optional) argument the column number.
1404
1405\begin{code}
1406-- TODO: not documnted in the QBE BNF.
1407dbglocInstr :: Parser Q.VolatileInstr
1408dbglocInstr = do
1409  _ <- ws1 $ string "dbgloc"
1410  file <- ws decNumber <* ws (char ',')
1411  line <- ws decNumber
1412  col  <- optionMaybe (ws (char ',') >> ws decNumber)
1413  return $ Q.DBGLoc file line col
1414\end{code}
1415
1416\end{document}