quebex

A software analysis framework built around the QBE intermediate language

git clone https://git.8pit.net/quebex.git

   1% SPDX-FileCopyrightText: 2015-2024 Quentin Carbonneaux <quentin@c9x.me>
   2% SPDX-FileCopyrightText: 2025-2026 Sören Tempel <soeren+git@soeren-tempel.net>
   3%
   4% SPDX-License-Identifier: MIT AND GPL-3.0-only
   5
   6\documentclass{article}
   7%include polycode.fmt
   8
   9%subst blankline = "\\[5mm]"
  10
  11% See https://github.com/kosmikus/lhs2tex/issues/58
  12%format <$> = "\mathbin{\langle\$\rangle}"
  13%format <&> = "\mathbin{\langle\&\rangle}"
  14%format <|> = "\mathbin{\langle\:\vline\:\rangle}"
  15%format <?> = "\mathbin{\langle?\rangle}"
  16%format <*> = "\mathbin{\langle*\rangle}"
  17%format <*  = "\mathbin{\langle*}"
  18%format *>  = "\mathbin{*\rangle}"
  19
  20\long\def\ignore#1{}
  21
  22\usepackage{hyperref}
  23\hypersetup{
  24	colorlinks = true,
  25}
  26
  27\begin{document}
  28
  29\title{QBE Intermediate Language\vspace{-2em}}
  30\date{}
  31\maketitle
  32\frenchspacing
  33
  34\ignore{
  35\begin{code}
  36module Language.QBE.Parser
  37  ( skipInitComments,
  38    dataDef,
  39    typeDef,
  40    funcDef,
  41    fileDef
  42  )
  43where
  44
  45import Control.Monad (foldM)
  46import Data.Char (chr)
  47import Data.Word (Word64)
  48import Data.Functor ((<&>))
  49import Data.List (singleton)
  50import Data.Map qualified as Map
  51import qualified Language.QBE.Types as Q
  52import Language.QBE.Util (bind, decNumber, octNumber, float)
  53import Text.ParserCombinators.Parsec
  54  ( Parser,
  55    alphaNum,
  56    anyChar,
  57    between,
  58    char,
  59    choice,
  60    letter,
  61    many,
  62    many1,
  63    manyTill,
  64    newline,
  65    noneOf,
  66    oneOf,
  67    optional,
  68    optionMaybe,
  69    sepBy,
  70    sepBy1,
  71    skipMany,
  72    skipMany1,
  73    string,
  74    try,
  75    (<?>),
  76    (<|>),
  77  )
  78\end{code}
  79}
  80
  81This an executable description of the
  82\href{https://c9x.me/compile/doc/il-v1.2.html}{QBE intermediate language},
  83specified through \href{https://hackage.haskell.org/package/parsec}{Parsec}
  84parser combinators and generated from a literate Haskell file. The description
  85is derived from the original QBE IL documentation, licensed under MIT.
  86Presently, this implementation targets version 1.2 of the QBE intermediate
  87language and aims to be equivalent with the original specification.
  88
  89\section{Basic Concepts}
  90
  91The intermediate language (IL) is a higher-level language than the
  92machine's assembly language. It smoothes most of the
  93irregularities of the underlying hardware and allows an infinite number
  94of temporaries to be used. This higher abstraction level lets frontend
  95programmers focus on language design issues.
  96
  97\subsection{Input Files}
  98
  99The intermediate language is provided to QBE as text. Usually, one file
 100is generated per each compilation unit from the frontend input language.
 101An IL file is a sequence of \nameref{sec:definitions} for
 102data, functions, and types. Once processed by QBE, the resulting file
 103can be assembled and linked using a standard toolchain (e.g., GNU
 104binutils).
 105
 106\begin{code}
 107comment :: Parser ()
 108comment = skipMany blankNL >> comment' >> skipMany blankNL
 109  where
 110    comment' = char '#' >> manyTill anyChar newline
 111\end{code}
 112
 113\ignore{
 114\begin{code}
 115skipNoCode :: Parser () -> Parser ()
 116skipNoCode blankP = try (skipMany1 comment <?> "comments") <|> blankP
 117\end{code}
 118}
 119
 120Here is a complete "Hello World" IL file which defines a function that
 121prints to the screen. Since the string is not a first class object (only
 122the pointer is) it is defined outside the function\textquotesingle s
 123body. Comments start with a \# character and finish with the end of the
 124line.
 125
 126\begin{verbatim}
 127data $str = { b "hello world", b 0 }
 128
 129export function w $main() {
 130@start
 131        # Call the puts function with $str as argument.
 132        %r =w call $puts(l $str)
 133        ret 0
 134}
 135\end{verbatim}
 136
 137If you have read the LLVM language reference, you might recognize the
 138example above. In comparison, QBE makes a much lighter use of types and
 139the syntax is terser.
 140
 141\subsection{Parser Combinators}
 142
 143\ignore{
 144\begin{code}
 145bracesNL :: Parser a -> Parser a
 146bracesNL = between (wsNL $ char '{') (wsNL $ char '}')
 147
 148quoted :: Parser a -> Parser a
 149quoted = let q = char '"' in between q q
 150
 151sepByTrail1 :: Parser a -> Parser sep -> Parser [a]
 152sepByTrail1 p sep = do
 153  x <- p
 154  xs <- many (try $ sep >> p)
 155  _ <- optional sep
 156  return (x:xs)
 157
 158sepByTrail :: Parser a -> Parser sep -> Parser [a]
 159sepByTrail p sep = sepByTrail1 p sep <|> return []
 160
 161parenLst :: Parser a -> Parser [a]
 162parenLst p = between (ws $ char '(') (char ')') inner
 163  where
 164    inner = sepBy (ws p) (ws $ char ',')
 165
 166unaryInstr :: (Q.Value -> Q.Instr) -> String -> Parser Q.Instr
 167unaryInstr conc keyword = do
 168  _ <- ws (string keyword)
 169  conc <$> ws val
 170
 171binaryInstr :: (Q.Value -> Q.Value -> Q.Instr) -> String -> Parser Q.Instr
 172binaryInstr conc keyword = do
 173  _ <- ws (string keyword)
 174  vfst <- ws val <* ws (char ',')
 175  conc vfst <$> ws val
 176
 177-- Can only appear in data and type definitions and hence allows newlines.
 178alignAny :: Parser Word64
 179alignAny = (ws1 (string "align")) >> wsNL decNumber
 180
 181-- Returns true if it is signed.
 182signageChar :: Parser Bool
 183signageChar = (char 's' <|> char 'u') <&> (== 's')
 184\end{code}
 185}
 186
 187The original QBE specification defines the syntax using a BNF grammar. In
 188contrast, this document defines it using Parsec parser combinators. As such,
 189this specification is less formal but more accurate as the parsing code is
 190actually executable. Consequently, this specification also captures constructs
 191omitted in the original specification (e.g., \nameref{sec:identifiers}, or
 192\nameref{sec:strlit}). Nonetheless, the formal language recognized by these
 193combinators aims to be equivalent to the one of the BNF grammar.
 194
 195\subsection{Identifiers}
 196\label{sec:identifiers}
 197
 198% Ident is not documented in the original QBE specification.
 199% See https://c9x.me/git/qbe.git/tree/parse.c?h=v1.2#n304
 200
 201\begin{code}
 202ident :: Parser String
 203ident = do
 204  start <- letter <|> oneOf "._"
 205  rest <- many (alphaNum <|> oneOf "$._")
 206  return $ start : rest
 207\end{code}
 208
 209Identifiers for data, types, and functions can start with any ASCII letter or
 210the special characters \texttt{.} and \texttt{\_}. This initial character can
 211be followed by a sequence of zero or more alphanumeric characters and the
 212special characters \texttt{\$}, \texttt{.}, and \texttt{\_}.
 213
 214\subsection{Sigils}
 215
 216\begin{code}
 217userDef :: Parser Q.UserIdent
 218userDef = Q.UserIdent <$> (char ':' >> ident)
 219
 220global :: Parser Q.GlobalIdent
 221global = Q.GlobalIdent <$> (char '$' >> ident)
 222
 223local :: Parser Q.LocalIdent
 224local = Q.LocalIdent <$> (char '%' >> ident)
 225
 226label :: Parser Q.BlockIdent
 227label = Q.BlockIdent <$> (char '@' >> ident)
 228\end{code}
 229
 230The intermediate language makes heavy use of sigils, all user-defined
 231names are prefixed with a sigil. This is to avoid keyword conflicts, and
 232also to quickly spot the scope and nature of identifiers.
 233
 234\begin{itemize}
 235  \item \texttt{:} is for user-defined \nameref{sec:aggregate-types}
 236  \item \texttt{\$} is for globals (represented by a pointer)
 237  \item \texttt{\%} is for function-scope temporaries
 238  \item \texttt{@@} is for block labels
 239\end{itemize}
 240
 241\subsection{Spacing}
 242
 243\begin{code}
 244blank :: Parser Char
 245blank = oneOf "\t " <?> "blank"
 246
 247blankNL :: Parser Char
 248blankNL = oneOf "\n\t " <?> "blank or newline"
 249\end{code}
 250
 251Individual tokens in IL files must be separated by one or more spacing
 252characters. Both spaces and tabs are recognized as spacing characters.
 253In data and type definitions, newlines may also be used as spaces to
 254prevent overly long lines. When exactly one of two consecutive tokens is
 255a symbol (for example \texttt{,} or \texttt{=} or \texttt{\{}), spacing may be omitted.
 256
 257\ignore{
 258\begin{code}
 259ws :: Parser a -> Parser a
 260ws p = p <* skipMany blank
 261
 262ws1 :: Parser a -> Parser a
 263ws1 p = p <* skipMany1 blank
 264
 265wsNL :: Parser a -> Parser a
 266wsNL p = p <* skipNoCode (skipMany blankNL)
 267
 268wsNL1 :: Parser a -> Parser a
 269wsNL1 p = p <* skipNoCode (skipMany1 blankNL)
 270
 271-- Only intended to be used to skip comments at the start of a file.
 272skipInitComments :: Parser ()
 273skipInitComments = skipNoCode (skipMany blankNL)
 274\end{code}
 275}
 276
 277\subsection{String Literals}
 278\label{sec:strlit}
 279
 280% The string literal is not documented in the original QBE specification.
 281% See https://c9x.me/git/qbe.git/tree/parse.c?h=v1.2#n287
 282
 283\begin{code}
 284strLit :: Parser String
 285strLit = concat <$> quoted (many strChr)
 286  where
 287    strChr :: Parser [Char]
 288    strChr = (singleton <$> noneOf "\"\\") <|> escSeq
 289
 290    -- TODO: not documnted in the QBE BNF.
 291    octEsc :: Parser Char
 292    octEsc = do
 293      n <- octNumber
 294      pure $ chr (fromIntegral n)
 295
 296    escSeq :: Parser [Char]
 297    escSeq = try $ do
 298      esc <- char '\\'
 299      (singleton <$> octEsc) <|> (anyChar <&> (\c -> [esc, c]))
 300\end{code}
 301
 302Strings are enclosed by double quotes and are, for example, used to specify a
 303section name as part of the \nameref{sec:linkage} information. Within a string,
 304a double quote can be escaped using a \texttt{\textbackslash} character. All
 305escape sequences, including double quote escaping, are passed through as-is to
 306the generated assembly file.
 307
 308\section{Types}
 309
 310\subsection{Simple Types}
 311
 312The IL makes minimal use of types. By design, the types used are
 313restricted to what is necessary for unambiguous compilation to machine
 314code and C interfacing. Unlike LLVM, QBE is not using types as a means
 315to safety; they are only here for semantic purposes.
 316
 317\begin{code}
 318baseType :: Parser Q.BaseType
 319baseType = choice
 320  [ bind "w" Q.Word
 321  , bind "l" Q.Long
 322  , bind "s" Q.Single
 323  , bind "d" Q.Double ]
 324\end{code}
 325
 326The four base types are \texttt{w} (word), \texttt{l} (long), \texttt{s} (single), and \texttt{d}
 327(double), they stand respectively for 32-bit and 64-bit integers, and
 32832-bit and 64-bit floating-point numbers. There are no pointer types
 329available; pointers are typed by an integer type sufficiently wide to
 330represent all memory addresses (e.g., \texttt{l} on 64-bit architectures).
 331Temporaries in the IL can only have a base type.
 332
 333\begin{code}
 334extType :: Parser Q.ExtType
 335extType = (Q.Base <$> baseType)
 336       <|> bind "b" Q.Byte
 337       <|> bind "h" Q.HalfWord
 338\end{code}
 339
 340Extended types contain base types plus \texttt{b} (byte) and \texttt{h} (half word),
 341respectively for 8-bit and 16-bit integers. They are used in \nameref{sec:aggregate-types}
 342and \nameref{sec:data} definitions.
 343
 344For C interfacing, the IL also provides user-defined aggregate types as
 345well as signed and unsigned variants of the sub-word extended types.
 346Read more about these types in the \nameref{sec:aggregate-types}
 347and \nameref{sec:functions} sections.
 348
 349\subsection{Subtyping}
 350\label{sec:subtyping}
 351
 352The IL has a minimal subtyping feature, for integer types only. Any
 353value of type \texttt{l} can be used in a \texttt{w} context. In that case, only the
 35432 least significant bits of the word value are used.
 355
 356Make note that it is the opposite of the usual subtyping on integers (in
 357C, we can safely use an \texttt{int} where a \texttt{long} is expected). A long value
 358cannot be used in word context. The rationale is that a word can be
 359signed or unsigned, so extending it to a long could be done in two ways,
 360either by zero-extension, or by sign-extension.
 361
 362\subsection{Constants and Vals}
 363\label{sec:constants-and-vals}
 364
 365\begin{code}
 366dynConst :: Parser Q.DynConst
 367dynConst =
 368  (Q.Const <$> constant)
 369    <|> (Q.Thread <$> (key "thread" >> global))
 370    <|> (Q.Extern <$> try (key "extern" >> global))
 371    <|> (Q.ExternThread <$> (key "extern" >> key "thread" >> global))
 372    <?> "dynconst"
 373  where
 374    key s = ws1 $ string s
 375\end{code}
 376
 377Constants come in two kinds: compile-time constants and dynamic
 378constants. Dynamic constants include compile-time constants and other
 379symbol variants that are only known at program-load time or execution
 380time. Consequently, dynamic constants can only occur in function bodies.
 381
 382When the \texttt{extern} keyword prefixes a symbol name, the symbol is
 383accessed indirectly through a table edited by the dynamic linker (e.g.,
 384GOT/PLT). This enables PIE/PIC code generation. When \texttt{extern} is
 385combined with \texttt{thread}, the symbol is accessed using the
 386initial-exec TLS model, suitable for thread-local variables defined in
 387shared objects available at startup time (i.e., not loaded through
 388dlopen).
 389
 390The representation of integers is two's complement.
 391Floating-point numbers are represented using the single-precision and
 392double-precision formats of the IEEE 754 standard.
 393
 394\begin{code}
 395constant :: Parser Q.Const
 396constant =
 397  (Q.Number <$> decNumber)
 398    <|> (Q.SFP <$> sfp)
 399    <|> (Q.DFP <$> dfp)
 400    <|> (Q.Global <$> global)
 401    <?> "const"
 402  where
 403    sfp = string "s_" >> float
 404    dfp = string "d_" >> float
 405\end{code}
 406
 407Constants specify a sequence of bits and are untyped. They are always
 408parsed as 64-bit blobs. Depending on the context surrounding a constant,
 409only some of its bits are used. For example, in the program below, the
 410two variables defined have the same value since the first operand of the
 411subtraction is a word (32-bit) context.
 412
 413\begin{verbatim}
 414%x =w sub -1, 0 %y =w sub 4294967295, 0
 415\end{verbatim}
 416
 417Because specifying floating-point constants by their bits makes the code
 418less readable, syntactic sugar is provided to express them. Standard
 419scientific notation is prefixed with \texttt{s\_} and \texttt{d\_} for single and
 420double precision numbers respectively. Once again, the following example
 421defines twice the same double-precision constant.
 422
 423\begin{verbatim}
 424%x =d add d_0, d_-1
 425%y =d add d_0, -4616189618054758400
 426\end{verbatim}
 427
 428Global symbols can also be used directly as constants; they will be
 429resolved and turned into actual numeric constants by the linker.
 430
 431When the \texttt{thread} keyword prefixes a symbol name, the
 432symbol\textquotesingle s numeric value is resolved at runtime in the
 433thread-local storage.
 434
 435\begin{code}
 436val :: Parser Q.Value
 437val =
 438  (Q.VConst <$> dynConst)
 439    <|> (Q.VLocal <$> local)
 440    <?> "val"
 441\end{code}
 442
 443Vals are used as arguments in regular, phi, and jump instructions within
 444function definitions. They are either constants or function-scope
 445temporaries.
 446
 447\subsection{Linkage}
 448\label{sec:linkage}
 449
 450\begin{code}
 451linkage :: Parser Q.Linkage
 452linkage =
 453  wsNL (bind "export" Q.LExport)
 454    <|> wsNL (bind "thread" Q.LThread)
 455    <|> do
 456      _ <- ws1 $ string "section"
 457      (try secWithFlags) <|> sec
 458  where
 459    sec :: Parser Q.Linkage
 460    sec = wsNL strLit <&> (`Q.LSection` Nothing)
 461
 462    secWithFlags :: Parser Q.Linkage
 463    secWithFlags = do
 464      n <- ws1 strLit
 465      wsNL strLit <&> Q.LSection n . Just
 466\end{code}
 467
 468Function and data definitions (see below) can specify linkage
 469information to be passed to the assembler and eventually to the linker.
 470
 471The \texttt{export} linkage flag marks the defined item as visible outside the
 472current file\textquotesingle s scope. If absent, the symbol can only be
 473referred to locally. Functions compiled by QBE and called from C need to
 474be exported.
 475
 476The \texttt{thread} linkage flag can only qualify data definitions. It mandates
 477that the object defined is stored in thread-local storage. Each time a
 478runtime thread starts, the supporting platform runtime is in charge of
 479making a new copy of the object for the fresh thread. Objects in
 480thread-local storage must be accessed using the \texttt{thread \$IDENT} syntax,
 481as specified in the \nameref{sec:constants-and-vals} section.
 482
 483A \texttt{section} flag can be specified to tell the linker to put the defined
 484item in a certain section. The use of the section flag is platform
 485dependent and we refer the user to the documentation of their assembler
 486and linker for relevant information.
 487
 488\begin{verbatim}
 489section ".init_array" data $.init.f = { l $f }
 490\end{verbatim}
 491
 492The section flag can be used to add function pointers to a global
 493initialization list, as depicted above. Note that some platforms provide
 494a BSS section that can be used to minimize the footprint of uniformly
 495zeroed data. When this section is available, QBE will automatically make
 496use of it and no section flag is required.
 497
 498The section and export linkage flags should each appear at most once in
 499a definition. If multiple occurrences are present, QBE is free to use
 500any.
 501
 502\subsection{Definitions}
 503\label{sec:definitions}
 504
 505Definitions are the essential components of an IL file. They can define
 506three types of objects: aggregate types, data, and functions. Aggregate
 507types are never exported and do not compile to any code. Data and
 508function definitions have file scope and are mutually recursive (even
 509across IL files). Their visibility can be controlled using linkage
 510flags.
 511
 512\subsubsection{Aggregate Types}
 513\label{sec:aggregate-types}
 514
 515\begin{code}
 516typeDef :: Parser Q.TypeDef
 517typeDef = do
 518  _ <- wsNL1 (string "type")
 519  i <- wsNL1 userDef
 520  _ <- wsNL1 (char '=')
 521  a <- optionMaybe alignAny
 522  bracesNL (opaqueType <|> unionType <|> regularType) <&> Q.TypeDef i a
 523\end{code}
 524
 525Aggregate type definitions start with the \texttt{type} keyword. They have file
 526scope, but types must be defined before being referenced. The inner
 527structure of a type is expressed by a comma-separated list of fields.
 528
 529\begin{code}
 530subType :: Parser Q.SubType
 531subType =
 532  (Q.SExtType <$> extType)
 533    <|> (Q.SUserDef <$> userDef)
 534
 535field :: Parser Q.Field
 536field = do
 537  -- TODO: newline is required if there is a number argument
 538  f <- wsNL subType
 539  s <- ws $ optionMaybe decNumber
 540  pure (f, s)
 541
 542fields :: Bool -> Parser [Q.Field]
 543fields allowEmpty =
 544  (if allowEmpty then sepByTrail else sepByTrail1) field (wsNL $ char ',')
 545\end{code}
 546
 547A field consists of a subtype, either an extended type or a user-defined type,
 548and an optional number expressing the value of this field. In case many items
 549of the same type are sequenced (like in a C array), the shorter array syntax
 550can be used.
 551
 552\begin{code}
 553regularType :: Parser Q.AggType
 554regularType = Q.ARegular <$> fields True
 555\end{code}
 556
 557Three different kinds of aggregate types are presentl ysupported: regular
 558types, union types and opaque types. The fields of regular types will be
 559packed. By default, the alignment of an aggregate type is the maximum alignment
 560of its members. The alignment can be explicitly specified by the programmer.
 561
 562\begin{code}
 563unionType :: Parser Q.AggType
 564unionType = Q.AUnion <$> many1 (wsNL unionType')
 565  where
 566    unionType' :: Parser [Q.Field]
 567    unionType' = bracesNL $ fields False
 568\end{code}
 569
 570Union types allow the same chunk of memory to be used with different layouts. They are defined by enclosing multiple regular aggregate type bodies in a pair of curly braces. Size and alignment of union types are set to the maximum size and alignment of each variation or, in the case of alignment, can be explicitly specified.
 571
 572\begin{code}
 573opaqueType :: Parser Q.AggType
 574opaqueType = Q.AOpaque <$> wsNL decNumber
 575\end{code}
 576
 577Opaque types are used when the inner structure of an aggregate cannot be specified; the alignment for opaque types is mandatory. They are defined simply by enclosing their size between curly braces.
 578
 579\subsubsection{Data}
 580\label{sec:data}
 581
 582\begin{code}
 583dataDef :: Parser Q.DataDef
 584dataDef = do
 585  link <- many linkage
 586  name <- wsNL1 (string "data") >> wsNL global
 587  _ <- wsNL (char '=')
 588  alignment <- optionMaybe alignAny
 589  bracesNL dataObjs <&> Q.DataDef link name alignment
 590 where
 591    -- TODO: sepByTrail is not documented in the QBE BNF.
 592    dataObjs = sepByTrail dataObj (wsNL $ char ',')
 593\end{code}
 594
 595Data definitions express objects that will be emitted in the compiled
 596file. Their visibility and location in the compiled artifact are
 597controlled with linkage flags described in the \nameref{sec:linkage}
 598section.
 599
 600They define a global identifier (starting with the sigil \texttt{\$}), that
 601will contain a pointer to the object specified by the definition.
 602
 603\begin{code}
 604dataObj :: Parser Q.DataObj
 605dataObj =
 606  (Q.OZeroFill <$> (wsNL1 (char 'z') >> wsNL decNumber))
 607    <|> do
 608      t <- wsNL1 extType
 609      i <- many1 (wsNL dataItem)
 610      return $ Q.OItem t i
 611\end{code}
 612
 613Objects are described by a sequence of fields that start with a type
 614letter. This letter can either be an extended type, or the \texttt{z} letter.
 615If the letter used is an extended type, the data item following
 616specifies the bits to be stored in the field.
 617
 618\begin{code}
 619dataItem :: Parser Q.DataItem
 620dataItem =
 621  (Q.DString <$> strLit)
 622    <|> try
 623      ( do
 624          i <- ws global
 625          off <- (ws $ char '+') >> ws decNumber
 626          return $ Q.DSymOff i off
 627      )
 628    <|> (Q.DConst <$> constant)
 629\end{code}
 630
 631Within each object, several items can be defined. When several data items
 632follow a letter, they initialize multiple fields of the same size.
 633
 634\begin{code}
 635allocSize :: Parser Q.AllocSize
 636allocSize =
 637  choice
 638    [ bind "4" Q.AllocWord,
 639      bind "8" Q.AllocLong,
 640      bind "16" Q.AllocLongLong
 641    ]
 642\end{code}
 643
 644The members of a struct will be packed. This means that padding has to
 645be emitted by the frontend when necessary. Alignment of the whole data
 646objects can be manually specified, and when no alignment is provided,
 647the maximum alignment from the platform is used.
 648
 649When the \texttt{z} letter is used the number following indicates the size of
 650the field; the contents of the field are zero initialized. It can be
 651used to add padding between fields or zero-initialize big arrays.
 652
 653\subsubsection{Functions}
 654\label{sec:functions}
 655
 656\begin{code}
 657funcDef :: Parser Q.FuncDef
 658funcDef = do
 659  link <- many linkage
 660  _ <- ws1 (string "function")
 661  retTy <- optionMaybe (ws1 abity)
 662  name <- ws global
 663  args <- wsNL params
 664  body <- between (wsNL1 $ char '{') (wsNL $ char '}') $ many1 block
 665
 666  case (insertJumps body) of
 667    Nothing -> fail $ "invalid fallthrough in " ++ show name
 668    Just bl -> return $ Q.FuncDef link name retTy args bl
 669\end{code}
 670
 671Function definitions contain the actual code to emit in the compiled
 672file. They define a global symbol that contains a pointer to the
 673function code. This pointer can be used in \texttt{call} instructions or stored
 674in memory.
 675
 676\begin{code}
 677subWordType :: Parser Q.SubWordType
 678subWordType = choice
 679  [ try $ bind "sb" Q.SignedByte
 680  , try $ bind "ub" Q.UnsignedByte
 681  , bind "sh" Q.SignedHalf
 682  , bind "uh" Q.UnsignedHalf ]
 683
 684abity :: Parser Q.Abity
 685abity = try (Q.ASubWordType <$> subWordType)
 686    <|> (Q.ABase <$> baseType)
 687    <|> (Q.AUserDef <$> userDef)
 688\end{code}
 689
 690The type given right before the function name is the return type of the
 691function. All return values of this function must have this return type.
 692If the return type is missing, the function must not return any value.
 693
 694\begin{code}
 695param :: Parser Q.FuncParam
 696param = (Q.Env <$> (ws1 (string "env") >> local))
 697    <|> (string "..." >> pure Q.Variadic)
 698    <|> do
 699          ty <- ws1 abity
 700          Q.Regular ty <$> local
 701
 702params :: Parser [Q.FuncParam]
 703params = parenLst param
 704\end{code}
 705
 706The parameter list is a comma separated list of temporary names prefixed
 707by types. The types are used to correctly implement C compatibility.
 708When an argument has an aggregate type, a pointer to the aggregate is
 709passed by thea caller. In the example below, we have to use a load
 710instruction to get the value of the first (and only) member of the
 711struct.
 712
 713\begin{verbatim}
 714type :one = { w }
 715
 716function w $getone(:one %p) {
 717@start
 718        %val =w loadw %p
 719        ret %val
 720}
 721\end{verbatim}
 722
 723If a function accepts or returns values that are smaller than a word,
 724such as \texttt{signed char} or \texttt{unsigned short} in C, one of the sub-word type
 725must be used. The sub-word types \texttt{sb}, \texttt{ub}, \texttt{sh}, and \texttt{uh} stand,
 726respectively, for signed and unsigned 8-bit values, and signed and
 727unsigned 16-bit values. Parameters associated with a sub-word type of
 728bit width N only have their N least significant bits set and have base
 729type \texttt{w}. For example, the function
 730
 731\begin{verbatim}
 732function w $addbyte(w %a, sb %b) {
 733@start
 734        %bw =w extsb %b
 735        %val =w add %a, %bw
 736        ret %val
 737}
 738\end{verbatim}
 739
 740needs to sign-extend its second argument before the addition. Dually,
 741return values with sub-word types do not need to be sign or zero
 742extended.
 743
 744If the parameter list ends with \texttt{...}, the function is a variadic
 745function: it can accept a variable number of arguments. To access the
 746extra arguments provided by the caller, use the \texttt{vastart} and \texttt{vaarg}
 747instructions described in the \nameref{sec:variadic} section.
 748
 749Optionally, the parameter list can start with an environment parameter
 750\texttt{env \%e}. This special parameter is a 64-bit integer temporary (i.e.,
 751of type \texttt{l}). If the function does not use its environment parameter,
 752callers can safely omit it. This parameter is invisible to a C caller:
 753for example, the function
 754
 755\begin{verbatim}
 756export function w $add(env %e, w %a, w %b) {
 757@start
 758        %c =w add %a, %b
 759        ret %c
 760}
 761\end{verbatim}
 762
 763must be given the C prototype \texttt{int add(int, int)}. The intended use of
 764this feature is to pass the environment pointer of closures while
 765retaining a very good compatibility with C. The \nameref{sec:call}
 766section explains how to pass an environment parameter.
 767
 768Since global symbols are defined mutually recursive, there is no need
 769for function declarations: a function can be referenced before its
 770definition. Similarly, functions from other modules can be used without
 771previous declaration. All the type information necessary to compile a
 772call is in the instruction itself.
 773
 774The syntax and semantics for the body of functions are described in the
 775\nameref{sec:control} section.
 776
 777\section{Control}
 778\label{sec:control}
 779
 780The IL represents programs as textual transcriptions of control flow
 781graphs. The control flow is serialized as a sequence of blocks of
 782straight-line code which are connected using jump instructions.
 783
 784\subsection{Blocks}
 785\label{sec:blocks}
 786
 787\ignore{
 788\begin{code}
 789-- Basic block abstraction with optional exit points. The 'insertJumps'
 790-- function takes care of inserting fallthrough for omitted jumps.
 791data Block'
 792  = Block'
 793  { label' :: Q.BlockIdent,
 794    phi' :: [Q.Phi],
 795    stmt' :: [Q.Statement],
 796    term' :: Maybe Q.JumpInstr
 797  }
 798  deriving (Show, Eq)
 799
 800insertJumps :: [Block'] -> Maybe [Q.Block]
 801insertJumps xs = foldM go [] $ zipWithNext xs
 802  where
 803    zipWithNext :: [a] -> [(a, Maybe a)]
 804    zipWithNext [] = []
 805    zipWithNext lst@(_ : t) = zip lst $ map Just t ++ [Nothing]
 806
 807    fromBlock' :: Block' -> Q.JumpInstr -> Q.Block
 808    fromBlock' (Block' l p s _) = Q.Block l p s
 809
 810    go :: [Q.Block] -> (Block', Maybe Block') -> Maybe [Q.Block]
 811    go acc (x@Block' {term' = Just ji}, _) =
 812      Just (acc ++ [fromBlock' x ji])
 813    go acc (x@Block' {term' = Nothing}, Just nxt) =
 814      Just (acc ++ [fromBlock' x (Q.Jump $ label' nxt)])
 815    go _ (Block' {term' = Nothing}, Nothing) =
 816      Nothing
 817\end{code}
 818}
 819
 820\begin{code}
 821block :: Parser Block'
 822block = do
 823  l <- wsNL1 label
 824  p <- many (wsNL1 $ try phiInstr)
 825  s <- many (wsNL1 statement)
 826  Block' l p s <$> (optionMaybe $ wsNL1 jumpInstr)
 827\end{code}
 828
 829All blocks have a name that is specified by a label at their beginning.
 830Then follows a sequence of instructions that have "fall-through" flow.
 831Finally one jump terminates the block. The jump can either transfer
 832control to another block of the same function or return; jumps are
 833described further below.
 834
 835The first block in a function must not be the target of any jump in the
 836program. If a jump to the function start is needed, the frontend must
 837insert an empty prelude block at the beginning of the function.
 838
 839When one block jumps to the next block in the IL file, it is not
 840necessary to write the jump instruction, it will be automatically added
 841by the parser. For example the start block in the example below jumps
 842directly to the loop block.
 843
 844\subsection{Jumps}
 845\label{sec:jumps}
 846
 847\begin{code}
 848jumpInstr :: Parser Q.JumpInstr
 849jumpInstr = (string "hlt" >> pure Q.Halt)
 850        -- TODO: Return requires a space if there is an optionMaybe
 851        <|> Q.Return <$> ((ws $ string "ret") >> optionMaybe val)
 852        <|> try (Q.Jump <$> ((ws1 $ string "jmp") >> label))
 853        <|> do
 854          _ <- ws1 $ string "jnz"
 855          v <- ws val <* ws (char ',')
 856          l1 <- ws label <* ws (char ',')
 857          l2 <- ws label
 858          return $ Q.Jnz v l1 l2
 859\end{code}
 860
 861A jump instruction ends every block and transfers the control to another
 862program location. The target of a jump must never be the first block in
 863a function. The three kinds of jumps available are described in the
 864following list.
 865
 866\begin{enumerate}
 867  \item \textbf{Unconditional jump.} Jumps to another block of the same function.
 868  \item \textbf{Conditional jump.} When its word argument is non-zero, it jumps to its first label argument; otherwise it jumps to the other label. The argument must be of word type; because of subtyping a long argument can be passed, but only its least significant 32 bits will be compared to 0.
 869  \item \textbf{Function return.} Terminates the execution of the current function, optionally returning a value to the caller. The value returned must be of the type given in the function prototype. If the function prototype does not specify a return type, no return value can be used.
 870  \item \textbf{Program termination.} Terminates the execution of the program with a target-dependent error. This instruction can be used when it is expected that the execution never reaches the end of the block it closes; for example, after having called a function such as \texttt{exit()}.
 871\end{enumerate}
 872
 873\section{Instructions}
 874\label{sec:instructions}
 875
 876\begin{code}
 877instr :: Parser Q.Instr
 878instr =
 879  choice
 880    [ try $ binaryInstr Q.Add "add",
 881      try $ binaryInstr Q.Sub "sub",
 882      try $ binaryInstr Q.Mul "mul",
 883      try $ binaryInstr Q.Div "div",
 884      try $ binaryInstr Q.URem "urem",
 885      try $ binaryInstr Q.Rem "rem",
 886      try $ binaryInstr Q.UDiv "udiv",
 887      try $ binaryInstr Q.Or "or",
 888      try $ binaryInstr Q.Xor "xor",
 889      try $ binaryInstr Q.And "and",
 890      try $ binaryInstr Q.Sar "sar",
 891      try $ binaryInstr Q.Shr "shr",
 892      try $ binaryInstr Q.Shl "shl",
 893      try $ unaryInstr Q.Neg "neg",
 894      try $ unaryInstr Q.Cast "cast",
 895      try $ unaryInstr Q.Copy "copy",
 896      try $ unaryInstr Q.VAArg "vaarg",
 897      try $ loadInstr,
 898      try $ allocInstr,
 899      try $ compareInstr,
 900      try $ extInstr,
 901      try $ truncInstr,
 902      try $ fromFloatInstr,
 903      try $ toFloatInstr
 904    ]
 905\end{code}
 906
 907Instructions are the smallest piece of code in the IL, they form the body of
 908\nameref{sec:blocks}. This specification distinguishes instructions and
 909volatile instructions, the latter do not return a value. For the former, the IL
 910uses a three-address code, which means that one instruction computes an
 911operation between two operands and assigns the result to a third one.
 912
 913\begin{code}
 914assign :: Parser Q.Statement
 915assign = do
 916  n <- ws local
 917  t <- ws (char '=') >> ws1 baseType
 918  Q.Assign n t <$> instr
 919
 920volatileInstr :: Parser Q.Statement
 921volatileInstr =
 922  Q.Volatile <$>
 923    (storeInstr <|> blitInstr <|> vastartInstr <|> dbglocInstr)
 924
 925-- TODO: Not documented in the QBE BNF.
 926statement :: Parser Q.Statement
 927statement = (try callInstr) <|> assign <|> volatileInstr
 928\end{code}
 929
 930An instruction has both a name and a return type, this return type is a base
 931type that defines the size of the instruction's result. The type of the
 932arguments can be unambiguously inferred using the instruction name and the
 933return type. For example, for all arithmetic instructions, the type of the
 934arguments is the same as the return type. The two additions below are valid if
 935\texttt{\%y} is a word or a long (because of \nameref{sec:subtyping}).
 936
 937\begin{verbatim}
 938%x =w add 0, %y
 939%z =w add %x, %x
 940\end{verbatim}
 941
 942Some instructions, like comparisons and memory loads have operand types
 943that differ from their return types. For instance, two floating points
 944can be compared to give a word result (0 if the comparison succeeds, 1
 945if it fails).
 946
 947\begin{verbatim}
 948%c =w cgts %a, %b
 949\end{verbatim}
 950
 951In the example above, both operands have to have single type. This is
 952made explicit by the instruction suffix.
 953
 954\subsection{Arithmetic and Bits}
 955
 956\begin{quote}
 957\begin{itemize}
 958\item \texttt{add}, \texttt{sub}, \texttt{div}, \texttt{mul}
 959\item \texttt{neg}
 960\item \texttt{udiv}, \texttt{rem}, \texttt{urem}
 961\item \texttt{or}, \texttt{xor}, \texttt{and}
 962\item \texttt{sar}, \texttt{shr}, \texttt{shl}
 963\end{itemize}
 964\end{quote}
 965
 966The base arithmetic instructions in the first bullet are available for
 967all types, integers and floating points.
 968
 969When \texttt{div} is used with word or long return type, the arguments are
 970treated as signed. The unsigned integral division is available as \texttt{udiv}
 971instruction. When the result of a division is not an integer, it is truncated
 972towards zero.
 973
 974The signed and unsigned remainder operations are available as \texttt{rem} and
 975\texttt{urem}. The sign of the remainder is the same as the one of the
 976dividend. Its magnitude is smaller than the divisor one. These two instructions
 977and \texttt{udiv} are only available with integer arguments and result.
 978
 979Bitwise OR, AND, and XOR operations are available for both integer
 980types. Logical operations of typical programming languages can be
 981implemented using \nameref{sec:comparisions} and \nameref{sec:jumps}.
 982
 983Shift instructions \texttt{sar}, \texttt{shr}, and \texttt{shl}, shift right or
 984left their first operand by the amount from the second operand. The shifting
 985amount is taken modulo the size of the result type. Shifting right can either
 986preserve the sign of the value (using \texttt{sar}), or fill the newly freed
 987bits with zeroes (using \texttt{shr}). Shifting left always fills the freed
 988bits with zeroes.
 989
 990Remark that an arithmetic shift right (\texttt{sar}) is only equivalent to a
 991division by a power of two for non-negative numbers. This is because the shift
 992right "truncates" towards minus infinity, while the division truncates towards
 993zero.
 994
 995\subsection{Memory}
 996\label{sec:memory}
 997
 998The following sections discuss instructions for interacting with values stored in memory.
 999
1000\subsubsection{Store instructions}
1001
1002\begin{code}
1003storeInstr :: Parser Q.VolatileInstr
1004storeInstr = do
1005  t <- string "store" >> ws1 extType
1006  v <- ws val
1007  _ <- ws $ char ','
1008  ws val <&> Q.Store t v
1009\end{code}
1010
1011Store instructions exist to store a value of any base type and any extended
1012type. Since halfwords and bytes are not first class in the IL, \texttt{storeh}
1013and \texttt{storeb} take a word as argument. Only the first 16 or 8 bits of
1014this word will be stored in memory at the address specified in the second
1015argument.
1016
1017\subsubsection{Load instructions}
1018
1019\begin{code}
1020loadInstr :: Parser Q.Instr
1021loadInstr = do
1022  _ <- string "load"
1023  t <- ws1 $ choice
1024    [ try $ bind "sw" (Q.LBase Q.Word),
1025      try $ bind "uw" (Q.LBase Q.Word),
1026      try $ Q.LSubWord <$> subWordType,
1027      Q.LBase <$> baseType
1028    ]
1029  ws val <&> Q.Load t
1030\end{code}
1031
1032For types smaller than long, two variants of the load instruction are
1033available: one will sign extend the loaded value, while the other will zero
1034extend it. Note that all loads smaller than long can load to either a long or a
1035word.
1036
1037The two instructions \texttt{loadsw} and \texttt{loaduw} have the same effect
1038when they are used to define a word temporary. A \texttt{loadw} instruction is
1039provided as syntactic sugar for \texttt{loadsw} to make explicit that the
1040extension mechanism used is irrelevant.
1041
1042\subsubsection{Blits}
1043
1044\begin{code}
1045blitInstr :: Parser Q.VolatileInstr
1046blitInstr = do
1047  v1 <- (ws1 $ string "blit") >> ws val <* (ws $ char ',')
1048  v2 <- ws val <* (ws $ char ',')
1049  nb <- decNumber
1050  return $ Q.Blit v1 v2 nb
1051\end{code}
1052
1053The blit instruction copies in-memory data from its first address argument to
1054its second address argument. The third argument is the number of bytes to copy.
1055The source and destination spans are required to be either non-overlapping, or
1056fully overlapping (source address identical to the destination address). The
1057byte count argument must be a nonnegative numeric constant; it cannot be a
1058temporary.
1059
1060One blit instruction may generate a number of instructions proportional to its
1061byte count argument, consequently, it is recommended to keep this argument
1062relatively small. If large copies are necessary, it is preferable that
1063frontends generate calls to a supporting \texttt{memcpy} function.
1064
1065\subsubsection{Stack Allocation}
1066
1067\begin{code}
1068allocInstr :: Parser Q.Instr
1069allocInstr = do
1070  siz <- (ws $ string "alloc") >> (ws1 allocSize)
1071  val <&> Q.Alloc siz
1072\end{code}
1073
1074These instructions allocate a chunk of memory on the stack. The number ending
1075the instruction name is the alignment required for the allocated slot. QBE will
1076make sure that the returned address is a multiple of that alignment value.
1077
1078Stack allocation instructions are used, for example, when compiling the C local
1079variables, because their address can be taken. When compiling Fortran,
1080temporaries can be used directly instead, because it is illegal to take the
1081address of a variable.
1082
1083\subsection{Comparisons}
1084\label{sec:comparisions}
1085
1086\begin{code}
1087compareInstr :: Parser Q.Instr
1088compareInstr = do
1089  _ <- char 'c'
1090  (try intCompare) <|> floatCompare
1091
1092compareArgs :: Parser (Q.Value, Q.Value)
1093compareArgs = do
1094  lhs <- ws val <* ws (char ',')
1095  rhs <- ws val
1096  pure (lhs, rhs)
1097
1098intCompare :: Parser Q.Instr
1099intCompare = do
1100  op <- compareIntOp
1101  ty <- ws1 intArg
1102
1103  (lhs, rhs) <- compareArgs
1104  pure $ Q.CompareInt ty op lhs rhs
1105
1106floatCompare :: Parser Q.Instr
1107floatCompare = do
1108  op <- compareFloatOp
1109  ty <- ws1 floatArg
1110
1111  (lhs, rhs) <- compareArgs
1112  pure $ Q.CompareFloat ty op lhs rhs
1113\end{code}
1114
1115Comparison instructions return an integer value (either a word or a long), and
1116compare values of arbitrary types. The returned value is 1 if the two operands
1117satisfy the comparison relation, or 0 otherwise. The names of comparisons
1118respect a standard naming scheme in three parts:
1119
1120\begin{enumerate}
1121  \item All comparisons start with the letter \texttt{c}.
1122  \item Then comes a comparison type.
1123  \item Finally, the instruction name is terminated with a basic type suffix precising the type of the operands to be compared.
1124\end{enumerate}
1125
1126The following instruction are available for integer comparisons:
1127
1128\begin{code}
1129compareIntOp :: Parser Q.IntCmpOp
1130compareIntOp = choice
1131  [ bind "eq" Q.IEq
1132  , bind "ne" Q.INe
1133  , try $ bind "sle" Q.ISle
1134  , try $ bind "slt" Q.ISlt
1135  , try $ bind "sge" Q.ISge
1136  , try $ bind "sgt" Q.ISgt
1137  , try $ bind "ule" Q.IUle
1138  , try $ bind "ult" Q.IUlt
1139  , try $ bind "uge" Q.IUge
1140  , try $ bind "ugt" Q.IUgt ]
1141\end{code}
1142
1143For floating point comparisons use one of these instructions:
1144
1145\begin{code}
1146compareFloatOp :: Parser Q.FloatCmpOp
1147compareFloatOp = choice
1148  [ bind "eq" Q.FEq
1149  , bind "ne" Q.FNe
1150  , try $ bind "le" Q.FLe
1151  , bind "lt" Q.FLt
1152  , try $ bind "ge" Q.FGe
1153  , bind "gt" Q.FGt
1154  , bind "o" Q.FOrd
1155  , bind "uo" Q.FUnord ]
1156\end{code}
1157
1158For example, \texttt{cod} compares two double-precision floating point numbers
1159and returns 1 if the two floating points are not NaNs, or 0 otherwise. The
1160\texttt{csltw} instruction compares two words representing signed numbers and
1161returns 1 when the first argument is smaller than the second one.
1162
1163\subsection{Conversions}
1164
1165Conversion operations change the representation of a value, possibly modifying
1166it if the target type cannot hold the value of the source type. Conversions can
1167extend the precision of a temporary (e.g., from signed 8-bit to 32-bit), or
1168convert a floating point into an integer and vice versa.
1169
1170\begin{code}
1171extInstr :: Parser Q.Instr
1172extInstr = do
1173  _ <- string "ext"
1174  ty <- ws1 extArg
1175  ws val <&> Q.Ext ty
1176 where
1177  extArg :: Parser Q.ExtArg
1178  extArg = try (Q.ExtSubWord <$> subWordType)
1179    <|> try (bind "sw" Q.ExtSignedWord)
1180    <|> bind "s" Q.ExtSingle
1181    <|> bind "uw" Q.ExtUnsignedWord
1182\end{code}
1183
1184Extending the precision of a temporary is done using the \texttt{ext} family of
1185instructions. Because QBE types do not specify the signedness (like in LLVM),
1186extension instructions exist to sign-extend and zero-extend a value. For
1187example, \texttt{extsb} takes a word argument and sign-extends the 8
1188least-significant bits to a full word or long, depending on the return type.
1189
1190\begin{code}
1191truncInstr :: Parser Q.Instr
1192truncInstr = do
1193  _ <- ws1 $ string "truncd"
1194  ws val <&> Q.TruncDouble
1195\end{code}
1196
1197The instructions \texttt{exts} (extend single) and \texttt{truncd} (truncate
1198double) are provided to change the precision of a floating point value. When
1199the double argument of truncd cannot be represented as a single-precision
1200floating point, it is truncated towards zero.
1201
1202\begin{code}
1203floatArg :: Parser Q.FloatArg
1204floatArg = bind "d" Q.FDouble <|> bind "s" Q.FSingle
1205
1206fromFloatInstr :: Parser Q.Instr
1207fromFloatInstr = do
1208  arg <- floatArg <* string "to"
1209  isSigned <- signageChar
1210  _ <- ws1 $ char 'i'
1211  ws val <&> Q.FloatToInt arg isSigned
1212
1213intArg :: Parser Q.IntArg
1214intArg = bind "w" Q.IWord <|> bind "l" Q.ILong
1215
1216toFloatInstr :: Parser Q.Instr
1217toFloatInstr = do
1218  isSigned <- signageChar
1219  arg <- intArg
1220  _ <- ws1 $ string "tof"
1221  ws val <&> Q.IntToFloat arg isSigned
1222\end{code}
1223
1224Converting between signed integers and floating points is done using
1225\texttt{stosi} (single to signed integer), \texttt{stoui} (single to unsigned
1226integer), \texttt{dtosi} (double to signed integer), \texttt{dtoui} (double to
1227unsigned integer), \texttt{swtof} (signed word to float), \texttt{uwtof}
1228(unsigned word to float), \texttt{sltof} (signed long to float) and
1229\texttt{ultof} (unsigned long to float).
1230
1231\subsection{Cast and Copy}
1232
1233The \texttt{cast} and \texttt{copy} instructions return the bits of their
1234argument verbatim. However a cast will change an integer into a floating point
1235of the same width and vice versa.
1236
1237Casts can be used to make bitwise operations on the representation of floating
1238point numbers. For example the following program will compute the opposite of
1239the single-precision floating point number \texttt{\%f} into \texttt{\%rs}.
1240
1241\begin{verbatim}
1242%b0 =w cast %f
1243%b1 =w xor 2147483648, %b0  # flip the msb
1244%rs =s cast %b1
1245\end{verbatim}
1246
1247\subsection{Call}
1248\label{sec:call}
1249
1250\begin{code}
1251-- TODO: Code duplication with 'param'.
1252callArg :: Parser Q.FuncArg
1253callArg = (Q.ArgEnv <$> (ws1 (string "env") >> val))
1254    <|> (string "..." >> pure Q.ArgVar)
1255    <|> do
1256          ty <- ws1 abity
1257          Q.ArgReg ty <$> val
1258
1259callArgs :: Parser [Q.FuncArg]
1260callArgs = parenLst callArg
1261
1262callInstr :: Parser Q.Statement
1263callInstr = do
1264  retValue <- optionMaybe $ do
1265    i <- ws local <* ws (char '=')
1266    a <- ws1 abity
1267    return (i, a)
1268  toCall <- ws1 (string "call") >> ws val
1269  fnArgs <- callArgs
1270  return $ Q.Call retValue toCall fnArgs
1271\end{code}
1272
1273The call instruction is special in several ways. It is not a three-address
1274instruction and requires the type of all its arguments to be given. Also, the
1275return type can be either a base type or an aggregate type. These specifics are
1276required to compile calls with C compatibility (i.e., to respect the ABI).
1277
1278When an aggregate type is used as argument type or return type, the value
1279respectively passed or returned needs to be a pointer to a memory location
1280holding the value. This is because aggregate types are not first-class
1281citizens of the IL.
1282
1283Sub-word types are used for arguments and return values of width less than a
1284word. Details on these types are presented in the \nameref{sec:functions} section.
1285Arguments with sub-word types need not be sign or zero extended according to
1286their type. Calls with a sub-word return type define a temporary of base type
1287\texttt{w} with its most significant bits unspecified.
1288
1289Unless the called function does not return a value, a return temporary must be
1290specified, even if it is never used afterwards.
1291
1292An environment parameter can be passed as first argument using the \texttt{env}
1293keyword. The passed value must be a 64-bit integer. If the called function does
1294not expect an environment parameter, it will be safely discarded. See the
1295\nameref{sec:functions} section for more information about environment
1296parameters.
1297
1298When the called function is variadic, there must be a \texttt{...} marker
1299separating the named and variadic arguments.
1300
1301\subsection{Variadic}
1302\label{sec:variadic}
1303
1304\begin{code}
1305vastartInstr :: Parser Q.VolatileInstr
1306vastartInstr = do
1307  _ <- ws1 (string "vastart")
1308  Q.VAStart <$> ws val
1309\end{code}
1310
1311The \texttt{vastart} and \texttt{vaarg} instructions provide a portable way to
1312access the extra parameters of a variadic function.
1313
1314\begin{enumerate}
1315  \item \texttt{vastart} -- \texttt{(m)}
1316  \item \texttt{vaarg} -- \texttt{T(mmmm)}
1317\end{enumerate}
1318
1319The \texttt{vastart} instruction initializes a variable argument list used to
1320access the extra parameters of the enclosing variadic function. It is safe to
1321call it multiple times.
1322
1323The \texttt{vaarg} instruction fetches the next argument from a variable
1324argument list. It is currently limited to fetching arguments that have a base
1325type. This instruction is essentially effectful: calling it twice in a row will
1326return two consecutive arguments from the argument list.
1327
1328Both instructions take a pointer to a variable argument list as the sole argument.
1329The size and alignment of the variable argument lists depends on the target used.
1330
1331\subsection{Phi}
1332
1333\begin{code}
1334phiBranch :: Parser (Q.BlockIdent, Q.Value)
1335phiBranch = do
1336  n <- ws1 label
1337  v <- val
1338  pure (n, v)
1339
1340phiInstr :: Parser Q.Phi
1341phiInstr = do
1342  -- TODO: code duplication with 'assign'
1343  n <- ws local
1344  t <- ws (char '=') >> ws1 baseType
1345
1346  _ <- ws1 (string "phi")
1347  -- TODO: combinator for sepBy
1348  p <- Map.fromList <$> sepBy1 (ws phiBranch) (ws $ char ',')
1349  return $ Q.Phi n t p
1350\end{code}
1351
1352First and foremost, phi instructions are NOT necessary when writing a frontend
1353to QBE. One solution to avoid having to deal with SSA form is to use stack
1354allocated variables for all source program variables and perform assignments
1355and lookups using \nameref{sec:memory} operations. This is what LLVM users
1356typically do.
1357
1358Another solution is to simply emit code that is not in SSA form! Contrary to
1359LLVM, QBE is able to fixup programs not in SSA form without requiring the
1360boilerplate of loading and storing in memory. For example, the following
1361program will be correctly compiled by QBE.
1362
1363\begin{verbatim}
1364@start
1365    %x =w copy 100
1366    %s =w copy 0
1367@loop
1368    %s =w add %s, %x
1369    %x =w sub %x, 1
1370    jnz %x, @loop, @end
1371@end
1372    ret %s
1373\end{verbatim}
1374
1375Now, if you want to know what phi instructions are and how to use them in QBE,
1376you can read the following.
1377
1378Phi instructions are specific to SSA form. In SSA form values can only be
1379assigned once, without phi instructions, this requirement is too strong to
1380represent many programs. For example consider the following C program.
1381
1382\begin{verbatim}
1383int f(int x) {
1384    int y;
1385    if (x)
1386        y = 1;
1387    else
1388        y = 2;
1389    return y;
1390}
1391\end{verbatim}
1392
1393The variable \texttt{y} is assigned twice, the solution to translate it in SSA
1394form is to insert a phi instruction.
1395
1396\begin{verbatim}
1397@ifstmt
1398    jnz %x, @ift, @iff
1399@ift
1400    jmp @retstmt
1401@iff
1402    jmp @retstmt
1403@retstmt
1404    %y =w phi @ift 1, @iff 2
1405    ret %y
1406\end{verbatim}
1407
1408Phi instructions return one of their arguments depending on where the control
1409came from. In the example, \texttt{\%y} is set to 1 if the
1410\texttt{\textbackslash{}ift} branch is taken, or it is set to 2 otherwise.
1411
1412An important remark about phi instructions is that QBE assumes that if a
1413variable is defined by a phi it respects all the SSA invariants. So it is
1414critical to not use phi instructions unless you know exactly what you are
1415doing.
1416
1417\subsection{Debug Information}
1418
1419QBE supports the inclusion of debug information. Specifically, it allows
1420defining from which source file type, data, and function definitions originated.
1421For this purpose, it provides the \texttt{dbgfile} definition, which receives a
1422file name (string literal) as its sole argument. Every type, data and function
1423definition thereafter are assumed to originate in this file.
1424
1425\begin{code}
1426-- TODO: not documnted in the QBE BNF.
1427fileDef :: Parser String
1428fileDef = do
1429  _ <- ws1 $ string "dbgfile"
1430  wsNL1 strLit
1431\end{code}
1432
1433Further, instructions within a function can be associated with a specific line
1434and column number of a previously defined \texttt{dbgfile}. The
1435\texttt{dbgfile} is referenced by index using the first argument to
1436\texttt{dbgloc}. The second argument represents the line number, the third
1437(optional) argument the column number.
1438
1439\begin{code}
1440-- TODO: not documnted in the QBE BNF.
1441dbglocInstr :: Parser Q.VolatileInstr
1442dbglocInstr = do
1443  _ <- ws1 $ string "dbgloc"
1444  file <- ws decNumber <* ws (char ',')
1445  line <- ws decNumber
1446  col  <- optionMaybe (ws (char ',') >> ws decNumber)
1447  return $ Q.DBGLoc file line col
1448\end{code}
1449
1450\end{document}