quebex

A software analysis framework built around the QBE intermediate language

git clone https://git.8pit.net/quebex.git

   1% SPDX-FileCopyrightText: 2015-2024 Quentin Carbonneaux <quentin@c9x.me>
   2% SPDX-FileCopyrightText: 2025-2026 Sören Tempel <soeren+git@soeren-tempel.net>
   3%
   4% SPDX-License-Identifier: MIT AND GPL-3.0-only
   5
   6\documentclass{article}
   7%include polycode.fmt
   8
   9%subst blankline = "\\[5mm]"
  10
  11% See https://github.com/kosmikus/lhs2tex/issues/58
  12%format <$> = "\mathbin{\langle\$\rangle}"
  13%format <&> = "\mathbin{\langle\&\rangle}"
  14%format <|> = "\mathbin{\langle\:\vline\:\rangle}"
  15%format <?> = "\mathbin{\langle?\rangle}"
  16%format <*> = "\mathbin{\langle*\rangle}"
  17%format <*  = "\mathbin{\langle*}"
  18%format *>  = "\mathbin{*\rangle}"
  19
  20\long\def\ignore#1{}
  21
  22\usepackage{hyperref}
  23\hypersetup{
  24	colorlinks = true,
  25}
  26
  27\begin{document}
  28
  29\title{QBE Intermediate Language\vspace{-2em}}
  30\date{}
  31\maketitle
  32\frenchspacing
  33
  34\ignore{
  35\begin{code}
  36module Language.QBE.Parser
  37  ( skipInitComments,
  38    dataDef,
  39    typeDef,
  40    funcDef,
  41    fileDef
  42  )
  43where
  44
  45import Data.Char (chr)
  46import Data.Word (Word64)
  47import Data.Functor ((<&>))
  48import Data.List (singleton)
  49import Data.Map qualified as Map
  50import qualified Language.QBE.Types as Q
  51import Language.QBE.Util (bind, decNumber, octNumber, float)
  52import Text.ParserCombinators.Parsec
  53  ( Parser,
  54    alphaNum,
  55    anyChar,
  56    between,
  57    char,
  58    choice,
  59    letter,
  60    many,
  61    many1,
  62    manyTill,
  63    newline,
  64    noneOf,
  65    oneOf,
  66    optional,
  67    optionMaybe,
  68    sepBy,
  69    sepBy1,
  70    skipMany,
  71    skipMany1,
  72    string,
  73    try,
  74    (<?>),
  75    (<|>),
  76  )
  77\end{code}
  78}
  79
  80This an executable description of the
  81\href{https://c9x.me/compile/doc/il-v1.2.html}{QBE intermediate language},
  82specified through \href{https://hackage.haskell.org/package/parsec}{Parsec}
  83parser combinators and generated from a literate Haskell file. The description
  84is derived from the original QBE IL documentation, licensed under MIT.
  85Presently, this implementation targets version 1.2 of the QBE intermediate
  86language and aims to be equivalent with the original specification.
  87
  88\section{Basic Concepts}
  89
  90The intermediate language (IL) is a higher-level language than the
  91machine's assembly language. It smoothes most of the
  92irregularities of the underlying hardware and allows an infinite number
  93of temporaries to be used. This higher abstraction level lets frontend
  94programmers focus on language design issues.
  95
  96\subsection{Input Files}
  97
  98The intermediate language is provided to QBE as text. Usually, one file
  99is generated per each compilation unit from the frontend input language.
 100An IL file is a sequence of \nameref{sec:definitions} for
 101data, functions, and types. Once processed by QBE, the resulting file
 102can be assembled and linked using a standard toolchain (e.g., GNU
 103binutils).
 104
 105\begin{code}
 106comment :: Parser ()
 107comment = skipMany blankNL >> comment' >> skipMany blankNL
 108  where
 109    comment' = char '#' >> manyTill anyChar newline
 110\end{code}
 111
 112\ignore{
 113\begin{code}
 114skipNoCode :: Parser () -> Parser ()
 115skipNoCode blankP = try (skipMany1 comment <?> "comments") <|> blankP
 116\end{code}
 117}
 118
 119Here is a complete "Hello World" IL file which defines a function that
 120prints to the screen. Since the string is not a first class object (only
 121the pointer is) it is defined outside the function\textquotesingle s
 122body. Comments start with a \# character and finish with the end of the
 123line.
 124
 125\begin{verbatim}
 126data $str = { b "hello world", b 0 }
 127
 128export function w $main() {
 129@start
 130        # Call the puts function with $str as argument.
 131        %r =w call $puts(l $str)
 132        ret 0
 133}
 134\end{verbatim}
 135
 136If you have read the LLVM language reference, you might recognize the
 137example above. In comparison, QBE makes a much lighter use of types and
 138the syntax is terser.
 139
 140\subsection{Parser Combinators}
 141
 142\ignore{
 143\begin{code}
 144bracesNL :: Parser a -> Parser a
 145bracesNL = between (wsNL $ char '{') (wsNL $ char '}')
 146
 147quoted :: Parser a -> Parser a
 148quoted = let q = char '"' in between q q
 149
 150sepByTrail1 :: Parser a -> Parser sep -> Parser [a]
 151sepByTrail1 p sep = do
 152  x <- p
 153  xs <- many (try $ sep >> p)
 154  _ <- optional sep
 155  return (x:xs)
 156
 157sepByTrail :: Parser a -> Parser sep -> Parser [a]
 158sepByTrail p sep = sepByTrail1 p sep <|> return []
 159
 160parenLst :: Parser a -> Parser [a]
 161parenLst p = between (ws $ char '(') (char ')') inner
 162  where
 163    inner = sepBy (ws p) (ws $ char ',')
 164
 165unaryInstr :: (Q.Value -> Q.Instr) -> String -> Parser Q.Instr
 166unaryInstr conc keyword = do
 167  _ <- ws (string keyword)
 168  conc <$> ws val
 169
 170binaryInstr :: (Q.Value -> Q.Value -> Q.Instr) -> String -> Parser Q.Instr
 171binaryInstr conc keyword = do
 172  _ <- ws (string keyword)
 173  vfst <- ws val <* ws (char ',')
 174  conc vfst <$> ws val
 175
 176-- Can only appear in data and type definitions and hence allows newlines.
 177alignAny :: Parser Word64
 178alignAny = (ws1 (string "align")) >> wsNL decNumber
 179
 180-- Returns true if it is signed.
 181signageChar :: Parser Bool
 182signageChar = (char 's' <|> char 'u') <&> (== 's')
 183\end{code}
 184}
 185
 186The original QBE specification defines the syntax using a BNF grammar. In
 187contrast, this document defines it using Parsec parser combinators. As such,
 188this specification is less formal but more accurate as the parsing code is
 189actually executable. Consequently, this specification also captures constructs
 190omitted in the original specification (e.g., \nameref{sec:identifiers}, or
 191\nameref{sec:strlit}). Nonetheless, the formal language recognized by these
 192combinators aims to be equivalent to the one of the BNF grammar.
 193
 194\subsection{Identifiers}
 195\label{sec:identifiers}
 196
 197% Ident is not documented in the original QBE specification.
 198% See https://c9x.me/git/qbe.git/tree/parse.c?h=v1.2#n304
 199
 200\begin{code}
 201ident :: Parser String
 202ident = do
 203  start <- letter <|> oneOf "._"
 204  rest <- many (alphaNum <|> oneOf "$._")
 205  return $ start : rest
 206\end{code}
 207
 208Identifiers for data, types, and functions can start with any ASCII letter or
 209the special characters \texttt{.} and \texttt{\_}. This initial character can
 210be followed by a sequence of zero or more alphanumeric characters and the
 211special characters \texttt{\$}, \texttt{.}, and \texttt{\_}.
 212
 213\subsection{Sigils}
 214
 215\begin{code}
 216userDef :: Parser Q.UserIdent
 217userDef = Q.UserIdent <$> (char ':' >> ident)
 218
 219global :: Parser Q.GlobalIdent
 220global = Q.GlobalIdent <$> (char '$' >> ident)
 221
 222local :: Parser Q.LocalIdent
 223local = Q.LocalIdent <$> (char '%' >> ident)
 224
 225label :: Parser Q.BlockIdent
 226label = Q.BlockIdent <$> (char '@' >> ident)
 227\end{code}
 228
 229The intermediate language makes heavy use of sigils, all user-defined
 230names are prefixed with a sigil. This is to avoid keyword conflicts, and
 231also to quickly spot the scope and nature of identifiers.
 232
 233\begin{itemize}
 234  \item \texttt{:} is for user-defined \nameref{sec:aggregate-types}
 235  \item \texttt{\$} is for globals (represented by a pointer)
 236  \item \texttt{\%} is for function-scope temporaries
 237  \item \texttt{@@} is for block labels
 238\end{itemize}
 239
 240\subsection{Spacing}
 241
 242\begin{code}
 243blank :: Parser Char
 244blank = oneOf "\t " <?> "blank"
 245
 246blankNL :: Parser Char
 247blankNL = oneOf "\n\t " <?> "blank or newline"
 248\end{code}
 249
 250Individual tokens in IL files must be separated by one or more spacing
 251characters. Both spaces and tabs are recognized as spacing characters.
 252In data and type definitions, newlines may also be used as spaces to
 253prevent overly long lines. When exactly one of two consecutive tokens is
 254a symbol (for example \texttt{,} or \texttt{=} or \texttt{\{}), spacing may be omitted.
 255
 256\ignore{
 257\begin{code}
 258ws :: Parser a -> Parser a
 259ws p = p <* skipMany blank
 260
 261ws1 :: Parser a -> Parser a
 262ws1 p = p <* skipMany1 blank
 263
 264wsNL :: Parser a -> Parser a
 265wsNL p = p <* skipNoCode (skipMany blankNL)
 266
 267wsNL1 :: Parser a -> Parser a
 268wsNL1 p = p <* skipNoCode (skipMany1 blankNL)
 269
 270-- Only intended to be used to skip comments at the start of a file.
 271skipInitComments :: Parser ()
 272skipInitComments = skipNoCode (skipMany blankNL)
 273\end{code}
 274}
 275
 276\subsection{String Literals}
 277\label{sec:strlit}
 278
 279% The string literal is not documented in the original QBE specification.
 280% See https://c9x.me/git/qbe.git/tree/parse.c?h=v1.2#n287
 281
 282\begin{code}
 283strLit :: Parser String
 284strLit = concat <$> quoted (many strChr)
 285  where
 286    strChr :: Parser [Char]
 287    strChr = (singleton <$> noneOf "\"\\") <|> escSeq
 288
 289    -- TODO: not documnted in the QBE BNF.
 290    octEsc :: Parser Char
 291    octEsc = do
 292      n <- octNumber
 293      pure $ chr (fromIntegral n)
 294
 295    escSeq :: Parser [Char]
 296    escSeq = try $ do
 297      esc <- char '\\'
 298      (singleton <$> octEsc) <|> (anyChar <&> (\c -> [esc, c]))
 299\end{code}
 300
 301Strings are enclosed by double quotes and are, for example, used to specify a
 302section name as part of the \nameref{sec:linkage} information. Within a string,
 303a double quote can be escaped using a \texttt{\textbackslash} character. All
 304escape sequences, including double quote escaping, are passed through as-is to
 305the generated assembly file.
 306
 307\section{Types}
 308
 309\subsection{Simple Types}
 310
 311The IL makes minimal use of types. By design, the types used are
 312restricted to what is necessary for unambiguous compilation to machine
 313code and C interfacing. Unlike LLVM, QBE is not using types as a means
 314to safety; they are only here for semantic purposes.
 315
 316\begin{code}
 317baseType :: Parser Q.BaseType
 318baseType = choice
 319  [ bind "w" Q.Word
 320  , bind "l" Q.Long
 321  , bind "s" Q.Single
 322  , bind "d" Q.Double ]
 323\end{code}
 324
 325The four base types are \texttt{w} (word), \texttt{l} (long), \texttt{s} (single), and \texttt{d}
 326(double), they stand respectively for 32-bit and 64-bit integers, and
 32732-bit and 64-bit floating-point numbers. There are no pointer types
 328available; pointers are typed by an integer type sufficiently wide to
 329represent all memory addresses (e.g., \texttt{l} on 64-bit architectures).
 330Temporaries in the IL can only have a base type.
 331
 332\begin{code}
 333extType :: Parser Q.ExtType
 334extType = (Q.Base <$> baseType)
 335       <|> bind "b" Q.Byte
 336       <|> bind "h" Q.HalfWord
 337\end{code}
 338
 339Extended types contain base types plus \texttt{b} (byte) and \texttt{h} (half word),
 340respectively for 8-bit and 16-bit integers. They are used in \nameref{sec:aggregate-types}
 341and \nameref{sec:data} definitions.
 342
 343For C interfacing, the IL also provides user-defined aggregate types as
 344well as signed and unsigned variants of the sub-word extended types.
 345Read more about these types in the \nameref{sec:aggregate-types}
 346and \nameref{sec:functions} sections.
 347
 348\subsection{Subtyping}
 349\label{sec:subtyping}
 350
 351The IL has a minimal subtyping feature, for integer types only. Any
 352value of type \texttt{l} can be used in a \texttt{w} context. In that case, only the
 35332 least significant bits of the word value are used.
 354
 355Make note that it is the opposite of the usual subtyping on integers (in
 356C, we can safely use an \texttt{int} where a \texttt{long} is expected). A long value
 357cannot be used in word context. The rationale is that a word can be
 358signed or unsigned, so extending it to a long could be done in two ways,
 359either by zero-extension, or by sign-extension.
 360
 361\subsection{Constants and Vals}
 362\label{sec:constants-and-vals}
 363
 364\begin{code}
 365dynConst :: Parser Q.DynConst
 366dynConst =
 367  (Q.Const <$> constant)
 368    <|> (Q.Thread <$> global)
 369    <?> "dynconst"
 370\end{code}
 371
 372Constants come in two kinds: compile-time constants and dynamic
 373constants. Dynamic constants include compile-time constants and other
 374symbol variants that are only known at program-load time or execution
 375time. Consequently, dynamic constants can only occur in function bodies.
 376
 377The representation of integers is two's complement.
 378Floating-point numbers are represented using the single-precision and
 379double-precision formats of the IEEE 754 standard.
 380
 381\begin{code}
 382constant :: Parser Q.Const
 383constant =
 384  (Q.Number <$> decNumber)
 385    <|> (Q.SFP <$> sfp)
 386    <|> (Q.DFP <$> dfp)
 387    <|> (Q.Global <$> global)
 388    <?> "const"
 389  where
 390    sfp = string "s_" >> float
 391    dfp = string "d_" >> float
 392\end{code}
 393
 394Constants specify a sequence of bits and are untyped. They are always
 395parsed as 64-bit blobs. Depending on the context surrounding a constant,
 396only some of its bits are used. For example, in the program below, the
 397two variables defined have the same value since the first operand of the
 398subtraction is a word (32-bit) context.
 399
 400\begin{verbatim}
 401%x =w sub -1, 0 %y =w sub 4294967295, 0
 402\end{verbatim}
 403
 404Because specifying floating-point constants by their bits makes the code
 405less readable, syntactic sugar is provided to express them. Standard
 406scientific notation is prefixed with \texttt{s\_} and \texttt{d\_} for single and
 407double precision numbers respectively. Once again, the following example
 408defines twice the same double-precision constant.
 409
 410\begin{verbatim}
 411%x =d add d_0, d_-1
 412%y =d add d_0, -4616189618054758400
 413\end{verbatim}
 414
 415Global symbols can also be used directly as constants; they will be
 416resolved and turned into actual numeric constants by the linker.
 417
 418When the \texttt{thread} keyword prefixes a symbol name, the
 419symbol\textquotesingle s numeric value is resolved at runtime in the
 420thread-local storage.
 421
 422\begin{code}
 423val :: Parser Q.Value
 424val =
 425  (Q.VConst <$> dynConst)
 426    <|> (Q.VLocal <$> local)
 427    <?> "val"
 428\end{code}
 429
 430Vals are used as arguments in regular, phi, and jump instructions within
 431function definitions. They are either constants or function-scope
 432temporaries.
 433
 434\subsection{Linkage}
 435\label{sec:linkage}
 436
 437\begin{code}
 438linkage :: Parser Q.Linkage
 439linkage =
 440  wsNL (bind "export" Q.LExport)
 441    <|> wsNL (bind "thread" Q.LThread)
 442    <|> do
 443      _ <- ws1 $ string "section"
 444      (try secWithFlags) <|> sec
 445  where
 446    sec :: Parser Q.Linkage
 447    sec = wsNL strLit <&> (`Q.LSection` Nothing)
 448
 449    secWithFlags :: Parser Q.Linkage
 450    secWithFlags = do
 451      n <- ws1 strLit
 452      wsNL strLit <&> Q.LSection n . Just
 453\end{code}
 454
 455Function and data definitions (see below) can specify linkage
 456information to be passed to the assembler and eventually to the linker.
 457
 458The \texttt{export} linkage flag marks the defined item as visible outside the
 459current file\textquotesingle s scope. If absent, the symbol can only be
 460referred to locally. Functions compiled by QBE and called from C need to
 461be exported.
 462
 463The \texttt{thread} linkage flag can only qualify data definitions. It mandates
 464that the object defined is stored in thread-local storage. Each time a
 465runtime thread starts, the supporting platform runtime is in charge of
 466making a new copy of the object for the fresh thread. Objects in
 467thread-local storage must be accessed using the \texttt{thread \$IDENT} syntax,
 468as specified in the \nameref{sec:constants-and-vals} section.
 469
 470A \texttt{section} flag can be specified to tell the linker to put the defined
 471item in a certain section. The use of the section flag is platform
 472dependent and we refer the user to the documentation of their assembler
 473and linker for relevant information.
 474
 475\begin{verbatim}
 476section ".init_array" data $.init.f = { l $f }
 477\end{verbatim}
 478
 479The section flag can be used to add function pointers to a global
 480initialization list, as depicted above. Note that some platforms provide
 481a BSS section that can be used to minimize the footprint of uniformly
 482zeroed data. When this section is available, QBE will automatically make
 483use of it and no section flag is required.
 484
 485The section and export linkage flags should each appear at most once in
 486a definition. If multiple occurrences are present, QBE is free to use
 487any.
 488
 489\subsection{Definitions}
 490\label{sec:definitions}
 491
 492Definitions are the essential components of an IL file. They can define
 493three types of objects: aggregate types, data, and functions. Aggregate
 494types are never exported and do not compile to any code. Data and
 495function definitions have file scope and are mutually recursive (even
 496across IL files). Their visibility can be controlled using linkage
 497flags.
 498
 499\subsubsection{Aggregate Types}
 500\label{sec:aggregate-types}
 501
 502\begin{code}
 503typeDef :: Parser Q.TypeDef
 504typeDef = do
 505  _ <- wsNL1 (string "type")
 506  i <- wsNL1 userDef
 507  _ <- wsNL1 (char '=')
 508  a <- optionMaybe alignAny
 509  bracesNL (opaqueType <|> unionType <|> regularType) <&> Q.TypeDef i a
 510\end{code}
 511
 512Aggregate type definitions start with the \texttt{type} keyword. They have file
 513scope, but types must be defined before being referenced. The inner
 514structure of a type is expressed by a comma-separated list of fields.
 515
 516\begin{code}
 517subType :: Parser Q.SubType
 518subType =
 519  (Q.SExtType <$> extType)
 520    <|> (Q.SUserDef <$> userDef)
 521
 522field :: Parser Q.Field
 523field = do
 524  -- TODO: newline is required if there is a number argument
 525  f <- wsNL subType
 526  s <- ws $ optionMaybe decNumber
 527  pure (f, s)
 528
 529fields :: Bool -> Parser [Q.Field]
 530fields allowEmpty =
 531  (if allowEmpty then sepByTrail else sepByTrail1) field (wsNL $ char ',')
 532\end{code}
 533
 534A field consists of a subtype, either an extended type or a user-defined type,
 535and an optional number expressing the value of this field. In case many items
 536of the same type are sequenced (like in a C array), the shorter array syntax
 537can be used.
 538
 539\begin{code}
 540regularType :: Parser Q.AggType
 541regularType = Q.ARegular <$> fields True
 542\end{code}
 543
 544Three different kinds of aggregate types are presentl ysupported: regular
 545types, union types and opaque types. The fields of regular types will be
 546packed. By default, the alignment of an aggregate type is the maximum alignment
 547of its members. The alignment can be explicitly specified by the programmer.
 548
 549\begin{code}
 550unionType :: Parser Q.AggType
 551unionType = Q.AUnion <$> many1 (wsNL unionType')
 552  where
 553    unionType' :: Parser [Q.Field]
 554    unionType' = bracesNL $ fields False
 555\end{code}
 556
 557Union types allow the same chunk of memory to be used with different layouts. They are defined by enclosing multiple regular aggregate type bodies in a pair of curly braces. Size and alignment of union types are set to the maximum size and alignment of each variation or, in the case of alignment, can be explicitly specified.
 558
 559\begin{code}
 560opaqueType :: Parser Q.AggType
 561opaqueType = Q.AOpaque <$> wsNL decNumber
 562\end{code}
 563
 564Opaque types are used when the inner structure of an aggregate cannot be specified; the alignment for opaque types is mandatory. They are defined simply by enclosing their size between curly braces.
 565
 566\subsubsection{Data}
 567\label{sec:data}
 568
 569\begin{code}
 570dataDef :: Parser Q.DataDef
 571dataDef = do
 572  link <- many linkage
 573  name <- wsNL1 (string "data") >> wsNL global
 574  _ <- wsNL (char '=')
 575  alignment <- optionMaybe alignAny
 576  bracesNL dataObjs <&> Q.DataDef link name alignment
 577 where
 578    -- TODO: sepByTrail is not documented in the QBE BNF.
 579    dataObjs = sepByTrail dataObj (wsNL $ char ',')
 580\end{code}
 581
 582Data definitions express objects that will be emitted in the compiled
 583file. Their visibility and location in the compiled artifact are
 584controlled with linkage flags described in the \nameref{sec:linkage}
 585section.
 586
 587They define a global identifier (starting with the sigil \texttt{\$}), that
 588will contain a pointer to the object specified by the definition.
 589
 590\begin{code}
 591dataObj :: Parser Q.DataObj
 592dataObj =
 593  (Q.OZeroFill <$> (wsNL1 (char 'z') >> wsNL decNumber))
 594    <|> do
 595      t <- wsNL1 extType
 596      i <- many1 (wsNL dataItem)
 597      return $ Q.OItem t i
 598\end{code}
 599
 600Objects are described by a sequence of fields that start with a type
 601letter. This letter can either be an extended type, or the \texttt{z} letter.
 602If the letter used is an extended type, the data item following
 603specifies the bits to be stored in the field.
 604
 605\begin{code}
 606dataItem :: Parser Q.DataItem
 607dataItem =
 608  (Q.DString <$> strLit)
 609    <|> try
 610      ( do
 611          i <- ws global
 612          off <- (ws $ char '+') >> ws decNumber
 613          return $ Q.DSymOff i off
 614      )
 615    <|> (Q.DConst <$> constant)
 616\end{code}
 617
 618Within each object, several items can be defined. When several data items
 619follow a letter, they initialize multiple fields of the same size.
 620
 621\begin{code}
 622allocSize :: Parser Q.AllocSize
 623allocSize =
 624  choice
 625    [ bind "4" Q.AllocWord,
 626      bind "8" Q.AllocLong,
 627      bind "16" Q.AllocLongLong
 628    ]
 629\end{code}
 630
 631The members of a struct will be packed. This means that padding has to
 632be emitted by the frontend when necessary. Alignment of the whole data
 633objects can be manually specified, and when no alignment is provided,
 634the maximum alignment from the platform is used.
 635
 636When the \texttt{z} letter is used the number following indicates the size of
 637the field; the contents of the field are zero initialized. It can be
 638used to add padding between fields or zero-initialize big arrays.
 639
 640\subsubsection{Functions}
 641\label{sec:functions}
 642
 643\begin{code}
 644funcDef :: Parser Q.FuncDef
 645funcDef = do
 646  link <- many linkage
 647  _ <- ws1 (string "function")
 648  retTy <- optionMaybe (ws1 abity)
 649  name <- ws global
 650  args <- wsNL params
 651  body <- between (wsNL1 $ char '{') (wsNL $ char '}') $ many1 block
 652
 653  case (Q.insertJumps body) of
 654    Nothing -> fail $ "invalid fallthrough in " ++ show name
 655    Just bl -> return $ Q.FuncDef link name retTy args bl
 656\end{code}
 657
 658Function definitions contain the actual code to emit in the compiled
 659file. They define a global symbol that contains a pointer to the
 660function code. This pointer can be used in \texttt{call} instructions or stored
 661in memory.
 662
 663\begin{code}
 664subWordType :: Parser Q.SubWordType
 665subWordType = choice
 666  [ try $ bind "sb" Q.SignedByte
 667  , try $ bind "ub" Q.UnsignedByte
 668  , bind "sh" Q.SignedHalf
 669  , bind "uh" Q.UnsignedHalf ]
 670
 671abity :: Parser Q.Abity
 672abity = try (Q.ASubWordType <$> subWordType)
 673    <|> (Q.ABase <$> baseType)
 674    <|> (Q.AUserDef <$> userDef)
 675\end{code}
 676
 677The type given right before the function name is the return type of the
 678function. All return values of this function must have this return type.
 679If the return type is missing, the function must not return any value.
 680
 681\begin{code}
 682param :: Parser Q.FuncParam
 683param = (Q.Env <$> (ws1 (string "env") >> local))
 684    <|> (string "..." >> pure Q.Variadic)
 685    <|> do
 686          ty <- ws1 abity
 687          Q.Regular ty <$> local
 688
 689params :: Parser [Q.FuncParam]
 690params = parenLst param
 691\end{code}
 692
 693The parameter list is a comma separated list of temporary names prefixed
 694by types. The types are used to correctly implement C compatibility.
 695When an argument has an aggregate type, a pointer to the aggregate is
 696passed by thea caller. In the example below, we have to use a load
 697instruction to get the value of the first (and only) member of the
 698struct.
 699
 700\begin{verbatim}
 701type :one = { w }
 702
 703function w $getone(:one %p) {
 704@start
 705        %val =w loadw %p
 706        ret %val
 707}
 708\end{verbatim}
 709
 710If a function accepts or returns values that are smaller than a word,
 711such as \texttt{signed char} or \texttt{unsigned short} in C, one of the sub-word type
 712must be used. The sub-word types \texttt{sb}, \texttt{ub}, \texttt{sh}, and \texttt{uh} stand,
 713respectively, for signed and unsigned 8-bit values, and signed and
 714unsigned 16-bit values. Parameters associated with a sub-word type of
 715bit width N only have their N least significant bits set and have base
 716type \texttt{w}. For example, the function
 717
 718\begin{verbatim}
 719function w $addbyte(w %a, sb %b) {
 720@start
 721        %bw =w extsb %b
 722        %val =w add %a, %bw
 723        ret %val
 724}
 725\end{verbatim}
 726
 727needs to sign-extend its second argument before the addition. Dually,
 728return values with sub-word types do not need to be sign or zero
 729extended.
 730
 731If the parameter list ends with \texttt{...}, the function is a variadic
 732function: it can accept a variable number of arguments. To access the
 733extra arguments provided by the caller, use the \texttt{vastart} and \texttt{vaarg}
 734instructions described in the \nameref{sec:variadic} section.
 735
 736Optionally, the parameter list can start with an environment parameter
 737\texttt{env \%e}. This special parameter is a 64-bit integer temporary (i.e.,
 738of type \texttt{l}). If the function does not use its environment parameter,
 739callers can safely omit it. This parameter is invisible to a C caller:
 740for example, the function
 741
 742\begin{verbatim}
 743export function w $add(env %e, w %a, w %b) {
 744@start
 745        %c =w add %a, %b
 746        ret %c
 747}
 748\end{verbatim}
 749
 750must be given the C prototype \texttt{int add(int, int)}. The intended use of
 751this feature is to pass the environment pointer of closures while
 752retaining a very good compatibility with C. The \nameref{sec:call}
 753section explains how to pass an environment parameter.
 754
 755Since global symbols are defined mutually recursive, there is no need
 756for function declarations: a function can be referenced before its
 757definition. Similarly, functions from other modules can be used without
 758previous declaration. All the type information necessary to compile a
 759call is in the instruction itself.
 760
 761The syntax and semantics for the body of functions are described in the
 762\nameref{sec:control} section.
 763
 764\section{Control}
 765\label{sec:control}
 766
 767The IL represents programs as textual transcriptions of control flow
 768graphs. The control flow is serialized as a sequence of blocks of
 769straight-line code which are connected using jump instructions.
 770
 771\subsection{Blocks}
 772\label{sec:blocks}
 773
 774\begin{code}
 775block :: Parser Q.Block'
 776block = do
 777  l <- wsNL1 label
 778  p <- many (wsNL1 $ try phiInstr)
 779  s <- many (wsNL1 statement)
 780  Q.Block' l p s <$> (optionMaybe $ wsNL1 jumpInstr)
 781\end{code}
 782
 783All blocks have a name that is specified by a label at their beginning.
 784Then follows a sequence of instructions that have "fall-through" flow.
 785Finally one jump terminates the block. The jump can either transfer
 786control to another block of the same function or return; jumps are
 787described further below.
 788
 789The first block in a function must not be the target of any jump in the
 790program. If a jump to the function start is needed, the frontend must
 791insert an empty prelude block at the beginning of the function.
 792
 793When one block jumps to the next block in the IL file, it is not
 794necessary to write the jump instruction, it will be automatically added
 795by the parser. For example the start block in the example below jumps
 796directly to the loop block.
 797
 798\subsection{Jumps}
 799\label{sec:jumps}
 800
 801\begin{code}
 802jumpInstr :: Parser Q.JumpInstr
 803jumpInstr = (string "hlt" >> pure Q.Halt)
 804        -- TODO: Return requires a space if there is an optionMaybe
 805        <|> Q.Return <$> ((ws $ string "ret") >> optionMaybe val)
 806        <|> try (Q.Jump <$> ((ws1 $ string "jmp") >> label))
 807        <|> do
 808          _ <- ws1 $ string "jnz"
 809          v <- ws val <* ws (char ',')
 810          l1 <- ws label <* ws (char ',')
 811          l2 <- ws label
 812          return $ Q.Jnz v l1 l2
 813\end{code}
 814
 815A jump instruction ends every block and transfers the control to another
 816program location. The target of a jump must never be the first block in
 817a function. The three kinds of jumps available are described in the
 818following list.
 819
 820\begin{enumerate}
 821  \item \textbf{Unconditional jump.} Jumps to another block of the same function.
 822  \item \textbf{Conditional jump.} When its word argument is non-zero, it jumps to its first label argument; otherwise it jumps to the other label. The argument must be of word type; because of subtyping a long argument can be passed, but only its least significant 32 bits will be compared to 0.
 823  \item \textbf{Function return.} Terminates the execution of the current function, optionally returning a value to the caller. The value returned must be of the type given in the function prototype. If the function prototype does not specify a return type, no return value can be used.
 824  \item \textbf{Program termination.} Terminates the execution of the program with a target-dependent error. This instruction can be used when it is expected that the execution never reaches the end of the block it closes; for example, after having called a function such as \texttt{exit()}.
 825\end{enumerate}
 826
 827\section{Instructions}
 828\label{sec:instructions}
 829
 830\begin{code}
 831instr :: Parser Q.Instr
 832instr =
 833  choice
 834    [ try $ binaryInstr Q.Add "add",
 835      try $ binaryInstr Q.Sub "sub",
 836      try $ binaryInstr Q.Mul "mul",
 837      try $ binaryInstr Q.Div "div",
 838      try $ binaryInstr Q.URem "urem",
 839      try $ binaryInstr Q.Rem "rem",
 840      try $ binaryInstr Q.UDiv "udiv",
 841      try $ binaryInstr Q.Or "or",
 842      try $ binaryInstr Q.Xor "xor",
 843      try $ binaryInstr Q.And "and",
 844      try $ binaryInstr Q.Sar "sar",
 845      try $ binaryInstr Q.Shr "shr",
 846      try $ binaryInstr Q.Shl "shl",
 847      try $ unaryInstr Q.Neg "neg",
 848      try $ unaryInstr Q.Cast "cast",
 849      try $ unaryInstr Q.Copy "copy",
 850      try $ unaryInstr Q.VAArg "vaarg",
 851      try $ loadInstr,
 852      try $ allocInstr,
 853      try $ compareInstr,
 854      try $ extInstr,
 855      try $ truncInstr,
 856      try $ fromFloatInstr,
 857      try $ toFloatInstr
 858    ]
 859\end{code}
 860
 861Instructions are the smallest piece of code in the IL, they form the body of
 862\nameref{sec:blocks}. This specification distinguishes instructions and
 863volatile instructions, the latter do not return a value. For the former, the IL
 864uses a three-address code, which means that one instruction computes an
 865operation between two operands and assigns the result to a third one.
 866
 867\begin{code}
 868assign :: Parser Q.Statement
 869assign = do
 870  n <- ws local
 871  t <- ws (char '=') >> ws1 baseType
 872  Q.Assign n t <$> instr
 873
 874volatileInstr :: Parser Q.Statement
 875volatileInstr =
 876  Q.Volatile <$>
 877    (storeInstr <|> blitInstr <|> vastartInstr <|> dbglocInstr)
 878
 879-- TODO: Not documented in the QBE BNF.
 880statement :: Parser Q.Statement
 881statement = (try callInstr) <|> assign <|> volatileInstr
 882\end{code}
 883
 884An instruction has both a name and a return type, this return type is a base
 885type that defines the size of the instruction's result. The type of the
 886arguments can be unambiguously inferred using the instruction name and the
 887return type. For example, for all arithmetic instructions, the type of the
 888arguments is the same as the return type. The two additions below are valid if
 889\texttt{\%y} is a word or a long (because of \nameref{sec:subtyping}).
 890
 891\begin{verbatim}
 892%x =w add 0, %y
 893%z =w add %x, %x
 894\end{verbatim}
 895
 896Some instructions, like comparisons and memory loads have operand types
 897that differ from their return types. For instance, two floating points
 898can be compared to give a word result (0 if the comparison succeeds, 1
 899if it fails).
 900
 901\begin{verbatim}
 902%c =w cgts %a, %b
 903\end{verbatim}
 904
 905In the example above, both operands have to have single type. This is
 906made explicit by the instruction suffix.
 907
 908\subsection{Arithmetic and Bits}
 909
 910\begin{quote}
 911\begin{itemize}
 912\item \texttt{add}, \texttt{sub}, \texttt{div}, \texttt{mul}
 913\item \texttt{neg}
 914\item \texttt{udiv}, \texttt{rem}, \texttt{urem}
 915\item \texttt{or}, \texttt{xor}, \texttt{and}
 916\item \texttt{sar}, \texttt{shr}, \texttt{shl}
 917\end{itemize}
 918\end{quote}
 919
 920The base arithmetic instructions in the first bullet are available for
 921all types, integers and floating points.
 922
 923When \texttt{div} is used with word or long return type, the arguments are
 924treated as signed. The unsigned integral division is available as \texttt{udiv}
 925instruction. When the result of a division is not an integer, it is truncated
 926towards zero.
 927
 928The signed and unsigned remainder operations are available as \texttt{rem} and
 929\texttt{urem}. The sign of the remainder is the same as the one of the
 930dividend. Its magnitude is smaller than the divisor one. These two instructions
 931and \texttt{udiv} are only available with integer arguments and result.
 932
 933Bitwise OR, AND, and XOR operations are available for both integer
 934types. Logical operations of typical programming languages can be
 935implemented using \nameref{sec:comparisions} and \nameref{sec:jumps}.
 936
 937Shift instructions \texttt{sar}, \texttt{shr}, and \texttt{shl}, shift right or
 938left their first operand by the amount from the second operand. The shifting
 939amount is taken modulo the size of the result type. Shifting right can either
 940preserve the sign of the value (using \texttt{sar}), or fill the newly freed
 941bits with zeroes (using \texttt{shr}). Shifting left always fills the freed
 942bits with zeroes.
 943
 944Remark that an arithmetic shift right (\texttt{sar}) is only equivalent to a
 945division by a power of two for non-negative numbers. This is because the shift
 946right "truncates" towards minus infinity, while the division truncates towards
 947zero.
 948
 949\subsection{Memory}
 950\label{sec:memory}
 951
 952The following sections discuss instructions for interacting with values stored in memory.
 953
 954\subsubsection{Store instructions}
 955
 956\begin{code}
 957storeInstr :: Parser Q.VolatileInstr
 958storeInstr = do
 959  t <- string "store" >> ws1 extType
 960  v <- ws val
 961  _ <- ws $ char ','
 962  ws val <&> Q.Store t v
 963\end{code}
 964
 965Store instructions exist to store a value of any base type and any extended
 966type. Since halfwords and bytes are not first class in the IL, \texttt{storeh}
 967and \texttt{storeb} take a word as argument. Only the first 16 or 8 bits of
 968this word will be stored in memory at the address specified in the second
 969argument.
 970
 971\subsubsection{Load instructions}
 972
 973\begin{code}
 974loadInstr :: Parser Q.Instr
 975loadInstr = do
 976  _ <- string "load"
 977  t <- ws1 $ choice
 978    [ try $ bind "sw" (Q.LBase Q.Word),
 979      try $ bind "uw" (Q.LBase Q.Word),
 980      try $ Q.LSubWord <$> subWordType,
 981      Q.LBase <$> baseType
 982    ]
 983  ws val <&> Q.Load t
 984\end{code}
 985
 986For types smaller than long, two variants of the load instruction are
 987available: one will sign extend the loaded value, while the other will zero
 988extend it. Note that all loads smaller than long can load to either a long or a
 989word.
 990
 991The two instructions \texttt{loadsw} and \texttt{loaduw} have the same effect
 992when they are used to define a word temporary. A \texttt{loadw} instruction is
 993provided as syntactic sugar for \texttt{loadsw} to make explicit that the
 994extension mechanism used is irrelevant.
 995
 996\subsubsection{Blits}
 997
 998\begin{code}
 999blitInstr :: Parser Q.VolatileInstr
1000blitInstr = do
1001  v1 <- (ws1 $ string "blit") >> ws val <* (ws $ char ',')
1002  v2 <- ws val <* (ws $ char ',')
1003  nb <- decNumber
1004  return $ Q.Blit v1 v2 nb
1005\end{code}
1006
1007The blit instruction copies in-memory data from its first address argument to
1008its second address argument. The third argument is the number of bytes to copy.
1009The source and destination spans are required to be either non-overlapping, or
1010fully overlapping (source address identical to the destination address). The
1011byte count argument must be a nonnegative numeric constant; it cannot be a
1012temporary.
1013
1014One blit instruction may generate a number of instructions proportional to its
1015byte count argument, consequently, it is recommended to keep this argument
1016relatively small. If large copies are necessary, it is preferable that
1017frontends generate calls to a supporting \texttt{memcpy} function.
1018
1019\subsubsection{Stack Allocation}
1020
1021\begin{code}
1022allocInstr :: Parser Q.Instr
1023allocInstr = do
1024  siz <- (ws $ string "alloc") >> (ws1 allocSize)
1025  val <&> Q.Alloc siz
1026\end{code}
1027
1028These instructions allocate a chunk of memory on the stack. The number ending
1029the instruction name is the alignment required for the allocated slot. QBE will
1030make sure that the returned address is a multiple of that alignment value.
1031
1032Stack allocation instructions are used, for example, when compiling the C local
1033variables, because their address can be taken. When compiling Fortran,
1034temporaries can be used directly instead, because it is illegal to take the
1035address of a variable.
1036
1037\subsection{Comparisons}
1038\label{sec:comparisions}
1039
1040\begin{code}
1041compareInstr :: Parser Q.Instr
1042compareInstr = do
1043  _ <- char 'c'
1044  (try intCompare) <|> floatCompare
1045
1046compareArgs :: Parser (Q.Value, Q.Value)
1047compareArgs = do
1048  lhs <- ws val <* ws (char ',')
1049  rhs <- ws val
1050  pure (lhs, rhs)
1051
1052intCompare :: Parser Q.Instr
1053intCompare = do
1054  op <- compareIntOp
1055  ty <- ws1 intArg
1056
1057  (lhs, rhs) <- compareArgs
1058  pure $ Q.CompareInt ty op lhs rhs
1059
1060floatCompare :: Parser Q.Instr
1061floatCompare = do
1062  op <- compareFloatOp
1063  ty <- ws1 floatArg
1064
1065  (lhs, rhs) <- compareArgs
1066  pure $ Q.CompareFloat ty op lhs rhs
1067\end{code}
1068
1069Comparison instructions return an integer value (either a word or a long), and
1070compare values of arbitrary types. The returned value is 1 if the two operands
1071satisfy the comparison relation, or 0 otherwise. The names of comparisons
1072respect a standard naming scheme in three parts:
1073
1074\begin{enumerate}
1075  \item All comparisons start with the letter \texttt{c}.
1076  \item Then comes a comparison type.
1077  \item Finally, the instruction name is terminated with a basic type suffix precising the type of the operands to be compared.
1078\end{enumerate}
1079
1080The following instruction are available for integer comparisons:
1081
1082\begin{code}
1083compareIntOp :: Parser Q.IntCmpOp
1084compareIntOp = choice
1085  [ bind "eq" Q.IEq
1086  , bind "ne" Q.INe
1087  , try $ bind "sle" Q.ISle
1088  , try $ bind "slt" Q.ISlt
1089  , try $ bind "sge" Q.ISge
1090  , try $ bind "sgt" Q.ISgt
1091  , try $ bind "ule" Q.IUle
1092  , try $ bind "ult" Q.IUlt
1093  , try $ bind "uge" Q.IUge
1094  , try $ bind "ugt" Q.IUgt ]
1095\end{code}
1096
1097For floating point comparisons use one of these instructions:
1098
1099\begin{code}
1100compareFloatOp :: Parser Q.FloatCmpOp
1101compareFloatOp = choice
1102  [ bind "eq" Q.FEq
1103  , bind "ne" Q.FNe
1104  , try $ bind "le" Q.FLe
1105  , bind "lt" Q.FLt
1106  , try $ bind "ge" Q.FGe
1107  , bind "gt" Q.FGt
1108  , bind "o" Q.FOrd
1109  , bind "uo" Q.FUnord ]
1110\end{code}
1111
1112For example, \texttt{cod} compares two double-precision floating point numbers
1113and returns 1 if the two floating points are not NaNs, or 0 otherwise. The
1114\texttt{csltw} instruction compares two words representing signed numbers and
1115returns 1 when the first argument is smaller than the second one.
1116
1117\subsection{Conversions}
1118
1119Conversion operations change the representation of a value, possibly modifying
1120it if the target type cannot hold the value of the source type. Conversions can
1121extend the precision of a temporary (e.g., from signed 8-bit to 32-bit), or
1122convert a floating point into an integer and vice versa.
1123
1124\begin{code}
1125extInstr :: Parser Q.Instr
1126extInstr = do
1127  _ <- string "ext"
1128  ty <- ws1 extArg
1129  ws val <&> Q.Ext ty
1130 where
1131  extArg :: Parser Q.ExtArg
1132  extArg = try (Q.ExtSubWord <$> subWordType)
1133    <|> try (bind "sw" Q.ExtSignedWord)
1134    <|> bind "s" Q.ExtSingle
1135    <|> bind "uw" Q.ExtUnsignedWord
1136\end{code}
1137
1138Extending the precision of a temporary is done using the \texttt{ext} family of
1139instructions. Because QBE types do not specify the signedness (like in LLVM),
1140extension instructions exist to sign-extend and zero-extend a value. For
1141example, \texttt{extsb} takes a word argument and sign-extends the 8
1142least-significant bits to a full word or long, depending on the return type.
1143
1144\begin{code}
1145truncInstr :: Parser Q.Instr
1146truncInstr = do
1147  _ <- ws1 $ string "truncd"
1148  ws val <&> Q.TruncDouble
1149\end{code}
1150
1151The instructions \texttt{exts} (extend single) and \texttt{truncd} (truncate
1152double) are provided to change the precision of a floating point value. When
1153the double argument of truncd cannot be represented as a single-precision
1154floating point, it is truncated towards zero.
1155
1156\begin{code}
1157floatArg :: Parser Q.FloatArg
1158floatArg = bind "d" Q.FDouble <|> bind "s" Q.FSingle
1159
1160fromFloatInstr :: Parser Q.Instr
1161fromFloatInstr = do
1162  arg <- floatArg <* string "to"
1163  isSigned <- signageChar
1164  _ <- ws1 $ char 'i'
1165  ws val <&> Q.FloatToInt arg isSigned
1166
1167intArg :: Parser Q.IntArg
1168intArg = bind "w" Q.IWord <|> bind "l" Q.ILong
1169
1170toFloatInstr :: Parser Q.Instr
1171toFloatInstr = do
1172  isSigned <- signageChar
1173  arg <- intArg
1174  _ <- ws1 $ string "tof"
1175  ws val <&> Q.IntToFloat arg isSigned
1176\end{code}
1177
1178Converting between signed integers and floating points is done using
1179\texttt{stosi} (single to signed integer), \texttt{stoui} (single to unsigned
1180integer), \texttt{dtosi} (double to signed integer), \texttt{dtoui} (double to
1181unsigned integer), \texttt{swtof} (signed word to float), \texttt{uwtof}
1182(unsigned word to float), \texttt{sltof} (signed long to float) and
1183\texttt{ultof} (unsigned long to float).
1184
1185\subsection{Cast and Copy}
1186
1187The \texttt{cast} and \texttt{copy} instructions return the bits of their
1188argument verbatim. However a cast will change an integer into a floating point
1189of the same width and vice versa.
1190
1191Casts can be used to make bitwise operations on the representation of floating
1192point numbers. For example the following program will compute the opposite of
1193the single-precision floating point number \texttt{\%f} into \texttt{\%rs}.
1194
1195\begin{verbatim}
1196%b0 =w cast %f
1197%b1 =w xor 2147483648, %b0  # flip the msb
1198%rs =s cast %b1
1199\end{verbatim}
1200
1201\subsection{Call}
1202\label{sec:call}
1203
1204\begin{code}
1205-- TODO: Code duplication with 'param'.
1206callArg :: Parser Q.FuncArg
1207callArg = (Q.ArgEnv <$> (ws1 (string "env") >> val))
1208    <|> (string "..." >> pure Q.ArgVar)
1209    <|> do
1210          ty <- ws1 abity
1211          Q.ArgReg ty <$> val
1212
1213callArgs :: Parser [Q.FuncArg]
1214callArgs = parenLst callArg
1215
1216callInstr :: Parser Q.Statement
1217callInstr = do
1218  retValue <- optionMaybe $ do
1219    i <- ws local <* ws (char '=')
1220    a <- ws1 abity
1221    return (i, a)
1222  toCall <- ws1 (string "call") >> ws val
1223  fnArgs <- callArgs
1224  return $ Q.Call retValue toCall fnArgs
1225\end{code}
1226
1227The call instruction is special in several ways. It is not a three-address
1228instruction and requires the type of all its arguments to be given. Also, the
1229return type can be either a base type or an aggregate type. These specifics are
1230required to compile calls with C compatibility (i.e., to respect the ABI).
1231
1232When an aggregate type is used as argument type or return type, the value
1233respectively passed or returned needs to be a pointer to a memory location
1234holding the value. This is because aggregate types are not first-class
1235citizens of the IL.
1236
1237Sub-word types are used for arguments and return values of width less than a
1238word. Details on these types are presented in the \nameref{sec:functions} section.
1239Arguments with sub-word types need not be sign or zero extended according to
1240their type. Calls with a sub-word return type define a temporary of base type
1241\texttt{w} with its most significant bits unspecified.
1242
1243Unless the called function does not return a value, a return temporary must be
1244specified, even if it is never used afterwards.
1245
1246An environment parameter can be passed as first argument using the \texttt{env}
1247keyword. The passed value must be a 64-bit integer. If the called function does
1248not expect an environment parameter, it will be safely discarded. See the
1249\nameref{sec:functions} section for more information about environment
1250parameters.
1251
1252When the called function is variadic, there must be a \texttt{...} marker
1253separating the named and variadic arguments.
1254
1255\subsection{Variadic}
1256\label{sec:variadic}
1257
1258\begin{code}
1259vastartInstr :: Parser Q.VolatileInstr
1260vastartInstr = do
1261  _ <- ws1 (string "vastart")
1262  Q.VAStart <$> ws val
1263\end{code}
1264
1265The \texttt{vastart} and \texttt{vaarg} instructions provide a portable way to
1266access the extra parameters of a variadic function.
1267
1268\begin{enumerate}
1269  \item \texttt{vastart} -- \texttt{(m)}
1270  \item \texttt{vaarg} -- \texttt{T(mmmm)}
1271\end{enumerate}
1272
1273The \texttt{vastart} instruction initializes a variable argument list used to
1274access the extra parameters of the enclosing variadic function. It is safe to
1275call it multiple times.
1276
1277The \texttt{vaarg} instruction fetches the next argument from a variable
1278argument list. It is currently limited to fetching arguments that have a base
1279type. This instruction is essentially effectful: calling it twice in a row will
1280return two consecutive arguments from the argument list.
1281
1282Both instructions take a pointer to a variable argument list as the sole argument.
1283The size and alignment of the variable argument lists depends on the target used.
1284
1285\subsection{Phi}
1286
1287\begin{code}
1288phiBranch :: Parser (Q.BlockIdent, Q.Value)
1289phiBranch = do
1290  n <- ws1 label
1291  v <- val
1292  pure (n, v)
1293
1294phiInstr :: Parser Q.Phi
1295phiInstr = do
1296  -- TODO: code duplication with 'assign'
1297  n <- ws local
1298  t <- ws (char '=') >> ws1 baseType
1299
1300  _ <- ws1 (string "phi")
1301  -- TODO: combinator for sepBy
1302  p <- Map.fromList <$> sepBy1 (ws phiBranch) (ws $ char ',')
1303  return $ Q.Phi n t p
1304\end{code}
1305
1306First and foremost, phi instructions are NOT necessary when writing a frontend
1307to QBE. One solution to avoid having to deal with SSA form is to use stack
1308allocated variables for all source program variables and perform assignments
1309and lookups using \nameref{sec:memory} operations. This is what LLVM users
1310typically do.
1311
1312Another solution is to simply emit code that is not in SSA form! Contrary to
1313LLVM, QBE is able to fixup programs not in SSA form without requiring the
1314boilerplate of loading and storing in memory. For example, the following
1315program will be correctly compiled by QBE.
1316
1317\begin{verbatim}
1318@start
1319    %x =w copy 100
1320    %s =w copy 0
1321@loop
1322    %s =w add %s, %x
1323    %x =w sub %x, 1
1324    jnz %x, @loop, @end
1325@end
1326    ret %s
1327\end{verbatim}
1328
1329Now, if you want to know what phi instructions are and how to use them in QBE,
1330you can read the following.
1331
1332Phi instructions are specific to SSA form. In SSA form values can only be
1333assigned once, without phi instructions, this requirement is too strong to
1334represent many programs. For example consider the following C program.
1335
1336\begin{verbatim}
1337int f(int x) {
1338    int y;
1339    if (x)
1340        y = 1;
1341    else
1342        y = 2;
1343    return y;
1344}
1345\end{verbatim}
1346
1347The variable \texttt{y} is assigned twice, the solution to translate it in SSA
1348form is to insert a phi instruction.
1349
1350\begin{verbatim}
1351@ifstmt
1352    jnz %x, @ift, @iff
1353@ift
1354    jmp @retstmt
1355@iff
1356    jmp @retstmt
1357@retstmt
1358    %y =w phi @ift 1, @iff 2
1359    ret %y
1360\end{verbatim}
1361
1362Phi instructions return one of their arguments depending on where the control
1363came from. In the example, \texttt{\%y} is set to 1 if the
1364\texttt{\textbackslash{}ift} branch is taken, or it is set to 2 otherwise.
1365
1366An important remark about phi instructions is that QBE assumes that if a
1367variable is defined by a phi it respects all the SSA invariants. So it is
1368critical to not use phi instructions unless you know exactly what you are
1369doing.
1370
1371\subsection{Debug Information}
1372
1373QBE supports the inclusion of debug information. Specifically, it allows
1374defining from which source file type, data, and function definitions originated.
1375For this purpose, it provides the \texttt{dbgfile} definition, which receives a
1376file name (string literal) as its sole argument. Every type, data and function
1377definition thereafter are assumed to originate in this file.
1378
1379\begin{code}
1380-- TODO: not documnted in the QBE BNF.
1381fileDef :: Parser String
1382fileDef = do
1383  _ <- ws1 $ string "dbgfile"
1384  wsNL1 strLit
1385\end{code}
1386
1387Further, instructions within a function can be associated with a specific line
1388and column number of a previously defined \texttt{dbgfile}. The
1389\texttt{dbgfile} is referenced by index using the first argument to
1390\texttt{dbgloc}. The second argument represents the line number, the third
1391(optional) argument the column number.
1392
1393\begin{code}
1394-- TODO: not documnted in the QBE BNF.
1395dbglocInstr :: Parser Q.VolatileInstr
1396dbglocInstr = do
1397  _ <- ws1 $ string "dbgloc"
1398  file <- ws decNumber <* ws (char ',')
1399  line <- ws decNumber
1400  col  <- optionMaybe (ws (char ',') >> ws decNumber)
1401  return $ Q.DBGLoc file line col
1402\end{code}
1403
1404\end{document}