quebex

A software analysis framework built around the QBE intermediate language

git clone https://git.8pit.net/quebex.git

   1% SPDX-FileCopyrightText: 2015-2024 Quentin Carbonneaux <quentin@c9x.me>
   2% SPDX-FileCopyrightText: 2025 Sören Tempel <soeren+git@soeren-tempel.net>
   3%
   4% SPDX-License-Identifier: MIT AND GPL-3.0-only
   5
   6\documentclass{article}
   7%include polycode.fmt
   8
   9%subst blankline = "\\[5mm]"
  10
  11% See https://github.com/kosmikus/lhs2tex/issues/58
  12%format <$> = "\mathbin{\langle\$\rangle}"
  13%format <&> = "\mathbin{\langle\&\rangle}"
  14%format <|> = "\mathbin{\langle\:\vline\:\rangle}"
  15%format <?> = "\mathbin{\langle?\rangle}"
  16%format <*> = "\mathbin{\langle*\rangle}"
  17%format <*  = "\mathbin{\langle*}"
  18%format *>  = "\mathbin{*\rangle}"
  19
  20\long\def\ignore#1{}
  21
  22\usepackage{hyperref}
  23\hypersetup{
  24	colorlinks = true,
  25}
  26
  27\begin{document}
  28
  29\title{QBE Intermediate Language\vspace{-2em}}
  30\date{}
  31\maketitle
  32\frenchspacing
  33
  34\ignore{
  35\begin{code}
  36module Language.QBE.Parser
  37  ( skipInitComments,
  38    dataDef,
  39    typeDef,
  40    funcDef,
  41  )
  42where
  43
  44import Data.Char (chr)
  45import Data.Word (Word64)
  46import Data.Functor ((<&>))
  47import Data.List (singleton)
  48import Data.Map qualified as Map
  49import qualified Language.QBE.Types as Q
  50import Language.QBE.Util (bind, decNumber, octNumber, float)
  51import Text.ParserCombinators.Parsec
  52  ( Parser,
  53    alphaNum,
  54    anyChar,
  55    between,
  56    char,
  57    choice,
  58    letter,
  59    many,
  60    many1,
  61    manyTill,
  62    newline,
  63    noneOf,
  64    oneOf,
  65    optional,
  66    optionMaybe,
  67    sepBy,
  68    sepBy1,
  69    skipMany,
  70    skipMany1,
  71    string,
  72    try,
  73    (<?>),
  74    (<|>),
  75  )
  76\end{code}
  77}
  78
  79This an executable description of the
  80\href{https://c9x.me/compile/doc/il-v1.2.html}{QBE intermediate language},
  81specified through \href{https://hackage.haskell.org/package/parsec}{Parsec}
  82parser combinators and generated from a literate Haskell file. The description
  83is derived from the original QBE IL documentation, licensed under MIT.
  84Presently, this implementation targets version 1.2 of the QBE intermediate
  85language and aims to be equivalent with the original specification.
  86
  87\section{Basic Concepts}
  88
  89The intermediate language (IL) is a higher-level language than the
  90machine's assembly language. It smoothes most of the
  91irregularities of the underlying hardware and allows an infinite number
  92of temporaries to be used. This higher abstraction level lets frontend
  93programmers focus on language design issues.
  94
  95\subsection{Input Files}
  96
  97The intermediate language is provided to QBE as text. Usually, one file
  98is generated per each compilation unit from the frontend input language.
  99An IL file is a sequence of \nameref{sec:definitions} for
 100data, functions, and types. Once processed by QBE, the resulting file
 101can be assembled and linked using a standard toolchain (e.g., GNU
 102binutils).
 103
 104\begin{code}
 105comment :: Parser ()
 106comment = skipMany blankNL >> comment' >> skipMany blankNL
 107  where
 108    comment' = char '#' >> manyTill anyChar newline
 109\end{code}
 110
 111\ignore{
 112\begin{code}
 113skipNoCode :: Parser () -> Parser ()
 114skipNoCode blankP = try (skipMany1 comment <?> "comments") <|> blankP
 115\end{code}
 116}
 117
 118Here is a complete "Hello World" IL file which defines a function that
 119prints to the screen. Since the string is not a first class object (only
 120the pointer is) it is defined outside the function\textquotesingle s
 121body. Comments start with a \# character and finish with the end of the
 122line.
 123
 124\begin{verbatim}
 125data $str = { b "hello world", b 0 }
 126
 127export function w $main() {
 128@start
 129        # Call the puts function with $str as argument.
 130        %r =w call $puts(l $str)
 131        ret 0
 132}
 133\end{verbatim}
 134
 135If you have read the LLVM language reference, you might recognize the
 136example above. In comparison, QBE makes a much lighter use of types and
 137the syntax is terser.
 138
 139\subsection{Parser Combinators}
 140
 141\ignore{
 142\begin{code}
 143bracesNL :: Parser a -> Parser a
 144bracesNL = between (wsNL $ char '{') (wsNL $ char '}')
 145
 146quoted :: Parser a -> Parser a
 147quoted = let q = char '"' in between q q
 148
 149sepByTrail1 :: Parser a -> Parser sep -> Parser [a]
 150sepByTrail1 p sep = do
 151  x <- p
 152  xs <- many (try $ sep >> p)
 153  _ <- optional sep
 154  return (x:xs)
 155
 156sepByTrail :: Parser a -> Parser sep -> Parser [a]
 157sepByTrail p sep = sepByTrail1 p sep <|> return []
 158
 159parenLst :: Parser a -> Parser [a]
 160parenLst p = between (ws $ char '(') (char ')') inner
 161  where
 162    inner = sepBy (ws p) (ws $ char ',')
 163
 164unaryInstr :: (Q.Value -> Q.Instr) -> String -> Parser Q.Instr
 165unaryInstr conc keyword = do
 166  _ <- ws (string keyword)
 167  conc <$> ws val
 168
 169binaryInstr :: (Q.Value -> Q.Value -> Q.Instr) -> String -> Parser Q.Instr
 170binaryInstr conc keyword = do
 171  _ <- ws (string keyword)
 172  vfst <- ws val <* ws (char ',')
 173  conc vfst <$> ws val
 174
 175-- Can only appear in data and type definitions and hence allows newlines.
 176alignAny :: Parser Word64
 177alignAny = (ws1 (string "align")) >> wsNL decNumber
 178
 179-- Returns true if it is signed.
 180signageChar :: Parser Bool
 181signageChar = (char 's' <|> char 'u') <&> (== 's')
 182\end{code}
 183}
 184
 185The original QBE specification defines the syntax using a BNF grammar. In
 186contrast, this document defines it using Parsec parser combinators. As such,
 187this specification is less formal but more accurate as the parsing code is
 188actually executable. Consequently, this specification also captures constructs
 189omitted in the original specification (e.g., \nameref{sec:identifiers}, or
 190\nameref{sec:strlit}). Nonetheless, the formal language recognized by these
 191combinators aims to be equivalent to the one of the BNF grammar.
 192
 193\subsection{Identifiers}
 194\label{sec:identifiers}
 195
 196% Ident is not documented in the original QBE specification.
 197% See https://c9x.me/git/qbe.git/tree/parse.c?h=v1.2#n304
 198
 199\begin{code}
 200ident :: Parser String
 201ident = do
 202  start <- letter <|> oneOf "._"
 203  rest <- many (alphaNum <|> oneOf "$._")
 204  return $ start : rest
 205\end{code}
 206
 207Identifiers for data, types, and functions can start with any ASCII letter or
 208the special characters \texttt{.} and \texttt{\_}. This initial character can
 209be followed by a sequence of zero or more alphanumeric characters and the
 210special characters \texttt{\$}, \texttt{.}, and \texttt{\_}.
 211
 212\subsection{Sigils}
 213
 214\begin{code}
 215userDef :: Parser Q.UserIdent
 216userDef = Q.UserIdent <$> (char ':' >> ident)
 217
 218global :: Parser Q.GlobalIdent
 219global = Q.GlobalIdent <$> (char '$' >> ident)
 220
 221local :: Parser Q.LocalIdent
 222local = Q.LocalIdent <$> (char '%' >> ident)
 223
 224label :: Parser Q.BlockIdent
 225label = Q.BlockIdent <$> (char '@' >> ident)
 226\end{code}
 227
 228The intermediate language makes heavy use of sigils, all user-defined
 229names are prefixed with a sigil. This is to avoid keyword conflicts, and
 230also to quickly spot the scope and nature of identifiers.
 231
 232\begin{itemize}
 233  \item \texttt{:} is for user-defined \nameref{sec:aggregate-types}
 234  \item \texttt{\$} is for globals (represented by a pointer)
 235  \item \texttt{\%} is for function-scope temporaries
 236  \item \texttt{@@} is for block labels
 237\end{itemize}
 238
 239\subsection{Spacing}
 240
 241\begin{code}
 242blank :: Parser Char
 243blank = oneOf "\t " <?> "blank"
 244
 245blankNL :: Parser Char
 246blankNL = oneOf "\n\t " <?> "blank or newline"
 247\end{code}
 248
 249Individual tokens in IL files must be separated by one or more spacing
 250characters. Both spaces and tabs are recognized as spacing characters.
 251In data and type definitions, newlines may also be used as spaces to
 252prevent overly long lines. When exactly one of two consecutive tokens is
 253a symbol (for example \texttt{,} or \texttt{=} or \texttt{\{}), spacing may be omitted.
 254
 255\ignore{
 256\begin{code}
 257ws :: Parser a -> Parser a
 258ws p = p <* skipMany blank
 259
 260ws1 :: Parser a -> Parser a
 261ws1 p = p <* skipMany1 blank
 262
 263wsNL :: Parser a -> Parser a
 264wsNL p = p <* skipNoCode (skipMany blankNL)
 265
 266wsNL1 :: Parser a -> Parser a
 267wsNL1 p = p <* skipNoCode (skipMany1 blankNL)
 268
 269-- Only intended to be used to skip comments at the start of a file.
 270skipInitComments :: Parser ()
 271skipInitComments = skipNoCode (skipMany blankNL)
 272\end{code}
 273}
 274
 275\subsection{String Literals}
 276\label{sec:strlit}
 277
 278% The string literal is not documented in the original QBE specification.
 279% See https://c9x.me/git/qbe.git/tree/parse.c?h=v1.2#n287
 280
 281\begin{code}
 282strLit :: Parser String
 283strLit = concat <$> quoted (many strChr)
 284  where
 285    strChr :: Parser [Char]
 286    strChr = (singleton <$> noneOf "\"\\") <|> escSeq
 287
 288    -- TODO: not documnted in the QBE BNF.
 289    octEsc :: Parser Char
 290    octEsc = do
 291      n <- octNumber
 292      pure $ chr (fromIntegral n)
 293
 294    escSeq :: Parser [Char]
 295    escSeq = try $ do
 296      esc <- char '\\'
 297      (singleton <$> octEsc) <|> (anyChar <&> (\c -> [esc, c]))
 298\end{code}
 299
 300Strings are enclosed by double quotes and are, for example, used to specify a
 301section name as part of the \nameref{sec:linkage} information. Within a string,
 302a double quote can be escaped using a \texttt{\textbackslash} character. All
 303escape sequences, including double quote escaping, are passed through as-is to
 304the generated assembly file.
 305
 306\section{Types}
 307
 308\subsection{Simple Types}
 309
 310The IL makes minimal use of types. By design, the types used are
 311restricted to what is necessary for unambiguous compilation to machine
 312code and C interfacing. Unlike LLVM, QBE is not using types as a means
 313to safety; they are only here for semantic purposes.
 314
 315\begin{code}
 316baseType :: Parser Q.BaseType
 317baseType = choice
 318  [ bind "w" Q.Word
 319  , bind "l" Q.Long
 320  , bind "s" Q.Single
 321  , bind "d" Q.Double ]
 322\end{code}
 323
 324The four base types are \texttt{w} (word), \texttt{l} (long), \texttt{s} (single), and \texttt{d}
 325(double), they stand respectively for 32-bit and 64-bit integers, and
 32632-bit and 64-bit floating-point numbers. There are no pointer types
 327available; pointers are typed by an integer type sufficiently wide to
 328represent all memory addresses (e.g., \texttt{l} on 64-bit architectures).
 329Temporaries in the IL can only have a base type.
 330
 331\begin{code}
 332extType :: Parser Q.ExtType
 333extType = (Q.Base <$> baseType)
 334       <|> bind "b" Q.Byte
 335       <|> bind "h" Q.HalfWord
 336\end{code}
 337
 338Extended types contain base types plus \texttt{b} (byte) and \texttt{h} (half word),
 339respectively for 8-bit and 16-bit integers. They are used in \nameref{sec:aggregate-types}
 340and \nameref{sec:data} definitions.
 341
 342For C interfacing, the IL also provides user-defined aggregate types as
 343well as signed and unsigned variants of the sub-word extended types.
 344Read more about these types in the \nameref{sec:aggregate-types}
 345and \nameref{sec:functions} sections.
 346
 347\subsection{Subtyping}
 348\label{sec:subtyping}
 349
 350The IL has a minimal subtyping feature, for integer types only. Any
 351value of type \texttt{l} can be used in a \texttt{w} context. In that case, only the
 35232 least significant bits of the word value are used.
 353
 354Make note that it is the opposite of the usual subtyping on integers (in
 355C, we can safely use an \texttt{int} where a \texttt{long} is expected). A long value
 356cannot be used in word context. The rationale is that a word can be
 357signed or unsigned, so extending it to a long could be done in two ways,
 358either by zero-extension, or by sign-extension.
 359
 360\subsection{Constants and Vals}
 361\label{sec:constants-and-vals}
 362
 363\begin{code}
 364dynConst :: Parser Q.DynConst
 365dynConst =
 366  (Q.Const <$> constant)
 367    <|> (Q.Thread <$> global)
 368    <?> "dynconst"
 369\end{code}
 370
 371Constants come in two kinds: compile-time constants and dynamic
 372constants. Dynamic constants include compile-time constants and other
 373symbol variants that are only known at program-load time or execution
 374time. Consequently, dynamic constants can only occur in function bodies.
 375
 376The representation of integers is two's complement.
 377Floating-point numbers are represented using the single-precision and
 378double-precision formats of the IEEE 754 standard.
 379
 380\begin{code}
 381constant :: Parser Q.Const
 382constant =
 383  (Q.Number <$> decNumber)
 384    <|> (Q.SFP <$> sfp)
 385    <|> (Q.DFP <$> dfp)
 386    <|> (Q.Global <$> global)
 387    <?> "const"
 388  where
 389    sfp = string "s_" >> float
 390    dfp = string "d_" >> float
 391\end{code}
 392
 393Constants specify a sequence of bits and are untyped. They are always
 394parsed as 64-bit blobs. Depending on the context surrounding a constant,
 395only some of its bits are used. For example, in the program below, the
 396two variables defined have the same value since the first operand of the
 397subtraction is a word (32-bit) context.
 398
 399\begin{verbatim}
 400%x =w sub -1, 0 %y =w sub 4294967295, 0
 401\end{verbatim}
 402
 403Because specifying floating-point constants by their bits makes the code
 404less readable, syntactic sugar is provided to express them. Standard
 405scientific notation is prefixed with \texttt{s\_} and \texttt{d\_} for single and
 406double precision numbers respectively. Once again, the following example
 407defines twice the same double-precision constant.
 408
 409\begin{verbatim}
 410%x =d add d_0, d_-1
 411%y =d add d_0, -4616189618054758400
 412\end{verbatim}
 413
 414Global symbols can also be used directly as constants; they will be
 415resolved and turned into actual numeric constants by the linker.
 416
 417When the \texttt{thread} keyword prefixes a symbol name, the
 418symbol\textquotesingle s numeric value is resolved at runtime in the
 419thread-local storage.
 420
 421\begin{code}
 422val :: Parser Q.Value
 423val =
 424  (Q.VConst <$> dynConst)
 425    <|> (Q.VLocal <$> local)
 426    <?> "val"
 427\end{code}
 428
 429Vals are used as arguments in regular, phi, and jump instructions within
 430function definitions. They are either constants or function-scope
 431temporaries.
 432
 433\subsection{Linkage}
 434\label{sec:linkage}
 435
 436\begin{code}
 437linkage :: Parser Q.Linkage
 438linkage =
 439  wsNL (bind "export" Q.LExport)
 440    <|> wsNL (bind "thread" Q.LThread)
 441    <|> do
 442      _ <- ws1 $ string "section"
 443      (try secWithFlags) <|> sec
 444  where
 445    sec :: Parser Q.Linkage
 446    sec = wsNL strLit <&> (`Q.LSection` Nothing)
 447
 448    secWithFlags :: Parser Q.Linkage
 449    secWithFlags = do
 450      n <- ws1 strLit
 451      wsNL strLit <&> Q.LSection n . Just
 452\end{code}
 453
 454Function and data definitions (see below) can specify linkage
 455information to be passed to the assembler and eventually to the linker.
 456
 457The \texttt{export} linkage flag marks the defined item as visible outside the
 458current file\textquotesingle s scope. If absent, the symbol can only be
 459referred to locally. Functions compiled by QBE and called from C need to
 460be exported.
 461
 462The \texttt{thread} linkage flag can only qualify data definitions. It mandates
 463that the object defined is stored in thread-local storage. Each time a
 464runtime thread starts, the supporting platform runtime is in charge of
 465making a new copy of the object for the fresh thread. Objects in
 466thread-local storage must be accessed using the \texttt{thread \$IDENT} syntax,
 467as specified in the \nameref{sec:constants-and-vals} section.
 468
 469A \texttt{section} flag can be specified to tell the linker to put the defined
 470item in a certain section. The use of the section flag is platform
 471dependent and we refer the user to the documentation of their assembler
 472and linker for relevant information.
 473
 474\begin{verbatim}
 475section ".init_array" data $.init.f = { l $f }
 476\end{verbatim}
 477
 478The section flag can be used to add function pointers to a global
 479initialization list, as depicted above. Note that some platforms provide
 480a BSS section that can be used to minimize the footprint of uniformly
 481zeroed data. When this section is available, QBE will automatically make
 482use of it and no section flag is required.
 483
 484The section and export linkage flags should each appear at most once in
 485a definition. If multiple occurrences are present, QBE is free to use
 486any.
 487
 488\subsection{Definitions}
 489\label{sec:definitions}
 490
 491Definitions are the essential components of an IL file. They can define
 492three types of objects: aggregate types, data, and functions. Aggregate
 493types are never exported and do not compile to any code. Data and
 494function definitions have file scope and are mutually recursive (even
 495across IL files). Their visibility can be controlled using linkage
 496flags.
 497
 498\subsubsection{Aggregate Types}
 499\label{sec:aggregate-types}
 500
 501\begin{code}
 502typeDef :: Parser Q.TypeDef
 503typeDef = do
 504  _ <- wsNL1 (string "type")
 505  i <- wsNL1 userDef
 506  _ <- wsNL1 (char '=')
 507  a <- optionMaybe alignAny
 508  bracesNL (opaqueType <|> unionType <|> regularType) <&> Q.TypeDef i a
 509\end{code}
 510
 511Aggregate type definitions start with the \texttt{type} keyword. They have file
 512scope, but types must be defined before being referenced. The inner
 513structure of a type is expressed by a comma-separated list of fields.
 514
 515\begin{code}
 516subType :: Parser Q.SubType
 517subType =
 518  (Q.SExtType <$> extType)
 519    <|> (Q.SUserDef <$> userDef)
 520
 521field :: Parser Q.Field
 522field = do
 523  -- TODO: newline is required if there is a number argument
 524  f <- wsNL subType
 525  s <- ws $ optionMaybe decNumber
 526  pure (f, s)
 527
 528fields :: Bool -> Parser [Q.Field]
 529fields allowEmpty =
 530  (if allowEmpty then sepByTrail else sepByTrail1) field (wsNL $ char ',')
 531\end{code}
 532
 533A field consists of a subtype, either an extended type or a user-defined type,
 534and an optional number expressing the value of this field. In case many items
 535of the same type are sequenced (like in a C array), the shorter array syntax
 536can be used.
 537
 538\begin{code}
 539regularType :: Parser Q.AggType
 540regularType = Q.ARegular <$> fields True
 541\end{code}
 542
 543Three different kinds of aggregate types are presentl ysupported: regular
 544types, union types and opaque types. The fields of regular types will be
 545packed. By default, the alignment of an aggregate type is the maximum alignment
 546of its members. The alignment can be explicitly specified by the programmer.
 547
 548\begin{code}
 549unionType :: Parser Q.AggType
 550unionType = Q.AUnion <$> many1 (wsNL unionType')
 551  where
 552    unionType' :: Parser [Q.Field]
 553    unionType' = bracesNL $ fields False
 554\end{code}
 555
 556Union types allow the same chunk of memory to be used with different layouts. They are defined by enclosing multiple regular aggregate type bodies in a pair of curly braces. Size and alignment of union types are set to the maximum size and alignment of each variation or, in the case of alignment, can be explicitly specified.
 557
 558\begin{code}
 559opaqueType :: Parser Q.AggType
 560opaqueType = Q.AOpaque <$> wsNL decNumber
 561\end{code}
 562
 563Opaque types are used when the inner structure of an aggregate cannot be specified; the alignment for opaque types is mandatory. They are defined simply by enclosing their size between curly braces.
 564
 565\subsubsection{Data}
 566\label{sec:data}
 567
 568\begin{code}
 569dataDef :: Parser Q.DataDef
 570dataDef = do
 571  link <- many linkage
 572  name <- wsNL1 (string "data") >> wsNL global
 573  _ <- wsNL (char '=')
 574  alignment <- optionMaybe alignAny
 575  bracesNL dataObjs <&> Q.DataDef link name alignment
 576 where
 577    -- TODO: sepByTrail is not documented in the QBE BNF.
 578    dataObjs = sepByTrail dataObj (wsNL $ char ',')
 579\end{code}
 580
 581Data definitions express objects that will be emitted in the compiled
 582file. Their visibility and location in the compiled artifact are
 583controlled with linkage flags described in the \nameref{sec:linkage}
 584section.
 585
 586They define a global identifier (starting with the sigil \texttt{\$}), that
 587will contain a pointer to the object specified by the definition.
 588
 589\begin{code}
 590dataObj :: Parser Q.DataObj
 591dataObj =
 592  (Q.OZeroFill <$> (wsNL1 (char 'z') >> wsNL decNumber))
 593    <|> do
 594      t <- wsNL1 extType
 595      i <- many1 (wsNL dataItem)
 596      return $ Q.OItem t i
 597\end{code}
 598
 599Objects are described by a sequence of fields that start with a type
 600letter. This letter can either be an extended type, or the \texttt{z} letter.
 601If the letter used is an extended type, the data item following
 602specifies the bits to be stored in the field.
 603
 604\begin{code}
 605dataItem :: Parser Q.DataItem
 606dataItem =
 607  (Q.DString <$> strLit)
 608    <|> try
 609      ( do
 610          i <- ws global
 611          off <- (ws $ char '+') >> ws decNumber
 612          return $ Q.DSymOff i off
 613      )
 614    <|> (Q.DConst <$> constant)
 615\end{code}
 616
 617Within each object, several items can be defined. When several data items
 618follow a letter, they initialize multiple fields of the same size.
 619
 620\begin{code}
 621allocSize :: Parser Q.AllocSize
 622allocSize =
 623  choice
 624    [ bind "4" Q.AllocWord,
 625      bind "8" Q.AllocLong,
 626      bind "16" Q.AllocLongLong
 627    ]
 628\end{code}
 629
 630The members of a struct will be packed. This means that padding has to
 631be emitted by the frontend when necessary. Alignment of the whole data
 632objects can be manually specified, and when no alignment is provided,
 633the maximum alignment from the platform is used.
 634
 635When the \texttt{z} letter is used the number following indicates the size of
 636the field; the contents of the field are zero initialized. It can be
 637used to add padding between fields or zero-initialize big arrays.
 638
 639\subsubsection{Functions}
 640\label{sec:functions}
 641
 642\begin{code}
 643funcDef :: Parser Q.FuncDef
 644funcDef = do
 645  link <- many linkage
 646  _ <- ws1 (string "function")
 647  retTy <- optionMaybe (ws1 abity)
 648  name <- ws global
 649  args <- wsNL params
 650  body <- between (wsNL1 $ char '{') (wsNL $ char '}') $ many1 block
 651
 652  case (Q.insertJumps body) of
 653    Nothing -> fail $ "invalid fallthrough in " ++ show name
 654    Just bl -> return $ Q.FuncDef link name retTy args bl
 655\end{code}
 656
 657Function definitions contain the actual code to emit in the compiled
 658file. They define a global symbol that contains a pointer to the
 659function code. This pointer can be used in \texttt{call} instructions or stored
 660in memory.
 661
 662\begin{code}
 663subWordType :: Parser Q.SubWordType
 664subWordType = choice
 665  [ try $ bind "sb" Q.SignedByte
 666  , try $ bind "ub" Q.UnsignedByte
 667  , bind "sh" Q.SignedHalf
 668  , bind "uh" Q.UnsignedHalf ]
 669
 670abity :: Parser Q.Abity
 671abity = try (Q.ASubWordType <$> subWordType)
 672    <|> (Q.ABase <$> baseType)
 673    <|> (Q.AUserDef <$> userDef)
 674\end{code}
 675
 676The type given right before the function name is the return type of the
 677function. All return values of this function must have this return type.
 678If the return type is missing, the function must not return any value.
 679
 680\begin{code}
 681param :: Parser Q.FuncParam
 682param = (Q.Env <$> (ws1 (string "env") >> local))
 683    <|> (string "..." >> pure Q.Variadic)
 684    <|> do
 685          ty <- ws1 abity
 686          Q.Regular ty <$> local
 687
 688params :: Parser [Q.FuncParam]
 689params = parenLst param
 690\end{code}
 691
 692The parameter list is a comma separated list of temporary names prefixed
 693by types. The types are used to correctly implement C compatibility.
 694When an argument has an aggregate type, a pointer to the aggregate is
 695passed by thea caller. In the example below, we have to use a load
 696instruction to get the value of the first (and only) member of the
 697struct.
 698
 699\begin{verbatim}
 700type :one = { w }
 701
 702function w $getone(:one %p) {
 703@start
 704        %val =w loadw %p
 705        ret %val
 706}
 707\end{verbatim}
 708
 709If a function accepts or returns values that are smaller than a word,
 710such as \texttt{signed char} or \texttt{unsigned short} in C, one of the sub-word type
 711must be used. The sub-word types \texttt{sb}, \texttt{ub}, \texttt{sh}, and \texttt{uh} stand,
 712respectively, for signed and unsigned 8-bit values, and signed and
 713unsigned 16-bit values. Parameters associated with a sub-word type of
 714bit width N only have their N least significant bits set and have base
 715type \texttt{w}. For example, the function
 716
 717\begin{verbatim}
 718function w $addbyte(w %a, sb %b) {
 719@start
 720        %bw =w extsb %b
 721        %val =w add %a, %bw
 722        ret %val
 723}
 724\end{verbatim}
 725
 726needs to sign-extend its second argument before the addition. Dually,
 727return values with sub-word types do not need to be sign or zero
 728extended.
 729
 730If the parameter list ends with \texttt{...}, the function is a variadic
 731function: it can accept a variable number of arguments. To access the
 732extra arguments provided by the caller, use the \texttt{vastart} and \texttt{vaarg}
 733instructions described in the \nameref{sec:variadic} section.
 734
 735Optionally, the parameter list can start with an environment parameter
 736\texttt{env \%e}. This special parameter is a 64-bit integer temporary (i.e.,
 737of type \texttt{l}). If the function does not use its environment parameter,
 738callers can safely omit it. This parameter is invisible to a C caller:
 739for example, the function
 740
 741\begin{verbatim}
 742export function w $add(env %e, w %a, w %b) {
 743@start
 744        %c =w add %a, %b
 745        ret %c
 746}
 747\end{verbatim}
 748
 749must be given the C prototype \texttt{int add(int, int)}. The intended use of
 750this feature is to pass the environment pointer of closures while
 751retaining a very good compatibility with C. The \nameref{sec:call}
 752section explains how to pass an environment parameter.
 753
 754Since global symbols are defined mutually recursive, there is no need
 755for function declarations: a function can be referenced before its
 756definition. Similarly, functions from other modules can be used without
 757previous declaration. All the type information necessary to compile a
 758call is in the instruction itself.
 759
 760The syntax and semantics for the body of functions are described in the
 761\nameref{sec:control} section.
 762
 763\section{Control}
 764\label{sec:control}
 765
 766The IL represents programs as textual transcriptions of control flow
 767graphs. The control flow is serialized as a sequence of blocks of
 768straight-line code which are connected using jump instructions.
 769
 770\subsection{Blocks}
 771\label{sec:blocks}
 772
 773\begin{code}
 774block :: Parser Q.Block'
 775block = do
 776  l <- wsNL1 label
 777  p <- many (wsNL1 $ try phiInstr)
 778  s <- many (wsNL1 statement)
 779  Q.Block' l p s <$> (optionMaybe $ wsNL1 jumpInstr)
 780\end{code}
 781
 782All blocks have a name that is specified by a label at their beginning.
 783Then follows a sequence of instructions that have "fall-through" flow.
 784Finally one jump terminates the block. The jump can either transfer
 785control to another block of the same function or return; jumps are
 786described further below.
 787
 788The first block in a function must not be the target of any jump in the
 789program. If a jump to the function start is needed, the frontend must
 790insert an empty prelude block at the beginning of the function.
 791
 792When one block jumps to the next block in the IL file, it is not
 793necessary to write the jump instruction, it will be automatically added
 794by the parser. For example the start block in the example below jumps
 795directly to the loop block.
 796
 797\subsection{Jumps}
 798\label{sec:jumps}
 799
 800\begin{code}
 801jumpInstr :: Parser Q.JumpInstr
 802jumpInstr = (string "hlt" >> pure Q.Halt)
 803        -- TODO: Return requires a space if there is an optionMaybe
 804        <|> Q.Return <$> ((ws $ string "ret") >> optionMaybe val)
 805        <|> try (Q.Jump <$> ((ws1 $ string "jmp") >> label))
 806        <|> do
 807          _ <- ws1 $ string "jnz"
 808          v <- ws val <* ws (char ',')
 809          l1 <- ws label <* ws (char ',')
 810          l2 <- ws label
 811          return $ Q.Jnz v l1 l2
 812\end{code}
 813
 814A jump instruction ends every block and transfers the control to another
 815program location. The target of a jump must never be the first block in
 816a function. The three kinds of jumps available are described in the
 817following list.
 818
 819\begin{enumerate}
 820  \item \textbf{Unconditional jump.} Jumps to another block of the same function.
 821  \item \textbf{Conditional jump.} When its word argument is non-zero, it jumps to its first label argument; otherwise it jumps to the other label. The argument must be of word type; because of subtyping a long argument can be passed, but only its least significant 32 bits will be compared to 0.
 822  \item \textbf{Function return.} Terminates the execution of the current function, optionally returning a value to the caller. The value returned must be of the type given in the function prototype. If the function prototype does not specify a return type, no return value can be used.
 823  \item \textbf{Program termination.} Terminates the execution of the program with a target-dependent error. This instruction can be used when it is expected that the execution never reaches the end of the block it closes; for example, after having called a function such as \texttt{exit()}.
 824\end{enumerate}
 825
 826\section{Instructions}
 827\label{sec:instructions}
 828
 829\begin{code}
 830instr :: Parser Q.Instr
 831instr =
 832  choice
 833    [ try $ binaryInstr Q.Add "add",
 834      try $ binaryInstr Q.Sub "sub",
 835      try $ binaryInstr Q.Mul "mul",
 836      try $ binaryInstr Q.Div "div",
 837      try $ binaryInstr Q.URem "urem",
 838      try $ binaryInstr Q.Rem "rem",
 839      try $ binaryInstr Q.UDiv "udiv",
 840      try $ binaryInstr Q.Or "or",
 841      try $ binaryInstr Q.Xor "xor",
 842      try $ binaryInstr Q.And "and",
 843      try $ binaryInstr Q.Sar "sar",
 844      try $ binaryInstr Q.Shr "shr",
 845      try $ binaryInstr Q.Shl "shl",
 846      try $ unaryInstr Q.Neg "neg",
 847      try $ unaryInstr Q.Cast "cast",
 848      try $ unaryInstr Q.Copy "copy",
 849      try $ loadInstr,
 850      try $ allocInstr,
 851      try $ compareInstr,
 852      try $ extInstr,
 853      try $ truncInstr,
 854      try $ fromFloatInstr,
 855      try $ toFloatInstr
 856    ]
 857\end{code}
 858
 859Instructions are the smallest piece of code in the IL, they form the body of
 860\nameref{sec:blocks}. This specification distinguishes instructions and
 861volatile instructions, the latter do not return a value. For the former, the IL
 862uses a three-address code, which means that one instruction computes an
 863operation between two operands and assigns the result to a third one.
 864
 865\begin{code}
 866assign :: Parser Q.Statement
 867assign = do
 868  n <- ws local
 869  t <- ws (char '=') >> ws1 baseType
 870  Q.Assign n t <$> instr
 871
 872volatileInstr :: Parser Q.Statement
 873volatileInstr = Q.Volatile <$> (storeInstr <|> blitInstr)
 874
 875-- TODO: Not documented in the QBE BNF.
 876statement :: Parser Q.Statement
 877statement = (try callInstr) <|> assign <|> volatileInstr
 878\end{code}
 879
 880An instruction has both a name and a return type, this return type is a base
 881type that defines the size of the instruction's result. The type of the
 882arguments can be unambiguously inferred using the instruction name and the
 883return type. For example, for all arithmetic instructions, the type of the
 884arguments is the same as the return type. The two additions below are valid if
 885\texttt{\%y} is a word or a long (because of \nameref{sec:subtyping}).
 886
 887\begin{verbatim}
 888%x =w add 0, %y
 889%z =w add %x, %x
 890\end{verbatim}
 891
 892Some instructions, like comparisons and memory loads have operand types
 893that differ from their return types. For instance, two floating points
 894can be compared to give a word result (0 if the comparison succeeds, 1
 895if it fails).
 896
 897\begin{verbatim}
 898%c =w cgts %a, %b
 899\end{verbatim}
 900
 901In the example above, both operands have to have single type. This is
 902made explicit by the instruction suffix.
 903
 904\subsection{Arithmetic and Bits}
 905
 906\begin{quote}
 907\begin{itemize}
 908\item \texttt{add}, \texttt{sub}, \texttt{div}, \texttt{mul}
 909\item \texttt{neg}
 910\item \texttt{udiv}, \texttt{rem}, \texttt{urem}
 911\item \texttt{or}, \texttt{xor}, \texttt{and}
 912\item \texttt{sar}, \texttt{shr}, \texttt{shl}
 913\end{itemize}
 914\end{quote}
 915
 916The base arithmetic instructions in the first bullet are available for
 917all types, integers and floating points.
 918
 919When \texttt{div} is used with word or long return type, the arguments are
 920treated as signed. The unsigned integral division is available as \texttt{udiv}
 921instruction. When the result of a division is not an integer, it is truncated
 922towards zero.
 923
 924The signed and unsigned remainder operations are available as \texttt{rem} and
 925\texttt{urem}. The sign of the remainder is the same as the one of the
 926dividend. Its magnitude is smaller than the divisor one. These two instructions
 927and \texttt{udiv} are only available with integer arguments and result.
 928
 929Bitwise OR, AND, and XOR operations are available for both integer
 930types. Logical operations of typical programming languages can be
 931implemented using \nameref{sec:comparisions} and \nameref{sec:jumps}.
 932
 933Shift instructions \texttt{sar}, \texttt{shr}, and \texttt{shl}, shift right or
 934left their first operand by the amount from the second operand. The shifting
 935amount is taken modulo the size of the result type. Shifting right can either
 936preserve the sign of the value (using \texttt{sar}), or fill the newly freed
 937bits with zeroes (using \texttt{shr}). Shifting left always fills the freed
 938bits with zeroes.
 939
 940Remark that an arithmetic shift right (\texttt{sar}) is only equivalent to a
 941division by a power of two for non-negative numbers. This is because the shift
 942right "truncates" towards minus infinity, while the division truncates towards
 943zero.
 944
 945\subsection{Memory}
 946\label{sec:memory}
 947
 948The following sections discuss instructions for interacting with values stored in memory.
 949
 950\subsubsection{Store instructions}
 951
 952\begin{code}
 953storeInstr :: Parser Q.VolatileInstr
 954storeInstr = do
 955  t <- string "store" >> ws1 extType
 956  v <- ws val
 957  _ <- ws $ char ','
 958  ws val <&> Q.Store t v
 959\end{code}
 960
 961Store instructions exist to store a value of any base type and any extended
 962type. Since halfwords and bytes are not first class in the IL, \texttt{storeh}
 963and \texttt{storeb} take a word as argument. Only the first 16 or 8 bits of
 964this word will be stored in memory at the address specified in the second
 965argument.
 966
 967\subsubsection{Load instructions}
 968
 969\begin{code}
 970loadInstr :: Parser Q.Instr
 971loadInstr = do
 972  _ <- string "load"
 973  t <- ws1 $ choice
 974    [ try $ bind "sw" (Q.LBase Q.Word),
 975      try $ bind "uw" (Q.LBase Q.Word),
 976      try $ Q.LSubWord <$> subWordType,
 977      Q.LBase <$> baseType
 978    ]
 979  ws val <&> Q.Load t
 980\end{code}
 981
 982For types smaller than long, two variants of the load instruction are
 983available: one will sign extend the loaded value, while the other will zero
 984extend it. Note that all loads smaller than long can load to either a long or a
 985word.
 986
 987The two instructions \texttt{loadsw} and \texttt{loaduw} have the same effect
 988when they are used to define a word temporary. A \texttt{loadw} instruction is
 989provided as syntactic sugar for \texttt{loadsw} to make explicit that the
 990extension mechanism used is irrelevant.
 991
 992\subsubsection{Blits}
 993
 994\begin{code}
 995blitInstr :: Parser Q.VolatileInstr
 996blitInstr = do
 997  v1 <- (ws1 $ string "blit") >> ws val <* (ws $ char ',')
 998  v2 <- ws val <* (ws $ char ',')
 999  nb <- decNumber
1000  return $ Q.Blit v1 v2 nb
1001\end{code}
1002
1003The blit instruction copies in-memory data from its first address argument to
1004its second address argument. The third argument is the number of bytes to copy.
1005The source and destination spans are required to be either non-overlapping, or
1006fully overlapping (source address identical to the destination address). The
1007byte count argument must be a nonnegative numeric constant; it cannot be a
1008temporary.
1009
1010One blit instruction may generate a number of instructions proportional to its
1011byte count argument, consequently, it is recommended to keep this argument
1012relatively small. If large copies are necessary, it is preferable that
1013frontends generate calls to a supporting \texttt{memcpy} function.
1014
1015\subsubsection{Stack Allocation}
1016
1017\begin{code}
1018allocInstr :: Parser Q.Instr
1019allocInstr = do
1020  siz <- (ws $ string "alloc") >> (ws1 allocSize)
1021  val <&> Q.Alloc siz
1022\end{code}
1023
1024These instructions allocate a chunk of memory on the stack. The number ending
1025the instruction name is the alignment required for the allocated slot. QBE will
1026make sure that the returned address is a multiple of that alignment value.
1027
1028Stack allocation instructions are used, for example, when compiling the C local
1029variables, because their address can be taken. When compiling Fortran,
1030temporaries can be used directly instead, because it is illegal to take the
1031address of a variable.
1032
1033\subsection{Comparisons}
1034\label{sec:comparisions}
1035
1036\begin{code}
1037compareInstr :: Parser Q.Instr
1038compareInstr = do
1039  _ <- char 'c'
1040  (try intCompare) <|> floatCompare
1041
1042compareArgs :: Parser (Q.Value, Q.Value)
1043compareArgs = do
1044  lhs <- ws val <* ws (char ',')
1045  rhs <- ws val
1046  pure (lhs, rhs)
1047
1048intCompare :: Parser Q.Instr
1049intCompare = do
1050  op <- compareIntOp
1051  ty <- ws1 intArg
1052
1053  (lhs, rhs) <- compareArgs
1054  pure $ Q.CompareInt ty op lhs rhs
1055
1056floatCompare :: Parser Q.Instr
1057floatCompare = do
1058  op <- compareFloatOp
1059  ty <- ws1 floatArg
1060
1061  (lhs, rhs) <- compareArgs
1062  pure $ Q.CompareFloat ty op lhs rhs
1063\end{code}
1064
1065Comparison instructions return an integer value (either a word or a long), and
1066compare values of arbitrary types. The returned value is 1 if the two operands
1067satisfy the comparison relation, or 0 otherwise. The names of comparisons
1068respect a standard naming scheme in three parts:
1069
1070\begin{enumerate}
1071  \item All comparisons start with the letter \texttt{c}.
1072  \item Then comes a comparison type.
1073  \item Finally, the instruction name is terminated with a basic type suffix precising the type of the operands to be compared.
1074\end{enumerate}
1075
1076The following instruction are available for integer comparisons:
1077
1078\begin{code}
1079compareIntOp :: Parser Q.IntCmpOp
1080compareIntOp = choice
1081  [ bind "eq" Q.IEq
1082  , bind "ne" Q.INe
1083  , try $ bind "sle" Q.ISle
1084  , try $ bind "slt" Q.ISlt
1085  , try $ bind "sge" Q.ISge
1086  , try $ bind "sgt" Q.ISgt
1087  , try $ bind "ule" Q.IUle
1088  , try $ bind "ult" Q.IUlt
1089  , try $ bind "uge" Q.IUge
1090  , try $ bind "ugt" Q.IUgt ]
1091\end{code}
1092
1093For floating point comparisons use one of these instructions:
1094
1095\begin{code}
1096compareFloatOp :: Parser Q.FloatCmpOp
1097compareFloatOp = choice
1098  [ bind "eq" Q.FEq
1099  , bind "ne" Q.FNe
1100  , try $ bind "le" Q.FLe
1101  , bind "lt" Q.FLt
1102  , try $ bind "ge" Q.FGe
1103  , bind "gt" Q.FGt
1104  , bind "o" Q.FOrd
1105  , bind "uo" Q.FUnord ]
1106\end{code}
1107
1108For example, \texttt{cod} compares two double-precision floating point numbers
1109and returns 1 if the two floating points are not NaNs, or 0 otherwise. The
1110\texttt{csltw} instruction compares two words representing signed numbers and
1111returns 1 when the first argument is smaller than the second one.
1112
1113\subsection{Conversions}
1114
1115Conversion operations change the representation of a value, possibly modifying
1116it if the target type cannot hold the value of the source type. Conversions can
1117extend the precision of a temporary (e.g., from signed 8-bit to 32-bit), or
1118convert a floating point into an integer and vice versa.
1119
1120\begin{code}
1121extInstr :: Parser Q.Instr
1122extInstr = do
1123  _ <- string "ext"
1124  ty <- ws1 extArg
1125  ws val <&> Q.Ext ty
1126 where
1127  extArg :: Parser Q.ExtArg
1128  extArg = try (Q.ExtSubWord <$> subWordType)
1129    <|> try (bind "sw" Q.ExtSignedWord)
1130    <|> bind "s" Q.ExtSingle
1131    <|> bind "uw" Q.ExtUnsignedWord
1132\end{code}
1133
1134Extending the precision of a temporary is done using the \texttt{ext} family of
1135instructions. Because QBE types do not specify the signedness (like in LLVM),
1136extension instructions exist to sign-extend and zero-extend a value. For
1137example, \texttt{extsb} takes a word argument and sign-extends the 8
1138least-significant bits to a full word or long, depending on the return type.
1139
1140\begin{code}
1141truncInstr :: Parser Q.Instr
1142truncInstr = do
1143  _ <- ws1 $ string "truncd"
1144  ws val <&> Q.TruncDouble
1145\end{code}
1146
1147The instructions \texttt{exts} (extend single) and \texttt{truncd} (truncate
1148double) are provided to change the precision of a floating point value. When
1149the double argument of truncd cannot be represented as a single-precision
1150floating point, it is truncated towards zero.
1151
1152\begin{code}
1153floatArg :: Parser Q.FloatArg
1154floatArg = bind "d" Q.FDouble <|> bind "s" Q.FSingle
1155
1156fromFloatInstr :: Parser Q.Instr
1157fromFloatInstr = do
1158  arg <- floatArg <* string "to"
1159  isSigned <- signageChar
1160  _ <- ws1 $ char 'i'
1161  ws val <&> Q.FloatToInt arg isSigned
1162
1163intArg :: Parser Q.IntArg
1164intArg = bind "w" Q.IWord <|> bind "l" Q.ILong
1165
1166toFloatInstr :: Parser Q.Instr
1167toFloatInstr = do
1168  isSigned <- signageChar
1169  arg <- intArg
1170  _ <- ws1 $ string "tof"
1171  ws val <&> Q.IntToFloat arg isSigned
1172\end{code}
1173
1174Converting between signed integers and floating points is done using
1175\texttt{stosi} (single to signed integer), \texttt{stoui} (single to unsigned
1176integer), \texttt{dtosi} (double to signed integer), \texttt{dtoui} (double to
1177unsigned integer), \texttt{swtof} (signed word to float), \texttt{uwtof}
1178(unsigned word to float), \texttt{sltof} (signed long to float) and
1179\texttt{ultof} (unsigned long to float).
1180
1181\subsection{Cast and Copy}
1182
1183The \texttt{cast} and \texttt{copy} instructions return the bits of their
1184argument verbatim. However a cast will change an integer into a floating point
1185of the same width and vice versa.
1186
1187Casts can be used to make bitwise operations on the representation of floating
1188point numbers. For example the following program will compute the opposite of
1189the single-precision floating point number \texttt{\%f} into \texttt{\%rs}.
1190
1191\begin{verbatim}
1192%b0 =w cast %f
1193%b1 =w xor 2147483648, %b0  # flip the msb
1194%rs =s cast %b1
1195\end{verbatim}
1196
1197\subsection{Call}
1198\label{sec:call}
1199
1200\begin{code}
1201-- TODO: Code duplication with 'param'.
1202callArg :: Parser Q.FuncArg
1203callArg = (Q.ArgEnv <$> (ws1 (string "env") >> val))
1204    <|> (string "..." >> pure Q.ArgVar)
1205    <|> do
1206          ty <- ws1 abity
1207          Q.ArgReg ty <$> val
1208
1209callArgs :: Parser [Q.FuncArg]
1210callArgs = parenLst callArg
1211
1212callInstr :: Parser Q.Statement
1213callInstr = do
1214  retValue <- optionMaybe $ do
1215    i <- ws local <* ws (char '=')
1216    a <- ws1 abity
1217    return (i, a)
1218  toCall <- ws1 (string "call") >> ws val
1219  fnArgs <- callArgs
1220  return $ Q.Call retValue toCall fnArgs
1221\end{code}
1222
1223The call instruction is special in several ways. It is not a three-address
1224instruction and requires the type of all its arguments to be given. Also, the
1225return type can be either a base type or an aggregate type. These specifics are
1226required to compile calls with C compatibility (i.e., to respect the ABI).
1227
1228When an aggregate type is used as argument type or return type, the value
1229respectively passed or returned needs to be a pointer to a memory location
1230holding the value. This is because aggregate types are not first-class
1231citizens of the IL.
1232
1233Sub-word types are used for arguments and return values of width less than a
1234word. Details on these types are presented in the \nameref{sec:functions} section.
1235Arguments with sub-word types need not be sign or zero extended according to
1236their type. Calls with a sub-word return type define a temporary of base type
1237\texttt{w} with its most significant bits unspecified.
1238
1239Unless the called function does not return a value, a return temporary must be
1240specified, even if it is never used afterwards.
1241
1242An environment parameter can be passed as first argument using the \texttt{env}
1243keyword. The passed value must be a 64-bit integer. If the called function does
1244not expect an environment parameter, it will be safely discarded. See the
1245\nameref{sec:functions} section for more information about environment
1246parameters.
1247
1248When the called function is variadic, there must be a \texttt{...} marker
1249separating the named and variadic arguments.
1250
1251\subsection{Variadic}
1252\label{sec:variadic}
1253
1254To-Do.
1255
1256\subsection{Phi}
1257
1258\begin{code}
1259phiBranch :: Parser (Q.BlockIdent, Q.Value)
1260phiBranch = do
1261  n <- ws1 label
1262  v <- val
1263  pure (n, v)
1264
1265phiInstr :: Parser Q.Phi
1266phiInstr = do
1267  -- TODO: code duplication with 'assign'
1268  n <- ws local
1269  t <- ws (char '=') >> ws1 baseType
1270
1271  _ <- ws1 (string "phi")
1272  -- TODO: combinator for sepBy
1273  p <- Map.fromList <$> sepBy1 (ws phiBranch) (ws $ char ',')
1274  return $ Q.Phi n t p
1275\end{code}
1276
1277First and foremost, phi instructions are NOT necessary when writing a frontend
1278to QBE. One solution to avoid having to deal with SSA form is to use stack
1279allocated variables for all source program variables and perform assignments
1280and lookups using \nameref{sec:memory} operations. This is what LLVM users
1281typically do.
1282
1283Another solution is to simply emit code that is not in SSA form! Contrary to
1284LLVM, QBE is able to fixup programs not in SSA form without requiring the
1285boilerplate of loading and storing in memory. For example, the following
1286program will be correctly compiled by QBE.
1287
1288\begin{verbatim}
1289@start
1290    %x =w copy 100
1291    %s =w copy 0
1292@loop
1293    %s =w add %s, %x
1294    %x =w sub %x, 1
1295    jnz %x, @loop, @end
1296@end
1297    ret %s
1298\end{verbatim}
1299
1300Now, if you want to know what phi instructions are and how to use them in QBE,
1301you can read the following.
1302
1303Phi instructions are specific to SSA form. In SSA form values can only be
1304assigned once, without phi instructions, this requirement is too strong to
1305represent many programs. For example consider the following C program.
1306
1307\begin{verbatim}
1308int f(int x) {
1309    int y;
1310    if (x)
1311        y = 1;
1312    else
1313        y = 2;
1314    return y;
1315}
1316\end{verbatim}
1317
1318The variable \texttt{y} is assigned twice, the solution to translate it in SSA
1319form is to insert a phi instruction.
1320
1321\begin{verbatim}
1322@ifstmt
1323    jnz %x, @ift, @iff
1324@ift
1325    jmp @retstmt
1326@iff
1327    jmp @retstmt
1328@retstmt
1329    %y =w phi @ift 1, @iff 2
1330    ret %y
1331\end{verbatim}
1332
1333Phi instructions return one of their arguments depending on where the control
1334came from. In the example, \texttt{\%y} is set to 1 if the
1335\texttt{\textbackslash{}ift} branch is taken, or it is set to 2 otherwise.
1336
1337An important remark about phi instructions is that QBE assumes that if a
1338variable is defined by a phi it respects all the SSA invariants. So it is
1339critical to not use phi instructions unless you know exactly what you are
1340doing.
1341\end{document}