A software analysis framework built around the QBE intermediate language
git clone https://git.8pit.net/quebex.git
1% SPDX-FileCopyrightText: 2015-2024 Quentin Carbonneaux <quentin@c9x.me> 2% SPDX-FileCopyrightText: 2025 Sören Tempel <soeren+git@soeren-tempel.net> 3% 4% SPDX-License-Identifier: MIT AND GPL-3.0-only 5 6\documentclass{article} 7%include polycode.fmt 8 9%subst blankline = "\\[5mm]" 10 11% See https://github.com/kosmikus/lhs2tex/issues/58 12%format <$> = "\mathbin{\langle\$\rangle}" 13%format <&> = "\mathbin{\langle\&\rangle}" 14%format <|> = "\mathbin{\langle\:\vline\:\rangle}" 15%format <?> = "\mathbin{\langle?\rangle}" 16%format <*> = "\mathbin{\langle*\rangle}" 17%format <* = "\mathbin{\langle*}" 18%format *> = "\mathbin{*\rangle}" 19 20\long\def\ignore#1{} 21 22\usepackage{hyperref} 23\hypersetup{ 24 colorlinks = true, 25} 26 27\begin{document} 28 29\title{QBE Intermediate Language\vspace{-2em}} 30\date{} 31\maketitle 32\frenchspacing 33 34\ignore{ 35\begin{code} 36module Language.QBE.Parser (dataDef, typeDef, funcDef) where 37 38import Data.Word (Word64) 39import Data.Functor ((<&>)) 40import Data.List (singleton) 41import Data.Map qualified as Map 42import qualified Language.QBE.Types as Q 43import Language.QBE.Util (bind, decNumber, float) 44import Text.ParserCombinators.Parsec 45 ( Parser, 46 alphaNum, 47 anyChar, 48 between, 49 char, 50 choice, 51 letter, 52 many, 53 many1, 54 manyTill, 55 newline, 56 noneOf, 57 oneOf, 58 optionMaybe, 59 sepBy, 60 sepBy1, 61 skipMany, 62 skipMany1, 63 string, 64 try, 65 (<?>), 66 (<|>), 67 ) 68\end{code} 69} 70 71This an executable description of the 72\href{https://c9x.me/compile/doc/il-v1.2.html}{QBE intermediate language}, 73specified through \href{https://hackage.haskell.org/package/parsec}{Parsec} 74parser combinators and generated from a literate Haskell file. The description 75is derived from the original QBE IL documentation, licensed under MIT. 76Presently, this implementation targets version 1.2 of the QBE intermediate 77language and aims to be equivalent with the original specification. 78 79\section{Basic Concepts} 80 81The intermediate language (IL) is a higher-level language than the 82machine's assembly language. It smoothes most of the 83irregularities of the underlying hardware and allows an infinite number 84of temporaries to be used. This higher abstraction level lets frontend 85programmers focus on language design issues. 86 87\subsection{Input Files} 88 89The intermediate language is provided to QBE as text. Usually, one file 90is generated per each compilation unit from the frontend input language. 91An IL file is a sequence of \nameref{sec:definitions} for 92data, functions, and types. Once processed by QBE, the resulting file 93can be assembled and linked using a standard toolchain (e.g., GNU 94binutils). 95 96\begin{code} 97comment :: Parser () 98comment = skipMany blankNL >> comment' >> skipMany blankNL 99 where 100 comment' = char '#' >> manyTill anyChar newline 101\end{code} 102 103\ignore{ 104\begin{code} 105skipNoCode :: Parser () -> Parser () 106skipNoCode blankP = try (skipMany1 comment <?> "comments") <|> blankP 107\end{code} 108} 109 110Here is a complete "Hello World" IL file which defines a function that 111prints to the screen. Since the string is not a first class object (only 112the pointer is) it is defined outside the function\textquotesingle s 113body. Comments start with a \# character and finish with the end of the 114line. 115 116\begin{verbatim} 117data $str = { b "hello world", b 0 } 118 119export function w $main() { 120@start 121 # Call the puts function with $str as argument. 122 %r =w call $puts(l $str) 123 ret 0 124} 125\end{verbatim} 126 127If you have read the LLVM language reference, you might recognize the 128example above. In comparison, QBE makes a much lighter use of types and 129the syntax is terser. 130 131\subsection{Parser Combinators} 132 133\ignore{ 134\begin{code} 135bracesNL :: Parser a -> Parser a 136bracesNL = between (wsNL $ char '{') (wsNL $ char '}') 137 138quoted :: Parser a -> Parser a 139quoted = let q = char '"' in between q q 140 141parenLst :: Parser a -> Parser [a] 142parenLst p = between (ws $ char '(') (char ')') inner 143 where 144 inner = sepBy (ws p) (ws $ char ',') 145 146unaryInstr :: (Q.Value -> Q.Instr) -> String -> Parser Q.Instr 147unaryInstr conc keyword = do 148 _ <- ws (string keyword) 149 conc <$> ws val 150 151binaryInstr :: (Q.Value -> Q.Value -> Q.Instr) -> String -> Parser Q.Instr 152binaryInstr conc keyword = do 153 _ <- ws (string keyword) 154 vfst <- ws val <* ws (char ',') 155 conc vfst <$> ws val 156 157-- Can only appear in data and type definitions and hence allows newlines. 158alignAny :: Parser Word64 159alignAny = (ws1 (string "align")) >> wsNL decNumber 160\end{code} 161} 162 163The original QBE specification defines the syntax using a BNF grammar. In 164contrast, this document defines it using Parsec parser combinators. As such, 165this specification is less formal but more accurate as the parsing code is 166actually executable. Consequently, this specification also captures constructs 167omitted in the original specification (e.g., \nameref{sec:identifiers}, 168\nameref{sec:statements}, or \nameref{sec:strlit}). Nonetheless, the formal 169language recognized by these combinators aims to be equivalent to the one of 170the BNF grammar. 171 172\subsection{Identifiers} 173\label{sec:identifiers} 174 175% Ident is not documented in the original QBE specification. 176% See https://c9x.me/git/qbe.git/tree/parse.c?h=v1.2#n304 177 178\begin{code} 179ident :: Parser String 180ident = do 181 start <- letter <|> oneOf "._" 182 rest <- many (alphaNum <|> oneOf "$._") 183 return $ start : rest 184\end{code} 185 186Identifiers for data, types, and functions can start with any ASCII letter or 187the special characters \texttt{.} and \texttt{\_}. This initial character can 188be followed by a sequence of zero or more alphanumeric characters and the 189special characters \texttt{\$}, \texttt{.}, and \texttt{\_}. 190 191\subsection{Sigils} 192 193\begin{code} 194userDef :: Parser Q.UserIdent 195userDef = Q.UserIdent <$> (char ':' >> ident) 196 197global :: Parser Q.GlobalIdent 198global = Q.GlobalIdent <$> (char '$' >> ident) 199 200local :: Parser Q.LocalIdent 201local = Q.LocalIdent <$> (char '%' >> ident) 202 203label :: Parser Q.BlockIdent 204label = Q.BlockIdent <$> (char '@' >> ident) 205\end{code} 206 207The intermediate language makes heavy use of sigils, all user-defined 208names are prefixed with a sigil. This is to avoid keyword conflicts, and 209also to quickly spot the scope and nature of identifiers. 210 211\begin{itemize} 212 \item \texttt{:} is for user-defined \nameref{sec:aggregate-types} 213 \item \texttt{\$} is for globals (represented by a pointer) 214 \item \texttt{\%} is for function-scope temporaries 215 \item \texttt{@@} is for block labels 216\end{itemize} 217 218\subsection{Spacing} 219 220\begin{code} 221blank :: Parser Char 222blank = oneOf "\t " <?> "blank" 223 224blankNL :: Parser Char 225blankNL = oneOf "\n\t " <?> "blank or newline" 226\end{code} 227 228Individual tokens in IL files must be separated by one or more spacing 229characters. Both spaces and tabs are recognized as spacing characters. 230In data and type definitions, newlines may also be used as spaces to 231prevent overly long lines. When exactly one of two consecutive tokens is 232a symbol (for example \texttt{,} or \texttt{=} or \texttt{\{}), spacing may be omitted. 233 234\ignore{ 235\begin{code} 236ws :: Parser a -> Parser a 237ws p = p <* skipMany blank 238 239ws1 :: Parser a -> Parser a 240ws1 p = p <* skipMany1 blank 241 242wsNL :: Parser a -> Parser a 243wsNL p = p <* skipNoCode (skipMany blankNL) 244 245wsNL1 :: Parser a -> Parser a 246wsNL1 p = p <* skipNoCode (skipMany1 blankNL) 247\end{code} 248} 249 250\subsection{String Literals} 251\label{sec:strlit} 252 253% The string literal is not documented in the original QBE specification. 254% See https://c9x.me/git/qbe.git/tree/parse.c?h=v1.2#n287 255 256\begin{code} 257strLit :: Parser String 258strLit = concat <$> quoted (many strChr) 259 where 260 strChr :: Parser [Char] 261 strChr = (singleton <$> noneOf "\"\\") <|> escSeq 262 263 escSeq :: Parser [Char] 264 escSeq = try $ do 265 esc <- char '\\' 266 chr <- anyChar 267 return $ [esc, chr] 268\end{code} 269 270Strings are enclosed by double quotes and are, for example, used to specify a 271section name as part of the \nameref{sec:linkage} information. Within a string, 272a double quote can be escaped using a \texttt{\textbackslash} character. All 273escape sequences, including double quote escaping, are passed through as-is to 274the generated assembly file. 275 276\section{Types} 277 278\subsection{Simple Types} 279 280The IL makes minimal use of types. By design, the types used are 281restricted to what is necessary for unambiguous compilation to machine 282code and C interfacing. Unlike LLVM, QBE is not using types as a means 283to safety; they are only here for semantic purposes. 284 285\begin{code} 286baseType :: Parser Q.BaseType 287baseType = choice 288 [ bind "w" Q.Word 289 , bind "l" Q.Long 290 , bind "s" Q.Single 291 , bind "d" Q.Double ] 292\end{code} 293 294The four base types are \texttt{w} (word), \texttt{l} (long), \texttt{s} (single), and \texttt{d} 295(double), they stand respectively for 32-bit and 64-bit integers, and 29632-bit and 64-bit floating-point numbers. There are no pointer types 297available; pointers are typed by an integer type sufficiently wide to 298represent all memory addresses (e.g., \texttt{l} on 64-bit architectures). 299Temporaries in the IL can only have a base type. 300 301\begin{code} 302extType :: Parser Q.ExtType 303extType = (Q.Base <$> baseType) 304 <|> bind "b" Q.Byte 305 <|> bind "h" Q.HalfWord 306\end{code} 307 308Extended types contain base types plus \texttt{b} (byte) and \texttt{h} (half word), 309respectively for 8-bit and 16-bit integers. They are used in \nameref{sec:aggregate-types} 310and \nameref{sec:data} definitions. 311 312For C interfacing, the IL also provides user-defined aggregate types as 313well as signed and unsigned variants of the sub-word extended types. 314Read more about these types in the \nameref{sec:aggregate-types} 315and \nameref{sec:functions} sections. 316 317\subsection{Subtyping} 318 319The IL has a minimal subtyping feature, for integer types only. Any 320value of type \texttt{l} can be used in a \texttt{w} context. In that case, only the 32132 least significant bits of the word value are used. 322 323Make note that it is the opposite of the usual subtyping on integers (in 324C, we can safely use an \texttt{int} where a \texttt{long} is expected). A long value 325cannot be used in word context. The rationale is that a word can be 326signed or unsigned, so extending it to a long could be done in two ways, 327either by zero-extension, or by sign-extension. 328 329\subsection{Constants and Vals} 330\label{sec:constants-and-vals} 331 332\begin{code} 333dynConst :: Parser Q.DynConst 334dynConst = 335 (Q.Const <$> constant) 336 <|> (Q.Thread <$> global) 337 <?> "dynconst" 338\end{code} 339 340Constants come in two kinds: compile-time constants and dynamic 341constants. Dynamic constants include compile-time constants and other 342symbol variants that are only known at program-load time or execution 343time. Consequently, dynamic constants can only occur in function bodies. 344 345The representation of integers is two's complement. 346Floating-point numbers are represented using the single-precision and 347double-precision formats of the IEEE 754 standard. 348 349\begin{code} 350constant :: Parser Q.Const 351constant = 352 (Q.Number <$> decNumber) 353 <|> (Q.SFP <$> sfp) 354 <|> (Q.DFP <$> dfp) 355 <|> (Q.Global <$> global) 356 <?> "const" 357 where 358 sfp = string "s_" >> float 359 dfp = string "d_" >> float 360\end{code} 361 362Constants specify a sequence of bits and are untyped. They are always 363parsed as 64-bit blobs. Depending on the context surrounding a constant, 364only some of its bits are used. For example, in the program below, the 365two variables defined have the same value since the first operand of the 366subtraction is a word (32-bit) context. 367 368\begin{verbatim} 369%x =w sub -1, 0 %y =w sub 4294967295, 0 370\end{verbatim} 371 372Because specifying floating-point constants by their bits makes the code 373less readable, syntactic sugar is provided to express them. Standard 374scientific notation is prefixed with \texttt{s\_} and \texttt{d\_} for single and 375double precision numbers respectively. Once again, the following example 376defines twice the same double-precision constant. 377 378\begin{verbatim} 379%x =d add d_0, d_-1 380%y =d add d_0, -4616189618054758400 381\end{verbatim} 382 383Global symbols can also be used directly as constants; they will be 384resolved and turned into actual numeric constants by the linker. 385 386When the \texttt{thread} keyword prefixes a symbol name, the 387symbol\textquotesingle s numeric value is resolved at runtime in the 388thread-local storage. 389 390\begin{code} 391val :: Parser Q.Value 392val = 393 (Q.VConst <$> dynConst) 394 <|> (Q.VLocal <$> local) 395 <?> "val" 396\end{code} 397 398Vals are used as arguments in regular, phi, and jump instructions within 399function definitions. They are either constants or function-scope 400temporaries. 401 402\subsection{Linkage} 403\label{sec:linkage} 404 405\begin{code} 406linkage :: Parser Q.Linkage 407linkage = 408 wsNL (bind "export" Q.LExport) 409 <|> wsNL (bind "thread" Q.LThread) 410 <|> do 411 _ <- ws1 $ string "section" 412 (try secWithFlags) <|> sec 413 where 414 sec :: Parser Q.Linkage 415 sec = wsNL strLit <&> (`Q.LSection` Nothing) 416 417 secWithFlags :: Parser Q.Linkage 418 secWithFlags = do 419 n <- ws1 strLit 420 wsNL strLit <&> Q.LSection n . Just 421\end{code} 422 423Function and data definitions (see below) can specify linkage 424information to be passed to the assembler and eventually to the linker. 425 426The \texttt{export} linkage flag marks the defined item as visible outside the 427current file\textquotesingle s scope. If absent, the symbol can only be 428referred to locally. Functions compiled by QBE and called from C need to 429be exported. 430 431The \texttt{thread} linkage flag can only qualify data definitions. It mandates 432that the object defined is stored in thread-local storage. Each time a 433runtime thread starts, the supporting platform runtime is in charge of 434making a new copy of the object for the fresh thread. Objects in 435thread-local storage must be accessed using the \texttt{thread \$IDENT} syntax, 436as specified in the \nameref{sec:constants-and-vals} section. 437 438A \texttt{section} flag can be specified to tell the linker to put the defined 439item in a certain section. The use of the section flag is platform 440dependent and we refer the user to the documentation of their assembler 441and linker for relevant information. 442 443\begin{verbatim} 444section ".init_array" data $.init.f = { l $f } 445\end{verbatim} 446 447The section flag can be used to add function pointers to a global 448initialization list, as depicted above. Note that some platforms provide 449a BSS section that can be used to minimize the footprint of uniformly 450zeroed data. When this section is available, QBE will automatically make 451use of it and no section flag is required. 452 453The section and export linkage flags should each appear at most once in 454a definition. If multiple occurrences are present, QBE is free to use 455any. 456 457\subsection{Definitions} 458\label{sec:definitions} 459 460Definitions are the essential components of an IL file. They can define 461three types of objects: aggregate types, data, and functions. Aggregate 462types are never exported and do not compile to any code. Data and 463function definitions have file scope and are mutually recursive (even 464across IL files). Their visibility can be controlled using linkage 465flags. 466 467\subsubsection{Aggregate Types} 468\label{sec:aggregate-types} 469 470\begin{code} 471typeDef :: Parser Q.TypeDef 472typeDef = do 473 _ <- wsNL1 (string "type") 474 i <- wsNL1 userDef 475 _ <- wsNL1 (char '=') 476 a <- optionMaybe alignAny 477 bracesNL (opaqueType <|> unionType <|> regularType) <&> Q.TypeDef i a 478\end{code} 479 480Aggregate type definitions start with the \texttt{type} keyword. They have file 481scope, but types must be defined before being referenced. The inner 482structure of a type is expressed by a comma-separated list of fields. 483 484\begin{code} 485subType :: Parser Q.SubType 486subType = 487 (Q.SExtType <$> extType) 488 <|> (Q.SUserDef <$> userDef) 489 490field :: Parser Q.Field 491field = do 492 -- TODO: newline is required if there is a number argument 493 f <- wsNL subType 494 s <- ws $ optionMaybe decNumber 495 pure (f, s) 496 497-- TODO: "For ease of IL generation, a trailing comma is tolerated by the parser". 498fields :: Bool -> Parser [Q.Field] 499fields allowEmpty = 500 (if allowEmpty then sepBy else sepBy1) field (wsNL $ char ',') 501\end{code} 502 503A field consists of a subtype, either an extended type or a user-defined type, 504and an optional number expressing the value of this field. In case many items 505of the same type are sequenced (like in a C array), the shorter array syntax 506can be used. 507 508\begin{code} 509regularType :: Parser Q.AggType 510regularType = Q.ARegular <$> fields True 511\end{code} 512 513Three different kinds of aggregate types are presentl ysupported: regular 514types, union types and opaque types. The fields of regular types will be 515packed. By default, the alignment of an aggregate type is the maximum alignment 516of its members. The alignment can be explicitly specified by the programmer. 517 518\begin{code} 519unionType :: Parser Q.AggType 520unionType = Q.AUnion <$> many1 (wsNL unionType') 521 where 522 unionType' :: Parser [Q.Field] 523 unionType' = bracesNL $ fields False 524\end{code} 525 526Union types allow the same chunk of memory to be used with different layouts. They are defined by enclosing multiple regular aggregate type bodies in a pair of curly braces. Size and alignment of union types are set to the maximum size and alignment of each variation or, in the case of alignment, can be explicitly specified. 527 528\begin{code} 529opaqueType :: Parser Q.AggType 530opaqueType = Q.AOpaque <$> wsNL decNumber 531\end{code} 532 533Opaque types are used when the inner structure of an aggregate cannot be specified; the alignment for opaque types is mandatory. They are defined simply by enclosing their size between curly braces. 534 535\subsubsection{Data} 536\label{sec:data} 537 538\begin{code} 539dataDef :: Parser Q.DataDef 540dataDef = do 541 link <- many linkage 542 name <- wsNL1 (string "data") >> wsNL global 543 _ <- wsNL (char '=') 544 alignment <- optionMaybe alignAny 545 bracesNL dataObjs <&> Q.DataDef link name alignment 546 where 547 dataObjs = sepBy dataObj (wsNL $ char ',') 548\end{code} 549 550Data definitions express objects that will be emitted in the compiled 551file. Their visibility and location in the compiled artifact are 552controlled with linkage flags described in the \nameref{sec:linkage} 553section. 554 555They define a global identifier (starting with the sigil \texttt{\$}), that 556will contain a pointer to the object specified by the definition. 557 558\begin{code} 559dataObj :: Parser Q.DataObj 560dataObj = 561 (Q.OZeroFill <$> (wsNL1 (char 'z') >> wsNL decNumber)) 562 <|> do 563 t <- wsNL1 extType 564 i <- many1 (wsNL dataItem) 565 return $ Q.OItem t i 566\end{code} 567 568Objects are described by a sequence of fields that start with a type 569letter. This letter can either be an extended type, or the \texttt{z} letter. 570If the letter used is an extended type, the data item following 571specifies the bits to be stored in the field. 572 573\begin{code} 574dataItem :: Parser Q.DataItem 575dataItem = 576 (Q.DString <$> strLit) 577 <|> (Q.DConst <$> constant) 578 <|> do 579 i <- global 580 off <- optionMaybe (char '+' >> decNumber) 581 return $ Q.DSymbol i off 582\end{code} 583 584Within each object, several items can be defined. When several data items 585follow a letter, they initialize multiple fields of the same size. 586 587\begin{code} 588allocSize :: Parser Q.AllocSize 589allocSize = 590 choice 591 [ bind "4" Q.AlignWord, 592 bind "8" Q.AlignLong, 593 bind "16" Q.AlignLongLong 594 ] 595\end{code} 596 597The members of a struct will be packed. This means that padding has to 598be emitted by the frontend when necessary. Alignment of the whole data 599objects can be manually specified, and when no alignment is provided, 600the maximum alignment from the platform is used. 601 602When the \texttt{z} letter is used the number following indicates the size of 603the field; the contents of the field are zero initialized. It can be 604used to add padding between fields or zero-initialize big arrays. 605 606\subsubsection{Functions} 607\label{sec:functions} 608 609\begin{code} 610funcDef :: Parser Q.FuncDef 611funcDef = do 612 link <- many linkage 613 _ <- ws1 (string "function") 614 retTy <- optionMaybe (ws1 abity) 615 name <- ws global 616 args <- wsNL params 617 body <- between (wsNL1 $ char '{') (wsNL $ char '}') $ many1 block 618 619 case (Q.insertJumps body) of 620 Nothing -> fail $ "invalid fallthrough in " ++ show name 621 Just bl -> return $ Q.FuncDef link name retTy args bl 622\end{code} 623 624Function definitions contain the actual code to emit in the compiled 625file. They define a global symbol that contains a pointer to the 626function code. This pointer can be used in \texttt{call} instructions or stored 627in memory. 628 629\begin{code} 630subWordType :: Parser Q.SubWordType 631subWordType = choice 632 [ try $ bind "sb" Q.SignedByte 633 , try $ bind "ub" Q.UnsignedByte 634 , bind "sh" Q.SignedHalf 635 , bind "uh" Q.UnsignedHalf ] 636 637abity :: Parser Q.Abity 638abity = try (Q.ASubWordType <$> subWordType) 639 <|> (Q.ABase <$> baseType) 640 <|> (Q.AUserDef <$> userDef) 641\end{code} 642 643The type given right before the function name is the return type of the 644function. All return values of this function must have this return type. 645If the return type is missing, the function must not return any value. 646 647\begin{code} 648param :: Parser Q.FuncParam 649param = (Q.Env <$> (ws1 (string "env") >> local)) 650 <|> (string "..." >> pure Q.Variadic) 651 <|> do 652 ty <- ws1 abity 653 Q.Regular ty <$> local 654 655params :: Parser [Q.FuncParam] 656params = parenLst param 657\end{code} 658 659The parameter list is a comma separated list of temporary names prefixed 660by types. The types are used to correctly implement C compatibility. 661When an argument has an aggregate type, a pointer to the aggregate is 662passed by thea caller. In the example below, we have to use a load 663instruction to get the value of the first (and only) member of the 664struct. 665 666\begin{verbatim} 667type :one = { w } 668 669function w $getone(:one %p) { 670@start 671 %val =w loadw %p 672 ret %val 673} 674\end{verbatim} 675 676If a function accepts or returns values that are smaller than a word, 677such as \texttt{signed char} or \texttt{unsigned short} in C, one of the sub-word type 678must be used. The sub-word types \texttt{sb}, \texttt{ub}, \texttt{sh}, and \texttt{uh} stand, 679respectively, for signed and unsigned 8-bit values, and signed and 680unsigned 16-bit values. Parameters associated with a sub-word type of 681bit width N only have their N least significant bits set and have base 682type \texttt{w}. For example, the function 683 684\begin{verbatim} 685function w $addbyte(w %a, sb %b) { 686@start 687 %bw =w extsb %b 688 %val =w add %a, %bw 689 ret %val 690} 691\end{verbatim} 692 693needs to sign-extend its second argument before the addition. Dually, 694return values with sub-word types do not need to be sign or zero 695extended. 696 697If the parameter list ends with \texttt{...}, the function is a variadic 698function: it can accept a variable number of arguments. To access the 699extra arguments provided by the caller, use the \texttt{vastart} and \texttt{vaarg} 700instructions described in the \nameref{sec:variadic} section. 701 702Optionally, the parameter list can start with an environment parameter 703\texttt{env \%e}. This special parameter is a 64-bit integer temporary (i.e., 704of type \texttt{l}). If the function does not use its environment parameter, 705callers can safely omit it. This parameter is invisible to a C caller: 706for example, the function 707 708\begin{verbatim} 709export function w $add(env %e, w %a, w %b) { 710@start 711 %c =w add %a, %b 712 ret %c 713} 714\end{verbatim} 715 716must be given the C prototype \texttt{int add(int, int)}. The intended use of 717this feature is to pass the environment pointer of closures while 718retaining a very good compatibility with C. The \nameref{sec:call} 719section explains how to pass an environment parameter. 720 721Since global symbols are defined mutually recursive, there is no need 722for function declarations: a function can be referenced before its 723definition. Similarly, functions from other modules can be used without 724previous declaration. All the type information necessary to compile a 725call is in the instruction itself. 726 727The syntax and semantics for the body of functions are described in the 728\nameref{sec:control} section. 729 730\section{Control} 731\label{sec:control} 732 733The IL represents programs as textual transcriptions of control flow 734graphs. The control flow is serialized as a sequence of blocks of 735straight-line code which are connected using jump instructions. 736 737\subsection{Blocks} 738\label{sec:blocks} 739 740\begin{code} 741block :: Parser Q.Block' 742block = do 743 l <- wsNL1 label 744 p <- many (wsNL1 $ try phiInstr) 745 s <- many (wsNL1 statement) 746 Q.Block' l p s <$> (optionMaybe $ wsNL1 jumpInstr) 747\end{code} 748 749All blocks have a name that is specified by a label at their beginning. 750Then follows a sequence of instructions that have "fall-through" flow. 751Finally one jump terminates the block. The jump can either transfer 752control to another block of the same function or return; jumps are 753described further below. 754 755The first block in a function must not be the target of any jump in the 756program. If a jump to the function start is needed, the frontend must 757insert an empty prelude block at the beginning of the function. 758 759When one block jumps to the next block in the IL file, it is not 760necessary to write the jump instruction, it will be automatically added 761by the parser. For example the start block in the example below jumps 762directly to the loop block. 763 764\subsection{Jumps} 765 766\begin{code} 767jumpInstr :: Parser Q.JumpInstr 768jumpInstr = (string "hlt" >> pure Q.Halt) 769 -- TODO: Return requires a space if there is an optionMaybe 770 <|> Q.Return <$> ((ws $ string "ret") >> optionMaybe val) 771 <|> try (Q.Jump <$> ((ws1 $ string "jmp") >> label)) 772 <|> do 773 _ <- ws1 $ string "jnz" 774 v <- ws val <* ws (char ',') 775 l1 <- ws label <* ws (char ',') 776 l2 <- ws label 777 return $ Q.Jnz v l1 l2 778\end{code} 779 780A jump instruction ends every block and transfers the control to another 781program location. The target of a jump must never be the first block in 782a function. The three kinds of jumps available are described in the 783following list. 784 785\begin{enumerate} 786 \item \textbf{Unconditional jump.} Jumps to another block of the same function. 787 \item \textbf{Conditional jump.} When its word argument is non-zero, it jumps to its first label argument; otherwise it jumps to the other label. The argument must be of word type; because of subtyping a long argument can be passed, but only its least significant 32 bits will be compared to 0. 788 \item \textbf{Function return.} Terminates the execution of the current function, optionally returning a value to the caller. The value returned must be of the type given in the function prototype. If the function prototype does not specify a return type, no return value can be used. 789 \item \textbf{Program termination.} Terminates the execution of the program with a target-dependent error. This instruction can be used when it is expected that the execution never reaches the end of the block it closes; for example, after having called a function such as \texttt{exit()}. 790\end{enumerate} 791 792\section{Instructions} 793\label{sec:instructions} 794 795\begin{code} 796instr :: Parser Q.Instr 797instr = 798 choice 799 [ try $ binaryInstr Q.Add "add", 800 try $ binaryInstr Q.Sub "sub", 801 try $ binaryInstr Q.Mul "mul", 802 try $ binaryInstr Q.Div "div", 803 try $ binaryInstr Q.URem "urem", 804 try $ binaryInstr Q.Rem "rem", 805 try $ binaryInstr Q.UDiv "udiv", 806 try $ binaryInstr Q.Or "or", 807 try $ binaryInstr Q.Xor "xor", 808 try $ binaryInstr Q.And "and", 809 try $ binaryInstr Q.Sar "sar", 810 try $ binaryInstr Q.Shr "shr", 811 try $ binaryInstr Q.Shl "shl", 812 try $ unaryInstr Q.Neg "neg", 813 try $ loadInstr, 814 try $ allocInstr, 815 try $ compareInstr, 816 try $ extInstr 817 ] 818\end{code} 819 820Instructions are the smallest piece of code in the IL, they form the body of 821\nameref{sec:blocks}. This specification distinguishes instructions and 822volatile instructions, the latter do not return a value. For the former, the IL 823uses a three-address code, which means that one instruction computes an 824operation between two operands and assigns the result to a third one. 825 826\begin{code} 827assign :: Parser Q.Statement 828assign = do 829 n <- ws local 830 t <- ws (char '=') >> ws1 baseType 831 Q.Assign n t <$> instr 832 833volatileInstr :: Parser Q.Statement 834volatileInstr = Q.Volatile <$> (storeInstr <|> blitInstr) 835 836-- TODO: Not documented in the QBE BNF. 837statement :: Parser Q.Statement 838statement = (try callInstr) <|> assign <|> volatileInstr 839\end{code} 840 841An instruction has both a name and a return type, this return type is a base 842type that defines the size of the instruction's result. The type of the 843arguments can be unambiguously inferred using the instruction name and the 844return type. For example, for all arithmetic instructions, the type of the 845arguments is the same as the return type. The two additions below are valid if 846\texttt{\%y} is a word or a long (because of \nameref{sec:subtyping}). 847 848\begin{verbatim} 849%x =w add 0, %y 850%z =w add %x, %x 851\end{verbatim} 852 853Some instructions, like comparisons and memory loads have operand types 854that differ from their return types. For instance, two floating points 855can be compared to give a word result (0 if the comparison succeeds, 1 856if it fails). 857 858\begin{verbatim} 859%c =w cgts %a, %b 860\end{verbatim} 861 862In the example above, both operands have to have single type. This is 863made explicit by the instruction suffix. 864 865\subsection{Arithmetic and Bits} 866 867To-Do. 868 869\subsection{Memory} 870\label{sec:memory} 871 872To-Do. 873 874\subsubsection{Store instructions} 875 876\begin{code} 877storeInstr :: Parser Q.VolatileInstr 878storeInstr = do 879 t <- string "store" >> ws1 extType 880 v <- ws val 881 _ <- ws $ char ',' 882 ws val <&> Q.Store t v 883\end{code} 884 885Store instructions exist to store a value of any base type and any extended 886type. Since halfwords and bytes are not first class in the IL, \texttt{storeh} 887and \texttt{storeb} take a word as argument. Only the first 16 or 8 bits of 888this word will be stored in memory at the address specified in the second 889argument. 890 891\subsubsection{Load instructions} 892 893\begin{code} 894loadInstr :: Parser Q.Instr 895loadInstr = do 896 _ <- string "load" 897 t <- ws1 $ choice 898 [ try $ bind "sw" (Q.LBase Q.Word), 899 try $ bind "uw" (Q.LBase Q.Word), 900 try $ Q.LSubWord <$> subWordType, 901 Q.LBase <$> baseType 902 ] 903 ws val <&> Q.Load t 904\end{code} 905 906For types smaller than long, two variants of the load instruction are 907available: one will sign extend the loaded value, while the other will zero 908extend it. Note that all loads smaller than long can load to either a long or a 909word. 910 911The two instructions \texttt{loadsw} and \texttt{loaduw} have the same effect 912when they are used to define a word temporary. A \texttt{loadw} instruction is 913provided as syntactic sugar for \texttt{loadsw} to make explicit that the 914extension mechanism used is irrelevant. 915 916\subsubsection{Blits} 917 918\begin{code} 919blitInstr :: Parser Q.VolatileInstr 920blitInstr = do 921 v1 <- (ws1 $ string "blit") >> ws val <* (ws $ char ',') 922 v2 <- ws val <* (ws $ char ',') 923 nb <- decNumber 924 return $ Q.Blit v1 v2 nb 925\end{code} 926 927The blit instruction copies in-memory data from its first address argument to 928its second address argument. The third argument is the number of bytes to copy. 929The source and destination spans are required to be either non-overlapping, or 930fully overlapping (source address identical to the destination address). The 931byte count argument must be a nonnegative numeric constant; it cannot be a 932temporary. 933 934One blit instruction may generate a number of instructions proportional to its 935byte count argument, consequently, it is recommended to keep this argument 936relatively small. If large copies are necessary, it is preferable that 937frontends generate calls to a supporting \texttt{memcpy} function. 938 939\subsubsection{Stack Allocation} 940 941\begin{code} 942allocInstr :: Parser Q.Instr 943allocInstr = do 944 siz <- (ws $ string "alloc") >> (ws1 allocSize) 945 val <&> Q.Alloc siz 946\end{code} 947 948These instructions allocate a chunk of memory on the stack. The number ending 949the instruction name is the alignment required for the allocated slot. QBE will 950make sure that the returned address is a multiple of that alignment value. 951 952Stack allocation instructions are used, for example, when compiling the C local 953variables, because their address can be taken. When compiling Fortran, 954temporaries can be used directly instead, because it is illegal to take the 955address of a variable. 956 957\subsection{Comparisons} 958 959Comparison instructions return an integer value (either a word or a long), and 960compare values of arbitrary types. The returned value is 1 if the two operands 961satisfy the comparison relation, or 0 otherwise. The names of comparisons 962respect a standard naming scheme in three parts. 963 964\begin{code} 965compareInstr :: Parser Q.Instr 966compareInstr = do 967 _ <- char 'c' 968 op <- compareOp 969 ty <- ws1 baseType 970 lhs <- ws val <* ws (char ',') 971 rhs <- ws val 972 pure $ Q.Compare ty op lhs rhs 973\end{code} 974 975\begin{code} 976compareOp :: Parser Q.CmpOp 977compareOp = choice 978 [ bind "eq" Q.CEq 979 , bind "ne" Q.CNe 980 , try $ bind "sle" Q.CSle 981 , try $ bind "slt" Q.CSlt 982 , try $ bind "sge" Q.CSge 983 , try $ bind "sgt" Q.CSgt 984 , try $ bind "ule" Q.CUle 985 , try $ bind "ult" Q.CUlt 986 , try $ bind "uge" Q.CUge 987 , try $ bind "ugt" Q.CUgt ] 988\end{code} 989 990For example, \texttt{cod} (\texttt{I(dd,dd)}) compares two double-precision floating point numbers and returns 1 if the two floating points are not NaNs, or 0 otherwise. The \texttt{csltw} (\texttt{I(ww,ww)}) instruction compares two words representing signed numbers and returns 1 when the first argument is smaller than the second one. 991 992\subsection{Conversions} 993 994\begin{code} 995subLongType :: Parser Q.SubLongType 996subLongType = try (Q.SLSubWord <$> subWordType) 997 <|> bind "sw" Q.SLSignedWord 998 <|> bind "uw" Q.SLUnsignedWord 9991000extInstr :: Parser Q.Instr1001extInstr = do1002 _ <- string "ext"1003 ty <- ws1 subLongType1004 ws val <&> Q.Ext ty1005\end{code}10061007Conversion operations change the representation of a value, possibly modifying1008it if the target type cannot hold the value of the source type. Conversions can1009extend the precision of a temporary (e.g., from signed 8-bit to 32-bit), or1010convert a floating point into an integer and vice versa.10111012\subsection{Cast and Copy}10131014To-Do.10151016\subsection{Call}1017\label{sec:call}10181019\begin{code}1020-- TODO: Code duplication with 'param'.1021callArg :: Parser Q.FuncArg1022callArg = (Q.ArgEnv <$> (ws1 (string "env") >> val))1023 <|> (string "..." >> pure Q.ArgVar)1024 <|> do1025 ty <- ws1 abity1026 Q.ArgReg ty <$> val10271028callArgs :: Parser [Q.FuncArg]1029callArgs = parenLst callArg10301031callInstr :: Parser Q.Statement1032callInstr = do1033 retValue <- optionMaybe $ do1034 i <- ws local <* ws (char '=')1035 a <- ws1 abity1036 return (i, a)1037 toCall <- ws1 (string "call") >> ws val1038 fnArgs <- callArgs1039 return $ Q.Call retValue toCall fnArgs1040\end{code}10411042The call instruction is special in several ways. It is not a three-address1043instruction and requires the type of all its arguments to be given. Also, the1044return type can be either a base type or an aggregate type. These specifics are1045required to compile calls with C compatibility (i.e., to respect the ABI).10461047When an aggregate type is used as argument type or return type, the value1048respectively passed or returned needs to be a pointer to a memory location1049holding the value. This is because aggregate types are not first-class1050citizens of the IL.10511052Sub-word types are used for arguments and return values of width less than a1053word. Details on these types are presented in the <@ Functions > section.1054Arguments with sub-word types need not be sign or zero extended according to1055their type. Calls with a sub-word return type define a temporary of base type1056\texttt{w} with its most significant bits unspecified.10571058Unless the called function does not return a value, a return temporary must be1059specified, even if it is never used afterwards.10601061An environment parameter can be passed as first argument using the \texttt{env}1062keyword. The passed value must be a 64-bit integer. If the called function does1063not expect an environment parameter, it will be safely discarded. See the1064\nameref{sec:functions} section for more information about environment1065parameters.10661067When the called function is variadic, there must be a \texttt{...} marker1068separating the named and variadic arguments.10691070\subsection{Variadic}1071\label{sec:variadic}10721073To-Do.10741075\subsection{Phi}10761077\begin{code}1078phiBranch :: Parser (Q.BlockIdent, Q.Value)1079phiBranch = do1080 n <- ws1 label1081 v <- val1082 pure (n, v)10831084phiInstr :: Parser Q.Phi1085phiInstr = do1086 -- TODO: code duplication with 'assign'1087 n <- ws local1088 t <- ws (char '=') >> ws1 baseType10891090 _ <- ws1 (string "phi")1091 -- TODO: combinator for sepBy1092 p <- Map.fromList <$> sepBy1 (ws phiBranch) (ws $ char ',')1093 return $ Q.Phi n t p1094\end{code}10951096First and foremost, phi instructions are NOT necessary when writing a frontend1097to QBE. One solution to avoid having to deal with SSA form is to use stack1098allocated variables for all source program variables and perform assignments1099and lookups using \nameref{sec:memory} operations. This is what LLVM users1100typically do.11011102Another solution is to simply emit code that is not in SSA form! Contrary to1103LLVM, QBE is able to fixup programs not in SSA form without requiring the1104boilerplate of loading and storing in memory. For example, the following1105program will be correctly compiled by QBE.11061107\begin{verbatim}1108@start1109 %x =w copy 1001110 %s =w copy 01111@loop1112 %s =w add %s, %x1113 %x =w sub %x, 11114 jnz %x, @loop, @end1115@end1116 ret %s1117\end{verbatim}11181119Now, if you want to know what phi instructions are and how to use them in QBE,1120you can read the following.11211122Phi instructions are specific to SSA form. In SSA form values can only be1123assigned once, without phi instructions, this requirement is too strong to1124represent many programs. For example consider the following C program.11251126\begin{verbatim}1127int f(int x) {1128 int y;1129 if (x)1130 y = 1;1131 else1132 y = 2;1133 return y;1134}1135\end{verbatim}11361137The variable \texttt{y} is assigned twice, the solution to translate it in SSA1138form is to insert a phi instruction.11391140\begin{verbatim}1141@ifstmt1142 jnz %x, @ift, @iff1143@ift1144 jmp @retstmt1145@iff1146 jmp @retstmt1147@retstmt1148 %y =w phi @ift 1, @iff 21149 ret %y1150\end{verbatim}11511152Phi instructions return one of their arguments depending on where the control1153came from. In the example, \texttt{%y} is set to 1 if the \texttt{@ift} branch1154is taken, or it is set to 2 otherwise.11551156An important remark about phi instructions is that QBE assumes that if a1157variable is defined by a phi it respects all the SSA invariants. So it is1158critical to not use phi instructions unless you know exactly what you are1159doing.1160\end{document}