quebex

A software analysis framework built around the QBE intermediate language

git clone https://git.8pit.net/quebex.git

  1% SPDX-FileCopyrightText: 2015-2024 Quentin Carbonneaux <quentin@c9x.me>
  2% SPDX-FileCopyrightText: 2025 Sören Tempel <soeren+git@soeren-tempel.net>
  3%
  4% SPDX-License-Identifier: MIT AND GPL-3.0-only
  5
  6\documentclass{article}
  7%include polycode.fmt
  8
  9%subst blankline = "\\[5mm]"
 10
 11% See https://github.com/kosmikus/lhs2tex/issues/58
 12%format <$> = "\mathbin{\langle\$\rangle}"
 13%format <&> = "\mathbin{\langle\&\rangle}"
 14%format <|> = "\mathbin{\langle\:\vline\:\rangle}"
 15%format <?> = "\mathbin{\langle?\rangle}"
 16%format <*> = "\mathbin{\langle*\rangle}"
 17%format <*  = "\mathbin{\langle*}"
 18%format *>  = "\mathbin{*\rangle}"
 19
 20\long\def\ignore#1{}
 21
 22\usepackage{hyperref}
 23\hypersetup{
 24	colorlinks = true,
 25}
 26
 27\begin{document}
 28
 29\title{QBE Intermediate Language\vspace{-2em}}
 30\date{}
 31\maketitle
 32\frenchspacing
 33
 34\ignore{
 35\begin{code}
 36module Language.QBE.Parser (dataDef, typeDef, funcDef) where
 37
 38import Data.Functor ((<&>))
 39import Data.List (singleton)
 40import qualified Language.QBE.Types as Q
 41import Language.QBE.Util (bind, decNumber, float)
 42import Text.ParserCombinators.Parsec
 43  ( Parser,
 44    alphaNum,
 45    anyChar,
 46    between,
 47    char,
 48    choice,
 49    letter,
 50    many,
 51    many1,
 52    manyTill,
 53    newline,
 54    noneOf,
 55    oneOf,
 56    optionMaybe,
 57    sepBy,
 58    sepBy1,
 59    skipMany,
 60    skipMany1,
 61    string,
 62    try,
 63    (<?>),
 64    (<|>),
 65  )
 66\end{code}
 67}
 68
 69This an executable description of the
 70\href{https://c9x.me/compile/doc/il-v1.2.html}{QBE intermediate language},
 71specified through \href{https://hackage.haskell.org/package/parsec}{Parsec}
 72parser combinators and generated from a literate Haskell file. The description
 73is derived from the original QBE IL documentation, licensed under MIT.
 74Presently, this implementation targets version 1.2 of the QBE intermediate
 75language and aims to be equivalent with the original specification.
 76
 77\section{Basic Concepts}
 78
 79The intermediate language (IL) is a higher-level language than the
 80machine's assembly language. It smoothes most of the
 81irregularities of the underlying hardware and allows an infinite number
 82of temporaries to be used. This higher abstraction level lets frontend
 83programmers focus on language design issues.
 84
 85\subsection{Input Files}
 86
 87The intermediate language is provided to QBE as text. Usually, one file
 88is generated per each compilation unit from the frontend input language.
 89An IL file is a sequence of \nameref{sec:definitions} for
 90data, functions, and types. Once processed by QBE, the resulting file
 91can be assembled and linked using a standard toolchain (e.g., GNU
 92binutils).
 93
 94\begin{code}
 95comment :: Parser ()
 96comment = skipMany blankNL >> comment' >> skipMany blankNL
 97  where
 98    comment' = char '#' >> manyTill anyChar newline
 99\end{code}
100
101\ignore{
102\begin{code}
103skipNoCode :: Parser () -> Parser ()
104skipNoCode blankP = try (skipMany1 comment <?> "comments") <|> blankP
105\end{code}
106}
107
108Here is a complete "Hello World" IL file which defines a function that
109prints to the screen. Since the string is not a first class object (only
110the pointer is) it is defined outside the function\textquotesingle s
111body. Comments start with a \# character and finish with the end of the
112line.
113
114\begin{verbatim}
115data $str = { b "hello world", b 0 }
116
117export function w $main() {
118@start
119        # Call the puts function with $str as argument.
120        %r =w call $puts(l $str)
121        ret 0
122}
123\end{verbatim}
124
125If you have read the LLVM language reference, you might recognize the
126example above. In comparison, QBE makes a much lighter use of types and
127the syntax is terser.
128
129\subsection{Parser Combinators}
130
131\ignore{
132\begin{code}
133bracesNL :: Parser a -> Parser a
134bracesNL = between (wsNL $ char '{') (wsNL $ char '}')
135
136quoted :: Parser a -> Parser a
137quoted = let q = char '"' in between q q
138
139binaryInstr :: (Q.Value -> Q.Value -> Q.Instr) -> String -> Parser Q.Instr
140binaryInstr conc keyword = do
141  _ <- ws (string keyword)
142  vfst <- ws val <* ws (char ',')
143  conc vfst <$> ws val
144
145-- Can only appear in data and type definitions and hence allows newlines.
146alignSpec :: Parser Q.AllocSize
147alignSpec = (ws1 (string "align")) >> wsNL allocSize
148\end{code}
149}
150
151The original QBE specification defines the syntax using a BNF grammar. In
152contrast, this document defines it using Parsec parser combinators. As such,
153this specification is less formal but more accurate as the parsing code is
154actually executable. Consequently, this specification also captures constructs
155omitted in the original specification (e.g., \nameref{sec:identifiers},
156\nameref{sec:statements}, or \nameref{sec:strlit}). Nonetheless, the formal
157language recognized by these combinators aims to be equivalent to the one of
158the BNF grammar.
159
160\subsection{Identifiers}
161\label{sec:identifiers}
162
163% Ident is not documented in the original QBE specification.
164% See https://c9x.me/git/qbe.git/tree/parse.c?h=v1.2#n304
165
166\begin{code}
167ident :: Parser String
168ident = do
169  start <- letter <|> oneOf "._"
170  rest <- many (alphaNum <|> oneOf "$._")
171  return $ start : rest
172\end{code}
173
174Identifiers for data, types, and functions can start with any ASCII letter or
175the special characters \texttt{.} and \texttt{\_}. This initial character can
176be followed by a sequence of zero or more alphanumeric characters and the
177special characters \texttt{\$}, \texttt{.}, and \texttt{\_}.
178
179\subsection{Sigils}
180
181\begin{code}
182userDef :: Parser Q.UserIdent
183userDef = Q.UserIdent <$> (char ':' >> ident)
184
185global :: Parser Q.GlobalIdent
186global = Q.GlobalIdent <$> (char '$' >> ident)
187
188local :: Parser Q.LocalIdent
189local = Q.LocalIdent <$> (char '%' >> ident)
190
191label :: Parser Q.BlockIdent
192label = Q.BlockIdent <$> (char '@' >> ident)
193\end{code}
194
195The intermediate language makes heavy use of sigils, all user-defined
196names are prefixed with a sigil. This is to avoid keyword conflicts, and
197also to quickly spot the scope and nature of identifiers.
198
199\begin{itemize}
200  \item \texttt{:} is for user-defined \nameref{sec:aggregate-types}
201  \item \texttt{\$} is for globals (represented by a pointer)
202  \item \texttt{\%} is for function-scope temporaries
203  \item \texttt{@@} is for block labels
204\end{itemize}
205
206\subsection{Spacing}
207
208\begin{code}
209blank :: Parser Char
210blank = oneOf "\t " <?> "blank"
211
212blankNL :: Parser Char
213blankNL = oneOf "\n\t " <?> "blank or newline"
214\end{code}
215
216Individual tokens in IL files must be separated by one or more spacing
217characters. Both spaces and tabs are recognized as spacing characters.
218In data and type definitions, newlines may also be used as spaces to
219prevent overly long lines. When exactly one of two consecutive tokens is
220a symbol (for example \texttt{,} or \texttt{=} or \texttt{\{}), spacing may be omitted.
221
222\ignore{
223\begin{code}
224ws :: Parser a -> Parser a
225ws p = p <* skipMany blank
226
227ws1 :: Parser a -> Parser a
228ws1 p = p <* skipMany1 blank
229
230wsNL :: Parser a -> Parser a
231wsNL p = p <* skipNoCode (skipMany blankNL)
232
233wsNL1 :: Parser a -> Parser a
234wsNL1 p = p <* skipNoCode (skipMany1 blankNL)
235\end{code}
236}
237
238\subsection{String Literals}
239\label{sec:strlit}
240
241% The string literal is not documented in the original QBE specification.
242% See https://c9x.me/git/qbe.git/tree/parse.c?h=v1.2#n287
243
244\begin{code}
245strLit :: Parser String
246strLit = concat <$> quoted (many strChr)
247  where
248    strChr :: Parser [Char]
249    strChr = (singleton <$> noneOf "\"\\") <|> escSeq
250
251    escSeq :: Parser [Char]
252    escSeq = try $ do
253      esc <- char '\\'
254      chr <- anyChar
255      return $ [esc, chr]
256\end{code}
257
258Strings are enclosed by double quotes and are, for example, used to specify a
259section name as part of the \nameref{sec:linkage} information. Within a string,
260a double quote can be escaped using a \texttt{\textbackslash} character. All
261escape sequences, including double quote escaping, are passed through as-is to
262the generated assembly file.
263
264\section{Types}
265
266\subsection{Simple Types}
267
268The IL makes minimal use of types. By design, the types used are
269restricted to what is necessary for unambiguous compilation to machine
270code and C interfacing. Unlike LLVM, QBE is not using types as a means
271to safety; they are only here for semantic purposes.
272
273\begin{code}
274baseType :: Parser Q.BaseType
275baseType = choice
276  [ bind "w" Q.Word
277  , bind "l" Q.Long
278  , bind "s" Q.Single
279  , bind "d" Q.Double ]
280\end{code}
281
282The four base types are \texttt{w} (word), \texttt{l} (long), \texttt{s} (single), and \texttt{d}
283(double), they stand respectively for 32-bit and 64-bit integers, and
28432-bit and 64-bit floating-point numbers. There are no pointer types
285available; pointers are typed by an integer type sufficiently wide to
286represent all memory addresses (e.g., \texttt{l} on 64-bit architectures).
287Temporaries in the IL can only have a base type.
288
289\begin{code}
290extType :: Parser Q.ExtType
291extType = (Q.Base <$> baseType)
292       <|> bind "b" Q.Byte
293       <|> bind "h" Q.HalfWord
294\end{code}
295
296Extended types contain base types plus \texttt{b} (byte) and \texttt{h} (half word),
297respectively for 8-bit and 16-bit integers. They are used in \nameref{sec:aggregate-types}
298and \nameref{sec:data} definitions.
299
300For C interfacing, the IL also provides user-defined aggregate types as
301well as signed and unsigned variants of the sub-word extended types.
302Read more about these types in the \nameref{sec:aggregate-types}
303and \nameref{sec:functions} sections.
304
305\subsection{Subtyping}
306
307The IL has a minimal subtyping feature, for integer types only. Any
308value of type \texttt{l} can be used in a \texttt{w} context. In that case, only the
30932 least significant bits of the word value are used.
310
311Make note that it is the opposite of the usual subtyping on integers (in
312C, we can safely use an \texttt{int} where a \texttt{long} is expected). A long value
313cannot be used in word context. The rationale is that a word can be
314signed or unsigned, so extending it to a long could be done in two ways,
315either by zero-extension, or by sign-extension.
316
317\subsection{Constants and Vals}
318\label{sec:constants-and-vals}
319
320\begin{code}
321dynConst :: Parser Q.DynConst
322dynConst =
323  (Q.Const <$> constant)
324    <|> (Q.Thread <$> global)
325    <?> "dynconst"
326\end{code}
327
328Constants come in two kinds: compile-time constants and dynamic
329constants. Dynamic constants include compile-time constants and other
330symbol variants that are only known at program-load time or execution
331time. Consequently, dynamic constants can only occur in function bodies.
332
333The representation of integers is two's complement.
334Floating-point numbers are represented using the single-precision and
335double-precision formats of the IEEE 754 standard.
336
337\begin{code}
338constant :: Parser Q.Const
339constant =
340  (Q.Number <$> decNumber)
341    <|> (Q.SFP <$> sfp)
342    <|> (Q.DFP <$> dfp)
343    <|> (Q.Global <$> global)
344    <?> "const"
345  where
346    sfp = string "s_" >> float
347    dfp = string "d_" >> float
348\end{code}
349
350Constants specify a sequence of bits and are untyped. They are always
351parsed as 64-bit blobs. Depending on the context surrounding a constant,
352only some of its bits are used. For example, in the program below, the
353two variables defined have the same value since the first operand of the
354subtraction is a word (32-bit) context.
355
356\begin{verbatim}
357%x =w sub -1, 0 %y =w sub 4294967295, 0
358\end{verbatim}
359
360Because specifying floating-point constants by their bits makes the code
361less readable, syntactic sugar is provided to express them. Standard
362scientific notation is prefixed with \texttt{s\_} and \texttt{d\_} for single and
363double precision numbers respectively. Once again, the following example
364defines twice the same double-precision constant.
365
366\begin{verbatim}
367%x =d add d_0, d_-1
368%y =d add d_0, -4616189618054758400
369\end{verbatim}
370
371Global symbols can also be used directly as constants; they will be
372resolved and turned into actual numeric constants by the linker.
373
374When the \texttt{thread} keyword prefixes a symbol name, the
375symbol\textquotesingle s numeric value is resolved at runtime in the
376thread-local storage.
377
378\begin{code}
379val :: Parser Q.Value
380val =
381  (Q.VConst <$> dynConst)
382    <|> (Q.VLocal <$> local)
383    <?> "val"
384\end{code}
385
386Vals are used as arguments in regular, phi, and jump instructions within
387function definitions. They are either constants or function-scope
388temporaries.
389
390\subsection{Linkage}
391\label{sec:linkage}
392
393\begin{code}
394linkage :: Parser Q.Linkage
395linkage =
396  wsNL (bind "export" Q.LExport)
397    <|> wsNL (bind "thread" Q.LThread)
398    <|> do
399      _ <- ws1 $ string "section"
400      (try secWithFlags) <|> sec
401  where
402    sec :: Parser Q.Linkage
403    sec = wsNL strLit <&> (`Q.LSection` Nothing)
404
405    secWithFlags :: Parser Q.Linkage
406    secWithFlags = do
407      n <- ws1 strLit
408      wsNL strLit <&> Q.LSection n . Just
409\end{code}
410
411Function and data definitions (see below) can specify linkage
412information to be passed to the assembler and eventually to the linker.
413
414The \texttt{export} linkage flag marks the defined item as visible outside the
415current file\textquotesingle s scope. If absent, the symbol can only be
416referred to locally. Functions compiled by QBE and called from C need to
417be exported.
418
419The \texttt{thread} linkage flag can only qualify data definitions. It mandates
420that the object defined is stored in thread-local storage. Each time a
421runtime thread starts, the supporting platform runtime is in charge of
422making a new copy of the object for the fresh thread. Objects in
423thread-local storage must be accessed using the \texttt{thread \$IDENT} syntax,
424as specified in the \nameref{sec:constants-and-vals} section.
425
426A \texttt{section} flag can be specified to tell the linker to put the defined
427item in a certain section. The use of the section flag is platform
428dependent and we refer the user to the documentation of their assembler
429and linker for relevant information.
430
431\begin{verbatim}
432section ".init_array" data $.init.f = { l $f }
433\end{verbatim}
434
435The section flag can be used to add function pointers to a global
436initialization list, as depicted above. Note that some platforms provide
437a BSS section that can be used to minimize the footprint of uniformly
438zeroed data. When this section is available, QBE will automatically make
439use of it and no section flag is required.
440
441The section and export linkage flags should each appear at most once in
442a definition. If multiple occurrences are present, QBE is free to use
443any.
444
445\subsection{Definitions}
446\label{sec:definitions}
447
448Definitions are the essential components of an IL file. They can define
449three types of objects: aggregate types, data, and functions. Aggregate
450types are never exported and do not compile to any code. Data and
451function definitions have file scope and are mutually recursive (even
452across IL files). Their visibility can be controlled using linkage
453flags.
454
455\subsubsection{Aggregate Types}
456\label{sec:aggregate-types}
457
458\begin{code}
459typeDef :: Parser Q.TypeDef
460typeDef = do
461  _ <- wsNL1 (string "type")
462  i <- wsNL1 userDef
463  _ <- wsNL1 (char '=')
464  a <- optionMaybe alignSpec
465  bracesNL (opaqueType <|> unionType <|> regularType) <&> Q.TypeDef i a
466\end{code}
467
468Aggregate type definitions start with the \texttt{type} keyword. They have file
469scope, but types must be defined before being referenced. The inner
470structure of a type is expressed by a comma-separated list of fields.
471
472\begin{code}
473subType :: Parser Q.SubType
474subType =
475  (Q.SExtType <$> extType)
476    <|> (Q.SUserDef <$> userDef)
477
478field :: Parser Q.Field
479field = do
480  -- TODO: newline is required if there is a number argument
481  f <- wsNL subType
482  s <- ws $ optionMaybe decNumber
483  pure (f, s)
484
485-- TODO: "For ease of IL generation, a trailing comma is tolerated by the parser".
486fields :: Bool -> Parser [Q.Field]
487fields allowEmpty =
488  (if allowEmpty then sepBy else sepBy1) field (wsNL $ char ',')
489\end{code}
490
491A field consists of a subtype, either an extended type or a user-defined type,
492and an optional number expressing the value of this field. In case many items
493of the same type are sequenced (like in a C array), the shorter array syntax
494can be used.
495
496\begin{code}
497regularType :: Parser Q.AggType
498regularType = Q.ARegular <$> fields True
499\end{code}
500
501Three different kinds of aggregate types are presentl ysupported: regular
502types, union types and opaque types. The fields of regular types will be
503packed. By default, the alignment of an aggregate type is the maximum alignment
504of its members. The alignment can be explicitly specified by the programmer.
505
506\begin{code}
507unionType :: Parser Q.AggType
508unionType = Q.AUnion <$> many1 (wsNL unionType')
509  where
510    unionType' :: Parser [Q.Field]
511    unionType' = bracesNL $ fields False
512\end{code}
513
514Union types allow the same chunk of memory to be used with different layouts. They are defined by enclosing multiple regular aggregate type bodies in a pair of curly braces. Size and alignment of union types are set to the maximum size and alignment of each variation or, in the case of alignment, can be explicitly specified.
515
516\begin{code}
517opaqueType :: Parser Q.AggType
518opaqueType = Q.AOpaque <$> wsNL decNumber
519\end{code}
520
521Opaque types are used when the inner structure of an aggregate cannot be specified; the alignment for opaque types is mandatory. They are defined simply by enclosing their size between curly braces.
522
523\subsubsection{Data}
524\label{sec:data}
525
526\begin{code}
527dataDef :: Parser Q.DataDef
528dataDef = do
529  link <- many linkage
530  name <- wsNL1 (string "data") >> wsNL global
531  _ <- wsNL (char '=')
532  alignment <- optionMaybe alignSpec
533  bracesNL dataObjs <&> Q.DataDef link name alignment
534 where
535    dataObjs = sepBy dataObj (wsNL $ char ',')
536\end{code}
537
538Data definitions express objects that will be emitted in the compiled
539file. Their visibility and location in the compiled artifact are
540controlled with linkage flags described in the \nameref{sec:linkage}
541section.
542
543They define a global identifier (starting with the sigil \texttt{\$}), that
544will contain a pointer to the object specified by the definition.
545
546\begin{code}
547dataObj :: Parser Q.DataObj
548dataObj =
549  (Q.OZeroFill <$> (wsNL1 (char 'z') >> wsNL decNumber))
550    <|> do
551      t <- wsNL1 extType
552      i <- many1 (wsNL dataItem)
553      return $ Q.OItem t i
554\end{code}
555
556Objects are described by a sequence of fields that start with a type
557letter. This letter can either be an extended type, or the \texttt{z} letter.
558If the letter used is an extended type, the data item following
559specifies the bits to be stored in the field.
560
561\begin{code}
562dataItem :: Parser Q.DataItem
563dataItem =
564  (Q.DString <$> strLit)
565    <|> (Q.DConst <$> constant)
566    <|> do
567      i <- global
568      off <- optionMaybe (char '+' >> decNumber)
569      return $ Q.DSymbol i off
570\end{code}
571
572Within each object, several items can be defined. When several data items
573follow a letter, they initialize multiple fields of the same size.
574
575\begin{code}
576allocSize :: Parser Q.AllocSize
577allocSize =
578  choice
579    [ bind "4" Q.AlignWord,
580      bind "8" Q.AlignLong,
581      bind "16" Q.AlignLongLong
582    ]
583\end{code}
584
585The members of a struct will be packed. This means that padding has to
586be emitted by the frontend when necessary. Alignment of the whole data
587objects can be manually specified, and when no alignment is provided,
588the maximum alignment from the platform is used.
589
590When the \texttt{z} letter is used the number following indicates the size of
591the field; the contents of the field are zero initialized. It can be
592used to add padding between fields or zero-initialize big arrays.
593
594\subsubsection{Functions}
595\label{sec:functions}
596
597\begin{code}
598funcDef :: Parser Q.FuncDef
599funcDef = do
600  link <- many linkage
601  _ <- ws1 (string "function")
602  retTy <- optionMaybe (ws1 abity)
603  name <- ws global
604  args <- wsNL params
605  body <- between (wsNL1 $ char '{') (wsNL $ char '}') $ many1 block
606  return $ Q.FuncDef link name retTy args body
607\end{code}
608
609Function definitions contain the actual code to emit in the compiled
610file. They define a global symbol that contains a pointer to the
611function code. This pointer can be used in \texttt{call} instructions or stored
612in memory.
613
614\begin{code}
615subWordType :: Parser Q.SubWordType
616subWordType = choice
617  [ try $ bind "sb" Q.SignedByte
618  , try $ bind "ub" Q.UnsignedByte
619  , bind "sh" Q.SignedHalf
620  , bind "uh" Q.UnsignedHalf ]
621
622abity :: Parser Q.Abity
623abity = try (Q.ASubWordType <$> subWordType)
624    <|> (Q.ABase <$> baseType)
625    <|> (Q.AUserDef <$> userDef)
626\end{code}
627
628The type given right before the function name is the return type of the
629function. All return values of this function must have this return type.
630If the return type is missing, the function must not return any value.
631
632\begin{code}
633param :: Parser Q.FuncParam
634param = (Q.Env <$> (ws1 (string "env") >> local))
635    <|> (string "..." >> pure Q.Variadic)
636    <|> do
637          ty <- ws1 abity
638          Q.Regular ty <$> local
639
640params :: Parser [Q.FuncParam]
641params = between (ws $ char '(') (char ')') params'
642  where
643    params' = sepBy (ws param) (ws $ char ',')
644\end{code}
645
646The parameter list is a comma separated list of temporary names prefixed
647by types. The types are used to correctly implement C compatibility.
648When an argument has an aggregate type, a pointer to the aggregate is
649passed by thea caller. In the example below, we have to use a load
650instruction to get the value of the first (and only) member of the
651struct.
652
653\begin{verbatim}
654type :one = { w }
655
656function w $getone(:one %p) {
657@start
658        %val =w loadw %p
659        ret %val
660}
661\end{verbatim}
662
663If a function accepts or returns values that are smaller than a word,
664such as \texttt{signed char} or \texttt{unsigned short} in C, one of the sub-word type
665must be used. The sub-word types \texttt{sb}, \texttt{ub}, \texttt{sh}, and \texttt{uh} stand,
666respectively, for signed and unsigned 8-bit values, and signed and
667unsigned 16-bit values. Parameters associated with a sub-word type of
668bit width N only have their N least significant bits set and have base
669type \texttt{w}. For example, the function
670
671\begin{verbatim}
672function w $addbyte(w %a, sb %b) {
673@start
674        %bw =w extsb %b
675        %val =w add %a, %bw
676        ret %val
677}
678\end{verbatim}
679
680needs to sign-extend its second argument before the addition. Dually,
681return values with sub-word types do not need to be sign or zero
682extended.
683
684If the parameter list ends with \texttt{...}, the function is a variadic
685function: it can accept a variable number of arguments. To access the
686extra arguments provided by the caller, use the \texttt{vastart} and \texttt{vaarg}
687instructions described in the \nameref{sec:variadic} section.
688
689Optionally, the parameter list can start with an environment parameter
690\texttt{env \%e}. This special parameter is a 64-bit integer temporary (i.e.,
691of type \texttt{l}). If the function does not use its environment parameter,
692callers can safely omit it. This parameter is invisible to a C caller:
693for example, the function
694
695\begin{verbatim}
696export function w $add(env %e, w %a, w %b) {
697@start
698        %c =w add %a, %b
699        ret %c
700}
701\end{verbatim}
702
703must be given the C prototype \texttt{int add(int, int)}. The intended use of
704this feature is to pass the environment pointer of closures while
705retaining a very good compatibility with C. The \nameref{sec:call}
706section explains how to pass an environment parameter.
707
708Since global symbols are defined mutually recursive, there is no need
709for function declarations: a function can be referenced before its
710definition. Similarly, functions from other modules can be used without
711previous declaration. All the type information necessary to compile a
712call is in the instruction itself.
713
714The syntax and semantics for the body of functions are described in the
715\nameref{sec:control} section.
716
717\section{Control}
718\label{sec:control}
719
720The IL represents programs as textual transcriptions of control flow
721graphs. The control flow is serialized as a sequence of blocks of
722straight-line code which are connected using jump instructions.
723
724\subsection{Blocks}
725\label{sec:blocks}
726
727\begin{code}
728block :: Parser Q.Block
729block = do
730  l <- wsNL1 label
731  s <- many (wsNL1 statement)
732  Q.Block l s <$> (wsNL1 jumpInstr)
733\end{code}
734
735All blocks have a name that is specified by a label at their beginning.
736Then follows a sequence of instructions that have "fall-through" flow.
737Finally one jump terminates the block. The jump can either transfer
738control to another block of the same function or return; jumps are
739described further below.
740
741The first block in a function must not be the target of any jump in the
742program. If a jump to the function start is needed, the frontend must
743insert an empty prelude block at the beginning of the function.
744
745When one block jumps to the next block in the IL file, it is not
746necessary to write the jump instruction, it will be automatically added
747by the parser. For example the start block in the example below jumps
748directly to the loop block.
749
750\subsection{Jumps}
751
752\begin{code}
753jumpInstr :: Parser Q.JumpInstr
754jumpInstr = (string "hlt" >> pure Q.Halt)
755        -- TODO: Return requires a space if there is an optionMaybe
756        <|> Q.Return <$> ((ws $ string "ret") >> optionMaybe val)
757        <|> try (Q.Jump <$> ((ws1 $ string "jmp") >> label))
758        <|> do
759          _ <- ws1 $ string "jnz"
760          v <- ws val <* ws (char ',')
761          l1 <- ws label <* ws (char ',')
762          l2 <- ws label
763          return $ Q.Jnz v l1 l2
764\end{code}
765
766A jump instruction ends every block and transfers the control to another
767program location. The target of a jump must never be the first block in
768a function. The three kinds of jumps available are described in the
769following list.
770
771\begin{enumerate}
772  \item \textbf{Unconditional jump.} Jumps to another block of the same function.
773  \item \textbf{Conditional jump.} When its word argument is non-zero, it jumps to its first label argument; otherwise it jumps to the other label. The argument must be of word type; because of subtyping a long argument can be passed, but only its least significant 32 bits will be compared to 0.
774  \item \textbf{Function return.} Terminates the execution of the current function, optionally returning a value to the caller. The value returned must be of the type given in the function prototype. If the function prototype does not specify a return type, no return value can be used.
775  \item \textbf{Program termination.} Terminates the execution of the program with a target-dependent error. This instruction can be used when it is expected that the execution never reaches the end of the block it closes; for example, after having called a function such as \texttt{exit()}.
776\end{enumerate}
777
778\section{Instructions}
779\label{sec:instructions}
780
781\begin{code}
782instr :: Parser Q.Instr
783instr =
784  choice
785    [ try $ binaryInstr Q.Add "add",
786      try $ binaryInstr Q.Sub "sub",
787      try $ loadInstr,
788      try $ allocInstr
789    ]
790\end{code}
791
792Instructions are the smallest piece of code in the IL, they form the body of
793\nameref{sec:blocks}. This specification distinguishes instructions and
794volatile instructions, the latter do not return a value. For the former, the IL
795uses a three-address code, which means that one instruction computes an
796operation between two operands and assigns the result to a third one.
797
798\begin{code}
799assign :: Parser Q.Statement
800assign = do
801  n <- ws local
802  t <- ws (char '=') >> ws1 baseType
803  Q.Assign n t <$> instr
804
805volatileInstr :: Parser Q.Statement
806volatileInstr = Q.Volatile <$> (storeInstr <|> blitInstr)
807
808-- TODO: Not documented in the QBE BNF.
809statement :: Parser Q.Statement
810statement = (try callInstr) <|> assign <|> volatileInstr
811\end{code}
812
813An instruction has both a name and a return type, this return type is a base
814type that defines the size of the instruction's result. The type of the
815arguments can be unambiguously inferred using the instruction name and the
816return type. For example, for all arithmetic instructions, the type of the
817arguments is the same as the return type. The two additions below are valid if
818\texttt{\%y} is a word or a long (because of \nameref{sec:subtyping}).
819
820\begin{verbatim}
821%x =w add 0, %y
822%z =w add %x, %x
823\end{verbatim}
824
825Some instructions, like comparisons and memory loads have operand types
826that differ from their return types. For instance, two floating points
827can be compared to give a word result (0 if the comparison succeeds, 1
828if it fails).
829
830\begin{verbatim}
831%c =w cgts %a, %b
832\end{verbatim}
833
834In the example above, both operands have to have single type. This is
835made explicit by the instruction suffix.
836
837\subsection{Arithmetic and Bits}
838
839To-Do.
840
841\subsection{Memory}
842
843\subsubsection{Store instructions}
844
845\begin{code}
846storeInstr :: Parser Q.VolatileInstr
847storeInstr = do
848  t <- string "store" >> ws1 extType
849  v <- ws val
850  _ <- ws $ char ','
851  ws val <&> Q.Store t v
852\end{code}
853
854Store instructions exist to store a value of any base type and any extended
855type. Since halfwords and bytes are not first class in the IL, \texttt{storeh}
856and \texttt{storeb} take a word as argument. Only the first 16 or 8 bits of
857this word will be stored in memory at the address specified in the second
858argument.
859
860\subsubsection{Load instructions}
861
862\begin{code}
863loadInstr :: Parser Q.Instr
864loadInstr = do
865  _ <- string "load"
866  t <- ws1 $ choice
867    [ try $ bind "sw" (Q.LBase Q.Word),
868      try $ bind "uw" (Q.LBase Q.Word),
869      try $ Q.LSubWord <$> subWordType,
870      Q.LBase <$> baseType
871    ]
872  ws val <&> Q.Load t
873\end{code}
874
875For types smaller than long, two variants of the load instruction are
876available: one will sign extend the loaded value, while the other will zero
877extend it. Note that all loads smaller than long can load to either a long or a
878word.
879
880The two instructions \texttt{loadsw} and \texttt{loaduw} have the same effect
881when they are used to define a word temporary. A \texttt{loadw} instruction is
882provided as syntactic sugar for \texttt{loadsw} to make explicit that the
883extension mechanism used is irrelevant.
884
885\subsubsection{Blits}
886
887\begin{code}
888blitInstr :: Parser Q.VolatileInstr
889blitInstr = do
890  v1 <- (ws1 $ string "blit") >> ws val <* (ws $ char ',')
891  v2 <- ws val <* (ws $ char ',')
892  nb <- decNumber
893  return $ Q.Blit v1 v2 nb
894\end{code}
895
896The blit instruction copies in-memory data from its first address argument to
897its second address argument. The third argument is the number of bytes to copy.
898The source and destination spans are required to be either non-overlapping, or
899fully overlapping (source address identical to the destination address). The
900byte count argument must be a nonnegative numeric constant; it cannot be a
901temporary.
902
903One blit instruction may generate a number of instructions proportional to its
904byte count argument, consequently, it is recommended to keep this argument
905relatively small. If large copies are necessary, it is preferable that
906frontends generate calls to a supporting \texttt{memcpy} function.
907
908\subsubsection{Stack Allocation}
909
910\begin{code}
911allocInstr :: Parser Q.Instr
912allocInstr = do
913  siz <- (ws $ string "alloc") >> (ws1 allocSize)
914  decNumber <&> Q.Alloc siz
915\end{code}
916
917These instructions allocate a chunk of memory on the stack. The number ending
918the instruction name is the alignment required for the allocated slot. QBE will
919make sure that the returned address is a multiple of that alignment value.
920
921Stack allocation instructions are used, for example, when compiling the C local
922variables, because their address can be taken. When compiling Fortran,
923temporaries can be used directly instead, because it is illegal to take the
924address of a variable.
925
926\subsection{Comparisons}
927
928To-Do.
929
930\subsection{Conversions}
931
932To-Do.
933
934\subsection{Cast and Copy}
935
936To-Do.
937
938\subsection{Call}
939\label{sec:call}
940
941\begin{code}
942callInstr :: Parser Q.Statement
943callInstr = do
944  retValue <- optionMaybe $ do
945    i <- ws local <* ws (char '=')
946    a <- ws1 abity
947    return (i, a)
948  toCall <- ws1 (string "call") >> ws val
949  fnArgs <- params
950  return $ Q.Call retValue toCall fnArgs
951\end{code}
952
953To-Do.
954
955\subsection{Variadic}
956\label{sec:variadic}
957
958To-Do.
959
960\subsection{Phi}
961
962To-Do.
963
964\end{document}