Advanced Compiler Design and Implementation, by Steven S. Muchnick.
From the Foreword by Susan L. Graham: This book takes on the challenges of contemporary languages and architectures, and prepares the reader for the new compiling problems that will inevitably arise in the future.

The definitive book on advanced compiler design. This comprehensive, up-to-date work examines advanced issues in the design and implementation of compilers for modern processors. Written for professionals and graduate students, the book guides readers in designing and implementing efficient structures for highly optimizing compilers for real-world languages. Covering advanced issues in fundamental areas of compiler design, this book discusses a wide array of possible code optimizations, determining the relative importance of optimizations, and selecting the most effective methods of implementation.
Parallelization and vectorization and their relationship to scalar optimization are not covered because they would require a lot more space and because there are already several good texts on parallelization and vectorization, for example, those by Wolfe, Banerjee, and Zima and Chapman.
However, the technique of dependence analysis covered in Chapter 9 and the loop transformations discussed in Section ...

Profiling feedback to the compilation process is important and is referred to several times in the remainder of the book. A good introduction to the techniques, interpretation of their results, and their application in compilers can be found in the work of Ball and Larus and of Wall, along with references to previous work.
Target Machines Used in Examples

Most of our examples of target-machine code are in SPARC assembly language. We use a simplified version that does not have register windows. Occasionally there are examples for SPARC-V9 or in other assembly languages, such as for POWER or the Intel architecture family. In all cases, the assembly languages are described well enough in Appendix A to enable one to read the examples.

Number Notations and Data Sizes

The terms byte and word are usually reserved for the natural sizes of a character and a register datum on a particular system.
We are concerned primarily with 32-bit systems with 8-bit bytes and with 64-bit systems designed as extensions of them. Almost all the numbers in this book are in decimal notation and are written in the ordinary way. We occasionally use hexadecimal notation, however. An integer represented in hexadecimal is written as 0x followed by a string of hexadecimal digits (namely, 0-9 and either a-f or A-F) and is always to be interpreted as an unsigned number unless it is specifically indicated to represent a signed value of length equal to the number of the leftmost one bit, counting the rightmost as number one.
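The hexadecimal convention above can be illustrated with a short sketch. Python is not used in the book; the function name as_signed and the choice of two's-complement interpretation are assumptions made here for illustration only.

```python
def as_signed(value: int) -> int:
    """Reinterpret a nonnegative integer as a signed (two's-complement,
    by assumption) value whose width is the position of its leftmost
    one bit, counting the rightmost bit as number one."""
    if value < 0:
        raise ValueError("expected an unsigned (nonnegative) value")
    width = value.bit_length()  # position of the leftmost one bit
    if width == 0:
        return 0  # no one bits at all, so the value is zero
    # The leftmost bit of the width-bit field is set, so it acts as the
    # sign bit and the signed reading is always negative.
    return value - (1 << width)


unsigned_reading = 0x1A          # default reading: unsigned
signed_reading = as_signed(0x1A)  # 5-bit signed reading of binary 11010
```

Under these assumptions, the same digit string denotes 26 when read as unsigned and a negative number when read as a 5-bit signed value, which is why the text requires the signed reading to be indicated explicitly.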
Wrap-Up

In this chapter we have concentrated on reviewing some of the basic aspects of compilers and on providing a setting for launching into the chapters that follow. After discussing what to expect in Chapters 3 through 6, which concern advanced aspects of what are generally considered elementary topics, we next described the importance of optimization; structures for optimizing compilers, including the mixed and low-level models; and the organization of optimizations in an aggressive optimizing compiler. Next we discussed possible orderings for reading the remaining chapters, and concluded with a list of related topics not covered here, and short sections on target machines used in examples and on notations for numbers and the names we use for various data sizes.
The primary lessons to take away from this chapter are five in number.

Unlike the history of programming languages, for which there are two excellent books available, namely, [Wexe81] and [BerG95], there is very little published material on the history of compilers. A history of the very earliest stages of compiler development is given in Knuth [Knut62]; some more recent material is included in the two volumes noted above, i.e., [Wexe81] and [BerG95]. Among the better recent introductory texts on compiler construction are [AhoS86] and [FisL91]. Wismüller's [Wism94] is a reference to another thread of work in this area. The work of Ball and Larus on profiling is covered in [BalL92]. Wall's work in [Wall91] concerns the effect of feedback from profiling on recompilation and the resulting performance effects.

Determine and describe the large-scale structure and intermediate codes of a compiler in use in your computing environment. What sections of the compiler are optionally executed under user control?

First we discuss the extended Backus-Naur form that is used to express the syntax of both ICAN and the intermediate languages discussed in the following chapter.
Next we provide an introduction to the language and its relationship to common programming languages, an informal overview of the language, and then a formal description of the syntax of ICAN and an informal English description of its semantics. It is hoped that, in general, the informal overview will be sufficient for the reader to understand ICAN programs, but the full definition is provided to deal with those instances where it is not.

In XBNF, terminals are written in typewriter font. A production consists of a nonterminal followed by a long right arrow and a sequence of nonterminals, terminals, and operators. The symbol ε represents the empty string of characters. The operators are listed in Table 2. Note that the XBNF operators are written in our ordinary text font. When the same symbols appear in typewriter font, they are terminal symbols in the language being defined. The operators have the following meanings: separates alternatives; grouping; optional; zero or more repetitions; one or more repetitions; and one or more repetitions of the left operand separated by occurrences of the right operand.
The first line describes an ArrayTypeExpr as the keyword array, followed by a left bracket [, followed by an occurrence of something that conforms to the syntax of ArrayBounds, followed by a right bracket ], followed by the keyword of, followed by an occurrence of something conforming to the syntax of TypeExpr. The second line describes ArrayBounds as a series of one or more triples of the form of an optional Expr, followed by a range separator, followed by an optional Expr, with the triples separated by commas. The following are examples of ArrayTypeExprs:

Introduction to ICAN

Algorithms in this text are written in a relatively transparent, informal notation called ICAN (Informal Compiler Algorithm Notation). One measure of the informality of ICAN is that many facets of the language that are considered to be errors, such as accessing an array with an out-of-range subscript, have their effects undefined.

[Figure: A sample ICAN global declaration and procedure (the line numbers at left are not part of the code).]

The syntax of ICAN is designed so that every variety of compound statement includes an ending delimiter, such as fi to end an if statement. As a result, separators are not needed between statements. However, as a convention to improve readability, when two or more statements are written on the same line we separate them with semicolons (Figure 2.). Similarly, if a definition, declaration, or statement extends beyond a single line, the continuation lines are indented (Figure 2.). A comment begins with the delimiter || and runs to the end of the line (Figure 2.).

Lexically, an ICAN program is a sequence of ASCII characters. Tabs, comments, line ends, and sequences of one or more spaces are called whitespace. Each occurrence of whitespace may be turned into a single space without affecting the meaning of a program. Keywords are preceded and followed by whitespace, but operators need not be. Lexical analysis proceeds left to right, and characters are accumulated to form tokens that are as long as they can be.

A Quick Overview of ICAN

In this section we give a quick overview of ICAN, which should be sufficient for the reader to begin reading and understanding programs in the text.
The following sections define the syntax of the language formally and the semantics informally. An ICAN program consists of a series of type definitions, followed by a series of variable declarations, followed by a series of procedure declarations, followed by an optional main program.

Types may be either generic or compiler-specific, and either simple or constructed. The generic simple types are boolean, integer, real, and character. The type constructors are listed in the following table:

A variable declaration consists of the name of the variable, followed by an optional initialization, followed by a colon and the variable's type. A procedure declaration consists of the procedure's name, followed by its parameter list in parentheses, followed by an optional return type, followed by its parameter declarations and its body. A parameter declaration consists of a comma-separated sequence of variable names; followed by a colon; one of in (call by value), out (call by result), or inout (call by value-result); and the type of the parameters. A procedure body consists of the keyword begin, followed by a series of variable declarations, followed by a series of statements, followed by the keyword end.

An expression is either a constant, a variable, nil, a unary operator followed by an expression, two expressions separated by a binary operator, a parenthesized expression, an array expression, a sequence expression, a set expression, a tuple expression, a record expression, a procedure or function call, an array element, a tuple element, a record field, a size expression, or a quantified expression. The operands and operators must be of compatible types. The operators appropriate to specific types may be found in Section 2. A few of the less obvious ones are discussed below. The following are examples of constants of constructed types:

The integer and real types include ∞ and -∞. The empty set is denoted ∅ and the empty sequence []. The value nil is a member of every type and is the value of any uninitialized variable. The unary selection operator, when applied to a set, yields an arbitrary member of the set.
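The set-selection operator and the role of nil can be mimicked in a short sketch. This is Python, not ICAN; the name arbitrary_member is hypothetical, None stands in for nil, and returning None for the empty set is an assumption, since the text does not say what selection from an empty set yields.

```python
from typing import Optional, Set, TypeVar

T = TypeVar("T")


def arbitrary_member(s: Set[T]) -> Optional[T]:
    """Return some member of s without removing it.

    Which member is returned is unspecified; callers must not rely on
    any particular choice. For the empty set this sketch returns None
    (standing in for nil), which is an assumption of the sketch.
    """
    for member in s:
        return member
    return None


worklist = {3, 5, 7}
chosen = arbitrary_member(worklist)
```

An algorithm written against such an operator should be correct for every possible choice of member, which is what makes the notation convenient for describing worklist-style iteration.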
Compiler-specific types are defined as needed. Statements include assignments, calls, returns, gotos, ifs, cases; and for, while, and repeat loops. The basic assignment operator is :=. As in C, the colon may be replaced by any binary operator whose left-hand operand and result have the same type; for example, the following assignments both do the same thing:

Each compound statement has an ending delimiter. The beginning, internal, and ending delimiters of the compound statements are as follows:

Case labels are also internal delimiters in case statements. All keywords used in the language are reserved; they may not be used as identifiers. They are listed in Table 2.

The following sections describe ICAN in detail.

Whole Programs

An ICAN program consists of a series of type definitions, followed by a series of variable declarations, followed by a series of procedure declarations, followed by an optional main program. The main program has the form of a procedure body.
The syntax of ICAN programs is given in Table 2.

Type Definitions

A type definition consists of one or more pairs of a type name followed by an equals sign followed by the definition (Figure 2.). The syntax of type definitions is given in Table 2. Type definitions may be recursive. The type defined by a recursive type definition is the smallest set that satisfies the definition.

Declarations

The syntax of ICAN variable and procedure declarations is given in Table 2. The syntax includes the nonterminal ConstExpr, which is not defined in the grammar. It denotes an expression none of whose components is a variable. A variable declaration consists of the name of the identifier being declared followed by an optional initialization, followed by a colon and its type (Figure 2.). An array's dimensions are part of its type and are specified by placing a list of the ranges of values of each of the subscripts in square brackets. An initial value for an identifier is specified by following it with the assignment operator := and the initial value. Several identifiers of the same type may be declared in a single declaration by separating their names and optional initializations with commas (Figure 2.).

A procedure is declared by the keyword procedure followed by its name, followed by a series of parameters in parentheses, followed by an optional return type, followed by a series of indented lines that declare the types of the parameters (Figure 2.). The return type consists of the keyword returns followed by a type expression (Figure 2.). Parameters are declared in (call by value), out (call by result), or inout (call by value-result) (see Figure 2.). A procedure's text is indented between the keywords begin and end (Figure 2.). A type definition or variable declaration may be made global to a group of procedures by placing it before their declarations (Figure 2.).
Data Types and Expressions

The syntax of ICAN expressions and generic simple constants is given in Tables 2. A type corresponds to the set of its members. It may be either simple or constructed. Also, a type may be generic or compiler-specific. A constructed type is defined by using one or more type constructors. The type constructors include enum and array. An expression is either a constant, a variable, nil, a unary operator followed by an expression, two expressions separated by a binary operator, a parenthesized expression, an array expression, a sequence expression, a set expression, a tuple expression, a record expression, a procedure or function call, an array element, a quantified expression, or a size expression. The operands and operators must be of compatible types, as described below.

Generic Simple Types

The Boolean values are true and false. The following binary operators apply to Booleans: The prefix unary operator negation (!) applies to Booleans.

A quantified expression is Boolean-valued. It consists of the symbol ∃ or ∀ followed by a variable, followed by ∈, followed by a type- or set-valued expression, followed by a parenthesized Boolean-valued expression. For example, ∃v ∈ Var (Opnd(inst,v)) is a quantified expression that evaluates to true if and only if there is some variable v such that Opnd(inst,v) is true.
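Quantified expressions over finite sets map directly onto iteration. The following Python sketch is only an analogy: the helper names exists and for_all, the representation of an instruction as a tuple, and the sample data are all hypothetical and do not come from the book.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def exists(domain: Iterable[T], predicate: Callable[[T], bool]) -> bool:
    """True if and only if some element of domain satisfies predicate."""
    return any(predicate(v) for v in domain)


def for_all(domain: Iterable[T], predicate: Callable[[T], bool]) -> bool:
    """True if and only if every element of domain satisfies predicate."""
    return all(predicate(v) for v in domain)


# Hypothetical data: an instruction as (opcode, operand, operand).
Var = {"a", "b", "c"}
inst = ("add", "a", "b")


def opnd(instruction: tuple, v: str) -> bool:
    """Hypothetical stand-in for Opnd(inst, v): is v an operand?"""
    return v in instruction[1:]


some_var_is_operand = exists(Var, lambda v: opnd(inst, v))
every_var_is_operand = for_all(Var, lambda v: opnd(inst, v))
```

Note that this only works when the quantified domain is finite and enumerable; a quantifier ranging over an infinite type has no such direct executable reading.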
An integer value is either 0, an optional minus sign followed by a series of one or more decimal digits the first of which is nonzero, ∞, or -∞. A real value is an integer followed by a period, followed by an integer (either but not both of the integers may be absent), followed by an optional exponent, ∞, or -∞. The following binary operators apply to finite integers and reals:

The prefix unary operator negation applies to finite integers and reals. Only the relational operators apply to infinite values. A character value is an allowed ASCII character enclosed in single quotation marks. The allowed ASCII characters (represented by the otherwise undefined nonterminal ASCIICharacter in the syntax) are all the printing ASCII characters, space, tab, and carriage return. Several of the characters require escape sequences to represent them.

Enumerated Types

An enumerated type is a non-empty finite set of identifiers. A variable var is declared to be of an enumerated type by a declaration of the form var: The following example declares action to be a variable of an enumerated type: Elements of an enumerated type may appear as case labels (see Section 2.).

Arrays

A variable var is declared to be of an array type by a declaration of the form var: For example, the code fragment U: It also assigns particular array constants to be their values. Of course, an array may be viewed as a finite function from the product of some number of copies of the integers to another type. Thus, arrays are represented in row-major order. An array-valued expression of dimension n followed by a comma-separated list of at most n subscript values enclosed in square brackets is an expression.
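The row-major layout mentioned above can be made concrete with a small sketch. This is Python rather than ICAN; the function name row_major_offset and the representation of array bounds as (low, high) pairs are assumptions made for illustration, and the bounds check is added here even though ICAN leaves out-of-range subscripts undefined.

```python
from typing import Sequence, Tuple


def row_major_offset(bounds: Sequence[Tuple[int, int]],
                     subscripts: Sequence[int]) -> int:
    """Map a full list of subscripts to a zero-based linear offset.

    bounds holds one inclusive (low, high) range per dimension. In
    row-major order the last subscript varies fastest.
    """
    if len(bounds) != len(subscripts):
        raise ValueError("one subscript per dimension is required")
    offset = 0
    for (low, high), index in zip(bounds, subscripts):
        if not low <= index <= high:
            raise IndexError(f"subscript {index} outside {low}..{high}")
        extent = high - low + 1  # number of elements in this dimension
        offset = offset * extent + (index - low)
    return offset


# A 3-by-4 array whose subscripts both start at 1.
bounds = [(1, 3), (1, 4)]
first = row_major_offset(bounds, [1, 1])
next_row = row_major_offset(bounds, [2, 1])
last = row_major_offset(bounds, [3, 4])
```

Viewing the array as a function from subscript tuples to elements, this offset is simply one way of numbering the tuples in the function's domain.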
Note that an array type of the form array [,] of integer differs from one of the form array [] of array [] of integer. The first is a two-dimensional array, and the second is a one-dimensional array of one-dimensional arrays.

Sets

A variable var of a set type may have as its value any subset of its base type declared as follows: A set constant is either the empty set ∅, a comma-separated series of elements enclosed by left and right curly braces, or an intentionally defined set constant. The elements must all be of the same type. For example, the following are set constants: The following binary operators apply to sets: The last two of these, ∈ and ∉, take a value of a type ty as their left operand and a value of type set of ty as their right operand, and produce a Boolean result. For ∈, the result is true if the left operand is a member of the right operand and false otherwise. For ∉, the result is false if the left operand is a member of the right and true otherwise. The prefix unary selection operator selects, at random, an element from its set operand, and the selected element is the result. Note that the binary operator ∈ and the same symbol used in for-loop iterators are different. The former produces a Boolean value, while the latter is part of a larger expression that generates a sequence of values, as shown in Figure 2.
The code in (a) is equivalent in meaning to that in (b), where Tmp is a new temporary of the same type as A. Note that a set definition containing a SetDefClause can always be replaced by a nest of loops, such as the following for the assignment above, assuming that the type of E is the product of the type of N with itself:

Sequences

A variable var is declared to be of a sequence type by a declaration of the form var: A constant of a sequence type is a finite comma-separated series of members of its base type enclosed in square brackets. The empty sequence is denoted [ ]. For example, the following are sequence constants: Sequence concatenation is represented by a binary operator. A binary selection operator, when applied to a sequence and a nonzero integer, selects an element of the sequence; in particular, the positive integer n selects the nth element of the sequence, and the negative integer -n selects the nth element from the end of the sequence. A binary removal operator, when applied to a sequence s and a nonzero integer n, produces a copy of the sequence with the nth element removed. The type CharString is an abbreviation for sequence of character. For example, "ab CD" is identical to ['a','b',' ','C','D']. Note that the array constants are a subset of the sequence constants; the only difference is that in an array constant, all members at each nesting level must have the same length.
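The sequence operators described above can be sketched with Python lists. The function names select and remove are hypothetical, and two behaviors are assumptions of this sketch rather than statements from the text: that a negative index counts from the end for removal as it does for selection, and that an out-of-range index raises an error (ICAN leaves such cases undefined).

```python
from typing import List, TypeVar

T = TypeVar("T")


def select(seq: List[T], n: int) -> T:
    """Element n of seq: 1 is the first element, -1 is the last."""
    if n == 0:
        raise ValueError("the index must be nonzero")
    return seq[n - 1] if n > 0 else seq[n]


def remove(seq: List[T], n: int) -> List[T]:
    """A copy of seq without element n; seq itself is left unchanged."""
    if n == 0:
        raise ValueError("the index must be nonzero")
    position = n - 1 if n > 0 else len(seq) + n
    if not 0 <= position < len(seq):
        raise IndexError("index out of range")
    return seq[:position] + seq[position + 1:]


# The CharString "ab CD" viewed as a sequence of characters.
chars = list("ab CD")
joined = [1, 2] + [3]  # concatenation of two sequences
```

The one-based indexing is the main point of difference from Python's own list indexing, which is why select shifts positive indices by one and passes negative indices through unchanged.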
Tuples

A variable var is declared to be of a tuple type by a declaration of the form var: A tuple constant is a fixed-length comma-separated series enclosed in angle brackets. The following are also examples of tuple constants: The binary operator when applied to a tuple and a positive integer index produces the element of the tuple with that index.

Records

A variable var is declared to be of a record type by a declaration of the form var: A record constant is a tuple, each of whose elements is a pair consisting of an identifier (called the selector) and a value, separated by a colon. All values of a particular record type must have the same set of identifiers and, for each identifier, the values must be of the same type, but the values corresponding to different selectors may be of different types. The following are also examples of record constants:

Unions

A union type is the union of the sets of values in the types that make up the union. A variable var is declared to be of a union type by a declaration of the form var: As an example of a union type, consider integer ∪ boolean. An element of this type is either an integer or a Boolean. All the operators that apply to sets apply to unions. If the sets making up a union type are disjoint, then the set an element of the union belongs to may be determined by using the member of operator ∈.

Functions

A function type has a domain type (written to the left of the arrow) and a range type (written to the right of the arrow). A variable var is declared to be of a function type by a declaration of the form var: To be a function, the set of tuples must be single-valued. A variable or constant of this type is a set of pairs whose first member is a Boolean. It may also be expressed by an assignment or assignments involving the name of the type.
Thus, given the declaration A: A function need not have a defined value for every element of its domain.

Compiler-Specific Types

The compiler-specific types are all named with an uppercase initial letter and are introduced as needed in the text. They may be either simple or constructed, as necessary.

The Value nil

The value nil is a member of every type. It is the value of any variable that has not been initialized. In most contexts, using it as an operand in an expression causes the result to be nil. The only expressions in which using nil as an operand in an expression does not produce nil as the result are equality and inequality comparisons, as follows: In addition, nil may appear as the right-hand side of an assignment (Figure 2.). In each case, its value is the number of elements in its argument, as long as its argument is of finite size. For example, if A is declared to be A:

Statements

Statements include assignments. Their syntax is given in Table 2. A statement may be labeled (Figure 2.). Each structured statement's body is delimited by keywords, such as if and fi.
Assignment Statements

An assignment statement consists of a series of one or more left-hand parts, each followed by an assignment operator, and a right-hand part (Figure 2.). Each left-hand part may be a variable name or the name of an element of a variable, such as a member of a record, array, sequence, or tuple, or a function value. The assignment operator in each of the left-hand parts except the last must be :=. The last assignment operator may be either := or an extended assignment operator. For example, all the assignments in the following code are legal:

The right-hand part may be any type-compatible expression (see Section 2.). The left- and right-hand parts of an assignment must be of the same type when an extended assignment operator is expanded to its ordinary form. The right-hand side following an extended assignment operator is evaluated as if it had parentheses around it.

Procedure Call Statements

A procedure call statement has the form of a procedure expression. It consists of a procedure name followed by a parenthesized list of arguments separated by commas. It causes the named procedure to be invoked with the given arguments.

Return Statements

A return statement consists of the keyword return followed optionally by an expression. The conditions are Boolean-valued expressions and are evaluated in the short-circuit manner.
Each of the bodies is a sequence of zero or more statements. Each label is a constant of the same type as the selector expression, which must be of a simple type or an enumerated type. Each body is a sequence of zero or more statements. As in Pascal, after executing one of the bodies, execution continues with the statement following the esac closing delimiter. There must be at least one non-default case label and corresponding body.

While Statements

A while statement has the form while condition do while_body. The body is a sequence of zero or more statements. A numerical iterator specifies a variable, a range of values, and a parenthesized Boolean expression. The Boolean expression is optional and, if not supplied, the value true is used. The value of the variable may not be changed in the body of the loop. If the parenthesized Boolean expression is missing, the value true is used. If the variable series has more than one element, they all must satisfy the same criteria. For any set S that appears in the iterator, the body of the for statement must not change S's value. The condition is a Boolean-valued expression and is evaluated in the short-circuit manner. They are all reserved and may not be used as identifiers.

Wrap-Up

This chapter is devoted to describing ICAN, the informal notation used to present algorithms in this book.
The language allows for a rich set of predefined and constructed types, including ones that are specific to compiler construction, and is an expressive notation for expressions and statements. Each compound statement has an ending delimiter, and some, such as while and case statements, have internal delimiters too.
The informality of the language lies primarily in its not being specific about the semantics of constructions that are syntactically valid but semantically ambiguous, undefined, or otherwise invalid.
That is, return a list of nodes beginning and ending with start that passes through every node other than start exactly once, or return nil if there is no such path.

These include, for example, pointer aliasing, in which two or more pointers point to the same object, so that changing the referent of one of them affects the referents of the others also, and the possibility of creating circular structures. On the other hand, excluding pointers may result in algorithms being less efficient than they would otherwise be. Suppose we were to decide to extend ICAN to create a language, call it PICAN, that includes pointers. (a) List advantages and disadvantages of doing so.
We begin with a discussion of the storage classes that symbols may belong to and the rules governing their visibility, or scope rules, in various parts of a program. Next we discuss symbol attributes and how to structure a local symbol table. This is followed by a description of a representation for global symbol tables that includes importing and exporting of scopes, a programming interface to global and local symbol tables, and ICAN implementations of routines to generate loads and stores for variables according to their attributes.

Storage Classes, Visibility, and Lifetimes

Most programming languages allow the user to assign variables to storage classes that prescribe scope, visibility, and lifetime characteristics for them. The rules governing scoping also prescribe principles for structuring symbol tables and for representing variable access at run time, as discussed below. A scope is a unit of static program structure that may have one or more variables declared within it. In many languages, scopes may be nested. The closely related concept of visibility of a variable indicates in what scopes the variable's name refers to a particular instance of the name. For example, in Pascal, if a variable named a is declared in the outermost scope, it is visible everywhere in the program except where an inner scope declares its own variable named a. If a variable in an inner scope makes a variable with the same name in a containing scope temporarily invisible, we say the inner one shadows the outer one.

The extent or lifetime of a variable is the part of the execution period of the program in which it is declared from when it first becomes visible to when it is last visible. Thus, a variable declared in the outermost scope of a Pascal program has a lifetime that extends throughout the execution of the program, while one declared within a nested procedure may have multiple lifetimes, each extending from an entry to the procedure to the corresponding exit from it. A Fortran variable with the save attribute or a C static local variable has a noncontiguous lifetime: if it is declared within procedure f, its lifetime consists of the periods during which f is executing, and its value is preserved from each execution period to the next.
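Shadowing and a value that survives between calls can both be illustrated in a few lines. The sketch below is Python, which has neither Pascal's nesting rules nor C's static locals, so it is only an analogy: the closure variable count emulates a static local, and all names (shadowing, make_counter, f) are hypothetical.

```python
a = "outer"  # declared in the outermost (module) scope


def shadowing() -> str:
    a = "inner"  # shadows the outer a for the duration of this call
    return a


def make_counter():
    """Build f, whose count variable is visible only inside f but keeps
    its value from one execution of f to the next, much like a C static
    local (emulated here with a closure, by assumption)."""
    count = 0

    def f() -> int:
        nonlocal count
        count += 1
        return count

    return f


f = make_counter()
inner_value = shadowing()
first_call = f()
second_call = f()
```

After the call to shadowing returns, the outer a is visible again with its value untouched, while count shows the other behavior: it is invisible outside f, yet its value persists across separate executions of f.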
The blanket was woven of fine strands of red-cedar inner bark and mountain-goat wool in the late 19th century by a Tlingit woman from southeastern Alaska. It generally took six to nine months of work to complete such a blanket. The blanket design is divided into three panels, and the center panel depicts a diving whale. The head is the split image at the bottom; the body is the panel with the face in the center (a panel that looks like a face never represents the face in this iconography); the lateral fins are at the sides of the body; and the tail flukes are at the top. Each part of the design is, in itself, functional but meaningless; assembled together in the right way, the elements combine to depict a diving whale and proclaim the rights and prerogatives of the village chief who owned the blanket.

In a similar way, each component of a compiler is functional, but it is only when the components are put together in the proper way that they serve their overall purpose. Designing and weaving such a blanket requires skills that are akin to those involved in constructing industrial-strength compilers: each discipline has a set of required tools, materials, design elements, and overall patterns that must be combined in a way that meets the prospective users' needs and desires.
Audience for This Book

This book is intended for computer professionals, graduate students, and advanced undergraduates who need to understand the issues involved in designing and constructing advanced compilers for uniprocessors. The reader is assumed to have had introductory courses in data structures, algorithms, compiler design and implementation, computer architecture, and assembly-language programming, or equivalent work experience.

Overview of the Book's Contents

This volume is divided into 21 chapters and three appendices as follows:

Chapter 1. Introduction to Advanced Topics

This chapter introduces the subject of the book, namely, advanced topics in the design and construction of compilers, and discusses compiler structure, the importance of optimization, and how the rest of the material in the book works together.

Chapter 2. Informal Compiler Algorithm Notation (ICAN)

Chapter 2 describes and gives examples of an informal programming notation called ICAN that is used to present algorithms in the text. After describing the notation used to express the language's syntax, it gives a brief overview of ICAN, followed by a detailed description of the language. The brief description should be sufficient for reading most of the algorithms presented and the full description should need to be referred to only rarely.

Chapter 3 first discusses the attributes of variables, such as storage class, visibility, volatility, scope, size, type, alignment, structure, addressing method, and so on. Then it describes effective methods for structuring and managing local and global symbol tables, including importation and exportation of scopes.

Chapter 4. Intermediate Representations

This chapter focuses on intermediate language design, three specific intermediate languages used in the remainder of the book, and other basic forms of intermediate code that might be used in a compiler. We use three closely related intermediate forms, one high-level, one medium-level, and one low-level, to allow us to demonstrate virtually all the optimizations discussed. We also discuss the relative importance and usefulness of our chosen forms and the others. Two other more elaborate forms of intermediate code, namely, static single assignment (SSA) form and program dependence graphs, are discussed in Sections 8.