GCCINT:RTL
From ChinaUnix Wiki
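12 RTL Representation
Most of the work of the compiler is done on an intermediate representation called register transfer language (RTL). In this language, the instructions to be output are described, pretty much one by one, in an algebraic form that describes what the instruction does.
RTL is inspired by Lisp lists. It has both an internal form, made up of structures that point at other structures, and a textual form that is used in the machine description and in printed debugging dumps. The textual form uses nested parentheses to indicate the pointers in the internal form.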
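- RTL Objects: expressions, vectors, strings, integers
- RTL Classes: categories of RTL expression objects, and their structure
- Accessing Operands: macros to access expression operands and vector elements
- Accessing Special Operands: macros to access special annotations on RTL
- Flags: other flags on RTL expressions
- Machine Modes: describing the size and format of a datum
- Constants: expressions with constant values
- Registers and Memory: expressions representing register contents or memory
- Arithmetic: expressions representing arithmetic operations
- Comparisons: expressions representing comparison operations
- Bit-Fields: expressions representing bit-fields in memory or registers
- Vector Operations: expressions involving vector data types
- Conversions: extending, truncating, floating or fixing
- RTL Declarations: declaring volatility, constancy, etc.
- Side Effects: expressions for storing in registers, etc.
- Incdec: embedded side effects for autoincrement addressing
- Assembler: representing asm with operands
- Insns: expression types for entire insns
- Calls: RTL representation of function call insns
- Sharing: some expressions are unique; others must be copied
- Reading RTL: reading textual RTL from a file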
RTL Objects
12.1 RTL Object Types
RTL uses five kinds of objects: expressions, integers, wide integers, strings and vectors. Expressions are the most important ones. An RTL expression (“RTX”, for short) is a C structure, but it is usually referred to with a pointer; a type that is given the typedef name rtx.
An integer is simply an int; their written form uses decimal digits. A wide integer is an integral object whose type is HOST_WIDE_INT; their written form uses decimal digits.
A string is a sequence of characters. In core it is represented as a char * in usual C fashion, and it is written in C syntax as well. However, strings in RTL may never be null. If you write an empty string in a machine description, it is represented in core as a null pointer rather than as a pointer to a null character. In certain contexts, these null pointers instead of strings are valid. Within RTL code, strings are most commonly found inside symbol_ref expressions, but they appear in other contexts in the RTL expressions that make up machine descriptions.
In a machine description, strings are normally written with double quotes, as you would in C. However, strings in machine descriptions may extend over many lines, which is invalid C, and adjacent string constants are not concatenated as they are in C. Any string constant may be surrounded with a single set of parentheses. Sometimes this makes the machine description easier to read.
There is also a special syntax for strings, which can be useful when C code is embedded in a machine description. Wherever a string can appear, it is also valid to write a C-style brace block. The entire brace block, including the outermost pair of braces, is considered to be the string constant. Double quote characters inside the braces are not special. Therefore, if you write string constants in the C code, you need not escape each quote character with a backslash.
A vector contains an arbitrary number of pointers to expressions. The number of elements in the vector is explicitly present in the vector. The written form of a vector consists of square brackets (`[...]') surrounding the elements, in sequence and with whitespace separating them. Vectors of length zero are not created; null pointers are used instead.
Expressions are classified by expression codes (also called RTX codes). The expression code is a name defined in rtl.def, which is also (in uppercase) a C enumeration constant. The possible expression codes and their meanings are machine-independent. The code of an RTX can be extracted with the macro GET_CODE (x) and altered with PUT_CODE (x, newcode).
The expression code determines how many operands the expression contains, and what kinds of objects they are. In RTL, unlike Lisp, you cannot tell by looking at an operand what kind of object it is. Instead, you must know from its context—from the expression code of the containing expression. For example, in an expression of code subreg, the first operand is to be regarded as an expression and the second operand as an integer. In an expression of code plus, there are two operands, both of which are to be regarded as expressions. In a symbol_ref expression, there is one operand, which is to be regarded as a string.
Expressions are written as parentheses containing the name of the expression type, its flags and machine mode if any, and then the operands of the expression (separated by spaces).
Expression code names in the `md' file are written in lowercase, but when they appear in C code they are written in uppercase. In this manual, they are shown as follows: const_int.
In a few contexts a null pointer is valid where an expression is normally wanted. The written form of this is (nil).
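The relationship between expression codes, operand kinds, and the parenthesized written form can be sketched with a small toy model. Everything below (`toy_rtx`, `toy_info`, `toy_print`, and the one-integer format chosen for `reg`) is hypothetical and written only for illustration; it is not GCC's real data structure or printer, and it leaves out flags and machine modes, which real dumps include.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy expression codes; GCC's real list lives in rtl.def. */
enum toy_code { TOY_REG, TOY_SUBREG, TOY_PLUS, TOY_SYMBOL_REF, TOY_NUM_CODES };

/* One operand slot; which member is valid depends on the containing code. */
union toy_operand {
  struct toy_rtx *rt_rtx;   /* format 'e': sub-expression */
  int rt_int;               /* format 'i': integer */
  const char *rt_str;       /* format 's': string */
};

struct toy_rtx {
  enum toy_code code;
  union toy_operand fld[2];
};

/* Per-code name and operand format (formats here are toy assumptions). */
static const struct { const char *name; const char *format; }
toy_info[TOY_NUM_CODES] = {
  [TOY_REG]        = { "reg",        "i"  },
  [TOY_SUBREG]     = { "subreg",     "ei" },
  [TOY_PLUS]       = { "plus",       "ee" },
  [TOY_SYMBOL_REF] = { "symbol_ref", "s"  },
};

struct toy_buf { char text[256]; size_t len; };

/* Append S to the buffer, asserting that it fits. */
static void
toy_puts (struct toy_buf *b, const char *s)
{
  size_t n = strlen (s);
  assert (b->len + n < sizeof b->text);
  memcpy (b->text + b->len, s, n + 1);
  b->len += n;
}

/* Write X in nested-parenthesis form; a null pointer prints as (nil). */
static void
toy_print (struct toy_buf *b, const struct toy_rtx *x)
{
  const char *fmt;
  char num[32];
  size_t i;

  if (x == NULL)
    {
      toy_puts (b, "(nil)");
      return;
    }
  fmt = toy_info[x->code].format;
  toy_puts (b, "(");
  toy_puts (b, toy_info[x->code].name);
  for (i = 0; fmt[i] != '\0'; i++)
    {
      toy_puts (b, " ");
      switch (fmt[i])
        {
        case 'e': toy_print (b, x->fld[i].rt_rtx); break;
        case 'i':
          snprintf (num, sizeof num, "%d", x->fld[i].rt_int);
          toy_puts (b, num);
          break;
        case 's':
          toy_puts (b, "\"");
          toy_puts (b, x->fld[i].rt_str);
          toy_puts (b, "\"");
          break;
        default: assert (!"unknown format character");
        }
    }
  toy_puts (b, ")");
}
```

Note that the printer never inspects an operand to discover its kind; it relies entirely on the format string selected by the containing expression's code, which is the point made above about RTL versus Lisp.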
RTL Classes
12.2 RTL Classes and Formats
The various expression codes are divided into several classes, which are represented by single characters. You can determine the class of an RTX code with the macro GET_RTX_CLASS (code). Currently, rtl.def defines these classes:
| RTX_OBJ | An RTX code that represents an actual object, such as a register (REG) or a memory location (MEM, SYMBOL_REF). LO_SUM is also included; SUBREG and STRICT_LOW_PART, by contrast, are not in this class, but in class RTX_EXTRA. |
| RTX_CONST_OBJ | An RTX code that represents a constant object. HIGH is also included in this class. |
| RTX_COMPARE | An RTX code for a non-symmetric comparison, such as GEU or LT. |
| RTX_COMM_COMPARE | An RTX code for a symmetric (commutative) comparison, such as EQ or ORDERED. |
| RTX_UNARY | An RTX code for a unary arithmetic operation, such as NEG, NOT, or ABS. This category also includes value extension (sign or zero) and conversions between integer and floating point. |
| RTX_COMM_ARITH | An RTX code for a commutative binary operation, such as PLUS or AND. NE and EQ are comparisons, so they have class RTX_COMM_COMPARE. |
| RTX_BIN_ARITH | An RTX code for a non-commutative binary operation, such as MINUS, DIV, or ASHIFTRT. |
| RTX_BITFIELD_OPS | An RTX code for a bit-field operation. Currently only ZERO_EXTRACT and SIGN_EXTRACT. These have three inputs and are lvalues (so they can be used for insertion as well). See Bit-Fields. |
| RTX_TERNARY | An RTX code for other three input operations. Currently only IF_THEN_ELSE and VEC_MERGE. |
| RTX_INSN | An RTX code for an entire instruction: INSN, JUMP_INSN, and CALL_INSN. See Insns. |
| RTX_MATCH | An RTX code for something that matches in insns, such as MATCH_DUP. These only occur in machine descriptions. |
| RTX_AUTOINC | An RTX code for an auto-increment addressing mode, such as POST_INC. |
| RTX_EXTRA | All other RTX codes. This category includes the remaining codes used only in machine descriptions (DEFINE_*, etc.). It also includes all the codes describing side effects (SET, USE, CLOBBER, etc.) and the non-insns that may appear on an insn chain, such as NOTE, BARRIER, and CODE_LABEL. SUBREG is also part of this class. |
For each expression code, rtl.def specifies the number of contained objects and their kinds using a sequence of characters called the format of the expression code. For example, the format of subreg is `ei'.
These are the most commonly used format characters:
| e | An expression (actually a pointer to an expression). |
| i | An integer. |
| w | A wide integer. |
| s | A string. |
| E | A vector of expressions. |
A few other format characters are used occasionally:
| u | `u' is equivalent to `e' except that it is printed differently in debugging dumps. It is used for pointers to insns. |
| n | `n' is equivalent to `i' except that it is printed differently in debugging dumps. It is used for the line number or code number of a note insn. |
| S | `S' indicates a string which is optional. In the RTL objects in core, `S' is equivalent to `s', but when the object is read from an `md' file, the string value of this operand may be omitted. An omitted string is taken to be the null string. |
| V | `V' indicates a vector which is optional. In the RTL objects in core, `V' is equivalent to `E', but when the object is read from an `md' file, the vector value of this operand may be omitted. An omitted vector is effectively the same as a vector of no elements. |
| B | `B' indicates a pointer to basic block structure. |
| 0 | `0' means a slot whose contents do not fit any normal category. `0' slots are not printed at all in dumps, and are often used in special ways by small parts of the compiler. |
There are macros to get the number of operands and the format of an expression code:
| GET_RTX_LENGTH (code) | Number of operands of an RTX of code code. |
| GET_RTX_FORMAT (code) | The format of an RTX of code code, as a C string. |
Some classes of RTX codes always have the same format. For example, it is safe to assume that all comparison operations have format ee.
| 1 | All codes of this class have format e. |
| <, c, 2 | All codes of these classes have format ee. |
| b, 3 | All codes of these classes have format eee. |
| i | All codes of this class have formats that begin with iuueiee. See Insns. Note that not all RTL objects linked onto an insn chain are of class i. |
| o, m, x | You can make no assumptions about the format of these codes. |
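As a rough illustration of how a per-code format string yields both the operand count and the operand kinds, here is a toy lookup table in C. The names `toy_info`, `toy_rtx_format`, `toy_rtx_length`, `toy_count_operands` and `toy_class_format` are hypothetical stand-ins, not GCC's macros, and the table covers only four codes with simplified formats (the real entries in rtl.def may carry extra slots).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_code { TOY_NEG, TOY_PLUS, TOY_SUBREG, TOY_SYMBOL_REF, TOY_NUM_CODES };

/* Name, single-character class, and operand format for each toy code. */
struct toy_code_info { const char *name; char rtx_class; const char *format; };

static const struct toy_code_info toy_info[TOY_NUM_CODES] = {
  [TOY_NEG]        = { "neg",        '1', "e"  },
  [TOY_PLUS]       = { "plus",       'c', "ee" },
  [TOY_SUBREG]     = { "subreg",     'x', "ei" },
  [TOY_SYMBOL_REF] = { "symbol_ref", 'o', "s"  },
};

/* Toy analogue of GET_RTX_FORMAT: the format as a C string. */
static const char *
toy_rtx_format (enum toy_code code)
{
  assert (code < TOY_NUM_CODES);
  return toy_info[code].format;
}

/* Toy analogue of GET_RTX_LENGTH: one operand per format character. */
static int
toy_rtx_length (enum toy_code code)
{
  return (int) strlen (toy_rtx_format (code));
}

/* Count the operands of CODE whose format character is KIND. */
static int
toy_count_operands (enum toy_code code, char kind)
{
  const char *p;
  int n = 0;
  for (p = toy_rtx_format (code); *p != '\0'; p++)
    if (*p == kind)
      n++;
  return n;
}

/* Format implied by a class alone, or NULL when nothing can be assumed. */
static const char *
toy_class_format (char rtx_class)
{
  switch (rtx_class)
    {
    case '1': return "e";
    case '<': case 'c': case '2': return "ee";
    case 'b': case '3': return "eee";
    default: return NULL;   /* 'o', 'm', 'x', and 'i' (prefix only) */
    }
}
```

For instance, `toy_rtx_length (TOY_SUBREG)` is 2 and `toy_count_operands (TOY_SUBREG, 'e')` is 1, matching the `ei` format given above for subreg.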
Accessing Operands
12.3 Access to Operands
Operands of expressions are accessed using the macros XEXP, XINT, XWINT and XSTR. Each of these macros takes two arguments: an expression-pointer (RTX) and an operand number (counting from zero). Thus,
XEXP (x, 2)
accesses operand 2 of expression x, as an expression.
XINT (x, 2)
accesses the same operand as an integer. XSTR, used in the same fashion, would access it as a string.
Any operand can be accessed as an integer, as an expression or as a string. You must choose the correct method of access for the kind of value actually stored in the operand. You would do this based on the expression code of the containing expression. That is also how you would know how many operands there are.
For example, if x is a subreg expression, you know that it has two operands which can be correctly accessed as XEXP (x, 0) and XINT (x, 1). If you did XINT (x, 0), you would get the address of the expression operand but cast as an integer; that might occasionally be useful, but it would be cleaner to write (int) XEXP (x, 0). XEXP (x, 1) would also compile without error, and would return the second, integer operand cast as an expression pointer, which would probably result in a crash when accessed. Nothing stops you from writing XEXP (x, 28) either, but this will access memory past the end of the expression with unpredictable results.
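The subreg example can be mimicked with a small union-based model. This is only a sketch under assumed names (`toy_rtx`, `TOY_XEXP`, `TOY_XINT`, `TOY_XSTR`, `toy_make_subreg`, `toy_subreg_byte`); GCC's actual rtx layout and macro definitions differ in detail, but the idea that each macro simply selects one view of an operand slot, with no checking, is the same.

```c
#include <assert.h>
#include <stddef.h>

enum toy_code { TOY_REG, TOY_SUBREG };

/* One operand slot, viewable as an expression, an integer, or a string. */
union toy_operand {
  struct toy_rtx *rt_rtx;   /* format 'e' */
  int rt_int;               /* format 'i' */
  const char *rt_str;       /* format 's' */
};

struct toy_rtx {
  enum toy_code code;
  union toy_operand fld[2];
};

/* Each macro picks one view of slot N; nothing verifies the choice. */
#define TOY_XEXP(x, n) ((x)->fld[n].rt_rtx)
#define TOY_XINT(x, n) ((x)->fld[n].rt_int)
#define TOY_XSTR(x, n) ((x)->fld[n].rt_str)

/* Fill STORAGE with (subreg INNER BYTE); the macros are used as lvalues. */
static struct toy_rtx *
toy_make_subreg (struct toy_rtx *storage, struct toy_rtx *inner, int byte)
{
  storage->code = TOY_SUBREG;
  TOY_XEXP (storage, 0) = inner;
  TOY_XINT (storage, 1) = byte;
  return storage;
}

/* Read operand 1 as an integer, but only after checking the code,
   since the code is what tells us operand 1 is an integer.  */
static int
toy_subreg_byte (const struct toy_rtx *x)
{
  assert (x->code == TOY_SUBREG);
  return TOY_XINT (x, 1);
}
```

Wrapping the raw accessors in small functions that assert on the expression code, as `toy_subreg_byte` does, is one way to catch the wrong-view mistakes described above during development.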
Access to operands which are vectors is more complicated. You can use the macro XVEC to get the vector-pointer itself, or the macros XVECEXP and XVECLEN to access the elements and length of a vector.
XVEC (exp, idx)
Access the vector-pointer which is operand number idx in exp.
XVECLEN (exp, idx)
Access the length (number of elements) in the vector which is in operand number idx in exp. This value is an int.
XVECEXP (exp, idx, eltnum)
Access element number eltnum in the vector which is in operand number idx in exp. This value is an RTX.
It is up to you to make sure that eltnum is not negative and is less than XVECLEN (exp, idx).
All the macros defined in this section expand into lvalues and therefore can be used to assign the operands, lengths and vector elements as well as to access them.
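The vector accessors can be sketched in the same toy style. The types and macros below (`toy_rtvec`, `TOY_XVEC`, `TOY_XVECLEN`, `TOY_XVECEXP`, `toy_vec_elt`) are hypothetical, and the fixed four-element capacity is an assumption made only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

struct toy_rtx;

/* Toy vector: the element count is stored explicitly in the vector. */
struct toy_rtvec {
  int num_elem;
  struct toy_rtx *elem[4];   /* fixed capacity, for this sketch only */
};

union toy_operand {
  struct toy_rtx *rt_rtx;      /* format 'e' */
  struct toy_rtvec *rt_rtvec;  /* format 'E' */
};

struct toy_rtx {
  int code;
  union toy_operand fld[1];
};

/* All three expand to lvalues, so they can be read or assigned. */
#define TOY_XVEC(x, idx)         ((x)->fld[idx].rt_rtvec)
#define TOY_XVECLEN(x, idx)      (TOY_XVEC (x, idx)->num_elem)
#define TOY_XVECEXP(x, idx, elt) (TOY_XVEC (x, idx)->elem[elt])

/* Bounds-checked element read; the raw macro performs no such check. */
static struct toy_rtx *
toy_vec_elt (struct toy_rtx *x, int idx, int eltnum)
{
  assert (eltnum >= 0 && eltnum < TOY_XVECLEN (x, idx));
  return TOY_XVECEXP (x, idx, eltnum);
}
```

Because `TOY_XVECEXP (x, 0, 1) = y` is a plain assignment through the macro, keeping `eltnum` in range remains the caller's responsibility, just as stated above.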
Accessing Special Operands
12.4 Access to Special Operands
Some RTL nodes have special annotations associated with them.
| MEM_ALIAS_SET (x) | If 0, x is not in any alias set, and may alias anything. Otherwise, x can only alias MEMs in a conflicting alias set. This value is set in a language-dependent manner in the front-end, and should not be altered in the back-end. In some front-ends, these numbers may correspond in some way to types, or other language-level entities, but they need not, and the back-end makes no such assumptions. These set numbers are tested with alias_sets_conflict_p. |
| MEM_EXPR (x) | If this memory reference is known to hold the value of some user-level declaration, this is that tree node. It may also be a COMPONENT_REF, in which case this is some field reference, and TREE_OPERAND (x, 0) contains the declaration, or another COMPONENT_REF, or null if there is no compile-time object associated with the reference. |
| MEM_OFFSET (x) | The offset from the start of MEM_EXPR as a CONST_INT rtx. |
| MEM_SIZE (x) | The size in bytes of the memory reference as a CONST_INT rtx. This is mostly relevant for BLKmode references as otherwise the size is implied by the mode. |
| MEM_ALIGN (x) | The known alignment in bits of the memory reference. |
| ORIGINAL_REGNO (x) | This field holds the number the register “originally” had; for a pseudo register turned into a hard reg this will hold the old pseudo register number. |
| REG_EXPR (x) | If this register is known to hold the value of some user-level declaration, this is that tree node. |
| REG_OFFSET (x) | If this register is known to hold the value of some user-level declaration, this is the offset into that logical storage. |
| SYMBOL_REF_DECL (x) | If the symbol_ref x was created for a VAR_DECL or a FUNCTION_DECL, that tree is recorded here. If this value is null, then x was created by back end code generation routines, and there is no associated front end symbol table entry. SYMBOL_REF_DECL may also point to a tree of class 'c', that is, some sort of constant. In this case, the symbol_ref is an entry in the per-file constant pool; again, there is no associated front end symbol table entry. |
| SYMBOL_REF_CONSTANT (x) | If `CONSTANT_POOL_ADDRESS_P (x)' is true, this is the constant pool entry for x. It is null otherwise. |
| SYMBOL_REF_DATA (x) | A field of opaque type used to store SYMBOL_REF_DECL or SYMBOL_REF_CONSTANT. |
| SYMBOL_REF_FLAGS (x) | In a symbol_ref, this is used to communicate various predicates about the symbol. Some of these are common enough to be computed by common code, some are specific to the target. The common bits are:
|
