OOC:Programming Savvy --Arithmetic Expressions
来自 ChinaUnix Wiki
目录 |
Programming Savvy --Arithmetic Expressions
3 Programming Savvy --Arithmetic Expressions
/* 翻译有不少错误,还请各位大哥帮忙改改,尤其是注有斜体的地方.谢谢!*/
/*为开源努力....**************************************************/
Dynamic linkage is a powerful programming technique in its own right. Rather
than writing a few functions, each with a big switch to handle many special cases,
we can write many small functions, one for each case, and arrange for the proper
function to be called by dynamic linkage. This often simplifies a routine job and it
usually results in code that can be extended easily.
--动态链接就其本身而言是一种很有用的程序规划技术。相比写比较少的附带许多switch语句控制许多特殊情况的函数,不如写许多短的函数,这样,在每种情况下,都能很好地动态链接 。这也常常简化了我们的程序工作并且它常常使我们的代码容易扩展。
As an example we will write a small program to read and evaluate arithmetic expressions consisting of floating point numbers, parentheses and the usual operators for addition, subtraction, and so on. Normally we would use the compiler generator tools lex and yacc to build that part of the program which recognizes an arithmetic expression. This book is not about compiler building, however, so just this once we will write this code ourselves.
--例如我们要写一个计算由浮点数、圆括号及一些常用的加、减等算术符号组成的算术表达式的程序。一般说来,我们将要通过规定的编译工具和另外的某编译器来生成能计算这个算术运算的目标程序。当然,这本书不是谈论编译器的工作原理,所以,这一次我们将亲自写出程序代码.
3.1 The Main Loop
The main loop of the program reads a line from standard input, initializes things so that numbers and operators can be extracted and white space is ignored, calls up a function to recognize a correct arithmetic expression and somehow store it, and finally processes whatever was stored. If things go wrong, we simply read the next input line. Here is the main loop:
--3.1 主循环 这个程序的主循环先从标准输入中读取一行,并且根据它进行部分初始化,这样,数字和操作符就能被提取出来,而且空格也被忽略掉了。然后调用一个函数来识别这个正确的算术表达式,并以某种方式存储它,最后处理这些存储的结果。如果出错,那我们可以简单地阅读下一个输入行。下面是主函数:
#include <setjmp.h>
static enum tokens token; /* current input symbol */ /*当前输入信号*/
static jmp_buf onError;
int main (void)
{ volatile int errors = 0;
char buf [BUFSIZ];
if (setjmp(onError))
++ errors;
while (gets(buf))
if (scan(buf))
{ void * e = sum();
if (token)
error("trash after sum");
process(e);
delete(e);
}
return errors > 0;
}
void error (const char * fmt, ...)
{ va_list ap;
va_start(ap, fmt);
vfprintf(stderr, fmt, ap), putc(’\n’, stderr);
va_end(ap);
longjmp(onError, 1);
}
The error recovery point is defined with setjmp().If error() is called somewhere in the program, longjmp() continues execution with another return from setjmp().In this case, the result is the value passed to longjmp(), the error is counted, and the next input line is read. The exit code of the program reports if any errors were encountered.
--错误校正在setjmp()中定义了。如果error()函数在程序的某个地方被调用,则伴随着setjmp()的另一个循环,longjmp()将继续被执行。在这种情况下,结果就就传给了longjmp(),错误也被保存了,同时下一个输入行仍有效。离开当前程序时将报告这段代码是否有错误。
3.2 The Scanner
In the main loop, once an input line has been read into buf[], it is passed to scan(), which for each call places the next input symbol into the variable token. At the end of a line token is zero:
--3、2 扫描仪 在主循环中,一旦输入数据被传入buf[],它就会传递至scan()函数,为每一处调用函数的地方提供下一个输入变量标志。在每行的结束是零标记。
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include "parse.h" /* defines NUMBER */
static double number; /* if NUMBER: numerical value */
static enum tokens scan (const char * buf)
/* return token = next input symbol */ /*返回的标记=下一个输入标记*/
{ static const char * bp;
if (buf)
bp = buf; /* new input line */
while (isspace(* bp))
++ bp;
if (isdigit(* bp) || * bp == ’.’)
{ errno = 0;
token = NUMBER, number = strtod(bp, (char **) & bp);
if (errno == ERANGE)
error("bad value: %s", strerror(errno));
}
else
token = * bp ? * bp ++ : 0;
return token;
}
We call scan() with the address of an input line or with a null pointer to continue work on the present line. White space is ignored and for a leading digit or decimal point we extract a floating point number with the ANSI-C function strtod(). Any other character is returned as is, and we do not advance past a null byte at the end of the input buffer.
--我们通过输入的地址或者空指针调用scan()函数来继续运行。为了求出符合ANSI-C 标准的strtod()函数中的浮点数,我们忽略了空格,和首要的数字或小数点.其它的任何特性仍不变,并且我们不会在输入的缓冲提前输出以前的空位。
3.3 The Recognizer
The result of scan() is stored in the global variable token — this simplifies the recognizer. If we have detected a number, we return the unique value NUMBER and we make the actual value available in the global variable number. --3、3 识别器 Scan()函数的结果被存取在全局变量中—-这简化了识别器的工作。如果我们发现了一个数值,我们就返回它的唯一值并且确保它的全局变量有效。
At the top level expressions are recognized by the function sum() which internally calls on scan() and returns a representation that can be manipulated by process() and reclaimed by delete(). If we do not use yacc we recognize expressions by the method of recursive descent where grammatical rules are translated into equivalent C functions. For example: a sum is a product, followed by zero or more groups, each consisting of an addition operator and another product. A grammatical rule like
--在最外层的表达式被内在scan()函数调用的sum()函数验证,同时返回了一个能被process()函数处理和被delete()函数回收的表现形式。
sum : product { +|— product }...
is translated into a C function like
被翻译成C函数就成了
void sum (void)
{
product();
for (;;)
{ switch (token) {
case ’+’:
case ’—’:
scan(0), product(); continue;
}
return;
}
}
There is a C function for each grammatical rule so that rules can call each other. Alternatives are translated into switch or if statements, iterations in the grammar produce loops in C. The only problem is that we must avoid infinite recursion.
token always contains the next input symbol. If we recognize it, we must call scan(0) to advance in the input and store a new symbol in token.
-这是一个能互相调用且符号文法规则的C函数。我们需将其翻译成switch语句或if语句,否则它将会在C语言中不停地循环。在这里,我们唯一需要注意的问题就是必须避免其无限的循环。一些标志经常包含下一个输入的标志。如果我们想运用它,那我们就必须调用scan(0)来提前输入完并存储一个新的标志。
3.4 The Processor
How do we process an expression? If we only want to perform simple arithmetic on numerical values, we can extend the recognition functions and compute the result as soon as we recognize the operators and the operands: sum() would expect a double result from each call to product(), perform addition or subtraction as soon as possible, and return the result, again as a double function value.
--3、4 计算程序 我们怎样计算一个表达式?如果我们仅仅是简单的算术运算,我们可以通过扩展一个识别函数 来快速地计算出结果,而且操作数sum()每次将得到调用product()函数的返回数,快速进行加减,并返回以双函数计算的结果
If we want to build a system that can handle more complicated expressions we need to store expressions for later processing. In this case, we can not only perform arithmetic, but we can permit decisions and conditionally evaluate only part of an expression, and we can use stored expressions as user functions within other expressions. All we need is a reasonably general way to represent an expression. The conventional technique is to use a binary tree and store token in each node:
如果我们想建立一个能计算更复杂表达式的系统(模板),则需要存取为下一个计算作准备的表达式。在这种情况下,我们不仅可以计算,而且还可以有条件地计算部分表达式,甚至还可以利用在其它表达式中作为用户函数的表达式来计算。我们所需要的就是一种常规方法来表达一个数学表达式。最常用的方法就是运用一个二进制树同时在每一个结点储存一个标志。
struct Node {
enum tokens token;
struct Node * left, * right;
};
This is not very flexible, however. We need to introduce a union to create a node in which we can store a numerical value and we waste space in nodes representing unary operators. Additionally, process() and delete() will contain switch statements which grow with every new token which we invent.
--然而这比较难以操作。因为我们需要引进一个联合体来产生一个能储存数值的结点,这使得代表一元操作符的结点浪费了空间。另外,process()和delete()函数将会包含一个伴随着我们添加标志时增长的switch语句。
3.5 Information Hiding
Applying what we have learned thus far, we do not reveal the structure of a node at all. Instead, we place some declarations in a header file value.h:
--3、5 隐藏信息 运用目前已学的知识,我们完全不能明白结点的结构。倒是我们存储了许多声明在value.h头文件中。
const void * Add;
...
void * new (const void * type, ...);
void process (const void * tree);
void delete (void * tree);
Now we can code sum() as follows:
#include "value.h"
static void * sum (void)
{ void * result = product();
const void * type;
for (;;)
{ switch (token) {
case ’+’:
type = Add;
break;
case ’—’:
type = Sub;
break;
default:
return result;
}
scan(0);
result = new(type, result, product());
}
}
product() has the same architecture as sum() and calls on a function factor() to recognize numbers, signs, and a sum enclosed in parentheses:
--product()与sum()函数有着相同的结构,并且调用了同一个factor()函数来识别数字,信号,和一个被圆括号附加的sum()函数。
static void * sum (void);
static void * factor (void)
{ void * result;
switch (token) {
case ’+’:
scan(0);
return factor();
case ’—’:
scan(0);
return new(Minus, factor());
default:
error("bad factor: ’%c’ 0x%x", token, token);
case NUMBER:
result = new(Value, number);
break;
case ’(’:
scan(0);
result = sum();
if (token != ’)’)
error("expecting )");
}
scan(0);
return result;
}
Especially in factor() we need to be very careful to maintain the scanner invariant: token must always contain the next input symbol. As soon as token is consumed we need to call scan(0).
--尤其在factor()函数中,我们必须非常小心地维护好扫描器的常量:其标志必须包含下一个输入信号.当标志销毁时,我们就必须调用scan(0).
3.6 Dynamic Linkage
3.6 动态链接
The recognizer is complete. value.h completely hides the evaluator for arithmetic expressions and at the same time specifies what we have to implement. new() takes a description such as Add and suitable arguments such as pointers to the operands of the addition and returns a pointer representing the sum:
识别器完成了。value.h完全隐藏了数学表达式的求值程序,同时也指定了我们必须要实现的函数。 new()的参数为运算符和合适的参数,如加法运算符和指向加数的指针,然后返回指向和的指针:
struct Type {
void * (* new) (va_list ap);
double (* exec) (const void * tree);
void (* delete) (void * tree);
};
void * new (const void * type, ...)
{ va_list ap;
void * result;
assert(type && ((struct Type *) type) —> new);
va_start(ap, type);
result = ((struct Type *) type) —> new(ap);
* (const struct Type **) result = type;
va_end(ap);
return result;
}
We use dynamic linkage and pass the call to a node-specific routine which, in the case of Add, has to create the node and enter the two pointers:
我们利用动态链接,调用节点相关的子程序,在这里就是加法。这个子程序必须创造出节点, 并且要给这两个指针赋值。
struct Bin {
const void * type;
void * left, * right;
};
static void * mkBin (va_list ap)
{ struct Bin * node = malloc(sizeof(struct Bin));
assert(node);
node —> left = va_arg(ap, void *);
node —> right = va_arg(ap, void *);
return node;
}
Note that only mkBin() knows what node it creates. All we require is that the various nodes start with a pointer for the dynamic linkage. This pointer is entered by new() so that delete() can reach its node-specific function:
请注意,只有mkBin()知道它造出的节点是什么样子的。 我们所需要的仅是为动态链接所准备的指向这些节点的一个指针。 这个指针被new()赋值,所以delete()可以使用 它的节点相关函数。
void delete (void * tree)
{
assert(tree && * (struct Type **) tree
&& (* (struct Type **) tree) —> delete);
(* (struct Type **) tree) —> delete(tree);
}
static void freeBin (void * tree)
{
delete(((struct Bin *) tree) —> left);
delete(((struct Bin *) tree) —> right);
free(tree);
}
Dynamic linkage elegantly avoids complicated nodes. .new() creates precisely the right node for each type description: binary operators have two descendants, unary operators have one, and a value node only contains the value. delete() is a very simple function because each node handles its own destruction: binary operators delete two subtrees and free their own node, unary operators delete only one subtree, and a value node will only free itself. Variables or constants can even remain behind — they simply would do nothing in response to delete().
动态链接很好的避免了代码的复杂化。new()为每种运算符都创造了合适的节点:二元运算符有两个分支, 一元运算符有一个,值节点只包含值。delete()是一个非常简单的函数,因为每个节点都带着它自己的析构函数: 二元运算符删除两个子树,释放自己的节点,一元运算符只删除一个子树,一个值节点只需要把自己释放掉。 变量和常量甚至可以存活到后面--在delete()中他们什么都不做。
3.7 A Postfix Writer
3.7 生成后缀格式
So far we have not really decided what process() is going to do. If we want to emit a postfix version of the expression, we would add a character string to the struct Type to show the actual operator and process() would arrange for a single output line indented by a tab:
到目前为止,我们还没决定process()要做的事。如果我们想生成一个后缀的表达式,我们就要再结构类型中 加上一个字符串,用来显示实际的运算符,process()将会用一个tab缩进的单行输出处理还一切。
void process (const void * tree)
{
putchar(’\t’);
exec(tree);
putchar(’\n’);
}
exec() handles the dynamic linkage:
exec()处理动态链接:
static void exec (const void * tree)
{
assert(tree && * (struct Type **) tree
&& (* (struct Type **) tree) —> exec);
(* (struct Type **) tree) —> exec(tree);
}
and every binary operator is emitted with the following function:
(译者注:最简表达式的)每一个二元运算符都由下面的函数打印出来:
static void doBin (const void * tree)
{
exec(((struct Bin *) tree) —> left);
exec(((struct Bin *) tree) —> right);
printf(" %s", (* (struct Type **) tree) —> name);
}
The type descriptions tie everything together:
运算符把所有的东西结合起来:
static struct Type _Add = { "+", mkBin, doBin, freeBin };
static struct Type _Sub = { "—", mkBin, doBin, freeBin };
const void * Add = & _Add;
const void * Sub = & _Sub;
It should be easy to guess how a numerical value is implemented. It is represented as a structure with a double information field:
数值的实现很容易,它是含有一个double类型数据域的结构体。
struct Val {
const void * type;
double value;
};
static void * mkVal (va_list ap)
{ struct Val * node = malloc(sizeof(struct Val));
assert(node);
node —> value = va_arg(ap, double);
return node;
}
Processing consists of printing the value:
处理过程包括打印出数值节点的值:
static void doVal (const void * tree)
{
printf(" %g", ((struct Val *) tree) —> value);
}
We are done — there is no subtree to delete, so we can use the library function free() directly to delete the value node:
工作已经完成了--这里没有需要删除的子树,所以我们可以直接使用库函数free()来删除值节点。
static struct Type _Value = { "", mkVal, doVal, free };
const void * Value = & _Value;
A unary operator such as Minus is left as an exercise.
像负号这样的一元运算符就留给读者练习。
3.8 Arithmetic
3.8 算术
If we want to do arithmetic, we let the execute functions return double values to be printed in process():
如果我们想做算术运算,我们就把process()中执行的函数的double型返回值打印出来。
static double exec (const void * tree)
{
return (* (struct Type **) tree) —> exec(tree);
}
void process (const void * tree)
{
printf("\t%g\n", exec(tree));
}
For each type of node we need one execution function which computes and returns the value for the node. Here are two examples:
对于每一种类型的节点,我们都需要一个函数来计算并且返回结果。下面是两个例子:
static double doVal (const void * tree)
{
return ((struct Val *) tree) —> value;
}
static double doAdd (const void * tree)
{
return exec(((struct Bin *) tree) —> left) +
exec(((struct Bin *) tree) —> right);
}
static struct Type _Add = { mkBin, doAdd, freeBin };
static struct Type _Value = { mkVal, doVal, free };
const void * Add = & _Add;
const void * Value = & _Value;
3.9 Infix Output
3.9 中缀输出
Perhaps the highlight of processing arithmetic expressions is to print them with a minimal number of parentheses. This is usually a bit tricky, depending on who is responsible for emitting the parentheses. In addition to the operator name used for postfix output we add two numbers to the struct Type:
恐怕处理算术表达式的亮点就是,使用尽可能少的括号把它们表示出来。 这通常有点棘手,取决于谁来负责抛出这个括号。 除了中缀所用的运算符名字外,我们为这个结构类型加两个成员。
struct Type {
const char * name; /* node’s name */
char rank, rpar;
void * (* new) (va_list ap);
void (* exec) (const void * tree, int rank, int par);
void (* delete) (void * tree);
};
.rank is the precedence of the operator, starting with 1 for addition. .rpar is set for operators such as subtraction, which require their right operand to be enclosed in parentheses if it uses an operator of equal precedence. As an example consider
.rank 是运算符的优先级,从1递增。.rpar是为了像减号这样的运算符准备的, 当减号遇到左操作数使用一个同优先级的运算符时,需要把这个左操作数用括号括起来, 以使左操作数和减号结合起来。 考虑下面的例子:
$ infix
1+(2 — 3)
1+2—3
1 — (2 — 3)
1 — (2 — 3)
This demonstrates that we have the following initialization:
在这个例子中,我们做如下初始化:
static struct Type _Add = {"+", 1, 0, mkBin, doBin, freeBin};
static struct Type _Sub = {"—", 1, 1, mkBin, doBin, freeBin};
The tricky part is for a binary node to decide if it must surround itself with parentheses. A binary node such as an addition is given the precedence of its superior and a flag indicating whether parentheses are needed in the case of equal precedence. doBin() decides if it will use parentheses:
棘手的地方就在于,一个二元节点判断自己是否需要被括号括起来。 比如加法这样的二元节点,有自己的优先级,还有一个用于指示遇到同优先级运算符时是否需要使用括号的标志。doBin()来决定它是否需要括号:
static void doBin (const void * tree, int rank, int par)
{ const struct Type * type = * (struct Type **) tree;
par = type —> rank < rank
|| (par && type —> rank == rank);
if (par) putchar(’(’);
If our node has less precedence than its superior, or if we are asked to put up parentheses on equal precedence, we print parentheses. In any case, if our description has .rpar set, we require only of our right operand that it put up extra parentheses:
当我们的节点的优先级小于它的上一级时,或者被要求在同优先级时使用括号,那就使用括号。 任何时候,只要我们的运算符的.rpar被设置了,我们就要求它的右操作数加上额外的括号。
exec(((struct Bin *) tree) —> left, type —> rank, 0);
printf(" %s ", type —> name);
exec(((struct Bin *) tree) —> right,
type —> rank, type —> rpar);
if (par) putchar(’)’);
}
The remaining printing routines are significantly simpler to write.
打印最简表达式的程序余下的部分写起来就相当简单了。
3.10 Summary
3.10 小结
Three different processors demonstrate the advantages of information hiding. Dynamic linkage has helped to divide a problem into many very simple functions. The resulting program is easily extended — try adding comparisons and an operator like ?: in C.
三个不同的处理函数显示出信息隐藏的优点。动态链接可以帮助把一个大问题分割成许多简单的小问题。 编出来的程序也很容易实现扩展--试试加上比较功能,再加上一个c语言中的?:操作符。
