CSCE 531
Spring 2019
Homework 2
Due 11:59pm, Tuesday, March 5, 2019

Reading

On a Linux machine, at the prompt, type
info bison
just as you did for flex in Homework 1. (If you want more information about navigating through info, type "info info".)

An authoritative web site devoted to bison is maintained by GNU:

http://www.gnu.org/software/bison

containing downloads and full documentation.

General submission guidelines

In general, all exercises that I assign throughout the course, except drawings, are to be submitted through CSE Dropbox (go to https://dropbox.cse.sc.edu). Drawings (if any) may, but are not required to, be submitted by dropbox as well. Alternatively, drawings may be submitted in class.

No late submissions will be accepted for this assignment.

Homework

This assignment is to be done individually. No teams. The rules about collaboration and plagiarism apply here just as with Homework 1.

In this homework you are to modify an existing bison program for a Simple Expression Evaluator (SEE). You should do these exercises in order, except that items 5 and 6 may be done independently. You will write your programs in flex, bison, and C only.

(0 points) Download the file hw2.zip to your own directory and unzip it, producing a subdirectory named hw2. Go into that directory (cd hw2) and type
    make
then type
    ./see
then type the lines
    x = 5
    y = 2*x - 3
    (x+y)/3
    [Ctrl-D]
and observe the output. (There is also an executable file see-solution, which is a complete solution to the assignment. You may run that program as well.) To see more examples, you can also run
    ./see < sample-input.txt
Running bison with the options given in the Makefile produces a file y.output, which describes the parser created from the grammar in detail. You should browse through it, but I don't expect you to understand the stuff regarding states. We'll cover it later.
(0 points) Edit main.c, changing the line
yydebug = 0;
to read
yydebug = 1;
Then repeat all the steps in the last item. Now you get messages describing what the parser is doing at each step (i.e., shifting what token or reducing by what production). You may then return this line in main.c to its original content.
(30 points) Currently, yylex() is written by hand in the utilities section of parse.y. Write a flex program named scan.l that handles the lexical analysis in the same way. Running flex on scan.l will then produce the needed yylex() function, so you can remove the hand-written one. Note that variable names in SEE are single letters and case insensitive. (All attributes are integers, which is the default.) Edit the Makefile to accommodate scan.l (which includes adding lex.yy.o to the list of OBJECTS). To let bison and flex work together, you need to do three things:
1. Use the "-d" option when running bison. This (along with the "-y" option) generates the header file y.tab.h for use by scan.l. (Edit the Makefile.)
2. Insert the line
  #include "y.tab.h"
  inside the %{ and %} delimiters in the preamble of scan.l. (Note: if you include other files in your scan.l preamble, you should place them above the include directive for y.tab.h, so they are included first; otherwise, you may get gcc compiler errors.)
3. Update your Makefile to include the new files and their dependencies: The final executable now also depends on lex.yy.o, which depends on lex.yy.c, which in turn depends on both scan.l and y.tab.h. Add y.tab.h as an additional target along with y.tab.c.
(30 points) Alter the grammar in parse.y to allow for unary prefix + and - (the first operator does nothing, the second negates its argument). The syntax rules are as follows: The first term in an expression may optionally be preceded by a single '+' or '-', and that is the only place it can appear. The operator applies to the whole first term. This gives unary + and - the same precedence and associativity as their binary counterparts. Thus for example,
- '- 2*x + 5' is evaluated as (-(2*x))+5,
- '- - 3' is a syntax error, but '-(-3)' is ok,
- '3*-x' is a syntax error, but '3*(-x)' is ok.
In particular, you cannot apply two of these operators in a row without anything in between. NOTE: This is how Pascal parses these operators, but not how C/C++/Java handles them. In C/C++/Java, all unary operators have precedence higher than all binary operators.
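The precedence rule above can be illustrated outside bison with a small hand-written recursive-descent evaluator in C. This is only a sketch of the rule, not the required change to parse.y: the names eval_expr, parse_expr, parse_term, and parse_factor are hypothetical, and the sketch assumes integer constants, parentheses, and the operators +, -, and * only (no variables, no division).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper state: a cursor into the input and an error flag. */
static const char *cur;
static int syntax_error;

static int parse_expr(void);

static void skip_ws(void) {
    while (*cur == ' ' || *cur == '\t') cur++;
}

/* factor : CONST | '(' expr ')'    (no sign is accepted here) */
static int parse_factor(void) {
    skip_ws();
    if (isdigit((unsigned char)*cur)) {
        int v = 0;
        while (isdigit((unsigned char)*cur)) v = 10 * v + (*cur++ - '0');
        return v;
    }
    if (*cur == '(') {
        cur++;
        int v = parse_expr();
        skip_ws();
        if (*cur == ')') cur++; else syntax_error = 1;
        return v;
    }
    syntax_error = 1;   /* includes a '+' or '-' in factor position */
    return 0;
}

/* term : factor { '*' factor } */
static int parse_term(void) {
    int v = parse_factor();
    for (;;) {
        skip_ws();
        if (*cur == '*') { cur++; v *= parse_factor(); }
        else return v;
    }
}

/* expr : [ '+' | '-' ] term { ('+' | '-') term }
   The optional sign applies to the whole first term and appears nowhere else. */
static int parse_expr(void) {
    int negate = 0;
    skip_ws();
    if (*cur == '+' || *cur == '-') negate = (*cur++ == '-');
    int v = parse_term();
    if (negate) v = -v;
    for (;;) {
        skip_ws();
        if (*cur == '+') { cur++; v += parse_term(); }
        else if (*cur == '-') { cur++; v -= parse_term(); }
        else return v;
    }
}

/* Returns 0 and stores the value in *out, or returns -1 on a syntax error. */
int eval_expr(const char *s, int *out) {
    assert(s != NULL && out != NULL);
    cur = s;
    syntax_error = 0;
    int v = parse_expr();
    skip_ws();
    if (syntax_error || *cur != '\0') return -1;
    *out = v;
    return 0;
}
```

Because the sign is consumed in parse_expr rather than in parse_factor, an input such as "- - 3" or "3*-2" is rejected, while "-(-3)" is accepted since the parentheses start a new expr.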
(40 points)

Note: If you decide to implement the last task (full extra credit), then you will need to rewrite some of the code you use for this task. If you implement the last two tasks fully, you will also get credit for this task as well.

We now want to remember each expression entered (the expression itself, not its value) and be able to refer to it by number. The first expr entered has number 1, the second has number 2, etc. The idea is to be able to re-evaluate expressions with variables after the variables change. Here is a sample session using my renamed executable "see2". Each number followed by a colon is a computer-generated prompt giving the number of the expression to be entered.
  1: x = 5
  5
  2: y = 2*x - 3
  7
  3: x = 13
  13
  4: y = #2
  23
  5: y
  23
  6: Ctrl-D
The string '#2' is an expression referring to the expression entered on input line 2, i.e., 2*x - 3. Note that the value assigned to y on input line 4 is now 23, reflecting the new value of x used in the expression '#2'.

Edit parse.y so that syntax trees are created as attributes, rather than integer values. (For this you will need a %union declaration in parse.y; see the example below.) Keep the completed syntax trees for the exprs on a list. Please do not put all your code in parse.y this time. Make your actions reasonably short (no more than about five or six lines, say, and fewer is better), calling high-level tree-building functions. Implement these functions in a separate file, tree.c, with an accompanying tree.h to export the definitions there. Include tree.h in the preamble of parse.y and also in the preamble of scan.l before the inclusion of y.tab.h. (This is because y.tab.h includes the attribute union you declared in parse.y, but does not include any of the typedefs that support it. Also note that the type of yylval is now the union you declared in parse.y.)

Add a new production for factor so that a factor can expand into '#' followed by a CONST (there could be whitespace in between). The corresponding action should assign to the parent nonterminal (a pointer to) the nth syntax tree on the list, where n is the value of the CONST. If n is out of range, give an error message and quit the program. Don't create a duplicate tree, just a reference to the existing tree. Thus, expressions may be shared.

The actions for the productions
    assign         : VAR '=' expr
                   | expr
                   ;
must now each evaluate the tree given as the expr attribute by post-order traversal and pass its (integer) value to $$ to be printed as before.

You must also store the expression tree ($3 in the first production above; $1 in the second) in a global list of expression trees. Note that, in the case of the first production, you don't store the whole assignment as an expression, just the right-hand side. The actual update of the variable only occurs once, i.e., you don't "reassign" as a result of evaluating a #-reference. For example:
   1: x = 5
   5
   2: x = 17
   17
   3: #1
   5
   4: x
   17
Put your definition(s) of the tree-evaluating function(s) in a separate file eval.c, with an accompanying eval.h. (It's alright by me if you include "tree.h" inside eval.h.) Don't forget to update the Makefile to accommodate your new source files.
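As a rough illustration of what tree.c and eval.c might contain, here is a minimal sketch in C. Everything in it is an assumption made for illustration: the node layout, the names make_const, make_var, make_binop, make_ref, and eval, and the global vars array are hypothetical, and only +, -, and * are handled. A '#' reference is represented as a node pointing at the existing tree, so the tree is shared rather than copied.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

typedef enum { T_CONST, T_VAR, T_BINOP, T_REF } NodeKind;

typedef struct Tree {
    NodeKind kind;
    int value;            /* T_CONST: the constant; T_VAR: variable index 0..25 */
    char op;              /* T_BINOP: '+', '-', or '*' */
    struct Tree *left;    /* T_BINOP: left operand; T_REF: the shared target tree */
    struct Tree *right;   /* T_BINOP: right operand */
} Tree;

int vars[26];             /* current values of the single-letter variables */

static Tree *new_node(NodeKind kind, int value, char op, Tree *l, Tree *r) {
    Tree *t = malloc(sizeof *t);
    if (t == NULL) abort();
    t->kind = kind;
    t->value = value;
    t->op = op;
    t->left = l;
    t->right = r;
    return t;
}

Tree *make_const(int v) { return new_node(T_CONST, v, 0, NULL, NULL); }

/* Variable names are case insensitive, so fold to lower case first. */
Tree *make_var(char name) {
    assert(isalpha((unsigned char)name));
    return new_node(T_VAR, tolower((unsigned char)name) - 'a', 0, NULL, NULL);
}

Tree *make_binop(char op, Tree *l, Tree *r) { return new_node(T_BINOP, 0, op, l, r); }

/* A '#n' factor: a reference to an existing tree, not a duplicate of it. */
Tree *make_ref(Tree *target) { return new_node(T_REF, 0, 0, target, NULL); }

/* Post-order evaluation: evaluate the operands, then apply the operator. */
int eval(const Tree *t) {
    assert(t != NULL);
    switch (t->kind) {
    case T_CONST: return t->value;
    case T_VAR:   return vars[t->value];
    case T_REF:   return eval(t->left);
    case T_BINOP: {
        int a = eval(t->left);
        int b = eval(t->right);
        switch (t->op) {
        case '+': return a + b;
        case '-': return a - b;
        case '*': return a * b;
        default:  return 0;   /* unknown operator; not expected */
        }
    }
    }
    return 0;   /* not reached for well-formed trees */
}
```

With these definitions, the tree for 2*x - 3 evaluates to 7 when x is 5, and a make_ref node pointing at that same tree evaluates to 23 after x is changed to 13, matching the sample session.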
(Graduate students: 20 points for full credit; Undergrads: 20 points extra credit)

To evaluate these expressions efficiently, one must be sure not to evaluate a shared node more than once in a traversal. This precaution can prevent an exponential slow-down, so it really should be done. For example, consider
1: 2
2
2: (#1 + #1) % 11
4
3: (#2 + #2) % 11
8
4: (#3 + #3) % 11
5
5: (#4 + #4) % 11
10
6: (#5 + #5) % 11
9
(etc.)
Suppose we continue this pattern. Then a naive evaluation of the expression on line n takes Θ(2ⁿ) time, but if we "memoize" (or "cache") our intermediate results we can evaluate it in time Θ(n).

You only need to memoize values of expressions of the form `#n', that is, values of top-level expressions, one per input line. You do not need to detect or memoize common subexpressions within the same expression. Every time any variable is reassigned, you should "erase" all your currently memoized values, because the reassignment may render these values obsolete. The next evaluation would then start from scratch.
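The memoization scheme just described might be sketched in C as follows. This is a simplified model under stated assumptions, not the required design: '#n' nodes store a constant expression number, top-level expressions live in a fixed-size array, and the names exprs, cached, has_cache, eval_ref, invalidate_cache, and the counter eval_calls are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_EXPRS = 64 };    /* arbitrary limit chosen for this sketch */

typedef enum { N_CONST, N_ADD, N_MOD, N_REF } Kind;

typedef struct Node {
    Kind kind;
    int value;                          /* N_CONST: constant; N_REF: expression number */
    const struct Node *left, *right;    /* N_ADD, N_MOD: operands */
} Node;

const Node *exprs[MAX_EXPRS + 1];   /* exprs[n] is top-level expression n (1-based) */
int cached[MAX_EXPRS + 1];          /* memoized value of expression n */
int has_cache[MAX_EXPRS + 1];       /* nonzero if cached[n] is valid */
long eval_calls;                    /* counts node visits, for illustration only */

int eval(const Node *t);

/* Evaluate top-level expression n at most once between invalidations. */
static int eval_ref(int n) {
    assert(n >= 1 && n <= MAX_EXPRS && exprs[n] != NULL);
    if (!has_cache[n]) {
        cached[n] = eval(exprs[n]);
        has_cache[n] = 1;
    }
    return cached[n];
}

int eval(const Node *t) {
    assert(t != NULL);
    eval_calls++;
    switch (t->kind) {
    case N_CONST: return t->value;
    case N_REF:   return eval_ref(t->value);
    case N_ADD:   return eval(t->left) + eval(t->right);
    case N_MOD:   return eval(t->left) % eval(t->right);
    }
    return 0;   /* not reached for well-formed trees */
}

/* Call after every variable assignment: memoized values may now be stale. */
void invalidate_cache(void) {
    for (int i = 0; i <= MAX_EXPRS; i++) has_cache[i] = 0;
}
```

In the doubling example, the second '#k' in each line hits the cache instead of recursing, so the number of node visits grows linearly in the line number rather than doubling with each line.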
(30 points extra credit for everybody) Currently, an expression of the form #n requires n to be an integer constant, and so the top-level expression it refers to is known "at compile time." Alter the grammar and your syntax-directed translation so that n itself can be an expression (thus evaluated "at run time"). Here, "at compile time" means as the syntax tree is being built, and "at run time" means as the syntax tree is being evaluated. You now must accommodate '#' as a new unary prefix operator in your syntax trees. It should have precedence higher than all the other operators except parentheses. Thus, for example, the expression
#(#x*2)*##(2*y)-4
yields the same syntax tree as the fully parenthesized
((#((#x)*2))*(#(#(2*y))))-4

Now that #-references are not known until run time, you cannot hard-link the subtree corresponding to the reference into the expression tree you are building, but instead, treat # as another unary operator, and do a table look-up at run time to determine which tree to evaluate. For this, you will need to keep the list of trees in some data structure that allows fast look-up by index -- a resizable array, for example.
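One possible shape for such a resizable array is sketched below in C. The names TreeList, treelist_append, and treelist_get are hypothetical, and the Tree struct here is only a stand-in for whatever tree.h actually defines. The lookup returns NULL for an out-of-range index and leaves the error message and exit to the caller.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the real syntax tree type, which would come from tree.h. */
typedef struct Tree { int value; } Tree;

typedef struct {
    Tree **items;       /* items[0] holds expression number 1 */
    size_t count;       /* number of trees stored */
    size_t capacity;    /* number of slots allocated */
} TreeList;

/* Appends t and returns its 1-based expression number, or 0 if out of memory. */
size_t treelist_append(TreeList *list, Tree *t) {
    assert(list != NULL);
    if (list->count == list->capacity) {
        /* Doubling keeps the amortized cost of an append constant. */
        size_t new_cap = list->capacity ? 2 * list->capacity : 8;
        Tree **p = realloc(list->items, new_cap * sizeof *p);
        if (p == NULL) return 0;
        list->items = p;
        list->capacity = new_cap;
    }
    list->items[list->count++] = t;
    return list->count;
}

/* Returns the n-th tree (1-based) in constant time, or NULL if n is out of range. */
Tree *treelist_get(const TreeList *list, long n) {
    assert(list != NULL);
    if (n < 1 || (size_t)n > list->count) return NULL;
    return list->items[n - 1];
}
```

A list initialized with all fields zero is a valid empty list, since realloc with a NULL pointer behaves like malloc.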

Now that #-references are changeable dynamically, we have the possibility of getting circular references. For example,
1: x = 1
1
2: #x
1
3: x = 2
2
4: #2
The evaluation on line 4 now goes
#2 -> #x -> #2 -> #x -> #2 -> ...,
and the recursion will never end. You need to detect circular references. Upon encountering a circular reference, print an error message and quit the program.

Here is a natural way of detecting circular references. It is done in conjunction with memoizing:

Each new top-level tree is flagged as "unevaluated" when it is created (but before evaluation starts).
When evaluating a top-level tree:
1. When happening upon a #-reference to an unevaluated tree, flag the tree as "currently evaluating", and then (recursively) evaluate the tree. When this recursive evaluation returns with a value, flag the tree as "evaluated", storing the value there for future evaluations (i.e., memoizing) and returning that value to the parent call.
2. When happening upon a #-reference to an evaluated tree, just return the value stored there without recursing.
3. You find a circular reference whenever you happen upon a #-reference to a currently evaluating tree.
Whenever a variable is assigned a value, reset all the existing top-level trees to "unevaluated" right afterwards, because their values are potentially obsolete.
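The three-state scheme above might look roughly like this in C. It is a sketch under simplifying assumptions: the node kinds are reduced to constants, variables, and the unary '#' operator, the top-level trees sit in a fixed-size array, and errors are returned as codes so that the caller can print a message and quit. All names (eval_top, reset_states, ERR_RANGE, ERR_CYCLE, and so on) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_EXPRS = 64 };                     /* arbitrary limit for this sketch */
enum { ERR_RANGE = -1, ERR_CYCLE = -2 };     /* hypothetical error codes */

typedef enum { UNEVALUATED, EVALUATING, EVALUATED } State;
typedef enum { N_CONST, N_VAR, N_REF } Kind;

typedef struct Node {
    Kind kind;
    int value;                 /* N_CONST: constant; N_VAR: variable index 0..25 */
    const struct Node *arg;    /* N_REF: the operand expression of '#' */
} Node;

const Node *exprs[MAX_EXPRS + 1];   /* top-level trees, 1-based */
int num_exprs;                      /* number of top-level trees entered so far */
State state[MAX_EXPRS + 1];         /* zero-initialized, i.e. UNEVALUATED */
int memo[MAX_EXPRS + 1];            /* stored values of EVALUATED trees */
int vars[26];

int eval(const Node *t, int *out);

/* Evaluates top-level tree n. Returns 0 and stores the value in *out, or an
   error code. After an error the flags are left as they are, since the
   program is expected to quit. */
int eval_top(int n, int *out) {
    assert(out != NULL);
    if (n < 1 || n > num_exprs) return ERR_RANGE;
    if (state[n] == EVALUATING) return ERR_CYCLE;   /* case 3: circular reference */
    if (state[n] == UNEVALUATED) {                  /* case 1: evaluate and memoize */
        state[n] = EVALUATING;
        int rc = eval(exprs[n], &memo[n]);
        if (rc != 0) return rc;
        state[n] = EVALUATED;
    }
    *out = memo[n];                                 /* case 2, or the end of case 1 */
    return 0;
}

int eval(const Node *t, int *out) {
    assert(t != NULL && out != NULL);
    switch (t->kind) {
    case N_CONST: *out = t->value; return 0;
    case N_VAR:   *out = vars[t->value]; return 0;
    case N_REF: {
        int n;
        int rc = eval(t->arg, &n);   /* the index is computed at run time */
        if (rc != 0) return rc;
        return eval_top(n, out);
    }
    }
    return ERR_RANGE;   /* not reached for well-formed trees */
}

/* Call right after every variable assignment. */
void reset_states(void) {
    for (int i = 1; i <= num_exprs; i++) state[i] = UNEVALUATED;
}
```

Routing the evaluation of each newly entered line through eval_top as well means that a line referring to itself is caught by the same check as a longer cycle.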

Your code should work at the very least on the examples given above, but these are not the only tests we will run.

Detailed guidelines for testing and grading

We will grade your assignment by running the hw2-test.pl Perl script (in the hw2 directory) on it on one of the department's Linux machines using the test files in the test subdirectory. I highly recommend that you run this script yourself on your own code in advance to see what to expect and to make corrections if necessary. Run it on one of the department's Linux machines, using the "--self-test" option as with Homework 1 (see below). The hw2 directory also includes the comment files that the script produces on my own solution code, so you can see what the script will say for a perfect, full-credit submission.

How to run the script on your code

Make sure the script file is executable by you. Typing
chmod 755 hw2-test.pl
should do the trick.
Edit the line (near the top) in hw2-test.pl starting with "$testSuiteDir =" to assign to this variable the absolute pathname of the test subdirectory on your system.
Create a new directory, calling it anything you like, e.g., "foobar":
mkdir foobar
Copy the contents of the directory containing your program files into foobar so that, for example, ".../foobar/parse.y" and ".../foobar/main.c" etc. exist. (Although the script should not harm your program files, it does muck about in the directory a bit, and accidents/bugs do happen, which is why you should run it on a copy of your program, keeping your original program in a separate directory as a back-up.)
Now you're ready to run the script. If the script is installed in some directory in your PATH variable, you can just type
hw2-test.pl --self-test foobar
Otherwise, go to the directory containing the script and type
./hw2-test.pl --self-test <path to foobar>
If all goes well, the script will create a text file comments.txt in the foobar directory. This file contains all the script's diagnostic output, including a summary of your program's behavior on each test file at the end.
You may correct your code, copy it to the foobar directory, and re-run the script any number of times as needed.

What the script does (and what it means to you)

After running the script, look at the file "comments.txt" in the foobar directory. Near the bottom (below the hash marks) is a summary of your program's behavior on all the test files. Here's the bottom line: If you see the text "problem(s) found", you will not get credit for that particular test file. (This assumes that your program compiles correctly using "make" -- see below; otherwise no credit will be given.) The script first runs "make clean" in your directory, checks that no files that should have been deleted still exist, then runs "make see". If that fails (i.e., the executable "see" does not exist), it then assumes that you have implemented the alternative programs "see1" and "see2". It runs "make see1" and "make see2" and checks them separately. Assuming you have one program "see", the script runs it repeatedly with the command
./see < <test_file>.in > <test_file>.out 2> <test_file>.err
where <test_file> ranges over the base names of the files in /class/csce531-001/handouts/hw2/test (p4-correct1, p4-error1, etc.). Your program reads the contents of <test_file>.in from stdin, and the text your program sends to stdout is captured in the file <test_file>.out. Any text your program sends to stderr is captured in the file <test_file>.err. The script handles the test files differently according to their names. Test file names start with "p4", "p5", "p6", and "p7", depending on which problem is being tested (Problems 4, 5, 6, and 7, respectively). If you have two programs see1 and see2, then see1 will only be run on "p4" files, and see2 will be run on the rest. If a file name does not contain the substring "error", then it is meant to be processed normally without any errors. In this case, the .err file is ignored, and the .out file generated is compared with the corresponding .out file in the hw2/test directory. For you to get credit for correctly handling this test file, THESE FILES MUST MATCH EXACTLY. There are only two possibilities here: credit or no credit; partial credit will not be given. In addition, if the file name contains the substring "-timed", then your program will be given 10-11 seconds to run and must quit within that time to get credit (this tests Problem 6).

Error-handling

If a filename contains "error" as a substring, then it is meant to test your program's error-handling abilities. In this case, the .out file will be ignored and only the .err file will be examined. All errors for Problem 4 are simple syntax errors. If your grammar is correct, then by default, bison will make your program produce the error message "syntax error" for each of these files. DO NOT CHANGE THIS DEFAULT BEHAVIOR. The script will check that this is exactly the error message your program produces, and nothing else. Error files for Problems 5 and 7 are grammatically correct, but produce semantic, "run-time" errors, such as out-of-range and circular reference errors. Below are the error messages my own program produces. Your error messages should be at least as informative as these, and they must be appropriate. We will evaluate them with our own eyes.

p5-error1: "index 0 out of range"
p5-error2: "index 3 out of range"
p5-error3: "index 2 out of range" ("circular reference found" is also acceptable)
p7-error1: "index 3 out of range"
p7-error2: "index 0 out of range"
p7-error3: "circular reference found"
p7-error4: "circular reference found"

Since none of these are syntax errors, your error message should not say "syntax error".

Other test files

We may run your program on other test files besides those currently in the hw2/test directory.

Problem 3

I have not yet found a way to automate testing of Problem 3. I may add something to the script later to do that.

Reporting bugs

If you think you've found a bug in the script, please let me know ASAP, for your sake and the sake of your classmates. Thanks.

How to submit

Assemble your source files in a directory called "hw2". In this directory go "Makefile", "parse.y", "scan.l", "main.c", and any other .c and .h source programs you may have created for modularity's sake. (I will allow extra C source files for this assignment; just remember that if you create a new C source file, you must alter the Makefile to accommodate it.)
Include in the directory a file named "README", written in plain ASCII text, describing briefly which parts of the homework you got to work and which parts, if any, you failed to get to work. There should be NO OTHER FILES in the directory (for example, .o files, executables, files automatically generated by flex or bison, or test input/output files).
Go to the parent directory and run
tar cvf hw2.tar hw2
gzip hw2.tar
Upload the resulting file "hw2.tar.gz" to the "SEE" section in CSE Dropbox.

Your tar file and directory names must match those above exactly (with case sensitivity), otherwise you will get points off.

FAQs

Some rules seem to be missing from the Makefile. Shouldn't there be a rule for building main.o from main.c, for example? Some common rules are implicit and do not need to be specified in the Makefile. For example, if it needs to, make will automatically (re)build some_name.o from some_name.c using
$(CC) $(CPPFLAGS) $(CFLAGS) -c -o some_name.o some_name.c
There are a number of other implicit rules, and you can even define some yourself. See the section on implicit rules in the GNU make manual for more details.

This page was last modified Saturday February 16, 2019 at 09:50:51 EST.
