CSCE 531
Spring 2019
Homework 2
Due 11:59pm, Tuesday, March 5, 2019
Reading
On a Linux machine, at the prompt, type
info bison
just as you did for flex in Homework 1. (If you want more information
about navigating through info, type "info info".)
An authoritative web site devoted to bison is maintained by GNU:
http://www.gnu.org/software/bison
containing downloads and full documentation.
General submission guidelines
In general, all exercises that I assign throughout the course, except
drawings, are to be submitted through CSE Dropbox (go to
https://dropbox.cse.sc.edu).
Drawings (if any)
may, but are not required to, be submitted by dropbox as well.
Alternatively, drawings may be submitted in class.
No late submissions will be accepted for this assignment.
Homework
This assignment is to be done individually. No teams. The rules
about collaboration and plagiarism apply here just as with Homework 1.
In this homework you are to modify an existing bison program for a
Simple Expression Evaluator (SEE). You should do these exercises in
order, except that items 5 and 6 may be done independently. You will
write your programs in flex, bison, and C only.
- (0 points)
Download the file hw2.zip to your own directory
and unzip it, producing a subdirectory named hw2. Go into that
directory (cd hw2) and type
make
then type
./see
then type the lines
x = 5
y = 2*x - 3
(x+y)/3
[Ctrl-D]
and observe the output. (There is also an executable file
see-solution, which is a complete solution to the assignment. You may
run that program as well.) To see
more examples, you can also run
./see < sample-input.txt
Running bison with the options given in the Makefile produces a file
y.output, which describes the parser created from the grammar in
detail. You should browse through it, but I don't expect you to
understand the stuff regarding states. We'll cover it later.
- (0 points)
Edit main.c, changing the line
yydebug = 0;
to read
yydebug = 1;
Then repeat all the steps in the last item. Now you get messages
describing what the parser is doing at each step (i.e., shifting what
token or reducing by what production). You may then return this line in
main.c to its original content.
- (30 points)
Currently, yylex() is written by hand in the utilities section of
parse.y. Write a flex program named scan.l that handles the lexical analysis
in the same way. Running flex on scan.l will then produce the needed
yylex() function, so you remove the hand-written one. Note that
variable names in SEE are single letters and case insensitive. (All
attributes are integers, which is the default.) Edit the Makefile to
accommodate scan.l (which includes adding lex.yy.o to the list of
OBJECTS). To let bison and flex work together, you need to
do two things:
- Use the "-d" option when running bison. This (along with the "-y"
option) generates the header file y.tab.h for use by scan.l. (Edit
the Makefile.)
-
Insert the line
#include "y.tab.h"
inside the %{ and %} delimiters in the preamble of scan.l. (Note:
if you include other files in your scan.l preamble, you should place
them above the include directive for y.tab.h, so they are
included first; otherwise, you may get gcc compiler errors.)
-
Update your Makefile to include the new files and their dependencies:
The final executable now also depends on lex.yy.o, which depends on
lex.yy.c, which in turn depends on both scan.l and y.tab.h. Add
y.tab.h as an additional target along with y.tab.c.
- (30 points)
Alter the grammer in parse.y to allow for unary prefix + and - (the first
operator does nothing, the second negates its argument). The syntax
rules are as follows: The first term in an expression may optionally
be preceded with a single '+' or '-', and that is the only place it
can appear. The operator applies to the whole first term. This gives
unary + and - the same precedence and associativity as their
binary counterparts. Thus for example,
- - 2*x + 5 is evaluated as (-(2x))+5,
- - - 3 is a syntax error, but -(-3) is ok,
- 3*-x is a syntax error, but 3*(-x) is ok.
In particular, you cannot apply two of these operators in
a row without anything in between. NOTE: This is how Pascal parses
these operators, but not how
C/C++/Java handles them. In C/C++/Java, all unary operators have
precedence higher than all binary operators.
- (40 points)
Note: If you decide to implement the last task (full extra credit),
then you will need to rewrite some of the code you use for this task.
If you implement the last two tasks fully, you will also get credit for
this task as well.
We now want to remember each expression entered (the expression
itself, not its value) and be able to refer to it by number. The
first expr entered has number 1, the second has number 2, etc. The
idea is to be able to re-evalate expressions with variables
after the variables change. Here is a sample session using my
renamed executable "see2". Each number
followed by a colon is a computer-generated prompt giving the number of the
expression to be entered.
1: x = 5
5
2: y = 2*x - 3
7
3: x = 13
13
4: y = #2
23
5: y
23
6: Ctrl-D
The string '#2' is an expression referring to the expression entered
on input line 2, i.e., 2*x - 3. Note that the value assigned to y
on input line 4 is now 23, reflecting the new value of x used in the
expression '#2'.
Edit parse.y so that syntax trees are created as attributes, rather
than integer values. (For this you will need a %union declaration in
parse.y; see the example below.) Keep the completed syntax trees for
the exprs on a list. Please do not put all your code in parse.y this
time. Make your actions reasonably short (no more than about five or
six lines, say, and fewer is better), calling high-level tree-building
functions. Implement these functions in a separate file, tree.c, with
an accompanying tree.h to export the definitions there. Include
tree.h in the preamble of parse.y and also in the preamble of scan.l
before the inclusion of y.tab.h. (This is because y.tab.h
includes the attribute union you declared in parse.y, but does not
include any of the typedef's that support it. Also note that the type
of yylval is now the union you declared in parse.y.)
Add a new production for factor so that a factor can expand into '#'
followed by a CONST (there could be whitespace in between). The
corresponding action should assign to the parent nonterminal (a
pointer to) the nth syntax tree on the list, where n is the
value of the CONST. If n is out of range, give an error message and
quit the program. Don't create a duplicate tree, just a
reference to the existing tree. Thus, expressions may be shared.
The actions for the productions
assign
: VAR '=' expr
| expr
;
must now each evaluate the tree given as the expr attribute by
post-order traversal and pass its (integer) value to $$ to be printed as
before.
You must also store the expression tree ($3 in the first production above;
$1 in the second) in a global list of expression trees. Note that, in the
case of the first production, you don't store the whole assignment as
an expression, just the right-hand side. The actual update of the
variable only occurs once, i.e., you don't "reassign" as a result of
evaluating a #-reference. For example:
1: x = 5
5
2: x = 17
17
3: #1
5
4: x
17
Put your definition(s) of the tree-evaluating function(s) in a
separate file eval.c, with an accompanying eval.h. (It's alright by
me if you include "tree.h" inside eval.h.) Don't forget to update the
Makefile to accommodate your new source files.
- (Graduate students: 20 points for full credit; Undergrads: 20
points extra credit)
To evaluate these expressions efficiently, one must be
sure not to evaluate a shared node more than once in a traversal.
This precaution can prevent
an exponential slow-down, so it really should be done. For example,
consider
1: 2
2
2: (#1 + #1) % 11
4
3: (#2 + #2) % 11
8
4: (#3 + #3) % 11
5
5: (#4 + #4) % 11
10
6: (#5 + #5) % 11
9
(etc.)
Suppose we continue this pattern. Then a naive evaluation of the
expression on line n takes Θ(2n) time, but if we "memoize"
(or "cache") our intermediate results we can evaluate it in time
Θ(n).
You only need to memoize values of expressions of the form `#n',
that is, values of top-level expressions, one per input line. You do
not need to detect or memoize common subexpressions within the same
expression. Every time any variable is reassigned, you should "erase"
all your currently memoized values, because the
reassignment may render these values obsolete. The next evaluation
would then start from scratch.
- (30 points extra credit for everybody)
Currently, an expression of the form #n requires n to be an
integer constant, and so the top-level expression it refers to is known
"at compile time." Alter the grammar and your syntax-directed
translation so that n itself can be an expression (thus evaluated "at run
time"). Here, "at compile time" means as the syntax tree is being
built, and "at run time" means as the syntax tree is being evaluated.
You now must accommodate '#' as a new unary prefix operator in
your syntax trees. It should have precedence higher than all the
other operators except parentheses. Thus, for example, the expression
#(#x*2)*##(2*y)-4
yields the same syntax tree as the fully parenthesized
((#((#x)*2))*(#(#(2*y))))-4
Now that #-references are not known until run time, you cannot
hard-link the subtree corresponding to the reference into the
expression tree you are building, but instead, treat # as another
unary operator, and do a table look-up at run time to determine which
tree to evaluate. For this, you will need to keep the list of trees
in some data structure that allows fast look-up by index -- a
resizable array, for example.
Now that #-references are changeable dynamically, we have the
possibility of getting circular references. For example,
1: x = 1
1
2: #x
1
3: x = 2
2
4: #2
The evaluation on line 4 now goes
#2 -> #x -> #2 -> #x -> #2 -> ...,
and the recursion will never end. You need to detect circular
references. Upon encountering a circular reference, print an error
message and quit the program.
Here is a natural way of detecting circular references. It is done in
conjunction with memoizing:
- Each new top-level tree is flagged as "unevaluated" when it is
created (but before evaluation starts).
- When evaluating a top-level tree:
- When happening upon a #-reference to an unevaluated tree, flag
the tree as "currently evaluating", and then (recursively) evaluate
the tree. When this recursive evaluation returns with a value, flag
the tree as "evaluated", storing the value there for future
evaluations (i.e., memoizing) and returning that value to the parent call.
- When happening upon a #-reference to an evaluated tree, just
return the value stored there without recursing.
- You find a circular reference whenever you happen upon a #-reference
to a currently evaluating tree.
- Whenever a variable is assigned a value, reset all the existing top-level
trees to "unevaluated" right afterwards, because their values are
potentially obsolete.
Your code should work at the very least on the examples given above, but
these are not the only tests we will run.
Detailed guidelines for testing and grading
We will grade your assignment by running the hw2-test.pl Perl
script (in the hw2 directory) on it on one of the department's Linux
machines using the test files in the test
subdirectory. I highly recommend that you run this script
yourself on your own code in advance to see what to expect and to make
corrections if necessary. Run it on one of the department's linux
machines, using the "--self-test" option as with Homework 1 (see
below). The hw2 directory also includes the comment files that the
script produces on my own solution code, so you can see what the
script will say for a perfect, full-credit submission.
How to run the script on your code
-
Make sure the script file is executable by you. Typing
chmod 755 hw2-test.pl
should do the trick.
-
Edit the line (near the top) in hw2-test.pl starting with
"$testSuiteDir =" to assign to this variable the absolute pathname
of the test subdirectory on your system.
-
Create a new directory, calling it anything you like, e.g., "foobar":
mkdir foobar
copy the directory containing your program files into
foobar so that, for example, ".../foobar/parse.y" and
".../foobar/main.c" etc. exist. (Although the script should not
harm your program files, it does muck about in the directory a bit, and
accidents/bugs do happen, which is why you should run it on a copy of
your program, keeping your original program in a separate directory as a
back-up.)
-
Now you're ready to run the script. If the script is installed in some
directory in your PATH variable, you can just type
hw2-test.pl --self-test foobar
Otherwise, go to the directory containing the script and type
./hw2-test.pl --self-test <path to foobar>
If all goes well, the script will create a text file
comments.txt in the foobar directory. This file contains all the
script's diagnostic output, including a summary of your program's
behavior on each test file at the end.
-
You may correct your code, copy it to the foobar directory, and
re-run the script any number of times as needed.
What the script does (and what it means to you)
After running the script, look at the file "comments.txt" in the
foobar directory. Near the bottom (below the hash marks) is a summary
of your program's behavior on all the test files. Here's the bottom
line: If you see the text "problem(s) found", you will not get credit
for that particular test file. (This assumes that your program compiles
correctly using "make" -- see below; otherwise no credit will be
given.)
The script first runs "make clean" in your directory, checks that no
files that should have been deleted still exist, then runs "make see".
If that fails (i.e., the executable "see" does not exist), it then assumes
that you have implemented the alternative programs "see1" and "see2".
It runs "make see1" and "make see2" and checks them separately.
Assuming you have one program "see", the script runs it repeatedly
with the command
./see < <test_file>.in > <test_file>.out 2> <test_file>.err
where <test_file> ranges over the base names of the files in
/class/csce531-001/handouts/hw2/test (p4-correct1, p4-error1, etc.).
Your program reads the contents of
<test_file>.in from stdin, and the text your program sends to stdout is
captured in the file <test_file>.out. Any text your program sends to
stderr is captured in the file <test_file>.err.
The script handles the test files differently according to their names.
Test file names start with "p4", "p5", "p6", and "p7", depending on
which problem is being tested (Problems 4, 5, 6, and 7,
respectively). If you have two programs see1 and see2, then see1 will
only be run on "p4" files, and see2 will be run on the rest.
If a file name does not contain the substring "error", then it is meant
to be processed normally without any errors. In this case,
the .err file is ignored, and the .out file generated is compared with
the corresponding .out file in the hw2/test directory.
For you to get credit for correctly handling this test file, THESE FILES
MUST MATCH EXACTLY. There are only two possibilities here: credit or no
credit; partial credit will not be given. In addition, if the file name
contains the substring "-timed", then your program will be given 10-11
seconds to run and must quit within that time to get credit (this tests
Problem 6).
Error-handling
If a filename contains "error" as a substring, then it is meant to test your
program's error-handling abilities. In this case, the .out file will
be ignored and only the .err file will be examined. All errors for Problem
4 are simple syntax errors. If your grammar is correct, then by
default, bison will make your program produce the error message
"syntax error" for each of these files. DO NOT CHANGE THIS DEFAULT
BEHAVIOR. The script will check that this is exactly the error message
your program produces, and nothing else.
Error files for Problems 5 and 7 are grammatically
correct, but produce semantic, "run-time" errors, such as out-of-range
and circular reference errors. Below are the error messages my own
program produces. Your error messages should be at least as informative
as these, and they must be appropriate. We will evaluate them with our own
eyes.
- p5-error1: "index 0 out of range"
- p5-error2: "index 3 out of range"
- p5-error3: "index 2 out of range" ("circular reference found" is
also acceptable)
- p7-error1: "index 3 out of range"
- p7-error2: "index 0 out of range"
- p7-error3: "circular reference found"
- p7-error4: "circular reference found"
Since none of these are syntax errors, your error message should not say
"syntax error".
Other test files
We may run your program on other test files besides those currently in
the hw2/test directory.
Problem 3
I have not yet found a way to automate testing of Problem 3. I may add
something to the script later to do that.
Reporting bugs
If you think you've found a bug in the script, please let me know ASAP,
for your sake and the sake of your classmates. Thanks.
How to submit
-
Assemble your source files in a directory called "hw2". In this
directory goes "Makefile", "parse.y", "scan.l", "main.c", and any
other .c and .h source programs you may have created for modularity's sake.
(I will allow extra C source files for this assignment; just remember
that if you create a new C source file, you must alter the Makefile to
accommodate it.)
-
Include in the directory a file named "README", written in plain ASCII
text, describing briefly which parts of the homework you got to work and
which parts, if any, you failed to get to work.
There should be NO OTHER FILES in the directory (for
example, .o files, executables, files automatically generated by flex
or bison, or test input/output files).
-
Go to the parent directory and run
tar cvf hw2.tar hw2
gzip hw2.tar
-
Upload the resulting file "hw2.tar.gz" to the "SEE" section in
CSE Dropbox.
Your tar file and directory names must match those above exactly (with case
sensitivity), otherwise you will get points off.
FAQs
- Some rules seem to be missing from the Makefile. Shouldn't
there be a rule for building main.o from main.c, for
example?
Some common rules are implicit and do not need to be specified in the
Makefile. For example, if it needs to, make will automatically (re)build
some_name.o from some_name.c using
$(CC) $(CPPFLAGS) $(CFLAGS) -c -o some_name.o some_name.c
There are a number of other implicit rules, and you can even define
some yourself. See
this
section of the GNU make manual for more details.
This page was last modified Saturday February 16, 2019 at 09:50:51 EST.