Process C global variable declarations. This involves both installing the declarations into the symbol table and allocating memory for the variables in the assembly language output file. Also, after all declarations have been processed, you should dump the symbol table (using st_dump() from symtab.h); to do this, run your executable with the "-d" or "--dump" option as a command line argument.
Your compiler should read C source code from stdin and write the x86 assembly language output to stdout. Your compiler executable should be called pcc3. You will not have to emit assembly code explicitly, but rather call appropriate routines in the back end (backend-x86.c and backend-x86.h). Besides altering the gram.y file, put syntax tree-building functions into a new file tree.c, with definitions for export in tree.h. Put code-generating routines into a new file encode.c, with definitions for export in encode.h. With few exceptions throughout the project, all backend routines are called from encode.c (some may be called directly from the grammar). No backend routines should be called from tree.c, hence you will not need to include backend-x86.h in tree.c.
The scores given below are for graduate students. Undergraduates get a 10% boost overall.
To receive 80% of the credit: You must be able to process the following basic type specifiers: int, char, float, and double. You may limit the syntax so that only one type specifier may be given per declaration. You must also be able to handle pointer and array type modifiers. You may limit the syntax so that array dimensions must always be given. You may assume the dimension given will always be an unsigned integer constant. Each declaration should include an identifier (id). If not, an error should be issued. A symbol table entry should be made for each id. The entry should indicate the type of the declaration. Routines for building and analyzing types are in the types module (types.h) and bucket module (bucket.h), and routines for manipulating the symbol table are in the symbol table module (symtab.h). You are required to use these modules, but you are not allowed to modify them. For more on these and the other modules, see the Resources section, below.
To receive 90% of the credit: In addition to obtaining the 80% level, you should also allow multiple type specifiers per declaration. You should handle the additional specifiers signed, unsigned, short, and long. You should add the necessary semantic checks and error messages to support multiple type specifiers (e.g., short short, unsigned double, et cetera are illegal). You should also add the function type modifier. You should add the necessary semantic checks and error messages to support function modifiers (it is illegal for a function to return a function, for example). Only "old style" functions need to be supported at this level, that is, with no parameter list between parentheses.
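As a rough illustration of the specifier checks at this level, here is a minimal sketch. The enum and the function specs_are_legal() are hypothetical names invented for this example; they are not part of types.h or bucket.h. The sketch assumes C89 rules, under which "long double" is legal and "long long" is not. Run the solution pcc3 on small inputs to confirm which combinations it accepts.

```c
#include <stdbool.h>

/* Hypothetical specifier codes, for illustration only. */
enum spec { SP_INT, SP_CHAR, SP_FLOAT, SP_DOUBLE,
            SP_SIGNED, SP_UNSIGNED, SP_SHORT, SP_LONG, SP_COUNT };

/* Returns true if the n specifiers form a legal combination (C89 rules assumed). */
bool specs_are_legal(const enum spec *specs, int n)
{
    int count[SP_COUNT] = {0};
    for (int i = 0; i < n; i++)
        count[specs[i]]++;

    /* No specifier may be repeated (rules out "short short"). */
    for (int s = 0; s < SP_COUNT; s++)
        if (count[s] > 1)
            return false;

    int base = count[SP_INT] + count[SP_CHAR] + count[SP_FLOAT] + count[SP_DOUBLE];
    if (n == 0 || base > 1)
        return false;
    if (count[SP_SIGNED] && count[SP_UNSIGNED])
        return false;
    if (count[SP_SHORT] && count[SP_LONG])
        return false;

    bool sign = count[SP_SIGNED] || count[SP_UNSIGNED];
    bool size = count[SP_SHORT] || count[SP_LONG];

    if (count[SP_FLOAT])
        return !sign && !size;                 /* float takes no modifiers   */
    if (count[SP_DOUBLE])
        return !sign && !count[SP_SHORT];      /* only "long double" allowed */
    if (count[SP_CHAR])
        return !size;                          /* signed/unsigned char only  */
    return true;  /* int, either explicit or implied by sign/size specifiers */
}
```

Counting the specifiers first keeps the check independent of the order in which they appear, so "unsigned long int" and "long unsigned int" are treated the same.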
To receive 100% of the credit: In addition to obtaining the 90% level, you should also allow parameters in function declarations. You should insist that each parameter declaration includes an id (else semantic error). The possible parameter types are the same as described in the previous levels, including pointers, arrays, and functions. You should also support the void return type for a function. A parameter may be a reference parameter, e.g.,
int f(int& a);
void g(int (&a)[5]);
This is the only aspect of the language that is not part of C. You can assume that any "&" appears only once in a parameter declaration, and only modifies the complete parameter type (so, for example, you will never see int h(int&* a);). You can also assume that any parameter of function type has no parameter declarations of its own (you will only see "old style" function types as parameters).
The semantic errors you should check for at this level are that each parameter declaration must include an id, and that the same id should not appear more than once in the same parameter list.
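The duplicate-id check can be sketched as follows. This is only an illustration: it assumes the parameter names have already been collected into an array of strings, and first_duplicate_param() is a hypothetical helper, not a routine from the provided modules. Your own code will more likely walk whatever parameter list structure you build.

```c
#include <string.h>

/* Returns the index of the first parameter whose name repeats an
 * earlier one, or -1 if all n names are distinct. */
int first_duplicate_param(const char *const names[], int n)
{
    for (int i = 1; i < n; i++)
        for (int j = 0; j < i; j++)
            if (strcmp(names[i], names[j]) == 0)
                return i;
    return -1;
}
```

The quadratic scan is adequate here because parameter lists are short.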
To receive 110% of the credit (that is, 10% extra credit): In addition to obtaining the 100% level, you should also be capable of processing initializers. You may assume that the initializing expressions will only be unsigned constants. You should support initializations of arrays, including multidimensional arrays. For multidimensional arrays "the brace-enclosed list of initializers should match the structure of the variable being initialized" (to quote Harbison & Steele, "C: A Reference Manual"). Arrays may be incompletely initialized; fill remaining slots with zeros. You do not have to support the initialization of arrays with string literals. You also may assume that pointers will only be initialized to zero. Be sure to consider semantic errors: wrong number of initializers, wrong type, etc.
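For instance, the following declarations are the kind of input the 110% level is meant to handle. The variable names are made up for this example, and the comments describe what standard C requires; check the solution pcc3 for the exact error messages it issues in the rejected cases.

```c
/* Each brace-enclosed sublist initializes one row of the array. */
int grid[2][3] = { {1, 2}, {4} };   /* same as { {1, 2, 0}, {4, 0, 0} }  */
int vec[5]     = { 7, 8 };          /* remaining three slots are zero    */
char *ptr      = 0;                 /* pointers are only set to zero     */

/* Declarations like these should draw a semantic error:
 *   int bad[2] = { 1, 2, 3 };          too many initializers
 *   int worse[2][2] = { {1, 2, 3} };   too many initializers in one row
 */
```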
The x86 (actually 32-bit i386) assembly code to be emitted for this assignment is generated automatically by calling functions in backend-x86.c, which I will discuss briefly in class.
At all levels you are responsible for detecting duplicate declarations. At the 100% level, you must also detect duplicate declarations in parameter lists.
Your compiler should be capable of detecting multiple semantic errors in one file. You can make arbitrary decisions about how to proceed when errors occur (for instance, with a duplicate declaration you might decide to ignore the second declaration). The important point is to do something so you can proceed (without causing a later segmentation fault during compilation).
You may allow the compiler to stop processing with the first syntax error. A syntax error is defined with respect to the distributed grammar (gram.y, see next paragraph).
The file proj_src.zip unzips to a directory proj_src containing the base files for the project. The base files include:
Do not alter the module files (backend-x86.*, symtab.*, types.*, bucket.*, message.*, main.c). If you feel the need to alter one of these files, then there is a problem, either with your code or with ours. If you think there is a bug in the code we gave you, then you are probably wrong, but please let us know anyway. If necessary, we will issue updates in a timely fashion.
You should copy the files in proj_src into a sibling directory named proj1, where you develop your code.
The file proj1test.zip unzips to a directory proj1test that contains files for testing. These test files are C source (.c) files with names starting with the letter "T". Generally, test file names follow a regular pattern:
"T"[1-4]"L"[0-9]+"_"(err|ok)".c"
The digit after the "T" indicates the project installment. The number between the "L" and the underscore indicates which level is being tested. The text between the underscore and the period indicates whether or not the file contains errors. The "err" files are used for testing your compiler's error reporting, and the "ok" files are to test your compiler's actual translation of well-formed C code. The .s and .err files are the "officially correct" outputs of the compiler to stdout and stderr, respectively, and are used for comparisons when running the test script (see below).
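If you write your own test driver, the naming pattern above can be decoded with a few lines of C. The sketch below is only an illustration. parse_test_name() is a hypothetical helper, not part of the distributed files or of proj1-test.pl.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if name matches "T"[1-4]"L"[0-9]+"_"(err|ok)".c".
 * The three outputs are written only when the match succeeds. */
bool parse_test_name(const char *name, int *installment, int *level,
                     bool *expects_errors)
{
    const char *p = name;

    if (*p++ != 'T')
        return false;
    if (*p < '1' || *p > '4')
        return false;
    int inst = *p++ - '0';

    if (*p++ != 'L')
        return false;
    if (!isdigit((unsigned char)*p))
        return false;                 /* at least one digit is required */
    int lvl = 0;
    while (isdigit((unsigned char)*p))
        lvl = lvl * 10 + (*p++ - '0');

    if (*p++ != '_')
        return false;

    bool err;
    if (strcmp(p, "err.c") == 0)
        err = true;
    else if (strcmp(p, "ok.c") == 0)
        err = false;
    else
        return false;

    *installment = inst;
    *level = lvl;
    *expects_errors = err;
    return true;
}
```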
These are not necessarily the only test files that will be used when grading, so do your own testing too.
The proj_src and proj1test directories should be subdirectories of the same parent. Do not nest one inside the other.
The proj1test directory also contains the file "pcc3", a working executable solution to the entire project. With one exception (see below), your compiler's output (both to stdout and stderr) must match the output of the solution pcc3 on the same test file. Also in the proj1test directory is the Perl script proj1-test.pl, which will be used for grading. You may run it yourself with the "--self-test" option, but beforehand, you will need to hand-edit the script near the top to point to the common parent directory of proj1 and proj1test.
The 80% level functionality will be needed in order to do later parts of the project, so be sure you at least get that much of the assignment completed.
As with previous assignments, we will grade in a mostly automated fashion, using the Perl script proj1-test.pl. This script first attempts to compile your compiler by executing the "make" command. If the make succeeds, then it will execute your compiler on each of the test files (redirected to stdin), capturing your compiler's output to stdout and stderr separately, and comparing your output with that of the official solution. Running this script yourself is your best determination of how you will score. Generally, anything short of files matching exactly will cause points to be subtracted. However, there are three important things to consider:
The solution executable pcc3 reads C code from standard input, writes assembly code to standard output, and writes error messages to standard error. If given the "-d" or "--dump" option, it also dumps the symbol table to stderr at the end of compilation. (If given the "--dump-all" option, it will also dump the symbol table after every batch of local declarations, as well as at the end of compilation. This will be useful when testing future installments of the project.) You may do whatever you wish with this program (it may be useful to run it on tiny C programs to see what it produces).
The official platform for your compiler development is the Linux machines in our department (e.g., l-1d43-01.cse.sc.edu, l-1d43-02.cse.sc.edu, and the like). You may develop code on another platform (GNU/Linux/Unix-like is heavily preferred; I strongly recommend against using Windows), but you must make sure your program ultimately compiles and runs correctly on the official platform, because the testing script proj1-test.pl will evaluate your compiler on this platform when grading.
WARNING: Porting your code from one platform to another can be an unexpectedly time-consuming task. You should NEVER wait until the last minute to do this. If you develop on a separate platform, you should test your code frequently on the official platform to guard against unpleasant surprises. There will be no extra consideration for projects submitted late because of porting issues (e.g., wrong version of gcc, wrong include directories, etc.). You have been warned.
This is merely a suggestion. The first thing you should do as a team is to explore and understand the base code given to you, as well as getting a better feel for the C language and its syntax. All team members should do this, but if the team has more than one person, it may help if, say, one team member studies the grammar (gram.y) while another looks at the symbol table module (reading the comments in symtab.h), while another concentrates on the types module, etc., each reporting to the other team members.
A multiperson team should meet regularly -- at least once or twice a week, or if the class is during the summer, every day. Set up a schedule of meetings as soon as possible. Each team member should contribute substantially to the coding effort, and should also understand her or his teammates' contributions.
To receive full credit for the assignment, your team must submit it via CSE Dropbox (Moodle) no later than 11:59 p.m. on the due date. Late submissions will be accepted up to one week after the due date, with the penalties described in the syllabus. There should be only one submission per team; each team should designate one of its members to submit on behalf of the team. Any number of resubmissions is allowed up to the final deadline, and only the last submission will be graded. This will be true for future project installments as well.
You must turn in all source files (even the ones we gave you) and a Makefile for your compiler.
To turn in this assignment, follow these steps exactly. Any deviation from these instructions will get points taken off.
You get credit for features successfully implemented. You do not get credit for attempting to do something; you get credit for the things that you can successfully demonstrate work.
Work on and test your system incrementally and back up your system frequently, especially when the due time is approaching! Too many times in the past, a student made seemingly minor code changes to try to improve a stable system, only to find that the altered system crashed completely and was useless. They didn't back up the old system, and they didn't have time to undo the changes before the project was due. They weren't even sure they could remember what the changes were. FAIR WARNING: don't let this happen to you; you will not be given any leniency if this happens.
As always, you are expected to do your own work on this assignment, although this time, a team counts as a "single person".
Finally: you should adequately document and structure your program. Remember, you or your teammates may be called upon to explain this program orally during a subsequent quiz.
This list will probably be updated in the coming days in response to student queries.
After you fix the errors, be sure to remove or comment out all the calls to message(), msg(), and msgn() you added to track down the errors. You can and should keep the calls to bug() in the code permanently, however.