Overview
The overall goal of this sequence of projects is to build an interpreter for the version of LISP presented in class.
Project 1
For the first project, I build a lexical analyzer, a parser, and a printer. The input language for my parser is defined by the following grammar.
S-expression
Here
<Start> ::= <S-exp> <Start> | <S-exp> eof
<S-exp> ::= atom | ( <S-exp> . <S-exp> )
Here
An atom, represented by the atom terminal, is either a literal atom or a numeric atom. A literal atom is a non-empty sequence of digits and upper-case letters, starting with a letter. A numeric atom is a non-empty sequence of digits. Terminal eof is an artificial terminal representing the “end-of-input-file” event.
We define a class Sexp to denote S-expression tree.
1 | class Sexp { |
Lexical Analysis
The lexical analyzer (a.k.a. scanner) should process as input a sequence in ASCII characters and should produce a sequence of tokens that serve as input to the parser. The parser performs syntactic analysis (i.e., parsing) of the token stream. Getting the tokens is typically done on demand: the parser asks the scanner for the next token by calling getNextToken, in order to apply of some production from the grammar.
The scanner should read its input from stdin in Unix. The input contains a non-empty sequence of charac- ters. Guaranteed that the only characters will ever be seen in an input file are
- upper-case letters
- digits
- (
- .
- )
- white spaces (e.g., space, tab, end of line, etc.)
use java Interpreter < f1 > f2
to run an interpreter written in Java, which means to process a program written in file f1 and to write the output to file f2.
The main function of the scanner is getNextToken. It is repeatedly called by the parser. At a high level, getNextToken does the following:
- if the current character is a white space, reads it and any white spaces that follow it 2. if the current character is ‘(‘ returns token OpenParenthesis
- if the current character is ‘)’ returns token ClosingParenthesis
- if the current character is ‘.’ returns token Dot
- if the current character is letter/digit, reads it and all letter/digit characters that follow it. The re- sulting string is either a literal atom (e.g., “XY3Z”), a numeric atom (e.g., “3415”), or an error (e.g., “34XY”). In the first two cases, an Atom is returned back to the parser, together with all rel- evant information about the atom.
1 | class Lexical { |
Syntactic Analysis
The parser processes the stream of tokens produced by the scanner. For each parsed
Rather than building a parse tree for an S-expression, parser should build a binary tree that captures the structure of an S-expression. The leaves of this tree are atoms. Let T(S) be the binary tree representation of an S-expression S. Tree T(S) is defined as follows:
- if S is an atom, T(S) contains one node which is the atom itself
- if Sis(S1 .S2 ),the root of T(S) is anodewhose left child is the root of T(S1) and right childis the root of T(S2)
A simple way to build parser is to have two recursive functions ParseStart and ParseSexp. At a high level, the functions work as follows:
- Function ParseStart will call ParseSexp and then will check for end of file. If the end is reached, the parser will terminate. If not, ParseStart will call itself.
- Function ParseSexp will get the next token. If it is not Atom or OpenParenthesis, an error will be reported. If it is Atom, the function returns. If it is OpenParenthesis, the function will call itself, then will get the next token, report an error if it is not Dot, call itself again, get the next token, and report an error if it is not ClosingParenthesis.
While the parser is applying its productions, we build (incrementally) the corresponding bi-nary tree representation of the S-expression that is being parsed.
1 | class Parse { |
Output
All output should go to UNIX stdout. This includes error messages – do not print to stderr. For each input S-expression, pretty-print the expression followed by newline.
Some S-expressions are considered to be lists:
- the atom NIL is a list,
- if S2 is a list, so is ( S1 . S2 ) for any S1.
In my implementation, for each inner node in the binary tree for an S-expression, compute a bool- ean attribute isList. For a node n, n.isList is true if and only if the subtree rooted at n represents a list. This is needed for printing an input S-expression:
- If n.isList for every inner node n, print the entire expression using only list notation
- Otherwise, print the entire expression using only dot notation
class Printer {
void launch(Sexp sexp) {
if(Sexp.isListTree(sexp)) {
print(sexp, true);
}
else {
printRaw(sexp);
}
}
void print(List<Sexp> sexpList) {
int length = sexpList.size();
for(int i = 0; i < length; i++) {
print(sexpList.get(i));
System.out.print('\n');
}
}
void print(Sexp sexp, Boolean withPa) {
if(sexp == null) {
return;
}
if(sexp.isList) {
System.out.print("(");
print(sexp.left, true);
if(sexp.right != null && sexp.right.val != null && sexp.right.val.equals("NIL")) {
}
else {
System.out.print(" ");
print(sexp.right);
}
System.out.print(")");
}
else if(sexp.val != null) {
System.out.print(sexp.val);
}
}
void print(Sexp sexp) {
if(sexp == null) {
return;
}
else if(sexp.isList) {
print(sexp.left, true);
if(sexp.right != null && sexp.right.val != null && sexp.right.val.equals("NIL")){
}
else {
System.out.print(" ");
print(sexp.right);
}
}
else if(sexp.val != null) {
System.out.print(sexp.val);
}
}
void printRaw(Sexp sexp) {
if(sexp == null) {
return;
}
else if(sexp.isList) {
System.out.print("(");
printRaw(sexp.left);
System.out.print(" . ");
printRaw(sexp.right);
System.out.print(")");
}
else {
System.out.print(sexp.val);
}
}
}