2015-10-02

CSE 6341 Implement Lisp Interpreter with Java, Part 1

Overview

The overall goal of this sequence of projects is to build an interpreter for the version of LISP presented in class.

Project 1

For the first project, I build a lexical analyzer, a parser, and a printer. The input language for my parser is defined by the following grammar.

S-expression

Here <S-exp> represents an S-expression (“symbolic expression”), a key concept in LISP.

<Start> ::= <S-exp> <Start> | <S-exp> eof 
<S-exp> ::= atom | ( <S-exp> . <S-exp> )

Here <Start> and <S-exp> are non-terminals. There are five terminals: atom ( . ) eof

An atom, represented by the atom terminal, is either a literal atom or a numeric atom. A literal atom is a non-empty sequence of digits and upper-case letters, starting with a letter. A numeric atom is a non-empty sequence of digits. Terminal eof is an artificial terminal representing the “end-of-input-file” event.
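The atom rules above can be sketched as a small classifier. This is a minimal illustration, not part of the project code: the class name AtomKinds and the method classify are hypothetical, and the sketch assumes the input is a single run of characters with no white space.

```java
// Hypothetical helper for illustration only; not part of the project code.
final class AtomKinds {
    /** Returns "literal", "numeric", or "error" for one run of characters. */
    static String classify(String s) {
        if (s.isEmpty()) return "error";
        boolean startsWithLetter = isUpper(s.charAt(0));
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isDigit(c)) continue;
            // A letter is allowed only when the atom started with a letter.
            if (isUpper(c) && startsWithLetter) continue;
            return "error";
        }
        return startsWithLetter ? "literal" : "numeric";
    }

    private static boolean isUpper(char c) { return c >= 'A' && c <= 'Z'; }
    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    public static void main(String[] args) {
        System.out.println(classify("XY3Z")); // literal
        System.out.println(classify("3415")); // numeric
        System.out.println(classify("34XY")); // error
    }
}
```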

We define a class Sexp to represent the S-expression tree.

class Sexp {
	public Boolean isList;
	// public Atom atom;
	public String kind;
	public String val;
	public Sexp left;
	public Sexp right;
	Sexp() {
		this.isList = true;
		this.kind = null;
		this.val = null;
		this.left = null;
		this.right = null;
	}
	public static Boolean isListTree(Sexp sexp) {
		if(sexp == null) {
			return true;
		}
		else if(sexp.right == null) {
			return true;
		}
		else if(sexp.right.right == null) {
			// The right child is a leaf: it must be the atom NIL for a list.
			// "NIL".equals(...) also tolerates a leaf whose val is null.
			if(!"NIL".equals(sexp.right.val)) {
				return false;
			}
		}
		return isListTree(sexp.left) && isListTree(sexp.right);
	}
}

Lexical Analysis

The lexical analyzer (a.k.a. scanner) should process as input a sequence of ASCII characters and should produce a sequence of tokens that serve as input to the parser. The parser performs syntactic analysis (i.e., parsing) of the token stream. Getting the tokens is typically done on demand: the parser asks the scanner for the next token by calling getNextToken, in order to apply some production from the grammar.

The scanner should read its input from stdin in Unix. The input contains a non-empty sequence of characters. It is guaranteed that the only characters that will ever be seen in an input file are

upper-case letters
digits
(
.
)
white spaces (e.g., space, tab, end of line, etc.)

To run an interpreter written in Java, use java Interpreter < f1 > f2, which processes the program written in file f1 and writes the output to file f2.

The main function of the scanner is getNextToken. It is repeatedly called by the parser. At a high level, getNextToken does the following:

if the current character is a white space, reads it and any white spaces that follow it
if the current character is ‘(’, returns token OpenParenthesis
if the current character is ‘)’, returns token ClosingParenthesis
if the current character is ‘.’, returns token Dot
if the current character is a letter or digit, reads it and all letter/digit characters that follow it. The resulting string is either a literal atom (e.g., “XY3Z”), a numeric atom (e.g., “3415”), or an error (e.g., “34XY”). In the first two cases, an Atom is returned to the parser, together with all relevant information about the atom.
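The steps above can be sketched over an in-memory string instead of stdin. This is only an illustration under stated assumptions: ScannerSketch and tokenize are hypothetical names, tokens are plain strings, all white space (including end of line) is skipped, and an "EOF" marker is appended at the end.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical string-based scanner for illustration; not the project code.
final class ScannerSketch {
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; } // skip white space
            if (c == '(' || c == ')' || c == '.') {
                tokens.add(String.valueOf(c));
                i++;
                continue;
            }
            if (!isAtomChar(c)) {
                throw new IllegalArgumentException("Invalid input: " + c);
            }
            // Read the whole run of letters and digits.
            int start = i;
            while (i < input.length() && isAtomChar(input.charAt(i))) i++;
            String atom = input.substring(start, i);
            // A run that starts with a digit must contain digits only (34XY is an error).
            if (isDigit(atom.charAt(0)) && !atom.chars().allMatch(ch -> ch >= '0' && ch <= '9')) {
                throw new IllegalArgumentException("Invalid atom: " + atom);
            }
            tokens.add(atom);
        }
        tokens.add("EOF");
        return tokens;
    }

    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }
    private static boolean isAtomChar(char c) { return isDigit(c) || (c >= 'A' && c <= 'Z'); }

    public static void main(String[] args) {
        System.out.println(tokenize("(XY3Z . 3415)")); // [(, XY3Z, ., 3415, ), EOF]
    }
}
```

Working on a string keeps the sketch free of the mark/reset bookkeeping that reading from a stream needs in order to push back the one character read past the end of an atom.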

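// Imports needed by this class (place them at the top of the source file).
import java.io.File;
import java.io.InputStream;

class Lexical {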
	public File file;
	public String line;
	public InputStream reader = System.in;
	String getNextToken() {
		try {
			int tempchar;
			tempchar = reader.read();
			if(tempchar == -1) {
				return "EOF";
			}
			else if(tempchar == '\n') {
				return "EOL";
			}
			else if((char)tempchar == '(') {
				return "("; //OpenParenthesis";
			}
			else if((char)tempchar == ')') {
				return ")"; //ClosingParenthesis";
			}
			else if((char)tempchar == '.') {
				return ".";
			}
			else if(tempchar >= 'A' && tempchar <= 'Z') {
				String literalVal = new String();
				while((tempchar >= 'A' && tempchar <= 'Z') || (tempchar >= '0' && tempchar <='9')) {
					reader.mark(32);
					literalVal += String.valueOf((char) tempchar);
					tempchar = reader.read();
				}
				reader.reset();
				return literalVal;
			}
			else if(tempchar >= '0' && tempchar <= '9') {
				String numericVal = new String();
				while(tempchar >= '0' && tempchar <='9') {
					reader.mark(32);
					numericVal += String.valueOf((char) tempchar);
					tempchar = reader.read();
					if(tempchar >= 'A' && tempchar <= 'Z') { // a letter right after digits (e.g. 34XY) is an error
						System.out.println("Invalid Input: " + numericVal + String.valueOf((char) tempchar) + "...");
						System.exit(1);
					}
				}
				reader.reset();
				return numericVal;
			} 
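			else if(tempchar == ' ' || tempchar == '\t' || tempchar == '\r') {
				// Skip spaces, tabs, and carriage returns; '\n' is reported as EOL above.
				while(tempchar == ' ' || tempchar == '\t' || tempchar == '\r') {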
					reader.mark(32);
					tempchar = reader.read();
				}			
				reader.reset();
				return getNextToken();
			}
			else {
				System.out.println("Invalid Input: " + String.valueOf((char) tempchar) + "...");
				System.exit(1);
			}
        } catch (Exception e) {
            e.printStackTrace();
        }
        return "ERR";
	}
}

Syntactic Analysis

The parser processes the stream of tokens produced by the scanner. For each parsed <S-exp>, some result is printed, followed by a newline. The result depends on the project; for Project 1, the result is pretty-printing of the S-expression.

Rather than building a parse tree for an S-expression, the parser should build a binary tree that captures the structure of the S-expression. The leaves of this tree are atoms. Let T(S) be the binary tree representation of an S-expression S. Tree T(S) is defined as follows:

if S is an atom, T(S) contains one node which is the atom itself
if S is ( S1 . S2 ), the root of T(S) is a node whose left child is the root of T(S1) and whose right child is the root of T(S2)

A simple way to build the parser is to have two recursive functions, ParseStart and ParseSexp. At a high level, the functions work as follows:

Function ParseStart will call ParseSexp and then will check for end of file. If the end is reached, the parser will terminate. If not, ParseStart will call itself.
Function ParseSexp will get the next token. If it is not Atom or OpenParenthesis, an error will be reported. If it is Atom, the function returns. If it is OpenParenthesis, the function will call itself, then will get the next token, report an error if it is not Dot, call itself again, get the next token, and report an error if it is not ClosingParenthesis.

While the parser is applying its productions, we build (incrementally) the corresponding binary tree representation of the S-expression that is being parsed.
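The recursive structure of ParseSexp described above can be sketched in isolation. This is a minimal sketch under assumptions: ParserSketch, Node, and dotted are hypothetical names, tokens arrive as an iterator of strings rather than from the scanner, and errors are raised as exceptions instead of being printed.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical recursive-descent sketch; not the project code.
final class ParserSketch {
    /** Binary tree node: a leaf carries an atom, an inner node carries two children. */
    static final class Node {
        final String atom;      // non-null only for leaves
        final Node left, right; // non-null only for inner nodes
        Node(String atom) { this.atom = atom; this.left = null; this.right = null; }
        Node(Node left, Node right) { this.atom = null; this.left = left; this.right = right; }
        String dotted() {
            return atom != null ? atom : "(" + left.dotted() + " . " + right.dotted() + ")";
        }
    }

    /** Implements <S-exp> ::= atom | ( <S-exp> . <S-exp> ) */
    static Node parseSexp(Iterator<String> tokens) {
        String t = next(tokens);
        if (t.equals("(")) {
            Node left = parseSexp(tokens);
            expect(tokens, ".");
            Node right = parseSexp(tokens);
            expect(tokens, ")");
            return new Node(left, right); // inner node built once both children are parsed
        }
        if (t.equals(")") || t.equals(".")) {
            throw new IllegalStateException("Unexpected token: " + t);
        }
        return new Node(t); // anything else is treated as an atom
    }

    private static String next(Iterator<String> tokens) {
        if (!tokens.hasNext()) throw new IllegalStateException("Unexpected end of input");
        return tokens.next();
    }

    private static void expect(Iterator<String> tokens, String want) {
        String got = next(tokens);
        if (!got.equals(want)) {
            throw new IllegalStateException("Expected " + want + " but got " + got);
        }
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("(", "A", ".", "(", "B", ".", "NIL", ")", ")");
        System.out.println(parseSexp(tokens.iterator()).dotted()); // (A . (B . NIL))
    }
}
```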

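// Imports needed by this class (place them at the top of the source file).
import java.util.ArrayList;
import java.util.List;

class Parse {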
	public Lexical lexical;
	public Printer printer;
	public List<Sexp> sexpList = new ArrayList<Sexp>();
	public Sexp sexp = new Sexp();
	public Sexp sexp0 = sexp;
	Parse(Lexical lex, Printer pri) {
		this.lexical = lex;
		this.printer = pri;
	}
	void ParseStart() { 
		// sexpList.add(sexp);
		sexp = ParseSexp(0);
		if(sexp == null) {
			System.exit(0);
		}
		// check end?
		else if(sexp.val != null || sexp.left != null){
			printer.launch(sexp);
			ParseStart();
		}
		else {
			ParseStart();
		}
	}
	Sexp ParseSexp(int dir) {
		String token = lexical.getNextToken();
		Sexp tempSexp = new Sexp();
		// System.out.println(token);
		if("EOF".equals(token)) { // compare string contents, not references
			return null;
		}
		else if("EOL".equals(token)) {
			// System.out.println("PaeseSexp reached EOL");
			// printer.print(tempSexp, true);
			System.out.print('\n');
			return tempSexp;
			// ParseSexp(0);
		}
		else if("(".equals(token)) {
			tempSexp.left = ParseSexp(1);
			token = lexical.getNextToken();
			// System.out.println(token);
			if(!".".equals(token)) {
				System.out.println("ERROR: here should be a '.' not " + token);
				System.exit(1);
			}
			tempSexp.right = ParseSexp(2);
			token = lexical.getNextToken();
			// System.out.println(token);
			if(!")".equals(token)) {
				System.out.println("ERROR: here should be a ')'");
				System.exit(1);
			}
			// System.out.println(tempSexp.)
			return tempSexp;
		}
		else if(token.charAt(0) >= 'A' && token.charAt(0) <= 'Z') { // literal
			// An atom is always a leaf, regardless of dir, so isList is false;
			// NIL is recognized later by comparing val with "NIL".
			tempSexp.isList = false;
			tempSexp.kind = "literal";
			tempSexp.val = token;
			return tempSexp;
		}
		else if(token.charAt(0) >= '0' && token.charAt(0) <= '9') { // numeric
			tempSexp.isList = false;
			tempSexp.kind = "numeric";
			tempSexp.val = token;
			return tempSexp;
		}
		else{
			System.out.println("ERROR: Invalid token: " + token);
			return tempSexp;
		}
	}
}

Output

All output should go to UNIX stdout. This includes error messages – do not print to stderr. For each input S-expression, pretty-print the expression followed by newline.
Some S-expressions are considered to be lists:

the atom NIL is a list,
if S2 is a list, so is ( S1 . S2 ) for any S1.
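The two rules above translate directly into a recursive check. The following is a small sketch with hypothetical names (ListCheck, Node, isList); it assumes a leaf stores its atom as a string and an inner node stores two children.

```java
// Hypothetical sketch of the list definition; not the project code.
final class ListCheck {
    static final class Node {
        final String atom;      // non-null only for leaves
        final Node left, right; // non-null only for inner nodes
        Node(String atom) { this.atom = atom; this.left = null; this.right = null; }
        Node(Node left, Node right) { this.atom = null; this.left = left; this.right = right; }
    }

    static boolean isList(Node n) {
        // Rule 1: among atoms, only NIL is a list.
        if (n.atom != null) return n.atom.equals("NIL");
        // Rule 2: ( S1 . S2 ) is a list exactly when S2 is a list; S1 is unconstrained.
        return isList(n.right);
    }

    public static void main(String[] args) {
        Node nil = new Node("NIL");
        Node a = new Node("A");
        Node b = new Node("B");
        System.out.println(isList(new Node(a, nil)));              // true
        System.out.println(isList(new Node(a, b)));                // false
        System.out.println(isList(new Node(new Node(a, b), nil))); // true
    }
}
```

Note that ((A . B) . NIL) is itself a list even though its left child (A . B) is not, because the definition places no condition on S1.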

In my implementation, for each inner node in the binary tree for an S-expression, I compute a boolean attribute isList. For a node n, n.isList is true if and only if the subtree rooted at n represents a list. This is needed for printing an input S-expression:

If n.isList for every inner node n, print the entire expression using only list notation
Otherwise, print the entire expression using only dot notation
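The printing rule above can be sketched as follows. This is an illustration under assumptions rather than the Printer class used in this project: PrintSketch, Node, and the method names are hypothetical, results are returned as strings instead of being written to stdout, and list notation is assumed to separate elements with a single space.

```java
// Hypothetical sketch of the printing rule; not the project code.
final class PrintSketch {
    static final class Node {
        final String atom;      // non-null only for leaves
        final Node left, right; // non-null only for inner nodes
        Node(String atom) { this.atom = atom; this.left = null; this.right = null; }
        Node(Node left, Node right) { this.atom = null; this.left = left; this.right = right; }
    }

    static boolean isList(Node n) {
        return n.atom != null ? n.atom.equals("NIL") : isList(n.right);
    }

    /** True when every inner node in the tree is a list. */
    static boolean allInnerNodesAreLists(Node n) {
        if (n.atom != null) return true;
        return isList(n) && allInnerNodesAreLists(n.left) && allInnerNodesAreLists(n.right);
    }

    static String listNotation(Node n) {
        if (n.atom != null) return n.atom;
        StringBuilder sb = new StringBuilder("(");
        // Walk the right spine; each left child is one list element.
        for (Node cur = n; cur.atom == null; cur = cur.right) {
            if (cur != n) sb.append(' ');
            sb.append(listNotation(cur.left));
        }
        return sb.append(')').toString();
    }

    static String dotNotation(Node n) {
        if (n.atom != null) return n.atom;
        return "(" + dotNotation(n.left) + " . " + dotNotation(n.right) + ")";
    }

    static String prettyPrint(Node n) {
        return allInnerNodesAreLists(n) ? listNotation(n) : dotNotation(n);
    }

    public static void main(String[] args) {
        Node nil = new Node("NIL");
        Node a = new Node("A");
        Node b = new Node("B");
        System.out.println(prettyPrint(new Node(a, new Node(b, nil)))); // (A B)
        System.out.println(prettyPrint(new Node(new Node(a, b), nil))); // ((A . B) . NIL)
    }
}
```

The second example is printed entirely in dot notation because one inner node, (A . B), is not a list.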

// Import needed by this class (place it at the top of the source file).
import java.util.List;

class Printer {
    void launch(Sexp sexp) {
        if(Sexp.isListTree(sexp)) {
            print(sexp, true);
        }
        else {
            printRaw(sexp);
        }
    }
    void print(List<Sexp> sexpList) {
        int length = sexpList.size();
        for(int i = 0; i < length; i++) {
            print(sexpList.get(i));
            System.out.print('\n');
        }
    }
    void print(Sexp sexp, Boolean withPa) {
        if(sexp == null) {
            return;
        }
        if(sexp.isList) {
            System.out.print("(");
            print(sexp.left, true);
            if(sexp.right != null && sexp.right.val != null && sexp.right.val.equals("NIL")) {
            }
            else {
                System.out.print(" ");
                print(sexp.right); 
            }
            System.out.print(")");
        }
        else if(sexp.val != null) {
            System.out.print(sexp.val);
        }
    }
    void print(Sexp sexp) {
        if(sexp == null) {
            return;
        }
        else if(sexp.isList) {
            print(sexp.left, true);
            if(sexp.right != null && sexp.right.val != null && sexp.right.val.equals("NIL")){
            }
            else {
                System.out.print(" ");
                print(sexp.right);
            }
        }
        else if(sexp.val != null) {
            System.out.print(sexp.val);
        }
    }
    void printRaw(Sexp sexp) {
        if(sexp == null) {
            return;
        }
        else if(sexp.isList) {
            System.out.print("(");
            printRaw(sexp.left);
            System.out.print(" . ");
            printRaw(sexp.right); 
            System.out.print(")");
        }
        else {
            System.out.print(sexp.val);
        }
    }
}

cfdtlee
