Hand Rolled Parser• Appendix A

Hand Rolling


steamroller

Steam Roller • We're gonna be rolling machines.

In this book we used mpc to do all the parsing, but in this appendix we're going to replace mpc with our own hand rolled parser. The choice to use mpc has by far been the main source of complaints by readers of this book, and before we dive into the details of hand-rolling I want to talk about some of the reasons I decided on using a parsing library in the first place.

The main reason was that when I was learning about programming languages, I found the theory of formal languages fascinating. I really enjoyed getting into the mind set of thinking about languages more abstractly. I think this is a fun thing to teach, as it opens up a lot of new avenues of exploration.

Another reason is that it gives new programmers a chance to learn how to use a library. It gets them comfortable early on with weird interfaces, and other peoples' code.

And finally, perhaps most importantly, using a library for the parsing allowed me to delay the topics of memory management and pointers for as long as possible - by which point readers should be much more comfortable with the C constructs they'd encountered so far.

But that doesn't really matter - writing a parser by hand is a great thing to do, and although one with all the bells and whistles can be a complicated thing to get right, because our Lisp is so simple, for us it wont be too difficult.

Can I do this if I've not finished the book?

If you've not completed all the chapters of this book it is probably not a good idea to attempt this appendix. It may be possible to complete this appendix if you're already past Chapter 9 • S-Expressions, but if you're not completed this chapter already it might not make much sense. Sorry!

Replacing mpc

The output of mpc was an Abstract Syntax Tree, but because our language is so simple, when we replace it we're going to go directly from the input string to an S-Expressions (which is pretty much like abstract syntax tree anyway). So let us declare a function which takes some string s, some pointer to a position in that string i and some terminal character end and which outputs an lval. We'll call it lval_read_expr.

lval* lval_read_expr(char* s, int* i, char end);

We'll define the implementation later. For now lets go ahead and replace the places we call mpc with this new function. For error handling we'll get lval_read_expr to return an error lval if something goes wrong. The terminal character when we're reading from a C string is just the null character \0.

char* input = readline("lispy> ");
/* Read from input to create an S-Expr */
int pos = 0;
lval* expr = lval_read_expr(input, &pos, '\0');

/* Evaluate and print input */
lval* x = lval_eval(e, expr);

We also need to replace mpc in our function builtin_load. Here, because we only support strings in our read function, we first need to load in the contents of the file. To do this we use fseek and ftell to find out how many characters are in the file before allocating a string of that size (plus one) and reading the file into it.

lval* builtin_load(lenv* e, lval* a) {
  LASSERT_NUM("load", a, 1);
  LASSERT_TYPE("load", a, 0, LVAL_STR);
  
  /* Open file and check it exists */
  FILE* f = fopen(a->cell[0]->str, "rb");
  if (f == NULL) {
    lval* err = lval_err("Could not load Library %s", a->cell[0]->str);
    lval_del(a);
    return err;
  }
  
  /* Read File Contents */
  fseek(f, 0, SEEK_END);
  long length = ftell(f);
  fseek(f, 0, SEEK_SET);
  char* input = calloc(length+1, 1);
  fread(input, 1, length, f);
  fclose(f);

We can then easily read from this string as we did in main.

  /* Read from input to create an S-Expr */
  int pos = 0;
  lval* expr = lval_read_expr(input, &pos, '\0');
  free(input);
  
  /* Evaluate all expressions contained in S-Expr */
  if (expr->type != LVAL_ERR) {
    while (expr->count) {
      lval* x = lval_eval(e, lval_pop(expr, 0));
      if (x->type == LVAL_ERR) { lval_println(x); }
      lval_del(x);
    }
  } else {
    lval_println(expr);
  }
  
  lval_del(expr);    
  lval_del(a);
  
  return lval_sexpr();
}

And with those calls replaced we can start defining lval_read_expr.

A Character at a Time


reading with pipe

Reading • A Jolly Good Time (tm).

The way we think about implementing a parser is quite different to the high level abstract view we were given with mpc. Instead of thinking about the language we instead need to think about the process.

Usually this process takes a very simple form - a parser is almost always just a loop, which repeatedly reads a character at a time from the input, and each time decides what to do with it. The challenge is in making this process elegant. It all starts to get a little messy when we think about whitespace, and comments, and everything else.

To give an idea of how it might work - in our Lisp, if we encounter the character d in the input, we can store it in some string, and also we know we must be reading in a symbol, so can enter a state where we look for more letters, each time adding them to the string. Once we're found no more letters in the input we can return the whole thing as a symbol (for example def) and start again.

The function lval_read_expr is basically going to work like this. We're going to take as input some string, some position in that string, and decide what to do next. When the next character isn't the one specified by the argument end we will try to read in whatever thing appears next, create an lval object from it, and append it to the first argument v.

If instead we reach the character specified by end we're going to return the next position in the string and return to the caller. This return value will help whoever calls lval_read_expr to see how much of the string it has consumed and how much is left.

For now let us assume the next character isn't the end character. The first thing we need to check is that we've not reached the end of the input. If we've reached the end of the input without encountering the end character then we can throw a syntax error and jump to the end of the input to ensure no more is consumed.

int lval_read_expr(lval* v, char* s, int i, char end) {
  
  while (s[i] != end) {
    
    /* If we reach end of input then there is some syntax error */
    if (s[i] == '\0') {
      lval_add(v, lval_err("Missing %c at end of input", end));
      return strlen(s)+1;
    }

After this we can check if the next character is whitespace. Any whitespace characters we can just skip over as our language is not whitespace sensitive.

    /* Skip all whitespace */
    if (strchr(" \t\v\r\n", s[i])) {
      i++;
      continue;
    }

Another easy case is if the next character is a semi-colon ;. If it is a semi-colon we are starting a comment and we can ignore the rest of the characters until we reach a new line.

    /* If next char is ; then read comment */
    if (s[i] == ';') {
      while (s[i] != '\n' && s[i] != '\0') { i++; }
      i++;
      continue;
    }

If the next character is an open parenthesis ( or a curly bracket { we need to parse either an S-Expression or a Q-Expression. For these we can use lval_read_expr again and just supply it with a different character to end on and a different expression to write the results to.

    /* If next character is ( then read S-Expr */
    if (s[i] == '(') {
      lval* x = lval_sexpr();
      lval_add(v, x);
      i = lval_read_expr(x, s, i+1, ')');
      continue;
    }
    
    /* If next character is { then read Q-Expr */
    if (s[i] == '{') {
      lval* x = lval_qexpr();
      lval_add(v, x);
      i = lval_read_expr(x, s, i+1, '}');
      continue;
    }

Those are all the easy cases done. Now we need to decide what to do if we encounter some letter or number. In this case we need to parse all of the numbers or letters we can until we reach something that isn't. For simplicity we're going to treat numbers like a special case of symbols and if we encounter any of these we're going to call the function lval_read_sym, which we'll define later.

    /* If next character is part of a symbol then read symbol */
    if (strchr(
      "abcdefghijklmnopqrstuvwxyz"
      "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
      "0123456789_+-*\\/=<>!&", s[i])) {
      i = lval_read_sym(v, s, i);
      continue;
    }

We also have to deal with strings. If we reach a " character we're going to have to consume everything we encounter up until the next unescaped ". For this we can call a function lval_read_str, which we'll define later.

    /* If next character is " then read string */
    if (strchr("\"", s[i])) {
      i = lval_read_str(v, s, i+1);
      continue;
    }

Finally if we somehow encounter something else we better throw an error and skip to the end of the input, and as mentioned before, if we do actually match our end character and the while loop ends, we just need to return the updated position in the input.

    /* Encountered some unknown character */
    lval_add(v, lval_err("Unknown Character %c", s[i]));
    return strlen(s)+1;
  }
  
  return i+1;
}

That completes the body of our function lval_read_expr. Now we just need to fill in the gaps.

Reading Symbols


Reading in symbols is fairly straight forward. We start by allocating some empty string, and then, for each character in the input which is valid as part of a symbol we append this character to the string.

int lval_read_sym(lval* v, char* s, int i) {
  
  /* Allocate Empty String */
  char* part = calloc(1,1);
  
  /* While valid identifier characters */
  while (strchr(
      "abcdefghijklmnopqrstuvwxyz"
      "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
      "0123456789_+-*\\/=<>!&", s[i]) && s[i] != '\0') {
    
    /* Append character to end of string */
    part = realloc(part, strlen(part)+2);
    part[strlen(part)+1] = '\0';
    part[strlen(part)+0] = s[i];
    i++;
  }

You'll notice that we're also reading in numbers with this code. So the next step is to check if this symbol is actually a number. To do this we just check if all of the characters are digits.

  /* Check if Identifier looks like number */
  int is_num = strchr("-0123456789", part[0]);
  for (int i = 1; i < strlen(part); i++) {
    if (!strchr("0123456789", part[i])) { is_num = 0; break; }
  }

Finally, if the symbol is a number we convert it with similar code to the previous code we used when we were reading from the mpc AST, or otherwise we just use it as a normal symbol. Then we free the temporary string we allocated and return the new position in the input.

  /* Add Symbol or Number as lval */
  if (is_num) {
    errno = 0;
    long x = strtol(part, NULL, 10);
    lval_add(v, errno != ERANGE ? lval_num(x) : lval_err("Invalid Number %s", part));
  } else {
    lval_add(v, lval_sym(part));
  }
  
  /* Free temp string */
  free(part);
  
  /* Return updated position in input */
  return i;
}

And we are done with reading symbols. Onto strings!

Reading Strings


When we read in strings things get a little more complicated. This is because of escaping. We've covered this a little in earlier chapters but now we're going to have to deal with it head on.

Escaping is the name we give to the way we let users enter special characters by using special combinations of symbols starting with a backslash \. For example the most famous of these is the newline character which we encode using \n. When converting from the multi-character representation to the real representation we call this unescaping and when converting from the real representation to the two character encoding we call this escaping.

When we read in user strings we'll need to convert these pairs of characters into the single special character they represent. If we ignore the leading backslash we can make a function in C that tells us this mapping.

/* Function to unescape characters */
char lval_str_unescape(char x) {
  switch (x) {
    case 'a':  return '\a';
    case 'b':  return '\b';
    case 'f':  return '\f';
    case 'n':  return '\n';
    case 'r':  return '\r';
    case 't':  return '\t';
    case 'v':  return '\v';
    case '\\': return '\\';
    case '\'': return '\'';
    case '\"': return '\"';
  }
  return '\0';
}

It is also going to be useful to list all the possible unescapable characters so we can check if we have one.

/* Possible unescapable characters */
char* lval_str_unescapable = "abfnrtv\\\'\"";

We can write similar functions to do the conversion in the other direction.

/* List of possible escapable characters */
char* lval_str_escapable = "\a\b\f\n\r\t\v\\\'\"";
/* Function to escape characters */
char* lval_str_escape(char x) {
  switch (x) {
    case '\a': return "\\a";
    case '\b': return "\\b";
    case '\f': return "\\f";
    case '\n': return "\\n";
    case '\r': return "\\r";
    case '\t': return "\\t";
    case '\v': return "\\v";
    case '\\': return "\\\\";
    case '\'': return "\\\'";
    case '\"': return "\\\"";
  }
  return "";
}

With these we can begin to write our functions for reading strings. First we allocate a temporary string and while we're not reading the terminal " character we're going to process the incoming characters.

int lval_read_str(lval* v, char* s, int i) {
  
  /* Allocate empty string */
  char* part = calloc(1,1);
  
  while (s[i] != '"') {
    
    char c = s[i];

First we need to check for the end of the input - if we're reached this then there must be some string input which doesn't terminate. In this case we free the temporary string we allocated, and return some error.

    /* If end of input then there is an unterminated string literal */
    if (c == '\0') {
      lval_add(v, lval_err("Unexpected end of input at string literal"));
      free(part);
      return strlen(s);
    }

We then check if the next character is a backslash. If we have a backslash then we need to escape the next character after it. Given the previous functions we're already defined this is easy. If it is unescapable then we unescape it - otherwise we throw some error.

    /* If backslash then unescape character after it */
    if (c == '\\') {
      i++;
      /* Check next character is escapable */
      if (strchr(lval_str_unescapable, s[i])) {        
        c = lval_str_unescape(s[i]);
      } else {
        lval_add(v, lval_err("Invalid escape character %c", c));
        free(part);
        return strlen(s);
      }
    }

Given either the escaped character, or the normal character which is part of the string we simply add it to our temporary string, and once we are done consuming characters we convert this into an lval and add it to the function argument v, free the temporary string we allocated, and return.


    /* Append character to string */
    part = realloc(part, strlen(part)+2);
    part[strlen(part)+1] = '\0';
    part[strlen(part)+0] = c;
    i++;    
  }
  
  /* Add lval and free temp string */
  lval_add(v, lval_str(part));
  
  free(part);
  
  return i+1;
}

Printing Strings


As well as using mpc to unescape strings we also used mpc to escape strings when we printed them out.

So if we're going to replace all uses of mpc we better do it here too. Given our functions we already defined for escaping and unescaping characters this wont be too difficult. We'll just loop over each character in the string and if it is esapable then we'll escape it, otherwise we'll print it out as normal.

void lval_print_str(lval* v) {
  putchar('"');
  /* Loop over the characters in the string */
  for (int i = 0; i < strlen(v->str); i++) {
    if (strchr(lval_str_escapable, v->str[i])) {
      /* If the character is escapable then escape it */
      printf("%s", lval_str_escape(v->str[i]));
    } else {
      /* Otherwise print character as it is */
      putchar(v->str[i]);
    }
  }
  putchar('"');
}

Cleaning Up


With that done we can finally remove our include for mpc. We'll need to replace it with includes for some of the libraries mpc included for us. The top of our file should now look like this.

#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <errno.h>

And we'll also have to remove the forward declarations of our parsers, so please delete the following too.

/* Parser Declariations */

mpc_parser_t* Number; 
mpc_parser_t* Symbol; 
mpc_parser_t* String; 
mpc_parser_t* Comment;
mpc_parser_t* Sexpr;  
mpc_parser_t* Qexpr;  
mpc_parser_t* Expr; 
mpc_parser_t* Lispy;

And with that we're really done! Enjoy your new hand rolled parser.

Reference


#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <errno.h>

#ifdef _WIN32

static char buffer[2048];

char* readline(char* prompt) {
  fputs(prompt, stdout);
  fgets(buffer, 2048, stdin);
  char* cpy = malloc(strlen(buffer)+1);
  strcpy(cpy, buffer);
  cpy[strlen(cpy)-1] = '\0';
  return cpy;
}

void add_history(char* unused) {}

#else
#include <editline/readline.h>
#include <editline/history.h>
#endif

/* Forward Declarations */

struct lval;
struct lenv;
typedef struct lval lval;
typedef struct lenv lenv;

/* Lisp Value */

enum { LVAL_ERR, LVAL_NUM,   LVAL_SYM, LVAL_STR, 
       LVAL_FUN, LVAL_SEXPR, LVAL_QEXPR };
       
typedef lval*(*lbuiltin)(lenv*, lval*);

struct lval {
  int type;

  /* Basic */
  long num;
  char* err;
  char* sym;
  char* str;
  
  /* Function */
  lbuiltin builtin;
  lenv* env;
  lval* formals;
  lval* body;
  
  /* Expression */
  int count;
  lval** cell;
};

lval* lval_num(long x) {
  lval* v = malloc(sizeof(lval));
  v->type = LVAL_NUM;
  v->num = x;
  return v;
}

lval* lval_err(char* fmt, ...) {
  lval* v = malloc(sizeof(lval));
  v->type = LVAL_ERR;  
  va_list va;
  va_start(va, fmt);  
  v->err = malloc(512);  
  vsnprintf(v->err, 511, fmt, va);  
  v->err = realloc(v->err, strlen(v->err)+1);
  va_end(va);  
  return v;
}

lval* lval_sym(char* s) {
  lval* v = malloc(sizeof(lval));
  v->type = LVAL_SYM;
  v->sym = malloc(strlen(s) + 1);
  strcpy(v->sym, s);
  return v;
}

lval* lval_str(char* s) {
  lval* v = malloc(sizeof(lval));
  v->type = LVAL_STR;
  v->str = malloc(strlen(s) + 1);
  strcpy(v->str, s);
  return v;
}

lval* lval_builtin(lbuiltin func) {
  lval* v = malloc(sizeof(lval));
  v->type = LVAL_FUN;
  v->builtin = func;
  return v;
}

lenv* lenv_new(void);

lval* lval_lambda(lval* formals, lval* body) {
  lval* v = malloc(sizeof(lval));
  v->type = LVAL_FUN;  
  v->builtin = NULL;  
  v->env = lenv_new();  
  v->formals = formals;
  v->body = body;
  return v;  
}

lval* lval_sexpr(void) {
  lval* v = malloc(sizeof(lval));
  v->type = LVAL_SEXPR;
  v->count = 0;
  v->cell = NULL;
  return v;
}

lval* lval_qexpr(void) {
  lval* v = malloc(sizeof(lval));
  v->type = LVAL_QEXPR;
  v->count = 0;
  v->cell = NULL;
  return v;
}

void lenv_del(lenv* e);

void lval_del(lval* v) {

  switch (v->type) {
    case LVAL_NUM: break;
    case LVAL_FUN: 
      if (!v->builtin) {
        lenv_del(v->env);
        lval_del(v->formals);
        lval_del(v->body);
      }
    break;
    case LVAL_ERR: free(v->err); break;
    case LVAL_SYM: free(v->sym); break;
    case LVAL_STR: free(v->str); break;
    case LVAL_QEXPR:
    case LVAL_SEXPR:
      for (int i = 0; i < v->count; i++) {
        lval_del(v->cell[i]);
      }
      free(v->cell);
    break;
  }
  
  free(v);
}

lenv* lenv_copy(lenv* e);

lval* lval_copy(lval* v) {
  lval* x = malloc(sizeof(lval));
  x->type = v->type;
  switch (v->type) {
    case LVAL_FUN:
      if (v->builtin) {
        x->builtin = v->builtin;
      } else {
        x->builtin = NULL;
        x->env = lenv_copy(v->env);
        x->formals = lval_copy(v->formals);
        x->body = lval_copy(v->body);
      }
    break;
    case LVAL_NUM: x->num = v->num; break;
    case LVAL_ERR: x->err = malloc(strlen(v->err) + 1);
      strcpy(x->err, v->err);
    break;
    case LVAL_SYM: x->sym = malloc(strlen(v->sym) + 1);
      strcpy(x->sym, v->sym);
    break;
    case LVAL_STR: x->str = malloc(strlen(v->str) + 1);
      strcpy(x->str, v->str);
    break;
    case LVAL_SEXPR:
    case LVAL_QEXPR:
      x->count = v->count;
      x->cell = malloc(sizeof(lval*) * x->count);
      for (int i = 0; i < x->count; i++) {
        x->cell[i] = lval_copy(v->cell[i]);
      }
    break;
  }
  return x;
}

lval* lval_add(lval* v, lval* x) {
  v->count++;
  v->cell = realloc(v->cell, sizeof(lval*) * v->count);
  v->cell[v->count-1] = x;
  return v;
}

lval* lval_join(lval* x, lval* y) {  
  for (int i = 0; i < y->count; i++) {
    x = lval_add(x, y->cell[i]);
  }
  free(y->cell);
  free(y);  
  return x;
}

lval* lval_pop(lval* v, int i) {
  lval* x = v->cell[i];  
  memmove(&v->cell[i],
    &v->cell[i+1], sizeof(lval*) * (v->count-i-1));  
  v->count--;  
  v->cell = realloc(v->cell, sizeof(lval*) * v->count);
  return x;
}

lval* lval_take(lval* v, int i) {
  lval* x = lval_pop(v, i);
  lval_del(v);
  return x;
}

void lval_print(lval* v);

void lval_print_expr(lval* v, char open, char close) {
  putchar(open);
  for (int i = 0; i < v->count; i++) {
    lval_print(v->cell[i]);    
    if (i != (v->count-1)) {
      putchar(' ');
    }
  }
  putchar(close);
}

/* Possible unescapable characters */
char* lval_str_unescapable = "abfnrtv\\\'\"";

/* Function to unescape characters */
char lval_str_unescape(char x) {
  switch (x) {
    case 'a':  return '\a';
    case 'b':  return '\b';
    case 'f':  return '\f';
    case 'n':  return '\n';
    case 'r':  return '\r';
    case 't':  return '\t';
    case 'v':  return '\v';
    case '\\': return '\\';
    case '\'': return '\'';
    case '\"': return '\"';
  }
  return '\0';
}

/* List of possible escapable characters */
char* lval_str_escapable = "\a\b\f\n\r\t\v\\\'\"";

/* Function to escape characters */
char* lval_str_escape(char x) {
  switch (x) {
    case '\a': return "\\a";
    case '\b': return "\\b";
    case '\f': return "\\f";
    case '\n': return "\\n";
    case '\r': return "\\r";
    case '\t': return "\\t";
    case '\v': return "\\v";
    case '\\': return "\\\\";
    case '\'': return "\\\'";
    case '\"': return "\\\"";
  }
  return "";
}

void lval_print_str(lval* v) {
  putchar('"');
  /* Loop over the characters in the string */
  for (int i = 0; i < strlen(v->str); i++) {
    if (strchr(lval_str_escapable, v->str[i])) {
      /* If the character is escapable then escape it */
      printf("%s", lval_str_escape(v->str[i]));
    } else {
      /* Otherwise print character as it is */
      putchar(v->str[i]);
    }
  }
  putchar('"');
}

void lval_print(lval* v) {
  switch (v->type) {
    case LVAL_FUN:
      if (v->builtin) {
        printf("<builtin>");
      } else {
        printf("(\\ ");
        lval_print(v->formals);
        putchar(' ');
        lval_print(v->body);
        putchar(')');
      }
    break;
    case LVAL_NUM:   printf("%li", v->num); break;
    case LVAL_ERR:   printf("Error: %s", v->err); break;
    case LVAL_SYM:   printf("%s", v->sym); break;
    case LVAL_STR:   lval_print_str(v); break;
    case LVAL_SEXPR: lval_print_expr(v, '(', ')'); break;
    case LVAL_QEXPR: lval_print_expr(v, '{', '}'); break;
  }
}

void lval_println(lval* v) { lval_print(v); putchar('\n'); }

int lval_eq(lval* x, lval* y) {
  
  if (x->type != y->type) { return 0; }
  
  switch (x->type) {
    case LVAL_NUM: return (x->num == y->num);    
    case LVAL_ERR: return (strcmp(x->err, y->err) == 0);
    case LVAL_SYM: return (strcmp(x->sym, y->sym) == 0);    
    case LVAL_STR: return (strcmp(x->str, y->str) == 0);    
    case LVAL_FUN: 
      if (x->builtin || y->builtin) {
        return x->builtin == y->builtin;
      } else {
        return lval_eq(x->formals, y->formals) && lval_eq(x->body, y->body);
      }    
    case LVAL_QEXPR:
    case LVAL_SEXPR:
      if (x->count != y->count) { return 0; }
      for (int i = 0; i < x->count; i++) {
        if (!lval_eq(x->cell[i], y->cell[i])) { return 0; }
      }
      return 1;
    break;
  }
  return 0;
}

char* ltype_name(int t) {
  switch(t) {
    case LVAL_FUN: return "Function";
    case LVAL_NUM: return "Number";
    case LVAL_ERR: return "Error";
    case LVAL_SYM: return "Symbol";
    case LVAL_STR: return "String";
    case LVAL_SEXPR: return "S-Expression";
    case LVAL_QEXPR: return "Q-Expression";
    default: return "Unknown";
  }
}

/* Lisp Environment */

struct lenv {
  lenv* par;
  int count;
  char** syms;
  lval** vals;
};

lenv* lenv_new(void) {
  lenv* e = malloc(sizeof(lenv));
  e->par = NULL;
  e->count = 0;
  e->syms = NULL;
  e->vals = NULL;
  return e;
}

void lenv_del(lenv* e) {
  for (int i = 0; i < e->count; i++) {
    free(e->syms[i]);
    lval_del(e->vals[i]);
  }  
  free(e->syms);
  free(e->vals);
  free(e);
}

lenv* lenv_copy(lenv* e) {
  lenv* n = malloc(sizeof(lenv));
  n->par = e->par;
  n->count = e->count;
  n->syms = malloc(sizeof(char*) * n->count);
  n->vals = malloc(sizeof(lval*) * n->count);
  for (int i = 0; i < e->count; i++) {
    n->syms[i] = malloc(strlen(e->syms[i]) + 1);
    strcpy(n->syms[i], e->syms[i]);
    n->vals[i] = lval_copy(e->vals[i]);
  }
  return n;
}

lval* lenv_get(lenv* e, lval* k) {
  
  for (int i = 0; i < e->count; i++) {
    if (strcmp(e->syms[i], k->sym) == 0) { return lval_copy(e->vals[i]); }
  }
  
  if (e->par) {
    return lenv_get(e->par, k);
  } else {
    return lval_err("Unbound Symbol '%s'", k->sym);
  }
}

void lenv_put(lenv* e, lval* k, lval* v) {
  
  for (int i = 0; i < e->count; i++) {
    if (strcmp(e->syms[i], k->sym) == 0) {
      lval_del(e->vals[i]);
      e->vals[i] = lval_copy(v);
      return;
    }
  }
  
  e->count++;
  e->vals = realloc(e->vals, sizeof(lval*) * e->count);
  e->syms = realloc(e->syms, sizeof(char*) * e->count);  
  e->vals[e->count-1] = lval_copy(v);
  e->syms[e->count-1] = malloc(strlen(k->sym)+1);
  strcpy(e->syms[e->count-1], k->sym);
}

void lenv_def(lenv* e, lval* k, lval* v) {
  while (e->par) { e = e->par; }
  lenv_put(e, k, v);
}

/* Builtins */

#define LASSERT(args, cond, fmt, ...) \
  if (!(cond)) { lval* err = lval_err(fmt, ##__VA_ARGS__); lval_del(args); return err; }

#define LASSERT_TYPE(func, args, index, expect) \
  LASSERT(args, args->cell[index]->type == expect, \
    "Function '%s' passed incorrect type for argument %i. Got %s, Expected %s.", \
    func, index, ltype_name(args->cell[index]->type), ltype_name(expect))

#define LASSERT_NUM(func, args, num) \
  LASSERT(args, args->count == num, \
    "Function '%s' passed incorrect number of arguments. Got %i, Expected %i.", \
    func, args->count, num)

#define LASSERT_NOT_EMPTY(func, args, index) \
  LASSERT(args, args->cell[index]->count != 0, \
    "Function '%s' passed {} for argument %i.", func, index);

lval* lval_eval(lenv* e, lval* v);

lval* builtin_lambda(lenv* e, lval* a) {
  LASSERT_NUM("\\", a, 2);
  LASSERT_TYPE("\\", a, 0, LVAL_QEXPR);
  LASSERT_TYPE("\\", a, 1, LVAL_QEXPR);
  
  for (int i = 0; i < a->cell[0]->count; i++) {
    LASSERT(a, (a->cell[0]->cell[i]->type == LVAL_SYM),
      "Cannot define non-symbol. Got %s, Expected %s.",
      ltype_name(a->cell[0]->cell[i]->type), ltype_name(LVAL_SYM));
  }
  
  lval* formals = lval_pop(a, 0);
  lval* body = lval_pop(a, 0);
  lval_del(a);
  
  return lval_lambda(formals, body);
}

lval* builtin_list(lenv* e, lval* a) {
  a->type = LVAL_QEXPR;
  return a;
}

lval* builtin_head(lenv* e, lval* a) {
  LASSERT_NUM("head", a, 1);
  LASSERT_TYPE("head", a, 0, LVAL_QEXPR);
  LASSERT_NOT_EMPTY("head", a, 0);
  
  lval* v = lval_take(a, 0);  
  while (v->count > 1) { lval_del(lval_pop(v, 1)); }
  return v;
}

lval* builtin_tail(lenv* e, lval* a) {
  LASSERT_NUM("tail", a, 1);
  LASSERT_TYPE("tail", a, 0, LVAL_QEXPR);
  LASSERT_NOT_EMPTY("tail", a, 0);

  lval* v = lval_take(a, 0);  
  lval_del(lval_pop(v, 0));
  return v;
}

lval* builtin_eval(lenv* e, lval* a) {
  LASSERT_NUM("eval", a, 1);
  LASSERT_TYPE("eval", a, 0, LVAL_QEXPR);
  
  lval* x = lval_take(a, 0);
  x->type = LVAL_SEXPR;
  return lval_eval(e, x);
}

lval* builtin_join(lenv* e, lval* a) {
  
  for (int i = 0; i < a->count; i++) {
    LASSERT_TYPE("join", a, i, LVAL_QEXPR);
  }
  
  lval* x = lval_pop(a, 0);
  
  while (a->count) {
    lval* y = lval_pop(a, 0);
    x = lval_join(x, y);
  }
  
  lval_del(a);
  return x;
}

lval* builtin_op(lenv* e, lval* a, char* op) {
  
  for (int i = 0; i < a->count; i++) {
    LASSERT_TYPE(op, a, i, LVAL_NUM);
  }
  
  lval* x = lval_pop(a, 0);
  
  if ((strcmp(op, "-") == 0) && a->count == 0) { x->num = -x->num; }
  
  while (a->count > 0) {  
    lval* y = lval_pop(a, 0);
    
    if (strcmp(op, "+") == 0) { x->num += y->num; }
    if (strcmp(op, "-") == 0) { x->num -= y->num; }
    if (strcmp(op, "*") == 0) { x->num *= y->num; }
    if (strcmp(op, "/") == 0) {
      if (y->num == 0) {
        lval_del(x); lval_del(y);
        x = lval_err("Division By Zero.");
        break;
      }
      x->num /= y->num;
    }
    
    lval_del(y);
  }
  
  lval_del(a);
  return x;
}

lval* builtin_add(lenv* e, lval* a) { return builtin_op(e, a, "+"); }
lval* builtin_sub(lenv* e, lval* a) { return builtin_op(e, a, "-"); }
lval* builtin_mul(lenv* e, lval* a) { return builtin_op(e, a, "*"); }
lval* builtin_div(lenv* e, lval* a) { return builtin_op(e, a, "/"); }

lval* builtin_var(lenv* e, lval* a, char* func) {
  LASSERT_TYPE(func, a, 0, LVAL_QEXPR);
  
  lval* syms = a->cell[0];
  for (int i = 0; i < syms->count; i++) {
    LASSERT(a, (syms->cell[i]->type == LVAL_SYM),
      "Function '%s' cannot define non-symbol. "
      "Got %s, Expected %s.",
      func, ltype_name(syms->cell[i]->type), ltype_name(LVAL_SYM));
  }
  
  LASSERT(a, (syms->count == a->count-1),
    "Function '%s' passed too many arguments for symbols. "
    "Got %i, Expected %i.",
    func, syms->count, a->count-1);
    
  for (int i = 0; i < syms->count; i++) {
    if (strcmp(func, "def") == 0) { lenv_def(e, syms->cell[i], a->cell[i+1]); }
    if (strcmp(func, "=")   == 0) { lenv_put(e, syms->cell[i], a->cell[i+1]); } 
  }
  
  lval_del(a);
  return lval_sexpr();
}

lval* builtin_def(lenv* e, lval* a) { return builtin_var(e, a, "def"); }
lval* builtin_put(lenv* e, lval* a) { return builtin_var(e, a, "="); }

lval* builtin_ord(lenv* e, lval* a, char* op) {
  LASSERT_NUM(op, a, 2);
  LASSERT_TYPE(op, a, 0, LVAL_NUM);
  LASSERT_TYPE(op, a, 1, LVAL_NUM);
  
  int r;
  if (strcmp(op, ">")  == 0) { r = (a->cell[0]->num >  a->cell[1]->num); }
  if (strcmp(op, "<")  == 0) { r = (a->cell[0]->num <  a->cell[1]->num); }
  if (strcmp(op, ">=") == 0) { r = (a->cell[0]->num >= a->cell[1]->num); }
  if (strcmp(op, "<=") == 0) { r = (a->cell[0]->num <= a->cell[1]->num); }
  lval_del(a);
  return lval_num(r);
}

lval* builtin_gt(lenv* e, lval* a) { return builtin_ord(e, a, ">");  }
lval* builtin_lt(lenv* e, lval* a) { return builtin_ord(e, a, "<");  }
lval* builtin_ge(lenv* e, lval* a) { return builtin_ord(e, a, ">="); }
lval* builtin_le(lenv* e, lval* a) { return builtin_ord(e, a, "<="); }

lval* builtin_cmp(lenv* e, lval* a, char* op) {
  LASSERT_NUM(op, a, 2);
  int r;
  if (strcmp(op, "==") == 0) { r =  lval_eq(a->cell[0], a->cell[1]); }
  if (strcmp(op, "!=") == 0) { r = !lval_eq(a->cell[0], a->cell[1]); }
  lval_del(a);
  return lval_num(r);
}

lval* builtin_eq(lenv* e, lval* a) { return builtin_cmp(e, a, "=="); }
lval* builtin_ne(lenv* e, lval* a) { return builtin_cmp(e, a, "!="); }

lval* builtin_if(lenv* e, lval* a) {
  LASSERT_NUM("if", a, 3);
  LASSERT_TYPE("if", a, 0, LVAL_NUM);
  LASSERT_TYPE("if", a, 1, LVAL_QEXPR);
  LASSERT_TYPE("if", a, 2, LVAL_QEXPR);
  
  lval* x;
  a->cell[1]->type = LVAL_SEXPR;
  a->cell[2]->type = LVAL_SEXPR;
  
  if (a->cell[0]->num) {
    x = lval_eval(e, lval_pop(a, 1));
  } else {
    x = lval_eval(e, lval_pop(a, 2));
  }
  
  lval_del(a);
  return x;
}

/* Change forward declaration */
lval* lval_read_expr(char* s, int* i, char end);

lval* builtin_load(lenv* e, lval* a) {
  LASSERT_NUM("load", a, 1);
  LASSERT_TYPE("load", a, 0, LVAL_STR);
  
  /* Open file and check it exists */
  FILE* f = fopen(a->cell[0]->str, "rb");
  if (f == NULL) {
    lval* err = lval_err("Could not load Library %s", a->cell[0]->str);
    lval_del(a);
    return err;
  }
  
  /* Read File Contents */
  fseek(f, 0, SEEK_END);
  long length = ftell(f);
  fseek(f, 0, SEEK_SET);
  char* input = calloc(length+1, 1);
  fread(input, 1, length, f);
  fclose(f);
  
  /* Read from input to create an S-Expr */
  int pos = 0;
  lval* expr = lval_read_expr(input, &pos, '\0');
  free(input);
  
  /* Evaluate all expressions contained in S-Expr */
  if (expr->type != LVAL_ERR) {
    while (expr->count) {
      lval* x = lval_eval(e, lval_pop(expr, 0));
      if (x->type == LVAL_ERR) { lval_println(x); }
      lval_del(x);
    }
  } else {
    lval_println(expr);
  }
  
  lval_del(expr);    
  lval_del(a);
  
  return lval_sexpr();
}

lval* builtin_print(lenv* e, lval* a) {  
  for (int i = 0; i < a->count; i++) {
    lval_print(a->cell[i]); putchar(' ');
  }
  putchar('\n');
  lval_del(a);
  return lval_sexpr();
}

lval* builtin_error(lenv* e, lval* a) {
  LASSERT_NUM("error", a, 1);
  LASSERT_TYPE("error", a, 0, LVAL_STR);
  lval* err = lval_err(a->cell[0]->str);  
  lval_del(a);
  return err;
}

void lenv_add_builtin(lenv* e, char* name, lbuiltin func) {
  lval* k = lval_sym(name);
  lval* v = lval_builtin(func);
  lenv_put(e, k, v);
  lval_del(k); lval_del(v);
}

void lenv_add_builtins(lenv* e) {
  /* Variable Functions */
  lenv_add_builtin(e, "\\",  builtin_lambda); 
  lenv_add_builtin(e, "def", builtin_def);
  lenv_add_builtin(e, "=",   builtin_put);
  
  /* List Functions */
  lenv_add_builtin(e, "list", builtin_list);
  lenv_add_builtin(e, "head", builtin_head);
  lenv_add_builtin(e, "tail", builtin_tail);
  lenv_add_builtin(e, "eval", builtin_eval);
  lenv_add_builtin(e, "join", builtin_join);
  
  /* Mathematical Functions */
  lenv_add_builtin(e, "+", builtin_add);
  lenv_add_builtin(e, "-", builtin_sub);
  lenv_add_builtin(e, "*", builtin_mul);
  lenv_add_builtin(e, "/", builtin_div);
  
  /* Comparison Functions */
  lenv_add_builtin(e, "if", builtin_if);
  lenv_add_builtin(e, "==", builtin_eq);
  lenv_add_builtin(e, "!=", builtin_ne);
  lenv_add_builtin(e, ">",  builtin_gt);
  lenv_add_builtin(e, "<",  builtin_lt);
  lenv_add_builtin(e, ">=", builtin_ge);
  lenv_add_builtin(e, "<=", builtin_le);
  
  /* String Functions */
  lenv_add_builtin(e, "load",  builtin_load); 
  lenv_add_builtin(e, "error", builtin_error);
  lenv_add_builtin(e, "print", builtin_print);
}

/* Evaluation */

lval* lval_call(lenv* e, lval* f, lval* a) {
  
  if (f->builtin) { return f->builtin(e, a); }
  
  int given = a->count;
  int total = f->formals->count;
  
  while (a->count) {
    
    if (f->formals->count == 0) {
      lval_del(a);
      return lval_err("Function passed too many arguments. "
        "Got %i, Expected %i.", given, total); 
    }
    
    lval* sym = lval_pop(f->formals, 0);
    
    if (strcmp(sym->sym, "&") == 0) {
      
      if (f->formals->count != 1) {
        lval_del(a);
        return lval_err("Function format invalid. "
          "Symbol '&' not followed by single symbol.");
      }
      
      lval* nsym = lval_pop(f->formals, 0);
      lenv_put(f->env, nsym, builtin_list(e, a));
      lval_del(sym); lval_del(nsym);
      break;
    }
    
    lval* val = lval_pop(a, 0);    
    lenv_put(f->env, sym, val);    
    lval_del(sym); lval_del(val);
  }
  
  lval_del(a);
  
  if (f->formals->count > 0 &&
    strcmp(f->formals->cell[0]->sym, "&") == 0) {
    
    if (f->formals->count != 2) {
      return lval_err("Function format invalid. "
        "Symbol '&' not followed by single symbol.");
    }
    
    lval_del(lval_pop(f->formals, 0));
    
    lval* sym = lval_pop(f->formals, 0);
    lval* val = lval_qexpr();    
    lenv_put(f->env, sym, val);
    lval_del(sym); lval_del(val);
  }
  
  if (f->formals->count == 0) {  
    f->env->par = e;    
    return builtin_eval(f->env, lval_add(lval_sexpr(), lval_copy(f->body)));
  } else {
    return lval_copy(f);
  }
  
}

lval* lval_eval_sexpr(lenv* e, lval* v) {
  
  for (int i = 0; i < v->count; i++) { v->cell[i] = lval_eval(e, v->cell[i]); }
  for (int i = 0; i < v->count; i++) { if (v->cell[i]->type == LVAL_ERR) { return lval_take(v, i); } }
  
  if (v->count == 0) { return v; }  
  if (v->count == 1) { return lval_eval(e, lval_take(v, 0)); }
  
  lval* f = lval_pop(v, 0);
  if (f->type != LVAL_FUN) {
    lval* err = lval_err(
      "S-Expression starts with incorrect type. "
      "Got %s, Expected %s.",
      ltype_name(f->type), ltype_name(LVAL_FUN));
    lval_del(f); lval_del(v);
    return err;
  }
  
  lval* result = lval_call(e, f, v);
  lval_del(f);
  return result;
}

lval* lval_eval(lenv* e, lval* v) {
  if (v->type == LVAL_SYM) {
    lval* x = lenv_get(e, v);
    lval_del(v);
    return x;
  }
  if (v->type == LVAL_SEXPR) { return lval_eval_sexpr(e, v); }
  return v;
}

/* Reading */

lval* lval_read_sym(char* s, int* i) {
  
  /* Allocate Empty String */
  char* part = calloc(1,1);
  
  /* While valid identifier characters */
  while (strchr(
      "abcdefghijklmnopqrstuvwxyz"
      "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
      "0123456789_+-*\\/=<>!&", s[*i]) && s[*i] != '\0') {
    
    /* Append character to end of string */
    part = realloc(part, strlen(part)+2);
    part[strlen(part)+1] = '\0';
    part[strlen(part)+0] = s[*i];
    (*i)++;
  }
  
  /* Check if Identifier looks like number */
  int is_num = strchr("-0123456789", part[0]) != NULL;
  for (int j = 1; j < strlen(part); j++) {
    if (strchr("0123456789", part[j]) == NULL) { is_num = 0; break; }
  }
  if (strlen(part) == 1 && part[0] == '-') { is_num = 0; }
  
  /* Add Symbol or Number as lval */
  lval* x = NULL;
  if (is_num) {
    errno = 0;
    long v = strtol(part, NULL, 10);
    x = (errno != ERANGE) ? lval_num(v) : lval_err("Invalid Number %s", part);
  } else {
    x = lval_sym(part);
  }
  
  /* Free temp string */
  free(part);
  
  /* Return lval */
  return x;
}

lval* lval_read_str(char* s, int* i) {
  
  /* Allocate empty string */
  char* part = calloc(1,1);
  
  /* More forward one step past initial " character */
  (*i)++;
  while (s[*i] != '"') {
    
    char c = s[*i];
    
    /* If end of input then there is an unterminated string literal */
    if (c == '\0') {
      free(part);
      return lval_err("Unexpected end of input");
    }
    
    /* If backslash then unescape character after it */
    if (c == '\\') {
      (*i)++;
      /* Check next character is escapable */
      if (strchr(lval_str_unescapable, s[*i])) {        
        c = lval_str_unescape(s[*i]);
      } else {
        free(part);
        return lval_err("Invalid escape sequence \\%c", s[*i]);
      }
    }
    
    /* Append character to string */
    part = realloc(part, strlen(part)+2);
    part[strlen(part)+1] = '\0';
    part[strlen(part)+0] = c;
    (*i)++;    
  }
  /* Move forward past final " character */
  (*i)++;
  
  lval* x = lval_str(part);
  
  /* free temp string */
  free(part);
  
  return x;
}

lval* lval_read(char* s, int* i);

lval* lval_read_expr(char* s, int* i, char end) {
  
  /* Either create new qexpr or sexpr */
  lval* x = (end == '}') ? lval_qexpr() : lval_sexpr();
  
  /* While not at end character keep reading lvals */
  while (s[*i] != end) {
    lval* y = lval_read(s, i);
    /* If an error then return this and stop */
    if (y->type == LVAL_ERR) {
      lval_del(x);
      return y;
    } else {
      lval_add(x, y);
    }
  }

  /* Move past end character */
  (*i)++;
  
  return x;

}

lval* lval_read(char* s, int* i) {
  
  /* Skip all trailing whitespace and comments */
  while (strchr(" \t\v\r\n;", s[*i]) && s[*i] != '\0') {
    if (s[*i] == ';') {
      while (s[*i] != '\n' && s[*i] != '\0') { (*i)++; }
    }
    (*i)++;
  }
  
  lval* x = NULL;

  /* If we reach end of input then we're missing something */
  if (s[*i] == '\0') {
    return lval_err("Unexpected end of input");
  }
  
  /* If next character is ( then read S-Expr */
  else if (s[*i] == '(') {
    (*i)++;
    x = lval_read_expr(s, i, ')');
  }
  
  /* If next character is { then read Q-Expr */
  else if (s[*i] == '{') {
    (*i)++;
    x = lval_read_expr(s, i, '}');
  }
  
  /* If next character is part of a symbol then read symbol */
  else if (strchr(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789_+-*\\/=<>!&", s[*i])) {
    x = lval_read_sym(s, i);
  }
  
  /* If next character is " then read string */
  else if (strchr("\"", s[*i])) {
    x = lval_read_str(s, i);
  }
  
  /* Encountered some unexpected character */
  else {
    x = lval_err("Unexpected character %c", s[*i]);
  }
  
  /* Skip all trailing whitespace and comments */
  while (strchr(" \t\v\r\n;", s[*i]) && s[*i] != '\0') {
    if (s[*i] == ';') {
      while (s[*i] != '\n' && s[*i] != '\0') { (*i)++; }
    }
    (*i)++;
  }
  
  return x;
  
}

/* Main */    

int main(int argc, char** argv) {
  
  lenv* e = lenv_new();
  lenv_add_builtins(e);
  
  /* Interactive Prompt */
  if (argc == 1) {
  
    puts("Lispy Version 0.0.0.1.1");
    puts("Press Ctrl+c to Exit\n");
  
    while (1) {
    
      char* input = readline("lispy> ");
      add_history(input);
      
      /* Read from input to create an S-Expr */
      int pos = 0;
      lval* expr = lval_read_expr(input, &pos, '\0');
      
      /* Evaluate and print input */
      lval* x = lval_eval(e, expr);
      lval_println(x);
      lval_del(x);
      
      free(input);
    }
  }
  
  /* Supplied with list of files */
  if (argc >= 2) {  
    for (int i = 1; i < argc; i++) {
      lval* args = lval_add(lval_sexpr(), lval_str(argv[i]));
      lval* x = builtin_load(e, args);      
      if (x->type == LVAL_ERR) { lval_println(x); }
      lval_del(x);
    }
  }
  
  lenv_del(e);
  
  return 0;
}

Bonus Marks


  • › Make syntax errors give the line and column where the error occured.

Navigation

• Contents •