Skip to content
Go back

Writing an Interpreter in Go

Published:  at  09:23 PM

Introduction

When I stumbled upon the book “Writing an Interpreter in Go”, shared by some fellow developers, I was immediately intrigued. The idea of building an interpreter from scratch seemed daunting yet fascinating. In this post, I’ll break down the core components of an interpreter—lexers, parsers, abstract syntax trees (ASTs), evaluators, and more. Let’s dive in!

What’s a Lexer?

A lexer (or lexical analyzer, or scanner) is the first step in the process of interpreting source code. Its job is to read the raw input (the program as text) and break it up into a sequence of tokens. Each token represents a meaningful element: a keyword, identifier, operator, literal, etc.

Why is a Lexer important?

How does the Lexer work?

Lexer Structure:

type Lexer struct {
input        string
position     int
readPosition int
ch           byte 
}

Initialization:

func New(input string) *Lexer {
	l := &Lexer{
        input: input,
    }

	l.readChar()
	return l
}

Reading Characters:

func (l *Lexer) readChar() {
	if l.readPosition >= len(l.input) {
		l.ch = 0 // ASCII code for 'NUL', signals end of input
	} else {
		l.ch = l.input[l.readPosition]
	}
	l.position = l.readPosition
	l.readPosition += 1
}

Tokenization:The Heart of the Lexer

func (l *Lexer) NextToken() token.Token

Helper Methods:

Token Construction:

How the Lexer Advances:

Example Walkthrough:

Given input:

let x = 5 + 10;

How Lexers Can Differ:

Purpose of the Token Component

The Token component defines:

Tokens are the “words” of your language, produced by the lexer and consumed by the parser.

Token Structure:

type Token struct {
	Type    TokenType
	Literal string
}

Token Type Constants:

Keyword Lookup:

var keywords = map[string]TokenType{
	"fn":     FUNCTION,
	"let":    LET,
	"true":   TRUE,
	"false":  FALSE,
	"if":     IF,
	"else":   ELSE,
	"return": RETURN,
}

func LookupIdent(ident string) TokenType {
	if tok, ok := keywords[ident]; ok {
		return tok
	}
	return IDENT
}

How Tokens Are Used:

Example:

Given:

let x = 5;

The lexer would emit:

Purpose of the Parser

The parser’s job is to take the stream of tokens produced by the lexer and turn them into an Abstract Syntax Tree (AST), which represents the syntactic structure of the program. This tree is then used by the evaluator to execute the program.

Parser Structure:

type Parser struct {
	l      *lexer.Lexer
	errors []string

	curToken  token.Token
	peekToken token.Token

	prefixParseFns map[token.TokenType]prefixParseFn
	infixParseFns  map[token.TokenType]infixParseFn
}

Operator Precedence:

The parser uses a precedence climbing (Pratt parsing) approach:

Initialization:

func New(l *lexer.Lexer) *Parser

Parsing Functions:

These are registered in the constructor, allowing the parser to be easily extended.

Main Parsing Loop:

func (p *Parser) ParseProgram() *ast.Program

Statement Parsing:

Expression Parsing:

Other Features:

Error Handling: If a token is not expected, the parser records an error but tries to continue parsing.

Example: Parsing an Expression:

Give:

let x = 5 + 3 * 2;

Extensibility:

New syntax can be added by registering new prefix/infix parse functions and updating the precedence map.

Purpose of AST:

The AST is a tree-like data structure that represents the syntactic structure of the source code.

Core Interfaces:

type Node interface {
	TokenLiteral() string
	String() string
}

type Statement interface {
	Node
	statementNode()
}

type Expression interface {
	Node
	expressionNode()
}

Key Node Types:

Program root:

type Program struct {
	Statements []Statement
}

Statements:

Expressions:

Node Examples:

type InfixExpression struct {
	Token    token.Token // the operator token, e.g., '+'
	Left     Expression
	Operator string
	Right    Expression
}
type LetStatement struct {
	Token token.Token // the 'let' token
	Name  *Identifier
	Value Expression
}

String Representation:

How the AST is Built:

Extensibility:

Example AST for a Simple Program:

Given: let x = 5 + 3;

The AST would look like:

Object:

The Object component defines the runtime data structures (values) used during the evaluation of a Monkey program. These objects represent the results of evaluating AST nodes and are the entities manipulated by the program at runtime (e.g., numbers, strings, functions).

Integer Object Creation

case *ast.IntegerLiteral:
    return &object.Integer{Value: node.Value}

Key Role in Pipeline

Evaluator:

The Evaluator is the execution engine of the Monkey interpreter. It walks the AST, interprets the meaning of each node, and produces runtime values (objects) based on the program’s logic. It manages variable scopes through an environment and handles control flow like returns and conditionals.

case *ast.InfixExpression:
    left := Eval(node.Left, env)
    right := Eval(node.Right, env)
	return evalInfixExpression(node.Operator, left, right)

REPL (Read-Eval-Print Loop):

The REPL provides an interactive interface for users to enter Monkey code, see it evaluated immediately, and view the results. It is a user-friendly way to experiment with the language without writing full programs.

func Start(in io.Reader, out io.Writer) {
	scanner := bufio.NewScanner(in)
	env := object.NewEnvironment()

	for {
		fmt.Print(PROMPT)
		scanned := scanner.Scan()
		if !scanned {
			return
		}

		line := scanner.Text()
		l := lexer.New(line)
		p := parser.New(l)

		program := p.ParseProgram()
		if len(p.Errors()) != 0 {
			printParserErrors(out, p.Errors())
			continue
		}

		evaluated := evaluator.Eval(program, env)
		if evaluated != nil {
			io.WriteString(out, evaluated.Inspect())
			io.WriteString(out, "\n")
		}
	}
}

func printParserErrors(out io.Writer, errors []string) {
	io.WriteString(out, MONKEY_FACE)
	io.WriteString(out, "Woops! We ran into some monkey business here!\n")
	io.WriteString(out, " parser errors:\n")
	for _, msg := range errors {
		io.WriteString(out, "\t"+msg+"\n")
	}
}

Summary of the Entire Pipeline

Now that we’ve covered all components, here’s a comprehensive summary of the Monkey interpreter pipeline, showing how each part interacts to transform source code into results:

  1. Source Code (Input): The raw text of a Monkey program, user input in the REPL.
  1. Lexer: Reads the source code character by character and produces a stream of Tokens (e.g., keywords, identifiers, operators).
  1. Parser: Consumes tokens and builds an Abstract Syntax Tree (AST), representing the syntactic structure of the program.
  1. Evaluator: Walks the AST recursively, interpreting each node to compute results and manage state using an environment. Produces runtime Objects (values like integers, strings).
  1. Output: The results of evaluation (objects) are either stored in the environment, returned as output in the REPL, or used for further computation.
Source Code → [Main Program/REPL] → Lexer → Tokens → Parser → AST → Evaluator → Objects → Output

When I first started coding, all of this would have seemed like magic to me. However, it’s really just the result of a few dedicated individuals putting in the extra effort to create something valuable and useful.


Share this post on:
Share this post on X

Next Post
Notes From the Undergound