Creating a programming language from scratch is an exciting journey that allows you to understand the inner workings of how languages are designed and interpreted. Whether you're a seasoned programmer or just starting out, building your own language can be a rewarding experience. In this article, we'll walk through the process of creating a simple, Lisp-like programming language without parentheses, using Python as our implementation language.
Before diving into the technical details, let's address the question: Why would you want to create a programming language? There are several reasons:
For this tutorial, we'll create a simple Lisp-like language called "SimpleLisp." Lisp is known for its minimalistic syntax and powerful macro system, but we'll simplify it even further by removing parentheses. Our language will have the following features:
The first step in creating a programming language is designing its syntax. Since we're aiming for a Lisp-like language without parentheses, we'll use a prefix notation for expressions. Here's how our syntax will look:
var x = 5
+ 2 3
(equivalent to 2 + 3
)if x > 3 then print "x is greater than 3" else print "x is not greater than 3"
func add(a, b) return a + b
The next step is to convert the source code into a series of tokens. Tokens are the smallest units of meaning in our language, such as keywords, operators, and literals.
Here's a simple tokenizer in Python:
import re
def tokenize(code):
tokens = re.findall(r'\b\w+\b|[+\-*/=<>]+|[\(\)]', code)
return tokens
# Example usage
code = "var x = 5 + 2"
tokens = tokenize(code)
print(tokens) # Output: ['var', 'x', '=', '5', '+', '2']
Once we have the tokens, we need to parse them into an Abstract Syntax Tree (AST). The AST represents the structure of the program in a way that is easy to interpret.
Here's a simple parser that converts tokens into an AST:
class ASTNode:
def __init__(self, type, value=None, children=None):
self.type = type
self.value = value
self.children = children if children else []
def parse(tokens):
if not tokens:
return None
token = tokens.pop(0)
if token == 'var':
name = tokens.pop(0)
tokens.pop(0) # Skip '='
value = parse(tokens)
return ASTNode('variable_declaration', value=name, children=[value])
elif token in ['+', '-', '*', '/']:
left = parse(tokens)
right = parse(tokens)
return ASTNode('binary_operation', value=token, children=[left, right])
elif token.isdigit():
return ASTNode('number', value=int(token))
elif token.isalpha():
return ASTNode('identifier', value=token)
else:
raise SyntaxError(f"Unexpected token: {token}")
# Example usage
tokens = ['var', 'x', '=', '5', '+', '2']
ast = parse(tokens)
print(ast)
With the AST in hand, we can now interpret it to execute the program. The interpreter will traverse the AST and perform the corresponding actions.
Here's a simple interpreter:
class Interpreter:
def __init__(self):
self.variables = {}
def interpret(self, node):
if node.type == 'variable_declaration':
name = node.value
value = self.interpret(node.children[0])
self.variables[name] = value
return value
elif node.type == 'binary_operation':
left = self.interpret(node.children[0])
right = self.interpret(node.children[1])
if node.value == '+':
return left + right
elif node.value == '-':
return left - right
elif node.value == '*':
return left * right
elif node.value == '/':
return left / right
elif node.type == 'number':
return node.value
elif node.type == 'identifier':
return self.variables[node.value]
else:
raise RuntimeError(f"Unknown node type: {node.type}")
# Example usage
interpreter = Interpreter()
result = interpreter.interpret(ast)
print(result) # Output: 7
Now that we have the basics working, let's add support for conditionals and functions.
For conditionals, we'll extend our parser and interpreter to handle if
statements:
def parse(tokens):
if not tokens:
return None
token = tokens.pop(0)
if token == 'var':
name = tokens.pop(0)
tokens.pop(0) # Skip '='
value = parse(tokens)
return ASTNode('variable_declaration', value=name, children=[value])
elif token == 'if':
condition = parse(tokens)
tokens.pop(0) # Skip 'then'
true_branch = parse(tokens)
tokens.pop(0) # Skip 'else'
false_branch = parse(tokens)
return ASTNode('if_statement', children=[condition, true_branch, false_branch])
elif token in ['+', '-', '*', '/']:
left = parse(tokens)
right = parse(tokens)
return ASTNode('binary_operation', value=token, children=[left, right])
elif token.isdigit():
return ASTNode('number', value=int(token))
elif token.isalpha():
return ASTNode('identifier', value=token)
else:
raise SyntaxError(f"Unexpected token: {token}")
class Interpreter:
def __init__(self):
self.variables = {}
def interpret(self, node):
if node.type == 'variable_declaration':
name = node.value
value = self.interpret(node.children[0])
self.variables[name] = value
return value
elif node.type == 'if_statement':
condition = self.interpret(node.children[0])
if condition:
return self.interpret(node.children[1])
else:
return self.interpret(node.children[2])
elif node.type == 'binary_operation':
left = self.interpret(node.children[0])
right = self.interpret(node.children[1])
if node.value == '+':
return left + right
elif node.value == '-':
return left - right
elif node.value == '*':
return left * right
elif node.value == '/':
return left / right
elif node.type == 'number':
return node.value
elif node.type == 'identifier':
return self.variables[node.value]
else:
raise RuntimeError(f"Unknown node type: {node.type}")
# Example usage
tokens = ['var', 'x', '=', '5', 'if', 'x', '>', '3', 'then', 'print', '"x is greater than 3"', 'else', 'print', '"x is not greater than 3"']
ast = parse(tokens)
interpreter = Interpreter()
result = interpreter.interpret(ast)
print(result) # Output: "x is greater than 3"
Finally, let's add support for functions. Functions will allow us to define reusable blocks of code.
def parse(tokens):
if not tokens:
return None
token = tokens.pop(0)
if token == 'var':
name = tokens.pop(0)
tokens.pop(0) # Skip '='
value = parse(tokens)
return ASTNode('variable_declaration', value=name, children=[value])
elif token == 'if':
condition = parse(tokens)
tokens.pop(0) # Skip 'then'
true_branch = parse(tokens)
tokens.pop(0) # Skip 'else'
false_branch = parse(tokens)
return ASTNode('if_statement', children=[condition, true_branch, false_branch])
elif token == 'func':
name = tokens.pop(0)
tokens.pop(0) # Skip '('
params = []
while tokens[0] != ')':
params.append(tokens.pop(0))
tokens.pop(0) # Skip ')'
body = parse(tokens)
return ASTNode('function_definition', value=name, children=[body] + params)
elif token in ['+', '-', '*', '/']:
left = parse(tokens)
right = parse(tokens)
return ASTNode('binary_operation', value=token, children=[left, right])
elif token.isdigit():
return ASTNode('number', value=int(token))
elif token.isalpha():
return ASTNode('identifier', value=token)
else:
raise SyntaxError(f"Unexpected token: {token}")
class Interpreter:
def __init__(self):
self.variables = {}
self.functions = {}
def interpret(self, node):
if node.type == 'variable_declaration':
name = node.value
value = self.interpret(node.children[0])
self.variables[name] = value
return value
elif node.type == 'if_statement':
condition = self.interpret(node.children[0])
if condition:
return self.interpret(node.children[1])
else:
return self.interpret(node.children[2])
elif node.type == 'function_definition':
name = node.value
params = node.children[1:]
body = node.children[0]
self.functions[name] = (params, body)
return None
elif node.type == 'binary_operation':
left = self.interpret(node.children[0])
right = self.interpret(node.children[1])
if node.value == '+':
return left + right
elif node.value == '-':
return left - right
elif node.value == '*':
return left * right
elif node.value == '/':
return left / right
elif node.type == 'number':
return node.value
elif node.type == 'identifier':
if node.value in self.variables:
return self.variables[node.value]
elif node.value in self.functions:
params, body = self.functions[node.value]
args = [self.interpret(child) for child in node.children]
old_vars = self.variables.copy()
for param, arg in zip(params, args):
self.variables[param] = arg
result = self.interpret(body)
self.variables = old_vars
return result
else:
raise RuntimeError(f"Unknown identifier: {node.value}")
else:
raise RuntimeError(f"Unknown node type: {node.type}")
# Example usage
tokens = ['func', 'add', '(', 'a', ',', 'b', ')', 'return', 'a', '+', 'b', 'var', 'x', '=', 'add', '2', '3']
ast = parse(tokens)
interpreter = Interpreter()
result = interpreter.interpret(ast)
print(result) # Output: 5
Creating a programming language from scratch is a challenging but rewarding endeavor. In this article, we've walked through the process of designing a simple Lisp-like language, tokenizing the source code, parsing it into an AST, and interpreting the AST to execute the program. We've also added support for variables, arithmetic operations, conditionals, and functions.
While our "SimpleLisp" language is quite basic, it serves as a foundation for more complex languages. You can extend it by adding more features, such as loops, lists, and error handling, or by experimenting with different syntax and semantics.
Remember, the key to creating a successful programming language is understanding the needs of the users and designing a language that meets those needs in a clear and intuitive way. Happy coding!