Simple Brainfuck interpreter in C

2020.09.02 | tags: programming · tutorials

How Brainfuck works

There are only 8 symbols supported in Brainfuck:

Symbol Function
> Increase position of pointer
< Decrease position of pointer
+ Increase value of pointer
- Decrease value of pointer
[ Beginning of loop
] End of loop
. Output ASCII code of pointer
, Read a character and stores its ASCII value in pointer

It’s best to imagine Brainfuck programs as arrays of integers, which the pointer can manipulate. Let’s say this is our initial state:

{ 0, 0, 0, 0, 0, 0 }

We can assign values to each position in the array by moving the pointer around. Using the + and - symbols, we can increment or decrement by 1 each time. If we wanted to move the pointer two times to the right and increment the value there by 3, we would have a program that looks like this:

>>+++

The updated version of the array:

{ 0, 0, 3, 0, 0, 0 }

Following the same logic, we can assign specific values to each cell and make an actual program. If we wanted to print the letter “B” on the screen, which corresponds to the ASCII value 66, we could write the following program:

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.

In order to avoid writing things like this we can use loops. The loop is executed as long as the value inside it is not 0. Essentially, it’s going to do the multiplication 10 x 6 = 66 and then print the value:

+++++ +++++		# add 10 to cell #0
[			# beginning of loop
	> +++ +++	# add 6 to cell #1
	< -		# subtract 1 from cell #0
]			# end of loop
			# value at cell 1 is now 66 (10 x 6 = 66)
> .			# go to cell 1 and print its value

Or, for compactness:

+++++++++++[>++++++<-]>.

You can learn more about how Brainfuck works here.

Building the interpreter

We’ll first read the Brainfuck source from stdin into a static size buffer. 50.000 bytes should be large enough to store any Brainfuck program, since I doubt anyone is mad enough to write actual programs in it:

#define BUFSIZE 50000
.
.
.
size_t len = 0;
char buf[BUFSIZE];

while (read(STDIN_FILENO, &buf[len], 1) > 0)
	len++;
buf[len] = '\0';

We’ll declare the rest of the needed variables:

int closed;		/* number of active loops */
int opened;		/* number of inactive loops */
int pos = 0;		/* position in the program */
unsigned short *pc;	/* program counter */
char *src;		/* source code */

One of the reasons we have a len variable is to allocate just enough memory for src. We’ll also empty the buffer because we now want to use it to store the values the Brainfuck program will produce:

if ((src = malloc(len)) == NULL) {
	perror("malloc");
	exit(1);
}
strcpy(src, buf);
memset(buf, 0, len);

We can now parse the source code symbol by symbol. pc will act as the “pointer”, moving back and forth in the array. Each symbol will have its own case inside the following switch statement:

for (pc = (unsigned short *)buf, pos = 0; pos < len; pos++) {
	switch (src[pos]) {
		...
	}
}

For the < and > symbols we simply move the pointer:

case '>':
	pc++;
	break;
case '<':
	pc--;
	break;

The + and - symbols in/decrement the value of the cell the pointer is currently at:

case '+':
	(*pc)++;
	break;
case '-':
	(*pc)--;
	break;

To implement the . and , symbols we’ll use the standard library’s putchar() and getchar() functions:

case '.':
	putchar(*pc);
	break;
case ',':
	*pc = getchar();
	break;

Now comes the last, but harder part, which is to implement loops. The logic behind my implementation is that instead of keeping track of every bracket to know where a loop starts and ends, the program keeps going through the source code and, using a counter, we know that a loop starts or ends when that counter is 0 and and an opposite bracket has been found. Also, in each iteration the pos variable changes accordingly so that we can imitate the looping behavior of Brainfuck.

These are the steps the program follows for each of the two symbols:

Beginning of loop: [

case '[':
if (!(*pc)) {
	for (opened = 0; pos++; pos < len; pos++) {
		if (src[pos] == ']' && !opened)
			break;
		else if (src[pos] == '[')
			opened++;
		else if (src[pos] == ']')
			opened--;
	}
}
break;

End of loop: ]

case ']':
if ((*pc)) {
	for (closed = 0; pos--; pos >= 0; pos--) {
		if (src[pos] == '[' && !closed)
			break;
		else if (src[pos] == ']')
			closed++;
		else if (src[pos] == '[')
			closed--;
	}
}
break;

Putting it all together

Below is the full program. You can also find it here.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BUFSIZE 50000

int
main(int argc, char *argv[])
{
	size_t len = 0;
	int closed, opened, pos = 0;
	unsigned short *pc;
	char buf[BUFSIZE], *src;

	while (read(STDIN_FILENO, &buf[len], 1) > 0)
		len++;
	buf[len] = '\0';

	if ((src = malloc(len)) == NULL) {
		perror("malloc");
		exit(1);
	}
	strcpy(src, buf);
	memset(buf, 0, len);

	for (pc = (unsigned short *)buf; pos < len; pos++) {
		switch (src[pos]) {
		case '>':
			pc++;
			break;
		case '<':
			pc--;
			break;
		case '+':
			(*pc)++;
			break;
		case '-':
			(*pc)--;
			break;
		case '.':
			putchar(*pc);
			break;
		case ',':
			*pc = getchar();
			break;
		case '[':
			if (!(*pc)) {
				for (opened = 0, pos++; pos < len; pos++) {
					if (src[pos] == ']' && !opened)
						break;
					else if (src[pos] == '[')
						opened++;
					else if (src[pos] == ']')
						opened--;
				}
			}
			break;
		case ']':
			if (*pc) {
				for (closed = 0, pos--; pos >= 0; pos--) {
					if (src[pos] == '[' && !closed)
						break;
					else if (src[pos] == ']')
						closed++;
					else if (src[pos] == '[')
						closed--;
				}
			}
			break;
		}
	}
	free(src);

	return (0);
}