Sunday, October 19th, 2014

Building a CARDIAC Assembler in Python

In my last article I gave a fully working example of a CARDIAC CPU simulator in Python. This article will continue down the path of creating a build toolchain for the Cardiac. Due to how the Cardiac accepts data from the outside world, namely in the form of a deck of cards or plain numbers, developing an assembler was a bit more difficult than a traditional bytecode assembler. At least when I build a binary assembler, I assemble the bytecodes in-memory. This ensures that all memory pointers are correctly pointing to the proper memory addresses, and also eliminates errors due to how information is stored. This is similar to how the MS-DOS DEBUG.EXE worked to build .COM files. It was simple and yet very effective. This tends to be the model I normally follow, as it makes debugging the virtual machine very easy. This is how I built up my simple-cpu project, and now I am in the process of enabling dynamic memory relocation to better enable the loading binary files. I also recently added a full memory map interface to it as well, so now memory mapped I/O is possible, and I plan on using that to allow my binaries to directly draw onto an SDL surface. Anyways, onward to the Cardiac Assembler, as that is the real reason your here today!

As of this post, you can view all the source code for this post and the last one on my Python Experiments Bitbucket repo.

The assembler is going to use just the standard Python library modules, so no fancy GNU Bison or YACC here. First, let's begin by defining what our language will look like, it's overall syntax. Here is deck1.txt turned into an Assembly program:

# This deck1.txt converted over to ASM format.
# Lines which start with a hash are comments.

# Here we declare our variables in memory for use.
var count = 0
var i = 9

# This is needed to add the special Cardiac bootstrap code.
bootstrap

# Program begins here.
CLA
STO $count
label loop
CLA $i
TAC $endloop
OUT $count
CLA $count
ADD
STO $count
CLA $i
SUB
STO $i
JMP $loop
label endloop
HRS

# End of program.
end

To make developing programs for the CARDIAC easier, the assembler is going to introduce the concept of variable storage, and labels. The first few lines here that begin with var are basically setting aside memory addresses to store these variables in question. The bootstrap statement is more of a Macro as it just writes out a basic bootstrap loader for use on CARDIAC. There is also an alternative bootloader, which uses a second stage boot program to load in your code. This has the advantage of being able to load in very large programs without needing a very large deck. I'd recommend playing around with both options when assembling your programs. The bootloader was custom built code made by me, here is an assembly listing of it:

INP 2
JMP 0

INP 89
INP 1
INP 90
CLA 89
INP 91
ADD
INP 92
STO 89
INP 93
CLA 98
INP 94
SUB
INP 95
STO 98
INP 96
TAC 3
INP 97
JMP 89
INP 98
INP 13
INP 2
JMP 89

The bootloader is passed in the size of the deck to load, and loops over each card to load in your program, this means that your input program will no longer need to have lots of address markings. It can also be used to load in other data into a specific memory address. I am planning on updating the bootloader code to also act like a subroutine, so you could call it to load additional data into memory. Okay, so onto the actual Assembler now.

from cmd import Cmd
from cStringIO import StringIO
import shlex, sys

class AsmError(Exception):
    pass

class Assembler(Cmd):
    """
    I am sure this could be done better with say GNU Bison or Yacc,
    but that's more complicated than needed for a simple assembler.
    """
    op_map = {
        'inp': 0,
        'cla': 1,
        'add': 2,
        'tac': 3,
        'sft': 4,
        'out': 5,
        'sto': 6,
        'sub': 7,
        'jmp': 8,
        'hrs': 9,
    }
    padding = '00'
    def configure(self):
        self.start = self.addr = None
        self.pc = 0 #: Allows us to keep track of the program pointer.
        self.var_map = {} #: This is used to keep track of variables.
        self.labels = {} #: Stores the label names, and where they point to.
        self.buffer = StringIO() #: This is our buffer where we will store the CARDIAC deck
        self.size = 0
    def emptyline(self):
        """ This is requried due to how the Python Cmd module works... """
        pass
    def unknown_command(self, line):
        self.stdout.write('*** Unknown syntax: %s\n'%line)
    @property
    def ptr(self):
        """ This will always give the proper pointer in the deck. """
        if self.addr:
            return self.addr
        return self.pc
    def write_cards(self, *card_list):
        """ Helper fuction to make life easier. """
        for card in card_list:
            self.buffer.write('%s\n' % card)
        self.pc += len(card_list) #: Increment the program pointer.
    def write_addr(self):
        """ This method will only write out the address if we're in that mode. """
        if self.addr:
            self.write_cards('0%s' % self.addr)
            self.addr +=1
        self.size +=1

The first thing to notice here, is that the assembler is a subclass of Cmd, which is a command line parser. I choose to use a command-line parser and it makes things easier and can also be used interactively if I need to debug or later want to enable that feature. The first thing we declare in this class, is a dictionary called op_map, which maps our op codes to actual byte codes. This makes it efficient to write the output and also add new opcodes in the future. The first method here, configure() could have well been an override of __init__(), but I choose to just have it a separate method that I can call manually. The next 2 methods are specific to the command parser, and tell it how to handle empty lines and invalid commands. The ptr property added here is to make it transparent to the code the proper location in the deck at runtime. This is required due to how the Cardiac bootstraps code. You cannot just load in RAW code, you need to bootstrap your deck of cards, and the deck itself needs to keep track at which memory locations each of it's operations are being placed in. Since I offer 2 modes of assembly here, and also support RAW assembly, there are different ways the exact memory pointer is obtained for runtime variables and jumps. The next 2 methods are for writing specific card types, one being a normal list of cards, while the other being an address card. The address cards are only used if you assemble using bootstrap, the original Cardiac bootstrap program.

Since this is a command processor Python subclass, next we are going to put in some commands that our source code can use to perform specific tasks:

    def do_exit(self, args):
        return True
    def do_bootstrap(self, args):
        """ Places some basic bootstrap code in. """
        self.addr = 10
        self.start = self.addr #: Updates all required address variables.
        self.write_cards('002', '800')
        self.pc = self.start
    def do_bootloader(self, args):
        """ Places a Cardiac compatible bootloader in. """
        self.write_cards('002', '800') 
        addr = 89 #: This is the start address of the bootloader code.
        for card in ('001', '189', '200', '689', '198', '700', '698', '301', '889', 'SIZE'):
            self.write_cards('0%s' % addr, card)
            addr+=1
        self.write_cards('002', '889')
        self.pc = 1
    def do_var(self, args):
        """ Creates a named variable reference in memory, a simple pointer. """
        s = shlex.split(args)
        if len(s) != 3 or s[1] != '=':
            raise AsmError('Incorrect format of the "var" statement.')
        if s[0] in self.var_map:
            raise AsmError('Variable has been declared twice!')
        value = int(s[2])
        self.var_map[s[0]] = '000'+str(value)
    def do_label(self, args):
        """ Creates a named label reference in memory, a simple pointer. """
        if args == '':
            raise AsmError('Incorrect format of the "label" statement.')
        if args in self.labels:
            raise AsmError('Label has been declared twice!')
        ptr = '00'+str(self.ptr)
        self.labels[args] = ptr[-2:]

Okay, there's a fair amount to explain with this code. Firstly, the two first methods here, bootstrap and bootloader are obviously Cardiac specific, and it's doubtful that a homemade Assembler will even need such things. The bootstrap() method sets up some variables required for that mode of assembly, namely the address variables used to keep track which memory location we are at. We also write 2 basic cards, the bootstrap that Cardiac needs to load your program in a real or simulated Cardiac. The next method, bootloader is a bit more complex, it writes the same initial bootstrap required by all cardiac programs, but also writes out a second stage bootloader program which is responsible for loading in your program. Once I get to the guide about creating a compiler, the compiler will automatically choose which mode of assembly will be used based on the program being compiled.

The next two very important methods, do_var, and do_label are responsible for controlling the variable and label system of the assembler. They make it easy to set a side a memory address for use as a variable, and to place in a label within your code so that you may JMP to it. If these didn't exist, you would be forced to keep track of memory addresses yourself, and if you update your program, you will also need to update the memory addresses... So, variables and labels keep track of all that for you, and the memory addresses are generated automatically during assembly time. Both of these methods should be fairly easy to understand what they are doing. They are storing the variable and label name into a hash along with the needed metadata.

    def default(self, line):
        """ This method is the actual lexical parser for the assembly. """
        if line.startswith('#'):
            return
        s = shlex.split(line)
        op, arg = s[0].lower(), '00'
        if len(s) == 2:
            if s[1].startswith('$'):
                arg = '*%s' % s[1][1:]
            else:
                arg = self.padding+s[1]
                arg = arg[-2:]
        if op in self.op_map:
            self.write_addr()
            self.write_cards('%s%s' % (self.op_map[op], arg))
        else:
            self.unknown_command(line)
    def do_end(self, args):
        """ Finalizes your code. """
        for var in self.var_map:
            ptr = self.padding+str(self.ptr)
            self.write_addr()
            self.write_cards(self.var_map[var][-3:])
            self.labels[var] = ptr[-2:]
        if self.start:
            self.write_cards('002', '8%s' % self.start)
        self.buffer.seek(0)
        buf = StringIO()
        for card in self.buffer.readlines():
            if card[1] == '*': #: We have a label.
                card = '%s%s\n' % (card[0], self.labels[card[2:-1]])
            elif card[:4] == 'SIZE':
                card = '000'+str(self.size-1)
                card = '%s\n' % card[-3:]
            buf.write(card)
        buf.seek(0)
        print ''.join(buf.readlines())
        return True

Now onto the bread and butter of the assembler, these two methods here do most of the heavy lifting during the assembly process. First lets talk about default(), this method is called by the command-line parser when it cannot find the command the end-user attempted to call. Makes sense? So, here is where we place the code to check if any of those lines are assembly instructions and process them accordingly. First we filter out any comments, any line that starts with a hash is a commented line, simple as that. Next, we use the help of shlex module to parse the actual line. This module splits all the tokens into separate array elements, it even works with quotation marks, so take this line for example: echo "Example echo command" will result in only 2 array elements, the first being echo, and the second being the string. In the next line we set some needed defaults for Cardiac cards, so if there is a missing argument it will still work fine on the Cardiac. It is here that we also check the arguments to see if we are using any variables or labels and mark them accordingly in the buffer for the second pass.

The last and final method here do_end is when you place end in your code at the very end. This method is essentially the last pass, and which generates the final assembled code. It gathers together all the variables, and generates a table of pointers to them. Then we write out the required cards to run the program the Cardiac just loaded, if we are using the original bootstrap. The bootloader doesn't use this, the bootloader manages that during runtime. Then we seek the buffer to the beginning and being our last pass through the assembled data. We check for any memory pointer references and fill in the holes as needed in the output code. If using the bootloader, it needs to be passed the size of the program code to be loaded, and this is where we fill that in so that the bootloader can locate the variable. Then finally, we display the assembled Cardiac code to standard output.

You can find all the source for both the simulator and this assembler on my Python Experiments BitBucket page, along with an example deck that counts to 10 in both Assembly and Cardiac ready format. The code assembled by this assembler should run without error on either a real cardiac or a simulated one. You can find a simulator made in JavaScript by a Drexel University professor by following the links through this blog page here: Developing Upwards: CARDIAC: The Cardboard Computer.

Hopefully this guide/article was helpful and inspiring to you, and I hope to have the final part of this guide/toolchain ready for consumption soon, the guide on how to build a Compiler for the Cardiac computer.

About Me

My Photo
Names Kevin, hugely into UNIX technologies, not just Linux. I've dabbled with the demons, played with the Sun, and now with the Penguins.




Kevin Veroneau Consulting Services
Do you require the services of a Django contractor? Do you need both a website and hosting services? Perhaps I can help.

This Month

If you like what you read, please consider donating to help with hosting costs, and to fund future books to review.

Python Powered | © 2012-2014 Kevin Veroneau