Wednesday, October 15th, 2014

Building a CPU simulator in Python

You can find the entire source listing from this article, plus more interesting code on my Python Experiments BitBucket page. Plus, the source code there will evolve and grow.

Early this morning in my Planet Python feed, I read this really interesting article called Developing Upwards: CARDIAC: The Cardboard Computer, which is about this cardboard computer called the Cardiac. As some of my followers and readers might know, I have another project that I have been working on and evolving for the past months called simple-cpu, which I have released the source code for. I really should give that project a proper license so that others might feel interested in using it in their own projects... Anyways, hopefully I'll have that done by the time this publishes.

After reading that article and the page it links to, I felt a bit inspired and decided to write my own simulator for it since I already have experience in writing bytecode engines. I plan on following this article up with another on how to write a Assembler for it, then shortly after how to write a compiler. So basically, throughout these articles, you will learn how to create a build toolchain for the Cardiac in Python. The simple-cpu I wrote already has a fully working Assembler, and in the game I am putting it in, already has the first steps of a working compiler. I choose to also use Cardiac as a demonstration machine due to it's pure simplicity. There are no really complex mnemonics here, each opcode only accepts a single parameter, so it's perfect as a learning tool. Furthermore, all data parameters are known to be the same, there is no need to detect if the program wants a register, a literal, or a memory address. In fact, there is really only one register, the accumulator. So lets get started. We will be making this class-based, so that the scope is contained and so that you can easily sub-class it to say add new opcodes if you want to feel adventurous. First, we will focus on the Initialization routines. This CPU is very simple, so we only need to initialize the following: CPU registers, opcodes, memory space, card reader/input, and the printer/tty/output.

class Cardiac(object):
    """
    This class is the cardiac "CPU".
    """
    def __init__(self):
        self.init_cpu()
        self.reset()
        self.init_mem()
        self.init_reader()
        self.init_output()
    def reset(self):
        """
        This method resets the CPU's registers to their defaults.
        """
        self.pc = 0 #: Program Counter
        self.ir = 0 #: Instruction Register
        self.acc = 0 #: Accumulator
        self.running = False #: Are we running?
    def init_cpu(self):
        """
        This fancy method will automatically build a list of our opcodes into a hash.
        This enables us to build a typical case/select system in Python and also keeps
        things more DRY.  We could have also used the getattr during the process()
        method before, and wrapped it around a try/except block, but that looks
        a bit messy.  This keeps things clean and simple with a nice one-to-one
        call-map. 
        """
        self.__opcodes = {}
        classes = [self.__class__] #: This holds all the classes and base classes.
        while classes:
            cls = classes.pop() # Pop the classes stack and being
            if cls.__bases__: # Does this class have any base classes?
                classes = classes + list(cls.__bases__)
            for name in dir(cls): # Lets iterate through the names.
                if name[:7] == 'opcode_': # We only want opcodes here.
                    try:
                        opcode = int(name[7:])
                    except ValueError:
                        raise NameError('Opcodes must be numeric, invalid opcode: %s' % name[7:])
                    self.__opcodes.update({opcode:getattr(self, 'opcode_%s' % opcode)})
    def init_mem(self):
        """
        This method resets the Cardiac's memory space to all blank strings, as per Cardiac specs.
        """
        self.mem = ['' for i in range(0,100)]
        self.mem[0] = '001' #: The Cardiac bootstrap operation.
    def init_reader(self):
        """
        This method initializes the input reader.
        """
        self.reader = [] #: This variable can be accessed after initializing the class to provide input data.
    def init_output(self):
        """
        This method initializes the output deck/paper/printer/teletype/etc...
        """
        self.output = []

Hopefully I left enough comments in the code above for you to understand exactly what's going on here. You might notice here a key difference from my simple-cpu project, is the method which opcodes are handled. I actually plan on incorporating this new way of detecting opcodes into that project in the near future. This method makes it easier for other developers of the library to extend it for their own requirements. As mentioned before, that project has been evolving as I started to understand more about how things should be done. In fact, taking on a CPU simulator project is a really incredible learning experience all on it's own. If you are truly a computer scientist, you should understand the under workings of a CPU, and how each opcode is processed at it's lowest level. Plus, developing and seeing a custom built CPU simulator made by your own imagination is a very gratifying experience. It's almost like giving birth, as this machine was entirely built by your mind alone, and seeing it work, is something magical.

In the next part of this class, we will focus on the utility functions that we may need to call and use multiple times, these methods can also be overridden in subclasses to alter how the CPU actually functions:

    def read_deck(self, fname):
        """
        This method will read a list of instructions into the reader.
        """
        self.reader = [s.rstrip('\n') for s in open(fname, 'r').readlines()]
        self.reader.reverse()
    def fetch(self):
        """
        This method retrieves an instruction from memory address pointed to by the program pointer.
        Then we increment the program pointer.
        """
        self.ir = int(self.mem[self.pc])
        self.pc +=1
    def get_memint(self, data):
        """
        Since our memory storage is *string* based, like the real Cardiac, we need
        a reusable function to grab a integer from memory.  This method could be
        overridden if say a new memory type was implemented, say an mmap one.
        """
        return int(self.mem[data])
    def pad(self, data, length=3):
        """
        This function pads either an integer or a number in string format with
        zeros.  This is needed to replicate the exact behavior of the Cardiac.
        """
        orig = int(data)
        padding = '0'*length
        data = '%s%s' % (padding, abs(data))
        if orig < 0:
            return '-'+data[-length:]
        return data[-length:]

These are the various utility functions, all of which might be overridden in a subclass. Later in this article, I will also provide an alternate source output which displays how this simulator can be built using Python Mixin classes, to make things even more pluggable. Finally, we find ourselves at the final part of code to get this simulator running, the actual processing methods:

    def process(self):
        """
        Process a single opcode from the current program counter.  This is
        normally called from the running loop, but can also be called
        manually to provide a "step-by-step" debugging interface, or
        to slow down execution using time.sleep().  This is the method
        that will also need to used if you build a TK/GTK/Qt/curses frontend
        to control execution in another thread of operation.
        """
        self.fetch()
        opcode, data = int(math.floor(self.ir / 100)), self.ir % 100
        self.__opcodes[opcode](data)
    def opcode_0(self, data):
        """ INPUT Operation """
        self.mem[data] = self.reader.pop()
    def opcode_1(self, data):
        """ Clear and Add Operation """
        self.acc = self.get_memint(data)
    def opcode_2(self, data):
        """ Add Operation """
        self.acc += self.get_memint(data)
    def opcode_3(self, data):
        """ Test Accumulator contents Operation """
        if self.acc < 0:
            self.pc = data
    def opcode_4(self, data):
        """ Shift operation """
        x,y = int(math.floor(data / 10)), int(data % 10)
        for i in range(0,x):
            self.acc = (self.acc * 10) % 10000
        for i in range(0,y):
            self.acc = int(math.floor(self.acc / 10))
    def opcode_5(self, data):
        """ Output operation """
        self.output.append(self.mem[data])
    def opcode_6(self, data):
        """ Store operation """
        self.mem[data] = self.pad(self.acc)
    def opcode_7(self, data):
        """ Subtract Operation """
        self.acc -= self.get_memint(data)
    def opcode_8(self, data):
        """ Unconditional Jump operation """
        self.pc = data
    def opcode_9(self, data):
        """ Halt and Reset operation """
        self.reset()
    def run(self, pc=None):
        """ Runs code in memory until halt/reset opcode. """
        if pc:
            self.pc = pc
        self.running = True
        while self.running:
            self.process()
        print "Output:\n%s" % '\n'.join(self.output)
        self.init_output()

if __name__ == '__main__':
    c = Cardiac()
    c.read_deck('deck1.txt')
    try:
        c.run()
    except:
        print "IR: %s\nPC: %s\nOutput: %s\n" % (c.ir, c.pc, '\n'.join(c.output))
        raise

As mentioned above, I will now refactor the code into Mixin classes and provide a full source output of that:

class Memory(object):
    """
    This class controls the virtual memory space of the simulator.
    """
    def init_mem(self):
        """
        This method resets the Cardiac's memory space to all blank strings, as per Cardiac specs.
        """
        self.mem = ['' for i in range(0,100)]
        self.mem[0] = '001' #: The Cardiac bootstrap operation.
    def get_memint(self, data):
        """
        Since our memory storage is *string* based, like the real Cardiac, we need
        a reusable function to grab a integer from memory.  This method could be
        overridden if say a new memory type was implemented, say an mmap one.
        """
        return int(self.mem[data])
    def pad(self, data, length=3):
        """
        This function pads either an integer or a number in string format with
        zeros.  This is needed to replicate the exact behavior of the Cardiac.
        """
        orig = int(data)
        padding = '0'*length
        data = '%s%s' % (padding, abs(data))
        if orig < 0:
            return '-'+data[-length:]
        return data[-length:]

class IO(object):
    """
    This class controls the virtual I/O of the simulator.
    To enable alternate methods of input and output, swap this.
    """
    def init_reader(self):
        """
        This method initializes the input reader.
        """
        self.reader = [] #: This variable can be accessed after initializing the class to provide input data.
    def init_output(self):
        """
        This method initializes the output deck/paper/printer/teletype/etc...
        """
        self.output = []
    def read_deck(self, fname):
        """
        This method will read a list of instructions into the reader.
        """
        self.reader = [s.rstrip('\n') for s in open(fname, 'r').readlines()]
        self.reader.reverse()
    def format_output(self):
        """
        This method is to format the output of this virtual IO device.
        """
        return '\n'.join(self.output)
    def get_input(self):
        """
        This method is used to get input from this IO device, this could say
        be replaced with raw_input() to manually enter in data.
        """
        try:
            return self.reader.pop()
        except IndexError:
            # Fall back to raw_input() in the case of EOF on the reader.
            return raw_input('INP: ')[:3]
    def stdout(self, data):
        self.output.append(data)

class CPU(object):
    """
    This class is the cardiac "CPU".
    """
    def __init__(self):
        self.init_cpu()
        self.reset()
        try:
            self.init_mem()
        except AttributeError:
            raise NotImplementedError('You need to Mixin a memory-enabled class.')
        try:
            self.init_reader()
            self.init_output()
        except AttributeError:
            raise NotImplementedError('You need to Mixin a IO-enabled class.')
    def reset(self):
        """
        This method resets the CPU's registers to their defaults.
        """
        self.pc = 0 #: Program Counter
        self.ir = 0 #: Instruction Register
        self.acc = 0 #: Accumulator
        self.running = False #: Are we running?
    def init_cpu(self):
        """
        This fancy method will automatically build a list of our opcodes into a hash.
        This enables us to build a typical case/select system in Python and also keeps
        things more DRY.  We could have also used the getattr during the process()
        method before, and wrapped it around a try/except block, but that looks
        a bit messy.  This keeps things clean and simple with a nice one-to-one
        call-map. 
        """
        self.__opcodes = {}
        classes = [self.__class__] #: This holds all the classes and base classes.
        while classes:
            cls = classes.pop() # Pop the classes stack and being
            if cls.__bases__: # Does this class have any base classes?
                classes = classes + list(cls.__bases__)
            for name in dir(cls): # Lets iterate through the names.
                if name[:7] == 'opcode_': # We only want opcodes here.
                    try:
                        opcode = int(name[7:])
                    except ValueError:
                        raise NameError('Opcodes must be numeric, invalid opcode: %s' % name[7:])
                    self.__opcodes.update({opcode:getattr(self, 'opcode_%s' % opcode)})
    def fetch(self):
        """
        This method retrieves an instruction from memory address pointed to by the program pointer.
        Then we increment the program pointer.
        """
        self.ir = self.get_memint(self.pc)
        self.pc +=1
    def process(self):
        """
        Process a single opcode from the current program counter.  This is
        normally called from the running loop, but can also be called
        manually to provide a "step-by-step" debugging interface, or
        to slow down execution using time.sleep().  This is the method
        that will also need to used if you build a TK/GTK/Qt/curses frontend
        to control execution in another thread of operation.
        """
        self.fetch()
        opcode, data = int(math.floor(self.ir / 100)), self.ir % 100
        self.__opcodes[opcode](data)
    def opcode_0(self, data):
        """ INPUT Operation """
        self.mem[data] = self.get_input()
    def opcode_1(self, data):
        """ Clear and Add Operation """
        self.acc = self.get_memint(data)
    def opcode_2(self, data):
        """ Add Operation """
        self.acc += self.get_memint(data)
    def opcode_3(self, data):
        """ Test Accumulator contents Operation """
        if self.acc < 0:
            self.pc = data
    def opcode_4(self, data):
        """ Shift operation """
        x,y = int(math.floor(data / 10)), int(data % 10)
        for i in range(0,x):
            self.acc = (self.acc * 10) % 10000
        for i in range(0,y):
            self.acc = int(math.floor(self.acc / 10))
    def opcode_5(self, data):
        """ Output operation """
        self.stdout(self.mem[data])
    def opcode_6(self, data):
        """ Store operation """
        self.mem[data] = self.pad(self.acc)
    def opcode_7(self, data):
        """ Subtract Operation """
        self.acc -= self.get_memint(data)
    def opcode_8(self, data):
        """ Unconditional Jump operation """
        self.pc = data
    def opcode_9(self, data):
        """ Halt and Reset operation """
        self.reset()
    def run(self, pc=None):
        """ Runs code in memory until halt/reset opcode. """
        if pc:
            self.pc = pc
        self.running = True
        while self.running:
            self.process()
        print "Output:\n%s" % self.format_output()
        self.init_output()

class Cardiac(CPU, Memory, IO):
    pass

if __name__ == '__main__':
    c = Cardiac()
    c.read_deck('deck1.txt')
    try:
        c.run()
    except:
        print "IR: %s\nPC: %s\nOutput: %s\n" % (c.ir, c.pc, c.format_output())
        raise

You can find the code for deck1.txt from the Developing Upwards: CARDIAC: The Cardboard Computer article, I used the counting to 10 example.

Hopefully this article was inspiring to you and gives you a brief understanding on how to built class-bases modular and pluggable code in Python, and also gave you a nice introduction to developing a CPU simulator. In the next article which I hope to publish soon, will guide you through making a basic assembler for this CPU, so that you can easily and effortlessly create decks to play around with in the simulator.

Comment #1: Posted 2 years, 11 months ago by Calvin Spealman

This is such cool work! The simplicity of the CARDIAC is certainly something to admire. If you don't mind I'll probably follow up with a nod towards this. Are you planning to put the code up at a proper repo anywhere?

Comment #2: Posted 2 years, 11 months ago by Kevin Veroneau

@Calvin, the code is up in a repo, look inside the information tooltip box at the very top of this article for a click-able link to the BitBucket page.

About Me

My Photo
Names Kevin, hugely into UNIX technologies, not just Linux. I've dabbled with the demons, played with the Sun, and now with the Penguins.




Kevin Veroneau Consulting Services
Do you require the services of a Django contractor? Do you need both a website and hosting services? Perhaps I can help.

This Month

If you like what you read, please consider donating to help with hosting costs, and to fund future books to review.

Python Powered | © 2012-2014 Kevin Veroneau