Computer Architecture – Nand2Tetris (Week 5)

  1. Von Neumann Architecture
    • Computer can run any kind of software
    • Memory stores data and programs
    • CPU carries out instructions
    • CPU has two main parts: ALU and registers
    • 3 Types of info: 1) data, 2) address (what instruction or piece of data), 3) control (what system should do)
    • Wires implemented with buses
    • ALU takes numbers, does stuff, and feeds output back to data bus
    • Registers store intermediate results. Info moves from data bus to registers and vice versa
    • Memory: data memory and program memory
  2. Fetch-Execute Cycle
    • Instructions need to be fetched and then executed
    • Fetch
      • put location of next instruction into the address of program memory
      • get instruction code by reading memory contents of location
      • next instruction usually +1, but can jump / output this to program counter
    • Execute
      • different subsets of bits control different aspects
      • feeds into control bus and tells ALU what to do
    • there is clash between fetch and execute cycle (data when executing and instruction when fetching)
    • This will be simplified in hack computer by having different memory modules (harvard architecture)
  3. Central Processing Unit
    • determines which instruction runs next
    • executes the current instruction, figure out what to execute next
    • Input
      • inM from Data Memory 16-bit
      • instruction from instruction memory 16-bit
      • reset from user 1-bit
    • Output
      • outM 16-bit (what to write)
      • writeM 1-bit (enables write)
      • addressM 15-bit (where)
      • PC 15-bit (next instruction)
    • CPU architecture is outlined in the lecture based on chips from previous project.
    • CPU needs to decode Op code and handle A-instructions and C-instructions
    • Reset bit makes PC start from beginning
  4. The Hack Computer
    • if right hand side has M, value is read from inM
    • if on left, the ALU output is stored in main memory
    • Data Memory has 3 chips: RAM, screen, keyboard
    • ROM32K will have instruction memory (read-only and contains a sequence of Hack instructions)

Machine Language – Nand2Tetris (Week 4)

  1. Machine Language Overview
    • Before implementing language want to get some familiarity using it
    • goal is universality, meaning same hardware can run many different software programs
    • universal turing machine (theory) and von Neumann architecture (practice)
    • Machine Language has 1) Operations 2) Program Counter and 3) Address
    • machine language are in bits (0/1)
    • Machine Language is not convenient to program in which is why we have compilers for translating higher level languages
    • If we want to be this low level , use assembly, which is a symbolic version of machine language
    • an assembler can convert assembly to machine language
  2. Machine Language Elements
    • should support operations and how program is controlled
    • usually machine language has close correspondence with the hardware architecture
    • some simple operations (add, sub, and, or, goto, etc)
    • others have other operations e.g. division and data types (e.g floating point)
    • There is usually a memory hierarchy: registers -> cache -> main -> disk
    • speed/cost trade off
    • all CPUs have some registers so no delay in accessing those values
      • Data Registers
      • Address Registers
      • Direct           // Add R1, M[200]
      • Indirect        //  Add R1, @A
      • Immediate  // Add 73, R1
    • There are also many I/O devices (keyboard, mouse, camera, etc) and we connect them as memory
  3. Hack Machine Language
    • [instruction memory] –instructions–> [CPU] . <–data in/data out–> [data memory]
    • Data Memory (RAM): sequence of 16-bit registers e.g. RAM[0], RAM[1]
    • Instruction memory: Seq of 16-bit registers
    • CPU performs 16-bit instructions
    • There are instruction, data, and address bus
    • A-instructions set the A (address) register and have syntax like “@21” which selects RAM[21]. Before acting on memory you need to select the register
    • C-instructions are computations i.e. 0, 1, -1, D, A, !D, !A, -A, D+1, A+1, D-1, A-1, D+A, D-A, A-D, D&A, D|A     // . (A can be replaced with M)
    • dest can be null, M, D, MD, A, AM, AD, AMD  // can output to multiple containers
    • Jump instructions null, JGT (>0), JEQ (==0), JGE (>=0), JLT ( <0), JLE (<=0), JNE (!=0), JMP (jump)
    • D=-1     // set D register to -1 
      @300     // set A (Address Register) to 300
      M=D-1    // set RAM[300] to value of D-register minus 1
      @56      // set A register to 56
      D-1; JEQ // if D-1 is equal to 0, jump to 56
      
  4. Hack Language Specification
    • symbolic syntax has associated binary
    • A register value is based on addresss space
    • binary is split up: 1-bit op code, 2-bits not used, 7-bit computation, 3-bits for destination, 3 bits for jump (total of 16 bits)
  5. Input/Output
    • Screen
      • For a screen need to translate a screen memory map (256 rows & 512 columns) to corresponding registers
      • will require 8k 16-bit words
      • 32 chunks correspond to a row
      • i = 32*row + col/16
      • we will assume we have a 8k chip called screen
    • Keyboard
      • only need 16-bits to represent
      • memory map will be 0 if nothing is pressed
  6. Hack Programming
    • Use CPU emulator to translate and load assembly language
    • use jumps to implement branching (e.g. if else while)
    • make readable by setting symbolic references
    • (END)
        @END
        0; JMP // this will jump back to line 1
      
    • use variables e.g. @name. will find some available memory
    • helps to start with pseudo-code and/or use a trace-table
    • variables that store memory addresses are called pointers

Memory – Nand2Tetris (Week 3)

  1. Sequential Logic
    • need a notion of things happening after another
    • must be able to reuse hardware to compute multiple things and remember state
    • we break up time into discrete units which helps us think about things in cycles / units
    • combinatorial: input and output happens at same time unit out[t] = function(in[t])
    • sequential: out[t] = function(in[t-1])
  2. Flip Flips
    • Need something to remember bits of information
    • flip flop (DFF) goes to 0 and back to 1
    • we will treat this as a primitive operation
    • First chip we will design will remember its information forever unless we set the load
  3. Memory Units
    • main memory: RAM
    • secondary memory: disks, memory chips
    • volatile/non-volatile
    • We will tend to focus on logical perspectives vs physical
    • we can combine single bit chips to make registers
    • registers will store some value “register state”
    • we can read a register by probing the output
    • registers are the building blocks of memory
    • RAM Abstraction: sequence of n addressable registers 0 to n-1
    • only one register in RAM is selected at a time
    • width of address input can be represented by k = log 2 n
    • to read from ram set address = i and out emits the state of register i
    • to write, set address = i, in = v, and load = 1 and out will emit v
    • Random Access Memory: any address access is basically instantaneous
  4. Counters
    • counter tells what instruction to do next
    • must be able to reset (set to 0), next (inc 1), and goto (set n)
  5. Project 3
    • Will start with chips from project 1, project 2, and DFF gate
    • Will build Bit, Register, RAM8, RAM64, RAM512, RAM4K, RAM16K, PC (Program Counter)
    • 1 Bit – can be built from DFF and multiplexor
    • 16-Bit – multiple 1-Bit registers
    • 8-Register RAM – feed the in value to all registers at same time. use mux/demux to select exact registers
    • stack in recursively ascending manner to build bigger ones
    • think of address input as having two fields: 1) select RAM part 2) select register w/in the RAM part
    • use mux/demux to effect this addressing scheme
    • PC – can be built from register, an incrementer, and some logic gates
  6. Perspectives
    • Flip flop gates can be implemented with NAND gates that loop output to each other which allows storing of state
    • RAM is most important and volatile
    • ROM (Read only memory) is non-volatile and has code for booting process
    • flash memory: combines good from both RAM and ROM. Is read/write and non-volatile
    • cache memory: there is cost trade-off between faster memory. can use hierarchy of smaller more expensive/faster memory. this lets you have small fast/expensive memory and large/cheap memory and still have good performance
    • all memory is based on registers though

Boolean Arithmetic and the ALU – Nand2Tetris (Week 2)

  1. Binary Numbers
    • 0s and 1s can represent integers based on 2^n
    • computers we usually have a fixed number of bits
    • unsigned 8 bit can represent 256 different possibilities
  2. Binary Addition
    • We want to add, subtract, multiply and divide
    • we will implement multiplication and division in software
    • machine add binary with carry numbers
    • for overflows we ignore the most significant carry number
    • is not true integer addition, but module (2^wordsize)
    • half adder takes in a, b and outputs sum and carry
    • full adder takes in a, b, c and outputs sum and carry
    • add multiple bits with full adders and a half adder at last step
  3. Negative numbers
    • We give up half of our possibilities for negative values
    • Could use the most significant digit as negative signal, but have repetitive zeros and makes manipulation difficult
    • two’s complement the most significant position is negative
    • pos number range: 0.. 2^(2-1) -1
    • neg number range: -1 .. -2^(n-1)
    • This gives us addition and subtraction for free
    • If we can get number negation (pos to negative) we get subtraction
    • 2^n – x = 1 + ((2^n) -1) – x
    • e.g. (11111 – 10101) + 1
  4. ALU (Arithmetic Logic Unit)
    • computes a function on two inputs
    • we will take in x (16 bits), y (16 bit), and the following:
      • zx: if zx then x = 0
      • nx: if nx then x = !x
      • zy: if zy then y = 0
      • ny: if ny then y = !y
      • f: if f then out = x+y else x&y
      • no: if no then out = !out
    • output: out, zr, ng
  5. Project 2
    • Half-Adder: sum can be implemented with XOR and carry with AND
    • Full-Adder: can be built from two half-adders
    • 16-bit adder: n full-address chips. carry bit is piped from right to left. drop most significant carry
    • 16-bit Incrementor
    • ALU: build with 16 bit adders. can do with less than 20 lines in HDL
  6. Perspectives
    • Are these chips standard? most, but not the ALU. simplified.
    • How come ALU doesn’t have more operations? Opted for simplicity. can implement in software.
    • Are these efficient? Most are except the adder. made up of multiple full-adders and needs to iterate over them for the carry. Can use carry look ahead to optimize.
    • Why is it recommended to use built-in chips? Local failures. So we know mistakes are only from the current project. unit testing. focus on just the interface.

Boolean Functions and Gate Logic – Nand2Tetris (Week 1)

  1. Boolean Logic
    • 0/1, True/False, On/Off, Yes/No, Positive/Negative
    • AND, OR, NOT are simple operators
    • We can combine e.g. NOT(0 OR (1 AND 1))
    • They can be generalized e.g. f(x,y,z) = (x AND y) OR (NOT(x) AND z)
    • All combos can be expressed as a truth table
    • Can follow communicative, associative and distributive laws
      • communicative: (x AND y) = (y AND x)
      • associative: (x AND (y AND z)) = ((x AND y) AND z)
      • distributive: (x AND (y OR z)) = (x AND y) OR (x AND z)
      • De Morgan’s Law: NOT(x AND y) = NOT(x) OR NOT(y)
  2. Boolean Functions Synthesis
    • Can compose from truth tables
    • Construct functions by going line by line of a truth table and tacking on an OR
    • NAND function can represent AND, NOT, and OR
    • NAND is negation of x and y. outputs 0 if x and y = 1 else 1.
    • NOT(x) = (x NAND x) // 0 input will always be 1 and 1 input will always be 0
    • (x AND y) = NOT(x NAND y) // 1 only if x and y are 1
  3. Logic Gates
    • chips can be thought of as simple logic gates
    • can combine to build other logic gates
    • interface is abstraction that describes what it’s doing
    • there should only be 1 abstraction/interface, but can be multiple implementations
  4. Hardware Description Language (HDL)
    • language for specifying these logic gates
    • Interface will be something like “IN a, b” and “OUT out”
    • Implementation will be statements such as “Not(in=a, out=nota)” (describe intermediary connections with names)
    • Some common HDLs in industry: VHDL and Verilog
  5. Hardware Simulation
    • included with class
    • can run interactive or systematic tests
    • compare files can be used to compare output and test for accuracy
    • compare files can be created with high level language
    • Typically designer (teacher) decides what sub-chips required and API and the developers (students in this course) will implement
  6. Multi-Bit Buses
    • A bunch of bits can be described as a bus
    • described with e.g. A[16], a[0..7] = lsb
    • buses are indexed right to left
    • In a 16 bit bus, A[0] is right most bit and a[15] is left-most or most-significant bit
  7. Project 1 Overview
    • Start with NAND which will be given and will build a computer
    • Will start with 15 gates
    • Multiplexor: takes in 3 inputs a,b,sel and 1 output (out). out is a if sel == 0 else b
    • Demultiplexor: takes in a input, sel and outputs to a or b based on sel
    • practical application of multiplexor and demultiplexor is sending 2 groups of data in one stream over a network and splitting out at the end
    • Try to build chips in order
    • You can design helper chips like methods, but shouldn’t need to and is discouraged
    • Use as few parts as possible
  8. Perspective (FAQs)
    • Can you start with something other than NAND? You can use NOR or AND, NOT, OR, but NAND is popular bc cheap to build and used a lot already
    • How are NAND gates implemented? We try to stay away bc more physics and electrical engineering than CS. One example is a positive and negative charge between an output and two transistors. If the input a and b are active, the negative charge becomes output. A bit outdated, many implementations. (+) (output) (a) (b) (-)
    • How does the HDL we use compare to ones used in industry? Simpler, but is all we need to build our computers or any computer. Others are more advanced, but more complex.
    • How do you design complex chips? Some optimizers, but not perfect algorithm. NP complete problem. Like any software. modularity and abstraction. easier to optimize simpler components.

Introduction – Nand2Tetris (Week 1)

  • Intro
    • In part 1, we will build a simple general purpose computer focusing on hardware
    • In part2, we will build the software hierarchy to write and run programs
    • how things are done is the implementation and what we are doing is abstraction
    • one we’ve implemented, we can just worry about the interface and build on top of it
    • each week we will implement something and build on top of it
  • Part 1
    • This is EE / physics
    • will start with NAND gate and use to build logic gates, chip set and a computer architecure
    • Use hardware simulator
  • Projects
    • week 1: build 15 logic gates
    • week 2: arithmetic logic unit
    • week 3: memory systems
    • week 4: low-level programs
    • week 5: computer architecture
    • week 6: assembler
    • Result will be the HACK computer which is general purpose
  • Part 2
    • will build higher level language, compiler, std library and an operating system

Update

It’s been a long time since I last wrote and quite a bit has happened since then.

Picking up from October, I had been interviewing at a startup in Mountain View for a Full Stack Engineering position which I got through an online coding challenge. The interview process was interesting. I had to come in for a few days and build a Django app using their API. I received some good feedback but ultimately ended up not getting an offer. I had considered staying in the Bay area since I thought it might be easier to get a tech job there, but decided to just pack up and move to LA since my wife had just started dental school in that area.

I spent pretty much the next two months looking for a job. I did work on one more project to get some experience with full stack Javascript (React, Express, Mongo), but from the conversations I had with people, I felt that they thought I was job-ready. Almost everyday, I spent all my time researching companies and reaching out to people.

It was really demoralizing and a tough couple of months. I figured not having a CS background nor any professional experience would make online applications exceptionally useless so I focused all my efforts on going to events and reaching out to anyone I possibly could. Bootcamp graduates were particularly helpful and I’m awfully thankful. A lot of them had also transitioned into the industry from non-traditional backgrounds and were often willing to chat.

Towards the beginning of December, I finally started picking up traction with a few companies and had a few phone interviews. The job I did end up getting was through a UF alumni. I had done research on the company and thought the bar might be too high for their software engineer role. I was a little lonely and didn’t know many people so figured I’d reach out anyway. We met up and had lunch. He recommended looking into the company’s Data Engineer position which I did.

I started working at Factual in January and have been really happy since then. I’ve learned about distributed computing, Hadoop Map Reduce, and Spark. I’ve started doing work around Machine Learning and managing geographical data with PostGIS. I feel like I’m learning a lot and best of all, the culture and people here are really awesome :).

8.14 Boolean Evaluation – CCI

# [8.14] Boolean Evaluation: Given a boolean expression 
# consisting of the symbols 0 (false), 1 (true), &
# (AND), | (OR), and ^ (XOR), and a desired boolean result
# value 'result', implement a function to count the number
# of ways of parenthesizing the expression such that it
# evaluates to 'result'. The expression should be
# fully parenthesize (e.g., (0)^(1)) but not extraneously
# (e.g., (((0))^(1))).

# EXAMPLE
# countEval("1^0|0|1", False) -> 2
# countEval("0&0&0&1^1|0", True) -> 10

import unittest

def bool_eval(expr, result):
    
    if expr == "1":
        if result:
            return 1
        return 0
    if expr == "0":
        if not result:
            return 1
        return 0

    subways = 0

    for i in xrange(1,len(expr),2):
        
        operator = expr[i]
        left = expr[0:i]
        right = expr[i+1:]

        left_true = bool_eval(left, True)
        left_false = bool_eval(left, False)
        right_true = bool_eval(right, True)
        right_false = bool_eval(right, False)
        total_ways = (left_true + left_false) * (right_true + right_false)
        true_ways = 0

        if operator == "&":
            true_ways += left_true*right_true
        elif operator == "|":
            true_ways += (
                (left_true*right_true) 
                + (left_true*right_false) 
                + (left_false*right_true)
            )
        elif operator == "^":
            true_ways += (left_true*right_false) + (left_false*right_true)

        if result:
            subways += true_ways
        else:
            subways += total_ways - true_ways

    return subways
    
class Test(unittest.TestCase):

    def test_bool_eval(self):
        self.assertEqual(bool_eval("0&0&0&1^1|0", True), 10)
        self.assertEqual(bool_eval("1^0|0|1", False), 2)
        
if __name__ == '__main__':
    unittest.main()

8.13 Stack of Boxes – CCI

# [8.13] Stack of Boxes: You have a stack of boxes, with width 
# w, heights h, and depths d. The boxes cannot be rotated
# and can only be stacked on top of one another if each box
# in the stack is strictly larger than the box above it
# in in width, height, and depth. Implement a method to compute
# the height of the tallest possible stack. The height of a 
# stack is the sum of the heights of each box.

import unittest

def max_stack_height_helper(boxes):
    if len(boxes) == 1:
        return boxes[0].height

    current_box = boxes[0]
    stackable_boxes = get_stackable_boxes(current_box, boxes[1:])

    if stackable_boxes:
        current_height = current_box.height + max_stack_height_helper(stackable_boxes)
    else:
        current_height = current_box.height

    return max(current_height, max_stack_height_helper(boxes[1:]))

def max_stack_height(boxes):
    boxes.sort(key=lambda x: x.height, reverse=True)
    return max_stack_height_helper(boxes)


def get_stackable_boxes(box, boxes):
    for i, top_box in enumerate(boxes):
        if box.can_stack(top_box):
            return boxes[i:]
    return None

class Box(object):
    def __init__(self, width, height, length):
        self.width = width
        self.height = height
        self.length = length

    def can_stack(self, top_box):
        return (
            self.width > top_box.width and
            self.height > top_box.height and
            self.length > top_box.length
        )

    def __repr__(self):
        return 'Box(%s, %s, %s)' % (self.width, self.height, self.length)


class Test(unittest.TestCase):
    def setUp(self):
        b1 = Box(1,1,1)
        b2 = Box(2,2,2)
        b3 = Box(3,3,3)
        self.boxes1 = [b1,b2,b3]

        b4 = Box(1,1,1)
        b5 = Box(3,2,2)
        b6 = Box(3,3,3)
        self.boxes2 = [b4,b5,b6]

        b7 = Box(6,4,4)
        b8 = Box(7,5,5)
        b9 = Box(7,8,2)
        self.boxes3 = [b7,b8,b9]

        b10 = Box(5,5,5)
        self.boxes4 = [b10]

    
    def test_max_stack_height(self):
        self.assertEqual(max_stack_height(self.boxes1), 6)
        self.assertEqual(max_stack_height(self.boxes2), 4)
        self.assertEqual(max_stack_height(self.boxes3), 9)
        self.assertEqual(max_stack_height(self.boxes4), 5)


if __name__ == '__main__':
    unittest.main()

8.12 Eight Queens – CCI

# [8.12] Eight Queens: Write an algorithm to print all ways of
# arranging eight queens on an 8x8 chess board so that none
# of them share the same row, column, or diagonal. In this case,
# "diagonal" means all diagonals, not just the two that bisect
# the board.

import unittest

def queens(num_queens):
    results = []
    grid_size = num_queens
    columns = [None]*num_queens
    n_ways(columns, 0, results, grid_size)
    return results


def n_ways(columns, row, results, grid_size):
    
    if row == grid_size:
        results.append(columns)
        return

    for col in range(0, grid_size):
        if is_valid(columns, row, col, grid_size):
            cols_copy = columns[:]
            cols_copy[row] = col
            n_ways(cols_copy, row+1, results, grid_size)


def is_valid(columns, row1, col1, grid_size):
    
    if col1 in columns:
        return False

    for row2, col2 in enumerate(columns):
        if col2 is not None:
            if abs(col1 - col2) == abs(row1 - row2):
                return False

    return True


class Test(unittest.TestCase):
    
    def test_queens(self):
        self.assertEqual(len(queens(8)), 92)

if __name__ == '__main__':
    unittest.main()