[Project 2 is due on Monday, 11/25]

Problem: For the grammar presented in Project1, generate an intermediate representation, which consists of the following:

  • A Control flow graph constructed for each function, where every basic block is a sequence of instructions in the 3-address form, where
  • Each instruction is in an SSA form, and
  • Each operation in the instruction has a 1-1 mapping with the MIPS32 opcode.

The code for generation of abstract syntax trees and symbol tables for the above grammar is here.

Note: The arrays must be of the same name in the CMM source code in order to share memory. In other words, all the array variables are added to the global symbol table.

Simplified Problem: If you are finding it difficult to construct the Control flow graph within the stipulated time, here is a sample code for the CFG construction. You have two choices:

(1) either consider this as an inspiration to have your version of CFG construction, OR

(2) use this code as a starting point to implement SSA using the algorithm described in the papers section.

Debugging: Please note that I have also given the utility for generating the Dotty dumps for the CFG. All the references for using dotty can be found here. This can be very helpful for debugging.

Testing: All the tests are presented here.

MIPS32 instruction set: Here is the subset of MIPS32 instruction set that is sufficient for our code generation. Note that in the description below, dst refers to a destination register; src1 and src2 refer to some source registers; imm refers to an immediate constant.

MIPS32 Instruction Description
add dst, src1, src2 dst = src1 + src2 [signed addition]
sub dst, src1, src2 dst = src1 – src2 [signed subtraction]
mul dst, src1, src2 dst = src1 * src2 [without full precision]
div dst, src1, src2 dst = src1/src2
neg dst, src1 dst = – src1 [arithmetic negation]
not dst, src1 dst = ! src1 [logical not]
and dst, src1, src2 dst = src1 && src2 [logical and]
or dst, src1, src2 dst = src1 || src2 [logical or]
li dst, imm dst = imm [constant assignment]
la dst, lab dst = address(lab) [Load address of label lab]
lw dst, imm(src1) Load word into dst from (address = src1 + imm)
sw src1, imm(src2) Store word from src1 into (address = src2 + imm)
move dst, src1 dst = src1
seq dst, src1, src2 dst = (src1 == src2)? 1 : 0
sne dst, src1, src2 dst = (src1 != src2)? 1 : 0
slt dst, src1, src2 dst = (src1 < src2)? 1 : 0
sle dst, src1, src2 dst = (src1 <= src2)? 1 : 0
sgt dst, src1, src2 dst = (src1 > src2)? 1 : 0
sge dst, src1, src2 dst = (src1 >= src2)? 1 : 0
beqz src1, lab Branch to lab if src1 == 0
lab: A label ‘lab’
j lab; jal lab Jump to lab; If jal is used, $ra is automatically set to the point of return
jr src1 Jump to location pointed to by register src1; src1 is typically $ra, the return address register.
syscall Make a system call

Some notes about the current design (in the shared source code):

(1) The construction of the abstract syntax tree (AST) is done in ccmm.y, where the AST is realized in terms of Stmt and Expr as fundamental nodes in the AST. A NodeVisitor class is used to set up the centralization of all operations over Stmt and Expr. The NodeVisitor interface is currently being leveraged through PrintVisitor (which prints out the AST textually) and ConstructCfgVisitor (which constructs the CFG). Right now, there are pointer leaks in the code as no clean-up has been done. But a CleanupVisitor class can easily be written to remove all the memory leaks.

(2) All the compound statements in the grammar are present within the Stmt class hierarchy, and the varieties of expressions in the grammar are captured within the Expr class hierarchy.

(3) The class hierarchy deriving from Expr can be shown as follows:


(4) The class hierarchy deriving from Stmt can be shown as follows:



(5) The ConstructCFGVisitor maps the AST into a Control flow graph (implemented using the BasicBlock and ControlFlowGraph classes), where the MIPS32 instruction set is directly used in the generated 3-address code. Each 3-address code is realized by the TargetInstr class. Certain instruction operands (for which the register mapping is already known) are pre-allocated. For example, the result register ($v0 or $2), the argument register ($a0 or $4) and the return register ($ra or $31) are directly used in the instructions that are generated for syscall.

(6) There is a special instruction class dedicated to call instructions (TargetCallInstr) that derives from TargetInstr class. The goal of this class is to provide services for generating the prologue and epilogue (stack frame maintenance) after registers have been allocated. This class carefully captures all the actual parameters of the call instruction.

(7) The arguments in the TargetInstr class deliberately uses Expr* as arguments which can be easily mapped to VarExpr*, ConstExpr* or LabelExpr* during the code generation process. This is so chosen because the arguments in all the assembly instructions are either a register (corresponding to a variable), an immediate constant or a label. The variables and labels are mapped to the symbol table entries. So, after register allocation, each variable in the symbol table would receive a mapping to a specific register number.

(8) We have added a special target instruction for emitting labels. We have called the instruction “label” which would be later mapped to the specific label format in the final code.

(9) I have changed the main() interface (in ccmm.y) to take arguments so that you could now issue the filename (*.cmm) as a command line parameter.

(10) The dotty dumping is done within DotCfg class. For example, for the test3.cmm (located here), the CFG can be visualized as follows:


Expected results:

I have added 3 simple SSA tests here.

For an if-then-else pattern captured in ssatest2_output, the CFG in SSA form should appear as follows:


For a while-loop pattern captured in ssatest3_output, the CFG in SSA form should appear as follows:


Submitting to provide:

provide comp181 project2 <file1> [<file2> [<file3> ...]]

Comments are closed.