Instruction Set

Understand what an Instruction Set Architecture (ISA) is and how it defines what the CPU can do. Learn how assemblers work to translate human-readable mnemonics into binary machine code, and how different ISAs use different instruction formats.

As we mentioned before, each processor family has its own Instruction Set Architecture (ISA): the set of instructions that its CPUs can understand and execute.

Each instruction is identified by an opcode (operation code), a binary number that tells the CPU which operation to perform. To make code easier to read and write, assembly languages give each opcode a human-readable name called a mnemonic, which is what we usually think of as "an instruction".
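To make the relationship between opcodes and mnemonics concrete, here is a minimal Python sketch. The `OPCODES` table and its values belong to a made-up 8-bit toy ISA chosen for illustration; they do not correspond to any real processor.

```python
# Hypothetical opcode table for a made-up toy ISA (values are illustrative only).
OPCODES = {
    "nop": 0b0000_0000,
    "add": 0b0000_0001,
    "sub": 0b0000_0010,
    "jmp": 0b0000_0011,
}

# Reverse table: what a disassembler would use to go from opcode to mnemonic.
MNEMONICS = {code: name for name, code in OPCODES.items()}


def opcode_of(mnemonic: str) -> int:
    """Return the numeric opcode for a mnemonic (case-insensitive)."""
    return OPCODES[mnemonic.lower()]


def mnemonic_of(opcode: int) -> str:
    """Return the mnemonic for a numeric opcode."""
    return MNEMONICS[opcode]


if __name__ == "__main__":
    print(f"add -> {opcode_of('add'):08b}")
    print(f"00000011 -> {mnemonic_of(0b0000_0011)}")
```

The CPU only ever sees the numbers on the right-hand side of the table; the names exist purely for the programmer's benefit.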

The assembler

The assembler can be thought of as a simple compiler: the programmer writes code in assembly language and feeds it to the assembler, which converts it into machine code.

Even though instructions often share the same overall structure, the way operands and other data are encoded inside each instruction can differ from one instruction, and one ISA, to another. The assembler hides this complexity from the programmer, who usually only needs to pick the right instruction and the right operands.
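As an illustration of the encoding work an assembler takes off the programmer's hands, the sketch below packs the RISC-V instruction `add a0, a1, a2` into a 32-bit word using the R-type layout of the RV32I base instruction set. The function names `encode_r_type` and `encode_add` and the three-entry register table are assumptions made for this example, not part of any real assembler.

```python
# Small subset of the RISC-V ABI register names mapped to register numbers.
REGISTERS = {"a0": 10, "a1": 11, "a2": 12}

# Major opcode shared by RV32I register-register ALU instructions.
OPCODE_OP = 0b0110011


def encode_r_type(rd: str, rs1: str, rs2: str, funct3: int = 0, funct7: int = 0) -> int:
    """Pack an R-type instruction: funct7 | rs2 | rs1 | funct3 | rd | opcode."""
    return (
        (funct7 << 25)
        | (REGISTERS[rs2] << 20)
        | (REGISTERS[rs1] << 15)
        | (funct3 << 12)
        | (REGISTERS[rd] << 7)
        | OPCODE_OP
    )


def encode_add(rd: str, rs1: str, rs2: str) -> int:
    """Encode `add rd, rs1, rs2`, which uses funct3 = 0 and funct7 = 0."""
    return encode_r_type(rd, rs1, rs2)


if __name__ == "__main__":
    word = encode_add("a0", "a1", "a2")
    print(f"{word:#010x}")  # prints 0x00c58533
```

The programmer writes three register names; the bit positions, field widths, and fixed opcode bits are all filled in by the assembler.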

Syntax of assembly languages

Most assembly languages follow the same general syntax:

<instruction mnemonic> <operand1>, <operand2>, ...

Here, the instruction mnemonic is the name of the instruction we want to use, and it is followed by a (possibly empty) list of operands. Each operand can be a register, a memory address, an immediate value (a constant), and so on.
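The general shape above is simple enough to split mechanically. The following Python sketch shows one way an assembler front end might separate a line into its mnemonic and operands; `parse_line` is a hypothetical helper, and it deliberately ignores comments, labels, and operands that themselves contain commas.

```python
def parse_line(line: str) -> tuple[str, list[str]]:
    """Split '<mnemonic> <op1>, <op2>, ...' into (mnemonic, [operands])."""
    parts = line.strip().split(maxsplit=1)
    if not parts:
        raise ValueError("empty line")
    mnemonic = parts[0].lower()
    if len(parts) == 1:
        # Instruction with an empty operand list, e.g. "nop".
        return mnemonic, []
    operands = [op.strip() for op in parts[1].split(",")]
    return mnemonic, operands


if __name__ == "__main__":
    print(parse_line("add a0, a1, a2"))
    print(parse_line("nop"))
```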

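Here are examples of adding two registers in different assembly languages. Note that the number and order of operands differ between them:

# x86 (Intel syntax): eax = eax + ebx
add eax, ebx

# RISC-V: a0 = a1 + a2
add a0, a1, a2

# MIPS: $t0 = $t1 + $t2
add $t0, $t1, $t2

# M68K: d1 = d1 + d0 (the destination operand comes last)
add d0, d1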

Labels, macros and directives

Assemblers also let you define labels, which give a name to a specific address in memory and are often used to mark the position of an instruction. During assembly, the assembler replaces each reference to a label with the actual address it stands for.
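Label replacement is commonly done in two passes: one to record where each label points, and one to substitute the addresses. The Python sketch below illustrates the idea under simplifying assumptions: every instruction occupies a fixed `instr_size` bytes, labels sit on their own line and end with `:`, and `resolve_labels` is a hypothetical helper name. Real assemblers often encode jump targets as offsets relative to the current instruction rather than as absolute addresses.

```python
def resolve_labels(lines: list[str], instr_size: int = 4) -> tuple[dict[str, int], list[str]]:
    """Two-pass label resolution, assuming fixed-size instructions."""
    symbols: dict[str, int] = {}
    instructions: list[str] = []

    # Pass 1: record the address each label points to.
    address = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            symbols[line[:-1]] = address
        else:
            instructions.append(line)
            address += instr_size

    # Pass 2: replace label operands with their numeric addresses.
    resolved: list[str] = []
    for line in instructions:
        mnemonic, _, rest = line.partition(" ")
        operands = [op.strip() for op in rest.split(",")] if rest else []
        operands = [str(symbols.get(op, op)) for op in operands]
        resolved.append(f"{mnemonic} {', '.join(operands)}".strip())
    return symbols, resolved


if __name__ == "__main__":
    program = ["start:", "add a0, a1, a2", "loop:", "add a0, a0, a1", "jmp loop"]
    table, code = resolve_labels(program)
    print(table)
    print(code)
```

Two passes are needed because a label may be used before the line that defines it, so its address is not yet known when the reference is first seen.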

Assemblers also support macros and directives:

  • Macros are a way to define a sequence of instructions that can be reused multiple times in the code.

  • Directives are special instructions that are not executed by the CPU, but are used by the assembler to control the assembly process. They can, for example, be used to define constants, reserve space in memory, or include other files.
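Both features are handled before any machine code is produced. The Python sketch below illustrates this with a hypothetical `preprocess` helper: it expands parameterless macros into their bodies and handles a constant-defining directive modelled loosely on `.equ NAME, VALUE` as found in GNU as. The macro name `push_a0` and its body are illustrative assumptions, and real assemblers support far richer macro and directive syntax.

```python
def preprocess(lines: list[str], macros: dict[str, list[str]]) -> list[str]:
    """Expand parameterless macros and substitute `.equ NAME, VALUE` constants."""
    constants: dict[str, str] = {}
    output: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(".equ"):
            # Directive: consumed by the assembler, emits no instruction.
            name, value = [part.strip() for part in line[len(".equ"):].split(",")]
            constants[name] = value
        elif line in macros:
            # Macro: the single line is replaced by the macro's body.
            output.extend(macros[line])
        else:
            # Ordinary instruction: swap any constant names for their values.
            mnemonic, _, rest = line.partition(" ")
            operands = [op.strip() for op in rest.split(",")] if rest else []
            operands = [constants.get(op, op) for op in operands]
            output.append(f"{mnemonic} {', '.join(operands)}".strip())
    return output


if __name__ == "__main__":
    macros = {"push_a0": ["addi sp, sp, -4", "sw a0, 0(sp)"]}
    source = [".equ COUNT, 10", "li a1, COUNT", "push_a0"]
    for line in preprocess(source, macros):
        print(line)
```

Neither the `.equ` line nor the macro name survives into the output: only real instructions are left for the encoding step.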

Instruction naming

Instructions in assembly languages are usually named after the operation they perform, which makes it easier to understand what the code does.

For example, the add instruction adds two values. Since the word "add" is already short, it is used as the mnemonic unchanged.

For longer or more complex operations, the name is usually either an abbreviation of a word or an acronym of the operation.

A few examples:

  • mov (move): moves a value from one place to another.
  • cmp (compare): compares two values.
  • jmp (jump): jumps to a specific address.
  • bge (branch if greater or equal): branches to a specific address if the first value is greater than or equal to the second value.
  • bclr (bit clear): clears a specific bit in a value.
  • etc...
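To show what two of these mnemonics stand for, here is a small Python sketch of their behavior. The function signatures are assumptions made for illustration; on real ISAs the operand order, operand sizes, and the way the comparison is performed (for example, via status flags set by a preceding cmp) vary.

```python
def bclr(value: int, bit: int) -> int:
    """Bit clear: return value with the bit at position `bit` set to 0."""
    return value & ~(1 << bit)


def bge(pc: int, target: int, lhs: int, rhs: int, instr_size: int = 4) -> int:
    """Branch if greater or equal: return the address of the next instruction.

    If lhs >= rhs, execution continues at `target`; otherwise it falls
    through to the instruction following the branch.
    """
    return target if lhs >= rhs else pc + instr_size


if __name__ == "__main__":
    print(f"{bclr(0b1111, 2):04b}")
    print(bge(pc=8, target=64, lhs=1, rhs=2))
```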