Assembler programming

Getting to Grips with Assembly Language

Have you pushed Basic to its limits? Do you want to write routines
that are really fast, or go beyond what Basic can do? If so, Martyn
Fox will help you on your way

This is an extremely lengthy article, and has therefore been broken
into a number of 'chapters':

•       ARM Registers
•       The Basic Assembler
•       Labels and loops
•       Branches and conditional execution
•       Loading and storing memory
•       Keeping things aligned

•       Two-pass assembly
•       Pre-indexed addressing
•       Subroutines and the stack
•       Shifting bits sideways
•       Further reading

The example programs referred to in this article are provided in an
associated archive, as is a StrongHelp manual (and a copy of
StrongHelp itself) which is referred to in the final section.

You may be wondering, “What is assembly language and why bother to learn it?”

Assembly language is a convenient way of writing machine code: the
actual instructions used directly by the ARM or StrongARM processor in
your computer. A well-written assembly language program is usually the
most efficient type of program; it should run faster than code
produced by other methods, and take up less memory.

It's also the most laborious to write and debug, though, so why
bother? It may be that you want to move on from Basic but can't be
bothered to learn C or C++, the other common programming options. It
may be that you want to write something which will run faster than a
Basic program (BBC Basic on a RISC OS machine is actually pretty fast,
but not as fast as machine code), or you may wish to write a module
(which certainly can't be written in Basic). Alternatively, you may
not want someone else to be able to dissect your programming, which
they can do fairly easily with a Basic program.

You've nothing to lose by having a dabble, so read on.

Hex dumps, bytes and words

The first thing to do is to see what machine code actually looks like.

You can load a machine code file into a text editor of the type which
can show you a hexadecimal listing of its contents or, alternatively,
you could start up Basic and load the file into spare memory with the
*Load command, then look at it with the *Memory command. Either way,
you'll see something like this:

Address  :     7 6 5 4     B A 9 8     F E D C     3 2 1 0 :    ASCII Data
00009054 :    E92D4000    EB000011    68BD8000    E3A00006 : .@-é...ë.€½h.. ã
00009064 :    E3A03C01    EF02001E    68BD8000    E58C2000 : .< ã...ï.€½h. Œå
00009074 :    E1A0C002    EA000031    E92D4000    EB000007 : .À á1..ê.@-é...ë
00009084 :    68BD8000    E3A00006    E3A03C01    EF02001E : .€½h.. ã.< ã...ï
00009094 :    68BD8000    E58C2000    E1A0C002    EA000027 : .€½h. Œå.À á'..ê
000090A4 :    E24F20AC    E2422000    E28F3060    E2833000 : ¬ Oâ. Bâ`0?â.0?â
000090B4 :    E59F4084    E4920004    E4931004    E0211004 : „@?å..’ä..”ä..!à
000090C4 :    E1500001    128F0010    139EF201    E28F0068 : ..Pá..?..ò?.h.?â
000090D4 :    E1500003    1AFFFFF6    E1A0F00E    00000000 : ..Páöÿÿ..ð á....

Everything in this listing is in hexadecimal notation. Each line shows
the contents of 16 bytes of memory, split into groups of four bytes or
32 bits. Each number in the left-hand column is the address of the
first byte in the first group on the line, and the numbers along the
top line are the lowest digits of the addresses of the bytes below.
Each 32-bit group, incidentally, is called a word. It's important to
understand that every machine code instruction consists of one word
and that the address of the first byte of the word must be divisible
by four. The four bytes are then said to be word-aligned.

You'll no doubt agree that this listing is pretty well
incomprehensible. The processor knows what the contents of the words
mean, but to us poor humans it conveys nothing. Even the right-hand
section, which shows the result of treating the individual bytes as
ASCII code, tells us nothing.

We need a simpler way of looking at the code; one which will tell us
what each instruction means.

Disassembly

We could make the text editor treat the file as machine
code or look at it in memory using the *MemoryI command instead of
*Memory, in which case we would see something like this:

00009054 : .@-é : E92D4000 : STMDB   R13!,{R14}
00009058 : ...ë : EB000011 : BL      &000090A4
0000905C : .€½h : 68BD8000 : LDMVSIA R13!,{PC}
00009060 : .. ã : E3A00006 : MOV     R0,#6
00009064 : .< ã : E3A03C01 : MOV     R3,#&0100
00009068 : ...ï : EF02001E : SWI     XOS_Module
0000906C : .€½h : 68BD8000 : LDMVSIA R13!,{PC}

You can see from the addresses on the left-hand side that each line
now contains just one four-byte word, corresponding to one machine
code instruction. The second column shows what the four bytes would
represent if they were ASCII code and the third column shows the bytes
themselves: still as incomprehensible as before.

When we look at the fourth and fifth columns, though, things start to
get a little clearer. Each line in the fourth column contains a
mnemonic (a brief description of what the instruction does), followed
in the fifth column by which numbers or which of the processor's
registers are involved.

Clearly, the simplest way to write machine code is by writing the
mnemonics and getting them converted. The software which produced the
above listing is called a disassembler. A package which does the job
in reverse, turning mnemonics into machine code, is called an
assembler, and a program consisting of mnemonics is said to be written
in assembly language.

Comments

You may have your own ideas about how many REM statements it's
worth adding to your own Basic programs. You may also occasionally
look at a program you wrote some time ago and wish you had added more.

The equivalent to the REM in assembly language is the comment. A
comment starts with a semi-colon (;) and, like a REM statement, can
either occupy a line on its own or be added onto the end of an
instruction.

It's a good idea to put plenty of comments in your assembly language
programs for two reasons. Firstly, assembly language is a lot more
inscrutable than Basic, and it may be difficult to comprehend the
workings of a program that you wrote several months ago without them.
Secondly, they do not inflate the 'final product' of your programming,
which is the machine code itself. You can put in as many comments as
you like, but none of them will appear in the assembled code.

ARM Registers

Before we start assembling instructions, we must first
understand something of the internal architecture of the processor
that is going to use them.

Within the ARM and StrongARM processors there are sixteen
number-stores called registers, each of which can hold a 32-bit
number. These are referred to as R0, R1, R2 etc. through to R15. In
fact, there are more than sixteen, because the processor actually has
four modes of operation and some of the higher number registers are
replaced by alternative ones when the processor is switched into a
different mode, but that need not concern us in this article.

Register R15 is the program counter (PC). This always holds the
address of the next instruction to be read from memory. Usually, this
is increased by four (to point to the next word) each time an
instruction is read. Sometimes, though, the instruction may be to
branch, i.e. jump, to a different address. When this happens, the
number in R15 is replaced by the address where the program jumps to.

Register R14 is called the link register. Sometimes, we may wish to
call a subroutine, a bit like calling a procedure in Basic. When the
subroutine has finished, we will want the program to jump back to the
point at which the subroutine was called and continue from there.

This is achieved with a branch linking instruction; you can see one at
address &9058 in the listing above, with the mnemonic BL. Before the
jump to the subroutine, the number in the program counter (the return
address) is copied into the link register. To return just involves
copying the link register back into the program counter.

The other higher-numbered registers do not have specific functions
within the processor but are often given special jobs by the software.
Register R13, for example, is normally used as the stack pointer,
containing an address within a section of memory used for temporary
storage. Most of the number-crunching is done by the lower-numbered
registers, R0 to R6.

Some of the assembly language instructions are concerned with copying
numbers from one register to another, or loading a number into a
register. The instruction at &9060, for example, moves the number 6
into register R0.

The Basic Assembler

As we saw earlier, in order to write assembly
language you need an assembler. Fortunately, you already have one!
It's built into the BBC Basic interpreter included in every RISC OS
computer. Take a look at the short file Assem01 to see how to use it:


   10 REM > Assem01
   20 REM simple assembly language program

   40 DIM code% 12
   50 P%=code%
   60 [OPT 3
   70 MOV R0,#7
   80 SWI "OS_WriteC"
   90 MOV PC,R14
  100 ]

The assembler is turned on and off by the square brackets, [ and ], in
lines 60 and 100, and the assembly language instructions are put
between them. Before we get to that bit, line 40 sets up a block of
memory to hold the machine code instructions (12 bytes are enough for
the three instructions in this simple program) and line 50 sets up the
resident integer variable P%. This variable is used by the assembler
to represent the program counter and determines (in this case) where
in memory each instruction is put.

The OPT instruction in line 60 controls the way the assembler
operates. More about that later; for now, leave it set to OPT 3.

We'll look at the rest of the program later. In the meantime, try
running it. Observe what is on the screen, then press space or click
the mouse to get rid of it and get back to here.

You should have seen a command window showing something like the following:

00008FB8                    OPT 3
00008FB8 E3A00007           MOV R0,#7
00008FBC EF000000           SWI "OS_WriteC"
00008FC0 E1A0F00E           MOV PC,R14

The program started up the assembler and assembled each instruction at
the address pointed to by P%, putting what it was doing on the screen
each time. The first instruction was placed at address &8FB8, which
happened to be the value of code% and the start of the memory block
set up by the DIM instruction. After each address comes the
hexadecimal machine code instruction followed by the original mnemonic
which created it.

After each instruction has been assembled, P% is incremented by four
so that the next instruction is assembled four bytes further on. Don't
worry if your address numbers were different; it's not important, but
note that the addresses are always word-aligned; they all end in 0, 4,
8 or C because they're divisible by four.
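The way P% steps through memory can be mimicked in a few lines. What follows is only a sketch in Python, not one of the article's Basic listings; the function name instruction_addresses is made up for illustration, and the start address is simply the one that appeared in the listing above.

```python
def instruction_addresses(start, count):
    """Return the address of each of `count` consecutive 4-byte instructions."""
    if start % 4 != 0:
        raise ValueError("start address is not word-aligned")
    # Each instruction occupies one word, so P% moves on by four each time.
    return [start + 4 * i for i in range(count)]


for address in instruction_addresses(0x8FB8, 3):
    # A word-aligned address always ends in 0, 4, 8 or C in hexadecimal.
    print(f"{address:08X}  last digit {address & 0xF:X}")
```

Whatever start address the DIM statement happens to hand out, the last hexadecimal digit of every instruction address stays within 0, 4, 8 and C.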

All that this program does is assemble a bit of machine code; it
doesn't run it. To do that, prepare to run file Assem02, which is
identical to Assem01 except that it has two extra lines on the end:



  120 CALL code%

This time, when you run the program, you should see the same assembled
listing as before, though with different address numbers because the
Basic program is longer. The cursor will be flashing away underneath.
The program has paused at the loop in line 110. When you press a key
it will move on to line 120, which is an instruction to run the
machine code, starting at address code%.

Now run the file and see what happens; but first, a word of warning.
Machine code does not have the error-trapping capabilities of Basic.
If there is an error in your assembly language program (as there is
bound to be at some point), it is highly likely that your computer
will crash and have to be reset. If you're following an on-screen
guide such as this, make sure that you can get back to where you were
in it. Also make sure that you don't have any unsaved work on your
computer.

You should have found that, when you pressed a key, the machine
beeped, then asked you to press Space or click the mouse to return to
the desktop. It's time to examine the three instructions to find out
what they're doing:

MOV R0,#7
SWI "OS_WriteC"
MOV PC,R14


The first instruction, the MOV command, is one of the simplest of all
assembly language instructions: a command to move something into a
register. If the instruction had been:

MOV R0,R1

then this would have been a command to copy the contents of
register R1 into register R0. The hash symbol (#), though, means that
'7' is an immediate constant: it specifies the actual number to be
placed in the register. Without it, the assembler would assume it
meant register R7 (the 'R' is optional, but makes things clearer).
This instruction results in the number seven being placed in register
R0.

There is a limitation to the number that can be moved into a register
in this way. The machine code instruction consists of 32 bits, and
only twelve of them are available to hold the number: eight for the
value itself and four to say how far it is rotated to the right
(always by an even number of places). The number must therefore fit
into eight bits once the rotation is undone. The instruction

MOV R0,#&FF

is permitted but

MOV R0,#&1FF

is not.
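The full rule is that the eight-bit value may also be rotated right by an even number of places, which is why the MOV R3,#&0100 in the disassembly earlier is legal. The rule can be checked mechanically. The following Python sketch is an illustration only; the names rotate_left_32 and is_arm_immediate are hypothetical and have nothing to do with the Basic assembler's own code.

```python
def rotate_left_32(value, places):
    """Rotate a 32-bit value left by `places` bits."""
    places %= 32
    return ((value << places) | (value >> (32 - places))) & 0xFFFFFFFF


def is_arm_immediate(number):
    """True if `number` is an 8-bit value rotated right by an even amount."""
    number &= 0xFFFFFFFF
    for rotation in range(0, 32, 2):
        # Undo a rotate-right by `rotation`; if what is left fits into
        # eight bits, the constant can be encoded in the instruction.
        if rotate_left_32(number, rotation) <= 0xFF:
            return True
    return False


for constant in (0xFF, 0x100, 0x1FF):
    verdict = "permitted" if is_arm_immediate(constant) else "not permitted"
    print(f"MOV R0,#&{constant:X} is {verdict}")
```

A constant such as &1FF has nine adjacent bits set, so no rotation can squeeze it into eight.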

The second instruction is, of course, a software interrupt (SWI), used
to call the operating system. In Basic, a SWI may be called using the
SYS command. As you may know, the command:

SYS "OS_Splodge", a%, b%, c% TO x%, y%, z%

puts the values of a%, b% and c% into R0, R1 and R2 respectively,
calls the SWI whose name is given, and, on return, puts the values in
R0, R1 and R2 into Basic variables x%, y% and z%.

The first two instructions are the equivalent of the Basic command:

SYS "OS_WriteC", 7

which is the equivalent of VDU 7, a command to make the machine beep.

If you were to copy Assem02 from the CD onto your hard disc so that
you could modify it and change the number after the hash from 7 to 65,
you should find that, instead of beeping, a letter 'A' appears on the
screen. The number 65 is, of course, the ASCII code for A, and the two
instructions are now the equivalent of VDU 65.

Getting back to Basic

The third instruction is the most important of
all. It brings the program to an end by making the processor jump back
to the Basic interpreter. Without it, the processor would load
whatever was in the memory after the last instruction and try to
execute it as a machine code instruction: definitely a way of crashing
the computer!

When Basic executed the CALL command, it treated your program as a
subroutine, getting to it with a Branch Linking or BL instruction.
This caused the address in the program counter (i.e. the first
instruction to be executed after your program had finished) to be
placed into R14 (this is often known as the return address). To get
back to this address, we simply have to copy the number in R14 back
into R15 which, you will recall, is the program counter.

The assembler recognises “PC” as referring to R15, so we use it to
remind ourselves that we are talking about the program counter. The
instruction MOV PC,R14 simply copies the contents of R14 into the
program counter, which is all that is required to pass control back to
Basic.

Labels and loops

In our next experiment, we're going to make use of
Basic variables within the assembly language part of the program.

Take a look at file Assem03:


   10 REM > Assem03
   20 REM simple assembly language program

   40 DIM code% 24
   50 P%=code%
   60 [OPT 3
   70 MOV R0,#ASC("A")
   80 .loop
   90 SWI "OS_WriteC"
  100 ADD R0,R0,#1
  110 CMP R0,#ASC("Z")+1
  120 BNE loop
  130 MOV PC,R14
  140 ]

  160 CALL code%

This program works in the same way as the previous one, but we've
added some more instructions. The memory block created by the DIM
command in line 40 has been enlarged to 24 bytes for this reason. In
fact, there's no harm in creating a large block, perhaps of several
thousand bytes, while you're experimenting, provided you have the RAM
to spare; it can mean that you avoid the risk of running out.

We saw how the previous program could be modified by changing the
immediate constant in the MOV instruction from 7 to 65 so that it
printed a letter A. Line 70 does the same thing, but the simple number
'65' has been replaced by ASC(“A”), which means 'the ASCII code for
A', to make the listing more readable. The result is just the same,
but it makes it easier to follow what the program is doing.

The word loop with a dot in front of it in line 80 is a label. This is
a way of marking the point in the program where something occurs so
that we could refer to it at some other place. We might use a label in
one of two ways:

1.      To mark a point we might wish to branch (i.e. jump) to;
2.      To mark some data we might wish to load into a register, or a place
in memory where we might wish to store some data.

In this program, the label is being used as part of a repeated loop,
to mark the point where the program jumps back to.

A label with a dot may either occupy a line on its own, as in this
case, or be placed in front of an instruction, separated from it by a
space.

The label is, in fact, a Basic variable which is created and given the
current value of the program counter, P%. Putting a dot in front of it
is similar to typing:

loop=P%

(except that you can't do that between the square brackets
where the assembler is operating.)

Line 90 operates in the same way as in the previous program, calling
the SWI “OS_WriteC” to print a letter A on the screen.

Line 100 introduces a new instruction, ADD, which, not surprisingly,
adds two numbers together. It has to be followed by three parameters,
referred to as the destination, operand one and operand two, such as:

ADD R0,R1,R2

This instruction would mean, “Take the numbers stored in
registers R1 and R2, add them together and store the result in R0.”

The destination (R0) and operand one (R1) must be registers. Operand
two could be either a register or an immediate constant.

The actual instruction in line 100 has an immediate constant for
operand two and the register where the answer is stored is the same as
the one where the other number is taken from. There is nothing wrong
with this. The instruction means, “Take the number in R0, add 1 to it
and put the result back into R0,” in other words, “increment the
number in R0 by 1.”

Comparisons and processor flags

The next instruction, CMP, is a
comparison and is followed by two operands. It subtracts operand 2
from operand 1, but doesn't put the result anywhere. Instead, it sets
or clears one or more of the processor flags. These are single bits
which are actually bits in R15 that are not used by the program
counter. There are four of them, each usually referred to by a letter:

Negative (N):   Set if operand 2 is greater than operand 1 (i.e. the result
                of the subtraction is negative)
Zero (Z):       Set if operand 2 is equal to operand 1 (i.e. the result of
                the subtraction is zero)
Carry (C):      Set if operand 1 is greater than or equal to operand 2,
                treating them as unsigned numbers
Overflow (V):   Set if a mathematical overflow occurred

In fact, various mathematical instructions can set these flags, but
they only do so if they have the suffix S on the end of their
mnemonic. The CMP instruction doesn't need the S suffix because its
only purpose is to set the flags.

When the CMP instruction is executed for the first time, operand 1
(R0) is the ASCII code for 'A' (65) and it subtracts 91, a number one
greater than the ASCII code for 'Z', from it. The result is clearly
negative, so the N flag would be set. Each time round the loop,
however, R0 has been increased by 1, and eventually reaches the value
91. When this happens, the result of the comparison becomes zero, the
N flag is clear and the Z flag is set.
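The two comparisons just described can be worked through by hand or, as here, in a short Python sketch. It is only a model of what CMP does to the flags, written under the usual 32-bit two's complement rules; the function name cmp_flags is made up for this illustration.

```python
def cmp_flags(operand1, operand2):
    """Return the N, Z, C and V flags that CMP operand1,operand2 would set."""
    a = operand1 & 0xFFFFFFFF
    b = operand2 & 0xFFFFFFFF
    result = (a - b) & 0xFFFFFFFF              # the subtraction CMP performs
    return {
        "N": bool(result & 0x80000000),        # top bit of the result
        "Z": result == 0,                      # result is zero
        "C": a >= b,                           # no borrow (unsigned compare)
        "V": bool((a ^ b) & (a ^ result) & 0x80000000),  # signed overflow
    }


print(cmp_flags(ord("A"), ord("Z") + 1))   # first time round the loop
print(cmp_flags(ord("Z") + 1, ord("Z") + 1))   # when R0 has reached 91
```

On the first comparison only N comes out set; on the last one N is clear and Z is set, which is the change the branch instruction looks for.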

Branches and conditional execution

Now we get to the instruction in
line 120 where a decision is made whether to go back for a repeat of
the loop or to plough straight on. This instruction is a branch; a
jump forwards or backwards, specified not in terms of the destination
address but as the distance moved from where we are now. The number in
the program counter is increased or decreased by the number contained
in the instruction so that execution continues at a different point.
This need not greatly concern us, though, because the assembler works
everything out for us. We just have to tell it the address to branch
to, which is the value of the label loop.

If we wanted the program to branch every time it reached this point,
the instruction would be:

B loop

The other two letters, NE, mean that this instruction is our first
example of conditional execution.

Any instruction can be executed conditionally, the condition being
determined by the suffix. There are sixteen possibilities:

EQ              Equal (Z flag set)
NE              Not equal (Z flag clear)
CS              Carry flag set
CC              Carry flag clear
MI              Minus (N flag set)
PL              Positive or zero (N flag clear)
VS              Overflow (V) flag set
VC              Overflow (V) flag clear
HI              Unsigned higher (C flag set and Z flag clear)
LS              Unsigned lower or the same (C flag clear or Z flag set)
GE              Signed higher or the same (N flag the same as the V flag)
LT              Signed lower (N flag not the same as the V flag)
GT              Signed higher (Z flag clear and the N flag the same as the V flag)
LE              Signed lower or the same (Z flag set or the N flag not the same
                as the V flag)
AL              Always
NV              Never

Obviously, you never need to use the AL suffix because unconditional
execution doesn't need a suffix. The use of NV is also frowned upon
because the bit-pattern it sets up might be used for some other
condition some day.

The BNE instruction in line 120 means 'Branch if not equal'. As long
as the CMP instruction in the previous line is comparing two different
numbers, it will keep the Z flag clear. When the value in R0 reaches
ASC(“Z”)+1, i.e. 91, the instruction will be comparing identical
numbers, so the Z flag will be set. Under this condition, the branch
instruction will not be executed and the program finishes.
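Each condition suffix is just a test on the four flags, so the whole set can be written out as a small table of rules. The Python sketch below is an illustration only (the names CONDITIONS and condition_passes are hypothetical); it takes the flag values as given and reports whether an instruction carrying a particular suffix would be executed.

```python
# One rule per suffix; each takes the N, Z, C and V flags as booleans.
CONDITIONS = {
    "EQ": lambda n, z, c, v: z,
    "NE": lambda n, z, c, v: not z,
    "CS": lambda n, z, c, v: c,
    "CC": lambda n, z, c, v: not c,
    "MI": lambda n, z, c, v: n,
    "PL": lambda n, z, c, v: not n,
    "VS": lambda n, z, c, v: v,
    "VC": lambda n, z, c, v: not v,
    "HI": lambda n, z, c, v: c and not z,
    "LS": lambda n, z, c, v: (not c) or z,
    "GE": lambda n, z, c, v: n == v,
    "LT": lambda n, z, c, v: n != v,
    "GT": lambda n, z, c, v: (not z) and n == v,
    "LE": lambda n, z, c, v: z or n != v,
    "AL": lambda n, z, c, v: True,
    "NV": lambda n, z, c, v: False,
}


def condition_passes(suffix, n, z, c, v):
    """True if an instruction with this suffix would be executed."""
    return CONDITIONS[suffix](n, z, c, v)


# Flags after CMP R0,#91 while R0 is still 65: only N is set.
print(condition_passes("NE", n=True, z=False, c=False, v=False))
# Flags once R0 has reached 91: Z and C are set.
print(condition_passes("NE", n=False, z=True, c=True, v=False))
```

The first call reports that BNE would branch; the second reports that it would not, so the program drops through to the next instruction.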

We could have achieved the same result with the instruction:

BMI loop

In this case, the loop repeats as long as the N flag is set.
When operand 1 reaches the same value as operand 2, not only are they
equal but the result of the subtraction is no longer negative. Doing
it this way would be safer if R0 was not always incremented by 1 each
time. If for some reason R0 skipped the value in operand 2, the BNE
instruction would fail to stop the loop because the situation where
both values were equal would be missed. The BMI instruction, though,
would still catch it.
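The difference between the two branches shows up as soon as the step is changed. The Python sketch below imitates the loop from Assem03 rather than running any machine code; the function run_loop, its step parameter and the limit that stops a runaway loop are all assumptions added for the illustration.

```python
def run_loop(step, branch, limit=100):
    """Imitate the Assem03 loop and return the characters it would print."""
    r0 = ord("A")                      # MOV R0,#ASC("A")
    stop = ord("Z") + 1
    printed = []
    for _ in range(limit):             # safety net against a runaway loop
        printed.append(chr(r0))        # SWI "OS_WriteC"
        r0 += step                     # ADD R0,R0,#step
        difference = r0 - stop         # CMP R0,#ASC("Z")+1
        if branch == "BNE":
            again = difference != 0    # Z flag clear
        elif branch == "BMI":
            again = difference < 0     # N flag set
        else:
            raise ValueError("only BNE and BMI are modelled here")
        if not again:
            break
    return "".join(printed)


print(run_loop(1, "BNE"))
print(run_loop(4, "BMI"))
print(len(run_loop(4, "BNE")))
```

With a step of 4 the value 91 is never hit exactly, so the BNE version only stops when the safety limit is reached, whereas the BMI version stops as soon as R0 passes 'Z'.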

You've probably worked out by now what this program does, even if you
haven't run it to have a look. Each time round the loop, a character
is printed whose ASCII code is one higher than the one before. It
starts at 'A' and ends with 'Z'. In other words, it prints the
alphabet.

Loading and storing memory

So far we've seen how to move immediate
numbers into registers and move numbers between registers, but we
haven't loaded from or stored to the main memory.

Take a look at file Assem04. It works in a similar manner to the first
program, except that the ASCII code used to call the SWI is loaded
from a memory location instead of being moved into a register as an
immediate constant.


   10 REM > Assem04
   20 REM simple assembly language program

   40 DIM code% 100
   50 P%=code%
   60 [OPT 3
   70 .data
   80 EQUD &41
   90 .start
  100 ADR R1,data
  110 LDR R0,[R1]
  120 SWI "OS_WriteC"
  130 MOV PC,R14
  140 ]

  160 CALL start

The code byte is stored at a location pointed to by the label data,
and is put there by the EQUD command. This is one of several assembler
commands which put data into memory rather than assemble machine code
instructions. The full list is:

EQUB:           stores one byte,                e.g. EQUB &41
EQUW:           stores two bytes,               e.g. EQUW &0D0A
EQUD:           stores four bytes (one word),   e.g. EQUD &56F4D31A
EQUS:           stores a string,                e.g. EQUS "This is a string"

A word of explanation here. These terms were originally devised for
the 8-bit 6502 assembler built into the Basic interpreter in the BBC
Microcomputer in the early 1980s. In those days, the term 'word' was
used to mean 16 bits or two bytes; hence the term EQUW for two bytes.
The expression EQUD meant 'double word', or four bytes. When the
32-bit ARM processor was developed, it was decided that it would be
better for 'word' to mean four bytes, or 32 bits. The old expressions,
however, have been retained for compatibility.

Keeping things aligned

It will have occurred to you that line 80
appears to use EQUD to store one byte. There is a reason for doing it
this way. The address pointed to by label data is word-aligned, i.e.
divisible by four. If we used the EQUB command to store one byte, the
program counter, P%, would be incremented by one, not four, and so
would no longer be word-aligned. This is important because it would
mean that the machine code instruction in line 100 (and all the
subsequent instructions) were not word-aligned and the processor would
not cope with this.

Line 80 could have been written as:

80 EQUB &41:ALIGN


The ALIGN command means, “If P% is not divisible by four, then
increment it until it is.” It is very important to use it if you add
something to the memory which doesn't consist of a multiple of four
bytes. We might, for example, wish to put in a string, terminated by a
zero:

EQUS "This is a string":EQUB 0:ALIGN

The string itself contains 16 bytes: the zero increases this to 17.
The ALIGN command then adds a further three bytes to restore
word-alignment.
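The arithmetic behind ALIGN is simply rounding up to the next multiple of four. Here it is as a Python sketch; the function name align and the start address &9000 are hypothetical, chosen only to make the sums easy to follow.

```python
def align(p, boundary=4):
    """Round p up to the next multiple of `boundary`, as ALIGN does for P%."""
    return (p + boundary - 1) // boundary * boundary


p = 0x9000                      # hypothetical word-aligned start address
p += len("This is a string")    # EQUS stores the 16 characters
p += 1                          # EQUB 0 adds the terminator: 17 bytes
padding = align(p) - p          # bytes skipped by ALIGN
print(f"P% before ALIGN: &{p:04X}, after: &{align(p):04X}, padding: {padding}")
```

If P% is already divisible by four, align returns it unchanged, so an extra ALIGN never does any harm.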

Coming back to the original form of line 80, the expression EQUD &41
is effectively the same as EQUD &00000041. Four-byte numbers are
always stored in memory with the least significant byte in the lowest
of the four addresses. The byte pointed to by label data will contain
&41, and the next three bytes will contain zeros.
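This byte order can be seen directly with Python's struct module, which packs a number into bytes in a stated order. The sketch below is only an illustration; the helper name equd is borrowed from the assembler directive for convenience and is not a real library function.

```python
import struct


def equd(value):
    """Return the four bytes EQUD would store, lowest address first."""
    # "<I" means a little-endian unsigned 32-bit integer.
    return struct.pack("<I", value & 0xFFFFFFFF)


for word in (0x41, 0x56F4D31A):
    stored = " ".join(f"{byte:02X}" for byte in equd(word))
    print(f"EQUD &{word:08X} is stored as {stored}")
```

The same ordering explains the hex dump near the start of the article: the word E92D4000 sits in memory as the bytes 00 40 2D E9, which is why its ASCII column begins ".@-é".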

Memory addressing

The LDR instruction in line 110 loads the contents of
a memory location into register R0. It's not possible to put the
entire 32-bit address of label data into the instruction; the entire
instruction only consists of 32 bits, so we have to do it some other
way.

The address of the location to be loaded from is contained within the
square brackets. In this case, it's pointed to by R1. It's also
possible to add an offset (either a second register or an immediate
constant) which comes after the register number but is still within
the brackets. We'll see how that is used later.

The form of the instruction in line 110 tells the processor to load
from the address pointed to by R1, and to put the contents into R0.

We set up R1 in line 100 with an ADR instruction. You may be wondering
how, if we can't get a complete 32-bit address into the LDR
instruction, we're able to do it with ADR. The answer is that ADR is
actually a pseudo-instruction: one which the assembler effectively
creates out of another instruction. In this case, it calculates the
difference between the required address and the current value of the
program counter and sets up an instruction to add or subtract this
difference to or from the PC. In other words, the address is stated
relative to the PC, not in absolute terms.

Pipelining

All this is transparent to the programmer when using the ADR
instruction except for one thing. There is a limit to how far away
from the instruction the target address can be because the offset
number can only have eight bits. It's actually possible to refer to an
address a little further forward from the ADR instruction than behind
it because of a feature called pipelining.

When the processor fetches an instruction from memory, it decodes it
while it's fetching the next one and executes it while fetching the
one after that. What this means is that, while an instruction is being
executed, the program counter has already moved on eight bytes to
fetch the instruction after next. If the instruction does something
relative to the PC, such as a branch or ADR, it is doing it relative not
to its own address but to an address eight bytes further on. The
assembler always takes this into account when setting up such
instructions, but it's best to bear it in mind.
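The eight-byte allowance can be checked against the hex dump near the start of the article. The Python sketch below works out the offset that the assembler would place in a branch instruction; it relies on the fact that B and BL hold their offset as a 24-bit count of words, and the function name branch_offset_field is made up for the illustration.

```python
def branch_offset_field(instruction_address, target_address):
    """Return the 24-bit offset field of a B or BL instruction."""
    pc = instruction_address + 8           # pipelining: PC is two words ahead
    byte_offset = target_address - pc
    if byte_offset % 4 != 0:
        raise ValueError("branch target must be a whole number of words away")
    # The field counts words, and negative values wrap round to 24 bits.
    return (byte_offset >> 2) & 0xFFFFFF


# BL &000090A4 at address &9058 was assembled as EB000011.
print(f"{branch_offset_field(0x9058, 0x90A4):06X}")
# The word 1AFFFFF6 at &90D8 is a backwards BNE to &90B8.
print(f"{branch_offset_field(0x90D8, 0x90B8):06X}")
```

The first result, 000011, matches the bottom six digits of EB000011 in the dump, and the second matches the FFFFF6 in 1AFFFFF6.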

Loading bytes and words

Returning to our program, the LDR instruction
always loads a complete word which, in this case, is &00000041. This
doesn't matter much in this program because the OS_WriteC SWI only
acts on the bottom byte of R0. If we wanted to load a single byte (for
example, if we had a string of ASCII codes), we could add a B suffix
to the LDR instruction, making it:

LDRB R0,[R1]

Only a single byte would be loaded, into the bottom eight bits of R0,
with the rest of the register set to zero. This instruction would work
just as well in line 110 as the LDR instruction.

Note, by the way, that the B suffix goes after any condition code on
the instruction. If, for example, the above instruction was only to be
executed if the zero flag was clear, its mnemonic would be:

LDRNEB R0,[R1]

It should be apparent now how this program works. The
address of the data word is set up in R1 and is used to load an ASCII
code into R0. The SWI is then called to print a character on the
screen and the program then exits.

A problem

The last line of the Basic part of the program has been
changed. The part to be executed no longer starts at the beginning of
the block at code%, but four bytes further on at the label called
start, so that is the address which we CALL. This may be inconvenient.
It may seem preferable to put the data label and the ASCII code at the
end of the code, as in the file Assem05:


   60 [OPT 3
   70 .start
   80 ADR R1,data
   90 LDR R0,[R1]
  100 SWI "OS_WriteC"
  110 MOV PC,R14
  120 .data
  130 EQUD &41
  140 ]

There is certainly nothing wrong with the assembly language
instructions here, but you will find if you try to run Assem05 that
you get an error message saying, “Unknown or missing variable at line
80”.

It's easy to work out what's going wrong. In line 80, the program has
to do something with the value of label data which, you will recall,
is a Basic variable. In the previous listing, this variable was
created in line 70, given the current value of P% and used in line
100. By the time the program reached line 100, it already knew the
value of variable data.

In this program, though, the variable data is used in line 80 but not
created until the program reaches line 120. How do we get round this
problem?

Two-pass assembly

The answer is to assemble the code twice. On the
first 'pass' we create all the instructions but ignore any references
to labels we don't know about. The instruction will occupy its four
bytes but the numbers it contains may be wrong.

By the time we get to the end of the first pass, we should have met
all the labels. We can then go back and assemble the code again,
exactly as it was before, except that this time all the references to
labels should work (provided, of course, that we don't include a
reference to a label that doesn't exist!).
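The idea is easier to see in miniature. The Python sketch below is a toy, not a model of the real Basic assembler: it treats every non-label line as one four-byte instruction, uses a made-up "->" notation to mark a reference to a label, and starts at an arbitrary address. The names one_pass, two_pass and SOURCE are all hypothetical.

```python
def one_pass(source, start, labels, strict):
    """Go through `source` once, recording labels in `labels` as they are met.

    Returns a list of (address, text, target) tuples; target is None when
    the line has no label reference or the label is not yet known.
    """
    p = start
    output = []
    for line in source:
        if line.startswith("."):
            labels[line[1:]] = p           # a label takes the current address
            continue
        target = None
        if "->" in line:                   # made-up marker for a label reference
            name = line.split("->")[1].strip()
            if name in labels:
                target = labels[name]
            elif strict:
                raise NameError(f"unknown label: {name}")
        output.append((p, line, target))
        p += 4                             # every instruction is one word
    return output


def two_pass(source, start):
    labels = {}
    one_pass(source, start, labels, strict=False)   # pass 1: collect labels
    return one_pass(source, start, labels, strict=True)


SOURCE = [".start", "ADR R1 -> data", "LDR R0,[R1]",
          'SWI "OS_WriteC"', "MOV PC,R14", ".data", "EQUD &41"]

for address, text, target in two_pass(SOURCE, 0x8000):
    note = "" if target is None else f"   (label at &{target:04X})"
    print(f"{address:08X}  {text}{note}")
```

On the first pass the forward reference to data is quietly left unresolved; on the second the label is already in the table, so the reference can be filled in and a genuinely unknown label can be reported.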

The easiest way to run a piece of code twice in Basic is with a FOR
… NEXT loop, and this is what we do in Assem06:


   10 REM > Assem06
   20 REM simple assembly language program

   40 DIM code% 100
   50 FOR pass%=0 TO 3 STEP 3
   60   P%=code%
   70   [OPT pass%
   80   .start
   90   ADR R1,data
  100   LDR R0,[R1]
  110   SWI "OS_WriteC"
  120   MOV PC,R14
  130   .data
  140   EQUD &41
  150   ]
  160 NEXT

  180 CALL start

Making use of OPT

The assembler has to work differently on the two
passes. The way it behaves is controlled by the OPT statement at the
start of the assembler section. In all the examples up to now, we've
left this set to 3.

The individual bits of the value of OPT control different aspects of
the assembler:

Bit 0:          If clear, the assembled listing is not shown on the screen;
if set, it is shown.
Bit 1:          If clear, unknown labels are ignored; if set, they cause an error.
Bit 2:          If clear, P% acts as both the program counter and a pointer
to where the machine code is assembled. If set, offset assembly is
used. P% then acts as the program counter but O% controls where in
memory the instruction is placed. Both variables are normally
incremented together.
Bit 3:          If set, a range check is applied to ensure that we don't try
to assemble more code than will fit into the data block which we
created to hold it. We can set L% to the upper limit and assembly will
stop if P% (or O%) exceeds it.
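As a quick way of checking what a particular OPT value means, the bit tests above can be sketched in Python. The function name decode_opt and the dictionary keys are made up for this illustration.

```python
def decode_opt(value):
    """Split an OPT value into the four flags described above."""
    return {
        "listing": bool(value & 1),          # bit 0: show the assembled listing
        "report_errors": bool(value & 2),    # bit 1: unknown labels cause an error
        "offset_assembly": bool(value & 4),  # bit 2: O% says where code is placed
        "range_check": bool(value & 8),      # bit 3: check P% (or O%) against L%
    }

for opt in (0, 2, 3, 7):
    print(opt, decode_opt(opt))
```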

In this program, the FOR … NEXT loop creates two passes, the first
with OPT set to zero and the second with it set to 3. On the first
pass, the listing is not shown on the screen (we don't want to see it
twice!) and unknown labels are ignored. On the second pass, the
listing is shown and any references to non-existent labels cause an
error.

Note that P% is set to code% inside the loop, so that it is reset at
the start of the second pass. It is important for both passes to start
in the same place.

This time, the program will work.

If we didn't want to see the assembled listing on either pass, we
could, of course, change line 50 to read:

50 FOR pass%=0 TO 2 STEP 2

Both passes now will have bit 0 clear.

Offset assembly

All the code assembled up to now has been run in the
memory buffer where it was assembled. Suppose, though, we wanted to
save the assembled code as an Absolute file (such as the !RunImage
file of an application). Suppose also that the code contained
references to addresses within it in absolute terms, rather than
relative to the program counter.

An Absolute file is loaded into memory starting at &8000 and run from
there. We would have to assemble the code in the data block, but its
contents would have to be as though it started at &8000.

We can do this using offset assembly, with bit 2 of OPT set on both
passes, as in Assem07:


   10 REM > Assem07

   20 REM simple assembly language program with offset assembly


   40 DIM code% 100

   50 FOR pass%=4 TO 7 STEP 3

   60   P%=&8000:O%=code%

   70   [OPT pass%

   80   .start

   90   ADR R1,data

  100   LDR R0,[R1]

  110   SWI "OS_WriteC"

  120   SWI "OS_Exit"

  130   .data

  140   EQUD &41

  150   ]

  160 NEXT


  180 OSCLI ("Save MyFile "+STR$~code%+" "+STR$~O%)

  190 *SetType MyFile Absolute

This time, we set P% to &8000 and O% to code%. Watch the assembled
listing on the screen as you run the program: instead of the numbers
on the left-hand side referring to addresses within Basic's variable
workspace, they now start at &8000.
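The way the two pointers move together can be pictured with a small Python model. This is a sketch only: assemble_words is an invented name, the value &9A00 standing in for code% is arbitrary, and the words are placeholders rather than real instructions.

```python
def assemble_words(words, p_start, o_start):
    """Place each four-byte word, tracking P% and O% separately.

    Returns (listing, stored): `listing` pairs each word with the address
    it will run at (P%); `stored` maps the address it is written to (O%).
    """
    p, o = p_start, o_start
    listing = []
    stored = {}
    for word in words:
        listing.append((p, word))
        stored[o] = word
        p += 4          # both pointers advance together
        o += 4
    return listing, stored

# Hypothetical values: code% happens to be &9A00 in this run
listing, stored = assemble_words([0x11111111, 0x22222222, 0x00000041],
                                 p_start=0x8000, o_start=0x9A00)
for (run_address, word), store_address in zip(listing, sorted(stored)):
    print(hex(run_address), hex(store_address), hex(word))
```

The listing addresses count up from &8000, while the words themselves sit in the buffer, which is why the file has to be saved from code% to O%.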

Instead of calling the code and displaying the letter A on the screen,
the last part of the program saves the code as a file, after you've
pressed a key (you may wish to dispense with line 170). The OSCLI
command sets up a command line string of the form:

*Save MyFile xxxx yyyy

where xxxx is the start address (code% in hex
form) and yyyy is the address following the end of the program (O%
after assembly has finished).

The file will be saved in your currently selected directory and will
run if you double-click on it.

There is, incidentally, an important difference between the assembly
language in this file and that in the previous one. Because the
program is not CALLed from Basic, but run as an Absolute file, it
doesn't have a return address passed to it in R14, so it can't finish
with MOV PC,R14. Instead, it calls SWI OS_Exit, which passes control
straight back to the operating system.

PC-relative addressing

Getting a number into R0 in the previous example
involved using a second register, R1. This can be a bit cumbersome;
you may not have a register to spare for this job if your program is
complex, and it takes two instructions to get a number from memory
into a register.

If you've finished with the address once you've loaded the data from
it, there's nothing wrong with the following:

ADR R0,data
LDR R0,[R0]

R0 is first set up to point to the address. The address is then
overwritten by the data itself. This gets rid of the extra register
but it still takes two instructions, the first of which sets up the
address by referring to the program counter.

We can combine the two instructions into one, which looks like this:

LDR R0,data

This is really a pseudo-instruction, like ADR. The
assembler turns it into an instruction to load from an address
relative to the program counter, which involves using indexed
addressing; something we shall look at next.
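To see the kind of arithmetic the assembler does for LDR R0,data, here is a hedged Python sketch. It assumes two facts about the classic ARM which are not spelled out above: the program counter reads as the address of the current instruction plus 8, and a single LDR can reach no more than 4095 bytes either way. The function name and the addresses are invented for illustration.

```python
def pc_relative_offset(instruction_address, label_address):
    """Offset the assembler would encode for LDR Rn,label.

    Assumes PC reads as the instruction's own address plus 8, and that
    the immediate offset in an LDR is limited to +/-4095 bytes.
    """
    offset = label_address - (instruction_address + 8)
    if not -4095 <= offset <= 4095:
        raise ValueError("label out of range for a single LDR")
    return offset

# Hypothetical layout: LDR at &8000, data word at &800C
print(pc_relative_offset(0x8000, 0x800C))
```

In other words, LDR R0,data is assembled as a load from [PC,#offset], with the offset worked out from the label's value.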

Pre-indexed addressing

As we saw earlier, it's possible to put two
parameters between the square brackets in a LDR instruction. The first
one (the base) has to be a register, but the second (the offset) may
be another register or an immediate constant. The processor adds them
together to get the address to load from.

An example:

LDR R0,[R1,R2]

R0 is loaded from the address obtained by adding the
contents of R1 and R2.

We might have a label called data which points to the start of eight
bytes of data. We want to load the first four-byte word into R0 and
the second into R1:

ADR R2,data
LDR R0,[R2]
LDR R1,[R2,#4]

This is especially useful if we want to load repeatedly from
successive addresses, using a loop.

Look at file Assem08:


   70 [OPT pass%

   80 ;set up R1 to point to text string

   90 ADR R1,string

  100 .loop

  110 LDRB R0,[R1];load one character

  120 ADD R1,R1,#1;increment R1 ready for next character

  130 CMP R0,#0;check for terminating zero

  140 ;next two instructions executed only if end of string not yet reached

  150 SWINE "OS_WriteC"

  160 BNE loop

  170 MOV PC,R14

  180 .string

  190 EQUB &0A:EQUS "This is a string":EQUB &0A:EQUB 0:ALIGN

  200 ]

From now on, we'll only show the part of the program between the
square brackets which turn the assembler on and off, except where
necessary. This is because the Basic parts remain the same as before.
One change which has been made starting with Assem08, though, is that
the REPEAT UNTIL GET loop has been removed: the program assembles the
code and executes it immediately.

You'll also notice that we've started adding comments because the code
is getting more complicated.

Printing a string

Returning to our latest listing, this is one of
several programs which print a string one character at a time, using a
call to SWI OS_WriteC for each character. It's not actually necessary
to do this in practice; one call to OS_Write0 will achieve the same
result.

Let's take a look at the string first. This is contained in several
statements in line 190, starting with a LF character to create a blank
line. The text of the string is in the EQUS statement (you could
change it to anything you like!) and is followed by another LF. After
this comes a null character (zero) which marks the end of the string.
Last of all, we have an ALIGN instruction to ensure that whatever
comes next is word-aligned. It's not actually necessary in this case,
because nothing follows the string, but it's a good habit to get into.

This version of the program works in the simplest possible way.
Register R1 is set up to point to the first character, which is loaded
into R0. Note that we use LDRB, not LDR, as we are only loading one
eight-bit ASCII character, which goes into the bottom byte of R0.
After loading, we increment R1 by one to point to the next character.

We check the character we've just loaded, using the CMP instruction,
to see if it is zero. If it is not, we print it and branch back. Note
that the instructions in lines 150 and 160 which do this only do so if
the character is not zero, due to the NE suffix on their mnemonics.
Once the terminating zero has been loaded, we get to line 170 and the
program exits.

This isn't actually indexed addressing; we're just using an address in
R1 and incrementing it each time we want to read another character.
It's possible that we might want to keep R1 pointing to the start of
the string, perhaps so that we can load it again. To see how we could
do this, look at file Assem09:


   70 [OPT pass%

   80 ;set up R1 to point to start of text string

   90 ADR R1,string

  100 MOV R2,#0;set up R2 to index first character of string

  110 .loop

  120 LDRB R0,[R1,R2];load one character

  130 ADD R2,R2,#1;increment R2 ready for next character

  140 CMP R0,#0;check for terminating zero

  150 ;next two instructions executed only if end of string not yet reached

  160 SWINE "OS_WriteC"

  170 BNE loop

  180 MOV PC,R14

  190 .string

  200 EQUB &0A:EQUS "This is a string":EQUB &0A:EQUB 0:ALIGN

  210 ]

This time, we set up R1 to point to the start of the string and R2 to
select an individual character within the string; we say that R2
indexes a character, starting with the one that's zero bytes in (i.e.
the first one).

In line 120, we load a byte from the address obtained by adding the
values of R1 and R2. We then increment R2 ready for the next character,
whether or not we have reached the terminating zero.

Write back

We can streamline the loop a little by combining the loading
and incrementing instructions, using a facility called write back, as
shown in listing Assem10:


   70 [OPT pass%

   80 ;set up R1 to point to one byte before start of text string

   90 ADR R1,string-1

  100 .loop

  110 LDRB R0,[R1,#1]!;load one character

  120 CMP R0,#0;check for terminating zero

  130 ;next two instructions executed only if end of string not yet reached

  140 SWINE "OS_WriteC"

  150 BNE loop

  160 MOV PC,R14

  170 .string

  180 EQUB &0A:EQUS "This is a string":EQUB &0A:EQUB 0:ALIGN

  190 ]

Note the pling (!) on the end of the LDRB instruction in line 110. We
derive the address to load from by adding the immediate constant (1 in
this case) to the value of R1. After doing the loading, this address is
written back into R1, due to the presence of the pling. The effect of
this is that R1 is incremented each time the instruction is executed.

Because R1 + 1 points to the next character to be loaded, R1 has to be
set up initially to point to one byte before the string starts.

Post-indexed addressing

The examples we've just been looking at used
pre-indexed addressing. This means that the two parameters in the LDR
(or LDRB) instruction are added together before data is loaded from
the resulting address.

An alternative technique is post-indexed addressing, which is used in
listing Assem11:


   70 [OPT pass%

   80 ;set up R1 to point to start of text string

   90 ADR R1,string

  100 .loop

  110 LDRB R0,[R1],#1;load one character, then increment R1

  120 CMP R0,#0;check for terminating zero

  130 ;next two instructions executed only if end of string not yet reached

  140 SWINE "OS_WriteC"

  150 BNE loop

  160 MOV PC,R14

  170 .string

  180 EQUB &0A:EQUS "This is a string":EQUB &0A:EQUB 0:ALIGN

  190 ]

The instruction in line 110 still has two parameters in its operand,
but one of them is now outside the square brackets. In this case, the
data to be loaded is pointed to by R1 on its own, and R1 is
incremented by having the second parameter added to it after the
loading has been done. There is no pling suffix because write back is
implicit in post-indexed addressing.
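The differences between these forms are easier to see side by side. The Python model below is a sketch under simple assumptions: memory is a short bytearray holding the string, addresses are just indexes into it, and the function and register names are invented for the illustration. It follows the loops of Assem10 and Assem11, collecting the characters rather than printing them.

```python
# Model of the LDRB addressing forms on a small block of memory.
memory = bytearray(b"\nThis is a string\n\x00")

def ldrb_pre(regs, rd, rn, offset, write_back=False):
    """LDRB Rd,[Rn,#offset] or, with write_back, LDRB Rd,[Rn,#offset]!"""
    address = regs[rn] + offset        # add first, then load
    regs[rd] = memory[address]
    if write_back:
        regs[rn] = address             # the pling keeps the new address

def ldrb_post(regs, rd, rn, offset):
    """LDRB Rd,[Rn],#offset : load from Rn, then add the offset to Rn."""
    regs[rd] = memory[regs[rn]]
    regs[rn] += offset                 # write back is implicit

def read_string_write_back():
    """Follow Assem10: R1 starts one byte before the string."""
    regs = {"R0": 0, "R1": -1}
    out = []
    while True:
        ldrb_pre(regs, "R0", "R1", 1, write_back=True)
        if regs["R0"] == 0:            # CMP R0,#0
            break
        out.append(chr(regs["R0"]))    # stands in for SWINE "OS_WriteC"
    return "".join(out)

def read_string_post_indexed():
    """Follow Assem11: R1 starts at the first character."""
    regs = {"R0": 0, "R1": 0}
    out = []
    while True:
        ldrb_post(regs, "R0", "R1", 1)
        if regs["R0"] == 0:
            break
        out.append(chr(regs["R0"]))
    return "".join(out)

print(repr(read_string_write_back()))
print(repr(read_string_post_indexed()))
```

Both loops read the same characters; the only difference is where R1 has to start and when it is updated.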

Storing

We've only seen examples of data being loaded from memory so
far. Storing a number is just the same, and is done with a STR
instruction to store a word, or STRB to store one byte. The various
forms of addressing that we've seen all work in the same way.

Subroutines and the stack

If you've programmed in Basic, or any other
high-level language, you'll be aware of the advantages of structured
programming and the way programs can be broken down into smaller units
by using functions and procedures.

The equivalent in assembly language is the subroutine, called using a
BL instruction.

Take a look at the following listing. You won't find it as a file to
be run from the CD, for a reason which will become apparent shortly.

   10 REM > Assem12a

   20 REM use of subroutine to multiply by six


   40 DIM code% 100

   50 FOR pass%=0 TO 3 STEP 3

   60   P%=code%

   70   [OPT pass%

   80   LDR R0,buf;get number passed from Basic via buffer

   90   BL times_six;call subroutine to multiply

  100   STR R0,buf;deposit answer in buffer for Basic to find

  110   MOV PC,R14

  120   ;

  130   ;subroutine to multiply value of R0 by six

  140   .times_six

  150   ADD R0,R0,R0,LSL #1;multiply by three

  160   MOV R0,R0,LSL #1;multiply by two

  170   MOV PC,R14

  180   ;

  190   .buf EQUD 0

  200   ]

  210 NEXT

  220 REPEAT

  230   INPUT "Give me a number "a%

  240   !buf=a%

  250   CALL code%

  260   PRINT !buf

  270 UNTIL FALSE

This is a program to multiply a number, entered by the user, by six
and print it on the screen. To avoid writing a long and complicated
assembly language program, most of the work is done by Basic and the
machine code part just does the multiplication.

The assembled code includes a one-word buffer, pointed to by label
buf. Because buf is a Basic variable, it can be used in the Basic part
of the program which follows the assembly. After assembling the code,
the loop is entered. The number INPUTted is put into the buffer for
the machine code to find when it is called. The machine code
multiplies the number by six (we'll see how later) and puts the answer
back into the buffer for the Basic part to find and print. The program
is terminated by pressing Escape.

Looking now at the assembly language, in line 80 the number is loaded
into R0 from the buffer, using PC-relative addressing. The following
instruction calls the subroutine. You will recall from earlier that
the BL instruction causes the address of the next instruction to be
copied into R14, known as the link register, so that the program can
return to the right point when the subroutine has finished.

Don't worry for now how the subroutine works; we'll look at it later.
For now, think of it as a 'black box' which returns a value in R0 six
times the original.

When the program returns to line 100, the new value of R0 is stored in
the buffer and the machine code part exits, back to Basic.

Have you spotted a problem here? The reason that this particular
listing is shown in this article but isn't included in the files to be
run is that, if you did run it, your computer would crash.

As far as Basic is concerned, the whole machine code part of the
program is a subroutine. When it got to the CALL command, it branched
to the assembled code with a BL instruction, putting its return
address in R14. We give control back to Basic by copying R14 back into
the program counter.

Unfortunately, we've used a BL instruction ourselves in line 90,
putting the address of the following STR instruction into R14 and
thereby overwriting the return address previously put there by Basic.
When we get to the final exit point at line 110, the address of the
line 100 instruction will still be there. Instead of handing control
back to Basic, the program will jump back to the instruction at line
100 and go into an infinite loop, possibly requiring you to reset your
computer.

In effect, we have nested subroutines. We may wish to go further and
have subroutines which call other subroutines and so on, so we need a
way to store R14 and recover it later. If a subroutine uses other
registers, we may wish to store them as well and recover their values
when the subroutine finishes.

What we need is a stack: an area of memory used for temporary storage.

What's a stack?

A good analogy of a stack is a column of building
blocks with numbers on them. Whenever we want to store a number, we
put its block on top of the column. When we take blocks off the top,
we take them in the reverse order to the order in which they were put
on. If, for example, we placed number x, then y, then z on the column,
the first number we retrieved would be z, followed by y, followed by
x. For obvious reasons, this is known as a 'LIFO' (Last In, First Out)
stack. The height of the column is limited only by the space available
to contain it.

In software, our column of blocks is, of course, a section of memory.
The stack can grow either upwards from the bottom or downwards from
the top. In fact, the analogy starts to break down here because most
stacks grow downwards, and if our column of blocks were like a
computer stack, it would have to be hanging from the ceiling with new
blocks being stuck on the bottom!

The address where new data can be stored on the stack is contained in
the stack pointer. It is customary to use R13 for this purpose. It
could point to either the first free address or the last address that
was used.

If we are running our program from Basic with the CALL command, we can
use part of Basic's stack. There should be plenty of it to spare,
unless your machine is short on memory and the program only just fits,
as it can occupy the space between HIMEM and the top of the variables.
If you are writing a program to be run as an Absolute file, like our
earlier listing, Assem07, you will have to set up your own block of
memory to act as the stack and set R13 to point to it.

The LDM and STM instructions

The LDM and STM instructions load and
store multiple registers, using the address in one register as a basis
for where to put the data. There is a range of options for how these
instructions will work; for example:

STMIA R4!,{R0-R3}

stores the contents of R0, R1, R2 and R3 in that
order, starting with R0 at the address pointed to by R4. The I suffix
means that the address is incremented as each register is stored, and
the A suffix that it is incremented after each store, rather than
before. The final address is written back into R4 because of the !, so
R4 ends up pointing to the address one word above where R3 is stored.

To read the data out again, you could use:

LDMDB R4!,{R0-R3}

This time, the D suffix means that the address is
decremented as each register is loaded and the B suffix that this
happens before the load.

The list of registers between the curly brackets can include
individual registers separated by commas, e.g. {R0,R3,R5}, or a
continuous range, using a hyphen as in the example above, or a
combination of the two.
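The STMIA and LDMDB pair above can be modelled in a few lines of Python. This is a sketch with invented names (stmia, ldmdb, and a dictionary standing in for word-addressed memory) and made-up register values. It relies on one detail of the processor not stated above: whatever order the list is written in, the lowest-numbered register always goes to the lowest address.

```python
def register_number(name):
    return int(name[1:])               # "R3" -> 3

def stmia(memory, regs, base, reg_list):
    """STMIA base!,{reg_list}: store, then increment; write back at the end."""
    address = regs[base]
    for name in sorted(reg_list, key=register_number):
        memory[address] = regs[name]
        address += 4
    regs[base] = address               # one word above the last store

def ldmdb(memory, regs, base, reg_list):
    """LDMDB base!,{reg_list}: the mirror image, ending where STMIA began."""
    names = sorted(reg_list, key=register_number)
    address = regs[base] - 4 * len(names)   # lowest register, lowest address
    regs[base] = address
    for name in names:
        regs[name] = memory[address]
        address += 4

memory = {}
regs = {"R0": 10, "R1": 11, "R2": 12, "R3": 13, "R4": 0x9000}  # made-up values
stmia(memory, regs, "R4", ["R0", "R1", "R2", "R3"])
saved_pointer = regs["R4"]
for name in ("R0", "R1", "R2", "R3"):
    regs[name] = 0                     # corrupt the registers
ldmdb(memory, regs, "R4", ["R0", "R1", "R2", "R3"])
print(hex(saved_pointer), regs)
```

After the round trip, R0 to R3 hold their old values and R4 is back where it started.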

Foolproof stacks

There is considerable scope for getting the above
instructions wrong! Fortunately, there is a set of pseudo-instructions
which can be used in place of them to implement a LIFO stack.
Registers can be pushed onto the stack and pulled off again in reverse
order.

As we saw earlier, a stack can either start at the bottom of the
memory block and work upwards (an ascending stack) or at the top and
work downwards (a descending stack). It can be full, where the stack
pointer (usually R13) points to the address where the last register
was stored; or empty, where R13 points to the first free location.

You can implement these options with the following pseudo-instructions:

STMEA, LDMEA            empty ascending stack
STMED, LDMED            empty descending stack
STMFA, LDMFA            full ascending stack
STMFD, LDMFD            full descending stack (the most commonly used type of stack)
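Each of these stack-style names is another spelling of one of the increment/decrement forms. The pairing below is standard ARM behaviour rather than something taken from this article, and the Python table and function name are just a convenient way of setting it out.

```python
# Stack-style names and the increment/decrement forms they stand for.
STACK_FORMS = {
    "STMFD": "STMDB", "LDMFD": "LDMIA",   # full descending
    "STMFA": "STMIB", "LDMFA": "LDMDA",   # full ascending
    "STMED": "STMDA", "LDMED": "LDMIB",   # empty descending
    "STMEA": "STMIA", "LDMEA": "LDMDB",   # empty ascending
}

def plain_form(mnemonic):
    """Return the increment/decrement mnemonic that does the same job."""
    return STACK_FORMS[mnemonic.upper()]

for push in ("STMEA", "STMED", "STMFA", "STMFD"):
    pull = "LDM" + push[3:]
    print(push, "=", plain_form(push), "  ", pull, "=", plain_form(pull))
```

So the STMIA and LDMDB pair used earlier is an empty ascending stack under another name.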

The great advantage of these pseudo-instructions is that the same
instruction is used for pushing something onto the stack and for
pulling it off again, with just the first two letters changing. For
example, you can push all the registers except the program counter and
stack pointer onto the stack with:

STMFD R13!,{R0-R12,R14}

and pull them off again with:

LDMFD R13!,{R0-R12,R14}

Now look at file Assem12:


   10 REM > Assem12

   20 REM use of subroutine to multiply by six and use of stack


   40 DIM code% 100

   50 FOR pass%=0 TO 3 STEP 3

   60   P%=code%

   70   [OPT pass%

   80   STMFD R13!,{R14};save R14 on stack

   90   LDR R0,buf;get number passed from Basic via buffer

  100   BL times_six;call subroutine to multiply

  110   STR R0,buf;deposit answer in buffer for Basic to find

  120   LDMFD R13!,{R14};restore R14 from stack

  130   MOV PC,R14

  140   ;

  150   ;subroutine to multiply value of R0 by six

  160   .times_six

  170   ADD R0,R0,R0,LSL #1;multiply by three

  180   MOV R0,R0,LSL #1;multiply by two

  190   MOV PC,R14

  200   ;

  210   .buf EQUD 0

  220   ]

  230 NEXT

  240 REPEAT

  250   INPUT "Give me a number "a%

  260   !buf=a%

  270   CALL code%

  280   PRINT !buf

  290 UNTIL FALSE

As you can see, two extra instructions have been added, at lines 80
and 120. The value of R14, passed to our program by Basic, is now
protected by being stored on the stack, so it doesn't matter if we use
R14 when calling the subroutine, or make any other use of it, for that
matter, provided we pull it off the stack again when we've finished.
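To follow what happens to R14 with and without the stack, here is a small Python model. It is a sketch only: the Machine class, the strings standing in for return addresses and the stack top of &10000 are all invented for illustration.

```python
class Machine:
    """Just enough of the processor to follow R13 and R14."""

    def __init__(self):
        self.r14 = None
        self.r13 = 0x10000      # hypothetical top of a full descending stack
        self.memory = {}

    def push_r14(self):         # STMFD R13!,{R14}
        self.r13 -= 4           # decrement first, then store
        self.memory[self.r13] = self.r14

    def pull_r14(self):         # LDMFD R13!,{R14}
        self.r14 = self.memory[self.r13]
        self.r13 += 4           # load first, then increment

def final_return_address(save_r14):
    """Return where the last MOV PC,R14 sends the program."""
    m = Machine()
    m.r14 = "Basic"             # CALL puts Basic's return address in R14
    if save_r14:
        m.push_r14()
    m.r14 = "line after BL"     # BL times_six overwrites R14
    # the subroutine returns with MOV PC,R14, leaving R14 unchanged
    if save_r14:
        m.pull_r14()
    return m.r14

print(final_return_address(save_r14=False))   # Assem12a: never reaches Basic
print(final_return_address(save_r14=True))    # Assem12: returns correctly
```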

Although we're only storing one register on the stack, it is still
worth using the 'store multiple' and 'load multiple' instructions to
do so because they make it easier to control the stack pointer
register, R13. Before R14 is pushed onto the stack, R13 is decremented
to point to an unused address for it to go into. We might wish to push
more registers onto the stack before we pull R14 off again. They would
go into addresses below the one where we just pushed R14, and should
be pulled off again (in reverse order) before we pull R14 off at line
120.

In this example, we restore the value of R14, then transfer it to the
program counter in line 130 to return to Basic. It's not really
necessary to do this as two separate steps; there is no reason why we
couldn't pull the return address off the stack and put it straight
into the program counter, instead of going via R14. Lines 120 and 130
could therefore be combined into:

LDMFD R13!,{PC}

Repeating the earlier example, we can save all the registers except
the program counter and stack pointer at the start of a subroutine
with:

STMFD R13!,{R0-R12,R14}

and restore them and return to the main program with:

LDMFD R13!,{R0-R12,PC}

Shifting bits sideways

Now we'll examine the subroutine in listing
Assem12, which multiplies the number in R0 by six.

All the work is done in lines 170 and 180. The first line looks very elaborate:

ADD R0,R0,R0,LSL #1

Basically, this is an instruction to add R0 (as operand one) to R0 (as
operand two) and put the result in R0 (as the destination register);
in other words, to double the value of R0. There's an extra bit on the
end of the instruction, though: the LSL #1 part.

LSL stands for 'Logical Shift Left', and means that all the bits of
operand two are shifted to the left, in this case by one place,
replacing the lowest bit with zero. The equivalent of this instruction
in Basic would be:

a%=a%+(a%<<1)

The effect of shifting all the bits in operand two by one place to the
left in binary arithmetic is, of course, to double the number. Adding
it to operand one has the overall effect of multiplying the value in
R0 by three.

The second line is a bit simpler:

MOV R0,R0,LSL #1

This instruction simply replaces the value in R0 by itself, but
shifted one place to the left and thus doubled, as in:

a%=a%<<1

in Basic.

So the effect of the two instructions is to multiply the value of R0 by six.
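The arithmetic can be checked with a short Python sketch. The function name is borrowed from the label in the listing; the masking with &FFFFFFFF is added here to imitate a 32-bit register, which Python's integers would not otherwise do.

```python
MASK = 0xFFFFFFFF   # a register holds 32 bits

def times_six(r0):
    r0 = (r0 + ((r0 << 1) & MASK)) & MASK   # ADD R0,R0,R0,LSL #1 : times three
    r0 = (r0 << 1) & MASK                   # MOV R0,R0,LSL #1    : times two
    return r0

for n in (1, 7, 100):
    print(n, times_six(n))
```

If the answer is too big for 32 bits, the top bits are simply lost, just as they would be in the register.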

There are other types of shift:

LSR ('Logical Shift Right'): all the bits are shifted to the right,
the highest bit(s) being replaced by zeros.

ASR ('Arithmetic Shift Right'): like LSR except that the highest bit
is replaced by whatever was there before (0 or 1). This is to preserve
the sign of signed numbers.

In the shifts listed so far, the bit which 'falls out of the end' of
the register is moved into the carry flag.

ROR ('ROtate Right'): the bits are shifted to the right, and bit 0 is
copied into bit 31.

RRX ('Rotate Right eXtended'): the same as ROR except that the carry
flag acts as an extra bit.

The shift instruction may be followed by either an immediate constant,
as in listing Assem12, or a register which contains the number of
positions to be shifted.
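The shift types can be compared using a Python model. This is a sketch under stated assumptions: values are unsigned 32-bit numbers, the shift count n is between 1 and 31, and each function returns the result together with the bit that would be passed to the carry flag. The function names simply follow the mnemonics.

```python
MASK = 0xFFFFFFFF   # a register holds 32 bits

def lsl(value, n):
    carry = (value >> (32 - n)) & 1          # last bit pushed out at the top
    return (value << n) & MASK, carry

def lsr(value, n):
    carry = (value >> (n - 1)) & 1           # last bit pushed out at the bottom
    return value >> n, carry

def asr(value, n):
    carry = (value >> (n - 1)) & 1
    result = value >> n
    if value & 0x80000000:                   # negative: refill with ones
        result |= (MASK << (32 - n)) & MASK
    return result, carry

def ror(value, n):
    result = ((value >> n) | (value << (32 - n))) & MASK
    return result, (result >> 31) & 1        # carry is the last bit rotated round

def rrx(value, carry_in):
    """Rotate right by one place through the carry flag."""
    return (value >> 1) | (carry_in << 31), value & 1

for name, (result, carry) in (("LSL", lsl(0x80000001, 1)),
                              ("LSR", lsr(0x80000001, 1)),
                              ("ASR", asr(0x80000000, 4)),
                              ("ROR", ror(0x00000001, 1)),
                              ("RRX", rrx(0x00000002, 1))):
    print(name, hex(result), carry)
```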

Further reading

There are 25 different ARM instructions and we've
covered roughly ten so far. The remainder consist chiefly of
arithmetic instructions; as well as ADD, there are ADC (add with
carry), SUB (subtract), MUL (multiply) and several others. There
are also several bit-manipulation instructions (AND, ORR, EOR etc.)
which are similar to their Basic equivalents.

A full list of these instructions can be found in Guttorm Vik's
StrongHelp assembly language manual, which is an excellent reference
source for this subject.

In a future article, we'll be looking at assembly language in action
by examining the source code for the IClear module. This module
enables the text in a writable icon to be cleared and replaced by new
text by double-clicking on the icon and typing a new character. We'll
be looking at an upgraded version of IClear which will be published
here for the first time.

StrongHelp assembly language manual

If you have not already installed StrongHelp on your system, then you
will need to do so in order to access the manual provided here. Use
the icons to the right either to access StrongHelp for installation
onto your computer, or to run a copy directly from the CD.


The icons to the right access the StrongHelp Assembly manual. If your
system has already 'seen' a copy of StrongHelp, clicking the 'Run'
icon will launch the manual into it; otherwise, clicking the icon will
produce an error message.


