
Furber S. ARM System-on-Chip Architecture. 2000

CP15 MMU registers


Architectural Support for Operating Systems

• Register 0 (which is read-only) returns identification information.

Bits [3:0] contain a revision number, bits [15:4] contain a 3-digit part number in binary-coded decimal, bits [23:16] contain the architecture version ('A' = 0 for version 3, 'A' = 1 for version 4) and bits [31:24] contain the ASCII code of an implementer's trademark (ASCII 'A' = 0x41 indicates ARM Limited, 'D' = 0x44 indicates Digital, and so on).

Some CPUs do not follow the above register 0 format exactly, and recent CPUs have a second register 0 (accessed by changing the Cop2 field in the MRC instruction) which gives details on the cache organization.

• Register 1 (which is write-only in architecture version 3 but read-write in version 4) contains several bits of control information which enable system functions and control system parameters.

All bits are cleared on reset. If subsequently set, M enables the MMU, A enables address alignment fault checking, C enables the data or unified cache, W enables the write buffer, P switches from 26- to 32-bit exception handling, D switches from 26- to 32-bit address range, L switches to late abort timing, B switches from little- to big-endian byte ordering, S and R modify the MMU system and ROM protection states, F controls the speed of external coprocessor communications, Z enables branch prediction, I enables the instruction cache when this is separate from the data cache, V moves the exception vector base from 0x00000000 to 0xffff0000 and RR controls the cache replacement algorithm (pseudo-random or round-robin). Note that not all bits are provided in all implementations.

Bits [31:15] are unpredictable on read and should be preserved using read-modify-write accesses. Bits [31:30] are used in the ARM920 and ARM940 for clock control functions, for example.

• Register 2 (which is write-only in architecture version 3 but read-write in version 4) contains the address of the start of the currently active first-level translation table. This must be aligned on a 16 Kbyte boundary.


• Register 3 (which is write-only in architecture version 3 but read-write in version 4) contains sixteen 2-bit fields, each specifying the access permissions for one of the 16 domains. See 'Domains' on page 302 for further details.

• Register 5 (which is read-write in architecture version 4, but in version 3 it is read-only and writing to it flushes the whole TLB) indicates the type of fault and the domain of the last data access that aborted. D is set on a data breakpoint.

• Register 6 (which is read-write in architecture version 4, but in version 3 it is read-only and writing to it flushes a particular TLB entry) contains the address of the last data access that aborted.

• Register 7 (which is read-write in architecture version 4, but in version 3 it is write-only and simply flushes the cache) is used to perform a number of cache, write buffer, prefetch buffer and branch target cache clean and/or flush operations. The data supplied should be either zero or a relevant virtual address.

Accesses to register 7 use the Cop2 and CRm fields to specify particular operations; the available functions vary from implementation to implementation.

• Register 8 (which is read-write in architecture version 4 and unavailable in version 3) is used to perform a number of TLB operations, flushing single entries or the whole TLB and supporting unified or separate instruction and data TLBs.

• Register 9 is used to control the read buffer, if one is present. In some CPUs it is used to control cache lockdown functions.

• Register 10 is used to control TLB lockdown functions where these are supported.

• Register 13 is used to remap virtual addresses through a process ID register. This mechanism is used to support Windows CE and is only present on particular CPUs such as the ARM720T, the ARM920T and the SA-1100. If bits [31:25] of the virtual address are zero they are replaced with bits [31:25] of this register.

• Register 14 is used for debug support.

• Register 15 is used for test and in some CPUs for clock control purposes.
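Three of the register layouts described above are simple enough to sketch in C. The block below is only an illustration operating on plain 32-bit values: the type and function names are invented for this example, and it does not show how the registers are actually read or written (that requires coprocessor transfer instructions in a privileged mode). As the text warns, some CPUs do not follow the register 0 format exactly.

```c
#include <stdint.h>

/* Register 0: identification fields as described in the text.
   (Struct and function names are hypothetical.) */
typedef struct {
    unsigned revision;      /* bits [3:0] */
    unsigned part_number;   /* bits [15:4], three BCD digits, as decimal */
    unsigned architecture;  /* bits [23:16]: 0 = version 3, 1 = version 4 */
    char     implementer;   /* bits [31:24]: ASCII code, 'A' = ARM Limited */
} cp15_id_fields;

static cp15_id_fields cp15_decode_id(uint32_t id)
{
    cp15_id_fields f;
    uint32_t bcd = (id >> 4) & 0xFFFu;

    f.revision     = id & 0xFu;
    f.part_number  = ((bcd >> 8) & 0xFu) * 100u
                   + ((bcd >> 4) & 0xFu) * 10u
                   +  (bcd & 0xFu);
    f.architecture = (id >> 16) & 0xFFu;
    f.implementer  = (char)((id >> 24) & 0xFFu);
    return f;
}

/* Register 1: bits [31:15] are unpredictable on read and must be written
   back unchanged, so only bits [14:0] are ever set or cleared here.
   Returns the value to write back after a read-modify-write. */
#define CP15_CTRL_DEFINED_BITS 0x00007FFFu

static uint32_t cp15_ctrl_modify(uint32_t current, uint32_t set, uint32_t clear)
{
    set   &= CP15_CTRL_DEFINED_BITS;
    clear &= CP15_CTRL_DEFINED_BITS;
    return (current & ~clear) | set;
}

/* Register 13: if bits [31:25] of the virtual address are zero they are
   replaced with bits [31:25] of the process ID register. */
static uint32_t cp15_pid_remap(uint32_t va, uint32_t pid_reg)
{
    const uint32_t top7 = 0xFE000000u;
    return ((va & top7) == 0) ? (va | (pid_reg & top7)) : va;
}
```

For instance, a made-up identification value of 0x41017102 would decode to implementer 'A', architecture field 1, part number 710 and revision 2.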

11.6 ARM MMU architecture

An MMU performs two primary functions:

• It translates virtual addresses into physical addresses.

• It controls memory access permissions, aborting illegal accesses.

The ARM MMU uses a 2-level page table with table-walking hardware and a TLB which stores recently used page translations. Where the processor has separate instruction and data caches it is likely also to have separate instruction and data TLBs.

Memory granularity

The memory mapping is performed at several different granularities by the same basic mechanism. The units that can be used are:

• Sections. These are 1 Mbyte blocks of memory.

• Large pages. These are 64 Kbyte blocks of memory, and within a large page access control is applied to individual 16 Kbyte subpages.

• Small pages. These are 4 Kbyte blocks of memory, and within a small page access control is applied to individual 1 Kbyte subpages.

• Tiny pages. Some of the latest CPUs also support 1 Kbyte 'tiny' pages.

The normal granularity is the 4 Kbyte small page. Large pages and sections exist to allow the mapping of large data areas with a single TLB entry. Forcing a large data area to be mapped in small pages can, under certain circumstances, cause the TLB to perform inefficiently.

Domains

Domains are an unusual feature of the ARM MMU architecture. A domain is a group of sections and/or pages which have particular access permissions. This allows a number of different processes to run with the same translation tables while retaining some protection from each other. It gives a much more lightweight process switch mechanism than is possible if each process must have its own translation tables.

The access control is based on two sorts of programs:

• Clients are users of domains and must observe the access permissions of the individual sections and pages that make up the domain.

• Managers are the controllers of the domain and can bypass the access permissions of individual sections or pages.

 

 

Table 11.4 Domain access control bits.

| Value | Status | Description |
|---|---|---|
| 00 | No access | Any access will generate a domain fault |
| 01 | Client | Page and section permission bits are checked |
| 10 | Reserved | Do not use |
| 11 | Manager | Page and section permission bits are not checked |

 

 

 

 

At any one time a program may be a client of some domains, a manager of some other domains and have no access at all to the remaining domains. This is controlled by CP15 register 3 which contains two bits for each of the 16 domains describing the status of the current program with respect to each domain. The interpretation of the two bits is given in Table 11.4. The relationship of a program to all of the domains can be changed by writing a single new value into CP15 register 3.
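The sixteen 2-bit fields of CP15 register 3 can be packed and unpacked with a few shifts. The sketch below is illustrative only: the names are invented, and it assumes that domain n occupies bits [2n+1:2n] of the register, which the text does not state explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Encodings from Table 11.4. */
typedef enum {
    DOMAIN_NO_ACCESS = 0,  /* 00: any access generates a domain fault */
    DOMAIN_CLIENT    = 1,  /* 01: permission bits are checked */
    DOMAIN_RESERVED  = 2,  /* 10: do not use */
    DOMAIN_MANAGER   = 3   /* 11: permission bits are not checked */
} domain_status;

/* Read the status of one domain from a register 3 value.
   Assumes domain n lives in bits [2n+1:2n]. */
static domain_status dacr_get(uint32_t dacr, unsigned domain)
{
    assert(domain < 16);
    return (domain_status)((dacr >> (2 * domain)) & 3u);
}

/* Return a new register 3 value with one domain's status replaced. */
static uint32_t dacr_set(uint32_t dacr, unsigned domain, domain_status s)
{
    assert(domain < 16);
    dacr &= ~(3u << (2 * domain));
    return dacr | ((uint32_t)s << (2 * domain));
}
```

Because the whole relationship is held in one word, an operating system could keep one such value per process and write it to register 3 on a context switch, which is the lightweight switch the text refers to.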

Translation process

The translation of a new virtual address always begins with a first-level fetch. (We ignore for now the TLB, which is only a cache to accelerate the process described below.) This uses the translation base address held in CP15 register 2. Bits [31:14] of the translation base register are concatenated with bits [31:20] of the virtual address to form a memory address which is used to access the first-level descriptor as shown in Figure 11.3 on page 304.

The first-level descriptor may be either a section descriptor or a pointer to a second-level page table depending on its bottom two bits. '01' indicates a pointer to a second-level coarse page table; '10' indicates a section descriptor; '11' indicates a pointer to a second-level fine page table (only supported by certain CPUs). '00' should be used to indicate a descriptor that causes a translation fault.
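The first-level fetch described above amounts to a mask, a shift and a two-bit decode. The following is a minimal sketch with invented names. It assumes each descriptor is one 32-bit word, so the concatenated 30 bits are followed by two zero bits, which is consistent with 4096 entries filling the 16 Kbyte table.

```c
#include <stdint.h>

/* Bottom two bits of a first-level descriptor, as listed in the text. */
typedef enum {
    L1_FAULT   = 0,  /* '00': translation fault */
    L1_COARSE  = 1,  /* '01': pointer to a coarse second-level table */
    L1_SECTION = 2,  /* '10': section descriptor */
    L1_FINE    = 3   /* '11': pointer to a fine second-level table */
} l1_type;

/* Address of the first-level descriptor:
   ttb[31:14] : va[31:20] : 00 (two zero bits assumed for word-sized entries). */
static uint32_t l1_descriptor_address(uint32_t ttb, uint32_t va)
{
    return (ttb & 0xFFFFC000u) | ((va >> 20) << 2);
}

static l1_type l1_descriptor_type(uint32_t descriptor)
{
    return (l1_type)(descriptor & 3u);
}
```

With a table base of 0x00104000 and a virtual address of 0x12345678, the index is 0x123 and the descriptor would be fetched from 0x0010448C.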

Figure 11.3 First-level translation fetch.

Section translation

Where the first-level descriptor indicates that the virtual address translates into a section, the domain ('Domain' in the section descriptor) is checked and, if the current process is a client of the domain, the access permissions ('AP' in the section descriptor) are also checked. If the access is permissible, the memory address is formed by concatenating bits [31:20] of the section descriptor with bits [19:0] of the virtual address. This address is used to access the data in memory. The full section translation sequence is shown in Figure 11.4 on page 305.

The operation of the access permission bits (AP) is described in 'Access permissions' on page 305, and the operation of the bufferable (B) and cacheable (C) bits is described in 'Cache and write buffer control' on page 308.
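The address formation for a section can be sketched as follows. The function names are illustrative. The domain field position, bits [8:5] of the first-level descriptor, is the one given in step 2 of the permission check later in this section. The position of the AP field is not given in the text, so it is left out.

```c
#include <stdint.h>

/* Physical address for a section mapping:
   descriptor[31:20] : va[19:0]. */
static uint32_t section_physical_address(uint32_t descriptor, uint32_t va)
{
    return (descriptor & 0xFFF00000u) | (va & 0x000FFFFFu);
}

/* Domain number held in bits [8:5] of the first-level descriptor. */
static unsigned section_domain(uint32_t descriptor)
{
    return (unsigned)((descriptor >> 5) & 0xFu);
}
```

Each first-level entry that holds a section descriptor therefore maps a full 1 Mbyte region without any second-level access.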

Page translation

Where the first-level descriptor indicates that the virtual address translates into a page, a further access is required to a second-level page table. The address of a second-level coarse page descriptor is formed by concatenating bits [31:10] of the first-level descriptor to bits [19:12] of the virtual address. The address of a second-level fine page descriptor is formed by concatenating bits [31:12] of the first-level descriptor to bits [19:10] of the virtual address.

The second-level coarse page descriptor may be a large (64 Kbyte) page descriptor or a small (4 Kbyte) page descriptor, depending on its bottom two bits. '01' indicates a large page; '10' indicates a small page. Other values are trapped, and '00' should be used to generate a translation fault; '11' should not be used. A second-level fine page descriptor may also be a tiny (1 Kbyte) page descriptor, indicated by '11' in its bottom two bits, or it may be a large or small page descriptor as above.

A small page base address is held in bits [31:12] of the page descriptor. Bits [11:4] contain two access permission bits ('AP0-3') for each of the four subpages, where a subpage is a quarter of the size of the page. Bits [3:2] contain the 'bufferable' and 'cacheable' bits. (Bits marked '?' have implementation-specific uses.)
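The second-level address arithmetic can be sketched in the same way. The names below are invented for illustration. As with the first-level fetch, two zero low-order bits are assumed because descriptors are word-sized, and AP0 is assumed to sit in the lowest pair of bits [11:4], which the text does not spell out.

```c
#include <stdint.h>

/* Coarse table: l1[31:10] : va[19:12] : 00 (256 entries). */
static uint32_t coarse_l2_address(uint32_t l1_descriptor, uint32_t va)
{
    return (l1_descriptor & 0xFFFFFC00u) | (((va >> 12) & 0xFFu) << 2);
}

/* Fine table: l1[31:12] : va[19:10] : 00 (1024 entries). */
static uint32_t fine_l2_address(uint32_t l1_descriptor, uint32_t va)
{
    return (l1_descriptor & 0xFFFFF000u) | (((va >> 10) & 0x3FFu) << 2);
}

/* Small page: descriptor[31:12] : va[11:0]. */
static uint32_t small_page_physical_address(uint32_t l2_descriptor, uint32_t va)
{
    return (l2_descriptor & 0xFFFFF000u) | (va & 0x00000FFFu);
}

/* AP bits for the 1 Kbyte subpage that va falls in.
   Assumes AP0 is bits [5:4], AP1 bits [7:6], and so on. */
static unsigned small_page_ap(uint32_t l2_descriptor, uint32_t va)
{
    unsigned subpage = (unsigned)((va >> 10) & 3u);
    return (unsigned)((l2_descriptor >> (4 + 2 * subpage)) & 3u);
}
```

Note that a fine table index is simply the coarse index extended by bits [11:10] of the virtual address, which is what allows 1 Kbyte tiny pages to be distinguished.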

Figure 11.4 Section translation sequence.

The overall translation sequence for a small page is shown in Figure 11.5 on page 306. The translation sequence for a large page is similar except bits [15:12] of the virtual address are used both in the page table index and in the page offset. Each page table entry for a large page must therefore be copied 16 times in the page table, once for every value of these bits in the page table index.

The tiny page translation scheme is also similar, but must start from a fine first-level descriptor. Tiny pages do not support subpages and therefore there is only one set of access permissions in the second-level descriptor.
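The need to replicate a large page entry follows directly from the overlap of index and offset bits. The sketch below uses invented names and models the coarse table as a 256-entry array. It assumes the large page base occupies bits [31:16] of the descriptor, which follows from the 64 Kbyte page size but is not stated in the text.

```c
#include <stdint.h>

/* Index into a 256-entry coarse table: va[19:12]. */
static unsigned coarse_index(uint32_t va)
{
    return (unsigned)((va >> 12) & 0xFFu);
}

/* Large page: descriptor[31:16] : va[15:0] (base field position assumed). */
static uint32_t large_page_physical_address(uint32_t l2_descriptor, uint32_t va)
{
    return (l2_descriptor & 0xFFFF0000u) | (va & 0x0000FFFFu);
}

/* Write the same descriptor into the 16 consecutive slots that one
   64 Kbyte page covers, one for each value of va[15:12]. */
static void map_large_page(uint32_t table[256], uint32_t va, uint32_t l2_descriptor)
{
    unsigned first = coarse_index(va) & ~0xFu;  /* slot with va[15:12] = 0 */
    unsigned i;

    for (i = 0; i < 16; i++)
        table[first + i] = l2_descriptor;
}
```

Whichever of the 16 slots the table walk lands on, it finds the same descriptor, and bits [15:12] of the virtual address are then reused as part of the page offset.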

Figure 11.5 Small page translation sequence.

Access permissions

The AP bits for each section or subpage are used together with the domain information in the first-level descriptor, the domain control information in CP15 register 3, the S and R control bits in CP15 register 1 and the user/supervisor state of the processor to determine whether a read or write access to the addressed location is permissible. The permission checking operation proceeds as follows:

1. If alignment checking is enabled (bit 1 of CP15 register 1 is set) check the address alignment and fault if misaligned (that is, if a word is not aligned on a 4-byte boundary or a half-word is not aligned on a 2-byte boundary).

2. Identify the domain of the addressed location from bits [8:5] of the first-level descriptor. (Fetching the first-level descriptor will fault if the descriptor is invalid.)

3. Check in CP15 register 3, the domain access control register, whether the current process is a client or manager of this domain; if neither, fault here.

Figure 11.6 Access permission checking scheme.

 

 

 

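Table 11.5 Access permissions.

| AP | S | R | Supervisor | User |
|---|---|---|---|---|
| 00 | 0 | 0 | No access | No access |
| 00 | 1 | 0 | Read only | No access |
| 00 | 0 | 1 | Read only | Read only |
| 00 | 1 | 1 | Do not use | Do not use |
| 01 | - | - | Read/write | No access |
| 10 | - | - | Read/write | Read only |
| 11 | - | - | Read/write | Read/write |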

 

 

 

 

 

4. If a manager of this domain, proceed ignoring access permissions. If a client, check the access permissions against Table 11.5 on page 307 using the S and R bits from CP15 register 1. Fault if access is not permitted, otherwise continue to access data.

The permission checking scheme is illustrated in Figure 11.6 on page 307, which shows the various faults that can be generated in the course of an address translation. The MMU may generate alignment, translation, domain and permission faults. In addition, the external memory system may fault on cache line fetches (though not all CPUs support this), uncached or unbuffered accesses (aborts on buffered writes are not supported) and translation table accesses. These faults are all called aborts and are handled by the processor as prefetch or data abort exceptions, depending on whether the access was for an instruction or for data.

A fault on a data access causes the fault status register (CP15 register 5) and the fault address register (CP15 register 6) to be updated to provide information on the cause and location of the fault. A fault on an instruction access only causes an exception if and when the instruction is executed (it may not be executed since it may be fetched just after a taken branch), and it does not update the fault status and address registers. The fault address may be deduced from the return address in the link register.

Cache and write buffer control

The C and B bits in the section and second-level page descriptors control whether the data in the section or page may be copied into a cache and/or written back to memory through a write buffer.

Where the cache uses a write-through scheme, C controls whether or not the data is cacheable and B controls whether or not writes may be buffered. Where the cache uses a copy-back scheme the 'cached, unbuffered' combination may alternatively be used to specify a 'write-through, buffered' behaviour. (This cache terminology is described in 'Write strategies' on page 278.)

External faults

Note that the processor cannot recover from external faults signalled on buffered writes, because by the time the fault is signalled the processor may have executed several instructions and is therefore unable to recover its state to retry the faulting store instruction. Where recovery is required (for example, to allow the processor to retry a store instruction following a bus fault) unbuffered writes must be used.

In typical ARM applications there are no potentially recoverable sources of external faults, so this is not an issue.
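Returning to the permission check, steps 3 and 4 together with Tables 11.4 and 11.5 can be captured in a short decision function. This is a sketch with invented names, not a description of any particular implementation. It treats the reserved domain value '10' as a domain fault and the 'Do not use' row (AP = 00, S = 1, R = 1) as no access; both choices are assumptions made only so that the function is total. Alignment and translation faults (steps 1 and 2) are not modelled.

```c
#include <stdbool.h>

typedef enum { ACCESS_NONE, ACCESS_READ_ONLY, ACCESS_READ_WRITE } access_level;
typedef enum { FAULT_NONE, FAULT_DOMAIN, FAULT_PERMISSION } fault_kind;

/* Table 11.5: access allowed for a client, given AP, S, R and mode. */
static access_level allowed_access(unsigned ap, bool s, bool r, bool supervisor)
{
    switch (ap & 3u) {
    case 0:
        if (s && r) return ACCESS_NONE;       /* 'Do not use' (assumed) */
        if (r)      return ACCESS_READ_ONLY;  /* S=0, R=1: both modes */
        if (s)      return supervisor ? ACCESS_READ_ONLY : ACCESS_NONE;
        return ACCESS_NONE;                   /* S=0, R=0 */
    case 1:  return supervisor ? ACCESS_READ_WRITE : ACCESS_NONE;
    case 2:  return supervisor ? ACCESS_READ_WRITE : ACCESS_READ_ONLY;
    default: return ACCESS_READ_WRITE;
    }
}

/* Steps 3 and 4: domain_status is the 2-bit field from CP15 register 3. */
static fault_kind check_access(unsigned domain_status, unsigned ap,
                               bool s, bool r, bool supervisor, bool is_write)
{
    access_level a;

    if (domain_status == 3u)           /* manager: AP bits not checked */
        return FAULT_NONE;
    if (domain_status != 1u)           /* no access, or reserved (assumed) */
        return FAULT_DOMAIN;

    a = allowed_access(ap, s, r, supervisor);   /* client */
    if (is_write)
        return (a == ACCESS_READ_WRITE) ? FAULT_NONE : FAULT_PERMISSION;
    return (a != ACCESS_NONE) ? FAULT_NONE : FAULT_PERMISSION;
}
```

For example, a user-mode write by a client to a page with AP = 10 yields a permission fault, whereas the same access by a manager of the domain proceeds.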