# CS:APP--Chapter02 : representing and manipulating information

标签（空格分隔）： CS:APP

- CS:APP--Chapter02 : representing and manipulating information
- 1.information storage
- 1.1 hexadecimal notation
- 1.1.1 hexadecimal representation and addition

- 1.2 data sizes
- 1.2.1 word and the range of virtual address memory
- 1.2.2 multiple data sizes of number
- 1.2.3 addressing and byte ordering(寻址和字节排序)
- a.addressing
- b.byte ordering

- 1.3 logic algebra
- 1.4 bit-level operations in C
- 1.5 logic operation in C
- 1.6 shift operation in C

- 1.1 hexadecimal notation
- 2. integer representation
- 2.1 unsigned integer
- 2.2 signed integer
- 2.3 the conversion between signed and unsigned number
- 3.3 two's complement addition
- 3.4 two's complement negation

- 1.information storage

Most computers now *store and process* information with binary/two-valued singals/data.

For the representation of information/numbers,we consider **Three** most important representations of numbers:

- unsigned
- two's-complement
- float-point

succeeding by the details of arithmetic operation in which we indeed understand why our codes can generate some peculiar results and learn to optimize our codes.

## 1.information storage

First of all,byte,the smallest addressable unit of memory,consists of 8 bits,and main memory(later virtually described by **virtual address space**) are treated as an array of bytes (like a linear list).

The softwares running on a computer can be treated as one program,so all the information we need to consider are the program objects among these softwares.

->It simply treats each program object as a block of bytes and program as a sequence of bytes.

Then

->Each byte in main memory is labelled by a unique number,known as its address.

**keyword:**[1.virtual address space 2.byte:8bits 3.bit->addressed along virtual address space]

### 1.1 hexadecimal notation

Even if computers treat data as a run of binary numbers, it is still so tedious to understand these 0 and 1 bunches by humankind.

Hexadecimal can be handily converted into binary, vice-versa.

There are some special rules for mapping hexadecimal to decimal and binary respectively.

(For 0-9, hexadecimal is same as decimal does in terms of the representation and mapping to binaray.)

hexadecimal | decimal | binary |
---|---|---|

A | 10 | 1010 |

B | 11 | 1011 |

C | 12 | 1100 |

D | 13 | 1101 |

E | 14 | 1110 |

F | 15 | 1111 |

Conversely, converting a string of binary to hexadecimal is to divide it into groups of 4 bits each. Then if the length of the binary is not a multiple of 4 resulting in the absence of the beginning elements of the leftmost group, just padding the number with leading zeros. (**zero extension**:perserve itself value.)

#### 1.1.1 hexadecimal representation and addition

0x1111

0x1234

Technique: preceding the hexadecimal by *0x*

Addition:

carry it as same as a decimal does! add 1 to the next high-order bit.

**keywords:**[1.hexadecimal 2.hexadecimal representation 3,conversion between H and B]

### 1.2 data sizes

#### 1.2.1 word and the range of virtual address memory

As for the most important parameter in a computer,**word** size defines how far our computer can reach our computer. That mean, if one computer defines a word as w bits, its virtual address automatically ranges from 0 to (2^w-1) with the capacity of (2^w) bytes.

As a result of different word sizes, programs always are labelled by 32-bits or 64-bits.64 bits computer can execute both programs but 32-bits computer is compatible with only 32-bit program.

#### 1.2.2 multiple data sizes of number

computer and compilers supports various data size to effectively represent integer and float.

often data size depends on how the program is compiled.So let's shed light on the data size defined in C languagae.

(C only define the minimum size of each data type.)

Different size on different computer may result in bug because of compatibility of data type.

In oorder to standardize data size,C also provides fix-sized data type denoted as int32_t,int64_t and so on.

#### 1.2.3 addressing and byte ordering(寻址和字节排序)

Like the same two questions when we is confronted to how to restore data we fetch and is written into memory:

- where is these data? ----addressing
- how we will order the bytes ?(my way:how to find them correctly?) -- byte ordering

##### a.addressing

Generally speaking,a multi-bytes object occupies a block of continuous units/bytes in virtual address space,with the address of the objects given by the smallest address of the unit/bytes used.

##### b.byte ordering

There are two conventions named as **big endian && little endian.**

Given a bit-vector of length (w,[x_{w-1},x_{w-2},...,x_0])

1.little endian:restore this number in memory ordered from the lest significant byte to the least.

2.big endian:restore this number in memory ordered from the most significant bytes to the least.

**One problem:**

byte ordering becomes an issue:

- data communicated over the network between the little-endian and big-endian devices.
- ?
- ?

### 1.3 logic algebra

It's hard to imagine how the binary can perform the function such addition,division,multiplication,subtraction and so on.

->A rich body of mathematical knowledge has evolved around the study of the value 1 and 0.

There are four basic operations and truth tables in logic algebra:

1.(neg)

(neg) | 0 | 1 |
---|---|---|

- | 1 | 0 |

2.(&)

(&) | 0 | 1 |
---|---|---|

0 | 0 | 0 |

1 | 0 | 1 |

3.(|)

|(|)|0|1|

|--|--|--|

|0|0|1|

|1|1|1|

4.^

^ | 0 | 1 |
---|---|---|

0 | 0 | 1 |

1 | 1 | 0 |

### 1.4 bit-level operations in C

| & ~

Apart from logic operation bit by bit,mask operation is always used in these two operation:

1.preserve some pattern

**perserve the three high-order bits**

&:1110000...0000

|:0001111...1111

2.extract some pattern

### 1.5 logic operation in C

&& || !

### 1.6 shift operation in C

>>

<<

## 2. integer representation

Now we move on to the representation of integers which is divided into two types:

- unsigned integer
- sign(two's-complement) integer

### 2.1 unsigned integer

assume (vec{x}) is a vector of bits beginning from right with the index of 0 to the left with the index w-1.,shown as ([x_{w-1},x_{w-2},...,x_1,x_0]).

(B2U_w(vec{x})=sum^{i}_{i=0}x_i2^i)

(U_{max}=frac{1(1-2^n)}{1-2}=2^n-1)

We will see later what I have recorded here can be closely related to the its mathematical properties and its real-lift implementation in assembly language.

### 2.2 signed integer

For non-negative number encoded by unsigned way,then **Two's-complement** way can lead to the encoding of zero,negative and positive number within a more narrow range in comparsion with unsigned encoding with vecttor x of the same length w.

(B2T_w(vec{x})=(-1)cdot x_{w-1} +sum^{w-2}_{i=0}x_i2^i)

(TMax_w=2^{w-1}-1)

(TMin_w=-2^{w-1})

The range of positve number extends one more number than the range of negative number in one two's complement.

(TMax_w+TMin_w=1)

### 2.3 the conversion between signed and unsigned number

The problem can be separated into three parts:

1): both are positive number

2): two's complement is for a negative number

3): positve or negative zero

case 1: there is no any difference between each bit representation so (B2T_w(vec{x})=B2U_w(vec{x})),case 1 is only valid when number is in the range from 0 to (TMax_w)

case 2:

2.1 **Unsign->signed**

a unsigned number is greater than (TMax_w) then conveted into signed number,For a bit vector (vec{x})

(U2T_w(vec{x})=B2T_w(U2B(vec{x}))=x-2^w)

2.2 **signed->unsigned**

(T2U_w(vec{x})=x+2^w)

case 3:

positive and negative zero ~~~~ not understand yet!

All of these three cases can be conclude as one formula:

T2U_w(vec{x})=x+x_{w-1}cdot 2^w