• 欢迎光临~

# CS:APP--Chapter02 : representing and manipulating information

• CS:APP--Chapter02 : representing and manipulating information
• 1.information storage
• 1.2 data sizes
• 1.2.1 word and the range of virtual address memory
• 1.2.2 multiple data sizes of number
• 1.2.3 addressing and byte ordering(寻址和字节排序)
• b.byte ordering
• 1.3 logic algebra
• 1.4 bit-level operations in C
• 1.5 logic operation in C
• 1.6 shift operation in C
• 2. integer representation
• 2.1 unsigned integer
• 2.2 signed integer
• 2.3 the conversion between signed and unsigned number
• 3.4 two's complement negation

Most computers now store and process information with binary/two-valued singals/data.

For the representation of information/numbers,we consider Three most important representations of numbers:

1. unsigned
2. two's-complement
3. float-point

succeeding by the details of arithmetic operation in which we indeed understand why our codes can generate some peculiar results and learn to optimize our codes.

## 1.information storage

First of all,byte,the smallest addressable unit of memory,consists of 8 bits,and main memory(later virtually described by virtual address space) are treated as an array of bytes (like a linear list).

The softwares running on a computer can be treated as one program,so all the information we need to consider are the program objects among these softwares.

->It simply treats each program object as a block of bytes and program as a sequence of bytes.

Then

->Each byte in main memory is labelled by a unique number,known as its address.

Even if computers treat data as a run of binary numbers, it is still so tedious to understand these 0 and 1 bunches by humankind.
Hexadecimal can be handily converted into binary, vice-versa.

There are some special rules for mapping hexadecimal to decimal and binary respectively.
(For 0-9, hexadecimal is same as decimal does in terms of the representation and mapping to binaray.)

A 10 1010
B 11 1011
C 12 1100
D 13 1101
E 14 1110
F 15 1111

Conversely, converting a string of binary to hexadecimal is to divide it into groups of 4 bits each. Then if the length of the binary is not a multiple of 4 resulting in the absence of the beginning elements of the leftmost group, just padding the number with leading zeros. (zero extension:perserve itself value.)

0x1111
0x1234

Technique: preceding the hexadecimal by 0x

carry it as same as a decimal does! add 1 to the next high-order bit.

### 1.2 data sizes

#### 1.2.1 word and the range of virtual address memory

As for the most important parameter in a computer,word size defines how far our computer can reach our computer. That mean, if one computer defines a word as w bits, its virtual address automatically ranges from 0 to (2^w-1) with the capacity of (2^w) bytes.

As a result of different word sizes, programs always are labelled by 32-bits or 64-bits.64 bits computer can execute both programs but 32-bits computer is compatible with only 32-bit program.

#### 1.2.2 multiple data sizes of number

computer and compilers supports various data size to effectively represent integer and float.

often data size depends on how the program is compiled.So let's shed light on the data size defined in C languagae.

(C only define the minimum size of each data type.)

Different size on different computer may result in bug because of compatibility of data type.
In oorder to standardize data size,C also provides fix-sized data type denoted as int32_t,int64_t and so on.

#### 1.2.3 addressing and byte ordering(寻址和字节排序)

Like the same two questions when we is confronted to how to restore data we fetch and is written into memory:

1. where is these data? ----addressing
2. how we will order the bytes ?(my way:how to find them correctly?) -- byte ordering

Generally speaking,a multi-bytes object occupies a block of continuous units/bytes in virtual address space,with the address of the objects given by the smallest address of the unit/bytes used.

##### b.byte ordering

There are two conventions named as big endian && little endian.
Given a bit-vector of length (w,[x_{w-1},x_{w-2},...,x_0])

1.little endian:restore this number in memory ordered from the lest significant byte to the least.
2.big endian:restore this number in memory ordered from the most significant bytes to the least.

One problem:
byte ordering becomes an issue:

1. data communicated over the network between the little-endian and big-endian devices.
2. ?
3. ?

### 1.3 logic algebra

It's hard to imagine how the binary can perform the function such addition,division,multiplication,subtraction and so on.

->A rich body of mathematical knowledge has evolved around the study of the value 1 and 0.

There are four basic operations and truth tables in logic algebra:

1.(neg)

(neg) 0 1
- 1 0

2.(&)

(&) 0 1
0 0 0
1 0 1

3.(|)
|(|)|0|1|
|--|--|--|
|0|0|1|
|1|1|1|

4.^

^ 0 1
0 0 1
1 1 0

### 1.4 bit-level operations in C

| & ~

Apart from logic operation bit by bit,mask operation is always used in these two operation:

1.preserve some pattern

perserve the three high-order bits
&:1110000...0000
|:0001111...1111

2.extract some pattern

&& || !

>>
<<

## 2. integer representation

Now we move on to the representation of integers which is divided into two types:

1. unsigned integer
2. sign(two's-complement) integer

### 2.1 unsigned integer

assume (vec{x}) is a vector of bits beginning from right with the index of 0 to the left with the index w-1.,shown as ([x_{w-1},x_{w-2},...,x_1,x_0]).

(B2U_w(vec{x})=sum^{i}_{i=0}x_i2^i)

(U_{max}=frac{1(1-2^n)}{1-2}=2^n-1)

We will see later what I have recorded here can be closely related to the its mathematical properties and its real-lift implementation in assembly language.

### 2.2 signed integer

For non-negative number encoded by unsigned way,then Two's-complement way can lead to the encoding of zero,negative and positive number within a more narrow range in comparsion with unsigned encoding with vecttor x of the same length w.

(B2T_w(vec{x})=(-1)cdot x_{w-1} +sum^{w-2}_{i=0}x_i2^i)
(TMax_w=2^{w-1}-1)
(TMin_w=-2^{w-1})

The range of positve number extends one more number than the range of negative number in one two's complement.

(TMax_w+TMin_w=1)

### 2.3 the conversion between signed and unsigned number

The problem can be separated into three parts:
1): both are positive number
2): two's complement is for a negative number
3): positve or negative zero

case 1: there is no any difference between each bit representation so (B2T_w(vec{x})=B2U_w(vec{x})),case 1 is only valid when number is in the range from 0 to (TMax_w)

case 2:
2.1 Unsign->signed
a unsigned number is greater than (TMax_w) then conveted into signed number,For a bit vector (vec{x})

(U2T_w(vec{x})=B2T_w(U2B(vec{x}))=x-2^w)

2.2 signed->unsigned
(T2U_w(vec{x})=x+2^w)

case 3:
positive and negative zero ~~~~ not understand yet!

All of these three cases can be conclude as one formula:

[U2T_W(vec{x})= x - x_{w-1}cdot 2^w \$\$]

T2U_w(vec{x})=x+x_{w-1}cdot 2^w

[ ### 2.4 expanding and shrinking the bit-level representation of a integer #### 2.4.1 expanding the bit represenation its mathematical properties tells us that ,whatever the bit representation varies ,the number itself would perserve always. "Expanding" here means making more high-order bits available in the bit represenation.for example,->.So one question comes here soon,what criteria can we depend on to make sure the expanding bits? ##### two genrics of expansion 1. zero expension : stretch to left more with all bit zilch 0. 2. sign expansion : stretch to left more with all sign bit(the significant bit) the later section of assembly directives will uncover the practical implmentation on the hardware. we already mentioned the match between its mathematical properties and its encoding,let make the detail illustate more for us ! #### expansion of an unsigned number by zero extension we suppose here a vector x with the expension by zeo extension: \$B2U_{w+i}(vec{x})=B2U_w(vec{x})\$ #### expansion of a signed number by sign extension \$B2T_{w+i}(vec{x})=(-1)cdot x_{w+i-1}+sum^{w+i-2}_{i=0}=B2T_w(vec{x})\$ sign expansion can fulll maintain and perserve the original value itself. ### 2.5 truncate integer Shoterening w bits down to w-i bits raises the question of what we should do when truncating a number? modular operation makes it by dividing by the n-th power to 2. ### 2.6 intuitive behavior - implicitly cast signed number to unsigned number implicitly cast signed number to unsigned number when c handles an expression with either operand is an unsigned number. ## 3. integer arithmetic The nature of finite computer resources may result in some peculiar outcomes like positive addition producing a negative number. Here we mostly get used to starting with the simplest part : unsigned arithmetic then moving on to the more difficult parts of signed parts. **Notes:** Overflow is a significant problem that the capacity of the data type cannot hold the number to be restored here. ### 3.1 unsigned addition principle : \$x+^u_w y = left{ begin{aligned} x+y,x+yleq UMax_w\ x+y-2^w,x+y>UMax_w\ end{aligned} right.\$ Question: no idea on the underlying principle of integer addition on computer. ### 3.2 unsigned negation \$\$-^u_wx= left{ begin{aligned} x,x=0\ 2^w-x,x>0 end{aligned} right.]