Using Memory Efficiently
4.1 A Word On Memory
4.2 Data Types
4.2.1 Comparing Values
4.2.2 Other Algorithms
4.2.3 Sorting Arrays
4.2.4 Defining Your Own Classes
4.3 Accessing Memory
4.4 Laying Out Your Data
4.5 Garbage Collection
4.5.1 Memory Leaks
4.5.2 References
4.6 APIs
4.7 Low Memory
Applications spend a significant part of their time dealing with data in memory. While many developers are aware of the need to use as little memory as possible on devices like phones or tablets, not all realize the impact of memory usage on performance. In this chapter, you will learn how choosing the right data type and arranging your data in memory can boost your application’s performance. We will also review a basic yet often overlooked feature of Java: memory management using garbage collection and references.
4.1 A Word On Memory
No matter how much memory an application is given, it could always ask for more. There are two big differences between an Android device, like a phone or tablet, and a
traditional computer:
The amount of physical memory 物理内存的大小
The ability to do virtual memory swapping 虚拟内存交换的能力
Typically, today’s computers come installed with several gigabytes of RAM (few come installed with only 1GB or less); however, Android devices often have a maximum of 512MB of RAM. To add insult to injury, computers use swapping, which Android devices don’t have, to give the system and applications the illusion of more memory. For example, an application could still address up to 4GB of memory even on a system that has only 256MB of RAM. Your Android application simply does not have this luxury, and therefore you have to be more careful about how much memory your application uses.
NOTE: You can find applications on Android Market that enable swapping, but they require root access and a different Linux kernel. Assume the Android devices your applications will be running on do not have swapping capabilities.
The Java language defines the size of most primitive types, as shown in Table 4–1,
together with their matching native types. If you are primarily familiar with C/C++ code, then you need to pay extra attention to two things:
Java’s char is 16-bit (UTF-16).
Java’s long is 64–bit (while C long is usually 32-bit and C long long is usually 64–bit).
Table 4–1. Java Primitive Types
Java primitive type Native type Size
boolean jboolean 8 bits (VM dependent)
byte jbyte 8 bits
char jchar 16 bits
short jshort 16 bits
int jint 32 bits
long jlong 64 bits
float jfloat 32 bits
double jdouble 64 bits
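The sizes in Table 4–1 can be verified from Java itself: each primitive wrapper class exposes a SIZE constant giving the size in bits. A quick sanity check (not part of the original text):

```java
public class PrimitiveSizes {
    public static void main(String[] args) {
        // Each wrapper class exposes its primitive's size in bits
        System.out.println("byte:   " + Byte.SIZE);      // 8
        System.out.println("char:   " + Character.SIZE); // 16
        System.out.println("short:  " + Short.SIZE);     // 16
        System.out.println("int:    " + Integer.SIZE);   // 32
        System.out.println("long:   " + Long.SIZE);      // 64
        System.out.println("float:  " + Float.SIZE);     // 32
        System.out.println("double: " + Double.SIZE);    // 64
    }
}
```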
A good rule of thumb is to use as little memory as possible, which is just common sense as your Android device and applications have a limited amount of memory. But in addition to reducing the risk of an OutOfMemoryError exception, using less memory can also increase performance.
Performance depends mostly on three things:
How the CPU can manipulate a certain data type
How much space is needed to store data and instructions
How data is laid out in memory
We will cover each one of these items, looking at both native code and Java code, measuring execution times, and comparing several implementations of simple algorithms to improve performance.
4.2 Data Types
You already got a glimpse of the first point (how the CPU can manipulate a certain data type) in Chapter 2, with the native code version of computeIterativelyFaster(), which used two add instructions to add two 64–bit integers, as shown in Listing 4–1.
Listing 4–1. Adding Two 64–bit Integers
448: e0944002 adds r4, r4, r2
44c: e0a55003 adc r5, r5, r3
Because the ARM registers are 32-bit wide, two instructions are needed to add two 64–bit integers; the lowest 32 bits are stored in one register (r4), and the highest 32 bits are stored in another register (r5). Adding two 32-bit values would require a single instruction.
Let’s now consider the trivial C function, shown in Listing 4–2, which simply returns the sum of two 32-bit values passed as parameters.
Listing 4–2. Sum of Two Values
int32_t add_32_32 (int32_t value1, int32_t value2)
{
return value1 + value2;
}
The assembly code for this function is shown in Listing 4–3.
Listing 4–3. Assembly Code
000016c8 <add_32_32>:
16c8: e0810000 add r0, r1, r0
16cc: e12fff1e bx lr
As expected, only one instruction is needed to add the two values (and bx lr is the
equivalent of return). We can now create new functions that are very much like
add_32_32, but with different types for value1 and value2. For example, add_16_16 adds two int16_t values, as shown in Listing 4–4, and add_16_32 adds an int16_t value and an int32_t value, as shown in Listing 4–5.
Listing 4–4. add_16_16’s C and Assembly
int16_t add_16_16 (int16_t value1, int16_t value2)
{
return value1 + value2;
}
000016d0 <add_16_16>:
16d0: e0810000 add r0, r1, r0
16d4: e6bf0070 sxth r0, r0
16d8: e12fff1e bx lr
Listing 4–5. add_16_32’s C and Assembly
int32_t add_16_32 (int16_t value1, int32_t value2)
{
return value1 + value2;
}
000016dc <add_16_32>:
16dc: e0810000 add r0, r1, r0
16e0: e12fff1e bx lr
You can see that adding two 16-bit values required an additional instruction in order to convert the result from 16-bit to 32-bit.
Listing 4–6 shows five more functions, repeating basically the same algorithm but for different data types.
Listing 4–6. More Assembly Code
000016e4 <add_32_64>:
16e4: e0922000 adds r2, r2, r0
16e8: e0a33fc0 adc r3, r3, r0, asr #31
16ec: e1a00002 mov r0, r2
16f0: e1a01003 mov r1, r3
16f4: e12fff1e bx lr
000016f8 <add_32_float>:
16f8: ee070a10 fmsr s14, r0
16fc: eef87ac7 fsitos s15, s14
1700: ee071a10 fmsr s14, r1
1704: ee777a87 fadds s15, s15, s14
1708: eefd7ae7 ftosizs s15, s15
170c: ee170a90 fmrs r0, s15
1710: e12fff1e bx lr
00001714 <add_float_float>:
1714: ee070a10 fmsr s14, r0
1718: ee071a90 fmsr s15, r1
171c: ee377a27 fadds s14, s14, s15
1720: ee170a10 fmrs r0, s14
1724: e12fff1e bx lr
00001728 <add_double_double>:
1728: ec410b16 vmov d6, r0, r1
172c: ec432b17 vmov d7, r2, r3
1730: ee366b07 faddd d6, d6, d7
1734: ec510b16 vmov r0, r1, d6
1738: e12fff1e bx lr
0000173c <add_float_double>:
173c: ee060a10 fmsr s12, r0
1740: eeb77ac6 fcvtds d7, s12
1744: ec432b16 vmov d6, r2, r3
1748: ee376b06 faddd d6, d7, d6
174c: ec510b16 vmov r0, r1, d6
1750: e12fff1e bx lr
NOTE: The generated native code may differ from what is shown here as a lot depends on the context of the addition. (Code that is inlined may look different as the compiler may reorder instructions or change the register allocation.)
As you can see, using a smaller type is not always beneficial to performance as it may actually require more instructions, as demonstrated in Listing 4–4. Besides, if add_16_16 were called with two 32-bit values as parameters, these two values would first have to be converted to 16-bit values before the actual call, as shown in Listing 4–7. Once again, the sxth instruction is used to convert the 32-bit values into 16-bit values by performing a “sign extend” operation.
Listing 4–7. Calling add_16_16 With Two 32-Bit Values
00001754 <add_16_16_from_32_32>:
1754: e6bf0070 sxth r0, r0
1758: e6bf1071 sxth r1, r1
175c: eaffffdb b 16d0 <add_16_16>
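The same truncation and sign extension happens in Java when an int is cast down to a short: the value is cut to its low 16 bits, and the sign bit is extended when the value is widened back. A small illustrative snippet (not from the original text):

```java
public class SignExtend {
    public static void main(String[] args) {
        int big = 100000;         // does not fit in 16 bits (0x186A0)
        short s = (short) big;    // truncated to the low 16 bits (0x86A0), sign-extended on read
        System.out.println(s);    // prints -31072
        int back = s;             // widening back to int keeps the sign-extended value
        System.out.println(back); // prints -31072, not 100000
    }
}
```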
4.2.1 Comparing Values
Let’s now consider another basic function, which takes two parameters and returns 0 or 1 depending on whether the first parameter is greater than the second one, as shown in
Listing 4–8.
Listing 4–8. Comparing Two Values
int32_t cmp_32_32 (int32_t value1, int32_t value2)
{
return (value1 > value2) ? 1 : 0;
}
Again, we can see the assembly code for this function and its variants in Listing 4–9.
Listing 4–9. Comparing Two Values In Assembly
00001760 <cmp_32_32>:
1760: e1500001 cmp r0, r1
1764: d3a00000 movle r0, #0 ; 0x0
1768: c3a00001 movgt r0, #1 ; 0x1
176c: e12fff1e bx lr
00001770 <cmp_16_16>:
1770: e1500001 cmp r0, r1
1774: d3a00000 movle r0, #0 ; 0x0
1778: c3a00001 movgt r0, #1 ; 0x1
177c: e12fff1e bx lr
00001780 <cmp_16_32>:
1780: e1500001 cmp r0, r1
1784: d3a00000 movle r0, #0 ; 0x0
1788: c3a00001 movgt r0, #1 ; 0x1
178c: e12fff1e bx lr
00001790 <cmp_32_64>:
1790: e92d0030 push {r4, r5}
1794: e1a04000 mov r4, r0
1798: e1a05fc4 asr r5, r4, #31
179c: e1550003 cmp r5, r3
17a0: e3a00000 mov r0, #0 ; 0x0
17a4: ca000004 bgt 17bc <cmp_32_64+0x2c>
17a8: 0a000001 beq 17b4 <cmp_32_64+0x24>
17ac: e8bd0030 pop {r4, r5}
17b0: e12fff1e bx lr
17b4: e1540002 cmp r4, r2
17b8: 9afffffb bls 17ac <cmp_32_64+0x1c>
17bc: e3a00001 mov r0, #1 ; 0x1
17c0: eafffff9 b 17ac <cmp_32_64+0x1c>
000017c4 <cmp_32_float>:
17c4: ee070a10 fmsr s14, r0
17c8: eef87ac7 fsitos s15, s14
17cc: ee071a10 fmsr s14, r1
17d0: eef47ac7 fcmpes s15, s14
17d4: eef1fa10 fmstat
17d8: d3a00000 movle r0, #0 ; 0x0
17dc: c3a00001 movgt r0, #1 ; 0x1
17e0: e12fff1e bx lr
000017e4 <cmp_float_float>:
17e4: ee070a10 fmsr s14, r0
17e8: ee071a90 fmsr s15, r1
17ec: eeb47ae7 fcmpes s14, s15
17f0: eef1fa10 fmstat
17f4: d3a00000 movle r0, #0 ; 0x0
17f8: c3a00001 movgt r0, #1 ; 0x1
17fc: e12fff1e bx lr
00001800 <cmp_double_double>:
1800: ee060a10 fmsr s12, r0
1804: eeb77ac6 fcvtds d7, s12
1808: ec432b16 vmov d6, r2, r3
180c: eeb47bc6 fcmped d7, d6
1810: eef1fa10 fmstat
1814: d3a00000 movle r0, #0 ; 0x0
1818: c3a00001 movgt r0, #1 ; 0x1
181c: e12fff1e bx lr
00001820 <cmp_float_double>:
1820: ee060a10 fmsr s12, r0
1824: eeb77ac6 fcvtds d7, s12
1828: ec432b16 vmov d6, r2, r3
182c: eeb47bc6 fcmped d7, d6
1830: eef1fa10 fmstat
1834: d3a00000 movle r0, #0 ; 0x0
1838: c3a00001 movgt r0, #1 ; 0x1
183c: e12fff1e bx lr
1840: e3a00001 mov r0, #1 ; 0x1
1844: e12fff1e bx lr
Using the long type appears to be slower than using the short and int types because of the higher number of instructions having to be executed. Similarly, using the double type and mixing float and double seems to be slower than using the float type alone.
NOTE: The number of instructions alone is not enough to determine whether code will be slower or not. Because not all instructions take the same amount of time to complete, and because of the complex nature of today’s CPUs, one cannot simply count the number of instructions to know how much time a certain operation will take.
4.2.2 Other Algorithms
Now that we have seen what difference data types can make on the generated code, it is time to see how slightly more sophisticated algorithms perform when dealing with more significant amounts of data.
Listing 4–10 shows three simple methods: one that sorts an array by simply calling the static Arrays.sort() method, one that finds the minimum value in an array, and one that adds all the elements in an array.
Listing 4–10. Sorting, Finding, and Summing in Java
private static void sort (int array[]) {
Arrays.sort(array);
}
private static int findMin (int array[]) {
int min = Integer.MAX_VALUE;
for (int e : array) {
if (e < min) min = e;
}
return min;
}
private static int addAll (int array[]) {
int sum = 0;
for (int e : array) {
sum += e; // this could overflow but we’ll ignore that
}
return sum;
}
Table 4–2 shows how many milliseconds each of these functions took to complete when given an array of one million random elements. In addition to these, the results for the variants of these methods (using short, long, float, and double types) are also shown. Refer to Chapter 6 for more information on how to measure execution times.
Table 4–2. Execution Times With 1,000,000-Element Array
Java primitive type sort findMin addAll
short 93 27 25
int 753 31 30
long 1,240 57 54
float 1,080 41 33
double 1,358 58 55
We can make a couple of comments on these results:
Sorting the short array is much faster than sorting any of the other arrays.
Working with 64–bit types (long or double) is slower than working with 32-bit types.
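For reference, the kind of harness behind Table 4–2 can be sketched as follows (the class name, fixed seed, and array size are ours; see Chapter 6 for proper measurement techniques):

```java
import java.util.Arrays;
import java.util.Random;

public class SortBenchmark {
    public static void main(String[] args) {
        int[] array = new int[1_000_000];
        Random rnd = new Random(42);        // fixed seed for repeatable runs
        for (int i = 0; i < array.length; i++) {
            array[i] = rnd.nextInt();
        }
        long start = System.nanoTime();     // coarse wall-clock timing
        Arrays.sort(array);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("sort took " + elapsedMs + " ms");
    }
}
```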
4.2.3 Sorting Arrays
Sorting an array of 16-bit values can be much faster than sorting an array of 32- or 64–bit values simply because it is using a different algorithm. While the int and long arrays are sorted using some version of the Quicksort algorithm, the short array was sorted using counting sort, which sorts in linear time. Using the short type in that case is killing two birds with one stone: less memory is consumed (2 megabytes instead of 4 for the array of int values, and 8 megabytes for the array of long values) and performance is improved.
NOTE: Many wrongly believe Quicksort is always the most efficient sorting algorithm. You can refer to Arrays.java in the Android source code to see how each type of array is sorted.
A less common solution is to use multiple arrays to store your data. For example, if your application needs to store many integers all in the range 0 to 100,000, then you may be tempted to allocate an array of 32-bit values for storage as the short type can be used only to store values from -32,768 to 32,767. Depending on how values are distributed, you may end up with many values equal to or less than 32,767. In such an event, it may be beneficial to use two arrays: one for all the values between 0 and 32,767 (that is, an array of short values), and one for all values greater than 32,767 (an array of int values). While this most likely would result in greater complexity, the memory savings and potential performance increase may make that solution worthwhile, and perhaps even allow you to optimize the implementation of certain methods. For example, a method that finds the maximum element may only have to go through the array of int values. (Only if the array of int values is empty would it go through the array of short values.)
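The two-array idea can be sketched as follows (the class, its names, and the simplistic capacity handling are ours, purely illustrative):

```java
// Stores non-negative values up to 100,000: values that fit in 16 bits go in a
// short array, larger ones in an int array. Illustrative sketch only.
public class SplitStore {
    private final short[] small;
    private final int[] large;
    private int nSmall, nLarge;

    public SplitStore(int capacity) {
        small = new short[capacity];
        large = new int[capacity];
    }

    public void add(int value) {
        if (value <= Short.MAX_VALUE) {
            small[nSmall++] = (short) value; // fits in 16 bits
        } else {
            large[nLarge++] = value;
        }
    }

    // Any int entry (> 32,767) beats any short entry, so we only scan the
    // short array when the int array is empty
    public int max() {
        if (nLarge > 0) {
            int max = large[0];
            for (int i = 1; i < nLarge; i++) if (large[i] > max) max = large[i];
            return max;
        }
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < nSmall; i++) if (small[i] > max) max = small[i];
        return max;
    }

    public static void main(String[] args) {
        SplitStore store = new SplitStore(16);
        store.add(12);
        store.add(99_000); // > 32,767, goes into the int array
        store.add(31_000);
        System.out.println(store.max()); // prints 99000
    }
}
```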
One of the types that was not shown in Listing 4–9 and Table 4–2 is the boolean type. In fact, sorting an array of boolean values makes little sense. However, there may be occasions where you need to store a rather large number of boolean values and refer to them by index. For that purpose, you could simply create an array of boolean values. While this would work, it would result in many bits being wasted, as 8 bits would be allocated for each entry in the array when a boolean value can only be true or false. In other words, only one bit is needed to represent a boolean value. For that purpose, the BitSet class was defined: it allows you to store boolean values in an array (and refer to them by index) while at the same time using the minimum amount of memory for that array (one bit per entry). If you look at the public methods of the BitSet class and its implementation in BitSet.java, you may notice a few things that deserve your attention:
BitSet’s backend is an array of long values. You may achieve better performance using an array of int values. (Tests showed a gain of about 10% when switching to an int array.)
Some notes in the code indicate some things should be changed for better performance (for example, see the comment starting with FIXME).
You may not need all the features from that class.
For all these reasons, it would be acceptable for you to implement your own class,
possibly based on BitSet.java to improve performance.
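For reference, basic java.util.BitSet usage looks like this (a quick illustration, not from the original listings):

```java
import java.util.BitSet;

public class BitSetDemo {
    public static void main(String[] args) {
        BitSet flags = new BitSet(1000); // roughly 1000 bits, not 1000 bytes
        flags.set(42);                   // set bit 42 to true
        flags.set(43, true);
        flags.clear(43);                 // back to false
        System.out.println(flags.get(42));       // true
        System.out.println(flags.get(43));       // false
        System.out.println(flags.cardinality()); // 1 (number of bits set)
    }
}
```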
4.2.4 Defining Your Own Classes
Listing 4–11 shows a very simple implementation that would be acceptable if the array does not need to grow after the creation of the object, and if the only operations you need to perform are to set and get the value of a certain bit in the array, for example as you implement your own Bloom filter. When using this simple implementation versus BitSet, tests showed performance improved by about 50%. We achieved even better performance by using a simple array instead of the SimpleBitSet class: using an array alone was about 50% faster than using a SimpleBitSet object (that is, using an array was 4 times faster than using a BitSet object). This practice actually goes against the encapsulation principle of object-oriented design and languages, so you should do this with care.
Listing 4–11. Defining Your Own BitSet-like Class
public class SimpleBitSet {
private static final int SIZEOF_INT = 32;
private static final int OFFSET_MASK = SIZEOF_INT - 1; // 0x1F
private int[] bits;
SimpleBitSet(int nBits) {
bits = new int[(nBits + SIZEOF_INT - 1) / SIZEOF_INT];
}
void set(int index, boolean value) {
int i = index / SIZEOF_INT;
int o = index & OFFSET_MASK;
if (value) {
bits[i] |= 1 << o; // set bit to 1
} else {
bits[i] &= ~(1 << o); // set bit to 0
}
}
boolean get(int index) {
int i = index / SIZEOF_INT;
int o = index & OFFSET_MASK;
return 0 != (bits[i] & (1 << o));
}
}
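Using the class from Listing 4–11 is straightforward; the demo below repeats it in condensed form so the snippet compiles on its own:

```java
public class SimpleBitSetDemo {
    // Condensed copy of Listing 4–11 so this snippet is self-contained
    static class SimpleBitSet {
        private static final int SIZEOF_INT = 32;
        private static final int OFFSET_MASK = SIZEOF_INT - 1; // 0x1F
        private final int[] bits;
        SimpleBitSet(int nBits) {
            bits = new int[(nBits + SIZEOF_INT - 1) / SIZEOF_INT];
        }
        void set(int index, boolean value) {
            int i = index / SIZEOF_INT, o = index & OFFSET_MASK;
            if (value) bits[i] |= 1 << o; else bits[i] &= ~(1 << o);
        }
        boolean get(int index) {
            int i = index / SIZEOF_INT, o = index & OFFSET_MASK;
            return 0 != (bits[i] & (1 << o));
        }
    }

    public static void main(String[] args) {
        SimpleBitSet bits = new SimpleBitSet(100); // allocates 4 ints (128 bits)
        bits.set(5, true);
        bits.set(64, true);                        // lands in the third int
        System.out.println(bits.get(5));           // true
        bits.set(5, false);
        System.out.println(bits.get(5));           // false
        System.out.println(bits.get(64));          // true
    }
}
```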
Alternatively, if most bits are set to the same value, you may want to use a SparseBooleanArray to save memory (possibly at the cost of performance). Once again, you could use the Strategy pattern discussed in Chapter 2 to easily select one implementation or the other.
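A sketch of the Strategy approach, with a set-backed sparse implementation standing in for Android’s SparseBooleanArray (the interface and class names are ours):

```java
import java.util.HashSet;
import java.util.Set;

public class BitStorageDemo {
    // Strategy interface: callers depend only on this
    interface BitStorage {
        void set(int index, boolean value);
        boolean get(int index);
    }

    // Sparse strategy: stores only the indices set to true; saves memory when
    // almost all bits share the same (false) value, at some cost in speed
    static class SparseBits implements BitStorage {
        private final Set<Integer> trueIndices = new HashSet<>();
        public void set(int index, boolean value) {
            if (value) trueIndices.add(index); else trueIndices.remove(index);
        }
        public boolean get(int index) {
            return trueIndices.contains(index);
        }
    }

    public static void main(String[] args) {
        BitStorage storage = new SparseBits();      // swap strategies here
        storage.set(1_000_000, true);               // no large array needed
        System.out.println(storage.get(1_000_000)); // true
        System.out.println(storage.get(5));         // false
    }
}
```

A dense implementation like Listing 4–11’s could implement the same interface, letting you choose per use case without touching the calling code.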
All in all, these examples and techniques can be summarized as follows:
When dealing with large amounts of data, use the smallest type possible that meets your requirements. For example, choose an array of short values over an array of int values, for both performance and memory consumption reasons. Use float instead of double if you don’t need the extra precision (and use the FloatMath class if needed).
Avoid conversions from one type to another. Try to be consistent and use a single type in your computations, if possible.
Reinvent the wheel if necessary to achieve better performance, but do it with care.
Of course, these rules are not set in stone. For example, you may find yourself in a situation where converting from one type to another could actually give better performance, despite the conversion overhead. Be pragmatic and fix a problem only when you determine there is one.
More often than not, using less memory is a good rule to follow. In addition to simply leaving more memory available for other tasks, using less memory can improve performance as CPUs use caches to quickly access data or instructions.
4.3 Accessing Memory
As we saw earlier, manipulating larger types can be more costly because of the higher number of instructions involved. Intuitively, more instructions often result in lower performance simply because of the extra work the CPU has to perform to execute them. In addition to that, code and data both reside in memory, and accessing memory itself has a cost.
Because accessing memory is a costly operation, a CPU caches the memory that was recently accessed, whether it was memory that was read from or memory that was written to. In fact, a CPU typically uses two caches organized in a hierarchy:
Level 1 cache (L1)
Level 2 cache (L2)
The L1 cache is the faster but also the smaller of the two. For example, the L1 cache could be 64 kilobytes (32 kilobytes for data cache, and 32 kilobytes for instruction cache) whereas an L2 cache could be 512 kilobytes.
NOTE: Some processors may also have a Level 3 cache, typically several megabytes in size, but you won’t find that on embedded devices yet.
When data or instructions cannot be found in a cache, a cache miss occurs. This is when data or instructions need to be fetched from main memory. There are several kinds of cache misses:
Read miss in instruction cache
Read miss in data cache
Write miss
The first type of cache miss is the most critical, as the CPU has to wait until the instruction is read from memory before it can be executed. The second type of cache miss can be as critical as the first type, although the CPU may still be able to execute other instructions that do not depend on the data being fetched. This effectively results in an out-of-order execution of the instructions. The last type of cache miss is much less critical, as the CPU can typically continue executing instructions. You will have little control over write misses, but you should not worry about them much. Your focus should be on the first two types, which are the kinds of cache misses you want to avoid.
The Cache’s Line Size
Besides its total size, another important property of a cache is its line size. Each entry in the cache is a line, which contains several bytes. For example, a cache line on a Cortex A8 L1 cache is 64 bytes (16 words). The idea behind the cache and cache line is the principle of locality: if your application reads from or writes to a certain address, it is likely to read from or write to the same address, or a close-enough address, in the near future. For example, this behavior was obvious in the implementation of the findMin() and addAll() methods in Listing 4–10.
There is no easy way for your application to know the size of a cache and the size of a cache line. However, knowing that caches exist and having some knowledge about how they work can help you write better code and achieve better performance. The following tips can help you take advantage of the cache without having to resort to low-level optimizations, as shown in Chapter 3 with the PLD and PLI assembly instructions. To reduce the number of cache read misses from the instruction cache:
Compile your native libraries in Thumb mode. There is no guarantee this will make your code faster though as Thumb code can be slower than ARM code (because more instructions may have to be executed). Refer to Chapter 2 for more information on how to compile libraries in Thumb mode.
Keep your code relatively dense. While there is no guarantee dense Java code will ultimately result in dense native code, this is still quite often a true assumption.
To reduce the number of cache read misses from the data cache:
Again, use the smallest type possible when storing a large amount of data in arrays.
Choose sequential access over random access. This maximizes the reuse of data already in the cache, and can prevent data from being removed from the cache only to be loaded into the cache again later.
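The sequential-access tip can be illustrated with two loops that compute the same sum using very different access patterns (the actual timings depend on the device; this sketch only shows the patterns):

```java
public class AccessPattern {
    public static void main(String[] args) {
        final int n = 1 << 20;               // 1M ints, larger than typical L2 caches
        int[] data = new int[n];
        for (int i = 0; i < n; i++) data[i] = i;

        // Sequential: walks memory in order, each cache line fully reused
        long sum1 = 0;
        for (int i = 0; i < n; i++) sum1 += data[i];

        // Strided: jumps across cache lines, touching each line many times
        long sum2 = 0;
        final int stride = 4096;
        for (int s = 0; s < stride; s++)
            for (int i = s; i < n; i += stride) sum2 += data[i];

        System.out.println(sum1 == sum2);    // true: same result, different pattern
    }
}
```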
NOTE: Modern CPUs are capable of prefetching memory automatically to avoid, or at least limit, cache misses.
As usual, apply these tips to the performance-critical sections of your application, which usually are only a small part of your code. On the one hand, compiling in Thumb mode is an easy optimization that does not really increase your maintenance effort. On the other hand, writing dense code may make things more complicated in the long run. There is no one-size-fits-all optimization, and you will have the responsibility of balancing the multiple options you have.
While you don’t necessarily have control over what goes into the cache, how you
structure and use your data can have an impact on what ends up being in the cache,
and therefore can impact performance. In some cases, you may be able to arrange your data in a specific manner to maximize cache hits, albeit possibly creating greater complexity and maintenance cost.
4.4 Laying Out Your Data
Once again, the principle of encapsulation will be broken. Let’s assume your application needs to store records of some sort, and each record contains two fields: an id and a value. A very object-oriented approach is to define a Record class, as shown in Listing 4–12.
Listing 4–12. Record Class
public class Record {
private final short id;
private final short value;
// and possibly more
public Record(short id, short value) {
this.id = id;
this.value = value;
}
public final short getId() {
return id;
}
public final short getValue() {
return value;
}
public void doSomething() {
// do something here
}
}
Now that the Record class is defined, your application could simply allocate an array, save the records in that array, and provide additional methods, as shown in Listing 4–13.
Listing 4–13. Saving Records
public class MyRecords {
private Record[] records;
int nbRecords;
public MyRecords (int size) {
records = new Record[size];
}
public int addRecord (short id, short value) {
int index;
if (nbRecords < records.length) {
index = nbRecords;
records[nbRecords] = new Record(id, value);
nbRecords++;
} else {
index = -1;
}
return index;
}
public void deleteRecord (int index) {
if (index < 0) {
// throw exception here – invalid argument
}
if (index < nbRecords) {
nbRecords--;
records[index] = records[nbRecords];
records[nbRecords] = null; // don’t forget to delete reference
}
}
public int sumValues (int id) {
int sum = 0;
for (int i = 0; i < nbRecords; i++) {
Record r = records[i];
if (r.getId() == id) {
sum += r.getValue();
}
}
return sum;
}
public void doSomethingWithAllRecords () {
for (int i = 0; i < nbRecords; i++) {
records[i].doSomething();
}
}
}
All of this would work and would result in a pretty clean design. However, there are drawbacks that may not be visible until you actually run the code:
A new object is created every single time a record is added to the array. While each object is lightweight, memory allocations are still somewhat costly and could be avoided.
Calls to getId() and getValue() could be avoided if id and value were public.
If you are allowed to modify the Record class, making id and value public is obviously trivial. The implementation of sumValues() would then be slightly modified, as shown in Listing 4–14.
Listing 4–14. Modified sumValues()
public int sumValues (int id) {
int sum = 0;
for (Record r : records) {
if (r.id == id) {
sum += r.value;
}
}
return sum;
}
However, this alone does not reduce the number of allocations at all; record objects still need to be created as records are added to the array.
NOTE: You could avoid allocations easily in C/C++, but in Java all objects are actually references and have to be created with the new operator.
Since all objects are allocated in the heap, and you can only store references to objects in the array, you can modify the MyRecords class to use an array of short values to remove the allocations. The modified class is shown in Listing 4–15.
Listing 4–15. Modified MyRecords Class Using a Short Array
public class MyRecords {
private short[] records;
int nbRecords;
public MyRecords (int size) {
records = new short[size * 2];
}
public int addRecord (short id, short value) {
int index;
if (nbRecords < records.length) {
index = nbRecords;
records[nbRecords * 2] = id;
records[nbRecords * 2 + 1] = value;
nbRecords++;
} else {
index = -1;
}
return index;
}
public void deleteRecord (int index) {
if (index < 0) {
// throw exception here – invalid argument
}
if (index < nbRecords) {
nbRecords--;
records[index * 2] = records[nbRecords * 2];
records[index * 2 + 1] = records[nbRecords * 2 + 1];
}
}
public int sumValues (int id) {
int sum = 0;
for (int i = 0; i < nbRecords; i++) {
if (records[i * 2] == id) {
sum += records[i * 2 + 1];
}
}
return sum;
}
public void doSomethingWithAllRecords () {
Record r = new Record(0, 0); // assumes Record’s id and value are now public and non-final
for (int i = 0; i < nbRecords; i++) {
r.id = records[i * 2];
r.value = records[i * 2 + 1];
r.doSomething();
}
}
}
Let’s imagine that, later on, you find out these things about how the MyRecords class is used:
sumValues() is called much more often than doSomethingWithAllRecords().
Only a few records in the array share the same id.
In other words, that would tell you the id field is read much more often than the value field. Given this additional piece of information, you could come up with the following solution to improve performance: use two arrays instead of one to keep all the ids close together, maximizing cache hits when sequentially going through the array of ids in sumValues(). The first array contains only record ids, while the second array contains only record values. Consequently, more record ids are found in the cache when sumValues() runs, since twice as many record ids fit in a single cache line.
The new implementation of MyRecords is shown in Listing 4–16.
Listing 4–16. Modified MyRecords Class Using Two Arrays
public class MyRecords {
private short[] recordIds; // first array only for ids
private short[] recordValues; // second array only for values
int nbRecords;
public MyRecords (int size) {
recordIds = new short[size];
recordValues = new short[size];
}
public int addRecord (short id, short value) {
int index;
if (nbRecords < recordIds.length) {
index = nbRecords;
recordIds[nbRecords] = id;
recordValues[nbRecords] = value;
nbRecords++;
} else {
index = -1;
}
return index;
}
public void deleteRecord (int index) {
if (index < 0) {
// throw exception here – invalid argument
}
if (index < nbRecords) {
nbRecords--;
recordIds[index] = recordIds[nbRecords];
recordValues[index] = recordValues[nbRecords];
}
}
public int sumValues (int id) {
int sum = 0;
for (int i = 0; i < nbRecords; i++) {
if (recordIds[i] == id) {
sum += recordValues[i]; // we only read the value if the id matches
}
}
return sum;
}
public void doSomethingWithAllRecords () {
Record r = new Record(0, 0); // assumes Record’s id and value are now public and non-final
for (int i = 0; i < nbRecords; i++) {
r.id = recordIds[i];
r.value = recordValues[i];
r.doSomething();
}
}
}
You may not always be able to apply this kind of optimization though. For example, the listing above assumes doSomething() does not modify the Record object and assumes MyRecords does not provide any method to retrieve Record objects from the array. If these assumptions ever become false, then the implementations in Listings 4–15 and 4–16 would no longer be equivalent to the one in Listing 4–13.
Keep in mind that you may not be able to optimize your code properly until you find out how your code is used. Again, follow a pragmatic approach: don’t start optimizing until you know what problem you are trying to solve, as optimizing one usage pattern may negatively impact other patterns.
4.5 Garbage Collection
One of the great benefits of Java is garbage collection. The garbage collector frees (or reclaims) memory as objects are no longer in use. For example, the Record object allocated in doSomethingWithAllRecords() in Listing 4–15 is made eligible for garbage collection when the method returns, as a reference to that object no longer exists. There are two very important things to note:
Memory leaks can still exist.
Use the garbage collector to help you manage memory, as it does more than just free memory that is no longer in use.
4.5.1 Memory Leaks
As memory can be reclaimed only when an object is no longer referenced, a memory leak can occur when a reference to an object is kept around. A typical example from the Android documentation is when the whole Activity object is leaked as the screen turns (for example, from portrait to landscape orientation). This particular example is easy to reproduce and is quite serious, as an Activity object uses quite a bit of memory and often contains references to even more objects. There is no easy solution to avoiding memory leaks; however, Android provides you with tools and APIs to help you.
The DDMS perspective in Eclipse lets you track memory usage and memory allocation with the Heap and Allocation Tracker, respectively. Once again, these tools are not going to tell you where the memory leak is (if there is any), but you can use them to analyze your application’s memory usage and hopefully find out if your application has leaks.
TIP: Use the Eclipse Memory Analyzer to even better analyze your memory usage. You can download it from http://www.eclipse.org/mat.
Android 2.3 defines the StrictMode class, which can be of great help to detect potential memory leaks. While StrictMode’s virtual machine policy in Android 2.3 lets you detect only when SQLite objects (such as cursors) are not closed, StrictMode’s VM policy in Android 3.0 and above also lets you detect these potential leaks:
Activity leaks
Leaks of other objects
Leaks when objects are not closed (see the Android documentation for the complete list of classes implementing the Closeable interface)
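The closeable-object violations StrictMode reports usually come down to a missing close() in a finally block. A minimal sketch of the pattern, using a plain java.io reader to stand in for an Android SQLite cursor (the data is made up for illustration):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class CloseableDemo {
    public static void main(String[] args) throws IOException {
        // Same pattern applies to cursors on Android: open, use, close in finally
        BufferedReader reader = new BufferedReader(new StringReader("line1\nline2"));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        } finally {
            reader.close(); // without this, the Closeable leaks and StrictMode complains
        }
    }
}
```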
NOTE: The StrictMode class was introduced in Android 2.3 (API level 9), but additional functionalities were added in Android 3.0 (API level 11) in both the VM policy and the thread policy. For example, Honeycomb’s StrictMode thread policy supports flashing the screen when a violation is detected.
Listing 4–17 shows how to use the StrictMode class to detect memory leaks in your application. You should enable this feature only during development and testing, and disable it as you release your application into the wild.
Listing 4–17. Using StrictMode
public class MyApplication extends Application {
    @Override
    public void onCreate() {
        super.onCreate();
        StrictMode.VmPolicy.Builder builder = new StrictMode.VmPolicy.Builder();
        builder.detectLeakedSqlLiteObjects();
        if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.HONEYCOMB) {
            builder.detectActivityLeaks().detectLeakedClosableObjects();
        }
        // or you could simply call builder.detectAll()
        // penalty: other penalties exist (e.g. penaltyDeath()) and can be combined
        builder.penaltyLog();
        StrictMode.VmPolicy vmp = builder.build();
        StrictMode.setVmPolicy(vmp);
    }
}
In that particular instance, StrictMode detects a violation when a closeable object
(SQLite object or other) is not closed, and will only log the violation. To verify the
behavior, you can simply query a database and purposely forget to close the returned cursor, for example by modifying the code shown in Listing 1-25 in Chapter 1.
As the StrictMode class evolves, it is recommended you simply use detectAll(), which allows you to test your application with future Android releases while taking advantage of the new functionality the StrictMode class supports.
4.5.2 References
While freeing memory is an important feature of the garbage collector, it does more than that, as it is a complete memory management system. Everybody writing Java code has heard about references and how an object can be referenced or not. However, too few seem to know about the multiple types of references. In fact, Java defines four types of references:
Strong
Soft
Weak
Phantom
Strong References
Strong references are the references Java developers are the most familiar with.
Creating such a reference is trivial and is done all the time in Java, as shown in Listing 4–18. In fact, they are the references your application should use most of the time. Two strong references are created, one to an Integer object and one to a BigInteger object.
Listing 4–18. Strong References
public void printTwoStrings(int n) {
    BigInteger bi = BigInteger.valueOf(n); // strong reference
    Integer i = new Integer(n); // strong reference
    System.out.println(i.toString());
    i = null; // Integer object freshly created is now eligible for garbage collection
    System.out.println(bi.toString());
    bi = null; // BigInteger object may not be eligible for garbage collection here!
}
The important thing to notice here is that while setting i to null does make the Integer object eligible for garbage collection, setting bi to null may not. Because the BigInteger.valueOf() method may return a preallocated object (for example, BigInteger.ZERO), setting bi to null merely removes one strong reference to the BigInteger object, but more strong references to that same object may still exist. Two more strong references are created in that method, and they may not be as obvious as the other ones: the calls to i.toString() and bi.toString() each create a strong reference to a String object.
NOTE: Strictly speaking, you would have to know the implementation of the Integer constructor to make sure no strong reference to the new Integer object is created anywhere else and therefore make sure that setting i to null does indeed make the object eligible for garbage collection.
As discussed earlier, keeping strong references to objects around can cause memory leaks. We've probably used the term "strong reference" too many times, so it is time to say that Java does not really define such a term or class. Strong references are "normal" references, simply referred to (pun intended) as references.
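The effect of a factory method returning a shared, preallocated object can be observed on any standard JVM with Integer.valueOf(), whose cache of the values -128 to 127 is guaranteed by the Java Language Specification. This small sketch (the class and method names are my own, for illustration only) shows why nulling your reference to such an object does not make it collectable:

```java
public class SharedInstances {
    // valueOf() may hand back a cached object: two calls, one instance.
    static boolean sameInstance(int n) {
        return Integer.valueOf(n) == Integer.valueOf(n);
    }

    // The constructor, by contrast, always allocates a fresh object.
    static boolean freshInstance(int n) {
        return new Integer(n) != Integer.valueOf(n);
    }
}
```

For values in the guaranteed cache range, sameInstance(100) returns true: the cached Integer is strongly referenced by the cache itself, so dropping your own reference to it never makes it eligible for collection, just like the preallocated BigInteger constants discussed above.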
Soft, Weak, and Phantom References
Soft and weak references are similar in nature, as they are references that are not strong enough to keep an object from being deleted (or reclaimed). They differ in how aggressively the garbage collector will try to reclaim the object they have a reference to.
An object that is softly reachable, that is, for which there exists a soft reference but no strong reference, is likely to be left alone by the garbage collector when a collection occurs while there is still enough memory to hold the object. However, if the garbage collector determines it needs to reclaim more memory, then it is free to reclaim the softly reachable object's memory. This type of reference is the perfect candidate for a cache that can automatically remove its entries.
TIP: When using a cache, make sure you understand what type of reference it uses. For example, Android's LruCache uses strong references.
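As a sketch of such a cache (the SoftCache class below is hypothetical, not an Android or Java API), values are held through SoftReference objects so the garbage collector may reclaim them under memory pressure:

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// A minimal cache whose values the garbage collector may reclaim
// when memory runs low; callers must be prepared for get() to
// return null and regenerate the value on demand.
public class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<K, SoftReference<V>>();

    public void put(K key, V value) {
        map.put(key, new SoftReference<V>(value));
    }

    public V get(K key) {
        SoftReference<V> ref = map.get(key);
        return (ref != null) ? ref.get() : null; // null if absent or reclaimed
    }
}
```

A production version would also drain a ReferenceQueue to prune map entries whose referents have been cleared; otherwise the map slowly fills with dead SoftReference wrappers.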
Weakly reachable objects, that is, objects for which there exists a weak reference but no strong or soft reference, may be reclaimed as soon as the next collection happens. In other words, the garbage collector will more aggressively reclaim the memory of weakly reachable objects. This type of reference is the perfect candidate for mappings whose entries can be removed automatically as their keys are no longer referenced. Use the WeakHashMap class for this purpose.
NOTE: How aggressive the garbage collector is depends on the actual implementation.
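A minimal illustration of WeakHashMap behavior (the class and method names are mine): the entry survives while its key is strongly referenced, and becomes eligible for removal once that reference is dropped.

```java
import java.util.Map;
import java.util.WeakHashMap;

public class WeakKeyDemo {
    // Returns true: the entry is present while the key is strongly referenced.
    static boolean hasEntry() {
        Map<Object, String> map = new WeakHashMap<Object, String>();
        Object key = new Object(); // strong reference keeps the entry alive
        map.put(key, "payload");
        return map.containsKey(key);
        // Once 'key' goes out of scope, the entry may vanish after the
        // next collection; exactly when is up to the garbage collector.
    }
}
```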
Phantom references are the weakest of the references and are rarely used. They can be useful if your application needs to know when an object is being garbage collected and you need to perform some clean-up at that time. To be truly useful, phantom references should be registered with a reference queue.
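A sketch of registering a phantom reference with a queue (the helper name is my own). Note that PhantomReference.get() always returns null, which is why the queue is the only way to learn that the referent has been collected:

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;

public class PhantomDemo {
    // Wraps obj in a phantom reference; the reference is enqueued on
    // 'queue' after the garbage collector determines obj is unreachable.
    static PhantomReference<Object> track(Object obj, ReferenceQueue<Object> queue) {
        return new PhantomReference<Object>(obj, queue);
    }
}
```

Your clean-up code would poll (or block on remove() from) the queue, perform its work, and then drop the phantom reference itself.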
Soft, weak, and phantom references are actually objects themselves and offer a level of indirection to any other object. For example, you could create a phantom reference to a soft reference to a weak reference. In practice though, you will almost always create soft, weak, or phantom references to "strong" references. Listing 4–19 shows an example of soft and weak references being created, each associated with a different reference queue.
Listing 4–19. References and Reference Queues
private Integer strongRef;
private SoftReference<Integer> softRef;
private WeakReference<Integer> weakRef;
private ReferenceQueue<Integer> softRefQueue = new ReferenceQueue<Integer>();
private ReferenceQueue<Integer> weakRefQueue = new ReferenceQueue<Integer>();
public void reset() {
    strongRef = new Integer(1);
    softRef = new SoftReference<Integer>(strongRef, softRefQueue);
    weakRef = new WeakReference<Integer>(strongRef, weakRefQueue);
}

public void clearStrong() {
    strongRef = null; // no more strong reference, but weak and soft references may still exist
}

public void clearSoft() {
    softRef = null; // no more soft reference, but strong and weak references may still exist
}

public void clearWeak() {
    weakRef = null; // no more weak reference, but strong and soft references may still exist
}

public void pollAndPrint() {
    Reference<? extends Integer> r;
    if ((r = softRefQueue.poll()) != null) {
        do {
            Log.i(TAG, "Soft reference: " + r);
        } while ((r = softRefQueue.poll()) != null);
    } else {
        Log.i(TAG, "Soft reference queue empty");
    }
    if ((r = weakRefQueue.poll()) != null) {
        do {
            Log.i(TAG, "Weak reference: " + r);
        } while ((r = weakRefQueue.poll()) != null);
    } else {
        Log.i(TAG, "Weak reference queue empty");
    }
}

public void gc() {
    System.gc();
}
Experiment with this code to see when references are enqueued and how this can affect your application. To take full advantage of the garbage collector's memory management abilities, it is important to understand references. You should not try to implement a similar memory management system when working with caches or maps: most of the things you would want to achieve can be left to the garbage collector with a careful use of references.
Garbage Collection
Garbage collection can occur at various times, and you have little control over when it happens. You may be able to give a hint to Android by calling System.gc(), but ultimately the Dalvik virtual machine gets to decide when garbage collection actually occurs. There are five situations that prompt garbage collection to occur, and I'll refer to them by their log messages, which you can see when using logcat:
GC_FOR_MALLOC: Occurs when the heap is too full to allocate memory, and memory must be reclaimed before the allocation can proceed
GC_CONCURRENT: Occurs when a (possibly partial) collection kicks in, usually as there are enough objects to reclaim
GC_EXPLICIT: Can occur when you call System.gc() to explicitly request a garbage collection
GC_EXTERNAL_ALLOC: Does not occur anymore on Honeycomb or later (as everything is allocated in the heap)
GC_HPROF_DUMP_HEAP: Occurs when you create an HPROF file
Listing 4–20 shows some log messages from the garbage collector.
Listing 4–20. Garbage Collection Messages
GC_CONCURRENT freed 103K, 69% free 320K/1024K, external 0K/0K, paused 1ms+1ms
GC_EXPLICIT freed 2K, 55% free 2532K/5511K, external 1625K/2137K, paused 55ms
As garbage collection takes time, reducing the number of objects you allocate and then release can improve performance. This is especially true in Android 2.2 and earlier versions because garbage collection occurred in the application's main thread and could cause pretty serious issues with responsiveness and performance. For example, some frames in a real-time game could be dropped because too much time was spent doing garbage collection. Things got better with Android 2.3, with most of the garbage collection work relocated to a separate thread. As garbage collection occurs, the main thread is still affected a little bit (a pause of 5 milliseconds or less), but much less than in previous versions of Android. It is not uncommon for a complete garbage collection to take over 50 milliseconds. For reference, a 30 frames-per-second game has only about 33 milliseconds to render and display each frame, so it is easy to see why garbage collection on pre-Android 2.3 systems could cause problems.
4.6 APIs
Android defines several APIs you can use to learn about how much memory is available on the system and how much is being used:
ActivityManager’s getMemoryInfo()
ActivityManager’s getMemoryClass()
ActivityManager’s getLargeMemoryClass()
Debug’s dumpHprofData()
Debug’s getMemoryInfo()
Debug’s getNativeHeapAllocatedSize()
Debug’s getNativeHeapSize()
TIP: Set android:largeHeap to true in your application's manifest file to use a large heap. This attribute was introduced in Android 3.0. Note that there is no guarantee the large heap is any larger than the regular heap. You should try hard to prevent your application from having to depend on this setting.
Listing 4–21 shows how to use the two getMemoryInfo() methods.
Listing 4–21. Calling getMemoryInfo()
ActivityManager am = (ActivityManager) getSystemService(Context.ACTIVITY_SERVICE);
ActivityManager.MemoryInfo memInfo = new ActivityManager.MemoryInfo();
am.getMemoryInfo(memInfo);
// use information from memInfo here...
Debug.MemoryInfo debugMemInfo = new Debug.MemoryInfo();
Debug.getMemoryInfo(debugMemInfo);
// use information from debugMemInfo here...
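The APIs above are Android-specific, but the standard Runtime class, available on Dalvik as well as any JVM, reports comparable heap figures. A small sketch (the method names are my own):

```java
public class HeapStats {
    // Bytes currently used on the heap.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Upper bound the heap can grow to; roughly what Android's
    // getMemoryClass() expresses in megabytes.
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }
}
```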
4.7 Low Memory
Your application is not alone: it has to share resources with many other applications and also with the system as a whole. Consequently, there may be times when there is not enough memory for everyone, and in this case Android will ask applications and application components (such as activities or fragments) to tighten their belts.
The ComponentCallbacks interface defines the onLowMemory() API, which is common to all application components. When it is called, a component is basically asked to release objects it does not really need. Typically, your implementation of onLowMemory() would release:
Caches or cache entries (for example, LruCache as it uses strong references)
Bitmap objects that can be generated again on demand
Layout objects that are not visible
Database objects
You should be careful about deleting objects that are costly to recreate. However, not releasing enough memory may cause Android to be more aggressive and start killing processes, possibly even your own application. If your application is killed, it will have to start from scratch again the next time the user wants to use it. Consequently, your application should play nice and release as many resources as it can, as this benefits not only other applications but also your own. Using lazy initialization in your code is a good habit; it allows you to implement onLowMemory() later without having to modify the rest of your code significantly.
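A plain-Java sketch of that pattern (the ThumbnailStore class and its fields are invented for illustration): because the cache is created lazily, a low-memory callback can simply drop it, and the rest of the code keeps working unchanged:

```java
import java.util.HashMap;
import java.util.Map;

public class ThumbnailStore {
    private Map<String, byte[]> cache; // stands in for decoded bitmaps

    // Lazy initialization: the cache exists only once it is needed.
    private Map<String, byte[]> cache() {
        if (cache == null) {
            cache = new HashMap<String, byte[]>();
        }
        return cache;
    }

    public void put(String key, byte[] data) {
        cache().put(key, data);
    }

    public byte[] get(String key) {
        return cache().get(key); // may be null; caller regenerates on demand
    }

    // What your onLowMemory() implementation would call.
    public void releaseMemory() {
        cache = null; // all entries become eligible for garbage collection
    }
}
```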
Summary
On embedded devices, memory is a scarce resource. Even though today's cellular phones and tablets have more and more memory, these devices also run more and more complex systems and applications. Using memory effectively not only allows your application to run on older devices with less memory but also makes it run faster. Remember: if you give your application memory, it will ask for more.