Java String Class Source Code: Java String Source Analysis

Reposted

ghpsyn 2023-07-15 12:35:50

String Class Source Code Analysis

1. Class hierarchy
Let's start with the class declaration:

public final class String implements java.io.Serializable, Comparable<String>, CharSequence

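String is declared final, so it cannot be subclassed. Its immutability, and with it its thread safety, comes from the combination of the final class, the private final value field, and the absence of any method that modifies the array after construction. The class implements the Serializable, Comparable<String>, and CharSequence interfaces. String is one of the most frequently used classes in everyday development, so it is worth understanding how it works internally rather than only knowing how to call it. The listings in this article appear to come from the JDK 8 source, where the characters are stored in a char array.
2. Fields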

private final char value[];
private int hash; // Default to 0

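String stores its characters in a char array held in a private final field; the array is never modified after construction. The int field hash caches the computed hash code.
3. Constructors
String has more than a dozen constructors; we look at a few of the important ones.
3.1 No-argument constructor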

/**
     * Initializes a newly created {@code String} object so that it represents
     * an empty character sequence.  Note that use of this constructor is
     * unnecessary since Strings are immutable.
     */
    public String() {
        this.value = "".value;
    }

3.2 The String(String original) constructor

public String(String original) {
        this.value = original.value;
        this.hash = original.hash;
    }

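This constructor takes a String original and initializes the new object to represent the same character sequence; in other words, the new string is a copy of the argument. It copies only the value reference and the cached hash, so both objects share one underlying char array, which is safe because the array is never modified. Unless an explicit copy of original is needed, this constructor should not be used.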
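A minimal sketch of what "explicit copy" means in practice. The class and method names (CopyConstructorDemo, isDistinctObject, hasSameContent) are made up for this illustration.

```java
class CopyConstructorDemo {
    // True when new String(original) yields a different object than original.
    static boolean isDistinctObject(String original) {
        String copy = new String(original);
        return copy != original;
    }

    // True when the copy represents the same character sequence.
    static boolean hasSameContent(String original) {
        String copy = new String(original);
        return copy.equals(original);
    }

    public static void main(String[] args) {
        System.out.println(isDistinctObject("hello")); // a separate object
        System.out.println(hasSameContent("hello"));   // with equal content
    }
}
```

The copy is a new object with equal content, which is rarely what a caller needs, hence the advice against using this constructor.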
3.3 String(char value[])

public String(char value[]) {
        this.value = Arrays.copyOf(value, value.length);
    }

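This constructor builds a new string from the char array value, representing the character sequence the array currently contains. The contents of the array are copied into the string, so later modifications of the array do not affect the newly created string.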
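The defensive copy can be observed directly. This is a small sketch; CharArrayCopyDemo and buildThenMutate are hypothetical names used only here.

```java
class CharArrayCopyDemo {
    // Builds a String from a char array, then mutates the array.
    // The returned String still holds the characters seen at construction time.
    static String buildThenMutate() {
        char[] chars = {'j', 'a', 'v', 'a'};
        String s = new String(chars);
        chars[0] = 'l'; // changes the caller's array only
        return s;
    }

    public static void main(String[] args) {
        System.out.println(buildThenMutate()); // prints "java"
    }
}
```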
3.4 String(char value[], int offset, int count)

public String(char value[], int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count <= 0) {
            if (count < 0) {
                throw new StringIndexOutOfBoundsException(count);
            }
            if (offset <= value.length) {
                this.value = "".value;
                return;
            }
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > value.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }
        this.value = Arrays.copyOfRange(value, offset, offset+count);
    }

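This constructor allocates a new string whose initial value is taken from a subarray of value: offset is the index of the first character of the subarray and count is its length. When count is 0 and offset <= value.length, the result is an empty string. A negative offset or count, or a range that runs past the end of the array, throws StringIndexOutOfBoundsException.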
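The boundary cases described above can be exercised with a short sketch. SubArrayConstructorDemo, slice, and rejects are names invented for this example.

```java
class SubArrayConstructorDemo {
    // Thin wrapper around the constructor under discussion.
    static String slice(char[] data, int offset, int count) {
        return new String(data, offset, count);
    }

    // True when the constructor rejects the given range.
    static boolean rejects(char[] data, int offset, int count) {
        try {
            new String(data, offset, count);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        char[] data = "hello world".toCharArray(); // length 11
        System.out.println(slice(data, 6, 5));            // prints "world"
        System.out.println(slice(data, 11, 0).isEmpty()); // count == 0, offset == length
        System.out.println(rejects(data, 8, 5));          // 8 + 5 exceeds 11
        System.out.println(rejects(data, -1, 2));         // negative offset
    }
}
```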
3.5 String(int[] codePoints, int offset, int count)

public String(int[] codePoints, int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count <= 0) {
            if (count < 0) {
                throw new StringIndexOutOfBoundsException(count);
            }
            if (offset <= codePoints.length) {
                this.value = "".value;
                return;
            }
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > codePoints.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }

        final int end = offset + count;

        // Pass 1: Compute precise size of char[]
        int n = count;
        for (int i = offset; i < end; i++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                continue;
            else if (Character.isValidCodePoint(c))
                n++;
            else throw new IllegalArgumentException(Integer.toString(c));
        }

        // Pass 2: Allocate and fill in char[]
        final char[] v = new char[n];

        for (int i = offset, j = 0; i < end; i++, j++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                v[j] = (char)c;
            else
                Character.toSurrogates(c, v, j++);
        }

        this.value = v;
    }

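This constructor builds a string from an array of Unicode code points.
It first validates offset and count against the array bounds. A first pass then computes the exact size of the char array: a BMP code point needs one char, a supplementary code point needs two (a surrogate pair), and an invalid code point throws IllegalArgumentException. A second pass fills the array v, which becomes the string's value. (This touches on character encoding, which will be covered in detail in the Character source analysis.)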
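To make the two-pass sizing concrete, here is a small sketch with one supplementary code point. CodePointConstructorDemo and its methods are hypothetical names; U+1F600 is just a convenient code point outside the BMP.

```java
class CodePointConstructorDemo {
    // Builds a String from all code points in the array.
    static String fromCodePoints(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    // True when the constructor rejects the code point as invalid.
    static boolean rejects(int codePoint) {
        try {
            new String(new int[] {codePoint}, 0, 1);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // 'H', U+1F600 (outside the BMP), 'i'
        String s = fromCodePoints(0x48, 0x1F600, 0x69);
        System.out.println(s.length());                      // 4 chars: 1 + 2 + 1
        System.out.println(s.codePointCount(0, s.length())); // 3 code points
        System.out.println(rejects(0x110000));               // above the valid range
    }
}
```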
3.6 String(byte bytes[], int offset, int length, String charsetName)

public String(byte bytes[], int offset, int length, String charsetName)
            throws UnsupportedEncodingException {
        if (charsetName == null)
            throw new NullPointerException("charsetName");
        checkBounds(bytes, offset, length);
        this.value = StringCoding.decode(charsetName, bytes, offset, length);
    }

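This constructor decodes the specified subarray of bytes using the named charset and constructs a new string; an overload takes a Charset object directly instead of a charset name. It relies on two helpers, checkBounds and StringCoding.decode. The source of checkBounds is: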

private static void checkBounds(byte[] bytes, int offset, int length) {
        if (length < 0)
            throw new StringIndexOutOfBoundsException(length);
        if (offset < 0)
            throw new StringIndexOutOfBoundsException(offset);
        if (offset > bytes.length - length)
            throw new StringIndexOutOfBoundsException(offset + length);
    }

This method only performs bounds checking: length and offset must not be negative, and offset + length must not exceed the length of the byte array.
The source of decode is:

static char[] decode(String charsetName, byte[] ba, int off, int len)
        throws UnsupportedEncodingException
    {
        StringDecoder sd = deref(decoder);
        String csn = (charsetName == null) ? "ISO-8859-1" : charsetName;
        if ((sd == null) || !(csn.equals(sd.requestedCharsetName())
                              || csn.equals(sd.charsetName()))) {
            sd = null;
            try {
                Charset cs = lookupCharset(csn);
                if (cs != null)
                    sd = new StringDecoder(cs, csn);
            } catch (IllegalCharsetNameException x) {}
            if (sd == null)
                throw new UnsupportedEncodingException(csn);
            set(decoder, sd);
        }
        return sd.decode(ba, off, len);
    }

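This method reuses a cached decoder when its charset matches the requested name and otherwise looks up the charset and creates a new decoder; an unknown or illegal charset name results in UnsupportedEncodingException. The decoder then converts the bytes into the char array that backs the new string. We will not go into further detail here.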
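From the caller's side, the charset name decides how the same bytes are interpreted. The sketch below assumes UTF-8 and ISO-8859-1 are available (both are required on every Java platform); ByteDecodingDemo, decode, and isUnsupported are names made up for the example, and "no-such-charset" is a deliberately bogus name.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

class ByteDecodingDemo {
    // Decodes the whole array with the named charset.
    static String decode(byte[] bytes, String charsetName) throws UnsupportedEncodingException {
        return new String(bytes, 0, bytes.length, charsetName);
    }

    // True when the charset name is not supported by this JVM.
    static boolean isUnsupported(String charsetName) {
        try {
            new String(new byte[0], 0, 0, charsetName);
            return false;
        } catch (UnsupportedEncodingException e) {
            return true;
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        byte[] utf8 = "h\u00e9llo".getBytes(StandardCharsets.UTF_8); // 6 bytes, the accented e takes two
        System.out.println(decode(utf8, "UTF-8").length());      // 5
        System.out.println(decode(utf8, "ISO-8859-1").length()); // 6, one char per byte
        System.out.println(isUnsupported("no-such-charset"));
    }
}
```

Decoding with the wrong charset does not fail; it silently produces different characters, which is why the charset should always be stated explicitly.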
3.7 The String(StringBuffer buffer) and String(StringBuilder builder) constructors

public String(StringBuffer buffer) {
    synchronized(buffer) {
        this.value = Arrays.copyOf(buffer.getValue(), buffer.length());
    }
}

public String(StringBuilder builder) {
    this.value = Arrays.copyOf(builder.getValue(), builder.length());
}

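Besides strings, char arrays, code point arrays, and byte arrays, a String can also be constructed from a StringBuffer or a StringBuilder. Note that the StringBuffer constructor copies inside a synchronized block on the buffer, while the StringBuilder one does not.
The remaining constructors are equally simple and are not covered here.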
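Because both constructors copy the builder's characters with Arrays.copyOf, the resulting string is a snapshot. A short sketch, with BuilderSnapshotDemo and snapshotThenAppend as hypothetical names:

```java
class BuilderSnapshotDemo {
    // Takes a String snapshot of the builder, then keeps appending.
    // The snapshot is unaffected by the later append.
    static String snapshotThenAppend() {
        StringBuilder sb = new StringBuilder("abc");
        String snapshot = new String(sb);
        sb.append("def");
        return snapshot;
    }

    public static void main(String[] args) {
        System.out.println(snapshotThenAppend()); // prints "abc"
    }
}
```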
4. Common methods
Most of String's common methods are simple. Many of them rely on a native method, System.arraycopy:

public static native void arraycopy(Object src,  int  srcPos,
                                        Object dest, int destPos,
                                        int length);

4.1 boolean equals(Object anObject)

public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        if (anObject instanceof String) {
            String anotherString = (String)anObject;
            int n = value.length;
            if (n == anotherString.value.length) {
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = 0;
                while (n-- != 0) {
                    if (v1[i] != v2[i])
                        return false;
                    i++;
                }
                return true;
            }
        }
        return false;
    }

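equals is used all the time; it decides whether two objects are equal in value. For String the rules are:
1. If both references point to the same object, return true.
2. If the argument is not a String (this includes null), return false. Otherwise continue.
3. If the lengths differ, return false. Otherwise continue.
4. Compare the characters of the two value arrays one by one, from index 0 forward; the first mismatch returns false. If every character up to the last one matches, return true.
It follows that comparing two very long strings of equal length can be expensive when they share a long common prefix, because the loop is linear in the length.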
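The four rules can be checked one by one. In this sketch, EqualsRulesDemo and its method names are invented for illustration.

```java
class EqualsRulesDemo {
    static boolean sameReference() {
        String s = "abc";
        return s.equals(s);                            // rule 1
    }

    static boolean nonStringArgument() {
        Object other = new StringBuilder("abc");
        return "abc".equals(other);                    // rule 2: not a String
    }

    static boolean nullArgument() {
        Object nothing = null;
        return "abc".equals(nothing);                  // rule 2: instanceof is false for null
    }

    static boolean differentLength() {
        return "abc".equals("abcd");                   // rule 3
    }

    static boolean sameCharacters() {
        return "abc".equals(new String("abc"));        // rule 4: distinct objects, same chars
    }

    public static void main(String[] args) {
        System.out.println(sameReference());     // true
        System.out.println(nonStringArgument()); // false
        System.out.println(nullArgument());      // false
        System.out.println(differentLength());   // false
        System.out.println(sameCharacters());    // true
    }
}
```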
4.2 int compareTo(String anotherString)

public int compareTo(String anotherString) {
        int len1 = value.length;
        int len2 = anotherString.value.length;
        int lim = Math.min(len1, len2);
        char v1[] = value;
        char v2[] = anotherString.value;

        int k = 0;
        while (k < lim) {
            char c1 = v1[k];
            char c2 = v2[k];
            if (c1 != c2) {
                return c1 - c2;
            }
            k++;
        }
        return len1 - len2;
    }

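The method is compact: it compares characters from index 0 up to the length of the shorter string and returns the difference c1 - c2 at the first mismatch. If the whole common prefix matches, it returns len1 - len2, which is negative, zero, or positive depending on the lengths; zero means the strings are equal. A single final return thus covers all three cases.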
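A few concrete return values illustrate both exit paths. CompareToDemo and compare are hypothetical names; the expected values follow from the char codes ('e' is 101, 'y' is 121).

```java
class CompareToDemo {
    // Thin wrapper so the calls read uniformly.
    static int compare(String a, String b) {
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        System.out.println(compare("apple", "apply")); // -20: first mismatch, 'e' - 'y'
        System.out.println(compare("abc", "abcd"));    // -1: common prefix equal, 3 - 4
        System.out.println(compare("abc", "abc"));     // 0: equal strings
        System.out.println(compare("b", "a"));         // 1: 'b' - 'a'
    }
}
```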
4.3 int hashCode()

public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }

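String overrides hashCode; the version in Object is a native call. String computes its hash as a polynomial in 31 over its characters, s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], and caches the result in the hash field (a string whose hash happens to be 0 is recomputed on every call). Different strings can produce the same hash, so two String objects with equal hash codes are not necessarily equal.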
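The polynomial can be recomputed by hand and compared with the built-in result. In this sketch, HashCodeDemo and polynomialHash are made-up names; "Aa" and "BB" are a well-known colliding pair (65*31 + 97 = 66*31 + 66 = 2112).

```java
class HashCodeDemo {
    // Recomputes the same 31-based polynomial as String.hashCode().
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(polynomialHash("hello") == "hello".hashCode()); // true
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112
        System.out.println("Aa".equals("BB")); // false: same hash, different strings
    }
}
```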
4.4 String substring(int beginIndex)

public String substring(int beginIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        int subLen = value.length - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        return (beginIndex == 0) ? this : new String(value, beginIndex, subLen);
    }

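This method extracts a substring. If beginIndex is 0 it returns the current string itself; otherwise it constructs and returns a new string. A beginIndex that is negative or greater than the length throws StringIndexOutOfBoundsException.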
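A brief sketch of the three outcomes. SubstringDemo and throwsOutOfBounds are hypothetical names. Returning this for beginIndex 0 is what the listing above does, but it is an implementation detail, so the example prints the identity check rather than relying on it.

```java
class SubstringDemo {
    // True when substring(beginIndex) rejects the index.
    static boolean throwsOutOfBounds(String s, int beginIndex) {
        try {
            s.substring(beginIndex);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = "hello";
        System.out.println(s.substring(2));            // prints "llo"
        System.out.println(s.substring(5).isEmpty());  // beginIndex == length gives ""
        System.out.println(s.substring(0) == s);       // identity check, implementation detail
        System.out.println(throwsOutOfBounds(s, 6));   // past the end
        System.out.println(throwsOutOfBounds(s, -1));  // negative index
    }
}
```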
4.5 String replace(char oldChar, char newChar)

public String replace(char oldChar, char newChar) {
        if (oldChar != newChar) {
            int len = value.length;
            int i = -1;
            char[] val = value; /* avoid getfield opcode */

            while (++i < len) {
                if (val[i] == oldChar) {
                    break;
                }
            }
            if (i < len) {
                char buf[] = new char[len];
                for (int j = 0; j < i; j++) {
                    buf[j] = val[j];
                }
                while (i < len) {
                    char c = val[i];
                    buf[i] = (c == oldChar) ? newChar : c;
                    i++;
                }
                return new String(buf, true);
            }
        }
        return this;
    }

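This method has a neat optimization: it first scans for the first occurrence of oldChar, and only if one exists does it allocate a new array, copying the unchanged prefix as is. When oldChar does not occur, or equals newChar, it returns this without allocating anything. The overload replace(CharSequence target, CharSequence replacement) is, in JDK 8, implemented on top of the regular-expression classes with the target compiled as a literal pattern, so regex metacharacters in the target have no special meaning.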
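The difference between literal replacement and regex replacement is easy to miss, so a short sketch may help. ReplaceDemo and its method names are invented for the example; replaceAll is included only for contrast.

```java
class ReplaceDemo {
    static String replaceChar(String s, char oldChar, char newChar) {
        return s.replace(oldChar, newChar);
    }

    // replace(CharSequence, CharSequence) treats the target literally.
    static String replaceLiteral(String s, String target, String replacement) {
        return s.replace(target, replacement);
    }

    // replaceAll treats its first argument as a regular expression.
    static String replaceRegex(String s, String regex, String replacement) {
        return s.replaceAll(regex, replacement);
    }

    public static void main(String[] args) {
        System.out.println(replaceChar("hello", 'l', 'L'));    // heLLo
        System.out.println(replaceChar("hello", 'x', 'y'));    // hello (nothing to replace)
        System.out.println(replaceLiteral("a.b.c", ".", "-")); // a-b-c
        System.out.println(replaceRegex("a.b.c", ".", "-"));   // ----- ('.' matches any char)
    }
}
```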
4.6 String concat(String str)

public String concat(String str) {
        int otherLen = str.length();
        if (otherLen == 0) {
            return this;
        }
        int len = value.length;
        char buf[] = Arrays.copyOf(value, len + otherLen);
        str.getChars(buf, len);
        return new String(buf, true);
    }

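concat is another commonly used method. It first checks whether the argument is empty: if so it returns this, otherwise it copies both character sequences into a new array and creates a new String from it.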
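A small sketch of the behaviors that follow from the listing. ConcatDemo and rejectsNull are hypothetical names. The null case throws because the method calls str.length() before anything else.

```java
class ConcatDemo {
    // True when concat(null) fails with a NullPointerException.
    static boolean rejectsNull(String s) {
        try {
            s.concat(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = "abc";
        System.out.println(s.concat("def"));    // prints "abcdef"
        System.out.println(s.concat("") == s);  // empty argument: the receiver itself is returned
        System.out.println(rejectsNull(s));     // null argument is not treated as "null" text
    }
}
```

Unlike the + operator, which turns a null operand into the text "null", concat fails fast on null.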
4.7 trim()

public String trim() {
        int len = value.length;
        int st = 0;
        char[] val = value;    /* avoid getfield opcode */

        while ((st < len) && (val[st] <= ' ')) {
            st++;
        }
        while ((st < len) && (val[len - 1] <= ' ')) {
            len--;
        }
        return ((st > 0) || (len < value.length)) ? substring(st, len) : this;
    }
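
Reading the two loops, trim() strips every leading and trailing char whose code is at most ' ' (U+0020), which covers spaces, tabs, and line breaks but not Unicode spaces with higher codes. It returns this when nothing was stripped. The sketch below illustrates this; TrimDemo and trimmed are made-up names, and U+3000 (ideographic space) is used as an example of a character that trim() leaves in place.

```java
class TrimDemo {
    static String trimmed(String s) {
        return s.trim();
    }

    public static void main(String[] args) {
        System.out.println(trimmed("\t hello \n"));               // prints "hello"
        System.out.println(trimmed("   ").isEmpty());             // only whitespace: empty result
        System.out.println(trimmed("\u3000hi\u3000").length());   // 4: U+3000 is greater than ' '
        System.out.println(trimmed("a b"));                       // inner spaces are kept
    }
}
```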

4.8 intern()

public native String intern();

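intern is a native method. It looks in the string pool for a string that equals this one; if one exists, it returns the reference to the pooled string, otherwise it adds this string to the pool and returns a reference to it. The pool was located in the method area (PermGen) up to JDK 6; since JDK 7 HotSpot keeps it in the heap.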
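The effect of intern() shows up in reference comparisons. In this sketch, InternDemo and its method names are invented for illustration.

```java
class InternDemo {
    // new String(...) always creates a fresh object, distinct from the pooled literal.
    static boolean freshObjectIsPooledLiteral() {
        String literal = "abcd";
        String fresh = new String("abcd");
        return fresh == literal;
    }

    // intern() maps the fresh object back to the pooled instance.
    static boolean internedIsPooledLiteral() {
        String literal = "abcd";
        String fresh = new String("abcd");
        return fresh.intern() == literal;
    }

    public static void main(String[] args) {
        System.out.println(freshObjectIsPooledLiteral()); // false
        System.out.println(internedIsPooledLiteral());    // true
    }
}
```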
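5. Why String is designed as a final class
Requirement of the string constant pool
The string constant pool (String pool, String intern pool) is a special storage area, located in the method area in older JVMs and in the heap since JDK 7. When a string literal is evaluated or intern() is called and an equal string is already in the pool, no new object is created; the existing object is referenced instead. An explicit new String(...) still creates a new object every time.
The following code, for example, creates only one actual String object: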

String s1 = "abcd"; 
String s2 = "abcd";

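If strings were mutable, this sharing would cause all sorts of logic errors: changing the string through one reference would affect another, seemingly independent one. Strictly speaking, the constant pool is an optimization.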

String s1= "ab" + "cd"; 
String s2= "abc" + "d";

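The two variables above also refer to the same pooled object, which may defy a beginner's intuition. "ab" + "cd" and "abc" + "d" are compile-time constant expressions, so the compiler folds each into the literal "abcd" and both resolve to the same pool entry. You can confirm this by inspecting the compiled class file with a tool such as jd-gui.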
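The folding applies only to constant expressions, which a short sketch makes visible. ConstantFoldingDemo and its method names are hypothetical. The second method uses a non-final local variable so that the concatenation happens at run time.

```java
class ConstantFoldingDemo {
    // Both right-hand sides are constant expressions, folded to "abcd" by the compiler.
    static boolean constantsShareObject() {
        String s1 = "ab" + "cd";
        String s2 = "abc" + "d";
        return s1 == s2;
    }

    // part is not final, so part + "cd" is evaluated at run time and yields a new object.
    static boolean runtimeConcatSharesObject() {
        String part = "ab";
        String s3 = part + "cd";
        return s3 == "abcd";
    }

    public static void main(String[] args) {
        System.out.println(constantsShareObject());      // true
        System.out.println(runtimeConcatSharesObject()); // false
    }
}
```

This is also why string content should be compared with equals rather than ==.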
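Allowing String to cache its hash code
String hash codes are used heavily in Java, for example in HashMap and similar containers.
Immutability guarantees that a string's hash code never changes (it does not make hash codes unique), so the value can safely be cached. This is another performance optimization: the hash does not have to be recomputed on every call.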
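To see why a stable hash code matters for containers, consider a key whose hash can change after insertion. The sketch below is an illustration only: MutableKey is a hypothetical class written for this example, not anything from the JDK.

```java
import java.util.HashMap;
import java.util.Map;

class MutableKeyDemo {
    // Hypothetical key type whose hash code depends on a mutable field.
    static final class MutableKey {
        String name;
        MutableKey(String name) { this.name = name; }
        @Override public boolean equals(Object o) {
            return o instanceof MutableKey && ((MutableKey) o).name.equals(name);
        }
        @Override public int hashCode() { return name.hashCode(); }
    }

    // Inserts a key, mutates it, then looks it up again with the same object.
    static boolean foundAfterMutation() {
        Map<MutableKey, Integer> map = new HashMap<>();
        MutableKey key = new MutableKey("a");
        map.put(key, 1);
        key.name = "b"; // hash code changes while the entry stays where it was stored
        return map.containsKey(key);
    }

    // The same scenario with a String key cannot go wrong: the key cannot change.
    static boolean stringKeyFound() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        return map.containsKey("a");
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation()); // false: the entry is effectively lost
        System.out.println(stringKeyFound());     // true
    }
}
```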
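Security
String is used as a parameter type by many Java classes and libraries, for example network URLs, file paths, and the names passed to the reflection API. If strings could be changed after a check has been made, this would open up all kinds of security holes.
Thread safety
Because strings are immutable, they are inherently thread-safe: the same string instance can be shared by multiple threads without any synchronization.
In summary, String is immutable for three broad reasons: design considerations, performance optimization, and security.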