Strings in Swift (Part 4): Large Strings

For an ordinary (large) string, the corresponding _StringObject has two stored properties:

  1. _countAndFlagsBits: UInt64
  2. _object: Builtin.BridgeObject

_countAndFlagsBits

Stores the string's length together with a few flag bits.

┌─────────┬───────┬──────────────────┬─────────────────┬────────┬───────┐
│   b63   │  b62  │       b61        │       b60       │ b59:48 │ b47:0 │
├─────────┼───────┼──────────────────┼─────────────────┼────────┼───────┤
│ isASCII │ isNFC │ isNativelyStored │ isTailAllocated │  TBD   │ count │
└─────────┴───────┴──────────────────┴─────────────────┴────────┴───────┘

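The high 16 bits hold the flags and the low 48 bits hold the string's length. That length is the number of UTF-8 code units (bytes), not the number of characters a reader sees.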

  @inlinable @inline(__always)
  internal init(count: Int, flags: UInt16) {
    // Currently, we only use top 4 flags
    _internalInvariant(flags & 0xF000 == flags)

    let rawBits = UInt64(truncatingIfNeeded: flags) &<< 48
                | UInt64(truncatingIfNeeded: count)
    self.init(raw: rawBits)
    _internalInvariant(self.count == count && self.flags == flags)
  }
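To make the layout concrete, here is a minimal sketch of the same packing outside the standard library. CountAndFlagsSketch and its members are hypothetical names invented for this illustration, and the flag values passed in are arbitrary; only the bit positions come from the table above.

```swift
// Hypothetical stand-in for the stdlib's internal count-and-flags word.
// Bit positions follow the table above: b63..b60 are flags, b47..b0 is the count.
struct CountAndFlagsSketch {
    let raw: UInt64

    static let countMask: UInt64 = 0x0000_FFFF_FFFF_FFFF
    static let isASCIIMask: UInt64 = 1 << 63
    static let isNFCMask: UInt64 = 1 << 62
    static let isNativelyStoredMask: UInt64 = 1 << 61
    static let isTailAllocatedMask: UInt64 = 1 << 60

    init(count: Int, isASCII: Bool, isNFC: Bool,
         isNativelyStored: Bool, isTailAllocated: Bool) {
        // Keep only the low 48 bits of the count, then OR in each flag.
        var bits = UInt64(truncatingIfNeeded: count) & CountAndFlagsSketch.countMask
        if isASCII { bits |= CountAndFlagsSketch.isASCIIMask }
        if isNFC { bits |= CountAndFlagsSketch.isNFCMask }
        if isNativelyStored { bits |= CountAndFlagsSketch.isNativelyStoredMask }
        if isTailAllocated { bits |= CountAndFlagsSketch.isTailAllocatedMask }
        raw = bits
    }

    // Low 48 bits: number of UTF-8 code units.
    var count: Int { Int(raw & CountAndFlagsSketch.countMask) }
    // High 16 bits, shifted down, matching the `flags: UInt16` parameter above.
    var flags: UInt16 { UInt16(truncatingIfNeeded: raw >> 48) }
    var isASCII: Bool { raw & CountAndFlagsSketch.isASCIIMask != 0 }
}

// "café" with a precomposed é: 4 characters, but 5 UTF-8 code units.
let text = "caf\u{E9}"
let packed = CountAndFlagsSketch(count: text.utf8.count,
                                 isASCII: false,
                                 isNFC: true,
                                 isNativelyStored: true,
                                 isTailAllocated: true)
print(text.count, packed.count, String(packed.flags, radix: 16))
```

The stored count is taken from text.utf8.count rather than text.count, which is the distinction the paragraph above draws between code units and visible characters.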

_object

Locates the actual string storage. The top four bits are the discriminator, which encodes some properties of the string.

On 64-bit platforms, the discriminator is the most significant 4 bits of the bridge object.
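As a rough sketch of what "the most significant 4 bits" means in practice, the helpers below split a 64-bit word into a discriminator nibble and the remaining 60 bits. The function names and the sample word are hypothetical, and treating the low 60 bits as the address is an assumption made for this illustration, not a statement about the stdlib's exact masks.

```swift
// Hypothetical helper: the top 4 bits of a 64-bit object word.
func discriminatorNibble(of objectBits: UInt64) -> UInt8 {
    return UInt8(truncatingIfNeeded: objectBits >> 60)
}

// Hypothetical helper: everything below the discriminator nibble.
// Assumption for illustration: these 60 bits carry the address.
func addressBits(of objectBits: UInt64) -> UInt64 {
    return objectBits & 0x0FFF_FFFF_FFFF_FFFF
}

// A made-up word with nibble 0x8 and an arbitrary address below it.
let sampleWord: UInt64 = 0x8000_0001_0000_4000
let nibble = discriminatorNibble(of: sampleWord)
let address = addressBits(of: sampleWord)
print(String(nibble, radix: 16), String(address, radix: 16))
```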

String categories

Large strings can either be "native", "shared", or "foreign".

Native strings have tail-allocated storage, which begins at an offset of nativeBias from the storage object's address. String literals, which reside in the constant section, are encoded as their start address minus nativeBias, unifying code paths for both literals ("immortal native") and native strings. Native Strings are always managed by the Swift runtime.
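The bias trick can be sketched with plain integer arithmetic. The value 32 for nativeBias is an assumption used here for a 64-bit platform, and both function names and both addresses are made up for the example; the point is only that one "add the bias" path serves heap storage and literals alike.

```swift
// Assumed value for illustration; the stdlib defines its own nativeBias.
let nativeBias: UInt64 = 32

// Hypothetical: a literal is recorded as its start address minus the bias.
func encodeLiteral(startAddress: UInt64) -> UInt64 {
    return startAddress &- nativeBias
}

// Hypothetical: the single shared path that finds the first code unit,
// whether the stored address came from a heap object or a literal.
func codeUnitsStart(storedAddress: UInt64) -> UInt64 {
    return storedAddress &+ nativeBias
}

// Made-up addresses: one heap storage object, one literal in constant data.
let heapObjectAddress: UInt64 = 0x0000_6000_0000_1000
let literalStartAddress: UInt64 = 0x0000_0001_0000_4F20

// Heap storage: code units begin nativeBias bytes past the object header.
let heapStart = codeUnitsStart(storedAddress: heapObjectAddress)
// Literal: subtracting and re-adding the bias lands on the literal itself.
let literalStart = codeUnitsStart(
    storedAddress: encodeLiteral(startAddress: literalStartAddress))
print(String(heapStart, radix: 16), String(literalStart, radix: 16))
```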

Shared strings do not have tail-allocated storage, but can provide access upon query to contiguous UTF-8 code units. Lazily-bridged NSStrings capable of providing access to contiguous ASCII/UTF-8 set the ObjC bit. Accessing shared string's pointer should always be behind a resilience barrier, permitting future evolution.

Foreign strings cannot provide access to contiguous UTF-8. Currently, this only encompasses lazily-bridged NSStrings that cannot be treated as "shared". Such strings may provide access to contiguous UTF-16, or may be discontiguous in storage. Accessing foreign strings should remain behind a resilience barrier for future evolution. Other foreign forms are reserved for the future.

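┌─────────────────────────────┬────────┬────────┬─────────┐
│                             │ native │ shared │ foreign │
├─────────────────────────────┼────────┼────────┼─────────┤
│ tail-allocated              │  yes   │   no   │   no    │
│ contiguous UTF-8 code units │  yes   │  yes   │   no    │
└─────────────────────────────┴────────┴────────┴─────────┘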
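The three categories can also be modelled as a small enum, which may make the two distinguishing properties easier to keep apart. LargeStringKind and its property names are hypothetical; the values simply restate the descriptions quoted above.

```swift
// Hypothetical model of the three large-string categories described above.
enum LargeStringKind: CaseIterable {
    case native, shared, foreign

    // Only native strings keep their code units tail-allocated
    // in the storage object itself.
    var isTailAllocated: Bool { return self == .native }

    // Native and shared strings can hand out contiguous UTF-8;
    // foreign strings cannot.
    var providesContiguousUTF8: Bool { return self != .foreign }
}

for kind in LargeStringKind.allCases {
    print(kind, kind.isTailAllocated, kind.providesContiguousUTF8)
}
```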

Bridging to NSString

  // Whether the object stored can be bridged directly as a NSString
  @usableFromInline // @opaque
  internal var hasObjCBridgeableObject: Bool {
    @_effects(releasenone) get {
      // Currently, all mortal objects can zero-cost bridge
      return !self.isImmortal
    }
  }

  // Fetch the stored subclass of NSString for bridging
  @inline(__always)
  internal var objCBridgeableObject: AnyObject {
    _internalInvariant(hasObjCBridgeableObject)
    return Builtin.reinterpretCast(largeAddressBits)
  }