Fixing garbled strings in Java: how to convert a String between encodings

Reposted

柳随风 2023-06-26 15:20:30


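First, the safest conversion. It names the charset explicitly for both encoding and decoding, so the result depends neither on the platform default encoding nor on the encoding the text originally arrived in. Keep in mind that a Java String is held as UTF-16 internally, so this round trip returns a string equal to its input; what is actually in UTF-8 is the byte array produced by getBytes: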

/** Make sure the received string is handled as UTF-8:
 *    encode it as UTF-8, then decode it as UTF-8
 */
val strUTF8 = new String(strGBK.getBytes("UTF-8"), "UTF-8")
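To make the round trip above concrete, here is a minimal Java sketch. The class name RoundTripDemo and the helper roundTrip are hypothetical names chosen for illustration, and the sketch assumes the JDK in use ships the GBK charset. It shows that encoding and decoding with the same charset returns an equal string, while the byte arrays differ per charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class RoundTripDemo {
    // Encode with the given charset, then decode the bytes with the same charset.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // The article's two-character test string, written as Unicode escapes
        // (U+4E2D U+6587) so the source file encoding does not matter.
        String s = "\u4e2d\u6587";
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available in this JDK

        // Both round trips give back a string equal to the input.
        System.out.println(roundTrip(s, StandardCharsets.UTF_8).equals(s));
        System.out.println(roundTrip(s, gbk).equals(s));

        // Only the byte representation depends on the charset.
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.UTF_8)));
        System.out.println(Arrays.toString(s.getBytes(gbk)));
    }
}
```

Passing a Charset object instead of a charset name avoids the checked UnsupportedEncodingException; that is a design choice of this sketch, not something the original snippet does.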

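1. Encoding and decoding a string uses the following four methods of java.lang.String (a few other overloads behave much the same and are not covered here):
   | getBytes(charsetName): encodes the string into a byte array using the named charset;
   | getBytes(): encodes the string into a byte array using the platform default charset;
   | String(bytes, offset, length, charsetName): decodes length bytes of the array, starting at index offset, into a string using the named charset;
   | String(bytes, charsetName): decodes the whole byte array into a string using the named charset;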
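The four methods listed above can be exercised as follows. This is only a sketch: FourMethodsDemo and its encode/decode helpers are hypothetical wrappers added here so that the checked UnsupportedEncodingException thrown by the charsetName overloads is handled in one place.

```java
import java.io.UnsupportedEncodingException;
import java.util.Arrays;

final class FourMethodsDemo {
    // Wraps String.getBytes(charsetName).
    static byte[] encode(String s, String charsetName) {
        try {
            return s.getBytes(charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charsetName, e);
        }
    }

    // Wraps new String(bytes, offset, length, charsetName).
    static String decode(byte[] bytes, int offset, int length, String charsetName) {
        try {
            return new String(bytes, offset, length, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charsetName, e);
        }
    }

    // Wraps new String(bytes, charsetName): decodes the whole array.
    static String decode(byte[] bytes, String charsetName) {
        return decode(bytes, 0, bytes.length, charsetName);
    }

    public static void main(String[] args) {
        String s = "\u4e2d\u6587"; // U+4E2D U+6587, the article's test string

        byte[] utf8 = encode(s, "UTF-8"); // explicit charset
        byte[] platform = s.getBytes();   // platform default charset, varies by machine

        System.out.println(Arrays.toString(utf8));
        System.out.println(platform.length);

        // Whole array back to the original string.
        System.out.println(decode(utf8, "UTF-8").equals(s));
        // Bytes 3..5 hold the second character (each of the two takes 3 bytes in UTF-8).
        System.out.println(decode(utf8, 3, 3, "UTF-8").equals("\u6587"));
    }
}
```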

/**
     * @param  charsetName
     *         The name of a supported {@linkplain java.nio.charset.Charset
     *         charset}
     *
     * @return  The resultant byte array
     */
    public byte[] getBytes(String charsetName);

    /**
     * @param  bytes
     *         The bytes to be decoded into characters
     *
     * @param  offset
     *         The index of the first byte to decode
     *
     * @param  length
     *         The number of bytes to decode
     *
     * @param  charsetName
     *         The name of a supported {@linkplain java.nio.charset.Charset
     *         charset}
     */
    public String(byte bytes[], int offset, int length, String charsetName);

/**
     * @return  The resultant byte array
     */
    public byte[] getBytes();

    /**
     * @param  bytes
     *         The bytes to be decoded into characters
     *
     * @param  charsetName
     *         The name of a supported {@linkplain java.nio.charset.Charset
     *         charset}
     */
    public String(byte bytes[], String charsetName);

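// 1. Decode the whole byte array (no explicit length)
val strUTF8 = new String(strGBK.getBytes("UTF-8"), "UTF-8")
// 2. Or pass the number of UTF-8 bytes to decode explicitly.
//    strGBK.length()*3 is only correct when every character takes 3 bytes in UTF-8.
//    If the given length is smaller than the encoded array, the last few Chinese characters come out garbled;
//    if it is larger, the constructor throws IndexOutOfBoundsException.
val strUTF8 = new String(strGBK.getBytes("UTF-8"), 0, strGBK.length()*3, "UTF-8")
// 3. (Recommended) encode and decode with UTF-8
val strUTF8 = new String(strGBK.getBytes("UTF-8"), "UTF-8")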
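The risk with option 2 can be seen in a short Java sketch. LengthPitfallDemo and decodePrefix are hypothetical names used only for illustration. The sketch decodes the first byteCount bytes of a string's UTF-8 encoding and shows what happens when that count is too small or too large.

```java
import java.nio.charset.StandardCharsets;

final class LengthPitfallDemo {
    // Encode s as UTF-8, then decode only the first byteCount bytes.
    static String decodePrefix(String s, int byteCount) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, 0, byteCount, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String chinese = "\u4e2d\u6587"; // 2 chars, 6 UTF-8 bytes
        String mixed = "a\u4e2d";        // 2 chars, but only 4 UTF-8 bytes

        // length()*3 happens to match here: every character takes 3 bytes.
        System.out.println(decodePrefix(chinese, chinese.length() * 3).equals(chinese));

        // Too short: the last character is cut mid-sequence and is replaced
        // by U+FFFD, the replacement character.
        System.out.println(decodePrefix(chinese, 5).contains("\uFFFD"));

        // Too long: length()*3 is 6, but the array has only 4 bytes.
        try {
            decodePrefix(mixed, mixed.length() * 3);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("length exceeds the encoded array");
        }
    }
}
```

Using the actual array length (or the constructor without offset and length, as in options 1 and 3) sidesteps both failure modes.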

3. Complete test code (Scala/Java):

object a extends App {
  
  testUTF8ToGBK
  testGBKToUTF8
  
  def testUTF8ToGBK = {
    println("-------------------[Test UTF8 To GBK]-------------------------")
    val strBytes = new String("中文").getBytes("UTF-8")
    println("strBytes: " + strBytes.mkString(" "))
    
    val strUTF8 = new String(strBytes, "UTF-8")
    println("strUTF8 Bytes: " + strUTF8.getBytes("UTF-8").mkString(" "))
    
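    // Decode the whole byte array (no explicit length)
    val strGBK = new String(strUTF8.getBytes("GBK"), "GBK")
//    // Or pass the number of GBK bytes to decode explicitly (2 bytes per Chinese character)
//    val strGBK = new String(strUTF8.getBytes("GBK"), 0, strUTF8.length()*2, "GBK")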
    println("strGBK Bytes: " + strGBK.getBytes("GBK").mkString(" "))
    
    println("strUTF8: " + strUTF8)
    println("strGBK: " + strGBK)
  }
  
  def testGBKToUTF8 = {
    println("-------------------[Test GBK To UTF8]-------------------------")
    val strBytes = new String("中文").getBytes("GBK")
    println("strBytes: " + strBytes.mkString(" "))
    
    val strGBK = new String(strBytes, "GBK")
    println("strGBK Bytes: " + strGBK.getBytes("GBK").mkString(" "))
    
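//    // 1. Decode the whole byte array (no explicit length)
//    val strUTF8 = new String(strGBK.getBytes("UTF-8"), "UTF-8")
//    // 2. Or pass the number of UTF-8 bytes to decode explicitly
//    //    If the given length is smaller than the UTF-8 encoded array, the output is garbled
//    val strUTF8 = new String(strGBK.getBytes("UTF-8"), 0, strGBK.length()*3, "UTF-8")
    // 3. (Recommended) encode and decode with UTF-8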
    val strUTF8 = new String(strGBK.getBytes("UTF-8"), "UTF-8")
    println("strUTF8 Bytes: " + strUTF8.getBytes("UTF-8").mkString(" "))
    
    println("strGBK: " + strGBK)
    println("strUTF8: " + strUTF8)
  }
  
}

-------------------[Test UTF8 To GBK]-------------------------
strBytes: -28 -72 -83 -26 -106 -121
strUTF8 Bytes: -28 -72 -83 -26 -106 -121
strGBK Bytes: -42 -48 -50 -60
strUTF8: 中文
strGBK: 中文
-------------------[Test GBK To UTF8]-------------------------
strBytes: -42 -48 -50 -60
strGBK Bytes: -42 -48 -50 -60
strUTF8 Bytes: -28 -72 -83 -26 -106 -121
strGBK: 中文
strUTF8: 中文
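In the output above, every decode uses the same charset that produced the bytes, which is why both strings print correctly. As a complement, the following Java sketch shows what a mismatch looks like. MismatchDemo is a hypothetical name, and the sketch assumes the JDK provides the GBK charset. It decodes GBK bytes as UTF-8, shows that the result is garbled, and shows that a later UTF-8 round trip on the garbled string does not repair it; only decoding the original bytes with the right charset does.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MismatchDemo {
    static final Charset GBK = Charset.forName("GBK"); // assumption: GBK is available

    // Decode GBK-encoded bytes with the wrong charset (UTF-8).
    static String decodeWrong(byte[] gbkBytes) {
        return new String(gbkBytes, StandardCharsets.UTF_8);
    }

    // Decode GBK-encoded bytes with the matching charset.
    static String decodeRight(byte[] gbkBytes) {
        return new String(gbkBytes, GBK);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587";        // U+4E2D U+6587
        byte[] gbkBytes = original.getBytes(GBK); // -42 -48 -50 -60, as in the output above

        String garbled = decodeWrong(gbkBytes);
        // The invalid UTF-8 sequences were replaced by U+FFFD.
        System.out.println(garbled.contains("\uFFFD"));

        // Re-encoding and decoding the garbled string as UTF-8 changes nothing:
        // the original bytes are already lost at this point.
        String again = new String(garbled.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
        System.out.println(again.equals(garbled));

        // Decoding the original bytes with the charset that produced them works.
        System.out.println(decodeRight(gbkBytes).equals(original));
    }
}
```

The practical takeaway of this sketch is that the charset has to be chosen correctly at the point where bytes first become a String.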
