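Android itself contains a class for converting Chinese characters (Hanzi) into pinyin: HanziToPinyin.java. The stock Android Contacts implementation uses HanziToPinyin.java to sort Chinese contacts into groups. Converting Hanzi to pinyin is a basic requirement in many apps. Take contact grouping as an example. If an address book holds several contacts with the surname 张 (ZHANG), all of them should end up in the "Z" group. The same applies to social apps such as WeChat and QQ. Any feature that groups or sorts contacts or friends must convert Chinese names to pinyin and then sort and group them by initial letter.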
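To make the grouping idea concrete, here is a minimal Java sketch that buckets names under an A to Z section letter. It assumes the pinyin for each name has already been computed, for instance with the getPinYin() helper in the test activity at the end of this article. The class name ContactGrouping and the "#" bucket for names that do not start with a letter are illustrative choices. Neither is part of HanziToPinyin.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups contact names by the first letter of their pinyin.
final class ContactGrouping {

    // Returns "A".."Z" when the pinyin starts with a Latin letter, otherwise "#".
    static String sectionLetter(String pinyin) {
        if (pinyin == null || pinyin.isEmpty()) {
            return "#";
        }
        char first = Character.toUpperCase(pinyin.charAt(0));
        return (first >= 'A' && first <= 'Z') ? String.valueOf(first) : "#";
    }

    // nameToPinyin maps each display name to its already-converted pinyin.
    // The TreeMap keeps section letters sorted ("#" sorts before "A").
    static Map<String, List<String>> group(Map<String, String> nameToPinyin) {
        Map<String, List<String>> sections = new TreeMap<>();
        for (Map.Entry<String, String> e : nameToPinyin.entrySet()) {
            String letter = sectionLetter(e.getValue());
            sections.computeIfAbsent(letter, k -> new ArrayList<>()).add(e.getKey());
        }
        return sections;
    }
}
```

Suppose the input contains 张三 → ZHANGSAN and 张伟 → ZHANGWEI. Both names land in the "Z" list. Within a section, names keep the iteration order of the input map.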
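HanziToPinyin.java is not a public class. Google uses it privately inside the Android Contacts implementation, so it cannot be called like an ordinary Android SDK API. That is not a problem, though. We can simply copy the class file into our own project and use it directly.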
The HanziToPinyin.java source file is part of Google's official Contacts provider, at:
packages/providers/ContactsProvider/src/com/android/providers/contacts/HanziToPinyin.java
Copies of HanziToPinyin.java can also be found online. Used as-is, however, the class may not work and logs the following message:
"There is no Chinese collator, HanziToPinyin is disabled"
The message comes from this method in HanziToPinyin.java:
public static HanziToPinyin getInstance();
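The cause is that some customized (non-stock) Android builds define the Chinese Locale differently. On those builds the original check locale[i].equals(Locale.CHINA) returns false. HanziToPinyin then concludes that no Chinese collator is available and creates a disabled instance. From then on, every call to get() returns an empty token list.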
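A small, runnable illustration of why an exact equals() check is fragile follows. Locale.equals() compares language, script, country, variant and extensions, so any locale that differs from zh_CN in one of those fields fails the test. The sample locales below are only examples. Which locales Collator.getAvailableLocales() actually reports on a given device depends on the vendor and Android version, and this sketch does not claim to reproduce any specific device. The class name LocaleCheckDemo is hypothetical.

```java
import java.util.Locale;

// Illustrates why an exact equals(Locale.CHINA) check can miss Chinese locales.
final class LocaleCheckDemo {

    // The original HanziToPinyin test: only an exact zh_CN match counts.
    static boolean matchesChinaExactly(Locale locale) {
        return locale.equals(Locale.CHINA);
    }

    public static void main(String[] args) {
        Locale[] samples = {
            Locale.CHINA,                        // zh_CN: matches
            new Locale("zh"),                    // language only, no country
            Locale.forLanguageTag("zh-Hans-CN"), // same country, plus a script subtag
        };
        for (Locale l : samples) {
            System.out.println(l + " -> " + matchesChinaExactly(l));
        }
    }
}
```

Only the first sample passes. The other two are clearly Chinese locales, yet equals() rejects them because the country or script fields differ.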
Fixing the problem
I made the check more lenient by adding one line:
final Locale chinaAddition = new Locale("zh");
and accepting chinaAddition as an additional match in the condition:
if (locale[i].equals(Locale.CHINA) || locale[i].equals(chinaAddition)) {
    …
}
Here is the complete improved getInstance() method:
public static HanziToPinyin getInstance() {
    synchronized (HanziToPinyin.class) {
        if (sInstance != null) {
            return sInstance;
        }
        // Check if zh_CN collation data is available
        final Locale locale[] = Collator.getAvailableLocales();

        // Added for robustness: also accept the bare "zh" locale (no country).
        final Locale chinaAddition = new Locale("zh");

        for (int i = 0; i < locale.length; i++) {
            if (locale[i].equals(Locale.CHINA)
                    || locale[i].equals(chinaAddition)) {
                // Do self validation just once.
                if (DEBUG) {
                    Log.d(TAG, "Self validation. Result: "
                            + doSelfValidation());
                }
                sInstance = new HanziToPinyin(true);
                return sInstance;
            }
        }
        Log.w(TAG,
                "There is no Chinese collator, HanziToPinyin is disabled");
        sInstance = new HanziToPinyin(false);
        return sInstance;
    }
}
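The added condition still recognizes only two exact spellings, zh_CN and zh. A device that reports a locale with a script subtag, such as zh-Hans-CN, would still fail both comparisons. One possible variation is to accept any locale whose language code is Chinese. It is sketched below as an assumption, not as the author's fix. Note the trade-off. This also accepts zh_TW or zh_HK, whose collators may order characters by stroke rather than by pinyin. Meanwhile, HanziToPinyin always performs its lookups with Collator.getInstance(Locale.CHINA) regardless of which locale passed the check. The helper name hasChineseCollator is hypothetical.

```java
import java.util.Locale;

final class ChineseCollatorCheck {

    // Hypothetical variant of the check in getInstance(): instead of comparing
    // whole Locale objects, accept any locale whose language code is "zh".
    static boolean hasChineseCollator(Locale[] available) {
        for (Locale locale : available) {
            // getLanguage() returns the lowercase ISO 639 code, e.g. "zh".
            if ("zh".equals(locale.getLanguage())) {
                return true;
            }
        }
        return false;
    }
}
```

Inside getInstance(), the for loop would then become a single call, hasChineseCollator(Collator.getAvailableLocales()), followed by the same two branches.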
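With this improvement in place, the complete source of HanziToPinyin.java is as follows. You can copy it into your own project and use it directly, after adjusting the package declaration to match your project: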
/*
 * Copyright (C) 2011 The Android Open Source Project
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package zhangphil.hanyupinyin;

import android.text.TextUtils;
import android.util.Log;

import java.text.Collator;
import java.util.ArrayList;
import java.util.Locale;

/**
 * An object to convert Chinese character to its corresponding pinyin string.
 * For characters with multiple possible pinyin string, only one is selected
 * according to collator. Polyphone is not supported in this implementation.
 * This class is implemented to achieve the best runtime performance and minimum
 * runtime resources with tolerable sacrifice of accuracy. This implementation
 * highly depends on zh_CN ICU collation data and must be always synchronized
 * with ICU.
 *
 * Currently this file is aligned to zh.txt in ICU 4.6 (taken from the Android 4.2 source).
 */
public class HanziToPinyin {
    private static final String TAG = "HanziToPinyin";

    // Turn on this flag when we want to check internal data structure.
    private static final boolean DEBUG = false;

    /**
     * Unihans array.
     *
     * Each unihans is the first one within same pinyin when collator is zh_CN.
     */
    public static final char[] UNIHANS = { '\u963f', '\u54ce', '\u5b89',
            '\u80ae', '\u51f9', '\u516b', '\u6300', '\u6273', '\u90a6',
            '\u52f9', '\u9642', '\u5954', '\u4f3b', '\u5c44', '\u8fb9',
            '\u706c', '\u618b', '\u6c43', '\u51ab', '\u7676', '\u5cec',
            '\u5693', '\u5072', '\u53c2', '\u4ed3', '\u64a1', '\u518a',
            '\u5d7e', '\u66fd', '\u66fe', '\u5c64', '\u53c9', '\u8286',
            '\u8fbf', '\u4f25', '\u6284', '\u8f66', '\u62bb', '\u6c88',
            '\u6c89', '\u9637', '\u5403', '\u5145', '\u62bd', '\u51fa',
            '\u6b3b', '\u63e3', '\u5ddb', '\u5205', '\u5439', '\u65fe',
            '\u9034', '\u5472', '\u5306', '\u51d1', '\u7c97', '\u6c46',
            '\u5d14', '\u90a8', '\u6413', '\u5491', '\u5446', '\u4e39',
            '\u5f53', '\u5200', '\u561a', '\u6265', '\u706f', '\u6c10',
            '\u55f2', '\u7538', '\u5201', '\u7239', '\u4e01', '\u4e1f',
            '\u4e1c', '\u543a', '\u53be', '\u8011', '\u8968', '\u5428',
            '\u591a', '\u59b8', '\u8bf6', '\u5940', '\u97a5', '\u513f',
            '\u53d1', '\u5e06', '\u531a', '\u98de', '\u5206', '\u4e30',
            '\u8985', '\u4ecf', '\u7d11', '\u4f15', '\u65ee', '\u4f85',
            '\u7518', '\u5188', '\u768b', '\u6208', '\u7ed9', '\u6839',
            '\u522f', '\u5de5', '\u52fe', '\u4f30', '\u74dc', '\u4e56',
            '\u5173', '\u5149', '\u5f52', '\u4e28', '\u5459', '\u54c8',
            '\u548d', '\u4f44', '\u592f', '\u8320', '\u8bc3', '\u9ed2',
            '\u62eb', '\u4ea8', '\u5677', '\u53ff', '\u9f41', '\u4e6f',
            '\u82b1', '\u6000', '\u72bf', '\u5ddf', '\u7070', '\u660f',
            '\u5419', '\u4e0c', '\u52a0', '\u620b', '\u6c5f', '\u827d',
            '\u9636', '\u5dfe', '\u5755', '\u5182', '\u4e29', '\u51e5',
            '\u59e2', '\u5658', '\u519b', '\u5494', '\u5f00', '\u520a',
            '\u5ffc', '\u5c3b', '\u533c', '\u808e', '\u52a5', '\u7a7a',
            '\u62a0', '\u625d', '\u5938', '\u84af', '\u5bbd', '\u5321',
            '\u4e8f', '\u5764', '\u6269', '\u5783', '\u6765', '\u5170',
            '\u5577', '\u635e', '\u808b', '\u52d2', '\u5d1a', '\u5215',
            '\u4fe9', '\u5941', '\u826f', '\u64a9', '\u5217', '\u62ce',
            '\u5222', '\u6e9c', '\u56d6', '\u9f99', '\u779c', '\u565c',
            '\u5a08', '\u7567', '\u62a1', '\u7f57', '\u5463', '\u5988',
            '\u57cb', '\u5ada', '\u7264', '\u732b', '\u4e48', '\u5445',
            '\u95e8', '\u753f', '\u54aa', '\u5b80', '\u55b5', '\u4e5c',
            '\u6c11', '\u540d', '\u8c2c', '\u6478', '\u54de', '\u6bea',
            '\u55ef', '\u62cf', '\u8149', '\u56e1', '\u56d4', '\u5b6c',
            '\u7592', '\u5a1e', '\u6041', '\u80fd', '\u59ae', '\u62c8',
            '\u5b22', '\u9e1f', '\u634f', '\u56dc', '\u5b81', '\u599e',
            '\u519c', '\u7fba', '\u5974', '\u597b', '\u759f', '\u9ec1',
            '\u90cd', '\u5594', '\u8bb4', '\u5991', '\u62cd', '\u7705',
            '\u4e53', '\u629b', '\u5478', '\u55b7', '\u5309', '\u4e15',
            '\u56e8', '\u527d', '\u6c15', '\u59d8', '\u4e52', '\u948b',
            '\u5256', '\u4ec6', '\u4e03', '\u6390', '\u5343', '\u545b',
            '\u6084', '\u767f', '\u4eb2', '\u72c5', '\u828e', '\u4e18',
            '\u533a', '\u5cd1', '\u7f3a', '\u590b', '\u5465', '\u7a63',
            '\u5a06', '\u60f9', '\u4eba', '\u6254', '\u65e5', '\u8338',
            '\u53b9', '\u909a', '\u633c', '\u5827', '\u5a51', '\u77a4',
            '\u637c', '\u4ee8', '\u6be2', '\u4e09', '\u6852', '\u63bb',
            '\u95aa', '\u68ee', '\u50e7', '\u6740', '\u7b5b', '\u5c71',
            '\u4f24', '\u5f30', '\u5962', '\u7533', '\u8398', '\u6552',
            '\u5347', '\u5c38', '\u53ce', '\u4e66', '\u5237', '\u8870',
            '\u95e9', '\u53cc', '\u8c01', '\u542e', '\u8bf4', '\u53b6',
            '\u5fea', '\u635c', '\u82cf', '\u72fb', '\u590a', '\u5b59',
            '\u5506', '\u4ed6', '\u56fc', '\u574d', '\u6c64', '\u5932',
            '\u5fd1', '\u71a5', '\u5254', '\u5929', '\u65eb', '\u5e16',
            '\u5385', '\u56f2', '\u5077', '\u51f8', '\u6e4d', '\u63a8',
            '\u541e', '\u4e47', '\u7a75', '\u6b6a', '\u5f2f', '\u5c23',
            '\u5371', '\u6637', '\u7fc1', '\u631d', '\u4e4c', '\u5915',
            '\u8672', '\u4eda', '\u4e61', '\u7071', '\u4e9b', '\u5fc3',
            '\u661f', '\u51f6', '\u4f11', '\u5401', '\u5405', '\u524a',
            '\u5743', '\u4e2b', '\u6079', '\u592e', '\u5e7a', '\u503b',
            '\u4e00', '\u56d9', '\u5e94', '\u54df', '\u4f63', '\u4f18',
            '\u625c', '\u56e6', '\u66f0', '\u6655', '\u7b60', '\u7b7c',
            '\u5e00', '\u707d', '\u5142', '\u5328', '\u50ae', '\u5219',
            '\u8d3c', '\u600e', '\u5897', '\u624e', '\u635a', '\u6cbe',
            '\u5f20', '\u957f', '\u9577', '\u4f4b', '\u8707', '\u8d1e',
            '\u4e89', '\u4e4b', '\u5cd9', '\u5ea2', '\u4e2d', '\u5dde',
            '\u6731', '\u6293', '\u62fd', '\u4e13', '\u5986', '\u96b9',
            '\u5b92', '\u5353', '\u4e72', '\u5b97', '\u90b9', '\u79df',
            '\u94bb', '\u539c', '\u5c0a', '\u6628', '\u5159', '\u9fc3',
            '\u9fc4', };

    /**
     * Pinyin array.
     *
     * Each pinyin is corresponding to unihans of same offset in the unihans
     * array.
     */
    public static final byte[][] PINYINS = { { 65, 0, 0, 0, 0, 0 },
            { 65, 73, 0, 0, 0, 0 }, { 65, 78, 0, 0, 0, 0 },
            { 65, 78, 71, 0, 0, 0 }, { 65, 79, 0, 0, 0, 0 },
            { 66, 65, 0, 0, 0, 0 }, { 66, 65, 73, 0, 0, 0 },
            { 66, 65, 78, 0, 0, 0 }, { 66, 65, 78, 71, 0, 0 },
            { 66, 65, 79, 0, 0, 0 }, { 66, 69, 73, 0, 0, 0 },
            { 66, 69, 78, 0, 0, 0 }, { 66, 69, 78, 71, 0, 0 },
            { 66, 73, 0, 0, 0, 0 }, { 66, 73, 65, 78, 0, 0 },
            { 66, 73, 65, 79, 0, 0 }, { 66, 73, 69, 0, 0, 0 },
            { 66, 73, 78, 0, 0, 0 }, { 66, 73, 78, 71, 0, 0 },
            { 66, 79, 0, 0, 0, 0 }, { 66, 85, 0, 0, 0, 0 },
            { 67, 65, 0, 0, 0, 0 }, { 67, 65, 73, 0, 0, 0 },
            { 67, 65, 78, 0, 0, 0 }, { 67, 65, 78, 71, 0, 0 },
            { 67, 65, 79, 0, 0, 0 }, { 67, 69, 0, 0, 0, 0 },
            { 67, 69, 78, 0, 0, 0 }, { 67, 69, 78, 71, 0, 0 },
            { 90, 69, 78, 71, 0, 0 }, { 67, 69, 78, 71, 0, 0 },
            { 67, 72, 65, 0, 0, 0 }, { 67, 72, 65, 73, 0, 0 },
            { 67, 72, 65, 78, 0, 0 }, { 67, 72, 65, 78, 71, 0 },
            { 67, 72, 65, 79, 0, 0 }, { 67, 72, 69, 0, 0, 0 },
            { 67, 72, 69, 78, 0, 0 }, { 83, 72, 69, 78, 0, 0 },
            { 67, 72, 69, 78, 0, 0 }, { 67, 72, 69, 78, 71, 0 },
            { 67, 72, 73, 0, 0, 0 }, { 67, 72, 79, 78, 71, 0 },
            { 67, 72, 79, 85, 0, 0 }, { 67, 72, 85, 0, 0, 0 },
            { 67, 72, 85, 65, 0, 0 }, { 67, 72, 85, 65, 73, 0 },
            { 67, 72, 85, 65, 78, 0 }, { 67, 72, 85, 65, 78, 71 },
            { 67, 72, 85, 73, 0, 0 }, { 67, 72, 85, 78, 0, 0 },
            { 67, 72, 85, 79, 0, 0 }, { 67, 73, 0, 0, 0, 0 },
            { 67, 79, 78, 71, 0, 0 }, { 67, 79, 85, 0, 0, 0 },
            { 67, 85, 0, 0, 0, 0 }, { 67, 85, 65, 78, 0, 0 },
            { 67, 85, 73, 0, 0, 0 }, { 67, 85, 78, 0, 0, 0 },
            { 67, 85, 79, 0, 0, 0 }, { 68, 65, 0, 0, 0, 0 },
            { 68, 65, 73, 0, 0, 0 }, { 68, 65, 78, 0, 0, 0 },
            { 68, 65, 78, 71, 0, 0 }, { 68, 65, 79, 0, 0, 0 },
            { 68, 69, 0, 0, 0, 0 }, { 68, 69, 78, 0, 0, 0 },
            { 68, 69, 78, 71, 0, 0 }, { 68, 73, 0, 0, 0, 0 },
            { 68, 73, 65, 0, 0, 0 }, { 68, 73, 65, 78, 0, 0 },
            { 68, 73, 65, 79, 0, 0 }, { 68, 73, 69, 0, 0, 0 },
            { 68, 73, 78, 71, 0, 0 }, { 68, 73, 85, 0, 0, 0 },
            { 68, 79, 78, 71, 0, 0 }, { 68, 79, 85, 0, 0, 0 },
            { 68, 85, 0, 0, 0, 0 }, { 68, 85, 65, 78, 0, 0 },
            { 68, 85, 73, 0, 0, 0 }, { 68, 85, 78, 0, 0, 0 },
            { 68, 85, 79, 0, 0, 0 }, { 69, 0, 0, 0, 0, 0 },
            { 69, 73, 0, 0, 0, 0 }, { 69, 78, 0, 0, 0, 0 },
            { 69, 78, 71, 0, 0, 0 }, { 69, 82, 0, 0, 0, 0 },
            { 70, 65, 0, 0, 0, 0 }, { 70, 65, 78, 0, 0, 0 },
            { 70, 65, 78, 71, 0, 0 }, { 70, 69, 73, 0, 0, 0 },
            { 70, 69, 78, 0, 0, 0 }, { 70, 69, 78, 71, 0, 0 },
            { 70, 73, 65, 79, 0, 0 }, { 70, 79, 0, 0, 0, 0 },
            { 70, 79, 85, 0, 0, 0 }, { 70, 85, 0, 0, 0, 0 },
            { 71, 65, 0, 0, 0, 0 }, { 71, 65, 73, 0, 0, 0 },
            { 71, 65, 78, 0, 0, 0 }, { 71, 65, 78, 71, 0, 0 },
            { 71, 65, 79, 0, 0, 0 }, { 71, 69, 0, 0, 0, 0 },
            { 71, 69, 73, 0, 0, 0 }, { 71, 69, 78, 0, 0, 0 },
            { 71, 69, 78, 71, 0, 0 }, { 71, 79, 78, 71, 0, 0 },
            { 71, 79, 85, 0, 0, 0 }, { 71, 85, 0, 0, 0, 0 },
            { 71, 85, 65, 0, 0, 0 }, { 71, 85, 65, 73, 0, 0 },
            { 71, 85, 65, 78, 0, 0 }, { 71, 85, 65, 78, 71, 0 },
            { 71, 85, 73, 0, 0, 0 }, { 71, 85, 78, 0, 0, 0 },
            { 71, 85, 79, 0, 0, 0 }, { 72, 65, 0, 0, 0, 0 },
            { 72, 65, 73, 0, 0, 0 }, { 72, 65, 78, 0, 0, 0 },
            { 72, 65, 78, 71, 0, 0 }, { 72, 65, 79, 0, 0, 0 },
            { 72, 69, 0, 0, 0, 0 }, { 72, 69, 73, 0, 0, 0 },
            { 72, 69, 78, 0, 0, 0 }, { 72, 69, 78, 71, 0, 0 },
            { 72, 77, 0, 0, 0, 0 }, { 72, 79, 78, 71, 0, 0 },
            { 72, 79, 85, 0, 0, 0 }, { 72, 85, 0, 0, 0, 0 },
            { 72, 85, 65, 0, 0, 0 }, { 72, 85, 65, 73, 0, 0 },
            { 72, 85, 65, 78, 0, 0 }, { 72, 85, 65, 78, 71, 0 },
            { 72, 85, 73, 0, 0, 0 }, { 72, 85, 78, 0, 0, 0 },
            { 72, 85, 79, 0, 0, 0 }, { 74, 73, 0, 0, 0, 0 },
            { 74, 73, 65, 0, 0, 0 }, { 74, 73, 65, 78, 0, 0 },
            { 74, 73, 65, 78, 71, 0 }, { 74, 73, 65, 79, 0, 0 },
            { 74, 73, 69, 0, 0, 0 }, { 74, 73, 78, 0, 0, 0 },
            { 74, 73, 78, 71, 0, 0 }, { 74, 73, 79, 78, 71, 0 },
            { 74, 73, 85, 0, 0, 0 }, { 74, 85, 0, 0, 0, 0 },
            { 74, 85, 65, 78, 0, 0 }, { 74, 85, 69, 0, 0, 0 },
            { 74, 85, 78, 0, 0, 0 }, { 75, 65, 0, 0, 0, 0 },
            { 75, 65, 73, 0, 0, 0 }, { 75, 65, 78, 0, 0, 0 },
            { 75, 65, 78, 71, 0, 0 }, { 75, 65, 79, 0, 0, 0 },
            { 75, 69, 0, 0, 0, 0 }, { 75, 69, 78, 0, 0, 0 },
            { 75, 69, 78, 71, 0, 0 }, { 75, 79, 78, 71, 0, 0 },
            { 75, 79, 85, 0, 0, 0 }, { 75, 85, 0, 0, 0, 0 },
            { 75, 85, 65, 0, 0, 0 }, { 75, 85, 65, 73, 0, 0 },
            { 75, 85, 65, 78, 0, 0 }, { 75, 85, 65, 78, 71, 0 },
            { 75, 85, 73, 0, 0, 0 }, { 75, 85, 78, 0, 0, 0 },
            { 75, 85, 79, 0, 0, 0 }, { 76, 65, 0, 0, 0, 0 },
            { 76, 65, 73, 0, 0, 0 }, { 76, 65, 78, 0, 0, 0 },
            { 76, 65, 78, 71, 0, 0 }, { 76, 65, 79, 0, 0, 0 },
            { 76, 69, 0, 0, 0, 0 }, { 76, 69, 73, 0, 0, 0 },
            { 76, 69, 78, 71, 0, 0 }, { 76, 73, 0, 0, 0, 0 },
            { 76, 73, 65, 0, 0, 0 }, { 76, 73, 65, 78, 0, 0 },
            { 76, 73, 65, 78, 71, 0 }, { 76, 73, 65, 79, 0, 0 },
            { 76, 73, 69, 0, 0, 0 }, { 76, 73, 78, 0, 0, 0 },
            { 76, 73, 78, 71, 0, 0 }, { 76, 73, 85, 0, 0, 0 },
            { 76, 79, 0, 0, 0, 0 }, { 76, 79, 78, 71, 0, 0 },
            { 76, 79, 85, 0, 0, 0 }, { 76, 85, 0, 0, 0, 0 },
            { 76, 85, 65, 78, 0, 0 }, { 76, 85, 69, 0, 0, 0 },
            { 76, 85, 78, 0, 0, 0 }, { 76, 85, 79, 0, 0, 0 },
            { 77, 0, 0, 0, 0, 0 }, { 77, 65, 0, 0, 0, 0 },
            { 77, 65, 73, 0, 0, 0 }, { 77, 65, 78, 0, 0, 0 },
            { 77, 65, 78, 71, 0, 0 }, { 77, 65, 79, 0, 0, 0 },
            { 77, 69, 0, 0, 0, 0 }, { 77, 69, 73, 0, 0, 0 },
            { 77, 69, 78, 0, 0, 0 }, { 77, 69, 78, 71, 0, 0 },
            { 77, 73, 0, 0, 0, 0 }, { 77, 73, 65, 78, 0, 0 },
            { 77, 73, 65, 79, 0, 0 }, { 77, 73, 69, 0, 0, 0 },
            { 77, 73, 78, 0, 0, 0 }, { 77, 73, 78, 71, 0, 0 },
            { 77, 73, 85, 0, 0, 0 }, { 77, 79, 0, 0, 0, 0 },
            { 77, 79, 85, 0, 0, 0 }, { 77, 85, 0, 0, 0, 0 },
            { 78, 0, 0, 0, 0, 0 }, { 78, 65, 0, 0, 0, 0 },
            { 78, 65, 73, 0, 0, 0 }, { 78, 65, 78, 0, 0, 0 },
            { 78, 65, 78, 71, 0, 0 }, { 78, 65, 79, 0, 0, 0 },
            { 78, 69, 0, 0, 0, 0 }, { 78, 69, 73, 0, 0, 0 },
            { 78, 69, 78, 0, 0, 0 }, { 78, 69, 78, 71, 0, 0 },
            { 78, 73, 0, 0, 0, 0 }, { 78, 73, 65, 78, 0, 0 },
            { 78, 73, 65, 78, 71, 0 }, { 78, 73, 65, 79, 0, 0 },
            { 78, 73, 69, 0, 0, 0 }, { 78, 73, 78, 0, 0, 0 },
            { 78, 73, 78, 71, 0, 0 }, { 78, 73, 85, 0, 0, 0 },
            { 78, 79, 78, 71, 0, 0 }, { 78, 79, 85, 0, 0, 0 },
            { 78, 85, 0, 0, 0, 0 }, { 78, 85, 65, 78, 0, 0 },
            { 78, 85, 69, 0, 0, 0 }, { 78, 85, 78, 0, 0, 0 },
            { 78, 85, 79, 0, 0, 0 }, { 79, 0, 0, 0, 0, 0 },
            { 79, 85, 0, 0, 0, 0 }, { 80, 65, 0, 0, 0, 0 },
            { 80, 65, 73, 0, 0, 0 }, { 80, 65, 78, 0, 0, 0 },
            { 80, 65, 78, 71, 0, 0 }, { 80, 65, 79, 0, 0, 0 },
            { 80, 69, 73, 0, 0, 0 }, { 80, 69, 78, 0, 0, 0 },
            { 80, 69, 78, 71, 0, 0 }, { 80, 73, 0, 0, 0, 0 },
            { 80, 73, 65, 78, 0, 0 }, { 80, 73, 65, 79, 0, 0 },
            { 80, 73, 69, 0, 0, 0 }, { 80, 73, 78, 0, 0, 0 },
            { 80, 73, 78, 71, 0, 0 }, { 80, 79, 0, 0, 0, 0 },
            { 80, 79, 85, 0, 0, 0 }, { 80, 85, 0, 0, 0, 0 },
            { 81, 73, 0, 0, 0, 0 }, { 81, 73, 65, 0, 0, 0 },
            { 81, 73, 65, 78, 0, 0 }, { 81, 73, 65, 78, 71, 0 },
            { 81, 73, 65, 79, 0, 0 }, { 81, 73, 69, 0, 0, 0 },
            { 81, 73, 78, 0, 0, 0 }, { 81, 73, 78, 71, 0, 0 },
            { 81, 73, 79, 78, 71, 0 }, { 81, 73, 85, 0, 0, 0 },
            { 81, 85, 0, 0, 0, 0 }, { 81, 85, 65, 78, 0, 0 },
            { 81, 85, 69, 0, 0, 0 }, { 81, 85, 78, 0, 0, 0 },
            { 82, 65, 78, 0, 0, 0 }, { 82, 65, 78, 71, 0, 0 },
            { 82, 65, 79, 0, 0, 0 }, { 82, 69, 0, 0, 0, 0 },
            { 82, 69, 78, 0, 0, 0 }, { 82, 69, 78, 71, 0, 0 },
            { 82, 73, 0, 0, 0, 0 }, { 82, 79, 78, 71, 0, 0 },
            { 82, 79, 85, 0, 0, 0 }, { 82, 85, 0, 0, 0, 0 },
            { 82, 85, 65, 0, 0, 0 }, { 82, 85, 65, 78, 0, 0 },
            { 82, 85, 73, 0, 0, 0 }, { 82, 85, 78, 0, 0, 0 },
            { 82, 85, 79, 0, 0, 0 }, { 83, 65, 0, 0, 0, 0 },
            { 83, 65, 73, 0, 0, 0 }, { 83, 65, 78, 0, 0, 0 },
            { 83, 65, 78, 71, 0, 0 }, { 83, 65, 79, 0, 0, 0 },
            { 83, 69, 0, 0, 0, 0 }, { 83, 69, 78, 0, 0, 0 },
            { 83, 69, 78, 71, 0, 0 }, { 83, 72, 65, 0, 0, 0 },
            { 83, 72, 65, 73, 0, 0 }, { 83, 72, 65, 78, 0, 0 },
            { 83, 72, 65, 78, 71, 0 }, { 83, 72, 65, 79, 0, 0 },
            { 83, 72, 69, 0, 0, 0 }, { 83, 72, 69, 78, 0, 0 },
            { 88, 73, 78, 0, 0, 0 }, { 83, 72, 69, 78, 0, 0 },
            { 83, 72, 69, 78, 71, 0 }, { 83, 72, 73, 0, 0, 0 },
            { 83, 72, 79, 85, 0, 0 }, { 83, 72, 85, 0, 0, 0 },
            { 83, 72, 85, 65, 0, 0 }, { 83, 72, 85, 65, 73, 0 },
            { 83, 72, 85, 65, 78, 0 }, { 83, 72, 85, 65, 78, 71 },
            { 83, 72, 85, 73, 0, 0 }, { 83, 72, 85, 78, 0, 0 },
            { 83, 72, 85, 79, 0, 0 }, { 83, 73, 0, 0, 0, 0 },
            { 83, 79, 78, 71, 0, 0 }, { 83, 79, 85, 0, 0, 0 },
            { 83, 85, 0, 0, 0, 0 }, { 83, 85, 65, 78, 0, 0 },
            { 83, 85, 73, 0, 0, 0 }, { 83, 85, 78, 0, 0, 0 },
            { 83, 85, 79, 0, 0, 0 }, { 84, 65, 0, 0, 0, 0 },
            { 84, 65, 73, 0, 0, 0 }, { 84, 65, 78, 0, 0, 0 },
            { 84, 65, 78, 71, 0, 0 }, { 84, 65, 79, 0, 0, 0 },
            { 84, 69, 0, 0, 0, 0 }, { 84, 69, 78, 71, 0, 0 },
            { 84, 73, 0, 0, 0, 0 }, { 84, 73, 65, 78, 0, 0 },
            { 84, 73, 65, 79, 0, 0 }, { 84, 73, 69, 0, 0, 0 },
            { 84, 73, 78, 71, 0, 0 }, { 84, 79, 78, 71, 0, 0 },
            { 84, 79, 85, 0, 0, 0 }, { 84, 85, 0, 0, 0, 0 },
            { 84, 85, 65, 78, 0, 0 }, { 84, 85, 73, 0, 0, 0 },
            { 84, 85, 78, 0, 0, 0 }, { 84, 85, 79, 0, 0, 0 },
            { 87, 65, 0, 0, 0, 0 }, { 87, 65, 73, 0, 0, 0 },
            { 87, 65, 78, 0, 0, 0 }, { 87, 65, 78, 71, 0, 0 },
            { 87, 69, 73, 0, 0, 0 }, { 87, 69, 78, 0, 0, 0 },
            { 87, 69, 78, 71, 0, 0 }, { 87, 79, 0, 0, 0, 0 },
            { 87, 85, 0, 0, 0, 0 }, { 88, 73, 0, 0, 0, 0 },
            { 88, 73, 65, 0, 0, 0 }, { 88, 73, 65, 78, 0, 0 },
            { 88, 73, 65, 78, 71, 0 }, { 88, 73, 65, 79, 0, 0 },
            { 88, 73, 69, 0, 0, 0 }, { 88, 73, 78, 0, 0, 0 },
            { 88, 73, 78, 71, 0, 0 }, { 88, 73, 79, 78, 71, 0 },
            { 88, 73, 85, 0, 0, 0 }, { 88, 85, 0, 0, 0, 0 },
            { 88, 85, 65, 78, 0, 0 }, { 88, 85, 69, 0, 0, 0 },
            { 88, 85, 78, 0, 0, 0 }, { 89, 65, 0, 0, 0, 0 },
            { 89, 65, 78, 0, 0, 0 }, { 89, 65, 78, 71, 0, 0 },
            { 89, 65, 79, 0, 0, 0 }, { 89, 69, 0, 0, 0, 0 },
            { 89, 73, 0, 0, 0, 0 }, { 89, 73, 78, 0, 0, 0 },
            { 89, 73, 78, 71, 0, 0 }, { 89, 79, 0, 0, 0, 0 },
            { 89, 79, 78, 71, 0, 0 }, { 89, 79, 85, 0, 0, 0 },
            { 89, 85, 0, 0, 0, 0 }, { 89, 85, 65, 78, 0, 0 },
            { 89, 85, 69, 0, 0, 0 }, { 89, 85, 78, 0, 0, 0 },
            { 74, 85, 78, 0, 0, 0 }, { 89, 85, 78, 0, 0, 0 },
            { 90, 65, 0, 0, 0, 0 }, { 90, 65, 73, 0, 0, 0 },
            { 90, 65, 78, 0, 0, 0 }, { 90, 65, 78, 71, 0, 0 },
            { 90, 65, 79, 0, 0, 0 }, { 90, 69, 0, 0, 0, 0 },
            { 90, 69, 73, 0, 0, 0 }, { 90, 69, 78, 0, 0, 0 },
            { 90, 69, 78, 71, 0, 0 }, { 90, 72, 65, 0, 0, 0 },
            { 90, 72, 65, 73, 0, 0 }, { 90, 72, 65, 78, 0, 0 },
            { 90, 72, 65, 78, 71, 0 }, { 67, 72, 65, 78, 71, 0 },
            { 90, 72, 65, 78, 71, 0 }, { 90, 72, 65, 79, 0, 0 },
            { 90, 72, 69, 0, 0, 0 }, { 90, 72, 69, 78, 0, 0 },
            { 90, 72, 69, 78, 71, 0 }, { 90, 72, 73, 0, 0, 0 },
            { 83, 72, 73, 0, 0, 0 }, { 90, 72, 73, 0, 0, 0 },
            { 90, 72, 79, 78, 71, 0 }, { 90, 72, 79, 85, 0, 0 },
            { 90, 72, 85, 0, 0, 0 }, { 90, 72, 85, 65, 0, 0 },
            { 90, 72, 85, 65, 73, 0 }, { 90, 72, 85, 65, 78, 0 },
            { 90, 72, 85, 65, 78, 71 }, { 90, 72, 85, 73, 0, 0 },
            { 90, 72, 85, 78, 0, 0 }, { 90, 72, 85, 79, 0, 0 },
            { 90, 73, 0, 0, 0, 0 }, { 90, 79, 78, 71, 0, 0 },
            { 90, 79, 85, 0, 0, 0 }, { 90, 85, 0, 0, 0, 0 },
            { 90, 85, 65, 78, 0, 0 }, { 90, 85, 73, 0, 0, 0 },
            { 90, 85, 78, 0, 0, 0 }, { 90, 85, 79, 0, 0, 0 },
            { 0, 0, 0, 0, 0, 0 }, { 83, 72, 65, 78, 0, 0 },
            { 0, 0, 0, 0, 0, 0 }, };

    /**
     * First and last Chinese character with known Pinyin according to zh
     * collation
     */
    private static final String FIRST_PINYIN_UNIHAN = "\u963F";
    private static final String LAST_PINYIN_UNIHAN = "\u9FFF";

    private static final Collator COLLATOR = Collator.getInstance(Locale.CHINA);

    private static HanziToPinyin sInstance;
    private final boolean mHasChinaCollator;

    public static class Token {
        /**
         * Separator between target string for each source char
         */
        public static final String SEPARATOR = " ";

        public static final int LATIN = 1;
        public static final int PINYIN = 2;
        public static final int UNKNOWN = 3;

        public Token() {
        }

        public Token(int type, String source, String target) {
            this.type = type;
            this.source = source;
            this.target = target;
        }

        /**
         * Type of this token: LATIN, PINYIN or UNKNOWN.
         */
        public int type;
        /**
         * Original string before translation.
         */
        public String source;
        /**
         * Translated string of source. For Han, target is corresponding Pinyin.
         * Otherwise target is original string in source.
         */
        public String target;
    }

    protected HanziToPinyin(boolean hasChinaCollator) {
        mHasChinaCollator = hasChinaCollator;
    }

    public static HanziToPinyin getInstance() {
        synchronized (HanziToPinyin.class) {
            if (sInstance != null) {
                return sInstance;
            }
            // Check if zh_CN collation data is available
            final Locale locale[] = Collator.getAvailableLocales();

            // Added for robustness: also accept the bare "zh" locale (no country).
            final Locale chinaAddition = new Locale("zh");

            for (int i = 0; i < locale.length; i++) {
                if (locale[i].equals(Locale.CHINA)
                        || locale[i].equals(chinaAddition)) {
                    // Do self validation just once.
                    if (DEBUG) {
                        Log.d(TAG, "Self validation. Result: "
                                + doSelfValidation());
                    }
                    sInstance = new HanziToPinyin(true);
                    return sInstance;
                }
            }
            Log.w(TAG,
                    "There is no Chinese collator, HanziToPinyin is disabled");
            sInstance = new HanziToPinyin(false);
            return sInstance;
        }
    }

    /**
     * Validate if our internal table has some wrong value.
     *
     * @return true when the table looks correct.
     */
    private static boolean doSelfValidation() {
        char lastChar = UNIHANS[0];
        String lastString = Character.toString(lastChar);
        for (char c : UNIHANS) {
            if (lastChar == c) {
                continue;
            }
            final String curString = Character.toString(c);
            int cmp = COLLATOR.compare(lastString, curString);
            if (cmp >= 0) {
                Log.e(TAG, "Internal error in Unihan table. "
                        + "The last string \"" + lastString
                        + "\" is greater than current string \"" + curString
                        + "\".");
                return false;
            }
            lastString = curString;
        }
        return true;
    }

    private Token getToken(char character) {
        Token token = new Token();
        final String letter = Character.toString(character);
        token.source = letter;
        int offset = -1;
        int cmp;
        if (character < 256) {
            token.type = Token.LATIN;
            token.target = letter;
            return token;
        } else {
            cmp = COLLATOR.compare(letter, FIRST_PINYIN_UNIHAN);
            if (cmp < 0) {
                token.type = Token.UNKNOWN;
                token.target = letter;
                return token;
            } else if (cmp == 0) {
                token.type = Token.PINYIN;
                offset = 0;
            } else {
                cmp = COLLATOR.compare(letter, LAST_PINYIN_UNIHAN);
                if (cmp > 0) {
                    token.type = Token.UNKNOWN;
                    token.target = letter;
                    return token;
                } else if (cmp == 0) {
                    token.type = Token.PINYIN;
                    offset = UNIHANS.length - 1;
                }
            }
        }

        token.type = Token.PINYIN;
        if (offset < 0) {
            // Binary search UNIHANS (sorted in zh_CN collation order).
            int begin = 0;
            int end = UNIHANS.length - 1;
            while (begin <= end) {
                offset = (begin + end) / 2;
                final String unihan = Character.toString(UNIHANS[offset]);
                cmp = COLLATOR.compare(letter, unihan);
                if (cmp == 0) {
                    break;
                } else if (cmp > 0) {
                    begin = offset + 1;
                } else {
                    end = offset - 1;
                }
            }
        }
        if (cmp < 0) {
            // letter sorts before UNIHANS[offset]: it belongs to the previous entry.
            offset--;
        }
        StringBuilder pinyin = new StringBuilder();
        for (int j = 0; j < PINYINS[offset].length && PINYINS[offset][j] != 0; j++) {
            pinyin.append((char) PINYINS[offset][j]);
        }
        token.target = pinyin.toString();
        if (TextUtils.isEmpty(token.target)) {
            token.type = Token.UNKNOWN;
            token.target = token.source;
        }
        return token;
    }

    /**
     * Convert the input to an array of tokens. A sequence of ASCII or Unknown
     * characters without space will be put into a Token. One Hanzi character
     * which has pinyin will be treated as a Token. If there is no China
     * collator, the empty token array is returned.
     */
    public ArrayList<Token> get(final String input) {
        ArrayList<Token> tokens = new ArrayList<Token>();
        if (!mHasChinaCollator || TextUtils.isEmpty(input)) {
            // return empty tokens.
            return tokens;
        }
        final int inputLength = input.length();
        final StringBuilder sb = new StringBuilder();
        int tokenType = Token.LATIN;
        // Go through the input, create a new token when
        // a. Token type changed
        // b. Get the Pinyin of current character.
        // c. current character is space.
        for (int i = 0; i < inputLength; i++) {
            final char character = input.charAt(i);
            if (character == ' ') {
                if (sb.length() > 0) {
                    addToken(sb, tokens, tokenType);
                }
            } else if (character < 256) {
                if (tokenType != Token.LATIN && sb.length() > 0) {
                    addToken(sb, tokens, tokenType);
                }
                tokenType = Token.LATIN;
                sb.append(character);
            } else {
                Token t = getToken(character);
                if (t.type == Token.PINYIN) {
                    if (sb.length() > 0) {
                        addToken(sb, tokens, tokenType);
                    }
                    tokens.add(t);
                    tokenType = Token.PINYIN;
                } else {
                    if (tokenType != t.type && sb.length() > 0) {
                        addToken(sb, tokens, tokenType);
                    }
                    tokenType = t.type;
                    sb.append(character);
                }
            }
        }
        if (sb.length() > 0) {
            addToken(sb, tokens, tokenType);
        }
        return tokens;
    }

    private void addToken(final StringBuilder sb,
            final ArrayList<Token> tokens, final int tokenType) {
        String str = sb.toString();
        tokens.add(new Token(tokenType, str, str));
        sb.setLength(0);
    }
}
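The core lookup in getToken() depends on the layout described in the class Javadoc. UNIHANS holds, for each pinyin, the first character that carries that pinyin in zh_CN collation order, and PINYINS[i] is the pinyin of UNIHANS[i]. A character's pinyin is therefore the entry for the last UNIHANS element that sorts at or before it. getToken() finds that entry with a binary search, followed by offset-- when the final comparison was negative. The sketch below isolates this "floor" search and takes a pluggable Comparator so that it runs outside Android. The class name FloorLookup, the toy table in the usage note, and the use of natural String ordering in place of the Chinese Collator are all illustrative assumptions.

```java
import java.util.Comparator;

final class FloorLookup {

    // Returns the index of the last element in the sorted array `keys` that
    // compares <= target, or -1 if target sorts before every key.
    static <T> int floorIndex(T[] keys, T target, Comparator<? super T> cmp) {
        int begin = 0;
        int end = keys.length - 1;
        while (begin <= end) {
            int mid = (begin + end) >>> 1;
            int c = cmp.compare(target, keys[mid]);
            if (c == 0) {
                return mid;      // exact hit: target is itself a boundary key
            } else if (c > 0) {
                begin = mid + 1; // target sorts after keys[mid]
            } else {
                end = mid - 1;   // target sorts before keys[mid]
            }
        }
        return end;              // last key that sorts before target
    }

    // Returns the value paired with target's floor key, mirroring how
    // UNIHANS[i] is paired with PINYINS[i]; null when there is no floor key.
    static <T> String lookup(T[] keys, String[] values, T target, Comparator<? super T> cmp) {
        int i = floorIndex(keys, target, cmp);
        return i < 0 ? null : values[i];
    }
}
```

With keys {"ba", "da", "ma"} paired with {"B", "D", "M"}, any string from "ba" up to (but not including) "da" maps to "B". Returning end after the loop gives the same index that getToken() reaches by decrementing offset when the last comparison was negative.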
Now write a MainActivity.java to test the Hanzi-to-pinyin conversion:
package zhangphil.hanyupinyin;

import java.util.ArrayList;

import zhangphil.hanyupinyin.HanziToPinyin.Token;
import android.app.Activity;
import android.os.Bundle;

public class MainActivity extends Activity {

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);

        String s = "安卓";
        // On Android, System.out output goes to logcat under the "System.out" tag.
        System.out.println("Hanzi to pinyin: " + getPinYin(s));
    }

    // General-purpose helper: takes Chinese text and returns its pinyin.
    public static String getPinYin(String hanzi) {
        ArrayList<Token> tokens = HanziToPinyin.getInstance().get(hanzi);
        StringBuilder sb = new StringBuilder();
        if (tokens != null && tokens.size() > 0) {
            for (Token token : tokens) {
                if (Token.PINYIN == token.type) {
                    // Hanzi with a known pinyin: use the converted text.
                    sb.append(token.target);
                } else {
                    // Latin or unknown characters: keep the original text.
                    sb.append(token.source);
                }
            }
        }

        return sb.toString().toUpperCase();
    }
}
The output is shown in the figure:
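getPinYin() joins the full pinyin of every token into one string. Contact lists often also need initials, for example so that typing "AZ" finds 安卓. The sketch below builds that initials string, and it is a hypothetical variant, not part of the original code. In a real project you would iterate over HanziToPinyin.Token directly. Here a small stand-in Token class with the same public fields and PINYIN constant is declared so that the sketch compiles outside Android. The names PinyinInitials and initials() are illustrative.

```java
import java.util.List;
import java.util.Locale;

final class PinyinInitials {

    // Stand-in for HanziToPinyin.Token with the same public fields and PINYIN
    // constant, so this sketch compiles without the Android-dependent class.
    static final class Token {
        static final int PINYIN = 2;
        final int type;
        final String source;
        final String target;

        Token(int type, String source, String target) {
            this.type = type;
            this.source = source;
            this.target = target;
        }
    }

    // First letter of each pinyin token, plus the first character of any
    // other (Latin or unknown) token, upper-cased.
    static String initials(List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Token token : tokens) {
            String text = token.type == Token.PINYIN ? token.target : token.source;
            if (text != null && !text.isEmpty()) {
                sb.append(text.charAt(0));
            }
        }
        return sb.toString().toUpperCase(Locale.ROOT);
    }
}
```

Because get() emits one token per Hanzi but groups a run of Latin characters into a single token, a word such as "android" contributes only one initial.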