How to Truncate Chinese Text in Java

2026-03-23 12:24:35 | Java

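Methods for truncating Chinese text

Truncating Chinese text in Java requires care with encodings. A Java String is a sequence of UTF-16 code units, so methods such as substring count chars, not bytes. The byte count only matters once the string is encoded: a common Chinese character takes 3 bytes in UTF-8 and 2 bytes in GBK, and rarer supplementary characters take 4 bytes in UTF-8. Several common approaches follow: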
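
To make the difference between chars and bytes concrete, here is a minimal sketch that measures the same string under two encodings. The class name ByteLengthDemo and the helper byteLength are illustrative names, not part of any library, and the sketch assumes the runtime ships the GBK charset. The literal is written with Unicode escapes so that the file compiles regardless of source encoding; it represents 中文abc.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ByteLengthDemo {
    // Hypothetical helper: number of bytes the text occupies in the given charset.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // Two Chinese characters followed by "abc".
        String s = "\u4e2d\u6587abc";
        // Assumption: the GBK charset is available in this runtime.
        Charset gbk = Charset.forName("GBK");

        System.out.println(s.length());                            // 5 chars
        System.out.println(byteLength(s, StandardCharsets.UTF_8)); // 9 bytes: 2 * 3 + 3
        System.out.println(byteLength(s, gbk));                    // 7 bytes: 2 * 2 + 3
    }
}
```

The char count stays at 5 in both cases; only the encoded size changes, which is why char-based and byte-based truncation need different code.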

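Using String's substring method

The String class provides substring, which extracts by char index, with the end index exclusive. Each commonly used Chinese character (everything in the Basic Multilingual Plane) counts as exactly one char, so no byte arithmetic is needed. Rare characters outside that plane occupy two chars (a surrogate pair), and substring can split them.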

String str = "这是一个测试字符串";
String subStr = str.substring(0, 4); // take the first 4 characters
System.out.println(subStr); // prints: 这是一个
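
For text that may contain rare characters outside the Basic Multilingual Plane, counting code points instead of chars avoids cutting a surrogate pair in half. The following is a minimal sketch under that assumption; the class CodePointSubstring and the method firstCodePoints are illustrative names, while codePointCount and offsetByCodePoints are standard String methods.

```java
public class CodePointSubstring {
    // Hypothetical helper: returns the first `count` code points of `s`,
    // or all of `s` if it contains fewer. Negative counts yield "".
    static String firstCodePoints(String s, int count) {
        int available = s.codePointCount(0, s.length());
        int wanted = Math.min(Math.max(count, 0), available);
        // Translate a code-point count into a char index.
        int end = s.offsetByCodePoints(0, wanted);
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        // U+20BB7 (stored as a surrogate pair) followed by U+91CE and U+5BB6.
        String s = "\uD842\uDFB7\u91CE\u5BB6";

        System.out.println(s.length());                       // 4 chars
        System.out.println(s.codePointCount(0, s.length()));  // 3 code points
        System.out.println(s.substring(0, 1).length());       // 1: a lone high surrogate
        System.out.println(firstCodePoints(s, 1).length());   // 2: the whole pair is kept
    }
}
```

For everyday Chinese text the plain substring call is sufficient; this variant only matters when supplementary characters or emoji can appear in the input.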

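Handling mixed Chinese and English strings

The same method works when a string mixes Chinese and English characters, because a Latin letter and a common Chinese character each count as one char:

String mixedStr = "abc这是一个测试";
String subMixed = mixedStr.substring(0, 5); // take the first 5 characters
System.out.println(subMixed); // prints: abc这是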

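Truncating by byte length

Truncating to a byte limit (for example, to fit a database column whose size is defined in bytes) needs special handling, because a cut at an arbitrary byte offset can land in the middle of a multi-byte character: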

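import java.io.UnsupportedEncodingException;

public static String substringByByte(String str, int byteLength) throws UnsupportedEncodingException {
    byte[] bytes = str.getBytes("GBK");
    if (byteLength >= bytes.length) {
        return str;
    }
    int n = 0; // bytes consumed so far
    int i = 0; // number of chars to keep
    while (i < str.length()) {
        // Measure the char's real GBK size (1 or 2 bytes) instead of guessing
        // from a code-point range, so full-width punctuation is counted correctly.
        int size = String.valueOf(str.charAt(i)).getBytes("GBK").length;
        if (n + size > byteLength) {
            break; // the next char would exceed the limit, so stop before it
        }
        n += size;
        i++;
    }
    return str.substring(0, i);
}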

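Using the Apache Commons Lang library

StringUtils.substring from Apache Commons Lang behaves like String.substring but is more forgiving: it returns null for a null input, clamps out-of-range indices instead of throwing, and treats negative indices as offsets from the end of the string. It still counts chars, not bytes:

import org.apache.commons.lang3.StringUtils;

String str = "这是一个测试字符串";
String subStr = StringUtils.substring(str, 0, 4); // take the first 4 characters: 这是一个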

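Handling other encodings

To support an arbitrary encoding (UTF-8, GBK, and so on), encode the string, cut the byte array, decode it again, and then drop the trailing character if the cut fell inside a multi-byte sequence: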

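public static String substringByByte(String str, String charsetName, int byteLength)
    throws UnsupportedEncodingException {
    if (byteLength <= 0) {
        return ""; // nothing fits; also avoids a negative length below
    }
    byte[] bytes = str.getBytes(charsetName);
    if (bytes.length <= byteLength) {
        return str;
    }
    String result = new String(bytes, 0, byteLength, charsetName);
    // If the cut fell inside a multi-byte character, the decoder emits a
    // replacement character for the partial bytes, so result is no longer a
    // prefix of str. Drop that trailing character.
    int length = result.length();
    if (length > 0 && !str.startsWith(result)) {
        result = result.substring(0, length - 1);
    }
    return result;
}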
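
The decode-and-compare approach depends on how the decoder reports a partial sequence. As an alternative, and not the method used above, the sketch below lets a CharsetEncoder write into a buffer of exactly the allowed size and keeps only the chars it managed to consume. The class ByteTruncator and the method truncate are illustrative names; the sketch assumes the JDK's encoders stop before a character that does not fit, leaving the input position on a character boundary.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class ByteTruncator {
    // Hypothetical helper: longest prefix of `str` whose encoding in
    // `charset` occupies at most `maxBytes` bytes.
    static String truncate(String str, Charset charset, int maxBytes) {
        if (maxBytes <= 0 || str.isEmpty()) {
            return "";
        }
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        // The output buffer is the byte budget; it is allocated in full.
        ByteBuffer out = ByteBuffer.allocate(maxBytes);
        CharBuffer in = CharBuffer.wrap(str);
        // Encoding stops when the input is exhausted or the next
        // character no longer fits into the remaining space.
        encoder.encode(in, out, true);
        // in.position() is the number of chars that were fully encoded.
        return str.substring(0, in.position());
    }

    public static void main(String[] args) {
        String s = "\u4e2d\u6587abc"; // two Chinese characters followed by "abc"
        System.out.println(truncate(s, StandardCharsets.UTF_8, 4).length()); // 1
        System.out.println(truncate(s, StandardCharsets.UTF_8, 7).length()); // 3
    }
}
```

Because no decoding step is involved, the result is always a true prefix of the input, and a surrogate pair is either kept whole or dropped whole.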

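Note: choose the method that fits the actual requirement, and pay particular attention to the encoding in use and to boundary conditions such as an empty string, a limit of zero, or a limit that falls in the middle of a character.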