Sorry, this is off topic. Not relevant to the original issue at all.
I played with java.text.RuleBasedCollator further. I wanted to sort numbers (days of the month, in the range 1 to 31) written in Japanese kanji characters in “natural order”. This is my code:
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class KanjiDaySort {
    public static void main(String[] args) throws ParseException {
        List<String> stringList = Arrays.asList("十","五","一","七","八","四","六","二","九",
                "三","十一","二十","廿","廿三","三十","二十一","十三");
        // In Arabic numerals: [10,5,1,7,8,4,6,2,9,3,11,20,20,23,30,21,13]
        // Start from the default Japanese collation rules.
        RuleBasedCollator localRules = (RuleBasedCollator) Collator.getInstance(Locale.JAPAN);
        // Tailoring: "<" orders the days; "," adds 廿 as a tertiary-level variant of 二十.
        StringBuilder sb = new StringBuilder();
        sb.append("一 < 二 < 三 < 四 < 五 < 六 < 七 < 八 < 九 < 十");
        sb.append(" < 十一 < 十二 < 十三 < 十四 < 十五 < 十六 < 十七 < 十八 < 十九 < 二十,廿");
        sb.append(" < 二十一,廿一 < 二十二,廿二 < 二十三,廿三 < 二十四,廿四 < 二十五,廿五");
        sb.append(" < 二十六,廿六 < 二十七,廿七 < 二十八,廿八 < 二十九,廿九 < 三十 < 三十一");
        RuleBasedCollator c = new RuleBasedCollator(localRules.getRules() + " & " + sb.toString());
        Collections.sort(stringList, c);
        System.out.println(stringList);
        // [一, 二, 三, 四, 五, 六, 七, 八, 九, 十, 十一, 十三, 二十, 廿, 二十一, 廿三, 三十]
        // In Arabic numerals: [1,2,3,4,5,6,7,8,9,10,11,13,20,20,21,23,30]
    }
}
Wow! Perfectly natural.
A memo for myself.
The concept of “collation” is explained in depth by the ICU project. The ICU User Guide says:
Combinations of letters can be treated as if they were one letter. For example, in traditional Spanish “ch” is treated as a single letter, and sorted between “c” and “d”.
This is the point.
For example, take a set of strings: 1, 10 and 2. Each of them can be regarded by the RuleBasedCollator as one letter, so I can define the order as 1 < 2 < 10.
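To make the 1 < 2 < 10 idea concrete, here is a minimal sketch of it. The class name DigitContractionDemo and the helper sortNatural are names I made up for illustration, and the rule string covers only these three strings, so this is a toy and not a general natural-order sort.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DigitContractionDemo {

    // Hypothetical helper: returns a sorted copy of the input, ordered by
    // a collator built from the rule string "< 1 < 2 < 10".
    static List<String> sortNatural(List<String> input) {
        RuleBasedCollator collator;
        try {
            // "10" is two characters long, so the collator handles it as a
            // contraction (one "letter") that is placed after "2".
            collator = new RuleBasedCollator("< 1 < 2 < 10");
        } catch (ParseException e) {
            // The rule string is a constant, so a parse failure is a programming error.
            throw new IllegalStateException(e);
        }
        List<String> sorted = new ArrayList<>(input);
        sorted.sort(collator);
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortNatural(Arrays.asList("10", "2", "1")));
        // [1, 2, 10]
    }
}
```

For comparison, plain Collections.sort on the same strings gives [1, 10, 2], because String compares character by character. Any string not listed in the rule (for example 11) has no defined position here, which is why my kanji code above spells out every day from 1 to 31.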