Can i process utf8 datasets or files
Java can process UTF-8 files without any problems, it is just that Java uses a different encoding for displaying them under Windows (= "Cp1252"). If you change the file encoding to "utf-8" everything should be fine. If you are running WEKA directly from the commandline, just add the following parameter to your commandline:
-Dfile.encoding=utf-8
If you are starting WEKA from the Start menu, then edit the RunWEKA.ini file:
- If a
fileEncodingplaceholder already exists, then just change the value from "Cp1252" to "utf-8" (without the quotes of course). - If there isn't a
fileEncodingyet, just add the-Dfile.encoding=utf-8parameter to all thejava/javawcommands).
For Korean users, the following was suggested:
- Save ARFF file as UTF-8
- Use cp949 for RunWeka.ini
- Check Unicode Utf-8 in Region Settings