I am trying to deal with an encoding problem: I want to transform the special characters in a string into correct UTF-8 characters.
When I execute this simple code:
In the console I expect: ‘é’ but I get
&eacute; is the HTML entity reference for the é character, not a UTF-8 encoded string. To decode it, you can use the unescapeHtml method of Commons Lang's org.apache.commons.lang.StringEscapeUtils.
Java Strings know nothing of SGML / XML / HTML5 entities. &eacute; is such an entity. It works in web browsers inside HTML because one of the DTDs, or the HTML5 spec, defines &eacute; as the letter e with an acute accent by mapping it to the corresponding Unicode character reference &#233;.
new String(someString.getBytes("UTF-8"), "UTF-8"); is a meaningless operation: it converts a String into bytes, using an encoding that can represent all meaningful characters, and then converts those bytes back into a String. It's the same thing as using someString directly, except that you have a new object.
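To make the point concrete, here is a minimal sketch using only the JDK. The class and method names (RoundTripDemo, roundTrip) are made up for this illustration. It shows that encoding to UTF-8 and decoding with the same charset returns an equal string, so an entity such as &eacute; stays an entity.

```java
import java.nio.charset.StandardCharsets;

final class RoundTripDemo {
    // Encode to UTF-8 bytes and decode with the same charset.
    // The content comes back unchanged; only the object identity differs.
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String entity = "&eacute;";   // eight ASCII characters, not an accented letter
        String accented = "\u00E9";   // the single character é

        System.out.println(roundTrip(entity).equals(entity));
        System.out.println(roundTrip(accented).equals(accented));
        System.out.println(roundTrip(entity).equals(accented));
    }
}
```

The round trip never looks at the meaning of the characters, which is why it cannot turn the entity text into the accented letter.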
In order to get e with an acute accent, you can do one of the following things:
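One dependency-free possibility, sketched here as an illustration rather than a complete solution: decode the references yourself. The class name EntityDecoderSketch and its tiny table of named entities are assumptions made for this example. The real HTML entity list is far longer, so a library is the safer choice for arbitrary input.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EntityDecoderSketch {
    // Deliberately tiny, illustrative subset of the named HTML entities.
    private static final Map<String, Integer> NAMED = new HashMap<>();
    static {
        NAMED.put("eacute", 0xE9);
        NAMED.put("egrave", 0xE8);
        NAMED.put("agrave", 0xE0);
        NAMED.put("amp", 0x26);
        NAMED.put("lt", 0x3C);
        NAMED.put("gt", 0x3E);
        NAMED.put("quot", 0x22);
    }

    // Matches &#233; (decimal), &#xE9; (hex) and &name; references.
    private static final Pattern REF =
            Pattern.compile("&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);");

    static String decode(String input) {
        Matcher m = REF.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(1);
            Integer codePoint;
            try {
                if (body.startsWith("#x") || body.startsWith("#X")) {
                    codePoint = Integer.parseInt(body.substring(2), 16);
                } else if (body.startsWith("#")) {
                    codePoint = Integer.parseInt(body.substring(1));
                } else {
                    codePoint = NAMED.get(body); // null when the name is not in the table
                }
            } catch (NumberFormatException e) {
                codePoint = null; // number too large to parse
            }
            String replacement = m.group(); // unknown references are left untouched
            if (codePoint != null) {
                int cp = codePoint;
                if (Character.isValidCodePoint(cp)) {
                    replacement = new String(Character.toChars(cp));
                }
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("R&eacute;union").equals("R\u00E9union"));
    }
}
```

References the sketch does not recognise are copied through unchanged, so unexpected input is not silently lost.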
There is a site encoded in windows-1251
URL u = new URL("sitename");
URLConnection conn = u.openConnection();
DataInputStream in = new DataInputStream(conn.getInputStream());
// No charset is passed here, so the reader decodes with the platform default
BufferedReader d = new BufferedReader(new InputStreamReader(in));
StringBuilder sb = new StringBuilder();
String str;
while ((str = d.readLine()) != null) {
    sb.append(str);
}
str = sb.toString();
WebView myWebView = (WebView) findViewById(R.id.webview);
// Attempted conversion: encodes to UTF-8 bytes, then decodes those bytes as windows-1251
String utf8String = new String(str.getBytes("UTF-8"), "windows-1251");
String summary = "!"+utf8String+"";
myWebView.loadData(summary, "text/html", "utf-8");
and the WebView shows gibberish
please tell me how to convert it
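A possible direction, offered as a sketch and not tested on Android: name the charset at the point where the bytes are turned into characters, instead of re-encoding the String afterwards. The class Cp1251ReaderSketch and the method readAll are hypothetical names, and a byte array stands in for the network stream so that the example is self-contained.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

final class Cp1251ReaderSketch {
    // Reads the whole stream, decoding its bytes with the given charset.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The word "Привет" as windows-1251 bytes, standing in for conn.getInputStream().
        byte[] cp1251 = {(byte) 0xCF, (byte) 0xF0, (byte) 0xE8, (byte) 0xE2, (byte) 0xE5, (byte) 0xF2};
        String text = readAll(new ByteArrayInputStream(cp1251), Charset.forName("windows-1251"));
        System.out.println(text.trim().equals("\u041F\u0440\u0438\u0432\u0435\u0442"));
    }
}
```

Once the text has been decoded this way, the String should already hold the right characters, so a further new String(getBytes(...), ...) step should not be needed before passing it to the WebView.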
The functions written there work properly, that is, pack(unpack("string")) yields "string". But I would like to get the same result that "string".getBytes("UTF8") gives in Java.
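For reference, here is a small sketch of what the Java side produces, which may help when comparing against another implementation. The class Utf8BytesSketch and the helper toHex are names invented for this example. It prints the UTF-8 bytes as unsigned hex, since Java's byte type is signed.

```java
import java.nio.charset.StandardCharsets;

final class Utf8BytesSketch {
    // Returns the UTF-8 bytes of s as space-separated, two-digit uppercase hex.
    static String toHex(String s) {
        StringBuilder hex = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", b & 0xFF)); // mask to read the byte as unsigned
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex("string")); // 73 74 72 69 6E 67
        System.out.println(toHex("\u00E9")); // C3 A9
    }
}
```

ASCII characters map to one byte each, while é (U+00E9) becomes the two bytes C3 A9. A port to another language would need to reproduce that multi-byte behaviour to match getBytes("UTF8").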