getBytes UTF-8 in Java

I am trying to deal with an encoding problem: I want to transform the special characters in a string into correct UTF-8 characters.

When I execute this simple code:

In the console I expect: ‘é’ but I get

2 Answers

&eacute; is the HTML entity reference for the é character, not the UTF-8 encoded string. To decode it, you can use Commons Lang's org.apache.commons.lang.StringEscapeUtils:

Java Strings know nothing of SGML / XML / HTML5 entities, and &eacute; is such an entity. It works in web browsers inside HTML because one of the DTDs, or the HTML5 spec, defines &eacute; as the letter e with acute accent by mapping it to the corresponding Unicode character é.
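
To illustrate the mapping described above, here is a minimal sketch using only the standard library. The class name EntityDecoder and its five-entry table are made up for this example; a real entity table is far larger, which is why a library such as Commons Lang is the practical choice. Numeric references (&#233; and &#xE9;) are resolved directly from their code point.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: decodes a handful of named entities plus numeric references.
final class EntityDecoder {
    // Deliberately tiny table; the HTML spec defines many more names.
    private static final Map<String, String> NAMED = Map.of(
            "eacute", "\u00e9",
            "amp", "&",
            "lt", "<",
            "gt", ">",
            "quot", "\"");

    private static final Pattern ENTITY =
            Pattern.compile("&(#\\d{1,7}|#[xX][0-9a-fA-F]{1,6}|[A-Za-z][A-Za-z0-9]*);");

    static String decode(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = resolve(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Returns the decoded text, or the entity unchanged when it is not recognised.
    private static String resolve(String body, String whole) {
        if (body.charAt(0) == '#') {
            boolean hex = body.length() > 1 && (body.charAt(1) == 'x' || body.charAt(1) == 'X');
            int codePoint = hex
                    ? Integer.parseInt(body.substring(2), 16)
                    : Integer.parseInt(body.substring(1));
            return Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : whole;
        }
        return NAMED.getOrDefault(body, whole);
    }

    public static void main(String[] args) {
        String decoded = decode("caf&eacute;");
        // Print the code point rather than the character to avoid console encoding issues.
        System.out.println(Integer.toHexString(decoded.codePointAt(3))); // e9
    }
}
```

Unknown names are left untouched, so a string without entities passes through unchanged.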

new String(someString.getBytes("UTF-8"), "UTF-8"); is a meaningless operation: it converts a String into bytes, using an encoding that can represent all meaningful characters, and then converts those bytes back into a String. It is the same as using someString directly, except that you get a new object.
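
A small sketch of that point, with a hypothetical helper named recode: encoding and decoding with the same charset gives back an equal string, while decoding with a different charset is what produces garbled output.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original question.
final class RoundTripDemo {
    // Encode with one charset, then decode the resulting bytes with another.
    static String recode(String s, Charset encodeWith, Charset decodeWith) {
        return new String(s.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // the character é

        // Same charset both ways: the result equals the input.
        String same = recode(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(same.equals(original)); // true

        // Mismatched charsets: the two UTF-8 bytes C3 A9 become two Latin-1 characters.
        String garbled = recode(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length()); // 2
    }
}
```

The same holds for the entity text itself: "&eacute;" consists only of ASCII characters, so the round trip returns it unchanged and never turns it into é.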

In order to get e with accent acute, you can do one of the following things:

Hi everyone.
Could you help me, please?
There is a site in windows-1251 encoding.
I do the following:
try {
URL u = new URL("sitename");
URLConnection conn = u.openConnection();
DataInputStream in = new DataInputStream(conn.getInputStream());
BufferedReader d = new BufferedReader(new InputStreamReader(in));
String str = null;
StringBuilder sb = new StringBuilder();
while ((str = d.readLine()) != null)
sb.append(str);
str = sb.toString();

WebView myWebView = (WebView) findViewById(;
int st = str.indexOf("block_title") - 12;
int en = str.indexOf("block2") - 18;
String res = str.substring(st, en);
String utf8String = new String(res.getBytes("UTF-8"), "windows-1251");
String summary = "!" + utf8String + "";
myWebView.loadData(summary, "text/html", "utf-8");

and the WebView shows gibberish.
Please tell me how to convert it.
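
A likely cause, sketched here under assumptions: InputStreamReader without a charset argument decodes with the platform default, so the windows-1251 bytes are already misread before res.getBytes("UTF-8") is called, and re-encoding afterwards cannot repair them. Passing the source charset to the reader avoids the problem. The class and method names below are made up for the example, and a ByteArrayInputStream stands in for conn.getInputStream().

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper: reads a whole stream as text in a known charset.
final class CharsetAwareReader {
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        // The charset is given to the reader, so bytes are decoded correctly from the start.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The Russian word for "hello" as windows-1251 bytes.
        byte[] cp1251 = {(byte) 0xCF, (byte) 0xF0, (byte) 0xE8, (byte) 0xE2, (byte) 0xE5, (byte) 0xF2};
        String text = readAll(new ByteArrayInputStream(cp1251), Charset.forName("windows-1251"));
        System.out.println(text.trim().length()); // 6
    }
}
```

Once the text is a correctly decoded String, no further new String(getBytes(...)) conversion is needed; the String can be passed on as it is.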

The functions written there work properly, that is, pack(unpack("string")) yields "string". But I would like to get the same result that "string".getBytes("UTF8") gives in Java.

The question is: how can I write a JavaScript function that behaves the same as Java's getBytes("UTF8")?

For Latin strings, unpack(str) from the article mentioned above gives the same result as getBytes("UTF8"), except that it adds a 0 at odd positions. With non-Latin strings, however, it seems to work completely differently. Is there a way to work with string data in JavaScript the way Java does?
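
For reference, here is a sketch in Java of what getBytes("UTF8") computes, written as plain bit arithmetic so the same steps could be ported to JavaScript. The class name ManualUtf8 is invented for this example, and it assumes well-formed input: for an unpaired surrogate, getBytes substitutes a replacement byte, while this sketch does not. The extra 0 bytes described above look like UTF-16 code units, which is a different encoding from UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical class: hand-written UTF-8 encoder for comparison with getBytes.
final class ManualUtf8 {
    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            // Read a full code point, combining surrogate pairs when present.
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (cp < 0x80) {                 // 1 byte: 0xxxxxxx
                out.write(cp);
            } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
                out.write(0xC0 | (cp >> 6));
                out.write(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
                out.write(0xE0 | (cp >> 12));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            } else {                         // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                out.write(0xF0 | (cp >> 18));
                out.write(0x80 | ((cp >> 12) & 0x3F));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String text = "h\u00e9llo \u041f\u0440\u0438\u0432\u0435\u0442";
        System.out.println(Arrays.equals(encode(text), text.getBytes(StandardCharsets.UTF_8))); // true

        // UTF-16 little-endian puts a 0 byte after each Latin letter.
        System.out.println(Arrays.toString("A".getBytes(StandardCharsets.UTF_16LE))); // [65, 0]
    }
}
```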
