在 Android 中解析 HTML

我试图解析 HTML 在安卓从一个网页,因为网页它没有很好的形式,我得到 SAXException

在 Android 中有解析 HTML 的方法吗?

145130 次浏览

Have you tried using Html.fromHtml(source)?

I think that class is pretty liberal with respect to source quality (it uses TagSoup internally, which was designed with real-life, bad HTML in mind). It doesn't support all HTML tags though, but it does come with a handler you can implement to react on tags it doesn't understand.

Maybe you can use WebView, but as you can see in the doc WebView doesn't support javascript and other stuff like widgets by default.

http://developer.android.com/reference/android/webkit/WebView.html

I think that you can enable javascript if you need it.

I just encountered this problem. I tried a few things, but settled on using JSoup. The jar is about 132k, which is a bit big, but if you download the source and take out some of the methods you will not be using, then it is not as big.
=> Good thing about it is that it will handle badly formed HTML

Here's a good example from their site.

File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");


//http://jsoup.org/cookbook/input/load-document-from-url
//Document doc = Jsoup.connect("http://example.com/").get();


Element content = doc.getElementById("content");
Elements links = content.getElementsByTag("a");
for (Element link : links) {
String linkHref = link.attr("href");
String linkText = link.text();
}
String tmpHtml = "<html>a whole bunch of html stuff</html>";
String htmlTextStr = Html.fromHtml(tmpHtml).toString();

We all know that programming have endless possibilities.There are numbers of solutions available for a single problem so i think all of the above solutions are perfect and may be helpful for someone but for me this one save my day..

So Code goes like this

  private void getWebsite() {
new Thread(new Runnable() {
@Override
public void run() {
final StringBuilder builder = new StringBuilder();


try {
Document doc = Jsoup.connect("http://www.ssaurel.com/blog").get();
String title = doc.title();
Elements links = doc.select("a[href]");


builder.append(title).append("\n");


for (Element link : links) {
builder.append("\n").append("Link : ").append(link.attr("href"))
.append("\n").append("Text : ").append(link.text());
}
} catch (IOException e) {
builder.append("Error : ").append(e.getMessage()).append("\n");
}


runOnUiThread(new Runnable() {
@Override
public void run() {
result.setText(builder.toString());
}
});
}
}).start();
}

You just have to call the above function in onCreate Method of your MainActivity

I hope this one is also helpful for you guys.

Also read the original blog at Medium