Objective C HTML escape/unescape

I'm wondering if there is an easy way to do a simple HTML escape/unescape in Objective C. What I want is something like this pseudocode:

NSString *string = @"&lt;span&gt;Foo&lt;/span&gt;";
[string stringByUnescapingHTML];

Which returns

<span>Foo</span>

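Hopefully it would unescape all other HTML entities as well, and even numeric character references like &#1234; and the like.

Is there any method in Cocoa Touch/UIKit that does this?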

This link contains the solution below. Cocoa CF has the CFXMLCreateStringByUnescapingEntities function but that's not available on the iPhone.

@interface MREntitiesConverter : NSObject <NSXMLParserDelegate> {
    NSMutableString *resultString;
}

@property (nonatomic, retain) NSMutableString *resultString;

- (NSString *)convertEntitiesInString:(NSString *)s;

@end

@implementation MREntitiesConverter

@synthesize resultString;

- (id)init
{
    self = [super init]; // assign the result; init may return a different object
    if (self) {
        resultString = [[NSMutableString alloc] init];
    }
    return self;
}

// The parser reports character data with entities already resolved
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)s {
    [self.resultString appendString:s];
}

- (NSString *)convertEntitiesInString:(NSString *)s {
    if (!s) {
        NSLog(@"ERROR : Parameter string is nil");
        return nil;
    }
    [self.resultString setString:@""]; // discard output left over from a previous call
    // Wrap the input in a root element so the parser sees a complete document
    NSString *xmlStr = [NSString stringWithFormat:@"<d>%@</d>", s];
    NSData *data = [xmlStr dataUsingEncoding:NSUTF8StringEncoding allowLossyConversion:YES];
    NSXMLParser *xmlParse = [[[NSXMLParser alloc] initWithData:data] autorelease];
    [xmlParse setDelegate:self];
    [xmlParse parse];
    return [NSString stringWithString:resultString];
}

- (void)dealloc {
    [resultString release];
    [super dealloc];
}

@end

MREntitiesConverter doesn't work on malformed XML. It fails on a simple URL, because the bare & characters are not well-formed XML:

http://www.google.com/search?client=safari&rls=en&q=fail&ie=UTF-8&oe=UTF-8

The MREntitiesConverter above is an HTML stripper, not an encoder.

If you need an encoder, go here: Encode NSString for XML/HTML

Check out my NSString category for XMLEntities. There are methods to decode XML entities (including all HTML character references), encode XML entities, strip tags, and remove newlines and whitespace from a string:

- (NSString *)stringByStrippingTags;
- (NSString *)stringByDecodingXMLEntities; // Including all HTML character references
- (NSString *)stringByEncodingXMLEntities;
- (NSString *)stringWithNewLinesAsBRs;
- (NSString *)stringByRemovingNewLinesAndWhitespace;

This is an easy to use NSString category implementation:

It is far from complete but you can add some missing entities from here: http://code.google.com/p/statz/source/browse/trunk/NSString%2BHTML.m

Usage:

#import "NSString+HTML.h"


NSString *raw = [NSString stringWithFormat:@"<div></div>"];
NSString *escaped = [raw htmlEscapedString];

This is an incredibly hacked-together solution of mine, but if you simply want to unescape a string without worrying about parsing, do this:

-(NSString *)htmlEntityDecode:(NSString *)string
{
    string = [string stringByReplacingOccurrencesOfString:@"&quot;" withString:@"\""];
    string = [string stringByReplacingOccurrencesOfString:@"&apos;" withString:@"'"];
    string = [string stringByReplacingOccurrencesOfString:@"&lt;" withString:@"<"];
    string = [string stringByReplacingOccurrencesOfString:@"&gt;" withString:@">"];
    string = [string stringByReplacingOccurrencesOfString:@"&amp;" withString:@"&"]; // Do this last so that, e.g. @"&amp;lt;" goes to @"&lt;" not @"<"

    return string;
}

I know it's by no means elegant, but it gets the job done. You can then decode an element by calling:

string = [self htmlEntityDecode:string];

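Like I said, it's hacky but it works. If you want to encode a string, just reverse the stringByReplacingOccurrencesOfString parameters, and replace "&" first rather than last so that the ampersands introduced by the other replacements are not escaped a second time.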
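A minimal sketch of the encoding direction, for comparison with the decoder above. The function name HTMLEntityEncode is made up for this example and it covers only the same five entities; note that &apos; is defined in XML and HTML5 but not in HTML 4.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: the mirror image of htmlEntityDecode: above.
static NSString *HTMLEntityEncode(NSString *string)
{
    // Escape "&" first. If it were done last, the ampersands introduced by
    // the other replacements would themselves be turned into "&amp;".
    string = [string stringByReplacingOccurrencesOfString:@"&" withString:@"&amp;"];
    string = [string stringByReplacingOccurrencesOfString:@"<" withString:@"&lt;"];
    string = [string stringByReplacingOccurrencesOfString:@">" withString:@"&gt;"];
    string = [string stringByReplacingOccurrencesOfString:@"\"" withString:@"&quot;"];
    string = [string stringByReplacingOccurrencesOfString:@"'" withString:@"&apos;"];
    return string;
}
```

The order is the opposite of the decoder's: decoding handles &amp; last, encoding handles & first.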

Why not just use this?

NSData *data = [s dataUsingEncoding:NSUTF8StringEncoding allowLossyConversion:YES];
NSString *result = [[[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding] autorelease];
return result;

Noob question but in my case it works...

Another HTML NSString category from Google Toolbox for Mac
Despite the name, this works on iOS too.

http://google-toolbox-for-mac.googlecode.com/svn/trunk/Foundation/GTMNSString+HTML.h

/// Get a string where internal characters that are escaped for HTML are unescaped
//
///  For example, '&amp;' becomes '&'
///  Handles &#32; and &#x32; cases as well
///
//  Returns:
//    Autoreleased NSString
//
- (NSString *)gtm_stringByUnescapingFromHTML;

And I had to include only three files in the project: header, implementation and GTMDefines.h.
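For illustration only, here is roughly what handling numeric references such as &#32; and &#x32; involves. This is not GTM's implementation; DecodeNumericCharacterReferences is a hypothetical helper that deliberately ignores named entities like &amp;.

```objc
#import <Foundation/Foundation.h>
#include <stdlib.h>

// Hypothetical helper: decodes decimal (&#38;) and hex (&#x26;) references only.
static NSString *DecodeNumericCharacterReferences(NSString *input)
{
    if (input == nil) return nil;
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:@"&#(?:[xX]([0-9A-Fa-f]{1,6})|([0-9]{1,7}));"
                             options:0
                               error:NULL];
    NSMutableString *result = [NSMutableString stringWithString:input];
    NSArray *matches = [regex matchesInString:input
                                      options:0
                                        range:NSMakeRange(0, [input length])];
    // Replace from the end so the ranges of earlier matches stay valid.
    for (NSTextCheckingResult *match in [matches reverseObjectEnumerator]) {
        NSRange hexRange = [match rangeAtIndex:1];
        BOOL isHex = (hexRange.location != NSNotFound);
        NSRange digitRange = isHex ? hexRange : [match rangeAtIndex:2];
        NSString *digits = [input substringWithRange:digitRange];
        unsigned long value = strtoul([digits UTF8String], NULL, isHex ? 16 : 10);
        // Leave NUL, surrogate values and out-of-range values untouched.
        if (value == 0 || value > 0x10FFFF || (value >= 0xD800 && value <= 0xDFFF)) {
            continue;
        }
        unichar units[2];
        NSUInteger count = 1;
        if (value < 0x10000) {
            units[0] = (unichar)value;
        } else {
            // Code points above U+FFFF need a UTF-16 surrogate pair.
            unsigned long v = value - 0x10000;
            units[0] = (unichar)(0xD800 + (v >> 10));
            units[1] = (unichar)(0xDC00 + (v & 0x3FF));
            count = 2;
        }
        [result replaceCharactersInRange:[match range]
                              withString:[NSString stringWithCharacters:units length:count]];
    }
    return result;
}
```

With this sketch, &#32; becomes a space and &#x32; becomes the digit 2, since hex 32 is decimal 50.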

Here's a solution that neutralizes all characters (by turning each one into an HTML hex entity for its Unicode value). I used this to make sure that a string that came from the user but was placed inside a web view couldn't carry an XSS attack:

Interface:

@interface NSString (escape)
- (NSString*)stringByEncodingHTMLEntities;
@end

Implementation:

@implementation NSString (escape)


- (NSString*)stringByEncodingHTMLEntities {
    // Rather than mapping each individual entity and checking if it needs to be replaced, we simply replace every character with the hex entity.
    // Note: characterAtIndex: returns UTF-16 code units, so a character outside the BMP comes out as two surrogate entities.
    NSMutableString *resultString = [NSMutableString string];
    for (NSUInteger pos = 0; pos < [self length]; pos++)
        [resultString appendFormat:@"&#x%x;", [self characterAtIndex:pos]];
    return [NSString stringWithString:resultString];
}


@end

Usage Example:

UIWebView *webView = [[UIWebView alloc] init];
NSString *userInput = @"<script>alert('This is an XSS ATTACK!');</script>";
NSString *safeInput = [userInput stringByEncodingHTMLEntities];
[webView loadHTMLString:safeInput baseURL:nil];

Your mileage will vary.
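One limitation of the loop above is that it works on UTF-16 code units, so a character outside the Basic Multilingual Plane (most emoji, for example) is written as two surrogate entities rather than one. A possible variant that combines surrogate pairs into a single code point is sketched below; the category and method names are invented for this example.

```objc
#import <Foundation/Foundation.h>

// Hypothetical category; the names are not from any library.
@interface NSString (CodePointEscape)
- (NSString *)xyz_stringByEncodingCodePointsAsHTMLEntities;
@end

@implementation NSString (CodePointEscape)

- (NSString *)xyz_stringByEncodingCodePointsAsHTMLEntities
{
    NSMutableString *result = [NSMutableString string];
    NSUInteger length = [self length];
    for (NSUInteger i = 0; i < length; i++) {
        uint32_t codePoint = [self characterAtIndex:i];
        // A high surrogate followed by a low surrogate forms one code point.
        if (codePoint >= 0xD800 && codePoint <= 0xDBFF && i + 1 < length) {
            unichar low = [self characterAtIndex:i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                codePoint = 0x10000 + ((codePoint - 0xD800) << 10) + (low - 0xDC00);
                i++; // the low surrogate has been consumed
            }
        }
        // An unpaired surrogate falls through and is emitted unchanged.
        [result appendFormat:@"&#x%x;", codePoint];
    }
    return result;
}

@end
```

It is used the same way as the original method, for example [userInput xyz_stringByEncodingCodePointsAsHTMLEntities].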

This is an old answer that I posted some years ago. My intention was not to provide a "good" and "respectable" solution, but a "hacky" one that might be useful under some circumstances. Please don't use this solution unless nothing else works.

Actually, it works perfectly fine in many situations that other answers don't because the UIWebView is doing all the work. And you can even inject some javascript (which can be dangerous and/or useful). The performance should be horrible, but actually is not that bad.

There is another solution that has to be mentioned. Just create a UIWebView, load the encoded string and get the text back. It escapes tags "<>", and also decodes all HTML entities (e.g. "&gt;"), and it might work where others don't (e.g. with Cyrillic text). I don't think it's the best solution, but it can be useful if the above solutions don't work.

Here is a small example using ARC:

@interface YourClass() <UIWebViewDelegate>


@property UIWebView *webView;


@end


@implementation YourClass


- (void)someMethodWhereYouGetTheHtmlString:(NSString *)htmlString {
    self.webView = [[UIWebView alloc] init];
    self.webView.delegate = self; // set the delegate before loading so no callback is missed
    NSString *page = [NSString stringWithFormat:@"<html><body>%@</body></html>", htmlString];
    [self.webView loadHTMLString:page baseURL:nil];
}


- (void)webView:(UIWebView *)webView didFailLoadWithError:(NSError *)error {
    self.webView = nil;
}


- (void)webViewDidFinishLoad:(UIWebView *)webView {
    // Read the text first; once self.webView is nil there is nothing left to query
    NSString *escapedString = [webView stringByEvaluatingJavaScriptFromString:@"document.body.textContent;"];
    self.webView = nil;
    // Use escapedString here
}


- (void)webViewDidStartLoad:(UIWebView *)webView {
// Do Nothing
}


@end

In iOS 7 and later you can use NSAttributedString's ability to import HTML to convert HTML entities to an NSString.

E.g.:

@interface NSAttributedString (HTML)
+ (instancetype)attributedStringWithHTMLString:(NSString *)htmlString;
@end


@implementation NSAttributedString (HTML)
+ (instancetype)attributedStringWithHTMLString:(NSString *)htmlString
{
    NSDictionary *options = @{ NSDocumentTypeDocumentAttribute : NSHTMLTextDocumentType,
                               NSCharacterEncodingDocumentAttribute : @(NSUTF8StringEncoding) };

    NSData *data = [htmlString dataUsingEncoding:NSUTF8StringEncoding];

    return [[NSAttributedString alloc] initWithData:data options:options documentAttributes:nil error:nil];
}


@end

Then in your code when you want to clean up the entities:

NSString *cleanString = [[NSAttributedString attributedStringWithHTMLString:question.title] string];

This is probably the simplest way, but I don't know how performant it is. You should probably be pretty damn sure the content you're "cleaning" doesn't contain any <img> tags or stuff like that, because this method will download those images during the HTML to NSAttributedString conversion. :)

If you need to generate a literal you might consider using a tool like this:

http://www.freeformatter.com/java-dotnet-escape.html#ad-output

to accomplish the work for you.

See also this answer.

The easiest solution is to create a category as below:

Here’s the category’s header file:

#import <Foundation/Foundation.h>
@interface NSString (URLEncoding)
-(NSString *)urlEncodeUsingEncoding:(NSStringEncoding)encoding;
@end

And here’s the implementation:

#import "NSString+URLEncoding.h"
@implementation NSString (URLEncoding)
-(NSString *)urlEncodeUsingEncoding:(NSStringEncoding)encoding {
    // The CF function follows the Create rule, so balance it with autorelease (manual reference counting)
    return [(NSString *)CFURLCreateStringByAddingPercentEscapes(NULL,
            (CFStringRef)self,
            NULL,
            (CFStringRef)@"!*'\"();:@&=+$,/?%#[] ",
            CFStringConvertNSStringEncodingToEncoding(encoding)) autorelease];
}
@end

And now we can simply do this:

NSString *raw = @"hell & brimstone + earthly/delight";
NSString *url = [NSString stringWithFormat:@"http://example.com/example?param=%@",
                 [raw urlEncodeUsingEncoding:NSUTF8StringEncoding]];
NSLog(@"%@", url); // never pass a variable as the format string

Credit for this answer goes to the website below:

http://madebymany.com/blog/url-encoding-an-nsstring-on-ios

The least invasive and most lightweight way to encode and decode HTML or XML strings is to use the GTMNSStringHTMLAdditions CocoaPod.

It is simply the Google Toolbox for Mac NSString category GTMNSString+HTML, stripped of the dependency on GTMDefines.h. So all you need to add is one .h and one .m, and you're good to go.

Example:

#import "GTMNSString+HTML.h"


// Encoding a string with XML / HTML elements
NSString *stringToEncode = @"<TheBeat>Goes On</TheBeat>";
NSString *encodedString = [stringToEncode gtm_stringByEscapingForHTML];


// encodedString looks like this now:
// &lt;TheBeat&gt;Goes On&lt;/TheBeat&gt;


// Decoding a string with XML / HTML encoded elements
NSString *stringToDecode = @"&lt;TheBeat&gt;Goes On&lt;/TheBeat&gt;";
NSString *decodedString = [stringToDecode gtm_stringByUnescapingFromHTML];


// decodedString looks like this now:
// <TheBeat>Goes On</TheBeat>