Parsing HTML easily in Objective-C with ObjectiveGumbo

Previously, if you wanted to parse HTML on iOS/OSX using Objective-C, you had to use an XML parser. I have now written an Objective-C wrapper around Gumbo, Google’s new C HTML5 parsing library, so that you can parse HTML with minimum hassle.

To get started, you will first need to download the Gumbo source code from its GitHub repo and follow its getting started instructions.

Once you have a working local copy of Gumbo (you can validate this by running one of the samples), you should download the source code for ObjectiveGumbo from GitHub. To add ObjectiveGumbo to your Xcode project, do the following:

  1. Add the ObjectiveGumbo folder from the repository – this contains the source code for the framework, which is basically just a few classes
  2. Add the src folder from the Gumbo repo (only the .c and .h files)
  3. Ensure that Xcode is set to compile all .c and .m files (Xcode 5 doesn’t do this by default when adding files to a project)
  4. Validate that the project builds correctly
  5. Import “ObjectiveGumbo.h” when you want to use it in your app

Using ObjectiveGumbo is pretty simple. You can either parse a whole document – which gives you access to the DOCTYPE information – or just the root element (i.e. the body). Either can be loaded from an instance of NSData, NSURL or NSString.

Here is a simple example that fetches the current top stories on Hacker News:

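// Download and parse the Hacker News front page
OGNode * data = [ObjectiveGumbo parseDocumentWithUrl:[NSURL URLWithString:@"http://news.ycombinator.com"]];
// Collect every element with the class "title"
NSArray * tableRows = [data elementsWithClass:@"title"];
for (OGElement * tableRow in tableRows)
{
    // Only look at title elements that have more than one child
    if (tableRow.children.count > 1)
    {
        // Treat the first child as the story link and log its URL
        OGElement * link = tableRow.children[0];
        NSLog(@"%@", link.attributes[@"href"]);
    }
}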
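
The loop above takes it on trust that the first child of each title element is a link. A slightly more defensive sketch, which parses from an NSString instead of a URL, might look like the following. Note the assumptions: LinksInHTML is a hypothetical helper that is not part of ObjectiveGumbo, and parseDocumentWithString: is assumed to exist alongside parseDocumentWithUrl: (NSString input is supported, but the exact method name is not given here, so check the README).

```objc
#import <Foundation/Foundation.h>
#import "ObjectiveGumbo.h"

// Hypothetical helper (not part of ObjectiveGumbo): returns the href of the
// first child of every element that carries the given class.
static NSArray * LinksInHTML(NSString * html, NSString * className)
{
    // Assumption: parseDocumentWithString: mirrors parseDocumentWithUrl:
    OGNode * document = [ObjectiveGumbo parseDocumentWithString:html];
    NSMutableArray * links = [NSMutableArray array];

    for (OGElement * element in [document elementsWithClass:className])
    {
        // Nothing to inspect if the element has no children
        if (element.children.count == 0) continue;

        // The first child may be a text node rather than an element, so check
        // its class before reading attributes from it
        id child = element.children[0];
        if (![child isKindOfClass:[OGElement class]]) continue;

        // Keep only children that actually have an href attribute
        NSString * href = ((OGElement *)child).attributes[@"href"];
        if (href != nil) [links addObject:href];
    }
    return links;
}
```

You could call it with a small literal, for example LinksInHTML(@"<table><tr><td class=\"title\"><a href=\"http://example.com\">Example</a></td></tr></table>", @"title"), and log the returned array. Parsing from a string like this also keeps the download separate from the parsing, so you can fetch the page however you prefer and pass the result in.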

More detail is given in the README, and there is also a more developed Hacker News example (iOS) and a simple Markdown to HTML parser (OSX – not complete) in the GitHub repo.

3 thoughts on “Parsing HTML easily in Objective-C with ObjectiveGumbo”

  1. I don’t think you have to add ‘src’ file since there is ‘Gumbo’ folder in the file uploaded. You’ll get an error if you do. Also, there is a line with @”command” which produces an error. There was no command in tag.c, so I commented the line out.
