Content Tagged with parsing + Web

SitePoint Blogs » SimpleXML and namespaces

// Select the children of $item that live in the Dublin Core (dc:) namespace
$ns_dc = $item->children('http://purl.org/dc/elements/1.1/');

XML: del.icio.us/tag/xml
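For readers coming from Python, here is a rough analogue of that SimpleXML call using the standard library's xml.etree.ElementTree (not SimpleXML). The sample <item> document and the dc_children helper are hypothetical. ElementTree stores namespaced tags as {uri}localname, so the sketch filters the children on that prefix.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace URI, the same one passed to SimpleXML's children() above
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical RSS-style item that uses the dc: prefix
ITEM_XML = """<item xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>Example entry</title>
  <dc:creator>Jane Doe</dc:creator>
  <dc:date>2007-01-01</dc:date>
</item>"""

def dc_children(item):
    """Return {local_name: text} for the children of item in the Dublin Core namespace."""
    prefix = "{" + DC_NS + "}"
    result = {}
    for child in item:
        # ElementTree uses Clark notation for namespaced tags: {uri}localname
        if child.tag.startswith(prefix):
            result[child.tag[len(prefix):]] = child.text
    return result

item = ET.fromstring(ITEM_XML)
print(dc_children(item))  # {'creator': 'Jane Doe', 'date': '2007-01-01'}

# A single namespaced child can also be looked up through a prefix map
print(item.find("dc:creator", {"dc": DC_NS}).text)  # Jane Doe
```

The prefix-map form of find() is usually shorter when only one element is needed.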

therning.org/magnus » Blog Archive » TagSoup, meet Parsec!

"Recently I began writing a tool to scrape some information off a web site for some off-line processing. After writing up the basics using TagSoup I showed what I had to a colleague. His first comment was "Can't you use Parsec for that?" It took me a...

Haskell: del.icio.us/tag/haskell
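The TagSoup-then-Parsec idea (flatten the page into a tag stream, then run a parser over tags rather than characters) can be imitated in Python. This is a minimal sketch with hypothetical helpers (tokenize, open_tag, text, sequence, fmap, find_all); it uses the standard library's html.parser to produce the flat stream and is neither TagSoup nor Parsec.

```python
from html.parser import HTMLParser

class TagStream(HTMLParser):
    """Flatten HTML into ("open", name, attrs), ("close", name) and ("text", s) tokens."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(("open", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.tags.append(("close", tag))

    def handle_data(self, data):
        if data.strip():  # drop whitespace-only text between tags
            self.tags.append(("text", data.strip()))

def tokenize(html):
    stream = TagStream()
    stream.feed(html)
    stream.close()
    return stream.tags

# A parser takes (tokens, position) and returns (value, new_position), or None on failure.
def open_tag(name):
    def parse(toks, i):
        if i < len(toks) and toks[i][0] == "open" and toks[i][1] == name:
            return toks[i][2], i + 1
        return None
    return parse

def close_tag(name):
    def parse(toks, i):
        if i < len(toks) and toks[i][0] == "close" and toks[i][1] == name:
            return name, i + 1
        return None
    return parse

def text(toks, i):
    if i < len(toks) and toks[i][0] == "text":
        return toks[i][1], i + 1
    return None

def sequence(*parsers):
    """Run parsers one after another; fail if any of them fails."""
    def parse(toks, i):
        values = []
        for p in parsers:
            result = p(toks, i)
            if result is None:
                return None
            value, i = result
            values.append(value)
        return values, i
    return parse

def fmap(f, parser):
    """Transform the value of a successful parse."""
    def parse(toks, i):
        result = parser(toks, i)
        if result is None:
            return None
        value, i = result
        return f(value), i
    return parse

def find_all(parser, toks):
    """Scan the stream, collecting every position where parser matches."""
    found, i = [], 0
    while i < len(toks):
        result = parser(toks, i)
        if result is None:
            i += 1
        else:
            value, new_i = result
            found.append(value)
            i = max(new_i, i + 1)  # always advance, even on an empty match
    return found

cell = fmap(lambda v: v[1], sequence(open_tag("td"), text, close_tag("td")))
row = fmap(tuple, sequence(cell, cell))

page = "<table><tr><td>lang</td><td>Haskell</td></tr><tr><td>lib</td><td>Parsec</td></tr></table>"
print(find_all(row, tokenize(page)))  # [('lang', 'Haskell'), ('lib', 'Parsec')]
```

Working on tokens rather than characters is what lets small, composable rules like cell and row ignore everything else on the page.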

Ian Bicking: a blog :: lxml.html

lxml.html.ElementSoup.parse() can parse pages with BeautifulSoup into lxml data structures. While the native lxml/libxml2 HTML parser works on pretty bad HTML, BeautifulSoup works on really bad HTML.

XML: del.icio.us/tag/xml
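The entry's point is that parsers differ in how much broken markup they tolerate. A minimal standard-library illustration of the two extremes, assuming Python 3 (it uses neither lxml nor BeautifulSoup): the strict XML parser in xml.etree.ElementTree rejects unclosed tags outright, while the event-driven html.parser keeps going. BAD_HTML and the helper names are made up for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

BAD_HTML = "<p>Unclosed <b>bold<p>Second paragraph"

def strict_parse_ok(markup):
    """True if a strict XML parser accepts the markup as a well-formed document."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

class TextCollector(HTMLParser):
    """Lenient HTML parser that simply gathers text content."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def lenient_text(markup):
    collector = TextCollector()
    collector.feed(markup)
    collector.close()  # flush any text still buffered at the end
    return "".join(collector.chunks)

print(strict_parse_ok(BAD_HTML))  # False
print(lenient_text(BAD_HTML))     # Unclosed boldSecond paragraph
```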

xmltramp: Make XML documents easily accessible.

Everyone's got their data in XML these days. You need to read it. You've looked at the other XML APIs and they all contain...

XML: del.icio.us/tag/xml
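The phrase "easily accessible" can be illustrated with a small attribute-access wrapper over ElementTree, so that doc.channel.title reads like the XML path. This Node class is hypothetical and is not xmltramp's actual API; it only sketches the general idea, assuming well-formed input.

```python
import xml.etree.ElementTree as ET

class Node:
    """Hypothetical wrapper giving attribute-style access to an ElementTree element.

    Not xmltramp's API: a sketch of the 'easily accessible' idea only.
    """

    def __init__(self, element):
        self._el = element

    def __getattr__(self, name):
        # Only called when normal lookup fails: doc.channel -> first <channel> child
        child = self._el.find(name)
        if child is None:
            raise AttributeError(name)
        return Node(child)

    def __getitem__(self, key):
        # doc["version"] -> value of the version="..." attribute
        return self._el.attrib[key]

    def __iter__(self):
        # Iterate over child elements, each wrapped in a Node
        return (Node(child) for child in self._el)

    def __str__(self):
        return self._el.text or ""

doc = Node(ET.fromstring(
    '<rss version="2.0"><channel><title>Example feed</title></channel></rss>'))
print(doc["version"])          # 2.0
print(str(doc.channel.title))  # Example feed
```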

Beautiful Soup: We called him Tortoise because he taught us.

"You didn't write that awful page. You're just trying to get some data out of it. Right now, you don't really care what HTML is supposed to look like. Neither does this parser."

XML: del.icio.us/tag/xml
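In the same spirit of getting data out of a page you did not write, here is a minimal sketch with the standard library's html.parser (not Beautiful Soup) that pulls links out of sloppy markup. The LinkExtractor class and the sample markup are illustrative. html.parser lowercases tag and attribute names and does not care about unclosed elements, because it never builds a tree.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from <a> tags, whatever the surrounding markup."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> we are currently inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

# Uppercase tags, an unquoted attribute, unclosed <li> and <p>
AWFUL = '<UL><LI><A HREF=one.html>One</A><LI><a href="two.html">Two</a></UL><p>trailing'

parser = LinkExtractor()
parser.feed(AWFUL)
parser.close()
print(parser.links)  # [('one.html', 'One'), ('two.html', 'Two')]
```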

Universal Feed Parser in Ruby

"rFeedParser is a translation of Mark Pilgrim’s Universal Feed Parser from Python into Ruby. It has nearly the exact same behavior."

XML: del.icio.us/tag/xml
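A very small Python sketch of the core job such a feed parser does for RSS 2.0, assuming the feed is well-formed XML. The rss_entries helper and the sample feed are hypothetical; the Universal Feed Parser itself handles many RSS and Atom variants and malformed feeds, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET

# A hypothetical, well-formed RSS 2.0 feed
RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""

def rss_entries(xml_text):
    """Return a list of {'title', 'link'} dicts, one per RSS 2.0 <item>."""
    root = ET.fromstring(xml_text)
    entries = []
    for item in root.iterfind("./channel/item"):
        entries.append({
            # findtext returns the default if the child element is missing
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return entries

for entry in rss_entries(RSS):
    print(entry["title"], entry["link"])
```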

lxml

lxml is a Pythonic binding for the libxml2 and libxslt libraries. It is unique in that it combines the speed and feature completeness of these libraries with the simplicity of a native Python API.

libxml: del.icio.us/tag/libxml2
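lxml's etree module follows the ElementTree API, so simple queries like the one below are often portable between it and the standard library. This sketch uses xml.etree.ElementTree so it runs without lxml installed; the sample document is made up, and the closing comment marks where lxml's full XPath support would differ.

```python
import xml.etree.ElementTree as ET

# With lxml installed, `from lxml import etree as ET` offers a largely
# compatible API; everything below is standard-library ElementTree.
DOC = """<library>
  <book lang="en"><title>First title</title></book>
  <book lang="de"><title>Zweiter Titel</title></book>
  <book lang="en"><title>Third title</title></book>
</library>"""

root = ET.fromstring(DOC)

# ElementTree understands a limited XPath subset, including [@attr='value'] predicates
english = [book.findtext("title") for book in root.findall("./book[@lang='en']")]
print(english)  # ['First title', 'Third title']

# lxml would additionally accept full XPath 1.0, for example:
#   root.xpath("//book[@lang='en']/title/text()")
```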

The ElementSoup Module ::: www.effbot.org

The ElementSoup module is a (slightly experimental) wrapper for Leonard Richardson's robust BeautifulSoup HTML parser, which turns the BeautifulSoup data structure into an element tree. The resulting combo is similar to ElementTidy, but a lot less picky.

XML: del.icio.us/tag/xml
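The ElementSoup idea (a forgiving parser feeding an element tree) can be sketched with the standard library alone. This is not ElementSoup's code and does not use BeautifulSoup: it assumes html.parser as the lenient front end, a hypothetical html-fragment wrapper root, and a deliberately simple recovery rule where an end tag closes everything opened after its matching start tag and stray end tags are ignored.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Void elements never have content, so they are never pushed on the stack
VOID = {"br", "img", "hr", "input", "meta", "link"}

class TreeBuilderParser(HTMLParser):
    """Lenient HTML -> ElementTree, in the spirit of ElementSoup (not its code)."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("html-fragment")  # hypothetical wrapper root
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. <input disabled>) arrive as None; store ""
        el = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID:
            self.stack.append(el)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray end tags
        for depth in range(len(self.stack) - 1, 0, -1):
            if self.stack[depth].tag == tag:
                del self.stack[depth:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):  # text after a child element belongs in that child's tail
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data

def soup_to_tree(markup):
    parser = TreeBuilderParser()
    parser.feed(markup)
    parser.close()
    return parser.root

tree = soup_to_tree("<div><p>One<br>two</div></span><b>end")
print(ET.tostring(tree, encoding="unicode"))
# <html-fragment><div><p>One<br />two</p></div><b>end</b></html-fragment>
```

Real HTML recovery (implied end tags, table fix-ups and so on) is far more involved; that is the kind of work ElementSoup delegates to BeautifulSoup.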

TagSoup home page

This is the home page of TagSoup, a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short. TagSoup is designed for people...

XML: del.icio.us/tag/xml
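TagSoup reports what it finds as SAX events. Python has no TagSoup, but the standard library's html.parser is also callback-driven, so a rough feel for an event stream over tag soup can be had like this. The EventLogger class is hypothetical, and html.parser reports tags exactly as they appear without balancing them, so the stream below is not well-formed.

```python
from html.parser import HTMLParser

class EventLogger(HTMLParser):
    """Record the callback sequence, SAX-style, for a chunk of tag soup."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("chars", data))

log = EventLogger()
log.feed("<TABLE><tr><td>poor<td>nasty</TABLE>")
log.close()
print(log.events)
# [('start', 'table'), ('start', 'tr'), ('start', 'td'), ('chars', 'poor'),
#  ('start', 'td'), ('chars', 'nasty'), ('end', 'table')]
```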

atropine

atropine is a screen-scraping library built on top of BeautifulSoup. It helps programmers make assertions about document structure while getting at the data they are interested in.
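The assert-while-extracting style can be sketched in a few lines. The one() helper and StructureError below are hypothetical (not atropine's API), and they use xml.etree.ElementTree on well-formed markup rather than BeautifulSoup. The helper returns the single element at a path and raises if the page has zero or several, so a layout change fails loudly instead of yielding silently wrong data.

```python
import xml.etree.ElementTree as ET

class StructureError(Exception):
    """Raised when a page does not have the shape the scraper expects."""

def one(root, path):
    """Return the only element matching path; raise StructureError otherwise."""
    matches = root.findall(path)
    if len(matches) != 1:
        raise StructureError(f"expected exactly one match for {path!r}, found {len(matches)}")
    return matches[0]

page = ET.fromstring(
    "<html><body><h1>Prices</h1>"
    "<table><tr><td>apple</td><td>3</td></tr></table>"
    "</body></html>"
)

heading = one(page, "./body/h1").text
cells = [td.text for td in one(page, ".//table/tr").findall("td")]
print(heading, cells)  # Prices ['apple', '3']

try:
    one(page, ".//td")  # two cells, so the structural assertion fails
except StructureError as err:
    print(err)
```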