CyberNeko HTML Parser 1.9.21

About

NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces. The parser can scan HTML files and "fix up" many common mistakes that human (and computer) authors make when writing HTML documents. NekoHTML adds missing parent elements, automatically closes elements with optional end tags, and can handle mismatched inline element tags.
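To make "tag balancing" concrete, here is a toy sketch in plain Java. It is NOT NekoHTML's algorithm or API; the class name `ToyTagBalancer`, the token format (strings such as `"<p>"`, `"</p>"`, or text), and the tiny rule set (only `p` and `li` have optional end tags) are all hypothetical simplifications. It shows the three fix-ups described above: adding a missing parent (`html`), auto-closing an element whose end tag is optional, and closing inner elements when an end tag is mismatched.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;

// Toy illustration of tag balancing; NOT NekoHTML's algorithm or API.
public class ToyTagBalancer {

    // Hypothetical rule set: a start tag for one of these elements implicitly
    // closes an open element of the same name (an "optional end tag").
    private static final Set<String> IMPLICIT_END = Set.of("p", "li");

    // Tokens are either tags like "<p>" / "</p>" or plain text.
    public static List<String> balance(List<String> tokens) {
        List<String> out = new ArrayList<>();
        Deque<String> open = new ArrayDeque<>(); // stack of open element names

        // Add the missing root parent if the input does not start with <html>.
        if (tokens.isEmpty() || !tokens.get(0).equals("<html>")) {
            out.add("<html>");
            open.push("html");
        }

        for (String tok : tokens) {
            if (!tok.startsWith("<")) { // text content passes through unchanged
                out.add(tok);
                continue;
            }
            boolean isEnd = tok.startsWith("</");
            String name = tok.substring(isEnd ? 2 : 1, tok.length() - 1);
            if (!isEnd) {
                // Auto-close an open <p>/<li> when another one of the same name starts.
                if (IMPLICIT_END.contains(name) && name.equals(open.peek())) {
                    out.add("</" + open.pop() + ">");
                }
                out.add(tok);
                open.push(name);
            } else if (open.contains(name)) {
                // Mismatched end tag: close inner elements until the matching one.
                String top;
                do {
                    top = open.pop();
                    out.add("</" + top + ">");
                } while (!top.equals(name));
            }
            // Otherwise the end tag matches no open element and is dropped.
        }

        // Close anything still open at the end of input.
        while (!open.isEmpty()) {
            out.add("</" + open.pop() + ">");
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of("<p>", "one", "<p>", "two", "<b>", "<i>", "x", "</b>");
        // prints <html><p>one</p><p>two<b><i>x</i></b></p></html>
        System.out.println(String.join("", balance(input)));
    }
}
```

A real HTML tag balancer works from full knowledge of each element's permitted content and parents; the stack-of-open-elements idea is only the core of the technique.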

NekoHTML is written using the Xerces Native Interface (XNI), which is the foundation of the Xerces2 implementation. This enables you to use the NekoHTML parser with existing XNI tools without modifying or rewriting code.

License Agreement

The NekoHTML parser is distributed under the Apache 2.0 license. For specific license details, please refer to the LICENSE.txt file.

Download

The NekoHTML parser includes complete Java source code and documentation. You can download the latest version from the following location:

NekoHTML [zip] [tgz]

Requirements and Limitations

This version of NekoHTML requires the following:

Java 1.3 (or higher)
Xerces 2.0.0 (or higher)

This version has the following limitations:

There are HTML documents for which NekoHTML cannot properly generate a well-formed XML document event stream. For example, documents with multiple <html> tags are inherently ill-formed because XML documents may only have a single root element.
Code added to the core DOM implementation in Xerces-J 2.0.1 introduced a bug in the HTML DOM implementation that is built on it. The bug causes element nodes in the resulting HTML document object to be of type org.apache.xerces.dom.ElementNSImpl instead of the appropriate HTML DOM element types. The problem affects NekoHTML users who run the parser with Xerces-J 2.0.1, as well as anyone using the HTML DOM implementation in Xerces-J 2.0.1.
There are no other known major limitations with this release. However, additional work can always be done to improve performance, fix bugs, and add functionality.
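The multiple-`<html>` limitation comes from XML itself rather than from NekoHTML. The sketch below uses only the JDK's standard `javax.xml.parsers` API (not NekoHTML) to show that any conforming XML parser rejects a document with two root elements; the class and method names `SingleRootCheck.isWellFormed` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class SingleRootCheck {

    // Returns true if the text parses as a well-formed XML document
    // using the JDK's built-in DOM parser.
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps stderr quiet.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXParseException e) {
            return false; // not well-formed, e.g. markup after the root element
        } catch (Exception e) {
            throw new IllegalStateException(e); // configuration or I/O problem
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<html><body/></html>"));         // prints true
        System.out.println(isWellFormed("<html></html><html></html>"));   // prints false
    }
}
```

Because the second input can never be a valid XML document, no tag balancer can emit an event stream for it that is both faithful to the source and well-formed.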

More Information

Questions or comments about the CyberNeko HTML Parser can be posted to the appropriate mailing list. The User mailing list is for general parser usage issues and the Developer mailing list is for design discussions.

User mailing list view join post
Developer mailing list view join post
If you find a problem with NekoHTML, please file a bug.
