Module htmllib :: Class HTMLParser

Class HTMLParser

ParserBase --+    
             |    
    SGMLParser --+
                 |
                HTMLParser


This is the basic HTML parser class.

It supports all entity names required by the XHTML 1.0 Recommendation. It also defines handlers for all HTML 2.0 and many HTML 3.0 and 3.2 elements.

Method Summary
  __init__(self, formatter, verbose)
Creates an instance of the HTMLParser class.
  anchor_bgn(self, href, name, type)
This method is called at the start of an anchor region.
  anchor_end(self)
This method is called at the end of an anchor region.
  ddpop(self, bl)
  do_base(self, attrs)
  do_br(self, attrs)
  do_dd(self, attrs)
  do_dt(self, attrs)
  do_hr(self, attrs)
  do_img(self, attrs)
  do_isindex(self, attrs)
  do_li(self, attrs)
  do_link(self, attrs)
  do_meta(self, attrs)
  do_nextid(self, attrs)
  do_p(self, attrs)
  do_plaintext(self, attrs)
  end_a(self)
  end_address(self)
  end_b(self)
  end_blockquote(self)
  end_body(self)
  end_cite(self)
  end_code(self)
  end_dir(self)
  end_dl(self)
  end_em(self)
  end_h1(self)
  end_h2(self)
  end_h3(self)
  end_h4(self)
  end_h5(self)
  end_h6(self)
  end_head(self)
  end_html(self)
  end_i(self)
  end_kbd(self)
  end_listing(self)
  end_menu(self)
  end_ol(self)
  end_pre(self)
  end_samp(self)
  end_strong(self)
  end_title(self)
  end_tt(self)
  end_ul(self)
  end_var(self)
  end_xmp(self)
  error(self, message)
  handle_data(self, data)
  handle_image(self, src, alt, *args)
This method is called to handle images.
  reset(self)
Reset this instance.
  save_bgn(self)
Begins saving character data in a buffer instead of sending it to the formatter object.
  save_end(self)
Ends buffering character data and returns all data saved since the preceding call to the save_bgn() method.
  start_a(self, attrs)
  start_address(self, attrs)
  start_b(self, attrs)
  start_blockquote(self, attrs)
  start_body(self, attrs)
  start_cite(self, attrs)
  start_code(self, attrs)
  start_dir(self, attrs)
  start_dl(self, attrs)
  start_em(self, attrs)
  start_h1(self, attrs)
  start_h2(self, attrs)
  start_h3(self, attrs)
  start_h4(self, attrs)
  start_h5(self, attrs)
  start_h6(self, attrs)
  start_head(self, attrs)
  start_html(self, attrs)
  start_i(self, attrs)
  start_kbd(self, attrs)
  start_listing(self, attrs)
  start_menu(self, attrs)
  start_ol(self, attrs)
  start_pre(self, attrs)
  start_samp(self, attrs)
  start_strong(self, attrs)
  start_title(self, attrs)
  start_tt(self, attrs)
  start_ul(self, attrs)
  start_var(self, attrs)
  start_xmp(self, attrs)
  unknown_endtag(self, tag)
  unknown_starttag(self, tag, attrs)
    Inherited from SGMLParser
  close(self)
Handle the remaining data.
  feed(self, data)
Feed some data to the parser.
  finish_endtag(self, tag)
  finish_shorttag(self, tag, data)
  finish_starttag(self, tag, attrs)
  get_starttag_text(self)
  goahead(self, end)
  handle_charref(self, name)
Handle character reference, no need to override.
  handle_comment(self, data)
  handle_decl(self, decl)
  handle_endtag(self, tag, method)
  handle_entityref(self, name)
Handle entity references.
  handle_pi(self, data)
  handle_starttag(self, tag, method, attrs)
  parse_endtag(self, i)
  parse_pi(self, i)
  parse_starttag(self, i)
  report_unbalanced(self, tag)
  setliteral(self, *args)
Enter literal mode (CDATA).
  setnomoretags(self)
Enter literal mode (CDATA) till EOF.
  unknown_charref(self, ref)
  unknown_entityref(self, ref)
    Inherited from ParserBase
  getpos(self)
Return current line number and offset.
  parse_comment(self, i, report)
  parse_declaration(self, i)
  parse_marked_section(self, i, report)
  unknown_decl(self, data)
  updatepos(self, i, j)

Class Variable Summary
dict entitydefs = {'zwnj': '&#8204;', 'aring': '\xe5', 'gt': ...

Method Details

__init__(self, formatter, verbose=0)
(Constructor)

Creates an instance of the HTMLParser class.

The formatter parameter is the formatter instance associated with the parser.
Overrides:
sgmllib.SGMLParser.__init__

anchor_bgn(self, href, name, type)

This method is called at the start of an anchor region.

The arguments correspond to the attributes of the <A> tag with the same names. The default implementation maintains a list of hyperlinks (defined by the HREF attribute for <A> tags) within the document. The list of hyperlinks is available as the data attribute anchorlist.

anchor_end(self)

This method is called at the end of an anchor region.

The default implementation adds a textual footnote marker using an index into the list of hyperlinks created by the anchor_bgn() method.
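
The anchor handling described above can be illustrated with a small sketch. The htmllib module is available only in Python 2, so this example uses Python 3's html.parser.HTMLParser as a stand-in. The class name AnchorSketch, the in_anchor flag, the output list (standing in for the formatter) and the "[n]" marker format are assumptions made for illustration, not the htmllib implementation.

```python
from html.parser import HTMLParser  # Python 3 stand-in; htmllib is Python 2 only


class AnchorSketch(HTMLParser):
    """Hypothetical class imitating the documented anchor behaviour."""

    def __init__(self):
        super().__init__()
        self.anchorlist = []    # hyperlinks collected from HREF attributes
        self.in_anchor = False  # True while inside an <a href=...> region
        self.output = []        # stands in for the formatter object

    def handle_starttag(self, tag, attrs):
        # Plays the role of anchor_bgn(): record the hyperlink.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.anchorlist.append(href)
                self.in_anchor = True

    def handle_endtag(self, tag):
        # Plays the role of anchor_end(): emit a footnote marker whose
        # number is the 1-based position of the link in anchorlist.
        if tag == "a" and self.in_anchor:
            self.output.append("[%d]" % len(self.anchorlist))
            self.in_anchor = False

    def handle_data(self, data):
        self.output.append(data)


parser = AnchorSketch()
parser.feed('See <a href="http://example.com/a">A</a> and '
            '<a href="http://example.com/b">B</a>.')
parser.close()
text = "".join(parser.output)
print(text)
print(parser.anchorlist)
```

After parsing, anchorlist holds the two hyperlinks in document order, and each anchor's text is followed by its marker.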

handle_image(self, src, alt, *args)

This method is called to handle images.

The default implementation simply passes the alt value to the handle_data() method.
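
A minimal sketch of this default behaviour follows, again using Python 3's html.parser.HTMLParser as a stand-in because htmllib is Python 2 only. The class name ImageSketch and the output list are hypothetical; only the handle_image(src, alt, *args) signature and the "pass alt to handle_data()" rule come from the text above.

```python
from html.parser import HTMLParser  # Python 3 stand-in; htmllib is Python 2 only


class ImageSketch(HTMLParser):
    """Hypothetical class imitating the documented handle_image() default."""

    def __init__(self):
        super().__init__()
        self.output = []  # stands in for the formatter object

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attrs = dict(attrs)
            self.handle_image(attrs.get("src", ""), attrs.get("alt", ""))

    def handle_image(self, src, alt, *args):
        # Default behaviour as documented: forward the alt text as data.
        self.handle_data(alt)

    def handle_data(self, data):
        self.output.append(data)


parser = ImageSketch()
parser.feed('<p>Logo: <img src="logo.png" alt="ACME logo"></p>')
parser.close()
image_text = "".join(parser.output)
print(image_text)
```

A subclass that wants to render or fetch images would override handle_image() and leave the rest unchanged.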

reset(self)

Reset this instance. Loses all unprocessed data.
Overrides:
sgmllib.SGMLParser.reset (inherited documentation)

save_bgn(self)

Begins saving character data in a buffer instead of sending it to the formatter object.

Retrieve the stored data via the save_end() method. Calls to the save_bgn() / save_end() pair may not be nested.

save_end(self)

Ends buffering character data and returns all data saved since the preceding call to the save_bgn() method.

If the nofill flag is false, whitespace is collapsed to single spaces. A call to this method without a preceding call to the save_bgn() method will raise a TypeError exception.
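
The buffering contract of save_bgn() and save_end() can be modelled in a few lines. This is a self-contained sketch, not the htmllib source: the class name SaveBufferSketch, the savedata and sent attributes, and the explicit TypeError are assumptions chosen to mirror the behaviour documented above.

```python
class SaveBufferSketch:
    """Hypothetical model of the save_bgn() / save_end() contract."""

    def __init__(self):
        self.savedata = None  # None means "not currently saving"
        self.nofill = False   # when false, save_end() collapses whitespace
        self.sent = []        # stands in for data sent to the formatter

    def handle_data(self, data):
        if self.savedata is not None:
            self.savedata += data   # buffering: keep the data locally
        else:
            self.sent.append(data)  # normal path: pass data onward

    def save_bgn(self):
        self.savedata = ""

    def save_end(self):
        data, self.savedata = self.savedata, None
        if data is None:
            # Mirrors the documented error for an unpaired call.
            raise TypeError("save_end() called without save_bgn()")
        if not self.nofill:
            data = " ".join(data.split())  # collapse whitespace runs
        return data


p = SaveBufferSketch()
p.handle_data("before ")
p.save_bgn()
p.handle_data("  My \n  Title ")
title = p.save_end()
p.handle_data("after")
print(repr(title), p.sent)
```

Because a single buffer is used, a second save_bgn() before save_end() would discard the data saved so far, which is one way to see why the pair may not be nested.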

Class Variable Details

entitydefs

Type:
dict
Value:
{'Chi': '&#935;',
 'Egrave': '\xc8',
 'aring': '\xe5',
 'bull': '&#8226;',
 'gt': '>',
 'ograve': '\xf2',
 'trade': '&#8482;',
 'yen': '\xa5',
...                                                                    
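
The values listed above appear to follow one rule: entities whose code point fits in Latin-1 map to a single character, and all others map to a numeric character reference. The sketch below reproduces that pattern with Python 3's html.entities.name2codepoint table. The helper name py2_style_entitydef is hypothetical, and the rule is inferred from the listing rather than taken from the htmllib source.

```python
from html.entities import name2codepoint  # Python 3 standard library


def py2_style_entitydef(name):
    """Return a value shaped like the entitydefs entries listed above."""
    codepoint = name2codepoint[name]
    if codepoint <= 0xFF:
        return chr(codepoint)      # Latin-1 range: a single character
    return "&#%d;" % codepoint     # otherwise: numeric character reference


names = ("Chi", "Egrave", "aring", "bull", "gt", "ograve", "trade", "yen")
sample = {name: py2_style_entitydef(name) for name in names}
for name in names:
    print(name, repr(sample[name]))
```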

Generated by Epydoc 2.1 on Sun Apr 22 21:30:28 2007 http://epydoc.sf.net