Class Asciidoctor::Parser
In: lib/asciidoctor/parser.rb
Parent: Object

Public: Methods to parse lines of AsciiDoc into an object hierarchy representing the structure of the document. All methods are class methods and should be invoked from the Parser class. The main entry point is ::next_block. No Parser instances shall be discovered running around. (Any attempt to instantiate a Parser will be futile).

The object hierarchy created by the Parser consists of zero or more Section and Block objects. Section objects may be nested and a Section object contains zero or more Block objects. Block objects may be nested, but may only contain other Block objects. Block objects which represent lists may contain zero or more ListItem objects.

Examples

  # Create a Reader for the AsciiDoc lines and retrieve the next block from it.
  # Parser.next_block requires a parent, so we begin by instantiating an empty Document.

  doc = Document.new
  reader = Reader.new lines
  block = Parser.next_block(reader, doc)
  block.class
  # => Asciidoctor::Block

Constants

BlockMatchData = Struct.new :context, :masq, :tip, :terminator
TabRx = /\t/   Regexp for replacing tab character
TabIndentRx = /^\t+/   Regexp for leading tab indentation
StartOfBlockProc = lambda {|l| ((l.start_with? '[') && (BlockAttributeLineRx.match? l)) || (is_delimited_block? l) }
StartOfListProc = lambda {|l| AnyListRx.match? l }
StartOfBlockOrListProc = lambda {|l| (is_delimited_block? l) || ((l.start_with? '[') && (BlockAttributeLineRx.match? l)) || (AnyListRx.match? l) }
NoOp = nil
TableCellHorzAlignments = { '<' => 'left', '>' => 'right', '^' => 'center' }   Internal: A Hash mapping horizontal alignment abbreviations to alignments that can be applied to a table cell (or to all cells in a column)
TableCellVertAlignments = { '<' => 'top', '>' => 'bottom', '^' => 'middle' }   Internal: A Hash mapping vertical alignment abbreviations to alignments that can be applied to a table cell (or to all cells in a column)
TableCellStyles = { 'd' => :none, 's' => :strong, 'e' => :emphasis, 'm' => :monospaced, 'h' => :header, 'l' => :literal, 'v' => :verse, 'a' => :asciidoc }   Internal: A Hash mapping style abbreviations to styles that can be applied to a table cell (or to all cells in a column)

Public Class methods

Remove the block indentation (the leading whitespace equal to the amount of leading whitespace of the least indented line), then replace tabs with spaces (using proper tab expansion logic) and, finally, indent the lines by the amount specified.

This method preserves the relative indentation of the lines.

lines  - the Array of String lines to process (no trailing endlines)
indent - the integer number of spaces to add to the beginning
         of each line; if this value is nil, the existing
         space is preserved (optional, default: 0)

Examples

  source = <<EOS
      def names
        @names.split
      end
  EOS

  source.split "\n"
  # => ["    def names", "      @names.split", "    end"]

  lines = source.split "\n"
  Parser.adjust_indentation! lines
  puts lines * "\n"
  # => def names
  # =>   @names.split
  # => end

Returns nothing
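
The steps described above (strip the common indentation, expand tabs, re-indent) can be sketched in plain Ruby as follows. This is only an illustration under stated assumptions: the name sketch_adjust_indentation! and the tab_size parameter are hypothetical, tabs are expanded first to keep the code short, and none of this is the library's implementation.

```ruby
# Hypothetical sketch; not Asciidoctor's implementation.
# Expands tabs to the next tab stop, strips the indentation shared by all
# non-blank lines, then indents by `indent` spaces. Modifies `lines` in place.
def sketch_adjust_indentation!(lines, indent = 0, tab_size = 4)
  return if lines.empty?

  # Expand each tab to the next tab stop (assumed tab_size columns wide).
  lines.map! do |line|
    next line unless line.include?("\t")
    out = +''
    line.each_char do |ch|
      if ch == "\t"
        out << ' ' * (tab_size - (out.length % tab_size))
      else
        out << ch
      end
    end
    out
  end

  # A nil indent means the existing indentation is preserved.
  return if indent.nil?

  # Find the indentation of the least indented non-blank line.
  block_indent = lines.reject { |l| l.strip.empty? }
                      .map { |l| l[/\A */].length }
                      .min || 0

  # Strip the block indent and prepend the requested indent.
  padding = ' ' * indent
  lines.map! { |l| l.strip.empty? ? '' : padding + l[block_indent..-1] }
  nil
end

lines = ["    def names", "      @names.split", "    end"]
sketch_adjust_indentation!(lines)
puts lines
```

Because only the shared indentation is removed, the two extra spaces in front of the second line survive, which is what preserving the relative indentation means.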

Checks whether the line given is an atx section title.

The level returned is 1 less than the number of leading markers.

line - [String] candidate title with leading atx marker.

Returns the [Integer] section level if this line is an atx section title, otherwise nothing.
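
A minimal sketch of the "markers minus one" rule, assuming = is the only marker character and that the markers must be followed by whitespace and a title. The pattern SKETCH_ATX_RX and the method sketch_atx_level are hypothetical names, not the library's regexp or API.

```ruby
# Hypothetical sketch; the pattern is an assumption, not the library's regexp.
# One to six leading "=" markers, then whitespace, then the title text.
SKETCH_ATX_RX = /\A(={1,6})[ \t]+\S/

# Returns the section level (marker count minus 1), or nil if the line
# is not an atx-style title under the assumptions above.
def sketch_atx_level(line)
  match = SKETCH_ATX_RX.match(line)
  match ? match[1].length - 1 : nil
end

puts sketch_atx_level('==== Foo').inspect
puts sketch_atx_level('plain text').inspect
```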

Whether a block supports compound content should be a config setting. If terminator is false, all the lines in the reader should be parsed.

NOTE: a filter could be invoked here, before and after parsing.

Internal: Catalog any callouts found in the text, but don't process them

text     - The String of text in which to look for callouts
document - The current document in which the callouts are stored

Returns a Boolean indicating whether callouts were found

Internal: Catalog any inline anchors found in the text (but don't convert)

text     - The String text in which to look for inline anchors
block    - The block in which the references should be searched
document - The current Document on which the references are stored

Returns nothing

Internal: Catalog the bibliography inline anchor found in the start of the list item (but don't convert)

text     - The String text in which to look for an inline bibliography anchor
block    - The ListItem block in which the reference should be searched
document - The current document in which the reference is stored

Returns nothing

Internal: Initialize a new Section object and assign any attributes provided

The information for this section is retrieved by parsing the lines at the current position of the reader.

reader     - the source reader
parent     - the parent Section or Document of this Section
attributes - a Hash of attributes to assign to this section (default: {})

Public: Determines whether this line is the start of any of the delimited blocks

Returns the match data if this line is the first line of a delimited block, or nil if not

Internal: Convenience API for checking if the next line on the Reader is the document title

reader      - the source Reader
attributes  - a Hash of attributes collected above the current line
leveloffset - an Integer (or integer String value) that represents the current leveloffset

Returns true if the Reader is positioned at the document title, false otherwise

Internal: Checks if the next line on the Reader is a section title

reader     - the source Reader
attributes - a Hash of attributes collected above the current line

Returns the Integer section level if the Reader is positioned at a section title or nil otherwise

Public: Checks whether the lines given are an atx or setext section title.

line1 - [String] candidate title.
line2 - [String] candidate underline (default: nil).

Returns the [Integer] section level if these lines are a section title, otherwise nothing.

Internal: Determine whether this line is a sibling list item according to the list type and trait (marker) provided.

line          - The String line to check
list_type     - The context of the list (:olist, :ulist, :colist, :dlist)
sibling_trait - The String marker for the list or the Regexp to match a sibling

Returns a Boolean indicating whether this line is a sibling list item given the criteria provided

Public: Make sure the Parser object doesn't get initialized.

Raises RuntimeError if this constructor is invoked.
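
The pattern of a class that exposes only class methods and refuses instantiation can be sketched as below. SketchParser, its error message, and its describe method are hypothetical stand-ins for illustration, not the Parser's actual code.

```ruby
# Hypothetical sketch of a class that only offers class methods.
class SketchParser
  # A bare `raise` with a String raises RuntimeError, so any call to
  # SketchParser.new fails as soon as the constructor runs.
  def initialize
    raise 'SketchParser cannot be instantiated; call its class methods instead'
  end

  # Class methods remain callable without an instance.
  def self.describe
    'class-level entry point'
  end
end

puts SketchParser.describe
begin
  SketchParser.new
rescue RuntimeError => e
  puts "refused: #{e.message}"
end
```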

Public: Parse and return the next Block at the Reader's current location

This method begins by skipping over blank lines to find the start of the next block (paragraph, block macro, or delimited block). If a block is found, that block is parsed, initialized as a Block object, and returned. Otherwise, the method returns nothing.

Regular expressions from the Asciidoctor module are used to match block boundaries. The ensuing lines are then processed according to the content model.

reader     - The Reader from which to retrieve the next Block.
parent     - The Document, Section or Block to which the next Block belongs.
attributes - A Hash of attributes that will become the attributes
             associated with the parsed Block (default: {}).
options    - An options Hash to control parsing (default: {}):
             * :text indicates that the parser is only looking for text content

Returns a Block object built from the parsed content of the processed lines, or nothing if no block is found.

Internal: Parse and construct a description list Block from the current position of the Reader

reader - The Reader from which to retrieve the description list
match  - The Regexp match for the head of the list
parent - The parent Block to which this description list belongs

Returns the Block encapsulating the parsed description list

Internal: Parse and construct an item list (ordered or unordered) from the current position of the Reader

reader    - The Reader from which to retrieve the outline list
list_type - A Symbol representing the list type (:olist for ordered, :ulist for unordered)
parent    - The parent Block to which this outline list belongs

Returns the Block encapsulating the parsed outline (unordered or ordered) list

Internal: Parse and construct the next ListItem for the current bulleted (unordered or ordered) list Block, callout lists included, or the next term ListItem and description ListItem pair for the description list Block.

First collect and process all the lines that constitute the next list item for the parent list (according to its type). Next, parse those lines into blocks and associate them with the ListItem (in the case of a description list, the description ListItem). Finally, fold the first block into the item's text attribute according to rules described in ListItem.

reader        - The Reader from which to retrieve the next list item
list_block    - The parent list Block of this ListItem. Also provides access to the list type.
match         - The match Array which contains the marker and text (first-line) of the ListItem
sibling_trait - The list marker or the Regexp to match a sibling item

Returns the next ListItem or ListItem pair (depending on the list type) for the parent list Block.

Public: Return the next section from the Reader.

This method processes block metadata, content and subsections for this section and returns the Section object and any orphaned attributes.

If the parent is a Document and has a header (document title), then this method will put any non-section blocks at the start of document into a preamble Block. If there are no such blocks, the preamble is dropped.

Since we are reading line-by-line, there's a chance that metadata that should be associated with the following block gets consumed. To deal with this case, the method returns a running Hash of "orphaned" attributes that get passed to the next Section or Block.

reader     - the source Reader
parent     - the parent Section or Document of this new section
attributes - a Hash of metadata that was left orphaned from the
             previous Section.

Examples

  source
  # => "= Greetings\n\nThis is my doc.\n\n== Salutations\n\nIt is awesome."

  reader = Reader.new source, nil, :normalize => true
  # create empty document to parent the section
  # and hold attributes extracted from header
  doc = Document.new

  Parser.next_section(reader, doc)[0].title
  # => "Greetings"

  Parser.next_section(reader, doc)[0].title
  # => "Salutations"

Returns a two-element Array containing the Section and Hash of orphaned attributes

Internal: Parse the table contained in the provided Reader

table_reader - a Reader containing the source lines of an AsciiDoc table
parent       - the parent Block of this Asciidoctor::Table
attributes   - attributes captured from above this Block

Returns an instance of Asciidoctor::Table parsed from the provided reader

Public: Parses AsciiDoc source read from the Reader into the Document

This method is the main entry-point into the Parser when parsing a full document. It first looks for and, if found, processes the document title. It then proceeds to iterate through the lines in the Reader, parsing the document into nested Sections and Blocks.

reader   - the Reader holding the source lines of the document
document - the empty Document into which the lines will be parsed
options  - a Hash of options to control processing

Returns the Document object

Internal: Parse the next line if it contains metadata for the following block

This method handles lines with the following content:

  • line or block comment
  • anchor
  • attribute list
  • block title

Any attributes found will be inserted into the attributes argument. If the line contains block metadata, the method returns true, otherwise false.

reader     - the source reader
document   - the current Document
attributes - a Hash of attributes in which any metadata found will be stored
options    - a Hash of options to control processing (default: {}):
             * :text indicates the parser is only looking for text content,
               thus neither a block title nor an attribute entry should be captured

Returns true if the line contains metadata, otherwise false

Internal: Parse lines of metadata until a line of metadata is not found.

This method processes sequential lines containing block metadata, ignoring blank lines and comments.

reader     - the source reader
document   - the current Document
attributes - a Hash of attributes in which any metadata found will be stored (default: {})
options    - a Hash of options to control processing (default: {}):
             * :text indicates that the parser is only looking for text content
               and thus the block title should not be captured

Returns the Hash of attributes including any metadata found

Public: Parse blocks from this reader until there are no more lines.

This method calls Parser.next_block until there are no more lines in the Reader. It does not consider sections because it's assumed the Reader only has lines which are within a delimited block region.

reader - The Reader containing the lines to process
parent - The parent Block to which to attach the parsed blocks

Returns nothing.

Internal: Parse the cell specs for the current cell.

The cell specs dictate the cell's alignments, styles or filters, colspan, rowspan and/or repeating content.

The default spec when pos == :end is {} since we already know we're at a delimiter. When pos == :start, we may be at a delimiter; nil indicates we're not.

Returns the Hash of attributes that indicate how to lay out and style this cell in the table.

Internal: Parse the column specs for this table.

The column specs dictate the number of columns, relative width of columns, default alignments for cells in each column, and/or default styles or filters applied to the cells in the column.

Every column spec is guaranteed to have a width.

Returns a Hash of attributes that specify how to format and lay out the cells in the table.

Public: Parses the document header of the AsciiDoc source read from the Reader

Reads the AsciiDoc source from the Reader until the end of the document header is reached. The Document object is populated with information from the header (document title, document attributes, etc). The document attributes are then saved to establish a save point to roll back to after parsing is complete.

This method assumes that there are no blank lines at the start of the document, because the reader removes them automatically.

Returns the Hash of orphan block attributes captured above the header

Public: Consume and parse the two header lines (line 1 = author info, line 2 = revision info).

Returns the Hash of header metadata. If a Document object is supplied, the metadata is applied directly to the attributes of the Document.

reader   - the Reader holding the source lines of the document
document - the Document we are building (default: nil)

Examples

 data = ["Author Name <author@example.org>\n", "v1.0, 2012-12-21: Coincide w/ end of world.\n"]
 parse_header_metadata(Reader.new data, nil, :normalize => true)
 # => {'author' => 'Author Name', 'firstname' => 'Author', 'lastname' => 'Name', 'email' => 'author@example.org',
 #       'revnumber' => '1.0', 'revdate' => '2012-12-21', 'revremark' => 'Coincide w/ end of world.'}
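
To illustrate how the two header lines map onto the metadata keys shown in the example, here is a rough sketch in plain Ruby. The patterns SKETCH_AUTHOR_RX and SKETCH_REVISION_RX and the method sketch_parse_header_metadata are hypothetical and heavily simplified (one author, first and last name only); they are assumptions for illustration and not the patterns the Parser uses.

```ruby
# Hypothetical sketch; these patterns are simplified assumptions and do not
# cover everything the real parser accepts.
# Author line: "First [Last] [<email>]"
SKETCH_AUTHOR_RX = /\A([^\s<]+)(?:\s+([^\s<]+))?(?:\s+<([^>]+)>)?\z/
# Revision line: "[v]number[, date][: remark]"
SKETCH_REVISION_RX = /\Av?(\d[^,:]*)(?:,\s*([^:]+?))?(?::\s*(.*))?\z/

def sketch_parse_header_metadata(author_line, revision_line)
  meta = {}
  if (m = SKETCH_AUTHOR_RX.match(author_line.strip))
    meta['author'] = [m[1], m[2]].compact.join(' ')
    meta['firstname'] = m[1]
    meta['lastname'] = m[2] if m[2]
    meta['email'] = m[3] if m[3]
  end
  if (m = SKETCH_REVISION_RX.match(revision_line.strip))
    meta['revnumber'] = m[1].strip
    meta['revdate'] = m[2].strip if m[2]
    meta['revremark'] = m[3] if m[3]
  end
  meta
end

p sketch_parse_header_metadata("Author Name <author@example.org>\n",
                               "v1.0, 2012-12-21: Coincide w/ end of world.\n")
```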

Public: Parses the manpage header of the AsciiDoc source read from the Reader

Returns nothing

Internal: Parse the section title from the current position of the reader

Parse an atx (single-line) or setext (underlined) section title. After this method is called, the Reader will be positioned at the line after the section title.

For efficiency, we don't reuse methods internally that check for a section title.

reader   - the source [Reader], positioned at a section title.
document - the current [Document].

Examples

  reader.lines
  # => ["Foo", "~~~"]

  id, reftext, title, level, atx = parse_section_title(reader, document)

  title
  # => "Foo"
  level
  # => 2
  id
  # => nil
  atx
  # => false

  reader.lines
  # => ["==== Foo"]

  id, reftext, title, level, atx = parse_section_title(reader, document)

  title
  # => "Foo"
  level
  # => 3
  id
  # => nil
  atx
  # => true

Returns a 5-element [Array] containing the id (String), reftext (String), title (String), level (Integer), and flag (Boolean) indicating whether an atx section title was matched, or nothing.

Public: Parse the first positional attribute and assign named attributes

Parse the first positional attribute to extract the style, role and id parts, assign the values to their corresponding attribute keys and return the parsed style from the first positional attribute.

attributes - The Hash of attributes to process and update

Examples

  puts attributes
  # => { 1 => "abstract#intro.lead%fragment", "style" => "preamble" }

  parse_style_attribute(attributes)
  # => "abstract"

  puts attributes
  # => { 1 => "abstract#intro.lead%fragment", "style" => "abstract", "id" => "intro",
  #      "role" => "lead", "options" => "fragment", "fragment-option" => '' }

Returns the String style parsed from the first positional attribute
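
The shorthand in the example above (style, then #id, .role and %option parts) can be split with a single scan. The following is a sketch under the assumption that the markers #, . and % never appear inside a value; sketch_parse_style_attribute is a hypothetical name, and the real method may handle more cases.

```ruby
# Hypothetical sketch; assumes "#", "." and "%" only ever act as markers.
def sketch_parse_style_attribute(attributes)
  raw = attributes[1]
  return attributes['style'] if raw.nil? || raw.empty?

  style = nil
  roles = []
  options = []
  # Each token keeps its leading marker; the unmarked token is the style.
  raw.scan(/([#.%]?)([^#.%]+)/) do |marker, value|
    case marker
    when ''  then style = value
    when '#' then attributes['id'] = value
    when '.' then roles << value
    when '%' then options << value
    end
  end

  attributes['style'] = style if style
  attributes['role'] = roles.join(' ') unless roles.empty?
  unless options.empty?
    attributes['options'] = options.join(',')
    # Each option also gets its own "<name>-option" key with an empty value.
    options.each { |opt| attributes["#{opt}-option"] = '' }
  end
  style
end

attrs = { 1 => 'abstract#intro.lead%fragment', 'style' => 'preamble' }
p sketch_parse_style_attribute(attrs)
p attrs
```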

Internal: Parse the author line into a Hash of author metadata

author_line - the String author line
names_only  - a Boolean flag that indicates whether to process line as
              names only or names with emails (default: false)
multiple    - a Boolean flag that indicates whether to process multiple
              semicolon-separated entries in the author line (default: true)

Returns a Hash of author metadata

Internal: Collect the lines belonging to the current list item, navigating through all the rules that determine what comprises a list item.

Grab lines until a sibling list item is found, or the block is broken by a terminator (such as a line comment). Description lists are greedier when they lack the optional inline item text, since they still want that text.

reader        - The Reader from which to retrieve the lines.
list_type     - The Symbol context of the list (:ulist, :olist, :colist or :dlist)
sibling_trait - A Regexp that matches a sibling of this list item or String list marker
                of the items in this list (default: nil)
has_text      - Whether the list item has text defined inline (always true except for description lists)

Returns an Array of lines belonging to the current list item.

Internal: Resolve the 0-index marker for this list item

For ordered lists, match the marker used for this list item against the known list markers and determine which marker is the first (0-index) marker in its number series.

For callout lists, return <1>.

For bulleted lists, return the marker as passed to this method.

list_type - The Symbol context of the list
marker    - The String marker for this list item
ordinal   - The position of this list item in the list
validate  - Whether to validate the value of the marker

Returns the String 0-index marker for this list item

Internal: Resolve the 0-index marker for this ordered list item

Match the marker used for this ordered list item against the known ordered list markers and determine which marker is the first (0-index) marker in its number series.

The purpose of this method is to normalize the implicit numbered markers so that they can be compared against other list items.

marker   - The marker used for this list item
ordinal  - The 0-based index of the list item (default: 0)
validate - Perform validation that the marker provided is the proper
           marker in the sequence (default: false)

Examples

 marker = 'B.'
 Parser.resolve_ordered_list_marker(marker, 1, true)
 # => 'A.'

Returns the String of the first marker in this number series
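
A simplified sketch of the normalization idea: classify the marker by its numbering style and return the first marker of that series. The method sketch_first_marker and its patterns are hypothetical; the sketch assumes the usual AsciiDoc marker styles (1., a., A., i), I)) and leaves out the ordinal and validate arguments.

```ruby
# Hypothetical sketch; classifies a marker by numbering style and returns
# the first marker of that series. No validation is performed.
def sketch_first_marker(marker)
  case marker
  when /\A\.+\z/          then marker # implicit markers (., .., ...) stay as-is
  when /\A\d+\.\z/        then '1.'   # arabic
  when /\A[a-z]\.\z/      then 'a.'   # lower alpha
  when /\A[A-Z]\.\z/      then 'A.'   # upper alpha
  when /\A[ivxlcdm]+\)\z/ then 'i)'   # lower roman
  when /\A[IVXLCDM]+\)\z/ then 'I)'   # upper roman
  else marker
  end
end

puts sketch_first_marker('B.')
puts sketch_first_marker('iv)')
```

With every item reduced to the head of its series, two items belong to the same list exactly when their normalized markers are equal.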

Internal: Converts a Roman numeral to an integer value.

value - The String Roman numeral to convert

Returns the Integer for this Roman numeral
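
One common way to do this conversion is the subtractive rule: a digit that is smaller than the digit to its right is subtracted, otherwise it is added. The sketch below uses that rule; sketch_roman_to_int and SKETCH_ROMAN_VALUES are hypothetical names, and the Parser's own method may be written differently.

```ruby
# Hypothetical sketch using the subtractive rule; input is not validated
# beyond rejecting unknown characters (fetch raises KeyError).
SKETCH_ROMAN_VALUES = {
  'I' => 1, 'V' => 5, 'X' => 10, 'L' => 50, 'C' => 100, 'D' => 500, 'M' => 1000
}.freeze

def sketch_roman_to_int(value)
  digits = value.upcase.each_char.map { |ch| SKETCH_ROMAN_VALUES.fetch(ch) }
  # A digit smaller than its right-hand neighbor is subtracted (IV = 4).
  digits.each_with_index.sum do |digit, idx|
    nxt = digits[idx + 1]
    nxt && digit < nxt ? -digit : digit
  end
end

puts sketch_roman_to_int('XIV')
```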

Public: Convert a string to a legal attribute name.

name - the String name of the attribute

Returns a String with the legal AsciiDoc attribute name.

Examples

  sanitize_attribute_name('Foo Bar')
  # => 'foobar'

  sanitize_attribute_name('foo')
  # => 'foo'

  sanitize_attribute_name('Foo 3 #-Billy')
  # => 'foo3-billy'
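
The three examples above are consistent with dropping every character that is not a word character or a hyphen and then lowercasing the result. A sketch under that assumption (sketch_sanitize_attribute_name is a hypothetical name, and the exact character class used by the Parser is not shown in this document):

```ruby
# Hypothetical sketch; assumes legal characters are word characters and "-".
def sketch_sanitize_attribute_name(name)
  name.gsub(/[^\w\-]/, '').downcase
end

puts sketch_sanitize_attribute_name('Foo 3 #-Billy')
```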

Checks whether the lines given are a setext section title.

line1 - [String] candidate title
line2 - [String] candidate underline

Returns the [Integer] section level if these lines are a setext section title, otherwise nothing.
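
A sketch of a setext check, assuming the conventional AsciiDoc underline characters (=, -, ~, ^, + for levels 0 through 4) and assuming the underline must be within one character of the title's length. Both the mapping SKETCH_SETEXT_LEVELS and the length tolerance are assumptions made for this illustration; only the ~ to level 2 pairing is confirmed by the examples in this document.

```ruby
# Hypothetical sketch; the level mapping and length tolerance are assumptions.
SKETCH_SETEXT_LEVELS = { '=' => 0, '-' => 1, '~' => 2, '^' => 3, '+' => 4 }.freeze

def sketch_setext_level(line1, line2)
  return nil if line1.nil? || line2.nil? || line1.empty? || line2.empty?

  char = line2[0]
  level = SKETCH_SETEXT_LEVELS[char]
  return nil unless level
  # The underline must consist of one repeated character.
  return nil unless line2 == char * line2.length
  # The underline must be roughly as long as the title.
  return nil unless (line1.length - line2.length).abs < 2

  level
end

puts sketch_setext_level('Foo', '~~~').inspect
```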

Public: Store the attribute in the document and register attribute entry if accessible

name  - the String name of the attribute to store;
        if name begins or ends with !, it signals to remove the attribute with that root name
value - the String value of the attribute to store
doc   - the Document being parsed
attrs - the attributes for the current context

Returns a 2-element Array containing the resolved attribute name (minus the ! indicator) and value
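
The handling of the ! indicator can be sketched in isolation, leaving out the document bookkeeping. In this sketch the hypothetical method sketch_resolve_attribute_entry returns nil as the value when the attribute is to be removed; that choice is an assumption, not something this document states.

```ruby
# Hypothetical sketch; only resolves the name and value, and does not
# store anything on a document. A nil value stands for "remove".
def sketch_resolve_attribute_entry(name, value)
  if name.end_with?('!')
    [name.chomp('!'), nil]
  elsif name.start_with?('!')
    [name[1..-1], nil]
  else
    [name, value]
  end
end

p sketch_resolve_attribute_entry('toc!', '')
p sketch_resolve_attribute_entry('icons', 'font')
```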
