Class: Asciidoctor::Parser
In: lib/asciidoctor/parser.rb
Parent: Object
Public: Methods to parse lines of AsciiDoc into an object hierarchy representing the structure of the document. All methods are class methods and should be invoked from the Parser class. The main entry point is ::next_block. No Parser instances shall be discovered running around. (Any attempt to instantiate a Parser will be futile.)
The object hierarchy created by the Parser consists of zero or more Section and Block objects. Section objects may be nested and a Section object contains zero or more Block objects. Block objects may be nested, but may only contain other Block objects. Block objects which represent lists may contain zero or more ListItem objects.
Examples
  # Create a Reader for the AsciiDoc lines and retrieve the next block from it.
  # Parser.next_block requires a parent, so we begin by instantiating an empty Document.

  doc = Document.new
  reader = Reader.new lines
  block = Parser.next_block(reader, doc)
  block.class
  # => Asciidoctor::Block
BlockMatchData | = | Struct.new :context, :masq, :tip, :terminator | ||
TabRx | = | /\t/ | Regexp for replacing tab character | |
TabIndentRx | = | /^\t+/ | Regexp for leading tab indentation | |
StartOfBlockProc | = | lambda {|l| ((l.start_with? '[') && (BlockAttributeLineRx.match? l)) || (is_delimited_block? l) } | ||
StartOfListProc | = | lambda {|l| AnyListRx.match? l } | ||
StartOfBlockOrListProc | = | lambda {|l| (is_delimited_block? l) || ((l.start_with? '[') && (BlockAttributeLineRx.match? l)) || (AnyListRx.match? l) } | ||
NoOp | = | nil | ||
TableCellHorzAlignments | = | { '<' => 'left', '>' => 'right', '^' => 'center' } | Internal: A Hash mapping horizontal alignment abbreviations to alignments that can be applied to a table cell (or to all cells in a column) | |
TableCellVertAlignments | = | { '<' => 'top', '>' => 'bottom', '^' => 'middle' } | Internal: A Hash mapping vertical alignment abbreviations to alignments that can be applied to a table cell (or to all cells in a column) | |
TableCellStyles | = | { 'd' => :none, 's' => :strong, 'e' => :emphasis, 'm' => :monospaced, 'h' => :header, 'l' => :literal, 'v' => :verse, 'a' => :asciidoc } | Internal: A Hash mapping style abbreviations to styles that can be applied to a table cell (or to all cells in a column) |
Remove the block indentation (the leading whitespace equal to the amount of leading whitespace of the least indented line), then replace tabs with spaces (using proper tab expansion logic) and, finally, indent the lines by the amount specified.
This method preserves the relative indentation of the lines.
lines - the Array of String lines to process (no trailing endlines)
indent - the integer number of spaces to add to the beginning of each line; if this value is nil, the existing space is preserved (optional, default: 0)
Examples
  source = <<EOS
      def names
        @name.split
      end
  EOS

  lines = source.split "\n"
  # => ["    def names", "      @name.split", "    end"]

  # the lines are modified in place, so print the Array afterwards
  Parser.adjust_indentation! lines
  puts lines * "\n"
  # => def names
  # =>   @name.split
  # => end
Returns nothing.
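The three steps described above can be sketched in plain Ruby. This is a simplified illustration, not Asciidoctor's implementation: the method name `adjust_indentation_sketch!` and the `tab_size` parameter are assumptions made for the example, and tabs are expanded before the block indentation is measured so that widths are comparable.

```ruby
# Illustrative stand-in, not Asciidoctor's code.
# tab_size is a hypothetical extra parameter for this sketch.
def adjust_indentation_sketch!(lines, indent = 0, tab_size = 4)
  return if lines.empty?

  # Expand each tab to the next tab stop so leading widths can be compared.
  lines.map! do |line|
    col = 0
    line.each_char.map do |ch|
      width = ch == "\t" ? tab_size - (col % tab_size) : 1
      col += width
      ch == "\t" ? ' ' * width : ch
    end.join
  end

  # A nil indent means the existing leading space is preserved.
  return if indent.nil?

  # The block indentation is the leading space of the least indented non-empty line.
  offset = lines.reject(&:empty?).map { |l| l[/\A */].length }.min || 0
  padding = ' ' * indent
  lines.map! { |l| l.empty? ? l : padding + l[offset..-1] }
  nil
end

lines = ['    def names', '      @name.split', '    end']
adjust_indentation_sketch!(lines)
puts lines
# => def names
# =>   @name.split
# => end
```

Because the Array is modified in place and nothing is returned, the caller reads the result from the Array it passed in.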
Checks whether the line given is an atx section title.
The level returned is one less than the number of leading markers.
line - [String] candidate title with leading atx marker.
Returns the [Integer] section level if this line is an atx section title, otherwise nothing.
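As a rough illustration of the "one less than the number of markers" rule, the check can be sketched with a single regular expression. The method name `atx_section_level` and the pattern are assumptions for this example; the real parser uses its own regular expressions from the Asciidoctor module.

```ruby
# Hypothetical helper: returns the section level for an atx title line,
# or nil when the line is not an atx title.
def atx_section_level(line)
  # Cheap guard before running the regular expression.
  return unless line.start_with?('=', '#')

  # One to six identical markers, at least one space, then some title text.
  m = /\A(={1,6}|[#]{1,6})[ \t]+\S/.match(line)
  m && m[1].length - 1
end

atx_section_level('= Document Title') # => 0
atx_section_level('==== Foo')         # => 3
atx_section_level('Plain paragraph')  # => nil
```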
Whether a block supports compound content should be a config setting. If terminator is false, all the lines in the reader should be parsed. NOTE: a filter could be invoked here, before and after parsing.
Internal: Catalog any callouts found in the text, but don't process them
text - The String of text in which to look for callouts
document - The current document in which the callouts are stored
Returns a Boolean indicating whether callouts were found
Internal: Catalog the bibliography inline anchor found in the start of the list item (but don't convert)
text - The String text in which to look for an inline bibliography anchor
block - The ListItem block in which the reference should be searched
document - The current document in which the reference is stored
Returns nothing
Internal: Initialize a new Section object and assign any attributes provided
The information for this section is retrieved by parsing the lines at the current position of the reader.
reader - the source reader
parent - the parent Section or Document of this Section
attributes - a Hash of attributes to assign to this section (default: {})
Public: Determines whether this line is the start of any of the delimited blocks
returns the match data if this line is the first line of a delimited block or nil if not
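A delimiter check of this kind can be sketched as a lookup on the first characters of the line. The names below (`DELIMITERS_SKETCH`, `BlockMatchSketch`, `delimited_block_sketch`) are hypothetical, the Struct is a cut-down stand-in for BlockMatchData, and only the common AsciiDoc delimiters are covered.

```ruby
# Hypothetical table of delimiter tips and the block context each one opens.
DELIMITERS_SKETCH = {
  '--'   => :open,
  '----' => :listing,
  '....' => :literal,
  '====' => :example,
  '****' => :sidebar,
  '____' => :quote,
  '++++' => :pass,
  '////' => :comment,
  '|===' => :table
}.freeze

# Simplified stand-in for the match data (context, tip, terminator).
BlockMatchSketch = Struct.new(:context, :tip, :terminator)

def delimited_block_sketch(line)
  tip = line.length > 4 ? line[0, 4] : line
  context = DELIMITERS_SKETCH[tip]
  return unless context

  # A longer delimiter must keep repeating the tip's final character.
  return unless line[tip.length..-1].each_char.all? { |c| c == tip[-1] }

  # The full line is the terminator that must close the block.
  BlockMatchSketch.new(context, tip, line)
end

delimited_block_sketch('======').context # => :example
delimited_block_sketch('plain text')     # => nil
```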
Internal: Convenience API for checking if the next line on the Reader is the document title
reader - the source Reader
attributes - a Hash of attributes collected above the current line
leveloffset - an Integer (or integer String value) that represents the current leveloffset
returns true if the Reader is positioned at the document title, false otherwise
Internal: Determine whether this line is a sibling list item according to the list type and trait (marker) provided.
line - The String line to check
list_type - The context of the list (:olist, :ulist, :colist, :dlist)
sibling_trait - The String marker for the list or the Regexp to match a sibling
Returns a Boolean indicating whether this line is a sibling list item given the criteria provided
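The two forms of the sibling trait (a Regexp, or a marker String compared against the line's own marker) might be handled as in the sketch below. The names `LIST_RX_SKETCH` and `sibling_list_item_sketch?` are assumptions, and the sketch is deliberately limited: it covers only unordered lists and ordered lists with implicit dot markers, and it compares raw markers rather than normalizing them first.

```ruby
# Hypothetical, simplified patterns; the marker is captured in group 1.
LIST_RX_SKETCH = {
  ulist: /\A[ \t]*(-|\*{1,5})[ \t]+\S/,
  olist: /\A[ \t]*(\.{1,5})[ \t]+\S/
}.freeze

def sibling_list_item_sketch?(line, list_type, sibling_trait)
  # A Regexp trait decides on its own whether the line is a sibling.
  return sibling_trait.match?(line) if sibling_trait.is_a?(Regexp)

  # A String trait must equal the marker found on the line.
  m = LIST_RX_SKETCH.fetch(list_type).match(line)
  !m.nil? && m[1] == sibling_trait
end

sibling_list_item_sketch?('* another item', :ulist, '*') # => true
sibling_list_item_sketch?('** nested item', :ulist, '*') # => false
```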
Public: Make sure the Parser object doesn't get initialized.
Raises RuntimeError if this constructor is invoked.
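The pattern of a class that only exposes class methods and refuses instantiation can be sketched as follows. `NoInstanceSketch` and its `blank?` helper are hypothetical names for illustration only.

```ruby
# Hypothetical utility class: all behaviour lives in class methods.
class NoInstanceSketch
  # A bare `raise` with a String raises RuntimeError.
  def initialize
    raise 'This class cannot be instantiated; call its class methods instead'
  end

  def self.blank?(line)
    line.strip.empty?
  end
end

NoInstanceSketch.blank?('   ') # => true

begin
  NoInstanceSketch.new
rescue RuntimeError => e
  puts "refused: #{e.message}"
end
```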
Public: Parse and return the next Block at the Reader's current location
This method begins by skipping over blank lines to find the start of the next block (paragraph, block macro, or delimited block). If a block is found, that block is parsed, initialized as a Block object, and returned. Otherwise, the method returns nothing.
Regular expressions from the Asciidoctor module are used to match block boundaries. The ensuing lines are then processed according to the content model.
reader - The Reader from which to retrieve the next Block.
parent - The Document, Section or Block to which the next Block belongs.
attributes - A Hash of attributes that will become the attributes associated with the parsed Block (default: {}).
options - An options Hash to control parsing (default: {}):
  * :text indicates that the parser is only looking for text content
Returns a Block object built from the parsed content of the processed lines, or nothing if no block is found.
Internal: Parse and construct a description list Block from the current position of the Reader
reader - The Reader from which to retrieve the description list
match - The Regexp match for the head of the list
parent - The parent Block to which this description list belongs
Returns the Block encapsulating the parsed description list
Internal: Parse and construct an item list (ordered or unordered) from the current position of the Reader
reader - The Reader from which to retrieve the outline list
list_type - A Symbol representing the list type (:olist for ordered, :ulist for unordered)
parent - The parent Block to which this outline list belongs
Returns the Block encapsulating the parsed outline (unordered or ordered) list
Internal: Parse and construct the next ListItem for the current bulleted (unordered or ordered) list Block, callout lists included, or the next term ListItem and description ListItem pair for the description list Block.
First collect and process all the lines that constitute the next list item for the parent list (according to its type). Next, parse those lines into blocks and associate them with the ListItem (in the case of a description list, the description ListItem). Finally, fold the first block into the item's text attribute according to rules described in ListItem.
reader - The Reader from which to retrieve the next list item
list_block - The parent list Block of this ListItem. Also provides access to the list type.
match - The match Array which contains the marker and text (first-line) of the ListItem
sibling_trait - The list marker or the Regexp to match a sibling item
Returns the next ListItem or ListItem pair (depending on the list type) for the parent list Block.
Public: Return the next section from the Reader.
This method processes block metadata, content and subsections for this section and returns the Section object and any orphaned attributes.
If the parent is a Document and has a header (document title), then this method will put any non-section blocks at the start of document into a preamble Block. If there are no such blocks, the preamble is dropped.
Since we are reading line-by-line, there's a chance that metadata that should be associated with the following block gets consumed. To deal with this case, the method returns a running Hash of "orphaned" attributes that get passed to the next Section or Block.
reader - the source Reader
parent - the parent Section or Document of this new section
attributes - a Hash of metadata that was left orphaned from the previous Section.
Examples
  source
  # => "= Greetings\n\nThis is my doc.\n\n== Salutations\n\nIt is awesome."

  reader = Reader.new source, nil, :normalize => true
  # create empty document to parent the section
  # and hold attributes extracted from header
  doc = Document.new

  Parser.next_section(reader, doc)[0].title
  # => "Greetings"

  Parser.next_section(reader, doc)[0].title
  # => "Salutations"
returns a two-element Array containing the Section and Hash of orphaned attributes
Internal: Parse the table contained in the provided Reader
table_reader - a Reader containing the source lines of an AsciiDoc table
parent - the parent Block of this Asciidoctor::Table
attributes - attributes captured from above this Block
returns an instance of Asciidoctor::Table parsed from the provided reader
Public: Parses AsciiDoc source read from the Reader into the Document
This method is the main entry-point into the Parser when parsing a full document. It first looks for and, if found, processes the document title. It then proceeds to iterate through the lines in the Reader, parsing the document into nested Sections and Blocks.
reader - the Reader holding the source lines of the document
document - the empty Document into which the lines will be parsed
options - a Hash of options to control processing
returns the Document object
Internal: Parse the next line if it contains metadata for the following block
This method handles lines with the following content:
Any attributes found will be inserted into the attributes argument. If the line contains block metadata, the method returns true, otherwise false.
reader - the source reader
document - the current Document
attributes - a Hash of attributes in which any metadata found will be stored
options - a Hash of options to control processing (default: {}):
  * :text indicates the parser is only looking for text content, thus neither a block title nor an attribute entry should be captured
returns true if the line contains metadata, otherwise false
Internal: Parse lines of metadata until a line of metadata is not found.
This method processes sequential lines containing block metadata, ignoring blank lines and comments.
reader - the source reader
document - the current Document
attributes - a Hash of attributes in which any metadata found will be stored (default: {})
options - a Hash of options to control processing (default: {}):
  * :text indicates that the parser is only looking for text content and thus the block title should not be captured
returns the Hash of attributes including any metadata found
Public: Parse blocks from this reader until there are no more lines.
This method calls Parser.next_block until there are no more lines in the Reader. It does not consider sections because it's assumed the Reader only has lines which are within a delimited block region.
reader - The Reader containing the lines to process
parent - The parent Block to which to attach the parsed blocks
Returns nothing.
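The "call next_block until the reader runs dry" loop can be illustrated without the real Reader and Block classes. In this sketch an Array of lines stands in for the Reader, an Array stands in for the parent, and only paragraphs are recognized; `next_paragraph_sketch` and `parse_blocks_sketch` are hypothetical names.

```ruby
# Hypothetical stand-in for next_block: skip blank lines, then collect
# lines up to the next blank line. Returns nil when no lines remain.
def next_paragraph_sketch(lines)
  lines.shift while lines.first && lines.first.strip.empty?
  return if lines.empty?

  para = []
  para << lines.shift while lines.first && !lines.first.strip.empty?
  para
end

# Keep pulling blocks and attaching them to the parent until none are left.
def parse_blocks_sketch(lines, parent)
  while (block = next_paragraph_sketch(lines))
    parent << block
  end
  nil
end

parent = []
parse_blocks_sketch(['', 'first line', 'second line', '', 'another'], parent)
parent # => [["first line", "second line"], ["another"]]
```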
Internal: Parse the cell specs for the current cell.
The cell specs dictate the cell's alignments, styles or filters, colspan, rowspan and/or repeating content.
The default spec when pos == :end is {} since we already know we're at a delimiter. When pos == :start, we may or may not be at a delimiter; nil indicates we're not.
returns the Hash of attributes that indicate how to layout and style this cell in the table.
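To make the shape of a cell spec concrete, the sketch below parses specs such as `2.3+^.>s` (span or repeat factor, then horizontal and vertical alignment, then a style letter). The pattern, the method name `parse_cell_spec_sketch` and the result keys are assumptions for illustration; the lookup tables mirror the TableCell* constants listed above.

```ruby
CELL_HALIGN_SKETCH = { '<' => 'left', '>' => 'right', '^' => 'center' }.freeze
CELL_VALIGN_SKETCH = { '<' => 'top', '>' => 'bottom', '^' => 'middle' }.freeze
CELL_STYLE_SKETCH = {
  'd' => :none, 's' => :strong, 'e' => :emphasis, 'm' => :monospaced,
  'h' => :header, 'l' => :literal, 'v' => :verse, 'a' => :asciidoc
}.freeze

# Hypothetical, simplified pattern: [colspan][.rowspan](+|*)[halign][.valign][style]
CELL_SPEC_SKETCH_RX = /\A(?:(\d+)?(?:\.(\d+))?([*+]))?([<^>])?(?:\.([<^>]))?([a-z])?\z/

# Returns a Hash of layout attributes, or nil if the text is not a cell spec.
def parse_cell_spec_sketch(spec)
  m = CELL_SPEC_SKETCH_RX.match(spec)
  return nil unless m

  attrs = {}
  if m[3] == '+'
    attrs['colspan'] = m[1].to_i if m[1]
    attrs['rowspan'] = m[2].to_i if m[2]
  elsif m[3] == '*'
    attrs['repeatcol'] = m[1].to_i if m[1]
  end
  attrs['halign'] = CELL_HALIGN_SKETCH[m[4]] if m[4]
  attrs['valign'] = CELL_VALIGN_SKETCH[m[5]] if m[5]
  attrs['style'] = CELL_STYLE_SKETCH[m[6]] if m[6] && CELL_STYLE_SKETCH.key?(m[6])
  attrs
end

parse_cell_spec_sketch('2.3+^.>s')
# => {"colspan"=>2, "rowspan"=>3, "halign"=>"center", "valign"=>"bottom", "style"=>:strong}
```

An empty spec yields an empty Hash, while text that is not a spec at all yields nil, which matches the {} versus nil distinction described above.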
Internal: Parse the column specs for this table.
The column specs dictate the number of columns, relative width of columns, default alignments for cells in each column, and/or default styles or filters applied to the cells in the column.
Every column spec is guaranteed to have a width
returns a Hash of attributes that specify how to format and layout the cells in the table.
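The guarantee that every column spec has a width can be shown with a reduced version of the cols syntax. This sketch handles only a repeat factor, a horizontal alignment and a width (for example `1,2*^,3`); `parse_col_specs_sketch`, its pattern and its result keys are assumptions, and it returns one Hash per column purely for illustration.

```ruby
COL_HALIGN_SKETCH = { '<' => 'left', '>' => 'right', '^' => 'center' }.freeze

# Hypothetical, simplified pattern: [n*][halign][width][%]
COL_SPEC_SKETCH_RX = /\A(?:(\d+)\*)?([<^>])?(\d+)?%?\z/

def parse_col_specs_sketch(records)
  records.delete(' ').split(',', -1).flat_map do |record|
    m = COL_SPEC_SKETCH_RX.match(record)
    next [] unless m

    # Every column gets a width, defaulting to 1 when none is given.
    spec = { 'width' => (m[3] || 1).to_i }
    spec['halign'] = COL_HALIGN_SKETCH[m[2]] if m[2]

    # A repeat factor such as "2*" expands into that many identical columns.
    Array.new((m[1] || 1).to_i) { spec.dup }
  end
end

parse_col_specs_sketch('1,2*^,3')
# => [{"width"=>1}, {"width"=>1, "halign"=>"center"}, {"width"=>1, "halign"=>"center"}, {"width"=>3}]
```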
Public: Parses the document header of the AsciiDoc source read from the Reader
Reads the AsciiDoc source from the Reader until the end of the document header is reached. The Document object is populated with information from the header (document title, document attributes, etc). The document attributes are then saved to establish a save point to which to roll back after parsing is complete.
This method assumes that there are no blank lines at the start of the document, since the reader removes them automatically.
returns the Hash of orphan block attributes captured above the header
Public: Consume and parse the two header lines (line 1 = author info, line 2 = revision info).
Returns the Hash of header metadata. If a Document object is supplied, the metadata is applied directly to the attributes of the Document.
reader - the Reader holding the source lines of the document
document - the Document we are building (default: nil)
Examples
  data = ["Author Name <author@example.org>\n", "v1.0, 2012-12-21: Coincide w/ end of world.\n"]
  parse_header_metadata(Reader.new data, nil, :normalize => true)
  # => {'author' => 'Author Name', 'firstname' => 'Author', 'lastname' => 'Name', 'email' => 'author@example.org',
  #     'revnumber' => '1.0', 'revdate' => '2012-12-21', 'revremark' => 'Coincide w/ end of world.'}
Public: Parses the manpage header of the AsciiDoc source read from the Reader
returns Nothing
Internal: Parse the section title from the current position of the reader
Parse an atx (single-line) or setext (underlined) section title. After this method is called, the Reader will be positioned at the line after the section title.
For efficiency, we don't reuse the internal methods that check for a section title.
reader - the source [Reader], positioned at a section title.
document - the current [Document].
Examples
  reader.lines
  # => ["Foo", "~~~"]

  id, reftext, title, level, atx = parse_section_title(reader, document)

  title
  # => "Foo"
  level
  # => 2
  id
  # => nil
  atx
  # => false

  line1
  # => "==== Foo"

  id, reftext, title, level, atx = parse_section_title(reader, document)

  title
  # => "Foo"
  level
  # => 3
  id
  # => nil
  atx
  # => true
Returns a 5-element [Array] containing the id (String), reftext (String), title (String), level (Integer), and flag (Boolean) indicating whether an atx section title was matched, or nothing.
Public: Parse the first positional attribute and assign named attributes
Parse the first positional attribute to extract the style, role and id parts, assign the values to their corresponding attribute keys and return the parsed style from the first positional attribute.
attributes - The Hash of attributes to process and update
Examples
  puts attributes
  # => { 1 => "abstract#intro.lead%fragment", "style" => "preamble" }

  parse_style_attribute(attributes)
  # => "abstract"

  puts attributes
  # => { 1 => "abstract#intro.lead%fragment", "style" => "abstract", "id" => "intro",
  #      "role" => "lead", "options" => "fragment", "fragment-option" => '' }
Returns the String style parsed from the first positional attribute
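The shorthand in the first positional attribute (`style#id.role%option`) can be split with a single scan. The sketch below reproduces the example above under simplifying assumptions: `parse_style_attribute_sketch` is a hypothetical name, multiple roles are joined with spaces and multiple options with commas, and no validation is performed.

```ruby
# Hypothetical re-creation of the shorthand parsing; updates the Hash in place.
def parse_style_attribute_sketch(attributes)
  raw = attributes[1]
  return unless raw.is_a?(String) && !raw.empty?

  style = nil
  roles = []
  options = []

  # Each token is introduced by '#', '.' or '%'; a token with no prefix is the style.
  raw.scan(/([#.%]?)([^#.%]*)/) do |prefix, value|
    next if value.empty?

    case prefix
    when ''  then style = value
    when '#' then attributes['id'] = value
    when '.' then roles << value
    when '%' then options << value
    end
  end

  attributes['style'] = style if style
  attributes['role'] = roles.join(' ') unless roles.empty?
  unless options.empty?
    attributes['options'] = options.join(',')
    options.each { |opt| attributes["#{opt}-option"] = '' }
  end
  style
end

attrs = { 1 => 'abstract#intro.lead%fragment', 'style' => 'preamble' }
parse_style_attribute_sketch(attrs) # => "abstract"
attrs['id']                         # => "intro"
```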
Internal: Parse the author line into a Hash of author metadata
author_line - the String author line
names_only - a Boolean flag that indicates whether to process line as names only or names with emails (default: false)
multiple - a Boolean flag that indicates whether to process multiple semicolon-separated entries in the author line (default: true)
returns a Hash of author metadata
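A reduced version of author line parsing is sketched below. It splits semicolon-separated entries and extracts up to three name parts plus an optional email. The pattern and the method names are assumptions for illustration; for simplicity the sketch returns an Array with one Hash per author, whereas the documented method returns a single Hash.

```ruby
# Hypothetical pattern: first [middle] [last] [<email>]
AUTHOR_SKETCH_RX = /\A(\w[\w\-'.]*)(?: +(\w[\w\-'.]*))?(?: +(\w[\w\-'.]*))?(?: +<([^>]+)>)?\z/

def parse_author_entry_sketch(entry)
  entry = entry.strip
  m = AUTHOR_SKETCH_RX.match(entry)
  # Fall back to treating the whole entry as the author's name.
  return { 'author' => entry } unless m

  names = [m[1], m[2], m[3]].compact
  meta = { 'author' => names.join(' '), 'firstname' => names.first }
  meta['middlename'] = names[1] if names.size == 3
  meta['lastname'] = names.last if names.size > 1
  meta['email'] = m[4] if m[4]
  meta
end

def parse_author_line_sketch(author_line)
  author_line.split(';').map(&:strip).reject(&:empty?).map { |e| parse_author_entry_sketch(e) }
end

parse_author_line_sketch('Author Name <author@example.org>')
# => [{"author"=>"Author Name", "firstname"=>"Author", "lastname"=>"Name", "email"=>"author@example.org"}]
```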
Internal: Collect the lines belonging to the current list item, navigating through all the rules that determine what comprises a list item.
Grab lines until a sibling list item is found, or the block is broken by a terminator (such as a line comment). Description lists are greedier when the item has no inline text, because they still need to find that text.
reader - The Reader from which to retrieve the lines.
list_type - The Symbol context of the list (:ulist, :olist, :colist or :dlist)
sibling_trait - A Regexp that matches a sibling of this list item or String list marker of the items in this list (default: nil)
has_text - Whether the list item has text defined inline (always true except for description lists)
Returns an Array of lines belonging to the current list item.
Internal: Resolve the 0-index marker for this list item
For ordered lists, match the marker used for this list item against the known list markers and determine which marker is the first (0-index) marker in its number series.
For callout lists, return <1>.
For bulleted lists, return the marker as passed to this method.
list_type - The Symbol context of the list
marker - The String marker for this list item
ordinal - The position of this list item in the list
validate - Whether to validate the value of the marker
Returns the String 0-index marker for this list item
Internal: Resolve the 0-index marker for this ordered list item
Match the marker used for this ordered list item against the known ordered list markers and determine which marker is the first (0-index) marker in its number series.
The purpose of this method is to normalize the implicit numbered markers so that they can be compared against other list items.
marker - The marker used for this list item
ordinal - The 0-based index of the list item (default: 0)
validate - Perform validation that the marker provided is the proper marker in the sequence (default: false)
Examples
  marker = 'B.'
  Parser.resolve_ordered_list_marker(marker, 1, true)
  # => 'A.'
Returns the String of the first marker in this number series
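The normalization step (mapping any marker to the first marker of its number series) can be sketched as a table lookup. The series patterns and the name `first_marker_in_series` are assumptions for this example, and the validation of the ordinal is left out.

```ruby
# Hypothetical series table: a pattern for the series and its first marker.
ORDERED_SERIES_SKETCH = [
  [/\A\d+\.\z/, '1.'],     # arabic:      1. 2. 3.
  [/\A[a-z]\.\z/, 'a.'],   # lower alpha: a. b. c.
  [/\A[A-Z]\.\z/, 'A.'],   # upper alpha: A. B. C.
  [/\A[ivx]+\)\z/, 'i)'],  # lower roman: i) ii) iii)
  [/\A[IVX]+\)\z/, 'I)']   # upper roman: I) II) III)
].freeze

def first_marker_in_series(marker)
  # Implicit markers (., .., ...) are already comparable as they are.
  return marker if marker.start_with?('.')

  pair = ORDERED_SERIES_SKETCH.find { |rx, _first| rx.match?(marker) }
  pair ? pair[1] : marker
end

first_marker_in_series('B.')  # => "A."
first_marker_in_series('3.')  # => "1."
first_marker_in_series('iv)') # => "i)"
```

Once two list items resolve to the same first marker, they can be treated as members of the same list regardless of their position in it.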
Internal: Converts a Roman numeral to an integer value.
value - The String Roman numeral to convert
Returns the Integer for this Roman numeral
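One common way to perform this conversion is to add each digit's value, subtracting it instead when a larger digit follows. The sketch below uses that approach; `roman_to_int_sketch` is a hypothetical name and the code is not taken from the parser.

```ruby
ROMAN_VALUES_SKETCH = {
  'I' => 1, 'V' => 5, 'X' => 10, 'L' => 50, 'C' => 100, 'D' => 500, 'M' => 1000
}.freeze

def roman_to_int_sketch(value)
  # fetch raises KeyError for characters that are not Roman digits.
  digits = value.upcase.each_char.map { |c| ROMAN_VALUES_SKETCH.fetch(c) }

  digits.each_with_index.sum do |digit, i|
    following = digits[i + 1]
    # A smaller digit before a larger one is subtracted (IV = 4, IX = 9).
    following && following > digit ? -digit : digit
  end
end

roman_to_int_sketch('XIV')     # => 14
roman_to_int_sketch('iv')      # => 4
roman_to_int_sketch('MCMXCIV') # => 1994
```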
Public: Convert a string to a legal attribute name.
name - the String name of the attribute
Returns a String with the legal AsciiDoc attribute name.
Examples
  sanitize_attribute_name('Foo Bar')
  # => 'foobar'

  sanitize_attribute_name('foo')
  # => 'foo'

  sanitize_attribute_name('Foo 3 #-Billy')
  # => 'foo3-billy'
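The three examples above are consistent with a simple rule: drop every character that is not a word character or a hyphen, then lowercase the result. A minimal sketch under that assumption (the name `sanitize_attribute_name_sketch` is hypothetical):

```ruby
def sanitize_attribute_name_sketch(name)
  # Keep letters, digits, underscores and hyphens; lowercase what remains.
  name.gsub(/[^\w\-]/, '').downcase
end

sanitize_attribute_name_sketch('Foo Bar')       # => "foobar"
sanitize_attribute_name_sketch('Foo 3 #-Billy') # => "foo3-billy"
```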
Public: Store the attribute in the document and register attribute entry if accessible
name - the String name of the attribute to store; if name begins or ends with !, it signals to remove the attribute with that root name
value - the String value of the attribute to store
doc - the Document being parsed
attrs - the attributes for the current context
returns a 2-element array containing the resolved attribute name (minus the ! indicator) and value
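The handling of the `!` indicator can be sketched on a plain Hash. This example leaves out the Document and the attribute entry registration entirely; `store_attribute_sketch` is a hypothetical name, and sanitizing the name before storing it is an assumption of the sketch.

```ruby
# Hypothetical, document-free version: attrs is a plain Hash.
def store_attribute_sketch(name, value, attrs)
  # A leading or trailing '!' means "remove the attribute with this root name".
  if name.end_with?('!')
    name = name.chomp('!')
    value = nil
  elsif name.start_with?('!')
    name = name[1..-1]
    value = nil
  end

  # Assumption: the name is normalized before it is used as a key.
  name = name.gsub(/[^\w\-]/, '').downcase

  if value.nil?
    attrs.delete(name)
  else
    attrs[name] = value
  end
  [name, value]
end

attrs = { 'toc' => '' }
store_attribute_sketch('toc!', '', attrs)      # => ["toc", nil]
store_attribute_sketch('icons', 'font', attrs) # => ["icons", "font"]
attrs                                          # => {"icons"=>"font"}
```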