# $Id: README,v 1.2 2007/06/13 10:09:47 ssttoo Exp $

Introduction
============

Text_Highlighter is a class for syntax highlighting. The main idea is to
simplify creation of subclasses implementing syntax highlighting for a
particular language. Subclasses do not implement any new functionality, they
just provide syntax highlighting rules. The rule sources are in XML format.
To create a highlighter for a language, there is no need to code a new class
manually. Simply describe the rules in an XML file and use
Text_Highlighter_Generator to create a new class.


This document does not contain a formal description of the API - it is very
simple, and I believe providing some examples of code is sufficient.


Highlighter XML source
======================

Basics
------

Creating a new syntax highlighter begins with describing the highlighting
rules. There are two basic elements: block and region. A block is just a
portion of text matching a regular expression and highlighted with a single
color. A keyword is an example of a block. A region is defined by two regular
expressions: one for the start of the region, and another for the end. The
main difference from a block is that a region can contain blocks and regions
(including same-named regions). An example of a region is a group of
statements enclosed in curly brackets (this is used in many languages, for
example PHP and C). Also, the characters matching the start and end of a
region may be highlighted with their own color, and the region contents with
another.

Blocks and regions may be declared as contained. Contained blocks and regions
can only appear inside regions. If a region or a block is not declared as
contained, it can appear both on top level and inside regions. A block or
region declared as not-contained can only appear on top level.

For any region, a list of blocks and regions that can appear inside this
region can be specified.
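
As a rough illustration of the terms above, here is a minimal sketch of a
rules file for a made-up language called "mini". The element and attribute
names follow the Elements section of this document; the language, the rule
names and the color group names are assumptions. The XML is held in a PHP
string and loaded with SimpleXML only to confirm that it is well formed and to
list each rule's scope. This is not how the package itself reads rules.

```php
<?php
// Hypothetical rules for a toy language "mini"; all names are illustrative.
$rules = <<<'XML'
<highlight lang="mini" case="yes">
    <default innerGroup="code"/>

    <!-- A region: a curly bracket group that may contain any block or
         region, including itself -->
    <region name="braces" innerGroup="code" delimGroup="brackets"
            start="\{" end="\}">
        <contains all="yes"/>
    </region>

    <!-- A block allowed both on top level and inside regions -->
    <block name="number" match="\d+" innerGroup="number"/>

    <!-- A contained block: only allowed inside a region -->
    <block name="label" match="[a-z]+:" innerGroup="reserved"
           contained="yes"/>
</highlight>
XML;

// Parse the string; simplexml_load_string() returns false on malformed XML.
$doc = simplexml_load_string($rules);

// List every region and block together with where it may appear.
foreach ($doc->children() as $element) {
    $type = $element->getName();
    if ($type === 'region' || $type === 'block') {
        $scope = ((string) $element['contained'] === 'yes')
            ? 'inside regions only'
            : 'top level and inside regions';
        echo $type, ' ', $element['name'], ': ', $scope, "\n";
    }
}
```

Saving the XML part to a file of its own would give a starting point for the
generator described later in this document.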

In this document, the term "color group" is used. Chunks of text assigned to
the same color group will be highlighted with the same color. Note that in
versions prior to 0.5.0 color groups were referred to as CSS classes, but
since 0.5.0 HTML is no longer the only supported output, so "color group" is
the more appropriate term.

Elements
--------

The top-level element is <highlight>. The lang attribute is required and
denotes the name of the language. Its value is used as a part of the generated
class name, and must only contain letters, digits and underscores. The
optional attribute case, when given the value yes, makes the language case
sensitive (default is case insensitive). Allowed subelements are:

    * <authors>: Information about the authors of the file.
        <author>: Information about a single author of the file. (May be used
        multiple times, one per author.)
                - name="...": Author's name. Required.
                - email="...": Author's email address. Optional.

    * <default>: Default color group.
          - innerGroup="...": color group name. Required.
    
    * <region>: Region definition
          - name="...": Region name. Required.
          - innerGroup="...": Default color group of region contents. Required.
          - delimGroup="...": color group of start and end of region. Optional,
            defaults to value of innerGroup attribute.
          - start="...", end="...": Regular expression matching start and end
            of region. Required. Regular expression delimiters are optional, but
            if you need to specify delimiter, use /. The only case when the
            delimiters are needed, is specifying regular expression modifiers,
            such as m or U. Examples: \/\* or /$/m.
          - contained="yes": Marks region as contained.
          - never-contained="yes": Marks region as not-contained.
          - <contains>: Elements allowed inside this region.
                - all="yes" Region can contain any other region or block
                (except not-contained). May be used multiple times.
                      - <but> Do not allow certain regions or blocks.
                            - region="..." Name of region not allowed within
                              current region.
                            - block="..." Name of block not allowed within
                              current region.
                - region="..." Name of region allowed within current region.
                - block="..." Name of block allowed within current region.
          - <onlyin> Only allow this region within certain regions. May be
            used multiple times.
                - region="..." Name of parent region
    
    * <block>: Block definition
          - name="...": Block name. Required.
          - innerGroup="...": color group of block contents. Optional. If not
            specified, color group of parent region or default color group will be
            used. One would only want to omit this attribute if there are
            keyword groups (see below) inherited from this block, and no special
            highlighting should apply when the block does not match the keyword.
          - match="..." Regular expression matching the block. Required.
            Regular expression delimiters are optional, but if you need to
            specify delimiter, use /. The only case when the delimiters are
            needed, is specifying regular expression modifiers, such as m or U.
            Examples: #|\/\/ or /$/m.
          - contained="yes": Marks block as contained.
          - never-contained="yes": Marks block as not-contained.
          - <onlyin> Only allow this block within certain regions. May be used
              multiple times.
                - region="..." Name of parent region
          - multiline="yes": Marks block as multi-line. By default, whole
            blocks are assumed to reside in a single line. This makes things
            faster. If you need to declare a multi-line block, use this
            attribute.
          - <partGroup>: Assigns another color group to a part of the block that
              matched a subpattern.
                - index="n": Subpattern index. Required.
                - innerGroup="...": color group name. Required.

              This is an example from CSS highlighter: the measure is matched as
              a whole, but the measurement units are highlighted with different
              color.

                <block name="measure"  match="\d*\.?\d+(\%|em|ex|pc|pt|px|in|mm|cm)"
                        innerGroup="number" contained="yes">
                    <onlyin region="property"/>
                    <partGroup index="1" innerGroup="string" />
                </block>
  
    * <keywords>: Keyword group definition. Keyword groups are useful when you
      want to highlight some words that match a condition for a block with a
      different color. Keywords are defined with literal match, not regular
      expressions. For example, you have a block named identifier matching a
      general identifier, and want to highlight reserved words (which match
      this block as well) with different color. You inherit a keyword group
      "reserved" from "identifier" block.
          - name="...": Keyword group. Required.
          - ifdef="...", ifndef="..." : Conditional declaration. See
            "Conditions" below.
          - inherits="...": Inherited block name. Required.
          - innerGroup="...": color group of keyword group. Required.
          - case="yes|no": Overrides case-sensitivity of the language.
            Optional, defaults to global value.
          - <keyword>: Single keyword definition.
                - match="..." The keyword. Note: this is not a regular
                  expression, but literal match (possibly case insensitive).
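
The keyword lookup described above can be pictured with the following minimal
sketch. It assumes a token has already been matched by the inherited block and
only decides which color group the token gets. The function name, its
parameters and the sample keywords are hypothetical; the generated classes may
do this differently.

```php
<?php
/**
 * Pick the color group for a token that the inherited block has matched.
 * Hypothetical helper for illustration, not code from the package.
 *
 * @param string $token         text matched by the inherited block
 * @param array  $keywords      literal keywords of the keyword group
 * @param string $keywordGroup  color group of the keyword group
 * @param string $blockGroup    color group of the inherited block
 * @param bool   $caseSensitive whether the comparison is case sensitive
 */
function keywordGroupFor($token, array $keywords, $keywordGroup, $blockGroup,
                         $caseSensitive)
{
    foreach ($keywords as $keyword) {
        // Literal comparison, never a regular expression
        $same = $caseSensitive
            ? strcmp($token, $keyword) === 0
            : strcasecmp($token, $keyword) === 0;
        if ($same) {
            return $keywordGroup;
        }
    }
    // Not a keyword: the color group of the block applies
    return $blockGroup;
}

$reserved = array('if', 'else', 'while');
echo keywordGroupFor('while', $reserved, 'reserved', 'identifier', true), "\n";
echo keywordGroupFor('WHILE', $reserved, 'reserved', 'identifier', false), "\n";
echo keywordGroupFor('total', $reserved, 'reserved', 'identifier', true), "\n";
```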

Note that for backward compatibility (BC) reasons the element partClass is an
alias for partGroup, and the attributes innerClass and delimClass are aliases
of innerGroup and delimGroup, respectively.
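
To see what the partGroup subpattern index refers to, the pattern from the CSS
"measure" example can be tried with plain preg_match(). The sketch below
assumes index 1 means the first parenthesized subpattern, and splits the match
into a "number" part and a "string" part as the example's color groups
suggest. The function name is hypothetical and the package's own splitting
logic may differ.

```php
<?php
// Pattern of the "measure" block; subpattern 1 captures the measurement unit
$pattern = '/\d*\.?\d+(\%|em|ex|pc|pt|px|in|mm|cm)/';

/**
 * Split the first measure found in $text into (color group, text) pairs.
 * Hypothetical helper for illustration only.
 */
function splitMeasure($pattern, $text)
{
    if (!preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE)) {
        return array();
    }
    $whole = $m[0][0];
    // Offset of subpattern 1 relative to the start of the whole match
    $unitStart = $m[1][1] - $m[0][1];
    return array(
        array('number', substr($whole, 0, $unitStart)), // innerGroup of block
        array('string', substr($whole, $unitStart)),    // partGroup index 1
    );
}

print_r(splitMeasure($pattern, 'width: 1.5em;'));
```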
    

Conditions
----------

Conditional declarations allow enabling or disabling certain highlighting
rules at runtime. For example, the Java highlighter has a very big list of
keywords matching Java standard classes. Finding a match in this list can take
a long time. For that reason, the corresponding keyword group is declared with
the "ifdef" attribute:

  <keywords name="builtin" inherits="identifier" innerClass="builtin"
            case="yes" ifdef="java.builtins">
    <keyword match="AbstractAction" />
    <keyword match="AbstractBorder" />
    <keyword match="AbstractButton" />
    ...
    ...
    <keyword match="_Remote_Stub" />
    <keyword match="_ServantActivatorStub" />
    <keyword match="_ServantLocatorStub" />
  </keywords>

This keyword group will only be enabled when "java.builtins" is passed as an
element of the "defines" option:

    $options = array(
        'defines' => array(
            'java.builtins',
        ),
        'numbers' => HL_NUMBERS_TABLE,
    );
    $highlighter =& Text_Highlighter::factory('java', $options);

The "ifndef" attribute has the reverse meaning: the keyword group is enabled
only when the given name is not passed in "defines".

Currently, the "ifdef" and "ifndef" attributes are only supported for the
<keywords> tag.
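
The effect of the two attributes can be summarized with a small stand-alone
sketch. It models keyword groups as plain arrays and filters them against a
list of defines. The function name, the array layout and the "fallback" group
are assumptions made for illustration; the generated highlighter classes are
not claimed to work this way internally.

```php
<?php
/**
 * Return the names of keyword groups that stay enabled for given defines.
 * Each group is an array with a 'name' key and optional 'ifdef' / 'ifndef'
 * keys (a hypothetical in-memory form of the <keywords> attributes).
 */
function enabledKeywordGroups(array $groups, array $defines)
{
    $enabled = array();
    foreach ($groups as $group) {
        if (isset($group['ifdef'])
            && !in_array($group['ifdef'], $defines, true)) {
            continue; // needs a define that was not passed
        }
        if (isset($group['ifndef'])
            && in_array($group['ifndef'], $defines, true)) {
            continue; // switched off because the define was passed
        }
        $enabled[] = $group['name'];
    }
    return $enabled;
}

$groups = array(
    array('name' => 'reserved'),
    array('name' => 'builtin', 'ifdef' => 'java.builtins'),
    array('name' => 'fallback', 'ifndef' => 'java.builtins'), // hypothetical
);

print_r(enabledKeywordGroups($groups, array()));
print_r(enabledKeywordGroups($groups, array('java.builtins')));
```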



Class generation
================

Creating the XML description of highlighting rules is the most complicated
part of the process. To generate the class, you need just a few lines of code:

    <?php
    require_once 'Text/Highlighter/Generator.php';
    $generator = new Text_Highlighter_Generator('php.xml');
    $generator->generate();
    $generator->saveCode('PHP.php');
    ?>



Command-line class generation tool
==================================

The example from the previous section looks pretty simple, but it does not
handle any errors which may occur during parsing of the XML source. The
package provides a command-line script that makes generating classes even
simpler and takes care of possible errors. It is called generate (on
Unix/Linux) or generate.bat (on Windows). This script is able to process
multiple files in one run, and also to process XML from standard input and
write generated code to standard output.

    Usage:
    generate options

    Options:
      -x filename, --xml=filename
            source XML file. Multiple input files can be specified, in which
            case each -x option must be followed by -p unless -d is specified
            Defaults to stdin
      -p filename, --php=filename
            destination PHP file. Defaults to stdout. If specified multiple times,
            each -p must follow -x
      -d dirname, --dir=dirname
            Default destination directory. File names will be taken from XML input
            ("lang" attribute of <highlight> tag)
      -h, --help
            This help

Examples

    Read from php.xml, write to PHP.php

        generate -x php.xml -p PHP.php

    Read from php.xml, write to standard output

        generate -x php.xml

    Read from php.xml, write to PHP.php, read from xml.xml, write to XML.php

        generate -x php.xml -p PHP.php -x xml.xml -p XML.php

    Read from php.xml, write to /some/dir/PHP.php, read from xml.xml, write to
    /some/dir/XML.php (assuming that xml.xml contains <highlight lang="xml">, and
    php.xml contains <highlight lang="php">)

        generate -x php.xml -x xml.xml -d /some/dir/



Renderers
=========

Introduction
------------

Text_Highlighter supports renderers. Using renderers, you can get output in
different formats. Two renderers are included in the package:

    - HTML renderer. Generates HTML output. A style sheet should be linked to
      the document to display colored text

    - Console renderer. Can be used to output highlighted text to
      color-capable terminals, either directly or through less -r


Renderers API
-------------

Renderers are subclasses of Text_Highlighter_Renderer. A renderer should
override at least two methods - acceptToken and getOutput. Overriding other
methods is optional, depending on the nature of the renderer's output and
details of implementation.

    string reset()
        resets renderer state. This method is called every time before a new
        source file is highlighted.

    string preprocess(string $code)
        preprocesses code. Can be used, for example, to normalize whitespace
        before highlighting. Returns preprocessed string.

    void acceptToken(string $group, string $content)
        the core method of the renderer. Highlighter passes chunks of text to
        this method in $content, and color group in $group

    void finalize()
        signals the renderer that no more tokens are available.

    mixed getOutput()
        returns generated output.
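
The following is a minimal sketch of the method set listed above. To keep it
runnable without the package installed, the class deliberately does not extend
Text_Highlighter_Renderer, which a real renderer would do. The class name, the
bracketed output format and the color group names passed in are all
assumptions. The tokens are fed in by hand, in the order the descriptions
above suggest.

```php
<?php
/**
 * Stand-alone sketch of a renderer that wraps each token as [group:content].
 * A real renderer would extend Text_Highlighter_Renderer; the parent class
 * is left out here so that the example runs on its own.
 */
class BracketRendererSketch
{
    private $output = '';

    public function reset()
    {
        // Forget anything collected for a previous source file
        $this->output = '';
    }

    public function preprocess($code)
    {
        // Normalize line endings before highlighting
        return str_replace(array("\r\n", "\r"), "\n", $code);
    }

    public function acceptToken($group, $content)
    {
        // Core method: one chunk of text plus its color group
        $this->output .= '[' . $group . ':' . $content . ']';
    }

    public function finalize()
    {
        // Nothing is buffered in this sketch, so there is nothing to flush
    }

    public function getOutput()
    {
        return $this->output;
    }
}

// Drive the renderer by hand with made-up tokens
$renderer = new BracketRendererSketch();
$renderer->reset();
$renderer->acceptToken('reserved', 'echo');
$renderer->acceptToken('code', ' ');
$renderer->acceptToken('string', '"hi"');
$renderer->finalize();
echo $renderer->getOutput(), "\n";
```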


Setting renderer options
------------------------

Renderers accept an optional argument to their constructor - an options array.
Elements of this array are renderer-specific.

HTML renderer
-------------

The HTML renderer produces HTML output with optional line numbering. The
renderer itself does not provide information about the actual colors of
highlighted text. Instead, <span class="hl-XXX"> is used, where XXX is
replaced with the color group name (hl-var, hl-string, etc.). It is up to you
to create a CSS stylesheet. If the 'use_language' option is passed with a
value evaluating to true, class names will be formatted as "LANG-hl-XXX",
where LANG is the language name as defined in the highlighter XML source
("lang" attribute of the <highlight> tag) in lower case.

There are 3 special CSS classes:

    hl-main - this class applies to whole output or right table column,
              depending on 'numbers' option
    hl-gutter - applies to left column in table
    hl-table - applies to whole table
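
Since the stylesheet is left to you, one possible way to produce it is to
generate the rules from a map of color groups. The sketch below follows the
"hl-XXX" and "LANG-hl-XXX" naming described above. The function name, the
color values and the choice of the CSS color property are assumptions, and the
three special classes are not covered.

```php
<?php
/**
 * Build CSS rules for the class names described above.
 * Hypothetical helper, not part of the package.
 *
 * @param array       $colors color group name => CSS color value
 * @param string|null $lang   language name when 'use_language' is in effect
 */
function buildStylesheet(array $colors, $lang = null)
{
    // "hl-XXX" by default, "LANG-hl-XXX" (lower case) with a language name
    $prefix = ($lang === null) ? 'hl-' : strtolower($lang) . '-hl-';
    $css = '';
    foreach ($colors as $group => $color) {
        $css .= '.' . $prefix . $group . ' { color: ' . $color . "; }\n";
    }
    return $css;
}

echo buildStylesheet(array('string' => '#800000', 'var' => '#000080'));
echo buildStylesheet(array('string' => '#800000'), 'PHP');
```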

The HTML renderer accepts the following options (all optional):
    
    * numbers - line numbering style.
        0 - no numbering (default)
        HL_NUMBERS_LI - use <ol></ol> for line numbering
        HL_NUMBERS_TABLE  - create a 2-column table, with line numbers in left
                            column and highlighted text in right column

    * tabsize - tabulation size. Defaults to 4

    Example:
        
        require_once 'Text/Highlighter/Renderer/Html.php';
        $options = array(
            'numbers' => HL_NUMBERS_LI,
            'tabsize' => 8,
        );
        $renderer = new Text_Highlighter_Renderer_HTML($options);

Console renderer
----------------

The console renderer produces output for displaying on a color-capable
terminal, either directly or through less -r, using ANSI escape sequences. By
default, this renderer only highlights the most common color groups.
Additional colors can be specified using the 'colors' option. This renderer
also accepts the 'numbers' option - a boolean value - and the 'tabsize'
option.

    Example:

        require_once 'Text/Highlighter/Renderer/Console.php';
        $colors = array(
            'prepro' => "\033[35m",
            'types' => "\033[32m",
        );
        $options = array(
            'numbers' => true,
            'tabsize' => 8,
            'colors' => $colors,
        );
        $renderer = new Text_Highlighter_Renderer_Console($options);


ANSI color escape sequences have the following format:

    ESC[#;#;....;#m

where ESC is the character with ASCII code 27 (033 octal, 0x1B hexadecimal).
# is one of the following:

        0 for normal display
        1 for bold on
        4 underline (mono only)
        5 blink on
        7 reverse video on
        8 nondisplayed (invisible)
        30 black foreground
        31 red foreground
        32 green foreground
        33 yellow foreground
        34 blue foreground
        35 magenta foreground
        36 cyan foreground
        37 white foreground
        40 black background
        41 red background
        42 green background
        43 yellow background
        44 blue background
        45 magenta background
        46 cyan background
        47 white background
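
Sequences in this format can be composed from the numeric codes instead of
being typed by hand. The helper below is a small sketch with a hypothetical
function name. The resulting strings have the same shape as the values used
for the 'colors' option in the example above, and the color group names shown
are only illustrative.

```php
<?php
/**
 * Compose an ANSI color escape sequence from numeric attribute codes,
 * for example array(1, 34) for bold on plus blue foreground.
 * Hypothetical helper, not part of the package.
 */
function ansiSequence(array $codes)
{
    // ESC, an opening bracket, the codes joined by semicolons, then "m"
    return "\033[" . implode(';', $codes) . 'm';
}

$colors = array(
    'comment'  => ansiSequence(array(32)),     // green foreground
    'reserved' => ansiSequence(array(1, 34)),  // bold on, blue foreground
);
$reset = ansiSequence(array(0));               // back to normal display

// Print one word in bold blue, then restore normal display
echo $colors['reserved'], 'function', $reset, "\n";
```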


How to use Text_Highlighter class
=================================

Creating a highlighter object
-----------------------------

To create a highlighter for a certain language, use the static method
Text_Highlighter::factory():

    require_once 'Text/Highlighter.php';
    $hl =& Text_Highlighter::factory('php');


Setting a renderer
------------------

Actual output is produced by a renderer.

    require_once 'Text/Highlighter.php';
    require_once 'Text/Highlighter/Renderer/Html.php';
    $options = array(
        'numbers' => HL_NUMBERS_LI,
        'tabsize' => 8,
    );
    $renderer = new Text_Highlighter_Renderer_HTML($options);
    $hl =& Text_Highlighter::factory('php');
    $hl->setRenderer($renderer);

Note that for BC reasons, it is possible to use the highlighter without
setting a renderer. If no renderer is set, the HTML renderer will be used by
default. In this case, you should pass options as the second parameter to the
factory method. The following example works exactly as the previous one:

    require_once 'Text/Highlighter.php';
    $options = array(
        'numbers' => HL_NUMBERS_LI,
        'tabsize' => 8,
    );
    $hl =& Text_Highlighter::factory('php', $options);


Getting output
--------------

And finally, do the highlighting and get the output:

    require_once 'Text/Highlighter.php';
    require_once 'Text/Highlighter/Renderer/Html.php';
    $options = array(
        'numbers' => HL_NUMBERS_LI,
        'tabsize' => 8,
    );
    $renderer = new Text_Highlighter_Renderer_HTML($options);
    $hl =& Text_Highlighter::factory('php');
    $hl->setRenderer($renderer);
    $html = $hl->highlight(file_get_contents('example.php'));

# vim: set autoindent tabstop=4 shiftwidth=4 softtabstop=4 tw=78: */