
Advice for implementing simple regex (for bbcode/geshi parsing)


I wrote a personal note-taking application in PHP to store and organize my notes, and I wanted a nice, simple format to write them in.

I first tried Markdown, but I found it a little confusing and it offered no simple syntax highlighting. I have used BBCode before, so I want to implement that instead.

GeSHi, the syntax highlighter I really want to use, needs only very simple code like this:

$geshi = new GeSHi($sourcecode, $language);
$geshi->parse_code();

This is the easy part, but what I want is for my BBCode to call it.

My current regular expression to match a made-up [syntax=cpp][/syntax] BBCode tag is the following:

preg_replace('#\[syntax=(.*?)\](.*?)\[/syntax\]#si' , 'geshi(\\2,\\1)????', $text);

You will notice that I capture the language and the content. How on earth do I connect them to the GeSHi code?

preg_replace seems to be able to replace a match only with a string, not with an 'expression', so I am not sure how to feed the captured data into the two lines of GeSHi code above.

I really am excited about this project and wish to overcome this.


Solution

  • I wrote this class a while back to allow easy customization and parsing. It may be a little overkill, but it works well, and I needed the extra flexibility for my application. The usage is pretty simple:

    $geshiH = new Geshi_Helper();
    $text = $geshiH->geshi($text); // assumes the text contains inline [syntax=...] blocks to parse
    

    ---- OR ----

    $geshiH = new Geshi_Helper();
    $text = $geshiH->geshi($text, $lang);  // assumes you already know the language; good for standalone snippets
    

    I had to chop out some other custom items I had in there, but barring any syntax errors from the chopping, it should work. Feel free to use it.

    <?php
    
    require_once 'Geshi/geshi.php';
    
    class Geshi_Helper 
    {
        /**
         * @var array Array of matches from the code block.
         */
        private $_codeMatches = array();
    
        private $_token = "";
    
        private $_count = 1;
    
        public function __construct()
        {
            /* Generate a unique hash token to use as the placeholder */
            $this->_token = md5(time() . rand(9999,9999999));
        }
    
        /**
         * Performs syntax highlights using geshi library to the content.
         *
         * @param string $content - The content to parse
         * @param string|null $lang - Language of $content; leave null to parse inline [syntax=...] blocks
         * @return string Syntax Highlighted content
         */
        public function geshi($content, $lang=null)
        {
            if (!is_null($lang)) {
                /* Given the returned results 0 is not set, adding the "" should make this compatible */
                $content = $this->_highlightSyntax(array("", strtolower($lang), $content));
            }else {
                /* Need to replace this prior to the code replace for nobbc.
                   Uses a callback because the /e modifier was removed in PHP 7. */
                $content = preg_replace_callback('~\[nobbc\](.+?)\[/nobbc\]~i', function ($m) {
                    return '[nobbc]' . strtr($m[1], array('[' => '&#91;', ']' => '&#93;', ':' => '&#58;', '@' => '&#64;')) . '[/nobbc]';
                }, $content);
    
                /* For multiple content we have to handle the br's, hence the replacement filters */
                $content = $this->_preFilter($content);
    
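                /* Reverse the nobbc markup (callback instead of the removed /e modifier).
                   The keys are the HTML-escaped forms of the entities written above, so they
                   only match if the text went through htmlspecialchars() in between. */
                $content = preg_replace_callback('~\[nobbc\](.+?)\[/nobbc\]~i', function ($m) {
                    return strtr($m[1], array('&amp;#91;' => '[', '&amp;#93;' => ']', '&amp;#58;' => ':', '&amp;#64;' => '@'));
                }, $content);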
    
                $content = $this->_postFilter($content);
            }
    
            return $content;
        }
    
        /**
         * Highlights a single code block using the geshi library.
         * If the number of blocks is unknown, call geshi() without a
         * language instead.
         *
         * @param array $contentArray - Either array("", language, code) or the
         *                              match from _postFilter: array(full match, block index)
         * @return string Syntax Highlighted content
         * @todo Add any extra / customization styling here.
         */
        private function _highlightSyntax($contentArray)
        {
            /* If the count is 2 we are working with the filter */
            if (count($contentArray) == 2) {
                $contentArray = $this->_codeMatches[$contentArray[1]];
            }
    
            /* Grab the language: unset falls back to text, empty (default [syntax]) to php */
            $language = isset($contentArray[1]) ? $contentArray[1] : 'text';
            if ($language == "")
                $language = "php";
    
            /* Remove leading spaces to avoid problems */
            $content = ltrim($contentArray[2]);
    
            /* Parse the code to be highlighted */
            $geshi = new GeSHi($content, strtolower($language));
            return $geshi->parse_code();
        }
    
        /**
         * Substitute the code blocks for formatting to be done without
         * messing up the code.
         *
         * @param array $match - Match array (full match, language, code) to substitute
         * @return string Substituted content
         */
        private function _substitute($match)
        {
            $index = sprintf("%02d", $this->_count++);
            $this->_codeMatches[$index] = $match;
            return "----" . $this->_token . $index . "----";
        }
    
        /**
         * Removes the code from the rest of the content to apply other filters.
         *
         * @param string $content - The content to filter out the code lines
         * @return string Content with code removed.
         */
        private function _preFilter($content)
        {
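            /* No U modifier here: combined with .*? it would make the quantifiers greedy and merge separate blocks */
            return preg_replace_callback("#\s*\[syntax=(.*?)\](.*?)\[/syntax\]\s*#si", array($this, "_substitute"), $content);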
        }
    
        /**
         * Replaces the code after the filters have been run.
         *
         * @param string $content - The content to replace the code lines
         * @return string Content with code re-applied.
         */
        private function _postFilter($content)
        {
            /* using dashes to prevent the old filtered tag being escaped */
            /* \d+ rather than \d{2} so that a 100th or later block is still matched */
            return preg_replace_callback("/----\s*" . $this->_token . "(\d+)\s*----/si", array($this, "_highlightSyntax"), $content);
        }
    }
    ?>
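
If the full class is more than you need, the core idea can be shown on its own: preg_replace_callback hands each match to a function, so the replacement can be computed instead of being a fixed string. The following is only a minimal sketch under a few assumptions. The names highlight_block and render_syntax_tags are made up for illustration, and highlight_block is a stand-in that merely HTML-escapes the code so the example runs without GeSHi installed; the comment inside it shows where the two GeSHi lines from the question would go.

```php
<?php
// Hypothetical stand-in for GeSHi so this sketch runs without the library.
// With GeSHi loaded, the body would instead be:
//     $geshi = new GeSHi($code, $language);
//     return $geshi->parse_code();
function highlight_block($code, $language)
{
    return '<pre class="' . htmlspecialchars($language) . '">'
         . htmlspecialchars($code) . '</pre>';
}

// Replace every [syntax=lang]...[/syntax] block with highlighted output.
function render_syntax_tags($text)
{
    return preg_replace_callback(
        '#\[syntax=(.*?)\](.*?)\[/syntax\]#si',
        function ($m) {
            // $m[1] is the captured language, $m[2] is the code between the tags.
            return highlight_block(ltrim($m[2]), strtolower($m[1]));
        },
        $text
    );
}
```

Calling render_syntax_tags("Intro [syntax=CPP]int x = 1 < 2;[/syntax] outro") would then return the text with the tagged block replaced by the highlighter's output and everything outside the tags left untouched. The closure syntax assumes PHP 5.3 or later.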