
How does the Windows Command Interpreter (CMD.EXE) parse scripts?


I came across ss64.com, which provides good help on writing batch scripts for the Windows Command Interpreter.

However, I have been unable to find a good explanation of the grammar of batch scripts, of how things do or do not get expanded, and of how to escape characters.

Here are sample questions that I have not been able to solve:


Solution

  • We performed experiments to investigate the grammar of batch scripts. We also investigated the differences between batch mode and command-line mode.

    Batch Line Parser:

    Here is a brief overview of phases in the batch file line parser:

    Phase 0) Read Line:

    Phase 1) Percent Expansion:

    Phase 2) Process special characters, tokenize, and build a cached command block: This is a complex process that is affected by things such as quotes, special characters, token delimiters, and caret escapes.

    Phase 3) Echo the parsed command(s): Only if the command block did not begin with @ and ECHO was ON at the start of the preceding step.

    Phase 4) FOR %X variable expansion: Only if a FOR command is active and the commands after DO are being processed.

    Phase 5) Delayed Expansion: Only if delayed expansion is enabled

    Phase 5.3) Pipe processing: Only if commands are on either side of a pipe

    Phase 5.5) Execute Redirection:

    Phase 6) CALL processing/Caret doubling: Only if the command token is CALL

    Phase 7) Execute: The command is executed
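    As a rough aid to reading the overview, the hypothetical Python helper below encodes only the overview-level conditions listed above (not the finer conditions given in the details) and returns the phases that would apply to a single command. The `CommandContext` fields are invented names for illustration, not anything cmd.exe exposes.

```python
from dataclasses import dataclass


@dataclass
class CommandContext:
    # Hypothetical flags describing one command; the names are illustrative only.
    starts_with_at: bool = False       # command block begins with @
    echo_on: bool = True               # ECHO state at the start of phase 2
    in_for_do_body: bool = False       # a FOR is active and we are after DO
    delayed_expansion: bool = False    # delayed expansion is enabled
    has_pipe: bool = False             # commands on either side of a pipe
    command_token: str = ""            # first token of the command


def phases_for(ctx: CommandContext) -> list[str]:
    """Return the overview phases that apply to ctx (simplified model)."""
    phases = ["0 Read Line", "1 Percent Expansion", "2 Special chars/tokenize"]
    if not ctx.starts_with_at and ctx.echo_on:
        phases.append("3 Echo")
    if ctx.in_for_do_body:
        phases.append("4 FOR %X expansion")
    if ctx.delayed_expansion:
        phases.append("5 Delayed Expansion")
    if ctx.has_pipe:
        phases.append("5.3 Pipe processing")
    phases.append("5.5 Redirection")  # the overview gives no condition for this phase
    if ctx.command_token.upper() == "CALL":
        phases.append("6 CALL/caret doubling")
    phases.append("7 Execute")
    return phases
```

    For example, a command written as `@call ...` skips phase 3 but includes phase 6 in this model.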


    Here are details for each phase:

    Note that the phases described below are only a model of how the batch parser works. The actual cmd.exe internals may not reflect these phases. However, this model is effective at predicting the behavior of batch scripts.

    Phase 0) Read Line: Read the line of input up to and including the first <LF>.
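    Phase 0 on its own is simple enough to sketch directly. The hypothetical `read_line` below works on an in-memory string rather than a file and splits off everything up to and including the first <LF>. It deliberately models nothing beyond that <LF> rule, so any <CR> stays in the returned line (<CR> is one of the characters phase 2 treats specially).

```python
def read_line(buffer: str) -> tuple[str, str]:
    """Phase 0 sketch: return (line, rest), where line runs through the first <LF>."""
    idx = buffer.find("\n")
    if idx == -1:
        # No <LF> left: the whole remaining buffer is the final line.
        return buffer, ""
    # Include the <LF> itself in the line; everything after it is left for later reads.
    return buffer[: idx + 1], buffer[idx + 1 :]
```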

    Phase 1) Percent Expansion:

    Phase 2) Process special characters, tokenize, and build a cached command block: This is a complex process that is affected by things such as quotes, special characters, token delimiters, and caret escapes. What follows is an approximation of this process.

    There are concepts that are important throughout this phase.

    The following characters may have special meaning in this phase, depending on context: <CR> ^ ( @ & | < > <LF> <space> <tab> ; , = <0x0B> <0x0C> <0xFF>
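    The full phase 2 rules are too involved to reproduce here, but three of the ingredients named above (quotes, caret escapes, and a command-separating special character) can be sketched. The hypothetical `split_commands` below is a deliberately simplified model. It assumes that `"` toggles quote mode, that `^` outside quotes escapes the next character and is itself removed, and that an unquoted, unescaped `&` separates commands. It does not model `&&`, `|`, `||`, redirection, parentheses, line continuation, or any of the other characters in the list above.

```python
def split_commands(line: str) -> list[str]:
    """Simplified phase 2 scan: split on unquoted, unescaped '&'."""
    commands: list[str] = []
    current: list[str] = []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == '"':
            # A quote toggles quote mode and is kept in the command text.
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == "^" and not in_quotes:
            # Caret escape: keep the next character literally and drop the caret.
            # An escaped quote therefore does not toggle quote mode.
            # A trailing caret is not modeled and is simply dropped.
            if i + 1 < len(line):
                current.append(line[i + 1])
            i += 1
        elif ch == "&" and not in_quotes:
            # Unquoted, unescaped '&' ends the current command.
            commands.append("".join(current))
            current = []
        else:
            # Everything else, including carets inside quotes, is literal.
            current.append(ch)
        i += 1
    commands.append("".join(current))
    return commands
```

    For example, `split_commands('echo a^&b & echo "x&y" & echo c^^d')` returns `['echo a&b ', ' echo "x&y" ', ' echo c^d']`. Whitespace around the separator is kept rather than trimmed, since cmd does not simply discard it (ECHO, for instance, prints the trailing space).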

    Look at each character from left to right:

    Phase 3) Echo the parsed command(s): Only if the command block did not begin with @ and ECHO was ON at the start of the preceding step.

    Phase 4) FOR %X variable expansion: Only if a FOR command is active and the commands after DO are being processed.
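    Phase 4 can be pictured as a plain substitution over the text after DO. The hypothetical `expand_for_vars` below assumes that phase 1 has already reduced the batch-file form %%X to %X. It treats FOR variable names as single, case-sensitive characters and does not model ~ modifiers such as %~nX.

```python
def expand_for_vars(body: str, for_vars: dict[str, str]) -> str:
    """Phase 4 sketch: replace %X with the current value of active FOR variable X."""
    out: list[str] = []
    i = 0
    while i < len(body):
        # Only '%' followed by the name of an active FOR variable is replaced.
        if body[i] == "%" and i + 1 < len(body) and body[i + 1] in for_vars:
            out.append(for_vars[body[i + 1]])
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)
```

    For example, with `{"A": "1"}` active, `echo %A and %a` becomes `echo 1 and %a`, because `a` is a different variable from `A`.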

    ---- From this point onward, each command identified in phase 2 is processed separately.
    ---- Phases 5 through 7 are completed for one command before moving on to the next.

    Phase 5) Delayed Expansion: Only if delayed expansion is on, the command is not in a parenthesized block on either side of a pipe, and the command is not a "naked" batch script (script name without parentheses, CALL, command concatenation, or pipe).
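    The ordering stated above (phases 5 through 7 finish for one command before the next one begins) is what lets delayed expansion see a value that was set earlier on the same line. The Python below is a toy model of that ordering. It assumes the commands have already been split in phase 2, and `expand_delayed` and `run_commands` are hypothetical names. The mini-interpreter understands only `set NAME=VALUE` and `echo TEXT`, and it ignores the pipe and "naked" batch script exceptions listed above.

```python
import re


def expand_delayed(command: str, env: dict[str, str]) -> str:
    """Phase 5 sketch: replace !name! with the variable's current value.

    Simplifications: undefined variables expand to an empty string, a lone '!'
    is left untouched, and caret handling in this phase is not modeled.
    Variable names are looked up case-insensitively (stored upper-case).
    """
    return re.sub(r"!([^!]+)!", lambda m: env.get(m.group(1).upper(), ""), command)


def run_commands(commands: list[str], env: dict[str, str]) -> list[str]:
    """Run already-split commands one at a time, phases 5-7 per command."""
    output: list[str] = []
    for cmd in commands:
        # Leading delimiters are skipped, then delayed expansion happens
        # just before this particular command executes.
        cmd = expand_delayed(cmd.lstrip(), env)
        verb, _, rest = cmd.partition(" ")
        if verb.lower() == "set":
            name, _, value = rest.partition("=")
            env[name.upper()] = value
        elif verb.lower() == "echo":
            output.append(rest)
    return output
```

    With delayed expansion enabled, a line such as `set x=1& echo !x!` prints 1 in this model, because `!x!` is expanded only when the second command is reached. A `%x%` reference would instead have been expanded in phase 1, for the whole line, before either command ran.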

    Phase 5.3) Pipe processing: Only if commands are on either side of a pipe
    Each side of the pipe is processed independently and asynchronously.

    Phase 5.5) Execute Redirection: Any redirection that was discovered in phase 2 is now executed.

    Phase 6) CALL processing/Caret doubling: Only if the command token is CALL, or if the text before the first occurring standard token delimiter is CALL. If CALL is parsed from a larger command token, then the unused portion is prepended to the arguments token before proceeding.
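    The caret-doubling part of phase 6 is easy to state in code. The hypothetical `call_phase` below handles only the case where the command token is exactly CALL (case-insensitively). It doubles every caret in the arguments, quoted or not. The "CALL parsed from a larger command token" case described above is not modeled.

```python
def call_phase(command_token: str, args: str) -> tuple[bool, str]:
    """Phase 6 sketch: return (is_call, args after caret doubling)."""
    if command_token.upper() != "CALL":
        # Not a CALL: phase 6 does not apply and the arguments are unchanged.
        return False, args
    # CALL doubles every caret, including carets inside quotes.
    return True, args.replace("^", "^^")
```

    For example, `call_phase("call", 'echo "^" ^^')` returns `(True, 'echo "^^" ^^^^')`.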

    Phase 7) Execute: The command is executed


    Command Line Parser:

    Works like the batch line parser, except:

    Phase 1) Percent Expansion:

    Phase 3) Echo the parsed command(s)

    Phase 5) Delayed Expansion: Only if delayed expansion is enabled

    Phase 7) Execute Command


    Parsing of integer values

    There are many different contexts where cmd.exe parses integer values from strings, and the rules are inconsistent:

    Details of these rules can be found in "Rules for how CMD.EXE parses numbers".
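    As an illustration of just one of those contexts, here is a sketch of how numeric literals are read in `SET /A` expressions. A `0x` prefix means hexadecimal, and any other leading 0 means octal, so `08` is rejected as an invalid number. The function name is hypothetical. Signs, whitespace, and overflow behavior are deliberately left out, and other contexts may treat the same strings differently.

```python
def parse_set_a_number(text: str) -> int:
    """Sketch of SET /A literal parsing: 0x -> hex, leading 0 -> octal, else decimal.

    Raises ValueError for strings SET /A would reject, e.g. "08" or "0x".
    Signs, whitespace and 32-bit overflow handling are not modeled.
    """
    if text.lower().startswith("0x"):
        return int(text[2:], 16)
    if len(text) > 1 and text.startswith("0"):
        return int(text[1:], 8)  # "010" -> 8; "08" raises ValueError
    return int(text, 10)
```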


    For anyone wishing to improve the cmd.exe parsing rules, there is a discussion topic on the DosTips forum where issues can be reported and suggestions made.

    Jan Erik (jeb) - Original author and discoverer of phases
    Dave Benham (dbenham) - Much additional content and editing