JavaScript source maps typically seem to be no finer than token granularity. As an example, identity-map uses token granularity.
I know I've seen other examples, but can't remember where.
Why don't we use AST-node based granularity instead? That is, if our source maps had locations for all and only starts of AST nodes, what would be the downside?
In my understanding, source maps are used for crash stack decoding and for debugging: there will never be an error location or useful breakpoint that isn't at the start of some AST node, right?
Some further clarification:
The question pertains to cases where the AST is already known. So "it's more expensive to generate an AST than an array of tokens" wouldn't answer the question.
The practical impact of this question is that if we could make source maps coarser while preserving the behavior of debuggers and crash stack decoders, then source maps could be much smaller. The main advantage would be debugger performance: dev tools can take a long time to process large source files, making debugging a pain.
Here is an example of adding source map locations at the token level using the source-map library:
// location() is assumed to return { line, column }, the shape addMapping expects.
for (const token of tokens) {
  generator.addMapping({
    source: "source.js",
    original: token.location(),
    generated: generated.get(token).location(),
  });
}
And here is an example of adding locations at the AST node level:
for (const node of nodes) {
  generator.addMapping({
    source: "source.js",
    original: node.location(),
    generated: generated.get(node).location(),
  });
}
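For concreteness, here is one way the set of AST node starts could be gathered before a loop like the one above. This is only a sketch: the tree is a hypothetical, hand-built ESTree-like object for the statement `x = a + b` (each node carrying a `type` and a `start` offset), and `collectNodeStarts` is a name made up for illustration, not part of any library.

```javascript
// Hypothetical, hand-built ESTree-like tree for the source text: x = a + b
const ast = {
  type: "ExpressionStatement", start: 0,
  expression: {
    type: "AssignmentExpression", start: 0,
    left: { type: "Identifier", start: 0, name: "x" },
    right: {
      type: "BinaryExpression", start: 4,
      left: { type: "Identifier", start: 4, name: "a" },
      right: { type: "Identifier", start: 8, name: "b" },
    },
  },
};

// Walk every object reachable from `root` and record the start offset of
// anything that looks like a node (has a string `type` and numeric `start`).
function collectNodeStarts(root) {
  const starts = new Set(); // several nodes can share one start offset
  const visit = (value) => {
    if (Array.isArray(value)) {
      value.forEach(visit);
    } else if (value !== null && typeof value === "object") {
      if (typeof value.type === "string" && typeof value.start === "number") {
        starts.add(value.start);
      }
      Object.values(value).forEach(visit);
    }
  };
  visit(root);
  return [...starts].sort((a, b) => a - b);
}

console.log(collectNodeStarts(ast)); // [ 0, 4, 8 ]
```

In this small example the token starts would be 0, 2, 4, 6 and 8, while the distinct node starts are only 0, 4 and 8, because the statement, the assignment and `x` all begin at the same offset and the operators start no node of their own.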
Q1: Why expect there to be fewer starts of AST Nodes than starts of tokens?
A1: Because if there were more starts of AST Nodes than starts of tokens, there would be an AST Node that starts at a non-token, which would be quite an accomplishment for the author of the parser! To make this concrete, suppose you have the following JavaScript statement:
const a = function *() { return a + ++ b }
Here are the locations at the starts of tokens:
const a = function *() { return a + ++ b } /*
^ ^ ^ ^^^ ^ ^ ^ ^ ^ ^ ^
*/
Here's roughly where most parsers will say the starts of AST Nodes are.
const a = function *() { return a + ++ b } /*
^ ^ ^ ^ ^ ^ ^
*/
That's a 46% reduction in the number of source-map locations!
Q2: Why expect AST-Node-granularity source maps to be smaller?
A2: See A1 above.
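To illustrate why fewer locations mean a smaller map, here is a rough sketch of how the `mappings` field of a version 3 source map encodes one generated line: each location becomes one comma-separated segment of Base64 VLQ deltas. The helper names and the column lists (for a hypothetical identity-mapped statement `x = a + b`) are assumptions made for illustration, and only the single-source, single-line case is handled.

```javascript
const BASE64 =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode one signed integer as Base64 VLQ: the sign goes in the lowest bit,
// then 5-bit groups are emitted least significant first, each with a
// continuation bit (32) when more groups follow.
function encodeVlq(value) {
  let vlq = value < 0 ? (-value << 1) | 1 : value << 1;
  let out = "";
  do {
    let digit = vlq & 31;
    vlq >>>= 5;
    if (vlq > 0) digit |= 32;
    out += BASE64[digit];
  } while (vlq > 0);
  return out;
}

// Encode one generated line whose mappings all point at line 0 of source 0.
// `pairs` is a list of [generatedColumn, originalColumn], sorted by
// generated column. Every field is stored as a delta from the previous segment.
function encodeLine(pairs) {
  let prevGenerated = 0;
  let prevOriginal = 0;
  return pairs
    .map(([generated, original]) => {
      const segment =
        encodeVlq(generated - prevGenerated) +
        encodeVlq(0) + // source index delta
        encodeVlq(0) + // original line delta
        encodeVlq(original - prevOriginal);
      prevGenerated = generated;
      prevOriginal = original;
      return segment;
    })
    .join(",");
}

// Illustrative columns for `x = a + b`, emitted unchanged (identity mapping).
const identity = (columns) => columns.map((c) => [c, c]);
const tokenLevel = encodeLine(identity([0, 2, 4, 6, 8]));
const nodeLevel = encodeLine(identity([0, 4, 8]));

console.log(tokenLevel, tokenLevel.length);
console.log(nodeLevel, nodeLevel.length);
```

Since every location costs at least one segment plus a separator, the encoded size grows roughly in proportion to the number of locations, so dropping locations shrinks the map by about the same fraction.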
Q3: What format would you use for referencing AST Nodes?
A3: No format. See the sample code above. I am talking about adding source map locations for the starts of AST Nodes. The process is almost exactly the same as the process for adding source map locations for the starts of tokens, except you are adding fewer locations.
Q4: How can you assert that all tools dealing with the source map use the same AST representation?
A4: Assume we control the entire pipeline and are using the same parser everywhere.
The TypeScript compiler actually only emits sourcemap locations on AST node bounds, with some exceptions to improve compatibility with certain tools that expect mappings for certain positions, so token-based maps actually aren't quite universal. In the example you give, TS's sourcemaps are for positions like so:
const a = function *() { return a + ++ b } /*
^ ^^ ^ ^ ^^ ^ ^^^
*/
These are generally both the start and end of each Identifier AST node (plus starts otherwise).
The rationale for mapping both start and end positions for an Identifier AST node is pretty simple: when you rename an Identifier, you want a selection range on that renamed identifier to be able to map back to the original identifier, without necessarily relying on heuristics.
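A small sketch of that rationale, under stated assumptions: the original text is `longName + 1`, the generated text is `a+1`, and positions are resolved by taking the mapping with the greatest generated column not exceeding the query. The data and the function names below are made up for illustration and are not TypeScript's or any library's API.

```javascript
// Resolve a generated column to an original column using the closest
// mapping at or before it (a simplified greatest-lower-bound lookup).
function originalColumnFor(mappings, generatedColumn) {
  let best = null;
  for (const m of mappings) {
    if (
      m.generated <= generatedColumn &&
      (best === null || m.generated > best.generated)
    ) {
      best = m;
    }
  }
  return best === null ? null : best.original;
}

// Map a generated [start, end) column range back to original columns.
function mapRange(mappings, [start, end]) {
  return [
    originalColumnFor(mappings, start),
    originalColumnFor(mappings, end),
  ];
}

// original: longName + 1   (identifier spans columns 0..8, literal at 11)
// generated: a+1           (identifier spans columns 0..1, literal at 2)
const startsOnly = [
  { generated: 0, original: 0 }, // start of the identifier
  { generated: 2, original: 11 }, // start of the literal
];
const startsAndIdentifierEnds = [
  { generated: 0, original: 0 }, // start of the identifier
  { generated: 1, original: 8 }, // end of the identifier
  { generated: 2, original: 11 }, // start of the literal
];

// Selection range covering the renamed identifier `a` in the generated text.
console.log(mapRange(startsOnly, [0, 1])); // [ 0, 0 ]
console.log(mapRange(startsAndIdentifierEnds, [0, 1])); // [ 0, 8 ]
```

With starts only, the end of the selection collapses onto the identifier's start, so a tool would have to guess the original length; with the extra end mapping the range comes back as the full span of `longName`.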