Lex Jupyter line magic with `Mode::Jupyter`

This PR adds a new token, `MagicCommand`[^1], which the lexer will
recognize when in `Mode::Jupyter`. The rules for the lexer are as
follows (a usage sketch appears after the list):
1. Given that we are at the start of a line, skip the indentation and look
for [characters that represent the start of a magic
command](635815e8f1/IPython/core/inputtransformer2.py (L335-L346)),
determine the magic kind, and capture all the characters following it as
the command string.
2. If the command extends over multiple lines, the lexer will skip the line
continuation character (`\`), but only if it's followed by a newline
(`\n` or `\r`). The reason to skip it only before a newline is that
backslashes can also occur in the command string, where they should not
be skipped:

	```rust
    //        Skip this backslash
    //        v
    //   !pwd \
    //      && ls -a | sed 's/^/\\    /'
    //                          ^^
    //                          Don't skip these backslashes
	```

3. The parser, when in `Mode::Jupyter`, will filter out these tokens before
parsing begins. There is a small caveat when the magic command is
indented: in the following example, once the parser filters out the magic
command, it will raise an indentation error:

	```python
	for i in range(5):
		!ls

	# What the parser will see
	for i in range(5):
	
	```
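
For illustration, this is roughly how the lexer could be driven in the new
mode. This is a hypothetical sketch: the `lex` entry point and the
`Mode::Jupyter` variant follow the description above, but the exact API and
token shape may differ.

```rust
use rustpython_parser::{lexer::lex, Mode};

fn main() {
    // In `Mode::Jupyter`, the whole line is captured as a single
    // `MagicCommand` token instead of producing a syntax error.
    for token in lex("!ls -a", Mode::Jupyter) {
        println!("{token:?}");
    }
}
```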

[^1]: I would prefer some other name, as this represents not only a line
magic (`%`) but also a shell command (`!`), a help command (`?`), and
others. In the original implementation, it's named ["IPython
Syntax"](635815e8f1/IPython/core/inputtransformer2.py (L332)).

RustPython/parser

This directory contains the code for Python lexing, parsing, and generating Abstract Syntax Trees (ASTs).

The steps are (a rough code sketch follows the list):

  • Lexical analysis: splits the source code into tokens.
  • Parsing and generating the AST: transforms those tokens into an AST. Uses LALRPOP, a Rust parser generator framework.
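
As a sketch of these two steps in code (this assumes the crate exposes `lexer::lex` and a top-level `parse_tokens` function; names and signatures may vary between versions):

  use rustpython_parser::{lexer::lex, parse_tokens, Mode};

  let source = "x = 1 + 2";
  // Step 1: lexical analysis produces a stream of token results.
  let tokens = lex(source, Mode::Module);
  // Step 2: the LALRPOP-generated parser turns the tokens into an AST.
  let ast = parse_tokens(tokens, Mode::Module, "<embedded>").unwrap();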

This crate's documentation is published at https://docs.rs/rustpython-parser.

We wrote a blog post with screenshots and an explanation to help you understand the steps by seeing them in action.

For more information on LALRPOP, see the LALRPOP book.

There is a readme in the src folder with the details of each file.

Directory content

build.rs
The build script.

Cargo.toml
The crate manifest.

The src directory has:

lib.rs
This is the crate's root.

lexer.rs
This module takes care of lexing Python source text: the source code is translated into separate tokens.

parser.rs
A Python parsing module. Use this module to parse Python code into an AST. There are three ways to parse Python code: as a whole program, as a single statement, or as a single expression.

ast.rs
Implements abstract syntax tree (AST) nodes for the Python language. Roughly equivalent to the Python AST.

python.lalrpop
Python grammar.

token.rs
Definitions of the different tokens. Loosely based on token.h from the CPython source.

errors.rs
Defines the internal parse error types. The goal is to provide a safe error API that is easy to match on, masking the errors coming from LALRPOP.

fstring.rs
Parsing of f-strings (format strings).

function.rs
A collection of functions for parsing function parameters and call arguments.

location.rs
Datatypes to support source location information.

mode.rs
Execution mode check. Allowed modes are exec, eval or single.
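
As a sketch of how a mode is used when parsing (this assumes the crate exposes a top-level `parse` function and a `Mode` enum whose variants correspond to the exec/eval/single modes; the exact names may differ between versions):

  use rustpython_parser::{parse, Mode};

  // Parse a whole module (exec mode) versus a single expression (eval mode).
  let module = parse("x = 1\nprint(x)", Mode::Module, "<embedded>").unwrap();
  let expression = parse("1 + 2", Mode::Expression, "<embedded>").unwrap();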

How to use

For example, one could do this:

  use rustpython_parser::{Parse, ast};
  let python_source = "print('Hello world')";
  let python_statements = ast::Suite::parse(python_source, "<embedded>").unwrap();  // statements
  let python_expr = ast::Expr::parse(python_source, "<embedded>").unwrap();  // or expr
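
The returned `Suite` is a `Vec` of statement nodes, so the result can be inspected directly, for example by debug-printing it:

  println!("{python_statements:#?}");  // pretty-print the parsed statements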