Dhruv Manilawala d8f5d2d767 Add support for auto-fix in Jupyter notebooks (#4665)
## Summary

Add support for applying auto-fixes in Jupyter Notebook.

### Solution

Cell offsets are the boundaries for each cell in the concatenated source
code. They are represented using `TextSize`. It includes the start and
end offset as well, thus creating a range for each cell. These offsets
are updated using the `SourceMap` markers.

### SourceMap

`SourceMap` contains markers constructed from each edits which tracks
the original source code position to the transformed positions. The
following drawing might make it clear:

![SourceMap visualization](https://github.com/astral-sh/ruff/assets/67177269/3c94e591-70a7-4b57-bd32-0baa91cc7858)

The center column where the dotted lines are present are the markers
included in the `SourceMap`. The `Notebook` looks at these markers and
updates the cell offsets after each linter loop. If you notice closely,
the destination takes into account all of the markers before it.

The index is constructed only when required as it's only used to render
the diagnostics. So, a `OnceCell` is used for this purpose. The cell
offsets, cell content and the index will be updated after each iteration
of linting in the mentioned order. The order is important here as the
content is updated as per the new offsets and index is updated as per
the new content.

## Limitations

### 1

Styling rules such as the ones in `pycodestyle` will not be applicable
everywhere in Jupyter notebook, especially at the cell boundaries. Let's
take an example where a rule suggests to have 2 blank lines before a
function and the cells contains the following code:

```python
import something
# ---
def first():
	pass

def second():
	pass
```

(Again, the comment is only to visualize cell boundaries.)

In the concatenated source code, the 2 blank lines will be added but it
shouldn't actually be added when we look in terms of Jupyter notebook.
It's as if the function `first` is at the start of a file.

`nbqa` solves this by recording newlines before and after running
`autopep8`, then running the tool and restoring the newlines at the end
(refer https://github.com/nbQA-dev/nbQA/pull/807).

## Test Plan

Three commands were run in order with common flags (`--select=ALL
--no-cache --isolated`) to isolate which stage the problem is occurring:
1. Only diagnostics
2. Fix with diff (`--fix --diff`)
3. Fix (`--fix`)

### https://github.com/facebookresearch/segment-anything

```
-------------------------------------------------------------------------------
 Jupyter Notebooks       3            0            0            0            0
 |- Markdown             3           98            0           94            4
 |- Python               3          513          468            4           41
 (Total)                            611          468           98           45
-------------------------------------------------------------------------------
```

```console
$ cargo run --all-features --bin ruff -- check --no-cache --isolated --select=ALL /path/to/segment-anything/**/*.ipynb --fix
...
Found 180 errors (89 fixed, 91 remaining).
```

### https://github.com/openai/openai-cookbook

```
-------------------------------------------------------------------------------
 Jupyter Notebooks      65            0            0            0            0
 |- Markdown            64         3475           12         2507          956
 |- Python              65         9700         7362         1101         1237
 (Total)                          13175         7374         3608         2193
===============================================================================
```

```console
$ cargo run --all-features --bin ruff -- check --no-cache --isolated --select=ALL /path/to/openai-cookbook/**/*.ipynb --fix
error: Failed to parse /path/to/openai-cookbook/examples/vector_databases/Using_vector_databases_for_embeddings_search.ipynb:cell 4:29:18: unexpected token '-'
...
Found 4227 errors (2165 fixed, 2062 remaining).
```

### https://github.com/tensorflow/docs

```
-------------------------------------------------------------------------------
 Jupyter Notebooks     150            0            0            0            0
 |- Markdown             1           55            0           46            9
 |- Python               1          402          289           60           53
 (Total)                            457          289          106           62
-------------------------------------------------------------------------------
```

```console
$ cargo run --all-features --bin ruff -- check --no-cache --isolated --select=ALL /path/to/tensorflow-docs/**/*.ipynb --fix
error: Failed to parse /path/to/tensorflow-docs/site/en/guide/extension_type.ipynb:cell 80:1:1: unexpected token Indent
error: Failed to parse /path/to/tensorflow-docs/site/en/r1/tutorials/eager/custom_layers.ipynb:cell 20:1:1: unexpected token Indent
error: Failed to parse /path/to/tensorflow-docs/site/en/guide/data.ipynb:cell 175:5:14: unindent does not match any outer indentation level
error: Failed to parse /path/to/tensorflow-docs/site/en/r1/tutorials/representation/unicode.ipynb:cell 30:1:1: unexpected token Indent
...
Found 12726 errors (5140 fixed, 7586 remaining).
```

### https://github.com/tensorflow/models

```
-------------------------------------------------------------------------------
 Jupyter Notebooks      46            0            0            0            0
 |- Markdown             1           11            0            6            5
 |- Python               1          328          249           19           60
 (Total)                            339          249           25           65
-------------------------------------------------------------------------------
```

```console
$ cargo run --all-features --bin ruff -- check --no-cache --isolated --select=ALL /path/to/tensorflow-models/**/*.ipynb --fix
...
Found 4856 errors (2690 fixed, 2166 remaining).
```

resolves: #1218
fixes: #4556
2023-06-12 14:14:15 +00:00
2023-06-05 21:05:42 +02:00
2023-06-09 09:03:34 +02:00
2023-06-04 17:51:47 +00:00

Ruff

Ruff image image image Actions status

Discord | Docs | Playground

An extremely fast Python linter, written in Rust.

Shows a bar chart with benchmark results.

Linting the CPython codebase from scratch.

  • 10-100x faster than existing linters
  • 🐍 Installable via pip
  • 🛠️ pyproject.toml support
  • 🤝 Python 3.11 compatibility
  • 📦 Built-in caching, to avoid re-analyzing unchanged files
  • 🔧 Autofix support, for automatic error correction (e.g., automatically remove unused imports)
  • 📏 Over 500 built-in rules
  • ⚖️ Near-parity with the built-in Flake8 rule set
  • 🔌 Native re-implementations of dozens of Flake8 plugins, like flake8-bugbear
  • ⌨️ First-party editor integrations for VS Code and more
  • 🌎 Monorepo-friendly, with hierarchical and cascading configuration

Ruff aims to be orders of magnitude faster than alternative tools while integrating more functionality behind a single, common interface.

Ruff can be used to replace Flake8 (plus dozens of plugins), isort, pydocstyle, yesqa, eradicate, pyupgrade, and autoflake, all while executing tens or hundreds of times faster than any individual tool.

Ruff is extremely actively developed and used in major open-source projects like:

...and many more.

Ruff is backed by Astral. Read the launch post, or the original project announcement.

Testimonials

Sebastián Ramírez, creator of FastAPI:

Ruff is so fast that sometimes I add an intentional bug in the code just to confirm it's actually running and checking the code.

Nick Schrock, founder of Elementl, co-creator of GraphQL:

Why is Ruff a gamechanger? Primarily because it is nearly 1000x faster. Literally. Not a typo. On our largest module (dagster itself, 250k LOC) pylint takes about 2.5 minutes, parallelized across 4 cores on my M1. Running ruff against our entire codebase takes .4 seconds.

Bryan Van de Ven, co-creator of Bokeh, original author of Conda:

Ruff is ~150-200x faster than flake8 on my machine, scanning the whole repo takes ~0.2s instead of ~20s. This is an enormous quality of life improvement for local dev. It's fast enough that I added it as an actual commit hook, which is terrific.

Timothy Crosley, creator of isort:

Just switched my first project to Ruff. Only one downside so far: it's so fast I couldn't believe it was working till I intentionally introduced some errors.

Tim Abbott, lead developer of Zulip:

This is just ridiculously fast... ruff is amazing.

Table of Contents

For more, see the documentation.

  1. Getting Started
  2. Configuration
  3. Rules
  4. Contributing
  5. Support
  6. Acknowledgements
  7. Who's Using Ruff?
  8. License

Getting Started

For more, see the documentation.

Installation

Ruff is available as ruff on PyPI:

pip install ruff

You can also install Ruff via Homebrew, Conda, and with a variety of other package managers.

Usage

To run Ruff, try any of the following:

ruff check .                        # Lint all files in the current directory (and any subdirectories)
ruff check path/to/code/            # Lint all files in `/path/to/code` (and any subdirectories)
ruff check path/to/code/*.py        # Lint all `.py` files in `/path/to/code`
ruff check path/to/code/to/file.py  # Lint `file.py`

Ruff can also be used as a pre-commit hook:

- repo: https://github.com/astral-sh/ruff-pre-commit
  # Ruff version.
  rev: v0.0.272
  hooks:
    - id: ruff

Ruff can also be used as a VS Code extension or alongside any other editor through the Ruff LSP.

Ruff can also be used as a GitHub Action via ruff-action:

name: Ruff
on: [ push, pull_request ]
jobs:
  ruff:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: chartboost/ruff-action@v1

Configuration

Ruff can be configured through a pyproject.toml, ruff.toml, or .ruff.toml file (see: Configuration, or Settings for a complete list of all configuration options).

If left unspecified, the default configuration is equivalent to:

[tool.ruff]
# Enable pycodestyle (`E`) and Pyflakes (`F`) codes by default.
select = ["E", "F"]
ignore = []

# Allow autofix for all enabled rules (when `--fix`) is provided.
fixable = ["A", "B", "C", "D", "E", "F", "G", "I", "N", "Q", "S", "T", "W", "ANN", "ARG", "BLE", "COM", "DJ", "DTZ", "EM", "ERA", "EXE", "FBT", "ICN", "INP", "ISC", "NPY", "PD", "PGH", "PIE", "PL", "PT", "PTH", "PYI", "RET", "RSE", "RUF", "SIM", "SLF", "TCH", "TID", "TRY", "UP", "YTT"]
unfixable = []

# Exclude a variety of commonly ignored directories.
exclude = [
    ".bzr",
    ".direnv",
    ".eggs",
    ".git",
    ".git-rewrite",
    ".hg",
    ".mypy_cache",
    ".nox",
    ".pants.d",
    ".pytype",
    ".ruff_cache",
    ".svn",
    ".tox",
    ".venv",
    "__pypackages__",
    "_build",
    "buck-out",
    "build",
    "dist",
    "node_modules",
    "venv",
]

# Same as Black.
line-length = 88

# Allow unused variables when underscore-prefixed.
dummy-variable-rgx = "^(_+|(_+[a-zA-Z0-9_]*[a-zA-Z0-9]+?))$"

# Assume Python 3.10.
target-version = "py310"

[tool.ruff.mccabe]
# Unlike Flake8, default to a complexity level of 10.
max-complexity = 10

Some configuration options can be provided via the command-line, such as those related to rule enablement and disablement, file discovery, logging level, and more:

ruff check path/to/code/ --select F401 --select F403 --quiet

See ruff help for more on Ruff's top-level commands, or ruff help check for more on the linting command.

Rules

Ruff supports over 500 lint rules, many of which are inspired by popular tools like Flake8, isort, pyupgrade, and others. Regardless of the rule's origin, Ruff re-implements every rule in Rust as a first-party feature.

By default, Ruff enables Flake8's E and F rules. Ruff supports all rules from the F category, and a subset of the E category, omitting those stylistic rules made obsolete by the use of an autoformatter, like Black.

If you're just getting started with Ruff, the default rule set is a great place to start: it catches a wide variety of common errors (like unused imports) with zero configuration.

Beyond the defaults, Ruff re-implements some of the most popular Flake8 plugins and related code quality tools, including:

For a complete enumeration of the supported rules, see Rules.

Contributing

Contributions are welcome and highly appreciated. To get started, check out the contributing guidelines.

You can also join us on Discord.

Support

Having trouble? Check out the existing issues on GitHub, or feel free to open a new one.

You can also ask for help on Discord.

Acknowledgements

Ruff's linter draws on both the APIs and implementation details of many other tools in the Python ecosystem, especially Flake8, Pyflakes, pycodestyle, pydocstyle, pyupgrade, and isort.

In some cases, Ruff includes a "direct" Rust port of the corresponding tool. We're grateful to the maintainers of these tools for their work, and for all the value they've provided to the Python community.

Ruff's autoformatter is built on a fork of Rome's rome_formatter, and again draws on both the APIs and implementation details of Rome, Prettier, and Black.

Ruff is also influenced by a number of tools outside the Python ecosystem, like Clippy and ESLint.

Ruff is the beneficiary of a large number of contributors.

Ruff is released under the MIT license.

Who's Using Ruff?

Ruff is used by a number of major open-source projects and companies, including:

Show Your Support

If you're using Ruff, consider adding the Ruff badge to project's README.md:

[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/charliermarsh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)

...or README.rst:

.. image:: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/charliermarsh/ruff/main/assets/badge/v2.json
    :target: https://github.com/astral-sh/ruff
    :alt: Ruff

...or, as HTML:

<a href="https://github.com/astral-sh/ruff"><img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/charliermarsh/ruff/main/assets/badge/v2.json" alt="Ruff" style="max-width:100%;"></a>

License

MIT

Description
No description provided
Readme MIT 328 MiB
Languages
Rust 96.1%
Python 2.6%
TypeScript 0.9%
RenderScript 0.2%
Shell 0.1%