Commit Graph

41 Commits

Author SHA1 Message Date
Dhruv Manilawala ee0f1270cf
Add `NotebookIndex` to the cache (#6863)
## Summary

This PR updates the `FileCache` to include an optional `NotebookIndex`
to support caching for Jupyter Notebooks.

We only require the index to compute the diagnostics and thus we don't
really need to store the entire `Notebook` on the `Diagnostics` struct.
This means we only need the index to be stored in the cache to
reconstruct the `Diagnostics`.

## Test Plan

Update an existing test case to run over the fixtures under
`ruff_notebook` crate where there are multiple Jupyter Notebook.

Locally, the following commands were run in order:
1. Remove the cache: `rm -rf .ruff_cache`
2. Run without cache: `cargo run --bin ruff -- check --isolated
crates/ruff_notebook/resources/test/fixtures/jupyter/unused_variable.ipynb
--no-cache`
3. Run with cache: `cargo run --bin ruff -- check --isolated
crates/ruff_notebook/resources/test/fixtures/jupyter/unused_variable.ipynb`
4. Check whether the `.ruff_cache` directory was created or not
5. Run with cache again and verify: `cargo run --bin ruff -- check
--isolated
crates/ruff_notebook/resources/test/fixtures/jupyter/unused_variable.ipynb`

## Benchmarks

https://github.com/astral-sh/ruff/pull/6863#issuecomment-1715675186

fixes: #6671
2023-09-12 18:29:03 +05:30
Dhruv Manilawala 1067261a55
Make `SourceKind` a required parameter (#7013) 2023-09-04 07:45:59 +00:00
Charlie Marsh afcd00da56
Create `ruff_notebook` crate (#7039)
## Summary

This PR moves `ruff/jupyter` into its own `ruff_notebook` crate. Beyond
the move itself, there were a few challenges:

1. `ruff_notebook` relies on the source map abstraction. I've moved the
source map into `ruff_diagnostics`, since it doesn't have any
dependencies on its own and is used alongside diagnostics.
2. `ruff_notebook` has a couple tests for end-to-end linting and
autofixing. I had to leave these tests in `ruff` itself.
3. We had code in `ruff/jupyter` that relied on Python lexing, in order
to provide a more targeted error message in the event that a user saves
a `.py` file with a `.ipynb` extension. I removed this in order to avoid
a dependency on the parser, it felt like it wasn't worth retaining just
for that dependency.

## Test Plan

`cargo test`
2023-09-01 13:56:44 +00:00
Charlie Marsh 60132da7bb
Add a `NotebookError` type to avoid returning `Diagnostics` on error (#7035)
## Summary

This PR refactors the error-handling cases around Jupyter notebooks to
use errors rather than `Box<Diagnostics>`, which creates some oddities
in the downstream handling. So, instead of formatting errors as
diagnostics _eagerly_ (in the notebook methods), we now return errors
and convert those errors to diagnostics at the last possible moment (in
`diagnostics.rs`). This is more ergonomic, as errors can be composed and
reported-on in different ways, whereas diagnostics require a `Printer`,
etc.

See, e.g.,
https://github.com/astral-sh/ruff/pull/7013#discussion_r1311136301.

## Test Plan

Ran `cargo run` over a Python file labeled with a `.ipynb` suffix, and
saw:

```
foo.ipynb:1:1: E999 SyntaxError: Expected a Jupyter Notebook, which must be internally stored as JSON, but found a Python source file: expected value at line 1 column 1
```
2023-09-01 11:08:05 +00:00
Charlie Marsh 58f5f27dc3
Add TOML files to `SourceType` (#6929)
## Summary

This PR adds a higher-level enum (`SourceType`) around `PySourceType` to
allow us to use the same detection path to handle TOML files. Right now,
we have ad hoc `is_pyproject_toml` checks littered around, and some
codepaths are omitting that logic altogether (like `add_noqa`). Instead,
we should always be required to check the source type and handle TOML
files as appropriate.

This PR will also help with our pre-commit capabilities. If we add
`toml` to pre-commit (to support `pyproject.toml`), pre-commit will
start to pass _other_ files to Ruff (along with `poetry.lock` and
`Pipfile` -- see
[identify](b59996304f/identify/extensions.py (L355))).
By detecting those files and handling those cases, we avoid attempting
to parse them as Python files, which would lead to pre-commit errors.
(We tried to add `toml` to pre-commit here
(https://github.com/astral-sh/ruff-pre-commit/pull/44), but had to
revert here (https://github.com/astral-sh/ruff-pre-commit/pull/45) as it
led to the pre-commit hook attempting to parse `poetry.lock` files as
Python files.)
2023-08-28 15:01:48 +00:00
Dhruv Manilawala d1f07008f7
Rename Notebook related symbols (#6862)
This PR renames the following symbols:

* `PySourceType::Jupyter` -> `PySourceType::Ipynb`
* `SourceKind::Jupyter` -> `SourceKind::IpyNotebook`
* `JupyterIndex` -> `NotebookIndex`
2023-08-25 11:40:54 +05:30
Micha Reiser ea72d5feba
Refactor `SourceKind` to store file content (#6640) 2023-08-18 13:45:38 +00:00
Charlie Marsh 2aeb27334d
Avoid cloning source code multiple times (#6629)
## Summary

In working on https://github.com/astral-sh/ruff/pull/6628, I noticed
that we clone the source code contents, potentially multiple times,
prior to linting. The issue is that `SourceKind::Python` takes a
`String`, so we first have to provide it with a `String`. In the stdin
case, that means cloning. However, on top of this, we then have to clone
`source_kind.contents()` because `SourceKind` gets mutated. So for
stdin, we end up cloning twice. For non-stdin, we end up cloning once,
but unnecessarily (since the _contents_ don't get mutated, only the
kind).

This PR removes the `String` from `source_kind`, instead requiring that
we parse it out elsewhere. It reduces the number of clones down to 1 for
Jupyter Notebooks, and zero otherwise.
2023-08-18 09:32:18 -04:00
Charlie Marsh 98b9f2e705
Respect .ipynb and .pyi sources when linting from stdin (#6628)
## Summary

When running Ruff from stdin, we were always falling back to the default
source type, even if the user specified a path (as is the case when
running from the LSP). This PR wires up the source type inference, which
means we now get the expected result when checking `.pyi` and `.ipynb`
files.

Closes #6627.

## Test Plan

Verified that `cat
crates/ruff/resources/test/fixtures/jupyter/valid.ipynb | cargo run -p
ruff_cli -- --force-exclude --no-cache --no-fix --isolated --select ALL
--stdin-filename foo.ipynb -` yielded the expected results (and differs
from the errors you get if you omit the filename).

Verified that `cat foo.pyi | cargo run -p ruff_cli -- --force-exclude
--no-cache --no-fix --format json --isolated --select TCH
--stdin-filename path/to/foo.pyi -` yielded no errors.
2023-08-16 20:33:59 +00:00
Dhruv Manilawala 32fa05765a
Use `Jupyter` mode while parsing Notebook files (#5552)
## Summary

Enable using the new `Mode::Jupyter` for the tokenizer/parser to parse
Jupyter line magic tokens.

The individual call to the lexer i.e., `lex_starts_at` done by various
rules should consider the context of the source code (is this content
from a Jupyter Notebook?). Thus, a new field `source_type` (of type
`PySourceType`) is added to `Checker` which is being passed around as an
argument to the relevant functions. This is then used to determine the
`Mode` for the lexer.

## Test Plan

Add new test cases to make sure that the magic statement is considered
while generating the diagnostic and autofix:
* For `I001`, if there's a magic statement in between two import blocks,
they should be sorted independently

fixes: #6090
2023-08-05 00:32:07 +00:00
Charlie Marsh 4231ed2fc3
Skip partial duplicates when applying multi-edit fixes (#6144)
## Summary

Right now, if we have two fixes that have an overlapping edit, but not
an _identical_ set of edits, they'll conflict, causing us to do another
linter traversal. Here, I've enabled the fixer to support partially
overlapping edits, which (as an example) let's us greatly reduce the
number of iterations required in the test suite.

The most common case here is that in which a bunch of edits need to
import some symbol, and then use that symbol, but in different ways. In
that case, all edits will have a common fix (to import the symbol), but
deviate in some way. With this change, we can do all of those edits in
one pass.

Note that the simplest way to enable this was to store sorted edits on
`Fix`. We don't allow modifying the edits on `Fix` once it's
constructed, so this is an easy change, and allows us to avoid a bunch
of clones and traversals later on.

Closes #5800.
2023-07-29 12:11:57 +00:00
Dhruv Manilawala 3c99fbf808
Implement `--diff` for Jupyter Notebooks (#6149)
## Summary

Implement `--diff` for Jupyter Notebooks

## Test Plan

1. Use `crates/ruff/resources/test/fixtures/jupyter/isort.ipynb` as a
test case
and add a markdown cell in between the code cells to check that the diff
   outputs the correct cell index.
2. Run the command:
`cargo run --bin ruff --package ruff_cli -- check --no-cache --isolated
--select=ALL crates/ruff/resources/test/fixtures/jupyter/isort.ipynb
--fix --diff`

<details><summary>Example output:</summary>
<p>

```diff
--- /Users/dhruv/playground/ruff/notebooks/test.ipynb:cell 0
+++ /Users/dhruv/playground/ruff/notebooks/test.ipynb:cell 0
@@ -1,3 +0,0 @@
-from pathlib import Path
-import random
-import math
--- /Users/dhruv/playground/ruff/notebooks/test.ipynb:cell 4
+++ /Users/dhruv/playground/ruff/notebooks/test.ipynb:cell 4
@@ -1,5 +1,3 @@
-from typing import Any
-import collections
 # Newline should be added here
 def foo():
     pass

--- /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 8
+++ /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 8
@@ -1,8 +1,7 @@
 import pprint
 import tempfile
 
-from IPython import display
 import matplotlib.pyplot as plt
-
 import tensorflow as tf
-import tensorflow_datasets as tfds
+import tensorflow_datasets as tfds
+from IPython import display
--- /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 10
+++ /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 10
@@ -1,5 +1,4 @@
 import tensorflow_models as tfm
 
 # These are not in the tfm public API for v2.9. They will be available in v2.10
-from official.vision.serving import export_saved_model_lib
-import official.core.train_lib
+from official.vision.serving import export_saved_model_lib
--- /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 13
+++ /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 13
@@ -1,5 +1,5 @@
-exp_config = tfm.core.exp_factory.get_exp_config('resnet_imagenet')
-tfds_name = 'cifar10'
+exp_config = tfm.core.exp_factory.get_exp_config("resnet_imagenet")
+tfds_name = "cifar10"
 ds,ds_info = tfds.load(
 tfds_name,
 with_info=True)
--- /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 15
+++ /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 15
@@ -6,12 +6,12 @@
 # Configure training and testing data
 batch_size = 128
 
-exp_config.task.train_data.input_path = ''
+exp_config.task.train_data.input_path = ""
 exp_config.task.train_data.tfds_name = tfds_name
-exp_config.task.train_data.tfds_split = 'train'
+exp_config.task.train_data.tfds_split = "train"
 exp_config.task.train_data.global_batch_size = batch_size
 
-exp_config.task.validation_data.input_path = ''
+exp_config.task.validation_data.input_path = ""
 exp_config.task.validation_data.tfds_name = tfds_name
-exp_config.task.validation_data.tfds_split = 'test'
+exp_config.task.validation_data.tfds_split = "test"
 exp_config.task.validation_data.global_batch_size = batch_size
--- /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 17
+++ /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 17
@@ -1,16 +1,16 @@
 logical_device_names = [logical_device.name for logical_device in tf.config.list_logical_devices()]
 
-if 'GPU' in ''.join(logical_device_names):
-  print('This may be broken in Colab.')
-  device = 'GPU'
-elif 'TPU' in ''.join(logical_device_names):
-  print('This may be broken in Colab.')
-  device = 'TPU'
+if "GPU" in "".join(logical_device_names):
+  print("This may be broken in Colab.")
+  device = "GPU"
+elif "TPU" in "".join(logical_device_names):
+  print("This may be broken in Colab.")
+  device = "TPU"
 else:
-  print('Running on CPU is slow, so only train for a few steps.')
-  device = 'CPU'
+  print("Running on CPU is slow, so only train for a few steps.")
+  device = "CPU"
 
-if device=='CPU':
+if device=="CPU":
   train_steps = 20
   exp_config.trainer.steps_per_loop = 5
 else:
@@ -20,9 +20,9 @@
 exp_config.trainer.summary_interval = 100
 exp_config.trainer.checkpoint_interval = train_steps
 exp_config.trainer.validation_interval = 1000
-exp_config.trainer.validation_steps =  ds_info.splits['test'].num_examples // batch_size
+exp_config.trainer.validation_steps =  ds_info.splits["test"].num_examples // batch_size
 exp_config.trainer.train_steps = train_steps
-exp_config.trainer.optimizer_config.learning_rate.type = 'cosine'
+exp_config.trainer.optimizer_config.learning_rate.type = "cosine"
 exp_config.trainer.optimizer_config.learning_rate.cosine.decay_steps = train_steps
 exp_config.trainer.optimizer_config.learning_rate.cosine.initial_learning_rate = 0.1
 exp_config.trainer.optimizer_config.warmup.linear.warmup_steps = 100
--- /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 21
+++ /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 21
@@ -1,14 +1,14 @@
 logical_device_names = [logical_device.name for logical_device in tf.config.list_logical_devices()]
 
 if exp_config.runtime.mixed_precision_dtype == tf.float16:
-    tf.keras.mixed_precision.set_global_policy('mixed_float16')
+    tf.keras.mixed_precision.set_global_policy("mixed_float16")
 
-if 'GPU' in ''.join(logical_device_names):
+if "GPU" in "".join(logical_device_names):
   distribution_strategy = tf.distribute.MirroredStrategy()
-elif 'TPU' in ''.join(logical_device_names):
+elif "TPU" in "".join(logical_device_names):
   tf.tpu.experimental.initialize_tpu_system()
-  tpu = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='/device:TPU_SYSTEM:0')
+  tpu = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="/device:TPU_SYSTEM:0")
   distribution_strategy = tf.distribute.experimental.TPUStrategy(tpu)
 else:
-  print('Warning: this will be really slow.')
+  print("Warning: this will be really slow.")
   distribution_strategy = tf.distribute.OneDeviceStrategy(logical_device_names[0])
--- /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 23
+++ /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 23
@@ -1,5 +1,3 @@
 with distribution_strategy.scope():
   model_dir = tempfile.mkdtemp()
   task = tfm.core.task_factory.get_task(exp_config.task, logging_dir=model_dir)
-
-#  tf.keras.utils.plot_model(task.build_model(), show_shapes=True)
--- /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 24
+++ /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 24
@@ -1,4 +1,4 @@
 for images, labels in task.build_inputs(exp_config.task.train_data).take(1):
   print()
-  print(f'images.shape: {str(images.shape):16}  images.dtype: {images.dtype!r}')
-  print(f'labels.shape: {str(labels.shape):16}  labels.dtype: {labels.dtype!r}')
+  print(f"images.shape: {images.shape!s:16}  images.dtype: {images.dtype!r}")
+  print(f"labels.shape: {labels.shape!s:16}  labels.dtype: {labels.dtype!r}")
--- /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 27
+++ /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 27
@@ -1 +1 @@
-plt.hist(images.numpy().flatten());
+plt.hist(images.numpy().flatten())
--- /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 29
+++ /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 29
@@ -1,2 +1,2 @@
-label_info = ds_info.features['label']
+label_info = ds_info.features["label"]
 label_info.int2str(1)
--- /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 31
+++ /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 31
@@ -10,9 +10,6 @@
     if predictions is None:
       plt.title(label_info.int2str(labels[i]))
     else:
-      if labels[i] == predictions[i]:
-        color = 'g'
-      else:
-        color = 'r'
+      color = "g" if labels[i] == predictions[i] else "r"
       plt.title(label_info.int2str(predictions[i]), color=color)
     plt.axis("off")
--- /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 35
+++ /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 35
@@ -1,3 +1,3 @@
-plt.figure(figsize=(10, 10));
+plt.figure(figsize=(10, 10))
 for images, labels in task.build_inputs(exp_config.task.validation_data).take(1):
   show_batch(images, labels)
--- /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 37
+++ /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 37
@@ -1,7 +1,7 @@
 model, eval_logs = tfm.core.train_lib.run_experiment(
     distribution_strategy=distribution_strategy,
     task=task,
-    mode='train_and_eval',
+    mode="train_and_eval",
     params=exp_config,
     model_dir=model_dir,
     run_post_eval=True)
--- /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 38
+++ /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 38
@@ -1 +0,0 @@
-#  tf.keras.utils.plot_model(model, show_shapes=True)
--- /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 40
+++ /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 40
@@ -1,4 +1,4 @@
 for key, value in eval_logs.items():
     if isinstance(value, tf.Tensor):
       value = value.numpy()
-    print(f'{key:20}: {value:.3f}')
+    print(f"{key:20}: {value:.3f}")
--- /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 42
+++ /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 42
@@ -4,5 +4,5 @@
 
 show_batch(images, labels, tf.cast(predictions, tf.int32))
 
-if device=='CPU':
-  plt.suptitle('The model was only trained for a few steps, it is not expected to do well.')
+if device=="CPU":
+  plt.suptitle("The model was only trained for a few steps, it is not expected to do well.")
--- /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 45
+++ /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 45
@@ -1,8 +1,8 @@
 # Saving and exporting the trained model
 export_saved_model_lib.export_inference_graph(
-    input_type='image_tensor',
+    input_type="image_tensor",
     batch_size=1,
     input_image_size=[32, 32],
     params=exp_config,
     checkpoint_path=tf.train.latest_checkpoint(model_dir),
-    export_dir='./export/')
+    export_dir="./export/")
--- /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 47
+++ /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 47
@@ -1,3 +1,3 @@
 # Importing SavedModel
-imported = tf.saved_model.load('./export/')
-model_fn = imported.signatures['serving_default']
+imported = tf.saved_model.load("./export/")
+model_fn = imported.signatures["serving_default"]
--- /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 49
+++ /Users/dhruv/playground/ruff/notebooks/image_classification.ipynb:cell 49
@@ -1,10 +1,10 @@
 plt.figure(figsize=(10, 10))
-for data in tfds.load('cifar10', split='test').batch(12).take(1):
+for data in tfds.load("cifar10", split="test").batch(12).take(1):
   predictions = []
-  for image in data['image']:
-    index = tf.argmax(model_fn(image[tf.newaxis, ...])['logits'], axis=1)[0]
+  for image in data["image"]:
+    index = tf.argmax(model_fn(image[tf.newaxis, ...])["logits"], axis=1)[0]
     predictions.append(index)
-  show_batch(data['image'], data['label'], predictions)
+  show_batch(data["image"], data["label"], predictions)
 
-  if device=='CPU':
-    plt.suptitle('The model was only trained for a few steps, it is not expected to do better than random.')
+  if device=="CPU":
+    plt.suptitle("The model was only trained for a few steps, it is not expected to do better than random.")

Would fix 61 errors.
```

</p>
</details> 

resolves: #4727
2023-07-29 04:22:56 +00:00
Micha Reiser 2cf00fee96
Remove parser dependency from ruff-python-ast (#6096) 2023-07-26 17:47:22 +02:00
Zanie Blue 3000a47fe8
Include file permissions in key for cached files (#5901)
Reimplements https://github.com/astral-sh/ruff/pull/3104
Closes https://github.com/astral-sh/ruff/issues/5726

Note that we will generate the hash for a cache key twice in normal
operation. Once to check for the cached item and again to update the
cache. We could optimize this by generating the hash once in
`diagnostics::lint_file` and passing the `u64` into `get` and `update`.
We'd probably want to wrap it in a `CacheKeyHash` enum for type safety.

## Test plan

Unit tests for Windows and Unix.

Manual test with case from issue

```
❯ touch fake.py
❯ chmod +x fake.py
❯ ./target/debug/ruff --select EXE fake.py
fake.py:1:1: EXE002 The file is executable but no shebang is present
Found 1 error.
❯ chmod -x fake.py
❯ ./target/debug/ruff --select EXE fake.py
```
2023-07-25 17:06:47 +00:00
konsti 92f471a666
Handle io errors gracefully (#5611)
## Summary

It can happen that we can't read a file (a python file, a jupyter
notebook or pyproject.toml), which needs to be handled and handled
consistently for all file types. Instead of using `Err` or `error!`, we
emit E602 with the io error as message and continue. This PR makes sure
we handle all three cases consistently, emit E602.

I'm not convinced that it should be possible to disable io errors, but
we now handle the regular case consistently and at least print warning
consistently.

I went with `warn!` but i can change them all to `error!`, too.

It also checks the error case when a pyproject.toml is not readable. The
error message is not very helpful, but it's now a bit clearer that
actually ruff itself failed instead vs this being a diagnostic.

## Examples

This is how an Err of `run` looks now:


![image](https://github.com/astral-sh/ruff/assets/6826232/890f7ab2-2309-4b6f-a4b3-67161947cc83)

With an unreadable file and `IOError` disabled:


![image](https://github.com/astral-sh/ruff/assets/6826232/fd3d6959-fa23-4ddf-b2e5-8d6022df54b1)

(we lint zero files but count files before linting not during so we exit
0)

I'm not sure if it should (or if we should take a different path with
manual ExitStatus), but this currently also triggers when `files` is
empty:


![image](https://github.com/astral-sh/ruff/assets/6826232/f7ede301-41b5-4743-97fd-49149f750337)

## Test Plan

Unix only: Create a temporary directory with files with permissions
`000` (not readable by the owner) and run on that directory. Since this
breaks the assumptions of most of the test code (single file, `ruff`
instead of `ruff_cli`), the test code is rather cumbersome and looks a
bit misplaced; i'm happy about suggestions to fit it in closer with the
other tests or streamline it in other ways. I added another test for
when the entire directory is not readable.
2023-07-20 11:30:14 +02:00
Dhruv Manilawala 7e6b472c5b
Make `lint_only` aware of the source kind (#5876) 2023-07-19 09:29:35 +05:30
Charlie Marsh a1c559eaa4
Only run pyproject.toml lint rules when enabled (#5578)
## Summary

I was testing some changes on Airflow, and I realized that we _always_
run the `pyproject.toml` validation rules, even if they're not enabled.
This PR gates them behind the appropriate enablement flags.

## Test Plan

- Ran: `cargo run -p ruff_cli -- check ../airflow -n`. Verified that no
RUF200 violations were raised.
- Run: `cargo run -p ruff_cli -- check ../airflow -n --select RUF200`.
Verified that two RUF200 violations were raised.
2023-07-08 11:05:05 -04:00
Dhruv Manilawala 2fc38d81e6
Experimental release for Jupyter notebook integration (#5363)
## Summary

Experimental release for Jupyter Notebook integration.

Currently, this requires a user to explicitly opt-in using the
[include](https://beta.ruff.rs/docs/settings/#include) configuration:

```toml
[tool.ruff]
include = ["*.py", "*.pyi", "**/pyproject.toml", "*.ipynb"]
```

Or, a user can pass in the file directly:

```sh
ruff check path/to/notebook.ipynb
```

For known limitations, please refer #5188 

## Test Plan

Following command should work without the `--all-features` flag:

```sh
cargo dev round-trip /path/to/notebook.ipynb
```

Following command should work with the above config file along with
`select = ["ALL"]`:

```sh
cargo run --bin ruff -- check --no-cache --config=../test-repos/openai-cookbook/pyproject.toml --fix ../test-repos/openai-cookbook/
```

Passing the Jupyter notebook directly:

```sh
cargo run --bin ruff -- check --no-cache --isolated --select=ALL --fix ../test-repos/openai-cookbook/examples/Classification_using_embeddings.ipynb
```
2023-06-26 21:22:42 +05:30
Thomas de Zeeuw 1c638264b2
Keep track of when files are last seen in the cache (#5214)
## Summary

And remove cached files that we haven't seen for a certain period of
time, currently 30 days.

For the last seen timestamp we actually use an `u64`, it's smaller on
disk than `SystemTime` (which size is OS dependent) and fits in an
`AtomicU64` which we can use to update it without locks.

## Test Plan

Added a new unit test, run by `cargo test`.
2023-06-23 15:40:35 +02:00
Charlie Marsh 6b8b318d6b
Use `mod tests` consistently (#5278)
As per the Rust documentation.
2023-06-22 01:50:28 +00:00
Thomas de Zeeuw 17f1ecd56e
Open cache files in parallel (#5120)
## Summary

Open cache files in parallel (again), brings the performance back to be roughly equal to the old implementation.

## Test Plan

Existing tests should keep working.
2023-06-20 17:43:09 +02:00
Thomas de Zeeuw e3c12764f8
Only use a single cache file per Python package (#5117)
## Summary

This changes the caching design from one cache file per source file, to
one cache file per package. This greatly reduces the amount of cache
files that are opened and written, while maintaining roughly the same
(combined) size as bincode is very compact.

Below are some very much not scientific performance tests. It uses
projects/sources to check:

* small.py: single, 31 bytes Python file with 2 errors.
* test.py: single, 43k Python file with 8 errors.
* fastapi: FastAPI repo, 1134 files checked, 0 errors.

Source   | Before # files | After # files | Before size | After size
-------|-------|-------|-------|-------
small.py | 1              | 1             | 20 K        | 20 K
test.py  | 1              | 1             | 60 K        | 60 K
fastapi  | 1134           | 518           | 4.5 M       | 2.3 M

One question that might come up is why fastapi still has 518 cache files
and not 1? That is because this is using the existing package
resolution, which sees examples, docs, etc. as separate from the "main"
source code (in the fastapi directory in the repo). In this future it
might be worth consider switching to a one cache file per repo strategy.

This new design is not perfect and does have a number of known issues.
First, like the old design it doesn't remove the cache for a source file
that has been (re)moved until `ruff clean` is called.

Second, this currently uses a large mutex around the mutation of the
package cache (e.g. inserting result). This could be (or become) a
bottleneck. It's future work to test and improve this (if needed).

Third, currently the packages and opened and stored in a sequential
loop, this could be done parallel. This is also future work.


## Test Plan

Run `ruff check` (with caching enabled) twice on any Python source code
and it should produce the same results.
2023-06-19 17:46:13 +02:00
Charlie Marsh 732b0405d7
Remove `FixMode::None` (#5087)
## Summary

We now _always_ generate fixes, so `FixMode::None` and
`FixMode::Generate` are redundant. We can also remove the TODO around
`--fix-dry-run`, since that's our default behavior.

Closes #5081.
2023-06-14 11:17:09 -04:00
Dhruv Manilawala d8f5d2d767
Add support for auto-fix in Jupyter notebooks (#4665)
## Summary

Add support for applying auto-fixes in Jupyter Notebook.

### Solution

Cell offsets are the boundaries for each cell in the concatenated source
code. They are represented using `TextSize`. It includes the start and
end offset as well, thus creating a range for each cell. These offsets
are updated using the `SourceMap` markers.

### SourceMap

`SourceMap` contains markers constructed from each edits which tracks
the original source code position to the transformed positions. The
following drawing might make it clear:

![SourceMap visualization](https://github.com/astral-sh/ruff/assets/67177269/3c94e591-70a7-4b57-bd32-0baa91cc7858)

The center column where the dotted lines are present are the markers
included in the `SourceMap`. The `Notebook` looks at these markers and
updates the cell offsets after each linter loop. If you notice closely,
the destination takes into account all of the markers before it.

The index is constructed only when required as it's only used to render
the diagnostics. So, a `OnceCell` is used for this purpose. The cell
offsets, cell content and the index will be updated after each iteration
of linting in the mentioned order. The order is important here as the
content is updated as per the new offsets and index is updated as per
the new content.

## Limitations

### 1

Styling rules such as the ones in `pycodestyle` will not be applicable
everywhere in Jupyter notebook, especially at the cell boundaries. Let's
take an example where a rule suggests to have 2 blank lines before a
function and the cells contains the following code:

```python
import something
# ---
def first():
	pass

def second():
	pass
```

(Again, the comment is only to visualize cell boundaries.)

In the concatenated source code, the 2 blank lines will be added but it
shouldn't actually be added when we look in terms of Jupyter notebook.
It's as if the function `first` is at the start of a file.

`nbqa` solves this by recording newlines before and after running
`autopep8`, then running the tool and restoring the newlines at the end
(refer https://github.com/nbQA-dev/nbQA/pull/807).

## Test Plan

Three commands were run in order with common flags (`--select=ALL
--no-cache --isolated`) to isolate which stage the problem is occurring:
1. Only diagnostics
2. Fix with diff (`--fix --diff`)
3. Fix (`--fix`)

### https://github.com/facebookresearch/segment-anything

```
-------------------------------------------------------------------------------
 Jupyter Notebooks       3            0            0            0            0
 |- Markdown             3           98            0           94            4
 |- Python               3          513          468            4           41
 (Total)                            611          468           98           45
-------------------------------------------------------------------------------
```

```console
$ cargo run --all-features --bin ruff -- check --no-cache --isolated --select=ALL /path/to/segment-anything/**/*.ipynb --fix
...
Found 180 errors (89 fixed, 91 remaining).
```

### https://github.com/openai/openai-cookbook

```
-------------------------------------------------------------------------------
 Jupyter Notebooks      65            0            0            0            0
 |- Markdown            64         3475           12         2507          956
 |- Python              65         9700         7362         1101         1237
 (Total)                          13175         7374         3608         2193
===============================================================================
```

```console
$ cargo run --all-features --bin ruff -- check --no-cache --isolated --select=ALL /path/to/openai-cookbook/**/*.ipynb --fix
error: Failed to parse /path/to/openai-cookbook/examples/vector_databases/Using_vector_databases_for_embeddings_search.ipynb:cell 4:29:18: unexpected token '-'
...
Found 4227 errors (2165 fixed, 2062 remaining).
```

### https://github.com/tensorflow/docs

```
-------------------------------------------------------------------------------
 Jupyter Notebooks     150            0            0            0            0
 |- Markdown             1           55            0           46            9
 |- Python               1          402          289           60           53
 (Total)                            457          289          106           62
-------------------------------------------------------------------------------
```

```console
$ cargo run --all-features --bin ruff -- check --no-cache --isolated --select=ALL /path/to/tensorflow-docs/**/*.ipynb --fix
error: Failed to parse /path/to/tensorflow-docs/site/en/guide/extension_type.ipynb:cell 80:1:1: unexpected token Indent
error: Failed to parse /path/to/tensorflow-docs/site/en/r1/tutorials/eager/custom_layers.ipynb:cell 20:1:1: unexpected token Indent
error: Failed to parse /path/to/tensorflow-docs/site/en/guide/data.ipynb:cell 175:5:14: unindent does not match any outer indentation level
error: Failed to parse /path/to/tensorflow-docs/site/en/r1/tutorials/representation/unicode.ipynb:cell 30:1:1: unexpected token Indent
...
Found 12726 errors (5140 fixed, 7586 remaining).
```

### https://github.com/tensorflow/models

```
-------------------------------------------------------------------------------
 Jupyter Notebooks      46            0            0            0            0
 |- Markdown             1           11            0            6            5
 |- Python               1          328          249           19           60
 (Total)                            339          249           25           65
-------------------------------------------------------------------------------
```

```console
$ cargo run --all-features --bin ruff -- check --no-cache --isolated --select=ALL /path/to/tensorflow-models/**/*.ipynb --fix
...
Found 4856 errors (2690 fixed, 2166 remaining).
```

resolves: #1218
fixes: #4556
2023-06-12 14:14:15 +00:00
konstin b6a382eeaf
Lint pyproject.toml (#4496)
This adds a new rule `InvalidPyprojectToml` that lints pyproject.toml by checking if https://github.com/PyO3/pyproject-toml-rs can parse it. This means the linting is currently very basic, e.g. we don't check whether the name is actually a valid python project name or appropriately normalized. It does catch errors e.g. with invalid dependency requirements or problems withs the license specifications. It is open to be extended in the future (validate name, SPDX expressions, classifiers, ...), either in ruff or in pyproject-toml-rs.

Test plan:

```
scripts/ecosystem_all_check.sh check --select RUF200
```
This lead to a bunch of 
```
RUF200 Failed to parse pyproject.toml: missing field `name`
```
(e.g. https://github.com/amitsk/fastapi-todos/blob/main/pyproject.toml) which is indeed invalid (https://packaging.python.org/en/latest/specifications/declaring-project-metadata/#specification).

Filtering those out, the following other problems were found by `cd target/ecosystem_all_results/ && rg RUF200`:
```
UCL-ARC:rred-reports.stdout.txt
1:pyproject.toml:27:16: RUF200 Failed to parse pyproject.toml: Version specifier `>='3.9'` doesn't match PEP 440 rules
EndlessTrax:python-start-project.stdout.txt
1:pyproject.toml:14:16: RUF200 Failed to parse pyproject.toml: Expected package name starting with an alphanumeric character, found '#'
redjax:gardening-api.stdout.txt
1:pyproject.toml:7:11: RUF200 Failed to parse pyproject.toml: Version `` doesn't match PEP 440 rules
ajslater:codex.stdout.txt
2:  3:17 RUF200 Failed to parse pyproject.toml: invalid type: sequence, expected a string
LDmitriy7:404_AvatarsBot.stdout.txt
1:pyproject.toml:3:11: RUF200 Failed to parse pyproject.toml: Version `` doesn't match PEP 440 rules
ajslater:comicbox.stdout.txt
1:pyproject.toml:3:17: RUF200 Failed to parse pyproject.toml: invalid type: sequence, expected a string
manueldevillena:forecast-earnings.stdout.txt
1:pyproject.toml:24:12: RUF200 Failed to parse pyproject.toml: Expected one of `@`, `(`, `<`, `=`, `>`, `~`, `!`, `;`, found `^`
redjax:ohio_utility_scraper.stdout.txt
1:pyproject.toml:11:11: RUF200 Failed to parse pyproject.toml: Version `` doesn't match PEP 440 rules
agronholm:typeguard.stdout.txt
1:pyproject.toml:40:8: RUF200 Failed to parse pyproject.toml: Expected a valid marker name, found 'python_implementation'
cyuss:decathlon-turnover.stdout.txt
1:pyproject.toml:7:12: RUF200 Failed to parse pyproject.toml: invalid type: string "Youcef", expected a table with 'name' and 'email' keys
ajslater:boilerplate.stdout.txt
1:pyproject.toml:3:17: RUF200 Failed to parse pyproject.toml: invalid type: sequence, expected a string
kaparoo:lightning-project-template.stdout.txt
1:pyproject.toml:56:16: RUF200 Failed to parse pyproject.toml: You can't mix a >= operator with a local version (`+cu117`)
dijital20:pytexas2023-decorators.stdout.txt
1:pyproject.toml:5:11: RUF200 Failed to parse pyproject.toml: Version `` doesn't match PEP 440 rules
pfouque:django-anymail-history.stdout.txt
1:pyproject.toml:137:12: RUF200 Failed to parse pyproject.toml: Version specifier `> = 1.2.0` doesn't match PEP 440 rules
pfouque:django-fakemessages.stdout.txt
1:pyproject.toml:130:12: RUF200 Failed to parse pyproject.toml: Version specifier `> = 1.2.0` doesn't match PEP 440 rules
pypa:build.stdout.txt
1:tests/packages/test-invalid-requirements/pyproject.toml:2:12: RUF200 Failed to parse pyproject.toml: Expected one of `@`, `(`, `<`, `=`, `>`, `~`, `!`, `;`, found `i`
4:tests/packages/test-no-requires/pyproject.toml:1:1: RUF200 Failed to parse pyproject.toml: missing field `requires`
UnoYakshi:FRAAND.stdout.txt
2:  3:11 RUF200 Failed to parse pyproject.toml: Version `` doesn't match PEP 440 rules
DHolmanCoding:python-template.stdout.txt
1:pyproject.toml:22:1: RUF200 Failed to parse pyproject.toml: missing field `requires`
```
Overall, this emitted errors in 43 out of 3408 projects (`rg -c RUF200 target/ecosystem_all_results/ | wc -l`)


Co-authored-by: Micha Reiser <micha@reiser.io>
2023-05-25 12:05:28 +00:00
Jonathan Plasse c10a4535b9
Disallow `unreachable_pub` (#4314) 2023-05-11 18:00:00 -04:00
Micha Reiser 8969ad5879
Always generate fixes (#4239) 2023-05-10 07:06:14 +00:00
Micha Reiser cab65b25da
Replace row/column based `Location` with byte-offsets. (#3931) 2023-04-26 18:11:02 +00:00
Micha Reiser c33c9dc585
Introduce SourceFile to avoid cloning the message filename (#3904) 2023-04-11 08:28:55 +00:00
Micha Reiser 381203c084
Store source code on message (#3897) 2023-04-11 07:57:36 +00:00
Chris Chan 10504eb9ed
Generate `ImportMap` from module path to imported dependencies (#3243) 2023-04-04 03:31:37 +00:00
Charlie Marsh 6ed6da3e82
Move `fix::FixMode` to `flags::FixMode` (#3753) 2023-03-26 21:40:06 +00:00
konstin 81d0884974
Add basic jupyter notebook support (#3440)
* Add basic jupyter notebook support behind a feature flag

* Address review comments

* Rename in separate commit to make both git and clippy happy

* cfg(feature = "jupyter_notebook") another test

* Address more review comments

* Address more review comments

* and clippy and windows

* More review comment
2023-03-20 12:06:01 +01:00
Charlie Marsh 1e5db58b7b
Include individual path checks in --verbose logging (#3489) 2023-03-13 17:13:47 -04:00
Charlie Marsh 3ed539d50e
Add a CLI flag to force-ignore noqa directives (#3296) 2023-03-01 22:28:13 -05:00
Jeong YunWon c8c575dd43
Adapt BoolLike to flags (#3175) 2023-02-23 16:31:46 -05:00
Micha Reiser 262e768fd3
refactor(ruff): Implement `doc_lines_from_tokens` as iterator (#3124)
This is a nit refactor... It implements the extraction of document lines as an iterator instead of a Vector to avoid the extra allocation.
2023-02-22 09:22:06 -05:00
Charlie Marsh 18800c6884
Include file permissions in cache key (#3104) 2023-02-21 18:20:06 -05:00
Charlie Marsh 1666e8ba1e
Add a `--show-fixes` flag to include applied fixes in output (#2707) 2023-02-12 20:48:01 +00:00
Charlie Marsh 81abc5d7d8
Move error and warning messages into log macro (#2669) 2023-02-08 14:39:09 -05:00
Micha Reiser cd8be8c0be
refactor: Introduce crates folder (#2088)
This PR introduces a new `crates` directory and moves all "product" crates into that folder. 

Part of #2059.
2023-02-05 16:47:48 -05:00