Issue with "preprocessor" argument in TorchTrainer() and ray 2.7 #56

Closed
opened 2025-11-02 00:02:40 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @Ibraheem101 on GitHub (Oct 10, 2023).

I think the code needs to be updated. Calling the `TorchTrainer` class as below:

```
# Trainer
trainer = TorchTrainer(
    train_loop_per_worker=train_loop_per_worker,
    train_loop_config=train_loop_config,
    scaling_config=scaling_config,
    run_config=run_config,
    datasets={"train": train_ds, "val": val_ds},
    dataset_config=dataset_config,
    preprocessor=preprocessor,
)
```

results in the following error:

```
2023-10-09 11:00:25,041	WARNING data_parallel_trainer.py:283 -- The dict form of `dataset_config` is deprecated. Use the DataConfig class instead. Support for this will be dropped in a future release.
---------------------------------------------------------------------------
DeprecationWarning                        Traceback (most recent call last)
[<ipython-input-101-7feea6243449>](https://localhost:8080/#) in <cell line: 2>()
      1 # Trainer
----> 2 trainer = TorchTrainer(
      3     train_loop_per_worker=train_loop_per_worker,
      4     train_loop_config=train_loop_config,
      5     scaling_config=scaling_config,

2 frames
[/usr/local/lib/python3.10/dist-packages/ray/train/base_trainer.py](https://localhost:8080/#) in __init__(self, scaling_config, run_config, datasets, metadata, resume_from_checkpoint, preprocessor)
    227 
    228         if preprocessor is not None:
--> 229             raise DeprecationWarning(PREPROCESSOR_DEPRECATION_MESSAGE)
    230 
    231     @PublicAPI(stability="alpha")

DeprecationWarning: The `preprocessor` argument to Trainers is deprecated as of Ray 2.7. Instead, use the Preprocessor `fit` and `transform` APIs directly on the Ray Dataset. For any state that needs to be saved to the trained checkpoint, pass it in using the `metadata` argument of the `Trainer`. For a full example, see https://docs.ray.io/en/master/train/user-guides/data-loading-preprocessing.html#preprocessing-structured-data
```
Reference: github-starred/Made-With-ML#56