The miscellaneous routines in SciPy (scipy.misc) were a collection of convenience functions. This article covers the removal of this submodule.
Table of contents
- Why was _Miscellaneous routines_ removed from SciPy?
- A Very Brief Introduction to scipy.datasets
- What does this implement/fix?
- Pooch and scipy.datasets partnership
Why was _Miscellaneous routines_ removed from SciPy?
Back in the olden days, the miscellaneous routines of SciPy had some importance. But as of 2022 the submodule has only five functions: ascent(), central_diff_weights(), derivative(), face(), and electrocardiogram(). Most of its earlier functions have moved to other modules or submodules, and many users have complained about the usefulness and computational inefficiency of what remains. For example:
Stephan Hoyer: “I would vote for removing them entirely, I haven’t used either of them, it just came up in a search for finite differences in Python”
scipy.misc is a submodule with only five functions, yet the dataset files bundled with it increase the package size of the library, which, the argument goes, also compromises optimization and the overall processing speed of other functions.
Given all the reasons above, people were frustrated with the submodule.
In 2018, Warren Weckesser, an enthusiast, created a pull request to remove .misc from SciPy and introduce a new idea, scipy.datasets. Unfortunately, it did not proceed. This year, Anirudh Dagar picked up the idea and eventually convinced the SciPy maintainers to add scipy.datasets as a submodule (which includes the dataset functions of .misc).
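The two finite-difference helpers listed earlier (central_diff_weights and derivative) are not dataset functions, so code that relied on them needs its own replacement. The following is a minimal sketch of a 3-point central difference in plain Python. The names central_difference and cube_plus_square are hypothetical, and this is not a drop-in replacement for scipy.misc.derivative, only an illustration of the same idea.

```python
def central_difference(f, x0, dx=1e-6):
    """Approximate f'(x0) with the 3-point central difference formula."""
    return (f(x0 + dx) - f(x0 - dx)) / (2.0 * dx)


def cube_plus_square(x):
    """Example function; its exact derivative is 3*x**2 + 2*x."""
    return x**3 + x**2


# The exact derivative at x = 1 is 5; the estimate should be very close to it.
estimate = central_difference(cube_plus_square, 1.0)
print(estimate)
```

The step size dx is a trade-off: a larger step increases truncation error, a smaller one increases floating-point round-off.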
A Very Brief Introduction to scipy.datasets
>>> from scipy import datasets
# Example ascent dataset loading with the new module
>>> datasets.ascent()
array([[ 83,  83,  83, ..., 117, 117, 117],
       [ 82,  82,  83, ..., 117, 117, 117],
       [ 80,  81,  83, ..., 117, 117, 117],
       ...,
       [178, 178, 178, ...,  57,  59,  57],
       [178, 178, 178, ...,  56,  57,  57],
       [178, 178, 178, ...,  57,  57,  58]])
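Code that has to run on both older and newer SciPy releases may want to look for a dataset function in scipy.datasets first and fall back to scipy.misc. Below is a rough sketch of such a shim using only importlib. The helper name resolve_loader is hypothetical and not part of SciPy.

```python
import importlib


def resolve_loader(name, modules=("scipy.datasets", "scipy.misc")):
    """Return the first attribute called `name` found in `modules`.

    Modules are tried in order, so the new location wins when it exists.
    """
    for module_name in modules:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # this module is not available in the installed version
        loader = getattr(module, name, None)
        if loader is not None:
            return loader
    raise ImportError(f"{name!r} not found in any of {modules}")


# Usage (assumes SciPy is installed; scipy.datasets additionally needs
# Pooch and downloads the file on first use):
#     ascent = resolve_loader("ascent")()
```

Trying the modules in a fixed order keeps the call sites unchanged while the import location moves.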
What does this implement/fix?
With gh-8707, in 2018, SciPy wanted to introduce the datasets submodule and move a handful of dataset functions from the current misc module to this new datasets submodule. A big thanks to @WarrenWeckesser for originating this idea. With this PR (indeed inspired by gh-8707), Anirudh and Ralf resume those efforts after making some improvements (explained below) and move away from the scipy.misc module, finally deprecating it in a separate PR (#15901).
- Add scipy.datasets submodule
- Utilize pooch to handle the dataset downloading and caching.
- Enable meson support for scipy.datasets submodule
- Move all dataset files (e.g. scipy.stats has its own test datasets within the repository) to their respective new repository (explained below). This is something that can be done after landing this PR once we have a concrete datasets API and approach defined for adding new datasets.
- Deprecate the misc module (DEP: Deprecate scipy.misc in favour of scipy.datasets #15901)
Pooch and scipy.datasets partnership
Pooch manages a registry of data files: it downloads them from a server only when needed and stores them locally in a data cache (a folder on your computer).
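To make the "download only when needed, then cache" idea concrete, here is a small standard-library sketch of the pattern. It is not Pooch's code or API: the names fetch, sha256_of, registry, and fake_download are hypothetical, and the downloader is a stand-in function so the example runs without network access.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def fetch(name, registry, cache_dir, download):
    """Return a local path for `name`, downloading only on a cache miss.

    registry  : dict mapping file name -> expected SHA-256 hex digest
    cache_dir : folder used as the local data cache
    download  : callable(name) -> bytes, standing in for an HTTP request
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name

    # Cache hit: the file exists and matches the registered hash.
    if target.exists() and sha256_of(target) == registry[name]:
        return target

    # Cache miss (or stale file): download, verify, then store.
    data = download(name)
    if hashlib.sha256(data).hexdigest() != registry[name]:
        raise ValueError(f"hash mismatch for {name}")
    target.write_bytes(data)
    return target


# Demonstration with a fake downloader that records how often it is called.
calls = []


def fake_download(name):
    calls.append(name)
    return b"demo bytes"


registry = {"demo.dat": hashlib.sha256(b"demo bytes").hexdigest()}

with tempfile.TemporaryDirectory() as tmp:
    first = fetch("demo.dat", registry, tmp, fake_download)
    second = fetch("demo.dat", registry, tmp, fake_download)

downloads = len(calls)  # the second call is served from the cache
```

Checking the hash on every access means a corrupted or outdated cached file is replaced instead of being silently reused.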
- With Pooch you can easily decouple the datasets that are currently present within the SciPy repository and move them to their new repository. For example, see https://github.com/scipy-datasets, where each dataset has its own repository. This will lead to a lightweight SciPy package, decreasing the download size for future releases. Whether to keep the datasets in individual repositories or in a single scipy-datasets repository is still a point of discussion.
- Dependency: Pooch is an extremely light package and has only a few dependencies, so adding it as a new dependency keeps the footprint small and does not pull in a lot of sub-dependencies.