[SOLVED] Understanding scikit-learn import variants

The import statements in the scikit-learn tutorials are of the form

from sklearn.decomposition import PCA

Another version that works is

import sklearn.decomposition
pca = sklearn.decomposition.PCA(n_components = 2)

However

import sklearn
pca = sklearn.decomposition.PCA(n_components = 2)

does not, and complains

AttributeError: module 'sklearn' has no attribute 'decomposition'

Why is this, and how can I predict which variants will work so I don't have to find out by trial and error? Ideally the explanation would also apply to Python packages in general.

Solution

sklearn doesn't automatically import its submodules. If you want to use sklearn.<SUBMODULE>, you need to import it explicitly, e.g. import sklearn.<SUBMODULE>. That import also binds the submodule as an attribute of the sklearn package, so afterwards you can use it without any further imports, as in result = sklearn.<SUBMODULE>.function(...).
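To see the same behaviour without sklearn installed, here is a minimal sketch that builds a throwaway package on disk. The names demo_pkg, sub and VALUE are made up for this illustration, and the package's __init__.py is deliberately empty, which is an assumption standing in for a package that does not import its submodules.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package, created only for this illustration.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")         # imports no submodules
(pkg_dir / "sub.py").write_text("VALUE = 42\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_pkg
# Only demo_pkg/__init__.py has run, so the submodule is not an attribute yet.
visible_before = hasattr(demo_pkg, "sub")

import demo_pkg.sub
# Importing the submodule loads it and binds it on the parent package.
visible_after = hasattr(demo_pkg, "sub")

print(visible_before, visible_after, demo_pkg.sub.VALUE)
```

The plain import demo_pkg corresponds to import sklearn in the question, and import demo_pkg.sub corresponds to import sklearn.decomposition.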

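Large packages often behave this way: they don't automatically import all of their submodules. Whether a plain import of the package exposes a given submodule depends on what the package's __init__.py imports, so that file is the place to look if you want to predict it.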
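If you would rather check than read __init__.py, something along these lines can list a package's direct submodules and report which ones are already reachable as attributes. This is only a sketch: submodule_status is a hypothetical helper name, and the standard-library json package is used as a stand-in so the example runs without sklearn.

```python
import pkgutil
import json  # stdlib package used here as a stand-in for sklearn


def submodule_status(package):
    """Map each direct submodule name to whether it is already
    bound as an attribute of the imported package."""
    return {
        info.name: hasattr(package, info.name)
        for info in pkgutil.iter_modules(package.__path__)
    }


status = submodule_status(json)
print(status)
```

Submodules reported as True can be used after a plain import of the package; the others need an explicit import first. The result reflects the current interpreter session, so a submodule can also show up as True because some other import loaded it earlier.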

Automatically loading every submodule would increase memory use and start-up time; importing only the submodules you need avoids that cost. I think namespace clutter is another consideration: explicit imports reduce the chance of naming conflicts and make it clear which functionality is being used.