The Anaconda Python Distribution 2021-11 Ubuntu 22.04 LTS Install

In this guide I look at installing the Anaconda Python Distribution, which contains Python and the most commonly used Data Science libraries, on Ubuntu 22.04 LTS. The instructions should be equally applicable to other modern Linux distributions. I go through the installation and initialization procedure in some detail. I then look at the syntax of the conda package manager and at where it physically stores files. I look at using the Anaconda Navigator, the Spyder IDE, the JupyterLab IDE and the Visual Studio Code IDE. I also discuss the difference between the official conda channel and the community conda-forge channel, and go through instructions for installing the latest community version of Spyder and JupyterLab using conda environments.

This guide is equally applicable to Miniconda. Miniconda is a stripped-down, more lightweight version of Anaconda. It is essentially the same installation as Anaconda but gives a more or less empty conda base environment. The Anaconda Individual Edition is designed for Individual use only and has Commercial restrictions, while Miniconda is exempt from these Commercial restrictions. As the (base) conda environment is empty in Miniconda, you will need to create your own conda environments to install the Spyder 5 and JupyterLab 3 IDEs respectively. These steps are covered in this guide.

I have a separate Anaconda Installation Guide for Windows here:

Uninstalling and Purging an Old Anaconda Installation

Problems can occur when remnants of an old install are present such as old or corrupt configuration files.

Uninstalling

Go to the Home Folder and delete the anaconda3 folder. This will uninstall Anaconda.

Purge Old Configuration Files

Old configuration files can be problematic and should be purged. Enable hidden folders by selecting the folder options icon and then Show Hidden Files

Delete the following items, which contain configuration files: the .anaconda, .conda, .continuum, .ipython, .jupyter and .vscode folders and the .condarc file:

Go to the .config folder:

Delete the Code, matplotlib and spyder-py3 folders:

Go to the .local folder:

And then the share folder

Delete the jupyter and Spyder folders:
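Before deleting anything by hand, it can help to see which of the locations named above actually exist. The following is a minimal sketch that only lists them and deletes nothing. The function name find_leftovers is my own, and the list of paths is simply the one given in this section:

```python
from pathlib import Path

# Locations named in this section, relative to the Home folder.
LEFTOVER_PATHS = [
    "anaconda3",
    ".anaconda", ".conda", ".continuum", ".ipython", ".jupyter",
    ".condarc", ".vscode",
    ".config/Code", ".config/matplotlib", ".config/spyder-py3",
    ".local/share/jupyter", ".local/share/Spyder",
]


def find_leftovers(home):
    """Return the leftover paths that still exist under the given home folder."""
    home = Path(home)
    return [home / rel for rel in LEFTOVER_PATHS if (home / rel).exists()]


if __name__ == "__main__":
    # Report only; nothing is removed by this script.
    for path in find_leftovers(Path.home()):
        print(path)
```

Anything the script prints is a candidate for deletion using the steps above.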

Remove conda Initialization from .bashrc

Go to the Home folder and open the .bashrc file in the text editor:

Delete the block of code that begins with # >>> conda initialize >>> and ends with # <<< conda initialize <<< and then save:
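The same edit can be sketched in Python for anyone who prefers to check the result before saving. This is only an illustration working on a string; the function name strip_conda_init is my own and the sketch does not touch the real .bashrc file:

```python
import re

# Matches everything from the opening marker line to the closing marker line.
CONDA_BLOCK = re.compile(
    r"^# >>> conda initialize >>>\n.*?^# <<< conda initialize <<<\n?",
    flags=re.DOTALL | re.MULTILINE,
)


def strip_conda_init(bashrc_text):
    """Return the text with the conda initialize block removed, if present."""
    return CONDA_BLOCK.sub("", bashrc_text)


if __name__ == "__main__":
    sample = (
        "alias ll='ls -l'\n"
        "# >>> conda initialize >>>\n"
        "# !! Contents within this block are managed by 'conda init' !!\n"
        "# <<< conda initialize <<<\n"
        "export EDITOR=nano\n"
    )
    print(strip_conda_init(sample))
```

Text without the markers is returned unchanged, so running it on an already clean file is harmless.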

Installing Anaconda

The Anaconda bash script install

The Anaconda and Miniconda Linux installers are bash scripts:

Either one of these has to be run in the Terminal using the bash command. Since you are installing Anaconda or Miniconda with the aim of learning to program or of programming, it is useful to familiarise yourself with the Linux Terminal before getting started with Python. I have a guide Essential Linux Terminal Commands which will take you through the basics of using the Linux Terminal:

Right click the downloads folder and select open in Terminal:

Right click the Anaconda script file and select rename, then copy the file name including the extension:

Type in:

bash 

Then right click the Terminal and paste in the file name:

You should have a command like the following:

bash Anaconda3-2021.11-Linux-x86_64.sh

Press ↵. This will display a license agreement within the terminal. Hold down the ↵ to scroll through this license agreement:

The prompt will keep repeating until you accept the license agreement:

Input:

yes

Then press ↵.

You will be then asked where you want to install Anaconda and a default location will be selected. Press ↵ to proceed with the installation in the default location:

Initializing Anaconda

In the next screen you will be asked whether you want to initialize Anaconda:

What this does is update the .bashrc file found in your Home Directory. To view this file, select the folder options and then Show Hidden Files:

If you open it up in the text editor:

You will see no conda commands by default:

In order to initialize Anaconda type in:

yes

Then press ↵.

Now the file has the conda commands:

The terminal must be closed and opened so it looks to the .bashrc file for the additional conda commands:

You will see that your terminal prompt is now prefixed with (base) which means the (base) conda environment is selected.

Unfortunately, blindly pressing ↵ during the installation selects no at the question to initialise Anaconda, leaving Anaconda installed but in an unusable state. To rectify this, copy and paste the following to the end of your .bashrc file, replacing the four instances of my username philip with your own username (shown at the start of your terminal prompt):

# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/home/philip/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/home/philip/anaconda3/etc/profile.d/conda.sh" ]; then
        . "/home/philip/anaconda3/etc/profile.d/conda.sh"
    else
        export PATH="/home/philip/anaconda3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<

The conda package manager

You have just installed Anaconda or Miniconda and this includes the conda package manager. To use it type in:

conda

This will display a number of subcommands you can use with it:

Syntax

The most important are:

conda list
conda search packagename
conda install packagename
conda remove packagename
conda update packagename

Where packagename is replaced by the name of the package you are interested in. Like other Linux Terminal commands, the conda command accepts options, which have the form of a single dash followed by a letter such as -n or -c, and flags, which have the form of two dashes followed by a full word such as --name or --channel.

The option -c and the flag --channel refer to the same thing. The option is faster to type but the flag is more readable.
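The equivalence of the short and long forms can be illustrated with Python's own argparse module. This is only a sketch of the general convention, not conda's actual parser, and the program name demo is made up:

```python
import argparse

# A toy parser that accepts both the short and the long spelling.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("-c", "--channel", help="channel to search")
parser.add_argument("-n", "--name", help="environment name")

short_form = parser.parse_args(["-c", "conda-forge", "-n", "spyder"])
long_form = parser.parse_args(["--channel", "conda-forge", "--name", "spyder"])

# Both spellings produce the same parsed values.
print(short_form.channel, short_form.name)
print(short_form == long_form)
```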

conda list

We can use conda list to list all the packages installed:

conda list

Since Anaconda is installed, the conda base environment has a large list of the most commonly used data science libraries.

For simplicity I will mention only 7 of these packages which you are more likely to have some familiarity with:

python 3.9.7
numpy 1.20.3
matplotlib 3.4.3
pandas 1.3.4
seaborn 0.11.2
spyder 5.1.5
jupyterlab 3.2.1

Let's use these to understand how the conda package manager works a bit more under the hood. Let's go to the anaconda3 folder:

~/anaconda3

In this folder we see a conda-meta subfolder:

~/anaconda3/conda-meta

This folder contains a collection of json files which list all of the libraries included in the base conda environment:

If we select the python-3.9.7 json file for example, it tells us where the extracted package is:

The extracted package is in the pkgs folder

~/anaconda3/pkgs

This contains all the packages used for all the conda environments – more details later, for now our only conda environment is base.
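The json files in conda-meta can also be read from Python. The following is a minimal sketch that assumes each file has name, version and extracted_package_dir fields and that Anaconda is in the default ~/anaconda3 location. The function name list_installed is my own:

```python
import json
from pathlib import Path


def list_installed(conda_meta_dir):
    """Return (name, version, extracted_package_dir) for each json record."""
    records = []
    for json_file in sorted(Path(conda_meta_dir).glob("*.json")):
        with open(json_file, encoding="utf-8") as handle:
            record = json.load(handle)
        # .get is used so a missing field gives None rather than an error.
        records.append((
            record.get("name"),
            record.get("version"),
            record.get("extracted_package_dir"),
        ))
    return records


if __name__ == "__main__":
    conda_meta = Path.home() / "anaconda3" / "conda-meta"
    if conda_meta.is_dir():
        for name, version, location in list_installed(conda_meta):
            print(name, version, location)
```

The output should broadly match conda list, with the extra column showing where in the pkgs folder each package was extracted.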

If we select the python 3.9.7 folder:

Then the lib:

Here we see all the Python standard inbuilt modules such as datetime:

When we type in a command such as:

import datetime

we are physically referring to this datetime.py file.

If now we do the same for numpy, we have a json file for numpy and numpy-base:

Once again we see the location in the pkgs folder:

We can explore the numpy base:

The lib subfolder:

The subfolder stating the python version this version of numpy is compatible with:

The site-packages subfolder:

And finally the numpy subfolder:

This folder contains __init__.py which is the file referenced when you type:

import numpy as np

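This tells Python to search each folder on its import path for either a numpy folder containing __init__.py (a package) or a numpy.py file (a module). Here it finds the numpy folder, so __init__.py is the file that is run.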

If we do the same for matplotlib:

The json files once again refer to the pkgs folder:

If we have a look at the matplotlib base folder:

We have a lib subfolder:

A python 3.9 subfolder stating the version of python this library is compatible for:

A site-packages subfolder:

And a matplotlib subfolder:

Notice there is an __init__.py and pyplot.py file:

When we use:

import matplotlib.pyplot

This means look for a matplotlib folder and within that folder look for pyplot.py. The dot in matplotlib.pyplot separates the package from the module inside it, so the pyplot.py file sits in the same directory as the package's __init__.py
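Python can report which file an import resolves to, which is a quick way to confirm the folder walk above. This sketch uses only the standard library, so json stands in for a package such as numpy or matplotlib and json.decoder stands in for matplotlib.pyplot. The helper name origin is my own:

```python
import importlib.util
from pathlib import Path


def origin(module_name):
    """Return the file that an import of module_name would load."""
    spec = importlib.util.find_spec(module_name)
    return Path(spec.origin)


# A plain module resolves to a single .py file.
print(origin("datetime"))

# A package resolves to the __init__.py inside its folder.
print(origin("json"))

# A dotted import resolves to a file inside the package folder.
print(origin("json.decoder"))
print(origin("json").parent == origin("json.decoder").parent)
```

With Anaconda installed, passing "numpy" or "matplotlib.pyplot" to the same helper should point into the site-packages folders described above.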

conda update

We can use conda update with the flag --all to look for any updates for the packages within the conda base environment from the official conda channel:

conda update --all

Here 5 updates are found mainly relating to the anaconda-navigator and the conda package manager:

In order to proceed input:

y

Now Anaconda is up to date (with respect to the official conda channel):

If we go to the conda-meta subfolder and take Anaconda Navigator as an example, we see the json file now references version 2.1.2 and the older json file referencing version 2.1.1 is gone.

In the pkgs folder however both versions are available:

Only the latest version is used by the base environment.

If we type in conda list followed by the flag --revisions, we will see the initial revision (rev 0) and the new revision (rev 1):

conda list --revisions

conda install

The conda install command is mainly used to install a conda package, which is discussed in more detail in the section on conda environments. It can also roll the environment back to an earlier revision, which is useful if an update causes a problem:

conda install --revision 0

The Anaconda Navigator

The Anaconda Navigator is a GUI version of the conda package manager, although it is somewhat limited compared to its command line equivalent. It also contains a series of tiles for launching the installed Python Development Environments.

It can be launched from the terminal by typing in:

anaconda-navigator

The Anaconda Navigator shows a series of tiles for launching Python Integrated Development Environments such as Spyder and JupyterLab:

The Environments tab is essentially a GUI version of the conda package manager, think of it as a GUI version of conda list:

Anaconda Navigator Preferences

Some preferences can be changed by selecting File→Preferences:

These preferences are saved to a configuration file in a hidden folder:

~/.anaconda/navigator/anaconda-navigator.ini

LibGL Error

Unfortunately there is a conflict between the libstdc++ library bundled with Anaconda and the one supplied by the Operating System. Ubuntu 22.04 LTS uses a new graphics display model and the libstdc++ version installed by Anaconda is too old for it, which results in this error:

libGL error: MESA-LOADER: failed to open iris

libGL error: failed to load driver iris

libGL error: failed to load driver: swrast

This error can be ignored and the Anaconda Navigator otherwise works as intended.

Alternatively you can go to the folder:

~/anaconda3/lib

Search for libstd and delete the three files shown. This forces the use of the Operating System default files and stops the error from displaying:

Hopefully the Anaconda Navigator will be updated to address this error for Ubuntu 22.04 LTS now that it is mainstream.

Spyder

The Spyder IDE can be launched from its tile in the Anaconda Navigator or by using the command:

spyder

Spyder will launch and prompt you for a tour of the IDE:

Spyder Preferences

The preferences can be changed by going to Tools and then Preferences:

These preferences are found within a hidden folder:

~/.config/spyder-py3/config/spyder.ini

JupyterLab

JupyterLab is a browser based IDE and can be launched from its tile in the Anaconda Navigator or by using the command:

jupyter-lab

This is the only place where a dash is used within the middle of the word jupyterlab.

JupyterLab opens in the default browser. Note that in Ubuntu 22.04 LTS, all browsers are snap packages. These are sandboxed and prevented from accessing hidden files. This causes a permission error that needs to be addressed in order to use JupyterLab. We will rectify this in a moment.

To close JupyterLab, close the browser tab that JupyterLab is open in:

This will leave it still running a server in the terminal:

Press [Ctrl] + [ c ] to close the process and input y in order to select yes when asked whether you want to close down this process. You may need to repeat this a couple of times:

JupyterLab Preferences File

Since we can't even launch JupyterLab we need to generate a Preferences file. To do this input:

jupyter notebook --generate-config

Details about the file generated are shown in the Terminal. It is located in:

~/.jupyter/jupyter_notebook_config.py

To change the configuration, uncomment the appropriate line and modify it as needed. We are interested in line 157. As JupyterLab works better in Chromium, change this line to:

c.NotebookApp.browser = 'Chromium'

The most important change is on line 543. This prevents JupyterLab from looking for the redirect file, which the snap browser has no permission to access:

c.NotebookApp.use_redirect_file = False

Now inputting:

jupyter-lab

Will launch JupyterLab as expected:

Hopefully this modification will be done by default in newer builds of JupyterLab.

Visual Studio Code

Installing Visual Studio Code

Visual Studio Code is not preinstalled with Anaconda but can easily be installed from the Ubuntu Store.

Search for code:

Select Install:

Input your password to authenticate:

Launching Visual Studio Code

You can launch Visual Studio Code from its tile on the Start Screen, its tile in the Anaconda Navigator or by using the command:

code

Installing the Python Extension

To use Python we must install the Python extension. To the right hand side select the Extensions Tab and then select the Python Extension and select Install:

The extension should now be installed:

Selecting the Python Interpreter

Python is inbuilt into Linux but the inbuilt Python has no Data Science libraries. Visual Studio Code uses the inbuilt Python by default meaning you'll get a module not found error if you attempt to use a Data Science library. We'll need to select the Python interpreter to use the conda (base) environment.

Press [Ctrl] + [Shift] + [ p ] to open the Command Palette:

Search for interpreter:

With your mouse click on the Python: Select Interpreter:

Change to the Python 3.9.7 (base):
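To confirm that Visual Studio Code is now running the conda interpreter rather than the inbuilt one, a short script along these lines can be run. It is a sketch that relies on the conda-meta folder described earlier being present in a conda environment. The function name looks_like_conda_env is my own:

```python
import sys
from pathlib import Path


def looks_like_conda_env(prefix):
    """Heuristic: a conda environment contains a conda-meta folder."""
    return (Path(prefix) / "conda-meta").is_dir()


# The interpreter that is running this script.
print(sys.executable)

# True when that interpreter belongs to a conda environment.
print(looks_like_conda_env(sys.prefix))
```

With the (base) interpreter selected, the first line should point into the anaconda3 folder rather than a system location such as /usr/bin.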

conda and conda-forge channels

There are two main conda channels: conda and conda-forge. The conda channel (named defaults in conda's own configuration) is the one used when no channel is specified.

conda is maintained by the Anaconda team, who spend time putting a collection of Python packages together and assessing their compatibility with one another. The Anaconda team tend to update most of their packages only once or twice a year.

conda-forge is the community channel. This is where the developers of each package submit their packages. In many cases the packages made by small developers are available on conda-forge but not on the conda channel.

This can be demonstrated for example with python-docx which is a Python library used to create a Word Document from a Python script. There are no search results on the conda channel:

conda search python-docx

To search another channel we use the option -c followed by the name of the channel:

conda search -c conda-forge python-docx

Larger, more well-known packages such as Spyder are available on both conda and conda-forge. Spyder is updated far more frequently than once or twice a year, and therefore the two channels differ in what they display as the latest version:

conda search spyder
conda search -c conda-forge spyder

As a result, a fix or improvement made to Spyder, for example, can take about 6 months to 1 year to be picked up by the Anaconda team.

conda-forge packages can be added to your conda (base) environment when the package is relatively small and has only a handful of dependencies.

conda install -c conda-forge python-docx

The conda package manager will be able to "solve" the environment and ask you to confirm the changes. Input:

y

The package will successfully be added:

conda environments

On the other hand, for packages with a huge number of dependencies, the conda package manager will likely get stuck attempting to solve your environment. This issue is common when trying to update Spyder to the latest version available on conda-forge. The way around it is to install the package in its own conda environment. There is a subfolder within anaconda3 called envs:

~/anaconda3/envs

This is empty by default as there is only one environment, the base environment:

Install Latest Spyder 5 to New conda env

Let's create a new environment called spyder.

If you have an old environment called spyder, e.g. from an older version, use the following to remove it:

conda remove -n spyder --all

To create a new environment use:

conda create -n spyder

After inputting:

y

A spyder subfolder appears:

Note that up until now every terminal prompt has begun with (base), meaning the base conda environment is selected:

If we type in:

conda activate spyder

We now see that (spyder) is in place of (base). This means the (spyder) conda environment is selected. If we use conda list, conda update, conda install or conda remove it will make changes to the conda (spyder) environment and not (base). Note that when the terminal is closed and reopened it defaults to (base). We can also change back to (base) without closing the terminal by using:

conda activate base
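The selected environment can also be checked from inside Python, which is handy when an IDE hides the terminal prompt. This sketch reads the CONDA_DEFAULT_ENV environment variable, which conda sets on activation. The function name active_conda_env is my own:

```python
import os


def active_conda_env(environ=os.environ):
    """Return the name of the activated conda environment, or None."""
    return environ.get("CONDA_DEFAULT_ENV")


# Prints base, spyder, etc. depending on which environment was activated,
# or None when no conda environment is active.
print(active_conda_env())
```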

Now let's install spyder from the conda-forge channel:

conda install -c conda-forge spyder

Here we see the huge number of mandatory dependencies for the latest version of the Spyder 5 IDE:

To install them input:

y

To launch spyder from the (spyder) conda env input:

spyder

We can see we are using the newer Spyder by going to Help→About:

If we go to Help→Dependencies, we can see that the optional dependencies are not installed, meaning we'll get module not found errors if we attempt to use a Data Science library:

To rectify this we can use:

conda install -c conda-forge cython seaborn sympy openpyxl xlrd xlsxwriter lxml sqlalchemy scikit-learn

Input:

y

to proceed.

We now have the mandatory and optional dependencies installed for Spyder.
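Whether the optional libraries are importable can also be checked from the Spyder console. The following is a minimal sketch with a function name, missing_modules, of my own. Note that the import name is not always the conda package name: scikit-learn, for example, is imported as sklearn:

```python
import importlib.util


def missing_modules(import_names):
    """Return the top-level import names that cannot be found."""
    return [
        name for name in import_names
        if importlib.util.find_spec(name) is None
    ]


# Import names for the packages installed with the command above.
WANTED = [
    "cython", "seaborn", "sympy", "openpyxl", "xlrd",
    "xlsxwriter", "lxml", "sqlalchemy", "sklearn",
]

print(missing_modules(WANTED))
```

An empty list means every library in WANTED can be imported from the currently selected environment.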

You can add other packages to this conda environment if you need them using a similar install command to the above.

Ensure you remember to activate the environment before launching spyder.

To later update this conda environment using the latest packages from the conda-forge channel use:

conda activate spyder
conda update -c conda-forge --all

Do not attempt to do this with the base conda environment if using Anaconda (you should stick to the conda channel for the base environment where possible).

Install Latest JupyterLab 3 to New conda env

The procedure to install the latest version of JupyterLab in a conda environment is similar.

Remove an old conda env of the same name if present:

conda remove -n jupyterlab --all

Install JupyterLab plus optional dependencies:

conda create -n jupyterlab
conda activate jupyterlab
conda install -c conda-forge jupyterlab
conda install -c conda-forge cython seaborn sympy openpyxl xlrd xlsxwriter lxml sqlalchemy scikit-learn
conda install -c conda-forge nodejs ipywidgets 
jupyter-lab

Install optional JupyterLab extensions such as the variable inspector, interactive matplotlib, plotly and drawio:

conda install -c conda-forge jupyterlab-variableinspector ipympl plotly jupyterlab-drawio

To later update this conda environment using the latest packages from the conda-forge channel use:

conda activate jupyterlab
conda update -c conda-forge --all

Do not attempt to do this with the base conda environment if using Anaconda (you should stick to the conda channel for the base environment where possible).

Spyder 5 IDE Tutorial

Spyder 5 is one of the best IDEs for learning Python and one of the most popular for Data Science.

Spyder Preferences

The Spyder Preferences can be altered by going to Tools and Preferences:

In the Appearance Tab, the syntax highlighting theme can be changed from Spyder Dark to Spyder:

In the Editor, Indent Guides and Blank Spaces can be shown:

Select Apply and then Yes:

Spyder will restart using the Spyder Syntax highlighting scheme:

File Menu

Spyder has a File menu to save and open script files. Each script displays in its own tab. The current directory (the directory from which the last script was run) shows at the top:

We can save our script file in Documents for example:

In this case I will call it spyder_script

Now the file name displays at the top. Unsaved changes are indicated with a *

Syntax Highlighting

Syntax highlighting is carried out by default. Numeric values on line 2 and 3 are highlighted in brown. strs on line 7, 14, 15 and 16 are highlighted in green.

Note the matching bracket for the bracket selected on line 16 is highlighted in line 14.

There is a typo in the code and this is marked by an x on line 16 as the variable boll_num isn't defined:

Once this is fixed, there is no error:

Run Script

We can put some test code to create fundamental numeric variables and text variables. We can also create collections using the inbuilt classes:

#%% Fundamental Numeric Datatypes
full_num = 5
dec_num = 10.5
bool_num = True

#%% String
string = "Hello"

#%% Collections
list_col = [full_num, dec_num, bool_num, string]
tuple_col = (full_num, dec_num, bool_num, string)
dict_col = {"full": full_num, "dec": dec_num, "bool": bool_num, "string": string}

Upon first launch we are prompted for the run settings which we can leave as default (these can later be changed in preferences if desired):

Variable Explorer

Spyder has a Variable Explorer which can be used to explore these variables. Each variable's type is listed alongside its size. In the case of a string, this is the number of characters and in the case of a list, this is the number of items in the list. Collections of variables, for example this dict, can be expanded to view them in more detail:
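The type and size columns correspond roughly to what type and len report in the console. Here is a small sketch using the same variables as the test script. It is an approximation of what the Variable Explorer displays, not a description of how Spyder computes it:

```python
full_num = 5
dec_num = 10.5
bool_num = True
string = "Hello"

list_col = [full_num, dec_num, bool_num, string]
dict_col = {"full": full_num, "dec": dec_num, "bool": bool_num, "string": string}

# For a string, len counts characters; for a list or dict, it counts items.
for name, value in [("string", string), ("list_col", list_col), ("dict_col", dict_col)]:
    print(name, type(value).__name__, len(value))
```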

Script Editor and Console

Spyder has a script editor and a console. The console keeps track of the number of executions sent to the kernel. For example if the script is run to create the variable full_num, the value of full_num is shown in the variable explorer and we are informed in the console that we have run the script file:

full_num = 5

If we modify the script to print full_num and run it again:

full_num = 5
print(full_num)

The number of executions is now 2 and we are informed the script has been run. The value 5 also displays below this as we used the print function:

Operations can be carried out in the console. For example this single line operation was the third execution to take place:

The console is often used to test out a quick line or two of code before adding it to a script.

Kernel

Restarting the Kernel will clear all variables from the Variable Explorer, clear the Console and unload any imported modules. The number of executions will return to 0.

The Kernel can be Restarted by going to Consoles → Restart Kernel:

Select Yes to Proceed:

The Kernel is now clear:

Cells

Comments can be added to the script by beginning with #. If a line begins with #%% it will create a new cell and the currently selected cell is highlighted in yellow.

We can run a single cell by selecting run cell:

Notice only the variables defined in the first cell display in the variable explorer and the first cell is still highlighted:

We can restart the Kernel and instead, select the next button. Run the cell and move onto the next cell:

Notice how the second cell is highlighted after the first cell is executed:

Finally we can use the 4th Run button to run only the highlighted selection:

Importing DataScience Libraries

Suppose we want to record some dependent y data values with respect to independent x values. We could use two lists to create two equally sized numeric vectors, a nested list or a dictionary:

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

xy = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]]

xy2 = {"x": [1, 2, 3, 4, 5], "y": [2, 4, 6, 8, 10]}

Note however that each of these objects is 1 dimensional. i.e. xy is a list of lists.

Moreover the datatype of each item in the list can be independent which offers the most flexibility but it is not particularly useful in some cases where one is trying to plot the data to see a trend for example.

Finally, the operators available for a collection such as a list are not optimised for numeric data. The + operator, for example, will concatenate two lists, making a longer list, in a similar manner to the + operator between two strings. It is not set up like the + operator between two ints, which performs numeric addition.
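The difference is easy to see in the console. The sketch below uses the x and y lists from above and shows that getting numeric addition out of plain lists requires an explicit loop over the pairs of values:

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# + between two lists concatenates them, just as it does for strings.
concatenated = x + y
joined = "Hello" + "World"

# + between two ints performs numeric addition.
total = 1 + 2

# Element-by-element addition of lists needs an explicit loop.
elementwise = [a + b for a, b in zip(x, y)]

print(concatenated)
print(joined)
print(total)
print(elementwise)
```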

We have two Python libraries based upon additional datatypes: numpy, which is based upon numeric Python arrays (which can be visualised as a mathematical matrix), and pandas, which is based upon a DataFrame (which can be visualised as an Excel spreadsheet). These datatypes have a number of methods and operators which, in the case of a numeric array for example, carry out numeric operations on every element.

Let's first examine numpy. Note the import line should be highlighted and run, which will execute it as shown in the console:

Once numpy is imported into the kernel, code-completion for numpy becomes accessible. Typing np. displays a selection of objects which can be called from the numpy library:

We can use the function array to create a new numpy array. When this function is typed with an opening parenthesis, details about the function's input arguments are shown.

Spyder has a Help Pane. Highlighting a function or class and pressing Ctrl + i will inspect it:

And attempt to retrieve the documentation:

For functions or classes from the DataScience libraries this can sometimes be quite limited and only give details about the library and not the specific function or class:

Usually a more detailed docstring can be accessed directly from the console by typing in a ? followed by the function or class to be investigated:

? np.array

Use the mouse wheel to scroll through the documentation and press q to quit the pager:

This will take you to the next line in the console:

We need the object which is usually a list or a list of lists. Everything else shown is optional and the datatype will be automatically determined:

import numpy as np
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])
xy = np.array([[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]])
xy2 = np.transpose(xy)

If we run the code above we can see all the objects in the variable explorer. Since the datatype is constant for each cell in a numpy array, this is just shown on the Variable Explorer:

We can now look at pandas. Once again the import line should be run in order to allow the code-completion to work:

In pandas we use the DataFrame (CamelCaseCapitalization) class to create an instance (our variable name xy):

We need to supply the data. This is provided in the form of a dictionary where the keys are strings of the column names and the values are lists. Notice how the dataframe xy looks like a matrix but each column is clearly labelled as "x" and "y" respectively:
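As a minimal sketch of the step described above, the dataframe can be built from the same x and y values used earlier. It assumes pandas is installed in the selected environment, as it is in the Anaconda (base) environment:

```python
import pandas as pd

# Keys become the column names, the lists become the column values.
xy = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 6, 8, 10]})

print(xy)
print(xy.shape)          # (rows, columns)
print(list(xy.columns))  # the column labels
```

After running this, xy appears in the Variable Explorer as a DataFrame and can be double clicked to view it as a table.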

Files

Notice the dataframe is in the form of an Excel sheet and closely resembles the one below which is saved in Documents as Book1.xlsx:

We can use the read_excel function to read this data and create a new instance of a dataframe.

This Excel File has the title "Book1.xlsx" and is in the same folder as the spyder_script.py file. These can be seen by using the Files Tab. Each column has a name and the default Sheet name Sheet1 is used. Therefore we don't need to override the default values of any of the keyword input arguments in the read_excel function.

This reads in the data as a dataframe:

Plotting

Now we've got data, we can have a look at plotting it. There are a number of Python plotting libraries. The most frequently used one is matplotlib. It is frequently used with seaborn which acts like a wrapper around matplotlib to consistently change the styles of the plots and add some additional plot types (commonly used in data science).

We can now look at matplotlib. Once again the import line should be run in order to allow the code-completion to work:

We can use the function plot to create a basic line plot:

For the args, we need to provide the x and y data. We can access a column from a dataframe as an attribute:

import pandas as pd
import matplotlib.pyplot as plt

xy = pd.read_excel("Book1.xlsx")

plt.plot(xy.x, xy.y)

By default, Spyder displays the plots inline in the Plots pane:

To change this and instead use automatic plotting to create each plot in its own dedicated window, select Tools → Preferences:

Select IPython Console. Then to the left select Graphics and change the setting to Automatic. Then select Apply and Restart the Kernel:

Now relaunching the script shows an Automatic plot in its own dedicated Window:

We can also import seaborn and run the selection to allow code-completion to take place:

We can then change the style of the plots using set_style:

In this example, I am using whitegrid:

seaborn includes additional plot types that are used particularly in science. Some of these are duplicates with matplotlib but the syntax is a bit more geared towards dataframes.

We can use the function figure to create a new plot and assign the figure number. We will use figure 1 to create a line plot using matplotlib and figure 2 to create a lineplot with seaborn:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("whitegrid")

xy = pd.read_excel("Book1.xlsx")

plt.figure(1)
plt.plot(xy.x, xy.y)
plt.xlabel("x (units)")
plt.ylabel("y (units)")

plt.figure(2)
sns.lineplot(data=xy, x="x", y="y")
plt.xlabel("x (units)")
plt.ylabel("y (units)")

JupyterLab 3 IDE Tutorial

Recall that there is no start menu shortcut to JupyterLab and it needs to be launched via the Anaconda Navigator or the Terminal.

File Explorer

JupyterLab is browser based. To the left hand side is the file explorer alongside a File Menu which can be used to save files:

To the right hand side is the launcher. There are three commonly used file types: the text file, the markdown file and the Notebook file:

Text File

The text file is essentially a plain text file and has the same capabilities as Notepad, i.e. you can write text with no formatting capabilities:

The file can be renamed by renaming the tab at the top or the file name in the JupyterLab file explorer to the left hand side:

In this case, I can rename it as textfile.txt:

This file can be viewed in File Explorer and opened in Text Editor:

Markdown File

We can use the + button at the top of the JupyterLab file explorer to open a new Launcher as a new tab to the right hand side.