This guide looks at installing Anaconda 2024.02-1 on Ubuntu 24.04.
Removing Previous Installations¶
Anaconda should be installed on a Linux PC that has no Python installations other than the system Python. The system Python is preinstalled with Linux and should be treated as part of the operating system and not modified by the user.
If an old Anaconda installation, or an Anaconda-based installation such as Miniconda, is present, it should be removed by deleting its respective folder. Note that deleting these folders leaves behind a large number of configuration files, and the presence of these files often results in problematic settings persisting after a reinstall. For best results it is recommended to delete all these configuration files as well. For more details see Uninstallation Instructions.
System Python¶
Linux has a preinstalled version of Python known as the system Python. The system Python only contains the standard libraries. It should be considered part of the operating system and should not be modified by the end user, as doing so may make the operating system unstable.
When Files is opened, the User directory Home is selected:
The System Python is in Other Locations and in the Root:
/
In particular it is in the root bin folder and bin is an abbreviation for binaries:
/bin/
This is a development version of Ubuntu 24.04 LTS and has two Python versions 3.11 and 3.12:
When the Terminal is opened, the ~ indicates the current working directory. The terminal by default also searches the /bin/ folder for binaries:
To access the latest Python the python3 binary can be used (this is an alias to python3.12). The 3 exists because Linux used to ship with both Python 2 and Python 3, and the suffix was used to distinguish these two versions. Python 2 has reached end of life and is no longer preinstalled:
In the:
/lib/
folder are a number of libraries.
The ones for Python 3.12 are found in the python3.12 folder:
/lib/python3.12
For example the standard library email:
/lib/python3.12/email
This has an __init__.py file which is imported when the folder is imported:
Some modules are single files, for example datetime.py:
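The file behind an imported module can also be checked from Python itself. The following is a minimal sketch that uses only the standard library; the helper name module_file is made up for this illustration, and the exact folder printed depends on which Python is running.

```python
import datetime
import email
from pathlib import Path


def module_file(module):
    """Return the path of the file a module object was loaded from."""
    return Path(module.__file__)


# email is a package (a folder), so its file is the folder's __init__.py.
email_file = module_file(email)

# datetime is a single-file module, so its file is datetime.py.
datetime_file = module_file(datetime)

print(email_file)
print(datetime_file)
```

Running this with the system Python and later with the Anaconda Python should show the same file names in different lib folders.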
The system Python does not contain third-party libraries such as the scientific stack (numpy, pandas and matplotlib). Attempting to import these raises a ModuleNotFoundError:
Note two programming languages have been used so far: bash and Python. The highlighted code is running in the Python shell and the code that is not highlighted is running in the bash shell. In this case bash, the default language of the terminal, was used to launch a Python session:
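Whether a library is available to the current Python can be checked without triggering a ModuleNotFoundError. This is a small sketch using importlib from the standard library; the helper name is_installed is an assumption for this example and not part of any library.

```python
from importlib.util import find_spec


def is_installed(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return find_spec(module_name) is not None


# Standard modules should be found; the scientific stack is only found
# when the running Python has those packages installed.
for name in ("email", "datetime", "numpy", "pandas", "matplotlib"):
    status = "available" if is_installed(name) else "not installed"
    print(f"{name}: {status}")
```

With the system Python the three scientific libraries are expected to be reported as not installed, whereas the Anaconda base environment should report all five as available.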
Anaconda vs Miniconda¶
Anaconda is a Python distribution that has a base Python environment designed to be used as is. The base Python environment has the conda package manager, which can be used to create a separate Python environment (a subinstallation of Python) for a custom configuration of packages.
Miniconda is a minimal (bootstrap) version of Anaconda that only contains Python and the conda package manager, and can likewise be used to create Python environments.
Anaconda should be preferred when the user requires a preconfigured (base) Python environment to be used as is. When the user plans to only create custom Python environments, Miniconda should be preferred.
Anaconda Python Distribution¶
The Anaconda Python distribution comes with its own base Python environment that contains:
- Python
- Python Standard Libraries
- The conda Package Manager
- The Anaconda Navigator
- Third-party Libraries:
  - numpy
  - pandas
  - matplotlib
  - seaborn
  - plotly
  - pillow
  - scikit-learn
  - scikit-image
  - ⁝
- Third-party IDEs:
  - Spyder
  - Jupyter
    - JupyterLab
    - Jupyter Notebook
    - Jupyter QTConsole
    - Jupyter Console
- Third-party formatters:
  - autopep8
  - isort
  - black
Miniconda¶
Miniconda is a stripped down version of Anaconda containing only:
- Python
- Python Standard Libraries
- The conda Package Manager
conda¶
Anaconda and Miniconda have the conda package manager which should be used in preference to the native Python package manager pip.
- conda
- pip
pip is strictly a package manager for Python packages. However many data science projects use code written in C++ under the hood for performance gains. The conda package manager manages both the Python and non-Python dependencies. The dependency solver used by conda has also been rewritten in C++ for increased performance and reliability. This was separately developed as mamba, and the conda package manager now uses the libmamba (C++) solver by default.
The conda package manager uses two channels:
- conda-forge
- anaconda
The first channel, conda-forge, is the community channel and has the largest number of packages available.
The second channel is the channel maintained by Anaconda Inc. Anaconda Inc tests packages for compatibility with the Anaconda Python Distribution.
As a consequence the latest version of a package available on the anaconda channel may be dated with respect to the package on the conda-forge channel, as it takes Anaconda Inc some time to test packages. Moreover Anaconda Inc only tests the most commonly used data science libraries and therefore more niche packages will only be available on conda-forge.
In the Anaconda base environment the following commands should never be used:
- pip install package
- conda install conda-forge::package
- conda install -c conda-forge package
This is because use of multiple package managers and use of multiple channels will make the Anaconda base Python environment unstable.
Only packages available from the anaconda channel should be installed in base:
- conda install anaconda::package
- conda install -c anaconda package
The base Python environment is normally used as is and instead a custom Python environment is used to install a subinstallation of Python with custom packages, usually from the conda-forge community channel. More details about channels will be given later.
Download Links¶
The latest Anaconda and Miniconda installer can be downloaded from:
Using Anaconda as an example, Linux can be selected:
Then the 64 Bit Linux installer can be downloaded:
The download will be saved in Downloads by default:
Installing¶
Go to the Downloads folder and right click blank space and select Open in Terminal:
Note that instead of Home (~), the prompt now displays the Downloads folder (~/Downloads):
If another Files instance is opened, we can return to:
/
/bin
Notice there is a bash binary. This can be used to execute a shell script:
If the shell script is right clicked and renamed:
Its file name and file extension can be copied:
This can be pasted into the terminal so the bash command runs the script:
For Anaconda input:
bash Anaconda3-2024.02-1-Linux-x86_64.sh
Or for Miniconda input:
bash Miniconda3-latest-Linux-x86_64.sh
Press ↵:
Press q to quit scrolling through the license agreement:
Input yes to accept:
Press ↵ to install in the default location:
During installation you will be asked to initialise Anaconda, input yes to proceed:
Anaconda is installed and initialised and all open Terminals should be closed. New Terminals will be initialised:
Unfortunately the default option for initialisation is no. If you have selected this by mistake, Anaconda or Miniconda can be manually initialised by changing directory to the bin subdirectory of anaconda3 or miniconda3:
cd ~/anaconda3/bin
cd ~/miniconda3/bin
And then running:
./conda init
The ./ is an instruction to run a binary from the current working directory. Once again the Terminal will need to be closed.
Any new Terminal instances will be initialised. The prefix (base) shows in the Terminal:
Going to the Home directory:
~
There is now a
~/anaconda3
folder:
This has a
~/anaconda3/bin
subfolder:
In this subfolder are python binaries. Notice there is a python3 which also has the alias python, and these correspond to the current version of Python in the base Python environment, which in the case of Anaconda is python3.11:
Because Anaconda is initialised, a search for a binary is made in the ~/anaconda3/bin folder before it is made in the /bin folder.
This means the python3 binary from the Anaconda base Python environment is used instead of the system Python. This can be seen by the different version number:
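The search order can be imitated in Python. The sketch below builds two temporary folders that merely stand in for ~/anaconda3/bin and /bin, places an empty executable called python3 in each, and uses shutil.which from the standard library to see which one is found first. The folder layout and the helper make_fake_binary are assumptions made for this illustration; nothing here touches the real installation.

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path


def make_fake_binary(folder, name):
    """Create a small executable file called `name` inside `folder`."""
    path = Path(folder) / name
    path.write_text("#!/bin/sh\n")
    path.chmod(path.stat().st_mode | stat.S_IXUSR)
    return path


with tempfile.TemporaryDirectory() as tmp:
    conda_bin = Path(tmp) / "anaconda3" / "bin"   # stands in for ~/anaconda3/bin
    system_bin = Path(tmp) / "bin"                # stands in for /bin
    conda_bin.mkdir(parents=True)
    system_bin.mkdir()
    make_fake_binary(conda_bin, "python3")
    make_fake_binary(system_bin, "python3")

    # An initialised Terminal lists the Anaconda folder first.
    initialised = os.pathsep.join([str(conda_bin), str(system_bin)])
    # Without initialisation only the system folder is searched.
    not_initialised = str(system_bin)

    found = shutil.which("python3", path=initialised)
    first_match = Path(found).parent.relative_to(tmp)

    found = shutil.which("python3", path=not_initialised)
    fallback_match = Path(found).parent.relative_to(tmp)

print(first_match)
print(fallback_match)
```

The first folder in the search path that contains the binary wins, which is why initialising Anaconda changes which python3 runs.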
There is an associated
~/anaconda3/lib
folder:
The python3.11 subfolder contains the Python libraries:
When the standard module email is imported, it is imported from this folder:
When the standard module datetime is imported, it is imported from this folder:
There is a
~/anaconda3/lib/python3.11/site-packages
folder for third-party libraries.
This folder has only a handful of entries for Miniconda but a large number for Anaconda. Anaconda has a numpy folder:
When the module is imported, this file is imported:
numpy is a large library and has submodules. A submodule can be in a folder and when this submodule is accessed, the following file is accessed:
Once again, the highlighted Python shell is run within a bash shell:
If the
~/anaconda3/bin
folder is opened, there is the binary clear which can be used to clear the bash shell:
The conda binary is the conda package manager:
Details about its commands can be seen by inputting
conda
For Anaconda or Miniconda, frequent checks should be made for updates to the conda package manager:
conda update conda
This is because it is commonly updated to address bug fixes when new packages are released.
For Anaconda, updating the package manager will update the Anaconda Python distribution.
In this case, the conda package manager is up to date.
For Anaconda, the Anaconda Navigator is usually separately updated:
conda update anaconda-navigator
In this case, there is an update that can be installed:
conda is a cross-platform package manager used for installing packages related to data science.
Ubuntu has its own specific package manager (snap) and, because it is based on Debian, also has apt.
The native package manager is found in:
/bin
Because there is no equivalent in ~/anaconda3/bin, this binary will be used. TeX will be installed system wide:
sudo apt-get install texlive-xetex texlive-fonts-recommended texlive-plain-generic cm-super dvipng
Because changes are being made to the system, the command needs to be run as a super user. sudo means superuser do.
You will need to provide your password to continue with the operation:
Input y to proceed:
Anaconda Navigator¶
The Anaconda Navigator is a GUI implementation of the conda package manager.
It is not included in Miniconda and is not commonly installed alongside it.
Going back to
~/anaconda3/bin
there is the anaconda-navigator binary. Notice the - in the name, which needs to be supplied.
The Anaconda Navigator can be launched using:
anaconda-navigator
Notice that the Terminal will be busy as it is running the event loop required for the QT application window to show:
The Anaconda Navigator contains a number of shortcuts to preinstalled IDEs:
Its settings can be changed by going to File and Preferences:
Unfortunately the Enable High DPI Scaling Setting doesn't always work well and on some screens, the following window will be outside the screen viewing area making it impossible to select the Apply button:
There are settings to Edit the Navigator and Conda programmatically:
If this does not display properly on your screen, Enable Hidden Files:
Go to the
~/.anaconda/navigator
folder:
And edit the anaconda-navigator.ini file in the text editor:
Set enable_high_dpi_scaling to False:
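The same change can be sketched in Python with the standard library configparser module. The sample text below is a made-up two-line stand-in for anaconda-navigator.ini, and the section name main is an assumption, so the sketch looks for the option in every section. Note that configparser does not keep comments when it writes a file back, so editing the real file by hand as shown above remains the safer route.

```python
import configparser
import io

# Hypothetical stand-in for the contents of anaconda-navigator.ini.
SAMPLE_INI = """\
[main]
enable_high_dpi_scaling = True
"""


def disable_high_dpi_scaling(ini_text):
    """Return the ini text with enable_high_dpi_scaling set to False."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    for section in parser.sections():
        if parser.has_option(section, "enable_high_dpi_scaling"):
            parser.set(section, "enable_high_dpi_scaling", "False")
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


updated = disable_high_dpi_scaling(SAMPLE_INI)
print(updated)
```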
Go to
~
The .condarc (conda runtime configuration) file should be automatically generated.
It should be configured to use the default settings:
This can be updated to another channel for example the community channel.
However it is not recommended to change this as it may lead to an attempted update of base using the wrong channel which will result in an unstable base Python environment.
The Terminal is busy while the event loop for the QT MainWindow is open:
When this window is closed:
The Terminal should finish the process and move onto the next prompt. If it does not, press Ctrl + c to close the currently running operation:
Note that in the Terminal Ctrl + ⇧ + c is used to copy and Ctrl + ⇧ + v is used to paste.
IPython¶
Interactive Python (ipython) is a modified Python shell. It isn't included with Miniconda but is typically installed in a Python environment with an IDE that is powered by ipython.
When installed, there is an ipython and an ipython3 binary which are aliases of one another. The ipython shell looks similar to the Python shell, however the prompt is numbered:
Pressing ↹ after a prefix will also show a list of identifiers that start with that prefix:
The identifiers beginning with % are ipython magics. ipython magics are essentially reimplementations of common bash commands that are used when working with a Python script, for example %conda, which can be used to access the conda package manager. An ipython magic can be used in the middle of an ipython session, allowing the versatility of both shells without exiting the shell and losing all variables.
The ? can be used to print out a docstring:
Notice that syntax highlighting is also applied:
The ipython shell can be exited:
Jupyter¶
The name Jupyter is derived from Julia, Python and R. There are four main binaries:
jupyter-console
jupyter-qtconsole
jupyter-notebook
jupyter-lab
The binary jupyter also accepts these as subcommands:
jupyter console
jupyter qtconsole
jupyter notebook
jupyter lab
In the Anaconda Python distribution only the Python kernel is preinstalled. For Miniconda/Anaconda, the latest version of jupyter can be installed in a separate Python environment named jupyter-env, with packages from the community channel conda-forge, using:
conda create -n jupyter-env -c conda-forge python jupyterlab jupyter cython seaborn scikit-learn pyarrow sympy openpyxl xlrd xlsxwriter lxml sqlalchemy tabulate nodejs ipywidgets plotly pyqt isort autopep8 ruff black ipympl jupyterlab-variableinspector jupyterlab_code_formatter jupyterlab-spellchecker ghostscript nbconvert julia r-irkernel jupyter-lsp-r r-tidyverse r-ggthemes r-palmerpenguins r-writexl
The following specifies the name of the environment and channel used to install packages from respectively:
-n jupyter-env
-c conda-forge
The package julia will install the Julia programming language and Julia kernel. Julia has its own package manager which should be used for its own libraries.
The package r-irkernel will install the R programming language and the R kernel. R has its own package manager, however its use should be avoided in a conda environment and additional packages should be installed from the community channel conda-forge.
Details about the packages to be installed will be listed, press y to proceed. Note if Julia and R are installed the latest version of Python may not be installed, as conda will determine the latest version of Python compatible with the other programming languages:
The Python environment is created with the list of packages:
To use it, it needs to be activated:
conda activate jupyter-env
Notice the (base) prefix is now replaced by (jupyter-env). This means changes to packages will occur in this environment when conda is used and -n isn't specified. In addition binaries will be preferentially searched for in the jupyter-env Python environment before looking in the base Python environment or the system Python environment:
Python environments are found in the subfolder:
~/anaconda3/envs
~/miniconda3/envs
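The active environment can also be inspected from inside Python. This is a minimal sketch, assuming the Terminal was initialised by conda so that the CONDA_DEFAULT_ENV and CONDA_PREFIX environment variables are set on activation; when they are not set the corresponding entries are simply None. The function name active_environment is made up for this example.

```python
import os
import sys


def active_environment():
    """Collect details about the environment the running Python belongs to."""
    return {
        # Name shown in the prompt prefix, for example base or jupyter-env.
        "name": os.environ.get("CONDA_DEFAULT_ENV"),
        # Folder of the activated conda environment.
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
        # Folder the running interpreter was installed into.
        "python_prefix": sys.prefix,
    }


info = active_environment()
for key, value in info.items():
    print(f"{key}: {value}")
```

After conda activate jupyter-env, the two prefix entries are expected to point inside the envs subfolder listed above.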
Now the three programming languages can be used:
The kernels can be listed using:
jupyter kernelspec list
The jupyter-console can be launched using the following command. By default the Python (ipython) kernel is used:
jupyter-console
Another kernel can be specified using the option --kernel, for example R can be selected using:
jupyter-console --kernel=ir
The QTConsole is a rewrite of the Terminal using QT:
jupyter-qtconsole
The Terminal remains busy as the QT MainWindow application is running:
The QTConsole is similar to the ipython console. Pressing ↹ after a prefix will display identifiers:
A docstring will display as a popup when a recognised function is input with an open parenthesis:
The QTConsole will also display graphics inline:
The session can be saved to HTML, which is the file format used for a static website:
This can be saved to Documents:
The images can be kept as Inline:
This file can be opened:
Which will display as a website that can be shared:
The jupyter-lab binary can be launched using:
jupyter-lab
JupyterLab runs a server in the Terminal and displays all the visual UI elements in the browser:
It has a files tab which is a browser based version of files and a number of tiles which can be used to create new files. Additional options will be added if the Python environment with Julia and R is used instead of base. A new notebook file can be created:
It can be renamed in files:
Similar code to the QTConsole can be added. It is also possible to add markdown cells for formatted text:
The popup balloon for a docstring will only display if ⇧ + ↹ are pressed:
Identifiers will only display if ↹ is pressed after a prefix:
The prefix can be a datamodel identifier which is otherwise hidden by default:
JupyterLab has a Variable Inspector which can be accessed by right clicking on the notebook and selecting Open Variable Inspector:
The notebook can be saved:
If the notebook is examined in Ubuntu files:
It is in the form of a JSON file. The browser uses these instructions to display the content. Note that a notebook is typically reopened within JupyterLab (with the Kernel restarted) and not using a text editor:
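Because a notebook is JSON, it can be read with the standard library json module. The sketch below uses a hand-written, heavily trimmed notebook as sample text rather than a real saved file, and the helper count_cell_types is an assumption made for this illustration.

```python
import json
from collections import Counter

# A trimmed, hand-written example of notebook JSON (not a real saved file).
SAMPLE_NOTEBOOK = """
{
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
    {"cell_type": "code", "execution_count": null, "metadata": {},
     "outputs": [], "source": ["print('hello')"]}
  ],
  "metadata": {},
  "nbformat": 4,
  "nbformat_minor": 5
}
"""


def count_cell_types(notebook_text):
    """Count how many cells of each type a notebook contains."""
    notebook = json.loads(notebook_text)
    return Counter(cell["cell_type"] for cell in notebook["cells"])


counts = count_cell_types(SAMPLE_NOTEBOOK)
print(dict(counts))
```

For a saved notebook the text could be read from the .ipynb file instead of the sample string.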
Note when the tab is closed:
The server is still running, in the Terminal:
Press Ctrl + c to close the currently running operation, then press y when prompted:
The server is closed and a new prompt displays:
Spyder¶
Another binary preinstalled by Anaconda is Spyder:
The latest version for Anaconda/Miniconda can be installed in a separate Python environment using:
conda create -n spyder-env -c conda-forge python spyder cython seaborn scikit-learn sympy openpyxl xlrd xlsxwriter lxml sqlalchemy tabulate pyqt ruff ghostscript
Spyder can be launched (activating the appropriate Python environment if necessary) by using:
spyder
This IDE has a script editor, that applies syntax highlighting, highlights syntax errors and has a number of tools to help improve code quality:
The Source menu has the Format File or Selection with AutoPEP8 option:
This will move imports to the top and address spacing issues:
The autoformatter can be changed using Tools → Preferences:
Selecting the Code and Linting tab to the left and then switching to black:
Now the Source menu has the Format File or Selection with Black option:
This applies black's opinionated formatting. At present there are some limitations, as black won't organise the imports correctly before applying its opinionated formatting, and so black does not work unless autopep8 has previously been used:
These options use the autopep8 and black applications found in the:
~/anaconda3/bin
folder. Spyder does not yet support isort, which is used to sort the imports alphabetically in two groupings (standard modules and third-party modules).
Unfortunately, black's opinionated formatting differs from Python's default representation and therefore many Python developers dislike black. A newer project, ruff, is a faster implementation of black which can be configured to match Python's default representation. Ruff is not yet integrated in Anaconda or Spyder.
Spyder has a very powerful Variable Explorer which can be used to visualise variables:
A variable that is a Collection can be expanded. By default GNOME will display the currently selected window on top, meaning if the Spyder IDE is selected, the variable window will be behind it. This can be prevented by right clicking the variable window and selecting Always on Top:
Identifiers corresponding to a prefix will display as a popup, alongside the associated docstring for a function:
The docstring can also be accessed by right clicking an object and selecting Inspect current object:
This will open up the documentation in the Help pane:
Plots are by default displayed as static images in the plots pane:
The plot preferences can be changed by going to Tools → Preferences:
Selecting the IPython Console tab to the left, the Graphics tab to the top right and changing the backend to Qt5 (Automatic is an alias for Qt5):
To apply the new preferences select Consoles → Restart Kernel:
Running the script will now display the plot in its own interactive window:
A comment can be added to a Python script file using #. If #%% is used, a new cell is created:
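A minimal sketch of a script split into two cells is shown here. The variable names and values are purely illustrative. Outside Spyder the #%% lines are ordinary comments, so the script also runs unchanged from the Terminal.

```python
#%% Cell 1: prepare some data
values = [1, 4, 9, 16]

#%% Cell 2: summarise the data
total = sum(values)
mean = total / len(values)
print(f"total={total}, mean={mean}")
```

In Spyder each cell can be run on its own, so the second cell can be rerun after editing it without repeating the first.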
The script file can be saved using File → Save As…:
It can then be saved to the Documents folder:
When this script is now rerun, the current working directory of the script file displays in the files tab:
The script file can be viewed in file explorer:
And opened in the text editor. This applies syntax highlighting but lacks other capabilities such as the ability to quickly look up identifiers or view a docstring:
When Spyder is closed, a new prompt will display. If it doesn't, press Ctrl + c to close the currently running operation:
bioconda¶
So far two channels have been examined:
- conda-forge (community)
- anaconda (maintained by the Anaconda company and tested for the Anaconda Python distribution)
The packages installed have been limited to popular packages that are widely used and actively maintained. In essence these packages are normally tested when a Python version is in an alpha or beta stage and are therefore updated for an RTM release of Python.
When a package becomes more specialised, there are normally fewer developers. These developers do not have the time to test against the current version of Python but instead release packages for a stable version of Python, i.e. the version of Python that is only being issued bugfixes, which is currently 3.10. For more details see Python Versions.
When attempting to install such niche libraries, these libraries should be specified during creation of a Python environment so that the conda package manager can look at the latest version of the niche library, examine its requirements, and therefore determine the latest version of Python and numpy to install. This is more reliable than attempting to downgrade versions of Python and numpy in an existing environment, which usually results in an unstable Python environment.
Many of the bioinformatics tools are only developed for POSIX systems (Linux/Mac) and are on the separate channel:
bioconda
Details about the packages available for this channel are given in the bioconda documentation.
The two channels are generally used together:
conda-forge
bioconda
The channels are specified using:
-c conda-forge -c bioconda
Which is an instruction to look for a package on the community channel first and the bioconda channel second.
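The effect of channel order can be pictured with a toy lookup. This is a deliberately simplified sketch and not how conda is implemented: the real solver also weighs versions and dependencies. The catalogue contents are hypothetical, and example-tool is a made-up package name used to show what happens when two channels offer the same package.

```python
def resolve_channel(package, channels, catalogue):
    """Return the first channel, in the order given, that offers the package."""
    for channel in channels:
        if package in catalogue.get(channel, set()):
            return channel
    return None


# Hypothetical channel contents, for illustration only.
CATALOGUE = {
    "conda-forge": {"python", "jupyterlab", "example-tool"},
    "bioconda": {"samtools", "bwa", "example-tool"},
}

# Same order as: -c conda-forge -c bioconda
CHANNEL_ORDER = ["conda-forge", "bioconda"]

for package in ("python", "samtools", "example-tool", "missing-package"):
    print(package, "->", resolve_channel(package, CHANNEL_ORDER, CATALOGUE))
```

A package offered by both channels is taken from the channel listed first, and a package offered by neither cannot be resolved at all.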
Note that packages on the bioconda channel are not typically configured to work with packages on the anaconda channel. Attempting to mix these channels will result in an unstable Python environment which cannot be solved.
Therefore if the following command is used, it will install packages from both the conda-forge and bioconda channels:
conda create -n bioinformatics-env -c conda-forge -c bioconda python jupyterlab cython seaborn scikit-learn sympy openpyxl xlrd xlsxwriter lxml sqlalchemy tabulate nodejs ipywidgets plotly jupyterlab-variableinspector ipympl pyqt r-irkernel jupyter-lsp-r r-tidyverse r-ggthemes r-palmerpenguins r-writexl samtools htslib pysam bcftools bedtools libdeflate blast bioconductor-iranges bioconductor-s4vectors bioconductor-biocgenerics bowtie2 bioconductor-biobase bioconductor-biostrings bioconductor-genomeinfodb bioconductor-genomicranges bioconductor-zlibbioc bioconductor-xvector bioconductor-biocparallel bwa bioconductor-summarizedexperiment
The conda package manager will search each channel for each package specified, as well as for the package dependencies. It will solve the environment until the latest compatible set is found:
Details about each package will be specified:
The latest "stable" version of Python:
and numpy will be used:
Input y in order to proceed:
The Python environment is now created:
Environment File¶
The currently active environment can be exported to a yml file (YAML, originally short for "Yet Another Markup Language"):
conda activate bioinformatics-env
conda env export > Documents/bioinformatic.yml
If this file is opened in text editor:
The channels and dependencies are shown. Note that a specific version of each package is specified:
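Each dependency line in the exported file has the form name=version=build. The sketch below splits such a line into its parts. The version and build string in the example are made up for illustration and are not taken from a real export, and PackageSpec and parse_spec are hypothetical helper names.

```python
from typing import NamedTuple, Optional


class PackageSpec(NamedTuple):
    """One dependency entry from an exported environment file."""
    name: str
    version: Optional[str]
    build: Optional[str]


def parse_spec(spec):
    """Split a name=version=build entry; missing parts become None."""
    parts = spec.strip().split("=")
    name = parts[0]
    version = parts[1] if len(parts) > 1 else None
    build = parts[2] if len(parts) > 2 else None
    return PackageSpec(name, version, build)


# Hypothetical entry: the version and build string are illustrative only.
example = parse_spec("samtools=1.19.2=h50ea8bc_1")
print(example)
```

The build string is what ties an entry to a particular platform, while the name and version alone are usually portable.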
The environment can be removed by using:
conda deactivate
conda env remove -n bioinformatics-env
In academic settings, an academic may issue a yml file which will reduce the likelihood of students encountering errors due to changes in newer versions of the libraries used.
The bioinformatics-env specified in the bioinformatic.yml file can be recreated using:
conda env create -f Documents/bioinformatic.yml
Because all the packages are specified, they will be downloaded. Quite often, yml files are platform agnostic, however because this yml uses the bioconda channel, which only has packages for POSIX systems (Linux/Mac), it won't work on Windows unless WSL is used:
Updating an Environment¶
The conda package manager can be used to update all packages in an environment to the latest version using the --all option.
--all should never be used with base, instead the conda package manager should be used to update conda which will collectively update the distribution.
-c conda-forge or -c bioconda should never be used with base, instead only the anaconda channel (also known as the default channel) should be used.
The jupyter-env can be updated using:
conda activate jupyter-env
conda update -c conda-forge --all
In this case, a small number of packages are found which can be upgraded but a package is substantially downgraded:
Press y to proceed:
Often with large Python environments, better results are achieved by deleting the Python environment and recreating it with all packages specified at the time of creation, as opposed to attempting to update an existing Python environment.
Revision¶
The packages in a conda environment can be listed using the list command:
conda list
The --revisions option can be used to list the revision history of the environment:
conda list --revisions
The install command can use the --revision option to revert to a previous revision:
conda install -c conda-forge --revision=0
However the conda package manager seems to hang for an extremely long time here for such a simple change. This command option will likely be optimized in a later version of conda. Unfortunately the conda env export command isn't configured to recognize --revision as an option, so it is recommended to export an environment to a yml file before updating it.