Databases

Many research projects use reference or external databases. This page describes the databases available on Mahuika and gives recommendations for using some specific external databases.

Maintained databases on Mahuika

Some databases on Mahuika are readable by all users. They are located under /opt/nesi/db. Some environment modules depend on these databases and point to these directories automatically.

| Dataset | Path | Licence | Status | Notes |
|---|---|---|---|---|
| AlphaFold | /opt/nesi/db/alphafold_db | CC-BY-4.0 | | Predicted protein structures generated by AlphaFold |
| BLAST | /opt/nesi/db/blast | Public | | NCBI BLAST nucleotide and protein databases |
| cartopy | /opt/nesi/db/cartopy | BSD-3-Clause | | Databases for cartopy module |
| centrifuge | /opt/nesi/db/centrifuge | GPL-3.0 | | Databases for centrifuge module |
| CheckM2 | /opt/nesi/db/CheckM2_DB | GPL-3.0 | | Database for CheckM2 module |
| CheckM | /opt/nesi/db/CheckM_DB | GPL-3.0 | | Database for CheckM module |
| DAS Tool | /opt/nesi/db/DAS_DB | Public | | Database for DAS_Tool module |
| dammit | /opt/nesi/db/dammit_db | BSD | | Databases for dammit module |
| dfam 3.9 | /opt/nesi/db/dfam_3.9 | CC0 1.0 | | Transposable element profile HMM database |
| DRAM 1.3.5 | /opt/nesi/db/DRAM_1.3.5 | GPL-3.0 | | Databases for DRAM module |
| eggnogdb | /opt/nesi/db/eggnog_db | Unspecified | | Orthologous group and functional annotation database |
| FCS-GX | /opt/nesi/db/FCS-GX | United States Government Work | | Database for FCS-GX module |
| gtdbtk_202 | /opt/nesi/db/gtdbtk_202 | CC-BY-SA 4.0 | | Genome Taxonomy Database release 202 used by GTDB-Tk module |
| gtdbtk_207_v2 | /opt/nesi/db/gtdbtk_207_v2 | CC-BY-SA 4.0 | | Genome Taxonomy Database release 207 v2 used by GTDB-Tk module |
| gtdbtk_214 | /opt/nesi/db/gtdbtk_214 | CC-BY-SA 4.0 | | Genome Taxonomy Database release 214 used by GTDB-Tk module |
| gtdbtk_220 | /opt/nesi/db/gtdbtk_220 | CC-BY-SA 4.0 | | Genome Taxonomy Database release 220 used by GTDB-Tk module |
| HUMAnN | /opt/nesi/db/Humann | MIT | | Databases for HUMAnN module |
| Kaiju | /opt/nesi/db/Kaiju | GPL-3.0 | | Database index for Kaiju module |
| Kraken2 | /opt/nesi/db/Kraken2 | MIT | | Databases for Kraken2 module |
| megaX | /opt/nesi/db/megaX | Free for academics | | Evolutionary analysis reference data for MegaX module |
| nullarbor | /opt/nesi/db/nullarbor_db | GPL-2.0 | | Reference databases used by the Nullarbor module |
| Pfam | /opt/nesi/db/Pfam | CC0 | | Protein family HMM database |
| PhyloPhlAn | /opt/nesi/db/PhyloPhlAn | MIT | | Databases of universal markers for prokaryotes |
| prokka | /opt/nesi/db/prokka | GPL-3.0 | | Databases needed for prokka environmental module |
| ProteinDataBank | /opt/nesi/db/ProteinDataBank | CC0 1.0 | | 3D structures of proteins and nucleic acids |
| RQCFilterData | /opt/nesi/db/RQCFilterData | Unspecified | | Reference data for read quality control |
| sortmerna | /opt/nesi/db/sortmerna_db | GPL-3.0 | | rRNA reference databases for SortMeRNA module |
| SqueezeMeta | /opt/nesi/db/SqueezeMeta | GPL-3.0 | | Databases needed for SqueezeMeta module |
| StrVCTVRE | /opt/nesi/db/StrVCTVRE | MIT | | Training and reference data for StrVCTVRE module |
| Trinotate | /opt/nesi/db/Trinotate | Public | | Databases needed for Trinotate module |
| Uniprot | /opt/nesi/db/Uniprot | CC-BY 4.0 | | Protein sequence and functional annotation database |
| VEP | /opt/nesi/db/VariantEffectPredictor | No restrictions | | Ensembl annotation data for variant effect prediction |
| VIBRANT | /opt/nesi/db/VIBRANT_v1.2.1_databases | GPL-3.0 | | Viral genome and HMM databases used by VIBRANT environmental module |
| VirSorter | /opt/nesi/db/VirSorter | GPL-2.0 | | Viral hallmark gene and profile databases |
| waafle | /opt/nesi/db/waafle | MIT | | Reference sets for gene neighborhood analysis for waafle environmental module |
| checkv | /opt/nesi/db/checkv-db-v0.6 | BSD 3-Clause-style | | Viral genome completeness and contamination database for CheckV environmental module |
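
Before pointing a tool at one of these paths, it can help to confirm what is present in the shared directory. The following is a minimal sketch: the helper name `list_databases` and the `DB_ROOT` variable are illustrative and not part of any NeSI tooling. Only the `/opt/nesi/db` path comes from this page.

```shell
# Root of the shared databases as given on this page.
# Override DB_ROOT to try the helper against another directory.
DB_ROOT="${DB_ROOT:-/opt/nesi/db}"

# list_databases [ROOT]
# Hypothetical helper: print one line per entry found under ROOT.
list_databases() {
    root="${1:-$DB_ROOT}"
    if [ ! -d "$root" ]; then
        echo "Database root not found: $root" >&2
        return 1
    fi
    ls -1 "$root"
}

# Example (on Mahuika): list_databases
# Example (inspect one database): ls -l "$DB_ROOT/Kraken2"
```

The function returns a non-zero status when the directory is missing, so a job script can stop early instead of failing later inside the analysis tool.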

Some versioned databases are also available through environment modules. Run the command shown to list the available versions:

  • AlphaFold2DB: module avail AlphaFold2DB
  • AlphaFold3DB: module avail AlphaFold3DB
  • BLASTDB: module avail BLASTDB
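
The `module avail` commands above can be wrapped in a small helper for use in scripts. This is a sketch only: `show_db_versions` is a hypothetical function name, and the `module load` line is left as a comment because the version strings depend on what `module avail` reports on Mahuika.

```shell
# show_db_versions NAME
# Hypothetical helper: list the available versions of a versioned database module.
show_db_versions() {
    if [ -z "${1:-}" ]; then
        echo "usage: show_db_versions MODULE_NAME" >&2
        return 2
    fi
    if ! command -v module >/dev/null 2>&1; then
        echo "The module command is not available in this shell" >&2
        return 1
    fi
    # Lmod-style module systems usually print this listing to stderr,
    # so merge the streams to make the output easy to capture or grep.
    module avail "$1" 2>&1
}

# Example: show_db_versions BLASTDB
# Then load one of the listed versions (VERSION is a placeholder):
#   module load BLASTDB/VERSION
```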

Requesting new or updated databases

If there is a database you think may be useful to many Mahuika users, or if you would like an updated version of one of the maintained databases, please Contact our Support Team with details about the source and version of the database of interest.

Recommendations for obtaining data from selected external databases

JGI Portals

The Joint Genome Institute (JGI) provides many databases and data portals. To download or access files from JGI, you will need to register for an account. We recommend using the Globus endpoint provided by JGI to transfer files directly from the JGI servers to Mahuika. For more information about using Globus on Mahuika, see the Globus docs section.
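
For scripted transfers, the same recommendation could be followed with the Globus command-line client rather than the web interface. This is a sketch under several assumptions: that the `globus` CLI is installed where you run it, that you have already run `globus login`, and that you have looked up the collection UUIDs yourself. The function name, the label, and all four arguments are placeholders, not values published by JGI or NeSI.

```shell
# jgi_to_mahuika SRC_UUID SRC_PATH DST_UUID DST_PATH
# Hypothetical wrapper around `globus transfer`; every argument is supplied by you.
jgi_to_mahuika() {
    if [ "$#" -ne 4 ]; then
        echo "usage: jgi_to_mahuika SRC_UUID SRC_PATH DST_UUID DST_PATH" >&2
        return 2
    fi
    # Globus addresses a file or directory as COLLECTION_UUID:PATH.
    globus transfer "$1:$2" "$3:$4" --label "JGI to Mahuika"
}
```

When the source path is a directory, `globus transfer` also needs the `--recursive` option.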

NCBI

We have several environment modules to aid in finding and downloading data from NCBI: