pyGeno’s database is populated by importing tar.gz compressed archives called datawraps. Importing is a one-time step: once a datawrap has been imported, it can be discarded with no consequences for the database.

Here’s how you import a reference genome datawrap:

from pyGeno.importation.Genomes import *
importGenome('GRCm38_test.tar.gz')  # example filename; pass the path to your own datawrap

And a SNP set datawrap:

from pyGeno.importation.SNPs import *
importSNPs('dummySRY.tar.gz')  # example filename; pass the path to your own datawrap

pyGeno comes with a few datawraps that you can quickly import using the Bootstrapping module.

You can find a list of datawraps to import here: Datawraps

You can also easily create your own by listing URLs in a manifest.ini file and compressing it into a tar.gz archive (as explained below or on the Wiki).
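As a rough illustration of that packaging step, the sketch below writes a manifest.ini and compresses it into a tar.gz archive using only the Python standard library. The helper `build_datawrap`, the section names, and the example.org URLs are assumptions made for this example; they are not part of pyGeno, so check the manifest layout pyGeno expects before relying on them.

```python
import configparser
import io
import tarfile


def build_datawrap(archive_path, sections):
    """Write a manifest.ini built from `sections` into a new tar.gz archive."""
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str  # preserve key case, e.g. a chromosome named "Y"
    config.read_dict(sections)

    buffer = io.StringIO()
    config.write(buffer)
    payload = buffer.getvalue().encode("utf-8")

    # Store the manifest at the root of the archive.
    info = tarfile.TarInfo(name="manifest.ini")
    info.size = len(payload)
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.addfile(info, io.BytesIO(payload))
    return archive_path


# Hypothetical content: section names and URLs are placeholders (assumptions).
EXAMPLE_SECTIONS = {
    "package_infos": {"description": "Example datawrap", "version": "1"},
    "genome": {"species": "Mus_musculus", "name": "GRCm38_test"},
    "chromosome_files": {"Y": "ftp://example.org/chromosome.Y.fa.gz"},
    "gene_set": {"gtf": "ftp://example.org/Y-only.gtf.gz"},
}
```

Calling `build_datawrap("GRCm38_test.tar.gz", EXAMPLE_SECTIONS)` would produce an archive whose only member is manifest.ini, which is the URL-only form of datawrap described above.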



Backs up the current database version. Automatically called by importGenome(). Returns the filename of the backup.

importation.Genomes.deleteGenome(species, name)

Removes a genome from the database

importation.Genomes.importGenome(packageFile, batchSize=50, verbose=0)

Imports a pyGeno genome package. A genome package is a folder or a tar.gz archive that contains at its root:

  • gzipped FASTA files for all chromosomes, or URLs from which they must be downloaded

  • a gzipped GTF gene_set file from Ensembl, or a URL from which it must be downloaded

  • a manifest.ini file such as:

    description = Test package. This package installs only chromosome Y of mus musculus
    maintainer = Tariq Daouda
    maintainer_contact = tariq.daouda [at] umontreal
    version = GRCm38.73
    species = Mus_musculus
    name = GRCm38_test
    source =
    Y = Mus_musculus.GRCm38.73.dna.chromosome.Y.fa.gz / or a URL such as ftp://... or http://
    gtf = Mus_musculus.GRCm38.73_Y-only.gtf.gz / or a URL such as ftp://... or http://
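
Since each file entry in the manifest may be either a filename at the package root or a URL, a package can be sanity-checked before import. The following is a minimal sketch of such a check; `is_remote` and `missing_local_files` are hypothetical helpers, not pyGeno functions, and accepting https in addition to ftp and http is an assumption.

```python
from urllib.parse import urlparse

# Schemes treated as "must be downloaded" (https is an assumption).
REMOTE_SCHEMES = {"ftp", "http", "https"}


def is_remote(value):
    """Return True when a manifest value looks like a URL to download."""
    return urlparse(value.strip()).scheme.lower() in REMOTE_SCHEMES


def missing_local_files(entries, package_files):
    """Return the keys whose value names a local file absent from the package."""
    present = set(package_files)
    return [
        key
        for key, value in entries.items()
        if not is_remote(value) and value.strip() not in present
    ]
```

For example, `missing_local_files({"Y": "Mus_musculus.GRCm38.73.dna.chromosome.Y.fa.gz"}, ["manifest.ini"])` reports `["Y"]`, because the FASTA file is neither a URL nor present in the package.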

All files except the manifest can be downloaded from:

A rollback is performed if an exception is caught during importation
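
The rollback behaviour can be pictured with a generic all-or-nothing import. The sketch below uses Python's built-in sqlite3 module and a hypothetical `genes` table; it illustrates the pattern only and is not pyGeno's actual implementation.

```python
import sqlite3


def import_rows(connection, names):
    """Insert every name or none of them: roll back if any insert fails."""
    try:
        for name in names:
            connection.execute("INSERT INTO genes (name) VALUES (?)", (name,))
        connection.commit()
    except Exception:
        # Undo the inserts made since the last commit, then re-raise.
        connection.rollback()
        raise


def make_database():
    """Create an in-memory database with a hypothetical genes table."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE genes (name TEXT NOT NULL UNIQUE)")
    connection.commit()
    return connection
```

If a later insert violates the UNIQUE constraint, the earlier inserts from the same call are undone as well, so the table is left exactly as it was before the import started.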

batchSize sets the number of genes to parse before each database save. Machines with little RAM do better with small values, while those with more memory may run faster with larger ones.
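
To make the memory trade-off concrete, here is a small sketch of batched saving: items are accumulated in groups of `batch_size` and handed to a save callback once per group. The names `batches` and `import_in_batches` and the `save` callback are hypothetical; pyGeno's internal loop may be organised differently.

```python
from itertools import islice


def batches(items, batch_size):
    """Yield successive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def import_in_batches(genes, save, batch_size=50):
    """Call `save` once per batch and return the number of saves performed."""
    saves = 0
    for batch in batches(genes, batch_size):
        save(batch)  # only one batch is held in memory at a time
        saves += 1
    return saves


sizes = []
import_in_batches(range(120), lambda batch: sizes.append(len(batch)), batch_size=50)
# sizes is now [50, 50, 20]
```

A smaller batch size means more saves but fewer parsed objects held in memory at once, which is the trade-off described above.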

verbose must be an int in [0, 4] that sets the level of verbosity.

Polymorphisms (SNPs and Indels)


Deletes a set of polymorphisms.


The main wrapper: this function detects the SNP type from the package manifest and then launches the corresponding import function. Here’s an example of a SNP manifest file for Casava SNPs:

description = Casava SNPs for testing purposes
maintainer = Tariq Daouda
maintainer_contact = tariq.daouda [at] umontreal
version = 1

species = human
name = dummySRY
type = Agnostic
source = my place at the IRIC

filename = snps.txt # as with genomes, you can either include the file at the root of the package or specify a URL from which it must be downloaded
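
The type-based dispatch described for the wrapper can be sketched as a lookup table keyed on the manifest's `type` value. Everything below is illustrative: `parse_manifest`, the two importer functions, and the `IMPORTERS` table are hypothetical names, and the simple line parser is an assumption that ignores section headers and treats `#` as the start of a comment (so it would truncate a URL containing `#`).

```python
def parse_manifest(text):
    """Parse `key = value` lines, skipping blanks, [section] headers and # comments."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("[") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        entries[key.strip()] = value.strip()
    return entries


def import_agnostic(entries):
    """Placeholder importer: a real one would read entries['filename']."""
    return "agnostic:" + entries["filename"]


def import_casava(entries):
    """Placeholder importer for Casava-formatted SNP files."""
    return "casava:" + entries["filename"]


# Hypothetical mapping from manifest type (lower-cased) to importer.
IMPORTERS = {"agnostic": import_agnostic, "casava": import_casava}


def dispatch(entries):
    """Pick the importer matching the manifest's type and run it."""
    snp_type = entries["type"].lower()
    if snp_type not in IMPORTERS:
        raise ValueError("unsupported SNP type: " + entries["type"])
    return IMPORTERS[snp_type](entries)
```

With the manifest above, `dispatch(parse_manifest(text))` would select the agnostic importer, because the `type` field reads `Agnostic`.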