Objects

With pyGeno you can manipulate familiar objects in an intuitive way. All the following classes except SNP inherit from pyGenoRabaObjectWrapper and therefore have access to functions such as get(), count(), ensureIndex()…

Base classes

Base classes are abstract and are not meant to be instantiated. They nonetheless implement most of the functions that classes such as Genome possess.

class pyGeno.pyGenoObjectBases.RLWrapper(rabaObj, listObjectType, rl)[source]

A wrapper for rabalists that replaces raba objects with pyGeno objects

class pyGeno.pyGenoObjectBases.pyGenoRabaObject(*args, **fieldsDct)[source]

pyGeno uses rabaDB to persistently store data. Most persistent objects have classes that inherit from this one (Genome_Raba, Chromosome_Raba, Gene_Raba, Protein_Raba, Exon_Raba). These classes are not meant to be accessed directly. Users manipulate wrappers such as Genome, Chromosome, etc. pyGenoRabaObject extends the Raba class by adding a function, _curate(), that is called just before saving. This class is to be considered abstract and is not meant to be instantiated.

save()[source]

Calls _curate() before performing a normal rabaDB lazy save (saving only occurs if the object has been modified)
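The curate-then-lazy-save behaviour described above can be sketched in plain Python. This is only an illustration of the pattern, not rabaDB's or pyGeno's actual code: LazyRecord, GeneRecord, set(), get() and the writes counter are hypothetical names introduced for the example.

```python
class LazyRecord:
    """Hypothetical stand-in for a persistent object (not a rabaDB class)."""

    def __init__(self, **fields):
        self._fields = dict(fields)
        self._modified = True   # a new object has never been saved
        self.writes = 0         # counts simulated writes to storage

    def set(self, name, value):
        # Only flag the object as modified when a value really changes.
        if self._fields.get(name) != value:
            self._fields[name] = value
            self._modified = True

    def get(self, name):
        return self._fields[name]

    def _curate(self):
        """Hook for subclasses: tidy up the fields just before saving."""

    def save(self):
        self._curate()
        if self._modified:
            self.writes += 1    # a real backend would write to the database here
            self._modified = False


class GeneRecord(LazyRecord):
    def _curate(self):
        # Example curation step: store gene names in upper case.
        self.set("name", self.get("name").upper())


gene = GeneRecord(name="tpst2")
gene.save()   # curated, then written
gene.save()   # nothing changed since the last save, so no second write
```

Putting the curation step inside save() means every code path that persists the object goes through it, while the modified flag keeps repeated saves cheap.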

class pyGeno.pyGenoObjectBases.pyGenoRabaObjectWrapper(wrapped_object_and_bag=(), *args, **kwargs)[source]

All the wrapper classes such as Genome and Chromosome inherit from this class. It provides most of the functions that make pyGeno useful, such as get(), count(), ensureIndex(). This class is to be considered abstract and is not meant to be instantiated.

count(objectType, *args, **coolArgs)[source]

Returns the number of elements satisfying the query

classmethod dropGlobalIndex(fields)[source]

Drops an index, the opposite of ensureGlobalIndex()

classmethod ensureGlobalIndex(fields)[source]

Adds a GLOBAL index to the db to speed up lookups. Fields can be a list of fields for multi-column indices or simply the name of a single field. A global index is an index on the entire type. A global index on ‘Transcript’ on field ‘name’ will index the names of all the transcripts in the database.
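As an analogy for single-field and multi-column indices, here is what the two kinds look like in plain sqlite3. This is not pyGeno's API: the table layout, column names and index names below are assumptions made up for the example, and the sketch only assumes an SQLite-style backend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for "all transcripts in the database".
conn.execute("CREATE TABLE Transcript (id TEXT, name TEXT, chromosome TEXT, start INTEGER)")

# Single-field index: speeds up lookups by name over the whole table.
conn.execute("CREATE INDEX idx_transcript_name ON Transcript (name)")

# Multi-column index: speeds up lookups that filter on chromosome, then start.
conn.execute("CREATE INDEX idx_transcript_chro_start ON Transcript (chromosome, start)")

# List the indexes attached to the table (second column of each row is the name).
index_names = sorted(row[1] for row in conn.execute("PRAGMA index_list('Transcript')"))

# Dropping an index is the reverse operation.
conn.execute("DROP INDEX idx_transcript_name")
remaining = sorted(row[1] for row in conn.execute("PRAGMA index_list('Transcript')"))
```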

classmethod flushIndexes()[source]

Drops all the indexes attached to the object’s class. Example: Transcript.flushIndexes()

get(objectType, *args, gen=False, **coolArgs)[source]

Raba Magic inside. This is the function that you use for querying pyGeno’s DB.

Usage examples:

  • myGenome.get("Gene", name = 'TPST2')

  • myGene.get(Protein, id = 'ENSID…')

  • myGenome.get(Transcript, {'start >' : x, 'end <' : y})
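To make the filter syntax of the usage examples more concrete, the sketch below shows one way keys such as 'start >' could be interpreted, applied to plain dictionaries. It is not how rabaDB evaluates queries: matches(), query(), OPERATORS and the sample records are hypothetical and exist only for illustration.

```python
import operator

# Hypothetical mapping from the operator suffix of a key to a comparison.
OPERATORS = {
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
    "=": operator.eq, "==": operator.eq, "!=": operator.ne,
}

def matches(record, filters):
    """Return True if `record` (a dict) satisfies every 'field op' filter."""
    for key, expected in filters.items():
        field, _, op = key.partition(" ")
        compare = OPERATORS[op.strip() or "="]   # a bare field name means equality
        if not compare(record[field], expected):
            return False
    return True

def query(records, filters=None, **equalities):
    """Combine a filter dict and keyword equalities, as in the examples above."""
    conditions = dict(filters or {})
    conditions.update(equalities)
    return [record for record in records if matches(record, conditions)]

transcripts = [
    {"name": "T1", "start": 100, "end": 500},
    {"name": "T2", "start": 300, "end": 900},
    {"name": "T3", "start": 50, "end": 200},
]

inside = query(transcripts, {"start >": 80, "end <": 600})
by_name = query(transcripts, name="T2")
```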

classmethod getIndexes()[source]

Returns a list of indexes attached to the object’s class. Example: Transcript.getIndexes()

getSequencesData()[source]

This lazy abstract function is only called if the object’s sequences need to be loaded

classmethod help()[source]

Returns a list of available fields for queries. Example: Transcript.help()

iterGet(objectType, *args, **coolArgs)[source]

Same as get(), but returns the elements one by one, which is much more efficient for large outputs
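The difference between returning a full list and yielding elements one by one can be illustrated with a plain Python generator. The names fetch_rows(), get_all() and iter_get() are hypothetical and do not come from pyGeno; the sketch only shows why an iterator avoids building results that are never used.

```python
import itertools

fetched = []   # records every row the hypothetical source actually produced

def fetch_rows(n):
    """Hypothetical row source: produces rows lazily and logs each one."""
    for i in range(n):
        fetched.append(i)
        yield {"id": i}

def get_all(n):
    # List-style retrieval: every row is materialised before returning.
    return list(fetch_rows(n))

def iter_get(n):
    # Iterator-style retrieval: rows are produced only when requested.
    return fetch_rows(n)

# Taking three elements from a very large result touches only three rows.
first_three = list(itertools.islice(iter_get(1_000_000), 3))
```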

class pyGeno.pyGenoObjectBases.pyGenoRabaObjectWrapper_metaclass(name, bases, dct)[source]

This metaclass keeps track of the relationship between wrapped classes and wrappers
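A metaclass that records which wrapper belongs to which wrapped class might look like the following minimal sketch. It is not pyGeno's implementation: WrapperRegistry, the _wrapped_class attribute, wrap() and the two example classes are all assumptions introduced here.

```python
class WrapperRegistry(type):
    """Hypothetical metaclass: remembers which wrapper wraps which class."""

    wrapper_for = {}   # wrapped (storage) class -> wrapper class

    def __init__(cls, name, bases, dct):
        super().__init__(name, bases, dct)
        wrapped = dct.get("_wrapped_class")
        if wrapped is not None:
            WrapperRegistry.wrapper_for[wrapped] = cls


class GeneStorage:
    """Stands in for a storage class that really holds the data."""


class Wrapper(metaclass=WrapperRegistry):
    _wrapped_class = None   # abstract base: registers nothing

    def __init__(self, wrapped):
        self.wrapped = wrapped


class GeneWrapper(Wrapper):
    _wrapped_class = GeneStorage   # registered when the class is created


def wrap(obj):
    """Return the wrapper registered for the type of `obj`."""
    return WrapperRegistry.wrapper_for[type(obj)](obj)


wrapped_gene = wrap(GeneStorage())
```

Because registration happens at class creation time, code that receives a raw storage object can find the matching wrapper without a hand-maintained lookup table.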

Genome

class pyGeno.Genome.Genome(SNPs=None, SNPFilter=None, *args, **kwargs)[source]

This is the entry point to pyGeno:

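from pyGeno.Genome import Genome
from pyGeno.Protein import Protein

# MyFilter stands for a user-defined SNP filter
myGeno = Genome(name = 'GRCh37.75', SNPs = ['RNA_S1', 'DNA_S1'], SNPFilter = MyFilter)
for prot in myGeno.get(Protein) :
    print(prot.sequence)

class pyGeno.Genome.Genome_Raba(*args, **fieldsDct)[source]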

The wrapped Raba object that really holds the data

pyGeno.Genome.getGenomeList()[source]

Return the names of all imported genomes

Chromosome

class pyGeno.Chromosome.Chromosome(*args, **kwargs)[source]

The wrapper for playing with Chromosomes

class pyGeno.Chromosome.Chromosome_Raba(*args, **fieldsDct)[source]

The wrapped Raba object that really holds the data

class pyGeno.Chromosome.ChrosomeSequence(data, chromosome, refOnly=False)[source]

Represents a chromosome sequence. If refOnly is True, no polymorphisms are applied and the reference sequence is always returned

Gene

class pyGeno.Gene.Gene(*args, **kwargs)[source]

The wrapper for playing with genes

class pyGeno.Gene.Gene_Raba(*args, **fieldsDct)[source]

The wrapped Raba object that really holds the data

Transcript

class pyGeno.Transcript.Transcript(*args, **kwargs)[source]

The wrapper for playing with Transcripts

find(sequence)[source]

Returns the position of the first occurrence of sequence

findAll(sequence)[source]

Returns a list of all positions where sequence was found
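The difference between a first-occurrence search and an all-occurrences search can be shown on a plain string. This is a sketch rather than pyGeno's code: it assumes 0-based positions, counts overlapping matches, and uses -1 for "not found", none of which is stated in this reference.

```python
def find(sequence, motif):
    """Position of the first occurrence of motif, or -1 if absent (assumption)."""
    return sequence.find(motif)

def find_all(sequence, motif):
    """Positions of all occurrences of motif, overlapping ones included."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        # Restart one character further so overlapping matches are kept.
        start = sequence.find(motif, start + 1)
    return positions

cdna = "ATGATGAATGA"
first = find(cdna, "ATGA")
every = find_all(cdna, "ATGA")
```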

findAllInUTR3(sequence)[source]

Returns a list of all positions where sequence was found in the 3’UTR

findAllInUTR5(sequence)[source]

Returns a list of all positions where sequence was found in the 5’UTR

findAllIncDNA(sequence)[source]

Returns a list of all positions where sequence was found in the cDNA

findInUTR3(sequence)[source]

Returns the position of the first occurrence of sequence in the 3’UTR

findInUTR5(sequence)[source]

Returns the position of the first occurrence of sequence in the 5’UTR

findIncDNA(sequence)[source]

Returns the position of the first occurrence of sequence in the cDNA

getCodon(i)[source]

Returns the ith codon

getNbCodons()[source]

Returns the number of codons in the transcript

getNucleotideCodon(cdnaX1)[source]

Returns the entire codon containing the nucleotide at position cdnaX1 in the cDNA, and the position of that nucleotide within the codon

getUTR3Length()[source]

Returns the length of the 3’UTR

getUTR5Length()[source]

Returns the length of the 5’UTR

getcDNALength()[source]

Returns the length of the cDNA

iterCodons()[source]

Iterates through the codons
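The codon helpers above come down to simple index arithmetic, sketched here on a plain string. This is not pyGeno's code: the function names are illustrative, positions are assumed to be 0-based, and the reading frame is assumed to start at the first nucleotide of the cDNA.

```python
def nb_codons(cdna):
    """Number of complete codons in the sequence."""
    return len(cdna) // 3

def get_codon(cdna, i):
    """The ith codon (0-based)."""
    return cdna[3 * i : 3 * i + 3]

def get_nucleotide_codon(cdna, position):
    """Codon containing the nucleotide at `position`, plus its offset in the codon."""
    codon_start = (position // 3) * 3
    return cdna[codon_start : codon_start + 3], position % 3

def iter_codons(cdna):
    """Yield the complete codons one after another."""
    for i in range(nb_codons(cdna)):
        yield get_codon(cdna, i)

cdna = "ATGGCCTAA"
codon, offset = get_nucleotide_codon(cdna, 4)   # nucleotide 'C' at index 4
codons = list(iter_codons(cdna))
```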

class pyGeno.Transcript.Transcript_Raba(*args, **fieldsDct)[source]

The wrapped Raba object that really holds the data

Exon

class pyGeno.Exon.Exon(*args, **kwargs)[source]

The wrapper for playing with Exons

find(sequence)[source]

Returns the position of the first occurrence of sequence

findAll(sequence)[source]

Returns a list of all positions where sequence was found

findAllInCDS(sequence)[source]

Returns a list of all positions where sequence was found in the CDS

findInCDS(sequence)[source]

Returns the position of the first occurrence of sequence in the CDS

getCDSLength()[source]

Returns the length of the CDS sequence

hasCDS()[source]

Returns True if the exon has a CDS, False otherwise

nextExon()[source]

Returns the next exon of the transcript, or None if there is none

pluck()[source]

Returns a plucked object. Plucks the exon off the tree and sets the value of self.transcript to str(self.transcript). This effectively disconnects the object and makes it much lighter in case you’d like to pickle it
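The idea behind plucking, replacing a reference to a heavy parent object with its string form, can be sketched as follows. TranscriptNode and ExonNode are hypothetical stand-ins, not pyGeno classes, and whether pyGeno copies the exon or modifies it in place is not stated here; the sketch assumes a shallow copy.

```python
import copy
import pickle

class TranscriptNode:
    """Hypothetical parent object carrying a large payload."""
    def __init__(self, name, sequence):
        self.name = name
        self.sequence = sequence

    def __str__(self):
        return "Transcript " + self.name

class ExonNode:
    """Hypothetical child object holding a reference to its parent."""
    def __init__(self, number, transcript):
        self.number = number
        self.transcript = transcript

    def pluck(self):
        # Shallow copy, then swap the parent reference for its string form.
        plucked = copy.copy(self)
        plucked.transcript = str(self.transcript)
        return plucked

transcript = TranscriptNode("T1", "ACGT" * 10000)
exon = ExonNode(1, transcript)
plucked = exon.pluck()

# Only plain values remain, so the serialised attributes stay small.
payload = pickle.dumps(vars(plucked))
```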

previousExon()[source]

Returns the previous exon of the transcript, or None if there is none

class pyGeno.Exon.Exon_Raba(*args, **fieldsDct)[source]

The wrapped Raba object that really holds the data

Protein

class pyGeno.Protein.Protein(*args, **kwargs)[source]

The wrapper for playing with Proteins

find(sequence)[source]

Returns the position of the first occurrence of sequence, taking polymorphisms into account

findAll(sequence)[source]

Returns the positions of all occurrences of sequence, taking polymorphisms into account

findString(sequence)[source]

Returns the first occurrence of sequence, using a simple string search that ignores polymorphisms

findStringAll(sequence)[source]

Returns all occurrences of sequence, using a simple string search that ignores polymorphisms

getDefaultSequence()[source]

Returns a str version of the sequence in which only the last allele of each polymorphism is shown
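A small sketch of what "only the last allele of each polymorphism" could mean in practice. The representation used here, one tuple of alleles per position, is an assumption made for the example and is not necessarily how pyGeno stores polymorphic sequences; default_sequence() and polymorphisms() are illustrative names.

```python
def default_sequence(positions):
    """Join the last allele of every position into a plain string."""
    return "".join(alleles[-1] for alleles in positions)

def polymorphisms(positions):
    """List (index, alleles) for every position with more than one allele."""
    return [(i, alleles) for i, alleles in enumerate(positions) if len(alleles) > 1]

# Hypothetical protein: positions 1 and 3 are polymorphic.
protein = [("M",), ("K", "R"), ("T",), ("A", "V", "G")]

default = default_sequence(protein)
variants = polymorphisms(protein)
```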

getPolymorphisms()[source]

Returns a list of all polymorphisms contained in the protein

class pyGeno.Protein.Protein_Raba(*args, **fieldsDct)[source]

The wrapped Raba object that really holds the data

SNP

class pyGeno.SNP.AgnosticSNP(*args, **fieldsDct)[source]

This is a generic SNPs/Indels format that you can easily make from the result of any SNP caller. AgnosticSNP files are tab delimited files such as:

chromosomeNumber  uniqueId  start    end      ref  alleles  quality  caller
Y                 1         2655643  2655644  T    AG       30       TopHat
Y                 2         2655645  2655647  -    AG       28       TopHat
Y                 3         2655648  2655650  TT   -        10       TopHat

All positions must be 0-based. The '-' indicates a deletion or an insertion. Column order does not matter.
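A tab-delimited file in this format can be read with the standard csv module; because rows are keyed by the header, column order does not matter. This is a sketch and not pyGeno's importer: read_agnostic_snps() and classify() are hypothetical helpers, and reading a '-' in ref as an insertion and a '-' in alleles as a deletion is an interpretation of the example rows.

```python
import csv
import io

HEADER = ["chromosomeNumber", "uniqueId", "start", "end", "ref", "alleles", "quality", "caller"]
ROWS = [
    ["Y", "1", "2655643", "2655644", "T", "AG", "30", "TopHat"],
    ["Y", "2", "2655645", "2655647", "-", "AG", "28", "TopHat"],
    ["Y", "3", "2655648", "2655650", "TT", "-", "10", "TopHat"],
]
AGNOSTIC_SNP_TEXT = "\n".join("\t".join(fields) for fields in [HEADER] + ROWS) + "\n"

def read_agnostic_snps(handle):
    """Parse tab-delimited SNP rows into dicts keyed by column name."""
    snps = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row["start"] = int(row["start"])       # 0-based positions
        row["end"] = int(row["end"])
        row["quality"] = float(row["quality"])
        snps.append(row)
    return snps

def classify(snp):
    """Assumed reading of '-': in ref it marks an insertion, in alleles a deletion."""
    if snp["ref"] == "-":
        return "insertion"
    if snp["alleles"] == "-":
        return "deletion"
    return "snp"

snps = read_agnostic_snps(io.StringIO(AGNOSTIC_SNP_TEXT))
kinds = [classify(snp) for snp in snps]
```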

class pyGeno.SNP.CasavaSNP(*args, **fieldsDct)[source]

A SNP of Casava

class pyGeno.SNP.SNPMaster(*args, **fieldsDct)[source]

This object keeps track of SNP sets and their types

class pyGeno.SNP.SNP_INDEL(*args, **fieldsDct)[source]

All SNPs should inherit from me. The name of the class must end with SNP

class pyGeno.SNP.TopHatSNP(*args, **fieldsDct)[source]

A SNP from Top Hat, not implemented

class pyGeno.SNP.dbSNPSNP(*args, **fieldsDct)[source]

This class is for SNPs from dbSNP. Feel free to uncomment the fields that you need

pyGeno.SNP.getSNPSetsList()[source]

Return the names of all imported SNP sets