Tools

pyGeno provides a set of tools that can be used independently. Here you’ll find goodies for translation, indexing, and more.

Progress Bar

pyGeno’s awesome progress bar, with logging capabilities and remaining time estimation.

class tools.ProgressBar.ProgressBar(nbEpochs=-1, width=25, label='progress', minRefeshTime=1)

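A very simple unthreaded progress bar. This progress bar also logs stats in .logs. Usage example:

# assuming pyGeno is installed as the pyGeno package
from pyGeno.tools.ProgressBar import ProgressBar

p = ProgressBar(nbEpochs = -1)
for i in range(200000) :
    p.update(label = 'value of i %d' % i)
p.close()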

If you don’t know the maximum number of epochs, set nbEpochs to a value lower than 1.

close()

Closes the bar so that your next print starts on a new line.

log()

Logs stats about the progression without printing anything on screen.

saveLogs(filename)

Dumps the logs into a pickle file.

update(label='', forceRefresh=False, log=False)

The function to call at each iteration. Setting log = True is the same as calling log() just after update().
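
The remaining-time estimation mentioned above can be pictured with a small, self-contained sketch. This is not pyGeno's implementation: the function name estimate_remaining and the linear extrapolation are assumptions made for illustration only.

```python
def estimate_remaining(elapsed, done, total):
    """Linearly extrapolate the time left from the progress made so far.

    Returns None when the total is unknown (total < 1) or nothing is done yet.
    """
    if total < 1 or done < 1:
        return None
    return elapsed * (total - done) / done


# 50 of 200 epochs took 10 seconds, so 150 epochs remain at the same pace.
print(estimate_remaining(10.0, 50, 200))  # 30.0
```

An unknown number of epochs (nbEpochs < 1 in the class above) maps naturally to "no estimate" in such a scheme.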

Useful functions

This module is a collection of handy bioinformatics functions.

tools.UsefulFunctions.complement(seq)

Returns the complementary sequence without reversing it.

tools.UsefulFunctions.decodePolymorphicNucleotide(nuc)

The opposite of encodePolymorphicNucleotide: from ‘R’ to [‘A’, ‘G’].

tools.UsefulFunctions.decodePolymorphicNucleotide_str(nuc)

Same as decodePolymorphicNucleotide, but returns a string instead of a list: from ‘R’ to ‘A/G’.

tools.UsefulFunctions.encodePolymorphicNucleotide(polySeq)

Returns a single character encoding all the nucleotides of polySeq. polySeq must have one of the following forms: [‘A’, ‘T’, ‘G’], ‘ATG’, ‘A/T/G’.
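
The ‘R’ to [‘A’, ‘G’] example matches the standard IUPAC ambiguity codes, so the encode/decode pair can be sketched as below. This is an illustration under that assumption, not pyGeno's code; encode_poly and decode_poly are hypothetical names.

```python
# IUPAC ambiguity codes for DNA (standard nomenclature).
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: set of alleles -> ambiguity code.
ENCODE = {frozenset(alleles): code for code, alleles in IUPAC.items()}


def encode_poly(poly_seq):
    """Accept ['A', 'G'], 'AG' or 'A/G' and return a single character."""
    if isinstance(poly_seq, str):
        poly_seq = poly_seq.replace("/", "")
    alleles = frozenset(poly_seq)
    if len(alleles) == 1:
        return next(iter(alleles))  # a single allele encodes as itself
    return ENCODE[alleles]


def decode_poly(code):
    """Return the list of alleles behind a code, e.g. 'R' -> ['A', 'G']."""
    return list(IUPAC.get(code, code))


print(encode_poly("A/G"))            # R
print(decode_poly("R"))              # ['A', 'G']
print("/".join(decode_poly("R")))    # A/G
```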

tools.UsefulFunctions.findAll(haystack, needle)

Returns a list of all occurrences of needle in haystack.
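
A minimal sketch of such a search, using only str.find. Whether pyGeno reports overlapping matches is not stated here; this version does, and find_all is a hypothetical name.

```python
def find_all(haystack, needle):
    """Return the start index of every occurrence, overlapping ones included."""
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        # Resume one character further so overlapping matches are found too.
        start = haystack.find(needle, start + 1)
    return positions


print(find_all("ATATAT", "ATA"))  # [0, 2]
```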

tools.UsefulFunctions.getNucleotideCodon(sequence, x1)

Returns a tuple containing the entire codon of the nucleotide at position x1 in sequence, and the position of that nucleotide within the codon.
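
The arithmetic behind this lookup can be sketched as follows, assuming 0-based positions and a reading frame that starts at the first nucleotide of the sequence (both are assumptions; nucleotide_codon is a hypothetical name).

```python
def nucleotide_codon(sequence, x1):
    """Return (codon, offset) for the nucleotide at 0-based position x1.

    Assumes the reading frame starts at position 0 of the sequence.
    """
    offset = x1 % 3       # position of the nucleotide inside its codon
    start = x1 - offset   # first position of that codon
    return sequence[start:start + 3], offset


print(nucleotide_codon("ATGGCC", 4))  # ('GCC', 1)
```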

tools.UsefulFunctions.getSequenceCombinaisons(polymorphipolymorphicDnaSeqSeq, pos=0)

Takes a DNA sequence with polymorphisms and returns all the possible sequences that it can yield.
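
One way to picture the expansion is a Cartesian product over the alleles of each position. The sketch below assumes polymorphic positions are written as IUPAC ambiguity codes; it is not pyGeno's implementation, and sequence_combinations is a hypothetical name.

```python
from itertools import product

# IUPAC ambiguity codes for DNA (standard nomenclature).
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def sequence_combinations(poly_seq):
    """Expand every ambiguity code and return all concrete sequences."""
    # Plain nucleotides stand for themselves; codes expand to their alleles.
    choices = [IUPAC.get(base, base) for base in poly_seq]
    return ["".join(combo) for combo in product(*choices)]


print(sequence_combinations("ARY"))  # ['AAC', 'AAT', 'AGC', 'AGT']
```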

tools.UsefulFunctions.highlightSubsequence(sequence, x1, x2, start=' [', stop='] ')

Returns a sequence where the subsequence in [x1, x2[ (x1 inclusive, x2 exclusive) is placed between ‘start’ and ‘stop’.
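
The half-open interval maps directly onto Python slicing, as this small sketch shows. It reuses the default markers from the signature above; highlight is a hypothetical name, and the code is an illustration rather than pyGeno's own.

```python
def highlight(sequence, x1, x2, start=" [", stop="] "):
    """Wrap sequence[x1:x2] (x1 inclusive, x2 exclusive) in start and stop."""
    return sequence[:x1] + start + sequence[x1:x2] + stop + sequence[x2:]


print(highlight("ATGGCC", 2, 4))  # AT [GG] CC
```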

tools.UsefulFunctions.polymorphicCodonCombinaisons(codon)

Returns all the possible amino acids encoded by codon.

tools.UsefulFunctions.reverseComplement(seq)

Complements a DNA sequence, returning the reverse complement.
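
The difference between complement and reverseComplement can be illustrated with a short, self-contained sketch. It only handles the four unambiguous bases; how pyGeno treats polymorphic characters is not covered here, and the function names are hypothetical.

```python
# Translation table for the four unambiguous bases, upper and lower case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq):
    """Complement each base, keeping the original orientation."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq):
    """Complement each base, then reverse the result."""
    return complement(seq)[::-1]


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```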

tools.UsefulFunctions.showDifferences(seq1, seq2)

Returns a string highlighting differences between seq1 and seq2:

  • Matches: ‘-’
  • Differences: ‘A|T’
  • Exceeded length: ‘#’
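
The three markers listed above can be combined as in the sketch below. The exact layout pyGeno produces may differ; this only illustrates the convention, and show_differences is a hypothetical name.

```python
def show_differences(seq1, seq2):
    """Mark matches with '-', mismatches with 'x|y', extra length with '#'."""
    marks = []
    for a, b in zip(seq1, seq2):
        marks.append("-" if a == b else "%s|%s" % (a, b))
    # One '#' per position where only the longer sequence has a character.
    marks.append("#" * abs(len(seq1) - len(seq2)))
    return "".join(marks)


print(show_differences("ATGC", "ATCCAA"))  # --G|C-##
```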

tools.UsefulFunctions.translateDNA(sequence, frame='f1')

Translates a DNA sequence. frame selects the reading frame: forward 1, 2 or 3, or reverse 1, 2 or 3. The default, 'f1', is the first forward frame.

tools.UsefulFunctions.translateDNA_6Frames(sequence)

Returns 6 translations of sequence, one for each reading frame.
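
A six-frame translation can be sketched with the standard genetic code as follows. This is an illustration, not pyGeno's implementation: it assumes upper-case, unambiguous DNA, drops incomplete trailing codons, and uses hypothetical function names.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' is a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(sequence, offset=0):
    """Translate the complete codons found from offset onwards."""
    codons = (sequence[i:i + 3] for i in range(offset, len(sequence) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def translate_six_frames(sequence):
    """Return three forward translations followed by three reverse ones."""
    reverse = sequence.translate(COMPLEMENT)[::-1]
    return ([translate(sequence, i) for i in range(3)]
            + [translate(reverse, i) for i in range(3)])


print(translate("ATGGCCTAA"))             # MA*
print(translate_six_frames("ATGGCCTAA"))  # ['MA*', 'WP', 'GL', 'LGH', '*A', 'RP']
```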

Binary sequences

Classes for encoding sequences in binary formats.

class tools.BinarySequence.AABinarySequence(sequence)

A binary sequence of amino acids

class tools.BinarySequence.BinarySequence(sequence, arrayForma, charToBinDict)

A class for representing sequences in a binary format

decode(binSequence)

Decodes a binary sequence and returns a string.

encode(sequence)

Returns a tuple (binary representation, default sequence, polymorphisms list).

find(strSeq)

Returns the first occurrence of strSeq in self. Takes polymorphisms into account.

findAll(strSeq)

Same as find, but returns a list of all occurrences.
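
To see why a binary format makes polymorphism-aware searches cheap, consider the sketch below. It gives each nucleotide its own bit, so a polymorphic position is the OR of its alleles and a match is a non-zero AND. The bit values, the ‘A/G’-style input notation and the function names are assumptions for illustration; pyGeno's actual encoding may differ.

```python
# One bit per nucleotide; a polymorphic position sets several bits.
BITS = {"A": 1, "C": 2, "G": 4, "T": 8}


def encode(sequence):
    """Encode e.g. 'AC/GT' as [1, 2 | 4, 8]; '/' joins alleles of one position."""
    encoded = []
    join_next = False
    for char in sequence:
        if char == "/":
            join_next = True
        elif join_next:
            encoded[-1] |= BITS[char]  # add an allele to the previous position
            join_next = False
        else:
            encoded.append(BITS[char])
    return encoded


def find_all(encoded, pattern):
    """Positions where every pattern base shares a bit with the sequence."""
    query = encode(pattern)
    return [
        i for i in range(len(encoded) - len(query) + 1)
        if all(encoded[i + j] & q for j, q in enumerate(query))
    ]


seq = encode("AC/GTACT")
print(find_all(seq, "AGT"))  # [0]
print(find_all(seq, "ACT"))  # [0, 3]
```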

findPolymorphisms(strSeq, strict=False)

Compares strSeq with self.sequence. If not ‘strict’, this function ignores the cases of matching heterozygosity (e.g. for a given position i, strSeq[i] = ‘A’ and self.sequence[i] = ‘A/G’). If ‘strict’, it returns all positions where strSeq differs from self.sequence.

getDefaultSequence()

Returns a default version of the sequence where only the last allele of each polymorphism is shown.

getNbVariants(x1, x2=-1)

Returns the number of variants of the sequence between x1 and x2.

getPolymorphisms()

Returns all polymorphisms in the form of a dict pos => alleles.

getSequenceVariants(x1=0, x2=-1, maxVariantNumber=128)

Returns the sequences resulting from all combinations of all polymorphisms between x1 and x2. The result is a pair (bool, variants of the sequence between x1 and x2), where bool is True if there are more combinations than maxVariantNumber.
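
The number of variants is the product of the allele counts of each position, which is why a cap such as maxVariantNumber is useful. The sketch below illustrates the (bool, variants) return shape on a plain list of allele strings. It assumes the list is truncated at the cap, which may not be what pyGeno does; sequence_variants is a hypothetical name.

```python
from itertools import islice, product


def sequence_variants(positions, max_variants=128):
    """positions lists the alleles of each position, e.g. ['A', 'CG', 'T'].

    Returns (overflow, variants): overflow is True when more combinations
    exist than max_variants, in which case only the first ones are returned.
    """
    total = 1
    for alleles in positions:
        total *= len(alleles)  # combinations multiply across positions
    variants = ["".join(v) for v in islice(product(*positions), max_variants)]
    return total > max_variants, variants


print(sequence_variants(["A", "CG", "T"]))       # (False, ['ACT', 'AGT'])
print(sequence_variants(["AG", "CT", "AG"], 4))  # (True, ['ACA', 'ACG', 'ATA', 'ATG'])
```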

class tools.BinarySequence.NucBinarySequence(sequence)

A binary sequence of nucleotides

Segment tree

Segment trees are an optimised way to index a genome.

class tools.SegmentTree.SegmentTree(x1=None, x2=None, name='', referedObject=[], father=None, level=0)

Optimised genome annotations. A segment tree is a tree of segments. The first position of a segment is inclusive and the second exclusive; they are referred to as x1 and x2 respectively. A segment tree has the following properties:

  • The root has no x1 or x2 (both set to None).
  • Segments are arranged in ascending order.
  • For two segments S1 and S2: [S2.x1, S2.x2[ ⊂ [S1.x1, S1.x2[ <=> S2 is a child of S1.

Here’s an example of a tree:

  • Root : 0-15
  • ---->Segment : 0-12
  • ------->Segment : 1-6
  • ---------->Segment : 2-3
  • ---------->Segment : 4-5
  • ------->Segment : 7-8
  • ------->Segment : 9-10
  • ---->Segment : 11-14
  • ------->Segment : 12-14
  • ---->Segment : 13-15

Each segment can have a ‘name’ and a ‘referedObject’. Refered objects are stored within the tree for future use. These objects are always stored in lists; if referedObject is already a list, it is stored as is.
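
The containment rule can be made concrete with a stripped-down sketch that rebuilds the example tree. It is not pyGeno's SegmentTree: the Segment class, its contains method and the spans helper are hypothetical, and refered objects are left out.

```python
class Segment:
    """A half-open segment [x1, x2[; the root has x1 = x2 = None."""

    def __init__(self, x1=None, x2=None, name=""):
        self.x1, self.x2, self.name = x1, x2, name
        self.children = []

    def contains(self, x1, x2):
        # The root (x1 is None) contains every segment.
        return self.x1 is None or (self.x1 <= x1 and x2 <= self.x2)

    def insert(self, x1, x2, name=""):
        """Insert [x1, x2[ below the deepest segment that contains it."""
        for child in self.children:
            if (child.x1, child.x2) == (x1, x2):
                # Same coordinates: merge the names instead of duplicating.
                child.name = "%s U %s" % (child.name, name)
                return child
            if child.contains(x1, x2):
                return child.insert(x1, x2, name)
        new = Segment(x1, x2, name)
        # Existing children that fit inside the new segment move below it.
        new.children = [c for c in self.children if new.contains(c.x1, c.x2)]
        self.children = [c for c in self.children
                         if not new.contains(c.x1, c.x2)]
        self.children.append(new)
        self.children.sort(key=lambda c: (c.x1, c.x2))
        return new


def spans(segment):
    """Coordinates of the direct children, in ascending order."""
    return [(c.x1, c.x2) for c in segment.children]


root = Segment()
for x1, x2 in [(0, 12), (1, 6), (2, 3), (4, 5), (7, 8), (9, 10),
               (11, 14), (12, 14), (13, 15)]:
    root.insert(x1, x2)
print(spans(root))              # [(0, 12), (11, 14), (13, 15)]
print(spans(root.children[0]))  # [(1, 6), (7, 8), (9, 10)]
```

Note that 11-14 and 13-15 overlap 0-12 without being contained in it, so they stay at the first level, as in the example tree.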

emptyChildren()

Removes all children.

flatten()

Flattens the tree. The tree becomes a tree of depth 1 in which overlapping regions have been merged together.
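
Merging overlapping regions can be sketched on a plain list of (x1, x2) pairs. This is the classic sort-and-sweep approach, not necessarily what pyGeno does internally. It treats regions as half-open and leaves regions that merely touch unmerged, which is an assumption; merge_overlapping is a hypothetical name.

```python
def merge_overlapping(regions):
    """Merge overlapping half-open regions (x1, x2) into a sorted flat list."""
    merged = []
    for x1, x2 in sorted(regions):
        if merged and x1 < merged[-1][1]:
            # Overlap with the previous region: extend it if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], x2))
        else:
            merged.append((x1, x2))
    return merged


print(merge_overlapping([(0, 12), (11, 14), (13, 15), (20, 25)]))
# [(0, 15), (20, 25)]
```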

getFirstLevel()

Returns a list of pairs (x1, x2) of all the first-level indexed regions.

getIndexedLength()

Returns the total length of indexed regions

getX1()

Returns the starting position of the tree

getX2()

Returns the ending position of the tree

insert(x1, x2, name='', referedObject=[])

Inserts the segment in its right place and returns it. If there is already a segment S such that S.x1 == x1 and S.x2 == x2, S.name is changed to ‘S.name U name’ and referedObject is appended to the existing list.

insertTree(childTree)

Inserts childTree in the right position (regions will be rearranged to fit the organisation of self).

intersect(x1, x2=None)

Returns a list of all segments intersected by [x1, x2].
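
The overlap test behind such a query can be sketched on a flat list of half-open segments. Treating a missing x2 as a single-position query is an assumption, and this linear scan ignores the tree structure that a real segment tree can use to skip whole branches; intersect here is a stand-alone hypothetical function.

```python
def intersect(segments, x1, x2=None):
    """Return the half-open segments (s1, s2) overlapping the range [x1, x2]."""
    if x2 is None:
        x2 = x1  # assumed meaning of the default: query a single position
    # [s1, s2[ overlaps [x1, x2] when it starts by x2 and ends after x1.
    return [(s1, s2) for s1, s2 in segments if s1 <= x2 and x1 < s2]


first_level = [(0, 12), (11, 14), (13, 15)]
print(intersect(first_level, 12))      # [(11, 14)]
print(intersect(first_level, 11, 13))  # [(0, 12), (11, 14), (13, 15)]
```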

move(newX1)

Moves the tree to a new starting position and updates the x1s of its children.

removeGaps()

Removes all gaps between regions.

tools.SegmentTree.aux_insertTree(childTree, parentTree)

This is a private recursive function (you shouldn’t have to call it) that inserts a child tree into a parent tree.

tools.SegmentTree.aux_moveTree(offset, tree)

This is a private recursive function (you shouldn’t have to call it) that translates tree (and its children) to a given x1.

Secure memory map

A write-protected memory map.

class tools.SecureMmap.SecureMmap(filename, enableWrite=False)

In a normal mmap, modifying the string modifies the file. This is an mmap with write protection.

forceSet(x1, v)

Forces the modification even if the mmap is write protected.
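
The idea of a write-protected map with an explicit override can be sketched with Python's standard mmap module. This is an illustration, not pyGeno's SecureMmap: the class name, the choice of IOError and the byte-oriented interface are assumptions.

```python
import mmap
import os
import tempfile


class WriteProtectedMmap:
    """Memory map that refuses item assignment unless writing is enabled."""

    def __init__(self, filename, enable_write=False):
        self.enable_write = enable_write
        self._file = open(filename, "r+b")
        self._map = mmap.mmap(self._file.fileno(), 0)  # map the whole file

    def __getitem__(self, key):
        return self._map[key]

    def __setitem__(self, key, value):
        if not self.enable_write:
            raise IOError("the memory map is write protected")
        self._map[key] = value

    def force_set(self, x1, value):
        """Write value starting at x1, bypassing the protection."""
        self._map[x1:x1 + len(value)] = value

    def close(self):
        self._map.close()
        self._file.close()


# Demonstration on a temporary file holding four bytes.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"ATGC")
protected = WriteProtectedMmap(handle.name)
try:
    protected[0:1] = b"T"
    blocked = False
except IOError:
    blocked = True
protected.force_set(0, b"T")
content = protected[0:4]
protected.close()
os.remove(handle.name)
print(blocked, content)  # True b'TTGC'
```

Keeping the override in a separately named method makes accidental writes to the mapped file hard, while deliberate ones stay possible.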