3.4.0.1
-------
- small changes to get MIRA to compile on newer and older platforms and
  compilers (*buntu 11.10, gcc 4.6.x)


3.4.0
-----
Development of the 3.4 series of MIRA concentrated on making assemblies with
30 to 100 million reads more "liveable", i.e., reducing the memory and disk
footprint of MIRA as well as improving run-times. At the same time, an updated
assembly strategy for both genome and EST / RNASeq data was devised to reduce
the influence of chimeras and intronic data on the assembly. MIRA is now also
pretty smart at handling de-novo Solexa projects with "low coverage" (<30x) as
well as "high coverage" (>= 100x).

While we are at it: default parameters for Solexa de-novo were adapted to work
with at least 75mers. While doing de-novo assembly with smaller read lengths
is still possible for MIRA, the whole concept of ultra-short-read de-novo
assembly is a silly idea in the first place. So don't do it.

The new ability to handle IonTorrent data also made its appearance in MIRA, as
implementing support for this kind of sequencing technology was comparatively
simple and straightforward. MIRA supports all read lengths presently on the
market (100bp, 220bp) out of the box, and longer read lengths should not pose
a problem either. Current IonTorrent data behaves very much like early 454
GS20 reads and I am curious whether Life will be able to achieve the same
length and quality improvement within 12 months as 454 did in 2006. Time will
tell.

For PacBio, results are a mixed bag: CCS reads as well as error-corrected CLR
data work extremely well with MIRA, at least I'm happy with how the E. coli
C227-11 demo data from the PacBio DevNet gets assembled. I suppose MIRA will
still need to get a couple more rules regarding the error profile of those
reads, but I'll be able to do that only once I've seen more data. What does
not work at all at the moment (and causes me a terrible headache) are the
uncorrected CLR reads, i.e., those with an accuracy of only 80% to 85%. I'm
not sure how to tackle them efficiently.

For mapping assemblies, many smaller and bigger improvements ease daily life
and improve results with those data sets. Two examples worth naming are the
improved mapping quality of reads in highly repetitive regions of a genome
when the reference sequence is not optimal, and the new ability to load
backbone sequences and annotation from GFF3 format files (saving will follow
shortly).

Quality control and automated clipping have been another focus in the past few
months. Notable developments there are automated clipping of known adaptors in
Solexa and IonTorrent data, improvements in the detection and avoidance of
chimeric reads, and some new automated editing algorithms which edit away
pretty clear cases of sequencing errors.

Regarding utilities, 'convert_project' has been revamped to be able to
convert large assembly or data files with less memory. It also got a number of
new options to cover even more use cases. The new tool 'mirabait' makes it
possible to quickly extract reads from a huge data set based on matching
k-mers.
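
To illustrate the general idea of extracting reads by matching k-mers, here
is a minimal Python sketch. It is not the mirabait implementation and does not
reflect its options or file handling: the function names, the in-memory
(name, sequence) read format, the k-mer size and the 'min_hits' threshold are
all assumptions made for this example.

```python
# Illustrative sketch of k-mer baiting; NOT how mirabait is implemented.
# All names and parameters here are hypothetical.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq.upper()))


def kmers(seq, k):
    """Return the set of all k-mers contained in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_bait_index(baits, k):
    """Collect the k-mers of all bait sequences, in both orientations."""
    index = set()
    for bait in baits:
        index |= kmers(bait, k)
        index |= kmers(reverse_complement(bait), k)
    return index


def bait_reads(reads, index, k, min_hits=1):
    """Yield (name, sequence) for reads sharing >= min_hits k-mers with the baits."""
    for name, seq in reads:
        seq = seq.upper()
        hits = 0
        for i in range(len(seq) - k + 1):
            if seq[i:i + k] in index:
                hits += 1
                if hits >= min_hits:
                    yield name, seq
                    break  # enough evidence, stop scanning this read


if __name__ == "__main__":
    baits = ["ACGTACGTTAGC"]
    reads = [
        ("r1", "GGGACGTACCC"),   # shares a k-mer with the bait
        ("r2", "TTTTTTTTTT"),    # shares nothing
        ("r3", "AAGCTAACAA"),    # matches the bait's reverse complement
    ]
    index = build_bait_index(baits, k=5)
    for name, seq in bait_reads(reads, index, k=5):
        print(name, seq)
```

Because the reads are consumed one at a time and only the bait k-mers are kept
in memory, an approach like this scales with the size of the bait set rather
than with the size of the read data set.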

For detailed changes, please consult the src/mira/CHANGES_old.txt file in the
source distribution.
