Freie Universität Berlin has been selected as a new Intel® Parallel Computing Center (Intel® PCC)

The Intel® PCC program supports universities, institutions, and labs identified as leaders in their fields. It focuses on modernizing applications or software libraries to increase performance on modern microprocessors and coprocessors using parallel computing.

Parallel computing involves multiple processors running elements of a computer program at the same time, and can therefore be much faster than executing the same operations sequentially, one after another. Exploiting this requires routines that make proper use of the cores, caches, threads, and vector capabilities of the hardware, as well as synchronization routines for parallel computations.
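
As a minimal illustration of these ideas (a sketch written for this text, not code from SeqAn or Intel), the C++ function below splits a DNA string into chunks, lets each thread count G and C characters in its own chunk, and combines the results through an atomic counter. The name countGC and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper (not part of any library): counts the G and C
// characters in `dna` using `numThreads` worker threads.
std::size_t countGC(const std::string& dna, std::size_t numThreads)
{
    assert(numThreads > 0);
    std::atomic<std::size_t> total{0};  // shared result, updated atomically
    const std::size_t chunk = (dna.size() + numThreads - 1) / numThreads;
    std::vector<std::thread> workers;

    for (std::size_t t = 0; t < numThreads; ++t)
    {
        // Each thread gets a disjoint half-open range [begin, end).
        const std::size_t begin = std::min(dna.size(), t * chunk);
        const std::size_t end = std::min(dna.size(), begin + chunk);
        workers.emplace_back([&dna, &total, begin, end] {
            std::size_t local = 0;  // thread-private counter, no sharing
            for (std::size_t i = begin; i < end; ++i)
                if (dna[i] == 'G' || dna[i] == 'C')
                    ++local;
            total += local;  // one synchronized update per thread
        });
    }
    for (std::thread& worker : workers)
        worker.join();  // wait until every chunk has been processed
    return total.load();
}
```

Calling countGC("ACGTGGCC", 3) would count the six G and C characters across three threads. Keeping a private counter per thread and synchronizing only once at the end is one common way to limit contention on shared data.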

The latest Intel® PCC will be led by Knut Reinert, Professor of Bioinformatics at the Department of Mathematics and Computer Science at Freie Universität (FU) Berlin and a fellow at the Max Planck Institute for Molecular Genetics. Professor Reinert’s research focuses on providing efficient tools for the analysis of Next Generation Sequencing (NGS) data. NGS stems from a technology breakthrough several years ago that enables the cheap sequencing of terabytes of genomic sequence. The Reinert lab bases its application development on well-designed algorithmic components and their implementation in the SeqAn C++ software library.

The SeqAn library is well established and used worldwide in numerous tools for NGS analysis. Since this year, it has also been supported through the CIBi center as part of the German bioinformatics infrastructure network (de.NBI), demonstrating its leading role in the development of new analysis tools for biomedical applications.

“We are very glad that Intel supports our vision for moving biomedical software development forward. SeqAn is a software library containing well designed key components for sequence analysis and we always thought that parallelizing and vectorizing key components of SeqAn will have a big impact on the field by accelerating many applications that make use of those components.”

Knut Reinert, PI

The center will work on abstracting primitives in SeqAn’s template-based core to offer a unified interface to multicore and SIMD vector units, including Intel® Xeon processors and Intel® Xeon Phi™ coprocessors, and will then accelerate key routines such as alignment algorithms and the traversal of data-parallel containers. The generic design of SeqAn makes it well suited for this approach.
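
One way such a unified interface could look in a template-based design is tag dispatch: the caller selects an execution strategy by passing a tag type, and the algorithm name stays the same. The sketch below is an assumption made for illustration, not SeqAn's actual interface; SerialTag, ParallelTag, and forEachElement are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical execution tags; the names are illustrative only.
struct SerialTag {};
struct ParallelTag
{
    std::size_t threads;
    explicit ParallelTag(std::size_t n = 2) : threads(n) {}
};

// Serial overload: a plain loop that the compiler may auto-vectorize.
template <typename TContainer, typename TFunctor>
void forEachElement(TContainer& container, TFunctor func, SerialTag)
{
    for (auto& element : container)
        func(element);
}

// Parallel overload: same call shape, work split across threads.
// Requires a random-access container and a functor that is safe to
// call concurrently on distinct elements.
template <typename TContainer, typename TFunctor>
void forEachElement(TContainer& container, TFunctor func, ParallelTag tag)
{
    assert(tag.threads > 0);
    const std::size_t size = container.size();
    const std::size_t chunk = (size + tag.threads - 1) / tag.threads;
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < tag.threads; ++t)
    {
        const std::size_t begin = std::min(size, t * chunk);
        const std::size_t end = std::min(size, begin + chunk);
        workers.emplace_back([&container, func, begin, end]() mutable {
            for (std::size_t i = begin; i < end; ++i)
                func(container[i]);
        });
    }
    for (std::thread& worker : workers)
        worker.join();
}
```

With this shape, switching from forEachElement(data, f, SerialTag{}) to forEachElement(data, f, ParallelTag{4}) changes only the tag argument, so application code written against the generic interface would not need to be restructured when a faster backend becomes available.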

“Intel regards SeqAn as a very promising software package that has all the right ingredients to considerably speed up Next Generation Sequencing analysis on modern Intel processors. We are looking forward to collaborating with Professor Reinert and his team to add our technical know-how about Intel® Architecture and combine it with his algorithmic expertise, and in this way turn SeqAn into a premier software tool in this domain of rapidly growing importance.”

Kristina Kermanshahche
Chief Architect, Intel® Health & Life Sciences

This strategy will accelerate existing and future applications based on the free SeqAn library (under BSD license) and make hardware acceleration easily available to developers. FU Berlin will benefit from its role as an Intel® PCC by working with Intel experts, their software tools, and advanced technologies. The Reinert lab will incorporate its work into advanced tutorials and courses at FU Berlin and looks forward to sharing its findings at conferences such as the International Supercomputing Conference and the Intel® Xeon Phi™ Coprocessor User Group (IXPUG) meetings.

FU Berlin is a partner in the German Network for Bioinformatics Infrastructure (de.NBI)

The Federal Ministry of Education and Research (BMBF) will fund the German Network for Bioinformatics Infrastructure (de.NBI) for five years, starting in March. One of the eight service centers in this network, the Center for Integrative Bioinformatics (CIBi), will receive two million euros over the next five years. CIBi is a joint center of the universities of Tübingen and Konstanz and Freie Universität Berlin.

In biomedical research, the introduction of new sequencing methods and high-resolution mass spectrometry has enabled a paradigm shift. The high-throughput methods built on them, such as genomics, transcriptomics, proteomics, and metabolomics (also called omics methods), provide very comprehensive and deep insights into cellular systems, but the data they generate are extremely large (in the terabyte range) and highly complex. Increasingly, data from several technologies are also generated in parallel, for example data on the genome and on the protein concentrations in a cell. Analyzing and interpreting such data sets therefore requires innovative algorithms. Because a single algorithm is no longer sufficient for analyzing these data, these tools are then embedded in complex data analysis workflows, which makes the automated evaluation of even the most complex data possible. Within the network, the BMBF funds the further development of algorithms for the analysis of proteome and metabolome data (Tübingen, software package OpenMS, Prof. Oliver Kohlbacher) and of genome and transcriptome data (Berlin, software package SeqAn, Prof. Knut Reinert), as well as the integration of these tools into workflows (Konstanz, software package KNIME, Prof. Michael Berthold).

The Center for Integrative Bioinformatics is closely linked to the other seven service centers in the German Network for Bioinformatics Infrastructure. The network as a whole is funded for five years and will undergo an interim evaluation after three years. The network is coordinated by Bielefeld University.

SeqAn 2.0 released

We are happy to announce the release of SeqAn 2.0.0.

We have many new features and applications for you and have improved many parts in terms of usability, performance, and stability.

For example, we have improved the usability and performance of the I/O modules. BAM I/O now supports parallel read and write operations, and we improved automatic reading and writing of compressed file formats such as gzip and bzip2.

We improved the performance of several data structures, such as the FMIndex, which now runs up to 4 times faster than in the previous version.

We extended the library with new features such as X-drop extensions for alignments that allow affine gap costs. We also added a new realignment module and a translation module that translates DNA sequences into amino acid sequences.
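
As a rough illustration of what a translation step between the DNA and amino acid alphabets computes, the sketch below maps each DNA codon to an amino acid using the standard genetic code. It is a self-contained example with hypothetical names (baseIndex, translateFrame0) and does not use or reproduce SeqAn's translation API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Maps a nucleotide to its index in the T, C, A, G ordering used by the
// codon table below, or -1 for any other character.
int baseIndex(char base)
{
    switch (base)
    {
        case 'T': case 'U': return 0;
        case 'C': return 1;
        case 'A': return 2;
        case 'G': return 3;
        default: return -1;
    }
}

// Hypothetical helper: translates reading frame 0 of `dna` with the
// standard genetic code. Stop codons become '*', codons containing an
// unknown base become 'X', and an incomplete trailing codon is ignored.
std::string translateFrame0(const std::string& dna)
{
    // 64 amino acids indexed by 16 * first + 4 * second + third base.
    static const char code[] =
        "FFLLSSSSYY**CC*W"
        "LLLLPPPPHHQQRRRR"
        "IIIMTTTTNNKKSSRR"
        "VVVVAAAADDEEGGGG";
    static_assert(sizeof(code) == 65, "codon table must have 64 entries");

    std::string protein;
    protein.reserve(dna.size() / 3);
    for (std::size_t i = 0; i + 3 <= dna.size(); i += 3)
    {
        const int first = baseIndex(dna[i]);
        const int second = baseIndex(dna[i + 1]);
        const int third = baseIndex(dna[i + 2]);
        if (first < 0 || second < 0 || third < 0)
            protein += 'X';
        else
            protein += code[16 * first + 4 * second + third];
    }
    assert(protein.size() == dna.size() / 3);
    return protein;
}
```

For example, translateFrame0("ATGGCCTAA") yields "MA*": methionine, alanine, and a stop codon. A full translation module would typically also handle the other reading frames, the reverse complement, and alternative genetic codes, which this sketch leaves out.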

We implemented many new apps, including ANISE and BASIL for insert assembly, Fiona for read error correction, Yara, an enhanced read aligner that replaces Masai, and many more.

With SeqAn 2.0.0 we moved the complete sources to GitHub, which greatly improves the development cycle. We improved our build system and added Continuous Integration builds with Travis CI. We also updated and improved the API documentation (docs.seqan.de) and switched the tutorials to seqan.readthedocs.org.

You can download the new release and the updated apps from the downloads section of www.seqan.de. You can also get the complete sources of SeqAn 2.0.0 from https://github.com/seqan/seqan.

Simply run:

git clone -b seqan-v2.0.0 https://github.com/seqan/seqan.git seqan-src

and start developing and having fun with SeqAn 2.0.0.

Enjoy!

The SeqAn Team


Papers at ECCB 2014

The Reinert lab and collaborators presented two papers at the 14th European Conference on Computational Biology in Strasbourg.

Hannes Hauswedell from our group presented “Lambda: the local aligner for massive biological data” [1], while Marcel Schulz from the Max Planck Institute for Informatics in Saarbrücken presented “Fiona: a parallel and automatic strategy for read error correction” [2].

[1] H. Hauswedell, J. Singer, and K. Reinert, “Lambda: the local aligner for massive biological data,” Bioinformatics (Oxford, England), vol. 30, no. 17, pp. i349–i355, 2014.
BibTeX:
@article{Hauswedell:2014bt,
author = {Hauswedell, Hannes and Singer, Jochen and Reinert, Knut},
title = {{Lambda: the local aligner for massive biological data}},
journal = {Bioinformatics (Oxford, England)},
year = {2014},
volume = {30},
number = {17},
pages = {i349--i355},
month = sep,
publisher = {Oxford University Press},
affiliation = {Department of Mathematics and Computer Science, Freie Universit{\"a}t Berlin, Takustr. 9, 14195 Berlin, Germany.},
doi = {10.1093/bioinformatics/btu439},
pmid = {25161219},
pmcid = {PMC4147892},
language = {English},
abstract = {MOTIVATION: Next-generation sequencing technologies produce unprecedented amounts of data, leading to completely new research fields. One of these is metagenomics, the study of large-size DNA samples containing a multitude of diverse organisms. A key problem in metagenomics is to functionally and taxonomically classify the sequenced DNA, to which end the well-known BLAST program is usually used. But BLAST has dramatic resource requirements at metagenomic scales of data, imposing a high financial or technical burden on the researcher. Multiple attempts have been made to overcome these limitations and present a viable alternative to BLAST.
RESULTS: In this work we present Lambda, our own alternative for BLAST in the context of sequence classification. In our tests, Lambda often outperforms the best tools at reproducing BLAST's results and is the fastest compared with the current state of the art at comparable levels of sensitivity.
AVAILABILITY AND IMPLEMENTATION: Lambda was implemented in the SeqAn open-source C++ library for sequence analysis and is publicly available for download at http://www.seqan.de/projects/lambda.
CONTACT: hannes.hauswedell@fu-berlin.de
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.},
url = {http://bioinformatics.oxfordjournals.org/content/30/17/i349.full},
}
[2] M. H. Schulz, D. Weese, M. Holtgrewe, V. Dimitrova, S. Niu, K. Reinert, and H. Richard, “Fiona: a parallel and automatic strategy for read error correction,” Bioinformatics (Oxford, England), vol. 30, no. 17, pp. i356–i363, 2014.
BibTeX:
@article{Schulz:2014dm,
author = {Schulz, Marcel H and Weese, David and Holtgrewe, Manuel and Dimitrova, Viktoria and Niu, Sijia and Reinert, Knut and Richard, Hugues},
title = {{Fiona: a parallel and automatic strategy for read error correction}},
journal = {Bioinformatics (Oxford, England)},
year = {2014},
volume = {30},
number = {17},
pages = {i356--i363},
month = sep,
publisher = {Oxford University Press},
affiliation = {'Multimodal Computing and Interaction', Saarland University {\&} Department for Computational Biology and Applied Computing, Max Planck Institute for Informatics, Saarbr{\"u}cken, 66123 Saarland, Germany, Ray and Stephanie Lane Center for Computational Biology, Carnegie Mellon University, Pittsburgh, 15206 PA, USA, Department of Mathematics and Computer Science, Freie Universit{\"a}t Berlin, 14195 Berlin, Germany, Universit{\'e} Pierre et Marie Curie, UMR7238, CNRS-UPMC, Paris, France and CNRS, UMR7238, Laboratory of Computational and Quantitative Biology, Paris, France.},
doi = {10.1093/bioinformatics/btu440},
pmid = {25161220},
pmcid = {PMC4147893},
language = {English},
abstract = {MOTIVATION: Automatic error correction of high-throughput sequencing data can have a dramatic impact on the amount of usable base pairs and their quality. It has been shown that the performance of tasks such as de novo genome assembly and SNP calling can be dramatically improved after read error correction. While a large number of methods specialized for correcting substitution errors as found in Illumina data exist, few methods for the correction of indel errors, common to technologies like 454 or Ion Torrent, have been proposed.
RESULTS: We present Fiona, a new stand-alone read error-correction method. Fiona provides a new statistical approach for sequencing error detection and optimal error correction and estimates its parameters automatically. Fiona is able to correct substitution, insertion and deletion errors and can be applied to any sequencing technology. It uses an efficient implementation of the partial suffix array to detect read overlaps with different seed lengths in parallel. We tested Fiona on several real datasets from a variety of organisms with different read lengths and compared its performance with state-of-the-art methods. Fiona shows a constantly higher correction accuracy over a broad range of datasets from 454 and Ion Torrent sequencers, without compromise in speed.
CONCLUSION: Fiona is an accurate parameter-free read error-correction method that can be run on inexpensive hardware and can make use of multicore parallelization whenever available. Fiona was implemented using the SeqAn library for sequence analysis and is publicly available for download at http://www.seqan.de/projects/fiona.
CONTACT: mschulz@mmci.uni-saarland.de or hugues.richard@upmc.fr
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.},
url = {http://bioinformatics.oxfordjournals.org/content/30/17/i356.full},
}