Musings from a software developer who mostly works in bioinformatics

Introduction to Nextflow workshop in Uruguay


The Regional Student Group Uruguay (RSG-Uruguay) is a non-profit academic group that brings together young Uruguayan researchers in bioinformatics and computational biology. They have been organizing various local training events and meetings. In this context, I offered to host an introductory Nextflow workshop in April 2025, as I was going to be in the country on holiday.

We decided to use the Hello Nextflow course for the workshop. This one is very well suited for people who are not familiar with workflow management systems. The plan was for me to go over the lessons, stopping to explain and answer any questions. Then, the participants would work through the same lessons themselves.

Read more ⟶

"Sync" Channels in Nextflow


Nextflow is based on the Dataflow paradigm. The operations on the data are defined in a directed acyclic graph (DAG). The nodes of the DAG are the operations that need to happen, and the edges are the inputs and outputs that connect the nodes. This short post won’t go into too much detail about this.

Nextflow handles the parallelization and distribution of tasks for the user. Developers have two basic structures to handle the parallelization of tasks: channels and channel operators. Channels are used to communicate between processes, and channel operators, often simply called operators, are used to consume, transform, and produce channels themselves. Operators borrow concepts from functional programming, as this paradigm is well suited to handling asynchronous tasks.

Read more ⟶

Dynamic resource allocation based on inputs in Nextflow


Nextflow fits my mental model for pipelines perfectly. One of the features I appreciate most is the ability to dynamically assign resources to a process based on various rules, such as the size of input files. This is particularly convenient when running bioinformatics pipelines on HPC systems. In bioinformatics, many tools are not optimized (though many are, and I mean no disrespect). As a result, depending on the input or other circumstances, some tools may require large amounts of resources.

Read more ⟶

Counting seqs in fastas with python


I was working with a script in mgnify-pipelines-toolkit, and I bumped into some issues running the test suite locally on my laptop (running macOS).

The problem was caused by how the script was trying to count the number of reads in FASTA and FASTQ files. It was using the good ol’ subprocess.Popen way, running zcat, grep and wc -l to get the count. The subprocess calls were expecting the GNU coreutils versions, and there was a missing flag of sorts in zcat. I decided to look for a pure Python implementation of this.
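As a rough idea of what a pure Python count can look like, here is a minimal sketch using only the standard library. It is not the code from mgnify-pipelines-toolkit; the function names (`open_maybe_gzip`, `count_fasta_records`, `count_fastq_records`) are made up for illustration, and the FASTQ count assumes the common four-lines-per-record layout with no trailing blank lines.

```python
import gzip
from pathlib import Path


def open_maybe_gzip(path):
    """Open a plain or gzip-compressed file in text mode.

    Hypothetical helper: compression is detected from the gzip magic
    bytes rather than from the file extension.
    """
    path = Path(path)
    with open(path, "rb") as handle:
        is_gzip = handle.read(2) == b"\x1f\x8b"
    return gzip.open(path, "rt") if is_gzip else open(path, "rt")


def count_fasta_records(path):
    """Count FASTA records: one per header line starting with '>'."""
    with open_maybe_gzip(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def count_fastq_records(path):
    """Count FASTQ records, assuming exactly four lines per record."""
    with open_maybe_gzip(path) as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

Counting lines and dividing by four is used for FASTQ because quality strings can themselves start with '@' or '>', so matching on a leading character is not reliable there. Nothing here shells out, so the result does not depend on which zcat or grep happens to be installed.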

Read more ⟶