Just 15 minutes + questions, we focus on topics about using and developing nf-core pipelines. These are recorded and made available at https://nf-co.re, helping to build an archive of training material. Got an idea for a talk? Let us know on the #bytesize Slack channel!
This week, Franziska (@FranBonath) will share her experiences as a beginner in using nf-core provided material and will give tips on how to get started.
Video transcription
:::note
The content has been edited to make it reader-friendly
:::

0:01 Hello everyone and welcome to today’s bytesize talk. Today it’s just me, I’m Franziska, and this talk is aimed at people who want to start with nf-core, or who know someone who would like to start with nf-core, and don’t really know how best to get into it. It’s based solely on my own experience, so there might be other ways and there might even be better ways, but this is how I found, in the end, that it was best to start doing actual development work in nf-core.
0:33 In order to give you a bit of an idea where I come from: I did a PhD in developmental biology, I was working with Drosophila, creating NGS data for microRNAs and other RNAs, and at the time I had to analyse my own data. There was no bioinformatician in my group, there was no one who could help me with anything, so it was all learning by doing without much help from anyone really. At the time I did everything in Perl because that was the state of the art, the big programming language that you had to learn if you were doing bioinformatics. I also did some R on the side and, to put it all together, I did some bash scripting as well. After my PhD I did a postdoc and there I was looking at other small RNAs, not microRNAs anymore, also some RNA, and all analysed with NGS. This time there were a lot of bioinformaticians around me, it was actually mainly a bioinformatics lab, but I wanted to analyse my own data this time. One reason was that I wanted it exactly in that specific way, another reason was that they were busy with their own projects, so sometimes I had to wait, and I’m not known for wanting to wait for things. At that time I was doing more R work, a bit more bash, but generally nothing advanced. After my postdoc I started at a core facility called NGI and there I started in the lab as well, so I came into contact with many more kinds of NGS data, for one in the lab, but also when analysing and looking at the results in the end. I wanted to do a bit more bioinfo and I came into contact with nf-core for the first time. This time I started to learn Python and I was using Nextflow pipelines myself, and this is also when I became a member of nf-core. About one and a half years ago I switched solely to the bioinformatics side, still at the same core facility. I did a bit more Python work and this was the time when I was thinking about doing my own Nextflow pipeline based on nf-core. That’s when the trouble began.
3:01 What I wanted to do was actually really, really simple: I wanted to do a QC pipeline for Hi-C libraries. In the lab we offer a service for Hi-C library generation and after that we want to check if the library prep worked. To do that we need to map the reads with BWA and then we run pairtools, we generate some tables based on the results that we get from pairtools, and then we feed this into MultiQC and we get a beautiful report out of it. That was the idea and I thought it would be super easy because there are already modules for BWA and pairtools, even some pipelines available that use them, and MultiQC is part of all the nf-core pipelines anyway, so it would not be an issue. Also, because we already ran this QC with a bash script, I already had the Python script to make these tables from the pairtools output. I really thought it would be a piece of cake.
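As an illustration of the table-making step described above, here is a minimal Python sketch. It is not the script mentioned in the talk. It assumes the pairtools statistics arrive as tab-separated `key<TAB>value` lines with keys such as `total`, `total_mapped`, `total_dups`, `total_nodups`, `cis` and `trans`, and that MultiQC picks up a tab-separated custom-content table with `#`-prefixed header lines. The function names, the section id `hic_qc_summary` and the chosen columns are hypothetical.

```python
def parse_stats(text):
    """Parse key<TAB>value lines into a dict, keeping only integer values."""
    stats = {}
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 2:
            continue  # ignore lines that are not simple key/value pairs
        key, value = parts
        try:
            stats[key] = int(value)
        except ValueError:
            continue  # ignore non-integer entries such as fractions
    return stats


def percent(part, whole):
    """Return part/whole as a percentage, or 0.0 if whole is zero."""
    return round(100.0 * part / whole, 2) if whole else 0.0


def summarise(stats):
    """Reduce the raw counts to a few QC columns (column choice is an assumption)."""
    total = stats.get("total", 0)
    mapped = stats.get("total_mapped", 0)
    nodups = stats.get("total_nodups", 0)
    return {
        "total_pairs": total,
        "pct_mapped": percent(mapped, total),
        "pct_duplicates": percent(stats.get("total_dups", 0), mapped),
        "pct_cis": percent(stats.get("cis", 0), nodups),
        "pct_trans": percent(stats.get("trans", 0), nodups),
    }


def to_mqc_table(summaries):
    """Render {sample: summary} as a tab-separated table with a commented header."""
    header = [
        "# id: 'hic_qc_summary'",  # hypothetical section id
        "# section_name: 'Hi-C library QC'",
        "# plot_type: 'table'",
    ]
    columns = ["total_pairs", "pct_mapped", "pct_duplicates", "pct_cis", "pct_trans"]
    lines = header + ["\t".join(["Sample"] + columns)]
    for sample in sorted(summaries):
        row = summaries[sample]
        lines.append("\t".join([sample] + [str(row[c]) for c in columns]))
    return "\n".join(lines) + "\n"
```

In a pipeline, the returned text would typically be written to a file whose name ends in `_mqc.tsv` so that MultiQC can find it. That naming convention, like the input keys, should be checked against the versions of the tools actually in use.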
4:15 Based on my background, what could I do before I started this? I had some very general scripting experience, like I had my R, my Perl, my bash and Python in the background and I was like okay, it was fairly easy to go from, for example, R to Python, there was not that much more that I had to learn, just a slightly different syntax, so it should be fine. Also I felt very comfortable on the command line already at that point, having done a lot of command line work up to then, so that was not an issue, and being part of nf-core and the core team, I had some experience with Git and GitHub, so I knew how to make a fork and have my own repo and how to work on it on the remote and things like that. I felt a bit like a hacker man, I learned this on my own and I can do this, it’s easy. Then also of course I had some experience with running nf-core pipelines and this gave me some familiarity with the names of things and I was like okay, I just fill in the gaps and then it should work out. Also I was really interested, I really wanted to do this and I was curious how it would work, and I like things to be neat and tidy, so a Nextflow pipeline would be exactly what I wanted to have. Finally I had some very healthy overestimation of my knowledge. This can be a good thing in that it lowers the threshold to actually get into things, but of course it also gives you some major drawbacks later on, so it’s good and bad to be a bit over-optimistic.
6:12 What were the mistakes that I ran into at the very beginning when I wanted to start? One of them was this attitude of: I can do this, how difficult can this be? It should be very easy to just put things together. The idea for me in the beginning was also not to have this as a standalone Nextflow pipeline but to append it to an already existing nf-core pipeline, and that turned out to be a bit more than I could chew and I fairly quickly gave up on that. Also because of that I was not really working on a testing repo where I started from scratch to learn the basics, but I forked the existing pipeline and then I just looked at it and tried to figure out what the different bits mean, and it was definitely not the right way to start this. I would not advise that to anyone starting from scratch. Finally, and this is not so much a mistake as something that comes with my work, I could not work on it continuously, so I had an hour here, an hour there. At some point I was like, Monday mornings I will work on this, and it didn’t work. At least for the very beginning you need, I would say, a week where you do nothing but start to learn Nextflow.
7:38 This brings me now to what you probably should do at least from my point of view, what my recommendations are. The very first one is to plan your project. Learning Nextflow just in order to have it learned is probably not very fruitful. You should have an idea what you want to do with it afterwards and this will also deepen your knowledge and it will create some base for later work. If you don’t have a current project that you are working on and that you want to implement it can be also something like “I want to generate the RNAseq pipeline but with these different things that the current one can’t do” or something like that. It doesn’t have to be perfect, it doesn’t have to be a big one, but have an idea what you want to do on your own once you think you are at a step where you can start your own work and have enough training.
8:41 The second one, which was definitely true for me, or where I did not follow my own recommendations, was to watch the Nextflow tutorial. You do need a foundation of Nextflow in order to understand how nf-core pipelines work. There are YouTube tutorials I can show you here. Let me know in the chat if you cannot see this now. Maybe I have to change my share. I just assume that you can see it. On YouTube here you have the foundational Nextflow training which was just recently done, like just a week ago I believe. There are three sessions, they are roughly two and a half hours long, and if you have different language requirements, we also had them at the beginning of the year in different languages, for example we had them in Hindi I think, in French, in Spanish. Have a look at these trainings. They also come with a training tutorial, which is here, and here you can start your training workshop. You can go through this independently of when we actually hosted the training. Be aware that we only do these YouTube tutorials twice a year and we continuously improve the Nextflow training documentation. If the two are not exactly the same, stick with the written one because that will be the most up-to-date. Also choose the latest training if you can because that will have the latest updates from nf-core as well. There’s something in the chat. Oh yeah, the videos should be embedded in training.nextflow.io very soon.
10:48 I would suggest that you take notes when you do the Nextflow tutorial. The reason is that, at least for me, I can more easily remember things that I actually wrote down. I have maybe my own logic of how I organize things and that helps me remember things. Also I would very much recommend that you do the exercises on your own. When you’re going through the tutorial, they will show you how the exercises are done. They will in a way already show you the results. Try to stop it there. Do the exercises on your own and then go back and see if you did it right. Then also take time. Sometimes an exercise will not make sense immediately. Maybe you want to go back. You might also want to read up on other documentation or you want to redo the exercise that you did two sessions before. I said those tutorials take two and a half hours, but I actually needed a full day for each session because I wanted to write things down, I wanted to really think through what this means and how it relates to the previous session or the previous exercises that I’d done. Feeding into that, I sometimes made up my own exercises. Like, I now understand these three points, how they work and how they interact with each other. Now I combine them all and I want to do this. Then I tried it on my own and I tried to figure out if it works or not. I just kept at it until I got it to work. These are my points for the tutorial.
12:34 Once I had done the tutorial, I was, again, very confident. I was like, yes, I understand this. It is a piece of cake to just put this all to work and get my Nextflow, nf-core workflow done. Unfortunately I had to learn that there is a gap between the Nextflow course and the nf-core pipelines. It is addressed now. There’s going to be an advanced course at the end of the month, I believe, that I think you can still sign up to. But at least at the point of this video, there is no advanced course available that closes this gap. There are some steps that you have to take before you start working with the nf-core template and on nf-core pipelines, in my opinion. One is to familiarize yourself with the template that we get from nf-core. Like, what do the different entries mean? What are the different directories? What are they used for? For example, the work directory is very important. How can I utilize that? So look through it and make sure that you understand these things before you actually start writing anything.
13:53 Then there will be a time when this will not be enough and you will get stuck. Without tooting my own horn here, I think the bytesize talks helped me quite a bit. We do have, on YouTube, a bytesize playlist that is specifically for developers. I hope you can see this. It ranges from very, very basic things like resources to learn Nextflow, which maybe is the next step after this video, to things that I used, for example how to customize my MultiQC report or, in my case, how to integrate a custom script. That one helped me a lot. It is very good for you to look through here to see if something applies to your problem, and maybe it gives you exactly the answer you need. There are also, of course, times when YouTube doesn’t help you and you really need someone to help you directly with your code. This is usually the case when you have an error message that is not helpful. In my case, I had forgotten a comma in a tuple and it told me that my process was already used. The error message had nothing to do with the problem and I just couldn’t figure it out on my own. Then I turned to Slack and yeah, there I got help in the end. First, in Slack, I would advise you to look through whether someone else had the same issue before. Then maybe you don’t have to spend more of your time, or someone else’s time, looking at your problem specifically because it has already been solved. But more often than not, your problem is either not directly described or you don’t understand the solution, and then of course you can ask for help. There is a Slack channel that is called #nostupidquestions and that is really the case. There’s no stupid question in that channel. You can ask whatever you need to.
16:01 Finally for me, one of the main points was to set small goals. Like I said in the beginning, I started with this big idea of having everything at once and I started with trying to get this goal specifically in the beginning and it didn’t work out. I needed to start small, set my goal to be, I don’t know, adding this one already existing module to a test pipeline that I had or something like that. That helped a lot. With that, I would like to end this today. Thank you all for listening. I will now allow everyone to unmute themselves and also share their video if they want and I’m open for any questions. Thank you. Okay. It seems we don’t have questions. Hi Phil!
17:14 (comment) Thank you for your talk, Fran, it was really good. It was really nice to hear that story and also to hear the project’s moving along. (speaker) I am now at the MultiQC report, maybe I have some questions about that. (comment) I just wanted to reiterate a couple of bits in the chat. You mentioned the error messages, it’s like a common thing, I was going to say, this week. Yesterday, a podcast went out where Ben from Seqera, a Nextflow developer, and I discussed in detail why error messages in Nextflow are difficult, but we also had some good news: the Edge release that went out this week had a whole load of improvements in error messages. That one you found, the tuple with the comma, we actually discussed specifically. Now it says, did you miss out a tuple comma somewhere? Hopefully that will make life a bit easier for beginners. Also, on the training, we’ve actually got three trainings in September. We just had the foundational, and we’ve also got a short one for beginners called hands-on, which is new, or at least revitalized. It is just one session, which is good for beginners and also good for anyone who just wants a refresher, who did the foundational a year ago, hasn’t used it very much and just wants to get up to speed. Then Rob is doing the advanced training, which is the first time we’ve done that publicly, and they’re all online, free and will be online forever.
(host) Awesome. Okay. Are there any questions, otherwise I would like to thank you all for listening and as usual, I would like to thank the Chan Zuckerberg Initiative for funding our bytesize talks. I hope to see you all next week. Bye-bye.