How to Loop a Workflow
DAFNI has powerful looping functionality that lets you loop over a Workflow, however many steps it contains, and that works with most of the step types DAFNI provides when creating a Workflow.
Note: Currently, the only step type not allowed in the Workflow to be looped is the loop step type itself. This means that loops of loops are currently not supported. However, this functionality will be added in a future update.
You can loop over parameters or dataslots, providing either a list of values or datasets. If the parameter you want to loop takes numeric values, you can generate values using one of DAFNI's random number-sampling techniques.
The configuration to support looping is complex and can be a little intimidating at first glance, so this page will guide you through configuring a loop step and explain the differences between the options available.
Setup
In order to provide as much flexibility as possible, DAFNI loops over entire Workflows rather than individual Models or individual steps. This means that when it comes to creating a loop, the first thing you need to do is decide which steps you would like to loop and then create a Workflow that contains only those steps.
This may be best understood with an example, so let's take the Covid 19 Model below.
As you can see, this Model has a parameter called "R Number". Let's say you want to run the Model several times, varying the "R Number" each time it is run, and that each time the Model runs you want to upload its results to DAFNI's data catalogue.
To do this you would need to:
- Create a Workflow with a Model step that runs the Covid 19 Model
- Add a publish step that uploads the results of the Covid 19 Model to the data catalogue (see image below)
If you are not yet comfortable with creating a Workflow on DAFNI and would like more information, see How to Create a Workflow.
In order to loop over this Workflow, DAFNI needs a Parameter Set that contains the values to use for the parameters and dataslots that are not being varied by the loop.
For the Covid 19 Model, that would be the following parameters:
- "Country Name"
- "Duration of case isolation"
- "Household level compliance with quarantine"
So the next thing to do is to create a Parameter Set for the Workflow you want to loop that contains the correct values for the other parameters. As DAFNI doesn't know you are creating a loop at this point, you will also need to enter values for the parameter(s) you are going to vary. These values will be overwritten during the loop, so don't worry too much about getting them right at this stage.
Because DAFNI asks you to create a Parameter Set straight after creating your Workflow, this is a natural next step, and there's nothing extra you need to do beyond creating a regular Workflow.
Creating a Loop
At this point, you should have a Workflow that contains the steps that you would like to loop over and a Parameter Set that contains the values for the parameters and datasets for the dataslots that you are not varying within the loop. In this section, we will go step by step through the process of creating a loop.
As with the previous section, the first step is to press the "Create workflow" button on the Workflow catalogue page.
Then you will be asked to enter some metadata for this new Workflow. I'm choosing to call the Workflow "Covid 19 Loop Workflow" since it will be looping over the Covid 19 Model.
Once you've entered your metadata, press the "Continue" button and you will see the Workflow creation canvas. You will notice (both from the key and the step type selection) that there are two kinds of loop steps:
- Loop-parallel
- Loop-sequential
Parallel vs Sequential
DAFNI allows you to loop a Workflow using two different methods.
Parallel
A parallel loop will try to run all iterations of the loop simultaneously, with each iteration running in isolation. In practice, depending on the number of iterations you are trying to run and the hardware available at the time, the iterations may be scheduled so that they do not all run at once. However, they will all still run in isolation and each will start as soon as possible.
The number of iterations run in a parallel loop depends on the number of values/datasets you provide to the parameters/dataslots you are varying. DAFNI will programmatically generate every possible combination of the values for the parameters and dataslots you set on the loop step. As an example, consider providing the following values for two parameters (first_name and surname) as well as a single dataslot (family_tree):
first_name: ['Joe', 'Holly']
surname: ['King', 'Wood']
family_tree: [['family-tree-one-dataset-id'], ['family-tree-two-dataset-id']]
DAFNI would then generate every possible combination of these values, which results in 8 (2 x 2 x 2) iterations being created.
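The iteration count follows directly from taking every combination of the value lists. The Python sketch below reproduces that arithmetic with `itertools.product`; it only models how the count arises and is not DAFNI's implementation. The `parallel_iterations` helper is hypothetical.

```python
from itertools import product

# Values from the example above; the dataset IDs are placeholders.
loop_values = {
    "first_name": ["Joe", "Holly"],
    "surname": ["King", "Wood"],
    "family_tree": [["family-tree-one-dataset-id"], ["family-tree-two-dataset-id"]],
}


def parallel_iterations(values: dict[str, list]) -> list[dict]:
    """Return one dict per iteration: every combination of the supplied values."""
    names = list(values)
    # product() yields one tuple per combination, in the same order as `names`.
    return [dict(zip(names, combo)) for combo in product(*values.values())]


iterations = parallel_iterations(loop_values)
print(len(iterations))  # 8
```

Each dataslot value is itself a list of datasets, so `product` treats every inner list as a single choice rather than spreading its datasets across iterations.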
Sequential
A sequential loop runs the iterations of the loop one after another, with the results of the first iteration made available to the second, the results of the second made available to the third, and so on.
A sequential loop provides the first value of every parameter/dataslot you are varying to the first iteration, the second value to the second iteration, and so on. This means the number of iterations run in a sequential loop is determined by the number of values you provide for each varied parameter/dataslot. It also introduces an extra validation requirement: all the parameters/dataslots you are varying in the loop must be given the same number of values/datasets.
Given the previous example with two parameters and a single dataslot, you would need to make sure you provide the same number of values for both parameters as well as for the dataslot. In fact, the previous example would also work for a sequential loop, but it would generate only 2 sequential iterations:
1. first_name: 'Joe', surname: 'King', family_tree: ['family-tree-one-dataset-id']
2. first_name: 'Holly', surname: 'Wood', family_tree: ['family-tree-two-dataset-id']
Any outputs from the first iteration would feed into the second one.
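A sequential loop instead pairs values by position, much like Python's built-in `zip`. The sketch below is a hypothetical model of that behaviour: `sequential_iterations` enforces the equal-length rule described above, and `run_sequentially` hands each iteration the previous iteration's outputs. `fake_model` is a stand-in for the looped Workflow, not anything DAFNI provides.

```python
# Same example values as above; the dataset IDs are placeholders.
loop_values = {
    "first_name": ["Joe", "Holly"],
    "surname": ["King", "Wood"],
    "family_tree": [["family-tree-one-dataset-id"], ["family-tree-two-dataset-id"]],
}


def sequential_iterations(values: dict[str, list]) -> list[dict]:
    """Pair the i-th value of every varied parameter/dataslot into iteration i."""
    lengths = {name: len(vals) for name, vals in values.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"all varied inputs need the same number of values: {lengths}")
    names = list(values)
    return [dict(zip(names, combo)) for combo in zip(*values.values())]


def run_sequentially(iterations, run_iteration):
    """Run iterations in order, passing each one the previous iteration's outputs."""
    previous_outputs = None
    for inputs in iterations:
        previous_outputs = run_iteration(inputs, previous_outputs)
    return previous_outputs


def fake_model(inputs, previous_outputs):
    """Stand-in for the looped Workflow: accumulates every surname seen so far."""
    seen = [] if previous_outputs is None else previous_outputs
    return seen + [inputs["surname"]]


iterations = sequential_iterations(loop_values)
print(len(iterations))                           # 2
print(run_sequentially(iterations, fake_model))  # ['King', 'Wood']
```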
Note: Currently, DAFNI limits the number of iterations in a loop to 50 until we add a scheduler for Workflows. This limit will be lifted in a future update.
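To check a planned loop against this limit before building it, the hypothetical helper below computes the iteration count for either loop type. It assumes a loop of exactly 50 iterations is still allowed, and keeps the limit in a separate constant since it is expected to change.

```python
from math import prod

MAX_ITERATIONS = 50  # limit from the note above; expected to be lifted in future


def iteration_count(values: dict[str, list], mode: str) -> int:
    """Number of iterations a loop would create for the given varied inputs."""
    counts = [len(v) for v in values.values()]
    if mode == "parallel":
        return prod(counts)  # one iteration per combination of values
    if mode == "sequential":
        if len(set(counts)) > 1:
            raise ValueError("sequential loops need the same number of values for every input")
        return counts[0] if counts else 0  # one iteration per position
    raise ValueError(f"unknown loop mode: {mode!r}")


def within_limit(values: dict[str, list], mode: str) -> bool:
    """True if the loop would stay within MAX_ITERATIONS."""
    return iteration_count(values, mode) <= MAX_ITERATIONS


# Four varied parameters with three values each (hypothetical names).
plan = {f"param_{i}": [1, 2, 3] for i in range(4)}
print(iteration_count(plan, "parallel"), within_limit(plan, "parallel"))      # 81 False
print(iteration_count(plan, "sequential"), within_limit(plan, "sequential"))  # 3 True
```

Because parallel loops multiply the value counts, they reach the limit much sooner than sequential loops with the same inputs.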
You should keep the differences between the two types of loop in mind when choosing which to use for your particular use case. If you are running a Model on different areas of the UK, a parallel loop may suit you better. If you are running a Model over consecutive periods of time, where the results for each period are affected by the previous period's results, a sequential loop is likely to be more appropriate.
Loop Step Options in the Workflow
For the purposes of this tutorial, let's choose a parallel loop step, because varying the "R Number" parameter will produce completely independent loop iterations.
As you can see, there are very few options when initially adding a loop step to a Workflow. This is because you will enter the parameters/dataslots you wish to vary when creating the Parameter Set for this Workflow. As with every step type, this step needs a name; let's call it "covid-loop". The only other information we need at this point is the Workflow that this step is going to loop over. We will be looping over the Workflow created earlier in this tutorial, so click the "Select workflow" button to open the selection dialog shown below.
Search for the Workflow you would like to loop and tick the checkbox to the left of its name when you have found it. Once the Workflow is selected, press "Select workflow". The dialog should close and the Workflow's name and version information should be displayed in the "Workflow name" field. Press the "Create step" button to finish creating the loop step, and you will see it added to your Workflow as below.
Uploading Loop Step Results
The Workflow we created at the start of this tutorial has a publish step in it that creates a dataset from the results of the Model step that runs the Covid 19 Model. This means that, for each iteration produced by the parallel loop step we just added, a dataset will be created.
This is not the only way to publish a dataset containing the results of a loop step. We could instead add a publish step to the Workflow we are currently creating and select the loop step as the "Step to include files from". This publish step will create a single dataset containing the results of all of the iterations of the loop. If you upload a dataset using this method, the name of each file in the dataset will be prefixed by the loop step name, the name of the step that produced the file and then the iteration number, that is, {loop step name}-{step name}-{iteration number}-{file name}. This allows you to work out which iteration, and as a result which parameter values, produced each file in the dataset.
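Because step names such as "covid-loop" and "covid19-model" can themselves contain hyphens, splitting a published file name on every "-" is unreliable. The sketch below (hypothetical helpers, not a DAFNI API) builds names following the pattern above and recovers the iteration number by stripping the known prefix instead. Whether DAFNI numbers iterations from 0 or 1 isn't stated here, so the sketch simply returns whatever number appears in the name.

```python
def loop_output_name(loop_step: str, step: str, iteration: int, file_name: str) -> str:
    """Build a name following {loop step name}-{step name}-{iteration number}-{file name}."""
    return f"{loop_step}-{step}-{iteration}-{file_name}"


def split_loop_output_name(published_name: str, loop_step: str, step: str) -> tuple[int, str]:
    """Recover (iteration number, original file name) from a published file name.

    Step names can contain hyphens, so strip the known prefix rather than
    splitting the whole name on "-".
    """
    prefix = f"{loop_step}-{step}-"
    if not published_name.startswith(prefix):
        raise ValueError(f"{published_name!r} was not produced by {loop_step}/{step}")
    # The iteration number never contains "-", so the first "-" ends it.
    iteration, _, file_name = published_name[len(prefix):].partition("-")
    return int(iteration), file_name


name = loop_output_name("covid-loop", "covid19-model", 3, "r-number-summary.csv")
print(name)  # covid-loop-covid19-model-3-r-number-summary.csv
print(split_loop_output_name(name, "covid-loop", "covid19-model"))  # (3, 'r-number-summary.csv')
```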
Unfortunately, this prefixing does not happen with the first method, where the publish step sits inside the Workflow being looped. If you use that method and still want to tell which iteration created which dataset, we recommend generating the metadata for your datasets in one of your Model steps rather than using the form when creating your publish step.
I will leave it up to you which way you want to upload data from your loop. If you need help creating your publish step then I would again refer you to How to Create a Workflow for a detailed guide.
Loop Step Options in the Parameter Set
Once you have finished adding steps to your Workflow, press "Continue" to start adding metadata to the Parameter Set we will create for this Workflow. I've added the metadata below to describe how this Parameter Set will make the loop step vary the "R Number" parameter.
As you can see on the right-hand side of the metadata panel, the "Create default parameter set" option has been disabled. This is because the Workflow contains a loop step, and it doesn't make sense to use default values for a loop step: it would result in a loop with only one iteration. This means that whenever you add a loop step to your Workflow, you must continue on to the final step and create a Parameter Set for that Workflow.
Press "Continue" and then click on the loop step to add the parameters and dataslots that we are going to vary. You should see the options below.
Base Parameter Set
As mentioned earlier in this tutorial, a loop step needs a base Parameter Set for the Workflow it is looping over, which determines the values provided to the Model steps in addition to the values being varied. This is set through the "Base parameter set" field, a drop-down showing the Parameter Sets for the Workflow we chose to loop when adding the step. Choose a Parameter Set to use; if you are following this tutorial, there will likely be only one to choose from.
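Conceptually, each iteration starts from the base Parameter Set and replaces only the values the loop varies, which is why the placeholder values entered into the base Parameter Set don't matter. The sketch below models that overlay with plain dictionaries keyed by step name; `iteration_parameters`, the `COUNTRY_NAME` parameter name and all values are hypothetical, and this is not how DAFNI stores Parameter Sets.

```python
def iteration_parameters(base: dict[str, dict], overrides: dict[str, dict]) -> dict[str, dict]:
    """Overlay one iteration's varied values on the base Parameter Set, per step."""
    # Copy each step's parameters so the base Parameter Set is never modified.
    merged = {step: dict(params) for step, params in base.items()}
    for step, params in overrides.items():
        merged.setdefault(step, {}).update(params)
    return merged


# Hypothetical base Parameter Set: COUNTRY_NAME and its value are placeholders,
# and R_NUMBER holds the placeholder value that the loop will overwrite.
base = {"covid19-model": {"COUNTRY_NAME": "United Kingdom", "R_NUMBER": 1.0}}

for r in [0.8, 1.2]:
    print(iteration_parameters(base, {"covid19-model": {"R_NUMBER": r}}))
```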
Parameters to Iterate
The next section, "Parameters to iterate", is where things start getting a little more complicated. The section starts empty; to add a parameter to the loop step, press "Add parameter". You will see the section expand to show some more options, as below:
The first thing to do is select the steps that you want to vary a parameter for; you can vary one parameter in many steps at the same time. You can also vary more than one parameter, but you will need to add a new entry for each parameter you want to vary. The "Steps" field is another drop-down, showing the Model steps that exist in the Workflow being looped. The Workflow we are looping contains only one Model step, so there will be only one in the list, "covid19-model". Select it and move on to the next field.
Next, we need to add the name of the parameter that will be varied. At the moment, you need to remember the names of the parameters you would like to vary; in a future update, we will provide lists of available parameters to make creating a loop step easier. Enter the name of the "R Number" parameter into the "Parameter name" field. This needs to be the name of the parameter, not its title: the name for "R Number" is "R_NUMBER". You can find the name of a parameter on the Model details page.
Now we need to decide how we will provide the values for the "R Number" parameter. In DAFNI you can either specify values manually or, if the parameter is numeric, generate values for it using one of DAFNI's random number-sampling techniques. If your parameter is not numeric, or you decide to specify values manually, leave the "Generate values" checkbox unchecked and enter the values into the "Values" field below it. After you type a value, press "Enter" on your keyboard to confirm it, and you should see it become a "chip" like below:
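One way to picture what a "Parameters to iterate" entry captures is the hypothetical data class below: the selected steps, the parameter name (not its title) and the list of values. Its `overrides_for` helper reflects the point above that a single parameter can be varied in several steps at once. The R Number values in the example are illustrative, not the ones shown in the screenshot.

```python
from dataclasses import dataclass, field


@dataclass
class ParameterToIterate:
    """Hypothetical mirror of one "Parameters to iterate" entry on a loop step."""

    steps: list[str]     # Model steps in the looped Workflow, e.g. ["covid19-model"]
    parameter_name: str  # the parameter's name (e.g. "R_NUMBER"), not its title
    values: list = field(default_factory=list)  # one entry per "chip" in "Values"

    def overrides_for(self, value) -> dict[str, dict]:
        """Per-step overrides for one iteration: every selected step gets the same value."""
        return {step: {self.parameter_name: value} for step in self.steps}


r_number = ParameterToIterate(["covid19-model"], "R_NUMBER", [0.8, 1.0, 1.2])
for value in r_number.values:
    print(r_number.overrides_for(value))
```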
These are the values we'll be using for this loop but I will quickly cover generating values in the next section to be thorough.
Generating Values
If you would like to generate values for your parameter then check the "Generate values" checkbox and you should see the "Values" field disappear and the "Count" and "Distribution" fields appear as below:
The "Count" field will tell DAFNI how many values to generate.
The "Distribution" field is a drop-down listing the possible ways you can spread (or distribute) the generated values. Currently, DAFNI allows you to generate values from two distributions: "Normal" and "Uniform". To learn more about them, it's probably best to read the documentation for the package we use here. As you can see, that package can generate plenty more distributions; if you would like us to add one to the list, raise an issue on our user feedback board. Depending on the distribution you choose, two more fields will appear below "Count" and "Distribution", because each distribution requires different options to know how to generate the values.
For "Normal" distributions, either a standard deviation or a mean is needed to generate the values, or you can provide both if you would like to. These values can be entered into the "Standard Deviation" and "Mean" fields.
For "Uniform" distributions, both a minimum and a maximum value are needed to generate the values; the only requirement is that the minimum must be lower than the maximum. These values can be entered into the "Minimum" and "Maximum" fields.
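The sketch below shows what the "Count", "Distribution" and related fields ask for, using Python's standard `random` module as a stand-in for the sampling package DAFNI actually uses. The defaults of 0 for an omitted mean and 1 for an omitted standard deviation are assumptions for illustration, since the form doesn't say what DAFNI substitutes; the `count < 1` check is likewise this sketch's own sanity check.

```python
import random
from typing import Optional


def generate_values(
    count: int,
    distribution: str,
    *,
    mean: Optional[float] = None,
    std_dev: Optional[float] = None,
    minimum: Optional[float] = None,
    maximum: Optional[float] = None,
    seed: Optional[int] = None,
) -> list[float]:
    """Generate `count` values from a "Normal" or "Uniform" distribution."""
    if count < 1:
        raise ValueError("count must be at least 1")
    rng = random.Random(seed)  # a seed makes the generated values reproducible
    if distribution == "Normal":
        if mean is None and std_dev is None:
            raise ValueError("Normal needs a mean, a standard deviation, or both")
        mu = 0.0 if mean is None else mean           # assumed default when omitted
        sigma = 1.0 if std_dev is None else std_dev  # assumed default when omitted
        return [rng.gauss(mu, sigma) for _ in range(count)]
    if distribution == "Uniform":
        if minimum is None or maximum is None:
            raise ValueError("Uniform needs both a minimum and a maximum")
        if not minimum < maximum:
            raise ValueError("the minimum must be lower than the maximum")
        return [rng.uniform(minimum, maximum) for _ in range(count)]
    raise ValueError(f"unsupported distribution: {distribution!r}")


print(generate_values(5, "Uniform", minimum=0.5, maximum=2.0, seed=42))
print(generate_values(5, "Normal", mean=1.0, std_dev=0.2, seed=42))
```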
Dataslots to Iterate
The final section for a loop step is the "Dataslots to iterate" section. Like the previous section, it starts empty, and a dataslot can be added to the loop step using the "Add dataslot" button. Pressing this button expands the section and a new dataslot appears, as below:
Each dataslot in a Model can be provided with many datasets; to enable looping without losing this functionality, each iteration of a loop over a dataslot must be able to take many datasets. In other words, the values you provide for a dataslot form a list of lists of datasets, one inner list per iteration.
The first thing we need to do is select the steps that you would like to vary this dataslot for. As with parameters, you can do this by clicking the "Steps" field and choosing the steps from the drop-down menu that appears.
Next, we need to tell the loop step the name of the dataslot we are varying, so enter it into the "Dataslot name" field. You can find the name of a dataslot on the Model details page.
You have probably noticed that you haven't yet added any datasets to use as iterations for this dataslot. To start adding iterations to your dataslot, press the "Add iteration" button. A table will appear showing the datasets that will be provided to the dataslot during this iteration. It will initially look like below:
To add datasets to this iteration, press the "Add datasets" button. A dialog will open showing the data catalogue, similar to when you select datasets for a dataslot in a Model step. You can select datasets by clicking the "Select Dataset" checkboxes next to them in the dialog, selecting as many as you want before pressing the "Confirm" button to close the dialog. The datasets you select here form one of the inner lists in the overall list of lists.
This will result in a loop step with one iteration that takes the datasets listed in the table. In order to add another iteration, repeat the process of pressing "Add iteration" then the "Add datasets" button and selecting some datasets. Each table represents an iteration of the loop.
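Put together, a "Dataslots to iterate" entry amounts to a list of lists of datasets, one inner list per "Add iteration" table. The hypothetical data class below models that structure; the step name `family-model` and the dataset IDs are placeholders, iterations are indexed from 0 purely as a Python convention, and nothing here is a DAFNI API.

```python
from dataclasses import dataclass, field


@dataclass
class DataslotToIterate:
    """Hypothetical mirror of one "Dataslots to iterate" entry on a loop step."""

    steps: list[str]    # Model steps whose dataslot is varied
    dataslot_name: str  # the dataslot's name from the Model details page
    iterations: list[list[str]] = field(default_factory=list)  # one inner list per table

    def add_iteration(self, dataset_ids: list[str]) -> None:
        """Mimic "Add iteration", then "Add datasets" and "Confirm"."""
        self.iterations.append(list(dataset_ids))

    def datasets_for(self, iteration: int) -> dict[str, dict[str, list[str]]]:
        """Datasets each selected step receives on this dataslot in one iteration (0-based)."""
        return {
            step: {self.dataslot_name: list(self.iterations[iteration])}
            for step in self.steps
        }


slot = DataslotToIterate(["family-model"], "family_tree")
slot.add_iteration(["family-tree-one-dataset-id"])
slot.add_iteration(["family-tree-two-dataset-id", "extra-dataset-id"])
print(len(slot.iterations))  # 2
print(slot.datasets_for(1))
```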
Finishing up
Once you have finished customising the loop step to vary the parameters and dataslots you want, press the "Set parameters and datasets" button to apply these changes to the step. You can then press "Upload workflow and parameter set" to finish the Workflow and upload it to DAFNI.
Now you have a Workflow that will perform a loop of the Covid 19 Model and can run this as you would run any other Workflow in DAFNI.