Commit e1beb37

Mackenzie-OO7 authored and mr-c committed
document scatterMethod
1 parent b6c2f66 commit e1beb37

1 file changed

+114
-5
lines changed

src/topics/workflows.md

@@ -402,6 +402,7 @@ This feature tells the runner that you wish to run a tool or workflow multiple t
402402
of inputs. The workflow then takes the input(s) as an array and will run the specified step(s)
403403
on each element of the array as if it were a single input. This allows you to run the same workflow
404404
on multiple inputs without having to generate many different commands or input yaml files.
405+
To use `scatter`, `ScatterFeatureRequirement` must be specified in the workflow or workflow step requirements.
405406

406407
```cwl
407408
requirements:
@@ -439,13 +440,12 @@ steps:
439440
```
440441

441442
Here we've added a new field to the step `echo` called `scatter`. This field tells the
442-
runner that we'd like to scatter over this input for this particular step. Note that
443+
runner that we'd like to scatter over this input for this particular step. The declared type of each scattered input implicitly becomes an array of that type, and an input parameter may be listed more than once; if it is, it becomes
444+
a nested array. As a result, upstream parameters which are connected to the
445+
scattered parameters must be arrays. Note that
443446
the input name listed after `scatter` is the name of one of the step's inputs, not a workflow-level input.
444447

445-
For our first scatter, it's as simple as that! Since our tool doesn't collect any outputs, we
446-
still use `outputs: []` in our workflow, but if you expect that the final output of your
447-
workflow will now have multiple outputs to collect, be sure to update that to an array type
448-
as well!
448+
For our first scatter, it's as simple as that! Each job in the scatter results in an entry in the output array because all output parameter types are also implicitly wrapped in arrays. Since our tool doesn't collect any outputs, we still use `outputs: []` in our workflow, but if you expect that the final output of your workflow will now have multiple outputs to collect, be sure to update that to an array type as well!
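
As a rough mental model only (not how any particular CWL runner is implemented), a single-input scatter behaves like mapping the step over the input array and collecting one result per job. In the Python sketch below, `run_step` and `scatter` are hypothetical names introduced for illustration.

```python
def run_step(message):
    # Hypothetical stand-in for the scattered step: handles ONE element.
    return f"echo {message}"


def scatter(step, values):
    # One job per array element; each job contributes one entry
    # to the output array, in the same order as the inputs.
    return [step(value) for value in values]


outputs = scatter(run_step, ["Hello", "Hallo", "Bonjour"])
```

The step itself still sees a single value; only the workflow-level view of its input and output becomes an array.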
449449

450450
Using the following input file:
451451

@@ -519,6 +519,115 @@ two-step workflow to a single step subworkflow:
519519
Now the scatter acts on a single step, but that step consists of two steps so each step is performed
520520
in parallel.
521521

522+
If `scatter` declares more than one input parameter, `scatterMethod`
523+
describes how to divide the inputs into separate jobs. CWL defines three scatter methods: `dotproduct`, `flat_crossproduct`, and `nested_crossproduct`.
524+
525+
`dotproduct` specifies that the input arrays are aligned and one
526+
element is taken from each array to construct each job. It is an error
527+
if the input arrays are not all the same length.
528+
529+
```cwl
#!/usr/bin/env cwl-runner
cwlVersion: v1.2
class: Workflow

requirements:
  ScatterFeatureRequirement: {}

inputs:
  message_file: File[]
  message_array: string[]

outputs:
  output_array:
    type: File[]
    outputSource: step1/output

steps:
  step1:
    run: example.cwl
    scatter: [input_file, input_array]
    scatterMethod: dotproduct
    in:
      input_file: message_file
      input_array: message_array
    out: [output]
```
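
To make the `dotproduct` pairing concrete, here is a minimal Python sketch of how the scattered arrays could be divided into jobs. It is an illustration under assumptions, not a runner's actual code: `dotproduct_jobs` is a hypothetical helper, and the file names and messages are made-up sample values.

```python
def dotproduct_jobs(**scattered):
    """Pair the i-th element of every scattered array into one job."""
    lengths = {len(values) for values in scattered.values()}
    if len(lengths) > 1:
        # Mirrors the rule that all scattered arrays must have equal length.
        raise ValueError("dotproduct requires arrays of the same length")
    names = list(scattered)
    # zip() aligns the arrays element by element.
    return [dict(zip(names, combo)) for combo in zip(*scattered.values())]


jobs = dotproduct_jobs(
    input_file=["a.txt", "b.txt"],      # sample values, assumed
    input_array=["hello", "world"],     # sample values, assumed
)
```

With two files and two messages this yields two jobs: the first file with the first message, and the second file with the second message.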
556+
557+
`nested_crossproduct` specifies the Cartesian product of the inputs,
558+
producing a job for every combination of the scattered inputs. The
559+
output must be nested arrays for each level of scattering, in the
560+
order that the input arrays are listed in the `scatter` field.
561+
562+
```cwl
#!/usr/bin/env cwl-runner
cwlVersion: v1.2
class: Workflow

requirements:
  ScatterFeatureRequirement: {}

inputs:
  message_file: File[]
  message_array: string[]

outputs:
  output_array:
    type:
      type: array
      items:
        type: array
        items: File
    outputSource: step1/output

steps:
  step1:
    run: example.cwl
    scatter: [input_file, input_array]
    scatterMethod: nested_crossproduct
    in:
      input_file: message_file
      input_array: message_array
    out: [output]
```
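
The shape of the `nested_crossproduct` result can be sketched in Python as nested loops, one level per scattered input. This is only a model of the output structure: `run_step` is a hypothetical stand-in for the step, and the sample values are assumptions.

```python
def run_step(input_file, input_array):
    # Hypothetical stand-in for the scattered step.
    return f"{input_file}:{input_array}"


def nested_crossproduct(step, first, second):
    # Outer list follows the first scattered input,
    # inner list follows the second.
    return [[step(a, b) for b in second] for a in first]


nested = nested_crossproduct(run_step, ["a.txt", "b.txt"], ["x", "y", "z"])
```

Two files and three messages give six jobs, arranged as two inner arrays of three results each, which is why the workflow output above is declared as an array of arrays.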
593+
594+
`flat_crossproduct` specifies the Cartesian product of the inputs,
595+
producing a job for every combination of the scattered inputs. The
596+
output arrays must be flattened to a single level, but otherwise listed in the
597+
order that the input arrays are listed in the `scatter` field.
598+
599+
```cwl
#!/usr/bin/env cwl-runner
cwlVersion: v1.2
class: Workflow

requirements:
  ScatterFeatureRequirement: {}

inputs:
  message_file: File[]
  message_array: string[]

outputs:
  output_array:
    # flat_crossproduct flattens the results to a single-level array
    type: File[]
    outputSource: step1/output

steps:
  step1:
    run: example.cwl
    scatter: [input_file, input_array]
    scatterMethod: flat_crossproduct
    in:
      input_file: message_file
      input_array: message_array
    out: [output]
```
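
For comparison, a `flat_crossproduct` result can be modelled in Python with `itertools.product`, which visits every combination and returns them in one flat list. As before, this is a sketch under assumptions rather than a runner's implementation; `run_step` and the sample values are hypothetical.

```python
from itertools import product


def run_step(input_file, input_array):
    # Hypothetical stand-in for the scattered step.
    return f"{input_file}:{input_array}"


def flat_crossproduct(step, first, second):
    # Same combinations as the nested form, flattened to one level;
    # the first scattered input varies slowest.
    return [step(a, b) for a, b in product(first, second)]


flat = flat_crossproduct(run_step, ["a.txt", "b.txt"], ["x", "y", "z"])
```

The same six jobs run as in the nested case; only the structure of the collected output differs.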
630+
522631
## Conditional Workflows
523632

524633
This workflow contains a conditional step and is executed based on the input.
