src/topics/workflows.md
+114-5
@@ -402,6 +402,7 @@ This feature tells the runner that you wish to run a tool or workflow multiple t
402
402
of inputs. The workflow then takes the input(s) as an array and will run the specified step(s)
403
403
on each element of the array as if it were a single input. This allows you to run the same workflow
404
404
on multiple inputs without having to generate many different commands or input yaml files.
405
+
To use `scatter`, `ScatterFeatureRequirement` must be specified in the workflow or workflow step requirements.
405
406
406
407
```cwl
407
408
requirements:
@@ -439,13 +440,12 @@ steps:
439
440
```
440
441
441
442
Here we've added a new field to the step `echo` called `scatter`. This field tells the
442
-
runner that we'd like to scatter over this input for this particular step. Note that
443
+
runner that we'd like to scatter over this input for this particular step. An input parameter may be listed more than once; if it is, it becomes
444
+
a nested array. As a result, upstream parameters which are connected to the
445
+
scattered parameters must be arrays. Note that
443
446
the input name listed after `scatter` is that of the step's input, not a workflow-level input.
444
447
445
-
For our first scatter, it's as simple as that! Since our tool doesn't collect any outputs, we
446
-
still use `outputs: []` in our workflow, but if you expect that the final output of your
447
-
workflow will now have multiple outputs to collect, be sure to update that to an array type
448
-
as well!
448
+
For our first scatter, it's as simple as that! Each job in the scatter produces one entry in the output array, so every output parameter type is implicitly wrapped in an array. Since our tool doesn't collect any outputs, we still use `outputs: []` in our workflow, but if you expect your workflow to collect an output from each job, be sure to update that output to an array type as well!
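As a rough mental model (plain Python, not how any CWL runner is actually implemented), scattering a step over one input behaves like mapping the step over the array: one job per element, with the results collected into an output array in the same order. The `echo_step` function and the sample messages below are hypothetical stand-ins for illustration.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def scatter_single(step: Callable[[T], R], values: List[T]) -> List[R]:
    """Run `step` once per element and collect the results in input order."""
    return [step(value) for value in values]


def echo_step(message: str) -> str:
    """Hypothetical stand-in for the scattered step."""
    return f"echo {message}"


# Three input elements give three jobs and a three-element output array.
results = scatter_single(echo_step, ["Hello", "Hola", "Bonjour"])
print(results)
```

This is why a step output declared as a single value shows up as an array of that type once the step is scattered.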
449
449
450
450
Using the following input file:
451
451
@@ -519,6 +519,115 @@ two-step workflow to a single step subworkflow:
519
519
Now the scatter acts on a single step, but that step consists of two steps so each step is performed
520
520
in parallel.
521
521
522
+
If `scatter` declares more than one input parameter, `scatterMethod`
523
+
describes how to divide the inputs into separate jobs. There are three scatter methods in CWL: `dotproduct`, `nested_crossproduct`, and `flat_crossproduct`.
524
+
525
+
`dotproduct` specifies that the input arrays are aligned and one
526
+
element is taken from each array to construct each job. It is an error
527
+
if the input arrays are not all the same length.
528
+
529
+
```cwl
#!/usr/bin/env cwl-runner
cwlVersion: v1.2
class: Workflow

requirements:
  ScatterFeatureRequirement: {}

inputs:
  message_file: File[]
  message_array: string[]

outputs:
  output_array:
    type: File[]
    outputSource: step1/output

steps:
  step1:
    run: example.cwl
    scatter: [input_file, input_array]
    scatterMethod: dotproduct
    in:
      input_file: message_file
      input_array: message_array
    out: [output]
```
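The pairing that `dotproduct` performs can be sketched in plain Python. This is only an illustrative model of the rule described above, not code from any CWL runner; the function name and the sample file names and messages are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def dotproduct_jobs(
    files: Sequence[str], messages: Sequence[str]
) -> List[Tuple[str, str]]:
    """Pair the i-th file with the i-th message; one job per pair."""
    if len(files) != len(messages):
        # Mirrors the rule that arrays of different lengths are an error.
        raise ValueError("dotproduct requires input arrays of equal length")
    return list(zip(files, messages))


# Two files and two messages give two jobs, so the output is a flat array.
jobs = dotproduct_jobs(["a.txt", "b.txt"], ["hello", "world"])
print(jobs)
```

Because there is exactly one job per aligned pair, the step output stays a single-level array, which is why `File[]` is the matching workflow output type.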
556
+
557
+
`nested_crossproduct` specifies the Cartesian product of the inputs,
558
+
producing a job for every combination of the scattered inputs. The
559
+
output must be nested arrays for each level of scattering, in the
560
+
order that the input arrays are listed in the `scatter` field.
561
+
562
+
```cwl
#!/usr/bin/env cwl-runner
cwlVersion: v1.2
class: Workflow

requirements:
  ScatterFeatureRequirement: {}

inputs:
  message_file: File[]
  message_array: string[]

outputs:
  output_array:
    type:
      type: array
      items:
        type: array
        items: File
    outputSource: step1/output

steps:
  step1:
    run: example.cwl
    scatter: [input_file, input_array]
    scatterMethod: nested_crossproduct
    in:
      input_file: message_file
      input_array: message_array
    out: [output]
```
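To see why the output needs one array level per scattered input, here is a small Python model of `nested_crossproduct`. It is a sketch under the assumption that the first parameter listed in `scatter` forms the outer level; the function name and sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def nested_crossproduct_jobs(
    files: Sequence[str], messages: Sequence[str]
) -> List[List[Tuple[str, str]]]:
    """Build one job per (file, message) combination.

    The outer list follows the first scattered input and the inner list
    follows the second, matching the order given in `scatter`.
    """
    return [[(f, m) for m in messages] for f in files]


# Two files and three messages give a 2 x 3 grid of jobs.
grid = nested_crossproduct_jobs(["a.txt", "b.txt"], ["x", "y", "z"])
print(len(grid), len(grid[0]))
```

Unlike `dotproduct`, the input arrays may have different lengths here, since every combination is generated.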
593
+
594
+
`flat_crossproduct` specifies the Cartesian product of the inputs,
595
+
producing a job for every combination of the scattered inputs. The
596
+
output arrays must be flattened to a single level, but otherwise listed in the
597
+
order that the input arrays are listed in the `scatter` field.
598
+
599
+
```cwl
#!/usr/bin/env cwl-runner
cwlVersion: v1.2
class: Workflow

requirements:
  ScatterFeatureRequirement: {}

inputs:
  message_file: File[]
  message_array: string[]

outputs:
  output_array:
    # flat_crossproduct flattens the results to a single-level array
    type: File[]
    outputSource: step1/output

steps:
  step1:
    run: example.cwl
    scatter: [input_file, input_array]
    scatterMethod: flat_crossproduct
    in:
      input_file: message_file
      input_array: message_array
    out: [output]
```
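A comparable Python model of `flat_crossproduct` uses `itertools.product`, which yields every combination with the last input varying fastest. This is an illustrative sketch only; the function name and sample values are assumptions, and a real runner is free to implement the method differently.

```python
from itertools import product
from typing import List, Sequence, Tuple


def flat_crossproduct_jobs(
    files: Sequence[str], messages: Sequence[str]
) -> List[Tuple[str, str]]:
    """Build one job per (file, message) combination as a single flat list."""
    return list(product(files, messages))


# Two files and three messages give six jobs in one flat list.
flat = flat_crossproduct_jobs(["a.txt", "b.txt"], ["x", "y", "z"])
print(len(flat))
```

The jobs are the same ones `nested_crossproduct` would produce, in the same order, but without the inner array level.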
630
+
522
631
## Conditional Workflows
523
632
524
633
This workflow contains a conditional step and is executed based on the input.