From fcd27ba733fdd9ff93e5eea33f75dd9100c55f3c Mon Sep 17 00:00:00 2001
From: Dorota Jarecka <djarecka@gmail.com>
Date: Mon, 16 Oct 2023 22:37:23 -0400
Subject: [PATCH 1/2] updates to Intro and FunctionTask

---
 notebooks/1_intro_pydra.md        |  51 +++----
 notebooks/2_intro_functiontask.md | 220 ++++++------------------------
 2 files changed, 63 insertions(+), 208 deletions(-)

diff --git a/notebooks/1_intro_pydra.md b/notebooks/1_intro_pydra.md
index fe65a2d..0b64f62 100644
--- a/notebooks/1_intro_pydra.md
+++ b/notebooks/1_intro_pydra.md
@@ -4,9 +4,9 @@ jupytext:
     extension: .md
     format_name: myst
     format_version: 0.13
-    jupytext_version: 1.14.0
+    jupytext_version: 1.15.2
 kernelspec:
-  display_name: Python 3
+  display_name: Python 3 (ipykernel)
   language: python
   name: python3
 ---
@@ -15,46 +15,31 @@ kernelspec:
 
 +++
 
-Pydra is a lightweight, Python 3.7+ dataflow engine for computational graph construction, manipulation, and distributed execution.
-Designed as a general-purpose engine to support analytics in any scientific domain; created for [Nipype](https://github.com/nipy/nipype), and helps build reproducible, scalable, reusable, and fully automated, provenance tracked scientific workflows.
-The power of Pydra lies in ease of workflow creation
-and execution for complex multiparameter map-reduce operations, and the use of global cache.
+Pydra is a lightweight, Python 3.7+ dataflow engine. While it originated within the neuroimaging community, its versatile design makes it suitable as a general-purpose engine to facilitate analytics across various scientific fields.
 
-Pydra's key features are:
-- Consistent API for Task and Workflow
-- Splitting & combining semantics on Task/Workflow level
-- Global cache support to reduce recomputation
-- Support for execution of Tasks in containerized environments
 
-+++
+You can find a more in-depth explanation of the concepts behind Pydra here. TODO-LINK
 
-## Pydra computational objects - Tasks
-There are two main types of objects in *pydra*: `Task` and `Workflow`, that is also a type of `Task`, and can be used in a nested workflow.
-![nested_workflow.png](../figures/nested_workflow.png)
++++
 
+In this tutorial, you will create and execute your first `Task`s from standard Python functions and shell commands.
+You'll also construct basic `Workflow`s that link multiple tasks together. Furthermore, you'll create `Task`s and `Workflow`s that automatically run for multiple input values.
 
++++
 
-**These are the current `Task` implemented in Pydra:**
-- `Workflow`: connects multiple `Task`s withing a graph
-- `FunctionTask`: wrapper for Python functions
-- `ShellCommandTask`: wrapper for shell commands
-    - `ContainerTask`: wrapper for shell commands run within containers
-      - `DockerTask`: `ContainerTask` that uses Docker
-      - `SingularityTask`: `ContainerTask` that uses Singularity
+**Before going to the main notebooks, let's check if pydra is properly installed.** If you have any issues running the following cell, please revisit the Installation section. TODO-LINK
 
-+++
+```{code-cell} ipython3
+import pydra
+```
 
-## Pydra Workers
-Pydra supports multiple workers to execute `Tasks` and `Workflows`:
-- `ConcurrentFutures`
-- `SLURM`
-- `Dask`
-- `PSI/J`
+### Additional notes
 
 +++
 
-**Before going to next notebooks, let's check if pydra is properly installed**
-
-```{code-cell}
-import pydra
+At the beginning of each tutorial, you will see:
+```python
+import nest_asyncio
+nest_asyncio.apply()
 ```
+This is run because both *Jupyter* and *Pydra* use `asyncio` and in some cases you can see `RuntimeError: This event loop is already running` if `nest_asyncio` is not used. **This part is not needed if Pydra is used outside the Jupyter environment.**
diff --git a/notebooks/2_intro_functiontask.md b/notebooks/2_intro_functiontask.md
index ace8f2d..e1e2c9c 100644
--- a/notebooks/2_intro_functiontask.md
+++ b/notebooks/2_intro_functiontask.md
@@ -4,15 +4,13 @@ jupytext:
     extension: .md
     format_name: myst
     format_version: 0.13
-    jupytext_version: 1.15.0
+    jupytext_version: 1.15.2
 kernelspec:
   display_name: Python 3 (ipykernel)
   language: python
   name: python3
 ---
 
-# FunctionTask
-
 ```{code-cell} ipython3
 ---
 jupyter:
@@ -23,11 +21,14 @@ pycharm:
     '
 ---
 import nest_asyncio
-
 nest_asyncio.apply()
 ```
 
-A `FunctionTask` is a `Task` that can be created from every *python* function by using *pydra* decorator: `pydra.mark.task`:
+# FunctionTask
+
++++
+
+In this tutorial, you will generate your initial *pydra* `Task`, which is a fundamental *pydra*'s component capable of processing data. You will start from a `FunctionTask`, a type of `Task` that can be created from every *python* function by using *pydra* decorator: `pydra.mark.task`:
 
 ```{code-cell} ipython3
 import pydra
@@ -37,26 +38,20 @@ def add_var(a, b):
     return a + b
 ```
 
-Once we decorate the function, we can create a pydra `Task` and specify the input:
+Now that we decorated the function, we can create a pydra `Task` and specify the input, for this example values for `a` and `b` are needed.
 
 ```{code-cell} ipython3
 task0 = add_var(a=4, b=5)
 ```
 
-We can check the type of `task0`:
-
-```{code-cell} ipython3
-type(task0)
-```
-
-and we can check if the task has correct values of `a` and `b`, they should be saved in the task `inputs`:
+You can now check whether the task has the correct values of `a` and `b`; they should be stored in the task's `inputs`:
 
 ```{code-cell} ipython3
 print(f'a = {task0.inputs.a}')
 print(f'b = {task0.inputs.b}')
 ```
 
-We can also check content of entire `inputs`:
+You can also check the content of the entire `inputs`:
 
 ```{code-cell} ipython3
 task0.inputs
@@ -64,132 +59,41 @@ task0.inputs
 
 As you could see, `task.inputs` contains also information about the function, that is an inseparable part of the `FunctionTask`.
 
-Once we have the task with set input, we can run it. Since `Task` is a "callable object", we can use the syntax:
+Once the input values are set, you can run the task. Since a `Task` is a "callable object", you can use the following syntax:
 
 ```{code-cell} ipython3
 task0()
 ```
 
-As you can see, the result was returned right away, but we can also access it later:
+As you can see, the result was returned right away, but you can also access it later:
 
 ```{code-cell} ipython3
 task0.result()
 ```
 
-`Result` contains more than just an output, so if we want to get the task output, we can type:
+Both `task0()` and `task0.result()` return a `Result` object. A `Result` contains more than just the output, so to get the task output, you can type:
 
 ```{code-cell} ipython3
 result = task0.result()
 result.output.out
 ```
 
-And if we want to see the input that was used in the task, we can set an optional argument `return_inputs` to True.
+You can also see the input that was used to run the task by setting the optional argument `return_inputs` to `True`:
 
 ```{code-cell} ipython3
 task0.result(return_inputs=True)
 ```
 
-## Type-checking
-
-+++
-
-### What is Type-checking?
-
-Type-checking is verifying the type of a value at compile or run time. It ensures that operations or assignments to variables are semantically meaningful and can be executed without type errors, enhancing code reliability and maintainability.
-
-+++
-
-### Why Use Type-checking?
-
-1. **Error Prevention**: Type-checking helps catch type mismatches early, preventing potential runtime errors.
-2. **Improved Readability**: Type annotations make understanding what types of values a function expects and returns more straightforward.
-3. **Better Documentation**: Explicitly stating expected types acts as inline documentation, simplifying code collaboration and review.
-4. **Optimized Performance**: Type-related optimizations can be made during compilation when types are explicitly specified.
+Notice that each input name is prefixed with the name of the task (for example, `add_var.a`)!
 
 +++
 
-### How is Type-checking Implemented in Pydra?
+If you want to practice, change the values of `a` and `b` and run the task again. 
 
 +++
 
-#### Static Type-Checking
-Static type-checking is done using Python's type annotations. You annotate the types of your function arguments and the return type and then use a tool like `mypy` to statically check if you're using the function correctly according to those annotations.
-
-```{code-cell} ipython3
-@pydra.mark.task
-def add(a: int, b: int) -> int:
-    return a + b
-```
-
-```{code-cell} ipython3
-# This usage is correct according to static type hints:
-task1a = add(a=5, b=3)
-task1a()
-```
-
-```{code-cell} ipython3
-:tags: [raises-exception]
-# This usage is incorrect according to static type hints:
-task1b = add(a="hello", b="world")
-task1b()
-```
-
-#### Dynamic Type-Checking
-
-Dynamic type-checking is done at runtime. Add dynamic type checks if you want to enforce types when the function is executed.
-
-```{code-cell} ipython3
-@pydra.mark.task
-def add(a, b):
-    if not (isinstance(a, int) and isinstance(b, int)):
-        raise TypeError("Both inputs should be integers.")
-    return a + b
-```
-
-```{code-cell} ipython3
-# This usage is correct and will not raise a runtime error:
-task1c = add(a=5, b=3)
-task1c()
-```
-
-```{code-cell} ipython3
-:tags: [raises-exception]
-# This usage is incorrect and will raise a runtime TypeError:
-task1d = add(a="hello", b="world")
-task1d()
-```
-
-#### Checking Complex Types
-
-For more complex types like lists, dictionaries, or custom objects, we can use type hints combined with dynamic checks.
-
-```{code-cell} ipython3
-from typing import List, Tuple
-
-@pydra.mark.task
-def sum_of_pairs(pairs: List[Tuple[int, int]]) -> List[int]:
-    if not all(isinstance(pair, Tuple) and len(pair) == 2 for pair in pairs):
-        raise ValueError("Input should be a list of pairs (tuples with 2 integers each).")
-    return [sum(pair) for pair in pairs]
-```
-
-```{code-cell} ipython3
-# Correct usage
-task1e = sum_of_pairs(pairs=[(1, 2), (3, 4)])  
-task1e()
-```
-
-```{code-cell} ipython3
-:tags: [raises-exception]
-# This will raise a ValueError
-task1f = sum_of_pairs(pairs=[(1, 2), (3, "4")])  
-task1f()
-```
-
 ## Customizing output names
-Note, that "out" is the default name for the task output, but we can always customize it. There are two ways of doing it: using *python* function annotation and using another *pydra* decorator:
-
-Let's start from the function annotation:
+Note, that "out" is the default name for the task output, but you can always customize it by using *python* function annotation.
 
 ```{code-cell} ipython3
 import typing as ty
@@ -217,27 +121,9 @@ task2b = modf_an(a=3.5)
 task2b()
 ```
 
-The second way of customizing the output requires another decorator - `pydra.mark.annotate`
-
-```{code-cell} ipython3
-@pydra.mark.task
-@pydra.mark.annotate({'return': {'fractional': ty.Any, 'integer': ty.Any}})
-def modf(a: float):
-    import math
-
-    return math.modf(a)
-
-task2c = modf(a=3.5)
-task2c()
-```
-
-**Note, that the order of the pydra decorators is important!**
-
-+++
-
 ## Setting the input
 
-We don't have to provide the input when we create a task, we can always set it later:
+Note that you don't have to provide the input when you create a task; you can always set it later:
 
 ```{code-cell} ipython3
 task3 = add_var()
@@ -246,19 +132,16 @@ task3.inputs.b = 5
 task3()
 ```
 
-If we don't specify the input, `attr.NOTHING` will be used as the default value
+If you don't specify an input, `attr.NOTHING` is used as its default value:
 
 ```{code-cell} ipython3
 task3a = add_var()
 task3a.inputs.a = 4
 
-# importing attr library, and checking the type of `b`
-import attr
-
-task3a.inputs.b == attr.NOTHING
+task3a.inputs.b
 ```
 
-And if we try to run the task, an error will be raised:
+And if you try to run the task, an error will be raised:
 
 ```{code-cell} ipython3
 :tags: [raises-exception]
@@ -266,9 +149,13 @@ And if we try to run the task, an error will be raised:
 task3a()
 ```
 
+You can now try to fix the task and run it again.
+
++++
+
 ## Output directory and caching the results
 
-After running the task, we can check where the output directory with the results was created:
+After running the task, you can check where the output directory with the results was created:
 
 ```{code-cell} ipython3
 task3.output_dir
@@ -278,13 +165,13 @@ Within the directory you can find the file with the results: `_result.pklz`.
 
 ```{code-cell} ipython3
 import os
-```
-
-```{code-cell} ipython3
 os.listdir(task3.output_dir)
 ```
 
-But we can also provide the path where we want to store the results. If a path is provided for the cache directory, then pydra will use the cached results of a node instead of recomputing the result. Let's create a temporary directory and a specific subdirectory "task4":
+But you can also provide the path where you want to store the results. 
+**Note that if the same path is provided when you run the task again, pydra will use the cached results instead of recomputing the result.** 
+
+Let's create a temporary directory and a specific subdirectory "task4":
 
 ```{code-cell} ipython3
 from tempfile import mkdtemp
@@ -296,7 +183,7 @@ cache_dir_tmp = Path(mkdtemp()) / 'task4'
 print(cache_dir_tmp)
 ```
 
-Now we can pass this path to the argument of `FunctionTask` - `cache_dir`. To observe the execution time, we specify a function that is sleeping for 5s:
+Now you can pass this path to the `cache_dir` argument of `FunctionTask`. To observe the execution time, you can define a function that sleeps for 5 s:
 
 ```{code-cell} ipython3
 @pydra.mark.task
@@ -311,25 +198,29 @@ task4 = add_var_wait(a=4, b=6, cache_dir=cache_dir_tmp)
 
 If you're running the cell first time, it should take around 5s.
 
+You can measure the execution time by adding the Jupyter (IPython) cell magic `%%time` at the top of the cell.
+
 ```{code-cell} ipython3
+%%time
 task4()
 task4.result()
 ```
 
-We can check `output_dir` of our task, it should contain the path of `cache_dir_tmp` and the last part contains the name of the task class `FunctionTask` and the task checksum:
+You can check the `output_dir` of the task: it should start with the path of `cache_dir_tmp`, and its last part contains the name of the task class, `FunctionTask`, followed by the task checksum, which is unique to a specific function and a specific set of input values. You can read more about the checksum here TODO-LINK
 
 ```{code-cell} ipython3
 task4.output_dir
 ```
 
-Let's see what happens when we defined identical task again with the same `cache_dir`:
+Let's see what happens when an identical task is run again with the same `cache_dir`:
 
 ```{code-cell} ipython3
+%%time
 task4a = add_var_wait(a=4, b=6, cache_dir=cache_dir_tmp)
 task4a()
 ```
 
-This time the result should be ready right away! *pydra* uses available results and do not recompute the task.
+This time the result should be ready right away! *pydra* uses available results and do not recompute the task. The wall time provided by `%%tinme` should be in milliseconds.
 
 *pydra* not only checks for the results in `cache_dir`, but you can provide a list of other locations that should be checked. Let's create another directory that will be used as `cache_dir` and previous working directory will be used in `cache_locations`.
 
@@ -342,7 +233,7 @@ task4b = add_var_wait(
 task4b()
 ```
 
-This time the results should be also returned quickly! And we can check that `task4b.output_dir` was not created:
+This time the result should also be returned quickly, and you can check that `task4b.output_dir` was not created:
 
 ```{code-cell} ipython3
 task4b.output_dir.exists()
@@ -361,7 +252,7 @@ task4c(rerun=True)
 task4c.output_dir.exists()
 ```
 
-If we update the input of the task, and run again, the new directory will be created and task will be recomputed:
+Remember that if you update the input of the task, a new directory will be created and the task will be recomputed!
 
 ```{code-cell} ipython3
 task4b.inputs.a = 1
@@ -369,25 +260,28 @@ print(task4b())
 print(task4b.output_dir.exists())
 ```
 
-and when we check the `output_dir`, we can see that it's different than last time:
+When you check the `output_dir`, you can see that it differs from last time:
 
 ```{code-cell} ipython3
 task4b.output_dir
 ```
 
-This is because, the checksum changes when we change either input or function.
+This is because the checksum changes when you change either the input or the function.
 
 +++ {"solution2": "hidden", "solution2_first": true}
 
 ### Exercise 1
+Now you can practice creating new tasks!
+
 Create a task that take a list of numbers as an input and returns two fields: `mean` with the mean value and `std` with the standard deviation value.
 
 ```{code-cell} ipython3
 :tags: [hide-cell]
 
+#TODO-HIDE
 @pydra.mark.task
 @pydra.mark.annotate({'return': {'mean': ty.Any, 'std': ty.Any}})
-def mean_dev(my_list: List):
+def mean_dev(my_list):
     import statistics as st
 
     return st.mean(my_list), st.stdev(my_list)
@@ -400,27 +294,3 @@ my_task.result()
 ```{code-cell} ipython3
 # write your solution here (you can use statistics module)
 ```
-
-## Using Audit
-
-*pydra* can record various run time information, including the workflow provenance, by setting `audit_flags` and the type of messengers.
-
-`AuditFlag.RESOURCE` allows you to monitor resource usage for the `Task`, while `AuditFlag.PROV` tracks the provenance of the `Task`.
-
-```{code-cell} ipython3
-from pydra.utils.messenger import AuditFlag, PrintMessenger
-
-task5 = add_var(a=4, b=5, audit_flags=AuditFlag.RESOURCE)
-task5()
-task5.result()
-```
-
-One can turn on both audit flags using `AuditFlag.ALL`, and print the messages on the terminal using the `PrintMessenger`.
-
-```{code-cell} ipython3
-task5 = add_var(
-    a=4, b=5, audit_flags=AuditFlag.ALL, messengers=PrintMessenger()
-)
-task5()
-task5.result()
-```

From 31078bf4bf49696bfe1b3aafa7983d62ce2fee4f Mon Sep 17 00:00:00 2001
From: Dorota Jarecka <djarecka@gmail.com>
Date: Tue, 17 Oct 2023 14:02:53 -0400
Subject: [PATCH 2/2] applying Yibei suggestions

---
 notebooks/1_intro_pydra.md        |  4 ++--
 notebooks/2_intro_functiontask.md | 12 ++++++------
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/notebooks/1_intro_pydra.md b/notebooks/1_intro_pydra.md
index 0b64f62..ec85f2e 100644
--- a/notebooks/1_intro_pydra.md
+++ b/notebooks/1_intro_pydra.md
@@ -27,7 +27,7 @@ You'll also construct basic `Workflow`s that link multiple tasks together. Furth
 
 +++
 
-**Before going to the main notebooks, let's check if pydra is properly installed.** If you have any issues running the following cell, please revisit the Installation section. TODO-LINK
+**Let's check whether Pydra is properly installed.** If you have any issues running the following cell, please revisit the Installation section. TODO-LINK
 
 ```{code-cell} ipython3
 import pydra
@@ -42,4 +42,4 @@ At the beginning of each tutorial you will see:
 import nest_asyncio
 nest_asyncio.apply()
 ```
-This is run because both *Jupyter* and *Pydra* use `asyncio` and in some cases you can see `RuntimeError: This event loop is already running` if `nest_asyncio` is not used. **This part is not needed if Pydra is used outside the Jupyter environment.**
+This is needed because both *Jupyter* and *Pydra* use `asyncio`, and without `nest_asyncio` you can get `RuntimeError: This event loop is already running`. **This step is not needed if Pydra is used outside of Jupyter Notebook/Lab.**
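The following is a minimal sketch of where this error comes from. It uses only the standard `asyncio` module, with no Jupyter, Pydra, or `nest_asyncio`. Code that is already running inside an event loop tries to drive that same loop again with `loop.run_until_complete`. The coroutine names `inner` and `outer` are illustrative. The assumption is that this mirrors, in simplified form, what happens when a blocking call tries to run the loop that a Jupyter kernel is already running. On CPython 3.8 and later, the captured message should be the one quoted above.

```python
import asyncio


async def inner():
    return 42


async def outer():
    # `outer` is executed by a running event loop, like code in a Jupyter cell.
    loop = asyncio.get_running_loop()
    coro = inner()
    try:
        # Trying to drive the same, already running loop from inside it fails.
        return loop.run_until_complete(coro)
    except RuntimeError as err:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(err)


message = asyncio.run(outer())
print(message)  # This event loop is already running
```

`nest_asyncio` patches `asyncio` to allow this kind of nested use of `asyncio.run` and `loop.run_until_complete`. That is why the tutorials call `nest_asyncio.apply()` before running any tasks.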
diff --git a/notebooks/2_intro_functiontask.md b/notebooks/2_intro_functiontask.md
index e1e2c9c..bb34a72 100644
--- a/notebooks/2_intro_functiontask.md
+++ b/notebooks/2_intro_functiontask.md
@@ -28,7 +28,7 @@ nest_asyncio.apply()
 
 +++
 
-In this tutorial, you will generate your initial *pydra* `Task`, which is a fundamental *pydra*'s component capable of processing data. You will start from a `FunctionTask`, a type of `Task` that can be created from every *python* function by using *pydra* decorator: `pydra.mark.task`:
+In this tutorial, you will create your first *Pydra* `Task`, a fundamental *Pydra* component capable of processing data. You will start with a `FunctionTask`, a type of `Task` that can be created from any *Python* function by using the *Pydra* decorator `pydra.mark.task`:
 
 ```{code-cell} ipython3
 import pydra
@@ -38,7 +38,7 @@ def add_var(a, b):
     return a + b
 ```
 
-Now that we decorated the function, we can create a pydra `Task` and specify the input, for this example values for `a` and `b` are needed.
+After decorating the function, you can create a Pydra `Task` and specify the input. In this example, values for `a` and `b` are needed.
 
 ```{code-cell} ipython3
 task0 = add_var(a=4, b=5)
@@ -93,7 +93,7 @@ If you want to practice, change the values of `a` and `b` and run the task again
 +++
 
 ## Customizing output names
-Note, that "out" is the default name for the task output, but you can always customize it by using *python* function annotation.
+Note that `out` (as in `result.output.out`) is the default name for the task output, but you can customize it by using a *Python* function return annotation.
 
 ```{code-cell} ipython3
 import typing as ty
@@ -169,7 +169,7 @@ os.listdir(task3.output_dir)
 ```
 
 But you can also provide the path where you want to store the results. 
-**Note that if the same path is provided when you run the task again, pydra will use the cached results instead of recomputing the result.** 
+**Note that if the same path is provided when you run the task again, Pydra will use the cached results instead of recomputing them.**
 
 Let's create a temporary directory and a specific subdirectory "task4":
 
@@ -220,9 +220,9 @@ task4a = add_var_wait(a=4, b=6, cache_dir=cache_dir_tmp)
 task4a()
 ```
 
-This time the result should be ready right away! *pydra* uses available results and do not recompute the task. The wall time provided by `%%tinme` should be in milliseconds.
+This time the result should be ready right away! *Pydra* uses the available results and does not recompute the task. The wall time reported by `%%time` should be in milliseconds.
 
-*pydra* not only checks for the results in `cache_dir`, but you can provide a list of other locations that should be checked. Let's create another directory that will be used as `cache_dir` and previous working directory will be used in `cache_locations`.
+*Pydra* checks for results not only in `cache_dir`: you can also provide a list of other locations to check. Let's create another directory to use as `cache_dir`, and pass the previous cache directory in `cache_locations`.
 
 ```{code-cell} ipython3
 cache_dir_tmp_new = Path(mkdtemp()) / 'task4b'
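The caching behavior described in this tutorial can be sketched in plain Python:

- results are stored in a directory named after the task checksum;
- lookup checks `cache_dir` first, then each entry of `cache_locations`;
- the task is recomputed when the input or the function changes.

This is a hypothetical illustration, not Pydra's implementation. `task_checksum` and `cached_run` are made-up helpers, and the hash simply mixes the function's bytecode with its inputs rather than using Pydra's own hashing. The gzip-compressed pickle only imitates the `_result.pklz` file name that Pydra uses.

```python
import gzip
import hashlib
import pickle
import tempfile
from pathlib import Path


def task_checksum(func, inputs):
    """Hypothetical checksum: hash the function's code together with its inputs."""
    h = hashlib.sha256()
    h.update(func.__qualname__.encode())
    h.update(func.__code__.co_code)
    h.update(repr(func.__code__.co_consts).encode())
    h.update(repr(sorted(inputs.items())).encode())
    return h.hexdigest()


def cached_run(func, inputs, cache_dir, cache_locations=()):
    """Return (result, was_cached), reusing a stored result if one is found."""
    name = f"FunctionTask_{task_checksum(func, inputs)}"
    # Look in cache_dir first, then in every additional location.
    for location in [Path(cache_dir), *map(Path, cache_locations)]:
        result_file = location / name / "_result.pklz"
        if result_file.exists():
            with gzip.open(result_file, "rb") as f:
                return pickle.load(f), True
    # Nothing cached: compute the result and store it under cache_dir only.
    result = func(**inputs)
    output_dir = Path(cache_dir) / name
    output_dir.mkdir(parents=True, exist_ok=True)
    with gzip.open(output_dir / "_result.pklz", "wb") as f:
        pickle.dump(result, f)
    return result, False


def add_var(a, b):
    return a + b


first_dir = Path(tempfile.mkdtemp())
second_dir = Path(tempfile.mkdtemp())

print(cached_run(add_var, {"a": 4, "b": 6}, first_dir))  # (10, False): computed
print(cached_run(add_var, {"a": 4, "b": 6}, first_dir))  # (10, True): same checksum
# New cache_dir, with the old directory passed as an extra cache location.
print(cached_run(add_var, {"a": 4, "b": 6}, second_dir, [first_dir]))  # (10, True)
print(list(second_dir.iterdir()))  # []: nothing was written to second_dir
# Changing an input changes the checksum, so the result is recomputed.
print(cached_run(add_var, {"a": 1, "b": 6}, second_dir, [first_dir]))  # (7, False)
```

In the sketch, only `cache_dir` is written to, and `cache_locations` is treated as read-only. This matches the tutorial's observation that `task4b.output_dir` is not created when the result is found in another location.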