
make single pipeop application easier #683


Open · mb706 opened this issue Aug 12, 2022 · 6 comments · May be fixed by #882

mb706 commented Aug 12, 2022

# current, rather verbose way of applying a single PipeOp to a Task:
p = po("encode")
task_preproc = p$train(list(task))[[1]]

mb706 commented Aug 12, 2022

Potential problem: we don't want to tempt people into doing the wrong kind of preprocessing on the train and test sets, i.e. re-training the preprocessing on the test data instead of applying the state learned on the training data.
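
To make the concern concrete, a minimal sketch with the current PipeOp interface; po("scale") and tsk("iris") are just illustrative choices, not from this issue:

library(mlr3)
library(mlr3pipelines)

task_train = tsk("iris")$filter(1:100)
task_test  = tsk("iris")$filter(101:150)

p = po("scale")

# correct: estimate centering/scaling on the training data, then apply the
# stored $state to the test data
train_out = p$train(list(task_train))[[1]]
test_out  = p$predict(list(task_test))[[1]]

# wrong: calling $train() on the test data re-estimates means and standard
# deviations from the test set; this is the kind of mistake a too-convenient
# one-shot interface might invite
test_wrong = p$train(list(task_test))[[1]]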


mb706 commented Aug 12, 2022

suggestion (bb):

outdata = preproc("encode", indata, paramlist, state = state)


mb706 commented Aug 12, 2022

Maybe we want to have preproc(po("encode", paramlist)) instead: that is more consistent and would also work with PipeOps that cannot be constructed via po().
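
For comparison, a constructed PipeOp already carries its parameter values, which is what makes passing the object itself more uniform (parameter values chosen purely for illustration):

library(mlr3pipelines)

# the dictionary spelling and a direct constructor call yield equivalent objects;
# an interface that takes the PipeOp itself therefore also covers operators that
# are not registered in the po() dictionary at all (e.g. user-defined PipeOps)
p1 = po("encode", method = "treatment")
p2 = PipeOpEncode$new(param_vals = list(method = "treatment"))
p1$param_set$values
p2$param_set$values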


mb706 commented Aug 16, 2024

could also overload the %>>% operator here.


mb706 commented Sep 17, 2024

Maybe: whenever state is given, or when the optional argument predict = TRUE (default FALSE), do prediction; otherwise do training. We live with the fact that the state gets saved into the PipeOp passed as the first argument. There is no reason not to make this work with Graphs as well.
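
A rough, hypothetical sketch of that dispatch rule for a single PipeOp (the name preproc and the argument order are placeholders; the signature settled on later in this thread differs):

library(mlr3)
library(mlr3pipelines)

# train by default; predict whenever a state is supplied or predict = TRUE.
# side effect: the state ends up stored in the PipeOp passed as the first argument
preproc = function(op, indata, state = NULL, predict = FALSE) {
  if (!is.null(state)) {
    op$state = state
  }
  if (predict || !is.null(state)) {
    op$predict(list(indata))[[1]]
  } else {
    op$train(list(indata))[[1]]
  }
}

task = tsk("iris")
p = po("pca")
train_out = preproc(p, task)                   # trains; p$state is now set
test_out  = preproc(p, task, state = p$state)  # predicts using the given state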


mb706 commented Feb 20, 2025

preproc(indata, graph, state = NULL, predict = !is.null(state))

Thoughts:

  • We put indata first so that |> works well for transforming data.
  • Overloading %>>% is not so nice, since predict = ... could not be set that way. We could have something like a dedicated %>>p% operator for that, but that is probably too much clutter for too little gain, so for now we choose not to do it.

some details:

  • indata could also be a data.frame if we convert it to a Task that does not have a target. See if TaskUnsupervised works (it may be too abstract, in which case we would need to subclass it here). Autotests for PipeOps that do not use the target at all should check that preproc() works for them with data.frames: expect_datapreproc_pipeop_class should take the task's $data(cols = task$feature_names) and call preproc() with it. Maybe expect_datapreproc_pipeop_class needs an extra argument specifying whether target-less preproc() should be possible.
  • Even though it accepts arbitrary Graphs, this would only work with single-output Graphs; basically everything that works in a GraphLearner should also work here, so we could do checks similar to those in GraphLearner.
  • Check whether we can use assert_graph() on the 2nd argument or whether that breaks things, because the 2nd argument should be modified by reference (its state should be set!).
  • If no state is given and predict is FALSE (or not given, since it defaults to FALSE when state is NULL), graph$train() should be used and the graph should be modified by reference.
  • If state is given, it should likewise be set in the graph by reference.
  • If state is given and predict is explicitly set to FALSE, that is an error (see the sketch below).
  • Annoyingly, if you pass in a graph whose state is already set and do not set predict = TRUE, the graph's state will be overwritten by training. We just live with this, but we should highlight it in the docs and in an example.
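
Putting these points together, a rough sketch of what preproc() could look like. The TaskUnsupervised wrapping, the assumed state format (a named list of PipeOp states keyed by PipeOp id), and the exact error conditions are illustrative assumptions, not decisions from this thread:

library(mlr3)
library(mlr3pipelines)

preproc = function(indata, graph, state = NULL, predict = !is.null(state)) {
  graph = as_graph(graph)  # wraps a single PipeOp; contained objects are not cloned
  if (!is.null(state) && !predict) {
    stop("'state' was given, but 'predict' is FALSE")
  }
  if (is.data.frame(indata)) {
    # assumption: a target-less data.frame is wrapped in a TaskUnsupervised
    # (the bullet above notes that this still needs to be checked)
    indata = TaskUnsupervised$new("preproc_data", backend = indata)
  }
  if (!is.null(state)) {
    # assumed state format: named list of PipeOp states, keyed by PipeOp id
    for (id in names(state)) {
      graph$pipeops[[id]]$state = state[[id]]
    }
  }
  result = if (predict) graph$predict(indata) else graph$train(indata)
  if (length(result) != 1) {
    stop("preproc() only works with single-output Graphs")
  }
  result[[1]]
}

# 'indata' coming first means the base pipe reads naturally:
task = tsk("penguins")
imputed = task |> preproc(po("imputemode"))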
