
make single pipeop application easier #683


Open · mb706 opened this issue Aug 12, 2022 · 6 comments · May be fixed by #882

mb706 commented Aug 12, 2022

# current, rather verbose way of applying a single PipeOp to a Task:
p = po("encode")
task_preproc = p$train(list(task))[[1]]

mb706 commented Aug 12, 2022

Potential problem: we don't want to tempt people into doing the wrong kind of preprocessing on the train and test sets, i.e. re-training the preprocessing on the test data instead of applying the state learned on the training data.
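
To make the concern concrete, a minimal sketch with the current PipeOp interface; po("scale") and tsk("iris") are just illustrative choices, not from this issue:

library(mlr3)
library(mlr3pipelines)

task_train = tsk("iris")$filter(1:100)
task_test  = tsk("iris")$filter(101:150)

p = po("scale")

# correct: estimate centering/scaling on the training data, then apply the
# stored $state to the test data
train_out = p$train(list(task_train))[[1]]
test_out  = p$predict(list(task_test))[[1]]

# wrong: calling $train() on the test data re-estimates means and standard
# deviations from the test set; this is the kind of mistake a too-convenient
# one-shot interface might invite
test_wrong = p$train(list(task_test))[[1]]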


mb706 commented Aug 12, 2022

suggestion (bb):

outdata = preproc("encode", indata, paramlist, state = state)


mb706 commented Aug 12, 2022

Maybe we want to have preproc(po("encode", paramlist)) instead: that is more consistent and would also work with PipeOps that cannot be constructed via po().
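
For comparison, a constructed PipeOp already carries its parameter values, which is what makes passing the object itself more uniform (parameter values chosen purely for illustration):

library(mlr3pipelines)

# the dictionary spelling and a direct constructor call yield equivalent objects;
# an interface that takes the PipeOp itself therefore also covers operators that
# are not registered in the po() dictionary at all (e.g. user-defined PipeOps)
p1 = po("encode", method = "treatment")
p2 = PipeOpEncode$new(param_vals = list(method = "treatment"))
p1$param_set$values
p2$param_set$values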


mb706 commented Aug 16, 2024

could also overload the %>>% operator here.


mb706 commented Sep 17, 2024

Maybe: whenever state is given, or when the optional argument predict = TRUE (default FALSE), do prediction; otherwise do training. We live with the fact that the state gets saved into the PipeOp passed as the first argument. There is no reason not to make this work with Graphs as well.
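
A rough, hypothetical sketch of that dispatch rule for a single PipeOp (the name preproc and the argument order are placeholders; the signature settled on later in this thread differs):

library(mlr3)
library(mlr3pipelines)

# train by default; predict whenever a state is supplied or predict = TRUE.
# side effect: the state ends up stored in the PipeOp passed as the first argument
preproc = function(op, indata, state = NULL, predict = FALSE) {
  if (!is.null(state)) {
    op$state = state
  }
  if (predict || !is.null(state)) {
    op$predict(list(indata))[[1]]
  } else {
    op$train(list(indata))[[1]]
  }
}

task = tsk("iris")
p = po("pca")
train_out = preproc(p, task)                   # trains; p$state is now set
test_out  = preproc(p, task, state = p$state)  # predicts using the given state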


mb706 commented Feb 20, 2025

preproc(indata, graph, state = NULL, predict = !is.null(state))

Thoughts:

  • We put indata first so that |> works well for transforming data.
  • Overloading %>>% is not so nice, since predict = ... could not be set that way. We could have something like a dedicated %>>p% operator for that, but that is probably too much clutter for too little gain, so for now we choose not to do it.

some details:

  • indata could also be a data.frame if we convert it to a Task that does not have a target. See if TaskUnsupervised works (it may be too abstract, in which case we would need to subclass it here). Autotests for PipeOps that do not use the target at all should check that preproc() works for them with data.frames: expect_datapreproc_pipeop_class should take the task's $data(cols = task$feature_names) and call preproc() with it. Maybe expect_datapreproc_pipeop_class needs an extra argument specifying whether target-less preproc() should be possible.
  • Even though it accepts arbitrary Graphs, this would only work with single-output Graphs; basically everything that works in a GraphLearner should also work here, so we could do checks similar to those in GraphLearner.
  • Check whether we can use assert_graph() on the 2nd argument or whether that breaks things, because the 2nd argument should be modified by reference (its state should be set!).
  • If no state is given and predict is FALSE (or not given, since it defaults to FALSE when state is NULL), graph$train() should be used and the graph should be modified by reference.
  • If state is given, it should likewise be set in the graph by reference.
  • If state is given and predict is explicitly set to FALSE, that is an error (see the sketch below).
  • Annoyingly, if you pass in a graph whose state is already set and do not set predict = TRUE, the graph's state will be overwritten by training. We just live with this, but we should highlight it in the docs and in an example.
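
Putting these points together, a rough sketch of what preproc() could look like. The TaskUnsupervised wrapping, the assumed state format (a named list of PipeOp states keyed by PipeOp id), and the exact error conditions are illustrative assumptions, not decisions from this thread:

library(mlr3)
library(mlr3pipelines)

preproc = function(indata, graph, state = NULL, predict = !is.null(state)) {
  graph = as_graph(graph)  # wraps a single PipeOp; contained objects are not cloned
  if (!is.null(state) && !predict) {
    stop("'state' was given, but 'predict' is FALSE")
  }
  if (is.data.frame(indata)) {
    # assumption: a target-less data.frame is wrapped in a TaskUnsupervised
    # (the bullet above notes that this still needs to be checked)
    indata = TaskUnsupervised$new("preproc_data", backend = indata)
  }
  if (!is.null(state)) {
    # assumed state format: named list of PipeOp states, keyed by PipeOp id
    for (id in names(state)) {
      graph$pipeops[[id]]$state = state[[id]]
    }
  }
  result = if (predict) graph$predict(indata) else graph$train(indata)
  if (length(result) != 1) {
    stop("preproc() only works with single-output Graphs")
  }
  result[[1]]
}

# 'indata' coming first means the base pipe reads naturally:
task = tsk("penguins")
imputed = task |> preproc(po("imputemode"))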
