
H2O3 Mojo model scoring fails in python when offset column is used #16590


Open
arunaryasomayajula opened this issue Mar 12, 2025 · 5 comments · May be fixed by #16605
Labels
bug cust-statefarm Mojo reporter-support Reported as a support issue by customer

Comments

@arunaryasomayajula

H2O version, Operating System and Environment
3.46.0.6
Actual behavior
We have an XGBoost model, trained with an offset on an older H2O version, that has been through a lot of evaluation, and we are planning to deploy it. However, we are unable to predict with the MOJO model after we zero out the offset column. We get the following error.
Prediction works when we use the binary model. Can you please take a look and let us know if there is an alternate way to use the MOJO model? I don't think we can retrain the model at this point.

OSError: Job with key $03017f00000132d4ffffffff$_aa0f1f0d307fc3704b9cb49444844cd7 failed with an exception: DistributedException from /127.0.0.1:54321: 'Model was trained with offset, use score0 with offset', caused by java.lang.IllegalStateException: Model was trained with offset, use score0 with offset
stacktrace:
DistributedException from /127.0.0.1:54321: 'Model was trained with offset, use score0 with offset', caused by java.lang.IllegalStateException: Model was trained with offset, use score0 with offset
at water.MRTask.getResult(MRTask.java:660)
at water.MRTask.getResult(MRTask.java:670)
at water.MRTask.doAll(MRTask.java:530)
at water.MRTask.doAll(MRTask.java:549)
at hex.Model.predictScoreImpl(Model.java:2161)
at hex.generic.GenericModel.predictScoreImpl(GenericModel.java:161)
at hex.Model.score(Model.java:2002)
at water.api.ModelMetricsHandler$1.compute2(ModelMetricsHandler.java:555)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1704)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:976)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Caused by: java.lang.IllegalStateException: Model was trained with offset, use score0 with offset
at hex.genmodel.algos.xgboost.XGBoostMojoModel.score0(XGBoostMojoModel.java:88)
at hex.generic.GenericModel.score0(GenericModel.java:311)
at hex.generic.GenericModel.score0(GenericModel.java:317)
at hex.Model.score0(Model.java:2378)
at hex.Model$BigScore.score0(Model.java:2320)
at hex.Model$BigScore.map(Model.java:2295)
at water.MRTask.compute2(MRTask.java:836)
at water.H2O$H2OCountedCompleter.compute1(H2O.java:1707)
at hex.Model$BigScore$Icer.compute1(Model$BigScore$Icer.java)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1703)

Expected behavior
MOJO models created with an offset column should not fail as above when loaded into a Python environment and supplied the required offset column, including when that column is all zeros.

Steps to reproduce
import h2o
import pandas as pd
from h2o.estimators import H2OXGBoostEstimator

h2o.init()

# Create a sample DataFrame
data = {
    "numeric1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "numeric2": [5.0, 4.0, 3.0, 2.0, 1.0],
    "categorical": ["A", "B", "A", "B", "A"],
    "offset": [0.1, 0.2, 0.3, 0.4, 0.5],
    "target": [10.0, 20.0, 30.0, 40.0, 50.0]
}

# Convert the DataFrame to an H2OFrame
df = pd.DataFrame(data)
h2o_frame = h2o.H2OFrame(df)

# Define the predictors and response
predictors = ["numeric1", "numeric2", "categorical", "offset"]
response = "target"

# Convert the categorical column to a factor
h2o_frame["categorical"] = h2o_frame["categorical"].asfactor()

# Specify the offset column
offset_column = "offset"

# Initialize the H2O XGBoost model
xgb_model = H2OXGBoostEstimator(
    ntrees=50,
    max_depth=5,
    learn_rate=0.1,
    offset_column=offset_column
)

# Train the model
xgb_model.train(x=predictors, y=response, training_frame=h2o_frame)

# Print the model performance
print(xgb_model.model_performance())

# Save the model as a binary model
binary_model_path = h2o.save_model(model=xgb_model, path="binary_model", force=True)
print(f"Binary model saved to: {binary_model_path}")

# Save the model as a MOJO
mojo_model_path = xgb_model.save_mojo(path="mojo_model", force=True)
print(f"MOJO model saved to: {mojo_model_path}")

# Load the model from a binary file
binary_model = h2o.load_model(binary_model_path)
print("Loaded binary model")

# Load the model from a MOJO file
mojo_model = h2o.upload_mojo(mojo_model_path)
print(f"Loaded MOJO model from: {mojo_model_path}")

h2o_frame['offset'] = 0
binary_predict = binary_model.predict(h2o_frame)
binary_predict

########
# The following predict will throw an exception:
# 'Model was trained with offset, use score0 with offset', caused by java.lang.IllegalStateException: Model was trained with offset, use score0 with offset
########
mojo_predict = mojo_model.predict(h2o_frame)
# mojo_predict.shape
generic Model Build progress: |██████████████████████████████████████████████████| (done) 100%
Loaded MOJO model from: /Users/arun/mojo_model/XGBoost_model_python_1740518954413_8.zip
xgboost prediction progress: |███████████████████████████████████████████████████| (done) 100%
generic prediction progress: | (failed)

OSError Traceback (most recent call last)
Cell In[29], line 13
7 binary_predict
9 ########
10 # The following predict will throw an exception
11 # 'Model was trained with offset, use score0 with offset', caused by java.lang.IllegalStateException: Model was trained with offset, use score0 with offset
12 ########
---> 13 mojo_predict = mojo_model.predict(h2o_frame)

File /opt/anaconda3/lib/python3.12/site-packages/h2o/model/model_base.py:334, in ModelBase.predict(self, test_data, custom_metric, custom_metric_func)
331 if not isinstance(test_data, h2o.H2OFrame): raise ValueError("test_data must be an instance of H2OFrame")
332 j = H2OJob(h2o.api("POST /4/Predictions/models/%s/frames/%s" % (self.model_id, test_data.frame_id), data = {'custom_metric_func': custom_metric_func}),
333 self._model_json["algo"] + " prediction")
--> 334 j.poll()
335 return h2o.get_frame(j.dest_key)

File /opt/anaconda3/lib/python3.12/site-packages/h2o/job.py:88, in H2OJob.poll(self, poll_updates)
86 if self.status == "FAILED":
87 if (isinstance(self.job, dict)) and ("stacktrace" in list(self.job)):
---> 88 raise EnvironmentError("Job with key {} failed with an exception: {}\nstacktrace: "
89 "\n{}".format(self.job_key, self.exception, self.job["stacktrace"]))
90 else:
91 raise EnvironmentError("Job with key %s failed with an exception: %s" % (self.job_key, self.exception))

OSError: Job with key $03017f00000132d4ffffffff$_a42e026e7fb346acd0a318c6996a485f failed with an exception: DistributedException from /127.0.0.1:54321: 'Model was trained with offset, use score0 with offset', caused by java.lang.IllegalStateException: Model was trained with offset, use score0 with offset
stacktrace:
DistributedException from /127.0.0.1:54321: 'Model was trained with offset, use score0 with offset', caused by java.lang.IllegalStateException: Model was trained with offset, use score0 with offset
at water.MRTask.getResult(MRTask.java:660)
at water.MRTask.getResult(MRTask.java:670)
at water.MRTask.doAll(MRTask.java:530)
at water.MRTask.doAll(MRTask.java:549)
at hex.Model.predictScoreImpl(Model.java:2161)
at hex.generic.GenericModel.predictScoreImpl(GenericModel.java:161)
at hex.Model.score(Model.java:2002)
at water.api.ModelMetricsHandler$1.compute2(ModelMetricsHandler.java:555)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1704)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:976)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Caused by: java.lang.IllegalStateException: Model was trained with offset, use score0 with offset
at hex.genmodel.algos.xgboost.XGBoostMojoModel.score0(XGBoostMojoModel.java:88)
at hex.generic.GenericModel.score0(GenericModel.java:311)
at hex.generic.GenericModel.score0(GenericModel.java:317)
at hex.Model.score0(Model.java:2378)
at hex.Model$BigScore.score0(Model.java:2320)
at hex.Model$BigScore.map(Model.java:2295)
at water.MRTask.compute2(MRTask.java:836)
at water.H2O$H2OCountedCompleter.compute1(H2O.java:1707)
at hex.Model$BigScore$Icer.compute1(Model$BigScore$Icer.java)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1703)
... 5 more

@Mathanraj-Sharma
Member

@arunaryasomayajula I assume this is a different issue from #16592, which I am already working on?

@valenad1
Collaborator

@Mathanraj-Sharma yes, it is different.

@Mathanraj-Sharma
Member

@arunaryasomayajula, which older H2O3 version did they use to create the MOJO model?

@Mathanraj-Sharma
Member

@arunaryasomayajula, if I am not mistaken, this is where things go wrong:

@Override
protected double[] score0(double[] data, double[] preds, double offset) {
    if (offset == 0) // MOJO doesn't like when score0 is called with 0 offset for problems that were trained without offset
        return score0(data, preds);
    else
        return genModel().score0(data, offset, preds);
}

When we pass 0.0 as the offset, scoring is routed to:

public final double[] score0(double[] row, double[] preds) {
    if (_hasOffset) {
        throw new IllegalStateException("Model was trained with offset, use score0 with offset");
    }
    return score0(row, 0.0, preds);
}
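To make the failure mode concrete, here is a simplified Python model of the two Java methods quoted above. The names `MojoModelSketch`, `generic_score0` and `ScoringError`, and the toy scorer (sum of the features plus the offset), are hypothetical stand-ins rather than H2O APIs; only the branching is meant to mirror the quoted code. Because the wrapper compares `offset == 0` exactly, an offset-trained model fails on a zero offset, while any nonzero offset takes the offset-aware path.

```python
class ScoringError(Exception):
    """Stand-in for java.lang.IllegalStateException."""


class MojoModelSketch:
    """Hypothetical model of XGBoostMojoModel's two score0 overloads."""

    def __init__(self, has_offset: bool):
        self.has_offset = has_offset  # plays the role of the MOJO's _hasOffset flag

    def score0_no_offset(self, row):
        # Mirrors score0(double[] row, double[] preds): rejects offset-trained models.
        if self.has_offset:
            raise ScoringError("Model was trained with offset, use score0 with offset")
        return self.score0_with_offset(row, 0.0)

    def score0_with_offset(self, row, offset):
        # Toy scorer: sum of the features plus the offset on the margin scale.
        return sum(row) + offset


def generic_score0(mojo, row, offset):
    # Mirrors GenericModel.score0: an offset of exactly 0 goes to the no-offset overload.
    if offset == 0:
        return mojo.score0_no_offset(row)
    return mojo.score0_with_offset(row, offset)


if __name__ == "__main__":
    mojo = MojoModelSketch(has_offset=True)
    print(generic_score0(mojo, [1.0, 2.0], 0.5))  # nonzero offset: scores normally
    try:
        generic_score0(mojo, [1.0, 2.0], 0.0)
    except ScoringError as err:
        print("zero offset:", err)
```

With `has_offset=True`, a zero offset raises the same message seen in the stack trace, while `0.5` returns `3.5`; a model built with `has_offset=False` scores a zero offset without error.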

Based on the comment `// MOJO doesn't like when score0 is called with 0 offset for problems that were trained without offset`, I believe this check was added deliberately to work around a MOJO limitation. I am unsure of the underlying reason; you were the ticket reporter: #7419 (comment). Do you know why?

I can change the logic to check whether the model was trained with an offset and, if so, allow zero values for the offset. However, I would like some clarity on the existing check before deciding which option is best: making this change or asking the customer to retrain the model without an offset.
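A minimal sketch of the change described here, written as a simplified Python model rather than the actual Java patch: the zero-offset shortcut is taken only when the model was trained without an offset. All names are hypothetical, and `has_offset` is an assumed stand-in for however the wrapper would read the MOJO's offset flag (the quoted `_hasOffset` field lives on the MOJO side).

```python
class ScoringError(Exception):
    """Stand-in for java.lang.IllegalStateException."""


class MojoModelSketch:
    """Hypothetical model of the MOJO side: a toy scorer plus its offset flag."""

    def __init__(self, has_offset: bool):
        self.has_offset = has_offset

    def score0_no_offset(self, row):
        # Same guard as the quoted score0(double[] row, double[] preds).
        if self.has_offset:
            raise ScoringError("Model was trained with offset, use score0 with offset")
        return self.score0_with_offset(row, 0.0)

    def score0_with_offset(self, row, offset):
        # Toy scorer: the offset is added on the margin scale.
        return sum(row) + offset


def generic_score0_fixed(mojo, row, offset):
    # Proposed routing: offset-trained models always receive the explicit offset,
    # even when it is 0.0; the no-offset overload is used only for models without one.
    if offset == 0 and not mojo.has_offset:
        return mojo.score0_no_offset(row)
    return mojo.score0_with_offset(row, offset)


if __name__ == "__main__":
    print(generic_score0_fixed(MojoModelSketch(has_offset=True), [1.0, 2.0], 0.0))
    print(generic_score0_fixed(MojoModelSketch(has_offset=False), [1.0, 2.0], 0.0))
```

Keeping the shortcut for offset-free models preserves whatever the original comment was guarding against, while offset-trained models can now be scored with an all-zero offset column.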

@arunaryasomayajula
Author

The customer wants to get the relative values of the response without the offset, so they need to use a zero offset column at prediction time. Please make this change.
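Until the zero-offset path is fixed, one possible stopgap (a sketch under stated assumptions, not an H2O-documented procedure) is to score the MOJO with a constant nonzero offset `c`, which the exact `offset == 0` check does not intercept, and then remove `c` analytically. This assumes H2O passes the offset to XGBoost as a base margin, so it is added before the inverse link: for an identity link, such as the gaussian regression in the repro, the zero-offset prediction is `pred - c`; for a log link it is `pred / exp(c)`. The helper `remove_constant_offset` and the toy margins are hypothetical, not real model output.

```python
import math


def remove_constant_offset(preds, c, link="identity"):
    """Recover zero-offset predictions from predictions scored with a constant offset c.

    Assumes the offset is added to the raw margin before the inverse link:
      identity link: pred = margin + c       -> margin = pred - c
      log link:      pred = exp(margin + c)  -> exp(margin) = pred / exp(c)
    """
    if link == "identity":
        return [p - c for p in preds]
    if link == "log":
        scale = math.exp(c)
        return [p / scale for p in preds]
    raise ValueError(f"unsupported link: {link!r}")


if __name__ == "__main__":
    margins = [10.0, 20.0, 30.0]  # hypothetical zero-offset margins
    c = 1.0                       # constant offset used at scoring time
    gaussian_preds = [m + c for m in margins]
    poisson_preds = [math.exp(m + c) for m in margins]
    print(remove_constant_offset(gaussian_preds, c))
    print(remove_constant_offset(poisson_preds, c, link="log"))
```

In practice `preds` would be the `predict` column from scoring a frame whose offset column is set to `c`. If the link function of the trained model is not known for certain, this adjustment should not be relied on.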
