H2O3 Mojo model scoring fails in python when offset column is used #16590
Comments
@arunaryasomayajula I assume this is a different issue from #16592, which I am already working on?
@Mathanraj-Sharma yes, it is different.
@arunaryasomayajula, which older H2O-3 version was used to create the MOJO model?
@arunaryasomayajula, if I am not wrong, this is where things go south: h2o-3/h2o-algos/src/main/java/hex/generic/GenericModel.java, lines 315 to 321 in 49d6da5.
When we pass h2o-3/h2o-genmodel-extensions/xgboost/src/main/java/hex/genmodel/algos/xgboost/XGBoostMojoModel.java Lines 86 to 91 in 49d6da5
Based on the comment, I can change the logic to check whether the model was trained with an offset and, if so, allow zero values for the offset. However, I would like some clarity on the existing check to decide which option is best: making this change, or asking the customer to retrain the model without an offset.
The customer wants to be able to get the relative values of the response without the offset, and needs to use a zero offset column at prediction time. Please make this change.
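To make the proposed change concrete, here is a minimal Python sketch of the dispatch logic as I read it from this thread and the stack trace. It is not the actual H2O code: `ToyMojo`, `score_current`, and `score_proposed` are hypothetical names, the "prediction" is a placeholder sum, and the assumption that the wrapper routes a zero offset to the offset-free scoring path is inferred from the discussion above.

```python
class ToyMojo:
    """Hypothetical stand-in for a MOJO scorer (not an H2O class)."""

    def __init__(self, trained_with_offset):
        self.trained_with_offset = trained_with_offset

    def score(self, row):
        # Offset-free entry point: rejects models trained with an offset,
        # mirroring the error message in the stack trace.
        if self.trained_with_offset:
            raise RuntimeError("Model was trained with offset, use score0 with offset")
        return self.score_with_offset(row, 0.0)

    def score_with_offset(self, row, offset):
        # Placeholder prediction: sum of the features plus the offset.
        return sum(row) + offset


def score_current(mojo, row, offset):
    # Assumed current behavior: a zero offset is treated as "no offset".
    if offset == 0:
        return mojo.score(row)
    return mojo.score_with_offset(row, offset)


def score_proposed(mojo, row, offset):
    # Proposed behavior: if the model was trained with an offset,
    # always use the offset-aware path, even when the offset is zero.
    if mojo.trained_with_offset or offset != 0:
        return mojo.score_with_offset(row, offset)
    return mojo.score(row)
```

Under these assumptions, `score_current(ToyMojo(True), row, 0.0)` raises the same kind of error as in the report, while `score_proposed` returns a prediction for the zeroed offset and leaves models trained without an offset on the original path.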
H2O version, Operating System and Environment
3.46.0.6
Actual behavior
We have an XGBoost model that was trained with an offset on an older version and has been through a lot of evaluation. We are planning to deploy this model. However, we are unable to predict using the MOJO model after we zero out the offset column. We get the following error.
Prediction works when we use the binary model. Can you please take a look and let us know of any alternative way to use the MOJO model? I don’t think we can retrain the model at this point.
OSError: Job with key $03017f00000132d4ffffffff$_aa0f1f0d307fc3704b9cb49444844cd7 failed with an exception: DistributedException from /127.0.0.1:54321: 'Model was trained with offset, use score0 with offset', caused by java.lang.IllegalStateException: Model was trained with offset, use score0 with offset
stacktrace:
DistributedException from /127.0.0.1:54321: 'Model was trained with offset, use score0 with offset', caused by java.lang.IllegalStateException: Model was trained with offset, use score0 with offset
at water.MRTask.getResult(MRTask.java:660)
at water.MRTask.getResult(MRTask.java:670)
at water.MRTask.doAll(MRTask.java:530)
at water.MRTask.doAll(MRTask.java:549)
at hex.Model.predictScoreImpl(Model.java:2161)
at hex.generic.GenericModel.predictScoreImpl(GenericModel.java:161)
at hex.Model.score(Model.java:2002)
at water.api.ModelMetricsHandler$1.compute2(ModelMetricsHandler.java:555)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1704)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:976)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Caused by: java.lang.IllegalStateException: Model was trained with offset, use score0 with offset
at hex.genmodel.algos.xgboost.XGBoostMojoModel.score0(XGBoostMojoModel.java:88)
at hex.generic.GenericModel.score0(GenericModel.java:311)
at hex.generic.GenericModel.score0(GenericModel.java:317)
at hex.Model.score0(Model.java:2378)
at hex.Model$BigScore.score0(Model.java:2320)
at hex.Model$BigScore.map(Model.java:2295)
at water.MRTask.compute2(MRTask.java:836)
at water.H2O$H2OCountedCompleter.compute1(H2O.java:1707)
at hex.Model$BigScore$Icer.compute1(Model$BigScore$Icer.java)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1703)
Expected behavior
MOJO models trained with an offset column should not fail as above when loaded into a Python environment and given the required offset column, including one that is all zeros.
Steps to reproduce
import h2o
import pandas as pd
from h2o.estimators import H2OXGBoostEstimator

h2o.init()

# Create a sample DataFrame
data = {
    "numeric1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "numeric2": [5.0, 4.0, 3.0, 2.0, 1.0],
    "categorical": ["A", "B", "A", "B", "A"],
    "offset": [0.1, 0.2, 0.3, 0.4, 0.5],
    "target": [10.0, 20.0, 30.0, 40.0, 50.0]
}

# Convert the DataFrame to an H2OFrame
df = pd.DataFrame(data)
h2o_frame = h2o.H2OFrame(df)

# Define the predictors and response
predictors = ["numeric1", "numeric2", "categorical", "offset"]
response = "target"

# Convert the categorical column to a factor
h2o_frame["categorical"] = h2o_frame["categorical"].asfactor()

# Specify the offset column
offset_column = "offset"

# Initialize the H2O XGBoost model
xgb_model = H2OXGBoostEstimator(
    ntrees=50,
    max_depth=5,
    learn_rate=0.1,
    offset_column=offset_column
)

# Train the model
xgb_model.train(x=predictors, y=response, training_frame=h2o_frame)

# Print the model performance

# Save the model as a binary model
binary_model_path = h2o.save_model(model=xgb_model, path="binary_model", force=True)
print(f"Binary model saved to: {binary_model_path}")

# Save the model as a MOJO
mojo_model_path = xgb_model.save_mojo(path="mojo_model", force=True)
print(f"MOJO model saved to: {mojo_model_path}")

# Load the model from a binary file
binary_model = h2o.load_model(binary_model_path)
print("Loaded binary model")

# Load the model from a MOJO file
mojo_model = h2o.upload_mojo(mojo_model_path)
print(f"Loaded MOJO model from: {mojo_model_path}")

# Zero out the offset column before predicting
h2o_frame['offset'] = 0

# Prediction with the binary model succeeds
binary_predict = binary_model.predict(h2o_frame)
binary_predict

########
# The following predict will throw an exception
# 'Model was trained with offset, use score0 with offset', caused by java.lang.IllegalStateException: Model was trained with offset, use score0 with offset
########
mojo_predict = mojo_model.predict(h2o_frame)
# mojo_predict.shape
generic Model Build progress: |██████████████████████████████████████████████████| (done) 100%
Loaded MOJO model from: /Users/arun/mojo_model/XGBoost_model_python_1740518954413_8.zip
xgboost prediction progress: |███████████████████████████████████████████████████| (done) 100%
generic prediction progress: | (failed)
OSError Traceback (most recent call last)
Cell In[29], line 13
7 binary_predict
9 ########
10 # The following predict will throw an exception
11 # 'Model was trained with offset, use score0 with offset', caused by java.lang.IllegalStateException: Model was trained with offset, use score0 with offset
12 ########
---> 13 mojo_predict = mojo_model.predict(h2o_frame)
File /opt/anaconda3/lib/python3.12/site-packages/h2o/model/model_base.py:334, in ModelBase.predict(self, test_data, custom_metric, custom_metric_func)
331 if not isinstance(test_data, h2o.H2OFrame): raise ValueError("test_data must be an instance of H2OFrame")
332 j = H2OJob(h2o.api("POST /4/Predictions/models/%s/frames/%s" % (self.model_id, test_data.frame_id), data = {'custom_metric_func': custom_metric_func}),
333 self._model_json["algo"] + " prediction")
--> 334 j.poll()
335 return h2o.get_frame(j.dest_key)
File /opt/anaconda3/lib/python3.12/site-packages/h2o/job.py:88, in H2OJob.poll(self, poll_updates)
86 if self.status == "FAILED":
87 if (isinstance(self.job, dict)) and ("stacktrace" in list(self.job)):
---> 88 raise EnvironmentError("Job with key {} failed with an exception: {}\nstacktrace: "
89 "\n{}".format(self.job_key, self.exception, self.job["stacktrace"]))
90 else:
91 raise EnvironmentError("Job with key %s failed with an exception: %s" % (self.job_key, self.exception))
OSError: Job with key $03017f00000132d4ffffffff$_a42e026e7fb346acd0a318c6996a485f failed with an exception: DistributedException from /127.0.0.1:54321: 'Model was trained with offset, use score0 with offset', caused by java.lang.IllegalStateException: Model was trained with offset, use score0 with offset
stacktrace:
DistributedException from /127.0.0.1:54321: 'Model was trained with offset, use score0 with offset', caused by java.lang.IllegalStateException: Model was trained with offset, use score0 with offset
at water.MRTask.getResult(MRTask.java:660)
at water.MRTask.getResult(MRTask.java:670)
at water.MRTask.doAll(MRTask.java:530)
at water.MRTask.doAll(MRTask.java:549)
at hex.Model.predictScoreImpl(Model.java:2161)
at hex.generic.GenericModel.predictScoreImpl(GenericModel.java:161)
at hex.Model.score(Model.java:2002)
at water.api.ModelMetricsHandler$1.compute2(ModelMetricsHandler.java:555)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1704)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:976)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Caused by: java.lang.IllegalStateException: Model was trained with offset, use score0 with offset
at hex.genmodel.algos.xgboost.XGBoostMojoModel.score0(XGBoostMojoModel.java:88)
at hex.generic.GenericModel.score0(GenericModel.java:311)
at hex.generic.GenericModel.score0(GenericModel.java:317)
at hex.Model.score0(Model.java:2378)
at hex.Model$BigScore.score0(Model.java:2320)
at hex.Model$BigScore.map(Model.java:2295)
at water.MRTask.compute2(MRTask.java:836)
at water.H2O$H2OCountedCompleter.compute1(H2O.java:1707)
at hex.Model$BigScore$Icer.compute1(Model$BigScore$Icer.java)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1703)
... 5 more