Apache Spark MLlib users often tune hyperparameters using MLlib’s built-in tools. These use grid search to try out a user-specified set of hyperparameter values; see the Spark docs on tuning for more info.
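To make "a user-specified set of hyperparameter values" concrete, the following plain-Python sketch expands a small grid into the individual settings that a grid search would evaluate. It does not use the MLlib API. The helper `expand_grid` is hypothetical, and the parameter names `regParam` and `maxIter` and their values are illustrative assumptions.

```python
from itertools import product


def expand_grid(grid):
    """Return every combination of the values in `grid` as a list of dicts.

    `grid` maps a hyperparameter name to the list of values to try.
    This is an illustrative helper, not part of MLlib.
    """
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Hypothetical grid: 3 values x 2 values gives 6 settings to evaluate.
grid = {"regParam": [0.01, 0.1, 1.0], "maxIter": [10, 50]}
settings = expand_grid(grid)

for setting in settings:
    print(setting)
```

The number of settings is the product of the list lengths, so adding one more hyperparameter with a few values multiplies the work accordingly.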
Databricks Runtime 5.3 and 5.3 ML and above support automatic MLflow tracking for MLlib tuning in Python.
With this feature, PySpark will automatically log to MLflow, organizing runs in a hierarchy and logging hyperparameters and the evaluation metric. For example, a single tuning call will log one parent run. Under this run, PySpark will log one child run for each hyperparameter setting, and each of those child runs will include the hyperparameter setting and the evaluation metric. Comparing these runs in the MLflow UI helps visualize the effect of tuning each hyperparameter.
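The run layout described above can be pictured with a small plain-Python model: one parent run, and under it one child run per hyperparameter setting, each carrying its setting and a metric. This is only a sketch of the hierarchy, not the MLflow or MLlib API. The `Run` class, the `sketch_tuning_runs` function, the `regParam` values, and the placeholder scoring function are all assumptions made for illustration, and the scores are not real results.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Run:
    """Hypothetical stand-in for a tracked run (not an MLflow class)."""
    name: str
    params: Dict[str, float] = field(default_factory=dict)
    metric: Optional[float] = None
    children: List["Run"] = field(default_factory=list)


def sketch_tuning_runs(
    settings: List[Dict[str, float]],
    evaluate: Callable[[Dict[str, float]], float],
) -> Run:
    """Build one parent run with one child run per hyperparameter setting."""
    parent = Run(name="tuning")
    for i, params in enumerate(settings):
        child = Run(name=f"child-{i}", params=params, metric=evaluate(params))
        parent.children.append(child)
    return parent


def placeholder_metric(params: Dict[str, float]) -> float:
    """Made-up score so the sketch runs; a real metric comes from an evaluator."""
    return 1.0 / (1.0 + params["regParam"])


# Three hypothetical settings lead to three child runs under one parent.
settings = [{"regParam": 0.01}, {"regParam": 0.1}, {"regParam": 1.0}]
parent = sketch_tuning_runs(settings, placeholder_metric)

print(parent.name)
for child in parent.children:
    print("  ", child.name, child.params, child.metric)
```

Because every child run holds both its setting and its metric, sorting or plotting the children by one hyperparameter is enough to see how that hyperparameter affects the metric, which is the comparison the MLflow UI supports.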