
BigQuery ML Project Phases

A BigQuery ML machine learning project has five key phases.

  1. Extract, transform and load data into BigQuery
  2. Select and preprocess features
  3. Create the model inside BigQuery
  4. Evaluate the performance of the trained model
  5. Use the model to make the predictions


1. Extract, transform and load data into BigQuery

In Phase 1, you extract, transform, and load data into BigQuery if it isn't there already. If you already use other Google products, such as YouTube, look for ready-made connectors that bring the data into BigQuery before you build your own pipeline.

You can enrich your existing data warehouse with other data sources by using SQL joins.
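As a minimal sketch of such a join, the query below enriches a hypothetical `ecommerce.web_sessions` table with a hypothetical `ecommerce.customers` table. Both table names and every column name are assumptions made for illustration; substitute your own.

```sql
# standardSQL
# Hypothetical tables and columns, for illustration only.
SELECT
  s.visitor_id,
  s.pageviews,
  s.time_on_site,
  c.country
FROM
  `ecommerce.web_sessions` AS s
LEFT JOIN
  `ecommerce.customers` AS c
ON
  s.visitor_id = c.visitor_id
```

A LEFT JOIN keeps every session row even when no matching customer record exists, which avoids silently shrinking the dataset.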

2. Select and preprocess features

In Phase 2, you select and preprocess features. You can use SQL to create the training dataset for the model to learn from. BigQuery ML does some of the preprocessing for you, such as one-hot encoding your categorical variables. One-hot encoding converts categorical data into the numeric form that model training requires.
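To make one-hot encoding concrete, here is a small sketch that performs it by hand on three inline rows. It only illustrates the idea; BigQuery ML does this for you during training. The `device` and `pageviews` columns and their values are invented for the example.

```sql
# standardSQL
# Illustration only: the inline rows and column names are made up.
WITH sessions AS (
  SELECT 'mobile' AS device, 3 AS pageviews UNION ALL
  SELECT 'desktop', 7 UNION ALL
  SELECT 'tablet', 1
)
SELECT
  pageviews,
  # Each category becomes its own 0/1 column.
  IF(device = 'mobile', 1, 0) AS device_mobile,
  IF(device = 'desktop', 1, 0) AS device_desktop,
  IF(device = 'tablet', 1, 0) AS device_tablet
FROM
  sessions
```

Each row ends up with exactly one of the three `device_*` columns set to 1, so the category is represented without implying any ordering among the values.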

3. Create the model inside BigQuery

In Phase 3, you create the model inside BigQuery. Use the CREATE MODEL command: give the model a name, specify the model type, and pass in a SQL query that returns your training dataset. Then run the query.

Use the "CREATE MODEL" command.

# standardSQL

CREATE OR REPLACE MODEL
  ecommerce.classification
OPTIONS (
  model_type = 'logistic_reg',
  input_label_cols = ['will_buy_later']
) AS

# SQL query with training data
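With the training query filled in, a complete statement might look like the sketch below. The model name, model type, and label column come from the example above; the `ecommerce.training_data` table and the `bounces` and `time_on_site` feature columns are assumptions for illustration.

```sql
# standardSQL
CREATE OR REPLACE MODEL
  `ecommerce.classification`
OPTIONS (
  model_type = 'logistic_reg',
  input_label_cols = ['will_buy_later']
) AS
# Hypothetical training table and feature columns.
SELECT
  bounces,
  time_on_site,
  will_buy_later  # label column named in input_label_cols
FROM
  `ecommerce.training_data`
```

The training query must return the label column named in `input_label_cols`; every other returned column is treated as a feature.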

4. Evaluate the performance of the trained model

In Phase 4, after your model is trained, you can execute an ML.EVALUATE query to evaluate the performance of the trained model on your evaluation dataset. Here you can analyze loss metrics such as root mean squared error for forecasting models, and area under the curve (AUC), accuracy, precision, and recall for classification models.

Execute an ML.EVALUATE query

# standardSQL

SELECT
  roc_auc,
  accuracy,
  precision,
  recall
FROM
  ML.EVALUATE(MODEL `ecommerce.classification`, (

# SQL query with evaluation data

  ))
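A filled-in evaluation query could be sketched as follows. The `ecommerce.evaluation_data` table and the `bounces` and `time_on_site` columns are hypothetical; the evaluation query is passed as the second argument to ML.EVALUATE and should return the same feature columns used in training, plus the label.

```sql
# standardSQL
SELECT
  roc_auc,
  accuracy,
  precision,
  recall
FROM
  ML.EVALUATE(
    MODEL `ecommerce.classification`,
    (
      # Hypothetical evaluation table; it must include the label column.
      SELECT
        bounces,
        time_on_site,
        will_buy_later
      FROM
        `ecommerce.evaluation_data`
    )
  )
```

Using rows that the model did not see during training gives a more honest estimate of how it will perform on new data.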

5. Use the model to make the predictions

In Phase 5, the final phase, once you're happy with your model's performance, you can use it to make predictions. To do so, invoke the ML.PREDICT command on your newly trained model to return predictions along with the model's confidence in those predictions. In the results, your label field appears with the prefix predicted_ added to its name (for example, predicted_will_buy_later). This is your model's prediction for that label.

Invoke the ML.PREDICT command.

# standardSQL

SELECT * FROM
  ML.PREDICT(MODEL `ecommerce.classification`, (

# SQL query with test data

  ))
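As a sketch of a complete prediction query, the example below assumes a hypothetical `ecommerce.test_data` table with `bounces` and `time_on_site` feature columns. For a logistic regression model whose label is `will_buy_later`, ML.PREDICT returns a `predicted_will_buy_later` column and a `predicted_will_buy_later_probs` column alongside the input columns.

```sql
# standardSQL
SELECT
  predicted_will_buy_later,        # the model's predicted label
  predicted_will_buy_later_probs,  # per-class probabilities
  bounces,
  time_on_site
FROM
  ML.PREDICT(
    MODEL `ecommerce.classification`,
    (
      # Hypothetical table of new, unlabeled rows.
      SELECT
        bounces,
        time_on_site
      FROM
        `ecommerce.test_data`
    )
  )
```

The input query does not need the label column, since the label is what the model is being asked to predict.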
