Using Optimizer Hints in Rails


Using Optimizer Hints in Rails

Query Plan

An execution plan describes a recommended method of execution for a SQL statement.
The query optimizer attempts to generate the most optimal execution plan for a SQL statement.

The optimizer chooses the plan with the lowest cost in terms of I/O,
CPU, and communication.
The database gathers these statistics
about our objects like the number of rows,
distinct values, and data distribution.
The optimizer has such internal statistics
and tools at its disposal
which are used to calculate the said cost and pick an execution plan.

Optimizer Hints

The Optimizer Hints give us the opportunity, to influence the optimizer.
We, as an application architect, might know information about our data that the optimizer does not.
And that knowledge, on some occasions, can help us to choose a more optimized execution plan than the optimizer.

Before Rails 6

To use Optimizer Hints before Rails 6, we had to use raw SQL queries.


pry(main)> sql = "Select /*+ FULL(students) */ first_name
pry(main)>*       From students
pry(main)>*       Where year = 1;"
=> "Select /*+ FULL(students) */ first_namenFrom studentsnWhere year = 1;n"
pry(main)> ActiveRecord::Base.connection.execute(sql)
(0.4ms)  Select /*+ FULL(students) */ first_name
From students
Where year = 1;
=> #<PG::Result:0x00007fe47e521c90 status=PGRES_TUPLES_OK ntuples=12 nfields=1 cmd_tuples=18>

In the example above, the +
before the comment tells the database server that this is a hint.
hint instructs that a full table scan should be performed on the students table.

Rails 6

Rails 6 adds
support for setting optimizer hints on databases.

We can use ActiveRecord::QueryMethods#optimizer_hints
to specify optimizer hints to be used in the SELECT statement.



pry(main)> User.optimizer_hints("MAX_EXECUTION_TIME(10000)", "INDEX_MERGE(students ssn, PRIMARY)")
User Load (0.2ms)  SELECT /*+ MAX_EXECUTION_TIME(10000) INDEX_MERGE(students ssn, PRIMARY) */ "users".* FROM "users" LIMIT ?  [["LIMIT", 11]]

For PostgreSQL

pry(main)> User.joins(:comments).where(published: true).optimizer_hints("MergeJoin(users comments)", "SeqScan(users)")
User Load (0.2ms)  SELECT /*+ MAX_EXECUTION_TIME(10000) SeqScan(users) */ "users".* FROM "users" LIMIT ?  [["LIMIT", 11]]

In the example above, we can see that the query has our optimizer hints included.

Example from our Application

Let us discuss a live use-case of optimizer hints in one of our Rails App in production.

In one of our client’s project,
we were facing very long query times
because we had to query a table with millions of records.

The solution was decided to be a timeout after 10 seconds on the user facing query,
and run the query in background and cache for larger results.

We implemented as below:

1. Using the timeout

# Return result from cache for large results
Timeout::timeout(SPECIFIC_CACHE_TIMEOUT) do
# query with long execution time
rescue Timeout::Error => exception
# Enqueue Job to cache the result and us in future

In the snippet above, we are using ruby’s timeout
from the stdlib

The timeout
method allows us to terminate any operation taking
longer than the specified duration.

This turned out to not be a viable solution,
as it stops the execution of the block,
but does not
stop the SQL query if it has already started,
which was our intention, to begin with.

2. Using MySQL system variable max_execution_time

def timeout_queries(milliseconds:)
connection = ActiveRecord::Base.connection
connection.execute("set max_execution_time = #{miliseconds}")
rescue ActiveRecord::StatementInvalid => e
# query timed out
# reset to original max_execution_time

In the snippet above, we use the timeout_queries method
to set the MySQL server system variable max_execution_time

For every query that is wrapped around timeout_queries
method call,
we set the value of max_execution_time
which results in query timing out
if its execution time reaches milliseconds value.

We perform this extra overhead,
so that we don’t have to rewrite all our existing queries with raw SQL.

Note:This only works on SELECT

Use Optimizer Hints with Rails 6 for timeout

The above problem can now be fixed using Optimizer Hints instead of writing out the timeout method.
MySQL provides an optimizer hint named MAX_EXECUTION_TIME

This limits the query to not be executed for more than the specified time limit N
(in milliseconds).
This optimizer hint takes precedence over the value of the MySQL server system variable max_execution_time
Which means that even if there is a positive value of max_execution_time
already set, the execution time limit given in the optimizer hint would be honored.

Here’s how we can achieve the same in Rails 6:


In the snippet above, the query will be executed for a maximum of 10 seconds.

As we can see, this eliminates the need for setting the value of the MySQL server system variable max_execution_time
before every query as well as not using raw SQL.

Note:This only works on SELECT


We saw what optimizer hints are and how we can use them in our queries.
However, more often than not, we don’t require them.
Modern-day databases do not encourage their use since with every upgrade,
or a new patch, things are likely to go wrong.

However, we also saw a live example where using an optimizer hint,
could help us achieve a particular behavior which was earlier done
in a much more passive way.



OPPO Ace2鲁大师流畅度跑分实测:186.02分 Android机型排行第一



Using Optimizer Hints in Rails