存储架构

Validate and Fix Ruby on Rails ActiveRecord PostgreSQL Data Integrity

微信扫一扫,分享到朋友圈

Validate and Fix Ruby on Rails ActiveRecord PostgreSQL Data Integrity
0

Most Ruby developers work with Rails and Active Record for PostgreSQL database interactions. It provides a ton of magic and is simple to start with. Data integrity problems start creeping up once the code base and database structure gets older. In this blog post, I will describe a couple of techniques for ensuring data integrity and validation in Ruby on Rails web apps.

Your Rails app data integrity can get out of sync for various reasons. Active Record approach to database modeling encourages a developer to keep most of the logic in the app layer Ruby code. Let’s list some example actions that could corrupt your data state:

update_column
delete
save(validate: false)

Rails is not making it easy. Data in an invalid state might bite you very hard eg. when running a migration that would raise an exception in case of a validation error.

How to “Brute-force” detect validation errors

I want to share a dirty hack I’ve recently come up with. It might seem obvious and resorting to it look like a signal of a very poor codebase quality but bear with me.

It’s a Sidekiq worker class:

class ModelDataIntegrityCheckerJob
  include Sidekiq::Worker
  sidekiq_options retry: false

  def perform(klass_name)
    klass = klass_name.constantize
    invalid_objects = []

    klass.find_each do |object|
      unless object.valid?
        invalid_objects << [object.id, object.errors.full_messages]
      end
    end

    if invalid_objects.present?
      raise "Invalid state for #{klass} objects: #{invalid_objects}"
    end

    invalid_objects
  end
end

You can use it like that:

ModelDataIntegrityCheckerJob.perform_async(User.name)
    ModelDataIntegrityCheckerJob.perform_async(Course.name)

It will raise a runtime error with IDs of invalid objects from a given class. In turn you can fix them or corresponding validations manually. If your dataset is big, you might want to run those jobs at night or scope the range of objects validated.

It would seem like the brute force approach is not a perfect solution to validating data integrity. But can you with 100% certainty tell that every single one of your ActiveRecord objects is currently valid
?

I sometimes run those brute force checks before doing more complex database migrations to ensure that some sneaky invalid objects won’t blow up during the process. I like to think of them as a periodical data validation health checks.

Let’s list a couple of more elegant
techniques you can use to keep your app’s data in a more integrated and valid state.

NOT NULL
constraints

“Non-binary logic”
is a frequent guest in Ruby on Rails apps. I’ve seen many nasty bugs caused by a simple fact that:

false != nil

Sometimes the business logic itself was dependent on the fact nil
in a database meant something else than false
… (╯°□°)╯︵ ┻━┻

Adding a boolean field without a NOT NULL
constraint is a ticking bomb. Make sure to update all the boolean
fields in your database using the following migration:

change_column_default :users, :boolean_attribute, false
  change_column_null    :users, :boolean_attribute, false

Database foreign key constraints

Database level constraints are not a first-class citizen in Rails apps. You can create a database relation without adding a corresponding foreign key, and Active Record will work correctly. Building a database schema without enforcing it using built-in PostgreSQL mechanisms can lead to various issues down the road. It is usually a good idea to add a foreign key constraint when creating has_many/belongs_to
relation.

I always prefer to explicitly add foreign key, column, and indexes, instead of using add_reference
helper method. There’s enough magic in Active Record already.

Adding posts
relation to User
model can be done using the following migration:

add_column :posts, user_id: :integer, null: false
add_index :posts, :user_id
add_foreign_key :posts, :users, column: :user_id

With this database structure, it will be impossible to leave orphaned
post records in the database. Trying to remove a user, who still has some posts will result in an ActiveRecord::InvalidForeignKey
error.

To “delete cascade” or “dependent destroy” ?

We can ensure removing the children records when the parent is destroyed using Active Record callback (with dependent: :destroy
has_many
method option) or directly in the database. Adding on_delete: :cascade
to add_foreign_key
method will remove children using database level triggers. It will be much better in terms of performance but would not execute ActiveRecord callbacks for removed posts.

Despite all the hate those callbacks get they are still useful in certain cases. One example could be a well-known Carrierwave
gem. It uses callbacks to remove associated files when an object is destroyed. Removing objects using on_delete: :cascade
approach would result in orphaned files stuck forever without any reference on S3 or wherever you keep them.

My approach would be to use foreign keys for ensuring integrity, but stick with dependent: :destroy
for removing objects. Skipping callbacks could lead to surprising behavior if you are riding on Active Record for PostgreSQL interactions.

Summary

This post just scratches the surface of the complex topic of data integrity and validation in Ruby on Rails web apps. Active Record is a powerful tool that makes creating complex database schemas just too easy, without forcing developers to think about how it works under the hood. Moving at least some of the rules to database level is a good starting point to keeping your Rails apps healthy in the long run.

阅读原文...

微信扫一扫,分享到朋友圈

Validate and Fix Ruby on Rails ActiveRecord PostgreSQL Data Integrity
0
Avatar

智慧安防进校园,学生安全有保障

上一篇

Herblink 李东亮:企业运营七大指标及提升思路

下一篇

评论已经被关闭。

插入图片

热门分类

往期推荐

Validate and Fix Ruby on Rails ActiveRecord PostgreSQL Data Integrity

长按储存图像,分享给朋友