Dead simple Rails monitoring

13th of March, 2021
rails, ruby, ops

Observability, performance monitoring and instrumentation. Big words. There are a lot of great tools to keep tabs on Ruby on Rails applications. Yet, sometimes a little less is more. Let's create a dead simple solution...
by Tom Rothe

Context & Why

We are working on a Rails application which processes highly sensitive information. This system cannot be accessed from outside of a VPN. We try to keep dependencies such as additional gems to a minimum and we do not use any external services.

The application has grown for a few years now and performance was never really an issue. The virtual machine hummed away with single-digit CPU percentages. Yet, sometimes we saw spikes that worried us. Who was the culprit? A delayed job? A long-running request? We had to introduce some sort of monitoring…

Due to the sensitive nature of the data, we decided not to go for one of the standard monitoring products. We brewed our own solution and it was surprisingly simple and rewarding.

Getting the data

Rails offers a very convenient Instrumentation API. We can subscribe to events and throw them into the database. Let’s create a database table, a model and an initializer:

# db/migrate/20210313112100_create_monitoring_measurements.rb
class CreateMonitoringMeasurements < ActiveRecord::Migration[5.2]
  def change
    create_table :monitoring_measurements do |t|
      t.string :type, null: false, index: true
      t.timestamp :recorded_at
      t.string :subject
      t.float :value
      t.jsonb :details
    end
  end
end

# models/monitoring/measurement.rb
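class Monitoring::Measurement < ActiveRecord::Base
  # match the table created by the migration (not needed if the Monitoring module defines a table_name_prefix)
  self.table_name = 'monitoring_measurements'
end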

# models/monitoring/request.rb
class Monitoring::Request < Monitoring::Measurement
  store_accessor :details, :view_time, :db_time
end

# config/initializers/monitoring.rb
ActiveSupport::Notifications.subscribe 'process_action.action_controller' do |name, started, finished, unique_id, data|
  Monitoring::Request.create!(
    recorded_at: started,
    subject: "#{data[:controller]}##{data[:action]}",
    value: finished - started, # duration in seconds
    view_time: data[:view_runtime]&./(1000), # Rails reports runtimes in milliseconds
    db_time: data[:db_runtime]&./(1000)
  )
rescue => e
  # notify exception
end

What have we done? We created a single table inheritance model where the type column decides which kind of event we measure. This will come in handy when we look beyond requests and start logging jobs and mails. In the case of a Request, the value is the number of seconds the request took.
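To make the units explicit, here is a minimal sketch of the mapping from one event to the table columns, pulled out into a plain Ruby method so it can be tried without Rails. The name `request_attributes` is hypothetical and not part of our initializer; the sketch assumes that `started` and `finished` are `Time` objects and that the runtimes in the payload are milliseconds.

```ruby
# Hypothetical helper (not in the original initializer): maps one
# 'process_action.action_controller' event onto the measurement columns.
def request_attributes(started, finished, data)
  {
    recorded_at: started,
    subject: "#{data[:controller]}##{data[:action]}",
    value: finished - started,                  # Time - Time gives seconds as a Float
    view_time: data[:view_runtime]&.fdiv(1000), # milliseconds to seconds, nil stays nil
    db_time: data[:db_runtime]&.fdiv(1000)
  }
end

# Usage with a hand-made payload (the values are made up for illustration):
started = Time.utc(2021, 3, 13, 11, 33, 0)
payload = { controller: 'MyController', action: 'show', view_runtime: 85.0, db_runtime: nil }
attrs = request_attributes(started, started + 2, payload)
puts attrs[:subject]
```

A request that never touches the database or renders no view reports `nil` for that runtime, which is why the safe navigation operator is used.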

The initializer has about 10 lines and does the heavy lifting: it records every single controller action. Here is an example of the data in the table:

id | type                | recorded_at         | subject               | value | details
---|---------------------|---------------------|-----------------------|-------|----------------------------------------
 1 | Monitoring::Request | 2021-03-13 11:33:00 | MyController#show     | 6.983 | {"db_time": 6.318, "view_time": 0.005}
 2 | Monitoring::Request | 2021-03-13 11:33:03 | OtherController#index | 5.049 | {"db_time": 3.902, "view_time": 0.159}
 3 | Monitoring::Request | 2021-03-13 11:33:05 | MyController#destroy  | 0.433 | {"db_time": 0.333, "view_time": 0.085}

Wow! We have only written around 30 lines of code and we can already discuss slow-running requests.

For the sake of brevity, we will not include all measurement types here, but please check out this gist for adding DelayedJob (using ActiveJob should be even simpler), ActionMailer and database table sizes.
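As a rough idea of what another measurement type could look like, here is a minimal sketch for ActiveJob. It is not the code from the gist: `Monitoring::Job` and `job_attributes` are hypothetical names, and the sketch assumes the `perform.active_job` event, whose payload carries the job instance under `:job`.

```ruby
# models/monitoring/job.rb would hold a hypothetical STI subclass:
#   class Monitoring::Job < Monitoring::Measurement; end

# Hypothetical helper: maps one 'perform.active_job' event onto the measurement columns.
def job_attributes(started, finished, data)
  {
    recorded_at: started,
    subject: data[:job].class.name, # the job class, e.g. "CleanupJob"
    value: finished - started       # duration in seconds
  }
end

# Subscribe only inside a Rails process; the guard lets the helper load on its own.
if defined?(ActiveSupport::Notifications)
  ActiveSupport::Notifications.subscribe 'perform.active_job' do |_name, started, finished, _id, data|
    Monitoring::Job.create!(job_attributes(started, finished, data))
  rescue StandardError
    # notify exception
  end
end
```

Because all measurement types share one table, a new type only needs a subclass and a subscriber; the dashboard queries keep working unchanged.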

Viewing the data

After we had accumulated a few measurements we went spelunking in the database. It was quite a bit of fun to write simple queries and gain valuable insights. We came to appreciate the power and flexibility of writing our own SQL. Since the monitoring interface would only be used by us developers, we decided to allow ourselves a geeky frontend.
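To give a flavour of such a query, here is a sketch that ranks controller actions by their average duration. The SQL only relies on the columns from the migration above; the constant name and the `slowest_actions` helper are hypothetical, and the Ruby method merely mirrors the aggregation on in-memory rows so that the intent can be checked without a database.

```ruby
# A query that can be pasted into a SQL console (constant name is hypothetical).
SLOWEST_ACTIONS_SQL = <<~SQL
  SELECT subject, COUNT(*) AS hits, AVG(value) AS avg_seconds, MAX(value) AS max_seconds
  FROM monitoring_measurements
  WHERE type = 'Monitoring::Request'
  GROUP BY subject
  ORDER BY avg_seconds DESC
  LIMIT 10
SQL

# The same aggregation in plain Ruby, for rows given as hashes.
def slowest_actions(rows, limit: 10)
  stats = rows.group_by { |row| row[:subject] }.map do |subject, group|
    values = group.map { |row| row[:value] }
    { subject: subject, hits: values.size,
      avg_seconds: values.sum / values.size, max_seconds: values.max }
  end
  stats.sort_by { |stat| -stat[:avg_seconds] }.first(limit)
end

# Rows modeled on the example table above.
rows = [
  { subject: 'MyController#show', value: 6.983 },
  { subject: 'OtherController#index', value: 5.049 },
  { subject: 'MyController#destroy', value: 0.433 }
]
slowest_actions(rows).each { |stat| puts stat.inspect }
```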

# controllers/monitoring/dashboard_controller.rb
class Monitoring::DashboardController < ApplicationController
  def index
    # params[:form] is missing on the initial page load, so dig instead of chaining []
    @query = params.dig(:form, :query).presence || 'SELECT * FROM monitoring_measurements ORDER BY id DESC LIMIT 100'
    begin
      ActiveRecord::Base.transaction do
        @result = ActiveRecord::Base.connection.execute(@query)
        @result.map_types!(PG::BasicTypeMapForResults.new(ActiveRecord::Base.connection.raw_connection)) # make sure we get type casted fields
        raise ActiveRecord::Rollback # make sure that we can not do any harm
      end
    rescue ActiveRecord::StatementInvalid => e
      @errors = e.to_s
    end

    render partial: 'results', layout: false if request.xhr? # AJAX gets the partial only, otherwise the index template is rendered
  end
end

-# views/monitoring/dashboard/index.html.haml
%h1 Monitoring

= form_for :form, url: monitoring_dashboard_path, remote: true, html: { id: 'query-form' } do |f|
  = f.text_area :query
  = f.submit 'Execute'

%h2 Results

#results= render partial: 'results'

:javascript
  const form = $("#query-form");
  const results = $("#results");
  form.on("ajax:success", (event, data) => { results.html(data); });
  form.on("ajax:error", (event) => { alert('something went really wrong!'); });

-# views/monitoring/dashboard/_results.html.haml
- if @errors.blank? && @result.present?
  %table
    %thead
      %tr
        - @result.fields.each do |caption|
          %th= caption.titleize
    %tbody
      - @result.each do |row|
        %tr
          - row.each do |_field, value|
            %td= value
- else
  %p.text-error= @errors

The querying of the database is straightforward and the rollback at the end of the transaction discards any changes a query makes, so we effectively operate in read-only mode (and cannot accidentally delete the whole database). The type mapping was a little tricky to get right, because we had to dig around in the PostgreSQL adapter, but it worked eventually.

The neat thing about this view is that it works asynchronously (mind the remote: true in the form declaration). When we submit the form via AJAX, the controller action only renders the partial and returns the resulting HTML. This then replaces the contents of the results div.

Extensions

We added a few more things such as

a cleanup job which removes data older than 3 months
templates to store common queries (+20 lines of code)
other measures such as email, jobs and table sizes (see here)
and a dead simple visualization (if the second column contains a number, use it as y-value for a bar chart, +10 lines of code)
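
The rule behind the visualization can be sketched in a few lines. This is a text-only illustration under our own assumptions, not the code we run: `bar_chart_points` and `render_bars` are hypothetical names, and rows are assumed to be hashes of column name to value, as a query result yields them.

```ruby
# Hypothetical helper: returns [label, value] pairs if the second column of
# every row is numeric, otherwise nil (meaning: do not draw a chart).
def bar_chart_points(rows)
  return nil if rows.empty?

  points = rows.map { |row| row.values.first(2) }
  return nil unless points.all? { |_label, value| value.is_a?(Numeric) }

  points
end

# Renders the points as text bars, scaled so the largest value spans `width` characters.
def render_bars(points, width: 40)
  max = points.map { |_label, value| value }.max
  points.map do |label, value|
    length = max.positive? ? (value * width / max.to_f).round : 0
    format('%-24s %s %s', label, '#' * [length, 0].max, value)
  end
end

# Usage with made-up rows:
rows = [
  { 'subject' => 'MyController#show', 'avg_seconds' => 6.0 },
  { 'subject' => 'OtherController#index', 'avg_seconds' => 3.0 }
]
points = bar_chart_points(rows)
puts render_bars(points) if points
```

The first column serves as the label, so a query of the form "subject, some number" turns into a chart without any further configuration.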

Wrapping up

With fewer than 100 lines, we have a monitoring solution. Sure, it does not have all the bells and whistles, but it works and gives a rough idea where to start an investigation when things get slow. We can query for arbitrary things, because we are not limited to the instrumentation data. E.g. for a shop system, we could query for the “number of sign-ups last week”, “how many processed orders in March”, “number of hits on the article page with id 8781”.

We did not use any external services, we did not have to add any dependencies and the code is super easy to reason about. Amazing what you can do with Rails!

Please send us an email and tell us what you think.

P.S.: We removed some structural improvements (such as form objects) to focus the code on what counts.