Commit fd9a731

jazairi and JPrevost committed
Refactor SuggestedResource to leverage new Fingerprint model
Why these changes are being introduced:

We [recently decided](#145) to make a separate Fingerprint model, associated with Term, as multiple detectors are likely to use fingerprinting (implemented in #138). We have also begun to split the ActiveRecord components of detectors into separate models (implemented for Detector::Journal in #162).

Relevant ticket(s):

* [TCO-111](https://mitlibraries.atlassian.net/browse/TCO-111)
* [TCO-122](https://mitlibraries.atlassian.net/browse/TCO-122)

How this addresses that need:

* Splits the ActiveRecord components of Detector::SuggestedResource into a separate SuggestedResource model.
* Associates SuggestedResource with Fingerprint, via Term, such that a suggested resource can have multiple terms and fingerprints.
* Removes the suggested resource dashboard (see side effects).

Side effects of this change:

* Config has been adjusted to allow for development logging. This was lost in the most recent Rails upgrade.
* Terms that are associated with a suggested resource should not be destroyed. Rails does not support a restricting `:dependent` option on `belongs_to` associations, so this commit instead adds a `before_destroy` hook with a custom method that aborts the callback chain and logs the attempt in Sentry.
* Because administrate does not handle has_many relationships well, we will need to build a custom dashboard to manage suggested resources. This is ticketed as [TCO-145](https://mitlibraries.atlassian.net/browse/TCO-145). Until that UI is ready, we will use the Rails console to make any requested changes to suggested resources.

co-authored-by: Jeremy Prevost <[email protected]>
1 parent c94bf23 commit fd9a731

38 files changed: +504 -361 lines changed

README.md

+5 -1
@@ -47,9 +47,13 @@ config:
   webroot: .
 ```

+We use Lando here because its use in our WordPress environment. However, any static local webserver will work.
+
 If you need to regenerate these cassettes, the following procedure should be sufficient:

-1. Use the configuration above to ensure the needed files are visible at `http://static.lndo.site/filename.ext`.
+1. Use the configuration above to ensure the needed files are visible at `http://static.lndo.site/filename.ext` (i.e.,
+   run `lando start` in `tacos/test/fixtures/files`). If you are using a server other than Lando, configure it such that
+   `tacos/test/fixtures/files` is the root directory, then start the server.
 2. Delete any existing cassette files which need to be regenerated.
 3. Run the test(s).
 4. Commit the resulting files along with your other work.

app/dashboards/detector/suggested_resource_dashboard.rb

-75
This file was deleted.

app/models/detector/suggested_resource.rb

+3 -86
@@ -1,93 +1,10 @@
 # frozen_string_literal: true

-# == Schema Information
-#
-# Table name: detector_suggested_resources
-#
-#  id          :integer          not null, primary key
-#  title       :string
-#  url         :string
-#  phrase      :string
-#  fingerprint :string
-#  created_at  :datetime         not null
-#  updated_at  :datetime         not null
-#
-
 require 'stringex/core_ext'

 class Detector
-  # Detector::SuggestedResource stores custom hints that we want to send to the
-  # user in response to specific strings. For example, a search for "web of
-  # science" should be met with our custom login link to Web of Science via MIT.
-  class SuggestedResource < ApplicationRecord
-    before_save :update_fingerprint
-
-    def self.table_name_prefix
-      'detector_'
-    end
-
-    # This exists for the before_save lifecycle hook to call the calculate_fingerprint method, to ensure that these
-    # records always have a correctly-calculated fingerprint. It has no arguments and returns nothing.
-    def update_fingerprint
-      self.fingerprint = Detector::SuggestedResource.calculate_fingerprint(phrase)
-    end
-
-    # This implements the OpenRefine fingerprinting algorithm. See
-    # https://openrefine.org/docs/technical-reference/clustering-in-depth#fingerprint
-    #
-    # @param old_phrase [String] A text string which needs to have its fingerprint calculated. This could either be the
-    #   "phrase" field on the SuggestedResource record, or an incoming search term received from a contributing system.
-    #
-    # @return [String] A string of all words in the input, downcased, normalized, and alphabetized.
-    def self.calculate_fingerprint(old_phrase)
-      modified_phrase = old_phrase
-      modified_phrase = modified_phrase.strip
-      modified_phrase = modified_phrase.downcase
-
-      # This removes all punctuation and symbol characters from the string.
-      modified_phrase = modified_phrase.gsub(/\p{P}|\p{S}/, '')
-
-      # Normalize to ASCII (e.g. gödel and godel are liable to be intended to
-      # find the same thing)
-      modified_phrase = modified_phrase.to_ascii
-
-      # Coercion to ASCII can introduce new symbols, so we remove those now.
-      modified_phrase = modified_phrase.gsub(/\p{P}|\p{S}/, '')
-
-      # Tokenize
-      tokens = modified_phrase.split
-
-      # Remove duplicates and sort
-      tokens = tokens.uniq
-      tokens = tokens.sort
-
-      # Rejoin tokens
-      tokens.join(' ')
-    end
-
-    # This replaces all current Detector::SuggestedResource records with a new set from an imported CSV.
-    #
-    # @note This method is called by the suggested_resource:reload rake task.
-    #
-    # @param input [CSV::Table] An imported CSV file containing all Suggested Resource records. The CSV file must have
-    #   at least three headers, named "Title", "URL", and "Phrase". Please note: these values
-    #   are case sensitive.
-    def self.bulk_replace(input)
-      raise ArgumentError.new, 'Tabular CSV is required' unless input.instance_of?(CSV::Table)
-
-      # Need to check what columns exist in input
-      required_headers = %w[Title URL Phrase]
-      missing_headers = required_headers - input.headers
-      raise ArgumentError.new, "Some CSV columns missing: #{missing_headers}" unless missing_headers.empty?
-
-      Detector::SuggestedResource.delete_all
-
-      input.each do |line|
-        record = Detector::SuggestedResource.new({ title: line['Title'], url: line['URL'], phrase: line['Phrase'] })
-        record.save
-      end
-    end
-
+  # Detector::SuggestedResource handles detections for SuggestedResource records.
+  class SuggestedResource
     # Identify any SuggestedResource record whose pre-calculated fingerprint matches the fingerprint of the incoming
     # phrase.
     #
@@ -98,7 +15,7 @@ def self.bulk_replace(input)
     #
     # @return [Detector::SuggestedResource] The record whose fingerprint matches that of the search term.
     def self.full_term_match(phrase)
-      SuggestedResource.where(fingerprint: calculate_fingerprint(phrase))
+      ::SuggestedResource.joins(:fingerprints).where(fingerprints: { value: Fingerprint.calculate(phrase) })
     end

     # Look up any matching Detector::SuggestedResource records, building on the full_term_match method. If a match is
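
For readers who want to see what the fingerprinting steps in the removed `calculate_fingerprint` method produce, here is a minimal standalone sketch. It is not the project's `Fingerprint.calculate` (which this diff does not show), and the name `fingerprint_sketch` is hypothetical. It also swaps stringex's `to_ascii` for Ruby's built-in Unicode decomposition, which only approximates ASCII transliteration by stripping combining accents.

```ruby
# Hypothetical helper, for illustration only. Follows the same sequence of
# steps as the removed method, with one substitution noted below.
def fingerprint_sketch(phrase)
  cleaned = phrase.strip.downcase

  # Remove punctuation and symbol characters.
  cleaned = cleaned.gsub(/\p{P}|\p{S}/, '')

  # Assumption: approximate `to_ascii` by decomposing characters (NFKD) and
  # dropping combining marks, so "gödel" becomes "godel".
  cleaned = cleaned.unicode_normalize(:nfkd).gsub(/\p{Mn}/, '')

  # Decomposition can surface new symbols, so strip them again.
  cleaned = cleaned.gsub(/\p{P}|\p{S}/, '')

  # Tokenize, de-duplicate, sort, and rejoin.
  cleaned.split.uniq.sort.join(' ')
end

fingerprint_sketch('Web of Science!') # => "of science web"
fingerprint_sketch('  gödel Gödel ')  # => "godel"
```

Because word order, case, punctuation, and accents are all normalized away, different spellings of the same search collapse to one fingerprint value, which is what lets `full_term_match` compare an incoming phrase against stored fingerprints.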

app/models/suggested_resource.rb

+43
@@ -0,0 +1,43 @@
+# frozen_string_literal: true
+
+# SuggestedResource stores custom hints that we want to send to the
+# user in response to specific strings. For example, a search for "web of
+# science" should be met with our custom login link to Web of Science via MIT.
+class SuggestedResource < ApplicationRecord
+  has_many :terms, dependent: :nullify
+  has_many :fingerprints, through: :terms, dependent: :nullify
+
+  # This replaces all current SuggestedResource records with a new set from an imported CSV.
+  #
+  # @note This method is called by the suggested_resource:reload rake task.
+  #
+  # @param input [CSV::Table] An imported CSV file containing all Suggested Resource records. The CSV file must have
+  #   at least three headers, named "Title", "URL", and "Phrase". Please note: these values
+  #   are case sensitive.
+  def self.bulk_replace(input)
+    raise ArgumentError.new, 'Tabular CSV is required' unless input.instance_of?(CSV::Table)
+
+    # Need to check what columns exist in input
+    required_headers = %w[title url phrase]
+    missing_headers = required_headers - input.headers
+    raise ArgumentError.new, "Some CSV columns missing: #{missing_headers}" unless missing_headers.empty?
+
+    SuggestedResource.destroy_all
+
+    input.each do |line|
+      term = Term.find_or_create_by(phrase: line['phrase'])
+
+      # check for existing SuggestedResource with the same title/url
+      dup_check = SuggestedResource.where(title: line['title'], url: line['url'])
+
+      # link to existing SuggestedResource if one exists
+      term.suggested_resource = if dup_check.count.positive?
+                                  dup_check.first
+                                # create a new SuggestedResource if it doesn't exist
+                                else
+                                  SuggestedResource.new({ title: line['title'], url: line['url'] })
+                                end
+      term.save
+    end
+  end
+end
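
Note that the header check in the new `bulk_replace` compares against lowercase names (`title`, `url`, `phrase`), while the doc comment still mentions "Title", "URL", and "Phrase". The following standalone sketch, using only Ruby's `csv` library and made-up sample rows, illustrates how that check behaves and how rows sharing a title and URL would map to one resource with several phrases. It does not touch the database, and the URLs are placeholders.

```ruby
require 'csv'

# Hypothetical sample data with lowercase headers.
data = <<~CSV
  title,url,phrase
  Web of Science,https://example.org/wos,web of science
  Web of Science,https://example.org/wos,wos
CSV

table = CSV.parse(data, headers: true) # returns a CSV::Table
required_headers = %w[title url phrase]

# Same array subtraction used in bulk_replace; empty means all headers are present.
missing = required_headers - table.headers

# Capitalized headers fail the check, because the comparison is case sensitive.
legacy = CSV.parse("Title,URL,Phrase\nA,B,C\n", headers: true)
legacy_missing = required_headers - legacy.headers

# Rows with the same title/url pair collapse to one resource with many phrases.
grouped = table.group_by { |row| [row['title'], row['url']] }
               .transform_values { |rows| rows.map { |row| row['phrase'] } }
```

Here `missing` is empty, `legacy_missing` lists all three required headers, and `grouped` holds a single title/URL key with both phrases, mirroring the one-resource-to-many-terms association introduced by this commit.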

app/models/term.rb

+13
@@ -20,8 +20,10 @@ class Term < ApplicationRecord
   has_many :categorizations, dependent: :destroy
   has_many :confirmations, dependent: :destroy
   belongs_to :fingerprint, optional: true
+  belongs_to :suggested_resource, optional: true

   before_save :register_fingerprint
+  before_destroy :check_suggested_resource
   after_destroy :check_fingerprint_count

   scope :categorized, -> { where.associated(:categorizations).distinct }
@@ -104,6 +106,17 @@ def check_fingerprint_count
     fingerprint.destroy if fingerprint&.terms&.count&.zero?
   end

+  # This is called before_destroy to avoid orphaning SuggestedResource records. Deleting terms should be an unlikely
+  # event, so this should come up rarely. If it does, it warrants the extra care to delete the record manually in the
+  # Rails console.
+  def check_suggested_resource
+    return unless suggested_resource
+
+    Rails.logger.error('Cannot delete term with associated suggested resource')
+    Sentry.capture_message('Cannot delete term with associated suggested resource')
+    throw :abort
+  end
+
   # This method looks up all current detections for the given term, and assembles their confidence scores in a format
   # usable by the calculate_categorizations method. It exists to transform data like:
   # [{3=>0.91}, {1=>0.1}] and [{3=>0.95}]
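
The `throw :abort` in `check_suggested_resource` relies on Rails halting the callback chain, after which `destroy` returns `false` instead of deleting the record. The sketch below imitates that control flow in plain Ruby with `catch`/`throw`. `FakeTerm` is a hypothetical stand-in, not the real `Term` model, and the `destroy` method here is a simplified imitation rather than ActiveRecord's implementation.

```ruby
# Hypothetical stand-in for a Term record; no database or Rails involved.
FakeTerm = Struct.new(:phrase, :suggested_resource) do
  # Mirrors the guard in the diff: refuse deletion while a resource is attached.
  def check_suggested_resource
    return unless suggested_resource

    throw :abort
  end

  # Simplified imitation of destroy: false when a callback throws :abort.
  def destroy
    catch(:abort) do
      check_suggested_resource
      return true
    end
    false
  end
end

linked = FakeTerm.new('wos', 'Web of Science')
unlinked = FakeTerm.new('some other search', nil)

linked.destroy   # => false
unlinked.destroy # => true
```

In the real model, the same guard also logs the attempt through `Rails.logger` and Sentry before aborting, so a blocked deletion leaves a trace for whoever follows up in the Rails console.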

app/views/layouts/_site_nav.html.erb

-3
@@ -19,9 +19,6 @@
 <% if can? :view, :playground %>
   <%= link_to('Playground', '/playground', class: 'nav-item') %>
 <% end %>
-<% if can? :manage, :detector__suggested_resource %>
-  <%= link_to('Suggested Resources', admin_detector_suggested_resources_path, class: 'nav-item') %>
-<% end %>
 <% if can? :view, Categorization %>
   <%= link_to('Categorizations', admin_categorizations_path, class: 'nav-item') %>
 <% end %>

config/environments/development.rb

+4
@@ -76,6 +76,10 @@
   # Raise error when a before_action's only/except options reference missing actions.
   config.action_controller.raise_on_missing_callback_actions = true

+  # Local logging overrides
+  config.logger = Logger.new(STDOUT)
+  config.log_level = :debug
+
   # Apply autocorrection by RuboCop to files generated by `bin/rails generate`.
   # config.generators.apply_rubocop_autocorrect_after_generate!
 end

config/routes.rb

-5
@@ -5,11 +5,6 @@
   end

   namespace :admin do
-    # Lookup-style detector records
-    namespace :detector do
-      resources :suggested_resources
-    end
-
     # Knowledge graph models
     resources :detectors
     resources :detector_categories
@@ -0,0 +1,10 @@
+class CreateSuggestedResources < ActiveRecord::Migration[7.1]
+  def change
+    create_table :suggested_resources do |t|
+      t.string :title
+      t.string :url
+
+      t.timestamps
+    end
+  end
+end
@@ -0,0 +1,18 @@
+class DropDetectorSuggestedResources < ActiveRecord::Migration[7.1]
+  def up
+    drop_table :detector_suggested_resources
+  end
+
+  def down
+    create_table :detector_suggested_resources do |t|
+      t.string :title
+      t.string :url
+      t.string :phrase
+      t.string :fingerprint
+
+      t.timestamps
+    end
+    add_index :detector_suggested_resources, :phrase, unique: true
+    add_index :detector_suggested_resources, :fingerprint, unique: true
+  end
+end
@@ -0,0 +1,6 @@
+class AddSuggestedResourceToTerms < ActiveRecord::Migration[7.1]
+  def change
+    add_reference :terms, :suggested_resource
+    add_foreign_key :terms, :suggested_resources, on_delete: :nullify
+  end
+end

db/schema.rb

+11 -12
Some generated files are not rendered by default.
