Make URL-friendly filenames in Paperclip attachments
We use Thoughtbot’s Paperclip gem to attach images and other media to our Ruby on Rails models. When we save a document to a model object, we want its filename to be URL-friendly: lowercase, containing only letters, numbers and hyphens. Paperclip’s processing chain makes it easy to insert this step before the file is saved or passed through the thumbnail resizer.
To set this up you need a working Ruby on Rails project with the Paperclip gem installed, and a model that calls has_attached_file. Read the Paperclip documentation on how to set this up, or use Jim Neath’s example project.
Transliterate
To convert a filename into a URL-friendly format, we use this transliterate method:
require 'iconv' # standard library up to Ruby 1.9; a separate gem from Ruby 2.0 on
def transliterate(str)
  # Based on permalink_fu by Rick Olson
  # Convert the UTF-8 input to ASCII with Iconv: //translit approximates accented
  # letters, //ignore drops what cannot be converted. Iconv.iconv returns an
  # array of strings, so join it back into a single string.
  s = Iconv.iconv('ascii//ignore//translit', 'utf-8', str).join
  # Downcase string
  s.downcase!
  # Remove apostrophes so isn't changes to isnt
  s.gsub!(/'/, '')
  # Replace any run of non-letter, non-number characters with a space
  s.gsub!(/[^A-Za-z0-9]+/, ' ')
  # Remove spaces from beginning and end of string
  s.strip!
  # Replace groups of spaces with a single hyphen
  s.gsub!(/\ +/, '-')
  s
end
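Iconv was removed from Ruby’s standard library in Ruby 2.0, so on newer Rubies the method above needs the separate iconv gem. As a hedged alternative, here is a minimal sketch of an Iconv-free variant for Ruby 2.2+. It swaps Iconv’s transliteration for Unicode NFKD normalization (String#unicode_normalize) followed by String#encode, which drops whatever is still outside ASCII. The name transliterate_without_iconv is hypothetical.

```ruby
# Hypothetical Iconv-free variant of transliterate (Ruby 2.2+). NFKD
# normalization stands in for Iconv's //translit: "é" becomes "e" plus a
# combining accent, and encode then drops every remaining non-ASCII character.
def transliterate_without_iconv(str)
  s = str.unicode_normalize(:nfkd)
  s = s.encode('ASCII', invalid: :replace, undef: :replace, replace: '')
  s = s.downcase
  s = s.delete("'")              # isn't => isnt
  s = s.gsub(/[^a-z0-9]+/, ' ')  # any other run of characters becomes a space
  s.strip.gsub(/ +/, '-')        # trim, then one hyphen per run of spaces
end

puts transliterate_without_iconv("Crème Brûlée isn't α") # creme-brulee-isnt
```

With the inputs used in the transliterate tests below it gives the same results. Unlike the Iconv version, it simply drops letters such as ß that have no decomposed form instead of spelling them out.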
We keep this method in a shared library used by all our projects; besides cleaning up filenames, we also use it to generate permalinks from story headlines. You can put it in the Paperclip-enabled model itself, or in a shared library if you will use it in more than one place.
Filename processing
In the Paperclip-enabled model, register a before_post_process callback:
before_post_process :transliterate_file_name
Then define transliterate_file_name in the private section of the model:
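def transliterate_file_name
  # local_file_name is the column backing our attachment (has_attached_file :local)
  extension = File.extname(local_file_name).gsub(/^\.+/, '')
  filename  = File.basename(local_file_name, '.*')
  new_name  = transliterate(filename)
  # Only append an extension if there was one, to avoid a trailing "."
  new_name += ".#{transliterate(extension)}" unless extension.empty?
  self.local.instance_write(:file_name, new_name)
end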
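To see exactly what transliterate_file_name feeds into transliterate, here is a plain-Ruby sketch of the same split-and-rejoin logic that runs without Rails or Paperclip. rebuild_file_name is a hypothetical helper, and the slug lambda is a simplified stand-in for transliterate (it skips the Iconv step).

```ruby
# Hypothetical standalone helper: the split-and-rejoin logic of
# transliterate_file_name, with the block standing in for transliterate.
def rebuild_file_name(name)
  extension = File.extname(name).sub(/\A\.+/, '')  # "JPG", or "" when there is none
  basename  = File.basename(name, '.*')            # name without its last extension
  new_name  = yield(basename)
  new_name += ".#{yield(extension)}" unless extension.empty?
  new_name
end

# Simplified stand-in for transliterate: downcase, then hyphenate runs of
# anything that is not a lowercase letter or digit.
slug = lambda { |s| s.downcase.gsub(/[^a-z0-9]+/, ' ').strip.gsub(/ +/, '-') }

puts rebuild_file_name('IT_s_UPPERCASE_.JPG', &slug) # it-s-uppercase.jpg
puts rebuild_file_name('README', &slug)              # readme
puts rebuild_file_name('backup.tar.gz', &slug)       # backup-tar.gz
```

File.basename(name, '.*') strips only the last extension, so backup.tar.gz keeps gz as its extension and tar becomes part of the name.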
And that’s it. Paperclip accepts the new filename and then processes the attachment, so for an uploaded image every thumbnail it generates is based on the transliterated filename.
Transliterate tests
We use these tests for the transliterate method.
test 'transliterate should downcase and substitute spaces with dashes' do
  assert_equal 'this-is-a-story-headline', transliterate('This is a Story Headline')
end
test 'transliterate should remove apostrophes, punctuation, trailing characters' do
  assert_equal 'this-isnt-a-perfect-solution', transliterate(%Q{This isn't a "Perfect Solution."})
end
test 'transliterate should drop unicode characters without an ASCII equivalent' do
  assert_equal 'alpha-beta-gamma', transliterate('Alpha α, Beta β, Gamma γ')
end
test 'transliterate should turn underscores into dashes' do
  assert_equal 'change-underscores-to-dashes', transliterate('change_underscores_to_dashes')
end
Filename transliteration test
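We created a tiny image called “IT's, UPPERCASE!.JPG” and saved it in test/fixtures/files. Then we run this test on our Paperclip-enabled Photograph model. The expected name is it-s-uppercase.jpg rather than its-uppercase.jpg because Paperclip’s own cleanup has already turned the apostrophe into an underscore before our callback runs, and transliterate then turns that underscore into a hyphen.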
test 'should transliterate the filename' do
  photograph = Photograph.new
  path = File.join(RAILS_ROOT, 'test', 'fixtures', 'files', %Q{IT's, UPPERCASE!.JPG})
  # The block form closes the file even if the assertion fails
  File.open(path, 'rb') do |file|
    photograph.local = file
    assert_equal 'it-s-uppercase.jpg', photograph.local.original_filename
  end
end
Note that Paperclip does some basic filename transliteration of its own: it generally replaces non-letter or non-digit characters with underscores. That didn’t go far enough for us, since it leaves Unicode characters intact and can leave trailing underscores. Our transliteration is more complete, and the final filename can be used in a URL without escape codes.
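The difference is easiest to see side by side. This sketch approximates the underscore substitution described above with a single gsub; it is an illustration, not Paperclip’s actual code, and the regular expression is an assumption. The POSIX class [[:alnum:]] matches Unicode letters in Ruby, which is what lets characters like é and α pass through untouched.

```ruby
# Rough approximation (an assumption, not Paperclip's source) of the
# underscore substitution: each run of characters other than letters,
# digits, dots or hyphens becomes a single underscore.
def underscore_file_name(name)
  name.strip.gsub(/[^[:alnum:].\-]+/, '_')
end

puts underscore_file_name("IT's, UPPERCASE!.JPG") # IT_s_UPPERCASE_.JPG
puts underscore_file_name('Café α!.png')           # Café_α_.png
```

Both results keep a trailing underscore before the extension, and the second still contains é and α, which would need percent-encoding in a URL.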