DaveSouth.org


Make URL-friendly filenames in Paperclip attachments

We use Thoughtbot’s Paperclip gem to attach images and other media to our Ruby on Rails models. When we save documents to a model object, we want the filenames to be URL-friendly: lowercase, with only letters, numbers, and hyphens. Paperclip’s processing chain makes it easy to insert this behavior before the file is saved or run through the thumbnail resizer.

To set this up, you need a working Ruby on Rails project, the Paperclip gem installed, and a model with has_attached_file. Read the Paperclip documentation to learn how to do that, or use Jim Neath’s example project.

Transliterate

To convert the filename into a URL-friendly format, we use the transliterate method.

  def transliterate(str)
    # Based on permalink_fu by Rick Olson

    # Transliterate str from UTF-8 to ASCII with Iconv (requires the iconv
    # library). Iconv.iconv returns an array of strings, so join the pieces.
    s = Iconv.iconv('ascii//ignore//translit', 'utf-8', str).join

    # Downcase string
    s.downcase!

    # Remove apostrophes so that isn't becomes isnt
    s.gsub!(/'/, '')

    # Replace any non-letter or non-number character with a space
    s.gsub!(/[^A-Za-z0-9]+/, ' ')

    # Remove spaces from beginning and end of string
    s.strip!

    # Replace groups of spaces with single hyphen
    s.gsub!(/\ +/, '-')

    return s
  end

We keep this method in a shared library in all our projects. Besides changing filenames, we also use it to generate permalinks from story headlines. You can put it in the Paperclip-enabled model itself, or in a shared library if you will use it in more than one place.
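
If Iconv is not available (it was removed from Ruby's standard library in Ruby 2.0), the same cleanup can be approximated with String#unicode_normalize, which needs Ruby 2.2 or later. This is only a sketch under those assumptions: the method name transliterate_without_iconv is made up for illustration, and characters with no ASCII decomposition are dropped rather than transliterated, so results can differ from what Iconv produces.

```ruby
# Hypothetical Iconv-free variant of transliterate (name is illustrative only).
def transliterate_without_iconv(str)
  # Decompose accented letters into base letter + combining mark,
  # then drop every character that has no US-ASCII equivalent
  s = str.unicode_normalize(:nfkd)
         .encode('US-ASCII', invalid: :replace, undef: :replace, replace: '')

  # Downcase and remove apostrophes so that isn't becomes isnt
  s = s.downcase.delete("'")

  # Collapse runs of other characters to spaces, trim, then hyphenate
  s.gsub(/[^a-z0-9]+/, ' ').strip.gsub(/ +/, '-')
end

puts transliterate_without_iconv('Crème Brûlée')              # creme-brulee
puts transliterate_without_iconv('Alpha α, Beta β, Gamma γ')  # alpha-beta-gamma
```

Greek letters vanish here because they have no ASCII decomposition, which happens to match what the tests further down expect.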

Filename processing

In the Paperclip-enabled model, add the before_post_process declaration.

  before_post_process :transliterate_file_name

Then, in the private section of the model, define transliterate_file_name.

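  def transliterate_file_name
    # Split the name into extension and base name. File.basename strips the
    # extension literally, so no regular expression escaping is needed.
    extension = File.extname(local_file_name)
    filename = File.basename(local_file_name, extension)
    # "local" is the attachment name given to has_attached_file.
    # transliterate also drops the leading dot from the extension.
    self.local.instance_write(:file_name, "#{transliterate(filename)}.#{transliterate(extension)}")
  end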

And that’s it. Paperclip will accept the new filename and then begin processing the attachment. If the attachment is an uploaded image, all subsequently generated thumbnails will be based on the transliterated filename.
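
To see what the callback does to a name without loading Rails or Paperclip, the split-and-rejoin step can be sketched in plain Ruby. The helper names simple_slug and url_friendly_filename are hypothetical, and simple_slug is an ASCII-only stand-in for transliterate. The sketch also leaves out the dot when a file has no extension, a case where the callback would otherwise produce a trailing dot.

```ruby
# ASCII-only stand-in for transliterate, so the sketch runs on its own.
def simple_slug(str)
  str.downcase.delete("'").gsub(/[^a-z0-9]+/, ' ').strip.gsub(/ +/, '-')
end

# Hypothetical helper: split the name, clean each part, rejoin.
def url_friendly_filename(name)
  ext  = File.extname(name)         # ".JPG", or "" when there is none
  base = File.basename(name, ext)   # the name without its extension
  return simple_slug(base) if ext.empty?

  "#{simple_slug(base)}.#{simple_slug(ext)}"
end

puts url_friendly_filename('IT_s___UPPERCASE_.JPG')  # it-s-uppercase.jpg
puts url_friendly_filename('README')                 # readme
```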

Transliterate tests

We use these tests for the transliterate method.

  test 'transliterate should downcase and substitute spaces with dashes' do
    assert_equal 'this-is-a-story-headline', transliterate('This is a Story Headline')
  end

  test 'transliterate should remove apostrophes, punctuation, trailing characters' do
    assert_equal 'this-isnt-a-perfect-solution', transliterate(%Q{This isn't a "Perfect Solution."})
  end

  test 'transliterate should turn unicode characters into dashes' do
    assert_equal 'alpha-beta-gamma', transliterate('Alpha α, Beta β, Gamma γ')
  end
  
  test 'transliterate should turn underscores into dashes' do
    assert_equal 'change-underscores-to-dashes', transliterate('change_underscores_to_dashes')
  end

Filename transliteration test

We created a tiny image called “IT's,  UPPERCASE!.JPG” and saved it to test/fixtures/files. Then we ran this test on our Paperclip-enabled Photograph model.

  test 'should transliterate the filename' do
    photograph = Photograph.new
    file = File.new(File.join(RAILS_ROOT, 'test', 'fixtures', 'files', %Q{IT's,  UPPERCASE!.JPG}), 'rb')
    photograph.local = file
    assert_equal 'it-s-uppercase.jpg', photograph.local.original_filename
    file.close
  end

Note that Paperclip does some basic filename cleanup of its own: it generally replaces non-letter and non-digit characters with underscores. That is why the test above expects it-s-uppercase.jpg rather than its-uppercase.jpg; the apostrophe has already become an underscore by the time transliterate runs. Paperclip's cleanup did not go far enough for us, since it leaves Unicode characters intact and can leave trailing underscores. This transliteration is more complete, and the final filename can be used in a URL without escape codes.
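
One way to check that last point is to run a name through a URL encoder and confirm that it comes back unchanged. The sketch below uses ERB::Util.url_encode from Ruby's standard library; the two sample names are only illustrations.

```ruby
require 'erb'

raw   = "IT's, UPPERCASE!.JPG"
clean = 'it-s-uppercase.jpg'

# The raw name needs percent-escapes for the apostrophe, comma, space and "!"
puts ERB::Util.url_encode(raw)

# The transliterated name holds only letters, digits, hyphens and a dot,
# so encoding leaves it untouched
puts ERB::Util.url_encode(clean) == clean   # prints "true"
```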