There are several good ways to make a Jekyll site searchable. In the past I’ve used Lunr.js and, before that, Simple Jekyll Search. These are good options, but how well they work depends entirely on how you parse the text content that drives the search. For example, for the USRSE site, I thought I had stumbled on a fairly good solution:

{{ post.content | strip_html | replace_regex: "[\s/\n]+"," " | strip | jsonify }}

It could probably benefit from a strip_newlines filter, but the above does work on GitHub Pages, in the sense that the search works. So what has been bothering me? How the content looks once it’s passed into a JSON object.

It’s totally gross. Depending on how you implement the search (with some kind of preview?), the user might see this cruft in their search results.
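To make the problem concrete, here is a rough Python emulation of the first pipeline. It is a sketch under assumptions, not what Jekyll actually runs: the regex strip_html only approximates Liquid's, json.dumps stands in for jsonify, and SAMPLE_HTML is made-up input. One thing worth knowing: replace_regex is not among Liquid's or Jekyll's built-in filters, and with Jekyll's default (non-strict) Liquid settings an unknown filter passes its input through unchanged instead of raising an error, so the sketch also shows the output when that step does nothing.

```python
import json
import re

# Made-up stand-in for post.content (rendered HTML); not taken from a real post.
SAMPLE_HTML = '<h2>Intro</h2>\n<p>Say "hello"\nto    Jekyll/Liquid.</p>\n'


def strip_html(text: str) -> str:
    """Rough approximation of Liquid's strip_html: delete anything shaped like a tag."""
    return re.sub(r"<[^>]*>", "", text)


def original_pipeline(html: str, apply_regex: bool = True) -> str:
    """Emulate strip_html | replace_regex | strip | jsonify from the first approach."""
    text = strip_html(html)
    if apply_regex:
        # Same pattern as the template: collapse whitespace and slashes into one space.
        text = re.sub(r"[\s/\n]+", " ", text)
    # json.dumps stands in for jsonify; both escape double quotes and newlines.
    return json.dumps(text.strip())


print(original_pipeline(SAMPLE_HTML))
# "Intro Say \"hello\" to Jekyll Liquid."
print(original_pipeline(SAMPLE_HTML, apply_regex=False))
# "Intro\nSay \"hello\"\nto    Jekyll/Liquid."
```

Even when the regex step runs, every double quote still comes out escaped, and when it is skipped the newlines survive as literal \n sequences, which is the kind of cruft a search preview would show.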

A Better Solution?

I stumbled on a much cleaner solution this morning while looking for an easy way to use regular expressions or the replace filter. We can actually use Jekyll’s slugify filter to do the work for us! It’s meant to turn a string into a URL slug (lowercase, with no spaces), but we can also use it to clean up leftover markup and illegal characters. So here is what I tried instead:

{{ post.content | strip_html | strip_newlines | slugify: 'ascii' | replace: '-',' ' }}
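Here is a similar Python sketch of the new chain, useful for seeing what each filter contributes. It approximates Jekyll's slugify in 'ascii' mode as documented (each run of characters outside A-Z, a-z, 0-9 becomes a hyphen, hyphens at the ends are trimmed, the result is lowercased); strip_html is again a regex approximation, and SAMPLE_HTML is made-up input.

```python
import re

# Made-up stand-in for post.content (rendered HTML).
SAMPLE_HTML = '<h2>Intro</h2>\n<p>Say "hello"\nto    Jekyll/Liquid.</p>\n'

# 'ascii' mode: any run of characters outside A-Z, a-z, 0-9 is a separator.
NON_ALNUM_RUN = re.compile(r"[^A-Za-z0-9]+")


def strip_html(text: str) -> str:
    """Rough approximation of Liquid's strip_html: delete anything shaped like a tag."""
    return re.sub(r"<[^>]*>", "", text)


def strip_newlines(text: str) -> str:
    """Like Liquid's strip_newlines: delete line breaks without inserting a space."""
    return re.sub(r"\r?\n", "", text)


def slugify_ascii(text: str) -> str:
    """Approximate Jekyll's slugify in 'ascii' mode."""
    slug = NON_ALNUM_RUN.sub("-", text)  # each separator run becomes one hyphen
    return slug.strip("-").lower()       # trim end hyphens, then lowercase


def search_text(html: str, drop_newlines: bool = True) -> str:
    """Emulate strip_html | strip_newlines | slugify: 'ascii' | replace: '-',' '."""
    text = strip_html(html)
    if drop_newlines:
        text = strip_newlines(text)
    return slugify_ascii(text).replace("-", " ")


print(search_text(SAMPLE_HTML))
# introsay hello to jekyll liquid
print(search_text(SAMPLE_HTML, drop_newlines=False))
# intro say hello to jekyll liquid
```

One thing the sketch surfaces: strip_newlines deletes line breaks rather than replacing them with a space, so words separated only by a newline (here the heading "Intro" and the paragraph's "Say") get glued into one token. Since 'ascii' mode already treats a newline as a separator, leaving strip_newlines out of the chain avoids that. Also note that 'ascii' mode treats non-ASCII letters as separators too, so a word like "café" is indexed as "caf".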

And it worked like a charm! The result is much cleaner.

It almost works as a tokenizer: all that’s left are the words, separated by single spaces. Slugify also lowercases everything, but since users typically search in lowercase anyway, I’m not worried about that.
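Because the indexed text is just lowercase words separated by single spaces, a search only has to normalize the query the same way and compare tokens. The sketch below is a hypothetical minimal matcher in Python; the INDEX entries and the exact-token, all-terms matching rule are assumptions for illustration, not how Lunr.js or Simple Jekyll Search work. Lowercasing the query also covers users who do type capitals.

```python
import re

# Hypothetical index: page URL -> text already cleaned by the slugify pipeline.
INDEX = {
    "/post-1/": "a better solution use the jekyll slugify filter",
    "/post-2/": "on sunday it was a day of pancakes",
}


def normalize_query(query: str) -> list[str]:
    """Lowercase and split on non-alphanumerics, matching how the index was built."""
    return [tok for tok in re.split(r"[^a-z0-9]+", query.lower()) if tok]


def search(query: str) -> list[str]:
    """Return URLs whose text contains every query term as a whole token."""
    terms = normalize_query(query)
    hits = []
    for url, text in INDEX.items():
        tokens = set(text.split())
        if terms and all(term in tokens for term in terms):
            hits.append(url)
    return hits


print(search("Jekyll SLUGIFY"))  # ['/post-1/']
print(search("pancake"))         # [] (exact tokens only, no stemming)
```

Exact token matching means "pancake" will not find "pancakes"; a library like Lunr.js applies its own tokenizing and stemming on top of whatever text you feed it.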

I wanted to get this online because it’s definitely something I’ll lose or forget about, and then want a place to look up. Very likely someone else is running into this issue too, and will be thankful to find it.

And on Sunday, it was a day of pancakes. And boy, it was great!




Suggested Citation:
Sochat, Vanessa. "Jekyll Search Content." @vsoch (blog), 30 Jun 2019, https://vsoch.github.io/2019/jekyll-search/ (accessed 18 Nov 24).