As content creators, we’re always looking for ways to optimise our workflow and automate the tedious aspects of web development. One such task is creating URL slugs. We all know the importance of a clean, readable slug for SEO and user experience, but what happens when you’re drafting a post with a title as long as a railroad track?

What is a URL slug?

A URL slug is the part of a web address that identifies a specific page, usually the final segment of the path after the domain name. It typically represents the title or subject of the page in a readable, SEO-friendly format. For example, in the URL www.example.com/slug-definition, “slug-definition” is the slug. It helps both search engines and users understand what the page is about before clicking on it. A good slug is concise, descriptive, and free from unnecessary characters.

Traditionally, I’ve relied on a simple shell script to generate slugs. The script lower-cases the title, strips away everything except the letters, and joins the words with dashes. It’s straightforward and works well for shorter titles. With a lengthy headline, though, the result is cumbersome: the slug ends up just as long as the title, which isn’t ideal. That’s when I thought, why not bring in some AI? What if I used natural language processing to extract the meaning from the title and generate a more concise, meaningful slug?
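
To make the basic approach concrete, here is a minimal sketch of the same lower-case, strip, and dash steps. It is written in Python rather than shell, so it is not the script described above; the function name basic_slug is hypothetical, and the sketch assumes ASCII titles (accented letters would be dropped).

```python
import re


def basic_slug(title: str) -> str:
    """Lower-case the title, keep only letters, and join the words with dashes."""
    # Replace every run of non-letter characters with a single space so that
    # word boundaries survive the stripping step. Assumption: ASCII letters only.
    words = re.sub(r"[^a-z]+", " ", title.lower()).split()
    return "-".join(words)


print(basic_slug("Hello, World: A Very Long Title!"))
# prints: hello-world-a-very-long-title
```

Every word of the title survives, which is exactly why long titles produce long slugs.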

I initially considered summarising the title based on its semantic meaning, but I found the endeavour pretty ambitious for something as simple as a URL slug. So, I opted for a more practical solution: tokenising the title and extracting key words to create a slug that still conveys the main ideas, without being unnecessarily long.

Enter spaCy

To achieve this, I turned to spaCy, an open-source Python library for natural language processing. I wrote a script that extracts the most important parts of speech, specifically nouns and verbs, from the title. After identifying these key terms, the script removes any duplicates and constructs the slug in two parts. The first half comes from the first few nouns and verbs, and the second half from the last few.
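
The steps just described can be sketched roughly as follows. This is an illustration under stated assumptions, not the script from the gist: the names tag_title and keyword_slug, the head and tail counts, and the choice of the en_core_web_sm model are all hypothetical. The slug logic takes plain (word, part-of-speech) pairs, so it can be tried without spaCy installed; the tags in the example were written by hand.

```python
from typing import Iterable, List, Tuple

KEY_POS = {"NOUN", "VERB"}  # assumption: proper nouns (PROPN) are not included


def tag_title(title: str) -> List[Tuple[str, str]]:
    """Tag each alphabetic token with its coarse part of speech using spaCy."""
    import spacy  # imported lazily so the rest of the sketch runs without spaCy

    nlp = spacy.load("en_core_web_sm")  # assumed model; must be installed separately
    return [(tok.text, tok.pos_) for tok in nlp(title) if tok.is_alpha]


def keyword_slug(tagged: Iterable[Tuple[str, str]], head: int = 3, tail: int = 2) -> str:
    """Build a slug from the first `head` and last `tail` unique nouns and verbs."""
    # dict.fromkeys drops duplicates while preserving first-seen order.
    keywords = list(dict.fromkeys(w.lower() for w, pos in tagged if pos in KEY_POS))
    if len(keywords) <= head + tail:
        return "-".join(keywords)
    # Slice from an explicit index so that tail=0 selects nothing.
    return "-".join(keywords[:head] + keywords[len(keywords) - tail:])


# Hand-tagged example; real tags would come from tag_title(title).
tagged = [
    ("How", "ADV"), ("to", "PART"), ("build", "VERB"), ("a", "DET"),
    ("slug", "NOUN"), ("generator", "NOUN"), ("that", "PRON"),
    ("shortens", "VERB"), ("long", "ADJ"), ("titles", "NOUN"),
    ("and", "CCONJ"), ("keeps", "VERB"), ("the", "DET"),
    ("slug", "NOUN"), ("meaning", "NOUN"),
]
print(keyword_slug(tagged))
# prints: build-slug-generator-keeps-meaning
```

With spaCy and the model installed, keyword_slug(tag_title(title)) would run the whole pipeline, though the tags a real model assigns may differ from the hand-written ones above.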

Here’s the twist: if the title is shorter than a certain number of words, the slug can be constructed from all of the words in the title and the shell script won’t call the NLP script at all. But if the title exceeds this threshold, the NLP script kicks in, distilling lengthy titles down into concise, meaningful slugs.
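
A rough sketch of that dispatch, again in Python rather than shell, might look like this. The threshold value, the name make_slug, and the shorten callback (standing in for the call to the NLP script) are assumptions for illustration; the post does not state the actual cut-off.

```python
import re
from typing import Callable

WORD_THRESHOLD = 6  # hypothetical cut-off; the real value is not given


def make_slug(title: str, shorten: Callable[[str], str],
              threshold: int = WORD_THRESHOLD) -> str:
    """Use every word for short titles; hand long titles to `shorten`."""
    words = re.sub(r"[^a-z]+", " ", title.lower()).split()
    if len(words) <= threshold:
        # Short title: the plain slug is already concise, so skip the NLP step.
        return "-".join(words)
    # Long title: delegate to the (stand-in) NLP-based shortener.
    return shorten(title)


# A stub shortener makes it easy to see which branch was taken.
print(make_slug("A Short Title", shorten=lambda t: "nlp-slug"))
# prints: a-short-title
```

Keeping the cheap path first means the NLP model only needs to load for the titles that actually benefit from it.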

This method produces SEO-friendly slugs that not only shorten long titles but also focus on the core ideas of the post. It’s smarter than the basic lowercasing-and-dash approach, and it copes with long, complex titles that would otherwise produce unwieldy slugs.

The result? Clean, effective slugs that incorporate a bit of AI into your workflow.

Check out the gist