I am still in the process of moving my blog from WordPress to Hugo. The content has been migrated, but WordPress left behind quite a mess of image files. Here’s a simple way to find and remove the unused ones with a few command-line tools.

1. Find All Used Images in Your Content

First, extract all image references from your Markdown files. This command searches for Markdown image links and outputs a sorted, unique list:

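grep -hroP '!\[[^\]]*\]\(\K[^)]+' content/ | sed 's/[[:space:]]*"[^"]*"$//' | sort -u > used_images.txt
  • grep -hroP '!\[[^\]]*\]\(\K[^)]+' content/ prints the target of every Markdown image link (![alt](path)) under content/. -r recurses, -h omits file names, -o prints only the matched part, and -P enables Perl regular expressions so \K can drop the ![alt]( prefix (-P requires GNU grep). Matching the alt text with [^\]]* rather than .* ensures every image on a line is captured, not just the last one.
  • sed 's/[[:space:]]*"[^"]*"$//' strips an optional link title, so /images/a.png "Title" becomes /images/a.png.
  • sort -u sorts the paths and removes duplicates.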
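To see what the extraction produces, here is a throwaway demo that runs the pipeline against a made-up post in a temporary directory. It uses the tightened regex ([^\]]* for the alt text, [^)]+ for the target) so that two images on one line are both found and the sed step has a title to strip. It assumes GNU grep with PCRE support; the file name hello.md and the image paths are invented for the example.

```shell
#!/usr/bin/env bash
# Demo of the step 1 pipeline on throwaway sample content.
# Assumes GNU grep with -P support; file names and paths are made up.
set -euo pipefail

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/content/posts"

# Two images on one line (one with a Markdown title), plus a duplicate.
cat > "$demo/content/posts/hello.md" <<'EOF'
Intro ![Logo](/images/logo.png) and ![Chart](/images/chart.jpg "Sales chart")
Again: ![Logo](/images/logo.png)
EOF

cd "$demo"
# Extract link targets, strip titles, then sort and deduplicate.
grep -hroP '!\[[^\]]*\]\(\K[^)]+' content/ \
  | sed 's/[[:space:]]*"[^"]*"$//' \
  | sort -u > used_images.txt

cat used_images.txt
```

It prints /images/chart.jpg and /images/logo.png: the title is gone and the repeated logo appears only once.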

2. List All Image Files in Your Static Folder

Next, list every image file in your static directory:

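find ./static -type f \( -iname "*.jpg" -o -iname "*.jpeg" -o -iname "*.png" -o -iname "*.gif" -o -iname "*.svg" \) -printf "/%P\n" | sort -u > all_images.txt

This command finds all common image types (-iname matches extensions case-insensitively, so photo.JPG is included too) and prints each path relative to the static folder with a leading slash, for example /images/photo.jpg. Hugo serves static/ at the site root, so these match content that links to images with root-relative paths; references written as full URLs or relative paths will not match. Note that -printf is a GNU find extension.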
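A similar throwaway demo for the find step, with made-up file names in a temporary directory. It uses -iname rather than -name so that an upper-case extension such as .JPG is not missed; -printf requires GNU find.

```shell
#!/usr/bin/env bash
# Demo of the step 2 find command on a throwaway static/ tree.
# Assumes GNU find (for -printf); file names are made up.
set -euo pipefail

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/static/images" "$demo/static/css"
touch "$demo/static/images/logo.png" \
      "$demo/static/images/old-header.JPG" \
      "$demo/static/css/site.css"

cd "$demo"
# %P is the path with the starting point (./static) removed.
find ./static -type f \( -iname "*.jpg" -o -iname "*.jpeg" -o -iname "*.png" \
  -o -iname "*.gif" -o -iname "*.svg" \) -printf "/%P\n" | sort -u > all_images.txt

cat all_images.txt
```

It prints /images/logo.png and /images/old-header.JPG; site.css is skipped because it is not an image.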

3. Compare Used and Unused Images

Now, compare the two lists to find images that are present in your static folder but not referenced in your content:

comm -23 all_images.txt used_images.txt > unused_images.txt
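  • comm -23 suppresses column 2 (lines only in used_images.txt) and column 3 (lines in both files), leaving only the lines unique to all_images.txt: the unused images. comm requires both inputs to be sorted, which the sort -u calls above take care of.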
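The behaviour of comm -23 is easiest to see on two tiny lists. In this sketch the paths are invented, and both files are sorted with LC_ALL=C, the same collation comm is run with, since comm expects its inputs sorted in the current locale's order.

```shell
#!/usr/bin/env bash
# Shows what comm -23 keeps, using two tiny hand-written lists.
set -euo pipefail

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo"

# Sort both inputs with the same collation that comm will use.
printf '%s\n' /images/a.png /images/b.png /images/c.png | LC_ALL=C sort -u > all_images.txt
printf '%s\n' /images/b.png /images/z.png | LC_ALL=C sort -u > used_images.txt

# Column 1: only in all_images.txt  (kept)
# Column 2: only in used_images.txt (suppressed by -2)
# Column 3: in both files           (suppressed by -3)
LC_ALL=C comm -23 all_images.txt used_images.txt > unused_images.txt
cat unused_images.txt
```

It prints /images/a.png and /images/c.png. /images/z.png appears only in used_images.txt, meaning a post references a file that does not exist; comm -13 all_images.txt used_images.txt lists those broken references instead.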

4. Review and Clean Up

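Open unused_images.txt and review the list before deleting anything. The grep in step 1 only sees Markdown image syntax, so images referenced from front matter, shortcodes such as figure, raw HTML <img> tags, theme templates, or config files will show up here even though they are still in use.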
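Once the list has been checked, the deletion itself can be scripted. The function below is a minimal sketch: delete_unused and the DRY_RUN variable are hypothetical names, it assumes the paths in the list start with / and are relative to static/ (as produced by the find command in step 2), and by default it only prints what it would remove.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete the files listed in a reviewed list file.
# Each listed path starts with "/" and is relative to the static dir,
# so it is prefixed with that dir. DRY_RUN=1 (default) only prints.
set -euo pipefail

delete_unused() {
  local list=$1 static_dir=${2:-static} dry_run=${DRY_RUN:-1}
  local path file
  # "|| [ -n ... ]" also handles a last line without a trailing newline.
  while IFS= read -r path || [ -n "$path" ]; do
    [ -n "$path" ] || continue              # skip blank lines
    file="${static_dir}${path}"
    if [ ! -f "$file" ]; then
      echo "not found, skipping: $file" >&2
    elif [ "$dry_run" = 1 ]; then
      echo "would delete: $file"
    else
      rm -- "$file"
    fi
  done < "$list"
}
```

Run delete_unused unused_images.txt static from the site root to preview, then DRY_RUN=0 delete_unused unused_images.txt static to actually remove the files.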