Case Study: How I used PDF Index Generator software for a paperback anthology


Researching index generating software

A couple of months ago I researched different index generation software options for a couple of authors I was mentoring, who were self-publishing a Singapore expat travel guide, and I wrote an indexing summary article about my findings.

After I’d done the research I realized it was also useful for an Alzheimer’s anthology collaboration project I was working on. As well as taking on the role of project manager, and internal content formatter for the e-book and paperback (in Vellum), I decided to commit to learning one of the indexing software options. So after trying the demo version of PDF Index Generator (which lets you index 10 pages), I bought the software to use on all of my paperback projects.


Do you need an index if you have a table of contents?

I have purchased quite a few nonfiction books written by indie authors recently about self-publishing and writing, and I was surprised that not one of them had an index. Whether an index is necessary depends on your content. A good rule of thumb is that a fiction paperback doesn’t need one, but a nonfiction book would benefit from one.

We’ve all got used to using a table of contents in our documents and books. It does a good job of providing a high-level view of the content in sequential order, but an index is much more useful because it provides the next level of detail and creates relationships between related content.


Auto generating my index in PDF Index Generator software

I thought indexing would be a lot more straightforward than it actually was. Obviously there’s a learning curve with any new software, as you figure out the basic functionality and work out how to streamline repetitive steps. The trick to using PDF Index Generator software is to use the font queries search function and leverage the default and custom include and exclude lists, but unless your text is consistent there will be some entries you need to edit manually.


Step One in the indexing process is to select whether you want to index all pages or just specific pages in your PDF file.

If your book includes front and back matter, you should exclude these from your index. In the index generator I started my indexing at the start of the first chapter and ended it at the end of the last chapter, which excluded the front and back matter pages.


Creating Include and Exclude word lists

Step Two in the indexing process is to set up your Include and Exclude word lists.

You have the option of indexing all words in the book – but I wouldn’t recommend this – because you don’t want to index every word in your text, only the key terms useful to your readers.

There are 25 predefined lists you can turn on or off (adjectives, adverbs, pronouns, verbs, places, etc.) but you can also set up your own Exclude word lists.

I defined my own list of words I wanted to exclude, based on my first set of indexing results. Here are the steps I took:

  • I ran the indexing process.
  • I reviewed the results list and untagged the words I didn’t want indexed.
  • I used these untagged words to create a list of words I wanted to exclude.
  • I returned to Step 2 and created my own Exclude Word list, and activated it for the next round of indexing.
  • I ran the indexing process again, and the 2,300-plus words I’d identified were automatically omitted.
My custom Exclude list
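
If you like to see a process written down as logic, the exclude-list loop above can be sketched in a few lines of Python. This is only an illustration of the idea, not how PDF Index Generator works internally. The function name `apply_exclude_list`, the sample words, and the page numbers are all made up for the example.

```python
def apply_exclude_list(index, exclude_words):
    """Return a copy of the index without any excluded words (case-insensitive)."""
    excluded = {word.lower() for word in exclude_words}
    return {term: pages for term, pages in index.items() if term.lower() not in excluded}


# Hypothetical first-pass results: every indexed word mapped to its page numbers.
first_pass = {
    "Alzheimer's": [3, 12, 40],
    "caregiver": [5, 18],
    "really": [2, 7, 9],
    "Often": [4, 11],
}

# Words untagged during the review become the custom Exclude list.
custom_exclude = ["really", "often"]

# Second round: the excluded words drop out automatically.
second_pass = apply_exclude_list(first_pass, custom_exclude)
print(sorted(second_pass))
```

The point of saving the untagged words as a list is that the second round needs no manual review for those words, however many times you regenerate the index.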

The anthology editors had created a list of specific keywords they wanted the book indexed for, and that’s an ideal way to make sure you’re creating an index around your primary search terms. They came up with a list of 70+ specific words to index for, and I used these to create a custom Include list.


This is actually an essential first step for creating your index. When you’re proofreading your text, or doing the final read through, make a list of the key words or terms that help your reader understand your book’s core content.


Define Search and Format Queries

By far the most complex element of the include/exclude process is writing queries. This is where you need consistency in your source content, so that the query picks up each instance of a specific word string or arrangement.


Each chapter in the anthology included an author mention, and each name was preceded by the same word, e.g. By Jean Lee. So I wrote a query to automatically find each instance of these author mentions, and format them by last name, first name in the index.


There are base queries you can use as templates and edit to create your own custom query. Once you understand the methodology of the query strings, it’s straightforward enough to work your way through the elements of a custom string, and there’s functionality to test the query and its results.

Where I ran into issues was when an author had three names. The two-name query string I’d created cut one of the names off and excluded it from the index, so I had to adjust these manually in the search results. With a bit more time and experience with these query strings, I have no doubt I’d have been able to capture all of the author names exactly as I wanted them to appear. But I was short on time, so it was easier and quicker to adjust the index listings manually.

But that means any time I edit my base file, I’ll need to run my index process again, and make the same manual adjustments. So the automated approach is the best way to set up your index exclusion and inclusion process, or to set specific listing formats.
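
To make the author-name problem concrete, here is a rough Python sketch of the same idea using a regular expression. It is not PDF Index Generator’s query syntax, which I’m not reproducing here. The pattern, the function name `byline_to_index_entry`, and the three-part name "Mary Ann Smith" are assumptions for illustration only.

```python
import re

# Assumed byline convention: the word "By" followed by two or more capitalised name parts.
BYLINE = re.compile(r"\bBy ((?:[A-Z][\w'-]*)(?: [A-Z][\w'-]*)+)")


def byline_to_index_entry(line):
    """Turn 'By Jean Lee' into 'Lee, Jean'; return None if no byline is found."""
    match = BYLINE.search(line)
    if match is None:
        return None
    parts = match.group(1).split()
    # Assumption: the final name part is the surname; everything before it is given names.
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


print(byline_to_index_entry("By Jean Lee"))
print(byline_to_index_entry("By Mary Ann Smith"))  # hypothetical three-part name
```

Allowing "two or more" name parts, instead of exactly two, is what stops a three-part name from being cut short. The surname-last assumption will still get some names wrong (multi-word surnames, for instance), so a manual check of the results is still worthwhile.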

One useful query type I haven’t used yet is a Font Query.

The font queries feature allows you to index phrases in your book that have a specific font styling, like indexing bold phrases. This tutorial will show you how to use font queries to index titles or subtitles based on font styles, colors, or font name.


Reviewing your index results

Step Three in the index generation process is reviewing your results. Depending on the size of your book, this can take several passes: you review your results page, then use it as input into your Exclude and Include lists and your query strings.


Once you’ve created your final list of search results (which could include multiple rounds of writing queries and creating lists, in order to automate your index edits), then it’s time to create index cross references and index hierarchy relationships.

You have to have focus and be in the right mindset to create the optimal index. It takes a lot of concentration and thought, and logic to write the queries. But there’s an immense satisfaction when you whittle down thousands of indexed words into a concise and useful nonfiction index you can append to your PDF file.

My first index

If you’re writing a nonfiction book – take the time to create an index, or enlist the help of a freelancer to build your index on your behalf … your readers will thank you for it.

Anthology Index in the paperback

Addendum: After having to regenerate my index using an updated version of the Anthology, I found a more efficient way to index the book:

  • Run the index without any custom filters.
  • Export the entire list
  • Delete words you DON’T want to index
  • Create an Include list based on all the remaining words
  • Run the index process using the Only Search for my specified list function

This looks pretty straightforward, but there is still some manipulation of the results needed. The results only show single words, so you’ll have to search through your manuscript again to identify your word strings, and add those to your Include list.

This is a reverse-engineered approach, but you can also approach your indexing from the other direction:

  • Review your book and identify the words and word strings you want to index for
  • Use these to create an Include List
  • Run the index process using the Only Search for my specified list function
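
As a rough illustration of what an "only search for my list" pass does, including multi-word strings, here is a small Python sketch. It is not the software’s own implementation. The function name `build_index`, the sample page text, and the Include terms are invented for the example.

```python
import re


def build_index(pages, include_terms):
    """Map each Include-list term to the sorted page numbers where it appears."""
    index = {}
    for term in include_terms:
        # Whole-word, case-insensitive match; works for single words and multi-word strings.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        hits = [number for number, text in sorted(pages.items()) if pattern.search(text)]
        if hits:
            index[term] = hits
    return index


# Hypothetical page text keyed by page number.
pages = {
    1: "Early signs of memory loss can be subtle.",
    2: "A caregiver often notices memory loss first.",
    3: "Support groups help every caregiver.",
}
include_list = ["memory loss", "caregiver", "diagnosis"]

index = build_index(pages, include_list)
print(index)
```

Terms that never appear in the text (here, "diagnosis") simply don’t get an entry, which is a handy way to spot Include-list items that need rewording.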

For more information about PDF Index Generator software – visit their website.


Share your nonfiction book indexing tips with me in the comments below.

Read about my involvement in this Anthology, and the power of words, over on my personal blog.


Author: Jay Artale

Focused on helping travel bloggers and writers achieve their self-publishing goals. Owner of Birds of a Feather Press. Travel Writer. Nonfiction Author. Project Manager Specialising in Content Marketing and Social Media Strategy.

5 thoughts on “Case Study: How I used PDF Index Generator software for a paperback anthology”

  1. I have used PDF Index Generator for the first time. I used it for the history of the church I grew up in. Someone who was involved in writing a church history had recommended it to me.

    First, if you have the money and don’t like to fuss, hire someone.
    Second, I like the article Jay Artale has here. I wish I would have seen it earlier. The other shorts on YouTube are so many that it is hard to find what Jay is talking about here.
    Third, I got tired after a while and just wanted it done. Now that I have done the index, I regret that I did not take a few more minutes to, for example, follow the recommendation as to where to start the pages. (Because I did not leave blanks for the beginning Roman Numerals, they are in the index when they should not be).
    Fourth, there is some grunt work involved, at least the way I did it. One goes through the lines of paired couplets and replaces them with a name that the software missed. Which, by the way, is a great amount. For example, if there is a period after a last name, it did not pick it up.
    Fifth, those queries in step three are the meat of the Index, as she implies.
    Sixth, as Jay says here, it is a nice sense of satisfaction when you have this figured out. So if you see yourself with a high average IQ, then plan to spend six or seven hours to understand the index.
