Information Management Part II: The Database

Though I’ve been looking at the story of my great uncle Jack and the crew of B for Baker since about 1996, my work has really become serious over the last five or so years. In that time I’ve amassed and created a great deal of information: facts, documents, books, photographs, sound recordings, emails, letters and yes, even blog posts. Keeping track of all of that data is a significant task. In my last post I explained how I catalogue my sources. How to easily access the information contained within those sources forms the second part of my information management system.

The files from the first research I completed on my great uncle, which was for a competition called the National History Challenge when I was still in primary school, all fitted into a single A4 display folder. The second phase of work I did, in the year I took off between high school and university, filled a small portable filing box. I graduated to a two-drawer filing cabinet for paper notes and documents when I decided to start my current (and somewhat obsessive) work about five years ago, but by then digital storage was becoming predominant. While it’s still important to have access to the paper files in that cabinet (and I’ve filled another box with them too), most of my work (including this blog post as I write it) is now saved onto a portable hard drive on my computer and catalogued as I explained in my last post. And having stored and catalogued all of this information, the next step is the ability to easily search the whole lot to find the parts I’m looking for at any particular time. That is where a solid digital database comes in handy.

The one I eventually settled on is a self-contained piece of software called Personal Knowbase. It’s a simple, easy-to-use program that allows me to save my information in text-based articles and attach any number of relevant keywords to each article. Each article has a date stamp which can be set to any date – like the date of the letter or document under study, for example. I can then easily pull up any articles tagged with a particular keyword, or combination of keywords. Some examples of tags I’ve used might be general themes (‘training’, ‘flying’, ‘operations’, ‘England’ etc), individuals’ names, book titles, aircraft types, targets, airfields and so on and so on. I’ve also used my catalogue numbers as keywords which makes it easy to locate the source of any specific quote. I can easily run a more customised search using a combination of keywords, basic text searches and date ranges, across the entire database or a selected subset of it. Searches can then be exported in various formats for printing or review.
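For readers who think in code, the keyword-plus-date-stamp idea can be sketched in a few lines of Python. This is only an illustration of the general approach, not how Personal Knowbase works internally: the `Article` class, the `search` function and the sample entries are all hypothetical and invented for the example.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Article:
    """A text article with a settable date stamp and a set of keywords (hypothetical model)."""
    title: str
    text: str
    stamp: date
    keywords: set = field(default_factory=set)


def search(articles, all_keywords=(), text=None, start=None, end=None):
    """Return articles that carry every keyword, contain the text, and fall in the date range."""
    wanted = {k.lower() for k in all_keywords}
    results = []
    for article in articles:
        tags = {k.lower() for k in article.keywords}
        if not wanted <= tags:  # every requested keyword must be present
            continue
        if text is not None and text.lower() not in article.text.lower():
            continue
        if start is not None and article.stamp < start:
            continue
        if end is not None and article.stamp > end:
            continue
        results.append(article)
    return results


# Made-up sample entries, purely for illustration.
articles = [
    Article("Sample letter", "Posted to a training unit in England.",
            date(1943, 11, 20), {"training", "England"}),
    Article("Sample logbook note", "Night flying exercise, three hours.",
            date(1944, 2, 3), {"flying", "training"}),
    Article("Sample raid summary", "Operations over France.",
            date(1944, 5, 10), {"operations", "flying"}),
]

may_1944 = search(articles, all_keywords=["flying"],
                  start=date(1944, 5, 1), end=date(1944, 5, 31))
print([a.title for a in may_1944])
```

Combining a keyword with a date range, as in the last call, is the kind of customised search described above: only the articles tagged ‘flying’ and stamped in May 1944 come back.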

The point is that it is very easy to access information when it is needed, and to tell where that particular piece of information came from. And if I’ve used my keywords effectively, I can also pull up related articles – very helpful when looking for everything I have on a particular raid that happened on 10 May 1944, for example…

It’s not a perfect system. Simple things like typographical errors can make word searches difficult, and care needs to be taken to use appropriate keywords to avoid burying an important fact under a pile of other stuff. And like anything computer-related, the file is susceptible to a hardware failure or a file corruption, for example – the latter of which happened to me late last year. Happily, a solid backup regime provides a certain degree of redundancy and I was able to recover my file without losing too much work (the cause was eventually traced to a dodgy portable hard drive). My database is now automatically backed up in two separate locations, one of which is ‘off-site’, and I make an occasional manual copy too ‘just in case’.

Overall, it’s a useful bit of gear. I have the database window open on one side of my screen whenever I’m working on my research. Together with the catalogue spreadsheet, the database makes it easy to store, search and find virtually anything in my collection of sources.

© 2013 Adam Purcell


Information Management Part I: The Catalogue

I have something in the order of 13,000 individual pages in the sources listed in my catalogue for this project. Dealing with the sheer volume of stuff that I’ve gathered remains one of the big challenges of the work. Making interesting discoveries won’t do much good if I then forget where the discoveries came from, how I made them and where they fit into the story when it eventually comes time to write my planned book. There are two keys to my information management system: the catalogue, for knowing which source a piece of information comes from, and the database, for knowing what information is in those sources. This post will look at how I catalogue my sources.

If you’ve been reading this blog closely over the last few years, you may have noticed the occasional strange group of letters and numbers popping up in the posts themselves. The codes are in fact references to my catalogue of sources, and are the way I keep track of where my information comes from. The codes look like this, for example, from my ‘Accidents’ post of September 2010:

The second engine faltered shortly after crossing into England so they sought out an emergency aerodrome and, in Phil’s memorable understatement… (B03-001-016)

“…we crash landed rather unsuccessfully…”

Or these two, in ‘Motivations’ (November 2012):

Bill Brill was ‘getting a little accustomed to being scared’ (C07-036-159). And there is no doubt that airmen knew very well exactly how low their chances of surviving a tour were. Gil Pate wrote to his mother in November 1943 (A01-409-001): “It seems an age since I last saw you all + I guess I’ll need a lot of luck to do so again, the way things happen.”

The catalogue was one of the first things I set up when I got seriously stuck into this work in about mid-2008, and though it’s not an incredibly sophisticated system, it is quite effective in keeping track of all the sources I’ve gathered over the last few years. It lives on a (well backed-up) multiple-tabbed Excel spreadsheet that I continually add to whenever I obtain a new source. Broadly, the code is split into four groups:

Designator-Series-Item Number-Page Number

The Designator tells me what type of source I’m looking at. It is the first letter in the group, and translates as follows:

A: Original (ie Primary) Documents, Scans or Copies – a document that originated during the war or the immediate period thereafter

B: Transcripts of original documents (used when I have not seen the original – ie someone else has transcribed it)

C: Post-war (ie Secondary) material

The types are broadly defined and can sometimes be a little ambiguous – at this stage it is not critical to define each type precisely. A general idea is sufficient.

The Series indicator defines the broad category under which the source fits. It decodes like this:

01: Letters and Telegrams (including letters I’ve personally sent and received)

02: Flying Logbooks

03: Diaries and Operational Record Books

04: Official Documents and Service Records – typically these documents come from archival collections such as the National Archives of Australia

05: Photographs

06: Articles, Newspaper Clippings and Media Reports – including magazines

07: Books, Memoirs and Video

08: Databases

09: [Currently spare]

The Item Number is simply an increasing three-digit number for each individual document in each series (disregarding the designator), allocated in order of cataloguing. I haven’t yet reached greater than 999 items in any category, but if I do I’ll simply transition to a four-digit number for subsequent items.
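As a small sketch of that numbering rule, the next item number for a series could be worked out like this. The function name and the sample codes are hypothetical, and the sketch assumes that "disregarding the designator" means items A, B and C in the same series share one running count.

```python
def next_item_number(existing_codes, series):
    """Return the next item number for a series, ignoring the designator letter."""
    highest = 0
    for code in existing_codes:
        prefix, item = code.split("-")[:2]   # e.g. "B03", "001"
        if prefix[1:] == series:             # drop the designator, compare the series
            highest = max(highest, int(item))
    # Pad to at least three digits; a fourth digit appears naturally after 999.
    return f"{highest + 1:03d}"


# Made-up list of already catalogued items (designator + series, then item number).
codes = ["B03-001", "A03-002", "C07-036", "A01-409"]
print(next_item_number(codes, "03"))
print(next_item_number(codes + ["A05-999"], "05"))
```

The format string pads to three digits but never truncates, so the move to four-digit item numbers needs no special handling.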

The three-digit Page Number is the page of the document on which the actual quote or information can be found. Obviously for single-page documents or photos this would remain 001. If it’s a really long book with more than 999 pages, there is nothing stopping me using a four-digit page number.
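The four groups described above can also be pulled apart mechanically. The following is a minimal Python sketch, assuming codes always follow the Designator-Series-Item Number-Page Number pattern with three or four digits in the last two groups. The names `Reference`, `parse_reference` and `describe` are hypothetical, and the spare series 09 is simply treated as unknown.

```python
import re
from typing import NamedTuple

DESIGNATORS = {
    "A": "Original document, scan or copy",
    "B": "Transcript of an original document",
    "C": "Post-war material",
}

SERIES = {
    "01": "Letters and Telegrams",
    "02": "Flying Logbooks",
    "03": "Diaries and Operational Record Books",
    "04": "Official Documents and Service Records",
    "05": "Photographs",
    "06": "Articles, Newspaper Clippings and Media Reports",
    "07": "Books, Memoirs and Video",
    "08": "Databases",
}

# Designator letter, two-digit series, then item and page (three or four digits each).
CODE_RE = re.compile(r"^([ABC])(\d{2})-(\d{3,4})-(\d{3,4})$")


class Reference(NamedTuple):
    designator: str
    series: str
    item: int
    page: int


def parse_reference(code):
    """Split a catalogue code such as 'A01-409-001' into its four groups."""
    match = CODE_RE.match(code.strip())
    if match is None:
        raise ValueError(f"not a catalogue code: {code!r}")
    designator, series, item, page = match.groups()
    if series not in SERIES:
        raise ValueError(f"unknown series: {series}")
    return Reference(designator, series, int(item), int(page))


def describe(ref):
    """Spell out a parsed reference in words."""
    return (f"{DESIGNATORS[ref.designator]}; {SERIES[ref.series]}; "
            f"item {ref.item}, page {ref.page}")


print(describe(parse_reference("A01-409-001")))
```

Run on the Gil Pate reference from ‘Motivations’, this reports an original document in the Letters and Telegrams series, item 409, page 1.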

So putting it all together, using as the first example the reference from my ‘Accidents’ post quoted above:

B03-001-016

This refers to a transcript (B) of a diary (03), which was the first diary I catalogued (001). The quote can be found on page 16 (016).

I then take those details over to my Excel spreadsheet, where I find… [screenshot: the matching row of the catalogue]

Which tells me that it’s a quote from Phil Smith’s diary. I originally got the document from Mollie Smith, and my copy of it resides in my filing cabinet in the folder ‘Smith, Phil’.

The quote from Bill Brill, in my second example above, has the catalogue number C07-036-159. So it comes from page 159 of the 36th book I catalogued, and the book was written post-war. Referring to my handy spreadsheet, I can see its source is Hank Nelson’s book Chased by the Sun, and that there’s a copy of it on my shelf should I feel the need to check the quote.
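The spreadsheet lookup in these two examples amounts to a table keyed on the first three groups of the code, with the page number carried along separately. Here is a rough Python sketch of that idea. The field names (`source`, `obtained_from`, `location`) and the `lookup` function are assumptions made for illustration, since the actual column layout of the spreadsheet is not shown; the two rows contain only the details given in the text.

```python
# Hypothetical stand-in for two rows of the catalogue spreadsheet.
CATALOGUE = {
    "B03-001": {
        "source": "Phil Smith's diary (transcript)",
        "obtained_from": "Mollie Smith",
        "location": "Filing cabinet, folder 'Smith, Phil'",
    },
    "C07-036": {
        "source": "Hank Nelson, Chased by the Sun",
        "location": "Bookshelf",
    },
}


def lookup(reference):
    """Find the catalogue entry for a full reference and attach the page number."""
    key, _, page = reference.rpartition("-")   # "C07-036-159" -> "C07-036", "159"
    entry = CATALOGUE.get(key)
    if entry is None:
        raise KeyError(f"no catalogue entry for {key!r}")
    return {**entry, "page": int(page)}


result = lookup("C07-036-159")
print(result["source"], "- page", result["page"])
```

Keeping the page number out of the key means one row per document is enough, however many different pages end up being quoted from it.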

I’ve used this cataloguing system throughout this blog mainly for my own benefit, so that when I start writing the book I keep telling myself I want to write, I can easily find where all of the information in the blog came from. Obviously the final book will be properly referenced rather than being interrupted by my own strange code groups of letters and numbers – but behind the scenes, when I’m doing the research and the actual writing of the story, it’s a quick and easy shorthand for keeping track of exactly where my information comes from and ensuring that I can check my sources for accuracy where required.

I have also incorporated the referencing numbers into the second part of my information management system – the database. Conveniently, that will be the subject of my next post.

© 2013 Adam Purcell