
Batch convert RTF files to TXT

Last year I decided to use plain text files (TXT) as the main file type for all my computer text input. There are several reasons for this, but perhaps the most important one was the many problems I experienced when trying to open other text-based file types (RTF, DOC, etc.) on the various iOS and Android devices I use daily. Another reason is to become independent of specific software solutions, so that I am not forced to use a particular program for something as basic as writing text on a computer or device. Along the way I decided to shift my note-taking from MacJournal to nvALT. The best thing about nvALT is that it unobtrusively monitors a folder of text files and makes it quick to search through old files and write new ones. Since all the files are just plain text files stored in a regular folder (and synced to the cloud), I can of course also use any text editor to view and write them.

The problem was how to get all my previous notes into my new “system”. I have used a number of different note-taking programs over the years (e.g. Journler, DEVONthink and Evernote). Fortunately, I have been quite careful about exporting all my notes regularly, mainly as RTF files. With a few thousand such files (and some others), I looked for a way to convert them quickly to plain text. There are more complex solutions for converting text files between various formats (e.g. Pandoc), but I found the easiest solution was the OS X command-line utility textutil. This little line will convert all RTF files in a folder (and its subfolders) to TXT files:

find . -name \*.rtf -print0 | xargs -0 textutil -convert txt

It will (of course) remove any formatting, but it will preserve all the (text) content nicely.
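A possible follow-up, before archiving the original RTF files, is to list any file that did not get a TXT counterpart. The sketch below is an illustration, not part of textutil: it assumes textutil's default behaviour of writing each result next to its source with the extension changed to .txt, and it assumes file names contain no newlines. The function name `missing_txt` is hypothetical.

```shell
#!/bin/sh
# missing_txt DIR (hypothetical helper): print every .rtf file under DIR
# that has no .txt file with the same base name in the same folder.
# Assumes textutil wrote its output next to the source file, and that
# file names contain no newline characters.
missing_txt() {
  find "$1" -type f -name '*.rtf' | while IFS= read -r rtf; do
    txt="${rtf%.rtf}.txt"               # same path, extension swapped
    [ -e "$txt" ] || printf '%s\n' "$rtf"
  done
}
```

Running `missing_txt .` in the folder where the conversion was done should print nothing if every file was converted. Like the original `find` line, it only matches lowercase `.rtf` extensions.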

To footnote or not

By coincidence, I have recently had several discussions about footnotes, endnotes and different citation styles. Such discussions often turn into “religious” wars, in which researchers from different disciplines argue why “their” system is the best. I often find myself agreeing with everyone or no one in such discussions, since I work in and between several different disciplines (the arts, humanities, technology, psychology, medicine), and publish my own work in journals that handle citations and notes in different ways.

What to cite or note?

Before discussing the different systems in more detail, it is worth remembering that there are usually two types of information that an author would like to include in the text:

  1. references to books, papers, etc. that you mention in the text.
  2. extra information that you do not feel needs to be in the main body of the text.

I will try to keep these two cases clearly separate in the following discussion.

Different systems

The Chicago Manual of Style suggests that there are two basic documentation systems: (1) notes and bibliography and (2) author-date. In my experience there is also a third main type, which I will call numbered citations. Each has a different use:

  1. Author-year: the author’s name and the year of publication are placed within parentheses or brackets in the text, and a reference list, usually ordered alphabetically, is placed at the end of the text. This system is only meant for citations, and can easily be combined with (foot/end)notes for extra information. The “author-year” style is widespread in a number of disciplines, and is also widely used in many musicological disciplines, though not in music history (I am here thinking of musicology in the European tradition, i.e., a heterogeneous group of disciplines focusing on the study of music).
  2. Notes and bibliography: in this system both citations and extra information are put into either footnotes or endnotes. The system is used differently depending on the journal or publisher. Sometimes an author-year type of citation is put in the note and a full reference list is included at the end of the text. Other times the full reference is included in the note, with no need for a reference list. I have come across many different ways of implementing these two approaches (and combinations of them) within the “notes and bibliography” system. The “notes and bibliography” style is widespread in parts of the humanities, particularly in the historical disciplines (including music history). The main difference between the “notes and bibliography” system and the two others (“author-year” and “numbered citations”) is that it allows for mixing citations and other types of information in the notes.
  3. Numbered citations: This may at first glance seem quite similar to using endnotes with “notes and bibliography”, but it is in fact quite different. The numbered citation system does not allow for mixing in other information; it is a purely citation-based system in which the numbers used for citations in the text refer to numbered entries in the reference list, ordered either by appearance in the text or alphabetically. I often encounter this style in more technology-oriented publications, as well as in some medical and psychological journals. Sometimes you even find a combination of the “author-year” and the “numbered citations” systems, with abbreviated citation keys, e.g. (Jen07) instead of (Jensenius, 2007).
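The abbreviated keys in the last point resemble the labels produced by BibTeX’s `alpha` bibliography style. A rough sketch of the single-author case only (first three letters of the surname plus the last two digits of the year), assuming ASCII names and four-digit years. The function `alpha_key` is a hypothetical helper; the real `alpha.bst` handles multiple authors, name prefixes and duplicate labels differently.

```shell
#!/bin/sh
# alpha_key SURNAME YEAR (hypothetical helper): build a short citation key
# such as Jen07. Single-author case only; assumes ASCII and a 4-digit year.
alpha_key() {
  # %.3s keeps at most the first three characters of the surname;
  # ${2#??} strips the first two characters of the year (2007 -> 07).
  printf '%.3s%s\n' "$1" "${2#??}"
}
```

For example, `alpha_key Jensenius 2007` prints `Jen07`.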

I guess there might be some researchers who work with only one of these systems throughout their entire career, but I usually have to adapt to any of these systems depending on where I want to publish. Come to think of it: I have just proofread the camera-ready versions of three journal papers that will be published in the coming months (more on that later), each of which uses one of the three systems mentioned above.

Since I use all three systems regularly, I have worked out writing and formatting techniques that work well with all of them (thanks to LaTeX and BibTeX), and have no problems adapting to whatever the publisher wants. That said, over the years I have formed a clear opinion of what I prefer myself: the “author-year” method. This opinion is based solely on what I think is the most efficient method for reading and writing texts. In the following I will try to explain the rationale behind this preference.

Why I like author-year for citations

My main argument for using the author-year style is based on efficiency of writing and reading. More precisely, I will argue that the author-year system is:

  • Compact: The author-year system makes it possible to create compact texts, since the citations only take up a small space on a line (at least if the names are not too long). As such, it is more compact than putting citations (or even full references) on separate lines in footnotes or endnotes. The “author-year” system is less compact than the “numbered citations” system, in which the citation is only a number, but this is also what makes the “author-year” system more readable.
  • Readable: The author-year system makes it possible to read the text continuously, since the citations are placed inline with the text. I do agree that it may be more distracting to look up citations in a reference list at the back of the paper than to look in a footnote. However, if you know the field fairly well, or read through the reference list before reading the paper, it is possible to understand who is being referenced by reading only the main body of the text. The disadvantage of having to look up references in footnotes is that it distracts from the reading: you have to constantly shift focus up and down the page to find the note and then find your way back to where you left off. I have not found any studies of reading speed with the different systems, but my own feeling is that I read dramatically slower when I have to constantly move up and down the page between the content and the footnotes.
  • Easier: I do a fair bit of manuscript reviewing, and lots of supervision of student papers and theses, and am convinced that the “author-year” system is easier to handle for most writers. From my own and many of my students’ experience, working with (foot/end)notes is a pain in most WYSIWYG programs (e.g. MS Word). I have seen countless examples of all the footnotes in long master’s theses being scrambled, renumbered, reformatted, etc. I have had no such technical challenges in LaTeX, but even there the writing and layout process is much easier when everything is simply included in the main body of the text.

Why I try to avoid (foot/end)notes

Some arguments for footnotes are that they allow for:

  • Quick access: the information is there at the bottom of the page.
  • Elaboration: allows you to develop the arguments further without distracting from the main narrative.

I have long been fascinated by the concept of hypertext, and by the possibilities that non-linear writing opens up. This in itself should be a good argument for me liking footnotes. The problem, however, is that footnotes, at least in the traditional sense, are very far from the ideas of hypertext. First, footnotes often seem to be used to dump content that the author did not feel was necessary/relevant/interesting enough to include in the main text. Second, the footnote is a dead end, from which the only way out is to go back to where you came from. As such, footnotes do not support the concept of hypertext as an interwoven web of texts (yes, it sounds a bit silly to write this in 2012, but despite the progress of the web, hypertext as a concept and method is still in its infancy).

There are certainly cases when you are uncertain whether a part of your text should be included or not, particularly when beginning to write a manuscript. That is also one of the reasons why I often use footnotes myself as a writing method, moving content back and forth between the main text and the footnotes. Writing is always based on decision-making: what should I include and what should I leave out? The problem with footnotes is that they can be used as an excuse for not getting rid of content that is not really necessary, as this quote summarises well:

But think whether such information needs to be present at all. If the term being footnoted in the first of these examples is so obscure, why not merely explain it? […] You should make every effort to make your work a pleasure to read. Reading it should not be an epic struggle on the part of your hapless reader.

That is why I usually end up either throwing away the footnotes or moving them into the main text as I finalise my manuscripts.

(Foot/end)notes in electronic documents

The last reason I am sceptical about footnotes is the move towards electronic documents. While footnotes may make sense in a printed document, they usually end up as endnotes in electronic documents (where there is no fixed concept of “pages”). For that reason, it may be easier to just work with endnotes in the first place, since the document can then more easily be used in both printed and electronic form.

Working with static, dead-end endnotes in electronic documents does not feel very forward-looking, though. I would rather hope that we could work towards proper hypertexts, in which multiple layers/levels of text could be intertwined. Until that is possible, and accepted in scientific writing, I prefer to write and read linear texts without (foot/end)notes. I believe that is easier both for the author and for the reader.

Application writing as example of stretchtext

I have been working on an ERC Starting Grant application over the last few months. Besides the usual conceptual/practical challenges of writing funding applications, this particular application also posed the challenge of writing not one proposal document, but two: one long (15 pages) and one short (5 pages). I am used to writing research papers and applications that deal with three levels:

  • Title
  • Abstract
  • Content

But for the ERC application I had to handle four levels:

  • Title
  • Summary (2000 characters)
  • Synopsis (5 pages)
  • Proposal (15 pages)

While working on the application, I started thinking about my old fascination with hypertext theory. One concept I found (and still find) interesting here is Ted Nelson’s idea of stretchtext. Stretchtext can be seen as text that can literally be “stretched” to any desired length (see, for example, this example). Conceptually this makes sense. After all, we as humans are able to do such stretching fairly easily, always trying to fit our content to whatever limitations we may have. For example, I have no problem talking about my current research project for 1 minute, 5 minutes, 20 minutes or 45 minutes; it is just a matter of “interpolating” the content over the required timeframe. The challenge, of course, is to balance the content so that it makes sense for different durations or numbers of pages.

But how do you go about writing 5 and 15 pages about the same thing? Should you write 15 pages first, and then cut them down to 5? Or is it better to start with 5 pages and then “interpolate” them to 15? My approach this time was not particularly structured, and I constantly found myself moving back and forth between the two documents. This was perhaps not the best solution, since I often ended up making the same changes twice.

The strategy I ended up with, and that I would probably start out with if I were to do such a thing again, was to use the commenting function of LaTeX more actively. In regular word processing software (MS Word, OpenOffice, etc.) there is no simple way to temporarily include or exclude content: the text in the document is there, and if you remove it, it is gone. In LaTeX it is possible to comment out text by just typing the % sign in front of each line. This makes it easy to “turn off” whole blocks of text. As such, my final 5-page synopsis contained more or less the same material as the full 15-page document, but with large parts of the text commented out.

It would have been nice if LaTeX had offered a way to define levels of text. Then I could have written only one document, and defined which parts should be at level 1, 2, 3, etc. This could then have been used to output the different levels more or less automatically. Such an approach could perhaps be implemented with a text outliner (e.g., OmniOutliner), and I am curious to test this out at some point.
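One way to approximate such levels with plain LaTeX comments is to tag each block with a level and filter the source before compiling. The sketch below assumes a hypothetical convention, marker lines of the form `%level N`, and a hypothetical helper `make_level`; since the markers are ordinary LaTeX comments, the unfiltered file still compiles as the full version. It is only a text filter and does not check that the remaining LaTeX is well-formed, so blocks should start and end at paragraph boundaries rather than inside environments.

```shell
#!/bin/sh
# make_level MAX FILE (hypothetical helper): print FILE, dropping every block
# whose level is greater than MAX. A block starts at a marker line
# "%level N" and runs until the next marker; text before the first
# marker counts as level 1. Marker lines themselves are not printed.
make_level() {
  awk -v max="$1" '
    BEGIN { level = 1 }
    /^%level[ \t]+[0-9]+/ { level = $2 + 0; next }   # switch level, skip marker
    level <= max + 0 { print }                        # keep text at or below MAX
  ' "$2"
}
```

In a case like the one above, something like `make_level 1 proposal.tex > synopsis.tex` would produce the short version, while `proposal.tex` itself remains the long one (the file names are only examples).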

However, the biggest challenge of writing a stretchtext is probably not the software being used. It is rather to figure out what content to include, and to make it work linguistically at the different levels. In the end, you might end up writing two separate documents after all…

Writing complex documents

I have been using LaTeX for most of my more advanced writing needs for so many years that I tend to forget how few other good options there are for writing what could be called “complex” documents, i.e. book-sized documents with a good portion of notes, pictures, links, etc.

I recently had to help create a large document based on 30+ individual documents in MS Word. Word offers the possibility of creating a “master document” for embedding multiple individual documents. In theory, this makes it possible to create one large table of contents, internal links, etc. In practice, however, it turns out to be a nightmare of major proportions: styles change, links disappear or stop working, the table of contents finds most things but with the wrong styles, page numbers don’t get updated properly…

I’m glad I don’t rely on MS Word for such things, and I feel sorry for everyone who has to go through so much pain to create a large and complex document. Unfortunately, the rather steep learning curve of LaTeX makes it difficult to suggest it to people who are not inclined to write code themselves. But what other options are there? OpenOffice might work a little better, but it is based on the same idea of mixing content and layout as Word. Layout programs are usually not particularly good for writing text, let alone handling footnotes, bibliographies, etc. Scrivener is good for structuring large portions of text, but lacks most other things required in scientific writing (and it is OS X only). Adobe FrameMaker could have been a solution, had it not been Windows only and fairly costly.

Any suggestions for other software would be welcome, and I will pass them on to the next unfortunate Word user I meet.