Batch convert RTF files to TXT

Last year I decided to use plain text files (TXT) as the main file type for all my computer text input. There are several reasons for this, but perhaps the most important one was all the problems I experienced when trying to open other types of text-based files (RTF, DOC, etc.) on the various iOS and Android devices that I use daily. Another reason is to become independent of specific software, so that I am not forced to use one particular program for something as basic as writing text on a computer or device. Along the way I decided to shift my note-taking from MacJournal to nvALT. The best thing about nvALT is that it can unobtrusively monitor a folder of text files, and it allows for quickly searching old files and writing new ones. Since all the files are just plain text files stored in a regular folder (and synced to the cloud), I can of course also use any text editor to view and edit the files.

The problem was how to get all my previous notes into my new “system”. I have used several different note-taking applications over the years (e.g. Journler, DevonThink and Evernote). Fortunately, I have been quite careful about exporting all the notes regularly, mainly as RTF files. Having a few thousand such files (and some others), I looked for a way to quickly convert them to plain text files. There are more complex solutions for converting text files to various formats (e.g. Pandoc), but I found the easiest solution was to use the OS X command-line utility textutil. This little line will convert all RTF files in a folder (including its subfolders) to TXT files:

find . -name \*.rtf -print0 | xargs -0 textutil -convert txt

It will (of course) remove any formatting, but it will preserve all the (text) content nicely.
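Before running the one-liner on a few thousand files, it can be useful to see what it would touch. The following is a minimal sketch in Python, not part of the original workflow. It assumes that textutil writes name.txt next to name.rtf, and the function names plan_conversion and textutil_command are made up for illustration. Unlike the find pattern above, it also matches upper-case .RTF extensions.

```python
from pathlib import Path


def plan_conversion(root):
    """Split the RTF files under root into (to_convert, would_clash).

    A file lands in would_clash when a .txt file with the same base name
    already sits next to it (assumption: that is where textutil writes).
    """
    to_convert, would_clash = [], []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and anything that is not an RTF file.
        if not path.is_file() or path.suffix.lower() != ".rtf":
            continue
        if path.with_suffix(".txt").exists():
            would_clash.append(path)
        else:
            to_convert.append(path)
    return to_convert, would_clash


def textutil_command(paths):
    """Build the argument list for one textutil call over all paths."""
    return ["textutil", "-convert", "txt", *map(str, paths)]
```

On a Mac, the list returned by textutil_command(to_convert) could be handed to subprocess.run to do the actual conversion, while the files in would_clash can be inspected by hand first.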

Application writing as example of stretchtext

I have been working on an ERC Starting Grant application over the last few months. Besides the usual conceptual/practical challenges of writing funding applications, this particular application also posed the challenge of writing not one proposal document, but two: one long (15 pages) and one short (5 pages). I am used to writing research papers and applications where you are dealing with three levels:

  • Title
  • Abstract
  • Content

But for the ERC application I had to handle four levels:

  • Title
  • Summary (2000 characters)
  • Synopsis (5 pages)
  • Proposal (15 pages)

While working on the application, I started thinking about my old fascination with hypertext theory. One concept I found (and still find) interesting here is Ted Nelson’s idea of stretchtext. Stretchtext can be seen as text that can literally be “stretched” to any desired length (see this example). Conceptually this makes sense. After all, we humans are able to do such stretching fairly easily, always trying to fit our content to the limitations we may have. For example, I have no problems talking about my current research project for 1 minute, 5 minutes, 20 minutes or 45 minutes; it is just a matter of “interpolating” the content over the required timeframe. The challenge, of course, is to balance the content in such a way that it makes sense for different durations or numbers of pages.

But how do you go about writing 5 and 15 pages about the same thing? Should you write 15 pages first, and then cut them down to 5? Or is it better to start with 5 pages and then “interpolate” them to 15? My approach this time was not particularly structured, and I constantly found myself moving back and forth between the two documents. This was perhaps not the ideal solution, since I often found myself making the same changes twice.

The strategy I ended up with, and that I would probably start out with if I were to do such a thing again, was to use the commenting function of LaTeX more actively. In regular word processing software (MS Word, OpenOffice, etc.) there is no easy way to temporarily exclude content from the document. The text in the document is there, and if you remove it, it is gone. In LaTeX it is possible to comment out blocks of text by just typing the % sign at the start of each line. This makes it easy to “turn off” whole blocks of text. As such, my final 5-page synopsis document contained more or less the same material as the full 15-page document, but with large parts of the text commented out.

It would have been nice if LaTeX had a built-in way to define levels of text. Then I could have chosen to write only one document, and defined which parts should be at level 1, 2, 3, etc. This could then have been used to output the different levels more or less automatically. Such an approach could perhaps be done with a text outliner (e.g., OmniOutliner), and I am curious to test this out at some point.
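One way to get such levels without leaving LaTeX's comment syntax is to tag the source and filter it before compiling. The sketch below is in Python and rests on an invented convention: a line of the form %%level N (a hypothetical marker, not a LaTeX feature) sets the level of the lines that follow, and the made-up function select_level keeps only the lines at or below the requested level. Because the markers start with %, LaTeX itself would treat them as ordinary comments.

```python
import re

# Hypothetical marker line, e.g. "%%level 2"; LaTeX sees it as a comment.
MARKER = re.compile(r"^%%level\s+(\d+)\s*$")


def select_level(source, max_level):
    """Return the lines of source whose level is <= max_level.

    Text before the first marker counts as level 1. Marker lines
    themselves are dropped from the output.
    """
    kept = []
    level = 1
    for line in source.splitlines():
        match = MARKER.match(line)
        if match:
            level = int(match.group(1))
            continue
        if level <= max_level:
            kept.append(line)
    return "\n".join(kept)
```

With this convention, select_level(source, 1) could feed the short version and select_level(source, 3) the full one, both from a single file. LaTeX's own conditionals (\newif) or the comment package might achieve something similar inside the document itself.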

However, the biggest challenge of writing a stretchtext is probably not the software being used. It is rather to figure out what content to include, and to make it work linguistically at the different levels. In the end, you might end up writing two separate documents after all…

Rules for computing happiness

This, and several other recent and forthcoming blog posts, have been lying in the drafts folder of my blog writing software (MarsEdit) for a while (some for more than 4 years…). I am currently going through the drafts one by one, deleting most of them, but also posting a few. Here is one I started writing back in 2009:

Alex Payne has published a list of rules for computing happiness. I don’t agree with all of them, but many of them resonate with my own thoughts. Here is a condensed list, based on the things I find most important:

  • Use as little software as possible.
  • Use software that does one thing well, do not use software that does many things poorly.
  • Do not use software that must sync over the internet to function.
  • Use a plain text editor that you know well. Not a word processor, a plain text editor.
  • Do not use software that’s unmaintained.
  • Pay for software that’s worth paying for, but only after evaluating it for no less than two weeks.
  • Keep as much as possible in plain text. Not Word or Pages documents, plain text.
  • For tasks that plain text doesn’t fit, store documents in an open standard file format if possible.

The last ones in particular, about using plain text files rather than a bunch of proprietary formats, are something I have become more concerned about recently.

Many lines in a text file

I am trying to debug a Max patch that does video analysis. For some reason many of the exported text files with the analysis results contain exactly 4314 lines. This is an odd number for a computer program to dislike, so I am currently going through the patch to figure out what is wrong.

The first thing I thought about was the text object, which is used for storing the data and writing it to a text file. So to check the possible limitations of the object, I have made a small patch that writes lines of 60 random values. It turns out, however, that the text object easily handles 1 000 000 lines of random values, and manages to write the file to disk (366.9 MB).
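For readers without Max, a similar stress test can be approximated outside the patch. The sketch below is in Python, not Max, and makes its own assumptions: the function names, the space-separated layout and the six decimals per value are illustrative choices, not what the text object actually writes, so the resulting file size will differ.

```python
import random


def write_random_lines(path, n_lines, n_values=60, seed=None):
    """Write n_lines lines to path, each holding n_values random floats."""
    rng = random.Random(seed)  # a fixed seed makes the file reproducible
    with open(path, "w") as f:
        for _ in range(n_lines):
            values = (f"{rng.random():.6f}" for _ in range(n_values))
            f.write(" ".join(values) + "\n")


def count_lines(path):
    """Count the lines in a text file without loading it all into memory."""
    with open(path) as f:
        return sum(1 for _ in f)
```

Running write_random_lines("random.txt", 1_000_000) and then count_lines("random.txt") gives a quick check that nothing was cut off. The same count_lines can be pointed at an exported analysis file to see whether it stops at 4314 lines.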

So I continue my quest for the problem. I attach a screenshot of the test patch below for people who may be interested in this. The patch may also be useful if you want to generate huge files with random numbers.

[Image: Screen shot 2010-10-11 at 09.24.02.png]