Since I switched to English for this blog (except for a few one-liners auto-imported from Twitter when I post there), I’ve been using a plugin that computes some lexical statistics to try and improve my writing. I tend to write wordy and confusing paragraphs more often than I’d like because (1) trees of subordinate clauses are more common in my native language and people parse them better, in part because there are more clause connectives specifying the structure of complicated sentences; (2) lots of Latin- and Greek-derived words are colloquial in Latin-derived languages like Portuguese but stuffy and academic in English; and (3) sometimes my English just plain sucks, and my old Portuguese-speaking readers do complain.
So, anyway: I’ve been posting transcripts from the DNCC, and my edit box automatically computes those lexical stats I mentioned. For the sake of full methodological disclosure, these have a small margin of error, since they include my very brief opening remarks (two sentences at most) and I don’t have enough free time to dick around some more. That shouldn’t introduce statistical bias when analyzing texts that run upwards of 100 sentences, so pretend you didn’t read this note.
Exformation is, according to Wikipedia, “explicitly discarded information”. Quoth the aforementioned:
“Exformation is everything we do not actually say but have in our heads when, or before, we say anything at all - whereas information is the measurable, demonstrable utterance we actually come out with.”
“In 1862 the author Victor Hugo wrote to his publisher asking how his most recent book, Les Misérables, was getting on. Hugo just wrote “?” in his message, to which his publisher replied “!”, to indicate it was selling well. This exchange of messages would have no meaning to a third party because the shared context is unique to those taking part in it. The amount of information (a single character) was extremely small, and yet because of exformation a meaning is clearly conveyed.”
Clever hack: a 42 KB .zip file that decompresses to your heart’s content.
Quoting freely from various Wikipedia articles in italic: “the specification for ZIP indicates that files can be stored either uncompressed or using a variety of compression algorithms. However, in practice, ZIP is almost always used with Katz’s DEFLATE algorithm, except when files being added are already compressed or are resistant to compression”; where DEFLATE is a combination of the LZ77 algorithm and Huffman coding.
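To make the stored-versus-deflated distinction concrete, here is a minimal sketch using Python’s standard zipfile module. The function name, the entry names and the sample data are all made up for illustration; random bytes stand in for “already compressed” input, which is an assumption of the sketch rather than anything the ZIP specification says.

```python
import io
import os
import zipfile


def zip_entry_sizes(data: bytes) -> dict:
    """Store `data` twice in an in-memory ZIP and report each entry's size.

    The entry names "stored.bin" and "deflated.bin" are hypothetical.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Same payload, two of the storage methods the ZIP format allows.
        zf.writestr("stored.bin", data, compress_type=zipfile.ZIP_STORED)
        zf.writestr("deflated.bin", data, compress_type=zipfile.ZIP_DEFLATED)
    # Reopen the finished archive and read the sizes recorded in its directory.
    with zipfile.ZipFile(buf) as zf:
        return {info.filename: info.compress_size for info in zf.infolist()}


redundant = b"a" * 100_000        # compresses extremely well
random_ish = os.urandom(100_000)  # stands in for already-compressed data

print(zip_entry_sizes(redundant))
print(zip_entry_sizes(random_ish))
```

On the redundant input the deflated entry should come out tiny next to the stored one, while on the random input DEFLATE gains essentially nothing, which is the “resistant to compression” case from the quote.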
While DEFLATE, the LZ family and other compression systems are often referred to as algorithms, what they really are is format specifications: they say what a valid compressed stream looks like, not how an encoder has to produce it. It’s theoretically possible that the standard DEFLATE encoder isn’t the most efficient one, in terms of compressed size over uncompressed size, among all the encoders whose output is a valid DEFLATE stream. That is part of why such clever hacks as the 42 KB zip bomb that decompresses into petabytes of nonsense data are possible.
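A quick way to see the “format, not algorithm” point is to feed the same input to one encoder at different effort levels and check that a single decoder accepts every result. The sketch below uses Python’s zlib module, which wraps the DEFLATE stream in a small zlib header and checksum; the sample text and the function name are made up, and the zeros at the end only illustrate the sort of ratio a zip bomb leans on, not how that particular file was built.

```python
import zlib


def deflate_variants(data: bytes, levels=(0, 1, 9)) -> dict:
    """Compress `data` with zlib at several levels; map level -> compressed bytes."""
    return {level: zlib.compress(data, level) for level in levels}


# Made-up sample text: repetitive, but not trivially so.
sample = b"".join(b"line %d of some repetitive text\n" % i for i in range(2000))

variants = deflate_variants(sample)
for level, blob in variants.items():
    # Every variant is a valid stream: the same decoder recovers the same input.
    assert zlib.decompress(blob) == sample
    print("level", level, "->", len(blob), "bytes")

# Highly redundant input shows how extreme the size ratio can get.
zeros = bytes(1_000_000)
packed = zlib.compress(zeros, 9)
print("ratio for 1 MB of zeros:", len(zeros) // len(packed))
```

The byte strings differ from level to level, yet they all decode to the same input, so “the” DEFLATE output for a given file is really a family of valid streams.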
This extends in general to all kinds of compression systems, and even more so to lossy ones. It’s actually widely known that there exist competing MP3 encoders which will produce a varying range of “qualities” while still generating valid MP3 files. (These qualities might be multi-dimensional: not “better”, but better for rock or classical music, or perhaps a spectrum between “best for a wide amplitude range” and “best for a wide frequency range”.)