Image quality

A Jupyter notebook

If I were asked to name the one property that best characterizes the modern world, it would be its increasing focus on graphical representations. We are experiencing a transition from a textual to a pictorial society. People communicate with each other not by sending texts, but with photographs on social networks or emojis in chat applications. They learn how to do things not by reading a manual, but by watching video tutorials. Even highly technical articles on IT-oriented websites are invariably illustrated, however unrelated or absurd the respective illustration may be.

I have no serious problem with that as long as the images are a pleasure to look at and easy on my bandwidth. To strike this balance, however, a certain understanding of graphics formats is helpful. Only yesterday I received a two-page abstract for a conference which weighed a hefty 73 MB. Saving the images in an appropriate format reduced the size to 1.2 MB. And that was not even the most extreme example of excessively large files I've encountered...
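If the oversized file is a PDF, Ghostscript can often fix it in a single pass. A minimal sketch, assuming Ghostscript is installed (the file names are made up, and the '/ebook' preset downsamples embedded images, trading some quality for size):

# big.pdf and small.pdf are placeholder names
gs -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -dNOPAUSE -dBATCH -sOutputFile=small.pdf big.pdf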

More frequently, however, the problem is not an excessive file size but a mediocre image quality. Avoiding this problem requires either a certain perceptiveness, or an algorithm suitable for an objective ranking of image quality. The structural similarity index (SSIM) is such an algorithm, and it is widely used as a video quality metric in the television industry.
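For reference, the SSIM of two image patches x and y is computed from their mean intensities, variances, and covariance; the standard definition is

SSIM(x, y) = (2 μx μy + c1)(2 σxy + c2) / ((μx² + μy² + c1)(σx² + σy² + c2))

where c1 and c2 are small constants that stabilize the division. The index ranges from -1 to 1 and reaches 1 exactly for identical patches; the overall score is obtained by averaging over a sliding window.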

The present Jupyter notebook calculates the SSIM for two bitmaps derived from a vector graphic. The bitmaps are obtained by

pdftocairo -png -r 1200 org.pdf org.png
pngquant org.png web.png

The resulting bitmaps measure 550×934 px, and the files 'org.png' and 'web.png' are 30519 and 14797 bytes in size, respectively, both significantly larger than the original vector graphic 'org.pdf' (4214 bytes).

Can we reduce this size further by converting the image to the popular JPEG format? As was to be expected for line art with hard contrasts, we can, but we pay the price of a much reduced image quality. Indeed, to obtain a JPEG file of similar size (15556 bytes, to be precise) we have to use a very low quality setting:

convert org.png -quality 20 lossy.jpg

As you see in the comparison below, the SSIM for 'web.png' is almost identical to that of the original, while that of 'lossy.jpg' is drastically lower. Many of you will probably wonder why, since you don't see a difference in the images below – right?

I do, but I have a trained eye. To convince you that the SSIM is meaningful, here are the two images for direct inspection: web.png and lossy.jpg.

%matplotlib inline

import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

from skimage import img_as_float
import skimage.io as io
from skimage.measure import compare_ssim as ssim

mpl.rcParams['font.size'] = 20
mpl.rcParams['figure.figsize'] = (16, 8)
org = io.imread('org.png', as_grey=False)
org_png = io.imread('web.png', as_grey=False)
org_jpg = io.imread('lossy.jpg', as_grey=False)

img_org = img_as_float(org)
img_png = img_as_float(org_png)
img_jpg = img_as_float(org_jpg)

fig, (ax0, ax1, ax2) = plt.subplots(nrows=1, ncols=3)
plt.tight_layout()

ssim_none = ssim(img_org, img_org, multichannel=True,
                 dynamic_range=img_org.max() - img_org.min())
ssim_png = ssim(img_org, img_png, multichannel=True,
                dynamic_range=img_png.max() - img_png.min())
ssim_jpg = ssim(img_org, img_jpg, multichannel=True,
                dynamic_range=img_jpg.max() - img_jpg.min())

label = 'SSIM: %.4f'

ax0.imshow(img_org, cmap=plt.cm.gray, vmin=0, vmax=1)
ax0.set_xlabel(label % (ssim_none))
ax0.set_title('Original image')

ax1.imshow(img_png, cmap=plt.cm.gray, vmin=0, vmax=1)
ax1.set_xlabel(label % (ssim_png))
ax1.set_title('PNG')

ax2.imshow(img_jpg, cmap=plt.cm.gray, vmin=0, vmax=1)
ax2.set_xlabel(label % (ssim_jpg))
ax2.set_title('JPEG')

plt.show()

[Figure: the three images side by side ('Original image', 'PNG', 'JPEG'), each labeled with its SSIM]

World of wonder: a century of progress

There is far too much news to keep track of, but this month, three items struck me as truly noteworthy. Since one of them confirms a prediction made a century ago, I couldn't help musing over the possible reactions of a contemporary of Einstein (let's call him Niels). What would Niels think when reading these headlines:

Gravitational waves detected 100 years after their prediction

A century ago, Einstein predicted the existence of gravitational waves as a consequence of his theory of general relativity. This week, gravitational waves were detected by the two observatories of the advanced LIGO experiment. The collapse of a massive binary system, which we now understand to be the merger of two black holes, stretched and squeezed Earth by a few femtometers, just enough to be observable.

Artificial intelligence defeats Go champion

Chess has been the domain of the machine for about 20 years. Go, with its enormous complexity, was long thought to be out of reach of even today's computational power. That has turned out to be wrong: deep learning neural networks have recently outwitted a European champion. Can they defeat the top players too? We will know in March.

Virtual reality becomes available for consumers

People love to be in their own little world. I see that every morning and evening in the subway, where everyone is glued to a smartphone, connecting to Facebook and WhatEver and powering the all-important Dr. Dre headphones to escape the bland, perhaps sometimes dismal reality. For these people, the industry now has the ultimate solution.

What would Niels think? As an educated person, he wouldn't be too surprised about the existence of gravitational waves. Sure, he would be amazed by the accuracy of the LIGO measurements, and shocked by the concept of a black hole. But otherwise, he'd be cool.

Niels wouldn't have even heard about artificial intelligence. And defeating human beings? I'm not sure whether he'd like that. Still, being an intelligent guy, he would be able to grasp the concept and accept it within a short time.

Social networks and virtual reality? I feel that Niels would have great difficulties in understanding these concepts. I have them myself. 😉 But perhaps he would learn to love them, after getting used to them? I doubt that very much, but that's just antisocial me.

Danaergeschenk

(a German idiom: a gift that brings misfortune to its recipient, after the wooden horse the Danaans left at Troy)


A son, well-meaning, selects a notebook as a Christmas present for his mother. The lady, a librarian close to retirement, is as happy as she is helpless when looking for the first time at the start screen of Windows 8.1. Everybody at the party has a suggestion, but after hours of touching and prodding, the initial joy turns to disappointment.

Days later, she worked up the courage to ask me for help. Her desperation was so obvious that I agreed immediately. Only minutes later, I had second thoughts. The last version of Windows that I had physically installed was 2000, and that was 15 years ago. I had never even looked at Windows 8 but, I reflected, I was perhaps not totally unprepared, as I had digested an article on Windows 8 in the German computer magazine c't in 2012.

The little I remembered from the article was indeed helpful. There was very little to do anyway, except for configuring e-mail access and a browser with an ad blocker. The system came with a full version of Norton Security, and I explained that we would have to find a solution once the license expired in a year. Oh, and the notebook turned out to have a normal display, not a touch screen.

....

Two years later. My client asks me if I could have a look at her notebook. She admits to having used it only sporadically, and not at all during the past 18 months. She reports that it "behaved strangely" the last time she logged in. She doesn't know how to put it more accurately.

Well, booting takes an eternity (4 minutes). Norton complains that the license has expired. I uninstall it and activate Defender. Full scan, nothing found. Then, I update Windows. And update. And update.

Two days (!) later. After four reboots, all updates have been installed. Firefox crashes on start. I install the current version and add uBlock Origin and Ghostery. When opening heise.de, ads start to populate the page to an extent I've never seen before. It's creepy, like a swarm of big, ugly insects invading a cadaver.

Malwarebytes finds 97 "potentially unwanted programs" (so much for Norton Security). Desinfec't (which I could boot after dealing with the UEFI secure boot specialties) detects another 16 varieties of malware of the advertising sort.

After all these scans, and altogether one week later, the system is (reportedly) clean, and everything works as expected.

....

My client was horrified when I reported my findings, and she asked how that could have happened. I wanted to be frank with her, and thus had no choice but to tell her that the origin of this "infection" was the download and installation of an apparently innocuous program called Regclean Pro. Needless to say, she had never heard of this program in particular, or of download portals in general. With her (naive) view of the internet thoroughly shattered, she sat there with a forlorn, crestfallen expression on her face, not feeling at home in this world anymore.

I'm usually cynical enough to have no mercy at all with people whose view of the world doesn't match reality. But I felt different this time. Microsoft, with the help of the advertising industry as well as profiteers and criminal organizations around the globe, has created a monster. Keeping Windows clean requires knowledge, time, and effort. Definitely too much of all three, if you ask me.

MacOS and Linux are the best known alternatives to Windows. The former is essentially as vulnerable to adware (or, as that crap is euphemistically called, "potentially unwanted programs", aka PUPs) as Windows. The latter is not. However, a full-blown Linux installation seems a bit of overkill for a user who only wants to explore the part of the WWW devoted to bone china accessories, and to handle the e-mails resulting from the occasional acquisition.

For a user with this profile, a Chromebook would do admirably. It's affordable, boots fast, is updated automatically, is not affected by PUPs, requires neither an anti-virus scanner nor any other pampering, and has a built-in backup in the cloud. And I don't think privacy and data protection are the most important issue here, particularly when compared to the existing notebook with its login based on a Microsoft account. 😉

Update: I'm not alone.

Total recall

I've always been an enthusiastic supporter of the idea of a desktop search. It's strange that this idea has never gained any significant acceptance among users, except for those on Mac OS, where Spotlight became hugely popular. In contrast, users of Windows and Linux exude great irritation and a kind of angry disapproval when confronted with the idea of an application which can find texts, mails, and documents in seconds regardless of their location in the file system. These users vehemently stress that they know exactly where their files are, which is why they don't need such a thing (and never will!), which is anyway used only by total noobs who don't keep any order on their hard drives, etc. etc.

Now, nobody would accuse me of being disorganized; in fact, most would rather describe me as a meticulous, even fastidious person. That's certainly true with regard to electronic media: I stash away every file in the place it belongs and obey strict naming conventions. Still, I regard a desktop search as an invaluable tool for my daily work. Let me give you an example why.

Suppose you plan a project concerning Lucanus cervus, the stag beetle. You are particularly interested in its mating behavior and the preceding territorial fights. You have already collected many of the most important papers on this topic. Of course, you have discussed the issue extensively with potential collaborators around the world by email and chat. Together with some of them, you are on the way to a first publication, and you are also preparing a project application to the European Union.

Naturally, the files concerning this project are distributed across several different folders. Their names may bear no resemblance to the actual project. You may know where to find them in principle, but to get an overview of the discussion you had a year ago on Prosopocoilus giraffa would require substantial time and effort. A good desktop search would find this discussion and all related documents in the blink of an eye.

But is there a "good" desktop search for Linux? My previous experiences with the desktop search integrated into KDE were actually quite depressing. Whatever the KDE developers came up with ate way too much RAM and CPU and worked only occasionally. I wasn't too impressed with tracker under Gnome either. Actually, my most positive experience dates back to the times of beagle, perhaps fifteen years ago.

A year ago, I decided to quit desktop environments in favor of a simple and transparent openbox solution. Naturally, I wondered which desktop search I could profitably use on my new minimal desktop. And I found recoll, the desktop search I had wanted since the beginning of time. Ironic that I had to ditch desktop environments to find a desktop search that really works...
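As an aside, the recoll package also ships a command-line query tool, recollq, so a search like the hypothetical beetle discussion above doesn't even require the GUI:

recollq 'Prosopocoilus giraffa'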

Long-time support

LaTeX files with a bibliography (or an index) require several compiler passes to resolve all references:

latex paper.tex
bibtex paper.aux
latex paper.tex
latex paper.tex

Tedious, isn't it? For that reason, wrappers have been invented to automate this procedure. The veterans in this field are latexmk and rubber, but there are several others, including a new kid on the block called latexrun that sounds truly promising.

I've used rubber for many years, despite the fact that its development stalled nine years ago. Rubber has worked for me since scientific publishers are ... ahem ... extremely conservative: they will probably stick to the pdflatex/bibtex workflow for the next few centuries (or forever, whichever comes first).

However, for documents not intended for submission to a publisher, I long ago switched to XeLaTeX/Biber. For example, my curriculum vitae is to be compiled with these engines rather than with their prehistoric ancestors. Since it is technically identical to its template, you can download Adrien's CV to test the wrapper you want to use.

The following commands should compile this CV in a single run, including the bibliography:

latexmk --xelatex --synctex=1 --silent cv.tex

rubber --module xelatex --module biber --synctex cv.tex

latexrun --latex-cmd xelatex --bibtex-cmd biber cv.tex

In July 2015, only latexmk managed to do that, while rubber 1.2 failed despite my manual addition of XeLaTeX and Biber support. I was only mildly disappointed: after all, rubber had become lame and grey, and I realized that it was time to leave it behind. I was surprised, though, that latexrun, the newly developed contender, failed as well.

Now, to my absolute astonishment, the rubber project suddenly came back to life this year. After nine years of inactivity, rubber 1.3 was published in October, and 1.4 just a few days ago. Rubber 1.4 passed the test above with flying colors, and twice as fast as latexmk. What's more, SyncTeX support, which I previously smuggled in manually, has now been added to the rubber feature list as well.

Rubber thus manages to once again take the crown of LaTeX wrappers, with latexmk just seconds behind (literally) because of its inferior performance. Latexrun in its present state leaves much to be desired, but I'll keep an eye on its further development (if any 😉 ).

Learn to love the CLI

When I tell my students that they should use 'git' or 'mercurial' to manage their manuscripts, I always draw blank looks. When I show them the few commands necessary to do so, I invariably earn stares of disbelief.

The command line. Primeval fear.

The tension subsides markedly after introducing tortoisegit and tortoisehg. Personally, I wouldn't install a GUI for the three commands necessary to locally manage a manuscript. But then, I've spent time and effort to make my stay at the command line as joyous as possible.

Shell

First of all, I need a well-configured shell, and I'm not aware of any distribution which would offer one out of the box (well, perhaps grml, but that's not your typical desktop linux). I want command completion and sensible command aliases. I want an informative prompt: it should show whether I'm in a git or mercurial repository, and whether a virtual python environment is active or not. I also want to keep my typing to a minimum, i.e., a powerful history search is mandatory, and some sort of autojump feature is highly desirable.

No current shell offers all that out of the box. Many do not support these features at all. Only the "big three" may be configured such that they comply with all of my criteria: the bash (the de facto standard shell on Linux and MacOS), the zsh, and the fish.

fish

My long-time favorite is the fish, since it offers all of the above with truly minimal effort. And more!

Minimal effort? Really?

I speaketh the truth. I only had to...

...define some elementary aliases, such as

alias c "clear"; and funcsave c 
alias la "ls -la"; and funcsave la 
alias p "cd -"; and funcsave p 
alias pa "ps aux | grep"; and funcsave pa 
alias s "cd .."; and funcsave s 
alias x "exit"; and funcsave x

(and some more involved ones which I might discuss in a later post — just note that one-liners are usually better defined as an alias rather than a script)

...install autojump and virtualfish and load them in ~/.config/fish/config.fish:

set -g -x EDITOR vim                               
set -g -x BROWSER chromium

if test -e /etc/profile.d/autojump.fish
    . /etc/profile.d/autojump.fish
end

set -xU WORKON_HOME ~/.virtualenvs
eval (python -m virtualfish compat_aliases auto_activation)

...select a proper prompt (I usually use Robbyrussell) by issuing

fish_config

...and modifying this prompt to also indicate a python virtualenv as detailed here.

Here's an example of navigating the file system via the autojump feature, jumping into a directory that is under revision control and contains a python virtualenv.
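A mock session might look like this (the paths, project, and prompt are invented for illustration):

~> j beetle
/home/niels/projects/lucanus/beetle
(beetle) ~/p/l/beetle (master)>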

Note how the virtualenv activated itself automagically thanks to the 'auto_activation' option in fish's config above. ☺ And: the config above is really the entire configuration file. 559 bytes!

bash

To get a similar result with the bash requires significantly more effort. As root, I never use the fish but always the bash, and I thus need to spend time configuring this shell as well. In particular, I expect the shell to autocomplete systemctl calls such as

systemctl status pos

with

systemctl status postfix.service

This functionality requires installation of the bash-completion package.
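On most distributions, the package is activated automatically for interactive shells; if not, sourcing it from ~/.bashrc suffices (the exact path may differ between distributions):

[ -f /usr/share/bash-completion/bash_completion ] && . /usr/share/bash-completion/bash_completion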

Some of the history features which fish offers out of the box have to be activated for the bash, and I've explained how to do that previously. Other than that, I edited my .bashrc to tune its history behavior and added the following lines at the end:

source /etc/profile.d/autojump.sh
export WORKON_HOME=~/.virtualenvs
source /usr/bin/virtualenvwrapper.sh
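For reference, the history-related part might look like the following sketch; these are typical settings, not necessarily the exact ones from my earlier post:

export HISTCONTROL=ignoredups:erasedups    # drop duplicate entries
export HISTSIZE=10000
shopt -s histappend                        # append to the history file instead of overwriting it
bind '"\e[A": history-search-backward'     # arrow keys complete from the history
bind '"\e[B": history-search-forward'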

As for the prompt, I mostly use the various ones made available by bash-it, a bash framework similar to oh-my-zsh for the zsh, or wahoo for the fish. Since I keep both the aliases and the prompt separate from the main configuration file, we can compare sizes: my .bashrc weighs a rather hefty 1627 bytes.
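Switching between bash-it themes amounts to changing a single variable in ~/.bashrc (assuming a standard bash-it installation):

export BASH_IT_THEME='tylenol'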

Here's an example with the tylenol theme:

zsh

I never really used it for any length of time, as I never saw a significant advantage compared to a properly configured bash (or even a fish out of the box). That's more my fault than that of the zsh. In any case, if I feel like using the zsh, I rely on the oh-my-zsh framework. Here's an example using the dstufft theme:
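Selecting the theme is again a one-liner, this time in ~/.zshrc:

ZSH_THEME="dstufft"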

Terminal

The shell is one thing; the terminal we use to interact with it is another. I typically use the terminal emulator appropriate for a given environment: konsole within kde, gnome-terminal in gnome, and urxvt in wmii. For the bastardized openbox environments on my desktops and notebooks, I employ lxterminal, xfce4-terminal, terminator, and guake.

By far the most obvious characteristic of a terminal is its font. I still prefer terminus for low display resolutions (such as the 1366x768 in the screenshots above), but I have switched to antialiased fonts for higher ones.

Buchhaltung

My knowledge of financial English is essentially nil, so please forgive any clumsy terminology in the following.

As a scientist in the public service, you certainly won't get rich, but with a modest lifestyle you can put the odd euro aside. Back in 2009, those euros still earned 5% interest on fixed-term deposit accounts, but those times are gone. When the interest rate on my instant-access savings account recently fell below 1%, I was fed up, to put it mildly. But what is one to do when one is so utterly ignorant of financial matters?

If you have no clue, you have to read up. To my surprise, there are countless German finance blogs which are extremely helpful in this respect. The Finanzwesir stood out in particular: his articles are not only informative, but also briskly written and often highly entertaining. His guest post at the kritische Anleger alone is worth its weight in gold in every respect.

But what should one use to manage the portfolio whose creation is thus recommended? Many people use Excel. Quite apart from the fact that I cannot use Excel, being part of MS Office, directly under Linux, I also dislike the way it works, and that includes the lookalikes from GNU, Star/Open/Libreoffice, and Softmaker. What now?

Specialized financial software exists, of course, but it usually costs money and is only available for proprietary systems. With the prominent exception of GnuCash, which I tried first, but found completely impenetrable. At this point, one normally shuts up and pays. Or one writes the program oneself, like Andreas Buchen, who developed portfolio, a program released under the EPL which fits my needs exactly.

As always with Archlinux, even the most outlandish things can be found in the AUR, and so it was with portfolio. During the recent transition of the AUR to a git-based system, however, portfolio was removed, since it had never been updated and was therefore classified as orphaned. Fortunately, the PKGBUILD was archived, which in turn made it very easy for me to revive the whole thing.
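Reviving a package from an archived PKGBUILD is straightforward; a minimal sketch:

mkdir portfolio && cd portfolio
# copy the archived PKGBUILD into this directory, then build and install:
makepkg -si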

I will not turn this blog into a finance blog, as I lack any knowledge of and competence in the field. But I would like to encourage everyone to look into this topic. Emotional resistance to it is usually the result of an ideological veneer that does not withstand close scrutiny. In the end, this resistance harms only one party: ourselves.

Back up the NAS

Two weeks ago, my NAS suddenly emitted a high-pitched, enervating sound, and the indicator of one of the hard disk trays blinked nervously. The info line in the display screamed 'hard disk failure, RAID degraded', my wife's face was frozen in terror, the cats stared at me with eyes as big as saucers, and I was close to a heart attack. What prevented the disaster was my sudden realization that I had made a backup copy just a week ago...

In any case, since I had set up a RAID 5 for the four disks in my NAS, nothing was lost yet. And I had actually wanted to upgrade the disks for quite some time, since their total capacity of somewhat less than 6 TB was close to being exhausted. Hence, I ordered four 3 TB NAS disks right away, but kept brooding over the question of how to create backup copies when the capacity would be close to 9 TB. Should I really invest in a second NAS? Naa: I really don't need another always-on device in my flat.

The solution was so simple. With three 2 TB disks freed from the NAS, two 1 TB disks just lying around, and 6 TB in three USB hard disks, there was already more than enough disk space to back up the NAS. And the orphaned SATA disks can be put to use by simply employing a docking station! Yes, it's that easy.

I've acquired this one and, just for the fun of it, partitioned and formatted three of the orphaned disks. Copying data from the NAS (mounted via cifs over a gigabit switch) worked flawlessly.
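The copying itself is unspectacular; a sketch of the kind of invocation I mean, with hypothetical mount points:

# NAS share and docked disk mounted at /mnt/nas and /mnt/dock, respectively
rsync -avh --progress /mnt/nas/ /mnt/dock/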

When the new disks arrived, I exchanged them one by one, as one does with a RAID. It always took a few hours to rebuild the array, so I simply waited from one night to the next. Finally, it took just 10 minutes to expand the RAID to the larger disk size. The almost 9 TB should suffice for some time. 😊

Conky's resurrection

Finally, with version 1.10.0-5, conky started to work again and at last understood all commands of my configuration. All? Well, it still doesn't recognize 'pre_exec', but I replaced this command with an 'execi 3600', which will do for the time being. I've put the configuration file of my Lifebook here. For those interested in the weather, there's also a config file for a weather conky. Both look exactly as in this screenshot.
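For illustration, the substitution boils down to a line like the following in the conky configuration ('~/bin/weather.sh' is a hypothetical helper script, run once per hour):

${execi 3600 ~/bin/weather.sh}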

Update, 04/23/16: The weather conky has been broken since Yahoo locked down its weather API. The workaround posted here stopped working a few days later. Plenty of alternatives exist, but I haven't found the time yet to examine them.

Update, 04/24/16: Found another workaround. Let's see how long that one lasts...

Update, 01/03/19: It's dead, Jim. Important EOL Notice: As of Thursday, Jan. 3, 2019, the weather.yahooapis.com and query.yahooapis.com for Yahoo Weather API will be retired.