Yay!

Veteran users of Archlinux scoff at AUR helpers, tools that automate certain tasks for using the Arch User Repository (AUR). If you listen to them, they all install packages from the AUR in the canonical way, i.e., via 'makepkg -si'. In reality, of course, they write their own scripts for interacting with the AUR. 😉

I've used yaourt from the start until it became clear that it had serious security issues. At that point I switched to pacaur, which was declared orphaned in December 2017.

Well, when looking at the table on the AUR helpers page, it's clear that there are a few promising candidates for a future AUR helper on my desktop. I finally opted for yay since I like the name and appreciate the support for all three shells I'm using. But what I like best is the system summary:

➜  ~ yay -Ps
Yay version v8.1115
===========================================
Total installed packages: 1821
Total foreign installed packages: 81
Explicitly installed packages: 510
Total Size occupied by packages: 12.1 GiB
===========================================
Ten biggest packages:
texlive-fontsextra: 1.0 GiB
texlive-core: 384.7 MiB
libreoffice-fresh: 370.1 MiB
linux-firmware: 297.6 MiB
atom: 244.3 MiB
languagetool: 229.6 MiB
nvidia-utils: 183.1 MiB
jre10-openjdk-headless: 169.6 MiB
chromium: 164.6 MiB
gravit-designer-bin: 159.8 MiB
===========================================
:: Querying AUR...
 -> Orphaned AUR Packages:  ipython-ipyparallel
 -> Out Of Date AUR Packages:  hyperspy  netselect  pacserve  scidavis  ttf-vista-fonts

The fifth biggest package in this list is atom, an editor. 244 MB for an editor? Well, there are very few applications that I'm using as much as an editor, and I'm thus constantly evaluating potentially promising newcomers in the editor/IDE category. Is atom worth the ¼ GB of disk space? You'll get the answer in the comprehensive editor shootout that I will post later this year. 😉

A one-liner to upgrade your virtualenvs

This blog is powered by Nikola, a Python-based static blog compiler, which I've installed in a Python virtualenv to separate it from the system-wide Python installation of my notebook. Updates of the Python version (like from 3.6 to 3.7) inevitably break these virtualenvs, and I have so far accepted that there's no other way to get them back than to rebuild them from scratch. In fact, that's what you get to hear even from experienced Linux developers.

The recent update to Python 3.7 brought that topic back to my attention, and I kind of lost my patience. I just couldn't accept that there shouldn't be a better way, and indeed found a solution for those using the venv module of Python 3.3+:

python -m venv --upgrade <path_of_existing_venv>

Despite the fact that I'm using virtualenv instead of venv, this command worked exactly as I had hoped. ☺
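
And since the point is a one-liner for all of your virtualenvs: if they live in a common directory, a simple loop does the trick. The path below is just an assumption – adapt it to wherever your venvs actually reside:

for v in ~/.virtualenvs/*/; do python -m venv --upgrade "$v"; done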

The virtualenv can now be updated as usual. Well, almost – both pip and pip-tools have become a lot more conservative and actually have to be told explicitly that they really should upgrade to the latest version. For a particular package, that looks like this:

pip install --upgrade Nikola --upgrade-strategy=eager

A rather weird behavior, if you ask me, but what do I know. ☺

Back to our virtualenv. To really, genuinely and truly update all requirements, the following sequence of commands is necessary:

pip install --upgrade setuptools
pip install --upgrade pip
pip install --upgrade pip-tools
pip-compile --rebuild --upgrade --output-file requirements.txt requirements.in
pip-sync requirements.txt
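
For context, the requirements.in consumed by pip-compile simply lists the top-level dependencies; a minimal, hypothetical example for a Nikola-based blog could be as short as this (my actual file contains a few more entries):

# requirements.in – top-level dependencies only; pip-compile resolves
# and pins the complete dependency tree into requirements.txt
Nikola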

The update to Nikola 8.0.0 broke my old theme (based on bootstrap2), and it was about time: too many things were not working as desired. I'm now using an essentially unmodified bootblog4, as before with Kreon as the main font and Muli for the headlines (as of 15.12.2018: Oswald for the latter).

Security headers

A few simple provisions were sufficient to make this blog GDPR compliant. Even less was required to get a high rating regarding security. As a first step, I've obtained certificates from the Let's Encrypt initiative and configured Hiawatha (our web server) accordingly:

VirtualHost {
        ...
        TLScertFile = /etc/hiawatha/tls/pdes-net.org.pem
        RequireTLS = yes, 31536000; includeSubDomains; preload
        ...
}

This configuration got an A+ rating from Qualys SSL labs.

However, there's more to the security of a website than transport encryption. For example, the Content Security Policy “provides a standard method for website owners to declare approved origins of content that browsers should be allowed to load on that website”. There are actually a number of these security headers, and after some research I came up with the following settings:

VirtualHost {
        ...
        CustomHeader = Vary: Accept-Encoding
        CustomHeaderClient = X-Frame-Options: sameorigin
        CustomHeaderClient = X-XSS-Protection: 1; mode=block
        CustomHeaderClient = X-Content-Type-Options: nosniff
        CustomHeaderClient = X-Robots-Tag: none
        CustomHeaderClient = X-Permitted-Cross-Domain-Policies: none
        CustomHeaderClient = Referrer-Policy: same-origin
        CustomHeaderClient = Expect-CT: enforce; max-age=3600
        CustomHeaderClient = Content-Security-Policy: frame-src 'self'; worker-src 'self'; connect-src *; default-src 'self'; img-src 'self' data: chrome-extension-resource:; font-src 'self' data:; object-src 'self'; media-src 'self' data:; manifest-src 'self'; script-src 'self' 'unsafe-inline' 'unsafe-eval'; style-src 'self' 'unsafe-inline'; base-uri 'none'
        ...
}

The CSP proved to be tricky since Chromium did not display images in SVG format with stricter settings. A post by April King finally provided the missing piece of the puzzle. Furthermore, MathJax only works in my implementation if I allow scripts and styles to be included 'unsafe-inline'. As a result of this latter setting, my current configuration achieves only a 'B+' instead of the 'A+' depicted below.

Update: I've cleaned up and simplified the CSP, which now reads

CustomHeaderClient = Content-Security-Policy: frame-ancestors 'none'; frame-src 'self'; default-src 'self'; object-src 'none'; script-src 'self' 'unsafe-inline' 'unsafe-eval'; style-src 'self' 'unsafe-inline'; base-uri 'none'; form-action 'none'

That still gets a 'B+' because of the 'unsafe-inline' options.

Mike Kuketz listed a number of online scanners evaluating the implementation of security policies on arbitrary web sites. One of the most informative of these tools is Mozilla's Observatory, developed by the same April King whose post had helped me with the CSP. ☺

../images/observatory.png

The office suites disaster

I was recently involved in the preparation of a project proposal, i.e., the attempt to obtain external funding for a certain research project. This particular project is a joint initiative of two academic institutions, and after having agreed that we would like to collaborate, a practical question arose: how should we prepare the documents required for the proposal?

The youngest of us proposed to use Google Docs (“it's so convenient”), but this proposal just received hems and haws from all others. My colleague then remarked, with an apologetic smile, that I would work exclusively with LaTeX. Our benjamin cheerfully chimed in and suggested Authorea. Once again, his proposal was met with limited enthusiasm. I gleefully added CoCalc as another LaTeX-based collaborative cloud service and briefly enjoyed the embarrassed silence that followed. After a few seconds I hastened to add that I, being foreseeably responsible for only a minor part of the proposal, would agree to anything with which the majority felt comfortable (which, in hindsight, was entirely beside the point).

In the end, we decided to use a Microsoft Sharepoint server run by our partner institution. I didn't connect to it myself, but was told by my colleague that the comments and tracking features didn't work correctly. And so we ended up with good ol' (cough) Microsoft Word as the least common denominator.

Well, I have LibreOffice (LO) on all of my systems, which should do for a simple document like the one we would create. To be on the safe side, I've ordered a license for Softmaker Office (SMO), which is praised for being largely compatible with MS Office (MSO), and which I planned to test anyway in my function as IT strategy officer (muhaha). Besides, in the office I have Microsoft Word 2007 available in a virtual machine running Windows 7. So what could go wrong?

At the very beginning, our document contained only text with minimal formatting, and was displayed with only insignificant differences in MSO and LO. After a few iterations, the comments and tracked changes became longer than the actual text. That's when the problems started.

We were in the middle of an intense discussion when I noticed that, from my point of view, it just didn't make any sense. My coworkers were discussing item (1) of an enumerated list, but their arguments revolved around an entirely different subject. When the discussion moved to item (2), I suddenly understood the origin of this confusion. LO, apparently triggered by a deletion just prior to the list, had removed the first entry, and my list thus consisted only of items (2)–(7), but LO enumerated them as (1)–(6). Gnarf.

Since my SMO order had not yet been approved (it's no problem to order MSO licences for €50,000, but an unknown product for €49.99 stirs up the entire administration), I decided to open and edit the proposal with MSO 2007. Only to discover that my colleagues, using MSO 2013 and 2016, talked about a paragraph that simply didn't exist in my version of the document. The excessive tracking had also broken compatibility with MSO 2007. I wasn't too surprised, but at this point I really looked forward to SMO.

When the license finally arrived, we were busy with creating figures. A simple sketch created by our youngster in Powerpoint 2016 looked more like surreal modern art in LO, and not much better in MSO 2007. But in SMO, it was missing altogether. What the hell?

It turned out that the much acclaimed SMO has serious deficiencies with vector graphics, particularly under Linux. This flaw, together with the missing formula editor, makes SMO basically unusable in a scientific environment. That's a pity, since SMO is responsive and has a flexible and intuitive user interface.

Anyway, with the deadline approaching rapidly, my colleagues started to work on the proposal over the weekend at home. It turned out that none of them had MSO at home, so they all used LO. Now everyone's printed version deviated in one way or another from the master copy on ownCloud, and we had to carefully compare the content, line by line, in the subsequent discussion.

Miserable pathetic fools! Why on earth didn't we agree at the outset of this endeavour to use LO? We could even have agreed on the exact same version to use. Our work on the proposal would have been easier and much more efficient. And instead of creating figures the hard way with Powerpoint, we could have used a full-scale vector graphics suite such as Inkscape, since recent versions of LO support the SVG format. Equations could have been handled with TeXMaths, which allows users to insert any kind of math losslessly into an LO document. Here's a slide composed exclusively of vector graphic elements:

../images/vector_graphics_in_libreoffice.svg

No, I did not suddenly become an enthusiastic user and advocate of LO. On the contrary, I still firmly believe that for the task at hand, LaTeX would have been the most appropriate and convenient solution. However, LO would have obviously been a much more rational choice than MSO.

When I said that aloud, my colleagues gave me this 'oh-my-god-now-he's-going-mad' look. Normal people accept LO for personal use, but not for professional work. Why? Well, they have this deeply internalized belief that what you get is what you pay for. Free software is for amateurs and nerds, but as a professional, one uses professional software! Oh wait ...

Pagespeed

After having removed all external resources from this blog, I was curious to see how well it fares in terms of performance. See for yourself:

I was pleasantly surprised, since all I did otherwise to improve performance was to add a few file types to those that are subject to gzip compression, and a few more to those that the client should cache for some time. Getting a well-performing website is actually not as terribly difficult as I previously believed (“I hacked together a blog software in C, with dietlibc and libowfat, running under gatling and with a tinyldap backend.”)

The number one reason to use Linux

A reader of my previous post remarked that while he found my reminiscences amusing, he believes them to be factually incorrect. Windows and MacOS, he insists, are not prone to crashes as I've described, but are just as stable as any Linux.

That's more or less true, but only for times more recent than those I dwelled on, namely the days before the release of MacOS X and Windows XP in 2001 (and yes, I knew the older versions of NT and was impressed by their stability, but I didn't like their price tag!).

But if recent versions of MacOS and Windows do not exhibit any deficiency in terms of stability, the reader asks, is there any other reason to prefer Linux?

One? Well, let's at least briefly discuss some of the aspects that led me to choose Linux instead of Windows or MacOS. GNU/Linux is free software, both as in speech and as in beer. Contrary to the Free Software Foundation, I'm pragmatic enough to appreciate the latter point at least as much as the former. Had I decided to employ commercial software for my work, I would not have been able to equip desktop, notebook, and netbook with the complete set of applications I require. As it is, I'm fully equipped to work both at the office and at home, at a pub, or while travelling, without the need to carry a high-end notebook back and forth between all of these locations. I greatly enjoy this freedom.

But free software is not a phenomenon restricted to Linux. In fact, almost everything I really need for my work on an everyday basis is available for Windows as well. So what's the difference?

The difference is the effort it takes to get everything working. On Windows, software installation hasn't changed since the 1990s. Software is installed from any installation medium deemed trustworthy by the user or, more recently, by Microsoft. In the 1990s, freeware was installed from CDs acquired at computer shops; nowadays, from any place on the internet. The user usually 'googles' for the software, opens the first hit without checking whether it is the developers' site or some obscure download portal, downloads the installer, and installs it by clicking on various 'next' and 'yes' buttons without paying attention to the browser toolbars and other potentially unwanted utilities installed at the same time. The user repeats this awkward procedure for every piece of software to be installed. And that's just the beginning: for the vast majority of both free and commercial software, the user has to check for updates himself, and if there is an update, he has to go through the same archaic, error-prone, and time-consuming routine again.

That's because the Microsoft updater only updates Windows and software from Microsoft (such as Office), but nothing else. This update is scheduled to occur only once per month (on Patch Tuesday), but is so slow and heavy on resources that every Windows user dreads this day. No wonder Windows users hate updates – but it never ceases to surprise me that they don't draw the obvious conclusion. In particular since this situation is not a mere inconvenience, but inevitably results in unpatched systems that are easily compromised. With potentially unwanted consequences. 😉

The game-changing feature of Linux compared to its commercial rival(s) is the existence of central software repositories, containing tens of thousands of individual program packages, combined with a powerful built-in package management system. Updating the system updates every single program installed on it. Security updates are not held back but are delivered in time, typically even before you read about the underlying vulnerability in the news—at least when using the big six. And installing the updates is usually a matter of seconds to (at most) minutes and does not require a reboot (with the exception of libc and kernel updates).
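
On the two distributions I use (Arch Linux and Debian, more on which below), that whole update amounts to a single command:

sudo pacman -Syu                            # Arch: refresh the databases and upgrade everything
sudo apt update && sudo apt full-upgrade    # Debian: the same idea with apt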

But that's not the only advantage of the GNU/Linux software distribution model. For example, one can transform a vanilla Linux installation into a dedicated workstation within minutes, and I find that immensely useful. That's because I need a number of applications for my work, including texlive, libreoffice, gnuplot, numpy, scipy, matplotlib, mpmath, gmpy, sympy, pandas, seaborn, hyperspy, gimp, inkscape, scribus, gwyddion, imagej, vesta, and perhaps a few more. On Arch Linux, I'm able to install all of these packages with a single command that takes me a few seconds to enter. And from that moment on, all of these applications are included in the update process, and I get the most current version available automatically. On Windows, I would not only have to search for these programs on the internet, manually download the installation files, and install all programs individually, but I also would have to update all of this software by myself, without any kind of automatism. That means once again visiting the respective websites, downloading the new installation files, ...
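
For contrast, here is roughly what that single Arch command looks like – a sketch only, since the exact package names vary and a few of them (e.g., hyperspy and vesta) actually come from the AUR and would be handled by yay instead:

sudo pacman -S --needed texlive-most libreoffice-fresh gnuplot gimp inkscape scribus gwyddion \
    python-numpy python-scipy python-matplotlib python-mpmath python-sympy python-pandas python-seaborn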

Under Windows, just the initial installation is the work of one day. Do you really think I want to continue doing that for the rest of my life? As a matter of fact, I don't, just like everybody else. And as a consequence, you'll find plenty of outdated software on your average Windows installation. This situation is arguably the major reason why Windows is the preferred target of malware developers: they can simply rely on the fact that their drones will find a huge number of vulnerable targets.

When I set up my wife's gaming rig, where Windows serves as a game starter, I decided to install and update the few applications she needs with Ninite, and I'm not unhappy with this poor man's package manager. I've also installed Secunia's Personal Software Inspector to detect outstanding updates. Alas, this very useful tool has been discontinued, and I have to find a replacement. There are several contenders, but as usual under Windows, it is not clear which of them are trustworthy, in the sense of which of them actually do the job advertised instead of being a vehicle to deliver ads to your desktop. It takes time to separate the wheat from the chaff, time I could have otherwise spent on much more valuable and enjoyable activities. That's Windows, the greatest productivity killer ever invented.

Perhaps I should have a look at Chocolatey, which claims to be a “real” package manager for Windows (for a comparison with Ninite, see here). Now wouldn't that be the perfect tool for Windows, which since version 10 has been marketed as a service receiving regular updates that contain new features and other changes? Sure it would, but what does Microsoft do with this chance to establish Windows as a rolling release model? They stick to the patchday and roll out new features in the form of semiannual 'Creator Updates', which turn out to require a full upgrade installation. Home users can't even delay this upgrade.

That's ... well, pathetic. In the end, Windows in 2020 will be no different from Windows in 1994: a pile of duct-taped debris desperately trying to look like a modern operating system.

General Data Protection Regulation

The new European data protection law (the GDPR, or DSGVO in German) will come into effect exactly 12 days from now. Time to act! I basically had the choice between implementing a privacy policy detailing on several pages why I definitely need to use all these external services, and finding alternatives. Well, since I very much prefer technical solutions over legal ones, the choice wasn't too difficult. 😉

I've used Google Analytics mainly for two reasons: first, I like to look at aggregate statistical data, and second, I don't see the evil in it. The user can block this external service in countless ways, for example, by simply rejecting the corresponding cookie, or by installing a script- or adblocker. Instead of creating a monstrous legal construct such as the GDPR, one should rather educate users to take care of themselves again. But the data protection industry exists only because these things are massively blown out of proportion, and of course it serves its own interests. They like to see users as drooling rugrats, and they like to keep them that way. Well, that's the general trend in all western societies.

../images/ct_schlagseite_2018_11_90.webp

c't Schlagseite by Ritsch+Renn in c't 11/2018

Anyway, since I recently integrated a blacklist into my local DNS resolver, it's actually not that simple anymore to access the Google Analytics domain. 😄 Furthermore, I'm interested in how many of my readers actually block Google Analytics. And most importantly, I thought it would be a good idea to take the opportunity to get rid of this proprietary garbage. 😊

I thus searched for local web log analyzers and came across several, but took an immediate liking to goaccess. It's easily installed and even easier to use:

# echo "deb http://deb.goaccess.io/ $(lsb_release -cs) main" | tee -a etc/apt/sources.list.d/goaccess.list
# wget -O - https://deb.goaccess.io/gnugpg.key | apt-key add -
$ sudo apt update
$ sudo wajig install goaccess

Goaccess comes preconfigured with the log formats of the most popular web servers, but for Hiawatha, our web server of choice, manual configuration was required. Fortunately, I found all relevant information in the Hiawatha forum:

time-format %T %z
date-format %a %d %b %Y
log-format %h|%d %t|%s|%b||%r|%v|%u|%^|%^|%^|%R|%^|%^

Since the reports of goaccess are directly generated from the access logs of the web server, I anonymized the IP by activating the

anonymizeIP = yes

option in /etc/hiawatha/hiawatha.conf.

Finally, I configured goaccess to automatically generate HTML reports. Logrotate takes care of periodically deleting the logs (which don't contain personal information anyway). The reports, by the way, show that at least 75% of all visitors to this site block Google Analytics. Excellent! Also, I'm happy to see that I have four times as many readers as I thought. 😉
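
For the record, the report generation itself boils down to something like the following command, run periodically from cron or a systemd timer. This is only a sketch: the log path, the config file location (holding the format lines shown above), and the output path are examples rather than my exact setup:

goaccess /var/log/hiawatha/access.log --config-file=/etc/goaccess/goaccess.conf -o /var/www/hiawatha/cobra/output/stats/report.html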

The next task was a local installation of the Google fonts I'm using. That was very quickly done by cloning the bash script 'best-served-local' by Ronald van Engelen and running it:

git clone https://github.com/ronalde/best-served-local
cd best-served-local
./best-served-local -i ../static/fonts "Kreon:300,400,700" > ~/temp/fonts/kreon.css
./best-served-local -i ../static/fonts "Fira Mono:400,700" > ~/temp/fonts/fira.css

The CSS snippets thus created have to be included in the main CSS file of the blog's theme, which in my case is a modified bootstrap. The fonts themselves go into ../output/assets/static/fonts (which has to be consistent with the value given to the command-line parameter -i above).
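
The snippets the script generates are ordinary @font-face rules pointing at the local font files; schematically (the file names here are placeholders), one of them looks like this:

@font-face {
    font-family: 'Kreon';
    font-style: normal;
    font-weight: 400;
    src: local('Kreon'),
         url('../fonts/kreon-regular.woff2') format('woff2'),
         url('../fonts/kreon-regular.woff') format('woff');
}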

Next, I wanted to get rid of the dependency on an external resource for MathJax. I first struggled with a local installation of KaTeX, but couldn't get it to work. MathJax, in contrast, was very easy. I simply installed libjs-mathjax and fonts-mathjax on pdes-net.org, and copied them to a location accessible to the web server

cp -r /usr/share/javascript/mathjax/ /var/www/hiawatha/cobra/output/assets/static/mathjax/

and back to my local blog installation

scp -r cobra@netcup:/var/www/hiawatha/cobra/output/assets/static/mathjax/ /home/cobra/ownCloud/MyStuff/Documents/pdes-net.org/output/assets/static/

Finally, the Nikola configuration file had to be updated correspondingly:

<script src="../assets/static/mathjax/MathJax.js?config=TeX-AMS_SVG"></script>

As for the search box on this site: it already works on the client side by virtue of tipuesearch. So, I was done!

Ah, not entirely: I still had to update my contact page. After doing that, I also moved the link to the bottom of the page, as is customary on most sites.


Well, when I look at the result, I'm actually quite pleased. I feel that I have really done something for the good of my visitors, instead of continuing to act as a data collector for Google and others, and justifying that by tons of legal mumbo-jumbo in a privacy policy nobody reads or understands. But IANAL, and it is likely that this particular species actually prefers the latter, no matter what the GDPR says about transparency and plain language. Let's see.

Server speed

An important feature of a server is the speed of its internet connection, or more precisely, its latency and bandwidth. How can we measure these quantities if all we have is command line access?

Regarding latency, look at my last post. Here's an example from pdes-net.org with the fastest mirror:

± netselect -vv https://mirror.de.leaseweb.net/debian/
Running netselect to choose 1 out of 1 address.
............
https://mirror.de.leaseweb.net/debian/ 2 ms   6 hops  100% ok (10/10) [    3]

If you want a closer look at the 6 hops, use mtr-tiny.
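
For example, in report mode with ten cycles:

mtr --report --report-cycles 10 mirror.de.leaseweb.net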

Concerning bandwidth, use speedtest-cli:

sudo wajig install speedtest-cli

Here's an example from pdes-net.org:

± speedtest
Retrieving speedtest.net configuration...
Testing from netcup GmbH (185.170.112.87)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by IT Ohlendorf (Salzgitter) [111.93 km]: 11.215 ms
Testing download speed.................................................
Download: 427.58 Mbit/s
Testing upload speed...................................................
Upload: 393.16 Mbit/s

Hm. A ping of 2 ms and a symmetric down- and upload of 0.4 Gbit/s for a handful of € per month? Why can't I have that at home?

HTTPS mirrors

And while we're at it, let's also configure HTTPS mirrors for our package updates. That may seem superfluous at first glance, as these packages are public content and are signed with the private GPG keys of the developers, certifying their authenticity. However, the signatures are only one part of the story, and the encrypted transfer is the other. In fact, updates installed via a plain-text HTTP connection can be subverted by replay attacks despite valid GPG signatures. A comprehensive analysis of this scenario was done a decade ago at the University of Arizona. The following brief summary is due to Joe Damato:

“Even with GPG signatures on both the packages and the repositories, repositories are still vulnerable to replay attacks; you should access your repositories over HTTPS if at all possible. The short explanation of one attack is that a malicious attacker can snapshot repository metadata and the associated GPG signature at a particular time and replay that metadata and signature to a client which requests it, preventing the client from seeing updated packages. Since the metadata is not touched, the GPG signature will be valid. The attacker can then use an exploit against a known bug in the software that was not updated to attack the machine.”

The distributions I'm currently using and am familiar with (Archlinux and Debian) do not use HTTPS mirrors by default, but can be coaxed into doing so.

That's particularly easy for Archlinux after installing reflector, a Python script that allows filtering mirrors by various criteria such as their geographical location, up-to-dateness, download rate, and, last but not least, connection protocol. The following one-liner overwrites /etc/pacman.d/mirrorlist with the current top ten of all HTTPS mirrors in terms of up-to-dateness, overall score and speed:

reflector --verbose --protocol https --latest 100 --score 50 --fastest 10 --sort score --save /etc/pacman.d/mirrorlist

This command can be run manually, or automatically by using either a pacman hook triggered by mirrorlist updates or a timed systemd service. Reflector is thus a tool as flexible as it is convenient for selecting the optimum mirrors.
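
Such a pacman hook could look roughly like this – a sketch only: the file would go into /etc/pacman.d/hooks/ under an arbitrary name, and re-running reflector whenever the pacman-mirrorlist package is upgraded is just one sensible choice of trigger:

[Trigger]
Operation = Upgrade
Type = Package
Target = pacman-mirrorlist

[Action]
Description = Refreshing the mirrorlist with reflector...
When = PostTransaction
Depends = reflector
Exec = /usr/bin/reflector --protocol https --latest 100 --score 50 --fastest 10 --sort score --save /etc/pacman.d/mirrorlist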

I expected that Debian would offer something equivalent, but to my considerable surprise and disappointment, there's nothing coming even close. The netselect derivative netselect-apt is only capable of finding the ten fastest mirrors for the relevant release of Debian (i.e., stable, testing, or sid). But how do I know whether these mirrors support HTTPS? To the best of my knowledge, the only way short of trying them one by one is the python script available here (or one of its forks).

Armed with this script (I'm using the multithreaded variant), I usually just create a list with all mirrors, which I then pipe through netselect to find the fastest one:

± ./find-https-debian-archives.py --generic --no-err | awk '{print $1}' | grep https | tr '\n' ' ' | xargs netselect -vv
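
The winner then goes into /etc/apt/sources.list – for instance like this (the release name is just an example):

deb https://mirror.de.leaseweb.net/debian/ stretch main contrib non-free
deb https://mirror.de.leaseweb.net/debian/ stretch-updates main contrib non-free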

Oh, and one should not forget to install 'apt-transport-https' to allow apt to use HTTPS mirrors. Only true Debilians will find this situation tolerable.

Let's encrypt

Ever since I set up the new server for this blog, I've wanted to make the switch from plain HTTP to TLS-encrypted HTTPS (if you think HTTPS is for online shops and banks only, think again).

This transition turned out to be much easier than I thought. Hiawatha, our web server of choice, comes with a script that takes care of registering the site at Let's Encrypt and requesting certificates for the associated domain(s). Chris Wadge, the maintainer of Hiawatha for Debian, provided an excellent tutorial that guides you through the few steps necessary to configure Hiawatha for serving HTTPS content.

Since I had to configure vhosts for the certificates anyway, I took the opportunity to set up some proper subdomains. For example, this blog can now be reached at https://cobra.pdes-net.org.

After a bit of tweaking (setting HSTS to one year), the security rating of our site is flawless:

../images/qualys.png