Haui's bytes

news, diary, journal, whatever

Who Is Phoning Home

The current development in consumer electronics like tablets or mobile phones couldn’t be worse. At least if you care a little bit about privacy. A few years ago, Spy- and Adware that came bundled with other software was unwelcome to (almost) every Windows user. Internet Discussion Boards were overrun by people trying to get rid of annoying web browser bars that were installed together with their favorite file sharing tool. Others even refused to use the Opera web browser because it was ad-sponsored.

Nowadays, we don’t have these problems anymore. Spy- and Adware is now socially accepted and more popular than ever before. Naturally, the name has changed - the euphemism App just sounds cool and innocent.

Many (or even most) Apps for Android or IOS, the two major operating systems found on mobile phones and tablets, come with some important built-in consumer information right away. To retrieve these ads, an internet connection is needed because the advertising is tailored specifically for your needs (I know, it’s great). Apart from this, an active internet connection introduces another great field of application. While Ads are transmitted to your device, your personal data may be sent from your device to trustworthy servers around the world. So it’s a win-win situation for everybody - except for you.

The permission concepts of Android and IOS are, of course, completely useless because nobody cares anyway - “A flashlight app that requires access to Internet, Email accounts and GPS information - seems legit!”. In addition, the modern operating systems are an ideal breeding ground for these privacy nightmare applications because of their standardized APIs. In contrast to a classical desktop computer system, it’s extremely easy to collect data like calender events or Emails automatically because the APIs already include some ready-to-run methods for these purposes.

However, apart from avoiding this new technology, there isn’t that much you can do about the aforementioned issues. Still, there are ways to figure out which data is sent from and to your device and it’s even possible to filter this traffic. I’ll describe on of these ways in the following. But be warned - there’s no App for this and you will need a root shell ;)

../../../_images/bridge.png

The image describes the basic setup. The PC acts as a Wi-Fi hotspot and forwards the traffic received from a tablet to the DSL router. To turn the PC into a hotspot, a Wi-Fi USB adapter that can act as a hotspot (iw list |grep AP) is required. Once the adapter is properly installed on your system, it’s easy to create a hotspot with hostapd. The Arch Wiki contains some information about the configuration, but the default configuration file is quite self-explanatory and just needs a few adjustments. After the setup is done, brctl (on Debian contained in the package bridge-utils) is used to create a bridge that connects the wireless and the non-wireless network interfaces:

brctl addbr br0
brctl addif br0 eth0

Don’t forget to add the correct bridge configuration line in your hostapd.conf:

bridge=br0

After the interfaces are brought up, you may start hostapd:

ip link set dev eth0 up
ip link set dev wlan0 up
ip link set dev br0 up
hostapd -B /etc/hostapd/hostapd.conf

If no errors occurred, you should be able to connect your wireless device to the newly created hotspot. As all traffic now flows through the PC system, we’re able to record and inspect the network packets. tcpdump is the tool of choice for this task:

tcpdump -ni br0 -w tablet.pcap

The command collects all packets passing the bridge interface and writes them to a file tablet.pcap. No need to mention that the command must be run with root privileges. Once enough packets are collected, the PCAP file can be inspected with Wireshark. Hence we may, for example, check if the login data for our favorite shopping app is sent via a SSL secured connection or as plain text. As a thorough explanation of Wireshark’s (and tcpdump’s) capabilities could easily fill an entire book, I recommend you take a look at the documentation if you’re interested in topics like filter expressions. However, basic knowledge of the TCP/IP protocol suite is mandatory for this.

I’ve mentioned earlier, that the setup not only allows us to capture all network traffic, but also enables us to filter (and even modify) the traffic. A few basic firewall rules are enough to stop all communication between the tablet and a specific IP/IP range:

iptables -A FORWARD --dest 192.168.1.0/24 -j DROP
iptables -A FORWARD --src 192.168.1.0/24 -j DROP

In the example, I’ve used a locally assigned IP address range - in a real world example you would most likely pick a non-private IP. Filtering by IP addresses however is not always a satisfying solution. Sometimes the packet payload is way more important than the destination or source information. To filter out all packets containing a specific string like “adserver”, iptables’ string matching extension is very useful:

iptables -A FORWARD -m string --string "adserver" --algo=kmp -j DROP

For this deep packet inspection, better tools might exist, but I’m not familiar with these (and my iptables skills have also become quite rusty).

All in all, a Linux driven hotspot opens up completely new possibilities when compared to a standard Access Point. Still, inspecting the traffic and creating proper firewall rules is a very cumbersome procedure.

Custom Git Prompt

From all the available Revision Control Systems I know, Git is my personal favorite. I use it for all my revision control needs at work and at home. As a consequence, I got quite a few git repositories residing in my home directory. To keep track of all these repositories, I modified my bash prompt in a way that it displays the most important git infos in the current directory (of course only if the directory is a git repo). My solution basically consists of two parts. First, I wrote a simple Perl script, that summarizes and colors the output of git status -s. Second, I created a bash script that is sourced at the end of the ~/.bashrc. This bash script overwrites the standard cd command and checks whether the new directory is a git repository. If the check doesn’t fail, the output of the Perl script is integrated in the prompt.

In a first version, the bash prompt was completely refreshed after every command executed in the shell. For very large git repositories, however, this led to a noticeable lag after every command. So, I decided to rerun the git status script only if the last command executed in the shell was likely to modify the status of the git repo. My list of commands, that makes no claim to be complete, includes vim, cp, mv, and, of course, git. To manually rerun the status script, and, thus, update the prompt, I included the command g, that actually only triggers the git status script. As a drawback of this solution, changes made in the same git repo are not visible across multiple concurrent shell sessions. For my needs, however, it works well enough. See the following screenshot for a short demonstration:

../../../_images/gitstatus.png

Please note that my prompt spans across two lines by default (over three when inside a git directory).

You can download the two scripts here. Move gitstatus into ~/bin/ and make it executable. Drop gitstatus.sh in your home directory as .gitstatus and add the following line to your .bashrc

. ~/.gitstatus

You may of course modify the GITPREFIX and GITSUFFIX variables in gitstatus.sh to fit your needs.

Find files in APT packages

On a Debian based Linux distribution like Crunchbang, it’s quite easy to determine which package a specific file on the system belongs to. Issuing dpkg-query -S /usr/lib32/gconv/UNICODE.so on my desktop system tells me, that the file belongs to the libc6-i386 package - the 32 bit shared libraries for AMD64.

Sometimes, however, it comes handy to know what package a file not present on the filesystem belongs to. One prominent example is the tunctl program, that allows the creation of TUN/TAP interfaces on Linux. A search for tunctl via aptitude search tunctl doesn’t yield any results. That’s where apt-file comes into play. After installing it via aptitude install apt-file and updating the index with apt-file update, we can start a more advanced search. Using apt-file find tunctl, we get the info that the file /usr/sbin/tunctl is included in the uml-utilities package. To allow more sophisticated searches, apt-file offers various advanced options, e.g., for case insensitive searches based on regular expressions.

Sudoku

Solving sudoku puzzles usually requires nothing more than a pen and some time. Solving 50 sudoku puzzles, however, requires a huge amount of time. 50 sudoku puzzles...? Yep, to get the solution for Problem 96 on ProjectEuler, 50 sudokus need to be solved first. I’ve solved my first 49 problems on ProjectEuler a few years ago and recently rediscovered the website. So I started with the sudoku problem and got the solution quite fast by using a simple brute force algorithm. I’m not going to post the solution for the problem, but just the C++ code for my sudoku solver. After compiling the program with g++ -std=c++11 -O6 -o sudokusolver sudokusolver.cpp, sudokus given in an input file are solved with a simple recursion based algorithm. An example is given below:

Trying to solve:
---------
003020600
900305001
001806400
008102900
700000008
006708200
002609500
800203009
005010300
---------
    |
    V
---------
483921657
967345821
251876493
548132976
729564138
136798245
372689514
814253769
695417382
---------

The sudokus in the input file must consist of 9 consecutive lines containing the initial values inside the sudoku. Blank fields are represented by zero. Multiple sudokus have to be separated by at least one dash in a single line.

Gnuplot

Some years ago, when I had to create plots from measurement data for a student research project I first started using Gnuplot. Although several powerful alternatives exist, I never felt the need to switch away from Gnuplot and I’m still using it for all my plotting tasks. In almost all cases, these tasks consist of visualizing data that was acquired with the help of other tools, such as Iperf. Sometimes this data is not suitable to be feed directly into Gnuplot, but requires some preprocessing first. So my usual approach was to write a Bash or Perl script to preprocess and sanitize the collected data and then dynamically generate a Gnuplot script for this specific data set. Depending on data, however, this felt like a cumbersome and suboptimal approach. The two scripts res.pl and wrapper.sh illustrate this point. The Perl script is called by the bash script and transforms the input data into a whitespace seperated data set. The bash script then extracts several additional information, e.g. the axis labels. Then, a Gnuplot script is generated and executed to finally create the plot. Although this works, a more elegant way would have been the utilization of a gnuplot library, such as Chart::Graph::Gnuplot. I’ve written the small demo script frequency.pl to illustrate the usage of the aforementioned library. The script counts the character frequency of an input file and creates a plot like the following from the collected data.

http://pdes-net.org/x-haui/pics/frequency.png

All in all, the usage of this library feels much more comfortable, especially when dealing with poorly formatted input data that requires a great amount of preprocessing. Of course, Gnuplot bindings for languages other than Perl do exist as well.