Torsten on Tech2020-01-02T00:00:00Zhttps://torstencurdt.com/tech/feed.xmlTorsten Curdthi@torstencurdt.comhttps://torstencurdt.com/tech/posts/why-javadocs-suck/Why Javadocs Suck2005-03-23T00:00:00Z<p>Over the years I’ve been exposed to a number of Java code bases. And over time it has become painfully obvious that</p>
<blockquote>
<p>Javadocs just suck. Not as a concept, but what most teams make out of them.</p>
</blockquote>
<p>The main problem with documentation is that it needs to be kept up to date. Having it separate from the code makes it really hard to maintain. Which means the idea behind Javadocs is not so bad. But let’s have a look at the following code and Javadoc snippet:</p>
<pre><code>public class FooEntry implements ArchiveEntry {

    /** The entry's name. */
    private String name = "";

    /**
     * Construct an entry with only a name.
     *
     * @param name the entry name
     */
    public FooEntry(String name) {
        this.name = name;
    }

    /**
     * Get this entry's name.
     *
     * @return This entry's name.
     */
    public String getName() {
        return name;
    }

    /**
     * Write an entry's header information to a stream.
     *
     * @param out an OutputStream to write to
     * @throws IOException on error
     */
    public void writeEntryHeader(OutputStream out)
        throws IOException {
        // write first byte
        out.write(1);
        // write second byte
        out.write(2);
    }
}
</code></pre>
<p>What’s the benefit of having Javadocs like the above? They are mostly re-iterating what is already expressed in code. A lot of it is nothing more than redundant boilerplate. Usually we programmers <span class="nobr">-as a species-</span> hate redundancy. It makes you wonder.</p>
<blockquote>
<p>Why do some developers write such bad Javadocs?</p>
</blockquote>
<p>For one there is a good chance they <em>did not</em> write (much of) these docs in the first place - instead they just let their IDE generate them. I wouldn’t be surprised if the primary goal was to have just enough Javadocs in place to let checkstyle pass the build.</p>
<p>Of course the above snippet is a bit exaggerated and abstracted, but we all know better than to call code like this unheard of.</p>
<h2>How to write better Javadocs?</h2>
<p>First and foremost just empathize with the person reading and trying to understand your code.</p>
<p>Focus on self-explanatory class, function and variable names. Often that’s even better than comments, because the explanation is visible not only at the declaration but with every use.
That’s the path to <a href="https://martinfowler.com/bliki/CodeAsDocumentation.html">code as documentation</a>. A lot of comments and Javadocs then become duplication and can simply be removed. This lets developers see the code flow with less visual clutter.</p>
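<p>A contrived example (the names are made up) of a name doing the job of a comment:</p>
<pre><code>// needs a comment because the name says nothing
int d; // elapsed time in days

// the name carries the explanation - no comment needed
int elapsedTimeInDays;
</code></pre>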
<p>Explain algorithms and usage as much as possible at the <em>class level</em>. Focus on the <em>Why?</em>, explain the important parts of the <em>How?</em> and give <em>Notice</em> of any things important for usage. This helps people looking for usage information and it helps developers digging into the implementation.</p>
<p>If you want to enforce proper Javadocs with checkstyle, do it only for <em>non-private</em> or even just <em>public</em> scoped elements.</p>
<p>For the above example it could be as simple as this:</p>
<pre><code>/**
 * Explain what "FooArchiveEntry" is used for.
 * Explain how it works.
 * Mention things like thread-safety and other usage specifics.
 */
public class FooArchiveEntry implements ArchiveEntry {

    /*
      Implementation specifics should be explained here.
    */
    private String name = "";

    public FooArchiveEntry(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    public void writeEntryHeader(OutputStream out)
        throws IOException {
        out.write(1);
        out.write(2);
    }
}
</code></pre>
<p>While this post focused purely on Java, the concept is of course applicable to other languages as well. As with many things it’s the mindset that counts.</p>
https://torstencurdt.com/tech/posts/duff-device/The Duff Device2006-01-10T00:00:00Z<p>“<a href="https://en.wikipedia.org/wiki/Duff%27s_device">Duff’s Device</a>” was discovered by Tom Duff in November 1983. It’s a technique more commonly seen in assembly code. It is used to reduce the counting and comparison overhead of loops - similar to <a href="https://en.wikipedia.org/wiki/Loop_unrolling">loop unrolling</a>.</p>
<p>Let’s say we start with a simple loop to copy some memory.</p>
<pre><code>for (int i = 0; i < len; i++) {
    *output++ = *input++;
}
</code></pre>
<p>For longer loops, the overhead of checking the loop condition adds up: on every iteration the loop counter needs to be compared against <code>len</code>.</p>
<p>This is what the same loop looks like using <a href="https://en.wikipedia.org/wiki/Duff%27s_device">Duff’s Device</a>.</p>
<pre><code>int n = (len + 8 - 1) / 8;
switch (len % 8) {
case 0: do { *output++ = *input++;
case 7:      *output++ = *input++;
case 6:      *output++ = *input++;
case 5:      *output++ = *input++;
case 4:      *output++ = *input++;
case 3:      *output++ = *input++;
case 2:      *output++ = *input++;
case 1:      *output++ = *input++;
        } while (--n > 0);
}
</code></pre>
<p>It looks like someone tricked the parser, but for better or worse it really is a valid C program. When you disentangle the two constructs (and adjust the loop count accordingly) it becomes a little easier to follow what is going on.</p>
<pre><code>// copy the len % 8 remainder bytes first
switch (len % 8) {
case 7: *output++ = *input++;
case 6: *output++ = *input++;
case 5: *output++ = *input++;
case 4: *output++ = *input++;
case 3: *output++ = *input++;
case 2: *output++ = *input++;
case 1: *output++ = *input++;
}

// then copy the remaining full blocks of 8 bytes
int n = len / 8;
while (n-- > 0) {
    *output++ = *input++;
    *output++ = *input++;
    *output++ = *input++;
    *output++ = *input++;
    *output++ = *input++;
    *output++ = *input++;
    *output++ = *input++;
    *output++ = *input++;
}
</code></pre>
<p>In this example, we are left with only 1/8 of the original loop overhead. After the calculation of the remainder, the <code>switch</code> acts as a jump table to first copy the remainder bytes. After that, it’s just a loop of 8 copy instructions per iteration until the copy is complete.</p>
<p>I’ve chosen a memory copy operation only for demonstration purposes. In the real world, the standard C library version of <code>memcpy</code> should be preferred. It may contain architecture-specific optimizations that could still make it significantly faster.</p>
<p>For other applications, it’s an interesting way of code optimization. That said if the code duplication is not too bad I would always prefer the disentangled version for clarity reasons. And as always it’s important to measure before doing any optimizations. It’s hard to know what optimizations the compiler will already apply automatically.</p>
https://torstencurdt.com/tech/posts/ditto-vs-zip/Compress using ditto vs zip2006-05-02T00:00:00Z<p>In versions of macOS up until 10.4 the <code>zip</code> command did not support <a href="https://en.wikipedia.org/wiki/Resource_fork">resource forks</a> at all. Which always was a bit strange given they are a remnant of Apple’s classic Mac OS. Resource forks were a feature of the MFS and HFS file systems that allowed storing metadata alongside files.</p>
<p>In order to zip up folders containing <em>all</em> data, you were meant to use <code>ditto</code> instead. Which is what the Finder uses to create archives, too.</p>
<pre><code>ditto -c -k -X --rsrc some_folder some_folder.zip
</code></pre>
<p>Times have changed slightly though. The standard <code>zip</code> now does support resource forks and copying resource forks is now the default behavior for <code>ditto</code>. If you want to <em>avoid</em> copying the additional metadata <code>--norsrc</code> is your friend.</p>
<pre><code>--rsrc
Preserve resource forks and HFS meta-data. ditto will
store this data in Carbon-compatible ._ AppleDouble
files on filesystems that do not natively support
resource forks. As of Mac OS X 10.4,
--rsrc is default behavior.
--norsrc
Do not preserve resource forks and HFS meta-data.
If both --norsrc and --rsrc are passed, whichever
is passed last will take precedence. Both options
override DITTONORSRC. Unless explicitly specified,
--norsrc also implies --noextattr and --noacl to
match the behavior of Mac OS X 10.4.
</code></pre>
<p>And the idea of resource forks did not die with the introduction of <a href="https://en.wikipedia.org/wiki/Apple_File_System">APFS</a>. APFS also supports what are now called <em>extended attributes</em>. If you ever wondered how macOS remembers when you downloaded a file from the internet - that’s how. And Finder makes use of extended attributes to store things like tags, for example. They are the quasi equivalent of resource forks in this brave new world.</p>
<p>The command line tool <code>xattr</code> can be used to read and write the attributes.</p>
<pre><code>$ echo "foo" > foo
$ xattr -w attr_name attr_value foo
$ xattr -p attr_name foo
</code></pre>
<p>Checking the behaviour of <code>zip</code> and <code>ditto</code> on APFS, the special <code>._foo</code> file (which holds the extended attributes data) is unfortunately only included in the zip created by <code>ditto</code>.</p>
<pre><code>$ zip foo-zip.zip foo
$ ditto -ck foo foo-ditto.zip
$ unzip -l foo-ditto.zip
Archive: foo-ditto.zip
Length Date Time Name
--------- ---------- ----- ----
408 01-17-2020 00:41 foo
154 01-17-2020 00:41 ._foo
--------- -------
562 2 files
</code></pre>
<p>In the end there are now resource forks, extended attributes and a new file system. And still you should use <code>ditto</code> over <code>zip</code> to create an archive that includes all the metadata. Somehow this feels like history repeating.</p>
<p>TL;DR: Just keep using <code>ditto</code>.</p>
https://torstencurdt.com/tech/posts/client-cert-authentication-with-java/SSL Client Cert Authentication with Java2006-10-10T00:00:00Z<p>Connecting to an https URL is easy in Java. Just create a URL object and you are ready to go. If you need to provide a client certificate it gets a little more complicated to get right. You have to create a properly set up <code>SSLSocketFactory</code> to establish an authenticated connection. For that you load the PKCS12 certificate into a keystore and provide that store to the <code>SSLContext</code>.</p>
<pre><code>private SSLSocketFactory getFactory(File pKeyFile, String pKeyPassword)
    throws IOException, GeneralSecurityException {

    KeyManagerFactory keyManagerFactory =
        KeyManagerFactory.getInstance("SunX509");

    // load the PKCS12 client certificate into a keystore
    KeyStore keyStore = KeyStore.getInstance("PKCS12");
    InputStream keyInput = new FileInputStream(pKeyFile);
    keyStore.load(keyInput, pKeyPassword.toCharArray());
    keyInput.close();

    keyManagerFactory.init(keyStore, pKeyPassword.toCharArray());

    SSLContext context = SSLContext.getInstance("TLS");
    context.init(
        keyManagerFactory.getKeyManagers(),
        null,
        new SecureRandom()
    );

    return context.getSocketFactory();
}

URL url = new URL("https://someurl");
HttpsURLConnection con = (HttpsURLConnection) url.openConnection();
con.setSSLSocketFactory(getFactory(new File("file.p12"), "secret"));
</code></pre>
<p>If the client certificate was issued by your private CA you also need to
make sure the full certificate chain is in your JVMs keystore.</p>
<pre><code>STORE=/path/to/JRE/cacerts
sudo keytool -importcert \
-trustcacerts \
-keystore $STORE \
-storepass changeit \
-noprompt \
-file myca.pem \
-alias myca
</code></pre>
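<p>If modifying the JVM’s global <code>cacerts</code> is not an option, the trust can also be supplied programmatically. A minimal sketch (the truststore path and password are placeholders) that passes a <code>TrustManagerFactory</code> as the second argument to <code>SSLContext.init</code> instead of the <code>null</code> used above:</p>
<pre><code>KeyStore trustStore = KeyStore.getInstance("JKS");
InputStream trustInput = new FileInputStream("/path/to/truststore.jks");
trustStore.load(trustInput, "changeit".toCharArray());
trustInput.close();

TrustManagerFactory trustManagerFactory =
    TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
trustManagerFactory.init(trustStore);

context.init(
    keyManagerFactory.getKeyManagers(),
    trustManagerFactory.getTrustManagers(),
    new SecureRandom()
);
</code></pre>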
https://torstencurdt.com/tech/posts/building-debian-packages-in-java/Building Debian Packages in Java2007-01-19T00:00:00Z<p>Using native packaging formats to install Java applications leverages existing and well-tested tools and infrastructure for software deployment. But one of the perks of Java has always been the cross-platform build. Using the standard native tools to create these native packages breaks the promise of a cross-platform Java build.</p>
<p>With the <a href="https://github.com/tcurdt/jdeb">jdeb</a> project there is a way to create native Debian packages directly from your <a href="https://ant.apache.org/">ant</a> or <a href="https://maven.apache.org/">maven</a> build. It lets you create</p>
<blockquote>
<p>Debian packages without using the native tools.</p>
</blockquote>
<p>So whether you build on Linux, Windows or macOS you get a valid package that is ready to be deployed.</p>
<p>The only requirement is to add a control file that provides metadata about the package. It declares things like name, version and most importantly dependencies.</p>
<pre><code>Package: [[name]]
Version: [[version]]
Section: misc
Priority: low
Architecture: all
Description: [[description]]
Maintainer: Torsten Curdt <tcurdt@foo.org>
Depends: default-jre | java6-runtime
</code></pre>
<p>The integration is easy with <a href="https://ant.apache.org/">ant</a>. Just specify the paths and provide the data that should get included in the package.</p>
<pre><code><deb destfile="${build.dir}/${ant.project.name}.deb"
     control="${build.dir}/deb/control"
     verbose="true">
    <data type="file"
          src="${build.dir}/jar/${ant.project.name}-${version}.jar">
        <mapper type="perm" prefix="/usr/share/jdeb/lib"/>
    </data>
</deb>
</code></pre>
<p>With <a href="https://maven.apache.org/">maven</a> jdeb attaches to the <em>package</em> phase. A build now creates the <em>jar</em>, then builds the package and attaches it as a secondary artifact.</p>
<pre><code><plugin>
    <artifactId>jdeb</artifactId>
    <groupId>org.vafer</groupId>
    <version>1.7</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>jdeb</goal>
            </goals>
            <configuration>
                <controlDir>${basedir}/src/deb/control</controlDir>
                <dataSet>
                    <data>
                        <src>${project.build.directory}/${project.build.finalName}.jar</src>
                        <type>file</type>
                        <mapper>
                            <type>perm</type>
                            <prefix>/usr/share/jdeb/lib</prefix>
                            <user>loader</user>
                            <group>loader</group>
                            <filemode>640</filemode>
                        </mapper>
                    </data>
                </dataSet>
            </configuration>
        </execution>
    </executions>
</plugin>
</code></pre>
<p>For more information check out the <a href="https://github.com/tcurdt/jdeb/tree/master/docs">documentation</a> and the <a href="https://github.com/tcurdt/jdeb/tree/master/src/examples">examples</a> or join the community for a chat on <a href="https://gitter.im/tcurdt/jdeb">gitter</a>.</p>
https://torstencurdt.com/tech/posts/native-file-locks-in-java/Native File Locks in Java2007-01-19T00:00:00Z<p>Since Java 1.4 there is a native IO layer. One of the things it allows is creating native file locks that are acknowledged by both <code>fcntl</code> and <code>flock</code> style locking. This is tremendously helpful if you need to share resources with native programs. What is in C</p>
<pre><code>int fd = open("/path/to/file", O_RDWR);
if (flock(fd,LOCK_EX) != 0 ) { ... }
printf("locked file\npress return");
char c = getchar();
if (flock(fd,LOCK_UN) != 0 ) { ... }
printf("released file\n");
close(fd);
</code></pre>
<p>and</p>
<pre><code>int fd = open("/path/to/file", O_RDWR);
struct flock lock;
lock.l_type = F_WRLCK;
lock.l_whence = SEEK_SET;
lock.l_start = 0;
lock.l_len = 0;
lock.l_pid = 0;
if (fcntl(fd, F_SETLK, &lock) == -1) { ... }
printf("locked file\npress return");
char c = getchar();
lock.l_type = F_UNLCK;
if (fcntl(fd, F_SETLK, &lock) == -1) { ... }
printf("released file\n");
close(fd);
</code></pre>
<p>becomes in java</p>
<pre><code>File file = new File("/path/to/file");
FileChannel channel =
    new RandomAccessFile(file, "rw").getChannel();

FileLock lock = channel.lock();
System.out.println("locked file\npress return");
System.in.read();
lock.release();
System.out.println("released file");

channel.close();
</code></pre>
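<p>If blocking on the lock is not an option, <code>tryLock</code> only acquires the lock when it is immediately available - a short sketch:</p>
<pre><code>FileLock lock = channel.tryLock();
if (lock == null) {
    // held by another program
    System.out.println("file is locked elsewhere");
} else {
    try {
        // ... work with the locked file
    } finally {
        lock.release();
    }
}
</code></pre>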
https://torstencurdt.com/tech/posts/recursive-file-listing-java/Recursive file listing in java2007-11-12T00:00:00Z<p>When the first version of this post was published, a search on the web for “recursive file java” returned only horrible examples on how to implement directory traversal. I published a version based on anonymous classes to improve on that. More than 10 years later it’s time to re-evaluate.</p>
<h2>Using Streams</h2>
<p>With <code>File.walk</code> java has gotten a <a href="https://docs.oracle.com/javase/tutorial/essential/io/walk.html">streaming API</a> that visits the given path in a depth-first manner. The most commonly found example is simple and straight forward.</p>
<pre><code>Files.walk(Paths.get(path))
    .filter(Files::isRegularFile)
    .forEach(System.out::println);
</code></pre>
<p>Unfortunately, a “hello world” example like this hides the shortcomings in real-world usage scenarios. While you can filter the emitted paths of the stream, there is no way to skip directories during traversal. Even worse - there is <a href="https://bugs.openjdk.org/browse/JDK-8039910">no way to gracefully handle exceptions</a> that are bound to happen. With <code>Files.find</code> you can pass in a predicate that might help to reduce redundant retrieval of file attributes - but other than that it suffers from the very same problems. I’d recommend avoiding both API methods because of that.</p>
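<p>To make the exception problem concrete: a single unreadable directory aborts the whole stream, and there is no hook to skip it and continue. A sketch of what callers are left with:</p>
<pre><code>try (Stream<Path> stream = Files.walk(Paths.get("/"))) {
    stream.filter(Files::isRegularFile)
          .forEach(System.out::println);
} catch (UncheckedIOException e) {
    // e.g. a single AccessDeniedException kills the whole traversal
    System.err.println("walk aborted: " + e.getCause());
}
</code></pre>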
<h2>Using the Visitor</h2>
<p>This leaves us with <a href="https://docs.oracle.com/javase/10/docs/api/java/nio/file/Files.html#walkFileTree(java.nio.file.Path,java.nio.file.FileVisitor)"><code>Files.walkFileTree</code></a> which accepts a <a href="https://docs.oracle.com/javase/10/docs/api/java/nio/file/FileVisitor.html"><code>FileVisitor</code></a> instead of returning a stream.</p>
<pre><code>FileTraversal visitor = new FileTraversal();
Files.walkFileTree(Paths.get("/"), visitor);
</code></pre>
<p>I’ve never been a fan of the formal implementation of the Visitor pattern and even extending the <a href="https://docs.oracle.com/javase/10/docs/api/java/nio/file/SimpleFileVisitor.html"><code>SimpleFileVisitor</code></a> only helps to reduce some of the cruft of the interface.</p>
<pre><code>public class FileTraversal extends SimpleFileVisitor<Path> {

    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
        throws IOException {
        return onDirectory(dir, attrs)
            ? FileVisitResult.CONTINUE
            : FileVisitResult.SKIP_SUBTREE;
    }

    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
        throws IOException {
        onFile(file, attrs);
        return FileVisitResult.CONTINUE;
    }

    public boolean onDirectory(Path dir, BasicFileAttributes attrs) {
        return true;
    }

    public void onFile(Path file, BasicFileAttributes attrs) {
    }

    public void traverse(Path dir) throws IOException {
        Files.walkFileTree(dir, this);
    }
}

new FileTraversal() {
    public boolean onDirectory(Path dir, BasicFileAttributes attrs) {
        System.out.println("dir: " + dir);
        return true;
    }
    public void onFile(Path file, BasicFileAttributes attrs) {
        System.out.println("file: " + file);
    }
}.traverse(Paths.get("/"));
</code></pre>
<p>By basing it off the <a href="https://docs.oracle.com/javase/10/docs/api/java/nio/file/SimpleFileVisitor.html"><code>SimpleFileVisitor</code></a> it comes quite close to the original version using the old File interface. Except that I find the old code still much easier to read and understand. In comparison, I do believe the old code has aged gracefully.</p>
<pre><code>public class FileTraversal {

    public final void traverse(File f) throws IOException {
        if (f.isDirectory()) {
            if (onDirectory(f)) {
                for (File child : f.listFiles()) {
                    traverse(child);
                }
                return;
            }
        }
        onFile(f);
    }

    public boolean onDirectory(File d) {
        return true;
    }

    public void onFile(File f) {
    }
}
</code></pre>
<p>While the anonymous class approach never felt great, it was quite effective. Despite being a believer in delegation over inheritance, I could not come up with a concise version that mirrors its clear elegance.</p>
<h2>Conclusion</h2>
<p>What should you use today? A fluent API implementation of the <a href="https://docs.oracle.com/javase/10/docs/api/java/nio/file/FileVisitor.html"><code>FileVisitor</code></a> interface in combination with lambdas is probably the most compact version as of today - if you think about the usage and don’t look at the <a href="https://torstencurdt.com/tech/posts/recursive-file-listing-java/lambda.java">bloated implementation</a> that is.</p>
<pre><code>new FileTraversal()
    .onDir((path, attrs) -> {
        return true;
    })
    .onFile((path, attrs) -> {
        System.out.println(path);
    })
    .traverse(path);
</code></pre>
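<p>For illustration, a minimal sketch of such a fluent wrapper - this is not the linked implementation, just the idea boiled down to <code>java.util.function</code> types:</p>
<pre><code>public class FileTraversal {

    private BiPredicate<Path, BasicFileAttributes> onDir = (dir, attrs) -> true;
    private BiConsumer<Path, BasicFileAttributes> onFile = (file, attrs) -> {};

    public FileTraversal onDir(BiPredicate<Path, BasicFileAttributes> callback) {
        this.onDir = callback;
        return this;
    }

    public FileTraversal onFile(BiConsumer<Path, BasicFileAttributes> callback) {
        this.onFile = callback;
        return this;
    }

    public void traverse(Path start) throws IOException {
        Files.walkFileTree(start, new SimpleFileVisitor<Path>() {
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                return onDir.test(dir, attrs)
                    ? FileVisitResult.CONTINUE
                    : FileVisitResult.SKIP_SUBTREE;
            }
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                onFile.accept(file, attrs);
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
</code></pre>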
<p>At least in theory the visitor approach should also allow for some easy parallelisation. If you enjoy simplicity I’d argue there is no need to replace the old code - unless you have a good reason to. If performance is a major concern it might be worth considering the upgrade.</p>
https://torstencurdt.com/tech/posts/not-enough-entropy/Not Enough Entropy?2007-12-19T00:00:00Z<p>When establishing a TLS/SSL connection gets stuck, there is a good chance that you are just lacking <a href="https://en.wikipedia.org/wiki/Entropy_(computing)">entropy</a>. The tricky part is that you will not find anything about this in the logs. If you experience such problems on Linux, the first thing to check is whether there is enough entropy available.</p>
<pre><code>cat /proc/sys/kernel/random/entropy_avail
</code></pre>
<p>If the number is below 1000, that might be your problem. It means that the system does not collect randomness fast enough for cryptographically secure communication - and reads from <code>/dev/random</code> block until it does.</p>
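<p>Java services are a classic victim: a <code>SecureRandom</code> backed by <code>/dev/random</code> blocks once the pool is depleted. A small sketch to get a feeling for it on a modern JVM - the timing will vary wildly between systems:</p>
<pre><code>import java.security.SecureRandom;

public class EntropyCheck {
    public static void main(String[] args) throws Exception {
        // a blocking source - may stall if the entropy pool runs dry
        SecureRandom random = SecureRandom.getInstanceStrong();
        long start = System.nanoTime();
        random.generateSeed(32);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("32 seed bytes took " + millis + " ms");
    }
}
</code></pre>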
<h2>What is Entropy?</h2>
<p>While randomness is a familiar concept, the distinction from entropy is less clear. The material from a <a href="https://www.blackhat.com/docs/us-15/materials/us-15-Potter-Understanding-And-Managing-Entropy-Usage-wp.pdf">Blackhat session</a> is not just really interesting, it also explains the difference in a surprisingly simple way.</p>
<dl>
<dt>Entropy</dt>
<dd>is the uncertainty of a future outcome.</dd>
<dt>Randomness</dt>
<dd>is the quality of uncertainty of historic outcomes.</dd>
</dl>
<h2>Why not use PRNG?</h2>
<p>An easy but terrible workaround is to only use a pseudo-random generator.</p>
<pre><code>mv /dev/random /dev/random.old
ln -s /dev/urandom /dev/random
</code></pre>
<p>By definition, these pseudo-random numbers cannot be really random. Mouse movements, key presses, audio or video input or disk access can be sources for proper randomness. There are dedicated daemons like <a href="https://www.issihosts.com/haveged/">haveged</a>, <a href="http://egd.sourceforge.net/">egd</a>, <a href="http://prngd.sourceforge.net/">prngd</a> or <a href="https://wiki.debian.org/Entropy">others</a> that do just that. Unfortunately on a virtual server, your mileage may vary.</p>
<p>Cryptography algorithms rely heavily on access to high quality random numbers. They are needed for key and nonce creation. I remember a time when I was asked to move the mouse to create a good encryption key - go figure. If the bandwidth or quality of the random numbers suffers, the security of the underlying system can be compromised.</p>
<p>To see how bad this can get, look at <a href="https://rdist.root.org/2009/05/17/the-debian-pgp-disaster-that-almost-was/">what (almost) happened at Debian</a> in 2009. It demonstrates very clearly how bad the “workaround” from above really is.</p>
<blockquote>
<p>The impact […] is that every signature generated on a vulnerable system reveals the signer’s private key</p>
</blockquote>
<p>So don’t skimp on entropy, your security may depend on it. There is enough randomness in this life - use it.</p>
https://torstencurdt.com/tech/posts/debugging-https/Logging HTTPS traffic2008-02-21T00:00:00Z<p>A first step to debugging usually is logging. The same is true for network connections. But eavesdropping is exactly what HTTPS/SSL/TLS is supposed to prevent. Let’s have a closer look if and how you debug your secured connections.</p>
<h2>The Web Scenario</h2>
<p>The HTTPS connection is kind of like a tunnel. On both ends you can see what goes in and what comes out. At one of these ends is the browser. It has full access to all the content. This is why, if you are debugging a web application, a browser extension is probably the best, or at least the easiest choice.</p>
<figure data-type="image"><img src="https://torstencurdt.com/tech/posts/debugging-https/browser.svg" alt="" /><figcaption>Listening from within the Browser</figcaption></figure>
<p>For both, <a href="https://chrome.google.com/webstore/detail/live-http-headers/ianhploojoffmpcpilhgpacbeaifanid">Chrome</a> and <a href="https://addons.mozilla.org/en-US/firefox/addon/http-header-live/">Firefox</a>, extensions are available that allow you to inspect the communication beyond what is already available through the built-in developer tools.</p>
<h2>The Mobile App Scenario</h2>
<p>If a browser extension is not feasible there isn’t much of a choice but to interrupt the tunnel. You basically need to split the tunnel into two to get another point that allows intercepting the communication. This is usually what is referred to as a “man in the middle”.</p>
<figure data-type="image"><img src="https://torstencurdt.com/tech/posts/debugging-https/mitm.svg" alt="" /><figcaption>Listening as MITM</figcaption></figure>
<p>By telling the OS to connect through a proxy, the proxy can split the tunnel. It can terminate the HTTPS connection and pretend to be the origin server. But this won’t go unnoticed. Unless - the new server can generate a fake certificate that is trusted. For the certificate to be trusted it has to be signed by one of the CAs that are considered trusted by the OS. This is quite unlikely to happen. But the proxy can use its own CA. By telling the OS to trust that CA, the generated fake certificate can pass as the real deal. And that’s basically how all the following tools work.</p>
<table>
<thead>
<tr>
<th></th>
<th>Mac</th>
<th>Linux</th>
<th>Windows</th>
<th>Price</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://www.charlesproxy.com/">Charles</a></td>
<td>y</td>
<td>y</td>
<td>y</td>
<td>~50 USD</td>
</tr>
<tr>
<td><a href="https://www.telerik.com/fiddler">Fiddler Everywhere</a></td>
<td>y</td>
<td>y</td>
<td>y</td>
<td>free or subscription</td>
</tr>
<tr>
<td><a href="https://mitmproxy.org/">mitmproxy</a></td>
<td>y</td>
<td>y</td>
<td>y (WSL)</td>
<td>free</td>
</tr>
</tbody>
</table>
<p>If you need to dig one level deeper, there is also <a href="https://github.com/droe/sslsplit">SSLsplit</a>. It supports not only HTTPS, but any SSL TCP connection.</p>
<p>I have been a happy <a href="https://www.charlesproxy.com/">Charles</a> customer for many years.</p>
<h2>Certificate Pinning</h2>
<p>As shown above, with HTTPS it is all a matter of who you trust. To prevent this kind of eavesdropping altogether some software uses certificate pinning. By checking the fingerprints of the certificate chain, a swapped certificate - even one issued by a custom CA - does not go unnoticed.</p>
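<p>The core of that check is simple. A sketch in Java that compares the fingerprint of the certificate the server actually presented against a value pinned at build time (the URL and <code>PINNED_FINGERPRINT</code> are placeholders):</p>
<pre><code>HttpsURLConnection con = (HttpsURLConnection)
    new URL("https://example.com").openConnection();
con.connect();

// hash the leaf certificate as presented by the server
Certificate leaf = con.getServerCertificates()[0];
byte[] digest = MessageDigest.getInstance("SHA-256").digest(leaf.getEncoded());
String fingerprint = Base64.getEncoder().encodeToString(digest);

if (!fingerprint.equals(PINNED_FINGERPRINT)) {
    throw new SecurityException("unexpected certificate: " + fingerprint);
}
</code></pre>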
https://torstencurdt.com/tech/posts/java-classpath-and-directories/Java classpath and directories2008-12-03T00:00:00Z<p>The Java classpath has always been quite a sad story. Countless shell scripts have been written just to start Java applications.</p>
<p>Of course we were always able to pass a directory of classes to the jvm like this:</p>
<pre><code>java -classpath dirwithclasses Main
</code></pre>
<p>But what you usually deal with are jars. Often quite a bunch of them. If you add them the same way</p>
<pre><code>java -classpath dirwithclasses:dirwithjars Main
</code></pre>
<p>then up until Java 5 all you got was a <em>ClassNotFoundException</em>. Java did not search directories for jars, only for classes - the jars were completely ignored. So pretty much everyone ended up providing yet another shell script to build up the classpath and pass it to the JVM on the command line. Another kind of classpath hell.</p>
<pre><code>java -classpath lib/commons-logging-1.1.1.jar:lib/commons-jci-core-1.0.jar:commons-io-1.2.jar:lib/junit-3.8.2.jar:lib/maven-project-2.0.8.jar:lib/rhino-js-1.6.5.jar: ...
</code></pre>
<p>Since Java 6 you can use wildcards in the classpath.</p>
<pre><code>java -classpath classes:lib/'*' Main
</code></pre>
<p>Naturally this means the order of the jars is implicit. Which in turn means you need to be extra careful how you name your jars. But it’s a nice feature that should have been there from day one.</p>
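<p>When in doubt about what the JVM actually ended up with, the effective classpath can be printed from within the application:</p>
<pre><code>public class Main {
    public static void main(String[] args) {
        // the classpath as seen by the running JVM
        System.out.println(System.getProperty("java.class.path"));
    }
}
</code></pre>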
https://torstencurdt.com/tech/posts/random-lines-from-large-files/Random Lines from Large Files2011-05-13T00:00:00Z<p>When working with big data taking samples is the only road to quick answers. Unfortunately, that already poses a bigger hurdle than it should be. When you ask people how to get a random sample of lines from a file you will most likely get these as an answer:</p>
<pre><code>cat file.txt | shuf -n 10 | head -n 10
cat file.txt | sort --random-sort | head -n 10
</code></pre>
<p>Unsurprisingly, <code>sort</code> and big data do not mix all that well. And even <code>shuf</code> <a href="https://github.com/coreutils/coreutils/blob/master/src/shuf.c">reads the whole input file into memory</a> first.</p>
<p>But if using <em>stdin</em> is not a requirement, providing a file allows for seeking and reading the file size. This in turn allows for picking random positions based not on lines but on byte positions. A quick seek, then skip the remainder of the current line and output the next full line as the random pick.</p>
<p>Simple and fast - even on big files.</p>
<pre><code>lines = 10
filename = "filename.txt"
filesize = File.size(filename)

# random byte positions, sorted so reads are sequential
positions = lines.times.map { rand(filesize) }.sort

File.open(filename) do |file|
  positions.each do |pos|
    file.pos = [0, pos - 1].max
    file.gets       # skip the remainder of the current line
    puts file.gets  # print the next full line
  end
end
</code></pre>
<aside>
<p>If the line lengths are fairly evenly distributed this approach is fine. If not, lines that follow long lines are more likely to be picked, which skews the randomness of the sample. Whether that’s really a problem for taking samples is not so easy to answer - but surely something to be conscious of.</p>
</aside>
https://torstencurdt.com/tech/posts/efficient-time-series-data/Efficient Time Series Data2011-05-15T00:00:00Z<p>Many sites provide statistics for their users’ contents. And many of them provide a graph of how many events (plays/views/visits) the content got over time.</p>
<p>This is easy when aggregating over the raw events data. But that’s neither particularly fast nor scalable. As soon as the number of events or the time window gets too big this approach is no longer feasible.</p>
<p>A natural choice is to roll up the data into time buckets. By keeping counters per time unit, the number of operations can be kept at a minimum and still provide fast aggregation over large time windows. The time needed then no longer depends on the actual number of events.</p>
<p>Calculating the aggregate then becomes as simple as adding a couple of counters. It’s the sum of the most covering aggregates that fit into the given time interval.</p>
<p>Getting the total count for the interval from 2008-11-31 until 2011-02-03 means getting 8 values and adding them.</p>
<figure data-type="image"><img src="https://torstencurdt.com/tech/posts/efficient-time-series-data/timeseries.svg" alt="" /><figcaption>The 8 rollup values used by the query</figcaption></figure>
<p>Choosing the time units here defines the read pattern and the number of counters needed. A natural combination of bucket units would, of course, be <em>year+month+day</em> but <em>year+week+day</em> could be an option, too. Even just rolling up the counts into <em>day</em> buckets could make sense. The icing on the cake is when the units can be kept abstracted enough and switching between models is possible without bigger efforts.</p>
<p>But for an informed decision, it’s best to check against real user request and to calculate the required storage needs.</p>
<h2>Rollups with Map Reduce</h2>
<p>While the read path should be clear, the events still need to be rolled up. A naive approach is to increment counters as the events arrive. A fan-out where a single event causes multiple increment operations:</p>
<pre><code>event at 2011-03-14 =>
    2011       += 1
    2011-03    += 1
    2011-03-14 += 1
</code></pre>
<p>This puts a lot of write pressure on the store. Of course, an increment is not just a write but also a read. And on top of all that, these increments should happen in a single transaction. And this still doesn’t account for the read load.</p>
<p>Another option is to generate the counters from log files. Which could be implemented with a simple map-reduce job. The mapper emits the data for the track and/or global counters.</p>
<pre><code>func mapper(lines [][]string) [][]string {
    var result [][]string
    for _, line := range lines {
        date := line[0]
        track := line[1]
        // per track counters
        result = append(result, []string{track, date})
        // global counters (if needed)
        result = append(result, []string{"*", date})
    }
    return result
}
</code></pre>
<p>Next, the data gets sorted. A distributed setup also needs a partitioner to split the data up per <em>track+year</em>. This guarantees that a single reducer sees the data of a full year per track. Otherwise, a reducer could not calculate the full rollup for a year.</p>
<p>The reducer is simple but the real workhorse. It detects changes in the incoming data, does the rollups and emits the final counts.</p>
<pre><code>func reducer(lines [][]string) {
    counter_d := 1
    counter_m := 1
    counter_y := 1
    prev_d := ""
    prev_m := ""
    prev_y := ""
    prev_t := ""
    for i, line := range lines {
        date := line[1]
        d := date[0:10]
        m := date[0:7]
        y := date[0:4]
        t := line[0]
        if i != 0 {
            if t != prev_t {
                // a new track starts - flush all counters of the previous track
                write(prev_t, prev_d, counter_d)
                write(prev_t, prev_m, counter_m)
                write(prev_t, prev_y, counter_y)
                counter_d = 1
                counter_m = 1
                counter_y = 1
            } else {
                if d != prev_d {
                    write(prev_t, prev_d, counter_d)
                    counter_d = 1
                } else {
                    counter_d += 1
                }
                if m != prev_m {
                    write(prev_t, prev_m, counter_m)
                    counter_m = 1
                } else {
                    counter_m += 1
                }
                if y != prev_y {
                    write(prev_t, prev_y, counter_y)
                    counter_y = 1
                } else {
                    counter_y += 1
                }
            }
        }
        prev_d = d
        prev_m = m
        prev_y = y
        prev_t = t
    }
    if prev_t != "" {
        // flush the counters of the last track
        write(prev_t, prev_d, counter_d)
        write(prev_t, prev_m, counter_m)
        write(prev_t, prev_y, counter_y)
    }
}
</code></pre>
<p>Ideally, this job will create the data that is ready for a bulk import into a fast store that can then serve the web requests in a timely manner.</p>
<p>The code is just for demonstration purposes and not suited for real-world usage. But here is <a href="https://torstencurdt.com/tech/posts/efficient-time-series-data/timeseries.go">the full code</a> in <a href="https://go.dev/play/p/PTV7ee2frlM">a playground</a> to see the example in action.</p>
<aside>
<p>If the presented approach is being used for global counters it means that all the events of a <em>full</em> year need to pass through a <em>single</em> reducer - a <em>single</em> machine. This presents a hard limit on processing time that will not decrease by growing the cluster. To eliminate the hard limit and thereby improve the scalability, the partitioner approach is not enough. Instead, a secondary job is needed to combine the rollups. In the end, this is just a batch job - so it really depends on the exact use case whether the added complexity is really needed or not.</p>
</aside>
<h2>From Batch to Real Time</h2>
<p>Calculating the rollups purely in a batch job creates a lag. Whether that is acceptable or not depends on the use case but often enough real time stats are now the requirement.</p>
<p>With a hybrid approach, a <a href="https://en.wikipedia.org/wiki/Lambda_architecture">lambda architecture</a> can close the gap from batch to real-time. The batch layer provides the rollups up until a certain point in time. Everything past is served from a fast in-memory layer.</p>
<figure data-type="image"><img src="https://torstencurdt.com/tech/posts/efficient-time-series-data/realtime.svg" alt="" /><figcaption>Combining Rollups and Real Time Counters</figcaption></figure>
<p>The in-memory layer can even be transient as the eventual source of truth will always be the batch layer. A <a href="https://developers.soundcloud.com/blog/keeping-counts-in-sync/">similar architecture</a> is serving the play count statistics at <a href="https://soundcloud.com/">SoundCloud</a>.</p>
https://torstencurdt.com/tech/posts/master-slave-consistency-to-scale-mysql/Consistency to Scale MySQL2011-06-14T00:00:00Z<p>Scaling out a database usually means creating a master-slave setup where the slaves replicate information from the master. This naturally creates a replication lag. Depending on the load of the systems, the slaves will be behind the master by anything between seconds and minutes - or even more.</p>
<figure data-type="image"><img src="https://torstencurdt.com/tech/posts/master-slave-consistency-to-scale-mysql/cluster.svg" alt="" /><figcaption>cluster</figcaption></figure>
<p>This lag creates problems in the application tier. In the above figure, only one of the slaves has caught up with the replication. Picking the wrong slave when reading can return outdated data.</p>
<p>When data consistency in the user interaction is required applications usually fall back to the master. When the user changes a setting the subsequent request should not show stale data from a slave that is not up to speed yet.</p>
<p>For Rails, multiple master-slave adapters are available that act as transparent delegates to connect to the master-slave setup. By wrapping the database access <em>ActiveRecord</em> can be forced to go to the master when full consistency for the user interaction is required.</p>
<pre><code>ActiveRecord::Base.with_master do
...
end
</code></pre>
<p>But this can lead to many more requests to the master than what is really needed. Taking a step back what really is required most of the time is just a per-user consistency.</p>
<blockquote>
<p>Users should immediately see what they changed - not necessarily what other users have changed.</p>
</blockquote>
<p>This loosens the contract and is the basis for an idea.</p>
<p>The master server has a binlog position that increases with every write. Slaves are fully replicated when their binlog position matches the one from the master. It’s a race to catch up with an ever-increasing clock.</p>
<p>This means that if the clock on the slave matches (or is larger than) the clock of the last write of the user, the query can safely use the slave without risking inconsistencies to be presented to the user.</p>
<p>To express this contract we introduced a new construct into the adapter</p>
<pre><code>new_clock = ActiveRecord::Base.with_consistency(old_clock) do
...
end
</code></pre>
<p>Whenever there is a write the clock is increased. The <em>with_consistency</em> block is a contract that it is executed against a database that has reached the given clock. When entering such a block the master-slave adapter checks the slave replication status and makes the right choice of whether the query can be served from a slave or whether it has to go to the master. This shaves off all the unnecessary read requests to the master.</p>
<blockquote>
<p>Focus on the user’s view of the database.</p>
</blockquote>
<p>By introducing this change at <a href="https://soundcloud.com/">SoundCloud</a> we managed to reduce the query load on the master server by roughly 50% and made better use of our slaves. The master is no longer the bottleneck. The per-user clocks are stored in memcache. If the user does not have a clock yet we fall back to the master.</p>
<p>The fork of the master-slave adapter is <a href="https://github.com/soundcloud/master_slave_adapter">available on github</a> and can be used from Rails as a gem.</p>
<pre><code>gem install master_slave_adapter
</code></pre>
<p>It has been in production since June and we are more than happy with the results so far. If you run Rails on a MySQL cluster you should probably give it a try.</p>
<aside>
<p>Because it makes use of <code>SHOW MASTER STATUS</code> and <code>SHOW SLAVE STATUS</code> it’s available for MySQL only for now. But the approach should work for any other database that exposes its replication position via SQL.</p>
</aside>
https://torstencurdt.com/tech/posts/modulo-of-negative-numbers/Modulo of Negative Numbers2011-06-28T00:00:00Z<p>The modulo, often referred to as “mod”, represents the remainder of a division. In 1801 Gauss published a book covering modular arithmetic. Later a widely accepted mathematical definition was given by <a href="https://en.wikipedia.org/wiki/Donald_Knuth">Donald Knuth</a>:</p>
<pre><code>mod(a, n) = a - n * floor(a / n)
</code></pre>
<p>Doing an integer division and then multiplying again means finding the largest multiple of <code>n</code> that is not greater than <code>a</code>. Subtracting this from <code>a</code> yields the remainder of the division and by that the modulo.</p>
<p>But what does the modulo operation do? What can it be used for when coding?</p>
<h2>Restricting Bounds</h2>
<p>In programming, the modulo operator (<code>%</code> or sometimes <code>mod</code>) often is used to restrict an index to the bounds of an array or length limited data structure.</p>
<pre><code>values = [ 3, 4, 5 ]
index = 5
value_at_index = values[ index % values.length ]
</code></pre>
<p>For the above example this means <code>5 mod 3 = 2</code>, which following the definition is <code>5 - floor(5/3)*3 = 2</code>. So no matter what value <code>index</code> holds, the array bounds are respected.</p>
<p><strong>But is that really the case?</strong></p>
<p>What happens if the dividend or the divisor is signed and holds a negative value?
Turns out the rules of modulo on negative numbers indeed depend on the language you are using.</p>
<p>How does modulo work in Java? How about Javascript? Or Python?</p>
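<p>Sticking with Java for a moment, here is the pitfall from above in action (a sketch, the values are arbitrary):</p>
<pre><code>int[] values = { 3, 4, 5 };
int index = -5;

// Java truncates towards zero: -5 % 3 == -2
System.out.println(index % values.length);

// so this access is out of bounds and throws ArrayIndexOutOfBoundsException
System.out.println(values[index % values.length]);
</code></pre>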
<p>While the code looks pretty much the same in most languages, printing the results sorts the languages into mostly two camps: those that truncate the division towards zero (the result takes the sign of the dividend) and those that floor it like Knuth’s definition (the result takes the sign of the divisor).</p>
<p>Only two languages take a different stance: Dart and in particular Zig, which
<a href="https://github.com/ziglang/zig/issues/217">distinguishes both cases</a> as <code>@rem(a,b)</code> and <code>@mod(a,b)</code> and errors out on a negative divisor.</p>
<table>
<thead>
<tr>
<th>Language</th>
<th style="text-align:center">13 mod 3</th>
<th style="text-align:center">-13 mod 3</th>
<th style="text-align:center">13 mod -3</th>
<th style="text-align:center">-13 mod -3</th>
</tr>
</thead>
<tbody>
<tr>
<td>C</td>
<td style="text-align:center">1</td>
<td style="text-align:center">-1</td>
<td style="text-align:center">1</td>
<td style="text-align:center">-1</td>
</tr>
<tr>
<td>C#</td>
<td style="text-align:center">1</td>
<td style="text-align:center">-1</td>
<td style="text-align:center">1</td>
<td style="text-align:center">-1</td>
</tr>
<tr>
<td>C++</td>
<td style="text-align:center">1</td>
<td style="text-align:center">-1</td>
<td style="text-align:center">1</td>
<td style="text-align:center">-1</td>
</tr>
<tr>
<td>Elixir</td>
<td style="text-align:center">1</td>
<td style="text-align:center">-1</td>
<td style="text-align:center">1</td>
<td style="text-align:center">-1</td>
</tr>
<tr>
<td>Erlang</td>
<td style="text-align:center">1</td>
<td style="text-align:center">-1</td>
<td style="text-align:center">1</td>
<td style="text-align:center">-1</td>
</tr>
<tr>
<td>Go</td>
<td style="text-align:center">1</td>
<td style="text-align:center">-1</td>
<td style="text-align:center">1</td>
<td style="text-align:center">-1</td>
</tr>
<tr>
<td>Java</td>
<td style="text-align:center">1</td>
<td style="text-align:center">-1</td>
<td style="text-align:center">1</td>
<td style="text-align:center">-1</td>
</tr>
<tr>
<td>Javascript</td>
<td style="text-align:center">1</td>
<td style="text-align:center">-1</td>
<td style="text-align:center">1</td>
<td style="text-align:center">-1</td>
</tr>
<tr>
<td>Kotlin</td>
<td style="text-align:center">1</td>
<td style="text-align:center">-1</td>
<td style="text-align:center">1</td>
<td style="text-align:center">-1</td>
</tr>
<tr>
<td>Nim</td>
<td style="text-align:center">1</td>
<td style="text-align:center">-1</td>
<td style="text-align:center">1</td>
<td style="text-align:center">-1</td>
</tr>
<tr>
<td>PHP</td>
<td style="text-align:center">1</td>
<td style="text-align:center">-1</td>
<td style="text-align:center">1</td>
<td style="text-align:center">-1</td>
</tr>
<tr>
<td>Rust</td>
<td style="text-align:center">1</td>
<td style="text-align:center">-1</td>
<td style="text-align:center">1</td>
<td style="text-align:center">-1</td>
</tr>
<tr>
<td>Scala</td>
<td style="text-align:center">1</td>
<td style="text-align:center">-1</td>
<td style="text-align:center">1</td>
<td style="text-align:center">-1</td>
</tr>
<tr>
<td>Swift</td>
<td style="text-align:center">1</td>
<td style="text-align:center">-1</td>
<td style="text-align:center">1</td>
<td style="text-align:center">-1</td>
</tr>
<tr>
<td>Crystal</td>
<td style="text-align:center">1</td>
<td style="text-align:center">2</td>
<td style="text-align:center">-2</td>
<td style="text-align:center">-1</td>
</tr>
<tr>
<td>Haskell</td>
<td style="text-align:center">1</td>
<td style="text-align:center">2</td>
<td style="text-align:center">-2</td>
<td style="text-align:center">-1</td>
</tr>
<tr>
<td>Lua</td>
<td style="text-align:center">1</td>
<td style="text-align:center">2</td>
<td style="text-align:center">-2</td>
<td style="text-align:center">-1</td>
</tr>
<tr>
<td>Python</td>
<td style="text-align:center">1</td>
<td style="text-align:center">2</td>
<td style="text-align:center">-2</td>
<td style="text-align:center">-1</td>
</tr>
<tr>
<td>Ruby</td>
<td style="text-align:center">1</td>
<td style="text-align:center">2</td>
<td style="text-align:center">-2</td>
<td style="text-align:center">-1</td>
</tr>
<tr>
<td>Dart</td>
<td style="text-align:center">1</td>
<td style="text-align:center">2</td>
<td style="text-align:center">1</td>
<td style="text-align:center">2</td>
</tr>
<tr>
<td>Zig @rem</td>
<td style="text-align:center">1</td>
<td style="text-align:center">-1</td>
<td style="text-align:center">error</td>
<td style="text-align:center">error</td>
</tr>
<tr>
<td>Zig @mod</td>
<td style="text-align:center">1</td>
<td style="text-align:center">2</td>
<td style="text-align:center">error</td>
<td style="text-align:center">error</td>
</tr>
</tbody>
</table>
<!--
Ada | ? | ? | -? | -? |
cobol | ? | ? | -? | -? |
Clojure | ? | ? | -? | -? |
D | ? | ? | -? | -? |
Groovy | ? | ? | -? | -? |
Lisp | ? | ? | -? | -? |
Perl | ? | ? | -? | -? |
V | ? | ? | -? | -? |
OCaml | ? | ? | -? | -? |
Pascal | ? | ? | -? | -? |
Smalltalk | ? | ? | -? | -? |
-->
<p>So if you use the modulo operator to ensure correct bounds for accessing a collection, beware that some languages need a little more diligence. A simple and efficient way is to check the sign.</p>
<pre><code>int mod(int a, int b) {
    int c = a % b;
    // add b to move a negative remainder into [0, b)
    return (c < 0) ? c + b : c;
}
</code></pre>
<p>As another option, you could also apply the modulo twice.</p>
<pre><code>int mod(int a, int b) {
    return ((a % b) + b) % b;
}
</code></pre>
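<p>On newer JVMs there is no need to hand-roll this: <code>Math.floorMod</code> (available since Java 8) implements the floored definition directly.</p>
<pre><code>System.out.println(-13 % 3);               // -1, truncated remainder
System.out.println(Math.floorMod(-13, 3)); //  2, floored modulo
</code></pre>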
<h2>Even or Odd</h2>
<p>Another pitfall to watch out for is when testing whether a number is odd or even using the modulo operator. Based on the above findings you should always compare against <code>0</code>.</p>
<pre><code>bool is_odd(int n) {
    return n % 2 != 0; // could be 1 or -1
}
</code></pre>
<p>But anyone who has ever looked a layer below C will point out that using the modulo isn’t necessarily the best implementation for <code>is_odd</code> anyway. Multiplications and especially divisions are some of the most expensive instructions on a CPU. If the divisor is a power of two there is often a faster way.</p>
<pre><code>x % 2^n == x & (2^n - 1) // for x >= 0, n > 0
</code></pre>
<p>At least for a non-negative dividend and a power-of-two divisor, the modulo operation can be replaced with a simple bitwise <code>and</code> operation.</p>
<pre><code>x % 2 == x & 1
x % 4 == x & 3
x % 8 == x & 7
...
</code></pre>
<p>Which allows for a much faster implementation of <code>is_odd</code>.</p>
<pre><code>bool is_odd(int n) {
    return (n & 1) != 0;
}
</code></pre>
<h2>In Summary</h2>
<p>The modulo operator can be incredibly useful, but developers also need to be aware of the above edge cases and know when to use it - and when not to.</p>
<p>For a more detailed discussion see the <a href="https://en.wikipedia.org/wiki/Modulo_operation">wikipedia article</a>.</p>
https://torstencurdt.com/tech/posts/qr-barcode-automation/Barcodes for Automation2020-01-02T00:00:00Z<p>Shell scripts and Automator workflows are powerful tools to build automation setups on Mac. Unfortunately, neither allows for interfacing with QR or barcodes easily.</p>
<p>While there are plenty of tools to generate QR and barcodes, the options for scanning them with the built-in camera are scarce. Let alone connecting the reading process to scripting.</p>
<p>I wrote a little tool called <a href="https://apps.apple.com/de/app/scancode/id1482517543?l=en&mt=12">ScanCode</a> that acts as an input for shell scripts and Automator workflows. This lets you easily read barcodes and transform their data however you like.</p>
<figure data-type="image"><img src="https://torstencurdt.com/tech/posts/qr-barcode-automation/scancode_wide.jpg" alt="" /><figcaption>ScanCode triggering a Bash Script</figcaption></figure>
<p>While scanning all your books to a CSV file is a possible and easy use case, let me walk you through a slightly more complex application that helps with SEPA transfers.</p>
<h2>SEPA Transfers</h2>
<p>More and more often you see QR codes printed on invoices. They contain the data for the transfer. No typing, and more importantly, no typing errors when copying that long IBAN into your banking software.</p>
<p>Some banking software allows for directly scanning the QR code. While <a href="https://moneymoney-app.com/">MoneyMoney</a> (the banking software of my choice) does not support direct scanning, it does support URL schemes for starting a new SEPA transfer. So all that is needed is to transform the scanned QR code data into a conforming URL and then open MoneyMoney with that URL.</p>
<h3>EPC</h3>
<p>Probably the most widely used standard to encode payment information in QR codes is the <a href="https://en.wikipedia.org/wiki/EPC_QR_code">EPC code</a> - at least in the EU. To date, it has seen adoption in Austria, Belgium, Finland, the Netherlands, and Germany.</p>
<p>The EPC is a line-based encoding of the SEPA transfer information. The <a href="https://www.europeanpaymentscouncil.eu/sites/default/files/KB/files/EPC069-12%20v2.1%20Quick%20Response%20Code%20-%20Guidelines%20to%20Enable%20the%20Data%20Capture%20for%20the%20Initiation%20of%20a%20SCT.pdf">documentation</a> is available from the European Payments Council. Here is an example (note the two empty lines for the unused purpose and reference fields):</p>
<pre><code>BCD
002
1
SCT
RLNWATWW
Doctors Without Borders
AT973200000000518548
EUR1500.99


Emergency Donation
</code></pre>
<p>An <a href="https://qrcode.tec-it.com/en/SEPA">online generator</a> is available, too.</p>
<h3>URL schemes</h3>
<p>MoneyMoney supports two URL schemes:</p>
<pre><code>bank://
payto://
</code></pre>
<p>The <em>Bezahlcode</em> was a specification written by a German company. It used a rather flexible URI scheme to encode the payment information.</p>
<pre><code>bank://singlepaymentsepa?name={NAME}&reason={REASON}&iban={IBAN}&amount={AMOUNT}
</code></pre>
<p>The most important fields are <code>name</code>, <code>reason</code>, <code>iban</code> and <code>amount</code>. Unfortunately, the official documentation is no longer online. But fortunately enough the usage is simple and it still is a supported interface to pass information into MoneyMoney.</p>
<p>The <code>payto</code> scheme is defined in <a href="https://www.rfc-editor.org/rfc/rfc8905">RFC 8905</a> and uses a very similar approach:</p>
<pre><code>payto://iban/{IBAN}?amount=EUR:{AMOUNT}&message={MESSAGE}
</code></pre>
<p>The <code>message</code> field is the equivalent of <code>reason</code> in <code>Bezahlcode</code>.</p>
<h3>Developing the Script</h3>
<p>A simple script helps to store the QR code data in a file. This allows for a much faster development cycle.</p>
<pre><code>#!/bin/sh
cat > "$HOME/Desktop/code.txt"
</code></pre>
<p>First open the image of the <a href="https://torstencurdt.com/tech/posts/qr-barcode-automation/bezahlcode.png">Bezahlcode</a> on the phone, scan, and rename. Next open the image of the <a href="https://torstencurdt.com/tech/posts/qr-barcode-automation/epc.png">EPC code</a> on the phone, scan, and rename. Now we can pipe those files into the target script.</p>
<pre><code>cat bezahlcode.txt | ./bezahlcode.sh
cat epc.txt | ./bezahlcode.sh
</code></pre>
<p>Both executions should open as a SEPA transfer in MoneyMoney. Let’s write the <code>bezahlcode.sh</code> script.</p>
<p>First, we turn the lines that come in via <code>stdin</code> into an array for easier access.</p>
<pre><code>unset lines
while IFS= read -r; do
    lines+=("$REPLY")
done
[[ $REPLY ]] && lines+=("$REPLY")
</code></pre>
<p>While there is a much shorter and nicer version of the above shell code, this one also handles empty lines and a last line without a trailing newline correctly.</p>
<p>Now we need to check what type of code we are dealing with.</p>
<pre><code>if [ "${lines[0]:0:25}" = "bank://singlepaymentsepa?" ]; then
    open "${lines[0]}"
    exit 0
fi
</code></pre>
<p>For Bezahlcode there is nothing to be done - we can pass the URL on as is. For EPC it’s a bit more work. We need to extract the fields, adjust some of them, and urlencode everything before assembling the final URL.</p>
<pre><code>if [ "${lines[0]}" = "BCD" ]; then
    BIC=`echo ${lines[4]} | urlencode`
    NAME=`echo ${lines[5]} | urlencode`
    IBAN=`echo ${lines[6]} | urlencode`
    AMOUNT=`echo ${lines[7]} | tr -cd "[:digit:] [:punct:]" | urlencode`
    REASON=`echo ${lines[10]} | urlencode`
    open "bank://singlepaymentsepa?name=${NAME}&iban=${IBAN}&bic=${BIC}&amount=${AMOUNT}&reason=${REASON}"
    exit 0
fi
</code></pre>
<p>That’s it. Here is a link to <a href="https://torstencurdt.com/tech/posts/qr-barcode-automation/bezahlcode.sh">download the full script</a>. It includes the <code>urlencode</code> function that I left out above for brevity. Just put the <code>bezahlcode.sh</code> script into the ScanCode script folder, make sure permissions allow for execution and you are good to go.</p>
<p>The script is now also included with ScanCode inside the example folder.</p>
<p><a href="https://apps.apple.com/de/app/scancode/id1482517543?l=en&mt=12"><img src="https://torstencurdt.com/tech/posts/qr-barcode-automation/mas.svg?button" alt="ScanCode on the Mac App Store" /></a></p>