
Is Your Mental Model Of Bash Pipelines Wrong?


[Michael Lynch] encountered a strange situation. Why was compiling and then running his program more than 10x faster than just running the already-built program by itself? [Michael] ran into this issue while benchmarking a programming project, pared it down to its essentials for repeatability and analysis, and discovered that it exposed an incorrect mental model of how bash pipelines work.

Here’s the situation. The first thing [Michael]’s pared-down program does is start a timer. It then reads and counts the bytes arriving on stdin, and finally prints out how long that took. When the test program is run the following way, it takes about 13 microseconds.

$ echo '00010203040506070809' | xxd -r -p | zig build run -Doptimize=ReleaseFast
bytes: 10
execution time: 13.549µs

When running the (already-compiled) program directly, execution time swells to 162 microseconds.

$ echo '00010203040506070809' | xxd -r -p | ./zig-out/bin/count-bytes
bytes: 10
execution time: 162.195µs
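To experiment without a Zig toolchain, here is a rough bash stand-in for the count-bytes program described above. It is a hypothetical analogue, not [Michael]’s code: the function name count_bytes is made up, and it assumes bash 5.0 or later for the $EPOCHREALTIME variable. The detail it deliberately preserves is that the timer starts before the first byte is read.

```shell
#!/usr/bin/env bash
# Hypothetical bash stand-in for count-bytes (the real program is written in Zig).
# Assumes bash 5.0+ for $EPOCHREALTIME (epoch seconds with six decimal places).

count_bytes() {
  # Start the clock before reading anything, as the original program does.
  # Removing the decimal separator ("." or "," depending on locale) gives microseconds.
  local start=${EPOCHREALTIME/[.,]/}
  local n
  n=$(wc -c)   # read stdin to EOF and count the bytes
  n=$((n))     # arithmetic expansion strips the padding some wc versions add
  local end=${EPOCHREALTIME/[.,]/}
  echo "bytes: $n"
  echo "execution time: $((end - start))µs"
}
```

Piped the same way as in the article (echo '00010203040506070809' | xxd -r -p | count_bytes), it should report 10 bytes, and the time it prints includes however long the function sat waiting for the upstream commands to deliver their data.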

The only difference between zig build run and ./zig-out/bin/count-bytes is that the first compiles the code and then immediately runs it, while the second simply runs the already-compiled program.

How can adding an extra compile step decrease the execution time? It turns out that [Michael]’s mental model of how bash pipelines work was incorrect. He does a great job of explaining how they actually work, and why that caused the strange behavior he was seeing.

In short, commands in a bash pipeline are not launched one after another. Bash starts all of them at essentially the same time, connects them with pipes, and lets them run concurrently. That meant that when run directly, [Michael]’s byte-counter program launched immediately, started its timer, and then sat around doing nothing much for about 150 microseconds while the echo '00010203040506070809' | xxd -r -p part of the pipeline got around to delivering its data. That idle wait is where the extra execution time comes from when running the already-compiled version.
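A quick way to see this concurrency for yourself is to have each stage of a pipeline announce when it starts. In this sketch (the function name demo_pipeline_order is made up for illustration), the left stage sleeps for one second before writing anything. If bash ran the stages one after another, the right stage could not start until the left one had finished; instead, "right: started" shows up right away, and only the read inside the right stage blocks until data arrives.

```shell
#!/usr/bin/env bash
# Hypothetical demo: every pipeline stage reports to stderr when it starts.
# Left stage: announces itself, then stalls for 1 s before producing data.
# Right stage: announces itself at once, then blocks in `read` until data arrives.

demo_pipeline_order() {
  { echo "left: started" >&2; sleep 1; echo "left: writing" >&2; echo data; } |
    { echo "right: started" >&2; read -r line; echo "right: got $line" >&2; }
}

demo_pipeline_order
```

The first two messages can appear in either order, since both stages really are running at once, but "right: started" always comes out about a second before "left: writing".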

So why does compiling first make it run faster? Same basic reason: when the zig build run command kicks off, it spends a little time compiling the program first. By the time the compiled program actually launches (and starts its execution timer), the input data from the earlier stages of the pipeline is already waiting. The freshly compiled program therefore measures less time because it never has to sit around waiting for that data to become available.
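The same effect can be reproduced without Zig by timing the read from inside the consuming stage. This is a rough sketch under a few assumptions: bash 5.0+ for $EPOCHREALTIME, a sleep that accepts fractional seconds (GNU coreutils and macOS both do), a 0.2 s sleep standing in for slow upstream commands, and a 0.5 s sleep standing in for the compile step. The helper time_read is hypothetical; like count-bytes, it starts its timer first and then reads stdin to EOF.

```shell
#!/usr/bin/env bash
# Assumes bash 5.0+ ($EPOCHREALTIME) and a sleep that accepts fractional seconds.

# Hypothetical helper: start a timer, read stdin to EOF, print elapsed microseconds.
time_read() {
  local start=${EPOCHREALTIME/[.,]/}
  cat >/dev/null
  local end=${EPOCHREALTIME/[.,]/}
  echo $((end - start))
}

# Like running ./zig-out/bin/count-bytes directly: the reader starts (and starts
# its timer) immediately, then waits about 0.2 s for the slow upstream stage.
immediate=$( { sleep 0.2; echo data; } | time_read )

# Like zig build run: the reader first spends 0.5 s on other work (the "compile"),
# so the data is already sitting in the pipe by the time its timer starts.
delayed=$( { sleep 0.2; echo data; } | { sleep 0.5; time_read; } )

echo "reader started immediately: ${immediate}µs"
echo "reader started after delay: ${delayed}µs"
```

On a typical machine the first number should land somewhere near 200,000 µs and the second should be far smaller, the same shape as the 162 µs versus 13 µs gap in the article, just at a larger scale.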

It’s an interesting look at how bash pipelines actually function under the hood, and we’re delighted with the detail [Michael] puts into the whole journey and explanation. Sooner or later, details like this crop up and raise some eyebrows, like the user who discovered troublesome edge cases regarding spaces in ssh commands.
