oss-sec mailing list archives

Re: Update on the distro-backdoor-scanner effort


From: Jacob Bachmeyer <jcb62281 () gmail com>
Date: Sun, 28 Apr 2024 23:48:13 -0500

Hank Leininger wrote:
> On 2024-04-27, Jacob Bachmeyer wrote:

>>> - Output is manageable; able to rule out all hits not part of the
>>>   actual xz-utils backdoors as false positives.
>> This is what I would expect:  the backdoor dropper appears to have
>> been specifically developed for xz-utils, but could /possibly/ be
>> adaptable to other compression tools.

> Indeed, well my thinking was more along the lines of: "This is an
> impressive amount of moving parts to be created new for this project,
> and burned for just this project. What if what we are seeing is the 2nd
> or 3rd gen of such a toolkit, and earlier ones have some similar
> characteristics but maybe fewer layers, etc. so we could spot them?"

I disagree here because, while I agree that there are quite a few moving parts, I also (having replicated the unpacking of the dropper shell scripts) see incremental development paths, and a few major mistakes that a more polished backdoor would have avoided. If this were a 2nd- or 3rd-generation toolkit, those mistakes would already have been fixed.

Overall, I think the "Jia Tan" crew went straight for the "Golden Key" and bit off /way/ more than they could chew.

> [...] while the liblzma decompressor had
> certain things going for it as a target, why not other things? (Nor am I
> restricting to "other things that would get linked into sshd", but
> really more broadly.)

I disagree with this assessment: the backdoor blob, according to reports so far, does not appear to actually affect xz or liblzma /at/ /all/; liblzma is merely used (as a dependency of libsystemd) to smuggle the blob into the sshd process and pass control into the blob during process initialization. From the reports that I have seen, the entrypoint that liblzma is patched to call does very little itself, and only modifies ld.so's data segment to half-register the blob with the LD_AUDIT framework so it will be called again later. Nearly all of the backdoor seems to be independent of liblzma.

> [...] The
> xz-utils stuff is many generations ahead of those; it would be
> interesting to spot some missing links.

I suspect that the "missing links" you seek are to be found in various Windows malware over the years, at least as far as the binary backdoor blob is concerned, but I have not been directly involved in that analysis. From the reports I have read, it seems to be more-or-less a piece of Windows malware adapted to an ELF environment. The adaptations also seem to be exactly the parts that went poorly, causing the performance problems that led to the backdoor's discovery. (Or ld.so really is that much more efficient than the Windows module loader, and performance issues that are routinely lost in the noise on Windows got the "Jia Tan" crew caught.)

The dropper scripts do not appear well-developed to me. Overall, they /do/ strike me as a first-generation effort, likely the authors' first shell scripts on top of that, and they evidence a poor understanding of the GNU system. There are worse-than-beginner mistakes in there, like omitting the spaces around the '=' in a string comparison, and overall patterns that suggest a poor understanding of shell scripting. For example, a common idiom in shell programming is to use && and || for conditional execution of single commands, especially conditional exits; the dropper uses this idiom once, getting the test wrong (the aforementioned missing spaces), but then uses if/then/fi later for conditional exits.

The dropper scripts also appear to be the product of copy-and-paste cargo-cult programming: some commands are written as "[...] || true" even though their exit status would be ignored anyway, suggesting that they were adapted from a Makefile or from a script that runs with "set -e" in effect. Lastly, the outer dropper uses a linear series of head(1) commands to "skip 1KiB of garbage, extract 2KiB of backdoor, repeat"; simply reading the manuals would have revealed more appropriate tools for that, and a way to make the outer dropper independent of the inner dropper's length.
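To make those idioms concrete, here is a minimal sketch of my own (not the actual dropper code; the filenames and sizes are invented) contrasting the broken comparison with the correct forms, and showing how a single dd(1) or tail(1) invocation replaces a chain of head(1) calls:

    #!/bin/sh
    # Hypothetical illustration of the idioms described above; this is
    # NOT the actual dropper code.

    x=good

    # Broken: without spaces, [ sees the single nonempty word
    # "good=bad", which always tests true, so no comparison happens.
    [ "$x"="bad" ] && echo "this always prints"

    # Correct &&/|| conditional-exit idiom:
    [ "$x" = "bad" ] && exit 1

    # Equivalent if/then/fi form (the dropper mixes both styles):
    if [ "$x" = "bad" ]; then
        exit 1
    fi

    # Extract 2KiB of payload after 1KiB of garbage in one command,
    # rather than a chain of head(1) invocations:
    dd if=container.bin bs=1024 skip=1 count=2 2>/dev/null > payload.bin
    # ...or, without needing to know the payload's length at all:
    tail -c +1025 container.bin > payload.bin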

>> You might get better results by indexing macro definitions found in
>> *.m4 files, instead of trying to fuzzily hash the files.  The
>> interesting comparison is then different definitions of macros with
>> the same name.

> I like this a lot as a potential next layer for the m4 reconciliation.
> Essentially a field-level matching once things that match at a
> file-level have been eliminated. I don't see why (he says, not actually
> having dug into the m4 format much) we couldn't break apart all the m4s
> we are choosing to consider known-good, and catalog each individual
> macro, and then do the same when hashing project-specific files. Could
> be we can entirely "clear" a file with an unknown checksum because it
> consists 100% of individual macros that are known.

More likely, you would be able to "clear" a file that actually differs in either whitespace or comments---or catch a file that has some nastiness inserted between macro definitions.
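As a rough sketch of that field-level pass (my own invention, untested; the parsing is deliberately naive and assumes each AC_DEFUN starts at the beginning of a line), something like this could produce per-macro hashes for comparison against a known-good catalog:

    #!/bin/sh
    # Naive sketch: split a .m4 file into one file per AC_DEFUN'd macro
    # and hash each one, so individual macros can be checked against a
    # catalog of known-good definitions.  Real m4 quoting would need a
    # real parser.
    f="$1"
    tmp=$(mktemp -d) || exit 1

    awk -v dir="$tmp" '
        /^AC_DEFUN\(\[/ {
            name = $0
            sub(/^AC_DEFUN\(\[/, "", name)  # strip leading AC_DEFUN([
            sub(/\].*/, "", name)           # strip after the macro name
            out = dir "/" name
        }
        out != "" { print > out }
    ' "$f"

    # One line per macro: <sha256>  <macro name>
    ( cd "$tmp" && sha256sum * )
    rm -rf "$tmp"

Note that anything in the file before the first AC_DEFUN is silently dropped by this sketch; a real tool should flag such text, since that is exactly where nastiness between (or before) macro definitions would hide.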

> (Insert weird machine
> here that combined OK macros in surprising ways.)

While m4 technically is Turing-complete, I strongly doubt the existence of m4 weird machines: m4 is a macro processor that performs text expansion. You should have a manual for it in the Info system.

>> many (most?) modern Linux kernels are compressed using xz, which means
>> that a Thompsonesque attack could binary-patch a freshly built kernel
>> while compressing vmlinux to make vmlinuz.

> Good call. This may be far out on the.... "far out" scale, but it'd be
> pretty trivial to harvest distro kernels that used xz to make their
> vmlinuz, and then run those through multiple independent implementations
> like we have done with .tar.xz files. Sold.

Make sure that at least one of those implementations is a wrapper around the xz-embedded decompressor that Linux itself uses. (As in, mmap() vmlinuz MAP_PRIVATE, mmap() an output file at the appropriate virtual address, patch the jump into the kernel at the end of the decompression stub with a return, and call the embedded decompressor.) If you are concerned about weird machines, there is a tiny, paranoid possibility that a tampered compressed stream would decompress "clean" with any other decompressor.
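Short of writing that mmap() wrapper, a rough approximation (my sketch, untested) is to locate the xz stream inside the bzImage and feed it both to xz-utils and to BusyBox's unxz, the latter being derived, as far as I know, from the same XZ Embedded code the kernel uses; the kernel's own scripts/extract-vmlinux does the offset search more robustly:

    #!/bin/sh
    # Sketch: decompress the xz payload of a bzImage with two
    # independent implementations and compare the results.  Assumes a
    # single xz stream; scripts/extract-vmlinux in the kernel tree
    # handles the general case.
    img="$1"

    # Byte offset of the xz magic (fd 37 7a 58 5a); the trailing NUL of
    # the full magic is omitted since it cannot appear in an argument.
    off=$(grep -abo "$(printf '\3757zXZ')" "$img" | head -n1 | cut -d: -f1)
    [ -n "$off" ] || { echo "no xz stream found" >&2; exit 1; }

    # xz-utils' decompressor; --single-stream ignores the data that
    # follows the compressed payload inside a bzImage...
    tail -c +$((off + 1)) "$img" | xz -dc --single-stream > vmlinux.xzutils

    # ...versus BusyBox's unxz (XZ Embedded), which complains about the
    # trailing data after writing its output, so its status is ignored.
    tail -c +$((off + 1)) "$img" | busybox unxz > vmlinux.busybox 2>/dev/null || true

    cmp vmlinux.xzutils vmlinux.busybox && echo "decompressors agree"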

I doubt that you will find anything, though. The "Jia Tan" crew does not seem to have been that advanced.

>> The IFUNC mechanism is actually a security feature.  In "inner-loop"
>> code, having multiple implementations with different optimizations
>> with the preferred implementation for the local processor chosen at
>> runtime is fairly common.

> Thanks for this! I've seen that discussed as a (valid, useful) use of
> IFUNC but also AFAIK things like musl don't implement such a thing, so
> either software that wants it just doesn't support musl, or can't pick
> an optimization, or does so in an undesirable way like writable pointers
> in the data segment or... some other option?

When IFUNC usage was added to xz-utils, the older "dispatch through a function pointer" technique was kept as a fallback when building without IFUNC.

> [...] I'd be satisfied being able to say "IFUNCs are
> used by 15 out of 10,000 packages; that's a small enough number we can
> a) audit them all b) add alerting to tooling used for builds; when a
> package suddenly starts using them, look into it." If the real number
> turns out to be 1,000 out of 10,000, then that's good to know, and
> we'd probably give up.

Alerting should be a simple matter of running `grep -Ri ifunc .` at the top of the package tree.
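A slightly more thorough sketch (mine, untested) would also check the built objects, since readelf(1) reports STT_GNU_IFUNC symbols as IFUNC and the matching relocations as *_IRELATIVE, catching uses that a source grep might miss:

    #!/bin/sh
    # Sketch: flag IFUNC usage in a package tree, in both the sources
    # and the built artifacts.
    tree="$1"

    # Source-level check; hits in comments and configure tests are
    # expected, so treat matches as "look into it", not as proof.
    grep -RiIl ifunc "$tree"

    # Binary-level check: IFUNC symbols and IRELATIVE relocations show
    # up plainly in readelf output.
    find "$tree" -type f \( -name '*.so*' -o -name '*.o' \) |
    while read -r obj; do
        if readelf -sW "$obj" 2>/dev/null | grep -q ' IFUNC ' ||
           readelf -rW "$obj" 2>/dev/null | grep -q '_IRELATIVE'; then
            echo "IFUNC machinery in: $obj"
        fi
    done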

>> I currently suspect that the crackers used IFUNC support as a covert
>> flag.  The "jankiness" of the current glibc IFUNC implementation
>> provided a convenient excuse to ask oss-fuzz to --disable-ifunc when
>> building xz-utils, which *also* conveniently inhibited the backdoor
>> dropper and ensured that the fuzzing builds would not contain the
>> backdoor.

> Uncynically, I like this conspiracy theory.

Then I will go a step farther: I half-suspect that the entire CLMUL-based CRC calculation may have been added as an excuse to add the dispatch logic, which in turn was an excuse to use IFUNC. While I have not run tests, I suspect that the baseline table-driven CRC calculation is probably already I/O-bound on modern processors, especially x86-64 processors new enough to /have/ CLMUL.

> [...]
> I think Sam looked into existing pkg-config verifiers and found they do
> not complain about things we thought they should complain about (this
> could just mean we misunderstand their purpose). A strict lint-checker
> for such files would be better than just checking for specific
> suspicious patterns. But, I don't yet know how strict a format we could
> insist on (would it turn out 10% of files in fact break what we
> initially think are reasonable rules?). Even still, I think you could
> embed badness in legit variables, although I haven't dug in enough to
> know that for sure.

Quoting pkg-config(1):
    Files have two kinds of line: keyword lines start with a keyword plus a
    colon, and variable definitions start with an alphanumeric string plus
    an equals sign. Keywords are defined in advance and have special
    meaning to /pkg-config/; variables do not, you can have any variables
    that you wish (however, users may expect to retrieve the usual
    directory name variables).

The format is clearly defined; any pkg-config file containing nonblank lines /not/ matching the above description should be rejected, and pkg-config itself should implement this check. This rule ensures that a pkg-config file cannot also contain shell commands.
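As a sketch of that check (my patterns, untested; the grammar pkg-config actually accepts may be looser in corner cases, so expect to tune them):

    #!/bin/sh
    # Sketch of a strict .pc lint: every line must be blank, a '#'
    # comment, a keyword line ("Word:"), or a variable definition
    # ("word=").  Anything else is grounds for rejection.
    for pc in "$@"; do
        bad=$(grep -Evn -e '^[[:space:]]*(#|$)' \
                        -e '^[A-Za-z0-9_.]+:' \
                        -e '^[A-Za-z0-9_.]+=' "$pc")
        if [ -n "$bad" ]; then
            printf 'REJECT %s, unrecognized lines:\n%s\n' "$pc" "$bad"
        fi
    done

Note that this does nothing about badness hiding /inside/ a legitimate keyword or variable value, which is the separate concern raised above.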

The sample backdoor also uses an *-uninstalled.pc file to override a legitimate file, but places the "uninstalled" file in a system directory, which means it is actually installed! The pkg-config program should also refuse to read *-uninstalled.pc files from system directories, and emit a warning if such a file is found to exist.
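Checking for that misuse is nearly a one-liner over the usual system search paths (adjust the directory list to the distribution):

    # Any *-uninstalled.pc under a system pkg-config directory is
    # suspect: "uninstalled" files belong only in source trees.
    find /usr/lib/pkgconfig /usr/lib64/pkgconfig /usr/share/pkgconfig \
         /usr/local/lib/pkgconfig -name '*-uninstalled.pc' -print 2>/dev/null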


-- Jacob

