oss-sec mailing list archives

Re: Analysis on who is Jia Tan, and who he could work for, reading xz.git


From: Alejandro Colomar <alx () kernel org>
Date: Wed, 10 Apr 2024 20:19:46 +0200

Hi Joey, Alexander,

On Wed, Apr 10, 2024 at 12:10:51PM -0400, Joey Hess wrote:
Alejandro Colomar wrote:
I suspect those +0200 and +0300 correspond to a few times that this guy
would have traveled to his intelligence agency for some special work

That's a theory. But many of the commits with author Jia Tan in those
time zones have committer Lasse Collin, and show signs of being, e.g.,
git-am'ed patch sets which may have also been rebased. In which case
it would make sense that these have Lasse Collin's usual timezone.

Yep, I also had the feeling that some of those might be the result of
git-am(1) (TBH, I had those feelings today, after the email had been
sent).  In principle, git-am(1) respects the author date, but if some
mails (assuming patches taken via mail) were somehow malformed, or Lasse
had something misconfigured, it might have overwritten the author date.
Maybe this helps Lasse investigate his emails, and see if this makes any
sense for him.

The CommitDate, however, is certain to be from Jia, and there are
exactly 4 commits from him with a suspicious CommitDate (+0200), in a
short and recent period of time: 2024-02-29 - 2024-03-05, part of the
most "fun" period of this thing.  It would be interesting to investigate
the mails that led to those patches being pushed, if they were discussed
publicly, since they may contain IPs and other traces.

BTW, there's more indication that shows there were several people
involved: there's a mix of timezones in this period:

        $ git log --all --since=2024-02-28 --until=2024-03-06 \
                --pretty=fuller --date=iso \
        | grep -B1 Date: \
        | grep -A1 jia \
        | grep -v -- -- \
        | grep -v jia \
        | sed 's/......Date: //' \
        | while read d; do
                date -u --iso-8601=seconds --date="$d";
                echo "  %%  $d";
        done \
        | sed 'N;s/\n//' \
        | sort;
        2024-02-29T14:35:52+00:00  %%  2024-02-29 16:35:52 +0200
        2024-02-29T14:35:52+00:00  %%  2024-02-29 16:35:52 +0200
        2024-03-04T16:27:31+00:00  %%  2024-03-05 00:27:31 +0800
        2024-03-04T16:27:31+00:00  %%  2024-03-05 00:27:31 +0800
        2024-03-04T16:27:31+00:00  %%  2024-03-05 00:27:31 +0800
        2024-03-04T16:34:46+00:00  %%  2024-03-05 00:34:46 +0800
        2024-03-04T16:34:46+00:00  %%  2024-03-05 00:34:46 +0800
        2024-03-04T16:34:46+00:00  %%  2024-03-05 00:34:46 +0800
        2024-03-04T17:23:18+00:00  %%  2024-03-04 19:23:18 +0200
        2024-03-04T17:54:30+00:00  %%  2024-03-05 01:54:30 +0800
        2024-03-04T17:54:30+00:00  %%  2024-03-05 01:54:30 +0800
        2024-03-05T21:21:26+00:00  %%  2024-03-05 23:21:26 +0200

Of course, all of that can be faked, but it's a starting point.  And
even for state actors, it's hard to not make mistakes, so they likely
leaked something at some point.
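
A shorter way to see the same mix of offsets is to let git print the
dates itself and tally the trailing offset of each line.  This is only
a sketch: the function name tally_offsets is made up for illustration,
and the usage shown in the comment assumes a clone of xz.git and that
the pattern "jia" matches the committer's e-mail address.  Unlike the
pipeline above, that usage looks at committer dates only.

```shell
# Tally timezone offsets from lines that end in an offset such as "+0200".
# Input: one timestamp per line, e.g. "2024-02-29 16:35:52 +0200".
tally_offsets() {
        awk '{ count[$NF]++ } END { for (tz in count) print tz, count[tz] }' \
        | sort
}

# Hypothetical usage inside a clone of xz.git (pattern is an assumption):
#   git log --all --since=2024-02-28 --until=2024-03-06 \
#           --committer=jia --format=%ci | tally_offsets
```

Each output line is an offset followed by the number of timestamps that
carried it, so an unexpected offset stands out even in a long history.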


I analyzed that here: https://hachyderm.io/@joeyh/112193146103113070

Regarding your question in that post:

< anyone know of a common #git workflow that would result in 4 commits
< with 2 separate authors all having one timestamp as a common commit
< timestamp and a second timestamp as a common author timestamp?

You get the shared author dates with `git commit --reuse-message`, which
copies the author timestamp along with the message.  I seldom do that,
but I do it.  It's useful when I decide I want to reuse a commit message
(for a patch set which has repetitive text in the commit messages, for
example, where you can --reuse-message and then adjust).  You can find a
few examples in the Linux man-pages repo.
You get the shared committer dates with a rebase of the patch set.

So he reused a commit message + amend, and then rebased at some point.
It's not inconceivable.  Here's how to reproduce it:

        alx@debian:~/tmp$ mkdir foo
        alx@debian:~/tmp$ cd foo/
        alx@debian:~/tmp/foo$ git init
        Initialized empty Git repository in /home/alx/tmp/foo/.git/
        alx@debian:~/tmp/foo$ git commit --allow-empty -m init
        [main (root-commit) dd8f3ea] init
        alx@debian:~/tmp/foo$ touch a
        alx@debian:~/tmp/foo$ git add .
        alx@debian:~/tmp/foo$ git commit -m a
        [main c92b1c1] a
         1 file changed, 0 insertions(+), 0 deletions(-)
         create mode 100644 a
        alx@debian:~/tmp/foo$ touch b
        alx@debian:~/tmp/foo$ git add .
        alx@debian:~/tmp/foo$ git commit -m b
        [main 511aa3c] b
         1 file changed, 0 insertions(+), 0 deletions(-)
         create mode 100644 b
        alx@debian:~/tmp/foo$ touch c
        alx@debian:~/tmp/foo$ git add .
        alx@debian:~/tmp/foo$ git commit --reuse-message=HEAD
        [main 28cd344] b
         Date: Wed Apr 10 19:31:33 2024 +0200
         1 file changed, 0 insertions(+), 0 deletions(-)
         create mode 100644 c
        alx@debian:~/tmp/foo$ git commit --amend -m c
        [main 94aa6a9] c
         Date: Wed Apr 10 19:31:33 2024 +0200
         1 file changed, 0 insertions(+), 0 deletions(-)
         create mode 100644 c
        alx@debian:~/tmp/foo$ git rebase -i HEAD^^^
        [detached HEAD 460b428] a
         Date: Wed Apr 10 19:31:20 2024 +0200
         1 file changed, 0 insertions(+), 0 deletions(-)
         create mode 100644 a
        Successfully rebased and updated refs/heads/main.
        alx@debian:~/tmp/foo$ git log --pretty=fuller
        commit 7b102f35902a5212114cd1ceb5ecf4e648c83abb (HEAD -> main)
        Author:     Alejandro Colomar <alx () kernel org>
        AuthorDate: Wed Apr 10 19:31:33 2024 +0200
        Commit:     Alejandro Colomar <alx () kernel org>
        CommitDate: Wed Apr 10 19:36:41 2024 +0200

            c

        commit b28ec7f6a33eacd9dd27f6493493bc399ecff66e
        Author:     Alejandro Colomar <alx () kernel org>
        AuthorDate: Wed Apr 10 19:31:33 2024 +0200
        Commit:     Alejandro Colomar <alx () kernel org>
        CommitDate: Wed Apr 10 19:36:41 2024 +0200

            b

        commit 460b42821313d48207760e79583d8fbd3f6fe3ec
        Author:     Alejandro Colomar <alx () kernel org>
        AuthorDate: Wed Apr 10 19:31:20 2024 +0200
        Commit:     Alejandro Colomar <alx () kernel org>
        CommitDate: Wed Apr 10 19:36:38 2024 +0200

            a

        commit dd8f3ea6a3dc1272389e7ad5afd950b3194bdea8
        Author:     Alejandro Colomar <alx () kernel org>
        AuthorDate: Wed Apr 10 19:30:59 2024 +0200
        Commit:     Alejandro Colomar <alx () kernel org>
        CommitDate: Wed Apr 10 19:30:59 2024 +0200

            init

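The same effect can be scripted without an interactive rebase.  This is
a sketch under assumptions, not what anyone actually ran: the function
name, the identity and the pinned author date are made up, and
`git rebase --force-rebase` stands in for the interactive rebase above,
since it replays the commits even when nothing changed.

```shell
# Sketch: non-interactive variant of the session above.
# The function name, identity, and fixed date are illustrative assumptions.
reproduce_shared_dates() {
        dir=$1
        git init -q "$dir" || return 1
        git -C "$dir" config user.name  "Example User"
        git -C "$dir" config user.email "user@example.org"
        git -C "$dir" commit -q --allow-empty -m init

        touch "$dir/a"
        git -C "$dir" add a
        git -C "$dir" commit -q -m a

        # Pin the author date of "b" so the effect is easy to see.
        touch "$dir/b"
        git -C "$dir" add b
        GIT_AUTHOR_DATE='2024-04-10 19:31:33 +0200' \
                git -C "$dir" commit -q -m b

        # --reuse-message copies the message and the authorship
        # (including the author timestamp) of the named commit.
        touch "$dir/c"
        git -C "$dir" add c
        git -C "$dir" commit -q --reuse-message=HEAD
        git -C "$dir" commit -q --amend -m c

        # Replay a, b and c even though nothing changed; each replayed
        # commit gets a fresh committer date.
        git -C "$dir" rebase -q --force-rebase HEAD~3

        git -C "$dir" log --format='%s  author=%ai  committer=%ci'
}
```

Called as `reproduce_shared_dates "$(mktemp -d)"`, it should list "b"
and "c" with the same author date, and committer dates taken from the
moment of the rebase.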

Have a lovely day!
Alex


On Wed, Apr 10, 2024 at 06:28:13PM +0200, Solar Designer wrote:
On Wed, Apr 10, 2024 at 05:16:52AM +0200, Alejandro Colomar wrote:
I've been researching xz.git to learn about this malicious actor, and
who he might have worked for.

As a moderator, I reluctantly let this through out of respect for
Alejandro's time and knowing that many readers will find it interesting.

Thank you.

However:

This is almost off-topic for oss-security and it risks provoking further
speculation and potentially hatred in follow-ups.  Related analyses,
including not only of timezones but also of commit times, were already
posted elsewhere (e.g., a Wired story).  So let's please limit the
follow-ups to (1) corrections of any factual errors or major omissions
(to the extent of being misleading) there might be in Alejandro's
postings and (2) observations that more directly help us identify or
prevent more compromises like this (if any can be made based on this
analysis, which I doubt).  One major omission I'd like to point out is
that timezones can be faked - we have no reliable way to know which of
these, if any, actually correspond to where Jia Tan was.

Note that other recent threads in here about search for code patterns
similar to Jia Tan's and even for PGP keys similar to Jia Tan's are more
relevant to oss-security, because they're aimed to uncover potential
related backdoor code in other projects.  In contrast, identifying who
Jia Tan is or what country/ies they're from doesn't obviously help.  At
best, it may give us guesses on where the presumed targets are, but then
what?  We need to protect the whole ecosystem regardless of who/where
the current attackers are, and we need to develop means to detect such
attacks everywhere, not only at currently likely targets.


P.S.:  While the first part of this email is within "corrections of any
factual errors or major omissions", I acknowledge that this last part
might be getting even more off-topic.  Since I guess it's short and will
have no replies, I included it.  Sorry.

P.S.2:  I didn't find the other similar investigations on other sites
until today.  There's so much stuff about this that it's hard to find it
all.  Sorry for the duplication.  Hopefully, this might contain some new
idea that might help someone.  Sorry again.  :)

P.S.3:  I hope nobody takes this incident as an excuse to hate a group
of people.  This is a thing about states being evil, and there are
powerful states of all inclinations that do evil stuff.  And even if it
were just an individual, the same can be said of individuals.  I don't
intend this thread to be used for increasing hatred; instead, I did it
to learn how this happened, and what kinds of mistakes and patterns of
mistakes the authors of this and similar attacks may have forgotten to
check for.  That could be useful to detect similar attacks in other
projects, if similar git history checks are done in other repos.

P.S. Let's also not spam distro security teams with this (CC's dropped).
I'm sure they don't want tickets auto-created for such analyses, like
they would for vulnerability reports.  And I certainly don't want to
spend time removing more ticket auto-replies from our moderation queue.

Ok.


-- 
<https://www.alejandro-colomar.es/>
