funsec mailing list archives

Re: Image forensics


From: Dan Kaminsky <dan () doxpara com>
Date: Mon, 28 Dec 2009 18:44:18 +0100

I don't necessarily disagree with your assertions, Neal -- or, I at  
least think you're well within your rights as an author to take your  
particular position.

However, as an independent reviewer, I see a really small sample size
for your findings, and no ground-truth analysis.  In other words, if I
hand you 100 photos, approximately 50 of which are photoshopped and
approximately 50 of which aren't, how much better than chance will your
tools be at picking out the altered photos and determining the
alterations?

As you yourself admit, natural features can trigger your tool.  How  
often *do* they?  As you intriguingly point out, not always. This is  
good.

However.

Forensics aren't a game. People live and die over the determinations  
we make. There have...been issues, with bite mark analysis, and with  
arson determination, that have thoroughly destroyed lives, up to and  
including the death penalty.  This stuff is really important, way more  
than anything on this list.

What I would like to do is actually give you the hundred images as  
described, and receive:

A) The raw output from your tool (identical settings for all files --  
if you need multiple settings, multiply them out across all files).
B) Your interpretation of the output

I will then unmask the originals and the changes, and we can calculate
the relative effectiveness of your various approaches.
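
For concreteness, the scoring I have in mind is nothing fancier than a
confusion matrix against the ground truth.  A rough Python sketch -- the
file names and the two-column CSV layout below are placeholders, not
anything from your tooling:

    import csv

    def score(predictions_csv, ground_truth_csv):
        # Each CSV row: image_name,label  where label is "altered" or "original".
        with open(ground_truth_csv) as f:
            truth = dict(csv.reader(f))
        with open(predictions_csv) as f:
            claimed = dict(csv.reader(f))

        tp = fp = tn = fn = 0
        for image, actual in truth.items():
            guess = claimed.get(image, "original")
            if actual == "altered":
                tp += guess == "altered"
                fn += guess != "altered"
            else:
                fp += guess == "altered"
                tn += guess != "altered"

        accuracy = 100.0 * (tp + tn) / len(truth)
        print("TP=%d FP=%d TN=%d FN=%d  accuracy=%.1f%% (chance is ~50%%)"
              % (tp, fp, tn, fn, accuracy))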

I've always liked your work, Neal.  I mean that: I was a graphics geek
before I was a security geek, and you've done amazing work at the
intersection.  I just think some numbers would make it infinitely
stronger.

What do you think?


On Dec 28, 2009, at 6:13 PM, "Dr. Neal Krawetz" <hf () hackerfactor com>  
wrote:

On 27 Dec 2009, Rob, grandpa of Ryan, Trevor, Devon & Hannah wrote:
An interesting analysis of a graphic recently used by Victoria's Secret in their
advertising.  This gives chapter and verse of the techniques used, and results
obtained, demonstrating the ability to determine if an image has been altered, and
even which parts of an image have been modified, and how.

http://www.hackerfactor.com/blog/index.php?/archives/322-Body-By-Victoria.html

[snip]

Thanks for the compliments.
(I'm just catching up on my emails...)


Re: Dan Kaminsky
Neal's code is neat and pretty, but chapter and verse is no substitute
for open code and side-by-side checks. A LOT of his output bears a
strong resemblance to edge detection (really, look for high frequency
signal, it'll show up in every test).

Edges can show up for many reasons.
 - The edge may be a high frequency region (as you stated) that appears
   in the output.
 - With algorithms like ELA and LG, high contrast edges (like stripes on
   a zebra) can be at a higher error level or stronger gradient than the
   rest of the image.  However, they will not be significantly stronger.
   (If ELA has a black background, then the high contrast edge may be
   grayish, but not white.)
 - Artists usually make changes at edges to reduce visual detection.
   Think about it: if you are going to cut out or mask something, you are
   going to do it along the edge.  In the VS example, her outline is
   visible, but inside edges are not.  If the algorithms were only
   picking up edges, then all edges (inside, outside, and outline) should
   be at the same level.  They are not.

As a counterexample to your edge theory, consider:
http://www.hackerfactor.com/blog/index.php?/archives/338-Id-Rather-Wear-Photoshop.html
(If you get a 503 server error, just reload.  GoDaddy's server is having
trouble with the concurrent connection load right now.  This will be
fixed in January.)
In the Error Level Analysis, the halo totally disappears, even though it
is a high contrast and high frequency element (white on dark).
If the algorithm were measuring edges, then the halo should still be
visible, at least to some degree.
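
For readers who have not seen ELA before, the resave-and-difference idea
can be sketched in a few lines of Python.  This is only an illustration
of the published description, not my production code, and the quality
and amplification constants here are arbitrary:

    from io import BytesIO
    from PIL import Image, ImageChops

    def ela(path, quality=95, scale=20):
        # Resave the image at a known JPEG quality, then amplify the
        # per-pixel difference from the original.  Regions whose error
        # level differs from the rest of the image stand out.
        original = Image.open(path).convert("RGB")
        buffer = BytesIO()
        original.save(buffer, "JPEG", quality=quality)
        buffer.seek(0)
        resaved = Image.open(buffer)
        diff = ImageChops.difference(original, resaved)
        return diff.point(lambda value: min(255, value * scale))

The output coloring is purely cosmetic; the analysis is in the
difference image itself.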

Second, with regard to "open code", I strongly disagree with your
assumption.  You seem to assume that releasing the code will allow
people to validate the methods.

- If I release my own tool, then they will just use it and look at the
  results.  This does not validate the code or the methods.

- If I don't release my own tools, but describe the algorithms, then
  people will create their own and perform a more scientific comparison.

If you create your own tool that implements a variation of the  
algorithm(s)
and you cannot generate the same kind of results, then there is either
something wrong with your code or with mine.  Now we can do a proper
comparison.  We have a hypothesis and multiple tools to test it.

As an example, I have implemented my own PCA, DCT, and wavelet libraries.
(I couldn't use any of the public ones due to GPL issues.)  To validate
my libraries, I compared the results with GSL and other public libraries.
Since GSL and the other public libraries generate the same output as
my own library, that validates both the implementation and the method.
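
To make the cross-validation idea concrete, here is a toy example: a
naive one-dimensional type-II DCT checked against a reference library
(SciPy here, for brevity; this is an illustration, not my library code):

    import numpy as np
    from scipy.fft import dct

    def naive_dct2(x):
        # Textbook O(N^2) type-II DCT, unnormalized, matching the
        # convention of scipy.fft.dct(x, type=2, norm=None).
        x = np.asarray(x, dtype=float)
        N = len(x)
        n = np.arange(N)
        return np.array([2.0 * np.sum(x * np.cos(np.pi * k * (2 * n + 1) / (2 * N)))
                         for k in range(N)])

    signal = np.random.rand(64)
    assert np.allclose(naive_dct2(signal), dct(signal, type=2, norm=None))

If two independent implementations agree across many inputs, each one
validates the other; that is exactly the comparison I did against GSL.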

Thus, to validate the algorithms I use, someone else needs to implement
something based on the description of the algorithm.  Someone has already
implemented ELA based on the description in my Black Hat presentation:
 http://www.tinyappz.com/wiki/Error_Level_Analyser
His tool uses a different coloring (he decided to use a temperature map),
but it generates results that are similar enough to validate the
algorithm and implementation.

There is another group that is working on their own variation of  
Luminance
Gradient, but they have not yet released their code. (And I don't  
know if
they plan to.)  Then again, my LG implementation is not unique.   
There are
dozens of published papers that implement variations of the algorithm.
The algorithm I use is one of the most trivial methods (but it is fast
and effective).
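
For anyone who wants to experiment, one of those trivial variants looks
roughly like this (an illustration of the general idea, not the exact
code in my tool):

    import numpy as np
    from PIL import Image

    def luminance_gradient(path):
        # Per-pixel gradient of the luminance channel: magnitude shows
        # how quickly brightness changes, angle shows in which direction.
        lum = np.asarray(Image.open(path).convert("L"), dtype=float)
        gy, gx = np.gradient(lum)          # central differences per axis
        magnitude = np.hypot(gx, gy)
        angle = np.arctan2(gy, gx)
        return magnitude, angle

A common way to render the result is to map the angle to hue and the
magnitude to brightness.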

Finally, I have no intention of releasing my code to the open source
community.  My code is designed to assist forensic investigators  
with a
serious problem: distinguishing real photos from computer graphics,  
and
identifying manipulation.  (This is the "real vs virtual" child porn
problem.)  A full, public release only helps the bad guys.
(Yes: this is the Security by Obscurity vs Full Disclosure debate.   
I've
chosen my side.)


Re: Imri Goldberg
John Graham-Cumming's copy-move code is really pretty cool.
I wrote my own variation (based on the same paper that he cites); mine is
heavily optimized.  I described some of my optimizations at:
http://www.hackerfactor.com/blog/index.php?/archives/308-Send-In-The-Clones.html
There is even a group working on their own variation:
 http://www.tinyappz.com/wiki/Copymove
(If John's code, my code, and Tinyappz all generate similar results,  
then
the algorithm must work and the methodology must be sound!)
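
For reference, the displacement-counting idea behind these copy-move
detectors can be sketched briefly.  The published method sorts DCT-based
block features; the toy version below just hashes coarsely quantized raw
blocks, and it is nothing like the optimized implementations mentioned
above:

    import numpy as np
    from PIL import Image
    from collections import Counter

    def copy_move_offsets(path, block=16, step=4, min_hits=10):
        # Hash overlapping blocks of the luminance channel and count how
        # often identical content appears at two places with the same
        # displacement.  A displacement repeated many times suggests a
        # cloned (copy-moved) region.
        lum = np.asarray(Image.open(path).convert("L"))
        seen = {}              # block fingerprint -> first position found
        offsets = Counter()    # displacement vector -> number of matches
        height, width = lum.shape
        for y in range(0, height - block, step):
            for x in range(0, width - block, step):
                # Coarse quantization so near-identical blocks collide.
                fingerprint = (lum[y:y + block, x:x + block] // 8).tobytes()
                if fingerprint in seen:
                    y0, x0 = seen[fingerprint]
                    offsets[(y - y0, x - x0)] += 1
                else:
                    seen[fingerprint] = (y, x)
        return [(offset, hits) for offset, hits in offsets.most_common()
                if hits >= min_hits]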


Re: Martin Tomasek
I like wavelet-based algorithms the most.

To each their own. :-)
Wavelets definitely have some strong points.
But for signal analysis, I'm actually growing very fond of Gaussian
Pyramid Decomposition.
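
For anyone unfamiliar with it, the decomposition itself is the classic
blur-and-downsample construction.  A minimal sketch (just the pyramid;
none of the analysis I run on top of it):

    from PIL import Image, ImageFilter

    def gaussian_pyramid(path, levels=5, radius=1.0):
        # Repeatedly blur and halve the image.  Differences between
        # successive levels isolate detail at different spatial scales.
        img = Image.open(path).convert("L")
        pyramid = [img]
        for _ in range(levels - 1):
            blurred = pyramid[-1].filter(ImageFilter.GaussianBlur(radius=radius))
            w, h = blurred.size
            pyramid.append(blurred.resize((max(1, w // 2), max(1, h // 2))))
        return pyramid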

                   -Neal
--
Neal Krawetz, Ph.D.
Hacker Factor Solutions
http://www.hackerfactor.com/
Author of "Introduction to Network Security" (Charles River Media,  
2006)
and "Hacking Ubuntu" (Wiley, 2007)

_______________________________________________
Fun and Misc security discussion for OT posts.
https://linuxbox.org/cgi-bin/mailman/listinfo/funsec
Note: funsec is a public and open mailing list.

