nanog mailing list archives

Re: Most energy efficient (home) setup


From: Jimmy Hess <mysidia () gmail com>
Date: Sun, 15 Apr 2012 19:12:55 -0500

On Sun, Apr 15, 2012 at 5:35 PM, Mike <ispbuilder () gmail com> wrote:

It's not as if ECC memory requires a lot of power or a full-blown ATX
board; there is the Intel S1200KP Mini-ITX board, for example.

See,
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.117.5936&rep=rep1&type=pdf

But the exact rate of single-bit errors in non-ECC memory today is not
necessarily predictable from studies done in the 90s, and it also
depends on environment: local lightning; solar activity, which has
been increasing lately; how much extra shielding you have in place
(server placed inside a Faraday cage or lead box?); and so on. You'd
need measurements for your specific hardware; there are likely
dependencies on the size of the memory cells, the vertical cross
section, and other components in the system.


> I think the simple test for this problem is to take a non-ECC machine, boot
> from a CD/USB Key/etc with memtest or memtest86+ on it, and see if you get
> errors over the course of a few days.

Memtest86+ contains a series of tests that help uncover specific
kinds of common memory faults. At any particular point during a
memtest run, only a confined range of physical memory addresses is
under test; a bit flip anywhere else won't be detected.

Which means that Memtest is not likely to detect this kind of error.
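
To illustrate the point, here is a toy Python model (my own
simplification, not Memtest86+ code): memory is swept one window at a
time, each window is written with a known pattern and verified, and a
single transient flip is injected at a chosen step. The function name
and parameters are hypothetical.

```python
def sweep_detects_flip(total_cells, window_cells, flip_cell, flip_step):
    """Return True if a windowed write-then-verify sweep notices the flip.

    Toy model only: one pass over memory, one window per step, and a
    single transient bit flip injected at `flip_step` into `flip_cell`.
    """
    memory = [0] * total_cells
    n_windows = total_cells // window_cells
    for step in range(n_windows):
        start = step * window_cells
        window = range(start, start + window_cells)
        # Write the known pattern into the window currently under test.
        for i in window:
            memory[i] = 1
        # The upset happens while this window is under test.
        if step == flip_step:
            memory[flip_cell] ^= 1
        # Verify only the window under test, as the sweep would.
        if any(memory[i] != 1 for i in window):
            return True
    return False
```

In this model a flip is caught only when it lands in the window being
tested at that moment; with 16 cells and a 4-cell window that is 16 of
the 64 possible (cell, step) combinations. A flip elsewhere is either
overwritten by the next pattern write or never read back.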

Test #11 (Bit-Fade), with modifications, could have some promise. You
need a 24-hour delay instead of a 5-minute delay. You need to have
close to the entire physical address space under test. And you need
truly random bit values stored to some "reliable" medium, instead of
the shortcut of storing known bit patterns.
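
A rough user-space sketch of that modified bit-fade idea, in Python.
This is not Memtest86+ code and the function names are hypothetical. It
makes two loud assumptions: the reference copy is regenerated from a
PRNG seed rather than written to a separate reliable medium, and it can
only cover the buffer the process allocates, nowhere near the entire
physical address space.

```python
import os
import random
import time


def reference_pattern(size_bytes, seed):
    """Regenerate the expected random bytes from the seed (size_bytes >= 1)."""
    rng = random.Random(seed)
    return rng.getrandbits(8 * size_bytes).to_bytes(size_bytes, "little")


def fill_random(size_bytes, seed):
    """Allocate the buffer under test and fill it with the seeded pattern."""
    return bytearray(reference_pattern(size_bytes, seed))


def count_flipped_bits(buf, seed):
    """Compare the buffer against a freshly regenerated reference copy."""
    expected = reference_pattern(len(buf), seed)
    return sum(bin(a ^ b).count("1") for a, b in zip(buf, expected))


def bit_fade_check(size_bytes, hold_seconds, seed=None):
    """Fill, leave the memory idle for hold_seconds, then count bit flips."""
    if seed is None:
        # Assumption: the seed is small enough to log or write to disk,
        # standing in for the "reliable" medium.
        seed = int.from_bytes(os.urandom(8), "big")
    buf = fill_random(size_bytes, seed)
    time.sleep(hold_seconds)
    return count_flipped_bits(buf, seed)
```

Something like bit_fade_check(1 << 30, 24 * 3600) would hold 1 GiB for
a day, though generating the pattern that way is slow and the OS may
swap or move the pages, so treat it as an illustration of the method
rather than a usable test.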

*Memtest86+ itself and the system BIOS have to be stored in memory or
CPU cache somewhere, so not every address can be under test.
Then again, a random bit flip in non-ECC CPU L2 cache is also a
possibility, though software like memtest, if suitably modified, could
be made to detect a 1-bit error that showed up in the majority of the
memory addresses.


--
-JH

