Optimising away memset() calls?

Post by Sandy Harris
some_function()
{
char temp[N] ;
...
do something that puts sensitive data in temp[]
....
memset( temp, 0, N ) ;
}
The claim is that gcc may optimise away the memset() call since that
memory will not be referenced again.

True, the difference cannot be observed in a standard program, so it makes
sense that you have to say something special to convince the compiler not
to optimize it.

Post by Sandy Harris
There are various solutions to this. Linux now has memzero_explicit(),

I expected the kernel to be compiled with -fno-builtin-memset (or some
other option that implies this one), which would disable this
optimization.

Post by Sandy Harris
" A real fix would make memset() do the right thing reliably; if the
" programmer puts in memset( x, 0, nbytes) then the memory should
" be cleared, no ifs or buts.

If the programmer writes 1+1, the compiler should emit an add instruction,
no ifs or buts?

--
Marc Glisse

Andrew Haley

2014-10-08 08:29:37 UTC

Post by Sandy Harris
" A real fix would make memset() do the right thing reliably; if the
" programmer puts in memset( x, 0, nbytes) then the memory should
" be cleared, no ifs or buts. I do not know or care if that means
" changes in the compiler or in the library code or even both, but
" the fix should make the standard library code work right, not
" require adding a new function and expecting everyone to use it.
It seemed worth tossing this out for comment here.

That's not going to happen, given that memset_s() is a standard
idiom for this operation.

As it stands, even a call to memset_s() isn't sufficient: another
thread may not observe the memset_s() until a barrier has been
executed (and maybe not even then!) but I suppose it's better than
nothing.

Andrew.

Florian Weimer

2014-10-09 10:18:02 UTC

Post by Sandy Harris
There are various solutions to this. Linux now has memzero_explicit(),
Open SSH has bzero_explicit(), C11 has memset_s().

Minor nit: The C11 standard still allows memset_s to be optimized away
if this does not cause an observable difference in behavior (in C
terms). I know the intent is different, but this is impossible to
address within the standard, considering the direction in which the
language has developed over the last decades.

--
Florian Weimer / Red Hat Product Security

Andrew Haley

2014-10-09 10:33:16 UTC

Post by Sandy Harris
There are various solutions to this. Linux now has memzero_explicit(),
Open SSH has bzero_explicit(), C11 has memset_s().

The problem is that what constitutes an access to memory is not
defined, so no definition can ever be possible. Nonetheless, the
intent is clear, and we can only talk about implementations in this
context.

But really this is hopeless: there is nothing to prevent a C
implementation from taking another copy of a key and stashing it
somewhere, and it is not at all unlikely, for at least part of the
key. Anybody who really cares about this kind of thing must do some
machine-specific coding.

Andrew.

Ángel González

2014-10-09 20:52:35 UTC

Post by Sandy Harris
There are various solutions to this. Linux now has memzero_explicit(),
Open SSH has bzero_explicit(), C11 has memset_s().

How so?

Post by Florian Weimer
Unlike memset, any call to
the memset_s function shall be evaluated strictly according to the
rules of the abstract
machine as described in (5.1.2.3). That is, any call to the memset_s
function shall
assume that the memory indicated by s and n may be accessible in the
future and thus
must contain the values indicated by c.

For all the C compler knows, memset_s library function might be storing
a pointer to s in a
global variable, and checking its value on every libc call (including
exit functions).
The compiler would need to know that memset_s is special (either
intrinsically or thorugh
eg. function attributes). Either way, IMHO an advanced knowledge
allowing to optimize it out
would be a violation of K.3.7.4.1.
The only misbehavior route I see would be the compiler using undefined
behavior elsewhere
for deciding that the whole path is unreachable.

Regards

David Brown

2014-10-10 07:46:56 UTC

Post by Ãngel GonzÃ¡lez

Post by Sandy Harris
There are various solutions to this. Linux now has memzero_explicit(),
Open SSH has bzero_explicit(), C11 has memset_s().

How so?

When a function is specified in the C standards, then the compiler
/does/ know all about it. It knows that the memset_s library function
does not "store s in a global variable", because the C standard does not
allow it to do that (or at least, it does not allow such an action to be
visible to the program). And the compiler is free to implement memset_s
in any way it wants, including inlining it or perhaps even removing it
as long as the behaviour is correct as seen by the C abstract machine.
This is complicated by the fact that the standards don't actually
specify what is meant by things like "memory accesses".

Adding to that, as has been noted by others, particular architectures
might need things like memory barriers, cache flushes, synchronisation
instructions, etc., in order for the writes to be visible across the
system. The C compiler knows nothing about these things (it can provide
helpful intrinsic functions, but can't use them automatically), because
the C standards don't cover them.

So the only way to be absolutely sure that a memory area really is
cleared is to use an external function that the compiler does not know
about, and which also incorporates any required additional
machine-specific code. Thus you need to use memzero_explicit(),
bzero_explicit(), or equivalent.

Ángel González

2014-10-10 22:34:30 UTC

Post by David Brown
When a function is specified in the C standards, then the compiler
/does/ know all about it. It knows that the memset_s library function
does not "store s in a global variable", because the C standard does not
allow it to do that (or at least, it does not allow such an action to be
visible to the program).

I was refering both compilers not knowing about memset_s definition and
those that do. For the later, the compiler knows that memset_s won't store
s in a global variable, but the semantic of "shall assume that the memory
indicated by s and n may be accessible in the future and thus must contain
the values indicated by c" would be equivalent in this aspect to "the
pointer
is stored in a global variable".

Post by David Brown
And the compiler is free to implement memset_s
in any way it wants, including inlining it

That shouldn't be a problem.

Post by David Brown
or perhaps even removing it
as long as the behaviour is correct as seen by the C abstract machine.

My point was that both things (removing + correct behaviour) could not
be done.
(I was midly expecting someone to readily present a counterexample, though)

Post by David Brown
This is complicated by the fact that the standards don't actually
specify what is meant by things like "memory accesses".
Adding to that, as has been noted by others, particular architectures
might need things like memory barriers, cache flushes, synchronisation
instructions, etc., in order for the writes to be visible across the
system. The C compiler knows nothing about these things (it can provide
helpful intrinsic functions, but can't use them automatically), because
the C standards don't cover them.

This is an interesting point. I agree that the compiler could reorder
the memset_s call,
but I don't think that more than delaying it a few statements for which
it can prove
they don't access that area. the next library call (even if it is in the
C spec, remember
that the way they are implemented is not defined).

Thus, the zeroed contents might not be immediatly available for an
omniscient inspector,
but they would in a small delta. Or, if we have a bizarre architecture
needing a barrier in
order to "commit" the memory write, that shall be performed by memset_s
for fullfilling
the "must contain the values indicated by c" requeriment.

It is true that you may need special instructions for ensuring the new
value from a concurrent
thread (I don't consider memset_s suitable for clearing a spinlock), but
C doesn't deal with
threads or shared memory, thus you are . Then you either use another
primitive to synchronize
them (and at that point the memory will have to be memsetted), or they
are in a race condition
and there's nothing specified for what it may contain.

I also speculated with the idea of a processor that optimized the
microcode in such a way that ended
up removing the memset_s call, but concluded that (in addition of the
cpu requiring such global
knowledge not to be realistic) then memset_s would have to include its
special barriers for fulfilling
the "shall assume that the memory indicated by s and n may be accessible
in the future and thus
must contain the values indicated by c" if it was otherwise implemented
with such instructions that
would allow the processor not to store the values in memory.

Post by David Brown
So the only way to be absolutely sure that a memory area really is
cleared is to use an external function that the compiler does not know
about, and which also incorporates any required additional
machine-specific code. Thus you need to use memzero_explicit(),
bzero_explicit(), or equivalent.

And is architecture specific and not portable. IMHO memset_s serves the
task equally well, with the benefit
of being a standard function.

PS: As I was finishing this email, I thought the following dull
implementation (error checking skipped):

errno_t memset_s(void *s, rsize_t smax, int c, rsize_t n) {
size_t i; unsigned char *tmp;
tmp = malloc(n);
for (i = 0; i < n; i++) {
tmp[i] = ((unsigned char *)s)[i] ^(unsigned char) c;
}
for (i = 0; i < n; i++) {
((unsigned char *)s)[i] ^= tmp[i];
}
}

Although completely missing the point, I think it would be conformant.*
If you know that a specific implementation
is flawed, please avoid it (you can use a replacement) or, better yet,
replace your libc with a good one.

Andrew Haley

2014-10-10 09:20:03 UTC

Post by Ãngel GonzÃ¡lez
The compiler would need to know that memset_s is special (either
intrinsically or thorugh eg. function attributes). Either way, IMHO
an advanced knowledge allowing to optimize it out would be a
violation of K.3.7.4.1.

It would be a perverse thing to do and goes against intent, but we
again fall into the problem of defining an access. But this is
irrelevant anyway: even if a key is stored in an array X in the source
code and the array X is later wiped with memset_s(), there is
absolutely nothing to force the compiler to use X during the
computation: it may well store the key somewhere else altogether. So,
all we can have here is a best effort. Anyone who wants to be sure
that the key is wiped is going to have to do something machine-
dependent.

Andrew.

Sandy Harris

2014-10-10 13:10:06 UTC

It would be a perverse thing to do and goes against intent, but we
again fall into the problem of defining an access. ... So,
all we can have here is a best effort. Anyone who wants to be sure
that the key is wiped is going to have to do something machine-
dependent.

I agree there may have to be some machine-dependent code, but
it seems to me it should be in the definition of memset_s(). The
library code (or the compiler if it provides it as a built-in) should
deal with this so application programmers do not have to.

That is one of the main reasons to have library functions. Almost
any programmer could write a version of memset() easily, but we
use the library function instead and let library implementers
worry about things like optimizing for a particular word size or
using assembler for speed. Similarly, making memset_s() live
up to the spec is a problem for the library implementers.

Currently glibc, at least on my Ubuntu box, does not have
memset_s(), so Linux kernel code has memzero_explicit().
That is OK as a short-term solution, but memsets_s() would
be a cleaner solution.

Andrew Haley

2014-10-10 17:29:18 UTC

Post by Sandy Harris

I don't think you read what I wrote. Or think it said something
different from what I intended.

Andrew.

Ángel González

2014-10-10 17:25:39 UTC

This is a different problem: While the compiler must wipe X, the compiler
might have stored copies somewhere else (typically in the stack, although
it would be possible to leave sensitive data in registers that aren't
overwritten
later).

Florian Weimer

2014-10-11 10:07:16 UTC

It's not even that hard to come up with an example where calling
memset_s causes the creation of another copy which is the only one which
is being cleared:

#include <stddef.h>

struct key {
unsigned long long low;
unsigned long long high;
};

struct key get_key(void);
void use_key(struct key);

void secure_clear_memory(void *, size_t);

void
without_clear(void)
{
struct key k;
k = get_key();
use_key(k);
}

void
with_clear(void)
{
struct key k;
k = get_key();
use_key(k);
secure_clear_memory(&k, sizeof(k));
}

without_clear:
subq $8, %rsp
call get_key
movq %rdx, %rsi
movq %rax, %rdi
call use_key
addq $8, %rsp
ret

with_clear:
subq $24, %rsp
call get_key
movq %rax, %rdi
movq %rdx, %rsi
movq %rax, (%rsp)
movq %rdx, 8(%rsp)
call use_key
movq %rsp, %rdi
movl $16, %esi
call secure_clear_memory
addq $24, %rsp
ret

--
Florian Weimer / Red Hat Product Security

Ángel González

2014-10-11 20:20:01 UTC