Discussion:
is portable aliasing possible in C++?
Andy Webber
2014-09-04 16:11:36 UTC
Permalink
Our goal is to avoid bugs caused by strict aliasing in our networking
libraries. My question is how to guarantee that we're not violating
the aliasing rules while also getting the most optimization. I've
read through a ton of information about this online and in some gcc
discussions, but I don't see a consensus.

Memcpy always works, but is dependent on optimization to avoid copies.
The union of values is guaranteed to work by C++11, but may involve
copies. In the past we've always used reinterpret_cast, but I don't
believe that to be guaranteed by the standard. The __may_alias__
attribute is specific to gcc. Placement new changes the dynamic type
of the memory which I believe guarantees correct access through a
layout*.

Three relevant gcc links:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29286#c1 (placement new
does not change the dynamic type as it should)
https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01440.html (Strenghten
assumption about dynamic type changes (placement new))
https://gcc.gnu.org/gcc-4.4/porting_to.html (gcc 4.4 release notes)

Each test works when built with -O3 on gcc-4.8.3, but I would like to
standardize across compilers and versions. The optimization
information generated by -fdump-tree-all is interesting here as it
shows slightly different optimization for each case though
reinterpret_cast and placement new generate identical code in the end.

Here are tests illustrating the different options:

#include <cstdlib>  // std::abort
#include <cstring>  // std::memcpy
#include <new>      // placement new

// Note: the byte patterns below assume a little-endian target on which
// sizeof(layout) == 8 (4-byte int, 2-byte short).
struct layout
{
    int i;
    short s;
    char c;
};

template<typename T, typename U>
void verify(T t, U u)
{
    if(t != u)
        std::abort();
}

// Option 1: copy the bytes into a real layout object.
void mem_copy()
{
    char storage[sizeof(layout)];
    storage[0] = 1;
    storage[1] = 0;
    storage[2] = 0;
    storage[3] = 0;
    storage[4] = 2;
    storage[5] = 0;
    storage[6] = 3;
    storage[7] = 0;

    layout l;

    std::memcpy(&l, &storage, sizeof(layout));

    verify(l.i, 1);
    verify(l.s, 2);
    verify(l.c, 3);
}

// Option 2: write through one union member, read through the other.
void union_copy()
{
    union both
    {
        char storage[sizeof(layout)];
        layout l;
    };

    both b;

    b.storage[0] = 1;
    b.storage[1] = 0;
    b.storage[2] = 0;
    b.storage[3] = 0;
    b.storage[4] = 2;
    b.storage[5] = 0;
    b.storage[6] = 3;
    b.storage[7] = 0;

    verify(b.l.i, 1);
    verify(b.l.s, 2);
    verify(b.l.c, 3);
}

// Option 3: placement new over the already-filled bytes.
void placement_new()
{
    char storage[sizeof(layout)];
    storage[0] = 1;
    storage[1] = 0;
    storage[2] = 0;
    storage[3] = 0;
    storage[4] = 2;
    storage[5] = 0;
    storage[6] = 3;
    storage[7] = 0;

    auto l = new(storage) layout;

    verify(l->i, 1);
    verify(l->s, 2);
    verify(l->c, 3);
}

// Option 4: gcc-specific __may_alias__ type.
struct layout_alias: layout {} __attribute__((__may_alias__));
void may_alias()
{
    char storage[sizeof(layout)];
    storage[0] = 1;
    storage[1] = 0;
    storage[2] = 0;
    storage[3] = 0;
    storage[4] = 2;
    storage[5] = 0;
    storage[6] = 3;
    storage[7] = 0;

    auto l = reinterpret_cast<layout_alias*>(storage);

    verify(l->i, 1);
    verify(l->s, 2);
    verify(l->c, 3);
}

// Option 5: plain reinterpret_cast.
void reint_cast()
{
    char storage[sizeof(layout)];
    storage[0] = 1;
    storage[1] = 0;
    storage[2] = 0;
    storage[3] = 0;
    storage[4] = 2;
    storage[5] = 0;
    storage[6] = 3;
    storage[7] = 0;

    auto l = reinterpret_cast<layout*>(storage);
    verify(l->i, 1);
    verify(l->s, 2);
    verify(l->c, 3);
}


int main()
{
    mem_copy();
    union_copy();
    placement_new();
    may_alias();
    reint_cast();
}
Andrew Haley
2014-09-04 16:51:26 UTC
Permalink
Post by Andy Webber
Our goal is to avoid bugs caused by strict aliasing in our networking
libraries. My question is how to guarantee that we're not violating
the aliasing rules while also getting the most optimization. I've
read through a ton of information about this online and in some gcc
discussions, but I don't see a consensus.
Memcpy always works, but is dependent on optimization to avoid copies.
The union of values is guaranteed to work by C++11, but may involve
copies.
Is this a real worry? IME it makes copies when it needs to.
Post by Andy Webber
Each test works when built with -O3 on gcc-4.8.3, but I would like to
standardize across compilers and versions. The optimization
information generated by -fdump-tree-all is interesting here as it
shows slightly different optimization for each case though
reinterpret_cast and placement new generate identical code in the end.
The "union trick" has always worked with GCC, and is now hallowed by
the standard. It's also easy to understand. It generates code as
efficient as all the other ways of doing it, AFAIAA. It's what we
have always recommended.

Your test is nice. I suppose we could argue that this is a missed
optimization:

union_copy():
movl $2, %eax
cmpw $2, %ax
jne .L13

I don't know why we only generate code for one of the tests.

Andrew.
Andy Webber
2014-09-04 17:18:06 UTC
Permalink
Thanks for responding. I appreciate any clarity that the gcc devs and
standards experts can give here.

I'm especially interested in the validity of the placement new
approach. Your recommendation of going through unions causes some
difficulty for us in terms of type abstraction. Specifically,
receiving network bytes directly into a union with all possible
message types present in the union is somewhat less flexible than
determining the correct message type and doing a placement new to
create essentially a memory overlay. Is placement new a suitable
substitute for __may_alias__ in this specific example?

Andy
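A minimal sketch of the "determine the message type, then overlay" idea, done with memcpy instead of placement new so that it stays within the rules discussed in this thread. The message structs, the tag values, and the names `read_payload` and `handle` are all hypothetical and exist only for illustration; the wire format (one tag byte followed by the raw struct bytes) is likewise an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical wire messages; names and fields are invented for this sketch.
struct ping_msg  { std::uint32_t seq; };
struct quote_msg { std::uint32_t id; std::uint16_t price; };

static_assert(std::is_trivially_copyable<ping_msg>::value,
              "memcpy-based decoding needs a trivially copyable type");
static_assert(std::is_trivially_copyable<quote_msg>::value,
              "memcpy-based decoding needs a trivially copyable type");

// Hypothetical tag values carried in the first byte of each packet.
enum : unsigned char { TAG_PING = 1, TAG_QUOTE = 2 };

// Copy sizeof(T) bytes out of the buffer into a real T object.
// Returns false if the buffer is too short.
template <typename T>
bool read_payload(const unsigned char* p, std::size_t len, T& out)
{
    assert(p != nullptr);
    if (len < sizeof(T))
        return false;
    std::memcpy(&out, p, sizeof(T));
    return true;
}

// The first byte selects the message type; the payload follows it.
// Returns one field of the decoded message, or -1 on any error.
long handle(const unsigned char* buf, std::size_t len)
{
    if (buf == nullptr || len < 1)
        return -1;
    switch (buf[0]) {
    case TAG_PING: {
        ping_msg m;
        return read_payload(buf + 1, len - 1, m) ? long(m.seq) : -1;
    }
    case TAG_QUOTE: {
        quote_msg m;
        return read_payload(buf + 1, len - 1, m) ? long(m.price) : -1;
    }
    default:
        return -1;
    }
}
```

Only the selected message type is ever materialized, so no union of all message types is needed; whether the copy is elided is up to the optimizer.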
Andrew Haley
2014-09-04 17:23:33 UTC
Permalink
Regrettably,
Post by Andy Webber
Post by Andrew Haley
Post by Andy Webber
Our goal is to avoid bugs caused by strict aliasing in our networking
libraries. My question is how to guarantee that we're not violating
the aliasing rules while also getting the most optimization. I've
read through a ton of information about this online and in some gcc
discussions, but I don't see a consensus.
Memcpy always works, but is dependent on optimization to avoid copies.
The union of values is guaranteed to work by C++11, but may involve
copies.
Is this a real worry? IME it makes copies when it needs to.
Post by Andy Webber
Each test works when built with -O3 on gcc-4.8.3, but I would like to
standardize across compilers and versions. The optimization
information generated by -fdump-tree-all is interesting here as it
shows slightly different optimization for each case though
reinterpret_cast and placement new generate identical code in the end.
The "union trick" has always worked with GCC, and is now hallowed by
the standard. It's also easy to understand. It generates code as
efficient as all the other ways of doing it, AFAIAA. It's what we
have always recommended.
Your test is nice. I suppose we could argue that this is a missed
movl $2, %eax
cmpw $2, %ax
jne .L13
I don't know why we only generate code for one of the tests.
Thanks for responding. I appreciate any clarity that the gcc devs and
standards experts can give here.
I'm especially interested in the validity of the placement new
approach. Your recommendation of going through unions causes some
difficulty for us in terms of type abstraction. Specifically,
receiving network bytes directly into a union with all possible
message types present in the union is somewhat less flexible than
determining the correct message type and doing a placement new to
create essentially a memory overlay. Is placement new a suitable
substitute for __may_alias__ in this specific example?
I regret that the exact legality of placement new in this context is
beyond me. I think it's OK as long as you only do it with POD-types, but
I'd have to bounce this off someone like Jason Merrill.

Andrew.
Andy Webber
2014-09-04 17:44:47 UTC
Permalink
Know of any way to ask Jason Merrill or Richard Biener to weigh in?
They seem to be very knowledgeable in this area.
Andy Webber
2014-09-04 17:47:35 UTC
Permalink
Sorry, didn't mean to top post the last reply.

Know of any way to ask Jason Merrill or Richard Biener to weigh in?
They seem to be very knowledgeable in this area.
Andrew Haley
2014-09-04 17:48:05 UTC
Permalink
Post by Andy Webber
Know of any way to ask Jason Merrill or Richard Biener to weigh in?
They seem to be very knowledgeable in this area.
They should be. Drop them an email, with a pointer to the thread:

https://gcc.gnu.org/ml/gcc-help/2014-09/msg00030.html
Jonathan Wakely
2014-09-04 23:11:51 UTC
Permalink
Post by Andrew Haley
The "union trick" has always worked with GCC, and is now hallowed by
the standard. It's also easy to understand. It generates code as
efficient as all the other ways of doing it, AFAIAA. It's what we
have always recommended.
Type punning with unions is allowed in C since C99 but not by any C++
standard (although it does work with GCC).

Placement new might work with GCC in practice as long as the buffer is
correctly aligned and the type being constructed does not have
non-trivial initialization. However, my reading of the standard is
that after the placement new, if the object's members are not
initialized then they have indeterminate values (not the values that
were at those memory addresses already, even though that's likely to
be what happens in practice). Objects with indeterminate values can
only be used in very limited ways and generally lead to undefined
behaviour, and code with undefined behaviour does not usually mix well
with aggressive optimizations.

FWIW I prefer the memcpy approach: it usually generates the same code
as type punning via a union, but is more portable and has guaranteed
results that are well-defined (and if you're interested in maximum
performance then you should not be using unoptimized code, so it
shouldn't matter that the memcpy solution relies on optimizations).
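As a sketch of the memcpy approach described above, assuming a trivially copyable target type: the helper names `load_as` and `store_as` are hypothetical, not part of any library. Whether the copy is optimized down to a single load depends on the compiler and optimization level.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Read a T out of raw bytes without ever forming a T* into the buffer.
// The source may have any alignment.
template <typename T>
T load_as(const void* src)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "load_as requires a trivially copyable type");
    assert(src != nullptr);
    T value;
    std::memcpy(&value, src, sizeof(T));
    return value;
}

// Write the object representation of a T into raw bytes.
template <typename T>
void store_as(void* dst, const T& value)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "store_as requires a trivially copyable type");
    assert(dst != nullptr);
    std::memcpy(dst, &value, sizeof(T));
}
```

A call such as `load_as<std::uint32_t>(buf + 1)` reads four bytes from an odd offset without any aliasing or alignment concern.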
Andrew Haley
2014-09-05 07:16:18 UTC
Permalink
Post by Jonathan Wakely
FWIW I prefer the memcpy approach, it usually generates the same code
as type punning via a union,
You wouldn't say that if you'd tried the test code provided by the OP.

Andrew.
Jason Merrill
2014-09-05 14:19:18 UTC
Permalink
Post by Jonathan Wakely
Placement new might work with GCC in practice as long as the buffer is
correctly aligned and the type being constructed does not have
non-trivial initialization. However, my reading of the standard is
that after the placement new, if the object's members are not
initialized then they have indeterminate values (not the values that
were at those memory addresses already, even though that's likely to
be what happens in practice).
I think that's a defect in the (non-normative) note in 5.3.4/17, which
is assuming that this follows from the rule in 8.5/12. But I don't
think it does, because in this case the storage has been initialized and
therefore is no longer indeterminate.

So I think the placement new form is OK.

Jason
Richard Biener
2014-09-08 09:33:49 UTC
Permalink
Post by Jonathan Wakely
Placement new might work with GCC in practice as long as the buffer is
correctly aligned and the type being constructed does not have
non-trivial initialization. However, my reading of the standard is
that after the placement new, if the object's members are not
initialized then they have indeterminate values (not the values that
were at those memory addresses already, even though that's likely to
be what happens in practice).
I think that's a defect in the (non-normative) note in 5.3.4/17, which is
assuming that this follows from the rule in 8.5/12. But I don't think it
does, because in this case the storage has been initialized and therefore is
no longer indeterminate.
So I think the placement new form is OK.
Huh? Doesn't placement new end the lifetime of the object that resided
at the address (by re-using memory) and start lifetime of a new object
at that address?

So how can the new object see the storage contents of the old object?

GCC certainly doesn't support type-punning via placement new.

As with all methods you might be lucky with recent enough GCC
versions as if they see a must-alias they don't try to disambiguate
further with TBAA.

Richard.
Jason Merrill
2014-09-10 14:31:36 UTC
Permalink
I asked the C++ committee core language mailing list about this, and
everyone who has weighed in seems to agree that after placement new the
value of the object is indeterminate, so no type-punning.

Jason
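Given that conclusion, one way to still use placement new is to create the object first and give it a value afterwards. The sketch below assumes the wire bytes live in a separate buffer from the object's storage and that the storage is suitably aligned; `make_from_bytes` is a hypothetical helper name, and `layout` is the struct from the original test.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <new>
#include <type_traits>

struct layout
{
    int i;
    short s;
    char c;
};

static_assert(std::is_trivially_copyable<layout>::value,
              "copying bytes into layout requires a trivially copyable type");

// Create a layout in caller-provided storage, then populate it from bytes.
// Preconditions (assumed, not checked beyond alignment): storage holds at
// least sizeof(layout) bytes, bytes points to sizeof(layout) readable bytes,
// and the two regions do not overlap.
layout* make_from_bytes(void* storage, const unsigned char* bytes)
{
    assert(storage != nullptr && bytes != nullptr);
    assert(reinterpret_cast<std::uintptr_t>(storage) % alignof(layout) == 0);

    layout* obj = new (storage) layout;       // members are indeterminate here
    std::memcpy(obj, bytes, sizeof(layout));  // now the object has a value
    return obj;
}
```

This does not avoid the copy; it only shows that the value has to be written after the new object's lifetime begins rather than inherited from whatever was in the storage before.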
h***@yahoo.com
2014-09-09 23:12:56 UTC
Permalink
Using reinterpret_cast to convert between unrelated pointer types is a language feature that, by definition, makes your code not portable (assuming you dereference).

While it's undefined by the standard, that doesn't mean it's undefined for a given platform (OS/compiler/compiler options). For example, on an architecture which allows unaligned reads and the compiler does not perform strict-aliasing optimizations, it's likely OK (e.g. MS or Clang on x86). If the compiler does perform those optimizations and provides a way to programmatically disable them (like GCC's may_alias) and you make use of that feature correctly, it's also likely OK. But if the compiler does perform those optimizations and it doesn't provide something like may_alias, then it is undefined, and you have no choice but to disable the optimization for the whole program or resort to memcpy.

So it would seem the answer is no, portable aliasing is not possible in C++. Though I wonder if it should be (with a new language feature).

Jay Haynberg
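The platform-specific split described above could be sketched as follows: use the may_alias attribute where the compiler defines `__GNUC__`, and fall back to memcpy elsewhere. The names `u32_alias` and `read_u32` are hypothetical, and the sketch assumes the pointer is suitably aligned for a 32-bit load; it is an illustration, not a recommendation from the thread.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

#if defined(__GNUC__)
// GCC extension: accesses through this type are treated like character
// accesses for aliasing purposes.
typedef std::uint32_t __attribute__((__may_alias__)) u32_alias;
#endif

// Read a 32-bit value from a byte buffer.
// Precondition: p is aligned for std::uint32_t.
std::uint32_t read_u32(const unsigned char* p)
{
    assert(p != nullptr);
    assert(reinterpret_cast<std::uintptr_t>(p) % alignof(std::uint32_t) == 0);
#if defined(__GNUC__)
    return *reinterpret_cast<const u32_alias*>(p);
#else
    // Portable fallback: copy the bytes into a real object.
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
#endif
}
```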
Andrew Haley
2014-09-10 08:16:59 UTC
Permalink
Post by h***@yahoo.com
So it would seem the answer is no, portable aliasing is not possible
in C++. Though I wonder if it should be (with a new language
feature).
In practice, a union is exactly that, and I've not heard of a compiler
which doesn't do the right thing. Given that it's explicitly legal in
C11, I can't see any reason why C++ shouldn't simply adopt the same
language.

Andrew.
h***@yahoo.com
2014-09-10 23:03:51 UTC
Permalink
Post by Andrew Haley
Post by h***@yahoo.com
So it would seem the answer is no, portable aliasing is not possible
in C++. Though I wonder if it should be (with a new language
feature).
In practice, a union is exactly that, and I've not heard of a compiler
which doesn't do the right thing. Given that it's explicitly legal in
C11, I can't see any reason why C++ shouldn't simply adopt the same
language.
I agree. But in the union case, you're copying bytes. If you don't perform
a copy, for example:

struct msg {
    // ...
};
char *get_bytes();
msg *p = reinterpret_cast<msg*>(get_bytes());
if (p->i)
    // ...

Then there can't be a portable way to do this, because there's hardware that
doesn't permit unaligned reads (e.g. where you'd get a SIGBUS). So I guess
I retract that "I wonder if" part :-)

Jay Haynberg
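To make the alignment concern concrete, here is a small sketch. `is_aligned_for` and `read_u32_at` are hypothetical helper names; the alignment check relies on the usual (implementation-defined) mapping from pointers to integers, which holds on mainstream platforms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// True if p could legally hold a T as far as alignment is concerned.
template <typename T>
bool is_aligned_for(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Read a 32-bit value at an arbitrary byte offset. memcpy has no alignment
// requirement on its arguments, so this is valid even at odd offsets where
// a cast-and-dereference could fault on strict-alignment hardware.
std::uint32_t read_u32_at(const unsigned char* buf, std::size_t offset)
{
    assert(buf != nullptr);
    std::uint32_t v;
    std::memcpy(&v, buf + offset, sizeof v);
    return v;
}
```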
Andrew Haley
2014-09-11 08:11:03 UTC
Permalink
Post by h***@yahoo.com
Post by Andrew Haley
Post by h***@yahoo.com
So it would seem the answer is no, portable aliasing is not possible
in C++. Though I wonder if it should be (with a new language
feature).
In practice, a union is exactly that, and I've not heard of a compiler
which doesn't do the right thing. Given that it's explicitly legal in
C11, I can't see any reason why C++ shouldn't simply adopt the same
language.
I agree. But in the union case, you're copying bytes. If you don't perform
struct msg {
// ...
};
char *get_bytes();
msg *p = reinterpret_cast<msg*>(get_bytes());
Why are you doing this?
Post by h***@yahoo.com
if (p->i)
// ...
Then there can't be a portable way to do this, because there's hardware that
doesn't permit unaligned reads (e.g. where you'd get a SIGBUS).
But a union will always be correctly aligned for all members.

Andrew.
h***@yahoo.com
2014-09-11 23:25:38 UTC
Permalink
Post by Andrew Haley
Post by h***@yahoo.com
msg *p = reinterpret_cast<msg*>(get_bytes());
Why are you doing this?
For efficiency, by preventing a copy (imagine get_bytes() is getting bytes out of a socket buffer).


Putting alignment/padding concerns aside, it would be nice if there were a way to tell the compiler explicitly: I want to do this, so please don’t reorder stores and loads, or perform other strict-aliasing optimizations, on the memory pointed to by this pointer (similar to the effect of a memcpy). I believe the only way to do this is with the GCC may_alias attribute, or a more heavy-handed memory clobber. I think the OP wanted to ask the GCC folks if there was another, possibly more portable, way; for example, placement new, but that turned out not to be an option.

Another related, maybe more important, question: if GCC sees a reinterpret_cast like this (without a may_alias type), is it free to discard code or otherwise drastically change it because the behaviour is undefined by the standard? Like some of the cases shown in:

“any undefined behavior in C gives license to the implementation (the compiler and runtime) to produce code that ... does completely unexpected things, or worse”

http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html

Jay Haynberg
Andrew Haley
2014-09-12 08:32:05 UTC
Permalink
Post by h***@yahoo.com
Post by Andrew Haley
Post by h***@yahoo.com
msg *p = reinterpret_cast<msg*>(get_bytes());
Why are you doing this?
For efficiency, by preventing a copy (imagine get_bytes() is getting
bytes out of a socket buffer).
Firstly, char types alias everything.

Secondly, even if you call memcpy(), a compiler doesn't have to do any
copies if it can prove that the union you're reading into doesn't
escape.

Look at this:

#include <string.h>

double kludge(void *p) {
    union {
        char bytes[sizeof (double)];
        double d;
    } u;
    memcpy(u.bytes, p, sizeof u.bytes);
    return u.d;
}

which generates

kludge:
ldr d0, [x0]
ret

and is completely portable, with no undefined behaviour.
Post by h***@yahoo.com
Putting alignment/padding concerns aside, it would be nice if there
was a way to explicitly tell the compiler, I want to do this and,
please don’t reorder stores and loads, or perform other
strict-aliasing optimizations, on the memory pointed to by this
pointer (similar to the effect of a memcpy). I believe the only way
to do this is with the GCC may_alias attribute, or a more
heavy-handed memory clobber. I think the OP wanted to ask the GCC
folks if there was another, possibly more portable, way; for
example, placement new, but that turned out not to be an option.
Well, there isn't a more portable way, and we can't ignore alignment.
All that GCC can do is provide a way to do it; we can't make anyone
else comply.
Post by h***@yahoo.com
Another related, maybe more important, question is if GCC sees a
reinterpret_cast like this (without a may_alias type), is it free to
discard code or otherwise drastically change it due to the fact that
it’s undefined by the standard?
Yes. It may, and it does.

Andrew.
h***@yahoo.com
2014-09-12 22:58:08 UTC
Permalink
Post by Andrew Haley
Firstly, char types alias everything.
(I'm not sure why you wrote that?) I know if you cast to char type,
it can, but I'm going from a char type.
Post by Andrew Haley
Secondly, even if you call memcpy(), a compiler doesn't have to do any
copies if it can prove that the union you're reading into doesn't
escape.
If the compiler can optimize away memcpy, why memcpy into a union, why
not just to the type in question? Or is memcpy into a union special?

In other words, if I write:

msg p;
memcpy(&p, get_bytes(), sizeof p); // assume the size OK
if (p.i)
// ...

Can the memcpy be optimized away, making it similar to the cast version
(but not undefined)?
Post by Andrew Haley
Post by h***@yahoo.com
Putting alignment/padding concerns aside
FYI, I meant assume I'm on x86, or hardware that allows unaligned reads,
and my struct doesn't have padding issues. For example, sending a pragma
packed struct over a socket.
Post by Andrew Haley
Post by h***@yahoo.com
Another related, maybe more important, question is if GCC sees a
reinterpret_cast like this (without a may_alias type), is it free to
discard code or otherwise drastically change it due to the fact that
it’s undefined by the standard?
Yes. It may, and it does.
At some point, I understand GCC began to optimize more heavily in
strict-aliasing opportunities. For future reference, when GCC makes
changes like this, are they always mentioned in the release notes or
someplace else?

Thx!
Jay Haynberg
Andrew Haley
2014-09-13 07:23:00 UTC
Permalink
Post by h***@yahoo.com
Post by Andrew Haley
Firstly, char types alias everything.
(I'm not sure why you wrote that?)
Because the specification says so. 6.3.2.3, Pointers, in C9X.
Post by h***@yahoo.com
I know if you cast to char type,
it can, but I'm going from a char type.
Post by Andrew Haley
Secondly, even if you call memcpy(), a compiler doesn't have to do any
copies if it can prove that the union you're reading into doesn't
escape.
If the compiler can optimize away memcpy, why memcpy into a union, why
not just to the type in question? Or is memcpy into a union special?
You can copy the bytes from one object to another, and it has the
same effect. I can't guarantee it generates the same code as a
union in all cases.
Post by h***@yahoo.com
msg p;
memcpy(&p, get_bytes(), sizeof p); // assume the size OK
if (p.i)
// ...
Can the memcpy be optimized away, making it similar to the cast version
(but not undefined)?
Sure. Try it.
Post by h***@yahoo.com
Post by Andrew Haley
Post by h***@yahoo.com
Putting alignment/padding concerns aside
FYI, I meant assume I'm on x86, or hardware that allows unaligned reads,
and my struct doesn't have padding issues. For example, sending a pragma
packed struct over a socket.
Post by Andrew Haley
Post by h***@yahoo.com
Another related, maybe more important, question is if GCC sees a
reinterpret_cast like this (without a may_alias type), is it free to
discard code or otherwise drastically change it due to the fact that
it’s undefined by the standard?
Yes. It may, and it does.
At some point, I understand GCC began to optimize more heavily in
strict-aliasing opportunities. For future reference, when GCC makes
changes like this, are they always mentioned in the release notes or
someplace else?
Not AFAIK. We take the view that GCC is free to optimize as much as
possible, subject to the constraints of the language. And, of course,
we can't always predict what an optimization might do to every buggy
program.

Andrew.
Oleg Endo
2014-09-13 11:45:02 UTC
Permalink
Post by Andrew Haley
Post by h***@yahoo.com
Post by Andrew Haley
Firstly, char types alias everything.
(I'm not sure why you wrote that?)
Because the specification says so. 6.3.2.3, Pointers, in C9X.
Post by h***@yahoo.com
I know if you cast to char type,
it can, but I'm going from a char type.
Post by Andrew Haley
Secondly, even if you call memcpy(), a compiler doesn't have to do any
copies if it can prove that the union you're reading into doesn't
escape.
If the compiler can optimize away memcpy, why memcpy into a union, why
not just to the type in question? Or is memcpy into a union special?
You can copy the bytes from one object to another, and it has the
same effect. I can't guarantee it generates the same code as a
union in all cases.
Post by h***@yahoo.com
msg p;
memcpy(&p, get_bytes(), sizeof p); // assume the size OK
if (p.i)
// ...
Can the memcpy be optimized away, making it similar to the cast version
(but not undefined)?
I ran into a similar thing a while ago. See also
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59478 for some experiments.
Probably the optimizations will look different on other targets than
SH. On strict alignment targets there's also this memcpy issue
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50417 which might be
related.

Cheers,
Oleg
Hei Chan
2014-09-15 02:36:55 UTC
Permalink
Hi,


This is an interesting thread.

I think it is very common for people to try to avoid copying from the buffer filled by recv() (or the like) in order to achieve the lowest latency.

Given that
1. The "union trick" has always worked with GCC, and is now hallowed by
the standard. So it sounds like GCC might change in the future.

2. Somewhere in the code, the buffer might be manipulated through a packed C struct reached by a cast. Hence, any compiler is unlikely to be able to avoid making the call if memcpy() is used.


Then, I have the following questions:
A. I use GCC and portability isn't an issue. What is the best type punning method to achieve lowest latency?
B. Let's say portability is important. What's the best type punning method to achieve lowest latency? It seems like memcpy() is the only choice?

Thanks in advance.
Andrew Haley
2014-09-15 08:35:45 UTC
Permalink
Post by Hei Chan
This is an interesting thread.
I think it is very common that people try to avoid making a copy
from the buffer filled by recv() (or alike) to achieve lowest
latency.
Given that
1. The "union trick" has always worked with GCC, and is now hallowed
by the standard. So it sounds like GCC might change in the future.
Why?
Post by Hei Chan
2. Somewhere in the code, the buffer might be manipulated via a packed
C struct obtained by a cast. Hence, any compiler is unlikely to be
able to avoid making a call if memcpy() is used.
I don't understand what you mean by this. You can always write a
function which takes a pointer to a character type and calls memcpy()
to copy it into any scalar type, and it won't unnecessarily call
anything; or if it does that's a missed-optimization bug.
Post by Hei Chan
A. I use GCC and portability isn't an issue. What is the best type
punning method to achieve lowest latency?
A union. You need a union to guarantee alignment.
Post by Hei Chan
B. Let's say portability is important. What's the best type punning
method to achieve lowest latency? It seems like memcpy() is the
only choice?
A union. In practice, this seems to work everywhere. If you are
really standards-pedantic, use memcpy().

Andrew.
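
A rough sketch of the union approach, assuming the `Message` layout that appears later in this thread. `MessageBuffer` and `read_b` are hypothetical names. Note the caveat: GCC documents that reading a union member other than the one last written is supported even with -fstrict-aliasing, as long as the access goes through the union type, but ISO C++ itself does not promise this, so treat it as a compiler guarantee rather than a language one.

```cpp
#include <cstdint>
#include <cstring>

struct Message {
    std::int32_t a;
    std::int16_t b;
    char c;
    char padding;
};

// The union is aligned for Message, so `bytes` can serve as the
// receive buffer and `msg` as the typed view of the same storage.
union MessageBuffer {
    Message msg;
    unsigned char bytes[sizeof(Message)];
};

std::int16_t read_b(const unsigned char* src) {
    MessageBuffer buf;
    // Stand-in for recv(fd, buf.bytes, sizeof buf.bytes, 0).
    std::memcpy(buf.bytes, src, sizeof buf.bytes);
    // Access through the union type: relies on GCC's documented behaviour.
    return buf.msg.b;
}
```

In real code the recv() call would write straight into `buf.bytes`, so no separate staging buffer is needed.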
Hei Chan
2014-09-15 11:07:06 UTC
Permalink
Post by Hei Chan
This is an interesting thread.
I think it is very common that people try to avoid making a copy
from the buffer filled by recv() (or alike) to achieve lowest
latency.
Given that
1. The "union trick" has always worked with GCC, and is now hallowed
by the standard. So it sounds like GCC might change in the future.
Why?

Your statement that the trick "is now hallowed by the standard" makes it sound like at some point GCC won't guarantee that it works anymore.
Post by Hei Chan
2. Somewhere in the code, the buffer might be manipulated via a packed
C struct obtained by a cast. Hence, any compiler is unlikely to be
able to avoid making a call if memcpy() is used.
I don't understand what you mean by this. You can always write a
function which takes a pointer to a character type and calls memcpy()
to copy it into any scalar type, and it won't unnecessarily call
anything; or if it does that's a missed-optimization bug.


Sorry, it is a typo -- I mean "compiler is unlikely to be able to avoid making *a copy* if memcpy() is used".

Using the unsafe reinterpret_cast (C fashion cast), it won't have an extra copy. Using memcpy(), the compiler will have to make a copy because it sees that, a few lines down for example, the program tries to manipulate the copy.
Post by Hei Chan
A. I use GCC and portability isn't an issue. What is the best type
punning method to achieve lowest latency?
A union. You need a union to guarantee alignment.



So I guess there is no way to avoid a copy if the code manipulates the member of the union, right?

I understand that union and memcpy() would guarantee alignment. I was just hoping that there is a way of guaranteeing alignment without an extra copy. Sounds like there is no way?
Post by Hei Chan
B. Let's say portability is important. What's the best type punning
method to achieve lowest latency? It seems like memcpy() is the
only choice?
A union. In practice, this seems to work everywhere. If you are
really standards-pedantic, use memcpy().




Andrew.
Andrew Haley
2014-09-15 11:21:37 UTC
Permalink
Post by Hei Chan
Post by Andrew Haley
Post by Hei Chan
This is an interesting thread.
I think it is very common that people try to avoid making a copy
from the buffer filled by recv() (or alike) to achieve lowest
latency.
Given that
1. The "union trick" has always worked with GCC, and is now hallowed
by the standard. So it sounds like GCC might change in the future.
Why?
Your statement that the trick "is now hallowed by the standard"
makes it sound like at some point GCC won't guarantee that it works
anymore.
I disagree. It does not say that. GCC will not change this
behaviour.
Post by Hei Chan
Post by Andrew Haley
Post by Hei Chan
2. Somewhere in the code, the buffer might be manipulated via a packed
C struct obtained by a cast. Hence, any compiler is unlikely to be
able to avoid making a call if memcpy() is used.
I don't understand what you mean by this. You can always write a
function which takes a pointer to a character type and calls memcpy()
to copy it into any scalar type, and it won't unnecessarily call
anything; or if it does that's a missed-optimization bug.
Sorry, it is a typo -- I mean "compiler is unlikely to be able to avoid
making *a copy* if memcpy() is used".
The compiler is likely to be able to avoid making a copy if memcpy() is
used.
Post by Hei Chan
Using the unsafe reinterpret_cast (C fashion cast), it won't have an
extra copy.
The alignment requirement is a property of the hardware. It is not a
property of the software. If the type needs aligning, it'll have to
be aligned somehow. Using reinterpret_cast does not help.
Post by Hei Chan
Using memcpy(), the compiler will have to make a copy
because it sees that, a few lines down for example, the program tries
to manipulate the copy.
So, don't manipulate the copy, then. Use it once, then throw it away.
Post by Hei Chan
Post by Andrew Haley
Post by Hei Chan
A. I use GCC and portability isn't an issue. What is the best type
punning method to achieve lowest latency?
A union. You need a union to guarantee alignment.
So I guess there is no way to avoid a copy if the code manipulates
the member of the union, right?
There is no need for a copy. I already produced an example which
proves that.
Post by Hei Chan
I understand that union and memcpy() would guarantee alignment. I
was just hoping that there is a way of guaranteeing alignment
without an extra copy. Sounds like there is no way?
It depends on the processor; on x86 and some ARMs and others yes. On
some others no.

Andrew.
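
To illustrate the alignment point, here is a small sketch (the name `load_u32` is made up) of reading a 32-bit value from an arbitrary, possibly odd, offset in a byte buffer. memcpy() makes no alignment assumption about its source, so the compiler is free to emit a single load where the hardware tolerates unaligned access and must emit something safe elsewhere. A cast to `std::uint32_t*` at the same address gives the compiler no such freedom.

```cpp
#include <cstdint>
#include <cstring>

// Loads a 32-bit value in host byte order from any address,
// regardless of how that address is aligned.
std::uint32_t load_u32(const unsigned char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}
```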
Hei Chan
2014-09-15 11:29:28 UTC
Permalink
Post by Hei Chan
Post by Andrew Haley
Post by Hei Chan
This is an interesting thread.
I think it is very common that people try to avoid making a copy
from the buffer filled by recv() (or alike) to achieve lowest
latency.
Given that
1. The "union trick" has always worked with GCC, and is now hallowed
by the standard. So it sounds like GCC might change in the future.
Why?
Your statement that the trick "is now hallowed by the standard"
makes it sound like at some point GCC won't guarantee that it works
anymore.
I disagree. It does not say that. GCC will not change this
behaviour.
Post by Hei Chan
Post by Andrew Haley
Post by Hei Chan
2. Somewhere in the code, the buffer might be manipulated via a packed
C struct obtained by a cast. Hence, any compiler is unlikely to be
able to avoid making a call if memcpy() is used.
I don't understand what you mean by this. You can always write a
function which takes a pointer to a character type and calls memcpy()
to copy it into any scalar type, and it won't unnecessarily call
anything; or if it does that's a missed-optimization bug.
Sorry, it is a typo -- I mean "compiler is unlikely to be able to avoid
making *a copy* if memcpy() is used".
The compiler is likely to be able to avoid making a copy if memcpy() is
used.
Post by Hei Chan
Using the unsafe reinterpret_cast (C fashion cast), it won't have an
extra copy.
The alignment requirement is a property of the hardware. It is not a
property of the software. If the type needs aligning, it'll have to
be aligned somehow. Using reinterpret_cast does not help.
Post by Hei Chan
Using memcpy(), the compiler will have to make a copy
because it sees that, a few lines down for example, the program tries
to manipulate the copy.
So, don't manipulate the copy, then. Use it once, then throw it away.


Sometimes, due to the endianness, I am forced to manipulate the copy...
Post by Hei Chan
Post by Andrew Haley
Post by Hei Chan
A. I use GCC and portability isn't an issue. What is the best type
punning method to achieve lowest latency?
A union. You need a union to guarantee alignment.
So I guess there is no way to avoid a copy if the code manipulates
the member of the union, right?
There is no need for a copy. I already produced an example which
proves that.
Post by Hei Chan
I understand that union and memcpy() would guarantee alignment. I
was just hoping that there is a way of guaranteeing alignment
without an extra copy. Sounds like there is no way?
It depends on the processor; on x86 and some ARMs and others yes. On
some others no.




Andrew.
Andrew Haley
2014-09-15 11:32:35 UTC
Permalink
Post by Andrew Haley
Post by Hei Chan
Using memcpy(), the compiler will have to make a copy
because it sees that, a few lines down for example, the program tries
to manipulate the copy.
So, don't manipulate the copy, then. Use it once, then throw it away.
Sometimes, due to the endianness, I am forced to manipulate the copy...
I don't know what you mean. A small example would help.

Andrew.
Hei Chan
2014-09-15 11:57:09 UTC
Permalink
// on a big-endian 64-bit machine

struct Message {
int32_t a;
int16_t b;
char c;
char padding;
};

// send over a socket
Message msg = {12345, 678, 'x', 0};
send(fd, &msg, sizeof(Message), 0);


// another machine: a little-endian 64-bit machine
char buffer[1024];
// recv() returns -1 on error, so test for > 0 rather than non-zero
if (recv(fd, buffer, sizeof buffer, 0) > 0) {
Message msg;
// or we can use the union trick
memcpy(&msg, buffer, sizeof(Message));
SomeFunctioToConvertFromBigEndianToSmallEndian(msg.a);
SomeFunctioToConvertFromBigEndianToSmallEndian(msg.b);
}
Post by Andrew Haley
Post by Hei Chan
Using memcpy(), the compiler will have to make a copy
because it sees that, a few lines down for example, the program tries
to manipulate the copy.
So, don't manipulate the copy, then. Use it once, then throw it away.
Sometimes, due to the endianness, I am forced to manipulate the copy...
I don't know what you mean. A small example would help.




Andrew.
Andrew Haley
2014-09-15 13:20:56 UTC
Permalink
Post by Hei Chan
// on a big-endian 64-bit machine
struct Message {
int32_t a;
int16_t b;
char c;
char padding;
};
// send over a socket
Message msg = {12345, 678, 'x', 0};
I'd do this:

Message msg = {htonl(12345), htons(678), 'x', 0};
Post by Hei Chan
send(fd, &msg, sizeof Message, 0);
// another machine: a little-endian 64-bit machine
char buffer[1024];
if (recv(fd, buffer, sizeof buffer, 0)) {
Message msg;
// or we can use the union trick
Why not read into the Message?
Post by Hei Chan
memcpy(&msg, buffer, sizeof Message); SomeFunctioToConvertFromBigEndianToSmallEndian(msg.a);
I don't know why you'd want to byte-reverse in place if you actually
care about zero-copy network programming. That makes no sense to me.

Andrew.
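
A sketch of the htonl()/htons() suggestion, assuming a POSIX system. The helper names `to_wire` and `from_wire` are invented for illustration, and the fields are declared unsigned here to match the types those functions take (the thread's struct uses signed fields). The sender converts to network byte order once, and the receiver converts back with ntohl()/ntohs(), so neither side needs to know the other's endianness.

```cpp
#include <arpa/inet.h>  // htonl, htons, ntohl, ntohs (POSIX)
#include <cstdint>

struct Message {
    std::uint32_t a;
    std::uint16_t b;
    char c;
    char padding;
};

// Builds the on-the-wire form: multi-byte fields in network byte order.
Message to_wire(std::uint32_t a, std::uint16_t b, char c) {
    Message m;
    m.a = htonl(a);
    m.b = htons(b);
    m.c = c;
    m.padding = 0;
    return m;
}

// Converts a received message back to host byte order.
Message from_wire(const Message& w) {
    Message m;
    m.a = ntohl(w.a);
    m.b = ntohs(w.b);
    m.c = w.c;
    m.padding = 0;
    return m;
}
```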
Hei Chan
2014-09-15 13:31:12 UTC
Permalink
Post by Hei Chan
// on a big-endian 64-bit machine
struct Message {
int32_t a;
int16_t b;
char c;
char padding;
};
// send over a socket
Message msg = {12345, 678, 'x', 0};
I'd do this:

Message msg = {htonl(12345), htons(678), 'x', 0};


What if I have no control of the sender?
Post by Hei Chan
send(fd, &msg, sizeof Message, 0);
// another machine: a little-endian 64-bit machine
char buffer[1024];
if (recv(fd, buffer, sizeof buffer, 0)) {
Message msg;
// or we can use the union trick
Why not read into the Message?

I think it goes back to the same issue as Andy (the original poster) -- the server/sender will send multiple kinds of message types.

For instance,

enum class MsgType : int32_t {
Heartbeat,
Logon,
Logout
};

struct Header {
int32_t MsgType;
int32_t padding;
};

struct LogonBody {
int32_t a;
int16_t b;
char c;
char padding;
};

I can read into an instance of Header first (1 system call), then I know the message type so I can read into an instance of LogonBody (another system call). But my goal is to avoid latency. I certainly would prefer 1 system call instead of 2 if possible.

Hope now it makes sense to you.
Post by Hei Chan
memcpy(&msg, buffer, sizeof Message); SomeFunctioToConvertFromBigEndianToSmallEndian(msg.a);
I don't know why you'd want to byte-reverse in place if you actually
care about zero-copy network programming. That makes no sense to me.




Andrew.
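
For what it is worth, the single-system-call case can be sketched like this: one recv() fills a byte buffer, then the header and body are copied out with memcpy(). `parse_logon` is a hypothetical helper, the header field is renamed `type` to avoid clashing with the enum name, and byte-order conversion is deliberately left out to keep the sketch short.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

enum class MsgType : std::int32_t { Heartbeat, Logon, Logout };

struct Header {
    std::int32_t type;
    std::int32_t padding;
};

struct LogonBody {
    std::int32_t a;
    std::int16_t b;
    char c;
    char padding;
};

// Parses a header followed by a LogonBody out of one received buffer.
// Returns false if the buffer is too short or holds another message type.
bool parse_logon(const unsigned char* buf, std::size_t len, LogonBody& out) {
    Header h;
    if (len < sizeof h) return false;
    std::memcpy(&h, buf, sizeof h);
    if (h.type != static_cast<std::int32_t>(MsgType::Logon)) return false;
    if (len < sizeof h + sizeof out) return false;
    std::memcpy(&out, buf + sizeof h, sizeof out);
    return true;
}
```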
Andrew Haley
2014-09-15 14:11:35 UTC
Permalink
Post by Andrew Haley
Post by Hei Chan
// on a big-endian 64-bit machine
struct Message {
int32_t a;
int16_t b;
char c;
char padding;
};
// send over a socket
Message msg = {12345, 678, 'x', 0};
Message msg = {htonl(12345), htons(678), 'x', 0};
What if I have no control of the sender?
I don't understand this question. You said that you were sending.
Post by Andrew Haley
Post by Hei Chan
send(fd, &msg, sizeof Message, 0);
// another machine: a little-endian 64-bit machine
char buffer[1024];
if (recv(fd, buffer, sizeof buffer, 0)) {
Message msg;
// or we can use the union trick
Why not read into the Message?
I think it goes back to the same issue as Andy (the original poster)
-- the server/sender will send multiple kinds of message types.
For instance,
enum class MsgType : int32_t {
Heartbeat,
Logon,
Logout
};
struct Header {
int32_t MsgType;
int32_t padding;
};
struct LogonBody {
int32_t a;
int16_t b;
char c;
char padding;
};
I can read into an instance of Header first (1 system call), then I
know the message type so I can read into an instance of LogonBody
(another system call). But my goal is to avoid latency. I
certainly would prefer 1 system call instead of 2 if possible.
But you can either read into a union of the message types and handle
that, or read into a byte array and pick the data straight from there.

Like this:

int
byteswap (int u)
{
return ((((u) & 0xff000000) >> 24)
| (((u) & 0x00ff0000) >> 8)
| (((u) & 0x0000ff00) << 8)
| (((u) & 0x000000ff) << 24));
}

int readInt(char *a) {
int val;
memcpy(&val, a, sizeof val);
return byteswap(val);
}

which generates this:

readInt:
ldr w0, [x0]
rev w0, w0
ret
Post by Andrew Haley
Hope now it makes sense to you.
Not really, no. I know that there are many bad ways of solving the
problem. All I'm saying is that you don't have to do it in a bad way.
C provides you with everything you need to do it well.

If you want a way to read from an arbitrary position in a byte array
into any type, in any endianness, you can do that; see readInt above.
If you want to read into a union of all message types, you can do
that.

Andrew.
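
As a sketch of the "any type, in any endianness" idea, here is a variant that differs from readInt: instead of memcpy() plus a byte swap (which assumes a particular host byte order), it assembles the value byte by byte, so the result does not depend on the host's endianness. `read_be` and `read_le` are hypothetical names, restricted to unsigned integer types to keep the shifts well defined.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Reads a big-endian unsigned integer of type T starting at p.
template <typename T>
T read_be(const unsigned char* p) {
    static_assert(std::is_unsigned<T>::value, "unsigned integer types only");
    T v = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
        v = static_cast<T>((v << 8) | p[i]);  // most significant byte first
    return v;
}

// Reads a little-endian unsigned integer of type T starting at p.
template <typename T>
T read_le(const unsigned char* p) {
    static_assert(std::is_unsigned<T>::value, "unsigned integer types only");
    T v = 0;
    for (std::size_t i = sizeof(T); i > 0; --i)
        v = static_cast<T>((v << 8) | p[i - 1]);  // most significant byte last
    return v;
}
```

Whether a given compiler collapses these loops into a single load plus byte reverse, as in the readInt output above, is worth checking in the generated assembly rather than assuming.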

Jonathan Wakely
2014-09-15 11:27:41 UTC
Permalink
Post by Hei Chan
Your statement that the trick "is now hallowed by the standard" makes it sound like at some point GCC won't guarantee that it works anymore.
If GCC supports a feature, and that feature is now standardised, why
would GCC stop supporting it?

That would break old code *and* fail to conform to the standard!
Paul Smith
2014-09-15 12:10:01 UTC
Permalink
Post by Jonathan Wakely
Post by Hei Chan
Your statement that the trick "is now hallowed by the standard"
makes it sound like at some point GCC won't guarantee that it works
anymore.
If GCC supports a feature, and that feature is now standardised, why
would GCC stop supporting it?
Perhaps a language barrier issue? Using a less-common term like
"hallowed" may be throwing Hei off the mark?

"Hallowed" means "enshrined" or "blessed"; in other words, the standard
has adopted the behavior as part of the standard. Therefore GCC will
definitely not abandon it.

Hei, it would help us greatly if you could configure your mail software
to include some sort of quoting of the original text when you reply to
email. Your responses and the text you are replying to look exactly the
same and it's very confusing.

Cheers!