Integrating CMake support for xerces

Roger Leigh
Dear all,

Last July, I posted a message on this list regarding adding CMake
support, and created this ticket:
https://issues.apache.org/jira/browse/XERCESC-2077

The patch is ready for integration, having been tested on a multitude of
platforms, including FreeBSD, Linux, MacOS X and Windows with multiple
configuration and compiler combinations.  Following the last comments in
the above ticket, I'm writing here to propose and ask for comments on
the next steps for integrating it.

There are two choices for merging it:
- to the 3.1 branch
- to the trunk, for releasing as 3.2

Since the proposed changes don't touch any of the existing build
systems, merging onto the 3.1 branch would be safe, but since it's a
fairly large change it would be understandable to leave this for a new
minor release.  Is there any particular preference?

A follow-on question would be the continued support for other build
systems following the merging of the patch.  The use of CMake will allow
for the removal of the many version-specific Visual Studio solution and
project files, since CMake can support all the same Visual Studio
versions, and with a great deal more flexibility for e.g. ICU support
and other configure-time options.  The same could also be said of the
Autoconf support, since CMake can also generate Unix Makefiles.  For
maintenance reasons, I'd like to propose removing all the Visual Studio
files; this was one of the primary reasons for developing the CMake
support in the first place.  This would make sense to do on the
trunk/3.2 branch, since we wouldn't want to remove existing
functionality on the 3.1 branch.

Removing the Autoconf support would also be a possibility if there was
consensus to do so.  The CMake support certainly implements all the
Autoconf features--it reproduces every single feature test and option
exactly.  But the maintenance cost is vastly less than the Visual Studio
support, so retaining both Autoconf and CMake support is certainly possible.

Integration test results for a range of platforms are also available here:
https://ci.openmicroscopy.org/view/Third-Party/ (all the XERCESC- jobs).

Additionally, if anyone wanted to review and test the patch, it's
attached to the above ticket and also available here:
https://github.com/rleigh-codelibre/xerces-c/tree/cmake-3.1


Kind regards,
Roger Leigh

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Integrating CMake support for xerces

Cantor, Scott
On 4/22/17, 2:59 PM, "Roger Leigh" <[hidden email]> wrote:

> There are two choices for merging it:
> - to the 3.1 branch
> - to the trunk, for releasing as 3.2

Or a third branch, but I think you already did that via git anyway and that's simpler in practice so we can dismiss that one.

> Since the proposed changes don't touch any of the existing build
> systems, merging onto the 3.1 branch would be safe, but since it's a
> fairly large change it would be understandable to leave this for a new
> minor release.  Is there any particular preference?

I think there's a relevant parallel discussion about the project's next step. Right now there are some apparent regressions on the branch I introduced trying to fix security issues in code I didn't understand. And there are some outstanding security issues on both branches because of that DOMHelper code that's making in-memory object layout assumptions with improper casts. That has got to be fixed.

Meanwhile, Red Hat has refused to ship existing security fixes in their copy of 3.1, which is leaving my customers screwed, and that's becoming intolerable.

My only practical solution to fix that is to get my software rebased onto a new version, 3.2, which I can ship in a non-conflicting package. So I'm inclined to do the very ugly work of figuring out what's missing from the trunk and reviewing all the additional work there that was done before the project went into moribundity, and try and get a 3.2 out the door this summer.

So given that, I suspect the thing to do is to put it on trunk, but I'd like a bit of time to review current trunk before we do that merge so I'm not dealing with both at once if that's ok.

> For maintenance reasons, I'd like to propose removing all the Visual Studio
> files; this was one of the primary reasons for developing the CMake
> support in the first place.  This would make sense to do on the
> trunk/3.2 branch, since we wouldn't want to remove existing
> functionality on the 3.1 branch.

Right, I think that's another good argument for using trunk.

> Removing the Autoconf support would also be a possibility if there was
> consensus to do so.  The CMake support certainly implements all the
> Autoconf features--it reproduces every single feature test and option
> exactly.  But the maintenance cost is vastly less than the Visual Studio
> support, so retaining both Autoconf and CMake support is certainly possible.

I'm not inclined to consider removing autoconf without seeing the alternative and understanding the implications, particularly wrt libtool.

> Additionally, if anyone wanted to review and test the patch, it's
> attached to the above ticket and also available here:
> https://github.com/rleigh-codelibre/xerces-c/tree/cmake-3.1

I will do so when I can.

-- Scott




Re: Integrating CMake support for xerces

Roger Leigh
On 23/04/2017 18:26, Cantor, Scott wrote:
> On 4/22/17, 2:59 PM, "Roger Leigh" <[hidden email]> wrote:
>
>> There are two choices for merging it:
>> - to the 3.1 branch
>> - to the trunk, for releasing as 3.2
>
> Or a third branch, but I think you already did that via git anyway and that's simpler in practice so we can dismiss that one.

I can certainly rebase the cmake-3.1 branch onto trunk if that would
make sense.  However, looking at the differences between the 3.1 branch
and the trunk, it looks like the trunk might need a fair amount of 3.1
work applying.  Is it a bit out of date?

I'm getting a few conflicts in EXTRA_DIST in a few Makefile.ams.
Nothing major, but it's highlighting that there's stuff missing on the
trunk.

>> Since the proposed changes don't touch any of the existing build
>> systems, merging onto the 3.1 branch would be safe, but since it's a
>> fairly large change it would be understandable to leave this for a new
>> minor release.  Is there any particular preference?
>
> I think there's a relevant parallel discussion about the project's next step. Right now there are some apparent regressions on the branch I introduced trying to fix security issues in code I didn't understand. And there are some outstanding security issues on both branches because of that DOMHelper code that's making in-memory object layout assumptions with improper casts. That has got to be fixed.
>
> Meanwhile, Red Hat has refused to ship existing security fixes in their copy of 3.1, which is leaving my customers screwed, and that's becoming intolerable.
>
> My only practical solution to fix that is to get my software rebased onto a new version, 3.2, which I can ship in a non-conflicting package. So I'm inclined to do the very ugly work of figuring out what's missing from the trunk and reviewing all the additional work there that was done before the project went into moribundity, and try and get a 3.2 out the door this summer.
>
> So given that, I suspect the thing to do is to put it on trunk, but I'd like a bit of time to review current trunk before we do that merge so I'm not dealing with both at once if that's ok.

OK.  If there's anything I can do to help out here, I can certainly try.


Regards,
Roger

RE: Integrating CMake support for xerces

Cantor, Scott
> I can certainly rebase the cmake-3.1 branch onto trunk if that would
> make sense.  However, looking at the differences between the 3.1 branch
> and the trunk, it looks like the trunk might need a fair amount of 3.1
> work applying.  Is it a bit out of date?

Yes.

> OK.  If there's anything I can do to help out here, I can certainly try.

Only if you're an encoding expert. ;-)

The really risky and ugly issues are all these security bugs in the character processing code, and the sticky part is that there's supposedly a regression caused by one of those fixes. Since I had no idea how any of that code worked, that's not really surprising.

-- Scott


Re: Integrating CMake support for xerces

Boris Kolpackov-2
In reply to this post by Cantor, Scott
Hi All,

Cantor, Scott <[hidden email]> writes:

> So I'm inclined to do the very ugly work of figuring out what's
> missing from the trunk and reviewing all the additional work
> there that was done before the project went into moribundity,
> and try and get a 3.2 out the door this summer.

That would be great.

Since we are sharing plans, we (as in Code Synthesis) are planning
to package Xerces-C++ for build2[1] in the near future (but no
definite time-frame). While I haven't looked into this closely
yet, the options we consider range between just packaging it as
is to pretty much forking it. The main reasons for forking would
be: (1) to switch to git (life is just too short for svn), (2)
to get rid of the Apache bureaucracy, and (3) rip all the legacy
parts out and clean things up (maybe even switching to C++11/14).

[1] https://build2.org/

Boris

RE: Integrating CMake support for xerces

Cantor, Scott
> Since we are sharing plans, we (as in Code Synthesis) are planning
> to package Xerces-C++ for build2[1] in the near future (but no
> definite time-frame). While I haven't looked into this closely
> yet, the options we consider range between just packaging it as
> is to pretty much forking it. The main reasons for forking would
> be: (1) to switch to git (life is just too short for svn), (2)
> to get rid of the Apache bureaucracy, and (3) rip all the legacy
> parts out and clean things up (maybe even switching to C++11/14).

(1) doesn't matter to me, but +1000 to (2) and I have very little compunction about (3), aside from the obvious fact that once you start pulling that thread, you're on slippery ground.

I wasn't prepared to really go so far as to start tossing things out or proposing really invasive changes but it sounds like cleaning up and releasing the trunk would serve both short term and longer term ends here.

-- Scott


Re: Integrating CMake support for xerces

Roger Leigh
On 25/04/2017 18:56, Cantor, Scott wrote:

>> Since we are sharing plans, we (as in Code Synthesis) are planning
>> to package Xerces-C++ for build2[1] in the near future (but no
>> definite time-frame). While I haven't looked into this closely
>> yet, the options we consider range between just packaging it as
>> is to pretty much forking it. The main reasons for forking would
>> be: (1) to switch to git (life is just too short for svn), (2)
>> to get rid of the Apache bureaucracy, and (3) rip all the legacy
>> parts out and clean things up (maybe even switching to C++11/14).
>
> (1) doesn't matter to me, but +1000 to (2) and I have very little compunction about (3), aside from the obvious fact that once you start pulling that thread, you're on slippery ground.
>
> I wasn't prepared to really go so far as to start tossing things out or proposing really invasive changes but it sounds like cleaning up and releasing the trunk would serve both short term and longer term ends here.

Switching to git would be wonderful.  We could also enable CI testing
with e.g. Travis or some other CI service on github at that time to
enable testing of all PRs, if that would be acceptable.  Or does the
Apache project provide any equivalent services internally?

Regarding (3), it's a bit outside the scope of this CMake ticket.  My
intentions here were to get a build system which would provide a working
build on all platforms, including the unit tests.  I didn't want to go
down the rabbit hole at the same time.  Ideally, if we merge this to the
trunk and branch off a 3.2 and release that, more adventurous changes
could be then done on the trunk.  I'd rather have a working release with
the CMake support included than to do both and not have an immediately
usable and API compatible release!

That said, I'd not be averse to including support for standard C++;
using Xerces is often a bugbear due to its age.  All our code is now
C++11, with RAII wrappers to make Xerces play nicely.  Primarily the
lack of RAII, non-standard exception types, odd memory management
semantics and transcoding all input.  Something worth noting is that our
(optional) ICU dependency switched to requiring C++11 with ICU 59.1.  It
switched to using the standard char16_t as its UTF-16 character type
(UChar).  If
Xerces were to also switch (or at least use a suitable typedef), we
could be using const char16_t* foo = u"UTF-16 strings" and/or u8"UTF-8"
strings directly in both the xerces sources and in client programs.  A
major usability improvement.
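
To illustrate the typedef idea, here is a minimal sketch, assuming a 16-bit character typedef that resolves to char16_t. The names XmlChar, unit_length and kElementName are made up for this example and are not Xerces identifiers; the real library typedef is selected at configure time and may be a different type.

```cpp
#include <cstddef>

// Hypothetical stand-in for the library's 16-bit character typedef.
// Assumption: it resolves to the C++11 char16_t type.
typedef char16_t XmlChar;

// Count UTF-16 code units up to (not including) the terminating NUL.
std::size_t unit_length(const XmlChar* s) {
    std::size_t n = 0;
    while (s[n] != 0) {
        ++n;
    }
    return n;
}

// Because XmlChar is char16_t here, a u"" literal can be used directly;
// no run-time transcoding from a narrow string is needed.
const XmlChar* const kElementName = u"image";
```

With a typedef like this, client code could pass u"..." literals straight to any function taking const XmlChar*, which is the usability gain described above.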

In a recent performance testing exercise at work, we found string
transcoding inside xerces-c to be a major time sink--using valgrind
callgrind--it was one of the major uses of CPU time during parsing and
DOM processing.  It was slower than xerces-j for the same operations,
and this was likely to be a major cause.
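
For a sense of where that time can go, here is a rough sketch of the work a UTF-8 to UTF-16 transcoder does for every input character. This is not the Xerces transcoder; utf8_to_utf16 is a hypothetical helper that assumes well-formed input and performs no validation.

```cpp
#include <cstddef>
#include <string>

// Decode UTF-8 into UTF-16 one code point at a time.
// Assumption: the input is well-formed UTF-8; malformed bytes are not detected.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes to read
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        ++i;
        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 0; k < extra && i < in.size(); ++k, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Even this stripped-down version branches and shifts for every byte and allocates an output buffer, so a parser that transcodes all of its input pays that cost on every string it touches.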

Certainly cleaning up and releasing trunk would be a step towards any of
that, should there be a consensus for that.


Regards,
Roger


Re: Integrating CMake support for xerces

Cantor, Scott
On 4/25/17, 3:17 PM, "Roger Leigh" <[hidden email]> wrote:

> Switching to git would be wonderful.  We could also enable CI testing
> with e.g. Travis or some other CI service on github at that time to
> enable testing of all PRs, if that would be acceptable.  Or does the
> Apache project provide any equivalent services internally?

There are already mirrors of the code at git.apache.org (and to github from there), and of course all CI tools can pull from svn just as easily as git. That's never been an impediment. I don't know if there are tests sufficient to be worth exercising like that or not.

> Regarding (3), it's a bit outside the scope of this CMake ticket.  My
> intentions here were to get a build system which would provide a working
> build on all platforms, including the unit tests.  I didn't want to go
> down the rabbit hole at the same time.  Ideally, if we merge this to the
> trunk and branch off a 3.2 and release that, more adventurous changes
> could be then done on the trunk.  I'd rather have a working release with
> the CMake support included than to do both and not have an immediately
> usable and API compatible release!

+1

I wasn't suggesting anything else, and it makes sense to go ahead and branch again if there's going to be any real screwing around; I need a stable branch myself.

I have made some progress today after a few hours reviewing trunk and I'm only about 10 commits back from when I started cherry picking things back to the 3.1 branch, at which point the trunk essentially froze. So far there is very little divergence, just a few small API additions that are unique to the trunk. So I don't foresee anything terribly risky about releasing this after some additional fixes, some testing, and incorporating your patch.

> That said, I'd not be averse to including support for standard C++;
> using Xerces is often a bugbear due to its age.  All our code is now
> C++11, with RAII wrappers to make Xerces play nicely.  Primarily the
> lack of RAII, non-standard exception types, odd memory management
> semantics and transcoding all input.

The problem with C++11 is it's just not portable to enough compilers outside of Windows. I'm aware gcc probably supports it but gcc on actual Linux distros that people still use heavily does not. If I can't build it on RH6 it's not usable for me, and since I'm the one doing most of the work right now...

Really, C++11 is beside the point. Simply good old C++ would fix many issues, but this code dates to back when using real C++ and the STL was just too non-portable, along with the usual Unix anti-C++ bias.

> Something worth noting is that our
> (optional) ICU dependency switched to requiring C++11 with ICU 59.1.  It
> switched to using the standard char16_t as its UTF-16 character type
> (UChar).  If Xerces were to also switch (or at least use a suitable
> typedef), we could be using const char16_t* foo = u"UTF-16 strings"
> and/or u8"UTF-8" strings directly in both the xerces sources and in
> client programs.  A major usability improvement.

At a huge cost in portability unfortunately. Believe me, I wish that were viable for me. So, so much.

> In a recent performance testing exercise at work, we found string
> transcoding inside xerces-c to be a major time sink--using valgrind
> callgrind--it was one of the major uses of CPU time during parsing and
> DOM processing.  It was slower than xerces-j for the same operations,
> and this was likely to be a major cause.

I'm not sure that you're going to fix that. It's already using UTF-16 internally. If there are problems with transcoding, I think that's just the cost of transcoding; I don't think the need to transcode goes away unless I'm missing something.

Anyway, within a week or two I expect to be able to put trunk in a position to accept your patch and we can continue on from there.

-- Scott



Re: Integrating CMake support for xerces

Cantor, Scott
On 4/25/17, 8:30 PM, "Cantor, Scott" <[hidden email]> wrote:

> So far there is very little divergence, just a few small API additions that are unique to the trunk. So I don't foresee anything
> terribly risky about releasing this after some additional fixes, some testing, and incorporating your patch.

Other than some things I have to port up from the branch and other bug reports that have come in, the two big commits on trunk are:

r1517488 (XERCESC-2016)
r1528170 (XERCESC-2019)

The former is a patch that's pretty invasive to add XML 1.0 5th edition support, which I surmise actually removes a lot of the special handling of XML 1.1. All of that is outside my expertise, so I don't have any insight into how risky that change is or how well it was tested. For myself I don't need it at all and would as soon undo it if it can't be verified as safe, but I'm not suggesting that exactly, just noting it's significant.

The latter is smaller and is a change to memory handling of text buffers in the DOM. I haven't fully grokked that yet but I doubt it's a big deal, just worth a look.

Everything else on trunk now that's not on the branch is much simpler and I don't see as risky.

-- Scott



Re: Integrating CMake support for xerces

Roger Leigh
In reply to this post by Cantor, Scott
On 26/04/2017 01:30, Cantor, Scott wrote:
> On 4/25/17, 3:17 PM, "Roger Leigh" <[hidden email]> wrote:

>> That said, I'd not be averse to including support for standard C++;
>> using Xerces is often a bugbear due to its age.  All our code is now
>> C++11, with RAII wrappers to make Xerces play nicely.  Primarily the
>> lack of RAII, non-standard exception types, odd memory management
>> semantics and transcoding all input.
>
> The problem with C++11 is it's just not portable to enough compilers outside of Windows. I'm aware gcc probably supports it but gcc on actual Linux distros that people still use heavily does not. If I can't build it on RH6 it's not usable for me, and since I'm the one doing most of the work right now...
>
> Really, C++11 is beside the point. Simply good old C++ would fix many issues, but this code dates to back when using real C++ and the STL was just too non-portable, along with the usual Unix anti-C++ bias.

Agreed that just moving up to C++98 standard types in and of itself
would be greatly beneficial.  There should be no portability barrier to
achieving that.

Regarding portability, I also have the "pleasure" of supporting code on
CentOS 6.  I don't know if you've tried it, but we switched to using the
SCL "devtoolset-3" (now "devtoolset-4") packages which backport a modern
GCC and the rest of the toolchain to CentOS6 (and 7).  We use this to
build C++11 code on CentOS 6, and it's been trouble free for us.  Apart
from CentOS, we build C++11 without any trouble on current FreeBSD
10/11, MacOS X 10.9+, Ubuntu 14.04/16.04 and Windows (VS2013, VS2015,
and soon VS2017).  It's CentOS 6 which is currently the lowest common
denominator;
everything else has supported C++11 well for many years at this point.
Our projects made the switch a few months back once they were buildable
and supportable across the board.


Regards,
Roger

Re: Integrating CMake support for xerces

Cantor, Scott
On 4/26/17, 4:04 AM, "Roger Leigh" <[hidden email]> wrote:

> Agreed that just moving up to C++98 standard types in and of itself
> would be greatly beneficial.  There should be no portability barrier to
> achieving that.

No, definitely not. I've been using the STL and Boost for years now on many platforms.

> Regarding portability, I also have the "pleasure" of supporting code on
> CentOS 6.  I don't know if you've tried it, but we switched to using the
> SCL "devtoolset-3" (now "devtoolset-4") packages which backport a modern
> GCC and the rest of the toolchain to CentOS6 (and 7).

Do the packages built from that work on an unmodified CentOS 6 system? Meaning does it pull in any dependencies for that from the standard repos?

The change that's really relevant for me is that Red Hat 5 dropped out of standard support in March, so that was a major switch.

Unfortunately I also have many older SUSE versions to support.

-- Scott




RE: Integrating CMake support for xerces

Dean Roddey
In reply to this post by Roger Leigh
I would have been writing the original version of the parser around '98, so C++98 would have only just come into being. That said, the practice of not simply sharing allocated memory with the containing application was added later. I didn't write it that way at the start and would never have gone that way myself.

There definitely wasn't any Unix anti-C++ bias in the original code, since I'm a Windows guy and had been doing C++ for quite a few years by then. But to support the 9ish or more platforms we did at the time without any conditional code (other than the per-platform stuff and some endian sensitive code I guess), meant keeping it pretty plain jane.

--------------------------------------------
Dean Roddey
Chairman/CTO
Charmed Quark Systems, Ltd
www.charmedquark.com


RE: Integrating CMake support for xerces

Cantor, Scott
In reply to this post by Roger Leigh
> Additionally, if anyone wanted to review and test the patch, it's
> attached to the above ticket and also available here:
> https://github.com/rleigh-codelibre/xerces-c/tree/cmake-3.1

Playing with this now, I had two issues I wanted to ask about. One is that it looks like there definitely is a lot of overlap with the material we have to maintain for autoconf, and I'm just concerned about the dual maintenance of it or the possibility of it falling out of sync since nobody else really knows cmake.

Leaving aside whether it's a good or bad idea, if we wanted to standardize on this, does the cmake system generate actual autoconf-compatible build files (i.e. you run configure), or does it take over that role? I think the latter is a non-starter given the prevalence of autoconf assumptions across the landscape. And if so, that does raise maintenance concerns.

Secondly, I'm mainly playing with the Windows side of this, and I was unclear if it's possible to generate solution files for both 32- and 64-bit at once? It looks like it picks one to do at a time so that if you had to build both you'd have to generate the whole set of files twice in between or use separate unpacked trees, etc. Is that correct?

-- Scott



RE: Integrating CMake support for xerces

Cantor, Scott
In reply to this post by Roger Leigh
> Secondly, I'm mainly playing with the Windows side of this, and I was unclear
> if it's possible to generate solution files for both 32- and 64-bit at once? It
> looks like it picks one to do at a time so that if you had to build both you'd
> have to generate the whole set of files twice in between or use separate
> unpacked trees, etc. Is that correct?

Never mind, I now see the instructions for running it in separate directories.

-- Scott



RE: Integrating CMake support for xerces

Cantor, Scott
In reply to this post by Roger Leigh
One issue I did notice on the Windows side is that the DLL names are different from the existing convention. I would have to personally adjust them back and I don't think we'd have any reason to want them changed, so I assume that could be adjusted back?

-- Scott


Re: Integrating CMake support for xerces

Roger Leigh
In reply to this post by Cantor, Scott
On 16/05/2017 16:37, Cantor, Scott wrote:
>> Additionally, if anyone wanted to review and test the patch, it's
>> attached to the above ticket and also available here:
>> https://github.com/rleigh-codelibre/xerces-c/tree/cmake-3.1
>
> Playing with this now, I had two issues I wanted to ask about. One is that it looks like there definitely is a lot of overlap with the material we have to maintain for autoconf, and I'm just concerned about the dual maintenance of it or the possibility of it falling out of sync since nobody else really knows cmake.

There's certainly a good amount of duplication, most of it intentionally
so that the CMake logic mirrors the existing Autoconf feature tests
exactly.  This is so it can be a drop-in replacement in every respect;
we could have gone with a more CMake-native approach in some cases, or
dropped some historical parts entirely.  Most of it is needed to support
all the configurable options so that we're exactly reproducing the same
#defines in the configuration header, and doing exactly the same
conditional compilation.

It would have been great if there was some way to share the logic
between the two.  The snippets of program code used for feature tests
might be shared, but they are so small and trivial that it's likely not
worth it.
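
As a rough illustration of the pattern described above (a configure-time probe decides a #define, and the sources compile conditionally on it), here is a minimal sketch. The macro SKETCH_HAVE_NATIVE_LENGTH and the function bounded_length are hypothetical names, not taken from the Xerces configuration header; the define is hard-coded here where a real build would generate it.

```cpp
#include <cstddef>

// In a real build this line would be written into the generated
// configuration header by Autoconf or CMake after compiling a tiny probe
// program. It is hard-coded to 0 here; the macro name is hypothetical.
#define SKETCH_HAVE_NATIVE_LENGTH 0

#if SKETCH_HAVE_NATIVE_LENGTH
#include <string.h>  // platform header providing strnlen (POSIX)
#endif

// Length of s, reading at most max characters.
std::size_t bounded_length(const char* s, std::size_t max) {
#if SKETCH_HAVE_NATIVE_LENGTH
    // Probe succeeded: use the platform's implementation.
    return strnlen(s, max);
#else
    // Probe failed: portable fallback with the same behaviour.
    std::size_t n = 0;
    while (n < max && s[n] != '\0') {
        ++n;
    }
    return n;
#endif
}
```

Whichever build system runs the probe, the sources only see the resulting define, which is why both systems need to reproduce the same set of #defines.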

One place we can definitely share code is the unit test output.  Here,
we add test output files, one per unit test, for validating the tests.
This permits individual tests to be run, run in parallel (and it also
fixes the tests on FreeBSD where there are issues with newlines at the
end of file for some reason).  We could make automake "make check" use
these as well, replacing the perl script currently in use, which will
make the autotools testing a bit more accessible and transparent, as
well as a bit more portable.  I can certainly look at this as a followup
task.

> Leaving aside whether it's a good or bad idea, if we wanted to standardize on this, does the cmake system generate actual autoconf-compatible build files (i.e. you run configure), or does it take over that role? I think the latter is a non-starter given the prevalence of autoconf assumptions across the landscape. And if so, that does raise maintenance concerns.

I'm not entirely sure about the question you're asking here.  By
autoconf-compatible build files, you're talking about the end result of
configure--the generated Makefiles and headers, or the intermediate
autoconf/make scripts like configure/Makefile.in?

CMake and its CMakeLists.txt file are equivalent to
autoconf/automake/autoheader/libtoolize and configure.ac, Makefile.am,
config.h.in etc. along with the generated Makefile.in/configure etc.
CMake is a single tool which generates build files for any supported
build system by evaluating the CMakeLists.txt script.  In the case of
its "Unix Makefiles" generator, this generates Makefiles and headers
just like configure, but without any intermediate scripts being needed.
The Makefiles are broadly equivalent in terms of supported targets like
"make install"; and when running cmake, you can configure all the
options and path prefixes almost the same as configure (the names are
slightly different but the intent is the same).
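
To make the correspondence with configure concrete, here is a minimal
sketch.  The project and the option name "demo_enable_feature" are made
up for the example and are not options from the patch.

```cmake
# Autoconf:  ./configure --prefix=/opt/demo --enable-feature
# CMake:     cmake -DCMAKE_INSTALL_PREFIX=/opt/demo \
#                  -Ddemo_enable_feature=ON /path/to/source
cmake_minimum_required(VERSION 3.5)
project(option_demo NONE)

# Hypothetical switch, playing the role of an --enable-feature option.
option(demo_enable_feature "Enable the hypothetical feature" OFF)

# CMAKE_INSTALL_PREFIX plays the role of --prefix.
message(STATUS "Install prefix: ${CMAKE_INSTALL_PREFIX}")
message(STATUS "Feature enabled: ${demo_enable_feature}")
```

With the "Unix Makefiles" generator, the usual "make" and "make install"
then work from the build directory.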

While on Unix this is certainly duplicating much of the autotools logic,
and could potentially replace autotools entirely, the real gain (and the
intention for doing this) is the vastly improved support for Windows.
I'd suggest that the cost of the duplication is outweighed by the
existing maintenance burden of the Visual Studio solutions, which was
often externalised since people like myself had to hand-patch every
release to make them work with ICU, newer Visual Studio versions etc.
With CMake, we just run cmake with the desired options, and we're done,
and we can directly integrate this with other projects.

> Secondly, I'm mainly playing with the Windows side of this, and I was unclear if it's possible to generate solution files for both 32- and 64-bit at once? It looks like it picks one to do at a time so that if you had to build both you'd have to generate the whole set of files twice in between or use separate unpacked trees, etc. Is that correct?

Yes, as you mentioned in your other reply, you need to use separate
build directories for each compiler/platform combination; you can use a
common source tree.  On Windows, you can

- Choose the developer command prompt of your choice, and then use the
"-G" option to select the "NMake Makefiles", "Ninja" or any other
generator of your choice.
- Use the "Visual Studio nn yyyy [Win64]" generator from a regular
command prompt, and it will set up the compiler environment for you;
build with "msbuild" or open up the solution in Visual Studio.
- Use Cygwin or MinGW with the "Unix Makefiles" or any other generator
of choice
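
As a small sketch of the one-build-directory-per-platform approach (the
project name is hypothetical, and the Visual Studio version in the
comments is only an example), a CMakeLists.txt can report which
generator and pointer size it was configured with:

```cmake
# Example invocations, each from its own empty build directory:
#   cmake -G "Visual Studio 14 2015"       C:\path\to\source   (32-bit)
#   cmake -G "Visual Studio 14 2015 Win64" C:\path\to\source   (64-bit)
#   cmake --build . --config Release
cmake_minimum_required(VERSION 3.5)
project(generator_demo CXX)

message(STATUS "Generator: ${CMAKE_GENERATOR}")

# CMAKE_SIZEOF_VOID_P is available once a language has been enabled.
if(CMAKE_SIZEOF_VOID_P EQUAL 8)
  message(STATUS "Target: 64-bit")
else()
  message(STATUS "Target: 32-bit")
endif()
```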

So CMake does make building

> One issue I did notice on the Windows side is that the DLL names are
> different from the existing convention. I would have to personally
> adjust them back and I don't think we'd have any reason to want them
> changed, so I assume that could be adjusted back?

I think it's fairly simple to change; I'll have to check.  The same also
applies to the full .so version on Unix, which isn't identical to
libtool in its pattern.  Same name to link with, and same SOVERSION
symlink, but the library itself has a slightly different pattern.

The question I have here is this: why is the link name different on
Windows?  With CMake being cross-platform, it would allow the naming and
versioning conventions to be the same on both Unix and Windows.  This is
a usability gain when a cross-platform downstream project wants to link
with xerces: right now they have to hardcode the discrepancy.  I'm sure
we can tweak it to retain the existing platform-specific conventions
with a few additional bits of configuration, however.
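
The kind of configuration involved might look like the following.  This
is a sketch under assumptions: the "demo" target, the version strings
and the exact naming patterns are hypothetical and are not the names
used by xerces or by the patch.

```cmake
cmake_minimum_required(VERSION 3.5)
project(naming_demo CXX)

# Tiny generated source so the example is self-contained.
file(WRITE "${CMAKE_CURRENT_BINARY_DIR}/demo.cpp"
  "int demo_answer() { return 42; }\n")
add_library(demo SHARED "${CMAKE_CURRENT_BINARY_DIR}/demo.cpp")

if(WIN32)
  # Embed a version tag in the file name and add a "D" suffix to
  # debug builds, for example demo_3_1.dll and demo_3_1D.dll.
  set_target_properties(demo PROPERTIES
    OUTPUT_NAME "demo_3_1"
    DEBUG_POSTFIX "D")
else()
  # Embed the version in the library name itself, for example
  # libdemo-3.1.so, instead of using VERSION/SOVERSION suffixes.
  set_target_properties(demo PROPERTIES
    OUTPUT_NAME "demo-3.1")
endif()
```

OUTPUT_NAME and DEBUG_POSTFIX are standard target properties, so the
per-platform conventions can live in one place in the build script.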


Kind regards,
Roger


RE: Integrating CMake support for xerces

Cantor, Scott
> There's certainly a good amount of duplication, most of it intentionally
> so that the CMake logic mirrors the existing Autoconf feature tests
> exactly.

Right, I understand the motivation. And Xerces has one of the more horrendous config.h messes I've dealt with, it wouldn't necessarily be so much work in every project.

> It would have been great if there was some way to share the logic
> between the two.  The snippets of program code used for feature tests
> might be shared, but they are so small and trivial that it's likely not
> worth it.

No, it was more the overarching set of defines and how they're populated.

> One place we can definitely share code is the unit test output.  Here,
> we add test output files, one per unit test, for validating the tests.

I'm very unfamiliar with the tests but that all makes sense.

> I'm not entirely sure about the question you're asking here.  By
> autoconf-compatible build files, you're talking about the end result of
> configure--the generated Makefiles and headers, or the intermediate
> autoconf/make scripts like configure/Makefile.in?

No, the intermediates. People *want*, virtually *demand* the ability to do source builds with configure/make/make install and any violation of the norm is just painful for anybody using Xerces in their projects together with dozens of other libraries all expecting to be built that way. "Different" is bad in this context.

> While on Unix this is certainly duplicating much of the autotools logic,
> and could potentially replace autotools entirely, the real gain (and the
> intention for doing this) is the vastly improved support for Windows.

Oh, I know that, it's the concern that if we don't/can't replace the autotools pieces we end up needing to maintain both to keep the config.h/etc. material in sync and working.

> I think it's fairly simple to change; I'll have to check.  The same also
> applies to the full .so version on Unix, which isn't identical to
> libtool in its pattern.  Same name to link with, and same SOVERSION
> symlink, but the library itself has a slightly different pattern.

I'm not a fan of the non-soname versioning used, but there's backstory there and I don't think we want to re-litigate it, so the convention there has been to embed the ABI version in the name (libxerces-c-3.1.so, new one will be libxerces-c-3.2.so)

> The question I have here is this: why is the link name different on
> Windows?  With CMake being cross-platform, it would allow the naming and
> versioning conventions to be the same on both Unix and Windows.  This is
> a usability gain when a cross-platform downstream project wants to link
> with xerces: right now they have to hardcode the discrepancy.  I'm sure
> we can tweak it to retain the existing platform-specific conventions
> with a few additional bits of configuration, however.

Most projects don't have any cross platform build tooling for Linux and Windows, so the name difference doesn't really come up much. On the Windows side, it's critical for the ABI version tag and a D (for debug) be present, but I'm not personally wedded to anything in particular other than, again, not wanting to engage in a bikeshedding discussion about it, so leaving it consistent with 3.1 seemed the simplest choice.

If there's value in aligning with the Linux names, though, I'm not really opposed personally, but Linux doesn't have the debug/non-debug distinction either, so I don't think they could really be identical anyway...

I've successfully built various combinations of the Windows build and I get the gist of it.

I guess for myself, my comfort level would go up a bit if there was just a brief sort of outline of how a given AC_DEFINE or Windows #define would be added to the cmake build if it became needed. If I had some comfort level with how it works, the duplication wouldn't bother me too much.

-- Scott




Re: Integrating CMake support for xerces

Roger Leigh
On 16/05/2017 19:02, Cantor, Scott wrote:
>> There's certainly a good amount of duplication, most of it intentionally
>> so that the CMake logic mirrors the existing Autoconf feature tests
>> exactly.
>
> Right, I understand the motivation. And Xerces has one of the more horrendous config.h messes I've dealt with, it wouldn't necessarily be so much work in every project.

Definitely not; this is the most complex conversion I've done to date.
The previous most complex one was libtiff, which also had a fair amount
of historical stuff.  Most are trivial in comparison.

>> I'm not entirely sure about the question you're asking here.  By
>> autoconf-compatible build files, you're talking about the end result of
>> configure--the generated Makefiles and headers, or the intermediate
>> autoconf/make scripts like configure/Makefile.in?
>
> No, the intermediates. People *want*, virtually *demand* the ability to do source builds with configure/make/make install and any violation of the norm is just painful for anybody using Xerces in their projects together with dozens of other libraries all expecting to be built that way. "Different" is bad in this context.

Ah, OK.  In the case where we wanted to drop the Autotools and maintain
just one system, it would be absolutely possible to create a wrapper
"configure" script which forwards all the expected arguments to cmake.
Certainly for all prefixes, flags, --with/--enable options etc.
Likewise with any target name differences--we can create proxy targets
to forward to the real target.  The only problem might be more esoteric
cases, like cross-compilation, where it wouldn't be trivial to wrap.  We
could certainly cater for the common case scenarios though.

Hopefully I got the question you were asking.  I didn't do this in the
patch because the intention was not to replace the Autotools.  But I can
look into this if desired.

>> I think it's fairly simple to change; I'll have to check.  The same also
>> applies to the full .so version on Unix, which isn't identical to
>> libtool in its pattern.  Same name to link with, and same SOVERSION
>> symlink, but the library itself has a slightly different pattern.
>
> I'm not a fan of the non-soname versioning used, but there's backstory there and I don't think we want to re-litigate it, so the convention there has been to embed the ABI version in the name (libxerces-c-3.1.so, new one will be libxerces-c-3.2.so)

OK

>> The question I have here is this: why is the link name different on
>> Windows?  With CMake being cross-platform, it would allow the naming and
>> versioning conventions to be the same on both Unix and Windows.  This is
>> a usability gain when a cross-platform downstream project wants to link
>> with xerces: right now they have to hardcode the discrepancy.  I'm sure
>> we can tweak it to retain the existing platform-specific conventions
>> with a few additional bits of configuration, however.
>
> Most projects don't have any cross platform build tooling for Linux and Windows, so the name difference doesn't really come up much. On the Windows side, it's critical for the ABI version tag and a D (for debug) be present, but I'm not personally wedded to anything in particular other than, again, not wanting to engage in a bikeshedding discussion about it, so leaving it consistent with 3.1 seemed the simplest choice.
>
> If there's value in aligning with the Linux names, though, I'm not really opposed personally, but Linux doesn't have the debug/non-debug distinction either, so I don't think they could really be identical anyway...

OK.  I'll look into copying the existing Windows and Libtool semantics
exactly.  If there's a possibility for aligning them with the next major
release, we could revisit it then, but I'll revert to the status quo for
now.

> I've successfully built various combinations of the Windows build and I get the gist of it.
>
> I guess for myself, my comfort level would go up a bit if there was just a brief sort of outline of how a given AC_DEFINE or Windows #define would be added to the cmake build if it became needed. If I had some comfort level with how it works, the duplication wouldn't bother me too much.

This is pretty straightforward:

src/xercesc/util/Xerces_autoconf_config.hpp.cmake.in is the template.
It's the same as src/xercesc/util/Xerces_autoconf_config.hpp.in with
these exceptions:

- "#cmakedefine var 1" is used in place of "#undef var" for Boolean macros
- "#define var @var@" is used for substitutions of types and values with
direct replacement of the @var@, just like in configure .in files.
- It also includes the missing Windows logic which wasn't needed for the
Unix-only autoconf template

The header is generated from the template by the configure_file call in
the root CMakeLists.txt:

configure_file(
   ${CMAKE_CURRENT_SOURCE_DIR}/src/xercesc/util/Xerces_autoconf_config.hpp.cmake.in
   ${CMAKE_CURRENT_BINARY_DIR}/src/xercesc/util/Xerces_autoconf_config.hpp
   @ONLY)

The @ONLY means only @var@ substitutions are allowed (no ${var}).

There is no AC_DEFINE equivalent; there is no autoheader, and the
substitutions are regular variables, so

     set(var ON|OFF|TRUE|FALSE|1|0) -- Boolean
     set(var some-value) -- string

will work for both types of substitution in the template.
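
A self-contained way to try both kinds of substitution is sketched
below.  The file names and the two HAVE_* variables are made up for the
demonstration; only the XERCES_PLATFORM_EXPORT name comes from the
patch, and its value here is just an example.

```cmake
# configure_demo.cmake: run with "cmake -P configure_demo.cmake".
cmake_minimum_required(VERSION 3.5)

set(HAVE_STDINT_H ON)     # Boolean, true
set(HAVE_ENDIAN_H OFF)    # Boolean, false
set(XERCES_PLATFORM_EXPORT "__declspec(dllexport)")  # string value

# Write a tiny template.  The bracket argument [[...]] keeps the text
# literal, so nothing is expanded at this point.
file(WRITE "${CMAKE_CURRENT_BINARY_DIR}/demo_config.hpp.in" [[
#cmakedefine HAVE_STDINT_H 1
#cmakedefine HAVE_ENDIAN_H 1
#define XERCES_PLATFORM_EXPORT @XERCES_PLATFORM_EXPORT@
#define DEMO_UNTOUCHED "${HAVE_STDINT_H}"
]])

# Generate the header.  With @ONLY, ${...} text is left alone.
configure_file(
  "${CMAKE_CURRENT_BINARY_DIR}/demo_config.hpp.in"
  "${CMAKE_CURRENT_BINARY_DIR}/demo_config.hpp"
  @ONLY)

# Print the generated header for inspection.
file(READ "${CMAKE_CURRENT_BINARY_DIR}/demo_config.hpp" result)
message("${result}")
```

A true variable turns its #cmakedefine line into a #define, a false one
becomes a commented-out #undef, and the @var@ reference is replaced by
the variable's value.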

All of the feature tests set a variable for propagation into the
template.  Example: each header test in cmake/XercesIncludes.cmake sets
a specified result variable which we then use directly.  They are
named exactly the same as the autoconf variables to make the direct
equivalence clear.  Or in the case of cmake/XercesDLL.cmake we set
XERCES_PLATFORM_EXPORT and XERCES_PLATFORM_IMPORT which are then
substituted into the header.

I hope that explains things, but I'm happy to go into more detail for
any aspects which are unclear.

Regards,
Roger





Re: Integrating CMake support for xerces

Roger Leigh
On 16/05/2017 19:42, Roger Leigh wrote:

> configure_file(
>   ${CMAKE_CURRENT_SOURCE_DIR}/src/xercesc/util/Xerces_autoconf_config.hpp.cmake.in
>   ${CMAKE_CURRENT_BINARY_DIR}/src/xercesc/util/Xerces_autoconf_config.hpp
>   @ONLY)

I should have mentioned: from an autoconf point of view,
"configure_file" is roughly equivalent to "AC_CONFIG_FILES" or
"AC_CONFIG_HEADERS" followed by "AC_OUTPUT", but with a bit more
flexibility in that the input and output names are explicit and can
differ--no implicit .in extension in the source tree or requirement that
they have the same pathname.



RE: Integrating CMake support for xerces

Cantor, Scott
In reply to this post by Roger Leigh
> Definitely not; this is the most complex conversion I've done to date.
> The previous most complex one was libtiff, which also had a fair amount
> of historical stuff.  Most are trivial in comparison.

Good to know.

> Hopefully I got the question you were asking.  I didn't do this in the
> patch because the intention was not to replace the Autotools.  But I can
> look into this if desired.

Not at the moment, I was just trying to understand the relationship between the tools and the implications of having both on the maintenance.

> OK.  I'll look into copying the existing Windows and Libtool semantics
> exactly.  If there's a possibility for aligning them with the next major
> release, we could revisit it then, but I'll revert to the status quo for
> now.

Thx.

> I hope that explains things, but I'm happy to go into more detail for
> any aspects which are unclear.

Thanks, I'll review further while you make the filename adjustments but I think if others don't have concerns we can plan for merging it back to trunk.

-- Scott

