Conversion: Word documents to XML (DocBook)


bn_shr Posted: Fri Jun 02, 2006 2:37 pm


Joined: 02 Jun 2006

Posts: 1
Hi All,

I need to find a method for converting Word documents to DocBook, which
will finally be exported to the FrameMaker format.

Could anyone suggest a method for the process without loss of formatting?

Thanks,
Shruti


feim4162 Posted: Fri Jun 02, 2006 8:39 pm


Joined: 09 Feb 2005

Posts: 4
Shruti,

You can always use FrameMaker conversion tables to structure the content.
The theoretically simple procedure is:

1. Import Word Content into FrameMaker. The content will be unstructured.
2. Create a conversion table to convert the unstructured Frame content into
structured Frame/DocBook.
3. Save as XML.

In reality, you'll probably have to:

1. Massage the content in Word so that it uses logical/semantic paragraph
styles with no overrides. VBA works great for this.
2. Ensure you have a template in Frame with the proper paragraph/character
styles that matches your current formatting and is semantically rich enough.
3. Ensure that the elements in the DocBook EDD are mapped to the
paragraph/character styles in your template.
4. Import the Word content into FrameMaker.
5. Clean up the content to remove some residual Word funkiness.
6. Generate a conversion table and map the paragraph styles, character
styles, table elements, etc. to the proper DocBook elements.
7. Run the Frame file against the conversion table.
8. Clean up the structured Frame document so that it is valid.
9. Save as XML.

Repeat any of the above steps as often as necessary. It can be a very
iterative process.
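
To make the mapping in steps 6 and 7 more concrete, here is a minimal Python sketch of what a conversion table does conceptually. It is not FrameMaker's conversion table format, and it does not talk to FrameMaker at all. The style names, the CONVERSION_TABLE dictionary and the convert() helper are all made up for illustration, and the output is not validated against the DocBook DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical conversion table: paragraph style name -> DocBook element.
# The style names are examples only; use the ones in your own template.
CONVERSION_TABLE = {
    "Heading 1": "title",
    "Body Text": "para",
    "List Bullet": "listitem",
    "Code": "programlisting",
}


def convert(paragraphs):
    """Map (style, text) pairs to DocBook elements and collect unmapped styles."""
    section = ET.Element("section")
    unmapped = []
    bullet_list = None  # the open <itemizedlist>, if we are inside a bullet run
    for style, text in paragraphs:
        tag = CONVERSION_TABLE.get(style)
        if tag is None:
            # Style overrides and ad hoc styles end up here and need cleanup.
            unmapped.append(style)
            bullet_list = None
            continue
        if tag == "listitem":
            # Consecutive bullets share one <itemizedlist> wrapper.
            if bullet_list is None:
                bullet_list = ET.SubElement(section, "itemizedlist")
            item = ET.SubElement(bullet_list, "listitem")
            ET.SubElement(item, "para").text = text
        else:
            bullet_list = None
            ET.SubElement(section, tag).text = text
    return section, unmapped


paragraphs = [
    ("Heading 1", "Installing"),
    ("Body Text", "Run the installer."),
    ("List Bullet", "Accept the licence."),
    ("List Bullet", "Choose a folder."),
    ("Normal + Bold", "Warning!"),
]
section, unmapped = convert(paragraphs)
print(ET.tostring(section, encoding="unicode"))
print("Unmapped styles:", unmapped)
```

The list of unmapped styles is the useful part: anything that shows up there is content that has to be restyled in Word or cleaned up after import, which is why the process tends to be iterative.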

I don't use DocBook, so others can chime in on the strengths and weaknesses
of FrameMaker's implementation of DocBook. But, this process has worked
fairly well for me in the past and with a current project.

Mike Feimster
IDD Technical Analyst

ACS Technologies
180 N. Dunbarton Drive
Florence, SC 29501
p / 843.413.8122
f / 843.413.8122
e / mike.feimster@...


t_craw2002 Posted: Fri Jun 02, 2006 8:44 pm


Joined: 12 Jul 2005

Posts: 2
Hi Shruti,
You could try opening the Word documents in OpenOffice, which is
supposed to have a built-in DocBook converter, or one can be added. I
think there is some loss of formatting in opening the Word document in
OO, but not much.
As for going from DocBook to FM, it sounds a little strange to me, but then I
don't know your processes.
All the best,
Tom.



dirtroad30534 Posted: Fri Jun 02, 2006 8:58 pm


Joined: 13 Jun 2003

Posts: 39
> I have to find methods of converting the Word documents to
> Docbook coversion which finally are to be exported to the FrameMaker
> format.
>
> Could any one suggest an method for the process without loss in
> formatting?

Try importing the Word file into OpenOffice and saving as
DocBook from there. Your Word file will have to use styles
thoroughly and consistently for this to have any chance of
working well, and I would expect some cleanup work in any
case.
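
One way to check how consistently a file uses styles before attempting the conversion is to audit it. The Python sketch below is only an illustration under several assumptions: the document has been saved from Word 2003 as XML (WordprocessingML), the ALLOWED_STYLES set is a hypothetical list of the styles your conversion knows about, and only direct bold/italic on a run is treated as an override. Real documents have many more kinds of override.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace of Word 2003 XML ("Save as XML"); other formats use other namespaces.
W = "http://schemas.microsoft.com/office/word/2003/wordml"
NS = {"w": W}

# Hypothetical whitelist of the style IDs your conversion can map.
ALLOWED_STYLES = {"Heading1", "Heading2", "BodyText", "ListBullet"}


def audit(wordml):
    """Count paragraph styles and list paragraphs likely to convert badly."""
    root = ET.fromstring(wordml)
    counts = Counter()
    problems = []
    for number, para in enumerate(root.iter(f"{{{W}}}p"), start=1):
        style_el = para.find("w:pPr/w:pStyle", NS)
        style = style_el.get(f"{{{W}}}val") if style_el is not None else None
        counts[style] += 1
        if style not in ALLOWED_STYLES:
            problems.append((number, f"unexpected style: {style}"))
        # Bold or italic set directly on a run is a formatting override.
        bold = para.find("w:r/w:rPr/w:b", NS)
        italic = para.find("w:r/w:rPr/w:i", NS)
        if bold is not None or italic is not None:
            problems.append((number, "direct bold/italic override"))
    return counts, problems


# Made-up sample: the second paragraph is a "fake heading" built with bold.
SAMPLE = f"""<w:wordDocument xmlns:w="{W}"><w:body>
  <w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Intro</w:t></w:r></w:p>
  <w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Fake heading</w:t></w:r></w:p>
  <w:p><w:pPr><w:pStyle w:val="BodyText"/></w:pPr><w:r><w:t>Text.</w:t></w:r></w:p>
</w:body></w:wordDocument>"""

counts, problems = audit(SAMPLE)
for number, message in problems:
    print(f"paragraph {number}: {message}")
```

A paragraph with no named style, or with bold applied by hand to look like a heading, is exactly the kind of thing that a style-driven converter cannot interpret.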

--
Larry Kollar, Senior Technical Writer, ARRIS CPE Products
"Content creators are the engine that drives
value in the information life cycle."
-- Barry Schaeffer, on XML-Doc
shuttie27 Posted: Fri Jun 02, 2006 10:21 pm


Joined: 02 Jun 2005

Posts: 3
Larry suggested use of Open Office as an intermediate step. An
alternative is to use the free rtf2xml utility
(http://rtf2xml.sourceforge.net/). This converts your Word doc to RTF
and then to DocBook. Some set-up is required, and you need to install
Python.

Again, the degree of success depends on whether the Word docs use styles
consistently, and whether the docs are consistently structured. In any
case you should expect to have to do a fair amount of cleanup. For
cleanup in FrameMaker, you should investigate use of the FrameSLT plugin
from West Street Consulting. The node wizard will help a great deal.

I'm not sure what you mean by "without loss in formatting" since by
definition you will lose the formatting on conversion to XML. Your
FrameMaker setup will take care of that once the conversion is complete,
assuming you create an EDD, template, and read-write rules.

Hope this helps.

Roger

Roger Shuttleworth
Documentation Team Lead
Activplant Corporation
140 Fullarton St.
London, Ontario
N6A 5P2
Canada
Tel. 519 668-7336
Fax. 519 668-3227
www.activplant.com
melaniekendell Posted: Sat Jun 03, 2006 8:01 am


Joined: 08 Feb 2005

Posts: 8
Hi Shruti

You might find it easier to import to FrameMaker directly (you don't
say whether you already have a FrameMaker to DocBook mechanism set up).

Word to FrameMaker works pretty well as long as you have a FrameMaker
template set up with the same styles as the Word doc (FM to Word is,
unfortunately, not as successful).

Just a thought.

-Melanie

eoincampbell2 Posted: Tue Jun 06, 2006 7:16 pm


Joined: 06 Jun 2006

Posts: 1
There are a number of commercial Word to DocBook XML converters
including UpCast, Logictran and (our own offering) YAWC Pro
(www.yawcpro.com).

With all of them, the key is to clean up the Word file before attempting
to convert to XML. This means applying heading and character level styles
consistently, and using named styles (e.g. List Bullet, Heading 1, etc.)
rather than presentation-only formatting.

We have developed a Word template which assists the editing process, by
making explicit a lot of the commonly used styles in Word, so that
editors/authors find it easy to apply the required style. The template has
an explicit menu item, toolbar icon and keyboard shortcut to apply the most
common structural styles (e.g. <Ctrl>+1 = Heading 1).

You can download it from
http://www.yawconline.com/wordtemplates/yawcOnline.dot
Feel free to use it as you wish.

Once correctly formatted in Word, any Word to XML converter will do a
reasonably good job of turning it into DocBook XML, although only a simple
section hierarchy will be supported. If you want to automatically convert
certain Word constructs to specific DocBook element structures, then you
will need to customise the conversion process to a greater or lesser extent.
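
As an illustration of the "simple section hierarchy" a converter can derive from heading styles, here is a minimal Python sketch. It is not how YAWC Pro or any other product mentioned here works. The build_sections() helper is hypothetical, it assumes styles named "Heading N" with N of 1 or more, and the result is not validated against the DocBook DTD.

```python
import xml.etree.ElementTree as ET


def build_sections(paragraphs):
    """Nest DocBook <section> elements according to 'Heading N' styles."""
    root = ET.Element("article")
    stack = [(0, root)]  # (heading level, element); the article counts as level 0
    for style, text in paragraphs:
        if style.startswith("Heading "):
            level = int(style.split()[1])
            # A new heading closes every open section at the same or a deeper level.
            while stack[-1][0] >= level:
                stack.pop()
            section = ET.SubElement(stack[-1][1], "section")
            ET.SubElement(section, "title").text = text
            stack.append((level, section))
        else:
            # Everything else becomes a plain paragraph in the open section.
            ET.SubElement(stack[-1][1], "para").text = text
    return root


paragraphs = [
    ("Heading 1", "Setup"),
    ("Body Text", "Intro."),
    ("Heading 2", "Windows"),
    ("Body Text", "Run setup.exe."),
    ("Heading 1", "Usage"),
    ("Body Text", "Start the program."),
]
article = build_sections(paragraphs)
print(ET.tostring(article, encoding="unicode"))
```

Anything beyond this, such as turning a specially styled paragraph into a note or a procedure step, is where the customisation comes in.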


--
Eoin Campbell, Technical Director, XML Workshop Ltd.
10 Greenmount Industrial Estate, Harolds Cross, Dublin, Ireland.
Phone: +353 1 4547811; fax: +353 1 4496299.
Email: ecampbell@...; web: www.xmlw.ie
YAWC: One-click web publishing from Word!
YAWC Pro: www.yawcpro.com
YAWC Online: www.yawconline.com
lorax1284 Posted: Wed Jun 07, 2006 7:29 pm


Joined: 11 Jun 2003

Posts: 6
--- In xml-doc@yahoogroups.com, Eoin Campbell <ecampbell@...> wrote:

> With all of them, the key is to clean up the Word file before
> attempting to convert to XML. This means applying heading and
> character level styles consistently, and using named styles
> (e.g. List Bullet, Heading 1, etc.) rather than presentation-only
> formatting.

Hello; if this sounds like too much work for you, and you're not
inclined to spend the time doing the work yourself, the company I work
for, Exegenix, uses a different approach; print the file to
PostScript, and we will analyse the formatting to properly intuit the
section hierarchy, and we automatically detect lists, tables etc., by
their layout on the page, regardless of the formatting codes used.
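
To give a rough idea of what inferring structure from layout can mean, here is a toy Python heuristic. It is emphatically not the Exegenix algorithm, which is not described in this thread. It assumes you already have each line's text and font size (the values below are made up), treats the most common size as body text, and ranks every larger size as a heading level.

```python
from collections import Counter


def infer_levels(lines):
    """Guess heading levels from the font sizes of (text, size_pt) lines.

    The most common size is taken to be body text. Each distinct larger size
    becomes a heading level, largest first. Level 0 means body text.
    """
    body_size = Counter(size for _, size in lines).most_common(1)[0][0]
    heading_sizes = sorted(
        {size for _, size in lines if size > body_size}, reverse=True
    )
    level_of = {size: index + 1 for index, size in enumerate(heading_sizes)}
    return [(text, level_of.get(size, 0)) for text, size in lines]


# Made-up page content: (text, font size in points).
lines = [
    ("User Guide", 24.0),
    ("Installing", 16.0),
    ("Run the installer.", 10.0),
    ("Accept the licence.", 10.0),
    ("Usage", 16.0),
    ("Start the program.", 10.0),
]
levels = infer_levels(lines)
for text, level in levels:
    print(level, text)
```

A real layout analysis would also have to look at position, spacing, numbering and fonts, and would need to extract all of that from the PostScript in the first place.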

If there are specific semantic tags desired, like "author" or
"publishername", tagging can be handled during the quality assurance
phase using our ECS Inspector tool (either by us, or by you) or, as
Eoin suggests, you could pre-process the document... but instead of
having to ensure every single FORMATTING construct is properly tagged,
you JUST tag the handful of important semantic objects... the
processing from that point is automated.

The pricing model is based on volume of output, with
cost-per-kilocharacter of output decreasing with higher volumes. If
the time you spend doing tagging and cleanup is part of your cost
consideration, Exegenix is very cost effective. Otherwise, you can
spend hours or days of your own time cleaning up and fixing things,
instead of being outside enjoying the summer weather. :-)

Visit www.exegenix.com and submit a sample document to us and we can
provide the sample output for you.

Ryan Germann
Exegenix Product Manager
phil_caisley Posted: Fri Jun 09, 2006 3:23 am


Joined: 07 Mar 2005

Posts: 2
Hi all,

You could also take a look at Exegenix (exegenix.com), which can convert any
styled PDF or PostScript file to DocBook XML and retain all the formatting as
attributes of the DocBook XML elements.

Cheers
Phil

