Regarding "Inherent Structure in Documents"
Joined: 11 Jun 2003
Posts: 6
A few weeks ago, the following message was posted:
Subject: [xml-doc] Question - inherent structure of documents
Date: Thu, 22 May 2003 17:14:38 -0000
From: binisiya <binisiya@...>
Reply-To: xml-doc@yahoogroups.com
To: xml-doc@yahoogroups.com
Most everyone here is anxious to "go structured" via
the imposed DTD/Schema route, but has it occurred to anyone
that documents created with conventional authoring tools
are already inherently structured? If so, what do you
make of this?
Also, it seems that documentation inherently has at
least three dimensions: schemata (knowledge and its
abstract organization), structure (layout), and semantics
(structure to support meaning).
Efforts like DocBook attempt to collapse all aspects
into a single, undifferentiated schema. Has anyone else
used namespaces to make explicit the multi-dimensional
structure of documents?
<snip />
I would like to direct you to a white paper:
"There are no unstructured documents", presented at XML Europe 2002 by
David Slocombe and Rodney Boyd, Exegenix
http://www.exegenix.com/media/pdf/exegenix_xmleurope2002_paper.pdf
where we provide details on how the Exegenix Conversion System
identifies the structure inherent in printed documents, and makes it
explicit as XML markup.
This idea is the basis for our conversion technology. I hope you find
it informative.
--
Ryan Germann, Product Manager
Exegenix
2490 Bloor St. W., Suite 200
Toronto, ON, Canada M6S 1R4
416-762-2433 fax 416-762-2453
ryan@... http://www.exegenix.com
Joined: 14 Jun 2003
Posts: 10
I'm curious to know if your converter recognizes the
structure of plain text documents and, if so, how it
deals with the complexity of "continuation paragraphs",
especially in multiple-depth lists with multiple-line
elements.
The situations look like this:
------------------------
* A list entry
  with multiple lines
    o A subentry that also
      has multiple lines

      A continuation paragraph.
  Another paragraph.
And finally another paragraph.
-------------------------------
The problem is to determine which list entries get the
continuation paragraphs appended to them, after any
sublists. (The last paragraph in the example is clearly
not a continuation paragraph, since it is flush left.
But what about the other two?)
Especially ambiguous cases arise when proportional fonts
are used, since the writer can easily be off by several
characters when trying to keep the indentation even.
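To make the question concrete, here is a minimal Python sketch of one possible indentation heuristic. It is not the Exegenix method (which the thread does not describe); it simply attaches each paragraph to the deepest open list entry whose text starts at or to the left of the paragraph's indent. The names `Node`, `parse`, and `slack`, the bullet characters, and the sample input (a hypothetical rendering of the example, with blank lines between paragraphs) are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field

# Assumed bullet styles: "*", "o", or "-" followed by whitespace.
BULLET = re.compile(r"^(\s*)([*o-])\s+(.*)$")


@dataclass
class Node:
    text_col: int                                   # column where this entry's text starts
    paragraphs: list = field(default_factory=list)  # paragraphs owned by this entry
    children: list = field(default_factory=list)    # nested list entries


def parse(text, slack=1):
    """Build a tree of list entries; `slack` forgives small indent errors."""
    root = Node(text_col=0)   # the document itself
    stack = [root]            # currently open entries, outermost first
    after_blank = True
    for raw in text.expandtabs().splitlines():
        if not raw.strip():
            after_blank = True
            continue
        match = BULLET.match(raw)
        if match:
            indent = len(match.group(1))
            # Close every entry whose text starts to the right of this bullet.
            while len(stack) > 1 and stack[-1].text_col > indent:
                stack.pop()
            node = Node(text_col=match.start(3), paragraphs=[match.group(3)])
            stack[-1].children.append(node)
            stack.append(node)
        elif after_blank:
            indent = len(raw) - len(raw.lstrip())
            # Close entries indented clearly deeper than this paragraph;
            # once a sublist is closed, later paragraphs cannot rejoin it.
            while len(stack) > 1 and stack[-1].text_col > indent + slack:
                stack.pop()
            stack[-1].paragraphs.append(raw.strip())
        else:
            # No blank line before it: treat as a wrapped line of the same paragraph.
            stack[-1].paragraphs[-1] += " " + raw.strip()
        after_blank = False
    return root


SAMPLE = """\
* A list entry
  with multiple lines
    o A subentry that also
      has multiple lines

      A continuation paragraph.

  Another paragraph.

And finally another paragraph.
"""

root = parse(SAMPLE)
entry = root.children[0]
subentry = entry.children[0]
```

On this input the first continuation paragraph lands in `subentry.paragraphs`, "Another paragraph." in `entry.paragraphs`, and the flush-left paragraph in `root.paragraphs`. The `slack` parameter is a crude answer to the proportional-font problem: a paragraph indented one column short of an entry's text still attaches to that entry. A real converter would presumably need more clues than indent alone.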
Joined: 11 Jun 2003
Posts: 6
Eric Armstrong wrote:
> I'm curious to know if your converter recognizes the
> structure of plain text documents and, if so, how it
> deals with the complexity of "continuation paragraphs",
> especially in multiple-depth lists with multiple-line
> elements.
> The situations look like this:
> ------------------------
> * A list entry
>   with multiple lines
>     o A subentry that also
>       has multiple lines
>
>       A continuation paragraph.
>   Another paragraph.
> And finally another paragraph.
> -------------------------------
>
> The problem is to determine which list entries get the
> continuation paragraphs appended to them, after any
> sublists. (The last paragraph in the example is clearly
> not a continuation paragraph, since it is flush left.
> But what about the other two?)
Our technology does handle plain text documents. A variety of clues
help determine the nesting level of a particular object; indent is one
of those clues. If you'd like to send along a content sample, please
contact me directly via one of the methods in my sig.
Ryan
--
Ryan Germann, Product Manager
Exegenix
2490 Bloor St. W., Suite 200
Toronto, ON, Canada M6S 1R4
416-762-2433 fax 416-762-2453
ryan@... http://www.exegenix.com
Joined: 17 Jun 2003
Posts: 4
/ Ryan Germann <ryan@...> was heard to say:
| A few weeks ago, the following message was posted:
|
| Subject: [xml-doc] Question - inherent structure of documents
| Date: Thu, 22 May 2003 17:14:38 -0000
| From: binisiya <binisiya@...>
| Reply-To: xml-doc@yahoogroups.com
| To: xml-doc@yahoogroups.com
|
| Most everyone here is anxious to "go structured" via
| the imposed DTD/Schema route, but has it occurred to anyone
| that documents created with conventional authoring tools
| are already inherently structured? If so, what do you
| make of this?
|
| Also, it seems that documentation inherently has at
| least three dimensions: schemata (knowledge and its
| abstract organization), structure (layout), and semantics
| (structure to support meaning).
|
| Efforts like DocBook attempt to collapse all aspects
| into a single, undifferentiated schema. Has anyone else
| used namespaces to make explicit the multi-dimensional
| structure of documents?
Any attempt to do this would almost certainly require overlapping
markup. If technologies like LMNL take off, that might become
practical.
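To illustrate why separate layout and semantic dimensions tend to need overlapping markup, here is a small Python sketch using standoff ranges over a string. It is not LMNL syntax or any LMNL API; the `Range` class, the layer names, and the sample text are hypothetical. It only shows that a layout range (a page) and a semantic range (a sentence) can cross, which a single XML element tree cannot express because elements must nest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    layer: str   # hypothetical dimension name, e.g. "layout" or "semantic"
    name: str
    start: int   # inclusive character offset
    end: int     # exclusive character offset


def crosses(a, b):
    """True if a and b overlap but neither fully contains the other."""
    overlap = a.start < b.end and b.start < a.end
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return overlap and not (a_in_b or b_in_a)


text = "First sentence. Second sentence that crosses pages."
page_break = text.index("that")       # assume the page ends mid-sentence
second_start = text.index("Second")

page1 = Range("layout", "page", 0, page_break)
page2 = Range("layout", "page", page_break, len(text))
sentence1 = Range("semantic", "sentence", 0, second_start - 1)
sentence2 = Range("semantic", "sentence", second_start, len(text))

# Pairs of ranges from different layers that cannot both be XML elements
# in one tree, because they cross instead of nesting.
conflicts = [
    (a.name, b.name)
    for a in (page1, page2)
    for b in (sentence1, sentence2)
    if crosses(a, b)
]
```

Here only the first page and the second sentence conflict: the sentence starts on one page and ends on the next. Keeping each dimension as its own set of ranges avoids forcing one of them to be split to fit the other's hierarchy.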
Be seeing you,
norm
--
Norman Walsh <normyahoo@...> | I have seen the truth and it
http://nwalsh.com/ | makes no sense.