<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Help us with the site</title>
	<atom:link href="http://archive.mises.org/9526/help-us-with-the-site/feed/" rel="self" type="application/rss+xml" />
	<link>http://archive.mises.org/9526/help-us-with-the-site/</link>
	<description>Proceeding Ever More Boldly Against Evil</description>
	<lastBuildDate>Fri, 24 May 2013 07:53:49 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
	<item>
		<title>By: hz</title>
		<link>http://archive.mises.org/9526/help-us-with-the-site/comment-page-1/#comment-507135</link>
		<dc:creator>hz</dc:creator>
		<pubDate>Tue, 03 Mar 2009 02:29:58 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mises.org/archives/009526.asp#comment-507135</guid>
		<description><![CDATA[I dunno, I use reStructuredText quite regularly for these types of documents (i.e. relatively straightforward text with sections and footnotes and a table or two), and it works very well for turning out XHTML and PDF quickly with no hassle. It is definitely easier for humans to read and write than XML. In fact, in my technical writing I used to mark up in XML as I wrote, but it got so distracting I switched to something that felt more natural.

But I admit that XML might be &quot;better&quot; in that if marked up right it&#039;s more explicit... it&#039;s just uglier (and XSLT for conversion also stinks.) Not saying you&#039;re wrong, just thinking of what is easy for volunteers to do.]]></description>
		<content:encoded><![CDATA[<p>I dunno, I use reStructuredText quite regularly for these types of documents (i.e. relatively straightforward text with sections and footnotes and a table or two), and it works very well for turning out XHTML and PDF quickly with no hassle. It is definitely easier for humans to read and write than XML. In fact, in my technical writing I used to mark up in XML as I wrote, but it got so distracting I switched to something that felt more natural.</p>
<p>But I admit that XML might be &#8220;better&#8221; in that if marked up right it&#8217;s more explicit&#8230; it&#8217;s just uglier (and XSLT for conversion also stinks.) Not saying you&#8217;re wrong, just thinking of what is easy for volunteers to do.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Peter</title>
		<link>http://archive.mises.org/9526/help-us-with-the-site/comment-page-1/#comment-507116</link>
		<dc:creator>Peter</dc:creator>
		<pubDate>Tue, 03 Mar 2009 00:43:06 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mises.org/archives/009526.asp#comment-507116</guid>
		<description><![CDATA[Bah.  reStructuredText is OK for formatting docstrings without markup, but nowhere near suitable for real work.  XML, please (hence XHTML; avoid HTML that doesn&#039;t parse as valid XML, too).]]></description>
		<content:encoded><![CDATA[<p>Bah.  reStructuredText is OK for formatting docstrings without markup, but nowhere near suitable for real work.  XML, please (hence XHTML; avoid HTML that doesn&#8217;t parse as valid XML, too).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: hz</title>
		<link>http://archive.mises.org/9526/help-us-with-the-site/comment-page-1/#comment-506989</link>
		<dc:creator>hz</dc:creator>
		<pubDate>Mon, 02 Mar 2009 15:25:09 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mises.org/archives/009526.asp#comment-506989</guid>
		<description><![CDATA[While there may not be a need for HTML specifically, I can definitely see the need for getting publications out of the fixed page format into a structured format that is easily parsable.

Off the top of my head:
1. lower bandwidth
2. device and software independence
3. somewhat easier / more accurate indexing
4. ease of conversion to whatever your needs might be

If it were me, I would be using a lightweight markup language (reStructuredText with Python Docutils, or SiSU, is probably the best for this application) as my base format, rather than HTML. From there you can turn out HTML, Palm Doc, nicely formatted PDF with LaTeX, ODF, or simply read the markup as is in plain text.]]></description>
		<content:encoded><![CDATA[<p>While there may not be a need for HTML specifically, I can definitely see the need for getting publications out of the fixed page format into a structured format that is easily parsable.</p>
<p>Off the top of my head:<br />
1. lower bandwidth<br />
2. device and software independence<br />
3. somewhat easier / more accurate indexing<br />
4. ease of conversion to whatever your needs might be</p>
<p>If it were me, I would be using a lightweight markup language (reStructuredText with Python Docutils, or SiSU, is probably the best for this application) as my base format, rather than HTML. From there you can turn out HTML, Palm Doc, nicely formatted PDF with LaTeX, ODF, or simply read the markup as is in plain text.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Peter</title>
		<link>http://archive.mises.org/9526/help-us-with-the-site/comment-page-1/#comment-506978</link>
		<dc:creator>Peter</dc:creator>
		<pubDate>Mon, 02 Mar 2009 14:37:45 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mises.org/archives/009526.asp#comment-506978</guid>
		<description><![CDATA[Huang Di: one advantage of conversion to HTML is easier conversion from there to EPUB (which is actually just restricted XHTML plus some additional bits). People who won&#039;t read PDFs won&#039;t read HTML (or EPUB) either, but people who will read PDFs might well prefer to read in reflowable HTML/EPUB format on a hand-held eInk device - I know I would... not to mention reducing the download size by a factor of 50 or more.]]></description>
		<content:encoded><![CDATA[<p>Huang Di: one advantage of conversion to HTML is easier conversion from there to EPUB (which is actually just restricted XHTML plus some additional bits). People who won&#8217;t read PDFs won&#8217;t read HTML (or EPUB) either, but people who will read PDFs might well prefer to read in reflowable HTML/EPUB format on a hand-held eInk device &#8211; I know I would&#8230; not to mention reducing the download size by a factor of 50 or more.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Huang Di</title>
		<link>http://archive.mises.org/9526/help-us-with-the-site/comment-page-1/#comment-506939</link>
		<dc:creator>Huang Di</dc:creator>
		<pubDate>Mon, 02 Mar 2009 12:26:43 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mises.org/archives/009526.asp#comment-506939</guid>
		<description><![CDATA[I missed the explanation for the NEED for HTML conversion of the PDFs ...

If it&#039;s more than 5 pages in length, converting it to HTML will do no good: if people do not read PDFs, they won&#039;t read lengthy HTML pages either (plus screen staring is bad for the eyes) ...

Maybe it is out of concern about application switching? PDF readers have, or will soon have, note taking &amp; page bookmarks, making them definitely more usable than HTML.]]></description>
		<content:encoded><![CDATA[<p>I missed the explanation for the NEED for HTML conversion of the PDFs &#8230;</p>
<p>If it&#8217;s more than 5 pages in length, converting it to HTML will do no good: if people do not read PDFs, they won&#8217;t read lengthy HTML pages either (plus screen staring is bad for the eyes) &#8230;</p>
<p>Maybe it is out of concern about application switching? PDF readers have, or will soon have, note taking &#038; page bookmarks, making them definitely more usable than HTML.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Diakrisis Logismōn</title>
		<link>http://archive.mises.org/9526/help-us-with-the-site/comment-page-1/#comment-506754</link>
		<dc:creator>Diakrisis Logismōn</dc:creator>
		<pubDate>Mon, 02 Mar 2009 06:09:40 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mises.org/archives/009526.asp#comment-506754</guid>
		<description><![CDATA[I have used the Readiris OCR software from I.R.I.S. to convert PDF to HTML, with very good results :0)]]></description>
		<content:encoded><![CDATA[<p>I have used the Readiris OCR software from I.R.I.S. to convert PDF to HTML, with very good results :0)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: hz</title>
		<link>http://archive.mises.org/9526/help-us-with-the-site/comment-page-1/#comment-506693</link>
		<dc:creator>hz</dc:creator>
		<pubDate>Mon, 02 Mar 2009 04:24:48 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mises.org/archives/009526.asp#comment-506693</guid>
		<description><![CDATA[Denis - not really speaking for mises.org, but most likely the reason it can&#039;t be fully automated is that they&#039;re trying to digitize scanned documents, and the OCR is never perfect.

Something like pgdp.net is the best you can do, which is not exactly automatic but does distribute the load. Mises.org might benefit from checking out some of their pre-processing scripts. 

Having done quite a bit of this for my own personal use, I&#039;ve found that some consistent OCR errors can be caught and fixed relatively easily... and something like aspell list will produce a list of &quot;bad&quot; words that you can scan to fix misspelled &quot;keywords&quot; for better search results.

But if you want a perfect document you pretty much have to go through it line by line sooner or later.]]></description>
		<content:encoded><![CDATA[<p>Denis &#8211; not really speaking for mises.org, but most likely the reason it can&#8217;t be fully automated is that they&#8217;re trying to digitize scanned documents, and the OCR is never perfect.</p>
<p>Something like pgdp.net is the best you can do, which is not exactly automatic but does distribute the load. Mises.org might benefit from checking out some of their pre-processing scripts. </p>
<p>Having done quite a bit of this for my own personal use, I&#8217;ve found that some consistent OCR errors can be caught and fixed relatively easily&#8230; and something like aspell list will produce a list of &#8220;bad&#8221; words that you can scan to fix misspelled &#8220;keywords&#8221; for better search results.</p>
<p>But if you want a perfect document you pretty much have to go through it line by line sooner or later.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: JD</title>
		<link>http://archive.mises.org/9526/help-us-with-the-site/comment-page-1/#comment-506691</link>
		<dc:creator>JD</dc:creator>
		<pubDate>Mon, 02 Mar 2009 04:24:06 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mises.org/archives/009526.asp#comment-506691</guid>
		<description><![CDATA[Adobe has a conversion tool for this purpose, which can be found here:

http://www.adobe.com/products/acrobat/access_onlinetools.html

I am also curious as to why you stated that the PDF files cannot be converted via a parser. Does it have anything to do with the fact that some of the PDFs are optical scans from books?]]></description>
		<content:encoded><![CDATA[<p>Adobe has a conversion tool for this purpose, which can be found here:</p>
<p><a href="http://www.adobe.com/products/acrobat/access_onlinetools.html" rel="nofollow">http://www.adobe.com/products/acrobat/access_onlinetools.html</a></p>
<p>I am also curious as to why you stated that the PDF files cannot be converted via a parser. Does it have anything to do with the fact that some of the PDFs are optical scans from books?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Denis</title>
		<link>http://archive.mises.org/9526/help-us-with-the-site/comment-page-1/#comment-506664</link>
		<dc:creator>Denis</dc:creator>
		<pubDate>Mon, 02 Mar 2009 03:32:26 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mises.org/archives/009526.asp#comment-506664</guid>
		<description><![CDATA[Why can&#039;t this be automated? There are quite a few tools and classes out there that let you translate PDF to HTML. Surely there must be one that is compatible with an ASP server somewhere, no?]]></description>
		<content:encoded><![CDATA[<p>Why can&#8217;t this be automated? There are quite a few tools and classes out there that let you translate PDF to HTML. Surely there must be one that is compatible with an ASP server somewhere, no?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
