<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!-- Generated by Apache Maven Doxia Site Renderer 1.4 at 2013-10-04 -->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>ROME - ROME Feature Requests</title>
<style type="text/css" media="all">
@import url("../css/maven-base.css");
@import url("../css/maven-theme.css");
@import url("../css/site.css");
</style>
<link rel="stylesheet" href="../css/print.css" type="text/css" media="print" />
<meta name="author" content="mkurz" />
<meta name="Date-Creation-yyyymmdd" content="20110815" />
<meta name="Date-Revision-yyyymmdd" content="20131004" />
<meta http-equiv="Content-Language" content="en" />
</head>
<body class="composite">
<div id="banner">
<a href="http://github.com/rometools/" id="bannerLeft">
<img src="../images/romelogo.png" alt="ROME" />
</a>
<div class="clear">
<hr/>
</div>
</div>
<div id="breadcrumbs">
<div class="xright">
<span id="publishDate">Last Published: 2013-10-04</span>
&nbsp;| <span id="projectVersion">Version: 2.0.0-SNAPSHOT</span>
</div>
<div class="clear">
<hr/>
</div>
</div>
<div id="leftColumn">
<div id="navcolumn">
<h5>Rome</h5>
<ul>
<li class="none">
<a href="../index.html" title="Overview">Overview</a>
</li>
<li class="collapsed">
<a href="../HowRomeWorks/index.html" title="How Rome Works">How Rome Works</a>
</li>
<li class="none">
<a href="../RssAndAtOMUtilitiEsROMEV0.5AndAboveTutorialsAndArticles/index.html" title="Tutorials And Articles">Tutorials And Articles</a>
</li>
<li class="collapsed">
<a href="../ROMEReleases/index.html" title="Releases">Releases</a>
</li>
<li class="none">
<a href="../ROMEDevelopmentProposals/index.html" title="ROME Development Proposals">ROME Development Proposals</a>
</li>
</ul>
<h5>Project Documentation</h5>
<ul>
<li class="collapsed">
<a href="../project-info.html" title="Project Information">Project Information</a>
</li>
<li class="collapsed">
<a href="../project-reports.html" title="Project Reports">Project Reports</a>
</li>
</ul>
<a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy">
<img class="poweredBy" alt="Built by Maven" src="../images/logos/maven-feather.png" />
</a>
</div>
</div>
<div id="bodyColumn">
<div id="contentBox">
<div class="section">
<h2>ROME Feature Requests<a name="ROME_Feature_Requests"></a></h2>
<ul>
<li><b>BUG:</b> com.sun.syndication.io.impl.DateParser: <tt>Date parseW3CDateTime(String)</tt> incorrectly uses a comma (&quot;,&quot;) rather than a decimal point (&quot;.&quot;) to delimit the seconds from the milliseconds. The correct format can be found at <a class="externalLink" href="http://www.w3.org/TR/NOTE-datetime">http://www.w3.org/TR/NOTE-datetime</a>. The bug is on line 170 (version 0.8). The fix is to replace the line with this: <tt>int secFraction = pre.indexOf(&quot;.&quot;);</tt> -- JLP 9/4/2006</li>
<li><b>BUG:</b> Atom 1.0 parsing uses the wrong Content types. &quot;text&quot;, &quot;html&quot; and &quot;xhtml&quot; do not match what is parsed from the content elements. Consequently, the content elements always have a null value, and there is no way to get the content.</li>
<li><b>BUG:</b> Link in description is not parsed<br />Try parsing <a class="externalLink" href="http://jakarta.apache.org/site/rss.xml">http://jakarta.apache.org/site/rss.xml</a> and look at the entry <a class="externalLink" href="http://jakarta.apache.org/site/news/news-2006-q1.html#20060107.1">http://jakarta.apache.org/site/news/news-2006-q1.html#20060107.1</a>. This entry has an &quot;&lt;a href...&quot; in the first line, which isn't parsed by Rome -- Main.iterson - 25 Jan 2006</li>
<li><b>BUG:</b> Support all encodings<br />When reading RSS, a space around the equals sign of the encoding attribute, or a value in single quotes ('') instead of double quotes (&quot;&quot;), causes an error. For example, encoding=&quot;windows-1255&quot; works, but encoding = &quot;windows-1255&quot; and encoding='windows-1255' do not.</li>
<li><b>BUG:</b> The reader does not use the masks defined in rome.properties in every date-parsing method; e.g. RSS093Parser.parseItem uses DateParser.parseRFC822, which is not covered by that logic -- Main.den_st - 17 Jan 2006. If it used the masks, the code would work. I had a problem reading a date, so I defined a mask in the properties file (datetime.extra.masks=yyyy-MM-dd'T'HH:mm:ss, trying to read 2005-09-22T09:00:41). I then changed one of the default masks at runtime to the mask I had defined in the properties file, and it worked. The code tries to parse the date with each of the default masks; if they all fail, it returns null instead of trying the masks defined in the rome.properties file.</li>
<li>Support for writing to OutputStreams. If I want to compress the feeds to a (.gz) file or write to a socket, I have to extend SyndFeedOutput and WireFeedOutput to add a method called output(SyndFeed, OutputStream). It would be nice to have that built in instead. -- Main.agherna - 08 Aug 2005</li>
<li>I'd like the getDate() method on feeds and entries to go to the associated modules and retrieve the appropriate dc:date when getDate() would otherwise return null. This way, entries from feeds like this one: <a class="externalLink" href="http://www.magpiebrain.com/index.xml">http://www.magpiebrain.com/index.xml</a> would have valid dates without requiring me to write code to work out what format the feed is in and act accordingly.</li>
<li>Would like to see an <a href="../../opml/index.html">OPML</a> parser also.<br /><tt>This is already supported by Rome</tt>. The RSS 2.0 parser (see <a href="../RssAndAtOMUtilitiEsROMEV0.5AndAboveTutorialsAndArticles/FeedsDateElementsMappingToSyndFeedAndSyndEntry.html">Date Elements mapping</a>) does not process Modules by default. Refer to the Modules Plugins documentation to see how to enable this.</li>
<li><a class="externalLink" href="http://bobwyman.pubsub.com/main/2004/09/implementations.html">RFC3229</a> support (in <a href="../../fetcher/index.html">RomeFetcher</a> and example code implementing it for production) would be a killer feature.</li>
<li>The RSS 1.0 Spec <a class="externalLink" href="http://web.resource.org/rss/1.0/spec">http://web.resource.org/rss/1.0/spec</a> indicates that the <b>suggested</b> maximum length for a description field on an entry is 500 characters, but the 0.4 codebase enforces 500 characters as a hard limit -- exceeding it on input or output generates a FeedException. Since one doesn't always have control over the feeds one consumes, it seems to me that it would be a good idea if Rome were more forgiving in accepting feed entries that exceed the suggested lengths.</li>
<li>Is there a chance to include an option in Rome for liberal parsing, i.e. trying to get the most out of a feed even when it is non-conforming, without throwing exceptions? I believe RSS is close to HTML, not from a technical point of view but in practical use: RSS feeds will be incorrect in many cases, yet they could still join the party with a tolerant parser. Maybe Rome could do for Java what <a class="externalLink" href="http://diveintomark.org/projects/feed_parser/">Mark Pilgrim</a> has done for Python (although I did not verify his ultraliberal parser's tolerance)?</li>
<li>More liberal parsing for dates, to handle un-parseable dates like &quot;12 sep 1998&quot;, &quot;'05&quot; or monsters like this one: &quot;[2005]&quot;. I faced this problem using DCModule, where the dc:date attribute can hold values like these. In older versions of my app there were no constraints on the date format, so users wrote dates very freely.</li>
<li>I think that Rome has problems parsing rss feeds where the xml contains a link to a stylesheet. Try parsing <a class="externalLink" href="http://ihatemyflatmate.blogspot.com/atom.xml">http://ihatemyflatmate.blogspot.com/atom.xml</a> (Atom) or <a class="externalLink" href="http://msdn.microsoft.com/rss.xml">http://msdn.microsoft.com/rss.xml</a> (RSS 1.0). I get Exceptions with both, and they both have stylesheets, whereas other working feeds don't.</li>
<li>It would be very nice to be able to add a stylesheet to the generated feed. I can do this by replacing the header in the generated String, but this method is ...</li>
<li>There are problems with the correct encoding of HTML when generating RSS2.0. It is in the area of extended character sets. If you encode the following:
<div class="source">
<pre>
&lt;FONT size=&quot;2&quot;&gt;Quatre pi&amp;amp;#232;ces&lt;/FONT&gt;
</pre></div>
<p>in the hope of getting &quot;Quatre pi&#xe8;ces&quot; in your HTML feed, you get this from SyndFeedOutput.output(): </p>
<div class="source">
<pre>
&amp;amp;lt;FONT size=&quot;2&quot;&amp;amp;gt;Quatre pi&amp;amp;amp;#232;ces&amp;amp;lt;/FONT&amp;amp;gt;
</pre></div>
<p>This ends up with &quot;Quatre pi&amp;#232;ces&quot; being displayed in the RSS reader that is consuming your feed. To get the correct output I have had to resort to outputString.replaceAll(&quot;&amp;amp;#&quot;,&quot;&amp;#&quot;); OK as a workaround, but not very elegant or performant! -- Main.rjwallis - 19 Mar 2005</p></li>
<li>From what I know about Rome, it's impossible to use CDATA sections (&quot;&lt;![CDATA[&quot;) in the content of a feed's tags. I know this isn't essential, but it would be a very nice feature.</li>
<li>Provide support for lastBuildDate in RSS; many news providers, including Yahoo News and the BBC, use lastBuildDate instead of lastPublishDate.</li></ul></div>
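<div class="section">
<p>The fallback asked for in the date-mask request above (try the default masks first, then the extra masks, and only then give up with null) can be sketched in plain Java. This is only a minimal illustration and not ROME code: the class name <tt>MaskDateParser</tt> is hypothetical, the two default masks are illustrative assumptions rather than ROME's real defaults, and the extra masks are passed in directly instead of being read from rome.properties.</p>
<div class="source">
<pre>
```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not part of ROME: tries the default masks first,
// then any extra masks (for example values taken from datetime.extra.masks).
class MaskDateParser {

    // Illustrative defaults only (assumption); ROME's real masks may differ.
    static final String[] DEFAULT_MASKS = {
        "EEE, dd MMM yyyy HH:mm:ss z",
        "yyyy-MM-dd'T'HH:mm:ssXXX"
    };

    // Returns the parsed date, or null when neither the default masks
    // nor the extra masks match the text.
    static Date parse(String text, String... extraMasks) {
        Date date = tryMasks(text, DEFAULT_MASKS);
        if (date == null) {
            date = tryMasks(text, extraMasks);
        }
        return date;
    }

    // Tries each mask in order and returns the first successful parse.
    private static Date tryMasks(String text, String[] masks) {
        for (String mask : masks) {
            SimpleDateFormat format = new SimpleDateFormat(mask, Locale.US);
            // Dates without a zone are interpreted as GMT in this sketch.
            format.setTimeZone(TimeZone.getTimeZone("GMT"));
            format.setLenient(false);
            try {
                return format.parse(text.trim());
            } catch (ParseException e) {
                // This mask did not match; continue with the next one.
            }
        }
        return null;
    }
}
```
</pre></div>
<p>With these assumptions, <tt>MaskDateParser.parse(&quot;2005-09-22T09:00:41&quot;, &quot;yyyy-MM-dd'T'HH:mm:ss&quot;)</tt> falls through both illustrative defaults and succeeds on the extra mask, while an input such as &quot;[2005]&quot; still yields null. More liberal date handling would need additional masks or a different strategy.</p>
</div>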
</div>
</div>
<div class="clear">
<hr/>
</div>
<div id="footer">
<div class="xright">
Copyright &#169; 2004-2013
<a href="http://www.rometools.org">ROME Project</a>.
All Rights Reserved.
</div>
<div class="clear">
<hr/>
</div>
</div>
</body>
</html>