ROME

ROME2 2nd Proposal (July 18th 2006) CURRENT

It has been 2 years since ROME started and along the way we've fixed, improved, enhanced and changed the API and the implementation. We've had to address things we did not think at first, we have to add support for specifications that were not around when ROME started, doing that has not being a hard task. The best indicator we've done a fair job with ROME is adoption.

Some cracks are starting to appear, mistakes in the design (such as overloading the use of DCModule), a bit of implementation over-engineering (such as the home grown bean-introspector and the smart ObjectBean & co classes), a more complex than needed API (the existence of 2 abstraction levels for feed, Synd and Wire beans).

This proposal attempts to address the problems and limitations we currently have with ROME.

Backwards Compatibility, Support and Upgrade

ROME2 will change the API breaking backwards compatibility (after all we are an Open Source Project and that is what they do best).

We will maintain ROME 1.0 (bugfixing only) for 1 year to allow a smooth transition for all ROME users. We will also prepare migration/upgrade tips (based on our own experience) for the ROME community.

Leveraging New Language Features

ROME2 will make use of Generics to type collections in its beans. It will also use the enum construct when applicable.

Minimum Number of Dependencies

ROME2 will implement its core competency (Atom and RSS parsing, generation and manipulation), it will leverage other components as much as possible but it will, as ROME, be as lean as possible not only in its code but in its dependencies (keeping them down to a reasonable minimum).

One Abstraction Level, 2 Models

ROME2 will not have an abstract representation of feeds (the Synd beans). It will have 2 distinct models, Atom and RSS, both of them first citizens.

ROME2 users will use the model that fits more their needs.

Conversion (automatic and programmatic) between the 2 models will be part of ROME2.

Bean Interfaces

After some discussions on the first ROME2 proposal we are going back to the interface model. The self proxy pattern proposed in the first proposal (to be able use arbitrary implementations) adds a bit of complexity into understanding what is going one plus it does not address in a clean way the needs for persistency (such as Hibernate or JPA).

All ROME2 beans will interfaces. Different from ROME, the implementation classes for the ROME2 beans will not be part of the public API. Instead using directly constructors to create ROME2 beans, a factory pattern will be used. A Convenience class, Rome, will wrap the factory class providing a concise way of creating a ROME2 bean.

For example, creating ROME2 beans using the Rome class convenience class will be something like:

Feed feed = Rome.create(Feed.class);
Entry entry1 = Rome.create(Entry.class));
Entry entry2 = Rome.create(Entry);
feed.getEntries().add(entry1);
feed.getEntries().add(entry2);

Which is equivalent to (using the Rome Bean factory class):

Feed feed = BeanFactory.getFactory().create(Feed.class);
Entry entry1 = BeanFactory.getFactory().create(Entry.class));
Entry entry2 = BeanFactory.getFactory().create(Entry);
feed.getEntries().add(entry1);
feed.getEntries().add(entry2);

To provide an alternate implementation beans an alternate set of beans will have to be provided. In addition, if necessary, an alternate bean factory could be used.

To write an alternate implementation the bean classes will have to implement the ROME2 interface beans.

Using multiple Bean implementations simultaneously

ROME2 will support the use of multiple implementation beans simultaneously by changing the bean factory at anytime.

A use case could be retrieving a feed from a persistent store (which uses its own implementation beans), retrieving a feed from the internet (using the default implementation beans), merging their entries and producing an output feed, into the persistent store or out to the internet.

Bean Factory Scopes

The bean factory will support a default factory and a context factory. The default factory is used if no context factory is available. The context factory uses a InheritableThreadLocal to store the factory in context, this is useful for use in dependency injection containers (Servlet, Spring, etc).

Collection Elements

All collection properties (such as the authors, categories, contributors, links and entries of an Atom feed bean) will only have getter methods in the ROME2 API beans, no setter methods. For example:

public class Feed {
  public List<Person> getAuthors() { ... }
  public List<Category> getCategories() { ... }
  ...
}

This will ensure that the implementation bean has control on the implementation of the collection being used. This is particularly important to enable alternate implementation beans to fetch data from a repository as the collection is iterated over.

Modules

Feeds, both RSS and Atom, are extensible via namespaced elements at feed/channel and entry/item level, these namespace elements are known as modules. Beans supporting modules (feed, channel, entry and item) will all have the following method to support modules.

public Map<String,Module> getModules() { ... }

Because modules are uniquely identified by their URI, a Map will be used, the key will be the URI of the module and the value the module itself. As with other ROME2 API bean collection properties, the getModules() property has a getter only, the Map implementation is controlled by the implementation bean.

Module URIs in ROME2, as all the links URLs in the beans, will be Strings thus reducing object creation explosion a bit. For cases that some URI manipulation is required, the JDK URI class should be used.

Dynamic Modules Support

ROME2 modules design has to ensure it provides simple and comprehensive support for dynamic modules such as SLE and GData.

Unknown Modules

A special Module subclass, UnknownModule, will serve as placeholder for module data not explicitly processed by ROME2 available parsers/generators. This will allow to consume and re-export a feed without losing unknown (to ROME2) modules. Each unknown module will be keyed off with its URI in the map of the associated feed, channel, entry or item.

The data of the unknown module will be available as a String. ROME2 users should not use =UnknownModule='s data directly, if they need to manipulate that data they should find or implement bean/parser/generator for the module they need to manipulate.

Because unknown modules brings some extra overhead in the parsing and generating process ROME2 will have a property to enable/disable processing of unknown modules.

The xml:lang Attributes

All ROME2 API beans will have xml:lang attributes, String and enum and primitive types properties won't.

The xml:base Attributes

The xml:base attribute will not be present in any ROME2 API bean.

It will be the responsibility of the parsers to resolve any relative URL present in the feed at parsing time. Similarly, generators may relativize URLs as a size optimization (for God's sake we are doing XML).

Object Class Methods in ROME2 Beans

The equals() and hashCode() methods will not be overridden. As all ROME2 beans are mutable, it is not safe to use them as keys of hash structures.

Cloning will not be directly defined by ROME2 interface beans, instead they will have a copyFrom() method which is type safe and does a deep copy. Cloning a ROME2 bean will be a two step process, for example:

Feed feed1 = Rome.create(Feed.class);
...
Feed feed2 = Rome.create(Feed.class);
feed2.copyFrom(feed1);

The toString() method in the beans will print the property that most likely identifies the bean plus the class name of the implementation bean. For example for a Feed bean the output of toString() would be:

[xxx.rome2.impl.pojo.atom.FeedBean - http://foo.com/atom.xml]

--++ Plugins ClassLoading

Bean implementations, parsers, generators, converters and all pluggable ROME2 components will be loaded using the same classLoader ROME2 core classes are loaded from.

This is to keep simple the handling of them via singletons and their instantiation. Supporting different classLoaders would require complex handling for instantiation, to avoid missing classes and class mismatches.

NOTE: As an example of how easy things can go sour, imagine ROME2 core classes in the common path of a web-container and 2 web-apps adding their own beans/parsers/generators/modules, the singletons managing them are in ROME2 core, managed by the common classLoader, a simple singleton won't cut it. ROME2 core will be small enough that it will not be a memory consumption issue if it is once in each web-app.

Parsers and Generators

To support large feeds (several megabytes or even gigabytes) ROME2 will support stream parsing. We have to see if how this could be done leveraging Abdera, else using StAX API directly.

Using XML stream parsers/generators is more complicated than using a DOM ones, to make easier for developers to implement parsers and generators ROME2 will take care of the details of the XML streaming API and it will expose feed fragments (the feed header and entries) via the JDom API, as JDom Elements. For example, the streaming version of an Atom parser and generator would be something like:

public interface AtomParser {
    // A JDom Document with just the root element, its attributes and namespaces in it.
    boolean canParseFeed(Document jdomDoc);

    // repeatable operation, returns was has been read from the header so far
    // feed parameter has whatever is has been readed from the feed header so far.
    // If the given feed parameter is not null data from the jdom element is injected in it.
    Feed parseFeed(Element jdomFeedElement, Feed feed) throws FeedException;


    // the jdomFeedElement allows the parser to get context (such as base URL, namespaces)
    // for the entry being parsed.
    Entry parseEntry(Element jdomFeedElement, Element jdomEntryElement) throws FeedException;
  }

  public interface AtomGenerator {
    String getFeedType();

    // if called must be called once and before a generateEntry(Entry)
    Element generateFeed(Feed feed) throws FeedException;

    Element generateEntry(Entry entry) throws FeedException;
  }

For Modules the parser and generator interface would be something like:

public interface ModuleParser<M extends Module> {
    String getUri();

    M parseModule(Element jdomFeedElement);

    M parseModule(Element jdomFeedElement, Element jdomEntryElement);
  }
   public interface ModuleGenerator<M extends Module> {
    String getUri();

    Element generateModule(M module);
  }

Parsers and Generators for modules will follow the same principle.

As with ROME, in ROME2 Parsers will be as lenient as possible. Generators will be strict.

ROME2 IO classes

Feed parsers and generators will not be directly accessed by the ROME2 user, they are used by ROME2 to expose a more convenient API in the form of streaming API and builder API. For example:

public class RomeIO {

    // streaming API

    public static AtomReader createAtomReader(Reader reader, boolean xmlHealing) { };
    public static RssReader createRssReader(Reader reader, boolean xmlHealing) { };

    public static AtomWriter createAtomWriter(Writer writer, String feedType) { };
    public static RssWriter createRssWriter(Writer writer, String feedType) { };


    // builder API

    public Feed parseAsFeed(Reader reader) { };
    public Channel parseAsChannel(Reader reader) { };

    public void generate(Writer writer, Feed feed, String feedType) { };
    public void generate(Writer writer, Channel channel, String feedType) { };
  }
   public interface AtomReader {

    String getFeedType();

    // repeatable read, returns the current parsed state of the feed header
    Feed readFeed() throws FeedException;

    // returns a feed entry while there are more, NULL when done
    Entry readEntry() throws FeedException;

    // closes the feed reader
    void close() throws FeedException;
  }
   public interface AtomWriter {

    String getFeedType();

    // if called must be called once and before a write(Entry)
    void writeFeed(Feed feed) throws FeedException;

    // if the first write is for an Entry, then the output is an item document
    void writeEntry(Entry entry) throws FeedException;

    void close() throws FeedException;
  }

Feed Validators

A bean FeedValidator class will verify that a feed or entry bean is valid for a given feed type. Feed generators would use this class to ensure feed correctness.

Feed Conversion

Conversion from Atom to RSS beans and vice versa will be done by an implementation of the FeedConverter interface. Conversion will be supported at both feed/channel and entry/item level. All properties, including modules data will be copied, after an conversion the state of the beans will be completely independent.

The converter implementation will be pluggable, the implementation will be obtained from the bean factory and the Rome convenience class.

Conversion from Atom beans to RSS beans and vice versa will be done by a FeedConverter class that would have the following signature:

public interface FeedConverter {
    public Feed convertToFeed(Channel channel);
    public Feed convertToFeed(Channel channel, boolean processItems);
    public Entry convertToEntry(Item item);

    public Channel convertToChannel(Feed feed);
    public Channel convertToChannel(Feed feed, boolean processEntries);
    public Item convertToItem(Entry entry);
}

Parser and Generator Filters

Manipulation of feeds during parsing and generation at feed/channel and entry/item level would be possible by implementing readers and writers wrappers that work on top of the original reader and writer instances.

This filtering/wrapping could be automated via configuration.

Sources

The following ZIP file only includes the ROME2 beans as described in this proposal, it does not include any of the parser, generator or IO API.

rome2proto2.zip

Project Documentation