Woonsan on Open Source Software

Tuesday, August 30, 2016

Can't we store huge amount of binary data in JCR?

Can't we store a huge amount of binary data in JCR? If you, as a software architect, have ever faced a question like this (e.g., a requirement to store a huge amount of binary data such as PDF files in JCR), you may have spent a moment sketching candidate solutions. What is technically feasible and what is not? What best satisfies the different quality attributes (scalability, performance, security, etc.) with acceptable trade-offs? And what is cost-effective and what is not?

Surprisingly, many people have avoided JCR storage for binary data when the amount is going to be really huge. Instead, in many cases, they have implemented a custom (UI) module that stores binary data directly in a different storage system, such as SFTP, S3 or WebDAV, through backend-specific APIs.

It makes some sense to separate the binary data store if the amount is going to be really huge. Otherwise, the database used by JCR can grow too large, which makes it harder and more costly to maintain, back up, restore and deploy as time goes by. Also, if your application needs to serve the binary data in a highly scalable way, that is more difficult when everything is kept in a single database than when the binary data store lives somewhere else.

But this custom (UI) module approach has a big disadvantage. If you store a PDF file through a custom (UI) module, you can no longer search its content through the standard JCR Query API, because JCR (Jackrabbit) is never involved in storing, indexing or retrieving the binary data. If you stored the data through the JCR API instead, Apache Jackrabbit would index your binary node automatically and you could search the content very easily. Being unable to search PDF documents through the standard JCR API can be a big disappointment.

Let's face the initial question again: Can't we store a huge amount of binary data in JCR?
Actually... yes, we can. We can store a huge amount of binary data through JCR in a standard way if we choose the right Apache Jackrabbit DataStore for a different backend such as SFTP, WebDAV or S3. Apache Jackrabbit was designed so that a different DataStore can be plugged in, and it provides various DataStore components for various backends. As of Apache Jackrabbit 2.13.2 (released on August 29, 2016), it even supports an Apache Commons VFS based DataStore component, which makes it possible to use SFTP and WebDAV as backend storage. That's what I'm going to talk about here.

DataStore Component in Apache Jackrabbit

Before jumping into the details, let me first explain what the DataStore was designed for in Apache Jackrabbit. Basically, the Apache Jackrabbit DataStore was designed to store large binaries efficiently, for better performance and lower disk usage. Normally all node and property data is stored through the PersistenceManager, but relatively large binaries such as PDF files are stored separately through the DataStore component.

DataStore enables:

Fast copy (only the identifier is stored by PersistenceManager, in database for example),
No blocking in storing and reading,
Immutable objects in DataStore,
Hot backup support, and
All cluster nodes using the same DataStore.

Please see https://wiki.apache.org/jackrabbit/DataStore for more details. In particular, note that a binary data entry in the DataStore is immutable: it cannot be changed after creation. This makes it a lot easier to support caching, hot backup/restore and clustering. Binary data items that are no longer used are deleted when the Jackrabbit DataStore garbage collector runs.
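Immutability pairs naturally with content addressing: if an entry's identifier is derived from a digest of its bytes, identical content maps to a single record, and a "fast copy" only has to copy the identifier. The following is a minimal in-memory sketch of that idea, not Jackrabbit's code. The class and method names are hypothetical, and SHA-256 is an assumption made for illustration (the digest algorithm Jackrabbit actually uses depends on its version).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory sketch of a content-addressed, write-once binary store. */
public class ContentAddressedStoreSketch {

    private final Map<String, byte[]> records = new HashMap<>();

    /** Derives the identifier from the content itself (SHA-256 is an assumption). */
    public static String identify(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    /** Stores content once; storing identical bytes again returns the same identifier. */
    public String add(byte[] content) {
        String id = identify(content);
        // Records are never overwritten: the same id always means the same bytes.
        records.putIfAbsent(id, content.clone());
        return id;
    }

    /** Returns a defensive copy so callers cannot mutate the stored record. */
    public byte[] get(String id) {
        byte[] content = records.get(id);
        return content == null ? null : content.clone();
    }

    public int size() {
        return records.size();
    }

    public static void main(String[] args) {
        ContentAddressedStoreSketch store = new ContentAddressedStoreSketch();
        String a = store.add("same PDF bytes".getBytes(StandardCharsets.UTF_8));
        // "Copying" a node only needs the identifier, not another copy of the bytes.
        String b = store.add("same PDF bytes".getBytes(StandardCharsets.UTF_8));
        System.out.println(a.equals(b) + " " + store.size()); // prints "true 1"
    }
}
```

Because a record can never change under its identifier, caches and backups of it can never go stale, which is the property the paragraph above relies on.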

Apache Jackrabbit has several DataStore implementations:

FileDataStore uses a local file system, DbDataStore uses a relational database, and S3DataStore uses Amazon S3 as its backend. Very interestingly, VFSDataStore uses a virtual file system provided by the Apache Commons VFS module.

FileDataStore cannot be used unless you have a stable file system shared between cluster nodes. DbDataStore has been the default in Hippo Repository because it works well in a clustered environment, unless the amount of binary data grows extremely large. S3DataStore and VFSDataStore look more interesting because they store binary data in external storage. With either of them, binary data is still handled by Jackrabbit through the standard JCR APIs, so Jackrabbit has a chance to index binary data such as PDF files. Jackrabbit invokes S3DataStore or VFSDataStore to store or retrieve binary data, and the DataStore component invokes its internal Backend component (S3Backend or VFSBackend) to write to and read from the backend storage.

One important thing to note is that both S3DataStore and VFSDataStore extend Apache Jackrabbit's CachingDataStore. This gives a big performance benefit, because a CachingDataStore caches binary data entries on the local file system to avoid unnecessary communication with the backend.

When Jackrabbit needs to retrieve a binary data entry, it invokes the DataStore (a CachingDataStore such as S3DataStore or VFSDataStore, in this case) with an identifier. The CachingDataStore first checks whether the binary data entry already exists in its LocalCache. If not, it invokes its Backend (such as S3Backend or VFSBackend) to read the data from the backend storage, such as S3, SFTP or WebDAV. While reading the data entry, it also stores the entry in the LocalCache and serves the data back to Jackrabbit. The CachingDataStore keeps this LRU cache, the LocalCache, at up to 64GB by default in a local folder, which can be changed in the configuration. Therefore, it should perform very well when a binary data entry is requested multiple times, because the entry is most likely served from the local file cache. Serving binary data from a locally cached file is probably much faster than serving it through DbDataStore, since DbDataStore neither extends CachingDataStore nor has any local file cache concept (yet).
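The read path described above is a classic read-through cache with LRU eviction. Here is a minimal sketch of it in plain Java. Backend and ReadThroughCacheSketch are hypothetical names, and unlike the real CachingDataStore (which caches files on local disk), this version keeps bytes in a size-bounded LinkedHashMap in access order. Because DataStore entries are immutable, the cache only ever needs eviction, never invalidation.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a read-through, size-bounded LRU cache in front of a slow backend. */
public class ReadThroughCacheSketch {

    /** Hypothetical stand-in for a remote backend such as S3 or SFTP. */
    public interface Backend {
        byte[] read(String identifier);
    }

    private final Backend backend;
    private final long maxBytes;
    // accessOrder = true: iteration runs from least recently to most recently used entry.
    private final LinkedHashMap<String, byte[]> cache = new LinkedHashMap<>(16, 0.75f, true);
    private long currentBytes;
    private int backendReads;

    public ReadThroughCacheSketch(Backend backend, long maxBytes) {
        this.backend = backend;
        this.maxBytes = maxBytes;
    }

    public byte[] get(String identifier) {
        byte[] data = cache.get(identifier); // a hit also marks the entry as recently used
        if (data != null) {
            return data;
        }
        data = backend.read(identifier); // miss: read from the backend storage
        backendReads++;
        if (data == null) {
            return null; // unknown identifier
        }
        cache.put(identifier, data); // keep it locally for subsequent reads
        currentBytes += data.length;
        evictIfNeeded();
        return data;
    }

    /** Evicts least recently used entries until the cache fits into maxBytes again. */
    private void evictIfNeeded() {
        Iterator<Map.Entry<String, byte[]>> it = cache.entrySet().iterator();
        while (currentBytes > maxBytes && it.hasNext()) {
            Map.Entry<String, byte[]> eldest = it.next();
            currentBytes -= eldest.getValue().length;
            it.remove();
        }
    }

    public int backendReads() {
        return backendReads;
    }

    public boolean isCached(String identifier) {
        return cache.containsKey(identifier); // containsKey does not change the LRU order
    }

    public static void main(String[] args) {
        Map<String, byte[]> remote = new HashMap<>();
        remote.put("pdf-1", new byte[600]);
        ReadThroughCacheSketch ds = new ReadThroughCacheSketch(remote::get, 1024);
        ds.get("pdf-1");
        ds.get("pdf-1"); // served from the cache
        System.out.println(ds.backendReads()); // prints 1
    }
}
```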

Using VFSDataStore in a Hippo CMS Project

To use VFSDataStore, you need the following property in the root pom.xml:

  <properties>

    <!--***START temporary override of versions*** -->
    <!-- ***END temporary override of versions*** -->
    <com.jcraft.jsch.version>0.1.53</com.jcraft.jsch.version>

    <!-- SNIP -->

  </properties>

VFSDataStore is available since Apache Jackrabbit 2.13.2. You also need to add the following dependencies in cms/pom.xml:

    <!-- Adding jackrabbit-vfs-ext -->
    <dependency>
      <groupId>org.apache.jackrabbit</groupId>
      <artifactId>jackrabbit-vfs-ext</artifactId>
      <version>${jackrabbit.version}</version>
      <scope>runtime</scope>
      <!--
        Exclude jackrabbit-api and jackrabbit-jcr-commons since those were pulled
        in by Hippo Repository modules.
      -->
      <exclusions>
        <exclusion>
          <groupId>org.apache.jackrabbit</groupId>
          <artifactId>jackrabbit-api</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.apache.jackrabbit</groupId>
          <artifactId>jackrabbit-jcr-commons</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

    <!-- Required to use SFTP VFS2 File System -->
    <dependency>
      <groupId>com.jcraft</groupId>
      <artifactId>jsch</artifactId>
      <version>${com.jcraft.jsch.version}</version>
    </dependency>

Next, configure VFSDataStore in conf/repository.xml as in the following example:

<Repository>

  <!-- SNIP -->

  <DataStore class="org.apache.jackrabbit.vfs.ext.ds.VFSDataStore">
    <param name="config" value="${catalina.base}/conf/vfs2.properties" />
    <!-- VFSDataStore specific parameters -->
    <param name="asyncWritePoolSize" value="10" />
    <!--
      CachingDataStore specific parameters:
        - secret : key to generate a secure reference to a binary.
    -->
    <param name="secret" value="123456789"/>
    <!--
      Other important CachingDataStore parameters with default values, just for information:
        - path : local cache directory path. ${rep.home}/repository/datastore by default.
        - cacheSize : The number of bytes in the cache. 64GB by default.
        - minRecordLength : The minimum size of an object that should be stored in this data store. 16KB by default.
        - recLengthCacheSize : In-memory cache size for DataRecord#getLength() values keyed by DataIdentifier. Approximately 140 bytes per item.
    -->
    <param name="minRecordLength" value="1024"/>
    <param name="recLengthCacheSize" value="10000" />
  </DataStore>

  <!-- SNIP -->

</Repository>
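To get a feel for the numbers in the VFSDataStore example above: recLengthCacheSize counts items, so with the rough figure of 140 bytes per item from the configuration comment, 10000 items cost on the order of a megabyte of heap. Lowering minRecordLength from the 16KB default to 1024 bytes means binaries between 1KB and 16KB are also kept in the DataStore (and therefore in the SFTP backend). A back-of-the-envelope estimate, assuming 16KB means 16384 bytes:

```latex
\begin{aligned}
\text{recLengthCacheSize memory} &\approx 10\,000 \times 140\ \text{B} = 1\,400\,000\ \text{B} \approx 1.34\ \text{MiB} \\
\text{minRecordLength} &= 1024\ \text{B} = 1\ \text{KiB} \quad (\text{default: } 16\ \text{KiB} = 16\,384\ \text{B})
\end{aligned}
```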

The VFS connectivity is configured in ${catalina.base}/conf/vfs2.properties, for instance:

baseFolderUri = sftp://tester:secret@localhost/vfsds

So, in this specific example, VFSDataStore ends up storing and reading binary data in SFTP backend storage, as configured in the properties file.
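Since the SFTP user name and password are embedded in baseFolderUri, it is worth knowing how that single property breaks down, and avoiding printing it verbatim in logs. The following JDK-only sketch reads the property with java.util.Properties and parses it with java.net.URI. VfsConfigSketch and its methods are hypothetical helpers, not part of VFSDataStore, which relies on Apache Commons VFS to resolve the URI.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.Properties;

/** Hypothetical helper: reads baseFolderUri from vfs2.properties content and masks the password. */
public class VfsConfigSketch {

    public static URI baseFolderUri(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty("baseFolderUri");
        if (value == null) {
            throw new IllegalArgumentException("baseFolderUri is not set");
        }
        return URI.create(value.trim());
    }

    /** Rebuilds the URI string with the password replaced by "***", e.g. for log output. */
    public static String masked(URI uri) {
        StringBuilder sb = new StringBuilder(uri.getScheme()).append("://");
        String userInfo = uri.getUserInfo();
        if (userInfo != null) {
            int colon = userInfo.indexOf(':');
            sb.append(colon >= 0 ? userInfo.substring(0, colon) + ":***" : userInfo).append('@');
        }
        sb.append(uri.getHost());
        if (uri.getPort() != -1) {
            sb.append(':').append(uri.getPort());
        }
        if (uri.getPath() != null) {
            sb.append(uri.getPath());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        URI uri = baseFolderUri("baseFolderUri = sftp://tester:secret@localhost/vfsds\n");
        System.out.println(uri.getScheme() + " " + uri.getHost() + " " + uri.getPath());
        System.out.println(masked(uri)); // prints sftp://tester:***@localhost/vfsds
    }
}
```

Because the password sits in the file in plain text, the vfs2.properties file deserves the same access restrictions as any other credential file.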

For more detailed information, examples and other backend usages such as WebDAV through VFSBackend, please visit my demo project:

https://github.com/woonsanko/hippo-davstore-demo

Note: Hippo CMS 10.x and 11.0 pull in Apache Jackrabbit 2.10.x modules at the moment. However, there have not been any significant or incompatible changes in org.apache.jackrabbit:jackrabbit-data and org.apache.jackrabbit:jackrabbit-vfs-ext between Apache Jackrabbit 2.10.x and 2.13.x. Therefore, pulling in the org.apache.jackrabbit:jackrabbit-vfs-ext:jar:2.13.x dependency in cms/pom.xml as in the preceding example seems to cause no problem at the moment. Ideally, though, all Apache Jackrabbit module versions should be aligned some day soon.
Update: Note that Hippo CMS 12.x pulls in Apache Jackrabbit 2.14.0+. Therefore, you can simply use ${jackrabbit.version} for the dependencies mentioned in this article.

Configuration for S3DataStore

In case you want to use S3DataStore instead, you need the following dependency:

    <!-- Adding jackrabbit-aws-ext -->
    <dependency>
      <groupId>org.apache.jackrabbit</groupId>
      <artifactId>jackrabbit-aws-ext</artifactId>
      <!-- ${jackrabbit.version} or a specific version like 2.14.0-h2. -->
      <version>${jackrabbit.version}</version>
      <scope>runtime</scope>
      <!--
        Exclude jackrabbit-api and jackrabbit-jcr-commons since those were pulled
        in by Hippo Repository modules.
      -->
      <exclusions>
        <exclusion>
          <groupId>org.apache.jackrabbit</groupId>
          <artifactId>jackrabbit-api</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.apache.jackrabbit</groupId>
          <artifactId>jackrabbit-jcr-commons</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

    <!-- Consider using the latest AWS Java SDK for latest bug fixes. -->
    <dependency>
      <groupId>com.amazonaws</groupId>
      <artifactId>aws-java-sdk-s3</artifactId>
      <version>1.11.95</version>
    </dependency>

Next, configure S3DataStore in conf/repository.xml as in the following example (excerpt from https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-aws-ext/src/test/resources/repository_sample.xml):

<Repository>

  <!-- SNIP -->

  <DataStore class="org.apache.jackrabbit.aws.ext.ds.S3DataStore">
    <param name="config" value="${catalina.base}/conf/aws.properties"/>
    <param name="secret" value="123456789"/>
    <param name="minRecordLength" value="16384"/>
    <param name="cacheSize" value="68719476736"/>
    <param name="cachePurgeTrigFactor" value="0.95d"/>
    <param name="cachePurgeResizeFactor" value="0.85d"/>
    <param name="continueOnAsyncUploadFailure" value="false"/>
    <param name="concurrentUploadsThreads" value="10"/>
    <param name="asyncUploadLimit" value="100"/>
    <param name="uploadRetries" value="3"/>
  </DataStore>

  <!-- SNIP -->

</Repository>

The AWS S3 connectivity is configured in ${catalina.base}/conf/aws.properties in the above example.

You can find an example aws.properties at the following location; adjust the configuration for your environment:

https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-aws-ext/src/test/resources/aws.properties
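The cache parameters in the S3DataStore example above are raw byte counts and fractions, so a quick conversion helps when sizing the local disk. The following assumes, as the parameter names suggest, that cachePurgeTrigFactor is the fill level of cacheSize that triggers a purge and cachePurgeResizeFactor is the level the purge shrinks the cache back to:

```latex
\begin{aligned}
\text{cacheSize} &= 68\,719\,476\,736\ \text{B} = 64 \times 2^{30}\ \text{B} = 64\ \text{GiB} \\
\text{purge trigger} &= 0.95 \times 64\ \text{GiB} = 60.8\ \text{GiB} \\
\text{purge target} &= 0.85 \times 64\ \text{GiB} = 54.4\ \text{GiB}
\end{aligned}
```

In other words, the local cache directory should have somewhat more than 64 GiB of free space available with these settings.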

Comparisons with Different DataStores

DbDataStore (the default DataStore used by most Hippo CMS projects) provides simple clustering based on a centralized database, but it increases the database size, which can raise maintenance and deployment costs and make hot backup/restore relatively harder once the amount of binary data becomes really huge. Also, because DbDataStore doesn't maintain a local file cache for the immutable binary data entries, it is relatively less performant at retrieving binary data from JCR. One could argue, though, that the application should take care of caching in order not to burden JCR.

S3DataStore uses Amazon S3 as backend storage, and VFSDataStore uses a virtual file system provided by Apache Commons VFS module. They obviously help reduce the database size, so system administrators could save time and cost in maintenance or new deployments with these DataStores. They are internal plugged-in components as designed by Apache Jackrabbit, so clients can simply use standard JCR APIs to write/read binary data. More importantly, Jackrabbit is able to index the binary data such as PDF files internally to Lucene index, so clients can make standard JCR queries to retrieve data without having to implement custom code depending on specific backend APIs.

One notable difference between S3DataStore and VFSDataStore is that the former requires cloud-based storage (Amazon S3), which might not be allowed in some highly secured environments, whereas the latter supports various cost-effective backend storage options, including SFTP and WebDAV, that can be deployed wherever you want. On the other hand, S3DataStore lets you take full advantage of flexible cloud-based storage.

Summary

Apache Jackrabbit VFSDataStore can be a very feasible, cost-effective and secure option in many projects that need to host a huge amount of binary data in JCR. VFSDataStore makes it possible to use SFTP, WebDAV, etc. as backend storage at a moderate cost, deployed wherever you want. It also lets you use the standard JCR APIs to read and write binary data, which should save more development effort and time than implementing a custom (UI) plugin that communicates directly with a specific backend storage.

Other Materials

I once presented this topic to my colleagues, and I'd like to share the slides with you as well.

Hosting huge amount of binaries in JCR from Woonsan Ko

Please leave a comment if you have any questions or remarks.

Thursday, May 28, 2015

Hiding Hippo Channel Manager toolbar when unnecessary

WARNING: The solution described in this article applies only to Hippo CMS v10.x. Since v11, Hippo CMS has rewritten many parts of the Channel Manager using the Angular framework, so this solution no longer applies.

In some use cases, content editors don't want to be distracted by the toolbar when editing a page in Hippo Channel Manager. In such cases, they're fine with using Hippo Channel Manager just as a simple preview tool for the content being edited.

So, it is not surprising to hear that they want the toolbar to be hidden in a project unless the current user is really a power user like the 'admin' user.

Yes, that should be easy. I'll look for possible configuration options or ask around on how to hide the toolbar based on the user.

Well, I initially expected to find a configuration option somewhere to show the toolbar only to certain groups of users; that's why I said so. Unfortunately, there's no such option at the moment (at least up to 7.9).

Actually someone suggested that I should hack around some CSS classes to hide it, but it would be really hard to set CSS classes properly based on the group memberships of the current user. Also, it sounds really hacky and unmaintainable, which I always try to avoid.

After digging in for a while, the following article caught my eye:

Add custom button to the template composer toolbar, http://www.onehippo.org/library/development/add-custom-button-to-the-template-composer-toolbar.html

After reading that article, it took me only a few minutes to come up with the idea of adding an invisible toolbar widget that uses a little JavaScript trick to hide the whole toolbar. Right? That should be a really easy and maintainable solution!

I followed the guideline described in the article and was able to implement a solution which hides the whole toolbar unless the user is in the 'admin' group by default. Also, I even added a plugin configuration to be able to set which groups are allowed to see the toolbar.

Here's my plugin source:

// cms/src/main/java/com/example/cms/channelmanager/templatecomposer/ToolbarHidingPlugin.java

package com.example.cms.channelmanager.templatecomposer;

import java.text.MessageFormat;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

import javax.jcr.NodeIterator;
import javax.jcr.RepositoryException;
import javax.jcr.query.Query;
import javax.jcr.query.QueryResult;

import org.apache.commons.collections.CollectionUtils;
import org.apache.commons.lang.ArrayUtils;
import org.apache.commons.lang.StringUtils;
import org.apache.wicket.Component;
import org.apache.wicket.markup.head.IHeaderResponse;
import org.apache.wicket.markup.head.JavaScriptHeaderItem;
import org.apache.wicket.request.resource.JavaScriptResourceReference;
import org.hippoecm.frontend.plugin.IPluginContext;
import org.hippoecm.frontend.plugin.config.IPluginConfig;
import org.hippoecm.frontend.session.UserSession;
import org.json.JSONException;
import org.json.JSONObject;
import org.onehippo.cms7.channelmanager.templatecomposer.ToolbarPlugin;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.wicketstuff.js.ext.util.ExtClass;

/**
 * Invisible Channel Manager Page Editor toolbar widget plugin
 * in order to do some javascript trick like hiding the toolbar
 * based on user's group information.
 * <P>
 * By default, this plugin compares the group names of the current user
 * with the configured {@code groupNamesWithToolbarEnabled} group names.
 * 'admin' group is added to {@code groupNamesWithToolbarEnabled} by default.
 * If there's any common between both, this shows the toolbar.
 * Otherwise, this hides the toolbar.
 * </P>
 * @see http://www.onehippo.org/library/development/add-custom-button-to-the-template-composer-toolbar.html
 */
@ExtClass("Example.ChannelManager.ToolbarHidingPlugin")
public class ToolbarHidingPlugin extends ToolbarPlugin {

    private static final Logger log = LoggerFactory.getLogger(ToolbarHidingPlugin.class);

    /**
     * Ext.js plugin JavaScript code.
     */
    private static final JavaScriptResourceReference TOOLBAR_HIDING_PLUGIN_JS =
        new JavaScriptResourceReference(ToolbarHidingPlugin.class, "ToolbarHidingPlugin.js");

    /**
     * JCR query statement to retrieve all the group names of the current user.
     */
    private static final String GROUPS_OF_USER_QUERY =
        "//element(*, hipposys:group)[(@hipposys:members = ''{0}'' or @hipposys:members = ''*'') and @hipposys:securityprovider = ''internal'']";

    /**
     * The names of the groups which the toolbar should be enabled to.
     */
    private Set<String> groupNamesWithToolbarEnabled = new HashSet<String>();

    public ToolbarHidingPlugin(IPluginContext context, IPluginConfig config) {
        super(context, config);

        String param = config.getString("group.names.with.toolbar.enabled", "admin");
        // Trim whitespace around each name so that "admin, editor" also works.
        String[] groupNames = StringUtils.stripAll(StringUtils.split(param, ","));

        if (ArrayUtils.isNotEmpty(groupNames)) {
            groupNamesWithToolbarEnabled.addAll(Arrays.asList(groupNames));
        }
    }

    @Override
    public void renderHead(final Component component, final IHeaderResponse response) {
        super.renderHead(component, response);
        response.render(JavaScriptHeaderItem.forReference(TOOLBAR_HIDING_PLUGIN_JS));
    }

    @Override
    protected JSONObject getProperties() throws JSONException {
        JSONObject properties = super.getProperties();

        if (groupNamesWithToolbarEnabled.contains("*")) {
            properties.put("toolbarEnabled", true);
        } else {
            Set<String> groupNames = getGroupNamesOfCurrentUser();
            Collection<?> intersection = CollectionUtils.intersection(groupNames, groupNamesWithToolbarEnabled);
            properties.put("toolbarEnabled", CollectionUtils.isNotEmpty(intersection));
        }

        return properties;
    }

    private Set<String> getGroupNamesOfCurrentUser() {
        Set<String> groupNames = new HashSet<String>();

        try {
            final String username = UserSession.get().getJcrSession().getUserID();
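            // Escape single quotes so that the user ID cannot break out of the XPath string literal.
            String statement = MessageFormat.format(GROUPS_OF_USER_QUERY, username.replace("'", "''"));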

            Query q = UserSession.get().getJcrSession().getWorkspace().getQueryManager().createQuery(statement, Query.XPATH);
            QueryResult result = q.execute();
            NodeIterator nodeIt = result.getNodes();
            String groupName;

            while (nodeIt.hasNext()) {
                groupName = nodeIt.nextNode().getName();
                groupNames.add(groupName);
            }
        } catch (RepositoryException e) {
            log.error("Failed to retrieve group names of the current user.", e);
        }

        return groupNames;
    }
}

Basically, the plugin class compares the current user's group memberships with the configured names of the groups for which the toolbar should be enabled. It then simply sets a flag value in the JSON properties in the #getProperties() method. These JSON properties are eventually passed to the Ext.js class.
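The decision made in #getProperties() does not really depend on Wicket or JCR, so it can be pulled out and unit-tested on its own. Below is a framework-free sketch of that logic using only the JDK. The class and method names are hypothetical, and this version trims whitespace around each configured group name.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical, framework-free sketch of the toolbar visibility decision. */
public class ToolbarDecisionSketch {

    /** Parses a comma-separated value such as "admin, webmaster"; null falls back to "admin". */
    public static Set<String> parseAllowedGroups(String param) {
        String value = (param == null) ? "admin" : param;
        return Arrays.stream(value.split(","))
            .map(String::trim)
            .filter(name -> !name.isEmpty())
            .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    /** True if "*" is allowed or the user belongs to at least one allowed group. */
    public static boolean toolbarEnabled(Set<String> allowedGroups, Set<String> userGroups) {
        return allowedGroups.contains("*") || !Collections.disjoint(allowedGroups, userGroups);
    }

    public static void main(String[] args) {
        Set<String> allowed = parseAllowedGroups("admin, webmaster");
        Set<String> user = new LinkedHashSet<>(Arrays.asList("author", "webmaster"));
        System.out.println(toolbarEnabled(allowed, user)); // prints true
    }
}
```

Collections.disjoint expresses "no common group" directly, without building an intermediate intersection collection.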

Because Hippo Channel Manager components are mostly implemented in Ext.js as well, I need the following Ext.js class. This Ext.js class will read the flag variable passed from the plugin class and hide or show the toolbar HTML element.

// cms/src/main/resources/com/example/cms/channelmanager/templatecomposer/ToolbarHidingPlugin.js

Ext.namespace('Example.ChannelManager');

Example.ChannelManager.ToolbarHidingPlugin = Ext.extend(Ext.Container, {
  constructor: function(config) {

    // Hide the toolbar first, then show it only if the current user belongs to an allowed group.
    $('#pageEditorToolbar').hide();
    if (config.toolbarEnabled) {
      $('#pageEditorToolbar').show();
    }

    // show an empty invisible container widget.
    Example.ChannelManager.ToolbarHidingPlugin.superclass.constructor.call(this, Ext.apply(config, {
      width: 0,
      renderTo: Ext.getBody(),
      border: 0
    }));
  }
});

I used a simple jQuery trick to hide/show the toolbar (#pageEditorToolbar):

$('#pageEditorToolbar').hide();
$('#pageEditorToolbar').show();

Now, I need to bootstrap this custom toolbar plugin into the repository as follows:

<?xml version="1.0" encoding="UTF-8"?>

<!-- bootstrap/configuration/src/main/resources/configuration/frontend/hippo-channel-manager/templatecomposer-toolbar-hiding.xml -->

<sv:node sv:name="templatecomposer-toolbar-hiding" xmlns:sv="http://www.jcp.org/jcr/sv/1.0">
  <sv:property sv:name="jcr:primaryType" sv:type="Name">
    <sv:value>frontend:plugin</sv:value>
  </sv:property>
  <sv:property sv:name="plugin.class" sv:type="String">
    <sv:value>com.example.cms.channelmanager.templatecomposer.ToolbarHidingPlugin</sv:value>
  </sv:property>
  <sv:property sv:name="position.edit" sv:type="String">
    <sv:value>first</sv:value>
  </sv:property>
  <sv:property sv:name="position.view" sv:type="String">
    <sv:value>after template-composer-toolbar-pages-button</sv:value>
  </sv:property>
</sv:node>

Of course, the bootstrap XML should be added by a hippo:initializeitem in hippoecm-extension.xml like the following:

<!-- bootstrap/configuration/src/main/resources/hippoecm-extension.xml -->

    <!-- SNIP -->

    <sv:node sv:name="example-hippo-configuration-hippo-frontend-cms-hippo-channel-manager-templatecomposer-toolbar-hiding">
        <sv:property sv:name="jcr:primaryType" sv:type="Name">
            <sv:value>hippo:initializeitem</sv:value>
        </sv:property>
        <sv:property sv:name="hippo:sequence" sv:type="Double">
            <sv:value>30000.3</sv:value>
        </sv:property>
        <sv:property sv:name="hippo:contentresource" sv:type="String">
            <sv:value>configuration/frontend/hippo-channel-manager/templatecomposer-toolbar-hiding.xml</sv:value>
        </sv:property>
        <sv:property sv:name="hippo:contentroot" sv:type="String">
            <sv:value>/hippo:configuration/hippo:frontend/cms/hippo-channel-manager</sv:value>
        </sv:property>
        <sv:property sv:name="hippo:reloadonstartup" sv:type="Boolean">
            <sv:value>true</sv:value>
        </sv:property>
    </sv:node>

    <!-- SNIP -->

All right. That's it! Enjoy taming your Hippo!

Wednesday, March 19, 2014

A Generic Field Picker Plugin for Hippo CMS

Recently, I released a new forge plugin which provides a generic document field picker so that developers can easily inject their own domain-specific external document browsing functionality.

Once this External Document Picker Base plugin is installed in Hippo CMS, you can configure field(s) in document template bootstrap XML files (a.k.a. Hippo CMS "namespace"). Then, when you edit a document in the CMS UI, you will see those configured fields rendered by this plugin.
Each field must be configured with the class name of a custom, domain-specific ExternalSearchServiceFacade implementation. The plugin component then instantiates your ExternalSearchServiceFacade class and invokes it whenever it needs to select or display your domain-specific custom POJOs.
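The plugin's job here boils down to "instantiate a configured class by name, then call it". The following JDK-only sketch shows that pattern with reflection. SearchFacade and DemoProductSearchFacade are hypothetical stand-ins, not the plugin's real ExternalSearchServiceFacade interface (see the project's javadoc for that), and the forge plugin's actual loading code may well differ.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of loading a configured facade class by its fully qualified class name. */
public class FacadeLoaderSketch {

    /** Hypothetical stand-in for a domain-specific search facade; NOT the plugin's real interface. */
    public interface SearchFacade {
        List<String> search(String query);
    }

    /** Example implementation a project might configure by FQCN. */
    public static class DemoProductSearchFacade implements SearchFacade {
        @Override
        public List<String> search(String query) {
            return Collections.singletonList("product:" + query);
        }
    }

    /** Instantiates the configured class via its no-arg constructor and checks its type. */
    public static <T> T instantiate(String fqcn, Class<T> type) {
        try {
            Class<?> clazz = Class.forName(fqcn);
            if (!type.isAssignableFrom(clazz)) {
                throw new IllegalArgumentException(fqcn + " does not implement " + type.getName());
            }
            return type.cast(clazz.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + fqcn, e);
        }
    }

    public static void main(String[] args) {
        SearchFacade facade = instantiate(DemoProductSearchFacade.class.getName(), SearchFacade.class);
        System.out.println(facade.search("42")); // prints [product:42]
    }
}
```

Checking isAssignableFrom before calling the constructor turns a misconfigured class name into a clear error message instead of a ClassCastException later on.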

Here is the project homepage:

https://onehippo-forge.github.io/external-document-picker/

After adding its dependency to your project, the only thing you need to do is implement your domain-specific external document service facade (see the javadoc for details on the facade interface) and configure its FQCN in the field plugin configuration of your namespace bootstrap XML file. See the page below for details on how to implement and configure the plugin for your custom picker fields:

https://onehippo-forge.github.io/external-document-picker/field/architecture.html

~~At the moment, it supports only Hippo 7.9. I'll try to add a new branch for 7.8 if there are needs.~~
As of April 15, 2014, it supports both Hippo CMS 7.8.x and 7.9.x. Please see its release notes:

https://onehippo-forge.github.io/external-document-picker/release-notes.html

Also, as of 2.0.3, it started supporting a generic link picker in CKEditor toolbars as well. Please see the following pages for details:

This is really powerful because you can simply provide a REST service URL to the plugin with custom configurations, without having to dig into the plugin's internals.

Enjoy!

Sunday, December 30, 2012

Node.js is great! Run Reverse Proxy on your laptop!

Node.js is great! You can test your Hippo CMS project with a full-featured reverse proxy server on your local development machine SO EASILY! This lets you test your project the same way it runs on your production server. You can download the reverse proxy script here: https://github.com/woonsan/hippo7-rproxy-nodejs.

By the way, this solution is very generic and agnostic to Hippo CMS, so you can apply it to other scenarios with backends other than Hippo CMS, just by configuring the mappings in the script. See the README.md for details.
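Conceptually, such a mapping table routes each incoming request path to a backend URL. The following JDK-only sketch resolves a request path by longest matching prefix. It is not rproxy.js (whose mapping format is described in its README), and the two mappings in main() are hypothetical examples for a local Hippo CMS setup on port 8080.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of prefix-based reverse proxy routing; the mappings are examples only. */
public class ProxyRoutesSketch {

    private final Map<String, String> mappings = new LinkedHashMap<>();

    /** Maps a public path prefix (e.g. "/cms/") to a backend base URL. */
    public ProxyRoutesSketch map(String pathPrefix, String backendBase) {
        mappings.put(pathPrefix, backendBase);
        return this;
    }

    /** Resolves a request path using the longest matching prefix, or returns null if none matches. */
    public String resolve(String requestPath) {
        String bestPrefix = null;
        for (String prefix : mappings.keySet()) {
            if (requestPath.startsWith(prefix)
                    && (bestPrefix == null || prefix.length() > bestPrefix.length())) {
                bestPrefix = prefix;
            }
        }
        if (bestPrefix == null) {
            return null;
        }
        // Replace the matched public prefix with the backend base URL.
        return mappings.get(bestPrefix) + requestPath.substring(bestPrefix.length());
    }

    public static void main(String[] args) {
        ProxyRoutesSketch routes = new ProxyRoutesSketch()
            .map("/cms/", "http://localhost:8080/cms/")
            .map("/", "http://localhost:8080/site/");
        System.out.println(routes.resolve("/cms/console")); // http://localhost:8080/cms/console
        System.out.println(routes.resolve("/news/2012"));   // http://localhost:8080/site/news/2012
    }
}
```

Longest-prefix matching lets the catch-all "/" mapping hide the /site context path while more specific prefixes such as "/cms/" still win.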

Hippo CMS solutions usually consist of multiple web applications, and system administrators often deploy a reverse proxy server in front of the Java application servers for many reasons. Apache HTTP Server with mod_proxy has been one of the most popular choices for the reverse proxy node.

However, it is not so convenient to install Apache HTTP Server on a developer's computer. Sometimes they have to install compilers, make tools, etc. in order to build Apache HTTP Server!

So, I looked for an alternative solution for the convenience of developers who want to test in the same environment as the production server. The solution is Node.js!
Yes, I was able to implement a full-featured, reliable reverse proxy script with Node.js very quickly.
This is my reverse proxy script project based on Node.js:

https://github.com/woonsan/hippo7-rproxy-nodejs

How to run the reverse proxy server script

Note: You need to install Node.js in order to run Reverse Proxy Server script.
And let's suppose you run Hippo CMS 7 with Tomcat at port 8080, e.g., with `mvn -P cargo.run`.

Follow the installation instructions in https://github.com/woonsan/hippo7-rproxy-nodejs.
Move to the root folder of your Hippo CMS 7 project in the command-line console and run the following command:
```
$ sudo node rproxy.js
```
The above command will run the Reverse Proxy Server at port 80 by default. (You need super user access to open port 80. That's why you need `sudo` in this example.)
You can run it at a different port like the following example:
```
$ node rproxy.js 8888
```

Now, if you run rproxy.js at port 80, simply visit http://localhost/.

Note: Finally, DON'T FORGET to turn off '@showPort' and '@showContextPath' on the /hst:hst/hst:hosts node in your Hippo Repository! If you run rproxy.js at port 80 and want to remove the /site context path, you must turn off those properties.

OK. Now enjoy working with rproxy.js (powered by Node.js) !!

Tuesday, July 10, 2012

Converting Apache/Tomcat Access Logs to CSV

Recently, I had to analyze Apache/Tomcat access log files, so I needed to convert them into CSV in order to use other tools such as spreadsheets.
The conversion shouldn't be hard. I found some scripts (in PHP, AWK, Perl or Ruby) on the internet, but they didn't quite fit my needs. I didn't want to lose any data, such as the HTTP method or the byte size sent in the response. Also, the CSV should use a spreadsheet-friendly data format, for example "2012-07-10 22:30:03" instead of "10/Jul/2012:22:30:03".
So, I ended up writing yet another one myself. Why not? ;-)
Here's the link to the source:

https://github.com/woonsan/accesslog2csv/blob/master/accesslog2csv.pl

The script can be executed like the following:

$ perl accesslog2csv.pl access_log_file > csv_output_file.csv

Or, you can redirect STDIN like the following examples:

$ perl accesslog2csv.pl < access_log_file > csv_output_file.csv

$ cat access_log_file | perl accesslog2csv.pl > csv_output_file.csv

You can also capture invalid log lines by redirecting STDERR:

$ perl accesslog2csv.pl < access_log_file > csv_output_file.csv 2> invalid_log_lines.txt
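For readers who prefer the JVM, the core of such a conversion can be sketched in plain Java: match a Common Log Format line, reformat the timestamp into a spreadsheet-friendly form, and quote each field for CSV. This is a hypothetical sketch, not a port of accesslog2csv.pl. It assumes the Common Log Format (combined-format lines with referrer and user agent are treated as invalid), keeps the logged local time while dropping the zone offset, and writes "0" for a "-" byte count.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical Java sketch of converting Common Log Format lines to CSV rows. */
public class AccessLogToCsvSketch {

    // host ident user [timestamp] "method uri protocol" status bytes
    private static final Pattern CLF = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"(\\S+) (\\S+) ([^\"]*)\" (\\d{3}) (\\S+)$");

    private static final DateTimeFormatter LOG_TIME =
        DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);
    private static final DateTimeFormatter CSV_TIME =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Returns a CSV row, or null if the line is not in Common Log Format. */
    public static String toCsv(String line) {
        Matcher m = CLF.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String time;
        try {
            // Keeps the local time as logged; the zone offset is dropped.
            time = OffsetDateTime.parse(m.group(4), LOG_TIME).format(CSV_TIME);
        } catch (DateTimeParseException e) {
            return null;
        }
        // "-" means no body was sent; use 0 so spreadsheets treat the column as numeric.
        String bytes = "-".equals(m.group(9)) ? "0" : m.group(9);
        String[] fields = {m.group(1), m.group(2), m.group(3), time,
            m.group(5), m.group(6), m.group(7), m.group(8), bytes};
        StringJoiner row = new StringJoiner(",");
        for (String field : fields) {
            row.add("\"" + field.replace("\"", "\"\"") + "\""); // CSV quoting: double inner quotes
        }
        return row.toString();
    }

    /** Reads log lines from STDIN, writes CSV rows to STDOUT and invalid lines to STDERR. */
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line;
        while ((line = in.readLine()) != null) {
            String csv = toCsv(line);
            if (csv != null) {
                System.out.println(csv);
            } else {
                System.err.println(line);
            }
        }
    }
}
```

Like the Perl script, it can be run with redirections such as `java AccessLogToCsvSketch < access_log_file > out.csv 2> invalid_log_lines.txt`.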

Hope it helps somewhere! :-)

Generating Reports from Web Logs with AWStats

When you want to analyze web access patterns from web access logs, AWStats (http://awstats.sourceforge.net) is a handy solution. In my case, I needed to collect summary data from Tomcat access log files and build proper sample data for load testing.
Here's how to generate reports with AWStats from an access log file:

1. Prerequisites

Your system should be able to run Perl scripts. If not, get Perl now: http://www.perl.org/get.html
If you want to generate PDF report files, you need HTMLDOC installed: http://www.htmldoc.org/.
Let's assume the htmldoc command is available as `/usr/bin/htmldoc` after installation.
If you want to run AWStats on the Web, you need an Apache2 Web Server with CGI enabled. In this article, I'll skip this. See the online documentation if you're interested: http://awstats.sourceforge.net/docs/awstats_setup.html

2. Install AWStats

If you extract the compressed AWStats distribution file, you can find the `awstats_configure.pl' script under the `tools' directory. Start by running the script as in the following example.

$ perl ./awstats_configure.pl

<SNIP>

Do you want to continue setup from this NON standard directory [yN] ? y

<SNIP>

-----> Need to create a new config file ?
Do you want me to build a new AWStats config/profile file (required if first install) [y/N] ? y

-----> Define config file name to create
What is the name of your web site or profile analysis ?
Example: www.mysite.com
Example: demo
Your web site, virtual server or profile name:
> demo


<SNIP>

Press ENTER to continue... 

<SNIP>


Press ENTER to finish...

In the above example, I installed AWStats only to generate reports offline from access log files, without installing it onto an Apache Web Server, for simplicity.
At the profile name prompt, I typed 'demo' for a demo analysis task.
This run generates the configuration file for the demo profile at `../wwwroot/cgi-bin/awstats.demo.conf'.

3. Setting the configuration file

Let's open and edit the configuration file for the 'demo' analysis task.
Assume you're going to analyze a Tomcat access log file, which is in Apache Common Log Format.
Here is what you need to edit, at a minimum, in the configuration file (e.g., `../wwwroot/cgi-bin/awstats.demo.conf'):

# <SNIP>

# Set the access log file path here
LogFile="/var/log/tomcat/access.log"

# <SNIP>

# Examples for Apache combined logs (following two examples are equivalent):
# LogFormat = 1
# <SNIP/>
# For Apache Common Log Format (e.g., Tomcat access log), set it to 4.
LogFormat=4

# <SNIP>

# Set the data directory where AWStats internal data files are stored.
DirData="/var/log/data"

# <SNIP>

With the above configuration (for the task named 'demo' earlier), AWStats analyzes the log file set by the 'LogFile' directive and stores its internal data in the directory set by the 'DirData' directive.

4. Update Log Data

Now, you can run AWStats. Go to the `../wwwroot/cgi-bin/' directory and run the following command to update the data from the configured log file:

$ cd ../wwwroot/cgi-bin/
$ perl awstats.pl -config=demo -update

Create/Update database for config "./awstats.demo.conf" by AWStats version 7.0 (build 1.971)
From data in log file "/var/log/tomcat/access.log"...
Phase 1 : First bypass old records, searching new record...
Searching new records from beginning of log file...
Phase 2 : Now process new records (Flush history on disk after 20000 hosts)...
Jumped lines in file: 0
Parsed lines in file: 44217
 Found 0 dropped records,
 Found 0 comments,
 Found 0 blank records,
 Found 1 corrupted records,
 Found 0 old records,
 Found 44216 new qualified records.

With the above command, AWStats reads all the data from the configured log file and updates its internal data files.
If you want to delete the data and re-run the update from the log files, simply delete all the `*.txt' files in the data directory (configured by the DirData directive above) and run `perl awstats.pl -config=demo -update` again.
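If you reset the data often, the cleanup can be scripted. This is a minimal Java sketch, assuming `DirData` points to `/var/log/data` as in the configuration above; the class name `ResetAwstatsData` is hypothetical. It deletes only regular `*.txt` files directly inside that directory, so double-check the path before running it, because the deletion cannot be undone.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ResetAwstatsData {

    /** Deletes the regular *.txt files directly under dataDir and returns how many were deleted. */
    public static int deleteTxtFiles(Path dataDir) throws IOException {
        List<Path> targets = new ArrayList<>();
        // Collect first, then delete, so the directory is not modified while it is being listed.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dataDir, "*.txt")) {
            for (Path file : stream) {
                if (Files.isRegularFile(file)) {
                    targets.add(file);
                }
            }
        }
        for (Path file : targets) {
            Files.delete(file);
        }
        return targets.size();
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the DirData value used in this article's configuration.
        Path dataDir = Paths.get(args.length > 0 ? args[0] : "/var/log/data");
        int count = deleteTxtFiles(dataDir);
        System.out.println("Deleted " + count + " file(s) from " + dataDir
                + "; now run: perl awstats.pl -config=demo -update");
    }
}
```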

5. Generate Reports

Finally, you can generate a report from the updated data by the following command:

#
# First copy the awstats_buildstaticpages.pl script from tools directory 
# if not exists here.
#
$ cp ../../tools/awstats_buildstaticpages.pl ./

$ perl awstats_buildstaticpages.pl -config=demo -month=all -year=2012 -dir=/tmp -awstatsprog=./awstats.pl -buildpdf=/usr/bin/htmldoc

or

$ perl awstats_buildstaticpages.pl -config=demo -month=all -year=2012 -dir=/tmp -awstatsprog=./awstats.pl

Main HTML page is 'awstats.demo.html'.
PDF file is 'awstats.demo.pdf'.

$



Now the report has been generated into /tmp: the main HTML page (awstats.demo.html) and, if you used the -buildpdf option, /tmp/awstats.demo.pdf!

You can omit the `-buildpdf ...' option if you do not have HTMLDOC installed.
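To automate report generation, the command line can be assembled so that `-buildpdf` is added only when HTMLDOC is actually installed. This Java sketch assumes it is run from the `cgi-bin` directory that contains both `awstats.pl` and `awstats_buildstaticpages.pl`, and it uses `/usr/bin/htmldoc` as in the prerequisites; the class and method names are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class BuildStaticPages {

    /** Builds the awstats_buildstaticpages.pl command; adds -buildpdf only if htmldoc is executable. */
    public static List<String> command(String config, int year, String outDir, Path htmldoc) {
        List<String> cmd = new ArrayList<>(List.of(
            "perl", "awstats_buildstaticpages.pl",
            "-config=" + config, "-month=all", "-year=" + year,
            "-dir=" + outDir, "-awstatsprog=./awstats.pl"));
        if (htmldoc != null && Files.isExecutable(htmldoc)) {
            cmd.add("-buildpdf=" + htmldoc);
        }
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = command("demo", 2012, "/tmp", Paths.get("/usr/bin/htmldoc"));
        // Runs in the current directory and forwards the script's output to this console.
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        System.exit(process.waitFor());
    }
}
```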

Open the pdf file or the main html page now. It contains nice reports!

Thursday, July 5, 2012

Spring Web MVC framework support in HST-2

(This article was migrated from http://blogs.onehippo.org/woonsan/2009/06/spring_web_mvc_framework_suppo_1.html, originally written on June 5, 2009.)

HST-2 has provided basic support that lets developers use the Spring Framework IoC container for HST components. [1]
Now, HST-2 provides even more: it supports Spring Web MVC Framework based applications in the HST-2 environment! By using Spring Web MVC Framework in HST-2 based application development, developers can make use of its very useful features, such as clear separation of roles (controller, validator, command object, form object, model object, handler mapping, view resolver, etc.), high configurability, customizability and flexibility.

Note: I wrote and tested this Spring Web MVC Framework bridging solution with Spring Framework 2.5.6. However, I think it would also work with Spring Framework 1.1.5 or later, because it depends only on the following:

The bridging solution's extended DispatcherServlet needs to override protected void render(ModelAndView mv, HttpServletRequest request, HttpServletResponse response), which was added in Spring Framework 1.1.5.
The bridging solution uses simple URL dispatching to Spring-managed URLs, which has been part of the core of Spring Web MVC Framework from the beginning.

1. A Simple Form Controller Example: Contact-SpringMVC

You can build and run a Spring Web MVC Framework integration example. This example is available since HST-2.03.07.

Build all:
$ mvn clean install -DskipTests

Run a testsuite's cms application:
$ cd testsuite/cms

$ mvn jetty:run-war

Run a testsuite's site application:
$ cd testsuite/site

$ mvn jetty:run

Visit http://localhost:8080/site/preview/news

Now, click the "Contact-SpringMVC" link on the left menu. You can see a page like the following:

If you enter some invalid information, e.g., "wicky" as email, the page will show some validation errors which were generated by the Spring Web MVC Framework like the following:

Now, fill in valid information, and it will show a success view, which is defined in the Spring Web MVC configuration.

Here's the simplified configuration for the above Spring Web MVC Framework based application.

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans-2.5.xsd">

  <!-- Resolves logical view names to JSTL-enabled JSP pages under /WEB-INF/jsp/ -->
  <bean class="org.springframework.web.servlet.view.InternalResourceViewResolver">
    <property name="viewClass" value="org.springframework.web.servlet.view.JstlView"/>
    <property name="prefix" value="/WEB-INF/jsp/"/>
    <property name="suffix" value=".jsp"/>
  </bean>

  <!-- Maps the Spring-managed URL (the component's 'action-path') to the form controller -->
  <bean class="org.springframework.web.servlet.handler.SimpleUrlHandlerMapping">
    <property name="mappings">
      <value>
        /spring/contactspringmvc.do = contactFormController
      </value>
    </property>
  </bean>

  <!-- The mailSender and templateMessage beans are defined elsewhere (omitted in this simplified version) -->
  <bean id="contactFormController" class="org.hippoecm.hst.springmvc.ContactFormController">
    <property name="mailSender" ref="mailSender" />
    <property name="templateMessage" ref="templateMessage" />
    <property name="formView" value="/spring/contactspringmvc-form"/>
    <property name="successView" value="/spring/contactspringmvc-success"/>
    <property name="commandName" value="contactMessage"/>
    <property name="commandClass" value="org.hippoecm.hst.springmvc.ContactMessageBean"/>
    <property name="validateOnBinding" value="true"/>
    <property name="validators">
      <list>
        <bean class="org.hippoecm.hst.springmvc.ContactMessageValidator" />
      </list>
    </property>
  </bean>

</beans>

There's nothing new here. All the beans in applicationContext.xml are ordinary beans that you would find in any Spring Web MVC Framework application.
The only connection point from the HST-2 container is the following component configuration in the repository:

<sv:node sv:name="contactspringmvcform">
    <sv:property sv:name="jcr:primaryType" sv:type="Name">
        <sv:value>hst:component</sv:value>
    </sv:property>
    <sv:property sv:name="hst:template" sv:type="String">
        <sv:value>contactspringmvc</sv:value>
    </sv:property>
    <sv:property sv:name="hst:componentclassname" sv:type="String">
        <sv:value>org.hippoecm.hst.component.support.SimpleDispatcherHstComponent</sv:value>
    </sv:property>
    <sv:property sv:name="hst:parameternames" sv:type="String">
        <sv:value>action-path</sv:value>
    </sv:property>
    <sv:property sv:name="hst:parametervalues" sv:type="String">
        <sv:value>/spring/contactspringmvc.do</sv:value>
    </sv:property>
</sv:node>

The Contact-SpringMVC example has one component, "contactspringmvcform", whose component class must be set to "org.hippoecm.hst.component.support.SimpleDispatcherHstComponent" to enable bridging to a pure Spring Web MVC Framework application.
This bridge component supports the following parameters:

Name	Description
dispatch-path	The default dispatch path, to which the container dispatches on each invocation.
action-path	The dispatch path for doAction() invocation of the component. If this is not configured, then 'dispatch-path' is used instead.
before-render-path	The dispatch path for doBeforeRender() invocation of the component. If this is not configured, then 'dispatch-path' is used instead.
render-path	The dispatch path for the rendering phase. If this is not configured, then 'dispatch-path' is used instead.
before-resource-path	The dispatch path for doBeforeServeResource() invocation of the component. If this is not configured, then 'dispatch-path' is used instead.
resource-path	The dispatch path for the resource serving phase. If this is not configured, then 'dispatch-path' is used instead.

The bridge component, "org.hippoecm.hst.component.support.SimpleDispatcherHstComponent", delegates all invocations by dispatching to configured paths.
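The fallback rule in the table fits in a few lines. The Java sketch below resolves the dispatch path for a phase from a plain map of component parameters, falling back to 'dispatch-path' and adding a leading '/' when it is missing. It is only an illustration: the `Phase` enum and the class name are hypothetical, and the real component reads its parameters through the HST component configuration rather than a `Map`.

```java
import java.util.Map;

public class DispatchPathResolver {

    // Hypothetical phase names for this sketch; HST-2 uses its own lifecycle constants.
    public enum Phase { ACTION, BEFORE_RENDER, RENDER, BEFORE_RESOURCE, RESOURCE }

    static String paramFor(Phase phase) {
        switch (phase) {
            case ACTION: return "action-path";
            case BEFORE_RENDER: return "before-render-path";
            case RENDER: return "render-path";
            case BEFORE_RESOURCE: return "before-resource-path";
            case RESOURCE: return "resource-path";
            default: throw new IllegalArgumentException("Unknown phase: " + phase);
        }
    }

    /** Returns the phase-specific path, else 'dispatch-path', else null; always starts with '/'. */
    public static String resolve(Map<String, String> params, Phase phase) {
        String path = params.get(paramFor(phase));
        if (path == null || path.isEmpty()) {
            path = params.get("dispatch-path");
        }
        if (path == null || path.isEmpty()) {
            return null;
        }
        // RequestDispatcher paths are context-relative and must start with '/'.
        return path.charAt(0) == '/' ? path : "/" + path;
    }
}
```

With 'action-path' set to '/spring/contactspringmvc.do' and only 'dispatch-path' set otherwise, the action phase goes to Spring while every other phase falls back to the default path.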

Finally, you need to configure an extended Dispatcher Servlet in web.xml to run this example:

<servlet>
    <servlet-name>HstDispatcherServlet</servlet-name>
    <servlet-class>org.hippoecm.hst.component.support.spring.mvc.HstDispatcherServlet</servlet-class>
    <init-param>
        <param-name>contextConfigLocation</param-name>
        <param-value>/WEB-INF/applicationContext.xml</param-value>
    </init-param>
</servlet>
<servlet-mapping>
    <servlet-name>HstDispatcherServlet</servlet-name>
    <url-pattern>*.do</url-pattern>
</servlet-mapping>

The only difference is that the extended Dispatcher Servlet, "org.hippoecm.hst.component.support.spring.mvc.HstDispatcherServlet", must be used instead of the default "org.springframework.web.servlet.DispatcherServlet" of Spring Web MVC Framework.
The reason this is necessary is explained in section 2.3 below.

In summary, you can use a Spring Web MVC based application for HST-2 component development.

To enable this bridging from HST-2 container, you need to use the delegator component, "org.hippoecm.hst.component.support.SimpleDispatcherHstComponent".

To allow seamless bridging from HST-2 container, you need to use the "org.hippoecm.hst.component.support.spring.mvc.HstDispatcherServlet" in your web.xml.

You need to configure some parameters, such as "action-path", for the delegator component, "org.hippoecm.hst.component.support.SimpleDispatcherHstComponent".

You can now make use of all features provided by Spring Web MVC Framework such as validating, form controller, etc.!

2. Internals: An Architectural Explanation

2.1. Introduction to HST-2 request processing
This is a good time to explain the request processing architecture briefly, because it is fundamental to understanding the bridging solution.
For simplicity, I'll show only the interaction between the HST container and each HST component instead of explaining every detail.
The basic interactions can be depicted as follows.

In the above diagram, the following is assumed:

The client is requesting a page that maps to a page configuration composed of a root HstComponent, "Parent".

The "Parent" component has two child components, "LeftChild" and "RightChild". These two child components are siblings.

At this point, the client is submitting a form included in the HstComponent, "RightChild".

In this case, the interaction sequence is as follows:

The client sends a request to the HST-2 container.

Because the client is submitting a form by an action URL, the container invokes doAction() of "RightChild".

The container redirects to a render page.
(Because the container aggregates multiple components in a page, the action phase should be separated from the render phase of all components. HST-2 container's aggregation implies the PRG pattern. [2])

The client requests the render page.

The container invokes doBeforeRender() of each component. The invocation order of doBeforeRender() is from parent to child. The invocation order between siblings is not specified.

The container dispatches to the render path of each component. Render pages are dispatched from child to parent. The dispatch order between siblings is not specified.

A parent component's render page can include the rendered content of a child component.

The container writes the aggregated content to the client.
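The ordering rules above amount to a pre-order traversal for doBeforeRender() and a post-order traversal for rendering, which is what lets a parent include its children's output. The following Java sketch models that with a hypothetical `Component` tree built from the "Parent", "LeftChild" and "RightChild" example. It is not HST code, and it visits siblings in list order only to keep the output deterministic, whereas HST-2 leaves the sibling order unspecified.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class AggregationOrder {

    // Hypothetical stand-in for a component in the page's component tree.
    static final class Component {
        final String name;
        final List<Component> children = new ArrayList<>();

        Component(String name, Component... kids) {
            this.name = name;
            Collections.addAll(this.children, kids);
        }
    }

    /** doBeforeRender() runs on the parent before its children (pre-order). */
    static void beforeRender(Component c, List<String> log) {
        log.add("doBeforeRender:" + c.name);
        for (Component child : c.children) {
            beforeRender(child, log);
        }
    }

    /** Children are rendered first (post-order) so the parent can include their output. */
    static String render(Component c, List<String> log) {
        StringBuilder childOutput = new StringBuilder();
        for (Component child : c.children) {
            childOutput.append(render(child, log));
        }
        log.add("render:" + c.name);
        return "<" + c.name + ">" + childOutput + "</" + c.name + ">";
    }

    public static void main(String[] args) {
        Component page = new Component("Parent", new Component("LeftChild"), new Component("RightChild"));
        List<String> log = new ArrayList<>();
        beforeRender(page, log);
        String html = render(page, log);
        log.forEach(System.out::println);
        System.out.println(html);
    }
}
```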

Here are the important things to note as a bridge solution developer:

Because the action phase request and render phase request are separated in HST-2 request processing, the web application framework should not assume that the request would be shared between the two phases.
For example, when you use SimpleFormController of Spring Web MVC Framework with a form view page and a validator, if a user enters invalid information in the form, the dispatcher would render the form view again with some validation error information. Internally, this information is stored in a ModelAndView object to be rendered in the render phase. This cannot work in HST-2 request processing because the requests are not shared between action phase and render phase.
Therefore, that kind of shared information between action phase and render phase should be passed correctly between two separate request processing phases by bridging solutions.

Also, because HST request and response objects simply extend the standard HttpServletRequest and HttpServletResponse, the rest of the bridging integration can be easier than in other technologies such as Apache Portals Bridge. [3]

Because HST request and response objects are HttpServletRequest and HttpServletResponse objects, we can think of a very simple bridging solution: an HstComponent that dispatches every invocation to a specified dispatch path. In this case, all necessary handling must be done by the dispatched servlet or JSP page.
This is covered in the next section.

2.2. A very simple bridging solution: `SimpleDispatcherHstComponent`

This component is the simplest bridging solution to native servlet-based applications.
Here's the simplified source:

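package org.hippoecm.hst.component.support;

import java.io.IOException;

import javax.servlet.ServletException;

import org.hippoecm.hst.core.component.GenericHstComponent;
import org.hippoecm.hst.core.component.HstComponentException;
import org.hippoecm.hst.core.component.HstRequest;
import org.hippoecm.hst.core.component.HstResponse;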

public class SimpleDispatcherHstComponent extends GenericHstComponent {
    public static final String LIFECYCLE_PHASE_ATTRIBUTE = SimpleDispatcherHstComponent.class.getName() + ".lifecycle.phase";
    public static final String BEFORE_RENDER_PHASE = "BEFORE_RENDER_PHASE";
    public static final String DISPATCH_PATH_PARAM_NAME = "dispatch-path";
    public static final String BEFORE_RENDER_PATH_PARAM_NAME = "before-render-path";
    public static final String RENDER_PATH_PARAM_NAME = "render-path";
    public static final String ACTION_PATH_PARAM_NAME = "action-path";

    @Override
    public void doAction(HstRequest request, HstResponse response) throws HstComponentException {
        doDispatch(getDispatchPathParameter(request, request.getLifecyclePhase()), request, response);
    }

    @Override
    public void doBeforeRender(HstRequest request, HstResponse response) throws HstComponentException {
        request.setAttribute(LIFECYCLE_PHASE_ATTRIBUTE, BEFORE_RENDER_PHASE);
        String dispatchPath = getDispatchPathParameter(request, request.getLifecyclePhase());
    
        if (dispatchPath != null) {
            response.setRenderPath(dispatchPath);
        }

        try {
            doDispatch(getDispatchPathParameter(request, BEFORE_RENDER_PHASE), request, response);

        } finally {
            request.removeAttribute(LIFECYCLE_PHASE_ATTRIBUTE);
        }
    }

    protected void doDispatch(String dispatchPath, HstRequest request, HstResponse response) throws HstComponentException {
        if (dispatchPath != null) {
            try {
                getServletConfig().getServletContext().getRequestDispatcher(dispatchPath).include(request, response);
            } catch (ServletException e) {
                throw new HstComponentException(e);
            } catch (IOException e) {
                throw new HstComponentException(e);
            }
        }
    }

    protected String getDispatchPathParameter(HstRequest request, String lifecyclePhase) {
        String dispatchPath = null;
    
        if (BEFORE_RENDER_PHASE.equals(lifecyclePhase)) {
            dispatchPath = getParameter(BEFORE_RENDER_PATH_PARAM_NAME, request, null);
        } else if (HstRequest.RENDER_PHASE.equals(lifecyclePhase)) {
            dispatchPath = getParameter(RENDER_PATH_PARAM_NAME, request, null);
        } else if (HstRequest.ACTION_PHASE.equals(lifecyclePhase)) {
            dispatchPath = getParameter(ACTION_PATH_PARAM_NAME, request, null);
        }
    
        if (dispatchPath == null) {
            dispatchPath = getParameter(DISPATCH_PATH_PARAM_NAME, request, null);
        }
    
        if (dispatchPath != null) {
            if (dispatchPath.charAt(0) != '/') {
                dispatchPath = new StringBuilder(dispatchPath.length() + 1).append('/').append(dispatchPath).toString();
            }
        }
    
        return dispatchPath;
    }

    protected String getParameter(String name, HstRequest request, String defaultValue) {
        String value = (String) this.getComponentConfiguration().getParameter(name, request.getRequestContext().getResolvedSiteMapItem());
        return (value != null ? value : defaultValue);
    }
}

In the above component, doAction() just dispatches to the path configured by 'action-path', falling back to 'dispatch-path' if 'action-path' is not specified in the repository configuration.
Likewise, doBeforeRender() dispatches to the path configured by 'before-render-path', falling back to 'dispatch-path' if 'before-render-path' is not specified. It also sets the render path dynamically from the 'render-path' configuration value, which falls back to 'dispatch-path' if not configured.
So, when the container invokes doAction() or doBeforeRender() of this component, it actually dispatches to the native servlet or JSP page. The container then dispatches to the render path set dynamically by this component.
What remains is to write the dispatched servlet or JSP page so that it handles each invocation correctly.

In most web application frameworks, the front controller is a servlet, but I'll use a simple JSP page here for simplicity.
The above component should have the parameter 'dispatch-path' set to 'jsp/components/contactdispatch.jsp'.
Here is an example native JSP page to handle those (as a simplified version):

<%-- contactdispatch.jsp --%>
<%!
private static String[] formFields = {"name","email","textarea"};

private void doBeforeRender(HstRequest request, HstResponse response) throws HstComponentException {
    HttpSession session = request.getSession(true);
    FormMap formMap = (FormMap) session.getAttribute("contactdispatch:formMap");
    if (formMap == null) {
        formMap = new FormMap();
        session.setAttribute("contactdispatch:formMap", formMap);
    }
    request.setAttribute("form", formMap);
}

private void doAction(HstRequest request, HstResponse response) throws HstComponentException {
    HttpSession session = request.getSession(true);
    FormMap formMap = new FormMap(request, formFields);
    session.setAttribute("contactdispatch:formMap", formMap);
    // Do a really simple validation:
    if (formMap.getField("email") != null && formMap.getField("email").contains("@")) {
        // success
        // do your business logic
        // possible do a redirect to a thankyou page: do not use directly response.sendRedirect;
        HstSiteMapItem item = request.getRequestContext().getResolvedSiteMapItem().getHstSiteMapItem().getChild("thankyou");
        if (item != null) {
            sendRedirect(request, response, item.getId());
        } else {
            log.warn("Cannot redirect because siteMapItem not found. ");
        }
    } else {
        // validation failed. Persist form map, and add possible error messages to the formMap
        formMap.addMessage("email", "Email address must contain '@'");
    }
}

private void sendRedirect(HstRequest request, HstResponse response, String redirectToSiteMapItemId) {
    HstLinkCreator linkCreator = request.getRequestContext().getHstLinkCreator();
    HstSiteMap siteMap = request.getRequestContext().getResolvedSiteMapItem().getHstSiteMapItem().getHstSiteMap();
    HstLink link = linkCreator.create(siteMap.getSiteMapItemById(redirectToSiteMapItemId));
    StringBuffer url = new StringBuffer();
    for (String elem : link.getPathElements()) {
        String enc = response.encodeURL(elem);
        url.append("/").append(enc);
    }
    String urlString = ((HstResponse) response).createNavigationalURL(url.toString()).toString();
    try {
        response.sendRedirect(urlString);
    } catch (IOException e) {
        throw new HstComponentException("Could not redirect. ",e);
    }
}
%>
<%
HstRequest hstRequest = (HstRequest) request;
HstResponse hstResponse = (HstResponse) response;
String hstRequestLifecyclePhase = hstRequest.getLifecyclePhase();
String dispatchLifecyclePhase = (String) hstRequest.getAttribute(SimpleDispatcherHstComponent.LIFECYCLE_PHASE_ATTRIBUTE);

if (HstRequest.ACTION_PHASE.equals(hstRequestLifecyclePhase)) {
    doAction(hstRequest, hstResponse);
} else if (SimpleDispatcherHstComponent.BEFORE_RENDER_PHASE.equals(dispatchLifecyclePhase)) {
    doBeforeRender(hstRequest, hstResponse);
} else if (HstRequest.RENDER_PHASE.equals(hstRequestLifecyclePhase)) {
%>
<div>
  <form method="POST" name="myform" action="<hst:actionURL/>">
    <input type="hidden" name="previous" value="${form.previous}"/>
    <br/>
    <table>
      <tr>
        <td>Name</td>
        <td><input type="text" name="name" value="${form.value['name']}" /></td>
        <td><font style="color:red">${form.message['name']}</font></td>
      </tr>
      <tr>
        <td>Email</td>
        <td><input type="text" name="email" value="${form.value['email']}"/></td>
        <td><font style="color:red">${form.message['email']}</font></td>
      </tr>
      <tr>
        <td>Text</td>
        <td><textarea name="textarea">${form.value['textarea']}</textarea></td>
        <td><font style="color:red">${form.message['textarea']}</font></td>
      </tr>
      <tr>
        <td>
          <c:if test="${form.previous != null}">
            <input type="submit" name="prev" value="prev"/>
          </c:if>
        </td>
        <td><input type="submit" value="send"/></td>
      </tr>
    </table>
  </form>
</div>
<% } %>

Because the component just dispatches each invocation to a dispatch path, the above JSP page must handle every phase by itself.
The following JSP scriptlet detects the current request processing lifecycle phase and invokes the proper method; the methods were copied from the existing Contact component example.

<%
if (HstRequest.ACTION_PHASE.equals(hstRequestLifecyclePhase)) {
    doAction(hstRequest, hstResponse);
} else if (SimpleDispatcherHstComponent.BEFORE_RENDER_PHASE.equals(dispatchLifecyclePhase)) {
    doBeforeRender(hstRequest, hstResponse);
} else if (HstRequest.RENDER_PHASE.equals(hstRequestLifecyclePhase)) {
%>
//...
<% } %>

So, any kind of servlet-based application can take control of every phase by using this technique.

2.3. An extended DispatcherServlet: HstDispatcherServlet
In the Spring Web MVC Framework bridging solution, the simplest bridging component, "SimpleDispatcherHstComponent", is used, and its 'action-path' parameter is simply set to a Spring-managed URL like '/spring/contactspringmvc.do'.
So the Spring front controller has to handle everything.
For this reason, we provide a dispatcher servlet, "HstDispatcherServlet", which extends the default DispatcherServlet.
The responsibility of HstDispatcherServlet is very simple. It should pass the ModelAndView object from the action request phase to the render request phase:

After completing action phase, it should store the ModelAndView object into session attributes temporarily.

Before doing render phase, it should restore the ModelAndView object from the session attributes if available.

HstDispatcherServlet just overrides the method, render(ModelAndView mv, HttpServletRequest request, HttpServletResponse response) of the default DispatcherServlet to accomplish this.
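The hand-off can be illustrated without Spring or a servlet container. In this Java sketch, a `Map` stands in for the HTTP session and a small hypothetical `ViewAndModel` class stands in for Spring's `ModelAndView`; the session key is also made up, since the article does not show HstDispatcherServlet's source. Removing the stored object on first use, so that a later render does not see stale validation errors, is a choice made for this sketch rather than a documented behavior of HstDispatcherServlet.

```java
import java.util.HashMap;
import java.util.Map;

public class PhaseModelBridge {

    // Hypothetical session key used only in this sketch.
    static final String MODEL_KEY = "bridge.modelAndView";

    /** Minimal stand-in for Spring's ModelAndView: a view name plus a model map. */
    static final class ViewAndModel {
        final String viewName;
        final Map<String, Object> model;

        ViewAndModel(String viewName, Map<String, Object> model) {
            this.viewName = viewName;
            this.model = model;
        }
    }

    /** Action phase: instead of rendering, park the result in the session for the next request. */
    static void afterAction(Map<String, Object> session, ViewAndModel mv) {
        session.put(MODEL_KEY, mv);
    }

    /** Render phase: use the parked result once, otherwise keep the render request's own result. */
    static ViewAndModel beforeRender(Map<String, Object> session, ViewAndModel fromRenderRequest) {
        Object stored = session.remove(MODEL_KEY);
        return stored instanceof ViewAndModel ? (ViewAndModel) stored : fromRenderRequest;
    }

    public static void main(String[] args) {
        Map<String, Object> session = new HashMap<>(); // stands in for HttpSession
        Map<String, Object> errors = new HashMap<>();
        errors.put("email", "Email address must contain '@'");
        afterAction(session, new ViewAndModel("/spring/contactspringmvc-form", errors));
        // ... the container redirects; a new request arrives for the render phase ...
        ViewAndModel mv = beforeRender(session, new ViewAndModel("/spring/contactspringmvc-form", new HashMap<>()));
        System.out.println(mv.viewName + " " + mv.model);
    }
}
```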

References
[1] http://woonsanko.blogspot.com/2012/07/spring-framework-support-in-hst-2.html
[2] http://en.wikipedia.org/wiki/Post/Redirect/Get
[3] http://portals.apache.org/bridges/
