Torsten Curdt’s weblog

From push to pull with javaflow

Just recently Marcus blogged about how Sam turned expat into a pull parser via Ruby continuations. I found that pretty interesting and wondered whether you could do the same with javaflow. So let’s run through this little program sketch…

A push parser is essentially just a long-running process. It becomes the Runnable that we pass to the Continuation class.


public class Parsing implements Runnable {
  // the source to parse, handed in by the code that starts the continuation
  private final InputSource inputSource;

  public Parsing(InputSource inputSource) {
    this.inputSource = inputSource;
  }

  public void run() {
    ...
    parser.setContentHandler(new ContinuationSaxHandler());
    parser.parse(inputSource);
  }
}

To turn this into a pull scenario, we need a second place of action where we explicitly ask for each event. That's the actual pull loop.


while(true) {
  final Event e = c.pull();
  // null means the parser finished without producing another event
  if (e == null) break;
  ...
}

Now how can we connect these two processes? As you can see the parser already uses a special ContinuationSaxHandler.


public class ContinuationSaxHandler extends DefaultHandler {

  public interface Event {
  }

  public class StartElementEvent implements Event {
    final String qName;

    StartElementEvent(String qName) {
      this.qName = qName;
    }
  }

  public void startElement( String nsURI, String localName, String qName, Attr...
    ContinuationWrapper w = (ContinuationWrapper) Continuation.getContext();
    w.result = new StartElementEvent(qName);
    Continuation.suspend();
  }
}

This special ContentHandler receives the events from the SAX parser and transforms them into Event objects. These objects need to be passed along when execution returns from the suspend to the pull loop. To pass the object on, the Continuation needs to be wrapped in a ContinuationWrapper that provides both the actual "pull" method and access to the Event object.


public class ContinuationWrapper {

  Continuation c;
  Event result;

  public ContinuationWrapper( Continuation c ) {
    this.c = c;
  }

  public Event pull() {
    result = null;
    c = Continuation.continueWith(c, this);
    return result;
  }
}

So in order to start the pull parsing we only need to create a ContinuationWrapper around the first continuation.


final ContinuationWrapper c = new ContinuationWrapper(Continuation.startWith(new Parsing(inputSource)));

while(true) {
  final Event e = c.pull();
  // null means the parser finished without producing another event
  if (e == null) break;
  ...
}

So in theory this should work just fine. I am a bit concerned about the overhead though. The complete parser would need to be instrumented. Plus we have an external resource that is not that easy to "freeze": the file descriptor. It's an external resource and therefore should not be part of the continuation. Such external resources should usually be passed in through the context object that can be provided on the startWith/continueWith calls.
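For comparison, here is a minimal sketch of the same push-to-pull inversion that needs no instrumentation at all. Note that it is not javaflow: it swaps the continuation for a plain thread that blocks on a SynchronousQueue wherever the handler above would call suspend. The class name ThreadedPullParser and the helper pullAll are made up for illustration, and events are reduced to the element's qName to keep it short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.SynchronousQueue;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical thread-based stand-in for the ContinuationWrapper (not javaflow). */
public class ThreadedPullParser {

    // Hand-off point between parser thread and puller; an empty Optional marks the end.
    private final SynchronousQueue<Optional<String>> handoff = new SynchronousQueue<>();
    private boolean finished = false;

    public ThreadedPullParser(String xml) {
        Thread parserThread = new Thread(() -> {
            try {
                SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String nsURI, String localName,
                                String qName, Attributes attributes) throws SAXException {
                            try {
                                // Blocks until pull() takes the event: the "suspend" of this sketch.
                                handoff.put(Optional.of(qName));
                            } catch (InterruptedException e) {
                                Thread.currentThread().interrupt();
                                throw new SAXException(e);
                            }
                        }
                    });
            } catch (Exception e) {
                // In this sketch a parse error simply ends the event stream.
            } finally {
                try {
                    handoff.put(Optional.empty());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        parserThread.setDaemon(true); // do not keep the JVM alive if nobody pulls to the end
        parserThread.start();
    }

    /** Returns the qName of the next start element, or null once the parser is done. */
    public String pull() {
        if (finished) {
            return null;
        }
        try {
            Optional<String> next = handoff.take();
            if (!next.isPresent()) {
                finished = true;
                return null;
            }
            return next.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            finished = true;
            return null;
        }
    }

    /** Convenience helper: runs the pull loop to the end and collects all names. */
    public static List<String> pullAll(String xml) {
        ThreadedPullParser parser = new ThreadedPullParser(xml);
        List<String> names = new ArrayList<>();
        while (true) {
            final String name = parser.pull();
            if (name == null) break;
            names.add(name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(pullAll("<a><b/><c/></a>")); // prints [a, b, c]
    }
}
```

The trade-off is a second thread and a hand-off per event instead of bytecode instrumentation. The parser's stack lives on that thread, so nothing has to be captured or "frozen", but it also cannot be copied or resumed twice the way a real continuation can.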

Now what really blows me away is that in Ruby this apparently also works for native code (expat is native). So I would be really interested in why and how this actually works. What happens if I spawn another continuation tree while parsing the same file? Problems like these come to mind. Interesting stuff!
