Bixo Hackathon September 7th & 8th
August 26, 2010
There’s a Bixo hackathon next month, and you’re invited.
Getting there is probably a long jaunt for many, but even if you can’t make it in person, you can still help by telling us which areas of Bixo you think need the most love.
Note that even if you’re not a hard-core Bixo user, fringe benefits from participating include learning a lot about the very useful underlying technologies (Cascading, Hadoop, HttpClient) as well as getting an excuse to visit beautiful Nevada City, California.
Some known issues are:
- Documentation & tutorials (of course).
- Changing the xxxDatum data model to be wrappers for Cascading tuples, versus POJOs.
- Making datum metadata into a single unchecked field in datums passed through pre-defined sub-assemblies.
- Using abstract base classes versus interfaces for many/most extension points.
- Creating separate crawl and parse policies, versus having just a fetch policy.
- Emitting (optional) binary data and cleaned-up XHTML text in ParseDatum.
- Supporting page scoring/link scores out of the box.
- Switching back to using SequenceFiles for maintaining crawl state.
So send a note to the bixo-dev mailing list if you’re interested in attending, or if you just want to cast a vote or suggest additional changes.