Bixo Hackathon September 7th & 8th

August 26, 2010

There’s a Bixo hackathon next month, and you’re invited.

While Nevada City, California is probably a long jaunt for many, you can still help even if you can’t make it, by providing input on the areas of Bixo that you think need the most love.

Note that even if you’re not a hard-core Bixo user, fringe benefits from participating include learning a lot about the very useful underlying technologies (Cascading, Hadoop, HttpClient) as well as getting an excuse to visit beautiful Nevada City, California.

Some known issues are:

  • Documentation & tutorials (of course).
  • Changing the xxxDatum data model to be wrappers for Cascading tuples, versus POJOs.
  • Making datum metadata into a single unchecked field in datums passed through pre-defined sub-assemblies.
  • Using abstract base classes versus interfaces for many/most extension points.
  • Creating separate crawl and parse policies, versus having just a fetch policy.
  • Emitting (optional) binary data and cleaned-up XHTML text in ParseDatum.
  • Supporting page scoring/link scores out-of-the-box.
  • Switching back to using SequenceFiles for maintaining crawl state.

So send a note to the bixo-dev mailing list if you’re interested in attending, or just want to cast a vote/suggest additional changes.
