
Redirect Mode added

August 29, 2010

The master version of Bixo now lets you control how redirects are handled during fetching. Why would you care? If the URLs you are processing redirect to a different domain, you often want to avoid blindly following them, because when that happens the redirect target is never checked against robots.txt. Also, if you are tracking links to build a link graph, you need to know when Page A links to Page B but Page B redirects to Page C, since the link from Page A should then be treated as a link to Page C.
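To make the link graph case concrete, here's a rough sketch, not part of Bixo, of rewriting link edges once you know where each fetched URL was redirected. The RedirectResolver class, its methods, and the idea of keeping redirects in a simple source-to-target map are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Bixo: rewrites link graph edges so that
// a link to a URL that redirected points at the final URL instead.
public class RedirectResolver {

    // Follows the redirect chain starting at url, stopping at the first URL
    // that did not redirect, or when a URL repeats (a redirect loop).
    public static String resolve(String url, Map<String, String> redirects) {
        Set<String> seen = new HashSet<>();
        String current = url;
        while (redirects.containsKey(current) && seen.add(current)) {
            current = redirects.get(current);
        }
        return current;
    }

    // Each edge is a two-element array {fromUrl, toUrl}; only the target
    // side is resolved, since that's the side affected by the redirect.
    public static List<String[]> rewriteEdges(List<String[]> edges, Map<String, String> redirects) {
        List<String[]> result = new ArrayList<>();
        for (String[] edge : edges) {
            result.add(new String[] { edge[0], resolve(edge[1], redirects) });
        }
        return result;
    }

    public static void main(String[] args) {
        // Page B redirected to Page C during fetching.
        Map<String, String> redirects = new HashMap<>();
        redirects.put("http://b.com/", "http://c.com/");

        // Page A links to Page B.
        List<String[]> edges = new ArrayList<>();
        edges.add(new String[] { "http://a.com/", "http://b.com/" });

        for (String[] e : rewriteEdges(edges, redirects)) {
            System.out.println(e[0] + " -> " + e[1]);
        }
    }
}
```

Running main prints `http://a.com/ -> http://c.com/`. The seen set guards against redirect loops, so resolution always terminates.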

How has this been implemented? It’s a new FetcherPolicy setting. From

// Possible redirect handling modes. If a redirect is NOT followed
// because of this setting, then a RedirectFetchException is thrown,
// which is the same as what happens if too many redirects occur.
// But RedirectFetchException now has a reason field, which can
// be used to tell these cases apart.

public enum RedirectMode {
    FOLLOW_ALL,     // Fetcher will try to follow all redirects.
    FOLLOW_TEMP,    // Temp redirects are auto-followed, but not permanent.
    FOLLOW_NONE     // No redirects are followed.
}

The default setting is FOLLOW_ALL, in which case the SimpleHttpFetcher behaves the same as before. To set a new mode, you’d do something like:

    FetcherPolicy policy = new FetcherPolicy();
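To illustrate what each mode means, here's a hypothetical sketch of the decision a fetcher has to make when it gets a redirect response. This is not the actual SimpleHttpFetcher code: the RedirectPolicySketch class, the RedirectReason values (standing in for the reason field on RedirectFetchException), and the status code classification (301 and 308 as permanent; 302, 303 and 307 as temporary) are all assumptions.

```java
// Hypothetical sketch, not Bixo's SimpleHttpFetcher logic: decides whether
// a redirect response should be followed under a given RedirectMode.
public class RedirectPolicySketch {

    // Mirrors the three modes described above.
    public enum RedirectMode { FOLLOW_ALL, FOLLOW_TEMP, FOLLOW_NONE }

    // Hypothetical stand-in for the reason carried by RedirectFetchException.
    public enum RedirectReason { FOLLOWED, TEMP_REDIRECT_DISALLOWED, PERM_REDIRECT_DISALLOWED }

    // Assumption: 301 and 308 are treated as permanent redirects.
    static boolean isPermanent(int status) {
        return status == 301 || status == 308;
    }

    // Assumption: 302, 303 and 307 are treated as temporary redirects.
    static boolean isTemporary(int status) {
        return status == 302 || status == 303 || status == 307;
    }

    // Returns FOLLOWED if the redirect should be followed, otherwise the
    // reason it was refused (where a real fetcher would throw).
    public static RedirectReason decide(RedirectMode mode, int status) {
        if (!isPermanent(status) && !isTemporary(status)) {
            throw new IllegalArgumentException("Not a redirect status: " + status);
        }
        switch (mode) {
            case FOLLOW_ALL:
                return RedirectReason.FOLLOWED;
            case FOLLOW_TEMP:
                return isTemporary(status)
                        ? RedirectReason.FOLLOWED
                        : RedirectReason.PERM_REDIRECT_DISALLOWED;
            case FOLLOW_NONE:
            default:
                return isTemporary(status)
                        ? RedirectReason.TEMP_REDIRECT_DISALLOWED
                        : RedirectReason.PERM_REDIRECT_DISALLOWED;
        }
    }

    public static void main(String[] args) {
        System.out.println(decide(RedirectMode.FOLLOW_TEMP, 302)); // FOLLOWED
        System.out.println(decide(RedirectMode.FOLLOW_TEMP, 301)); // PERM_REDIRECT_DISALLOWED
        System.out.println(decide(RedirectMode.FOLLOW_NONE, 307)); // TEMP_REDIRECT_DISALLOWED
    }
}
```

Returning a reason instead of a plain boolean matches the idea behind the new reason field: a caller that catches the exception can tell a refused redirect apart from one that simply hit the redirect limit.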