How do I go back to get older articles, or all the articles, from a site’s archives

    For more how-tos and advice, return to How Do I …?

    Q: How do I go back to get older articles, or all the articles, from a site’s archives? FeedWordPress only picks up recent articles!

    A: Unfortunately, there is probably not yet a way to do what you want with feeds. There’s little that FeedWordPress can do about it, because the limitation comes from the basic way syndication feeds work and from the lack of a standardized way to query archives.

    Syndication feeds are designed to notify subscribing clients about new content as it is published, or new events as they happen. They typically do not provide a comprehensive view of all the posts in a site’s archives. (It would, in any case, be extremely wasteful of bandwidth and other resources for a site to deliver its entire archive to every client that requests the feed once an hour.) Typically, a feed contains only the most recent 10-15 items from a site. To aggregate and archive content over time, a feed consumer like FeedWordPress has to keep track of new content as it comes in and commit it to a long-term store, like the WordPress database, so that the data is still available after it ages out of the live feed.
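
To illustrate why an aggregator only accumulates an archive going forward, here is a minimal Python sketch of the polling pattern described above. It is not FeedWordPress's actual code: the `merge_feed_window` function, the in-memory `archive` dictionary, and the `guid` field are hypothetical stand-ins for a real feed parser and the WordPress database.

```python
from typing import Dict, Iterable, List

# Hypothetical in-memory store keyed by item GUID; a real aggregator
# would commit items to a database instead.
Archive = Dict[str, dict]


def merge_feed_window(archive: Archive, window: Iterable[dict]) -> List[str]:
    """Store items from the current feed window that have not been seen
    before, and return the GUIDs that were newly added."""
    added = []
    for item in window:
        guid = item["guid"]
        if guid not in archive:
            archive[guid] = item
            added.append(guid)
    return added


archive: Archive = {}

# First poll: the live feed exposes posts 1 through 3.
first = merge_feed_window(
    archive, [{"guid": "post-1"}, {"guid": "post-2"}, {"guid": "post-3"}]
)

# Later poll: post-1 has aged out of the feed and post-4 is new.
second = merge_feed_window(
    archive, [{"guid": "post-2"}, {"guid": "post-3"}, {"guid": "post-4"}]
)

# The archive still holds post-1 even though the feed no longer lists it.
print(sorted(archive))
```

Anything published before the first poll never appears in a window, which is why older posts cannot be recovered from an ordinary feed.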

    Some services, such as del.icio.us, do provide special methods (usually using a parameter added to the feed URL) which allow you to determine how many items will be returned, or to fetch older posts that appeared within some fixed window of time. When services support features like this, you can set up FeedWordPress to take advantage of the feature by temporarily subscribing to the service’s URL with those special parameters, performing a single update in order to import the older posts, and then switching the URL back to a URL that retrieves only the newest updates. So, for example, with del.icio.us feeds, you could retrieve the past 1,000 items from the user radgeek by subscribing to the following feed in FeedWordPress:

    http://feeds.delicious.com/v2/rss/radgeek?count=1000
    

    After subscribing to this feed and using a manual Update from within FeedWordPress to import the posts, you should now have the last 1,000 posts from this user archived on your WordPress installation; then you should Switch Feed on that subscription back to a feed that returns only the most recent batch of posts:

    http://feeds.delicious.com/v2/rss/radgeek?count=20
    

    (If you leave the subscription on the feed with count=1000 over a long period of time, this may be considered abusive by del.icio.us; it will certainly slow down your FeedWordPress installation to no good effect, by forcing FeedWordPress to scan through hundreds of posts that it has already syndicated every single time it tries to perform an update.)
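
If you switch between the backfill URL and the regular URL for several subscriptions, it may help to generate both from one base address. The following Python sketch assumes only the `count` parameter shown in the example URLs above; the `with_count` helper is a hypothetical name, and other services may use different parameters or none at all.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_count(feed_url: str, count: int) -> str:
    """Return feed_url with its `count` query parameter set to `count`,
    replacing any existing value and keeping other parameters intact."""
    parts = urlsplit(feed_url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "count"
    ]
    query.append(("count", str(count)))
    return urlunsplit(parts._replace(query=urlencode(query)))


base = "http://feeds.delicious.com/v2/rss/radgeek"

# Subscribe to this URL once to import the older posts...
backfill_url = with_count(base, 1000)

# ...then switch the subscription to this one for routine updates.
regular_url = with_count(backfill_url, 20)

print(backfill_url)
print(regular_url)
```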

    Unfortunately, while specific services offer special parameters, there is currently no generally accepted standard for going back through a site’s archives to fetch older posts. You have to work out the method separately for each individual site, and many sites currently provide no way to do it at all.

    There have been some discussions of standardizing methods for paging and archiving features in Atom feeds, but so far they haven’t been implemented widely enough to rely on them as standards. If, in the future, standardized methods emerge for requesting back archives, FeedWordPress will introduce features to support them, but until then there’s not much FeedWordPress can do in any programmatic way.
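
One such proposal is RFC 5005 (Feed Paging and Archiving), in which an Atom feed can point to an older archive document with a `link` element whose `rel` is `prev-archive`. The Python sketch below shows how a client might look for that link, assuming a publisher actually emits it. The `find_prev_archive` helper and the sample feed are hypothetical, and this is not a feature of FeedWordPress.

```python
import xml.etree.ElementTree as ET
from typing import Optional

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def find_prev_archive(atom_xml: str) -> Optional[str]:
    """Return the href of the feed-level rel="prev-archive" link,
    or None when the feed does not advertise an older archive."""
    root = ET.fromstring(atom_xml)
    for link in root.findall(f"{ATOM_NS}link"):
        if link.get("rel") == "prev-archive":
            return link.get("href")
    return None


# Hypothetical feed document used only for illustration.
sample = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
  <link rel="self" href="http://example.org/feed"/>
  <link rel="prev-archive" href="http://example.org/archive/2008-12"/>
</feed>"""

plain = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>No archive links</title>
</feed>"""

print(find_prev_archive(sample))
print(find_prev_archive(plain))
```

A client could follow these links one archive document at a time until none remains, but only for feeds that publish them.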

    Does that answer your question? If not, use the Talk page to comment on this post or contact me by e-mail for help. Be sure to describe what you are trying to do, and the problems you are running into, with as much detail as possible.
