Providing APIs for content-driven websites (aaronparecki.com)
36 points by aaronpk on Oct 7, 2012 | 7 comments



So, much as I'm inclined to keep my head down given that I sparked all this silliness off with a stupid joke that got out of hand, I'll say this... In addition to using microformats, RDFa, microdata and other structured-data-in-HTML techniques, Rails does the whole content negotiation and suffix fallback thing the right way out of the box.

It astounds me that people use Rails and still end up doing all the craziness of building custom APIs when Rails already does it the right way out of the box. That they then go on to pat themselves on the back and call what they've done "RESTful", when they've made an API that's less RESTful than what comes out of the box with Rails, is bizarre.

In Java-land, Jersey also does it the right way (content negotiation), and setting up suffix fallback is trivially easy.
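
Roughly, a JAX-RS resource (the API Jersey implements) can expose one URI as both HTML and JSON and let the client's Accept header pick the representation. A minimal sketch, where Article is just a made-up placeholder class:

    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.PathParam;
    import javax.ws.rs.Produces;
    import javax.ws.rs.core.MediaType;

    // One resource, two representations, selected by content negotiation.
    @Path("/articles/{id}")
    public class ArticleResource {

        // Hypothetical data class used only for this illustration.
        public static class Article {
            public String title = "Hello";
            public String body  = "World";
        }

        // Returned for "Accept: application/json"; whichever JSON provider
        // is registered (Jackson, MOXy, ...) serializes the object.
        @GET
        @Produces(MediaType.APPLICATION_JSON)
        public Article asJson(@PathParam("id") String id) {
            return new Article();
        }

        // Returned for ordinary browser requests ("Accept: text/html").
        @GET
        @Produces(MediaType.TEXT_HTML)
        public String asHtml(@PathParam("id") String id) {
            Article a = new Article();
            return "<article><h1>" + a.title + "</h1><p>" + a.body + "</p></article>";
        }
    }

Jersey can also map a URI suffix like .json onto the same negotiation, so the suffix fallback is configuration rather than a separate code path.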


But why even place the burden on the content producer to create these formats? Why not just establish a practice of publishing raw feeds and letting consumers write their own parsers? (I can imagine consumers writing parsers and putting the code on GitHub.) Clearly consumers are willing to do some work with the data they seek; they'll be doing some bulk manipulation of it anyway, or they wouldn't be prepared to jump through hoops like "API keys".

My favorite data is raw files on FTP servers, i.e. bulk data. For me, it is the easiest to work with. I can translate it to XML or JSON if I want to, but often I do not even need that intermediate step to get what I want. If others wanted the data in, say, JSON, I'd be happy to share my parsers with them. I'd bet others would too.
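
For instance, turning a raw tab-separated dump into JSON is only a few lines. The file name and column layout here are invented purely to illustrate the point (and there's no escaping, so this is a sketch, not a real converter):

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.List;

    public class DumpToJson {
        public static void main(String[] args) throws IOException {
            // Read the raw dump, e.g. a file previously fetched from an FTP server.
            List<String> rows = Files.readAllLines(
                    Paths.get("stations.tsv"), StandardCharsets.UTF_8);

            // Turn each well-formed row into a small JSON object.
            List<String> objects = new ArrayList<String>();
            for (String row : rows) {
                String[] f = row.split("\t");
                if (f.length < 3) continue;          // skip malformed rows
                objects.add(String.format(
                    "{\"id\":\"%s\",\"name\":\"%s\",\"value\":\"%s\"}",
                    f[0], f[1], f[2]));
            }

            // Join the objects into a JSON array and print it.
            StringBuilder json = new StringBuilder("[");
            for (int i = 0; i < objects.size(); i++) {
                if (i > 0) json.append(",");
                json.append(objects.get(i));
            }
            json.append("]");
            System.out.println(json);
        }
    }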

I think Tom's point was "Cut the BS, and just give us the damn data, already." (Correct me if I'm wrong, Tom.) And I think he is spot on. This API nonsense has gotten out of hand.


This makes a tremendous amount of sense on many levels. Having to write screen scrapers for the vast array of HTML/content layouts on the web is a very tiresome task. Think of all the cool services like Instapaper, Evernote, etc. that would have been considerably easier to build had something like this existed!


The FCC uses (and open sourced[0]) a Drupal module[1] that provides API access to much of the site's content.

[0] http://www.fcc.gov/encyclopedia/content-api-drupal-module

[1] http://www.fcc.gov/developers/fcc-content-api


Interesting idea, but IMHO it's a waste of bandwidth and harder to maintain than a separate API, even though it allows the responses to be cached for normal and API usage together.

Might as well go back to serving XML and letting the browser transform it (http://www.w3schools.com/xsl/xsl_client.asp); that would be more elegant for such a task (even though it never caught on due to limited browser support).

The content negotiation method would seem best, except that most content-driven web pages contain plenty of content on every page that isn't interesting to the scraper, so a well-designed API with very specific queries will be faster and more useful. Whatever content is on each page is also a moving target for the scraper, even if she gets it as JSON; not so with a dedicated API.


I once built a very tiny web application with the structure of the GUI in an XML file that was transformed (on the server) with XSLT. I probably did one or two things wrong, but it was a complete misery. It could only have been worse if the transformation had been done in the uncontrollable environment on the client.


On the client side, there were lots of bugs, some of which are discussed in this SO thread:

http://stackoverflow.com/questions/274290/any-big-sites-usin...

You'd have to do hacky browser XSLT detection as detailed here: http://www.informit.com/articles/printerfriendly.aspx?p=6779...

XML in this context apparently pretty much died because of too many badly written parsers, libraries and browsers - and to some extent probably also because dominant search engines would not index XML files properly (see http://news.oreilly.com/2008/06/why-xslt-support-in-the-brow...).

But nowadays at least there are some reasonably portable solutions like http://archive.plugins.jquery.com/project/Transform (though that one is unnecessarily slow).

On the server side, it should be fairly simple, unless you hit bugs or deficiencies (like wasted memory or leaks) in the libraries ...
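
For what it's worth, in Java the server-side version is a handful of lines with the JDK's built-in javax.xml.transform; the file names here are placeholders:

    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;
    import java.io.File;

    public class ServerSideTransform {
        public static void main(String[] args) throws Exception {
            // Compile the stylesheet once.
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new File("page.xsl")));
            // Transform the raw XML content into HTML before it leaves the server.
            t.transform(new StreamSource(new File("content.xml")),
                        new StreamResult(new File("page.html")));
        }
    }

The stylesheet then only has to behave correctly in one processor you control, instead of in every visitor's browser.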



