Journal tags: storage

6

Caching and storing

When I was speaking at conferences last year about service workers, I’d introduce the Cache API. I wanted some way of explaining the difference between caching and other kinds of storage.

The way I explained was that, while you might store stuff for a long time, you’d only cache stuff that you knew you were going to need again. So according to that definition, when you make a backup of your hard drive, that’s not caching …becuase you hope you’ll never need to use the backup.

But that explanation never sat well with me. Then more recently, I was chatting with Amber about caching. Once again, we trying to define the difference between, say, the Cache API and things like LocalStorage and IndexedDB. At some point, we realised the fundamental difference: caches are for copies.

Think about it. If you store something in LocalStorage or IndexedDB, that’s the canonical home for that data. But anything you put into a cache must be a copy of something that exists elsewhere. That’s true of the Cache API, the browser cache, and caches on the server. An item in one of those caches is never the original—it’s always a copy of something that has a canonical home elsewhere.

By that definition, backing up your hard drive definitely is caching.

Anyway, I was glad to finally have a working definition to differentiate between caching and storing.

Saving forms

I added a long-overdue enhancement to The Session recently. Here’s the scenario…

You’re on a web page with a comment form. You type your well-considered thoughts into a textarea field. But then something happens. Maybe you accidentally navigate away from the page or maybe your network connection goes down right when you try to submit the form.

This is a textbook case for storing data locally on the user’s device …at least until it has safely been transmitted to the server. So that’s what I set about doing.

My first decision was choosing how to store the data locally. There are multiple APIs available: sessionStorage, IndexedDB, localStorage. It was clear that sessionStorage wasn’t right for this particular use case: I needed the data to be saved across browser sessions. So it was down to IndexedDB or localStorage. IndexedDB is the more versatile and powerful—because it’s asynchronous—but localStorage is nice and straightforward so I decided on that. I’m not sure if that was the right decision though.

Alright, so I’m going to store the contents of a form in localStorage. It accepts key/value pairs. I’ll make the key the current URL. The value will be the contents of that textarea. I can store other form fields too. Even though localStorage technically only stores one value, that value can be a JSON object so in reality you can store multiple values with one key (just remember to parse the JSON when you retrieve it).

Now I know what I’m going to store (the textarea contents) and how I’m going to store it (localStorage). The next question is when should I do it?

I could play it safe and store the comment whenever the user presses a key within the textarea. But that seems like overkill. It would be more efficient to only save when the user leaves the current page for any reason.

Alright then, I’ll use the unload event. No! Bad Jeremy! If I use that then the browser can’t reliably add the current page to the cache it uses for faster back-forwards navigations. The page life cycle is complicated.

So beforeunload then? Well, maybe. But modern browsers also support a pagehide event that looks like a better option.

In either case, just adding a listener for the event could screw up the caching of the page for back-forwards navigations. I should only listen for the event if I know that I need to store the contents of the textarea. And in order to know if the user has interacted with the textarea, I’m back to listening for key presses again.

But wait a minute! I don’t have to listen for every key press. If the user has typed anything, that’s enough for me. I only need to listen for the first key press in the textarea.

Handily, addEventListener accepts an object of options. One of those options is called “once”. If I set that to true, then the event listener is only fired once.

So I set up a cascade of event listeners. If the user types anything into the textarea, that fires an event listener (just once) that then adds the event listener for when the page is unloaded—and that’s when the textarea contents are put into localStorage.

I’ve abstracted my code into a gist. Here’s what it does:

  1. Cut the mustard. If this browser doesn’t support localStorage, bail out.
  2. Set the localStorage key to be the current URL.
  3. If there’s already an entry for the current URL, update the textarea with the value in localStorage.
  4. Write a function to store the contents of the textarea in localStorage but don’t call the function yet.
  5. The first time that a key is pressed inside the textarea, start listening for the page being unloaded.
  6. When the page is being unloaded, invoke that function that stores the contents of the textarea in localStorage.
  7. When the form is submitted, remove the entry in localStorage for the current URL.

That last step isn’t something I’m doing on The Session. Instead I’m relying on getting something back from the server to indicate that the form was successfully submitted. If you can do something like that, I’d recommend that instead of listening to the form submission event. After all, something could still go wrong between the form being submitted and the data being received by the server.

Still, this bit of code is better than nothing. Remember, it’s intended as an enhancement. You should be able to drop it into any project and improve the user experience a little bit. Ideally, no one will ever notice it’s there—it’s the kind of enhancement that only kicks in when something goes wrong. A little smidgen of resilient web design. A defensive enhancement.

Apple’s attack on service workers

Apple aren’t the best at developer relations. But, bad as their communications can be, I’m willing to cut them some slack. After all, they’re not used to talking with the developer community.

John Wilander wrote a blog post that starts with some excellent news: Full Third-Party Cookie Blocking and More. Safari is catching up to Firefox and disabling third-party cookies by default. Wonderful! I’ve had third-party cookies disabled for a few years now, and while something occassionally breaks, it’s honestly a pretty great experience all around. Denying companies the ability to track users across sites is A Good Thing.

In the same blog post, John said that client-side cookies will be capped to a seven-day lifespan, as previously announced. Just to be clear, this only applies to client-side cookies. If you’re setting a cookie on the server, using PHP or some other server-side language, it won’t be affected. So persistent logins are still doable.

Then, in an audacious example of burying the lede, towards the end of the blog post, John announces that a whole bunch of other client-side storage technologies will also be capped to seven days. Most of the technologies are APIs that, like cookies, can be used to store data: Indexed DB, Local Storage, and Session Storage (though there’s no mention of the Cache API). At the bottom of the list is this:

Service Worker registrations

Okay, let’s clear up a few things here (because they have been so poorly communicated in the blog post)…

The seven day timer refers to seven days of Safari usage, not seven calendar days (although, given how often most people use their phones, the two are probably interchangable). So if someone returns to your site within a seven day period of using Safari, the timer resets to zero, and your service worker gets a stay of execution. Lucky you.

This only applies to Safari. So if your site has been added to the home screen and your web app manifest has a value for the “display” property like “standalone” or “full screen”, the seven day timer doesn’t apply.

That piece of information was missing from the initial blog post. Since the blog post was updated to include this clarification, some people have taken this to mean that progressive web apps aren’t affected by the upcoming change. Not true. Only progressive web apps that have been added to the home screen (and that have an appropriate “display” value) will be spared. That’s a vanishingly small percentage of progressive web apps, especially on iOS. To add a site to the home screen on iOS, you need to dig and scroll through the share menu to find the right option. And you need to do this unprompted. There is no ambient badging in Safari to indicate that a site is installable. Chrome’s install banner isn’t perfect, but it’s better than nothing.

Just a reminder: a progressive web app is a website that

  • runs on HTTPS,
  • has a service worker,
  • and a web manifest.

Adding to the home screen is something you can do with a progressive web app (or any other website). It is not what defines progressive web apps.

In any case, this move to delete service workers after seven days of using Safari is very odd, and I’m struggling to find the connection to the rest of the blog post, which is about technologies that can store data.

As I understand it, with the crackdown on setting third-party cookies, trackers are moving to first-party technologies. So whereas in the past, a tracking company could tell its customers “Add this script element to your pages”, now they have to say “Add this script element and this script file to your pages.” That JavaScript file can then store a unique idenitifer on the client. This could be done with a cookie, with Local Storage, or with Indexed DB, for example. But I’m struggling to understand how a service worker script could be used in this way. I’d really like to see some examples of this actually happening.

The best explanation I can come up with for this move by Apple is that it feels like the neatest solution. That’s neat as in tidy, not as in nifty. It is definitely not a nifty solution.

If some technologies set by a specific domain are being purged after seven days, then the tidy thing to do is purge all technologies from that domain. Service workers are getting included in that dragnet.

Now, to be fair, browsers and operating systems are free to clean up storage space as they see fit. Caches, Local Storage, Indexed DB—all of those are subject to eventually getting cleaned up.

So I was curious. Wanting to give Apple the benefit of the doubt, I set about trying to find out how long service worker registrations currently last before getting deleted. Maybe this announcement of a seven day time limit would turn out to be not such a big change from current behaviour. Maybe currently service workers last for 90 days, or 60, or just 30.

Nope:

There was no time limit previously.

This is not a minor change. This is a crippling attack on service workers, a technology specifically designed to improve the user experience for return visits, whether it’s through improved performance or offline access.

I wouldn’t be so stunned had this announcement come with an accompanying feature that would allow Safari users to know when a website is a progressive web app that can be added to the home screen. But Safari continues to ignore the existence of progressive web apps. And now it will actively discourage people from using service workers.

If you’d like to give feedback on this ludicrous development, you can file a bug (down in the cellar in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying “Beware of the Leopard”).

No doubt there will still be plenty of Apple apologists telling us why it’s good that Safari has wished service workers into the cornfield. But make no mistake. This is a terrible move by Apple.

I will say this though: given The Situation we’re all living in right now, some good ol’ fashioned Hot Drama by a browser vendor behaving badly feels almost comforting.

Going offline at Indie Web Camp Düsseldorf

I’ve just come back from a ten-day trip to Germany. The trip kicked off with Indie Web Camp Düsseldorf over the course of a weekend.

IndieWebCamp Düsseldorf 2017

Once again the wonderful people at Sipgate hosted us in their beautiful building, and once again myself and Aaron helped facilitate the two days.

IndieWebCamp Düsseldorf 2017

Saturday was the BarCamp-like discussion day. Plenty of interesting topics were covered. I led a session on service workers, and that’s also what I decided to work on for the second day—that’s when the talking is done and we get down to making.

IndieWebCamp Düsseldorf 2017 IndieWebCamp Düsseldorf 2017 IndieWebCamp Düsseldorf 2017 IndieWebCamp Düsseldorf 2017

I like what Ethan is doing on his offline page. He shows a list of pages that have been cached, but instead of just listing URLs, he shows a title and description for each page.

I’ve already got a separate cache for pages that gets added to as the user browses around my site. I needed to figure out a way to store the metadata for those pages so that I could then display it on the offline page. I came up with a workable solution, and interestingly, it involved no changes to the service worker script at all.

When you visit any blog post, I put metadata about the page into localStorage (after first checking that there’s an active service worker):

if (navigator.serviceWorker && navigator.serviceWorker.controller) {
  window.addEventListener('load', function() {
    var data = {
      "title": "A minority report on artificial intelligence",
      "description": "Revisiting Spielberg’s films after a decade and a half.",
      "published": "May 7th, 2017",
      "timestamp": "1494171049"
    };
    localStorage.setItem(
      window.location.href,
      JSON.stringify(data)
    );
  });
}

In my case, I’m outputting the metadata from the server, but you could just as easily grab some from the DOM like this:

var data = {
  "title": document.querySelector("title").innerText,
  "description": document.querySelector("meta[name='description']").getAttribute("contents")
}

Meanwhile in my service worker, when you visit that same page, it gets added to a cache called “pages”. Both localStorage and the cache API are using URLs as keys. I take advantage of that on my offline page.

The nice thing about writing JavaScript on my offline page is that I know the page will only be seen by modern browsers that support service workers, so I can use all sorts of fancy from ES6, or whatever we’re calling it now.

I start by looping through the keys of the “pages” cache (that’s right—the cache API isn’t just for service workers; you can access it from any script). Then I check to see if there is a corresponding localStorage key with the same string (a URL). If there is, I pull the metadata out of local storage and add it to an array called browsingHistory:

const browsingHistory = [];
caches.open('pages')
.then( cache => {
  cache.keys()
  .then(keys => {
    keys.forEach( request => {
      let data = JSON.parse(localStorage.getItem(request.url));
        if (data) {
          data['url'] = request.url;
          browsingHistory.push(data);
      }
    });

Then I sort the list of pages in reverse chronological order:

browsingHistory.sort( (a,b) => {
  return b.timestamp - a.timestamp;
});

Now I loop through each page in the browsing history list and construct a link to each URL, complete with title and description:

let markup = '';
browsingHistory.forEach( data => {
  markup += `
<h2><a href="${ data.url }">${ data.title }</a></h2>
<p>${ data.description }</p>
<p class="meta">${ data.published }</p>
`;
});

Finally I dump the constructed markup into a waiting div in the page with an ID of “history”:

let container = document.getElementById('history');
container.insertAdjacentHTML('beforeend', markup);

All those steps need to be wrapped inside the then clause attached to caches.open("pages") because the cache API is asynchronous.

There you have it. Now if you’re browsing adactio.com and your network connection drops (or my server goes offline), you can choose from a list of pages you’ve previously visited.

The current situation isn’t ideal though. I’ve got a clean-up operation in my service worker to limit the number of items stored in my “pages” cache. The cache never gets bigger than 35 items. But there’s no corresponding clean-up of metadata stored in localStorage. So there could be a lot more bits of metadata in local storage than there are pages in the cache. It’s not harmful, but it’s a bit wasteful.

I can’t do a clean-up of localStorage from my service worker because service workers can’t access localStorage. There’s a very good reason for that: the localStorage API is synchronous, and everything that happens in a service worker needs to be asynchronous.

Service workers can access indexedDB: it’s asynchronous. I could use indexedDB instead of localStorage, but I’m not a masochist. My best bet would be to use the localForage library, which wraps indexedDB in the simple syntax of localStorage.

Maybe I’ll do that at the next Homebrew Website Club here in Brighton.

Twilight

I went out to The Albert the other night to see Twilight Hotel play. There were really good, so after the show I bought their CD, When The Wolves Go Blind.

It was only when I got home that I realised that I had no device that could play Compact Discs. I play all my music on my iPod or iPhone connected to a speaker dock. And my computer is a Macbook Air …no disc drive. So I had to bring the CD into work with me, stick into my iMac and rip the songs from there.

It’s funny how format (or storage medium) obsolescence creeps up on you like that. I wonder how long it will be until I’m not using any kind of magnetic medium at all.

Simple Storage Service

Amazon’s new S3 service looks very interesting indeed. At first glance, it just looks like a very cheap way of storing and retrieving files — which it is — but the really fascinating aspect is that there is no user interface. It is purely a web service. As Sam Newman says:

When you get down to it, Amazon S3 is simply a large, distributed hash map with an API. Unless people build applications on top of it, it’s useless.

The creators of S3 have gone out of their way to keep the architecture as simple as possible. This is a smart move. I’m a great believer in the power of stupid networks.

Leaving aside the underlying technology, S3 is good news in purely practical terms. If nothing else, this should start a price war for data storage. Yet another barrier to entry has been lowered for anyone looking to publish anything online. Odeo and YouTube are good for audio and video respectively, but the agnostic nature of S3 means that you can store and stream on your own terms.

Hardware has been getting cheaper and cheaper for some time. Now it looks like bandwidth is heading the same way (for some amusing anecdotes on bandwidth issues, be sure to listen to Bernie Burns’ keynote from SXSW).

I’m looking forward to playing around with S3. For a service with no face, it sure looks like it’s got legs.