Mailing List Archive
Re: [tlug] Read my Firefox's cache contents on Linux
- Date: Thu, 14 Sep 2006 11:00:43 +0900
- From: <stephen@example.com>
- Subject: Re: [tlug] Read my Firefox's cache contents on Linux
- References: <45067A41.1000208@example.com> <45068322.1060008@example.com> <4506B576.7030702@example.com> <45078AF2.2030106@example.com> <d8fcc0800609130035v3c78a3c8i1cf34d61b32ec247@example.com> <45088E31.7080308@example.com>
Dave M G writes:

> I'm a little confused right now about how caching works.

No, you're not, because there is no prescribed "how" to understand. There is only whatever the application happens to do. In computer parlance, a *cache* is not only a collection of valuables (here, valuable data), it is also *by definition* optional and ephemeral.

Caching is transparent to the process; you only notice it by "side effects" such as disk usage and (presumably) snappier performance. For example, when you tab between page A and page B, the displayed portion of page A is cached in graphics memory, and when you switch back from page B to page A, the redisplay is instant. If you now scroll page A, the content was cached in memory and/or on disk, and display updating is instantaneous as far as a human can tell. In theory, the browser *could* go back out to the web and refetch the current page every time you tab, but you wouldn't put up with a browser like that, would you?

> Is it possible for a web site, by JavaScript or some other method
> to stop itself from being written to a cache?

No. All networking requires buffers, and a cache is nothing more nor less than a buffer that somebody intentionally failed to clean up. What a web site can do (as Birkir pointed out) is advise the application that caching will be ineffective because the data is volatile. Since resources are limited, a smart application will decide to cache those things that are (a) expensive to reacquire or (b) very likely to be reused, and preferably both, while releasing the space occupied by data that is neither. However, IIRC the no-cache pragma is aimed entirely at proxies, and the Expires header is advisory, and certainly is not going to result in a new fetch every time you hit PageDown!

> I tried an experiment where (after backing up), I deleted the contents
> of my cache entirely. Then I went to a web page which has dynamic
> content, and is on the same web site from which I originally got the
> data I'm after.
> I closed FireFox and did nothing else. So, in theory, the cache should
> contain only my one visit to that site.

That theory is wrong. The cache might contain nothing. It might also contain content from every page referenced by the page you visited, plus content from everything in your bookmarks and history. The latter would be a hellaciously aggressive cache. When taken to the petabyte extreme, it's also known as "Google".

> And then I searched for text that I know for a fact to have been on that
> web site because I just looked at it.

No, you don't *know* that, unless you are very sophisticated indeed. What is in the cache could be the raw object, in various stages of partial decoding. For example, it might be UTF-16 text, so if you search for "Dave" you will not find it, whereas "D\000a\000v\000e" would get a hit.

> And yet, nothing. I can find all sorts of images referenced, from banner
> ads and whatnot, and pretty much anything *except* the main body text of
> the web page in question.

That's easy to explain. Images are large and static, expensive to fetch, and likely to be reused. They're going to end up in the disk cache. Text is small and dynamic. It will be cached in memory until you close the containing tab or window, because you might tab back to it and scroll. But at that point it will be redundant, and the browser will release it sometime after the cache fills.

> So either I don't know how caches work, or the main bulk of the text on
> the web page is somehow avoiding being cached, or I'm still not
> searching using the right methodology.
>
> Any hints or advice on this?

It would be nice if you'd tell us why you want to do this, for starters. You keep telling us what operations you've tried, and that they didn't work. "Well, OK, Dave, that didn't work" is about all we can really say for sure.

My suggestion at this point is "learn to use wget". Browsers are not designed to save everything they download.
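As an aside, the UTF-16 point above is easy to demonstrate. This is just a sketch, assuming GNU grep (for -P) and iconv; the file name is made up and stands in for a real cache entry:

```shell
# Encode "Dave" as UTF-16LE: each ASCII letter followed by a NUL byte.
printf 'Dave' | iconv -f UTF-8 -t UTF-16LE > cache-entry.bin

# A plain-text search misses, because of the interleaved NUL bytes:
grep -ac 'Dave' cache-entry.bin                # prints 0

# Spelling out the NULs, as in the post, gets a hit:
grep -acP 'D\x00a\x00v\x00e' cache-entry.bin   # prints 1
```

Real cache entries may also be compressed or otherwise encoded, in which case no text search will find them directly.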
Browsers are designed to show you pretty pictures quickly; they cache what they need to achieve the highest performance possible given bounds on network responsiveness, and you decide what you want to save permanently. If what you want is sufficiently dynamic, you may have to write your own browser. Or maybe you can just drill down using the DOM explorer.

If you really want to, you could try something like

    $ ( ulimit -d 5000; firefox ) &

The ulimit command restricts the amount of memory firefox is allowed to use, which theoretically should force it to put practically everything into the disk cache (or crash it; my money's on the latter).

HTH
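For what it's worth, the scope of that ulimit trick can be checked without risking a firefox crash. A minimal sketch, assuming bash, with the limit in kilobytes as ulimit -d reports it:

```shell
# The limit applies only inside the ( ... ) subshell, so the invoking
# shell's own data-segment limit is untouched.
( ulimit -d 5000; ulimit -d )   # prints 5000
ulimit -d                       # parent limit, often "unlimited"
```

This is why the parentheses matter: without them, the lowered limit would stick to your login shell for the rest of the session.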
- Follow-Ups:
  - Re: [tlug] Read my Firefox's cache contents on Linux
    - From: Josh Glover
- References:
  - [tlug] Read my Firefox's cache contents on Linux
    - From: Dave M G
  - Re: [tlug] Read my Firefox's cache contents on Linux
    - From: Dave M G
  - Re: [tlug] Read my Firefox's cache contents on Linux
    - From: Alain Hoang
  - Re: [tlug] Read my Firefox's cache contents on Linux
    - From: Dave M G
  - Re: [tlug] Read my Firefox's cache contents on Linux
    - From: Josh Glover
  - Re: [tlug] Read my Firefox's cache contents on Linux
    - From: Dave M G