« How to Organize a Database Table’s Keys for Scalability | Main | 17 Distributed Systems and Web Scalability Resources »

January 04, 2009



The first link of the article needs to be fixed.

Max Indelicato

Hi Oren,

Thanks, I've corrected the link.

Robert Brewer

I don't quite follow your use of "highly available". The schemes you outlined seem to depend on specific memcached servers being reachable at all times. For example, what happens to the insert process if the server (to which the key 'activeInsertUserShard' hashes) goes down?

Robert Brewer

I don't quite follow your use of "highly available". The schemes you outlined seem to depend on specific memcached servers being reachable at all times. For example, what happens to the insert process if the server (to which the key 'activeInsertUserShard' hashes) goes down?


If you want to maximize the number of records memcached can store, you could tweak the memcached memory block sizes. You'll need to recompile it to do that, but you could save lots of memory. If memory serves me, the relevant default block sizes are 64 and 128 bytes, which would mean that for each stored GUID you lose another 50-ish bytes to internal fragmentation. Tweaking that could give you 50% more stored keys.

Paolo Casciello

It doesn't seem transaction-safe...
If there's a problem in the MC's key upgrade what happens? You must re-implement a transaction-safe logic.

Or I missed something?


You could also use memcachedb instead of memache and no information will be dropped when you run out of memory.

Max Indelicato


Thanks for your comments. I'll try to reply to all of them here.

@Robert Brewer
This strategy is highly available in that it uses Memcache, which is in itself highly available. There are different ways to achieve high availability, none of which truly ensure 100% uptime. Using Memcache as I've described it achieves high availability in that if one index node goes down, only a subset of the overall index is down. That's high availability in action right there.

To use your example - if the shard which holds the 'activeInsertUserShard' goes down, we would have a problem. A work around for this is to replicate that value amongst multiple keys which are hashed to different servers. Regardless, in the worst case, you wouldn't be able to insert new users and that's all. No major data outages would occur as a result, beyond the indexed data that was on the downed node. The idea here is to minimize the effect of downed nodes by having the indexed data distributed amongst a number of servers.

Good point. Alternatively, you could remove the memory reclamation feature and it would server as an in-memory version of a database. When you've hit a server's memory limit, you would just receive an error and not an overwrite. This behavior would be far more predictable and could yield decent results.

@Paolo Casciello
You're correct, Memcache isn't transactional. However, that shouldn't have a major impact on the usage of Memcache in this way (or even in the normal way i.e. to cache data). Many distributed systems trade-off between scalability and consistency, and end up with eventual consistency. To ensure eventual consistency, you could have an out-of-band process checking data intermittently, looking for signs of a loss in data integrity. I believe Flickr uses something like this.

Good point. I haven't looked into memcachedb yet. Sounds like that may be an existing solution to the memory reclamation issue that occurs when using this strategy.


Wouldn't be wise to use memcached just like we use it in web app, that is to still have a shard for indexing purpose. Each time we need an info from the index, we query memcached. If found, we use it, if not, we query the shard for the info, store it in memcached for next time, and use it.

That way, there would be no need to re-load the data into memcached and even if a memory reclamation happens, it wouldn't so bad.

What do you think?

José Borges Ferreira

You can even use MySql memcached UDFs ( http://tangent.org/586/Memcached_Functions_for_MySQL.html ) and store procedures (or triggers) to update cache when you do inserts, updates or deletes on the database.

And i second Martin regarding the behavior of the client. You should always try to fetch from cache. Then if fail then fetch from DB and set cache.


"It should be highly available. The failure of any single node should, ideally, not result in any data being unavailable for an index lookup, and at worst, not result in a majority percentage of data being unavailable for an index lookup. Minimizing the impact of failed instances is very important."

You didn't discuss your strategy for this requirement. HA is the biggest issue since the data isn't persistent.

Ryan Schneider

Many people have already mentioned that the data being placed in MC as proposed in this article is not persistent and if you run out of memory you will start to "loose date". How about all the items that are more likely to happen such as maintenance and upgrades of MC. These are times when the MC instance will need to come down.

Max Indelicato

You certainly could, and there's nothing stopping you from implementing that on top of this strategy. However, I'm proposing to use Memcache in a different way - explicitly using it as in in-memory key/value datastore for a shard index.

@Jose Borges Ferreira
Good idea, that could make updating Memcached for index related inserts even easier. I'll have to look into that some more.

The data is persistent, just not within Memcache. You can always re-retrieve the index data from the MySQL source dataset. That behavior is how you would achieve HA; by reloading data from the source shards as necessary. I briefly address this process in the "Loading Index Data" section.

@Ryan Shneider
Please see my above reply to Chris. Taking down the MC instance may be unavoidable as a result of Memcache's lack of persistence. Using MemcacheDB might be a work around to that. Otherwise it may be necessary to just do a reload of the whole Memcache cluster. This could likely be done fairly quickly (seconds or minutes) if there are enough shards and enough Memcache nodes to distribute the load of the reloading operation, and there likely will be if you've planned ahead appropriately.

A quick note about where I was going with this article; I responded to a few of the same questions on Hacker News and wanted to re-post one of my responses here, to give further incite into the underlying purpose of this article.


"My intent in using Memcache as the mechanism for this type of indexing was more an attempt to relate the subject with something that most developers are at least somewhat familiar with. I explicitly address the weaknesses of Memcache in the section titled "Weaknesses". I also recommend some work-arounds to lessen the effect of Memcache's limitations. In the "Wrap-Up" section, I even go so far as to say that Memcache is really only one example of a distributed hash-table and that there are alternatives. Rolling your own is another option, and both of solutions are probably better suited to the indexing problem than Memcache.

Again, I was afraid that the subject would be lost on most people if there wasn't at least some relation made between the concept and something that concretely exists. Try to look at it as an exercise in thinking "outside the box", using the best example I could think of.

Thanks for the comments, good and bad. These critiques really do help."

José Borges Ferreira

Just a sidenote regarding the so called memcached "weakness".
Memcached is a cache. Like any other cache, it doesn't have to hold all the information. Assuming a nice read/write ratio , with Memcache the speed to the information will boost and your databases load will be lower!

If you really need to have all your data in cache add dedicated cluster. To see some impressive numbers take a look at FaceBook where they have "(...)more than 800 servers supplying over 28 terabytes of memory to our users(...). ( http://www.facebook.com/note.php?note_id=39391378919 )

Btw: Max, keep up the posts :p

The comments to this entry are closed.


  • Max Indelicato. Chief Software Architect and technology enthusiast.

Other Services I Use

January 2009

Sun Mon Tue Wed Thu Fri Sat
        1 2 3
4 5 6 7 8 9 10
11 12 13 14 15 16 17
18 19 20 21 22 23 24
25 26 27 28 29 30 31