I am quite smug because I recently solved a messy problem that had been bothering me every now and then, namely assigning unique serial numbers to multiple processes doing the same job simultaneously. This may sound basic but it's actually quite a serious problem.
Let's say you have a forum. A new user signs up and she is user number 1. The next user is No. 2. The next one is No. 3. And so on.
When a user signs up, the system needs to know what number is next. For the fourth user, it can count all the existing users, add one, and get 4. Or it can write to a file called user_serial_number.txt each time it assigns a number. Then, when assigning a new number, it looks at the file and sees that the entry is 3. It assigns 4, and saves 4 to user_serial_number.txt.
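Here is a minimal sketch of the counter-file approach in Perl. The helper name next_serial_naive and the temporary directory are my own assumptions for illustration. Note that nothing in it stops two processes from reading the same value, which is exactly the weakness discussed next.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: read the last assigned number, add one, write it back.
# Nothing here prevents two processes from reading the same value.
sub next_serial_naive {
    my ($path) = @_;
    my $last = 0;
    if (open my $in, '<', $path) {
        my $line = <$in>;
        close $in;
        $last = $1 if defined $line && $line =~ /^(\d+)/;
    }
    my $next = $last + 1;
    open my $out, '>', $path or die "cannot write $path: $!";
    print {$out} "$next\n";
    close $out or die "cannot close $path: $!";
    return $next;
}

# Demo in a throwaway directory; a single process, so no collisions here.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/user_serial_number.txt";
my @assigned = map { next_serial_naive($file) } 1 .. 3;
print "assigned: @assigned\n";
```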
But what happens when you have a large number of users signing up in a hurry? Your app just went viral and you have 3,000 users joining per day. Two users might join simultaneously: both processes read 3 before either has written 4, and both get assigned the same serial number. This would cause their files to get merged and their data to be intermingled. (That actually happened, in March 2008, with my project Qassia. I had trouble debugging it because both of them - and this is one of God's little pranks - happened to be into fish.)
The typical solution is to use a lockfile.
The user comes along, and before you assign the new serial number, you put in a lockfile. This is a file saved to a particular location and it may be called lockfile.txt, or lockfile_4567.txt (the process ID), or lockfile_1234567890 (the epoch time).
You then go and assign a serial number, and if another user comes along while this is happening, the system will see the lockfile, and wait.
Once the first user's serial number has been assigned, the lockfile is removed, and the second user's process can put in a new lockfile, and so on.
The problem with this is that it gets awfully, awfully messy. High-performance sites under a lot of load may fail to properly delete the lockfile. The second process then has to wait forever, effectively blocking all new sign-ups. Or it can wait a prescribed number of seconds, such as 30, then delete the lockfile itself. And you have the problem of lockfiles being written simultaneously, so after writing the lockfile you have to go and check whether someone else also tried to write one. It gets messy, like I said.
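To give a feel for the moving parts, here is a rough sketch of a lockfile scheme with a 30-second stale timeout. It is only one possible variant: it creates the lockfile with sysopen and O_EXCL, so that the create fails if the file already exists, and it assumes the filesystem honors that flag. The names try_lock, acquire_lock and release_lock are hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(O_CREAT O_EXCL O_WRONLY);
use File::Temp qw(tempdir);

# Try to take the lock; returns 1 on success, 0 if the lockfile already exists.
sub try_lock {
    my ($lockfile) = @_;
    sysopen(my $fh, $lockfile, O_CREAT | O_EXCL | O_WRONLY) or return 0;
    print {$fh} "$$\n";    # record the owner's process ID
    close $fh;
    return 1;
}

# Wait for the lock, removing a lockfile older than $stale_after seconds.
sub acquire_lock {
    my ($lockfile, $stale_after) = @_;
    until (try_lock($lockfile)) {
        my $mtime = (stat $lockfile)[9];
        if (defined $mtime && time() - $mtime > $stale_after) {
            unlink $lockfile;    # assume the owner died; this step is itself racy
            next;
        }
        select(undef, undef, undef, 0.1);    # sleep 100 ms before retrying
    }
    return 1;
}

sub release_lock { my ($lockfile) = @_; return unlink $lockfile }

# Demo in a throwaway directory.
my $dir  = tempdir(CLEANUP => 1);
my $lock = "$dir/lockfile.txt";
acquire_lock($lock, 30);
my $second_attempt = try_lock($lock);    # fails while the lock is held
release_lock($lock);
print "second attempt while locked: $second_attempt\n";
```

Even in this small form you can see the trouble: the stale-lock cleanup can delete a lockfile that a slow but living process still owns.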
What you could do is cheat and say, to hell with linear serial numbers (i.e. 1, 2, 3, 4, 5, etc.) and assign serial numbers with the process ID attached. If two users arrive simultaneously for #4, you get:
4_4567
4_4568
where 4567 and 4568 are the process IDs assigned by the OS.
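In Perl the process ID is available as $$, so a sketch of this cheat is tiny. The helper name tagged_serial is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: tag a possibly duplicated serial number with this process's ID.
sub tagged_serial {
    my ($serial) = @_;
    return $serial . '_' . $$;    # $$ is the current process ID
}

print tagged_serial(4), "\n";    # something like 4_4567, depending on the PID
```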
But I hate cheating. A much more elegant solution is to treat serial numbers like the priority numbers you get at bank lobbies.
You write a script to pre-create a bunch of text files, one for each serial number:
1.txt, 2.txt, 3.txt, 4.txt, 5.txt... all the way up to 100,000,000.txt or whatever you are comfortable with.
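A minimal sketch of such a pre-creation script might look like this. The helper name create_stubs is an assumption, and the demo only creates 1,000 stubs in a temporary directory rather than a real stub directory.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: create empty stub files $first.txt .. $last.txt in $dir.
sub create_stubs {
    my ($dir, $first, $last) = @_;
    for my $n ($first .. $last) {
        open my $fh, '>', "$dir/$n.txt" or die "cannot create $dir/$n.txt: $!";
        close $fh;
    }
    return $last - $first + 1;    # number of stubs created
}

# Demo in a throwaway directory.
my $dir   = tempdir(CLEANUP => 1);
my $count = create_stubs($dir, 1, 1000);
print "created $count stubs in $dir\n";
```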
These are your stubs to be handed out to processes that want them, one by one. When a process comes along and needs a number, it grabs the lowest one, by deleting it:
$result = unlink ("123.txt");
If it managed to delete it, the scalar $result will be "1", and "123" belongs to the happy process. But if another process came along quicker and deleted "123.txt" first, $result will be "0", and it has to try to snatch another number all over again. Which isn't a problem: it can keep doing so and eventually it will get one.
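The retry loop can be sketched as follows. How a process finds the lowest remaining number is not spelled out above, so this version simply assumes it walks upward from a starting number until an unlink succeeds. The helper name claim_serial and the range arguments are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: claim the lowest free number between $start and $max.
# unlink returns the number of files it deleted, so 1 means we won the stub.
sub claim_serial {
    my ($dir, $start, $max) = @_;
    for my $n ($start .. $max) {
        return $n if unlink("$dir/$n.txt") == 1;
    }
    return;    # no stubs left in the range
}

# Demo: pre-create stubs 1.txt .. 5.txt in a throwaway directory.
my $dir = tempdir(CLEANUP => 1);
for my $n (1 .. 5) {
    open my $fh, '>', "$dir/$n.txt" or die "cannot create stub $n: $!";
    close $fh;
}
unlink "$dir/1.txt";    # pretend another process already took number 1

my $first  = claim_serial($dir, 1, 5);
my $second = claim_serial($dir, 1, 5);
print "claimed $first, then $second\n";
```

Only one process can succeed in deleting a given file, so the deletion itself decides the winner and no separate lock is needed.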
This technique is pretty much foolproof. In this post I used the task of assigning serial numbers to users as an example, but you can use it in much hairier situations. I used it when crawling 150,000 web pages per day for Qirina. Crawling is slow, so you need to have multiple crawlers running simultaneously. Each of them will be retrieving 1-3 pages a second, on average, so if you don't have a good assignment scheme, you WILL end up with duplicate serial numbers.