Reward-Based, Positive Reinforcement, and Clicker-Training
February 8, 2011
History and Theory: Lures, Rewards, Whistles, Clickers
Contemporary, reward-based animal training has its roots in marine-mammal training. During the mid-twentieth century, people began trying to train captive dolphins and orcas using negative reinforcement, or coercion methods, just as people had long trained dogs and horses. The trainers prodded and poked the cetaceans, and the animals reacted in one of two ways: they would either take a big breath, sink to the bottom in the center of the pool, and stay there until they couldn’t hold their breath any longer, or they would swim to the far side and keep the entire pool between themselves and the trainer. In no instance did any of the animals even attempt a behavior in response to being pushed or prodded.
This complete lack of success with coercion, combined with the simple practical problem of how to force an orca, or even a dolphin, to jump in the air, or how to punish one for not doing something (imagine hitting a 2,000-pound carnivore with a rolled-up newspaper), caused the trainers and biologists to rethink their methods and start from a different place altogether. It took a lot of experimentation, but what they ended up doing worked so well that the trainers could teach the animals to do things they hadn’t even considered possible. Most of us have seen, at least on television, coordinated acrobatic displays of dolphins and orcas leaping, flipping, towing humans, even swimming backwards with most of their bodies out of the water.
The trainers taught the animals to do these things by using positive reinforcement and by shaping the performers’ behavior using markers. With marine mammals, the best marker was found to be a whistle, and this is the precursor to the clicker and clicker-training used with dogs, cats, horses and even chickens today. The first step was to get the cetaceans to identify the whistle as a reward in itself. Like Pavlov with his bells and metronomes, the trainers blew the whistle each time the animals got a fish. Very quickly, the animals associated the whistle with something positive: being fed a fish. This allowed the trainers to use the whistle as an immediate reward: it is difficult or impossible to get a fish into the mouth of a dolphin at exactly the moment it performs a requested action, but relatively simple to blow the whistle at just the right moment. The animals learned that the sound of the whistle meant a fish was coming, just not at that moment, and so the whistle became both its own reward and a marker.
To teach a dolphin a simple (for a dolphin) trick like jumping out of the water, the trainers waited until the dolphin did it on its own, then marked that behavior with the whistle and rewarded it with a fish. To teach the dolphin to do a flip, they would lure it by using a fish on a pole. Eventually, the dolphin would do the flip as it followed the fish in the air. When it did, the trainers would mark that with the whistle and reward it with a fish. Very quickly, the tricks got more complicated, and possibly the most amazing part was that other untrained dolphins would mimic the trained ones, hoping for their own whistles and fish.
In 1963, a biologist named Karen Pryor signed on as a dolphin trainer at Sea Life Park in Hawaii. She was an expert dog and horse trainer who had had great success using traditional methods with those animals, but she had to relearn everything when working with cetaceans. When she left Sea Life Park several years later, Pryor tried using the same general principles she had learned in working with sea mammals on her dogs and horses. The effects were immediately obvious, and quite astounding. In one year she trained an adult dog for obedience competition. Conventional wisdom up to that time said that serious obedience competitors needed at least two years, and that starting with an adult dog, instead of an adolescent, was all but a waste of time. She entered her dog and won, and did it without using any “corrective” techniques or equipment at all: no choke chain, no hitting, not even the use of “No!” To help prove this was not a fluke, as many other trainers claimed, she taught her methods to another trainer, who used an adult dog with little or no obedience training. This time, it took only six months, and the new trainer and dog won the same competitions Karen Pryor had won the previous year. People began to notice, and asked for more information.
Pryor did more experimentation and research, and trained horses, dogs, chickens (it turns out chickens are pretty smart, given the chance to learn things) and even people using what she calls “positive reinforcement shaping and training”. She has published many papers and articles, but is best known for her bestselling book, Don’t Shoot the Dog, which lays out the history, principles, and benefits of reward-based, positive-reinforcement training. As with most technological breakthroughs, many people were working on similar theories and ideas at the same time (who really invented the radio, the computer mouse interface, or the automobile?). It took a while for these general ideas to become mainstream, but now these methods are the norm, while just 20 years ago most dog-training methods involved force and coercion instead of rewards and reinforcement. Don’t Shoot the Dog seems to be the most dramatic and important (but far from the only) catalyst for the movement.
Actual Method: Priming
To learn clicker training, I recommend working with a professional trainer, or getting a book on the specific topic. The new, revised edition of Don’t Shoot the Dog has a chapter on clicker training, and there are whole books on the subject available at many large booksellers and pet-supply stores. Click for Joy, listed in this site’s bibliography, is also a great resource. I will explain how I do it; I have had some success with it.
The first step is to get the dog (or cat, chicken, what have you) primed for the clicker. This is fun; we get to play Pavlov. It is as simple as putting together a bag of treats and getting a clicker, then feeding the animal the treats, and clicking each time. Timing is important! The click should happen just as the dog puts her mouth on the treat. Wait a few seconds, until the dog has completely finished the treat and put her attention back to you, and then do it again. Ten or so times per session is about right, and do it a few times a day, several days in a row. At the end of a session, pocket the clicker and show your empty hands, saying “That is all!” or “All done!” or something to let your pet know the session is over.
It is not essential, but it is helpful to have different kinds of treats for this, and for training in the future as well. I use a combination of small (puppy size) dog biscuits in various flavors, cut-up pieces of dog loaf (like Natural Balance Dog Food Roll), and freeze-dried liver, heart, and lung. You only need very small pieces for this to be an effective reward. The reason it is better to have different foods in your treat bag is based on a principle which has been shown to apply to most animals and especially people. No matter how much the dog likes his favorite thing, he will get tired of it. And no matter how much he likes consistency, randomization is almost always preferred. Apply this randomization to later training, too, whenever it is convenient. Think about it in terms of a slot machine that always pays something. People blow their life savings on slot machines for a reason: hoping for something great causes us to pull that arm, and deposit our money, again and again. Same with animals! Randomizing their reward causes them to be more interested.
Actual Method: Shaping
Once the dog recognizes the click as something good, it can be used as a marker for a behavior we like. And the best part is it can be used as a marker for getting close, thus teaching the dog that it is getting the right idea. The first thing I taught my dog, Iggy, to do using a clicker was to catch popcorn in his mouth. First I made a plan. Making a plan is important because it helps avoid confusing the dog. How can we avoid confusing our pets if we are confused ourselves? My goal was for my dog to catch popcorn in his mouth when I threw it to him. My plan was to shape his behavior by rewarding him, at first, for anything that was even vaguely moving in the right direction. I decided this meant that at first I would click if the popcorn even hit his muzzle (but not any other part of his head). After a few sessions of this, I planned to click only if he attempted to catch it, even if he didn’t succeed. After a few sessions of that, he would only get a click if he actually made the catch. Also important: I would not throw the popcorn unless he was in front of me and attentive (looking at me). I decided I would not require him to be sitting, although requiring that would have been a reasonable choice.
Timing is very important, and I found it challenging to get this right. The click should come at the instant that the behavior happens, so this meant clicking right as the popcorn hit his muzzle, then right as he lunged for it, then right as he caught it. It was harder than it sounds, but I kept trying and Iggy learned this trick in about three days. It is important to limit sessions. How long is appropriate varies with the animal, but 20 minutes would be the maximum for an adult dog who is fairly patient and already somewhat trained. For a cat, I would keep it to 5 minutes at first, and for a puppy I would try 10. Better to end early than late! It is also always best to end upbeat. I always make the last thing we do in a training session something that Iggy has down cold, like “Sit.” We also begin with things he already knows, to get warmed up. The whole session should be fun; if you get frustrated or pissed, end the session and start again later. All training should be fun!!!
The popcorn trick is a fun one for the dog, because there are a lot of rewards happening, and no downside. If the dog misses the popcorn altogether, he still gets to eat it off the floor. If the dog gets a click because it hit his muzzle, but then eats it off the floor, well, great! He is happy: he still gets the popcorn, and he earned a click! With practice, though, Iggy began to get the secondary part of the reinforcement: catching popcorn means the next piece comes faster. He doesn’t have to spend time rooting around on the floor looking for it, or resetting himself for my next throw. As he got better at it, I would only click if he actually caught it; close didn’t count anymore. This is because he now knew the goal, and just needed practice to get good at it. The popcorn is plenty of reward!
Clicking is Temporary!
The clicker is a tool used to teach. Saying “Yes!” or “Good!” at exactly the right moment works the same way as a click. The reason the clicker is so great is that it is completely consistent: it sounds the same whether the trainer is having a bad day, is hoarse from a cold, is depressed or moody, or is a different person altogether (like a spouse or sibling). The clicker is also a great tool because the sound is unique, and is not likely to be encountered in the household or out in the world, whereas a dog is bound to hear lots of “Yes!” and “Good!” in its normal course of life. I have taught Iggy lots of things using the clicker, but I rarely take one on walks or to the park. He knows how to come to the front, heel, spin, etc., and while he learned those things with the clicker, once he knows them, he doesn’t need the clicker anymore! His rewards for these things are either treats (I usually carry a treat bag), affection (he loves to have his chest rubbed), or even verbal praise (Good, good dog! Yes!).