Rewards and Punishments: the role of operant conditioning in human behavior

From minimizing a dog’s barking to training dogs to serve as human companions to law enforcement’s drug detectors; from shaping the behavior of children and adults with autism to all stages of the educational ladder, B.F. Skinner’s Operant Conditioning is used to shape both human and animal behavior. We all use Operant Conditioning to mold the behavior of those around us, and it is used on each of us, without most of us knowing or understanding its mechanisms.

The story of B.F. Skinner’s Operant Conditioning begins with Edward Thorndike’s Law of Effect. In the late 1800s and early 1900s, people noticed that stray cats in New York were disappearing. The cats would later be found in Edward Thorndike’s laboratory. Thorndike’s experiment was simple: place a cat in a cage and see how long it takes the cat to discover that a latch opens the door to its freedom. As expected, the cat took longer to escape on the first trial than on subsequent trials. The second part of the experiment was also fairly simple: see whether the behavior repeats in the future. Thorndike’s Law of Effect proposes that if an action produces a desirable outcome, the likelihood of that behavior repeating will increase. However, if an action produces an undesirable outcome, the likelihood of that behavior repeating will decrease.
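
As a rough illustration of the Law of Effect, the sketch below nudges the likelihood of a behavior up after a desirable outcome and down after an undesirable one. The function name `update_likelihood`, the starting likelihood, and the step size are assumptions made for this example only; Thorndike stated the law in words, not as a formula.

```python
def update_likelihood(likelihood, desirable, step=0.2):
    """Return the new likelihood (0.0 to 1.0) of a behavior after one outcome.

    Hypothetical update rule: move a fraction `step` of the way toward 1.0
    after a desirable outcome, and toward 0.0 after an undesirable one.
    """
    if desirable:
        return likelihood + step * (1.0 - likelihood)
    return likelihood - step * likelihood


# A cat that escapes (a desirable outcome) on five trials in a row.
likelihood = 0.1  # assumed starting likelihood of pressing the latch
history = []
for trial in range(5):
    likelihood = update_likelihood(likelihood, desirable=True)
    history.append(round(likelihood, 2))

print(history)  # the likelihood rises on every trial
```

The exact numbers are not meaningful. What matters is the direction of change: desirable outcomes make the behavior more likely, undesirable outcomes make it less likely.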

Operant versus Classical Conditioning
Classical Conditioning: the outcome (the unconditioned stimulus, or US) occurs no matter what the organism does. The organism may make a preparatory response to minimize or enhance the impact of the expected US, but the US will still occur.
Operant Conditioning: the outcome depends on what the organism does. The organism must act, or refrain from acting, for the outcome to occur or not occur.

B.F. Skinner extended Edward Thorndike’s Law of Effect and brought it from the laboratory to the mainstream by arguing that it can be applied to shape human behavior. He argued that abnormal behaviors are behaviors that were reinforced or rewarded (more on this later).

Operant Conditioning has two components: reinforcement (or reinforcer) and punishment (or punisher). Reinforcement increases the probability of a behavior occurring in the future whereas punishment decreases that probability.

The role of reinforcement in Operant Conditioning

There are two types of reinforcement: positive and negative. Positive reinforcement gives the subject a desirable stimulus to increase the probability of the behavior recurring in the future. Negative reinforcement removes an annoying or painful stimulus, and that removal increases the probability of the behavior occurring in the future. [Note: negative reinforcement is not the same as punishment! Negative reinforcement increases the likelihood of the behavior recurring; punishment decreases it.]

A look at positive and negative reinforcement at work: a mother decides to go to the mall with her two-year-old toddler. As they walk around the mall, they spot a toy store. Inside the toy store, the toddler spots an item and begins to nag the mother to purchase it. The mother says no. The toddler begins to cry and scream. Still, the mother says no. To the mother’s dismay, the toddler’s crying and screaming increase. Embarrassed, the mother relents and purchases the item.

Positive reinforcement: in the scenario above, the mother’s act of purchasing the item served as a positive reinforcement (giving something desirable) to the child’s crying and screaming. The mother has taught the child that the child need only cry and scream in the future and the child will get what the child wants. Thus, the likelihood of the behavior of crying and screaming recurring in the future has increased.

Think about it: abnormal behaviors are those behaviors that were reinforced in the past

Negative reinforcement: in the same scenario, once the mother purchased the item, the child’s crying and screaming ceased. This is negative reinforcement in that it removed something annoying to the mother (the child’s crying and screaming). It is reinforcement because the child has taught the mother that in the future the mother need only give what the child wants and the child will cease the undesirable stimulus. This increases the probability of the mother relenting in the future.

Schedule of reinforcement

When and how often a behavior is reinforced affects how effective the reinforcement is. Schedules of reinforcement fall into five categories: continuous, fixed ratio, variable ratio, fixed interval, and variable interval.

Continuous reinforcement: in continuous reinforcement, the reward is given each time the behavior is performed. While effective in the initial stages, one can imagine that after a number of trials an organism will lose interest in the reinforcer. (Imagine a soldier getting a medal each time the soldier goes to battle. He’ll soon run out of shirt space to pin those medals. Or a student who gets $1 for each “A” received for every test and quiz taken. At some point the parent will be broke and the child …)

Fixed ratio reinforcement: in fixed ratio, the behavior is reinforced after a set number of behavioral responses. An example of fixed ratio is employment that relies on commission. In this case, the employee is rewarded only after a set number of sales. Just as in fixed interval (below), the behavior (in this case, selling) goes up as the expected reinforcement approaches and tapers off afterward. Thus, the behavior is unstable.

Variable ratio reinforcement: in variable ratio, the behavior is reinforced after an unknown number of times the behavior is performed. Variable ratio is understood to be the most effective reinforcement schedule. An example of a variable ratio reinforcement schedule is gambling. The gambler knows that at some point the behavior of betting will be reinforced (the gambler will win). What is unknown is when. A slot machine player will sit in front of the machine for long periods of time, continuously inserting coins and pulling the lever in anticipation of the reinforcement.

Fixed interval reinforcement: interval refers to the passage of time. In fixed interval, the reinforcement is provided after a set amount of time has elapsed. An example of a fixed interval schedule of reinforcement is employment with a set pay schedule that depends on the passage of time: weekly, every two weeks, the 1st and the 15th of each month, monthly, etc. The problem with a fixed interval schedule of reinforcement is that the behavior is unstable: it lessens after the reinforcement (after payday) and increases as the next scheduled reinforcement approaches.

Variable interval reinforcement: interval refers to the passage of time. In variable interval, just as in variable ratio, the timing of the reinforcement is unknown. It may take 15 minutes or two days; it may be in two weeks or two months. While the rate of response (behavior) is low, the behavior of those on a variable interval schedule of reinforcement is consistent.

Example of variable interval in human behavior: in clinical settings, the variable interval schedule of reinforcement appears in bad, abusive relationships. In these cases, one or both persons in the relationship complain about how bad their relationship is and cite example after example of how the other person treats them (and others) abusively. Yet, while leaving the relationship seems like common sense to most, they do not leave. Why? Well, no relationship is 100% bad. Rare as the occasions may be, there are times when the couple laughs or shares intimate moments. While one person or both people suffer, one or both hold on to the relationship for those rare occasions when they are intimate. When that may happen, neither one knows. It may be in a day, or maybe two weeks; it may be in a month, or maybe six. Regardless, one or both believe that at some point it will happen again.
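
To make the difference between the five schedules concrete, here is a minimal sketch that decides which responses get reinforced under each one. Everything in it is an assumption for illustration: the function name `simulate`, the schedule labels, the use of plain numbers as response times, and the choice to draw each variable requirement uniformly from 1 to 2n - 1 so that it averages n. For interval schedules it assumes the first response made after the interval has passed is the one reinforced.

```python
import random


def simulate(schedule, response_times, n=3, seed=0):
    """Return the times of the responses that are reinforced.

    schedule: "continuous", "fixed_ratio", "variable_ratio",
              "fixed_interval", or "variable_interval"
    response_times: increasing times at which the behavior is performed
    n: responses required (ratio) or time units required (interval);
       for variable schedules, n is the average requirement
    """
    rng = random.Random(seed)
    variable = schedule.startswith("variable")
    by_time = schedule.endswith("interval")

    def next_requirement():
        # Variable schedules draw a new, unpredictable requirement each time.
        return rng.randint(1, 2 * n - 1) if variable else n

    reinforced = []
    requirement = next_requirement()
    count = 0      # responses since the last reinforcer
    last_time = 0  # time of the last reinforcer
    for t in response_times:
        count += 1
        progress = (t - last_time) if by_time else count
        if schedule == "continuous" or progress >= requirement:
            reinforced.append(t)
            count, last_time = 0, t
            requirement = next_requirement()
    return reinforced


# Four quick responses, a long pause, then four more.
times = [1, 2, 3, 4, 10, 11, 12, 13]
print(simulate("continuous", times))      # every response is reinforced
print(simulate("fixed_ratio", times))     # [3, 11]: every third response
print(simulate("fixed_interval", times))  # [3, 10, 13]: first response after 3 time units
print(simulate("variable_ratio", times))  # depends on the random draws
```

Note how the long pause costs nothing under the fixed interval schedule (the response at time 10 is reinforced right away), while under the fixed ratio schedule only the number of responses counts.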

The role of punishment in Operant Conditioning

Unlike reinforcement, which increases the likelihood of a behavior recurring in the future, punishment serves to decrease the behavior. (Unfortunately, when it comes to shaping human behavior, it is punishment that we often use. As will be shown later, punishment is not the most effective way to modify human behavior.) In animals, punishment can come in the form of a loud noise, a squirt of water, or being hit; in humans, punishment can come in the form of monetary fines, social disapproval, or incarceration. Just like reinforcement, there are two types of punishment: positive punishment and negative punishment. Regardless of the form or type of punishment, the intended (or at times unintended) result is the same: a decrease in the behavior.

Positive punishment: to use positive punishment is to give something undesirable that decreases the likelihood of the behavior recurring. [Note: positive does not mean it is good. In this case, positive means something is given.] An example of positive punishment (something undesirable is given) is social disapproval: screaming, yelling, public ridicule. The most common form of positive punishment is infliction of pain. Again, just because it is positive does not mean it is good.

Negative punishment: something desirable is taken away to reduce the likelihood of a behavior recurring. In our society, freedom is one such desirable stimulus that is often taken away to reduce a behavior. Time out, grounding, and incarceration are all forms of negative punishment. Monetary fines also serve as a negative punishment. The threat of excommunication for many Catholics has resulted in many behaviors diminishing or ceasing altogether. A point to consider when applying negative punishment: the stimulus being taken away must be of significant importance to the person for the punishment to be effective. Taking away play time with other children from a child who enjoys being alone will not have a significant effect in reducing undesirable behavior.
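
The four terms defined above come down to two questions: is a stimulus given or removed, and does the behavior increase or decrease? The small sketch below encodes that two-by-two grid. The function name `classify` and its string labels are assumptions chosen for this illustration.

```python
def classify(stimulus, behavior):
    """Name the operant conditioning procedure.

    stimulus: "given" or "removed"
    behavior: "increases" or "decreases"
    """
    # "Positive" means something is given; "negative" means something is removed.
    sign = {"given": "positive", "removed": "negative"}[stimulus]
    # Reinforcement increases a behavior; punishment decreases it.
    kind = {"increases": "reinforcement", "decreases": "punishment"}[behavior]
    return f"{sign} {kind}"


# The mother buys the toy, and the child's crying becomes more likely.
print(classify("given", "increases"))    # positive reinforcement
# The crying stops, and the mother's relenting becomes more likely.
print(classify("removed", "increases"))  # negative reinforcement
# Yelling or public ridicule follows a behavior, and the behavior drops.
print(classify("given", "decreases"))    # positive punishment
# Time out removes freedom, and the behavior drops.
print(classify("removed", "decreases"))  # negative punishment
```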

[TO BE CONTINUED…]

Franco E. Santos, MA, Ed.D
