For a comparison of the processes of operant conditioning and classical conditioning, see: Comparison of classical and operant conditioning.
Operant conditioning (also instrumental conditioning) is a common learning theory in which behaviour comes to be controlled by its consequences. It is based on Edward Thorndike’s law of effect, which holds that an individual is more likely to repeat behaviour that has positive consequences, and less likely to repeat behaviour that has negative consequences. This is also the basis of trial-and-error learning.
Skinner’s work involved the use of animals. He trained, for example, pigeons and rats to perform certain tasks (such as pressing a lever when a bell rang). Skinner used rewards such as food when an animal performed the desired task, operating on the law of effect discussed earlier. Skinner identified three elements of operant conditioning: the stimulus, the response, and the consequence.
Together, these elements make up the S-R-C (stimulus – response – consequence) three-phase model of operant conditioning as described by B. F. Skinner.
Skinner eventually introduced technology into his studies to improve efficiency. He developed an operant chamber (the Skinner Box), which revolutionised research into learning through enhanced capacities such as the automatic presentation of stimuli and reinforcers and the automatic recording of responses.
Below is an image of a setup similar to Skinner’s Box.
Operant conditioning is based on the principles of reinforcement and punishment. It is important to make the distinction between the two:
Positive reinforcement refers to a reward which strengthens the likelihood of a behaviour being repeated by providing a pleasant or satisfying consequence. For example, receiving a good grade (a pleasant consequence) after studying hard for a test (a behaviour) is likely to result in that behaviour being repeated for the next test.
It should be noted that negative reinforcement, like positive reinforcement, strengthens the likelihood of a response being repeated (remember the distinction between reinforcement and punishment). Negative reinforcement strengthens the likelihood of a behaviour being repeated by removing an unpleasant or dissatisfying stimulus. For example, the removal of a bad headache (a pleasant consequence) after using a particular type of medication (a behaviour) is likely to result in that behaviour being repeated the next time that individual has a headache.
Punishment (also positive punishment) refers to the provision of an unpleasant or dissatisfying consequence which weakens the likelihood of a behaviour being repeated. For example, having to pick up fifty pieces of rubbish (an unpleasant consequence) after littering (a behaviour) is likely to result in that behaviour not being repeated the next time that individual thinks about littering.
Response cost (also negative punishment) refers to the removal of a pleasant or satisfying stimulus, which weakens the likelihood of a behaviour being repeated. For example, an individual having their licence revoked (the removal of a pleasant consequence, the ability to drive) after speeding (a behaviour) is likely to result in that behaviour not being repeated the next time that individual drives their car.
In his studies, Skinner tried to find the most efficient and effective ways of training through operant conditioning, comparing different schedules of reinforcement.
Continuous reinforcement – the simplest schedule – involves the desired response being reinforced every time it is displayed. For example, every time a dog rolls over when it is prompted to, it is given a food treat (positive reinforcement).
Conversely, partial reinforcement (also intermittent reinforcement) involves only some desired responses being reinforced. That is, the desired response is not reinforced every time it is displayed. There are numerous forms of partial reinforcement, including fixed and variable interval schedules, and fixed and variable ratio schedules.
In a fixed interval schedule of reinforcement, reinforcement is delivered after a set time period, assuming that at least one desired response has been provided. For example, an employee of a large corporation may be paid $10 (reinforcement) every hour (a set time period) they work (the desired behaviour).
In a fixed ratio schedule of reinforcement, reinforcement is delivered after a set number of desired responses. For example, a student may be given $10 (reinforcement) every fifth time (a set number of desired responses) they score an A+ (the desired behaviour).
In a variable interval schedule of reinforcement, reinforcement is not provided with regular frequency; instead, it occurs on the basis of a set average time interval. For example, reinforcement on a five-minute variable interval schedule might first be provided after two minutes, then nine minutes, then six minutes, then three minutes; the time intervals between reinforcements vary, but they average five minutes. These schedules assume that the desired behaviour is also being produced.
In a variable ratio schedule of reinforcement, reinforcement is not provided with regular frequency; instead, it occurs on the basis of a set average number of desired responses. For example, reinforcement on a twenty-response variable ratio schedule might first be provided after fifteen desired responses, then after twenty-five, then after eighteen, then after twenty-two; the number of responses between reinforcements varies, but it averages twenty desired responses before reinforcement is given.
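Because the partial schedules are rule-based, their logic can be sketched in code. Below is a minimal Python sketch of the two ratio schedules; the function names and parameters are illustrative, not from the source, and the interval schedules would simply replace the response counter with a clock.

```python
import random

def fixed_ratio(n):
    """Reinforce after every n-th desired response."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True   # deliver reinforcement
        return False
    return respond

def variable_ratio(mean_n):
    """Reinforce after a random number of responses averaging mean_n."""
    target = random.randint(1, 2 * mean_n - 1)
    count = 0
    def respond():
        nonlocal count, target
        count += 1
        if count >= target:
            count = 0
            target = random.randint(1, 2 * mean_n - 1)  # new random requirement
            return True
        return False
    return respond

# A fixed-ratio-5 schedule reinforces exactly every fifth response.
fr5 = fixed_ratio(5)
outcomes = [fr5() for _ in range(10)]
print(outcomes)  # reinforcement delivered on responses 5 and 10
```

The variable-ratio version draws each response requirement uniformly from 1 to 2×mean−1, so the requirements vary but average `mean_n`, mirroring the twenty-response example above.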
Below is a graph that shows the characteristic response pattern of each schedule of reinforcement.
Just as some schedules of reinforcement are easier to learn than others, some are easier to extinguish, and the two appear to be correlated: responses that are learned quickly also tend to be extinguished quickly. Continuous reinforcement, for example, where reinforcement is provided after every desired response, is very easy to learn, but the resulting behaviour is also very easy to extinguish; partial schedules, particularly variable ones, are more resistant to extinction.
Shaping (also successive approximations) refers to a reinforcement procedure in which a reward is provided following responses that come closer and closer to the desired behaviour. Shaping is particularly useful when the desired behaviour is unlikely to occur naturally or after only a few trials. For example, shaping could be used to train a dog to collect the newspaper each morning: first, the dog would be rewarded for going to the door, then for going outside, then for going to the mailbox, and so on, until the desired behaviour was accomplished.
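Since shaping amounts to iteratively tightening the reward criterion toward the target, it can be illustrated with a short toy sketch. Everything here (the numeric "distance to target" scale, the step size) is an invented illustration, not Skinner’s actual procedure.

```python
def shape(responses, target, start_criterion, step):
    """Reward each response within the current criterion of the target,
    then tighten the criterion (demand a closer approximation)."""
    criterion = start_criterion
    rewards = 0
    for r in responses:
        if abs(target - r) <= criterion:          # close enough to the current approximation
            rewards += 1
            criterion = max(0, criterion - step)  # require a closer approximation next time
    return rewards, criterion

# Responses gradually approach the target of 10
# (analogous to door -> outside -> mailbox in the dog example).
rewards, final_criterion = shape([2, 4, 6, 8, 10],
                                 target=10, start_criterion=8, step=2)
print(rewards, final_criterion)  # every step earned a reward; criterion shrank to 0
```

Each successive response meets a progressively stricter criterion, so every approximation is rewarded until only the target behaviour itself qualifies.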
A token economy refers to a system of behaviour modification in which ‘tokens’ or other symbolic items are earned for performing target behaviours. These tokens can then be exchanged later for another reward. The tokens – or whatever is used in place of the tokens – are referred to as secondary reinforcers, as they are neutral stimuli associated with the primary reinforcers (whatever is exchanged for the tokens as a reward). For example, a student may earn a token every time they arrive to class on time. Once they have earned ten tokens, they may exchange them for a chocolate bar (the primary reinforcer).
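The exchange rule in the class-arrival example can be sketched as a small bookkeeping routine; the class name, method names, and exchange rate below are hypothetical, not part of any established system.

```python
class TokenEconomy:
    """Toy model of a token economy: earn secondary reinforcers (tokens),
    exchange them for a primary reinforcer at a fixed rate."""

    def __init__(self, tokens_per_reward):
        self.tokens_per_reward = tokens_per_reward  # exchange rate
        self.balance = 0

    def earn(self):
        """Award one token for performing the target behaviour."""
        self.balance += 1

    def exchange(self):
        """Trade tokens for the primary reinforcer, if enough have been earned."""
        if self.balance >= self.tokens_per_reward:
            self.balance -= self.tokens_per_reward
            return "chocolate bar"   # the primary reinforcer in the example
        return None

economy = TokenEconomy(tokens_per_reward=10)
for _ in range(10):          # arriving on time ten days in a row
    economy.earn()
print(economy.exchange())    # prints "chocolate bar"
```

The tokens themselves carry no value; they only reinforce behaviour through their learned association with the primary reinforcer, which is what the exchange step models.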
Acquisition involves the strengthening of an association between a behaviour and its consequence through reinforcement, leading to an increased likelihood of that behaviour occurring. For example, a dog may acquire the behaviour of rolling over through consistent reinforcement with food.
In operant conditioning, extinction occurs when reinforcement is removed following the associated behaviour. For example, a dog’s learned behaviour of rolling over may be extinguished if food presentation is discontinued.
Stimulus generalisation occurs when the desired response is shown following stimuli that are similar to the stimulus that was originally reinforced. For example, if a student learns to arrive early to class due to positive reinforcement, stimulus generalisation would occur if they also consistently arrived early to university the next year (where there was no reinforcement).
Stimulus discrimination occurs when the desired response is shown only following the specific stimulus that was reinforced. For example, a student may only stop talking and listen (the desired response) for the particular teacher who had reinforced that behaviour, and no other teacher.
Spontaneous recovery refers to the reappearance of a previously extinguished response after a period of time in which the response did not occur. Generally, the response will be re-learned more quickly than it was initially learned. In operant conditioning, the recovered response can reappear even though it is no longer being reinforced, although it will extinguish again quickly unless reinforcement is reinstated.
These points distinguish operant conditioning from classical conditioning: in operant conditioning, the learner is active (they consciously choose to behave in a way that either acquires a reward or avoids punishment); reinforcement occurs after the response (as a consequence of the behaviour); and the response of the learner is non-reflexive and voluntary.