Operant conditioning

For a comparison of the processes of operant conditioning and classical conditioning, see: Comparison of classical and operant conditioning.

Operant conditioning (also instrumental conditioning) is a common learning theory in which behaviour becomes controlled by its consequences. It is based on Edward Thorndike’s law of effect, which suggests that an individual is more likely to repeat behaviour with positive consequences, and less likely to repeat behaviour with negative consequences. This is also the basis of trial-and-error learning.

Three-phase model of operant conditioning and B.F. Skinner

Skinner’s work involved the use of animals. For example, he trained pigeons and rats to perform certain tasks (such as pressing a lever when a bell rang). Skinner gave the animals rewards such as food when they performed the desired task, applying the law of effect discussed earlier. He identified three elements of operant conditioning:

  • Stimulus (such as a bell ringing)
  • Response (such as the pigeon or rat pressing a lever)
  • Consequence (such as the pigeon or rat being rewarded with food)

Together, these elements make up the S-R-C (stimulus – response – consequence) three-phase model of operant conditioning, as described by B.F. Skinner.

Skinner’s Box

Skinner eventually introduced technology into his studies to improve efficiency. He developed an operant chamber (Skinner’s Box), which revolutionised research into learning due to its enhanced capacities such as:

  • A means of providing stimuli (lights or sounds)
  • A means of recording the animal’s response (response lever)
  • A means of providing reward (food) or punishment (electric shock from the electrified grid)
  • A means of keeping an automatic, cumulative record of the animal’s responses

[Figure: a setup similar to Skinner’s Box]

Elements of operant conditioning

Operant conditioning is based on the principles of reinforcement and punishment. It is important to make the distinction between the two:

  • Reinforcement is any consequence that strengthens the likelihood of a behaviour being repeated
  • Punishment is any consequence that weakens the likelihood of a behaviour being repeated

Positive reinforcement

Positive reinforcement refers to a reward which strengthens the likelihood of a behaviour being repeated by providing a pleasant or satisfying consequence. For example, receiving a good grade (a pleasant consequence) after studying hard for a test (a behaviour) is likely to result in that behaviour being repeated for the next test.

Negative reinforcement

Negative reinforcement is not punishment: like all reinforcement, it strengthens the likelihood of a response being repeated (remember the distinction between reinforcement and punishment). It does so by removing something unpleasant or dissatisfying, and this removal acts as the reward. For example, the removal of a bad headache (a pleasant consequence) after using a particular type of medication (a behaviour) is likely to result in that behaviour being repeated the next time that individual has a headache.

Punishment

Also positive punishment. Punishment refers to the provision of an unpleasant or dissatisfying consequence which weakens the likelihood of a behaviour being repeated. For example, having to pick up fifty pieces of rubbish (an unpleasant consequence) after littering (a behaviour) is likely to result in that behaviour not being repeated the next time that individual thinks about littering.

Response cost

Also negative punishment. Response cost refers to the removal of something pleasant or satisfying, which weakens the likelihood of a behaviour being repeated. For example, an individual having their licence revoked (an unpleasant consequence) after speeding (a behaviour) is likely to result in that behaviour not being repeated the next time that individual drives their car.
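The four types of consequence described above differ on two dimensions: whether something is provided or removed after the behaviour, and whether that something is pleasant or unpleasant. The short Python sketch below is only an illustrative summary of that two-by-two distinction. The function name `classify_consequence` and its parameters are hypothetical names chosen for this example, not standard terminology.

```python
def classify_consequence(stimulus_is_pleasant: bool, stimulus_is_added: bool) -> str:
    """Name the type of consequence, using the terms from this section."""
    if stimulus_is_added:
        # Something is provided after the behaviour.
        return "positive reinforcement" if stimulus_is_pleasant else "punishment"
    # Something is taken away after the behaviour.
    return "response cost" if stimulus_is_pleasant else "negative reinforcement"


# The four examples used in the text above.
print(classify_consequence(stimulus_is_pleasant=True, stimulus_is_added=True))    # good grade received
print(classify_consequence(stimulus_is_pleasant=False, stimulus_is_added=False))  # headache removed
print(classify_consequence(stimulus_is_pleasant=False, stimulus_is_added=True))   # picking up rubbish
print(classify_consequence(stimulus_is_pleasant=True, stimulus_is_added=False))   # licence revoked
```

Both reinforcement outcomes make the behaviour more likely to be repeated; punishment and response cost make it less likely.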

Schedules of reinforcement

In his studies, Skinner tried to find the most efficient and effective ways of training through operant conditioning.

Continuous reinforcement

Continuous reinforcement – the simplest schedule – involves the desired response being reinforced every time it is displayed. For example, every time a dog rolls over when it is prompted to, it is given a food treat (positive reinforcement).

Partial reinforcement

Also intermittent reinforcement. Conversely, partial reinforcement involves only some desired responses being reinforced. That is, the desired response is not reinforced every time it is displayed. There are numerous forms of partial reinforcement which should be considered, including fixed and variable interval schedules, and fixed and variable ratio schedules.

Fixed interval schedule

In a fixed interval schedule of reinforcement, reinforcement is delivered after a set time period, assuming that at least one desired response has been provided. For example, an employee of a large corporation may be paid $10 (reinforcement) every hour (a set time period) they work (the desired behaviour).

Fixed ratio schedule

In a fixed ratio schedule of reinforcement, reinforcement is delivered after a set number of desired responses. For example, a student may be given $10 (reinforcement) every fifth time (a set number of desired responses) they score an A+ (the desired behaviour).
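The two fixed schedules described above can be written as simple decision rules. The following Python sketch is illustrative only; the function names and parameters are hypothetical, and time is measured in minutes purely for convenience.

```python
def fixed_ratio_reinforced(response_count: int, ratio: int) -> bool:
    """True when the latest desired response is one that earns reinforcement."""
    return response_count > 0 and response_count % ratio == 0


def fixed_interval_reinforced(minutes_since_last_reward: float,
                              interval_minutes: float,
                              responded: bool) -> bool:
    """True when the set time has elapsed and a desired response has occurred."""
    return responded and minutes_since_last_reward >= interval_minutes


# Fixed ratio of 5: which of the first 12 A+ grades earn the $10?
rewarded = [n for n in range(1, 13) if fixed_ratio_reinforced(n, ratio=5)]
print(rewarded)  # [5, 10]

# Fixed interval of 60 minutes: no pay before the hour is up, or without work.
print(fixed_interval_reinforced(45, 60, responded=True))   # False
print(fixed_interval_reinforced(60, 60, responded=True))   # True
print(fixed_interval_reinforced(60, 60, responded=False))  # False
```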

Variable interval schedule

In a variable interval schedule of reinforcement, reinforcement is not provided at regular intervals. Instead, it is provided on the basis of a set average time interval. For example, on a five-minute variable interval schedule, reinforcement may be provided first after two minutes, then after nine minutes, then six minutes, then three minutes; the time between reinforcements varies, but it averages five minutes. As with a fixed interval schedule, reinforcement is only delivered if the desired behaviour is also being produced.

Variable ratio schedule

In a variable ratio schedule of reinforcement, reinforcement is not provided after a regular number of responses. Instead, it is provided on the basis of a set average number of desired responses. For example, on a twenty-response variable ratio schedule, reinforcement may be provided first after fifteen desired responses, then after twenty-five, then after eighteen, then after twenty-two; the number of responses between reinforcements varies, but an average of twenty desired responses is required before reinforcement is given.
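The averages in the two variable schedule examples above can be checked with a few lines of Python. This is only a worked check of the arithmetic; the variable names and the `average` helper are hypothetical.

```python
from itertools import accumulate


def average(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Waiting times (minutes) from the five-minute variable interval example.
vi_waits_minutes = [2, 9, 6, 3]
# Responses required from the twenty-response variable ratio example.
vr_required_responses = [15, 25, 18, 22]

print(average(vi_waits_minutes))       # 5.0
print(average(vr_required_responses))  # 20.0

# Running totals: the minute, or the response number, at which each reward arrives.
print(list(accumulate(vi_waits_minutes)))       # [2, 11, 17, 20]
print(list(accumulate(vr_required_responses)))  # [15, 40, 58, 80]
```

The individual gaps are unpredictable to the learner, but over the whole sequence the reward rate matches the set average.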

[Figure: graph showing the different response pattern produced by each schedule of reinforcement]

Abbreviations used in the graph:

  • VR = variable ratio
  • FR = fixed ratio
  • VI = variable interval
  • FI = fixed interval

So remember:

  • Fixed = a set amount of time, or a set number of desired responses
  • Variable = a varied amount of time, or a varied number of desired responses (centred on an average)
  • Interval = time
  • Ratio = number of responses

Extinction of schedules of reinforcement

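Just as behaviour is learned more easily under some schedules of reinforcement than others, it is also extinguished more easily under some schedules, and the two appear to be related: behaviour that is learned quickly also tends to be extinguished quickly once reinforcement stops. Continuous reinforcement, for example, where reinforcement is provided after every desired response, is very easy to learn, but also very easy to extinguish. Behaviour learned under partial reinforcement is generally more resistant to extinction.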

Applications of operant conditioning

Shaping

Also successive approximations. Shaping refers to a reinforcement procedure in which a reward is provided following responses that become closer and closer to the desired behaviour. Shaping is particularly useful when the desired behaviour is unlikely to occur naturally or after only a few trials. For example, shaping could be used to train a dog to collect the newspaper each morning. First, the dog would be rewarded for going to the door, then for going outside, then for going to the mailbox, and so on, until the desired behaviour was accomplished.
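The step-by-step logic of shaping can be sketched in Python. This is a simplified illustration under stated assumptions: the function `shape`, the list of approximations and the observed responses are all hypothetical, and a real training session would be far less tidy. The key idea shown is that only the current approximation is rewarded, and the criterion is raised after each reward.

```python
def shape(observed_responses, approximations):
    """Reward a response only if it meets the current criterion, then raise the criterion."""
    criterion = 0   # index of the approximation currently being rewarded
    rewarded = []
    for response in observed_responses:
        if criterion < len(approximations) and response == approximations[criterion]:
            rewarded.append(response)
            criterion += 1  # the next reward requires a closer approximation
    return rewarded, criterion == len(approximations)


approximations = [
    "goes to the door",
    "goes outside",
    "goes to the mailbox",
    "collects the newspaper",
]
observed = [
    "goes to the door",
    "goes to the door",   # no longer rewarded: the criterion has moved on
    "goes outside",
    "goes to the mailbox",
    "collects the newspaper",
]

rewarded, behaviour_acquired = shape(observed, approximations)
print(len(rewarded))       # 4
print(behaviour_acquired)  # True
```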

Token economies

A token economy refers to a system of behaviour modification in which ‘tokens’ or other symbolic items are earned for performing target behaviours. These tokens can then be exchanged later for another reward. The tokens – or whatever is used in place of the tokens – are referred to as secondary reinforcers, as they are neutral stimuli associated with the primary reinforcers (whatever is exchanged for the tokens as reward). For example, a student may earn a token every time they arrive to class on time. Once they have earned ten tokens, they may be able to make an exchange for a chocolate bar (the primary reinforcer).
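The classroom example above can be sketched as a small token tracker in Python. The class `TokenEconomy` and its method names are hypothetical, chosen only to illustrate how tokens (secondary reinforcers) accumulate and are later exchanged for a primary reinforcer.

```python
class TokenEconomy:
    """Tracks tokens earned for a target behaviour and exchanges them for a reward."""

    def __init__(self, exchange_cost: int = 10):
        self.exchange_cost = exchange_cost  # tokens needed for one reward
        self.tokens = 0

    def record_target_behaviour(self) -> None:
        """Award one token, e.g. for arriving to class on time."""
        self.tokens += 1

    def exchange(self) -> bool:
        """Trade tokens for the reward if enough have been earned."""
        if self.tokens >= self.exchange_cost:
            self.tokens -= self.exchange_cost
            return True
        return False


student = TokenEconomy()
for _ in range(9):
    student.record_target_behaviour()
print(student.exchange())  # False (only nine tokens so far)

student.record_target_behaviour()
print(student.exchange())  # True (ten tokens exchanged for the chocolate bar)
print(student.tokens)      # 0
```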

Processes of operant conditioning

Acquisition

Acquisition involves the strengthening of a response (either positive or negative) due to reinforcement or punishment, leading to an increased likelihood of a particular behaviour occurring. For example, a dog may acquire the behaviour of rolling over through consistent reinforcement with food.

Extinction

In operant conditioning, extinction occurs when reinforcement no longer follows the associated behaviour, so that the behaviour gradually stops being shown. For example, a dog’s learned behaviour of rolling over may be extinguished if food presentation is discontinued.

Stimulus generalisation

Stimulus generalisation occurs when the desired response is shown following stimuli which are similar to the stimuli that were reinforced. For example, if a student learns to arrive early to class due to positive reinforcement, stimulus generalisation would occur if they also consistently arrived early to university the next year (where there was no reinforcement).

Stimulus discrimination

Stimulus discrimination occurs when the desired response is only shown following the specific stimulus which was reinforced. For example, a student may only stop talking and listen (the desired response) to the particular teacher who had reinforced such a behaviour, and no other teacher.

Spontaneous recovery

Spontaneous recovery refers to the reappearance of a previously extinguished response. Generally, the response will be re-learned more quickly than it was initially learned. Spontaneous recovery follows a period of time in which the response did not occur. In operant conditioning, spontaneous recovery will occur when a behaviour is reinforced again after a period of delay.

Role of the learner

In operant conditioning, the learner is active (they consciously choose to behave in a way to either acquire a reward or avoid punishment).

Timing of stimulus and response

In operant conditioning, reinforcement occurs after the response (as a consequence to the behaviour).

Nature of response

In operant conditioning, the response of the learner is non-reflexive and voluntary.