Using Structured Preference Assessment

Using Structured Preference Assessment on the Bridge River Water Use Plan (WUP)

BC Hydro 2003

Context

Water use planning is a multi-stakeholder, multi-objective planning process used to examine re-allocations of water at hydroelectric facilities in British Columbia, Canada. At BC Hydro’s Bridge River facilities, 24 stakeholders, representing the provincial treasury board, provincial and federal fish and wildlife regulators, local residents and aboriginal communities, participated in a structured decision process to examine alternative ways of operating the facilities to better balance power, fish, wildlife, water quality and recreation interests. Over a period of 18 months, participants set objectives and attributes and identified and evaluated alternatives. Alternatives were screened and iteratively refined to find joint gains and to eliminate dominated alternatives.

Objectives and Attributes

Participants began by setting objectives and attributes. A set of sixteen performance attributes was defined and used in preliminary analyses, later reduced to the following ten:

Objective | Location | Evaluation Criteria / Units
Minimize Flood Damage | All locations | Expected flood days per year
Maximize Fish Welfare | Lower Bridge River | Constructed scale
Maximize Fish Welfare | Seton Reservoir | Constructed scale
Maximize Fish Welfare | Downton Reservoir | Index*
Maximize Fish Welfare | Carpenter Reservoir | Index*
Maximize Water Quality | All locations | Dissolved solids, tonnes per year
Maximize Vegetation Welfare | Downton Reservoir | Vegetation, weighted hectares
Maximize Vegetation Welfare | Seton Reservoir | Vegetation, weighted hectares
Maximize Vegetation Welfare | Carpenter Reservoir | Vegetation, weighted hectares
Maximize Power Revenues | All locations | Revenue, millions of dollars per year

* Fish impacts were represented via an index that incorporated a variety of technical indicators developed by experts (see below).

Numerous information gaps were identified. The attributes were used to prioritize proposed studies (discussed further below), a task the committee conducted collaboratively. Upon completion of the studies, the attributes were refined and used to assess the first two rounds of alternatives. As the process progressed, the list of active attributes was iteratively reduced. Attributes were eliminated or modified if they were found to be:

  1. of low importance (for example, when field studies provided baseline information showing that stressors initially thought to be important were in fact not significant);
  2. insensitive to the alternatives (as the set of alternatives was refined, some attributes no longer discriminated among them and thus were no longer useful); or
  3. strongly correlated with others (such that one could serve as a proxy for another).

The design of attributes was an intensive process. Care was taken to ensure that they could, if necessary and helpful, be used in formal preference assessment. Indices or constructed scales (Keeney and Gregory, 2005) were particularly useful. For example, four proxy attributes for impacts on fish (littoral productivity, tributary access, entrainment and stranding) were individually weighted and combined into a normalized index indicating the overall utility for fish of each operating alternative. These constructed scales, initially awkward, became familiar to participants through repeated use and facilitated the informed participation of non-technical stakeholders.
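To illustrate the mechanics of such an index, the short sketch below combines four proxy attributes into a single normalized score. The weights, and the assumption that each proxy has already been scored so that higher is better for fish, are purely illustrative; they are not the values or conventions actually used in the Bridge River fish index.

```python
# Illustrative sketch of a weighted, normalized fish index.
# The proxy attribute names come from the text; the weights and the
# orientation (higher = better for fish) are hypothetical assumptions.

FISH_WEIGHTS = {
    "littoral_productivity": 0.40,
    "tributary_access":      0.25,
    "entrainment":           0.20,
    "stranding":             0.15,
}

def fish_index(raw_scores, best, worst):
    """Combine proxy attributes into a single 0-1 index (1 = best for fish).

    raw_scores, best, worst: dicts keyed by the attribute names above,
    giving one alternative's score and the best/worst scores observed
    across all alternatives (used to rescale each attribute to 0-1).
    """
    index = 0.0
    for attr, weight in FISH_WEIGHTS.items():
        span = best[attr] - worst[attr]
        normalized = (raw_scores[attr] - worst[attr]) / span if span else 1.0
        index += weight * normalized
    return index
```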

Alternatives

Beginning with the objectives, and using a value-focused thinking approach (Keeney, 1990), participants brainstormed alternatives that would best meet them. The design of operating alternatives centered on one of the three reservoirs, Carpenter Reservoir. The preliminary alternatives each focused on one or two objectives: for example, one alternative was designed to maximize the fish and wildlife (vegetation) objectives in Carpenter Reservoir, with little attention to the consequences for other objectives, while another was designed to maximize power production revenues, without concern for ecological impacts. These “book-end” alternatives were not truly proposed as viable solutions, but they played an important educational role, allowing participants to express what they valued, design an alternative they thought could deliver it, and then learn about the implications of that choice for other objectives.

Consequences

These preliminary alternatives were designed in detail and simulated using facility operation, revenue and ecological models, and their impacts on all of the attributes were estimated. [Figure: operations, revenue and ecological models used to estimate consequences.] Alternatives were iteratively eliminated if they proved ineffective (i.e., detailed modeling showed that the “obvious solutions” were not as good as they were thought to be) or dominated (i.e., they were outperformed or nearly outperformed on all attributes by some other alternative), or they were refined (desirable elements of one alternative were combined with desirable elements of another to create hybrid alternatives that better met multiple objectives).
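The dominance screen described above is straightforward to make explicit. The sketch below is a minimal illustration, assuming each attribute has already been oriented so that larger values are better; the tolerance used to flag “nearly dominated” alternatives is likewise a hypothetical choice, not part of the Bridge River analysis.

```python
# Illustrative dominance screen over a consequence table.
# Assumes every attribute has been oriented so that higher is better.

def is_dominated(candidate, others, tolerance=0.0):
    """Return True if some other alternative is at least as good on every
    attribute (within `tolerance`) and strictly better on at least one."""
    for other in others:
        at_least_as_good = all(
            other[attr] >= candidate[attr] - tolerance for attr in candidate
        )
        strictly_better = any(other[attr] > candidate[attr] for attr in candidate)
        if at_least_as_good and strictly_better:
            return True
    return False

def screen(alternatives, tolerance=0.0):
    """Keep only alternatives that are not (nearly) dominated."""
    return {
        name: scores
        for name, scores in alternatives.items()
        if not is_dominated(
            scores,
            [s for n, s in alternatives.items() if n != name],
            tolerance,
        )
    }
```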

Some alternatives could not be combined because they were physically incompatible: the reservoir cannot be simultaneously raised to improve fish habitat and lowered to improve riparian vegetation. The committee therefore worked through an iterative process of refining alternatives to seek joint gains, exposing in the process the fundamental trade-offs, or choices, that it would have to make. The alternatives were both science-based (for example, some took advantage of apparent break points in ecological responses revealed by detailed modeling) and value-based (in that they focused on the endpoints that people cared about and sought creative ways to improve them). Ultimately, six distinctly different alternatives were short-listed, representing fundamentally different ways to operate the facilities and exposing irreducible trade-offs among the objectives. At this point, all the technical improvements (joint gains) that could be made had been made. The choice among them was value-based, and depended on how individuals weighted the gains and losses for vegetation, fish and power (see the consequence table below).

[Apps_BCHBR1: consequence table for the six short-listed alternatives]

Evaluation and Selection

Participants were then faced with a complex decision problem: multiple alternatives, multiple performance measures, and difficult trade-offs. Structured preference assessments were introduced to bring insight to the deliberations. In addition to the top-down (direct ranking) method, two weighting methods were used for the bottom-up approach: swing weights and pairwise comparisons. These methods were selected because:

  1. they have a strong theoretical basis and are technically defensible (rooted in Multi-Attribute Utility Theory, or MAUT);
  2. they are easy to understand and easy to process, with a quick turnaround time; and
  3. they produce results in a format that supports constructive deliberations.

Once all the impacts reported in the consequence table had been discussed, participants completed a questionnaire designed to assess their preferences using the three different methods. The responses were then entered into a spreadsheet-based decision model, which computed scores, compared rankings and generated outputs for each person as well as for the group as a whole.
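As an illustration of the kind of calculation such a spreadsheet model performs, the sketch below converts one participant’s swing points into normalized weights, scores each alternative with a simple additive value model, and ranks the results. The alternative labels echo those discussed later (L2, M2, N2), but every number in the consequence table and in the swing points is invented; none of them are the Bridge River values.

```python
# Illustrative additive multi-attribute value model driven by swing weights.
# All numbers are invented; only the mechanics mirror the spreadsheet model.

def normalize(consequences, directions):
    """Rescale each attribute to 0-1, where 1 is the best outcome observed."""
    scaled = {}
    for alt, row in consequences.items():
        scaled[alt] = {}
        for a, direction in directions.items():
            values = [consequences[x][a] for x in consequences]
            lo, hi = min(values), max(values)
            v = (row[a] - lo) / (hi - lo) if hi != lo else 1.0
            scaled[alt][a] = v if direction == "max" else 1.0 - v
    return scaled

def score_and_rank(consequences, directions, swing_points):
    """Convert swing points to weights, compute additive scores, return ranks."""
    total = sum(swing_points.values())
    weights = {a: p / total for a, p in swing_points.items()}
    scaled = normalize(consequences, directions)
    scores = {alt: sum(weights[a] * scaled[alt][a] for a in weights)
              for alt in consequences}
    ordered = sorted(scores, key=scores.get, reverse=True)
    return scores, {alt: i + 1 for i, alt in enumerate(ordered)}

# Hypothetical consequence table and one participant's swing points
consequences = {
    "L2": {"power": 38.0, "fish": 0.7, "vegetation": 120.0},
    "M2": {"power": 42.0, "fish": 0.5, "vegetation": 300.0},
    "N2": {"power": 40.0, "fish": 0.8, "vegetation": 150.0},
}
directions = {"power": "max", "fish": "max", "vegetation": "max"}
swing_points = {"power": 100, "fish": 80, "vegetation": 40}

scores, ranks = score_and_rank(consequences, directions, swing_points)
print(scores, ranks)
```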

Almost certainly, different methods would have produced slightly different results. However, the goal was to gain confidence in individual judgments, gain insight into controversial trade-offs, and trigger constructive dialogue, not to use the calculated scores to prescribe a solution. The specific weights that were assigned were less important than the quality of the dialogue they induced.

Participants completed a questionnaire for each method, and the results were processed overnight (although they could be processed within an hour). Participants were provided with print-outs of their own results and some of the key group results. Here we discuss three particularly useful ways to use the results to facilitate helpful dialogue and reflection: a) the consistency of individual results across methods; b) areas of similarity and difference in the attribute weights assigned by the group; and c) a comparison of the ranks assigned to the alternatives by the group across methods.

It is useful to begin by exploring the ranks assigned by individuals under the different methods. The image below provides an example.

[Apps_BCHBR2: one participant’s direct ranks plotted against ranks computed from swing weights]

This compares the ranks assigned by one individual via the direct method with the ranks computed by the swing weighting method. Options ranked the same by both methods fall on the 45-degree line; options that fall far from the line should trigger a re-examination of that alternative by the stakeholder.

For example, we see that this stakeholder’s ranks are quite consistent across the two methods except for Options L2 and M2. Option L2 is ranked very low by the direct method but is ranked number one by the weighted method; the opposite is true for M2. While this does not necessarily mean that the direct rank is wrong, it may indicate any of a number of problems, such as: mixing up the options in the direct ranking (common when there are many options); overlooking some elements of performance in the direct ranking (common when there are many attributes); or overlooking options that are less controversial or less visible (common when polarized positions for or against other options lead those options to dominate discussions).

Alternatively, the direct ranking may be a more accurate reflection of the stakeholder’s values if the attributes do not adequately capture all the important elements of performance. The intent of the multi-method approach is therefore not to say that one method is better than another, but to expose inconsistencies, clarify the rationale for choices, and improve the transparency and accountability of those choices.
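A minimal version of this cross-method consistency check is sketched below: it flags any alternative whose direct rank and swing-weight rank sit far from the 45-degree line. The ranks, the threshold, and most of the option labels are invented for illustration.

```python
# Illustrative cross-method consistency check for one participant.
# All ranks are invented; options P1, P2 and Q1 are hypothetical labels.

direct_ranks = {"L2": 5, "M2": 1, "N2": 2, "P1": 3, "P2": 4, "Q1": 6}
swing_ranks  = {"L2": 1, "M2": 5, "N2": 2, "P1": 3, "P2": 4, "Q1": 6}

THRESHOLD = 2  # how far from the 45-degree line before we ask about it

for alt in direct_ranks:
    gap = abs(direct_ranks[alt] - swing_ranks[alt])
    if gap >= THRESHOLD:
        print(f"Revisit {alt}: direct rank {direct_ranks[alt]}, "
              f"swing-weight rank {swing_ranks[alt]} (gap {gap})")
```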

The results can also be used to explore differences in the weights assigned by different participants. Figure 4 shows the weights assigned by one participant under the swing weighting approach.

[Apps_BRWUP3 (Figure 4): one participant’s swing weights and the range of weights across all stakeholders]

The attributes are shown across the bottom with the weights on the vertical axis. The markers represent the weights for this particular committee member, and the vertical line represents the range of weights across all stakeholders. This chart was essential in highlighting productive areas for dialogue. From this figure, we see a high degree of disagreement about the importance assigned to the Flood, Water Quality and Power criteria. We invited people to talk about these criteria; without naming names, we asked for volunteers to explain why they thought these criteria were important, or why they thought they were not. We quickly discovered that several people had misunderstood the Flood criterion (thinking it represented a major dam breach rather than a modest periodic inundation of scattered facilities). This led to a revision of weights. Discussions about water quality were equally productive. We learned that some participants believed in strong links between water quality and human health effects (having little confidence in the existing analysis suggesting no effects) and had a lower risk tolerance: the cost of being wrong was greater for these people than for others. This insight ultimately led to the prescription of a monitoring program to test the hypotheses on which the existing analysis was based.

In sum, this exploration of weights helped deliberations by diagnosing areas of agreement and difference and provided a focus for productive discussion. It exposed factual errors, value differences, risk tolerances, and key uncertainties, which in some cases affected monitoring priorities.
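The kind of weight comparison shown in Figure 4 can be generated simply by computing, for each attribute, the spread of weights across participants and sorting by that spread; the attributes with the widest spread are the most productive to discuss first. The participant identifiers and weights in the sketch below are invented.

```python
# Illustrative scan for the attributes with the widest disagreement in weights.
# Each row is one participant's normalized weights; all values are invented.

participant_weights = {
    "P01": {"flood": 0.05, "fish": 0.35, "water_quality": 0.10,
            "vegetation": 0.20, "power": 0.30},
    "P02": {"flood": 0.30, "fish": 0.20, "water_quality": 0.25,
            "vegetation": 0.15, "power": 0.10},
    "P03": {"flood": 0.10, "fish": 0.30, "water_quality": 0.05,
            "vegetation": 0.25, "power": 0.30},
}

attributes = next(iter(participant_weights.values())).keys()
spread = {}
for a in attributes:
    values = [w[a] for w in participant_weights.values()]
    spread[a] = (min(values), max(values), max(values) - min(values))

# Attributes with the largest range are the most productive to discuss.
for a, (lo, hi, rng) in sorted(spread.items(), key=lambda kv: -kv[1][2]):
    print(f"{a:14s} weights range {lo:.2f}-{hi:.2f} (spread {rng:.2f})")
```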

Finally, once individual choices and group weights have been discussed and modified if necessary, group ranks or preferences can be explored. Figure 5 summarizes the ranks assigned by stakeholders to each option under each method: options ranked 1 or 2 are colored green, 3 or 4 yellow, and 5 or 6 red. The committee members are shown across the top (numbered for anonymity) and the alternatives down the side; for each alternative, a rank is shown for each of the swing weighting, paired comparison and direct methods. These results led the committee to focus on the N2 and L2 alternatives, and ultimately to ask the project team to develop a final alternative that combined elements of these two, along with a mitigation project to improve vegetation performance.

[Apps_BCHPortB4 (Figure 5): ranks assigned to each alternative by each committee member under the swing, paired comparison and direct methods]
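A sketch of how such a group summary might be assembled follows: each rank is binned into the green, yellow or red band used in Figure 5, and a simple count of “green” placements across participants and methods highlights the alternatives with broad support. All ranks, member labels and counts below are invented.

```python
# Illustrative group summary: bin each rank into the colour bands used
# in the figure (1-2 green, 3-4 yellow, 5-6 red). All values are invented.

from collections import Counter

def band(rank):
    return "green" if rank <= 2 else "yellow" if rank <= 4 else "red"

# ranks[participant][method][alternative] -> rank (invented values)
ranks = {
    "member_07": {
        "swing":  {"N2": 1, "L2": 2, "M2": 5},
        "paired": {"N2": 2, "L2": 1, "M2": 6},
        "direct": {"N2": 1, "L2": 3, "M2": 4},
    },
    "member_12": {
        "swing":  {"N2": 2, "L2": 1, "M2": 5},
        "paired": {"N2": 1, "L2": 2, "M2": 5},
        "direct": {"N2": 2, "L2": 1, "M2": 6},
    },
}

# Count how often each alternative lands in the green band across every
# participant and every method - a quick way to spot broadly supported options.
green_counts = Counter()
for by_method in ranks.values():
    for by_alt in by_method.values():
        for alt, r in by_alt.items():
            if band(r) == "green":
                green_counts[alt] += 1

print(green_counts.most_common())
```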

Conclusions

It is important to emphasize at this point that this kind of preference assessment only works as part of a structured decision making process. Some key messages:

Preferences must be assessed in the context of specific choices. If we had asked participants “which is more important, vegetation or fish?” we would have been sent packing. Both were important. Or we would simply have heard the official positions of the people and agencies represented. Neither would have been helpful. General statements of priority are all but useless in a decision context; only preferences stated with reference to specific trade-offs are valuable.

People learn and preferences change through the course of deliberation. Many of the participants in the Bridge WUP came to the process driven by a firm desire to see improvements in the riparian vegetation on Carpenter Reservoir. But in the end they unanimously supported a policy option that left the majority of the reservoir’s drawdown zone denuded, not just grudgingly accepting it but endorsing it as a good policy decision. They did this because they had come to understand, after a comprehensive search for creative alternatives, that they could not enhance both the reservoir and the downstream river, and they chose to enhance the downstream river. This lesson is profoundly important for decision makers. It is an all too common error to reject options based on the presumption that stakeholders will reject them. Stakeholders make surprising choices when they are truly engaged in the decision process.

Preferences depend on the alternatives that are presented. The status quo may be “acceptable” if the next best alternative involves large costs or other trade-offs. It may not be acceptable if there is a low cost alternative that virtually eliminates risk.

Preference assessment should be used to provide insight, not to prescribe answers. We emphasized that it would be used to help participants understand the performance of the options, to encourage them to think about their choices in different ways, and to focus group discussions on productive areas. From a very practical perspective, if we had introduced the preference assessment process as a means of finding the right answer, people would have rejected it outright. From a more theoretical perspective, given sensitivities to analyst choices and methodological bias, there simply is no “right” answer.

It is reasonable to have some “quality” expectations with respect to stakeholder values. While it is clearly inappropriate to draw conclusions about whether value judgments are right or wrong, we can expect value judgments to be clear, consistent and explicit about trade-offs. They are not right or wrong, but some are more useful and defensible than others.

With this example, we have reported the results of the Bridge River WUP in some detail. However, similar methods were used at twenty facilities conducting water allocation reviews under BC’s water use planning process. Interestingly, while consensus was not a requirement of the process, this structured decision process, often using explicit preference assessment tools, led to consensus at 19 of 20 facilities. We believe that, provided such methods are used to facilitate decision-focused, value-based discussions, they can be instrumental in improving both the efficiency and the quality of stakeholder deliberations. Any attempt to use these methods to impose a mathematical solution on complex, value-laden questions would not be helpful.