Best-Worst Scaling: A Practitioner's Guide
What Is Best-Worst Scaling?
Best-worst scaling (BWS) is a survey methodology where respondents repeatedly identify the best and worst options from sets of items, producing ratio-scaled preference data that reveals clear priority rankings. Jordan Louviere introduced the method in 1987 and formalized it through the 1990s as an alternative to rating scales and paired comparisons.
The terms "best-worst scaling" and "MaxDiff" are often used interchangeably, but there's a technical distinction. BWS is the broader methodology with three distinct cases. MaxDiff specifically refers to Case 1 (object scaling). In market research practice, most people say "MaxDiff" when they mean Case 1 and "best-worst" when discussing the methodology in general.
The Three Cases of Best-Worst Scaling
Case 1: Object Scaling (MaxDiff)
Respondents see a set of objects (features, messages, attributes, brands) and pick the best and worst. Every item in the set is a complete, standalone object.
Example: "Which of these features is most important to you? Which is least important?"
- Real-time collaboration
- Offline access
- Custom reporting
- API access
This is MaxDiff analysis. It's the most common case by far, accounting for the vast majority of commercial BWS studies.
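Output: A ratio-scaled ranking of all items. If Feature A scores 15 and Feature B scores 5, A is preferred three times as strongly as B. This ratio reading applies to probability-rescaled scores, not to raw logit utilities.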
Best for: Feature prioritization, message testing, brand attribute ranking, any list of 10-30 items you need to rank.
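To make the ratio interpretation of Case 1 output concrete, here is a minimal sketch that converts raw logit utilities into shares summing to 100. The utilities are invented for illustration, and the exponentiate-and-normalize rescaling is an assumption: it is one simple option, and commercial platforms may rescale differently.

```python
import math

def to_shares(utilities):
    """Convert raw logit utilities to shares that sum to 100."""
    exp_u = {item: math.exp(u) for item, u in utilities.items()}
    total = sum(exp_u.values())
    return {item: 100 * e / total for item, e in exp_u.items()}

# Hypothetical utilities, invented for illustration only.
raw_utilities = {
    "Real-time collaboration": 1.2,
    "Offline access": 0.1,
    "Custom reporting": -0.4,
    "API access": -0.9,
}

shares = to_shares(raw_utilities)
ratio = shares["Real-time collaboration"] / shares["Offline access"]

for item, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{item:<25} {share:5.1f}")
print(f"Collaboration vs. offline access: {ratio:.1f}x")
```

Only the rescaled shares support "x times as preferred" statements. The raw utilities are interval-scaled, so their ratios are not meaningful.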
Case 2: Profile Case (Attribute-Level BWS)
Respondents see a single product profile (a combination of attribute levels, like a conjoint profile) and pick the best and worst attributes within that profile. This measures which attributes of a specific product drive preference.
Example: Given this hotel room:
- Price: $199/night
- Location: Downtown
- Breakfast: Included
- Pool: No pool
- WiFi: Free high-speed
"Which feature is the most appealing? Which is least appealing?"
Output: Importance scores for each attribute level, similar to part-worth utilities in conjoint, but measured through explicit best/worst choices within a profile rather than inferred from profile-to-profile comparisons.
Best for: Understanding which aspects of a specific product concept are most and least attractive. Useful in concept refinement when you want to know not just whether people like a product, but which parts they like and dislike.
Case 3: Multi-Profile Case (BWS-DCE)
Respondents see multiple complete product profiles (like a standard conjoint task) and pick the best and worst profiles. This is a discrete choice experiment where respondents evaluate two extremes instead of just picking one preferred option.
Example: Three hotel packages are shown, each with different price/location/amenity combinations. "Which would you most want to book? Which would you least want to book?"
Output: Part-worth utilities and relative importance, similar to conjoint but with additional information from the "worst" choice. Research suggests Case 3 BWS extracts 50-100% more information per task than standard choice-based conjoint (CBC), because each task collects two data points (best and worst) rather than one.
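One rough way to see where the extra information comes from is to count the pairwise orderings a single task implies. The sketch below is a counting heuristic only, not a statistical measure of information. The function name is hypothetical, and the real gain depends on the design and the estimation model.

```python
def implied_pairs(n_profiles, collect_worst):
    """Count pairwise orderings implied by one choice task."""
    best_only = n_profiles - 1  # the best profile beats every other profile
    if not collect_worst:
        return best_only
    # The worst profile loses to every other profile. The best-vs-worst
    # pair is already counted above, so subtract it once.
    return best_only + (n_profiles - 1) - 1

for n in (3, 4, 5, 6):
    best = implied_pairs(n, collect_worst=False)
    both = implied_pairs(n, collect_worst=True)
    gain = 100 * (both - best) / best
    print(f"{n} profiles: {best} vs {both} implied pairs (+{gain:.0f}%)")
```

With three profiles, a best-and-worst answer fixes the full ranking (3 pairs instead of 2). The relative gain in implied pairs grows as more profiles are shown per task.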
Best for: Studies where you want conjoint-like trade-off data but need more statistical efficiency per respondent. Particularly useful with small samples or complex designs.
Case 1 (MaxDiff) in Detail
Since Case 1 dominates commercial practice, it's worth understanding the mechanics more deeply.
Experimental Design
The survey design determines which items appear together in each set. A good design ensures:
- Each item appears an equal number of times across all sets
- Each pair of items appears together roughly equally
- Items are rotated so no respondent sees the same combinations in the same order
For a study with 20 items shown 4 per set across 15 sets, each version of the design contains 15 groupings in which every item appears 3 times (15 sets × 4 items = 60 slots, spread over 20 items). The design is generated algorithmically, typically as a balanced or near-balanced incomplete block design, and most platforms handle this automatically.
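The balance requirement can be illustrated with a small greedy generator. This is a sketch under stated assumptions, not the algorithm any particular platform uses: `make_design` is a hypothetical helper that balances item frequency only and makes no attempt to optimize pair balance.

```python
import random
from collections import Counter
from itertools import combinations

def make_design(n_items, set_size, n_sets, seed=0):
    """Greedy design: each set takes the items shown least often so far."""
    rng = random.Random(seed)
    shown = Counter({item: 0 for item in range(n_items)})
    design = []
    for _ in range(n_sets):
        # Sort by times shown so far, breaking ties randomly.
        order = sorted(range(n_items), key=lambda i: (shown[i], rng.random()))
        chosen = order[:set_size]
        for item in chosen:
            shown[item] += 1
        design.append(chosen)
    return design, shown

design, shown = make_design(n_items=20, set_size=4, n_sets=15)

# Count how often each pair of items lands in the same set.
pair_counts = Counter(
    pair for block in design for pair in combinations(sorted(block), 2)
)

print("Appearances per item:", sorted(set(shown.values())))
print("Distinct pairs shown together:", len(pair_counts), "of", 20 * 19 // 2)
```

Fifteen sets of four provide only 90 pair slots against 190 possible pairs, so a single version cannot show every pair together. Rotating several versions across respondents is one way to even out pair exposure over the whole sample.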
Analysis Methods
Count analysis (simplest): For each item, count how many times it was chosen as "best" and subtract the times it was chosen as "worst." Divide by the number of times it appeared. This produces a simple score from -1 to +1 for each item. Fast, intuitive, and requires no specialized software, but only gives aggregate results.
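The count score described above can be computed in a few lines. The sketch below uses invented responses and a hypothetical task format (a dict with `items`, `best`, and `worst` keys). The item labels are shortened from the feature example earlier, with "SSO" added purely for illustration.

```python
from collections import Counter

def count_scores(tasks):
    """Return (times best - times worst) / times shown for each item."""
    best, worst, shown = Counter(), Counter(), Counter()
    for task in tasks:
        shown.update(task["items"])
        best[task["best"]] += 1
        worst[task["worst"]] += 1
    return {item: (best[item] - worst[item]) / n for item, n in shown.items()}

# Hypothetical responses, invented for illustration only.
tasks = [
    {"items": ["Collab", "Offline", "Reports", "API"], "best": "Collab", "worst": "API"},
    {"items": ["Collab", "Offline", "Reports", "SSO"], "best": "Collab", "worst": "Reports"},
    {"items": ["Offline", "API", "SSO", "Reports"], "best": "Offline", "worst": "API"},
    {"items": ["Collab", "API", "SSO", "Offline"], "best": "SSO", "worst": "API"},
]

scores = count_scores(tasks)
for item, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{item:<8} {score:+.2f}")
```

An item picked as worst every time it appears scores -1, and one picked as best every time scores +1. Pooling tasks from all respondents before counting gives the aggregate ranking.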
Multinomial logit: A statistical model that estimates utility scores for each item based on the probability of being chosen as best or worst in each set. Produces aggregate-level utilities with standard errors. More rigorous than counting but still only group-level.
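As a rough illustration of what the logit model computes, the sketch below turns item utilities into best and worst choice probabilities for one set and evaluates the log-likelihood that an estimation routine would maximize. It assumes one common formulation, in which the worst choice is a logit on negated utilities. The utilities and function names are hypothetical, and real software may model the worst choice differently.

```python
import math

def choice_probs(utilities, shown, sign=1.0):
    """Logit choice probabilities over the items shown in one set."""
    weights = {item: math.exp(sign * utilities[item]) for item in shown}
    total = sum(weights.values())
    return {item: w / total for item, w in weights.items()}

def log_likelihood(utilities, tasks):
    """Sum of log-probabilities of the observed best and worst picks."""
    ll = 0.0
    for task in tasks:
        p_best = choice_probs(utilities, task["items"], sign=1.0)
        p_worst = choice_probs(utilities, task["items"], sign=-1.0)
        ll += math.log(p_best[task["best"]]) + math.log(p_worst[task["worst"]])
    return ll

# Hypothetical utilities and one observed task, for illustration only.
utilities = {"Collab": 1.2, "Offline": 0.1, "Reports": -0.4, "API": -0.9}
task = {"items": ["Collab", "Offline", "Reports", "API"],
        "best": "Collab", "worst": "API"}

p_best = choice_probs(utilities, task["items"])
p_worst = choice_probs(utilities, task["items"], sign=-1.0)
print({item: round(p, 3) for item, p in p_best.items()})
print({item: round(p, 3) for item, p in p_worst.items()})
print("Log-likelihood:", round(log_likelihood(utilities, [task]), 3))
```

Fitting the model means searching for the utilities that make the observed picks most likely. That optimization step is omitted here.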
Hierarchical Bayes (HB): The current standard for commercial MaxDiff. Produces individual-level utility scores for every respondent, enabling segment analysis, latent class modeling, and individual-level predictions. Requires 200+ respondents for stable estimates.
When Count Analysis Is Enough
For quick-turnaround studies where you just need the top-line ranking ("which 5 features are most important?"), count analysis gives you the answer in minutes with no specialized software. The ranking from count analysis almost always matches the HB ranking at the aggregate level. HB adds value when you need to compare segments or cluster respondents by preference patterns.
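The segment comparison that individual-level utilities make possible can be sketched as follows. The respondent records, segment labels, and utility values are all invented for illustration. In practice the utilities would come from an HB estimation run.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical individual-level utilities, invented for illustration only.
respondents = [
    {"segment": "SMB", "utilities": {"Collab": 1.4, "Offline": 0.2, "API": -1.1}},
    {"segment": "SMB", "utilities": {"Collab": 1.0, "Offline": 0.4, "API": -0.7}},
    {"segment": "Enterprise", "utilities": {"Collab": 0.1, "Offline": -0.5, "API": 1.2}},
    {"segment": "Enterprise", "utilities": {"Collab": 0.3, "Offline": -0.3, "API": 0.8}},
]

def segment_means(respondents):
    """Average each item's utility within each segment."""
    grouped = defaultdict(lambda: defaultdict(list))
    for r in respondents:
        for item, u in r["utilities"].items():
            grouped[r["segment"]][item].append(u)
    return {
        seg: {item: mean(vals) for item, vals in items.items()}
        for seg, items in grouped.items()
    }

means = segment_means(respondents)
for seg, items in means.items():
    top = max(items, key=items.get)
    print(f"{seg}: top item = {top}, mean utility = {items[top]:.2f}")
```

Aggregate counts would blend these two groups into a single ranking. Respondent-level scores let you see that the segments prioritize different items.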
Case 2 in Practice
Case 2 BWS is less common but fills a specific gap that neither MaxDiff nor conjoint addresses well.
In a standard conjoint study, you learn which attributes are important through inferred trade-offs. You never ask respondents directly "what's the best part of this product?" Case 2 BWS does exactly that: shows a product configuration and asks which feature is the highlight and which is the weakness.
This is particularly useful for:
- Product concept diagnostics: After a conjoint study identifies the winning product configuration, use Case 2 to understand why it won. Which specific features are driving preference?
- Advertising feedback: Show a proposed ad and ask which element (headline, image, claim, offer) is most compelling and which is weakest.
- Patient experience: Present a treatment profile and ask which aspect is most and least acceptable.
Case 2 requires more careful design than Case 1 because the profiles themselves need to be meaningful. Use it alongside other methods rather than as a standalone exercise.
Case 3 vs Standard Conjoint
Case 3 BWS-DCE is the most methodologically complex variant. It uses the same multi-profile choice task structure as conjoint but collects two responses per task (best and worst profile) instead of one (most preferred).
The research on Case 3 suggests it produces:
- More stable individual-level estimates at smaller sample sizes
- Better discrimination between close alternatives
- Comparable or better predictive validity versus standard CBC
The trade-off is respondent burden. Picking both a best and worst profile per task takes longer than picking just one. For studies with tight time constraints, standard CBC may be more practical. For studies where sample is expensive or limited (healthcare, B2B niche audiences), Case 3 gives you more data per respondent.
Choosing the Right Case
| Question | Use This Case |
|---|---|
| Which items matter most from a list? | Case 1 (MaxDiff) |
| What's the best/worst part of a specific product? | Case 2 (Profile) |
| Which product configuration is preferred and why? | Case 3 (Multi-Profile) |
| Simple feature ranking with clear priorities? | Case 1 |
| Diagnostic feedback on a product concept? | Case 2 |
| Conjoint-like trade-offs with a small sample? | Case 3 |
For most commercial research projects, Case 1 (MaxDiff) is the right starting point. It's simpler to design, easier to analyze, and answers the most common prioritization questions. Move to Case 2 or 3 when you have specific diagnostic or efficiency needs that Case 1 can't address.
Frequently Asked Questions
Is best-worst scaling the same as MaxDiff?
MaxDiff is Case 1 of best-worst scaling. In everyday research practice, most people use the terms interchangeably because Case 1 is by far the most common application. The distinction matters in academic literature, where BWS Cases 2 and 3 have distinct methodological properties.
Which case should I use for my first BWS study?
Start with Case 1 (MaxDiff). It's the simplest to design, the most widely supported by software, and addresses the most common research question: "Which items matter most?" Cases 2 and 3 are for specific situations where you need concept diagnostics or conjoint-like efficiency with smaller samples.
Can I combine different BWS cases in one study?
Yes. A common design uses Case 1 MaxDiff to prioritize a list of features, then Case 2 to diagnose why a specific product concept performs well or poorly. The two exercises measure different things and don't conflict.
What sample size does each case need?
- Case 1: 150-200 for aggregate results, 300+ for segment analysis
- Case 2: 200-300 (similar to Case 1 but with more complex designs)
- Case 3: 150-250 (more efficient per respondent than standard conjoint)
Related Guides
- MaxDiff Analysis: Complete Guide -- Detailed Case 1 methodology
- How to Design a MaxDiff Survey -- Experimental design and fielding
- MaxDiff vs Conjoint -- When to use each method
- Conjoint Analysis -- Multi-attribute trade-off methodology
- MaxDiff Survey Template -- Ready-to-use Case 1 template
- Likert Scale -- The rating-scale alternative to BWS