MaxDiff Analysis: Complete Guide for Researchers
What Is MaxDiff Analysis?
MaxDiff analysis (maximum difference scaling) is a survey-based research method that identifies how people prioritize a list of items by repeatedly asking them to pick the most and least important options from small subsets. Instead of rating every item on a scale (where everything clusters around "important"), MaxDiff forces trade-offs that reveal genuine preferences with clear separation between items.
The method is also called best-worst scaling (BWS), and the two terms are used interchangeably in practice. Jordan Louviere developed the approach in the early 1990s as a response to the known limitations of rating scales. It's since become standard for feature prioritization, message testing, brand attribute ranking, and any research question that asks "which of these matters most?"
Why MaxDiff Beats Rating Scales
Rating scales have a well-documented problem: respondents tend to rate everything as important. Ask 500 people to rate 20 product features on a 1-10 scale, and you'll get 15 features clustered between 7 and 9. That data doesn't help you decide what to build next.
Microsoft encountered this exact issue when prioritizing features for Windows. Likert importance ratings returned 85%+ of features rated as "important" or "very important," providing no meaningful differentiation. After switching to MaxDiff, they got a clear priority ranking that actually informed the product roadmap.
Once rescaled to a share-of-preference scale, MaxDiff scores can be read as ratios. If Feature A scores 15 and Feature B scores 5, you can say Feature A is three times as preferred. You can't make that claim with Likert data. MaxDiff also sidesteps scale-use bias (some respondents use the top of the scale for everything, others bunch toward the middle) because respondents make choices instead of assigning numbers, which makes cross-cultural and cross-segment comparisons more reliable.
When to Use MaxDiff
| Use MaxDiff When... | Don't Use MaxDiff When... |
|---|---|
| You need to rank/prioritize a list of 10-30 items | You need to understand trade-offs between feature combinations (use conjoint) |
| You want clear differentiation between items | You need absolute satisfaction or agreement scores |
| You're comparing preferences across segments or markets | Your list has fewer than 10 items (simple ranking works) |
| You need ratio-scaled data (A is 3x preferred to B) | You need historical comparability with existing Likert data |
| Budget or sample size is limited | You need to calculate willingness to pay |
Common applications:
- Feature prioritization: Which features should the product team build next?
- Message testing: Which value propositions resonate most with each audience segment?
- Brand attribute importance: Which brand attributes drive purchase decisions?
- Employee engagement drivers: Which workplace factors matter most to retention?
- Advertising claims testing: Which claims are most compelling and which fall flat?
How MaxDiff Works
The Respondent Experience
A respondent sees a set of 4-5 items drawn from a larger list. They pick the one that's "most important" (or most preferred, most appealing) and the one that's "least important." Then they see a new set with different items and repeat the process.
A typical MaxDiff exercise shows 10-15 sets. The experimental design ensures each item appears an equal number of times across all sets and is paired with every other item roughly equally, so the analysis can produce a clean ranking.
What Happens Behind the Scenes
Each best/worst choice provides two data points: one positive (the "best" pick) and one negative (the "worst" pick). Across many sets, items that consistently get picked as "best" accumulate high scores, and items that consistently get picked as "worst" accumulate low scores.
The analysis (typically hierarchical Bayesian estimation for individual-level scores, or simple counting methods for aggregate scores) produces a utility score for each item. Scores are usually rescaled to a ratio scale on which every item falls between 0 and 100 and all items together sum to 100, so each score reads as that item's share of total preference.
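The simple counting method can be sketched in a few lines of Python. This is a minimal illustration, not HB estimation: the record layout (`shown`, `best`, `worst`) and the item labels are assumptions made for the example, not the output format of any particular survey tool.

```python
from collections import Counter

def count_scores(tasks):
    """Aggregate best-minus-worst counting scores.

    Each task is a dict holding the items shown, the item picked as best,
    and the item picked as worst (a hypothetical record layout).
    Returns {item: (best_count - worst_count) / times_shown}, in [-1, 1].
    """
    shown, best, worst = Counter(), Counter(), Counter()
    for task in tasks:
        shown.update(task["shown"])
        best[task["best"]] += 1
        worst[task["worst"]] += 1
    return {item: (best[item] - worst[item]) / shown[item] for item in shown}

# Three made-up tasks over five items, four items per set.
tasks = [
    {"shown": ["A", "B", "C", "D"], "best": "A", "worst": "D"},
    {"shown": ["A", "B", "C", "E"], "best": "A", "worst": "C"},
    {"shown": ["B", "C", "D", "E"], "best": "B", "worst": "D"},
]
scores = count_scores(tasks)
for item, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{item}: {score:+.2f}")
```

Counting scores run from -1 (always picked as worst) to +1 (always picked as best). They give a quick ranking but are not the sum-to-100 scores that model-based estimation produces.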
How to Design a MaxDiff Study
Step 1: Define Your Item List
Start with 10-30 items. Fewer than 10 and you're not getting much value over a simple ranking question. More than 30 and the survey gets long because each item needs to appear enough times for reliable estimation.
Items should be at the same level of abstraction. Don't mix specific features ("dark mode") with broad categories ("better user experience"). Each item should be understandable in a few words without additional context.
Step 2: Set the Number of Items Per Set
Show 4-5 items per set. Four is the most common default and works well for most studies. Five items per set collects slightly more information per task but adds cognitive load. Going above 5 per set is rare and generally not recommended.
Step 3: Determine the Number of Sets
Each item should appear at least 3 times across a respondent's sets (more is better). Use this formula as a starting point:
Minimum sets = (number of items x 3) / items per set
For 20 items shown 4 per set: 20 x 3 / 4 = 15 sets. That's a reasonable respondent burden. For 30 items: 30 x 3 / 4 = 22.5, which rounds up to 23 sets and goes past what most respondents will tolerate. Keep total sets at or below 20 when possible. With 30 items, showing 5 per set brings the minimum down to 30 x 3 / 5 = 18 sets.
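The formula is easy to wrap in a helper that rounds up to a whole number of sets. This is a small sketch; the function name `minimum_sets` and the default of 3 appearances per item are choices made for the example.

```python
import math

def minimum_sets(n_items, items_per_set, min_appearances=3):
    """Smallest whole number of sets in which every item can appear min_appearances times."""
    return math.ceil(n_items * min_appearances / items_per_set)

for n_items, per_set in [(20, 4), (30, 4), (30, 5)]:
    print(f"{n_items} items, {per_set} per set: {minimum_sets(n_items, per_set)} sets")
```

Rounding up matters: a fractional result such as 22.5 means 22 sets are not enough for every item to reach three appearances.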
Step 4: Generate the Experimental Design
The design determines which items appear together in each set. It needs to balance two things: every item appears an equal number of times, and every pair of items appears together roughly equally across all sets.
Most MaxDiff software generates balanced incomplete block designs automatically. Check that no item is over- or under-represented, and that no pair of items always appears together (which would confound their effects).
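If you want to run those checks yourself, the sketch below builds a simple design and counts item and pair frequencies. It is a randomized stand-in, not a true balanced incomplete block design and not the algorithm any particular package uses; the function names and the greedy "least-shown items first" rule are assumptions made for illustration.

```python
import random
from collections import Counter
from itertools import combinations

def make_design(n_items, items_per_set, n_sets, seed=0):
    """Build sets by always drawing the least-shown items, breaking ties at random."""
    rng = random.Random(seed)
    counts = Counter({item: 0 for item in range(n_items)})
    design = []
    for _ in range(n_sets):
        ranked = sorted(range(n_items), key=lambda i: (counts[i], rng.random()))
        chosen = ranked[:items_per_set]
        for item in chosen:
            counts[item] += 1
        design.append(chosen)
    return design

def check_design(design):
    """Count how often each item and each pair of items appears."""
    item_counts = Counter(item for s in design for item in s)
    pair_counts = Counter(frozenset(p) for s in design for p in combinations(s, 2))
    return item_counts, pair_counts

design = make_design(n_items=20, items_per_set=4, n_sets=15)
item_counts, pair_counts = check_design(design)
print("appearances per item:", sorted(set(item_counts.values())))
print("most frequent pair appears", max(pair_counts.values()), "times")
```

With 20 items and 15 sets of 4, one respondent sees only 90 of the 190 possible pairs, so pair balance is normally reached across many design versions, not within a single one.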
Step 5: Choose Your Scale Framing
The question wording matters. "Most important / Least important" is the default, but you can adapt:
- "Most appealing / Least appealing" for message testing
- "Most likely to influence purchase / Least likely" for feature studies
- "Best describes [brand] / Least describes [brand]" for brand perception
Use framing that matches how respondents naturally think about the items.
Step 6: Field and Analyze
Launch to your target sample (150-200 respondents for aggregate results, 200+ per segment for segment comparisons). Run HB estimation for individual-level scores or simple counting for quick aggregate results.
Sample Size Requirements
MaxDiff is less sample-hungry than conjoint analysis because each choice task is simpler:
| Analysis Level | Recommended Sample |
|---|---|
| Aggregate ranking (overall priorities) | 150-200 |
| Segment-level comparison (2-3 segments) | 200+ per segment |
| Individual-level scores (latent class, clustering) | 300-500 |
With fewer than 100 respondents, aggregate count-based analysis still produces a usable ranking, but you won't have the precision for segment splits or individual-level modeling.
For more details, see the MaxDiff sample size guide.
How to Interpret MaxDiff Results
Utility Scores
The primary output is a utility score for each item, typically rescaled to sum to 100 across all items. Higher scores mean stronger preference.
Example output from a SaaS feature prioritization study (20 features tested, top 10 shown):
| Feature | Utility Score |
|---|---|
| Real-time collaboration | 12.4 |
| Offline access | 9.8 |
| Custom reporting | 8.7 |
| API access | 7.5 |
| Mobile app | 7.1 |
| SSO/SAML | 6.3 |
| Slack integration | 5.2 |
| Dark mode | 4.8 |
| Custom branding | 3.9 |
| Gantt charts | 3.1 |
Because MaxDiff produces ratio-scaled data, you can say real-time collaboration (12.4) is roughly 4x as preferred as Gantt charts (3.1). That's a meaningful quantitative statement you can't make with rating-scale data.
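Scores that sum to 100 are usually derived from raw, logit-scale utilities. One common approach is to exponentiate and normalize, sketched below. Packages differ in the exact rescaling they apply, so treat this as an illustration; the raw utilities and the function name `rescale_to_100` are made up for the example.

```python
import math

def rescale_to_100(raw_utilities):
    """Convert raw logit-scale utilities to shares that sum to 100."""
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(raw_utilities.values())
    exp_u = {item: math.exp(u - top) for item, u in raw_utilities.items()}
    total = sum(exp_u.values())
    return {item: 100 * value / total for item, value in exp_u.items()}

# Hypothetical raw utilities for four items.
raw = {"Item A": 1.2, "Item B": 0.3, "Item C": -0.4, "Item D": -1.1}
shares = rescale_to_100(raw)
for item, share in shares.items():
    print(f"{item}: {share:.1f}")
print("A vs B ratio:", round(shares["Item A"] / shares["Item B"], 2))
```

Under this rescaling the ratio between two items depends only on the difference between their raw utilities, which is what makes "A is N times as preferred as B" a meaningful statement.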
Segment Comparisons
The real power of MaxDiff emerges when you compare scores across segments. Enterprise buyers might rank API access and SSO at the top while SMB buyers prioritize mobile app and real-time collaboration. These differences shape your product roadmap and go-to-market messaging for each audience.
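A segment comparison is, at its simplest, a difference in mean scores. The sketch below assumes you already have individual-level scores per respondent; the record layout, segment labels, and numbers are hypothetical and chosen only to mirror the enterprise-versus-SMB example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical individual-level scores (for example, from HB estimation).
respondents = [
    {"segment": "Enterprise", "scores": {"API access": 14.0, "SSO/SAML": 12.0, "Mobile app": 4.0}},
    {"segment": "Enterprise", "scores": {"API access": 12.0, "SSO/SAML": 10.0, "Mobile app": 6.0}},
    {"segment": "SMB", "scores": {"API access": 5.0, "SSO/SAML": 3.0, "Mobile app": 13.0}},
    {"segment": "SMB", "scores": {"API access": 7.0, "SSO/SAML": 5.0, "Mobile app": 11.0}},
]

def segment_means(respondents):
    """Average each item's score within each segment."""
    by_segment = defaultdict(lambda: defaultdict(list))
    for r in respondents:
        for item, score in r["scores"].items():
            by_segment[r["segment"]][item].append(score)
    return {
        segment: {item: mean(values) for item, values in items.items()}
        for segment, items in by_segment.items()
    }

means = segment_means(respondents)
gaps = {item: means["Enterprise"][item] - means["SMB"][item] for item in means["Enterprise"]}
for item, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{item}: Enterprise minus SMB = {gap:+.1f}")
```

In a real study you would also test whether the gaps are larger than sampling noise before acting on them, which is why segment comparisons need an adequate sample in each segment.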
Threshold Analysis
Look for natural break points in the score distribution. Often you'll see a cluster of high-priority items, a middle group, and a tail of low-priority items. The gaps between clusters tell you where the meaningful priority thresholds sit.
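One simple way to locate break points is to sort the scores and look for the largest gaps between neighbors. The sketch below uses the ten scores from the example table above (only the top 10 of 20 features, so it illustrates the mechanics, not a full analysis). The function name `largest_gaps` and the choice of two breaks are assumptions.

```python
scores = {
    "Real-time collaboration": 12.4, "Offline access": 9.8, "Custom reporting": 8.7,
    "API access": 7.5, "Mobile app": 7.1, "SSO/SAML": 6.3, "Slack integration": 5.2,
    "Dark mode": 4.8, "Custom branding": 3.9, "Gantt charts": 3.1,
}

def largest_gaps(scores, n_breaks=2):
    """Return the n_breaks largest gaps between adjacent items in the ranking."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    gaps = [
        (round(upper[1] - lower[1], 2), upper[0], lower[0])
        for upper, lower in zip(ranked, ranked[1:])
    ]
    return sorted(gaps, reverse=True)[:n_breaks]

breaks = largest_gaps(scores)
for gap, above, below in breaks:
    print(f"gap of {gap} between '{above}' and '{below}'")
```

On these numbers the largest gaps fall after real-time collaboration (2.6) and after custom reporting (1.2), which suggests one standout item, a strong second tier of two, and the remaining items below.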
For more on reading MaxDiff output, see the interpretation guide.
Real-World Examples
SaaS: Feature Roadmap Prioritization
A project management SaaS company tested 25 potential features with 400 current users. The MaxDiff revealed that real-time collaboration and offline access dominated the top, while features the product team had been discussing for months (custom themes, advanced permissions) scored in the bottom quartile. The company reprioritized two engineering sprints based on the results, accelerating collaboration features by a full quarter.
CPG: Packaging Claim Testing
A snack brand tested 15 front-of-pack claims (organic, non-GMO, high protein, low sugar, locally made, etc.) with 500 grocery shoppers. "High protein" and "low sugar" scored 2.5x higher than "locally made" and "non-GMO," despite internal marketing assumptions that sustainability claims would lead. The brand redesigned their packaging hierarchy to lead with nutritional claims.
Healthcare: Treatment Attribute Importance
A hospital system tested 18 attributes of outpatient care experience with 300 patients. Wait time and provider communication topped the list. Parking availability and check-in technology, which the system was investing heavily in, ranked 15th and 16th. The results shifted capital allocation from facility upgrades to staffing and scheduling improvements.
Common Mistakes
Too many items. Beyond 30 items, the survey becomes tedious and the design requires too many sets. If you have 40+ items, pre-screen them with a qualitative phase and reduce to 25-30 for the MaxDiff.
Mixing abstraction levels. "Better UI design" and "Fix the login bug on Safari" shouldn't be in the same MaxDiff. Items need to be comparable in scope.
Ignoring the "worst" data. The least-preferred items are as informative as the most-preferred ones. They tell you what to de-prioritize, which is sometimes the more valuable finding.
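Using MaxDiff when you need attribute trade-offs. MaxDiff scores each item on its own; it does not model how features combine or how they interact with price. It can't tell you "how much more would customers pay for Feature A vs. Feature B." For that kind of trade-off analysis, use conjoint.
Insufficient items per set. Showing only 3 items per set wastes respondent effort. Every task costs the respondent the same two picks (one "best," one "worst"), but those two picks imply more pairwise comparisons when the set is larger: 3 comparisons with 3 items, 5 with 4 items, and 7 with 5 items. Sets of 4-5 items therefore extract more information per task and need fewer tasks to cover the list.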
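Although each task records only two picks, the number of pairwise orderings those picks imply grows with set size. The count below makes that concrete. It is a simple combinatorial sketch; the function name `implied_comparisons` is made up for the example.

```python
from math import comb

def implied_comparisons(items_per_set):
    """Pairwise orderings revealed by one best pick and one worst pick."""
    k = items_per_set
    total_pairs = comb(k, 2)
    # The best item beats the other k - 1 items; the worst item loses to
    # the remaining k - 2 items (its comparison with the best is already counted).
    revealed = (k - 1) + (k - 2)
    return revealed, total_pairs

for k in (3, 4, 5):
    revealed, total = implied_comparisons(k)
    print(f"{k} items per set: {revealed} of {total} pairs ordered per task")
```

Going from 3 to 4 items per set raises the yield from 3 to 5 comparisons for the same two picks, which is why 4-5 items per set is the usual recommendation.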
MaxDiff vs Alternatives
| Feature | MaxDiff | Likert Scale | Ranking | Constant Sum | Conjoint |
|---|---|---|---|---|---|
| Best for | Prioritizing 10-30 items | Measuring agreement/satisfaction | Simple ordering of 5-10 items | Allocating importance across 5-8 items | Feature trade-off analysis |
| Scale type | Ratio | Ordinal (treated as interval) | Ordinal | Ratio | Interval/ratio |
| Discrimination | High | Low (everything clusters) | Medium | Medium | High |
| Scale-use bias | None | High | Low | Medium | None |
| Respondent burden | Medium (10-15 tasks) | Low | Medium (above 7 items) | High (above 8 items) | High (10-15 tasks) |
| Individual-level data | Yes (with HB) | Yes | Not really | Not really | Yes (with HB) |
| Cross-cultural comparability | High | Low | Medium | Medium | High |
When to pick each: MaxDiff for prioritizing long lists. Likert for measuring intensity of agreement on individual statements. Ranking for quick ordering of short lists. Constant sum when you need explicit allocation. Conjoint when features interact and you need trade-off modeling.
How Quali-Fi Supports MaxDiff
Quali-Fi includes MaxDiff as a built-in question type across all product tiers. You define your item list, set the number of items per set and number of sets, and the platform generates a balanced experimental design automatically.
Respondents see clean, mobile-friendly best/worst selection screens. Analysis runs automatically as responses come in, producing utility scores, segment comparisons, and downloadable data for further analysis. You can embed MaxDiff within a larger survey alongside other question types, screening logic, and custom branding without needing a separate tool.
For larger studies with latent class segmentation or anchored MaxDiff designs, Quali-Fi's Professional Services team handles the advanced analysis.
Frequently Asked Questions
How many items can I test in a MaxDiff study?
The practical range is 10-30 items. Below 10, a simple ranking question works fine. Above 30, the survey becomes too long because each item needs to appear at least 3 times. If you have 40+ items, run a qualitative pre-screen to shortlist, then test the top 25-30 in MaxDiff.
What's the difference between MaxDiff and best-worst scaling?
They're the same thing. MaxDiff (maximum difference scaling) is the industry term used in market research. Best-worst scaling (BWS) is the academic term. Both refer to the same methodology developed by Jordan Louviere. Technically, BWS has three cases (object scaling, attribute scaling, and multi-profile), and MaxDiff corresponds to Case 1 (object scaling), but in practice the terms are interchangeable.
Can MaxDiff measure absolute importance?
Standard MaxDiff measures relative importance only. Item A is preferred 3x more than Item B, but you don't know if either is "important" in an absolute sense. Anchored MaxDiff addresses this by adding a threshold question (e.g., "Would you actually pay for this feature?"), which separates items respondents truly want from items that are merely "best of a bad list."
How long does a MaxDiff survey take?
A typical 20-item MaxDiff with 15 sets takes 3-5 minutes for the MaxDiff portion. Add screening and demographic questions, and total survey time is usually 8-12 minutes. That's significantly shorter than a conjoint study, which makes MaxDiff a good choice when you need quick turnaround or have limited respondent attention.
Can I use MaxDiff for pricing research?
Not directly. MaxDiff tells you which features are most valued but can't quantify willingness to pay. For pricing, use Van Westendorp (simple price sensitivity) or conjoint analysis (price as one attribute alongside features). You can pair MaxDiff with a separate pricing method: MaxDiff to prioritize features, then conjoint to optimize the price-feature bundle.
MaxDiff's real advantage isn't that it's statistically sophisticated, though it is. It's that it forces honesty. When respondents have to pick a worst alongside a best, they can't inflate every item to "very important." The data you get back actually differentiates. That's rarer than it should be.
Related Guides
- Best-Worst Scaling -- The academic foundations of MaxDiff methodology
- MaxDiff vs Likert Scale -- When to switch from rating scales to MaxDiff
- How to Design a MaxDiff Survey -- Item selection, set design, and fielding tips
- MaxDiff Sample Size Requirements -- How many respondents you need
- How to Interpret MaxDiff Results -- Reading scores, segments, and thresholds
- MaxDiff vs Conjoint -- Choosing between prioritization and trade-off analysis
- Feature Prioritization with MaxDiff -- Product team use cases
- Conjoint Analysis -- For multi-attribute trade-off modeling
- TURF Analysis -- For portfolio optimization using MaxDiff data
- MaxDiff Survey Template -- Ready-to-use MaxDiff template