How to Interpret MaxDiff Results
What MaxDiff Output Looks Like
MaxDiff analysis produces a utility score for each item in your list. Once rescaled, each score represents the item's share of total preference, and the rescaled scores are ratio-scaled: if Item A scores 12 and Item B scores 4, respondents prefer A three times as much as B. This quantitative precision is what separates MaxDiff from rating scales, where everything clusters near the top.
Understanding how to read these scores, spot meaningful patterns, and translate them into decisions is where the method's real value sits.
Reading Utility Scores
The Basics
Scores are typically rescaled to sum to 100 across all items. Each score represents the percentage of total preference that item captures. Here's an example from a 15-item feature prioritization study:
| Rank | Feature | Score |
|---|---|---|
| 1 | Real-time collaboration | 12.8 |
| 2 | Offline access | 10.1 |
| 3 | Custom reporting | 8.4 |
| 4 | API access | 7.9 |
| 5 | Mobile app | 7.2 |
| 6 | SSO/SAML | 6.5 |
| 7 | Slack integration | 5.8 |
| 8 | Dark mode | 5.1 |
| 9 | Custom branding | 4.6 |
| 10 | Gantt charts | 4.2 |
| 11 | Time tracking | 3.8 |
| 12 | Resource allocation | 3.5 |
| 13 | Guest access | 2.9 |
| 14 | Custom fields | 2.4 |
| 15 | Emoji reactions | 1.8 |
What You Can Say
- Real-time collaboration is about 7x more preferred than emoji reactions (12.8 / 1.8 ≈ 7.1).
- The top 3 features capture 31.3% of total preference (12.8 + 10.1 + 8.4).
- The bottom 5 features account for only 14.4% combined (3.8 + 3.5 + 2.9 + 2.4 + 1.8).
These are ratio-scale statements. You can't make them with Likert data, where the same features might all score between 3.8 and 4.5 on a 5-point scale.
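As a minimal sketch, the statements above can be reproduced from the table with a few lines of Python. The scores are copied from the example table; the helper names `preference_ratio` and `combined_score` are illustrative, not part of any MaxDiff package.

```python
# Rescaled scores copied from the example table.
scores = {
    "Real-time collaboration": 12.8, "Offline access": 10.1,
    "Custom reporting": 8.4, "API access": 7.9, "Mobile app": 7.2,
    "SSO/SAML": 6.5, "Slack integration": 5.8, "Dark mode": 5.1,
    "Custom branding": 4.6, "Gantt charts": 4.2, "Time tracking": 3.8,
    "Resource allocation": 3.5, "Guest access": 2.9,
    "Custom fields": 2.4, "Emoji reactions": 1.8,
}

def preference_ratio(scores, item_a, item_b):
    """How many times as preferred item_a is compared with item_b."""
    return scores[item_a] / scores[item_b]

def combined_score(scores, items):
    """Sum of scores for a group of items, rounded to one decimal."""
    return round(sum(scores[item] for item in items), 1)

# Items sorted from highest to lowest score.
ranked = sorted(scores, key=scores.get, reverse=True)

ratio = preference_ratio(scores, "Real-time collaboration", "Emoji reactions")
top3 = combined_score(scores, ranked[:3])
bottom5 = combined_score(scores, ranked[-5:])

print(f"Top vs. bottom item: {ratio:.1f}x")
print(f"Top 3 combined: {top3}")
print(f"Bottom 5 combined: {bottom5}")
```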
What You Can't Say
MaxDiff scores are relative. They don't tell you whether any item is "important" in an absolute sense. If all 15 features are niche and unexciting, the top-ranked one still scores 12.8. The score means it's the most preferred, not that customers urgently want it.
To address this limitation, consider anchored MaxDiff, which adds a threshold question ("Would you actually pay for / use this feature?") that separates genuinely desired items from items that are merely "best of the set."
Identifying Priority Tiers
Raw utility scores give you a continuous ranking, but business decisions usually need discrete categories: "build this," "consider this," "skip this."
Look for Natural Breaks
Plot the scores as a bar chart sorted by value. You'll often see visible gaps where clusters form:
- Tier 1 (must-build): Items 1-3 in the example above. The highest scores, collectively capturing 31% of preference.
- Tier 2 (should-build): Items 4-7. Solid scores but noticeably lower than Tier 1.
- Tier 3 (nice-to-have): Items 8-12. Moderate scores, no single item stands out.
- Tier 4 (deprioritize): Items 13-15. Minimal preference, safe to defer.
The break points depend on your specific data distribution. Sometimes there's a clear cliff between items 5 and 6. Sometimes the curve is smooth and you need to set thresholds based on business logic ("top quartile gets resourced this quarter").
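One rough way to look for natural breaks without a chart is to list the largest drops between adjacent ranks. The sketch below uses the scores from the example table; the function name `largest_gaps` and the choice to report the top two drops are assumptions made for illustration.

```python
# Scores from the example table, already sorted from rank 1 to rank 15.
scores = [12.8, 10.1, 8.4, 7.9, 7.2, 6.5, 5.8, 5.1,
          4.6, 4.2, 3.8, 3.5, 2.9, 2.4, 1.8]

def largest_gaps(sorted_scores, n=3):
    """Return the n largest drops between adjacent ranks.

    Each result is (rank_above, gap), where gap is the score at rank_above
    minus the score at the next rank, rounded to one decimal.
    """
    pairs = zip(sorted_scores, sorted_scores[1:])
    gaps = [
        (rank, round(upper - lower, 1))
        for rank, (upper, lower) in enumerate(pairs, start=1)
    ]
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)[:n]

for rank, gap in largest_gaps(scores, n=2):
    print(f"Drop of {gap} points between rank {rank} and rank {rank + 1}")
```

In this example the two largest drops sit at the very top of the list, and every later step is 0.7 points or less. That is the smooth-curve situation, where the remaining tier boundaries come from business logic rather than from the data.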
Use the Ratio Test
For any pair of items, check the ratio. If Item A is 2x or more preferred than Item B, there's meaningful separation. If they're within 1.2x of each other (e.g., 5.8 vs. 5.1), they're effectively tied for practical purposes.
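The ratio test is simple enough to express as a small helper. This is a sketch, assuming positive rescaled scores; the 2x and 1.2x thresholds come from the text, while the "judgment call" label for ratios in between is an assumption, since the rule of thumb does not cover that range.

```python
def ratio_test(score_a, score_b, tie_threshold=1.2, separation_threshold=2.0):
    """Classify a pair of positive rescaled scores by larger / smaller ratio."""
    high, low = max(score_a, score_b), min(score_a, score_b)
    ratio = high / low
    if ratio >= separation_threshold:
        verdict = "meaningful separation"
    elif ratio <= tie_threshold:
        verdict = "effectively tied"
    else:
        # Not covered by the rule of thumb; labeled here as a judgment call.
        verdict = "judgment call"
    return ratio, verdict

# Slack integration (5.8) vs. dark mode (5.1) from the example table.
ratio, verdict = ratio_test(5.8, 5.1)
print(f"{ratio:.2f}x: {verdict}")

# Real-time collaboration (12.8) vs. dark mode (5.1).
ratio, verdict = ratio_test(12.8, 5.1)
print(f"{ratio:.2f}x: {verdict}")
```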
Segment Comparisons
MaxDiff's analytical value multiplies when you compare scores across segments. The same item list often produces dramatically different priorities for different audiences.
How to Compare
Run hierarchical Bayes (HB) estimation separately for each segment, then compare the item rankings side by side. Look for:
- Items that rank high for all segments: These are table stakes. Everyone wants them, so they're foundational features, not differentiators.
- Items that rank high for one segment but low for another: These are segment-specific opportunities. API access might rank #2 for developers and #14 for marketing teams.
- Items where segments directly conflict: Rare but actionable. If enterprise buyers prioritize compliance features and SMB buyers rank those same features last, you can't serve both with the same product positioning.
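A side-by-side comparison can be sketched by ranking items within each segment and sorting by the size of the rank difference. The segment names and all scores below are hypothetical, invented only to show the mechanics; `rank_items` and `compare_segments` are illustrative helper names.

```python
# Hypothetical rescaled scores per segment (illustrative, not study data).
segment_scores = {
    "developers": {
        "API access": 30.0, "Offline access": 25.0, "SSO/SAML": 20.0,
        "Mobile app": 15.0, "Custom branding": 10.0,
    },
    "marketing": {
        "Custom branding": 30.0, "Offline access": 26.0, "Mobile app": 22.0,
        "SSO/SAML": 14.0, "API access": 8.0,
    },
}

def rank_items(scores):
    """Map each item to its rank (1 = highest score) within one segment."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {item: position for position, item in enumerate(ordered, start=1)}

def compare_segments(segment_scores, seg_a, seg_b):
    """Return (item, rank_a, rank_b, rank_b - rank_a), biggest gaps first."""
    ranks_a = rank_items(segment_scores[seg_a])
    ranks_b = rank_items(segment_scores[seg_b])
    rows = [
        (item, ranks_a[item], ranks_b[item], ranks_b[item] - ranks_a[item])
        for item in ranks_a
    ]
    return sorted(rows, key=lambda row: abs(row[3]), reverse=True)

rows = compare_segments(segment_scores, "developers", "marketing")
for item, rank_a, rank_b, diff in rows:
    print(f"{item}: developers #{rank_a}, marketing #{rank_b} ({diff:+d})")
```

Items with a rank difference near zero and a high rank in both segments are the table-stakes candidates; items at the top of this output are the segment-specific ones.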
Statistical Testing
Don't assume that a rank difference equals a meaningful difference. If Feature X scores 8.4 in Segment A and 7.1 in Segment B, those scores might not be statistically distinguishable given the confidence intervals. Most HB software outputs standard errors. Use them to test whether between-segment differences are real before making product decisions based on segment splits.
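As a rough sketch of such a check, the snippet below runs a two-sided z-test on the difference between two segment scores. It assumes independent segments and approximately normal estimates; the scores 8.4 and 7.1 come from the text, while the standard errors 0.6 and 0.7 are hypothetical. If your software provides posterior draws, comparing those directly may be more appropriate than this approximation.

```python
from math import sqrt
from statistics import NormalDist

def segment_difference_test(mean_a, se_a, mean_b, se_b):
    """Two-sided z-test for the difference between two independent estimates."""
    # Standard error of the difference, assuming independent segments.
    se_diff = sqrt(se_a ** 2 + se_b ** 2)
    z = (mean_a - mean_b) / se_diff
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Scores from the text; the standard errors are hypothetical.
z, p = segment_difference_test(8.4, 0.6, 7.1, 0.7)
print(f"z = {z:.2f}, p = {p:.3f}")
print("Distinguishable at the 5% level" if p < 0.05 else "Not distinguishable at the 5% level")
```

With these assumed standard errors, the 1.3-point difference does not clear the conventional 5% threshold, which illustrates why a visible gap between segments is not enough on its own.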
Presenting MaxDiff Results to Stakeholders
Lead With the Priority Ranking
A horizontal bar chart sorted by score is the most intuitive visualization. Color-code the tiers (green for must-build, yellow for consider, gray for defer). This single visual answers the core question: "What should we focus on?"
Show Segment Contrasts
A side-by-side bar chart or heat map comparing segment rankings gives stakeholders the "so what" beyond the overall ranking. "Enterprise wants API access and SSO. SMB wants mobile and offline. Our product roadmap should reflect that split."
Use Ratios, Not Raw Scores
"Collaboration is 7x more preferred than emoji reactions" is more impactful and easier to remember than "collaboration scored 12.8 and emoji reactions scored 1.8." Translate the data into language that matches how your stakeholders make decisions.
Avoid Over-Precision
Don't present MaxDiff scores to two decimal places and treat the ranking as a rigid hierarchy. The difference between rank 7 (5.8) and rank 8 (5.1) may not be meaningful. Present tiers and clear priority groups, not a 15-item ranked list where every position matters.
Common Interpretation Mistakes
Treating relative as absolute. The top-scored item isn't necessarily "important." It's the most preferred in your list. If the list contains weak items, the winner is the least weak.
Ignoring the bottom of the list. The lowest-scoring items tell you what to stop investing in. That's often more actionable than the top of the list.
Comparing scores across studies. A score of 12.8 in one study and 12.8 in another doesn't mean the same thing. MaxDiff scores are relative to the item list. Different items, different study, different scores.
Reporting without context. Always pair MaxDiff scores with segment data and, if possible, behavioral data (usage, purchase history). Stated preference and actual behavior don't always align.
Frequently Asked Questions
Can I compare MaxDiff scores to Likert importance ratings?
Not directly. They measure different things on different scales. You can run both in the same survey (on different items) and compare directionally, but the numbers aren't on the same metric.
What's a "good" MaxDiff score?
There's no universal benchmark. Scores are relative to your item list and study design. Focus on the spread between items (is there meaningful separation?) and the ratio between top and bottom items (a wide ratio means clear differentiation; a narrow ratio means no strong preferences).
Should I report raw utilities or rescaled scores?
Rescaled scores (summing to 100) are easier for non-technical audiences to understand and compare. Raw utilities are more useful for advanced analysis. Report rescaled scores to stakeholders and keep raw utilities for your analysis files.
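To make the relationship between the two concrete, here is a minimal sketch that converts raw utilities into scores summing to 100. It assumes the raw utilities are on a logit scale and uses a plain softmax (multinomial-logit share); software packages differ in the exact transform they apply, so treat this as one possible rescaling rather than what your tool does. The raw utility values and the function name `rescale_to_100` are hypothetical.

```python
from math import exp

def rescale_to_100(raw_utilities):
    """Convert raw logit-scale utilities into shares that sum to 100 (softmax)."""
    weights = {item: exp(utility) for item, utility in raw_utilities.items()}
    total = sum(weights.values())
    return {item: 100 * weight / total for item, weight in weights.items()}

# Hypothetical zero-centered raw utilities for five items.
raw = {"Item A": 1.2, "Item B": 0.4, "Item C": 0.0, "Item D": -0.5, "Item E": -1.1}

rescaled = rescale_to_100(raw)
for item, score in rescaled.items():
    print(f"{item}: raw {raw[item]:+.1f} -> rescaled {score:.1f}")
```

The rank order is unchanged by the transform, but only the rescaled values support "x times as preferred" statements; differences between raw utilities do not translate into ratios directly.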
Related Guides
- MaxDiff Analysis: Complete Guide -- Full methodology overview
- How to Design a MaxDiff Survey -- Design decisions that shape your output
- MaxDiff Sample Size Requirements -- Sample planning for reliable estimates
- MaxDiff vs Likert Scale -- Understanding the output differences
- Feature Prioritization with MaxDiff -- Translating scores into product decisions
- Conjoint Analysis Interpretation -- Comparison with conjoint output