Learn how to design, run, and analyze MaxDiff (best-worst scaling) studies. Practical guide with survey design tips, sample size requirements, and worked examples.

MaxDiff Analysis: Complete Guide for Researchers

What Is MaxDiff Analysis?

MaxDiff analysis (maximum difference scaling) is a survey-based research method that identifies how people prioritize a list of items by repeatedly asking them to pick the most and least important options from small subsets. Instead of rating every item on a scale (where everything clusters around "important"), MaxDiff forces trade-offs that reveal genuine preferences with clear separation between items.

The method is also called best-worst scaling (BWS), and the two terms are used interchangeably in practice. Jordan Louviere developed the approach in the early 1990s as a response to the known limitations of rating scales. It's since become standard for feature prioritization, message testing, brand attribute ranking, and any research question that asks "which of these matters most?"

Why MaxDiff Beats Rating Scales

Rating scales have a well-documented problem: respondents tend to rate everything as important. Ask 500 people to rate 20 product features on a 1-10 scale, and you'll often find 15 of them clustered between 7 and 9. That data doesn't help you decide what to build next.

Microsoft encountered this exact issue when prioritizing features for Windows. Likert importance ratings returned 85%+ of features rated as "important" or "very important," providing no meaningful differentiation. After switching to MaxDiff, they got a clear priority ranking that actually informed the product roadmap.

MaxDiff scores, once rescaled to probability (share-of-preference) form, behave like ratio-scaled data. If Feature A scores 15 and Feature B scores 5, you can say Feature A is three times as preferred. You can't make that claim with Likert data. Because respondents choose rather than rate, MaxDiff also removes scale-use bias (some respondents use the top of the scale for everything, others bunch toward the middle), which makes cross-cultural and cross-segment comparisons more reliable.

When to Use MaxDiff

Use MaxDiff When...	Don't Use MaxDiff When...
You need to rank/prioritize a list of 10-30 items	You need to understand trade-offs between feature combinations (use conjoint)
You want clear differentiation between items	You need absolute satisfaction or agreement scores
You're comparing preferences across segments or markets	Your list has fewer than 7 items (simple ranking works)
You need ratio-scaled data (A is 3x preferred to B)	You need historical comparability with existing Likert data
Budget or sample size is limited	You need to calculate willingness to pay

Common applications:

Feature prioritization: Which features should the product team build next?
Message testing: Which value propositions resonate most with each audience segment?
Brand attribute importance: Which brand attributes drive purchase decisions?
Employee engagement drivers: Which workplace factors matter most to retention?
Advertising claims testing: Which claims are most compelling and which fall flat?

How MaxDiff Works

The Respondent Experience

A respondent sees a set of 4-5 items drawn from a larger list. They pick the one that's "most important" (or most preferred, most appealing) and the one that's "least important." Then they see a new set with different items and repeat the process.

A typical MaxDiff exercise shows 10-15 sets. The experimental design ensures each item appears an equal number of times across all sets and is paired with every other item roughly equally, so the analysis can produce a clean ranking.

What Happens Behind the Scenes

Each best/worst choice provides two data points: one positive (the "best" pick) and one negative (the "worst" pick). Across many sets, items that consistently get picked as "best" accumulate high scores, and items that consistently get picked as "worst" accumulate low scores.
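The counting logic can be sketched in a few lines of Python. This is a minimal illustration rather than any particular tool's implementation: the `tasks` format (items shown, best pick, worst pick) and the `count_scores` helper are assumptions made for the example, and the responses are invented.

```python
from collections import Counter

def count_scores(tasks):
    """Best-minus-worst count per item, divided by how often the item was shown.

    `tasks` is a list of (shown_items, best, worst) tuples (hypothetical format).
    Scores range from -1 (always picked worst) to +1 (always picked best).
    """
    shown, best, worst = Counter(), Counter(), Counter()
    for items, best_pick, worst_pick in tasks:
        shown.update(items)
        best[best_pick] += 1
        worst[worst_pick] += 1
    return {item: (best[item] - worst[item]) / shown[item] for item in shown}

# Invented responses from one respondent, 4 items per set.
tasks = [
    (("Dark mode", "API access", "Offline access", "Mobile app"), "Offline access", "Dark mode"),
    (("Dark mode", "SSO/SAML", "API access", "Gantt charts"), "API access", "Gantt charts"),
    (("Offline access", "Mobile app", "SSO/SAML", "Gantt charts"), "Offline access", "Gantt charts"),
]

scores = count_scores(tasks)
for item, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{item:15s} {score:+.2f}")
```

Dividing by the number of appearances keeps the score comparable even when the design shows some items slightly more often than others.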

The analysis (typically hierarchical Bayesian estimation for individual-level scores, or simple counting methods for aggregate scores) produces a utility score for each item. Raw model utilities sit on a logit scale, so they are usually rescaled to a 0 to 100 ratio scale on which each item's score represents its share of total preference.
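One common way to turn model utilities into share-of-preference scores is a logit (softmax) transform. The sketch below assumes the estimation step has already produced raw utilities on a logit scale; the `to_shares` helper and the utility values are hypothetical and only illustrate the rescaling, not the estimation itself.

```python
import math

def to_shares(utilities):
    """Convert logit-scale utilities to shares of preference that sum to 100."""
    top = max(utilities.values())  # subtract the max for numerical stability
    exp_u = {item: math.exp(u - top) for item, u in utilities.items()}
    total = sum(exp_u.values())
    return {item: 100 * e / total for item, e in exp_u.items()}

# Invented utilities, for illustration only.
raw = {"Offline access": 1.2, "API access": 0.4, "Dark mode": -0.3, "Gantt charts": -1.3}
shares = to_shares(raw)
for item, share in shares.items():
    print(f"{item:15s} {share:5.1f}")
```

Some tools apply a variant of this transform that also accounts for the number of items shown per set, so check your software's documentation before comparing scores across platforms.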

How to Design a MaxDiff Study

Step 1: Define Your Item List

Start with 10-30 items. Fewer than 10 and you're not getting much value over a simple ranking question. More than 30 and the survey gets long because each item needs to appear enough times for reliable estimation.

Items should be at the same level of abstraction. Don't mix specific features ("dark mode") with broad categories ("better user experience"). Each item should be understandable in a few words without additional context.

Step 2: Set the Number of Items Per Set

Show 4-5 items per set. Four is the most common default and works well for most studies. Five items per set collects slightly more information per task but adds cognitive load. Going above 5 per set is rare and generally not recommended.

Step 3: Determine the Number of Sets

Each item should appear at least 3 times across a respondent's sets (more is better). Use this formula as a starting point:

Minimum sets = (number of items x 3) / items per set, rounded up

For 20 items shown 4 per set: 20 x 3 / 4 = 15 sets. That's a reasonable respondent burden. For 30 items: 30 x 3 / 4 = 22.5, which rounds up to 23 sets and exceeds what most respondents will tolerate. Keep total sets at or below 20 when possible, for example by showing 5 items per set (30 x 3 / 5 = 18 sets).
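As a quick check on the arithmetic, the rule of thumb can be written as a small Python helper. The function name `minimum_sets` is hypothetical; the only addition to the formula is rounding up, since a respondent can't complete a fraction of a set.

```python
import math

def minimum_sets(n_items, items_per_set, min_appearances=3):
    """Smallest number of sets so each item can appear `min_appearances` times."""
    return math.ceil(n_items * min_appearances / items_per_set)

print(minimum_sets(20, 4))  # 15
print(minimum_sets(30, 4))  # 23
print(minimum_sets(30, 5))  # 18
```

Raising `min_appearances` above 3 shows how quickly survey length grows if you want more exposures per item.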

Step 4: Generate the Experimental Design

The design determines which items appear together in each set. It needs to balance two things: every item appears an equal number of times, and every pair of items appears together roughly equally across all sets.

Most MaxDiff software generates balanced incomplete block designs automatically. Check that no item is over- or under-represented, and that no pair of items always appears together (which would confound their effects).
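If your tool lets you export the design, both balance checks are easy to run yourself. The Python sketch below assumes the design is a list of sets of item IDs (a hypothetical export format) and uses a small textbook design with 7 items shown 4 per set purely as an illustration.

```python
from collections import Counter
from itertools import combinations

def design_balance(design):
    """Return (appearances per item, co-occurrences per item pair) for a design."""
    item_counts = Counter(item for block in design for item in block)
    pair_counts = Counter(
        tuple(sorted(pair)) for block in design for pair in combinations(block, 2)
    )
    return item_counts, pair_counts

# A balanced incomplete block design: 7 items, 7 sets, 4 items per set.
design = [
    (4, 5, 6, 7), (2, 3, 6, 7), (2, 3, 4, 5), (1, 3, 5, 7),
    (1, 3, 4, 6), (1, 2, 5, 6), (1, 2, 4, 7),
]

item_counts, pair_counts = design_balance(design)
all_pairs = list(combinations(sorted(item_counts), 2))

# A perfectly balanced design has a single value in each of these lists.
print("appearances per item:", sorted(set(item_counts.values())))
print("co-occurrences per pair:", sorted({pair_counts[p] for p in all_pairs}))
```

In a real study with 20-30 items, perfect balance is rarely achievable; look for a narrow spread in both lists and for no pair with a count of zero or a count far above the rest.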

Step 5: Choose Your Scale Framing

The question wording matters. "Most important / Least important" is the default, but you can adapt:

"Most appealing / Least appealing" for message testing
"Most likely to influence purchase / Least likely" for feature studies
"Best describes [brand] / Least describes [brand]" for brand perception

Use framing that matches how respondents naturally think about the items.

Step 6: Field and Analyze

Launch to your target sample (roughly 150-200 respondents for aggregate results, 200+ per segment for segment comparisons). Run HB estimation for individual-level scores or simple counting for quick aggregate results.

Sample Size Requirements

MaxDiff is less sample-hungry than conjoint analysis because each choice task is simpler:

Analysis Level	Recommended Sample
Aggregate ranking (overall priorities)	150-200
Segment-level comparison (2-3 segments)	200+ per segment
Individual-level scores (latent class, clustering)	300-500

With fewer than 100 respondents, aggregate count-based analysis still produces a usable ranking, but you won't have the precision for segment splits or individual-level modeling.

For more details, see the MaxDiff sample size guide.

How to Interpret MaxDiff Results

Utility Scores

The primary output is a utility score for each item, typically rescaled to sum to 100 across all items. Higher scores mean stronger preference.

Example output from a SaaS feature prioritization study (20 features tested, top 10 shown):

Feature	Utility Score
Real-time collaboration	12.4
Offline access	9.8
Custom reporting	8.7
API access	7.5
Mobile app	7.1
SSO/SAML	6.3
Slack integration	5.2
Dark mode	4.8
Custom branding	3.9
Gantt charts	3.1

Because MaxDiff produces ratio-scaled data, you can say real-time collaboration (12.4) is roughly 4x as preferred as Gantt charts (3.1). That's a meaningful quantitative statement you can't make with rating-scale data.
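The ratio reading can be reproduced directly from the table. The Python snippet below is only a worked version of that arithmetic, using four rows from the example output; the variable names are arbitrary.

```python
# Scores copied from the example table above (subset of rows).
scores = {
    "Real-time collaboration": 12.4,
    "Offline access": 9.8,
    "Dark mode": 4.8,
    "Gantt charts": 3.1,
}

baseline = scores["Gantt charts"]
ratios = {item: score / baseline for item, score in scores.items()}

for item, ratio in ratios.items():
    print(f"{item:25s} {ratio:.1f}x Gantt charts")
```

Ratios like these are only meaningful on share-of-preference scores, not on raw logit utilities, which can be negative.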

Segment Comparisons

The real power of MaxDiff emerges when you compare scores across segments. Enterprise buyers might rank API access and SSO at the top while SMB buyers prioritize mobile app and real-time collaboration. These differences shape your product roadmap and go-to-market messaging for each audience.

Threshold Analysis

Look for natural break points in the score distribution. Often you'll see a cluster of high-priority items, a middle group, and a tail of low-priority items. The gaps between clusters tell you where the meaningful priority thresholds sit.
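One simple way to locate break points is to sort the scores and look at the largest gaps between neighbours. The Python sketch below applies that idea to the example table from the Utility Scores section; `largest_gaps` is a hypothetical helper, and treating the biggest gaps as thresholds is a heuristic rather than a formal test.

```python
def largest_gaps(scores, n=2):
    """Return the n largest gaps between adjacent items in the ranked scores."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    gaps = [
        (upper, lower, upper_score - lower_score)
        for (upper, upper_score), (lower, lower_score) in zip(ranked, ranked[1:])
    ]
    return sorted(gaps, key=lambda g: g[2], reverse=True)[:n]

scores = {
    "Real-time collaboration": 12.4, "Offline access": 9.8, "Custom reporting": 8.7,
    "API access": 7.5, "Mobile app": 7.1, "SSO/SAML": 6.3, "Slack integration": 5.2,
    "Dark mode": 4.8, "Custom branding": 3.9, "Gantt charts": 3.1,
}

breaks = largest_gaps(scores)
for upper, lower, gap in breaks:
    print(f"break between {upper!r} and {lower!r}: gap of {gap:.1f}")
```

On this data the two largest gaps fall after real-time collaboration and after custom reporting, which suggests a clear leader, a second tier of two items, and then a more gradual tail.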

For more on reading MaxDiff output, see the interpretation guide.

Real-World Examples

SaaS: Feature Roadmap Prioritization

A project management SaaS company tested 25 potential features with 400 current users. The MaxDiff revealed that real-time collaboration and offline access dominated the top, while features the product team had been discussing for months (custom themes, advanced permissions) scored in the bottom quartile. The company reprioritized two engineering sprints based on the results, accelerating collaboration features by a full quarter.

CPG: Packaging Claim Testing

A snack brand tested 15 front-of-pack claims (organic, non-GMO, high protein, low sugar, locally made, etc.) with 500 grocery shoppers. "High protein" and "low sugar" scored 2.5x as high as "locally made" and "non-GMO," despite internal marketing assumptions that sustainability claims would lead. The brand redesigned its packaging hierarchy to lead with nutritional claims.

Healthcare: Treatment Attribute Importance

A hospital system tested 18 attributes of outpatient care experience with 300 patients. Wait time and provider communication topped the list. Parking availability and check-in technology, which the system was investing heavily in, ranked 15th and 16th. The results shifted capital allocation from facility upgrades to staffing and scheduling improvements.

Common Mistakes

Too many items. Beyond 30 items, the survey becomes tedious and the design requires too many sets. If you have 40+ items, pre-screen them with a qualitative phase and reduce to 25-30 for the MaxDiff.
Mixing abstraction levels. "Better UI design" and "Fix the login bug on Safari" shouldn't be in the same MaxDiff. Items need to be comparable in scope.
Ignoring the "worst" data. The least-preferred items are as informative as the most-preferred ones. They tell you what to de-prioritize, which is sometimes the more valuable finding.
Using MaxDiff when you need trade-offs. MaxDiff ranks items independently. It can't tell you "how much more would customers pay for Feature A vs. Feature B." For trade-off analysis, use conjoint.
Insufficient items per set. Showing only 3 items per set wastes respondent effort. Each task costs the respondent one "best" and one "worst" pick regardless of set size, so 4-5 items per set covers more items and more implied comparisons for the same number of clicks, with better design efficiency.
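One way to see why slightly larger sets are more efficient: a best pick and a worst pick together settle several pairwise comparisons, and that number grows with set size. The short Python sketch below counts them; the helper names are hypothetical.

```python
def implied_pairs(k):
    """Pairwise comparisons settled by one best and one worst pick among k items.

    The best item beats the other k - 1 items, the worst item loses to the
    other k - 1 items, and the best-vs-worst pair is counted only once.
    """
    return 2 * (k - 1) - 1

def total_pairs(k):
    """All possible pairwise comparisons among k items."""
    return k * (k - 1) // 2

for k in (3, 4, 5):
    print(f"{k} items per set: {implied_pairs(k)} of {total_pairs(k)} pairs resolved")
```

With 3 items a task resolves 3 comparisons, with 4 items it resolves 5, and with 5 items it resolves 7, all for the same two clicks.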

MaxDiff vs Alternatives

Feature	MaxDiff	Likert Scale	Ranking	Constant Sum	Conjoint
Best for	Prioritizing 10-30 items	Measuring agreement/satisfaction	Simple ordering of 5-10 items	Allocating importance across 5-8 items	Feature trade-off analysis
Scale type	Ratio	Ordinal (treated as interval)	Ordinal	Ratio	Interval/ratio
Discrimination	High	Low (everything clusters)	Medium	Medium	High
Scale-use bias	None	High	Low	Medium	None
Respondent burden	Medium (10-15 tasks)	Low	Medium (above 7 items)	High (above 8 items)	High (10-15 tasks)
Individual-level data	Yes (with HB)	Yes	Not really	Not really	Yes (with HB)
Cross-cultural comparability	High	Low	Medium	Medium	High

When to pick each: MaxDiff for prioritizing long lists. Likert for measuring intensity of agreement on individual statements. Ranking for quick ordering of short lists. Constant sum when you need explicit allocation. Conjoint when features interact and you need trade-off modeling.

How Quali-Fi Supports MaxDiff

Quali-Fi includes MaxDiff as a built-in question type across all product tiers. You define your item list, set the number of items per set and number of sets, and the platform generates a balanced experimental design automatically.

Respondents see clean, mobile-friendly best/worst selection screens. Analysis runs automatically as responses come in, producing utility scores, segment comparisons, and downloadable data for further analysis. You can embed MaxDiff within a larger survey alongside other question types, screening logic, and custom branding without needing a separate tool.

For larger studies with latent class segmentation or anchored MaxDiff designs, Quali-Fi's Professional Services team handles the advanced analysis.

Frequently Asked Questions

How many items can I test in a MaxDiff study?

The practical range is 10-30 items. Below 10, a simple ranking question works fine. Above 30, the survey becomes too long because each item needs to appear at least 3 times. If you have 40+ items, run a qualitative pre-screen to shortlist, then test the top 25-30 in MaxDiff.

What's the difference between MaxDiff and best-worst scaling?

They're the same thing in practice. MaxDiff (maximum difference scaling) is the industry term used in market research. Best-worst scaling (BWS) is the academic term. Both refer to the same methodology developed by Jordan Louviere. Technically, BWS has three cases (the object case, the profile case, and the multi-profile case), and MaxDiff corresponds to Case 1 (the object case), but in practice the terms are interchangeable.

Can MaxDiff measure absolute importance?

Standard MaxDiff measures relative importance only. Item A may be preferred three times as much as Item B, but you don't know if either is "important" in an absolute sense. Anchored MaxDiff addresses this by adding a threshold question (e.g., "Would you actually pay for this feature?"), which separates items respondents truly want from items that are merely "best of a bad list."

How long does a MaxDiff survey take?

A typical 20-item MaxDiff with 15 sets takes 3-5 minutes for the MaxDiff portion. Add screening and demographic questions, and total survey time is usually 8-12 minutes. That's significantly shorter than a conjoint study, which makes MaxDiff a good choice when you need quick turnaround or have limited respondent attention.

Can I use MaxDiff for pricing research?

Not directly. MaxDiff tells you which features are most valued but can't quantify willingness to pay. For pricing, use Van Westendorp (simple price sensitivity) or conjoint analysis (price as one attribute alongside features). You can pair MaxDiff with a separate pricing method: MaxDiff to prioritize features, then conjoint to optimize the price-feature bundle.

MaxDiff's real advantage isn't that it's statistically sophisticated, though it is. It's that it forces honesty. When respondents have to pick a worst alongside a best, they can't inflate every item to "very important." The data you get back actually differentiates. That's rarer than it should be.

Related Guides

Best-Worst Scaling -- The academic foundations of MaxDiff methodology
MaxDiff vs Likert Scale -- When to switch from rating scales to MaxDiff
How to Design a MaxDiff Survey -- Item selection, set design, and fielding tips
MaxDiff Sample Size Requirements -- How many respondents you need
How to Interpret MaxDiff Results -- Reading scores, segments, and thresholds
MaxDiff vs Conjoint -- Choosing between prioritization and trade-off analysis
Feature Prioritization with MaxDiff -- Product team use cases
Conjoint Analysis -- For multi-attribute trade-off modeling
TURF Analysis -- For portfolio optimization using MaxDiff data
MaxDiff Survey Template -- Ready-to-use MaxDiff template

Run your first MaxDiff study -- try Quali-Fi free for 14 days.
