National Cancer Institute Home at the National Institutes of Health | www.cancer.gov

HEI Tools for Researchers

This page provides information about the basic steps for calculating HEI component and total scores and further details for calculating scores at different levels of analysis (i.e., national food supply, food processing, community food environment, and individual food intake). Refer to the Research Uses page for more details about these levels and the types of studies that can be conducted at each. As all of the instructions and tools on this page relate only to the two most recent versions of the HEI, and not the original, the term "HEI" -- if used alone -- refers to either the HEI–2010 or the HEI–2005.

Basic Steps in Calculating HEI Scores

The basic steps in calculating HEI scores are:

  1. Identify the set of foods under consideration:

    The set of foods considered could include the entire US food supply, the sum of choices available in a particular environment, or the foods consumed by individuals on a day or over a longer period of time.

  2. Determine the amount of each relevant dietary constituent in the set of foods:

    The relevant dietary constituents for calculating HEI–2010 component and total scores are: total fruit; whole fruit (total fruit excluding juice); total vegetables; beans and peas; dark green vegetables; whole grains; dairy (milk, yogurt, cheese, and fortified soy beverages in the form of skim milk equivalents); total protein foods (lean fraction only); seafood; nuts and seeds; refined grains; saturated fatty acids; polyunsaturated fatty acids; monounsaturated fatty acids; sodium; calories from added sugars, solid fats, and alcohol (separately); and total calories. The relevant constituents for the HEI–2005 are slightly different: amounts of nuts and seeds, seafood, refined grains, and the unsaturated fatty acids are not required, but data on orange vegetables, total grains, and oils are.

    These constituents must be captured cleanly. That is, amounts should reflect only the constituent and not the total amount of the foods in which they may be contained. For example, the fruit juice fraction of a juice drink -- which may be only 10% of the total product -- counts toward total fruit, but the rest of the beverage counts toward added sugars. Likewise, the skim milk fraction of whole milk counts toward the dairy constituent, but the butterfat in whole milk counts toward calories from solid fat. Determining the amounts of each dietary constituent contained in the total quantity of foods under consideration requires linking to relevant databases, as information on both nutrients and food groups is needed to calculate HEI scores.

  3. Derive pertinent ratios and score each HEI component using the relevant standard:

    Calculating HEI scores starts with creating density values, or ratios. The resulting ratios are compared with the applicable standards for scoring (see standards for HEI–2010 and HEI–2005).
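
The density-and-score step can be sketched in a few lines of Python. This is an illustrative sketch only, not the SAS macros referred to on this page: the function names are hypothetical, linear scoring between a zero-point density and a maximum-point density is assumed, and the numeric standards shown for Total Fruit and Refined Grains are assumptions that should be checked against the published HEI–2010 standards before use.

```python
def density_per_1000_kcal(amount, energy_kcal):
    """Amount of a constituent per 1,000 kcal of the same set of foods."""
    if energy_kcal <= 0:
        raise ValueError("energy must be positive")
    return amount / energy_kcal * 1000.0


def score_component(density, zero_at, max_at, max_points):
    """Score a density linearly between two anchor densities (assumed scheme).

    zero_at is the density that earns 0 points and max_at the density that
    earns max_points. Adequacy components have zero_at < max_at; moderation
    components have zero_at > max_at, so lower densities score higher.
    """
    fraction = (density - zero_at) / (max_at - zero_at)
    fraction = min(1.0, max(0.0, fraction))  # clamp to the range [0, 1]
    return max_points * fraction


# Assumed standards, for illustration only; confirm against the published tables.
TOTAL_FRUIT = dict(zero_at=0.0, max_at=0.8, max_points=5)      # cup eq. per 1,000 kcal
REFINED_GRAINS = dict(zero_at=4.3, max_at=1.8, max_points=10)  # oz eq. per 1,000 kcal

# Hypothetical set of foods: 1 cup of fruit in 2,000 kcal.
fruit_density = density_per_1000_kcal(amount=1.0, energy_kcal=2000.0)
fruit_score = score_component(fruit_density, **TOTAL_FRUIT)

# Hypothetical refined grain density halfway between the two anchors.
grain_score = score_component(3.05, **REFINED_GRAINS)

print(round(fruit_score, 3), round(grain_score, 3))
```

Passing the standards in as parameters keeps the scoring function independent of any one HEI version, so the same sketch could be reused with HEI–2005 standards.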

Calculating HEI Scores at Different Levels

Regardless of the level of the food stream, the basic steps for deriving HEI scores are the same. However, the food and nutrient databases required vary by level, as shown in the following figure.

Graphic summarizing the three steps for deriving HEI scores across each of the four levels of the food stream. Read the following sections for a complete explanation.

National Food Supply

Steps 1 and 2: Identify the set of foods under consideration and determine the amount of each relevant dietary constituent in the set of foods

At the national food supply level, the first two steps in calculating HEI scores are intertwined because the databases used enumerate the set of foods and provide the necessary compositional information. In contrast to other levels, researchers do not need to capture food supply data de novo, but rather can rely on publicly available databases.

Analyses using the HEI at the national food supply level reflect the set of foods that enter retail distribution channels. In the United States, quantities of each commodity are estimated by summing total annual production, imports, and beginning inventories and then subtracting exports, ending inventories, and nonfood uses. By dividing these aggregate amounts by the estimated population of the country, per capita estimates can be derived. The US Department of Agriculture (USDA) annually provides such data through its Food Availability Data series and the corresponding Nutrient Availability Data.
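
The supply-and-disappearance arithmetic described above can be written as a short function. This is a minimal sketch: the function name, argument names, and the figures in the example are hypothetical and do not come from USDA data.

```python
def per_capita_availability(production, imports, beginning_stocks,
                            exports, ending_stocks, nonfood_use, population):
    """Per capita availability of one commodity for one year.

    All quantity arguments must share one unit (for example, pounds).
    """
    if population <= 0:
        raise ValueError("population must be positive")
    total_supply = production + imports + beginning_stocks
    non_food_disappearance = exports + ending_stocks + nonfood_use
    return (total_supply - non_food_disappearance) / population


# Hypothetical figures for a single commodity.
available = per_capita_availability(
    production=1000.0, imports=200.0, beginning_stocks=50.0,
    exports=150.0, ending_stocks=40.0, nonfood_use=60.0,
    population=100.0,
)
print(available)
```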

USDA also provides Loss-Adjusted Food Availability Data (LAFAD), which accounts for food spoilage, plate waste, and other losses. Waste is an important consideration in calculating the HEI because the components are density based, and differential losses across food categories, if unaccounted for, could lead to difficulties in interpretation. Another advantage of the LAFAD is that quantities are provided in units such as cups and ounces, rather than pounds per person per day, and this allows for simpler calculation of the HEI. Because the data represent individual commodities and do not include any mixed dishes, such as those that complicate analyses at other levels, no disaggregation of foods is needed.

The Food and Agriculture Organization (FAO) of the United Nations also publishes Food Balance Sheets, which are comparable, but not identical, to the US Food Availability Data in their derivation. The advantage of these data is that they are available for a wide range of countries, using a similar methodology, which makes them attractive for cross-country comparisons. However, because the associated nutrient data are limited and the waste adjustments used are not comprehensive, certain assumptions must be made and care is needed in interpretation.

Step 3: Derive pertinent ratios and score each HEI component using the relevant standard

For studies of the US food supply, the relevant dietary constituents needed to calculate the HEI scores are derived from the LAFAD and the Nutrient Availability Data. The Nutrient Availability Data require calibration to the LAFAD because they are not adjusted for waste in the way the LAFAD data are. For variables requiring the LAFAD, some can be obtained directly, whereas others require the application of assumptions and/or imputations.

At the food supply level, the amount of each dietary constituent is summed over all food commodities in the food supply and expressed as a ratio to total energy or, in the case of fatty acids, to total fatty acids. The resulting ratios are compared with the applicable standards for scoring.

Using the Total Fruits component as an example:

[Formula: (sum of F for the set) / (sum of E for the set) -> assign score for the set]

where F = total cups of fruit in the food supply for a given year and E = total energy content of the food supply for a given year.
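
As a rough illustration of this ratio, the sketch below sums fruit and energy over a small, made-up list of commodities and then forms a single density per 1,000 kcal. The commodity names and values are hypothetical placeholders rather than LAFAD figures, and scoring the resulting density against the Total Fruits standard is left out.

```python
# Hypothetical per capita, per day amounts for three commodities.
commodities = [
    {"name": "apples",       "fruit_cups": 0.20, "energy_kcal": 20.0},
    {"name": "orange juice", "fruit_cups": 0.30, "energy_kcal": 35.0},
    {"name": "wheat flour",  "fruit_cups": 0.00, "energy_kcal": 945.0},
]

# Sum each constituent over the whole set first...
total_fruit = sum(c["fruit_cups"] for c in commodities)     # F
total_energy = sum(c["energy_kcal"] for c in commodities)   # E

# ...then form one ratio for the set, expressed per 1,000 kcal.
fruit_per_1000_kcal = total_fruit / total_energy * 1000.0

print(round(fruit_per_1000_kcal, 3))
```

The ratio is taken once, after summing, so each commodity contributes in proportion to its amount in the supply.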

Two SAS macros for implementing step 3 are available:

These macros can be used to calculate HEI–2010 and HEI–2005 component and total scores and can be applied to any SAS dataset containing the requisite variables (i.e., datasets for which steps 1 and 2 above have been completed). The first of the two macros allocates beans and peas to the Total Protein Foods, Seafood and Plant Proteins, Total Vegetables, and Greens and Beans components. The USDA Food Patterns include beans and peas as part of both the vegetable and protein foods groups but stipulate that they be counted in only one or the other of these groups. In the HEI–2010 and HEI–2005, beans and peas are first allocated to the protein groups and only those beans and peas that are not needed to meet the standard for Total Protein Foods (or Meat and Beans, in the case of the HEI–2005) are counted toward the vegetable groups.
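
The allocation rule can be sketched as follows. This is a simplified Python illustration, not the SAS macro itself: it handles only a single protein total and a single vegetable total, the function and constant names are hypothetical, and both the protein standard (2.5 ounce equivalents per 1,000 kcal) and the conversion of 1 cup of beans and peas to 4 ounce equivalents are assumptions to verify against the published standards and the macro documentation.

```python
PROTEIN_STANDARD_OZ_PER_1000_KCAL = 2.5  # assumption: density earning the maximum score
OZ_EQ_PER_CUP_BEANS = 4.0                # assumption: 1/4 cup beans = 1 oz eq. protein


def allocate_beans_and_peas(other_protein_oz, beans_cups, energy_kcal):
    """Return (total_protein_oz, beans_counted_as_vegetables_cups).

    Beans and peas go to protein first; only the amount not needed to
    reach the protein standard is counted toward vegetables.
    """
    needed_oz = PROTEIN_STANDARD_OZ_PER_1000_KCAL * energy_kcal / 1000.0
    shortfall_oz = max(0.0, needed_oz - other_protein_oz)
    beans_oz = beans_cups * OZ_EQ_PER_CUP_BEANS
    to_protein_oz = min(beans_oz, shortfall_oz)
    to_vegetables_cups = (beans_oz - to_protein_oz) / OZ_EQ_PER_CUP_BEANS
    return other_protein_oz + to_protein_oz, to_vegetables_cups


# Hypothetical 2,000 kcal set of foods, so 5 oz eq. of protein meets the standard.
split = allocate_beans_and_peas(other_protein_oz=4.0, beans_cups=0.5, energy_kcal=2000.0)
all_to_vegetables = allocate_beans_and_peas(6.0, 0.5, 2000.0)
all_to_protein = allocate_beans_and_peas(1.0, 0.5, 2000.0)
print(split, all_to_vegetables, all_to_protein)
```

In the first call, 1 ounce equivalent of beans closes the protein shortfall and the remaining quarter cup counts as vegetables; in the other two calls the beans go entirely to one group.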

The second macro creates ratios and scores them using the applicable standards. The resulting output contains scores for the same number of observations as were included in the original input file. This could be one observation in the case of estimating component and total scores for the food supply for a single year or several in the case of estimating scores for many years or for many countries.

For details on the analysis of food supply data using the HEI–2005, see Reedy et al. and Krebs-Smith et al. Details regarding how variables can be constructed using food supply data to calculate HEI–2010 scores are expected to be published soon, as part of an analysis examining the US food supply from 1970 to 2010.

Food Processing

Step 1: Identify the set of foods under consideration

At the food processing level, the set of foods under consideration might be the output of a given manufacturer or group of companies, such as those taking part in the Healthy Weight Commitment. The ideal way to capture such data would be to get them directly from the manufacturer(s). Methods for researchers to gather this information themselves, using available data on foods in the marketplace, have not been developed, as research at this level has been held back by a lack of available nutrient and food group compositional databases.

Step 2: Determine the amount of each relevant dietary constituent in the set of foods

One publicly available (fee-based) resource that provides data for packaged food and beverage products sold in the United States is the Gladson Nutrition Database. It supplies ingredient content and nutrient composition for several nutrients including energy, fatty acids and sodium. Therefore, all the nutrient information needed to calculate the HEI is available. However, compositional databases on the food group content of packaged foods are missing. If such a database could be made available, research using the HEI at this level would be greatly facilitated.

Step 3: Derive pertinent ratios and score each HEI component using the relevant standard

Once food group compositional databases are available at this level, calculation of the variables, ratios, and scores will be straightforward. The same SAS macros provided for implementing this step at the food supply level could be used at this level.

Community Food Environment

Step 1: Identify the set of foods under consideration

Aspects of the community food environment that could be evaluated with the HEI include the set of foods offered, served or sold at markets, outlets, schools and other institutions. Foods offered can be operationalized by enumerating the set of foods and beverages on a menu (for example, from a school's lunch program) or otherwise offered for sale (for example, in a grocery store's weekly ad). Foods served could be defined, for example, as the total foods actually served by a school over a given period. Likewise, foods sold could be represented by the total sales for a neighborhood grocery store. It is generally easier to obtain information on foods offered than on foods served or sold, as the latter are not as readily available and generally require cooperation from the market, outlet, or institution.

Step 2: Determine the amount of each relevant dietary constituent in the set of foods

Nutrient and food group composition data are needed to calculate HEI scores. Values for energy and the relevant nutrients may be available from package labeling or nutrient composition databases. However, determining the values for the other relevant dietary constituents means that any food mixture containing ingredients from multiple food groups (pizza, for example) must be disaggregated into component ingredients before it can be tallied appropriately. Also, if necessary, yield factors must be applied so the amounts of cooked and raw foods are on an equivalent basis. This requires a database, such as the MyPyramid Equivalents Database (MPED) (see the section below on individual food intake), that translates the foods into equivalent amounts of fruits, vegetables, added sugars, and so on.
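
To make the disaggregation step concrete, the sketch below looks up each food in a small equivalents table and tallies food group amounts and energy across the set. The table layout, the constituent names, and every value in it are hypothetical placeholders; they are not taken from the MPED, and yield factors are not applied.

```python
# Hypothetical equivalents per 100 g of food as offered; values are placeholders.
EQUIVALENTS_PER_100G = {
    "cheese pizza": {
        "refined_grains_oz": 1.0,
        "dairy_cups": 0.25,
        "vegetables_cups": 0.125,
        "energy_kcal": 250.0,
    },
    "juice drink": {
        "fruit_cups": 0.04,        # only the juice fraction counts as fruit
        "added_sugars_tsp": 2.0,
        "energy_kcal": 50.0,
    },
}


def tally_constituents(items, table=EQUIVALENTS_PER_100G):
    """Sum constituent amounts over (food_name, grams) pairs."""
    totals = {}
    for food_name, grams in items:
        for constituent, per_100g in table[food_name].items():
            totals[constituent] = totals.get(constituent, 0.0) + per_100g * grams / 100.0
    return totals


menu = [("cheese pizza", 200.0), ("juice drink", 250.0)]
totals = tally_constituents(menu)
print(totals)
```

Each mixed food contributes to several constituents at once, which is why a per-food equivalents table is needed before the set can be summed.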

If the set of foods under consideration represents an outlet or institution that sells or serves only ready-to-eat food, databases that have been developed for individual-level analyses can be used. However, no databases are available currently to translate unprepared foods (such as raw meats and untrimmed produce) and processed but not fully prepared foods (such as cake mixes) into appropriate food group equivalents. This is a limitation for studying markets that sell these foods. If the set of foods is small, this step can be done by hand, but this is a painstaking process. Studies of the total inventories of large grocery stores will be impracticable until market-appropriate databases are available.

Step 3: Derive pertinent ratios and score each HEI component using the relevant standard

At the community food environment level, the amount of each dietary constituent is summed over all foods in the set under consideration and expressed as a ratio to total energy or, in the case of fatty acids, to total fatty acids. The resulting ratios are compared with the applicable standards for scoring (see standards for HEI–2010 and HEI–2005).

Using the Total Fruits component as an example:

[Formula: (sum of F for the set) / (sum of E for the set) -> assign score for the set]

where F = total cups of fruit in the set of foods and E = total energy content of the set of foods.

Two SAS macros for implementing step 3 are available:

These macros can be used to calculate HEI–2010 and HEI–2005 component and total scores and can be applied to any SAS dataset containing the requisite variables (i.e., datasets for which steps 1 and 2 above have been completed). The first of the two macros allocates beans and peas to the Total Protein Foods, Seafood and Plant Proteins, Total Vegetables, and Greens and Beans components. The USDA Food Patterns include beans and peas as part of both the vegetable and protein foods groups but stipulate that they be counted in only one or the other of these groups. In the HEI–2010 and HEI–2005, beans and peas are first allocated to the protein groups and only those beans and peas that are not needed to meet the standard for Total Protein Foods (or Meat and Beans, in the case of the HEI–2005) are counted toward the vegetable groups.

The second macro creates ratios and scores them using the applicable standards. The resulting output contains scores for the same number of observations as were included in the original input file. This could be one, in the case of estimating component and total scores for a single market or other environment or several, in the case of estimating scores for many markets or other environments.

See the methods sections of Volpe and Okrent, Reedy et al., and Kirkpatrick et al. for examples of analyses using the HEI–2005 at the community food environment level.

Individual Food Intake

Step 1: Identify the set of foods under consideration

The foods and beverages consumed by individuals are the subject of interest at this level. Most often, researchers are interested in the usual (or long-run average) diets of groups of individuals.

Information on foods consumed by individuals on a day or over a longer period of time can be collected using various methods. These include 24-hour recall, food record, or food frequency questionnaire. For example, HEI scores can be calculated for recall data collected in the What We Eat in America component of the National Health and Nutrition Examination Survey (NHANES) or using the Automated Self-Administered 24-hour Recall (ASA24) system. Food frequency questionnaire data can also be used to calculate HEI scores if linkages to appropriate databases can be made (see step 2).

Calculating HEI scores requires information on the total diet and thus data from brief instruments focused on particular aspects of the diet cannot be used for this purpose.

Step 2: Determine the amount of each relevant dietary constituent in the set of foods

Determining the amounts of each dietary constituent contained in the total quantity of foods under consideration requires linking to relevant databases. Values for energy and the relevant nutrients can be obtained from a nutrient composition database.

Obtaining values for the other relevant dietary constituents requires a database that translates the foods into amounts of fruits, vegetables, lean meat, and so on. One publicly available database designed for this purpose is the MyPyramid Equivalents Database (MPED). The MPED links to the USDA's Food and Nutrient Database for Dietary Studies (FNDDS) and has been used to evaluate the US diet in relation to dietary guidance such as the USDA food patterns, which are part of the Dietary Guidelines for Americans. It translates the amounts of foods, as eaten, into cup and ounce equivalents that are consistent with the units of measure used for the HEI scoring standards.

Depending on the databases used, additional steps may be required to determine the amount of each constituent required to calculate HEI scores. For example, researchers drawing upon the MPED also must obtain information about whole fruit and fruit juice, available from USDA's Center for Nutrition Policy and Promotion (CNPP) MyPyramid Equivalents Databases for Whole Fruit and Fruit Juice. Further, in the HEI, soy beverages are included in the Dairy component whereas in the MPED, they are included in the Meat and Beans group along with other soy products. Beans and peas also must be allocated to the proper component. In the HEI, they are first counted toward the protein groups, with any amount left after the Total Protein Foods standard is met counting toward the vegetable groups.

Regardless of the data source and databases used, attention should be paid to the degree to which it is possible to estimate amounts of the dietary constituents required to calculate HEI scores.

Step 3: Derive pertinent ratios and score each HEI component using the relevant standard

In the case of individual diets, multiple approaches to creating ratios are possible. The most appropriate approach depends on the specific research question. Ideally, the HEI is calculated on the basis of the usual, or long-term average, dietary intake of an individual. This principle supports the use of the population ratio, bivariate, and multivariate approaches for describing HEI scores among populations. In particular, the bivariate and multivariate approaches allow for the correction of measurement error in self-reported dietary intake data that are used to estimate HEI scores. Further research is needed to inform the most appropriate approaches for analyses examining relationships between HEI scores and health outcomes or other variables. Further details are provided in the sections below, which describe how the HEI can be used in various types of studies at the individual intake level.

Monitoring Dietary Intakes

  • Estimating mean HEI scores for a population or group

    Mean HEI scores for a population, subpopulation, or group can be estimated using 24-hour recall data and the population ratio method. This approach has been shown to be the preferred method of estimating a population's mean usual HEI–2005 component and total scores on the basis of a single day of recall data. The method may also be applicable to food record data.

    To apply the population ratio method, the mean intake of the relevant food groups, nutrients and energy among the population of interest is calculated first; then ratios of the means are calculated and compared with the applicable standards for scoring. See Freedman et al. for a description and application of the population ratio method using the HEI–2005. A brief report (PDF) on HEI–2010 population scores for 2007-2008 and 2001-2002 also is available.

    SAS code for calculating HEI scores using the population ratio method is available for use with 24-hour recall data collected in NHANES and through the ASA24 system. The sample code can be adapted for other data sources.

    The HEI–2010 SAS code for NHANES data (ZIP) performs the preliminary steps for creating the requisite variables (using the 2007-2008 NHANES data as the example) and calls two macros that allocate beans and peas to the protein and vegetable components and apply the HEI–2010 scoring algorithm. The output includes mean component and total HEI–2010 scores for the population, along with their standard errors and confidence intervals. Because the HEI–2010 is a multi-dimensional construct involving 12 densities (amounts of food per 1,000 calories and ratios of fatty acids), a simple method for estimating standard errors is not available. In this code, a Monte Carlo simulation step is included for the calculation of standard errors. This code is designed to account for the complex survey design of NHANES and can be modified to use with other survey datasets. Further details are available in the documentation that accompanies the code.

    SAS code for calculating HEI–2005 scores from NHANES data (ZIP) also is available. This code uses NHANES 2001-2002 data as an example.

    Similarly, the HEI–2010 SAS code for use with ASA24 data (ZIP) performs the preliminary steps for creating the requisite variables from ASA24 data and calls macros that allocate beans and peas to the protein and vegetable components and apply the HEI–2010 scoring algorithm. The output includes mean component and total HEI–2010 scores for the population, along with their standard errors and confidence intervals. This code can be modified for use with other datasets that do not involve complex sampling designs.

    SAS code for calculating HEI–2005 scores using ASA24 data (ZIP) also is available.

    If more than one recall is available for at least a subsample, a bivariate approach can be used to estimate mean HEI scores. This approach also allows the estimation of the distribution of component (but not total) scores based on usual intake (see section on distributions below) and is described in detail in Freedman et al. (PDF). SAS code for implementing this approach is available on this Web site. Note that although this method allows estimation of distributions of component scores, estimation of the distribution of total scores requires a multivariate approach, as described below.

  • Estimating distributions of HEI component and total scores for a population or group

    As noted above, if more than one recall is available for at least a subsample, a bivariate approach can be used to estimate distributions of scores for the components of the HEI. This approach is described in detail in Freedman et al. (PDF). SAS code for implementing this approach is available on this Web site.

    A multivariate approach that permits estimation of the distribution of HEI component and total scores also has been developed. This approach has been described for the HEI–2005 by Zhang et al. and applied in the forthcoming evaluation of the HEI–2010. SAS macros for the implementation of this approach are under development and are expected in the winter of 2014.

    With individual-level data, it also is possible to calculate the mean of individual scores or the score of the mean ratio. Neither approach is recommended for the purposes of describing mean HEI scores. See Freedman et al. for details.

    The use of food frequency questionnaire (FFQ) data for the estimation of means or distributions of HEI scores is not recommended.
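
The difference between the population ratio method and the two individual-level alternatives mentioned above can be illustrated with a small Python sketch. It is not the SAS code distributed on this page: the function names and the three one-day records are hypothetical, the Total Fruit standard of 0.8 cup equivalents per 1,000 kcal for 5 points is an assumption, and survey weights and measurement error correction are ignored.

```python
def per_1000_kcal(amount, energy_kcal):
    return amount / energy_kcal * 1000.0


def adequacy_score(density, standard, max_points):
    """Linear score from 0 up to max_points, capped at the standard (assumed scheme)."""
    return max_points * min(1.0, density / standard)


def score_of_population_ratio(records, key, standard, max_points):
    """Ratio of mean intake to mean energy, scored once for the group."""
    mean_amount = sum(r[key] for r in records) / len(records)
    mean_energy = sum(r["kcal"] for r in records) / len(records)
    return adequacy_score(per_1000_kcal(mean_amount, mean_energy), standard, max_points)


def mean_of_individual_scores(records, key, standard, max_points):
    """Score each person's one-day ratio, then average the scores."""
    scores = [adequacy_score(per_1000_kcal(r[key], r["kcal"]), standard, max_points)
              for r in records]
    return sum(scores) / len(scores)


def score_of_mean_ratio(records, key, standard, max_points):
    """Average the individual ratios, then score the average."""
    ratios = [per_1000_kcal(r[key], r["kcal"]) for r in records]
    return adequacy_score(sum(ratios) / len(ratios), standard, max_points)


# Hypothetical single-day recalls for three people.
recalls = [
    {"fruit_cups": 0.0, "kcal": 1500.0},
    {"fruit_cups": 0.8, "kcal": 2000.0},
    {"fruit_cups": 4.0, "kcal": 2500.0},
]
args = (recalls, "fruit_cups", 0.8, 5)  # assumed Total Fruit standard and maximum

population_ratio = score_of_population_ratio(*args)
mean_score = mean_of_individual_scores(*args)
mean_ratio = score_of_mean_ratio(*args)
print(round(population_ratio, 2), round(mean_score, 2), round(mean_ratio, 2))
```

With these made-up records the three estimators give three different values, which is why the choice of method matters when describing a group's mean score.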

Examining Relationships between HEI Scores & Other Variables

The recommended approaches for estimating means and distributions of scores for a population, subpopulation, or group are intended to minimize the effects of measurement error in dietary intake data such that the results better reflect HEI scores for usual intake. For analyses that require estimating scores for each person rather than means or distributions for a group, such as examining relationships between diet and health or other variables, there currently are few options for correcting for error in dietary intake data. Further, little is known about the impact of measurement error on the results of analyses that make use of the HEI. Biomarker-based validation studies focusing on energy and protein have shown that the observed effects of diet on health are biased (typically toward the null, or attenuated) when diet is measured with error. Energy adjustment appears to lessen, though not eliminate, this problem. The extent to which these findings apply to models using the HEI is not yet understood. Further, techniques for dealing with measurement error in such analyses have not yet been developed. Until more is known about the effects of measurement error on analyses using HEI total or component scores as exposures in regression models, researchers should consider the potential for bias due to measurement error when interpreting their results.

Guidance on the appropriate estimation of HEI scores for use in models estimating relationships with health or other variables based on 24-hour recall data is in preparation. Researchers interested in examining whether HEI scores differ by some factor are encouraged to use the population ratio method described above; for example, this approach could be used to examine whether groups that differ on characteristics such as income or race/ethnicity differ in diet quality as assessed by the HEI. Further details on the estimation of scores for individuals for use in regression modeling using recall data will be posted when available.

SAS code that can be applied to FFQ data to estimate total and component scores for each individual is currently available:

This code performs the preliminary steps for creating the requisite variables from FFQ data (using NIH-AARP Diet and Health Study data as the example) and calls macros that appropriately allocate beans and peas to the protein and vegetable components and apply the HEI–2010 scoring algorithm. It estimates component and total HEI–2010 scores for each individual and can be modified for use with other FFQs.

Evaluating Interventions Using HEI Scores

For assessing the effects of interventions, researchers can use the approaches described above for estimating means or distributions of HEI scores for populations, subpopulations, or groups. For example, the mean or distribution of HEI scores can be compared between intervention and control groups. However, when interpreting their results, researchers should be aware that the intervention itself may have an effect on reporting of diet (that is, lead to reactivity bias).

Last Modified: 11 Apr 2014