EVALUATING FORAGE VARIETY TRIAL EFFECTIVENESS
(Should These Data Be Reported?)
John Caddel
Forage Agronomist, Oklahoma State University, Stillwater, OK
Dan Putnam
Forage Extension Specialist, University of California, Davis, CA
Thad Busbice
Alfalfa Breeder and President, Great Plains Research, Apex, NC

FORAGE VARIETY TESTING SYMPOSIUM
Current Status and Implications for the Future
ASA, CSSA, and SSSA Annual Meeting
Indianapolis, IN
Tuesday, November 5, 1996


INTRODUCTION

Forage variety testing has been practiced by farmers, private seed companies, land grant colleges, and federal agencies for many years. Results of variety tests frequently influence which varieties are released by both public and private breeders. Test results also influence which varieties are marketed in particular regions and which varieties farmers choose to grow. In view of their potential importance, forage variety tests should be highly effective. Variety tests should produce more than a list of yields for a group of varieties. They should discriminate between well-adapted varieties and inferior ones, and they should provide good predictions of how one variety will perform relative to another.
The purposes of this paper are to describe methods to evaluate the effectiveness of forage variety tests, propose how less-than-adequate test data should be treated, and suggest ways to improve the effectiveness of forage variety tests by controlling sources of unwanted variance.

RECENT HISTORY OF ALFALFA VARIETY TESTING

The task of the variety tester has changed drastically during the last 20 years. Development of new varieties has changed in several ways in alfalfa, and to a lesser degree in other forages. Table 1 summarizes five major changes in new alfalfa varieties and testing during the last 20 years.

 

Table 1. Current alfalfa variety development and testing situation compared to 20 years ago.

| 20 Years Ago | Current Situation |
|---|---|
| Most varieties were developed by USDA/ARS and university breeders with no financial incentive to release "new" varieties. | Nearly all varieties are developed by a few industry breeding programs with a financial incentive to release "new" varieties. |
| 1 to 5 varieties were released each year in the US. | 50 to 70 new varieties per year are approved for certification, and others are also released. |
| Public breeders conducted variety tests. | Forage extension specialists oversee technicians charged with variety testing. |
| Variety testing was coordinated at "alfalfa" meetings. | Few public variety testers attend "alfalfa" meetings. More attend ASA and AFGC meetings. |
| Almost everyone involved with alfalfa testing understood the strengths and weaknesses of other alfalfa testing programs. | Testers receive many lists of test results, and sometimes have time to look over others' results. |

   

Several questions have been asked during this symposium and should be addressed by forage variety testers. Three questions that may lead the list follow: 

Do forage variety tests provide forage producers with the information needed to make wise variety choices?
Does industry get its money's worth for fees paid?
Are we doing the best job of reaching our testing objectives?

Many forage variety testing programs should answer these questions with a strong "yes." It is, however, apparent that many public and private workers could (should) improve their tests, as indicated by Moutray (1992). In alfalfa there are standard test procedures to describe resistance to 6 insects, 16 diseases, 3 nematodes, and 5 environmental stresses, as well as 2 morphological or quality traits (Fox et al. 1991). There is no "standard test" to describe a variety's yield potential or persistence, the traits most important to most producers and to the industry.

OBJECTIVES OF FORAGE VARIETY TESTING PROGRAMS

Forage variety testing programs have many objectives, and the importance of each objective changes among locations and over time. Four objectives that are important to most university and private testing programs follow:

Provide data to seed companies for comparing regional performance of entries (cultivars and/or experimental strains).  
Make multiple-site, side-by-side comparisons of varieties marketed in a state. This may help develop a recommended variety list for producers.  
Compare experimental strains with the best available germplasm.  
Facilitate field days, providing a backdrop for extension presentations. 

WAYS TO EVALUATE FORAGE VARIETY TESTS

Many general ways to evaluate forage tests are used to some degree; however, most of them do not indicate whether tests meet the objectives listed above. No single method can evaluate the effectiveness of all forage variety tests, because environments, purposes, measurements, and so on differ widely. We propose two closely related methods, based on common statistics, to evaluate the merits of a forage test. These evaluations should meet the needs of most variety tests in which forage yield is the most important measurement. They are probably more rigorous than needed when the primary measurements are qualitative in nature, with large differences among test entries.

The CV (coefficient of variation) can seem somewhat abstract, but it is probably the simplest and most useful single statistic for deciding whether a test meets the normal objectives of a typical forage variety test. A forage yield trial with a CV of less than 8% normally discriminates well enough to indicate whether the poor varieties are really poor and the good varieties are really good. Tests with CV's greater than 12% do not discriminate well enough to separate varieties that are generally well adapted to the test environment. When CV's are greater than 15%, forage yield differences must be very large (qualitative differences) before one variety can be declared significantly better (or worse) than another.
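The CV guide points above can be expressed as a minimal Python sketch. It assumes the usual definition of the test CV (square root of the error mean square divided by the test mean, times 100). The function names, the category wording, and the treatment of the 8% to 12% band (which the text does not characterize) as "intermediate" are illustrative assumptions.

```python
import math


def yield_test_cv(error_mean_square: float, test_mean: float) -> float:
    """CV (%) of a yield test: square root of the error mean square as a percentage of the test mean."""
    return 100.0 * math.sqrt(error_mean_square) / test_mean


def describe_cv(cv_percent: float) -> str:
    """Rough reading of a test CV using the 8%, 12%, and 15% guide points.

    The category wording is illustrative, not a published scale.
    """
    if cv_percent < 8.0:
        return "good: should separate poor varieties from good ones"
    if cv_percent <= 12.0:
        return "intermediate: judge with care"
    if cv_percent <= 15.0:
        return "weak: cannot separate generally well-adapted varieties"
    return "poor: only very large (qualitative) differences are detectable"


# Hypothetical example: error mean square 0.09 (t/acre)^2, test mean 5.0 t/acre.
cv = yield_test_cv(0.09, 5.0)
print(f"CV = {cv:.1f}% -> {describe_cv(cv)}")
```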

Caddel (1993) examined 117 sets of statistics from 12 central states that contributed to the 1992 CAIC (Central Alfalfa Improvement Conference) Variety Tests report. The CV's ranged from 3.4% to 18.6%, with a mean of 7.8%, indicating that many tests did not meet an acceptable level of effectiveness. Busbice (1995) reported similar results.

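Caddel (1993) and Busbice (1995) proposed that the value of an alfalfa forage variety trial can be estimated by the modified coefficient of variation (MCV):

$$\mathrm{MCV} = \frac{\mathrm{LSD}_{0.05}}{\bar{Y}} \times 100$$

where $\mathrm{LSD}_{0.05}$ is the least significant difference at p = 0.05 and $\bar{Y}$ is the test mean.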

The MCV (the LSD expressed as a percentage of the test mean) estimates the percentage difference between varieties that can be detected at a stated confidence level (p = 0.05 is suggested). Busbice (1995) stated that an MCV of less than 10% is required for a yield test to have practical value, because forage yields of adapted commercial alfalfa varieties are not expected to differ by more than 10%.
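A minimal sketch of the MCV definition and the 10% criterion follows. The function names and the example LSD and test mean are hypothetical; in practice the LSD would come from the test's own analysis of variance.

```python
def mcv(lsd_05: float, test_mean: float) -> float:
    """Modified coefficient of variation: LSD (p = 0.05) as a percentage of the test mean."""
    return 100.0 * lsd_05 / test_mean


def has_practical_value(mcv_percent: float, limit: float = 10.0) -> bool:
    """Busbice (1995) criterion: the MCV must be below 10% for a yield test to be useful."""
    return mcv_percent < limit


# Hypothetical example: LSD(0.05) = 0.45 t/acre in a test averaging 5.0 t/acre.
value = mcv(0.45, 5.0)
print(f"MCV = {value:.1f}%, practical value: {has_practical_value(value)}")
```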

Caddel (1993) regressed the CV of many different alfalfa forage tests against the MCV and showed that the MCV can be approximated by multiplying the CV by 1.3 for most alfalfa forage yield tests with four replications (n = 4) and 15 to 40 entries.
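The CV-to-MCV multiplier can also be computed from the standard formula for Fisher's LSD in a randomized complete block design, LSD = t × sqrt(2 × MSE / r), where r is the number of replications. Dividing by the test mean gives MCV = t × sqrt(2/r) × CV. The sketch below assumes a randomized complete block with no missing plots and error df = (r - 1)(entries - 1). It approximates the critical t with an asymptotic expansion in powers of 1/df; if SciPy is available, `scipy.stats.t.ppf(1 - alpha / 2, df)` gives the exact value. Under these assumptions, the multiplier for four replications and 15 to 40 entries comes out at roughly 1.4. The 1.3 figure above is an empirical regression result from actual tests, so the two need not agree exactly. The sketch is a way to compute the multiplier for a specific design, not a replacement for the published rule.

```python
import math
from statistics import NormalDist


def t_crit(df: int, alpha: float = 0.05) -> float:
    """Approximate two-sided critical value of Student's t.

    Uses the asymptotic (Cornish-Fisher type) expansion of the t quantile in
    powers of 1/df around the normal quantile; accurate to about 0.001 or
    better for df >= 10, which covers typical variety tests.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    g1 = (z**3 + z) / 4.0
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96.0
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384.0
    g4 = (79 * z**9 + 776 * z**7 + 1482 * z**5 - 1920 * z**3 - 945 * z) / 92160.0
    return z + g1 / df + g2 / df**2 + g3 / df**3 + g4 / df**4


def cv_to_mcv_factor(reps: int, entries: int, alpha: float = 0.05) -> float:
    """MCV / CV ratio for a randomized complete block: t(alpha, error df) * sqrt(2 / reps)."""
    error_df = (reps - 1) * (entries - 1)
    return t_crit(error_df, alpha) * math.sqrt(2.0 / reps)


for entries in (15, 25, 40):
    print(f"4 reps, {entries} entries: MCV = {cv_to_mcv_factor(4, entries):.2f} x CV")
```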

A few other common methods to evaluate forage variety tests are discussed below. Each has certain merits but rarely addresses the central issue: "Does this test separate the good varieties from the poor ones?"

Forage yield vs. expected yield tells the tester if yields are in a "normal" range of production in the area, but it does not indicate if the test discriminates between good and poor varieties.  

Performance of a particular entry does not address any of the real objectives of variety testing.  

Relative performance of check varieties only indicates how the checks yielded or persisted in relation to some preconceived idea.  

The appearance of a variety test tells a few things about the worth of the test because poor tests are normally not attractive, but attractive tests do not always clearly discriminate.  

Disease or insect pest severity rarely indicates whether a test will accomplish its purpose. Some pest-free tests are not the most desirable tests.

"F" tests are sometimes thought to indicate something about the quality of a forage test; however, by itself this statistic describes only the relative size of mean squares. A nonsignificant F does not necessarily indicate a poor variety test, because there may be no real differences among the varieties in that test. A significant F does not indicate an effective test, because extremely large differences among varieties will make F significant in spite of poor testing procedures.

HOW GOOD SHOULD TESTS BE?

Forage variety yield tests should be good enough to declare the yields of two varieties significantly (p = 0.05) different if their yields differ by 10% of the test mean; that is, the MCV should not exceed 10%. In view of this, we propose that data not meeting this level be used and distributed only with caution.

When MCV is between 10% and 15% (10% < MCV ≤ 15%), test data should not be published. One may then ask what testers should do with such data. If the tester believes that something must be done with the data, we suggest the following:
 

    1. Send results to those who submitted material for the test, including data and statistics only for the originator's entries and the check varieties.

    2. Do not send results of other entries.  

    3. Do not include results in extension publications.

    4. Do not include results in regional variety test reports (such as CAIC and WAIC). 

When MCV is greater than 15% (MCV > 15%), test data should be rejected as serving no known purpose. Do not publish the data or use them for any purpose.

      1. Treat data as if they were not collected.

      2. Treat data as if the tests were not conducted. 
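The handling rules above can be sketched as a small decision function, plus a filter for the restricted release to originators. The function names, the data layout (entry means and an entry-to-originator mapping), and the treatment of MCV values of exactly 10% and 15% are assumptions made for illustration.

```python
def disposition(mcv_percent: float) -> str:
    """Suggested handling of a yield test from its MCV (% of the test mean).

    Treating exactly 10% as reportable and exactly 15% as restricted is an assumption.
    """
    if mcv_percent <= 10.0:
        return "report"       # ordinary publication
    if mcv_percent <= 15.0:
        return "originators"  # each originator gets only its own entries and the checks
    return "discard"          # treat as if the test had not been conducted


def originator_report(entry_means, owners, originator, checks):
    """Restricted release for 10% < MCV <= 15%: keep the originator's entries plus the checks.

    entry_means maps entry -> mean yield; owners maps entry -> submitting organization.
    """
    return {
        entry: mean
        for entry, mean in entry_means.items()
        if owners.get(entry) == originator or entry in checks
    }


# Hypothetical entries, owners, and yields (t/acre).
means = {"Check 1": 5.1, "Alpha": 5.4, "Beta": 4.9, "Gamma": 5.6}
owners = {"Alpha": "Company A", "Beta": "Company B", "Gamma": "Company A"}
print(disposition(11.5), originator_report(means, owners, "Company A", {"Check 1"}))
```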

There will be those who object to throwing away data; however, using data from poor forage variety tests in other ways causes confusion and is likely to proliferate inaccurate information. Such data answer no normal objective of conducting forage variety tests and should be ignored. This approach puts yield data on the same footing as pest resistance data and data describing physiological traits, which must meet certain criteria before they are accepted.

For perennial forages with multiple harvests each year, when should these criteria be applied? They may be applied to individual harvests, annual totals, and/or totals over years; which to use depends on the purpose of the test. Forage yields for individual harvests normally tell very little about the overall worth of a perennial variety. The total yield over the life of a test is the most important criterion for judging the worth of a variety, and it is therefore the most important place to evaluate the effectiveness of a test.

Yields of individual harvests frequently fail to discriminate among varieties even when the total of harvests within a year and totals across years discriminate well. Consequently, testers should use totals, not individual harvest results.
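A minimal sketch of computing per-plot totals before analysis follows. This way the CV and MCV criteria are applied to annual or whole-test totals rather than to single harvests. The record layout (year, harvest, replication, entry, yield) is a hypothetical format, not one prescribed by the paper.

```python
from collections import defaultdict


def plot_totals(harvests, years=None):
    """Sum harvest yields for each (rep, entry) plot.

    harvests: iterable of (year, cut, rep, entry, yield) records (hypothetical layout).
    years: optional set of years to include; None sums over the whole test.
    """
    totals = defaultdict(float)
    for year, _cut, rep, entry, plot_yield in harvests:
        if years is None or year in years:
            totals[(rep, entry)] += plot_yield
    return dict(totals)


# Hypothetical harvest records (t/acre) for two plots.
records = [
    (1995, 1, 1, "Entry A", 1.0),
    (1995, 2, 1, "Entry A", 0.8),
    (1996, 1, 1, "Entry A", 1.2),
    (1995, 1, 1, "Entry B", 0.9),
]
print(plot_totals(records, years={1995}))  # annual totals for 1995
print(plot_totals(records))                # totals over the life of the test
```

The resulting per-plot totals can then be analyzed exactly like single-harvest data.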

 

WAYS TO IMPROVE ALFALFA VARIETY TESTS
(WAYS TO IMPROVE TESTS' SENSITIVITY)

Everyone involved in variety testing has horror stories. Everyone also has partial solutions to the problem of controlling variation in forage tests. The following are suggestions of ways to improve alfalfa variety evaluations for forage yield and can be applied to many other variety tests. The "bottom line" is to carry out the best possible test. We cannot cut corners to make false time- and dollar-savings and expect to separate good varieties from poor varieties.  

  Site Selection    
Soil must be uniform throughout a test, especially within each replication. Uniformity includes cropping history, pesticide usage, fertilizer usage, soil depth, soil texture, soil series, etc.; differences in any of these can increase variation within replications.

  Seedbed Preparation   
Seedbeds must be as uniform as possible. Small low spots or high spots can ruin an otherwise good test. Seedbeds should be uniformly firm, or irregular germination and emergence will cause high variation from the start.  

  Field Edges     
Stay well away from the edge of fields. There are many hidden irregularities at the edges of fields.  

  Irrigation    
Irrigation must be uniform. Irrigating forage variety tests when precipitation is limited does not always improve their effectiveness. Wetting the entire surface does not ensure uniform watering. Irrigation sprinklers are frequently spaced too far apart for uniform watering; each spot in the test should be watered by at least four sprinklers at a time. Problems encountered with flood irrigation and drip irrigation are related especially to infiltration rate, soil depth, and surface shaping.

  Farm Equipment    
Use large farm equipment as little as possible, because it is difficult to treat the whole area uniformly with large tractors. When large equipment is necessary, be sure that all plots in a replication are driven on in exactly the same way. This applies to preplant operations as well as to those after the stand is up and growing.

  Measurements    
Plots must be measured accurately before planting. Equal areas must be harvested for yield. Be sure scales are correctly calibrated and that they stay in calibration. The field crew must be well acquainted with their tasks and not tired or bored.  

  Insect Infestations    
Watch closely for the beginning of insect infestations. Sometimes they start in small areas. This is especially true for aphids. General feeders such as potato leafhoppers and grasshoppers often start at field borders. If insecticides are necessary, apply insecticides uniformly to all plots and the border areas. When an objective of a forage variety test is to evaluate the importance of insect resistance, insecticides should be avoided if the target insect pest is present, and special infestation techniques may be needed to assure uniformity.  

  Borders    
Alfalfa borders should be maintained for at least 6 feet around tests; twenty-foot borders are better. Border areas normally receive more abuse from excess wheel traffic than the plots or a commercial field. Consequently, they should be sown to a good variety and cared for as well as the plot area.

  Weed Control    
Herbicide or mechanical weeding should be applied equally to all plots, even when weed infestations are not uniform.  

  Plan Jobs Well    
There must be enough time to complete each job during the same short time interval. Each task should require a maximum of 1 to 2 hours for a trial. This lessens the danger of interruptions caused by accidents or rain.  

  Small Experiments
Alfalfa variety tests with too many entries are unmanageable. It is impossible to control all the sources of variation when tests are too big. For a single test, 24 to 30 varieties should be the maximum number of entries. If there are more entries, make two experiments with common check varieties. It is easier and more effective to make comparisons to check varieties than to try to control variation in extremely large tests. Even if managing two experiments takes a little more time than managing a single large one, there is a good chance that the two will be meaningful while the large one will serve no real purpose.
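Splitting a large entry list into two or more experiments that share common checks can be sketched as follows. The 30-entry ceiling comes from the text. The function name and the choice to keep entries in the order given (rather than grouping them by dormancy or assigning them at random) are assumptions.

```python
import math


def split_with_checks(entries, checks, max_entries=30):
    """Split test entries into the fewest experiments of at most max_entries each.

    Every check variety is repeated in each experiment. Entries (which should not
    include the checks) are kept in the order given; grouping by dormancy or
    random assignment could be substituted.
    """
    room = max_entries - len(checks)
    if room <= 0:
        raise ValueError("max_entries must exceed the number of checks")
    if not entries:
        return [list(checks)]
    n_experiments = math.ceil(len(entries) / room)
    base, extra = divmod(len(entries), n_experiments)  # balance experiment sizes
    experiments, start = [], 0
    for i in range(n_experiments):
        size = base + (1 if i < extra else 0)
        experiments.append(list(checks) + list(entries[start:start + size]))
        start += size
    return experiments


# Hypothetical test: 50 entries and 4 common checks.
entries = [f"Entry {i}" for i in range(1, 51)]
checks = ["Check A", "Check B", "Check C", "Check D"]
for i, experiment in enumerate(split_with_checks(entries, checks), start=1):
    print(f"Experiment {i}: {len(experiment)} entries, including {len(checks)} checks")
```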

  Plot Size    
While there is no ideal plot size, plots should be as small as possible. Large plots necessitate large replications, which generally lead to higher variation. Plots 3' to 5' wide and 15' to 20' long are normally about right.

  Plot Border Effects    
When the varieties included in a test are generally similar in dormancy, winter hardiness, and pest resistance, whole individual plots should be harvested and border effects between plots ignored. On the other hand, if varieties differ greatly, it is frequently better to harvest an inner portion of each plot and leave the outsides to absorb border effects from adjacent plots. A drawback of borders on individual plots, besides increasing total space and labor requirements, is that they increase the size of replications, which normally increases variances.

  Make Replications Square    
A circular field replication will nearly always have less variation than a rectangular area. We cannot normally use circular reps. A square rep is much better than a long rectangle and nearly as good as a circle. When long ranges form replications, variation within a rep will be large.  
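One way to see why compact replications help is to compare how far apart the plots in a replication are under different layouts. This sketch assumes that soil differences tend to grow with distance, so a layout with shorter average distances between plots should show less within-replication variation. It measures only geometry, not variance. The 5 ft by 20 ft plot dimensions fall within the range given under Plot Size, and the 24-plot replication is hypothetical.

```python
import itertools
import math


def mean_plot_distance(n_rows, n_cols, plot_width_ft, plot_length_ft):
    """Average center-to-center distance (ft) between all pairs of plots.

    The replication is laid out as n_rows ranges of n_cols plots each,
    with plots side by side and no alleys.
    """
    centers = [
        (col * plot_width_ft, row * plot_length_ft)
        for row in range(n_rows)
        for col in range(n_cols)
    ]
    pairs = list(itertools.combinations(centers, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


# Hypothetical replication of 24 plots, each 5 ft wide and 20 ft long.
for rows, cols in [(1, 24), (3, 8), (24, 1)]:
    width, length = cols * 5, rows * 20
    avg = mean_plot_distance(rows, cols, 5, 20)
    print(f"{rows} x {cols} layout ({width} ft x {length} ft): {avg:.1f} ft")
```

The nearly square 3 x 8 layout gives the shortest average distance; the single long range of 24 plots gives the longest.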

  Statistical Analysis    
The correct statistical analysis must be used. Do not choose an analysis because it has "always been used" or because you do not have time to learn new software. Incomplete lattice designs have been used successfully to reduce error in some programs.
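For reference, a sketch of the analysis of variance for a randomized complete block design, the simplest layout for a variety test, follows. It assumes every entry appears once in every replication with no missing plots. It does not handle incomplete lattice or other designs; for real tests an established statistics package is the safer choice. The function name and the example yields are hypothetical.

```python
import math


def rcbd_anova(yields):
    """Analysis of variance for a randomized complete block design.

    yields[rep][entry] holds one plot yield per entry in each replication;
    missing plots are not handled. Returns the test mean, error mean square,
    error degrees of freedom, F ratio for entries, CV (%), and entry means.
    """
    r, v = len(yields), len(yields[0])
    grand_mean = sum(sum(row) for row in yields) / (r * v)
    rep_means = [sum(row) / v for row in yields]
    entry_means = [sum(yields[i][j] for i in range(r)) / r for j in range(v)]

    # Partition the total sum of squares into replications, entries, and error.
    ss_total = sum((y - grand_mean) ** 2 for row in yields for y in row)
    ss_reps = v * sum((m - grand_mean) ** 2 for m in rep_means)
    ss_entries = r * sum((m - grand_mean) ** 2 for m in entry_means)
    ss_error = ss_total - ss_reps - ss_entries

    df_error = (r - 1) * (v - 1)
    mse = ss_error / df_error
    return {
        "test_mean": grand_mean,
        "mse": mse,
        "df_error": df_error,
        "f_entries": (ss_entries / (v - 1)) / mse,
        "cv_percent": 100.0 * math.sqrt(mse) / grand_mean,
        "entry_means": entry_means,
    }


# Hypothetical yields (t/acre): 3 replications x 4 entries.
plots = [
    [4.9, 5.3, 4.6, 5.1],
    [5.1, 5.6, 4.8, 5.0],
    [4.7, 5.4, 4.5, 5.2],
]
result = rcbd_anova(plots)
print(f"CV = {result['cv_percent']:.1f}%, F = {result['f_entries']:.2f}, error df = {result['df_error']}")
```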

  Number of Replications    
The number of replications is sometimes limiting from a statistical standpoint. Considering the uncontrolled errors in this type of research, the required degree of precision is often impossible with only four replications; five or six replications will usually suffice. In addition to increasing statistical precision, five or six replications allow testers to ignore certain replications and still have an analyzable experiment. If several plots in a replication suffer from some unplanned effect (a few flooded plots are frequently a problem where uniform surface drainage is lacking), one or two replications can be removed from the analysis (or not even harvested) if there were extra replications in the beginning. If this happens in a test that began with only four replications and the tester throws out one, nothing of real value is left. Thus, adding one or two replications decreases the risk of test failure.
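The effect of adding replications can be sketched with the relationship MCV = t × sqrt(2/r) × CV for a randomized complete block design, where r is the number of replications and t is the critical value at the error degrees of freedom. The sketch assumes the plot-to-plot CV stays the same as replications are added; in practice larger tests can raise the CV. It uses a hypothetical CV of 7.5% with 25 entries. The critical t is approximated with an asymptotic expansion; `scipy.stats.t.ppf(1 - alpha / 2, df)` gives the exact value if SciPy is available.

```python
import math
from statistics import NormalDist


def t_crit(df: int, alpha: float = 0.05) -> float:
    """Approximate two-sided critical t via an asymptotic expansion in 1/df (good for df >= 10)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    g1 = (z**3 + z) / 4.0
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96.0
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384.0
    g4 = (79 * z**9 + 776 * z**7 + 1482 * z**5 - 1920 * z**3 - 945 * z) / 92160.0
    return z + g1 / df + g2 / df**2 + g3 / df**3 + g4 / df**4


def expected_mcv(cv_percent: float, reps: int, entries: int, alpha: float = 0.05) -> float:
    """MCV implied by a given CV in a randomized complete block with `reps` replications."""
    error_df = (reps - 1) * (entries - 1)
    return t_crit(error_df, alpha) * math.sqrt(2.0 / reps) * cv_percent


# Assumed CV of 7.5% and 25 entries; the CV is held fixed as replications change.
for reps in (3, 4, 5, 6):
    print(f"{reps} reps: MCV = {expected_mcv(7.5, reps, 25):.1f}%")
```

With these assumed inputs, four replications fall just above the 10% limit and six fall comfortably below it, similar to the pattern in Table 2.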

Table 2 illustrates the improvement in MCV's for 11 alfalfa forage yield evaluations conducted in Oklahoma when tests had six replications rather than four. The "4 Reps" column was generated by randomly omitting two of the six replications and reanalyzing the data. As expected, MCV's were higher with four reps than with six. All MCV's were acceptable using data from six replications; however, MCV's in three tests exceeded 10% when data from only four replications were used.

 

Table 2. MCV (%) for total forage yield of 3-year alfalfa tests.

| Test No. | MCV, 6 Reps | MCV, 4 Reps |
|---|---|---|
| 931-92 | 4.8 | 5.5 |
| 961-92 | 4.8 | 6.6 |
| 962-92 | 8.4 | 11.5 |
| 982-92 | 6.0 | 7.3 |
| 001-93 | 4.8 | 6.5 |
| 002-93 | 9.2 | 11.4 |
| 032-93 | 8.0 | 11.5 |
| 041-93 | 7.6 | 9.5 |
| 101-94 | 5.5 | 6.9 |
| 121-94 | 4.7 | 6.0 |
| 151-94 | 4.7 | 5.6 |
| Mean | 6.2 | 8.0 |

Data from alfalfa variety tests in Oklahoma, sown 1989-1991. MCV's were calculated as (LSD / test mean) × 100.
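The summary rows of Table 2 can be checked directly from the tabulated values. This sketch only re-summarizes the published MCV's; it does not reproduce the random omission of replications.

```python
# MCV (%) from Table 2: test number -> (6 reps, 4 reps).
TABLE_2 = {
    "931-92": (4.8, 5.5),
    "961-92": (4.8, 6.6),
    "962-92": (8.4, 11.5),
    "982-92": (6.0, 7.3),
    "001-93": (4.8, 6.5),
    "002-93": (9.2, 11.4),
    "032-93": (8.0, 11.5),
    "041-93": (7.6, 9.5),
    "101-94": (5.5, 6.9),
    "121-94": (4.7, 6.0),
    "151-94": (4.7, 5.6),
}

six_reps = [six for six, _ in TABLE_2.values()]
four_reps = [four for _, four in TABLE_2.values()]
mean_six = sum(six_reps) / len(six_reps)
mean_four = sum(four_reps) / len(four_reps)
over_limit = [test for test, (_, four) in TABLE_2.items() if four > 10.0]

print(f"Mean MCV: {mean_six:.1f} (6 reps), {mean_four:.1f} (4 reps)")  # Mean MCV: 6.2 (6 reps), 8.0 (4 reps)
print("Above 10% with 4 reps:", over_limit)  # ['962-92', '002-93', '032-93']
```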

 

  Frequent Observations  
Make frequent visual observations. If you can see real differences, you should be able to measure the differences. There are differences you cannot see. With a one-time observation, most of us cannot see differences as small as 15% in total annual yield. Apparent yields of varieties change over time and small differences add up over a year and over the life of an experiment.  

  Analyze the Analyses  
Examine all the statistics from a test before sharing it with others. If entries differ statistically, ask if the differences are logical. If varieties do not differ statistically, ask if the lack of difference is biological or due to inadequate testing procedures. If yield differences of more than 10% are not statistically significant, determine why not and change procedures.  

  Interpret Data  
Presenting yield tables is different from interpreting data. Do not claim that yields of varieties differ, or do not differ, based on minimal data. Several years at several locations, including several plantings, are usually necessary to determine whether one variety is really different from another. If you cannot include varieties in enough tests to be confident that differences (or lack of differences) are real, ask why the tests were established. Many hours of data interpretation are necessary to conclude which varieties should be grown in a particular area.

  Merging Data  
Merging data can be helpful when analyzing data, but be sure not to merge data that should not be merged. Do not merge a good data set with a poor one; the result will be another poor data set. Merging two poor data sets will still result in a poor data set.

  Integrity    
The concept of public employees testing varieties is founded on the belief that everyone involved exercises the utmost integrity. Obviously the tester must be impartial. The person submitting seed must also be honest. Even when a genuine sample of the seed has been tested impartially, there is nothing to stop someone from misusing the results. Everyone involved in the whole testing process must have the utmost integrity, or the whole system of testing falls apart. Even the retailer who picks and chooses data to present to customers can undermine the best testing. If the tester is not completely confident of the test results, doubt permeates the system because of a lack of integrity.

LITERATURE CITED 

    Busbice, T.H. 1995. How good are alfalfa variety trials? A question of ethics and accuracy. Proc. 25th Central Alfalfa Imp. Conf. p. 9.

    Caddel, J.L. 1993. How well are we testing alfalfa variety yields? Proc. 23rd Central Alfalfa Imp. Conf. p. 19.

    Fox, C.C., R.C. Berberet, F.A. Gray, C.R. Grau, D.J. Jessen, and M.A. Peterson (ed.). 1991. Standard tests to characterize alfalfa cultivars. N. Am. Alfalfa Imp. Conf., Beltsville, MD.

    Moutray, J.B. 1992. Alfalfa variety testing, current status and future needs from an industry viewpoint. Proc. 22nd National Alfalfa Symposium. p. 10-13.

 
 
 
 
