Psychological testing refers to the administration of psychological tests. A psychological test is "an objective and standardized measure of a sample of behavior" (p. 4). The term sample of behavior refers to an individual's performance on tasks that have usually been prescribed beforehand. The samples of behavior that make up a paper-and-pencil test, the most common type of test, are a series of items. Performance on these items produces a test score. A score on a well-constructed test is believed to reflect a psychological construct such as achievement in a school subject, cognitive ability, aptitude, emotional functioning, or personality. Differences in test scores are thought to reflect individual differences in the construct the test is supposed to measure. The technical term for the science behind psychological testing is psychometrics.
A psychological test is an instrument designed to measure unobserved constructs, also known as latent variables. Psychological tests are typically, but not necessarily, a series of tasks or problems that the respondent has to solve. Psychological tests can strongly resemble questionnaires, which are also designed to measure unobserved constructs, but differ in that psychological tests ask for a respondent's maximum performance whereas a questionnaire asks for the respondent's typical performance. A useful psychological test must be both valid (i.e., there is evidence to support the specified interpretation of the test results) and reliable (i.e., it is internally consistent or gives consistent results over time, across raters, etc.).
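Internal consistency is commonly estimated with Cronbach's alpha, a coefficient the text above does not name. The sketch below assumes a complete respondents-by-items matrix of numeric item scores with no missing data, and the function name `cronbach_alpha` is hypothetical. It implements the standard formula alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items.

```python
from statistics import pvariance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of item scores.

    Assumes every respondent answered every item (no missing data).
    """
    k = len(responses[0])  # number of items
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # Transpose rows (respondents) into columns (items).
    items = list(zip(*responses))
    sum_item_var = sum(pvariance(item) for item in items)
    # Variance of each respondent's total score; zero total variance
    # (everyone has the same total) leaves alpha undefined.
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_item_var / total_var)


# Usage: four respondents answering three items.
scores = [[1, 2, 3], [2, 2, 3], [3, 4, 4], [4, 4, 5]]
print(round(cronbach_alpha(scores), 3))  # 0.962
```

Because alpha rises when items covary strongly relative to their individual variances, a high value indicates that the items tend to rank respondents in the same order; it does not by itself show that the test is valid.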
It is important that people who are equal on the measured construct also have an equal probability of answering the test items correctly. For example, an item on a mathematics test could be "In a soccer match two players get a red card; how many players are left in the end?"; however, answering this item correctly requires knowledge of soccer, not just mathematical ability. Group membership can also influence the chance of answering an item correctly even among people with the same level of the construct; this is called differential item functioning (DIF). Tests are often constructed for a specific population, and this should be taken into account when administering them. If a test is invariant to some group difference (e.g., gender) in one population (e.g., England), it does not automatically follow that it is also invariant in another population (e.g., Japan).
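One common way to check for differential item functioning, not named in the text above, is the Mantel-Haenszel procedure: test takers are first matched on total score (as a proxy for the construct), and the odds of answering the item correctly are then compared between a reference group and a focal group within each score level. The sketch below assumes the counts have already been tabulated per score level; the function name and input layout are hypothetical. A common odds ratio near 1 suggests no DIF, and the ETS delta transformation, -2.35 * ln(alpha_MH), puts the result on a scale where negative values indicate that the item favors the reference group.

```python
import math


def mantel_haenszel_dif(strata: list[tuple[int, int, int, int]]) -> tuple[float, float]:
    """Mantel-Haenszel common odds ratio and ETS delta for one item.

    Each stratum is one matched total-score level, given as
    (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    Assumes both summed products below are positive.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level contributes nothing
        numerator += a * d / n    # reference correct x focal incorrect
        denominator += b * c / n  # reference incorrect x focal correct
    alpha_mh = numerator / denominator
    # ETS delta metric: negative values mean the item favors the reference group.
    delta = -2.35 * math.log(alpha_mh)
    return alpha_mh, delta


# Usage: hypothetical counts at two matched score levels.
alpha, delta = mantel_haenszel_dif([(20, 10, 10, 20), (12, 8, 8, 12)])
print(f"alpha_MH = {alpha:.2f}, delta = {delta:.2f}")
```

Matching on total score is what separates DIF from a genuine group difference in the construct: the comparison is made only between people who performed equally well on the test as a whole.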
Psychological assessment is similar to psychological testing but usually involves a more comprehensive assessment of the individual. Psychological assessment is a process that integrates information from multiple sources, such as tests of normal and abnormal personality, tests of ability or intelligence, tests of interests or attitudes, and personal interviews. Collateral information about personal, occupational, or medical history is also collected, for example from records or from interviews with parents, spouses, teachers, or previous therapists or physicians. A psychological test is one of the sources of data used within the process of assessment; usually more than one test is used. Many psychologists do some level of assessment when providing services to clients or patients, and may use, for example, simple checklists. Assessment may be used to help determine a diagnosis in treatment settings; to assess a particular area of functioning or disability, often in school settings; to help select a type of treatment or to assess treatment outcomes; to help courts decide issues such as child custody or competency to stand trial; or to help assess job applicants or employees and provide career development counseling or training.
The first large-scale mental test may have been the imperial examination system in China. The test, an early form of psychological testing, assessed candidates based on their proficiency in topics such as civil law and fiscal policies. Other early tests of intelligence were made for entertainment rather than analysis. Modern mental testing began in France in the 19th century. It contributed to separating mental retardation from mental illness and reducing the neglect, torture, and ridicule heaped on both groups.
Englishman Francis Galton coined the terms psychometrics and eugenics, and developed a method for measuring intelligence based on nonverbal sensory-motor tests. The method was initially popular but was abandoned after it was found to have no relationship to outcomes such as college grades. French psychologist Alfred Binet, together with psychologists Victor Henri and Théodore Simon, spent about 15 years developing the Binet-Simon test, which focused on verbal abilities and was published in 1905. It was intended to identify mental retardation in school children.
The origins of personality testing date back to the 18th and 19th centuries, when personality was assessed through phrenology, the measurement of the human skull, and physiognomy, which assessed personality based on a person's outward appearance. These early pseudoscientific techniques were eventually replaced with more empirical methods in the 20th century. One of the earliest modern personality tests was the Woodworth Personal Data Sheet, a self-report inventory developed during World War I and used for the psychiatric screening of new draftees.
Proper psychological testing is conducted after rigorous research and development, in contrast to quick web-based or magazine questionnaires that say "Find out your Personality Color," or "What's your Inner Age?" Proper psychological testing has the following characteristics:
- Standardization - All procedures and steps must be carried out consistently and under the same conditions for every test taker, so that differences in performance reflect the test takers rather than the testing situation.
- Objectivity - Scoring is designed to minimize subjective judgments and biases, so that results for each test taker are obtained in the same way.
- Test Norms - Scores from a large group of people (for example, the average score) that provide a point of comparison or frame of reference against which one individual's performance can be compared.
- Reliability - Obtaining consistent results when the test is administered more than once.
- Validity - The test must measure what it is intended to measure.
Psychological tests, like many measurements of human characteristics, can be interpreted in a norm-referenced or criterion-referenced manner. Norms are statistical representations of a population. A norm-referenced score interpretation compares an individual's results on the test with the statistical representation of the population. In practice, rather than testing a whole population, a representative sample or group is tested; this provides a group norm or set of norms. One representation of norms is the bell curve (also called the "normal curve"). Norms are available for standardized psychological tests, allowing for an understanding of how an individual's scores compare with the group norms. Norm-referenced scores are typically reported on the standard score (z) scale, which expresses a score as its distance from the group mean in standard deviation units, or on a rescaling of it.
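As a minimal sketch of norm-referenced scoring, the function below converts a raw score to a z score using the mean and standard deviation of a norm sample, then rescales it to two widely used metrics: the T score (mean 50, SD 10) and the deviation-IQ metric (mean 100, SD 15). The norm sample and the function name are illustrative assumptions; published tests use large standardization samples and often age-specific norm tables rather than a single mean and SD.

```python
from statistics import mean, stdev


def standard_scores(raw: float, norm_sample: list[float]) -> dict[str, float]:
    """Express a raw score relative to a norm sample (hypothetical helper)."""
    mu = mean(norm_sample)
    sd = stdev(norm_sample)  # sample standard deviation of the norm group
    z = (raw - mu) / sd      # distance from the norm mean in SD units
    return {
        "z": z,
        "T": 50 + 10 * z,              # T-score metric: mean 50, SD 10
        "deviation_iq": 100 + 15 * z,  # IQ-style metric: mean 100, SD 15
    }


# Usage: a raw score of 60 against a tiny illustrative norm sample.
print(standard_scores(60, [40, 50, 60]))
# {'z': 1.0, 'T': 60.0, 'deviation_iq': 115.0}
```

Rescalings such as T scores carry no more information than z itself; they exist mainly to avoid negative numbers and decimals in reports.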
A criterion-referenced interpretation of a test score compares an individual's performance to some criterion other than the performance of other individuals. For example, a typical school test reports a score relative to a subject domain; a student might score 80% on a geography test. Criterion-referenced score interpretations are generally more applicable to achievement tests than to other psychological tests.
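Often, test scores can be interpreted in both ways. Depending on how the rest of the class performed, answering 80% of the questions on a geography test correctly could place a student at the 84th percentile (that is, the student scored higher than about 84% of the class), which corresponds to a standard score of about 1.0 when scores are normally distributed; in a class that found the test harder, the same result could correspond to a standard score of 2.0 (about the 98th percentile).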
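The correspondence between percentiles and standard scores in the example above follows from the normal distribution. The sketch below uses Python's standard-library `statistics.NormalDist` to convert between z scores and percentiles under an assumption of normality, and also computes an empirical percentile rank directly from a group's scores, here defined (as an assumption, since several definitions exist) as the percentage of scores strictly below the given score. Function names and class data are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def percentile_from_z(z: float) -> float:
    """Percentile implied by a z score, assuming normally distributed scores."""
    return 100 * STANDARD_NORMAL.cdf(z)


def z_from_percentile(p: float) -> float:
    """z score implied by a percentile (0 < p < 100), assuming normality."""
    return STANDARD_NORMAL.inv_cdf(p / 100)


def empirical_percentile_rank(score: float, group_scores: list[float]) -> float:
    """Percentage of the group scoring strictly below `score`."""
    below = sum(1 for s in group_scores if s < score)
    return 100 * below / len(group_scores)


print(round(percentile_from_z(1.0)))  # 84
print(round(percentile_from_z(2.0)))  # 98
# Hypothetical class results (percent correct): three of five scores are below 80.
print(empirical_percentile_rank(80, [60, 70, 75, 80, 90]))  # 60.0
```

The empirical rank depends entirely on who else took the test, which is why the same 80% can map to very different percentiles in different classes.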
There are several broad categories of psychological tests:
IQ tests purport to be measures of intelligence, while achievement tests measure how far a person has developed an ability and how well the person uses it. IQ (or cognitive) tests and achievement tests are common norm-referenced tests. In these tests, a series of tasks is presented to the person being evaluated, and the person's responses are graded according to carefully prescribed guidelines. After the test is completed, the results can be compiled and compared to the responses of a norm group, usually composed of people at the same age or grade level as the person being evaluated. IQ tests that contain a series of tasks typically divide them into verbal tasks (relying on the use of language) and performance, or non-verbal, tasks (relying on eye–hand coordination or the use of symbols or objects). Examples of verbal IQ test tasks are vocabulary and information (answering general knowledge questions). Non-verbal examples are timed completion of puzzles (object assembly) and identifying images that fit a pattern (matrix reasoning).
IQ tests (e.g., WAIS-IV, WISC-V, Cattell Culture Fair III, Woodcock-Johnson Tests of Cognitive Abilities-IV, Stanford-Binet Intelligence Scales V) and academic achievement tests (e.g., WIAT, WRAT, Woodcock-Johnson Tests of Achievement-III) are designed to be administered either to an individual (by a trained evaluator) or to a group of people (paper-and-pencil tests). Individually administered tests tend to be more comprehensive, more reliable, and more valid, and generally have better psychometric characteristics than group-administered tests. However, individually administered tests are more expensive because they require a trained administrator (psychologist, school psychologist, or psychometrician).
Public safety employment tests
Vocations within the public safety field (e.g., fire service, law enforcement, corrections, emergency medical services) often require industrial and organizational (I/O) psychology tests for initial employment and for advancement through the ranks. The National Firefighter Selection Inventory (NFSI), the National Criminal Justice Officer Selection Inventory (NCJOSI), and the Integrity Inventory are prominent examples of these tests.
Attitude tests assess an individual's feelings about an event, person, or object. Attitude scales are used in marketing to determine individual (and group) preferences for brands or items. Attitude tests typically use either a Thurstone scale or a Likert scale to measure specific items.
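A Likert-type attitude scale is often scored by summing or averaging item responses after reverse-keying negatively worded items. The sketch below assumes a 5-point response format coded 1 to 5 and uses hypothetical item identifiers; real instruments specify their own keys and scoring rules.

```python
def likert_scale_score(
    responses: dict[str, int], reverse_keyed: set[str], points: int = 5
) -> float:
    """Mean item score on a Likert-type scale coded 1..points.

    Items in `reverse_keyed` are negatively worded, so their responses are
    flipped (1 <-> points) before averaging.
    """
    if not responses:
        raise ValueError("no responses to score")
    total = 0
    for item, r in responses.items():
        if not 1 <= r <= points:
            raise ValueError(f"response {r} to {item!r} is outside 1..{points}")
        total += (points + 1 - r) if item in reverse_keyed else r
    return total / len(responses)


# Hypothetical brand-attitude items; "q2" is worded negatively.
answers = {"q1": 5, "q2": 1, "q3": 4}
print(likert_scale_score(answers, reverse_keyed={"q2"}))  # 14 / 3, about 4.67
```

Reverse-keying ensures that a high score always means a more favorable attitude, so that strongly disagreeing with a negative statement counts the same as strongly agreeing with a positive one.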
Neuropsychological tests consist of specifically designed tasks used to measure a psychological function known to be linked to a particular brain structure or pathway. In a clinical context, they can be used to assess impairment after an injury or illness known to affect neurocognitive functioning. In research, they can be used to compare neuropsychological abilities across experimental groups.
Infant and Preschool Assessment
Because infants and preschool-aged children have limited capacities for communication, psychologists cannot use traditional tests to assess them. Therefore, many tests have been designed specifically for children from birth to around six years of age. The content of these tests varies with age, ranging from assessments of reflexes and developmental milestones to sensory and motor skills, language skills, and simple cognitive skills.
Common tests for this age group fall into three categories: infant ability, preschool intelligence, and school readiness. Common infant ability tests include the Gesell Developmental Schedules (GDS), which measure the developmental progress of infants; the Neonatal Behavioral Assessment Scale (NBAS), which tests newborn behavior, reflexes, and responses; the Ordinal Scales of Psychological Development (OSPD), which assess infant intellectual abilities; and the Bayley-III, which tests mental ability and motor skills.
Common preschool intelligence tests include the McCarthy Scales of Children’s Abilities (MSCA), which is similar to an infant IQ test; the Differential Ability Scales (DAS), which can be used to test for learning disabilities; the Wechsler Preschool and Primary Scale of Intelligence-III (WPPSI-III) and the Stanford-Binet Intelligence Scales for Early Childhood, which could be seen as versions of IQ tests for young children; and the Fagan Test of Infant Intelligence (FTII), which tests recognition memory.
Finally, some common school readiness tests are the Developmental Indicators for the Assessment of Learning-III (DIAL-III), which assesses motor, cognitive, and language skills; the Denver II, which tests motor, social, and language skills; and the Home Observation for Measurement of the Environment (HOME), which measures the extent to which a child’s home environment facilitates school readiness.
Because infant and preschool assessments do not predict abilities in later childhood or adulthood, they are mainly useful for determining whether a child is experiencing developmental delays or disabilities. They are also useful for testing individual intelligence and ability, and, as noted above, some are specifically designed to test school readiness and to identify children who may struggle in school.
Psychological measures of personality are often described as either objective tests or projective tests. The terms "objective test" and "projective test" have recently come under criticism in the Journal of Personality Assessment. The more descriptive "rating scale or self-report measures" and "free response measures" are suggested, rather than the terms "objective tests" and "projective tests," respectively.
Objective tests (Rating scale or self-report measure)
Objective tests have a restricted response format, such as true-or-false answers or ratings on an ordinal scale. Prominent examples of objective personality tests include the Minnesota Multiphasic Personality Inventory, the Millon Clinical Multiaxial Inventory-III, the Child Behavior Checklist, the Symptom Checklist 90, and the Beck Depression Inventory. Objective personality tests can also be designed for use in business to assess potential employees, such as the NEO-PI, the 16PF, and the OPQ (Occupational Personality Questionnaire). The NEO-PI is built on the Big Five taxonomy, and the broad factors measured by the 16PF and the OPQ can be related to it. The Big Five, or Five Factor Model of normal personality, has gained acceptance since the early 1990s, when some influential meta-analyses (e.g., Barrick & Mount, 1991) found consistent relationships between the Big Five personality factors and important criterion variables such as job performance.
Projective tests (Free response measures)
Projective tests allow for a freer type of response. An example of this would be the Rorschach test, in which a person states what each of ten ink blots might be.
Projective testing became a growth industry in the first half of the 1900s, but doubts about its theoretical assumptions arose in the second half of the century. Some projective tests are used less often today because they are more time-consuming to administer and because their reliability and validity are controversial.
As sampling and statistical methods improved, the utility and validity of projective testing became the subject of considerable controversy. The use of clinical judgement rather than norms and statistics to evaluate people's characteristics has led critics to argue that projective tests are deficient and unreliable (results are too dissimilar each time a test is given to the same person). However, as more objective scoring and interpretive systems supported by more rigorous scientific research have emerged, many practitioners continue to rely on projective testing. Projective tests may be useful for generating inferences that can then be followed up with other methods. The most widely used scoring system for the Rorschach is the Exner system. Another common projective test is the Thematic Apperception Test (TAT), which is often scored with Westen's Social Cognition and Object Relations Scales and Phebe Cramer's Defense Mechanism Manual. Both "rating scale" and "free response" measures are used in contemporary clinical practice, with a trend toward the former.
Few tests have been developed specifically for the field of sexology. The psychological evaluation instruments used in sexology examine various aspects of sexual discomfort, problems, or dysfunction, whether individual or relational.
Direct observation tests
Although most psychological tests are "rating scale" or "free response" measures, psychological assessment may also involve the observation of people as they complete activities. This type of assessment is usually conducted with families in a laboratory or at home, or with children in a classroom. The purpose may be clinical, such as to establish a pre-intervention baseline of a child's hyperactive or aggressive classroom behaviors or to observe the nature of a parent-child interaction in order to understand a relational disorder. Direct observation procedures are also used in research, for example to study the relationship between intrapsychic variables and specific target behaviors, or to explore sequences of behavioral interaction.
The Parent-Child Interaction Assessment-II (PCIA) is an example of a direct observation procedure that is used with school-age children and parents. The parents and children are video recorded playing at a make-believe zoo. The Parent-Child Early Relational Assessment (Clark, 1999) is used to study parents and young children and involves a feeding and a puzzle task. The MacArthur Story Stem Battery (MSSB) is used to elicit narratives from children. The Dyadic Parent-Child Interaction Coding System-II (Eyberg, 1981) tracks the extent to which children follow the commands of parents and vice versa, and is well suited to the study of children with oppositional defiant disorder and their parents.
Interest tests are psychological tests that assess a person’s interests and preferences. They are used primarily for career counseling. Interest tests include items about daily activities, from which test takers select their preferences. The rationale is that if a person exhibits the same pattern of interests and preferences as people who are successful in a given occupation, then the chances are high that the person taking the test will find satisfaction in that occupation. A widely used interest test is the Strong Interest Inventory, which is used in career assessment, career counseling, and educational guidance.
Aptitude tests measure specific abilities, such as clerical, perceptual, numerical, or spatial aptitude. Some of these tests must be specially designed for a particular job, but there are also tests that measure general clerical and mechanical aptitudes, or even general learning ability. An example of an occupational aptitude test is the Minnesota Clerical Test, which measures the perceptual speed and accuracy required to perform various clerical duties. Other widely used aptitude tests include Careerscope and the Differential Aptitude Tests (DAT), which assess verbal reasoning, numerical ability, abstract reasoning, clerical speed and accuracy, mechanical reasoning, space relations, spelling, and language usage. Another widely used test of aptitudes is the Wonderlic Test. These aptitudes are believed to be related to specific occupations, and the tests are used for career guidance as well as for selection and recruitment.
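The matching rationale described for interest tests can be illustrated by correlating a person's interest profile with the average profile of people in each occupation. This is a simplified, hypothetical sketch: the six scale scores, the occupation names, and the use of Pearson correlation as the similarity index are assumptions, and it is not the scoring method of the Strong Interest Inventory, which uses its own occupational scales.

```python
from math import sqrt


def profile_similarity(person: list[float], group: list[float]) -> float:
    """Pearson correlation between two interest profiles of equal length."""
    n = len(person)
    mean_p = sum(person) / n
    mean_g = sum(group) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(person, group))
    sd_p = sqrt(sum((p - mean_p) ** 2 for p in person))
    sd_g = sqrt(sum((g - mean_g) ** 2 for g in group))
    # A flat profile (all scores equal) has zero spread and no defined correlation.
    return cov / (sd_p * sd_g)


def best_matching_occupation(person: list[float], group_profiles: dict[str, list[float]]) -> str:
    """Occupation whose mean profile correlates most highly with the person's."""
    return max(group_profiles, key=lambda occ: profile_similarity(person, group_profiles[occ]))


# Hypothetical scores on six interest scales (person and occupational group means).
person = [5, 1, 2, 1, 3, 1]
groups = {
    "occupation_a": [4, 1, 2, 2, 3, 1],
    "occupation_b": [1, 5, 3, 4, 1, 2],
}
print(best_matching_occupation(person, groups))  # occupation_a
```

A correlation compares the shape of the two profiles rather than their overall level, which fits the idea of a shared pattern of interests; a person who rates everything highly can still match a group whose ratings are lower overall.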
Biographical Information Blank
The Biographical Information Blank (BIB) is a paper-and-pencil form with items that ask about detailed personal and work history. It is used to aid in hiring by matching individuals' backgrounds to the requirements of the job.
Many psychological tests are not available to the public; both test publishers and psychology licensing boards restrict disclosure of the tests themselves and of information about how their results are interpreted. Test publishers consider both copyright and professional ethics to be involved in protecting the secrecy of their tests, and they sell tests only to people who have demonstrated their educational and professional qualifications to the test maker's satisfaction. Purchasers are legally barred from releasing test answers or the tests themselves to the public unless permitted under the test maker's standard conditions for administration of the tests.
The International Test Commission (ITC), an international association of national psychological societies and test publishers, publishes the International Guidelines for Test Use, which call on test users to "protect the integrity" of tests by not publicly describing test techniques and by not "coaching individuals" so that they "might unfairly influence their test performance."
- Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). Upper Saddle River, NJ: Prentice Hall.
- Mellenbergh, G.J. (2008). Chapter 10: Surveys. In H.J. Adèr & G.J. Mellenbergh (Eds.) (with contributions by D.J. Hand), Advising on Research Methods: A consultant's companion (pp. 183-209). Huizen, The Netherlands: Johannes van Kessel Publishing.
- American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
- Mellenbergh, G. J. (1989). Item bias and item response theory. International Journal of Educational Research, 13(2), 127–143.
- Standards for Education and Training in Psychological Assessment: Position of the Society for Personality Assessment – An Official Statement of the Board of Trustees of the Society for Personality Assessment. Journal of Personality Assessment, 87, 355–357.
- Gregory, R. J. (2003). The history of psychological testing. In Psychological testing: History, principles, and applications (chap. 1, p. 4). Allyn & Bacon. ISBN 9780205354726.
- Shi, J. (2004). In R. J. Sternberg (Ed.), International handbook of intelligence (pp. 330–331). Cambridge University Press. ISBN 978-0-521-00402-2.
- Kaufman, A. S. (2009). IQ testing 101. Springer Publishing Company. ISBN 0-8261-0629-3; ISBN 978-0-8261-0629-2.
- Gillham, N. W. (2001). Sir Francis Galton and the birth of eugenics. Annual Review of Genetics, 35(1), 83–101. doi:10.1146/annurev.genet.35.102401.090055. PMID 11700278.
- Nezami, E., & Butcher, J. N. (2000). In G. Goldstein & M. Hersen (Eds.), Handbook of psychological assessment (p. 415). Elsevier. ISBN 978-0-08-054002-3.
- Schultz, D. P., & Schultz, S. E. (2010). Psychology and work today (pp. 99–102). New York: Prentice Hall. ISBN 0-205-68358-4.
- Millon, T. (1994). Millon Clinical Multiaxial Inventory-III. Minneapolis, MN: National Computer Systems.
- Achenbach, T. M., & Rescorla, L. A. (2001). Manual for the ASEBA School-Age Forms and Profiles. Burlington: University of Vermont, Research Center for Children, Youth, and Families. ISBN 0-938565-73-7
- Derogatis, L. R. (1983). SCL90: Administration, scoring and procedures manual for the revised version. Baltimore: Clinical Psychometric Research.
- Beck, A. T., Steer, R. A., & Brown, G. K. (1996). Manual for the Beck Depression Inventory, 2nd ed. San Antonio, TX: The Psychological Corporation.
- McGhee, R. L., Ehrler, D., & Buckhalt, J. (2008). Manual for the Five Factor Personality Inventory – Children. Austin, TX: Pro Ed, Inc.
- Wasserman, J. D. (2003). Nonverbal assessment of personality and psychopathology. In R. S. McCallum (Ed.), Handbook of nonverbal assessment. New York: Kluwer Academic/Plenum Publishers. ISBN 0-306-47715-7.
- Exner, J. E., & Erdberg, P. (2005). The Rorschach: A comprehensive system: Vol. 2. Advanced interpretation (3rd ed.). Hoboken, NJ: Wiley and Sons.
- Murray, H. A. (1943). Thematic Apperception Test manual. Cambridge, MA: Harvard University Press.
- Westen, D. (1991). Social cognition and object relations. Psychological Bulletin, 109(3), 429–455.
- Cramer, P. (2002). Defense Mechanism Manual, revised June 2002. Unpublished manuscript, Williams College. (Available from Dr. Phebe Cramer.)
- Holigrocki, R. J., Kaminski, P. L., & Frieswyk, S. H. (1999). Introduction to the Parent-Child Interaction Assessment. Bulletin of the Menninger Clinic, 63(3), 413–428.
- Clark, R. (1999). The Parent-Child Early Relational Assessment: A Factorial Validity Study. Educational and Psychological Measurement, 59(5), 821–846.
- Bretherton, I., Oppenheim, D., Buchsbaum, H., Emde, R. N., & the MacArthur Narrative Group. (1990). MacArthur Story-Stem battery. Unpublished manual.
- Aiken, L. R. (1998). Tests and Examinations: Measuring abilities and performance. New York: John Wiley & Sons.
- American Psychological Association, Committee on Psychological Tests and Assessment (CPTA). (1994). Statement on the use of secure psychological tests in the education of graduate and undergraduate psychology students. American Psychological Association. "It should be recognized that certain tests used by psychologists and related professionals may suffer irreparable harm to their validity if their items, scoring keys or protocols, and other materials are publicly disclosed."
- Morel, K. R. (2009). Test security in medicolegal cases: Proposed guidelines for attorneys utilizing neuropsychology practice. Archives of Clinical Neuropsychology, 24(7), 635–646. doi:10.1093/arclin/acp062. PMID 19778915.
- Pearson Assessments. (2009). Legal policies. Psychological Corporation.
- International Test Commission. (2000). International guidelines for test use.
- American Psychological Association webpage on testing and assessment
- British Psychological Society Psychological Testing Centre
- Guidelines of the International Test Commission
- International Personality Item Pool, a free alternative source of items for personality research