This page is a guidance essay.
It contains the advice of one or more Wikipedia contributors. It is not a Wikipedia policy or guideline, although it may be consulted for assistance. This essay may contain opinions that are shared by few or no other editors, as it has not been thoroughly vetted by the community.
This page in a nutshell: Do not cherrypick. When selecting information from a source, include contradictory and significant qualifying information from the same source.
In the context of editing an article, cherrypicking, in a negative sense, means selecting information without including contradictory or significant qualifying information from the same source and consequently misrepresenting what the source says. This applies both to quotations and to paraphrasings.
If you are familiar with multiple credible sources on a subject and they differ significantly from each other, you may realize that Wikipedia's policies and guidelines support reporting from some or all of the sources, and you should edit accordingly. If one editor is not familiar with some sources, another editor who is familiar with them can edit accordingly. Irrespective of any one editor's views, an article as a whole needs to conform to Wikipedia's policies and guidelines.
Outside of Wikipedia, cherrypicking often means selecting from the general range of sources on a topic so as to misrepresent a consensus or to misrepresent what has been published. For that, the remedy is to edit to reflect what another editor missed, because we don't expect an editor to know all the sources on a topic or even all of a consensus. Our concern within Wikipedia is about cherrypicking from within a source or closely related multiple sources, and an editor should be careful in handling that.
Do not cherrypick.
- 1 Main information
- 2 Merely additional information
- 3 One source or multiple
- 4 Multiple sources
- 5 Paraphrasings and quotations
- 6 Remedies
- 7 Larger meaning of cherrypicking outside Wikipedia
- 8 Positive meaning of cherrypicking
- 9 See also
- 10 Notes and references
Main information
The main information from a source, insofar as stated in Wikipedia, must be accompanied by any contradictory and qualifying information from the same source.
Failure to do so often violates Wikipedia's policies and guidelines:
- WP:NPOV (policy): Neutral point of view, by selectively presenting one point of view from a source that actually includes two or more that conflict with each other
- WP:OR (policy): No original research, by presenting a statement not supported by any source, not even the cited sourcing
- WP:UNDUE (policy): Not giving undue weight to a view, by omitting information that shows that it is relatively unimportant
- WP:FRINGE (guideline): Not giving a fringe view undue weight, by omitting information that shows that it is a fringe view
- WP:RS (guideline): Not using an unreliable source, by omitting information that would show unreliability
Not all information must be presented. A source must be fairly represented for the purpose of the article, and that includes contradictory and qualifying information. Beyond that, information need not be added if, for example, it is not entitled to weight or is redundant.
As to contradictory information that needs to be reported in Wikipedia, if, for example, a source says "Charlie loves all blue coats and hates all red coats", to report in Wikipedia that according to that source "Charlie loves all ... coats" is cherrypicking from the source. It is cherrypicking words with the effect of changing the meaning of what the source is saying. It is cherrypicking even if the source is precisely cited. It is still cherrypicking even if the editor meant well in changing the meaning; the issue is not the editor's intention, but how the Wikipedia article represents the source's meaning.
Timing matters. A statement at a given point in time may contradict any statement that was made earlier in time. However, a statement that is earlier in time does not contradict a later statement, even by the same person. Anyone is permitted to change their views. Generally, it does not discredit a person that they reject older views; otherwise, most research scientists would probably have no credibility. One scholar reputedly said, when challenged about changing his mind, "When the facts change, I change my mind. What do you do, sir?"
Politics is a field in which changing one's mind is commonly criticized, at least in the United States. A claim is that a lack of consistency over time may make a candidate less trustworthy. However, while that may be more relevant to electing one or another candidate, it is less relevant to editing an article. We may report that a candidate held one view at one time and another view at another time, sourcing each view, or only report one view if that is all that is entitled to weight in the article, but generally the view that came later in time is not contradicted by the view that came earlier in time for purposes of reporting in Wikipedia.
Earlier and later within a single source is generally not earlier or later in time. Exceptions may occur if a single source presents chronologically ordered material, such as an anthology of dated writings, a history, or a biography. However, even if an author is known to complete the writing of one chapter before beginning research for the next, do not assume that earlier or later in a source equates with earlier or later in time unless the source's content makes that chronological ordering clear. Even in sources that appear to rely on chronology, be careful about literary devices such as flashbacks in biographies, as where a military veteran has a flashback to a wartime experience, as they can make chronologies uncertain.
Multiple sources within a source
Suppose one book has several in-depth profiles of different artists. According to the book, artist A says it is necessary to schedule four hours a day to paint in order to produce masterpieces, but artist B says one should never schedule time for painting and should instead await inspiration, so that the work is of nothing less than the highest quality. Neither artist knows about the other, so neither one is criticizing the other. In effect, the book is a source that contains multiple sources, which you can treat separately. If you're editing an article about artist B but not about artist A, you don't have to report what artist A said.
This can be a difficult concept to apply with integrity and consistency. It is easier if a book is an anthology, but it can be true of many other books as well. For example, it is common for journalists to interview multiple sources for one book; and many studies and surveys in the social sciences report what respondents said, and the result may be people disagreeing with each other without knowing about each other, thus not criticizing each other.
Some subjects are grounded in critique of society or of another subject. For instance, a belief system may contradict another. Given that a Wikipedia article is about only one subject, not every contradiction from outside of that subject need be reported, but substantial contradiction probably should be reported or summarized as criticism.
Mixed fields of study
Different fields of study have points of disagreement with each other. Even if the same author writes on multiple fields or disciplines in a single source, a Wikipedia article within one field of scholarship generally does not have to report contradictions emanating solely from other fields, unless they are criticisms.
- A theologian and a mathematician may contradict each other on the role of a deity in arithmetic.
- Linguists tend not to rank cultures as dominant versus subcultural or as developed versus underdeveloped, while sociologists tend to do exactly that. This is not because one is ignorantly insensitive and the other is cruelly chauvinist. Linguists need to learn the languages, and ranking does not provide much help to that end, while sociologists specifically study relationships between peoples and thus study rankings without necessarily agreeing with those rankings in their own value systems.
- A scientist may be expected to start with a hypothesis and challenge it with a scientific investigation to ensure thoroughness while a lawyer or detective may be expected to start with no hypothesis and find what a forensic investigation yields free of bias.
Qualifying information is information that might not contradict the main information but that alters how the main information should be understood. For example, to quote a source that says that most Americans sleep late and skip work but to ignore that the source limits that by saying "on weekends" is to omit qualifying information and misrepresent the source on Americans' work customs. While qualifying information is infinite and cannot all be quoted or paraphrased, if it is significant, include it.
This example of qualifying information is from a book: "I have taken artistic license in conveying both reality and essence" and "[s]ome conversations ... are not intended ... as verbatim quotes."
Where to find them
Either contradictory or qualifying information may be found anywhere in a source, not necessarily adjacent to the main information. For example, while the main information may be in a middle chapter of a book, contradictory or qualifying information may be in an endnote, in an introduction, or on a cover. Many sources are well organized and make finding everything you need relatively simple, but not all sources are so helpful.
Merely additional information
On the other hand, merely additional information does not have to be provided. For example, if a source says "brain surgery is difficult" and goes on to state the experience of a surgeon who performed it without changing the meaning of the main information, the surgeon's experience does not have to be provided in Wikipedia.
One source or multiple
While Wikipedia may consider a "source" to be just p. 32 of a certain book, to prevent cherrypicking you should consider a source in its larger sense. While many sources are organized for speedy lookups, some are not or your interest may not allow looking in just one place. For instance, if a source has several volumes, consider all the volumes. That kind of source may have relevant content sprinkled across all the volumes. You may have to read all of them. For some, searching the index (even reading the whole index from A to Z) is needed. For some, you'll need to search online inside a source for various terms.
When editing Wikipedia, it is generally not cherrypicking to miss contradictory or qualifying information that appears in a different source from the one already being used, because we don't expect editors to be familiar with all of the possible sources that could be cited on a topic. Therefore, getting information from one source without acknowledging that it was contradicted or qualified in another source is not validly criticized as cherrypicking in an editor's work in Wikipedia.
However, an article as a whole should reflect the range of sources available on the article's subject. This does not require using every source that exists, just that the sourcing cited be reasonably representative of the range of sources that exist. This applies regardless of who edited it in the past. While an individual editor is not required to know all of the significant sources on a subject, it is helpful if you do, or if you know at least some of them. Therefore, if you are familiar with a different and unused source that should be used, feel free to edit an article consistently with the different source, if the source is otherwise eligible to be used in Wikipedia.
A later edition of a work usually replaces all earlier editions. Earlier editions are usually not authoritative as sourcing against later editions. However, an editor may not know of a newer edition or may not have access to it. Therefore, it generally is not cherrypicking to get information from an earlier edition and not from a later edition. (An editor with a later edition is encouraged to edit accordingly.)
If an editor finds that a later edition contradicts or qualifies an earlier edition that was cited in Wikipedia, the editor who found it should treat the newer edition as if it is a separate source and edit the Wikipedia article, perhaps replacing or editing old content and old citations or just adding new content and new citations.
Citing multiple editions in one article is permissible if the content is sourced to multiple editions. A statement that is being kept and that is supported by an older edition should continue to be cited to the older edition, with two exceptions. If an editor has found that the newer edition also supports the statement, updating the citation to reflect the newer edition is helpful. If an editor has found that the newer edition contradicts the statement, the statement and the citation should both be updated.
It may be appropriate to cite an older edition even when a newer edition is also being cited, or instead of a newer edition, but this would be rare. The older information would have to be entitled to weight apart from the newer. One case is when writing about the historical development of an idea or of an author's views. Another case is when two editions of the same work are actually about different subjects; an example is with some popular guides to computer software, where an older edition of a book may be about an older version of the same software.
Printings and impressions
Printings and impressions may be treated like editions for these editorial purposes. Although in the U.S. different printings of one edition tend to be identical or very similar, generally there is no law requiring that and they may differ in any way a publisher wishes. For instance, errors may be corrected between printings even if they're of the same edition. Unfortunately, library catalogues often do not indicate what printing is in a library, differentiating only between editions. It is possible that, for current editions of modern books, libraries tend to buy first printings whereas bookstores may stock more recent printings, although you may have to visit in person to find out. Printings are often marked in a modern book on the copyright page but only in a code, such as a line that says only "1 2 3 4 5 6 7 8 9 10" or perhaps "10 9 8 7 6 5 4", in which the smallest number visible is often the printing number.
Wikipedia's citations almost never show which printing is cited.
Newspapers traditionally have idiosyncratic ways of labeling different editions in the course of a day, especially in past decades, so that the labels may not make clear which is earlier or later, especially to an out-of-town reader. Additionally, the publisher may arbitrarily designate one edition as authoritative. Almost always, microforms, PDFs, database copies, and library hard copies of a newspaper are limited to one edition per day, and that is not necessarily the last edition. Some newspapers that maintain their own websites may choose to put the latest version on the website, and sometimes corrections are even added days or weeks later, but that is not guaranteed. Databases may or may not be updated to match newspapers' own websites, since database publishers are often separate from newspaper publishers, even if they are connected by contract. In any case, we use what is available, and, if we have a choice, we should use the best available, and we cite what we use.
With regard to one author's views, editions per se do not matter because any later work that contradicts or qualifies any earlier work replaces the earlier work as authority. For example, if an author wrote in 1993 in the third edition of a Paris travel guide that ethology is wholly nonsense but in 2002 in the first edition of a cookbook that ethology is reliable, that author's view has changed and the later view is authoritative, regardless of which edition was first or third. However, if a distinction can be found between two views about one subject, both would be authoritative for that author and we would not report a change. If the distinction is minor, the earlier view may not be entitled to weight in Wikipedia.
Coauthors who form a stable group can be considered like a single author for these purposes.
Anthologies should be considered as collections in which each contribution has its own authorship, whether that results in all the contributions having the same or different authors.
Editors should not be considered like authors, even if an editor's name is more prominent than that of any author. Generally, what is published is the view or intellectual responsibility of an author and may or may not be that of an editor. An editor may even approve the publication of directly contradictory statements in one work, such as in an anthology.
Different sources may sometimes be treated differently. A case where this would be appropriate may be shown with a hypothetical example:
Suppose Smith wrote a book with one view and Washington wrote another book with a contradictory view. Suppose you agree with Smith but not with Washington. Provided they came from different sources (e.g., different books or different websites), when you edit an article, you're free to add Smith's view and not to add Washington's. However, if another editor adds Washington's view, you may not delete it on the ground that you disagree. No matter how sure you are that Washington is wrong to the bone, leave the content and the citation in place. What you may do is add a source that disputes Washington's view. This is consistent with Wikipedia being a work continually undergoing improvement and no editor is required to know or believe every source. Thus, to choose one source and not another is not cherrypicking from one source.
If Smith's and Washington's views are in the same source, you must report both views, and not only the one you agree with. They're in the same source and you may not cherrypick from a source.
Consensus change outside of Wikipedia
A consensus in a field or discipline is not usually the responsibility of one author, even if the author is an especially influential leader. A change in that consensus, such as on whether light needs ether as a medium for travel, is reportable in Wikipedia when sourcing reflecting consensus has reported that change, but one author's change of view usually is not a change in the consensus.
Paraphrasings and quotations
Cherrypicking should not be done, regardless of whether the result is a quotation or a paraphrase.
Remedies
It is legitimate to ask once on a page's talk page whether cherrypicking occurred in a specific case. If your question is based on speculation, that is where you may speculate, and even then only if reasonable. Beyond that, however, persisting in claiming that cherrypicking occurred is contrary to assuming good faith unless evidence of it has been uncovered. You may know a subject very well and believe an article to be erroneous. If you know it from other sourcing, a conflict between multiple sources is not evidence of cherrypicking. If you know it from the same sourcing that you believe was cherrypicked by another editor, you should be able to find evidence of cherrypicking. Speculation is not an acceptable ground for continuing to challenge content as cherrypicked. In general, the better course is to find any contradictory or significant qualifying information yourself and to edit the page to reflect what you have found.
Deletion or debate
Contradiction may justify deleting contradicted information that is more weakly sourced, but often it justifies presenting both sides of a topic, for instance by leaving the original statement intact and adding a new statement, so that readers can know multiple perspectives. Which course to follow depends on the case, but hypothetical examples may illuminate the difference:
- An author says something is true and a year later retracts the statement. Usually, only report the later statement or nothing at all. The exception would be if the earlier statement remains especially notorious after the retraction and must be discussed even though it was retracted, but that is rare.
- A book says that according to one religion only persons A, B, and C are prophets but according to another religion only persons D, E, and F are prophets. Although that is contradictory, in an article about both religions both statements should be reported, perhaps in the form of a fully disclosed disagreement. The reason is that the book is itself relying on two other sources, each of which may be authoritative for its own subject but not for the other source's subject.
- A book's title appears to be a statement of fact but the author, inside the book or elsewhere, denies that what the title says is correct as a statement of fact. This did happen with one book, the title of which placed one class of people as superior to another class, whereas inside the book the author denied that superiority. This can happen because a publisher wants a catchier title in order to encourage more sales, and some publishing contracts take control of the titles away from the authors. We do not ordinarily report a fact on the basis of a book title alone, but in a case like this we would be especially unlikely to do so.
Qualification probably does not require deletion or even debate, as long as significant qualifications are reported.
Larger meaning of cherrypicking outside Wikipedia
Outside of Wikipedia, in many media, cherrypicking has a wider meaning that is not entirely useful for the editing of Wikipedia. In those outside media, authors may be expected to have sufficient knowledge of a subject to refrain from misrepresenting the state of knowledge by cherrypicking sources from all the sources on the subject. This is especially true of scholarship. For example, for a peer-reviewed publication, a scholarly mathematician should know the consensus of mathematical knowledge on a topic or should not publish at all on that topic. However, Wikipedia has no such requirement for its editors. An editor who knows just one fact and just one source about it is allowed to add the fact and the source to Wikipedia, as long as all policies and guidelines are satisfied, and other editors can separately add more facts and sources from their knowledge. We strive to reach the state where the article as a whole is not the result of cherrypicking some sources from among many; neutrality or due weight, for example, may require that. However, an individual editor is not required to be neutral, only the article is, and an article's neutrality and balance of weight are often achieved by adding content, without necessarily deleting other content. Our concern against cherrypicking when editing a Wikipedia article is generally against cherrypicking specifically from within a single source.
Positive meaning of cherrypicking
A positive sense of cherrypicking is 'selecting relevant information and not selecting irrelevant information'. We're supposed to do that when writing for Wikipedia. For example, if you're writing an article about one person and basing it on a source about that person's family, you generally should select only information about the one person and ignore most information about most other people, even though they're all extensively detailed in the same source.
See also
- Wikipedia:Coatrack (the section Fact picking)
- Wikipedia:How to mine a source
- Wikipedia:Children's, adult new reader, and large print sources questionable on reliability (using these sources is not cherrypicking in the negative sense discussed in this essay but some editorial decisions in creating these sources may be akin to cherrypicking by other people)
- Template:Cherry picked (for an article with largely one-sided views regardless of sourcing)
Notes and references
- Although often attributed to John Maynard Keynes, that is doubted for want of a primary source (Zweig, Jason, Keynes: He Didn’t Say Half of What He Said. Or Did He?, in The Wall Street Journal (probably online only), February 11, 2011, 9:19 a.m., as accessed November 27, 2012), although one blog poster argued that it is reasonable (id., June 25, 2011, 3:14 a.m.) and, given that Google said it turned up 441,000 matches for the quotation (id., article), I argue now that doubtless someone besides Keynes said it and, given the content and if said by enough people, doubtless a scholar said it and it's useful and harmless to include it here (in a non-article) with the qualification, rather than write it in my own words and less wittily, somewhat as if I invented the notion. If an attribution to a presumably now-anonymous scholar can be provided, it'll be good to add it.
- Violet, Ultra, Famous For 15 Minutes: My Years With Andy Warhol (N.Y.: Avon Books, 1st Avon Books Trade Printing April 1990, © 1988 (ISBN 0-380-70843-4)), p. v (Disclaimer).
- Websites supporting searches include, for hardcopy sources, Amazon.com (http://www.amazon.com/) and Google Books (http://books.google.com/) and, for public domain works, Project Gutenberg (http://www.gutenberg.org/), the Perseus Digital Library (http://www.perseus.tufts.edu/hopper/), and the Internet Archive or Wayback Machine (http://archive.org/) (this also has old Web pages), as well as paid-for databases (often free in libraries) such as JSTOR (for books and journals) and those from ProQuest and EBSCOhost.
- Reportedly, the old consensus was affirmative; Einstein disagreed; and some scientists questioned or denied Einstein's view for a few decades. Einstein's view survived as the new consensus of the field.