
Citation-based Plagiarism Detection (CbPD) is a novel approach capable of identifying disguised plagiarism in academic texts [1].

For details and an in-depth analysis of the CbPD approach, refer to the doctoral dissertation of Bela Gipp, which will be available as a book from Springer Vieweg Research in July 2014 [2]. The thesis is also available for download here.

CbPD can be applied to any text containing citations, including academic documents, scientific publications, patents, and legal cases. The approach addresses a key shortcoming of existing text-based plagiarism detection methods, which typically fail to detect translated and strongly disguised plagiarism because they examine only the words in documents (i.e., text overlap) to find suspicious similarity.

In contrast, CbPD makes use of the semantic information implied by the citations within documents. The approach identifies and analyses similar patterns in the citation sequences of academic documents to compute similarity [3].
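To illustrate what comparing citation sequences can look like, here is a minimal sketch of a longest common subsequence computation over citation identifiers. Reference [3] names Longest Common Citation Sequence as one of the pattern matching algorithms; the code below is a generic textbook dynamic-programming version, not the authors' implementation. The function name and the identifiers `ref1`, `ref2`, and so on are hypothetical and chosen only for illustration.

```python
def longest_common_citation_sequence(a, b):
    """Return one longest common subsequence of two citation sequences.

    `a` and `b` are lists of citation identifiers in the order in which
    the citations appear in each document (an assumed representation).
    """
    m, n = len(a), len(b)
    # dp[i][j] holds the length of the longest common subsequence
    # of the suffixes a[i:] and b[j:].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk the table from the start to recover one longest sequence.
    i = j = 0
    result = []
    while i < m and j < n:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


# Hypothetical citation sequences of two documents.
doc_a = ["ref1", "ref2", "ref3", "ref4", "ref5"]
doc_b = ["ref2", "ref9", "ref3", "ref5", "ref7"]

shared = longest_common_citation_sequence(doc_a, doc_b)
print(shared)  # ['ref2', 'ref3', 'ref5']
```

Because the comparison works on citation identifiers rather than words, the result is the same whether the surrounding text was paraphrased or translated, provided the citations themselves can be matched to the same sources.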

Our observations confirmed that citation pattern similarity often remains detectable even if text has been translated or strongly paraphrased [4]. Thus, in many instances, CbPD can detect plagiarism that traditional text-based approaches could not have identified automatically, for example, when text was sufficiently disguised by synonyms or word rearrangement, or when copied text was translated. Our analysis of the plagiarized doctoral thesis of Karl-Theodor zu Guttenberg [5], as well as an analysis of the VroniPlag Wiki performed in [6], also confirmed that citation patterns in plagiarized texts often show suspicious similarities to the citation patterns in the original source document(s). An evaluation of the citation-based approach on a large-scale collection of over 200,000 scientific publications in the PubMed Central Open Access Subset demonstrated the practicability of the approach in a real-world setting and on a range of realistically disguised plagiarism forms [7].

Below are two documents visualized using CbPD. Matching citations are highlighted and connected in a central column so that document similarity can be examined quickly. The documents share no literal text similarity: the left publication is in English and the right in Chinese. However, one can see that the citation overlap is high and that the order in which citations are made is nearly identical in several paragraphs. The English text contains only 7 citations (gray circles) that are not shared with the Chinese text.

[Figure: CitePlag visualization of the English and Chinese documents]

(This example is a real case of plagiarism, visualized using our CitePlag prototype.
The scientific plagiarism occurred in the Elsevier journal Neuroscience Letters,
and the English publication, which translated the earlier Chinese publication, has since been retracted.)
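The citation overlap described for this example can be quantified in a simple way. The following is a minimal sketch that splits the citations of two documents into shared and unshared sets and computes a Jaccard coefficient. This is a generic set-overlap measure used here for illustration, not necessarily the measure CitePlag applies, and the function name and the identifiers `r1`, `r2`, and so on are hypothetical rather than taken from the documents shown.

```python
def citation_overlap(a, b):
    """Summarize which cited sources two documents share.

    `a` and `b` are iterables of citation identifiers; citation order
    and repeated citations are ignored in this order-independent view.
    """
    set_a, set_b = set(a), set(b)
    shared = set_a & set_b
    union = set_a | set_b
    return {
        "shared": sorted(shared),
        "only_in_a": sorted(set_a - set_b),
        "only_in_b": sorted(set_b - set_a),
        # Jaccard coefficient: shared sources divided by all sources.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Hypothetical citation lists of an English and a Chinese document.
english = ["r1", "r2", "r3", "r4", "r8"]
chinese = ["r1", "r2", "r3", "r4"]

report = citation_overlap(english, chinese)
print(report["only_in_a"])  # ['r8']
print(report["jaccard"])    # 0.8
```

A set-based measure like this ignores the order of citations, so it complements rather than replaces an order-sensitive comparison of citation sequences.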

CbPD is not a substitute for, but rather an extension of, currently used text-based plagiarism detection approaches. While today's text analysis methods can detect even very short copied text snippets if they have not been sufficiently modified, the CbPD approach requires longer passages of text with three or more citations to yield dependable results.

Test the CbPD approach for yourself in the first prototype of a citation-based plagiarism detection system [4]: CitePlag.

Related Publications

[1] [pdf] [doi] B. Gipp and J. Beel, “Citation Based Plagiarism Detection – A New Approach to Identify Plagiarized Work Language Independently,” in Proceedings of the 21st ACM Conference on Hypertext and Hypermedia (HT’10), New York, NY, USA, 2010.
[Bibtex]
@InProceedings{Gipp10c,
  Title                    = {{C}itation {B}ased {P}lagiarism {D}etection - {A} {N}ew {A}pproach to {I}dentify {P}lagiarized {W}ork {L}anguage {I}ndependently},
  Author                   = {{G}ipp, {B}ela and {B}eel, {J}oeran},
  Booktitle                = {{P}roceedings of the 21st {ACM} {C}onference on {H}ypertext and {H}ypermedia ({HT}'10)},
  Year                     = {2010},
  Address                  = {New York, NY, USA},
  Month                    = {Jun.},
  Publisher                = {ACM},
  Doi                      = {10.1145/1810617.1810671},
  ISBN                     = {978-1-4503-0041-4},
  Location                 = {Toronto, Ontario, Canada}
}
[2] [pdf] [doi] B. Gipp, Citation-based Plagiarism Detection – Detecting Disguised and Cross-language Plagiarism using Citation Pattern Analysis, Springer Vieweg Research, 2014.
[Bibtex]
@Book{ThesisBelaGipp,
  Title                    = {{C}itation-based {P}lagiarism {D}etection - {D}etecting {D}isguised and {C}ross-language {P}lagiarism using {C}itation {P}attern {A}nalysis},
  Author                   = {{G}ipp, {B}ela},
  Publisher                = {Springer Vieweg Research},
  Year                     = {2014},

  Doi                      = {10.1007/978-3-658-06394-8},
  ISBN                     = {978-3-658-06393-1},
  Pages                    = {350},
  School                   = {Department of Computer Science, Otto-von-Guericke-University Magdeburg, Germany},
  Url                      = {http://www.springer.com/978-3-658-06393-1}
}
[3] [pdf] [doi] B. Gipp and N. Meuschke, “Citation Pattern Matching Algorithms for Citation-based Plagiarism Detection: Greedy Citation Tiling, Citation Chunking and Longest Common Citation Sequence,” in Proceedings of the 11th ACM symposium on Document engineering (DocEng ’11), Mountain View, CA, USA, 2011.
[Bibtex]
@InProceedings{Gipp11c,
  Title                    = {{C}itation {P}attern {M}atching {A}lgorithms for {C}itation-based {P}lagiarism {D}etection: {G}reedy {C}itation {T}iling, {C}itation {C}hunking and {L}ongest {C}ommon {C}itation {S}equence},
  Author                   = {{G}ipp, {B}ela and {M}euschke, {N}orman},
  Booktitle                = {{P}roceedings of the 11th {ACM} symposium on {D}ocument engineering ({D}oc{E}ng '11)},
  Year                     = {2011},
  Address                  = {Mountain View, CA, USA},
  Month                    = {Sep.},
  Publisher                = {ACM},
  Doi                      = {10.1145/2034691.2034741},
  ISBN                     = {978-1-4503-0863-2}
}
[4] [pdf] [doi] B. Gipp, N. Meuschke, C. Breitinger, M. Lipinski, and A. Nuernberger, “Demonstration of Citation Pattern Analysis for Plagiarism Detection,” in Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, Dublin, Ireland, 2013.
[Bibtex]
@InProceedings{Gipp13,
  Title                    = {{D}emonstration of {C}itation {P}attern {A}nalysis for {P}lagiarism {D}etection},
  Author                   = {{G}ipp, {B}ela and {M}euschke, {N}orman and {B}reitinger, {C}orinna and {L}ipinski, {M}ario and {N}uernberger, {A}ndreas},
  Booktitle                = {{P}roceedings of the 36th {I}nternational {ACM} {SIGIR} {C}onference on {R}esearch and {D}evelopment in {I}nformation {R}etrieval},
  Year                     = {2013},
  Address                  = {Dublin, Ireland},
  Month                    = {Jul. 28 - Aug. 1},
  Publisher                = {ACM},
  Doi                      = {10.1145/2484028.2484214}
}
[5] [pdf] [doi] B. Gipp, N. Meuschke, and J. Beel, “Comparative Evaluation of Text- and Citation-based Plagiarism Detection Approaches using GuttenPlag,” in Proceedings of 11th annual international ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL’11), Ottawa, Canada, 2011.
[Bibtex]
@InProceedings{Gipp11,
  Title                    = {{C}omparative {E}valuation of {T}ext- and {C}itation-based {P}lagiarism {D}etection {A}pproaches using {G}utten{P}lag},
  Author                   = {{G}ipp, {B}ela and {M}euschke, {N}orman and {B}eel, {J}oeran},
  Booktitle                = {{P}roceedings of 11th annual international {ACM}/{IEEE}-{CS} {J}oint {C}onference on {D}igital {L}ibraries ({JCDL}'11)},
  Year                     = {2011},
  Address                  = {Ottawa, Canada},
  Publisher                = {ACM},

  Doi                      = {10.1145/1998076.1998124}
}
[6] [pdf] B. Gipp, Doctoral Thesis: Citation-based Plagiarism Detection: Applying Citation Pattern Analysis to Identify Currently Non-Machine-Detectable Disguised Plagiarism in Scientific Publications, University of Magdeburg, 2013.
[Bibtex]
@Book{Gipp13a,
  Title                    = {{D}octoral {T}hesis: {C}itation-based {P}lagiarism {D}etection: {A}pplying {C}itation {P}attern {A}nalysis to {I}dentify {C}urrently {N}on-{M}achine-{D}etectable {D}isguised {P}lagiarism in {S}cientific {P}ublications},
  Author                   = {{G}ipp, {B}ela},
  Publisher                = {University of Magdeburg},
  Year                     = {2013},
  School                   = {Department of Computer Science, Otto-von-Guericke University Magdeburg, Germany}
}
[7] [pdf] [doi] B. Gipp, N. Meuschke, and C. Breitinger, “Citation-based Plagiarism Detection: Practicability on a Large-scale Scientific Corpus,” Journal of the American Society for Information Science and Technology (JASIST), vol. 65, iss. 2, pp. 1527-1540, 2014.
[Bibtex]
@Article{Gipp13b,
  Title                    = {{C}itation-based {P}lagiarism {D}etection: {P}racticability on a {L}arge-scale {S}cientific {C}orpus},
  Author                   = {{G}ipp, {B}ela and {M}euschke, {N}orman and {B}reitinger, {C}orinna},
  Journal                  = {{J}ournal of the {A}merican {S}ociety for {I}nformation {S}cience and {T}echnology {(JASIST)}},
  Year                     = {2014},
  Number                   = {2},
  Pages                    = {1527--1540},
  Volume                   = {65},

  Doi                      = {10.1002/asi.23228}
}