Co-Citation Proximity Analysis (CPA) [1, 2, 3] is a method for computing both local and global instances of semantic similarity in academic documents by examining the proximity of citations in their full texts.

CPA was developed with two applications in mind: recommender systems and clustering.
Regarding the first application, an improved measure of document semantic similarity, which computes similarity at a more fine-grained resolution, has the potential to significantly improve the relevance of academic literature recommendations. Regarding the second application, a more granular measure of document similarity allows the development of more precise clustering algorithms for academic literature.

CPA builds on the well-known and widely used co-citation analysis. Beyond counting co-citations, CPA was the first approach to propose weighting them by the proximity of the co-cited references to each other within an article’s full text [4]. The underlying idea is that the closer two citations appear to each other in the full text of a document, the more likely the cited documents are related.
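The weighting idea can be illustrated with a minimal sketch. It is not the authors' implementation: the weight values, the `(section, paragraph, sentence)` position format, and the function names `proximity_weight`, `cpa_scores`, and `cpa_corpus` are all assumptions made for this example. Within one citing document, each pair of cited documents receives the weight of its closest co-occurrence; the weights are then summed over all citing documents.

```python
from collections import defaultdict
from itertools import combinations

# Illustrative weights (an assumption of this sketch): the finer the
# structural level two citations share, the larger the weight.
WEIGHTS = {"sentence": 1.0, "paragraph": 0.5, "section": 0.25, "document": 0.125}


def proximity_weight(pos_a, pos_b):
    """Weight for two citation positions given as (section, paragraph, sentence).

    Paragraph indices are assumed to be local to their section, and
    sentence indices local to their paragraph.
    """
    if pos_a[0] != pos_b[0]:
        return WEIGHTS["document"]   # only the citing document is shared
    if pos_a[1] != pos_b[1]:
        return WEIGHTS["section"]    # same section, different paragraphs
    if pos_a[2] != pos_b[2]:
        return WEIGHTS["paragraph"]  # same paragraph, different sentences
    return WEIGHTS["sentence"]       # same sentence


def cpa_scores(citations):
    """Score all pairs of cited documents within ONE citing document.

    citations: list of (cited_id, (section, paragraph, sentence)).
    If a pair co-occurs several times, the closest occurrence wins
    (a design choice of this sketch).
    """
    scores = {}
    for (id_a, pos_a), (id_b, pos_b) in combinations(citations, 2):
        if id_a == id_b:
            continue  # a document is not co-cited with itself
        pair = tuple(sorted((id_a, id_b)))
        weight = proximity_weight(pos_a, pos_b)
        scores[pair] = max(scores.get(pair, 0.0), weight)
    return scores


def cpa_corpus(citing_documents):
    """Sum the per-document pair scores over all citing documents."""
    totals = defaultdict(float)
    for citations in citing_documents:
        for pair, weight in cpa_scores(citations).items():
            totals[pair] += weight
    return dict(totals)


# Hypothetical input: two citing documents and their in-text citations.
doc1 = [("A", (1, 1, 1)), ("B", (1, 1, 1)), ("C", (1, 2, 1)), ("D", (3, 1, 1))]
doc2 = [("A", (1, 1, 1)), ("C", (1, 1, 2))]
totals = cpa_corpus([doc1, doc2])
print(totals[("A", "B")])  # co-cited once, in the same sentence
print(totals[("A", "C")])  # co-cited twice, but never in the same sentence
```

In this toy input, a plain co-citation count would rank the pair (A, C) above (A, B), because A and C are co-cited in two documents. Under the assumed weights, (A, B) ranks higher, because A and B are cited in the same sentence.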

Compared to existing approaches such as bibliographic coupling, co-citation analysis, or keyword-based similarity computation, CPA achieves higher precision and makes it possible to pinpoint related chapters, sections, or paragraphs within the texts of academic documents. Moreover, CPA allows more precise automatic document classification.

Related Publications

[1] B. Gipp and J. Beel, “Citation Proximity Analysis (CPA) – A New Approach for Identifying Related Work Based on Co-Citation Analysis,” in Proceedings of the 12th International Conference on Scientometrics and Informetrics (ISSI’09), Rio de Janeiro, Brazil, 2009.
  Title                    = {{C}itation {P}roximity {A}nalysis ({CPA}) - {A} {N}ew {A}pproach for {I}dentifying {R}elated {W}ork {B}ased on {C}o-{C}itation {A}nalysis},
  Author                   = {{G}ipp, {B}ela and {B}eel, {J}oeran},
  Booktitle                = {{P}roceedings of the 12th {I}nternational {C}onference on {S}cientometrics and {I}nformetrics ({ISSI}'09)},
  Year                     = {2009},
  Address                  = {Rio de Janeiro, Brazil},
  Editor                   = {Larsen, Birger and Leta, Jacqueline},
  Month                    = {Jul.},
  Note                     = {ISSN 2175-1935},
  Publisher                = {International Society for Scientometrics and Informetrics},
  Volume                   = {2}
[2] B. Gipp and J. Beel, “Identifying Related Documents For Research Paper Recommender By CPA And COA,” in Proceedings of The World Congress on Engineering and Computer Science 2009, Berkeley, USA, 2009.
  Title                    = {{I}dentifying {R}elated {D}ocuments {F}or {R}esearch {P}aper {R}ecommender {B}y {CPA} {A}nd {COA}},
  Author                   = {{G}ipp, {B}ela and {B}eel, {J}oeran},
  Booktitle                = {{P}roceedings of {T}he {W}orld {C}ongress on {E}ngineering and {C}omputer {S}cience 2009},
  Year                     = {2009},
  Address                  = {Berkeley, USA},
  Editor                   = {Ao, S. I. and Douglas, C. and Grundfest, W. S. and Burgstone, J.},
  Month                    = {Oct.},
  Organization             = {International Association of Engineers (IAENG)},
  Publisher                = {Newswood Limited},
  Series                   = {Lecture Notes in Engineering and Computer Science},
  Volume                   = {1},
  ISBN                     = {978-988-17012-6-8},
  Doi                      = {10.1007/s11257-016-9174-x}
[3] B. Gipp, “Measuring Document Relatedness by Citation Proximity Analysis and Citation Order Analysis,” in Research and Advanced Technology for Digital Libraries: Proceedings of the 14th European Conference on Digital Libraries (ECDL’10), 2010.
  Title                    = {{M}easuring {D}ocument {R}elatedness by {C}itation {P}roximity {A}nalysis and {C}itation {O}rder {A}nalysis},
  Author                   = {{G}ipp, {B}ela},
  Booktitle                = {{R}esearch and {A}dvanced {T}echnology for {D}igital {L}ibraries: {P}roceedings of the 14th {E}uropean {C}onference on {D}igital {L}ibraries ({ECDL}'10)},
  Year                     = {2010},
  Editor                   = {Lalmas, M. and Jose, J. and Rauber, A. and Sebastiani, F. and Frommholz, I.},
  Month                    = {Sep.},
  Publisher                = {Springer},
  Series                   = {Lecture Notes in Computer Science (LNCS)},
  Volume                   = {6273}

[4] K. W. Boyack, H. Small, and R. Klavans, “Improving the Accuracy of Co-citation Clustering Using Full Text,” in Proceedings of the 17th International Conference on Science and Technology Indicators, 2012.