Diverse queries and feature type selection for plagiarism discovery: Notebook for PAN at CLEF 2013

Varování

Publikace nespadá pod Pedagogickou fakultu, ale pod Fakultu informatiky. Oficiální stránka publikace je na webu muni.cz.
Autoři

SUCHOMEL Šimon KASPRZAK Jan BRANDEJS Michal

Rok publikování 2013
Druh Článek ve sborníku
Konference 2013 Cross Language Evaluation Forum Conference, CLEF 2013, CEUR Workshop Proceedings Volume 1179
Fakulta / Pracoviště MU

Fakulta informatiky

Citace
www http://ceur-ws.org/Vol-1179/
Obor Informatika
Klíčová slova suspicious document; plagiarism detection; search engine; source retrieval; stop word; text alignment; contextual n gram; word n gram; representative sentence; overlapping detection; snippet similarity; global postprocessing
Popis This paper describes approaches used for the Plagiarism Detection task in PAN 2013 international competition on uncovering plagiarism, authorship, and social software misuse. We present modified three-way search methodology for Source Retrieval subtask and analyse snippet similarity performance. The results show, that presented approach is adaptable in real-world plagiarism situations. For the Detailed Comparison task, we discuss feature type selection and global postprocessing. Resulting performance is significantly better with the described modifications, and further improvement is still possible.
Související projekty:

Používáte starou verzi internetového prohlížeče. Doporučujeme aktualizovat Váš prohlížeč na nejnovější verzi.