Publications

[Google scholar] [orcid]

Papers

  1. Dale Zhou, Shubhankar P. Patankar, David M. Lydon-Staley, Perry Zurn, Martin Gerlach*, Dani S. Bassett* (* equal contribution)
    Architectural styles of curiosity in global Wikipedia mobile app readership
    under review
    [arxiv] [meta]

  2. Akhil Arora, Robert West, Martin Gerlach
    Orphan Articles: The Dark Matter of Wikipedia
    ICWSM (2024), to appear
    [arxiv] [meta]

  3. Tiziano Piccardi, Martin Gerlach, Robert West
    Curious Rhythms: Temporal Regularities of Wikipedia Consumption
    ICWSM (2024), to appear
    [arxiv] [meta]

  4. Tiziano Piccardi, Martin Gerlach, Akhil Arora, Robert West
    A Large-Scale Characterization of How Readers Browse Wikipedia
    ACM Transactions on the Web (2023)
    [paper] [arxiv] [meta]

  5. Tiziano Piccardi, Martin Gerlach, Robert West
    Going Down the Rabbit Hole: Characterizing the Long Tail of Wikipedia Reading Sessions
    WikiWorkshop 2022, Companion Proceedings of The Web Conference 2022 (WWW ’22)
    [paper] [arxiv]

  6. Akhil Arora, Martin Gerlach, Tiziano Piccardi, Alberto García-Durán, Robert West
    Wikipedia Reader Navigation: When Synthetic Data Is Enough
    WSDM 2022, Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining
    [paper] [arxiv] [code] [meta]

  7. Martin Gerlach, Marshall Miller, Rita Ho, Kosta Harlan, Djellel Difallah
    A Multilingual Entity Linking System for Wikipedia with a Machine-in-the-Loop Approach
    CIKM 2021, Proceedings of the 30th ACM International Conference on Information & Knowledge Management
    [paper] [arxiv] [code] [meta] [tool]

  8. Charles C. Hyland, Yuanming Tao, Lamiae Azizi, Martin Gerlach, Tiago P. Peixoto, Eduardo G. Altmann
    Multilayer networks for text analysis with multiple data types
    EPJ Data Science (2021)
    [paper] [arxiv] [data&code: TopSBM]

  9. Isaac Johnson, Martin Gerlach, Diego Saez-Trumper
    Language-agnostic Topic Classification for Wikipedia
    WikiWorkshop 2021, Companion Proceedings of the Web Conference 2021 (WWW ’21)
    [paper] [arxiv] [code] [data] [meta]

  10. Miriam Redi, Martin Gerlach, Isaac Johnson, Jonathan Morgan, Leila Zia
    A Taxonomy of Knowledge Gaps for Wikimedia Projects (Second Draft)
    unpublished
    [arxiv] [meta]

  11. Ziyou Ren, Martin Gerlach, Hanyu Shi, GR Scott Budinger, Luis A.N. Amaral
    Information-theory-based benchmarking and feature selection algorithm improve cell type annotation and reproducibility of single cell RNA-seq data analysis pipelines
    under review
    [biorxiv] [code]

  12. Martin Gerlach, Francesc Font-Clos
    A standardized Project Gutenberg corpus for statistical analysis of natural language and quantitative linguistics
    Entropy (2020)
    [paper] [arxiv] [code] [data]

  13. Martin Gerlach*, Hanyu Shi*, Luis A.N. Amaral (* equal contribution)
    A universal information theoretic approach to the identification of stopwords
    Nature Machine Intelligence (2019)
    [paper] [pdf (read-only)] [pdf] [data&code]

    Media Coverage:
    + Northwestern News

  14. Martin Gerlach, Eduardo G. Altmann
    Testing statistical laws in complex systems
    Physical Review Letters (2019)
    [paper] [arxiv] [data&code]

  15. Julia Poncela-Casasnovas, Martin Gerlach, Nathan Aguirre, Luis A.N. Amaral
    Large scale analysis of micro-level citation patterns reveals nuanced selection criteria
    Nature Human Behaviour (2019)
    [paper] [pdf] [data&code]

    Media Coverage:
    + phys.org

  16. Hanyu Shi, Martin Gerlach, Isabel Diersen, Doug Downey, Luis A.N. Amaral
    A new evaluation framework for topic modeling algorithms based on synthetic corpora
    AISTATS (2019), Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics
    [paper] [arxiv] [code]

  17. Thomas Stoeger, Martin Gerlach, Richard I. Morimoto, Luis A.N. Amaral
    Large-scale investigation of the reasons why potentially important genes are ignored
    PLOS Biology (2018)
    [paper] [data&code]

    Media Coverage:
    + Top 10% most cited papers in PLOS Biology
    + PLOS Biology Primer by Ian Dunham
    + New York Times by Carl Zimmer
    + The Atlantic by Ed Yong
    + Northwestern News (Video)
    + The Economist
    + Science Magazine
    + Nature (Daily Briefing)
    + F1000-prime

    Comments/Replies:
    + Reply to “Far away from the lamppost”, PLOS Biology (2019)

  18. Martin Gerlach, Beatrice Farb, William Revelle, Luis A.N. Amaral
    A robust data-driven approach identifies four personality types across four large data sets
    Nature Human Behaviour (2018)
    [paper] [pdf] [data&code]

    Media Coverage:
    + Northwestern News (Video)
    + Scientific American
    + Science Magazine
    + Time Magazine
    + Washington Post
    + Süddeutsche Zeitung (German)

    Comments/Replies:

  19. Martin Gerlach, Tiago P. Peixoto, Eduardo G. Altmann
    A network approach to topic models
    Science Advances (2018)
    [paper] [arxiv] [code: TopSBM]

    Media Coverage:
    + Northwestern Data Science Initiative
    + TechXplore

  20. Laercio Dias, Martin Gerlach, Joachim Scharloth, Eduardo G. Altmann
    Using text analysis to quantify the similarity and evolution of scientific disciplines
    Royal Society Open Science (2018)
    [paper] [arxiv]

  21. Eduardo G. Altmann, Laercio Dias, Martin Gerlach
    Generalized Entropies and the Similarity of Texts
    Journal of Statistical Mechanics (2017)
    [paper] [arxiv]

  22. Jorge C. Leitão, Jose M. Miotto, Martin Gerlach, Eduardo G. Altmann
    Is this scaling nonlinear?
    Royal Society Open Science (2016)
    [paper] [arxiv] [data&code]

  23. Martin Gerlach, Francesc Font-Clos, Eduardo G. Altmann
    Similarity of symbol frequency distributions with heavy tails
    Physical Review X (2016)
    [paper] [arxiv]

    Media Coverage:
    + APS Focus by Philip Ball
    + Physics Today
    + RIA Novosti (Russian)

  24. Eduardo G. Altmann, Martin Gerlach
    Statistical laws in linguistics
    In Creativity and Universality in Language (Springer, 2016)
    [paper] [arxiv]

  25. Martin Gerlach, Eduardo G. Altmann
    Scaling laws and fluctuations in the statistics of word frequencies
    New Journal of Physics (2014)
    [paper] [arxiv]

  26. Fakhteh Ghanbarnejad*, Martin Gerlach*, Jose M. Miotto, Eduardo G. Altmann (* equal contribution)
    Extracting information from S-curves of language change
    Journal of The Royal Society Interface (2014)
    [paper] [arxiv] [data&code]

    Media Coverage:
    + Spiegel Online (German)

  27. Martin Gerlach, Eduardo G. Altmann
    Stochastic Model for the Vocabulary Growth in Natural Languages
    Physical Review X (2013)
    [paper] [arxiv]

  28. Martin Gerlach, Sebastian Wüster, Jan-Michael Rost
    Kicking Electrons
    Journal of Physics B (2012)
    [paper] [arxiv]

    Media Coverage:
    + Selected for Highlights of 2012 by the editors of Journal of Physics B

Blogposts

  1. Martin Gerlach, Isaac Johnson, and Nazia Tasnim
    From hell to HTML: releasing a Python package to easily work with Wikimedia HTML dumps
    Wikimedia Tech-blog (2023)
    [tool] [code]

  2. Muniza A., Isaac Johnson, and Martin Gerlach
    Analyzing the Wikipedia clickstream just got easier with WikiNav
    Wikimedia Tech-blog (2021)
    [tool] [code] [meta]

  3. Cristina Butoiu, Martin Gerlach, and Leighanna Mixter
    World Suicide Prevention Day and the opportunity to increase access to mental health information on Wikimedia projects
    Wikimedia Diff-blog (2021)

Thesis

  • Doctoral Thesis (2016)
    Universality and variability in the statistics of data with fat-tailed distributions: The case of word frequencies in natural languages
    presented to the Physics Department of Dresden University of Technology; produced at the Max Planck Institute for the Physics of Complex Systems [link] [pdf]

  • Diploma Thesis (2011)
    Hamiltonians dominanter Wechselwirkung
    presented to the Physics Department of Dresden University of Technology; produced at the Max Planck Institute for the Physics of Complex Systems


Academic service

[Publons-profile]

I have reviewed for the following journals/conferences:

I have served as a handling editor for PNAS.

I have been a co-organizer of WikiWorkshop 2023 and WikiWorkshop 2024.


Conferences & Seminars