[Google Scholar] [ORCID]


  1. Martin Gerlach, Marshall Miller, Rita Ho, Kosta Harlan, Djellel Difallah
    A Multilingual Entity Linking System for Wikipedia with a Machine-in-the-Loop Approach
    CIKM 2021
    [paper] [arxiv] [code] [meta] [tool]

  2. Charles C. Hyland, Yuanming Tao, Lamiae Azizi, Martin Gerlach, Tiago P. Peixoto, Eduardo G. Altmann
    Multilayer networks for text analysis with multiple data types
    EPJ Data Science (2021)
    [paper] [data&code: TopSBM]

  3. Isaac Johnson, Martin Gerlach, Diego Saez-Trumper
    Language-agnostic Topic Classification for Wikipedia
    WikiWorkshop 2021
    [paper] [arxiv] [code] [data] [meta]

  4. Miriam Redi, Martin Gerlach, Isaac Johnson, Jonathan Morgan, Leila Zia
    A Taxonomy of Knowledge Gaps for Wikimedia Projects (Second Draft)
    [paper] [meta]

  5. Ziyou Ren, Martin Gerlach, Hanyu Shi, GR Scott Budinger, Luis A Nunes Amaral
    Information-theory-based benchmarking and feature selection algorithm improve cell type annotation and reproducibility of single cell RNA-seq data analysis pipelines
    under review
    [paper] [code]

  6. Martin Gerlach, Francesc Font-Clos
    A standardized Project Gutenberg corpus for statistical analysis of natural language and quantitative linguistics
    Entropy (2020)
    [paper] [arxiv] [code] [data]

  7. Martin Gerlach*, Hanyu Shi*, Luis A.N. Amaral (* equal contribution)
    A universal information theoretic approach to the identification of stopwords
    Nature Machine Intelligence (2019)
    [paper] [pdf (read-only)] [pdf] [data&code]

    Media Coverage:
    + Northwestern News

  8. Martin Gerlach, Eduardo G. Altmann
    Testing statistical laws in complex systems
    Physical Review Letters (2019)
    [paper] [arxiv] [data&code]

  9. Julia Poncela-Casasnovas, Martin Gerlach, Nathan Aguirre, Luis A.N. Amaral
    Large scale analysis of micro-level citation patterns reveals nuanced selection criteria
    Nature Human Behaviour (2019)
    [paper] [pdf] [data&code]

  10. Hanyu Shi, Martin Gerlach, Isabel Diersen, Doug Downey, Luis A.N. Amaral
    A new evaluation framework for topic modeling algorithms based on synthetic corpora
    AISTATS (2019), Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics
    [paper] [arxiv] [code]

  11. Thomas Stoeger, Martin Gerlach, Richard I. Morimoto, Luis A.N. Amaral
    Large-scale investigation of the reasons why potentially important genes are ignored
    PLOS Biology (2018)
    [paper] [data&code]

    Media Coverage:
    + Top 10% most cited papers in PLOS Biology
    + PLOS Biology Primer by Ian Dunham
    + New York Times by Carl Zimmer
    + The Atlantic by Ed Yong
    + Northwestern News (Video)
    + The Economist
    + Science Magazine
    + Nature (Daily Briefing)
    + F1000-prime

    + Reply to “Far away from the lamppost”, PLOS Biology (2019)

  12. Martin Gerlach, Beatrice Farb, William Revelle, Luis A.N. Amaral
    A robust data-driven approach identifies four personality types across four large data sets
    Nature Human Behaviour (2018)
    [paper] [pdf] [data&code]

    Media Coverage:
    + Northwestern News (Video)
    + Scientific American
    + Science Magazine
    + Time Magazine
    + Washington Post
    + Süddeutsche Zeitung (German)


  13. Martin Gerlach, Tiago P. Peixoto, Eduardo G. Altmann
    A network approach to topic models
    Science Advances (2018)
    [paper] [arxiv] [code: TopSBM]

    Media Coverage:
    + Northwestern Data Science Initiative
    + TechXplore

  14. Laercio Dias, Martin Gerlach, Joachim Scharloth, Eduardo G. Altmann
    Using text analysis to quantify the similarity and evolution of scientific disciplines
    Royal Society Open Science (2018)
    [paper] [arxiv]

  15. Eduardo G. Altmann, Laercio Dias, Martin Gerlach
    Generalized Entropies and the Similarity of Texts
    Journal of Statistical Mechanics (2017)
    [paper] [arxiv]

  16. Jorge C. Leitão, Jose M. Miotto, Martin Gerlach, Eduardo G. Altmann
    Is this scaling nonlinear?
    Royal Society Open Science (2016)
    [paper] [arxiv] [data&code]

  17. Martin Gerlach, Francesc Font-Clos, Eduardo G. Altmann
    Similarity of symbol frequency distributions with heavy tails
    Physical Review X (2016)
    [paper] [arxiv]

    Media Coverage:
    + APS Focus by Philip Ball
    + Physics Today
    + RIA Novosti (Russian)

  18. Eduardo G. Altmann, Martin Gerlach
    Statistical laws in linguistics
    In Creativity and Universality in Language (Springer, 2016)
    [paper] [arxiv]

  19. Martin Gerlach, Eduardo G. Altmann
    Scaling laws and fluctuations in the statistics of word frequencies
    New Journal of Physics (2014)
    [paper] [arxiv]

  20. Fakhteh Ghanbarnejad*, Martin Gerlach*, Jose M. Miotto, Eduardo G. Altmann (* equal contribution)
    Extracting information from S-curves of language change
    Journal of The Royal Society Interface (2014)
    [paper] [arxiv] [data&code]

    Media Coverage:
    + Spiegel Online (German)

  21. Martin Gerlach, Eduardo G. Altmann
    Stochastic Model for the Vocabulary Growth in Natural Languages
    Physical Review X (2013)
    [paper] [arxiv]

  22. Martin Gerlach, Sebastian Wüster, Jan-Michael Rost
    Kicking Electrons
    Journal of Physics B (2012)
    [paper] [arxiv]

    Media Coverage:
    + Selected for Highlights of 2012 by the editors of Journal of Physics B


  • Doctoral Thesis (2016)
    Universality and variability in the statistics of data with fat-tailed distributions: The case of word frequencies in natural languages
    Presented to the Physics Department of Dresden University of Technology; produced at the Max Planck Institute for the Physics of Complex Systems
    [link] [pdf]

  • Diploma Thesis (2011)
    Hamiltonians dominanter Wechselwirkung
    Presented to the Physics Department of Dresden University of Technology; produced at the Max Planck Institute for the Physics of Complex Systems

Peer-review & Editorial activity


I have reviewed for the following journals/conferences:

I have served as a handling editor for PNAS.

Conferences & Seminars