  1. Isaac Johnson, Martin Gerlach, Diego Saez-Trumper Language-agnostic Topic Classification for Wikipedia WikiWorkshop 2021 [paper] [arxiv] [code] [data] [meta]

  2. Martin Gerlach, Marshall Miller, Rita Ho, Kosta Harlan, Djellel Difallah A Multilingual Entity Linking System for Wikipedia with a Machine-in-the-Loop Approach under review [paper] [code] [meta]

  3. Miriam Redi, Martin Gerlach, Isaac Johnson, Jonathan Morgan, Leila Zia A Taxonomy of Knowledge Gaps for Wikimedia Projects (Second Draft) unpublished [paper] [meta]

  4. Ziyou Ren, Martin Gerlach, Hanyu Shi, GR Scott Budinger, Luis A Nunes Amaral Information-theory-based benchmarking and feature selection algorithm improve cell type annotation and reproducibility of single cell RNA-seq data analysis pipelines under review [paper] [code]

  5. Martin Gerlach, Francesc Font-Clos A standardized Project Gutenberg corpus for statistical analysis of natural language and quantitative linguistics Entropy (2020) [paper] [arxiv] [code] [data]

  6. Martin Gerlach*, Hanyu Shi*, Luis A.N. Amaral (* equal contribution) A universal information theoretic approach to the identification of stopwords Nature Machine Intelligence (2019) [paper] [pdf (read-only)] [pdf] [data&code]

    Media Coverage: + Northwestern News

  7. Martin Gerlach, Eduardo G. Altmann Testing statistical laws in complex systems Physical Review Letters (2019) [paper] [arxiv] [data&code]

  8. Julia Poncela-Casasnovas, Martin Gerlach, Nathan Aguirre, Luis A.N. Amaral Large scale analysis of micro-level citation patterns reveals nuanced selection criteria Nature Human Behaviour (2019) [paper] [pdf] [data&code]

  9. Hanyu Shi, Martin Gerlach, Isabel Diersen, Doug Downey, Luis A.N. Amaral A new evaluation framework for topic modeling algorithms based on synthetic corpora AISTATS (2019), Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics [paper] [arxiv] [code]

  10. Thomas Stoeger, Martin Gerlach, Richard I. Morimoto, Luis A.N. Amaral Large-scale investigation of the reasons why potentially important genes are ignored PLOS Biology (2018) [paper] [data&code]

    Media Coverage: + Top 10% most cited papers in PLOS Biology + PLOS Biology Primer by Ian Dunham + New York Times by Carl Zimmer + The Atlantic by Ed Yong + Northwestern News (Video) + The Economist + Science Magazine + Nature (Daily Briefing) + F1000-prime

    Comments/Replies: + Reply to “Far away from the lamppost”, PLOS Biology (2019)

  11. Martin Gerlach, Beatrice Farb, William Revelle, Luis A.N. Amaral A robust data-driven approach identifies four personality types across four large data sets Nature Human Behaviour (2018) [paper] [pdf] [data&code]

    Media Coverage: + Northwestern News (Video) + Scientific American + Science Magazine + Time Magazine + Washington Post + Süddeutsche Zeitung (German)


  12. Martin Gerlach, Tiago P. Peixoto, Eduardo G. Altmann A network approach to topic models Science Advances (2018) [paper] [arxiv] [code: TopSBM]

    Media Coverage: + Northwestern Data Science Initiative + TechXplore

  13. Laercio Dias, Martin Gerlach, Joachim Scharloth, Eduardo G. Altmann Using text analysis to quantify the similarity and evolution of scientific disciplines Royal Society Open Science (2018) [paper] [arxiv]

  14. Eduardo G. Altmann, Laercio Dias, Martin Gerlach Generalized Entropies and the Similarity of Texts Journal of Statistical Mechanics (2017) [paper] [arxiv]

  15. Jorge C. Leitão, Jose M. Miotto, Martin Gerlach, Eduardo G. Altmann Is this scaling nonlinear? Royal Society Open Science (2016) [paper] [arxiv] [data&code]

  16. Martin Gerlach, Francesc Font-Clos, Eduardo G. Altmann Similarity of symbol frequency distributions with heavy tails Physical Review X (2016) [paper] [arxiv]

    Media Coverage: + APS Focus by Philip Ball + Physics Today + RIA Novosti (Russian)

  17. Eduardo G. Altmann, Martin Gerlach Statistical laws in linguistics In Creativity and Universality in Language (Springer, 2016) [paper] [arxiv]

  18. Martin Gerlach, Eduardo G. Altmann Scaling laws and fluctuations in the statistics of word frequencies New Journal of Physics (2014) [paper] [arxiv]

  19. Fakhteh Ghanbarnejad*, Martin Gerlach*, Jose M. Miotto, Eduardo G. Altmann (* equal contribution) Extracting information from S-curves of language change Journal of The Royal Society Interface (2014) [paper] [arxiv] [data&code]

    Media Coverage: + Spiegel Online (German)

  20. Martin Gerlach, Eduardo G. Altmann Stochastic Model for the Vocabulary Growth in Natural Languages Physical Review X (2013) [paper] [arxiv]

  21. Martin Gerlach, Sebastian Wüster, Jan-Michael Rost Kicking Electrons Journal of Physics B (2012) [paper] [arxiv]

    Media Coverage: + Selected for Highlights of 2012 by the editors of Journal of Physics B


  • Doctoral Thesis (2016) Universality and variability in the statistics of data with fat-tailed distributions: The case of word frequencies in natural languages presented to the Physics Department of Dresden University of Technology; produced at the Max Planck Institute for the Physics of Complex Systems [link] [pdf]

  • Diploma Thesis (2011) Hamiltonians dominanter Wechselwirkung presented to the Physics Department of Dresden University of Technology; produced at the Max Planck Institute for the Physics of Complex Systems

