2021

  1. LayoutReader: Pre-training of Text and Layout for Reading Order Detection. Zilong Wang, Yiheng Xu, Lei Cui, Jingbo Shang, and Furu Wei. EMNLP 2021. [arXiv] [Code]
  2. LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding. Yang Xu*, Yiheng Xu*, Tengchao Lv*, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, and Lidong Zhou. ACL 2021. [arXiv] [Code] [Blog]

2020

  1. DocBank: A Benchmark Dataset for Document Layout Analysis. Minghao Li*, Yiheng Xu*, Lei Cui, Shaohan Huang, Furu Wei, Zhoujun Li, and Ming Zhou. COLING 2020. [arXiv] [Code] [Data] [Blog]
  2. LayoutLM: Pre-training of Text and Layout for Document Image Understanding. Yiheng Xu*, Minghao Li*, Lei Cui, Shaohan Huang, Furu Wei, and Ming Zhou. KDD 2020. [arXiv] [Slides] [Code] [Blog]
  3. Graph Convolutional Networks with Markov Random Field Reasoning for Social Spammer Detection. Yongji Wu, Defu Lian, Yiheng Xu, Le Wu, and Enhong Chen. AAAI 2020. [PDF]

Preprints

2021

  1. LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding. Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, and Furu Wei. [arXiv] [Code]