BIBLIOGRAPHY
[36] C. Kervadec, G. Antipov, M. Baccouche, and C. Wolf, “Roses are red, violets
are blue... but should VQA expect them to?” in Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition (CVPR), June 2021,
pp. 2776–2785.
[37] R. Cadene, C. Dancette, H. Ben-Younes, M. Cord, and D. Parikh, “RUBi:
Reducing unimodal biases for visual question answering,” in Advances in Neural
Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer,
F. d'Alché-Buc, E. Fox, and R. Garnett, Eds., vol. 32. Curran Associates,
Inc., 2019. [Online]. Available:
https://proceedings.neurips.cc/paper/2019/file/51d92be1c60d1db1d2e5e7a07da55b26-Paper.pdf
[38] Y. Lu, Q. Wang, S. Ma, T. Geng, Y. V. Chen, H. Chen, and D. Liu, “TransFlow:
Transformer as flow learner,” in Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition (CVPR), 2023, pp. 18063–18073.
[39] Q. Wang, L. Yang, X. Quan, F. Feng, D. Liu, Z. Xu, S. Wang, and H. Ma,
“Learning to generate question by asking question: A primal-dual approach with
uncommon word generation,” in Proceedings of the 2022 Conference on Empirical
Methods in Natural Language Processing (EMNLP), 2022, pp. 46–61.
[40] Z. Cao, Z. Chu, D. Liu, and Y. Chen, “A vector-based representation to enhance
head pose estimation,” in Proceedings of the IEEE/CVF Winter Conference on
Applications of Computer Vision (WACV), 2021, pp. 1188–1197.
[41] Z. Cheng, J. Liang, H. Choi, G. Tao, Z. Cao, D. Liu, and X. Zhang, “Physical
attack on monocular depth estimation with optimal adversarial patches,” in
European Conference on Computer Vision (ECCV). Springer, 2022, pp. 514–532.
[42] Q. Wang, Y. Fang, A. Ravula, F. Feng, X. Quan, and D. Liu, “WebFormer: The
web-page transformer for structure information extraction,” in Proceedings of the
ACM Web Conference 2022, 2022, pp. 3124–3133.
[43] D. Liu, J. Liang, T. Geng, A. Loui, and T. Zhou, “Tripartite feature enhanced
pyramid network for dense prediction,” IEEE Transactions on Image Processing,
2023.
[44] Z. Cheng, J. Liang, G. Tao, D. Liu, and X. Zhang, “Adversarial training of self-
supervised monocular depth estimation against physical-world attacks,” in
International Conference on Learning Representations (ICLR), 2023.
[45] C. Han, Q. Wang, Y. Cui, Z. Cao, W. Wang, S. Qi, and D. Liu, “E^2VPT: An
effective and efficient approach for visual prompt tuning,” in Proceedings of the
IEEE/CVF International Conference on Computer Vision (ICCV), 2023.
[46] Z. Cao, D. Liu, Q. Wang, and Y. Chen, “Towards unbiased label distribution
learning for facial pose estimation using anisotropic spherical Gaussian,” in
European Conference on Computer Vision (ECCV). Springer, 2022, pp. 737–753.