Kenneth Wong (Wenbin Wang, 王文彬) is currently a senior researcher at Huawei Noah’s Ark Lab. He obtained his Ph.D. degree from the Key Laboratory of Intelligent Information Processing (IIP), Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS), advised by Prof. Xilin Chen and Prof. Ruiping Wang. His research interests include, but are not limited to, 2D/3D scene understanding, object detection, scene graph generation, and image captioning. Before that, he received his B.Eng. degree in Computer Science and Technology from Nankai University (NKU, 2013 - 2017).
“Those times when you get up early and you work hard; those times when you stay up late and you work hard; those times when you don’t feel like working, you’re too tired, you don’t want to push yourself, but you do it anyway. That is actually the dream. That’s the dream.”
Ph.D. in Computer Vision / Artificial Intelligence, 2017 - 2022
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
B.Eng. in Computer Science and Technology, 2013 - 2017
Nankai University, Tianjin, China
[May 9, 2023]: One paper was accepted to the International Journal of Computer Vision (IJCV).
[Sept. 26, 2022]: I joined Huawei Noah’s Ark Lab.
[Sept. 18, 2022]: I obtained my Ph.D. degree from ICT, CAS.
[Jul. 19, 2022]: One paper was accepted to SCIENTIA SINICA Informationis.
[Jul. 23, 2021]: One paper was accepted to ICCV 2021!
[Aug. 10, 2020]: The code for our HetH (ECCV 2020) was released.
[Jul. 3, 2020]: One paper was accepted to ECCV 2020!
[Jun. 15-19, 2019]: I attended CVPR 2019 in Long Beach, CA, USA.
[Mar. 1, 2019]: One paper was accepted to CVPR 2019!
[Sept. 1, 2017]: I joined the Key Lab. of IIP, ICT, CAS as a Ph.D. student. Go Bears!
Scene graph aims to faithfully reveal humans’ perception of image content. When humans look at a scene, they usually attend to the parts that interest them in a particular order of priority. This innate habit indicates a hierarchical preference in human perception. Therefore, we propose to generate a Scene Graph of Interest that is constructed hierarchically, so that the important primary content is presented first while secondary content is presented on demand. To achieve this goal, we propose the Tree-Guided Importance Ranking (TGIR) model. We represent the scene with a hierarchical structure by first detecting objects in the scene and then organizing them into a Hierarchical Entity Tree (HET) according to their spatial scale, since larger objects are more likely to be noticed instantly.
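To make the idea of organizing detections by spatial scale more concrete, here is a toy sketch. It is not the TGIR implementation: the `(label, box)` input format, the intersection-over-area containment rule, the 0.5 threshold, and all names (`build_entity_tree`, `Node`, `thresh`) are hypothetical choices made for illustration only. Objects are visited from largest to smallest, and each one is attached under the smallest larger object that mostly covers it, or under the image root otherwise.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # hypothetical format: (x1, y1, x2, y2)


@dataclass
class Node:
    label: str
    box: Box
    children: List["Node"] = field(default_factory=list)


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection(a: Box, b: Box) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def build_entity_tree(detections: List[Tuple[str, Box]], image_box: Box,
                      thresh: float = 0.5) -> Node:
    """Toy hierarchy: larger objects become ancestors of smaller ones they cover."""
    root = Node("image", image_box)
    placed: List[Node] = []
    # Visit objects from largest to smallest (larger objects are placed higher).
    for label, box in sorted(detections, key=lambda d: area(d[1]), reverse=True):
        node = Node(label, box)
        a = area(box)
        best = None
        for cand in placed:
            # Assumed containment rule: the candidate covers >= thresh of this box.
            if a > 0 and intersection(cand.box, box) / a >= thresh:
                # Prefer the tightest (smallest) covering object as the parent.
                if best is None or area(cand.box) < area(best.box):
                    best = cand
        parent = best if best is not None else root
        parent.children.append(node)
        placed.append(node)
    return root


if __name__ == "__main__":
    dets = [("cup", (10, 10, 20, 20)), ("table", (0, 0, 100, 50)),
            ("person", (60, 20, 90, 100))]
    tree = build_entity_tree(dets, (0, 0, 100, 100))
    print([c.label for c in tree.children])              # ['table', 'person']
    print([c.label for c in tree.children[0].children])  # ['cup']
```

In this toy example the cup lies entirely on the table and becomes its child, while the person only partly overlaps the table and stays directly under the image root. The actual HET construction and the importance ranking built on top of it are described in the TGIR paper.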
Scene graph generation has been a popular topic in recent years. However, it suffers from the bias introduced by the long-tailed distribution of relationships. Scene graph generators tend to predict head predicates, which are ambiguous and imprecise. As a result, the scene graph conveys less information and degenerates into a mere collection of objects, which limits other applications that reason over the graph. To make the generator predict more diverse relationships and produce a precise scene graph, we propose a method called Additional Biased Predictor (ABP) assisted balanced learning.
If an image tells a story, the scene graph and the image caption are its most popular narrators. Generally, a scene graph acts as an omniscient generalist, whereas an image caption acts as a specialist that outlines the gist. Many previous studies have found that, as a generalist, a scene graph is insufficient for advanced downstream tasks unless its trivial content and noise are reduced. In this respect, the image caption is a good teacher. To this end, we let the scene graph borrow this ability from the image caption.
Scene graph aims to faithfully reveal humans’ perception of image content. When humans analyze a scene, they usually prefer to describe the gist of the image first, namely the major objects and key relations, which carry the essential image content. This inherent perceptual habit implies a hierarchical structure in human preference during scene parsing. Therefore, we argue that a desirable scene graph should also be constructed hierarchically, and we introduce a new scheme for modeling scene graphs.
Relationships are the core of a scene graph, but their prediction is far from satisfactory because of their complex visual diversity. To alleviate this problem, we treat a relationship as an abstract object and explore not only its distinctive visual patterns but also its contextual information, the two key aspects of object recognition. Our observations on current datasets reveal that relationships are closely associated with one another. Therefore, inspired by the successful use of context in object-oriented tasks, we construct a context for relationships in which all of them are gathered, so that their recognition can benefit from these associations.
Python, C/C++
Scikit-Learn, Pandas, NumPy, Matplotlib
Common Machine Learning Models
PyTorch, TensorFlow, Git, VS Code, Jupyter