We generate realistic 3D avatars from as few as five multi-view frames. Compared to previous approaches, which may require up to a million images, our approach is data-efficient and easy to train. Our experiments show that BlendFields achieves the best results across a variety of faces.
We extend neural 3D representations to allow intuitive and interpretable user control beyond novel-view rendering (i.e., camera control). To the best of our knowledge, we are the first to demonstrate novel-view and novel-attribute re-rendering of scenes from a single video.
We present a new method that learns Constructive Solid Geometry (CSG) operations that operate on Signed Distance Field representations of 2D/3D shapes. The method discovers these operations without any supervision.
We present a new deep learning classification model for American Sign Language fingerspelling recognition. Our approach relies on multiple image-phenomenon-based augmentation methods and a modern training methodology, which allowed the model to achieve new state-of-the-art results.