In a recent effort, the PRIOR team created VisProg, a computer vision system that solves complex compositional tasks described in natural language by generating and executing programs. Each line of a generated program may invoke one of several off-the-shelf computer vision models, image processing routines, or Python functions, producing intermediate outputs that subsequent lines can consume. In other words, VisProg solves tasks by writing code!
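
To make the "one module call per line" idea concrete, here is a minimal sketch of how such a program could be executed step by step. It is not VisProg's actual implementation: the program syntax, the module names `LOC` and `COUNT`, the `execute` helper, and the toy "image" (a list of labeled boxes standing in for real pixels and a real detector) are all assumptions made for illustration.

```python
import re

# Hypothetical stand-ins for off-the-shelf vision models and Python routines.
def loc(image, label):
    """Toy 'detector': return the boxes whose label matches."""
    return [box for name, box in image if name == label]

def count(boxes):
    """Plain Python function that counts detections."""
    return len(boxes)

# Registry mapping module names used in programs to callables (assumed names).
MODULES = {"LOC": loc, "COUNT": count}

# One program step has the form OUTPUT=MODULE(key=value, ...).
STEP = re.compile(r"(\w+)\s*=\s*(\w+)\((.*)\)")

def execute(program, state):
    """Run each line in order, storing every intermediate output in `state`."""
    for line in program.strip().splitlines():
        out, module, arg_str = STEP.fullmatch(line.strip()).groups()
        kwargs = {}
        for pair in filter(None, (p.strip() for p in arg_str.split(","))):
            key, value = (part.strip() for part in pair.split("="))
            # A bare name refers to an earlier output; anything else is a literal.
            kwargs[key] = state[value] if value in state else value.strip("'\"")
        state[out] = MODULES[module](**kwargs)
    return state

# A program of the kind a language model might generate for
# "How many dogs are in the image?" (hand-written here).
PROGRAM = """
BOXES=LOC(image=IMAGE, label='dog')
ANSWER=COUNT(boxes=BOXES)
"""

scene = [("dog", (0, 0, 10, 10)), ("cat", (5, 5, 20, 20)), ("dog", (30, 30, 50, 50))]
result = execute(PROGRAM, {"IMAGE": scene})
print(result["ANSWER"])  # 2
```

The `state` dictionary is what lets a later line consume an earlier line's output: `BOXES` is written by the first step and read by the second.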