Confusion about the cross-attention module

#9
by tibetgao - opened

Hi there,
According to your tech report, there is a position-aware vision-language adapter, which comprises a single-layer cross-attention module. However, when reading through your code I can't find this module; I see only a concatenation of the visual embedding and the hidden state. Would you kindly point it out?

Best regards

It is in the visual.py file, under the class Resampler.
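
For reference, here is a minimal sketch of what such a resampler typically looks like: a fixed set of learnable queries cross-attends once to the (position-aware) visual features, so the cross-attention is inside the module rather than a separate layer. All names and dimensions below (num_queries, kv_dim, the pos_embed argument, etc.) are illustrative assumptions, not the repository's actual values.

```python
import torch
import torch.nn as nn

class Resampler(nn.Module):
    """Sketch of a position-aware resampler: learnable queries
    cross-attend to visual features in a single attention layer."""

    def __init__(self, num_queries=256, embed_dim=4096, num_heads=32, kv_dim=1664):
        super().__init__()
        # Learnable query embeddings; their count fixes the output length.
        self.query = nn.Parameter(torch.randn(num_queries, embed_dim) * embed_dim ** -0.5)
        # Project visual features into the query dimension if they differ.
        self.kv_proj = nn.Linear(kv_dim, embed_dim, bias=False)
        # The single-layer cross-attention itself.
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.ln_q = nn.LayerNorm(embed_dim)
        self.ln_kv = nn.LayerNorm(embed_dim)

    def forward(self, x, pos_embed=None):
        # x: (batch, num_patches, kv_dim) visual features from the ViT.
        x = self.ln_kv(self.kv_proj(x))
        if pos_embed is not None:
            # Adding positional embeddings to the keys/values is what
            # makes the adapter "position-aware" (assumed placement).
            x = x + pos_embed
        q = self.ln_q(self.query).unsqueeze(0).expand(x.size(0), -1, -1)
        # Queries attend to the visual sequence; output length = num_queries.
        out, _ = self.attn(q, x, x, need_weights=False)
        return out  # (batch, num_queries, embed_dim)
```

So the concatenation you saw is only the last step: the resampler's output (a fixed number of query tokens) is what gets concatenated with the text hidden states, after the cross-attention has already compressed the visual sequence.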
