The document proposes TxV, a method for retrieving videos from natural-language text queries. TxV combines textual and visual features extracted by pretrained models such as BERT, CLIP, and ResNet through an identity layer, and applies a Dual Softmax Inference technique that revises the initial text-video similarities using a set of background queries. Experiments on the MSR-VTT, IACC.3, and V3C1 datasets show that TxV outperforms state-of-the-art methods on metrics such as mean extended inferred average precision (xinfAP) and recall. The approach provides an efficient way to combine multiple features for the text-video retrieval task.
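The summary does not spell out TxV's exact Dual Softmax Inference formulation (in particular how the background queries enter it), but the general dual-softmax idea is to re-rank a query-by-video similarity matrix by normalizing it along both axes, so a video is rewarded only when it is a mutual best match for a query. A minimal sketch of one common variant, with a hypothetical `temp` temperature parameter, could look like this:

```python
import numpy as np


def dual_softmax_rerank(sim: np.ndarray, temp: float = 10.0) -> np.ndarray:
    """Re-rank a (num_queries, num_videos) similarity matrix with dual softmax.

    The raw scores are softmax-normalized along the video axis (per query)
    and along the query axis (per video); the elementwise product sharpens
    scores for pairs that rank each other highly in both directions.
    """
    scaled = temp * sim
    # Softmax over videos for each query (rows), with max-shift for stability.
    q2v = np.exp(scaled - scaled.max(axis=1, keepdims=True))
    q2v /= q2v.sum(axis=1, keepdims=True)
    # Softmax over queries for each video (columns).
    v2q = np.exp(scaled - scaled.max(axis=0, keepdims=True))
    v2q /= v2q.sum(axis=0, keepdims=True)
    return q2v * v2q
```

In a setting like the paper's, the extra background queries would be appended as additional rows of `sim` before re-ranking, so the column-wise softmax normalizes each video's score against them as well; that detail is an assumption here, not taken from the summary.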