Vita: An efficient video-to-text algorithm using VLM for RAG-based video analysis system