福模

Free Open-Source AI Model Downloads - Local AI Tools Resource Platform

Multimodal AI

PaLI Vision-Language Model - End-to-End Language-Image Understanding

PaLI is a vision-language model for end-to-end language-image understanding. It handles image classification, visual question answering, and image captioning with a single unified architecture and strong performance across these tasks.

Vision-Language, PaLI, End-to-End, Image Understanding

File Size

18.9 GB

Upload Date

2025-03-13

Downloads

10,200

Rating

4.6/5.0

Download Resources

By downloading this resource, you agree to our Terms of Service and Privacy Policy.

Recommended Related Resources

Flamingo Vision-Language Model - Few-Shot Visual Language Understanding

Flamingo is a vision-language model for few-shot visual language understanding. It combines image and text inputs to support multimodal tasks such as question answering and description generation, and it generalizes well to new tasks.

Vision-Language, Multimodal, Flamingo

72.6 GB | 2025-03-11

CoCa Multimodal Generative Model - Joint Image-Text Generation

CoCa is a multimodal generative model for joint image-text generation. It combines image encoding with text generation for efficient visual language understanding and generation, and is listed as suitable for content creation and image editing.

CoCa, Multimodal, Image-Text

8.7 GB | 2025-04-11

CLIP Multimodal AI Model - Image-Text Association Understanding Engine

CLIP is a multimodal AI model that learns associations between images and text. It matches image content to text descriptions and supports zero-shot transfer, which makes it suitable for tasks such as image retrieval and content moderation.

CLIP, Multimodal, Image Understanding

8.7 GB | 2024-12-30