Upload model

- README.md +26 -60
- config.json +7 -3
- tf_model.h5 +3 -0

README.md
CHANGED

@@ -1,81 +1,47 @@
 ---
-widget:
-- src: http://images.cocodataset.org/val2017/000000039769.jpg
-  candidate_labels: 고양이, 강아지, 토끼
-  example_title: cat and remote
-language: ko
 license: mit
 ---

-Korean CLIP model, trained with the approach from [Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation](https://arxiv.org/abs/2004.09813).
-
-Training code: <https://github.com/Bing-su/KoCLIP_training_code>
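
The cited paper's setup uses a frozen teacher text encoder and trains a student so that its embeddings of an English sentence and of its Korean translation both match the teacher's embedding of the English sentence. A minimal sketch under those assumptions follows; the model names, toy sentence pair, and loss wiring are placeholders, not the linked training code.

```python
# Illustrative sketch of the distillation objective from the cited paper.
# A frozen English CLIP text encoder is the teacher; the student's embeddings of the
# English sentence and its Korean translation are pulled toward the teacher's embedding
# via an MSE loss. All names and data below are placeholder assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoProcessor, AutoTokenizer

teacher = AutoModel.from_pretrained("openai/clip-vit-base-patch32").eval()
student = AutoModel.from_pretrained("Bingsu/clip-vit-base-patch32-ko")  # stand-in for the model being trained
teacher_proc = AutoProcessor.from_pretrained("openai/clip-vit-base-patch32")
student_tok = AutoTokenizer.from_pretrained("Bingsu/clip-vit-base-patch32-ko")

english = ["two cats lying on a pink sofa"]
korean = ["분홍색 소파에 누워 있는 고양이 두 마리"]

with torch.no_grad():
    # Teacher target: embedding of the English sentence only.
    target = teacher.get_text_features(
        **teacher_proc(text=english, return_tensors="pt", padding=True)
    )

# Student predictions for both the English sentence and its Korean translation.
pred = student.get_text_features(
    **student_tok(english + korean, return_tensors="pt", padding=True)
)

# Both student embeddings are trained toward the same teacher target.
loss = F.mse_loss(pred, target.repeat(2, 1))
loss.backward()
```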
-
-## How to Use
-
-```python
-import requests
-import torch
-from PIL import Image
-from transformers import AutoModel, AutoProcessor
-
-model = AutoModel.from_pretrained(repo)
-processor = AutoProcessor.from_pretrained(repo)
-
-image = Image.open(requests.get(url, stream=True).raw)
-inputs = processor(text=["고양이 두 마리", "개 두 마리"], images=image, return_tensors="pt", padding=True)
-with torch.inference_mode():
-    outputs = model(**inputs)
-logits_per_image = outputs.logits_per_image
-probs = logits_per_image.softmax(dim=1)
-```
-
-```
->>> probs
-tensor([[0.9926, 0.0074]])
-```
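
The removed example references `repo` and `url`, but the lines defining them did not survive this view. A self-contained sketch, assuming `repo` is this model's id and `url` is the widget image from the old front matter:

```python
# Minimal, self-contained version of the removed "How to Use" example.
# Assumptions: repo is this repository's id, url is the widget image from the old card.
import requests
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

repo = "Bingsu/clip-vit-base-patch32-ko"                        # assumed from the model name
url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # assumed from the widget src

model = AutoModel.from_pretrained(repo)
processor = AutoProcessor.from_pretrained(repo)

image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(text=["고양이 두 마리", "개 두 마리"], images=image,
                   return_tensors="pt", padding=True)

with torch.inference_mode():
    outputs = model(**inputs)

# Image-text similarity logits, normalized to probabilities over the candidate texts.
probs = outputs.logits_per_image.softmax(dim=1)
print(probs)  # e.g. tensor([[0.9926, 0.0074]]) per the card
```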

-```python
-from transformers import pipeline
-
-pipe = pipeline("zero-shot-image-classification", model=repo)
-```
-
-```
->>> result
-[{'score': 0.9456236958503723, 'label': '분홍색 소파에 드러누운 고양이 친구들'},
- {'score': 0.05315302312374115, 'label': '고양이 두 마리'},
- {'score': 0.0012233294546604156, 'label': '고양이 한 마리'}]
-```
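
The call that produced `result` is likewise missing from this view. A plausible invocation, with the image URL and candidate labels assumed from the widget metadata and the printed output, might look like:

```python
# Hypothetical invocation reconstructing the printed `result`.
# The URL and candidate labels are assumptions, not recovered lines from the original card.
from transformers import pipeline

repo = "Bingsu/clip-vit-base-patch32-ko"  # assumed model id
pipe = pipeline("zero-shot-image-classification", model=repo)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # assumed image
result = pipe(
    url,
    candidate_labels=[
        "분홍색 소파에 드러누운 고양이 친구들",
        "고양이 두 마리",
        "고양이 한 마리",
    ],
)
print(result)  # list of {'score': ..., 'label': ...} dicts, highest score first
```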

-## Tokenizer
-
-The tokenizer was trained from the original CLIP tokenizer via `.train_new_from_iterator`, using a mix of Korean and English data in a 7:3 ratio.
-
-```python
-# text_embeds.shape = [batch_size, sequence_length, transformer.width]
-# take features from the eot embedding (eot_token is the highest number in each sequence)
-# casting to torch.int for onnx compatibility: argmax doesn't support int64 inputs with opset 14
-pooled_output = last_hidden_state[
-    torch.arange(last_hidden_state.shape[0]), input_ids.to(torch.int).argmax(dim=-1)
-]
-```
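
A minimal sketch of how such a tokenizer could be derived with `train_new_from_iterator`; the corpus variables, batching, and mixing logic are illustrative assumptions, not the linked training code.

```python
# Illustrative sketch only: train a new tokenizer from the original CLIP tokenizer
# on a 7:3 Korean:English text mixture. Corpus loading and batching are assumptions.
from transformers import AutoTokenizer

base = AutoTokenizer.from_pretrained("openai/clip-vit-base-patch32")

korean_texts = ["고양이 두 마리가 소파에 앉아 있다."]   # placeholder corpus
english_texts = ["Two cats are sitting on a sofa."]      # placeholder corpus

def corpus_iterator(batch_size=1000):
    # Yield batches drawn from the 7:3 Korean/English mixture.
    mixed = korean_texts * 7 + english_texts * 3
    for i in range(0, len(mixed), batch_size):
        yield mixed[i:i + batch_size]

# train_new_from_iterator keeps the original tokenizer's algorithm and special tokens
# but learns a new vocabulary from the provided text iterator.
new_tokenizer = base.train_new_from_iterator(corpus_iterator(), vocab_size=49408)
new_tokenizer.save_pretrained("clip-vit-base-patch32-ko-tokenizer")
```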

 ---
 license: mit
+tags:
+- generated_from_keras_callback
+model-index:
+- name: clip-vit-base-patch32-ko
+  results: []
 ---

+<!-- This model card has been generated automatically according to the information Keras had access to. You should
+probably proofread and complete it, then remove this comment. -->

+# clip-vit-base-patch32-ko

+This model is a fine-tuned version of [Bingsu/clip-vit-base-patch32-ko](https://huggingface.co/Bingsu/clip-vit-base-patch32-ko) on an unknown dataset.
+It achieves the following results on the evaluation set:

+## Model description

+More information needed

+## Intended uses & limitations

+More information needed

+## Training and evaluation data

+More information needed

+## Training procedure

+### Training hyperparameters

+The following hyperparameters were used during training:
+- optimizer: None
+- training_precision: float32

+### Training results

+### Framework versions

+- Transformers 4.23.1
+- TensorFlow 2.9.2
+- Tokenizers 0.13.1

config.json
CHANGED

@@ -1,5 +1,5 @@
 {
-  "_commit_hash":
+  "_commit_hash": "6f381bab5397bf31910ecd753491b53c84383811",
   "_name_or_path": "Bingsu/clip-vit-base-patch32-ko",
   "architectures": [
     "CLIPModel"
@@ -14,6 +14,7 @@
     "architectures": null,
     "attention_dropout": 0.0,
     "bad_words_ids": null,
+    "begin_suppress_tokens": null,
     "bos_token_id": 0,
     "chunk_size_feed_forward": 0,
     "cross_attention_hidden_size": null,
@@ -67,6 +68,7 @@
     "return_dict": true,
     "return_dict_in_generate": false,
     "sep_token_id": null,
+    "suppress_tokens": null,
     "task_specific_params": null,
     "temperature": 1.0,
     "tf_legacy_loss": false,
@@ -77,7 +79,7 @@
     "top_p": 1.0,
     "torch_dtype": null,
     "torchscript": false,
-    "transformers_version": "4.
+    "transformers_version": "4.23.1",
     "typical_p": 1.0,
     "use_bfloat16": false,
     "vocab_size": 49408
@@ -91,6 +93,7 @@
     "architectures": null,
     "attention_dropout": 0.0,
     "bad_words_ids": null,
+    "begin_suppress_tokens": null,
     "bos_token_id": null,
     "chunk_size_feed_forward": 0,
     "cross_attention_hidden_size": null,
@@ -146,6 +149,7 @@
     "return_dict": true,
     "return_dict_in_generate": false,
     "sep_token_id": null,
+    "suppress_tokens": null,
     "task_specific_params": null,
     "temperature": 1.0,
     "tf_legacy_loss": false,
@@ -156,7 +160,7 @@
     "top_p": 1.0,
     "torch_dtype": null,
     "torchscript": false,
-    "transformers_version": "4.
+    "transformers_version": "4.23.1",
     "typical_p": 1.0,
     "use_bfloat16": false
   },
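
The added `begin_suppress_tokens` and `suppress_tokens` keys, together with the bumped `transformers_version`, appear to come from re-saving the config under a newer transformers release rather than from a semantic change. A minimal way to reproduce that kind of re-serialization (output path is an arbitrary assumption):

```python
# Sketch: re-serializing a config with a newer transformers release writes out any
# newly introduced generation-related keys (e.g. suppress_tokens) with default values.
from transformers import CLIPConfig

config = CLIPConfig.from_pretrained("Bingsu/clip-vit-base-patch32-ko")
config.save_pretrained("./clip-vit-base-patch32-ko")  # writes an updated config.json
```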

tf_model.h5
ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ea376ac0b923856e999412382f09b8aab4401a99d6ceabd2cba7ac2d1b75ddd1
+size 605559544
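
Since this commit's main payload is the TensorFlow weight file, a brief sketch of loading it; the class choice follows the `CLIPModel` architecture declared in config.json and is an assumption, not an official usage snippet from the repository.

```python
# Sketch: load the newly uploaded TensorFlow weights (tf_model.h5).
# TFCLIPModel mirrors the "CLIPModel" architecture listed in config.json.
from transformers import TFCLIPModel, AutoProcessor

model = TFCLIPModel.from_pretrained("Bingsu/clip-vit-base-patch32-ko")
processor = AutoProcessor.from_pretrained("Bingsu/clip-vit-base-patch32-ko")
```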