How to Add Text to DALL-E 3 Images

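According to OpenAI's official blog, DALL-E still struggles to generate text in images:

You might be tempted to instruct DALL·E to generate text in an image by giving it a prompt such as "a blue sky with white clouds and the word 'hello' in skywriting".

However, this is not a reliable or effective way to create text. DALL·E is not currently designed to produce text; it is designed to generate realistic and artistic images from your keywords or phrases. At present it has no specific understanding of writing, labels, or any other common text, and it often produces distorted or unintelligible results.

In the author's own tests, other mainstream AI image generation platforms such as MidJourney and Stable Diffusion are likewise weak at generating text in images. Compared with those platforms, however, DALL-E 3 benefits from ChatGPT's understanding of text, which makes it somewhat easier to get text into an image.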

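Practical tips

Method: put [ the text you want ] in the prompt, and add explicit instruction words such as write, text, or on so that DALL-E 3 knows to treat the content of [ ] as text. Use simple words where possible and avoid long strings. Images generated this way have a noticeable failure rate, and not every image will show the text you want. If the first results are unsatisfactory, press the Reload button and try several times.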
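The bracket convention can be captured in a small helper. This is only a minimal sketch: the function name `build_text_prompt`, its parameters, and the `max_words` limit of 2 are illustrative assumptions, not part of any DALL-E 3 interface. It simply assembles a prompt string in the pattern described here.

```python
def build_text_prompt(text: str, surface: str, placement: str,
                      details: str, max_words: int = 2) -> str:
    """Wrap the desired text in [ ] and add an explicit "written on it" cue.

    max_words is an arbitrary guard (an assumption) reflecting the advice
    to keep the text short and simple.
    """
    cleaned = text.strip()
    words = cleaned.split()
    if not words:
        raise ValueError("text must not be empty")
    if len(words) > max_words:
        raise ValueError(f"text has {len(words)} words; keep it to {max_words} or fewer")
    return f"Make {surface} with [{cleaned}] written on it {placement}. {details}"


prompt = build_text_prompt(
    "LEGGIE",
    "an advertising board",
    "and place it on the video wall in Times Square",
    "Night sky, moonlight, people walking around, 16:9 format.",
)
print(prompt)
```

The resulting string is pasted into the chat as an ordinary prompt; nothing in the helper guarantees that the rendered text will be correct.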

Suppose we want to make a "LEGGIE advertising board" displayed on the big screen in Times Square. We start by giving DALL-E 3 this prompt:

Make an advertising board with [LEGGIE] written on it and place it on the video wall in Times Square. Night sky, moonlight, people walking around, 16:9 format.

Output

Here are two images featuring the advertising board with 'LEGGIE' in Times Square. Each captures the vibrant night scene with people walking around under the moonlit sky. Let me know if you need any adjustments!

Image 1

prompt: An advertising board prominently displaying the word ‘LEGGIE’ in bold, modern font, placed on the video wall in Times Square. It’s nighttime with a clear sky and bright moonlight casting a soft glow over the scene. There are people of various descents and genders casually walking around, some pausing to look at the advertisement. The bustling urban atmosphere of Times Square is captured with bright neon lights, billboards, and the iconic New York City vibe. The image should be in a 16:9 format, emphasizing the vibrant and lively environment.

Image 2

prompt: An advertising board prominently displaying the word ‘LEGGIE’ in bold, modern font, placed on the video wall in Times Square. It’s nighttime with a clear sky and bright moonlight casting a soft glow over the scene. There are people of various descents and genders casually walking around, some pausing to look at the advertisement. The bustling urban atmosphere of Times Square is captured with bright neon lights, billboards, and the iconic New York City vibe. The image should be in a 16:9 format, emphasizing the vibrant and lively environment.

Of the two images in the first output, the first correctly shows the word "LEGGIE", while the second contains an extra "i".
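A misspelling like the extra "i" is the kind of failure that a retry loop is meant to absorb. The article does this by hand with the Reload button; the sketch below shows the same idea in code. Both `generate` and `read_text` are hypothetical callables supplied by the caller (for example an image-generation call and an OCR or manual check). They are assumptions for illustration and do not correspond to any specific library.

```python
from typing import Callable, Optional, Tuple


def text_matches(detected: str, target: str) -> bool:
    """Exact, case-sensitive comparison after trimming surrounding whitespace."""
    return detected.strip() == target.strip()


def generate_until_match(
    prompt: str,
    target: str,
    generate: Callable[[str], object],
    read_text: Callable[[object], str],
    max_attempts: int = 5,
) -> Tuple[Optional[object], int]:
    """Regenerate until the text read from the image equals `target`.

    Returns (image, attempts_used); image is None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        image = generate(prompt)  # hypothetical: produces one image per call
        if text_matches(read_text(image), target):  # hypothetical text check
            return image, attempt
    return None, max_attempts


# Stand-ins that replay the two results described in the article.
fake_results = iter(["LEGGIIE", "LEGGIE"])
image, attempts = generate_until_match(
    "any prompt", "LEGGIE",
    generate=lambda p: next(fake_results),
    read_text=lambda img: img,
)
```

With these stand-ins, the first result is rejected and the second is accepted. A cap such as `max_attempts` keeps the loop from running indefinitely when the text never comes out right.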

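This first run reveals two things:

  1. GPT-4V converts the [ ] we typed into ‘ ’ before passing the prompt to DALL-E 3.
  2. The text we specified is correct, but almost all of the text in the image that we did not specify is wrong.

The author regards the first point as a translation step in how GPT-4V communicates with DALL-E 3. Having compared the results of marking the output text with [ ] and with ‘ ’, the author finds that [ ] is still more likely to make DALL-E 3 comply, with a considerably higher success rate.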
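To see how a given prompt marks the target text, a plain string check is enough. The sketch below is an illustrative helper (the name `delimiter_for` and its return labels are assumptions) that reports whether the target appears in brackets, in quotes, unmarked, or not at all.

```python
def delimiter_for(prompt: str, target: str) -> str:
    """Report how `target` is marked in `prompt`.

    Returns "brackets", "quotes", "bare" (present but unmarked) or "missing".
    """
    if f"[{target}]" in prompt:
        return "brackets"
    quote_pairs = (("‘", "’"), ("'", "'"), ("“", "”"), ('"', '"'))
    for opening, closing in quote_pairs:
        if f"{opening}{target}{closing}" in prompt:
            return "quotes"
    return "bare" if target in prompt else "missing"


user_prompt = "Make an advertising board with [LEGGIE] written on it on a high mountain."
revised_prompt = "An advertising board with the word ‘LEGGIE’ prominently displayed."

print(delimiter_for(user_prompt, "LEGGIE"))
print(delimiter_for(revised_prompt, "LEGGIE"))
```

Run on the two prompts in this article, the helper labels the typed prompt as bracketed and the rewritten one as quoted, which matches the first observation.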

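The second point has no fix for now. Within the same scene, the only options are to have DALL-E 3 produce an image in which the other text is less conspicuous, or to change the description of the background. For example, the prompt can be revised like this: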

Make an advertising board with [LEGGIE] written on it on a high mountain. Sun, sunshine, seagulls, palm trees. 16:9 format.

Output

Here are the images featuring the 'LEGGIE' advertising board on a high mountain, surrounded by sunshine, seagulls, and palm trees. Each image is presented in a 16:9 format. Let me know if you need any further adjustments!

Image 3

prompt: An advertising board with the word ‘LEGGIE’ prominently displayed, situated on a high mountain. The scene is bathed in sunshine, with a clear blue sky and the sun shining brightly. Around the board, there are a few palm trees swaying gently in the breeze, adding a tropical feel to the mountain landscape. Seagulls are flying in the sky, enhancing the serene and beautiful outdoor setting. The image should be in a 16:9 format, capturing the contrast between the rugged mountain and the serene, sunny atmosphere with palm trees and seagulls.

Image 4

prompt: An advertising board with the word ‘LEGGIE’ prominently displayed, situated on a high mountain. The scene is bathed in sunshine, with a clear blue sky and the sun shining brightly. Around the board, there are a few palm trees swaying gently in the breeze, adding a tropical feel to the mountain landscape. Seagulls are flying in the sky, enhancing the serene and beautiful outdoor setting. The image should be in a 16:9 format, capturing the contrast between the rugged mountain and the serene, sunny atmosphere with palm trees and seagulls.

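Switching to a suitable background makes the text stand out more and reduces the chance of erroneous text appearing elsewhere in the image.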
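One rough way to compare candidate backgrounds before generating anything is to count words that tend to bring incidental signage into a scene. The keyword list below is purely an assumption made for illustration. It is not derived from DALL-E 3 itself, and the score is a heuristic, not a prediction.

```python
# Hypothetical hints: scene words that usually imply extra signs or screens.
SIGNAGE_HINTS = ("times square", "billboard", "neon", "street", "shop", "video wall")


def stray_text_risk(scene: str) -> int:
    """Count how many signage hints appear in a scene description."""
    lowered = scene.lower()
    return sum(1 for hint in SIGNAGE_HINTS if hint in lowered)


def pick_quietest_scene(scenes: list) -> str:
    """Return the scene with the fewest signage hints (first one wins ties)."""
    return min(scenes, key=stray_text_risk)


scenes = [
    "on the video wall in Times Square. Night sky, moonlight, people walking around",
    "on a high mountain. Sun, sunshine, seagulls, palm trees",
]
print(pick_quietest_scene(scenes))
```

For the two scenes used in this article, the mountain description contains none of the hint words, so it is the one selected.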

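A recommended GPT built for DALL-E 3 text images

If the main purpose of your image is to highlight text (such as a brand name or a specific advertising slogan), you can also try D3 Image Generator with Perfect Text GPT, developed by the Leggie team, which can save you time when designing this kind of image.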

