
fvayas (Explorer)

Enable multimodal AI in ManyChat: understand voice notes and images

Summary: I propose that ManyChat’s integrated AI be able to understand audio (transcribe it and extract intent) and images (OCR and context recognition). Users increasingly prefer sending voice notes and images; letting ManyChat process them natively would reduce friction and enable new support and sales flows.

Problem: Many users send voice messages and images because it is more convenient, but bots currently ask them to type or repeat the information. This causes delays and extra work for agents.

Proposal:
- Process voice messages to produce a transcription and the user’s main intent.
- Add basic image recognition: read text in photos (tickets, receipts) and detect the image type (product / receipt / ID).
- Let flows combine voice, image, and text to make decisions, for example detecting a complaint and automatically creating a ticket (a rough sketch of this decision logic follows below).

Use cases:
- Support: a customer sends a product photo and says by voice “it arrived broken” → the bot identifies the order and creates a refund proposal or a ticket.
- Commerce: a user sends a photo of a product and asks the price by voice → the bot replies with options and a purchase button.
- Pre-human assistance: an automatic summary of the audio and photo so the agent sees the essentials before replying.
- Accessibility: people who have trouble typing can use voice and images to complete forms.

Closing / Request: Please consider prioritizing multimodal capabilities (audio and image processing) in ManyChat’s AI. I can provide concrete flow examples if the product team wants them.
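
To make the idea concrete, here is a minimal sketch of the kind of decision logic a flow could run once transcription and image analysis are available. The transcribe_audio, analyze_image, and create_ticket helpers are hypothetical placeholders (not existing ManyChat APIs); the point is only to show how the combined voice + image signals could drive the “complaint → ticket” use case above.

```python
# Hypothetical sketch: combine a voice note and a photo to decide whether to
# open a support ticket. All helper functions are placeholders standing in for
# whatever speech-to-text, OCR/vision, and ticketing integrations ManyChat
# would expose; they are NOT real ManyChat features.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class MultimodalMessage:
    transcript: str            # text extracted from the voice note
    intent: str                # e.g. "complaint", "price_inquiry"
    image_type: str            # e.g. "product", "receipt", "id"
    image_text: Optional[str]  # OCR result, if any

def transcribe_audio(audio_bytes: bytes) -> Tuple[str, str]:
    """Placeholder: return (transcript, intent) from a voice note."""
    return "it arrived broken", "complaint"

def analyze_image(image_bytes: bytes) -> Tuple[str, Optional[str]]:
    """Placeholder: return (image_type, ocr_text) from a photo."""
    return "receipt", "Order #12345"

def create_ticket(summary: str) -> str:
    """Placeholder: open a ticket in the helpdesk and return its id."""
    print(f"Ticket created: {summary}")
    return "TICKET-001"

def handle_incoming(audio: bytes, image: bytes) -> str:
    transcript, intent = transcribe_audio(audio)
    image_type, ocr_text = analyze_image(image)
    msg = MultimodalMessage(transcript, intent, image_type, ocr_text)

    # Decision logic from the proposal: a complaint plus a receipt/product
    # photo opens a ticket with a short summary the agent can read at a glance.
    if msg.intent == "complaint" and msg.image_type in ("receipt", "product"):
        summary = f'Voice: "{msg.transcript}" | Image: {msg.image_type}'
        if msg.image_text:
            summary += f" ({msg.image_text})"
        return create_ticket(summary)

    # Otherwise, fall back to asking a clarifying question in the flow.
    return "ask_for_details"

if __name__ == "__main__":
    handle_incoming(b"<voice note>", b"<photo>")
```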
