Microsoft's Phi-3-Vision: AI-Powered Image Recognition

Microsoft introduced Phi-3 Vision at the Microsoft Build 2024 event, extending its Phi-3 family. This small language model can process both text and images. With 4.2 billion parameters, it is compact enough to run on mobile phones and laptops.

Microsoft’s Phi-3 Vision is intended for general visual processing tasks such as chart interpretation and image analysis. It was launched a few weeks after Microsoft’s Phi-3-mini.

At the Microsoft Build 2024 event, Microsoft's CEO, Satya Nadella, described the model as "hybrid": it can run on a device when the necessary hardware is available and move to the cloud when it is not.
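The "hybrid" behavior described above can be pictured as a simple routing decision. The following is a minimal sketch only, not Microsoft's implementation: the `Device` class, the `choose_backend` function, and the 8 GB memory requirement are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical description of the local hardware (not a Microsoft API)."""
    free_memory_gb: float
    has_accelerator: bool  # e.g. a GPU or NPU is present


def choose_backend(device: Device, required_memory_gb: float) -> str:
    """Return "local" when the device can host the model, else "cloud"."""
    if device.has_accelerator and device.free_memory_gb >= required_memory_gb:
        return "local"
    return "cloud"


# Assumed example devices and an assumed 8 GB requirement.
laptop = Device(free_memory_gb=16.0, has_accelerator=True)
old_phone = Device(free_memory_gb=2.0, has_accelerator=False)

print(choose_backend(laptop, required_memory_gb=8.0))
print(choose_backend(old_phone, required_memory_gb=8.0))
```

A real deployment would likely weigh more than memory and accelerator presence, such as battery level, latency targets, or network availability.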

The Phi-3 family includes Phi-3-vision, Phi-3-mini, Phi-3-small, and Phi-3-medium. Phi-3-small has 7 billion parameters, and Phi-3-medium has 14 billion. Within this family, Phi-3-vision is the member aimed at AI-powered image recognition.
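To see why parameter count matters for phones and laptops, a rough back-of-the-envelope estimate of weight storage helps. The sketch below uses the parameter counts quoted in this article and standard byte widths for common numeric formats. It counts weights only, ignoring activations and other runtime overhead, so actual memory use will be higher. The function name `weight_memory_gb` is an illustrative choice, not part of any library.

```python
# Parameter counts quoted in the article.
PARAMS = {
    "Phi-3-vision": 4.2e9,
    "Phi-3-small": 7e9,
    "Phi-3-medium": 14e9,
}

# Bytes per parameter for common numeric formats.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for name, n in PARAMS.items():
    sizes = ", ".join(
        f"{p}: {weight_memory_gb(n, p):.1f} GB" for p in BYTES_PER_PARAM
    )
    print(f"{name} -> {sizes}")
```

Under these assumptions, the 4.2-billion-parameter model needs roughly 8.4 GB for weights at 16-bit precision and roughly 2.1 GB at 4-bit precision, which is the range where laptop and phone deployment becomes plausible.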

The increasing demand for low-cost and low-computation AI services fuels the development of lightweight and cost-efficient AI models such as Phi-3.

Developers can find Microsoft’s Phi-3-Vision model in Azure AI Studio and Azure AI Playground as a preview.

Microsoft’s vision model can analyze data in charts or photos, but it doesn’t generate images like OpenAI’s DALL-E. Instead, it helps users understand information related to objects in images. It was trained on datasets that include synthetic data and filtered publicly accessible websites. At 4.2B parameters, Microsoft’s Phi-3-Vision is sized for mobile devices and laptops.

Phi-3-vision is the first multimodal model in the Phi-3 family. It combines text and images, and it can reason over real-world images as well as extract and reason over the text within them. It is also optimized for chart and diagram comprehension, so it can be used to generate insights and answer questions.

The Phi-3 models have been optimized to run across a range of hardware. ONNX Runtime and DirectML support the optimized Phi-3 variants, giving developers coverage for devices and platforms ranging from mobile to web deployments.

The Phi-3 AI models were designed and built in accordance with the Microsoft Responsible AI Standard. They underwent rigorous safety measurement and assessment, red-teaming, sensitive use review, and security guidance to support responsible development, testing, and deployment in line with Microsoft’s standards and best practices.

Small language models tend to work better for simpler tasks, are easier for organizations with fewer resources to access and use, and can be more easily customized to meet specific requirements. They are well-suited to applications that run on a local machine, where a task requires little reasoning and analysis but needs a fast response.

Analytics Insight