Vardhaman Deshpande

This is the third article in my "Building a Microsoft Teams Bot" series. In this series, I have written down some interesting things I came across while creating a Microsoft Teams Bot app which is now available on AppSource: https://appsource.microsoft.com/en-us/product/office/WA200002297

Click here to see the previous articles in the series:

Building a Microsoft Teams Bot: Posting an Adaptive Card carousel as a welcome message

Building a Microsoft Teams Bot: Deep linking to a Teams message from an Adaptive Card button

Todays article is around how to pass custom data from Adaptive Cards back to the Teams bot. This can be useful in scenarios when we want to show multiple options to the user in an Adaptive card and when the user selects an option, we want to send that option back to the bot and perform a relevant operation.

To achieve this, we will use the data property of the Adaptive Card Submit action: https://adaptivecards.io/explorer/Action.Submit.html

This property contains a key value pair of custom data which can be sent back to the bot.

Here is a sample of how the card json will look when posting it to Teams:

To build this type of card in our bot code, we will iterate over the options and create Adaptive card Submit buttons with the relevant values in the data property:

And finally, when the user clicks on a button, Teams platform will send the corresponding data property back to the bot. This can then be used in the Submit action of the bot to perform a relevant operation.

Hope this was helpful!

Microsoft Teams meetings extensions or "Apps for Teams meetings" are the newest entry in the Microsoft Teams extensibility story. They can be used to build custom experiences right into the meeting experience. Meeting participants are able to interact with the custom experiences either before, during or after the meeting. To know more about apps for Teams meetings, a great place to start is the Microsoft docs: https://docs.microsoft.com/en-us/microsoftteams/platform/apps-in-teams-meetings/teams-apps-in-meetings

Pre-meeting and Post-meeting experiences

The pre and post meeting experiences are not too different than what we are used to when building Tabs for Teams. The basic structure remains the same with the only different thing being that the Teams SDK provides the meeting specific APIs when invoked from a meeting app. These APIs can be used to get meeting details like participants and meeting context in our app. More information on the APIs here: https://docs.microsoft.com/en-us/microsoftteams/platform/apps-in-teams-meetings/api-references?tabs=dotnet

In-meeting experiences

When it comes to the In-meeting experiences, there are two main areas: The side panel and the in meeting dialog box (also known as content bubble). The side panel is used to show custom experiences while the meeting is in progress

And the in-meeting dialog box (or content bubble) is used to show content, prompt or collect feedback from the users during the meeting:

I should mention here that the In-meeting experiences only work in the Teams desktop and mobile clients as of now. They don't work in the Teams web browser interface at the time of this writing. To me this is the biggest challenge for using them in production.

Now that's enough introduction of the concepts, let's take a look at how to actually build these experiences and what are the moving pieces when building them. The Microsoft docs do provide some great step by step tutorials for each of the use cases.

In this post we will look at the In-meeting dialog box. We will take a look at the Microsoft's code sample and walk through it. You can find the code sample here: https://github.com/OfficeDev/Microsoft-Teams-Samples/tree/main/samples/meetings-content-bubble/csharp

1) Configure an Azure Bot and enable Teams Channel

Create a bot in Azure and configure the endpoint which should receive the Teams events:

Next, add the Teams channel so that the bot is able to talk to Microsoft Teams:

2) Update the Teams manifest

For the In-meeting dialog box, we don't have to make any special changes in the manifest. We just need to make sure that since the dialog box will be shown via the bot, our Teams app manifest should have the bot configured as part of it.

3) Create an Azure AD app registration for the app

When we created the Azure Bot, a new Azure AD app registration was created behind the scenes as well. Grab the client id and client secret from this app as we would need it later to add to our bot config

4) Start the ngrok tunnel and update the code sample:

To debug locally, you will need to setup an ngrok tunnel to your local machine. More details here: https://ngrok.com/

And in the code sample, update the bot client id and client secret along with the ngrok tunnel url:

5) In meeting dialog code:

The way to bring up an in-meeting dialog is to use the regular Bot Framework turnContext.SendActivityAsync(activity) code but with updated Teams channel data:

You will notice that in the In-meeting dialog url, there is a reference to {_config["BaseUrl"]}/ContentBubble

This means that the contents of the in-meeting dialog have to be hosted in our app. This is good news as that means we have complete control over what is displayed in the dialog. In this code sample, the contents are hosted in an MVC view:

And once everything fits together, we can see the sample code running to show an In-meeting dialog launched in the context of a meeting:

Hope this helps, and thanks for reading!

Following up on my previous post about Microsoft Teams meeting apps, let's now have a look at how we can invoke the Microsoft Graph in our meeting app.

Specifically, we will be using the Microsoft Graph communications API to get all meeting participants.

When I was first looking at this scenario, I thought it would be quite straightforward as this seems to be quite a common use case for meeting apps. However, I was in for a bit of a surprise as that did not turn out to be the case.

My thinking was that I would get a meetingId as part of the Teams JS SDK context object and I would use this meetingId to get details from the Graph. But it turns out that the meetingId which the Teams JS SDK provides is completely different than the meetingId recognised by that Microsoft Graph.

To fetch the meetingId which the Graph recognises, we have to introduce the Bot Framework SDK to the mix and exchange the Teams JS SDK meetingId for the Graph meeting Id.

So let's see how to achieve this using code:

1) Using Teams JS SDK, get user id, meeting id, tenant id

This bit is straightforward, in your Teams tab you can get the meeting context using the Teams JS SDK:

2) Use Bot SDK to get Graph meeting ID from the Teams meeting id

Next, from your Teams Bot, call the /v1/meetings/{meeting-id} API. Couple of things you should know beforehand:

This API is currently in developer preview. This means that it will only work for when the Microsoft Teams Clients have been switched to preview mode.

Secondly, we have to turn on Resource Specific Consent (RSC) for your app and request the required scopes. More details on both points here: https://docs.microsoft.com/en-us/microsoftteams/platform/apps-in-teams-meetings/api-references?tabs=dotnet#get-meeting-details-api

Once you have everything setup, you can call the API to get the meeting details:

The response would contain a details object with an msGraphResourceId property. This will be the meeting id which would work with the Graph API.

3) Use Graph API to get meeting participants from Graph meeting id

Finally, from the msGraphResourceId, we can make a regular call to the Graph to get the meeting details. Specifically, we will call the /onlineMeetings/{meetingId} endpoint: https://docs.microsoft.com/en-us/graph/api/onlinemeeting-get?view=graph-rest-1.0&tabs=http

In this case, we will get the meeting participants:

Make sure you have the correct Delegated or Application permissions granted to the Azure AD app registration you are using for authenticating to the Graph.

This is not an ideal situation as you might not have a requirement for a Bot in your Teams app and might be building a tab for example. In this case, Yannick Reekmans has written a post for you here: https://blog.yannickreekmans.be/get-full-meeting-details-in-a-teams-meetings-app-without-using-bot-sdk/

That's it for now. Hope you find this helpful when exploring meeting apps right now.

I have been working with Cosmos DB for a while now and until recently, there was one thing which always annoyed me: When updating a JSON document stored in a container, there was no way to only modify a few selected properties of the document.

The entire JSON document had to be fetched by the client first, then locally replace the properties to be updated, and then send the entire document back to Cosmos DB and replace the previous version of the document.

This was always a challenge because first, it added more work for developers and second, there could be concurrency issues if multiple clients could be downloading multiple copies of the document and updating the data and sending back their copy.

But fortunately, now it's possible to partially update a document in Cosmos DB and basically do a PATCH operation while only sending the properties to be changed over the wire.

So in this post, let's have a look at the different ways in which we can partially update documents in Cosmos DB:

Setting up by creating a Cosmos DB document

First, we will create our sample document using code. I should mention I am using v3.30.1 of Azure Cosmos DB .NET core package from nuget: Microsoft.Azure.Cosmos

As you can see this is a simple document representing a User object with a few attached properties:

Now in order to only change a few properties in the object, we need to use the Partial document update feature in Azure Cosmos DB

Update document properties

Now lets have a look at how we can modify and add properties to the User object:

In the code, we are updating an existing property of the document as well as adding a new property. Also, we are only sending the properties to be modified over the wire.

We can also send both properties in a single operation by adding the operations to the patchOperations array.

Update elements in array

Just like document properties, we can also update single elements in an array in the document. Elements can be updated, added, added at a specific index and also removed from an array:

Update objects and their properties

The properties of a JSON document can themselves be objects as well. The Azure Cosmos DB patch operations can also be used to update properties of specific objects as well as modifying the objects themselves.

Hope this helps! For more details, do have a look at the Microsoft docs for Azure Cosmos DB partial updates: https://docs.microsoft.com/en-us/azure/cosmos-db/partial-document-update

When building a custom Microsoft Teams tab with Teams manifest v1.7+, there is an option to natively show a loading indicator in Teams before our app loads. This can be helpful if your app needs to fetch any data before the first load.

There are two parts to showing the loading indicator:

1. The showLoadingIndicator property in the Teams manifest needs to be set to true.

2. When the tab loads, the app.initialize() and app.notifySuccess() methods of the Teams JS SDK v2 should be called to indicate to Teams that the app is ready to load and to hide the loading indicator.

Now there are some "interesting" things to know about this. If you are using the Teams Developer Portal to configure your Teams app, then as of now the showLoadingIndicator property is set to true by default and there is no way to change this in the portal. You will have to download the app manifest and make the necessary changes in the json manually.

If you are just starting with Microsoft Teams development and are unaware of the loading indicator option, the error message shown on the tab is not much help in figuring out the issue. It simply says:

"There was a problem reaching the app"

This is because the showLoadingIndicator property has been set to true by the Teams Developer portal and we don't yet have our tab calling the app.initialize() and app.notifySuccess() methods of the SDK. So, Teams thinks that our app has failed to load and shows the error message.

There are two possible solutions to fixing this problem:

1. Set the showLoadingIndicator property to false manually in the manifest json. But the drawback to this approach then, is that no native loading indicator would be shown, and the users might have to stare at a blank screen until the content loads.

2. Let the showLoadingIndicator property be set to true and then make sure we call the app.initialize() and app.notifySuccess() methods of the SDK.

Manifest.json:

Tab react code:

There are further options possible with the loading indicator like hiding the loading indicator even if the app continues to load in the background and explicitly specifying loading failure. To see more details about those, the Microsoft docs are a good place to start: Create a content page - Teams | Microsoft Learn

Hope this helps!

If you are building a Teams app and want to integrate with Teams channel messages, this post will be helpful to you. Specifically, we will be looking at posting a Teams channel message, from the app, on behalf of the currently logged in user. For example, there could be a scenario where an event occurs within the Teams app and as a result a message needs to be posted to a Teams channel but instead of it coming from a bot, it needs to be posted by the logged in user's account.

Let's see how this can be done.

1. Azure AD app and Permissions

First thing we need is an Azure AD app setup with the Microsoft Graph ChannelMessage.Senddelegated permission. This permission is needed for posting messages to Teams using the current user credentials.

I should mention setting up the right permissions is part of a larger configuration in the Azure AD app needed for Single Sign On (SSO) setup in Teams app. You can see the full configuration here: Register your tab app with Azure AD - Teams | Microsoft Learn

2. Current user's id token from Teams JS SDK v2

Once the permissions are setup, we need to setup our frontend so that it can grab the current user's id token from Microsoft Teams. More info on Azure AD id tokens here: Microsoft identity platform ID tokens - Microsoft Entra | Microsoft Learn

Although this token is available from Teams JS SDK v2, it cannot itself be used to make graph calls. We need to exchange it for a Microsoft Graph access token. For this we will send the id token to our backend:

3. Getting Microsoft Graph delegated access token and posting message to Teams

It is recommended to do the token exchange as well as any further Graph calls from the backend of your app instead passing the Graph access token back to the frontend and making the calls from there:

In the code we first graph the id token from the Authorization header, then exchange the id token for a Microsoft Graph access token, then finally we are able to make a Graph call to post a message to Teams as the current user.

Hope this helps!

Ever since OpenAI function calling was released, I have been incredibly fascinated by it. To me, it is as big a game changer as ChatGPT itself.

With function calling, we are no longer limited by the data which was used to train the Large Language Model (LLM). We can call out to external APIs, protected company data and other business specific APIs and use the data to supplement the responses from the LLM.

To know more about function calling specifically with the Azure OpenAI service, check out the Microsoft docs: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/function-calling

In this post, let's have a look at how we can leverage OpenAI function calling to chat with our user directory and search for users in natural language. To make this possible we will use the Microsoft Graph to do the heavy lifting.

This is what we want to achieve: The user asks a question about the people directory in natural language, the LLM is able to transform the question to code which the Microsoft Graph understands and the LLM is again able to transform the response from the Microsoft Graph back to natural language.

On a high level, our approach can be summarised as follows:

1. Define the OpenAI functions and make them available to the LLM

2. During the course of the chat, if the LLM thinks that to respond to the user, it needs to call our function, it will respond with the function name along with the parameters to be sent to function.

3. Call the Microsoft Graph user search API based on the parameters provided by the LLM.

4. Send the results returned from the Microsoft Graph back to the LLM to generate a response in natural language.

Alright, let's now look at the code. In this code sample I have used the following nuget packages:

https://www.nuget.org/packages/Azure.AI.OpenAI/1.0.0-beta.6/

https://www.nuget.org/packages/Microsoft.Graph/5.30.0/

https://www.nuget.org/packages/Azure.Identity/1.10.2/

The very first thing we will look at is our function definition:

In this function we are informing the LLM that if needs to search any users as part of providing the responses, it can call this function. The function name will be returned in the response and the relevant parameters will be provided as well.

The enums in the officeLocation and department parameter will instruct the LLM to only return those values even if user asks a slightly different variation in their question. We can see an example of this in the gif above. Even if the question asked contains words like "devs" and "NY", the LLM is able to determine and use the terms "Engineering" and "New York" instead.

Next, let's see how our orchestrator looks. I have added comments to each line where relevant:

There is a lot to unpack here as this function is the one which does the heavy lifting. This code is responsible for handling the chat with OpenAI, calling the MS Graph and also responding back to the user based on the response from the Graph.

Next, let's have a look at the code which calls the Microsoft Graph based on the parameters provided by the LLM.

Before executing this code, you will need to have created an App registration with a clientId and clientSecret. Here is how to do that: https://learn.microsoft.com/en-us/azure/active-directory/develop/quickstart-register-app

Since we are calling the Microsoft Graph /users endpoint with application permissions, the app registration will need a minimum of the User.Read.All application permission granted.
https://learn.microsoft.com/en-us/graph/api/user-list?view=graph-rest-1.0&tabs=http

This code get the parameters sent from the LLM and uses the Microsoft Graph .NET SDK to call the /users/search endpoint and fetch the users based on the officeLocation, departmentor jobTitle properties.

Once the users are returned, their displayName value is concatenated into a string and returned to the orchestrator function so that it can be sent again to the LLM.

Finally, lets have a look at our CallChatGPT function which is responsible for talking to the Open AI chat api.

This function defines the Open AI function which will be included in our Chat API calls. Also, the user's question is sent to the API to determine if the function needs to be called. This function is also called again after the response from the Microsoft Graph is fetched. At that time, this function contains the details fetched from the Graph to generate an output in natural language.

This way, we can use Open AI function calling together with Microsoft Graph API to chat with your user directory.

In the previous post, we saw what is OpenAI function calling and how to use it to chat with your organization's user directory using Microsoft Graph. Please have a look at the article here: Chat with your user directory using OpenAI functions and Microsoft Graph

In this post, we will implement function calling for a very common scenario of augmenting the large language model's responses with data fetched from internet search.

Since the Large Language model (LLM) was trained with data only up to a certain date, we cannot talk to it about events which happened after that date. To solve this, we will use OpenAI function calling to call out the Bing Search API and then augment the LLM's responses with the data returned via internet search.

This pattern is called Retrieval Augmented Generation or RAG.

Let's look at the code now on how to achieve this. In this code sample I have used the following nuget packages:

https://www.nuget.org/packages/Azure.AI.OpenAI/1.0.0-beta.6/

https://www.nuget.org/packages/Azure.Identity/1.10.2/

The very first thing we will look at is our function definition for informing the model that it can call out to external search API to search information:

In this function we are informing the LLM that if it needs to search the internet as part of providing the responses, it can call this function. The function name will be returned in the response and the relevant parameters will be provided as well.

Next, let's see how our orchestrator looks. I have added comments to each line where relevant:

This code is responsible for handling the chat with OpenAI, calling the Bing API and also responding back to the user based on the response from internet search.

Next, let's have a look at the code which calls the Bing API based on the parameters provided by the LLM. Before executing this code, you will need to have created Bing Web Search API resource in Azure. Here is more information on it: https://learn.microsoft.com/en-us/bing/search-apis/bing-web-search/overview

The Bing Web Search API key can be found in the "Keys and Endpoint" section on the Azure resource:

Here is the code for calling the Bing Search API:

In this code, we are calling the Bing Web Search REST API to get results based on the search query created by the LLM. Once the top 3 results are fetched we are getting the text snippets of those results, combining them and sending it back the LLM.

We are only getting the search result snippets to keep this demo simple. In production, ideally you will need to get the Url of each search result and then get the content of the page using the Url.

Finally, lets have a look at our CallChatGPT function which is responsible for talking to the Open AI chat API:

This code defines the OpenAI function which will be included in our Chat API calls. Also, the user's question is sent to the Chat API to determine if the function needs to be called. This function is also called again after the response from the Bing Web Search API is fetched. At that time, this function contains the search results and uses them to generate an output in natural language.

This way, we can use Open AI function calling together with Bing Web Search API to connect our chat bot to the internet!

The new Microsoft Teams Desktop client was made generally available for Windows and Mac recently. The good news is that the new client provides feature parity for 3rd party apps like Focusworks AI giving customers a choice of using their preferred Teams client to access the apps.

However, if you have a custom built Microsoft Teams tab or a task module as part of your solution, and find that it fails to load in the new Microsoft Teams client, there might be a specific reason for it.

And since there is no way to invoke the Developer tools in the new Teams desktop client yet (November 2023), the experience can get a bit frustrating.

In my case, I have a custom React/TypeScript based tab which is using the @microsoft/teams-js library to interact with Teams.

Since teams tabs are just HTML pages, we need to make sure that the page is being loaded inside Teams before continuing to execute the code. To do that we can use the context.app.host.name property and check that the value was "teams" before moving ahead.

However, with the new desktop client my tab was failing to load. After a bit of digging around I realised that the new Teams desktop client has an entirely different host name property and the value is "teamsModern" as mentioned here: https://learn.microsoft.com/en-us/javascript/api/%40microsoft/teams-js/hostname?view=msteams-client-js-latest

So changing my code to include the new value as well worked!

Hope this saves you some debugging time!

I was working on a project recently where we were using the Azure OpenAI service quite heavily. As part of creating the DevOps pipelines for the project, we had to look into automating the management of the Azure OpenAI service. Turns out this functionality is possible with the Azure CLI however it is available under the Cognitive Services module which can be a bit tricky to find. So here is a quick blog post detailing some of the more frequently used operations for the Azure OpenAI service through the Azure CLI:

For a full set of operations, please see the Microsoft docs: https://learn.microsoft.com/en-us/cli/azure/cognitiveservices?view=azure-cli-latest

If you have been working with OpenAI APIs, you will have come across the term "tokens". Tokens are a way in which these APIs process and output text. Various versions of the OpenAI APIs have different token context lengths. This means there is a limit to the text they can process in a single request. More about tokens here: https://learn.microsoft.com/en-us/azure/ai-services/openai/overview#tokens

When building an app based on these APIs, we need to keep track of the tokens being sent and make sure not to send more than the maximum context length of the OpenAI model being used (e.g. gpt-3.5-turbo). If more tokens are sent than the maximum context length of the model, the request will fail with the following error:

To help with counting tokens before sending to the APIs, there are various libraries available. One of them being the Microsoft Tokenizer: https://github.com/microsoft/Tokenizer which is an open source .NET and TypeScript implementation of OpenAI's tiktoken library.

So in this post, let's see how we can use the Microsoft Tokenizer .NET SDK to manage the tokens sent to OpenAI APIs.

First we will need the Microsoft Tokenizer nuget package:

https://www.nuget.org/packages/Microsoft.DeepDev.TokenizerLib/

Since we will actually be counting the tokens of a chat between the user and an AI assistant, we will also use the Azure OpenAI .NET SDK:

https://www.nuget.org/packages/Azure.AI.OpenAI/1.0.0-beta.8

Next, in our code we will first have to initialize the tokenizer and let it know which OpenAI model will we be working with. Most of the recent models like gpt-3.5-turbo, gpt-4 etc. share the same token encoding i.e. cl100k_base. So we can use the same tokenizer across these models.

Now let's look at the actual code:

What we have here is a sample chat history between a user and an assistant. Before sending the chat history to the OpenAI api to get the next message from the assistant, we are using the Tokenizer library to count the tokens, and if it comes out that there are more tokens present in the than the model supports, we are removing the earlier messages from the chat. This is so that the most recent conversations are sent to the API and the response generated stays relevant to the current conversation context.

Hope this helps!

With the latest gpt-4-turbo model out recently, there is one very helpful feature which came with it: The JSON mode option. Using JSON mode, we are able to predictably get responses back from OpenAI in structured JSON format.

This can help immensely when building APIs using Large Language Models (LLMs). Even though the model can be instructed to return JSON in it's system prompt, previously, there was no guarantee that the model would return valid JSON. With the JSON mode option now, we can specify the required format and the model will return data according to it.

To know more about JSON mode, have a look at the official OpenAI docs: https://platform.openai.com/docs/guides/text-generation/json-mode

Now let's look at some code to see how this works in action:

I am using the Azure OpenAI service to host the gpt-4-turbo model and I am also using the v1.0.0.-beta.12 version of the Azure OpenAI .NET SDK found on NuGet here:

https://www.nuget.org/packages/Azure.AI.OpenAI/1.0.0-beta.12

What is happening in the code is that in the system message, we are instructing the LLM that analyse the text provided by the user and then extract the cities mentioned in this text and return them in the specified JSON format.

Also important is line 22 where we explicitly specify to use the response format as JSON.

Next, we provide the actually text to parse in the user message.

Once we get the data back in expected JSON schema, we are able to convert it to objects which can be used in code.

And as expected we get the following output:

Hope this helps!

Dall E 3 is the latest AI image generation model coming out of OpenAI. It is leaps and bounds ahead of the previous model Dall E 2. Having explored both, the image quality as well as the adherence to text prompts is much better for Dall E 3. It is now available as a preview in Azure OpenAI Service as well.

Given all this, it is safe to say if you are working on the Microsoft stack and want to generate images with AI, using the Azure OpenAI Dall E 3 model would be the recommended option.

In this post, let's explore the image generation API for Dall E 3 and also how to use it from a SharePoint Framework (SPFx) solution. The full code of the solution is available on GitHub: https://github.com/vman/Augmentech.OpenAI

First, let's build the web api which will wrap the Azure OpenAI API to create images. This will be a simple ASP.NET Core Web API which will accept a text prompt and return the generated image to the client.

To run this code, we will need the following NuGet package: https://www.nuget.org/packages/Azure.AI.OpenAI/1.0.0-beta.13/

Now for calling the API, we will use a standard React based SPFx webpart. The webpart will use Fluent UI controls to grab the text prompt from user and send it to our API.

Hope this helps!

Microsoft 365 Copilot is an enterprise AI tool that is already trained on your Microsoft 365 data. If you want to "talk" to data such as your emails, Teams chats or SharePoint documents, then all of it is already available as part of it's "knowledge".

However, not all the data you want to work with will live in Microsoft 365. There will be instances when you want to use Copilot's AI on data residing in external systems. So how do we extend the knowledge of Microsoft 365 Copilot with real time data coming from external systems? The answer is by using plugins! Plugins not only help us do Retrieval Augmented Generation (RAG) with Copilot, but they also provide a framework for writing data to external systems.

To know more about the different Microsoft 365 Copilot extensibility options, please have a look here: https://learn.microsoft.com/en-us/microsoft-365-copilot/extensibility/decision-guide

So in this post, let's have a look at how to build a plugin which talks to an external API and then infuses the real time knowledge into Copilot's AI. At the time of this writing, there is nothing more volatile than Cryptocurrency prices! So, I will be using a cryptocurrency price API and enhance Microsoft 365 Copilot's knowledge with real time Bitcoin and Ethereum rates!

(click to zoom)

So let's see the different moving parts of the plugin. We will be using a Microsoft Teams message extension built on the Bot Framework as a base for our plugin:

1) App manifest

This is by far the most important part of the plugin. The name and description (both short and long) are what tell Copilot about the nature of the plugin and when to invoke it to get external data. We have to be very descriptive and clear about the features of the plugin here as this is what the Copilot will use to determine whether the plugin is invoked. The parameter descriptions are used to tell Copilot how to create the parameters required by the plugin based on the conversation.

2) Teams messaging extension code

This function does the heavy lifting in our code. It is called with the parameters specified in the app manifest by Copilot. Based on the parameters we can fetch external data and return it as adaptive cards.

3) Talk to the external system (Cryptocurrency API)

This is helper function which is used to actually talk to the crypto api and return rates.

Hope you found this post useful!

The code for this solution is available on GitHub: https://github.com/vman/M365CopilotPlugin

OpenAI released gpt-4o recently, which is the new flagship model that can reason across audio, vision, and text in real time. It's a single model which can be provided with multiple types of input (multi modal) and it can understand and respond based on all of them.

The model is also available on Azure OpenAI and today we are going to have a look at how to work with images using the vision capabilities of gpt-4o. We will be providing it with images directly as part of the chat and asking it to analyse the images before responding. Let's see how it works:

We will be using the Azure OpenAI service for working with the OpenAI gpt-4o and since we will be using .NET code, we will need the Azure OpenAI .NET SDK v2:

1. Basic image analysis

First, let's start with a simple scenario of sending an image to the model and asking it to describe it.

2. Answer questions based on details in images

Next, let's give a slightly more complex image of some ingredients and ask it to create a recipe:

Image source: allrecipes.com

3. Compare images

This one is my favourite, let's give it 2 images and ask it to compare them against each other. This can be useful in scenarios where there is a single "standard" image and we need to determine if another image adheres to the standard.

4. Binary data

If the URL of the image is not accessible anonymously, then we can also give the model binary data of the image:

5. Data URI

We can also use Data URI's instead of direct URLs

6. Limitations

As per OpenAI docs, there are some limitations of the vision model that we should be aware of:

Medical images: The model is not suitable for interpreting specialized medical images like CT scans and shouldn't be used for medical advice.

Non-English: The model may not perform optimally when handling images with text of non-Latin alphabets, such as Japanese or Korean.

Small text: Enlarge text within the image to improve readability, but avoid cropping important details.

Rotation: The model may misinterpret rotated / upside-down text or images.

Visual elements: The model may struggle to understand graphs or text where colors or styles like solid, dashed, or dotted lines vary.

Spatial reasoning: The model struggles with tasks requiring precise spatial localization, such as identifying chess positions.

Accuracy: The model may generate incorrect descriptions or captions in certain scenarios.

Image shape: The model struggles with panoramic and fisheye images.

Metadata and resizing: The model doesn't process original file names or metadata, and images are resized before analysis, affecting their original dimensions.

Counting: May give approximate counts for objects in images.

CAPTCHAS: For safety reasons, we have implemented a system to block the submission of CAPTCHAs.

Overall, I do think the ability to combine text and image input as part of of the same chat is a game changer! This could unlock a lot of scenarios which were not possible just with a single mode of input. Very excited to see what is next!

Hope you found the post useful!

With OpenAI's recently released Assistants API, building AI bots becomes a lot easier. Using the API, an assistant can leverage custom instructions, files and tools (previously called functions) and answer user questions based on them.

Before the Assistants API, building such assistants was possible but for a lot of things, we had to use our own services e.g. vector storage for file search, database for maintaining chat history etc.

The Assistants API gives us a handy wrapper on top of all these disparate services and a single endpoint to work with. So in this series of posts, let's have a look at what the Assistants API can do.

Working with OpenAI Assistants: Create a simple assistant(this post)

Working with OpenAI Assistants: Using file search

Working with OpenAI Assistants: Chat with Excel files using code interpreter

Working with OpenAI Assistants: Using code interpreter to generate charts

The first thing we are going to do is build a simple assistant which has a "SharePoint Tutor" personality. It will be used to answer questions for users who are learning to use SharePoint. Before deep diving into the code, lets understand the different moving pieces of the Assistants API:

An assistant is a container in which all operations between the AI and the user are managed.

A thread is a list of messages which were exchanged between the user and AI. The thread is also responsible for maintaining the conversation history.

A run is a single invocation of an assistant based on the history in the thread as well as the tools available to the assistant. After a run is executed, new messages are generated and added to the thread.

For the demo code, we will be using the Azure OpenAI service for working with the OpenAI gpt-4o model and since we will be using .NET code, we will need the Azure OpenAI .NET SDK as well as Azure.AI.OpenAI.Assistants nuget packages.

This was a simple assistant creation just to get us familiar with the Assitants API. In the next posts, we will dive deeper into the API and explore the more advanced concepts. Stay tuned!

This is the second post in the series where we explore the OpenAI Assistants API. In this post, we will be looking at the file search capabilities which allows us to upload files to the Assistants API and chat with them. See the following posts for the entire series:

Working with OpenAI Assistants: Create a simple assistant

Working with OpenAI Assistants: Using file search (this post)

Working with OpenAI Assistants: Chat with Excel files using code interpreter

Working with OpenAI Assistants: Using code interpreter to generate charts

The file search API uses the Retrieval Augmented Generation (RAG) pattern which has been made popular recently. The added advantage of using the Assistants API for this is that the API manages document chunking, vectorizing and indexing for us. Whereas without the Assistants API we would have to use a separate service like Azure AI Search and manage the document indexing ourselves.

To upload and chat with documents using the Assistants API, we have to use the following moving pieces:

First, we need to create a Vector Store in the Assistants API.
Then, we need to upload files using the Open AI File client and add them to the vector store.
Finally, we need to connect the vector store to either an assistant or a thread which would enable to assistant to answer questions based on the document.

Limitations

As per OpenAI docs, there are some limitations for the file search tool:

Each vector store can hold up to 10,000 files.
The maximum file size of a file which can be uploaded is 512 MB. Each file should contain no more than 5,000,000 tokens per file (computed automatically when you attach a file).

When querying for the documents in the vector store, we have to be aware of the following things which are not possible right now. However, the OpenAI team are working on this and some of these features will be available soon:

Support for deterministic pre-search filtering using custom metadata.
Support for parsing images within documents (including images of charts, graphs, tables etc.)
Support for retrievals over structured file formats (like csv or jsonl).
Better support for summarization — the tool today is optimized for search queries.

Current supported files types can be found in the OpenAI docs

Hope this helps!

This is the third post in the series where we explore the OpenAI Assistants API. In this post, we will be looking at the code interpreter tool which allows us to upload files to the Assistants API and write python code against them. This is very powerful for scenarios where you have to do data analysis on csv or Microsoft Excel files and generate charts and reports on them.

See the following posts for the entire series:

Working with the OpenAI Assistants: Create a simple assistant

Working with the OpenAI Assistants: Using file search

Working with the OpenAI Assistants: Chat with Excel files using code interpreter (this post)

Working with OpenAI Assistants: Using code interpreter to generate charts

The Retrieval Augmented Generation (RAG) pattern, which was discussed in previous posts, works great for text based files like Microsoft Word and PDF documents. However, when it comes to structured data files like csv or excel, it comes out short. An this where the Code Interpreter tool can come in very handy. It can repetitively run python code on documents until it is confident that the user's question has been answered.

So in this post, let's see how we can query an excel file with the code interpreter tool. The excel file we will be querying will contain details of customers like their name and the licenses purchased of a fictional product by them:

To upload and analyse documents using the Code interpreter, we have to use the following moving pieces:

First, we need to upload files using the Open AI File client
Then, we need to connect the uploaded file to the Code Interpreter tool in either an assistant or a thread which would enable the assistant to answer questions based on the document.

As you can see the code interpreter tool takes a few passes at the data. It tries to understand the document before answering the question. This is a really powerful feature and the possibilities are endless!

Hope this helps.

This is the fourth post in the series where we explore the OpenAI Assistants API. In this post, we will be looking at the code interpreter tool which allows us to generate charts based on some data. This is very powerful for scenarios where you have to do data analysis on JSON, csv or Microsoft Excel files and generate charts and reports based on them.

See the following posts for the entire series:

Working with the OpenAI Assistants API: Create a simple assistant

Working with the OpenAI Assistants API: Using file search

Working with the OpenAI Assistants API: Chat with Excel files using Code interpreter

Working with the OpenAI Assistants API: Using code interpreter to generate charts (this post)

The Code Interpreter tool has access to a sandboxed python code execution environment within the Assistants API. This can provide very useful as the Assistants API can iteratively run code against the files provided to it and generate charts!

So in this post, let's see how we can generate charts based on an excel file with the code interpreter tool. The excel file we will be querying will be the same one we used in the last post. It contains details of customers like their name and the licenses purchased of a fictional product by them:

To generate charts using the Code interpreter, we have to use the following moving pieces:

First, we need to upload the excel file using the Open AI File client
Then, we need to connect the uploaded file to the Code Interpreter tool in either an assistant or a thread which would enable the assistant to generate a chart on the document.

And this is the file generated by the code interpreter tool:

As you can see the code interpreter tool takes a few passes at the data. It tries to understand the document before generating the chart. This is a really powerful feature and the possibilities are endless!

Hope this helps.

By now, we have seen "Chat with your documents" functionality being introduced in many Microsoft 365 applications. It is typically built by combining Large Language Models (LLMs) and vector databases.

To make the documents "chat ready", they have to be converted to embeddings and stored in vector databases like Azure AI Search. However, indexing the documents and keeping the index in sync are not trivial tasks. There are many moving pieces involved. Also, many times there is no need for "similarity search" or "vector search" where the search is made based on meaning of the query.

In such cases, a simple "keyword" search can do the trick. The advantage of using keyword search in Microsoft 365 applications is that the Microsoft Search indexes are already available as part of the service. APIs like the Microsoft Graph Search API and the SharePoint Search REST API give us "ready to consume" endpoints which can be used to query documents across SharePoint and OneDrive. Keeping these search indexes in sync with the changes in the documents is also handled by the Microsoft 365 service itself.

So in this post, let's have a look at how we can combine OpenAI's gpt-4o Large Language Model with Microsoft Graph Search API to query SharePoint and OneDrive documents in natural language.

On a high level we will be using OpenAI function calling to achieve this. Our steps are going to be:

1. Define an OpenAI function and make it available to the LLM.

2. During the course of the chat, if the LLM thinks that to respond to the user, it needs to call our function, it will respond with the function name along with the parameters.

3. Call the Microsoft Graph Search API based on the parameters provided by the LLM.

4. Send the results returned from the Microsoft Graph back to the LLM to generate a response in natural language.

So let's see how to achieve this. In this code I have used the following nuget packages:

https://www.nuget.org/packages/Azure.AI.OpenAI/2.1.0

https://www.nuget.org/packages/Microsoft.Graph/5.64.0

The first thing we will look at is our OpenAI function definition:

In this function we are informing the LLM that if needs to search any files as part of providing the responses, it can call this function. The function name will be returned in the response and the relevant parameter will be provided as well. Now let's see how our orchestrator function looks:

Next, let's have a look at the code which calls the Microsoft Graph based on the parameters provided by the LLM.

Before executing this code, you will need to have created an App registration. Here is how to do that: https://learn.microsoft.com/en-us/azure/active-directory/develop/quickstart-register-app

Since we are calling the Microsoft Graph /search endpoint with delegated permissions, the app registration will need a minimum of the User.Read and Files.Read.All permissions granted. https://learn.microsoft.com/en-us/graph/api/search-query?view=graph-rest-1.0&tabs=http

This code get the parameters sent from the LLM and uses the Microsoft Graph .NET SDK to call the /search endpoint and fetch the files based on the searchQuery properties. Once the files are returned, their summary value is concatenated into a string and returned to the orchestrator function so that it can be sent again to the LLM.

Finally, lets have a look at our CallOpenAI function which is responsible for talking to the Open AI chat api.

This code defines the Open AI function which will be included in our Chat API calls. Also, the user's search query is sent to the API to determine if the function needs to be called. This function is also called again after the response from the Microsoft Graph is fetched. At that time, this function contains the details fetched from the Graph to generate an output in natural language. This way, we can use Open AI function calling together with Microsoft Graph API to search files in SharePoint and OneDrive.

Hope this helps!