Document Governance with Watson Discovery for Watson Assistant

Marc Nehme
IBM Data Science in Practice
Feb 27, 2020

You have many teams or roles within your organization. They all are using the same virtual assistant to get answers to their questions. Your virtual assistant handles common dialog and FAQs for all your teams. Yet, you need to ensure that each team is getting the correct answers, from the right documents. These document-based answers could be based on “long tail” or less frequently asked questions. Or the answers may require documentation to follow, such as steps of a process or a list of guidelines. You do not need to imagine too hard as this is a very common scenario.

In this post, I share two approaches to achieving document governance with Watson Discovery alongside Watson Assistant.

Please note that my guidance here is for those who are using IBM Cloud public. If you are using “IBM Cloud Pak for Data”, then “Watson Discovery for IBM Cloud Pak for Data” supports document-level security for several data sources. You can review and potentially leverage that. If you have security requirements, concerns or are dealing with sensitive data, then you should reach out to your IBM contact to discuss.

Capture Context

The first step is to pass the logged-in user’s context to Watson Assistant and capture it there. In my example, I use the “role” of the user to determine which set of documents in Discovery they should have access to and get answers from. At the start of each chat session in Watson Assistant, you can pass in this context through the context parameter of the message API.

(Example using the Watson Python SDK)

import json

from ibm_watson import AssistantV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Authenticate and point the client at your service instance.
authenticator = IAMAuthenticator('{apikey}')
assistant = AssistantV2(
    version='2020-02-05',
    authenticator=authenticator
)
assistant.set_service_url('{url}')

# Send the first message of the session along with the user's context.
response = assistant.message(
    assistant_id='{assistant_id}',
    session_id='{session_id}',
    input={
        'message_type': 'text',
        'text': 'Hello',
        'options': {
            'return_context': True  # echo the context back in the response
        }
    },
    context={
        'global': {
            'system': {
                'user_id': 'my_user_id'
            }
        },
        'skills': {
            'main skill': {
                'user_defined': {
                    'role': 'admin'  # role of the logged-in user
                }
            }
        }
    }
).get_result()

print(json.dumps(response, indent=2))

In my example above, I am leveraging a “skill-specific” context to capture a user-defined context variable in Watson Assistant named ‘role’. This context should be passed in from whatever front end the user authenticates with to access the assistant. You can learn more about handling context variables in the Watson Assistant documentation. Once you’ve got the context captured, you need to store it in the dialog node that will make the call out to Watson Discovery.
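As a rough illustration of how a front end might assemble that context after authentication, here is a minimal sketch. The helper name `build_context` and the `ALLOWED_ROLES` set are hypothetical and only mirror the “admin” and “user” roles used in this post; the dictionary layout is the same one passed to the message API above.

```python
from typing import Any, Dict

# Hypothetical set of roles; these mirror the "admin"/"user" example in this post.
ALLOWED_ROLES = {"admin", "user"}


def build_context(user_id: str, role: str, skill_name: str = "main skill") -> Dict[str, Any]:
    """Build the dictionary passed as the `context` argument of the message call."""
    if role not in ALLOWED_ROLES:
        # Refuse unknown roles rather than passing them on to the assistant.
        raise ValueError(f"Unknown role: {role!r}")
    return {
        "global": {"system": {"user_id": user_id}},
        "skills": {skill_name: {"user_defined": {"role": role}}},
    }


context = build_context("my_user_id", "admin")
print(context["skills"]["main skill"]["user_defined"]["role"])
```

Validating the role before it reaches the assistant keeps the set of possible values small and predictable, which helps when the same value is later used to select documents.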

Capturing context in your dialog node

The next step is to call Watson Discovery. Below are two different options you can consider when trying to implement document-level governance for Watson Discovery while using Watson Assistant as the front end.

Document-Level Filters

In this approach, you would ingest all of your documents into one Discovery collection. You can add a new parameter as metadata to your documents when ingesting them into Discovery. Here, you would add a key/value pair with “role” as the key and a value such as “admin” or “user”. The value depends on the type of document and its intended audience. See the “Add a Document” API for details on how to do this.
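To make the tagging step concrete, here is a minimal sketch of building the metadata for each document. The helper `build_role_metadata` and the file names are hypothetical. The assumption is that the resulting JSON string is what you would supply as the `metadata` argument when adding a document (for example, via `DiscoveryV1.add_document` in the Watson Python SDK); the upload call itself is left out so the sketch runs without credentials.

```python
import json


def build_role_metadata(role: str, **extra: str) -> str:
    """Return a JSON string carrying the "role" key plus any extra metadata."""
    metadata = {"role": role}
    metadata.update(extra)
    return json.dumps(metadata, sort_keys=True)


# Hypothetical mapping from file name to intended audience.
documents = {
    "admin_runbook.pdf": "admin",
    "getting_started.pdf": "user",
}

for filename, role in documents.items():
    # In a real ingestion script, this string would accompany the file upload.
    print(filename, build_role_metadata(role))
```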

Once you’ve ingested your document sets with the additional “role” metadata, you can use it as a filter in your query from the Watson Assistant dialog node. The user context captured in the step above is stored as the role and will be used as my filter. In this example, my dialog node will have a response type set to search skill, which calls Watson Discovery natively from Watson Assistant.

Leveraging Discovery metadata as filters in your query
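For readers who want to see what such a filter might look like as text, here is a small sketch that builds a Discovery Query Language filter from the role. It assumes the role was stored at ingestion under the field path `metadata.role` and uses the exact-match operator `::`; the helper name `role_filter` and the allowed roles are hypothetical, so adjust them to your own collection.

```python
# Hypothetical set of roles; these mirror the "admin"/"user" example in this post.
ALLOWED_ROLES = {"admin", "user"}


def role_filter(role: str) -> str:
    """Build an exact-match filter on the assumed `metadata.role` field."""
    if role not in ALLOWED_ROLES:
        # Only known roles are allowed, so no untrusted text ends up in the query.
        raise ValueError(f"Unknown role: {role!r}")
    return f'metadata.role::"{role}"'


print(role_filter("admin"))
```

Restricting the value to a known set avoids having to escape arbitrary input inside the filter string.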

Below is a simplistic view of the result from the search skill. I opted to view the title of the document (which in my case is the file name) and the URL that leads to that document. This is configurable in the search skill itself.

Search Skill response

This governs the answers presented to your users, so each user role gets answers from the appropriate set of documents. The benefit of this approach is that it uses only one Discovery collection for all documents. The drawback is that every document must be tagged with the right metadata, which is an additional task to complete and a source of wrong answers if it is not done carefully.

Using Webhooks

In this approach, one Discovery collection is used for each respective set of documents (e.g., one for admins and one for standard users). Once ingested, the documents reside in separate collections, unlike the previous approach. You do not need to add any metadata with this approach.

A Watson Assistant dialog node can call a webhook, which makes a POST request to an external API. This approach requires a simple program behind that endpoint. IBM Cloud Functions, for example, is a good deployment option for the program. This program calls a Watson Discovery collection. Explaining how to create a Cloud Function is out of scope for this post, but you can view the IBM Cloud Functions documentation.

The collection ID called in the Cloud Function will depend on the role context provided by Watson Assistant captured at the beginning of the chat session. The cloud function will use that role as business logic to decide which collection to query. Once the Cloud Function is created, the webhook option in Watson Assistant needs to be configured to call that endpoint.
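The routing logic described above can be sketched as follows. This is an illustration under stated assumptions, not the function used in this post: the collection IDs, the `handle` helper, and the `fake_query` stand-in are hypothetical, and the result fields (`extracted_metadata.title`, `text`) are assumed to match what your collection returns. The actual Discovery call is injected as a callable so the sketch runs without credentials. In a Cloud Functions Python action, the entry point would be a `main(params)` function that delegates to something like `handle`.

```python
from typing import Any, Callable, Dict

# Hypothetical collection IDs, one per role.
COLLECTION_BY_ROLE = {
    "admin": "admin-collection-id",
    "user": "user-collection-id",
}

# A callable that takes (collection_id, query_text) and returns a Discovery-style response.
QueryFn = Callable[[str, str], Dict[str, Any]]


def handle(params: Dict[str, Any], run_query: QueryFn) -> Dict[str, str]:
    """Pick the collection for the caller's role and return the top result."""
    role = params.get("role")
    collection_id = COLLECTION_BY_ROLE.get(role)
    if collection_id is None:
        # Deny by default: an unknown or missing role never reaches any collection.
        return {"error": f"No collection configured for role {role!r}"}

    response = run_query(collection_id, params.get("query", ""))
    results = response.get("results", [])
    if not results:
        return {"title": "", "message": "No matching document found."}

    top = results[0]
    title = top.get("extracted_metadata", {}).get("title", "")
    return {"title": title, "message": top.get("text", "")}


def fake_query(collection_id: str, text: str) -> Dict[str, Any]:
    """Stand-in for the real Discovery query, used only to exercise the routing."""
    return {
        "results": [
            {
                "extracted_metadata": {"title": f"Doc from {collection_id}"},
                "text": f"Answer to: {text}",
            }
        ]
    }


print(handle({"role": "admin", "query": "reset a password"}, fake_query))
```

Returning `title` and `message` keys keeps the output aligned with the `$webhook_result_1.title` and `$webhook_result_1.message` references used in the dialog response later in this post.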

Once that is done, configure the Assistant dialog. There are many different ways to set up the dialog node that will be invoking the webhook. Here is one example. The new node that will call the webhook can be added to the very bottom of your list of nodes, depending on the use case.

In that node, we need to catch every utterance that did not have a match in your previous dialog nodes:

Then check for the “role” context variable to ensure you have that context:

Specify the parameters you will pass to the Cloud Function:

Then set the response conditions:

Above, I added simple formatting to parse out exactly what I want to return from the answer that Watson Discovery provided to the Cloud Function. I opted to return the title of the document and the message body:

<b>Title:</b> “$webhook_result_1.title” <br> <b>Summary:</b> “$webhook_result_1.message”

Here is a simplistic view of an example response returned from Watson Discovery, leveraging the Cloud Function:

The benefit of this approach is that you do not need to manually add metadata to your document sets. The other benefit is that each document set resides in its own collection rather than being mixed together. The additional work required in this approach is creating the webhook, but that is neither difficult nor time consuming for someone with experience developing a simple, lightweight application.

Conclusion

Both of these approaches can be leveraged to achieve document governance. Each has its benefits and drawbacks, but both ultimately deliver the same results. The approach you take will depend on your preference.

Marc is the CTO for IBM Watson AI Strategic Partnerships. When not leading the technology vision and strategy for IBM Partnerships, Marc enjoys DJing, playing video games and wrestling.
