Curate an effective Genie space

The goal of curating a Genie space is to create an environment where business users can pose natural language questions and receive accurate, consistent answers based on their data. Genie spaces use advanced models that generate sophisticated queries and understand general world knowledge.

Most business questions are domain-specific, so a space curator’s role is to bridge the gap between that general world knowledge and the specialized language used in a specific domain or by a particular company. Curators use metadata and instructions to help Genie accurately interpret and respond to business users’ questions. This article outlines best practices and principles to guide you in developing a successful space.

Best practices for defining a new space

The following sections describe recommended practices for creating an effective space.

Start small

Curating a Genie space is an iterative process. When creating a new space, start as small as possible, with minimal instructions and a limited set of questions to answer. Then, expand the space as you iterate based on feedback and monitoring. This approach streamlines creating and maintaining your space and lets you curate it in response to real user needs.

Use the following guidelines to help create a small Genie space:

  • Stay focused: Include only the tables necessary to answer the questions you want the space to handle. Aim for five or fewer tables. The more focused your selection, the better. Keeping your space narrowly focused on a small amount of data is ideal, so limit the number of columns in your included tables.
  • Plan to iterate: Start with a minimal setup for your space, focusing on essential tables and basic instructions. Add more detailed guidance and examples as you refine the space over time, rather than aiming for perfection initially.
  • Build on well-annotated tables: Genie uses Unity Catalog column names and descriptions to generate responses. Clear column names and descriptions help produce high-quality responses. Column descriptions should offer precise contextual information. Avoid ambiguous or unnecessary details. Inspect any AI-generated descriptions for accuracy and clarity, and only use them if they align with what you would manually provide.

Have a domain expert define the space

An effective space creator needs to understand the data and the insights that can be gleaned from it. Data analysts who are proficient in SQL typically have the knowledge and skills to curate the space.

Define the purpose of your space

Identifying your space’s specific audience and purpose helps you decide which data, instructions, and test questions to use. A space should answer questions for a particular topic and audience, not general questions across various domains.

Test and adjust

You should be your space’s first user. After you create a new space, start asking questions. Carefully examine the SQL generated in response to your questions. If Genie misinterprets the data, questions, or business jargon, you can intervene by editing the generated SQL or providing other specific instructions. Keep testing and editing until you’re getting reliable responses.

After you’ve reviewed a question, you can add it as a benchmark question, which you can use to systematically test and score your space for overall accuracy. Use variations and different question phrasings to test Genie’s responses. See Use benchmarks in a Genie space.

See Troubleshooting for ideas on fixing erroneous responses.

Conduct user testing

After verifying response quality through testing, recruit a business user to try the Genie space. Use the following guidelines to provide a smooth user journey and collect feedback for ongoing improvement:

  • Set expectations that their job is to help refine the space.
  • Ask them to focus their testing on the specific topic and questions the space is designed to answer.
  • If they receive an incorrect response, encourage them to add clarifications in the chat to refine the answer. When Genie provides a correct response, they should upvote the final query to help minimize similar errors in future interactions.
  • Tell users to upvote or downvote responses using the built-in feedback mechanism.
  • Invite users to share additional feedback and unresolved questions directly with the space authors. Authors and editors can use feedback to refine instructions, examples, and trusted assets.

Consider providing training materials or a written document with guidelines for testing the space and providing feedback. As business users test the space, you’ll see the questions they’ve asked on the History tab. Continue adding instructions to help Genie correctly interpret the questions and data to provide accurate answers. See Review history and feedback to learn more about how to monitor Genie spaces.

Note

Business users must be members of the originating workspace to access your space. See Required permissions to learn how to provide the appropriate permissions to interact with the space.

Troubleshooting

The following sections outline how to resolve common problems.

Misunderstood business jargon

Most companies or domains have specific shorthand they use to communicate about business-specific events. For example, when referring to a year, it might always mean the fiscal year, and this fiscal year might start in February or March instead of January. To enable Genie to answer these questions naturally and accurately, include instructions that explicitly map your business jargon to words and concepts Genie can understand. See Provide instructions.
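An instruction like this works best when it states both the start month and which calendar year names the fiscal year, because those are exactly the details Genie cannot guess. The following Python sketch assumes, purely for illustration, a fiscal year that starts on February 1 and is named for the calendar year in which it starts; your company's convention may differ, and the names fiscal_year and fiscal_year_bounds are hypothetical.

```python
from datetime import date

# Hypothetical convention (an assumption, not a Genie setting):
# the fiscal year starts on the 1st of FISCAL_START_MONTH and is
# named for the calendar year in which it starts.
FISCAL_START_MONTH = 2  # February


def fiscal_year(d: date, start_month: int = FISCAL_START_MONTH) -> int:
    """Return the fiscal year that contains date d."""
    # Dates before the start month belong to the fiscal year
    # that began in the previous calendar year.
    return d.year if d.month >= start_month else d.year - 1


def fiscal_year_bounds(fy: int, start_month: int = FISCAL_START_MONTH) -> tuple[date, date]:
    """Return the first day of fiscal year fy and the first day of the next one (exclusive end)."""
    return date(fy, start_month, 1), date(fy + 1, start_month, 1)


print(fiscal_year(date(2024, 1, 15)))  # 2023
print(fiscal_year(date(2024, 2, 1)))   # 2024
```

Under this assumed convention, the matching instruction might read: "Fiscal year N runs from February 1 of year N up to, but not including, February 1 of year N+1."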

Incorrect table or column usage

If Genie pulls data from the wrong table or runs analysis on the wrong columns, try one of the following approaches:

  • Provide clear and precise descriptions: Check your tables and associated metadata to confirm that their terminology matches the terminology users use in their questions. If it does not, refine the description or add an instruction that maps the terminology used in the table to the terminology used in the questions.
  • Add example queries: Provide sample SQL queries that Genie can use to learn how to respond to certain questions. See Provide instructions.
  • Remove tables or columns from the space: Some tables might include overlapping columns or concepts that make it difficult for Genie to know which data to use in a response. If possible, remove unnecessary or overlapping tables or columns. You might want to create a view that includes only the necessary columns.

Filtering errors

Generated queries often include a WHERE clause to filter results according to a specific value. Because Genie doesn’t have visibility into the actual data, it might set the WHERE clause to filter for the wrong value. For example, it might try to match the name “California” when the table uses abbreviations like “CA.”

For situations like this, try the following strategy:

  • If the set of column values is reasonably small, enumerate the valid values in the column description. Put quotation marks around string values, especially if they contain spaces or numbers. For common enumerations, it is sometimes enough to say, “Use the three-letter country ISO code,” instead of listing every country.
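The enumeration step can be scripted. The following Python sketch takes the distinct values of a column (which you might collect with a SELECT DISTINCT query), quotes each one, and wraps the resulting description in an ALTER TABLE ... ALTER COLUMN ... COMMENT statement. The helper names, the table main.sales.customers, and the column state are hypothetical, and the generated statement is only a starting point that you should review before running it.

```python
def enumerated_description(summary: str, values: list[str]) -> str:
    """Append a sorted, de-duplicated, quoted list of valid values to a description."""
    # Quote every value so values containing spaces or digits stay unambiguous.
    quoted = ", ".join(f'"{v}"' for v in sorted(set(values)))
    return f"{summary} Valid values: {quoted}."


def comment_statement(table: str, column: str, description: str) -> str:
    """Build an ALTER TABLE statement that sets a column comment."""
    # Escape backslashes first, then single quotes, so the description
    # is a valid single-quoted SQL string literal.
    escaped = description.replace("\\", "\\\\").replace("'", "\\'")
    return f"ALTER TABLE {table} ALTER COLUMN {column} COMMENT '{escaped}'"


desc = enumerated_description("Two-letter US state abbreviation.", ["CA", "NY", "TX", "CA"])
print(desc)
print(comment_statement("main.sales.customers", "state", desc))
```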

Incorrect joins

If foreign key references are not defined in your Unity Catalog, your space might not know how different tables should be joined together.

Try implementing one or more of the following solutions:

  • Define foreign key references in your Unity Catalog when possible. See CONSTRAINT clause.
  • Provide example queries where you join tables together in standard ways.
  • If your tables’ foreign key relationships are not specified in your Unity Catalog, document them in the instructions.

If none of these resolve the problem, pre-join the tables into a view and use that view as input for the space instead. This strategy is helpful for more complex join scenarios, such as self-joins.
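The following Python sketch uses the standard-library sqlite3 module as a stand-in for Databricks SQL, only to show the shape of a pre-joined view for a self-join. The employees table and the employees_with_managers view are hypothetical; in your workspace, you would create the equivalent view in Unity Catalog and add it to the space in place of the base table.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for Databricks SQL.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        manager_id  INTEGER REFERENCES employees (employee_id)
    );
    INSERT INTO employees VALUES
        (1, 'Ana', NULL),
        (2, 'Ben', 1),
        (3, 'Cho', 1),
        (4, 'Dev', 2);

    -- Resolve the self-join once, so consumers of the view see an
    -- explicit manager_name column instead of a self-referencing key.
    CREATE VIEW employees_with_managers AS
    SELECT e.employee_id,
           e.name AS employee_name,
           m.name AS manager_name
    FROM employees AS e
    LEFT JOIN employees AS m
      ON e.manager_id = m.employee_id;
    """
)

rows = conn.execute(
    "SELECT employee_name, manager_name FROM employees_with_managers ORDER BY employee_id"
).fetchall()
print(rows)  # [('Ana', None), ('Ben', 'Ana'), ('Cho', 'Ana'), ('Dev', 'Ben')]
```

The LEFT JOIN keeps employees without a manager, so the view has one row per employee, just like the base table.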

Metric calculation issues

The way that metrics are computed and rolled up can be arbitrarily complicated and encompass many business details that your space doesn’t understand. This can lead to incorrect reporting.

Try implementing one or more of the following solutions:

  • If your metrics are aggregated from base tables, provide example SQL queries computing each roll-up value.
  • If your metrics have been pre-computed and are sitting in aggregated tables, explain this in table comments. Specify valid aggregations for each metric if the metrics in that table can be further rolled up.
  • If the SQL you’re trying to generate is very complicated, try creating views that have already aggregated your metrics for your space.
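Specifying valid aggregations matters because not every pre-computed metric rolls up the same way. The following Python sketch, using made-up daily figures and hypothetical column names, shows that summing daily revenue is valid, but averaging daily average order values is not; the correct roll-up recomputes the ratio from its additive parts.

```python
from dataclasses import dataclass


@dataclass
class DailyMetrics:
    # Hypothetical pre-aggregated row: one per day.
    revenue: float
    order_count: int

    @property
    def avg_order_value(self) -> float:
        return self.revenue / self.order_count


days = [
    DailyMetrics(revenue=1000.0, order_count=10),  # daily average 100.0
    DailyMetrics(revenue=300.0, order_count=1),    # daily average 300.0
]

# Valid: revenue is additive, so daily totals can simply be summed.
total_revenue = sum(d.revenue for d in days)

# Invalid: averaging daily averages ignores how many orders each day had.
naive_avg = sum(d.avg_order_value for d in days) / len(days)

# Valid: recompute the ratio from the additive components.
true_avg = total_revenue / sum(d.order_count for d in days)

print(total_revenue, naive_avg, true_avg)
```

A table comment reflecting this might say: "revenue can be summed across days; average order value must be recomputed as SUM(revenue) / SUM(order_count)."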

Incorrect time-based calculations

Genie might not always be able to infer the timezone represented in the data or the timezone in which your analysis needs to be performed unless you explicitly provide additional guidance.

Include more explicit instructions detailing the original source timezone, the conversion function, and the target timezone. The following examples show how to alter the general instructions for more reliable timezone conversions:

  • Always convert times to a specific timezone: In this example, assume that the source timestamps are in UTC and you want results in the America/Los_Angeles timezone. Add the following to the instructions, replacing <timezone-column> with the appropriate column name:
    • Timestamps in the tables are in UTC.
    • Convert all timestamps using the following function: convert_timezone('UTC', 'America/Los_Angeles', <timezone-column>).
  • Reference the current date in a local timezone: If the workspace default timezone is UTC but users in Los Angeles need “today” to mean the current date in Los Angeles, add the following to the space’s general instructions:
    • To reference today, use date(convert_timezone('UTC', 'America/Los_Angeles', current_timestamp())).

See convert_timezone function for more details and syntax.
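To see why the instruction needs an explicit conversion, the following Python sketch uses the standard-library zoneinfo module (Python 3.9+; some systems also need the tzdata package) to mirror the UTC-to-Los Angeles conversion described above. It only illustrates the arithmetic and is not Databricks code; the function name to_los_angeles is hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

LA = ZoneInfo("America/Los_Angeles")


def to_los_angeles(utc_naive: datetime) -> datetime:
    """Interpret a naive timestamp as UTC and return naive Los Angeles wall-clock time."""
    # Attach UTC, convert to the target zone (applying DST rules),
    # then drop tzinfo to get a plain local timestamp.
    return utc_naive.replace(tzinfo=timezone.utc).astimezone(LA).replace(tzinfo=None)


# 05:00 UTC on March 15, 2024 is still March 14 in Los Angeles (PDT, UTC-7).
ts = datetime(2024, 3, 15, 5, 0)
local = to_los_angeles(ts)
print(ts.date(), local.date())  # 2024-03-15 2024-03-14
```

Without the conversion, a question about "today" asked in the evening in Los Angeles would filter on the next UTC date.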

Ignoring instructions

Even if you have explained your tables and columns in comments and provided general instructions, Genie might still not use them correctly.

Try one or more of the following strategies:

  • Provide example queries that use your tables correctly. Example queries are especially effective for teaching your space how to use your data.
  • Create views from your tables that present a simplified version of your data.
  • Review your instructions and try to focus the space by removing irrelevant tables or instructions.
  • Try starting a new chat. Previous interactions might influence Genie’s responses in any given chat, but starting a new chat gives you a blank starting point for testing new instructions.

Performance issues

When Genie needs to generate exceptionally long queries or text responses, it can take a long time to respond or even time out during the thinking phase.

Try one or more of the following actions to improve performance:

  • Use trusted assets or views to encapsulate complex queries. See Use trusted assets in AI/BI Genie spaces.
  • Reduce the length of your example SQL queries whenever possible.
  • Start a new chat if Genie starts to generate slow or failing responses.

Unreliable responses to mission-critical questions

Use trusted assets to provide verified answers to specific questions that you expect users to ask. See Use trusted assets in AI/BI Genie spaces.

Token limit warning

Tokens are the basic units of text that Genie uses to process and understand language. Text included as instructions or metadata in a Genie space is converted into tokens. If the number of tokens in your space is nearing the limit, the product notifies you with warnings. Genie applies smart context filtering to select the tokens that represent the metadata and some types of instructions included in the Genie space. Even if you exceed the limit, the space should continue to generate responses to questions.

If your Genie space approaches the token limit, Genie might prioritize including only the parts of your table schema and instructions that are most relevant to the question. This can reduce response quality if important context gets filtered out. Consider the following practices to reduce token count:

  • Remove unnecessary columns: Unnecessary columns in your tables can significantly contribute to token usage. Create views to exclude redundant or non-essential fields from your raw tables.
  • Streamline column descriptions: While column descriptions are important, avoid duplicating information already conveyed by column names. For example, if a column is named account_name, a description like “the name of your account” might be redundant and can be omitted.
  • Simplify instructions: Verify that your instructions are clear and concise. Avoid unnecessary words.
  • Prune example SQL statements: Include a diverse range of example SQL statements to cover various types of questions but remove overlapping or redundant examples.
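To find candidates for streamlining column descriptions, a rough check can flag descriptions that add nothing beyond the column name. The following Python sketch is a hypothetical heuristic, not a Genie feature: it treats a description as redundant when, after removing a small, arbitrary list of filler words, every remaining word already appears in the column name.

```python
import re

# Arbitrary filler words ignored by this heuristic; tune them for your own data.
FILLER = {"the", "a", "an", "of", "your", "for", "this", "is", "column"}


def words(text: str) -> set[str]:
    """Lowercase text and split it on anything that is not a letter or digit."""
    return {w for w in re.split(r"[^a-z0-9]+", text.lower()) if w}


def is_redundant(column_name: str, description: str) -> bool:
    """Return True if the description only restates the column name."""
    informative = words(description) - FILLER
    return informative <= words(column_name)


columns = {
    "account_name": "The name of your account",
    "account_tier": 'Billing tier. Valid values: "free", "pro", "enterprise".',
}
for name, desc in columns.items():
    if is_redundant(name, desc):
        print(f"{name}: description restates the column name")
```

Treat the output as a review list rather than an automatic deletion list, because a short description can still carry useful context.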

Your account is not enabled for cross-Geo processing

Genie is a Designated Service managed by Azure Databricks. Designated Services use Databricks Geos to manage data residency. For some regions, data cannot be processed in the same Geo as the workspace. If your workspace is in one of those regions, cross-Geo processing must be enabled by your account administrator.