Lookout

Expert Trails guide Denodo users through all the relevant materials related to a specific topic, including official documentation, Knowledge Base articles, training, Professional Services offerings, and more. The main goal is to give users a single place with references to all the information they need to become a Denodo expert on any specific topic.

Solution Expert Trails are intended to provide a backcountry guide to the options available and the concepts that need to be considered in designing a comprehensive solution, with resources around each topic to prepare for the journey.  Solution Expert Trails are typically more open-ended: as we head into the backcountry, every organization needs to plot its own course based on the specific goals and resources of its implementation.

Basecamp

This Solution Expert Trail should be understood in the context of the Self-Service for Data Democratization use case. In particular, this Solution Expert Trail guides an organization on how to accelerate business access to data.

Before launching a Self-Service Analytics Initiative, it is important to consider the path before you.  This trail is one of planning and design; the trail that precedes implementation to ensure success.

Self-Service models are designed to remove the barriers between business users and the data they need to make decisions.  Typically, the primary driver is to enable and encourage business professionals to identify and analyze the data needed to answer their decision-driving business questions with minimal, if any, IT support.  This requires an underlying data model that is clear and easy to understand for users who typically have less expertise in data structures.  In some models, consuming users may even be encouraged to suggest or prototype their own data assets to serve the organization as new actionable insights are uncovered.  While the exact architecture may vary based on the user community and data context, the fundamentals of designing around consumption patterns and requirements are paramount.

The idea of Self-Service Analytics is often associated with a Data Fabric architecture, but the two are not strictly bound to one another.  Self-Service models can be implemented in practice without any intent to implement a Data Fabric.  However, as the Self-Service practice grows in adoption and maturity, it is likely to drift closer to the design concepts of a Data Fabric.  As such, considering the long term outlook from the outset can help to design a model that will gracefully transition from the immediate needs to a more comprehensive Data Strategy.

While this trail is designed with Self-Service Analytics in mind, many of the same considerations are relevant to other Self-Service models, such as Self-Service microservices or SOA, Self-Service dashboard development, and Self-Service app development.  The primary difference is in the target audience being served, as other models trend towards another layer of “IT” staff building deliverable application assets for the final end users; but there are also distinctions in the volume of data expected and in the degree of iterative feedback intended to refine specificity.

The Hike

Stage 1: Requirements

In designing a Self-Service model, the first step is to formalize and communicate the corporate strategy for Self-Service and how it fits in with the larger Data Strategy.  Successful organizations must define what they want to achieve with Self-Service, be it as simple as reducing delays in delivering reports or providing data access organization-wide.  This includes clearly defining the intended audience that will be served, and the scope of their data requirements, as well as the parties responsible for building, maintaining, and supporting the solution long term, and the processes to support them.

Initial design questions to answer include:

  • What are the business objectives of the initiative?
      • How will a Self-Service practice benefit the organization?
      • What are its limitations?  Risks?
      • How does it support the organization’s larger Data Strategy?
      • How will success be measured?
      • Are all groups aligned on these objectives?
  • What Roles need to be served?
      • Think not only of Roles as job titles or in a broad “report writer” vs “report user” sense; think of the distinct actors in the sense of different types of tasks that will need to be accomplished, as in an Agile Analysis cycle.
      • Define what the Self-Service experience should look like for each Role.
      • Design for growth and adaptability; Role definitions will likely get more granular and well defined as new parts of the project are opened and discussed with the appropriate stakeholders.
  • What Governance currently exists in well-documented form?
      • Is it well known?
      • Is it being enforced?
      • What rules (internal or regulatory) exist around exposing certain data to the business?
      • Are there existing protection mechanisms?
      • Are the sensitive data elements clearly identified?
      • What gaps exist in existing Governance?
      • What challenges does existing Governance put in the way of Self-Service?  (e.g. principle of least privilege, data shopkeeping, air gap requirements, etc.)
  • What general types of data exist in the enterprise?  How should these be grouped for business user discovery and consumption?
      • As with Roles, think broadly.  If the Data Marketplace is a Department Store, what department would a user visit to start their search?  Related data items should be within the same section.  The vast majority of users should be able to perform their job function without leaving their “section.”

In a recent press release, Gartner states that “Nearly Half of Finance Executives See Self-Service Data and Analytics as a Driver of Employee Productivity.”  The study further explains that “the advanced data and analytics and AI technologies that are driving (or are expected to deliver) high value, and where investment is expected to increase, include: self-service data analytics.”

However, companies must be mindful of the overall scope and objectives of any self-service initiative. These decisions help establish, early on, the scale of implementation, the types of users required and their technical proficiency, and the overall expectations for deliverables.

Best Practices

When designing, take into consideration several best practices specific to a Self-Service system, beyond the standard Data Solution Best Practices:

  1. Set realistic expectations for both developers and business users.  Successful Self-Service requires modeled, fit-for-purpose data that makes sense in business terms in order to drive successful adoption and intelligent business decisions.  Models that aren’t built with business needs in mind risk poor quality and underutilized data sets, likely driving ungoverned data acquisition outside the intended model.  Ensure the end user perspective is clearly understood and aligned with the implementation.
  2. Self-Service still requires IT involvement; letting users toy with data with no governance in place usually lets data democratization devolve into data anarchy, with misunderstood conclusions drawn from the data, if not outright inaccurate data.  Ensure the role of IT is clearly defined and understood at the outset, with plans to revisit and revise as needed as the plan meets reality.
  3. Organizational investment in the Self-Service strategy needs to ensure that the business teams are committed and motivated to make the program successful.  This includes training and onboarding programs which emphasize how the insights generated align directly with organizational goals and executive support for the Data Strategy and Vision.
  4. Ensure environmental infrastructure alignment with the Self-Service goal.  Does the self-service model change the scope of QA to require distinct environments to accommodate a more open-ended UAT cycle?  Does Self-Service require discovery by users outside the traditional firewall?  Or does it need to work within a Zero Trust Architecture?

Stage 2: Data Delivery

Several aspects of Data Delivery need to be considered when designing a Self-Service Analytics initiative, from phases and users to delivery models and analytics categories. In this section, we cover each of those considerations.

Data Delivery Phases

Delivering Data to the user community consists of three primary phases:

  1. Integration
  2. Delivery
  3. Communication

Integration

To serve the diverse analytical needs of business users, a Self-Service solution should enable the cleansing, integration, aggregation, and augmentation of data sets before delivering data in a suitable format to business users for reporting and further analysis.  Self-Service datasets should always be presented in a “ready to use” format, so the consumer can focus on business insights rather than the mechanics of their queries.  The integration of these source objects into usable business objects should typically follow the four tiers of Data Preparation: Cleansing, Integration, Aggregation, and Augmentation.

Typically, the augmented data assets built up in this stage should be reusable by multiple types of consuming users for different Analytic requirements.  How these assets are delivered to the different user types can often be distinct from the assets themselves.
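As a purely illustrative sketch, the following Python example shows how these four tiers might build on one another for a hypothetical orders data set.  In a Denodo implementation each tier would typically be a layer of virtual views rather than scripted code, so treat the names and data below as assumptions for demonstration only.

```python
import pandas as pd

# Hypothetical raw extracts; in practice these would come from source systems.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "cust_id": [10, 10, 11, None],                 # note the missing key
    "amount": ["100.0", "250.5", "80.0", "60.0"],  # amounts arrive as text
})
customers = pd.DataFrame({"cust_id": [10, 11], "region": ["EMEA", "AMER"]})

# Tier 1 - Cleansing: fix types and drop rows that cannot be attributed.
clean = orders.dropna(subset=["cust_id"]).assign(
    amount=lambda df: df["amount"].astype(float),
    cust_id=lambda df: df["cust_id"].astype(int),
)

# Tier 2 - Integration: combine cleansed data across sources.
integrated = clean.merge(customers, on="cust_id", how="left")

# Tier 3 - Aggregation: roll up to the grain business users ask about.
aggregated = integrated.groupby("region", as_index=False)["amount"].sum()

# Tier 4 - Augmentation: add derived, business-friendly measures.
ready = aggregated.assign(
    pct_of_total=lambda df: 100 * df["amount"] / df["amount"].sum()
)

print(ready)  # a "ready to use" data set for self-service consumers
```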

Delivery

A Self-Service solution has to satisfy the demands of business users with different levels of technical expertise and should have fit-for-purpose data delivery mechanisms for data analysis, reporting and visualization, sharing, and collaboration.

CIO Magazine explains that “Data quality is more than just accuracy. Quality must also be focused on data's usage and its timeliness, relevance and accuracy”.  Data and analytics leaders will be responsible for ensuring the quality of “fit for purpose” data and for aligning data requirements with the correct User Category.  While the Data Cleansing task in the Integration phase ensures the data is accurate, fitting that data to each User Category ensures the data is represented appropriately for the intended use.

Communication

In order for a solution to be successful, there must be clear communication not only on appropriate usage, executive endorsement, and realistic expectations, but also in soliciting feedback for continuous improvement.  Analysts need to be able to provide feedback on what is working and what is not, and to have a clear method to suggest improvements and new assets based on their findings to grow the Analytics practice.

Self-Service User Categories

Typical data users can be categorized into three broad roles, which can overlap and change depending on the specific assignment.  Within each Role or Actor identified in the Requirements stage, there can potentially be all three distinct types of users; the Self-Service model needs to serve these three distinct sets of requirements concurrently.  It is not critical to map specific individuals to each type (especially given not only the aforementioned fluidity, but also the likelihood of users self-identifying differently than their true usage patterns may exhibit), so much as to be aware of and design for the distinct requirements each presents.

Casual Users

Casual users consist of traditional business resources and usually have limited technical and BI skill sets.  Such users are typically not part of an Analytics practice, but may have similar self-service needs.  In most cases, their data assets require straightforward views that serve dynamic report and dashboard creation.

Power Users

Power users are skilled BI users who need a lot of flexibility and functionality. An effective Self-Service BI solution allows them to perform Analytics on different data entities with relationships and conditions that may not be predicted by developers.  To use this data efficiently they need clearly defined objects with documented relationships, granularity, and quality metrics.

Citizen Data Scientists

Citizen Data Scientists are users with the most advanced Analytics skills and highest demand for functionality in the Self-Service solution.  They often need to work with different versions of the same data sets, including visibility into unsanitized data.  A complete solution must cover topics including data exploration, modeling, and deploying a sandbox environment.

Data Discovery

While each of these types of users has a distinct way of working with the data, and may drive variations in the way data is represented in purpose-built data assets, the general workflow for self-service is typically very similar (a small illustrative sketch follows the list):

  1. Access a central system that provides an inventory of data assets available to them.  In a Denodo implementation, this would usually be the Data Catalog.
  2. Identify the assets needed to answer the question at hand.  This might involve:
      • Searching an indexed set of metadata such as identifier, description, hierarchical categorization, or tagged values.
      • Searching an indexed set of data contents curated for such purpose.
      • Browsing a structured hierarchy such as categories or folder structure.
      • Exploring relationships between objects starting from known or recently discovered elements.
      • Receiving recommendations based on automated analysis of usage patterns, or direct human suggestions from a colleague.
  3. If necessary, launch a request for access to the element(s) identified.
  4. Export connection details to access the asset(s) via their analytical client tool of choice.
  5. While analyzing the asset(s) via their client tool, an Analyst may return to the catalog to endorse the asset for a certain purpose, or to initiate a workflow request to report an issue, ask a question, or suggest an enhancement or variation, thereby continually enriching the data discovery process for the organization.
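To make the first four steps concrete, here is a minimal Python sketch against a purely hypothetical catalog REST API; the host, endpoints, credentials, and field names below are illustrative assumptions, not the actual Denodo Data Catalog interface.

```python
import requests

# Hypothetical catalog endpoint and credentials; not a real Denodo API.
CATALOG = "https://catalog.example.com/api"
session = requests.Session()
session.auth = ("analyst", "secret")  # assumed basic auth for this sketch

# Steps 1-2: search the inventory's indexed metadata for candidate assets.
hits = session.get(f"{CATALOG}/search",
                   params={"q": "customer churn", "type": "view"}).json()
for asset in hits["results"]:
    print(asset["name"], "-", asset["description"])

best = hits["results"][0]

# Step 2 (exploring relationships): pivot from a discovered element.
related = session.get(f"{CATALOG}/assets/{best['id']}/related").json()

# Step 3: if necessary, launch an access request for the identified asset.
session.post(f"{CATALOG}/access-requests",
             json={"asset_id": best["id"],
                   "justification": "Quarterly churn analysis"})

# Step 4: export connection details for the analyst's client tool of choice.
conn = session.get(f"{CATALOG}/assets/{best['id']}/connection").json()
print(conn["jdbc_url"])
```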

Analytics Categories

In parallel with the types of data users are the manners in which data can be analyzed, commonly grouped into Descriptive, Diagnostic, Predictive, and Prescriptive Analytics.  In many organizations, especially within the Casual and Power User community, only Descriptive and Diagnostic Analytics are even fathomed.  When implementing a Self-Service Analytics strategy, it is important not only to keep these distinct Analytics categories in mind when designing, but also to leverage the opportunity to educate and inform the user community about the latter two, which can help mature an organization not only into a culture of data-driven decision making, but also into a proactive decision model rather than a strictly reactive one.

Delivery Models

Based on the different types of users and Analytics the Self-Service library will serve, the final delivery model may take slightly different shapes.  In some cases, the same data may be represented in different ways for different types of users or usage; for example, a casual user might need a highly denormalized view with drag-and-drop simplicity of usage, while a data scientist may need more modular data assets to combine in less common ways for Predictive Analytics.  With a virtual model, there is no additional cost to making both objects available to users, though the distinct intentions of each should be clearly identified via meaningful metadata, typically supported by an access model that hides the less business-friendly views from less technical consumers.

  • Organizing Data Assets by usage pattern first, then by bounded domains makes it easier to isolate objects based on the type of Analytics they are intended for.  In this model, business area isolation can be managed by metadata and/or security, if appropriate, but encourages “cross-domain” Analytics, which can help spur innovation in the more complex categories.  This model, if strictly maintained, necessitates logical pointers or duplication of assets that do not need Analytic category specificity.
  • Organizing data assets at their coarsest level by bounded domains usually provides the most meaningful means of classification across all client tools.  This allows security to be applied easily based on business role, and for users to easily see the information most relevant to their job in one place.  In this model, more granular access would likely be needed to isolate Prescriptive and Cognitive Analytic Assets from the more common business-friendly Assets, but it encourages user skill growth by keeping everything in one place, and allows for easier reuse of assets that do not need Analytic category specificity.
  • A blend of both models would organize assets by bounded domain for casual and power users, with a separate section for objects intended for Data Science and more complex Analytics.  This allows isolation and organization while encouraging cross-unit usage, reducing the need for duplication or logical pointers.

Whatever the organizational model, defining standards around organization, naming conventions, and both informational and organizational metadata is critical to success.  Providing an interconnected architecture to enable discovery via exploration of related data, as well as navigational query mechanisms, helps ensure consumers take advantage of all the assets they have access to, with less need to enlist assistance in finding what they need.
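As a toy example of enforcing such standards, the sketch below validates a candidate asset against hypothetical rules; the layer-prefix naming convention and required metadata fields shown are assumptions for illustration, not Denodo requirements.

```python
import re

# Hypothetical convention: <layer>_<domain>_<name>, e.g. "bv_sales_orders".
NAME_PATTERN = re.compile(r"^(iv|bv|rv)_[a-z]+_[a-z0-9_]+$")
REQUIRED_METADATA = ("description", "category", "tags")

def validate_asset(name: str, metadata: dict) -> list[str]:
    """Return a list of standards violations for a candidate data asset."""
    problems = []
    if not NAME_PATTERN.match(name):
        problems.append(f"name '{name}' violates naming convention")
    for field in REQUIRED_METADATA:
        if not metadata.get(field):
            problems.append(f"missing required metadata field '{field}'")
    return problems

print(validate_asset("bv_sales_orders",
                     {"description": "Orders by region", "category": "Sales",
                      "tags": ["certified"]}))           # -> []
print(validate_asset("OrdersFinal", {"description": ""}))  # -> violations
```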

Stage 3: Governance

A governance model is all about defining the best practices, procedures, and responsibilities for efficient and secure usage of the Denodo platform.  The control over data and the way information gets handled will need to be determined between the Business and IT.

Gartner states that “The traditional Governance model fails to deliver the flexibility digital organizations need; the new one allows you to tailor your governance style to the business context” and recommends companies “Choose Adaptive Data Governance Over One-Size-Fits-All for Greater Flexibility.”  Overall, an Adaptive data governance model “allows organizations the flexibility to apply different governance approaches to specific business scenarios”.

There are three different modes of effective Data Governance, and all of these modes can co-exist depending on the business requirements and user base.

Business-Led:

Business users have oversight and control in this Self-Service mode. Users explore both governed and ungoverned data assets.

IT Owned:

IT has full ownership of the Self-Service solution and business users must send requests to add or modify data assets.  The IT Owned mode guarantees complete data integrity and safety with a single point of ownership.

Business - IT Hybrid:

This mode involves co-ownership between IT and Business leaders. Business users generate data assets from a high-quality, governed dataset produced by IT. Both IT and Business leaders define regulations and procedures while benefiting from accuracy and consistency in data.

In all three modes, it is important to establish a strong process for testing and stewardship in order to brand any given data asset as certified for production use.  Testing processes look at the technical efficiency of the design and the suitability of the design to be run at load with other data assets.  Stewardship processes should validate that the data asset is reliably delivering the data it claims to, that this data is coming from the appropriate system of record, that the asset is well documented, categorized, and tagged for discoverability and appropriate use, and that it meets corporate standards for terminology, data format, and security.  Stewardship responsibilities should, in turn, be well governed via tracking through metadata properties to show when an asset is fully curated.
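One lightweight way to track these stewardship checks is as metadata properties on each asset.  The following sketch models the responsibilities above as a hypothetical certification record; the field names are assumptions for illustration only.

```python
from dataclasses import dataclass

# Hypothetical checklist mirroring the stewardship responsibilities above.
@dataclass
class StewardshipRecord:
    asset: str
    delivers_claimed_data: bool = False
    sourced_from_system_of_record: bool = False
    documented_and_tagged: bool = False
    meets_terminology_standards: bool = False
    properly_secured: bool = False

    def is_certified(self) -> bool:
        # An asset is branded "certified" only when every check passes.
        return all((self.delivers_claimed_data,
                    self.sourced_from_system_of_record,
                    self.documented_and_tagged,
                    self.meets_terminology_standards,
                    self.properly_secured))

rec = StewardshipRecord("bv_sales_orders",
                        delivers_claimed_data=True,
                        sourced_from_system_of_record=True,
                        documented_and_tagged=True,
                        meets_terminology_standards=True,
                        properly_secured=True)
print(rec.is_certified())  # True -> eligible for the "certified" brand
```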

Stage 4: Iterative Improvement

Self-service models will, by definition, often generate feedback and requests for changes to the data assets available to the consumers.  It is critical not only to define mechanisms for gathering this feedback, but also to design a metadata model that is prepared to accommodate changes over time, typically including strong versioning conventions and clear deprecation communication.

Managing changes requires knowledge of current usage; this usually requires a strong monitoring practice with regular review of Self-Service usage patterns.  Setting up blanket rules for how long to maintain deprecated versions, or for how severe a change must be to warrant a new version, may work in smaller implementations; but as the Self-Service practice grows, it is usually beneficial to make more nuanced decisions based on the usage any given view is receiving.
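As an illustration of such a nuanced, usage-based decision, the sketch below flags deprecated view versions that appear quiet enough to retire; the usage log, thresholds, and view names are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical usage log: view name -> (queries in last 90 days, last access).
usage = {
    "bv_sales_orders_v1": (2,   date.today() - timedelta(days=80)),
    "bv_sales_orders_v2": (540, date.today() - timedelta(days=1)),
}

def retirement_candidates(usage, max_queries=5, idle_days=60):
    """Flag deprecated versions that are quiet enough to retire.

    The thresholds are illustrative; a real practice would tune them
    per view based on monitoring data and stakeholder feedback."""
    today = date.today()
    return [name for name, (count, last) in usage.items()
            if count <= max_queries and (today - last).days >= idle_days]

print(retirement_candidates(usage))  # -> ['bv_sales_orders_v1']
```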

Ongoing monitoring can also surface opportunities for improvement that do not originate with the consuming users.  Certain views may exhibit unexpected usage patterns that do not perform as well as intended; these can spawn new assets that fit the specific needs of the user community based on the usage patterns observed.

Exploration

Fill up your backpack with additional gear:

Success Stories

Case Studies

Program Design

Webinars

Additional Resources

Governance and Delivery

KB Articles

Expert Trails

Webinars

Podcasts

Additional Resources

Guided Routes

Denodo Training Courses

Denodo training courses provide expert data virtualization training for data professionals, including administrators, architects, and developers.

If you are interested in Self-Service Analytics, you should enroll in the following course(s):

Success Services

Denodo Customer Success Services can help you at the start or any part of your Self-Service Analytics trail. You can find information about the Denodo Success Services offering in:

Success Services

Advisory Sessions

Denodo customers with active subscriptions have access to request Meet a Technical Advisor sessions.

These are the sessions available related to Self-Service Analytics.

Development Methodology

Development Lifecycle

Assistance in defining the development lifecycle when working on a Denodo Platform project.

Testing Policies

Recommendations on defining testing policies. These will help you determine practical success criteria, i.e. when to move forward with planned enhancements, pivot to the next development cycle, or roll back an update:

  • Unit Tests
  • Integration Tests
  • System Tests: Regression Testing, Load Testing, Performance Testing, Functional Testing, and Security Testing
  • Acceptance Tests

Data Catalog & Governance

Data Catalog: Overview & Best Practices

Capabilities review and general best practices that can be followed to kick start your cataloging activities:

  • Metadata documentation and classification (tags, categories) for data stewardship
  • Advanced metadata and content search
  • Data exploration: Web-based query wizard for non-technical users, personal reports, and sharing options
  • Usage statistics

Data Governance: Overview & Best Practices

Governance capabilities of the Denodo Platform, i.e. relationships, data lineage, and model descriptions.

Adoption Plan

Define CoE & Administration Teams

Guidance on defining the roadmap and strategy for data virtualization adoption.

Guidance on building a methodology to follow for the adoption.

Guidance to define the roles and capabilities required.

Denodo Project Lifecycle

Project Lifecycle

Guidance on how to define the project lifecycle.

Solution Implementation

Task Lists / Project Plan

Provide guidelines and templates to define the project plan and tasks needed to implement a data virtualization solution.

Success Accelerators

In addition to Advisory sessions, Success Services includes Success Accelerators that can help you with:

  1. Vision and Strategy

Develop a Vision and Strategy for how an organization will collect, store, manage, share and communicate on overall data usage.  

Deliverables and Outcomes

  • Evaluation of Goals and Objectives with Data Virtualization
  • Vision Statement and strategic action plan aligned with organizational outcomes

  2. Business Case Discovery

Prioritize Data Virtualization Use Cases based on overall business value and feasibility to maximize the impact of your Data Virtualization program and investments.

 

Deliverables and Outcomes

  • Categorize and align selected Use Cases with Data Virtualization capabilities and expectations
  • Prioritize and rank recommended Use Cases based on expected Business Value and Organizational Feasibility for implementation

  3. Operating Model and CoE

Identify and formalize the DV roles, responsibilities, artifacts, and processes required to accelerate insights and deliver high-value business outcomes.

Deliverables and Outcomes

  • Selection Guidance and organizational recommendations for CoE deployment
  • Discussion on proactive Governance best practices aligned with selected Denodo Use Cases and Solutions

If you are a Denodo customer, you can reach out to your Customer Success Manager for details about any Guided Route that you need.

Big Hike Prep Check

Let’s see if you are ready to start your big trail. Take this 5-question questionnaire to check your readiness for an enjoyable hike.

Read the questions below, think about the answer, and check whether you got it right by looking at the solution. Have you become an expert?

  1. What eleven design Best Practices are recommended when designing an Analytics Self-Service solution?

Click here to check if you got it right

  1. Consideration of interplay of consumption and discovery mechanisms
  2. Definition of clear usage patterns
  3. Implementation of strong Governance
  4. Architecting for bounded domains
  5. Establishing a strong Monitoring practice
  6. Planning for continual improvement
  7. Early definition of a pilot program and communication plan
  8. Setting realistic expectations
  9. Clearly defining the scope of IT involvement
  10. Securing organizational investment
  11. Ensuring environmental infrastructure alignment with the goals

Find more details in the Top 7 Business Best Practices for Data Projects blog and in Stage 1 of the Hike of this Expert Trail.

  2. What are the four tiers of effective data preparation?

Click here to check if you got it right

  1. Data Cleansing
  2. Data Integration
  3. Data Aggregation
  4. Data Augmentation

Find more information in the Elevating Data Integration: A Four-Tier Approach to Effective Data Preparation article.

  3. What are five common methods consuming users use to discover what data assets they need to answer a given business question?

Click here to check if you got it right

  1. Searching metadata (such as identifiers, descriptions, categorization, tags, and properties)
  2. Searching indexed representations of accessible data
  3. Browsing a structured hierarchy of assets organized by business terms
  4. Exploring relationships between assets starting from a known asset
  5. Receiving recommendations from human colleagues and/or automated AI analysis

See the Data Discovery section in Stage 2 of this hike for more information.

  4. Whether a Governance effort is Business-led, IT-led, or a hybrid, what are the five primary Governance responsibilities of Data Stewards?

Click here to check if you got it right

  1. Validate that each data asset is delivering the data it claims to
  2. Validate the data is coming from the appropriate, trusted systems of record
  3. Ensure each asset is well documented, categorized, and tagged for discoverability and appropriate use
  4. Ensure each asset is using appropriate corporate terminology and format standards
  5. Ensure each asset is properly secured according to corporate standards and use case

For more information, revisit Stage 3 of this hike, covering Governance.

  5. What two mechanisms are critical for improving a self-service ecosystem over time?

Click here to check if you got it right

  1. Clear communication channels for announcing change and gathering feedback.
  2. An ongoing monitoring practice to find unexpected usage patterns that may be underperforming or similar outlier patterns prime for improvement.

Double back on your hike to Stage 4: Iterative Improvement for more information.
