Embedded Data Lake

Mastering SAP HANA Cloud Development with HDI Containers and SAP Datasphere

Introduction

Organizations increasingly need to store and analyze their transactional data and their “big data” (unstructured text, video, and so on) together. Historically, this has been a challenge because different types of repositories were required depending on the type of data being processed. Fortunately, solutions to this long-standing challenge are becoming a reality, and the integration of enterprise data with big data has become a pivotal strategy for organizations seeking to derive actionable insights. SAP introduced an embedded data lake to SAP Datasphere specifically to address this challenge. This blog delves into the potential of the Embedded Data Lake within SAP Datasphere, addressing common data integration challenges and unlocking added business value.

The Challenge

Across industries, enterprises grapple with the complexities of integrating SAP transactional data with other types of data. This challenge is rooted in the historical evolution of data repositories: until relatively recently, different types of repositories were required depending on the type of data being processed. Data warehouses excel as repositories for structured, transactional data; data lakes excel as repositories for raw, unstructured, and semi-structured data. But they stand as separate silos, with the following implications:

  • Complexity of Data Analysis: Managing, integrating, and analyzing data across multiple repositories is a challenge. Because the data does not sit in one unified environment, business users must navigate multiple systems, creating extra overhead and inefficiencies.
  • Cost Implications: With multiple repositories, organizations face additional expenditures on software, hardware, licensing, and appropriately skilled resources.
  • Operational Overheads: Solutions for items such as data tiering and archiving need to be designed for each repository, creating additional operational overhead.


Meeting the Challenge: Embedded Data Lake in SAP Datasphere

In a strategic move to address these challenges head-on, SAP unveiled SAP Datasphere, the evolutionary successor to SAP Data Warehouse Cloud, on March 8, 2023. A cornerstone of this innovative offering is the integration of an Embedded Data Lake, providing a seamless and unified data management experience within the SAP ecosystem.

Understanding the Embedded Data Lake

What is a Data Lake?

Before exploring the specifics of the Embedded Data Lake, it’s essential to understand the concept of a data lake. A data lake is a centralized repository that allows organizations to store all their structured and unstructured data at any scale. Unlike traditional data storage systems, data lakes can retain data in its raw format, enabling advanced analytics and deriving valuable insights from diverse data sources.

Embedded Data Lake in SAP Datasphere

An embedded data lake in SAP Datasphere integrates the powerful data lake functionality directly within the SAP environment. This integration provides users with a unified platform where they can store, manage, and analyze their data, leveraging SAP’s advanced analytics tools and applications. By embedding a data lake within SAP Datasphere, organizations can streamline their data management processes and unlock new possibilities for data-driven decision-making.
Benefits of Embedded Data Lake in SAP Datasphere

Unified Data Management

The Embedded Data Lake facilitates seamless integration of data within a single platform, streamlining data management processes and reducing operational complexity. The centralized nature of the data lake ensures that all relevant data is readily available, empowering users to make informed choices based on the most up-to-date information.

Scalability and Cost Efficiency

By leveraging the cost-effective storage options within SAP Datasphere and eliminating the need for separate repository solutions, data integration tooling, and infrastructure, organizations can optimize their data management costs. In this way, the Embedded Data Lake drives cost efficiencies and maximizes ROI for businesses.

Data Tiering Scenarios: Cold-to-Hot and Hot-to-Cold

Effective data management often requires balancing performance and cost, which is where data tiering comes into play. The Embedded Data Lake in SAP Datasphere supports two data tiering scenarios to optimize your data storage strategy.

  • Cold-to-Hot: In a Cold-to-Hot tiering scenario, data that is initially stored in a cold tier (less frequently accessed and lower cost) is moved to a hot tier (frequently accessed and higher cost) as it becomes more relevant for real-time analysis. This ensures that critical data is readily available when needed, without incurring high storage costs for less frequently accessed data.
  • Hot-to-Cold: Conversely, in a Hot-to-Cold tiering scenario, data that starts in a hot tier (frequently accessed) is moved to a cold tier (less frequently accessed) as its relevance decreases over time. This helps manage storage costs by keeping only the most relevant data in the more expensive, high-performance storage tier.
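The tiering decision itself is typically a simple policy driven by access recency. As a minimal illustration (this is not SAP code; the 90-day threshold and the record layout are hypothetical), a hot-to-cold sweep might look like this:

```python
from datetime import datetime, timedelta

# Hypothetical policy: records untouched for more than 90 days move to the cold tier.
COLD_AFTER = timedelta(days=90)

def assign_tier(last_accessed: datetime, now: datetime) -> str:
    """Return 'hot' or 'cold' based on how recently the record was accessed."""
    return "cold" if now - last_accessed > COLD_AFTER else "hot"

now = datetime(2024, 6, 1)
records = [
    {"id": 1, "last_accessed": datetime(2024, 5, 20)},  # accessed recently -> stays hot
    {"id": 2, "last_accessed": datetime(2023, 11, 1)},  # stale -> moves to cold
]
tiers = {r["id"]: assign_tier(r["last_accessed"], now) for r in records}
print(tiers)  # {1: 'hot', 2: 'cold'}
```

A cold-to-hot move is simply the inverse decision, triggered when data regains relevance for real-time analysis.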


Real-Time Analytics

With SAP Datasphere’s real-time processing capabilities, organizations can derive actionable insights from data in real-time, enabling agile decision-making.

In Conclusion – A Point of View

The Embedded Data Lake in SAP Datasphere represents a paradigm shift. By leveraging the full power of SAP Datasphere, it paves the way for a future where data-driven decision-making is not just a possibility but a reality. As we look towards the future, the Embedded Data Lake stands poised to revolutionize the way we harness the power of data, ushering in a new era of innovation and growth. Feel free to reach out to us with questions or to schedule a free live demonstration of the SAP Datasphere embedded data lake.


HDI Containers


What Are HDI Containers?

Before we get into the nitty-gritty, let’s demystify HDI containers. HDI stands for SAP HANA Deployment Infrastructure, a key service that helps you deploy database development artifacts into containers. Think of them as specialized storage units for your database artifacts. These artifacts include:

  • Tables
  • Views
  • Procedures
  • Advanced Artifacts: Calculation views, flowgraphs, replication tasks

The beauty of HDI is that it maintains a consistent set of design-time artifacts that describe the target state of SAP HANA database features, streamlining both development and deployment processes.
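For example, a design-time table artifact is a plain text file that declares the target state of the table; on each deployment, HDI computes and applies whatever changes are needed to reach that state. A hypothetical `Orders.hdbtable` (the table and column names are illustrative, not from any real project) might look like this:

```sql
-- Orders.hdbtable: design-time artifact describing the table's target state
COLUMN TABLE Orders (
  OrderID   INTEGER       NOT NULL,
  Customer  NVARCHAR(100),
  Amount    DECIMAL(15,2),
  PRIMARY KEY (OrderID)
)
```

Because the file describes the desired end state rather than a sequence of ALTER statements, the same artifact works for both initial creation and subsequent schema changes.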

Integrating HDI Containers with SAP Datasphere

SAP Datasphere allows the assignment of built HDI containers to its spaces, providing immediate bi-directional access between HDI containers and Datasphere spaces without requiring data movement. This integration enhances flexibility and efficiency in data management and modeling processes.

  • Deploy HDI Containers: Use SAP Business Application Studio (BAS) to create and deploy HDI containers in the underlying SAP HANA Cloud database.
  • Assign Containers to Spaces: In SAP Datasphere, enable HDI Container access and assign the deployed HDI containers to specific spaces to access their objects and content immediately.
  • Refine Models in SAP Datasphere: Use the Data Builder in SAP Datasphere to create and refine models within your HDI containers. You can combine these models with others in Datasphere, ensuring seamless integration.
  • Refine Models in HDI Containers: Allow models and datasets from SAP Datasphere’s space schema to be utilized within your HDI containers, enabling a two-way interaction.
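On the deployment side, an HDI container is typically declared in the project's `mta.yaml` descriptor, which SAP Business Application Studio uses to build and deploy. A minimal sketch (the module and resource names here are illustrative):

```yaml
_schema-version: "3.1"
ID: my-hdi-project
version: 1.0.0
modules:
  - name: db                       # HDB module holding the design-time artifacts
    type: hdb
    path: db
    requires:
      - name: my-hdi-container
resources:
  - name: my-hdi-container         # provisions the HDI container service instance
    type: com.sap.xs.hdi-container
```

Once the container is deployed to the underlying SAP HANA Cloud database, it becomes available for assignment to a Datasphere space as described above.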

Business Use Cases for HDI Containers within SAP Datasphere

HDI container-based developments support a wide range of scenarios, including:

  • Migration from HANA Enterprise Data Mart to SAP Datasphere: Organizations can leverage multi-model analytics capabilities while migrating from HANA Enterprise Data Mart to SAP Datasphere. This transition allows for advanced data analytics and modeling within a modern, integrated environment.
  • Migration from SAP BW to SAP Datasphere: By utilizing native HANA developments, companies migrating from SAP BW to SAP Datasphere can maintain their existing data processes and enhance their data warehousing capabilities with the advanced features of SAP HANA Cloud.
  • External OData Consumption or Web API Exposure: SAP Datasphere enables the publication of space objects as external OData services or Web APIs. This capability facilitates seamless data sharing and integration with external applications and services.
  • Complex On-Prem Use Cases: Handle complex on-premise scenarios where constraints limit a full move to Datasphere-native modeling.
  • Complex DB Procedures for Actionable Functionality: Develop and manage complex database procedures to implement actionable functionalities.
  • HANA Sidecar Phased Retirement: Gradually retire HANA sidecar systems by integrating with SAP Datasphere.
  • Migrate PAL and APL Use Cases: Migrate Predictive Analysis Library (PAL) and Automated Predictive Library (APL) use cases from on-premises to HANA Cloud.
  • Leverage Machine Learning Capabilities: Utilize embedded machine learning and advanced analytics within SAP Datasphere without data extraction.
  • Data Science Enrichment: Use existing Python or R environments to trigger calculations in SAP Datasphere, train ML models, and store prediction results in HDI container tables.
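On the consumption side, an exposed OData service returns standard JSON, so virtually any client stack can work with it. The sketch below uses only the Python standard library; the payload is a mocked OData V4-style response (not output from a real Datasphere endpoint), shown purely to illustrate the envelope shape a consumer would parse:

```python
import json

# Mocked OData V4 response body, shaped like what an exposed entity set returns.
payload = """
{
  "@odata.context": "$metadata#SalesOrders",
  "value": [
    {"OrderID": 1001, "Amount": 250.0},
    {"OrderID": 1002, "Amount": 125.5}
  ]
}
"""

doc = json.loads(payload)
rows = doc["value"]                     # OData wraps entity sets in a "value" array
total = sum(r["Amount"] for r in rows)  # simple client-side aggregation
print(len(rows), total)  # 2 375.5
```

In a real integration, the payload would come from an authenticated HTTP GET against the published service URL rather than a string literal.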

Benefits of HDI Containers in SAP Datasphere

The integration of HDI containers within SAP Datasphere offers several significant advantages:

  • Immediate Access: Objects and content of HDI containers are instantly accessible within SAP Datasphere spaces without the need for data movement.
  • Seamless Workflow: Users can harness SAP HANA Cloud’s advanced features while enjoying a user-friendly environment in SAP Datasphere.
  • Advanced Data Modelling: HDI containers support complex developments and provide advanced functionalities that complement the user-oriented features of SAP Datasphere.
  • Git Versioning: HDI introduces the usage of versioning tools like Git, which helps in conflict resolution and allows many developers to develop in parallel without interference. This supports modern development styles and accelerates development cycles on the database.
  • Life Cycle Management: Supports automated CI/CD pipelines for efficient life cycle management.
  • Higher Parallelism: HDI supports higher parallelism with no singleton deployment, allowing for more efficient and faster deployment processes.
  • Debugging and Performance Optimization: HDI provides robust debugging and performance optimization capabilities, leveraging SAP HANA optimization techniques such as pruning and parallelization to ensure high performance.

Conclusion

Combining the development strengths of HDI containers with the user-friendly features of SAP Datasphere offers the best of both worlds. This hybrid approach supports advanced and complex data developments while ensuring ease of use and maintainability. For large projects with multiple developers, the choice between HANA and Datasphere will depend on specific requirements, such as the need for version control and Git integration.

By leveraging HDI containers in SAP Datasphere, organizations can achieve seamless data management and complex data modeling capabilities, ultimately enhancing their data warehousing solutions.

For more detailed guidance on implementing HDI container-based developments in SAP Datasphere, refer to the comprehensive resources available on the SAP Community.

Feel free to contact us with questions or to schedule a demonstration of this capability.
