Data privacy laws are becoming a major focus globally as businesses scramble to meet new compliance obligations.

Privacy rules generally oblige any business or organization that collects or processes data to store it securely, and what they may do with that data is strictly regulated.

According to a Gartner report, by the end of next year about 65% of the world’s population will have their personal data covered under modern privacy rules. Complying with this expanding body of rules can be challenging.

For the past 20 years, the harvesting of personal data from electronic transactions and the growing use of the internet have given companies almost free rein.

Many organizations involved in international commerce must now modify their procedures to comply with the new laws, a priority for transactions and correspondence involving e-commerce and social media.

Expanding consumer mistrust, government action, and competition for customers have prompted some governments to introduce stricter rules and regulations. Their effect is to rein in a no-man’s land that has allowed large companies and small businesses alike to run rampant with people’s personal data.

“The biggest challenge companies face by far is maintaining the amount of data they manage, which is subject to ever-changing data privacy requirements,” Neil Jones, director of cybersecurity evangelism at Egnyte, told TechNewsWorld.

Classification of Different Demands

The European Union has the General Data Protection Regulation (GDPR). According to Jones, in the UK and Continental Europe, data privacy has generally been viewed as a fundamental human right. In the US and Canada, businesses must navigate a growing patchwork of state and provincial laws.

Data privacy law in the US and Canada has traditionally been more fragmented than in the UK and Europe. Canada’s Quebec and the US states of Utah and Connecticut are the latest to enact comprehensive data privacy laws, joining California, Virginia, and Colorado.

By the end of 2023, 10% of states in the US will be covered by data privacy legislation, Jones said. The lack of a universal standard for data privacy has created an artificial layer of business complexity.

In addition, today’s hybrid work environment has created new levels of risk, complicating compliance with myriad privacy requirements.

What’s at Stake

To increase productivity, organizations may need to ask employees detailed questions about their behavior and work-from-home arrangements. According to Jones, these types of questions can create unintended privacy implications of their own.

The recent convergence of Personally Identifiable Information (PII) and Protected Health Information (PHI) has put even highly confidential data at risk. This includes confidential test results such as workers’ compensation reports, health records of employees and patients, and COVID-19 information.

“With 65% of the world’s population expected to have personal data covered under privacy regulations by next year, respecting data privacy has never been more important,” Jones said.

Cloud Privacy Barriers

Data privacy and security are the top challenges for implementing a cloud strategy, according to a recent study by IDG, the research firm now rebranded as Foundry. Throughout the study, data security stood out as a major concern.

When implementing a cloud strategy, IT decision-makers (ITDMs) face challenges such as controlling cloud costs, data privacy and security concerns, and a lack of cloud security skills and expertise.

With more focus on securing private data, the problem grows as more organizations migrate to the cloud. The two main obstacles the IDG study found were data privacy and security challenges and a lack of cloud security skills and expertise.

According to Foundry, spending on cloud infrastructure has increased by about $5 million this year.

“Although enterprise businesses are leading the charge, SMBs are not far behind when it comes to cloud migration,” said Stacey Rapp, marketing and research manager at Foundry, when the report was released.

“As more organizations move towards living entirely in the cloud, IT teams will need the appropriate talent and resources to manage their cloud infrastructure and overcome any security and privacy barriers that may occur in the cloud,” Rapp added.

Obtaining Compliance

Organizations can successfully prepare for data privacy legislation, but doing so requires making data privacy initiatives a “full-time job,” Jones maintained.

“Many organizations view data privacy as a part-time project for their web teams, not a full-time business initiative that can significantly impact customer relationships, employee morale and brand reputation,” he offered.

Beyond that step comes establishing holistic data governance programs that provide greater visibility into a company’s regulated and sensitive data. Added to this is working with trusted business and technology partners who understand the data privacy space and can help you prepare for rapidly evolving regulations.

Jones suggests that perhaps the most dynamic approach is to use advanced privacy and compliance (APC) solutions, which enable organizations to comply with global privacy regulations from one place.

Specifically, APC solutions can help achieve compliance by (a brief illustrative sketch follows the list):

  • Managing Data Subject Access Requests (DSARs), such as an individual’s right to be notified of the personal data collected on them, the right to opt out of personal information being sold to others, or the right to be forgotten by the collecting organization
  • Assessing the company’s compliance readiness and scope against specific regulations (e.g., GDPR, CCPA)
  • Creating and reviewing technical assessments of third-party vendors and evaluating potential risks to consumer data
  • Enhancing cookie consent capabilities, such as integrating cookie consent into compliance workflows
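To make the DSAR workflow above concrete, here is a minimal Python sketch of how an organization might route the three request types named in the list. The PrivacyStore class and its methods are hypothetical illustrations, not the API of any specific APC product.

```python
from dataclasses import dataclass, field

@dataclass
class PrivacyStore:
    """Toy in-memory stand-in for the systems an APC tool would query."""
    records: dict = field(default_factory=dict)    # subject_id -> personal data
    do_not_sell: set = field(default_factory=set)  # subjects who opted out of sale

    def handle_dsar(self, subject_id: str, request_type: str):
        """Route a Data Subject Access Request (DSAR) by type."""
        if request_type == "access":
            # Right to be notified of / access the data collected on them
            return self.records.get(subject_id, {})
        if request_type == "opt_out":
            # Right to opt out of personal information being sold to others
            self.do_not_sell.add(subject_id)
            return {"status": "opted out of sale"}
        if request_type == "erasure":
            # Right to be forgotten by the collecting organization
            self.records.pop(subject_id, None)
            return {"status": "erased"}
        raise ValueError(f"unknown DSAR type: {request_type}")

store = PrivacyStore(records={"u123": {"email": "user@example.com"}})
print(store.handle_dsar("u123", "access"))   # returns the stored personal data
print(store.handle_dsar("u123", "erasure"))  # removes the record
```

A real APC deployment would also log each request and its deadline for audit purposes; the point here is only the routing of the three request types.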

Proactive Steps

It can be difficult for companies to understand today’s rapidly evolving privacy landscape, as well as how specific rules apply to them, Jones said. However, by taking proactive steps, organizations can stay on top of data privacy regulations in the future.

Those steps include these ongoing tasks:

  • Monitor the status of data privacy regulations in the countries, provinces and states where the customer base resides
  • Create a data privacy task force that can improve organizational focus and increase senior executive focus on privacy initiatives
  • Be aware of new federal data privacy legislation, such as the proposed American Data Privacy and Protection Act (ADPPA) in the US

It is also important to note the long-term benefits of data privacy compliance, specifically strengthening the company’s overall cybersecurity protections.

The cost of cleaning up data often lies beyond the comfort zone of businesses awash in potentially dirty data. Yet doing that work paves the way for reliable and compliant corporate data flows.

According to Kyle Kirwan, co-founder and CEO of data observability platform Bigeye, few companies have the resources needed to develop tools for challenges such as large-scale data observability. As a result, many companies are essentially flying blind, reacting when something goes wrong instead of continuously addressing data quality.

A data trust provides a legal framework for the management of shared data. It promotes cooperation through common rules for data security, privacy, and confidentiality, and enables organizations to securely connect their data sources to a shared repository of data.

Bigeye brings together data engineers, analysts, scientists, and stakeholders to build trust in data. Its platform helps companies create SLAs for monitoring and anomaly detection, ensuring data quality and reliable pipelines.

With full API access, a user-friendly interface, and automated yet flexible customization, data teams can monitor quality, consistently detect and resolve issues, and ensure that every user can rely on the data.
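As an illustration of the kind of SLA-style check described above, the following sketch expresses a freshness target for a table and evaluates it. The function name and thresholds are hypothetical and do not reflect Bigeye’s actual API.

```python
from datetime import datetime, timedelta, timezone

def check_freshness_sla(last_loaded_at: datetime, max_lag: timedelta) -> dict:
    """Evaluate a simple data-freshness SLA: the table must have been
    loaded within `max_lag` of the current time."""
    lag = datetime.now(timezone.utc) - last_loaded_at
    return {
        "metric": "hours_since_last_load",
        "observed_hours": round(lag.total_seconds() / 3600, 2),
        "threshold_hours": max_lag.total_seconds() / 3600,
        "sla_met": lag <= max_lag,
    }

# Example: an orders table expected to refresh at least every 6 hours.
result = check_freshness_sla(
    last_loaded_at=datetime.now(timezone.utc) - timedelta(hours=8),
    max_lag=timedelta(hours=6),
)
print(result)  # sla_met will be False because the table is 8 hours stale
```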

Uber Data Experience

Two early members of Uber’s data team — Kirwan and Bigeye co-founder and CTO Egor Gryaznov — set out to use what they learned building at Uber’s scale to create easy-to-deploy SaaS tools for data engineers.

Kirwan was one of Uber’s first data scientists and the first metadata product manager. Gryaznov was a staff-level engineer who managed Uber’s Vertica data warehouse and developed a number of internal data engineering tools and frameworks.

They realized that the tools their team was building to manage Uber’s vast data lake and its thousands of internal data users were not available to most data engineering teams.

Automatically monitoring and detecting reliability issues within thousands of tables in a data warehouse is no easy task. Companies like Instacart, Udacity, Docker, and Clubhouse use Bigeye to keep their analytics and machine learning working reliably.

A Growing Area

When they founded Bigeye in 2019, they recognized the growing problem of enterprises deploying data in operational workflows, machine learning-powered products and services, and high-ROI use cases such as strategic analytics and business intelligence-driven decision-making.

The data observability space saw several entrants in 2021. Bigeye differentiates itself from that pack by giving users the ability to automatically assess customer data quality with over 70 unique data quality metrics.

Thousands of anomaly detection models are trained on these metrics to ensure that data quality problems – even those that are hardest to detect – never get ahead of data engineers.

Last year, data observability burst onto the scene, with at least ten data observability startups announcing significant funding rounds.

Kirwan predicted that this year data observability will become a priority for data teams as they seek to balance the demands of managing complex platforms with the need to ensure data quality and pipeline reliability.

Solution Rundown

Bigeye’s data platform is no longer in beta. Some enterprise-grade features are still on the roadmap, such as full role-based access control. But others, such as SSO and in-VPC deployment, are available today.

The app is closed source and uses proprietary models for anomaly detection. Bigeye is a fan of open-source alternatives but decided to develop its own models to meet internally set performance goals.

Machine learning is used in a few key places to bring a unique mix of metrics to each table in a customer’s connected data sources. Anomaly detection models are trained on each of those metrics to detect abnormal behavior.
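Bigeye’s production models are proprietary, so the robust z-score detector below is only a simplified stand-in to illustrate the idea of fitting a baseline to each metric’s history (here, daily row counts) and flagging abnormal new values.

```python
import statistics

def fit_baseline(history: list[float]) -> tuple[float, float]:
    """Learn a robust baseline (median and MAD) from a metric's history."""
    med = statistics.median(history)
    mad = statistics.median(abs(x - med) for x in history) or 1.0
    return med, mad

def is_anomalous(value: float, baseline: tuple[float, float], threshold: float = 3.5) -> bool:
    """Flag a new observation whose robust z-score exceeds the threshold."""
    med, mad = baseline
    robust_z = 0.6745 * (value - med) / mad
    return abs(robust_z) > threshold

# Daily row counts for a table over two weeks, then a suspicious drop.
history = [10_120, 10_340, 9_980, 10_210, 10_450, 10_050, 10_300,
           10_180, 10_400, 9_950, 10_270, 10_330, 10_090, 10_240]
baseline = fit_baseline(history)
print(is_anomalous(10_310, baseline))  # False: within normal variation
print(is_anomalous(3_200, baseline))   # True: likely a broken pipeline
```

In a real system, a detector like this would run per metric, per table, with thresholds tuned to keep false positives low.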

Three features built in late 2021 automatically detect and alert on data quality issues and enable data quality SLAs.

The first, deltas, makes it easy to compare and validate multiple versions of any dataset.

The second, Issues, brings multiple alerts together with valuable context about related problems, making it easier to document past fixes and speed up resolution.

The third, the dashboard, provides a holistic view of data health, helping to identify data quality hotspots, close gaps in monitoring coverage, and measure a team’s improvement in reliability.
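To illustrate the general idea behind the first of these features, here is a hand-rolled sketch that compares two versions of a dataset on row count and per-column null rates. It is not the Deltas feature itself; the pandas-based comparison is an assumption for illustration only.

```python
import pandas as pd

def compare_versions(old: pd.DataFrame, new: pd.DataFrame) -> pd.DataFrame:
    """Compare two versions of a dataset on per-column null rates and row count."""
    summary = pd.DataFrame({
        "null_rate_old": old.isna().mean(),
        "null_rate_new": new.isna().mean(),
    })
    summary["null_rate_drift"] = summary["null_rate_new"] - summary["null_rate_old"]
    # Stash the row-count change alongside the per-column summary.
    summary.attrs["row_count_change"] = len(new) - len(old)
    return summary

old = pd.DataFrame({"id": [1, 2, 3], "email": ["a@x.com", "b@x.com", None]})
new = pd.DataFrame({"id": [1, 2, 3, 4], "email": [None, None, "c@x.com", None]})
report = compare_versions(old, new)
print(report)
print("Row count change:", report.attrs["row_count_change"])
```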

Eyeballing the Data Warehouse

TechNewsWorld spoke with Kirwan to uncover some of the complexities of the data observability platform his company provides to data scientists.

TechNewsWorld: What makes Bigeye’s approach innovative or cutting edge?

Kyle Kirwan, Bigeye co-founder and CEO

Kyle Kirwan: Data observability requires consistent and thorough knowledge of what is happening inside all the tables and pipelines in your data stack. It is similar to what SRE [site reliability engineering] and DevOps teams use to keep applications and infrastructure working around the clock, but repurposed for the world of data engineering and data science.

While data quality and data reliability have been issues for decades, data applications are now central to how many major businesses run, and any data loss, outage, or degradation can quickly translate into lost revenue and customers.

Without data observability, data teams must continually react to data quality issues and untangle problems as they go about using the data. A better solution is to proactively identify problems and fix the root causes.

How does this affect trust in data?

Kirwan: Often, problems are discovered by stakeholders, such as executives who do not trust their frequently broken dashboards, or users who get confusing results from in-product machine learning models. Data engineers can get ahead of problems and prevent business impact if they are alerted early enough.

How does this concept differ from similar-sounding technologies like integrated data management?

Kirwan: Data observability is a core function within data operations (think: data management). Many customers look for best-of-breed solutions for each task within data operations. This is why technologies like Snowflake, Fivetran, Airflow, and dbt are exploding in popularity. Each is considered an important part of the “modern data stack” rather than a one-size-fits-all solution.

Data observability, data SLAs, ETL [extract, transform, load] code version control, data pipeline testing, and other techniques must be used to keep modern data pipelines working smoothly, just as high-performing software engineering and DevOps teams use their counterpart technologies.
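To give a flavor of the data pipeline testing Kirwan mentions, here is a minimal pytest-style sketch that asserts two invariants on a pipeline’s output; the table and column names are hypothetical stand-ins for a real pipeline step.

```python
import pandas as pd

def load_orders() -> pd.DataFrame:
    """Stand-in for the pipeline step under test; a real test would read
    the pipeline's actual output table."""
    return pd.DataFrame({
        "order_id": [101, 102, 103],
        "amount_usd": [25.0, 40.5, 13.2],
    })

def test_order_ids_are_unique_and_present():
    df = load_orders()
    assert df["order_id"].notna().all(), "order_id must never be null"
    assert df["order_id"].is_unique, "order_id must be unique"

def test_amounts_are_non_negative():
    df = load_orders()
    assert (df["amount_usd"] >= 0).all(), "amounts must be non-negative"

if __name__ == "__main__":
    # Can also be run directly without pytest for a quick check.
    test_order_ids_are_unique_and_present()
    test_amounts_are_non_negative()
    print("All pipeline checks passed.")
```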

What role do data pipelines and DataOps play in data observability?

Kirwan: Data observability is closely related to the emerging practices of DataOps and data reliability engineering. DataOps refers to the broad set of operational challenges data platform owners face. Data reliability engineering is a part, but only a part, of DataOps, just as site reliability engineering is related to, but does not encompass all of, DevOps.

Data security can benefit from data observability, as it can be used to identify unexpected changes in query volume on different tables or changes in the behavior of ETL pipelines. However, data observability by itself will not be a complete data protection solution.

What challenges does this technology face?

Kirwan: These challenges include issues such as data discovery and governance, cost tracking and management, and access control. They also include how to handle queries, dashboards, and the growing number of ML features and models.

Reliability and uptime are certainly challenges many DevOps teams are responsible for, but they are also often charged with other aspects, such as developer velocity and security. Within these areas, data observability enables data teams to know whether their data and data pipelines are error-free.

What are the challenges of implementing and maintaining data observability technology?

Kirwan: Effective data observability systems must be integrated into the workflows of the data team. This enables them to respond to data issues continuously and focus on growing their data platform rather than putting out data fires. However, poorly tuned data observability systems can result in a flood of false positives.

An effective data observability system should require little maintenance beyond testing for data quality issues, adapting automatically to changes in the business. A poorly optimized system, however, may fail to track those changes accurately or may require manual tuning, which can be time-consuming.

Data observability can also be taxing on a data warehouse if not optimized properly. Bigeye’s teams have experience optimizing large-scale data observability to ensure that the platform does not impact data warehouse performance.

Do you know whether your company’s data is clean and well managed? Why does that matter anyway?

Without a working governance plan, you may have no company to worry about – data-wise.

Data governance is a collection of practices and processes that establish the rules, policies, and procedures ensuring data accuracy, quality, reliability, and security. It formalizes the management of data assets within an organization.

Everyone in business understands the need to have and use clean data. But making sure it’s clean and usable is a bigger challenge, according to David Kolinek, vice president of product management at Ataccama.

This challenge is compounded when business users have to rely on scarce technical resources. Often, no one person oversees data governance, or that person doesn’t have a complete understanding of how the data will be used and how to clean it up.

This is where Ataccama comes into play. The company’s mission is to provide a solution that even people without technical knowledge, such as SQL skills, can use to find the data they need, evaluate its quality, understand how to fix any issues, and determine whether that data will serve their purposes.

“With Ataccama, business users don’t need to involve IT to manage, access and clean their data,” Kolinek told TechNewsWorld.

Keeping Users in Mind

Ataccama was founded in 2007 and was originally bootstrapped.

It started as part of a consulting company, Adastra, which is still in business today. However, Ataccama focused on software rather than consulting, so management spun off that operation as a product company addressing data quality issues.

Ataccama started with a basic approach: an engine that did basic data cleansing and transformation. But it still required an expert user, because the configuration had to be supplied by hand.

“So, we added a visual presentation of the steps, enabling things like data transformation and cleanup. This made it a low-code platform, because users were able to do most of the work using just the application’s user interface. But at that point, it was still a fat-client platform,” Kolinek explained.

However, the current version is designed with the non-technical user in mind. The software includes a thin client, a focus on automation, and an easy-to-use interface.

“But what really stands out is the user experience, built on the seamless integration we were able to achieve with the 13th version of our engine. It delivers robust performance that is crafted to perfection,” he offered.

Digging Deeper Into Data Management Issues

I asked Kolinek to discuss the issues of data governance and quality further. Here is our conversation.

TechNewsWorld: How is Ataccama’s concept of centralizing or consolidating data management different from other cloud systems such as Microsoft, Salesforce, AWS and Google Cloud?

David Kolinek: We are platform agnostic and do not target a specific technology. Microsoft and AWS have their own native solutions that work well, but only within their own infrastructure. Our portfolio is wide open so it can serve all use cases that should be included in any infrastructure.

In addition, we have data processing capabilities that not all cloud providers have. Metadata is useful for automated processing, generating more metadata, which can be used for additional analysis.

We have developed both these technologies in-house so that we can provide native integration. As a result, we can provide a better user experience and complete automation.

How is this concept different from the notion of standardization of data?

David Kolinek, vice president of product management, Ataccama

Kolinek: Standardization is just one of many things we do. Typically, standardization can be easily automated, in the same way that we can automate cleaning or data enrichment. We also provide manual data correction when resolving certain issues, such as missing Social Security numbers.

We cannot generate an SSN, but we can derive a date of birth from other information. So, standardization is no different: it is one subset of the things that improve quality. For us it is not just about data standardization; it is about having good-quality data so that the information can be leveraged properly.
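As a small illustration of the distinction Kolinek draws, the sketch below automates standardization of date values and validates SSN format without ever inventing a missing SSN, which would instead be routed to manual correction. The formats chosen are illustrative assumptions, not Ataccama rules.

```python
import re
from datetime import datetime

def standardize_date(raw: str) -> str | None:
    """Automated standardization: normalize several common date formats
    to ISO 8601 (YYYY-MM-DD); return None if the value cannot be parsed."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y", "%B %d, %Y"):
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def validate_ssn(raw: str | None) -> bool:
    """Validation only: we can check an SSN's shape, but a missing SSN
    cannot be generated and must be flagged for manual correction."""
    return bool(raw) and re.fullmatch(r"\d{3}-\d{2}-\d{4}", raw) is not None

print(standardize_date("March 5, 1988"))  # 1988-03-05
print(standardize_date("05/03/1988"))     # 1988-05-03 (US month-first assumed)
print(validate_ssn("123-45-6789"))        # True
print(validate_ssn(None))                 # False -> route to manual correction
```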

How does Ataccama’s data management platform benefit users?

Kolinek: User experience is really our biggest advantage, and the platform is ideal for serving multiple personas. Companies need to enable both business users and IT people when it comes to data management. This requires a solution that lets business and IT collaborate.

Another great advantage of our platform is the strong synergy between data processing and metadata management that it provides.

Most other data management vendors cover only one of these areas. We also use both machine learning and a rules-based approach to validation and standardization, a combination that, again, other vendors do not support.

Furthermore, because we are technology agnostic, users can connect to many different technologies from a single platform. With edge processing, for example, you can configure something in Ataccama ONE once, and the platform will translate it for different platforms.
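A toy sketch of that configure-once idea follows: one declarative rule rendered as SQL for two different engines. The rule format and the generated SQL are invented for illustration and are not Ataccama ONE syntax.

```python
# One declarative rule, defined once: "email must match a basic pattern".
RULE = {"table": "customers", "column": "email", "pattern": "^[^@]+@[^@]+$"}

def to_sql(rule: dict, dialect: str) -> str:
    """Translate the single rule definition into engine-specific SQL that
    counts violating rows."""
    if dialect == "postgres":
        # Postgres uses the !~ operator for "does not match regex".
        predicate = f"{rule['column']} !~ '{rule['pattern']}'"
    elif dialect == "snowflake":
        # Snowflake uses the REGEXP_LIKE function instead.
        predicate = f"NOT REGEXP_LIKE({rule['column']}, '{rule['pattern']}')"
    else:
        raise ValueError(f"unsupported dialect: {dialect}")
    return f"SELECT COUNT(*) AS violations FROM {rule['table']} WHERE {predicate}"

for engine in ("postgres", "snowflake"):
    print(engine, "->", to_sql(RULE, engine))
```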

Does Ataccama’s platform lock in users the same way proprietary software often does?

Kolinek: We have developed all the main components of the platform ourselves, and they are tightly integrated. There has been a huge wave of acquisitions in this space lately, with large vendors buying smaller vendors to fill gaps. In some cases, you are actually buying and managing not one platform, but several.

With Ataccama, you can buy just one module, such as data quality/standardization, and later expand to others, such as master data management (MDM). It all works together seamlessly. Just activate our modules as you need them. This makes it easy for customers to start small and expand when the time is right.

Why is an integrated data platform so important in this process?

Kolinek: The biggest advantage of a unified platform is that companies do not have to hunt for point solutions to individual problems like data standardization. It is all interconnected.

For example, to standardize you must verify the quality of the data, and for that, you must first find and catalog it. If you have an issue, even though it may seem like a discrete problem, it probably involves many other aspects of data management.

The beauty of an integrated platform is that in most use cases, you have a solution with native integration, and you can start using other modules.

What role do AI and ML play today in data governance, data quality and master data management? How is this changing the process?

Kolinek: Machine learning enables customers to be more proactive. With a rules-based approach, you first identify and report a problem. Someone then has to check what went wrong and see whether there is anything wrong with the data. You would then create a data quality rule to prevent a repeat. It is all reactive, based on something breaking, being found, reported, and then fixed.

ML, on the other hand, lets you be proactive. You give it training data instead of rules. The platform then detects differences in patterns and identifies discrepancies, helping you realize there is a problem. This is not possible with a rule-based approach, and it scales easily when you have a large number of data sources. The more data you have, the better the training and its accuracy.
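The contrast Kolinek draws can be boiled down to the sketch below: a hand-written rule with a fixed threshold versus a check whose acceptable range is learned from training data. It is a deliberately tiny illustration of the proactive, data-driven approach, not Ataccama’s actual machine learning.

```python
import statistics

# Reactive, rule-based: a human writes a fixed threshold after something breaks.
def rule_based_check(daily_orders: int) -> bool:
    return daily_orders >= 500  # magic number chosen after a past incident

# Proactive, learned: the acceptable range comes from training data instead.
def learned_check(daily_orders: int, training_data: list[int]) -> bool:
    mean = statistics.mean(training_data)
    stdev = statistics.stdev(training_data)
    return abs(daily_orders - mean) <= 3 * stdev  # flag ~3-sigma departures

history = [980, 1_020, 1_005, 990, 1_050, 1_010, 995, 1_030, 1_000, 1_015]
print(rule_based_check(700))        # True: the stale fixed rule misses the drop
print(learned_check(700, history))  # False: a clear departure from the learned pattern
```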

Aside from cost savings, what benefits can enterprises gain from consolidating their data repositories? For example, does it improve security, CX results, etc.?

Kolinek: This improves security and minimizes potential future leaks. For example, we had customers who were storing data that no one was using. In many cases, they didn’t even know the data existed! Now, they are not only consolidating their technology stack, but they can also see all the stored data.

It is also very easy to add newcomers to the platform with consolidated data. The more transparent the environment, the sooner people will be able to use it and start getting value.

It is not so much about saving money as it is about leveraging all your data to generate a competitive advantage and generate additional revenue. It provides data scientists with the means to build things that will drive business forward.

What are the steps in adopting a data management platform?

Kolinek: Start with a preliminary analysis. Focus on the biggest issues the company wants to tackle and select platform modules to address them. It is important to define goals at this stage. Which KPIs do you want to target? What level of ID do you want to achieve? These are questions you should ask.

Next, you need a champion to drive execution and identify the key stakeholders driving the initiative. This requires extensive communication among the various stakeholders, so it is important that someone focuses on educating others about the benefits and helping teams get up to speed on the system. Then comes the implementation phase, where you address the key issues identified in the analysis, followed by the rollout.

Finally, think about the next set of issues that need to be addressed and, if necessary, enable additional modules in the platform to achieve those goals. The worst thing is to buy a tool and deploy it without providing any service, education, or support; that all but guarantees a low adoption rate. Education, support, and service are critical during the adoption phase.