Home / Computer Science / Reusing results in big data frameworks

Reusing results in big data frameworks

 

Table Of Contents


Project Abstract

Project Overview

Big Data analysis has been a very hot and active research during the past few years. It is getting hard to efficiently execute data analysis task with traditional data warehouse solutions.

Parallel processing platforms and parallel dataflow systems running on top of them are increasingly popular. They have greatly improved the throughput of data analysis tasks. The trade-off is the consumption of more computation resources. Tens or hundreds of nodes run together to execute one task.

However, it might still take hours or even days to complete a task. It is very important to improve resource utilization and computation efficiency. According to research conducted by Microsoft, there exists around 30% of common sub-computations in usual workloads. Computation redundancy is a waste of time and resources.

Apache Pig is a parallel dataflow system runs on top of Apache Hadoop, which is a parallel processing platform. Pig/Hadoop is one of the most popular combinations used to do large scale data processing.

This project proposed a framework which materializes and reuses previous computation results to avoid computation redundancy on top of Pig/Hadoop. The idea came from the materialized view technique in Relational Databases. Computation outputs were selected and stored in the Hadoop File System due to their large size.

The execution statistics of the outputs were stored in MySQL Cluster. The framework used a plan matcher and rewriter component to find the maximally shared common-computation with the query from MySQL Cluster, and rewrite the query with the materialized outputs. The framework was evaluated with the TPC-H Benchmark.

The results showed that execution time had been significantly reduced by avoiding redundant computation. By reusing sub-computations, the query execution time was reduced by 65% on average; while it only took around 30 ˜ 45 seconds when reuse whole computations. Besides, the results showed that the overhead is only around 25% on average.


Blazingprojects Mobile App

📚 Over 50,000 Project Materials
📱 100% Offline: No internet needed
📝 Over 98 Departments
🔍 Software coding and Machine construction
🎓 Postgraduate/Undergraduate Research works
📥 Instant Whatsapp/Email Delivery

Blazingprojects App

Related Research

Computer Science. 2 min read

Predicting Disease Outbreaks Using Machine Learning and Data Analysis...

The project topic, "Predicting Disease Outbreaks Using Machine Learning and Data Analysis," focuses on utilizing advanced computational techniques to ...

BP
Blazingprojects
Read more →
Computer Science. 3 min read

Implementation of a Real-Time Facial Recognition System using Deep Learning Techniqu...

The project on "Implementation of a Real-Time Facial Recognition System using Deep Learning Techniques" aims to develop a sophisticated system that ca...

BP
Blazingprojects
Read more →
Computer Science. 2 min read

Applying Machine Learning for Network Intrusion Detection...

The project topic "Applying Machine Learning for Network Intrusion Detection" focuses on utilizing machine learning algorithms to enhance the detectio...

BP
Blazingprojects
Read more →
Computer Science. 2 min read

Analyzing and Improving Machine Learning Model Performance Using Explainable AI Tech...

The project topic "Analyzing and Improving Machine Learning Model Performance Using Explainable AI Techniques" focuses on enhancing the effectiveness ...

BP
Blazingprojects
Read more →
Computer Science. 4 min read

Applying Machine Learning Algorithms for Predicting Stock Market Trends...

The project topic "Applying Machine Learning Algorithms for Predicting Stock Market Trends" revolves around the application of cutting-edge machine le...

BP
Blazingprojects
Read more →
Computer Science. 3 min read

Application of Machine Learning for Predictive Maintenance in Industrial IoT Systems...

The project topic, "Application of Machine Learning for Predictive Maintenance in Industrial IoT Systems," focuses on the integration of machine learn...

BP
Blazingprojects
Read more →
Computer Science. 3 min read

Anomaly Detection in Internet of Things (IoT) Networks using Machine Learning Algori...

Anomaly detection in Internet of Things (IoT) networks using machine learning algorithms is a critical research area that aims to enhance the security and effic...

BP
Blazingprojects
Read more →
Computer Science. 4 min read

Anomaly Detection in Network Traffic Using Machine Learning Algorithms...

Anomaly detection in network traffic using machine learning algorithms is a crucial aspect of cybersecurity that aims to identify unusual patterns or behaviors ...

BP
Blazingprojects
Read more →
Computer Science. 4 min read

Predictive maintenance using machine learning algorithms...

Predictive maintenance is a proactive maintenance strategy that aims to predict equipment failures before they occur, thereby reducing downtime and maintenance ...

BP
Blazingprojects
Read more →
WhatsApp Click here to chat with us