Big Data Discovery makes your job faster and easier than
traditional analytics tools do. This topic discusses your goals and needs in data
analysis and shows how Big Data Discovery can address them.
Your goals and needs
As a data scientist or analyst, you:
- Solve complex questions
using a set of disjointed tools. You start with many dynamic, imprecise
questions. You often have unpredictable needs for data, visualization, and
discovery capabilities.
To address these needs, you rely on open source and custom discovery
tools. You use these tools in combination with one another, and often need to
reopen the same tool several times.
In Big Data Discovery, this fragmented workflow between tools is
replaced with a single workflow that is natively part of the Hadoop ecosystem.
- Need to
collaborate. Together with your team, you work with big data from many
external and internal sources. Other team members may consume the results of
your findings. You also improve and publish insights and prototypes of new
offerings.
In Big Data Discovery, you can create personal projects, or
create and share a project with your team.
- Want to make sense of
data. To reach your goals, you often need to create and use insights. You
do this by collecting, cleaning, and analyzing your data.
Big Data Discovery lets you make sense of data. You use it to
collect, join, change, and analyze your data.
- Generate ideas and
insights. You want to create insights that lead to changes in business, or
you want to enhance existing products, services, and operations. You also need
to define and create prototypes for new data-driven products and services.
In Big Data Discovery, you arrive at insights by using many data
visualization techniques. They include charts, statistical plots, maps, pivot
tables, summarization bars, tag clouds, timelines, and others. You can save and
share results of your discovery using snapshots and bookmarks.
- Validate, trace back,
tweak, and share your hypotheses. You often need to provide new
perspectives to address old problems. The path to the solution is usually
exploratory and involves:
- Hypothesis revision.
You must explore several hypotheses in parallel. They are often based on many
data sets and interim data products.
- Hypothesis validation.
You need to frame and test hypotheses. This requires validating, and learning
from, experimental methods and results produced by others.
- Data recollection and
transparency. You would like all stages of the analytical effort to be
transparent so that your team can repeat them. You want to be able to recreate
analytical workflows that were created earlier. You also would like to share
your work. This requires a linear history of all steps and activities.
Big Data Discovery helps you by saving your BDD projects, data
sets, and transformation scripts. This lets you improve your projects, share
them, and apply saved transformation scripts to other data sets.
Tasks you can do in Big Data Discovery
In Big Data Discovery, you can:
- Continue to work within
the Hadoop ecosystem and use Big Data Discovery as the visual representation of
data found in Hive tables.
- Run discovery repeatedly,
in a way that is transparent to others in your group, on different large-scale
data sets that arrive periodically from various sources.
- Create custom discovery
applications to suit your business needs.
- Serve as the original
author of all discovery solution elements: data sets, information models,
discovery applications, and transformation scripts.
- Publish insights to
decision-making groups and social forums inside your organization.
- Roll out your own BDD
applications from scratch, iterate on them, and share them with wider groups of
users in your team.