Data Engineer vs Data Scientist
Picture this 🧐
You've secured a role as a data scientist at a fledgling startup. Your mission is to forecast customer churn, and you're keen on employing an intricate machine-learning technique you've been refining over the years.
However, upon delving into the matter, you realise that all your data is spread across numerous databases. Moreover, the data is stored in tables optimised for running applications rather than for analyses.
To compound the issue, some outdated code has corrupted a significant portion of the data. Your sense of urgency is mounting.
- Data is scattered
- Not optimised for analyses
- Legacy code is causing corrupt data
You need a Data Engineer who can step in to save the day.
A data engineer develops, constructs, tests and maintains architectures such as databases and large-scale processing systems.
Data Engineer Tasks vs Data Scientist Tasks
Data Engineer Tasks | Data Scientist Tasks |
---|---|
Develop scalable data architecture | Mine data for patterns |
Streamline data acquisition | Build statistical models |
Set up processes to bring data together | Build predictive models using ML |
Clean corrupt data | Monitor business processes |
Work with cloud technology | Clean outliers in data |
Summary
Here are the most critical daily tasks of a Data Engineer:
• Gather data from different sources
• Optimise databases for analyses
• Remove corrupted data
• Process large amounts of data
• Use clusters of machines
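To make the first three tasks concrete, here is a minimal sketch using only Python's built-in `sqlite3` module. Everything in it is an assumption made for illustration: the `customers` and `customer_facts` tables, the source names `web_shop` and `mobile_app`, and the rule for what counts as a corrupt row are all hypothetical. It gathers rows from two stand-in application databases, drops the corrupt ones, and loads the rest into a single table meant for analysis.

```python
import sqlite3


def make_source(rows):
    """Create an in-memory stand-in for one application database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, email TEXT, monthly_spend REAL)")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    return conn


def is_valid(row):
    """Assumed rule: a row is corrupt if a field is missing, the email has no '@', or the spend is negative."""
    cust_id, email, spend = row
    return (
        cust_id is not None
        and email is not None
        and "@" in email
        and spend is not None
        and spend >= 0
    )


def consolidate(sources):
    """Gather rows from every source, drop corrupt ones, and load one analytical table."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute(
        "CREATE TABLE customer_facts (source TEXT, id INTEGER, email TEXT, monthly_spend REAL)"
    )
    for name, conn in sources.items():
        rows = conn.execute("SELECT id, email, monthly_spend FROM customers").fetchall()
        # Keep only valid rows and record which source each one came from.
        clean = [(name, *row) for row in rows if is_valid(row)]
        warehouse.executemany("INSERT INTO customer_facts VALUES (?, ?, ?, ?)", clean)
    warehouse.commit()
    return warehouse


# Two scattered sources, each containing one corrupt row.
sources = {
    "web_shop": make_source([(1, "ann@example.com", 20.0), (2, None, 15.0)]),
    "mobile_app": make_source([(3, "bob@example.com", -5.0), (4, "cy@example.com", 30.0)]),
}
warehouse = consolidate(sources)
total_rows = warehouse.execute("SELECT COUNT(*) FROM customer_facts").fetchone()[0]
print(total_rows)  # prints 2: one valid row from each source
```

A real pipeline would run on a schedule and at a much larger scale, often across a cluster of machines, but the gather, clean and load steps follow the same outline.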
Quiz?
Check your knowledge of this article by answering the questions below.
Q.1. Tasks of the data engineer
Question: Below are three essential tasks that need to happen in a data-driven company. Can you find the one that best fits the job of a data engineer?
Select one answer from the options below:
- Apply a statistical model to a large dataset to find outliers.
- Set up scheduled ingestion of data from the application databases to an analytical database.
- Come up with a database schema for an application.
Q.2. Data engineering problems
Question: For this exercise, imagine you work at a medium-sized company that hosts an online market for computer accessories. As the company grows, it is clearly experiencing some technical growing pains.
As the first data engineer, you observe the problems below. Which one are you best suited to help with?
Select one answer from the options below:
- Data scientists are querying the online store's databases directly, which slows down the application because it uses the same databases.
- Poor product recommendations are hurting the sales numbers of the online store.
- The online store is slow because the application's database server doesn't have enough memory.