Good morning,
Currently I'm exploring options for building an internal platform for the company I work for. Our team is responsible for the company's data warehouse and reporting.
As we evolve, we'll be developing an intranet to meet some of the company's needs, and for some time now I've been considering Scala (and the Play Framework) as the way to go.
This will also involve a lot of machine learning to cluster clients, predict sales evolution, and so on. This is when I started to think about Spark ML and came across PredictionIO.
As we are shifting our skills towards data science, which option will benefit and teach us and the company the most?
I'm not trying to open an opinion-based question; rather, I want to learn from your experience / architectures / solutions.
Thank you
Both are good options:

1. Use PredictionIO if you are new to ML. It is easy to start with, but it will limit you in the long run.
2. Use Spark if you have confidence in your data science and data engineering team. Spark has an excellent, easy-to-use API along with an extensive ML library (see the sketch after this list). That said, to put things into production you will need some distributed Spark knowledge and experience, and it is tricky at times to make jobs efficient and reliable.
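To give a feel for the Spark ML API in Scala, here is a minimal sketch of the client-clustering case from the question. The input file, the feature columns (recency, frequency, monetary) and k = 4 are illustrative assumptions, not part of the original question:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.clustering.KMeans

object ClientClustering {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ClientClustering")
      .master("local[*]") // point this at a real cluster in production
      .getOrCreate()

    // Hypothetical client features exported from the warehouse.
    val clients = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("clients.csv")

    // Spark ML expects a single vector column, so assemble the numeric features.
    val assembler = new VectorAssembler()
      .setInputCols(Array("recency", "frequency", "monetary"))
      .setOutputCol("features")

    // k = 4 is an arbitrary choice for illustration; tune it for real data.
    val kmeans = new KMeans()
      .setK(4)
      .setFeaturesCol("features")
      .setPredictionCol("cluster")

    val model = new Pipeline().setStages(Array(assembler, kmeans)).fit(clients)
    model.transform(clients)
      .select("cluster", "recency", "frequency", "monetary")
      .show()

    spark.stop()
  }
}
```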
Here are the options:

1. Spark on Databricks Cloud: expensive, but easy-to-use Spark with no data engineering needed.
2. PredictionIO: if you are certain that its ML can solve all your business cases.
3. Spark on Google Dataproc: an easy managed cluster for about 60% less than AWS, though some engineering is still required.

In summary: PredictionIO for a quick fix, and Spark for long-term data science / engineering development. You can start with Databricks to minimise expertise overheads and move to Dataproc as you go along to minimise costs.
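And since the question also mentions predicting sales evolution, here is an equally minimal sketch of a Spark ML regression pipeline. The Parquet path, the columns (month_index, marketing_spend, sales_amount) and the 80/20 split are assumptions for illustration only:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression

object SalesForecast {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SalesForecast")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical monthly sales facts from the warehouse.
    val sales = spark.read.parquet("sales_monthly.parquet")

    // Assemble the predictor columns into the feature vector.
    val assembler = new VectorAssembler()
      .setInputCols(Array("month_index", "marketing_spend"))
      .setOutputCol("features")

    val lr = new LinearRegression()
      .setLabelCol("sales_amount")
      .setFeaturesCol("features")

    // Hold out a test split so the fit can be sanity-checked.
    val Array(train, test) = sales.randomSplit(Array(0.8, 0.2), seed = 42)
    val model = new Pipeline().setStages(Array(assembler, lr)).fit(train)

    model.transform(test)
      .select("month_index", "sales_amount", "prediction")
      .show()

    spark.stop()
  }
}
```

The same pipeline shape runs unchanged on Databricks or Dataproc; only the cluster configuration and data paths differ, which is why starting simple and moving later is practical.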