Local Databricks Development on Windows

This post sets out the steps required to get a local Databricks development environment set up on Windows. It covers both Python and Scala development requirements; the intention is to allow you to carry out development at least up to the point of unit testing your code. Apologies in advance to R users: not being an R user myself, I won’t be covering R here.

Local Databricks development offers a number of obvious advantages: reduced cost, the ability to develop offline, and, at least for small datasets, a faster development workflow, since network round-tripping is removed. With Windows being a popular O/S for organisations’ development desktops, it makes sense to consider this setup.

Right, with that said, let’s take a look at what we need to get started. I’ll split things into core requirements, just Python, just Scala, and Python and Scala together, to cover the main development scenarios.

The first requirement is winutils.exe, a component of the Hadoop code base that is used for certain Windows file system operations and is needed for Spark to run on Windows. You can read about how to compile your own version from the Hadoop code base, or how to acquire a precompiled version, in my post on the subject here. I’ll skip ahead to the point where you have the compiled binary, either downloaded precompiled from GitHub or compiled from source.
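Once you have winutils.exe, Spark finds it via the HADOOP_HOME environment variable, with the accompanying bin directory on the PATH. As a minimal sketch (the C:\hadoop location is an assumption; substitute wherever you placed the binary), this can be set from Python before starting a Spark session:

```python
import os

# Hypothetical install location; winutils.exe is assumed to sit in C:\hadoop\bin.
hadoop_home = r"C:\hadoop"

# Spark's Windows file system support looks for %HADOOP_HOME%\bin\winutils.exe,
# so point HADOOP_HOME at the parent directory and add bin to PATH.
os.environ["HADOOP_HOME"] = hadoop_home
os.environ["PATH"] = os.environ["PATH"] + os.pathsep + os.path.join(hadoop_home, "bin")

print(os.environ["HADOOP_HOME"])  # → C:\hadoop
```

Alternatively, set these once system-wide via Windows environment variable settings so every shell and IDE session picks them up without per-script setup.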