Remove Data Science Bottlenecks with Coder

Mark Milligan on June 16th, 2023
Updated on August 7th, 2023

Like fullstack and frontend developers, professionals in data engineering, analytics and data science require a development environment and IDEs to accomplish their work.

Historically, users have installed languages like Python and IDEs like VS Code, Jupyter and JetBrains PyCharm on their local computers, then checked out source code such as notebooks and downloaded datasets. For compute-intensive work like training and running models, jobs are submitted to remote cloud compute platforms such as Databricks and Snowflake, to name a couple.

Local development environments introduce several challenges that limit end users' ability to work with data and deliver solutions quickly and on time, which ultimately affects their enterprise's business outcomes. Meanwhile, infrastructure teams like Platform Engineering and DevOps (aka DataOps and AIOps) spend an inordinate amount of time supporting end users' development environments and the bottlenecks caused by an enterprise's data platform.

The Coder platform solves these challenges for both data professionals and the infrastructure teams that support analytics and data science development environments.

End User Challenges

Time Required To Set Up Experiments

Enterprise IT puts local machines in the penalty box by default. These laptops are mobile and connect through users' home Internet or, worse, coffee shop WiFi, so extensive VPN and networking configuration is required to connect users to the data they need to set up analytics and data science experiments.

With Coder, development environments reside in the enterprise's network, adjacent to the network resources that house the data and source code required for analytical experimentation. Infrastructure teams only need to ensure connectivity between users' remote development environments and their data, both of which are within the enterprise's infrastructure. Remote development alternatives to Coder are SaaS and hosted, which requires extra work to establish, or in some cases prevents, any connectivity with an enterprise's data.

End users also waste time installing development environments on local computers. If they switch to a project with different tools, another local environment installation is required, wasting more time and keeping them out of flow and away from work on data experiments.

Need for Powerful Servers With a Local Development Experience

End users like a local development experience: IDEs open faster, and keystrokes and mouse clicks respond instantly. With Coder, local VS Code and JetBrains IDEs can connect to Coder remote development environments for a snappy user experience, with all the security and compute power of a remote development environment running on an enterprise's scalable cloud computing infrastructure. Coder enhances this local experience with a VS Code extension and a JetBrains Gateway plugin that eliminate the need for any configuration.

Persisting Code and Data

Containers used as development environments are stateless; i.e., if a development environment is stopped for the day due to inactivity (which is a cost saver, by the way), everything created in it or resident in memory is lost.

With Coder, development environments are defined by Terraform templates that create a persistent disk and mount it to the Kubernetes pod and container, where it serves as the user's home directory (e.g., /home/coder) for storing code and data. When an environment shuts down, the home directory persists. When the environment is restarted, the home directory volume is remounted. Templates can include additional persistent storage options, like the network file shares that users already rely on for artifacts in their daily flow.

Install Additional Python Packages

DevOps and team leaders can define container or VM images with the required languages, kernels (such as Python) and packages. The Terraform template also includes a bash startup script that can install packages, such as a Python package, and automatically start programs like code-server (VS Code in a browser) or a Jupyter IDE after the development environment build is complete. End users can install additional packages from either a personalize script or their own dotfiles repository, and may choose to persist these packages in their home directory, for example with pip3 install --user.

A Coder development environment with a Jupyter IDE and a Python notebook open
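As a rough illustration of why a --user install survives a restart, the sketch below builds the pip command and checks whether Python's per-user site directory sits under a given home directory. The function names are hypothetical and not part of Coder; the only assumption is that the home directory is the persistent volume described earlier.

```python
import site
import sys
from pathlib import Path


def user_install_command(package: str) -> list[str]:
    """Build a pip command that installs into the per-user site directory."""
    # Running pip through the current interpreter avoids picking up a
    # different Python's pip from PATH.
    return [sys.executable, "-m", "pip", "install", "--user", package]


def user_site_persists(home: Path) -> bool:
    """Return True if the per-user site directory lives under `home`."""
    user_site = Path(site.getusersitepackages()).resolve()
    return user_site.is_relative_to(home.resolve())
```

If user_site_persists(Path.home()) returns True, packages installed with the command above land on the persistent home volume rather than in the container's ephemeral filesystem.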

Use Any IDE

Unlike languages like Java, where developers use a very limited set of IDEs (e.g., JetBrains IntelliJ or Eclipse), Python and other data science languages flourish with a long tail of IDE options for specific use cases, e.g., VS Code, PyCharm, RStudio, MATLAB, Jupyter and Airflow.

IDEs like VS Code and PyCharm can be installed locally or run web-based and communicate with a Coder development environment. Others, like Coder's code-server, Jupyter and Airflow, are natively web-based. Finally, some IDEs are thick clients and can be made web-accessible within Coder through a VNC web client like KasmVNC.

Alternative remote development solutions like AWS Cloud9 offer a proprietary, non-standard IDE that only operates on AWS and may not appeal to, or be acceptable to, end users.

Infrastructure teams configure Terraform templates to install and start IDEs, clone source code and dotfiles git repositories, and install VS Code extensions

Clone A Code Repository

Users want to quickly get in flow when they request a remote development environment build. Data and Infrastructure leads can add a configuration to the Terraform template to either prompt users for a code repository name at build time or just automatically perform a git clone operation on a specific repository as part of the template's startup script.

End users create development environments with input parameters specified in the Terraform templates
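The automatic clone step could look something like the sketch below. It is written in Python for readability, whereas the startup script described above is bash, and every name in it is hypothetical. It assumes git is on the PATH and that the home directory persists between restarts, so the clone is skipped when the checkout already exists.

```python
import subprocess
from pathlib import Path
from typing import Optional


def clone_target(repo_url: str, home: Path) -> Path:
    """Derive the checkout directory from the repository URL."""
    # Treat ':' like '/' so SSH-style URLs (git@host:repo.git) also work.
    name = repo_url.rstrip("/").replace(":", "/").rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return home / name


def clone_command(repo_url: str, home: Path) -> Optional[list[str]]:
    """Return the git command to run, or None if the checkout already exists."""
    target = clone_target(repo_url, home)
    if target.exists():
        # The home directory survives restarts, so skip the clone rather
        # than fail or overwrite the user's work.
        return None
    return ["git", "clone", repo_url, str(target)]


def clone_if_missing(repo_url: str, home: Path) -> bool:
    """Clone the repository on first start; return True if a clone ran."""
    command = clone_command(repo_url, home)
    if command is None:
        return False
    subprocess.run(command, check=True)
    return True
```

Keeping the command construction separate from the subprocess call makes the skip-if-present behavior easy to check without network access.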

Coder also generates an Open in Coder button that can be included in a repository's README markdown. When clicked, it takes the user to a specific Coder deployment, builds a development environment based on a specific template and clones the repository. With a couple of clicks from within a familiar Git repository web user interface, an end user has a powerful remote development environment with an IDE and repository open and ready to code.

The Open in Coder button in a git source code repository to automatically create a development environment in Coder

Access to Data Sources

Data access is easier because infrastructure teams control network access between their Coder Kubernetes deployment and other network resources like databases and data lakes, e.g., Snowflake, Microsoft SQL Server, Oracle, Teradata, IBM Db2 and Dremio. Access to these data sources from a local development environment is more complicated, with complex VPN configuration plus latency and reliability issues as data traverses the last mile to a local computer.

Access to More Computing Power

Coder-managed development environments run within an enterprise's server infrastructure, such as Kubernetes or dedicated virtual machines. The Terraform template can be hard-wired with compute settings or can prompt the user to choose from an approved range of CPU, memory and disk when the development environment is created. Users are more productive since they can access far more compute than is available on their local computers. Infrastructure teams benefit from Kubernetes' built-in ability to scale up additional cluster nodes and to regulate the compute issued to each user with resource limits and requests, for a lower infrastructure cost posture than dedicated VMs.

End users specify how much CPU, memory and disk is required for their remote development environment
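The idea of an approved range can be sketched in a few lines of Python. This is not how Coder or Terraform implements parameter validation; the limits, field names and functions below are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Inclusive bounds approved by the infrastructure team."""
    minimum: int
    maximum: int

    def contains(self, value: int) -> bool:
        return self.minimum <= value <= self.maximum


# Hypothetical limits; a real template would define its own.
APPROVED = {
    "cpu_cores": Range(2, 16),
    "memory_gb": Range(4, 64),
    "disk_gb": Range(10, 500),
}


def validate_request(request: dict[str, int]) -> list[str]:
    """Return a list of problems; an empty list means the request is allowed."""
    problems = []
    for name, allowed in APPROVED.items():
        if name not in request:
            problems.append(f"{name}: missing")
        elif not allowed.contains(request[name]):
            problems.append(
                f"{name}: {request[name]} is outside {allowed.minimum}-{allowed.maximum}"
            )
    return problems
```

A request for 8 cores, 32 GB of memory and 100 GB of disk passes with no problems, while a request for 32 cores is rejected with a message naming the approved range.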

Support Long-Running Tasks In An Ephemeral World

Infrastructure teams like Coder because they can centrally enforce when inactive development environments shut down, thereby saving compute costs. End users want the option to perform long-running tasks, like training and running models, without fear of environments shutting off.

With Coder, administrators can enable, at the template level, the ability for end users to override the auto-off settings for a development environment, ensuring their long-running computations get completed.

Infrastructure User Challenges

Supporting Multiple IDEs

Not all IDEs are the same. They were built in different eras: Visual Studio, Eclipse and JetBrains earlier, and VS Code more recently. They have different dependencies depending on the operating system they are installed on, and they use different formats for customization and configuration.

Infrastructure teams that support developers with IDEs installed on local computers may spend an inordinate amount of time troubleshooting problems with IDEs, their versions, and dependencies. There can also be challenges with reliably running Python kernels with Jupyter on Microsoft Windows.

With Coder, infrastructure teams can define remote Linux container and VM images that are the basis for end users' development environments. The image includes the required IDEs at specific versions, so the entire fleet of end users runs development environments on the same IDE versions. End users get their own individual remote development environment with a dedicated Python kernel and no contention with other users.

End users select from the Terraform templates they are authorized to use to build remote development environments

Data Science Project Growth and the Importance of Source Code Version Control

As enterprises invest in analytics and data science as a competitive advantage, the number of end users working on projects grows, requiring Git version control to bring order. Coder provides administrators with a secure and easy-to-deploy Git integration with GitLab, GitHub, Bitbucket and Azure DevOps via an OAuth application. Coder seamlessly intercepts end users' git commands and presents a link to authenticate with their enterprise's Git provider. Once a user is authenticated, Coder stores their personal OAuth token to securely and efficiently authenticate every git action.

A Common Platform Versus Cloud-based Solutions

Public cloud offerings like GitHub Codespaces, Azure, AWS and Google Cloud have VM and virtual desktop remote development environment solutions. They can have a higher cost of ownership because the compute options are limited. For example, if an end user needs more storage, they may have to select a larger environment option that also has more CPU and memory, and therefore a higher cost.

End users and administrators may also have to learn a different solution for each provider, whereas Coder can run on-premises and on all cloud providers for a lower total cost of ownership.

Control Costs by Monitoring and Shutting Down Unused Environments

Having access to more compute with remote development environments makes end users more productive, but can lead to runaway cloud costs if not properly controlled.

Coder empowers administrators to establish auto-off governance controls that stop development environments when they are unused. Coder considers a development environment in use if there is an active SSH, web IDE or web terminal connection. In case an end user mistakenly leaves a local IDE and SSH connection open, administrators can also set a maximum lifetime value that shuts down a development environment, preventing a Kubernetes pod or VM from running forever and running up your enterprise's cloud bill.
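The two controls described above can be combined into a single decision rule. The sketch below is not Coder's implementation; the function and parameter names are hypothetical, and it simply treats any open connection as activity while letting the maximum lifetime win regardless.

```python
from datetime import datetime, timedelta
from typing import Optional


def stop_reason(
    now: datetime,
    started_at: datetime,
    last_connection_at: datetime,
    has_active_connection: bool,
    idle_timeout: timedelta,
    max_lifetime: Optional[timedelta] = None,
) -> Optional[str]:
    """Return why the environment should stop, or None to keep it running."""
    # The maximum lifetime is checked first: it applies even when a
    # forgotten IDE or SSH session keeps the environment looking active.
    if max_lifetime is not None and now - started_at >= max_lifetime:
        return "max_lifetime"
    # An open SSH, web IDE or web terminal connection counts as in use.
    if has_active_connection:
        return None
    if now - last_connection_at >= idle_timeout:
        return "idle"
    return None
```

With a one-hour idle timeout and a 24-hour maximum lifetime, an environment with a forgotten SSH session keeps running through the idle check but is still stopped once it has been up for 24 hours.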

Secure Your Enterprise's Intellectual Property

Continuing to allow end users to use local development environments introduces unnecessary risk of losing intellectual property such as code repositories and sensitive data. Coder moves development environments to your enterprise's secure server infrastructure. Each development environment has a dedicated, isolated home directory into which code repositories and data are cloned.

Next Steps

If you are with an enterprise and would like to speak with a technical account executive, fill out our demonstration form or start a 30-day complimentary trial of Coder Enterprise. For more information about analytics and data science use cases and Coder, see our solution sheet.
