AWS announced the release of SageMaker Studio, branded an “IDE for ML,” on Tuesday as the newest addition to SageMaker, its machine-learning brand. Machine learning has been gaining traction and, with its compute-heavy training workloads, could prove a decisive factor in the growing battle over the public cloud. So what does this new IDE mean for AWS and the public cloud market?
First, the big picture (skip below for the feature-by-feature analysis of Studio): It’s no secret that SageMaker’s market share is minuscule (The Information put it at around $11 million as of July 2019). SageMaker Studio attempts to solve important pain points for data scientists and machine-learning (ML) developers by streamlining model training and maintenance workloads. However, its implementation falls short due to common, long-standing complaints about AWS in general: its steep learning curve and sheer complexity.
AWS is clearly embracing a strategy of selling to corporate IT while neglecting the features and UX that could make life easier for data scientists and developers. While the underlying technologies it is releasing, like Notebooks, Debugger, and Model Monitor, attempt to make ML training easier, the implementations leave a lot to be desired.
My own experience trying to access SageMaker Studio was a microcosm of this problem. I had an impossible time setting up Studio. Existing AWS accounts can’t log you into the new service; you need to set up AWS Single Sign-On (SSO). Setting up SSO was kludgy, with unhelpful error messages like “Member must satisfy regular expression pattern: [\p{L}\p{M}\p{S}\p{N}\p{P}]+” that are more likely to confuse than enlighten. Getting a SageMaker Studio session working also required understanding the full SSO permissions model, itself a steep learning curve. Apparently, I misunderstood it, as I never got this to work. And that was with the helpful guidance of three AWS employees, one of whom was a developer.
My experience with SageMaker wasn’t unique. The same article in The Information stated, “One person who has worked on customer projects using the technology described the service as technically complex to work with, even though AWS has sought to make machine learning more accessible to customers.” Nor is this kind of complexity unique to SageMaker; as we have seen, it generalizes to all of AWS’s cloud products. Meanwhile, AWS’s competitor Google Cloud is reported to have a better developer experience, to be more “user friendly,” and to be “most caring for the need of professional developers.”
For now, investors don’t have to worry. Choosing complexity over simplicity is probably the right choice, since it focuses on the needs of the large, deep-pocketed corporate IT buyers who emphasize customizable, fine-grained security and feature checklists (AWS has 169 separate products as of May this year). Unfortunately, this comes at the expense of a steep learning curve and developer friendliness. While this might be the right strategy for now, Studio’s complexity opens AWS up to the potential of Christensen-style disruption (think The Innovator’s Dilemma). AWS’s sheer size (it is widely acknowledged to be the largest cloud provider) has many advantages: the ability to support broader offerings, a larger certified developer base, and greater economies of scale, to name a few. But this year has already seen the IPOs of Zoom and Slack, two B2B companies that circumvented the traditional corporate IT sales path by winning over the hearts and minds of end users and forcing the hand of buyers. Could a similar developer-friendly player displace AWS?
What SageMaker Studio delivers
Now let’s take a look at Studio’s features. AWS announced some interesting new capabilities as part of Studio: Notebooks, Experiments, Debugger, Model Monitor, and AutoPilot.
SageMaker Notebooks attempt to solve the biggest barrier for people learning data science: getting a Python or R environment working and figuring out how to use a notebook. Studio delivers single-click Notebooks for the SageMaker environment, competing directly against Google Colab and Microsoft Azure Notebooks in the Notebook-as-a-Service category. But SageMaker has had Notebook Instances since 2018, and it’s unclear what kind of improvement Studio offers on this front.

SageMaker Experiments provides progress reporting capabilities for long jobs. This is handy, since you often have no way of knowing how long a job will keep running or whether it has silently crashed in the background. The Experiments feature should be a useful addition for cloud-based jobs, large data sets, or GPU-intensive projects. However, similar functionality has existed (albeit in a less visual form) since at least July 2018. Again, it’s unclear how this product is better than its predecessors.
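For readers who already script against the SageMaker APIs, here’s a rough sense of what wiring a job into Experiments looks like. This is a minimal sketch using boto3’s SageMaker client; the experiment and trial names are hypothetical placeholders, and the exact calls available may vary by SDK version.

```python
# Minimal sketch: grouping training runs with SageMaker Experiments via boto3.
# The names ("churn-model-experiments", "xgboost-trial-1") are hypothetical placeholders.
import boto3

sm = boto3.client("sagemaker")

# Create an experiment to group related training runs.
sm.create_experiment(
    ExperimentName="churn-model-experiments",
    Description="Hyperparameter sweeps for a churn model",
)

# Each training run becomes a trial under that experiment.
sm.create_trial(
    TrialName="xgboost-trial-1",
    ExperimentName="churn-model-experiments",
)

# Passing this config when launching a training job associates the run
# with the trial so it can be tracked alongside its siblings.
experiment_config = {
    "ExperimentName": "churn-model-experiments",
    "TrialName": "xgboost-trial-1",
    "TrialComponentDisplayName": "training",
}
```

Jobs launched with that config then surface in Studio’s Experiments view, which is where the progress-reporting value shows up.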
SageMaker Debugger promises to simplify the debugging process. The announcement of this feature came with in-depth explanations, including code snippets showing how the tool can help developers debug otherwise opaque TensorFlow bugs (it presumably can or will work with other ML frameworks).
I spoke with Field Cady, author of The Data Science Handbook, about the value of the tool. “Debugging machine-learning models, particularly complex ones like TensorFlow or PyTorch, is a real pain point, and not spotting errors early when you can have multi-day training jobs really hampers productivity,” he said. “Immediate access to the models, even if they’re not fully trained yet, lets you solve those integration problems in parallel to the training itself.” Overall, the feature seems truly novel and does solve an actual user pain point.
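To give a sense of how this fits into a training workflow, here’s a minimal sketch of attaching built-in Debugger rules to a TensorFlow training job with the SageMaker Python SDK (v2-style parameter names). The bucket, role ARN, and entry-point script are hypothetical placeholders, and the two rules shown are just examples of the documented pattern, not a recommended set.

```python
# Sketch: attaching SageMaker Debugger rules to a training job.
# Bucket, role ARN, and entry_point are hypothetical placeholders.
from sagemaker.debugger import Rule, rule_configs
from sagemaker.tensorflow import TensorFlow

# Built-in rules watch tensors emitted during training and flag
# common failure modes such as vanishing gradients or a stalled loss.
rules = [
    Rule.sagemaker(rule_configs.vanishing_gradient()),
    Rule.sagemaker(rule_configs.loss_not_decreasing()),
]

estimator = TensorFlow(
    entry_point="train.py",  # your training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    framework_version="2.1",
    py_version="py3",
    rules=rules,  # Debugger evaluates these while the job runs
)

# If a rule fires, the job surfaces the issue early instead of
# silently burning through a multi-day training run.
estimator.fit("s3://my-bucket/training-data/")
```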
SageMaker Model Monitor watches models deployed to SageMaker Endpoints for data drift. This is perhaps the most exciting feature of Studio because it helps alert model maintainers to input-data (and hence model) drift. To paraphrase AWS CEO Andy Jassy’s keynote from this year’s re:Invent conference, mortgage-default models trained on housing data from 2005 may have performed well in 2006 but would likely have failed during the bursting of the housing bubble in 2008 because of changes in the underlying model inputs. A system that automatically alerts model maintainers to these changes is very valuable. Model Monitor also gives AWS a clear selling point for standardizing model hosting on SageMaker Endpoints, its model hosting service, in head-to-head competition with Google AI Platform and startup Algorithmia.
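As a rough illustration, here’s a minimal sketch of what setting up drift monitoring on an existing endpoint might look like with the SageMaker Python SDK. The endpoint name, S3 paths, and role ARN are hypothetical placeholders, and method signatures may differ across SDK versions.

```python
# Sketch: scheduling data-drift monitoring on a deployed endpoint.
# Endpoint name, S3 paths, and role ARN are hypothetical placeholders.
from sagemaker.model_monitor import DefaultModelMonitor, CronExpressionGenerator
from sagemaker.model_monitor.dataset_format import DatasetFormat

monitor = DefaultModelMonitor(
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# Baseline the training data so Model Monitor knows what "normal" inputs look like.
monitor.suggest_baseline(
    baseline_dataset="s3://my-bucket/training-data/train.csv",
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://my-bucket/model-monitor/baseline",
)

# Compare live endpoint traffic against that baseline on a schedule and flag drift.
monitor.create_monitoring_schedule(
    monitor_schedule_name="mortgage-model-drift",
    endpoint_input="mortgage-default-endpoint",
    output_s3_uri="s3://my-bucket/model-monitor/reports",
    statistics=monitor.baseline_statistics(),
    constraints=monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)
```

The catch, of course, is that all of this presumes your models are already hosted on SageMaker Endpoints.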
SageMaker AutoPilot falls into the AutoML category: it automatically trains ML models from CSV data files. The product competes with DataRobot, which raised a $206 million Series E this past September. While this type of tool has some benefits (it’s probably cheaper than having a data scientist perform this step), it’s also probably the most misunderstood category of those we’ve looked at so far. When I discussed the tool with Cady, he noted the dirty little secret of data science: while most of the hype is concentrated on the final 10% of the work, the ML modeling and training, 90% of the work comes earlier. “By the time you have a CSV, you’ve done 90% of the work. Most of data science comes from thinking about what the right data sets to use are, what the right outcome variable to target is, the biases in your data, and then munging and joining it together,” he said. So while AutoPilot can accelerate ML, it does nothing to speed up the bulk of a data scientist’s work.
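For context on how little ceremony that final step involves once the data is ready, here’s a minimal sketch of kicking off an AutoPilot job with boto3. The job name, bucket paths, target column, and role ARN are hypothetical placeholders.

```python
# Sketch: launching an AutoPilot job from a CSV already sitting in S3.
# Job name, S3 paths, target column, and role ARN are hypothetical placeholders.
import boto3

sm = boto3.client("sagemaker")

sm.create_auto_ml_job(
    AutoMLJobName="churn-autopilot-1",
    InputDataConfig=[{
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://my-bucket/churn/train.csv",
            }
        },
        # AutoPilot only needs to know which column to predict...
        "TargetAttributeName": "churned",
    }],
    OutputDataConfig={"S3OutputPath": "s3://my-bucket/churn/autopilot-output"},
    RoleArn="arn:aws:iam::123456789012:role/SageMakerRole",
)
# ...but, as Cady notes, producing that clean CSV is most of the work.
```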
The bottom line
So what does all of this tell us about SageMaker Studio? It’s a mixed bag, with some features that appear to be just rebrandings of older products and some that solve new, legitimate customer pain points. Even the best new features are incremental improvements on existing products. To be transformative, AWS has to address the larger usability issues in SageMaker specifically and in the AWS ecosystem more broadly.
Is a Christensen-style disruption of AWS likely? Only time will tell. Through tools like Notebooks, Debugger, and Model Monitor, AWS seems to be attempting to win the hearts and minds of developers and data scientists. But to date, those attempts seem to be falling short.