A controlled experiment using Harbor evaluates whether Claude Code's Agent Skills improve performance on domain-specific tasks, focusing on database migration scenarios.
A developer built an observability layer for Claude Code after an Anthropic update hid file-level activity, reducing visibility into what the agentic coding tool was doing.
Today we’re launching Task Evals: a built-in way to measure whether a skill is actually steering agent behaviour. Skills can be well written and still drift as models and surrounding context change. Task Evals compare outcomes with and without a skill, making it clear when it’s helping, doing nothing, or working against you.
This article describes how a Tessl skill was developed to help Claude Code write PubNub Functions, raising the deployment success rate from 60% to 100%.
GitHub's Agentic Workflows aim to automate repo maintenance by embedding coding agents into GitHub Actions, enabling workflows to reason over repository state.
As AI coding agents become more autonomous, engineers are shifting from writing code to orchestrating tasks and focusing on architecture and strategy, a trend illustrated by OpenAI's Codex app.
OpenClaw's success comes from its ability to learn and adapt, and it offers enterprises patterns for building AI systems with persistent context and accessible capabilities.
Agent skills are structured instructions that automate tasks such as CI monitoring, making workflows more efficient by packaging reusable, tested, and composable capabilities.
At Tessl, we take evaluating context seriously. This post explains three eval methodologies available on the Tessl platform and how each one works.
The CodeGuard Skill helps AI coding agents generate more secure code by applying Cisco's security rules, which cover 23 categories across multiple languages.
Use GitHub Actions and Tessl, a package manager for agent skills and context, to automate skill publishing so that updated skills and context are deployed consistently.
Explore eight AI code review skills, grouped into reviewers, workflow, and plumbing, that strengthen code analysis, security checks, and integration with development workflows.