Agile Data Technology

Applications are the nexus of the modern enterprise. They simplify operations, speed execution, and drive competitive advantage. Accelerating the application lifecycle means accelerating the business. Increasingly, organizations turn to public and private clouds, SaaS offerings, and outsourcing to hasten development and reduce risk, only to find themselves held hostage by their data.

Applications are nothing without data. Enterprise applications have data anchored in infrastructure, tied down by production requirements, legacy architecture, and security regulations. But projects demand fresh data under the control of their developers and testers, requiring processes to work around these impediments. The suboptimal result leads to cost overruns, schedule delays, and poor quality.

Agile development requires agile data. Agile data empowers developers and testers to control their data on their schedule. It unburdens IT by efficiently providing data where it is needed, independent of the underlying infrastructure. And it accelerates application delivery by providing fresh and complete data whenever necessary. It grants its users superpowers.

Many technologies can solve part of the agile data problem, but a partial solution still leaves you with suboptimal processes that impede your business. A complete agile data solution must embrace the following attributes.

Non-Disruptive Synchronization

Production data is sensitive. The environment has been highly optimized and secured, and its continued operation is critical to the success of the business – introducing risk is unacceptable. An agile data solution must automatically synchronize with production data such that it can provide fresh and relevant data copies, but it cannot mandate changes to how the production environment is managed, nor can its operation jeopardize the performance or success of business critical activities.

Service Provisioning

Data is more than just a sequence of bits. Projects access data through relational databases, NoSQL databases, REST APIs, or other APIs. An agile data solution must move beyond copying the physical representation of the data by instantiating and configuring the systems to access that data. Leaving this process to the end users induces delays and amplifies risk.

Source Freedom

Data is pervasive. Efforts to mandate a single data representation, be it a particular relational or NoSQL system, rarely succeed and limit the ability of projects to choose the data representation most appropriate for their needs. As project needs diversify the data landscape, the ability to manage all data through a single experience becomes essential. This unified agile experience necessitates a solution not tied to a single data source.

Platform Independence

The premier storage, virtualization, and compute platforms of today may be next year’s legacy architecture. Solutions limited to a single platform inhibit the ability of organizations to capitalize on advances in the industry, be it a high performance flash array or new private cloud software. Agility over time requires a solution that is not tied to the implementation of a particular hardware or software platform.

Efficient Copies

Storage costs money, and time costs the business. Agile development requires a proliferation of data copies for each developer and tester, magnifying these effects. Working around the issue with partial data leads to costly errors that are caught late in the application lifecycle, if at all. An agile solution must be able to create, refresh, and roll back copies of production data in minutes while consuming a fraction of the space required for a full copy.
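Copy-on-write storage is what makes this kind of efficiency achievable. As a rough sketch only – the pool and dataset names below are invented for illustration, and this is not necessarily how Delphix implements its copies – a ZFS-style filesystem can snapshot a source dataset and hand each developer a clone that consumes space only for the blocks that copy later changes:

# zfs snapshot dpool/source/proddb@sync-latest
# zfs clone dpool/source/proddb@sync-latest dpool/copies/dev1
# zfs snapshot dpool/copies/dev1@before-test
# zfs rollback dpool/copies/dev1@before-test

Creating or rolling back such a clone takes seconds and almost no additional space, regardless of the size of the source data.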

Workflow Customization

Each development environment has its own application lifecycle workflow. Data may need to be masked, projects may need multiple branches with different schemas, or developers may need to restart services as data is refreshed. Pushing responsibility to the end user is error prone and impedes application delivery. An agile solution must provide stable interfaces for automation and customization such that it can adapt to any development workflow.

Self-Service Data

Developers and testers dictate the pace of their assigned tasks, and each task affects the data. Agile development mandates that developers have the ability to transform, refresh, and roll back their data without interference. This experience should shield the user from the implementation details of the environment to limit confusion and reduce opportunity for error.

Resource Management

Each data copy consumes resources through storage, memory, and compute. Once developers experience the power of agile data, they will want more copies, run workloads on them for which they were not designed, and forget to delete them when they are through. As these resources become scarce, the failure modes (such as poor performance) become more expensive to diagnose and repair. Combating this data sprawl requires visibility into performance and capacity usage, accountability through auditing and reports, and proactive resource constraints.

 

Delphix is the agile data platform of the future. You can sync to your production database, instantly provision virtual databases where they are needed using a minuscule amount of space, and provide each developer their own copy of the data that can be refreshed and rolled back on demand. This platform will only become more powerful over time as we add new data sources, provide richer workflows targeting specific applications and use cases, and streamline the self-service model. An enterprise data strategy without Delphix is just a path to more data anchors, necessitating suboptimal processes that continue to slow application development and your business.

Posted on April 21, 2013 at 8:00 pm by eschrock · Permalink · Comments Closed
In: Delphix

Enterprise Software Hackathons

At Delphix, we just concluded one of our recurring Engineering Kickoff events where we get everyone together for a few days of collaboration, discussion, idea sharing, and fun. In this case it included, for the first time, an all-day hackathon event. To be honest, it was a bit of an experiment, and one where we were unsure how it would be received. We had all read about, participated in, or heard praise of, hackathons at other companies, but those companies were always more consumer-focused or had technologies that were more easily assembled into different creations. As an enterprise software company, we were concerned that even the simplest projects would be too complex to turn around over the course of a day. Given the potential benefit, however, it was clearly something we wanted to experiment with – any failure would also be a learning opportunity.

Some companies go big or go home when it comes to hackathons – week long activities, physical hacks, etc. We wanted to preserve freedom but be a little more targeted. The directive was simple: spend a day doing something unrelated to your normal day job that in some way connects to the business. People volunteered ideas and mentorship ahead of time so that even the newest engineers could meaningfully participate. The result was a resounding success. Whether people were able to give a demo, sketch on a whiteboard, or just speak to their ideas and the challenges they faced, everyone pushed themselves in new directions and walked away having learned something through the experience.

The set of activities covered a wide swath of engineering, including:

For myself, I put together a prototype of a hosted SSH/HTTP proxy for use by our support organization. This was my first real foray into the world of true PaaS cloud software – running node.js, redis, and cloudAMQP in a heroku instance, and it’s been incredibly interesting to finally play with all these tools I’ve read about but never had a reason to use. I will post details (and hopefully code) once I get it into slightly better shape.

Only a fraction of these are really what I would consider a contribution to the product itself, which is where our initial trepidation around a hackathon proved misplaced. No matter how complex your product or how high the barriers to entry, engineers will find a way to build cool things and try out new ideas in a hackathon setting. Everything that people did, from learning how to make changes to our OS to improving our quality of life as engineers to testing new product ideas, will provide real value to the engineering organization. On top of that, it was incredibly fun and a great way to get everyone working together in different ways.

It’s something we’ll certainly look at doing again, and I’d recommend that every company, organization, or group, find some way to get engineers together with the express purpose of working on ideas not directly related to their regular work. You’ll end up with some cool ideas and prototypes, and everyone will learn new things while having fun doing it.

Posted on February 28, 2013 at 9:43 pm by eschrock · Permalink · Comments Closed
In: Delphix

Engineer Anti-Patterns

The other week I had a particularly disheartening discussion with a potential new hire. I typically describe our engineering organization at Delphix as a bottom-up meritocracy where smart people seize opportunities to impact the company through thoughtful execution and data-driven methodology (a.k.a. buzzword bingo gold). In this case, after hours of discussions, I couldn’t shake this engineer’s fixation with understanding how his title would affect his ability to have impact at Delphix. After deciding that it was not a good cultural fit, I spent some time thinking about what defines our engineering culture and what exactly it was that felt like such a mismatch. Rather than writing some pseudo-recruiting material extolling the virtues of Delphix, I thought I’d take a cue from Bryan’s presentation on corporate open source anti-patterns (video) and instead look at some engineering cultural anti-patterns that I’ve encountered in the past. What follows is a random collection of engineering cultural pathologies that I’ve observed over the years and have worked to eschew at Delphix.

The Thinker

This engineer believes his responsibility is to think up new ideas, while others actually execute those ideas. While there are certainly execution-oriented engineers with an architect title out there who do great work, at Delphix we intentionally don’t have such a title because it can send the wrong message: that “architecting” is somehow separate from executing, and that permission to architect is something to be given as a reward. The hardest part of engineering comes through execution – plotting an achievable path through a maze of code and possible deliverables while maintaining a deep understanding of the customer problem and the constraints of the system. It’s important to have people who can think big, deeply understand customer needs, and build complex architectures, but without a tangible connection to the code and engineering execution those ideas are rarely actionable.

The Talker

Often coinciding with “The Thinker”, this engineer would rather talk in perpetuity than sit down and do actual work. Talkers will give plenty of excuses about why they can’t get something done, but the majority of the time those problems could be solved simply by standing up and doing things themselves. Even more annoying is their penchant for refusing to concede any argument, resulting in orders of magnitude more verbiage with each ensuing email, despite attempts to bring the discussion to a close. In the worst case the talker will provide tacit agreement publicly but fume privately for inordinate amounts of time. In many cases the sum total of time spent talking about the problem exceeds the time it would take to simply fix the underlying issue.

The Entitled

This engineer believes that titles are granted to individuals in order to expand her influence in the company; that being a Senior Staff Engineer enables her to do something that cannot be accomplished as a Staff Engineer. Titles should be a reflection of the impact already achieved through hard work, not a license granted by a benevolent management. When someone is promoted, the reasons should be obvious to the organization as a whole, not a stroke of luck or the result of clever political maneuvering. Leadership is something earned by gaining the respect of your peers through execution, and people who would use their title to make up for a lack of execution and respect of their peers can do an incredible amount of damage within an enabling culture.

The Owner

This engineer believes that the path to greater impact in the organization is through “owning” ideas and swaths of technology. While ownership of execution is key to any successful engineering organization (clear responsibility and accountability are a necessity), ownership of ideas is toxic. It can lead to passive-aggressive, counterproductive acts by senior engineers, and an environment where junior engineers struggle to get their ideas heard. The owner rarely takes code review comments well, bullies colleagues that encroach on her territory, and generally holds parts of the product hostage to her tirades. Metastasized in middle management, this leads to ever-growing fiefdoms where technical decisions are made for entirely wrong organizational reasons. Ideas and innovation come from everywhere, and while different parts of the organization are better suited to the execution of large projects based on their areas of expertise, no one should be forbidden from articulating their ideas due to arbitrary assignment.

The Recluse

Also known as “the specialist”, this engineer defines his role in the most narrow fashion possible, creating an ivory tower limited by his skill set and attitude. Good engineers seize a hard problem and drive it to completion, even when that problem pushes them well beyond their comfort zone. The recluse, however, will use any excuse to make something someone else’s problem. Even when the problem falls within his limited domain, he will solve only the smallest portion of it, preferring to file a bug to have someone else finish the overall work. When confronted on architectural issues, he will often agree to do it your way, but then does it his way anyway. Months later it can turn out that he never understood what you said in the first place and never discussed it in the interim, and by then it’s too late to undo the damage.

All of us have the potential for these anti-patterns in us. It’s only through regular introspection and frank discussions with colleagues that we can hope to have enough self awareness to avoid going down these paths. Most importantly, we all need to work to create a strong engineering culture where it is impossible for these pathologies to thrive. Once these pathologies become a fixture in a culture, they breed similar mentalities as the organization grows and can be impossible to eradicate at scale.

Posted on August 14, 2012 at 1:59 pm by eschrock · Permalink · Comments Closed
In: Delphix

A node.js CLI?

Over the past several months, one of the new features I’ve been working on for the next release is the new CLI for our appliance. While the CLI is the epitome of a checkbox item to most users, as a completely different client-side consumer of web APIs it can have a staggering maintenance cost. Our upcoming release introduces a number of new concepts that required that we gut our web services and build them from the ground up – retrofitting the old CLI was simply not an option.

What we ended up building was a local node.js CLI that takes programmatically defined web service API definitions and dynamically constructs a structured environment for manipulating state through those APIs. Users can log in with their Delphix credentials over SSH or the console and be presented with this CLI via custom PAM integration. Whenever I describe this to people, however, I get more than a few confused looks and questions:

Yes, node.js is so hot right now. Yes, a bunch of Joyeurs (the epicenter of node.js) are former Fishworks colleagues. But the path here, as with all engineering at Delphix, is ultimately data-driven based on real problems, and this was simply the best solution to those problems. I hope to have more blog posts about the architecture in the future, but as I was writing up a README for our own developers, I thought the content would make a reasonable first post. What follows is an engineering FAQ. From here I hope to describe our schemas and some of the basic structure of the CLI – stay tuned.

Why a local CLI?

The previous Delphix CLI was a client-side java application written in groovy. Running the CLI on the client both incurs a cost to users (they need to download and manage additional software) and makes it a nightmare to manage different versions across different servers. Nearly every appliance shipped today has an SSH interface; doing something different just increases customer confusion. The purported benefit (there is no native Windows SSH client) has proven insignificant in practice, and there are other, more scalable solutions to that problem (such as distributing a java SSH client).

Why Javascript?

We knew that we were going to need to manipulate a lot of dynamic state, and that the scope of the CLI would remain relatively small. A dynamic scripting language makes for a far more compelling environment for rapid development, at the cost of needing a more robust unit test framework to catch what would otherwise be compile-time errors in a strongly typed, statically compiled language. We explicitly chose javascript because our new GUI will be built in javascript; this both keeps the number of languages and environments used in production to a minimum and allows these clients to share code where applicable.

Why node.js?

We knew v8 was the best-of-breed runtime when it comes to javascript, and we actually started with a custom v8 wrapper. As a single threaded environment, this was pretty straightforward. But once we started considering background tasks we knew we’d need to move to an asynchronous model. Between the cost of building infrastructure already provided by node (HTTP request handling, file manipulation, etc) and the desire to support async activity, node.js was the clear choice of runtime.

Why auto-generated?

Historically, the cost of maintaining the CLI at Delphix (and elsewhere) has been very high. CLI features lag behind the GUI, and developers face an additional burden to port their APIs to multiple clients. We wanted to build a CLI that would be almost completely generated from a shared schema. When developers change the schema in one place, we auto-generate the backend infrastructure (java objects and constants), the GUI data model bindings, and the complete CLI experience.

Why a modal hierarchy?

The look and feel of the Delphix CLI is in many ways inspired by the Fishworks CLI. As engineers and users of many (bad) CLIs, our experience has led to the belief that a CLI with integrated help, tab completion, and a filesystem-like hierarchy promotes exploration and is more natural than a CLI with dozens of commands each with dozens of options. It also makes for a better representation of the actual web service APIs (and hence easier auto-generation), with user operations (list, select, update) that mirror the underlying CRUD API operations.
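To make the modal hierarchy concrete, here is a purely hypothetical session sketch – the object names, prompt format, and properties are invented for illustration, and only the list/select/update verbs are taken from the description above:

delphix> database
delphix database> list
delphix database> select example
delphix database "example"> update
delphix database "example" update> set description="nightly refresh copy"
delphix database "example" update> commit

Each level of the hierarchy offers tab completion and integrated help, with something like the final commit mapping onto the corresponding update call in the web service API.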

Posted on July 28, 2012 at 10:19 pm by eschrock · Permalink · Comments Closed
In: Uncategorized

Data Replication: Building a better NDMP

In my previous post I outlined some of the challenges faced when building a data replication solution, how the first Delphix implementation missed the mark, and how we set out to build something better the second time around.

The first thing that became clear after starting on the new replication subsystem was that we needed a better NDMP implementation. A binary-only separate daemon with poor error semantics that routinely left the system in an inconsistent state was not going to cut it. NDMP is a protocol built for a singular purpose: backing up files using a file-specific format (dump or tar) over arbitrary topologies (direct attached tape, 3-way restore, etc.). By being simultaneously so specific in its data semantics yet so general in its control protocol, it gives us the worst of both worlds: baked-in concepts (such as file history, complete with inode numbers) that prevent us from adequately expressing Delphix concepts, and a limited control protocol (lacking multiple streams or resumable streams) with terrible error semantics. While we will ultimately replace NDMP for replication, we knew that we still needed it for backup, and that we didn’t have the time to replace both the implementation and the data protocol for the current release.

Illumos, the open source operating system our distribution is based on, provides an NDMP implementation, one that I had previously dealt with while at Fishworks (though Dave Pacheco was the one who did the actual NDMP integration). I spent some time looking at the implementation and came to the conclusion that it suffered from a number of fatal flaws:

With this in mind I looked at the ndmp.org SDK implementation, and found it to suffer from the same pathologies (and to be a much worse implementation to boot). It was clear that the Solaris implementation was derived from the SDK, and that there was no mythical “great NDMP implementation” waiting to be found. I was going to have to suck it up and get back to my Solaris roots to eviscerate this beast.

The first thing I did was recast the daemon as a library, eliminating any code that dealt with daemonizing, running a door server to report statistics, and the existing Solaris commands that communicated with the server. This allowed me to add a set of client-provided callback vectors and configuration options to control state. With this library in place, we could use JNA to easily call into C code from our java management stack without having to worry about marshaling data to and from an external daemon.

The next step was to rip out all the data-handling functionality, instead creating a set of callback vectors in the library registration mechanism to start and stop backup. This left the actual implementation of the over-the-wire format up to the consumer. The sheer amount of code used to support tar and zfs send was staggering, and it had its tendrils all across the implementation. As I started to pull on the thread, more and more started to unravel. Data-specific operations would call into the “tape library management” code (which had very little to do with tape library management), which would then call back into common NDMP code, which would then do nothing.

With the data operations gone, I then had to finally address the hard part: making the code usable. The old error semantics were terrible. I had to go through every log call and non-zero return value, analyze its purpose, and restructure it to use the consumer-provided vector so that we could log such messages natively in the Delphix stack. This general cleanup also led me to rip out huge swaths of unused code, from buffer management to NDMPv2 support (v3 has been in common use for more than a decade). It was rather painful, but the result has been quite a usable product. Where the old Delphix implementation would have reported “NDMP server error CRITICAL: consult server log for details” (of course, there was no way for the customer to get to the “server log”), we now get much more helpful messages like “NDMP client reported an error during data operation: out of space”.

The final piece of the puzzle was something that surprised me. By choosing NDMP as the replication protocol (again, a temporary choice), we needed a way to drive the 3-way restore operation from within the Delphix stack. This meant that we wanted to act as a DMA. As I looked at the unbelievably awful ‘ndmpcopy’ implementation shipped with the NDMP SDK, I noticed a lot of similarity between what we needed on the client and what we had on the server (processing requests was identical, even if the set of expected requests was quite different). Rather than build an entirely separate implementation, I converted libndmp such that it could act as either a server or a client. This allowed us to build an NDMP copy operation in Java, as well as simulate a remote DMA (an invaluable testing tool).

It took more than a month of solid hard work and several more months of cleanup here and there, but the result was worth it. The new implementation clocks in at just over 11,000 lines of code, while the original was a staggering 43,000 lines of code. Our implementation doesn’t include any actual data handling, so it’s perhaps an unfair comparison. But we also include the ability to act as a full-featured DMA client, something the illumos implementation lacks.

The results of this effort will be available on github as soon as we release the next Delphix version (within a few weeks). While interesting, it’s unlikely to be useful to the general masses, and certainly not something that we’ll try to push upstream. I encourage others looking for an open-source embedded NDMP implementation to fork and improve what we have in Delphix – it’s a very flexible NDMP implementation that can be adapted for a variety of non-traditional NDMP scenarios. But with no built-in data processing, and no standalone daemon implementation, it’s a long way from replacing what can be found in illumos. If someone were so inspired, they could build a daemon on top of the current library – one that provides support for tar, dump, ZFS, and whatever other formats are supported by the current illumos implementation. It would not be a small amount of work, but I am happy to lend advice (if not code) to anyone interested.

Next up will be a post whose working title is “Data Replication: Metadata + Data = Crazy Pain in My Ass”.

Posted on February 28, 2012 at 1:10 pm by eschrock · Permalink · Comments Closed
In: Uncategorized

Data Replication: Approaching the Problem

With our next Delphix release just around the corner, I wanted to spend some time discussing the engineering process behind one of the major new features: data replication between servers. The current Delphix version already has a replication solution, so how does this constitute a “new feature”? The reason is that it’s an entirely new implementation, the result of an endeavor to create a more reliable, maintainable, and extensible system. How we got here makes for an interesting tale of business analysis, architecture, and implementation.

Where did we come from?

Before we begin looking at the current implementation, we need to understand why we started with a blank sheet of paper when we already had a shipping solution. The short answer is that what we had was unusable: it was unreliable, undebuggable, and unmaintainable. And when you’re in charge of data consistency for disaster recovery, “unreliable” is not an acceptable state. While I had not written any of the replication infrastructure at Fishworks (my colleagues Adam Leventhal and Dave Pacheco deserve the credit for that), I had spent a lot of time in discussions with them, as well as thinking about how to build a distributed data architecture there. So it seemed natural for me to take on this project at Delphix. As I started to unwind our current state, I found a series of decisions that, in hindsight, led to the untenable state we find ourselves in today.

Ideals for a new system

Given this list of limitations, I (later joined by Matt) sat down with a fresh sheet of paper. The following were some of the core ideals we set forth as we built this new system:

At the same time, we were forced to limit the scope of the project so we could deliver something in an appropriate timeframe. We stuck with NDMP as a protocol despite its inherent problems, as we needed to fix our backup/restore implementation as well. And we kept the active/passive deployment model so that we did not require any significant changes to the GUI.

Next, I’ll discuss the first major piece of work: building a better NDMP implementation.

Posted on February 20, 2012 at 1:06 pm by eschrock · Permalink · Comments Closed
In: Uncategorized

Your MDB fell into my DTrace!

Yesterday, several of us from Delphix, Nexenta, Joyent, and elsewhere, convened before the OpenStorage summit as part of an illumos hackathon.  The idea was to get a bunch of illumos coders in a room, brainstorm a bunch of small project ideas, and then break off to go implement them over the course of the day.  That was the idea, at least – in reality we didn’t know what to expect or how it would turn out.  Suffice to say that the hackathon was an amazing success.  There were a lot of cool ideas, and a lot of great mentors in the room that could lead people through unfamiliar territory.

For my mini-project (suggested by ahl), I implemented MDB’s ::print functionality in DTrace via a new print() action. Today, we have the trace() action, but the result is somewhat less than useful when dealing with structs, as it degenerates into tracemem():

# dtrace -qn 'BEGIN{trace(`p0); exit(0)}'
             0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
         0: 00 00 00 00 00 00 00 00 60 02 c3 fb ff ff ff ff  ........`.......
        10: c8 c9 c6 fb ff ff ff ff 00 00 00 00 00 00 00 00  ................
        20: b0 ad 14 c6 00 ff ff ff 00 00 00 00 02 00 00 00  ................
        ...

The results aren’t pretty, and we end up throwing away all that useful proc_t type information. With a few tweaks to dtrace, and some cribbing from mdb_print.c, we can do much better:

# dtrace -qn 'BEGIN{print(`p0); exit(0)}'
proc_t {
    struct vnode *p_exec = 0
    struct as *p_as = 0xfffffffffbc30260
    struct plock *p_lockp = 0xfffffffffbc6c9c8
    kmutex_t p_crlock = {
        void *[1] _opaque = [ 0 ]
    }
    struct cred *p_cred = 0xffffff00c614adb0
    int p_swapcnt = 0
    char p_stat = '02'
    ....

Much better! Now, how did we get there from here? The answer was an interesting journey through libdtrace, the kernel dtrace implementation, CTF, and the horrors of bitfields.

To action or not to action?

The first question I set out to answer was what the user-visible interface should be. It seemed clear that this should be an operation on the same level as trace(), allowing arbitrary D expressions, but simply preserving the type of the result and pretty-printing it later. After briefly considering printt() (for “print type”), I decided upon just print(), since this seemed like a logical choice. My first inclination was to create a new DTRACEACT_PRINT, but after some discussion with Adam, we decided this was extraneous – the behavior was identical to DTRACEACT_DIFEXPR (the internal name for trace), but just with type information.

Through the looking glass with types and formats

The real issue is that what we compile (dtrace statements) and what we consume (dtrace epids and records) are two very different things, and never the twain shall meet. At the time we go to generate the DIFEXPR statement in dt_cc.c, we have the CTF data in hand. We don’t want to change the DIF we generate, simply do post-processing on the other side, so we just need some way to get back to that type information in dt_consume_cpu(). We can’t simply hang it off our dtrace statement, as that would break anonymous tracing (and violate the rest of the DTrace architecture to boot).

Thankfully, this problem had already been solved for printf() (and related actions) because we need to preserve the original format string for the exact same reason. To do this, we take the action-specific integer argument, and use it to point into the DOF string table, where we stash the original format string. I simply had to hijack dtrace_dof_create() and have it do the same thing for the type information, right?

If only it could be so simple. There were two complications here: there is a lot of code that explicitly treats these as printf strings, and parses them into internal argv-style representations. Pretending our types were just format strings would cause all kinds of problems in this code. So I had to modify libdtrace to treat this more explicitly as raw ‘string data’ that is (optionally) used with the DIFEXPR action. Even with that in place, the formats I was sending down were not making it back out of the kernel. Because the argument is action-specific, the kernel needed to be modified to recognize this new argument in dtrace_ecb_action_add. With that change in place, I was able to get the format string back in userland when consuming the CPU buffers.

Bitfields, or why the D compiler cost me an hour of my life

With the trace data and type string in hand, I then proceeded to copy the mdb ::print code, starting from apptrace (which turned out to be complete garbage) and then fixing it up bit by bit. Finally, after tweaking the code for an hour or two, I had it producing output pretty much identical to ::print. But when I fed it a klwp_t structure, I found that the user_desc_t structure bitfields weren’t being printed correctly:

# dtrace -n 'BEGIN{print(*((user_desc_t*)0xffffff00cb0a4d90)); exit(0)}'
dtrace: description 'BEGIN' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0      1                           :BEGIN user_desc_t {
    unsigned long usd_lolimit = 0xcff3000000ffff
    unsigned long usd_lobase = 0xcff3000000
    unsigned long usd_midbase = 0xcff300
    unsigned long usd_type = 0xcff3
    unsigned long usd_dpl :64 = 0xcff3
    unsigned long usd_p :64 = 0xcff3
    unsigned long usd_hilimit = 0xcf
    unsigned long usd_avl :64 = 0xcf
    unsigned long usd_long :64 = 0xcf
    unsigned long usd_def32 :64 = 0xcf
    unsigned long usd_gran :64 = 0xcf
    unsigned long usd_hibase = 0
}

I spent an hour trying to debug this, only to find that the CTF IDs weren’t matching what I expected from the underlying object. I finally tracked it down to the fact that the D compiler, by virtue of processing the /usr/lib/dtrace files, pulls in its own version of klwp_t from the system header files. But it botches the bitfields, leaving the user with subtly incorrect data. Switching the type to be genunix`user_desc_t fixed the problem.
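For reference, the earlier one-liner behaves as expected once the type is scoped to the kernel module rather than left to the D compiler’s own copy (same illustrative address as in the example above):

# dtrace -n 'BEGIN{print(*((genunix`user_desc_t*)0xffffff00cb0a4d90)); exit(0)}'

With the module-scoped type, the bitfields print with their proper widths and values.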

What’s next

Given the usefulness of this feature, the next steps are to clean up the code, get it reviewed, and push to the illumos gate. It should hopefully be finding its way to an illumos distribution near you soon. Here’s a final print() invocation to leave you with:

# dtrace -n 'zio_done:entry{print(*args[0]); exit(0)}'
dtrace: description 'zio_done:entry' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  42594                   zio_done:entry zio_t {
    zbookmark_t io_bookmark = {
        uint64_t zb_objset = 0
        uint64_t zb_object = 0
        int64_t zb_level = 0
        uint64_t zb_blkid = 0
    }
    zio_prop_t io_prop = {
        enum zio_checksum zp_checksum = ZIO_CHECKSUM_INHERIT
        enum zio_compress zp_compress = ZIO_COMPRESS_INHERIT
        dmu_object_type_t zp_type = DMU_OT_NONE
        uint8_t zp_level = 0
        uint8_t zp_copies = 0
        uint8_t zp_dedup = 0
        uint8_t zp_dedup_verify = 0
    }
    zio_type_t io_type = ZIO_TYPE_NULL
    enum zio_child io_child_type = ZIO_CHILD_VDEV
    int io_cmd = 0
    uint8_t io_priority = 0
    uint8_t io_reexecute = 0
    uint8_t [2] io_state = [ 0x1, 0 ]
    uint64_t io_txg = 0
    spa_t *io_spa = 0xffffff00c6806580
    blkptr_t *io_bp = 0
    blkptr_t *io_bp_override = 0
    blkptr_t io_bp_copy = {
        dva_t [3] blk_dva = [
            dva_t {
                uint64_t [2] dva_word = [ 0, 0 ]
            },
    ...
Posted on October 26, 2011 at 6:32 am by eschrock · Permalink · Comments Closed
In: Uncategorized

Delphix illumos sources posted to github

With our first illumos-based distribution (2.6) out the door, we’ve posted the illumos-derived sources to github:

https://github.com/delphix/delphix-os-2.6

This repository contains the following types of changes from the illumos gate:

We will post updates with each release of the software.  This allows us to make sure the code is fully baked and tested, while still allowing us to proactively push complete pieces of work more frequently.

If you have questions about any particular change, feel free to email the author for more information.  You can also find us on the illumos developer mailing list and the #illumos IRC channel on freenode.

Posted on July 4, 2011 at 2:09 am by eschrock · Permalink · Comments Closed
In: Uncategorized

Beyond Oracle

It’s been a little over six months since I left Oracle to join Delphix.  I’m not here to dwell on the reasons for my departure, as I think the results speak for themselves.

It is with a sad heart, however, that I look at the work so many put into making OpenSolaris what it was, only to see it turned into the next HP-UX – a commercially relevant but ultimately technologically uninteresting operating system.  This is not to denigrate the work of those at Oracle working on Solaris 11, but I personally believe that a truly innovative OS requires an engaged developer base interacting with the source code, and unique technologies that are adopted across multiple platforms.  With no one inside or outside of Oracle believing the unofficial pinky swear to release source code at some abstract future date, many may wonder what will happen to the bevy of cutting edge technologies that made up OpenSolaris.

The good news is that those technologies are alive and well in the illumos project, and many of us who left Oracle have joined companies such as Delphix, Joyent, and Nexenta that are building innovative solutions on top of the enterprise-quality OS at the core of illumos.  Combined with those dedicated souls who continue to tirelessly work on the source in their free time, the community continues to grow and evolve.  We are here today because we stand on the shoulders of giants, and we will continue to improve the source and help the community make the platform successful in whatever form it may take in the future.

And the contributions continue to pour in.  There are nasty DTrace bugs and new features, new COMSTAR protocol support, TCP/IP stability and scalability fixes, ZFS data corruption and performance improvements, and much more.  And there is a ZFS working group spanning multiple platforms and companies with a diverse set of interests helping to coordinate future ZFS development.

Suffice to say that OpenSolaris is alive and well outside the walls of Oracle, so give it a spin and get involved!

Posted on May 31, 2011 at 3:45 am by eschrock · Permalink · Comments Closed
In: Illumos

Moving On

In my seven years at Sun and Oracle, I’ve had the opportunity to work with some incredible people on some truly amazing technology. But the time has come for me to move on, and today will be my last day at Oracle.

When I started in the Solaris kernel group in 2003, I had no idea that I was entering a truly unique environment – a culture of innovation and creativity that is difficult to find anywhere, let alone working on a system as old and complex as Solaris in a company as large as Sun. While there, I worked with others to reshape the operating system landscape through technologies like Zones, SMF, DTrace, and FMA, and I was fortunate to be part of the team that created ZFS, one of the most innovative filesystems ever. From there I became a member of the Fishworks team that created the Sun Storage 7000 series; working with such a close-knit, talented team to create a groundbreaking integrated product like that was an experience that I will never forget.

I learned so much and grew in so many ways thanks to the people I had the chance to meet and work with over the past seven years. I would not be the person I am today without your help and guidance. While I am leaving Oracle, I will always be part of the community, and I look forward to our paths crossing in the future.

Despite not updating this blog as much as I’d like, I do hope to blog in the future at my new home: http://dtrace.org/blogs/eschrock.

Thanks for all the memories.

Posted on December 8, 2010 at 2:35 pm by eschrock · Permalink · Comments Closed
In: General