Mapping a USB device to a SCSI one in FreeBSD

Posted on August 23, 2015
Tags: freebsd, c, devd, scsi, geom

Ever wondered what happens between the moment you plug in your USB stick and the moment HAL notifies you of a new mountable device? Here is the story.

devd

The lowest-level hardware reporting facility in FreeBSD is devd. It is a daemon which, among other things, notifies clients whenever a device is added to or removed from the device tree. It communicates with clients over a UNIX domain socket located at /var/run/devd.seqpacket.pipe. As the socket's name suggests, you should use socket(PF_UNIX, SOCK_SEQPACKET, 0) to read from it. There is also /var/run/devd.pipe, which can be read with the plain cat command.
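Here is a minimal sketch of a client for the seqpacket socket. The helper names devd_connect and devd_dump are made up for illustration, and the 8192-byte buffer is an assumption about the maximum event size rather than a documented limit.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Connect to a devd SOCK_SEQPACKET socket; returns the descriptor or -1. */
int devd_connect(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;                      /* path would not fit */

    fd = socket(PF_UNIX, SOCK_SEQPACKET, 0);
    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);        /* length was checked above */

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Print events until the peer closes the socket; each recv() is one event. */
int devd_dump(int fd)
{
    char buf[8192];                     /* assumed to be large enough */
    ssize_t n;

    while ((n = recv(fd, buf, sizeof(buf) - 1, 0)) > 0) {
        buf[n] = '\0';
        fputs(buf, stdout);
    }
    return n == 0 ? 0 : -1;
}
```

A caller would pass "/var/run/devd.seqpacket.pipe" to devd_connect and hand the resulting descriptor to devd_dump. Because the socket is SOCK_SEQPACKET, message boundaries are preserved and no line splitting is needed.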

When a USB stick is plugged in, devd produces quite a lot of output, so let's break it down into chunks:

!system=DEVFS subsystem=CDEV type=CREATE cdev=usb/0.2.0
!system=DEVFS subsystem=CDEV type=CREATE cdev=ugen0.2
!system=DEVFS subsystem=CDEV type=CREATE cdev=usb/0.2.1
!system=DEVFS subsystem=CDEV type=CREATE cdev=usb/0.2.2

These lines tell us that the corresponding cdevs (character devices) were created in the /dev directory. In our case these are:

# ls /dev/usb/0.2*
/dev/usb/0.2.0 /dev/usb/0.2.1 /dev/usb/0.2.2
# ls /dev/ugen0.2
/dev/ugen0.2
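The notification lines above are just space-separated key=value pairs after the leading !, so pulling out a single field is easy. The sketch below assumes unquoted values without spaces (true for cdev= but not for every field, e.g. sernum is quoted); devd_get_value is a hypothetical helper name.

```c
#include <string.h>

/*
 * Copy the value of `key` from a devd notification line into `out`.
 * Returns 0 on success, -1 if the key is absent or the value does not fit.
 * Assumption: values are unquoted and contain no spaces.
 */
int devd_get_value(const char *line, const char *key, char *out, size_t outsz)
{
    size_t klen = strlen(key);
    const char *p = line;

    if (*p == '!')                      /* skip the notification marker */
        p++;

    while (*p != '\0') {
        size_t toklen = strcspn(p, " \n");

        /* match only at the start of a token, so "system" != "subsystem" */
        if (toklen > klen && strncmp(p, key, klen) == 0 && p[klen] == '=') {
            size_t vlen = toklen - klen - 1;

            if (vlen >= outsz)
                return -1;
            memcpy(out, p + klen + 1, vlen);
            out[vlen] = '\0';
            return 0;
        }
        p += toklen;
        p += strspn(p, " \n");          /* skip separators */
    }
    return -1;
}
```

Asking for "cdev" on the first line shown earlier yields usb/0.2.0, which is the path of the new node relative to /dev.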

Next come USB subsystem events:

!system=USB subsystem=DEVICE type=ATTACH ugen=ugen0.2 cdev=ugen0.2 vendor=0x8564 product=0x1000 devclass=0x00 devsubclass=0x00 sernum="S46OYER6" release=0x0100 mode=host port=1 parent=ugen0.1
!system=USB subsystem=INTERFACE type=ATTACH ugen=ugen0.2 cdev=ugen0.2 vendor=0x8564 product=0x1000 devclass=0x00 devsubclass=0x00 sernum="S46OYER6" release=0x0100 mode=host interface=0 endpoints=2 intclass=0x08 intsubclass=0x06 intprotocol=0x50

Finally comes the device attach event, which is the one we are mainly interested in:

+umass0 at bus=0 hubaddr=1 port=1 devaddr=2 interface=0 ugen=ugen0.2 vendor=0x8564 product=0x1000 devclass=0x00 devsubclass=0x00 sernum="S46OYER6" release=0x0100 mode=host intclass=0x08 intsubclass=0x06 intprotocol=0x50 on uhub0

The first symbol is the event type: + means that a new device was attached. It is immediately followed by the name of the driver that claimed the device and its unit number, umass0 in this case. Generally, you can consult the man page for the driver, for example man umass, to get more information on it.
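Splitting the head of such a line into its parts could look like the sketch below. It assumes that the unit number is simply the run of trailing digits of the first word, and it also accepts -, which devd uses for detach events. parse_device_event is a hypothetical name.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Split the head of a devd device event such as "+umass0 at bus=0 ..."
 * into event type ('+' or '-'), driver name and unit number.
 * Returns 0 on success, -1 if the line does not look like a device event.
 */
int parse_device_event(const char *line, char *type,
                       char *driver, size_t driversz, int *unit)
{
    size_t len, digits = 0;

    if (line[0] != '+' && line[0] != '-')
        return -1;

    len = strcspn(line + 1, " \n");     /* length of the word "umass0" */

    /* count the trailing digits of that word: they form the unit number */
    while (digits < len && isdigit((unsigned char)line[len - digits]))
        digits++;

    if (digits == 0 || digits == len || len - digits >= driversz)
        return -1;

    *type = line[0];
    memcpy(driver, line + 1, len - digits);
    driver[len - digits] = '\0';
    *unit = atoi(line + 1 + len - digits);
    return 0;
}
```

For the event above this gives type +, driver umass and unit 0, which is all we need for the lookup in the SCSI layer.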

There might be more output from devd:

!system=DEVFS subsystem=CDEV type=CREATE cdev=pass2
!system=DEVFS subsystem=CDEV type=CREATE cdev=da0
!system=GEOM subsystem=DEV type=CREATE cdev=da0

This tells us that two more cdevs were created: a direct access device (da), which you are surely familiar with, and a SCSI passthrough device (pass), which gets created for every SCSI device in the system. The da0 we get here is the device we are looking for, but we can’t yet tell which attached device it belongs to. The last event reports that a GEOM object named da0 has been created and is ready to be queried. We will cover GEOM later.

Events like these:

!system=DEVFS subsystem=CDEV type=CREATE cdev=diskid/DISK-blablabla
!system=GEOM subsystem=DEV type=CREATE cdev=diskid/DISK-blablabla

are reported when GEOM's label class detects a label for the device (a disk ID in this example) and creates the corresponding cdevs for it. See the glabel man page.

camcontrol

Now we need to query the SCSI layer to establish the correspondence between umass0 and da0. First, I'll present the method using the command line tool camcontrol. Calling camcontrol devlist -b produces a list of active SCSI buses and the adapters they belong to:

# camcontrol devlist -b
scbus0 on ahcich0 bus 0
scbus1 on ahcich1 bus 0
scbus2 on umass-sim0 bus 0
...

In this list we need to find the scbus unit that a umass-sim device is attached to. The unit number of the umass-sim device (0 in this case) should be the same as that of the umass device we are looking for.
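If you only have the textual output to work with, the match can be sketched with a single sscanf per line. This assumes the "scbusN on umass-simM bus K" layout shown above; scbus_for_umass is a hypothetical name.

```c
#include <stdio.h>

/*
 * Given one line of `camcontrol devlist -b` output, return the scbus number
 * if that bus sits on umass-sim<req_unit>, or -1 otherwise.
 */
int scbus_for_umass(const char *line, int req_unit)
{
    int bus, unit;

    /* both numbers must convert; lines for other adapters stop at "on" */
    if (sscanf(line, "scbus%d on umass-sim%d", &bus, &unit) != 2)
        return -1;

    return unit == req_unit ? bus : -1;
}
```

For the listing above, calling it with req_unit 0 returns 2 for the third line and -1 for the two ahcich lines.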

Now, camcontrol devlist gives us the names of the peripheral devices attached to every scbus:

# camcontrol devlist
<ST9250315AS 0002SDM1>         at scbus0 target 0 lun 0 (ada0,pass0)
<MATSHITA DVD-RAM U880AS 1.21> at scbus1 target 0 lun 0 (pass1,cd0)
<JetFlash Transcend 4GB 8.07>  at scbus2 target 0 lun 0 (da0,pass2)
...

From this we can finally find out the unit number of the SCSI direct access driver (da0 here). The resulting device name can then be used to query the USB stick's filesystem type with fstyp and to mount it.
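The second step can be sketched in the same text-scraping style: take the line for the bus we found and pick the first peripheral that is not a passthrough device. The sketch assumes the "at scbusN target T lun L (a,b)" layout shown above; periph_on_bus is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/*
 * From one `camcontrol devlist` line, copy the first peripheral that is not
 * a passthrough device into `out`, provided the line belongs to scbus<bus>.
 * Returns 0 on success, -1 otherwise.
 */
int periph_on_bus(const char *line, int bus, char *out, size_t outsz)
{
    const char *p = strstr(line, "at scbus");
    char periphs[64];
    char *name;
    int line_bus;

    /* %*s skips the target and lun values; %63[^)] grabs "da0,pass2" */
    if (p == NULL ||
        sscanf(p, "at scbus%d target %*s lun %*s (%63[^)])",
               &line_bus, periphs) != 2 || line_bus != bus)
        return -1;

    for (name = strtok(periphs, ","); name != NULL; name = strtok(NULL, ",")) {
        if (strncmp(name, "pass", 4) != 0 && strlen(name) < outsz) {
            strcpy(out, name);
            return 0;
        }
    }
    return -1;
}
```

With bus 2 and the JetFlash line this yields da0. Scraping command output is fragile, though, which is one reason to talk to CAM directly.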


Now let's look at how to obtain this information programmatically. The code presented here was adapted from camcontrol's source, /usr/src/sbin/camcontrol/camcontrol.c. First, parse the input data:

int req_unit;
int res;

if(argc < 2)
{
    printf("not enough arguments\n");
    return 1;
}

/* sscanf() returns the number of converted fields; we expect exactly one */
res = sscanf(argv[1], "umass%d", &req_unit);
if(res != 1)
{
    printf("bad usb driver name\n");
    return 1;
}

Since we are working with umass devices only, the only information we need is its unit number, which is parsed using sscanf.
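One caveat: sscanf with "umass%d" also accepts input with trailing garbage, such as umass0foo. If that matters, a stricter variant based on strtol could look like this sketch (parse_umass_unit is a hypothetical helper, not part of the original program):

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse "umass<N>" strictly: the whole string must match, so inputs such as
 * "umass", "umass0x" or "da0" are rejected. Returns 0 on success, -1 on error.
 */
int parse_umass_unit(const char *arg, int *unit)
{
    static const char prefix[] = "umass";
    const size_t plen = sizeof(prefix) - 1;
    char *end;
    long v;

    /* require the prefix followed directly by a digit (no sign, no space) */
    if (strncmp(arg, prefix, plen) != 0 || !isdigit((unsigned char)arg[plen]))
        return -1;

    errno = 0;
    v = strtol(arg + plen, &end, 10);
    if (errno != 0 || *end != '\0' || v > INT_MAX)
        return -1;

    *unit = (int)v;
    return 0;
}
```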

After that, we need to open the /dev/xpt0 device, which provides an interface to FreeBSD's CAM transport layer. See man xpt for more details.

#define XPT_DEVICE          "/dev/xpt0"
int xpt;

if ((xpt = open(XPT_DEVICE, O_RDWR)) == -1) {
    printf("couldn't open xpt device: %s\n", strerror(errno));
    return 1;
}

Communication with xpt is done via the union ccb type, defined in /usr/include/cam/cam_ccb.h. CCB stands for "CAM Control Block", and union ccb wraps the specific struct ccb_* blocks into a single type. Although the comment for union ccb says that it shouldn't be used when operating on xpt, this is exactly the type used by camcontrol. The members of interest to us are ccb_h (struct ccb_hdr) and cdm (struct ccb_dev_match). The first one is the common part of every CCB, and the second one is used for searching the device tree by a pattern.
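If the "common header inside a union" idea is unfamiliar, here is a toy model of it. All toy_* names are made up and far simpler than the real definitions in cam_ccb.h. The point is only that every member struct starts with the same header, so the header is readable whichever member is in use.

```c
#include <string.h>

/* Toy model of the CCB layout; NOT the real definitions from cam_ccb.h. */
enum toy_func { TOY_NOOP, TOY_DEV_MATCH };

struct toy_hdr {                 /* common first member of every block */
    enum toy_func func_code;
    int status;
};

struct toy_dev_match {
    struct toy_hdr hdr;          /* must come first */
    unsigned num_matches;
};

union toy_ccb {
    struct toy_hdr hdr;          /* always valid, whatever the request type */
    struct toy_dev_match cdm;    /* valid when func_code == TOY_DEV_MATCH */
};

/* Zero the whole block, then tag it with the request type. */
void toy_ccb_init(union toy_ccb *ccb, enum toy_func func)
{
    memset(ccb, 0, sizeof(*ccb));
    ccb->hdr.func_code = func;
}
```

The receiver of such a block looks at hdr.func_code first and only then decides which member of the union to read.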

Before querying we need to prepare our CCB:

#define N_RESULTS   100

union ccb ccb;
struct dev_match_result matches[N_RESULTS];

bzero(&ccb, sizeof(union ccb));

ccb.ccb_h.path_id = CAM_XPT_PATH_ID;
ccb.ccb_h.target_id = CAM_TARGET_WILDCARD;
ccb.ccb_h.target_lun = CAM_LUN_WILDCARD;

ccb.ccb_h.func_code = XPT_DEV_MATCH;
ccb.cdm.match_buf_len = sizeof(struct dev_match_result) * N_RESULTS;
ccb.cdm.matches = matches;
ccb.cdm.num_matches = 0;

ccb.cdm.num_patterns = 0;
ccb.cdm.pattern_buf_len = 0;

Here CAM_TARGET_WILDCARD and CAM_LUN_WILDCARD act as wildcards that match any target and any LUN, much like * in a shell glob. Setting ccb.ccb_h.func_code to XPT_DEV_MATCH tells CAM to treat our CCB as a struct ccb_dev_match. The last two lines tell CAM that there are no search patterns to match against, so it returns all nodes of the device tree.

OK, we are ready to perform the query. This is done with an ioctl() call:

/*
 * We do the ioctl multiple times if necessary, in case there are
 * more than 100 nodes in the EDT.
 */
do
{
    if (ioctl(xpt, CAMIOCOMMAND, &ccb) == -1)
    {
        printf("error sending CAMIOCOMMAND ioctl: %s\n", strerror(errno));
        return 1;
    }

    if ((ccb.ccb_h.status != CAM_REQ_CMP)
        || ((ccb.cdm.status != CAM_DEV_MATCH_LAST)
        && (ccb.cdm.status != CAM_DEV_MATCH_MORE)))
    {
        printf("got CAM error %#x, CDM error %d\n",
               ccb.ccb_h.status, ccb.cdm.status);
        return 1;
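    }

    /* ... examine ccb.cdm.matches[] for this batch here ... */

    /* repeat while CAM reports that more results are pending */
} while (ccb.ccb_h.status == CAM_REQ_CMP
         && ccb.cdm.status == CAM_DEV_MATCH_MORE);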

As a result of this operation, our CCB gets filled with information about the SCSI devices present in the system. Now we need to filter out all nodes that are not SCSI buses:

int found_path = -1;    /* stays -1 until the matching bus is found */

for (int i = 0; i < ccb.cdm.num_matches; i++) {
    switch (ccb.cdm.matches[i].type)
    {
    case DEV_MATCH_BUS:
        {
            struct bus_match_result *match =
                &ccb.cdm.matches[i].result.bus_result;
            if(!strcmp("umass-sim", match->dev_name)
            && req_unit == match->unit_number)
                found_path = match->path_id;

            break;
        }
    default: break;
    }
}

if (found_path == -1)
{
    printf("not found\n");
    return 1;
}

This part of the code corresponds to the camcontrol devlist -b command.

Finally, we walk the device tree again, searching for matches of type DEV_MATCH_PERIPH. For USB sticks the only peripherals are da and pass, the latter being attached to every SCSI device. We filter pass out and print the name and unit number of what remains.

for (int i = 0; i < ccb.cdm.num_matches; i++)
    if(ccb.cdm.matches[i].type == DEV_MATCH_PERIPH)
    {
        struct periph_match_result *match =
            &ccb.cdm.matches[i].result.periph_result;
           
        /* strcmp() returns non-zero when the name is NOT "pass" */
        if(match->path_id == found_path
           && strcmp("pass", match->periph_name))
            printf("%s%i\n", match->periph_name, match->unit_number);
    }
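To see how the two passes fit together without needing /dev/xpt0, here is a self-contained model of the same logic. The mock_* types are simplified stand-ins that I made up for illustration; they are not CAM's real dev_match_result layout.

```c
#include <string.h>

/* Simplified stand-ins for CAM's match results; NOT the real definitions. */
enum mock_type { MOCK_BUS, MOCK_DEVICE, MOCK_PERIPH };

struct mock_match {
    enum mock_type type;
    int path_id;            /* scbus number */
    const char *name;       /* SIM name for buses, periph name otherwise */
    int unit;
};

/*
 * Pass 1 finds the bus whose SIM is umass-sim<req_unit>; pass 2 returns the
 * first non-"pass" peripheral on that bus, or NULL if there is none.
 */
const struct mock_match *find_disk(const struct mock_match *m, size_t n,
                                   int req_unit)
{
    int found_path = -1;
    size_t i;

    for (i = 0; i < n; i++)
        if (m[i].type == MOCK_BUS && strcmp(m[i].name, "umass-sim") == 0
            && m[i].unit == req_unit)
            found_path = m[i].path_id;

    if (found_path == -1)
        return NULL;

    for (i = 0; i < n; i++)
        if (m[i].type == MOCK_PERIPH && m[i].path_id == found_path
            && strcmp(m[i].name, "pass") != 0)
            return &m[i];

    return NULL;
}
```

Fed with entries mirroring the camcontrol output from earlier (a umass-sim0 bus with path 2 carrying da0 and pass2), find_disk returns the da entry for unit 0 and NULL for any other unit.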

The full source listing is available here. Don’t forget to add -lcam to the compiler command line.

GEOM

GEOM is a framework for defining and executing transformations of I/O requests to storage devices. These transformations are expressed in the form of GEOM classes. An overview of the GEOM infrastructure can be found in the man page geom(4).

The main command line interface to GEOM is geom(8). There are numerous utilities, like glabel(8), that are just shortcuts for the geom command with the appropriate class name. The geom command itself, on the other hand, is more generic, because you can pass arbitrary class names to it, including ones that have no shortcut:

# geom disk list
Geom name: cd0
Providers:
1. Name: cd0
   Mediasize: 0 (0B)
   Sectorsize: 2048
   Mode: r0w0e0
   descr: MATSHITA DVD-RAM UJ880AS
   ident: (null)
   rotationrate: unknown
   fwsectors: 0
   fwheads: 0

Geom name: ada0
Providers:
1. Name: ada0
   Mediasize: 250059350016 (233G)
   Sectorsize: 512
   Mode: r1w1e2
   descr: ST9250315AS
   lunid: 5000c50018b4f855
   ident: 5VC72YD0
   rotationrate: 5400
   fwsectors: 63
   fwheads: 16

# gdisk list
gdisk: Command not found.

The geom utility is pretty straightforward to use, so here I'll show some examples of working with the libgeom(3) API instead. I used it in my bsdisks project to gather information about attached devices.

Let's start by requesting the whole GEOM hierarchy of classes:

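#include <stdio.h>
#include <libgeom.h>

int main(void)
{
    struct gmesh mesh;
    struct gclass *cls = NULL;
    int error;

    /* geom_gettree() returns 0 on success and an errno value otherwise */
    if((error = geom_gettree(&mesh)) != 0)
    {
        puts("Cannot get GEOM tree");
        return 1;
    }

    LIST_FOREACH(cls, &mesh.lg_class, lg_class)
    {
        ...
    }

    return 0;
}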

The LIST_FOREACH macro, defined in queue.h, iterates over the linked list mesh.lg_class by following the link field named in its third argument. On every iteration cls points to a GEOM class structure, which can be used to list all objects belonging to that class. Let's find all block devices that carry a partition table, that is, the objects of the PART class:

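LIST_FOREACH(cls, &mesh.lg_class, lg_class)
{
    /* only the PART class is of interest here */
    if(strcmp(cls->lg_name, "PART") == 0)
    {
        struct ggeom *obj = NULL;
        LIST_FOREACH(obj, &cls->lg_geom, lg_geom)
        {
            /* name of the partitioned device, e.g. "ada0" */
            char *object_name = obj->lg_name;
        }
    }
}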
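The nested LIST_FOREACH pattern above can be tried out without libgeom. The sketch below builds the same kind of list-of-lists from the queue.h macros. The mock_* types only mimic the shape of the libgeom structures and are made up for this example.

```c
#include <string.h>
#include <sys/queue.h>

/* Mock types that mimic the shape of libgeom's lists; the names are made up. */
struct mock_geom {
    const char *name;
    LIST_ENTRY(mock_geom) link;         /* embedded next/prev pointers */
};

struct mock_class {
    const char *name;
    LIST_HEAD(, mock_geom) geoms;       /* head of this class's geom list */
    LIST_ENTRY(mock_class) link;
};

LIST_HEAD(mock_mesh, mock_class);

/* Count the geoms of the class `cls_name`; -1 if there is no such class. */
int count_geoms(struct mock_mesh *mesh, const char *cls_name)
{
    struct mock_class *cls;
    struct mock_geom *g;
    int n;

    LIST_FOREACH(cls, mesh, link) {
        if (strcmp(cls->name, cls_name) != 0)
            continue;
        n = 0;
        LIST_FOREACH(g, &cls->geoms, link)
            n++;
        return n;
    }
    return -1;
}
```

The third argument of LIST_FOREACH is the name of the LIST_ENTRY member, which is why the libgeom calls pass lg_class and lg_geom there.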

Each object has a “config”, another list-like field that we can inspect using the same LIST_FOREACH. This is what you see in the geom <class> list output before the “Providers” list. So, let's use it to get the partitioning scheme, which is stored in the “scheme” field:

char *object_name = obj->lg_name;
char *partition_type = NULL;    /* stays NULL if there is no "scheme" entry */
struct gconfig *config_item = NULL;

LIST_FOREACH(config_item, &obj->lg_config, lg_config)
{
    if(strcmp(config_item->lg_name, "scheme") == 0)
        partition_type = config_item->lg_val;
}

Finally, every object has a notion of “providers”. What the “provider-consumer” relation really means depends on the GEOM class. For instance, if you have a PART object (say, ada0), then its providers are the partitions (ada0p1, ada0p2, etc.). You can view them with the geom command and, of course, retrieve them via the API:

char *object_name = obj->lg_name;
struct gprovider *prv = NULL;

LIST_FOREACH(prv, &obj->lg_provider, lg_provider)
{
    /* prv->lg_name is the provider name, e.g. "ada0p1" */
}

Each gprovider holds its own config list, so you can iterate through it in the same manner.

When we are done working with the GEOM tree, we need to free it:

geom_deletetree(&mesh);

The full source that also works with DISK and LABEL classes is here.