Generating a Publication List From Multiple ORCiDs

Posted on May 23, 2015

A website for my research group, outside the bounds of our abysmal university web restrictions, has been discussed for quite some time. Its genesis has only really started recently though, because I’ve had too many other things on my plate and no-one else has any decent web capability. Whilst I’d much prefer to use a simple static site generator or code something completely from scratch, the other members of my team really need something a bit more point-and-click. Plus, I’ll be leaving the group soon, so it’s in their best interest to have a well-rounded CMS interface with which they can administer the site.

Looking around for the latest and greatest in CMS tools (I haven’t worked with one since the very early versions of e107, 15 years ago), we came across Joomla, which incidentally was pre-installed on the shared hosting we’d just purchased for the site, and started porting across content. Now that I’m familiar with the system, I can state without a doubt that building something from scratch and teaching everyone in the group how to code would have been the better choice.

But I’m getting off topic…

Managing publication lists has apparently become yet another piece of administrative overhead for researchers. One needs a ResearcherID, a Scopus Author ID, an ORCiD, and a Google Scholar profile; not to mention your institution’s profile of you and, if you give two shits about the corporate world, a LinkedIn.

A publication list is one of the primary items expected on a research group’s website, and we really didn’t want to add yet another list to the pile. So, against my better judgment, I delved back into the underworld of PHP and hacked together a little Joomla module that pulls the publications of multiple individuals via the ORCiD public API. That information is then cleared of duplicates, sorted by year, and displayed as a list. See it in action over at TCQP.Science.

<?php
//Order ORCiDs from the most sanitised record to the least; data from earlier in the array is preferred.
$orcids = array('0000-0002-xxxx-xxxx','0000-0002-yyyy-yyyy','0000-0002-zzzz-zzzz');
$mergedworks = array(); //Initialise up front so array_merge never sees an undefined variable.
foreach ($orcids as $id) {
    // create a new cURL resource
    $ch  = curl_init();
    // set URL and other appropriate options
    $options = array(
        CURLOPT_URL => 'http://pub.orcid.org/v1.2/' . $id . '/orcid-works',
        CURLOPT_HEADER => false,
        CURLOPT_RETURNTRANSFER => 1,
        CURLOPT_HTTPHEADER => array(
            'Accept: application/orcid+json'
        )
    );
    curl_setopt_array($ch, $options);
    // execute the request and collect the response
    $raw = curl_exec($ch);
    // close cURL resource, and free up system resources
    curl_close($ch);
    //Skip this ORCiD entirely if the request failed
    if ($raw === false) {
        continue;
    }
    //Decode json data
    $data  = json_decode($raw, true);
    //Grab useful stuff and merge
    $works = $data['orcid-profile']['orcid-activities']['orcid-works']['orcid-work'];
    if (!empty($works)) {
        $mergedworks = array_merge($mergedworks, $works);
    }
}

//Get all dois
$dois = array();
foreach ($mergedworks as $key => $work) {
    if (!is_null($work['work-external-identifiers'])) {
        foreach ($work['work-external-identifiers']['work-external-identifier'] as $ids) {
            if (strcmp($ids['work-external-identifier-type'], 'DOI') == 0) {
                $dois[] = $ids['work-external-identifier-id']['value'];
            }
        }
    } else {
        unset($mergedworks[$key]); //For now, kill anything without a DOI.
    }
}

//Find all unique DOIs. array_unique() keeps the first occurrence, so earlier ORCiDs in the list win on duplicates.
$udois = array_unique($dois);

//Sort the merged array by year, newest first
usort($mergedworks, function($a, $b) {
    return (int)$b['publication-date']['year']['value'] - (int)$a['publication-date']['year']['value'];
});


$curr_year = date("Y");
$output = "<h2>" . $curr_year . "</h2>";

//sanitise merged array and print results.
foreach ($mergedworks as $work) {
    $toparse = 1; //Parse by default, set to zero if there's an issue.
    $doi = ''; //Reset so a previous entry's DOI can't leak into this one.
    //Identify Duplicates
    foreach ($work['work-external-identifiers']['work-external-identifier'] as $ids) {
        if (strcmp($ids['work-external-identifier-type'], 'DOI') == 0) {
            $doi = $ids['work-external-identifier-id']['value'];

            $key = array_search($doi, $udois); // Find where this DOI sits in the unique list

            if ($key === false) {
                $toparse = 0; //Already consumed: this entry is a duplicate, don't parse it.
            } else {
                unset($udois[$key]); //Found one, don't need another.
            }
        }
    }
    if ($doi === '') {
        $toparse = 0; //No DOI at all, skip the entry.
    }

    //Identify Results earlier than 2011
    $year = $work['publication-date']['year']['value'];
    if ($year < '2011') {
        $toparse = 0; //Don't parse this entry
    } elseif ($year < $curr_year) {
        //As our list is sorted, we've moved to the previous year now. Separate the results.
        $curr_year = $year;
        $output .= "<br><h2>" . $curr_year . "</h2>";
    }

    //Print this entry (the list is already sorted by year)
    if ($toparse === 1) {
        $output .= '<b>' . $work['work-title']['title']['value'] . '</b><br>';

        //Pull volume and page numbers out of the BibTeX citation, if one was supplied.
        //Reset first so values from the previous entry can't leak through.
        $bibtex = '';
        $volume = '';
        $pages  = '';
        if (!is_null($work['work-citation']) && strcmp($work['work-citation']['work-citation-type'], 'BIBTEX') == 0) {
            $bibtex = $work['work-citation']['citation'];
            if (preg_match('/volume = {(\\d+)}/', $bibtex, $match)) {
                $volume = $match[1];
            }
            if (preg_match('/pages = {([0-9-]+)}/', $bibtex, $match)) {
                $pages = $match[1];
            }
        }

        if (!is_null($work['work-contributors'])) {
            foreach ($work['work-contributors']['contributor'] as $authors) {
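                // Sole author: name only; first author: no leading separator; last author: ' and '; everyone else: a comma.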
                if (($authors === reset($work['work-contributors']['contributor'])) && ($authors === end($work['work-contributors']['contributor']))) {
                    $output .= $authors['credit-name']['value'] . '<br>';
                } elseif ($authors === reset($work['work-contributors']['contributor'])) {
                    $output .= $authors['credit-name']['value'];
                } elseif ($authors === end($work['work-contributors']['contributor'])) {
                    $output .= ' and ' . $authors['credit-name']['value'] . '<br>';
                } else {
                    $output .= ', ' . $authors['credit-name']['value'];
                }
            }
        } else {
            //Get authorlist from bibtex
            if (preg_match('/author = {(.+)}/', $bibtex, $match)) {
                $authorstr = $match[1];
                $authors   = explode(" and ", $authorstr);
                foreach ($authors as $author) {
                    if (($author === reset($authors)) && ($author === end($authors))) {
                        $output .= $author . '<br>';
                    } elseif ($author === reset($authors)) {
                        $output .= $author;
                    } elseif ($author === end($authors)) {
                        $output .= ' and ' . $author . '<br>';
                    } else {
                        $output .= ', ' . $author;
                    }
                }
            }
        }

        $output .= '<a href="http://dx.doi.org/' . $doi . '">' . $work['journal-title']['value'] . ' <b>' . $volume . '</b> ' . $pages . ' (' . $work['publication-date']['year']['value'] . ')</a><br>';
    }
}

echo $output;
?>
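
As an aside on the Joomla side of things, a site module’s entry file needs very little beyond the snippet above sitting behind Joomla’s execution guard. The sketch below is only an outline with placeholder names (mod_orcidpubs, helper.php), not the actual files from my module.

<?php
// mod_orcidpubs.php -- a bare-bones Joomla site-module entry file (sketch only).
// Block direct web access first; Joomla defines _JEXEC before loading modules.
defined('_JEXEC') or die;

// helper.php would hold the list-building code above, ending with `echo $output;`,
// so pulling it in is enough to render the list into the module position.
require dirname(__FILE__) . '/helper.php';
?>

You’d also need the usual XML manifest (mod_orcidpubs.xml in this naming) alongside it so Joomla can install and publish the module.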

If you want to use this yourself, feel free to copypasta, or get in contact and I’ll send you a complete module for Joomla. A few things you should be aware of. First, put the person with the cleanest list at the start of the array: the script doesn’t merge fields, it just keeps the first entry it finds when checking against the unique DOI list. Secondly, we don’t expect a drastic amount of traffic and want the list to be as up to date as possible, so we pull this information on every page load. If you’re expecting anything more than modest traffic, look into caching the result rather than hitting the ORCiD API each time; a rough sketch of one way to do that follows.
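
The sketch below assumes the code above has been wrapped in a function, here given the placeholder name build_publication_list(), that returns the $output string instead of echoing it; the cache filename and one-hour lifetime are likewise just illustrative.

<?php
//Sketch only: serve the publication list from a file cache and rebuild it at most once an hour.
$cachefile = dirname(__FILE__) . '/orcid-publications.html'; //placeholder location
$ttl = 3600; //cache lifetime in seconds

if (file_exists($cachefile) && (time() - filemtime($cachefile)) < $ttl) {
    //Cache is still fresh: no ORCiD API calls at all on this page load.
    echo file_get_contents($cachefile);
} else {
    //Cache is missing or stale: rebuild the list, store it, then display it.
    $output = build_publication_list(); //placeholder for the code above, returning $output
    file_put_contents($cachefile, $output);
    echo $output;
}
?>

Even a one-hour cache like this keeps the list essentially current while capping the module at one batch of API requests per hour.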

