{"id":3038,"date":"2021-03-30T18:37:00","date_gmt":"2021-03-30T18:37:00","guid":{"rendered":"http:\/\/donutzdigital.com\/comment-scrapper-un-site-web-pour-creer-un-flux-shopping\/"},"modified":"2022-12-29T13:20:40","modified_gmt":"2022-12-29T13:20:40","slug":"comment-scrapper-un-site-web-pour-creer-un-flux-shopping","status":"publish","type":"post","link":"https:\/\/donutzdigital.com\/fr\/comment-scrapper-un-site-web-pour-creer-un-flux-shopping\/","title":{"rendered":"How to scrape a website to create a Shopping feed"},"content":{"rendered":"<p>To run a Google Shopping campaign, you need a product feed that is imported into Google Merchant Center.<\/p>\n<p>In some cases, the product data cannot be accessed because:<\/p>\n<ul>\n<li>no one has the expertise to extract the data from the client&#8217;s database and convert it to the required format.<\/li>\n<li>the client uses a closed platform that has no plug-in for exporting its data.<\/li>\n<\/ul>\n<p>We ran into one of these situations with a client. They had access to their site&#8217;s back end, but no access to the database and no data export feature. To help them, we built a tool that scrapes the data directly from the site&#8217;s back end and uses it to build a file compatible with Google Merchant Center.<\/p>\n<p><strong>Finding a solution<\/strong><\/p>\n<p>We decided to start from the back-end interface rather than the public website. We made this choice because the back end makes it easier to get a complete listing of the products in the database. 
We could also have worked from the sitemap (provided it was complete) and extracted the information from the product pages.<\/p>\n<p>Next, we reviewed the scraping tools available on the market, since the goal was not to reinvent the wheel. After some research, we found a scraping tool that met our needs: https:\/\/webscraper.io<\/p>\n<figure class=\"w-richtext-figure-type- \" data-rt-type=\"\" data-rt-align=\"\">\n<div><img decoding=\"async\" src=\"https:\/\/donutzdigital.com\/wp-content\/uploads\/2022\/12\/611cf1f3025e45829a07c996_5ecedf16a4f8374c2a941e82_Capture2520d25E22580259925C325A9cran25202020-05-27252025C325A0252022.43.33.png\" width=\"auto\" height=\"auto\" \/><\/div>\n<\/figure>\n<p>We started by mapping out our scraper&#8217;s behavior, telling it exactly how to navigate the back end and how to extract the information.<\/p>\n<p>We then created login credentials for the scraper so that it could sign in to the back end and retrieve the information.<\/p>\n<p>To run it, we chose a cloud computing service so as not to overload our own machines. The script had to run for about twenty hours to retrieve all the required information.<\/p>\n<p>The result? 
More than 4,000 product listings retrieved and, above all, up to date.<\/p>\n<p>Mission accomplished.<\/p>\n<p>If you have similar needs, feel free to contact us and we will help you find the right solution.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>How to scrape a website to build a Shopping feed and import it into Google Merchant Center.<\/p>\n","protected":false},"author":3,"featured_media":4701,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[48],"tags":[],"post_folder":[],"class_list":["post-3038","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized-fr"],"acf":[],"_links":{"self":[{"href":"https:\/\/donutzdigital.com\/fr\/wp-json\/wp\/v2\/posts\/3038","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/donutzdigital.com\/fr\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/donutzdigital.com\/fr\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/donutzdigital.com\/fr\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/donutzdigital.com\/fr\/wp-json\/wp\/v2\/comments?post=3038"}],"version-history":[{"count":2,"href":"https:\/\/donutzdigital.com\/fr\/wp-json\/wp\/v2\/posts\/3038\/revisions"}],"predecessor-version":[{"id":4703,"href":"https:\/\/donutzdigital.com\/fr\/wp-json\/wp\/v2\/posts\/3038\/revisions\/4703"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/donutzdigital.com\/fr\/wp-json\/wp\/v2\/media\/4701"}],"wp:attachment":[{"href":"https:\/\/donutzdigital.com\/fr\/wp-json\/wp\/v2\/media?parent=3038"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/donutzdigital.com\/fr\/wp-json\/wp\/v2\/categories?post=3038"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/donutzdigital.com\/fr\/wp-json\/wp\/v2\/tags?post=3038"},{"taxonomy":"post_folder","embeddable":true,"href":"https:\/\/donutzdigital.com\/fr\/wp-json\/wp\/v2\/post_folder?post=3038"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}