Web Crawling

webmagic-core

us.codecraft : webmagic-core

A crawler framework. It covers the whole crawler lifecycle: downloading, URL management, content extraction, and persistence. It can simplify the development of a specific crawler.

Last Version: 0.7.5

Release Date:

webmagic-extension

us.codecraft : webmagic-extension

A crawler framework. It covers the whole crawler lifecycle: downloading, URL management, content extraction, and persistence. It can simplify the development of a specific crawler.

Last Version: 0.7.5

Release Date:

WebMagic :: Spring Boot :: Starter

in.hocg.boot : webmagic-spring-boot-starter

Last Version: 1.0.45

Release Date:

WebMagic :: Spring Boot :: AutoConfigure

in.hocg.boot : webmagic-spring-boot-autoconfigure

Last Version: 1.0.45

Release Date:

Mybatis Plus :: Extensions :: WebMagic :: Spring Boot :: Starter

in.hocg.boot : mybatis-plus-extensions-webmagic-spring-boot-starter

Last Version: 1.0.45

Release Date:

Mybatis Plus :: Extensions :: WebMagic

in.hocg.boot : mybatis-plus-extensions-webmagic-autoconfigure

Last Version: 1.0.45

Release Date:

webmagic-selenium

us.codecraft : webmagic-selenium

A crawler framework. It covers the whole crawler lifecycle: downloading, URL management, content extraction, and persistence. It can simplify the development of a specific crawler.

Last Version: 0.7.5

Release Date:

edu.uci.ics:crawler4j

edu.uci.ics : crawler4j

Open Source Web Crawler for Java

Last Version: 4.4.0

Release Date:

webmagic-core

com.github.ancienter : webmagic-core

A crawler framework. It covers the whole crawler lifecycle: downloading, URL management, content extraction, and persistence. It can simplify the development of a specific crawler.

Last Version: v2020.6.17

Release Date:

webmagic-extension

com.github.ancienter : webmagic-extension

A crawler framework. It covers the whole crawler lifecycle: downloading, URL management, content extraction, and persistence. It can simplify the development of a specific crawler.

Last Version: v2020.6.17

Release Date:

crawler4j

com.goikosoft.crawler4j : crawler4j

crawler4j: Open Source Web Crawler for Java. Modified by Dario Goikoetxea to add POST capabilities.

Last Version: 4.5.11

Release Date:

webmagic-saxon

us.codecraft : webmagic-saxon

A crawler framework. It covers the whole crawler lifecycle: downloading, URL management, content extraction, and persistence. It can simplify the development of a specific crawler.

Last Version: 0.7.5

Release Date:

webmagic-samples

us.codecraft : webmagic-samples

A crawler framework. It covers the whole crawler lifecycle: downloading, URL management, content extraction, and persistence. It can simplify the development of a specific crawler.

Last Version: 0.7.5

Release Date:

webmagic-scripts

us.codecraft : webmagic-scripts

A crawler framework. It covers the whole crawler lifecycle: downloading, URL management, content extraction, and persistence. It can simplify the development of a specific crawler.

Last Version: 0.7.5

Release Date:

edu.uci.ics:crawler4j-examples-base

edu.uci.ics : crawler4j-examples-base

Open Source Web Crawler for Java - base examples

Last Version: 4.4.0

Release Date:

crawler4j

net.s17t : crawler4j

Open Source Web Crawler for Java

Last Version: 1.0.0

Release Date:

edu.uci.ics:crawler4j-examples-postgres

edu.uci.ics : crawler4j-examples-postgres

Open Source Web Crawler for Java - example with JDBC and Postgres

Last Version: 4.4.0

Release Date:

crawler4j

com.blacklocus : crawler4j

Open Source Web Crawler for Java

Last Version: 3.3.2

Release Date:

storm-crawler-external

com.digitalpebble.stormcrawler : storm-crawler-external

A collection of resources for building low-latency, scalable web crawlers on Apache Storm.

Last Version: 2.4

Release Date:

storm-crawler

com.digitalpebble.stormcrawler : storm-crawler

A collection of resources for building low-latency, scalable web crawlers on Apache Storm.

Last Version: 2.4

Release Date:

storm-crawler-elasticsearch-archetype

com.digitalpebble.stormcrawler : storm-crawler-elasticsearch-archetype

A collection of resources for building low-latency, scalable web crawlers on Apache Storm.

Last Version: 2.4

Release Date:

storm-crawler-archetype

com.digitalpebble.stormcrawler : storm-crawler-archetype

A collection of resources for building low-latency, scalable web crawlers on Apache Storm.

Last Version: 2.4

Release Date:

webmagic-saxon

com.github.ancienter : webmagic-saxon

A crawler framework. It covers the whole crawler lifecycle: downloading, URL management, content extraction, and persistence. It can simplify the development of a specific crawler.

Last Version: v2020.6.17

Release Date:
